Weekly News Digest
May 12, 2015 — In addition to this week's NewsBreaks article and the monthly NewsLink Spotlight, Information Today, Inc. (ITI) offers Weekly News Digests that feature recent product news and company announcements. Watch for additional coverage to appear in the next print issue of Information Today. For other up-to-the-minute news, check out ITI’s Twitter account: @ITINewsBreaks.
HathiTrust Premieres New Dataset
The HathiTrust Research Center (HTRC) released the HTRC Extracted Features Dataset, which is derived from 4.8 million public domain volumes in the HathiTrust Digital Library collection. These volumes contain more than 734 billion words in dozens of languages and include works from multiple centuries.
The dataset’s features include volume-level metadata, part-of-speech-tagged token counts, header and footer identification, sentence and line counts, and algorithmic language detection. Researchers can use these and other page- and line-level features to analyze large worksets of volumes at scales that were previously difficult to achieve.
For more information, read the press release.