Weekly News Digest

June 5, 2014 — In addition to this week's NewsBreak(s), the editors have compiled the Weekly News Digest, featuring stories from the week just past that you should know about. Watch for additional coverage to appear in the next print issue of Information Today.


HathiTrust Dataset Analyzes Page-Level Features

The HathiTrust Research Center (HTRC) released the alpha version of a new dataset of page-level features (notable or informative text characteristics) extracted from HathiTrust’s original, scanned representations of public domain volumes.

Extracted features include occurrences of terms by part of speech, term-frequency counts, and line and sentence counts for each page of text. The dataset covers more than 67 million pages in total. Pages are broken into header, body, and footer sections so they can be analyzed at scale.
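To make the idea of page-level features concrete, here is a minimal Python sketch that counts terms, lines, and sentences for each section of a single page. It is an illustration only, not the HTRC's extraction method or file format. The function names (`split_sections`, `section_features`, `page_features`), the output layout, and the rule that treats the first and last non-empty lines as header and footer are all assumptions made for this example. Part-of-speech tagging is left out because it needs an external tagger.

```python
import re
from collections import Counter


def split_sections(page_text):
    """Split a page into header, body, and footer lines.

    Assumption for illustration: the first non-empty line is the header
    and the last non-empty line is the footer. Real page segmentation
    is more involved.
    """
    lines = [ln for ln in page_text.splitlines() if ln.strip()]
    if len(lines) < 3:
        # Too short to separate a header and footer; treat it all as body.
        return {"header": [], "body": lines, "footer": []}
    return {"header": lines[:1], "body": lines[1:-1], "footer": lines[-1:]}


def section_features(lines):
    """Count lines, sentences, and lowercased terms in one section."""
    text = " ".join(lines)
    # Simple alphabetic tokenizer (hypothetical; ignores digits and punctuation).
    tokens = re.findall(r"[A-Za-z']+", text.lower())
    # Rough sentence split on runs of terminal punctuation.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return {
        "line_count": len(lines),
        "sentence_count": len(sentences),
        "term_counts": Counter(tokens),
    }


def page_features(page_text):
    """Compute features separately for each section of a page."""
    sections = split_sections(page_text)
    return {name: section_features(lines) for name, lines in sections.items()}


# Made-up sample page: a running header, two body lines, and a page number.
sample_page = (
    "CHAPTER ONE\n"
    "The whale surfaced. The crew watched the whale.\n"
    "Nobody spoke.\n"
    "12"
)
features = page_features(sample_page)
```

Keeping the counts per section means that running headers and page numbers can be excluded from an analysis of the body text without reprocessing the page.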

The HTRC welcomes feedback on how the dataset can help researchers.

Source: HathiTrust



Send correspondence concerning the Weekly News Digest to NewsBreaks Editor Brandi Scardilli

Related Articles

10/15/2012 HathiTrust Lawsuit Decision Reaffirms Libraries in the Digital Age
4/25/2013 New Data Mining and Analytics Tools for the HathiTrust Digital Library
9/10/2013 HathiTrust Records Go Live on the DPLA
12/3/2013 HathiTrust Doesn’t Monkey Around With Metadata Management

