


Weekly News Digest

June 5, 2014 — In addition to this week's NewsBreak(s), the editors have compiled the Weekly News Digest, featuring stories from the week just past that you should know about. Watch for additional coverage to appear in the next print issue of Information Today.


HathiTrust Dataset Analyzes Page-Level Features

The HathiTrust Research Center (HTRC) released the alpha version of a new dataset of page-level features (notable or informative text characteristics) extracted from HathiTrust’s original, scanned representations of public domain volumes.

The extracted features include occurrences of terms tagged by part of speech, term-frequency counts, and line and sentence counts for each page of text. The dataset covers more than 67 million pages in total. Each page is divided into header, body, and footer sections so the features can be analyzed at scale.
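To make the idea of page-level features concrete, here is a minimal Python sketch of how per-page counts of this kind might be computed. It is an illustration only, not HTRC's actual pipeline or data format: the function names `section_features` and `page_features`, the rule that treats the first and last non-empty lines as header and footer, and the naive sentence splitter are all assumptions made for this example. Part-of-speech tagging is left out because it would require an external NLP library.

```python
import re
from collections import Counter


def section_features(lines):
    """Compute simple counts for one section (a list of text lines)."""
    text = " ".join(lines)
    # Lowercased alphabetic tokens; a real tokenizer would be more careful.
    tokens = re.findall(r"[A-Za-z']+", text.lower())
    # Naive sentence count: terminal punctuation followed by space or end of text.
    sentences = re.findall(r"[.!?]+(?:\s|$)", text)
    return {
        "line_count": len(lines),
        "sentence_count": len(sentences),
        "term_counts": Counter(tokens),
    }


def page_features(page_text):
    """Split one page into header, body, and footer, then count features.

    Assumption (hypothetical rule): on pages with three or more non-empty
    lines, the first line is the header and the last line is the footer.
    """
    lines = [ln for ln in page_text.splitlines() if ln.strip()]
    if len(lines) >= 3:
        header, body, footer = lines[:1], lines[1:-1], lines[-1:]
    else:
        header, body, footer = [], lines, []
    sections = (("header", header), ("body", body), ("footer", footer))
    return {name: section_features(sec) for name, sec in sections}


if __name__ == "__main__":
    sample = "CHAPTER ONE\nThe cat sat. The cat ran!\nIt was late.\n12"
    for name, feats in page_features(sample).items():
        print(name, feats["line_count"], feats["sentence_count"],
              dict(feats["term_counts"]))
```

Because each page reduces to a small dictionary of counts, features like these can be aggregated across many volumes without redistributing the full page text.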

The HTRC welcomes feedback on how the dataset can help researchers.

Source: HathiTrust




Related Articles

10/15/2012 HathiTrust Lawsuit Decision Reaffirms Libraries in the Digital Age
4/25/2013 New Data Mining and Analytics Tools for the HathiTrust Digital Library
9/10/2013 HathiTrust Records Go Live on the DPLA
12/3/2013 HathiTrust Doesn’t Monkey Around With Metadata Management
4/16/2015 HathiTrust Adds Duke University Press Backlist Titles
5/12/2015 HathiTrust Premieres New Dataset

