Weekly News Digest
March 12, 2001 — In addition to this week's NewsBreaks article and the monthly NewsLink Spotlight, Information Today, Inc. (ITI) offers Weekly News Digests that feature recent product news and company announcements. Watch for additional coverage to appear in the next print issue of Information Today. For other up-to-the-minute news, check out ITI's Twitter account: @ITINewsBreaks.
Reuters Releases Free Archive of Over 800,000 News Stories
Reuters, the global information, news, and technology group, is for the first time making large quantities of archived Reuters news stories available free of charge to research communities around the world. The first Reuters Corpus archive includes over 800,000 English-language news stories, equivalent to the annual global news output of Reuters.
The Reuters Corpus offers researchers a unique body of static information upon which to research, test, and benchmark emerging technologies. These include research into language processing, speech synthesis, voice recognition, indexation, search, and information retrieval.
The growth of the Internet has led to an explosion in the information services available to businesses and consumers. Additionally, improvements in bandwidth have increased the variety of channels and devices used to deliver and access information. Consequently, research into technologies that help businesses and individuals improve the way they access, search, and manipulate information has assumed even greater significance. According to the announcement, the availability of the Reuters Corpus assists organizations conducting this research.
Richard Willis, head of research and standards for the Reuters Chief Technology Office, said: "Reuters has always been heavily involved in language and data research, and to strengthen our links with the research community around the world, we have made available one of the most complete news archives ever released. The data provided will aid research into many aspects of language processing and information retrieval."
The archive includes all English-language stories produced by Reuters globally between August 20, 1996, and August 19, 1997. The news data is available on two CD-ROMs and is formatted in XML to make it easier to use as a research tool. Every story is tagged with category codes for topic, geography, and industry sector, drawn from a total of 775 different codes.
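For readers curious how XML-formatted stories with category codes might be handled in practice, the following is a minimal Python sketch. The announcement does not describe the actual markup, so the element and attribute names used here (`story`, `codes`, `code`, `type`, `value`) and the sample code values are hypothetical, chosen only for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical story markup. The element names, attribute names, and code
# values are assumptions for illustration, not the real Reuters Corpus schema.
SAMPLE_STORY = """\
<story id="example-1" date="1996-08-20">
  <headline>Example headline</headline>
  <text>Example body text.</text>
  <codes type="topic"><code value="T01"/><code value="T02"/></codes>
  <codes type="geography"><code value="G01"/></codes>
  <codes type="industry"><code value="I01"/></codes>
</story>
"""


def extract_codes(xml_text):
    """Return a dict mapping each code type to the list of code values in one story."""
    root = ET.fromstring(xml_text)
    result = {}
    for group in root.findall("codes"):
        values = [code.get("value") for code in group.findall("code")]
        result.setdefault(group.get("type"), []).extend(values)
    return result


def count_codes(stories):
    """Count how often each (code type, code value) pair occurs across stories."""
    counts = Counter()
    for xml_text in stories:
        for code_type, values in extract_codes(xml_text).items():
            for value in values:
                counts[(code_type, value)] += 1
    return counts


if __name__ == "__main__":
    print(extract_codes(SAMPLE_STORY))
    print(count_codes([SAMPLE_STORY, SAMPLE_STORY]).most_common(3))
```

Tallies of this kind are one plausible starting point for categorization experiments that compare results across research groups, which is the use the researchers quoted here anticipate.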
Marc Moens, head of Edinburgh University's Language Technology Group, said: "Because of its size and the amount of preparation that has gone into it, the Reuters collection provides scope for many new types of research and development work. It allows for the systematic evaluation of progress and comparison of results between different development groups. I am sure this Corpus will soon be seen as a standard in document-related work."
Yorick Wilks, a professor at Sheffield University, said: "We can already see the potential benefits of such a corpus for stylistic language analysis. The topic codes would also give us the opportunity to analyze the geographic location, industry area, or topic that received news coverage from Reuters. Areas such as semantic Web applications, categorization research, and machine learning of topic routings would also benefit. This will be a very useful resource."
As part of the research agreement covering use of the archive, researchers will supply Reuters with a copy of any material published using the data. Working with this feedback from research groups, Reuters hopes to bring out other corpora, including multilingual versions and volumes covering other date ranges. Further information on the corpus is available at http://www.reuters.com/researchandstandards/corpus.