Free Money! Well, reasonably free. Three of the big five national newspapers have opened their archives to Google News Archive. And two of the three have left their backdoors open for dirt-cheap downloading. Which two? Read this NewsBreak and see. You'll find out what traditional vendors are doing sleeping with the enemy, too.
To quote Google's FAQ on content: "News Archive Search searches across a large collection of historical archives including major newspapers/magazines, news archives and legal archives. Search results include both content that is accessible to all users (such as BBC News, Time Magazine and Guardian) and content that requires a fee (such as Washington Post Archives, Newspaper Archive, and New York Times Archives)." Uh-huh. But where's it really coming from? From whom and how much? (For background, read the NewsBreak "Traditional Information Industry Opens Premium Content to Google News Archive" at http://newsbreaks.infotoday.com/nbreader.asp?ArticleId=18226.)
Clearly the Google News Archive does not cover every source that Google News does. Even some digital sources that appear in Google News with open Web archives, such as our own Infotoday.com NewsBreaks, do not seem to carry over into the new service. A concentration on print-to-digital sources or a focus on general news over trade press news could explain the omissions. However, much of the content will still come from the general Google Web database. For premium content, when users reach a paid content listing, they may see a snippet of text or as much as several hundred words of the entry before clicking through to the content provider's site for payment or authorization.
In some cases, a general Google Web Search will retrieve News Archive content as well in a separately marked group. According to Bill Brougher, Google News Archive product manager, the "type of things that would trigger this [appearance of Google News Archive entries on a Google Web Search] might be iconic events, like anniversaries, or repeating events, like the Olympics, and people, places, or things associated with them."
One of the prestigious suppliers, Factiva (http://www.factiva.com), a company jointly owned by Dow Jones and Reuters, currently contributes some 5 million articles, according to Alan Scott, chief marketing officer. The articles were selected based on 18 topics identified as most popular with Web users: Applications Software, Art, Books, Education, Entertainment, Environment, Government, Health, Information Technology, Movies, Music, Natural Disasters, Science/Technology, Sports and Recreation, Theatre, TV and Radio, University/College, and Weather. This coverage matches that given to Yahoo! News under the Yahoo! Search Subscriptions program introduced in 2005 (see "‘Fee’ Web Content Accessed by Yahoo! Search Subscriptions" at http://newsbreaks.infotoday.com/nbreader.asp?ArticleID=16174 and "Varying Content Commitments from Vendors for Yahoo! Search Subscriptions" at http://newsbreaks.infotoday.com/nbreader.asp?ArticleID=16175). Some of the topics hardly seem central to the business-oriented focus of a company owned, in part, by the publisher of The Wall Street Journal (WSJ). In fact, Factiva excludes all WSJ content from its contributions, and those contributions currently extend back only 18 months. Scott assured me that this is only a test and that both content and length of coverage may expand soon.
In an interesting development, Dow Jones (http://www.dowjones.com; http://online.wsj.com/public/us) appears to be supplying decades of Wall Street Journal content directly in a separate arrangement with Google. In fact, searchers looking for archived WSJ articles may find it quite profitable to start their search on Google News Archive—particularly searchers who subscribe to WSJ.com. Articles extending back at least a decade and marked as coming from the Wall Street Journal Interactive Edition appear in full text when subscribers click on the Google News Archive links. What's so wonderful about this? Well, the contract with WSJ.com subscribers only offers a 90-day "free" archive as part of the subscription. You can search and find titles of earlier articles, but they cost $2.95 each, the same as all Factiva Publication Library articles. Now you can click and find years' worth of listings for WSJ articles in Google News Archive and get the full text at no extra charge. Is this a delicious oversight or an interesting promotion to encourage subscriptions to WSJ.com?
The New York Times, another major contributor, provides an archive of 2 million articles extending back to 1981, with a current digitization process promising to reach back to the 1850s by the end of the year. This corresponds to the archive found at http://www.nytimes.com. More than 1.25 million of those articles will cost consumers $4.95 each, though not every consumer pays that price; the remaining articles are free but carry ad displays. Users who also subscribe to Times Select for $7.95 a month can get up to 100 articles each month. So anyone who wants more than one article from a search (or in a month) from The New York Times would do well to consider Times Select as an option.
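The arithmetic behind that advice can be sketched in a few lines of Python. This is only an illustration using the prices quoted above; the function and constant names are hypothetical, and it assumes every article wanted is one of the paid ones and that all purchases fall within a single month of Times Select.

```python
# Prices quoted in the article, held in cents to avoid floating-point rounding.
NYT_PER_ARTICLE_CENTS = 495        # $4.95 per paid archive article
TIMES_SELECT_MONTHLY_CENTS = 795   # $7.95 per month
TIMES_SELECT_MONTHLY_CAP = 100     # up to 100 articles each month


def pay_per_article_cost(n_articles: int) -> int:
    """Total cost in cents of buying n_articles one at a time."""
    return n_articles * NYT_PER_ARTICLE_CENTS


def break_even_articles() -> int:
    """Smallest article count at which buying singly costs more than one month of Times Select."""
    return TIMES_SELECT_MONTHLY_CENTS // NYT_PER_ARTICLE_CENTS + 1


def cheaper_option(n_articles: int) -> str:
    """Name the cheaper route for n_articles paid articles in one month (assumed scenario)."""
    if not 0 <= n_articles <= TIMES_SELECT_MONTHLY_CAP:
        raise ValueError("sketch only covers 0 to 100 articles in one month")
    if pay_per_article_cost(n_articles) > TIMES_SELECT_MONTHLY_CENTS:
        return "Times Select"
    return "pay per article"


if __name__ == "__main__":
    for n in (1, 2, 10):
        print(n, "article(s):", pay_per_article_cost(n) / 100, "->", cheaper_option(n))
    print("Times Select is cheaper from", break_even_articles(), "articles onward")
```

Under these assumptions, a single article is cheaper bought on its own, while a second article in the same month already pushes the pay-per-article total past the Times Select fee.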
HighBeam Research (http://www.highbeam.com) has opened up 25 million premium content articles, though Patrick Spain, founder and CEO, said that Google has not finished indexing them all yet. How much of it goes into the Google News Archive and when is still unclear. The indexing arrangement with Google began 2 years ago, Spain told me, as part of a Google Premium effort still in process. Spain views the Google News Archive as an evolution of the Google Premium effort. "It makes sense. Google is trying to find a place in its world for this content." The full HighBeam service carries some 36 million articles from more than 3,000 sources, including 500 news, 400 health and science, 1,000 business and technology, 300 international, and 400 hobbies and personal interest publications. Its coverage extends back for 20 years. HighBeam does not support pay-per-view, nor will it, according to Spain. However, a monthly subscription buys unlimited full-text viewing for only $19.95 (annual $99.95). If a searcher wants a bunch of articles, even a single search request might justify the expense.
Time Warner Inc.'s TIME magazine (http://www.time.com) provides a full archive for its entire content back to 1923, which is close to 300,000 articles. The magazine has experimented with different business models in the past, including restricting free access to print subscribers only. Now it makes its entire archive available to all for free with online ads. Availability through the Google News Archive should increase online traffic.
Thomson Gale has already made millions of articles available in its AccessMyLibrary.com program; they are also open to Google and Yahoo! search engines. This unique initiative connects people to content only if they can supply the right library card, one that indicates a library licensing its content from Gale that has joined the AccessMyLibrary program. Publishers whose content appears in the program may receive additional revenue when usage records for the program trigger additional royalty payments.
The Washington Post has made its digital archives available through a joint arrangement with its partner, ProQuest. Online searchable archives of The Post are already available to consumers through the ProQuest Archiver service, extending back to 1987. Full coverage back to the first issue of The Post in 1877 is only available through ProQuest Historical Newspapers, licensed to libraries and carrying page images, according to Rod Gauvin, executive vice president of publishing and marketing for ProQuest. However, Brougher seemed under the impression that The Post content in Google News Archive extended back to "the late 1800s." Gauvin discussed ProQuest's position on the new service. "ProQuest is participating in Google's News Network initiative because we believe this will be a valuable experience in unlocking proprietary content and in providing a test of alternative economic models. The content set and functionality for consumers is very different from the product offered to institutions. PQ Archiver, used in the Google scenario, provides access to article image, but access through ProQuest offers full page image, is fully browsable, and offers tremendous utility and application. It's far more robust and able to support the rigorous research needs of professional searchers and scholars." It was unclear when or whether other newspapers in the ProQuest collection would be added to Google News Archive.
For newspapers, the earliest content in the new service seems to come from NewspaperARCHIVE.com, a service launched by Heritage Microfilm in 1999. The full service carries 45 million image pages of newspapers and continues to add more newspapers every month. The digitized images are OCRed to create searchable text, and searchers can retrieve images of articles as PDF files. Subscriptions start at $9.95 per month. Unlike Gale's AccessMyLibrary program, there is no way at present for a library patron to take advantage of the free access granted to schools and libraries through Access NewspaperARCHIVE.com. This program lets students and library patrons view, save, and print full-page newspapers from 1759 to 1977, provided the newspapers are not published in the school's or library's home state.
Besides news, the Google News Archive also carries full text of legal cases. A major legal text provider, Fastcase (http://www.fastcase.com), has opened its law library, previously available only to subscribers, to both Google and Yahoo! Search (which has had the data online since April). The Fastcase collection includes more than 3 million state and federal cases, the earliest dating back to 1754 and allegedly the oldest document in News Archive. Fastcase charges $4.99 for each full-text case.
Other contributors to the Archive include LexisNexis, with an unspecified but limited subset of content from its credit card-based LexisNexis AlaCarte! service; Readex/NewsBank; Guardian Unlimited; and Wolters Kluwer's Loislaw, among others.