This has been a busy few months for those of us in the news business (and for those watching the news as well). While it may seem at times that the news is all negative in the aftermath of the September 11 tragedy and in following the woes of companies struggling with economic stresses, there have also been some bright spots. So, in the spirit of resuming "normal" routines, I can report that recent news about information access for online searchers has included both wins and losses.
A very big win for searchers is the October 24 launch of the Internet Archive's Wayback Machine (http://web.archive.org). The Wayback Machine makes it possible to surf pages stored in the Internet Archive's vast Web archive. We can literally travel back in Web time and view slices of Internet history.
The Internet Archive was founded in 1996 to build a digital library offering permanent and free access to researchers, historians, scholars, and the general public. Now, 5 years later, with a dozen Web crawls completed, the Internet Archive has made the Wayback Machine available to the public. It holds a collection of some 10 billion archived Web pages, dating from 1996 and comprising 100 terabytes. According to estimates on the site, 1,000 copies of Encyclopaedia Britannica would make up 1 terabyte; the Library of Congress' 20 million books (not counting pictures) would be about 20 terabytes. The archive is growing at a rate of 12 terabytes per month and is reportedly the largest known database in the world.
Beyond these truly astonishing figures about the amount of data accessible through the Wayback Machine, browsing back in time is fascinating, not to mention useful for tracking sites historically and for researching and comparing sites. It was nostalgic to look back at Information Today, Inc.'s older Web pages and reflect on how our site has improved.
There are also special Wayback collections on Election 2000; the events of September 11; and Web Pioneers, which highlights a handful of sites that played a role in the early development of the Internet. There is also the Television Archive (http://www.televisionarchive.org), a sister site whose first collection presents television news from around the world concerning the September 11 terrorist attacks.
Located in San Francisco, the Internet Archive is a public nonprofit whose benefactors include Alexa Internet, AT&T Research, Compaq, the Kahle/Austin Foundation, Prelinger Archives, Quantum DLT, Xerox PARC, the Library of Congress, and the National Science Foundation. Kudos to the Internet Archive and its supporters for such a valuable historical preservation effort. Finding and exploring the Wayback Machine was definitely a bright spot in my news week.
On the losing side of information access, we've had numerous announcements about information being withdrawn from public sites. Since the Supreme Court handed down its decision in the Tasini case (see the June 28 NewsBreak at http://newsbreaks.infotoday.com/nbreader.asp?ArticleID=17563), we've been hearing reports of publishers, online services, and news sites like The New York Times removing articles written by freelance authors.
Just this week, a letter from LexisNexis went out to its customers, advising them of 13 Gannett newspaper titles that were being "temporarily taken down from LexisNexis services and products beginning Saturday, October 27, 2001." The titles included The Arizona Republic, The Cincinnati Enquirer, and The Detroit News. The letter stated, "We assure you that we are working closely with publishers to see that removed titles are restored in a timely fashion."
As searchers, we should be concerned, very concerned. Barbara Quint called it the "trashing of online data." By the way, next week, at the Internet Librarian conference in Pasadena, California, the Southern California Online Users Group (SCOUG) will sponsor an evening session on November 6 entitled "The Tasini Decision: The End of Full Text as We Know It?" On the panel will be top executives from Dialog, Gale Group, and ProQuest, as well as Jonathan Tasini of the National Writers Union. And in the audience will be many searchers, as well as other information industry folks. It should be a lively evening!
And, if battling post-Tasini article disappearances isn't disconcerting enough for searchers, there was also news of many sites removing information as an anti-terrorism response. This touched off debates about balancing the public's right to know against legitimate security concerns. Numerous U.S. government agencies are reportedly reviewing their Web sites for anything that could be construed as sensitive or helpful to our adversaries, and are removing it.
The New York Times, for example, reported that the EPA has removed from its Web site a database with information on chemicals used at 15,000 industrial sites around the country. The Electronic Frontier Foundation has established a special page that attempts to document the "Chilling Effects of Anti-Terrorism" (http://eff.org/Censorship/Terrorism_militias/antiterrorism_chill.html). It lists sites that have shut down or removed information.
OMB Watch, a Washington, D.C., group that advocates government accountability in budgetary and regulatory matters, is also keeping close tabs on government information and is maintaining a list at http://www.ombwatch.org/info/2001/access.html. While you might be relieved to know that certain information on airport security measures or nuclear power facilities has been withdrawn from the open Web, you might be surprised to learn about the removal of environmental information, such as air- and water-quality data, or of spatial-transportation data.
The debates about public benefits vs. potential risks will continue, as they should. These are tough issues. But searchers need to be aware that they may not have access to certain expected data. In the meantime, to cheer up, go to the Wayback Machine, plug in a URL, and have some fun.