Internet Archive Opens TV News Section
Posted On October 1, 2012
The Internet Archive opened a section titled TV News Search & Browse devoted to television news broadcasts. The initial launch includes 3 years of content dating from 2009, drawn from a broader range of sources than the Vanderbilt Television News Archive covers, though Vanderbilt’s files extend back to Aug. 5, 1968. The new collection began with about 356,000 news programs from all the national U.S. networks and local stations in San Francisco and Washington, D.C. It updates with a 24-hour delay. Searching the service uses the closed-captioning mandated by law for such programs. Results offer users streamed 30-second clips in low resolution to help them decide whether they want the rest of the program. However, seeing full programs can get pricey: $50 per DVD-ROM, one program per disc (unlike Vanderbilt’s options, which can put various segments on one disc). Both Internet Archive and Vanderbilt designate the DVDs they send as “borrowed.” Internet Archive requires their return within 30 days. Neither service puts any form of copy protection on the DVDs.
The Internet Archive’s network coverage encompasses ABC, CBS, CNBC, CNN, CSPAN, CW, Comedy Central, Current, Estrella TV, FOX, FOX Business, FOX News, ION, KQED (San Francisco), MSNBC, PBS, TLN, TeleFutura, Telemundo, and Univision. More than 1,000 individual news series are listed on the site.
In contrast, the core collection of Vanderbilt’s archive includes evening news broadcasts from ABC, CBS, and NBC, an hour per day of CNN since 1995, and Fox News since 2004. The total content contains more than 850,000 news stories from more than 30,000 hours of broadcasting. Vanderbilt does not offer streamed clips, but users can get entire broadcasts on DVDs as well as compilations of individual news stories (segments) as specified by the user.
What is now available on the new Internet Archive service is not all that the Archive has stored. According to Brewster Kahle, founder and head of the Internet Archive, the Archive has been collecting television news since the year 2000. Some of that collecting has already appeared on the Archive (e.g., the 9/11 coverage). As time goes on and with additional sponsored funding, he hopes to tap into that stored content. In the meantime, he recommends users tap into the Vanderbilt service, which he says inspired his own service.
Searching news content using free-text is always a challenge for the thorough searcher, but even more so when the text comes from closed-captioning, according to John Lynch, director of the Vanderbilt Television News Archive. In particular, he points to the problem of spelling. Apparently hitting F7 just won’t do the job with all the names of people, places, and things that the closed-caption data speedsters must enter in real-time to accompany broadcasts. Lynch uses a device from SnapStream containing sophisticated software to clean up Vanderbilt’s input.
The Internet Archive relies on the closed-caption content itself, according to Kahle. He considered it adequate. Marshall Breeding, an independent consultant, speaker, and author, has had extensive experience with the Vanderbilt archive and lengthy discussions during the creation of the Internet Archive collection. Breeding stated, “My experience is that that text is fairly good, but does have lots of spelling inconsistencies and other errors. But in the aggregate, it seems to make a strong basis for populating a search index, creating tag clouds, facets, and other interface tools. The volume of material defies individual human intervention, but maybe the Television Archive will be able to come up with some algorithmic approach to cleaning up the text.”
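As a rough illustration of what an “algorithmic approach to cleaning up the text” might look like, the sketch below snaps misspelled caption words to the closest entry in a list of known names, using fuzzy matching from Python’s standard library. It is not a description of what either archive actually does. The name list, the function names, and the 0.8 similarity cutoff are all assumptions chosen for the example.

```python
import difflib

# Hypothetical vocabulary of correctly spelled names. A real system would
# need a far larger list (people, places, organizations).
KNOWN_NAMES = ["Romney", "Obama", "Vanderbilt", "Nashville"]


def normalize_token(token, vocabulary=KNOWN_NAMES, cutoff=0.8):
    """Return the closest known spelling for token, or token unchanged.

    The cutoff is an assumed similarity threshold between 0 and 1.
    Matching is case-sensitive.
    """
    matches = difflib.get_close_matches(token, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else token


def normalize_caption(text):
    """Normalize each whitespace-separated word of a caption line.

    Punctuation attached to a word is not handled in this sketch.
    """
    return " ".join(normalize_token(word) for word in text.split())


cleaned = normalize_caption("Romny spoke in Nashvile")
```

An approach like this only catches near misses. A lower cutoff would catch more garbled names but would also start rewriting ordinary words that happen to resemble a name.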
Ultimately, searcher ingenuity and diligence will remain key factors for comprehensive searches. For example, did you know that some liberal-minded news services are now referring to the presidential campaign of “Ritt Momney”? Only the Internet Archive’s service could trace the frequency of those terms across a broad range of sources. Vanderbilt clearly has the advantage of 4 decades more content than the new service. But, even within the 2009 to present category with its limited coverage, Vanderbilt offers abstracts written by humans for humans. This could solve problems of misspellings and alternative terminology, but, as both Breeding and Lynch point out, relying on manual abstracts does limit Vanderbilt’s ability to expand coverage. Breeding said, “The search interface of the Vanderbilt Television News Archive is based on manually written abstracts. That approach is not scalable to the volume of content that the Television Archive is processing.”
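Tracing the frequency of a term across sources amounts to a per-source tally over caption text. The sketch below shows the idea with a case-insensitive phrase count. The function name, the (source, text) data layout, and the sample captions are invented for illustration and do not reflect either archive’s data or interface.

```python
from collections import Counter


def count_term_by_source(captions, term):
    """Count case-insensitive occurrences of term for each source.

    captions is an iterable of (source, text) pairs, an assumed layout.
    Sources with no occurrences are left out of the result.
    """
    counts = Counter()
    needle = term.lower()
    for source, text in captions:
        hits = text.lower().count(needle)
        if hits:
            counts[source] += hits
    return counts


# Made-up sample captions, not real broadcast text.
sample = [
    ("Station A", "Ritt Momney spoke today. Ritt Momney again."),
    ("Station B", "Mitt Romney spoke today."),
    ("Station A", "No mention here."),
]

variant_counts = count_term_by_source(sample, "ritt momney")
```

Because the count runs on the raw text, a misspelled caption is simply missed, which is why the thorough searcher would repeat the tally for each likely variant.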
Success depends on what you are looking for and what you do to reach it. If you want to track the impact of a news story, the greater coverage of the Internet Archive would be essential. It also offers a timeline display of results by source. If you want to see what someone or something looks like RIGHT NOW, the 30-second streamed clip from the Internet Archive is your best hope. On the other hand, if you want to track a major story over a length of time with a minimum of duplication, perhaps Vanderbilt would be your best choice. Reading abstracts would certainly be cheaper than any of the fees for borrowed DVDs from either archive. Neither service charges for searching or browsing. The standard rate for a half-hour of a complete program from Vanderbilt is $100 with discounts to $50 or $25 for specific categories of users. The standard rate for compilations of segments is $27 per segment with discounts to $17. Internet Archive charges $50 per disc and only sends complete programs, one per disc.
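For anyone budgeting a project, the rates quoted above reduce to simple arithmetic. The sketch below uses only those figures. The function names are made up for the example, and it assumes that a partial half-hour from Vanderbilt is billed as a full one and that no other charges apply, neither of which is confirmed here.

```python
import math

# Rates as quoted in this article, in U.S. dollars.
VANDERBILT_HALF_HOUR_RATE = 100   # standard; discounted rates are 50 or 25
VANDERBILT_SEGMENT_RATE = 27      # standard; discounted rate is 17
INTERNET_ARCHIVE_DISC_RATE = 50   # one complete program per disc


def vanderbilt_program_cost(minutes, rate_per_half_hour=VANDERBILT_HALF_HOUR_RATE):
    """Cost of a complete program, billed per half-hour.

    Assumption: a partial half-hour is rounded up to a full one.
    """
    return math.ceil(minutes / 30) * rate_per_half_hour


def vanderbilt_compilation_cost(segments, rate_per_segment=VANDERBILT_SEGMENT_RATE):
    """Cost of a compilation of individual news segments."""
    return segments * rate_per_segment


def internet_archive_cost(programs, rate_per_disc=INTERNET_ARCHIVE_DISC_RATE):
    """Cost of complete programs, one disc per program."""
    return programs * rate_per_disc


hour_program_standard = vanderbilt_program_cost(60)      # two half-hours
five_segments_standard = vanderbilt_compilation_cost(5)
three_programs_archive = internet_archive_cost(3)
```

Under these assumptions, a one-hour program at Vanderbilt’s standard rate comes to $200, a five-segment compilation to $135, and three complete programs from the Internet Archive to $150.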
And then the sophisticated (aka cheap) searcher who only wants to get the facts of the story might use either archive to pin down the date and a search term or two and go to free or cheap print sources for all the details. Of course, if you’re working on a dissertation or Ken Burns documentary, you might find it economically prudent to just fly to San Francisco or Nashville, Tenn. and “borrow” the sources you need on site.
The money to support the Internet Archive effort comes from grants from the Library of Congress, the National Archives, and others. The Archive is also talking with the Knight Foundation, the Arcadia Foundation, the Sunlight Foundation, and the Craig Network. The Vanderbilt effort also taps sponsoring institutions, offering discounts for students and faculty. And, of course, the DVD “borrowing” fees also help defray costs.
“What about copyright? What about permissions?” you may be asking. Shhh.
For background, see Paula J. Hane, “Vanderbilt Improves Television News Archive,” Information Today, October 2002.