Yahoo! Pursues Invisible Web Content for Its Search Engine
Posted On March 8, 2004
Having launched a major challenge to Google's dominance of the Web search engine field, Yahoo! has now added a Content Acquisition Program (CAP) that offers enhanced coverage to both commercial (paid inclusion) and noncommercial (nonpaying) data sources. While Site Match (the commercial side of this offering) seems focused on increasing revenues for Yahoo!, Public Site Match (the noncommercial side) promises to introduce major new content sources from databases generally tagged as "invisible" or "deep" Web sources. At this point, the CAP initiative covers only 1 percent of the billions of pages in Yahoo!'s search engine index, according to company reports. Yahoo! representatives would not confirm any plans to provide clustering or categorization or any special status for the information beyond merging it with all other entries in the Web-search database.

In launching the program, Tim Cadogan, vice president of search at Yahoo!, Inc., said: "Our primary goal is to discover all the content on the Web for free. In addition, the Content Acquisition Program serves to make a richer set of content accessible to users in a way that most search engines today are unable to achieve. This program enables us to develop direct, structured relationships with content providers to increase comprehensiveness, maintain the most up-to-date data, improve relevance, and thereby deliver a higher-quality search experience for users."

The Public Site Match service acquires content from the dot-gov, dot-edu, and dot-org side of the Internet (i.e., government, academia, and nonprofit agencies). The following are current participants in the program:

  • Library of Congress (specific content not yet announced)
  • National Science Digital Library, the National Science Foundation's online library with more than 250 collections for science, technology, engineering, and mathematics
  • New York Public Library (specific content not yet announced)
  • National Public Radio (NPR) daily audio transcripts from news and information programs with fresh feeds indexed within 2 days
  • Northwestern University's online OYEZ project, with more than 2,000 hours of Supreme Court audio recorded since 1995
  • Project Gutenberg's free electronic books
  • UCLA's Cuneiform Digital Library Initiative, with content documenting Babylonian history back to 3500 B.C.
  • University of Michigan's OAIster project for academic collections
  • Wikipedia, a free, multilingual online encyclopedia with articles in more than 50 languages

Yahoo! programmers work with each database to handle unique features and metadata.

The commercial Site Match service comes through Overture, a wholly owned subsidiary of Yahoo!. It allows content providers to submit Web content, update it frequently (as often as every 48 hours), target leads, and track and optimize performance. Smaller commercial content providers throughout the U.S. can use Site Match as a self-service subscription program, which is operated through credit cards. Larger commercial content providers will use the full-service Site Match Xchange program.

Resellers for Site Match include Position Technologies, Marketleap, ineedhits, Trellian, Network Solutions, and infoSpider. Site Match Xchange's resellers include TrafficLeader, Performics, and GO TOAST. Overture will migrate customers currently participating in six paid inclusion programs from search engines acquired by Yahoo! (Inktomi, AltaVista, and Fast) into the unified Site Match program.

Yahoo! has assured critics that the merger of paid-inclusion data feeds into the main Web search engine does not mean that paid-inclusion entries will receive higher placement in search results. Sponsored results, another type of paid inclusion, will continue to appear at the top of the screen, clearly labeled, before the full results. Experts looking at the new service predict that users will find it difficult to identify entries contributed from the Site Match program. Shortly after Yahoo! announced its new paid-inclusion program, AskJeeves announced that it had decided to discontinue its own, called Index Express, because it affected the relevance of results to the disadvantage of users.

While press coverage of Yahoo!'s new developments focuses more on the possibility of irrelevant material rising to the top of search results due to the new paid-inclusion material, information professionals worry about results from the newly added "Invisible Web" sources sinking back into invisibility. Gary Price's ResourceShelf posting on March 2 reported that six search strategies built around specific information on an NPR event known to be indexed did not draw the correct result above the first 100 results. Price wrote: "The addition of new content into general Web search engines is not a bad idea if the searcher has the tools to find it.... Many people who search the Web use only a few nonspecific search terms and look only at the first five to 10 results. In other words, if it doesn't appear on one of the first results page, even if it's in the database, does it really offer value to the searcher?"

Yahoo! has a tradition of more hands-on attention to handling data. It began as a directory service with human design of folders and thesauri. As a portal service, its home page is a mass of categories leading to different data groupings. In the past, it offered searches of premium content (periodical and pamphlet literature) from a connection with the former Northern Light. The new interface for Yahoo!'s search engine follows the Google model of a bare page with a minimum of icons, but even there Yahoo! offers users the option to add or remove icons from a more extensive selection. However, at present, a Yahoo! representative said there is no way for searchers to reach just the noncommercial collection of high-quality sources.

On the other hand, the service is in its early days. Yahoo! has invited academics, government agencies, and not-for-profits to bring it data offerings. Yahoo! plans to grow this area of its data coverage, probably to give its service something that Google doesn't offer—yet. However, if the data from the new sources just gets thrown into the ocean of Web entries, never to rise again, addition of the new content may serve neither users nor Yahoo!'s marketing goals.

Without removing the material from the general Web search engine, why couldn't Yahoo! still collect it into an icon category as it does with PeopleSearch, Yellow Pages, News, etc.? A name for the category that would resonate with users? "Library" comes to mind.

Barbara Quint was senior editor of Online Searcher, co-editor of The Information Advisor’s Guide to Internet Research, and a columnist for Information Today.
