Google's new Web directory (http://directory.google.com) combines its sophisticated search technology with data from Netscape's Open Directory Project (ODP) to form a hybrid service that's uniquely broad and deep. The directory provides a "best-of-both-worlds" experience, augmenting Google's standard Web search results with hand-selected listings from the ODP.
Google joins AltaVista, AOL, HotBot, Lycos, and other major search services in offering ODP search results. Not yet 2 years old, the ODP has risen to become a significant player in the Web directory arena by making its data freely available to other search services. Google's directory differs from most others in two significant ways. First, directory search results are ranked according to Google's proprietary PageRank relevance system. Second, Google uses the technology that powers its regular Web search to search the full content of sites within a category, not just their titles and descriptions.
"The ODP has very useful information," said Sergey Brin, president and co-founder of Google. "But it's tedious to browse. So we put our technology on top [to make it easier to find relevant results without having to scan through often lengthy alphabetized lists of links]."
Google pioneered assessing the "importance" of a Web page as a measure of relevance. In essence, Google seeks to identify the most highly regarded pages on the Web by analyzing both the quantity of links pointing to a page and the importance of the sites providing those links. "It's almost like a peer-review process for the Web," said Larry Page, Google CEO and co-founder.
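To make the idea concrete, here is a minimal sketch of link-based importance scoring in the spirit of what Page describes. It is not Google's proprietary implementation. The function name, the damping value of 0.85, the fixed iteration count, and the four-page toy graph are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Score pages by the quantity and importance of their inbound links.

    links maps each page to the list of pages it links to.
    damping and iterations are illustrative assumptions, not Google's settings.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    # Start with every page equally important.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page receives a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its importance to the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outbound links spreads its score evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical toy Web: three pages link to "c", and "c" links only to "a".
toy_links = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = pagerank(toy_links)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

In this toy graph, "c" scores highest because it has the most inbound links. Page "a" has only one inbound link, but that link comes from the important page "c", so "a" outscores "b" and "d", which have none. That is the "peer-review" effect in miniature.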
The Google directory uses its patent-pending PageRank technology to list search results by relevance and quality of content, rather than the ODP's default alphabetical ordering. For example, PageRanked results in the "Business Schools" category lead off with links to Sloan, University of California-Berkeley, and Harvard. By contrast, the ODP's default alphabetized list leads off with links to an "Accelerated International M.B.A. Program," the Alfred University College of Business, Athabasca University, and so on.
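The difference between the two orderings can be sketched in a few lines. The listing names below come from the example above, but the numeric scores are invented placeholders, not real PageRank values, and the function name is hypothetical.

```python
# Category listings in the ODP's default alphabetical order.
listings = [
    "Accelerated International M.B.A. Program",
    "Alfred University College of Business",
    "Athabasca University",
    "Harvard",
    "Sloan",
    "University of California-Berkeley",
]

# Invented importance scores, for illustration only.
hypothetical_scores = {
    "Sloan": 0.9,
    "University of California-Berkeley": 0.8,
    "Harvard": 0.7,
    "Accelerated International M.B.A. Program": 0.1,
    "Alfred University College of Business": 0.1,
    "Athabasca University": 0.1,
}


def rank_listings(names, scores):
    """Order listings by descending score.

    sorted() is stable, so listings with equal scores keep
    their original alphabetical order.
    """
    return sorted(names, key=lambda name: scores.get(name, 0.0), reverse=True)


for name in rank_listings(listings, hypothetical_scores):
    print(name)
```

Under these assumed scores, the ranked list leads with Sloan, University of California-Berkeley, and Harvard, while the unranked list still begins with the "Accelerated International M.B.A. Program."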
Google can determine the importance of directory listings because it goes beyond simple analysis of the titles and descriptions that constitute the core of the ODP data, comparing each listing with its own full-text database of Web pages. In essence, Google is applying the collective wisdom of the entire Web to the much smaller subset of pages selected by ODP editors to generate highly relevant results.
Integrated Search and Directory
Google has also integrated ODP data into the results of its primary search service, with links to "relevant categories" included at the top of search result pages. This makes it easy to search Google's full index and the smaller directory at the same time.
Google has indexed more than 200 million Web pages, with 300 million "reachable" pages, according to Brin. Reachable pages are those that are not included in Google's index, but that Google has determined are important based on analysis of pages that are in the index. In other words, Google can confidently provide links to more than 100 million pages based on inferred rather than computed relevance.
In contrast to Google's huge number of reachable pages, the ODP currently includes about 1.5 million entries, arranged in over 200,000 categories, selected and maintained by a volunteer corps of more than 22,000 editors. Though the ODP contains links to only a fraction of the estimated 1 billion pages on the Web, its focus is selective. "A 'free for all' links page benefits no one, and the big win for the Open Directory model is providing high growth, with good quality," said Chris Tolles, marketing director for the ODP.
"The addition of Netscape's Open Directory Project creates the most comprehensive and robust search resource for finding information and browsing the Web," said Page. "We've combined the best aspects of search and directories to create an enhanced tool for easy access to information contained on the Web."