Science.gov 2.0 Launches with New Relevance Ranking Technology
Paula J. Hane
Posted On May 24, 2004
Science.gov has served as the gateway to reliable information about science and technology from across federal government organizations since its launch in December 2002. Now, the interagency alliance has launched Science.gov 2.0, hailing it as the "next major step in government science information retrieval." The new site offers additional content, technological enhancements, and a newly developed relevancy ranking technology that helps patrons get to the best documents quickly. Science.gov 2.0 lets users search across 30 databases from 12 government science agencies (up from 10 agencies in version 1.0), as well as across 1,700 Web sites (47 million pages in all), with results presented in relevancy-ranked order.
Use of the site remains free with no registration required. The "FirstGov for Science," as it is called, serves the educational and library communities, as well as businesspeople, entrepreneurs, agency scientists, and anyone with an interest in science. The continuing advantage of Science.gov is that it lets users search for information by subject, rather than by the agency sponsoring it. In announcing the launch, Secretary of Energy Spencer Abraham said: "From the most current information on new technologies, to historical research results, to the most promising medical advancements, Science.gov connects citizens to the world of science."
The metasearch capability for version 1.0 was supplied by Deep Web Technologies (DWT), a small company based in Los Alamos, N.M. For Science.gov 2.0, the Department of Energy (DOE) funded DWT's development of the new relevance-ranking technology and its application to metasearches of the Deep Web: the government databases that are unavailable to general Web search engines. Walter Warnick, director of DOE's Office of Scientific and Technical Information (OSTI), called it a first: "the first time relevance ranking has been applied to large collections of Federal R&D results."
Specifically for Science.gov 2.0, DWT developed and deployed QuickRank, which, as its name implies, quickly processes, sorts, ranks, and sifts through thousands of search results. DWT says QuickRank sets the bar higher, vastly outperforming other technologies. QuickRank is part of DWT's Distributed Explorit technology that does federated searching of multiple databases.
"The search technology behind Science.gov 2.0, QuickRank, is far superior to the technology we developed for the original December 2002 Science.gov release," explained Abe Lederman, QuickRank's creator and Deep Web Technologies' president and CEO. "We've learned a lot about how to do fast searches and bring back the most relevant documents in the last year and a half."
Searching with Science.gov 2.0 is certainly faster and easier than before. From the main page, users can run a simple metasearch across all resources or choose to explore sites by topic and then drill down. For example, selecting Health and Medicine and then choosing the narrower topic Alternative Medicine provides a list of six sites with direct links to them. This listing of selected federal scientific and technical Web sites is maintained by the National Technical Information Service (NTIS). The number of resources included in Science.gov has expanded from 1,000 in the first version to 1,700 in version 2.0.
From the advanced search page, users can search databases, Web sites, or both (by selecting all). In the first version, users first had to pick up to 10 databases from the list of resources, which was the limit for metasearching. They needed to know ahead of time which databases to search, and had to repeat the query against other databases to get a complete search. In version 2.0, the default is to search all databases.
For example, the Health and Medicine category selected from the advanced search page includes the following resources:
- Biologics Evaluation and Research—Blood, vaccines, therapeutics and related products information from the FDA
- ClinicalTrials.gov—Current information from NLM on clinical research studies
- Drug Evaluation and Research—Drug evaluation information from the FDA
- MEDLINEplus Health Information—Consumer health information from National Library of Medicine
- PubMed—Citations by the National Library of Medicine to the results of biomedical research
Of course, users can go directly to PubMed or other sites, but should find it an advantage to be able to select multiple resources for a single search, such as PubMed along with ClinicalTrials.gov and Drug Evaluation and Research, when checking for details on a new drug.
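For readers curious about what a single search across several selected resources involves, here is a minimal sketch in Python. It is not DWT's Distributed Explorit code, whose internals are not described here. The `metasearch` function, the `make_stub` helper, and the canned results are all hypothetical stand-ins, and no real agency search engine is contacted.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Optional

# A source is modeled as a function that takes a query and returns hits.
SearchFn = Callable[[str], List[str]]


def make_stub(name: str) -> SearchFn:
    """Build a hypothetical stand-in for one agency's own search engine."""
    def search(query: str) -> List[str]:
        return [f"{name} hit for '{query}'"]
    return search


# Assumed source names, taken from the Health and Medicine list above.
SOURCES: Dict[str, SearchFn] = {
    name: make_stub(name)
    for name in ("PubMed", "ClinicalTrials.gov", "Drug Evaluation and Research")
}


def metasearch(
    query: str,
    sources: Dict[str, SearchFn],
    selected: Optional[List[str]] = None,
) -> Dict[str, List[str]]:
    """Send one query to every selected source in parallel.

    When `selected` is None, all sources are searched, mirroring the
    version 2.0 default described in the article.
    """
    names = list(sources) if selected is None else selected
    with ThreadPoolExecutor(max_workers=max(1, len(names))) as pool:
        futures = {name: pool.submit(sources[name], query) for name in names}
        return {name: future.result() for name, future in futures.items()}


if __name__ == "__main__":
    for source, hits in metasearch("new drug", SOURCES).items():
        print(source, hits)
```

Results come back keyed by source, which is roughly the shape a "Results by Agency Source" display would need before any ranking is applied.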
Here's a quick summary of the new technical features of Science.gov 2.0:
- QuickRank Relevancy Ranking
- Advanced Search by subject
- Search ALL sources at once (as default search)
- A one-step query box search (as opposed to the previous two-step search boxes)
- A progress bar showing search status as it proceeds
- An option to display Results by Agency Source
- Expanding number of returns through a MORE button
When I ran some tests and got unexpected results, I talked with Lederman. He cautioned: "Keep in mind that each agency included has its own search engine and some are better than others. QuickRank filters and ranks these results." I asked why I got results listed by source but with no relevancy ranking indicated (a system of 1 to 4 stars). He explained that QuickRank doesn't operate if searches are too specific. QuickRank filtering is based on the placement of keywords: If a keyword is not in a prime location in the document, it's likely the result won't be ranked. He also said that Science.gov does phrase searching by default, even though I couldn't find details on this in the help documentation. Serious searchers will wish for more sophisticated features, such as limiting or sorting by date, but metasearching across diverse resources has inherent limitations.
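To make the idea of placement-based ranking concrete, here is a toy sketch in Python. It is an illustration only and not QuickRank's algorithm, which DWT has not detailed here. The `Result` type, the `star_rating` and `rank` functions, the choice of title and snippet as "prime locations," and the 80-character cutoff are all assumptions made for the example. The sketch treats the query as a phrase and leaves a result unranked when the phrase appears in none of the assumed locations.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Result:
    source: str   # agency resource that returned the hit
    title: str
    snippet: str


def star_rating(phrase: str, result: Result) -> Optional[int]:
    """Return 1 to 4 stars, or None when the phrase is in no assumed prime location."""
    needle = phrase.lower()
    title = result.title.lower()
    snippet = result.snippet.lower()
    if title.startswith(needle):
        return 4                      # phrase opens the title
    if needle in title:
        return 3                      # phrase appears elsewhere in the title
    position = snippet.find(needle)
    if position == -1:
        return None                   # not found: result stays unranked
    return 2 if position < 80 else 1  # early vs. late in the snippet


def rank(phrase: str, results: List[Result]) -> List[Tuple[Optional[int], Result]]:
    """Order rated results by stars (highest first), then append unranked ones."""
    scored = [(star_rating(phrase, r), r) for r in results]
    ranked = sorted(
        (pair for pair in scored if pair[0] is not None),
        key=lambda pair: pair[0],
        reverse=True,
    )
    unranked = [pair for pair in scored if pair[0] is None]
    return ranked + unranked
```

Under these assumptions, a very narrow query can easily match nowhere the scorer looks, which would leave results listed by source with no stars, much like the behavior I saw.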
Planning is already underway for Science.gov 3.0, which Warnick expects will be available in about a year. He said that 3.0 will offer improved sophistication in the relevancy ranking algorithm, improved Boolean capabilities, new fielded searching, and an alert service.
Hosted by the DOE's Office of Scientific and Technical Information, Science.gov is made possible through a collaboration of the Departments of Agriculture, Commerce, Defense, Education, Energy, Health and Human Services, and Interior, as well as the Environmental Protection Agency, the Government Printing Office, the National Aeronautics and Space Administration, and the National Science Foundation, with support from the National Archives and Records Administration.