Look behind the screen at many major federal government portals, such as Science.gov, FirstGov.gov, and even GPO Access, and you will often find the Department of Energy's Office of Scientific and Technical Information (http://www.osti.gov) playing a leading role. Recently OSTI expanded its collection of government contract databases (http://www.osti.gov/fedrnd) to include some half a million summaries of R&D projects supplied by DOE itself and five other federal agencies. Three sources are brand new: Small Business Administration (SBA) awards; Department of Agriculture (USDA) ongoing and recently completed research projects in agriculture, food and nutrition, and forestry; and snapshots of current work at the Environmental Protection Agency (EPA). OSTI's Federal R&D Project Summaries Database also incorporates DOE records of ongoing and recently completed projects, as well as those from the National Institutes of Health (NIH) and the National Science Foundation (NSF).
OSTI uses a metasearch technology developed by tiny Deep Web Technologies (http://www.deepwebtech.com) to perform searches across databases located on different agency host sites without requiring the searcher to enter multiple queries. Results from the sequence of searches are merged and ranked by relevancy. Searchers can also choose to specify individual sources. Suppliers of data to the OSTI service are SBA Technology Resources Network (TECH-Net), USDA Current Research Information System (CRIS), EPA Science Inventory, NIH Computer Retrieval of Information on Scientific Projects (CRISP), and NSF Awards Database.
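The article does not describe Deep Web Technologies' internals, so what follows is only a minimal Python sketch of the general metasearch pattern described above: send one query to several sources, merge the returned hits, and rank them by relevance, with the option of restricting the search to individual sources. It assumes each source reports a relevance score on a common 0-to-1 scale; in practice, scores from different systems are rarely comparable without normalization. The source labels echo the article, but the records, the `make_source` helper, and the term-overlap scoring are hypothetical stand-ins for the real agency systems.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Hit:
    source: str
    title: str
    score: float  # assumption: every source reports relevance on a 0..1 scale


def make_source(name, records):
    """Hypothetical stand-in for one agency's search back end."""
    def search(query):
        terms = query.lower().split()
        hits = []
        for title in records:
            text = title.lower()
            matched = sum(1 for t in terms if t in text)
            if matched:
                # Toy relevance: fraction of query terms found in the title.
                hits.append(Hit(name, title, matched / len(terms)))
        return hits
    return search


# Invented sample records; real sources would be remote databases.
SOURCES = {
    "CRIS": make_source("CRIS", ["Soil carbon in forestry plots",
                                 "Food safety pathogen detection"]),
    "CRISP": make_source("CRISP", ["Pathogen genomics for infectious disease"]),
    "NSF": make_source("NSF", ["Carbon nanotube synthesis",
                               "Forestry remote sensing"]),
}


def metasearch(query, sources=None):
    """Send one query to every selected source, then merge and rank."""
    selected = SOURCES if sources is None else {k: SOURCES[k] for k in sources}
    # Query the sources concurrently so the slowest one bounds the wait.
    with ThreadPoolExecutor() as pool:
        result_lists = pool.map(lambda search: search(query), selected.values())
        merged = [hit for hits in result_lists for hit in hits]
    # Stable sort: ties keep the order in which sources were listed.
    return sorted(merged, key=lambda h: h.score, reverse=True)


if __name__ == "__main__":
    for hit in metasearch("carbon forestry"):
        print(f"{hit.score:.2f}  {hit.source:6} {hit.title}")
```

Passing `sources=["NSF"]` mimics the option of searching an individual source; leaving it out searches every source with a single query.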
Currently the public can also access this research tool through GPO Access. OSTI has a partnering relationship with the Government Printing Office. In conversation, Walter Warnick, OSTI's director, made it clear that partnering and facilitating cross-agency data services are basic to OSTI's concept of service to citizens. The office also supports GrayLIT, a search engine that covers report literature from DOD, EPA, NASA, and DOE, as well as Science.gov, a cross-agency portal connected to the FirstGov.gov "mother" portal for federal data. Warnick said that OSTI has been "pioneering closer collaboration so GPO Access' patrons can have access to the tools as well. The data on GPO Access looks like other GPO tools, but, in fact, the search features only reside at OSTI." Warnick is also a member of the GPO depository libraries council.
OSTI has already contacted the chair of the committee that controls data additions to Science.gov, proposing a link to the Federal R&D Project Summaries collection. Warnick hopes the connection will be completed by early October. (For background on Science.gov, read Paula Hane's NewsBreak, "Science.gov 2.0 Launches with New Relevance Ranking Technology," http://newsbreaks.infotoday.com/nbreader.asp?ArticleID=16445.)
These are tough times for federal information services. As Warnick put it, "I would be the last to suggest that we feel flush with our budget. It is a constant struggle. It would be hard to find a government organization that has suffered more cuts than we have since the mid-1990s." (Readers may remember that PubSCIENCE, an OSTI initiative to bring the PubMed model to the physical sciences literature, was discontinued in 2002. See Marydee Ojala's NewsBreak, "PubSCIENCE Joins the Endangered Species List," http://newsbreaks.infotoday.com/nbreader.asp?ArticleID=17110.)
Lean-and-keen seems to be Warnick's strategy. Comparing his task in leading OSTI with Bruce James' reforms at GPO, Warnick pointed out that both agencies have strengths and weaknesses in their drive to employ 21st-century technology. "Because GPO is so big and has such an enormous constituency, including the library community, it may have difficulty in moving fast and making quick decisions because of all its stakeholders. We do too, but our community is of a different nature. DOE researchers are all sophisticated and hungry for information products. We also have close relationships, which means we can move more quickly and more aggressively in employing new things. For example, we pioneered in working with Yahoo! and Google. All of what DOE owns, e.g., on GrayLIT, is searchable through Yahoo! as well as our project summaries and bibliographic information going back to the Manhattan Project! It's in full text after 1995. We have made progress with Google as well, but the problem there is the ranking doesn't apply well to government information."
Warnick is particularly proud of the metasearch technology from Deep Web Technologies. "We were the first to pioneer it in 1999," said Warnick. "It is still very powerful, very cheap, and provides tremendous advantages for integrating data. It can offer different levels of sophistication from simple term searches like Google to finding any documents in a database. The more sophisticated versions can reach the full capability of using fields, assuming each database has a field structure, e.g., author in the project summaries. Metasearch can reach disparate databases while placing a negligible burden on creators, increasing Web traffic, and bringing aggregated search results to users with little or no delay." The software is in use in several other DOE offices besides OSTI.
Whatever works to get data to the public seems to fit Warnick's and OSTI's view of their service mission. "We want to bridge from Open Web-type searching to sophisticated database searching. We will expose our data through Google and Yahoo! as well as through metasearching of bigger and bigger aggregations, so patrons don't have to identify and use sources one at a time." Once Warnick gets the links to the expanded R&D summaries onto Science.gov, can FirstGov.gov be far behind?