Thomson ISI to Track Web-Based Scholarship with NEC's CiteSeer
Posted On March 1, 2004
With the Open Access movement bringing Web-based scholarship to increased prominence, leading A&I services that have long provided the access tools to identify scholarship face new challenges. Thomson ISI (http://www.isinet.com), a longtime leader in netting scholarship, primarily through citation patterns, has launched a new initiative to handle this problem. It will collaborate with NEC Laboratories America (http://www.nec-labs.com) to create a comprehensive, multidisciplinary citation index for Web-based scholarly resources. Due out in early 2005, the new Web Citation Index will tap a number of technologies developed by NEC, primarily the "autonomous citation indexing" tools of NEC's CiteSeer software. CiteSeer (http://citeseer.nj.nec.com/cs) has been highly praised for its strength at monitoring and connecting research in the computer science literature.
James Pringle, vice president for development at Thomson ISI, pointed out: "Today, we are seeing impressive shifts in the nature of scholarly communications. During this transformation, our mission remains the same: to provide researchers with access to the highest-quality content available, no matter what medium or business model supports it." Pringle broke down Open Access research into three categories: open access journals, which often maintain a traditional business model; open archive, which usually receives institutional support; and self-archiving, which often includes items originally published through more traditional sources. Thomson ISI hopes to work through the problems involved in tapping the different categories for high-quality research.
NEC CiteSeer is a scientific literature digital library that packages a group of algorithms, techniques, and software. It can handle PostScript and Adobe PDF research articles found on the Web as well as HTML. The CiteSeer technology includes extraction of bibliographic citations, autonomous citation indexing, calculating citation statistics and related documents, reference linking to cited articles, citation context display, automatic notification based on user profiles, correlation of related documents, full-text indexing, query-sensitive summaries of the context of search terms in an article, citation graph analysis, and targeted Web crawling.
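To make the idea of citation indexing concrete, here is a minimal Python sketch of the general technique: normalize raw reference strings, group the ones that appear to point to the same work, and count how many documents cite each. It is not CiteSeer's actual algorithm. The function names, the normalization heuristic, and the sample data are all assumptions made for illustration, and a real system would need far more robust citation matching.

```python
from __future__ import annotations

import re
from collections import defaultdict


def normalize(citation: str) -> str:
    """Reduce a raw reference string to a crude matching key.

    Illustrative heuristic only: lowercase, strip punctuation,
    collapse whitespace. Real citation matching is much harder.
    """
    text = citation.lower()
    text = re.sub(r"[^a-z0-9 ]+", " ", text)  # replace punctuation with spaces
    return " ".join(text.split())             # collapse runs of whitespace


def build_citation_index(documents: dict[str, list[str]]) -> dict[str, set[str]]:
    """Map each normalized cited work to the set of documents that cite it."""
    index: defaultdict[str, set[str]] = defaultdict(set)
    for doc_id, references in documents.items():
        for ref in references:
            index[normalize(ref)].add(doc_id)
    return dict(index)


def citation_counts(index: dict[str, set[str]]) -> dict[str, int]:
    """Count the distinct citing documents for each cited work."""
    return {work: len(citers) for work, citers in index.items()}


# Hypothetical sample data: two papers whose reference lists format
# the same cited work differently.
documents = {
    "paper_a": [
        "Smith, J. (1999). Web Indexing.",
        "Doe, R. (2001). Crawling the Web.",
    ],
    "paper_b": ["Smith J (1999) Web indexing"],
}

index = build_citation_index(documents)
counts = citation_counts(index)
for work, n in sorted(counts.items()):
    print(f"{n} citing document(s): {work}")
```

In this toy data, the two differently punctuated references to the Smith paper collapse to the same key, so that work is credited with two citing documents. The inverted mapping from cited work to citing documents is also what makes features such as reference linking and citation statistics possible.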
Basically, Thomson ISI plans to use its broad base of editorial expertise to expand the coverage NEC CiteSeer currently offers in selected fields to the all-embracing view of scholarship long espoused by ISI. The citation indexes from ISI cover half a century of science and technology and decades of social sciences, arts, and humanities. Robert Millstein, president of NEC Laboratories America, said, "Our collaboration with Thomson will enable us to significantly improve and enhance the NEC CiteSeer service for the research community."
Thomson ISI clearly intends to move carefully in this new area. It plans to operate a pilot project throughout 2004, gathering feedback from the scientific and scholarly community, and open full access to the new index and content sets in early 2005. Pringle said that they would be "looking at issues field by field" and examining "the context for our academic, corporate, and business customers." When operational, the product will run as a segment within ISI's Web of Knowledge, the subsidiary service to ISI's high-priced Web of Science, licensed primarily to major research libraries and institutions. The Web Citation Index will expand beyond the narrow definition of Web journals that currently limits ISI coverage to 21 open access journals based on quality assessments and a format modeled on print. The new Index will include preprints, proceedings, technical reports, and other open access research sources.
Pricing policies for the new information source have not been set. However, it appears that some of the information will be available for free. Thomson already offers free access to ISIHighlyCited.com, which identifies the most cited and influential scientific authors. Users of the ISI Web of Knowledge can simultaneously search scholarly resources on the open Web and other proprietary databases through a single interface. Christopher Toelg, director of business development for CiteSeer at NEC Laboratories America, also expected current CiteSeer-based free services to continue, although he said that it "will probably depend on what the subscription-based product looks like."
Most of the NEC researchers who developed the CiteSeer/ResearchIndex technology have moved on to other positions. Lee Giles, currently at the eBusiness Research Center at Penn State University's Smeal College of Business, applies the CiteSeer technology to create other citation-indexed Web tools, such as eBizSearch (http://www.ebizsearch.org), covering electronic commerce. This particular use of CiteSeer technology shows what it can do working at full power. The site crawls a broad range of Web sites, including universities, commercial institutions, government agencies, research institutes, etc., and catalogs academic articles, working papers, consultant reports, magazine articles, statistics, etc. Giles has a noncommercial license for CiteSeer that guarantees his access for the length of copyright, a guarantee that would corroborate Toelg's expectation that alternative uses of CiteSeer will remain free on the Web.
When asked whether Thomson ISI planned to archive any of the often-evanescent research on the Web, Pringle indicated that the company doesn't see archiving as its role. "Our role is navigation, building access tools," said Pringle. Since Thomson ISI is a subset of Thomson Scientific and Healthcare, I also asked Pringle if any plans were underway to expand use of NEC software or other text-mining tools to assist other subsidiaries, such as Derwent or the newly acquired BIOSIS. He said it was too early to comment at this point, but appeared to enjoy the question.