Scholar Citations—Google Moves into the Domain of Web of Science and Scopus
Nancy K. Herther
Posted On August 4, 2011
On July 20, 2011, Google formally launched Google Scholar Citations (GSC) to provide “a simple way for scholars to keep track of citations to their articles.” Describing this as “a new direction [for] us,” Google says GSC is “currently in limited launch with a small number of users,” although some researchers have been able to create profiles in the past week.
The announcement came the same day that Google co-founder and newly designated CEO Larry Page announced the closure of Google Labs in an effort to refocus corporate energies in ways more directly related to protecting shareholder value by “prioritizing our product efforts.”
Author profiling, arising from the need to better disambiguate researchers and to better find and connect relevant researchers, has become an increasingly hot product area in the past 2 years, attracting the interest of such powerhouses as Thomson Reuters (Web of Science), Elsevier (Scopus), and Microsoft. With Google, we now have the mother lode of scholarly citation data, across the entire range of disciplines, available for author profiling and more sophisticated analysis and relevance linking.
A Quick Look Under the Hood
Jane Tinkler of the London School of Economics and Political Science was able to create her profile early on, finding the system to be “easy to use and accurate. The quick verdict is that the new facility looks distinctly promising,” Tinkler notes. “When Citations is fully open, and other academics have also completed their profiles, it will be easy to link through to my co-authors. It will be interesting to see if this enables me to pick up some citation counts that have been only attributed to the lead author on a publication due to incorrect or shortened referencing.”
GSC is remarkable in many aspects:
1. Ease of Use. Once you decide to register, you click a button and follow the clear instructions. You enter basic information identifying yourself and Google Scholar presents you with a listing of works attributed to you.
2. User Control of Content. An Action menu allows you to add missing citations or to delete any item erroneously attributed to you. The system also includes an “Import” option. The citation metrics for your works are immediately calculated and automatically presented. “When you add a group of articles,” Google notes, “we'll also keep track of changes to this group as our search robots index the web. You can choose to have these changes automatically applied to your profile (recommended) or emailed to you for review.” Making your profile public is done by merely clicking on “My profile is public” from the options menu.
3. Metrics and Graphics Included. The system currently calculates two versions (for all works and for 'recent' works) of three key measures: the h-index, the largest number h such that h of the researcher’s publications have each been cited at least h times; the i10-index, the number of publications on the list with at least 10 citations; and the total number of citations to the author’s works in Google Scholar. There are many other potential measures (e-index, g-index, and w-index, to name three); however, Google has announced that it “would like to understand how these metrics are used before considering additional metrics.”
4. Integration with Other Systems. The Action menu also allows you to export citations and data from your profile by selecting all or individual records and choosing the “Export” option. The system allows you to select from three export options—BibTeX, RefMan, or EndNote.
5. Even for naïve users, this is clearly laid out. Users can easily re-sort the citations by clicking on Author/Title, Year, or Cited By, and mark and download records.
6. Perhaps best of all, this is free and built off the largest scholarly citation database on the planet.
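To make the three measures in item 3 concrete, here is a minimal Python sketch of how they could be computed from a list of per-paper citation counts. It follows the standard definitions of the h-index and i10-index and is not Google's own code; the function names and the sample counts are invented for illustration.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)  # most-cited paper first
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank      # the top `rank` papers all have >= rank citations
        else:
            break         # counts only get smaller from here on
    return h


def i10_index(citations):
    """Number of papers with at least 10 citations."""
    return sum(1 for count in citations if count >= 10)


def total_citations(citations):
    """Sum of citations across all papers in the profile."""
    return sum(citations)


# Hypothetical citation counts for one author's eight papers.
sample = [48, 33, 12, 10, 9, 4, 1, 0]

print("h-index:", h_index(sample))
print("i10-index:", i10_index(sample))
print("total citations:", total_citations(sample))
```

For the sample list, five papers have at least five citations each but there are not six with at least six, so the h-index is 5; four papers reach the 10-citation mark, so the i10-index is 4. A 'recent' version of each measure would apply the same functions to citation counts restricted to recent years.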
Once you opt in (which is little more than a click of a button), the Google Scholar search engine takes over, automatically aggregating new data on all of the works cited from the Google Scholar database, leaving authors with virtually no need to update their profiles by hand. Although an email address is required, Google notes that users can “rest assured, we will NOT display your email address on your public profiles. Nor will we sell it, trade it, or use it to send you email unrelated to Google Scholar.”
GSC’s process and presentation make it far and away the slickest system available today. The physical layout of the information is very similar to the newly released Microsoft Academic Search. Each of today’s systems offers some interesting features, such as helping authors clearly delineate their work (despite potentially ambiguous names) and creating links to the works of others as knowledge evolves over time.
GSC’s “layout is clear, easy to follow and the inclusion of the photo capability and the graph of cites by year are very nice touches,” Tinkler notes. “It would be easy to remove items that are incorrectly included and the notes pages highlight that it is possible to add references that should be linked to me but are not currently.” Google Scholar seems to have a clear advantage today since profiles are integrated easily into the Google Scholar search results pages, which makes them easily, freely available to anyone on the planet.
Google is taking this initial launch slowly and you may find it difficult to get into the system to create your own profile. If this happens, you will be directed to a sign-up page “where you can register to be notified when Google Scholar Citations is available to all users.” To see examples of what these profiles look like, do a quick search in Google Scholar on Margaret Mead, Albert Einstein, or Richard Feynman. Look down the list for the “Author profiles for....” link; click on it and you will be quickly and easily taken to that author’s Citation page.
“It’s surprisingly accurate,” notes genetics grad student James Schnable. “The only corrections I had to make to my profile were condensing two duplicates of existing papers which were listed with slightly different titles or author lists on different websites. Google Scholar didn’t miss a single one of my papers, nor did it include any of the papers published by other people what [sic] shared my name back in the 20th century, like so many other searches have.”
A Few Concerns
The system is dependent on the active involvement of researchers today, and there wouldn't appear to be the interest or ability at Google to fully index researchers of the past into the system. Until the past can catch up with the present, the value of this will be limited for serious citation research.
Another open question is whether Google intends to provide quality control for GSC, perhaps through some type of crowd-sourcing to clean up records and do some of this background work.
Anne-Wil Harzing, creator of Publish or Perish (which is based on Google Scholar data) and a management professor at the University of Melbourne, notes that some type of external checks would be required to maintain the integrity of the system. “Who would prevent authors from merging their publications with highly cited publications by other authors that they import into their record, just to boost their citation records? I was able to boost my citations by merging articles by Albert Einstein into my own articles very easily. Of course I undid all of this and my author profile is not public, but it shows you how easily the data can be manipulated. One would of course be able to spot this by drilling down into merged records, but many users would just look at the overall author profiles. As they look quite ‘official,’ one would not easily doubt their status.”
Expect Some Major Changes Ahead
Patrick Dunleavy, professor of Political Science and Public Policy at the London School of Economics and a long-time citation researcher, believes that “GSC will be very popular with academics for their CVs and will be heavily used for this purpose, and has a very clean and easy to us[e] feel... Some of this is achieved by a smart algorithm the Google engineers seem to have running to separate out sets of publications by people with similar names, that makes things look a lot easier than in fact they are.”
Google’s extension into citation research comes at a time when library budgets are dwindling and GSC would appear to be a potential alternative for citation analysis. It will be interesting to watch Google’s changing priorities and strategies over the coming years. Many non-profits have moved to adopting Google Docs, Google Mail, and other services to save costs in this era of austerity.
In the early days of the PC, frustrated with the pricing and market controls of IBM and Apple, many were thrilled to throw their support and buying power behind a relatively small, seemingly benign company operating out of Redmond, Wash. Microsoft went on to gain more power, and more control over and manipulation of markets, than IBM ever had in the PC arena. Let’s hope Google’s mantra of doing “no evil” will hold.
In a time of cutbacks and belt-tightening, Google seems to be offering a ray of hope to research libraries. “We’ve been doing our best to get them up to the importance of the Scholar offering (and potentially of Google Books also), especially for the social sciences and humanities,” Dunleavy notes. “We have been dialoguing with the very nice folks at Google who run Scholar for about a year now and I am confident from these interactions that Scholar is not a major part of the corporate Google effort...Google internal managers I know see Scholar as a sideline activity, not something they expect to generate revenue from, nor to link at this stage to their larger scale things like Gmail or Google Docs.”
The system is still in the early stages, but is clearly an essential citation tool for the future. Any research organization needs to follow this development closely—and researchers with significant publishing records should get in line to create their own profiles in what promises to be a key citation research tool of the 21st century.
A Sampling of Current Standard Author Identifier Systems
arXiv Author Identifiers
A highly-automated electronic archive and distribution server for research articles
“Since 2005 arXiv has used authority records that associate user accounts with articles authored by that user. These records support the endorsement system. The use of public author identifiers as a way to build services upon this data is new in 2009.”
Scopus (Reed Elsevier)
“The Author Search in Scopus allows you to locate a particular author simply by entering the author’s last name and an initial or first name and then click on Search. You will be presented with the preferred author name along with the variants of the name that have been grouped into an author profile. All results include the number of documents that an author has published. You can choose to display the results alphabetically or by document count.”
Author Profile Page
ACM Digital Library
“The Author Profile Page initially collects all the professional information known about authors from the publications record as known by the ACM bibliographic database…. Coverage of ACM publications is comprehensive from the 1950s. Coverage of other publishers generally starts in the mid 1980s. The Author Profile Page supplies a quick snapshot of an author’s contribution to the field and some rudimentary measures of influence upon it. Over time, the contents of the Author Profile page may expand at the direction of the community.”
Author Resolver
“Developed for publishers, Author Resolver is a web-based author information service that helps your readers instantly learn more about the authors in your collections. One click on an author’s name links to a concise profile that includes current affiliation, education, most recent publications, and a statement of expertise.”
DBLP Computer Science Bibliography
For some DBLP authors, additional information is stored that allows for tracking researchers working in this field or depositing their papers with this free online resource.
INSPIRE
A collaboration of CERN, Fermilab, and others
“INSPIRE represents a natural evolution of scholarly communication, built on successful community-based information systems, and provides a vision for information management in other fields of science.”
Microsoft Academic Search
From the Redmond, Wash. giant
Microsoft Research worked on similar projects until they were shut down; now this appears to be another Bing-based effort to cover similar ground. The system, labeled as “beta” on the homepage, is still under development; however, it included “27,167,877 publications and 13,804,717 authors” at deadline for this article, and the company claims to be adding 10,000 new citations each week.
OpenID
“OpenID is a decentralized standard, meaning it is not controlled by any one website or service provider. You control how much personal information you choose to share with websites that accept OpenIDs, and multiple OpenIDs can be used for different websites or purposes.”
ORCID: Open Researcher & Contributor ID
An independent, community effort
“Our goal is to resolve the systemic name ambiguity, by means of assigning unique identifiers linkable to an individual’s research output, to enhance the scientific discovery process and improve the efficiency of funding and collaboration.”
Publish or Perish
Developed by a college professor
“Publish or Perish is a software program that retrieves and analyzes academic citations. It uses Google Scholar to obtain the raw citations, then analyzes these and presents” a variety of statistics and calculations based on the Google metrics.
RePEc Author Service
A service of Research Papers in Economics, an open collaborative
“The RePEc Author Service aims to link economists with their research output in the RePEc bibliographic database. A research profile is built, showing all identified works. Author profile can be found through the description of any work the author claimed as his/hers.”
ResearcherID
“ResearcherID is a global, multi-disciplinary scholarly research community. With a unique identifier assigned to each author in ResearcherID, you can eliminate author misidentification and view an author’s citation metrics instantly. Search the registry to find collaborators, review publication lists and explore how research is used around the world.”
VIVO
Various universities with NIH backing
“VIVO in a nutshell, is a web application, provides a semantic database that can be searched that contains information about scientists, their research interests, affiliations, publications, grants, etc. VIVO is in deployment at several universities and is being expanded through a $12.2m stimulus grant from the National Center for Research Resources (NCRR) of the National Institutes of Health (NIH).”