OCLC Recommends ODC-BY for WorldCat Data
Posted On August 16, 2012
In another step toward increased openness and in recognition of the growing developments in linked data in libraries, OCLC announced on Aug. 6, 2012 that it is recommending use of the Open Data Commons Attribution License (ODC-BY) for member libraries that are making their bibliographic data available for use by others. ODC-BY is designed to “allow users to freely share, modify, and use” a database while giving attribution to the source of the data. The license does not place any restrictions on use, including commercial use, beyond the attribution requirement. The decision follows a resolution by the OCLC Global Council in April 2012, and the adoption of that recommendation by the OCLC Board of Trustees. This is the first open licensing of full WorldCat bibliographic data, and it places no restrictions on fields or formats. The license can be applied to databases of MARC21 cataloging records.
In its documentation on the use of ODC-BY for WorldCat record reuse, OCLC suggests that the ODC-BY license be accompanied by a reference to the “community norms” that should guide usage of the data in accordance with the WorldCat Rights and Responsibilities guidelines. As the norms and responsibilities are outside of the ODC-BY license itself, the use of ODC-BY emphasizes the voluntary nature of the member agreement regarding the responsibilities entailed in record reuse.
In April 2012, the Harvard University Libraries (HUL) made available a dataset of 12 million bibliographic records in MARC21 format using the Creative Commons license CC0, which releases the data into the public domain with no restrictions on use and no requirement of attribution. The dataset includes an unstated number of WorldCat records. Although attribution is not required by the license, the HUL Bibliographic Dataset Use Terms request that attribution be given to Harvard University, to OCLC, and to the Library of Congress, as sources of the released records. The use terms also request that uses of the records respect the OCLC community norms.
While OCLC preferred that Harvard use the ODC-BY license, Harvard chose the CC0 license because “We preferred an attribution request over a requirement because we could imagine applications ... in which tracking and maintaining the attribution information could become onerous, and we wanted to provide the data in the most open manner possible,” according to Stuart Shieber, Welch Professor of Computer Science and director of the Harvard Library Office for Scholarly Communication. From a practical perspective, the open terms of CC0 make downstream reuse easier for application developers because there is no need to keep track of data provenance when creating data mashups. Recognizing that public domain licensing is not always possible, the Principles on Open Bibliographic Data of the Open Knowledge Foundation recommend “Where possible, explicitly place bibliographic data in the Public Domain via PDDL or CC0” to promote maximum reuse of the data.
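The bookkeeping that Shieber describes can be illustrated with a minimal sketch. It assumes a hypothetical mashup in which every record has to carry its source and license so that attribution can be produced later. The `Record` class, the helper functions, and the library names are illustrative assumptions, not part of any OCLC or Harvard tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """A hypothetical, simplified bibliographic record."""
    title: str
    source: str   # organization that supplied the record
    license: str  # "ODC-BY" (attribution required) or "CC0" (none required)


def merge(*datasets):
    """Combine datasets into one list, keeping each record's source and license."""
    return [record for dataset in datasets for record in dataset]


def required_attributions(records):
    """Return the sorted sources that must be credited (ODC-BY records only)."""
    return sorted({r.source for r in records if r.license == "ODC-BY"})


# Two made-up datasets under different terms.
library_a = [Record("Moby Dick", "Example Library A", "ODC-BY")]
library_b = [Record("Walden", "Example Library B", "CC0")]

mashup = merge(library_a, library_b)
print(required_attributions(mashup))
```

In this sketch only the ODC-BY source has to be tracked through the merge. A mashup built entirely from CC0 data could drop the `source` and `license` fields without violating any license terms, which is the practical advantage described above.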
The difference between the two licenses, both being used with terms requesting that re-use respect the community norms, is not great, however. In his Hanging Together blog post, Jim Michalko concludes that Harvard’s release of its data under CC0 would probably not be considered “bad acting and a risk to the long-term viability and sustainability of WorldCat,” particularly because the HUL terms give a “prominent nod to OCLC.”
Libraries are not alone in wanting to include attribution with data use. Desire for information about the source and version of consumable data has led the W3C Semantic Web community to work on standards to express “provenance” in data online. As in the area of scholarly citation, knowing the source of one’s data serves both to give “credit where credit is due” and to give the consumer of the data information for judging the authoritativeness and estimating the reliability of the data. It is likely that some form of attribution will become a web community norm when the technology facilitates sharing that information with the data itself rather than in a separate license.
OCLC has previously applied the ODC-BY license to other products. These include the Virtual International Authority File (VIAF), the FAST (Faceted Application of Subject Terminology) authority file, and the recently announced schema.org microformat data that is now present in all WorldCat record displays. Related to the latter, 1.2 million of the most widely held resources in WorldCat are available as a downloadable file using the same linked data format. These services are all related to linked data activities at OCLC, and OCLC expects to use the ODC-BY license for future linked data projects. Schema.org is a standard developed by Google, Microsoft, and Yahoo! to provide a way for website developers to include metadata in webpages that will be understood by search engines. The OCLC implementation of schema.org as linked data exposes the publicly available WorldCat data elements to search engines and to linked data applications.
OCLC gives guidance for data attribution that covers a number of technologies and situations.
Karen Coyle is a librarian and a consultant in the area of digital libraries. She worked for more than 20 years at the University of California in the California Digital Library, has served on library and information standards committees, and has written frequently on technical topics ranging from metadata development, technology management, and system design to policy areas such as copyright and privacy. As a consultant, she has designed privacy audits for libraries, developed metadata for rights statements, written dozens of articles on libraries and technology, and is currently stepping into the semantic web.