Open Content Alliance Rises to the Challenge of Google Print
Posted On October 3, 2005
What a great idea! Why didn't we think of that? Google Print's ambitious effort to digitize the world's book literature has inspired others to launch their own. And, with the Google Print program caught in the snag of a copyright lawsuit, the sight of a relay race handoff keeps hope burning for a brighter digital future. The just-announced Open Content Alliance (OCA) creates an international network of academics, libraries, publishers, technology firms, and a major search engine competitor to Google, all working on a new mass book digitization initiative. The goal of the effort is to establish a flexible, open infrastructure for bringing large collections of digitized material onto the open Web. Permanently archived digital content, selected by librarians for its value, should offer a new model for collaborative library collection building, according to one OCA member. While openness will characterize the program's content, the OCA will also protect the rights of copyright holders.

OCA founding members include the Internet Archive; Yahoo! Search; Hewlett-Packard Labs; Adobe Systems; the University of California; the University of Toronto; the European Archive; the National Archives (U.K.); O'Reilly Media, Inc.; and Prelinger Archives. The Internet Archive, which is led by Brewster Kahle, will provide hosting and administrative services for a single, permanent repository. Adobe and Hewlett-Packard will provide technological and some financial support. Yahoo! Search will supply initial search engine access as well as technological support and some funding.

Content collections will cover a wide range of material, including digitized print and multimedia content ranging from fiction to children's books to engineering white papers. Some collections are already in place on the Internet Archive, such as the T-Space digital archives from the University of Toronto and other Canadian universities, which were built using MIT's DSpace repository software. Others are still being digitized; for example, the Classic American Fiction collection is being digitized at the University of California and the California Digital Library. According to Kahle, this collection should begin coming online within a month, with the whole collection expected to be online by the end of next year.

Even though Yahoo! Search has taken a leading role in the OCA, the fundamental principle behind the program is open accessibility. As material comes online, all search engines (and yes, that does include Google) will have access to the repository. Dave Mandelbrot, Yahoo!'s vice-president of search content, stated: "We are honored to participate in a program that helps further our vision of expanding all human knowledge by working with content creators to make their content available to a growing online audience." As collections roll out, Mandelbrot said, Yahoo! will ultimately add features that let people search across them collectively. It will also integrate the material with regular Yahoo! Search content.

Content available on the OCA Web site will be in PDF and other widely accepted formats. At the moment, that means DjVu, but the OCA is open to future changes and format developments, according to Kahle. The OCA has arranged for tools to help contributors digitize content. Adobe will supply online tools to help individuals or institutions create PDF documents for the system, and institutions seeking to conduct bulk digitization projects will be able to use Hewlett-Packard scanning equipment. A demonstration of that equipment is scheduled to take place at the Internet Archive in late October, around the same time as the Internet Librarian 2005 conference in Monterey, Calif.

Experience has shown that the most stubborn barriers to digitization often lie in bureaucratic politics and complex legal issues. The Open Content Alliance hopes to work through these problems and, according to Kahle, "establish mechanisms for sharing while meeting each institution's responsibility in opening content." Kahle described the organization's goals. "In essence, we want to get the rules right, to enable libraries to work with commercial sources, governments, etc., without having to hammer out separate agreements. Like the Open Source movement has done for software, we want the Open Content movement to do for institutions, to let them play a role. We want to define a way through the puzzle of who does what, to establish mechanisms for cooperation."

The OCA will only add material that is either in the public domain or authorized by its copyright holders. It encourages distributing copyrighted content under Creative Commons licenses, which offer a number of licensing models that encourage personal use, reuse, and flexible access to digital content. Initially, according to Kahle, the OCA content will be completely open access: available to all, with no password required. The OCA may post notices of specific requirements under Creative Commons licenses, but it will not police compliance.

With the legal controversy looming over Google Print, it is comforting to hear Sally Morris, chief executive of the Association of Learned and Professional Society Publishers (ALPSP), state: "We welcome the launch of the OCA because its approach respects the rights of publishers and other copyright owners … the OCA's model of allowing rights holders to control which of their works are opened up, when, and where they are hosted may encourage others." In that connection, one of the OCA's founding members is the relatively young publishing house O'Reilly Media, whose content is all in copyright.

When asked what will distinguish the OCA material from the Internet Archive's existing archives, such as the Wayback Machine's snapshots of the Web, Kahle said that the "Open Content Alliance will be more library-like, as opposed to an archive. Content will be more curated, more vetted by library staff. The OCA is trying to kick off with an end-user focus, as opposed to where the collections come from, but how it will evolve, we don't know yet."

Kahle believes that "bringing the treasures of our libraries and archives to a worldwide readership is in the interest of many organizations" and invites all "interested organizations to join the effort and help fulfill this digital dream."

University of California provost and senior vice president M. R. C. Greenwood saluted the OCA, saying: "The consortium will create a world-class, world-access library. We are delighted to be part of it, and [we] look forward to contributions from other universities and cultural institutions worldwide."

The OCA will be funded by its contributors and will also accept donations from global institutions, including governments, commercial entities, and philanthropies. At present, contributions go directly to the Internet Archive, a 501(c)(3) nonprofit organization. Other arrangements are also possible, such as three-way partnerships like the joint funding that Yahoo!, the Internet Archive, and the University of California provided for digitizing the Classic American Fiction collection. Individuals or institutions interested in joining the OCA should contact the Internet Archive or send a message to

The movement toward digitizing all content continues to spread. In June 2005, the European Commission adopted an initiative titled "i2010: European Information Society 2010," in which digital libraries were a flagship goal. On Sept. 30, 2005, at a meeting in Brussels, Belgium, the commission unveiled a strategy for making "Europe's written and audiovisual heritage available on the Internet." It presented a first set of European-level actions intended to feed into a digitization and preservation proposal to be presented in June 2006. (For more information, check out Europe's Information Society Thematic Portal.)

For Real?

Is the OCA for real? If it's successful, what will it mean to the libraries and librarians working with it? I interviewed Carole Moore, chief librarian at the University of Toronto, and Daniel Greenstein, associate vice president and university librarian at the California Digital Library. Interestingly, both saluted Google Print for getting the ball rolling. Discussing mass digitization projects, Greenstein said: "Google kicked us into gear. They woke us up." Moore said: "It is an idea whose time has come. Before, when it came to digitizing books, the world was not ready, but the world has changed. Google can't do it all. Other people have to contribute."

Moore was quite excited about working with the people sharing content in the OCA. Though the University of Toronto has been working with the Internet Archive for more than a year, "the OCA steps it up," said Moore. She also appreciated the improved scanning process the OCA offers through the Hewlett-Packard system. Greenstein was enthusiastic that the OCA does not claim to have all the answers right now and that it is setting up a place where publishers, academics, libraries, and other interested parties can get together and work out problems. He expects different players to enter the game as time goes by.

For Greenstein, probably the single most promising factor was that he now sees librarians tapping into collection budgets to fund digitization projects. Instead of treating digitization as an extra service that would probably be funded by grant money, librarians have begun seeing digitization and sharing with other institutions—and the world through the open Web—as a form of collaborative collection building. Greenstein said: "We must build off what we have done. These are problems we should solve with partners."

He also acknowledged that librarians' interest probably stems partly from weariness with the endless "renting of vendor backlists," which costs the University of California's library system big money. In a practical sense, said Greenstein, it comes down to "you can pay now or pay later, but paying later will cost more." According to Greenstein, "perpetual access and sharing with the world" is the ideal. As the proverb says, "Do good and do well."

Barbara Quint was senior editor of Online Searcher, co-editor of The Information Advisor’s Guide to Internet Research, and a columnist for Information Today.
