If we all—librarians, readers, writers, publishers, etc.—pulled together, could we create an online library that included every book, every journal, every instance of every type of content a traditional library might contain? Even if we failed to reach that grand goal—almost as grand a goal as Google’s vaunted mission ("to organize the world’s information and make it universally accessible and useful")—we could make something wonderful in the attempt. The beta launch of the Open Library sets just such a goal and offers a set of tools and model content to build around. Currently, the project is only reachable through a demo version (http://demo.openlibrary.org) built around the site owned by Internet Archive (www.archive.org) and the Open Content Alliance (www.opencontentalliance.org), but project leader Aaron Swartz assured me that a pointer would shortly move users to a revised main site (http://openlibrary.org).

According to Swartz, current content on the beta of the service includes some 100,000 Open Content Alliance public domain books, 7 million cataloging records from the Library of Congress, and another 7 million from publishers of in-print books. He stated that the two sets of metadata had little overlap.
Unlike the Open Content Alliance, which focuses on books (for background, see http://newsbreaks.infotoday.com/nbReader.asp?ArticleId=16091), the Open Library has a section for full-text journals. The Guided Tour takes you to a sample of periodical holdings (an article by Edward O. Wilson entitled "Slavery in Ants"). You can browse the article in HTML text or download it with images as a PDF file. However, the entry fails to identify where the article was published. (It took digging back into the Web site supplying the full-text publication to discover a listing for Scientific American.) Swartz explained that they were still experimenting with nonbook/noncard-catalog content and advised steering clear of the periodical content for now.
So far, the project has concentrated on building the technological infrastructure, open source tools that others can use to contribute to the emerging service. A wiki lets users enter structured data, a search engine accesses the content, and a new database can handle millions of dynamic records. The new infrastructure program, called ThingDB, is built on top of PostgreSQL. Publishers have been contacted to contribute data. Public domain books already digitized and made searchable in the Open Content Alliance program are being moved into the service. Links to libraries and bookstores are under construction. Plans are underway to add a print-on-demand feature as well as a scan-on-demand for out-of-copyright books. Interested parties are offered a number of ways to contribute using open forums and open source code.
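ThingDB's internals are not detailed here, but one common way to make a relational engine like PostgreSQL handle millions of records with no fixed schema is an entity-attribute-value layout, in which each "thing" is an ID and its properties are rows in a key-value table. The sketch below is purely illustrative—not ThingDB's actual design—and uses Python's built-in sqlite3 module in place of PostgreSQL; the table and function names are invented for the example.

```python
import sqlite3

# Illustrative entity-attribute-value layout: each "thing" is just an id,
# and its properties live as rows in a separate key-value table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE thing (id INTEGER PRIMARY KEY, type TEXT);
    CREATE TABLE property (
        thing_id INTEGER REFERENCES thing(id),
        key TEXT,
        value TEXT
    );
    CREATE INDEX idx_prop ON property (key, value);
""")

def create_thing(type_, **props):
    """Insert a record with arbitrary properties -- no fixed schema."""
    cur = conn.execute("INSERT INTO thing (type) VALUES (?)", (type_,))
    thing_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO property (thing_id, key, value) VALUES (?, ?, ?)",
        [(thing_id, k, v) for k, v in props.items()],
    )
    return thing_id

def find_things(key, value):
    """Return the ids of every thing with a given property value."""
    rows = conn.execute(
        "SELECT thing_id FROM property WHERE key = ? AND value = ?",
        (key, value),
    )
    return [r[0] for r in rows]

book = create_thing("edition", title="Slavery in Ants",
                    author="Edward O. Wilson")
print(find_things("author", "Edward O. Wilson"))  # -> [1]
```

The appeal of this kind of layout for a collaborative catalog is that contributors can attach new fields to a record—say, a translator or a scan date—without anyone altering the database schema first.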
The "small and dedicated team" behind the project consists of overseer Brewster Kahle, head of the Internet Archive; Swartz as project leader; Alexis Rossi, manager; three programmers; and a designer. The current dominance of techies (aka "propeller heads") on the team may explain the documentation’s casually exuberant expectations of user participation: "The beauty of this is in its simplicity: anyone willing to learn a simple formatting language (see WikiLanguage Documentation) can edit the catalog, and anyone with a knowledge of CSS and HTML can build their own templates. By opening up the system for this type of involvement, we are creating a
‘low-barrier enrichment system’ that encourages participation from a broad base of users. …" But the system is still designed to serve those who come to it without such technical background.
Thorough descriptions of the technology behind the service and how to use it are available (http://demo.openlibrary.org/dev/docs/ui; http://demo.openlibrary.org/dev/docs). The software uses a flexible templating system built on a structured wiki (infogami), where users can not only add content but also modify existing content templates and introduce new ones. It supports tags, user-defined collections, and shared, collaborative ratings and comments. Simply by taking an action on the site, users automatically get an account without having to register explicitly.
The search engine, "powered by Solr" (http://lucene.apache.org/solr), uses faceted navigation for sophisticated browsing. Along with folksonomy tagging, users can also set up private or public booklists. Groups can set up their own wiki pages as closed, invitation-only, or public sites. Librarians can create OPAC catalogs using hosted Library of Congress catalog records branded to specific collections with a system called futurelib (http://demo.openlibrary.org/about/lib). Books available for downloading will appear in a variety of formats, including PDF, DjVu, XML, and full text. The Book Viewer follows the "page turning" model used on the Internet Archive’s parent site (www.archive.org).
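Solr exposes faceting through parameters on its standard select handler: `q` carries the query, `facet=true` turns faceting on, and `facet.field` (repeatable) names each field whose value counts should be returned alongside the results. The sketch below just builds such a request URL; the host, core name, and field names are placeholders, not the Open Library's actual configuration.

```python
from urllib.parse import urlencode

def faceted_search_url(base_url, query, facet_fields):
    """Build a Solr select URL asking for facet counts alongside results.
    q, wt, facet, and facet.field are standard Solr parameters; the
    base URL and field names here are illustrative placeholders."""
    params = [
        ("q", query),
        ("wt", "json"),
        ("facet", "true"),
    ]
    # Solr accepts facet.field repeatedly, once per field to facet on.
    params.extend(("facet.field", f) for f in facet_fields)
    return base_url + "/select?" + urlencode(params)

url = faceted_search_url(
    "http://localhost:8983/solr/openlibrary",
    "ants",
    ["subject", "author"],
)
print(url)
# -> http://localhost:8983/solr/openlibrary/select?q=ants&wt=json&facet=true&facet.field=subject&facet.field=author
```

The response would then include a count for each subject and author value found in the matching books, which is what lets an interface offer "narrow by subject" links next to the result list.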
It’s definitely early days yet for the ambitious project. Funding is currently supplied by the Internet Archive and volunteers. According to Swartz, "We are trying for grants from the California Library Association and the Institute of Museum and Library Services." The project is not taking ads now. "Maybe in the future," said Swartz, "but we’re a not-for-profit now." Current plans have a full-scale launch scheduled for October. For now, Swartz pointed out, "The big thing is to improve the quality of the metadata. It’s spotty now. We want to handle multiple editions of one book, build more links to libraries and other sites, set up print-on-demand and scan-on-demand. We’re still working out the details."
What they seem to need most is more user participation, both "real-people" readers and techies who can improve and increase the templates. For those interested in building a Universal Library on the Web, the Open Library project appears to be calling, "All hands on deck!"