Publishers Try—Again—to Reach the Open Web: ACAP
by Barbara Quint
Posted On February 14, 2008
A new open source protocol has launched following a yearlong pilot test. Basically, Automated Content Access Protocol (ACAP, www.the-acap.org) allows publishers to make their content visible and distributable over the open web, while attaching access policies. Technically, applying ACAP involves either altering the standard robots.txt file or embedding content permissions into HTML content itself. Currently, Rightscom (www.rightscom.com), a consultancy founded in 2000 specializing in intellectual property rights and networked digital content, manages the ACAP effort, but this may change to a standards organization later this year. So far, of the major search engines, only Exalead (www.exalead.com/search) has agreed to accept ACAP, but Mark Bide, ACAP project director, is optimistic. "We are working through many different channels to promote and encourage adoption, particularly the publishing trade associations who have been instrumental in getting us to where we are today. For a project that started only 14 months ago, we have already made a substantial impact."

The protocol began as a joint initiative by the World Association of Newspapers (WAN; www.wan-press.org), the International Publishers Association (IPA; www.internationalpublishers.org), and the European Publishers Council (EPC; www.epceurope.org). The current list of ACAP members (www.the-acap.org/members.php) numbers 43 organizations worldwide. Most of the membership consists of organizations representing or serving publishers, such as the Association of American Publishers (AAP), the Copyright Clearance Center (CCC), the International Association of Scientific, Technical & Medical Publishers (STM), the International DOI Foundation (IDF), and the Recording Industry Association of America (RIAA); the rest are content owners themselves, including the Associated Press (AP), Bowker, Random House Group, Reuters, Scholastic, and Wolters Kluwer. New members will pay an annual subscription fee of $2,200 (€1,500) to gain "the right to propose new use cases to work on with the ACAP technical team and to follow the project at close hand."

The ACAP Pilot Project began in November 2006 and completed version 1.0 in November 2007. As part of the pilot project testing and developing ACAP, MPS Technologies (www.mpstechnologies.com), a Macmillan Group company, applied ACAP to its eBook platform, BookStore. Other participants in the pilot included: Agence France-Presse, De Persgroep, Impresa, Independent News & Media PLC, John Wiley & Sons, Macmillan/Holtzbrinck, Media 24, Reed Elsevier, Sanoma Corp., The British Library, and Exalead. Since the pilot project, the most significant application of ACAP is its introduction onto the London Times site, TimesOnline (www.timesonline.co.uk). You can even see the tiny "ACAP Enabled" logo at the bottom of the screen.

Currently, ACAP’s v1.0 focuses on "policies relating to the usages made by search engine operators of publishers’ content, not with end users," according to Bide. However, it is expected to grow into a more sophisticated tool for enabling content permissions. Bide points out, "We are also being asked to extend the protocol in a number of directions. The new use cases could extend attached policy expressions at individual non-text files (e.g., graphics and audio); other use cases to work on would involve syndicated content and how to communicate those policies, and still others would explore standards to support extended online commissioning, e.g., if someone wants to get permission to use multiple copies or different ways of posting and distribution." He emphasized, however, that ACAP was meant only to provide a machine-to-machine protocol, not to become a service. He expected third parties to supply assistance as well. "We anticipate involving CMS [content management system] vendors more closely in the next round of Use Cases, but they too need to see a significant groundswell of interest from their customer base before they are likely fully to engage with the requirement."

When asked how "granular" ACAP could become, whether it could carry rules for groups of publications, single publications, type of content within a publication (e.g., pictures in a newspaper article), groups of articles, single articles, etc., Bide responded, "In v1.0, ACAP follows the approach of the Robots Exclusion Protocol—setting policies for complete websites, for parts of a website directory tree, or for individual HTML pages (using metatags). There is work currently in progress to extend this to (for example) individual non-text files (graphics, audio, audiovisual).

"Part of the challenge in following the Robots Exclusion Protocol is that policies need to be directly associated with the content to which they relate. This is again a matter of meeting technical constraints imposed on ACAP by the way in which search technology operates. It is apparently not possible for the search engine crawlers currently to deal with any type of indirection. This makes it hard to set up rules relating to classes of content for this particular application of ACAP (since it requires too much interpretation)."
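To make the granularity Bide describes concrete, here is a minimal Python sketch of how the plain Robots Exclusion Protocol scopes rules to a whole site, a directory tree, or a single page. It uses only the standard library's urllib.robotparser and does not implement ACAP itself; the robots.txt content, the host publisher.example, and the crawler name ExampleBot are all hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an imaginary publisher site.
# Each rule is tied directly to a path, which is the "no indirection"
# constraint mentioned in the article.
ROBOTS_TXT = """\
User-agent: *
Disallow: /archive/
Disallow: /news/2008/embargoed.html
"""

BASE = "https://publisher.example"  # hypothetical host


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def allowed(parser: RobotFileParser, path: str, agent: str = "ExampleBot") -> bool:
    """Return True if the given crawler may fetch the given path."""
    return parser.can_fetch(agent, BASE + path)


if __name__ == "__main__":
    parser = build_parser(ROBOTS_TXT)
    for path in (
        "/index.html",                # not covered by any rule
        "/archive/1999/story.html",   # inside a blocked directory tree
        "/news/2008/embargoed.html",  # a single blocked page
        "/news/2008/open.html",       # sibling page, not blocked
    ):
        print(path, allowed(parser, path))
```

Because every rule is a literal path prefix, there is no way in this syntax to say "all photographs" or "all wire stories" unless those items happen to share a directory, which is the difficulty with classes of content that Bide points to.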

Full procedures for employing ACAP appear in the ACAP Technical Framework documentation. Though the Copyright Clearance Center has not fully endorsed ACAP, its CopyrightLabs offers an ACAP Validator software tool for publishers interested in applying the protocol. The description of the tool (http://copyrightlabs.com/acap-validator) can also serve as a mini-tutorial on ACAP.
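As a rough illustration of what "altering the standard robots.txt file" can mean in practice, the Python sketch below separates conventional robots.txt fields from ACAP-prefixed ones. The sample file and its field names (ACAP-crawler, ACAP-allow-crawl, ACAP-disallow-crawl) are assumptions made for illustration and are not taken from this article; the ACAP Technical Framework remains the authority on the actual syntax. The function split_fields is a hypothetical helper, not part of any ACAP tool.

```python
# Hypothetical robots.txt mixing conventional fields with ACAP-style ones.
# The ACAP-* field names below are illustrative assumptions only.
SAMPLE = """\
User-agent: *
Disallow: /private/
ACAP-crawler: *
ACAP-disallow-crawl: /private/
ACAP-allow-crawl: /public/
"""


def split_fields(text, prefix="ACAP-"):
    """Split 'field: value' lines into (conventional, extended) lists.

    A line counts as extended when its field name starts with the prefix,
    compared case-insensitively. Comments and blank lines are skipped.
    """
    conventional, extended = [], []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line or ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        if field.upper().startswith(prefix.upper()):
            extended.append((field, value))
        else:
            conventional.append((field, value))
    return conventional, extended


if __name__ == "__main__":
    plain, acap = split_fields(SAMPLE)
    print("conventional:", plain)
    print("extended:", acap)
```

A crawler that does not recognize the prefixed fields would presumably act only on the conventional list, which is one reason a publisher might keep both sets of rules in the same file.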

Many expert commentators on web publishing have expressed doubts that ACAP will succeed. Even ACAP’s advocates concede that success will depend on publishers making enough content available to attract the major search engines. However, its growth into a sophisticated permission service could provide a longed-for release from fears of endless litigation. Bide comments, "It is our expectation that ACAP should substantially diminish the necessity of going to the courts to resolve disagreements between those who own and control content, and their business partners on the network. Much has been made of the argument that—because of the size of the internet and the number of transactions—it is not possible for the search engines and other aggregators to know about the policies of individual publishers with respect to reuse of their content. ACAP changes that situation completely; publishers can now express those policies clearly in a language which we have shown that a search engine operator can read, interpret and comply with." Nonetheless, at present, none of the major search engines has adopted the protocol. Bide says, "We know that all of them have been studying the specifications closely. Representatives of the three largest search engines all contributed to the work of the project in one way and another."

But if success depends on the involvement of the Big Three search engines, ACAP has another technical challenge. All three majors and other leading search engines have begun using the new open source Sitemap protocol, an XML-based syntax for exposing content encased in proprietary legacy systems. (For a description of one use, read the NewsBreak, "Google Burrows into State Government Data," May 7, 2007, http://newsbreaks.infotoday.com/nbReader.asp?ArticleId=36142.)
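For readers unfamiliar with the Sitemap protocol, the following Python sketch builds a small sitemap document with the standard library's xml.etree.ElementTree. The namespace and the urlset, url, loc, and lastmod elements come from the published Sitemap protocol; the URLs and dates are hypothetical, standing in for records held in a database that link-following crawlers would not otherwise reach.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the Sitemap protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Return a sitemap XML string for (loc, lastmod) pairs.

    lastmod may be None, in which case the element is omitted.
    """
    ET.register_namespace("", SITEMAP_NS)  # serialize without a prefix
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc
        if lastmod:
            ET.SubElement(url, f"{{{SITEMAP_NS}}}lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical database-backed pages on an imaginary site.
    records = [
        ("https://publisher.example/records?id=1001", "2008-02-01"),
        ("https://publisher.example/records?id=1002", None),
    ]
    print(build_sitemap(records))
```

Note that the format lists locations and freshness hints only; it carries no access or usage policies, which is the gap an XML expression of ACAP would have to fill.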

Bide says, "We had initially expected the ACAP protocol to work in close combination with Sitemaps—but were told by at least one of the major search engine operators that they would not be able to manage access and use policies expressed in this form. They were very insistent that we should extend the Robots Exclusion Protocol—something we were relatively reluctant to do, because of the limitations of its syntax when compared with an XML-based protocol like Sitemaps. I confess it is marginally frustrating to be continually asked why we didn’t integrate with the Sitemaps protocol. But it would not be difficult for us to do so now; we are currently working on an XML expression for ACAP which will be used with new Use Cases involving syndicated content. It would not be difficult for us to implement an XML expression for communication with search engines (integrated with or separately from Sitemaps); all we need is an indication from the search engines that they would prefer this. Like Sitemaps, ACAP is a determinedly 'open' standard, with no licensing constraints. This has been an essential principle from the outset."

According to Ed Pentz, executive director of CrossRef, the service operated by PILA (Publishers International Linking Association), "The aim of ACAP is very good. Publishers are trying to solve issues, not just trying to lock up their content. But if the major search engines continue to ignore ACAP, it won’t fulfill its purpose. They need enough publishers to get behind it to succeed." Bide echoed this view. When asked what could tip the balance, i.e., bring Google, Yahoo!, Microsoft, Ask.com, and others to accommodate ACAP, he responded, "Well, of course it depends on a critical mass of publishers showing that they are willing and able to implement ACAP—and that they are determined to do so. Standards implementation always faces the 'chicken and egg' dilemma—in this case it is clear that publishers have to move first. Our most important task right now is to encourage implementation, to ensure that we can build on the momentum behind ACAP. Standards are very sensitive to the 'network effect'—and there is much to be done in this direction. How long will this take? I remain optimistic that 2008 will see the turning point."


Barbara Quint is senior editor of Online Searcher, co-editor of The Information Advisor’s Guide to Internet Research, and a columnist for Information Today.
