Building on last year’s release of an open application programming interface (API) for its Calais web service called OpenCalais (http://newsbreaks.infotoday.com/nbReader.asp?ArticleId=40881), Thomson Reuters (www.thomsonreuters.com) together with Phase2 Technology (www.phase2technology.com) announced last week the debut of OpenPublish, a complete Calais-powered publishing suite for the popular open source platform Drupal (http://drupal.org). The new publishing suite offers semantic metatagging from OpenCalais (www.opencalais.com) and a seamless connection to the Linked Data cloud, while leveraging the power of Drupal as a social publishing platform.
Available for free download immediately (www.opensourceopenminds.com/openpublish), OpenPublish is designed to make it easier for publishers to maintain relevance and flexibility at a time when user expectations and technology options are rapidly expanding and publisher budgets are contracting. The service includes a starting kit of modules and configurations designed specifically for the needs of publishers by Phase2 Technology, a provider of web solutions for nonprofit, commercial, and media publishing clients using open source technology.
While the service is available to anyone who registers for an API key, the baked-in functionality will be a boon to small and medium-sized publishers trying to make the most of their offline content in a short period of time and at a low cost. Support is provided for content types from articles to blogs to multimedia presentations, and content monetization tools are also included. In recognition of the fact that audience awareness and engagement are critical for online publishing success, OpenPublish includes functionality such as social bookmarking, email forwarding, RSS feeds, and commenting.
Jeff Walpole, CEO of Phase2 Technology, says that the company was prompted to approach members of Thomson Reuters’ Calais Initiative about developing OpenPublish by a trend they saw among their publishing clients. "We noticed that a lot of publishers were switching from legacy content management systems to Drupal, so we started playing with the idea of integrating it with the semantic tagging capabilities of Calais." The idea was to take the "best in class" Drupal modules already used by Phase2 Technology’s publishing clients and package them together, to help users new to Drupal leverage the social publishing platform faster.
One of the key features characterizing OpenPublish is its embrace of semantic web technologies, which make it possible for servers to find, extract, share, and reuse information. Using OpenCalais, every new piece of content added to OpenPublish is tagged with relevant terms using natural language processing (NLP) technology, cutting down on the manpower needed to add value to content and making that content easier for search engines to find. At a time when newsroom staff is being cut, that automated extraction and tagging is a straightforward means of making news production more efficient.
Thomas Tague, Calais Initiative lead from Thomson Reuters, says, "Calais extracts semantic metadata from text, which can then be used to harvest other related information from third-party sources. It’s also integrated with Linked Data, the standard advanced by Tim Berners-Lee to expose, share, and connect data on the web." The technology underlying Calais comes courtesy of Reuters’ 2007 acquisition of business intelligence solution provider ClearForest, a move seen at the time as an effort by Reuters to speed development of its advanced search backbone.
The announcement is sure to signal even broader adoption of OpenCalais, which celebrated its first anniversary in January with the release of version 4.0. Already, more than 9,000 users rely on OpenCalais to process more than a million documents per day. That raises the question: Why would Thomson Reuters decide to open up access to its proprietary Calais technology? Tague says the reason is twofold. "Calais is a technology used extensively inside Thomson Reuters, so the ability for us to improve our offerings to clients is heavily dependent on it. Opening it up to a broader population of developers means that we attain hyper evolution of the technology, with releases happening monthly instead of annually."
The second reason is that clients are making increasing demands on content providers to give them the ability to mix and match their content with that of other providers. Just this week, the company announced that it is teaming with Harvard University’s Berkman Center for Internet and Society by contributing the Calais web service to support the new "Media Cloud" open research tool. "Calais is an experiment in creating interoperability at the point of consumption," says Tague.
To that end, OpenPublish is designed not just as a CMS in a box but as an integrated suite with advanced functionality and a short learning curve. "It was important to us that the first step in using OpenPublish not be, ‘One: Go find a geek.’ This is a canned platform for users to download, install, and go," says Tague. The suite also includes a historical tagging module for use with legacy systems, enabling publishers to tag entire libraries of archived content within hours for better search results. That should help increase traffic and improve content monetization.
Other advanced functionality in OpenPublish includes geotagging, Topic Hubs, and "More Like This," which allows an editor to set up automatic suggestions of other related content of interest to the reader. Tague observes that "Phase2 Technology isn’t just providing an integration of modules in this product. Features like the ‘More Like This’ functionality require some very sophisticated algorithms and have the power to displace commercial products."
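For readers curious how tag-driven suggestions of this kind can work in principle, here is a minimal sketch in Python. It is not Phase2 Technology's algorithm, which Tague describes as far more sophisticated, and it does not call OpenCalais or Drupal. It simply assumes each article already carries a set of semantic tags and ranks other articles by how many tags they share (Jaccard similarity). The article IDs, the tags, and the function names `jaccard` and `more_like_this` are all hypothetical.

```python
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Fraction of tags shared by two tag sets, from 0.0 to 1.0."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def more_like_this(
    current_id: str,
    tags_by_article: Dict[str, Set[str]],
    limit: int = 3,
) -> List[Tuple[str, float]]:
    """Rank other articles by tag overlap with the current one."""
    current = tags_by_article[current_id]
    scored = [
        (other_id, jaccard(current, tags))
        for other_id, tags in tags_by_article.items()
        if other_id != current_id
    ]
    # Drop articles with no tags in common, then sort by score
    # (highest first), breaking ties alphabetically by article ID.
    related = [(other_id, score) for other_id, score in scored if score > 0]
    related.sort(key=lambda pair: (-pair[1], pair[0]))
    return related[:limit]


# Hypothetical articles with tags a semantic tagger might have assigned.
ARTICLES: Dict[str, Set[str]] = {
    "openpublish-launch": {"Thomson Reuters", "Drupal", "OpenCalais", "Phase2 Technology"},
    "calais-4-release": {"Thomson Reuters", "OpenCalais"},
    "drupal-tips": {"Drupal"},
    "weather": {"Weather"},
}

if __name__ == "__main__":
    for article_id, score in more_like_this("openpublish-launch", ARTICLES):
        print(f"{article_id}: {score:.2f}")
```

In this toy data set, the launch story is matched first with the Calais release story and then with the Drupal story, while the unrelated weather item is left out. A production system would likely weight tags by type and relevance rather than treating them all equally.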
The idea of an open source solution that is truly competitive with solutions offered by commercial providers such as Ektron, Inc. (www.ektron.com) and Vignette Corp. (www.vignette.com) is sure to remain a hot topic in 2009 and beyond. Thomson Reuters is hedging its bets, making OpenCalais work not just with Drupal but also with WordPress and the Apache Unstructured Information Management Architecture (UIMA).
One thing seems certain: Whether editors are moving to open source solutions such as Drupal or sticking with commercial enterprise content management systems, not tagging newsroom content is no longer an option. Walpole says that in a world where readers are demanding increasingly sophisticated content mashups, "You just can’t live without tagging anymore."