Global information behemoth Reuters (www.reuters.com) waded into the murky waters of the semantic web last week with the release of an open application programming interface (API) for its new Calais Web service. The service, which is available at www.opencalais.com, aims to make it easier for publishers, bloggers, and other content producers to automatically metatag their content and to develop their own semantic applications. The Calais Web service opens Reuters’ internal content-tagging system to the general public.
Gerry Campbell, who has spent time working for AOL Search and AltaVista, joined Reuters in 2006 as president of the search and content technologies group and headed up the initiative that led to the creation of the Calais Web service. "The world is still suffering from information overload," Campbell says. "Web 2.0 applications have set content free to a great extent, but the explosive distribution of information doesn’t always help users find exactly what they need, when they need it. Semantic tagging offers superior sorting and filtering of content for more targeted and timely delivery."
Last year, Reuters acquired the tagging platform vendor ClearForest, Ltd., and subsequently began to use that technology to tag its own structured and unstructured data, including internal corporate information as well as the news articles and business information gathered by its global network of reporters and researchers. The Calais release extends those capabilities to anyone who wants to use them.
"They wouldn’t have been able to do this without their internal experience over the last year with ClearForest technology," says Leslie Owens, an analyst for Forrester Research. "They have a lot of experience with enterprise solutions, the technology is ready, they have already used it on unstructured content, and now they can use it in this business model."
To use the free Web service, a publisher simply submits unstructured text, and the service returns semantic metadata in RDF format in less than a second. Using natural language processing and machine learning techniques, Calais locates entities, facts, and events and processes those components into metadata. "Publishers can use this information as auxiliary tags for their content to improve searchability, they can use it to enhance news feeds with better tagging, or any number of other purposes," says Tom Tague, Reuters’ chief evangelist for the Calais project. "As we begin to roll out additional applications, we will have a much richer set of tools."
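To give a rough sense of what a publisher might do with RDF metadata of this kind, the following minimal Python sketch groups entity names by type so they could serve as auxiliary tags. It does not call the Calais service, and it is not based on the actual Calais response format: the sample RDF/XML, the `http://example.org/semantic#` vocabulary, and the `extract_tags` function are all assumptions made up for illustration.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Hypothetical vocabulary for this sketch; NOT the real Calais schema.
EX_NS = "http://example.org/semantic#"

# Made-up RDF/XML standing in for the metadata a tagging service might return.
SAMPLE_RDF = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:ex="{EX_NS}">
  <rdf:Description rdf:about="http://example.org/id/1">
    <ex:type>Company</ex:type>
    <ex:name>Reuters</ex:name>
  </rdf:Description>
  <rdf:Description rdf:about="http://example.org/id/2">
    <ex:type>Person</ex:type>
    <ex:name>Gerry Campbell</ex:name>
  </rdf:Description>
  <rdf:Description rdf:about="http://example.org/id/3">
    <ex:type>Company</ex:type>
    <ex:name>ClearForest</ex:name>
  </rdf:Description>
</rdf:RDF>"""


def extract_tags(rdf_xml: str) -> dict[str, list[str]]:
    """Group entity names by entity type from an RDF/XML string."""
    root = ET.fromstring(rdf_xml)
    tags: dict[str, list[str]] = {}
    # Each rdf:Description element is treated as one extracted entity.
    for node in root.findall(f"{{{RDF_NS}}}Description"):
        kind = node.findtext(f"{{{EX_NS}}}type")
        name = node.findtext(f"{{{EX_NS}}}name")
        # Skip descriptions that lack either field.
        if kind and name:
            tags.setdefault(kind, []).append(name)
    return tags


if __name__ == "__main__":
    for kind, names in extract_tags(SAMPLE_RDF).items():
        print(kind, names)
```

A publisher could attach the resulting names to an article as search keywords or feed categories. A real integration would need to follow the service's documented vocabulary rather than the placeholder one used here.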
Reuters sees the potential for a lot of innovation in the semantic web and hopes that the open API format of the Calais Web service will encourage developers to work out for themselves the best ways to utilize this technology. Gerry Campbell saw the benefits of the open model from the very beginning of Reuters’ experiments with semantic tagging. "When we released this product internally, we told our own developers to fool around with it, bang on it a little, and see what they could do with it. We found that everyone had different ideas about what to build on top of it. A lot of our people wanted to use it in ways that we hadn’t thought of. That was one of the reasons we thought to go down the path of the open model. This is a tool we’re providing, and we want to facilitate people to use it however they want."
Campbell hopes that encouraging development will solve what he calls the "chicken and egg" dilemma of the semantic web. The conundrum plays out as follows: Publishers don’t use semantic tagging because of the dearth of tools available, and developers don’t create tools because semantic technology isn’t widely used.
Reuters has set up two programs that it hopes will help break the vicious cycle: a bounty program and a contest program. Bounties will be awarded for the development of specific capabilities that Reuters would like to provide to Calais users. The first announced bounty of $5,000 will be awarded to the developer who creates a configurable plug-in for WordPress that will enrich blogs with tag auto-suggestion, a semantic cloud, and a Globally Unique Identifier (GUID). The second program, details of which will be announced later in February, will be a series of contests. "The bounties and contests are there to drive interest," says Tague. "The challenge is to take a technical tool and make it relevant for real people. Developers are one in a thousand. In order to make the technology relevant, smart people have to translate it into applications."
Forrester’s Owens thinks Reuters’ initiative is a step toward broadening the appeal of the semantic web. "What they’re doing, between reaching out to the developer community and providing all of this as a free service, is fostering the growth of the semantic web. The automated approach to metatagging that they’re utilizing makes sense. Normally people don’t have money to invest in entity extraction tools, so they just go without it. By offering it for free, Reuters is encouraging broader adoption," Owens says.
Reuters is already seeing returns on that broader adoption. Nearly 300 developers have signed up to contribute to the project in the past two weeks, and Reuters has been working with a number of content management and database companies to incorporate the software into their toolkits. "Adoption has already been vastly better than I ever hoped for," Campbell says.