Recommind, a software company that specializes in automatic categorization and information retrieval, is introducing MindServer Media on Oct. 20. Its customer list is relatively small, but highly prestigious. It includes Europe's largest television station (ZDF); a major European publisher (Heinrich Bauer Verlag); research library membership group RLG; and Bertelsmann, the publishing powerhouse headquartered in Germany. Bertelsmann uses MindServer Media to process 40,000 news articles per day.
According to CEO Bob Tennant, Recommind is building on its earlier MindServer product, which automatically categorizes unstructured data using proprietary text-extraction technologies. MindServer Media adds work-flow capabilities specific to publishing and media companies.
Tennant said: "We worked with our customers to identify work flows associated with day-to-day categorization. This increases efficiency, substantially reducing the time spent archiving, retrieving, and routing news content. Bauer, for example, was outsourcing much of this work. Using our technology, they brought the categorization function in-house. It paid for itself within 5 months."
MindServer Media crawls the documents and performs statistical analysis to identify concepts. From this, it creates index terms and can route articles and news feeds directly to the editors and readers who are interested in those topics. The system learns as it goes along. From the searcher's perspective, highlighted terms in retrieved documents show why the software chose the index terms it did. Organization tabs are included according to the specifications of the customer.
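As a rough illustration of the index-and-route steps described above, the following Python sketch scores terms with plain TF-IDF and forwards each article to any desk whose interests overlap its index terms. TF-IDF is only a stand-in here, not Recommind's proprietary analysis, and every name in the sketch (index_terms, route, the sample articles, and the desks) is hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical sample data; none of it comes from Recommind or its customers.
DOCS = [
    "The central bank raised interest rates as inflation climbed",
    "The striker scored twice as the club won the cup final",
    "The bank reported record profit as interest income rose",
]
SUBSCRIPTIONS = {
    "finance_desk": {"inflation", "profit"},
    "sports_desk": {"striker", "cup"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def index_terms(docs, top_n=3):
    """Return the top_n highest-scoring TF-IDF terms for each document."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    result = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        scores = {
            term: count * math.log(n_docs / doc_freq[term])
            for term, count in term_freq.items()
        }
        # Highest score first; ties are broken alphabetically.
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        result.append(ranked[:top_n])
    return result


def route(terms, subscriptions):
    """Return the subscribers whose interests overlap the index terms."""
    return sorted(
        name for name, interests in subscriptions.items()
        if interests & set(terms)
    )


for doc, terms in zip(DOCS, index_terms(DOCS)):
    print(doc, "|", terms, "|", route(terms, SUBSCRIPTIONS))
```

Terms that appear in every article score zero, so common words drop out of the index without a stop-word list. A production system would presumably also adjust its scoring from editor feedback, which this sketch does not attempt.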
A media company might want search tabs for author, photographer, organization name, subject, geographic region, or named person. Customers can also specify how much human intervention they desire. A document specialist might want to suggest metatags, but MindServer Media doesn't require human interaction. It can work in tandem with existing taxonomies used by media companies, such as Factiva, and can incorporate SIC codes and the like.
MindServer Media's technology springs from chief scientist Thomas Hofmann's research on Probabilistic Latent Semantic Analysis (PLSA) algorithms, employs fully parallel processing, and is completely XML-compatible. PLSA tries to emulate how humans index and categorize information.
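For readers curious about what PLSA involves, here is a minimal Python sketch of the textbook formulation, in which the probability of a word in a document is modeled as a mixture over hidden topics and fitted with expectation-maximization. It illustrates the published algorithm only and makes no claim about how MindServer Media implements it; the function name plsa, its parameters, and the sample word counts are all assumptions made for this example.

```python
import random

# Hypothetical word counts for four short documents.
COUNTS = [
    {"bank": 2, "rates": 1},
    {"bank": 1, "profit": 2},
    {"cup": 2, "striker": 1},
    {"club": 1, "cup": 1},
]


def normalized(values):
    """Scale a list of positive numbers so that it sums to 1."""
    total = sum(values)
    return [v / total for v in values]


def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit P(word | topic) and P(topic | document) with EM.

    counts is a list with one dict per document, mapping word -> count.
    """
    rng = random.Random(seed)
    vocab = sorted({word for doc in counts for word in doc})
    n_docs = len(counts)

    # Random positive starting points, normalized into distributions.
    p_w_z = [
        dict(zip(vocab, normalized([rng.random() + 0.1 for _ in vocab])))
        for _ in range(n_topics)
    ]
    p_z_d = [
        normalized([rng.random() + 0.1 for _ in range(n_topics)])
        for _ in range(n_docs)
    ]

    for _ in range(n_iter):
        new_w_z = [dict.fromkeys(vocab, 0.0) for _ in range(n_topics)]
        new_z_d = [[0.0] * n_topics for _ in range(n_docs)]
        for d, doc in enumerate(counts):
            for word, n in doc.items():
                # E-step: posterior over topics for this (document, word) pair.
                post = [p_w_z[z][word] * p_z_d[d][z] for z in range(n_topics)]
                total = sum(post)
                if total == 0.0:
                    continue
                for z in range(n_topics):
                    weight = n * post[z] / total
                    new_w_z[z][word] += weight
                    new_z_d[d][z] += weight
        # M-step: renormalize the expected counts into probabilities.
        for z in range(n_topics):
            total = sum(new_w_z[z].values())
            if total > 0.0:
                p_w_z[z] = {w: v / total for w, v in new_w_z[z].items()}
        for d in range(n_docs):
            total = sum(new_z_d[d])
            if total > 0.0:
                p_z_d[d] = [v / total for v in new_z_d[d]]
    return p_w_z, p_z_d


word_given_topic, topic_given_doc = plsa(COUNTS, n_topics=2)
for d, mixture in enumerate(topic_given_doc):
    print("document", d, [round(p, 3) for p in mixture])
```

Each document ends up with a mixture over topics rather than a single label, which is what allows one article to be filed under several concepts at once. EM can settle on different solutions depending on the random start, so the seed is fixed here for repeatability.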
MindServer Media integrates with databases and applications using Java RMI, SOAP, or ODBC. It runs on UNIX, Linux, and Windows platforms, and its language capabilities are impressive. "MindServer will work with any language as long as the language is tokenizable," says Tennant. He notes that the RLG RedLightGreen database, formerly known as its Union Catalog, covers 365 different languages, some of which are archaic and no longer spoken. By contrast, both Bauer and Bertelsmann publish in only three or four languages. (For in-depth background on the RLG project, see "RLG's Union Catalog Available on the Open Web," by Barbara Quint, at http://newsbreaks.infotoday.com/nbreader.asp?ArticleID=16602.)
MindServer Media is the first of Recommind's vertical market strategies. The company intends to develop MindServer products for the legal and government markets as well. Each will use Recommind's text-mining capabilities, tailored to the work flow of the vertical markets.