The answer to the oft-asked question "What comes after Google?" may not be the name of a new competitor chasing the market leader in Web search, but rather an effective, innovative use of Web information. IBM has launched a service named WebFountain that applies an elaborate mesh of software, known as text mining or text analytics, to spidered data from across the Web. Developed at IBM's Almaden Research Center, WebFountain will provide a platform for developing new products and services in partnership with IBM ("powered by IBM"). Factiva has announced the first partnership from the traditional information industry: a service called Reputation Manager, scheduled for release in the second quarter of 2004.
WebFountain is a Web-scale mining and discovery platform that extracts trends, patterns, and relationships from massive amounts of unstructured and semi-structured text. It consists of three primary components:
- A supercomputer-based platform infrastructure, built to open, scalable standards, that integrates "miners," crawlers, and applications and will host services
- Multi-terabyte data stores of unstructured and semi-structured data including all kinds of Internet data, from Web pages to blogs, bulletin boards, enterprise data, legacy data, licensed content, chat rooms, e-mail, etc.
- Text analytics including natural language processing, statistics, probabilities, machine learning, pattern recognition, and artificial intelligence
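To make the idea of a "miner" concrete, here is a toy sketch in Python. It is not IBM's method or code. It assumes a hypothetical watch list of names (`ENTITIES`) and a handful of invented sample documents, and it only counts how many documents mention each name and how often two names appear in the same document. That is a very simple instance of the "patterns and relationships" such a platform looks for.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical watch list, chosen only for illustration.
ENTITIES = {"ibm", "factiva", "google"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def mine(documents: list[str]) -> tuple[Counter, Counter]:
    """Count document-level mentions and co-mentions of watched names.

    Returns (mentions, pairs): mentions[name] is the number of documents
    containing that name; pairs[(a, b)] is the number of documents
    containing both names, with a < b alphabetically.
    """
    mentions: Counter = Counter()
    pairs: Counter = Counter()
    for doc in documents:
        found = sorted(ENTITIES & set(tokenize(doc)))
        mentions.update(found)
        pairs.update(combinations(found, 2))
    return mentions, pairs


if __name__ == "__main__":
    # Invented sample documents, not real WebFountain data.
    sample = [
        "IBM and Factiva announce a partnership.",
        "Google leads Web search.",
        "Factiva builds on IBM technology; IBM hosts it.",
    ]
    mentions, pairs = mine(sample)
    print(mentions.most_common())
    print(pairs.most_common())
```

A production system would replace the fixed watch list with natural language processing and statistical models, as the list above describes; the sketch shows only the counting step.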
Robert Carlson, IBM WebFountain vice president, describes the current content set as over 1 petabyte in storage with over 3 billion pages indexed, 2 billion stored, and the ability to mine 20 million pages a day. "We plan to have the entire Net mined in 12 months," said Carlson. The system also works across multiple languages. According to Carlson, it currently covers the large majority of languages on the Net and will cover 21 languages by the end of 2004.
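Carlson's figures can be put side by side with some simple arithmetic. The short Python sketch below uses only the numbers quoted above and assumes, for illustration, a constant mining rate and a single pass over each page; the function names are hypothetical.

```python
# Figures quoted by Carlson; treated here as exact for illustration.
PAGES_INDEXED = 3_000_000_000      # "over 3 billion pages indexed"
PAGES_STORED = 2_000_000_000       # "2 billion stored"
PAGES_MINED_PER_DAY = 20_000_000   # "mine 20 million pages a day"


def days_to_mine(pages: int, pages_per_day: int = PAGES_MINED_PER_DAY) -> float:
    """Days needed for one pass over `pages` at a constant daily rate."""
    return pages / pages_per_day


def pages_mined_in(days: int, pages_per_day: int = PAGES_MINED_PER_DAY) -> int:
    """Pages covered in `days` days at a constant daily rate."""
    return days * pages_per_day


if __name__ == "__main__":
    print(days_to_mine(PAGES_INDEXED))  # 150.0 days for the indexed pages
    print(days_to_mine(PAGES_STORED))   # 100.0 days for the stored pages
    print(pages_mined_in(365))          # 7300000000 pages in a year
```

At the quoted rate, one pass over the 3 billion indexed pages would take about 150 days, and a full year of mining would cover roughly 7.3 billion pages.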
Initial marketing of the service will target enterprise accounts. Specific uses arranged with third-party partners and clients will define different sets of content. For example, products generated from WebFountain for Factiva will incorporate a 2-year-plus-current subset of Factiva's Publications Library. While IBM will supply the technology and host the service, partners will supply the marketing and brand name connections to specific targets. Factiva's Reputation Manager will serve senior executives by using WebFountain to discover and measure what the world and its Web are saying about companies and products. Reputation Manager could cost an estimated $150,000-$300,000 annually.
Susan Feldman, director of content and retrieval technologies at IDC, commented: "The advent of the Web, with its information free-for-all has somewhat sidelined content providers who have traditionally provided higher quality content, but at a price. This relationship [between IBM and WebFountain partners] brings the content provider back to the center. If they can create tools that will merge internal and external information, categorize it into a single taxonomy, and then let their users manipulate data from multiple sources, they will have created the ultimate competitive intelligence tool—as well as marketing tools, terrorist tracking tools, reputation monitoring tools, as this one is billed, and other, almost limitless information possibilities. This application takes an unthinkably large collection of worldwide information and makes it accessible as it has never been before."
Feldman points to the breadth of IBM's software research. "IBM has superb research that goes in all directions. It has over 100 content technologies in hand. Here in WebFountain they have pulled them together with their know-how." However, Feldman also salutes the wisdom of using third parties to handle customer relations and marketing.
Initially, according to Kevin Mann, chief strategist for WebFountain, IBM plans to target the Global 2000 with its direct efforts, while it expects partner Factiva to target the Global 4000. Clare Hart, president and CEO of Factiva, commented: "This is the next logical step toward giving people intelligence that they can act upon. We expect this type of service to become a key business asset and a must-have for the most ambitious enterprises." (For more information on WebFountain, go to http://www.almaden.ibm.com/webfountain.)
Until WebFountain, services using a full array of advanced text mining and analytic software required too much computing power to operate swiftly, which restricted them to collections of millions of documents and to tens or hundreds of thousands of requests per day. IBM's accomplishment, according to Feldman, lies in pushing text mining to a new scale: billions of documents and millions of requests per day.
Reportedly, dozens of potential partners have approached IBM. Mann indicated that IBM was negotiating with many on the applications side, plus others on the content side. Enabling the full functionality of WebFountain requires both client and domain knowledge. Feldman predicts that text mining companies, such as ClearForest, with strong canned taxonomies and good visualization of results will make logical partners.
Feldman advises information professionals: "If you're involved in building a taxonomy in a discipline, in indexing a field, in competitive intelligence, in setting up news alerts for a large enterprise, you need to pay attention to this announcement. It's an opportunity."