Every searcher fears that a search will produce too little of what's wanted or too much of what isn't. And, even if you get a nice collection of the right stuff, is it all of the right stuff out there, or does it omit things you need to see? In technical terms, does your search strategy balance precision and recall effectively? Linguistic and semantic search engines have long held out the promise of helping computers "understand" concepts, rather than just search for terms. Cognition Technologies (www.cognition.com) has launched CognitionSearch, a linguistic search engine that supports ontology, morphology, and synonymy, tapping one of the world's largest computational dictionaries. Initially, the company will market a vertical enterprise service for legal litigation support and for life science and health research. It also offers an open Web service (www.cognitionsearch.com) to demonstrate the technology as applied to MEDLINE and PubMed content, to judicial and legislative sources, and to political blog content.
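For readers who want the two terms pinned down: precision is the share of retrieved items that are relevant, and recall is the share of relevant items that were retrieved. The short Python sketch below computes both for a made-up result set; the document IDs and the function name are illustrative assumptions, not anything taken from CognitionSearch.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search.

    retrieved: IDs the search engine returned
    relevant:  IDs a human judged to be on target
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # relevant items that were actually returned
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: four results returned, five relevant documents exist.
retrieved = ["d1", "d2", "d3", "d4"]
relevant = ["d2", "d4", "d5", "d6", "d7"]
precision, recall = precision_recall(retrieved, relevant)
print(precision, recall)  # 0.5 0.4
```

Here half of what came back was useful (precision 0.5), but the search found only two of the five documents worth seeing (recall 0.4).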
The complex technology behind CognitionSearch stems from some 20 years of research led by Cognition's chief technology officer, Dr. Kathleen Dahlgren, who also serves as an adjunct professor of linguistics at the University of California-Los Angeles. After leaving IBM in 1990, Dahlgren set up a new company called Intelligent Text Processing (ITP) that patented a natural language understanding system. ITP struggled and closed in 2002. In February 2003, Dahlgren and a team of linguists and lexicographers founded Cognition Technologies. In January 2006, Scott Jarus, former chief executive of j2 Global Communications, the company that produced the eFax service, joined the company as both an investor and CEO. Jarus hopes to supply the business and strategic knowledge needed to leverage the technology into the next big thing in the search arena.
Cognition's two-pronged vertical-market strategy focuses on legal litigation support, particularly in the electronic data-discovery process, and on the life science industry, particularly the pharmaceutical and genetics fields. According to Jarus, the company chose the target areas "based on the realities of finances and marketing opportunities. We needed to find self-contained markets where research lay within boundaries and where we wouldn't have to spend a huge amount of money to penetrate and gain traction even for proof of concept. Launching a search engine on the Web involves an inordinate amount of money that we didn't have. So we looked to vertical markets as a proving ground." The product areas were chosen for their critical, unmet needs or "pain points," to quote Jarus, and the opportunity to meet a lucrative business need.
In the legal arena, Cognition has become the advanced search engine for the LexisNexis Concordance service, a litigation case-management software service used on more than 65,000 desktops. The CognitionSearch-powered Concordance is integrated through Everest Technologies' enhancement (www.everesttek.com) and built into IPRO Tech's IPRO View litigation support service (www.iprocorp.com). In the life science area, the first enterprise target is a large university medical school with which the company plans to further augment its life science-specific terminology, including terms driven by the Human Genome Project.
How It Works
Cognition's patented technology combines formal linguistic algorithms with semantic representations to create a "naïve" semantics that speeds up the computational parsing. In building the tools for the service, Cognition uses 4 million semantic representations, 350,000 word stems, 376,000 word senses or concepts, 17,000 ambiguous word definitions, 100,000 phrases, 7,000 nodes for the ontology or tree structure of the taxonomy, and 50,000 thesaural concept groups. Enterprises employing the software are provided with tools to add their own specialized terminology, e.g., product name lists. In the case of very large term expansions, Cognition will supply a consulting service to augment the CognitionSearch service.
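To give a rough feel for what morphology, synonymy, and ontology contribute to a search, here is a toy Python sketch of query expansion. It is not Cognition's patented method, which the company has not published in this form; the tiny word lists, the function names, and the sample documents are all invented for illustration.

```python
# Toy lexicon. Every entry is an invented stand-in for the kind of data
# a large computational dictionary would hold.
STEMS = {"tumors": "tumor", "tumour": "tumor", "tumours": "tumor"}  # morphology
SYNONYMS = {"tumor": {"neoplasm"}}                                  # synonymy
NARROWER = {"tumor": {"carcinoma", "sarcoma"}}                      # ontology


def normalize(word):
    """Lowercase a word and map known variants to a common stem."""
    word = word.lower()
    return STEMS.get(word, word)


def expand(query):
    """Expand each query word to its stem, synonyms, and narrower concepts."""
    terms = set()
    for word in query.split():
        stem = normalize(word)
        terms.add(stem)
        terms |= SYNONYMS.get(stem, set())
        terms |= NARROWER.get(stem, set())
    return terms


def search(query, docs):
    """Return IDs of documents that share at least one expanded term."""
    wanted = expand(query)
    return [doc_id for doc_id, text in docs.items()
            if wanted & {normalize(token) for token in text.split()}]


docs = {
    "a": "Sarcoma treatment outcomes",
    "b": "Benign neoplasm of the skin",
    "c": "Tumours in mice",
    "d": "Heart rate study",
}
print(search("tumors", docs))  # ['a', 'b', 'c']
```

A plain term match on "tumors" would find none of these documents; the expansion picks up a spelling variant, a synonym, and a narrower concept. The sketch leaves out word-sense disambiguation, which a real linguistic engine needs in order to keep such expansions from hurting precision.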
In the current launch of the CognitionSearch open Web service, the company selected three subject areas to showcase and demonstrate the technology: health (MEDLINE, PubMed, etc.), legal (U.S. Supreme Court cases, a million Enron emails, etc.), and politics (key political blogs). A glance at the home page clearly indicates plans to add content for government, the environment, and social networks. In the future, the company is also considering the introduction of a consumer-oriented search portal called CogHog. (The CogHog, an icon using a blue pig's head appearing on the CognitionSearch site, is known to Cognition staff as Phil Cogito. The Phil may actually be a gilt—i.e., a female pig with only one litter of piglets to her credit—since she appears to have returned to her maiden name after divorcing "Ergo Sum.")
The Advanced Search mode for CognitionSearch offers five basic search approaches: plain English search, linguistic Boolean search, quoted (or phrase) search, pattern search, and fuzzy search (a variation of the pattern search). The Advanced Search mode will seem familiar to professional searchers with a lot of experience dealing with database services. The complexity and field searching approaches may seem new and somewhat difficult to end users, however.
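The article does not spell out the query syntax for these modes, so the following Python sketch only illustrates the general ideas behind four of them using the standard library. The function names, the wildcard convention, and the similarity cutoff are assumptions, not CognitionSearch features.

```python
import difflib
import fnmatch


def boolean_and(text, *terms):
    """Boolean AND: every term must appear as a whole word."""
    words = set(text.lower().split())
    return all(term.lower() in words for term in terms)


def phrase_match(text, phrase):
    """Quoted search: the words must appear together, in order."""
    words = text.lower().split()
    target = phrase.lower().split()
    return any(words[i:i + len(target)] == target
               for i in range(len(words) - len(target) + 1))


def pattern_match(text, pattern):
    """Pattern search: shell-style wildcards such as 'trad*'."""
    return any(fnmatch.fnmatchcase(word, pattern.lower())
               for word in text.lower().split())


def fuzzy_match(text, term, cutoff=0.8):
    """Fuzzy search: tolerate small misspellings (cutoff is an assumed value)."""
    words = text.lower().split()
    return bool(difflib.get_close_matches(term.lower(), words, n=1, cutoff=cutoff))


text = "Enron traders discussed California electricity prices"
print(boolean_and(text, "enron", "prices"))      # True
print(phrase_match(text, "electricity prices"))  # True
print(pattern_match(text, "trad*"))              # True
print(fuzzy_match(text, "electricty"))           # True
```

Plain English search is omitted because it depends on the linguistic analysis itself rather than on string matching.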
A Work in Progress
At this point, the CognitionSearch engine is clearly a beta test, a work in progress. Throughout a search, users can access Feedback mechanisms to comment on the service and to report their reactions. They can also use a pull-down menu to push the search into a different conceptual path if the terms chosen by the search engine stray off target. If you activate the Feedback feature at the end of a search, the pop-up window will repeat your search query and ask you to designate the specific results that elicited your reactions. For now, one clear difficulty lies in the lack of an option to display results in reverse chronological order for those interested in the most recent work in a field. Relevance ranking is the only display mode offered, though sophisticated users could conduct a series of Advanced Searches in some files using the date field. Mike Reid, vice president of sales and business development, indicated that the company is aware of this problem and has it high on its list of changes.
Funding for the new service is coming from a Southern California venture capital group called the Tech Coast Angels, a group that Jarus has joined. At this point, Jarus is leading a venture capital round looking for $5 million to $10 million in investment funding.
I asked Steve Arnold, longtime expert observer of the search engine field (among other information industry concerns), what key factors would affect Cognition's chance of success. Technologically, as Arnold confirmed, scalability has always been the greatest problem for linguistic and semantic search engines. The greater the volume of content, the slower the engines work, until they just can't meet user expectations for real-time service. According to Dahlgren, "Our technology is incredibly scaleable and fast. We were able to index the whole 16 million records in MEDLINE in one day." According to Jarus, "The technology doesn't have a problem with scaling. The only challenge is when we have performance problems; then we need to add more machines, more hardware. It's a commoditized problem."
As for business factors, Arnold named three issues: funding, especially government contracts; penetration of vertical markets, especially the problem of countering the resistance of IT staff to solutions from strange, new vendors; and a solid exit strategy, e.g., licensing the technology or selling out the operation. Responding to these issues, Reid indicated that Cognition planned to pursue contracts with a number of federal agencies, including Homeland Security, Defense Intelligence Agency, CIA, etc. Dahlgren pointed out that "the government needs to do automated surveillance. They need disambiguation and synonymy."
As to a sound exit strategy, Jarus responded candidly: "The politically correct answer is that we are building a company with sustainability, but, if we were offered more money than we can imagine, we'd be happy to sell. Assuming we are successful in promoting our technology and gaining traction, the reality is that I'd have a hard time imagining that Cognition would remain independent. The big boys would recognize our value and want to bring it home. For example, another less obvious way to use this technology would be to match up SEO [search engine optimization] with queries and questions. SEO companies love better matches. There are plenty of opportunities out there, a whole raft of opportunities."