The National Library of Medicine (NLM) has played an historic role in the development of online technology and services. In a sense, it might almost be called the parent of online. It has performed this service using a wide array of resources—research budgets for innovative projects, lowered cost and high availability for world-class, high-use databases, operational budgets for new databases, and even publicity efforts like the Show Off Your Apps Challenge awards given in early November to innovative apps built around its services. One of the winners was WebLib for its semantic, metasearch offering, NLMplus. Intended to showcase the application of WebLib’s software to NLM content, the product also offers real value to working searchers and end users. Tamas Doszkocs, president of WebLib and recent retiree from a 30-plus year career at NLM, confirmed that WebLib plans to maintain and enrich the product indefinitely.
Smart but small companies need promotion, which they usually cannot afford; smart and generous open access/source innovators need promotion even more. The five winners of the NLM challenge got a chance to present and demonstrate their applications at an award ceremony held at NLM on Nov. 2. An archived videocast of the ceremony is available at http://videocast.nih.gov/. Besides WebLib’s NLMplus, the other winners were GLAD4U (Gene List Automatically Derived for You); iAnatomy, a digital anatomy atlas for the iPhone and iPod touch; KNALU, a PubMed visualization tool; and Quertle, a metasearch tool that simultaneously searches multiple sources of life science literature.
Five honorable mentions went to BioDigital Human Platform, a web-based 3D platform that simplifies the understanding of anatomy and disease; DailyMedPlus, a search engine for NLM pharmaceutical information; Drug Diary, an iPhone application for inventorying and researching prescription and over-the-counter medications; Molecules, an iPhone, iPod touch, and iPad application that provides 3D molecular modeling; and ORKOV, an iPhone and Android application providing access to PubMed.gov. For more details, go to http://apps.nlm.nih.gov/175/show_off_your_apps_winners_honorable_mentions.cfm.
Headquartered in Budapest, Hungary, with offices in Maryland, WebLib offers enterprises a range of sophisticated search and database enhancement services, including semantic searching, natural language processing, and federated multidatabase searching. Its clients include corporations, government agencies, universities, libraries, and research institutions. With products like HealthMash, ToxSeek, and NLMplus, it has special strengths in the health care field. WebLib tools include PolySearch for intelligent indexing and searching of institutional content; Poly/Meta Discovery Search for simultaneous multi-site searching and result filtering by metadata; PolyCluster for semantic clustering around topics and key concepts; Poly/Spell and ChemSpell for English, medical, and scientific spell checking; Poly/Complete for automatic query completion; and Poly/Meta for semantic analysis and filtering by topics and subtopics. WebLib has also developed a WebLib Web Knowledge Base covering more than 8 million concepts with an emphasis on the sciences, as well as a Biomedical Knowledge Base of some 4 million concepts. The AllPlus metasearch engine simultaneously searches Google, Yahoo!, Microsoft Bing, and Ask.com with dynamically generated Topic Clusters and visual displays.
NLMplus is a semantic search and discovery application developed in response to the NLM challenge contest. According to Doszkocs, the proof-of-concept pilot for NLMplus was completed only recently, in September and October. After winning the award, WebLib substantially enhanced the product and released it to the public at the end of November 2011. The contest version of NLMplus targeted around 200,000 PubMed reviews dating back some 3 years, while the public version covers the entire collection, more than 1.6 million abstracts, in its Semantic PubMed Reviews. It also has a meta-analyses subset of PubMed itself. News releases through services such as PRWire will go out starting next week, according to Doszkocs.
In addition to the PubMed content, NLMplus searches the entire range of 60 NLM databases simultaneously with hit counts for search results next to each database. This should promote awareness of all that NLM has to offer, ranging from consumer health topics to drugs, news, clinical trials, and translational medicine.
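The hit-count display described above is a typical federated search pattern: send one query to every source at the same time, then report a count per source. The following is only a minimal sketch of that pattern, not WebLib's implementation. The `SOURCES` table and the function names are hypothetical, and small in-memory lists stand in for the remote NLM databases.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for remote databases: each name maps to a list of
# record titles. A real federated search would call each source's own search
# service instead of scanning an in-memory list.
SOURCES = {
    "Health Topics": ["Asthma", "Asthma in Children", "Diabetes"],
    "Clinical Trials": ["Asthma inhaler trial", "Statin trial"],
    "Drug Information": ["Albuterol", "Metformin"],
}


def count_hits(source_name, query):
    """Return (source name, number of records containing the query text)."""
    needle = query.lower()
    records = SOURCES[source_name]
    return source_name, sum(needle in record.lower() for record in records)


def federated_hit_counts(query):
    """Query every source concurrently and collect one hit count per source."""
    with ThreadPoolExecutor(max_workers=len(SOURCES)) as pool:
        results = pool.map(lambda name: count_hits(name, query), SOURCES)
        return dict(results)


if __name__ == "__main__":
    for name, hits in federated_hit_counts("asthma").items():
        print(f"{name}: {hits}")
```

A source with zero hits stays in the result, which is what lets a display show the full list of databases alongside their counts.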
NLMplus uses a variety of tools and technologies reaching out to NLM’s downloadable data sets, APIs, web services, and software tools. As usual, everything starts from a Google-like search box. Users can tap into Health Topics (MedlinePlus Health Topics); PubMed Reviews, semantically indexed and searched by WebLib’s Semantic Search and Discovery Engine; PubMed Abstracts, with semantically enhanced queries sent to NLM’s PubMed search service; NLM’s 60 lesser-known databases via WebLib’s PolyMeta federated search engine; Drugs and Supplements from NLM’s Drug Information Portal and MedlinePlus Drugs & Supplements databases; MedlinePlus News; MedlinePlus Videos and Tutorials; and images from NLM’s History of Medicine and PubChem databases.
The system integrates a number of tools to perform the searching. Chief among them is WebLib’s Biomedical Knowledge Base, with more than 4 million concepts automatically generated from trusted biomedical content on the web, including subsets of NLM’s PubMed and MedlinePlus databases and semantic resources such as NLM’s Unified Medical Language System (UMLS) and the Medical Subject Headings (MeSH) thesaurus. The Biomedical Knowledge Base uses web data mining for the automatic identification and extraction of related concepts, signs and symptoms, tests and diagnostic procedures, disorders, treatments, alternative and complementary medicine, genes, biomarkers, and other biomedical concepts. For the semantic indexing of the PubMed Reviews database, WebLib uses the Solr/Lucene open source enterprise search engine with WebLib’s proprietary semantic indexing technology for the automatic identification of key title phrases, abstract phrases, and MeSH qualifier terms. Key phrases are mapped to UMLS concepts and other concepts in the Biomedical Knowledge Base.
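WebLib’s semantic indexing technology is proprietary, so the details of how key phrases are mapped to concepts are not public. As a rough illustration of the general idea only, the sketch below uses a simple greedy longest-match dictionary lookup, which is a swapped-in technique and not WebLib’s method. The `CONCEPTS` table, its identifiers, and the function name are all hypothetical; the identifiers are not real UMLS concept identifiers.

```python
import re

# Hypothetical miniature concept table. The identifiers are made up for
# illustration and are not real UMLS concept identifiers.
CONCEPTS = {
    "heart attack": "CONCEPT:0001",
    "myocardial infarction": "CONCEPT:0001",
    "aspirin": "CONCEPT:0002",
    "blood pressure": "CONCEPT:0003",
}
# Longest phrase length, in words, that the lookup needs to try.
MAX_PHRASE_WORDS = max(len(phrase.split()) for phrase in CONCEPTS)


def map_to_concepts(text):
    """Greedy longest-match lookup of phrases in text against CONCEPTS."""
    words = re.findall(r"[a-z]+", text.lower())
    found = []
    i = 0
    while i < len(words):
        # Try the longest candidate phrase first, then shorter ones.
        for size in range(min(MAX_PHRASE_WORDS, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + size])
            if phrase in CONCEPTS:
                found.append((phrase, CONCEPTS[phrase]))
                i += size
                break
        else:
            # No phrase starting at this word matched; move on.
            i += 1
    return found


if __name__ == "__main__":
    title = "Aspirin after myocardial infarction lowers blood pressure"
    for phrase, concept_id in map_to_concepts(title):
        print(phrase, "->", concept_id)
```

Because synonyms such as "heart attack" and "myocardial infarction" share one identifier, a record indexed under either phrase can be retrieved by a search for the other.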
Searching NLM’s PubMed database takes another approach. Instead of enhancing the indexing of the data, NLMplus semantically enhances the Boolean queries sent directly to the PubMed search service. For non-PubMed NLM files, WebLib’s PolyMeta Distributed Meta-Search and Discovery Engine simultaneously taps into several different NLM search platforms in a federated search.
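One common way to enhance a Boolean query semantically is to OR each user term with its known synonyms and then AND the resulting groups together. The sketch below shows that pattern under stated assumptions: the `SYNONYMS` table and the function names are hypothetical, and the article does not say how NLMplus actually builds its queries.

```python
# Hypothetical synonym table; a real system would draw these from a thesaurus
# or knowledge base rather than a hand-written dictionary.
SYNONYMS = {
    "heart attack": ["myocardial infarction"],
    "aspirin": ["acetylsalicylic acid"],
}


def quote(term):
    """Wrap multiword terms in double quotes so they are searched as phrases."""
    return f'"{term}"' if " " in term else term


def expand_query(terms):
    """AND together the user's terms, OR-ing each one with its synonyms."""
    groups = []
    for term in terms:
        variants = [term] + SYNONYMS.get(term.lower(), [])
        joined = " OR ".join(quote(v) for v in variants)
        # Parenthesize only when there is more than one variant to group.
        groups.append(f"({joined})" if len(variants) > 1 else joined)
    return " AND ".join(groups)


if __name__ == "__main__":
    print(expand_query(["heart attack", "aspirin", "prevention"]))
```

The appeal of this design is that the target search service needs no changes: it receives an ordinary Boolean query, only a broader one than the user typed.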
Sounds pretty nifty, right? But how long will it last? According to Doszkocs, WebLib will keep it up for as long as the company itself is around. He hopes it will show other organizations what WebLib could do for them. “It’s particularly designed to show other apps in other areas of sciences facing the same challenges. Anyone can play with the system. Other organizations are in the same boat, just with different databases, e.g., pharmaceuticals and hospitals.” He is very excited about the WebLib Web Knowledge Base building on the Biomed Knowledge Base. “It has 8 to 10 million topics now covering the collective interests of mankind,” as Doszkocs expresses it.
But times are hard and budgets squeezed. For example, Doszkocs finds the current state of the ToxSeek data service, which WebLib designed, discouraging. “Back when I was at NLM, I was Mr. ToxSeek, but they haven’t done anything with it since I left. It hasn’t been touched in at least a year and it should grow as component databases change. We could add semantic search to it. However, [if] you enter a toxicology topic in NLMplus and click, you will see the hit counts from all the NLM Specialized Information Services databases in one fell swoop.”
Doszkocs remains hopeful. “As our Knowledge Base grows better and better, we should add features to NLMplus. For example, it has no social networking now. We might also add semantic searching to HealthMash, but we would always leave NLMplus up because so many organizations can use it.”