Thomson Scientific (http://www.thomson.com/solutions/scientific), part of The Thomson Corp., recently announced that it has added 2.2 million chemical structures to the PubChem project (http://pubchem.ncbi.nlm.nih.gov), the freely available database for chemical information run by the National Center for Biotechnology Information (NCBI) at the U.S. National Institutes of Health (NIH). Since PubChem claims to have more than 10 million structures, this represents an increase of approximately 22 percent in the size of the database. All of these structures have biologically interesting activity and represent approved drugs as well as compounds that have been investigated by R&D in the pharmaceutical industry.
After you perform a search in PubChem, a link labeled "Thomson Pharma" appears next to the sources. Clicking on this link takes you to the Thomson Pharma site and allows you to log in if you have a subscription to Thomson Pharma (http://www.thomsonscientific.com/thomsonpharma). You can then view the information that is not available for free from PubChem but can be found in Thomson Pharma. It is somewhat disappointing that the only details provided in PubChem are the structures, the molecular weights and formulas, the number of hydrogen bond donors and acceptors, and some other information based on the structures.
If you have a subscription to Thomson Pharma, you can use the PubChem interface to search for compounds. Even without a Thomson Pharma subscription, it is useful to be able to search for compounds of interest. Some of the additional information available from Thomson Pharma includes the following:
- Activities reported for that compound
- Drug reports citing that compound
- Synthetic methods and critical reaction data
- Patents, journals, and news stories that feature the compound
- Synonyms and trade names
- Related salts and isomers
The related structures link—one of the features I like in PubChem—has been included with the structures from Thomson Pharma. Clicking on this link gives a list of compounds that are structurally closely related. This is useful if you want to know what other known compounds are similar to ones that you may be investigating.
A look through the huge list sorted by various factors such as complexity or molecular weight gives some impression of the range of compounds. They range in molecular weight from 5,746.99 down to 3.016 for helium and from very simple to very complex.
But the availability of an enormous number of structures is not without its problems. When viewing the list sorted by molecular weight (ascending), I found 58,984 compounds that don't have any information other than the structures, and many of those were very difficult to view. It is possible to download structures in various formats such as XML, ASN.1 (Abstract Syntax Notation One), and SDF (structure-data file). If you save the PubChem structure in SDF, you can then view the molecule with ChemDraw to get a better idea of its exact structure. However, this is an awkward way to figure out what the structure is, and in my opinion it is the biggest drawback of PubChem. It would be nice to have an easy way to view a structure that actually conveys information. I'm hopeful that this issue will be corrected in the near future.
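For readers who would rather inspect a saved SDF record without opening a drawing program, a few lines of script can at least list the atoms and bonds it contains. The following is a minimal sketch, assuming the record uses the fixed-width V2000 layout that SDF files commonly follow. The sample record is typed by hand for illustration and was not downloaded from PubChem, and the names `SAMPLE_SDF`, `parse_molblock`, and `element_summary` are hypothetical.

```python
from collections import Counter

# Hand-written V2000 record for water, for illustration only
# (an assumption, not an actual PubChem download).
SAMPLE_SDF = "\n".join([
    "water",
    "  hand-written sample",
    "  illustrative record only",
    "  3  2  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0",
    "    0.9572    0.0000    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0",
    "   -0.2400    0.9266    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0",
    "  1  2  1  0  0  0  0",
    "  1  3  1  0  0  0  0",
    "M  END",
    "$$$$",
])


def parse_molblock(text):
    """Return (name, atom symbols, bonds) from the first record in an SDF string."""
    lines = text.splitlines()
    # Line 4 of a V2000 record is the counts line: atom and bond counts
    # sit in two fixed-width fields of three characters each.
    counts_line = lines[3]
    n_atoms = int(counts_line[0:3])
    n_bonds = int(counts_line[3:6])
    atom_lines = lines[4:4 + n_atoms]
    bond_lines = lines[4 + n_atoms:4 + n_atoms + n_bonds]
    # Three 10-character coordinates and a space precede the element symbol.
    atoms = [line[31:34].strip() for line in atom_lines]
    # Each bond line holds first atom, second atom, and bond type.
    bonds = [(int(line[0:3]), int(line[3:6]), int(line[6:9])) for line in bond_lines]
    return lines[0].strip(), atoms, bonds


def element_summary(atoms):
    """Build a simple formula string with elements in alphabetical order."""
    counts = Counter(atoms)
    return "".join(f"{el}{n if n > 1 else ''}" for el, n in sorted(counts.items()))


if __name__ == "__main__":
    name, atoms, bonds = parse_molblock(SAMPLE_SDF)
    print(name, element_summary(atoms))
    for first, second, order in bonds:
        print(f"atom {first} ({atoms[first - 1]}) bonded to "
              f"atom {second} ({atoms[second - 1]}), bond type {order}")
```

This only reads the first record and ignores charges, stereochemistry, and any data fields after the structure, so it is a quick sanity check and not a substitute for a proper structure viewer.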
There were also quite a few compounds that had no chemical structure, merely an identification number, which basically makes them worthless. I'm unsure why these are included, since the entries contain no information, at least within PubChem. These compounds cannot be found unless you browse through the list. There may be information on these entries in Thomson Pharma, but there is no way to know that from within PubChem. It would have been better to exclude these entries or at least give some indication that further information is contained within Thomson Pharma.
While it is exciting to see content from a major information source such as Thomson Pharma get incorporated into a freely available government-sponsored database such as PubChem, there are still drawbacks. Increasing the number of compounds by 22 percent is helpful, but not if some of the information is practically worthless. If PubChem can solve some of these issues in cooperation with Thomson, then this will be a truly outstanding addition. If not, it is still of benefit, but it will not reach its true potential.
A list of all the compounds from Thomson Pharma contained in PubChem is available at http://tinyurl.com/yjtamv.
PubChem had an issue related to improper handling of partial valence ions in structures supplied by Thomson that caused certain structures not to display. PubChem is aware of the problem and will be fixing it, and the structures will display in due course.