Chemical Informatics Letters

Volume 11, Issue 3; September 2005

Editor: Jonathan M Goodman

ACS open letter on PubChem - number two
William Carroll, the president of the ACS, has written a second open letter about the NIH's PubChem, urging the NIH to advance its molecular libraries initiative but to avoid replicating the CAS Registry. This is in response to a letter from the NIH on August 2nd. The ACS offers to make available, for free, a database of all NIH screening and bioassay data. This is a substantial offer of funds and resources, but it would prevent PubChem from continuing to provide all available information on the biological activities of small molecules. The NIH has rejected the offer.

The letter is "open", but fuzzy: it can only be printed as a low-resolution version, which is barely legible on a black and white printer. If this is an "open" letter, what is an "open database"? See also: Chem. Inf. Lett. 2005, 11, #1, 1 and Chem. Inf. Lett. 2005, 10, #6, 72.

Degrees in Culinology
Chemistry is not cooking. Cooking is the distinct discipline of culinology. The word is a trademark of the Research Chefs Association, and there are a number of degree courses in culinology, including one at the University of Nebraska-Lincoln.

WWW growth slowing
The BBC reports that growth of the WWW is slowing. A study of the chemistry web (Org. Biomol. Chem. 2004, 2, 3222-3225) gives data which suggest that growth will plateau within a few years.

Professor Jorge E. Hirsch has proposed the "h-number" as an index of scientific output. A scientist has the number h if h of their published papers have each been cited h or more times. Physics World and Nature (subscribers only) comment on the idea. This provides an alternative to just counting citations, as a good h-number requires the publication of a number of well-cited papers.
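
The definition is easy to turn into a calculation. The following is a minimal sketch, not Hirsch's own procedure: the function name h_number and the citation counts are invented for illustration. It sorts the citation counts in descending order and finds the last rank at which the paper still has at least that many citations.

```python
def h_number(citations):
    """Return the largest h such that h papers have at least h citations each."""
    # Rank the papers from most cited to least cited.
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank   # the top `rank` papers all have at least `rank` citations
        else:
            break      # counts only fall from here on, so h cannot grow further
    return h


# Hypothetical citation counts for one scientist's papers (illustrative only).
example_citations = [25, 8, 5, 3, 3, 1, 0]
print(h_number(example_citations))  # prints 3
```

In this invented example, three papers have three or more citations, but there are not four papers with four or more, so the h-number is 3 even though the total citation count is 45.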

InChI: Chemical Names and Structures
The InChI chemical identifier is gathering importance. ACD/Labs provides a free implementation. An InChI generation web service is available from the Unilever Centre for Molecular Science Informatics. The software to generate InChIs is available from IUPAC, and supporting resources are also available.

The Wellcome Trust Sanger Institute
The Wellcome Trust Sanger Institute, formerly known as the Sanger Centre, has sequenced almost three billion bases. What keeps it going? Thousands of processors and terabytes of storage use three quarters of a megawatt of power, according to Roland Piquepaille's Technology Trends. Keeping the institute's computers going requires about one thousand horsepower.
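
The horsepower figure can be checked with a short calculation. This sketch assumes mechanical (imperial) horsepower at roughly 745.7 W; the constant and function name are chosen here for illustration and do not come from the original report.

```python
# Assumption: one mechanical horsepower is about 745.7 W.
# (Metric horsepower, about 735.5 W, would give a slightly larger result.)
WATTS_PER_MECHANICAL_HP = 745.7


def watts_to_horsepower(watts):
    """Convert a power in watts to mechanical horsepower."""
    return watts / WATTS_PER_MECHANICAL_HP


power_watts = 0.75 * 1_000_000  # three quarters of a megawatt
print(round(watts_to_horsepower(power_watts)))  # prints 1006
```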

On-line publishing financial models
On-line publishing has much lower costs than traditional printed journals, but it is still not easy to break even. Technology Research News has 200 000 visitors a month, but needs to ask for donations. How does this compare to major, expensive, science publishers? It repackages and sifts, rather than peer-reviewing original papers, and needs to pay fewer than five full-time equivalent staff. The Internet Journal of Chemistry has stopped accepting new papers, despite its modest subscription cost. The Beilstein Journal of Organic Chemistry is financed by the Beilstein Institute.

CSS: Cascading Style Sheets
CSS is a simple mechanism for adding style to Web documents, and is very widely used. Tutorials and lists of CSS resources are available.

The end of the public domain
In this article, Lawrence Lessig, Professor of Law at Stanford and chair of the directors of Creative Commons, suggests that the days of information being available in the public domain may be numbered.

How many papers are correct?
In an article in PLoS Medicine, John Ioannidis of the University of Ioannina argues in support of his title "Why Most Published Research Findings Are False". However, assertions such as "Most Research Findings Are False for Most Research Designs and for Most Fields" are only justified for a specific pattern of research which may be common in parts of medicine, but is not typical of all areas of the scientific community. Despite this, the story is highlighted in the New Scientist. The type of study the paper considers may not be mainstream chemistry, but may be common in chemoinformatics and chemical informatics.

Who will win the Nobel Prize in Chemistry?
This year's Nobel Prize in Chemistry will be announced on October 5th. The ISI has predicted winners in 2002, 2003 and 2004, so far without success. This year's predictions are available. Earlier predictions based on both citations and prizes worked rather better.

The Nobel Foundation clearly do not rigidly choose the most cited chemists for the prize. Is there anything that can be easily measured that might reflect some of the committee's considerations? Suitable achievements will probably be marked by much internet discussion, as well as notable papers. Combining the number of results from a Google search for "name Nobel Chemistry" with the scientist's h-number might give a measure of this.

The Web of Knowledge cannot be asked directly for the scientists with the highest h-number, and so, taking a lead from the ISI list, here are some approximate results for the product of the h-number and the Google hit number:

The figure for Evans is misleading, because the name "Evans" is a common one. The figure for Stork is probably too low, as the survey focuses on the last couple of decades. Schreiber appears to be ahead, on this crude index. However, if the prize were to be given to three people for an area such as synthetic chemistry or supramolecular chemistry, Grubbs, Nicolaou and Stork have an edge over Stoddart, Whitesides and Shinkai.
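
For anyone wishing to repeat the exercise, the crude index is simply the h-number multiplied by the number of Google hits. The sketch below shows the ranking step only. The candidate labels and all of the numbers are placeholders invented for illustration, not the figures discussed above, and the function names are hypothetical.

```python
def combined_index(h_number, google_hits):
    """Crude index: the product of the h-number and the Google hit count."""
    return h_number * google_hits


def rank_candidates(candidates):
    """Return (name, index) pairs sorted from highest to lowest index.

    `candidates` is a list of (name, h_number, google_hits) tuples.
    """
    scored = [(name, combined_index(h, hits)) for name, h, hits in candidates]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Placeholder inputs only: these are NOT real h-numbers or hit counts.
placeholder_candidates = [
    ("Candidate A", 50, 1200),
    ("Candidate B", 80, 400),
    ("Candidate C", 30, 3000),
]

for name, score in rank_candidates(placeholder_candidates):
    print(name, score)
```

A product rewards candidates who score well on both measures, but, as the Evans example shows, a common surname can inflate the hit count and so distort the result.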

ORTEP (Oak Ridge Thermal Ellipsoid Plot Program) figures, which were once the commonest of three-dimensional representations of molecules, are still available, and the program to create them can be downloaded from the ORTEP web site.

© 2005 J M Goodman, Cambridge
