Chemical Informatics Letters

Editor: Jonathan M Goodman

Volume 12

Open Access and the Royal Society
Forty-six Fellows of the Royal Society have signed an open letter to the President expressing concern about the Society's statement on open access.

ACS meeting archive
Programmes of American Chemical Society National Meetings from 1998 onwards are now available on-line.

Finding chemical names in papers and patents
Elsevier MDL and TEMIS have developed a method for finding chemical information in text. Chemical names are found and translated into structures. A similar facility is provided by nesC, which was developed by the Nature Publishing Group and the Unilever Centre for Molecular Informatics.

A study of the effects of alternative business models for scholarly journals
This study of open-access business models is published by the Association of Learned and Professional Society Publishers, which clearly has an interest in the outcome. The survey concludes that it is too early to know whether open-access approaches have a viable business model, notes that scholarly publishing is in a state of flux, and comments that peer review may be less rigorous for open-access journals. The last of these points is the subject of an addendum, which does not make a clear case for the assertion. An Open Letter to All University Presidents and Provosts Concerning Increasingly Expensive Journals, from two Californian economists, says that the symbiotic relationship between academics and publishers has broken down, and suggests charging publishers overheads for editorial services.

DGRweb (Directory of Graduate Research) now free online
DGRweb, the online version of the ACS Directory of Graduate Research, is now freely available. It covers graduate research in the USA and Canada.

Maestro free for academic users
Schrödinger's Maestro graphical user interface for its computational chemistry programs is now free for academic users.

TeraGrid is a grid of computers run by the NSF, with over forty teraflops of computing power and over two petabytes of storage. Research using the resource includes protein sequence analysis but, so far, does not seem to include chemistry.

The ChemNomParse project is an open-source project, run by the Computer Science Department at the University of Manchester, to create a chemical nomenclature parser. The library is now part of the CDK.

Fifty years of citation indexing
Current Science has a special section on citation counting, now that fifty years have passed since the idea was suggested by Garfield (Science, 1955, 122, 108-111; an article which was cited dozens of times in 2005). The Chronicle of Higher Education also has an article on impact factors, which are based on citations. These measures of excellence can be misleading.
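The usual two-year impact factor for year Y is the number of citations received in Y by items a journal published in Y-1 and Y-2, divided by the number of citable items it published in those two years. A minimal Python sketch of that arithmetic follows; the function name and the citation counts are hypothetical, chosen only to show how a single highly cited paper can pull the average well away from the typical paper, which is one way such a measure can mislead.

```python
from statistics import median


def impact_factor(citations_per_item):
    """Two-year impact factor for year Y.

    citations_per_item: one entry per citable item published in years
    Y-1 and Y-2, giving the citations that item received during year Y.
    """
    if not citations_per_item:
        raise ValueError("need at least one citable item")
    return sum(citations_per_item) / len(citations_per_item)


# Hypothetical journal with ten citable items; one paper is very highly cited.
counts = [0, 0, 1, 1, 1, 2, 2, 3, 4, 46]
print(impact_factor(counts))  # 6.0
print(median(counts))         # 1.5
```

Here the impact factor is 6.0, although half of the papers received fewer than two citations.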

Scientific publishing since Oldenburg
Proceedings of the meetings of the Association of Research Libraries are available on-line. The proceedings of the 138th meeting, which focussed on Creating a Digital Future, include an article on the history of scientific publishing starting from the time of Henry Oldenburg, who founded the Philosophical Transactions of the Royal Society of London in 1665.

OSET: Organic Synthesis Exploration Tool
The OSET project from the School of Chemistry at the National Autonomous University of Mexico aims to develop a Computer-Assisted Organic Synthesis (CAOS) program for use in the teaching of organic chemistry. The program may be downloaded from the website, and was last updated in 2002.

SciFinder Scholar for Mac OS X
A version of SciFinder Scholar for Mac OS X has been released.

800 billion bases at the Sanger Institute
In January the Wellcome Trust Sanger Institute database of DNA sequences reached one billion entries. At 22 TB it is one of the biggest scientific relational databases in the world.

The European Ligand Bank is a database of ligands for catalytic reactions in synthetic organic chemistry. Applications for membership are now invited.

Chemical Thesaurus
An index of chemical entities, from meta-synthesis, with about six thousand entries.

OpenDOAR - Directory of Open Access Repositories
This directory of open access repositories became available in January (Press release: PDF) and currently contains 353 entries, all of which have been checked. Over a hundred chemistry resources are listed.

WWW statistics
This survey of a billion web pages, carried out in December, provides statistics on the types and styles of mark-up currently favoured. Most web pages have fewer than a hundred HTML tags, and the most common number of distinct tag types on a page is nineteen.
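The survey's own method is not described here. As an illustration only, the two statistics it reports (the total number of tags on a page and the number of distinct tag types) could be computed for a single page with Python's standard html.parser module. Counting start tags only, the function name and the sample page are all assumptions.

```python
from collections import Counter
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Counts start tags in an HTML document.

    Self-closing tags such as <br/> are counted once, because the default
    handle_startendtag calls handle_starttag.
    """

    def __init__(self):
        super().__init__()
        self.tags = Counter()

    def handle_starttag(self, tag, attrs):
        self.tags[tag] += 1


def tag_statistics(html):
    """Return (total number of tags, number of distinct tag types)."""
    parser = TagCounter()
    parser.feed(html)
    parser.close()
    return sum(parser.tags.values()), len(parser.tags)


page = ("<html><head><title>CIL</title></head>"
        "<body><p>One<br/>two</p><p>three</p></body></html>")
print(tag_statistics(page))  # (7, 6)
```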

IUPAC provisional recommendations
IUPAC has made provisional recommendations for Quantities, Units and Symbols in Physical Chemistry, a Glossary of Terms Relating to Pesticides and an Explanatory Dictionary of Key Terms in Toxicology. The abstracts and full texts are available.

Search Engines
New search engines since Chem. Inf. Letters 2004, 8, 7 include Amazon's A9, which finds places and gives pictures of the streets. For example, a search for 'Lensfield Road' produces a picture of the Cambridge Chemistry Department, which is on Lensfield Road, several pictures of the road and other buildings on it, and a few chemical schemes which were published in papers from a Lensfield Road address. Microsoft's search engine, MSN Search, has been updated. Yahoo has developed Y!Q, which analyses the web page you are reading and suggests related pages. AskJeeves has also been updated and no longer has a butler icon. Google still seems to lead the field, but its competitors are also doing well. Companies which try to raise websites' profiles on search engines seem to be doing good business, suggesting that search results can be manipulated. Ideas for search engine optimisation are available. A survey from SearchEngineWatch suggests that, whilst search engines are sophisticated and intelligent, their users are not.

ACS electronic reprint policy
ACS Publications has recently modified its electronic reprint policy. Authors may distribute fifty free electronic reprints for the first year after publication. After this time, there is no restriction on the number of reprints accessed through author-directed links.

This Protein Information Management System is sponsored by the BBSRC.

Structure-Function Linkage Database (SFLD)
A database that links evolutionarily related sequences and structures from mechanistically diverse superfamilies of enzymes to their chemical reactions. It is developed at the Babbitt and Ferrin labs at UCSF.

pKa Data and Calculation
Many programs are available for the calculation of the pKa of molecules, usually based on the correlation of sigma constants of functional groups with pKa. Some free programs and databases are listed here.
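The sigma-constant correlation mentioned above can be sketched with the classic Hammett form, pKa = pKa0 - ρΣσ, where pKa0 is the parent acid and ρ measures how sensitive the reaction is to substituents. The sketch below assumes substituted benzoic acids in water (pKa0 about 4.20, ρ = 1 by definition) and a handful of textbook σ values; it also assumes substituent effects simply add. Real programs use much larger fitted parameter sets, and the function name is hypothetical.

```python
# Illustrative Hammett sigma constants (textbook values) for substituents
# on a benzene ring; keys are (substituent, position relative to COOH).
SIGMA = {
    ("NO2", "para"): 0.78,
    ("NO2", "meta"): 0.71,
    ("Cl", "para"): 0.23,
    ("Cl", "meta"): 0.37,
    ("OCH3", "para"): -0.27,
}

# Parent acid and reaction constant: substituted benzoic acids in water at
# 25 C, for which rho is 1 by definition.
PKA_BENZOIC_ACID = 4.20
RHO_BENZOIC_ACID = 1.00


def hammett_pka(substituents, pka0=PKA_BENZOIC_ACID, rho=RHO_BENZOIC_ACID):
    """Estimate pKa = pKa0 - rho * sum(sigma), assuming additive effects."""
    total_sigma = sum(SIGMA[s] for s in substituents)
    return pka0 - rho * total_sigma


print(round(hammett_pka([("NO2", "para")]), 2))   # 3.42
print(round(hammett_pka([("OCH3", "para")]), 2))  # 4.47
```

An electron-withdrawing group (positive σ) lowers the predicted pKa, and an electron-donating group (negative σ) raises it.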

Punch List of Best Practices for Electronic Resources
This document published by the Engineering Libraries Division of the American Society for Engineering Education provides guidelines for the development of on-line resources.

Biology Direct
This new journal has an unusual system of peer review: the process will be open rather than anonymous, and the reviewers' reports will be published with the articles. A similar approach is taken by the European Geosciences Union, where papers are made available for open discussion before being published. A number of essays on peer review are available from Eugene Garfield, the founder of the ISI.

The "world's most comprehensive catalog of information on proteins" now uses a Creative Commons license. Science Commons has an FAQ on database licensing.

Stigler's law of eponymy
"No scientific discovery is named after its original discoverer." Chemical examples may be found amongst named reactions.

InChI on-line tools
Bedrich Kosata (the author of BKChem) has a new site, including an on-line converter which generates pictures (and molfiles) from InChI and SMILES strings.

Elsevier MDL collaborates with NIH on PubChem
Elsevier MDL and NIH are continuing to collaborate on PubChem following the earlier contribution of XPharm pharmacological data.

Free access to RSC journal archives for developing countries
The Royal Society of Chemistry has announced free access to its journal archives for developing countries.

PSIgate is changing as part of a restructuring of the Resource Discovery Network (RDN), and becoming Intute.

PubChem Accuracy
How accurate is PubChem? How can this be tested? A search for 'acetone' finds acetone, acetone ketazine and acetone cyanhydrin. A search for mannitol produces a structure without stereochemistry, and gives dulcitol, galactitol, D-mannitol and sorbitol as synonyms, although galactitol (dulcitol), mannitol and sorbitol are different stereoisomers. This is probably useful information, but is somewhat misleading, and so illustrates both the strengths and limitations of PubChem.

Chemistry Journals
Hindawi Publishing has embraced Open Access publishing and now has more than thirty open access titles, including EURASIP Journal on Bioinformatics and Systems Biology and Bioinorganic Chemistry and Applications (not to be confused with Bioinorganic Chemistry and Applications from Freund Publishing). All these journals are listed in c2k.

Google Scholar
How good is Google Scholar? A survey of citation counts shows it compares favourably with the Web of Science, but notes that it is not clear how it is achieving this. How consistent is it, and how reproducible are its results? Another article makes similar points. Two blogs (schoogle and the UBC Google Scholar Blog) monitor Google Scholar's development.

Visa problems for scientists visiting the USA
On 9 February 2006, ICSU President Goverdhan Mehta, FRS, was initially refused a visa to the USA. The Washington Post reports that State Department officials maintain that a letter saying "you have been refused a visa" was not a rejection. The incident has received much publicity. The ACS's Chemical and Engineering News reports that his expertise in chemistry meant he was considered a security threat.

CODATA Workshop on Strategies for Permanent Access to Scientific Information
The CODATA Workshop on Strategies for Permanent Access to Scientific Information in Southern Africa: Focus on Health and Environmental Information for Sustainable Development, has made its final report available.

Nature has changed from one journal into a family of thirty journals, at correspondingly increased cost to libraries.

Science on retracted papers
Science has articles on retracted papers. What happens after papers are retracted? They may continue to be cited. Publishers do not have a consistent policy on marking the on-line versions of retracted papers. A paper has recently been retracted from the Journal of the American Chemical Society, and it is clearly marked as such in the on-line table of contents.

Exploratory Centres for Cheminformatics Research
The NIH called for proposals for cheminformatics research, and six cheminformatics centres have been set up as a result.

Newton the chemist
As well as being a mathematician, physicist and Member of Parliament, Isaac Newton studied chemistry.

Pirelli Internetional Award
In May 2006, registration will open for the Eleventh Pirelli INTERNETional Award, an international multimedia competition for the communication of science and diffusion of scientific and technological culture entirely carried out on the Internet.

ACS president's views on open access
Ann Nalley, the ACS president, has written to members with her views on open access publishing. She asks "Is this change for its own sake?" This leads on from her predecessor's letters on similar issues (see Chem. Inf. Lett. 2005, 11, #4, 48).

Ugly web sites can work well
The most beautiful websites are not necessarily the most successful. This article suggests that content is more important than presentation, and that a plain design may also convey a sense of trust.

Wiley and e-Journal Archiving
How long will on-line access continue? What happens when journals, or even publishers, cease to operate? Can long-term access to information be guaranteed? Wiley has addressed this issue by joining the Portico archive, an electronic-archiving initiative launched by JSTOR. Should a participating publisher cease to make its e-journal content available, there is a mechanism for Portico to keep providing that content to subscribers. The CLOCKSS project, derived from the Stanford University library's LOCKSS (Lots of Copies Keep Stuff Safe), may also be used to ensure continuing availability of journals.

Compulsory copyright law classes?
California is considering insisting that copyright law is taught to students, while the USPTO considers improving software patents. It has been suggested that open-source software has been an important driver in this. The current system allows too much to be patented.

CoEPrA 2006 - Comparative Evaluation of Prediction Algorithms
The CoEPrA 2006 competition is now open. CoEPrA (Comparative Evaluation of Prediction Algorithms) is a modelling competition to provide objective testing for classification and regression algorithms via the process of blind prediction.
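The scoring rules CoEPrA actually uses are not given here. As an assumption, the sketch below scores blind predictions with two common choices: root-mean-square error for regression and accuracy for classification. The function names and all the data are invented. The essential point of blind prediction is that the organisers hold the true values back until the predictions have been submitted.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error for a regression task."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need equal-length, non-empty lists")
    return math.sqrt(
        sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    )


def accuracy(observed, predicted):
    """Fraction of correctly predicted class labels."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need equal-length, non-empty lists")
    return sum(o == p for o, p in zip(observed, predicted)) / len(observed)


# Hypothetical answers, kept hidden by the organisers until submissions close.
hidden_values = [1.0, 2.0, 3.0, 4.0]
hidden_labels = ["active", "inactive", "active", "inactive"]

# Hypothetical blind predictions from one participant.
submitted_values = [1.5, 2.0, 2.5, 4.0]
submitted_labels = ["active", "active", "active", "inactive"]

print(round(rmse(hidden_values, submitted_values), 3))  # 0.354
print(accuracy(hidden_labels, submitted_labels))        # 0.75
```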

List of conferences
Chemical Informatics Letters has a new conference list, focussed on chemical informatics and related areas. If you would like to suggest a conference for the list, please send an e-mail with the subject line "CIL conference". The e-mail should have the structure:
Line 1: Dates
Line 2: Title
Line 3: URL
Line 4: blank
Subsequent lines: any other information (this is unlikely to be included on the website)
Conferences suggested in this way will be considered for inclusion on the list.
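For anyone preparing several suggestions, a small helper can lay out the message in the requested form. This is a hypothetical Python sketch: the function name, the example conference and the example.org URL are invented, and the resulting text still has to be sent by ordinary e-mail.

```python
def conference_email(dates, title, url, notes=""):
    """Build the subject line and body in the layout CIL asks for.

    Line 1: dates, line 2: title, line 3: URL, line 4: blank,
    then (optionally) any other information.
    """
    subject = "CIL conference"
    lines = [dates, title, url, ""]  # the empty string gives the blank line 4
    if notes:
        lines.append(notes)
    return subject, "\n".join(lines)


subject, body = conference_email(
    "1-3 September 2006",
    "Example Chemical Informatics Meeting",
    "http://www.example.org/meeting",
    "Hypothetical entry used only to show the layout.",
)
print(subject)
print(body)
```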

Courses on chemical informatics
University and college courses in chemical informatics were surveyed in Chem. Inf. Lett. 2005, 10, #5, 50. There are now a large number of course modules on chemical informatics, but only a few degrees.

Structure Based Drug Design Conference
The Cambridge Healthtech Institute conference on drug design starts on June 14th. A list of other chemical informatics related conferences is available.

New antibiotic
A new antibiotic, platensimycin, which is effective against MRSA, has been discovered.

Software patents
In response to a question from an MEP, the European Commission has indicated that software is not patentable.

Boring titles are best
Entertaining headlines catch people's attention, but not search engines. An article in the New York Times suggests that search engines favour simple headlines.

Mining chemical structural information from the drug literature
It is easier to find too much information than the right information. This article discusses the issue, but omits the Experimental Data Checker, which is freely available.

Wikipedia accuracy
How accurate is the Wikipedia? The quality of the information usually seems high. Professor Martin Walker, of the State University of New York at Potsdam, presented a paper on chemical information in the Wikipedia (PDF) at the 231st ACS National Meeting in Atlanta. He concluded that the Wikipedia's chemistry content is growing quickly and is usually accurate. However, it is not accurate all of the time. Is this problem unsolvable, or just unsolved?

Sharing Data
The development of new drug therapies is very hard, and sharing data can make commercial reward harder to obtain. However, it may make it easier to find new therapies, and some groups are investigating data-sharing approaches.

Crystallisation is often the rate-determining step in the analysis of proteins and other molecules, as there are no procedures which guarantee crystal production. A group at Imperial College has discovered that a nucleating agent can help the process in many cases.

Does the open-source software mechanism work well?
An article in the Economist discusses the effectiveness of open-source software. Despite problems with the model, the open-source approach has many advantages.

ChemAxon has announced free access to a cheminformatics toolkit for non-commercial freely accessible web resources (Press release: PDF).

Food Standards Agency
The Food Standards Agency is an independent UK Government department set up in 2000 to protect the public's health and consumer interests in relation to food.

Chemistry 2000 (c2k)
Chemistry 2000 (c2k) is continuing to provide up-to-date information about chemistry departments, learned societies and chemistry journals around the world. The last report appeared in June 2005 (Chem. Inf. Lett. 2005, 10, #6, 69). The database now lists 1860 departments from 139 countries. The United States of America has the most departments (637, down slightly from last year), followed by France (101) and then Germany. Britain is in fourth place. The French academic system is not arranged in the same way as the British one, and this probably inflates the number of departments listed. There are now 2989 sites listed in total (departments, learned societies and journals), of which about thirty are inaccessible in a typical month. The database has slightly fewer entries than last year.

Google Trends
Google Trends gives a measure of the popularity of particular subjects amongst all Google searches. For example, it is possible to compare chemistry, physics and biology. As a proportion of their total searches, the Philippines, Pakistan and India use these search terms more than other countries. In the USA, chemistry is the most popular of the three, but in the UK, biology and chemistry are almost equal.

Public Science in the USA
The Cornyn-Lieberman bill (PDF) would require the NIH and other US government agencies to create an on-line list of all publicly accessible research papers. Some publishers and learned societies oppose the bill, despite the benefit of easier access to publicly funded research.

InterDok has announced that its Directory of Published Conference Proceedings is now free.

Chmoogle is now eMolecules. The site claims to put 'the world's most powerful cheminformatics system into the hands of the "common chemist"' although its results usually seem to be limited to information from PubChem and some chemical suppliers.

Electrochemical Society
The Electrochemical Society has announced the opening of the ECS Digital Library.

Spectroscopy Now
SpectroscopyNow (Chem. Inf. Lett. 2002, 5, #1, 3) and separationsNOW, two free websites from Wiley, have been relaunched.

CAS Information Use Policies
CAS Registry Numbers are subject to the CAS information use policies, under which a database may use no more than ten thousand of them without paying for a licence.

Needles in haystacks
Is it possible to find information from noisy data? Ramani Pilla and colleagues from Case Western Reserve University have reported a new approach in Phys. Rev. Lett., which has a geometrical interpretation as well as broad application.

IUPAC-NIST Solubility Data Series Database
With the March 2004 update, this freely available database contains over 67,500 solubility measurements.

Garfield's essays
Eugene Garfield founded the Institute for Scientific Information (now Thomson Scientific). A collection of his essays and comments is available, including remarks on citations and peer review.

The discussions between the ACS and the NIH over PubChem were last mentioned in Chem. Inf. Lett. 2005, 11, #4, 48. A letter from the past president of the ACS, William Carroll, was sent to the NIH in March, according to the SPARC OSForum, which obtained it under the US Freedom of Information Act. The response from the NIH is also on the same forum. A blogger has also sent a response.

© 2000-2006 J M Goodman, Cambridge; Chemical Informatics Letters ISSN 1752-0010