Chemical Informatics Letters

Editor: Jonathan M Goodman

Volume 9

Chemical Informatics Letters is now available via RSS, which can be viewed in a web browser here, using the RDN's viewer, or by loading it into any RSS reader. It has been suggested that Google is considering RSS, but it currently prefers Atom.
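
For readers who would rather process a feed themselves than use a reader, the following is a minimal sketch of how the items in an RSS 2.0 document can be listed with the Python standard library. The sample feed, its titles and its example.org addresses are placeholders invented for illustration; they are not the contents of the Chemical Informatics Letters feed.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 document: titles and URLs are placeholders only.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example newsletter</title>
    <link>http://example.org/</link>
    <description>Placeholder feed</description>
    <item>
      <title>First item</title>
      <link>http://example.org/1</link>
    </item>
    <item>
      <title>Second item</title>
      <link>http://example.org/2</link>
    </item>
  </channel>
</rss>"""


def list_items(feed_xml):
    """Return (title, link) pairs for every <item> in an RSS 2.0 channel."""
    root = ET.fromstring(feed_xml)
    return [
        (item.findtext("title"), item.findtext("link"))
        for item in root.findall("./channel/item")
    ]


if __name__ == "__main__":
    for title, link in list_items(SAMPLE_FEED):
        print(title, link)
```

A real feed would be downloaded first and its text passed to list_items; Atom feeds use different element names and namespaces, so they would need a separate function.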

Statistics can be wrong
This paper in BMC Medical Research Methodology concludes that statistical practice is generally poor, even in renowned scientific journals.

Scientific Publications: Free for all?
The Science and Technology Committee of the UK Parliament has published its report on scientific publications. It concludes that the current model of publishing is unsatisfactory and encourages the creation of institutional repositories.

PubChem and the NIH
The NIH has announced the establishment of a Chemical Genomics Center, which is the first component of a network that will produce chemical tools for use in biological research and drug development. To support this, a repository of chemical compounds will be established and deposited in a central database, called PubChem (PowerPoint presentation), which will be managed by the National Center for Biotechnology Information and will be freely available to the entire scientific community as part of its Molecular Libraries initiative.

Open Access
The US House of Representatives has recommended that the NIH provide free access to all the research it funds. Elias A. Zerhouni, the director of the NIH, has asked publishing executives to tell him how best to manage material so that the public can freely use it. Alan Leshner, chief executive officer of the American Association for the Advancement of Science, was concerned about releasing articles immediately, but not worried about releasing them after six months. The Proceedings of the National Academy of Sciences USA (PNAS) now has an open access option, whereby authors can pay a thousand dollars to make their article freely available on-line. PNAS is already freely available in over 140 countries.

Changing use of .edu
The .edu suffix used to be a clear indication that a site came from a university in the USA. However, this is no longer the case. Non-USA institutions with a .edu suffix include:

Chemical Entities of Biological Interest (ChEBI) is a freely available dictionary of small molecular entities, from the European Bioinformatics Institute (EBI).

The Challenges with Substance Databases and Structure Search Engines
This article by Helen Cooke (Schofield) and Damon Ridley (Australian Journal of Chemistry, 2004, 57(5), 387-392) outlines the limitations of the use of connection tables to describe chemical structures. These cannot easily cope with polymers or catenanes, for example. It then describes how some search engines deal with the problems.

Biomedical Acronyms
This database of biomedical acronyms, constructed in the Garner Group at the University of Texas Southwestern Medical Center, was created in an automated process from MEDLINE and contains over two hundred thousand acronyms.

The International Centre for Diffraction Data is a non-profit scientific organization dedicated to collecting, editing, publishing, and distributing powder diffraction data. The Powder Diffraction File contains 279,864 unique entries. Site licensing is available.

Computer Science Bibliography
This index of articles has over half a million entries, and so a search for 'chemistry' has many hits.

Is chemistry getting easier?
The June 2004 list of the top 500 supercomputers has a diagram showing subject areas. Chemistry is getting less attention than it was, relatively speaking. Does this mean the subject is getting easier, or the problems easy enough for supercomputers have all been solved?

The popularity of RSS feeds appears to be increasing. There is an index of academic library RSS feeds at Iowa State, RSS access to's section for Chemistry News, and at NewsIsFree's science section: Science News.

Open Patent Services
This site delivers raw patent data in XML. As a consequence, it can be accessed by other searching services, such as the free IP news flash.

CAS science spotlight
A new service from Chemical Abstracts which highlights the most requested, the most intriguing and the most cited chemical research.

Crystallography Open Database
This database, last updated in September 2003, has 12 000 crystal structures.

IUPAC Chemical Identifier
A new test version (documentation and sample structure files, but Microsoft Windows only) of the IUPAC-NIST Chemical Identifier (INChI) is now available.

The Chemistry Style Manual
This book, by Professor Kieran Lim, may be downloaded from the web, and provides useful advice on chemistry writing.

How safe is DES?
The NIST has decided that DES (as described in the PDF FIPS46-3) is no longer secure enough to protect information for the USA federal government (PDF). Instead, AES should be used (FIPS197 PDF).
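
The gap between the two standards can be illustrated with some simple key-space arithmetic. This is only a sketch of the brute-force argument, using the key lengths given in the standards themselves (an effective 56-bit key for DES in FIPS 46-3; 128, 192 or 256 bits for AES in FIPS 197). It says nothing about other kinds of attack, and the function names are chosen here for illustration.

```python
# Key lengths in bits, as specified in FIPS 46-3 (DES) and FIPS 197 (AES).
DES_KEY_BITS = 56
AES_KEY_BITS = (128, 192, 256)


def keyspace(bits):
    """Number of distinct keys for a key of the given length in bits."""
    return 2 ** bits


def brute_force_ratio(bits_strong, bits_weak):
    """How many times larger the stronger key space is than the weaker one."""
    return keyspace(bits_strong) // keyspace(bits_weak)


if __name__ == "__main__":
    print("DES keys:", keyspace(DES_KEY_BITS))
    for bits in AES_KEY_BITS:
        ratio = brute_force_ratio(bits, DES_KEY_BITS)
        print(f"AES-{bits} has {ratio:.3e} times as many keys as DES")
```

Even the shortest AES key gives a search space 2^72 times larger than that of DES, which is why an exhaustive search that is feasible for DES is not a realistic threat to AES.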

Information about cheminformatics, ontologies and statistical mining. Cosmas is the patron saint of the chemical industry and chemical manufacturers.

CERN Document Server (CDS)
Over 650,000 bibliographic records, including 320,000 fulltext documents, of interest to people working in particle physics and related areas.

Collaboratory for Multi-scale Chemical Science
The Collaboratory for Multi-scale Chemical Science (CMCS) brings together leaders in scientific research and technological development across multiple DOE laboratories, other government laboratories and academic institutions to develop an open "knowledge grid" for multi-scale informatics-based chemistry research. CMCS is using advanced collaboration and metadata-based data management technologies to develop this grid and the associated CMCS community portal.

Reaction Reviews
Reaction Reviews is a place to submit and discuss interesting reactions that appear in the current literature, and to read reviews of new reactions.

Organic Eprints
Organic Eprints is an international open access archive for papers related to research in organic agriculture.

Springer Open Choice
Authors for Springer journals (which include The Journal of Molecular Modelling) will be able to opt for 'Open Choice' once a paper has been refereed and accepted. In exchange for $3 000 per article, the paper will be made freely available world-wide, hosted on Springer servers, in addition to being published in the printed version as usual. If the authors do not opt for 'Open Choice', on-line access will be through the usual subscription arrangements.

This is an intermediate step towards an open-access approach to publishing. The ISI reports that 191 of the titles indexed in the Web of Science (out of almost 9 000) are open access. A study (PDF) on open access journals by the ISI suggests that citation impact and the frequency with which the journal is cited are no different for open access journals and other journals. However, a study in Nature concludes that free on-line availability increases a paper's impact.

Elsevier has sold ChemWeb and closed BioMedNet and is concentrating on ScienceDirect and search tools to complement it, according to Nature news. Scopus is planned as an abstract and indexing database which links to the full text to which a library has access. The anticipated launch date is towards the end of 2004.

SciTechMed Information Service
The SciTechMed Information Service is free of charge to those who have published research in: EMBASE, Elsevier BIOBASE, COMPENDEX or World Textiles.

Recent Developments in Chemoinformatics
This meeting runs from November 14th - 16th, and includes discussion of: Pharmaceutical Chemoinformatics; Theoretical Chemistry and Modeling; Computer-Assisted Structure Elucidation; Metabolomics and Biomarker.

Better Search Engines
It will not be easy to develop a better search engine. New products are available, including Clusty, a clustering search engine and a preview of Microsoft's search engine. The Search Engine Wars continue, and there are new developments in Google Labs and Yahoo Labs.

This database, coordinated by Simon Woodward of the University of Nottingham, is a Europe-wide resource for distributing new ligands for organic synthesis and information about their activity. Public access (restricted to Europe) will be available from mid-2005, after which the website will become a market place for exchanging data on promising ligands and the exchange or sale of samples.

Sciencebase RSS newsfeed
An RSS/XML chemistry and science newsfeed from David Bradley.

Digital Library for Earth System Education (DLESE)
The DLESE is a partnership between the NSF and the DLESE Steering Committee. It is a grassroots community effort involving educators, students, and scientists working together to improve teaching and learning about the Earth, and includes chemistry resources.

Despite its name, it seems likely that technetium does occur naturally ("The ignored discovery of the element Z = 43", Peter H M van Assche, Nuclear Physics A 1988, 480, 205-214). Walter Noddack, Ida Tacke and Otto Berg discovered an element they named masurium, which appears to have been a minute quantity of technetium.

BioMed Central
BioMed Central provides Open Access XML full text of its articles, and so is ideal for data mining. This may be important in enabling Patent Offices to determine prior art.

Indexcat, the Index-Catalogue of the Library of the Surgeon-General's Office, 1880-1961, is now available from the National Library of Medicine.

The Challenges with Substance Databases and Structure Search Engines
This article, by Helen Cooke and Damon Ridley, has been published in the Australian Journal of Chemistry, 2004, 57(5), 387-392.

Errata in on-line journals
What should on-line journals do about mistakes that will, inevitably, be incorporated from time to time? A survey of what publishers do has been published (PDF) by Emily Poworoznek; the issue has been discussed in the context of physics journals; and the National Library of Medicine has a policy. The Journal of the American Chemical Society has published a paper 'ASAP' which was subsequently withdrawn before being printed. The PDF and HTML versions of the paper are no longer available.

A site designed to aid chemists in their search for employment, with some links to chemistry resources.

The University of Buffalo's library has assembled a guide to toxicology resources which includes many useful links. In addition to these resources, the United Nations has a project on regionally based assessment of persistent toxic substances.

Medline coverage
Medline now indexes articles back to the 1950s, extending its coverage back through time. Diseases which were common and are now returning, such as tuberculosis, may, therefore, be under represented. See Paying Homage to the Wisdom of Voices from Medicine's Past by Abigail Zuger.

A targeted chemistry search engine, from ChemNet, and the Korea Chemical Network. Despite the scientific focus of the search engine, it is slower and less accurate than Google for the web searches tested, but the directory and product searches appear to provide useful resources.

Rock Magnetism
The Institute for Rock Magnetism provides scholarly resources online, including the Rock Magnetic Bestiary.

Making Websites Usable
Jakob Nielsen, author of a book on web site usability, has given an interview describing his views on web site design: simplicity is best.

Baik Group
The research group of Professor Mu-Hyun (Mookie) Baik uses quantum chemical models and develops novel methods of extracting chemical information from these calculations. One of the projects is the development of artificially intelligent chemical expert software to automate the computational analysis of chemistry.

GRID computing: Globus Toolkit 4
GRID computing could still be the next big thing. IBM are about to release Globus Toolkit 4, which provides an API for building stateful Web services targeted to distributed heterogeneous computing environments. Globus is a project intended to help the application of GRID concepts to scientific and engineering computing. Simpler systems, such as Condor, are also available, designed for high throughput computing. The Taverna project aims to provide tools to facilitate the use of workflow and distributed compute technology.

Spresi is a Structure and Reaction Database, introduced in October 2002 by Infochem. It includes Synthesis Tree Search, which searches for published synthesis reactions leading to and from the target. The Spresi database includes 4.5 million molecules, 3.5 million reactions, 380 000 references and 95 000 patents.

Gene Expression Omnibus (GEO)
The GEO, which became available in July 2000, is a high-throughput gene expression/molecular abundance data repository, a public repository for a wide range of high-throughput experimental data and an online resource for gene expression data browsing. BioBank is a repository of open access research data and also biological samples, run by the Genetic Alliance.

2004 Nobel Prizes
Who should win the 2004 Nobel Prizes? Analysis of citations has been used by ISI to make Nobel Prize predictions. Although none of the predictions were correct for 2004, perhaps they will be a useful guide for future years.

Journal Colours
Journals are colour-coded according to their policies on author self-archiving. There are at least two schemes for doing this:
  • RoMEO (Rights Metadata for Open archiving)
    • White: self-archiving not allowed
    • Yellow: pre-print archiving permitted (pre-refereeing)
    • Blue: post-print archiving permitted (final draft, post-refereeing)
    • Green: pre-print and post-print archiving permitted
  • Harnad (Summary statistics)
    • Gray: no green light to self-archiving
    • Pale green: pre-print archiving permitted (pre-refereeing)
    • Bright green: post-print archiving permitted (final draft, post-refereeing)
All the journals listed in the Directory of Open Access Journals are gold, which means they are available on open-access, whereas articles in green journals are only openly available if the author chooses to make them so. Gold journals, therefore, are also green (both bright and pale), blue and yellow.
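
The two schemes can be read as simple functions of what a journal permits. The sketch below encodes the colour lists above; the function names are invented for illustration, and the choice of bright green for a journal that permits both pre-print and post-print archiving under the Harnad scheme is an assumption, since the list does not cover that case explicitly.

```python
def romeo_colour(preprint_allowed, postprint_allowed):
    """Colour under the RoMEO scheme, following the list above."""
    if preprint_allowed and postprint_allowed:
        return "green"
    if postprint_allowed:
        return "blue"
    if preprint_allowed:
        return "yellow"
    return "white"


def harnad_colour(preprint_allowed, postprint_allowed):
    """Colour under the Harnad scheme.

    Assumption: post-print permission takes precedence, so a journal
    allowing both kinds of archiving is reported as bright green.
    """
    if postprint_allowed:
        return "bright green"
    if preprint_allowed:
        return "pale green"
    return "gray"


if __name__ == "__main__":
    for pre in (False, True):
        for post in (False, True):
            print(pre, post, romeo_colour(pre, post), harnad_colour(pre, post))
```

Gold is not a result of either function: it describes the journal's own access model rather than its self-archiving policy.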

W3C Workshop on Semantic Web for Life Sciences
This workshop ran in October 2004 in Cambridge, Massachusetts. One focus of interest was the LSID (Life Sciences Identifier), which addresses the need for a naming schema for biological entities. This is a more ambitious aim than naming chemical entities, but INChI may play a role in this. An archive of submissions is available.

The PGM Database
The Platinum Group Metals Database is now live. This database comprises a large collection of physical, mechanical and chemical data for platinum, palladium, rhodium, iridium, osmium and ruthenium. Information is available on more than four hundred alloys and more than six hundred pages of related data.

Open clip art
This project aims to create an archive of clip art that can be freely used for any purpose. Currently, the archive is rather short of chemistry-related pictures, but a few are available.

RSS News Feeds for U.S. Pregrant Patent Publications
This news feed covers the latest patent publications in the USA. There is also a Museum of Obscure Patents which profiles some surprising patent claims. It is clear that patents are not restricted by the laws of physics.

Enhanced Public Access to NIH Research Information
The NIH has sought public comments regarding NIH's plans to facilitate enhanced public access to NIH health related research information. The comment period is now closed. Background information is available. Donald E. Knuth has written on the crisis in scientific publishing (PDF). According to the Budapest Open Access Initiative "an old tradition and a new technology have converged to make possible an unprecedented public good".

Notes on Molecular Orbital Calculations
The text book Notes on Molecular Orbital Calculations by John D. Roberts is now on the web as a PDF file. The book was published in 1961.

CrossRef is a collaborative, cross-publisher reference linking service that turns citations into hyperlinks, allowing researchers to navigate online literature at the article level. A group of twenty-nine leading journal publishers are participating in a CrossRef Search Pilot, including the American Physical Society, Annual Reviews, Institute of Physics, IUCR, Nature, PNAS, Wiley and Oxford University Press. CrossRef Search has been developed by CrossRef in partnership with Google.

Open Source mathematics packages
There are many commercial mathematics packages, including Maple, Matlab and Mathematica. Open source alternatives include Maxima, Octave, Axiom, Yacas, SciLab and JACAL. There are also lower-level packages available.

Many databases of Material Safety Data Sheets are available on-line, and some of these are free. Most have disclaimers about the limited liability of the data providers should the information turn out to be inaccurate.

MedLibraryAlert is an automated search engine that scans daily for your customized updates in the PubMed and MEDLINE databases and automatically e-mails you the results.

Applications of Cheminformatics and Chemical Modelling to Drug Discovery
This conference ran in November, and information on the talks is available on-line.

Survey of molecular modelling programs
Last reviewed in Chemical Informatics Letters 2003, 7, #6. This list excludes the major commercial molecular modelling packages and concentrates on programs for which the source code is available in some form and which are available freely or cheaply. Usually there is a license agreement restricting what may be done with the source code.

Originally developed by Peter Kollman, and now maintained by Professor David Case's group at the Scripps Research Institute and collaborators, AMBER 8 costs $400 for an academic license, which includes source code.

A molecular mechanics and dynamics program written in C by Professor Robert Harrison at Georgia State's Computer Science Department. It is some time since the program was last updated.

B, formerly Biomer; Free; Source Code; Java. Has moved from its old location to Professor David Case's group at the Scripps Research Institute. The page was last updated on 11th October 2002.

A plane wave/pseudopotential implementation of Density Functional Theory. The CPMD group is coordinated by Professor Michele Parrinello (Director of the Swiss Center of Scientific Computations and Professor at the ETH Zuerich) and Dr Wanda Andreoni (Manager of the Computational Material Science Group at IBM Zurich Research Laboratory). An e-mail discussion list is available to discuss the program. Last updated May 2004.

A quantum chemistry program using SCF, MP2, MCSCF or CC wave functions. The strengths of the program are mainly in the areas of magnetic and (frequency-dependent) electric properties, and for studies of molecular potential energy surfaces. The main authors are T. Helgaker, H. J. A. Jensen, P. Jorgensen, J. Olsen, K. Ruud, H. Ågren. DALTON 2.0 is expected at the end of 2004.

Free software project for atom scale simulation, which will incorporate Molecular Dynamics and Force Fields, Quantum Chemistry and Density Functional Methods. It is not clear if this program has been updated in 2004; the mailing list has not been active since February.

Ab initio calculations. A software company, Computing for Science, now administers GAMESS-UK, which remains free to UK academics. Martyn Guest, of the Daresbury Laboratory, is the main author. Version 6.3 is now available.

Ab initio calculations. The program is maintained by Dr Mark Gordon's research group at the Ames Laboratory. Last updated November 2004.

A computational chemistry software package released under the GNU GPL; C++; Linux. Developed in Finland by Tommi Hassinen and collaborators. Last updated December 2002.

A molecular dynamics package, available under GPL, initially developed by Herman Berendsen at Groningen University. Last modified March 2004 (version 3.2.1).

A molecular modelling program for periodic solids, gas phase clusters and isolated defects, written by Julian Gale. Last updated June 2004.

This program provides an interface to other programs and is free to academics; Last updated June 2001.

Open Source Project; Mainly written in Python, with a small amount of C; Konrad Hinsen, from CNRS Orleans, who is also involved with FSatom (vide supra). There is now a MMTK Wiki, which has been updated this month. MMTK was updated in 2004.

An analysis program for molecular dynamics simulations, which interacts with MMTK. There is a nMOLDYN Wiki. The current version is 2.2.1.

A quantum chemistry package developed by Professor Peter Knowles at Cardiff University and Professor Hans-Joachim Werner at Stuttgart University.

A computational chemistry package that is designed both for workstations and high-performance parallel supercomputers, developed in the William R Wiley Environmental Molecular Sciences Laboratory (EMSL) at the Pacific Northwest National Laboratory, USA. Version 4.6 available in May 2004.

A free, full source code (Fortran) molecular mechanics and dynamics program, written in the Ponder Lab. Last updated in June 2004.

Yammp under Python. There are several methods for energy minimisation, molecular dynamics and Monte Carlo. From the Harvey Lab at the Georgia Institute of Technology. Last updated October 2004.

The Biological Innovation for Open Society (BIOS) initiative aims to make the latest genetics and biology tools freely available to researchers over the internet. BIOS encourages companies to contribute their research tools and technologies to the BioForge repository. In return, users of the technology are bound by an open source license to share all improvements. This is an initiative of CAMBIA, the Center for the Application of Molecular Biology to International Agriculture.

GUI design
What is the best way to design a graphical user interface? This article provides a series of useful guidelines, including an emphasis on getting the user's work done rather than encouraging them to appreciate the interface.

"The environmental chemistry web site for balanced comment on media reports which feed public concern and the growth of chemophobia." This site has been running since 1998, from Chemical and Bioanalytical Sciences at Royal Holloway, University of London.

World Community Grid
The World Community Grid's mission is to create the largest public computing grid benefiting humanity. Projects include a human proteome folding project. Partners include IBM and United Devices. Information on submitting proposals is available, and web-users are asked to join the grid (provided they are running Windows software).

European PubMed?
The Wellcome Trust is interested in establishing a European archive, inspired by the US National Library of Medicine's PubMed Central, which has NIH funding. In due course, it may be that all Wellcome Trust funded research will be placed in a public access archive within six months of publication. The Foundation for Information Policy Research has called for the establishment of a European reference library system to which all publishers would have to contribute an unprotected electronic copy of everything over which they wish to exercise powers of copyright, following the precedent of existing Legal Deposit Libraries in the UK and Ireland. BBC radio recently broadcast an analysis of open access publishing: Publish or Be Damned (listen to the programme).

House of Commons Science and Technology Committee - Fourteenth Report
The UK's House of Commons Science and Technology Committee has published a report on scientific publishing. This follows from the report earlier this year (Scientific Publications: Free for All?) (PDF, 118 pages). The fourteenth report's conclusions are critical of the government, saying that the DTI has sought to neutralise some organisations' views and so prevent significant progress, noting that the government has decided against an author-pays model, and regretting that the government has not taken more decisive action on institutional repositories. The Guardian comments that the government has sided with traditional subscriptions-based publishers. There are more comments in Scitech library questions.

Public molecules
David Bradley's article in Nature Reviews Drug Discovery 2004, 3, 988 discusses two collections of molecules which are publicly available. PubChem has over 650 000 entries, whilst Chemical Entities of Biological Interest (ChEBI) has fewer than four thousand entries. ChEBI is a freely available dictionary of small molecular entities, whereas PubChem contains the chemical structures of small organic molecules and information on their biological activities.

X-ray links
This site, from the University of Wisconsin's Chemistry library, lists links to X-ray crystallography sites. Other information is available from NIST, RIO-DB, Reciprocal Net, and an Engineering Chemistry Database.

Google Scholar
Google Scholar searches the scientific literature. The American Chemical Society has complained that the name is too similar to their product SciFinder Scholar, even though this searches the Chemical Abstracts database and not the open-access web. Google Scholar may be most useful for those fields for which the open-access literature is key to the subject. Chemistry has so many closed-access journals of importance that Google Scholar appears to give an unbalanced view of the field, at the moment. Slashdot

SPARC Author's Addendum (PDF)
SPARC is developing a standardized "author's addendum" that researchers and authors will be able to use when submitting journal articles to ensure retention of key rights. SPARC is seeking feedback on two draft addenda (PDF), representing two different approaches to the problem of balancing the needs of authors, readers, and publishers. Michael Carroll, an assistant professor at Villanova University School of Law, developed these drafts for SPARC.

Synthetic Dye Industry History
The United States synthetic dye industry emerged during the early twentieth century, and was the forerunner of the modern chemical industry. Now Robert Baptista, a former Bayer chemist, has started work on a website dealing with the history of the dye industry in the USA.

© 2000-2006 J M Goodman, Cambridge; Chemical Informatics Letters ISSN 1752-0010