Chemical Informatics Letters

Editor: Jonathan M Goodman

Volume 13

Definition and exchange of crystallographic data
International Tables for Crystallography: Volume G: Definition and exchange of crystallographic data, edited by Sydney Hall and Brian McMahon, is now available from the International Union of Crystallography. Crystallographers lead the world in the precise recording of molecular scientific information, and this book provides an authoritative definition of the key file formats, focussing particularly on the Crystallographic Information File (CIF). The accompanying CD-ROM contains the CIF dictionaries in machine-readable form and a collection of libraries and utility programs.

This is a substantial book of nearly six hundred pages. The length is necessary, because the book lists every possible dictionary entry in the CIF format, in order to define each term precisely and to cover the issues involved in crystallography rigorously. Some of the detail could perhaps have been abbreviated. For example, a detailed account of the CIF data dictionaries takes up about half the book, and this lists separately all the elements of the matrices and vectors in the data dictionaries, even though some entries differ only in their indices. However, where this book errs, it is on the side of excessive detail, and it never risks losing precision by saving space. This bias is essential for a definitive reference work.

The book starts with a historical account of the development of CIF files, and then gives an account of the concepts and specifications for CIF files and related formats. Data definition and classification forms a major section, followed by a detailed description of the CIF data dictionaries. The final part of the book describes applications of CIFs, including some Fortran and ANSI C programs, which are also included on the accompanying CD-ROM. This final section may become dated quite soon, but the formal definitions, which make up most of the book, should be important for years to come.

This book is an essential reference for crystallographers, and for everyone who needs to check precisely what is recorded in a crystallographic information file. Future scientists will depend on these definitions, and so the time and effort devoted to the production of this book have been used well. Readers seeking a quick introduction to the CIF format may do better to consult the original paper (Acta Cryst. 1991, A47, 655-685), which outlines the structures of the files. However, everyone who needs a complete and authoritative description of the format needs this book.
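For readers who have not seen a CIF, the basic layout is a data block header followed by data names (beginning with an underscore) and their values. The sketch below is only an illustration of that layout: the function name `read_simple_cif` and the sample values are hypothetical, and the code deliberately ignores loops, multi-line text fields and everything else that the full specification defines.

```python
def read_simple_cif(text):
    """Read single-line 'name value' pairs from CIF-like text.

    A deliberately minimal sketch: loop_ constructs, multi-line text
    fields and save frames are NOT handled.
    """
    blocks = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.lower().startswith("data_"):
            # A new data block; the block name follows 'data_'
            current = blocks.setdefault(line[5:], {})
        elif line.startswith("_") and current is not None:
            parts = line.split(None, 1)
            name = parts[0]
            value = parts[1] if len(parts) > 1 else ""
            # Remove simple surrounding quotes from the value
            current[name] = value.strip().strip("'\"")
    return blocks


# Illustrative input only; the numbers are made up for the example.
sample = """\
data_example
# a comment line
_cell_length_a 5.64
_cell_length_b 5.64
_symmetry_space_group_name_H-M 'F m -3 m'
"""

blocks = read_simple_cif(sample)
print(blocks["example"]["_cell_length_a"])
print(blocks["example"]["_symmetry_space_group_name_H-M"])
```

Anything beyond a quick inspection of a file should rely on the dictionaries and libraries supplied with the book rather than on a sketch like this.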

A successor to SwaN-MR, and a Mac OS X counterpart to the MestReC Windows program.

Public access to academic libraries
Does electronic access to journals restrict access for visitors? When journals were all printed, it was possible to go to libraries and read them. Electronic access has reduced this walk-up access, but not removed it. Many publishers allow walk-up access to their electronic journals as part of their license agreements, although access may be restricted, and libraries do not always make it possible to take advantage of the license terms.

Royal Society launches trial of new open access journal service
The Royal Society has started a trial of an open access journal service, called EXiS Open Choice (FAQ). This complements the Royal Society's existing journals. Authors now have the option to pay a fee to have their article available to everybody, as well as published in the existing journals. The first paper published in this way has had the publication fee paid by the Wellcome Trust. The Royal Society's view on open access is available, as well as a detailed description of content availability for each journal.

The difference between MedLine and PubMed
MEDLINE (Medical Literature Analysis and Retrieval System Online) is the U.S. National Library of Medicine's main bibliographic database with references to journal articles in biomedicine and the life sciences. This is the main component of PubMed, which provides access to MEDLINE and some other resources, including articles published in MEDLINE journals which are beyond the scope of MEDLINE, such as general chemistry articles.

Public Library of Science (PLoS) losing money
Nature reports that the Public Library of Science is losing money. Its publishing model has just been adjusted to raise more money by charging authors more per article.

Top 500 computers
The list of the world's top 500 computers has been updated. First position goes to IBM's BlueGene, which is used for quantum chemistry and molecular dynamics, amongst other things. Since December 2005 (Chem. Inf. Lett. 2005, 11, #6, 69) the UK's top entry has moved up to number thirty (atomic weapons research), and the HPCx (UK national supercomputer service) has dropped to fifty-nine.

The difficulty of citation tracking
This article (abstract) shows how major databases give inconsistent results for citations. This is partly because of differing treatment of accents and punctuation and partly because of mistakes in the databases. However, the main reason is the different policies of the database companies in deciding what constitutes a valid reference. Analysing citations sounds as if it should be easy, but is much harder than it first appears.

Data preservation
The preservation of digital data is a headache for modern historians.

Cambridge MedChem Consulting
This company, run by Chris Swain, provides medicinal chemistry consulting services, and links to resources, including iBabel, a Mac OS X graphical interface to Open Babel, which interconverts chemical structure formats.

Historical pigments
The Pigmentum Project is a programme to catalogue data on historical pigments. The Pigment Compendium is a reference work on the historical terminology and optical microscopy of pigments, which is also available as a searchable database.

BBC open news archive
The BBC is opening a section of its news archives as an experiment. There are terms and conditions for using them, but the BBC says, "You are welcome to download the clips, watch them, and use them to create something unique."

Intute is a free online service providing you with access to the very best Web resources for education and research, replacing PSIgate and related resources. The chemistry section contains hundreds of resources. It competes with other web pages, including ChemSpy, which is a chemistry-specific search tool.

Lost in a sea of science data
There is too much data, and this is hindering the progress of science. This article describes Purdue University's development of a data repository model to tackle this. Information is distributed around the campus, and librarians produce metadata to help users find and interpret this data.

How long does news last?
A study by Albert-Laszlo Barabasi of the University of Notre Dame, and colleagues, suggests that the number of people reading news stories on the web has a half-life of only thirty-six hours (Phys. Rev. E 2006, 73, 066132, DOI: 10.1103/PhysRevE.73.066132). The authors suggest that the results may be transferred to other web-based information sources. Purely search-based strategies for finding information are unlikely to keep up with this rate of change.
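To get a feel for what a thirty-six hour half-life means, here is a back-of-envelope sketch. It assumes simple exponential decay, which is an assumption made for illustration only; the paper itself may describe the decay with a different law. The function name `fraction_remaining` is hypothetical.

```python
def fraction_remaining(hours, half_life_hours=36.0):
    """Fraction of the initial readership rate left after `hours`.

    ASSUMPTION: plain exponential decay with the quoted half-life;
    this is an illustrative reading, not the model from the paper.
    """
    return 0.5 ** (hours / half_life_hours)


# One half-life, two half-lives, and one week
for h in (36, 72, 168):
    print(f"{h:>4} h: {fraction_remaining(h):.3f}")
```

Under this assumption, a story keeps half of its readership rate after a day and a half and a quarter after three days, so a page that is only indexed or found a week later has missed most of its audience.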

Publish or Perish
The poem, Publish or Perish, by Stevan Harnad, a proponent of Open Access, has won a prize in the Euroscience Open Forum (ESOF2006) Poetry Competition.

Where can you find solubility data? NIST might be able to help, or the University of Alberta. Solubility is very hard to measure precisely and reproducibly.

Directory of open access journals
This directory, run by the Lund University Libraries, aims to increase the visibility and ease of use of open access scientific and scholarly journals. It currently lists over two thousand journals.

USA may give up some control of the internet
It has been reported that the USA will give up some of its control over the internet, although it will keep control of the net's root zone file. It looks as if the status quo will be maintained, at least for the moment.

Google hosting for open source
Google has announced it will host open-source projects as part of its Google Code developers' network.

European Chemicals Bureau
The European Chemicals Bureau is a focal point for data and the assessment procedure on dangerous chemicals. It includes ESIS (European chemical Substances Information System), which can be searched by name, formula, CAS number or by EINECS/ELINCS/EC number.

Chemistry World
Information from the RSC's Chemistry World is now available as an RSS feed.

InChI utilities
BKchem is a free chemical drawing program that can create 2D structures from InChI names, without needing auxiliary information. The PubChem search facility can also be used to find structures from InChIs. An InChI can be pasted into the PubChem sketcher to create a two-dimensional picture.
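Programs can rebuild a structure from an InChI because the identifier is a layered text string, with layers separated by slashes. The sketch below only splits a string into its layers, using ethanol written with the version-1 prefix as the example; the function name `split_inchi` is hypothetical, and the code does not validate the identifier or construct a structure, which is what BKchem and PubChem do.

```python
def split_inchi(inchi):
    """Split an InChI string into version, formula and prefixed layers.

    A minimal sketch: no validation, and no attempt to interpret
    the contents of each layer.
    """
    prefix = "InChI="
    if not inchi.startswith(prefix):
        raise ValueError("not an InChI string")
    parts = inchi[len(prefix):].split("/")
    version = parts[0]
    formula = parts[1] if len(parts) > 1 else ""
    # Each later layer starts with a one-letter prefix (c, h, ...).
    # A list is used because a prefix letter can occur more than once.
    layers = [(p[0], p[1:]) for p in parts[2:] if p]
    return version, formula, layers


# Ethanol
version, formula, layers = split_inchi("InChI=1/C2H6O/c1-2-3/h3H,2H2,1H3")
print(version, formula)
for letter, body in layers:
    print(letter, body)
```

Here the `c` layer lists the connections between the numbered heavy atoms and the `h` layer records the hydrogens attached to them.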

Open J-Gate
Open J-Gate is an electronic gateway to global journal literature in the open access domain. It currently lists 3727 open access journals, with links to over a million articles.

Alternative drug approval
In the UK, homeopathic remedies will only have to show that they are safe, and not that they are efficacious, in order to be licensed. This is different to conventional medicines, which cannot be licensed unless they can show they have a beneficial effect. Some doctors are unhappy, as are campaigners for the good use of science in public debates.

Big molecule of the month
There are a number of popular 'molecule of the month' websites, including ones at Bristol, Oxford and Prous Science. The RCSB PDB has run a molecule of the month since 2000, featuring interesting proteins from the databank.

The Obernai Declaration
A hundred scientists from Europe and America met in Obernai in 2006, and discussed research, teaching and industrial collaboration in chemical informatics. This led to the Obernai Declaration.

InChI software version 1.01 is available
The new release includes a validation protocol and InChI to structure conversion (connection tables without coordinates).

Chemistry Central
The Science Navigation Group has announced the launch of Chemistry Central, an open access website for chemists. The same group also produces BioMed Central.

ChemRefer searches the web for chemical information from publications that provide free access to the full text of their articles.

Data mining challenge
The CoEPrA (Comparative Evaluation of Prediction Algorithms) 2006 competition started on April 17, 2006, providing a test for classification and regression algorithms in chemical informatics.

Chemistry blogs
alphabetically by URL
tenderbutton (closing October 2006)
Sceptical Chemist (Nature)
Geoff Hutchison
Occam's blog
science blogs: chemblog
useful chemistry
useful molecules
petermr's blog
endless frontier
sciencebase (David Bradley)
the chem blog

OSIRIS property explorer
This on-line tool calculates molecular properties relevant to drug design from molecular structures.

Caltech's CODA (Institutional Repository)
Chemistry books include:
Ballhausen, Carl J. and Gray, Harry B. Molecular orbital theory: an introductory lecture note and reprint volume 1965
Langford, Cooper H. and Gray, Harry B. Ligand Substitution Processes 1966
Roberts, John D. and Stewart, Ross and Caserio, Marjorie C. Organic chemistry: methane to macromolecules 1971
Roberts, John D. Nuclear Magnetic Resonance: applications to organic chemistry 1959
Roberts, John D. An Introduction to the Analysis of Spin-Spin Splitting in High-Resolution Nuclear Magnetic Resonance Spectra 1961
Roberts, John D. Notes on Molecular Orbital Calculations 1961

Alan Wood's Pesticide Names
Alan Wood's web site provides a variety of useful information, including information on pesticide names and on character sets.

Mining data for materials
Gerbrand Ceder, from MIT's Department of Materials Science and Engineering, has used data-mining techniques, linked with quantum chemistry, to predict crystal structures.

Chemistry publishing
Professor Robert Schrag was selling his lectures on-line, but has been asked to stop by his university while it develops a policy for this sort of intellectual property.

Self-assembling gel stops bleeding
Professor Shuguang Zhang from MIT has developed a polypeptide which self-assembles and can stop bleeding.

Chemical Information Sources Wiki
A Wiki is a user-updateable web page. The most famous is the Wikipedia. A Chemical Information Sources Wiki is now available, based on an undergraduate course from the Indiana University Department of Chemistry by Gary Wiggins.

Royal Society Archives on line
The Royal Society has made its archive of publications (1665 - present) available free on the web, until November 2006.

Nomenclature and representation
The first report of the IUPAC recommendations on graphical representation standards for chemical structure diagrams is now available.

Nanotechnology - good enough to eat?
Companies are interested in improving foods through the use of nanotechnology.

Clinical Trials provides regularly updated information about clinical research in the USA. Other resources are also available, including Pharmaprojects, the Investigational Drugs Database (IDdb), Centerwatch, TrialsCentral and WebMD.

Moves to open access
Publishers are moving towards open access publishing models. The ACS are offering 'Author Choice', the RSC are offering 'Open Science', and Wiley are offering 'funded access'. This is different to Elsevier's option of author archiving. This could mean the worst of all possible worlds for universities. Academics have to pay to publish, and institutions still have to pay to subscribe, to obtain those articles for which authors have not been able to pay to publish. The richer research groups and institutions will increase their profile, because of their ability to pay for open access to their articles. Pharmaceutical companies, which publish little in relation to their research budgets, may benefit.

Sensing magnetic fields
How can animals detect magnetic fields? Cryptochromes may hold the answer.

Internet growth is stalling?
The internet cannot grow forever, and there are indications it is already slowing down (e.g. Chemistry on the world-wide-web: a ten year experiment, DOI: 10.1039/b409956g), even though there are now more than a hundred million websites on the internet. The web may be evolving towards web 2.0.

Plastics abbreviations
This list of abbreviations related to plastics has 272 entries, of which some have two meanings, including ACS, CM, PA, PAEK, PB, SI, and TEEE.

A History of the International Dyestuff Industry
The 150th anniversary of William H. Perkin's discovery of mauve is recorded by ColourClick from the Society of Dyers and Colourists.

IUPAC Gold Book
The IUPAC Compendium of Chemical Terminology, known as the Gold Book, is now available on-line.

PubChem training
How to get the most out of PubChem? On-line training is available.

Comparative Toxicogenomics Database (CTD)
The Comparative Toxicogenomics Database (CTD), from the Mount Desert Island Biological Laboratory and funded through the NIH, elucidates molecular mechanisms by which environmental chemicals affect human disease.

A database of modular polyketide synthases, from the National Institute of Immunology in New Delhi.

Is nanotech dangerous?
Studies of nanotechnology issues from the Woodrow Wilson International Center for Scholars discuss the benefits and possible dangers of nanotechnology.

Money for science
The Higher Education Funding Council for England, which supports English universities, has set aside £75 million over the next three years to support high-cost science subjects, including chemistry.

How to compare virtual screening methods?
Comparison of different virtual screening methods is hard, partly because there is no agreement on an ideal diverse test set. Cooperation may be the way forward.

Darwin online
A comprehensive collection of Charles Darwin's work will be available online from the University of Cambridge, complementing the existing online collection of Darwin's correspondence.

MEDPHYT is a database for plants of medicinal interest, developed by the Beilstein Institute.

Many databases of Material Safety Data Sheets are available on-line, and some of these are free. Most have disclaimers about the limited liability of the data providers should the information turn out to be inaccurate.

Web Science Research Initiative
Sir Tim Berners-Lee defines Web Science as the study of the web.

Bacteria that live in the dark
Living things nearly always depend, directly or indirectly, on solar energy in order to gain the energy to live. Bacteria have recently been discovered that use energy from radioactive uranium instead.

Survey of molecular modelling programs
Last reviewed in Chemical Informatics Letters 2005, 11, #6, 61. This list excludes the major commercial molecular modelling packages and concentrates on programs for which the source code is available in some form and which are available freely or cheaply. Usually there is a license agreement restricting what may be done with the source code. Also of interest is WebMO, a World Wide Web-based interface to computational chemistry packages (current version is 6.1).

Originally developed by Peter Kollman, and now maintained by Professor David Case's group at the Scripps Research Institute and collaborators, AMBER costs $400 for an academic license, which includes source code. Amber version 9 was released on March 29th, 2006.

A molecular mechanics and dynamics program written in C by Professor Robert Harrison at Georgia State's Computer Science Department. The program appears to have been last updated in 2002, and the current Windows release version is 1.6.

B, formerly Biomer; Free; Source Code; Java. Has moved from its old location to Professor David Case's group at the Scripps Research Institute. The page was last updated on 11th October 2002.

The CHARMM Development Project is a network of developers working with Professor Martin Karplus. CHARMM is available for a $600 licensing fee. Latest release: CHARMM c32b2, February 2006.

COLUMBUS provides high-level multi-reference ab initio molecular electronic structure calculations, developed by Hans Lischka at the University of Vienna, amongst others. The latest version is 5.9.1.

A plane wave/pseudopotential implementation of Density Functional Theory. The CPMD group is coordinated by Professor Michele Parrinello (ETH Zurich) and Dr Wanda Andreoni (Manager of the Computational Material Science Group at IBM Zurich Research Laboratory). An e-mail discussion list for the program is available, and was active in December 2006. Last updated May 2004. Current version is v3.11.

A quantum chemistry program using SCF, MP2, MCSCF or CC wave functions. The strengths of the program are mainly in the areas of magnetic and (frequency-dependent) electric properties, and for studies of molecular potential energy surfaces. The main authors are T. Helgaker, H. J. A. Jensen, P. Jorgensen, J. Olsen, K. Ruud, H. Ågren. Dalton 2.0 was released on March 4 2005.

A DFT code from the University of Montreal. Version 2.3 was released in November 2006.

Relativistic ab initio molecular calculations developed at the University of Southern Denmark by Hans Jorgen Aagaard Jensen and others. Latest version is Dirac04, released in 2004.

Free software project for atom scale simulation, which will incorporate Molecular Dynamics and Force Fields, Quantum Chemistry and Density Functional Methods. The last update appears to be the implementation of the database in mySQL in 2004; the mailing list has one message since February 2004.

Ab initio calculations. A software company, Computing for Science, now administers GAMESS-UK, which remains free to UK academics. Martyn Guest, of the CCLRC, is the main author. Version 7 is now available.

Ab initio calculations. The program is maintained by Dr Mark Gordon's research group at the Ames Laboratory. Last updated September 2006.

A computational chemistry software package released under the GNU GPL; C++; Linux. Developed in Finland by Tommi Hassinen and collaborators. Latest version is v2.10, released in August 2006.

A molecular dynamics package, available under GPL, initially developed by Herman Berendsen at Groningen University. Last modified April 2006 (version 3.3.1).

A molecular modelling program for periodic solids, gas phase clusters and isolated defects, written by Julian Gale. Version 3.0.1 is the latest version, which was updated in May 2006.

A Java based cheminformatics (computational chemistry) library, last updated in July 2006 and released under GPL. A Wiki is now available to discuss JoeLib.

Large-scale Atomic/Molecular Massively Parallel Simulator, a molecular dynamics code from the Sandia National Laboratory. The current version was released in October 2006.

Minnesota University Solvation Models and Software
Software for solvation models from the groups of Christopher J. Cramer and Donald Truhlar.

Open Source Project; Mainly written in Python, with a small amount of C; Konrad Hinsen, from CNRS Orleans, who is also involved with FSatom (vide supra). There is now a MMTK Wiki, which was updated in November 2006, and a mailing list which was active in September 2006. The current release of MMTK is v2.4.4, and it was updated in 2005. nMOLDYN is an analysis program for molecular dynamics simulations, which interacts with MMTK. There is a nMOLDYN Wiki. The current version is v2.2.2 and it was updated in 2005.

MOIL is public-domain molecular modeling software, written in the group of Ron Elber. Current version is MOIL 9.1, released in June 2004.

A complete system of ab initio programs for molecular electronic structure calculations developed by Professor Peter Knowles at Cardiff University and Professor Hans-Joachim Werner at Stuttgart University. Version 2006.1 was released in June 2006.

MPQC is the Massively Parallel Quantum Chemistry Program, released under GPL. The lead developer is Curtis Janssen of Sandia National Laboratories. The latest release is v2.3.1 (March 2006).

A computational chemistry package that is designed both for workstations and high-performance parallel supercomputers, developed in the William R Wiley Environmental Molecular Sciences Laboratory (EMSL) at the Pacific Northwest National Laboratory, USA. Version 5.0 is the latest version, and the documentation was updated in September 2006.

The PSI3 suite of quantum chemical programs is designed for efficient, high-accuracy calculations of properties of small to medium-sized molecules, developed by researchers from Virginia Tech, Georgia Tech, the Oak Ridge National Laboratory and Bethel University. The latest version is psi-3.2.3, released in October 2005.

A free, full source code (Fortran) molecular mechanics and dynamics program, written in the Ponder Lab. Version 4.2 became available in June 2004.

Vigyaan is an electronic workbench for bioinformatics, computational biology and computational chemistry, which includes a number of chemistry programs (Ghemical; Jmol; MPQC; PSI3; XDrawChem). The latest stable version is v1.0, which was released in 2005.

Yammp under Python. There are several methods for energy minimisation, molecular dynamics and Monte Carlo. From the Harvey Lab at the Georgia Institute of Technology. Version 1.0.061201 was released on December 15, 2006.

New CAS names
CAS has decided to make changes to the way it names molecules. These affect large groups of molecules, including ylides (abolished to make way for inner salts), preferred tautomeric forms, the order of precedence of amino acids, and almost three thousand stereoparents (compounds whose names imply particular stereochemistries).

Molecular Descriptors
The Milano Chemometrics and QSAR Research Group, led by Roberto Todeschini, has released a new website devoted to molecular descriptors.

Are patents good for pharmaceuticals?
The ACS (The American Constitution Society, rather than the American Chemical Society) notes that a General Accounting Office report (PDF) concludes current patent law discourages drug companies from developing new drugs. Joseph Stiglitz points out the problems the system causes for developing countries.

Google patent search
Google can now search patents in the US patent office.

How big is PubChem?
PubChem now has over fifteen million entries. In March 2006, Reactive Reports gave a figure of only eight and a half million compounds of which five and a half million were unique.

The Selected Organic Reactions Database is capturing chemistry recorded in theses and dissertations but never published. Co-founded by Dick Wife, it has been the subject of a JISC consultation and uses ACD technology.

Mendeleev and other people's tables
How important was Mendeleev? A new book shows how others were also involved in the development of periodic tables. Mendeleev's fame is assured in Russia, however, for his development of vodka technology.

MyStructure is an open-source plug-in for MySQL that stores chemical structures and calculates descriptors and properties.

Has the XML decade just finished? IBM has just published an issue of its systems journal, called Celebrating 10 Years of XML.

The Chemistry Development Kit (CDK) is a Java library for structural chemo- and bioinformatics. It originated in Christoph Steinbeck's group at Cologne University Bioinformatics Center.

Linux4Chemistry, a list of chemistry resources for Linux, is now hosted at Dublin City University, and maintained by Noel O'Boyle.
© 2000-2006 J M Goodman, Cambridge; Chemical Informatics Letters ISSN 1752-0010