Chemical Informatics Letters

Editor: Jonathan M Goodman

Volume 14

Computational Crystallography Toolbox
The Computational Crystallography Toolbox is an open source component of the PHENIX (Python-based Hierarchical ENvironment for Integrated Xtallography) project to automate macromolecular structure determination. Espoir, a reverse Monte Carlo and pseudo simulated annealing code for ab initio crystal structure determination, is also available.

Biomolecular Explorer 3D
Biomolecular Explorer 3D is designed to give high school biology teachers easy access to interactive 3D structures of biologically significant molecules.

User-driven search engine
Jimmy Wales, the founder of Wikipedia, is planning a commercial, user-driven search engine, with user preferences being used to rank sites.

Karypis lab
George Karypis is an Associate Professor at the Department of Computer Science and Engineering at the University of Minnesota in the Twin Cities of Minneapolis and Saint Paul. He is a computer scientist with an interest in chemical informatics.

A guide to the internet
The Association for Computing Machinery has released a guide to the internet.

Dangerous DNA?
Suppose there were a DNA sequence so dangerous that it killed any organism that contained it. If it existed, it could be found by looking for DNA sequences which are absent from all sequenced DNA. Professor Greg Hampikian has been looking for such sequences, and has found absent series which should be present if DNA were just a random sequence.
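The idea of searching for absent sequences can be illustrated with a minimal sketch, which is not Professor Hampikian's method: it lists every DNA word of length k that never occurs in a set of sequences. The function name and the toy sequences are hypothetical, and the sketch ignores reverse complements and ambiguous bases.

```python
from itertools import product


def absent_kmers(sequences, k):
    """Return every length-k DNA word that occurs in none of the sequences."""
    seen = set()
    for seq in sequences:
        seq = seq.upper()
        # Slide a window of width k along the sequence and record each word.
        for i in range(len(seq) - k + 1):
            seen.add(seq[i:i + k])
    # Enumerate all 4**k possible words in alphabetical order.
    all_words = ("".join(p) for p in product("ACGT", repeat=k))
    return [word for word in all_words if word not in seen]


# Hypothetical toy data, far too short to say anything about real genomes.
toy_sequences = ["ACGTACGTTGCA", "GGGCCCAATT"]
print(absent_kmers(toy_sequences, 2))
```

The number of possible words grows as 4 to the power k, so exhaustive enumeration of this kind is only practical for short words.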

The program system MOLGEN calculates all isomers that correspond to a given molecular formula. A demo version and an on-line version are available for free. More information is available from the authors at the mathematics department of the University of Bayreuth.

JISC store
JISC, the UK Joint Information System Committee, has a project to encourage interactions between repositories of research publications and repositories of primary research data.

Tripos has sold its Discovery Research section to Provid Pharmaceuticals and its Discovery Informatics section to Vector Capital.

Chemline interface
The National Library of Medicine has just released a new Chemline interface, which allows both MDL Chime and Chemaxon Marvin to be used for input and display. This facility should be compared with the PubChem sketcher (accessible from the PubChem search page's Sketch button).

Molecular Workbench
Molecular Workbench can be launched directly. It can be used to visualise simulations for science education.

Physical Properties
Arizona State University libraries have constructed an index to physical and chemical property data.

The end of open review in Nature
The editors of Nature have ended their experiment in open review, which allowed authors who passed an initial editorial assessment to post their papers on-line for open and signed reviews. About five per cent of authors chose this option, but the comments on the papers had less detailed analysis than would be expected from a referee, and many papers received no comments. PLoS ONE publishes papers using a somewhat similar scheme, and is continuing to do so.

Galapagos acquires Inpharmatica
Inpharmatica, which uses informatics in medicinal chemistry and ADME to help drug discovery, has been acquired by Galápagos, a genomics-based drug discovery company. BioFocusDPI has also been acquired.

UK Government closes websites to improve information transfer
The UK government has run almost a thousand websites, mainly in the domain. More than five hundred will now be closed, and only twenty-six will definitely remain.

e-Therapeutics is a systems biology drug discovery company, whose CEO is Professor Malcolm Young from Newcastle University.

CSA trust newsletter
The Winter Newsletter of the Chemical Structure Association Trust is now available.

Front page to the internet
Google is now so powerful that it might be considered to be the internet's front page. If a site is not listed on Google, does it become invisible?

Search Engines
Changes in search engines since Chem. Inf. Letters 2006, 12, 19 include a list of a hundred new search engines. The website SearchEngineWatch continues to monitor developments. The Wiki Search project is aiming to revolutionise searching in a Wikipedia-inspired way.

Synthesis Design
How good are computers at designing syntheses for molecules? A number of programs have been described recently: ARChem - Route Designer (Automating Retrosynthetic Chemistry) is an expert system from SimBioSys which helps chemists design synthetic routes. SimSoup is a graphical Artificial Chemistry simulator, which calculates how groups of molecules interact. ROBIA is a reaction prediction program.

A software environment for statistical analysis, molecular viewing, descriptor generation, and similarity search. The last update was released in March 2005, from the National Institute of Statistical Sciences.

Chemical structure look-up service
The chemical structure look-up service (CSLS) was developed at the NCI CADD Group headed by Dr. Marc Nicklaus. It searches over eighty databases and over twenty-seven million structures.

pKa Data and Calculation
Programs are available for the calculation of the pKa of molecules, usually based on the correlation of sigma constants of functional groups with pKa. Wikipedia outlines some of the methods. Some free programs and databases are also available.
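As a rough illustration of the sigma-constant approach, here is a minimal sketch of a Hammett-type estimate for substituted benzoic acids, pKa = pKa(parent) - rho x (sum of sigma). It is not the method of any particular program. The constants are commonly tabulated Hammett values, included here only for illustration, and the function and dictionary names are hypothetical.

```python
# Reference values for benzoic acid in water at 25 C; rho is 1 by definition
# for this reference reaction.
BENZOIC_ACID_PKA = 4.20
RHO = 1.00

# Commonly tabulated Hammett sigma constants (illustrative subset only).
SIGMA = {
    "p-NO2": 0.78,
    "m-NO2": 0.71,
    "p-Cl": 0.23,
    "p-CH3": -0.17,
    "p-OCH3": -0.27,
}


def estimate_pka(substituents, pka_parent=BENZOIC_ACID_PKA, rho=RHO):
    """Estimate pKa as pka_parent - rho * (sum of sigma constants)."""
    return pka_parent - rho * sum(SIGMA[s] for s in substituents)


for name in ("p-NO2", "p-OCH3"):
    print(name, round(estimate_pka([name]), 2))
```

Electron-withdrawing groups (positive sigma) lower the estimated pKa, and electron-donating groups raise it. The simple additive form ignores ortho effects and interactions between substituents.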

XDrawChem is a two-dimensional molecule drawing program for Unix operating systems, last updated in 2005.

Japanese chemistry
A list of Japanese Chemistry databases is available from the Science Links Japan portal.

Reasons for open access publishing
The University of Alberta library blog has an article on reasons for open access. The Office of Scholarly Communication and Publication at the University of Wisconsin-Madison also has a view, and a response from the director of libraries. The Belarusian State University has a directory of free full-text journals in chemistry.

CAS anniversary
Since 1907 (PDF), CAS has indexed and summarized chemistry-related articles, and has reported on over thirty million substances (statistical summary PDF). This impressive achievement attracts speculations about the next developments.

Submission, Preservation and Exposure of Chemistry Teaching and Research Data (SPECTRa) is a project to develop tools to deposit chemistry data in digital repositories. Other spectral databases are available, including KnowItAll, FDM Library, Spectral Database for Organic Compounds (SDBS), Coblentz data from the Coblentz Society, and NMRShiftDB.

Is the Wikipedia reliable enough? If not, try the Scholarpedia or the Citizendium, which may have systems to be more reliable, but have dramatically less information.

Machines that can make anything
Fabbers are machines which can construct three-dimensional objects on the desktop. Unfortunately, they are a long way from fabber-molecular.

Novo Nordisk drops small molecules
Novo Nordisk announced it will focus all of its research and development efforts on protein-based therapeutics, a market that is projected to increase from about fifty billion dollars in 2005 towards a hundred billion dollars, according to Drug Discovery News.

Lavoisier's notebooks
Thirteen of the fourteen laboratory notebooks by Lavoisier are now available on the website of Panopticon Lavoisier.

Metabolome database
The Human Metabolome Database (HMDB) is a freely available electronic database containing detailed information about small molecule metabolites found in the human body (press release), developed by David Wishart and his research group at the University of Alberta.

Source Code for Chemistry
How much source code is available for chemistry-related programs? How much of this can be retrieved easily? SourceForge currently has more than 260 chemistry projects. Google code search finds thirty thousand hits on a search for "Chemistry". Krugle finds only twelve thousand. All The Code finds fewer than four hundred. There is no simple relationship between the number and the quality of the hits.

Unsolved problems in chemistry
The Wikipedia has a list of unsolved problems in chemistry. Some of these problems, at the time of writing, are simply misunderstandings or so general that no single insight or group of insights will lead to satisfying solutions. Can the page be edited to contain serious challenges? Linus Pauling (here pointing to the incorrect structure of diborane) lectured on unsolved problems of his time.

Simple XML
How simple is XML? XML::Simple is available to interface Perl and XML, as are XML::LibXML and other resources. The large number of XML tutorials which are available suggests it is important but not simple.
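For a small, well-formed document, reading XML can be short. The sketch below uses Python's standard library rather than the Perl modules named above. The element and attribute names are made up for illustration and are not taken from any chemical XML standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: element and attribute names are invented for this example.
DOC = """<molecules>
  <molecule name="water" formula="H2O"><mass>18.015</mass></molecule>
  <molecule name="methane" formula="CH4"><mass>16.043</mass></molecule>
</molecules>"""


def read_molecules(xml_text):
    """Parse the toy document into a list of plain dictionaries."""
    root = ET.fromstring(xml_text)
    return [
        {
            "name": mol.get("name"),
            "formula": mol.get("formula"),
            "mass": float(mol.findtext("mass")),
        }
        for mol in root.findall("molecule")
    ]


for record in read_molecules(DOC):
    print(record["name"], record["formula"], record["mass"])
```

Namespaces, schemas, encodings and very large files are where the simplicity tends to end.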

Patent changes in USA
A new judgement from the U.S. Supreme Court may make it easier to challenge patents by requiring them not to be obvious.

Chemical Kinetics Simulation
IBM's Chemical Kinetics Simulator program (CKS) is available for downloading from the IBM website and can simulate complex chemical kinetics in gas, solution and solid phases.

Professor Nina Nikolova-Jeliazkova from the Bulgarian Academy of Sciences works on a number of projects in chemical informatics including AMBIT - software for chemoinformatic data management.

ChemSpider makes available a database of chemical structures and predicted properties as well as providing access to a series of property prediction algorithms. Its database currently has over ten million compounds, and was written by 'ChemZoo' which currently appears not to have a website. A significant amount of the data is gathered from PubChem.

RefViz from Thomson searches and analyses references visually for major themes. A free trial is available.

BioMed Central and Chemistry Central Blogs
BioMed Central, PhysMath Central and Chemistry Central have started blogs.

Publishing Technology plc
VISTA and Ingenta have merged to create Publishing Technology plc, which will be the "largest provider of specific software solutions to the publishing industry".

CHEMAPPS is a software company which produces SARvision for the analysis of chemical data sets. It also makes available a glossary for medicinal chemistry.

MDL/Elsevier acquires the Beilstein database
MDL/Elsevier has acquired the Beilstein Database (but not the Beilstein Institute). Both the production and marketing of the Beilstein Database have been managed by Elsevier since 1998.

QSAR World
QSAR World is a free online resource dedicated to QSAR from Strand Life Sciences.

PCC is free software for searching and integrating chemical structures and scientific data, but only runs on PCs.

NASA to go metric
Just over a century after the Mendenhall Order, which adopted the metre and kilogram as the fundamental standards of length and mass in the United States, NASA has decided to use the metric system for all journeys to the moon. NIST has a metric programme that "seeks to accelerate the Nation's transition to the metric system". Most countries already use the metric system.

Courses on chemical informatics
University and college courses in chemical informatics were surveyed in Chem. Inf. Lett. 2006, 12, #5, 49. There are now a large number of course modules on chemical informatics, and an increasing number of degree courses.

ASDL is a peer-reviewed digital library for the analytical sciences.

How to make papers readable
Some papers are not read because they are written badly. Whether or not they are open access may also have an effect. The Science Citation Index is no longer the only way to track citations (PDF), and this change might enhance the importance of open-access journals. There is a directory of free full-text journals in chemistry as well as an index of chemistry journals with a web presence (updated monthly).

Eigenfactor ranks journals much as Google ranks websites, using citations to evaluate the importance of each journal.
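The general idea of ranking by citation flow can be sketched with a simplified PageRank-style power iteration. This is not the actual Eigenfactor algorithm, and the journal names, citation counts and damping value are hypothetical. The sketch assumes that every journal appears as a key in the input.

```python
def rank_journals(citations, damping=0.85, iterations=100):
    """citations[a][b] is the number of citations from journal a to journal b."""
    journals = sorted(citations)
    n = len(journals)
    score = {j: 1.0 / n for j in journals}
    for _ in range(iterations):
        # Every journal receives a small baseline share on each round.
        new = {j: (1.0 - damping) / n for j in journals}
        for src in journals:
            # Self-citations are ignored in this sketch.
            out = {dst: c for dst, c in citations[src].items() if dst != src}
            total = sum(out.values())
            if total == 0:
                # A journal that cites nothing spreads its weight evenly.
                for dst in journals:
                    new[dst] += damping * score[src] / n
            else:
                for dst, count in out.items():
                    new[dst] += damping * score[src] * count / total
        score = new
    return score


# Hypothetical citation counts between three journals.
toy_citations = {
    "A": {"B": 10, "C": 5},
    "B": {"C": 20},
    "C": {"B": 2},
}
scores = rank_journals(toy_citations)
for journal in sorted(scores, key=scores.get, reverse=True):
    print(journal, round(scores[journal], 3))
```

A citation from a highly ranked journal counts for more than one from a journal which is itself rarely cited, which is the feature shared with web-page ranking.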

Substructure search tutorial
Macs in Chemistry has released a tutorial on creating molecule databases and searching for substructures using fingerprints.
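Fingerprint screening can be illustrated with a minimal sketch, which is not taken from the tutorial: a molecule can only contain the query substructure if every bit set in the query fingerprint is also set in the molecule fingerprint. The fingerprints here are hypothetical integers, not the output of a real fingerprinting program.

```python
def could_contain(molecule_fp, query_fp):
    """True if every bit set in the query is also set in the molecule."""
    return (molecule_fp & query_fp) == query_fp


def screen(database, query_fp):
    """Return the names of molecules that pass the fingerprint screen."""
    return [name for name, fp in database.items() if could_contain(fp, query_fp)]


# Hypothetical six-bit fingerprints; real ones are usually hundreds of bits long.
toy_database = {
    "mol_a": 0b101101,
    "mol_b": 0b001001,
    "mol_c": 0b111111,
}
toy_query = 0b001101

print(screen(toy_database, toy_query))
```

The screen only rules molecules out. Those which pass still need a full atom-by-atom substructure match, because different features can set the same bits.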

UK-QSAR and ChemoInformatics Group
The UK-QSAR and ChemoInformatics Group has updated its website.

The Chemical Informatics and Cyberinfrastructure Collaboratory (CICC)
The CICC at Indiana University combines grid computing with chemical informatics and has a list of chemical databases on the web (alphabetical).

Wolfram demonstrations
With a new release of Mathematica, many demonstrations are available on the web. Some are relevant to chemistry, including protein visualisation, spherical harmonics, properties of the elements, hydrogen orbitals and a double helix.

Semantic web breakthrough
Researchers at the DERI have searched more than seven billion RDF statements in a split second (details in a PDF). This has implications for semantic web search engines such as Swoogle.

Mind map of chemistry
How are all the subjects in chemistry connected? Here is a Mind map which suggests a way of connecting subjects. This is much simpler than mapping the Blogosphere or the whole web.

Brussels Declaration On STM Publishing
The Brussels Declaration (available as a PDF), issued in April 2007 by "the international scientific, technical and medical (STM) publishing community", expresses concerns about the impact of open access publishing. The University of California Libraries' Collection Development Committee has a study on methods of assessing journal values and comparing them with prices (PDF).

Beilstein compound classes
Lawson numbers, developed by Alexander Lawson, divide the Beilstein database up into related groups of molecules.

c2k is continuing to provide up to date information about chemistry departments, learned societies and chemistry journals around the world. The last report was in June 2006 (Chem. Inf. Lett. 2006, 12, #6, 61). The database now lists 1935 departments from 142 countries. The United States of America has the most departments (636 - down one from last year), followed by France (101 - not all of which would be 'departments' by conventional UK nomenclature) and then Germany (89). Britain is hanging on in fourth place. There are now 3072 sites listed in total, including departments, learned societies and 860 chemistry-related journals. The automated checking process usually finds about twenty are inaccessible in a typical month. The database has slightly more entries than last year.

The DrugBank database is a bioinformatics and cheminformatics resource that combines drug data with drug target information. The database contains over four thousand drug entries including more than a thousand FDA-approved small molecule drugs. It is a project of the Wishart research group at the University of Alberta.

Database of Protein-Ligand Binding Affinities
The Gilson Group, at the University of Maryland Biotechnology Institute, has developed a web-accessible database of binding affinities for biomolecules, modified biomolecules, and synthetic compounds. The database is described in Nucleic Acids Research 2007, 35, D198-D201 (doi:10.1093/nar/gkl999), and can be searched by compound or by target.

Patent peer review
The USPTO is experimenting with peer review for patents. This might help to prevent court cases on patent validity.

Scientific Commons
Scientific Commons aims to provide comprehensive and freely available access to scientific knowledge. The home-page suggests it currently links to over fifteen million publications. A search for chemistry suggests that this is currently an unrepresented field.

Nature Precedings
Nature Precedings (not 'proceedings') is a place for researchers to share pre-publication research, supplementary findings, and other scientific documents. Submissions are screened but do not undergo peer review. The chemistry section has seven entries currently, mainly focussing on chemical informatics and biological chemistry. The ChemWeb preprint server hosts many interesting articles, but is now closed to new submissions. The ACS refused to accept for publication anything which had previously appeared on the ChemWeb preprint server. Will the same thing happen with Nature?

How effective is the Hirsch-index?
The Hirsch index, h, is defined as the largest number h such that h papers from a particular author or group each have h or more citations. It is effective because it is robust to minor errors in databases, and straightforward to calculate (PDF).
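The calculation can be sketched in a few lines: sort the citation counts in descending order and find the last position at which the count is at least as large as the rank. The function name and the citation counts are hypothetical.

```python
def h_index(citations):
    """Return the largest h such that h papers each have at least h citations."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            # Counts only decrease from here, so no later rank can qualify.
            break
    return h


# Hypothetical citation counts for one author's papers.
print(h_index([10, 8, 5, 4, 3]))
```

Changing the count of one highly cited paper by a few citations leaves h unchanged, which is why minor database errors have little effect.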

ChemRank allows you to add papers to a database, make comments, and vote on how good they are. This should allow a community-wide ranking of papers. The site started at the end of May 2007, and has only a few papers so far.

Moodle is an open-source course management system, which helps create on-line learning communities. It has some information on Chemistry/Biochemistry.

Semantic Web: path ahead
The semantic web may make knowledge accessible to software agents, and despite many challenges ahead, it may become the way everybody works.

AutoDock 4
AutoDock, the molecular docking program, is now available under the GPL.

Elementeo is a chemical card game, created by a team of school children, aged about thirteen.

© 2007 J M Goodman, Cambridge; Chemical Informatics Letters ISSN 1752-0010