Chemical Informatics Letters

Editor: Jonathan M Goodman

Volume 8

Science Inventory (January 2004)
A searchable catalog of more than 4,000 science activities of the US Environmental Protection Agency (EPA). The site allows keyword searches and highlights nine interdisciplinary topics: Aging Initiative; Contaminated Sediments; Ecological Assessment Tools; Genomics; Tribal Science; Children's Health; Cumulative Risk; Environmental Justice; Non-indigenous Species.

Systems Biology (January 2004)
What is 'Systems Biology'? A definition, from the American Chemical Society (C&EN, May 2003): Integrative approach in which scientists study pathways and networks will touch all areas of biology, including drug discovery. The Stuttgart University definition: Systematic approach, not focused on individual genes and individual proteins, instead interested in analyzing whole systems of genes or proteins by capturing information from many different elements of the overall system. The UMIST approach: Leroy Hood's definition is that it involves 'integrating technology, biology and computation'.

There are many websites, including: Systems Biology and Systems Biology! A Conference, Mass Spectrometry in Systems Biology, another Conference, software resources, SBML, CSBi, and PNNL, with interest from UCSF, Harvard, Munich, Stuttgart, Jena, Japan, Italy, and Carnegie Mellon.

A search using Google gave the following numbers of hits:

  • "systems biology" 2003: 23500
  • "systems biology" 2002: 14800
  • "systems biology" 2001: 11200
  • "systems biology" 2000: 9990
  • "systems biology" 1999: 6650
A search of the scientific literature for "systems biology" gave almost 150 hits, of which more than a hundred were from 2003, and none from before 1998. One of the first was a paper by Hood: 'Systems biology: New opportunities arising from genomics, proteomics and beyond' Experimental Hematology 1998, 26. Kitano wrote a brief overview in Science 2002, 295, 1662.
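
The Google hit counts listed above can be turned into year-on-year growth figures. The following is a minimal sketch in Python: the numbers are copied from the list, while the function names are illustrative. The counts are for pages matching both the phrase and the year, so they are only a rough proxy for activity in that year.

```python
# Google hit counts for "systems biology" plus a year, copied from the list above.
HITS = {1999: 6650, 2000: 9990, 2001: 11200, 2002: 14800, 2003: 23500}


def yearly_growth(hits):
    """Return {year: percentage change relative to the previous listed year}."""
    years = sorted(hits)
    return {
        year: 100.0 * (hits[year] - hits[prev]) / hits[prev]
        for prev, year in zip(years, years[1:])
    }


def overall_factor(hits):
    """Ratio of the latest count to the earliest count."""
    years = sorted(hits)
    return hits[years[-1]] / hits[years[0]]


if __name__ == "__main__":
    for year, pct in yearly_growth(HITS).items():
        print(f"{year}: {pct:+.1f}% on the previous year")
    print(f"overall: x{overall_factor(HITS):.2f} from 1999 to 2003")
```

On these figures, growth was uneven from year to year, and the largest relative jump was between 2002 and 2003.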

EPA Right to know initiative (January 2004)
This project has been running since 1998, providing public access to information about high volume production chemicals.

Chemical Informatics Portal (January 2004)
This server, run by David Wild (Wild Ideas) at the University of Michigan's pharmaceutical engineering department, has a list of links and information about the Michigan 'Introduction to Chemoinformatics' class.

Heidelberg Central Spectroscopy Department - Molecular Modeling (January 2004)
Particularly interested in sugars, but also in web-based molecular modelling.

Information Retrieval in Chemistry (January 2004)
This page was updated in December 2003. It provides general chemistry links, rather than focussing on chemical informatics, despite describing itself as a "ChemInformatics host". The Ulysses search page holds classified listings of science-related search engines and sciences information directories.

Search Engines (January 2004)
Google is a very effective search engine, but it has many competitors. Can it retain dominance? Searching is a difficult problem. Search Engine Watch analyses the different programs which are available. Competitors to Google include Teoma, which indexes fewer sites and so may be more focussed, and others which try to be comprehensive. Automatically grouping results (Vivisimo; Grokker; Kartoo) may be a good approach. Subject-specific search engines may be appropriate for chemistry, although Google has done as good a job in many cases. Here are some science-specific search engines:
CiteSeer - Scientific Literature Digital Library

Scholarly Publishing (January 2004)
The UK Parliament's Science and Technology Committee is to look into scientific publication, under the chairmanship of Ian Gibson, a former chemistry lecturer. The Committee would welcome written evidence from interested organisations and individuals.

ChemInfoStream (January 2004)
ChemInfoStream, Barry Hardy's chemical informatics blog, has news and views from the world of chemical information, modelling and informatics. There is also an open source discussion - with access that is not open but requires registration and payment. (Correction - March 2004 - registration is required, but payment is not required to participate in the open source discussion.)

Open source software (January 2004)
Mark Webbink, Senior Vice President and General Counsel of Red Hat, Inc., wrote this article for corporate attorneys, explaining free and open source software and comparing various open source licenses, detailing how the GPL really works, explaining US copyright law, and listing some corporate law office best practices for software, from the standpoint of what policies are prudent for the corporate environment. He comments on a series of myths about open source software, and comments on pitfalls to avoid.

ADMET-1 Conference (January 2004)
The ADMET-1 conference will be on February 11th-13th, 2004 in San Diego, California. Topics include: Computational Toxicology, Computational Metabolism/Excretion, Experimental ADMET, Computational Absorption/Distribution and a scientific workshop on Tox-ML.

Courses in Chemical Informatics (January 2004)

(Last surveyed December 2001: Chem. Inf. Lett. 2001, 3, #4.)

RSC on Open Archives (February 2004)
Peter Gregory, the Royal Society of Chemistry's director of publishing, comments on open archive initiatives in an editorial in the final issue of Chemistry in Britain (December 2003). Should successful authors have to pay for the costs of authors whose work is rejected? Since most authors are academics, should academia subsidise industrial research?

Roboscientist (February 2004)
A robot scientist has been created that creates theories, carries out experiments and interprets results. There is some way to go before graduate students, post docs and academics are made obsolete, however.

WebFountain (IBM) (February 2004)
WebFountain has been developed by IBM to help companies keep up to date and respond to developments in their areas. It has three components: (i) data miners, crawlers and applications; (ii) a database of terabytes of unstructured and semi-structured data; (iii) text analysis, including natural language processing. It should be useful to businesses and to venture capitalists. More information is available from Unstructured Information Management, which reports that WebFountain suggested that petrol stations could improve sales by trying to attract police cars.

NIST - care for your CDs (February 2004)
How secure is your data? NIST has provided a guide on caring for CDs and DVDs, which recommends, amongst other things, that you should not use adhesive labels, and you should not store CDs horizontally for long periods.

Robert Pearlman (February 2004)
Robert Pearlman is the Coulter R Sublett Regents Chair in Pharmacy at the University of Texas at Austin, and the director of the laboratory for the development of computer-assisted drug discovery software. He is most famous for the program CONCORD (available from Tripos) which generates three-dimensional molecular structures from two-dimensional diagrams. His work has recently been the subject of a University of Texas feature.

Household products database (February 2004)
This NIH resource provides safety information about materials which are used in household products.

Office of Scientific and Technical Information - OSTI (February 2004)
This office, run by the Department of Energy in the USA, provides information on energy science and technology, including the E-print network (preprints and communications), DOE research reports and citations of energy research.

Centre International de Recherche Scientifique (CIRS) (February 2004)
This centre is, in its own words, "the first and the most important scientific portal". Surprisingly, the Internet Archive WayBackMachine has no record of it before December 2000. The same organisation has a number of internet addresses. It is an index of scientific websites from many disciplines. It is less comprehensive for chemistry than specialised indices, such as C2K, but provides a much broader coverage of science, and so is better compared with the UK Resource Discovery Network's science sections.

Copied Citations (February 2004)
According to this paper, data on misprints in citations suggests that about 70-90% of scientific citations are copied from the lists of references used in other papers.

The end of Chemistry in Britain (February 2004)
Chemistry in Britain, the RSC's monthly magazine for the chemical community, has been replaced by Chemistry World, supported by Chemical Science, which illustrates the latest developments in chemistry from all RSC publications.

Free web book (February 2004)
"Chemogenesis: The Story of How Chemical Reactivity Emerges From The Periodic Table of The Elements" by Mark R Leach, who worked at Salford University, is available on the web, and presents a personal view of chemistry. It is linked with other products available for order and not for free.

Molecular Design Group at Trinity College Dublin (February 2004)
This group is run in the department of biochemistry by David Lloyd, the Hitachi Lecturer in Advanced Computing. The group's research is aimed at using molecular modelling to support the drug design process.

Berne Convention
The Berne Convention is an agreement that data cannot be protected by copyright. The USA is considering new legislation which will change this. The Database and Collections of Information Misappropriation Act will create new ownership rights. The SIIA (Software and Information Industry Association) is in favour. The ACM (Association for Computing Machinery) has polled its membership and concluded that this is unnecessary. The NetCoalition and the National Academies are also opposed.

RDF and OWL Recommendations
The World Wide Web Consortium has announced final approval of two key technologies, the revised Resource Description Framework (RDF) and the Web Ontology Language (OWL). This is likely to be an important step in the development of XML and the Semantic Web. A new Microsoft patent appears to cover some software implementations of XML, but not the XML standard itself.

Center for Scientific Review (CSR)
The CSR organizes the peer review groups that evaluate about seventy per cent of the research grant applications sent to NIH. The organization of the review groups has been adjusted.

PubMed Central Back issue scanning
PubMed Central is scanning the back issues of PMC journals that are not already available electronically, and these will be available free. These include Nucleic Acids Research and PNAS: Proceedings of the National Academy of Sciences of the USA.

Journal of Biological Chemistry
Almost a century of the Journal of Biological Chemistry (JBC) is freely available, 1905-2003, in addition to the current calendar. The American Society for Biochemistry and Molecular Biology (ASBMB) has an Open Access statement.

ACS archiving policies
Professor Richard Armstrong, the new editor of Biochemistry, has commented on ACS archiving policies. The ACS does not have a free, publicly accessible electronic archives policy, and Professor Armstrong says that this will erode the impact of ACS journals.

Academic Publisher Mergers
An analysis of the academic publishing industry from Mary H Munroe, Associate Dean for Collections and Technical Services at Northern Illinois University.

BioInteraction tries to map all the protein interactions for biological scientists. Features include PSIMAP, the first global interactome map.

Computational biology at Sloan-Kettering
Computational Biology at Memorial Sloan-Kettering Cancer Center (MSKCC) hosts BioPAX: Biological Pathways Exchange.

Daylight provides an on-line service which converts SMILES strings to GIF images of molecules. A similar service is available from Erlangen (see also Xemistry for academic downloads).

Various Informatics
Buffalo Informatics: The University of Buffalo has an informatics department, which concentrates on the intersection of human communication and information processes.

Irvine Informatics: a part of the computer science department, studying the design, application, use, and impacts of information technology.

Museum Informatics: This project at the University of California, Berkeley is a collaborative effort to coordinate the application of information technology in museums.

How many compounds in CAS?
This page gives the latest number - currently around twenty three million organic and inorganic substances.

The effect of the Internet on Nature
Nature will increase the number of brief communications arising from papers by publishing additional communications on-line only. They will be citable by their Digital Object Identifiers.

Lausanne Workshop on Chemical Information
The goal of this workshop is to bring together people interested in Chemical Information in order to share ideas and future perspectives. Speakers include Peter Ertl (Novartis, Basel) and Igor Tetko (Institute for Bioinformatics, Germany).

Chemical Information Locator from the Internet
This resource comes from the National Centre for Biodiversity Informatics in India, which mainly focusses on Indian flora and fauna. The Chemical Information Locator is a database containing synonyms, SMILES strings and CAS numbers, linked to an interface to search engines. This approach could be a powerful tool for searching for chemical information.

Elsevier announce the transfer of ownership to with effect from April 8, 2004.

UK Mirror Service to close
The UK Mirror Service will still be available for use until July 31st 2004. JISC, which funded the service, has announced that Eduserv will take over the service. It will offer only freely available technical software resources, and no longer scholarly and academic resources.

Alternatives to animal testing
Is animal testing of new molecules necessary? Alternative methods are being investigated by the USA agencies NIH and USDA (United States Department of Agriculture) as well as campaigning groups such as FRAME (Fund for the Replacement of Animals in Medical Experiments).

Java future
Sun and Microsoft have ended a long running dispute. It is not clear what effect this will have on Java, or on a campaign to encourage Sun to make Java open source. A recent study of language performance suggests that C does not seem to have a significant performance advantage over Java.

Crystallography Online
Crystallography Online from the International Union of Crystallography lists numerous resources including databases of interest to crystallographers.

ACS ends publishing moratorium for trade-embargo countries
The American Chemical Society has decided to end its moratorium on publishing papers by scientists in countries under trade embargoes (Cuba, Iran, Iraq, Libya, and Sudan), and joined a litigation force, should this be necessary to overturn the Office of Foreign Assets Control (OFAC) ruling.

ChemIDplus is a searchable database of over a third of a million chemical substance records. ChemIDplus Lite is also available, with a simpler user interface, but no structural searching.

The NIH uses ChemIDplus in some of its other resources. For example, this page on Amprenavir has a number of references to the alternative names for the drug, stored in the ChemIDplus database.

Cambridge DSpace
Cambridge DSpace is a project to develop a digital repository for the University of Cambridge. In parallel with this, the LEADIRS project (LEarning About Digital Institutional Repositories Seminars) will promote planning for UK education institutional repositories.

Nature Biotechnology Directory Website
A global information resource listing over eight thousand organisations, product and service providers in biotechnology.

Journal Prices Decrease
The American Physical Society (APS) is decreasing the subscription price of its journals, despite increases in the number of pages and papers submitted. The price decreases are a result of the use of new technology and cost-control measures. Will any other publisher follow suit?

RSA security
What are the prime factors of 188 198 812 920 607 963 838 697 239 461 650 439 807 163 563 379 417 382 700 763 356 422 988 859 715 234 665 485 319 060 606 504 743 045 317 388 011 303 396 716 199 692 321 205 734 031 879 550 656 996 221 305 168 759 307 650 257 059 ? A hundred workstations working for three months have solved this problem - the RSA-576 Factoring Challenge and the answers are: 398 075 086 424 064 937 397 125 500 550 386 491 199 064 362 342 526 708 406 385 189 575 946 388 957 261 768 583 317 and 472 772 146 107 435 302 536 223 071 973 048 224 632 914 695 302 097 116 459 852 171 130 520 711 256 363 590 397 527. Computer cryptography depends on this factoring process being difficult. The RSA-576 challenge is easy compared with the RSA-2048 challenge - factoring a number with 617 digits - but how difficult is this, and how difficult will it remain as computers get faster and algorithms improve? How big a number is needed to effectively encrypt data securely? OpenSSL has been certified by NIST for some aspects of security, including DES Modes of Operation, and its standard strong key is based on only 128 bits.
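
Factoring is hard, but checking a claimed factorisation is a single multiplication. The following is a minimal sketch in Python, whose integers have arbitrary precision. The three numbers are copied from the text above; the helper names are illustrative.

```python
def from_groups(grouped):
    """Turn a number written in space-separated groups of digits into an int."""
    return int("".join(grouped.split()))


# The RSA-576 challenge number, as quoted above.
N = from_groups("""
    188 198 812 920 607 963 838 697 239 461 650 439 807 163 563 379 417 382 700 763
    356 422 988 859 715 234 665 485 319 060 606 504 743 045 317 388 011 303 396 716
    199 692 321 205 734 031 879 550 656 996 221 305 168 759 307 650 257 059
""")

# The two reported prime factors, as quoted above.
P = from_groups("""
    398 075 086 424 064 937 397 125 500 550 386 491 199 064 362 342 526 708 406 385
    189 575 946 388 957 261 768 583 317
""")
Q = from_groups("""
    472 772 146 107 435 302 536 223 071 973 048 224 632 914 695 302 097 116 459 852
    171 130 520 711 256 363 590 397 527
""")


def is_factorisation(n, p, q):
    """True if p times q reproduces n exactly."""
    return p * q == n


if __name__ == "__main__":
    print("product matches:", is_factorisation(N, P, Q))
    print("decimal digits in N:", len(str(N)))
    print("bits in N:", N.bit_length())
```

The check confirms only that the two numbers multiply to give the challenge number; it does not test that they are prime.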

Library futures
The BBC reports that UK library use is declining, and if two data points are linearly extrapolated, will cease within twenty years. This seems an unreasonably pessimistic view. The Americans for Libraries Council is promoting libraries for the 21st century. The UK government department for culture, media and sport has developed a framework for the future for public libraries, and the museums, libraries and archives council aims to connect people to knowledge and information, creativity and inspiration. A project to connect public libraries to the internet is now realising the aim of providing internet access to all those who would like it by 2005.

Opensource problems
Is an opensource approach the best way to write software? There are examples of very effective opensource projects, but others are less successful. The reasons may centre on opensource developers writing for themselves rather than for less-skilled users, and so paying less attention to interface design and documentation than a final user needs. Feedback on the design may come mainly from like-minded experts. Horror stories are available, licensing issues cannot be ignored, and an annotated chronicle has been written. Opensource is not a panacea, but a useful approach to many problems.

NewJour is an announcement list for new journals and newsletters that are available on-line, run from UCSD. An archive for NewJour is also available.

Finding Physical Properties of Chemicals on the Web
Indexes of resources on properties are available at Vanderbilt and from the Chemical Engineers Resource Page. An article on this by a Buffalo librarian is available from the Haworth Press.

Launch of DSSTox website
The Distributed Structure-Searchable Toxicity (DSSTox) Database Network provides a community forum for publishing standard format, structure-annotated chemical toxicity data files for open public access, and is run by the US Environmental Protection Agency. The files are available in Structure Data Format (SDF or SDFiles) and are simple text files that can represent multiple chemical structures and associated data. Currently the project has data on over 2500 substances.
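
Because SDFiles are plain text, the associated data can be read with very little code. The following is a minimal sketch in Python, assuming the usual SDF layout: a structure block ending in "M  END", data items introduced by "> <FIELD>" lines, and records separated by "$$$$". The sample record and its field names are made up for illustration and are not taken from DSSTox.

```python
import re

# A made-up one-record SDFile; the field names NAME and CAS are illustrative only.
SAMPLE = """\
water
  hypothetical example

  1  0  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
M  END
> <NAME>
water

> <CAS>
7732-18-5

$$$$
"""


def _parse_record(lines):
    """Split one record into its title, raw structure block and data items."""
    end = next(
        (i for i, line in enumerate(lines) if line.startswith("M  END")),
        len(lines) - 1,
    )
    data, field = {}, None
    for line in lines[end + 1:]:
        if line.startswith(">"):
            match = re.search(r"<([^>]+)>", line)  # field name sits in <...>
            field = match.group(1) if match else None
            if field is not None:
                data[field] = []
        elif field is not None:
            if line.strip() == "":
                field = None  # a blank line ends the current data item
            else:
                data[field].append(line)
    return {
        "title": lines[0] if lines else "",
        "molblock": "\n".join(lines[: end + 1]),
        "data": {name: "\n".join(values) for name, values in data.items()},
    }


def parse_sdf(text):
    """Return a list of records; each record ends at a '$$$$' line."""
    records, current = [], []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            records.append(_parse_record(current))
            current = []
        else:
            current.append(line)
    return records


if __name__ == "__main__":
    for record in parse_sdf(SAMPLE):
        print(record["title"], record["data"])
```

The structure block is kept as raw text here; interpreting the atoms and bonds is better left to a chemistry toolkit.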

This website contains links to cheminformatics programs and Cheminformatics links (300 links in more than forty categories). The site is run by Andreas Bender from the Unilever Centre for Molecular Science Informatics at the University of Cambridge.

IUPAC names
A new draft of IUPAC's Nomenclature of Inorganic Chemistry is available for comment and can be downloaded from the IUPAC website. It includes a flowchart guiding readers to the rules they need, and an extended guide to alternative names for simple inorganic compounds.

RDF Site Summary (RSS) is a "lightweight multipurpose extensible metadata description and syndication format". More helpfully, it is a format for syndicating news and the content of news-like sites. A suitable program will take an RSS channel and display it as a list of items. This can also be done within a web browser. For example, a viewer can be used to display the Psigate RSS feeds. The technology can incorporate molecules: CMLRSS.
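
Taking an RSS channel and displaying it as a list of items needs only a few lines. The following is a minimal sketch in Python using the standard library, assuming an RSS 1.0 (RDF Site Summary) feed. The sample feed and its example.org addresses are hypothetical stand-ins for a real channel.

```python
import xml.etree.ElementTree as ET

# Namespace used by RSS 1.0 elements such as <item>, <title> and <link>.
RSS = "{http://purl.org/rss/1.0/}"

# A hypothetical two-item channel, used here instead of fetching a live feed.
SAMPLE_FEED = """\
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/">
  <channel rdf:about="http://example.org/news.rdf">
    <title>Hypothetical chemistry news</title>
    <link>http://example.org/</link>
    <description>An invented channel for illustration</description>
  </channel>
  <item rdf:about="http://example.org/items/1">
    <title>First hypothetical item</title>
    <link>http://example.org/items/1</link>
  </item>
  <item rdf:about="http://example.org/items/2">
    <title>Second hypothetical item</title>
    <link>http://example.org/items/2</link>
  </item>
</rdf:RDF>
"""


def list_items(feed_xml):
    """Return (title, link) pairs for every item in an RSS 1.0 document."""
    root = ET.fromstring(feed_xml)
    return [
        (item.findtext(RSS + "title", default=""),
         item.findtext(RSS + "link", default=""))
        for item in root.findall(RSS + "item")
    ]


if __name__ == "__main__":
    for title, link in list_items(SAMPLE_FEED):
        print(f"{title} <{link}>")
```

Other RSS versions place the items inside the channel element and use no namespace, so a general viewer would need to handle both layouts.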

Universal NMR Databases for Contiguous Polyols
Professor Kishi from Harvard University has assembled a large database of NMR spectra for polyols, which has been published by the American Chemical Society (Higashibayashi, S.; Czechtizky, W.; Kobayashi, Y.; Kishi, Y. "Universal NMR Databases for Contiguous Polyols" J. Am. Chem. Soc. 2003, 125, 14379-14393).

Laboratory of Mathematical Chemistry
The LMC was founded in 1983, in the town of Bourgas, on the Black Sea coast, as a laboratory for property modelling at the Professor Assen Zlatarov University in Bourgas. It was first headed by Professor Danail Bonchev, member of the Bulgarian Academy of Sciences, who has worked at the Virginia Commonwealth University's Center for the Study of Biological Complexity. Amongst its research projects, the LMC has produced MetaPath (Consolidation and management of metabolism information) and a centralised database (for all chemicals in regulatory agencies).

Software Patents in the EU
It has been possible to patent computer software in the USA since the 1980s (brief history) but it has not been possible in Europe. The European Union Council has now considered this, and has voted for software patents although doubts remain and the effect this will have is unclear. The Foundation for Free Information Infrastructure (FFII) is campaigning against software patents.

pKa calculation
Many programs are available for the calculation of the pKa of molecules, usually based on the correlation of sigma constants of functional groups with pKa.
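
The simplest form of such a correlation is the Hammett equation, pKa = pKa(parent) - rho x (sum of sigma constants). The following is a minimal sketch in Python for substituted benzoic acids, for which rho is 1 by definition. The sigma values are approximate textbook constants included as assumptions for illustration; real programs use far larger parameter sets and corrections.

```python
# Approximate textbook Hammett sigma constants (assumed values, for illustration).
SIGMA = {
    ("NO2", "para"): 0.78,
    ("NO2", "meta"): 0.71,
    ("Cl", "para"): 0.23,
    ("Cl", "meta"): 0.37,
    ("CH3", "para"): -0.17,
    ("OCH3", "para"): -0.27,
}

PKA_BENZOIC_ACID = 4.20  # parent acid in water at 25 C (approximate)
RHO_BENZOIC_ACID = 1.00  # defines the Hammett scale


def estimate_pka(substituents, pka_parent=PKA_BENZOIC_ACID, rho=RHO_BENZOIC_ACID):
    """Estimate a pKa from the parent pKa and the summed sigma constants.

    substituents is a list of (group, position) keys into SIGMA.
    """
    total_sigma = sum(SIGMA[s] for s in substituents)
    return pka_parent - rho * total_sigma


if __name__ == "__main__":
    for subs in ([], [("NO2", "para")], [("CH3", "para")], [("Cl", "meta")]):
        print(subs, f"{estimate_pka(subs):.2f}")
```

Electron-withdrawing groups (positive sigma) lower the estimated pKa and electron-donating groups raise it. Simple additivity of this kind ignores effects such as ortho substitution and steric interactions.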

ChemBank is a freely available collection of data about small molecules, concentrating on their effects in biology. Resources for studying their properties are also available. A database of 900 000 molecules can be searched or browsed.

Proposal for roentgenium
Roentgenium has been proposed as the name for the element with atomic number 111, after Wilhelm Conrad Röntgen, the discoverer of X-rays.

Electronic Laboratory Notebooks
The Pacific Northwest National Laboratory's collaboratory has produced a list of electronic laboratory notebooks in the collaboratory and beyond. A fuller list is also available. Full annotation of chemical processes is a difficult challenge, even for superficially straightforward tasks such as making tea.

ALB Crystallography
This site, run by Armel Le Bail at the Université du Maine, contains a variety of crystallographic resources, and a list of the 10858 most cited chemists (1981 - 1997).

NASA Institute for Advanced Concepts (NIAC)
The NASA Institute for Advanced Concepts is an institute of the Universities Space Research Association, and focuses on revolutionary concepts for space and aeronautics research. The institute funds revolutionary projects, and encourages applicants not to let preoccupation with reality stifle imagination. However, proposals based solely on technically unsubstantiated science fiction will not be accepted. Can chemistry make such a leap, or is its progress inevitably incremental?

SynGen is an organic synthesis generating program, developed by Professor James B. Hendrickson's research group at Brandeis University, which abstracts reactions and compares them with a database of reactions. A similar approach is taken by Lhasa, which relies on chemistry knowledge bases. This is different to reaction prediction, which is the aim of CAMEO (Professor William Jorgensen) and EROS (Professor Johann Gasteiger).

Chemistry 2000 (c2k)
Chemistry 2000 (c2k) continues to provide up to date information on chemistry departments, learned societies and chemistry journals. Since the last report in June 2003 (Chem. Inf. Lett. 2003, 6, #6, 61) the database has grown slightly - up by 2 % in the last year, with 2978 entries. The peak during the year was 2984 entries, and the average number of sites marked as inaccessible was less than twenty.

A subcommittee of the Analytical Chemistry Division supersedes the Commission on Solubility Data. Solubility was covered in Chem. Inf. Lett. volume 7, #5.

The Oxford Chemistry Department has set up on-line interactive science experiments, including Photodiode Experiments and Phosphorescent Decay.

Index Medicus to cease print at the end of 2004
The printed Index Medicus, started in 1879, will cease at the end of 2004. The National Library of Medicine stopped publication of the annual Cumulated Index Medicus in 2000, and PubMed was recognised as the definitive permanent source of MEDLINE in the same year.

© 2000-2006 J M Goodman, Cambridge; Chemical Informatics Letters ISSN 1752-0010