Chemical Informatics Letters

Editor: Jonathan M Goodman

Volume 10

Cost per article
How much does it cost to publish a paper and then have access to the results? This study looks at the library expenditure for universities in the USA, and divides it by the number of articles published in order to get an estimate. High-publishing schools, such as Harvard, are therefore paying less per article than schools which publish less. Only a few schools pay less than a thousand dollars an article, by this measure, and most pay several thousand dollars for each article. A Cornell University Library task force study on open access publishing concludes that subscription models and open access models can coexist. However, the costs of access to the scholarly literature for Cornell would probably rise if current subscription expenditure was switched to paying for author fees for open access publishing. There is further discussion from Newcastle University.
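The per-article measure used in the study is a simple division, and a minimal sketch may make the comparison concrete. The function name and both sets of figures below are hypothetical, invented purely for illustration; they are not taken from the study.

```python
def cost_per_article(library_expenditure, articles_published):
    """Library expenditure divided by the number of articles published."""
    if articles_published <= 0:
        raise ValueError("articles_published must be positive")
    return library_expenditure / articles_published


# Hypothetical figures, invented for illustration only (not from the study).
schools = {
    "High-publishing school": (20_000_000, 25_000),
    "Low-publishing school": (5_000_000, 1_000),
}

for name, (spend, articles) in schools.items():
    print(f"{name}: ${cost_per_article(spend, articles):,.0f} per article")
```

With these invented numbers, the school that publishes more spends more in total but pays less per article, which is the pattern the study reports.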

Nuclear Receptor Signaling Atlas: The aim of the program is to gather and organize information relating to orphan nuclear receptor biology, and extend this to the wider discipline of nuclear receptor signalling.

UK Freedom of information act
Under the new law, passed by Parliament in 2000, all citizens will have access to any non-exempt information from any UK public sector authority or institution.

Google Scholar
Google Scholar is an experimental Google service to search the scholarly literature. For chemistry, where most publications are not open access, its coverage is a long way from being comprehensive. Google is also digitising libraries from Harvard (FAQ) and Oxford.

ZINC is a free database of commercially-available compounds for virtual screening. It contains over 2.7 million compounds in ready-to-dock, 3D formats, from the Shoichet laboratory at UCSF, and is described in the first issue of the new Journal of Chemical Information and Modeling 2005, 45, 177-182.

Project Gutenberg
Project Gutenberg now has a full text search.

International Conference on Chemical Structures
The 7th International Conference on Chemical Structures will be held June 5-9, 2005 at Noordwijkerhout, The Netherlands. The conference is jointly organized by the Division of Chemical Information of the American Chemical Society (CINF), Chemical Structure Association Trust (CSA Trust), Division of Chemical Information and Computer Science of the Chemical Society of Japan (CSJ), Chemistry-Information-Computer Division of the Society of German Chemists (GDCh), Royal Netherlands Chemical Society (KNCV), Chemical Information Group of the Royal Society of Chemistry (RSC), and the Swiss Chemical Society (SCS). The conference will cover: Cheminformatics; Structure-Activity and Structure-Property Prediction; Structure-Based Design and Virtual Screening; Analysis of Large Data Sets; and Bridging the Cheminformatics-Bioinformatics Gap.

Some free medical publications
A group of publishers (American Association for the Advancement of Science; American Association for Cancer Research; American Cancer Society; American Diabetes Association; American Heart Association; American Medical Association; American Physiological Association; American Roentgen Ray Society; American Society of Hematology; Annals of Internal Medicine; Blackwell Publishing; BMJ Publishing Group Ltd; Elsevier; Massachusetts Medical Society; Nature Publishing Group; Oxford University Press; Society of Nuclear Medicine) is starting a service called patientINFORM which will offer some of their publications on the web for free "to help patients and caregivers close a critical information gap". The patientINFORM web site will be launched in the spring.

Vigyaan is a free Linux-based electronic workbench for bioinformatics, computational biology and computational chemistry. The focus is currently biased towards biology rather than chemistry, and is being developed by Pratul K Agarwal of the Computational Biology Institute at Oak Ridge National Laboratory.

GPL update?
The General Public License (GPL) has not changed since version 2 in 1991, but it may be modified again to provide more protection against software patents. In a recent interview, Richard Stallman mentions the development of version 3 of the GPL. Should the license require that distributors of GPL programs grant unhindered use of all patented technology in the program? Software patents may become even more important. India is moving towards granting software patents. The position of European software patents is being discussed by the European Union, and a decision has been postponed again.

APS delays publication
Glenn T. Seaborg's results on nuclear fission were sent to the APS in 1941, but not published for five years. Five years later, he won a Nobel Prize in Chemistry.

Y2K five years on
Did the Y2K bug exist? This article argues that it was an important event. We should now worry about the year 2038.
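The concern about 2038 comes from systems that store time as a signed 32-bit count of seconds since 1 January 1970 (UTC). A minimal sketch of the arithmetic follows; the helper wrap_int32 is a hypothetical name used here to imitate the overflow, not a function from any real system.

```python
from datetime import datetime, timedelta, timezone

# Largest value a signed 32-bit integer can hold.
MAX_INT32 = 2**31 - 1

# Unix time counts seconds from this epoch.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Last moment representable by a signed 32-bit count of seconds.
last_moment = EPOCH + timedelta(seconds=MAX_INT32)
print("Last representable time:", last_moment.isoformat())


def wrap_int32(value):
    """Imitate two's-complement overflow of a signed 32-bit counter."""
    return (value + 2**31) % 2**32 - 2**31


# One second later the counter wraps to a large negative number,
# which such a system would interpret as a date in 1901.
wrapped = wrap_int32(MAX_INT32 + 1)
print("After overflow:", (EPOCH + timedelta(seconds=wrapped)).isoformat())
```

Systems that use a 64-bit count of seconds are not affected by this particular limit.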

Syracuse Research Corporation
The Syracuse Research Corporation provides a variety of free services, including a PBT Profiler for estimating Persistent, Bioaccumulative, and Toxic Profiles of organic chemicals; estimation software, which calculates a variety of properties and is available from the Environmental Protection Agency; and a database of Environmental Fates. Other resources are available for purchase, including a Physical Properties Database, which has a free demo version.

Virtual Screening and Structure-Based Drug Design
CHI is running two conferences: Virtual Screening will run March 30th-31st in Boston, and Structure-Based Drug Design will take place from May 24th-26th in Philadelphia. Highlights include lectures from Professor Richard A Friesner and Dr Matthew Stahl.

IBM Opens Their Patent Portfolio to Open Source
IBM has undertaken not to assert five hundred of their forty thousand software patents against Open Source Software in support of innovation and open standards (Announcement (PDF file) and Press release). Sun also announced that it will provide access to sixteen hundred patents for the open source community. Richard Stallman comments that this is a good thing, but does not remove the need to fight against the software patent system. Linus Torvalds notes that software patents are a problem, despite these announcements. The European Parliament has just decided not to pass a bill which would have allowed software patents.

Internet Journal of Chemistry
The Internet Journal of Chemistry is to stop publication. It was launched in January 1998 and provided innovative web-based services. The existing journal articles will continue to be available for free for the foreseeable future.

Why don't chemists like preprints?
Is the chemical community not interested in this form of publication? There is some evidence, in computer science, that open access papers are cited more, and astrophysics papers published as pre-prints are also cited more. Why not chemistry?


(i) Refereeing services: Do chemists rely more on referees than other disciplines? Checking experimental data is laborious. Perhaps some subjects rely less on this, particularly if they often publish ideas without data or analyses of shared data?
(ii) Tradition: in some respects the process of publishing new syntheses has been similar for over a century. Is it inertia that discourages preprints and open access publication?
(iii) Power of established publishers: a new journal needs good names as editors and a track record. There are big players who are hard to challenge. Chemistry was good business in the nineteenth century; maths and biology were not.

Grand Challenges in Computing
The British Computer Society is running a conference considering the grand challenges in computing. These include complete simulations of living organisms, global and scalable ubiquitous computing, managing information over a lifetime, the architecture of the brain, and non-classical computing.

Search Engines
Search engines were last mentioned here in Chem. Inf. Letters 2004, 8, 7. New search engines include Amazon's a9, which finds places and gives pictures of the streets. For example, a search for 'Havemeyer Hall' produces a picture of the Columbia University Chemistry Department. Microsoft has a new search engine: MSN search. Yahoo has developed Y!Q, which analyses the web page you are reading and gives you related pages. Google still seems to lead the field, but competitors are doing an increasingly good job. Companies which try to raise sites' profiles on search engines seem to be doing good business, suggesting that the results can be manipulated. A survey from SearchEngineWatch suggests that whilst search engines are sophisticated and intelligent, their users are not.

Open health standards
Some of the largest US technology companies have agreed to embrace open, nonproprietary technology standards as the software building blocks for a health information network in the USA.

Maintaining Chemical Information
Two historians whose book questions how industry reacts when faced with information about potential dangers from its products are being sued, as are the book's referees. Good chemical information can save lives.

NIH Calls on Scientists to Speed Public Release of Research Publications
The NIH has announced a new policy - it calls on scientists to release to the public manuscripts from NIH-supported research within twelve months of publication (PDF). PubMedCentral (PMC), a free digital archive run by the NIH, provides the resources needed to do this. More information is available from the NIH website. The Nature Publishing Group (NPG) has changed its policy to allow authors to publish articles on an archive. Authors will be encouraged to submit the manuscript to their funding body's archive, their institution's repositories and their personal websites six months after publication. The Nature license is available (PDF).

Molecular Descriptors Calculation
preADME is a web-based program which calculates molecular descriptors which can be used to estimate ADME properties. It was developed at the Research Institute of Bioinformatics and Molecular Design in Korea. The Dragon 5 program, from the Todeschini Group in Milan, calculates 1661 molecular descriptors. It is not freely available to download, but an online version is available from the Virtual Computational Chemistry Laboratory.

Drug names
The World Health Organization has a web site which lists information about some drug names, and also sells a CD-ROM with more information. Fuller on-line lists are also available from other sources.

Beilstein Journal of Organic Chemistry
The Beilstein Institute has announced the launch of the first major open access journal for organic chemistry: Beilstein Journal of Organic Chemistry. It will be published by the Beilstein Institute in co-operation with BioMed Central.

The Beilstein Journal of Organic Chemistry will make organic chemistry research freely available immediately on publication, and permanently available in the public archives of science. There will be no charge for authors, as the journal will be supported by the Beilstein Institute. A call for papers with full information for authors will be published in May.

ACS widens access to its journals
The American Chemical Society is broadening access to research articles published in its scholarly journals with two experimental policies. The first policy is a response to the public access guidelines recently released by the National Institutes of Health: "The NIH encourages authors whose work it funds to submit their peer-reviewed manuscripts to PubMed Central, the agency's free digital archive of biomedical and life sciences journal literature."

Wellcome Trust requires Open Access publishing
The Wellcome Trust is a strong supporter of open access publishing, and now proposes to require Wellcome Trust supported publications to be deposited in PubMed Central, or the European PMC once it is established. The Wellcome Trust's recent study showed that an author-pays business model offers the opportunity for substantial cost savings over the reader-pays model.

How did XML begin?
This interview with Tim Bray, one of the creators of XML, explains how it started in the context of a project to computerise the Oxford English Dictionary. It was necessary to bring SGML to the web, and the interview mentions that XML is more complicated than it needs to be, despite the aim of keeping it simple.

FEBS Journal
The European Journal of Biochemistry has been replaced by FEBS Journal from the beginning of 2005. This is one of the journals published by the Federation of European Biochemical Societies. FEBS, a signatory of DC Principles, has made all articles published in EJB, FEBS Journal and FEBS Letters freely available following a 12-month embargo. In addition, Nucleic Acids Research has also moved towards an Open Access publishing model for 2005.

The World Wide Web Consortium
The World Wide Web Consortium agrees web standards. However, the Web Hypertext Application Technology working group (WHAT), a collaboration of Web browser manufacturers and interested parties who wish to develop new technologies, is also interested in standards and is proposing a different direction for web forms. The two possible developments are XForms and Web Forms 2.0.

British Sign Language Science Signs
Online glossaries for science and engineering and the built environment are now available. Art and design has been updated.

Digital Media Project
Which is more important: science or music? Music and film distributors want to protect their commercial interests, but legislation could have an impact on all digital information.

Burndy Library for the history of science
The Burndy Library may move from MIT to Pittsburgh in 2007.

Google's word limit
Google's word limit has increased from ten to thirty-two words, allowing for more complicated queries. This is particularly useful when qualifying search terms with a minus sign preceding other terms to be rejected.
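As an illustration of how exclusion terms use up the word allowance, here is a minimal sketch of a query builder. The function name build_query is hypothetical, each term is assumed to be a single word, and the assumption that every term, included or excluded, counts as one word towards the limit has not been checked against Google's own counting rules.

```python
# Limit reported in the text; how Google counts words is an assumption here.
WORD_LIMIT = 32


def build_query(include, exclude):
    """Join wanted terms, then unwanted terms each prefixed with a minus sign."""
    terms = list(include) + ["-" + term for term in exclude]
    if len(terms) > WORD_LIMIT:
        raise ValueError(f"{len(terms)} terms exceeds the limit of {WORD_LIMIT}")
    return " ".join(terms)


# Example: look for DFT in a chemistry context, rejecting unrelated meanings.
query = build_query(["DFT", "chemistry"], ["fourier", "telephony"])
print(query)
```

Under the old limit of ten words, a query like this would have left room for only a handful of rejected terms.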

NIH Open Access Policy
Some comments on the NIH Open Access Policy.

New York Public Library Digital Gallery
The New York Public Library's Digital Gallery contains over a quarter of a million digitized images, including some of chemistry and chemists.

Google Print
Google Print is a project to include as much book information as possible in Google searches, by digitising libraries. If copyright permits it, it is possible to read a whole book on-line, but more often, searches will return just a few sentences. For example, a Google search for 'book Philosophiae Naturalis Principia Mathematica' does not find Newton's most famous book, but 'book Origin of Species' leads to Darwin's most famous book. A search for 'book origin of the species', which is only slightly different from the correct title, does not find the book. A search for 'book chemistry' finds a number of chemistry books. Linus Pauling's 'General Chemistry' is available, but only a few pages can be browsed.

Impermanence of Data
Information which used to be available is being restricted as a result of concerns about 'security', this article suggests. This information includes reports from the Los Alamos Laboratory on metallurgy and physics.

Dialog from Thomson now provides chemical structure searching.

Patent Accessibility
On-line patent databases should make it easier to check patents, but a recent article in World Patent Information (volume 27, page 27), summarised in the New Scientist, shows that Espacenet, the European online patent database, is missing hundreds of thousands of documents from the UK Patent Office and the equivalent French and German offices. Some of the missing documents are recent.

Facts about Open Access
The Association of Learned and Professional Society Publishers (ALPSP), the American Association for the Advancement of Science and Stanford's HighWire Press are funding a study on open access, and the results of the first stage are available: The facts about open access.

Open access journals
About 1543 scientific and scholarly journals are available under open access arrangements.

Creative Archive Group
The Creative Archive is a BBC-led initiative to provide access to public service audio and video archives. The Creative Archive License allows use and distribution within the UK, subject to some rules.

Finding numbers
How much information is in a number? Google has a search by number feature which finds patent numbers and parcel tracking codes, amongst other things. How big does a number need to be before it is likely to be unique within the Google database? A Google Search for 'patent 6884995' finds only this patent in the USPTO database; a search for '6884995' finds about thirty hits, including telephone numbers and grant numbers.

The ACS has grown concerned about PubChem, because it appears that a new USA government service is competing with an established private-sector business. This complaint sounds extraordinary in the UK, where the government runs free schools and hospitals, for example, which compete directly with private-sector schools and hospitals, and questions of unfair competition are not raised, as the private-sector services provide greater convenience and more features to justify their cost. The Chemical Abstracts Service provides far more data, in a more convenient way, and with many more features, than PubChem. However, PubScience was discontinued for similar reasons, and even the weather is under threat.

Center for Chemical Methodology and Library Development
Synthetic Protocols is a database of solid phase, solution phase and library synthesis procedures, developed at Boston University's Center for Chemical Methodology and Library Development.

Safe exchange of chemical information
Is it possible to share chemical information without giving away commercial secrets? Professor Tudor Oprea believes so, and has arranged for a challenge to test this. If it is possible to hide chemical information this way, does this imply that the chemical descriptors used are inadequate?

European Libraries counter Google
Nineteen European national libraries have decided to cooperate, led by the French national library, to put European literature on-line, and provide competition for Google's project.

Columbia Declaration
The Columbia University Senate endorsed unanimously a resolution on "Open Access" at its meeting at the beginning of April, 2005. The resolution was introduced by the Senate's Committee on Libraries and Academic Computing, and records support for the principle of open access to scholarly research, and urges the scholars of Columbia University to play a part in open-access endeavors.

Courses on chemical informatics
University and college courses in chemical informatics were surveyed in Chem. Inf. Lett. 2004, 8, #1, 12 and also in Chem. Inf. Lett. 2003, 6, #4, 46. There are now a large number of course modules on chemical informatics, but still only a few degrees.

DSpace at the Indian National Chemical Laboratory
A collection of molecules has been set up in DSpace by Dr M. Karthikeyan of the chemoinformatics team at the National Chemical Laboratory at Pune. MolTable is a database of molecules abstracted from theses, with more soon to be incorporated. The data includes properties, molecular descriptors and spectra.

ASPET increases access to journals
The American Society for Pharmacology and Experimental Therapeutics (ASPET) will make the articles in its journals The Journal of Pharmacology and Experimental Therapeutics, Molecular Pharmacology, and Drug Metabolism and Disposition freely accessible to everybody, a year after publication.

ACS Meeting, San Diego 2005
The abstracts from 115 presentations at the CINF session in the Spring 2005 San Diego ACS meeting are available, together with some of the presentations.

Celera opens DNA database
According to Business Week, Celera will make freely available data on about thirty billion base pairs of DNA, from July 1st. This addresses concerns that Celera was not following usual methods of publication for its results.

Accessibility of crystallographic data
How accessible is crystallographic data? The Crystallography Open Database, the ecrystals project, and the RCSB make their data freely available. The CCDC does not, but it does provide free access to individual structures for research purposes.

Holding molecules
Work in the Olson laboratory at the Scripps Research Institute has developed tools to let people hold molecules with their bare hands. Are hands complicated enough to manipulate molecules?

Randomly generated papers
SCIgen is a program which generates text resembling computer science papers using random processes. Recently, a SCIgen paper was accepted by a conference. Could this be done for chemistry? The SCIgen paper falsifies some data, but ensures that it would be hard for these data to be cited and re-used. A random synthetic chemistry paper might be harder to produce in a convincing way, because so many cross-checks are possible, and can be automated.

Chemical and Engineering News has a guest editorial, highlighting the importance of cyberinfrastructure. The National Science Foundation has just published the report of its advisory panel on cyberinfrastructure, and its division of Shared Cyberinfrastructure has a number of programs and funding opportunities. There was a workshop on Cyber Chemistry in Washington DC at the end of last year. According to the NSF’s directors of chemistry and shared cyberinfrastructure, cyber-enabled chemistry has the potential to be transformational.

Cheminformatics Virtual Classroom
Mesa Analytics and Computing has received a grant from the National Science Foundation to build a Cheminformatics Virtual Classroom.

Dutch Digital Documents
DAREnet (English version) gives digital access to Dutch academic research output. More than 25 000 publications from two hundred scientists are featured, with about 60% full content available. The content includes chemistry, such as a page for Professor Ben Feringa.

Chemistry 2000 (c2k)
Chemistry 2000 (c2k) is continuing to provide up-to-date information about chemistry departments, learned societies and chemistry journals around the world. Since the last report in June 2004 (Chem. Inf. Lett. 2004, 8, #6, 69), the database now lists 1857 departments from 138 countries. The United States of America has the most departments (642), followed by France (103) and then Britain and Germany almost equal. The French academic system is not arranged in the same way as the British one, and this probably inflates the number of departments listed. There are now 3062 sites listed in total (departments, learned societies and journals), of which about twenty are inaccessible in a typical month. The database has grown by 2.4 % in the last year.
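The figures quoted above can be turned into a few derived quantities. This is a rough sketch of the arithmetic only: it assumes the 2.4 % growth applies to the total of 3062 sites, which the report does not state explicitly, so the derived numbers are estimates rather than values from the database.

```python
# Figures quoted in the report.
total_sites = 3062        # departments, learned societies and journals
departments = 1857
usa_departments = 642
inaccessible = 20         # "about twenty" in a typical month
growth_rate = 0.024       # 2.4 % growth in the last year

# Assumption: the growth figure refers to the total number of sites.
previous_total = total_sites / (1 + growth_rate)
sites_added = total_sites - previous_total

usa_share = usa_departments / departments
inaccessible_share = inaccessible / total_sites

print(f"Estimated total a year ago: {previous_total:.0f}")
print(f"Estimated sites added:      {sites_added:.0f}")
print(f"US share of departments:    {usa_share:.1%}")
print(f"Typically inaccessible:     {inaccessible_share:.2%}")
```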

Google Scholar
How good is Google Scholar? A 2004 review has now been updated (June 2005). The reviews conclude that Google Scholar has some way to go before it can compete with subscription-based systems. The tool is probably more effective for disciplines with a tradition of open access publication than for subjects such as chemistry, where most content is available only by subscription.

PageRank algorithm
Google is based on the PageRank algorithm, which is described on a Stanford website. A patent application (20050071741 - for more details just type this number into Google) has now been submitted for this area. This can be used to help understand how Google ranks its lists. Google also provided information on this, on April 1st, 2002.
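For readers who want a feel for the idea behind PageRank, here is a minimal sketch of the textbook power-iteration form on a toy four-page web. It is not Google's production ranking: the function name, the toy link graph, the damping value of 0.85 and the handling of pages without outgoing links are all illustrative assumptions.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power-iteration PageRank for a dict mapping page -> list of linked pages."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start from a uniform distribution

    for _ in range(iterations):
        # Every page receives a small baseline share of rank.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank equally to the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy web: C is linked to by A, B and D.
toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(toy_web)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page}: {score:.3f}")
```

In this toy graph the page with the most incoming links, C, ends up with the highest score, and D, which nothing links to, ends up with the lowest.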

Accessibility of Crystal Structure Data
Small molecule crystal structure data can be most easily searched through the Cambridge Crystallographic Data Centre, which is available by subscription. However, the request-a-structure service is accessible from the CCDC home page, and does not require a fee. The Crystallography Open Database provides everything freely, but is smaller and does not have the same long tradition of curation. The COD is running a petition asking for crystal data or powder patterns to be made available at no cost on the Web.

Nearly half a million acronyms are available through a recently announced search engine. Several languages are available, although English dominates. How useful is it to have a general search engine for acronyms? Can this work well for chemistry? A few simple synthetic chemistry acronyms (DMP, DMAP, DIBAL) were tested and all drew blanks. However, AM1 and DFT were both decoded correctly, although there were fourteen suggestions for the latter, only one of which was related to chemistry. A smaller, chemistry-focussed index is also available at Indiana University. The same five test acronyms gave identical results, except for DFT, for which it found only thirteen suggestions, all except one of which were chemistry-focussed.

Scientometrics is the study or measurement of scientific texts and information. There are at least two journals in this area, Scientometrics and Cybermetrics, and also a society: International Society for Scientometrics and Informetrics. Not content with the quantitative analysis of general scientific information, there has been a recent bibliometric analysis of scientometrics.

Open Knowledge Initiative
The Open Knowledge Initiative, which started at MIT in 2001, develops ways of allowing learning management systems to interact with enterprise systems. The website includes a list of e-learning products.

Scopus vs Web of Science
Scopus and the Web of Science are both resources for searching the scholarly scientific literature. Scopus, developed since 2002, was launched at the end of 2004. It is distinct from Scirus, a free scientific search engine also produced by Elsevier. The Web of Science, from Thomson, developed from the Science Citation Index. This comparison recommends buying both, if possible, but notes that Scopus appears to be slower to update its database.

Simon the Virtual Stockroom Manager
Sigma-Aldrich has developed a 'virtual stockroom' displaying its catalogue as a stockroom. The resources include substructure searching of chemicals in the catalog.

ACS Web Subscriptions
The ACS is changing its web subscription system for 2006. Instead of rolling access to the last four years and the current year, which meant that a year of access was lost each new calendar year, institutional subscribers to ACS web editions will have access from 1996 to the present. Articles published between 1879 and 1995 will be renamed the ACS Legacy Archives and will be available either for an annual fee or for a one-off payment with a "nominal" annual fee. The "nominal" fee is less than an annual subscription for most journals.

Google raises search word limit to 32
This will allow for more complicated search queries, which may be important for searching for specific chemical entities.

ACS and PubChem
CAS and the ACS are concerned about PubChem. Is a publicly-funded organisation competing with a commercial one? Is this a good use of taxpayers' money? This argument may seem strange in the UK, where it is accepted that taxes are spent on hospitals and schools, and there are also commercial hospitals and schools, which do a good business by providing more than the state-funded alternatives, at a cost. However, a similar argument was used to close PubScience (Chem. Inf. Lett. 2001, 3, #1, and Chem. Inf. Lett. 2002, 5, #6, 3).

The ACS published a statement on the issue, and ACS President William F Carroll has written an open letter about the situation, in which he reaffirms that the 'increase and diffusion of chemical knowledge is the cornerstone of the ACS mission and its Congressional Charter.' C & E News also has an article on the issue.

Does PubChem compete with CAS? According to an article in Nature, Bob Massie, the head of CAS, thinks that every chemical researcher understands that PubChem is a substitute for CAS. Many people have expressed views disagreeing with the ACS position, and pointed out that CAS, a tax-exempt organization, has received public funding to develop its database. A history of the development of CAS is available to subscribers to J. Chem. Inf. Comput. Sci. The University of California Office of Scholarly Communication has a page commenting on the issue. The Scholarly Publishing and Academic Resources Coalition (SPARC) supports PubChem. Discussions between the NIH and the ACS continued, but the argument then moved to Congress.

On 10th June, the House voted on the 2006 budget for the NIH. PubChem represents a tiny fraction of the NIH budget. A small increase was approved, but one that falls short of inflation. The report accompanying the bill did not ask the NIH to restrict the scope of PubChem, but "urges NIH to work with private sector providers to avoid unnecessary duplication and competition with private sector chemical databases." The ACS noted it was pleased with the report language. Supporters of PubChem see the report language as a victory for the NIH.

© 2000-2006 J M Goodman, Cambridge; Chemical Informatics Letters ISSN 1752-0010