Molecule of the Month
November 2010
A Searchable Map of PubChem

The PubChem database is today the largest public repository of chemical structures (> 20 million molecules). In this paper, we have classified the chemical space of PubChem using 42 integer-valued descriptors of molecular structure, called molecular quantum numbers (MQNs). Principal component analysis shows that PubChem compounds occupy a partially filled elliptical cone in the (PC1,PC2,PC3)-space. The image above is a projection of PubChem onto the (PC2,PC3)-plane, color-coded by the fraction of cyclic double bonds per molecule from 0 (blue) to 0.25 (red). The structures shown in the magnifying glass are examples of analogs of the drug Tamiflu found in its vicinity on the MQN-map. The statement «ceci n’est pas une botte-de-foin» ("this is not a haystack") alludes to René Magritte's painting «La trahison des images» (1929) in two ways. First, looking for a bioactive compound on the MQN-map of PubChem is different from searching for a needle in a haystack (the usual way of screening), since the data is organized. Second, the image shown is not chemical space itself, which is an abstract concept, but only a detailed representation of it.
See also R. van Deursen, L. C. Blum, J.-L. Reymond, J. Chem. Inf. Model. 2010, 50, 1924-1934, doi:10.1021/ci100237q.
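
The projection step described above can be illustrated with a minimal sketch. It is not the authors' code: it assumes NumPy, uses randomly generated placeholder integers in place of real MQN values, and the function name `pca_project` is hypothetical. It only shows how a table of 42 integer descriptors per molecule can be reduced by principal component analysis and viewed in the (PC2,PC3)-plane.

```python
import numpy as np


def pca_project(descriptors, n_components=3):
    """Project descriptor rows onto their first principal components.

    Hypothetical helper for illustration; the published work may use
    different preprocessing (for example, scaling) than plain centering.
    """
    X = np.asarray(descriptors, dtype=float)
    centered = X - X.mean(axis=0)  # PCA requires mean-centered columns
    # SVD of the centered matrix: rows of vt are the principal axes,
    # ordered by decreasing singular value.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    scores = centered @ vt[:n_components].T
    explained = singular_values**2 / np.sum(singular_values**2)
    return scores, explained[:n_components]


# Placeholder data (assumption): 500 "molecules" x 42 non-negative integers.
# Real MQN values would be computed from molecular structures instead.
rng = np.random.default_rng(0)
fake_mqn = rng.poisson(lam=3.0, size=(500, 42))

scores, explained = pca_project(fake_mqn, n_components=3)
pc2_pc3 = scores[:, 1:3]  # coordinates in the (PC2,PC3)-plane
```

Each row of `pc2_pc3` would correspond to one point on a map like the one shown, and a per-molecule property such as the fraction of cyclic double bonds could then be used to color the points.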

This work was carried out in the group of Prof. Jean-Louis Reymond.