
Tales of Similarity

The concept of molecular similarity is central to many applications in chemoinformatics and medicinal chemistry. While similarity is an intrinsically subjective concept, we attempt to quantify it by comparing molecular representations.

Jürgen Bajorath
Jun 12, 2015




Gerald Lushington over 2 years ago

Well put! Any reliance on any current generic measure of molecular similarity (especially, but not limited to, the heavily used distance-based formalism within a space defined by substructural fingerprints) as a basis for med chem goals such as SAR analoging is highly imperfect. Conventional similarity concepts are not without some technical value, but surely we as a community can do far better.
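For readers less familiar with the distance-based formalism mentioned above, here is a minimal sketch of one common choice, the Tanimoto coefficient computed on substructural fingerprints. The fingerprints are represented as sets of "on" bit positions, and the bit values are made up for illustration; nothing here comes from the post itself or from any particular toolkit.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bits."""
    a, b = set(bits_a), set(bits_b)
    if not a and not b:
        # Convention chosen for this sketch: two empty fingerprints score 0.
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical fingerprints: each integer is the index of a substructure bit.
fp_query = {1, 4, 7, 9}
fp_analog = {1, 4, 7, 12}

similarity = tanimoto(fp_query, fp_analog)  # 3 shared bits out of 5 distinct bits
distance = 1.0 - similarity                 # the corresponding distance
```

The score depends only on which bits two molecules share, so it carries no information about any specific target, which is the limitation the comment is pointing at.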

Some time ago my colleagues and I took a crack at this issue with an article entitled "Novel Algorithms for the Identification of Biologically Informative Chemical Diversity Metrics". In this work we asserted that a major flaw with diversity metrics (which, I would argue, are basically similarity metrics examined through a mirror) lay in the attempt to find generic measures to quantify biological disposition of compounds, without recognizing how the property-dependence differs vastly from target to target, and even within different mechanistic variants of the same target. The paper thus proposed a simple-minded formalism for winnowing down a large pool of molecular properties to home in on those that seem best suited to discriminate between active and inactive molecules within a screening data set for a given target.
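The general idea of winnowing a property pool for one target can be illustrated with a very rough sketch. This is not the algorithm from the paper, whose details are not given here; it is a generic stand-in that ranks each property by how far apart the active and inactive means sit relative to their spread. The function name, descriptor names, and toy values are all hypothetical.

```python
from statistics import mean, pstdev


def rank_descriptors(compounds, labels):
    """Rank descriptors by how well they separate actives from inactives.

    compounds: list of dicts mapping descriptor name -> numeric value
    labels:    list of bools, True for compounds active against the target
    """
    if not any(labels) or all(labels):
        raise ValueError("need at least one active and one inactive compound")
    scores = {}
    for name in compounds[0]:
        actives = [c[name] for c, y in zip(compounds, labels) if y]
        inactives = [c[name] for c, y in zip(compounds, labels) if not y]
        gap = abs(mean(actives) - mean(inactives))
        spread = (pstdev(actives) + pstdev(inactives)) / 2
        if spread > 0:
            scores[name] = gap / spread
        else:
            # No within-class variation: any gap is a perfect separation.
            scores[name] = float("inf") if gap > 0 else 0.0
    # Highest score first, i.e. most discriminating descriptor first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy screening set for a single hypothetical target.
compounds = [
    {"logp": 1.0, "mol_weight": 300.0},
    {"logp": 1.2, "mol_weight": 410.0},
    {"logp": 3.9, "mol_weight": 320.0},
    {"logp": 4.1, "mol_weight": 400.0},
]
labels = [True, True, False, False]

ranking = rank_descriptors(compounds, labels)
```

In this toy set `logp` ranks above `mol_weight`, because the two classes differ sharply in `logp` but overlap in weight. Running the same ranking on data for a different target could reorder the list entirely, which is the target dependence the comment describes.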

In retrospect, the specific test cases chosen in that paper to illustrate the method were poor choices (we focused on several of the old NCI tumor cell lines, which have since been largely deprecated), so the data quality may have been inadequate to yield real insight. Still, the simple method (and some more sophisticated schemes we've mapped out since) may provide some incremental progress toward the challenge Professor Bajorath has identified.