Molecular complexity is a fundamental concept underlying fragment-based lead discovery: fragments, being simple, can bind to more sites on proteins and thus give higher hit rates than larger, more complex molecules. The ultimate example of this is water, which – at 55 M concentration – binds to lots of sites on proteins. But although the concept is easy to describe, it is much harder to quantify: everyone can agree that palytoxin is more complex than methane, but by how much? And if complexity could be measured, could it help in optimizing libraries? This is the subject of a review by Oscar Méndez-Lucio and José Medina-Franco at the Universidad Nacional Autónoma de México published recently in Drug Discovery Today.
There are many ways to measure molecular complexity. Two of the simplest to calculate are the fraction of chiral centers (FCC) and the fraction of sp3 carbons (Fsp3). Both range from 0 to 1, with larger values implying a greater number of unique molecules sharing the same formula.
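As a rough sketch of how these two fractions work (the per-carbon normalization here is my assumption about the definitions, which may differ in detail from those used in the review):

```python
def fraction_sp3(hybridizations):
    """Fsp3: number of sp3 carbons divided by total carbon count.

    `hybridizations` holds one entry ('sp', 'sp2', or 'sp3') per
    carbon atom in the molecule.
    """
    if not hybridizations:
        return 0.0
    return sum(1 for h in hybridizations if h == "sp3") / len(hybridizations)


def fraction_chiral(n_chiral_centers, n_carbons):
    """FCC: number of chiral centers divided by total carbon count."""
    if n_carbons == 0:
        return 0.0
    return n_chiral_centers / n_carbons


# Benzene: six sp2 carbons, no stereocenters.
print(fraction_sp3(["sp2"] * 6))   # 0.0
print(fraction_chiral(0, 6))       # 0.0

# Cyclohexane: six sp3 carbons, still no stereocenters.
print(fraction_sp3(["sp3"] * 6))   # 1.0
```

In practice one would derive the hybridizations and stereocenters from a structure using a cheminformatics toolkit such as RDKit rather than entering them by hand.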
More complicated methods to measure complexity abound, but many of these require specialized software. Two that are publicly available are PubChem complexity and DataWarrior complexity. In PubChem, complexity incorporates the number of elements as well as structural features such as symmetry, though stereochemistry is not explicitly considered, and aromaticity is scaled such that both benzene and cyclohexane have the same complexity – a sharp contrast to FCC and Fsp3. DataWarrior uses its own metric, though I couldn’t find the definition. (Ironically, though the software itself is open source, the paper describing it is not.)
So, do more complex molecules have lower hit rates? The researchers looked at several public databases of screening data for dozens of assays against thousands of molecules. Using each of the four metrics, they classified molecules as “simple,” “intermediate,” or “complex”. For FCC and Fsp3, simple compounds did appear to be more promiscuous, in line with theory and with previous findings. However, for PubChem and DataWarrior, the trends were not clear – and even reversed in some cases. The researchers note that the median complexity of molecules in each dataset may vary, and as Pete has also observed, simple binning strategies can be misleading.
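This kind of binning analysis can be sketched as follows; the tertile cutoffs and toy data are my own illustrative assumptions, not values from the paper:

```python
def hit_rate_by_tertile(complexities, hits):
    """Split molecules into 'simple'/'intermediate'/'complex' tertiles
    by complexity score and report the hit rate within each bin."""
    order = sorted(range(len(complexities)), key=lambda i: complexities[i])
    n = len(order)
    bins = {
        "simple": order[: n // 3],
        "intermediate": order[n // 3 : 2 * n // 3],
        "complex": order[2 * n // 3 :],
    }
    return {label: sum(hits[i] for i in idx) / len(idx)
            for label, idx in bins.items()}


# Toy data: nine molecules where the simpler ones happen to hit more often.
complexities = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
hits =         [1,   1,   1,   1,   0,   1,   0,   0,   0]
print(hit_rate_by_tertile(complexities, hits))
# {'simple': 1.0, 'intermediate': 0.666..., 'complex': 0.0}
```

The caveat in the text applies here too: with fixed tertile cutoffs, shifting the median complexity of a dataset can move molecules between bins and change the apparent trend without any change in the underlying relationship.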
Do these different definitions of complexity even measure the same thing? The researchers plotted each pair-wise measurement of complexity for >400,000 molecules – for example, Fsp3 vs DataWarrior. Not only are there no universal correlations, but those that do exist conflict with one another. “For example,” the authors write, “compounds with high FCC values are associated with low PubChem complexity values, whereas the same molecules have high DataWarrior complexity.”
Teddy has previously invoked Justice Potter Stewart and his famous “I know it when I see it” expression, and I think that just about sums up where things stand in terms of molecular complexity. From a practical standpoint this probably doesn’t matter; a complex molecule is not even necessarily more difficult to make, as evidenced by the ease of oligonucleotide and peptide synthesis. Still, it would be nice if someone could come up with a reliable measurement for such a fundamental property – or even demonstrate whether or not such measures are possible.