20 February 2017

Many measures of molecular complexity

Molecular complexity is a fundamental concept underlying fragment-based lead discovery: fragments, being simple, can bind to more sites on proteins and thus give higher hit rates than larger, more complex molecules. The ultimate example of this is water, which – at 55 M concentration – binds to lots of sites on proteins. But although the concept is easy to describe, it is much harder to quantify: everyone can agree that palytoxin is more complex than methane, but by how much? And if complexity could be measured, could it help in optimizing libraries? This is the subject of a review by Oscar Méndez-Lucio and José Medina-Franco at the Universidad Nacional Autónoma de México published recently in Drug Discovery Today.

There are many ways to measure molecular complexity. Two of the simplest to calculate are the fraction of chiral centers (FCC) and the fraction of sp3 carbons (Fsp3), the latter being the number of sp3-hybridized carbons divided by the total number of carbons. Both range from 0 to 1, and larger values imply that more distinct molecules can share the same formula.
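As a rough illustration of how these two fractions behave, here is a minimal Python sketch that works from hand-counted carbon tallies rather than from a cheminformatics toolkit. The `CarbonCounts` helper and the four example molecules are my own additions, and the FCC denominator (total carbons) is an assumption; check the review for the exact definition before relying on it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CarbonCounts:
    """Hand-counted carbon tallies for one molecule (hypothetical helper)."""
    name: str
    carbons: int         # total carbon atoms
    sp3_carbons: int     # sp3-hybridized carbons
    chiral_carbons: int  # carbons that are stereocenters


def fsp3(c: CarbonCounts) -> float:
    """Fraction of sp3 carbons: sp3 carbons / total carbons."""
    return c.sp3_carbons / c.carbons if c.carbons else 0.0


def fcc(c: CarbonCounts) -> float:
    """Fraction of chiral centers.

    ASSUMPTION: computed as chiral carbons / total carbons.
    """
    return c.chiral_carbons / c.carbons if c.carbons else 0.0


# Counts worked out by hand from the structures.
MOLECULES = [
    CarbonCounts("methane", carbons=1, sp3_carbons=1, chiral_carbons=0),
    CarbonCounts("benzene", carbons=6, sp3_carbons=0, chiral_carbons=0),
    CarbonCounts("cyclohexane", carbons=6, sp3_carbons=6, chiral_carbons=0),
    CarbonCounts("alanine", carbons=3, sp3_carbons=2, chiral_carbons=1),
]

for m in MOLECULES:
    print(f"{m.name:12s} Fsp3={fsp3(m):.2f}  FCC={fcc(m):.2f}")
```

Note that benzene and cyclohexane sit at opposite ends of the Fsp3 scale, while neither has any chiral centers, so the two metrics already disagree about which simple molecules are "complex."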

More complicated methods to measure complexity abound, but many of these require specialized software. Two that are publicly available are PubChem complexity and DataWarrior complexity. In PubChem, complexity incorporates the number of elements as well as structural features such as symmetry, though stereochemistry is not explicitly considered, and aromaticity is scaled such that benzene and cyclohexane have the same complexity – a sharp contrast to FCC and Fsp3. DataWarrior uses its own metric, though I couldn’t find the definition. (Ironically, though the software itself is open source, the paper describing it is not.)

So, do more complex molecules have lower hit rates? The researchers looked at several public databases of screening data for dozens of assays against thousands of molecules. Using each of the four metrics, they classified molecules as “simple,” “intermediate,” or “complex.” For FCC and Fsp3, simple compounds did appear to be more promiscuous, in line with theory and with previous findings. However, for PubChem and DataWarrior, the trends were not clear – and even reversed in some cases. The researchers note that the median complexity of molecules in each dataset may vary, and, as Pete has also observed, simple binning strategies can be misleading.
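To make the binning step concrete, here is a small Python sketch of one way such an analysis could be set up. It is not the authors' procedure: the tertile cut-offs, the function names, and every number in the toy data are made up for illustration.

```python
LABELS = ("simple", "intermediate", "complex")


def tertile_bins(values):
    """Label each value by rank tertile (ASSUMED binning scheme).

    Ties are split by input order, which is one way binning can mislead.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    labels = [None] * n
    for rank, i in enumerate(order):
        labels[i] = LABELS[min(3 * rank // n, 2)]
    return labels


def hit_rate_by_bin(complexity, hits, assays):
    """Mean fraction of assays hit, per complexity bin."""
    sums, counts = {}, {}
    for label, h in zip(tertile_bins(complexity), hits):
        sums[label] = sums.get(label, 0.0) + h / assays
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


# HYPOTHETICAL data: nine compounds, each tested in ten assays.
toy_complexity = [0.0, 0.1, 0.2, 0.4, 0.5, 0.6, 0.8, 0.9, 1.0]
toy_hits = [6, 5, 4, 3, 3, 3, 1, 0, 2]

rates = hit_rate_by_bin(toy_complexity, toy_hits, assays=10)
for label in LABELS:
    print(f"{label:12s} mean hit rate = {rates[label]:.2f}")
```

Because the cut-offs here are relative to the dataset, a compound labeled "simple" in one collection could be "intermediate" in another, which is the kind of effect that makes comparisons across datasets tricky.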

Do these different definitions of complexity even measure the same thing? The researchers plotted each pair-wise measurement of complexity for >400,000 molecules – for example, Fsp3 vs DataWarrior. Not only are there no universal correlations, but those that do exist conflict with one another. “For example,” the authors write, “compounds with high FCC values are associated with low PubChem complexity values, whereas the same molecules have high DataWarrior complexity.”
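For anyone wanting to run a similar pair-wise comparison on their own compounds, the sketch below computes a Spearman rank correlation for every pair of metrics in a table. Spearman is my choice here, not necessarily what the authors used, and the metric names and values are placeholders rather than real complexity scores.

```python
from itertools import combinations


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def pairwise_spearman(table):
    """Correlation for every unordered pair of columns in a dict of lists."""
    return {(a, b): spearman(table[a], table[b]) for a, b in combinations(table, 2)}


# PLACEHOLDER values for five compounds under three hypothetical metrics.
toy_table = {
    "metric_a": [1, 2, 3, 4, 5],
    "metric_b": [2, 1, 4, 3, 5],
    "metric_c": [5, 4, 3, 2, 1],
}

for pair, rho in pairwise_spearman(toy_table).items():
    print(pair, round(rho, 2))
```

A rank-based statistic is convenient because the four metrics live on very different scales, but it only captures monotonic trends.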

Teddy has previously invoked Justice Potter Stewart and his famous “I know it when I see it” expression, and I think that just about sums up where things stand in terms of molecular complexity. From a practical standpoint this probably doesn’t matter; a complex molecule is not even necessarily more difficult to make, as evidenced by the ease of oligonucleotide and peptide synthesis. Still, it would be nice if someone could come up with a reliable measurement for such a fundamental property – or even demonstrate whether or not such measures are possible.


Douglas Kell said...

If it is anything like the situation with molecular FINGERPRINTING they will be all over the place. We recently compared a matrix of marketed drugs vs endogenous metabolites in terms of the RANK ORDER of their similarities. Very little consonance. One conclusion is that each can have value in different circumstances. I can go with that.

O'Hagan S, Kell DB: Consensus rank orderings of molecular fingerprints illustrate the ‘most genuine’ similarities between marketed drugs and small endogenous human metabolites, but highlight exogenous natural products as the most important ‘natural’ drug transporter substrates. bioRxiv version. bioRxiv 2017:110437.

Preprint at: http://biorxiv.org/content/early/2017/02/21/110437

Anonymous said...

Have you seen this paper? "An Additive Definition of Molecular Complexity" by Thomas Böttcher https://doi.org/10.1021/acs.jcim.5b00723

A framework for molecular complexity is established that is based on information theory and consistent with chemical knowledge. The resulting complexity index Cm is derived from abstracting the information content of a molecule by the degrees of freedom in the microenvironments on a per-atom basis, allowing the molecular complexity to be calculated in a simple and additive way. This index allows the complexity of any molecule to be universally assessed and is sensitive to stereochemistry, heteroatoms, and symmetry. The performance of this complexity index is evaluated and compared against the current state of the art. Its additive character gives consistent values also for very large molecules and supports direct comparisons of chemical reactions. Finally, this approach may provide a useful tool for medicinal chemistry in drug design and lead selection, as demonstrated by correlating molecular complexities of antibiotics with compound-specific parameters.

Anonymous said...

The DataWarrior method was published: "Molecular Complexity Calculated by Fractal Dimension," Modest von Korff & Thomas Sander, https://www.nature.com/articles/s41598-018-37253-8

Unknown said...

Which software or tools can one use to calculate FCC? I'm only seeing the number of chiral centers in KNIME and DataWarrior.