Molecular complexity is a fundamental concept
underlying fragment-based lead discovery: fragments, being simple, can bind
to more sites on proteins and thus give higher hit rates than larger, more complex
molecules. The ultimate example of this is water, which – at 55 M concentration
– binds to lots of sites on proteins. But although the concept is easy to
describe, it is much harder to quantify: everyone can agree that palytoxin is
more complex than methane, but by how much? And if complexity could be measured, could it help in optimizing libraries? This is the subject of a review by
Oscar Méndez-Lucio and José Medina-Franco at the Universidad Nacional Autónoma
de México published recently in Drug
Discovery Today.
There are many ways to measure molecular complexity. Two of
the simplest to calculate are the fraction of chiral centers (FCC) and the
fraction of sp3 carbons (Fsp3). These range from 0 to 1,
and larger numbers imply a higher number of unique molecules with the same
formula.
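For anyone who wants to try these metrics themselves, here is a minimal sketch using RDKit (my choice of toolkit, not one named in the review). Note that I am taking FCC as chiral centers (including unassigned ones) divided by heavy-atom count; the review's exact denominator may differ.

```python
# Minimal sketch: FCC and Fsp3 with RDKit (an illustration, not the
# review's pipeline). FCC here = chiral centers / heavy atoms; the
# published definition may use a different denominator.
from rdkit import Chem
from rdkit.Chem import rdMolDescriptors

def fcc(mol):
    """Fraction of chiral centers, counting unassigned stereocenters too."""
    centers = Chem.FindMolChiralCenters(mol, includeUnassigned=True)
    return len(centers) / mol.GetNumHeavyAtoms()

mol = Chem.MolFromSmiles("C[C@H](N)C(=O)O")  # L-alanine
print("FCC  =", round(fcc(mol), 3))
print("Fsp3 =", round(rdMolDescriptors.CalcFractionCSP3(mol), 3))
```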
More complicated methods to measure complexity abound, but
many of these require specialized software. Two that are publicly available are
PubChem complexity and DataWarrior complexity. In PubChem, complexity
incorporates the number of elements as well as structural features such as
symmetry, though stereochemistry is not explicitly considered, and aromaticity
is scaled such that both benzene and cyclohexane have the same complexity – a
sharp contrast to FCC and Fsp3. DataWarrior uses its own metric,
though I couldn’t find the definition. (Ironically, though the software itself
is open source, the paper describing it is not.)
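PubChem's score, at least, is precomputed for every compound in the database and can be retrieved programmatically. A quick sketch using the PUG REST API's documented Complexity property (the compound names are just examples):

```python
# Hedged sketch: fetch PubChem's precomputed complexity score via PUG REST.
# "Complexity" is a documented property name in the PUG REST property table.
import requests

def pubchem_complexity(name):
    url = (f"https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/name/"
           f"{name}/property/Complexity/JSON")
    data = requests.get(url, timeout=10).json()
    return data["PropertyTable"]["Properties"][0]["Complexity"]

for name in ("benzene", "cyclohexane"):
    print(name, pubchem_complexity(name))
```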
So, do more complex molecules have lower hit rates? The
researchers looked at several public databases of screening data for dozens of
assays against thousands of molecules. Using each of the four metrics, they
classified molecules as “simple,” “intermediate,” or “complex”. For FCC and Fsp3,
simple compounds did appear to be more promiscuous, in line with theory and
with previous findings. However, for PubChem and DataWarrior, the trends were
not clear – and even reversed in some cases. The researchers note that the
median complexity of molecules in each dataset may vary, and, as Pete has also observed, simple binning strategies can be misleading.
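To make the binning concrete, here is a toy sketch of that classification using tertiles in pandas; the column names ("Fsp3", "hit") and the tertile scheme are my assumptions, not the authors' actual pipeline.

```python
# Toy sketch of the simple/intermediate/complex binning: split a compound
# set into complexity tertiles and compare the mean hit rate per bin.
import pandas as pd

df = pd.DataFrame({
    "Fsp3": [0.1, 0.2, 0.3, 0.5, 0.6, 0.9],  # toy complexity values
    "hit":  [1,   1,   0,   1,   0,   0],     # 1 = active in an assay
})
df["bin"] = pd.qcut(df["Fsp3"], q=3,
                    labels=["simple", "intermediate", "complex"])
print(df.groupby("bin", observed=True)["hit"].mean())
```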
Do these different definitions of complexity even measure the same thing? The researchers plotted each pairwise combination of complexity metrics for >400,000 molecules – for example, Fsp3 vs DataWarrior. Not only are there no universal correlations, but those that do exist are conflicting. “For example,” the authors write, “compounds with high FCC values are associated with low PubChem complexity values, whereas the same molecules have high DataWarrior complexity.”
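For those curious about the mechanics, a rank correlation such as Spearman's is one simple way to quantify agreement between two metrics; the values below are made up purely to illustrate the calculation.

```python
# Sketch: Spearman rank correlation between two complexity metrics.
# The numbers are invented to show the mechanics, not real data.
from scipy.stats import spearmanr

fcc_vals     = [0.0, 0.1, 0.2, 0.4, 0.5]   # toy FCC values
pubchem_vals = [120, 95,  80,  60,  55]    # toy PubChem scores
rho, p = spearmanr(fcc_vals, pubchem_vals)
print(f"Spearman rho = {rho:.2f} (p = {p:.3f})")  # strongly negative here
```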
Teddy has previously invoked Justice Potter Stewart and his famous
“I know it when I see it” expression, and I think that just about sums up where
things stand in terms of molecular complexity. From a practical standpoint this
probably doesn’t matter; a complex molecule is not even necessarily more
difficult to make, as evidenced by the ease of oligonucleotide and peptide
synthesis. Still, it would be nice if someone could come up with a reliable
measurement for such a fundamental property – or even demonstrate whether or
not such measures are possible.
If it is anything like the situation with molecular FINGERPRINTING, they will be all over the place. We recently compared a matrix of marketed drugs vs endogenous metabolites in terms of the RANK ORDER of their similarities. Very little consonance. One conclusion is that each can have value in different circumstances. I can go with that.
O'Hagan S, Kell DB: Consensus rank orderings of molecular fingerprints illustrate the ‘most genuine’ similarities between marketed drugs and small endogenous human metabolites, but highlight exogenous natural products as the most important ‘natural’ drug transporter substrates. bioRxiv 2017:110437.
Preprint at: http://biorxiv.org/content/early/2017/02/21/110437
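For anyone who wants to reproduce the flavor of that comparison, here is a minimal sketch: rank a small set of molecules by Tanimoto similarity to a query under two different fingerprints and measure how well the rank orders agree with Kendall's tau. This is an illustration only, not the O'Hagan & Kell protocol.

```python
# Sketch: compare the rank order of Tanimoto similarities under two
# fingerprints (Morgan vs MACCS). Low tau = the rankings disagree.
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem, MACCSkeys
from scipy.stats import kendalltau

smiles = ["CCO", "CCN", "CCCC", "c1ccccc1", "CC(=O)O"]
mols = [Chem.MolFromSmiles(s) for s in smiles]
query = Chem.MolFromSmiles("CCOC")

def sims(fp_fn):
    """Tanimoto similarity of each molecule to the query under fp_fn."""
    qfp = fp_fn(query)
    return [DataStructs.TanimotoSimilarity(qfp, fp_fn(m)) for m in mols]

morgan = lambda m: AllChem.GetMorganFingerprintAsBitVect(m, 2, 2048)
tau, _ = kendalltau(sims(morgan), sims(MACCSkeys.GenMACCSKeys))
print(f"Kendall tau between Morgan and MACCS rankings: {tau:.2f}")
```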
Have you seen this paper? "An Additive Definition of Molecular Complexity" by Thomas Böttcher https://doi.org/10.1021/acs.jcim.5b00723
A framework for molecular complexity is established that is based on information theory and consistent with chemical knowledge. The resulting complexity index Cm is derived from abstracting the information content of a molecule by the degrees of freedom in the microenvironments on a per-atom basis, allowing the molecular complexity to be calculated in a simple and additive way. This index allows the complexity of any molecule to be universally assessed and is sensitive to stereochemistry, heteroatoms, and symmetry. The performance of this complexity index is evaluated and compared against the current state of the art. Its additive character gives consistent values also for very large molecules and supports direct comparisons of chemical reactions. Finally, this approach may provide a useful tool for medicinal chemistry in drug design and lead selection, as demonstrated by correlating molecular complexities of antibiotics with compound-specific parameters.
The DataWarrior method was published: "Molecular Complexity Calculated by Fractal Dimension", Modest von Korff & Thomas Sander, https://www.nature.com/articles/s41598-018-37253-8
Which software or tools can one use to calculate FCC? I'm only seeing number of chiral centers in KNIME and DataWarrior.