27 July 2020

Flatland: a nice place to be

The ideal shape of compounds used for biological screens is a subject of vigorous debate, with some arguing that shapely molecules may be superior in various ways to the “flatter” aromatic compounds that tend to dominate libraries. This view was expressed more than a decade ago in the paper, “Escape from Flatland: Increasing Saturation as an Approach to Improving Clinical Success.” However, those conclusions have been challenged. Since many of us are trying to discover drugs, it is worth asking what actual drugs look like. This is the subject of a new ACS Med. Chem. Lett. paper by Seth Cohen and colleagues at the University of California, San Diego.

Assessing shapeliness is itself contentious. Here the researchers chose the intuitive metric, principal moment of inertia (PMI), which uses a simple triangle plot to assess whether a molecule is more rod-like, disk-like, or sphere-like. The degree of shapeliness (3D Score) can be calculated by summing the x- and y-coordinates to give values between 1 (rod- or disk-like) and 2 (sphere-like).
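
For readers who want to try this themselves, here is a minimal sketch of the calculation, assuming the usual normalized PMI ratios (I1/I3 and I2/I3, with I1 ≤ I2 ≤ I3) as the x- and y-coordinates of the triangle plot. The function name pmi_3d_score and the choice of plain NumPy are mine for illustration, not the paper's code, and the result depends entirely on which conformer's coordinates you feed in.

```python
import numpy as np

def pmi_3d_score(coords, masses=None):
    """Return (I1/I3, I2/I3, 3D Score) for one conformer.

    coords: (N, 3) atomic coordinates; masses: optional length-N weights
    (unit masses are assumed if omitted, which is a simplification).
    """
    coords = np.asarray(coords, dtype=float)
    if masses is None:
        masses = np.ones(len(coords))
    masses = np.asarray(masses, dtype=float)

    # Shift coordinates so the center of mass sits at the origin.
    r = coords - np.average(coords, axis=0, weights=masses)

    # Inertia tensor: sum_i m_i * (|r_i|^2 * Identity - r_i r_i^T)
    r2 = (r * r).sum(axis=1)
    inertia = (masses * r2).sum() * np.eye(3) - (masses[:, None] * r).T @ r

    # Principal moments are the eigenvalues, returned in ascending order.
    i1, i2, i3 = np.linalg.eigvalsh(inertia)
    if i3 <= 0:
        raise ValueError("need at least two non-coincident atoms")

    npr1, npr2 = i1 / i3, i2 / i3
    return npr1, npr2, npr1 + npr2

# Idealized test shapes: a three-point rod and an octahedron.
rod = [(-1, 0, 0), (0, 0, 0), (1, 0, 0)]
octahedron = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
print(pmi_3d_score(rod)[2], pmi_3d_score(octahedron)[2])
```

With these idealized shapes the rod sits at the (0, 1) corner of the triangle and a flat regular hexagon at (0.5, 0.5), both giving a 3D Score of 1, while the octahedron sits at (1, 1) with a score of 2. Real molecules fall somewhere in between.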

The researchers first extracted more than 8500 drugs and nutraceuticals from DrugBank, all of which had associated three-dimensional structures and MW >100. PMI calculations revealed that nearly 80% were linear or planar, with 3D Scores < 1.2. Another 17.5% had 3D Scores up to 1.4, while only 0.5% were greater than 1.6. Interestingly, this distribution is similar to that of the ZINC database of small molecules. You might expect a correlation between size and shapeliness, with larger molecules being more three-dimensional, but this was not the case. Perhaps related, a separate analysis found no correlation between shapeliness of fragments and resulting leads.

The 3D structures of compounds in DrugBank are calculated for energy-minimized conformations, which are not necessarily the biologically relevant conformations. So the researchers next went to the Protein Data Bank (PDB) and its crystal structures of 502 unique DrugBank molecules bound to various proteins. Some molecules were represented multiple times (1036 structures of sapropterin!), and for these the PMIs were averaged. The results of this analysis were similar, with 83.5% of molecules having a 3D Score < 1.2 and just three molecules with a 3D Score > 1.6. As with the DrugBank data, there was no correlation between 3D Score and molecular weight.

Further analysis of compounds with multiple crystallographic structures was interesting. For diclofenac, with 51 PDB entries, 3D Scores ranged from 1.03 to 1.52, with the minimized score being 1.22. However, some of these structures are likely low affinity with questionable biological relevance. In contrast, for five approved HIV drugs, the PMIs remained very similar for molecules bound in the active sites.

Getting out of flatland is surprisingly difficult: the researchers examined the PMIs for several fragments from libraries designed to have shapely members and found that none had 3D Scores > 1.4. They suggest clever ways of increasing three-dimensionality, such as building organometallic molecules. While this is likely to increase novelty and patentability, it also introduces unknown biological risks. One analysis that would be interesting is whether natural-product-derived drugs are significantly shapelier than their purely synthetic counterparts.

The researchers conclude:

The true need for topological diversity in feedstocks and final drug molecules remains unclear given the overwhelming number of linear and planar drugs. The question remains as to whether more 3D compounds represent attractive and untapped therapeutic space, or if more linear/planar molecules are indeed the best topologies for bioactive molecules.

This is indeed an interesting question, and I hope that chemists – particularly those in academia – continue to make and test ever more exotic molecules. But since the first word of this blog is “Practical,” I would not discount the more planar molecules that make up most of our pharmacopoeia.


Christophe L Verlinde said...

Cohen's paper is in essence a rehash of the 2003 JCIM paper by Sauer and Schwarz. The only difference is that the 2003 paper used the MDL Drug Data Report (MDDR) instead of the DrugBank, which did not exist at the time. Same 3D metric, similar database, so what is new?

Peter Kenny said...

Hi Dan, I think people doing analyses of ‘3Dness’ (whether they are believers or unbelievers) need to take more account of both conformation and atomic volume. In the 19th century, van der Waals showed that molecules have volume and yet, 20 years into the 21st century medicinal chemists are describing benzene as a 2D molecule. This article on shape indexing may be of interest. One mistake made by some 3D evangelists is to see ‘3Dness’ as a compound quality issue rather than as a molecular shape diversity issue. The compounds that I’ve seen from 3D fragment libraries typically exhibit excessive molecular complexity and I’m guessing that there would be benefits in relaxing the lipophilicity cutoff when designing 3D fragment libraries.

Dan Erlanson said...

Hi Christophe,

Actually, if anything, the 2020 paper contradicts the 2003 paper, which concluded that "maximum shape space coverage was further shown to be correlated with, and probably necessary for, broad biological activity." Also, the 2003 paper looked at bioactivities of HTS hits, whereas the 2020 paper focused on drugs, which have far more demanding constraints in terms of solubility, pharmacokinetics, etc. Finally, the 2003 paper looked only at calculated shapes, whereas the 2020 paper also looked at experimentally determined conformations from the PDB.

Hi Pete,

I think most people realize that benzene is not actually 2-dimensional, just as no pancake is truly flat, and yet everyone knows the cliche "flat as a pancake."

I do agree with your point about shapelier molecules being more complex; I've also discussed this. The problem is that there don't seem to be any unambiguous metrics for molecular complexity, so it is difficult to quantify.

Peter Kenny said...

Hi Dan,

I think the issue with using the 2D label to describe benzene is that it gives the impression that benzene and cyclohexane differ greatly in shape although I accept that most people are aware (at least when prompted) that benzene has thickness.

I’m guessing that some of the people designing ‘3D fragment’ libraries are thinking in terms of scaffolds rather than the substituents which are in closest contact with the molecular surface of the target. I take the latter view and, if designing a ‘3D fragment’ library, I’d focus on minimally-substituted aliphatic bicyclics (this may lead to fragments with logP > 3 getting selected and it wouldn’t worry me unduly if it caused our friends at Astex to spit a few feathers). I use extent of substitution to control molecular complexity when selecting fragments (I’m a huge fan of the Hann model but I find it too abstract to be of practical value when selecting compounds for screening). I’ve defined extent of complexity using SMARTS and there is some code (actually better than what I wrote at AZ), built using the OEChem (OpenEye) toolkit, in the SI for the correlation inflation article.

Serghei Glinca said...

Hi Dan, I miss in Cohen’s paper issues like “target complexity”, fsp3 and the influence of ligand conformations on the 3D shape in solution and upon binding. A global measure of the 3D shape capturing one conformation or a small set of conformations does not indicate that substructures with tetrahedral sp3 atoms are not present.
In my experience with primary crystallographic screenings, a selection that considers a mix of sp2 and sp3 carbons in fragments, e.g. by fsp3, delivers the most valuable and practical hits with novel modalities. But a primary crystallographic screen with carefully designed soaking systems is different and much more informative compared to binary biophysical screenings.
The conclusion that a PMI analysis capturing only the global 3Dness of compounds should argue for screening of “flat” (=sp2 rich) compounds should consider the risk of losing diversity of the chemical space (as pointed out by Pete) and insights into preorganizational features.

Glyn Williams said...

Hi Dan & Pete,

I nearly choked on my feathers!

When I was in a company where all hits are structurally validated by crystallography, it was obvious that multiple, overlaid (flat-ish) hits could describe a 3D volume that represented most, or all, of the space occupied by diverse drug-like compounds (see HSP90). However, multiple hits are only usually available for such tractable targets. The push for greater diversity of shape came when fragments were being directed against more challenging targets and protein interfaces. Usually this was also accompanied by a move to push the biophysical assays that were used for screening to higher concentrations of fragments.
So, for most applications where 3D libraries might generate important hits, good aqueous solubility and lack of aggregation will be important. Although aggregation and solubility might correlate with logP (or even clogP), a better strategy is to measure them. My guess is that both parameters may be improved by more 3D character.

The best part of the search for 3D compounds is the new chemistry that has been developed to make them and the wider applications that may exist for them. A concern for drug design is that some may be too rigid and (in agreement with Unknown) molecules with multiple, low-energy conformations will prove more valuable.

Dan Erlanson said...

Lots of good discussion!

Hi Pete,
I couldn't find your code for calculating complexity in the SI, just several data sets. I would be interested to know how it compares with the other complexity measures I linked to above.

Hi Unknown,
You bring up a good point about Fsp3, which was not addressed in the Cohen paper. It would be interesting to rerun this sort of analysis as the Escape from Flatland paper is getting a bit long in the tooth and Pete's analysis did raise questions about the statistics. Target complexity is interesting (though also hard to measure) but there is no evidence that more complex targets require more complex ligands, as Glyn notes below. Conformational flexibility is also an important point, though this is touched on for example with diclofenac. Also, the three HIV protease inhibitors have multiple rotatable bonds and could in principle assume quite shapely conformations, yet the observed biological structures are more rod-like.

Hi Glyn,
As a chemist I wholeheartedly support developing new chemistry!

I also agree that shapelier molecules may have better properties (such as solubility), though my impression is that predicting solubility is still more guesswork than science, and predicting aggregation for novel molecules is probably not even guesswork.

I should emphasize that I have nothing against shapely molecules; some of them are quite beautiful, and I think aesthetics play a significant role (for better or worse) in medicinal chemistry. Perhaps in part because of this, "flat" compounds have become demonized over the past decade, and the Cohen paper is a useful reminder that actual drugs may not look like how we've idealized them.

Peter Kenny said...

Hi Glyn,

You’ve escaped so you no longer need to drink the Kool-Aid!

I’m actually a big fan of the idea of looking beyond aromatic rings and, by implication, of new chemistry. The correlation inflation article, which many regard as fundamentally ‘anti-3D’, actually makes these points: “One limitation of aromatic rings as components of drug molecules is that some regions above and below the plane defined by the atomic nuclear positions are not directly accessible to substituents. Molecular recognition considerations suggest a focus on achieving axial substitution in saturated rings with minimal steric footprint, for example by exploiting the anomeric effect or by substituting N-acylated cyclic amines at C2.”

As mentioned earlier, I think that the 3D evangelists are making a serious error by basing their case on ‘compound quality’ rather than molecular recognition. While I would argue for a molecular recognition focus on scientific grounds, compound quality arguments are hamstrung by having to adopt a genuflectory view of some very shaky data analysis (in which continuous data are binned prior to analysis). The Flatland analyses (Fsp3) typically test the significance of differences between mean values and there are two flaws in this approach. First, trends appear to become stronger as the data sets get larger. Second, the ordering of the bins is lost. The approach to data presentation (I would hesitate to use the term ‘analysis’) used in support of solubility forecast index (the old name for property forecast index) is to compare bar charts visually, effectively turning what should be a straightforward regression problem into a subjective beauty contest.
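
To make the binning point concrete, here is a minimal sketch using purely synthetic data (nothing below comes from the Flatland or correlation inflation papers). It compares the Pearson correlation computed on individual compounds with the one computed on bin means. The function name raw_and_binned_r, the five equal-population bins, and the noise level are all assumptions chosen for illustration.

```python
import numpy as np

def raw_and_binned_r(x, y, n_bins=5):
    """Pearson r on the raw points and on the per-bin means of x and y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    raw_r = np.corrcoef(x, y)[0, 1]

    # Equal-population bins on x (quantile edges), then assign each point a bin.
    edges = np.quantile(x, np.linspace(0.0, 1.0, n_bins + 1))
    idx = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, n_bins - 1)

    # Averaging within bins discards the scatter of individual compounds.
    x_means = np.array([x[idx == b].mean() for b in range(n_bins)])
    y_means = np.array([y[idx == b].mean() for b in range(n_bins)])
    binned_r = np.corrcoef(x_means, y_means)[0, 1]
    return raw_r, binned_r

# Synthetic example: a weak underlying trend buried in noise.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 2000)
y = x + rng.normal(0.0, 1.0, 2000)
raw_r, binned_r = raw_and_binned_r(x, y)
print(f"raw r = {raw_r:.2f}, r of bin means = {binned_r:.2f}")
```

The trend across bin means looks far stronger than the trend across individual compounds, because averaging hides the within-bin variation, and the noise in each mean shrinks further as the data set grows.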

Peter Kenny said...

Hi Dan,

Apologies for the misinformation. The code (ssprofilter) is actually in the SI for the alkane/water logP article and it doesn’t actually calculate molecular complexity explicitly. It combines the functionality of two programs (the substructural profiler Struct_anal and Filter) that I created at Zeneca in the mid-1990s using the Daylight programming toolkits. Struct_anal, which came first, enabled users to count substructural targets (defined as SMARTS) in molecular structures. It formed the basis of the Zeneca ‘de-crapper’ (in essence, substructural alerts like PAINS filters that were defined by my colleagues Richard Button, Jeff Morris and others) that we used for processing HTS output. Filter, as the name implies, enabled users to specify whether a molecular structure should be accepted or rejected according to the number of matches of substructural targets and is mentioned in our screening library design article.

As mentioned earlier, I restrict extent of substitution to control molecular complexity but I don’t actually define complexity explicitly. I would generally start fragment selection with small, structurally-prototypical compounds using a list of ‘allowed’ substituents. My approach to selecting the core of a fragment library is ‘German’ according to the joke about the legal systems of European countries (in Germany everything is forbidden except if it is specifically permitted).

Serghei Glinca said...

Indeed, great discussion!

Totally agree with Glyn's points regarding the chemistry. Working with 3D compounds does not mean the chemistry is impractical. You just have to have the right partner for that. To add a point regarding high concentrations: this is also true for soaking experiments. I hear too often the disappointment in the voice of chemistry about an "empty structure". This happens because critical parameters haven't been considered. In my experience, high concentrations for sp3-rich compounds in soaking experiments are critical to achieve a high occupancy due to their flexibility, and for sp2-rich compounds of course solubility is an issue.

I still have my concerns with the conclusion that rod-like compounds should be viewed as sp2-rich.

PS.: Unknown = Serghei