10 November 2014

Plenty of room at the bottom (of chemical space)

One of the key selling points of fragment-based lead discovery is that small fragments can search chemical space much more efficiently than larger compounds, since there are fewer possibilities. Nonetheless, the numbers are still daunting: there are more than 166 billion molecules with up to 17 non-hydrogen atoms. The question of how many of these are commercially available has come up before. In a paper just published online in Prog. Biophys. Mol. Biol., Chris Murray and colleagues at Astex take a new look at this and related questions.

Rather than considering all possible molecules, the researchers focused on six-membered rings with one or two small substituents of no more than six non-hydrogen atoms. Six-membered rings are found in many drugs, so this is a useful area of chemical space on which to focus. The researchers first considered “topologies,” simple two-dimensional representations of molecules. In the coarsest version, benzene, cyclohexane, pyridine, and piperidine would all have identical topologies: a six-membered ring with no substituents.

The researchers looked at how many topologies having up to 16 atoms were listed in the Available Chemicals Directory (ACD) of 2.7 million commercial molecules. Even using the coarse definition, in which all non-hydrogen atoms were considered equivalent, fewer than half of the 16-atom topologies were commercially available. At finer resolution (for example, differentiating carbon from nitrogen), the numbers dropped even further: less than 4% of the 2223 16-atom topologies with a pyridazine core were available.

However, things get better the smaller the molecule. When considering only molecules with 11 non-hydrogen atoms, all of the coarsest topologies are available, as are more than 70% of pyridazines. From this, the researchers concluded:
We need to focus on fragments with lower heavy atom counts and… improve the sensitivity of our screening methods to make sure that we can identify the binding of these smaller fragments.
The rest of the paper discusses how they applied this approach, and what lessons they learned.

The researchers assert that X-ray crystallography (upon which Astex was founded) is the most sensitive screening method. That may elicit some debate, but is defensible given the presence of extremely weak binders (water, buffer components, detergents) in many crystal structures. They also argue that while NMR may allow detection of fragments with lower solubilities, this may not be a good thing.

Of the 1633 fragments that were in the Astex library between 2001 and 2007, 22% came up as X-ray hits (i.e., they showed up in at least one crystal structure). Strikingly, fragments with 11 or 12 atoms were enriched far above their representation in the overall library, while fragments with 17 or more atoms were underrepresented. This is a beautiful confirmation of the “molecular complexity” hypothesis, the idea that there is a sweet spot where molecules are large enough to make productive interactions with a target but not so complex that negative interactions become dominant.
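“Enriched” here is a ratio of fractions. Below is a minimal Python sketch of one common way to compute it, assuming enrichment for a given heavy-atom count is defined as (share of hits with that count) / (share of the library with that count). The paper may define or bin it differently; the function name is hypothetical and the numbers are invented toy data, not results from the paper.

```python
from collections import Counter


def enrichment_by_heavy_atoms(library_hac, hit_hac):
    """Over/under-representation of each heavy-atom count (HAC) among hits.

    library_hac: HAC of every fragment in the screened library.
    hit_hac:     HAC of every fragment that was a hit (a subset of the library).
    Returns {hac: fraction of hits with that HAC / fraction of library with it};
    values above 1 mean that size is over-represented among hits.
    """
    if not library_hac or not hit_hac:
        raise ValueError("need at least one library member and one hit")
    lib, hits = Counter(library_hac), Counter(hit_hac)
    n_lib, n_hits = len(library_hac), len(hit_hac)
    # Counter returns 0 for sizes with no hits, giving an enrichment of 0.0.
    return {hac: (hits[hac] / n_hits) / (lib[hac] / n_lib) for hac in sorted(lib)}


# Toy numbers, invented for illustration only (not data from the paper).
library = [11] * 10 + [14] * 10 + [17] * 10
hits = [11] * 4 + [14] * 2
print(enrichment_by_heavy_atoms(library, hits))  # {11: 2.0, 14: 1.0, 17: 0.0}
```

In this toy case the 11-atom fragments make up a third of the library but two thirds of the hits, so they come out twofold enriched, while the 17-atom fragments never hit.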

These results led the researchers to redesign their library to focus on fragments having fewer than 17 non-hydrogen atoms, which entailed considerable custom synthesis. The resulting library has 1371 fragments, of which 47% have shown up as X-ray hits. The average size of hits is the same as that of the overall library (12.2 vs 12.4 non-hydrogen atoms and 172 vs 176 Da, respectively), though the hits are slightly more lipophilic (cLogP = 1.1 vs 0.9).

What about “three-dimensionality”? This is a topic that has been discussed quite a lot (it has come up in at least five earlier posts), so it is nice to have some solid data. One problem is how to define three-dimensionality: simple metrics such as Fsp3 don’t account for the fact that aromatic compounds such as 2,6-disubstituted biphenyls can be very non-planar. Many people use PMI, but the Astex researchers chose deviation from planarity (DFP). This method fits a hypothetical plane through the molecule that minimizes the deviation of all non-hydrogen atoms from it; the DFP is then the average distance of those atoms from the plane, in ångströms. So, for example, benzene has DFP = 0.0 Å, while cycloleucine has DFP = 0.54 Å. In this study, the researchers used a single conformation for each molecule, but since these fragments have on average only 1.3 rotatable bonds this is probably a reasonable simplification.
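The DFP recipe described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: it fits the plane by least squares (the plane through the centroid whose normal is the eigenvector of the coordinate covariance matrix with the smallest eigenvalue) and then reports the mean absolute distance of the atoms from that plane. The Astex paper, and Firth et al., may differ in details such as the fitting criterion. The function names are hypothetical, the 3x3 eigen-solver is a plain Jacobi iteration so the example needs only the standard library, and the benzene and chair coordinates are idealized toy geometries rather than real structures.

```python
import math


def _jacobi_eigen_3x3(a, max_sweeps=50, tol=1e-24):
    """Eigen-decompose a symmetric 3x3 matrix with cyclic Jacobi rotations.

    Returns (eigenvalues, eigenvectors); eigenvectors[i] is the unit vector
    belonging to eigenvalues[i].
    """
    a = [row[:] for row in a]                                    # work on a copy
    v = [[float(i == j) for j in range(3)] for i in range(3)]    # accumulated rotations
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(3) for j in range(3) if i != j)
        if off < tol:
            break
        for p in range(2):
            for q in range(p + 1, 3):
                if a[p][q] == 0.0:
                    continue
                # Rotation that zeroes a[p][q] (Numerical Recipes convention).
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(3):                               # A <- A P
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(3):                               # A <- P^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(3):                               # V <- V P
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    eigenvalues = [a[i][i] for i in range(3)]
    eigenvectors = [[v[k][i] for k in range(3)] for i in range(3)]  # columns of V
    return eigenvalues, eigenvectors


def deviation_from_planarity(coords):
    """Mean distance of heavy atoms (x, y, z) from their best-fit plane.

    Assumption: the plane is the least-squares plane; the paper's exact
    fitting criterion may differ.
    """
    n = len(coords)
    if n <= 3:
        return 0.0  # three or fewer points are always coplanar
    centroid = [sum(p[i] for p in coords) / n for i in range(3)]
    centered = [[p[i] - centroid[i] for i in range(3)] for p in coords]
    cov = [[sum(p[i] * p[j] for p in centered) / n for j in range(3)] for i in range(3)]
    values, vectors = _jacobi_eigen_3x3(cov)
    normal = vectors[min(range(3), key=lambda i: values[i])]  # direction of least spread
    return sum(abs(sum(p[i] * normal[i] for i in range(3))) for p in centered) / n


# Idealized toy geometries (not real structures): a flat hexagon and a
# chair-like ring with atoms alternating 0.25 Å above and below the mean plane.
angles = [math.radians(60 * k) for k in range(6)]
benzene = [(1.40 * math.cos(a), 1.40 * math.sin(a), 0.0) for a in angles]
chair = [(1.45 * math.cos(a), 1.45 * math.sin(a), 0.25 * (-1) ** k) for k, a in enumerate(angles)]
print(round(deviation_from_planarity(benzene), 3))  # 0.0
print(round(deviation_from_planarity(chair), 3))    # 0.25
```

With NumPy available, `numpy.linalg.eigh` on the covariance matrix would replace the hand-written solver; the pure-Python version is used here only to keep the example self-contained.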

Roughly 40% of the Astex library has a DFP < 0.05 Å, but these “flat” fragments were enriched to ~50% among hits. Not surprisingly, kinase hits tended to be even more two-dimensional (>60% flat), but even protein-protein interaction (PPI) hits were, if anything, slightly more planar than the overall collection, which is consistent with another recent study. Indeed, there seems to be nothing special at all about PPI hits, more than half of which were also found against non-PPI targets. The researchers argue that 3D fragments are inherently more complex and thus less likely to show up as hits, which supports Teddy’s Safran Zunft challenge.
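A shift from ~40% of the library to ~50% of the hits can be converted into per-class hit rates. If a fraction f_lib of the library is flat, a fraction f_hit of the hits is flat, and the overall hit rate is h, then the flat hit rate is h·f_hit/f_lib. Plugging in the rounded figures quoted above (approximations, not exact values from the paper):

```latex
\begin{aligned}
\mathrm{HR}_{\text{flat}} &= h\,\frac{f_{\text{hit}}}{f_{\text{lib}}} \approx h\,\frac{0.50}{0.40} = 1.25\,h,\\
\mathrm{HR}_{\text{non-flat}} &= h\,\frac{1-f_{\text{hit}}}{1-f_{\text{lib}}} \approx h\,\frac{0.50}{0.60} \approx 0.83\,h,\\
\frac{\mathrm{HR}_{\text{flat}}}{\mathrm{HR}_{\text{non-flat}}} &\approx \frac{0.50/0.40}{0.50/0.60} = 1.5.
\end{aligned}
```

On these rounded numbers a flat fragment would be about 1.5 times as likely to be a hit as a non-flat one; whether that difference is meaningful depends on sample sizes, which this back-of-the-envelope calculation ignores.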

One of the arguments in favor of three-dimensionality is that such molecules may have better physicochemical properties, and the researchers examine the DFP for fragments and resulting leads. It turns out that there is a weak correlation between the shapeliness of a fragment and that of the resulting lead, but there are many exceptions (such as this one).

Some of these data have been publicly presented, but this paper should broaden the discussion. Coming back to the title of this post, the conclusion is that fragments should be made as small as your assay can detect. And flat is the new black.

2 comments:

Anonymous said...

This paper contains one of the least scientifically relevant passages I have read in a long time.

"In order to assess the deviation from planarity for molecules we use an approach that has recently been described and analysed in detail by Firth et al. (Firth et al., 2012). A number of years ago, we independently developed this method and here we give a brief description of the approach."

I think I'll try to claim gravity the next time I'm writing up a manuscript!

Matthew R. Lee said...

I really enjoyed the paper, but I don't quite agree with some of the authors' points with respect to 3D fragments. The authors state that "even for PPI targets, there is still a higher % of planar X-ray hits than the % in the library itself (although . . . not statistically significant)". They then state that "non-planar fragments . . . generally show a slightly poorer hit rate than flat molecules".

My interpretation of the data shown is that for kinases, the aromatic hit rate is higher. But for PPIs, the hit rates of planar and non-planar fragments appear pretty comparable.

I really like the point about trying to de-emphasize hit rates from fragment libraries and instead to focus on whether or not the hits "can be optimised into good chemical leads or drugs".

Along those lines, for very difficult targets with shallow 3D pockets, the hit rates of non-planar molecules ought to be quite low, but perhaps still higher than those of aromatics, since the complementarity of aromatics may be too weak for binding to be detected. However, this would depend on the fragment library containing the right non-planar fragments that can fit... a tough challenge given the much larger chemical space of non-planar fragments.

Thus, hit rates can be quite deceiving for that reason as well. If you happen to have the right fit for a tough target with a shallow, spherical pocket, hit rates might be exceedingly low, but the hits could also be of tremendous value and able to generate leads. And in those cases, aromatics might not show detectable binding. And if your fragment library's non-planar space is insufficient, none of the non-planar fragments may be detectable either, but if you've got the right substituted, saturated ring, you'll get the hits.

Lastly, I really like the discussion of how Astex found the chemical space coverage of commercially available fragments to be sparse, thus propelling them to "enhance the library via synthesis".