One of the key selling points of fragment-based lead discovery is that small fragments can search chemical space much more efficiently than larger compounds, since there are far fewer possible small molecules. Nonetheless, the numbers are still daunting: there are more than 166 billion possible molecules with up to 17 non-hydrogen atoms. The question of how many of these are commercially available has come up before. In a paper just published online in Prog. Biophys. Mol. Biol., Chris Murray and colleagues at Astex take a new look at this and related questions.
Rather than considering all possible molecules, the researchers focused on six-membered rings with one or two small substituents of no more than six non-hydrogen atoms. Six-membered rings are found in many drugs, so this is a useful area of chemical space on which to focus. The researchers first considered “topologies,” simple two-dimensional representations of molecules. In the coarsest version, in which all non-hydrogen atoms are treated as equivalent and bond orders are ignored, benzene, cyclohexane, pyridine, and piperidine all have identical topologies: a six-membered ring with no substituents.
The researchers looked at how many topologies with up to 16 non-hydrogen atoms were listed in the Available Chemicals Directory (ACD) of 2.7 million commercial molecules. Even using the coarse definition, in which all non-hydrogen atoms were considered equivalent, fewer than half of the 16-atom topologies were commercially available. At finer resolution (for example, differentiating carbon from nitrogen), coverage dropped even further: fewer than 4% of the 2223 16-atom topologies with a pyridazine core were available.
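To make the coarse-versus-fine distinction concrete, here is a minimal Python sketch that counts distinct labellings of a bare six-membered ring, treating two rings as the same topology if one can be rotated or flipped onto the other. It is an illustration under simplifying assumptions, not the Astex enumeration: it ignores bond orders, hydrogens, and the structure of substituents, and the helper names (`canonical_ring`, `distinct_rings`) are hypothetical. It does not attempt to reproduce the paper's counts.

```python
from itertools import product

def canonical_ring(labels):
    """Canonical form of a ring labelling under rotation and reflection.

    Two labellings describe the same ring topology if one can be rotated
    or flipped onto the other, so we pick the lexicographically smallest
    of all 2n equivalent sequences as the representative.
    """
    n = len(labels)
    variants = []
    for seq in (tuple(labels), tuple(reversed(labels))):
        for shift in range(n):
            variants.append(seq[shift:] + seq[:shift])
    return min(variants)

def distinct_rings(alphabet, size=6):
    """Set of distinct ring topologies when each position takes a label from `alphabet`."""
    return {canonical_ring(combo) for combo in product(alphabet, repeat=size)}

# Coarsest resolution: every ring atom is just "X" (benzene = cyclohexane = pyridine = piperidine)
coarse = distinct_rings(["X"])
# Finer resolution: distinguish carbon from nitrogen (bond orders still ignored)
fine = distinct_rings(["C", "N"])
# Rings with exactly two nitrogens: the pyridazine, pyrimidine and pyrazine patterns
diazines = [r for r in fine if r.count("N") == 2]
# Substitution patterns: "-" = unsubstituted position, "R" = substituent
disubstituted = [r for r in distinct_rings(["-", "R"]) if r.count("R") == 2]

print(len(coarse), len(fine), len(diazines), len(disubstituted))  # 1 13 3 3
```

Even in this toy setting, distinguishing just two atom types turns one ring topology into 13, and adding substituents grows the count further, which helps explain why coverage in the ACD falls off so quickly at finer resolution.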
However, things get better as molecules get smaller. For molecules with 11 non-hydrogen atoms, all of the coarsest topologies are available, as are more than 70% of the pyridazine topologies. From this, the researchers concluded:
We need to focus on fragments with lower heavy atom counts and… improve the sensitivity of our screening methods to make sure that we can identify the binding of these smaller fragments.
The rest of the paper discusses how they applied this approach, and what lessons they learned.
The researchers assert that X-ray crystallography (upon which Astex was founded) is the most sensitive screening method. That may elicit some debate, but it is defensible given the presence of extremely weak binders (water, buffer components, detergents) in many crystal structures. They also argue that while NMR may allow detection of fragments with lower solubilities, this may not be a good thing.
Of the 1633 fragments that were in the Astex library between 2001 and 2007, 22% came up as X-ray hits (i.e., they appeared in at least one crystal structure). Strikingly, fragments with 11 or 12 non-hydrogen atoms were strongly over-represented among hits relative to their share of the overall library, while fragments with 17 or more non-hydrogen atoms were under-represented. This is a beautiful confirmation of the “molecular complexity” hypothesis, the idea that there is a sweet spot where molecules are large enough to make productive interactions with a target but not so complex that negative interactions become dominant.
These results led the researchers to redesign their library to focus on fragments with fewer than 17 non-hydrogen atoms, which entailed considerable custom synthesis. The resulting library has 1371 fragments, of which 47% have shown up as X-ray hits. The average size of the hits is essentially the same as that of the overall library (hits vs library: 12.2 vs 12.4 non-hydrogen atoms and 172 vs 176 Da), though the hits are slightly more lipophilic (cLogP = 1.1 vs 0.9).
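A quick back-of-the-envelope comparison of the two libraries, using only the sizes and hit rates quoted above. The published percentages are rounded, so the implied hit counts are approximations rather than figures reported in the paper; `expected_hits` is a hypothetical helper.

```python
def expected_hits(library_size, hit_rate):
    """Approximate number of hits implied by a library size and a (rounded) hit rate."""
    return round(library_size * hit_rate)

old_hits = expected_hits(1633, 0.22)  # 2001-2007 library
new_hits = expected_hits(1371, 0.47)  # redesigned, smaller-fragment library
print(old_hits, new_hits)               # 359 644
print(round(0.47 / 0.22, 1))            # 2.1 (ratio of hit rates)
```

Despite containing about 260 fewer compounds, the redesigned library implies roughly 285 more hits, at a hit rate a little more than twice as high.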
What about “three-dimensionality”? This is a topic that has been discussed quite a lot (here, here, here, here, and here, for starters), so it is nice to have some solid data. One problem is how to define three-dimensionality: simple metrics such as Fsp3 (the fraction of sp3-hybridized carbon atoms) don’t account for the fact that aromatic compounds such as 2,6-disubstituted biphenyls can be very non-planar. Many people use principal moments of inertia (PMI), but the Astex researchers chose deviation from planarity (DFP). This method fits a hypothetical plane through the molecule that minimizes the deviation of all non-hydrogen atoms from it; DFP is then the average distance of those atoms from the plane, in ångströms. So, for example, benzene has DFP = 0.0 Å, while cycloleucine has DFP = 0.54 Å. In this study, the researchers used a single conformation for each molecule, but since these fragments have on average only 1.3 rotatable bonds, this is probably a reasonable simplification.
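The DFP calculation described above can be sketched in a few lines of Python with NumPy. This is a minimal sketch under stated assumptions: it fits the plane by least squares (via an SVD of the centered heavy-atom coordinates), which may differ in detail from the Astex implementation; it takes a single 3D conformation with hydrogens already removed; and it uses idealized, hypothetical ring coordinates rather than real structures. The names `deviation_from_planarity` and `hexagon` are hypothetical.

```python
import math
import numpy as np

def deviation_from_planarity(coords):
    """Mean distance (in the input units, e.g. Å) of heavy atoms from their
    least-squares best-fit plane. Assumes one 3D conformation, hydrogens removed."""
    xyz = np.asarray(coords, dtype=float)
    if xyz.ndim != 2 or xyz.shape[1] != 3 or len(xyz) < 3:
        raise ValueError("need an (N, 3) array of coordinates with N >= 3")
    centered = xyz - xyz.mean(axis=0)  # the least-squares plane passes through the centroid
    # The right singular vector with the smallest singular value is the plane normal
    _, _, vh = np.linalg.svd(centered, full_matrices=False)
    normal = vh[-1]
    distances = np.abs(centered @ normal)  # orthogonal distance of each atom from the plane
    return float(distances.mean())

def hexagon(radius=1.39, pucker=0.0):
    """Hypothetical six-atom ring on a regular hexagon; z alternates +pucker/-pucker (chair-like)."""
    return [
        (radius * math.cos(math.radians(60 * k)),
         radius * math.sin(math.radians(60 * k)),
         pucker if k % 2 == 0 else -pucker)
        for k in range(6)
    ]

print(round(deviation_from_planarity(hexagon()), 3))             # 0.0  (flat, benzene-like ring)
print(round(deviation_from_planarity(hexagon(pucker=0.25)), 3))  # 0.25 (puckered, chair-like ring)
```

Puckering alternate atoms by ±0.25 Å gives DFP = 0.25 Å because the best-fit plane passes midway between them. Least squares is a convenient choice here since the plane normal falls straight out of the SVD; a fit that minimizes absolute deviations would need an iterative solver.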
Roughly 40% of the Astex library has DFP < 0.05 Å, but these “flat” fragments were enriched to ~50% among hits. Not surprisingly, kinase hits tended to be even flatter (>60% with DFP < 0.05 Å), but even protein-protein interaction (PPI) hits were, if anything, slightly more planar than the overall collection, which is consistent with another recent study. Indeed, there seems to be nothing special at all about PPI hits: more than half of them were also found against non-PPI targets. The researchers argue that 3D fragments are inherently more complex and thus less likely to show up as hits, which supports Teddy’s Safran Zunft challenge.
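Enrichment here is simply the fraction of a subset among hits divided by its fraction in the library. A minimal sketch using the approximate percentages quoted above; because the kinase figure is given only as “>60%”, the second value is a lower bound. The function name `enrichment` is illustrative.

```python
def enrichment(fraction_in_hits, fraction_in_library):
    """Ratio > 1 means the subset is over-represented among hits relative to the library."""
    if fraction_in_library <= 0:
        raise ValueError("library fraction must be positive")
    return fraction_in_hits / fraction_in_library

flat_all_targets = enrichment(0.50, 0.40)  # ~50% of hits vs ~40% of library have DFP < 0.05 Å
flat_kinases = enrichment(0.60, 0.40)      # lower bound: kinase hits were >60% flat
print(round(flat_all_targets, 2), round(flat_kinases, 2))  # 1.25 1.5
```

The same ratio applies to the heavy-atom-count comparison discussed earlier: a size bin is over-represented among hits whenever the ratio exceeds 1.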
One of the arguments in favor of three-dimensionality is that such molecules may have better physicochemical properties, so the researchers examined the DFP of fragments and of the leads derived from them. It turns out that there is a weak correlation between the shapeliness of a fragment and that of the resulting lead, but there are many exceptions (such as this one).