10 January 2022

Virtually screening 11 billion compounds – no problem!

Three years ago we highlighted virtual screens of roughly 100 million molecules, which led to numerous high-affinity ligands against two targets. Those efforts made use of the Enamine “readily available for synthesis” (REAL) library, a virtual catalog of molecules that can be rapidly made and delivered. Enamine is continuing to grow this resource, which as of last year stood at 11 billion compounds. This is an impressive number, but how do you make use of it? In a just-published paper in Nature, Vsevolod Katritch (University of Southern California, Los Angeles) and a large group of collaborators provide a promising fragment-based solution.
Molecules in the Enamine REAL collection can be made by one-pot parallel synthesis from two or three reagents; for example, an amide can be made from an amine and a carboxylic acid. Enamine assembled a set of 75,000 reagents and 121 reactions that together could produce 11 billion molecules (the space is even larger now). However, docking all of these could take thousands of years on a single CPU or cost hundreds of thousands of dollars in the cloud.
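To see where figures like "thousands of years" and "hundreds of thousands of dollars" come from, here is a back-of-envelope sketch. The docking speed (about 10 CPU-seconds per molecule) and the cloud price (about $0.01 per CPU-hour) are illustrative assumptions of ours, not numbers from the paper, and the function name is hypothetical.

```python
# Back-of-envelope cost of brute-force docking every Enamine REAL compound.
# ASSUMPTIONS (illustrative, not from the paper): ~10 CPU-seconds to dock one
# molecule and ~$0.01 per CPU-hour of cloud compute.
LIBRARY_SIZE = 11_000_000_000
SECONDS_PER_MOLECULE = 10.0   # hypothetical docking speed
USD_PER_CPU_HOUR = 0.01       # hypothetical cloud price
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def brute_force_cost(n_molecules, sec_per_mol, usd_per_cpu_hour):
    """Return (years on a single CPU, total CPU-hours, cloud cost in USD)."""
    cpu_seconds = n_molecules * sec_per_mol
    cpu_hours = cpu_seconds / 3600
    return cpu_seconds / SECONDS_PER_YEAR, cpu_hours, cpu_hours * usd_per_cpu_hour


years, cpu_hours, usd = brute_force_cost(LIBRARY_SIZE, SECONDS_PER_MOLECULE, USD_PER_CPU_HOUR)
print(f"~{years:,.0f} years on one CPU, ~{cpu_hours:,.0f} CPU-hours, ~${usd:,.0f}")
```

With these placeholder inputs the estimate comes out at roughly 3,500 CPU-years and about $300,000. Both figures scale linearly with the assumed docking time, so a faster or slower docking protocol shifts them proportionally.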
Rather than docking all the Enamine REAL compounds, the researchers developed an approach called virtual synthon hierarchical enumeration screening, or V-SYNTHES. The first step is to create a library of scaffolds with molecular weights in the 250-350 Da range. Taking the amide example above, imagine linking a set of 1000 amines to benzoic acid and a set of 1000 carboxylic acids to methylamine. This 2000-compound minimal enumeration library, or MEL, can be thought of as a representative subset of the full 1000 × 1000 = 1,000,000-member virtual amide library. The numbers are even more dramatic for a three-component reaction: a MEL of just 1500 compounds (say, three sets of 500 reagents) could represent 500 × 500 × 500 = 125,000,000 fully elaborated molecules.
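The MEL for the two-component amide example can be sketched in a few lines of Python. This is a minimal sketch that treats reagents as plain labels: the names `amines`, `acids` and `minimal_enumeration_library` are hypothetical, and a real pipeline would build actual molecules (for example with a cheminformatics toolkit) rather than pairs of strings.

```python
# Toy construction of a minimal enumeration library (MEL) for a two-component
# amide reaction. Reagent names are hypothetical placeholders, not real SMILES.
amines = [f"amine_{i}" for i in range(1000)]
acids = [f"acid_{j}" for j in range(1000)]

# Minimal "caps" from the example: benzoic acid pairs with every amine,
# methylamine pairs with every carboxylic acid.
ACID_CAP = "benzoic_acid"
AMINE_CAP = "methylamine"


def minimal_enumeration_library(amines, acids):
    """Return (amine, acid) pairs: each reagent appears once, with a minimal partner."""
    capped_amines = [(amine, ACID_CAP) for amine in amines]
    capped_acids = [(AMINE_CAP, acid) for acid in acids]
    return capped_amines + capped_acids


mel = minimal_enumeration_library(amines, acids)
full_library_size = len(amines) * len(acids)  # every amine x every acid
print(len(mel), full_library_size)  # 2000 1000000
```

The MEL grows with the sum of the reagent counts, while the full library grows with their product; that gap is what the method exploits.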
The MEL is docked against a protein of interest, and a diverse set of the top-scoring compounds is chosen for fragment growing. In our example, the benzoic acid “cap” on the best compounds would be replaced by the full set of 1000 carboxylic acids. These fully elaborated molecules would then be docked, and the top compounds synthesized and tested.
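The two-step dock-then-grow logic can be illustrated with a toy model. This is a sketch under loud assumptions, not the authors' implementation: `dock_score` is a hypothetical stand-in for a docking program built from random per-reagent contributions (real docking scores are not additive), and the diversity selection the researchers apply to the top MEL hits is omitted.

```python
import random

# Toy V-SYNTHES-style two-step screen for the amide example.
# ASSUMPTION: dock_score is a hypothetical stand-in for a docking program
# (lower = better), built from random per-reagent contributions. Real docking
# scores are not additive, and the real method also picks a *diverse* set of
# top scaffolds, which is omitted here.
rng = random.Random(0)
amines = [f"amine_{i}" for i in range(1000)]
acids = [f"acid_{j}" for j in range(1000)]
ACID_CAP, AMINE_CAP = "benzoic_acid", "methylamine"

contribution = {name: rng.uniform(-6.0, 0.0) for name in amines + acids}
contribution[ACID_CAP] = -2.0
contribution[AMINE_CAP] = -1.0


def dock_score(amine, acid):
    return contribution[amine] + contribution[acid]


def top_k(pairs, k):
    """Best-scoring k (amine, acid) pairs."""
    return sorted(pairs, key=lambda pair: dock_score(*pair))[:k]


# Step 1: dock the MEL and keep the best capped scaffolds.
mel = [(a, ACID_CAP) for a in amines] + [(AMINE_CAP, c) for c in acids]
best_mel = top_k(mel, k=10)

# Step 2: swap each cap for the full set of real reagents and dock again.
grown = []
for amine, acid in best_mel:
    if acid == ACID_CAP:
        grown += [(amine, c) for c in acids]   # benzoic acid -> every acid
    else:
        grown += [(a, acid) for a in amines]   # methylamine -> every amine
grown = list(dict.fromkeys(grown))  # drop duplicates, keep order

final_hits = top_k(grown, k=5)  # candidates for synthesis and testing
docked = len(mel) + len(grown)
print(f"docked {docked:,} instead of {len(amines) * len(acids):,}")
for amine, acid in final_hits:
    print(amine, acid, round(dock_score(amine, acid), 2))
```

Keeping 10 scaffolds means at most about 12,000 dockings instead of 1,000,000; how many scaffolds to keep is the main knob trading cost against the risk of missing good combinations.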
The researchers applied V-SYNTHES to two targets. The first was the cannabinoid receptor CB2, in an antagonist-bound structure. A total of 1.5 million molecules were docked against CB2, representing 11 billion fully enumerated compounds. After filtering the best hits to remove PAINS and molecules similar to known CB2 ligands, 80 diverse compounds were chosen for actual synthesis and testing, of which Enamine was able to deliver 60 in less than 5 weeks. One-third of these turned out to be antagonists with Ki values < 10 µM in biological assays.
How does this compare to a brute-force approach? Screening all 11 billion molecules wasn’t feasible, so the researchers screened a representative 115-million-molecule subset of the Enamine REAL library, nearly two orders of magnitude more molecules than were docked in V-SYNTHES. Of 97 compounds synthesized and tested, only 5 turned out to be antagonists of CB2 with Ki values < 10 µM.
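The comparison is easier to appreciate as hit rates. A short sketch using only the numbers in this post, with one assumption stated in the code: "one-third" of the 60 delivered V-SYNTHES compounds is taken to mean 20.

```python
# Hit rates against CB2 (Ki < 10 uM) using the numbers in the post.
# ASSUMPTION: "one-third" of the 60 delivered V-SYNTHES compounds is taken as 20.
def hit_rate(hits, tested):
    return hits / tested


v_synthes_rate = hit_rate(20, 60)
standard_rate = hit_rate(5, 97)
docked_ratio = 115_000_000 / 1_500_000  # molecules docked: standard vs V-SYNTHES

print(f"V-SYNTHES hit rate {v_synthes_rate:.0%}, standard screen {standard_rate:.0%}")
print(f"~{v_synthes_rate / standard_rate:.1f}x the hit rate with ~{docked_ratio:.0f}x fewer molecules docked")
```

That works out to roughly a 33% versus 5% hit rate, about a 6.5-fold improvement, while docking about 77-fold fewer molecules.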
A nice feature of V-SYNTHES is that it is well-suited to SAR-by-catalog. This was demonstrated by looking for analogs of the three best hits within Enamine REAL space. Of 104 compounds synthesized and tested, more than half had Ki values < 10 µM, and 23 were submicromolar antagonists. In fact, several turned out to be low nanomolar and selective not just against the related CB1 receptor but against a panel of 300 other GPCRs.
V-SYNTHES was also applied to the kinase ROCK1 and achieved similarly impressive results: six of 21 compounds synthesized and tested had Kd < 10 µM in a binding assay, and one was a low nanomolar inhibitor.
This is a lovely and practical application of fragment concepts. Importantly, because the computational cost increases only linearly with the number of reagents while the library size grows with their product (the square for two-component molecules, the cube for three-component ones), it is very scalable; the researchers suggest that “terascale and petascale libraries” should be “easily” accommodated. These are numbers beyond even what DNA-encoded libraries can promise.
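The linear-versus-power scaling can be tabulated directly. A minimal sketch, assuming every position draws from the same number of reagents and ignoring the cost of the later fragment-growing rounds; `library_sizes` is a hypothetical helper name.

```python
def library_sizes(reagents_per_position, n_components):
    """Return (MEL size, fully enumerated size) for equal-sized reagent sets.

    ASSUMPTION: every position draws from the same number of reagents, and the
    cost of the later fragment-growing rounds is ignored.
    """
    mel_size = reagents_per_position * n_components     # grows linearly
    full_size = reagents_per_position ** n_components   # grows as a power
    return mel_size, full_size


for n in (1_000, 10_000, 100_000):
    for k in (2, 3):
        mel_size, full_size = library_sizes(n, k)
        print(f"{k} components, {n:>7,} reagents each: MEL {mel_size:>7,} vs full {full_size:.0e}")
```

For example, 100,000 reagents per position in a three-component reaction gives a 10^15-member (petascale) library from a MEL of only 300,000 compounds.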
Currently V-SYNTHES relies on a good structural model for docking, but as computational predictions of protein structures become ever more accurate, perhaps even this will cease to be a limitation. Our SkyFragNet post from 2019 is looking ever more prophetic, in a good way.


WvH said...

Sounds similar to the LEAP2 method that my former colleagues at Pfizer La Jolla published.

Anonymous said...

These guys at BioSolveIT have been doing this for quite some time. They just didn't manage to get a Nature publication out of it. Check out their description: https://www.biosolveit.de/chemical-space-docking/

Anonymous said...

And however many millions or billions are enumerated, this is STILL nothing compared with the universe of accessible drug-like molecules.

Anonymous said...

It would be interesting to use crystallographic binding poses of fragments as starting points instead of docked fragments.

Also, using variations of the same receptor, or a flexible protein backbone/binding site. Potentially this could improve the method even further.

Dan Erlanson said...

Thanks everyone for the great comments!
LEAP2 and BioSolveIT do seem similar to V-SYNTHES, but as those with a sweet tooth like to say, the proof is in the pudding. The 2011 LEAP2 paper, for example, is mostly conceptual; the only molecules reported are low micromolar caspase 3 inhibitors whose structures are not provided.

I have been impressed with some of the BioSolveIT presentations I've seen, but don't recall this level of success in terms of novelty and potency; please post references if I've missed them.

I do agree that chemical space is vaster than even the most powerful screens are likely to be able to explore, but ultimately you just need good hits, not all the hits possible.

As for crystallographic starting points, stay tuned for an upcoming blog post!

Anonymous said...

With respect to BioSolveIT, I remember seeing a presentation that used an almost identical workflow. Here's a video of that talk: https://www.youtube.com/watch?v=hQBXyQhZKio&t=1152s