Practical Fragments: DOCK

04 November 2024

Catching virtual cheaters

As experienced practitioners of fragment-based lead discovery will know, the best way to avoid being misled by artifacts is to combine multiple methods. (Please vote on which methods you use if you haven’t already done so.) Normally this advice is for physical methods, but what’s true in real life also applies to virtual reality, as demonstrated in a recent J. Med. Chem. paper by Brian Shoichet and collaborators at University of California San Francisco, Schrödinger, and University of Michigan Ann Arbor.

The Shoichet group has been pushing the limits of computational screening using ever larger libraries. Five years ago they reported screens of more than 100 million molecules, and today multi-billion compound libraries are becoming routine. But as more compounds are screened, an unusual type of artifact is emerging: molecules that seem to “cheat” the scoring function and appear to be virtual winners but are completely inactive when actually tested. Although rare, as screens increase in size these artifacts can make up an increasingly large fraction of hits.

Reasoning that these types of artifacts may be peculiar to a given scoring function, the researchers decided to rescore the top hits using a different approach to see whether the cheaters could be caught. They started with a previous screen in which 1.71 billion molecules had been docked against the antibacterial target AmpC β-lactamase using DOCK3.8, and more than 1400 hits were synthesized and tested. These were rescreened using a different scoring approach called FACTS (fast analytical continuum treatment of solvation). Plotting the scores against each other revealed a bimodal distribution, with most of the true hits clustering together. Of the 268 molecules that lay outside of this cluster, 262 showed no activity against AmpC even at 200 µM.
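The cross-filtering idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the post does not say how the paper defined the main cluster, so the robust-line-plus-MAD rule, the `flag_outliers` name, the 3.5 cutoff, and the toy scores below are all hypothetical stand-ins rather than the researchers' actual criterion.

```python
from itertools import combinations
from statistics import median


def flag_outliers(primary, rescored, cutoff=3.5):
    """Flag molecules whose rescored energy departs from the main trend.

    Hypothetical rule, not the paper's: fit a robust (Theil-Sen) line of
    rescored vs. primary score, then flag residuals whose MAD-based
    robust z-score exceeds `cutoff`.
    """
    points = list(zip(primary, rescored))
    # Theil-Sen slope: median of all pairwise slopes, so a few
    # misbehaving molecules cannot drag the line toward themselves.
    slopes = [
        (y2 - y1) / (x2 - x1)
        for (x1, y1), (x2, y2) in combinations(points, 2)
        if x2 != x1
    ]
    slope = median(slopes)
    intercept = median(y - slope * x for x, y in points)
    residuals = [y - (slope * x + intercept) for x, y in points]
    # Median absolute deviation; the tiny floor avoids division by zero.
    mad = median(abs(r) for r in residuals) or 1e-9
    return [0.6745 * abs(r) / mad > cutoff for r in residuals]


# Toy data (invented): seven molecules follow one trend, the last does not.
dock = [-60, -58, -56, -54, -52, -50, -48, -46]
facts = [-30.2, -28.9, -28.1, -27.0, -25.8, -25.1, -24.0, 10.0]
flags = flag_outliers(dock, facts)
print(flags)
```

The pairwise slope step scales quadratically, so for hundreds of thousands of molecules one would fit the line on a random subsample; the point here is only that a second, independent score makes the odd ones out easy to spot.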

Thus encouraged, the researchers turned to other studies in which between 32 and 537 compounds had been experimentally tested. The top 165,000 to 500,000 scoring hits were tested using FACTS, and 7-19% of the initial DOCK hits showed up as outliers and thus likely cheaters. For six of the targets, none of these outliers were strong hits. For each of the other three, a single potent ligand had been flagged as a potential cheater.

To evaluate whether this “cross-filtering” approach would work prospectively as well as retrospectively, the researchers focused on 128 very high scoring hits from their previous AmpC virtual screen that had not already been experimentally tested. These were categorized as outliers (possible cheaters) or not and then synthesized and tested. Of the 39 outliers, none were active at 200 µM. But of the other 89, more than half (51) showed inhibition at 200 µM, and 19 of these gave K_i values < 50 µM. As we noted back in 2009, AmpC is particularly susceptible to aggregation artifacts, so the researchers tested the ten most potent inhibitors and found that only one formed detectable aggregates.
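To put the prospective numbers in perspective, here is a small Python calculation using only the counts quoted above. The hypergeometric null model (outlier status is unrelated to activity) is my assumption for illustration, not an analysis from the paper.

```python
from math import comb

outliers, outlier_actives = 39, 0
others, other_actives = 89, 51
potent = 19  # K_i < 50 µM among the non-outliers

hit_rate_others = other_actives / others    # fraction active at 200 µM
potent_rate_others = potent / others

# Chance of drawing 39 compounds with zero actives from 128 compounds
# containing 51 actives, if outlier status were unrelated to activity.
total = outliers + others
actives = outlier_actives + other_actives
p_no_actives = comb(total - actives, outliers) / comb(total, outliers)

print(f"non-outlier hit rate: {hit_rate_others:.1%}")
print(f"non-outlier K_i < 50 µM rate: {potent_rate_others:.1%}")
print(f"P(0 actives among outliers by chance): {p_no_actives:.1e}")
```

Under that assumption the probability of seeing no actives among the 39 outliers is vanishingly small, which is why the clean split is so persuasive.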

In addition to FACTS, the researchers also used two other computational methods to look for cheaters: AB-FEP (absolute binding free energy perturbation) and GBMV (generalized Born using molecular volume), both of which are more computationally intensive than either FACTS or DOCK. Interestingly, GBMV performed worse than FACTS, finding at best only 24 cheaters but also falsely flagging 9 true binders. AB-FEP was better, finding 37 cheaters while not flagging any of the experimentally validated hits.

This is an important paper, particularly as virtual screens of multi-billion compound libraries become increasingly common. Indeed, the researchers note that “as our libraries grow toward trillions of molecules… there may be hundreds of thousands of cheating artifacts.”

And although the researchers acknowledge that their cross-filtering approach has only been tested for DOCK, it seems likely to apply to other computational methods too. I look forward to seeing the results of such studies.

12 September 2022

Growing fragments in silico with FastGrow

Growing fragments is probably the most common approach to improving affinity, and it is immeasurably faster to do this virtually than experimentally. But as anyone who has ever tried can attest, this is often easier said than done. In a new open-access J. Comput. Aided Mol. Des. paper, Matthias Rarey and collaborators at Universität Hamburg, Servier, and BioSolveIT describe a free tool to help.

The application is called FastGrow, and it can be accessed through this web server or the SeeSAR 3D software package. It relies on the “Ray Volume Matrix (RVM) shape descriptor,” which simplifies chemical fragments and protein binding pockets into three-dimensional shapes. This allows extremely rapid assessments of whether a given fragment can fit into a binding pocket. A scoring function called JAMDA assesses interactions beyond simple shapes, such as hydrogen bonds and hydrophobic contacts, and also allows fragments to shift slightly to optimize complementarity with the protein.

One nice feature of FastGrow is that users can input fragments into multiple binding sites with different amino acid conformations, allowing for protein flexibility. You can also specify an important interaction, such as a critical hydrogen bond, that you prefer to maintain.

To validate the approach, the researchers turned to the database PDBbind and looked for examples in which two ligands with identical cores but different substituents bound to the same protein. They chopped off the substituents from the first ligand and used the resulting fragment as a starting point to try to grow the second ligand. Running 425 of these took just 3 and a half hours and successfully recapitulated the binding mode 71% of the time. This was higher than the popular program DOCK (version 6.9), which seemed to be a pleasant surprise. They attribute the difference to a higher clash tolerance for FastGrow in the initial stages.

For additional validation, the researchers turned to real-world examples of fragment-growing for the kinases DYRK1A/B, which we highlighted last year (here and here). Here too FastGrow outperformed DOCK and was also about five-fold faster when using JAMDA (and 600-times faster without JAMDA, though at some cost in performance).

FastGrow looks to be a valuable tool, and indeed the researchers note that it is currently in use at Servier. There is a lot more detail in the paper and supplementary materials, including the full code for the FastGrow web server and all the underlying data. It would be interesting to compare its performance to the V-SYNTHES approach we highlighted earlier this year.

If you have experience using FastGrow, please leave a comment!

07 March 2022

Virtual screening succeeds against the SARS-CoV-2 main protease

Today marks exactly two years since Practical Fragments first mentioned SARS-CoV-2. Since then, COVID-19 has killed more than 6 million people worldwide. Multiple effective vaccines have been developed and approved, along with a couple small-molecule drugs, but the virus is here to stay, and more drugs will be needed. This brings us to an open-access paper published in J. Am. Chem. Soc. by Jens Carlsson (Uppsala University) and a large group of international collaborators.

The so-called main protease (M^pro, or 3CLp) has been an antiviral target since the earliest days of the pandemic; the work we highlighted two years ago focused on a crystallographic screen against this enzyme. The new paper describes two virtual screening approaches.

The first started with a library of 235 million virtual compounds, mostly from Enamine’s “readily available for synthesis” (REAL) collection. Each compound was docked in thousands of different orientations against the active site of M^pro using DOCK3.7. Despite the staggering numbers (more than 223 trillion complexes!), the screen took just a day on 3500 CPU cores. The top 300,000 compounds were clustered based on similarity, and 100 molecules were synthesized. Nineteen of these showed binding by SPR, and three also inhibited the enzyme. Crystal structures were obtained for two of these, and both bound similarly to the predicted binding modes.
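The scale is easier to appreciate per compound. The sketch below uses only the figures quoted above (treating "more than 223 trillion" and "just a day" as exact, which is an approximation); the function and variable names are mine.

```python
def docking_throughput(compounds, complexes, cores, wall_days):
    """Return per-compound and per-core figures for a docking campaign."""
    core_seconds = cores * wall_days * 86_400
    return {
        "complexes_per_compound": complexes / compounds,
        "core_seconds_per_compound": core_seconds / compounds,
        "complexes_per_core_second": complexes / core_seconds,
    }


stats = docking_throughput(
    compounds=235_000_000,
    complexes=223_000_000_000_000,
    cores=3500,
    wall_days=1,
)
for name, value in stats.items():
    print(f"{name}: {value:,.2f}")
```

That works out to a little under a million scored poses per compound, in a bit more than one core-second each.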

Compounds 1 and 3 each contain a hydantoin moiety that makes multiple hydrogen bonds to the protein, and merging elements led to low micromolar compounds such as compound 15. Further optimization ultimately delivered compound 19.

Compound 19 was potent in SPR and biochemical assays. Though it binds noncovalently, it had comparable cellular activity to nirmatrelvir, the recently approved covalent inhibitor of M^pro. Compound 19 showed nanomolar cell potency against SARS-CoV-1 and MERS-CoV and good selectivity against ten human proteases. The in vitro stability and permeability of compound 19 are also promising.

In addition to this de novo virtual screen, the researchers performed a second screen starting from one of the fragments identified crystallographically at Diamond Light Source. Of 93 molecules purchased and experimentally tested, 21 showed binding by SPR and 5 of these also inhibited the enzyme, with the most potent compound showing low micromolar activity.

There are several lessons from this paper. First, despite searching hundreds of millions of compounds, the best hits had only modest activity. This is perhaps surprising given the high fragment hit rates observed against M^pro in crystallographic and NMR screens, though it is worth noting that those fragments were even weaker binders.

Second, the hit rate from the naïve virtual screen was similar to that from the experimentally derived fragment screen. The researchers suggest that perhaps docking “may be more proficient in ranking diverse chemotypes rather than differentiating between closely related elaborations of the same scaffold.” In other words, virtual screens seem better at evaluating diverse starting points than at ranking many similar molecules.

Third, despite the fact that the de novo virtual screen was not explicitly fragment-based, compound 1 does actually adhere to the rule of three. From there, addition of just six atoms improved affinity by >600-fold while also improving ligand efficiency.
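For a sense of what a 600-fold gain from six atoms means energetically, here is a quick Python calculation. It assumes a temperature of 298.15 K and treats ">600-fold" as exactly 600, so the result is a lower bound; the function name is mine.

```python
from math import log

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)
T = 298.15            # assumed temperature in kelvin


def free_energy_gain(fold_improvement, temperature=T):
    """Binding free energy gained (kcal/mol) for a given fold gain in affinity."""
    return R_KCAL * temperature * log(fold_improvement)


gain = free_energy_gain(600)   # lower bound, since the gain was >600-fold
per_atom = gain / 6            # six added non-hydrogen atoms
print(f"{gain:.2f} kcal/mol total, {per_atom:.2f} kcal/mol per added atom")
```

That is roughly 3.8 kcal/mol in total, or about 0.63 kcal/mol for each added atom. Overall ligand efficiency can only rise when the new atoms contribute more per atom than the starting molecule's average, which is consistent with the improvement noted above.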

Finally, this work is a testament to the utility of combining massive virtual screening with readily synthesizable compounds: the researchers note that it took less than four months to progress from compound 1 to nanomolar inhibitors.

This work relied heavily on rapid chemical synthesis done in Ukraine. Indeed, the two most popular fragment suppliers are both largely based in that country. Over the years many of us have come to know Ukrainian scientists not just as trusted colleagues but also as friends. I wish them and their families safety, and strength.

10 February 2019

What will you do with hundreds of thousands of new ligands?

Ten years ago we highlighted a paper out of Brian Shoichet’s group in which 137,639 commercially available fragments were screened against the anti-bacterial target AmpC β-lactamase, resulting in a couple dozen weak hits, one of which was ultimately optimized to a picomolar covalent inhibitor. As evidenced by the devices in our pockets, computers have improved over the past decade. This is beautifully illustrated in a paper just published in Nature by Brian Shoichet, Bryan Roth, John Irwin, and an international team of collaborators at UCSF, UNC Chapel Hill, and labs in China, Ukraine, and Latvia.

Rather than limiting themselves to commercially available compounds, the researchers turned to a virtual set of make-on-demand molecules available from Enamine. These are built from 70,000 building blocks using 130 different two-component chemical reactions; 350 million molecules are currently available, with one billion expected by next year. The molecules exist virtually in the ZINC database, but can be physically ordered from Enamine as well. According to the Methods section of the paper, 93% of compounds ordered were successfully synthesized and delivered within six weeks.

The researchers screened 99 million virtual molecules using the program DOCK3.7. On average, 280 conformations of each molecule were fit into the active site in 4054 orientations. The top million compounds were then grouped by scaffold, and only molecules that differed considerably from known AmpC ligands and commercial compounds were considered further. Fifty-one compounds were actually made and tested, of which five were active with affinities between 1.3 and 400 µM. Next, 90 analogs of these were synthesized, and more than half were active; the best came in at 77 nM, among the most potent non-covalent AmpC inhibitors ever reported. Crystal structures of several ligands from different scaffolds showed good agreement with the docking predictions.

For a test case against a very different binding pocket, the researchers turned to the D₄ dopamine receptor, against which they screened 138 million molecules in silico: 70 trillion different complexes, a process that took just 1.2 days using 1,500 cores. As with AmpC, the top hits were clustered, and anything resembling commercially available or known ligands was discarded. Of 549 compounds purchased and tested, 81 had K_i values of 8.3 µM or better. Many of the molecules were also active in functional assays, including full and partial agonists and even a couple antagonists. One molecule, a 180 pM agonist, was 2500-fold selective against the related D₂ and D₃ dopamine receptors. By way of comparison, in work published by some of the researchers just two years ago, the best hit from 600,000 commercial compounds was a 260 nM agonist which required three rounds of medicinal chemistry optimization to get to 4 nM.

How well did the hit rates correlate with the docking scores? The researchers separated the molecules screened against the D₄ receptor into a dozen “bins” and randomly chose 444 molecules from across the bins to make and test. Happily, the hit rates did in fact vary by score: among top bins, hit rates were 22-26%, dropping to 12% in the middle, and 0% at the bottom. Based on these numbers (and considerably more sophisticated analyses, including Bayesian statistics), the researchers suggest that the library of 138 million molecules contains more than 453,000 D₄ receptor ligands in more than 72,600 scaffolds with inhibition constants of 10 µM or better, and perhaps 158,000 with K_i values of 1 µM or better. These may well be conservative estimates, as they assume no hits among poorer-scoring molecules.

In a human-machine head-to-“head” contest, the researchers chose 124 of the top-ranked molecules manually and another 114 based on docking scores alone. Reassuringly, carbon-based systems held out over silicon, with hit rates for both sets around 24% but the human-chosen molecules typically having higher affinities, including the 180 pM winner. But while human performance will likely remain steady for the near future, machines will continue to improve.

On an academic level, the approach described in this paper could allow empirical tests of the molecular complexity hypothesis. It would be fascinating to see whether hit rates are higher for smaller molecules than larger ones, though of course smaller ligands are likely to have lower affinities and are thus less likely to be among the top hits. As in a previous analysis from Astex, one would need to compare hit rates among molecules with equal numbers of non-hydrogen atoms.

On a practical level, ultra-large library docking could be a game-changer for targets that have been structurally characterized. If the method proves generalizable, the question a decade hence may not be how to find hits, but rather how to choose between hundreds of thousands of them.

21 July 2010

Virtual phosphate fragments

Phosphate groups are handy little things: easy for enzymes to put on and take off, they pack a lot of charge in a small volume, thereby providing plenty of binding energy for electrostatic interactions. Not surprisingly, they are ubiquitous in biology. Unfortunately, the same things that make them attractive for an organism make them problematic for drugs: they are easily removed, and their highly negative charge gives molecules containing phosphates a real problem getting across membranes. What’s a chemist to do?

This was the dilemma faced by Ruth Brenk, Ian Gilbert, and colleagues at the University of Dundee. They were interested in inhibiting the enzyme 6-phosphogluconate dehydrogenase (6PGDH) from the parasite that causes sleeping sickness. (See here for previous work from the same group using fragment methods to discover inhibitors against a different enzyme from the same organism.) The enzyme 6PGDH, as its name suggests, binds phosphate-containing substrates and has a very polar active site. Nanomolar inhibitors have been reported in the literature, but these contain phosphates and are not active in cell assays.

As reported in a recent issue of Bioorganic and Medicinal Chemistry, the researchers computationally filtered a set of commercially available compounds to find those that were less than 320 Da and were negatively charged, thereby potentially mimicking a phosphate. They then used DOCK 3.5.54 to see which of the resulting 64,000 molecules might bind in the active site of 6PGDH, resulting in 5836 possible hits. Subsequent triaging led to the purchase of 71 compounds. These were tested for inhibition of the enzyme at 200 micromolar concentration. Ten of these compounds inhibited the enzyme more than 80% at this concentration, of which 3 gave clean IC50 curves. These three molecules are all 5-membered carboxylic-acid-containing heterocycles, and although the IC50s are modest (ranging from 28 to 45 micromolar), they have good ligand efficiencies (up to 0.66 (kcal/mol)/atom). A computational search for analogs resulted in a few more active molecules with similar properties.
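Ligand efficiency is simple to compute. The sketch below assumes 298.15 K and treats IC50 as a stand-in for the dissociation constant, which is a common approximation. The heavy-atom count of 10 is hypothetical, since the post does not give the sizes of the three hits.

```python
from math import log

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def ligand_efficiency(ic50_molar, heavy_atoms, temperature=298.15):
    """Ligand efficiency in (kcal/mol)/atom, using IC50 in place of K_d."""
    delta_g = R_KCAL * temperature * log(ic50_molar)  # negative for binders
    return -delta_g / heavy_atoms


# Hypothetical 10-heavy-atom fragments at the two ends of the IC50 range.
le_28 = ligand_efficiency(28e-6, 10)
le_45 = ligand_efficiency(45e-6, 10)
print(f"28 µM: {le_28:.2f}  45 µM: {le_45:.2f}")
```

With these assumed sizes the values come out around 0.6 (kcal/mol)/atom, in the same range as the figure quoted above; a very small molecule can look highly efficient even with modest potency.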

Whether these fragments can be advanced remains to be seen. The calculated solubilities, Log P, total polar surface area, and intestinal absorption parameters are more attractive than previous inhibitors, but the history of phosphate mimics is not encouraging. Most prominently, the protein PTP-1B, which recognizes phosphotyrosine residues, was once one of the hottest drug targets around, spawning a cottage industry of groups developing phosphotyrosine mimetics. Fragment methods were particularly effective, and numerous potent small molecules were published. But none of them were sufficiently drug-like, and to my knowledge none are in the clinic. Still, it is worth trying: 6PGDH may be more druggable, and approaches like this are likely to provide an answer.

20 July 2009

Fragments for sleeping sickness don’t lie still

In fragment-based drug discovery, the binding mode of the initial fragment often remains constant during the course of optimization (see AT9283 and AT7519 from Astex). But this isn't always true. An intriguing counterexample has recently been published in J. Med. Chem.

Ruth Brenk and colleagues at the University of Dundee were interested in pteridine reductase 1 (PTR1), an enzyme from Trypanosoma brucei, the protozoan that causes sleeping sickness. They used the program DOCK 3.5.54 (which has been successfully used for fragment-docking) to screen 26,084 commercially available fragments against the crystal structure of PTR1. After a variety of computational and manual filters were applied, the researchers purchased and tested 45 compounds in an enzymatic assay. Of these, 10 fragments inhibited PTR1 at least 30% at 100 micromolar concentration, the most potent of which was compound 4 (below).

Removing the chlorine atom to generate compound 5 resulted in a dramatic loss in activity, while adding the dichlorobenzyl moiety caused a similarly large boost in activity (compound 9). The researchers were able to characterize the binding mode of each of these molecules crystallographically, and it turns out that, despite sharing a common aminobenzimidazole core, they all bind in very different fashions.

The initial compound 4 binds in two orientations, one of which closely resembles the binding mode predicted from the computational screen, with hydrogen bonds between the fragment and the enzyme cofactor NADP+. Compound 5 makes indirect (water-mediated) hydrogen bonds with the cofactor, while compound 9 binds in a completely different manner some distance from the cofactor.

Brenk and colleagues observed a hydrophobic pocket near compound 9 which they exploited to generate the low nanomolar compound 12; crystallography confirmed this binds in a similar fashion to compound 9. This molecule also displayed impressive selectivity against the potential off-target dihydrofolate reductase. Unfortunately, despite the promising biochemical activity of compound 12, it displays only modest activity against T. brucei in cell culture.

This study illustrates two important points. First, it can be hazardous to assume that even very closely related molecules, such as 4 and 5, bind in the same manner. Second, because of this, one should not adhere too slavishly to models, even those based on crystal structures. The binding modes of compounds 4 and 5 would not accommodate the dichlorobenzyl moiety, and yet this addition provided a sizable boost in potency. Sometimes it pays to make substitutions even where you wouldn’t expect them to make sense, especially where the changes are easy to make.

03 May 2009

More on DOCKing fragments and sampling chemical space

A few weeks ago, we highlighted a paper from Brian Shoichet’s group at UCSF demonstrating that computational screening could successfully identify fragments binding to a protein target, and that the binding modes predicted were actually observed experimentally. A companion paper just published online in PNAS now extends these results, and also beautifully illustrates that it is possible to cover much more chemical space with fragments than with lead-like molecules.

Denise Teotico, Shoichet, and colleagues used the program DOCK 3.5.54 to screen 137,639 fragments against AmpC beta-lactamase, a bacterial protein responsible for antibiotic resistance. The protein had previously been the target of HTS and computational screens of drug-like molecules. The computational screens had modest success rates (2-7%), but the HTS screen was a total bust: more than 95% of the more than 1200 hits from the 70,000+ compound screening collection turned out to be false positives, mostly aggregators, leaving just a few dozen true inhibitors, all of which turned out to be covalent (irreversible).

In contrast, of the 48 high-scoring fragments that were experimentally tested, 23 had Ki values better than 10 mM, for a hit rate of 48%. The authors also assessed potential for false negatives by choosing 20 random fragments and testing these for inhibition; only one showed inhibition (with a Ki value of 3.1 mM), and this molecule had scored in the top 5% of docked fragments.

The paper presents a fascinating empirical test of the Hannian chemical complexity hypothesis. Starting with the 23 active fragments, the researchers calculated how many lead-like molecules (up to 25 non-hydrogen atoms) could contain these fragments. Of the roughly 47,000,000,000 to 430,000,000,000 possible lead-like molecules, only 675 are commercially available. By repeating this analysis with fragment-sized molecules (up to 17 non-hydrogen atoms), the size of the haystack was reduced by six orders of magnitude: only about 10,000 possible molecules contain these fragments, of which 93 are commercially available. Moreover, many of the active fragments represent unique chemotypes not previously observed in AmpC inhibitors. As the authors note:
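The coverage argument can be made concrete with the numbers just quoted. The variable names are mine, and "about 10,000" is treated as exact for the sake of the arithmetic.

```python
from math import log10

leadlike_possible = (47e9, 430e9)  # estimated range, up to 25 non-hydrogen atoms
leadlike_available = 675
fragment_possible = 10_000         # "about 10,000", up to 17 non-hydrogen atoms
fragment_available = 93

# Fraction of each theoretical space that can actually be bought.
fragment_coverage = fragment_available / fragment_possible
leadlike_coverage = [leadlike_available / n for n in leadlike_possible]

# How much smaller the fragment haystack is, in orders of magnitude.
orders = [log10(n / fragment_possible) for n in leadlike_possible]

print(f"fragment coverage: {fragment_coverage:.2%}")
print(f"lead-like coverage: {leadlike_coverage[1]:.1e} to {leadlike_coverage[0]:.1e}")
print(f"haystack shrinks by {orders[0]:.1f} to {orders[1]:.1f} orders of magnitude")
```

Commercial fragments cover about 1% of the relevant fragment space, whereas commercial lead-like molecules cover only a few parts per billion of theirs.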

The chances of discovering interesting chemotypes for biological targets is many orders of magnitude higher when targeting molecules in the fragment weight range than even at slightly higher size ranges.

But, as the paper asks, “are the docking predictions right for the right reasons?” The researchers solved the crystal structures of 8 fragments bound to AmpC. Four of these reproduced the docking predictions well, two were somewhat different, and two were way off. In these last two cases, the protein itself adopted different conformations than had been used in the docking studies.

Protein conformational flexibility is remarkably common, and likely to be a persistent difficulty for computational methods. Clearly, current computational methods can’t identify all possibilities, particularly with fluxional proteins. Still, especially with relatively rigid proteins, computational fragment-screening may reveal chemotypes that HTS won’t.

A notable feature of the fragments is their relatively poor ligand efficiency: with one unusual exception (a phosphinate), all of the active fragments have ligand efficiencies less than 0.3 (kcal/mol)/atom. AmpC has a large, open active site, and the authors suggest that the failure of other hit-ID methods against this target may reflect issues such as solubility.

It remains to be seen whether these fragments can be advanced to low nanomolar inhibitors, but at least fragment-screening has provided many new starting points. And the paper demonstrates, once again, that triaging a fragment set computationally can be an effective means for concentrating the needles in a haystack.

22 March 2009

Fragments in silico, confirmed by X-ray

I’ve always been something of an empiricist, and have therefore been wary of computational fragment screening. It’s not that I think it’s impossible, just that the algorithms and parameters developed to date have not often shown themselves up to the task. A paper just published in Nature Chemical Biology from Brian Shoichet’s group at UCSF has caused me to reconsider my skepticism.

Shoichet and Yu Chen used the program DOCK to screen 67,489 commercially available fragment-sized molecules contained in the database ZINC against the active site of the beta lactamase CTX-M, a bacterial enzyme responsible for resistance to penicillin and cephalosporin. Of 69 top hits, 10 actually inhibited the enzyme when tested experimentally. In contrast, of 37 high-scoring hits from a similar computational screen of 1,147,326 larger lead-like molecules, none showed any inhibition up to the limit of their solubilities.

Interestingly, each of the ten active fragments contained an anionic group: 3 carboxylates, 2 sulfates, and 5 tetrazoles among the set. A reexamination of the docked lead-like molecules revealed a relatively high-scoring tetrazole, which exhibited an experimental Ki value of 21 micromolar (see figure). Although this was an in silico hit, it was swamped by the number of (inactive) hits and so had not been selected for experimental follow-up until the fragment results revealed tetrazoles to be privileged pharmacophores. Additional similarity searching of the lead-like molecules led to two additional low micromolar inhibitors.

Five of the inhibitory fragments and one of the lead-like molecules were characterized crystallographically, and the results were remarkable: all of them bound in a similar manner to that predicted by docking.

Chen and Shoichet also investigated the specificity of the fragments compared to the lead-like compounds, and the results agreed well with those predicted by Hann and colleagues (as discussed on our sister blog FBDD-Lit here). Namely, while the fragments had relatively low specificity against a mechanistically distinct beta lactamase (AmpC), the lead-like molecule exhibited roughly 100-fold tighter inhibition of CTX-M. In other words, fragments likely have a higher hit rate (and correspondingly lower specificity) due in part to their simplicity, but as fragments are elaborated, specificity can be readily built into the molecules.

So does this mean the era of computational fragment-based screening has arrived? While these results are impressive, it is important to keep them in perspective. CTX-M has a relatively rigid active site, while many proteins of interest show a level of flexibility that confounds modeling. Moreover, Chen and Shoichet were working with an ultra-high resolution (0.88-Angstrom) crystal structure of CTX-M in which they could actually see density for hydrogen atoms on some polar groups. Needless to say, this is atypical. Still, the paper does give hope that the computational tools are ready, as long as they are applied to appropriate systems.