Practical Fragments: docking


02 September 2025

Keeping molecular dynamics cool for fragments

Accurately and reliably predicting fragment binding modes would be preferable to doing messy, expensive, and sometimes tedious experimental work, but we’re not there yet. One of the biggest problems is that, because fragments usually bind weakly to proteins, it is hard to tell which of several possible binding modes is most favorable. In an open-access J. Chem. Inf. Model. paper published earlier this year, Stefano Moro and colleagues at University of Padova report progress.

Their approach, called Thermal Titration Molecular Dynamics (TTMD), analyzes short molecular dynamics simulations across increasing temperatures; if the ligand remains bound to the protein, this indicates a more stable binding mode. (It seems a bit like the dynamic undocking we wrote about here.) The researchers had previously reported good results for larger, drug-sized molecules, but not for four fragment-protein complexes.

Recognizing the low affinities of fragments, the researchers decided to lower the (virtual) temperatures. Rather than heating from 300 to 450 K, they heated from 73 to 233 K; i.e., from just below the boiling point of liquid nitrogen to a moderately cold winter’s day in Minnesota. They first docked fragments using PLANTS-ChemPLP, which is free for academics, and chose the five best-scoring poses for evaluation.

Next, the researchers performed TTMD. There are several different ways to assess how well the ligand remains bound to the protein over the course of a molecular dynamics simulation, and four different scoring methods were chosen. When TTMD was tested on the four fragment-protein complexes that had previously failed, at least two of the scoring methods correctly identified the crystallographic binding mode for three of the fragments.
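
To make the idea concrete, here is a minimal sketch of ranking docked poses by how well they survive a temperature ramp. It is not the researchers' code: the 20 K step, the "persistence" values (a made-up score between 0 and 1 for how much of the starting binding mode is retained in each temperature window), and the function names are all assumptions for illustration, and the paper's four scoring methods are not reproduced.

```python
def temperature_ramp(start_k=73, stop_k=233, step_k=20):
    """Temperatures (K) of successive simulation windows.

    The 73-233 K range is from the post; the 20 K step is an assumption.
    """
    return list(range(start_k, stop_k + 1, step_k))


def rank_poses(persistence_by_pose):
    """Order pose names from most to least persistent.

    persistence_by_pose maps a pose name to one hypothetical persistence
    score per temperature window (1.0 = binding mode fully retained).
    Poses are ranked by their average persistence over the whole ramp.
    """
    mean_score = {
        name: sum(scores) / len(scores)
        for name, scores in persistence_by_pose.items()
    }
    return sorted(mean_score, key=mean_score.get, reverse=True)


temps = temperature_ramp()
toy_scores = {  # invented numbers, one value per temperature in `temps`
    "pose_A": [1.0, 1.0, 0.95, 0.95, 0.9, 0.9, 0.85, 0.85, 0.8],
    "pose_B": [1.0, 0.9, 0.8, 0.6, 0.5, 0.3, 0.2, 0.1, 0.0],
}
print(temps)
print(rank_poses(toy_scores))
```

In this toy case pose_A, which barely moves as the system warms, ranks ahead of pose_B, which falls apart halfway through the ramp.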

Thus encouraged, the researchers tested ten more compounds bound to six new proteins. The results were quite encouraging, with up to 86% of crystallographic binding modes being correctly identified by at least one of the scoring functions in TTMD vs 50% for docking alone. Impressively, two of the examples were MiniFrag-sized, with just 6 or 7 non-hydrogen atoms, yet the crystallographic pose was identified as the lowest energy in all four TTMD scores.

This is nice work, but the question arises how these specific ligands and proteins were chosen. Several years ago we highlighted a curated set of 93 protein-ligand structures that were used to benchmark other virtual approaches, and it would be nice to see how TTMD performs on these. Still, TTMD’s performance on its chosen examples is encouraging, and laudably the researchers have made their code freely available. If you try it out, please let us know how it works in your hands.

04 November 2024

Catching virtual cheaters

As experienced practitioners of fragment-based lead discovery will know, the best way to avoid being misled by artifacts is to combine multiple methods. (Please vote on which methods you use if you haven’t already done so.) Normally this advice is for physical methods, but what’s true in real life also applies to virtual reality, as demonstrated in a recent J. Med. Chem. paper by Brian Shoichet and collaborators at University of California San Francisco, Schrödinger, and University of Michigan Ann Arbor.

The Shoichet group has been pushing the limits of computational screening using ever larger libraries. Five years ago they reported screens of more than 100 million molecules, and today multi-billion compound libraries are becoming routine. But as more compounds are screened, an unusual type of artifact is emerging: molecules that seem to “cheat” the scoring function and appear to be virtual winners but are completely inactive when actually tested. Although rare, as screens increase in size these artifacts can make up an increasingly large fraction of hits.

Reasoning that these types of artifacts may be peculiar to a given scoring function, the researchers decided to rescore the top hits using a different approach to see whether the cheaters could be caught. They started with a previous screen in which 1.71 billion molecules had been docked against the antibacterial target AmpC β-lactamase using DOCK3.8, and more than 1400 hits were synthesized and tested. These were rescreened using a different scoring approach called FACTS (fast analytical continuum treatment of solvation). Plotting the scores against each other revealed a bimodal distribution, with most of the true hits clustering together. Of the 268 molecules that lay outside of this cluster, 262 showed no activity against AmpC even at 200 µM.
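
For readers who want to picture the filtering step, below is a simplified sketch of flagging molecules whose rescoring value sits far from the main cluster. It is a one-dimensional stand-in, not the procedure from the paper: because every molecule in the list is already a top hit by the first scoring function, only the second score is examined here. The robust z-score, the 3.5 cutoff, the "lower is better" convention, and the toy numbers are all assumptions.

```python
from statistics import median


def flag_outliers(rescores, cutoff=3.5):
    """Return names whose rescoring value lies far from the bulk.

    rescores maps molecule name -> score from a second scoring function.
    Distance from the bulk is a robust z-score built on the median
    absolute deviation (MAD); the cutoff is a rule of thumb, not a value
    taken from the paper.
    """
    values = list(rescores.values())
    center = median(values)
    mad = median([abs(v - center) for v in values])
    if mad == 0:
        return []  # all scores essentially identical, nothing to flag
    return [
        name
        for name, value in rescores.items()
        if 0.6745 * abs(value - center) / mad > cutoff
    ]


toy_rescores = {  # invented values for top hits from a first-pass screen
    "mol1": -42.0,
    "mol2": -40.5,
    "mol3": -41.2,
    "mol4": -39.8,
    "mol5": -43.1,
    "mol6": -5.0,
}
print(flag_outliers(toy_rescores))
```

Here only mol6, which the second function scores very differently from its peers, is flagged as a possible cheater.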

Thus encouraged, the researchers turned to other studies in which between 32 and 537 compounds had been experimentally tested. The top 165,000 to 500,000 scoring hits were tested using FACTS, and 7-19% of the initial DOCK hits showed up as outliers and thus likely cheaters. For six of the targets, none of these outliers were strong hits. For each of the other three, a single potent ligand had been flagged as a potential cheater.

To evaluate whether this “cross-filtering” approach would work prospectively as well as retrospectively, the researchers focused on 128 very high scoring hits from their previous AmpC virtual screen that had not already been experimentally tested. These were categorized as outliers (possible cheaters) or not and then synthesized and tested. Of the 39 outliers, none were active at 200 µM. But of the other 89, more than half (51) showed inhibition at 200 µM, and 19 of these gave K_i values < 50 µM. As we noted back in 2009, AmpC is particularly susceptible to aggregation artifacts, so the researchers tested the ten most potent inhibitors and found that only one formed detectable aggregates.
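
The effect of the filter is easiest to see as hit rates. The short calculation below uses only the counts quoted above; the helper name is mine.

```python
def percent(part, whole):
    """Percentage of `whole` represented by `part`."""
    return 100.0 * part / whole


outliers_tested, outliers_active = 39, 0
others_tested, others_active, others_potent = 89, 51, 19
total_tested = outliers_tested + others_tested  # 128 compounds in all

print(f"hit rate without filtering: {percent(others_active, total_tested):.1f}%")
print(f"hit rate after removing outliers: {percent(others_active, others_tested):.1f}%")
print(f"non-outliers with K_i < 50 uM: {percent(others_potent, others_tested):.1f}%")
print(f"hit rate among outliers: {percent(outliers_active, outliers_tested):.1f}%")
```

Setting aside the 39 flagged compounds would have raised the hit rate from roughly 40% to roughly 57% without losing a single active.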

In addition to FACTS, the researchers also used two other computational methods to look for cheaters: AB-FEP (absolute binding free energy perturbation) and GBMV (generalized Born using molecular volume), both of which are more computationally intensive than either FACTS or DOCK. Interestingly, GBMV performed worse than FACTS, finding at best only 24 cheaters but also falsely flagging 9 true binders. AB-FEP was better, finding 37 cheaters while not flagging any of the experimentally validated hits.

This is an important paper, particularly as virtual screens of multi-billion compound libraries become increasingly common. Indeed, the researchers note that “as our libraries grow toward trillions of molecules… there may be hundreds of thousands of cheating artifacts.”

And although the researchers acknowledge that their cross-filtering approach has only been tested for DOCK, it seems likely to apply to other computational methods too. I look forward to seeing the results of these studies.

07 August 2023

Democratizing computational FBLD with BMaps

Computational approaches to FBLD continue to gain in power. For the most part, they require significant knowledge and installation of expensive, customized software. To remedy this, John Kulp, III and colleagues at Conifer Point Pharmaceuticals have introduced a new web-based application, BMaps, which they describe in a recent J. Chem. Inf. Model. paper.

As the researchers note (and appropriately reference), there are more than a dozen virtual fragment-based design tools and another dozen web-based tools. BMaps (for Boltzmann Maps) aims to provide a full range of functions, from visualizing proteins and finding hot spots to docking fragments and growing them. It also provides information on the energetics of bound water molecules, which as we’ve written can be crucial players in optimizing protein-ligand interactions.

Two key techniques used by BMaps are Grand Canonical Monte Carlo (GCMC) simulations and Simulated Annealing of Chemical Potential (SACP). The first entails comprehensive sampling of different fragment conformations on a protein of interest and assessing binding free energy. The second tool “forcefully inserts fragments into all the binding sites of the protein” and then removes them slowly to evaluate which are most difficult to remove, and thus most tightly bound. Together, GCMC-SACP can be used to evaluate fragment binding to any protein uploaded to the site from the protein data bank, AlphaFold, or any other source.
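
The intuition behind annealing the chemical potential can be shown with a toy model. The sketch below is a textbook single-site picture, not the BMaps algorithm: each fragment is assigned an invented binding free energy, the chemical potential is lowered in steps, and we note the value at which the site is more likely empty than occupied. The energies, step size, and function names are all assumptions.

```python
import math

KT = 0.593  # kcal/mol at roughly 298 K


def occupancy(binding_free_energy, mu, kt=KT):
    """Probability that a single site is occupied at chemical potential mu.

    Single-site grand canonical form: the site is exactly half occupied
    when mu equals the binding free energy.
    """
    return 1.0 / (1.0 + math.exp((binding_free_energy - mu) / kt))


def release_potential(binding_free_energy, mu_start=0.0, mu_stop=-15.0, step=-0.5):
    """Lower mu stepwise; return the first mu at which occupancy drops below 0.5."""
    mu = mu_start
    while mu >= mu_stop:
        if occupancy(binding_free_energy, mu) < 0.5:
            return mu
        mu += step
    return None  # still bound at the end of the anneal


toy_fragments = {"weak_binder": -3.0, "tight_binder": -8.0}  # kcal/mol, invented
for name, dg in toy_fragments.items():
    print(name, "released at mu =", release_potential(dg))
```

The tighter binder hangs on until the chemical potential is pushed much lower, which is the sense in which fragments that are "most difficult to remove" are the most tightly bound.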

One nice feature of BMaps is a repository of several hundred proteins each with more than 100 fragment and water simulations. BMaps also contains a database of more than 4000 fragments, including MiniFrags. Users can import their own fragments or computationally deconstruct larger ligands. The paper itself is quite short, but the supporting information provides more guidance on how to use the software.

The researchers “aim to democratize the availability of accurate fragment and water maps,” a laudable goal. Most computational features are available with a free account, though with restrictions on the number of operations per month.

BMaps looks quite powerful and easy to use, but I do wish the researchers had included some full case studies, for example those used by the free FastGrow tool we highlighted last year. Try it out and let the community know what you think!

12 September 2022

Growing fragments in silico with FastGrow

Growing fragments is probably the most common approach to improving affinity, and it is immeasurably faster to do this virtually than experimentally. But as anyone who has ever tried can attest, this is often easier said than done. In a new open-access J. Comput. Aided Mol. Des. paper, Matthias Rarey and collaborators at Universität Hamburg, Servier, and BioSolveIT describe a free tool to help.

The application is called FastGrow, and it can be accessed through this web server or the SeeSAR 3D software package. It relies on the “Ray Volume Matrix (RVM) shape descriptor,” which simplifies chemical fragments and protein binding pockets into three-dimensional shapes. This allows extremely rapid assessments of whether a given fragment can fit into a binding pocket. A scoring function called JAMDA assesses interactions beyond simple shapes, such as hydrogen bonds and hydrophobic contacts, and also allows fragments to shift slightly to optimize complementarity with the protein.

One nice feature of FastGrow is that users can input fragments into multiple binding sites with different amino acid conformations, allowing for protein flexibility. You can also specify an important interaction, such as a critical hydrogen-bond, that you prefer to maintain.

To validate the approach, the researchers turned to the database PDBbind and looked for examples in which two ligands with identical cores but different substituents bound to the same protein. They chopped off the substituents from the first ligand and used the resulting fragment as a starting point to try to grow the second ligand. Running 425 of these took just 3 and a half hours and successfully recapitulated the binding mode 71% of the time. This was higher than the popular program DOCK (version 6.9), which seemed to be a pleasant surprise. They attribute the difference to a higher clash tolerance for FastGrow in the initial stages.

For additional validation, the researchers turned to real-world examples of fragment-growing for the kinases DYRK1A/B, which we highlighted last year (here and here). Here too FastGrow outperformed DOCK and was also about five-fold faster when using JAMDA (and 600-times faster without JAMDA, though at some cost in performance).

FastGrow looks to be a valuable tool, and indeed the researchers note that it is currently in use at Servier. There is a lot more detail in the paper and supplementary materials, including the full code for the FastGrow web server and all the underlying data. It would be interesting to compare its performance to the V-SYNTHES approach we highlighted earlier this year.

If you have experience using FastGrow, please leave a comment!

18 December 2017

New tools for NMR: 2017 edition

NMR was the first practical fragment-finding method, and continues to be popular. Just over the past year we’ve discussed several new techniques (here, here, and here), and this post highlights three more.

In Angew. Chem. Int. Ed., Jesus Angulo and colleagues at the University of East Anglia describe differential epitope mapping by STD NMR (DEEP-STD NMR). STD NMR, the most popular of ligand-detected methods according to our poll, can provide some information as to which portions of a ligand are close to a protein, but doesn’t show where on a protein the ligand binds. In DEEP-STD NMR, two separate NMR experiments are conducted and the results compared to provide this information.

The researchers provide two implementations of the technique. In the first, the protein is “irradiated” at two different frequencies; for example, the aliphatic and aromatic regions. Protein residues that are directly irradiated will show a stronger STD to ligand protons than those that are indirectly irradiated, thus revealing whether one region of the ligand is closer to an aromatic or an aliphatic amino acid side chain. If the structure of the protein is known, this can then reveal the orientation of the ligand within the binding site. A similar experiment can be done using H₂O vs D₂O to determine whether a portion of a ligand is in close proximity to polar residues in the protein.

Water is the subject of the second paper, in J. Med. Chem., by Robert Konrat and colleagues at the University of Vienna and Boehringer Ingelheim. As we’ve previously noted, water often plays a critical role in protein-ligand interactions. The new method, called LOGSY titration, involves doing a series of WaterLOGSY experiments at different protein concentrations and plotting the signals for each proton in the ligand as a function of protein concentration; ligand protons close to the protein show steeper slopes. The researchers examine pairs of bromodomain ligands and demonstrate that LOGSY titration can confirm changes in binding mode previously seen by crystallography. The technique could also reveal what portions of the ligands make interactions with disordered water molecules, which are more difficult to detect in crystal structures.
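
The analysis step of a LOGSY titration boils down to fitting a line per ligand proton and comparing slopes. Here is a minimal sketch of that bookkeeping; the proton labels, concentrations, and signal values are invented, and the plain least-squares fit is my assumption rather than the researchers' stated procedure.

```python
def least_squares_slope(xs, ys):
    """Slope of the ordinary least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator


def rank_protons(protein_conc, signal_by_proton):
    """Order ligand protons from steepest to shallowest slope (by magnitude)."""
    slopes = {
        proton: least_squares_slope(protein_conc, signals)
        for proton, signals in signal_by_proton.items()
    }
    return sorted(slopes, key=lambda proton: abs(slopes[proton]), reverse=True)


protein_conc_um = [0.0, 5.0, 10.0, 20.0]  # hypothetical titration points
toy_signals = {  # invented WaterLOGSY intensities, arbitrary units
    "H2": [-1.0, -0.4, 0.2, 1.4],
    "H7": [-1.0, -0.9, -0.8, -0.6],
}
print(rank_protons(protein_conc_um, toy_signals))
```

In this made-up example H2 responds much more steeply to added protein than H7, which would suggest H2 sits closer to the protein surface.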

Both of these techniques provide useful but incomplete information about ligand binding modes. A paper in J. Am. Chem. Soc. by Andreas Lingel and his Novartis colleagues describes how to generate more detailed models. The researchers used a deuterated protein in which all methyl groups (in methionine, isoleucine, leucine, valine, alanine, and threonine) were ¹³C-labeled. Multiple intermolecular NOEs between the protein and several previously characterized ligands were collected and the resulting distances fed into modeling software to produce good agreement with the known structures. More significantly, the researchers were able to use the method prospectively with two weak (0.9 and 2.8 mM) fragments. The binding models were sufficiently accurate to guide chemical optimization, resulting in molecules with 30-50 µM affinities. Subsequent crystal structures revealed that these bound as predicted. Impressively, this was done on a protein that forms 115 kDa hexamers – larger than those typically tackled by NMR.

Teddy would normally close his NMR posts by stating – usually quite forcefully – whether he felt the technique was practical or not. I’m no NMR spectroscopist, so I’ll throw this question out to readers – do you plan to try any of these approaches?

16 October 2017

Docking for finding and optimizing fragments

Docking can sometimes seem like the Rodney Dangerfield of FBDD: it don’t get no respect. In last year’s poll of fragment finding methods, computational approaches ranked in seventh place. This partly reflects the largely biophysical origins of FBDD, but it is also true that ranking low affinity fragments is inherently challenging. Still, the continuing rise in computational power means that methods are rapidly improving. A recent paper in J. Med. Chem. by Jens Carlsson and collaborators at Uppsala University, the Karolinska Institute, and Stockholm University illustrates just how far they can take you.

The researchers were interested in the enzyme MTH1, whose role in DNA repair makes it a potential anti-cancer target. The crystal structure of the protein bound to an inhibitor had previously been reported, and this was used for a virtual screen (using DOCK3.6) of 300,000 commercially available molecules, all with < 15 non-hydrogen atoms, from the ZINC database.

Finding fragments is one thing, but one really wants slightly larger, more potent compounds to begin lead optimization. Thus, the top 5000 fragments were analyzed to look for analogs with up to 6 additional non-hydrogen atoms among the 4.4 million commercial possibilities. This led to 118,421 compounds, each of which was then virtually screened against MTH1. Of the initial 5000 fragments, the top 1000 that had at least 5 analogs with (predicted) higher affinity were manually inspected. Of 22 fragments purchased and tested in an enzymatic assay, 12 showed some activity, with the 5 most active showing IC₅₀ values between 5.6 and 79 µM and good ligand efficiencies.
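
For anyone wanting to put a number on "good ligand efficiencies," here is a back-of-the-envelope sketch. It treats IC₅₀ as a stand-in for the dissociation constant, which is an approximation, and it assumes 14 heavy atoms purely for illustration (the post only says the screened molecules had fewer than 15); the actual atom counts and reported efficiencies are in the paper.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def ligand_efficiency(ic50_molar, heavy_atoms, temperature_k=298.15):
    """Approximate ligand efficiency in kcal/mol per heavy atom.

    Uses delta_G = RT ln(IC50), with IC50 standing in for Kd (an assumption).
    """
    delta_g = R_KCAL * temperature_k * math.log(ic50_molar)  # negative below 1 M
    return -delta_g / heavy_atoms


assumed_heavy_atoms = 14  # hypothetical; not taken from the paper
for ic50_um in (5.6, 79.0):  # the range of IC50 values quoted in the post
    le = ligand_efficiency(ic50_um * 1e-6, assumed_heavy_atoms)
    print(f"IC50 = {ic50_um} uM -> LE = {le:.2f} kcal/mol per heavy atom")
```

Under these assumptions both ends of the quoted potency range come out above the 0.3 kcal/mol per heavy atom often used as a rule of thumb.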

Since each of these fragments had commercially available larger analogs, the researchers purchased several to see if these did indeed have better affinities. Impressively, this turned out to be the case: both compounds 1a and 4a bound more than two orders of magnitude more tightly than their fragments. Interestingly, while the researchers were unable to obtain crystal structures of fragments 1 and 4 bound to MTH1, they were able to obtain crystal structures of 1a and a close analog of 4a, and these bound as predicted.

Of course, not everything worked: in the case of one fragment, among 19 commercial analogs purchased, the best was only 7-fold better. The crystal structure of this initial fragment bound to MTH1 was eventually solved, revealing that it bound in a different manner than predicted, thus explaining the modest results. In another case the most interesting commercial analogs turned out not to be available after all, but during the course of the study a different research group published a low nanomolar inhibitor with the same scaffold.

One notable aspect of this work is going from fragments to more potent leads without using experimentally determined structural information, something the majority of respondents in our poll earlier this year said they would not attempt. Although such advancement is not unprecedented, published examples are still rare.

In some ways this work is similar to the Fragment Network approach we highlighted last month, the key difference being that while Fragment Network was focused on looking for other fragments, this is focused on finding larger molecules. But how general is it? The researchers found that, while there are a median of just 3 commercial analogs in which a fragment is an exact substructure of a larger molecule, this increases to 700 when the criterion is relaxed to similarity (for example compound 1 and 1a). These numbers undoubtedly become even more favorable for organizations with large internal screening decks.

Eight years ago I ended a post about another successful computational screen with the statement that “the computational tools are ready, as long as they are applied to appropriate systems.” This new paper demonstrates that the tools have continued to improve. I expect we will see computational fragment finding and optimization methods move increasingly to the fore.