25 May 2020

Machine learning for two-dimensional NMR

Among the many methods to find fragments, only two – X-ray crystallography and protein-observed NMR – can routinely provide detailed structural information. Indeed, the first SAR by NMR paper arguably launched the field of fragment-based lead discovery nearly a quarter century ago. However, whereas crystallography has steadily increased in popularity, protein-observed NMR has lagged. A new paper in Comp. Struct. Biotech. J. by Grzegorz Popowicz and collaborators at Helmholtz Zentrum München, Technical University of Munich, and the ETH seeks to change this.

Two-dimensional NMR techniques, such as 1H-15N HSQC, produce two-dimensional plots with the chemical shift of the proton on one axis and the chemical shift of the nitrogen on the other. Different amide groups in a protein have different chemical shifts, and these can change in position or intensity when a ligand binds. Ideally these chemical shift perturbations (CSPs) can be used to tell exactly where on the protein a ligand binds, but even unassigned perturbations can give qualitative information on whether or not the protein is interacting with a fragment.
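For readers who want to see how a perturbation is typically reduced to a single number, here is a minimal sketch. It assumes the commonly used weighted-distance definition of a combined CSP, with a nitrogen scaling factor of about 0.14; neither the formula nor the factor is taken from the paper, and the function name and example shifts are invented for illustration.

```python
import math

# Assumed scaling factor for 15N shifts. Values of roughly 0.14 to 0.2 are
# common in the literature; this is not taken from the paper discussed here.
N_SCALE = 0.14


def combined_csp(d_h_ppm: float, d_n_ppm: float, n_scale: float = N_SCALE) -> float:
    """Combine 1H and 15N shift changes (in ppm) into one weighted distance."""
    return math.sqrt(d_h_ppm ** 2 + (n_scale * d_n_ppm) ** 2)


# Hypothetical peak that moves 0.03 ppm in 1H and 0.5 ppm in 15N when a
# ligand is added.
csp = combined_csp(0.03, 0.5)
print(f"combined CSP = {csp:.3f} ppm")
```

The scaling is there because nitrogen shifts span a much wider ppm range than proton shifts, so an unweighted distance would be dominated by the 15N axis.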

Unfortunately, analyzing hundreds of two-dimensional spectra is a tedious manual process; think of spending several hours playing Where’s Wally with blobs instead of people. And with only two colors. Thus, the process is subject to error and human bias. To make life easier for NMR spectroscopists, and to make analysis more objective, the researchers developed an automated software package called the CSP Analyzer.

The process started with 1611 spectra taken from fragment screens against four different proteins, of which 176 had a bound ligand. From the total, a training set was assembled of 32 actives along with 68 inactive or noisy spectra. These training spectra were fed into a machine learning algorithm similar to those used for computer image processing. Building the model required quite a bit of tweaking; because inactives outnumbered actives, a simple algorithm would do better by returning more false negatives than false positives. However, when looking for a fragment needle in a haystack of spectra, you really don’t want to miss anything useful, and the researchers used strategies to minimize this problem. In the end CSP Analyzer performed quite well, with an accuracy of 87% across the entire data set. Importantly, while it returned 10.3% of spectra as false positives, it only missed 3.1% of spectra as false negatives.
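The class-imbalance problem is easy to see with the numbers quoted above. The short calculation below is only a back-of-the-envelope sketch: it assumes the false positive and false negative percentages are fractions of all 1611 spectra, which the post does not state explicitly, and the variable names are mine.

```python
# Numbers quoted in the post.
total_spectra = 1611
actives = 176

# A trivial model that labels every spectrum "inactive" is right for all
# inactives and wrong for every active.
baseline_accuracy = (total_spectra - actives) / total_spectra

# Reported error fractions, assumed here to be fractions of all spectra.
false_positive_fraction = 0.103
false_negative_fraction = 0.031
implied_accuracy = 1.0 - false_positive_fraction - false_negative_fraction

print(f"always-inactive accuracy: {baseline_accuracy:.1%}")
print(f"accuracy implied by error fractions: {implied_accuracy:.1%}")
```

Under that assumption, a model that never calls a hit would score about 89% while finding nothing, which is why raw accuracy is a poor target here and why trading some false positives for fewer false negatives makes sense.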

Teddy would often end his posts by asking whether a new technique was practical. I’m no NMR spectroscopist, so I’ll leave it to readers to weigh in with their opinions. Happily, the software is freely available here, so you can download and try it yourself. Moreover, the researchers have ambitious future plans, such as extending CSP Analyzer to other types of NMR experiments and inputs. The rise of the machines continues, in a benevolent fashion. At least thus far.

18 May 2020

Merging two of the same fragments for FABP4

The fatty acid binding proteins (FABPs) are a family of 10 proteins that – as their name suggests – shuttle fatty acids around cells. FABP4 has been implicated in a host of diseases, from atherosclerosis to nonalcoholic steatohepatitis. A recent paper in J. Med. Chem. by Yechun Xu and collaborators mostly at Shanghai Institute of Materia Medica describes how a fragment led to a compound with in vivo efficacy. It is a lesson in both recognizing and capitalizing on the fact that fragments often have multiple binding modes.

The researchers screened just 500 fragments, each at 1 mM, looking for displacement of a fluorescent ligand. Two hits were identified, of which compound 1 was by far the more potent. The researchers characterized the binding mode using crystallography, which itself was challenging because the protein co-purified with bound fatty acids. They had to denature the protein, strip fatty acids, and then refold it to obtain the apo form. When they were finally able to determine the crystal structure, they were surprised to find that compound 1 adopted three different binding modes under two different conditions (pH 6.5 and 7.5). These experimental results were supported by molecular dynamics calculations.

It is not uncommon for fragments to assume different binding modes. Indeed, the 7-azaindole fragment that led to vemurafenib, pexidartinib, and other clinical compounds has been found to bind in multiple orientations. In this case, the researchers recognized that the three binding modes put the two phenyl rings in three positions, suggesting that grafting a third phenyl ring onto compound 1 could improve affinity. This proved successful, and the resulting compound 3 had an affinity more than two orders of magnitude better as assessed both in the displacement assay and by isothermal titration calorimetry. Crystallography revealed that the molecule bound as expected.


Further structure-based design ultimately led to compound 17, with low nanomolar affinity. This molecule is also active in a cellular assay and has surprisingly good pharmacokinetic properties in mice. Given these encouraging results, the researchers tested whether the molecule could protect mice from multiorgan damage promoted by inflammatory lipopolysaccharides. The results were positive.

Unfortunately, compound 17 does show low micromolar activity against FABP3, whose inhibition would likely cause cardiac toxicity. Still, this is a nice example of fragment “self-merging”. Although merging two different fragments is common, merging a fragment onto itself is relatively rare, and – as shown here – not necessarily easy. It is an approach worth keeping in mind the next time you encounter a fragment with multiple binding orientations.

11 May 2020

Broadening the scope of 19F NMR

Over the past decade, fluorine NMR has established itself as a powerful fragment-finding method due to the advantages Teddy laid out in his classic “fluorine fetish” post. One feature of 19F NMR is that the chemical shifts of organofluorine molecules span a very wide range, in theory allowing large mixtures to be screened. However, existing NMR methods do not work across such large spectral windows, thereby requiring multiple experiments to screen an entire library. This limitation has now been overcome as described in a paper just published in Angew. Chem. by Andreas Lingel, Andreas Frank, and collaborators at Novartis and Karlsruhe Institute of Technology.

The researchers developed an experiment based on “broadband universal rotation by optimized pulses” (BURBOP). I confess that the details evade me (though they are all there in the supporting information if you wish to try it at home), but the upshot is a type of CPMG experiment in which fluorine-containing fragments bound to a protein show decreased peak intensities. Crucially, a single experiment can cover the full frequency range of pharmacologically relevant fluorine-containing molecules, spanning about 210 ppm. Previously, this required multiple separate experiments.

Such increased throughput led the researchers to revamp their library, increasing the size from 1600 to 4000 fragments in an augmented library dubbed LEF4000. The paper has a nice, broadly applicable description of their curation process. Candidate members were brought in from both commercial and in-house sources and chosen to complement existing library members in terms of diversity. A modified rule of three was applied, with trifluoromethyl-containing fragments allowed to go up to 350 Da.

An in-house analysis of 25,000 fragments revealed that only about half of those with a clogD7.4 greater than 3 were soluble above 0.5 mM, so this was applied as an upper limit. Fragment solubilities were experimentally measured, and only compounds with solubilities above 0.2 mM were kept. (Although fluorine NMR is often done at low concentrations, complementary biophysical experiments are not.) Additional quality control measures included NMR and LC-MS purity assessments and removal of compounds that formed soluble aggregates as assessed by CPMG. Ultimately, 3969 of 5600 candidate molecules passed the gauntlet, and were combined in 131 mixtures of about 30 compounds each.
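The property cutoffs described above can be written down as a simple filter. This is only an illustrative sketch, not the researchers' pipeline: the `Fragment` class, the example fragments, and the 300 Da default limit (the usual rule-of-three value) are my assumptions, and the purity and aggregation checks are left out.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    name: str
    mol_weight: float     # Da
    clogd: float          # calculated logD at pH 7.4
    solubility_mm: float  # measured solubility, mM
    has_cf3: bool         # contains a trifluoromethyl group


def passes_filters(f: Fragment) -> bool:
    # 300 Da is the usual rule-of-three cutoff (an assumption; the post only
    # gives the relaxed 350 Da limit for CF3-containing fragments).
    mw_limit = 350.0 if f.has_cf3 else 300.0
    return f.mol_weight <= mw_limit and f.clogd <= 3.0 and f.solubility_mm > 0.2


# Hypothetical candidates, invented purely for illustration.
candidates = [
    Fragment("frag_a", 280.0, 2.1, 0.5, False),
    Fragment("frag_b", 330.0, 1.5, 1.0, True),
    Fragment("frag_c", 330.0, 1.5, 1.0, False),  # too heavy without CF3
    Fragment("frag_d", 250.0, 3.4, 0.8, False),  # clogD above 3
    Fragment("frag_e", 240.0, 1.0, 0.1, False),  # not soluble enough
]
kept = [f.name for f in candidates if passes_filters(f)]
print(kept)

# Figures quoted in the post.
pass_rate = 3969 / 5600
mean_mixture_size = 3969 / 131
print(f"pass rate: {pass_rate:.1%}, mean mixture size: {mean_mixture_size:.1f}")
```

The last two lines just check the quoted numbers: roughly 71% of candidates survived, and 131 mixtures works out to about 30 compounds each.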

Having built their library, the researchers screened it against the antibacterial target CoaD, which is involved in coenzyme A synthesis. The screen took just two days, and automated hit identification took only a few hours on a standard laptop. The overall hit rate was ~6%, and some of the hits were confirmed using two-dimensional protein-observed NMR methods, revealing that they bind in the enzyme active site with affinities in the mid micromolar to low millimolar range.

Pushing the technique further, the researchers built a “Supermixture” of 152 compounds, including five of the hits spanning a wide range of chemical shifts, from -50 to -220 ppm. Even under these conditions the binders were readily identifiable, and the paper states that libraries exceeding 20,000 fragments could in principle be screened in a few days.
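The 20,000-fragment figure is plausible on the back of an envelope. The sketch below assumes that screening time scales with the number of mixtures rather than the number of compounds, and that the two-day pace of the LEF4000 screen would hold for larger mixtures; both are my assumptions rather than claims from the paper.

```python
import math

# Figures quoted in the post.
mixtures_screened = 131    # mixtures in the LEF4000 screen
screen_days = 2            # time that screen took
supermixture_size = 152    # compounds in the "Supermixture"
target_library = 20_000    # library size mentioned in the paper

# How many Supermixture-sized pools a 20,000-fragment library would need.
mixtures_needed = math.ceil(target_library / supermixture_size)

# Assumption: time scales with the number of mixtures, at the same pace.
mixtures_per_day = mixtures_screened / screen_days
estimated_days = mixtures_needed / mixtures_per_day

print(f"mixtures needed: {mixtures_needed}")
print(f"estimated screening time: {estimated_days:.1f} days")
```

In other words, 20,000 fragments at 152 per mixture is almost exactly the number of mixtures already screened in two days.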

In 2009 I wondered why 19F NMR was not used more widely. How things change! At Novartis the LEF4000 library has been screened against “a wide variety of disease-related targets” and identified “tractable hits for each of the screened targets, among them many considered undruggable by small molecules such as transcription factors, a cytokine, a nuclear receptor, and a repeat RNA.” Practical Fragments looks forward to seeing some of these appear in the growing list of FBDD-derived clinical candidates.

04 May 2020

Fragment merging on the WBM site of scaffold protein WDR5

Two years ago we highlighted work out of Stephen Fesik’s lab at Vanderbilt University describing potent binders of WDR5, a molecular scaffold that interacts with dozens of other proteins. Those molecules bind at the so-called WIN site, disrupting interactions with proteins such as MLL1. Other proteins, such as the famous anticancer target MYC, bind at a completely different location – the WBM site. This is the focus of a new paper from the same group in J. Med. Chem.

The researchers had previously completed a traditional high-throughput screen and identified molecules such as compound 1. These were further optimized, but, as one might expect looking at the chemical structure, the best molecules had “challenging physicochemical profiles.” The researchers turned to fragments for help.

A two-dimensional (1H-15N HMQC) NMR screen of ~14,000 fragments yielded 43 hits, all of them quite weak, with dissociation constants in the millimolar range. The tetrapeptide portion of MYC that binds to the WBM site, Ile-Asp-Val-Val, contains a carboxylic acid flanked by lipophilic residues, and as one would expect many hits were hydrophobic acids. Crystal structures were determined for five, and these suggested a fragment merging opportunity.


The carboxylic acid moiety of fragment F2 makes similar interactions with an asparagine residue in WBM as the sulfonamide moiety of compound 1. The resulting merged compound 2a showed improved potency. More than a dozen replacements for the cyclohexyl ring were attempted but none improved potency significantly. Similarly, moving the cycloalkyl group around the 5-membered heterocycle was not productive. However, introducing a methyl sulfone moiety to engage a lysine residue led to a ten-fold boost in potency for compound 12. The molecule disrupted WDR5-MYC complex formation in cell lysates and also reduced MYC binding to target genes in cells.

This is another nice example of using fragment merging to fix problems across early lead series. Of course, compound 12 still has a long way to go; as the researchers note, the phenol is a likely site of glucuronidation. Still, this and the 2018 paper demonstrate the power of fragments to target two separate protein-protein interfaces on the same protein.