29 May 2017

Fragment hot spots revisited: a public validation set and method

This is the last week for our poll on how much structural information you need to begin optimizing a fragment – please vote on the right-hand side of the page if you haven’t already done so. We’ve recently discussed crystallography and NMR, so this post is focused on computation.

Predicting hot spots – regions on proteins where fragments are likely to bind – is becoming something of a cottage industry (see for example here and here). These can provide some indication as to whether or not a protein is ligandable, and ideally even provide starting points for a lead discovery program. But how should researchers searching for promising hot spots and binders choose a method, or evaluate a new one? In a recent paper in J. Med. Chem., Marcel Verdonk and colleagues at Astex provide a method as well as a validation set, both of which are freely available.

The validation set consists of 52 high-quality crystal structures pulled from the Protein Data Bank (PDB). These were chosen to be maximally diverse in terms of both fragments (41 of them) and proteins (45). The fragments were not taken in isolation; rather, substructures of larger molecules were also considered, provided they bound in the same region of the protein in the context of at least three different ligands. For example, the researchers note that there are no structures in the PDB of resorcinol bound to HSP90A, even though this is a privileged fragment that usually binds in a conserved fashion at the ATP-binding site in the context of a larger molecule.

Fragments chosen for the validation set have at most one rotatable bond and are quite small, just 5 to 12 non-hydrogen atoms. However, as they are culled from larger molecules, some (such as adamantane) are more lipophilic than standard “rule of three” guidelines would allow.

The 52 examples in the test set were divided into 40 hot spots and 12 “warm” spots, depending on the occupancy of the binding site in the protein across the PDB. For example, the canonical purine binding site of kinases is a hot spot, while the nearby chlorophenyl-binding site of the PKA-Akt chimeric kinase is classified as warm.

With this validation set in hand, the researchers tested an in-house fragment-mapping method called PLImap (which relies on the previously published Protein-Ligand Informatics force field, PLIff) to see how well it could reproduce the bound conformations of the fragments. The results compared quite favorably with those of the other docking methods tested. That’s exciting, and since PLImap is free to download and use (here), it should be a useful tool for modelers everywhere.
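The standard yardstick for pose-reproduction tests like this is the heavy-atom RMSD between the predicted and crystallographic poses, with 2 Å the conventional success cutoff. A minimal Python sketch, assuming atoms are already matched one-to-one and both poses sit in the same protein frame (the function name and arrays are my own illustration, not from the paper):

```python
import numpy as np

def pose_rmsd(docked: np.ndarray, crystal: np.ndarray) -> float:
    """Heavy-atom RMSD (in Angstroms) between a docked pose and the
    crystal pose. Assumes both are (n_atoms, 3) arrays with atoms
    matched 1:1 and no re-superposition (shared protein frame)."""
    diff = docked - crystal                        # per-atom displacement
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))

# Conventional success criterion for a docking pose:
# pose_rmsd(docked_xyz, crystal_xyz) < 2.0
```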

But of course PLImap also made mistakes, and some of these were “wrong in an interesting way.” Water often plays a critical role in protein-ligand interactions, but water was not included in the docking, and several of the cases where PLImap missed the experimentally observed conformation of the fragment involved water molecules. For example, PLImap placed the bromodomain-privileged 3,5-dimethylisoxazole fragment in the right location but in a flipped orientation, because the highly conserved water was not present.

Perhaps more interestingly, in some cases warm spots were ignored in favor of hot spots. For example, in the case of the PKA-Akt chimeric kinase mentioned above, PLImap placed the chlorophenyl fragment not at the chlorophenyl-binding subsite, where it sits in the context of larger ligands, but rather in the “hotter” purine-binding subsite. This phenomenon was observed experimentally several years ago by Isabelle Krimm and colleagues: large BCL-xL ligands that were deconstructed into their component fragments bound mostly at a single site, rather than at the two sites occupied by the larger molecules. It would be fascinating to test this same set of molecules using PLImap.

All of which is to say that, while computational methods continue to make impressive strides, we are still (happily!) some way from getting rid of the experimentalists.

22 May 2017

Extracting more information from crystallographic data

The current poll on how much structural information is needed for fragment optimization is still open - if you haven't done so already, please vote on the right-hand side of the page. Last week we discussed new developments in NMR. This week we turn to crystallography.

Fragment screening by crystallography is a little like finding needles in haystacks. Typically, dozens or hundreds of crystals are individually soaked with one or more fragments. Diffraction data gathered from each crystal are used to generate electron density maps, which are iteratively refined by tweaking the conformation of side chains and adding water molecules. In theory, any unexplained electron density that remains after refinement should correspond to bound fragments.

In practice, the process of manually inspecting so many data sets can be both tedious and subjective. Although a narrow focus on the active site reduces the amount of work, doing so risks missing the many fragments that bind at interesting secondary sites. Also, because fragments have low affinities, they may bind to only a fraction of the protein molecules; this "partial occupancy" lowers the signal-to-noise ratio. And fragments sometimes bind in more than one conformation, thereby smearing out the electron density and further reducing the signal.

Of course, even though crystallographic fragment screening can give very high hit rates, most crystals will not have bound fragments. In a new paper in Nat. Commun., Frank von Delft at the Structural Genomics Consortium and collaborators at several institutions describe how these "empty" structures can be turned from lemons into lemonade.

The method, called Pan-Dataset Density Analysis (PanDDA), is essentially a form of background correction. Dozens of datasets from empty crystals are averaged and computationally subtracted from a dataset of interest. This averaging gives much cleaner maps, allowing fragments to be more rapidly and easily detected. It’s almost as if you could subtract all the hay from a haystack to reveal any needles.
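To make the idea concrete, here is a toy numpy sketch of this kind of background correction. Everything here is a simplification for illustration: the real PanDDA estimates the background density correction (BDC) factor per dataset and layers statistical analysis on top, whereas this assumes a fixed BDC and maps already aligned on a common grid.

```python
import numpy as np

def background_corrected_map(dataset_map: np.ndarray,
                             ground_maps: list[np.ndarray],
                             bdc: float = 0.8) -> np.ndarray:
    """Toy PanDDA-style event map: average many maps from 'empty'
    (ground-state) crystals, then subtract a scaled fraction of that
    average from the dataset of interest."""
    mean_ground = np.mean(ground_maps, axis=0)    # the averaged "hay"
    # Whatever survives the subtraction is candidate fragment density.
    return (dataset_map - bdc * mean_ground) / (1.0 - bdc)
```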

The researchers present four case studies of crystallographic fragment screens, each with more than 100 datasets, and the results are stunning: in one case manual inspection revealed just 2 fragment hits, both at a single site, while PanDDA revealed 24 fragments at 5 different sites!

One limitation of PanDDA is that it does require dozens of empty datasets – ideally more than 30. In a new paper in Acta Crystallogr. D Struct. Biol., Dorothee Liebschner at the Lawrence Berkeley National Laboratory and collaborators at other institutions describe an alternative approach suitable for lower throughput applications.

One common tool in crystallography is the OMIT map. The atoms in question (such as those of a ligand) are omitted from the model, and the calculated electron density is then compared with the observed electron density; if the density remains, this suggests that the atoms really belong. Of course, there is no truly empty space in a crystal – solvent fills any space not occupied by protein or ligands. Typically this is accounted for by treating “bulk solvent” (i.e., water molecules not making specific interactions) as a constant level of background electron density. The problem is that when calculating an OMIT map, this bulk solvent can obscure weak but real electron density.

To address this challenge, the researchers develop “polder OMIT maps,” named after land that is kept dry despite being below the surrounding water level. Essentially, the bulk solvent is not allowed into polder OMIT maps when they are generated, thus enhancing any actual density and allowing low-occupancy ligands to be observed. Several lovely figures in the paper illustrate that the process works well.
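Conceptually, the polder trick amounts to carving the bulk-solvent mask away from the neighborhood of the omitted atoms. A rough Python illustration of just that step, with the 5 Å radius, flattened grid, and function name all my own simplifications (the actual implementation is phenix.polder):

```python
import numpy as np

def polder_solvent_mask(grid_xyz: np.ndarray,
                        omitted_xyz: np.ndarray,
                        bulk_mask: np.ndarray,
                        radius: float = 5.0) -> np.ndarray:
    """Start from the usual boolean bulk-solvent mask over grid points
    (True = solvent allowed), then forbid solvent near the omitted
    atoms so flat solvent density cannot flood the region where weak
    ligand density might live."""
    # Distance from each grid point to its nearest omitted atom.
    d = np.linalg.norm(grid_xyz[:, None, :] - omitted_xyz[None, :, :],
                       axis=-1).min(axis=1)
    mask = bulk_mask.copy()
    mask[d < radius] = False      # keep bulk solvent out of the "polder"
    return mask
```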

It is nice to see that, despite its long history, crystallography continues to make practical and creative advances.

15 May 2017

NMR structures without protein assignments

Our latest poll asks how much structural information you need to advance a fragment (please vote on the right-hand side of the page). On this subject, a recent paper by Marielle Wälti, Roland Riek, and Julien Orts in Angew. Chem. demonstrates a new NMR method.

Researchers typically begin an NMR structure campaign by examining the chemical shift perturbations (CSPs) of proton-nitrogen or proton-carbon crosspeaks from an isotopically labeled protein in the presence and absence of a ligand. If you know which crosspeaks correspond to which specific atoms in an amino acid residue, you can deduce the ligand binding site by looking for the residues with the largest CSPs. Next comes the measurement of nuclear Overhauser effects (NOEs) between atoms in the ligand and atoms in the protein; these are exquisitely dependent on distance, so if you have enough measurements you can use these to accurately dock your ligand into the binding site of your protein.
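The CSP step, at least, is simple enough to show in code. A sketch using the common combined ¹H/¹⁵N formula, where the ~0.14 weighting puts nitrogen shifts on a proton-like scale (the weighting is a widely used convention, not something from this paper):

```python
import numpy as np

def csp(dH_free: float, dN_free: float,
        dH_bound: float, dN_bound: float,
        alpha: float = 0.14) -> float:
    """Combined 1H/15N chemical shift perturbation for one residue:
    sqrt(ddH^2 + (alpha * ddN)^2), with alpha ~0.14 by convention."""
    return float(np.sqrt((dH_bound - dH_free) ** 2
                         + (alpha * (dN_bound - dN_free)) ** 2))

# Residues with the largest CSPs flag the likely binding site, e.g.:
# top = sorted(residues, key=lambda r: csp(*r.shifts), reverse=True)[:10]
```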

This is how SAR by NMR was done more than twenty years ago, and it still works well today, but it is neither fast nor easy. In particular, the initial step of assigning the hundreds of protons, nitrogens, and carbons in a typical protein can be daunting.

To streamline the process, the researchers developed NMR molecular replacement (NMR2), first published last year (here) and presented by Julien at FBLD 2016. Rather than requiring knowledge of which peaks correspond to which specific protein atoms, NMR2 relies on the increasing power of computers to run large numbers of complex calculations. Different docking poses will generate different NOEs, so exhaustively and iteratively examining these possibilities and comparing them with the experimental data should converge on an optimal model. (NMR2 does require that the structure of the protein be known, so that you know which residues surround a ligand-binding pocket, even if you don’t know their chemical shifts. Also, the protons of the ligand must be assigned, and in fact the intramolecular NOEs of the ligand itself are an important input.)
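The pose-scoring idea can be caricatured in a few lines. Because NOE intensity falls off roughly as r⁻⁶, each candidate pose can be judged by how well its interatomic distances reproduce the measured NOEs; the sketch below uses a crude independent-spin-pair comparison (all names are mine, and the actual NMR2 treatment is considerably more sophisticated):

```python
import numpy as np

def noe_score(pose_dists: np.ndarray, exp_dists: np.ndarray) -> float:
    """Compare a pose's proton-proton distances against NOE-derived
    distances in r**-6 space, which weights short contacts the way
    the experiment does. Lower scores are better."""
    return float(np.sum((pose_dists ** -6.0 - exp_dists ** -6.0) ** 2))

# Conceptual NMR2-style search: enumerate many poses and keep the one
# that best reproduces the intra- and intermolecular NOEs.
# best = min(poses, key=lambda p: noe_score(p.dists, exp_dists))
```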

In the new paper the researchers apply NMR2 to a complex between the oncology target MDMX and a previously disclosed high-nanomolar binder and find good agreement (1.35 Å RMSD) with the crystal structure.

The researchers then turn to “ligand #845,” which binds with millimolar affinity to the oncology target HDM2. A total of 33 intramolecular NOEs (from ligand #845) and 21 intermolecular NOEs (between ligand #845 and HDM2) were fed into NMR2 and used to crank through 54,000 structure calculations in a few hours to produce a binding model. No crystal structure was available, but conventional NMR methods support the model.

This seems like a rapid and powerful approach, but readers of this blog are probably wondering how well it will apply to fragments. Clearly NMR2 is sensitive enough to handle weak binders. However, with 24 non-hydrogen atoms and a molecular weight of 354 Da, ligand #845 is too large to be called a fragment. Smaller molecules will have fewer hydrogen atoms and thus fewer intramolecular and intermolecular NOEs, decreasing the information content of the model. It will be fun to plumb the lower ligand size limits of this technique – leave a comment if you’ve done so!

08 May 2017

Poll: structural information needed for fragment optimization

As mentioned last week, advancing fragments in the absence of structure is a major challenge. But how much of a barrier is it really?

I know some researchers who would not consider moving forward with a fragment in the absence of a crystal structure. As crystallography continues to advance, more targets will be available, but many will remain out of reach for the foreseeable future.

Of course, the first SAR by NMR paper used NMR rather than crystallography, and the early work that ultimately led to venetoclax relied only on NMR-derived structures. Similarly, crystallography was initially unsuccessful against MCL-1, but NMR-based models allowed effective fragment advancement.

When crystallography and NMR both fail, there is in silico modeling, which continues to improve. Last year we highlighted how modeling succeeded in merging fragments to a nanomolar binder.

But the real challenge is advancing fragments with no structural information whatsoever. There are a few published examples (such as this and this). And it’s worth remembering that optimization in the absence of structure was how drug discovery was done decades ago, before the rise of biophysics. Indeed, until recently most GPCR-based drug discovery was done without the benefit of structural information.

So, in the poll to the right please choose the minimum level of structural information you would need to embark on a fragment to lead program. Happy voting!

01 May 2017

Twelfth Annual Fragment-based Drug Discovery Meeting

CHI’s Drug Discovery Chemistry meeting took place over four days last week in San Diego. This was easily the largest one yet, with eight tracks, two one-day symposia, and nearly 700 attendees; the fragment track alone had around 140 registrants. On the plus side, there was always at least one talk of interest at any time. On the minus side, there were often two or more going simultaneously, necessitating tough choices. As in previous years I won’t attempt to be comprehensive but will instead cover some broad themes in the order they might be encountered in a drug discovery program.

You need good chemical matter to start a fragment screen, and there were several nice talks on library design. Jonathan Baell (Monash University) gave a plenary keynote on the always entertaining topic of PAINS. Although there are some 480 PAINS subtypes, 16 of these accounted for 58% of the hits in the original paper, suggesting that these are the ones to particularly avoid. But it is always important to be evidence-based: some of the rarer PAINS filters may tag innocent compounds, while other bad actors won’t be picked up. As Jonathan wrote at the top of several slides, “don’t turn your brain off.”

Ashley Adams described the reconstruction of AbbVie's fragment libraries. AbbVie was early to the field, and Ashley described how they incorporated lessons learned over the past two decades. This included adding more compounds with mid-range Fsp3 values, which, perhaps surprisingly, seemed to give more potent compounds. A 1000-member library of very small (MW < 200) compounds was also constructed for more sensitive but lower-throughput biophysical screens. One interesting design consideration was whether fragments had potential sites for selective C-H activation to facilitate fragment-to-lead chemistry.

Tim Schuhmann (Novartis) described an even more “three-dimensional” library based on natural products and fragments. Thus far the library is just 330 compounds and has produced a very low hit rate – just 12 hits across 9 targets – but even a single good hit can be enough to start a program.

Many talks focused on fragment-finding methods, old and new. We’ve written previously about the increasingly popular technique of microscale thermophoresis (MST), and Tom Mander (Domainex) described a success story on the lysine methyltransferase G9a. When pressed, however, he said it did not work as well on other targets, and several attendees said they had success in only a quarter to a third of targets. MST appears to be very sensitive to protein quality and post-translational modifications, but it can rapidly weed out aggregators. (On the subject of aggregators, Jon Blevitt (Janssen) described a molecule that formed aggregates even in the presence of 0.01% Triton X-100.)

Another controversial fragment-finding technique is the thermal shift assay, but Mary Harner gave a robust defense of the method and said that it is routinely used at BMS. She has seen a good correlation between thermal shift and biochemical assays, and indeed sometimes outliers were traced to problems with the biochemical assay. The method was even used in a mechanistic study to characterize a compound that could bind to a protein in the presence of substrate but not in the presence of a substrate analog found in a disease state. Compounds that stabilized a protein could often be crystallized, while destabilizers usually could not, and in one project several strongly destabilizing compounds turned out to be contaminated with zinc.

Crystallography continues to advance, due in part to improvements in automation described by Anthony Bradley (Diamond Light Source and the University of Oxford): their high-throughput crystallography platform has generated about 1000 fragment hits on more than 30 targets. Very high concentrations of fragments are useful; Diamond routinely uses 500 mM with up to 50% DMSO, though this obviously requires robust crystals.

Among newer methods, Chris Parker (Scripps) discussed fragment screening in cells, while Joshua Wand (U. Penn) described nanoscale encapsulated proteins, in which single protein molecules could be captured in reverse micelles, thereby increasing the sensitivity in NMR assays and allowing normally aggregation-prone proteins to be studied. And Jaime Arenas (Nanotech Biomachines) described a graphene-based electronic sensor to detect ligand interactions with unlabeled GPCRs in native cell membranes. Unlike SPR the technique is mass-independent, and although current throughput is low, it will be fun to watch this develop.

We recently discussed the impracticality of using enthalpy measurements in drug discovery, and this was driven home by Ying Wang (AbbVie). Isothermal titration calorimetry (ITC) measurements suggested low micromolar binding affinity for a mixture of four diastereomers that, when tested in a displacement (TR-FRET) assay, showed low nanomolar activity. Once the mixture was resolved into pure compounds the values agreed, highlighting how sensitive ITC is to sample purity.

If thermodynamics is proving less useful for lead optimization, kinetics appears to be more so. Pelin Ayaz (D.E. Shaw) described two Bayer CDK kinase inhibitors bearing either a bromine or a trifluoromethyl substituent. They had similar biochemical affinities, and the bromine-containing molecule had better pharmacokinetics, yet the trifluoromethyl-containing molecule performed better in xenograft studies. This was ultimately traced to a slower off-rate for the trifluoromethyl-substituted compound.

The conference was not lacking for success stories, including MetAP2 and MKK3 (both described by Derek Cole, Takeda), LigA (Dominic Tisi, Astex), RNA-dependent RNA polymerase from influenza (Seth Cohen, UCSD), and KDM4C (Magdalena Korczynska, UCSF). Several new disclosures will be covered at Practical Fragments once they are published.

But these successes should not breed complacency: at a round table chaired by Rod Hubbard (Vernalis and University of York) the topic turned to remaining challenges (or opportunities). Chief among these was advancing fragments in the absence of structure. Multiprotein complexes came up, as did costs in terms of time and resources that can be required even for conventional targets. Results from different screening methods often conflict, and choosing the best fragments both in a library and among hits is not always obvious. Finally, chemically modifying fragments can be surprisingly difficult, despite their small size.

I could go on much longer but in the interest of space I’ll stop here. Please add your thoughts, and mark your calendars for next year, when DDC returns to San Diego from April 2-6!