25 March 2013

Leave Them Asking for More

ret·ro·spec·tive  (rĕt′rə-spĕk′tĭv) adj.
1. Looking back on, contemplating, or directed to the past.
2. Looking or directed backward.
3. Applying to or influencing the past; retroactive.
I would add: 4. Looking back on the past, to influence the future.

In this vein, a recent paper by Ferenczy and Keserű in J. Med. Chem. looks back on hit-to-lead optimizations derived from fragment starting points.  In this very interesting paper, they examine 145 fragment programs and evaluate the properties of each original hit and of the lead it progressed into.  The 145 programs were aimed at 83 proteins, of which 76 are enzymes, 6 are receptors, and 1 is an ion channel.  These programs evolved into leads, tools, and clinical candidates.  The authors set out to answer three questions: 1. do fragments eliminate the risk of property inflation; 2. how do ligand efficiency metrics support fragment optimizations; and 3. what is the impact of detection method, optimization strategy, and company size on the optimization?

Table 2 shows the median calculated properties for the hits and the optimized compounds.  The pIC50 improved by roughly three orders of magnitude, but ligand efficiency (LE) stayed roughly the same.  LogP increased but SILE did not.  SILE was a metric I was not familiar with; it is a size-independent measure of ligand efficiency, calculated as pIC50/(HAC)^0.3.  I won't attempt to reproduce all the graphs they generated; get the paper.  Some interesting data points: the median fragment hit had 15 heavy atoms (see related poll here), the median lead has 28 heavy atoms, and a good fraction of hit-lead pairs changed by fewer than 5 heavy atoms (which of course is well known from here).  So, what is the answer to their first question?  If you are looking at something like LE, then these hit-lead pairs maintain their efficiency.  If you are looking at logP, then the answer is no: the hit-lead pairs get greasier.  I would really like to see more granularity here (see What is Missing below).  Their major comment here is that SILE and LELP (work by the authors previously reviewed by Dan) are the two best metrics to monitor as hit-to-lead optimization is underway; both respond to the property changes that accumulate in FBDD programs.
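Since SILE may be unfamiliar, here is a minimal Python sketch of the three metrics discussed above (the 1.37 factor converts pIC50 to an approximate binding energy in kcal/mol at room temperature; the hit and lead values are invented for illustration, not taken from the paper):

```python
# Ligand efficiency metrics discussed above:
#   LE   = 1.37 * pIC50 / HAC    (kcal/mol per heavy atom)
#   SILE = pIC50 / HAC**0.3      (size-independent ligand efficiency)
#   LELP = logP / LE             (lipophilicity per unit efficiency)

def le(pic50, hac):
    """Classic ligand efficiency: ~binding energy per heavy atom."""
    return 1.37 * pic50 / hac

def sile(pic50, hac):
    """Size-independent ligand efficiency."""
    return pic50 / hac ** 0.3

def lelp(logp, pic50, hac):
    """LogP scaled by ligand efficiency; lower is better."""
    return logp / le(pic50, hac)

# Invented hit/lead pair, loosely echoing the medians quoted above.
for name, pic50, hac, logp in [("hit", 4.0, 15, 1.5), ("lead", 7.0, 28, 3.0)]:
    print(f"{name}: LE={le(pic50, hac):.2f}  "
          f"SILE={sile(pic50, hac):.2f}  LELP={lelp(logp, pic50, hac):.1f}")
```

With these made-up numbers LE barely moves while LELP more than doubles as logP climbs, the same qualitative pattern the paper reports.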
 
They then looked at the screening method.  The breakout of primary screening methods (in their definition, the first one listed when multiple methods were used) was 38% biochemical, 25% NMR, 18% X-ray, and 11% virtual.  This is an interesting contrast to these results; SPR is not the dominant screening technique (8%, tied with MS).  So does this mean that >40% of practitioners are using SPR, but as a secondary screen?
Table 4 then gives a pairwise comparison of the metrics based upon the primary-screen origin of the fragment (see What is Missing below).  Biochemical screens yield the most potent hits (pIC50 4.75) while NMR yields the least potent (pIC50 3.53).  X-ray gives the smallest hits (13 heavy atoms) while virtual screening gives the largest (17).  I don't think any of this is surprising; the authors point out that hit properties exhibit a significant dependence on the method used, and that optimization tends to diminish differences in hit properties.  Again, I think this is not surprising; thermodynamics and medchem are the same no matter how big or small the molecule.  They do point out that biochemically derived hits preserve their potency advantage after optimization, primarily relative to NMR-derived hits.  They posit that more potent starting compounds need less "stuff" to become sufficiently potent, and thus leave more room for medchem to balance potency against other properties; weaker starting points need more bulk, more atoms, to become equipotent with biochemical starting points.  Lastly, they show that structure-based optimization efforts do better than those without structural information.  The structural information comes primarily from X-ray (52%) and NMR (10%).  Interestingly, they reference this poll from the blog on whether you need structural information to prosecute fragments.  [As an aside, since I wrote that blog post and it was referenced in a paper, do I get to add it to my resume?]

Finally, they break the originating labs into three categories: academic (18%), small/medium enterprises (SMEs, 37%), and Big Pharma (45%).  The SME results are superior to those achieved by the academics and Big Pharma.  Their explanation is that SMEs tend to be more platform focused and predominantly ensure "structural" enablement of targets: 75% (40/53) of optimizations at SMEs used protein structural information, while "only" 62% of those at Big Pharma did.  They do not rule out differences in target selection between those two groups of companies.  I would like to propose an alternative hypothesis (tongue-not-entirely-in-cheek): Big Pharma has "old crusty" chemists who don't understand fragments and thus just glom hydrophobic stuff on to increase potency because "that's how we always do it", while SMEs have innovative chemists.  And of course academia is just making tool compounds and crap.


One thing that I would like to emphasize is that Ro3, metrics, and so on should not be used as hard cutoffs.  As shown in Figure 11, even compounds that are outside the "preferred" space can reach the clinic.  The best way to view them is akin to the Pirate Code; they are more guidelines than rules. 
 
What is Missing?  The supplemental information (which the authors are kindly willing to share) does not break out the targets into specific classes.  However, they do list each target, so it should be easy to add this as a data column.  More importantly, they do not break out hit-lead pairs into those that were optimized for use as tools, clinical candidates, and leads (and which of the leads died).  Tools are never supposed to look like leads (though you are lucky if they do), so their inclusion here could be biasing the results.  It is likely that not very many of the 145 examples are strictly tools, but it would be nice to know.
I am struck by the information density of Table 4 and wish that, instead of pairwise comparisons, they had simply listed the hit-lead metrics for each methodology.  I think there is gold to be mined in Table 4, but just like real gold, it is hard to find.
 
I have not addressed every single point made by the authors.  I, for one, am hoping that they will continue their analyses (especially with an eye to some of What is Missing).  I hope that there will be a significant amount of discussion around these points.  I will make sure we hit on this at the breakfast roundtable at the upcoming CHI FBDD event in SD (I even have the same pithy title as last year!).

18 March 2013

Rad fragments

One of the selling points of FBLD is that it can find starting points against challenging targets such as protein-protein interactions. A major reason these targets are so tricky is that they often have large, flat interfaces with few pockets for small molecules to bind. An example is the interaction between the tumor suppressor BRCA2 and the recombinase RAD51, which is mediated in part by the phenyl ring of a phenylalanine residue – a very small moiety even by the standards of fragments. In a paper published recently in ChemBioChem, Marko Hyvönen and colleagues at the University of Cambridge describe how they’ve found fragments that bind to this site.

The researchers started with a microbial RAD51 homolog that had been humanized by mutagenesis; the human protein itself is unstable and difficult to work with. They performed a thermal screen with 1249 fragments. Thermal denaturation has been criticized for producing noisy data, and indeed, 96 fragments produced complex, uninterpretable results. However, the two best fragments both contained an indole core and were confirmed to bind by STD-NMR.

Competition experiments confirmed that these two fragments competed with a short peptide containing the critical phenylalanine, indicating that they bound at the desired spot. ITC revealed that they had dissociation constants around 2 mM. Their binding modes were also confirmed crystallographically.

The researchers then used one of these fragments as a probe in a round of STD-NMR experiments, in which they examined 42 fragments to see whether any of these could compete away the first fragment. This led to two new hits, both slightly more potent than the initial ones.

One of these new fragments was then used as a probe in another round of STD-NMR experiments with 120 fragments chosen as analogs or by in-silico screening. This led to four additional fragments, some of which had sub-millimolar affinities and good ligand efficiencies. All 6 of the new fragments from the two STD-NMR screens were characterized crystallographically and found to bind at the same site as the original indole fragments, though with some subtle differences that could be exploited for further elaboration.

This is a nice, thorough example of fragment discovery in academia. As the authors conclude:

Investment in a platform of orthogonal biophysical assays and screens is crucial for progression into a programme of medicinal chemistry. The elaboration of poorly validated hits not only has a high likelihood of failure, but without a variety of robust assays in place, the risk of being misled by badly behaving compounds increases.

Of course, these are still relatively weak fragments, but I’ve heard one of the authors speak at a conference in which he stated that they’ve been able to advance these to nanomolar leads with cell activity. Stay tuned!

11 March 2013

With the proper tool, I could move the world

As noted before, bromodomains are a "hot" area of drug discovery.  Dan mentioned last year that PFI-1 was being released as a tool compound by the SGC.  In this paper, Fish et al. describe its discovery (Supplemental Information here).  They started their discovery with potential fragment-sized acetyl-lysine mimics (DMSO need not apply!), as others have described; in particular, 3,4-dihydro-3-methyl-2(1H)-quinazolinones like Cpd 7 and its bromo analog.  These two compounds had sub-30 μM potency and thus LE > 0.45.  The efforts of Conway et al. and Chung et al. were highly instructive to the Pfizer group.  Crystallography was key to confirming that the binding modes seen by Conway are possible and that the quinazolinones are viable acetyl-lysine mimics.

The crystallography showed that the bromine points towards solvent and is thus the appropriate place to start doing chemistry.  Based upon the structures, a "bent" substituent at the 6-position appeared promising; sulfonamides were chosen for this role.  Compounds 9 and 11 were also noted as attractive, novel compounds in their own right.  These were used for very limited library construction.  The compounds derived from 9 were profiled first.  While better than the parent bromide, subsequent structural analysis showed that they were not making good interactions with the intended site (the WPF shelf).  The sulfonamides derived from 11, on the other hand, showed significantly improved activity.  The SAR was relatively insensitive to substitution on the aryl group, due to the optimized placement on the shelf and the reversed sulfonamide.
PFI-1 has 0.22 μM activity against BRD4, and it was nominated as the probe molecule.  They further investigated its binding via X-ray.  They also examined it in a much broader array of assays: broader pharmacological selectivity panels, a cell-based inflammatory end-point assay, and rodent pharmacokinetics.  It showed <50% inhibition at 10 μM against 15 targets (GPCRs, ion channels, enzymes) and <20% inhibition against 50 kinases.  It fits the criteria for a good probe.

As the authors state, it was designed in a little over 250 molecules from an efficient fragment starting point, covering only two design cycles.  I think this is an excellent example of probe design and discovery.


07 March 2013

Fragments 2013

Fragments 2013, the 4th RSC-BMCS fragment-based drug discovery meeting, took place this week at STFC Rutherford Laboratory in Oxfordshire, UK. These biennial meetings started in 2007; you can read impressions of Fragments 2009 here and here. With 14 speakers, 49 posters, multiple exhibitors, close to 200 attendees, and a pre-conference training course, this post can touch on just a few topics. Some of the talks and poster summaries are available here.

One of the first things I noticed was the number of new faces, always a good indicator for the health of a field. I was also struck by the number of attendees from big pharma, including a couple of companies that had draconian travel policies in 2012. Hopefully this portends a thaw from the last few years.

False positives or false negatives?
A recurring theme was the (ir)reproducibility of fragment-finding methods. Practical Fragments recently discussed this here, and it seems many other folks are also finding that orthogonal methods can produce non-overlapping sets of hits. For example, Ursula Egner from Bayer Healthcare described two different targets screened using multiple methods. For thrombin, her team found the following from a library of 1891 fragments:
27 hits with IC50 ≤ 650 μM using high-concentration activity screening
75 hits with IC50 ≤ 2 mM using SPR
58 hits that stabilized the protein more than 2σ above baseline against thermal melting
Of these, only 2 were found by all three techniques, and of 114 fragments soaked into crystals, only 15 gave structures.
Another (unnamed) protease gave similar results with a library of 2031 fragments:
17 hits from high-throughput screening
48 hits from SPR
38 hits from thermal shift
None of the hits were found in all three assays!
In this case soaking was not possible, and of the 93 co-crystallization trials only 8 produced structures, of which none came from the thermal shift assays (also true for thrombin).
Al Gibbs from Janssen R&D found similar results in a retrospective analysis of hits against ketohexokinase (see also here). Of 786 fragments tested, there were:
54 hits in an activity assay (mass-spectrometry based)
75 hits from SPR
44 hits that produced crystal structures
Of all these, only 2 were in common. There was also no correlation between affinity or solubility and the ability to obtain a crystal structure.
However, these observations were not universal. Rod Hubbard noted that of the 32 targets screened over the past 10 years at Vernalis, there tended to be good overlap between hits from SPR and NMR, and that these tended to produce X-ray crystal structures, though crystallography had plenty of false negatives too. He did single out thermal melt assays as being particularly unreliable, as have others.

How to reconcile these varied experiences? Rod stressed that assays required very careful optimization, and that subtle changes could dramatically improve the number and quality of hits. Indeed, the thrombin example above used different cutoffs for the activity and SPR screens, and the ketohexokinase crystals were soaked at pH 4.5 while other assays were run at pH 7.5. Tony Giannetti noted that his group at Genentech tries to closely match their SPR screening conditions with those that the crystallographers use.

All this does raise the question of what to do when your orthogonal assays don’t agree: do you go with less-validated hits, risking false positives, or throw away potentially valuable fragments? There probably is no one right answer. If you have plenty of hits that confirm in all your assays you should probably stick with those, but if you’re working on a tougher target you may need to dig into the noise, taking special care not to be misled by artifacts. Of course, the potential for false positives means you want to take even more care in your fragment library design.
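To make that triage concrete, here is a toy Python sketch (the fragment IDs and hit lists are invented) that counts, for each fragment, how many orthogonal assays it confirmed in, mirroring the overlap counting in the thrombin and ketohexokinase examples above:

```python
from collections import Counter

# Invented hit lists from three orthogonal screens (IDs are made up).
assays = {
    "activity": {"F001", "F017", "F042", "F108"},
    "SPR":      {"F017", "F042", "F063", "F201", "F305"},
    "thermal":  {"F042", "F099", "F201"},
}

# Count how many assays each fragment was a hit in.
confirmations = Counter(frag for hits in assays.values() for frag in hits)

# Triage: fragments confirmed by all three float to the top,
# then two-assay hits, then the singletons (the "noise").
for frag, n in confirmations.most_common():
    print(f"{frag}: confirmed in {n} assay(s)")

triple = set.intersection(*assays.values())
print("Hits common to all three assays:", triple or "none")
```

Fragments confirmed by every assay are the safest bets; the singletons are what you dig into only when the target leaves you no choice.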

Library design
On the topic of “three-dimensional” fragments, a concern raised as far back as 2009 is that they may have a lower hit rate than “flatter”, more aromatic fragments. Dirk Ullman noted that over the course of 29 screens he and his colleagues at Evotec have obtained 15,687 fragment hits, with any given fragment rarely hitting more than 4 different targets. Reassuringly, Oliver Barker presented a poster showing that while these hits were slightly biased towards fewer tertiary and quaternary carbons as well as a lower Fsp3, the trend was very modest and probably not statistically significant.
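For anyone curious to check their own library, Fsp3 is simply the fraction of carbons that are sp3-hybridized, and it is one line in RDKit; a minimal sketch (my own illustration assuming RDKit is installed, not code from the poster):

```python
from rdkit import Chem
from rdkit.Chem import rdMolDescriptors

# Fsp3 = (number of sp3-hybridized carbons) / (total number of carbons)
examples = {
    "indole (flat, aromatic)": "c1ccc2c(c1)cc[nH]2",
    "piperidine (three-dimensional)": "C1CCNCC1",
}

for name, smiles in examples.items():
    mol = Chem.MolFromSmiles(smiles)
    print(f"{name}: Fsp3 = {rdMolDescriptors.CalcFractionCSP3(mol):.2f}")
# indole scores 0.00, piperidine 1.00
```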

Teddy’s recent poll on how much diversity readers want in their fragment libraries found that the majority of respondents (60%) wanted maximum diversity, but target-focused libraries can be effective too. Paul Bamborough described how, in addition to a generic fragment screening library, he and his colleagues at GlaxoSmithKline also built a collection of 936 fragments geared for kinases (described here) and, more recently, 1326 fragments targeted to bromodomains. These have yielded much better hit rates than their diverse fragment sets have.

The targeted libraries also provide a nice example of fragment-assisted drug discovery: they were designed based on molecules derived from high-throughput screening, and the data generated by screening the fragments against several bromodomains have in turn informed the design of 25,000 new lead-like molecules for HTS and several billion DNA-encoded molecules.

There was plenty else of note, including some nice fragment-to-lead stories (such as this) and others that should be appearing in the literature soon, but I’ll end here. What were your impressions?

04 March 2013

Poll Results--HAC vs. MW

Our poll asked what denominator people use for ligand efficiency metrics.  The idea for this poll came from these posts at In The Pipeline.  Of the 38 respondents, 8 "don't need no steenkin' metrics".  Of the remaining 30, 27 use heavy atom count, 1 uses both, and only 2 use molecular weight.

So, in terms of everyday atoms, I think people can agree that heavy atom count makes the most sense.  But Derek is still trying to figure out what to do with heavy halogens.  Do halogens need to be treated differently?  My thought is that they don't in the hit generation (HG) stage, but in the lead optimization stage they might.  I have always thought that ligand metrics are most germane to the HG stage and less useful once you are trying to optimize things like PK/PD properties.  If a heavy halogen makes it through hit confirmation and hit expansion and into lead optimization, it is most likely doing something, so why penalize it?
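A quick back-of-the-envelope sketch shows why the denominator matters for heavy halogens (the compound and numbers are hypothetical; BEI, pIC50 divided by MW in kDa, stands in here for a generic MW-based metric):

```python
# Hypothetical fragment with pIC50 = 5.0, before and after swapping
# one aromatic H for Br (+1 heavy atom, but +78.9 Da).
# Numbers are illustrative only.
pic50 = 5.0
parent  = {"hac": 13, "mw": 174.2}
bromide = {"hac": 14, "mw": 253.1}

for name, cpd in (("parent", parent), ("bromide", bromide)):
    le  = 1.37 * pic50 / cpd["hac"]       # HAC-based LE, kcal/mol per HA
    bei = pic50 / (cpd["mw"] / 1000.0)    # MW-based BEI, pIC50 per kDa
    print(f"{name}: LE = {le:.2f}, BEI = {bei:.1f}")
```

The bromide loses about 7% of its HAC-based LE but roughly 30% of its MW-based efficiency, so an MW denominator punishes the heavy halogen long before it has had a chance to prove itself.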

Am I thinking about this too naively? 

01 March 2013

Purifying hydrophilic fragments

Lipophilicity is a topic that comes up periodically. Lipophilic molecules are increasingly viewed as problematic from a drug development standpoint. Even if the correlation studies indicting lipophilicity are not as strong as they appear, at the end of the day we would prefer most of our drugs to be nicely water soluble.

That said, many of the molecules we make are on the greasy side. GDB-17, Jean-Louis Reymond’s recent computational enumeration of small molecules with 17 or fewer heavy atoms, reveals that most potential molecules tend to be much more polar than similarly sized compounds that have actually been made. One likely reason for this is that purifying highly water-soluble molecules is difficult; it’s hard to wash away inorganic reagents, and they often stick to the normal silica gel that chemists use to purify conventional molecules. Reverse-phase HPLC is useful, but can be tedious and low throughput.

In a recent issue of Drug Discovery Today, Andrew Hobbs and Robert Young of GlaxoSmithKline provide practical tips on using reverse-phase flash chromatography as an alternative to HPLC. They report working at scales from milligrams to tens of grams and are able to separate some very polar molecules. There’s a lot of good stuff in this paper on choosing columns, solvents, and loading techniques. A lot of these details get pretty nuanced, so it’s nice to have them in one place. If you’re trying to isolate hydrophilic molecules, definitely check it out.