13 July 2015

Fragments vs BTK: metrics in action

Sometimes the discussions over metrics, such as ligand efficiency, can devolve into exegesis: people get so worked up over details that they forget the big picture. A recent paper in J. Med. Chem. by Chris Smith and (former) colleagues at Takeda shows how metrics can be used productively in a fragment-to-lead program.

The researchers were interested in developing an inhibitor of Bruton’s Tyrosine Kinase (BTK) as a potential treatment for rheumatoid arthritis. This is the target of the approved anti-cancer drug ibrutinib, but ibrutinib is a covalent inhibitor, and the Takeda researchers were presumably concerned about the potential for toxicities to arise in a chronic, non-lethal indication. Many of the reported non-covalent BTK inhibitors are large and lipophilic, with consequently suboptimal pharmacokinetic properties. Thus, the team set out to design molecules with MW < 380 Da, < 29 non-hydrogen atoms (heavy atoms, or HA), and clogP ≤ 3.

The first step was a functional screen of Takeda's library of 11,098 fragments, all with 11-19 HA, comfortably within the bounds of generally accepted fragment space. At 200 µM, 4.6% of the molecules gave at least 40% inhibition. Hits that confirmed by STD NMR were soaked into crystals of BTK, ultimately yielding 20 structures. Fragment 2 was chosen because of its high ligand efficiency, novelty, and the availability of suitable growth vectors.
Close examination of the structure suggested a fragment-growing approach. Throughout the process, the researchers kept a critical eye on molecular weight and lipophilicity. This effort led through a series of analogs to compound 11, with only 24 heavy atoms and clogP = 1.7. This molecule is potent in biochemical and cell-based assays and has excellent ligand efficiency as well as LLE (LipE). Moreover, it has good pharmacokinetic properties in mice, rats, and dogs, with measured oral bioavailability > 70% in all three species. Finally, compound 11 shows efficacy in a rat model of arthritis when dosed orally once per day.
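
For readers less familiar with the two metrics mentioned here, the commonly used definitions can be sketched as follows. The factor of 1.37 assumes a temperature near 298 K, IC50 used as a surrogate for the dissociation constant, and a 1 M standard state; these conventions are assumptions of this sketch and are not taken from the paper.

```latex
\mathrm{LE} = -\frac{\Delta G}{N_{\mathrm{HA}}}
            \approx \frac{1.37 \times \mathrm{pIC_{50}}}{N_{\mathrm{HA}}}
            \ \text{kcal mol}^{-1}\ \text{per heavy atom},
\qquad
\mathrm{LLE} = \mathrm{pIC_{50}} - \mathrm{clogP}
```

As an illustration only, a molecule with 24 heavy atoms and clogP = 1.7 (the values reported for compound 11) and a hypothetical pIC50 of 8 (not a value from the paper) would have LE ≈ 0.46 and LLE = 6.3.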

Although compound 11 is selective over the closely related kinase LCK, unfortunately it is a double digit nanomolar inhibitor of oncology-related kinases such as TNK2, Aurora B, and SRC, which would probably be unacceptable in an arthritis drug. Nonetheless, this study is a lovely example of fragment-growing guided by a strict commitment to keeping molecular obesity at bay.


Peter Kenny said...

Hi Dan, They’ve done a good job in optimizing the fragment hit but I would challenge your assertion that this study “shows how metrics can be used productively in a fragment-to-lead program” on the grounds that they’ve not actually used the metrics. Have a look at the article and ask yourself how the outcome would have differed had there been no reference to ligand efficiency metrics. It’s worth remembering that use of ligand efficiency metrics is a subset of control of lipophilicity and molecular size. That the authors have achieved the latter does not imply that they have done the former. They set lipophilicity and molecular size limits at the start of the optimization process and have delivered a compound with acceptable potency within these limits (which have been ‘translated’ to values of LE and LLE). Lord Rutherford might have described the ‘use’ of ligand efficiency metrics in this article as ‘philatelic’.

One might criticize the optimized compound for having too many aromatic rings and a value of Fsp3 which is far too low and surely it is only a matter of time before the authors are denounced in the Stevenage edition of Pravda. However, that is one auto-da-fe that I would be reluctant to light and I must confess to the Inquisition of being reminded, in a manner akin to Pavlov’s canine diners, of Macbeth’s “sound and fury” whenever encountering an article on 3D-ness. Another criticism that might be made of the optimized compound is that it has an excessive number of hydrogen bond donors. Some ‘experts’ assert that three is an absolute maximum although assertions often become less strident in the face of a request for evidence.

There are a number of things that I found interesting about the optimized compound and it would have been interesting to see how its measured logP (or that of the fragment) compared with the calculated values. I’m guessing that the methyl group of compound 11 may influence crystal packing and it would have been interesting to compare its measured aqueous solubility with that of 10. Although 11 has five hydrogen bond donors there are a number of factors that attenuate the hydrogen bond donor potential of 11.

Unknown said...

Hi Peter and Dan, Thanks for your comments and discussion on the paper. I'll take this opportunity to outline my position on the use of LE.

Over the years I have found LE a useful concept to help bring a team together on why to work on low affinity (µM-mM), low molecular weight inhibitors such as fragment 2 (LE 0.53) as opposed to alternative starting points that are nM, higher molecular weight inhibitors with LE values in the 0.2-0.25 range. The argument is that the protein-ligand interactions of the smaller, less potent, higher-LE compounds are of higher quality than those of the larger, more potent, lower-LE compounds. In my experience, starting a lead optimization program with compounds that make high-quality interactions with the protein is easier than the converse. For me the discussion on the validity of LE is an interesting curiosity, but any position advocating that it is not useful is perplexing to me, as my experience indicates otherwise.

Returning to the paper, the main use of LE was to select a uM compound for optimization over the many already published nM Btk inhibitors for the reason mentioned above.

Peter Kenny said...

Hi Chris, Thanks for joining the online discussion. I do like the article and I should point out that my criticism is of LE (the metric) and not LE (the concept). We make assumptions when using LE (the metric) and are misleading ourselves when these assumptions don’t hold. One might think that points on a straight line plot of pIC50 against heavy atom count represent compounds of equal ligand efficiency. Turns out that they don’t unless the line intersects the pIC50 axis at zero and this is the basis of Mike Schultz’s criticism of LE (the metric). My criticism of LE (the metric) is that ranking of compounds varies with the (arbitrary) concentration used to define the standard state for the Gibbs free energy and therefore LE (the metric) is thermodynamic nonsense (I use the term Voodoo Thermodynamics although Pauli might simply have said, “not even wrong”). Alternatively you can think in terms of your perception of the system changing with the concentration units of IC50.

I have a recent talk (Ligand efficiency: nice concept shame about the metrics http://www.slideshare.net/pwkenny/ligand-efficiency-metrics-n ) which discusses this. Slides 11 and 13 are particularly relevant to what I’ve been saying here and the journal article version is linked there as well.

One way of expressing LE (the concept) is to say that the acceptability threshold for screening hits is a monotonically increasing function of molecular size. LE (the metric) requires that the function be a straight line with zero intercept while LE (the concept) says that it’s OK for the line not to go through the origin. In fact LE (the concept) says that the function doesn’t even have to be a straight line. When prioritizing screening hits in a narrow molecular size range it can be very difficult to tell whether LE (the metric) or potency has been used to assess the hits.
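
The standard-state argument made in this comment thread can be checked with a few lines of arithmetic. The sketch below is a minimal illustration, assuming IC50 is used in place of a dissociation constant and a temperature of 298.15 K; the two compounds and the function names are hypothetical and are not taken from the paper or the talk.

```python
import math

R_KCAL = 1.987e-3   # gas constant in kcal/(mol*K)
TEMPERATURE = 298.15  # assumed temperature in K


def ligand_efficiency(ic50_molar, heavy_atoms, standard_conc_molar=1.0):
    """LE = -RT ln(IC50 / C0) / HA, with IC50 standing in for Kd (assumption)."""
    delta_g = R_KCAL * TEMPERATURE * math.log(ic50_molar / standard_conc_molar)
    return -delta_g / heavy_atoms


# Hypothetical compounds: name -> (IC50 in mol/L, heavy atom count)
compounds = {
    "fragment": (1e-4, 10),  # 100 uM, 10 heavy atoms
    "lead": (1e-7, 25),      # 100 nM, 25 heavy atoms
}


def rank(standard_conc_molar):
    """Return compound names sorted from highest to lowest LE."""
    return sorted(
        compounds,
        key=lambda name: ligand_efficiency(*compounds[name], standard_conc_molar),
        reverse=True,
    )


for c0 in (1.0, 1e-6):  # 1 M versus 1 uM standard state
    print(f"C0 = {c0:g} M -> ranking: {rank(c0)}")
```

With a 1 M standard state the hypothetical fragment ranks first; with a 1 µM standard state the order reverses, even though neither the potencies nor the heavy atom counts have changed.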

Dan Erlanson said...

Pete, your distinction between LE (the concept) and LE (the metric) is reminiscent of Plato's Allegory of the Cave. Since you accept that LE (the concept) is useful, and since you have been one of the most trenchant critics of LE (the metric), perhaps you will rise to the friendly challenge of providing a mathematical formula that better approximates the Platonic Form we both value.

You've taken a first step with your statement that "the acceptability threshold for screening hits is a monotonically increasing function of molecular size," but this is a qualitative rather than a quantitative statement.

Without a mathematical definition, one ends up needing to resort to Justice Potter Stewart's less actionable "I know it when I see it".

Peter Kenny said...

Hi Dan, I don’t think we have to go back as far as ancient Greek philosophers (who generally preferred navel-gazing to the uncouth business of performing experiments). Let’s try to remember how FBDD was before LE arrived on the scene in 2004. Back then, the conceptual basis (or hypothesis if you prefer) of FBDD was that a low-affinity, low-MW compound could be as good a starting point for optimization as a higher-affinity, higher-MW compound and I would equate LE (the concept) with this conceptual basis. One can try to put this conceptual basis on a more mathematical footing by suggesting that there is a function of activity (or affinity) and molecular size such that compounds with the same value of the function should be treated as ‘equivalent’. We can conjecture about the characteristics that an ‘appropriate’ function might have and that’s where ‘acceptability threshold for screening hits is a monotonically increasing function of molecular size’ fits in. This may seem like arm-waving because that’s exactly what it is (although probably not to the extent of the flailing of upper limbs required for SEEnthalpy). There are an infinite number of functions that we might define and LE (the metric) is just one of these. One solution (that ancient Greek philosophers would consider very uncultured) to the problem is to use the observed data to inform your choice of function, and least squares fitting of activity (or affinity) to molecular size is one way of doing this.

My criticism of LE (the metric) is not that it is necessarily ‘wrong’ but that it is arbitrary. If a straight-line fit to pIC50 versus HA data actually has a pIC50 intercept of zero, then LE (the metric) becomes a meaningful measure of what we think we understand to be ligand efficiency (both concept and metric) for that data set. However, the burden of proof is still on the person analyzing the data to demonstrate that assumptions made in the analysis are actually valid.
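
The data-driven alternative described in this comment can be sketched in a few lines. The example below is a minimal illustration, assuming an ordinary least squares fit of pIC50 against heavy atom count; the data points and the `fit_line` helper are hypothetical and chosen so the arithmetic is easy to follow.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data lying exactly on the line pIC50 = 2 + 0.2 * HA
heavy_atoms = [10, 15, 20, 25]
pic50 = [4.0, 5.0, 6.0, 7.0]

slope, intercept = fit_line(heavy_atoms, pic50)

# Residuals from the fitted line: zero for every compound in this data set,
# so the fit treats all four compounds as 'equivalent'.
residuals = [p - (intercept + slope * h) for p, h in zip(pic50, heavy_atoms)]

# pIC50 per heavy atom (proportional to LE with a 1 M standard state)
le_like = [p / h for p, h in zip(pic50, heavy_atoms)]

print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
print("residuals:", [round(r, 6) for r in residuals])
print("pIC50/HA: ", [round(v, 3) for v in le_like])
```

Because the fitted intercept here is 2 rather than 0, the four compounds sit on one line yet have different values of pIC50/HA. Checking the intercept of such a fit is one way to test whether the zero-intercept assumption behind LE (the metric) holds for a given data set.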