09 November 2015

Group efficiency

Ligand efficiency (LE) is one of the more controversial topics we cover at Practical Fragments. One critic asserted – incorrectly – that it is mathematically invalid. Another has stated that it is “not even wrong,” because the metric is predicated on standard state conditions and thus "arbitrary". (As he acknowledges, this also applies to the value and even the sign of the Gibbs free energy for a reaction.) A related metric that has received less attention is group efficiency (GE). In a paper just published in ChemMedChem, Chris Abell and colleagues at the University of Cambridge use this to help them optimize pantothenate synthetase (Pts) inhibitors.

Ligand efficiency is defined simply as the free energy of binding divided by the number of non-hydrogen, or “heavy” atoms (often abbreviated as HAC for heavy atom count) in the ligand. (Geek notes: although the binding energy is negative, LE is expressed as a positive number, so LE = - ΔG / HAC. Also, on Practical Fragments, units are assumed to be kcal mol-1 per heavy atom unless otherwise stated.)
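To make the definition concrete, here is a minimal sketch that converts a dissociation constant into a standard free energy and then into LE. It assumes 298.15 K and the conventional 1 M standard concentration; the function names and the example fragment (Kd = 1 mM, 12 heavy atoms) are hypothetical illustrations, not taken from the paper.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal mol^-1 K^-1
T = 298.15            # assumed temperature, K
C_STD = 1.0           # assumed standard concentration, M (the usual convention)


def delta_g(kd_molar: float, temp: float = T, c_std: float = C_STD) -> float:
    """Standard free energy of binding (kcal/mol) from a dissociation constant.

    The logarithm is taken of Kd / C_std, which is dimensionless.
    """
    return R_KCAL * temp * math.log(kd_molar / c_std)


def ligand_efficiency(kd_molar: float, hac: int) -> float:
    """LE = -dG / HAC, in kcal/mol per heavy atom."""
    return -delta_g(kd_molar) / hac


# Hypothetical fragment: Kd = 1 mM, 12 heavy atoms
le = ligand_efficiency(1e-3, 12)
print(f"LE = {le:.2f} kcal/mol per heavy atom")  # LE = 0.34 kcal/mol per heavy atom
```

Note that the standard concentration appears explicitly as a parameter; changing it changes ΔG° and therefore LE, a point that comes up repeatedly in the comments.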

Instead of focusing on a single ligand, group efficiency compares two ligands that differ by the presence or absence of a given group of atoms. To calculate GE, you simply subtract the ΔG values for the two ligands (again flipping the sign so that a gain in affinity gives a positive number) and divide by the number of heavy atoms in the group. For example, if you add a methyl group to your molecule and are lucky enough to get a 100-fold pop in potency, the methyl group has a group efficiency of 2.7 kcal mol-1 per heavy atom, since at 298 K a 100-fold change in Kd corresponds to RT ln 100 ≈ 2.7 kcal mol-1.
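The methyl arithmetic can be checked with a short sketch, assuming 298.15 K. The function name and the parent/elaborated Kd values are hypothetical; only the 100-fold ratio and the single added heavy atom come from the example above. Because each ΔG is RT ln(Kd/C°), the standard concentration cancels in the difference and only the ratio of Kd values matters.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal mol^-1 K^-1
T = 298.15            # assumed temperature, K


def group_efficiency(kd_parent: float, kd_elaborated: float, added_heavy_atoms: int) -> float:
    """GE = -(dG_elaborated - dG_parent) / (heavy atoms in the added group).

    With dG = RT ln(Kd / C_std) for each ligand, C_std cancels,
    so only the ratio of dissociation constants is needed.
    """
    ddg = R_KCAL * T * math.log(kd_elaborated / kd_parent)
    return -ddg / added_heavy_atoms


# A methyl group (1 heavy atom) giving a 100-fold gain in potency
ge = group_efficiency(kd_parent=1e-6, kd_elaborated=1e-8, added_heavy_atoms=1)
print(f"GE = {ge:.1f} kcal/mol per heavy atom")  # GE = 2.7 kcal/mol per heavy atom
```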

The current paper chronicles lead discovery for Pts, a potential target for tuberculosis. Previous screening efforts followed by fragment growing and fragment linking had generated low micromolar and high nanomolar inhibitors. The researchers turned to group efficiency to improve their molecules further.

As expected from ligand deconstruction studies (see for example here, here, and here), different portions of a molecule are likely to have vastly different group efficiencies. Indeed, this turned out to be the case here: the acetate moiety had high group efficiency, whereas the pyridyl moiety had lower group efficiency. Thus, the researchers set out to replace the pyridyl with ten diverse substituents. Happily, one of these, compound 11, improved the dissociation constant of the fully elaborated molecule to 200 nM, as assessed by isothermal titration calorimetry. Compound 11 also showed reasonable enzyme inhibition in a functional assay.

One potential problem with group efficiency is that it assumes the molecules being compared bind in a similar fashion, which is not always a safe assumption. In this case, the researchers obtained a crystal structure of compound 11 bound to the enzyme, which not only revealed that it binds similarly to compound 5, but also suggested that inserting a methylene may improve binding. The resulting compound 20 showed better activity in the inhibition assay, as well as activity against M. tuberculosis in a cell assay (though unfortunately the dissociation constant was not reported).

This paper offers a clear illustration of how group efficiency can be useful for prioritizing which portions of a molecule to change. In some cases, such as the example here, it makes sense to try to replace groups with low group efficiency. On the other hand, the core fragment may bind in a hot spot, and so just a slight tweak can dramatically boost potency. As with lead optimization in general, there are many paths – both to enlightenment and to perdition.

14 comments:

Peter Kenny said...

Hi Dan,

There is a massive difference between ligand efficiency (LE) and group efficiency (GE) that you have not recognized in your blog post and that the authors have not recognized in their article. When you use GE you are scaling ΔΔG° (as opposed to ΔG°) by molecular size, and ΔΔG° values are invariant with respect to standard concentration (as we noted in JCAMD 2014 28:699-710 http://dx.doi.org/10.1007/s10822-014-9757-8 ). I actually used the data in the original group efficiency article (ChemMedChem 2008 3:1179-1180 http://dx.doi.org/10.1002/cmdc.200800132 ) to show how residuals from a plot of -ΔG° against HA quantify the extent to which affinity beats the trend in the data (see slides 18 and 19 in http://www.slideshare.net/pwkenny/ligand-efficiency-metrics-n ). There is good agreement between residuals and GE values except for pyrazole, for which a large value of GE was reported. The value of GE calculated for pyrazole is very sensitive to the value of the zero molecular size limit assumed for ΔG°, and the plot in slide 18 raises questions about the validity of the assumption made in the original GE article. The requirement for reference structures and the inability to accommodate structural changes (e.g. aza-substitution; halogen exchange) represent limitations for the use of GE in analysing SAR. My challenge to anybody using GE would be to plot -ΔG° against HA and look at the residuals. Analyzing data in this manner also provides other information (e.g. different series may differ in their responses to HA) that is potentially useful for lead optimization teams.

You make some assertions in the post that warrant comment, and I'll start with “One critic asserted – incorrectly – that it is mathematically invalid”. In the interests of accuracy, I suggest that you acknowledge that the formula used for LE to 'prove' the mathematical validity of LE was itself mathematically invalid, since the argument of the logarithm is a quantity with units. I'll actually go a bit further and suggest that an erratum in the journal is very much in order. I'm not quite sure what you're getting at when you say, “(As he acknowledges, this also applies to the value and even the sign of the Gibbs free energy for a reaction)”, since 'states the bleeding obvious' would be a lot more accurate than 'acknowledges'. A negative value of ΔG° simply tells us that the Kd is less than the standard concentration. If you find it useful to say that all Kd values below 1 M represent favorable binding then by all means do so. This is also a good point to suggest that you really do need to provide units when you quote LE values. Would you quote values of ΔG°, Kd or IC50 without units?

Peter Kenny said...

"The requirement for reference structures and the inability to accommodate structural changes (e.g. aza-substitution; halogen exchange) represent limitations for the use of GE in analysing SAR" should have been (omissions capitalized):

The requirement for reference structures and the inability to accommodate structural changes (e.g. aza-substitution; halogen exchange) CORRESPONDING TO ZERO NET CHANGE IN HA represent limitations for the use of GE in analysing SAR.

Jonas said...

Hi Dan,
Just wanted to mention that there's another "not even wrong" post on ligand efficiency, and arguments for and against other simple rules. Mainly against ; )
See link in my handle.

Btw, I liked your presentation in Boston the other week. Wanted to ask your opinion on the possibility/usefulness of doing fragment-based design on 'complete' molecules by 'fragmenting them'...but couldn't since my voice was gone. Might be an old topic though.
Keep it up!
Jonas

Dan Erlanson said...

Hi Jonas,

Thanks - I enjoyed your presentation too, as well as your post. I also blogged about Pete's correlation inflation paper and agree that rules (or guidelines) can all too easily be abused. But as you said, "physico-chemical properties are important, and need proper attention." Rules are simply an attempt to formalize this, though they may be flawed in their construction and application.

Regarding your question about fragmenting larger molecules, there is actually a fairly rich history, both in library design (for example here) as well as to improve existing leads (for example here, here, and here).

Hi Pete,

I'm glad we agree on the utility of group efficiency. Regarding your call for units, please see the "geek notes" in the second paragraph of the post. When a community agrees on a unit of measurement, constantly repeating those units is unnecessary, if not pedantic. Thus, most speed limit signs in the US assume miles per hour, even though they don't say so explicitly, while speed limit signs in Germany assume kilometers per hour even though this is not always stated. I have never heard of someone claiming to be confused about the units successfully arguing a speeding ticket.

P.San. said...

In response to Pete, a negative free energy means the reaction will proceed at 25 °C, 1 atm and 1 M. It does not say whether it is favourable or whether it will be observed. For example, the conversion of diamond to graphite has a negative Gibbs free energy under standard-state conditions but is too slow to be observed.

For example, if I mix 1 µM of a compound with a 1 nM affinity and 1 µM of a compound with a 100 nM affinity, the half saturation of the receptor by either ligand has a negative free energy (even after accounting for the entropy of competition), but in realistic terms the receptor will be almost fully occupied by the 1 nM compound.
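The competition example can be checked with a simple equilibrium model. This sketch assumes 1:1 competitive binding and a receptor concentration far below both ligand concentrations (no ligand depletion), neither of which is stated in the comment; the function name is illustrative.

```python
def fractional_occupancy(conc_a, kd_a, conc_b, kd_b):
    """Fraction of receptor bound by ligand A and by ligand B.

    Assumes simple 1:1 competitive binding at equilibrium and a receptor
    concentration far below both ligand concentrations, so free ligand
    is approximately equal to total ligand.
    """
    a = conc_a / kd_a
    b = conc_b / kd_b
    denom = 1.0 + a + b
    return a / denom, b / denom


# 1 uM of a 1 nM ligand competing with 1 uM of a 100 nM ligand
theta_a, theta_b = fractional_occupancy(1e-6, 1e-9, 1e-6, 100e-9)
print(f"1 nM ligand: {theta_a:.3f}, 100 nM ligand: {theta_b:.3f}")
```

Under these assumptions roughly 99% of the receptor carries the 1 nM ligand and about 1% the 100 nM ligand, even though each ligand on its own is present well above its Kd.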

Peter Kenny said...

Hi Dan,

I didn't actually say that GE had utility in my comment and simply made the point that, unlike LE, GE is not thermodynamic nonsense. GE becomes less tractable when you have multiple instances of individual substituents, as is the case in typical lead optimization projects. If the effects of substituents show non-additivity then GE for a substituent depends on what substituents are present at the other positions of the template. Also, GE cannot deal with structural changes like halogen exchange and amide reversal in which the number of non-hydrogen atoms is conserved. There will be some specific situations in which GE may be helpful, but it's not likely to represent a general solution to analyzing SAR. Like activity cliffs, matched molecular pairs (and series) and free energy perturbation, GE fits into a data-analytic framework in which the focus is on structural relationships between compounds, and I've discussed that in:
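Both limitations raised here, context dependence under non-additivity and changes with no net change in heavy-atom count, can be seen in a small sketch. All compounds and -ΔG values are hypothetical, invented purely for illustration: a template with R1 in {H, Cl, Br} and R2 in {H, OMe}.

```python
from typing import Optional

# Hypothetical data: (R1, R2) -> (-dG in kcal/mol, heavy-atom count)
data = {
    ("H", "H"): (5.0, 10),
    ("Cl", "H"): (6.0, 11),
    ("H", "OMe"): (5.8, 12),
    ("Cl", "OMe"): (6.2, 13),
    ("Br", "H"): (6.3, 11),
}


def ge(ref, new) -> Optional[float]:
    """GE for the change ref -> new; None when the heavy-atom count is unchanged."""
    neg_dg_ref, ha_ref = data[ref]
    neg_dg_new, ha_new = data[new]
    d_ha = ha_new - ha_ref
    if d_ha == 0:
        return None  # e.g. Cl -> Br exchange: GE is undefined
    return (neg_dg_new - neg_dg_ref) / d_ha


ge_cl_with_h = ge(("H", "H"), ("Cl", "H"))        # Cl added with R2 = H
ge_cl_with_ome = ge(("H", "OMe"), ("Cl", "OMe"))  # Cl added with R2 = OMe
ge_cl_to_br = ge(("Cl", "H"), ("Br", "H"))        # halogen exchange

print(ge_cl_with_h, round(ge_cl_with_ome, 2), ge_cl_to_br)
```

In this made-up series the chlorine has a GE of 1.0 kcal/mol per heavy atom next to R2 = H but only about 0.4 next to R2 = OMe, and the Cl to Br swap has no defined GE at all.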

http://fbdd-lit.blogspot.com/2015/02/theres-more-to-molecular-design-than.html

Units are very important in science, and quantities should be thought of as numbers multiplied by units. To be blunt, invoking the 'community' and speed limit signs as an excuse for not reporting units is pseudoscientific psychobabble and arm-waving. Presumably this would not be your response to a reviewer of a manuscript who was sufficiently uncouth as to request that units should be given for LE? You may find it instructive to take a look at Box 1 in http://dx.doi.org/10.1038/nrd4163 to see just what a hash can be made of units (sadly, beyond repair by the normal erratum mechanism). On the subject of speed limit signs, the reason that you can't invoke a 'units defense' to evade a speeding ticket is that, in the law books, the speed limits most definitely have units even if the signs don't. As an aside, it's not always such a great idea to use driving analogies in the context of ligand efficiency, as highlighted in this blog post:

http://fbdd-lit.blogspot.com/2015/09/ligand-efficiency-metrics-why-all-fuss.html

Peter Kenny said...

Hello P.San, You might want to think a bit more about your assertion that "a negative free energy means the reaction will proceed at 25 degrees, 1 atm and at 1 M", because it implies that a measured free energy can only be used to make statements about what will happen under those conditions. Solution thermodynamics can be analyzed using mole fractions rather than molar concentration units, although this is not so convenient for making up solutions. If you want to use molar concentration units, you can't define ΔG° for ligand-protein binding without introducing a standard concentration and, once you've done that, you can describe the ligand-protein binding as a function of concentration. If, however, you want to use ΔG° measured at 298 K and 1 atm to predict what will happen at 310 K and 100 atm, then you'll need other measurements (changes in enthalpy, entropy and volume associated with binding). You may find this article useful, and it has a section on LE and fragment linking thermodynamics.
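As an illustration of the temperature part of this argument, the sketch below extrapolates Kd from 298.15 K to 310.15 K given a binding enthalpy. It assumes ΔH° and ΔS° are temperature independent (ΔCp = 0) and ignores pressure, which would additionally require the binding volume change. The Kd of 1 µM and ΔH° of -8 kcal/mol are hypothetical values.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal mol^-1 K^-1
C_STD = 1.0           # standard concentration, M


def kd_at_temperature(kd_ref, t_ref, dh, t_new):
    """Extrapolate Kd (M) from t_ref to t_new (K) given the binding enthalpy dh (kcal/mol).

    Assumes dH and dS are temperature independent (dCp = 0);
    pressure effects, which need the binding volume change, are ignored.
    """
    dg_ref = R_KCAL * t_ref * math.log(kd_ref / C_STD)
    ds = (dh - dg_ref) / t_ref  # kcal mol^-1 K^-1
    dg_new = dh - t_new * ds
    return C_STD * math.exp(dg_new / (R_KCAL * t_new))


# Hypothetical: Kd = 1 uM at 298.15 K, exothermic binding with dH = -8 kcal/mol
kd_37 = kd_at_temperature(1e-6, 298.15, -8.0, 310.15)
print(f"Kd at 310.15 K = {kd_37 * 1e6:.2f} uM")
```

Working through the algebra, C_STD cancels and the result reduces to the integrated van 't Hoff equation, ln(Kd2/Kd1) = (ΔH°/R)(1/T2 - 1/T1). So the extrapolated Kd does not depend on the choice of standard concentration, even though the intermediate ΔG° and ΔS° values do.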

P.San. said...

Pete, that's a bit of a strawman. I didn't argue "that a measured free energy *can only* be used to make statements about what will happen under those conditions".

The argument states that if the free energy is negative the reaction will proceed. Nothing more, nothing less. It doesn't imply thermodynamics can't predict how a reaction will proceed under other conditions. I am well aware of entropy and enthalpy in thermodynamics. Furthermore, from the magnitude of the free energy change (and dH, dS, dCp and thermodynamic volume) we can see how far from the standard state we can change things for the reaction to still occur.

All I was highlighting is that the Gibbs free energy is only a measure of useful work performed by a system under a standard set of conditions. The point is these standard conditions are not arbitrary, IUPAC has defined them. For better or worse we have all agreed to use them. If you find a paper where they haven't been used the fault lies with the authors and reviewers not the standard state.

Peter Kenny said...

Hello P.San, Looks like we need to look at this from a different angle, and I'll suggest that we not worry about 1 atm and 25 °C because these represent a separate issue. In the context of protein-ligand association, the standard free energy of binding (ΔG°) is a function of temperature, pressure and the standard concentration (C°), which is taken by convention to be 1 M. The fact that the 1 M standard concentration is endorsed by IUPAC doesn't make it less arbitrary (or more physical), and you might wish to think about what a 1 M solution of a typical protein would look like to convince yourself of this. Given that ΔG° is usually determined from Kd, one could argue that you don't need to invoke ΔG° to describe the behavior of the protein-ligand association as a function of concentration, since Kd provides the necessary information. What you can do with thermodynamic measurements (volume, enthalpy and heat capacity) is calculate the effects of temperature and pressure on Kd so that you can predict the concentration response of the system at other temperatures and pressures.

Thermodynamics aside, you still need to know the units of Kd in order to know what it means for the association of your protein and ligand. Whether you work with Kd or with ΔG°, you will need a concentration unit or standard concentration in order to specify the system (unless you want to use mole fractions). One very important principle in science is that a valid perception of a system must be invariant with respect to the units (or reference states) used to describe it. Unfortunately this is not the case for LE, and that's why I assert that LE is 'not even wrong'.
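The invariance argument can be made concrete by recomputing LE and GE with two different standard concentrations. This is a minimal sketch with two hypothetical ligands (A: Kd = 1 mM, 10 heavy atoms; B: Kd = 10 nM, 30 heavy atoms), assuming 298.15 K; the 10 mM alternative standard concentration is an arbitrary choice for illustration.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal mol^-1 K^-1
T = 298.15            # assumed temperature, K

# Hypothetical ligands: name -> (Kd in M, heavy-atom count)
ligands = {"A": (1e-3, 10), "B": (1e-8, 30)}


def le(kd, hac, c_std):
    """LE = -dG / HAC with dG = RT ln(Kd / c_std)."""
    return -R_KCAL * T * math.log(kd / c_std) / hac


def ge(kd_small, hac_small, kd_large, hac_large, c_std):
    """GE for going from the smaller ligand to the larger one."""
    dg_small = R_KCAL * T * math.log(kd_small / c_std)
    dg_large = R_KCAL * T * math.log(kd_large / c_std)
    return -(dg_large - dg_small) / (hac_large - hac_small)


for c_std in (1.0, 1e-2):  # 1 M (the convention) vs an alternative 10 mM
    le_a = le(*ligands["A"], c_std)
    le_b = le(*ligands["B"], c_std)
    ge_ab = ge(*ligands["A"], *ligands["B"], c_std)
    print(f"C_std = {c_std:g} M: LE(A) = {le_a:.2f}, LE(B) = {le_b:.2f}, GE(A->B) = {ge_ab:.3f}")
```

With C° = 1 M, ligand A has the higher LE; with C° = 10 mM, ligand B does. The GE for going from A to B is identical in both cases because the standard concentration cancels in ΔΔG°.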

P.San. said...

Pete, if we are going down that road, all units are arbitrary, as we have assigned values to them based on convenience. It's a pointless discussion in terms of optimizing a fragment.

Thermodynamics has a much bigger role in drug discovery than just predicting 'the concentration response of the system at other temperatures and pressures.' Look at ITC, thermal shift assays or even basic enzymology to understand the wide-reaching uses of thermodynamic parameters and principles.

I've had a quick simulation of your correlation inflation problem tonight and by transforming the equations used very slightly it can be removed by scaling for the standard state used as part of the calculation. Even if people were to do this, it would make very little difference to the med chem decisions.

I honestly don't think ligand efficiency is the end of rational thought as you present it. Worse still, I imagine people will still use it to develop fragments, which, shock horror, may make it into the clinic. Imagine the scandal if fragments which had ligand efficiency used in their development were put into man!

Peter Kenny said...

P.San, Could you be more specific (particularly about the scaling) about what you mean by:

“I've had a quick simulation of your correlation inflation problem tonight and by transforming the equations used very slightly it can be removed by scaling for the standard state used as part of the calculation. Even if people were to do this, it would make very little difference to the med chem decisions”

All I’m saying is that it is not valid to invoke thermodynamics in support of the 1 M concentration unit that is built into the definition of LE. Sometimes this choice of unit will not be a problem and sometimes it will. There are ways (e.g. by using GE) in which one can analyze the response of affinity/potency/activity to molecular size without having to assume that 1 M has a special physicochemical significance.

P.San. said...

I can't be bothered to write out a full proof for you in a comments box I'm afraid, but it isn't hard and I'm sure if you wanted to linearize and scale it you could too. If you are really interested we can take this offline to discuss.

I too prefer group efficiency as applied to a series for development, but not through thermodynamic rigour. Instead, I feel it makes for better discussion about the exact impact of each group, although I'm not sure whether it would account for changes to electronics distal to the site modified. For that, intrinsic thermodynamic measurements remain king but are too difficult practically to apply to an entire development program.