The ten human fatty acid-binding
proteins (FABPs) shuttle lipids around cells. As we noted several years ago,
FABP4 and FABP5 are potential drug targets for diabetes and atherosclerosis,
but selectivity over FABP3 is needed to avoid cardiotoxicity. Markus Rudolph
and colleagues at Hoffmann-La Roche describe progress towards selective
molecules in three consecutive open-access Acta Cryst. D papers. Perhaps
more importantly, they gift a massive, high-quality data set to the scientific
community – along with some important caveats about data for protein-ligand structures.
The first paper focuses on
purification and NMR characterization of FABP4. Recombinant FABPs are normally
expressed in E. coli, and they always contain natural fatty acids that copurify
with the protein. This can complicate ligand binding studies, since the endogenous
fatty acids act as competitors. Indeed, the researchers highlight two
structures in the Protein Data Bank (PDB) whose supposed ligands are probably
fatty acids.
To solve this problem, the
researchers denature FABP4, separate the fatty acid, and then refold the
protein. This truly apo form of the protein was studied by NMR, revealing that
the protein becomes more rigid upon ligand binding.
The second paper is of more
general interest. It reports a set of 229 crystal structures of various FABPs,
of which 216 have a bound ligand. Of these, 75 have associated IC50
values for at least one FABP, and 50 compounds have IC50 values reported
for FABP3, FABP4, and FABP5. Importantly, the structures are solved to
high resolution, with a median resolution of 1.12 Å. Two crystal forms are particularly
suitable for soaking, and compounds were typically soaked at 60 mM in 30% DMSO
overnight.
All the crystal structures are
deposited in the PDB, and all the binding data are provided in the supporting
information. Given FABPs’ predilection for carboxylic acids, the ligands
contain a variety of carboxylic acid mimetics. This wealth of high-quality data
should be valuable for constructing machine-learning binding models, and the
researchers conclude by calling “on other industrial organizations to also make
their legacy data available such that prediction models with broader
applicability may be developed more quickly.”
But it was the third paper that
really caught my attention: the researchers summarized it as “what is written on the
bottle is not what is in the crystal.” In fact, of the 216 ligands reported, a
whopping 33 (15%) do not match the compound registered. These are grouped into several
categories and described in detail.
Human error is the simplest to explain:
the researchers show an example where a 1,2-benzoxazole was registered as a
1,3-benzoxazole. Because the molecules have the same molecular weight, mass
spectrometry could not distinguish them. Similarly, the researchers find
several cases where the wrong enantiomer or diastereomer was registered. In
another case, a racemic mixture led to a single enantiomer bound to FABP4, with
the protein acting as a “chiral sponge.”
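To see why mass spectrometry is blind to the benzoxazole mix-up, note that the unsubstituted 1,2- and 1,3-benzoxazole ring systems share the molecular formula C7H5NO. The short Python sketch below computes a monoisotopic mass from a formula; the helper name `monoisotopic_mass` is made up for illustration, and only the parent rings are modeled, not the actual substituted Roche compounds.

```python
# Monoisotopic masses (Da) of the most abundant isotope of each element.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}


def monoisotopic_mass(formula: dict[str, int]) -> float:
    """Sum isotope masses for a formula given as {element: count}."""
    return sum(MONOISOTOPIC[element] * count for element, count in formula.items())


# Assumption for illustration: parent ring systems only, both C7H5NO.
benzoxazole_1_2 = {"C": 7, "H": 5, "N": 1, "O": 1}
benzoxazole_1_3 = {"C": 7, "H": 5, "N": 1, "O": 1}

mass_1_2 = monoisotopic_mass(benzoxazole_1_2)
mass_1_3 = monoisotopic_mass(benzoxazole_1_3)

# Constitutional isomers have the same formula, so the masses are identical.
print(f"1,2-isomer: {mass_1_2:.4f} Da")
print(f"1,3-isomer: {mass_1_3:.4f} Da")
print(f"distinguishable by mass alone: {mass_1_2 != mass_1_3}")
```

The same holds for enantiomers and diastereomers: any pair of isomers gives an identical intact mass, so telling them apart needs an orthogonal method such as NMR or, as here, a high-resolution crystal structure.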
Other cases are more unusual, and
include ring closing, ring opening, acyl shifts, hydrolysis, and instances of
ligand decomposition or incomplete reactions. The researchers note that small amounts
of impurities could be particularly problematic at the high ligand
concentrations used for soaking; they calculate that just 0.06% impurity would
be equivalent to the total amount of FABP in a crystal. Some fragment screens
are done at even higher concentrations, further increasing the risk of enriching
impurities.
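The arithmetic behind the impurity concern can be sketched in a few lines of Python. The 60 mM soak concentration and the 0.06% impurity level come from the figures quoted above; the function name and the higher soak concentrations are hypothetical and included only to show the trend.

```python
def impurity_concentration_um(soak_mm: float, impurity_percent: float) -> float:
    """Return the impurity concentration in µM for a soak concentration in mM."""
    return soak_mm * 1000.0 * (impurity_percent / 100.0)


# Figures quoted in the post: 60 mM soak, 0.06% impurity.
roche_case = impurity_concentration_um(60.0, 0.06)
print(f"60 mM soak, 0.06% impurity: {roche_case:.0f} µM")

# Hypothetical higher soak concentrations, as might be used in a fragment screen.
for soak_mm in (100.0, 200.0):
    conc = impurity_concentration_um(soak_mm, 0.06)
    print(f"{soak_mm:.0f} mM soak, 0.06% impurity: {conc:.0f} µM")
```

At 60 mM, a 0.06% contaminant is present at roughly 36 µM, so a minor component that binds much more tightly than the registered compound has plenty of material available to occupy the site.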
A 15% rate of unexpected ligands is
comparable to the numbers we blogged about here, but those were commercial
libraries, whereas this set is from Roche, which likely has better internal quality
control. One factor that led to the recognition of the problem is the high
resolution, at which a single-atom change can be readily seen. Another is the
buried nature of the ligands; ligands bound on the surface of a protein may
have more dynamically disordered bits, which would be difficult to distinguish
from missing moieties caused by decomposition.
Indeed, the researchers examine
two other proteins, PDE10 and ATX, for which they have also released ~200 ligand-bound
structures but at lower average resolutions. There are some unexpected ligands
for these proteins too, but many fewer than for the FABPs – or perhaps we just
can’t observe some of them.
As we noted back in 2014, up to a
quarter of ligand-containing crystal structures in the PDB may contain serious errors, and the researchers cite a study suggesting that 12% are “just bad.”
These could have obvious negative consequences for training computational
models, and the researchers call on the community to set standards to create a
rigorously chosen training set. Perhaps this discussion could be held in
parallel with the discussion on how to house fragment screening data, which we wrote
about last month.
1 comment:
A nice series of papers. As with any compound collection, rigorous QC pays off.
Even though I'm tempted to say that NMR should have picked up the difference between 1,2- and 1,3-benzoxazole, I admit that it might be easy to miss in a routine NMR QC assay.