18 August 2025

Hundreds of crystallographic ligands for FABP4 – many not as expected

The ten human fatty-acid binding proteins (FABPs) shuttle lipids around cells. As we noted several years ago, FABP4 and FABP5 are potential drug targets for diabetes and atherosclerosis, but selectivity over FABP3 is needed to avoid cardiotoxicity. Markus Rudolph and colleagues at Hoffmann-La Roche describe progress towards selective molecules in three consecutive open-access Acta Cryst. D papers. Perhaps more importantly, they gift a massive high-quality data set to the scientific community – along with some important caveats about data for protein-ligand structures.
 
The first paper focuses on purification and NMR characterization of FABP4. Recombinant FABPs are normally expressed in E. coli, and they always contain natural fatty acids that copurify with the protein. This can complicate ligand binding studies, since the endogenous fatty acids act as competitors. Indeed, the researchers highlight two structures in the Protein Data Bank (PDB) whose supposed ligands are probably fatty acids.
 
To solve this problem, the researchers denature FABP4, separate the fatty acid, and then refold the protein. They then studied this truly apo form of the protein by NMR, which revealed that the protein becomes more rigid upon ligand binding.
 
The second paper is of more general interest. It reports a set of 229 crystal structures of various FABPs, of which 216 have a bound ligand. Of these, 75 have associated IC50 values for at least one FABP, and 50 compounds have IC50 values reported for FABP3, FABP4, and FABP5. Importantly, the structures are solved to high resolution, with a median of 1.12 Å. Two crystal forms are particularly suitable for soaking, and compounds were typically soaked at 60 mM in 30% DMSO overnight.
 
All the crystal structures are deposited in the PDB, and all the binding data are provided in the supporting information. Given FABPs’ predilection for carboxylic acids, the ligands contain a variety of carboxylic acid mimetics. This wealth of high-quality data should be valuable for constructing machine-learning binding models, and the researchers conclude by calling “on other industrial organizations to also make their legacy data available such that prediction models with broader applicability may be developed more quickly.”
 
But it was the third paper that really caught my attention: the researchers summarized it as “what is written on the bottle is not what is in the crystal.” In fact, of the 216 ligands reported, a whopping 33 (15%) do not match the compound registered. These are grouped into several categories and described in detail.
 
Human error is the simplest to explain: the researchers show an example where a 1,2-benzoxazole was registered as a 1,3-benzoxazole. Because the molecules have the same molecular weight, mass spectrometry could not distinguish them. Similarly, the researchers find several cases where the wrong enantiomer or diastereomer was registered. In another case, a racemic mixture led to a single enantiomer bound to FABP4, with the protein acting as a “chiral sponge.”
 
Other cases are more unusual, and include ring closing, ring opening, acyl shifts, hydrolysis, and instances of ligand decomposition or incomplete reactions. The researchers note that small amounts of impurities could be particularly problematic at the high ligand concentrations used for soaking; they calculate that just 0.06% impurity would be equivalent to the total amount of FABP in a crystal. Some fragment screens are done at even higher concentrations, further increasing the risk of enriching impurities.
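
To get a feel for why trace impurities matter at soaking concentrations, here is a minimal back-of-the-envelope sketch. It is not the researchers' calculation: it assumes the 0.06% figure can be treated as a mole fraction of the ligand, and the function names and the 1 µL drop volume are hypothetical choices for illustration only.

```python
def impurity_concentration_mM(ligand_mM: float, impurity_fraction: float) -> float:
    """Concentration (mM) of an impurity present at a given mole fraction
    of the nominal ligand concentration. Treating the impurity level as a
    mole fraction is an assumption made for this sketch."""
    return ligand_mM * impurity_fraction


def impurity_nmol(ligand_mM: float, impurity_fraction: float, drop_volume_uL: float) -> float:
    """Amount of impurity (nmol) in a soaking drop of the given volume.
    Units: mM * uL = nmol."""
    return impurity_concentration_mM(ligand_mM, impurity_fraction) * drop_volume_uL


if __name__ == "__main__":
    soak_mM = 60.0              # soaking concentration quoted in the post
    impurity_fraction = 0.0006  # 0.06%, the threshold quoted in the post
    drop_volume_uL = 1.0        # hypothetical drop volume, not from the papers

    conc_uM = impurity_concentration_mM(soak_mM, impurity_fraction) * 1000.0
    amount_pmol = impurity_nmol(soak_mM, impurity_fraction, drop_volume_uL) * 1000.0
    print(f"Impurity concentration: {conc_uM:.1f} uM")
    print(f"Impurity in a {drop_volume_uL} uL drop: {amount_pmol:.1f} pmol")
```

Under these assumptions, a 0.06% impurity in a 60 mM soak is present at about 36 µM, which is in the range where a tight binder could plausibly outcompete the intended ligand for the protein in a crystal.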
 
A 15% rate of unexpected ligands is comparable to the numbers we have blogged about previously, but those were commercial libraries, whereas this set is from Roche, which likely has better internal quality control. One factor that led to the recognition of the problem is the high resolution, at which a single-atom change could be readily seen. Another is the buried nature of the ligands; ligands bound on the surface of a protein may have more dynamically disordered bits, which would be difficult to distinguish from missing moieties caused by decomposition.
 
Indeed, the researchers examine two other proteins, PDE10 and ATX, for which they have also released ~200 ligand-bound structures but at lower average resolutions. There are some unexpected ligands for these proteins too, but many fewer than for the FABPs – or perhaps we just can’t observe some of them.
 
As we noted back in 2014, up to a quarter of ligand-containing crystal structures in the PDB may contain serious errors, and the researchers cite a study suggesting that 12% are “just bad.” These could have obvious negative consequences for training computational models, and the researchers call on the community to set standards to create a rigorously chosen training set. Perhaps this discussion could be held in parallel with the discussion on how to house fragment screening data, which we wrote about last month.

1 comment:

Reto Walser said...

A nice series of papers. As with any compound collection, rigorous QC pays off.
Even though I'm tempted to say that NMR should have picked up the difference between 1,2- and 1,3-benzoxazole, I admit that it might be easy to miss in a routine NMR QC assay.