Among the many methods to find
fragments, only two – X-ray crystallography and protein-observed NMR – can routinely
provide detailed structural information. Indeed, the first SAR by NMR paper arguably
launched the field of fragment-based lead discovery nearly a quarter century
ago. However, whereas crystallography has steadily increased in popularity,
protein-observed NMR has lagged. A new paper in Comp. Struct. Biotech. J.
by Grzegorz Popowicz and collaborators at Helmholtz Zentrum München, Technical
University of Munich, and the ETH seeks to change this.

Two-dimensional NMR techniques,
such as ¹H-¹⁵N HSQC, produce plots with
the chemical shift of the proton on one axis and the chemical shift of the
nitrogen on the other. Different amide groups in a protein give peaks at different chemical
shifts, and these peaks can change in position or intensity when a ligand binds. Ideally
these chemical shift perturbations (CSPs) can be used to tell exactly where on the
protein a ligand binds, but even unassigned perturbations can give qualitative information
on whether or not the protein is interacting with a fragment.
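
For readers who have not worked with CSPs, here is a minimal sketch of how a combined perturbation is commonly calculated once peaks have been picked and matched between the free and ligand-bound spectra. It is not the method used by the software discussed in this post. The function names, the 0.14 scaling factor for the nitrogen dimension, the 0.05 ppm cutoff, and the peak values are all illustrative assumptions.

```python
import math

# Assumed weighting for the 15N dimension. Nitrogen shifts span a much wider
# ppm range than proton shifts, so they are usually scaled down; 0.14 is one
# commonly quoted choice, not a universal constant.
N_SCALE = 0.14


def combined_csp(delta_h_ppm, delta_n_ppm, n_scale=N_SCALE):
    """Combine the 1H and 15N shift changes of one peak into a single number."""
    return math.sqrt(delta_h_ppm ** 2 + (n_scale * delta_n_ppm) ** 2)


def perturbed_peaks(free, bound, threshold_ppm=0.05):
    """Return {label: CSP} for peaks that move by at least threshold_ppm.

    free and bound map a peak label to a (1H ppm, 15N ppm) tuple.
    Peaks missing from the bound spectrum are skipped here, although a
    disappearing peak can itself be a sign of binding.
    """
    hits = {}
    for label, (h_free, n_free) in free.items():
        if label not in bound:
            continue
        h_bound, n_bound = bound[label]
        csp = combined_csp(h_bound - h_free, n_bound - n_free)
        if csp >= threshold_ppm:
            hits[label] = csp
    return hits


if __name__ == "__main__":
    # Hypothetical peak lists, made up for illustration only.
    free = {"G10": (8.30, 110.0), "A25": (7.95, 123.4)}
    bound = {"G10": (8.38, 110.5), "A25": (7.95, 123.4)}
    for label, csp in perturbed_peaks(free, bound).items():
        print(f"{label}: {csp:.3f} ppm")
```

The labels only matter if the spectrum has been assigned; with unassigned peaks the same calculation still reports that something moved, just not where on the protein.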

Unfortunately, analyzing hundreds
of two-dimensional spectra is a tedious manual process; think of spending
several hours playing Where’s Wally with blobs instead of people. And with only
two colors. Thus, the process is subject to error and human bias. To make life
easier for NMR spectroscopists, and to make analysis more objective, the
researchers developed an automated software package called the CSP Analyzer.

The process started with 1611
spectra taken from fragment screens against four different proteins; 176 of these
spectra showed a bound ligand. From the full collection, a training set of 32 actives
along with 68 inactive or noisy spectra was assembled. These training spectra were fed into a
machine learning algorithm similar to those used for computer image processing.
Building the model required quite a bit of tweaking; because inactives
outnumbered actives, a simple algorithm could score well just by erring toward false
negatives rather than false positives. However, when looking for a fragment needle in
a haystack of spectra, you really don’t want to miss anything useful, and the
researchers used strategies to minimize this problem. In the end CSP
Analyzer performed quite well, with an accuracy of 87% across the entire data
set. Importantly, while it flagged 10.3% of spectra as false positives, it
missed only 3.1% of spectra as false negatives.
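
To see why accuracy alone can mislead on an imbalanced data set, here is a small sketch using the numbers quoted above. It assumes the 10.3% and 3.1% figures are fractions of all 1611 spectra, which is my reading rather than something stated explicitly. The class-weighting helper shows one generic way to penalize missed actives more heavily; it is a common heuristic, not necessarily one of the strategies the researchers used.

```python
TOTAL_SPECTRA = 1611
ACTIVE_SPECTRA = 176
FALSE_POSITIVE_FRACTION = 0.103  # assumed to be a fraction of all spectra
FALSE_NEGATIVE_FRACTION = 0.031  # assumed to be a fraction of all spectra


def accuracy_from_errors(false_pos_fraction, false_neg_fraction):
    """Accuracy is whatever remains after both kinds of error are removed."""
    return 1.0 - false_pos_fraction - false_neg_fraction


def always_inactive_accuracy(total, actives):
    """Accuracy of a useless classifier that calls every spectrum inactive."""
    return (total - actives) / total


def balanced_class_weights(counts):
    """Weight each class inversely to its size: total / (n_classes * n_class)."""
    total = sum(counts.values())
    return {label: total / (len(counts) * n) for label, n in counts.items()}


if __name__ == "__main__":
    reported = accuracy_from_errors(FALSE_POSITIVE_FRACTION, FALSE_NEGATIVE_FRACTION)
    naive = always_inactive_accuracy(TOTAL_SPECTRA, ACTIVE_SPECTRA)
    print(f"accuracy implied by the error rates: {reported:.1%}")
    print(f"accuracy of always guessing inactive: {naive:.1%}")
    # Training-set composition from the post: 32 actives, 68 inactive or noisy.
    print(balanced_class_weights({"active": 32, "inactive": 68}))
```

Under that assumption the two error rates add up to the reported accuracy, while a classifier that never finds a single hit would score slightly higher. That is exactly the trap described above, and why the split between false positives and false negatives matters more than the headline number.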

Teddy would often end his posts
by asking whether a new technique was practical. I’m no NMR spectroscopist, so
I’ll leave it to readers to weigh in with their opinions. Happily, the software is
freely available, so you can download and try it yourself. Moreover, the researchers
have ambitious future plans, such as extending CSP Analyzer to other types of
NMR experiments and inputs. The rise of the machines continues, in a benevolent
fashion. At least thus far.