25 May 2020

Machine learning for two-dimensional NMR

Among the many methods to find fragments, only two – X-ray crystallography and protein-observed NMR – can routinely provide detailed structural information. Indeed, the first SAR by NMR paper arguably launched the field of fragment-based lead discovery nearly a quarter century ago. However, whereas crystallography has steadily increased in popularity, protein-observed NMR has lagged. A new paper in Comp. Struct. Biotech. J. by Grzegorz Popowicz and collaborators at Helmholtz Zentrum München, Technical University of Munich, and the ETH seeks to change this.

Two-dimensional NMR techniques, such as 1H-15N HSQC, produce two-dimensional plots with the chemical shift of the proton on one axis and the chemical shift of the nitrogen on the other. Different amide groups in a protein have different chemical shifts, and these can change in position or intensity when a ligand binds. Ideally these chemical shift perturbations (CSPs) can be used to tell exactly where on the protein a ligand binds, but even unassigned perturbations can give qualitative information on whether or not the protein is interacting with a fragment.
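To make the idea of a perturbation concrete, here is a minimal Python sketch of one common convention for collapsing the proton and nitrogen shifts of a single amide peak into one number. Nothing here is taken from the paper or from CSP Analyzer: the function name, the 0.14 scaling factor for nitrogen, and the example peak positions are all assumptions for illustration.

```python
import math

# Assumed weighting: nitrogen shifts span a wider ppm range than proton
# shifts, so they are commonly scaled down before combining. The value 0.14
# is one frequently used choice, not something stated in the post or paper.
N_WEIGHT = 0.14


def combined_csp(delta_h_ppm: float, delta_n_ppm: float, n_weight: float = N_WEIGHT) -> float:
    """Weighted Euclidean distance a peak moved in the 1H-15N plane (ppm)."""
    return math.sqrt(delta_h_ppm ** 2 + (n_weight * delta_n_ppm) ** 2)


# Hypothetical peak positions (1H ppm, 15N ppm) without and with a fragment.
free_peak = (8.25, 120.4)
bound_peak = (8.31, 120.9)

shift = combined_csp(bound_peak[0] - free_peak[0], bound_peak[1] - free_peak[1])
print(f"combined CSP: {shift:.3f} ppm")
```

A single number like this is handy for ranking peaks, but it says nothing about intensity changes, which is one reason eyeballing whole spectra (or training a model on them) remains attractive.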

Unfortunately, analyzing hundreds of two-dimensional spectra is a tedious manual process; think of spending several hours playing Where’s Wally with blobs instead of people. And with only two colors. Thus, the process is subject to error and human bias. To make life easier for NMR spectroscopists, and to make analysis more objective, the researchers developed an automated software package called the CSP Analyzer.

The process started with 1611 spectra, 176 of which showed a bound ligand, taken from fragment screens against four different proteins. From the total, a training set was assembled of 32 actives along with 68 inactive or noisy spectra. These training spectra were fed into a machine learning algorithm similar to those used for computer image processing. Building the model required quite a bit of tweaking; because inactives outnumbered actives, a simple algorithm could score well on accuracy just by calling most spectra inactive, at the cost of returning more false negatives than false positives. However, when looking for a fragment needle in a haystack of spectra, you really don’t want to miss anything useful, and the researchers used strategies to minimize this problem. In the end CSP Analyzer performed quite well, with an accuracy of 87% across the entire data set. Importantly, while it returned 10.3% of spectra as false positives, it only missed 3.1% of spectra as false negatives.
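The imbalance point is easy to check with a little arithmetic. The sketch below uses only the counts and percentages quoted above, and it assumes (my reading, not something the post states outright) that the false positive and false negative rates are fractions of all 1611 spectra. The function names are made up for illustration.

```python
TOTAL_SPECTRA = 1611   # spectra in the full data set
ACTIVE_SPECTRA = 176   # spectra with a bound ligand


def always_inactive_accuracy(total: int, actives: int) -> float:
    """Accuracy of a lazy classifier that labels every spectrum inactive."""
    return (total - actives) / total


def accuracy_from_error_rates(fp_rate: float, fn_rate: float) -> float:
    """Accuracy when both error rates are fractions of the whole data set."""
    return 1.0 - fp_rate - fn_rate


baseline = always_inactive_accuracy(TOTAL_SPECTRA, ACTIVE_SPECTRA)
reported = accuracy_from_error_rates(fp_rate=0.103, fn_rate=0.031)

# The lazy classifier misses every active, so its false negatives are
# simply the active fraction of the data set.
baseline_missed = ACTIVE_SPECTRA / TOTAL_SPECTRA

print(f"always-inactive accuracy: {baseline:.3f}, missed: {baseline_missed:.3f}")
print(f"accuracy implied by quoted error rates: {reported:.3f}")
```

Under that assumption, calling everything inactive would score slightly higher on raw accuracy than the quoted error rates imply, while missing every single hit. That is why the split between false positives and false negatives matters more here than the headline accuracy.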

Teddy would often end his posts by asking whether a new technique was practical. I’m no NMR spectroscopist, so I’ll leave it to readers to weigh in with their opinions. Happily, the software is freely available, so you can download and try it yourself. Moreover, the researchers have ambitious future plans, such as extending CSP Analyzer to other types of NMR experiments and inputs. The rise of the machines continues, in a benevolent fashion. At least thus far.
