29 May 2017

Fragment hot spots revisited: a public validation set and method

This is the last week for our poll on how much structural information you need to begin optimizing a fragment – please vote on the right-hand side of the page if you haven’t already done so. We’ve recently discussed crystallography and NMR, so this post is focused on computation.

Predicting hot spots (regions on proteins where fragments are likely to bind) is becoming something of a cottage industry (see for example here and here). These predictions can provide some indication as to whether or not a protein is ligandable, and ideally even provide starting points for a lead discovery program. But how should someone searching for promising hot spots and binders choose a method, or evaluate a new one? In a recent paper in J. Med. Chem., Marcel Verdonk and colleagues at Astex provide a method as well as a validation set, both of which are freely available.

The validation set consists of 52 high-quality crystal structures pulled from the Protein Data Bank (PDB). These were chosen to be maximally diverse in terms of fragments (41 of them) and proteins (45). The fragments were not taken in isolation; rather, a fragment embedded in larger molecules was considered if it bound in the same region of the protein in at least three different ligands. This matters because, as the researchers note, there are no structures in the PDB of resorcinol itself bound to HSP90A, even though this is a privileged fragment that usually binds in a conserved fashion at the ATP-binding site in the context of a larger molecule.

Fragments chosen for the validation set have at most one rotatable bond and are quite small, just 5 to 12 non-hydrogen atoms. However, as they are culled from larger molecules, some (such as adamantane) are more lipophilic than standard “rule of three” guidelines would allow.
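To make the size criteria concrete, here is a minimal sketch of such a filter. It is not the researchers' code: the `Fragment` class and `meets_size_criteria` function are hypothetical names, and the descriptor values are typed in by hand for a few well-known molecules, whereas in practice they would come from a cheminformatics toolkit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    """Hypothetical record holding precomputed descriptors."""
    name: str
    heavy_atoms: int       # non-hydrogen atom count
    rotatable_bonds: int


def meets_size_criteria(frag, min_heavy=5, max_heavy=12, max_rotatable=1):
    """Apply the limits quoted above: 5 to 12 heavy atoms, at most one rotatable bond."""
    return (min_heavy <= frag.heavy_atoms <= max_heavy
            and frag.rotatable_bonds <= max_rotatable)


# Illustrative candidates only; this is not the published validation set.
candidates = [
    Fragment("adamantane", heavy_atoms=10, rotatable_bonds=0),
    Fragment("resorcinol", heavy_atoms=8, rotatable_bonds=0),
    Fragment("biphenyl", heavy_atoms=12, rotatable_bonds=1),
    Fragment("n-butylbenzene", heavy_atoms=10, rotatable_bonds=3),
]

selected = [f.name for f in candidates if meets_size_criteria(f)]
print(selected)
```

Note that the filter says nothing about lipophilicity, which is why something like adamantane passes despite falling outside typical fragment-library guidelines.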

The 52 examples in the validation set were divided into 40 hot spots and 12 “warm” spots, depending on how often the binding site is occupied in structures of the protein across the PDB. For example, the canonical purine binding site of kinases is a hot spot, while the nearby chlorophenyl-binding site of the PKA-Akt chimeric kinase is classified as warm.

With this validation set in hand, the researchers tested an in-house developed fragment mapping method called PLImap (which relies on the previously published Protein-Ligand Informatics force field, PLIff) to see how well they could reproduce the bound conformations of the fragments. The results were quite favorable in comparison with other docking methods tested. That’s exciting, and since PLImap is free to download and use (here), it should be a useful tool for modelers everywhere.
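For readers wondering how “reproducing the bound conformation” might be scored, a common convention in docking studies is the heavy-atom root-mean-square deviation (RMSD) between predicted and crystallographic coordinates, with a pose counted as a success below some cutoff. The sketch below assumes that approach and a 2.0 Å cutoff; neither is taken from the paper, and the function names are hypothetical. Atoms are matched by index in the fixed protein frame, with no superposition and no handling of symmetric fragments.

```python
import math


def rmsd(predicted, reference):
    """RMSD in the same units as the inputs, for index-matched (x, y, z) tuples."""
    if not predicted or len(predicted) != len(reference):
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(
        (p - r) ** 2
        for atom_p, atom_r in zip(predicted, reference)
        for p, r in zip(atom_p, atom_r)
    )
    return math.sqrt(squared / len(predicted))


def pose_reproduced(predicted, reference, cutoff=2.0):
    """True if the predicted pose lies within the (assumed) RMSD cutoff."""
    return rmsd(predicted, reference) <= cutoff


# Made-up coordinates in angstroms, purely to exercise the functions.
crystal = [(0.0, 0.0, 0.0), (1.4, 0.0, 0.0), (2.1, 1.2, 0.0)]
close_pose = [(x + 1.0, y, z) for x, y, z in crystal]   # shifted 1 A along x
far_pose = [(x + 3.0, y, z) for x, y, z in crystal]     # shifted 3 A along x

print(pose_reproduced(close_pose, crystal))
print(pose_reproduced(far_pose, crystal))
```

A flipped fragment in the right pocket can still score poorly by this measure, since the individual atoms end up far from their crystallographic positions.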

But of course PLImap also made mistakes, and some of these were “wrong in an interesting way.” Water often plays a critical role in protein-ligand interactions, but water was not included in the docking. Several cases where PLImap did not choose the experimentally observed conformation of the fragment involved water molecules. For example, PLImap placed the bromodomain-privileged 3,5-dimethylisoxazole fragment in the right location but in a flipped orientation, because the highly conserved water was not present.

Perhaps more interestingly, in some cases warm spots were ignored in favor of hot spots. For example, in the case of the PKA-Akt chimeric kinase mentioned above, PLImap placed the chlorophenyl fragment not at the chlorophenyl-binding subsite, where it sits in the context of larger ligands, but rather in the “hotter” purine binding subsite. This phenomenon was observed experimentally several years ago by Isabelle Krimm and colleagues; large BCL-xL ligands that were deconstructed into component fragments bound mostly at a single site, rather than the two sites occupied by the larger molecules. It would be fascinating to test this same set of molecules using PLImap.

All of which is to say that, while computational methods continue to make impressive strides, we are still (happily!) some way from getting rid of the experimentalists.
