Practical Fragments: SAMPL7: Epic computational fail or just no solution?

Every few years computational chemists are invited to compete in the Statistical Assessment of the Modeling of Proteins and Ligands (SAMPL) challenges. Researchers are asked to solve a problem for which the solution is known but not yet published; this blinded format allows a more rigorous test of methods than the typical retrospective studies. SAMPL7 focused on fragments binding to proteins, and the results have been published (open access) in J. Comp. Aided Mol. Des. by Philip Biggin and collaborators at the University of Oxford and elsewhere.

The subject of this challenge was PHIP, a multidomain protein implicated in insulin signaling and tumor metastasis, though the biology is a bit complicated. PHIP contains two bromodomains, small modules that act as epigenetic readers by binding to acetylated lysine residues (Kac), and the researchers chose to focus on the second bromodomain (PHIP2). Bromodomains have proven to be highly ligandable, though this one is unusual in having a threonine in place of a conserved asparagine.

The experimental results that contestants were challenged to predict came from fragment screening using high-throughput crystallography at Diamond Light Source’s XChem. PHIP2 crystals diffracted to high resolution (1.2 Å) and were soaked with 20 mM fragment for 2 hours at 5 °C. In total 799 fragments were screened: 768 from the DSI-poised library (see here) and 31 FragLites (see here). The team took great pains to gather high-quality data, screening the FragLites twice and re-soaking 202 fragments that produced poor R factors or resolution worse than 2 Å. This resulted in 52 hits, a hit rate of 6.5%, consistent with the 2-15% typically seen at XChem. Most (47) of these were in the Kac-binding site, and these were the focus of the SAMPL7 challenge.

The first task was for modelers to simply predict which of the 799 fragments bound and which did not. Full experimental details were provided, including pH and the crystallization conditions. Entrants were given 1 month. There were eight submissions plus a control, which randomly selected compounds as binders or non-binders. Most of the contestants used some sort of docking strategy; details are provided in the paper.

Shockingly, none of the submissions scored better than random. Three of the entrants failed to correctly identify a single binder, and four identified between 1 and 5 of the 47.
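To get a feel for what "random" looks like here, a rough sketch: if an entrant labels some number of the 799 fragments as binders purely by chance, the number of the 47 true binders among them follows a hypergeometric distribution. The function names and the pick count of 100 are illustrative assumptions, not taken from the paper, whose actual scoring metrics may well differ.

```python
from math import comb


def expected_hits(n_total, n_binders, n_picked):
    """Expected number of true binders among n_picked random picks."""
    return n_picked * n_binders / n_total


def prob_at_least(n_total, n_binders, n_picked, m):
    """Chance that random picking recovers at least m true binders
    (upper tail of the hypergeometric distribution)."""
    total = comb(n_total, n_picked)
    upper = min(n_binders, n_picked)
    favorable = sum(
        comb(n_binders, k) * comb(n_total - n_binders, n_picked - k)
        for k in range(m, upper + 1)
    )
    return favorable / total


if __name__ == "__main__":
    # 799 fragments and 47 Kac-site binders are from the post;
    # 100 picks is a hypothetical submission size.
    print(expected_hits(799, 47, 100))
    print(prob_at_least(799, 47, 100, 5))
```

With 100 hypothetical picks, chance alone would be expected to recover about 5.9 of the 47 binders, which puts the reported counts of 0 to 5 in context.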

The second task was to predict the binding modes of the crystallographically identified ligands. Contestants were provided with the 47 hits and asked to submit up to five poses for each. Perhaps stung by their performance on the first task, or perhaps put off by the requested two-week turnaround time, only five groups submitted entries.

Performance was assessed by calculating the root mean square deviation (RMSD) between the experimental and docked structure(s), with RMSD ≤ 2 Å considered successful. Despite this fairly lenient cutoff, “the performance of the methods was disappointing.” The best scored 24%, while two methods scored 2% and 0%. I’ll leave it to chemists to opine whether even a 24% success rate for docking would give confidence to embark on analog synthesis.
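As a minimal sketch of this kind of scoring, assuming each pose is a list of heavy-atom (x, y, z) coordinates in the same frame as the crystal structure, with atoms matched by index and no symmetry correction (the paper's exact protocol may differ, and all names below are hypothetical):

```python
import math


def pose_rmsd(reference, predicted):
    """RMSD in angstroms between two poses with index-matched atoms.
    No superposition is applied: both poses share the protein's frame."""
    if not reference or len(reference) != len(predicted):
        raise ValueError("poses must be non-empty and the same length")
    squared = sum(
        (a - b) ** 2
        for ref_atom, pred_atom in zip(reference, predicted)
        for a, b in zip(ref_atom, pred_atom)
    )
    return math.sqrt(squared / len(reference))


def is_success(reference, poses, cutoff=2.0):
    """A ligand counts as a success if any submitted pose is within the cutoff."""
    return any(pose_rmsd(reference, pose) <= cutoff for pose in poses)


def success_rate(entries, cutoff=2.0):
    """entries maps a ligand name to (reference pose, list of submitted poses)."""
    successes = sum(
        is_success(reference, poses, cutoff)
        for reference, poses in entries.values()
    )
    return successes / len(entries)
```

Counting a ligand as solved when any one of its poses passes is itself an assumption; scoring only the top-ranked pose would be stricter.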

The third task was to select follow-up molecules from a large database for experimental validation, but alas “the COVID-19 pandemic resulted in a diversion of funds before this follow-up study could be done.” Nonetheless, four intrepid groups submitted entries, and these are discussed in the paper.

Taken at face value, this is downright damning for computational chemists. It is also at odds with many nice success stories, for example those described at last month’s DDC conference. So what’s going on?

For one thing, not everyone paid attention to the information provided. The crystals were at pH 5.6, but some of the entrants nonetheless assumed pH 7.4.
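One reason the pH matters is that it sets the protonation state of ionizable groups on both fragment and protein. A small sketch using the Henderson-Hasselbalch relation shows how much a group can shift between the two pH values; the pKa of 6.5 is a hypothetical value chosen for illustration, not one from the paper.

```python
def fraction_protonated(pka, ph):
    """Fraction of an ionizable group in its protonated form at a given pH
    (Henderson-Hasselbalch, ideal dilute solution)."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


if __name__ == "__main__":
    hypothetical_pka = 6.5  # illustrative only
    for ph in (5.6, 7.4):  # crystal pH vs. the pH some entrants assumed
        print(ph, round(fraction_protonated(hypothetical_pka, ph), 3))
```

For this hypothetical group, roughly 89% would be protonated at pH 5.6 but only about 11% at pH 7.4, so a model built at the wrong pH could carry the wrong charge state into docking.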

This raises a second and more important point. As the researchers acknowledge, “there is the possibility that our fragments do not necessarily bind in solution, whereas scoring functions are almost always calibrated and validated against solution and structural data.” In other words, perhaps the fragments were not identified computationally because they only bind extremely weakly to a crystalline protein soaking in dilute acid.

This highlights perhaps the biggest drawback of fragment screening by crystallography: no matter how beautiful the structure may appear, you get no measure of affinity. Indeed, a paper we highlighted last year was able to confirm binding by NMR for only a minority of crystallographically identified fragments against the SARS-CoV-2 main protease. This does not mean that the crystal structures are “wrong,” but the ligands may be so weak as to be unadvanceable.

A picture can be worth a thousand words, but it can also be misleading. Advancing fragments is best done with the help of multiple orthogonal methods.

6 comments:

Peter Kenny, May 18, 2022 1:59 PM
Hi Dan, my understanding is that it is possible to estimate affinity using crystallography by measuring the concentration response of occupancy. For example, the 2007 Astex article that introduces group efficiency reports ΔG = -3.1 kcal/mol for pyrazole binding to PKB. I don’t have a feel for the logistics but would guess that you could move hits forward using affinity measured in this manner. I wasn’t particularly surprised by the underwhelming performance of computation in this exercise and would generally recommend screening a generic library before making specific compound selections.
Dan Erlanson, May 19, 2022 6:03 AM
Hi Pete,
This is a good suggestion, and I've actually asked several crystallographers about it, but it doesn't seem to work generally. Would love to see someone publish on a systematic effort though!
Thanks,
Dan
Anonymous, May 20, 2022 1:12 AM
curious:
is there strong evidence that the affinity of the initial fragment is correlated with the development time of the lead candidate in the end?
thus should we only advance fragments with sub-micromolar affinity and discard other hits?

The main bottleneck of X-ray is of course the availability of suitable crystals. However, if crystals are available, crystallography is still a great first screening technique, as it is very sensitive and is the only method to yield 3D structural information about the binding pose at reasonable throughput.

With even higher throughput in X-ray, e.g. 1000 X-ray measurements per day, you will quickly get 100-300 hits, and thus something like 30-100 in the binding pocket of interest. The challenge is probably to gather this knowledge of binding modes and combine it in your efforts, rather than simply picking the highest-affinity fragment from a screen without structural information and deciding to advance only that one.

I would also agree that pre-selecting compounds for the screening is not very effective. The prediction accuracy of binding poses for fragment-sized ligands is almost zero with the current computational methods.
Dan Erlanson, May 21, 2022 12:35 PM
Hi Anonymous,

It's an interesting hypothesis, but unfortunately we often don't learn how long it takes to go from a fragment to a lead, so it's hard to come up with statistics. There are examples of progressing weak fragment hits to development candidates in less than a year (vemurafenib and TAK-020). In the second case, the researchers even focused on a less potent fragment because it had more useful vectors for growing.

But all things being equal, I certainly prefer starting with more potent fragments that are also structurally enabled.
Anonymous, May 22, 2022 11:40 AM
Very interesting. But again, this is why we have learned over and over again that a single screening method is often not enough. It's definitely encouraged to screen fragments by soaking as well as in solution with NMR (amongst other techniques). The high soaking concentrations used in crystallography, and the artifacts they may cause, are always concerns raised when evaluating a hit.

16 May 2022
