Every few years computational
chemists are invited to compete in the Statistical Assessment of the Modeling of Proteins and
Ligands (SAMPL) challenges. Researchers are asked to solve a problem for which
the solution is known but not yet published; this blinded format allows a more
rigorous test of methods than the typical retrospective studies. SAMPL7 focused
on fragments binding to proteins, and the results have been published (open
access) in J. Comp. Aided Mol. Des. by Philip Biggin and collaborators
at University of Oxford and elsewhere.
The subject of this challenge was
PHIP, a multidomain protein implicated in insulin signaling and tumor metastasis,
though the biology is a bit complicated. PHIP contains two bromodomains, small
modules that act as epigenetic readers by binding to acetylated lysine residues
(Kac), and the researchers chose to focus on the second bromodomain (PHIP2).
Bromodomains have proven to be highly ligandable, though this one is unusual in
having a threonine in place of a conserved asparagine.
The experimental results that contestants
were challenged to predict came from fragment screening using high-throughput
crystallography at Diamond Light Source’s XChem. PHIP2 crystals diffracted to
high resolution (1.2 Å) and were soaked with 20 mM fragment for 2 hours at 5 °C.
In total 799 fragments were screened: 768 from the DSI-poised library (see
here) and 31 FragLites (see here). The team took great pains to gather
high-quality data, screening the FragLites twice and re-soaking 202 fragments
that produced poor R factors or resolution worse than 2 Å. This resulted in 52
hits, a hit rate of 6.5%, consistent with the 2-15% typically seen at XChem. Most
(47) of these were in the Kac-binding site, and these were the focus of the SAMPL7
challenge.
The first task was for modelers
to simply predict which of the 799 fragments bound and which did not. Full
experimental details were provided, including pH and the crystallization
conditions. Entrants were given 1 month. There were eight submissions plus a
control, which randomly selected compounds as binders or non-binders. Most of
the contestants used some sort of docking strategy; details are provided in the
paper.
Shockingly, none of the submissions
scored better than random. Three of the entrants failed to correctly identify a
single binder, and four identified between 1 and 5 of the 47.
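To put "no better than random" in perspective, here is a rough sketch of what blind guessing would be expected to achieve. The totals (799 fragments, 47 Kac-site binders) come from the screen described above; the number of compounds an entrant flags as binders (50 here) is purely hypothetical, since the post does not give those figures, and the function names are mine.

```python
from math import comb

N_TOTAL = 799    # fragments screened (from the post)
N_BINDERS = 47   # hits in the Kac-binding site (from the post)


def expected_random_hits(n_selected, n_binders=N_BINDERS, n_total=N_TOTAL):
    """Mean number of true binders among n_selected random picks."""
    return n_selected * n_binders / n_total


def prob_at_least_one_hit(n_selected, n_binders=N_BINDERS, n_total=N_TOTAL):
    """Chance that random picks (without replacement) contain >= 1 binder."""
    all_misses = comb(n_total - n_binders, n_selected)
    return 1.0 - all_misses / comb(n_total, n_selected)


if __name__ == "__main__":
    n_selected = 50  # hypothetical submission size, not taken from the paper
    print(f"Expected true binders by chance: {expected_random_hits(n_selected):.2f}")
    print(f"P(at least one binder by chance): {prob_at_least_one_hit(n_selected):.3f}")
```

Because 47/799 works out to exactly 1 in 17, a random picker should find roughly one true binder for every 17 compounds it flags.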
The second task was to predict
the binding modes of the crystallographically identified ligands. Contestants were
provided with the 47 hits and asked to submit up to five poses for each. Perhaps
stung by their performance on the first task, or perhaps put off by the two-week
requested turnaround time, only five groups submitted entries.
Performance was assessed by calculating
the root mean square deviation (RMSD) between the experimental and docked
structure(s), with RMSD ≤ 2 Å considered successful. Despite this fairly
lenient cutoff, “the performance of the methods was disappointing.” The best
scored 24%, while two methods scored 2% and 0%. I’ll leave it to chemists to
opine whether even a 24% success rate for docking would give confidence to embark
on analog synthesis.
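For readers who have not computed one, the RMSD criterion can be sketched in a few lines. This is a minimal illustration and not the organizers' scoring code: it assumes the atoms of the two poses are listed in the same order, that both poses are already in the same protein frame (so no superposition is done), and that a ligand counts as a success if any of its submitted poses falls within the cutoff. It also ignores ligand symmetry, which a real evaluation would need to handle.

```python
import math


def pose_rmsd(ref_coords, pred_coords):
    """RMSD in angstroms between two poses given as lists of (x, y, z)."""
    if not ref_coords or len(ref_coords) != len(pred_coords):
        raise ValueError("poses must be non-empty and have the same atom count")
    squared = sum(
        (a - b) ** 2
        for ref_atom, pred_atom in zip(ref_coords, pred_coords)
        for a, b in zip(ref_atom, pred_atom)
    )
    return math.sqrt(squared / len(ref_coords))


def success_rate(ref_by_ligand, poses_by_ligand, cutoff=2.0):
    """Fraction of ligands with at least one pose at RMSD <= cutoff."""
    successes = 0
    for name, ref in ref_by_ligand.items():
        poses = poses_by_ligand.get(name, [])
        if any(pose_rmsd(ref, pose) <= cutoff for pose in poses):
            successes += 1
    return successes / len(ref_by_ligand)


if __name__ == "__main__":
    # Toy two-atom "ligands" with made-up coordinates, for illustration only.
    crystal = {"frag_a": [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)],
               "frag_b": [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]}
    docked = {"frag_a": [[(0.0, 0.0, 1.0), (1.5, 0.0, 1.0)]],   # shifted 1 A
              "frag_b": [[(0.0, 0.0, 3.0), (1.5, 0.0, 3.0)]]}   # shifted 3 A
    print(success_rate(crystal, docked))
```

For a fragment with only a dozen or so heavy atoms, 2 Å is a generous tolerance, which is why the cutoff is described above as fairly lenient.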
The third task was to select
follow-up molecules from a large database for experimental validation, but alas
“the COVID-19 pandemic resulted in a diversion of funds before this follow-up
study could be done.” Nonetheless, four intrepid groups submitted entries, and these
are discussed in the paper.
Taken at face value, this is downright
damning for computational chemists. It is also at odds with many nice success
stories, for example those described at last month’s DDC conference. So what’s
going on?
For one thing, not everyone paid
attention to the information provided. The crystals were at pH 5.6, but some of
the entrants nonetheless assumed pH 7.4.
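To see why the pH matters, consider a rough Henderson-Hasselbalch estimate for a basic group on a fragment. The pKa of 6.5 used below is a made-up value chosen for illustration and does not correspond to any compound in the library; the point is only that the dominant protonation state, and hence the charge a docking program assigns, can flip between the two pH values.

```python
def fraction_protonated(pka, ph):
    """Fraction of a basic group in its protonated form at a given pH."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


if __name__ == "__main__":
    hypothetical_pka = 6.5  # illustrative only, not from the paper
    for ph in (5.6, 7.4):   # crystal pH vs. the pH some entrants assumed
        frac = fraction_protonated(hypothetical_pka, ph)
        print(f"pH {ph}: {frac:.0%} protonated")
```

With these assumed numbers the group is mostly protonated at pH 5.6 and mostly neutral at pH 7.4, so a model prepared at the wrong pH would be docking the minor species.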
This raises a second and more
important point. As the researchers acknowledge, “there is the possibility that
our fragments do not necessarily bind in solution, whereas scoring functions
are almost always calibrated and validated against solution and structural
data.” In other words, perhaps the fragments were not identified computationally
because they only bind extremely weakly to a crystalline protein soaking in dilute
acid.
This highlights perhaps the
biggest drawback of fragment screening by crystallography: no matter how beautiful
the structure may appear, you get no measure of affinity. Indeed, a paper we highlighted
last year was able to confirm binding by NMR for only a minority of crystallographically identified
fragments against the SARS-CoV-2 main protease. This does not mean that the
crystal structures are “wrong,” but the ligands may be so weak as to be unadvanceable.
A picture can be worth a thousand
words, but it can also be misleading. Advancing fragments is best done with the
help of multiple orthogonal methods.
Hi Dan, my understanding is that it is possible to estimate affinity using crystallography by measuring the concentration response of occupancy. For example, the 2007 Astex article that introduces group efficiency reports ΔG = -3.1 kcal/mol for pyrazole binding to PKB. I don’t have a feel for the logistics but would guess that you could move hits forward using affinity measured in this manner. I wasn’t particularly surprised by the underwhelming performance of computation in this exercise and would generally recommend screening a generic library before making specific compound selections.
Hi Pete,
This is a good suggestion, and I've actually asked several crystallographers about it, but it doesn't seem to work generally. Would love to see someone publish on a systematic effort though!
Thanks,
Dan
curious:
is there strong evidence that the affinity of the initial fragment is correlated with the development time of the lead candidate in the end?
thus should we only advance fragments with sub-micromolar affinity and discard other hits?
The main bottleneck of Xray is of course the availability of suitable crystals. However, if crystals are available, crystallography is still a great first screening technique, as it is very sensitive and is the only method to yield 3D-structural information about the binding pose in reasonable throughput.
With even higher throughput in Xray, e.g. 1000 xray measurements per day, you will quickly get 100-300 hits, thus also something like 30-100 in the binding pocket of interest. The challenge is probably to gather this knowledge of binding modes and combine it for your efforts, rather than simply picking the highest affinity fragment from a screen without structural information and deciding to advance only that one.
I would also agree that pre-selecting compounds for the screening is not very effective. The prediction accuracy of binding poses for fragment-sized ligands is almost zero with the current computational methods.
Hi Anonymous,
It's an interesting hypothesis, but unfortunately we often don't learn how long it takes to go from a fragment to a lead, so it's hard to come up with statistics. There are examples of progressing weak fragment hits to development candidates in less than a year (vemurafenib and TAK-020). In the second case, the researchers even focused on a less potent fragment because it had more useful vectors for growing.
But all things being equal, I certainly prefer starting with more potent fragments that are also structurally enabled.
Very interesting. But again, this is why we have learned over and over again that a single screening method is often not enough. It's definitely encouraged to screen fragments by soaking as well as in solution with NMR (amongst other techniques). The high soaking concentrations used in crystallography, and the artifacts they can cause, are always concerns raised when evaluating a hit.
To your point: I wonder if having talented expert crystallographers on the team can help with sorting artifacts from hits from crystallography screening. It seemed like us chemists got a lot of help with distinguishing crystal interface binding vs pocket binding hits.