The rise of high-throughput crystallography is among the most exciting recent developments for fragment
finding. Historically deemed too slow for primary screening, crystallography
was reserved for select hits from an assay cascade. Now up-front crystallographic
screens sometimes yield hundreds of hits, many of which have been deposited in
the Protein Data Bank (PDB). In a recent (open access) Protein Sci. commentary,
Mariusz Jaskolski (Mickiewicz University), Bernhard Rupp (Medical University
Innsbruck), and collaborators in the US question this practice.
In particular, the researchers ask
whether ligand structures identified using Pan-Dataset Density Analysis (PanDDA) belong
in the PDB. The method, which we have described previously, is typically used when
hundreds of compounds have been soaked into crystals of the same protein. Most compounds
will not bind, and the resulting empty structures can be averaged to provide a background
map that makes it easier to identify weakly bound ligands, which may have only partial
occupancy.
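To make the background-map idea concrete, here is a toy sketch loosely following the published description of PanDDA as I understand it: characterize the "ground state" from many unbound datasets, flag grid points in a soaked dataset that deviate strongly from it (a Z-map), then compute an event map by subtracting a fraction of the ground-state density, ρ_event = (ρ_dataset − BDC·ρ_ground) / (1 − BDC), where BDC is the background density correction. This is not PanDDA's code. The "maps" here are flat lists rather than aligned 3D electron-density grids, the function names, the Z threshold of 3, and the hand-picked BDC are all hypothetical, and the real program does far more (alignment, clustering, its own BDC estimation).

```python
from statistics import mean, pstdev


def ground_state(maps):
    """Mean and standard deviation at each grid point across unbound datasets."""
    columns = list(zip(*maps))  # values of every dataset at the same grid point
    return [mean(c) for c in columns], [pstdev(c) for c in columns]


def z_map(density, mu, sigma, floor=1e-6):
    """Deviation of each grid point from the ground state, in standard deviations."""
    # floor avoids division by zero where all unbound datasets agree exactly
    return [(d - m) / max(s, floor) for d, m, s in zip(density, mu, sigma)]


def flag_events(z, threshold=3.0):
    """Grid points whose Z-score exceeds an arbitrary, illustrative threshold."""
    return [i for i, value in enumerate(z) if value > threshold]


def event_map(density, mu, bdc):
    """Remove a fraction bdc of ground-state density and rescale (0 <= bdc < 1)."""
    return [(d - bdc * m) / (1.0 - bdc) for d, m in zip(density, mu)]


# Toy 1-D "maps": three unbound datasets sampled at four grid points.
unbound = [
    [1.0, 2.0, 0.0, 0.5],
    [1.2, 1.8, 0.2, 0.5],
    [0.8, 2.2, -0.2, 0.5],
]
# Hypothetical soaked crystal: 75% ground state + 25% bound state [1.0, 2.0, 3.0, 0.5],
# i.e. a ligand at 25% occupancy adding density at grid point 2.
soaked = [1.0, 2.0, 0.75, 0.5]

mu, sigma = ground_state(unbound)
z = z_map(soaked, mu, sigma)
events = flag_events(z)
recovered = event_map(soaked, mu, bdc=0.75)

if __name__ == "__main__":
    print("events at grid points:", events)
    print("event map:", [round(v, 3) for v in recovered])
```

With these toy numbers only grid point 2 is flagged, and choosing BDC = 0.75 (one minus the 25% occupancy in this idealized mixture) recovers the bound-state values [1.0, 2.0, 3.0, 0.5]. In the soaked map itself the ligand density is diluted to 0.75, which is why partially occupied ligands can look weak in a conventional map yet clear in an event map.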
The researchers seem suspicious
of this technique, referring to “supposed ligands” that may “confuse most
biomedical researchers” and “degrade the PDB integrity,” the effect of which “could
be disastrous.” To support their argument, they provide two examples from the
PDB where the atomic models diverge from the electron density calculated using
conventional methods, plus a third with wonky statistics.
To avoid “contamination of the
PDB by suboptimal structures,” the researchers suggest depositing structures from
large-scale crystallographic screens in a separate database. Alternatively, they
suggest clearer annotation. (To be fair, all three of the examples cited are already
prominently marked “PanDDA analysis group deposition.”)
Needless to say, this is controversial. In a bioRxiv preprint, Manfred Weiss (Helmholtz-Zentrum
Berlin) and collaborators in the US, Germany, Sweden, and the Netherlands, some
of whom co-developed PanDDA, take a different view.
The researchers agree that group
depositions need to be marked clearly, but they argue that such depositions squarely belong
in the PDB rather than in a separate repository. Moreover, “commentaries that underestimate
the knowledge of PDB users, that ignore the opportunities present in
heterogenous crystallographic data, and that miss out on chances for education
on structure quality do more harm than good.”
Weiss and colleagues re-examine the three examples described by Jaskolski
and colleagues. While it is true that two of them do show poor
occupancy using conventional methods, the ligands are clearly visible when
PanDDA is used. (In the third case, there was an error in the resolution cutoff
during automated processing, but the data could be successfully reprocessed
manually.)
PanDDA was developed specifically
to identify small, low-occupancy ligands, so the researchers argue that these
entries “cannot and should not be treated in the same way” as other ligands. Banning
them from the PDB could impede future research.
Weiss and colleagues refer to the
Structural Genomics campaign of the late 1990s and early 2000s, an effort that solved myriad
structures of diverse proteins, most of which were not otherwise being studied.
At the time, some commentators derided this effort as “stamp collecting.” Yet
the number and diversity of structures thus deposited into the PDB likely
contributed to the success of protein structure prediction methods such as
AlphaFold2.
Similarly, including structures from
PanDDA processing could lead to unforeseen advances. For example, Weiss and
colleagues suggest we may be able to “extract all aspects of conformational
as well as of compositional heterogeneity out of all these data sets.” A
better understanding of the role of protein dynamics in ligand binding is likely
to require thousands of similar datasets of the kind being uploaded.
Personally, I believe that scientists
should be wary of all published information. As the old saying goes, trust,
but verify. As evidenced by my five-part series “Getting misled by crystal structures,” even conventional structures in the PDB should not necessarily be
taken at face value. With that precaution, I’ll side with the conclusion of
Weiss and colleagues: “As long as the data is there, let’s embrace it and make it
available!”
I agree with Dr. Weiss and co. The model should answer biological questions (recognition event and binding pose), and hiding the data will do more harm than good.
However, there is an alarming mindset here that the end user is responsible for how he uses the data. While this is conventional wisdom, last month I was at a conference and talked to computational chemists: almost no one, unless coming from industry with solid in-house structural biology expertise, examines electron densities. Absolutely no one examines PanDDA event maps and digs into the details there. The model is taken for granted. It feels like we need a good guideline on how to judge recognition events in MX data, and PanDDA depositions most certainly should be supplemented with a guideline on how to evaluate them. Unfortunately, such guidelines cannot rely on modern MX software environments. The end user simply will not invest the time, and especially the resources, to get familiar with the current solutions and the bulk of excellent but expertise-heavy literature. Sad but true.
We are pleased to note that Weiss et al. essentially agree with our main postulate (already expressed in the title of our paper) that the results of PanDDA screening should be archived correctly in a dedicated repository rather than being inadequately presented by the current Protein Data Bank (PDB) protocol, which was developed for an entirely different purpose. Our main concern was appropriate storage, dissemination, and interpretation of fragment screening results and not explicit criticism of the method as such.
Mariusz Jaskolski, Bernhard Rupp, Alexander Wlodawer, Zbyszek Dauter