A picture is worth a thousand words, but words can mislead as easily as inform. So it is with crystal structures, as Charles Reynolds discusses in the July issue of ACS Med. Chem. Lett. We’ve touched on this issue before (for example, here and here), but this is a nice update.
He starts with a cringe-worthy catalog of horrors found in the Protein Data Bank (PDB):
Just to give a few examples: 1xqd contains three planar oxygens as part of a phosphate group; 1pme features a planar sulfur in the sulfoxide; 1tnk, a 1.8 Å resolution structure, contains a nonplanar tetrahedral aromatic carbon as part of a substituted aniline; and 4g93 contains an olefin that is twisted nearly 90° out of the plane.
Of course, with 100,000 structures in the database, it is inevitable that some dross will slip through, but Reynolds argues that around a quarter of all co-crystal structures contain errors so severe that they could lead to misinterpretations.
Why is the situation so dire? Reynolds suggests a number of reasons. First, there’s the push for quantity over quality: fully refining a structure may not be as valued as solving a new one. Second, small molecules make up only a small portion of the overall structure and thus contribute minimally to the metrics crystallographers use to assess quality during refinement. Third, with the exception of very high resolution structures, the quality of the electron density maps is such that properly placing the small molecule requires a fair bit of modeling. This challenge is complicated by the fact that most crystallographers were not trained as chemists and thus may not immediately recoil from a tetrahedral aromatic carbon atom. Also, much of the off-the-shelf software used for refining structures is not optimized for small molecules.
Nonetheless, there is good software available that properly accounts for small molecules. Hopefully publicizing errors will encourage more crystallographers to use it. In the meantime, caveat viewor!
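For the curious, here is a rough sketch of the kind of sanity check that would flag a tetrahedral "aromatic" carbon like the one mentioned above. It is not taken from Reynolds's article or from any refinement package. It simply measures how far an atom sits from the plane defined by its three bonded neighbors, which should be close to zero for a planar sp2 center. The function names and the 0.1 Å tolerance are assumptions chosen for illustration, and the coordinates would have to be pulled from the structure file by other means.

```python
import math


def _sub(a, b):
    """Component-wise difference of two 3D points."""
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def out_of_plane_distance(center, n1, n2, n3):
    """Distance (same units as the input, e.g. angstroms) from `center`
    to the plane through its three bonded neighbors n1, n2, n3."""
    normal = _cross(_sub(n2, n1), _sub(n3, n1))
    length = math.sqrt(_dot(normal, normal))
    if length == 0.0:
        raise ValueError("neighbor atoms are collinear; no plane is defined")
    return abs(_dot(_sub(center, n1), normal)) / length


def looks_nonplanar(center, n1, n2, n3, tolerance=0.1):
    """Flag a supposedly planar center. The 0.1 tolerance is an
    illustrative assumption, not a published cutoff."""
    return out_of_plane_distance(center, n1, n2, n3) > tolerance


# Three neighbors arranged at 120 degrees in the xy plane, 1.4 units out.
neighbors = [(1.4, 0.0, 0.0), (-0.7, 1.2124, 0.0), (-0.7, -1.2124, 0.0)]

flat_center = (0.0, 0.0, 0.0)      # sits in the plane, as an sp2 carbon should
pyramidal_center = (0.0, 0.0, 0.5)  # pushed well out of the plane

flat_flag = looks_nonplanar(flat_center, *neighbors)
pyramidal_flag = looks_nonplanar(pyramidal_center, *neighbors)
```

A check like this says nothing about whether the ligand actually fits the electron density; it only catches geometry that a chemist would reject on sight.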