20 March 2023

Versatile fragments from the Protein Data Bank

Four years ago we highlighted an analysis of fragments taken from the Protein Data Bank (PDB). Of 462 unique fragments, just 21 bound in more than one pocket. With the assumption that such “versatile” fragments may be particularly valuable starting points, Esther Kellenberger and colleagues at CNRS Université de Strasbourg have done their own exploration of the PDB, as reported (open access) in Front. Chem.
 
Structures deposited in the PDB starting in 2000 with resolution better than 3 Å were examined to find those containing fragment-sized molecules (MW < 300 Da). Crystallization additives, phosphate and sulfate ions, and other unlovable molecules such as PAINS were excluded. Further triaging for fragments that bound in more than one pocket and in more than one binding mode (i.e., different types of interactions) ultimately yielded a set of 203 versatile fragments. (One reason so many more fragments were found in this study is that the previous analysis required the word “fragment” to be present in the PDB entry.)
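
For readers who want to try a similar triage on their own structural data, here is a minimal sketch of the filtering logic described above. It is not the authors' pipeline: the `Observation` record, its field names, and the `excluded` flag (standing in for additives, ions, and PAINS) are all hypothetical, and pocket and binding-mode labels are assumed to have been assigned beforehand.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One ligand seen in one PDB entry (hypothetical record layout)."""
    fragment_id: str
    deposition_year: int
    resolution: float      # in Å; smaller is better
    mol_weight: float      # in Da
    pocket_id: str         # assumed to be precomputed
    binding_mode: str      # assumed to be precomputed interaction label
    excluded: bool = False  # additive, phosphate/sulfate ion, PAINS, etc.


def versatile_fragments(observations):
    """Return IDs of fragments seen in >1 pocket and with >1 binding mode."""
    pockets = defaultdict(set)
    modes = defaultdict(set)
    for obs in observations:
        if obs.excluded:
            continue
        # Cutoffs taken from the post: deposited from 2000 onward,
        # resolution better than 3 Å, MW below 300 Da.
        if obs.deposition_year < 2000:
            continue
        if obs.resolution >= 3.0 or obs.mol_weight >= 300.0:
            continue
        pockets[obs.fragment_id].add(obs.pocket_id)
        modes[obs.fragment_id].add(obs.binding_mode)
    return sorted(
        frag for frag in pockets
        if len(pockets[frag]) > 1 and len(modes[frag]) > 1
    )
```

Note that a fragment seen in two pockets but with the same binding mode in both would not pass this filter, which matches the "more than one pocket and more than one binding mode" requirement.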
 
The versatile fragments are mostly compliant with the rule of three, with violations mostly related to the number of hydrogen bond donors or acceptors. Only a single molecule had ClogP > 3, though 50 were quite hydrophilic, with ClogP < 0. Interestingly, 45 of the molecules are listed as small molecule drugs, and 98 are substructures of approved drugs. Perhaps this is not surprising; drugs themselves are studied particularly intensively and frequently included in screening libraries.
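
As a quick illustration of what "compliant with the rule of three" means in practice, the sketch below checks precomputed descriptors against the commonly quoted cutoffs (MW ≤ 300 Da, ClogP ≤ 3, and no more than three hydrogen bond donors or acceptors). The function name and the example values are ours and purely illustrative; the paper may apply the criteria somewhat differently, and descriptor values depend on the software used to calculate them.

```python
def rule_of_three_violations(mol_weight, clogp, hbd, hba):
    """List which rule-of-three criteria a molecule violates.

    Descriptors are assumed to be precomputed elsewhere
    (for example with a cheminformatics toolkit).
    """
    violations = []
    if mol_weight > 300:
        violations.append("MW")
    if clogp > 3:
        violations.append("ClogP")
    if hbd > 3:
        violations.append("HBD")
    if hba > 3:
        violations.append("HBA")
    return violations


# Made-up descriptor values, for illustration only.
print(rule_of_three_violations(150.0, 1.2, 1, 2))   # small, compliant
print(rule_of_three_violations(240.0, -1.0, 3, 5))  # too many acceptors
```

A polar, nucleoside-like fragment would typically trip the donor or acceptor limits rather than the lipophilicity limit, consistent with the pattern of violations noted above.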
 
The researchers had previously analyzed commercial libraries, and in the new paper they compared versatile fragments with the SpotXplorer library we wrote about here and the functionally diverse fragments used at XChem. Surprisingly there was very little overlap, even though most of the versatile fragments or analogs are commercially available. That said, some of the versatile fragments are molecules one may not want in a fragment library, such as the cofactor lipoic acid and the metal chelator 1,10-phenanthroline.
 
Binding modes for the same fragment in different pockets could vary considerably. The “universal fragment” 4-bromopyrazole, which we wrote about here, bound in two different binding modes, while the nucleoside thymidine showed a whopping 26 different binding modes. Conformations of the fragments could vary too, with only 43% of fragments showing a conserved conformation in all binding sites (defined as < 0.5 Å RMSD). Conformational changes, along with different protonation states, could be among the reasons why predicting fragment binding continues to be challenging.
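
To make the 0.5 Å RMSD criterion concrete, here is a rough sketch of how one might test whether a fragment's conformation is conserved across binding sites. It assumes the conformers have already been superimposed and that atoms are listed in the same order in each; the paper's actual alignment protocol is not described here, and the all-pairs comparison is our assumption.

```python
import math
from itertools import combinations


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two matched coordinate lists (Å)."""
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must be non-empty and equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


def is_conformation_conserved(conformers, threshold=0.5):
    """True if every pair of conformers differs by less than the threshold."""
    return all(
        rmsd(a, b) < threshold for a, b in combinations(conformers, 2)
    )
```

With a single conformer there are no pairs to compare, so the function returns True; whether that counts as "conserved" is a choice one would want to make explicitly in a real analysis.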
 
This is a nice analysis, and it may be worth adding some of these versatile fragments to your own library. Laudably, SMILES strings for all of them are provided in the supplementary material.

3 comments:

Anonymous said...

Hi,

Thanks for sharing a nice study!

Don't you think that all activities related to mining ligands in the PDB (numerous studies over the last 10 years!) and designing libraries on that basis lead to very polar collections rich in HBD/HBA? Such structures are usually rather far from a typical fragment hit, don't often score in screening campaigns, and are not that easy to progress (if they are evolvable at all, to be honest). Just a thought.

Dan Erlanson said...

Hi Anonymous,
It's an interesting point, and I think some of these fragments are perhaps too polar to be easily advanced, but it is worth noting that most fragments that have advanced to leads do have at least one polar interaction with the protein according to this analysis. Indeed, here's an example of a clinical compound that started from a very polar fragment that made multiple hydrogen-bonding interactions with the protein, all of which were maintained during optimization.