The amino acids histidine (H or His), glutamic acid (E or Glu), arginine (R or Arg) and aspartic acid (D or Asp) are often found in ligand-binding sites in proteins. As such, finding fragments that preferentially interact with these amino acids could be useful for fragment-based ligand discovery. David Selwood and colleagues at University College London have combed the Protein Data Bank (PDB) to do just this; they report their results in a recent issue of J. Med. Chem.
The researchers analyzed over 8000 high-resolution protein-ligand structures in which the ligands formed hydrogen bonds with the side chains of His, Glu, Arg, or Asp. They defined fragments as “the largest ring assembly containing the atoms involved in hydrogen bonding.” This excludes functional groups linked to aliphatic chains, but given the importance of rings in most drug molecules, this limitation seems reasonable. A total of 462 fragments were found; the number of fragments making hydrogen bonds with each amino acid was broadly similar, with a low of 130 for Arg and a high of 159 for Asp.
The diversity of fragments that interact with the acidic side chains of Asp and Glu is lower than that of fragments interacting with the basic side chains of His and Arg. Not surprisingly, amidines represent a large fraction of the fragments that bind the acidic residues; these form two hydrogen bonds with either Glu or, more frequently, Asp. Cyclic diols are also common double hydrogen-bonding fragments for Glu and Asp, while cyclic aliphatic amines are perhaps less common than one might expect.
Among fragments that interact with Arg, 124 (91%) do so through an oxygen atom, with only 7 (5%) interacting through nitrogen, and a handful interacting through halogens (5) or sulfur (1). The His residue also shows a preference for oxygen-containing fragments, though at 63% this is less pronounced than for the much more basic Arg.
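As a quick check on the Arg numbers, the short Python sketch below recomputes the percentage shares from the quoted counts. The denominator is an assumption: it uses the sum of the four listed counts (137), which reproduces the quoted 91% and 5%. That sum is slightly larger than the 130 fragments reported for Arg overall; the text does not say why, though one possibility is that some fragments are counted under more than one atom type. The names `arg_counts` and `percentage_shares` are purely illustrative.

```python
# Counts of Arg-interacting fragments by ligand atom type, as quoted in the text.
arg_counts = {"oxygen": 124, "nitrogen": 7, "halogen": 5, "sulfur": 1}


def percentage_shares(counts):
    """Return each category's share of the summed counts, rounded to a whole percent.

    Assumption: the denominator is the sum of the listed counts, not a
    separately reported total.
    """
    total = sum(counts.values())
    return {name: round(100 * n / total) for name, n in counts.items()}


shares = percentage_shares(arg_counts)
for name, pct in shares.items():
    print(f"{name}: {arg_counts[name]} fragments, about {pct}%")
```

The same helper could be pointed at the His counts if the per-atom breakdown were available; the text gives only the 63% oxygen figure for that residue.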
One of the attractive features of this work is that, because it focuses on fragments that form hydrogen bonds, the free energy of binding is likely to be dominated by enthalpic rather than entropic terms. As has been discussed previously, this has some potential advantages for drugs.
This sort of analysis always leads one to ask whether the identified fragments represent limits on what could bind, or, as the researchers also speculate, “the limited variability of currently available chemical libraries from which drugs are derived.” There is clearly some justification for thinking the latter: the preponderance of amidines is likely due to the number of serine proteases that have been targeted with this functional group. Nonetheless, the overall set of fragments could be quite useful for computational screening. In fact, many of them could also be incorporated into physical fragment libraries. Perhaps one of the enterprising commercial fragment library suppliers will put together a sub-library based on this work.