27 June 2022

CovPDB: a free, searchable database of covalent protein-ligand structures

Last week we highlighted KinaFrag, a database of kinase-fragment complexes. Continuing the theme, this week brings us CovPDB, a database of high-resolution covalent protein-ligand structures. The database was described by Stefan Günther and colleagues at Albert-Ludwigs-Universität Freiburg in an open-access Nucleic Acids Res. paper earlier this year.
 
The researchers downloaded all structures from the Protein Data Bank (PDB) as of 31 August 2020 and extracted those with covalently bound ligands and a resolution of 2.5 Å or better. These were then manually curated to remove cofactors (such as retinal) and crosslinkers. Next, the chemical structures of the pre-reacted ligands were extracted from the primary citations. Everything was then combined into an easy-to-use database, and all the contents can also be downloaded.
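For readers who think in code, here is a minimal sketch of that selection logic. It is not the authors' pipeline: the `Entry` record, its field names, and the `ligand_role` labels are all hypothetical, and the manual curation step is represented only by a label assumed to have been assigned by hand beforehand. The IDs in the usage note are made up.

```python
from dataclasses import dataclass

# Hypothetical record type; the field names are illustrative and are
# NOT CovPDB's actual schema.
@dataclass
class Entry:
    entry_id: str
    resolution: float          # in Å; smaller numbers mean better resolution
    has_covalent_ligand: bool
    ligand_role: str           # assumed hand-assigned label, e.g. "inhibitor"

MAX_RESOLUTION = 2.5                          # cutoff stated in the post, in Å
EXCLUDED_ROLES = {"cofactor", "crosslinker"}  # classes removed during curation

def select_entries(entries):
    """Keep covalent complexes at 2.5 Å or better that are not cofactors or crosslinkers."""
    return [
        e for e in entries
        if e.has_covalent_ligand
        and e.resolution <= MAX_RESOLUTION
        and e.ligand_role not in EXCLUDED_ROLES
    ]
```

With made-up records such as `Entry("demo1", 1.8, True, "inhibitor")` and `Entry("demo2", 3.1, True, "inhibitor")`, only the first would survive; a record labeled `"cofactor"` would be dropped regardless of resolution.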
 
CovPDB contains 2,294 unique protein-ligand complexes, with 733 different proteins and 1,501 different ligands. A total of 93 different types of warheads are represented, from exotic (arsine oxide) to conventional (vinyl carbonyl, including acrylamides). These are further grouped into 21 covalent mechanisms.
 
As expected, covalent bonds to cysteine and serine are most common, with 959 and 830 examples, respectively. Lysine, with 205 representatives, is a distant third, but I was surprised that various unreactive amino acid residues such as glycine, valine, and proline also showed up. Closer inspection revealed that these are N-terminal residues; the ligand reacts with the free amine. Though these sorts of bonds occur with several drugs, including carfilzomib and voxelotor, it might be nice to have separate annotations to keep these from being confused with residues that react exclusively at the side chain.
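To make the annotation idea concrete, here is a rough sketch of how an attachment site might be labeled. This is my own illustration, not something CovPDB provides: the function name and the label strings are hypothetical, and it assumes you already know the standard PDB atom name of the protein atom bonded to the ligand and whether the residue is the first in its chain.

```python
# Standard PDB names for protein backbone atoms (OXT is the extra
# C-terminal oxygen).
BACKBONE_ATOMS = {"N", "CA", "C", "O", "OXT"}

def attachment_site(atom_name: str, is_n_terminal: bool) -> str:
    """Label where a ligand is attached (hypothetical labels, for illustration)."""
    # The backbone nitrogen of the first residue is the free N-terminal amine.
    if atom_name == "N" and is_n_terminal:
        return "N-terminal amine"
    # Any other backbone atom is flagged separately rather than guessed at.
    if atom_name in BACKBONE_ATOMS:
        return "backbone (other)"
    # Everything else, such as SG in cysteine, OG in serine, or NZ in
    # lysine, belongs to the side chain.
    return "side chain"
```

Under this scheme a bond to `SG` would be labeled a side-chain attachment, while a bond to the backbone `N` of a chain's first residue would be labeled an N-terminal amine, whatever the residue type.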
 
Browsing by ligand, protein, complex, warhead, covalent mechanism, or targeted residue is straightforward, as is searching by multiple methods, including ligand similarity and substructure. Each entry has its own page with a wealth of information, including an interactive 3D viewer. Here’s the entry for one of the Tethering hits that ultimately led to sotorasib.
 



 
CovPDB should be especially useful to computational folks looking to build models based on high-quality data, but it's also fun to browse for new ideas and inspiration.
 
Importantly, the researchers state that they will update this database annually. As covalent drug discovery (including with fragments) becomes increasingly prominent, I expect the size of CovPDB to grow rapidly.

1 comment:

Christophe said...

A similar database is CovalentInDB (In=Inhibitor):
http://cadd.zju.edu.cn/cidb/