05 May 2014

Biofragments: extracting signal from noise, and the limits of three-dimensionality

What does this protein do? Now that any genome can be sequenced, this question gets raised quite often. In many cases it is possible to give a rough answer based on protein sequence (this protein is a serine protease, that one is a protein tyrosine kinase), but figuring out the specific substrates can be more of a challenge. In a recent paper in ChemBioChem, Chris Abell and collaborators at the University of Cambridge and the University of Manchester attempt to answer this question with fragments.

The bacterium Mycobacterium tuberculosis (Mtb), which causes tuberculosis, has 20 cytochrome P450 proteins (CYPs), heme-containing enzymes that usually oxidize small molecules. Although some are essential for the pathogen, it is not clear what many of them do. The researchers used an approach called “biofragments” to try to pin down the substrate of CYP126.

The biofragments approach starts by selecting a collection of fragments based on known substrates. Of course, the specific substrates are not known, so in this case the researchers started with a set of several dozen natural (i.e., non-synthetic) substrates of various other CYPs, both bacterial and eukaryotic. They then computationally screened the ZINC database of commercial molecules for fragments most similar to these substrates and purchased 63 of them. Perhaps not surprisingly given their similarity to natural products, these turned out to be more “three-dimensional” than conventional fragment libraries, as assessed both by the fraction of sp3-hybridized carbons and by principal moments of inertia.
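For readers who have not met these two shape metrics, here is a minimal pure-Python sketch of how they are commonly computed. It is not the code or tooling used in the paper, which this post does not describe; the function names are hypothetical and the two test shapes are idealized unit-mass geometries, not real fragments. The fraction of sp3 carbons is simply a ratio of counts. The normalized principal moment-of-inertia ratios (I1/I3, I2/I3) place a molecule on the familiar triangle whose corners are rod (0, 1), disc (0.5, 0.5), and sphere (1, 1).

```python
import math

def fraction_sp3(n_sp3_carbons, n_carbons):
    """Fsp3: sp3-hybridized carbons divided by all carbons."""
    if n_carbons == 0:
        raise ValueError("molecule has no carbon atoms")
    return n_sp3_carbons / n_carbons

def inertia_tensor(atoms):
    """atoms: list of (mass, x, y, z). Returns the 3x3 tensor about the center of mass."""
    total = sum(a[0] for a in atoms)
    cx = sum(a[0] * a[1] for a in atoms) / total
    cy = sum(a[0] * a[2] for a in atoms) / total
    cz = sum(a[0] * a[3] for a in atoms) / total
    t = [[0.0] * 3 for _ in range(3)]
    for m, x, y, z in atoms:
        dx, dy, dz = x - cx, y - cy, z - cz
        t[0][0] += m * (dy * dy + dz * dz)
        t[1][1] += m * (dx * dx + dz * dz)
        t[2][2] += m * (dx * dx + dy * dy)
        t[0][1] -= m * dx * dy
        t[0][2] -= m * dx * dz
        t[1][2] -= m * dy * dz
    t[1][0], t[2][0], t[2][1] = t[0][1], t[0][2], t[1][2]
    return t

def principal_moments(t):
    """Eigenvalues of a symmetric 3x3 matrix (closed-form trigonometric method), ascending."""
    q = (t[0][0] + t[1][1] + t[2][2]) / 3
    p1 = t[0][1] ** 2 + t[0][2] ** 2 + t[1][2] ** 2
    p2 = sum((t[i][i] - q) ** 2 for i in range(3)) + 2 * p1
    p = math.sqrt(p2 / 6)
    if p == 0.0:
        return [q, q, q]  # perfectly isotropic mass distribution
    b = [[(t[i][j] - (q if i == j else 0.0)) / p for j in range(3)] for i in range(3)]
    det = (b[0][0] * (b[1][1] * b[2][2] - b[1][2] * b[2][1])
           - b[0][1] * (b[1][0] * b[2][2] - b[1][2] * b[2][0])
           + b[0][2] * (b[1][0] * b[2][1] - b[1][1] * b[2][0]))
    r = max(-1.0, min(1.0, det / 2))  # clamp against rounding error
    phi = math.acos(r) / 3
    largest = q + 2 * p * math.cos(phi)
    smallest = q + 2 * p * math.cos(phi + 2 * math.pi / 3)
    middle = 3 * q - largest - smallest
    return sorted([smallest, middle, largest])

def normalized_pmi_ratios(atoms):
    """Return (I1/I3, I2/I3) with I1 <= I2 <= I3."""
    i1, i2, i3 = principal_moments(inertia_tensor(atoms))
    return i1 / i3, i2 / i3

# Idealized shapes with unit masses (illustrative only, not real fragments).
flat_ring = [(1.0, math.cos(k * math.pi / 3), math.sin(k * math.pi / 3), 0.0)
             for k in range(6)]
rod = [(1.0, -1.0, 0.0, 0.0), (1.0, 1.0, 0.0, 0.0)]

print(normalized_pmi_ratios(flat_ring))  # close to the disc corner
print(normalized_pmi_ratios(rod))        # close to the rod corner
print(fraction_sp3(0, 6), fraction_sp3(6, 6))  # benzene-like vs cyclohexane-like
```

On this scale a flat aromatic ring sits at the disc corner with an Fsp3 of zero, which is the kind of fragment that dominated the hits described below. In practice one would compute both metrics with a cheminformatics toolkit from real 3D conformers rather than by hand.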

Next, the researchers screened their fragments against CYP126 using three different NMR techniques (CPMG, STD, and WaterLOGSY). Since they were primarily interested in hits that bind at the active site, they also used a displacement assay in which the synthetic heme-binding drug ketoconazole was competed against fragments. This exercise yielded 9 hits – a relatively high 14% hit rate.

Strikingly, all of the hits are aromatic, and 7 of them could reasonably be described as planar. In other words, even though the biofragment library was relatively three-dimensional, the confirmed hits were some of the flattest in the library! The researchers interpreted this to mean that “CYP126 might preferentially recognize aromatic moieties within its catalytic site,” but there could be something more general going on – perhaps aromatics are simply less complex, and thus more promiscuous.

Examining the fragment hits more closely, the researchers found that one of them – a dichlorophenol – produced a spectrophotometric shift similar to that produced by substrates when bound to the enzyme. This led them to look for similar structures among proposed Mtb metabolites. Weirdly, pentachlorophenol came up as a possible hit, and a spectrophotometric shift assay revealed that this molecule does have relatively high affinity for CYP126. Whether this is a biologically relevant substrate for the enzyme remains to be seen.

This is an intriguing approach, but I do have reservations. First, in constructing fragment libraries based on natural products, it is essential to avoid anything too “funky”. The Abell lab is one of the top fragment groups out there, well aware of potential artifacts, and has a long history of studying CYPs, but researchers with less experience could easily populate a library with dubious compounds.

More fundamentally, though, I wonder about the basic premise of biofragments. The whole point of fragments is that they have low molecular complexity and are thus likely to bind to many targets, so is it realistic to try to extract selectivity data from them? Indeed, as we’ve seen (here and here), the selectivity of a fragment does not necessarily predict the selectivity of larger molecules.

That said, the approach is worth trying. Even if it doesn’t ultimately lead to new insights into proteins’ natural substrates, it could lead to new inhibitors.

3 comments:

Dr. Teddy Z said...

I saw this paper and was intrigued. It is a very similar approach to the Emerald Fragments of Life. This is also something I have done in the past. It is my experience/impression that this is not a fantastic source of fragment hits. I think you would be just as successful starting with a library of all 20 amino acids as with "substrate fragments". Their approach might be useful for a target validation or "What the heck does this enzyme do?" study, but I am wholly unconvinced that this is a good entree into any sort of fragment diversity.

Michael Chimenti said...

Although not strictly fragment-based, a colleague in our lab has had success with a related approach. She relied on molecular docking of the entire KEGG library of metabolites and the genome context of the unknown bacterial enzymes to assign function.

http://www.ncbi.nlm.nih.gov/pubmed/24056934

To me, this seems like a kind of in-silico parallel to the experimental approach discussed above. The experimental approach of looking for fragment binders is perhaps handicapped by the small size and diversity of the collection. Docking used in this kind of exploratory analysis is not limited by library size or complexity.

Nate said...

A recent paper in JACS concludes that fragment-based approaches to substrate discovery are unlikely to be successful.

http://pubs.acs.org/doi/abs/10.1021/ja501354q