06 January 2016

Secondary ligand binding sites are common

Anyone who has been exposed to much crystallography will have seen examples where a ligand binds somewhere besides the active site of a protein. This is probably all the more likely for fragments, both because fragments are soaked at high concentrations (so weaker binders can be detected) and because, being less complex, fragments can fit into more sites. In some cases, such as FPPS and HCV NS3, ligands that bind at these “secondary sites” could be advanced to potent allosteric inhibitors. But how common are such sites? This is the question addressed by Harren Jhoti and Astex colleagues in a paper just published in Proc. Natl. Acad. Sci. USA.

The researchers were privileged to have 5590 crystal structures of 24 proteins with at least one bound ligand from crystallographic fragment screens. Careful analysis to exclude buffers and molecules bound at crystallographic interfaces left them with 53 sites in total. Every protein had a fragment bound in at least one site, and one had six (still far from the record 16 sites in HIV-1 RT discussed previously on this blog). Importantly, 16 of the targets had at least two ligand binding sites, with an overall average of 2.2 sites per protein. This number of secondary sites is likely a lower bound, as some sites may have been blocked by crystal packing.

What can be said about these ligand binding sites? The researchers compared sequence conservation between orthologous proteins from different organisms and found that primary binding sites are more conserved than the protein sequences overall. This is expected: because orthologs likely have similar functions, the active-site residues surrounding the primary sites are under stronger evolutionary constraint. Interestingly, though, the secondary sites were also significantly conserved, suggesting that they too may have some sort of function.
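As a rough illustration of this kind of comparison, here is a minimal sketch that computes percent identity between two already-aligned ortholog sequences, once over all alignment columns and once over only the columns lining a binding site. It is not the authors' pipeline: the function name `percent_identity`, the toy sequences and the site positions are all hypothetical, and gap columns are simply skipped.

```python
def percent_identity(aln_a, aln_b, positions=None):
    """Percent identity between two pre-aligned sequences.

    Alignment columns where either sequence has a gap ('-') are skipped.
    If `positions` is given, only those 0-based alignment columns are used,
    e.g. the residues lining a binding site.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have the same length")
    columns = range(len(aln_a)) if positions is None else positions
    compared = identical = 0
    for i in columns:
        a, b = aln_a[i], aln_b[i]
        if a == "-" or b == "-":
            continue  # ignore insertions/deletions
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else float("nan")


# Hypothetical toy orthologs (not real CDK2 sequences) and site columns.
human = "MKTAYIAKQR"
mouse = "MKTSYIAKHR"
site_columns = [0, 1, 2, 5, 6]

print(percent_identity(human, mouse))                # 80.0 (global)
print(percent_identity(human, mouse, site_columns))  # 100.0 (site only)
```

A site whose identity is consistently higher than the global figure across many ortholog pairs is the kind of signal described in the paper; a real analysis would also need a proper alignment and a statistical test.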

Protein mobility was also examined computationally, on the reasoning that functional binding sites should be more rigid than the overall protein surface so as to minimize the entropic cost of ligand binding. This turned out to be the case for all primary ligand binding sites, and also for most of the secondary sites. Surprisingly, and in contrast to previous results, there were no differences in normalized B factors (roughly, a measure of atomic motion in the crystal) for residues in either primary or secondary binding sites compared with surface residues in general.
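Normalized B factors are commonly computed as z-scores within a structure, (B - mean) / standard deviation, so that crystals with different overall B values can be compared. The sketch below assumes that convention (the paper's exact normalization may differ) and uses made-up per-residue B factors with hypothetical site and surface residue labels to compare the average normalized value of each group.

```python
from statistics import mean, pstdev


def normalized_b_factors(b_factors):
    """Convert per-residue B factors to z-scores within one structure.

    Assumes the B factors are not all identical (otherwise sigma is zero).
    """
    values = list(b_factors.values())
    mu, sigma = mean(values), pstdev(values)
    return {res: (b - mu) / sigma for res, b in b_factors.items()}


def mean_normalized_b(z_scores, residues):
    """Average normalized B factor over a chosen set of residues."""
    return mean(z_scores[r] for r in residues)


# Made-up per-residue B factors (in Å^2) and hypothetical residue groupings.
b = {"ALA10": 20.0, "LEU11": 20.0, "LYS40": 30.0, "GLU41": 30.0}
site = ["ALA10", "LEU11"]
other_surface = ["LYS40", "GLU41"]

z = normalized_b_factors(b)
print(mean_normalized_b(z, site))           # -1.0
print(mean_normalized_b(z, other_surface))  # 1.0
```

The toy numbers exist only to exercise the code; in the paper, site residues did not differ from other surface residues on this measure.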

Comparing the physical properties of the primary and secondary sites revealed that both were more lipophilic than the rest of the protein surface. Ligands tended to be slightly more buried in primary binding sites than in secondary sites, but there didn’t seem to be any differences between the ligands bound at the two types of site, though the twelve shown in the paper are mostly “flat.”
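One simple way to put a number on "more lipophilic" is to average a hydropathy scale over the residues lining a site and compare it with the rest of the surface. The sketch below uses the Kyte-Doolittle scale purely as an illustrative assumption; the paper may well use a different lipophilicity measure, and the residue sets are invented.

```python
# Kyte-Doolittle hydropathy scale (higher = more hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(residues):
    """Mean Kyte-Doolittle score over a string of one-letter residue codes."""
    if not residues:
        raise ValueError("need at least one residue")
    return sum(KYTE_DOOLITTLE[r] for r in residues) / len(residues)


# Invented residue sets: residues lining a pocket vs. the rest of the surface.
site_residues = "LIVFAG"
other_surface = "KDESRNQT"

print(mean_hydropathy(site_residues) > mean_hydropathy(other_surface))  # True
```

A per-residue scale ignores which atoms are actually solvent exposed; a surface-area-weighted measure would be closer to what "lipophilic surface" usually means.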

These combined results suggest that the majority of proteins have multiple sites capable of binding small-molecule ligands. The researchers note that most of their examples are enzymes, so it may not be fair to extrapolate to other protein classes. That said, many GPCRs also have multiple ligand binding sites.

Secondary binding sites have several things going for them. First, allosteric sites provide a means to target proteins in which the primary binding site is problematic, perhaps because it is too closely related to other proteins. Allosteric sites can also be useful for targeting viral or cancer targets in which resistance is an issue, as in the case of ABL001. Finally, secondary sites provide an opportunity to develop not just inhibitors, but activators.

Of course, just because a fragment binds at a site doesn’t necessarily mean that the site is ligandable. Indeed, HSP70 appears to have five sites, yet by all accounts is an extremely difficult target. Four of the proteins (including HSP70) are described in some detail in the paper, with protein-fragment structures deposited in the Protein Data Bank. It would be interesting to see how the secondary sites score as potential hot spots using software such as FTMap.

Still, knowing that secondary binding sites are the norm rather than the exception gives new impetus to look for them. It also suggests new areas of biology to explore. Molecular complexity is one thing, but it pales in comparison to biological complexity.


bernatv said...

I checked how well the results in the paper correlate with the Allosite online tool for two proteins, and they seem to align pretty well (except for the shallow binding pockets on the periphery of the proteins). https://bernatv.wordpress.com/2016/01/02/finding-new-pockets-with-fragments/
What bothers me the most is the argument that allosteric sites are conserved almost as much as orthosteric ones. Everything I've learned up to now was based on the opposite assumption.

Dan Erlanson said...

Thanks bernatv for the Allosite analysis - very interesting.

As to what the conservation of allosteric sites means for developing selective inhibitors, I guess I'm less concerned. Just because the pockets are somewhat conserved doesn't mean they are absolutely conserved, and it only takes one mutation to block binding. After all, the ATP binding sites of kinases are functionally and structurally conserved, yet it is still possible to develop extremely selective inhibitors. This will be all the more true for allosteric sites where the conservation is less.

Fred Ludlow said...

Hi, Fred here (one of the authors on the paper). Thanks Dan for the review of the paper - very exciting to get mentioned on your blog! And thanks to bernatv for the comments. Just to clear one thing up: when we talk about sequence conservation in this paper we’re referring to conservation between orthologous proteins, e.g. between human CDK2 and mouse CDK2, rather than conservation between different human proteins.

(We try to be consistent in our use of the terms homolog, paralog and ortholog: human CDK2 and CDK5 are paralogs. Human CDK2 and mouse CDK2 are orthologs. Both are examples of homologs)

When you talk about drug selectivity (infectious diseases aside) you’re generally only worried about the conservation between paralogs, e.g. how to generate selectivity for one particular human kinase vs another human kinase. The idea that allosteric/regulatory sites are less conserved than active sites when you compare different proteins in the same species seems pretty much accepted and we’re not arguing against this. This is of course one of the reasons that allosteric sites are interesting – they’re another way to achieve selectivity.

To rephrase what we were trying to say in the paper:
We’re comparing the “same” protein in different animals, e.g. human CDK2 vs mouse CDK2. We would expect these to have similar regulatory mechanisms, given they evolved from some common ancestor however many tens of millions of years ago. If those regulatory mechanisms involve binding at various alternative sites around the protein, we’d expect those sites to be preferentially conserved.

During the x million years since mice and humans diverged from some common ancestor, their CDK2 genes have accumulated mutations. You can calculate the global sequence identity between mouse and human CDK2 and the sequence identity within specific regions (the primary and secondary sites).

We observe that the residues in both primary and secondary sites are conserved more than the rest of the protein, suggesting some evolutionary pressure to conserve those regions. Of course this isn’t by itself proof of biological function, you’d need to look at each pocket in much more depth and do a lot of biological validation, but the point we’re trying to make is that these things occur very frequently (more than half of the targets we look at) and many of them pass some of the checks you’d want to do before investing in the biological validation.

I hope that clears it up a bit, rather than just adding to the confusion!