Anyone who has been exposed to
much crystallography will have seen examples where a ligand binds somewhere
besides the active site of a protein. This is probably all the more likely in
the case of fragments, both because fragments are soaked at high concentrations
(and thus weaker ligands can be detected) and also because, being less complex,
fragments will be able to bind to more sites. In some cases, such as FPPS and
HCV NS3, ligands that bind at these “secondary sites” could be advanced to
potent allosteric inhibitors. But how common are such sites? This is the
question addressed by Harren Jhoti and Astex colleagues in a paper just
published in Proc. Natl. Acad. Sci. USA.
The researchers were privileged
to have 5590 crystal structures of 24 proteins with at least one bound ligand
from crystallographic fragment screens. Careful analysis to exclude buffers and
molecules bound at crystallographic interfaces left them with 53 sites total,
with each protein having a fragment bound in at least 1 site; one had 6 (still
far from the record 16 sites in HIV-1 RT discussed here). Importantly, 16 of
the targets had at least 2 ligand binding sites, with an overall average of 2.2. This
number of secondary sites is likely a lower bound, as some sites may have been
blocked by crystal packing.
What can be said about these
ligand binding sites? The researchers compared the sequence conservation between
orthologous proteins from different organisms and found that primary binding
sites are more conserved than the overall protein sequences. This is expected:
because the proteins likely have similar functions, evolutionary constraints
are stronger on the active-site residues surrounding the primary sites.
Interestingly though, the secondary sites were also significantly conserved, suggesting
that they too may have some sort of function.
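As a rough illustration of this kind of comparison, here is a minimal Python sketch that computes percent identity over a whole pairwise alignment and over a subset of binding-site columns. The sequences and site positions are made-up toy values, the function name percent_identity is hypothetical, and the paper's actual conservation metric may well differ.

```python
def percent_identity(seq_a, seq_b, positions=None):
    """Percent identity between two pre-aligned, equal-length sequences.

    positions: optional iterable of 0-based alignment columns to compare;
    if None, every column is used. Columns where either sequence has a
    gap ('-') are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    columns = range(len(seq_a)) if positions is None else positions
    compared = 0
    matches = 0
    for i in columns:
        a, b = seq_a[i], seq_b[i]
        if a == "-" or b == "-":
            continue  # ignore gapped columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


# Toy aligned "orthologs" (invented; NOT real CDK2 sequences)
human = "MKTAYIAKQR"
mouse = "MKSAYIAKHR"
# Hypothetical alignment columns lining a binding site
site_columns = [0, 1, 3, 4, 5]

global_identity = percent_identity(human, mouse)
site_identity = percent_identity(human, mouse, site_columns)
print(f"global: {global_identity:.1f}%  site: {site_identity:.1f}%")
```

A site counts as preferentially conserved in this toy picture when its identity exceeds the global value; a real analysis would need many orthologs and a statistical test rather than a single pair.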
Protein mobility was also
examined computationally, with the thought being that functional binding sites
should be more rigid than the overall surface of the protein so as to minimize
entropic costs of ligand binding. This turned out to be the case for all
primary ligand binding sites, but it was also true for most of the secondary
sites. Surprisingly, and in contrast to previous results, there were no
differences in normalized B factors (roughly, temperature-related motions) for
residues in either primary or secondary binding sites compared with surface
residues in general.
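For readers unfamiliar with normalized B factors, one common convention (an assumption here, since the post does not say which normalization the paper used) is to convert the B factors within each structure to z-scores so that values can be compared across structures. A minimal sketch with invented numbers and hypothetical names:

```python
from statistics import mean, pstdev


def normalize_b_factors(b_factors):
    """Convert per-residue B factors to z-scores within one structure.

    b_factors: dict mapping residue number -> B factor.
    Z-scoring is an assumed normalization, not necessarily the paper's.
    """
    values = list(b_factors.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("all B factors are identical; cannot normalize")
    return {res: (b - mu) / sigma for res, b in b_factors.items()}


# Invented per-residue B factors for a toy structure
raw = {1: 10.0, 2: 20.0, 3: 30.0, 4: 40.0}
# Hypothetical residue sets: which residues line a site vs. the rest of the surface
site_residues = [1, 2]
other_surface_residues = [3, 4]

normalized = normalize_b_factors(raw)
site_mean = mean(normalized[r] for r in site_residues)
surface_mean = mean(normalized[r] for r in other_surface_residues)
print(f"site mean z: {site_mean:.2f}  surface mean z: {surface_mean:.2f}")
```

In this toy case the site residues happen to be less mobile than the rest of the surface; the paper's finding was that no such difference showed up in its data.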
Comparing the physical properties
of the primary and secondary sites revealed that both were more lipophilic than
the rest of the protein surface. Ligands tended to be slightly more buried in
primary binding sites than in secondary sites, but there didn’t seem to be any
differences among the ligands themselves, though the twelve shown in the paper
are mostly “flat.”
These combined results suggest
that the majority of proteins have multiple sites capable of binding to small
molecule ligands. The researchers note that most of their examples are enzymes,
so it may not be fair to extrapolate to other protein classes. That said, many
GPCRs also have multiple ligand binding sites.
Secondary binding sites have
several things going for them. First, allosteric sites provide a means to
target proteins in which the primary binding site is problematic, perhaps
because it is too closely related to other proteins. Allosteric sites can also
be useful for targeting viral or cancer targets in which resistance is an issue,
as in the case of ABL001. Finally, secondary sites provide an opportunity to
develop not just inhibitors, but activators.
Of course, just because a
fragment binds at a site doesn’t necessarily mean that the site is ligandable. Indeed,
HSP70 appears to have 5 sites, yet by all accounts is an extremely difficult
target. Four of the proteins (including HSP70) are described in some detail in
the paper, with protein-fragment structures deposited in the Protein Data Bank.
It would be interesting to see how the secondary sites score as potential hot spots using software such as FTMap.
I checked the correlation between the results in the paper and the Allosite online tool for two proteins, and they seem to align pretty well (except for those shallow binding pockets on the periphery of proteins). https://bernatv.wordpress.com/2016/01/02/finding-new-pockets-with-fragments/
What bothers me the most is that argument that allosteric sites are conserved almost as much as orthosteric ones. All that I've learned up to now was based on the opposite assumption.
Thanks bernatv for the Allosite analysis - very interesting.
As to what the conservation of allosteric sites means for developing selective inhibitors, I guess I'm less concerned. Just because the pockets are somewhat conserved doesn't mean they are absolutely conserved, and it only takes one mutation to block binding. After all, the ATP binding sites of kinases are functionally and structurally conserved, yet it is still possible to develop extremely selective inhibitors. This will be all the more true for allosteric sites where the conservation is less.
Hi, Fred here (one of the authors on the paper). Thanks Dan for the review of the paper - very exciting to get mentioned on your blog! And thanks to bernatv for the comments. Just to clear one thing up: When we talk about sequence conservation in this paper we’re referring to conservation between orthologous proteins e.g. between Human CDK2 and Mouse CDK2, rather than conservation between different human proteins.
(We try to be consistent in our use of the terms homolog, paralog and ortholog: human CDK2 and CDK5 are paralogs. Human CDK2 and mouse CDK2 are orthologs. Both are examples of homologs)
When you talk about drug selectivity (infectious diseases aside) you’re generally only worried about the conservation between paralogs, e.g. how to generate selectivity for one particular human kinase vs another human kinase. The idea that allosteric/regulatory sites are less conserved than active sites when you compare different proteins in the same species seems pretty much accepted and we’re not arguing against this. This is of course one of the reasons that allosteric sites are interesting – they’re another way to achieve selectivity.
To rephrase what we were trying to say in the paper:
We’re comparing the “same” protein in different animals. E.g. Human CDK2 vs Mouse CDK2. We would expect these to have similar regulatory mechanisms, given they evolved from some common ancestor however many tens of millions of years ago. If those regulatory mechanisms involve binding at various alternative sites around the protein we’d expect those sites to be preferentially conserved.
During the x million years since mice and humans diverged from some common ancestor, their CDK2 genes have accumulated mutations. You can calculate the global sequence identity between mouse and human CDK2 and the sequence identity within specific regions (the primary and secondary sites).
We observe that the residues in both primary and secondary sites are conserved more than the rest of the protein, suggesting some evolutionary pressure to conserve those regions. Of course this isn’t by itself proof of biological function, you’d need to look at each pocket in much more depth and do a lot of biological validation, but the point we’re trying to make is that these things occur very frequently (more than half of the targets we look at) and many of them pass some of the checks you’d want to do before investing in the biological validation.
I hope that clears it up a bit, rather than just adding to the confusion!