16 December 2024

How to build a covalent fragment library

Covalent fragment-based lead discovery is becoming ever more popular, driven by success against difficult targets such as KRAS G12C. These efforts require the design of new libraries, and in a recent J. Med. Chem. paper Simon Lucas and colleagues at AstraZeneca describe their design philosophy. (Co-author Henry Blackwell presented some of this work at the CHI FBLD meeting earlier this year.)
 
AstraZeneca has taken great care in building their fragment libraries; we discussed the revamp of their general fragment library as well as a “low HBD” (hydrogen bond donor) library here and here. For their covalent library, they considered several design features. First, given that any warhead will add molecular weight (four non-hydrogen atoms and a hydrogen-bond acceptor for an acrylamide), larger molecules are necessary, which requires relaxing the rule of three. Indeed, the researchers refer to their library as “lead-like.”
 
Because larger fragments are more complex, more are needed to explore chemical space. The researchers have built their library to 12,000 compounds, larger than the library of the typical respondent to our poll last year. They have also chosen compounds to be maximally diverse rather than including near neighbors.
 
Attractive covalent hits make specific interactions with a protein target; warheads that are too “hot” can react non-specifically, as is the case with certain PAINS. Thus, the researchers chose molecules having moderate reactivity with the biologically relevant nucleophile glutathione (GSH).
 
The design principles are summarized as:
  • Molecular weight 250-400 Da
  • cLogD 0-4
  • GSH t1/2 > 100 minutes
  • Propensity for molecular interactions (such as hydrogen bond donors and acceptors)
  • Diversity
  • No diastereomeric mixtures (racemates are OK)
  • Synthetically tractable
  • Purity > 85% (and stable)
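
The numeric criteria in the list above lend themselves to a simple filter. The following is only a minimal sketch, not the procedure used in the paper: the `Candidate` fields and the function name are hypothetical, the ranges are assumed to be inclusive, and the qualitative criteria (interaction propensity, diversity, synthetic tractability, stability) are not modeled at all.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical record for one candidate library member."""
    name: str
    mol_weight: float            # Da
    clogd: float                 # calculated logD
    gsh_half_life_min: float     # half-life against glutathione, minutes
    purity_pct: float            # percent purity
    is_diastereomeric_mixture: bool


def passes_numeric_criteria(c: Candidate) -> bool:
    """Apply only the numeric design criteria listed above.

    Assumption: the 250-400 Da and 0-4 cLogD ranges include their endpoints.
    """
    return (
        250 <= c.mol_weight <= 400
        and 0 <= c.clogd <= 4
        and c.gsh_half_life_min > 100
        and c.purity_pct > 85
        and not c.is_diastereomeric_mixture
    )


# Made-up example compounds, purely for illustration.
examples = [
    Candidate("moderate_acrylamide", 320.0, 2.1, 240.0, 95.0, False),
    Candidate("too_reactive", 310.0, 1.8, 30.0, 97.0, False),
]
for candidate in examples:
    print(candidate.name, passes_numeric_criteria(candidate))
```

A filter like this would only be a first pass; diversity selection in particular depends on the whole set rather than on any single compound.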
 
These criteria were used to select ~700 historical compounds from within AstraZeneca’s collection. Next, the researchers chose amines from their internal collection and capped these with an acrylamide moiety, leading to an additional 1200 molecules. They then turned to custom synthesis of scaffolds that were under-represented, commercial compounds, and covalent warheads besides acrylamides, such as cyclic sulfones. The final library consists of 88% acrylamides. Molecular weights range from 150 to 420 Da, and compounds contain 1-6 HBAs, 0-3 HBDs, and 1-3 rings.
 
The paper briefly describes a screen against Bfl-1 (or BFL1), a difficult oncology target we wrote about earlier this year. The protein contains a cysteine residue in the biologically important BH3 binding site, and previous research by others had identified covalent binders.
 
The AstraZeneca researchers tested Bfl-1 against an early version of the library having just 1400 compounds, which were incubated at 20 or 200 µM for 24 hours at 4 °C before analysis by intact protein mass spectrometry. Hits were defined as compounds that gave >50% single labeling and could be competed with a peptide derived from the binding partner BIM. Six hits are shown in the paper, with kinact/KI values ranging from 0.7 to 9.5 M⁻¹s⁻¹, comparable to some of the early KRAS G12C hits. Further development of these molecules is described in a pair of papers that will be the subject of my next post.
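
To put kinact/KI values of this size in context, here is a rough back-of-the-envelope sketch. It assumes the simplest irreversible model: compound in large excess over protein and at a concentration well below KI, so that labeling is pseudo-first-order with rate constant (kinact/KI) × [I]. The source does not state the conditions under which the rate constants were measured, so applying them to a 24-hour incubation at 4 °C is purely illustrative. The function name is hypothetical.

```python
import math


def fraction_labeled(kinact_over_ki: float, conc_molar: float, time_s: float) -> float:
    """Fraction of protein labeled under a pseudo-first-order approximation.

    Assumes [I] << KI and inhibitor in excess, so k_obs = (kinact/KI) * [I].
    kinact_over_ki is in M^-1 s^-1, conc_molar in M, time_s in seconds.
    """
    k_obs = kinact_over_ki * conc_molar
    return 1.0 - math.exp(-k_obs * time_s)


ONE_DAY_S = 24 * 3600
for rate in (0.7, 9.5):            # range of kinact/KI values quoted above
    for conc_um in (20, 200):      # screening concentrations quoted above
        frac = fraction_labeled(rate, conc_um * 1e-6, ONE_DAY_S)
        print(f"kinact/KI = {rate} M^-1 s^-1, {conc_um} uM: {frac:.0%} labeled")
```

Under these assumptions even the slowest hit (0.7 M⁻¹s⁻¹) would reach roughly 70% labeling at 20 µM over 24 hours, which illustrates why long incubations can surface weakly reactive fragments.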
 
Including Bfl-1, the library has been screened against 15 targets using mass spectrometry, typically yielding hit rates of 1-2%, with hits defined as at least 20% labeling of a single site. Given this record of success, if you’re contemplating building a covalent library, this paper is well worth studying.

09 December 2024

They may be cons, but they’re our CONS

Practical Fragments has written repeatedly about various assay artifacts (vide infra). Different technologies are susceptible to different interference mechanisms, making general rules difficult. Earlier this year we wrote about the Metal Ion Interference Set, or MIIS: a collection of a dozen salts that could be used to assess the sensitivity of assays to metal contaminants. In a recent open-access JACS Au paper, Huabin Hu (Uppsala University), Jonathan Baell (Monash University), and collaborators extend the concept to small molecules.
 
The researchers have compiled a Collection Of useful Nuisance compounds, or CONS, perhaps with a nod to “Chemical con artists foil drug discovery” published a decade ago, which we highlighted here. The 103 members of the CONS are divided into three categories.
 
The first set contains five aggregators: molecules that have been shown to form colloidal clusters that non-specifically interfere with biological assays, as discussed here.
 
The largest set, at 67 members, consists of PAINS, or pan-assay interference compounds, which we first wrote about in 2010. These are themselves divided into various subcategories: non-specific electrophiles such as curcumin and an isothiazolone, redox cyclers such as quinones, contaminants such as the decomposition products of certain fused tetrahydroquinolines, miscellaneous, metal chelators, and additional mechanisms including optical interference and singlet oxygen quenchers, which are particularly problematic in AlphaScreen assays.  
 
The last set consists of 31 compounds that can cause problems in phenotypic assays. Some of these non-specifically disrupt cell membranes. Others have well-defined but toxic effects, such as interfering with tubulin or intercalating into DNA. Such bioactivity is not always a bad thing: some of these molecules, such as topotecan and colchicine, are approved drugs, but it’s useful to be aware of whether these types of activities will affect your assay.
 
One criticism of the PAINS concept is that it lumps together multiple mechanisms. (Pete Kenny wrote about this recently.) Another criticism is that, by focusing on chemical substructures, true hits may be unfairly deprioritized based on structure alone. What’s nice about the CONS list is that the potentially interfering mechanisms of each molecule are documented and categorized so they can be considered when establishing an assay. For example, you may not care whether a compound interferes in a phenotypic assay if you are performing a screen on an isolated enzyme.
 
The entire set of compounds is available from Enamine, and additional vendors are provided in a supplementary table. If you’re doing a lot of assays, particularly on new targets and mechanisms, it may be worth testing the CONS to understand what kinds of false positives might occur.

02 December 2024

Mapping protein conformations with fragments

Proteins can be remarkably dynamic, and, as we noted recently, different conformational states can reveal different pockets for small molecule ligands. But how can one survey and categorize all the possibilities? In a recent J. Chem. Inf. Model. paper, Doeke Hekstra and colleagues at Harvard University present a new tool for doing so.
 
High-throughput crystallographic fragment screens are becoming faster and more widely accessible, and the researchers wondered whether the information from these screens could be used to map protein conformational landscapes. To do so, they built a Python program called COLAV, short for COnformational LAndscape Visualization. This open-source tool can compile data from hundreds of protein coordinate files and then, for each protein, calculate the dihedral angles between backbone atoms, the pairwise distances between the alpha-carbon atoms, and the strain.
 
To a first approximation, dihedral angles capture local movements, while distances between alpha-carbons capture global movements, such as the distance between the N-terminus and C-terminus. Strain measurements are also local but can reveal particularly important features such as hinge movements. Also, while dihedral angles and pairwise distances can be calculated for single proteins, strain measurements are calculated after first aligning multiple structures.
 
Having calculated these three parameters for individual protein structures, COLAV can compare them across the selected set of structures using principal component analysis (PCA). These comparisons can reveal clusters with similar dihedral angles, pairwise distances, or strain.
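
The general idea of combining pairwise alpha-carbon distances with PCA can be sketched in a few lines of NumPy. This is not COLAV's code or API; the function names and the input layout (an array of alpha-carbon coordinates with shape structures × residues × 3, same residues in the same order for every structure) are assumptions made for illustration, and PCA is done here with a plain singular value decomposition.

```python
import numpy as np


def pairwise_distance_features(ca_coords: np.ndarray) -> np.ndarray:
    """Turn alpha-carbon coordinates into one feature vector per structure.

    ca_coords: array of shape (n_structures, n_residues, 3).
    Returns an array of shape (n_structures, n_pairs) holding the unique
    residue-residue distances (upper triangle, diagonal excluded).
    """
    diffs = ca_coords[:, :, None, :] - ca_coords[:, None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    i, j = np.triu_indices(ca_coords.shape[1], k=1)
    return dists[:, i, j]


def pca_project(features: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Project feature vectors onto their leading principal components.

    Centers each feature, then uses SVD; the returned scores have shape
    (n_structures, n_components).
    """
    centered = features - features.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]
```

Because distances are unchanged by rotation and translation, this representation needs no prior structural alignment, consistent with the point above that pairwise distances can be calculated for single proteins. Plotting the first two columns of the scores would give a map in which structures with similar global conformations fall close together.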
 
The researchers provide two case studies. The first is the metabolic disease target PTP1B, which we recently wrote about here. This enzyme has been pursued intensively for decades, so the researchers were able to draw on 163 individual protein structures deposited in the Protein Data Bank (PDB) as well as 187 structures from a high-throughput crystallographic fragment screen. PTP1B contains two flexible loops, each of which adopts one of two conformations, and COLAV successfully segregated all 350 structures into four clusters. Importantly, these four clusters were found whether the structures were pulled from the PDB (representing experiments conducted across multiple labs and years) or from the fragment screen, suggesting that a single crystallographic fragment screen can identify most or all of the conformational states available to a protein. This is particularly impressive given that most of the fragments bound in allosteric sites while most of the ligands found in the PDB bound in the active site.
 
Next, the researchers turned to the main protease (MPro) of SARS-CoV-2, the subject of intense and successful drug discovery efforts. They used 656 structures from the PDB and 631 structures from high-throughput crystallographic screens to perform COLAV analyses. Unlike with PTP1B, discrete conformational clusters were not observed; rather, a continuous band was seen, suggesting that the protein can assume myriad conformations. Here too, though, the fragment screens were able to sample most of the conformations observed in the PDB.
 
The fact that a single high-throughput crystallographic screen can capture the conformations seen in hundreds of hard-won discrete protein-ligand crystal structures is encouraging, though of course the paper only describes two case studies. Also, as the researchers note, any structure that cannot be crystallized is not sampled. Since COLAV is free to use, it will be fun to see it applied to other proteins.

18 November 2024

Covalent fragments vs chikungunya nsP2

Perhaps because it sounds like “chicken,” when I first heard of chikungunya I thought it was a joke. But there’s nothing funny about a disease whose name comes from a word meaning “to become contorted,” referring to contortion caused by pain, which can last for months. The mosquito-borne alphavirus was first identified in 1952 in East Africa, introduced to the Americas in 2013, and is now spreading rapidly worldwide. There is no specific treatment. In three recent papers, a large group of researchers mostly from the Structural Genomics Consortium take the first steps towards one.
 
Like many viruses, the chikungunya genome encodes polyproteins that are cleaved by viral proteases, in this case a domain of the nonstructural protein 2 (nsP2). This cysteine protease is essential for viral replication, and the three papers collectively describe finding and exploring selective probes against it.  
 
In Proc. Natl. Acad. Sci. USA, Kenneth Pearce (University of North Carolina at Chapel Hill) and collaborators describe a screen of 6120 covalent fragments from Enamine against this target. Compounds were preincubated in a FRET-based functional assay at 20 µM for 30 minutes, resulting in 153 hits that inhibited activity by at least 50%. Forty-three of these were repurchased for full dose-response curves, and 20 had IC50 values < 20 µM. Compound RA-0002034 was the most potent, with IC50 = 180 nM.


The proper way to assess irreversible covalent inhibitors is not the time-dependent IC50, but rather the (theoretically) invariant kinact/KI ratio. The researchers measured this for the best hits and found the value for RA-0002034 to be 6400 M⁻¹s⁻¹, which is not far below that for the approved covalent drug sotorasib for its target.
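
A small sketch shows why the IC50 of an irreversible inhibitor drifts with time while kinact/KI does not. It assumes an idealized experiment: enzyme is preincubated with inhibitor at a concentration well below KI, inactivation is pseudo-first-order, and residual activity is then read out instantly with no further inactivation and no substrate competition. Under those assumptions residual activity is exp(−(kinact/KI)·[I]·t), so the concentration giving 50% inhibition is ln 2 / ((kinact/KI)·t). Real assays violate these assumptions, so the numbers below are not expected to reproduce a measured IC50. The function name is hypothetical.

```python
import math


def idealized_ic50_molar(kinact_over_ki: float, preincubation_s: float) -> float:
    """IC50 (in M) after a given preincubation, under the idealized model.

    Assumes [I] << KI, no substrate competition, and an instantaneous readout,
    so that IC50 = ln(2) / ((kinact/KI) * t).
    """
    if kinact_over_ki <= 0 or preincubation_s <= 0:
        raise ValueError("rate constant and time must be positive")
    return math.log(2) / (kinact_over_ki * preincubation_s)


# Same kinact/KI (the value quoted above), different preincubation times.
for minutes in (10, 30, 120):
    ic50_nm = idealized_ic50_molar(6400.0, minutes * 60) * 1e9
    print(f"{minutes:>4} min preincubation -> idealized IC50 of about {ic50_nm:.0f} nM")
```

In this model the IC50 is inversely proportional to preincubation time, so the same compound can look several-fold more or less potent depending only on how long the assay runs.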
 
Mass spectrometry experiments after tryptic digestion revealed the compound binds to the catalytic cysteine of nsP2, as expected, and not to other cysteines. RA-0002034 contains a potentially reactive vinyl sulfone warhead, but the half-life against the biologically relevant nucleophile glutathione is a respectable 130 minutes. A screen against 13 other cysteine proteases was also quite clean, as was chemoproteomic profiling in human cells.
 
The compound was also tested in cellular viral replication assays and found to be remarkably potent, with a low nanomolar EC50 value. Encouragingly, it was also potent against three other alphaviruses: Ross River virus, Venezuelan equine encephalitis virus, and Mayaro virus.
 
RA-0002034 appears to be an attractive chemical probe for exploring the biology of chikungunya. Best practice is to also have an inactive control molecule, and the researchers made a substitution off the central pyrazole ring to produce RA-0003161, which is 500-fold less active.
 
The paper includes some SAR-by-catalog, and the chemistry is more extensively explored in an open-access J. Med. Chem. paper by Timothy Willson (UNC Chapel Hill) and collaborators. Although no crystal structures of the compounds bound to nsP2 were available, the researchers used modeling to guide modification of all portions of the molecule. The most potent molecule was 8d, which is slightly more active than RA-0002034. Also, methyl substitution near the electrophilic center is tolerated, which could improve stability, as seen with the covalent WRN inhibitor from Vividion, which we wrote about here.
 
One annoying feature of RA-0002034 is its tendency to cyclize to inactive compound 2, a process explored in an open-access Pharmaceuticals paper by Timothy Willson and collaborators. This occurs even at neutral pH. However, replacing the central pyrazole with an isoxazole (compound 10) fixes this problem.
 
Collectively these three publications provide new insights and tools for investigating chikungunya. RA-0002034 is a far more attractive starting point than a molecule Teddy described on Practical Fragments back in 2015. The pharmacokinetics of RA-0002034 need to be improved before in vivo experiments are warranted, but this seems achievable, and I look forward to watching this story develop.

11 November 2024

Poll results: fragment finding methods and structural information needed for fragment-to-lead efforts

Our most recent poll asked about fragment finding methods. The poll ran from September 21 through November 8 and received 135 responses from 20 countries. Two-thirds of these were from the US, about 12% were from the UK, 4% from Germany, 3% from the Netherlands, and 2% from Australia.
 
The first question asked how much structural information you need to begin optimizing a fragment. In contrast to 2017, when we first asked this question, crystallography has significantly increased at the expense of the other choices. 
 
 
I confess to being surprised, as I expected that by now people would be more comfortable beginning optimization in the absence of structural information, an approach that has been quite successful as discussed in a 2019 open-access Cell Chemical Biology review by Ben Davis, Wolfgang Jahnke, and me. Perhaps the increasing speed and accessibility of new methods has so lowered the bar to getting crystal structures that people have the luxury of waiting. Of course, with an online poll there is always the risk that many respondents from the same organization may skew the results.
 
The second question asked which methods you use to find and validate fragments. This is the fifth time we’ve run this poll, starting in 2011. As with our first question, X-ray crystallography came out on top, with nearly 80% of respondents choosing it. This was followed by SPR, at 67%, and thermal shift and ligand-detected NMR, each around 55%. 
 
 
Functional screening was used by nearly half of respondents, with computational methods, protein-detected NMR, and literature starting points used by around a third. Mass spectrometry and ITC were each used by slightly more than a quarter of respondents.
 
For the first time we asked about cryo-EM, and nearly 20% of respondents reported using this technique.
 
MST and affinity-based methods each came in at 13%, with just 4% of respondents using BLI, and 5 individual respondents using other methods. I’d be curious to know what these are.
 
The average respondent reported using just over 5 different techniques, which is down slightly from 6 in 2019 but up from 4 in 2016. Using multiple orthogonal methods is clearly well established as best practice, even if the precise number varies.
 
How do these results compare with your own practices?