Practical Fragments: June 2023

26 June 2023

Fragment merging in silico, two ways

Roughly speaking there are three ways to advance fragments to leads: growing, linking, or merging. Growing is the most common, but as the number of crystal structures of bound fragments continues to increase so too does the opportunity for fragment merging, in which elements of two fragments are combined into a new molecule. In a new (open access) J. Chem. Inf. Model. paper, Charlotte Deane and collaborators at the University of Oxford, Informatics Matters, Vernalis, and LifeArc compare two in silico methods.

Fragment merging, as described in the paper, “is used for fragments that bind in partially overlapping space by designing compounds that incorporate substructural features from each.” Each fragment may have extraneous bits that are not kept in the merged molecule. Indeed, sometimes only a small portion of one of the fragments is incorporated into the final molecule. (Both these concepts are shown in an example we recently wrote about here.)

When chemists merge two fragments, they consider the synthetic tractability of the merged molecules. Computers, on the other hand, sometimes propose compounds that are either unreasonable or would require a doctoral thesis's worth of effort to make. One solution is to invert the problem: rather than trying to assess whether specific in-silico-generated molecules can be made, possible merged molecules can be searched against a large virtual library of synthetically accessible molecules for similar compounds (see for example here).

A key question for this approach is the definition of similar. The most common method for finding similar molecules is to reduce each one to a molecular “fingerprint” that records features such as the presence or absence of a chlorine atom. The more features two fingerprints have in common, the more similar the molecules are; this is the basis of Tanimoto similarity.
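For readers who like to see the arithmetic, here is a minimal sketch of a Tanimoto calculation. It assumes each fingerprint is simply a set of integers marking which features are present; the two toy fingerprints are made up for illustration, and this is not the code used in the paper.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto similarity between two sets of 'on' fingerprint features."""
    if not fp_a and not fp_b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(fp_a & fp_b)
    # Shared features divided by the total number of distinct features.
    return shared / (len(fp_a) + len(fp_b) - shared)


# Hypothetical toy fingerprints: each integer is a feature that is present.
query = {1, 4, 7, 9}
candidate = {1, 4, 8, 9, 12}
print(tanimoto(query, candidate))  # 3 shared out of 6 distinct features: 0.5
```

A value of 1 means the fingerprints are identical and 0 means they share nothing; where to draw the line for "similar" is a choice the user has to make.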

An alternative approach is to use a “graph database” in which molecules are represented as nodes and edges, with nodes being substructures and edges being connections between the substructures. This approach was described by researchers from Astex as the Fragment Network.
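To make the nodes-and-edges idea concrete, here is a toy sketch of a graph search. The four-node network, the node names, and the `within_hops` helper are all hypothetical; a real Fragment Network lives in a graph database and is far larger, so treat this only as an illustration of searching by connections rather than by fingerprint.

```python
from collections import deque

# Hypothetical toy network: each key is a node, and its list holds the
# nodes it is directly connected to by an edge.
network = {
    "benzene": ["toluene", "phenol"],
    "toluene": ["benzene", "p-cresol"],
    "phenol": ["benzene", "p-cresol"],
    "p-cresol": ["toluene", "phenol"],
}


def within_hops(graph, start, max_hops):
    """Return {node: hops} for every node reachable from start in <= max_hops edges."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen[neighbour] = seen[node] + 1
                queue.append(neighbour)
    del seen[start]  # the query itself is not a hit
    return seen


print(within_hops(network, "benzene", 1))  # {'toluene': 1, 'phenol': 1}
print(within_hops(network, "benzene", 2))  # adds 'p-cresol' at 2 hops
```

Here "similar" means "a few edges away," which is a different notion from sharing fingerprint features.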

Which of these approaches works better?

In the new paper, the researchers built a virtual library of more than 120 million commercially available molecules. They then selected four proteins with published crystallographic fragment hits. Between 9 and 19 individual fragments were chosen for merging for each protein, forming 55 to 134 potentially mergeable pairs. These were then computationally merged and queried against the database using either similarity searching or the Fragment Network.

Both search methods yielded comparable numbers of possible hits, ranging from just under 23,000 to nearly 169,000 per protein. These were then computationally filtered to find those molecules most likely to bind to the proteins, resulting in 56 to 952 candidates. Interestingly though, the specific molecules varied considerably depending on which search method was used. In fact, molecules from the Fragment Network mostly occupied different regions of chemical space than those found using similarity searching. Moreover, in many cases fragment pairs yielded merged compounds from one method but not the other. The number of predicted interactions with the protein targets also differed, and these differences extended not just to specific interactions but also to functional diversity.

The researchers did not purchase and test compounds themselves, but they did run the analysis against two published examples of fragment merging (one of which we wrote about here) and found that both the Fragment Network and similarity searching could find molecules related to experimentally validated binders.

The question of which approach works better remains open, so the researchers suggest using both. They do note that running a Fragment Network search is computationally less demanding, in this case taking an average of 2 to 14 minutes of CPU time vs 40 minutes for the similarity search. These differences become even more significant when searching billion-compound libraries.

Importantly, the researchers provide the code to generate your own Fragment Network, so you can try this at home. I look forward to seeing how the two techniques perform prospectively.

19 June 2023

Nucleophilic covalent fragments against cysteine (!)

Covalent fragment-based drug discovery continues to gain momentum, as evidenced by the number of talks at the CHI Drug Discovery Chemistry meeting in April. All of those involved nucleophilic residues on proteins, especially cysteine, reacting with electrophilic fragments. However, as we noted last year, it is possible to do the reverse. This is the topic of a new paper in Nat. Chem. Biol. by Jing Yang (Beijing Institute of Lifeomics), Kate Carroll (UF Scripps), and collaborators.

The reason most covalent fragments are not nucleophiles is that none of the twenty standard amino acids are particularly electrophilic. The work we mentioned last year focused on post-translational modifications introducing aldehydes or ketones into proteins. But it turns out that the thiol group of cysteine, which is normally nucleophilic, can be oxidized to a sulfenic acid, which is electrophilic. The Carroll group has been studying this “cysteine redoxome” for years and found that S-sulfenation can serve a regulatory function akin to phosphorylation.

To assess the reactivity of sulfenic acids across the proteome, the researchers synthesized cyanoacetamide and nitroacetamide derivatives of 3,5-bis(trifluoromethyl)aniline. The chloroacetamide and acrylamide derivatives of this fragment have previously been used in chemoproteomics experiments to probe for reactive cysteines. Cell lysates were treated with one of these four fragments, followed by treatment with a generic probe for sulfenic acids or thiols. If a cysteine residue is modified with the fragment, it will be unavailable to react with the second probe, and this loss in signal can be quantifiably detected using mass spectrometry.

For the thiol-reactive chloroacetamide and acrylamide, the researchers found that 25.2% and 11.0% of quantifiable cysteines in the proteome could form adducts. But only 24 cysteine residues formed adducts with the nitroacetamide (1.3% of the total sulfenic acids quantified), and none formed adducts with the cyanoacetamide.

Despite this lower hit rate, the researchers constructed a library of 65 cyanoacetamide- and nitroacetamide-containing fragments, which was similarly screened in cell lysates. Adding hydrogen peroxide to cell lysates to mimic oxidative stress increased the number of sulfenic acid sites. In total the researchers found 524 liganded sites across 441 proteins. As expected from the earlier experiments, nitroacetamides tended to bind to more sites than cyanoacetamides.

The researchers studied the functional effect of covalent modification for several proteins. The enzymes GAPDH, GSTO1, and ACAT1 all have active-site cysteine residues that can be reversibly modified by oxidation to the sulfenic acid. Reaction of this form of the recombinant proteins with one of the covalent fragments led to irreversible inhibition.

Similarly, the researchers demonstrated that a fragment which hits the enzyme PRXL2A activated MAPK signaling in cells, as expected. Importantly, this effect was not seen in cells containing PRXL2A with a cysteine to serine mutation. The non-enzyme proteins HDGF and BCCIP could also be functionally inhibited in cells with covalent fragments.

This “umpolung” approach to covalent ligand discovery is scientifically interesting, but how useful will it be? In most cases sulfenic acid formation is already deactivating, so targeting this form of the protein will simply keep it in the off-state. However, the researchers do note that sulfenic acid formation can be activating.

A second challenge is that sulfenic acid formation tends to be substoichiometric, with only a small percentage of cysteines existing predominantly in the sulfenic acid form. Thus, it will be difficult to achieve the near quantitative level of modification often required for biological effects. That said, there are cases where you want to tweak a pathway rather than shut it down entirely. Protein activation or the development of new PROTACs could also benefit from limited target protein engagement.

As for the covalent fragments themselves, nitroacetamides may be too reactive, but the cyanoacetamide moiety is actually found in a few approved drugs, such as the anti-inflammatory tofacitinib. And if compelling sulfenic acid targets are identified, chemists will likely develop additional nucleophilic probes suitable for dosing in humans.

12 June 2023

A rule of 1 (hydrogen bond donor) library

Hydrogen bond donors (HBDs) in ligands are troublemakers. Having more than a couple tends to decrease permeability, bioavailability, and even solubility. HBDs can also lead to efflux, which is particularly problematic for drugs that must cross the blood-brain barrier. While this is true in general, the problems become even more acute for heterobifunctional drugs such as PROTACs, which contain two moieties that each recognize a separate protein. To minimize the number of HBDs at the outset of a project, Benjamin Whitehurst and colleagues at AstraZeneca have built a “Low HBD” fragment set, which has just been described in J. Med. Chem.

The researchers started by examining roughly 205,000 compounds in their collection having between 11 and 19 non-hydrogen atoms and no more than one HBD, defined as “a proton bonded to an oxygen or nitrogen atom in its neutral form.” An extensive series of filters was used to winnow the molecules based on physicochemical properties, diversity, and absence of reactive groups. Consistent with our recent poll, synthetic tractability was considered explicitly. The researchers also made use of a multiparameter optimization score (see here). After quality control, which included solubility and compatibility with SPR, they ended up with a set of 551 fragments.
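The first cut described above (11 to 19 non-hydrogen atoms and no more than one HBD) is easy to express in code. The sketch below assumes the atom and donor counts have already been computed by some cheminformatics tool; the record layout, fragment names, and `passes_low_hbd` function are hypothetical, and none of the later filters are attempted.

```python
# Hypothetical records with precomputed heavy-atom and HBD counts.
candidates = [
    {"name": "frag_A", "heavy_atoms": 14, "hbd": 1},
    {"name": "frag_B", "heavy_atoms": 9, "hbd": 0},   # too small
    {"name": "frag_C", "heavy_atoms": 17, "hbd": 2},  # too many donors
    {"name": "frag_D", "heavy_atoms": 19, "hbd": 0},
]


def passes_low_hbd(record, min_heavy=11, max_heavy=19, max_hbd=1):
    """True if the record meets the size and hydrogen bond donor limits."""
    size_ok = min_heavy <= record["heavy_atoms"] <= max_heavy
    return size_ok and record["hbd"] <= max_hbd


kept = [c["name"] for c in candidates if passes_low_hbd(c)]
print(kept)  # ['frag_A', 'frag_D']
```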

AstraZeneca has recently revamped their general purpose “Biophysics” fragment library, which consists of 2741 compounds. They also have a set of 402 “Kinase Hinge” fragments, which contain hydrogen bond donors near hydrogen bond acceptors. Comparing the Low HBD set with the other two revealed that it was as diverse as the Biophysics set and more diverse than the Kinase Hinge set. Other parameters such as the number of hydrogen bond acceptors (HBAs), polar surface area, and molecular weight were similar between the Low HBD and Biophysics libraries. Happily, and perhaps defying expectations, lipophilicity was not higher in the Low HBD set.

So how does the library perform? The researchers describe five screens against an E3 ligase, a protein-protein interaction, a kinase, a histone methyltransferase, and a transcription factor. Confirmed hits (defined as having K_d < 1 mM by SPR) were obtained for all targets. Hit rates for two targets were comparable to hit rates for the Biophysics set, as were the dissociation constants and ligand efficiencies. Not surprisingly the Kinase Hinge set produced a higher hit rate for the kinase. (Two targets were only screened with the Low HBD set.)
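As a reminder of what these numbers mean, here is a small sketch that converts a dissociation constant into a ligand efficiency. It assumes the common definition (binding free energy in kcal/mol divided by the number of non-hydrogen atoms) and a temperature of 298.15 K; the 15-atom fragment is hypothetical and not taken from the paper.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)


def ligand_efficiency(kd_molar, heavy_atoms, temp_k=298.15):
    """Ligand efficiency in kcal/mol per heavy atom, from Kd in mol/L."""
    delta_g = R_KCAL * temp_k * math.log(kd_molar)  # negative when Kd < 1 M
    return -delta_g / heavy_atoms


# A hypothetical 15-heavy-atom fragment right at the 1 mM hit threshold.
print(round(ligand_efficiency(1e-3, 15), 2))  # 0.27
```

For a given affinity, a smaller fragment scores higher, which is why a weak hit can still be an efficient one.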

The percentage of hits from the other fragment libraries having 0 or 1 HBD was 44%, 46%, and 80%, so the Low HBD set does seem to be fulfilling its role of enriching these types of compounds. Interestingly, when the researchers analyzed successful fragment-to-lead studies published between 2015 and 2021, they found that 53% of them had just 0 or 1 HBD.

All these results suggest that sharply curtailing the number of hydrogen bond donors in a fragment library doesn’t have negative consequences. Perhaps this isn’t surprising: an analysis we highlighted in 2021 based on 131 fragment-to-lead success stories noted that most of them only retained one or two polar interactions from the initial hit. That paper also noted that while 35% of the polar interactions were from N-H hydrogen bond donors on the ligands, an even higher percentage came from hydrogen bond acceptors. That paper and the AstraZeneca researchers also note the potential of other types of interactions, such as polarized C-H hydrogen bond donors and halogen bonds. It will be fun to watch hits from this library progress.

05 June 2023

Fragment screening by photo-CIDNP

Last year we highlighted a talk by Félix Torres in which he described photochemically induced dynamic nuclear polarization (photo-CIDNP) as a rapid, sensitive method for fragment screening. He, Roland Riek, and collaborators at the Swiss Federal Institute of Technology and the Latvian Institute of Organic Synthesis have just published details (open access) in J. Am. Chem. Soc.

The discovery of photo-CIDNP dates back to 1967, which is a useful reminder that technology advancement does not necessarily happen rapidly. The physics and mathematics are a bit complicated, but in essence the process requires a photosensitizer molecule that is excited by light and can also form a radical pair with a given ligand molecule. This “hyperpolarized” ligand is easily detectable by NMR. If the ligand is bound to a protein, the ligand is less able to be hyperpolarized, and thus conducting experiments in the presence and absence of protein reveals whether a small molecule binds to a protein of interest.

In practice, the researchers used fluorescein as the photosensitizer. To prevent quenching of the excited state by dissolved oxygen, the samples also included glucose (at 2.5 mM) and the enzymes glucose oxidase and catalase. Samples were illuminated with a 450 nm laser whose light was fed into a 600 MHz NMR instrument via an optical fiber.

So, what do you get for all this elaborate setup? Speed and sensitivity. The hyperpolarization allows ligands to be detected with a single scan taking just 2 seconds, as opposed to a typical STD NMR experiment which can take tens of minutes. Moreover, compound and protein concentration can be reduced, which both saves on precious materials and reduces the risk of aggregation.

But to realize these benefits, the researchers needed to construct a fragment library suitable for photo-CIDNP. Only about 30 molecules had been reported to be suitable for photo-CIDNP, but these included aromatic moieties frequently found in drugs such as indole, phenol, and imidazole rings. The researchers tested over 1300 fragments and selected a set of 212 that were rule-of-three compliant and showed at least five-fold signal-to-noise enhancement in photo-CIDNP.

This “NMhare” library was screened against the enzyme PIN1, which has been implicated in cancer and other diseases. Each fragment was screened individually at 50 µM with or without 25 µM PIN1. Although each experiment took only 2 seconds, changing the samples took longer, and the entire set of 424 experiments took 11 hours. The researchers described a flow-based system that could potentially screen 5000 compounds per day.
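A quick back-of-the-envelope calculation shows how much of that time went to sample handling. The sketch below uses only the numbers quoted above, and it assumes that a flow system would run around the clock and would still need two experiments (with and without protein) per compound; both assumptions are mine.

```python
def seconds_per_experiment(n_experiments, total_hours):
    """Average wall-clock time per experiment, in seconds."""
    return total_hours * 3600 / n_experiments


n_fragments = 212
n_experiments = 2 * n_fragments  # each fragment with and without PIN1: 424
per_exp = seconds_per_experiment(n_experiments, 11)
print(f"{per_exp:.0f} s per experiment")  # 93 s
print(f"{2 / per_exp:.1%} of that is the 2 s acquisition")  # 2.1%

# Time budget per experiment to reach 5000 compounds in a 24-hour day,
# assuming two experiments per compound.
budget = 24 * 3600 / (2 * 5000)
print(f"{budget:.2f} s per experiment")  # 8.64 s
```

In other words, nearly all of the 11 hours was spent changing samples rather than acquiring data, which is exactly the part a flow-based system would attack.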

After visual inspection and quality control, twenty hits were identified. Remarkably, all twenty of these confirmed as binders using protein-detected (¹⁵N,¹H-HSQC) NMR, with fragments at 200 µM and isotopically labeled protein at 50 µM. Two of the fragments had been previously reported as binders, and the researchers were able to determine dissociation constants for these in the low millimolar range. Moreover, they were able to demonstrate that photo-CIDNP could detect one of these fragments at just 5 µM in the presence of 2 µM PIN1.

Overall this is neat technology, though as it requires some engineering I’m not sure whether it falls under the “practical” descriptor of this blog. That said, if it proves sufficiently useful I’m sure vendors will supply off-the-shelf solutions. I look forward to hearing what NMR aficionados have to say.