Practical Fragments: 2025

17 November 2025

xLE: solving problems or missing the point?

Ligand efficiency (LE) has been discussed repeatedly and extensively on Practical Fragments, most recently in September. Two criticisms are its dependence on standard state and the observation that larger molecules frequently have lower ligand efficiencies than smaller molecules. In a just-published open-access ACS Med. Chem. Lett. paper, Hongtao Zhao proposes a new metric, xLE, to address these concerns.

LE is defined as the negative Gibbs free energy of binding (ΔG) divided by the number of non-hydrogen (or heavy) atoms, and of course ΔG is state-dependent. Standard state assumptions are 298 K and 1 M concentrations, choices that some people see as arbitrary since few biologically relevant molecules ever achieve concentrations near 1 M. To remove the dependence on standard state, Zhao proposes to remove the translational entropy term of the unbound ligand from the free energy calculation.

Zhao also addresses the second criticism, that larger molecules often have lower ligand efficiencies. This phenomenon was observed in an (open-access) 1999 paper titled “The maximal affinity of ligands,” which found that, beyond a certain threshold, larger ligands do not have stronger affinities; there are very few femtomolar binders even among the largest small molecules. Thus, Zhao proposes attenuating the size dependence.

The new metric, xLE, is defined as follows:

xLE = (5.8 + 0.9*ln(M_w) – ΔG)/(a*N^α) - b

Where M_w is the molecular weight of the ligand, N is the number of non-hydrogen atoms, α is chosen to reduce size dependence, and a and b are “scaling variables.” He chooses α=0.2, a=10, and b=0.5, with little explanation.

To assess performance, Zhao examined nearly 14,000 measured affinities from PDBbind. When plotted by number of atoms, median affinity increased up to about 35 heavy atoms but then leveled off. Median LE values decreased sharply from 6 to 12 heavy atoms and then leveled off somewhere in the 20s. But median xLE values were consistent regardless of ligand size.

Zhao also examined LE and xLE changes for 175 successful fragment-to-lead studies from our annual series of J. Med. Chem. perspectives. LE decreased from fragment to lead for 48% of these, but xLE increased for all but a single pair.

And this, in my opinion, is a problem.

In the seminal 2004 paper, LE was proposed as "a simple ‘ready reckoner’, which could be used to assess the potential of a weak lead to be optimized into a potent, orally bio-available clinical candidate." The metric was particularly important before FBLD was widely accepted, when chemists were even less inclined to work on weak binders.

Here is the situation for which LE was devised. Imagine two molecules, compounds 1 and 2. The first has just 12 non-hydrogen atoms, a molecular weight of 160, and a modest 1 mM affinity for a target - similar to some fragments that have yielded clinical compounds. The second is much larger: 38 non-hydrogen atoms, a molecular weight of 500, and 10 µM affinity for the same target. Considering potency alone, compound 2 is the winner.

However, the LE for compound 1 is a respectable 0.34 kcal/mol/atom, while the LE for compound 2 is 0.18 kcal/mol/atom. So while a 10 µM HTS hit may initially look appealing, the LE suggests that this is an inefficient binder, and further optimization may require adding too much molecular weight to get to a desired low nanomolar affinity.

In contrast, the xLE values for both compounds are nearly identical, 0.38, and so this metric would not help a chemist prioritize which hit to pursue. In other words, xLE does not provide the insight for which LE was created. It might even lead to suboptimal choices.
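To make the arithmetic concrete, here is a minimal Python sketch that recomputes LE and xLE for compounds 1 and 2. It assumes ΔG = RT ln K_D in kcal/mol at 298 K (1 M standard state) and that "ln" in the xLE formula is the natural logarithm; under these assumptions it reproduces the values quoted above. The function names are illustrative, not taken from Zhao's paper.

```python
import math

R_KCAL = 1.987e-3  # gas constant, kcal/(mol*K)
T = 298.0          # temperature in K (standard-state assumption)


def delta_g(kd_molar):
    """Binding free energy (kcal/mol) from a dissociation constant in M, 1 M standard state."""
    return R_KCAL * T * math.log(kd_molar)


def ligand_efficiency(kd_molar, heavy_atoms):
    """LE = -dG / N, in kcal/mol per heavy atom."""
    return -delta_g(kd_molar) / heavy_atoms


def xle(kd_molar, heavy_atoms, mw, alpha=0.2, a=10.0, b=0.5):
    """xLE = (5.8 + 0.9*ln(Mw) - dG) / (a * N**alpha) - b, with the constants Zhao chose.

    Assumes dG in kcal/mol and a natural logarithm of the molecular weight.
    """
    return (5.8 + 0.9 * math.log(mw) - delta_g(kd_molar)) / (a * heavy_atoms ** alpha) - b


# (K_D in M, heavy atoms, molecular weight) for the two hypothetical compounds in the text
compounds = {"compound 1": (1e-3, 12, 160.0), "compound 2": (10e-6, 38, 500.0)}
for name, (kd, n, mw) in compounds.items():
    print(f"{name}: LE = {ligand_efficiency(kd, n):.2f}, xLE = {xle(kd, n, mw):.2f}")
# compound 1: LE = 0.34, xLE = 0.38
# compound 2: LE = 0.18, xLE = 0.38
```

The LE values separate the two hits by almost a factor of two, while the xLE values are indistinguishable at two decimal places.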

Moreover, unlike LE, xLE is non-intuitive. And finally, with three scaling or normalization factors, xLE is arguably even more arbitrary than a metric dependent on the widely-accepted definition of standard state.

Personally I find the practical applications of xLE limited, but I welcome your thoughts.

10 November 2025

Searching monstrously large chemical space with FrankenROCS

Back in 2023 we highlighted a computational fragment linking/merging approach which was used to find high nanomolar inhibitors of the SARS-CoV-2 macrodomain (Mac1), a COVID-19 target. However, those molecules contained carboxylic acids, often associated with poor cell permeability. In a new open-access Sci. Adv. paper, James Fraser and collaborators at UCSF, Relay Therapeutics, Enamine, and Chemspace describe a related approach to find new, non-charged inhibitors.

The new approach, called FrankenROCS, “takes pairs of fragments as input to query a database using the rapid overlay of chemical structure (ROCS) method of comparing 3D shape and pharmacophore distribution;” the goal is to find larger molecules that most closely resemble the initial fragment pairs. As with the previous publication, the team started with more than 200 crystallographic fragment hits published in 2021. A set of 7,181 pairs of adjacently-bound fragments were searched against 2.1 million compounds commercially available from Enamine. The top 1000 were inspected, and 39 were purchased and soaked into crystals of Mac1. This led to 10 successful structures, of which AVI-313 did not contain a carboxylic acid. This molecule had weak but measurable activity in an HTRF competition assay.

Two million compounds is a lot but pales in comparison to Enamine’s “make-on-demand” REAL space, which at the time this research was done consisted of more than 22 billion molecules. The REAL space molecules are constructed from 960,398 building blocks that can be combined using 143 reactions. We previously described an approach called V-SYNTHES to screen Enamine’s REAL space. FrankenROCS takes a different active-learning approach called Thompson Sampling, which dates back nearly a century.

Imagine two sets of 1000 building blocks, R1 and R2, which could be coupled to generate 1,000,000 molecules. Rather than searching all possibilities, each R1 building block is linked to three random R2 building blocks, and each R2 building block is linked to three random R1 building blocks. These are virtually screened, and the R1 or R2 building blocks from those with the highest scoring compounds are used for further iterations. In theory, after tens of thousands of iterations, the best compounds will have been identified.
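The loop described above can be sketched in a few dozen lines of Python. This is a minimal illustration under stated assumptions, not the FrankenROCS code: each building block's expected product score is modeled as a Gaussian (mean of the scores seen so far, spread shrinking with more observations), and the library size, the toy scoring function (a hypothetical stand-in for a ROCS overlay score), and all parameter values are made up for illustration.

```python
import random


def thompson_search(n1, n2, score_fn, warmup_partners=3, iterations=1000,
                    prior_sd=0.3, seed=0):
    """Thompson Sampling over a hypothetical two-component library (R1 x R2).

    Each block's posterior is approximated as Normal(mean observed score,
    prior_sd / sqrt(number of observations)).
    """
    rng = random.Random(seed)
    total1, count1 = [0.0] * n1, [0] * n1  # running score sums/counts per R1 block
    total2, count2 = [0.0] * n2, [0] * n2  # running score sums/counts per R2 block
    scored = {}                            # (i, j) -> score; each product scored once

    def evaluate(i, j):
        if (i, j) in scored:
            return
        s = score_fn(i, j)
        scored[(i, j)] = s
        total1[i] += s
        count1[i] += 1
        total2[j] += s
        count2[j] += 1

    # Warm-up: each R1 block meets three random R2 blocks, and vice versa.
    for i in range(n1):
        for j in rng.sample(range(n2), warmup_partners):
            evaluate(i, j)
    for j in range(n2):
        for i in rng.sample(range(n1), warmup_partners):
            evaluate(i, j)

    def pick(total, count):
        # Draw one sample from every block's posterior and keep the best draw.
        draws = [rng.gauss(t / c, prior_sd / c ** 0.5) for t, c in zip(total, count)]
        return max(range(len(draws)), key=draws.__getitem__)

    # Search: combine the sampled R1 and R2 blocks, then score that product.
    for _ in range(iterations):
        evaluate(pick(total1, count1), pick(total2, count2))

    best = max(scored, key=scored.get)
    return best, scored[best], len(scored)


# Toy library: each block has a hidden "quality"; a product's score is their sum.
rng = random.Random(42)
q1 = [rng.random() for _ in range(100)]
q2 = [rng.random() for _ in range(100)]
pair, best_score, n_scored = thompson_search(100, 100, lambda i, j: q1[i] + q2[j])
print(pair, round(best_score, 3), f"{n_scored} of {100 * 100} products scored")
```

Because the toy score is additive, blocks that appear in good products build up high average scores and are sampled more often, so only a fraction of the 10,000 combinations is ever scored. The real search used shape and pharmacophore overlays against fragment pairs and far more iterations, so treat this only as an illustration of the sampling loop.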

The researchers fed 97 fragment pairs from the 2021 paper into Thompson Sampling FrankenROCS to find molecules that would best overlay with the fragment pairs. Ultimately 32 compounds were purchased, six of which were successfully crystallized with Mac1. Unfortunately, the most potent was a weaker inhibitor than AVI-313 and contained a carboxylic acid. The researchers speculate that the inability to find better molecules in larger chemical space may have stemmed from limitations of the scoring function, a problem we’ve previously discussed.

The researchers returned to focus on AVI-313, making substitutions at multiple positions, ultimately synthesizing 148 compounds, 121 of which could be characterized crystallographically. Importantly, several compounds had low micromolar activity, even without a carboxylic acid. The crystal structures show the binding site to be somewhat flexible, as evidenced by side chain and main chain movements to accommodate some of the binders.

This is a nice, thorough investigation, and the 137 protein-compound crystal structures deposited into the protein data bank provide useful training data for next-generation computational approaches. Moreover, the fact that immeasurably weak fragments can be advanced to low micromolar, ligand-efficient hits is yet another reason for the research community to figure out how to make crystallographic fragment screening data more widely available, as we exhorted here.

03 November 2025

Fragments vs RhoGDI2: Towards a chemical probe

Many readers of this blog will be familiar with KRAS, a mutant form of which was successfully targeted a few years ago by a covalent, fragment-derived drug, sotorasib. KRAS is just one member of a large family of molecular switches which are on when bound to GTP and off when bound to GDP. This exchange is facilitated or inhibited by other proteins, including guanine nucleotide dissociation inhibitors (GDIs). GDIs bind to the GDP-form of RAS proteins, keeping them in the off state, but they can also stabilize Ras proteins against proteasomal degradation, keeping them around longer.

RhoGDI2 is a GDI that regulates Rho GTPases, which are involved in multiple cell pathways. The biology is complicated though, and RhoGDI2 has been implicated as both a cancer driver and inhibitor. Clearly a chemical probe would be useful. In a new ACS Chem. Biol. paper, Wei He and collaborators at Tsinghua University and University of Science and Technology of China Hefei report the first steps.

The story begins with a 2017 paper in Biochim. Biophys. Acta Gen. Subj. by Ke Ruan (one of the authors of the new paper) and colleagues. A ligand-detected NMR screen of just under 1000 fragments yielded 14 hits, three of which were confirmed by two-dimensional protein-observed NMR. Further experiments suggested these bound in the hydrophobic pocket that binds geranylgeranylated Rho GTPases. Compounds 1 and 2, though weak, became the starting points for fragment growing.

Borrowing from compounds 1 and 2 and adding a phenyl moiety led to compound 2102, which was crystallographically confirmed to bind in the substrate binding pocket. Further fragment-growing, guided by structure-based design, ultimately led to HR3119, with low micromolar affinity for RhoGDI2 as assessed by surface plasmon resonance (SPR) and isothermal titration calorimetry (ITC). HR3119 has four stereoisomers, and those with an R-configuration at the benzylic position (6R) were almost 100-fold more potent than the (6S) isomers.

HR3119 blocked the interaction of RhoGDI2 with the Rho GTPase Rac1 in cell lysates. (6R)-HR3119 stabilized RhoGDI2 in a cellular thermal shift assay, while (6S)-HR3119 did not. (6R)-HR3119 also decreased migratory activity of a cancer cell line, consistent with the role of RhoGDI2 in actin dynamics. However, (6S)-HR3119 also showed activity in this assay, albeit at a higher concentration, suggesting off-target effects.

The biochemical and cell activity are still too weak to nominate (6R)-HR3119 as a chemical probe against RhoGDI2; ideally biochemical activity should be better than 100 nM and cell activity should be better than 1 µM. Nonetheless, this is a good starting point for further optimization, and a nice example of fragment-based lead discovery in academia.

27 October 2025

Fragments vs FEN1: A chemical probe

Synthetic lethality is a relatively new approach to treating cancer by targeting proteins whose inhibition is lethal to cancer cells that have specific mutations. Disruption of flap endonuclease 1 (FEN1), an enzyme important for DNA replication and repair, is synthetically lethal in combination with BRCA1 and BRCA2 loss-of-function mutations. However, although a few inhibitors have previously been reported, these had poor cell activity and physicochemical properties. In a new J. Med. Chem. paper, Sam Mann and collaborators at Artios Pharma, Merck KGaA, and several other organizations describe a chemical probe.

FEN1 is a member of the RAD2 nuclease family, all of which contain two magnesium ions in the active site. Thus, the researchers set out to build a metal-chelating library, a strategy we’ve written about previously. Noting that a bivalent metal chelator requires at least three hydrogen bond acceptors, the researchers included fragments that were not strictly rule-of-three compliant. More than 300 fragments were screened in a biochemical assay against FEN1 and three related proteins, EXO1, GEN1, and XPG. Hits were selected based on potency, ligand efficiency, and selectivity. Two related fragments came out on top, one of which was characterized crystallographically bound to FEN1, confirming engagement with the catalytic magnesium ions.

Compound 6 was trimmed back to the chelating core compound 7 before attempts were made to grow the molecule in several directions, leading eventually to compound 21, the first molecule to show target engagement in cells. Modeling suggested the presence of a high-energy water molecule that might be displaced, and the team attempted to do so by expanding the core to include a morpholine moiety. Further modulation of properties ultimately led to MSC778, with modest oral bioavailability in mice, rats, and dogs. The paper describes some nice medicinal chemistry that goes beyond the scope of this post. For example, there was a correlation between cellular target engagement and off-rates as determined by surface plasmon resonance (SPR). One wonders if a covalent inhibitor, with essentially no off-rate, could be even more effective.

MSC778 is at least 65-fold selective for FEN1 over related RAD2 family members. It is also clean in a panel of off-target safety assays. The molecule is cytotoxic to a cancer cell line in which the BRCA2 gene had been knocked out but less so to the same cell line carrying wild-type BRCA2. Surprisingly though, no tumor growth inhibition was seen in a mouse xenograft model using the mutant BRCA2 cell line at the highest tolerable dose. However, tumor stasis was seen when the compound was dosed in combination with niraparib, a PARP inhibitor, consistent with earlier cell experiments suggesting that PARP inhibitors could synergize with FEN1 inhibitors.

The lack of single agent activity seen with MSC778 was undoubtedly disappointing, though the researchers note that it is unclear whether this is “due to insufficient target coverage, or an unexpected disconnect between the phenotypic consequences of FEN1 inhibition in vitro and in vivo.” Nonetheless, MSC778 looks to be a useful chemical probe for further understanding the biology of FEN1. This paper is also a nice application of building and screening a metal-chelating fragment library, which could be useful for targeting additional metalloproteins.

20 October 2025

Checking halogen bonds

Halogen bonds (or X-bonds) are one of the less appreciated protein-ligand interactions. As we discussed in 2022, the polarized nature of a carbon-halogen bond creates a partially-positively charged “σ-hole” at the tip of the halogen furthest from the carbon, and this can make favorable interactions with lone pairs on oxygen or sulfur atoms (or nitrogen, but in most proteins this is limited to histidine residues and is rare). Halogens can also interact with aromatic π-systems such as the side chains of phenylalanine, tyrosine, histidine, and tryptophan. Since many fragments contain halogen atoms by design, halogen bonds may occur frequently. But how do you decide whether “a halogen in proximity of a possible acceptor” actually contributes to binding? In a new (open-access) paper in Protein Science, Ida de Vries, Robbie Joosten, and colleagues at Oncode Institute and The Netherlands Cancer Institute provide a new metric.

The researchers examined structures of halogen-containing ligands bound to proteins in PDB-REDO, a database of carefully vetted and refined structures from the Protein Data Bank. They only included structures solved to better than 2.5 Å resolution and omitted structures where halogens had high B-factors, which may be the result of radiation damage. This led to 8423 structures in which a halogen possibly interacted with an oxygen or sulfur atom and 8096 potential halogen-π interactions, which were analyzed in detail.

A halogen bond to an oxygen or sulfur atom can be described by the interatom distance and two angles: θ1 (carbon-halogen-oxygen/sulfur) and θ2 (halogen-oxygen/sulfur-carbon). Halogen-π-system bonds can be defined by distance to the centroid of the π-system and θ1, the carbon-halogen-centroid angle. (The paper has a nice diagram.) These parameters were calculated and annotated for all the structures.
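These geometric descriptors are straightforward to compute from atomic coordinates. The sketch below assumes Cartesian coordinates in Å and takes, for θ2, the carbon bonded to the acceptor oxygen or sulfur; the coordinates in the example are made up for illustration and do not come from any PDB entry or from the HalBS code.

```python
import math


def angle(a, b, c):
    """Angle a-b-c in degrees, with b as the vertex."""
    ba = [x - y for x, y in zip(a, b)]
    bc = [x - y for x, y in zip(c, b)]
    cos_t = sum(u * v for u, v in zip(ba, bc)) / (math.hypot(*ba) * math.hypot(*bc))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))  # clamp rounding error


def xbond_geometry(c, x, acc, acc_c):
    """Distance X...A, theta1 (C-X...A), theta2 (X...A-C') for an O/S acceptor A."""
    return math.dist(x, acc), angle(c, x, acc), angle(x, acc, acc_c)


def xpi_geometry(c, x, ring):
    """Distance X...centroid and theta1 (C-X...centroid) for an aromatic ring."""
    centroid = tuple(sum(coord) / len(ring) for coord in zip(*ring))
    return math.dist(x, centroid), angle(c, x, centroid)


# Made-up example: a perfectly linear C-Cl...O contact with a carbonyl C' at 90 degrees.
print(xbond_geometry((0.0, 0.0, 0.0), (1.75, 0.0, 0.0),
                     (4.75, 0.0, 0.0), (4.75, 1.2, 0.0)))  # (3.0, 180.0, 90.0)

# Made-up example: Cl pointing straight down at the centre of a benzene-like ring.
ring = [(1.39 * math.cos(math.radians(60 * k)), 1.39 * math.sin(math.radians(60 * k)), 0.0)
        for k in range(6)]
d, t1 = xpi_geometry((0.0, 0.0, 5.25), (0.0, 0.0, 3.5), ring)
print(round(d, 2), round(t1, 1))  # 3.5 180.0
```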

Median distances were 3.5 Å between halogen and oxygen/sulfur, regardless of the halogen. Median θ1 angles were smaller than the 150°-180° expected, particularly for fluorine atoms, while median θ2 angles were more consistent with theory, at 90°-120°.

For halogen-π-systems, median distances were 4.8 Å for all halogens except iodine, which came in slightly higher. But θ1 angles were still smaller than expected, mostly between 110° and 140°.

Armed with this tranche of high-quality data, the researchers established a Halogen Bond Score, or HalBS. For any potential halogen bond in a new crystal structure or other structural model, the distance, θ1, and, if applicable, the θ2 values are calculated, and if any of these diverge too far from the median values, HalBS flags them. Importantly, the researchers acknowledge that “the current HalBS cannot be used as a direct validation metric but can provide an indication of genuine halogen bonds and ‘not so proper’ halogen bonds.”
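The flagging idea can be illustrated with a simple tolerance check. This is a simplified sketch of the logic described above, not the HalBS scoring function: only the 3.5 Å median distance comes from the post, while the angle reference values and all tolerances are hypothetical placeholders that would need to be replaced with the paper's medians and cutoffs.

```python
# Placeholder reference values for X...O/S contacts. Only the 3.5 Å distance comes
# from the post; the angle medians and all tolerances are hypothetical stand-ins.
# Halogen-pi contacts would need their own reference dict (e.g. a 4.8 Å distance).
REFERENCE = {"distance": 3.5, "theta1": 150.0, "theta2": 105.0}
TOLERANCE = {"distance": 0.5, "theta1": 30.0, "theta2": 30.0}


def flag_halogen_contact(measured, reference=REFERENCE, tolerance=TOLERANCE):
    """Return the sorted names of parameters that deviate too far from the reference.

    `measured` maps parameter names to values; theta2 may be omitted when it does
    not apply, in which case only the supplied parameters are checked.
    """
    return sorted(name for name, value in measured.items()
                  if abs(value - reference[name]) > tolerance[name])


print(flag_halogen_contact({"distance": 3.4, "theta1": 165.0, "theta2": 100.0}))  # []
print(flag_halogen_contact({"distance": 4.3, "theta1": 110.0}))  # ['distance', 'theta1']
```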

With this caveat, HalBS could be useful, and the researchers have made the source code available at https://github.com/PDB-REDO/HalBS (though the link doesn’t seem to work for me). As they note, more data, such as might be provided by widespread deposition of large crystallographic fragment screens, could further refine HalBS. Of course, the existence of a halogen bond says little about how much binding energy it contributes, but it’s a start.

13 October 2025

Ivermectin postmortem: PAINful experiences with a good drug

Ivermectin is a miracle drug. It cures infections caused by several types of parasitic roundworms, including those that cause river blindness and elephantiasis, two highly unpleasant diseases. Half of the 2015 Nobel Prize for Physiology or Medicine was awarded to researchers who discovered the drug.

More recently, ivermectin was touted as a treatment for SARS-CoV-2. Unfortunately, after more than 90 human clinical trials, the preponderance of evidence shows that it is ineffective against COVID-19. A new (open-access) Perspective in J. Med. Chem. by Olaf Andersen, Jayme Dahlin, and collaborators at Weill Cornell Medicine, the National Institutes of Health, University of North Carolina Chapel Hill, and University of California San Francisco explores why it proved so misleading, with lessons for other drug repurposing efforts.

In 2011, ivermectin was one of 480 compounds tested in a biochemical assay at 50 µM and appeared to disrupt the interaction between HIV integrase and a mammalian protein involved in viral trafficking. In April 2020, low micromolar concentrations of ivermectin were reported to have anti-SARS-CoV-2 activity in a cellular assay. Given a terrifying new disease with no treatments, an approved drug that showed even tenuous activity looked like a lifeline. Researchers around the world began studying ivermectin. PubMed citations jumped from 459 in 2019 to 734 in 2021.

The new paper examines some of this past work and dives deeply into the biological effects of ivermectin. The molecule is poorly soluble in water (around 1-2 µM) and partitions into cellular membranes, where it activates a chloride channel receptor in worms, paralyzing them. At this it is quite potent: when taken as directed, human plasma concentrations of ivermectin are around 60 nM, but because the drug is highly protein bound the free concentration is only around 6 nM.

The astute reader may notice that the apparent antiviral activity was observed at low micromolar concentrations of ivermectin, three orders of magnitude higher than physiologically relevant, and the initial biochemical assay was conducted at an even higher concentration. In fact, that work was done using an AlphaScreen, which is notoriously sensitive to artifacts. The new paper demonstrates that ivermectin forms aggregates at low micromolar concentrations, and that these aggregates interfere with the AlphaScreen. (We previously wrote about how many of the early reports of compounds active against SARS-CoV-2 proteins were in fact aggregators.)

If ivermectin perturbs membrane proteins in worms, what does it do in human cells? The researchers tested the drug against panels of G protein-coupled receptors (GPCRs) and found that in one assay format it inhibited more than a quarter of 168 GPCRs at 10 µM, much higher than physiologically relevant but comparable to the in vitro experiments with SARS-CoV-2. Further studies revealed that ivermectin changes the properties of membranes at low micromolar concentrations, as assessed by multiple methods including electrophysiological assays. Several ivermectin analogs were also tested and found to have similar activity, consistent with nonspecific effects. Impressively, these experiments were done blinded to the experimenter.

You might think that messing with membranes would not be good for cells, and you would be right. The researchers found that ivermectin decreased cell viability at low micromolar concentrations in a variety of assay formats through multiple mechanisms, both cytotoxic and apoptotic. Importantly, the concentrations at which ivermectin was active against cells were similar to the concentrations where it showed activity against SARS-CoV-2. The researchers also analyzed 766 PubChem assays and found that ivermectin is active in nearly a third of those assessing cellular toxicity. Killing a host cell is certainly one way of killing a virus, but likely not a useful one.

In summary, the original data suggesting that ivermectin is a developable antiviral agent was flawed. The researchers describe this as “a saga of the damage that can be done by assay interference compounds” and a “cautionary tale for the dangers of ‘pandemic exceptionalism.’” They continue:

The fact that a repurposed drug is well-characterized clinically, or that there is an ongoing pandemic, may justify performing clinical and mechanistic experiments in parallel, but not skipping mechanistic studies, where the key experiments could have been done in a matter of a few weeks/months.

This J. Med. Chem. paper is a meticulous, comprehensive study; with 71 pages of supporting information there is far more to cover than I can do justice to in a blog post. The paper also includes a useful flowchart for derisking nonspecific membrane perturbation. It is well worth reading, particularly for those new to drug discovery. As Richard Feynman warned, “the first principle is that you must not fool yourself -- and you are the easiest person to fool.”

06 October 2025

Exploiting avidity for finding fragments

As our poll last year demonstrated, there is no shortage of methods to find fragments. But that doesn’t mean new approaches aren’t welcome, particularly when they also apply to fragment growing. This is the promise of a recent paper in J. Med. Chem. by Thomas Kodadek and collaborators at University of Florida Scripps and Deluge Biotechnologies. (Tom and first author Isuru Jayalath also presented this at the DDC meeting earlier this year.)

The researchers were inspired by the concept of avidity, the observation that multiple copies of a ligand bound to a multiprotein assembly can form a more stable complex than monomeric ligands bound to monomeric proteins. Could this phenomenon be exploited to find weak fragments?

A previous DNA-encoded library screen on streptavidin had identified 28 macrocycles, all of which contained one of two closely related fragments. The affinity of the more potent fragment came in at 706 µM using SPR. The researchers coupled this fragment to TentaGel beads, 10 µm wide polystyrene spheres covered in polyethylene glycol (PEG) chains terminated by amine groups. The PEG makes the beads compatible with aqueous solutions. The beads were soaked in a solution of fluorescently labeled streptavidin, washed, and analyzed. Importantly, streptavidin exists as a tetramer, so each tetramer could bind up to four bead-bound fragments.

Streptavidin bound avidly to the beads, even when incubated at low (50 nM) concentrations. A control protein did not bind, nor did streptavidin bind to beads modified with a negative control fragment. Moreover, a monomeric version of streptavidin did not bind to the beads, illustrating the importance of avidity. Finally, adding the natural ligand biotin kept streptavidin from binding to the beads.

TentaGel beads have long been used in combinatorial synthesis, so the researchers built a small library in which the initial fragment was coupled to 48 carboxylic acids. These were then incubated with labeled streptavidin, and some of the beads showed more intense fluorescence, suggesting more protein binding. SPR analysis revealed that these new molecules had improved affinity, with the best coming in at 90 µM as a monomer. Thus, the primary screen can rank order affinities.

This is great for oligomeric proteins, but what about the large number of targets that are monomeric? Many recombinant proteins are expressed as fusions with glutathione S-transferase (GST), which facilitates purification. Importantly, GST exists as a homodimer in solution. The researchers screened a GST fusion of the oncology target Rpn13 against a small library of 94 fragment-coupled beads and found five hits. SPR studies confirmed weak (K_D > 2 mM) binding for two hits to pure Rpn13 (i.e., without the GST fusion), and this binding could be competed with a known peptide ligand of Rpn13.

Screening beads in individual wells is one thing, but to really increase throughput it would be nice to be able to screen mixtures of different beads. To do so, the researchers developed a photocleavable linker between bead and fragment. The linker also contained an alkyne group that could be modified with a brominated imidazopyridinium moiety. This tag is UV active, ionizes well, and the bromine’s unique isotopic signature helps distinguish true hits from noise. Beads containing more than 50 different compounds, including the two fragment hits we mentioned above, were incubated with labeled streptavidin. Beads to which protein bound were separated by fluorescence-activated cell sorting (FACS), clicked with the tag, cleaved from the beads, and analyzed by mass spectrometry. Only the two known binders were identified, demonstrating the specificity of the approach.
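Why does a bromine tag help? Bromine has two stable isotopes, 79Br and 81Br, in roughly equal abundance, so a singly charged ion carrying one bromine appears as a pair of peaks about 1.998 m/z apart with similar intensities. A minimal sketch of such a doublet filter follows; the peak list, the m/z tolerance, and the intensity-ratio window are hypothetical, and this is not the authors' data-analysis pipeline.

```python
# Monoisotopic masses of the two stable bromine isotopes (Da).
BR79, BR81 = 78.9183376, 80.9162897
BR_SPACING = BR81 - BR79  # ~1.998 Da between the 79Br and 81Br isotopologues


def find_bromine_doublets(peaks, mz_tol=0.01, ratio_range=(0.7, 1.3)):
    """Return (m/z, m/z + ~1.998) pairs consistent with a singly charged ion bearing one Br.

    peaks: list of (m/z, intensity) tuples. Natural 79Br:81Br abundance is roughly
    51:49, so the two peaks of a genuine doublet should be about equally intense.
    """
    peaks = sorted(peaks)
    hits = []
    for mz, inten in peaks:
        for mz2, inten2 in peaks:
            spacing_ok = abs((mz2 - mz) - BR_SPACING) <= mz_tol
            if spacing_ok and ratio_range[0] <= inten2 / inten <= ratio_range[1]:
                hits.append((mz, mz2))
    return hits


# Hypothetical centroided spectrum: one tagged compound plus unrelated peaks.
peaks = [(450.102, 1000.0), (452.100, 975.0), (300.050, 500.0),
         (302.048, 40.0), (375.200, 800.0)]
print(find_bromine_doublets(peaks))  # [(450.102, 452.1)]
```

The peak at 300.050 has a partner at the right spacing but the wrong intensity ratio, so it is rejected; requiring both criteria is what makes the signature hard for noise to mimic.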

This is a neat paper well worth reading. I particularly like the fact that the method can be done with minimal equipment. I look forward to seeing how it works against more targets.

29 September 2025

Twenty-Third Annual Discovery on Target Meeting

The CHI Discovery on Target (DoT) meeting was held last week in Boston. More than 850 people from 24 countries attended, 75% from industry. As usual I’ll just touch on some broad themes.

Covalent approaches

Covalent approaches were prominent throughout the conference. One of the very first talks was by Stefan Harry (Harvard/MGH), who described screening 416 cancer cell lines with three reactive “scout probes,” identifying some 6000 cysteine residues that could be covalently liganded. There are some interesting cell and context-dependent differences, and all the data are publicly available and easily searchable through a free online portal called DrugMap. He is now profiling a library of dual-electrophile-containing compounds to identify molecular glues.

Knowing which cysteines can be targeted is the first step for covalent drug discovery, and Sherry Ke Li described how she and colleagues at Genentech go about finding ligands. They’ve experimentally determined the reactivity of more than 6400 compounds against free cysteine and used this to train a machine-learning model to predict chemical (as opposed to specific) reactivity. Mass spectrometry (MS) using isolated proteins is the workhorse screening approach, but Sherry also described using variable temperature surface plasmon resonance (SPR) to dissect the individual components of k_inact/K_I.
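The ratio k_inact/K_I folds two parameters of the standard two-step covalent model (reversible binding with affinity K_I, then bond formation at rate k_inact) into a single number, which is why separating them is informative. A minimal sketch, assuming the usual relation k_obs = k_inact[I]/(K_I + [I]) and two hypothetical inhibitors with made-up rate constants; this illustrates the model, not the SPR analysis described in the talk.

```python
def k_obs(conc_um, k_inact, k_i_um):
    """Observed first-order inactivation rate (1/s) in the two-step model
    E + I <-> E.I -> E-I:  k_obs = k_inact * [I] / (K_I + [I])."""
    return k_inact * conc_um / (k_i_um + conc_um)


# Two hypothetical inhibitors: (k_inact in 1/s, K_I in uM).
inhibitors = {"A": (0.01, 1.0), "B": (0.1, 10.0)}
for name, (k_inact, k_i) in inhibitors.items():
    efficiency = k_inact / (k_i * 1e-6)  # k_inact/K_I in M^-1 s^-1
    print(f"{name}: k_inact/K_I = {efficiency:.0f} M^-1 s^-1, "
          f"k_obs at 0.1 uM = {k_obs(0.1, k_inact, k_i):.1e} /s, "
          f"k_obs at 100 uM = {k_obs(100.0, k_inact, k_i):.1e} /s")
```

Both hypothetical inhibitors share k_inact/K_I = 10,000 M^-1 s^-1 and behave almost identically at 0.1 µM, yet at 100 µM inhibitor B inactivates the target about nine times faster because its intrinsic reactivity is higher and its reversible affinity weaker.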

AstraZeneca has also been doing considerable covalent screening, and Hua Xu briefly described the BFL1 story we wrote about here. In addition to pure proteins, they are now also starting to screen their covalent library in cells. Hua also presented earlier work from Pfizer on the discovery of the covalent kinase inhibitor ritlecitinib, which started with a noncovalent binder. Proteomic studies revealed that in addition to the intended target JAK3, it hits other TEC-family kinases too.

Adding a covalent warhead to a reversible binder is also the approach taken by MOMA Therapeutics in the discovery of their clinical WRN inhibitor MOMA-341, as presented by Momar Toure. They ended up targeting the same cysteine as Vividion (see here), though the binding mode is somewhat different. MOMA is also pursuing covalent fragment screening using intact protein MS, and Brian Sosa-Alvarado described how they were able to identify nanomolar inhibitors of RAD54L within six months of starting the program, aided by DNA-encoded libraries (DEL).

Not everyone is pursuing cysteine: Ken Hsu described how he and his team at University of Texas Austin are using sulfur-triazole exchange chemistry (SuTEx) to target tyrosine residues across the proteome. He noted that although cysteine could react with this warhead, the resulting thiosulfonate would be unstable. This is true in general, but I wonder if, just like the reversible cyanoacrylamide warheads we wrote about more than a decade ago, they could be stabilized within folded proteins.

Noncovalent approaches

Covalency is not the only game in town, as exemplified in a talk by Emma Rivers on “integrated hit discovery” at AstraZeneca. I was tickled that she grouped FBLD with HTS as “traditional” approaches, onto which they’ve added DEL and peptide libraries. Importantly, they’re focused on generating and capturing as much high-quality data as possible to enable machine learning – a topic we’ll touch on more below.

Nor are proteins the only target; there was a whole track on RNA- and DNA-targeting small molecules, where Benjamin Brigham described the plate-based equilibrium dialysis approach taken at Atavistik to screen metabolites and metabolite-like molecules. This led to two fragment-sized hits against RNA encoding SERPINA1, and although the affinities are modest, they do inhibit translation in a cell-free system.

At FBLD 2018 Astex presented the first cryo-EM structure of a fragment-protein complex, noting that throughput was an issue. The company has leaned into that challenge and now has three microscopes, including a top-of-the-line Krios, with another on the way. Miguel Zamora-Porras described how they have now solved hundreds of structures. Their Krios can collect data on two compounds per day, and the full cycle time from protein-ligand preparation to structure is about a week. Miguel described how structures of ligands bound to the ion channel protein TRPML1 helped reveal why some were agonists and others antagonists.

Data and its discontents

On the subject of structures, Steve Burley (Rutgers) gave an eloquent history and defense of the “RCSB Protein Data Bank: an open access research resource that benefits all humanity.” From its humble beginnings with just seven structures in 1971, the PDB now contains more than 240,000. And these are not just of scientific interest: all 88 of the new molecular entities the FDA approved for oncology between 2010 and 2023 had PDB structures that informed the biology or druggability, and 75% of the efforts involved structure-based design. Steve also mentioned that the question of where and how to store large-scale crystallographic data will be discussed in a meeting sometime in the spring of next year. Finally, Steve is hoping to retire from his position as Director of the RCSB PDB, so if you’re looking to make an impact, please apply.

The dramatic advances in protein structure prediction exemplified by AlphaFold would not have been possible without the PDB, but unfortunately the same high-quality information on protein-ligand binding modes and affinities is not available, as noted by Woody Sherman of Psivant. To illustrate the importance of training data, Woody asked ChatGPT to produce a picture of an analog clock set to 6:32. The result? A clock with three hands, one at 10, one at 2, and one at 6, because most images of clocks are set to 10:10.

Woody asked whether machine-learning-based docking can extrapolate or just interpolate. Although impressive results have been reported for some protein-ligand complexes, it turns out that similar ligands are often present in the training data. For truly novel ligands, the predictions tend to fall flat. Similarly, allosteric ligands are often (mis)placed into an orthosteric site, simply because training has taught the model that ligands belong there. Indeed, although Psivant is heavily invested in computational approaches, Woody mentioned that they often use “wet” approaches for finding initial chemical matter.

On the subject of dubious data, Al Edwards (University of Toronto) noted that a third of all immunofluorescence images in the literature use antibodies that give signals in knockout cells. And as we wrote just last month, many reported small molecule “probes” are just as bad. Al is CEO of the Structural Genomics Consortium, whose ambitious Target 2035 aims to find a pharmacological probe for every target in the human genome. As a starter, they’re aiming for 2000 probes in the next five years. They’re using affinity selection mass spectrometry (ASMS), screening pools of 500 compounds and 8 proteins at a time, and are getting micromolar hits against about 30% of targets. They’re accepting protein submissions, so if you’re looking for starting points against your favorite protein, contact them.

I’ll end here, but please leave comments. And mark your calendar for Sep. 28 to Oct. 1 next year, when DoT returns to Boston.

22 September 2025

Fragment merging without crystallography for CGRP receptor antagonists

Migraines are the third leading cause of disability worldwide. Although the pathology is complex, blocking the interaction of calcitonin gene-related peptide (CGRP) with its receptor, thereby decreasing vasodilation, has proven successful in the clinic. However, some of the early small molecule antagonists were discontinued due to hepatotoxicity. In a recent J. Med. Chem. paper, Naohide Morita, Isao Azumaya, and collaborators at Kissei Pharmaceutical and Toho University describe a new class of inhibitors.

CGRP binds at the interface of a heterodimeric receptor composed of the calcitonin receptor-like receptor (CLR) and receptor activity-modifying protein 1 (RAMP1). To find hits, the researchers screened a library of 2500 fragments (with molecular weights up to 350 Da) at 500 µM against the extracellular CLR/RAMP1 domains using SPR. This yielded 565 hits, which were clustered based on similarity, and 250 were chosen for dose-response studies, leading to 38 confirmed hits. Competition studies with a known CGRP antagonist whittled this number down to just four, with compound 1 being chosen for further study due to ease of analog synthesis.

Compound 1 was confirmed as a binder using isothermal titration calorimetry (ITC). Unfortunately, co-crystallography with CLR/RAMP1 was unsuccessful, so the researchers turned to docking using information from known small molecule inhibitors. This work suggested that compound 1 binds to the CGRP receptor but does not interact with RAMP1, a conclusion further supported by mutagenesis studies.

To find fragments that bind RAMP1, the researchers performed a second fragment screen, again using SPR. This time the fragments were chosen from those in the first set that had not been tested in dose-response studies, supplemented with several hundred more selected based on structures of known CGRP antagonists. Of 784 fragments screened, 114 were taken into dose-response studies, leading to 8 hits. Compound 2 was the most potent, and mutagenesis studies suggested it interacted with RAMP1.
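The two SPR funnels can be compared stage by stage. This short Python sketch simply divides the counts reported above; the stage labels are my own shorthand, not the paper's.

```python
# Stage counts for the two SPR fragment screens, as reported in the post.
screens = {
    "screen 1": [("screened", 2500), ("single-dose hits", 565),
                 ("dose-response", 250), ("confirmed", 38),
                 ("competitive", 4)],
    "screen 2": [("screened", 784), ("dose-response", 114),
                 ("confirmed", 8)],
}


def funnel(stages):
    """Return (stage, count, fraction of previous stage, fraction of input)."""
    start = stages[0][1]
    prev = start
    rows = []
    for name, n in stages:
        rows.append((name, n, n / prev, n / start))
        prev = n
    return rows


for label, stages in screens.items():
    print(label)
    for name, n, step, overall in funnel(stages):
        print(f"  {name:<17} {n:>5}  step {step:6.1%}  overall {overall:7.2%}")
```

The first screen's single-dose hit rate works out to about 23%, while roughly 1% of the fragments in the smaller, more focused second screen ended up as confirmed hits.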

Crystallography of compound 2 was also unsuccessful, but docking, supported by NMR studies, suggested a possible binding mode. Compounds 1 and 2 were merged to yield compound 3, which had a satisfying 2000-fold improvement in potency compared to compound 1. Compound 3 also showed cell activity.
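To put a 2000-fold jump in perspective, it corresponds to roughly 4.5 kcal/mol of extra binding free energy. The sketch below assumes 298 K and treats the potency ratio as if it were an affinity ratio, which is only an approximation for functional measurements.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol*K)


def ddg_from_fold(fold, temperature=298.15):
    """Binding free-energy gain (kcal/mol) equivalent to a fold change in affinity."""
    return R_KCAL * temperature * math.log(fold)


print(f"{ddg_from_fold(2000):.2f} kcal/mol")  # prints "4.50 kcal/mol"
```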

Compound 3 contains three stereocenters, so the researchers sought to simplify the molecule. They also needed to improve potency and metabolic stability. Multiparameter optimization ultimately led to compound 15, with picomolar(!) affinity for the receptor, subnanomolar activity in cells, and good pharmacokinetic properties. A standard model for migraine is inhibition of facial blood flow in marmosets, and compound 15 was active. The compound was also clean in tests for hepatotoxicity.

Although no further development of compound 15 is reported, this is a nice case study in fragment merging. As the researchers note, it is also one of just a handful of examples that succeeded in the absence of crystallographic data (we wrote about another one here). Hopefully this will further embolden researchers to pursue fragment merging and linking without direct structural information.

15 September 2025

Covalent ligand efficiency

Ligand efficiency (LE) was proposed more than two decades ago as “a useful metric for lead selection.” The concept is simple: divide the binding energy of a ligand by the number of non-hydrogen (or heavy) atoms (HA). The higher the number, the higher the binding energy per atom, and thus the more “efficiently” the ligand binds to the protein. LE is particularly useful in fragment-based lead discovery when prioritizing among differently sized hits to ensure that small, weak molecules are not overlooked. While some have criticized the metric’s dependence on standard state, drug hunters have repeatedly found it to be useful, as we’ve discussed here, here, and here.
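For concreteness, here is a minimal Python version of the LE calculation, assuming ΔG = RT ln(K_d) at 298 K with a 1 M standard state. The two molecules are hypothetical: a 1 mM fragment with 12 heavy atoms and a 1 nM lead with 35 heavy atoms end up with nearly identical LE values, which is exactly why the metric helps keep small, weak hits in play. (The familiar shortcut LE ≈ 1.37 × pK_d / HA comes from 2.303RT ≈ 1.36 kcal/mol at this temperature.)

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol*K)


def ligand_efficiency(kd_molar, heavy_atoms, temperature=298.15):
    """LE = -ΔG / HA, with ΔG = RT ln(Kd) relative to a 1 M standard state."""
    delta_g = R_KCAL * temperature * math.log(kd_molar)  # negative when Kd < 1 M
    return -delta_g / heavy_atoms


# Hypothetical molecules
print(round(ligand_efficiency(1e-3, 12), 2))  # 1 mM fragment, 12 heavy atoms -> 0.34
print(round(ligand_efficiency(1e-9, 35), 2))  # 1 nM lead, 35 heavy atoms -> 0.35
```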

Irreversible covalent drugs are a horse of a different color - or perhaps a different species entirely. Because of their two-step mechanism, binding followed by bonding, time is an essential parameter, and the proper way to characterize them is with the ratio k_inact/K_I. Is it possible to develop a covalent ligand efficiency metric? This is the task that György Ferenczy and György Keserű at HUN-REN Research Centre for Natural Sciences and Budapest University of Technology and Economics set for themselves in a recent (open-access) Drug Discovery Today paper.

As we wrote just a couple months ago, an important distinction for covalent drugs is specific vs chemical reactivity: you want the first to be high and the second to be low. For cysteine-reactive molecules, this distinction is often assessed by measuring the rate of reaction with the abundant cellular thiol glutathione (GSH). The researchers sought to incorporate this parameter into their definition of covalent ligand efficiency (CLE) as follows:

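$$\mathrm{CLE} = \mathrm{LE} - \mathrm{LE}(\mathrm{GSH}) = \frac{-1.4\,\log_{10}\left(\mathrm{IC}_{50,t}\right)}{\mathrm{HA}} - \frac{1.4\,\log_{10}\left(k^{\mathrm{2nd}}_{\mathrm{sur}}\,t\right)}{\mathrm{HA}}$$

Where $\mathrm{IC}_{50,t}$ is the half-maximal inhibitory concentration at time $t$ and $k^{\mathrm{2nd}}_{\mathrm{sur}}$ is the second-order rate constant of the ligand reacting with a surrogate nucleophile such as GSH.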
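A minimal sketch of the CLE formula in Python, assuming IC₅₀ in molar units, the surrogate rate constant in M⁻¹s⁻¹, and the incubation time in seconds so that k·t is in M⁻¹ (the paper may use other units). The compound values are made up. The second call shows how a weak, highly reactive compound can end up with a negative CLE.

```python
import math


def cle(ic50_molar, k2_sur, t_seconds, heavy_atoms):
    """Covalent ligand efficiency, CLE = LE - LE(GSH), where
    LE      = -1.4 * log10(IC50,t) / HA
    LE(GSH) =  1.4 * log10(k2_sur * t) / HA
    Assumed units: IC50 in M, k2_sur in M^-1 s^-1, t in s (so k2_sur * t is in M^-1).
    """
    le = -1.4 * math.log10(ic50_molar) / heavy_atoms
    le_gsh = 1.4 * math.log10(k2_sur * t_seconds) / heavy_atoms
    return le - le_gsh


# Hypothetical compounds, 18 heavy atoms, 1 h incubation
print(round(cle(1e-5, 0.01, 3600, 18), 3))  # 10 µM IC50, slow GSH reaction -> 0.268
print(round(cle(1e-3, 1.0, 3600, 18), 3))   # 1 mM IC50, fast GSH reaction -> -0.043
```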

The researchers cataloged multiple covalent modifiers from the literature. Some had reported glutathione reactivity data; for the rest, the researchers estimated these values based on analogs. They went on to calculate CLE values for the protein-ligand pairs. Laudably, all of these data are provided in the supplementary information.

So, how useful would CLE have been in prior lead discovery campaigns? The researchers calculated CLE values for the BFL1 covalent fragment hits we wrote about here. The potencies of the six reported fragment hits, expressed as k_inact/K_I, ranged from 0.7 to 9.5 M⁻¹s⁻¹, but their CLE values spanned a narrower range, from 0.08 to 0.12. The fragment that was successfully optimized was one of the most potent, with a k_inact/K_I of 7.5 M⁻¹s⁻¹, but had a CLE of just 0.09. If anything, CLE would have deprioritized this fragment, at odds with the stated goal that “CLE is designed to support compound priorization.”

As we discussed earlier this year, the researchers previously proposed that covalent fragments may need to be larger than reversible fragments. If this is true, then normalizing for size may be less important for covalent ligands than for noncovalent ones, which can be very small and weak yet still valuable. Indeed, the researchers’ analysis of covalent ligands from the literature shows a smaller range of CLE values than LE values.

The researchers acknowledge other oddities too: “there is no smooth transition from CLE to LE as the reactivity of ligands decreases. Moreover, CLE can take negative values for compounds with low affinity and high reactivity.”

But for me, the biggest liability is the fact that – unlike LE for reversible binders or k_inact/K_I – the value of CLE depends on the time the measurement was taken. (In the paper, the researchers use a 1 hour incubation, so I would propose the annotation CLE_1h.) This makes it difficult to compare CLE values taken at different time points.

The first word of this blog is “practical,” and I’m not convinced this adjective applies to CLE, though I applaud the effort. The popularity of LE spawned a cottage industry of other metrics, some of which we summarized in a 2011 post. I confess that I had nearly forgotten about some of them, but I think they were a useful way for the field to grapple with what characteristics mattered. As covalent drug discovery becomes increasingly popular, perhaps we will see a similar proliferation of metrics. (Indeed, we already wrote about another one here.) It will be interesting to revisit these a decade hence to see which ones have caught on.

08 September 2025

Fragment growing in three dimensions made easy

Nearly a decade ago we highlighted a paper from Astex that exhorted chemists to develop new synthetic methodologies useful for fragment-based drug discovery. Peter O’Brien has taken on the challenge, and he and his collaborators at University of York and AstraZeneca report their progress in a recent (open-access) J. Am. Chem. Soc. paper.

The O’Brien group has previously published synthetic routes to shapely fragments, which we wrote about here. These could be useful for expanding fragment collections, but libraries are expanded only infrequently. The new paper focuses on the far more common challenge of what to do once you have a fragment hit.

The idea was to create a “modular synthetic platform for the elaboration of fragments in three dimensions.” The researchers designed a set of bifunctional building blocks that could be coupled to existing fragments. The two functionalities were N-methyliminodiacetic acid boronate (BMIDA) and a Boc-protected amine. The amine is a versatile handle for multiple types of chemistry, while the BMIDA moiety is particularly useful for Suzuki-Miyaura cross-coupling. (Indeed, two separate groups of researchers had previously built libraries suited for cross-coupling using halogen-containing fragments, as we discussed here.)

For the new building blocks, the researchers considered azetidines, pyrrolidines, and piperidines with fused or spiro-cyclopropyl groups. These are rigid “three-dimensional” units, and the relative locations of the BMIDA group and the amine could provide very different distances and vectors. After modeling 27 possibilities, the researchers chose nine building blocks based on diversity and predicted ease of synthesis. These were synthesized on gram scale, and all nine are now commercially available.

To demonstrate that the building blocks would be generally synthetically useful, the researchers coupled them to a variety of (hetero)aryl bromides, with yields ranging from 10 to 90% (most >60%). The Boc group was then removed and the crude amine was used in a variety of successful reactions.

The building blocks were each also coupled to 5-bromopyrimidine, the Boc group was removed, and the free amines were capped as methanesulfonamides. Small molecule crystallography of the resulting compounds confirmed the modeling results: the two vectors had a wide range of orientations and were separated by 1.5-4.4 Å. Moreover, most compounds were rule-of-three compliant, had good measured aqueous solubility, and were even stable in human liver microsomes and rat hepatocytes.
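One way to quantify the geometry of a bifunctional building block is the distance between the two attachment atoms and the angle between the bonds leaving them. The Python sketch below assumes that reading of “separated by,” uses hypothetical coordinates, and is not the researchers’ analysis code.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _norm(v):
    return math.sqrt(sum(x * x for x in v))


def exit_vector_geometry(anchor1, exit1, anchor2, exit2):
    """Return (distance between anchor atoms in Å, angle between the two
    anchor->substituent bond vectors in degrees)."""
    distance = _norm(_sub(anchor2, anchor1))
    v1, v2 = _sub(exit1, anchor1), _sub(exit2, anchor2)
    cos_angle = sum(x * y for x, y in zip(v1, v2)) / (_norm(v1) * _norm(v2))
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding error
    return distance, math.degrees(math.acos(cos_angle))


# Hypothetical coordinates (Å): one vector along x, the other along z.
d, angle = exit_vector_geometry((0.0, 0.0, 0.0), (1.5, 0.0, 0.0),
                                (0.0, 2.5, 0.0), (0.0, 2.5, 1.5))
print(f"{d:.1f} Å apart, {angle:.0f}° between vectors")  # 2.5 Å apart, 90° between vectors
```

Reporting both numbers matters: two scaffolds can place substituents the same distance apart yet point them in very different directions.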

As a use-case, the researchers considered the approved drug ritlecitinib, an irreversible JAK3 inhibitor. They imagined that its pyrrolopyrimidine moiety was a fragment hit, and then virtually combined it with their nine scaffolds, each functionalized with an acrylamide. These were then virtually docked, and the best two were synthesized and tested. Compound 96 was quite potent, albeit less so than ritlecitinib.

Whether three-dimensionality is desirable as a design feature remains unproven, as we noted recently. However, whether or not the high Fsp³ of the nine new scaffolds is itself a selling point, they do provide new vectors for fragment growing, and their synthetic enablement justifies including them, at least in virtual campaigns.
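Fsp³ is simply the fraction of carbon atoms that are sp³-hybridized. A trivial sketch using hand-counted atoms for three textbook molecules:

```python
def fsp3(sp3_carbons, total_carbons):
    """Fsp3 = number of sp3-hybridized carbons / total number of carbons."""
    if total_carbons == 0:
        raise ValueError("no carbon atoms")
    return sp3_carbons / total_carbons


print(fsp3(0, 6))            # benzene: 0.0
print(fsp3(6, 6))            # cyclohexane: 1.0
print(round(fsp3(1, 7), 2))  # toluene (one sp3 methyl carbon): 0.14
```

For real molecules, a cheminformatics toolkit such as RDKit (rdMolDescriptors.CalcFractionCSP3) does the counting directly from a structure.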

02 September 2025

Keeping molecular dynamics cool for fragments

Accurately and reliably predicting fragment binding modes would be preferable to doing messy, expensive, and sometimes tedious experimental work, but we’re not there yet. One of the biggest problems is that, because fragments usually bind weakly to proteins, it is hard to tell which of several possible binding modes is most favorable. In an open-access J. Chem. Inf. Model. paper published earlier this year, Stefano Moro and colleagues at the University of Padova report progress.

Their approach, called Thermal Titration Molecular Dynamics (TTMD), analyzes short molecular dynamics simulations across increasing temperatures; if the ligand remains bound to the protein, this indicates a more stable binding mode. (It seems a bit like the dynamic undocking we wrote about here.) The researchers had previously reported good results for larger, drug-sized molecules, but not for four fragment-protein complexes.

Recognizing the low affinities of fragments, the researchers decided to lower the (virtual) temperatures. Rather than heating from 300 to 450 K, they heated from 73 to 233 K; ie, from just below the boiling point of liquid nitrogen to a moderately cold winter’s day in Minnesota. They first docked fragments using PLANTS-ChemPLP, which is free for academics, and chose the five best-scoring poses for evaluation.
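The logic of the titration can be illustrated with a toy Python sketch. Everything here is an assumption: the 20 K step, the stand-in function that replaces an actual MD run, and the scoring (mean interaction similarity across the ladder, stopping once the ligand has effectively left). It is meant only to show why a pose that stays put at higher temperatures ranks first, not to reproduce TTMD’s published scoring functions.

```python
import math


def temperature_ladder(start=73, stop=233, step=20):
    """Simulation temperatures in K; the 20 K step is an assumption."""
    return list(range(start, stop + 1, step))


def toy_similarity(pose_tm, temperature, width=15.0):
    """Hypothetical stand-in for a short MD run: fraction of the starting
    protein-ligand interactions that survive at this temperature."""
    return 1.0 / (1.0 + math.exp((temperature - pose_tm) / width))


def titration_score(pose_tm, temps, cutoff=0.05):
    """Mean interaction similarity over the ladder; heating stops once the
    ligand has effectively dissociated (similarity below cutoff)."""
    total = 0.0
    for t in temps:
        s = toy_similarity(pose_tm, t)
        if s < cutoff:
            break
        total += s
    return total / len(temps)


temps = temperature_ladder()
# Hypothetical docked poses, each with a made-up "dissociation temperature" in K.
poses = {"pose_A": 120.0, "pose_B": 190.0, "pose_C": 150.0}
ranking = sorted(poses, key=lambda p: titration_score(poses[p], temps), reverse=True)
print(temps)    # [73, 93, 113, 133, 153, 173, 193, 213, 233]
print(ranking)  # ['pose_B', 'pose_C', 'pose_A']
```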

Next, the researchers performed TTMD. There are several different ways to assess how well the ligand remains bound to the protein over the course of a molecular dynamics simulation, and four different scoring methods were chosen. When TTMD was tested on the four fragment-protein complexes that had previously failed, at least two of the scoring methods correctly identified the crystallographic binding mode for three of the fragments.

Thus encouraged, the researchers tested ten more compounds bound to six new proteins. The results were promising, with up to 86% of crystallographic binding modes being correctly identified by at least one of the scoring functions in TTMD vs 50% for docking alone. Impressively, two of the examples were MiniFrag-sized, with just 6 or 7 non-hydrogen atoms, yet the crystallographic pose was ranked first by all four TTMD scoring methods.
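“Correctly identified” usually means the predicted pose lies within some RMSD cutoff of the crystallographic ligand; 2 Å is a common choice, although the threshold used in the paper may differ. A minimal sketch, assuming both poses are already in the same protein frame and the atoms are listed in the same order (real tools also account for molecular symmetry):

```python
import math


def pose_rmsd(pose, reference):
    """Heavy-atom RMSD (Å) between two poses in the same protein frame.
    Assumes identical atom ordering; no superposition, no symmetry handling."""
    if len(pose) != len(reference):
        raise ValueError("poses must contain the same atoms")
    sq = sum((a - b) ** 2
             for p, r in zip(pose, reference)
             for a, b in zip(p, r))
    return math.sqrt(sq / len(pose))


def pose_is_correct(pose, reference, cutoff=2.0):
    """Assumed success criterion: RMSD at or below the cutoff (2 Å here)."""
    return pose_rmsd(pose, reference) <= cutoff


# Hypothetical six-atom, ring-like reference ligand and two predicted poses.
ref = [(0.0, 0.0, 0.0), (1.4, 0.0, 0.0), (2.1, 1.2, 0.0),
       (1.4, 2.4, 0.0), (0.0, 2.4, 0.0), (-0.7, 1.2, 0.0)]
near = [(x + 1.0, y, z) for x, y, z in ref]  # whole ligand shifted 1 Å
far = [(x, y, z + 3.0) for x, y, z in ref]   # whole ligand shifted 3 Å
print(round(pose_rmsd(near, ref), 3), pose_is_correct(near, ref))  # 1.0 True
print(round(pose_rmsd(far, ref), 3), pose_is_correct(far, ref))    # 3.0 False
```

For MiniFrag-sized ligands, note that even a modest cutoff like 2 Å can be a large fraction of the molecule’s own dimensions.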

This is nice work, but the question arises of how these specific ligands and proteins were chosen. Several years ago we highlighted a curated set of 93 protein-ligand structures that were used to benchmark other virtual approaches, and it would be nice to see how TTMD performs on these. Still, TTMD’s performance on its chosen examples is encouraging, and laudably the researchers have made their code freely available. If you try it out, please let us know how it works in your hands.

25 August 2025

Fragments vs KEAP1: Fragment growing this time

Kelch-like ECH-associated protein 1 (KEAP1) binds to nuclear factor erythroid 2-related factor 2 (NRF2), targeting it for degradation. Blocking this interaction has anti-inflammatory effects, and indeed the approved drugs dimethyl fumarate and omaveloxolone are believed to act in part through this mechanism. But those drugs hit a lot of other targets, and more specific molecules have long been sought; we wrote about one in 2016 and another in 2021. In an open-access paper just published in Angew. Chem. Int. Ed., Anders Bach and an international team of collaborators at University of Copenhagen and elsewhere describe a new chemical series.

As in the 2016 paper, the researchers started with a crystallographic screen, in this case using the 768-member DSI-poised library, which we wrote about here. This resulted in 80 hits, all binding in the so-called Kelch pocket, which has previously been targeted. Thirteen of these bound in the central region, and compound 1 showed modest but measurable affinity by SPR.

All previously reported non-covalent high-affinity KEAP1 ligands contain at least one acidic moiety to interact with arginine residues in the protein, so the researchers used structure-based design to add carboxylic acids, resulting in compound 4, with low micromolar affinity. This molecule, unlike the initial fragment, could also block the KEAP1-NRF2 interaction in a fluorescence polarization assay.

Building into a hydrophobic sub-pocket yielded compound 12, and adding strategically placed hydrogen-bond acceptors further improved affinity, ultimately producing compound 28, with low nanomolar activity. Crystallography revealed that these molecules bound in a similar fashion to the initial fragment.

Compound 28 and related molecules were tested in a variety of assays. They were selective for KEAP1 over 15 other human Kelch domains in a thermal shift assay. Compound 28 activated NRF2-regulated cytoprotective genes and decreased inflammatory markers in multiple cell lines. It also displayed RNA expression profiles similar to those of other reported non-covalent KEAP1 inhibitors. Cellular potency in some of these assays was as good as 60 nM.

This is a nice fragment-to-lead story, though no ADME or DMPK data are reported, and the combination of relatively high molecular weight, negative charge, and lipophilicity suggests that permeability and oral bioavailability may be challenging. Indeed, the researchers note that no non-covalent KEAP1-NRF2 inhibitors have entered the clinic. Perhaps this target is better suited for covalent inhibitors, preferably ones more selective than dimethyl fumarate. More on those later.

18 August 2025

Hundreds of crystallographic ligands for FABP4 – many not as expected

The ten human fatty-acid binding proteins (FABPs) shuttle lipids around cells. As we noted several years ago, FABP4 and FABP5 are potential drug targets for diabetes and atherosclerosis, but selectivity over FABP3 is needed to avoid cardiotoxicity. Markus Rudolph and colleagues at Hoffmann-La Roche describe progress towards selective molecules in three consecutive open-access Acta Cryst. D papers. Perhaps more importantly, they gift a massive, high-quality data set to the scientific community – along with some important caveats about data for protein-ligand structures.

The first paper focuses on purification and NMR characterization of FABP4. Recombinant FABPs are normally expressed in E. coli, and they always contain natural fatty acids that copurify with the protein. This can complicate ligand binding studies, since the copurified fatty acids act as competitors. Indeed, the researchers highlight two structures in the Protein Data Bank (PDB) whose supposed ligands are probably fatty acids.

To solve this problem, the researchers denatured FABP4, removed the fatty acids, and then refolded the protein. This truly apo form of the protein was studied by NMR, revealing that the protein becomes more rigid upon ligand binding.

The second paper is of more general interest. It reports a set of 229 crystal structures of various FABPs, of which 216 have a bound ligand. Of these, 75 have associated IC₅₀ values for at least one FABP, and 50 compounds have IC₅₀ values reported for FABP3, FABP4, and FABP5. Importantly, the structures are solved to high resolution, with a median of 1.12 Å. Two crystal forms are particularly suitable for soaking, and compounds were typically soaked at 60 mM in 30% DMSO overnight.

All the crystal structures are deposited in the PDB, and all the binding data are provided in the supporting information. Given FABPs’ predilection for carboxylic acids, the ligands contain a variety of carboxylic acid mimetics. This wealth of high-quality data should be valuable for constructing machine-learning binding models, and the researchers conclude by calling “on other industrial organizations to also make their legacy data available such that prediction models with broader applicability may be developed more quickly.”

But it was the third paper that really caught my attention: the researchers summarized it as “what is written on the bottle is not what is in the crystal.” In fact, of the 216 ligands reported, a whopping 33 (15%) do not match the compound registered. These are grouped into several categories and described in detail.

Human error is the simplest to explain: the researchers show an example where a 1,2-benzoxazole was registered as a 1,3-benzoxazole. Because the molecules have the same molecular weight, mass spectrometry could not distinguish them. Similarly, the researchers find several cases where the wrong enantiomer or diastereomer was registered. In another case, a racemic mixture led to a single enantiomer bound to FABP4, with the protein acting as a “chiral sponge.”

Other cases are more unusual, and include ring closing, ring opening, acyl shifts, hydrolysis, and instances of ligand decomposition or incomplete reactions. The researchers note that small amounts of impurities could be particularly problematic at the high ligand concentrations used for soaking; they calculate that just 0.06% impurity would be equivalent to the total amount of FABP in a crystal. Some fragment screens are done at even higher concentrations, further increasing the risk of enriching impurities.
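The arithmetic behind that warning is easy to sketch. Only the 60 mM soaking concentration and the 0.06% figure come from the papers; the 1 µL drop, the 100 µm cubic crystal, and the ~45 mM protein concentration inside the crystal (roughly what a ~15 kDa protein gives in a crystal with ~50% solvent) are my assumptions, so the researchers’ own calculation may differ in detail.

```python
# Only the 60 mM soak and the 0.06% figure come from the text;
# the drop volume, crystal size and in-crystal protein concentration are assumptions.
soak_conc_M = 60e-3            # ligand concentration in the soaking drop
impurity_fraction = 0.0006     # 0.06% impurity
drop_volume_L = 1e-6           # assumed 1 µL drop
crystal_edge_m = 100e-6        # assumed 100 µm cubic crystal
protein_in_crystal_M = 45e-3   # assumed ~45 mM protein within the crystal

impurity_conc_M = soak_conc_M * impurity_fraction
impurity_mol = impurity_conc_M * drop_volume_L
crystal_volume_L = crystal_edge_m ** 3 * 1000.0  # m^3 -> L
protein_mol = protein_in_crystal_M * crystal_volume_L

print(f"impurity: {impurity_conc_M * 1e6:.0f} µM in the drop, {impurity_mol * 1e12:.0f} pmol")
print(f"protein in the crystal: {protein_mol * 1e12:.0f} pmol")
```

Under these assumptions the impurity (36 pmol) and the protein (45 pmol) are present in similar molar amounts, so a trace contaminant with decent affinity could plausibly occupy a large share of the binding sites.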

A 15% rate of unexpected ligands is comparable to the numbers we blogged about here, but those were commercial libraries, whereas this set is from Roche, which likely has better internal quality control. One factor that led to the recognition of the problem is the high resolution, where a single atom change could be readily seen. Another is the buried nature of the ligands; ligands bound on the surface of a protein may have more dynamically disordered bits, which would be difficult to distinguish from missing moieties caused by decomposition.

Indeed, the researchers examine two other proteins, PDE10 and ATX, for which they have also released ~200 ligand-bound structures but at lower average resolutions. There are some unexpected ligands for these proteins too, but many fewer than for the FABPs – or perhaps we just can’t observe some of them.

As we noted back in 2014, up to a quarter of ligand-containing crystal structures in the PDB may contain serious errors, and the researchers cite a study suggesting that 12% are “just bad.” These could have obvious negative consequences for training computational models, and the researchers call on the community to set standards to create a rigorously chosen training set. Perhaps this discussion could be held in parallel with the discussion on how to house fragment screening data, which we wrote about last month.

11 August 2025

Fragments vs CYP125 and CYP142 for M. tuberculosis

Although 2020 and 2021 were baleful exceptions, tuberculosis is normally the world’s deadliest infectious disease. The pathogen Mycobacterium tuberculosis (Mtb) makes its home inside macrophages, the very cells that normally destroy microorganisms. Worse, some strains have become resistant to approved drugs. In a recent open-access J. Med. Chem. paper, Madeline Kavanagh, Kirsty McLean, and collaborators at University of Manchester, University of Cambridge, and elsewhere explore a new mechanism to fight this ancient disease.

An important nutrient source Mtb exploits inside human cells is cholesterol, which the bacteria oxidize with the cytochrome P450 enzyme CYP125. A second enzyme, CYP142, is also present in some strains and is functionally redundant. Thus, the researchers set out to make a dual inhibitor.

Mtb has some 20 CYPs, and the Cambridge researchers have been studying them for a long time: we wrote about their work on CYP121 in 2016 and their work on CYP126 in 2014. All these enzymes contain a heme cofactor, and much is known about targeting the bound iron. However, some ligands are promiscuous, hitting human P450 enzymes, or they are rapidly effluxed out of cells. Thus, the researchers built a fragment library of just 80 likely heme binders but excluded particularly promiscuous moieties, such as imidazoles. The library was screened using UV-vis spectroscopy; ligands that bind to the heme group cause a red-shift in the λ_max. Only four hits were found for CYP125, while a dozen were found for CYP142, including three of the four CYP125 hits. Compound 1a had modest affinity for CYP125 and low micromolar affinity for CYP142.
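Hit calling from such a screen can be sketched in a few lines: compare the Soret maximum with and without each fragment and flag red shifts. The 2 nm threshold and all wavelengths below are hypothetical. Blue shifts, typical of type I (substrate-like) binders that displace the iron-bound water rather than coordinate the iron, are deliberately not counted.

```python
def red_shift_hits(apo_soret_nm, soret_with_fragment_nm, min_shift_nm=2.0):
    """Return fragments whose Soret maximum moves to longer wavelength by at
    least min_shift_nm (an assumed threshold). Blue shifts are ignored."""
    return [name for name, lam in soret_with_fragment_nm.items()
            if lam - apo_soret_nm >= min_shift_nm]


# Hypothetical Soret maxima (nm) for the enzyme alone and with each fragment.
apo = 417.0
with_fragment = {"frag_01": 417.5, "frag_02": 422.0,
                 "frag_03": 393.0, "frag_04": 420.5}
print(red_shift_hits(apo, with_fragment))  # ['frag_02', 'frag_04']
```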

Compound 1a was soaked into crystals of CYP142, and interestingly two molecules bound at the active site: one coordinating to the iron atom as expected, the other binding near the entrance of the active site. This suggested a linking or merging strategy, so the researchers made small libraries based on compound 1a and tested these against the two enzymes. Compound 5m was the most potent against both. Crystal structures of this molecule bound to both CYP125 and CYP142 confirmed that the pyridine nitrogen maintained its interaction with the heme iron, while the added bit nicely filled the space previously occupied by the second copy of compound 1a.

Functional assays revealed that compound 5m inhibited both enzymes with nanomolar activity, comparable to its measured affinities. It also inhibited the growth of Mtb cultured on media containing cholesterol as the sole carbon source. More impressively, it even inhibited the growth of Mtb in standard media spiked with low concentrations of cholesterol. Oddly, though, it also inhibited the growth of Mtb on media lacking cholesterol, albeit at a higher concentration, perhaps suggesting other targets. But one reason tuberculosis is so hard to treat is that the bacteria persist inside human cells. Encouragingly, compound 5m inhibited the growth of Mtb in human macrophages at low micromolar concentrations, and it did not show cytotoxicity at concentrations up to 50 micromolar.

Unfortunately, compound 5m did show cytotoxicity to human HepG2 cells, and it also inhibited several human P450 enzymes at high nanomolar concentrations, which could cause drug-drug interactions. Also, selectivity against other Mtb P450 enzymes is unclear. Finally, no in vitro ADME data are reported. Nonetheless, this is a nice fragment-to-lead story, and compound 5m could be used – cautiously – as a chemical probe to study Mtb biology.

04 August 2025

The Chemical Probes Portal turns ten. Use it!

Last week we highlighted a new tool to computationally predict whether a molecule might aggregate, thereby causing false positives. This doesn’t necessarily mean the molecules are bad (after all, some approved drugs aggregate), but it’s all too easy to screen molecules under inappropriate conditions. This brings up the topic of chemical probes, and as it happens the Chemical Probes Portal turns ten years old this year, as celebrated in a Cancer Cell Commentary by Susanne Müller, Domenico Sanfelice, and Paul Workman and a blog post by Ben Kolbington at the Institute of Cancer Research.

We first wrote about the Chemical Probes Portal in July 2015, when it contained just 7 compounds. When we returned in 2023 it contained more than 500 compounds, and by the end of last year the number was up to 803. As of today it lists 1174 probes for 622 targets. Nearly a third of the probes also have chemically related inactive controls. These seem like large numbers, but the human genome conservatively encodes some 20,000 proteins, and the ambitious Target 2035 initiative seeks chemical probes for all of them.

The new paper emphasizes that the standards are in some ways higher for chemical probes than for approved drugs: “whereas probes principally require a high degree of selectivity, drugs need ‘only’ to be safe and effective and may often hit several targets.” Dimethyl fumarate comes to mind as a highly promiscuous covalent modifier that is nonetheless a useful drug for multiple sclerosis and psoriasis.

Even when a compound hits a target of interest, that doesn’t mean any biological effects observed are due to the target, particularly when the readout is cell death. The researchers note that TH588 was originally reported as a potent inhibitor of MTH1, but it actually kills cancer cells by binding to tubulin, a fact not always mentioned by chemical suppliers. Another study found that ten clinical compounds were still active in cells even when their putative target was knocked out using CRISPR.

The tone of the Commentary is pragmatic, acknowledging that good chemical probes may be hard to find for new or challenging targets. For example, LY294002 is mentioned as a “pathfinder tool” that was useful to explore the biology around the PI3 kinase family but has now been superseded by more selective molecules.

Unfortunately, not everyone seems to have gotten the message. Curcumin, which as we noted can aggregate, form nonselective covalent adducts, fluoresce, and generate reactive oxygen species, appears in >2600 PubMed publications – just in the past year. What a waste.

If you’re exploring the biology of a target, please check the Portal to see whether there are good probes. If you’re reading (or reviewing!) a paper that reports small molecule studies, please check to see whether the probe has been assessed - especially to see if it shows up as one of more than 250 Unsuitables. And if you’re interested in participating, please consider reviewing or even hosting a Probe Hackathon.

28 July 2025

Can machine learning help you avoid SCAMs?

Among the many types of artifacts that can fool screens and derail efforts to find leads, small colloidally aggregating molecules (SCAMs) are particularly pernicious. As we discussed way back in 2009, these molecules can form aggregates in aqueous buffer that interfere with a variety of assays, leading to wasted resources and embarrassing publications.

The problem is that there isn’t necessarily anything wrong with the molecules per se, and even many approved drugs can form aggregates. Thus, it is difficult to predict whether any given molecule will be a troublemaker. In a new (open-access) Angew. Chem. Int. Ed. paper, Pascal Friederich, Rebecca Davis, and collaborators at Karlsruhe Institute of Technology and University of Manitoba Winnipeg explore whether machine learning can help.

The researchers built a Multi-Explanation Graph Attention Network, or MEGAN, which is accessible through a simple web interface. Rather than a homicidal doll, this MEGAN represents atoms as nodes and bonds as edges in a graph, similar to the Fragment Network we wrote about here. MEGAN was trained on a set of 12,338 aggregators and 177,048 non-aggregating molecules. Importantly, the researchers used explainable AI (xAI), which colors portions of the molecule according to their importance for (non)aggregation.
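For readers unfamiliar with graph neural networks, the core idea can be sketched in pure Python: atoms are nodes, bonds are edges, and each atom updates its representation by taking an attention-weighted average over itself and its bonded neighbors. This is a generic, single-step toy with one-number atom features and a made-up scoring rule; it is not MEGAN’s actual architecture, which uses learned multi-dimensional features and multiple explanation channels.

```python
import math

# Heavy-atom graph for ethanol (SMILES CCO): atoms are nodes, bonds are edges.
atoms = ["C", "C", "O"]
bonds = [(0, 1), (1, 2)]

# Hypothetical one-number atom features; real models learn feature vectors.
feature = {"C": 0.5, "O": -1.0}
h = [feature[a] for a in atoms]

neighbors = {i: [] for i in range(len(atoms))}
for i, j in bonds:
    neighbors[i].append(j)
    neighbors[j].append(i)


def attention_step(h, neighbors):
    """One round of message passing: each atom takes a softmax-weighted
    average of itself and its bonded neighbors (toy score = h_i * h_j)."""
    new_h, weights = [], {}
    for i in range(len(h)):
        group = [i] + neighbors[i]
        scores = [h[i] * h[j] for j in group]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]  # numerically stable softmax
        alpha = [e / sum(exps) for e in exps]
        weights[i] = dict(zip(group, alpha))
        new_h.append(sum(a * h[j] for a, j in zip(alpha, group)))
    return new_h, weights


new_h, weights = attention_step(h, neighbors)
print([round(x, 3) for x in new_h])
print({i: {j: round(a, 2) for j, a in w.items()} for i, w in weights.items()})
```

Weights like these are the kind of per-atom quantity that can be projected back onto a structure and colored; how MEGAN generates its explanations in detail is described in the paper.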

Testing MEGAN on a set of 1500 aggregators and 1500 non-aggregators, none of which were included in the training set, yielded an accuracy of 82%. Given that most molecules don’t aggregate, a model biased towards non-aggregators would be expected to have a high accuracy, and to account for this the researchers assessed the “F1” score, which was similarly impressive.
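The reason accuracy alone can mislead is easiest to see on an imbalanced set. In this sketch with hypothetical counts, a model that labels everything a non-aggregator scores 90% accuracy yet has an F1 of zero for aggregators, while a model that actually finds most aggregators gets similar accuracy with a meaningful F1.

```python
def metrics(tp, fp, tn, fn):
    """Accuracy and F1 for the positive class (aggregators)."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, f1


# Hypothetical imbalanced test set: 100 aggregators, 900 non-aggregators.
# A model that calls every molecule a non-aggregator:
print(metrics(tp=0, fp=0, tn=900, fn=100))  # (0.9, 0.0)
# A model that catches 80 aggregators at the cost of 60 false alarms:
print(metrics(tp=80, fp=60, tn=840, fn=20))  # accuracy 0.92, F1 about 0.67
```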

The researchers provide several examples in which subtle variations transform a molecule from a non-aggregator to an aggregator, and show that MEGAN correctly predicts these. Furthermore, it “shows its work,” highlighting the chemical features underlying the prediction. For example, 9H-pyrido[3,4-b]indole is predicted with 92% confidence not to be an aggregator, but just adding a methyl group flips the odds in favor of aggregation to 92%.

Exploring the molecular features that lead to aggregation can reveal general trends, such as rigid, “flat” molecules with moieties that can serve either as hydrogen bond donors or acceptors. This is consistent with a paper we discussed last year, though unfortunately the researchers do not cite it.

To further assess the tool, it was tested against a set of drugs that had been characterized as aggregators or non-aggregators. MEGAN correctly classified 15 of 30 aggregators and 24 of 28 non-aggregators. In contrast, a different program caught only 2 of the aggregators. The researchers note that most of the training data for MEGAN came from a single screen in phosphate buffer at pH 7, and aggregation can be very dependent on buffer components and pH.
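Those drug-set numbers translate into the following sensitivity and specificity; the balanced accuracy is just their average. Only MEGAN’s counts are fully given here, so for the other program only sensitivity can be calculated.

```python
def rates(tp, fn, tn, fp):
    """Sensitivity, specificity and balanced accuracy from a confusion matrix."""
    sensitivity = tp / (tp + fn)  # aggregators correctly flagged
    specificity = tn / (tn + fp)  # non-aggregators correctly cleared
    balanced_accuracy = (sensitivity + specificity) / 2
    return sensitivity, specificity, balanced_accuracy


# MEGAN on the drug set: 15 of 30 aggregators, 24 of 28 non-aggregators.
megan = rates(tp=15, fn=15, tn=24, fp=4)
# The comparison program caught 2 of the 30 aggregators.
other_sensitivity = 2 / 30
print([round(x, 3) for x in megan])  # [0.5, 0.857, 0.679]
print(round(other_sensitivity, 3))   # 0.067
```

In other words, MEGAN missed half the known aggregating drugs but rarely flagged a non-aggregator, a reasonable profile for triage as long as a negative prediction is not taken as proof.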

Practical Fragments has previously highlighted other aggregation predictors, most notably Aggregator Advisor and Liability Predictor. As for any computational model, the old chestnut “trust but verify” applies. MEGAN appears to be a useful tool, but please run physical experiments if the molecule is important.