10 November 2025

Searching monstrously large chemical space with FrankenROCS

Back in 2023 we highlighted a computational fragment linking/merging approach that was used to find high-nanomolar inhibitors of the SARS-CoV-2 macrodomain (Mac1), a COVID-19 target. However, those molecules contained carboxylic acids, which are often associated with poor cell permeability. In a new open-access Sci. Adv. paper, James Fraser and collaborators at UCSF, Relay Therapeutics, Enamine, and Chemspace describe a related approach to find new, non-charged inhibitors.
 
The new approach, called FrankenROCS, “takes pairs of fragments as input to query a database using the rapid overlay of chemical structure (ROCS) method of comparing 3D shape and pharmacophore distribution;” the goal is to find larger molecules that most closely resemble the initial fragment pairs. As with the previous publication, the team started with more than 200 crystallographic fragment hits published in 2021. A set of 7,181 pairs of adjacently bound fragments was searched against 2.1 million compounds commercially available from Enamine. The top 1,000 were inspected, and 39 were purchased and soaked into crystals of Mac1. This led to 10 successful structures, among which AVI-313 did not contain a carboxylic acid. This molecule had weak but measurable activity in an HTRF competition assay.
 
Two million compounds is a lot but pales in comparison to Enamine’s “make-on-demand” REAL space, which at the time this research was done consisted of more than 22 billion molecules. The REAL space molecules are constructed from 960,398 building blocks that can be combined using 143 reactions. We previously described an approach called V-SYNTHES to screen Enamine’s REAL space. FrankenROCS takes a different active-learning approach called Thompson Sampling, which dates back nearly a century.
 
Imagine two sets of 1,000 building blocks, R1 and R2, which could be coupled to generate 1,000,000 molecules. Rather than scoring every possibility, the method links each R1 building block to three random R2 building blocks, and each R2 building block to three random R1 building blocks. These products are virtually screened, and the R1 and R2 building blocks that appear in the highest-scoring compounds are favored in further iterations. In theory, after tens of thousands of iterations, the best compounds will have been identified.
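
For readers who think in code, here is a minimal Python sketch of the idea. It is a toy under stated assumptions, not the authors' implementation: the function name `thompson_search`, the normal approximation used for each building block's score, and the additive made-up `score` function (a stand-in for a real ROCS overlay score) are all hypothetical. The warm-up step mirrors the "three random partners" description above; with 1,000 × 1,000 building blocks it would cost at most 6,000 evaluations rather than 1,000,000.

```python
import random
import statistics


def thompson_search(n_r1, n_r2, score, warmup=3, iterations=300, seed=0):
    """Toy Thompson Sampling over an n_r1 x n_r2 two-component library.

    `score(i, j)` is a hypothetical stand-in for a virtual-screening score
    (higher is better). Returns a dict mapping (i, j) -> score for every
    product that was actually evaluated.
    """
    rng = random.Random(seed)
    seen = {}
    # Running totals and counts of scores observed for each building block.
    totals = ([0.0] * n_r1, [0.0] * n_r2)
    counts = ([0] * n_r1, [0] * n_r2)

    def evaluate(i, j):
        if (i, j) in seen:  # never score the same product twice
            return
        s = score(i, j)
        seen[(i, j)] = s
        for axis, k in enumerate((i, j)):
            totals[axis][k] += s
            counts[axis][k] += 1

    # Warm-up: pair every building block with a few random partners.
    for i in range(n_r1):
        for j in rng.sample(range(n_r2), warmup):
            evaluate(i, j)
    for j in range(n_r2):
        for i in rng.sample(range(n_r1), warmup):
            evaluate(i, j)

    # Assumed spread of scores, estimated from the warm-up products.
    spread = statistics.pstdev(list(seen.values())) or 1.0

    def pick(axis, n):
        # Draw one plausible mean score per block from an approximate
        # normal belief, then keep the block with the highest draw.
        def draw(k):
            mean = totals[axis][k] / counts[axis][k]
            return rng.gauss(mean, spread / counts[axis][k] ** 0.5)
        return max(range(n), key=draw)

    for _ in range(iterations):
        evaluate(pick(0, n_r1), pick(1, n_r2))
    return seen


# Toy library: 30 x 30 = 900 products with additive, made-up block qualities.
_rng = random.Random(1)
q1 = [_rng.random() for _ in range(30)]
q2 = [_rng.random() for _ in range(30)]
evaluated = thompson_search(30, 30, lambda i, j: q1[i] + q2[j])
best_pair = max(evaluated, key=evaluated.get)
print(len(evaluated), "of 900 products scored; best pair found:", best_pair)
```

Building blocks with few observations get wide beliefs and so are still sampled occasionally, while blocks that keep showing up in good products are picked more and more often. A search like this is only as good as the score it is fed.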
 
The researchers fed 97 fragment pairs from the 2021 paper into Thompson Sampling FrankenROCS to find molecules that would best overlay with the fragment pairs. Ultimately 32 compounds were purchased, six of which were successfully crystallized with Mac1. Unfortunately, the most potent was a weaker inhibitor than AVI-313 and contained a carboxylic acid. The researchers speculate that the inability to find better molecules in larger chemical space may have stemmed from limitations of the scoring function, a problem we’ve previously discussed.
 
The researchers returned their focus to AVI-313, making substitutions at multiple positions and ultimately synthesizing 148 compounds, 121 of which could be characterized crystallographically. Importantly, several compounds had low micromolar activity, even without a carboxylic acid. The crystal structures show the binding site to be somewhat flexible, as evidenced by side chain and main chain movements to accommodate some of the binders.

This is a nice, thorough investigation, and the 137 protein-compound crystal structures deposited into the Protein Data Bank provide useful training data for next-generation computational approaches. Moreover, the fact that immeasurably weak fragments can be advanced to low micromolar, ligand-efficient hits is yet another reason for the research community to figure out how to make crystallographic fragment screening data more widely available, as we exhorted here.
