26 June 2023

Fragment merging in silico, two ways

Roughly speaking, there are three ways to advance fragments to leads: growing, linking, or merging. Growing is the most common, but as the number of crystal structures of bound fragments continues to increase, so too does the opportunity for fragment merging, in which elements of two fragments are combined into a new molecule. In a new (open access) J. Chem. Inf. Model. paper, Charlotte Deane and collaborators at the University of Oxford, Informatics Matters, Vernalis, and LifeArc compare two in silico methods for finding such merges.
 
Fragment merging, as described in the paper, “is used for fragments that bind in partially overlapping space by designing compounds that incorporate substructural features from each.” Each fragment may have extraneous bits that are not kept in the merged molecule. Indeed, sometimes only a small portion of one of the fragments is incorporated into the final molecule. (Both these concepts are shown in an example we recently wrote about here.)
 
When chemists merge two fragments, they consider the synthetic tractability of the merged molecules. Computers, on the other hand, sometimes propose compounds that are either unreasonable or would require a doctoral thesis's worth of effort to make. One solution is to invert the problem: rather than trying to assess whether specific in-silico-generated molecules can be made, search a large virtual library of synthetically accessible molecules for compounds similar to the possible merges (see for example here).
 
A key question for this approach is how to define “similar.” The most common method is to reduce molecules to “fingerprints”: bit strings in which each bit records the presence or absence of a structural feature, such as a chlorine atom. The more bits two fingerprints share, the more similar the molecules are considered to be. Tanimoto similarity quantifies this as the number of bits set in both fingerprints divided by the number of bits set in either.
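To make the Tanimoto idea concrete, here is a minimal sketch in plain Python. It assumes fingerprints are stored as sets of “on” bit indices; in a real pipeline they would be generated by a cheminformatics toolkit, and the toy bit sets, compound names, and 0.6 cutoff below are all hypothetical.

```python
# Minimal sketch: Tanimoto similarity on fingerprints stored as sets of on-bit indices.
# Toy data and the 0.6 threshold are hypothetical, for illustration only.

def tanimoto(fp_a: set[int], fp_b: set[int]) -> float:
    """Bits set in both fingerprints divided by bits set in either one."""
    if not fp_a and not fp_b:
        return 0.0  # convention assumed here for two empty fingerprints
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def search(query: set[int], library: dict[str, set[int]], threshold: float = 0.6):
    """Return (name, score) pairs scoring at or above threshold, most similar first."""
    scored = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    hits = [hit for hit in scored if hit[1] >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


if __name__ == "__main__":
    merged_query = {1, 4, 7, 9, 12}          # fingerprint of a hypothetical merge
    library = {
        "cmpd_A": {1, 4, 7, 9, 13},
        "cmpd_B": {2, 3, 5},
        "cmpd_C": {1, 4, 7, 9, 12, 15},
    }
    print(search(merged_query, library))
```

With the toy data, cmpd_C (5 of 6 bits shared, about 0.83) ranks above cmpd_A (4 of 6, about 0.67), and cmpd_B, which shares no bits with the query, is dropped.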
 
An alternative approach is to use a “graph database” in which molecules are represented as nodes and edges, with nodes being substructures and edges being connections between the substructures. This approach was described by researchers from Astex as the Fragment Network.
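The Fragment Network itself lives in a dedicated graph database, and the sketch below is only a toy, in-memory illustration of the kind of query a graph makes possible. It assumes a simple rule of our own choosing, not necessarily the paper's: walk a limited number of edges out from fragment A, and keep any compound reached that also contains a piece of fragment B. Node names, substructure labels, and the hop limit are hypothetical, and set overlap stands in for real substructure matching.

```python
from collections import deque

# Hypothetical toy network: each node is a compound or fragment annotated with
# made-up substructure labels; edges join compounds that differ by one piece.
NODES = {
    "frag_A": {"pyrazole"},
    "cpd_1": {"pyrazole", "amide"},
    "cpd_2": {"pyrazole", "amide", "phenol"},
    "cpd_3": {"pyrazole", "amide", "piperidine"},
    "frag_B": {"phenol", "methoxy"},
}
EDGES = {
    "frag_A": ["cpd_1"],
    "cpd_1": ["frag_A", "cpd_2", "cpd_3"],
    "cpd_2": ["cpd_1"],
    "cpd_3": ["cpd_1"],
}


def neighbours_within(graph, start, max_hops):
    """Breadth-first search: every node reachable from start in 1..max_hops edges."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if depth[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in graph.get(node, []):
            if nxt not in depth:
                depth[nxt] = depth[node] + 1
                queue.append(nxt)
    return {node for node, d in depth.items() if d > 0}


def candidate_merges(graph, labels, frag_a, frag_b, max_hops=2):
    """Compounds near frag_a that share at least one substructure label with frag_b."""
    pieces_b = labels[frag_b]
    reachable = neighbours_within(graph, frag_a, max_hops)
    return sorted(node for node in reachable if labels.get(node, set()) & pieces_b)


if __name__ == "__main__":
    print(candidate_merges(EDGES, NODES, "frag_A", "frag_B"))
```

Here only cpd_2, two hops from frag_A, carries a piece of frag_B, so it is the single candidate. Unlike a similarity search, nothing is scored: a compound is either connected to both fragments through the graph or it is not.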
 
Which of these approaches works better?
 
In the new paper, the researchers built a virtual library of more than 120 million commercially available molecules. They then selected four proteins with published crystallographic fragment hits. Between 9 and 19 individual fragments were chosen for merging for each protein, forming 55 to 134 potentially mergeable pairs. These were then computationally merged and queried against the database using either similarity searching or the Fragment Network.
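The criterion for calling a pair “potentially mergeable” is the paper's own. As a hypothetical stand-in, the sketch below enumerates every pair of fragment hits and keeps those with at least one pair of atoms closer than a distance cutoff, a crude proxy for binding in partially overlapping space. The 2.0 Å cutoff and the toy coordinates are assumptions.

```python
from itertools import combinations
from math import dist

Coord = tuple[float, float, float]


def mergeable_pairs(fragments: dict[str, list[Coord]], cutoff: float = 2.0):
    """Pairs of fragments with any two atoms closer than cutoff (angstroms).

    Hypothetical proxy for 'binding in partially overlapping space'; not the paper's rule.
    """
    pairs = []
    for (name_a, atoms_a), (name_b, atoms_b) in combinations(fragments.items(), 2):
        if any(dist(p, q) < cutoff for p in atoms_a for q in atoms_b):
            pairs.append((name_a, name_b))
    return pairs


if __name__ == "__main__":
    toy_hits = {
        "F1": [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)],
        "F2": [(2.5, 0.0, 0.0), (4.0, 0.0, 0.0)],
        "F3": [(10.0, 10.0, 10.0)],
    }
    print(mergeable_pairs(toy_hits))  # F1 and F2 come within 1.0 angstrom of each other
```

Because all pairs are checked, the work grows as n(n − 1)/2 for n fragments, which is one reason a filter like this is applied before any library search.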
 
Both search methods yielded comparable numbers of possible hits, ranging from just under 23,000 to nearly 169,000 per protein. These were then computationally filtered to find those molecules most likely to bind to the proteins, leaving 56 to 952 compounds. Interestingly, though, the specific molecules varied considerably depending on which search method was used. In fact, molecules from the Fragment Network mostly occupied different regions of chemical space than those found using similarity searching. Moreover, in many cases fragment pairs yielded merged compounds from one method but not the other. The number of predicted interactions with the protein targets also differed, and these differences extended not just to specific interactions but also to functional diversity.
 
The researchers did not purchase and test compounds themselves, but they did run the analysis against two published examples of fragment merging (one of which we wrote about here) and found that both the Fragment Network and similarity searching could find molecules related to experimentally validated binders.
 
The question of which approach works better remains open, so the researchers suggest using both. They do note that running a Fragment Network search is computationally less demanding, in this case taking an average of 2 to 14 minutes of CPU time vs 40 minutes for the similarity search. These differences become even more significant when searching billion-compound libraries.
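Taking those reported averages at face value, the Fragment Network search was roughly 3- to 20-fold faster per query:

```latex
\frac{40\ \text{min}}{14\ \text{min}} \approx 2.9
\qquad\text{to}\qquad
\frac{40\ \text{min}}{2\ \text{min}} = 20
```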
 
Importantly, the researchers provide the code to generate your own Fragment Network, so you can try this at home. I look forward to seeing how the two techniques perform prospectively.
