Roughly speaking, there are three
ways to advance fragments to leads: growing, linking, or merging. Growing is
the most common, but as the number of crystal structures of bound fragments
continues to increase, so too does the opportunity for fragment merging, in
which elements of two fragments are combined into a new molecule. In a new
(open access) J. Chem. Inf. Model. paper, Charlotte Deane and
collaborators at the University of Oxford, Informatics Matters, Vernalis, and LifeArc
compare two in silico methods.
Fragment merging, as described in
the paper, “is used for fragments that bind in partially overlapping space by
designing compounds that incorporate substructural features from each.” Each
fragment may have extraneous bits that are not kept in the merged molecule. Indeed,
sometimes only a small portion of one of the fragments is incorporated into the
final molecule. (Both these concepts are shown in an example we recently wrote
about here.)
When chemists merge two fragments,
they consider the synthetic tractability of the merged molecules. Computers, on
the other hand, sometimes propose compounds that are either unreasonable or
would require a doctoral thesis's worth of effort to make. One solution is to
invert the problem: rather than trying to assess whether specific computer-generated
molecules can be made, search a large virtual library of synthetically accessible
molecules for compounds similar to the proposed merges (see for example here).
A key question for this approach is
the definition of "similar." The most common method for finding similar molecules
is to reduce each one to a molecular "fingerprint," a set of features such as the
presence or absence of a chlorine atom. The more fingerprint features two molecules
share, the more similar they are; this is the basis of Tanimoto similarity.
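To make this concrete, here is a minimal Python sketch of Tanimoto similarity, assuming each fingerprint is stored as the set of its "on" bit positions (real cheminformatics toolkits typically use bit vectors, and the bit meanings and compound names below are made up). Tanimoto similarity is the number of features two molecules share divided by the number of features present in either one. The `similarity_search` helper is a hypothetical illustration of querying a library with a proposed merge, not the pipeline used in the paper.

```python
def tanimoto(fp_a: set[int], fp_b: set[int]) -> float:
    """Tanimoto similarity of two fingerprints stored as sets of 'on' bit indices."""
    if not fp_a and not fp_b:
        return 0.0  # convention chosen here for two empty fingerprints (an assumption)
    shared = len(fp_a & fp_b)
    # |A AND B| / |A OR B|, with the union size computed as |A| + |B| - |A AND B|
    return shared / (len(fp_a) + len(fp_b) - shared)


def similarity_search(query: set[int], library: dict[str, set[int]],
                      threshold: float = 0.5) -> list[tuple[str, float]]:
    """Hypothetical helper: library entries scoring at or above threshold, best first."""
    hits = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    hits = [hit for hit in hits if hit[1] >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Toy data: bit 0 might mean "contains chlorine", bit 1 "contains an amide", etc.
query = {0, 1, 2, 3}
library = {"cpd_A": {0, 1, 2, 5}, "cpd_B": {4, 6}, "cpd_C": {0, 1, 2, 3}}
print(similarity_search(query, library))
```

Here `cpd_A` shares three of the five features present in either molecule, giving a score of 0.6, while `cpd_B` shares none and falls below the threshold.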
An alternative approach is to use
a “graph database” in which molecules are represented as nodes and edges, with
nodes being substructures and edges being connections between the
substructures. This approach was described by researchers from Astex as the
Fragment Network.
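The graph idea can be sketched with a toy, in-memory network in Python. This is not the Fragment Network's actual schema or query language (the real thing lives in a graph database); the node names, edges, and the rule "a candidate merge is a node close to both fragments" are illustrative assumptions. The point is simply that searching becomes a matter of walking edges rather than comparing fingerprints.

```python
from collections import deque

# Hypothetical toy network: nodes are molecules or substructures (placeholder names),
# and each undirected edge links a molecule to a substructure it contains.
EDGES = [
    ("frag_A", "cpd_1"), ("frag_B", "cpd_1"),
    ("frag_A", "cpd_2"),
    ("frag_B", "cpd_3"),
]


def build_graph(edges):
    """Adjacency sets for an undirected graph."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def within_hops(graph, start, max_hops):
    """Breadth-first search: nodes reachable from start in at most max_hops edges."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if depth[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in graph.get(node, ()):
            if nxt not in depth:
                depth[nxt] = depth[node] + 1
                queue.append(nxt)
    return set(depth) - {start}


def candidate_merges(graph, frag_a, frag_b, max_hops=1):
    """Nodes near both fragments: a crude stand-in for a merge query."""
    return within_hops(graph, frag_a, max_hops) & within_hops(graph, frag_b, max_hops)


graph = build_graph(EDGES)
print(candidate_merges(graph, "frag_A", "frag_B"))  # cpd_1 contains both fragments
```

In this toy graph only `cpd_1` is connected to both fragments, so it is the only candidate that incorporates pieces of each.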
Which of these approaches works
better?
In the new paper, the researchers
built a virtual library of more than 120 million commercially available molecules.
They then selected four proteins with published crystallographic fragment hits.
For each protein, between 9 and 19 fragments were chosen for merging,
forming 55 to 134 potentially mergeable pairs. These were then computationally
merged and queried against the database using either similarity searching or
the Fragment Network.
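How the fragment pairs were chosen isn't spelled out here, so the following Python sketch is only one plausible way to enumerate them. It assumes each fragment is summarized by a hypothetical centroid coordinate and that a pair is "potentially mergeable" if the centroids lie within a made-up distance cutoff, a rough proxy for binding in partially overlapping space. It also assumes pairs are ordered, since a merge can be built outward from either fragment.

```python
import math
from itertools import permutations


def mergeable_pairs(centroids: dict[str, tuple[float, float, float]],
                    max_dist: float = 5.0) -> list[tuple[str, str]]:
    """Ordered fragment pairs whose (hypothetical) centroids lie within max_dist angstroms."""
    pairs = []
    for a, b in permutations(centroids, 2):  # ordered pairs, no fragment paired with itself
        if math.dist(centroids[a], centroids[b]) <= max_dist:
            pairs.append((a, b))
    return pairs


# Toy coordinates: F1 and F2 sit close together, F3 binds far away.
centroids = {"F1": (0.0, 0.0, 0.0), "F2": (3.0, 0.0, 0.0), "F3": (20.0, 0.0, 0.0)}
print(mergeable_pairs(centroids))
```

With these toy coordinates only F1 and F2 are close enough to pair, once in each direction; a real workflow would more likely look at atom-level overlap than at a single centroid.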
Both search methods yielded comparable
numbers of possible hits, ranging from just under 23,000 to nearly 169,000 per
protein. These were then computationally filtered to find the molecules most likely
to bind to the proteins, leaving between 56 and 952 compounds. Interestingly, though, the
specific molecules varied considerably depending on which search method was
used. In fact, molecules from the Fragment Network mostly occupied different
regions of chemical space than those found using similarity searching. Moreover,
in many cases fragment pairs yielded merged compounds from one method but not the
other. The predicted interactions with the protein targets also differed, not only
in number and in which specific interactions were made but also in functional
diversity.
The researchers did not purchase
and test compounds themselves, but they did run the analysis against two published
examples of fragment merging (one of which we wrote about here) and found that
both the Fragment Network and similarity searching could find molecules related
to experimentally validated binders.
The question of which approach
works better remains open, so the researchers suggest using both. They
do note that a Fragment Network search is less computationally
demanding, in this case taking an average of 2 to 14 minutes of CPU time versus 40
minutes for the similarity search. These differences become even more
significant when searching billion-compound libraries.
Importantly, the researchers
provide the code to generate your own Fragment Network, so you can try this at
home. I look forward to seeing how the two techniques perform prospectively.