Practical Fragments: diversity

26 June 2023

Fragment merging in silico, two ways

Roughly speaking there are three ways to advance fragments to leads: growing, linking, or merging. Growing is the most common, but as the number of crystal structures of bound fragments continues to increase, so too does the opportunity for fragment merging, in which elements of two fragments are combined into a new molecule. In a new (open access) J. Chem. Inf. Model. paper, Charlotte Deane and collaborators at the University of Oxford, Informatics Matters, Vernalis, and LifeArc compare two in silico methods.

Fragment merging, as described in the paper, “is used for fragments that bind in partially overlapping space by designing compounds that incorporate substructural features from each.” Each fragment may have extraneous bits that are not kept in the merged molecule. Indeed, sometimes only a small portion of one of the fragments is incorporated into the final molecule. (Both these concepts are shown in an example we recently wrote about here.)

When chemists merge two fragments, they consider the synthetic tractability of the merged molecules. Computers, on the other hand, sometimes propose compounds that are either unreasonable or would require a doctoral-thesis worth of effort to make. One solution is to invert the problem: rather than trying to assess whether specific in-silico-generated molecules can be made, possible merged molecules can be searched against a large virtual library of synthetically accessible molecules for similar compounds (see for example here).

A key question for this approach is the definition of similar. The most common method for finding similar molecules is to reduce each one to a molecular “fingerprint,” a set of features such as the presence or absence of a chlorine atom. The more features two fingerprints have in common, the more similar the molecules are judged to be; Tanimoto similarity quantifies this as the number of shared features divided by the number of features present in either molecule.
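For readers who want to see the arithmetic, here is a minimal sketch of a Tanimoto calculation. It assumes each fingerprint is simply a set of integer indices for the features that are present; the two fingerprints and the function name are hypothetical, and real cheminformatics toolkits use much longer bit vectors.

```python
def tanimoto(fp_a: set, fp_b: set) -> float:
    """Tanimoto coefficient: shared features divided by features in either set."""
    if not fp_a and not fp_b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


# Hypothetical fingerprints: each integer is the index of a feature that is present.
merged_query = {1, 4, 7, 9, 12}
library_compound = {1, 4, 7, 10}

score = tanimoto(merged_query, library_compound)
print(f"Tanimoto similarity: {score:.2f}")
```

Here three features are shared out of six present in either molecule, so the score is 0.5. A similarity search keeps library compounds whose score against the query exceeds some chosen cutoff.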

An alternative approach is to use a “graph database” in which molecules are represented as nodes and edges, with nodes being substructures and edges being connections between the substructures. This approach was described by researchers from Astex as the Fragment Network.
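As a rough illustration of the idea (not the Astex implementation, and not the code from the paper), the sketch below stores a toy network as a plain dictionary and walks outward from a query node, collecting everything within a set number of hops. The five nodes and their connections are made up for illustration, as is the function name.

```python
from collections import deque

# Hypothetical toy network: keys are nodes, values are directly connected nodes.
network = {
    "pyridine": ["3-aminopyridine", "2-phenylpyridine"],
    "3-aminopyridine": ["pyridine", "3-acetamidopyridine"],
    "2-phenylpyridine": ["pyridine", "benzene"],
    "3-acetamidopyridine": ["3-aminopyridine"],
    "benzene": ["2-phenylpyridine"],
}


def neighbours_within(graph, start, max_hops):
    """Breadth-first search returning {node: hops} for nodes within max_hops of start."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in graph.get(node, []):
            if nxt not in hops:
                hops[nxt] = hops[node] + 1
                queue.append(nxt)
    del hops[start]  # the query itself is not a hit
    return hops


hits = neighbours_within(network, "3-aminopyridine", max_hops=2)
print(hits)
```

The point of the sketch is that "similar" here means "a few edges away" rather than "a high fingerprint score," which is one reason the two searches can return different molecules.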

Which of these approaches works better?

In the new paper, the researchers built a virtual library of more than 120 million commercially available molecules. They then selected four proteins with published crystallographic fragment hits. Between 9 and 19 individual fragments were chosen for merging for each protein, forming 55 to 134 potentially mergeable pairs. These were then computationally merged and queried against the database using either similarity searching or the Fragment Network.

Both search methods yielded comparable numbers of possible hits, ranging from just under 23,000 to nearly 169,000 per protein. These were then computationally filtered to find those molecules most likely to bind to the proteins, leaving 56 to 952 molecules. Interestingly, though, the specific molecules varied considerably depending on which search method was used. In fact, molecules from the Fragment Network mostly occupied different regions of chemical space than those found using similarity searching. Moreover, in many cases fragment pairs yielded merged compounds from one method but not the other. The number of predicted interactions with the protein targets also differed, and these differences extended not just to specific interactions but also to functional diversity.

The researchers did not purchase and test compounds themselves, but they did run the analysis against two published examples of fragment merging (one of which we wrote about here) and found that both the Fragment Network and similarity searching could find molecules related to experimentally validated binders.

The question of which approach works better remains open, so the researchers suggest using both. They do note that running a Fragment Network search is computationally less demanding, in this case taking an average of 2 to 14 minutes of CPU time vs 40 minutes for the similarity search. These differences become even more significant when searching billion-compound libraries.

Importantly, the researchers provide the code to generate your own Fragment Network, so you can try this at home. I look forward to seeing how the two techniques perform prospectively.

29 August 2022

Diverse function – not structure – in fragment libraries

Successful fragment-based lead discovery typically starts with a good library. But what is “good”? Given that most fragment libraries are small, diversity is generally prized. The idea is to cover as much chemical space as possible with the fewest molecules. When most chemists hear the word diversity they think of structural diversity; tetrahydrofuran looks quite different from pyridine, for example. Functionally, though, both contain a hydrogen bond acceptor. In a paper recently published (open access) in J. Med. Chem., Charlotte Deane and collaborators at the University of Oxford and Diamond Light Source argue that functional diversity is more important.

Frank von Delft and his XChem colleagues at the Diamond Light Source have been screening dozens of targets crystallographically, many of them using the DSI-poised library, designed to enable rapid elaboration of hits. (We described it here.) For the present analysis, the researchers considered ten diverse proteins (maximum pairwise sequence identity of 27%) that had all been screened against 520 fragments. Of these, 225 bound to at least one target.

The researchers considered what types of interactions the bound fragments made with the protein at either the residue or atomic level. For example, a fragment might serve as a hydrogen bond acceptor to the hydroxyl group of a serine residue. These interaction fingerprints, or IFPs, were calculated and compared.

Interestingly, there was no correlation between fragments that made similar IFPs and their structural similarity. In other words, “structurally dissimilar compounds can exploit the same interactions.” Moreover, many different fragments made similar or identical interactions: “structurally diverse fragments can be described as functionally redundant.”

In fact, just 135 fragments could make all the interactions observed for the 225 fragments. Some made more novel interactions than others, with “promiscuous” fragments that bound to multiple targets tending to be more informative.
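Finding a small set of fragments that reproduces every observed interaction is a set-cover problem. The sketch below uses a simple greedy heuristic, which is an assumption on my part rather than the researchers' actual ranking procedure; the fragment names and interaction labels are hypothetical.

```python
def greedy_cover(interactions_by_fragment):
    """Pick fragments greedily until every observed interaction is covered."""
    uncovered = set().union(*interactions_by_fragment.values())
    chosen = []
    while uncovered:
        # Take the fragment that adds the most not-yet-covered interactions.
        best = max(
            interactions_by_fragment,
            key=lambda frag: len(interactions_by_fragment[frag] & uncovered),
        )
        chosen.append(best)
        uncovered -= interactions_by_fragment[best]
    return chosen


# Hypothetical interaction fingerprints: residue plus interaction type.
observed = {
    "frag_A": {"Ser195:hbond_acceptor", "Asp189:hbond_donor"},
    "frag_B": {"Ser195:hbond_acceptor"},
    "frag_C": {"Phe41:pi_stacking", "Asp189:hbond_donor"},
    "frag_D": {"Phe41:pi_stacking"},
}

subset = greedy_cover(observed)
print(subset)
```

In this toy case two of the four fragments recover all three interactions, so the other two are functionally redundant even though they might look structurally different. A greedy pass is not guaranteed to find the smallest possible subset, but it is easy to run on your own screening data.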

The top 100 of these 135 functionally diverse fragments tended to have molecular weights between 175 and 240 Da and 12 to 16 non-hydrogen atoms, putting them comfortably within rule of three space. Interestingly, fragments that never hit any target skewed smaller, with many having molecular weights less than 175 Da and fewer than 12 non-hydrogen atoms; this is slightly at odds with work from Astex which found many tiny fragment hits.

The researchers considered sub-libraries consisting of either these functionally diverse fragments, randomly selected fragments, or structurally diverse fragments. The number of interactions discovered was significantly higher for the functionally diverse sets of fragments than for the other sets.

On one level the findings are not surprising: the whole concept of bioisosterism relies on the fact that different functional groups can make the same interactions, meaning that structurally disparate fragments can be functionally redundant. This suggests that libraries could be optimized to capture more information with fewer molecules. How to do so prospectively is not entirely clear, but laudably the researchers have provided chemical structures for all the fragment hits in the Supporting Information. It may be worth adding some of the functionally diverse fragments to your library; perhaps some enterprising vendor will start selling the top 100 as a set.

14 January 2013

Poll Results - Hurray for Diversity

In our latest poll, we asked what kind of libraries people like, giving three options:

I like a maximally diverse library (SAR comes from follow up)
I like diversity, but not at the expense of SAR (follow up is easier with some SAR)
My target is teflon so any active fragment is welcome news.

60% of respondents like a maximally diverse library, 31% like diversity with some SAR, and 8% work on teflon targets, so any hit matter is welcome.

The way I read this is that 60% of people don't consider the screen done when the first results come in. In my eyes, the screen is over when there are actives identified with testable SAR hypotheses. This is probably just my bias from having lived in a very resource-constrained environment where follow-up to a screen was a second serving of resources. To me, this is great news; companies that are doing fragment screening are invested and not giving short shrift to these efforts.

I would be curious to hear in the comments how people develop SAR with a maximally diverse library. Do you just pick every available fragment that has the same central core and evaluate all possible side chains? Would you apply a similarity cutoff of 0.9 or something? How many compounds do you follow up with per active fragment?

29 April 2011

Not fragments versus DOS, fragments from DOS

A few months ago we highlighted a forum in Nature comparing fragment-based lead discovery with diversity-oriented synthesis, or DOS. This was quite a vigorous debate and was covered on our sister blog as well as In the Pipeline and second messenger. Personally I’ve never been a this or that kind of guy – more of a this and that – so it is refreshing to see a paper in this week’s issue of PNAS describing a DOS approach to building fragments.

Damian Young and colleagues at Harvard and the Broad Institute, ground zero for DOS, noted that many commercial fragments contain a sizable percentage of sp2 carbons: aromatic rings, for example. Because a larger number of aromatic rings correlates with lower solubility and higher attrition in lead development, the researchers focused on using DOS to generate fragments that would have a higher fraction of sp3 carbons at the expense of sp2 carbons. They used a “build/couple/pair” approach, in which chiral “building blocks” (in this case proline derivatives) were “coupled” to another building block and then functional groups were “paired” to generate bicyclic molecules. The result was about three dozen fragments.

So how do they look? Actually, not so bad. Superficially they all resemble one another, but because they contain up to three stereocenters they cover quite a bit of chemical space while still conforming to the rule of 3. Significantly, they are in fact more three-dimensional than commercially available fragments (from ZINC) having the same molecular formula or the same set of calculated physical properties (molecular weight, cLogP, number of hydrogen-bond donors and acceptors, etc.). The DOS fragments contain a larger fraction of methyl esters and carboxylic acids than I would want to see in a library overall, but this was intentional, and none of them are downright ugly.

Unfortunately the paper provides no screening data, so it is anyone’s guess whether any of the fragments will turn out to be active. Still, the approach is likely to probe new areas of chemical space. Hopefully some of the commercial purveyors of fragments will start making and selling these types of molecules.