20 February 2013

Fragmenting natural products – sometimes PAINfully

Many drugs have their origins in natural products. But as any synthetic organic chemist will tell you, natural products often have complex architectures that can take years of effort and dozens of chemists to make in the lab. Thus, many of the compounds made in industry, particularly in the past few decades, look quite different from natural products. High failure rates in drug discovery have led folks to return to natural products or similar compounds, such as those from diversity-oriented synthesis (DOS). In a recent issue of Nature Chemistry, Herbert Waldmann and colleagues at the Max Planck Institute in Dortmund examine whether natural products can serve as starting points for new fragments.

The researchers started by computationally deconstructing 183,769 natural products into 751,577 component fragments. After various filters (size, lipophilicity, reactivity, etc.) they arrived at 110,485 fragments sorted by similarity into 2,000 clusters. The resulting fragments differ in their overall calculated properties from commercial fragments. This is all highly reminiscent of the Emerald (née deCODE) “fragments of life”, though surprisingly that work is not referenced.
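To put those counts in proportion, here is a minimal Python sketch of the arithmetic. The four numbers are the ones quoted above; the variable names are only illustrative, and nothing here reproduces the authors' actual fragmentation or filtering code.

```python
# Counts quoted in the paragraph above; the ratios are plain arithmetic.
natural_products = 183_769
raw_fragments = 751_577
filtered_fragments = 110_485
clusters = 2_000

# Average number of fragments generated from each natural product.
fragments_per_product = raw_fragments / natural_products

# Share of the raw fragments that survived the size/lipophilicity/reactivity filters.
percent_surviving_filters = 100 * filtered_fragments / raw_fragments

# Average number of filtered fragments grouped into each similarity cluster.
mean_cluster_size = filtered_fragments / clusters

print(f"{fragments_per_product:.1f} fragments per natural product")
print(f"{percent_surviving_filters:.1f}% of fragments survive the filters")
print(f"{mean_cluster_size:.1f} fragments per cluster on average")
```

In other words, roughly four fragments come out of each natural product, about one in seven survives the filters, and each cluster holds around 55 fragments on average.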

One challenge of designing new fragments is that you may not be able to buy them. In this case, nearly half of the clusters did have a compound that could be purchased – though perhaps this somewhat defeats the purpose of trying to explore novel chemical space. At any rate, 193 fragments were either bought or synthesized. These were tested in functional assays against p38α MAP kinase and several protein phosphatases. A number of hits were identified, and in the case of p38α, nine kinase-fragment co-crystal structures were solved. Some of these were similar to previously reported fragments, but others were more unusual. Together with the crystal structures, these fragments provide new ideas for a well-studied target.

Looking at the structures of some of the phosphatase inhibitors, however, I started to worry. One strong point of the paper is that it is very complete: the chemical structures of all 193 tested fragments are provided in the supplementary information. Unfortunately, the list contains some truly dreadful members; 17 of the worst are shown here, with the nasty bits shown in red. All of these are PAINS that will nonspecifically interfere with many different assays.

Compounds 15, 44, 49, 159, 166, 173, 174, and 175 are catechols; compounds 89 and 151 (yes, they are the same molecule – guess they really liked this one), 165, 166, 167, and 168 are quinones; compounds 55, 89/151, and 166 are hydroquinones; compound 20 is a Michael acceptor; compound 76 is an epoxide; and compound 184 is a redox cycler. In other words, these fragments are a depressing example of life imitating art (or at least satire).
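For anyone who wants to check the count, the list above can be tallied with a few lines of Python. This is only a bookkeeping sketch: the groupings are transcribed from the paragraph above, and the variable names are illustrative.

```python
# Compound numbers as listed in the post, grouped by flagged motif.
flagged = {
    "catechol": [15, 44, 49, 159, 166, 173, 174, 175],
    "quinone": [89, 151, 165, 166, 167, 168],
    "hydroquinone": [55, 89, 151, 166],
    "Michael acceptor": [20],
    "epoxide": [76],
    "redox cycler": [184],
}

# Invert the mapping: compound number -> motifs it was flagged for.
motifs_by_compound = {}
for motif, compounds in flagged.items():
    for number in compounds:
        motifs_by_compound.setdefault(number, []).append(motif)

distinct_entries = sorted(motifs_by_compound)
multi_flagged = {n: m for n, m in motifs_by_compound.items() if len(m) > 1}

# The post notes that compounds 89 and 151 are the same molecule.
distinct_molecules = len(distinct_entries) - 1

print(len(distinct_entries), "flagged entries out of 193 tested")
print(distinct_molecules, "distinct molecules")
print("flagged for more than one motif:", multi_flagged)
```

The tally gives 17 numbered entries, matching the 17 highlighted above, or 16 distinct molecules once the 89/151 duplicate is counted once. Compounds 89/151 and 166 each carry more than one of the flagged motifs.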

To be blunt: none of these molecules should appear in a screening library today.

I don’t want to pick on these researchers; it is after all laudable that they fully disclosed the structures of their molecules.

However, I am concerned that other people may build libraries containing some of these fragments, or worse, that opportunistic vendors will start selling “natural-product derived fragments.” Indeed, most of these molecules are commercially available. It is disappointing that so many nuisance compounds would find their way into research published in a Nature family journal, and I think it is important to call it out. Only by publicizing the problems that can arise will people be made aware of the dangers.


Anonymous said...

Not disputing your classification of these compounds as PAIN problematic, but perhaps this isn't necessarily a problem for, for example, crystallographic screening or other biophysical assays. Much of the PAIN criteria describes interference with biochemical assays in particular. There are other potential problems (tractability etc.) with those compounds too; however, if you also view the fragment screen as para-analytical, just to find ANY starting points, it could perhaps still be worthwhile, in particular if it yields crystal structures. As always, it comes down to the target in the end.

Dan Erlanson said...

Andreas brings up the important point that crystallography is less susceptible to false positives than, for example, biochemical screens. Indeed, crystal structures of PAINs bound non-covalently to proteins have been reported. The problem is that crystallography is usually a secondary screen, and a fairly resource-intensive one at that. It is also interesting that none of the crystal structures shown in the paper are with PAINs, and conversely none of the PAINs shown in the paper have crystal structures reported.

Also, while crystallography may be less susceptible to PAINs-type false positives, other biophysical techniques such as NMR and SPR may not be so fortunate. Indeed, SPR alone does not conclusively indict this compound, whose behavior is so complex that the authors acknowledge it has limited value for further development despite hitting an important target.

Ultimately any screening library is limited in size, and since this is all the more true when you are using time-consuming biophysical methods to screen fragments, why include molecules known to cause problems?

JonathanBaell said...

Nice Blog Dan!

Have a look at this compound, which possibly wins the prize for hottest Michael acceptor imaginable:


"A cell-permeable bis-nitrobenzylidene-piperidinone compound that acts as a potent, reversible and selective inhibitor of 19S regulatory-particle-associated deubiquitylating enzymes (DUBs) UCH-L5 and USP-14 (IC50 = 2.1 μM against Ub-AMC substrate), with no effect on UCH-L1, UCH-L3, USP2, USP7, USP8 and BAP1 and on the proteasomal proteolytic activities"

Sold by many outlets already, based on a Nature Medicine paper in 2011 (used at 50 μM!). Already being used extensively by the looks of a quick Google. Such a waste. And just one of so many "tool compounds" that are the complete opposite. Why can't this stop?

Stephen Davey said...

Hi Dan,

[I want to declare upfront my interest - I'm the editor at Nature Chemistry who handled this paper.]

I'm delighted to see the paper being discussed in this forum. I can see from your comments that you have concerns about some of the fragments generated.

The paper does describe a filtering process to remove highly reactive groups - acid chlorides and so on - so I wonder if this could be used to weed out these problematic fragments as well?

Then again, there are several drugs out there with, for example, catechol functionality (I only performed a very quick search) so it's not clear to me that would be the right approach across the board - it seems to me that there are few substitutes for the intuition/knowledge of an experienced medicinal chemist in identifying potentially problematic fragments and removing them based on the target in question. Looking forward to hearing your thoughts.


Dan Erlanson said...

Hi Stephen,

Thanks for commenting - I think these sorts of conversations are useful and I'm glad you're part of this one.

There are computational PAINS filters available; Jonathan Baell has recently published an updated set of several hundred:


You are correct that catechols do appear in some drugs, as do some other problematic moieties. However, there are several reasons why such compounds are not suitable in screening collections. First, isolating a moiety from a natural product or a drug has the potential to remove moderating functionalities, leaving just a core reactive group. Second, a drug may be active against a specific receptor at low concentration, but could bind non-specifically at high concentrations to a wide variety of proteins; the moieties I highlighted have been observed to do this. Finally, some of these moieties are just too reactive to include in a screening collection; they will oxidize and/or polymerize. A drug is typically carefully formulated and kept under tightly controlled conditions, but screening compounds are often stored for prolonged periods of time in (mildly oxidizing) DMSO, where all kinds of things can happen.

Also, some of the molecules (such as quinones and compound 184) have been shown to generate hydrogen peroxide under typical assay conditions:


There is so much more chemical diversity space than can possibly be explored, even with fragments, that it is worth avoiding moieties that have been shown repeatedly to be problematic.

Stephen Davey said...

Hi Dan,

Thanks for your input. On that basis it seems some of these PAINful fragments need to be included in the filter - like fragment 184 - a lifetime ban.

For some others it needs a non-death penalty solution (to quote Mr Armstrong). You mention the moderating effect of other groups within natural products. Is this a well-studied problem, to the extent that it would be possible to flag some fragments (let's stick with catechols) and be able to say, "only use a catechol if you also have X, Y or Z group"?

Of course this is still a problem in an initial fragment screen, but might it be useful if you were trying to 'staple' fragments together?

Dan Erlanson said...

That's a good question, but I think I'd turn it around and ask why include things that could be misleading? It's not as if there is a shortage of molecules to choose from or an over-sampling of chemical space.

I saw Chris Austin give a talk at a Keystone Meeting last year (Addressing the Challenges of Drug Discovery) in which he said:

"Every data point we generate is an artifact until proven otherwise."

I think this is prudent, since some of the most insidious artifacts appear very exciting and waste tremendous resources trying to verify. With that in mind, it makes sense to avoid collecting deceptive data in the first place, like Odysseus plugging the ears of his crew to keep them from hearing the dangerous songs of the sirens.

Probe discovery - not to mention drug discovery - is inherently difficult, so I'd recommend not making things even harder by screening PAINS.

Björn Over said...

I am the first author of this paper and I am happy to see it being discussed here.

You are totally correct: there are molecules in this test set that are not optimal from a MedChem point of view, and we are aware that these are not the classical first choice for a follow-up. But designing a drug was not our intention.

It seems that discussing our main intention (the fragmentation procedure, the representative library, etc.) was not as interesting as pointing at some natural product structural motifs that do not fit common MedChem patterns.

The focus of this work was clearly on the algorithm: to present a rational and objective way to generate virtual natural-product derived fragments for exploration of nature’s chemical space, keeping functional groups, attachment points and side chains in order to picture the full diversity. We purposely did not introduce any further bias into this computational step, as we wanted to generate the full range of fragments that are present in natural products. In the paper we clearly pointed out that unstable or reactive fragments occur.

By fragmenting the whole Dictionary of Natural Products we, for the first time, provide an overview of the 2,000 most representative fragments for the known natural product chemical space. This set differs fundamentally from the ‘Fragments of Life’ library, which focuses only on a small group of metabolites and their derivatives with very scaffold-like structures.

Additionally, we stated that the most interesting NP fragments are not commercially available and need to be synthesized first, although we provide a list of derivatives that may be able to substitute for nearly half of the cluster centers. As synthesis was not the focus of this project, we screened, besides some synthesized fragments, the molecules that were available, as a proof of concept. Of course the test set is not fully representative of the total natural product chemical space. Nevertheless, these structural motifs are annotated in the DNP.

For a follow-up we focused on the synthesized, more 3D-like fragments and were able to present a completely new class of type-III inhibitors for the well-established drug target p38alpha MAP kinase, based on the natural product cytisine. To my knowledge there are no such fragment-like structures reported to bind solely to this allosteric binding site of a kinase. This shows the potential for finding new chemotypes by exploring a divergent chemical space with the help of the structurally different NP fragments.

I believe that many researchers in the fragment-based community will have a look at the 2,000 representative fragments shown in the Supplementary Information and find interesting structures that are worth synthesizing as starting points for new screening campaigns.
Therefore I am convinced that this paper deserves to appear in Nature Chemistry, as it will have a high impact and be an inspiration for others, like the 3D fragment consortium, to explore nature’s chemical space. And yes, hopefully vendors will pick this up and someday sell natural product-derived fragments.

Dr. Teddy Z said...

@Bjorn...I disagree completely with your last two paragraphs. Nobody wants to spend anything more than $$ on fragments, especially not chemistry resources. If you can't buy it (and cheaply) it will never be a part of a screening collection. And I agree with Dan's original comments that these fragments do not belong in a screening library. As anyone who has done FBDD (or FBHG or whatever acronym) for a job can tell you, anyone can find a fragment that binds. It's what you do with that fragment afterwards that makes you money, or loses it. No self-respecting chemist is going to spend any time doing hit expansion on a molecule with known ADME/Tox issues. I laud your identifying the range of naturally occurring fragments, but to propose those that are known to be bad actors chemically as part of a screening collection is not laudable.

Björn Over said...

It was not our intention to propose that anybody use known toxic molecules for screening, but these were natural product fragments that we got results for, which we showed mainly in the Supplementary Information for the sake of presenting the range of fragments we were able to test within this project.
If I had known that these structures would distract the community from the really good results (see the paper) and from the discussion of how to arrive at more 3-dimensional and more natural product-like fragments (a lot of people are looking for these), then I would maybe just have left these molecules out.

The message was supposed to be more like: 'Look, we have an idea how to tackle the NP chemical space, please look at these 2,000 fragments and see if it can help you generate new ideas.' This is what should be discussed.

You are correct, not a lot of people will spend the effort to make new molecules for their screening libraries; they would rather use the cheap ones you can buy (like we partly did here, and you see how it ends up...).
But how do you expect to get something new out of your screens if you always put in the same molecules?
Why is there a European Lead Factory within the IMI, that spends a lot of money on making new molecules?

My personal opinion is that we do not need to make the haystack higher, but need to increase the number of needles. And this is true not only for small molecules, but for fragments as well.

So please, ignore the used/generated molecules you do not like and focus on the real story of our work. If you disagree with our approach and statements I can handle it, but do not trash everything because of a suboptimal test set.

Dr. Teddy Z said...

@Bjorn: 'Look, we have an idea how to tackle the NP chemical space, please look at these 2,000 fragments and see if it can help you generate new ideas.' This is what should be discussed.
I thought we were discussing it. :-) So, what percentage of all these molecules are PAINs? Those molecules will not generate new ideas, for exactly the reasons Dan has pointed out in the comments.

Dan Erlanson said...

Hi Björn,

First of all, thanks for commenting; I think these back-and-forth discussions are useful. Ultimately we are all on the same side and trying to advance knowledge.

It is certainly not my intention to “trash” your paper, and I completely agree that natural products can provide valuable starting fragments.

However, I do think that it is important to draw attention to areas of concern. The issue is not whether the molecules highlighted will be useful for drug discovery, but whether they will advance understanding or obfuscate it. You state in your paper:

Application of the knowledge-based HTS filter implemented in Pipeline Pilot removed potentially toxic, unstable or undesirable fragments.

All I am saying is that you should also filter out the molecules I highlighted in the blog, and for the same reasons. The problem is that such molecules are commercially available and they will generate hits against many targets, but they won’t be useful hits.

I think that people need to be very careful in putting together screening collections to avoid including molecules that have repeatedly been shown to be problematic. Including them in your library and your paper could cause researchers to waste a tremendous amount of resources chasing artifacts.

I do hope that the non-problematic structures you highlight will provide some interesting compounds for screening libraries. It is true that people are too often reluctant to invest in synthesis of novel fragments. Perhaps efforts such as the European Lead Factory and 3Dfrag.org can ameliorate this problem: boosting numbers of interesting compounds while avoiding PAINS.

Evert said...

On a different note: is it possible to spot redox cyclers such as compound 184 based on structure (in analogy to PAINS filters), or does one simply need to test this experimentally?

Dan Erlanson said...

Evert, I think many of them can certainly be weeded out using structural filters; I think compound 184 is in fact a PAIN. This is what's so disappointing about the fact that toxoflavin keeps coming up as a hit when there are ample warnings in the literature.

That said, I think as chemists generate new molecules, it will be important to experimentally test them.