24 September 2014

PAINS in Nature

Practical Fragments has previously noted that many pan-assay interference compounds (PAINS) can be found in nature. Indeed, they’ve also found their way – unintentionally – into journals published by Nature Publishing Group. In an effort to educate the scientific community about these artifacts, Jonathan Baell (Monash University) and Mike Walters (University of Minnesota) have just published a Comment in Nature entitled “Chemical con artists foil drug discovery”. This is the clearest discussion I’ve yet seen of PAINS, and it deserves to be widely read.

Since the article is open-access I won’t go into depth here, other than to say that the researchers propose three steps to avoid PAINS.

1) Learn disreputable structures.
As a start, the paper provides a rogue’s gallery of some of the worst molecules, along with memorable interpretations by award-winning New Yorker cartoonist Roz Chast. It would be nice to see this posted in every academic screening center.

2) Check the literature.
This is even easier than having to learn structures, and should prevent people from embarrassing themselves by publishing research that is obviously flawed.

3) Assess assays.
Multiple orthogonal assays are useful for all science, not just FBLD!

Together with the recent C&EN story and ACS symposium, this article ensures that PAINS are finally reaching the level of recognition such that scientists, reviewers, and editors will no longer be able to claim ignorance. Willful negligence may be another matter, but at least people will be able to recognize it as such.


Anonymous said...

I find it rather ironic that a sister journal, Nature Chemical Biology, recently published an article (doi:10.1038/nchembio.1623) with a very suspicious structure that falls neatly into the PAINS category. Not only that, but an editorial comment calls for more of these structures to be made (doi:10.1038/nchembio.1632)! A simple search of ChEMBL shows that the compound has been active in 35 assays. Maybe the editors should send the Baell and Walters article around their offices.

Anonymous said...

There are even fast, effective & free ways to filter out PAINS structures - using the free KNIME data analytics platform and a freely available workflow


did i mention it is free?

Pete said...

I believe that we need to be thinking more about the criteria by which compounds are deemed to be PAINS. For example, has a compound been shown experimentally to be a PAIN, or do we think it ought to be a PAIN because it includes a substructure that is believed to be PAINful? What do we mean when we assert that 8% of compounds in commercial libraries are PAINS? If we believe that a substructure is PAINful, then exactly how strong is the link between the presence of the substructure and the observed PAINfulness? Is the PAINfulness restricted to a single type of assay (i.e. detection technology), or has it been observed over different types of assay? How much should we be worried about PAINfulness if affinity to the target has been measured directly and characterized by X-ray crystallography?

I’m certainly not denying that PAINS are a problem and we do need to be aware of potential screening artefacts. At the same time, we do need to find ways to better capture how much we really know about the PAIN levels associated with particular substructures, particularly when asserting literature pollution. As a cautionary tale, it’s worth remembering that it was asserted (http://dx.doi.org/10.1038/nrd2445), “Lipophilicity plays a dominant role in promoting binding to unwanted drug targets”, even though the correlations were for median lipophilicity at each promiscuity level and the activity threshold (>30% inhibition at 10 micromolar) used to define promiscuity is unlikely to have any physiological relevance. Fast forward a bit and we see this work being cited in support of the assertion (http://dx.doi.org/10.1021/jm201388p) that lipophilicity “… has an inevitable role in selectivity and promiscuity”, which could be regarded as a form of inflation in its own right.

Dan Erlanson said...

Pete emphasizes nuances, which are important, as also recognized by the authors of the Comment (exemplified by their tip #3, "assess assays.")

That said, as with anything in life, one needs to strike a balance, and, as the first anonymous commenter illustrates, the literature today is as polluted as a Superfund site.

Also, it's important to note that even if you have a crystal structure of a PAIN bound to your protein, that doesn't mean you've got a good molecule: it might still interact non-specifically with numerous other proteins in a cell, complicating any biological interpretation.

The safest approach is to assume everything is an artifact until proven otherwise. As Richard Feynman noted, ‘The first principle is that you must not fool yourself–and you are the easiest person to fool. So you have to be very careful about that.’

Pete said...

I’m doing a bit more than emphasizing nuances. I’m saying that the analysis (and data) used to assert that a particular substructure is PAINful need to be presented transparently. One can say that we should assess assays, but we also need to assess the ‘Pan’ (e.g. do the offending assays all use the same detection technology?). Also, should we ask whether the Pan-assay ‘activity’ reflects promiscuity/selectivity or is simply an interference problem?

As a potentially amusing (given that I have skewered ligand efficiency metrics) aside, I should note that when analyzing HTS output, we used to flag very low molecular weight ‘actives’ as potentially suspicious.

MAW said...

In most of my analysis, PAINS (or percent PAINS) simply means the percentage of compounds flagged by the substructure filters implemented in Canvas (Schrödinger). The point is simply to say that most commercial libraries have suspect compounds in them, and researchers should be wary. I have found at least one 50k library that has no compounds flagged as PAINS.

When doing HTS triage, I simply flag compounds as PAINS and decide by visual inspection how to prioritize them versus other actives. The same is true of any computational filter we use. After all, our groups don't have the resources to follow up on everything we find, so knowing what might be the risks of moving ahead on a series is important.

Presumably, pharmaceutical companies will sometimes employ even harsher filters to flag or remove compounds. REOS filtering is much stricter, usually flagging on the order of 25-30% of compounds in large commercial libraries. But of course we don't throw out all nitro aromatics at the triage stage.

We are often approached by academic researchers who have developed compounds that we believe fit the PAINS substructure classes. In one case they have crystal structures. Of course, these crystals were formed under conditions that were not at all like the colorimetric assays they were running to determine activity. (And the compounds weren't that potent anyhow, even after a library of ~400 was prepared.) Before moving ahead with a collaboration, we proposed to derisk these compounds by performing simple redox cycling assays. After about 2 years we have yet to have access to any of their compounds.

The bottom line is we use many computational filters to flag compounds for prioritization. PAINS filtering is just one tool in this toolkit.

Jonathan Baell said...

Hi Pete

Points well taken, but they are all pretty much covered in my original PAINS JMC pub (see also the subsequent AJC and FMC papers for a bit extra). Amongst the JMC SI (96 pages) are individual sections devoted to PAINS to help one decide the nuances of each class. All the data as to why PAINS are PAINS are there. There is also discussion of the fact that some members of a PAINS class hit 6/6 assays while others in the same class hit 0/6.
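The within-class spread mentioned above (some members hitting 6/6 assays, others 0/6) can be summarised with a simple tally. The sketch below is illustrative only: the compound IDs and activity data are invented, and `hit_count_distribution` and `fraction_promiscuous` are hypothetical helper names, not anything taken from the JMC paper or its SI.

```python
from collections import Counter


def hit_count_distribution(activity):
    """Map 'number of assays hit' -> 'number of compounds with that count'."""
    return Counter(sum(flags) for flags in activity.values())


def fraction_promiscuous(activity, min_hits=2):
    """Fraction of compounds active in at least `min_hits` assays."""
    if not activity:
        return 0.0
    n = sum(1 for flags in activity.values() if sum(flags) >= min_hits)
    return n / len(activity)


# Invented data: one substructure class, six assays per compound
# (True = active in that assay).
example_class = {
    "cmpd-A": [True, True, True, True, True, True],        # 6/6
    "cmpd-B": [False, False, False, False, False, False],  # 0/6
    "cmpd-C": [True, False, True, False, False, False],    # 2/6
    "cmpd-D": [False, True, False, False, False, False],   # 1/6
}

dist = hit_count_distribution(example_class)
print(dict(sorted(dist.items())))
print(fraction_promiscuous(example_class))
```

Reporting the whole distribution rather than a single percentage keeps the 0/6 members of a class visible alongside the 6/6 ones.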

Pete said...

Hi Jonathan,

I fully agree that we need to be extremely wary of PAINS when using assays that do not measure affinity directly (e.g. in HTS). However, things are less clear when measuring affinity more directly, for example, with SPR or by protein-detected NMR (both used to screen fragments). One question that could (should?) be asked of a PAIN in this situation is whether it is actually a 'bad' compound or just interferes with one or more types of assay. I've made an analogous point with respect to lipophilicity when I've asked how much we still need to worry about logP if we've shown that compounds are aqueous-soluble, hERGless, metabolically stable, etc.

I would agree that compounds that are observed to hit many targets should be flagged up. However, there are other ways to look for bad actors when analysing screening data. The observation that assays against different target classes identify a high proportion of common hits can flag up problem compounds (and susceptible assays) even when the compounds don't show pan-assay behaviour. I recall working up HTS output for a cysteine protease and a tyrosine phosphatase (both have a catalytic cysteine) during my time in Wilmington, and the fact that certain 'yucky-looking' compounds (including some that were subsequently shown to be redox-cyclers) were hitting in both assays increased our caution levels.
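The cross-assay check described above amounts to intersecting hit lists. Here is a minimal sketch, assuming each assay's hits are available as a collection of compound identifiers; the IDs, the hit lists, and the function name `shared_hit_fraction` are all hypothetical, and this is not the workflow that was actually used.

```python
def shared_hit_fraction(hits_a, hits_b):
    """Return (shared hits, fraction of the smaller hit list that is shared)."""
    a, b = set(hits_a), set(hits_b)
    shared = a & b
    smaller = min(len(a), len(b))
    fraction = len(shared) / smaller if smaller else 0.0
    return shared, fraction


# Hypothetical hit lists from two assays against different target classes.
protease_hits = {"cmpd-001", "cmpd-007", "cmpd-013", "cmpd-042"}
phosphatase_hits = {"cmpd-007", "cmpd-042", "cmpd-099"}

shared, fraction = shared_hit_fraction(protease_hits, phosphatase_hits)
print(sorted(shared), round(fraction, 2))
```

Normalising by the smaller hit list is one choice among several; what counts as a worryingly high overlap would depend on the library and the assays.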

On a more general note with respect to 'yucky' compounds, I have put compounds with nitro groups into fragment libraries. One reason for this was that the scaffolds were interesting and I couldn't find them without the nitro groups (one of the predecessors of my former employer used to make dyestuffs and there were interesting heterocycles available in gram quantities). Nitro groups are not a big deal when measuring affinity directly, and I'd typically be confident that a nitro group could be purged (in the Stalinesque manner) from the structure and replaced with something more acceptable.

Joseph Sodroski said...

18A, which was identified in a screen for HIV-1 envelope glycoprotein (Env) inhibitors (doi: 10.1038/nchembio.1623), contains a 4-hydroxyphenylhydrazone moiety. Baell and Holloway (doi: 10.1021/jm901137) reported that 27% of the 215 4-hydroxyphenylhydrazone-containing compounds tested showed activity in two or more AlphaScreen assays.

Anonymous suggests that 18A is a PAIN, noting that the compound has been active in 35 assays (out of 624 assays tested). Considerable evidence was presented in the Nature Chemical Biology paper (doi: 10.1038/nchembio.1623) that 18A exhibits specificity for HIV-1 Env:

1) When the PAIN-like element in 18A was removed by deleting the hydroxyl group from the phenyl ring, the analogue retained activity against HIV-1 (Supplementary Table 5).
2) A counterscreen control assay using the same readout as the Env inhibition assay was used in the high-throughput screen that identified 18A.
3) 18A inhibited HIV-1 infection in an assay in which the compound was washed out of the cell culture after virus entry and long before the reporter protein was expressed in the cells.
4) 18A did not inhibit every strain of HIV-1, nor did it inhibit the related simian immunodeficiency virus. In addition, more than ninety percent inhibition of sensitive HIV-1 strains was observed at concentrations that did not affect a control virus, which was identical to the sensitive viruses except for the envelope glycoproteins.
5) The viral determinant of 18A resistance mapped to gp120, a specific Env subunit.
6) Single amino acid changes in gp120 resulted in altered HIV-1 sensitivity to 18A.
7) 18A had no effect on the binding of Env to host cell receptors, but potently blocked receptor-induced conformational changes in Env.

As reported in the paper, at concentrations higher than those required for antiviral activity, 18A exhibits some non-specific activity. Additional work is needed to determine if the antiviral potency of 18A can be improved and the nonspecific toxicity remedied by modification of the compound. Regardless, the study of 18A revealed the existence of distinct HIV-1 Env conformations and underscored the merit of blocking Env conformational transitions as an antiviral strategy.