23 October 2023

A Liability Predictor for avoiding artifacts?

False positives and artifacts are a constant source of irritation – and worse – in compound screening. We’ve written frequently about small molecule aggregation as well as generically reactive molecules that repeatedly come up as screening hits. It is possible to weed these out experimentally, but this can entail considerable effort, and for particularly difficult targets, false positives may dominate. Indeed, there may be no true hits at all, as we noted in this account of a five-year and ultimately fruitless hunt for prion protein binders.
 
A computational screen to rapidly assess small molecule hits as possible artifacts would be nice, and in fact several have been developed. Among the most popular are computational filters for pan-assay interference compounds, or PAINS. However, as Pete Kenny and others have pointed out, these were developed using data from a limited number of screens in one particular assay format. Now Alexander Tropsha and collaborators at the University of North Carolina at Chapel Hill and the National Center for Advancing Translational Sciences (NCATS) at the NIH have provided a broader resource in a new J. Med. Chem. paper.
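For readers who want to try the established approach themselves, the PAINS substructure definitions ship with open-source toolkits. Below is a minimal sketch using RDKit's FilterCatalog; curcumin is used purely as an example query, and the exact alerts returned will depend on the RDKit version installed.

```python
# Minimal sketch: flag PAINS substructure matches with RDKit's FilterCatalog.
# Curcumin is used only as an example query compound.
from rdkit import Chem
from rdkit.Chem import FilterCatalog

params = FilterCatalog.FilterCatalogParams()
params.AddCatalog(FilterCatalog.FilterCatalogParams.FilterCatalogs.PAINS)
catalog = FilterCatalog.FilterCatalog(params)

curcumin = Chem.MolFromSmiles("COc1cc(/C=C/C(=O)CC(=O)/C=C/c2ccc(O)c(OC)c2)ccc1O")

if catalog.HasMatch(curcumin):
    for entry in catalog.GetMatches(curcumin):
        print("PAINS alert:", entry.GetDescription())
else:
    print("No PAINS alert")
```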
 
The researchers experimentally screened around 5000 compounds, taken from the NCATS Pharmacologically Active Chemical Toolbox, in four different assays: a fluorescence-based thiol reactivity assay, an assay for redox activity, a firefly luciferase (FLuc) assay, and a nanoluciferase (NLuc) assay. The latter two assays are commonly used in cell-based screens to measure gene transcription. The thiol reactivity assay yielded around 1000 interfering compounds, while the other three assays each produced between 97 and 142. Interestingly, there was little overlap among the problematic compounds.
 
These data were used to develop quantitative structure-interference relationship (QSIR) models. The NCATS library of nearly 64,000 compounds was virtually screened, and around 200 compounds were tested experimentally for interference in the four assays, with around half predicted to interfere and the other half predicted not to interfere. The researchers had also previously built a computational model to predict aggregation, and this – along with the four models discussed here – has been combined into a free web-based “Liability Predictor.”
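To make the idea concrete, here is a minimal sketch of how a binary interference classifier of this general kind can be trained on Morgan fingerprints with RDKit and scikit-learn. This is not the authors' actual pipeline; the SMILES strings, labels, and query compound are all placeholders for illustration.

```python
# Sketch of a binary "assay interference" classifier on Morgan fingerprints.
# The training SMILES/labels and the query are made up, not data from the paper.
import numpy as np
from rdkit import Chem
from rdkit.Chem import AllChem
from sklearn.ensemble import RandomForestClassifier

def fingerprint(smiles, n_bits=2048):
    """2048-bit Morgan (ECFP4-like) fingerprint as a numpy array."""
    mol = Chem.MolFromSmiles(smiles)
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=n_bits)
    return np.array(fp)

# Hypothetical training data: 1 = interferes in the assay, 0 = clean
train_smiles = ["CCO", "c1ccccc1O", "O=C(O)c1ccccc1", "CC(=O)Nc1ccc(O)cc1"]
train_labels = [0, 1, 0, 1]

X = np.array([fingerprint(s) for s in train_smiles])
y = np.array(train_labels)

clf = RandomForestClassifier(n_estimators=500, class_weight="balanced", random_state=0)
clf.fit(X, y)

# Predicted probability that a query compound interferes in this (hypothetical) assay
query = fingerprint("Oc1ccc(O)cc1")  # hydroquinone, just an example query
print(clf.predict_proba(query.reshape(1, -1))[0, 1])
```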
 
So how well does it work? The researchers calculated the sensitivity, specificity, and balanced accuracy for each of the models and state that “they can detect around 55%-80% of interfering compounds.”
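For context, balanced accuracy is simply the average of sensitivity (the fraction of interfering compounds caught) and specificity (the fraction of clean compounds correctly passed). A quick sketch of the arithmetic, using made-up labels rather than the paper's data:

```python
# Balanced accuracy is the mean of sensitivity (true-positive rate) and
# specificity (true-negative rate). Labels below are made up for illustration.
from sklearn.metrics import confusion_matrix, balanced_accuracy_score

y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]  # 1 = known interferer
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]  # model calls

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)   # 2/4 = 0.50
specificity = tn / (tn + fp)   # 5/6 ≈ 0.83
print((sensitivity + specificity) / 2)          # ≈ 0.67
print(balanced_accuracy_score(y_true, y_pred))  # same value
```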
 
This sounded encouraging, so naturally I took it for a spin. Unfortunately, my mileage varied. Or, to pile on the metaphors, lots of wolves successfully passed themselves off as sheep. Iniparib was correctly recognized as a possible thiol-interference compound. On the other hand, the known redox cycler toxoflavin was predicted not to be a redox cycler – with 97.12% confidence. Similarly, curcumin, which can form adducts with thiols as well as aggregate and redox cycle, was pronounced innocent. Quercetin was recognized as possibly thiol-reactive, but its known propensity to aggregate was not flagged. Weirdly, Walrycin B, which the researchers note interferes with all the assays, got a clean bill of health. Perhaps the online tool is still being optimized.
 
At this point, the Liability Predictor is perhaps best treated as a cautionary tool: molecules that come up with a warning should be singled out for particular interrogation, but a clean result does not mean a molecule is innocent. Laudably, the researchers have made all the underlying data and models publicly available for others to build on, and I hope this happens. But for now, it seems that no computational tool can substitute for experimental (in)validation of hits.

2 comments:

Peter Kenny said...

Hi Dan, I can’t actually see the article and will make some general comments. The authors state in the abstract that they “… developed and validated quantitative structure–interference relationship (QSIR) models to predict these nuisance behaviors. The resulting models showed 58–78% external balanced accuracy for 256 external compounds per assay.” I take “quantitative” to imply that continuous data have been used to build regression models, but “external balanced accuracy” suggests the models are actually categorical. I’m assuming that the four assays that they’ve run all measure nuisance behavior directly (PAINS filters are based on assumptions that frequent-hitter behavior in the assay panel is indicative of nuisance behavior), and these will not detect nuisance behavior resulting from UV/vis absorption, fluorescence, singlet oxygen reactivity/quenching, or colloidal aggregation. Interference with read-out increases with concentration, and here’s a relevant article from my former AZ colleagues that shows how interference can be assessed and in some cases corrected for.

My view is that QSAR (or ML) models cannot make reliable predictions if they’ve not been trained with data for close structural analogs of the compounds for which predictions are being made, and it may be that there’s nothing similar to toxoflavin in the data sets used to train the models. One simple way to address this issue is to present the user with the relevant data for the compounds from the training set which are considered to be the closest neighbours of the compound for which the prediction has been made (for some reason QSAR/ML modellers appear to consider this a terrible idea). Uneven coverage of chemical space by training and test sets is a real (although rarely acknowledged) problem in QSAR/ML modelling, and my view, expressed in this 2009 article, is that some (most?) “global” models are actually ensembles of local models. Another consequence of uneven coverage of chemical space by training/test sets is that validation procedures can lead to optimistic assessments of model quality.

Dan Erlanson said...

Hi Pete,
Unfortunately the paper is behind a paywall, but I'd be curious to get your thoughts on the Liability Predictor itself, which is not. You are correct that it won't pick out problems due to UV/vis absorption etc., but what I like is that, in theory, it shows in which assays a given compound may show false-positive behavior. As you point out, the training set may be too small, but it is odd that even compounds specifically described in the paper as being problematic seem to pass the online filter.