But not all artifacts are so easily spotted, as discussed in a new paper just published in J. Med. Chem. by John Irwin, Brian Shoichet, and colleagues at the University of California San Francisco (see also here for Derek Lowe's excellent summary).
The researchers took on one of the most insidious problems, compound aggregation, in which small molecules form colloids that bind to and partially denature proteins, causing false positives in all sorts of assays. This can happen even at nanomolar concentrations of compound, and is all the more problematic at the higher concentrations used in fragment screening and early hit-to-lead optimization. In many cases aggregates can be disrupted or passivated by including nonionic detergents such as Triton X-100 or Tween-80, but not all assays tolerate detergent, and some aggregates form even in the presence of detergent.
Worse, all sorts of molecules can form aggregates, including many approved drugs. Previous attempts to predict which molecules will aggregate have not been very successful. Colloid formation is essentially a phase transition, and like other such transitions (crystallization, for example) it is fiendishly difficult to predict which molecules will do this under which conditions. But if we can’t predict from first principles which molecules will form aggregates, can we at least draw empirical lessons?
The researchers assembled a set of >12,600 known aggregators and put together a very simple model that assesses how similar a molecule of interest is to its closest match among these aggregators (using Tanimoto coefficients, or Tcs). Aggregators have a wide range of physicochemical properties, with ClogP values from -5.3 to 9.8, but 80% have ClogP > 3.0. The team hypothesized that a molecule sufficiently similar to a known aggregator – and also somewhat lipophilic – would have a higher probability of being an aggregator than a molecule chosen at random.
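To make the idea concrete, here is a minimal sketch of a similarity-plus-lipophilicity check. It is not the authors' code: the fingerprint representation (a set of on-bit indices), the function names, and the cutoff parameters are all assumptions made for illustration, and the paper's actual fingerprints and ClogP calculation are not reproduced here.

```python
from typing import Iterable, Set, Tuple

def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)

def nearest_aggregator(
    query: Set[int], known: Iterable[Tuple[str, Set[int]]]
) -> Tuple[str, float]:
    """Return (name, Tc) of the known aggregator most similar to the query."""
    best_name, best_tc = "", 0.0
    for name, fingerprint in known:
        tc = tanimoto(query, fingerprint)
        if tc > best_tc:
            best_name, best_tc = name, tc
    return best_name, best_tc

def looks_like_aggregator(
    query: Set[int],
    clogp: float,
    known: Iterable[Tuple[str, Set[int]]],
    tc_cutoff: float,      # hypothetical parameter: minimum similarity to flag
    clogp_cutoff: float,   # hypothetical parameter: minimum lipophilicity to flag
) -> bool:
    """Flag a molecule that is both similar to a known aggregator and lipophilic."""
    _, tc = nearest_aggregator(query, known)
    return tc >= tc_cutoff and clogp > clogp_cutoff
```

With made-up fingerprints, `tanimoto({1, 2, 3}, {2, 3, 4})` gives 0.5, since two bits are shared out of four distinct bits. A flag from a check like this is only a prompt to run a control experiment, not a verdict.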
To test this idea, the team took a batch of 40 molecules and tested them for aggregation. Among those most similar to known aggregators (Tc ≥95%), 5 of 7 molecules were confirmed as aggregators. This fell to 10 of 19 for the next set (Tc 90-94%), 3 of 7 after that (Tc 85-89%) and only 1 of 7 for the least similar (Tc 80-84%). Thus, Tc ≥85% was chosen as the cutoff.
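The confirmation rates are easier to compare as percentages. The short calculation below simply redoes the arithmetic on the counts quoted above; the variable and function names are illustrative only.

```python
# (similarity bin, confirmed aggregators, molecules tested), counts as quoted above
results = [
    ("Tc >= 95%", 5, 7),
    ("Tc 90-94%", 10, 19),
    ("Tc 85-89%", 3, 7),
    ("Tc 80-84%", 1, 7),
]

def hit_rates(rows):
    """Return (bin label, fraction confirmed as aggregators) for each bin."""
    return [(label, hits / tested) for label, hits, tested in rows]

# Sanity check: the four bins account for the whole batch of 40 molecules.
total_tested = sum(tested for _, _, tested in results)

# Pool the three bins at or above the 85% cutoff.
pooled_hits = sum(hits for _, hits, _ in results[:3])
pooled_tested = sum(tested for _, _, tested in results[:3])

for label, rate in hit_rates(results):
    print(f"{label}: {rate:.0%}")
print(f"Tc >= 85% pooled: {pooled_hits}/{pooled_tested} = {pooled_hits / pooled_tested:.0%}")
```

This works out to roughly 71%, 53%, 43%, and 14% for the four bins, or 18 of 33 (about 55%) at or above the cutoff versus 1 of 7 below it. With only seven molecules in most bins, these percentages are rough estimates.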
Next, the researchers examined molecules that had been reported as active in some sort of biological assay, and found that 7% were ≥85% similar to a known aggregator and had ClogP > 3. Ominously, this rate is an order of magnitude greater than the fraction of commercially available compounds that also fit these criteria. More damning, most of this enrichment has occurred since 1995, when high-throughput and virtual screening really went mainstream. In other words, the past couple of decades have seen a sizable enrichment of potential aggregators in the literature.
All of this is fascinating, but what really makes this paper significant is that the researchers have made all their primary data available, and also built a simple-to-use website called “Aggregator Advisor”. Just draw your molecule or paste a SMILES string to generate a report. For example, entering gossypol tells you that this molecule has previously been reported as an aggregator. (With two catechol moieties, it’s also a PAINS.) Perhaps not coincidentally, it shows up in more than 1800 publications.
Of course, as the researchers note, “just because a molecule aggregates, under some conditions, in the same concentration range as it is active, does not establish that its activity is artefactual.” Indeed, 3.6% of FDA-approved drugs are known aggregators. Still, particularly if your hit has only modest activity (0.1 µM or worse), similarity to a known aggregator should at least make you cautious.
The researchers are at pains to emphasize that their model is “primitive and subject to false negatives and false positives.” Thus, any hits need to be tested to see if they behave pathologically in any given assay. More importantly, a molecule that comes up as a negative should not be presumed to be innocent.
All these caveats aside, Aggregator Advisor is very easy to use. It’s certainly worth running the next time you find an interesting molecule – whether in your lab or in the literature – particularly if there was no detergent in the assay.