Among the many types of artifacts
that can fool screens and derail efforts to find leads, small colloidally
aggregating molecules (SCAMs) are particularly pernicious. As we discussed way
back in 2009, these molecules can form aggregates in aqueous buffer that
interfere with a variety of assays, leading to wasted resources and embarrassing
publications.
The problem is that there isn’t
necessarily anything wrong with the molecules per se, and even many approved
drugs can form aggregates. Thus, it is difficult to predict whether any given
molecule will be a troublemaker. In a new (open-access) Angew. Chem. Int.
Ed. paper, Pascal Friederich, Rebecca Davis, and collaborators at the Karlsruhe
Institute of Technology and the University of Manitoba in Winnipeg explore whether
machine learning can help.
The researchers built a Multi-Explanation
Graph Attention Network, or MEGAN, which is accessible through a simple web interface. Rather than a homicidal doll, this MEGAN represents atoms as nodes
and bonds as edges in a graph, similar to the Fragment Network we wrote about
here. MEGAN was trained on a set of 12,338 aggregators and 177,048 non-aggregating
molecules. Importantly, the researchers used explainable AI (xAI), which colors
portions of the molecule according to their importance for (non)aggregation.
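To make the graph idea concrete, here is a minimal sketch of the atoms-as-nodes, bonds-as-edges representation, using ethanol as a toy example. This is only an illustration in plain Python; it is not how MEGAN itself encodes molecules, and the `atoms`, `bonds`, and `adjacency` names are made up for this post.

```python
# Toy molecular graph for ethanol (CH3-CH2-OH).
# Only heavy atoms are listed; hydrogens are left implicit.
# Illustrative only: this is NOT MEGAN's actual input encoding.

atoms = ["C", "C", "O"]    # node labels, indexed 0..2
bonds = [(0, 1), (1, 2)]   # undirected edges between atom indices


def adjacency(n_atoms, bond_list):
    """Build an adjacency list from an undirected bond list."""
    neighbors = {i: [] for i in range(n_atoms)}
    for a, b in bond_list:
        neighbors[a].append(b)
        neighbors[b].append(a)
    return neighbors


graph = adjacency(len(atoms), bonds)

# Show each atom together with the atoms it is bonded to.
for idx, nbrs in graph.items():
    print(f"{atoms[idx]}{idx} ->", [f"{atoms[j]}{j}" for j in nbrs])
```

A graph neural network passes information along these edges, so each atom's learned representation reflects its chemical neighborhood.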
Testing MEGAN on a set of 1500
aggregators and 1500 non-aggregators, none of which were included in the training
set, yielded an accuracy of 82%. Given that most molecules don’t aggregate, a
model biased towards non-aggregators would be expected to have a high accuracy,
and to account for this the researchers assessed the “F1” score (the harmonic
mean of precision and recall), which was similarly impressive.
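The point about accuracy versus F1 is easy to see with a toy calculation. The sketch below uses an invented, imbalanced set (100 aggregators, 900 non-aggregators; these numbers are not from the paper) and a useless "model" that calls everything a non-aggregator.

```python
# Hypothetical imbalanced data set: 1 = aggregator, 0 = non-aggregator.
# The counts are invented for illustration and do not come from the paper.
y_true = [1] * 100 + [0] * 900
y_pred = [0] * 1000  # a lazy model that always predicts "non-aggregator"


def confusion(truth, pred):
    """Return (tp, fp, fn, tn) for binary labels."""
    tp = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(truth, pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(truth, pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(tp, fp, fn, tn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + fp + fn + tn)


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall; 0.0 if undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


tp, fp, fn, tn = confusion(y_true, y_pred)
acc = accuracy(tp, fp, fn, tn)
f1 = f1_score(tp, fp, fn)
print(f"accuracy = {acc:.2f}, F1 = {f1:.2f}")
```

The lazy model scores 90% accuracy while finding no aggregators at all, which is why its F1 is zero.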
In one example, just adding a methyl group flips
the prediction, raising the odds of aggregation to 92%.
Exploring the molecular features
that lead to aggregation can reveal general trends: rigid, “flat” molecules
with moieties that can serve as either hydrogen bond donors or acceptors are more likely to aggregate. This
is consistent with a paper we discussed last year, though unfortunately the
researchers do not cite it.
To further assess the tool, the
researchers tested it against a set of drugs that had been characterized as aggregators or
non-aggregators. MEGAN correctly classified 15 of 30 aggregators and 24 of 28
non-aggregators. In contrast, a different program caught only 2 of the
aggregators. The researchers note that most of the training data for MEGAN came
from a single screen in phosphate buffer at pH 7, and aggregation can be very
dependent on buffer components and pH.
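For readers who like summary statistics, the drug-set counts above (15 of 30 aggregators and 24 of 28 non-aggregators correctly classified) can be turned into the usual metrics. The derived values below are simple arithmetic on those four counts, not figures reported in the paper.

```python
# Counts taken from the drug test set described above.
tp = 15        # aggregators correctly flagged
fn = 30 - 15   # aggregators missed
tn = 24        # non-aggregators correctly cleared
fp = 28 - 24   # non-aggregators wrongly flagged

sensitivity = tp / (tp + fn)   # fraction of aggregators caught
specificity = tn / (tn + fp)   # fraction of non-aggregators cleared
precision = tp / (tp + fp)     # fraction of flagged drugs that aggregate
accuracy = (tp + tn) / (tp + fn + tn + fp)
f1 = 2 * tp / (2 * tp + fp + fn)

for name, value in [
    ("sensitivity", sensitivity),
    ("specificity", specificity),
    ("precision", precision),
    ("accuracy", accuracy),
    ("F1", f1),
]:
    print(f"{name:12s} {value:.2f}")
```

In other words, on this set MEGAN rarely cries wolf, but it misses half of the known aggregators.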
Practical Fragments has previously highlighted other
aggregation predictors, most notably Aggregator Advisor and Liability Predictor. As with any computational model, the old chestnut “trust but verify”
applies. MEGAN appears to be a useful tool, but please run physical experiments
if the molecule is important.