Machine learning is becoming increasingly
common in drug discovery. Just a few months ago we highlighted its use to
design a library of privileged fragments. However, constructing a library is
usually done infrequently (though continued renovation of a library is always a
good idea). In two papers from earlier this year, Jacob Durrant and colleagues
at the University of Pittsburgh use machine learning to tackle the more common task
of lead optimization.
The first paper, in Chem. Sci.,
describes DeepFrag, a “deep convolutional neural network for fragment-based
lead optimization.” The researchers started with the Binding MOAD database, a
collection of nearly 39,000 high-quality protein-ligand complex structures from
the Protein Data Bank. Ligands were computationally fragmented by chopping off terminal
appendages less than 150 Da. The fragments were then converted into molecular fingerprints
encoding their structures. Meanwhile, the protein region around each ligand was
converted into a three-dimensional grid of voxels, akin to how images used for
computer vision training are processed.
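To make the voxel idea concrete, here is a minimal sketch of turning atoms near a binding site into a small 3D grid with one channel per element. Everything in it is an assumption for illustration: the function name `voxelize`, the 8 x 8 x 8 grid, the 1 Å spacing, and the simple atom-count occupancy are not taken from the paper, which may well use different grid dimensions, atom typing, and density functions.

```python
from math import floor

def voxelize(atoms, center, n=8, spacing=1.0, channels=("C", "N", "O")):
    """Count atoms per voxel in an n*n*n cube centred on `center`.

    atoms   : iterable of (element, x, y, z) tuples, coordinates in angstroms
    center  : (x, y, z) of the grid centre, e.g. the fragment attachment point
    returns : nested list indexed as grid[channel][i][j][k]

    Hypothetical illustration only; not the DeepFrag implementation.
    """
    grid = [[[[0.0] * n for _ in range(n)] for _ in range(n)] for _ in channels]
    half = n * spacing / 2.0  # half the edge length of the cube
    for element, x, y, z in atoms:
        if element not in channels:
            continue  # ignore elements that have no channel
        c = channels.index(element)
        # Shift so the cube spans [0, n*spacing) on each axis, then bin.
        idx = [floor((coord - origin + half) / spacing)
               for coord, origin in zip((x, y, z), center)]
        if all(0 <= i < n for i in idx):
            i, j, k = idx
            grid[c][i][j][k] += 1.0
    return grid

# Toy input: three made-up atoms, one of which lies outside the grid.
toy_atoms = [("C", 0.2, 0.2, 0.2), ("O", 10.0, 0.0, 0.0), ("N", -1.5, 0.0, 0.0)]
grid = voxelize(toy_atoms, center=(0.0, 0.0, 0.0))
```

The resulting channels-by-grid array is the 3D analogue of an RGB image, which is why convolutional networks from computer vision carry over so naturally.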
The researchers describe the goal
as follows. “We propose a new ‘fragment reconstruction’ task where we take a
ligand/receptor complex, remove a portion of the ligand, and ask the question ‘what
molecular fragment should go here.’”
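One plausible way to answer that question, given that fragments are represented as fingerprints, is to have the model output a fingerprint-like vector and then rank a library of known fragments by how closely each matches it. The sketch below assumes exactly that; this post does not spell out the mechanism, so treat the cosine-similarity ranking, the function names, and the four-bit toy fingerprints as illustrative assumptions rather than the authors' code.

```python
from math import sqrt

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def rank_fragments(predicted, library, top_k=3):
    """Return the top_k (name, score) pairs, most similar fragment first.

    predicted : vector standing in for a model's output (hypothetical)
    library   : dict mapping a fragment label to its fingerprint vector
    """
    scored = [(cosine_similarity(predicted, fp), name)
              for name, fp in library.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(name, score) for score, name in scored[:top_k]]

# Toy library: labels are fragment SMILES with '*' as the attachment point;
# the four-element fingerprints are invented for this example.
toy_library = {
    "*C": [1, 0, 0, 1],
    "*Cl": [1, 0, 1, 0],
    "*O": [0, 1, 0, 0],
}
toy_prediction = [0.9, 0.1, 0.2, 0.8]
ranking = rank_fragments(toy_prediction, toy_library)
```

With these made-up numbers the methyl fragment ranks first and chlorine second. Under this kind of scheme a runner-up tends to be chemically close to the top answer, because similar fragments have similar fingerprints.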
About 60% of the data were used
to train the model. The model was then evaluated
on another 20% of the data and further refined before a final evaluation on the
remaining 20%. The details are beyond the scope of this post (and
frankly beyond me as well) but DeepFrag recapitulated known fragments about 60%
of the time. Importantly, the model worked for diverse types of fragments, including
both polar and hydrophobic examples. Even “wrong” answers were often similar to
the “correct” responses, for example a methyl group instead of a chlorine atom.
Where DeepFrag’s predictions differed from the original ligand,
the researchers note that some may be acceptable alternatives, a hypothesis
supported by subsequent molecular docking studies.
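The 60/20/20 arrangement described above is a standard train/validation/test split, and a minimal sketch looks like this. The function name, the fixed seed, and the flat list of complex IDs are assumptions for illustration; the paper's actual splitting procedure (for example, whether related proteins are kept together to avoid leakage) is not described in this post.

```python
import random

def split_dataset(items, fractions=(0.6, 0.2, 0.2), seed=0):
    """Shuffle items reproducibly and cut them into train/validation/test.

    The test set takes whatever remains after the first two cuts, so no
    item is dropped or duplicated even when the sizes do not divide evenly.
    """
    items = list(items)
    random.Random(seed).shuffle(items)  # seeded for reproducibility
    n_train = round(len(items) * fractions[0])
    n_val = round(len(items) * fractions[1])
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]
    return train, val, test

# Hypothetical stand-ins for protein-ligand complex identifiers.
complex_ids = [f"complex_{i:03d}" for i in range(100)]
train, val, test = split_dataset(complex_ids)
```

The validation set is the one used for the "further refined" step, while the test set is touched only once at the end, which is what makes the final 60% figure a fair estimate.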
Of course, the goal for most of
us is not to recapitulate known ligands but to optimize them, so the
researchers applied DeepFrag to crystallographically identified ligands of the
main protease from SARS-CoV-2. Many of the resulting molecules docked well, though they have yet
to be synthesized and tested.
Laudably, the model and source
code have been released and can be accessed here. However, as these require a
certain amount of computer savvy to use, Harrison Green and Jacob Durrant have
also created an open-source browser app, which is described in an open-access application note in J. Chem. Inf. Model.
The browser app runs entirely on
a local computer, without requiring users to upload possibly sensitive data. The
application note describes using the app to recapitulate an example from the original
paper. It also describes using the app on a fragment bound to the antibacterial target GyrB, a fragment-to-lead success story we blogged about last year. DeepFrag correctly predicted some
of the same fragment additions that were described in that paper.
The app is incredibly easy to
use: just load a protein and ligand (from a PDB file, for example) and the
structure appears in a viewer. Click the “Select Atom as Growing Point” button,
choose an atom, and hit “Start DeepFrag.” The ranked results are provided as
SMILES strings and chemical structures, and the coordinates can also be
downloaded. You can also delete atoms before growing if you would like to
replace a fragment.
In my own cursory evaluation, DeepFrag
correctly suggested adding a second hydroxyl to the ethamivan fragment bound to
Hsp90 (see here). It did not suggest an isopropyl replacement for the methoxy
group, but it did suggest methyl. On a newer example unlikely to have been
part of the training set, DeepFrag did not recapitulate the ethoxy group in the BTK ligand
compound 18 (see here), but it did suggest a number of interesting and plausible
rings. Calculations took a few minutes on my aging personal Windows laptop
using Firefox.
In contrast to the hyperbolic claims too often seen in the field, the researchers conclude the Chem.
Sci. paper modestly: “though not a substitute for a trained medicinal
chemist, DeepFrag is highly effective for hypothesis generation.”
Indeed – I recommend playing around
with it. We may still be some way from SkyFragNet, but we’re making progress.