Machine learning is becoming increasingly common in drug discovery. Just a few months ago we highlighted its use to design a library of privileged fragments. However, a library is usually built only occasionally (though continually renovating it is always a good idea). In two papers from earlier this year, Jacob Durrant and colleagues at the University of Pittsburgh use machine learning to tackle the more common task of lead optimization.
The first paper, in Chem. Sci., describes DeepFrag, a “deep convolutional neural network for fragment-based lead optimization.” The researchers started with the Binding MOAD database, a collection of nearly 39,000 high-quality protein-ligand complex structures from the Protein Data Bank. Ligands were computationally fragmented by chopping off terminal appendages smaller than 150 Da. The fragments were then converted into molecular fingerprints encoding their structures. Meanwhile, the protein region around each ligand was converted into a three-dimensional grid of voxels, akin to the pixel grids used to train computer vision models.
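For readers who have not met voxels before, here is a minimal sketch of the idea under a simple illustrative scheme: each atom adds a Gaussian-like density to the grid cells within 2 Å of it, with a separate channel for each element, much as a color image has separate red, green, and blue channels. The hypothetical `voxelize` function, the 24-cell grid, the 0.75 Å resolution, the cutoff, and the choice of channels are all assumptions for illustration; DeepFrag's actual featurization is different and more elaborate.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical atom record: (element symbol, x, y, z), coordinates in angstroms
Atom = Tuple[str, float, float, float]
Grid = List[List[List[float]]]


def voxelize(atoms: List[Atom],
             center: Tuple[float, float, float],
             size: int = 24,
             resolution: float = 0.75,
             radius: float = 2.0,
             channels: Tuple[str, ...] = ("C", "N", "O", "S")) -> Dict[str, Grid]:
    """Map atoms near `center` onto a size x size x size grid, one channel per element.

    Illustrative scheme only (an assumption, not DeepFrag's): each atom adds
    exp(-d^2 / 2) to every voxel whose center lies within `radius` angstroms of it.
    """
    half = size * resolution / 2.0
    grid = {ch: [[[0.0] * size for _ in range(size)] for _ in range(size)]
            for ch in channels}
    for element, x, y, z in atoms:
        if element not in grid:
            continue  # no channel for this element (e.g. hydrogens, metals)
        # Atom position measured from the grid's lower corner
        rel = (x - center[0] + half, y - center[1] + half, z - center[2] + half)
        # Range of voxel indices that could fall inside the cutoff sphere
        lo = [max(0, math.floor((r - radius) / resolution)) for r in rel]
        hi = [min(size - 1, math.floor((r + radius) / resolution)) for r in rel]
        for i in range(lo[0], hi[0] + 1):
            for j in range(lo[1], hi[1] + 1):
                for k in range(lo[2], hi[2] + 1):
                    # Coordinates of this voxel's center
                    vc = ((i + 0.5) * resolution, (j + 0.5) * resolution,
                          (k + 0.5) * resolution)
                    d2 = sum((a - b) ** 2 for a, b in zip(vc, rel))
                    if d2 <= radius ** 2:
                        grid[element][i][j][k] += math.exp(-d2 / 2.0)
    return grid
```

Calling `voxelize` on a list of pocket atoms returns one 24 x 24 x 24 array per element, which is the kind of image-like input a 3D convolutional network consumes; atoms outside the box simply contribute nothing.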
The researchers describe the goal as follows. “We propose a new ‘fragment reconstruction’ task where we take a ligand/receptor complex, remove a portion of the ligand, and ask the question ‘what molecular fragment should go here.’”
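The fragment reconstruction task can be sketched by treating a ligand as a graph of heavy atoms. The hypothetical `reconstruction_examples` function below cuts each bond that is not in a ring and keeps the detached side as the "answer" if it is lighter than 150 Da and smaller than what remains. Ignoring hydrogens, the crude mass table, and the smaller-piece rule are simplifications assumed here, not details taken from the paper.

```python
from typing import Dict, List, Set, Tuple

# Approximate average masses (Da) for common heavy atoms.
# Hydrogens are ignored in this sketch, so fragment masses are underestimates.
MASS = {"C": 12.011, "N": 14.007, "O": 15.999, "F": 18.998,
        "S": 32.06, "Cl": 35.45, "Br": 79.904}


def side_after_cut(adj: Dict[int, Set[int]], u: int, v: int) -> Set[int]:
    """Return the atoms still connected to v once the bond u-v is deleted."""
    seen, stack = {v}, [v]
    while stack:
        a = stack.pop()
        for b in adj[a]:
            if {a, b} == {u, v}:
                continue  # pretend the u-v bond is not there
            if b not in seen:
                seen.add(b)
                stack.append(b)
    return seen


def reconstruction_examples(elements: List[str],
                            bonds: List[Tuple[int, int]],
                            max_mass: float = 150.0) -> List[Tuple[int, List[int], float]]:
    """List (attachment atom, removed fragment atoms, fragment mass) triples.

    Hypothetical helper: a bond qualifies if cutting it splits the ligand
    (i.e. it is not in a ring) and the detached side is both lighter than
    `max_mass` and smaller than the piece that remains.
    """
    adj: Dict[int, Set[int]] = {i: set() for i in range(len(elements))}
    for a, b in bonds:
        adj[a].add(b)
        adj[b].add(a)
    examples = []
    for u, v in bonds:
        for parent_atom, frag_atom in ((u, v), (v, u)):
            frag = side_after_cut(adj, parent_atom, frag_atom)
            if parent_atom in frag:
                continue  # ring bond: cutting it leaves one piece
            if 2 * len(frag) >= len(elements):
                continue  # detached side is not the smaller, terminal piece
            mass = sum(MASS[elements[i]] for i in frag)
            if mass < max_mass:
                examples.append((parent_atom, sorted(frag), round(mass, 3)))
    return examples
```

For anisole (six ring carbons, atoms 0 to 5, with a methoxy oxygen 6 on carbon 0 and its methyl carbon 7), the sketch yields two training questions: remove the whole methoxy group from atom 0, or remove just the methyl from the oxygen. The ring bonds are never cut.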
About 60% of the data were used to train the model, 20% were used to evaluate and refine it, and the remaining 20% were held back for the final evaluation. The details are beyond the scope of this post (and frankly beyond me as well), but DeepFrag recapitulated known fragments about 60% of the time. Importantly, the model worked for diverse types of fragments, including both polar and hydrophobic examples. Even “wrong” answers were often similar to the “correct” responses, for example a methyl group instead of a chlorine atom. In some cases where DeepFrag’s predictions differed from the original ligand, the researchers note that these may be acceptable alternatives, a hypothesis supported by subsequent molecular docking studies.
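The data split and the idea of a "near miss" can be illustrated with a toy sketch. It assumes, as a simplification, that the model outputs a predicted fingerprint and that candidate fragments are ranked by cosine similarity to it. The function names and the four-bit "fingerprints" are made up for illustration, and a plain random split is used even though a careful benchmark would also need to keep closely related proteins out of more than one subset.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple


def split_60_20_20(items: Sequence, seed: int = 0) -> Tuple[list, list, list]:
    """Randomly split items into roughly 60% train, 20% validation, 20% test."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split reproducible
    n_train = len(shuffled) * 60 // 100
    n_val = len(shuffled) * 20 // 100
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_fragments(predicted: Sequence[float], library: Dict[str, List[float]]) -> List[str]:
    """Order library fragments from most to least similar to the predicted fingerprint."""
    return sorted(library, key=lambda name: cosine(predicted, library[name]), reverse=True)


def top_k_hit(predicted: Sequence[float], library: Dict[str, List[float]],
              true_name: str, k: int = 1) -> bool:
    """True if the fragment removed from the real ligand is among the top-k suggestions."""
    return true_name in rank_fragments(predicted, library)[:k]


# Toy four-bit "fingerprints", made up for illustration only
library = {"methyl": [1, 1, 0, 0], "chloro": [1, 0, 1, 0], "hydroxyl": [0, 0, 0, 1]}
predicted = [0.9, 0.8, 0.3, 0.0]  # hypothetical model output

print(rank_fragments(predicted, library))  # ['methyl', 'chloro', 'hydroxyl']
```

In this toy case the true answer, chloro, is missed at rank 1 but recovered at rank 2, the same kind of near miss as the methyl-for-chlorine example above.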
Of course, the goal for most of us is not to recapitulate known ligands but to optimize them, so the researchers applied DeepFrag to crystallographically identified ligands of the main protease from SARS-CoV-2. Many of the resulting suggestions docked well, though they have yet to be synthesized and tested.
Laudably, the model and source code have been released and can be accessed here. However, as these require a certain amount of computer savvy to use, Harrison Green and Jacob Durrant have also created an open-source browser app, which is described in an open-access application note in J. Chem. Inf. Model.
The browser app runs entirely on a local computer, without requiring users to upload possibly sensitive data. The application note describes using the app to recapitulate an example from the original paper. It also describes using it on a fragment bound to the antibacterial target GyrB, a fragment-to-lead success story we blogged about last year. DeepFrag correctly predicted some of the same fragment additions that were described in that paper.
The app is incredibly easy to use: just load a protein and ligand (from a PDB file, for example) and the structure appears in a viewer. Click the “Select Atom as Growing Point” button, choose an atom, and hit “Start DeepFrag.” The ranked results are provided as SMILES strings and chemical structures, and the coordinates can also be downloaded. You can also delete atoms before growing if you would like to replace a fragment.
In my own cursory evaluation, DeepFrag correctly suggested adding a second hydroxyl to the ethamivan fragment bound to Hsp90 (see here). It did not suggest an isopropyl replacement for the methoxy group, but it did suggest methyl. When I tried a newer example, unlikely to have been part of the training set, DeepFrag did not recapitulate the ethoxy group in the BTK ligand compound 18 (see here), but it did suggest a number of interesting and plausible rings. Calculations took a few minutes on my aging personal Windows laptop using Firefox.
In contrast to the hyperbolic claims too often seen in the field, the researchers conclude the Chem. Sci. paper modestly: “though not a substitute for a trained medicinal chemist, DeepFrag is highly effective for hypothesis generation.”
Indeed – I recommend playing around with it. We may still be some way from SkyFragNet, but we’re making progress.