07 June 2021

A minimal fragment library for maximal coverage of pharmacophore space

Last week we described a fragment library built with the aid of machine learning and designed to contain privileged fragments that should produce high hit rates. Unfortunately, only about a tenth of the library members are commercially available, so it will be some time before we know whether the design was successful. We continue the theme of fragment libraries with a just-published Nat. Commun. paper by György Keserű (Hungarian Research Centre for Natural Sciences) and a large group of multinational collaborators (see also here for a nice summary by György).
 
The researchers started by analyzing more than 3300 crystal structures of protein-fragment complexes in the Protein Data Bank. Fragments were defined as having 10-16 non-hydrogen atoms, and the computational approach FTMap was used to ensure that fragments were binding at hotspots as opposed to spurious, less ligandable sites. This exercise yielded 3584 fragments, but many of them were identical or very similar to one another. The researchers used a series of computational tools to cluster similar fragments (or pharmacophores) and choose a set that would maximize diversity. This ultimately led them to assemble a library of just 96 fragments, purchased from five vendors.
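
To give a flavor of what "choose a set that would maximize diversity" can mean in practice, here is a minimal sketch of greedy MaxMin selection over set-based fingerprints. This is a generic technique, not necessarily what the researchers used; the fingerprint encoding (a set of integer feature indices) and the names `tanimoto` and `maxmin_pick` are our own assumptions for illustration.

```python
from typing import FrozenSet, List

# Hypothetical encoding: each fragment is the set of its "on" pharmacophore features.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto similarity: shared features divided by total distinct features."""
    if not a and not b:
        return 1.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def maxmin_pick(fps: List[Fingerprint], n_pick: int, seed_index: int = 0) -> List[int]:
    """Greedy MaxMin: repeatedly add the fragment farthest from everything picked so far."""
    if not fps:
        return []
    picked = [seed_index]
    # Distance from every fragment to its nearest already-picked fragment.
    nearest = [1.0 - tanimoto(fp, fps[seed_index]) for fp in fps]
    while len(picked) < min(n_pick, len(fps)):
        best = max(range(len(fps)), key=lambda i: nearest[i])
        if nearest[best] == 0.0:
            break  # Everything left duplicates a fragment we already have.
        picked.append(best)
        for i, fp in enumerate(fps):
            nearest[i] = min(nearest[i], 1.0 - tanimoto(fp, fps[best]))
    return picked
```

Note that exact duplicates are never picked twice, which mirrors the observation that many of the 3584 fragments were identical or very similar to one another.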
 
This SpotXplorer0 library mostly follows the rule of three, with 7 to 17 non-hydrogen atoms, MW 100-250 (or 280 for bromine-containing molecules), ≤ 3 hydrogen bond donors, ≤ 8 hydrogen bond acceptors, and ≤ 3 rotatable bonds. In addition, all members have 1-3 rings, no more than a single halogen or sulfur atom, and no PAINS. Despite the small size, this library covers most of the pharmacophores identified in the larger set, and considerably more than the F2X-Entry fragment library we highlighted last year or the top five commercial library vendors we noted here.
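
The property ranges above translate naturally into a simple filter. The sketch below assumes the descriptors have already been computed elsewhere (for example with a cheminformatics toolkit); the `FragmentProps` class and `passes_library_filter` function are hypothetical names of ours, and we have read "no more than a single halogen or sulfur atom" as at most one of each, which is our interpretation rather than something stated in the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FragmentProps:
    """Precomputed descriptors for one fragment (hypothetical container)."""
    heavy_atoms: int
    mol_weight: float
    h_donors: int
    h_acceptors: int
    rotatable_bonds: int
    rings: int
    halogens: int
    sulfurs: int
    has_bromine: bool = False
    pains_flag: bool = False


def passes_library_filter(f: FragmentProps) -> bool:
    """Apply the property ranges quoted in the post."""
    # Bromine-containing molecules get a higher molecular weight ceiling.
    max_mw = 280.0 if f.has_bromine else 250.0
    return (
        7 <= f.heavy_atoms <= 17
        and 100.0 <= f.mol_weight <= max_mw
        and f.h_donors <= 3
        and f.h_acceptors <= 8
        and f.rotatable_bonds <= 3
        and 1 <= f.rings <= 3
        # Assumption: at most one halogen AND at most one sulfur.
        and f.halogens <= 1
        and f.sulfurs <= 1
        and not f.pains_flag
    )
```

Since the library only "mostly" follows these rules, a filter like this would flag a few real members as exceptions.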
 
The researchers then screened this library against eight targets. Three GPCRs (the serotonin receptors 5-HT1A, 5-HT6, and 5-HT7) were assessed in a cell-based radioligand displacement assay with fragments at just 10 µM. Despite the low concentration, 4-11 hits were found. Biochemical screens conducted at 800 µM against the proteases thrombin and Factor Xa yielded 7 and 8 hits respectively. Further analysis revealed that the SpotXplorer0 ligands sampled a majority of the pharmacophores found in published fragment hits against these five targets.
 
Next the researchers screened their library against the histone methyltransferase SETD2, an oncology target with few known attractive ligands. An enzymatic assay yielded two hits, with IC50 values between 300 and 500 µM.
 
Finally, the SpotXplorer0 library was part of the XChem crystallographic screens against the SARS-CoV-2 main protease (Mpro) and Nsp3 macrodomain, which we discussed here and here. For Mpro, just a single hit was found. This is only half the overall hit rate for noncovalent fragments in the crystallographic screen against this target, but the hit is functionally active and has a high ligand efficiency.
 
The screen against Nsp3 yielded five hits binding at two different sites, for a hit rate of 5.2%. The overall hit rate against this target was 8%, but that encompasses screens against two crystal forms of the protein. The crystal form used for SpotXplorer0 had a hit rate of 21%.
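
For anyone who wants to check the arithmetic, the hit rates for the two crystallographic screens follow from the library size. This small sketch assumes all 96 fragments were screened against each target; the function name `hit_rate_percent` is ours.

```python
def hit_rate_percent(hits: int, library_size: int) -> float:
    """Hit rate as a percentage of the fragments screened."""
    if library_size <= 0:
        raise ValueError("library_size must be positive")
    return 100.0 * hits / library_size


LIBRARY_SIZE = 96  # Assumption: every library member was screened.

nsp3_rate = round(hit_rate_percent(5, LIBRARY_SIZE), 1)  # five Nsp3 hits
mpro_rate = round(hit_rate_percent(1, LIBRARY_SIZE), 1)  # single Mpro hit
print(nsp3_rate, mpro_rate)  # prints: 5.2 1.0
```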
 
In summary, SpotXplorer0 is a new fragment library that gives high coverage of experimental pharmacophore space. Laudably, structures of all 96 fragments are provided in the Supplementary Information. But the jury remains out on how hit-rich the library will be. Interestingly, the F2X-Entry library we highlighted last year gave considerably higher hit rates of 21% and 30%, albeit against two different targets. SpotXplorer0 is being screened crystallographically against multiple targets at XChem, and it will be interesting to see how it performs in the long run.

01 June 2021

New fragments suggested by machine learning

Machine learning has become a hot new thang in drug discovery, attracting massive attention and investment. While easy to parody, artificial intelligence techniques are becoming increasingly powerful. A new paper in J. Chem. Inf. Model. by Angelo Pugliese and colleagues at the Beatson Institute applies the methodology to generate a new fragment library.
 
Machine learning entails collecting large amounts of data, passing that through various neural networks, and obtaining recommendations. In this case, the researchers wanted to generate “privileged fragments” that would hit in multiple assays. (Of course, the idea would be to make genuinely privileged fragments, such as 4-azaindole, rather than PAINS.) The researchers used a training set of 66 fragments that hit in at least three of 25 screens done at the Beatson, for which the average hit rate was 2.18%.
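
The selection rule for the training set (a hit in at least three screens) is easy to express in code. The sketch below is only an illustration: the data layout (a mapping from screen name to the set of fragment IDs that hit) and the function name `frequent_hitters` are our assumptions, not details from the paper.

```python
from collections import Counter
from typing import Dict, List, Set


def frequent_hitters(screen_hits: Dict[str, Set[str]], min_screens: int = 3) -> List[str]:
    """Return fragment IDs that were hits in at least `min_screens` screens."""
    counts: Counter = Counter()
    for hits in screen_hits.values():
        # Each screen contributes at most one count per fragment.
        counts.update(hits)
    return sorted(frag for frag, n in counts.items() if n >= min_screens)


# Toy data with made-up screen and fragment names.
toy_screens = {
    "screen_A": {"frag_1", "frag_2"},
    "screen_B": {"frag_1", "frag_3"},
    "screen_C": {"frag_1", "frag_2"},
    "screen_D": {"frag_2"},
}
print(frequent_hitters(toy_screens))  # prints: ['frag_1', 'frag_2']
```

A count threshold like this cannot by itself distinguish genuinely privileged fragments from promiscuous artifacts, which is why the PAINS caveat above matters.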
 
First though, the researchers needed to teach their model how to generate chemically valid fragments in the first place (for example, fewer than 5 bonds to carbon). To do this they used both SMILES (simplified molecular-input line-entry system) and chemical fingerprints from a set of 486,565 commercially available fragments. They then combined this model with the privileged fragments. Extensive details are provided; as they go well beyond my expertise I won’t even attempt to summarize them. (For example, “the classifier for the smi2smi model comprised sequential 64-unit and 32-unit dense ReLU layers followed by a single sigmoid output neuron.”) At the end of the exercise, and after triaging by medicinal chemists, the researchers came up with a set of 741 fragments.
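
For readers curious what the quoted architecture amounts to, here is a minimal sketch of the forward pass: two dense ReLU layers of 64 and 32 units feeding a single sigmoid neuron. Everything beyond those layer sizes is an assumption on our part, including the input dimension, the random untrained weights, and the function names. It shows the shape of the computation only, not the researchers' trained model.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]
Layer = Tuple[Matrix, Vector]  # (weights, biases)


def dense(x: Vector, weights: Matrix, bias: Vector) -> Vector:
    """Affine layer: one output per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(v: Vector) -> Vector:
    return [max(0.0, a) for a in v]


def sigmoid(a: float) -> float:
    return 1.0 / (1.0 + math.exp(-a))


def init_layer(n_in: int, n_out: int, rng: random.Random) -> Layer:
    """Random small weights and zero biases (untrained, for illustration only)."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def build_classifier(n_inputs: int, seed: int = 0) -> List[Layer]:
    """Layer sizes from the quoted description: 64 -> 32 -> 1."""
    rng = random.Random(seed)
    sizes = [n_inputs, 64, 32, 1]
    return [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]


def predict(layers: List[Layer], x: Vector) -> float:
    """Return a score between 0 and 1 for one input vector."""
    h = x
    for weights, bias in layers[:-1]:
        h = relu(dense(h, weights, bias))
    weights, bias = layers[-1]
    return sigmoid(dense(h, weights, bias)[0])
```

With a 16-element input (an arbitrary choice), `predict(build_classifier(16), [0.5] * 16)` returns a number strictly between 0 and 1, which a trained model would interpret as the probability that a fragment is privileged.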
 
What are their overall properties? For one thing, generated fragments tend to be more planar (as assessed by PBF) and have lower Fsp3 values than the nearly half-million fragments used for training. The researchers acknowledge that this could reflect the historical composition of the Beatson fragment library, although as we argued here it could also be true that flatter fragments just give higher hit rates.
 
Molecular complexity is a fundamental but poorly defined aspect of fragment-based lead discovery, and the researchers have come up with their own metric, called feature complexity (FeCo), which incorporates information on rotatable bonds, numbers of halogens, hydrogen bond donors and acceptors, charged groups, aromatic rings, and hydrophobic elements, all normalized by the number of heavy atoms. Hopefully this will be explored more fully in a dedicated publication.
 
What do the individual fragments actually look like? Five examples are shown in the paper, and nearly 200 more are provided in the supporting information. Below are seven chosen arbitrarily from that list (sampling every 30 structures).

[Figure: seven generated fragments sampled from the supporting information]

Of course, the question remains as to whether these fragments will truly turn out to be privileged. As might be expected given the vastness of chemical space, only 78 of the 741 are commercially available. The researchers note that they are acquiring some of these, and it will be interesting to see how they perform in the screens to come.