31 January 2022

A framework for evaluating commercial fragment libraries

The easiest way to build a fragment library is to purchase one. Quite a few vendors sell fragments, and as our poll from a few years ago demonstrated, most buyers are quite happy with them. But what exactly do they offer? This is the subject of a new paper by Gilles Marcou, Esther Kellenberger, and colleagues at CNRS Université de Strasbourg in RSC Med. Chem.
The researchers analyzed 86 different libraries from 14 vendors that were available in February of 2021. These were classified into ten categories, such as “general,” “3D-shaped,” “metal chelating,” “diverse,” “covalent,” etc. Individual library sizes varied considerably: 41 had ≤ 2000 compounds, 31 had 2000-10,000, and 14 had > 10,000 molecules. The total number of fragments came to 754,646, of which 512,284 were unique, indicating considerable redundancy between libraries. Laudably, the structures and several analyses are all provided as downloadable files here.
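To make the redundancy figure concrete, here is a minimal Python sketch of how one might tally unique structures across libraries. It assumes each structure has already been reduced to a canonical identifier such as an InChIKey (computing those is not shown); the library names and keys are made up for illustration.

```python
from typing import Dict, List


def redundancy_summary(libraries: Dict[str, List[str]]) -> dict:
    """Total entries, unique structures, and the fraction of entries that are repeats.

    libraries maps a (hypothetical) library name to canonical structure IDs.
    """
    total = sum(len(ids) for ids in libraries.values())
    unique = len({mol_id for ids in libraries.values() for mol_id in ids})
    redundant = 1 - unique / total if total else 0.0
    return {"total": total, "unique": unique, "redundant_fraction": redundant}


# Toy example: two vendors sharing two of their three structures.
toy = {"vendor_A": ["KEY1", "KEY2", "KEY3"], "vendor_B": ["KEY2", "KEY3", "KEY4"]}
print(redundancy_summary(toy))

# The published totals: roughly a third of all entries are repeats.
print(round(1 - 512_284 / 754_646, 2))  # 0.32
```

Using a set of canonical identifiers is the simplest way to collapse duplicates; in practice the hard part is choosing a canonicalization (tautomers, salts, stereochemistry) that decides what counts as "the same" fragment.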
Most of the fragments are 200-300 Da, with only 13% below 200 Da. This skew towards larger molecules is common but may not be desirable, as researchers at Astex demonstrated back in 2014. On the other hand, people do seem to be paying attention to lipophilicity: nearly half the fragments have AlogP < 1. Interestingly, fewer than a quarter of the fragments strictly fulfill the rule of three, though most violations come from having more than three hydrogen bond acceptors, a criterion that analyses of approved drugs suggest is probably less important than the others.
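The difference between "strictly" passing the rule of three and passing it with extra acceptors is easy to see in code. This is a sketch that assumes descriptors were computed elsewhere and uses one common formulation of the cutoffs (MW ≤ 300, cLogP ≤ 3, ≤ 3 donors, ≤ 3 acceptors); some versions also cap rotatable bonds and polar surface area, which are left out here.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed descriptors for one fragment (values supplied by the user)."""
    mw: float     # molecular weight, Da
    alogp: float  # calculated logP
    hbd: int      # hydrogen bond donors
    hba: int      # hydrogen bond acceptors


def ro3_violations(d: Descriptors) -> list:
    """Return the rule-of-three criteria a fragment violates (assumed cutoffs)."""
    violations = []
    if d.mw > 300:
        violations.append("MW")
    if d.alogp > 3:
        violations.append("logP")
    if d.hbd > 3:
        violations.append("HBD")
    if d.hba > 3:
        violations.append("HBA")
    return violations


def passes_ro3(d: Descriptors, ignore_hba: bool = False) -> bool:
    """Strict check, or a relaxed one that tolerates extra acceptors."""
    remaining = [v for v in ro3_violations(d) if not (ignore_hba and v == "HBA")]
    return not remaining


frag = Descriptors(mw=245.3, alogp=0.8, hbd=1, hba=4)
print(ro3_violations(frag))               # ['HBA']
print(passes_ro3(frag))                   # False
print(passes_ro3(frag, ignore_hba=True))  # True
```

Returning the list of violations, rather than just a pass/fail flag, is what lets you see that most failures are acceptor-driven.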
Two different methods were used to assess diversity. These were applied to 433,433 compounds from fifty libraries; specialized libraries such as fluorine-rich and covalent libraries were excluded. The first analysis deconstructed fragments into 59,270 component scaffolds. Not surprisingly, benzene was the most common, present in nearly 5% of all fragments. Quinoline, indole, pyridine, and benzimidazole were each present in at least 1% of compounds. At the other end of the spectrum, 36,555 scaffolds occurred only once. As one might expect, these tended to be more complex.
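The bookkeeping behind numbers like "benzene in nearly 5% of fragments" and "36,555 singletons" can be sketched with a counter. This assumes the deconstruction into scaffolds has already been done (the method isn't shown) and that each fragment maps to a set of scaffold labels; the labels below are hypothetical.

```python
from collections import Counter
from typing import Dict, Set


def scaffold_stats(fragment_scaffolds: Dict[str, Set[str]]):
    """Fraction of fragments containing each scaffold, plus singleton scaffolds.

    fragment_scaffolds maps a fragment ID to the set of scaffolds it was
    deconstructed into; using a set means each fragment counts once per scaffold.
    """
    counts = Counter()
    for scaffolds in fragment_scaffolds.values():
        counts.update(scaffolds)
    n_frag = len(fragment_scaffolds)
    frequency = {s: c / n_frag for s, c in counts.items()}
    singletons = [s for s, c in counts.items() if c == 1]
    return frequency, singletons


data = {
    "F1": {"benzene"},
    "F2": {"benzene", "pyridine"},
    "F3": {"indole"},
    "F4": {"benzene", "spiro_scaffold_X"},
}
freq, single = scaffold_stats(data)
print(freq["benzene"])  # 0.75
print(sorted(single))   # ['indole', 'pyridine', 'spiro_scaffold_X']
```

Because one fragment can contribute several scaffolds, the per-scaffold frequencies need not sum to 1.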
In addition to assessing scaffolds, the researchers developed a “Generative Topographical Map (GTM) model to represent the chemical space in a landscape.” The resulting figures do indeed look like topographical maps, with darker regions corresponding to more populated areas. For example, since substituted benzimidazoles are common and similar to one another, they form a dark cluster. Not unexpectedly, the landscape for the set of 433,433 compounds is heterogeneous, with denser regions separated by sparsely populated regions.
A nice feature of the GTM model is that it allows easy, intuitive comparisons. For example, some of the “diverse” libraries are more diverse than others, or emphasize different regions of chemical space, and potential customers may want to take these into account.
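For readers curious how such landscapes can be compared, here is a rough sketch. It assumes a GTM has already been trained and that each molecule's "responsibilities" (its probability distribution over the map's nodes) are in hand; training is not shown. Summing responsibilities per node gives a density landscape, and the overlap and coverage measures are my own illustrative choices, not necessarily what the authors used.

```python
from typing import List

# One row per molecule, one column per map node; each row sums to 1.
Responsibilities = List[List[float]]


def landscape(resp: Responsibilities) -> List[float]:
    """Normalized cumulative responsibility per node: a simple density landscape."""
    density = [0.0] * len(resp[0])
    for row in resp:
        for k, r in enumerate(row):
            density[k] += r
    total = sum(density)
    return [d / total for d in density]


def overlap(p: List[float], q: List[float]) -> float:
    """Shared density of two normalized landscapes (1.0 = identical, 0.0 = disjoint)."""
    return sum(min(a, b) for a, b in zip(p, q))


def coverage(p: List[float], threshold: float = 0.05) -> int:
    """Number of nodes holding more than `threshold` of a library's density."""
    return sum(1 for d in p if d > threshold)


# Toy responsibilities on a 4-node map (made-up numbers).
library_a = [[0.9, 0.1, 0.0, 0.0], [0.8, 0.2, 0.0, 0.0]]
library_b = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]
land_a, land_b = landscape(library_a), landscape(library_b)
print(coverage(land_a), coverage(land_b))  # 2 4
print(round(overlap(land_a, land_b), 2))   # 0.4
```

Here library_b spreads over more of the map than library_a, which is the kind of difference between "diverse" libraries that the maps make visible.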
Fragment shapeliness was assessed using the plane of best fit (PBF), the average distance of heavy atoms from the plane that best fits the molecule; lower values correspond to “flatter” molecules, such as benzene, with PBF = 0. The libraries varied considerably in their average PBF, though reassuringly the “3D-shaped” libraries did have higher values. Interestingly, the GTM models showed that flat (PBF < 0.1) and non-planar (PBF ≥ 0.1) fragments had similar distributions across fragment space.
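PBF is simple to compute once you have a 3D conformer. The sketch below assumes NumPy and heavy-atom coordinates from a conformer generated elsewhere; it fits the least-squares plane by singular value decomposition and averages the atoms' distances from it. The ring coordinates are idealized, not real conformers.

```python
import numpy as np


def plane_of_best_fit(coords) -> float:
    """Average distance of atoms from their least-squares plane (PBF).

    coords: (n_atoms, 3) heavy-atom coordinates from one 3D conformer, in Å.
    """
    xyz = np.asarray(coords, dtype=float)
    centered = xyz - xyz.mean(axis=0)
    # Singular values come back in descending order, so the last right singular
    # vector is the direction of least spread: the normal of the best-fit plane.
    _, _, vt = np.linalg.svd(centered)
    normal = vt[-1]
    return float(np.mean(np.abs(centered @ normal)))


def is_flat(coords, cutoff: float = 0.1) -> bool:
    """Apply the PBF < 0.1 split used above."""
    return plane_of_best_fit(coords) < cutoff


# Idealized six-membered rings: one planar, one puckered by ±0.25 Å.
angles = np.linspace(0, 2 * np.pi, 6, endpoint=False)
flat_ring = np.column_stack([1.39 * np.cos(angles), 1.39 * np.sin(angles), np.zeros(6)])
puckered = flat_ring.copy()
puckered[:, 2] = [0.25, -0.25] * 3
print(is_flat(flat_ring), is_flat(puckered))  # True False
```

Note that PBF depends on the conformer chosen, so library averages will shift somewhat with the conformer-generation protocol.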
Overall, this is a valuable snapshot of the current state of commercial libraries and a useful complement to the ongoing analysis Chris Swain does at Cambridge MedChem Consulting. Of course, the devil is in the details: PAINS still sometimes show up in commercial libraries, and quality control can vary. In the end you’ll want to do your own vetting, but this is a good place to start.

23 January 2022

Fragments (almost) in the clinic: MRTX1719

Synthetic lethality is a relatively new approach to cancer therapy. The idea is to inhibit a protein that is necessary for cancer cells but dispensable for normal cells, thereby minimizing toxicity. Last year we described one example, and in a just-published open access J. Med. Chem. paper Chris Smith and colleagues at Mirati describe another.
The biology gets a bit complicated, so please bear with me. Protein arginine methyltransferase 5 (PRMT5) is an epigenetic writer that adds two methyl groups to arginine residues in a wide variety of proteins. It is essential for cell survival. PRMT5 uses a cofactor, S-adenosyl-L-methionine (SAM), that is converted to methylthioadenosine (MTA) during the reaction. In certain cancers a gene called methylthioadenosine phosphorylase (MTAP) is deleted, causing an accumulation of MTA and – through product inhibition – a decrease in PRMT5 activity. The idea is to develop a drug that binds to and further stabilizes the (inactive) PRMT5•MTA complex, which is abundant in cancer cells, while not interfering with the active form of the protein, which predominates in normal cells. Told you it was complicated! [Note added: as befits the complicated biology I got a couple of things wrong, corrected in the comment on 26 Jan.]
The researchers started with an SPR screen of 1000 commercially available fragments, each at 100 µM. PRMT5 was immobilized on the chip, with MTA added to the buffer to form the PRMT5•MTA complex. This screen yielded 17 hits, and based on this encouraging result a further set of nearly 1900 fragments was screened at 500 µM. The higher concentration yielded significantly more hits, and when these were tested in dose-response experiments, 100 were found with dissociation constants better than 1 mM. The best 24 of these were then screened against PRMT5 loaded with either MTA or the cofactor SAM. Compound F1 proved to be 5-fold selective for the MTA-bound protein over the SAM-bound protein.
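The triage funnel described above can be sketched in a few lines. This is an illustration, not the authors' workflow: it assumes "best" means most potent against PRMT5•MTA, that Kd values for both complexes are available in µM, and the fragment names and values are made up.

```python
from typing import Dict, List, Tuple


def triage(kd_mta_um: Dict[str, float], kd_sam_um: Dict[str, float],
           kd_cutoff_um: float = 1000.0, n_best: int = 24) -> List[Tuple[str, float]]:
    """Rank fragment hits by selectivity for PRMT5•MTA over PRMT5•SAM.

    1. keep hits whose Kd for the MTA-bound protein beats the cutoff (1 mM);
    2. take the n_best most potent of those;
    3. compute fold selectivity = Kd(SAM-bound) / Kd(MTA-bound), highest first.
    """
    hits = {name: kd for name, kd in kd_mta_um.items() if kd < kd_cutoff_um}
    best = sorted(hits, key=hits.get)[:n_best]
    ranked = [(name, kd_sam_um[name] / kd_mta_um[name]) for name in best]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Made-up Kd values in µM for four hypothetical fragments.
kd_mta = {"frag_A": 50.0, "frag_B": 400.0, "frag_C": 1500.0, "frag_D": 200.0}
kd_sam = {"frag_A": 250.0, "frag_B": 400.0, "frag_C": 1500.0, "frag_D": 300.0}
print(triage(kd_mta, kd_sam, n_best=3))
# [('frag_A', 5.0), ('frag_D', 1.5), ('frag_B', 1.0)]
```

A 5-fold ratio, as for F1, is modest in absolute terms, but it is the direction of selectivity that matters at the fragment stage.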
Crystallography revealed that this molecule binds in the substrate-binding site in the vicinity of MTA and suggested that it would clash with SAM binding, thus providing an explanation for its selectivity. The crystal structure also revealed a nearby pocket that could be targeted through fragment growing, and this was accomplished with compound 2, which also showed activity in a biochemical assay. Further structure-based design led eventually to compound 14, which was 26-fold selective for the MTA-bound protein.

Crystallography revealed another lipophilic pocket, and adding a phenyl group provided a nice increase in potency in the form of compound 15. This molecule also showed low micromolar activity in cells. Further structure-based drug design ultimately led to MRTX1719; the medicinal chemistry is elegant but beyond the scope of this post. Chemists will recognize that the final molecule is an atropisomer. Atropisomers are uncommon in drugs in part because they can be difficult to separate; the researchers note that they assessed 70 different conditions before abandoning one series in favor of a more tractable one.
The dissociation constants of MRTX1719 were measured by SPR as 0.14 pM and 9.4 pM for the PRMT5•MTA and PRMT5•SAM complexes, respectively. We don’t encounter femtomolar binders very often (0.14 pM is 140 fM); the dissociation half-life for the MTA-bound protein is 14 days! The 67-fold difference in binding was in good agreement with the 70- to 80-fold differences seen in cells without versus with MTAP.
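The numbers in this paragraph hang together, and a little arithmetic shows how. The sketch below computes the fold selectivity from the two Kd values and converts the 14-day half-life into a dissociation rate constant using first-order kinetics. The implied association rate assumes simple 1:1 binding (Kd = k_off / k_on); it is a back-of-the-envelope inference, not a value reported in the paper.

```python
import math

SECONDS_PER_DAY = 86_400

kd_mta_m = 0.14e-12   # Kd for PRMT5•MTA, in M (0.14 pM, as quoted above)
kd_sam_m = 9.4e-12    # Kd for PRMT5•SAM, in M (9.4 pM)
t_half_days = 14      # dissociation half-life from the MTA-bound protein

# Fold selectivity is simply the ratio of the two dissociation constants.
fold_selectivity = kd_sam_m / kd_mta_m

# For first-order dissociation, t1/2 = ln(2) / k_off.
k_off = math.log(2) / (t_half_days * SECONDS_PER_DAY)   # s^-1

# Assuming simple 1:1 binding, Kd = k_off / k_on, so k_on = k_off / Kd.
k_on = k_off / kd_mta_m                                   # M^-1 s^-1

print(f"{fold_selectivity:.0f}-fold selective")  # 67-fold selective
print(f"k_off = {k_off:.2g} s^-1, implied k_on = {k_on:.2g} M^-1 s^-1")
```

This gives a k_off of about 5.7 × 10⁻⁷ s⁻¹ and an implied k_on of roughly 4 × 10⁶ M⁻¹ s⁻¹: an ordinary association rate combined with an extraordinarily slow dissociation.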
MRTX1719 was quite selective in a panel of 42 methyltransferases. Pharmacokinetics and oral bioavailability were good in mice, dogs, and cynomolgus monkeys. The molecule was well tolerated in a mouse tumor model and inhibited tumor growth. Based on these results, an IND for the molecule has been submitted to the FDA.
This is a lovely fragment-to-candidate story, and Practical Fragments wishes everyone involved good fortune in the clinic!

17 January 2022

An epidemic of aggregators, and suggestions for cures

COVID-19 has been with us for over two years now. While the human effects have been unquestionably negative, for science it has been the best of times and the worst of times. The development of remarkably effective vaccines in less than a year stands as a triumph of twenty-first century medicine, as does the discovery of nirmatrelvir, a covalent inhibitor of the SARS-CoV-2 main protease Mpro (also called 3CL-Pro). But there is a lot of junk science out there too, as illuminated in a recent J. Med. Chem. paper by Brian Shoichet and colleagues at the University of California, San Francisco.
Before vaccines and custom-built drugs were developed, labs everywhere started screening all the compounds they could get against targets relevant for COVID-19. The most popular molecules to test were approved drugs, the idea being that if any of these turned out to be effective they could immediately be put to use.
One of the most common artifacts in screening is caused by aggregation: small molecules can form colloids that non-specifically inhibit a variety of different assays. This phenomenon has been understood for more than two decades; Practical Fragments wrote about it back in 2009. Unfortunately, many labs ignore it.
The UCSF lab investigated 56 drugs that had been reported in 12 papers as inhibitors of two targets relevant for SARS-CoV-2, including 3CL-Pro. The molecules were characterized in multiple assays: particle formation and clean autocorrelation curves in dynamic light scattering (DLS), inhibition of an aggregation-sensitive “sentinel” enzyme in the absence of detergent but no inhibition in the presence of detergent, and a high Hill slope in the dose-response curve. Nineteen molecules, four of them fragment-sized, were positive in most of these assays, clearly indicating aggregation. (Interestingly, several of these gave reasonable Hill slopes (< 1.4), and the researchers suggest that Hill slope be treated only as a “soft criterion.”) Another 14 molecules gave more ambiguous results, such as forming particles by DLS but not inhibiting the sentinel enzyme.
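One way to picture a "soft criterion" is as evidence that can raise suspicion but never clear a compound on its own. The decision rule below is my own illustration of that idea, not the authors' protocol: DLS particles and detergent-sensitive inhibition of the sentinel enzyme count as hard evidence, and a steep Hill slope (here assumed to mean ≥ 1.4) only tips ambiguous cases.

```python
from dataclasses import dataclass


@dataclass
class AggregationData:
    dls_particles: bool               # particles and a clean autocorrelation curve by DLS
    inhibits_without_detergent: bool  # sentinel enzyme inhibited with no detergent
    inhibits_with_detergent: bool     # inhibition persists once detergent is added
    hill_slope: float


def classify(d: AggregationData, steep_hill: float = 1.4) -> str:
    """Illustrative call: hard criteria decide, the Hill slope only supports."""
    detergent_sensitive = d.inhibits_without_detergent and not d.inhibits_with_detergent
    hard_hits = sum([d.dls_particles, detergent_sensitive])
    if hard_hits == 2:
        # A normal Hill slope does not rescue a compound that fails both hard tests.
        return "aggregator"
    if hard_hits == 1:
        return "likely aggregator" if d.hill_slope >= steep_hill else "ambiguous"
    return "no evidence of aggregation"


print(classify(AggregationData(True, True, False, 1.1)))   # aggregator
print(classify(AggregationData(True, False, False, 1.0)))  # ambiguous
```

The second example mirrors the ambiguous cases above: particles by DLS, but no inhibition of the sentinel enzyme.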
OK, so maybe the molecules are aggregators, but perhaps they also act legitimately? Unfortunately, of the 12 drugs reported in the literature to inhibit 3CL-Pro, only two inhibited the enzyme in the presence of detergent, and one of these was five-fold less potent than reported. And as the researchers point out, detergent is not a magic elixir, and sometimes only right-shifts the onset of aggregation. Moreover, for 15 of the 19 molecules conclusively found to be aggregators, the original publications did not include detergent in the assays. Brian may be too polite to write this, but channeling my inner Teddy, I would argue that the authors were negligent in failing to test for aggregation, as were the editors and reviewers who allowed these papers to be published.
And the problem is not confined to the COVID-19 literature. The researchers examined a commercial library of 2336 FDA-approved drugs, 73 of which are known aggregators. Another 356 were flagged by the very useful Aggregation Advisor tool (see here), and of 15 that were evaluated experimentally, 6 tested positive in all the aggregation assays.
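Computational flags of this kind typically combine similarity to known aggregators with lipophilicity. The sketch below shows the general idea using Tanimoto similarity on sets of fingerprint bits; the fingerprint type, the 0.7 similarity cutoff, and the logP cutoff of 3 are all assumptions for illustration, not the tool's actual settings. As the experimental results above show, a flag is a reason to test, not a verdict.

```python
from typing import Dict, FrozenSet, List

Fingerprint = FrozenSet[int]  # indices of "on" bits; the fingerprint type is an assumption


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient between two bit sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def flag_possible_aggregator(fp: Fingerprint, logp: float,
                             known_aggregators: Dict[str, Fingerprint],
                             sim_cutoff: float = 0.7,
                             logp_cutoff: float = 3.0) -> List[str]:
    """Return reasons to suspect aggregation (assumed, illustrative cutoffs)."""
    reasons = []
    for name, ref in known_aggregators.items():
        if tanimoto(fp, ref) >= sim_cutoff:
            reasons.append(f"similar to known aggregator {name}")
    if logp > logp_cutoff:
        reasons.append("high logP")
    return reasons


known = {"agg1": frozenset({1, 2, 3, 4, 5, 6, 8})}
print(flag_possible_aggregator(frozenset(range(1, 8)), 3.5, known))
# ['similar to known aggregator agg1', 'high logP']
```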
How do you avoid being misled by these artifacts? An extensive suite of tools for assessing aggregation is provided in a recent Nat. Protoc. paper by Steven LaPlante and colleagues at Université du Québec and NMX. The procedures are described in sufficient detail that they “can be easily performed by graduate students and even undergraduate students.”
Most of the focus is on various NMR techniques, such as one we wrote about here. The easiest is an NMR dilution assay, in which a 20 mM solution of a compound in DMSO is serially diluted into aqueous buffer at concentrations from 200 to 12 µM. If the number, shape, shift, or intensities of the NMR resonances change, aggregation is likely.
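The intensity part of the dilution assay lends itself to a quick sketch. A two-fold series from 200 µM gives 200, 100, 50, 25, and 12.5 µM, and if each sample were made directly from the 20 mM stock, the top concentration would carry about 1% DMSO. For a well-behaved compound, signal scales with concentration; the check below flags points whose signal per µM strays from the most dilute sample. The 20% tolerance and the intensities are made up, and real analyses also look at line shape, shifts, and the number of peaks, which this ignores.

```python
def dilution_series(top_um: float = 200.0, n_points: int = 5, factor: float = 2.0):
    """Concentrations (µM) for a serial dilution, most concentrated first."""
    return [top_um / factor ** i for i in range(n_points)]


def dmso_percent(conc_um: float, stock_mm: float = 20.0) -> float:
    """% v/v DMSO if a sample is made directly from the DMSO stock."""
    return 100.0 * (conc_um / 1000.0) / stock_mm


def nonlinear_points(concs, intensities, tolerance: float = 0.2):
    """Concentrations whose signal per µM strays from the most dilute sample.

    A well-behaved compound gives NMR signal proportional to concentration, so
    intensity / concentration should stay roughly constant across the series.
    """
    c_ref, i_ref = min(zip(concs, intensities))  # most dilute sample as reference
    ref = i_ref / c_ref
    return [c for c, i in zip(concs, intensities) if abs(i / c - ref) / ref > tolerance]


concs = dilution_series()  # [200.0, 100.0, 50.0, 25.0, 12.5]
# Made-up integrals: signal is "lost" at high concentration, as aggregates would cause.
intensities = [80.0, 70.0, 48.0, 25.0, 12.5]
print(nonlinear_points(concs, intensities))  # [200.0, 100.0]
```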
Another assay involves testing compounds in the absence and presence of various detergents, including NP40, Triton, SDS, CHAPS, Tween 20, and Tween 80. Again, changes in the NMR spectra suggest aggregation.
The researchers note that “no one technique can detect all the types of aggregates that exist; thus, a combination of strategies is necessary.” Indeed, the various techniques can distinguish different types of aggregates, which can vary in size and polydispersity. On a lemons-to-lemonade note, these “nano-entities” might even be useful for “drug delivery, anti-aggregates, cell penetrators and bioavailability enhancers.”
We live in the age of wisdom and the age of foolishness. As scientists – and as people – it is our responsibility to aspire to the former by being aware of “unknown knowns,” such as aggregation. And perhaps even by taking advantage of the weird phenomena that can occur with small molecules in water.

10 January 2022

Virtually screening 11 billion compounds – no problem!

Three years ago we highlighted virtual screens of roughly 100 million molecules which led to numerous high-affinity ligands against two targets. Those efforts made use of the Enamine “readily available for synthesis” (REAL) library, a virtual catalog of molecules that can be rapidly made and delivered. Enamine is continuing to grow this resource, which as of last year stood at 11 billion compounds. This is an impressive number, but how do you make use of it? In a just-published paper in Nature, Vsevolod Katritch (University of Southern California, Los Angeles) and a large group of collaborators provide a promising fragment-based solution.
Molecules in the Enamine REAL collection can be made using one-pot parallel synthesis from two or three reagents; for example, an amide could be made from an amine and a carboxylic acid. Enamine built a set of 75,000 reagents and 121 different reactions which collectively could produce 11 billion molecules (it’s even larger now). However, docking all of these could take thousands of years on a single CPU or cost hundreds of thousands of dollars on a computing cloud.
Rather than docking all the Enamine REAL compounds, the researchers developed an approach called virtual synthon hierarchical enumeration screening, or V-SYNTHES. The first step is to create a library of scaffolds with molecular weights in the 250-350 Da range. Taking the amide example above, imagine linking a set of 1000 amines to benzoic acid and a set of 1000 carboxylic acids to methylamine. This 2000-compound minimal enumeration library, or MEL, could be considered a subset of the full 1000 x 1000 = 1,000,000 virtual amide library. The numbers are even more dramatic for a three-component reaction: a MEL of just 1500 compounds could represent 125,000,000 fully elaborated molecules.
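The arithmetic behind these numbers generalizes neatly. Assuming every position in an n-component reaction draws from the same number k of reagents, and that the MEL pairs each reagent with simple caps at the other positions, the MEL grows as n × k while the fully enumerated library grows as k to the power n.

```python
def mel_vs_full(n_components: int, reagents_per_position: int):
    """Size of a minimal enumeration library (MEL) versus the full library.

    Assumes every position draws from the same number of reagents and that
    each reagent appears in exactly one capped MEL compound.
    """
    mel_size = n_components * reagents_per_position    # grows linearly
    full_size = reagents_per_position ** n_components  # grows as a power
    return mel_size, full_size


print(mel_vs_full(2, 1000))  # (2000, 1000000)
print(mel_vs_full(3, 500))   # (1500, 125000000)
```

The gap between the two columns is what makes the approach scalable: adding reagents grows the docking job linearly while the chemical space it represents grows much faster.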
The MEL is docked against a protein of interest, and a diverse set of the top-scoring compounds chosen for fragment growing. In our example, the benzoic acid “cap” on the best compounds would be replaced by the full set of 1000 carboxylic acids. These would then be virtually screened, and the top compounds synthesized and tested.
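Here is a toy version of that two-stage idea for the amide example. It is a simplified sketch, not V-SYNTHES itself: the docking function is a hypothetical stand-in (an additive score where lower is better), only the amine side of the MEL is grown, and the "diverse set" selection is replaced by simply keeping the top scorers. Three-component reactions would need additional rounds.

```python
from typing import Callable, List, Tuple

# Stand-in for a docking program: takes an amine and an acid, returns a score
# where lower is better. Purely hypothetical.
DockFn = Callable[[str, str], float]


def synthon_screen(amines: List[str], acids: List[str], dock: DockFn,
                   cap_acid: str = "benzoic acid", keep: int = 2,
                   n_final: int = 3) -> List[Tuple[str, str, float]]:
    """Two-stage screen of a two-component amide library.

    Stage 1 docks each amine with a small cap acid (the MEL step).
    Stage 2 swaps the cap on the best `keep` amines for every real acid.
    Docking calls: len(amines) + keep * len(acids), not len(amines) * len(acids).
    """
    stage1 = sorted(amines, key=lambda amine: dock(amine, cap_acid))[:keep]
    stage2 = [(amine, acid, dock(amine, acid)) for amine in stage1 for acid in acids]
    return sorted(stage2, key=lambda hit: hit[2])[:n_final]


# Toy additive "docking" scores (made up).
amine_score = {"amine_1": -5.0, "amine_2": -3.0, "amine_3": -6.0}
acid_score = {"benzoic acid": -1.0, "acid_1": -2.0, "acid_2": -4.0}


def toy_dock(amine: str, acid: str) -> float:
    return amine_score[amine] + acid_score[acid]


print(synthon_screen(list(amine_score), ["acid_1", "acid_2"], toy_dock))
# [('amine_3', 'acid_2', -10.0), ('amine_1', 'acid_2', -9.0), ('amine_3', 'acid_1', -8.0)]
```

The additive toy score flatters the method; real docking scores are not additive, which is why the fully elaborated molecules are redocked in the second stage rather than scored by summing pieces.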
The researchers applied V-SYNTHES to two targets. The first was the cannabinoid receptor CB2, using a structure bound to an antagonist. A total of 1.5 million molecules were docked against CB2, representing 11 billion fully enumerated compounds. After filtering the best hits to remove PAINS and molecules similar to known CB2 ligands, 80 diverse compounds were chosen for actual synthesis and testing, of which Enamine was able to deliver 60 in less than 5 weeks. One-third of these turned out to be antagonists with Ki values < 10 µM in biological assays.
How does this compare to a brute-force approach? Screening all 11 billion molecules wasn’t feasible, so the researchers screened a representative subset of the Enamine REAL library consisting of 115 million molecules – two orders of magnitude larger than the libraries screened in V-SYNTHES. Of 97 compounds synthesized and tested, only 5 turned out to be antagonists of CB2 with Ki values < 10 µM.
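Putting the two hit rates side by side makes the comparison explicit. This takes "one-third" of the 60 delivered compounds as 20; since the two sets of compounds were picked from different screens, the ratio is a rough indicator rather than a controlled measurement.

```python
# Hit rates from the numbers quoted above; "one-third of 60" taken as 20 compounds.
v_synthes_rate = 20 / 60
brute_force_rate = 5 / 97
enrichment = v_synthes_rate / brute_force_rate

print(f"V-SYNTHES: {v_synthes_rate:.0%}, brute force: {brute_force_rate:.1%}")
print(f"~{enrichment:.1f}x higher hit rate")
```

That works out to about 33% versus about 5%, or roughly a 6.5-fold higher hit rate, from a docking campaign nearly two orders of magnitude smaller.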
A nice feature of V-SYNTHES is that it is well-suited to SAR-by-catalog. This was demonstrated by looking for analogs of the three best hits within Enamine REAL space. Of 104 compounds synthesized and tested, more than half had Ki values < 10 µM, and 23 were submicromolar antagonists. In fact, several turned out to be low nanomolar and selective not just against the related CB1 receptor but against a panel of 300 other GPCRs.
V-SYNTHES was also applied to the kinase ROCK1 and achieved similarly impressive results: six of 21 compounds synthesized and tested had Kd < 10 µM in a binding assay, and one was a low nanomolar inhibitor.
This is a lovely and practical application of fragment concepts. Importantly, because the computational cost increases only linearly with the number of reagents, while the library size increases with the square of that number for two-component molecules (and the cube for three-component ones), it is very scalable; the researchers suggest that “terascale and petascale libraries” should be “easily” accommodated. These are numbers beyond even what DNA-encoded libraries can promise.
Currently V-SYNTHES relies on a good structural model for docking, but as computational predictions of protein structures become ever more accurate, perhaps even this will cease to be a limitation. Our SkyFragNet post from 2019 is looking ever more prophetic, in a good way.

05 January 2022

Fragment events in 2022

Will 2022 mark the full return of in-person conferences? That's the plan; here's hoping SARS-CoV-2 doesn't interfere.

February 5-9: The SLAS2022 International Conference and Exhibition will be held in Boston, so if you're looking for new instrumentation this is the place to be.

March 20-24: The American Chemical Society will hold its Spring National Meeting in San Diego and virtually. There are bound to be fragment talks, including a session on Modern Screening Methods on March 24.

March 27-29: The Royal Society of Chemistry's Fragments 2022 will be held in the original Cambridge, and also virtually. This is the eighth in an esteemed conference series that historically has alternated years with the FBLD meetings. You can read my impressions of Fragments 2013 and Fragments 2009.
April 19-20: CHI’s Seventeenth Annual Fragment-Based Drug Discovery, the longest-running fragment event, returns in-person to sunny San Diego (and will also be online). This is part of the larger Drug Discovery Chemistry meeting. You can read impressions of the 2021 virtual meeting here; the 2020 virtual meeting here; the 2019 meeting here; the 2018 meeting here; the 2017 meeting here; the 2016 meeting here; the 2015 meeting here, here, and here; the 2014 meeting here and here; the 2013 meeting here and here; the 2012 meeting here; the 2011 meeting here; and the 2010 meeting here.

May 9-11: While not exclusively fragment-focused, the Eighth NovAliX Conference on Biophysics in Drug Discovery will have several relevant talks, and for the first time will use a hybrid model, both online and in Munich, Germany. You can read my impressions of the 2018 Boston event here, the 2017 Strasbourg event here, and Teddy's impressions of the 2013 event here, here, and here.
October 17-20: CHI’s Twentieth Annual Discovery on Target will be held both virtually and in Boston, as it was last year. As the name implies this event is more target-focused than chemistry-focused, but there are always plenty of FBDD-related talks. You can read my impressions of the 2020 virtual event here, the 2019 event here, and the 2018 event here.
Know of anything else? Please leave a comment or drop me a note!