31 January 2022

A framework for evaluating commercial fragment libraries

The easiest way to build a fragment library is to purchase one. Quite a few vendors sell fragments, and as our poll from a few years ago demonstrated, most buyers are quite happy with them. But what exactly do they offer? This is the subject of a new paper by Gilles Marcou, Esther Kellenberger, and colleagues at CNRS Université de Strasbourg in RSC Med. Chem.
 
The researchers analyzed 86 different libraries from 14 vendors that were available in February of 2021. These were classified into ten categories, such as “general,” “3D-shaped,” “metal chelating,” “diverse,” “covalent,” etc. Individual library sizes ranged considerably: 41 had ≤ 2000 compounds, 31 had 2000-10,000, and 14 libraries had > 10,000 molecules. The total number of fragments came to 754,646, of which 512,284 were unique, indicating some redundancy between libraries. Laudably, the structures and several analyses are all provided as downloadable files here.
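
To put the redundancy in perspective, the totals above can be turned into a rough overlap figure. The snippet below is a minimal sketch, not the authors' workflow: `count_unique` and the toy library contents are hypothetical, and it assumes each fragment has already been reduced to a canonical identifier (for example a canonical SMILES string or InChIKey) so that identical structures compare equal.

```python
# Figures quoted in the post.
TOTAL_FRAGMENTS = 754_646
UNIQUE_FRAGMENTS = 512_284

# Entries that repeat a structure already counted elsewhere.
duplicate_entries = TOTAL_FRAGMENTS - UNIQUE_FRAGMENTS
# Share of all entries that are distinct structures.
unique_fraction = UNIQUE_FRAGMENTS / TOTAL_FRAGMENTS


def count_unique(libraries):
    """Return (total entries, unique structures) across libraries.

    `libraries` maps a library name to a list of canonical structure
    identifiers. Both the name and the input format are assumptions
    made for illustration.
    """
    total = sum(len(ids) for ids in libraries.values())
    unique = len(set().union(*libraries.values()))
    return total, unique


# Hypothetical toy data: two small libraries sharing two fragments.
toy_libraries = {
    "vendor_a_general": ["frag1", "frag2", "frag3"],
    "vendor_b_diverse": ["frag2", "frag3", "frag4"],
}
toy_total, toy_unique = count_unique(toy_libraries)

print(f"{duplicate_entries:,} repeated entries")
print(f"{unique_fraction:.1%} of entries are unique structures")
print(f"toy example: {toy_unique} unique out of {toy_total}")
```

By this count roughly 68% of the entries are distinct structures, so about a third of what is on offer duplicates something sold in another library.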
 
Most of the fragments are 200-300 Da, with only 13% below 200 Da. This skew towards larger molecules is common but may not be desirable, as researchers at Astex demonstrated back in 2014. On the other hand, people do seem to be paying attention to lipophilicity: nearly half the fragments have AlogP < 1. Interestingly, fewer than a quarter of fragments strictly fulfill the rule of three, though most violations come from having more than 3 hydrogen bond acceptors, a criterion that is probably less important than the others, according to analyses of approved drugs.
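
For readers who want to run this kind of check on their own collection, here is a minimal sketch of a rule-of-three filter that reports which criteria a fragment violates. It assumes the descriptors have already been computed with a cheminformatics toolkit; the `Descriptors` class, the `ro3_violations` function, and the example values are hypothetical. The thresholds are the commonly quoted ones (molecular weight, logP, donors, and acceptors each capped at 300 or 3), and how the exact boundary values are treated is an assumption here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    """Precomputed properties for one fragment (hypothetical container)."""
    mol_weight: float   # Da
    logp: float         # calculated logP, e.g. AlogP
    h_donors: int       # hydrogen bond donors
    h_acceptors: int    # hydrogen bond acceptors


def ro3_violations(d, max_mw=300.0, max_logp=3.0, max_hbd=3, max_hba=3):
    """Return the names of the rule-of-three criteria that `d` exceeds."""
    violations = []
    if d.mol_weight > max_mw:
        violations.append("mol_weight")
    if d.logp > max_logp:
        violations.append("logp")
    if d.h_donors > max_hbd:
        violations.append("h_donors")
    if d.h_acceptors > max_hba:
        violations.append("h_acceptors")
    return violations


# Hypothetical fragments: one compliant, one with too many acceptors.
compliant = Descriptors(mol_weight=180.0, logp=0.5, h_donors=1, h_acceptors=2)
acceptor_rich = Descriptors(mol_weight=250.0, logp=1.2, h_donors=1, h_acceptors=5)

print(ro3_violations(compliant))
print(ro3_violations(acceptor_rich))
```

Returning the list of failed criteria rather than a simple pass/fail makes it easy to see whether a library's violations are dominated by a single property, as the acceptor count is here.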
 
Two different methods were used to assess diversity. These were applied to 433,433 compounds from 50 libraries; specialized libraries such as fluorine-rich and covalent libraries were excluded. The first analysis deconstructed fragments into 59,270 component scaffolds. Not surprisingly, benzene was the most common, present in nearly 5% of all fragments. Quinoline, indole, pyridine, and benzimidazole were each present in at least 1% of compounds. At the other end of the spectrum, 36,555 scaffolds occurred only once; as one might expect, these tended to be more complex.
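
The bookkeeping behind this kind of scaffold census is simple once the scaffolds themselves have been extracted. The sketch below assumes that step has been done elsewhere (the paper's decomposition procedure is not reproduced here) and that each fragment maps to the set of scaffolds it contains; `scaffold_stats` and the toy data are hypothetical.

```python
from collections import Counter


def scaffold_stats(fragment_scaffolds):
    """Summarise scaffold usage across a fragment collection.

    `fragment_scaffolds` maps a fragment identifier to the scaffolds
    found in that fragment (an assumed, precomputed input).
    Returns (coverage, singletons): the fraction of fragments that
    contain each scaffold, and the scaffolds seen in only one fragment.
    """
    counts = Counter()
    for scaffolds in fragment_scaffolds.values():
        # set() ensures a scaffold counts once per fragment.
        counts.update(set(scaffolds))
    n_fragments = len(fragment_scaffolds)
    coverage = {s: c / n_fragments for s, c in counts.items()}
    singletons = sorted(s for s, c in counts.items() if c == 1)
    return coverage, singletons


# Hypothetical toy collection of four fragments.
toy = {
    "frag1": {"benzene"},
    "frag2": {"benzene", "pyridine"},
    "frag3": {"indole"},
    "frag4": {"benzene"},
}
coverage, singletons = scaffold_stats(toy)

for scaffold, fraction in sorted(coverage.items(), key=lambda kv: -kv[1]):
    print(f"{scaffold}: {fraction:.0%} of fragments")
print("singletons:", singletons)
```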
 
In addition to assessing scaffolds, the researchers developed a “Generative Topographical Map (GTM) model to represent the chemical space in a landscape.” The resulting figures do indeed look like topographical maps, with darker regions corresponding to more populated areas. For example, since substituted benzimidazoles are common and similar to one another, they form a dark cluster. Not unexpectedly, the landscape for the set of 433,433 compounds is heterogeneous, with denser regions separated by sparsely populated ones.
 
A nice feature of the GTM model is that it allows easy, intuitive comparisons. For example, some of the “diverse” libraries are more diverse than others, or emphasize different regions of chemical space, and potential customers may want to take these into account.
 
Fragment shapeliness was assessed using plane of best fit (PBF), where lower values correspond to “flatter” molecules; benzene, for example, has PBF = 0. The libraries varied considerably in their average PBF, though reassuringly the “3D-shaped” libraries did have higher values. Interestingly, GTM models showed that both flat (PBF < 0.1) and non-planar (PBF ≥ 0.1) fragments had similar distributions across fragment space.
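
To make the PBF idea concrete, here is a minimal sketch of the calculation: fit a plane through the atom positions by least squares, then average the atoms' distances from it. It assumes PBF is taken over heavy-atom coordinates from a single 3D conformer, in ångströms. The function name, the toy coordinates, and the use of a shifted power iteration to find the plane normal (chosen only to avoid external dependencies) are choices made for illustration, not the method used in the paper.

```python
import math


def plane_of_best_fit_score(coords, iterations=200):
    """Mean distance of points from their least-squares plane.

    `coords` is a list of (x, y, z) tuples. The plane passes through
    the centroid; its normal is the direction of least variance.
    """
    n = len(coords)
    if n < 3:
        raise ValueError("need at least three points to define a plane")
    centroid = [sum(p[i] for p in coords) / n for i in range(3)]
    centered = [tuple(p[i] - centroid[i] for i in range(3)) for p in coords]

    # 3x3 scatter matrix of the centered coordinates.
    scatter = [[sum(p[i] * p[j] for p in centered) for j in range(3)]
               for i in range(3)]
    trace = scatter[0][0] + scatter[1][1] + scatter[2][2]
    # Shift so the smallest-variance direction becomes the dominant one.
    shifted = [[(trace if i == j else 0.0) - scatter[i][j] for j in range(3)]
               for i in range(3)]

    # Power iteration from an arbitrary, normalised starting vector.
    start = (0.3, 0.5, 0.8)
    start_norm = math.sqrt(sum(c * c for c in start))
    normal = [c / start_norm for c in start]
    for _ in range(iterations):
        w = [sum(shifted[i][j] * normal[j] for j in range(3)) for i in range(3)]
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:  # all points coincide; any normal will do
            break
        normal = [c / norm for c in w]

    distances = [abs(sum(p[i] * normal[i] for i in range(3))) for p in centered]
    return sum(distances) / n


# Idealised benzene ring: six carbons on a regular hexagon in the xy-plane.
benzene = [(1.39 * math.cos(k * math.pi / 3), 1.39 * math.sin(k * math.pi / 3), 0.0)
           for k in range(6)]
# Hypothetical non-planar arrangement: four in-plane points plus two
# displaced 0.5 above and below the plane.
puckered = [(2.0, 0.0, 0.0), (-2.0, 0.0, 0.0), (0.0, 2.0, 0.0),
            (0.0, -2.0, 0.0), (0.0, 0.0, 0.5), (0.0, 0.0, -0.5)]

pbf_benzene = plane_of_best_fit_score(benzene)
pbf_puckered = plane_of_best_fit_score(puckered)
print(f"benzene:  {pbf_benzene:.3f}")
print(f"puckered: {pbf_puckered:.3f}")
```

With a numerical library available, the plane normal would more typically come from a singular value or eigenvalue decomposition of the centered coordinates; the power iteration here can converge slowly, or to an arbitrary direction, when the two smallest variances are nearly equal.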
 
Overall this is a valuable snapshot of the current state of commercial libraries, and makes a useful complement to the ongoing analysis Chris Swain does at Cambridge MedChem Consulting. Of course, the devil is in the details; PAINS still sometimes show up in commercial libraries, and quality control can vary. In the end you’ll want to do your own vetting, but this is a good place to start.
