The easiest way to build a
fragment library is to purchase one. Quite a few vendors sell fragments, and as
our poll from a few years ago demonstrated, most buyers are quite happy with
them. But what exactly do they offer? This is the subject of a new paper by
Gilles Marcou, Esther Kellenberger, and colleagues at CNRS Université de
Strasbourg in RSC Med. Chem.
The researchers analyzed 86
different libraries from 14 vendors that were available in February of 2021.
These were classified into ten categories, including “general,” “3D-shaped,” “metal
chelating,” “diverse,” and “covalent.” Individual library sizes varied
considerably: 41 had ≤ 2,000 compounds, 31 had 2,000-10,000, and 14 had
> 10,000. The total number of fragments came to 754,646, of which
512,284 were unique, so roughly a third of the entries are repeats of compounds listed elsewhere. Laudably,
the structures and several analyses are all provided as downloadable files here.
Most of the fragments weigh 200-300
Da, with only 13% below 200 Da. This skew towards larger molecules is
common but may not be desirable, as researchers at Astex demonstrated back in 2014.
On the other hand, people do seem to be paying attention to lipophilicity:
nearly half the fragments have AlogP < 1. Interestingly, fewer than a quarter
of fragments strictly fulfill the rule of three, though the majority of
violations come from having more than three hydrogen bond acceptors, which is probably not
as important as the other criteria, according to analyses of approved drugs.
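
To make the rule-of-three check concrete, here is a minimal sketch in Python. It assumes the descriptors have already been computed with whatever cheminformatics toolkit you prefer, and it uses the commonly quoted cutoffs (molecular weight ≤ 300 Da, logP ≤ 3, and at most 3 hydrogen bond donors and acceptors). The paper's exact cutoffs and descriptor definitions may differ, and the class name, function names, and the two example fragments are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FragmentDescriptors:
    """Precomputed descriptors for one fragment (values supplied by the caller)."""
    name: str
    mol_weight: float      # Da
    logp: float            # any calculated logP, e.g. AlogP
    h_bond_donors: int
    h_bond_acceptors: int


def rule_of_three_violations(frag):
    """Return the names of the criteria that the fragment fails.

    Thresholds are the commonly quoted rule-of-three cutoffs; they are an
    assumption here, not necessarily the ones used in the paper.
    """
    violations = []
    if frag.mol_weight > 300:
        violations.append("mol_weight")
    if frag.logp > 3:
        violations.append("logp")
    if frag.h_bond_donors > 3:
        violations.append("h_bond_donors")
    if frag.h_bond_acceptors > 3:
        violations.append("h_bond_acceptors")
    return violations


def passes_rule_of_three(frag):
    """True only if the fragment satisfies every criterion."""
    return not rule_of_three_violations(frag)


# Hypothetical fragments with made-up descriptor values, for illustration only.
examples = [
    FragmentDescriptors("fragment_a", 180.2, 1.1, 1, 2),
    FragmentDescriptors("fragment_b", 245.3, 0.4, 2, 5),
]

for frag in examples:
    print(frag.name, passes_rule_of_three(frag), rule_of_three_violations(frag))
```

Returning the list of failed criteria, rather than a bare pass/fail flag, makes it easy to see how many failures come from hydrogen bond acceptors alone.
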
Two different methods were used
to assess diversity. These were applied to 433,433 compounds from 50 libraries;
specialized libraries such as fluorine-rich and covalent libraries were
excluded. The first analysis deconstructed fragments into 59,270 component scaffolds.
Not surprisingly, benzene was the most common, present in nearly 5% of all
fragments. Quinoline, indole, pyridine, and benzimidazole were each present in
at least 1% of compounds. At the other end of the spectrum, 36,555 scaffolds (about 62% of the total)
occurred only once. As might be expected, these tended to be more complex.
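
The bookkeeping behind numbers like these is simple once the scaffolds have been extracted. The sketch below assumes each fragment has already been broken down into a set of scaffold identifiers (how the authors did this is not reproduced here), then tallies how many fragments contain each scaffold and which scaffolds appear only once. The function name and the toy input are hypothetical.

```python
from collections import Counter


def summarize_scaffolds(scaffolds_per_fragment):
    """Tally scaffold occurrence across a fragment collection.

    scaffolds_per_fragment: one iterable of scaffold identifiers per fragment
    (the decomposition itself is assumed to have been done elsewhere).
    Returns (counts, frequency, singletons).
    """
    n_fragments = len(scaffolds_per_fragment)
    counts = Counter()
    for scaffolds in scaffolds_per_fragment:
        # Count each scaffold at most once per fragment.
        counts.update(set(scaffolds))
    # Fraction of fragments that contain each scaffold.
    frequency = {s: c / n_fragments for s, c in counts.items()}
    # Scaffolds seen in exactly one fragment.
    singletons = sorted(s for s, c in counts.items() if c == 1)
    return counts, frequency, singletons


# Toy input, for illustration only.
toy_library = [
    {"benzene"},
    {"benzene", "pyridine"},
    {"indole"},
    {"benzene"},
]

counts, frequency, singletons = summarize_scaffolds(toy_library)
print(counts.most_common(1))
print(frequency["benzene"])
print(singletons)
```
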
In addition to assessing
scaffolds, the researchers developed a “Generative Topographical Map (GTM)
model to represent the chemical space in a landscape.” The resulting figures do
indeed look like topographical maps, with darker regions corresponding to more
populated areas. For example, since substituted benzimidazoles are common and
similar to one another, they form a dark cluster. As expected, the landscape
for the set of 433,433 compounds is heterogeneous, with denser regions separated
by sparsely populated ones.
A nice feature of the GTM model
is that it allows easy, intuitive comparisons. For example, some of the “diverse”
libraries are more diverse than others, or emphasize different regions of
chemical space, and potential customers may want to take these differences into account.
Fragment shapeliness was assessed
using plane of best fit (PBF), where lower values correspond to “flatter”
molecules, such as benzene, with PBF = 0. The libraries varied considerably in
their average PBF, though reassuringly the “3D-shaped” libraries did have
higher values. Interestingly, GTM models showed that both flat (PBF < 0.1) and non-planar
(PBF ≥ 0.1) fragments had similar distributions across fragment space.
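
For readers who have not met PBF before, it is commonly defined as the average distance of a molecule's heavy atoms from the plane that best fits them. The sketch below computes that with NumPy, taking the plane normal from a singular value decomposition of the centered coordinates. It assumes you already have 3D coordinates for a single conformer; the function names are hypothetical, and the two hexagons are idealized geometries rather than real structures.

```python
import math

import numpy as np


def plane_of_best_fit(coords):
    """Mean distance of the points from their least-squares plane.

    coords: sequence of (x, y, z) heavy-atom positions for one conformer.
    """
    points = np.asarray(coords, dtype=float)
    if points.ndim != 2 or points.shape[1] != 3 or points.shape[0] < 3:
        raise ValueError("need at least three (x, y, z) points")
    # The best-fit plane passes through the centroid.
    centered = points - points.mean(axis=0)
    # The right singular vector with the smallest singular value is the
    # direction of least variance, i.e. the plane normal.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    normal = vt[-1]
    # Average absolute distance of each point from the plane.
    return float(np.mean(np.abs(centered @ normal)))


def hexagon(radius, pucker=0.0):
    """Idealized six-membered ring; alternate atoms sit +/- pucker off plane."""
    ring = []
    for k in range(6):
        angle = math.radians(60 * k)
        z = pucker if k % 2 == 0 else -pucker
        ring.append((radius * math.cos(angle), radius * math.sin(angle), z))
    return ring


flat_ring = hexagon(1.39)             # planar, benzene-like
puckered_ring = hexagon(1.5, 0.25)    # alternating atoms 0.25 off plane

print(plane_of_best_fit(flat_ring))       # essentially 0
print(plane_of_best_fit(puckered_ring))   # approximately 0.25
```

Because the value depends on the conformer supplied, PBF numbers are only comparable when the 3D structures were generated the same way.
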
Overall, this is a valuable snapshot
of the current state of commercial libraries, and makes a useful complement to
the ongoing analysis Chris Swain does at Cambridge MedChem Consulting. Of
course, the devil is in the details; PAINS still sometimes show up in
commercial libraries, and quality control can vary. In the end you’ll want to
do your own vetting, but this is a good place to start.