One theoretical advantage of
fragment-based drug discovery is the ability to efficiently explore chemical
space: there are vastly fewer possible fragment-sized molecules than lead-sized
molecules. That said, even fragment space is daunting; the number of possible molecules
with up to 17 non-hydrogen atoms is about three orders of magnitude larger than
the largest computational screen. Maximizing diversity is thus a key goal in
designing fragment libraries, but how do you actually do this? A new open-access
paper in Molecules by Yun Shi and
Mark von Itzstein at Griffith University provides a practical new approach.
As the researchers point out,
diversity itself can be a slippery concept. Functional diversity (i.e., which
targets are bound) is important but hard-won knowledge. Physicochemical
diversity is by definition limited for fragments. That leaves structural
diversity, as defined by “molecular fingerprints.” These can be as simple as
the presence or absence of a fluorine atom, or can require complicated
calculations involving, say, the distance between a hydrogen bond donor and
acceptor in the lowest energy conformation of a molecule. In their paper the
researchers focus on “extended-connectivity” fingerprints, which take into
consideration the physical connectivity between different types of atoms.
But how can you actually quantify
structural diversity? One possibility is to compare molecules pairwise to see how different
they are, as in Tanimoto similarity assessments. Each additional
molecule would be chosen to be the least similar to those already in the library. Alternatively,
one could consider “richness,” how much of chemical space is covered, by
calculating how many unique structural features (such as specific bond
connectivities) are represented. Each additional molecule would be chosen to
provide as many new molecular fingerprints as possible. Shi and von Itzstein propose a
third approach, “true diversity,” that considers the number of unique features
as well as their proportional abundances. In other words, a library with a
higher true diversity would have a “more even distribution of proportional
abundances.” The researchers note that this approach has been used in ecology
for decades.
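
To make the three measures concrete, here is a minimal Python sketch. It rests on assumptions: the post does not give the paper's formulas, so true diversity is taken here to be the Hill number of order 1 (the exponential of the Shannon entropy of feature abundances), which is one common ecological definition. Each fragment is represented as a set of integer feature IDs standing in for fingerprint bits, and the toy libraries `even` and `skewed` are hypothetical.

```python
import math
from collections import Counter
from typing import List, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto (Jaccard) similarity between two feature sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def richness(library: List[Set[int]]) -> int:
    """Number of unique features found anywhere in the library."""
    return len(set().union(*library)) if library else 0


def true_diversity(library: List[Set[int]]) -> float:
    """Assumed definition: exp(Shannon entropy) of feature abundances."""
    counts = Counter(f for frag in library for f in frag)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return math.exp(entropy)


# Hypothetical toy libraries: both contain the same four features,
# but in `skewed` feature 1 appears in every fragment.
even = [{1, 2}, {3, 4}]
skewed = [{1, 2}, {1, 3}, {1, 4}]

print(richness(even), richness(skewed))
print(true_diversity(even), true_diversity(skewed))
print(tanimoto({1, 2}, {1, 3}))
```

Both toy libraries have the same richness, but the one in which a single feature dominates scores a lower true diversity, which is the "more even distribution" idea in the quote above.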
To see how their approach
performs, the researchers started with a set of 227,787 commercially available fragments,
all of which were roughly rule-of-3-compliant and scrubbed of undesirable functionalities.
They also considered a subset of 47,708 fluorine-containing fragments. For both
sets, they then assessed structural diversity as a function of increasing fragment
library size using Tanimoto similarity, richness, and true diversity, as well
as random sampling.
Naturally, as the size of a
fragment library rose, the diversity increased. As expected, selecting by Tanimoto similarity or richness led to greater diversity at a smaller library
size than did random sampling, and selecting by true diversity did better still. Interestingly, true diversity reached a maximum at 8.8% or 15.7%
of the available fragments (for the full and fluorinated sets, respectively) and then began to decline. This
makes conceptual sense because commercial compounds themselves are unlikely to
be truly diverse.
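
The peak-and-decline behavior can be reproduced on a toy pool. The sketch below is not the paper's selection procedure, which the post does not describe. It assumes a simple greedy loop that always adds the fragment giving the highest true diversity, with true diversity again assumed to be the exponential of the Shannon entropy of feature abundances. The pool of feature-ID sets is hypothetical.

```python
import math
from collections import Counter
from typing import List, Set


def true_diversity(library: List[Set[int]]) -> float:
    """Assumed definition: exp(Shannon entropy) of feature abundances."""
    counts = Counter(f for frag in library for f in frag)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return math.exp(entropy)


def greedy_curve(pool: List[Set[int]]) -> List[float]:
    """Greedily add the fragment that maximizes true diversity.

    Returns the true diversity of the selection after each addition,
    continuing until the whole pool has been selected.
    """
    remaining = list(pool)
    chosen: List[Set[int]] = []
    curve: List[float] = []
    while remaining:
        best = max(remaining, key=lambda frag: true_diversity(chosen + [frag]))
        remaining.remove(best)
        chosen.append(best)
        curve.append(true_diversity(chosen))
    return curve


# Hypothetical pool: three fragments with distinct features, plus three
# that all reuse feature 1 (a stand-in for an over-represented motif).
pool = [{1, 2}, {3, 4}, {5, 6}, {1, 3}, {1, 5}, {1, 6}]
curve = greedy_curve(pool)
peak = curve.index(max(curve)) + 1

print([round(d, 3) for d in curve])
print("peak after", peak, "of", len(pool), "fragments")
```

In this toy pool the curve rises while fragments with new features are available, then falls as the remaining fragments only pile more copies onto a feature that is already common.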
More importantly, just 1% or 2.5%
of fragments (for the full and fluorinated sets, respectively) were sufficient to achieve the same true diversity as the full
sets. This corresponds to 2052 fragments for the complete commercial set, the
structures of which are provided in the supplementary material. As the
researchers note, this is comparable to the size of many commonly used
fragment libraries.
The method is computationally inexpensive (it runs on a desktop), and should be a useful tool for both building and curating fragment libraries, real and virtual. Of course, diversity is not everything, and it probably makes sense to include privileged pharmacophores even at the cost of lower diversity. But as Lord Kelvin said, “when you can measure what you are speaking about, and express it in numbers, you know something about it.” This paper provides a quantitative approach for measuring diversity.
Hi Dan, I would argue that it is actually coverage rather than diversity that one is trying to maximize in screening library design. Diversity maximization can select weird stuff and singletons which are typically more difficult to follow up. One key question when considering coverage is how similar two molecular structures have to be in order for one to be considered to be representing the other.
Hi Peter, thank you for the comments.
If by 'diversity' you meant difference, I agree that trying to maximise difference during selection would actually result in not only weird structures but also poor coverage. We define coverage as a ratio: the number of features (richness) in selected cpds over the richness in all cpds available for selection, and in this paper we used radial fingerprints to describe structural features. Instead of putting an arbitrary similarity/difference cutoff values to decide if cpd A is covered/represented by cpd B, I find it easier to quantify coverage by reducing the problem to the fingerprint level. At this level, we have coverage become a yes-or-no issue, i.e. you either have this exact fingerprint or not have this fingerprint at all.
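
The coverage ratio described in this comment can be sketched in a few lines of Python. This is an illustration only: fragments are modeled as sets of integer feature IDs standing in for radial fingerprint bits, and the names `features`, `coverage`, and `available` are hypothetical.

```python
from typing import Iterable, Set


def features(compounds: Iterable[Set[int]]) -> Set[int]:
    """Union of all features present in a collection of compounds."""
    out: Set[int] = set()
    for compound in compounds:
        out |= compound
    return out


def coverage(selected: Iterable[Set[int]], available: Iterable[Set[int]]) -> float:
    """Richness of the selection divided by richness of everything available."""
    all_features = features(available)
    if not all_features:
        return 0.0
    # A feature is either present in the selection or not; no similarity cutoff.
    return len(features(selected) & all_features) / len(all_features)


# Hypothetical pool of four compounds sharing four features.
available = [{1, 2}, {2, 3}, {3, 4}, {1, 4}]

print(coverage([{1, 2}], available))
print(coverage([{1, 2}, {3, 4}], available))
```

Here two of the four compounds already contain every feature in the pool, so they reach full coverage under this definition.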