19 December 2012

GDB-17: 166 billion fragments and counting

How many possible fragments are there? Jean-Louis Reymond and colleagues at the University of Berne have been trying to answer this question computationally by enumerating all stable molecules from first principles. In their previous effort they found nearly a billion molecules with up to 13 atoms. In a new paper published in J. Chem. Inf. Model. they have now extended this analysis to molecules containing up to 17 carbon, oxygen, nitrogen, sulfur, and halogen atoms. There are 166,443,860,262 of them.

What do they look like? Before addressing that question, it is worth noting that this set of molecules, dubbed GDB-17, is not exhaustive. The researchers intentionally excluded many potentially unstable moieties. Most of these are probably best ignored, though doing so does leave out functionalities found in some drugs, such as hemiaminal ethers (acyclovir), sulfoxides (omeprazole), and some nonaromatic double bonds (cyclosporine). In fact, more than 40% of similarly sized molecules in PubChem (i.e., molecules that have actually been synthesized) are not represented in GDB-17.

But even looking at the PubChem molecules that do show up in GDB-17, there are dramatic differences between existing molecules and enumerated possibilities. For example, a huge fraction of the GDB-17 set contains 3- or 4-membered rings. Aromatic rings are surprisingly rare, at only 0.8%, compared with roughly a third of similar-sized molecules in PubChem. On the other hand, 57% of the GDB-17 molecules contain nonaromatic heterocycles, compared with just 12% in PubChem; these may be particularly attractive in terms of drug-like properties.
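To put those percentages in perspective, here is a back-of-the-envelope sketch in Python that converts them into rough molecule counts. It assumes the rounded percentages quoted above apply to the full set of 166,443,860,262 molecules, which the post does not state explicitly, so treat the results as order-of-magnitude estimates only. The names `GDB17_TOTAL`, `FRACTIONS`, and `estimated_count` are made up for this illustration.

```python
# Total number of GDB-17 molecules, as quoted in the post.
GDB17_TOTAL = 166_443_860_262

# Rounded percentages quoted in the post. Assumption: they hold for the
# whole database rather than for a sampled subset.
FRACTIONS = {
    "aromatic rings": 0.008,            # 0.8%
    "nonaromatic heterocycles": 0.57,   # 57%
}


def estimated_count(total: int, fraction: float) -> int:
    """Return the approximate number of molecules for a given fraction."""
    return round(total * fraction)


estimates = {
    name: estimated_count(GDB17_TOTAL, fraction)
    for name, fraction in FRACTIONS.items()
}

for name, count in estimates.items():
    print(f"{name}: ~{count / 1e9:.1f} billion molecules")
```

Under that assumption, even the "surprisingly rare" aromatic class would still amount to roughly 1.3 billion molecules, while the nonaromatic heterocycles would number about 95 billion.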

Consistent with their reduced aromaticity, the GDB-17 molecules also contain many more stereocenters: on average more than 6 per molecule compared with just 2 for PubChem. With all these stereocenters, it’s inevitable that the molecules are more three-dimensional than those in PubChem. By the same token, though, synthetic accessibility is likely to be a challenge.

In general the GDB-17 molecules are also considerably more polar than known molecules of similar size, with more than half of them having ClogP ≤ 0. The relative scarcity of such polar compounds among known molecules could in part reflect how tough it often is to purify molecules that are too soluble in water.

There’s a lot of other fun stuff here: for example, almost half a billion isomers of procaine! And for the synthetic chemists in the audience, this work illuminates vast fields of uncharted chemical territory, just waiting to be explored.
