What do they look like? Before addressing that question, it is worth noting that this set of molecules, dubbed GDB-17, is not exhaustive. The researchers intentionally excluded many potentially unstable moieties. Most of these are probably best ignored, though doing so does leave out functionalities found in some drugs, such as hemiaminal ethers (acyclovir), sulfoxides (omeprazole), and some nonaromatic double bonds (cyclosporine). In fact, more than 40% of similarly sized molecules in PubChem (i.e., molecules that have actually been synthesized) are not represented in GDB-17.
But even looking at the PubChem molecules that do show up in GDB-17, there are dramatic differences between existing molecules and enumerated possibilities. For example, a huge fraction of the GDB-17 set contains 3- or 4-membered rings. Aromatic rings are surprisingly rare, appearing in only 0.8% of GDB-17 molecules, compared with roughly a third of similarly sized molecules in PubChem. On the other hand, 57% of the GDB-17 molecules contain nonaromatic heterocycles, compared with just 12% in PubChem; such heterocycles may be particularly attractive in terms of drug-like properties.
Consistent with their reduced aromaticity, the GDB-17 molecules also contain many more stereocenters: on average more than 6 per molecule compared with just 2 for PubChem. With all these stereocenters, it’s inevitable that the molecules are more three-dimensional than those in PubChem. By the same token, though, synthetic accessibility is likely to be a challenge.
In general, the GDB-17 molecules are also considerably more polar than known molecules of similar size, with more than half of them having ClogP ≤ 0. The relative scarcity of such polar compounds among known molecules could be due in part to the fact that it’s often tough to purify molecules that are too soluble in water.