A few weeks ago we gave a passing nod to work by Jean-Louis Reymond, who with colleagues enumerated all possible compounds with up to 11 C, F, N, and O atoms. In a new JACS Communication with Lorenz Blum, he has now expanded this analysis to molecules containing up to 13 non-hydrogen atoms.
The new dataset, GDB-13, contains 977,468,314 molecules built from carbon, nitrogen, and oxygen atoms (plus hydrogens, of course). Unlike its predecessor it excludes fluorine, but it happily adds chlorine as an aromatic substituent, as well as sulfur in heterocycles or in sulfones, sulfonamides, and thioureas. To speed the calculation, which still required the equivalent of 4.5 years of CPU time, a few other simplifications were made to limit the number of heteroatoms in a given structure.
The resulting collection, while huge, is thus obviously incomplete: about two-thirds of the 619,675 molecules with up to 13 atoms that are reported in a variety of databases do not appear in GDB-13. GDB-13 also has many unconventional structures: over half of its molecules contain one or more three- or four-membered rings.
Still, there is lots of neat stuff here: for example, 804,153 structural isomers of aspirin, and 18,371,393 structural isomers of mexiletine! And since 45.1% of the new molecules are rule-of-three compliant, there are hundreds of millions of virgin fragments just waiting to be made – and tested.