29 June 2009

Fragments of the future - part 3 (977 million and counting)

A few weeks ago we gave a passing nod to work by Jean-Louis Reymond, who with colleagues enumerated all possible compounds with up to 11 C, F, N, and O atoms. In a new JACS Communication with Lorenz Blum, he has now expanded this analysis to molecules containing up to 13 non-hydrogen atoms.

The new dataset, GDB-13, comprises 977,468,314 molecules built from carbon, nitrogen, and oxygen atoms (plus hydrogens, of course). Unlike its predecessor it excludes fluorine, but it happily adds chlorine as an aromatic substituent, as well as sulfur in heterocycles or in sulfones, sulfonamides, or thioureas. To speed up the calculation (which still required the equivalent of 4.5 years of CPU time), a few other simplifications were made to limit the number of heteroatoms in a given structure.
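Whether a given compound could even be a candidate for GDB-13 comes down to a quick count of its non-hydrogen atoms and a check of its elements. The Python sketch below does that from a simple molecular formula. It is only a necessary condition, not a membership test: it ignores the database's extra constraints (where S and Cl may appear, limits on heteroatoms, excluded ring systems), and the names `GDB13_ELEMENTS` and `within_gdb13_size_and_elements` are hypothetical helpers introduced here for illustration. Aspirin (C9H8O4) and mexiletine (C11H17NO) both come out at exactly 13 heavy atoms.

```python
import re

# Heavy-atom elements GDB-13 draws on, per the description above (H is implicit).
# Simplification: the positional restrictions on S and Cl are NOT modeled here.
GDB13_ELEMENTS = {"C", "N", "O", "S", "Cl"}
MAX_HEAVY_ATOMS = 13


def parse_formula(formula: str) -> dict[str, int]:
    """Parse a flat formula such as 'C9H8O4' into element counts (no parentheses)."""
    counts: dict[str, int] = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def heavy_atom_count(formula: str) -> int:
    """Count the non-hydrogen atoms in a formula."""
    return sum(n for element, n in parse_formula(formula).items() if element != "H")


def within_gdb13_size_and_elements(formula: str) -> bool:
    """Hypothetical pre-filter: allowed elements only and at most 13 heavy atoms."""
    heavy_elements = {element for element in parse_formula(formula) if element != "H"}
    return heavy_elements <= GDB13_ELEMENTS and heavy_atom_count(formula) <= MAX_HEAVY_ATOMS


if __name__ == "__main__":
    for name, formula in [("aspirin", "C9H8O4"), ("mexiletine", "C11H17NO")]:
        print(name, heavy_atom_count(formula), within_gdb13_size_and_elements(formula))
```

A fluorinated compound fails the element check regardless of size, which mirrors GDB-13 dropping fluorine.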

The resulting collection, while huge, is thus obviously incomplete: of the 619,675 molecules with up to 13 non-hydrogen atoms reported in a variety of databases, about two-thirds do not appear in GDB-13. And GDB-13 has many unconventional structures – over half of the molecules contain one or more three- or four-membered rings.

Still, there's lots of neat stuff here: for example, 804,153 structural isomers of aspirin and 18,371,393 structural isomers of mexiletine! And since 45.1% of the new molecules are rule-of-three compliant, there are hundreds of millions of virgin fragments just waiting to be made, and tested.
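For readers less familiar with the "rule of three", here is a minimal Python sketch of the commonly cited fragment criteria (molecular weight under 300 Da, cLogP of 3 or less, at most 3 hydrogen-bond donors and 3 acceptors, with rotatable bonds of 3 or less and polar surface area of 60 Å² or less as optional extras). It is an assumption that these are the exact cutoffs behind the 45.1% figure; the paper may define compliance differently. The descriptors are taken as precomputed inputs, and `Descriptors` and `is_rule_of_three_compliant` are hypothetical names. The last lines check the "hundreds of millions" arithmetic.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed properties for one molecule (how they are computed is out of scope)."""
    mol_weight: float          # Da
    clogp: float
    h_bond_donors: int
    h_bond_acceptors: int
    rotatable_bonds: int
    polar_surface_area: float  # square angstroms


def is_rule_of_three_compliant(d: Descriptors, strict: bool = False) -> bool:
    """Commonly cited rule-of-three core criteria; strict=True adds the optional two."""
    core = (
        d.mol_weight < 300
        and d.clogp <= 3
        and d.h_bond_donors <= 3
        and d.h_bond_acceptors <= 3
    )
    if not strict:
        return core
    return core and d.rotatable_bonds <= 3 and d.polar_surface_area <= 60


# Scale of the opportunity, using the figures quoted in the post.
GDB13_SIZE = 977_468_314
RO3_FRACTION = 0.451

estimated_ro3 = round(GDB13_SIZE * RO3_FRACTION)
print(f"~{estimated_ro3 / 1e6:.0f} million rule-of-three compliant molecules")
```

The product works out to roughly 441 million molecules, comfortably "hundreds of millions".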


Anonymous said...

"And GDB-13 has many unconventional structures – over half of the molecules contain one or more three- or four-membered rings."

Isn't it interesting that we consider these "unconventional"? Nature sprays three- and four-membered rings around like paintballs, but fragment libraries and medicinal chemistry programs tend to avoid them for reasons lost in the mists of time... too hard to make, too reactive...

For example, epoxides are avoided in fragment libraries on the basis of being too reactive, but they appear pretty commonly in natural products. Are our library sets biased in the wrong direction?

Anonymous said...

This gives further support to the suggestion from Shoichet's group, in its recent NCB paper, that surveying the cyclic structures found in natural products could be a means of discovering new fragment scaffolds.

c. dagostin said...

I suppose that 3- and 4-membered rings are considered 'unconventional' not because they cannot be contained in a drug, but for reasons such as chemical tractability or a lack of suitable vectors to grow the fragment.