29 June 2009
The new dataset, GDB-13, contains 977,468,314 molecules built from carbon, oxygen, and nitrogen atoms (as well as hydrogens, of course). Unlike its predecessor, it excludes fluorine, but it happily adds chlorine as an aromatic substituent as well as sulfur in heterocycles or in sulfones, sulfonamides, or thioureas. To speed calculation (which still required the equivalent of 4.5 years of CPU time), a few other simplifications were made to limit the number of heteroatoms in a given structure.
The resulting collection, while huge, is thus obviously incomplete: about two-thirds of the 619,675 molecules that contain up to 13 atoms and are reported in a variety of databases do not appear in GDB-13. And GDB-13 has many unconventional structures – over half of the molecules contain one or more three- or four-membered rings.
Still, there is lots of neat stuff here: for example, 804,153 structural isomers of aspirin, and 18,371,393 structural isomers of mexiletine! And since 45.1% of the new molecules are rule-of-three compliant, there are hundreds of millions of virgin fragments just waiting to be made – and tested.
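As a quick back-of-the-envelope check on "hundreds of millions," the sketch below multiplies the two figures quoted above; the result is roughly 441 million. It also shows one common formulation of the core rule-of-three criteria (molecular weight of 300 or less, ClogP of 3 or less, and no more than three hydrogen-bond donors or acceptors). The exact cutoffs used in the GDB-13 analysis may differ, and the function names and the descriptor values passed in are illustrative assumptions, not anything taken from the paper.

```python
# Figures quoted in the post above.
GDB13_SIZE = 977_468_314
RO3_FRACTION = 0.451  # 45.1% reported as rule-of-three compliant


def estimated_ro3_count(total=GDB13_SIZE, fraction=RO3_FRACTION):
    """Approximate number of rule-of-three compliant molecules."""
    return round(total * fraction)


def is_rule_of_three(mw, clogp, h_donors, h_acceptors):
    """Core rule-of-three check on precomputed descriptors.

    Assumed thresholds: MW <= 300, ClogP <= 3, donors <= 3, acceptors <= 3.
    Some formulations add limits on rotatable bonds and polar surface area.
    """
    return mw <= 300 and clogp <= 3 and h_donors <= 3 and h_acceptors <= 3


if __name__ == "__main__":
    print(f"Estimated compliant molecules: {estimated_ro3_count():,}")
    # Hypothetical descriptor values for a small fragment.
    print(is_rule_of_three(mw=150.0, clogp=1.2, h_donors=1, h_acceptors=2))
```

The descriptors themselves would normally come from a cheminformatics toolkit; they are passed in as plain numbers here to keep the example self-contained.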
27 June 2009
There are a few previous reports of discovering fragments that bind to RNA and their subsequent optimization; Ibis Therapeutics was particularly active in this field a few years back. In the current study, Fareed Aboul-ela and coworkers at Louisiana State University started by analyzing 120 known RNA-binding ligands and comparing these to known drugs and publicly available compounds. A variety of computationally derived physicochemical descriptors failed to differentiate the RNA binders from other molecules, but the authors note that:
This result does not preclude the likelihood that a finite set of chemical moieties constitute a “privileged” RNA binding set. The special properties of these functionalities may be too subtle or complex to detect using standard descriptors.
Following up on this hypothesis, the authors computationally “cleaved” their RNA binders to generate a set of fragments, and then purchased just over a hundred of these. These were then screened using four different NMR experiments to see if any bound to a 27-residue oligonucleotide derived from E. coli 16S rRNA, an important antibiotic target. Happily, five fragments were identified as binding to the RNA target, two of which had not previously been identified in the literature.
Whether these fragments can be advanced to high affinity binders, and whether the library will be generally useful against RNA, remain open questions. But one nice feature of this paper is the complete list of fragments tested provided in the supporting information. This list will allow other researchers to easily assemble their own screening set and test its utility. And if it proves useful, perhaps it will one day be sold by one of the commercial suppliers of fragments.
21 June 2009
Alexander Shekhtman and colleagues at SUNY Albany have developed a method they call “screening of small molecule interactor library by using in-cell NMR”, or SMILI-NMR. The process starts by overexpressing two proteins within cells (E. coli, in this case). If the proteins are sequentially expressed, one of them can be selectively labeled with NMR-active isotopes. To test their system, the researchers overexpressed the model proteins FKBP and FRB. These proteins interact only weakly by themselves, but in the presence of the small molecule rapamycin they form a high affinity complex. By performing NMR on the cells, the researchers could observe changes in NMR peaks corresponding to formation of the ternary complex inside the cells when rapamycin was added. They could also do competition studies: adding the small molecule ascomycin to this complex causes a change in the NMR peaks corresponding to the rapamycin being competed away by the ascomycin.
The next step was to look for new molecules that would modulate the interaction between FKBP and FRB, and the researchers chose a library of 289 dipeptides, which are actively transported into cells. The dipeptides were mostly fragment-sized, ranging from a low molecular weight of 132 (Gly-Gly) to a high of 390 (Trp-Trp). The dipeptides were screened in pools (organized in a matrix) and then deconvoluted to identify the most active molecules. Interestingly, none of the molecules caused discrete changes to the NMR spectra as observed with rapamycin or ascomycin, but several caused some of the NMR peaks to disappear and the remaining peaks to broaden dramatically. The most potent compound was Ala-Glu (MW 218), which caused this phenomenon at 5 mM concentration. The authors interpret this effect as being caused by the formation of a large complex consisting of many molecules of FKBP, FRB, and Ala-Glu. Interestingly, although ascomycin could reverse the effect of Ala-Glu, rapamycin could not.
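The molecular weights quoted above are easy to reproduce: a dipeptide weighs the sum of its two free amino acids minus one water lost in forming the peptide bond. Here is a minimal sketch, assuming standard average molecular weights for the free amino acids (rounded to two decimals); the table covers only the residues mentioned in this post, and the names in the code are my own.

```python
# Average molecular weights (g/mol) of the free amino acids, rounded.
FREE_AMINO_ACID_MW = {
    "Gly": 75.07,
    "Ala": 89.09,
    "Glu": 147.13,
    "Trp": 204.23,
}
WATER_MW = 18.015  # one water is lost per peptide bond


def dipeptide_mw(first, second):
    """Molecular weight of the linear dipeptide first-second."""
    return FREE_AMINO_ACID_MW[first] + FREE_AMINO_ACID_MW[second] - WATER_MW


if __name__ == "__main__":
    for pair in (("Gly", "Gly"), ("Ala", "Glu"), ("Trp", "Trp")):
        print("-".join(pair), round(dipeptide_mw(*pair)))
```

Rounded to the nearest integer, this gives 132 for Gly-Gly, 218 for Ala-Glu, and 390 for Trp-Trp, matching the values in the text.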
The dipeptide Ala-Glu also behaved similarly to rapamycin in yeast cells: both molecules prevented the growth of yeast expressing FKBP, while having no effect on yeast lacking FKBP. This was attributed to both molecules facilitating complex formation between FKBP and FRB within yeast.
The Ala-Glu “fragment” has some issues (ClogP = -4, for example); it would be interesting to see how some of the original FKBP fragments discovered at Abbott behave in this assay. And although not everyone has access to a 700 MHz NMR with a cryoprobe, this is an intriguing approach for studying protein-protein interactions in a very biologically relevant milieu.
14 June 2009
In a previous post about heterocycles that appear chemically feasible but have not been reported, we wondered whether these molecules would show biological activity. The structure of biologically relevant chemical space – that fraction of possible molecules that will exhibit some biological effect – is of great interest, but as yet unknown. Brian Shoichet and coworkers at UCSF have just published a thought-provoking analysis in Nature Chemical Biology that is also relevant to developing new fragment libraries.
The researchers ask why it is that HTS collections of a million or so compounds, vanishingly small in comparison to the roughly 1,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000 possible small drug-sized molecules, nonetheless so often succeed in identifying hits. A lovely paper by Tobias Fink and Jean-Louis Reymond had previously computationally enumerated all possible compounds with up to 11 C, N, O, and F atoms. Of these 26,429,328 molecules, 25,810 are commercially available.
Shoichet and colleagues compared the structures of these compounds with the structures of metabolites and natural products (all of which have by definition been processed by at least one protein) and found that the commercially available compounds were much more similar to natural products and metabolites than were non-commercially available compounds. Indeed, the more similar a molecule is to a known natural product or metabolite, the more likely that it is available for purchase; 2918 of the commercially available compounds are in fact natural products or metabolites.
The bias also increases exponentially with molecular size: a random 11-atom commercial compound is almost 1000 times more likely to resemble a natural product or metabolite than is a non-commercially available molecule, whereas the bias is only about 2-fold for 6-atom molecules. Similar results were observed with other libraries.
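To put the counts quoted above in perspective, the short sketch below computes two ratios from them: the share of the enumerated molecules that can actually be purchased, and the share of those purchasable compounds that are themselves natural products or metabolites. The variable names are mine; the numbers are the ones given in the text.

```python
# Counts quoted in the post above.
ENUMERATED = 26_429_328      # molecules with up to 11 C, N, O, and F atoms
COMMERCIAL = 25_810          # of those, commercially available
NATURAL_OR_METABOLITE = 2_918  # commercial compounds that are natural products or metabolites

commercial_fraction = COMMERCIAL / ENUMERATED
natural_fraction = NATURAL_OR_METABOLITE / COMMERCIAL

if __name__ == "__main__":
    print(f"Commercially available: {commercial_fraction:.3%}")
    print(f"About 1 in {round(1 / commercial_fraction):,} enumerated molecules")
    print(f"Natural products or metabolites among commercial: {natural_fraction:.1%}")
```

In other words, only about one enumerated molecule in a thousand is for sale, yet roughly one in nine of those is itself a natural product or metabolite.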
The authors conclude that: A major reason why the screening of synthetic compounds ever finds notable hits is that our libraries are biased toward the sort of molecules that proteins have evolved to recognize.
This resemblance is reasonable. After all, most commercially available compounds are ultimately derived from naturally occurring starting materials, so their similarity to natural products isn’t surprising. Moreover, historically much of chemistry was devoted to natural product synthesis, so many of the intermediates built up over the years resemble natural products. And of course, once you learn how to do chemistry on one moiety, you will tend to stick with it unless you have a good reason to do otherwise; each heterocycle behaves (often frustratingly) differently, so if a natural-product-like molecule does the job, why look for trouble?
But does this “biogenic bias” mean that the rest of chemical space is a biological desert? Not necessarily. I can imagine at least two alternative models of chemical-biological diversity space.
Let’s call one model “lamp posts in dark fields.” Consider a vast field of some crop that can only be harvested by night. There are lights scattered haphazardly throughout the field. One might expect that the crops immediately under the lamp posts would be harvested more intensively than crops in darker parts of the field, even if other areas are equally productive. In this scenario, the lamp posts reveal natural products and similar molecules, but much – or even most – of (unlit) chemical space may also be biologically active; it just hasn’t been sampled yet.
Another possibility is the “oil-field model.” As we are all too aware, petroleum is distributed very unevenly across the globe. In some areas, such as Texas, oil was easy to find and easy to extract. In others, such as the deep ocean or the high arctic, oil is harder to find and more technically demanding to access. In this scenario, there are vast pockets of chemical space that are relevant to biology; they just haven’t been identified (let alone accessed) yet.
These are fun, speculative questions, but the paper provides some practical data. Specifically, 83% of core ring scaffolds found in natural products are absent from commercial libraries. In fragment or lead-sized molecules of MW < 350 with fewer than three stereocenters, 1891 ring scaffolds found in natural products are not commercially available. These could be useful additions to fragment libraries, and the paper lists 18 examples.
In fact, at least one company, deCODE, is explicitly enriching its fragment collection with molecules based on natural products and metabolites. This might be a good strategy. After all, even if the “lamp posts in dark fields” model is correct, there are plenty of brightly illuminated, unharvested chemotypes. At least for now, picking these may be more productive than venturing into the twilight-zone of uncharted chemical space.
11 June 2009
I was also under the impression that industrial post-docs were largely a thing of the past. So, two questions: 1. Are there academic programs or labs that are specifically training NMR-FBDD people, or FBDD practitioners more generally? 2. If not, are jobs like these a way for management to dip a toe in the water without actually resourcing FBDD efforts?
02 June 2009
Researchers at UCB Celltech have recently taken this to an interesting extreme: they’ve computationally enumerated all neutral mono- and bicyclic five- and six-membered heteroaromatic rings containing carbon, nitrogen, oxygen, sulfur, and hydrogen. The resulting VEHICLe (virtual exploratory heterocyclic library) is a set of 24,847 ring systems, of which only 1701 have been reported.
Of course, as the authors note, many of the remaining molecules “are outlandish and would obviously be either very difficult or impossible to make.” To address this, they used a machine learning approach to gauge synthetic tractability. This resulted in over 3000 molecules, some of which look quite reasonable:
Interestingly, the researchers estimate that only 5 to 10 of these heterocycles are being made each year, which leaves hundreds of virgin synthetic targets.
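For a sense of scale, the sketch below works out what fraction of VEHICLe has been reported and how long the tractable set would take to make at the estimated pace. This is deliberately crude: it treats "over 3000" as exactly 3000 (so a lower bound), assumes none of those ring systems has been made yet, and assumes the rate of 5 to 10 per year stays constant. The names in the code are my own.

```python
# Figures quoted in the post above.
TOTAL_RING_SYSTEMS = 24_847
REPORTED = 1_701
TRACTABLE = 3_000  # "over 3000", so treat as a lower bound

reported_fraction = REPORTED / TOTAL_RING_SYSTEMS


def years_to_make(n_targets, per_year):
    """Years needed to make n_targets at a constant rate of per_year."""
    return n_targets / per_year


if __name__ == "__main__":
    print(f"Reported so far: {reported_fraction:.1%}")
    for rate in (10, 5):
        print(f"At {rate} per year: at least {years_to_make(TRACTABLE, rate):.0f} years")
```

Under those assumptions, fewer than 7% of the enumerated ring systems have been reported, and working through the tractable ones would take somewhere between 300 and 600 years.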
Are these a rich source of new fragments? Or, as the authors also speculate, do many of these lie outside biological activity space?