One nice thing about being a consultant is that I get paid to think about things for people (sometimes). One of the things I have been thinking about lately (on the clock) is the optimal size of fragment pools. I started wondering whether there can be too many fragments in a pool:
You know how your mother always said, don't eat too much or it will make you sick? I never believed her until my own child was allowed to eat as much Easter candy as possible and it actually made him sick. [It was part of a great experiment to see how much "Mother Wisdom" was true, like Snopes.] I have been working lately on library (re-)optimization, and one thing that keeps coming up is how many fragments should go in a pool. As pointed out here and discussed here, there are ways to optimize pools for NMR (and I assume the same approach can be taken for MS). We have always assumed that the more fragments in a pool, the better off you are, and of course the more efficient the screen.
But is that true? Is there data to back this up? Probably not; I don't know that anyone wants to run the control. So, let's do the gedanken. If you have 50 compounds in a pool (a nice round number and easy to do math with, so it's my kind of number), you would expect a "ligandable" target to have a 3-5% hit rate. That means out of that pool you would expect 1.5-2.5 fragments to hit. Say two fragments hit: if they bind at the same site, they compete, and your readout signal would be reduced by 50%. So, if you are already having trouble with signal, you are going to have more trouble. Also, can you be sure that negatives are real, or did they "miss" because of lowered signal due to the competition? And what if one of the hits is very strong? Also, how do you rank-order the hits? Do you scale the signal by the number of hits in the pool?
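The arithmetic in the gedanken is easy to check numerically. Here is a quick sketch (my own, not from the post), assuming hits are independent and the hit rate applies per fragment:

```python
def p_multiple_hits(n, p):
    """Probability that a pool of n fragments contains 2 or more hits,
    assuming independent hits with per-fragment hit rate p (binomial)."""
    p0 = (1 - p) ** n                    # no hits
    p1 = n * p * (1 - p) ** (n - 1)      # exactly one hit
    return 1 - p0 - p1

# Expected hits in a pool of 50 at a 3-5% hit rate: about 1.5 and 2.5
print(50 * 0.03, 50 * 0.05)

# Chance that a 50-compound pool holds competing (multiple) hits
print(p_multiple_hits(50, 0.03))   # roughly 44%
print(p_multiple_hits(50, 0.05))   # roughly 72%
```

So even at the low end of "ligandable," nearly half of 50-compound pools would contain more than one hit.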
I then reached out to the smart people I know who tend to be thinking about the same things I do, but in far greater depth. I spoke to Chris at FBLD, and he was putting together a large 19F library, aiming to get up to 40 or more 19F fragments in a pool. Well, Chris Lepre at Vertex was already thinking about this exact problem. He shared his thoughts with me and agreed to let me share them here (edited slightly for clarity).
To accurately calculate the likelihood of multiple hits in a pool, I [Ed: Chris] used the binomial distribution. For your hypothetical pools of 50 and a 3% hit rate, 44% of the samples will have multiple hits (25.6% with 2 hits, 12.6% with 3, 4.6% with 4); at a 5% hit rate this increases to 71% (26.1% = 2 hits, 22% = 3, 13.6% = 4, 6.6% = 5, 2.6% = 6). So, the problem of competition is very real. It's not practical to deconvolute all mixtures containing hits to find the false negatives: the total number of experiments needed to screen and deconvolute is at a minimum when the mixture contains approximately 1/(hit rate)^0.5 compounds (i.e., for a 5% hit rate, mixtures of 5 are optimal). [Ed: Emphasis mine!]

Then there's the problem of chemical reactions between components in the mixture. Even after carefully separating acids from bases and nucleophiles from electrophiles in mixtures of 10, Mike Hann (JCICS 1999) found that 9% of them showed evidence of reactions after storage in DMSO. This implies a reaction probability of 5.2%, which, if extended to the 50 pool example, would lead one to expect reactions in 70% of those mixtures. If this seems extreme, keep in mind that the number of possible pairwise interactions is npairs = n(n-1)/2, where n = the number of compounds in the pool. So, a mixture of 10 has 45 possible interactions, while a mixture of 50 has 1,225. Even with mixtures of only five, I've seen a fair number of reacted and precipitated samples. Kind of makes you wonder what's really going on when people screen mixtures of 100 (4,950 pairs!) by HSQC. [Ed: I have also seen this, as I am sure other people have. I think people tend to forget about the activity of water. For those who hated that part of PChem, here is a review. Some fragment pools could be 10% DMSO in the final pool, and are probably much higher in intermediate steps.]

Finally, there's the problem of chemical shift dispersion. Even though 19F shifts span a very large range and there are typically only one or two resonances per compound, the regions of the spectrum corresponding to aromatic C-F and CF3 become quite crowded. And since the 19F shifts are relatively sensitive to small differences in % DMSO, buffer conditions, etc., it's necessary to separate them by more than would be necessary for 1H NMR. Add to that the need to avoid combining potentially reactive compounds (a la Hann), and the problem of designing non-overlapping mixtures becomes quite difficult. [Ed: They found that Monte Carlo methods failed them.]

I've looked at pools as large as 50, but at this point it looks like I'll be using fewer than 20 per pool. I'm willing to sacrifice efficiency in exchange for avoiding problems with competition and cross-reactions. The way I see it, fragment libraries are so small that each false negative means potentially missing an entire lead series, and sorting out the crap formed by a cross-reaction is a huge time sink (in principle, the latter could yield new, potentially useful compounds, but in practice it never seems to work out that way). The throughput of the 19F NMR method is so high and the number of good F-fragments so low that the screen will run quickly anyway.
Screening capacity is not a problem, so there's not really much benefit in being able to complete the screen within 24 hrs vs. a few days.
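[Ed: Chris's emphasized result, that the optimal mixture size is about 1/(hit rate)^0.5, falls out of a simple cost model. This is my sketch of the reasoning, not his actual derivation: screening in pools of m costs 1/m experiments per fragment, and every pool that shows a hit gets fully deconvoluted, one compound at a time.]

```python
def experiments_per_fragment(m, p):
    """Toy cost model: 1/m pooled experiments per fragment, plus full
    deconvolution (m singles) of any pool that contains at least one hit,
    which happens with probability 1 - (1-p)**m per pool."""
    return 1 / m + (1 - (1 - p) ** m)

def n_pairs(n):
    """Number of distinct pairwise interactions among n compounds."""
    return n * (n - 1) // 2

p = 0.05
best = min(range(1, 51), key=lambda m: experiments_per_fragment(m, p))
print(best)   # 5, matching "for a 5% hit rate, mixtures of 5 are optimal"

# Hann-style pairwise-interaction counts
print(n_pairs(10), n_pairs(50), n_pairs(100))   # 45, 1225, 4950
```

[Ed: For small p, the cost is roughly 1/m + p*m, which is minimized at m = 1/sqrt(p), i.e., about 4.5 for a 5% hit rate.]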
The most common pool size (from one of our polls) was 10 fragments/pool. By Chris's rule, that would be optimal for an expected hit rate of 1% or less. So either people are expecting a particularly low hit rate, or they are probably putting too many fragments in a pool. Is there an optimal pool size, then? I would think that there is: somewhere between 10 and 20 fragments. You are looking for a pool size that maximizes efficiency, but you don't want so many fragments that you also raise the possibility of competition.
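Turning the 1/(hit rate)^0.5 rule around gives the hit rate a given pool size is implicitly tuned for. A small sketch (my inversion, not from the post):

```python
def implied_hit_rate(m):
    """If m ~ 1/sqrt(p) is the efficient pool size, the hit rate that
    pool size is tuned for is p ~ 1/m**2."""
    return 1 / m ** 2

for m in (5, 10, 20):
    print(m, implied_hit_rate(m))   # 5 -> 4%, 10 -> 1%, 20 -> 0.25%
```

A pool of 10 is efficient only if you expect about a 1% hit rate; a pool of 20 implies an expected hit rate of a quarter of a percent.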