Those who have followed the drug discovery literature over the last decade or so will have become aware of a publication genre that can be described as ‘retrospective data analysis of large proprietary data sets’ or, more succinctly, as ‘Ro5 envy’.
Although data analysts frequently tout the statistical significance of the trends their analyses have revealed, weak trends can be statistically significant without being remotely interesting.
This is especially likely to occur when data are “binned” into a smaller number of categories before being analyzed, thereby hiding variation and making correlations appear stronger than they really are. Since many published analyses use proprietary, unavailable data, Kenny and Montanari constructed model “noisy” data sets and looked for correlations in the primary data and the binned data. They found that correlations in the binned data were inflated. Perhaps counter-intuitively, the effect actually gets more pronounced the larger the data set.
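Below is a minimal Python sketch of this kind of experiment. It is not Kenny and Montanari's code. The linear model, slope, noise level, uniform descriptor range, and the choice of five equal-count bins are all assumptions made for illustration, and the function names are hypothetical. The sketch computes Pearson r on the raw points and again on the per-bin means.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def make_noisy_data(n, slope=0.2, noise_sd=1.0, seed=0):
    """Hypothetical model data set: a weak linear trend buried in Gaussian noise.

    The slope, noise level and x range are illustrative assumptions,
    not values taken from the paper.
    """
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 5.0) for _ in range(n)]
    ys = [slope * x + rng.gauss(0.0, noise_sd) for x in xs]
    return xs, ys


def bin_means(xs, ys, n_bins=5):
    """Sort points by x, split them into n_bins equal-count bins,
    and return the mean x and mean y of each bin (assumed binning scheme)."""
    pairs = sorted(zip(xs, ys))
    size = len(pairs) // n_bins
    mean_x, mean_y = [], []
    for i in range(n_bins):
        # The last bin absorbs any leftover points.
        end = (i + 1) * size if i < n_bins - 1 else len(pairs)
        chunk = pairs[i * size:end]
        mean_x.append(sum(p[0] for p in chunk) / len(chunk))
        mean_y.append(sum(p[1] for p in chunk) / len(chunk))
    return mean_x, mean_y


if __name__ == "__main__":
    for n in (100, 1000, 10000):
        xs, ys = make_noisy_data(n)
        bx, by = bin_means(xs, ys)
        print(f"n={n:6d}  r(raw)={pearson_r(xs, ys):.3f}  "
              f"r(binned)={pearson_r(bx, by):.3f}")
```

Each bin mean averages away more of the noise as the bins fill up. So under these assumptions the binned r should drift toward 1 as n grows, while the raw r stays roughly constant near the weak underlying value. That pattern is consistent with the size effect described above.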
Having described the problem, Kenny and Montanari go on to question some recent high-profile papers that correlate, for example, lipophilicity with pharmacological promiscuity, or the fraction of sp3-hybridized carbons (Fsp3) with solubility. In the latter case, all the data were publicly available, and a reanalysis using the primary data rather than the binned data caused the correlation coefficient (r) to drop from 0.972 to 0.247!
Graphical representation of data comes under heavy scrutiny too. In particular, the common practice of subdividing data points into small numbers of categories (often red, yellow, and green) can make these categories appear discrete when the underlying data are better described as a continuum.
The overall message is that weak correlations may lead to misguided strategies:
To restrict values of properties such as lipophilicity more stringently than is justified by trends in the data is to deny one’s own drug-hunting teams room to maneuver while yielding the initiative to hungrier, more agile competitors.
There is something to this, though acting on it is not without risk. As the old saying goes, nobody gets fired for buying IBM. Most drug discovery efforts fail, but if you fail making conventional compounds, you’re less likely to come under fire than if you fail by doing something outside the accepted norm.
But whatever you do, it’s worth remembering:
The human liver remains an effective antidote to the hubris of the drug designer.