09 July 2018

Practical Fragments turns ten, and celebrates with a poll on the modern fragment library

Ten years ago today Teddy launched Practical Fragments with a simple question about screening methodologies. More than 660 posts later we've returned to that topic several times, most recently in 2016. But before you can start screening you need a fragment library, which is the subject of our new poll.

Back in 2012 we asked readers the maximum size (in terms of "heavy", or non-hydrogen atoms) they would consider for fragments in their library. The results were mostly consistent with the Rule of 3, so beloved by Teddy that he compared it to a powerful wizard.

There has since been a trend toward smaller fragments, driven in part by empirical findings that smaller fragments have better hit rates, in agreement with molecular complexity theory.

At some point, though, ever smaller fragments will mean lower hit rates: fragments that are too small will bind so weakly they will be difficult to detect. And practical issues arise: organic molecules with just a few non-hydrogen atoms are often volatile.

Therefore, we’re revisiting this question: What is the smallest fragment you would put in your library?

As long as we're on the subject of libraries, how many fragments do you have in your primary screening library, or how many do you screen on a regular basis?

Please vote on the right-hand side of the page. If you have multiple fragment libraries (for example one for crystallographic screening and one for biochemical screening) you can respond for each library; you will need to press "vote" after each answer. Please feel free to leave comments too.

Thanks to all of you for making Practical Fragments a success and for your comments over the years – looking forward to the next decade!


Peter Kenny said...

Hi Dan,

Congratulations on the 10 year anniversary of Practical Fragments.

I think the molecular size distribution of a fragment library is more relevant than upper and lower limits for molecular size. The original SAR by NMR study reported a mean fragment MW of 213 Da. I recall the corresponding figure for the 1998 Zeneca NMR library as being just under 200 Da. The smallest fragment that I can recall putting into a fragment library is 4-methyl-2-aminopyridine.

It’s a lot more difficult to say what an ideal molecular size distribution should look like for a generic fragment screening library. The choice of algorithm for sampling chemical space after molecular size cutoffs have been applied can influence the molecular size distribution of the library.

I know those clever chaps in Cambridge defined fragment-likeness with exquisite precision using the seminal rule of 3. However, it would have been nice if they’d been a bit more forthcoming about the analysis that they said they’d done.

Angelo said...

I agree with Peter,

The choice of the algorithm/similarity coefficient used for example in a diversity selection will affect the MW distribution of the final set.

Differences in the size of molecules bias the evaluation of fingerprint similarity, which is usually used to explore chemical space. Bigger, more complex compounds tend to produce fingerprints with higher bit density than smaller molecules, which leads to artificially high similarity values in similarity searches. This means that using the Tanimoto coefficient in a diversity selection will bias the selection towards smaller molecules, as they will look more dissimilar. The final diversity set will have a MW distribution shifted towards low MW compared with the parent database. There are ways to avoid this asymmetry.
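A toy numerical sketch of this bias (pure Python; randomly generated bit sets stand in for fingerprints of structurally unrelated molecules, not any specific fingerprint scheme): chance bit collisions alone make the dense fingerprints look more alike, so a Tanimoto-based diversity picker will tend to favour the sparse, small-molecule ones.

```python
import random

def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def random_fp(n_bits_on, fp_len, rng):
    """Fingerprint of an 'unrelated' molecule: n_bits_on random bits set."""
    return set(rng.sample(range(fp_len), n_bits_on))

rng = random.Random(42)
FP_LEN = 1024

# Dense fingerprints (bigger, more complex molecules) vs sparse fingerprints
# (smaller molecules), all pairs structurally unrelated by construction.
dense = [random_fp(300, FP_LEN, rng) for _ in range(20)]
sparse = [random_fp(30, FP_LEN, rng) for _ in range(20)]

def mean_pairwise(fps):
    """Mean Tanimoto similarity over all pairs in a list of fingerprints."""
    sims = [tanimoto(fps[i], fps[j])
            for i in range(len(fps)) for j in range(i + 1, len(fps))]
    return sum(sims) / len(sims)

# The dense set scores well above the sparse one despite neither set
# containing any genuinely related pairs.
print(f"mean similarity, dense:  {mean_pairwise(dense):.3f}")
print(f"mean similarity, sparse: {mean_pairwise(sparse):.3f}")
```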

Glyn Williams said...

Many congratulations to Dan and Teddy for driving 10 years of discussion - sometimes passionate, often entertaining and always useful. The FBDD community are indebted to the blog and its contributors.

Being more of an experimental scientist, my view is that the property distribution of your fragment library (including size) should be tailored to the techniques that you use to detect and validate hits. Techniques that require higher potency will generate hits with (on average) higher molecular weight and the screening library should reflect that. If that technique is also able to screen large numbers of fragments, then this will more strongly bias the library distribution to larger molecules. If you also (or exclusively) use a more sensitive or robust screening method - perhaps one like crystallography which has a lower throughput but can detect very weak hits by screening at very high concentrations - then the library should contain a high proportion of smaller fragments. The latter have the obvious advantage that they will explore more chemical space.

Astex (those clever chaps in Cambridge) have certainly shown the MW data for their library at many conferences and, in 2015, the distributions of fragments screened and their X-ray-validated fragment hits both had peaks between 11 and 13 non-H atoms, with an average MW of around 175 Da. However, this is not the optimum distribution for all combinations of screening methods, and it is also subject to change as methods improve and additional chemical space is explored synthetically.

Peter Kenny said...

Hi Angelo,

We used two diversity-based selection tools (both created by Dave Cosgrove in the mid-nineties) for the Zeneca NMR library and these were used with Daylight fingerprints. BigPicker, which had been created for product-based (as opposed to reagent-based) synthetic library design, defined the diversity of a selection of compounds by the Tanimoto distance between the two closest (i.e. most similar) compounds in the selection. BigPicker tended to pick singletons (often chemically unattractive and definitely bad for coverage) although there were ways to deal with this. One key feature of BigPicker was the ability to bias compound selection away from what had already been selected and this capability provided the inspiration for the Core and Layer approach to design of compound libraries.
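As a rough illustration (an assumption about the approach described above, not Dave Cosgrove's actual code), a greedy MaxMin-style picker that defines diversity by the distance to the closest already-selected compound, and that can be biased away from a prior selection, might look like this:

```python
def tanimoto_distance(a, b):
    """1 - Tanimoto similarity for fingerprints given as sets of on-bits."""
    union = len(a | b)
    return 1.0 - (len(a & b) / union if union else 0.0)

def maxmin_pick(fps, k, preselected=()):
    """Greedy MaxMin diversity selection.

    Repeatedly picks the compound whose nearest neighbour among everything
    already chosen is as distant as possible. Passing fingerprints of a
    prior selection via 'preselected' biases picks away from it, in the
    spirit of the Core and Layer idea mentioned above.
    """
    chosen = []
    anchors = list(preselected)          # fingerprints picks are compared to
    remaining = list(range(len(fps)))
    if not anchors:
        # No prior selection: seed with the first compound.
        chosen.append(remaining.pop(0))
        anchors.append(fps[chosen[0]])
    while remaining and len(chosen) < k:
        best = max(remaining,
                   key=lambda i: min(tanimoto_distance(fps[i], a)
                                     for a in anchors))
        chosen.append(best)
        anchors.append(fps[best])
        remaining.remove(best)
    return chosen
```

For example, with two close analogs and two unrelated compounds, the picker jumps to an unrelated compound for its second pick, and supplying a prior selection steers the first pick away from it.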

We also used Flush (which enables a number of fingerprint-based calculations) in library design. One of these is what used to be called cluster-sampling (I admit this is a misnomer), which selects compounds based on the number of neighbors within a given similarity threshold. Flush tends to select structures with low bit densities to represent groups of analogs, and these tend to be the most structurally prototypical compounds (good for coverage).
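The neighbor-count selection described above could be sketched as follows (again a guess at the algorithm from the description, not Flush itself): each pick is the compound with the most neighbors above the similarity threshold, and it then stands in for those neighbors.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def neighbour_count_pick(fps, sim_threshold):
    """Leader-like representative selection by neighbour count.

    Repeatedly select the compound with the most neighbours at or above
    the similarity threshold, then drop it and its neighbours from further
    consideration, so each pick represents a group of analogs.
    """
    remaining = set(range(len(fps)))
    picks = []
    while remaining:
        # Neighbour sets within the current pool.
        nbrs = {i: {j for j in remaining
                    if j != i and tanimoto(fps[i], fps[j]) >= sim_threshold}
                for i in remaining}
        # sorted() makes tie-breaking deterministic (lowest index wins).
        rep = max(sorted(remaining), key=lambda i: len(nbrs[i]))
        picks.append(rep)
        remaining -= nbrs[rep] | {rep}
    return picks
```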

Flush and BigPicker were discussed in the context of fragment library design in this article

Peter Kenny said...

Hi Glyn,

I certainly agree that there are advantages to tailoring compound selections to the technique used to detect and quantify binding. With respect to molecular size it can be helpful to draw a distinction between mass (e.g. molecular weight) and bulk (e.g. molecular volume). The ring currents associated with aromatic rings can be helpful when selecting compounds for screening by protein-detected NMR (I concede that these might cause the Sages of Stevenage to spit a few feathers), and heavier halogens can be useful when using SPR. I don’t have any specific experience with crystallographic detection of fragments, and what you suggest is in line with what I’d expect. As an aside, I recall that the difficulties some of my colleagues had experienced in getting structures for fragments known to bind were invoked as a rationale for using crystallographic detection. I recall thinking, “maybe not…”

Lipophilicity is the other property that needs to be considered when selecting fragments for screening and I believe that an absolute cutoff of 3 is too restrictive. Although simply scaling the rule of 5 molecular weight and logP cutoffs by 60% might seem obvious, this assumes a correspondence between the distributions for the two quantities in the chemical space from which compound selections are made. Although Astex have disclosed average values of molecular size metrics for their screening library, they didn’t disclose any details of the analysis that they claimed to have done in connection with the rule of 3. I criticized aspects of Ro3 in this blog post

Glyn Williams said...

Hi Peter,

In my day, Astex had quite a variable translation rate from 'biophysical hit' (mostly NMR but sometimes also thermal shift) to validated X-ray hit (i.e. soaked and structurally characterised). Since the commitment of internal chemistry to hit elaboration required structural data, many biophysical hits could not contribute to the design and progression of validated hits into leads.

Some biophysical hits were certainly binding in non-crystallographically accessible sites, but the most obvious issue with many of the others was their solubility, or their solubility-to-binding-potency ratio. For ligand-detected NMR it is possible to detect binders at concentrations well below their Kd values. Only a small fraction of the ligand needs to be complexed (~1%), and the sensitivity of the method is determined by the protein (not ligand) concentration. For crystallographic detection, the ligand concentration in the crystal must exceed the protein-ligand Kd, often by several-fold.
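The arithmetic behind this contrast is the standard binding equilibrium; a quick sketch (illustrative concentrations only, using the usual excess-component approximations):

```python
def fraction_protein_bound(ligand_conc, kd):
    """Equilibrium protein occupancy [PL]/[P]total under excess ligand:
    f = [L] / ([L] + Kd)."""
    return ligand_conc / (ligand_conc + kd)

def fraction_ligand_bound(protein_conc, kd):
    """Approximate fraction of ligand complexed when ligand is in large
    excess over protein: roughly [P] / ([P] + Kd)."""
    return protein_conc / (protein_conc + kd)

# Crystallography: to occupy ~80% of sites, the ligand must be ~4x Kd,
# so a 1 mM Kd fragment needs several mM in the soak (concentrations in mM).
print(fraction_protein_bound(4.0, 1.0))

# Ligand-detected NMR: with 10 uM protein and a 1 mM Kd fragment, only
# about 1% of the ligand is complexed, yet that is enough to detect.
print(fraction_ligand_bound(0.010, 1.0))
```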

The best strategy to improve translation was to improve the solubility of the library by setting a target aqueous solubility (5 mM) for all new members and removing poorly soluble members that had never been validated as hits.

Other properties have also been analysed, but the most important read-out for lipophilicity is probably not the number or proportion of hits, but how many could be progressed to leads. That sort of analysis is very project and company specific and might be considered sensitive.

Ro3 did not feature heavily in Astex discussions when I left. While the 3-commandments were still being observed, the day-to-day selection of fragments relied more on principles learned from the analysis of hits, distilled into 'binding pharmacophores'. This analysis has also been presented at conferences, but, since it relies on a lot of experimental data (some obtained with collaborators), its publication is likely to be more problematic. Hope that is helpful.