16 October 2023

Spacial Scores: new metrics for measuring molecular complexity

Molecular complexity is one of the theoretical underpinnings for fragment-based drug discovery. Mike Hann and colleagues proposed two decades ago that very simple molecules may not have enough features to bind tightly to any proteins, whereas highly functionalized molecules may have extraneous spinach that keeps them from binding to any proteins. Fragments, being small and thus less complex, are in a sweet spot: just complex enough.
 
But what does it mean for one molecule to be more complex than another? Most chemists would agree that pyridine is more complex than methane, but is it more complex than benzene? To decide, you need a numerical metric, and there are plenty to choose from. The problem, as we discussed in 2017, is that they don’t correlate with one another, so it is not clear which one(s) to choose. In a new (open access) J. Med. Chem. paper, Adrian Krzyzanowski, Herbert Waldmann and colleagues at the Max Planck Institute Dortmund have provided another. (Derek Lowe also recently covered this paper.)
 
The researchers propose the Spacial Score, or SPS. This is calculated from four parameters for each atom in a given molecule. The term h depends on atom hybridization: 1 for sp-, 2 for sp2-, and 3 for sp3-hybridized atoms, and 4 for all others. Stereogenic centers are assigned an s value of 2, while all other atoms are assigned a value of 1. Atoms that are part of non-aromatic rings are assigned an r value of 2; those that are part of an aromatic ring or a linear chain are set to 1. Finally, the n score is set to the number of heavy-atom neighbors.
 
For each atom in a molecule, h is multiplied by s, r, and n². The SPS is calculated by summing the individual scores for all the atoms in a molecule. Because there is no upper limit, and because it is nice to be able to compare molecules of different sizes, the researchers also define the nSPS, or normalized SPS, which is simply the SPS divided by the number of non-hydrogen atoms in the molecule. Although SPS can be calculated manually, the process is tedious, and the researchers have kindly provided code to automate it. Having defined SPS, the researchers compare it to other molecular complexity metrics, including the simple fraction of sp3 carbons in a molecule, Fsp3, which we wrote about in 2009.
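To make the arithmetic concrete, here is a minimal sketch of the scoring scheme described above (not the authors' published code; the per-atom parameters are assigned by hand rather than perceived automatically from a structure):

```python
def sps(atoms):
    """Spacial Score: sum of per-atom terms h * s * r * n**2.

    atoms: one (h, s, r, n) tuple per heavy atom, where
      h = hybridization term (1 sp, 2 sp2, 3 sp3, 4 other)
      s = 2 for stereogenic centers, else 1
      r = 2 for non-aromatic ring atoms, else 1
      n = number of heavy-atom neighbors
    """
    return sum(h * s * r * n ** 2 for h, s, r, n in atoms)

def nsps(atoms):
    """Normalized SPS: SPS divided by heavy-atom count."""
    return sps(atoms) / len(atoms)

# Benzene: six aromatic sp2 carbons (h=2, s=1, r=1, n=2).
benzene = [(2, 1, 1, 2)] * 6
print(sps(benzene), nsps(benzene))  # 48 8.0
```

Note that because the parameters ignore element identity, pyridine's six heavy atoms get exactly the same tuples, so it scores identically to benzene.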
 
The researchers next calculated nSPS for four sets of molecules including drugs, a screening library from Enamine, natural products, and so-called “dark chemical matter,” library compounds that have not hit in numerous screens. The results are equivocal. For example, the nSPS for dark chemical matter is very similar to that for drugs. On the other hand, natural products tend to have higher nSPS scores than drugs, as expected. Interestingly, the average nSPS score for compounds in the GDB-17 database, consisting of theoretical molecules having up to 17 atoms, is also quite high.
 
The researchers assessed whether nSPS correlated with biological properties, and found that compounds with lower nSPS tended to hit more proteins, albeit with lower potencies, as predicted by theory. That said, this analysis was based on binning compounds into a small number of categories, and as Pete Kenny has repeatedly warned, this can lead to spurious trends.
 
The same issue of J. Med. Chem. carries an analysis of the paper by Tudor Oprea and Cristian Bologa, both at the University of New Mexico. This contextualizes the work and confirms that drugs do not seem to be getting more complex over time, as measured by nSPS. This may seem odd, though Oprea and Bologa note that by “normalizing” for size, nSPS misses the increasing molecular weight of drugs.
 
This observation also raises other questions, such as the fact that SPS explicitly ignores element identity. Coming back to benzene and pyridine: both have identical SPS and nSPS values, which does not seem chemically intuitive. One could quibble more: why square the value of n in the calculation of SPS? Why allow s to take only the values 1 and 2, as opposed to, say, 1 and 5?
 
In the end I did enjoy reading this paper, and I do think having some metric of molecular complexity might be valuable. I’m just not sure where SPS will fit in with all the existing and conflicting metrics, and how such metrics can lead to practical applications.

5 comments:

Wim said...

Because complexity is a very subjective thing, it makes sense that a lot of things about this specific score are quite ad hoc too. In a way, this score is like a weighted version of the Zagreb M1 topological index, which is simply the sum of each heavy atom's degree squared. In my opinion, the main strength of the result here is that "normalizing" (dividing the score by atom count) makes the score independent of molecule size, though that is a kind of obvious consequence: nSPS is actually just the average of the per-atom SPS.
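For reference, the Zagreb M1 index is trivial to compute from the degree sequence of the heavy-atom graph (a quick illustrative snippet with hand-assigned degrees):

```python
def zagreb_m1(degrees):
    """First Zagreb index: sum of squared heavy-atom degrees."""
    return sum(d * d for d in degrees)

# n-butane (degrees 1, 2, 2, 1) vs isobutane (3, 1, 1, 1):
print(zagreb_m1([1, 2, 2, 1]))  # 10
print(zagreb_m1([3, 1, 1, 1]))  # 12
```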

As the atom's SPS depends only on degree, hybridization, stereo-flag and non-aromatic ring-membership, this is really an atomic descriptor: it does not depend on the environment or on long-range co-occurrences in the molecule at all. I quite strongly agree that the categories and their weighting are very ad hoc: is a quaternary sp3 carbon n² = 16 times more complex than a methyl carbon? Perhaps, but perhaps it could be 5 times or 20 times. Is an sp-hybridized atom less complex than an sp2 atom? And doesn't the degree term already account for this? All of this seems to be based on vibes and on having a simple formulation (which is not a criticism!).

To give an example where nSPS clearly fails the intuitive test, compare the nSPS of dynemicin A, an enediyne natural product whose structure has haunted total synthesis aficionados for years, with that of decalin, a commodity chemical. Dynemicin A has nSPS = 30.5, while decalin has nSPS = 40.8. Why? Well, decalin is a fully sp3 framework in which every atom is in a ring and 20% of the atoms are stereogenic.
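The decalin number is easy to verify by hand with the per-atom formula (a back-of-the-envelope sketch, assuming the two ring-fusion carbons count as the stereocenters):

```python
# Per-atom SPS term, as defined in the post: h * s * r * n**2
def atom_score(h, s, r, n):
    return h * s * r * n ** 2

# Decalin: 10 sp3 carbons (h=3), all in non-aromatic rings (r=2).
# The 2 ring-fusion carbons are stereogenic (s=2) with 3 heavy neighbors;
# the other 8 carbons have s=1 and 2 heavy neighbors each.
scores = [atom_score(3, 2, 2, 3)] * 2 + [atom_score(3, 1, 2, 2)] * 8
total = sum(scores)              # 2*108 + 8*24 = 408
print(total, total / len(scores))  # 408 40.8
```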

Finally, one question remains to be answered: though the fundamental interest of molecular complexity is enough to justify this work and related work, what can medicinal chemists use a complexity score for? The authors show the correlation of nSPS with synthesizability is weak, so it's not that. They show nSPS for marketed drugs does not increase over time, so it's also not that. So then what is it?

All this said, I was pleasantly surprised to see this get so much resonance and I am always happy to see cheminformatics, chemical graph theory and similarly fundamental approaches have their moment in the spotlight in the medicinal chemistry literature, so cheers to the authors and the editors!

Peter Kenny said...

Hi Dan,

This way of looking at molecular complexity is very different to that of Mike Hann and colleagues and it’s not clear where (or even whether) it would fit usefully into drug discovery. The Hann model is framed in molecular recognition terms and the key point is that the likelihood of each pharmacophoric feature interacting optimally (dare I say efficiently?) with the target decreases with the number of pharmacophoric features in the molecular structure of the ligand. As with Ro5, the Hann model raises awareness of factors that you need to consider when selecting compounds for screening.

Although I see advantages from a molecular recognition perspective in having tetrahedral carbons in rings (especially when substituents adopt axial orientations naturally) I’ve always regarded the link between Fsp3 and molecular complexity as rather flimsy. For example, I wouldn’t regard a 1,4-disubstituted benzene ring as any less complex than the equivalent cis-1,4-disubstituted cyclohexane (although the latter will give you access to the region ‘above’ the ring plane). Neither SPS nor nSPS appears to be usefully predictive of biological activity (the trends were observed in structurally diverse data sets and binning of the data suggests that these trends are very weak) and this study wouldn’t persuade me to change the way that I do things. I would speculate (as is the case for Fsp3) that SPS and nSPS values would tend to be higher for molecular structures with basic centers than those without.

Dan Erlanson said...

Hi Pete,
I mostly agree with you. I do like the idea of having a quantitative measure of molecular complexity since the Hann model is more a qualitative thought experiment. But as you, "Wim," and I have noted, this seems difficult to do in a useful, non-arbitrary fashion.

I do disagree with your example though. Consider 1-bromo-4-methylbenzene vs cis-1-bromo-4-methylcyclohexane. I would argue the second molecule is more "complex" than the former, if for no other reason than that it can assume more conformations. That said, I have no idea how "much" more complex it is.

Peter Kenny said...

I certainly agree with your assessment of the Hann model, Dan, and also that it would be useful to be able to quantify molecular complexity. To be useful in selection of compounds for screening I’d conjecture that a molecular complexity metric would need to ‘know’ something about interaction potential of heteroatoms (hydroxyl arguably contributes more to complexity than ether oxygen because both HBD and HBA need to be accommodated by the binding site). The Hann view of complexity also has relevance beyond binding of ligands to targets (I invoked it in the aqueous solubility section of this article in the context of HBD-HBA imbalance). I’ve found it useful to control extent of substitution when selecting fragments for screening (here’s an ancient post) although I don’t think that it would be possible to define a useful metric in these terms (nevertheless I’d be delighted to be proven wrong on this point).

Having more than one significantly populated conformation would increase the probability of a good match with the binding site and, in the Hann model, this would be equivalent to being of lower complexity than an ‘equivalent’ ligand which only had a single significantly populated conformation. My view is that it would be difficult to account for conformational flexibility in a meaningful manner when attempting to quantify molecular complexity although, as said before, I’d be delighted to be proven wrong on this point.

Dan Erlanson said...

Hi Pete,
The idea of a more flexible molecule having lower molecular complexity than a rigid molecule is interesting, and further illustrates the complexity of trying to come up with a metric. Perhaps the problem is even more devilish than Justice Potter Stewart's famous quote about "knowing it when I see it."