Why aren’t we better at weighing results and drawing sound conclusions?

Whether it’s vaccines preventing SARS-CoV-2 infection or greenhouse gases causing climate change, we find it hard to convince people that our hypotheses are supported by the available evidence.

First, it’s not new. We had the “HIV doesn’t cause AIDS” debacle, aka HIV Denialism, spearheaded most famously by the now-disgraced molecular biologist Peter Duesberg and most damagingly by the former South African President Thabo Mbeki. Mbeki in particular railed against testing and delayed the deployment of antiretroviral therapies in South Africa for years, policies that led to the unnecessary deaths of hundreds of thousands of citizens. President Ronald Reagan remained stone-cold silent on the subject of HIV and AIDS for more than four years while the epidemic grew ever larger in the US.

You’d think we’d learn.

Scientific denialism is the extreme result of disputing scientific findings on irrational grounds. As we see in the anti-vaxxer community, the arguments are not scientifically grounded but rely instead on a bevy of illogical arguments, including rhetorical fallacies, e.g. appeals to false authority, appeals to “fairness”, and the right to disagree. But at heart, ignoring and discounting observations that clearly support critical scientific hypotheses is a unifying theme of denialism.

You might wonder: why focus on these hypotheses? Why not simply state the facts? The answer is that we can find gradations of scientific denialism that extend from crazy anti-vaxxers to the recent aducanumab approval, with a lot of gray area in between that involves hypothesis testing.

This is a hypothesis that is also considered to be a true statement:

ALL SPIDERS HAVE 8 LEGS.

This is a true statement as far as we have tested it – no spiders have been found that have more or fewer than 8 legs, and those with fewer have accidentally lost a leg that was originally there. This exercise of viewing (collectively) many spiders and deriving the statement is an example of inductive reasoning: we have taken individual observations that THIS SPIDER HAS 8 LEGS, which are singular statements, and formulated the hypothesis that ALL SPIDERS HAVE 8 LEGS; having (collectively) failed to disprove the hypothesis, it has become a universal statement, considered true.

Formally, this is an exercise in evidence-based epistemology. We can ignore the philosophy though, and focus on what “evidence-based” means here:

1) we started with observations: THESE SPIDERS HAVE 8 LEGS.

2) we (perhaps subconsciously) formulated a hypothesis – NO SPIDERS EXIST THAT DO NOT HAVE 8 LEGS (unless they lose 1 or 2 by accident)

3) we collectively have tested the hypothesis and drawn the appropriate evidence-based conclusion that, indeed, ALL SPIDERS HAVE 8 LEGS.

So far, we’re good. But there is an asymmetric element to our hypotheses, first articulated by Karl Popper. Note that the universal statement ALL SPIDERS HAVE 8 LEGS can never be verified by any number of singular statements, because there is in theory always another spider to examine. However, it can be refuted by a singular statement, e.g.

THIS SPIDER HAS 6 LEGS.

(Popper used ALL SWANS ARE WHITE as his example of a universal statement, readily disproved by the appearance of a BLACK SWAN, an image since repurposed to describe a rare and unexpected “black swan event”. To be fair, black swans are rather common, which is why Popper used them in his example.)

Popper used the asymmetry inherent in the hypothesis (impossible to prove, easy to disprove) to formulate new criteria for statements about the world, specifically, that scientific statements are fundamentally different from other classes of statements such as metaphysical statements (e.g. “man is good”). In Popper’s view, scientific statements can provide evidence-based truth because they can be falsified. His “Criterion of Falsifiability” states that “only those hypotheses which can potentially be contradicted by singular statements qualify as scientific”.[1]
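The asymmetry is easy to see in a few lines of code. Below is a minimal sketch (my own illustration, not anything from Popper) showing that a finite run of observations can only falsify the universal statement or leave it standing; it can never prove it.

```python
# A toy illustration of Popper's asymmetry: a universal statement can be
# refuted by a single counterexample, but no finite set of observations
# can ever verify it.

def test_all_spiders_have_8_legs(observed_leg_counts):
    for legs in observed_leg_counts:
        if legs != 8:
            return "falsified"           # one counterexample settles it
    return "not yet falsified"           # consistent so far, but never 'proven'

print(test_all_spiders_have_8_legs([8, 8, 8, 8]))  # not yet falsified
print(test_all_spiders_have_8_legs([8, 8, 6]))     # falsified
```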

It’s never as cut and dried as spiders’ legs, and the acceptance or rejection of a hypothesis can be less straightforward than the weight of any single observation (thus we use statistics to bracket the uncertainty of our measurements). Regardless, we can say that the “empirical content” of a hypothesis is a useful measure of its inherent value.

Simply put, there is more empirical content in the statement ALL SPIDERS HAVE 8 LEGS than in the statement ALL SPIDERS HAVE LEGS.

Empirical content reflects the testability and falsifiability of a hypothesis, and an empirically rich hypothesis yields many predictions and therefore many potentially falsifying statements. We can see the richness of complex hypotheses in their ability to generate testable predictions, and the most powerful scientific hypotheses can survive making incorrect predictions. Hypotheses having truly robust empirical power become Theories, as in the Theory of Evolution or the Theory of Relativity.

So, another hypothesis is:

PFIZER/BIONTECH’S VACCINE BNT162B2 PREVENTS SARS-COV-2 VIRUS INFECTION.

This hypothesis has a lot of empirical content and predicts, at a minimum, that vaccinated individuals should get sick less often than unvaccinated individuals, and that vaccinated individuals should spread the virus to other people less frequently than non-vaccinated individuals. We can make these predictions based on our experience with other vaccines: flu vaccines, for example.

When we review the data we find our predictions are robustly supported: in every instance in which it has been studied, BNT162b2 prevents SARS-CoV-2 infection and reduces viral spread. Note that the foundational hypothesis is eminently falsifiable – just ask GSK or Merck or Sanofi or CureVac, all of whose vaccines failed – but with respect to BNT162b2, Janssen’s JNJ-78436735 vaccine, and Moderna’s mRNA-1273 vaccine, the hypotheses have withstood rigorous testing. So that’s good!
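As a concrete illustration, the prediction that vaccinated individuals get sick less often is typically quantified as vaccine efficacy: one minus the relative risk of infection in the vaccinated arm versus the placebo arm. The sketch below uses made-up counts purely for illustration, not the actual BNT162b2 trial data.

```python
# A minimal sketch of how trial data test the prediction that vaccinated
# individuals get sick less often. The counts are hypothetical, not real data.

def vaccine_efficacy(cases_vaccinated, n_vaccinated, cases_placebo, n_placebo):
    attack_rate_vaccinated = cases_vaccinated / n_vaccinated
    attack_rate_placebo = cases_placebo / n_placebo
    return 1 - attack_rate_vaccinated / attack_rate_placebo  # VE = 1 - relative risk

# Hypothetical trial: 20,000 participants per arm, 10 cases in the vaccinated
# arm, 180 cases in the placebo arm.
print(f"VE = {vaccine_efficacy(10, 20_000, 180, 20_000):.1%}")  # VE = 94.4%
```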

In contrast, a weak hypothesis generates predictions that fail more often than not.  

One example is the hypothesis that hydroxychloroquine can be used to treat SARS-CoV-2 patients. A study published in 2005 suggested that chloroquine could prevent SARS (the older virus) from infecting cells kept in culture.[2] This was a simple study published in an obscure journal, but it did make predictions: that chloroquine acted by interfering with the “terminal glycosylation of the cellular receptor, angiotensin-converting enzyme 2”, thereby blocking virus/cell interaction, and that chloroquine might be useful in treating SARS patients. This study, accessed over 1M times online and widely shared on social media, was offered as evidence that a closely related compound, hydroxychloroquine, could treat SARS-CoV-2 patients. Thus, the observation (chloroquine blocked SARS infection of cells in a cell culture dish) generated a hypothesis:

HYDROXYCHLOROQUINE CAN TREAT SARS-COV-2 PATIENTS.

This hypothesis was tested clinically, many times over, and failed each time. Note that the empirical power of the hypothesis is not zero: angiotensin-converting enzyme 2 (ACE-2) plays a critical role in mediating SARS-CoV-2 infectivity, but hydroxychloroquine doesn’t block this pathway in patients. Therefore, the hypothesis failed a critical prediction and must be rejected.

Anti-vaccine sentiment offers an extreme example of Science Denialism, and its adherents have many and complicated ways of rationalizing their views, built for example on religious, political, and cultural mores. Regardless, the active disregard or denial of results that contradict their viewpoint is a central component of their stance.

In science we expect a more clear-headed evaluation of results in the context of hypothesis testing. And yet, much of the published scientific research literature cannot be reproduced. Venture capital firms and pharmaceutical companies that perform ‘wet diligence’ have consistently concluded that two-thirds of the results in the literature are not reproducible (wet diligence refers to the hiring of an independent lab to repeat published experiments). Note here that we are not referring to fraud, nor are we saying that two-thirds of published results are wrong. Science is hard! Indeed, the reproduction of results is a core element of the scientific process; failure to reproduce results is part of the process that drives science forward, because it allows us to reject a hypothesis and move on.

Except when we don’t: the counterweight to the process of rejecting hypotheses in the face of data is the use of ad hoc explanations for results that run counter to the predictions made. This practice is remarkably common in academic labs, in biotech and pharma labs, and in investor communities. “Oh, the samples must have been mixed up, let’s run the analysis again” is one simple example. But the slope is slippery: tossing out inconvenient results as “outliers”, declaring a specific assay or model irrelevant because it didn’t show you what you wanted to see, mining statistics in search of significance (aka ‘p-hacking’), selective display of data, and so on. Nonetheless, most ad hoc responses to inconvenient results are pretty harmless, since they fail to survive scrutiny. Of course, a lot of people’s time and energy is wasted trying to reproduce bad experiments.
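To see why mining statistics for significance is so corrosive, consider the simple simulation below (my own sketch, using hypothetical numbers): even when there is no real effect anywhere, scanning enough comparisons will turn up nominally “significant” p-values by chance alone.

```python
# A minimal simulation of p-hacking: with no true effect, scanning many
# comparisons still produces p < 0.05 'hits' by chance (roughly 1 in 20).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_comparisons = 40                  # e.g. 40 endpoints, subgroups, or assays
hits = 0
for _ in range(n_comparisons):
    control = rng.normal(0, 1, 30)  # both groups drawn from the same
    treated = rng.normal(0, 1, 30)  # distribution, so the null is true
    if stats.ttest_ind(control, treated).pvalue < 0.05:
        hits += 1

print(f"{hits} of {n_comparisons} comparisons look 'significant' despite no real effect")
```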

A rule of thumb, articulated by James Farris[3] among many others, is that each use of an ad hoc explanation to dismiss a result that contradicts a hypothesis weakens that hypothesis. Farris made this argument in defense of the use of parsimony in evolutionary systematics, where the relationships requiring the minimum number of explanations are derived mathematically from a data set. This works as well with fossils as it does with genomic sequences, so we see that parsimony can be a powerful tool because it specifically reduces the number of ad hoc explanations.
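As a rough sketch of how parsimony works in practice, the toy example below scores two candidate trees by the minimum number of character-state changes each requires (the Fitch algorithm); the tree demanding fewer changes, i.e. fewer ad hoc explanations, is preferred. The taxon names and data are hypothetical, chosen only for illustration.

```python
# A minimal sketch of parsimony scoring (Fitch algorithm) for one character.
# The tree needing fewer state changes (fewer ad hoc explanations) is preferred.

def fitch_score(tree, states):
    """tree: nested tuples of taxon names; states: taxon -> observed state."""
    changes = 0

    def postorder(node):
        nonlocal changes
        if isinstance(node, str):
            return {states[node]}      # leaf: the state we actually observed
        left, right = node
        a, b = postorder(left), postorder(right)
        if a & b:
            return a & b               # children can agree: no change needed
        changes += 1                   # children conflict: one change is forced
        return a | b

    postorder(tree)
    return changes

# Hypothetical data: one nucleotide site scored in four taxa.
states = {"taxonA": "G", "taxonB": "G", "taxonC": "T", "taxonD": "T"}
tree1 = (("taxonA", "taxonB"), ("taxonC", "taxonD"))
tree2 = (("taxonA", "taxonC"), ("taxonB", "taxonD"))
print(fitch_score(tree1, states), fitch_score(tree2, states))  # 1 2 -> prefer tree1
```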

That ad hoc justifications for inconvenient results should be avoided may seem obvious, but again, the slope is slippery, and made more so when pressure that is not scientific is brought to bear on a hypothesis. This is true of our anti-vaxxers, as discussed. It also appears to be true of the approval of aducanumab for the treatment of Alzheimer’s Disease (AD). This issue has been covered in detail; see for example the in-depth analyses by Derek Lowe and by STAT News.[4],[5]

The drug was approved despite inconsistent Phase 3 clinical trial results, and despite the use of post hoc analysis to identify potentially responding patients. The use of post hoc analysis to generate an ad hoc explanation for inconvenient results sounds confusing, but basically: after the fact (post hoc), the company looked at just a subset of patients to conclude that the clinical trial worked for that population (an ad hoc explanation). As stressed earlier, ad hoc explanations weaken the original hypothesis, in this case that aducanumab can treat AD. If we applied the parsimony principle to the available clinical results, we would reject this hypothesis.

So, what should have happened, if we were to apply scientific rigor here? The hypothesis being tested had changed (which is fine, happens all the time, and should). Specifically, the hypothesis:

ADUCANUMAB WILL REDUCE SYMPTOMS IN AD PATIENTS,

was altered to:

ADUCANUMAB CAN REDUCE AD SYMPTOMS IN SOME PATIENTS, SPECIFICALLY, PATIENTS WITH EARLY SIGNS OF AD.

That hypothesis could, and should, have been the foundation for a new clinical trial. The approval means the hypothesis will instead be tested in the public domain, without the benefit of the clinical trial design that could tell us whether it’s working or not. This is an excellent example of ad hoc justification undermining the principle of evidence-based truth.

Back in July, the Biotech Clubhouse panel worried aloud that the FDA approval of aducanumab and the resulting outcry and controversy would further confuse, and embolden, a public already resistant to the scientific advice for mask and vaccine use to prevent the spread of SARS-CoV-2. And here we find the real danger in the slippery slope: if we won’t hold the line on scientific integrity, how can we reason with a skeptical general public?

Next time, counterpoint: how successful hypothesis testing led to novel and effective treatments for a nasty autoimmune disease.

Stay tuned.

