Wednesday, 24 November 2021

The p-value battle continues

After my brief encounter with the p-value battle a couple of years ago, noticed at reference 1, I have been reminded of it by the piece in Science News, detailed at reference 2. And have been moved to read the first half of the paper at reference 3 rather more carefully. Which led me to reference 4 – which suited me best of all – and reference 5.

Serious statisticians, psychologists and others have been waging a battle against careless use of p-value significance tests for some decades now. But that battle is far from won, if the letter to Nature a couple of years ago, detailed at reference 5, is anything to go by.

The battleground

P-values are all about the tails of distributions, usually assumed to be normal. The motivation is that values of a random variable with a known distribution are unlikely to be extreme, that is to say on the far right in the figure above, so if the value is very extreme it is likely that we have got the distribution wrong – that is to say, in this case, that we have got the hypothesis wrong. A motivation which is not altogether unreasonable, but all too often poorly founded. It is this poor foundation which is the battleground.

Such p-values might be one-tailed, as shown above, or two-tailed.

P-values of 0.05 – corresponding to one chance in twenty – are often thought to be sufficiently extreme for these purposes – in which case the statistic concerned is said to be significant.
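
By way of illustration, a minimal sketch in Python, using scipy. The standard normal distribution and the z-score of 2.1 are invented for the example, not taken from any of the references.

    # A minimal sketch, assuming a standard normal test statistic.
    from scipy.stats import norm

    z = 2.1  # hypothetical test statistic, in standard deviation units

    # One-tailed p-value: the chance of a value at least this far out
    # on the right-hand tail, given the hypothesis.
    p_one_tailed = norm.sf(z)               # survival function, 1 - cdf

    # Two-tailed p-value: at least this extreme in either direction.
    p_two_tailed = 2 * norm.sf(abs(z))

    print(f"one-tailed p = {p_one_tailed:.4f}")   # ~0.0179
    print(f"two-tailed p = {p_two_tailed:.4f}")   # ~0.0357
    # Both fall below the conventional 0.05, so the statistic would
    # usually be declared significant.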

There is some useful historical background to the use of p-values at reference 2. Which is also where the figure above comes from.

Colquhoun

I have now spent a bit of time with the author of references 3 and 4, one David Colquhoun, a retired eminence from UCL; not a statistician, but someone who has spent serious time on statistical matters – and, as it happens, on another long campaign against alternative medicine. I dare say, with a pharmacology background, he is a strong believer in medicines having active ingredients if they are to provide anything more than temporary relief. You can read all about him at reference 6.

I think I can say that I now know more about the p-value problem than I ever did before – although it does not do to be too definite about what one might have forgotten over the years. I found the (relatively recent) reference 4 the most helpful, while for me reference 3 was marred by a rather combative and dogmatic tone. Perhaps he thought the slow progress justified it.

He starts from the straightforward premise that it is not a good idea to publish breakthroughs, in one’s haste to climb the slippery pole of academe, only to have them rubbished or otherwise falsified a few months later. And from the assertion that all too many breakthroughs which are published, often in reputable, peer-reviewed journals, are based on careless, not to say wrong, use of p-value significance tests. Contrariwise, maybe the grant renewal is in the bank, or the new job is in the bag, by the time one is found out.

He goes on to tell us that one way out is to opt for very low p-values, say 0.001 rather than 0.05, but the trouble with this would be that very few experiments, particularly in the murky field of fMRI, would generate significant results. One can do better.

He makes a good case for looking at false positive rates, nearly always much larger than p-values would suggest, which have a far more direct bearing on whether one’s breakthrough really is a breakthrough. One part of his case is built on analogy with the screening problem and another part is built on a straightforward simulation of lots of trials, of the kind sketched below.
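
By way of illustration only – this is not his actual code – a rough Python sketch of that kind of simulation. All the numbers (1 real effect in 10, 16 observations per group, an effect of one standard deviation) are my own assumptions.

    # A rough sketch of simulating lots of trials: run many two-sample
    # t-tests, where only a fraction of the experiments have a real
    # effect, and see what fraction of the significant results are false.
    import numpy as np
    from scipy.stats import ttest_ind

    rng = np.random.default_rng(1)
    prior_real = 0.1       # assume only 1 in 10 tested effects is real
    n, effect = 16, 1.0    # per-group sample size, effect in SD units

    false_pos = true_pos = 0
    for _ in range(50_000):
        real = rng.random() < prior_real
        a = rng.normal(0.0, 1.0, n)
        b = rng.normal(effect if real else 0.0, 1.0, n)
        if ttest_ind(a, b).pvalue < 0.05:
            if real:
                true_pos += 1
            else:
                false_pos += 1

    # The fraction of 'significant' results that are in fact false:
    print(false_pos / (false_pos + true_pos))   # ~0.36, not 0.05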

The screening problem – which has become topical with COVID – involves negatives that test negative, negatives that test positive, positives that test negative and positives that test positive, with the second and third of these possibilities often providing a great deal of unwanted noise. A model which maps fairly well onto the p-value problem, with the p-value people apt to concentrate on the negatives that test positive – the false positives – to the exclusion of other considerations. And although the rate at which negatives test positive may be low, there may be a lot of them in absolute terms, sometimes a lot more than the true positives, giving an uncomfortably high false positive risk – the fraction of positive results which are in fact false.
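
Putting toy numbers of my own choosing on that arithmetic – say 1% prevalence, 80% sensitivity and a 5% chance that a negative tests positive:

    # A toy version of the screening arithmetic, with invented numbers.
    population = 10_000
    prevalence, sensitivity, alpha = 0.01, 0.80, 0.05

    positives = population * prevalence     # 100 truly positive
    negatives = population - positives      # 9,900 truly negative

    true_pos  = positives * sensitivity     # 80 positives test positive
    false_pos = negatives * alpha           # 495 negatives test positive

    # Of everyone who tests positive, the fraction who are false alarms:
    print(false_pos / (false_pos + true_pos))   # ~0.86

So even with a low rate of 5%, the sheer number of true negatives means that most of the positive tests are false.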

There is also a Bayes-flavoured angle. P-values are about the probability of getting some result, getting some evidence, given the hypothesis, that is to say P(E|H). Whereas what we are looking for is the probability of the hypothesis given the evidence, that is to say P(H|E). To get to which we need to deploy something like Bayes’ theorem, along with some priors, in one form or another. To blithely assume that P(E|H) and P(H|E) are more or less equal, which is what, in effect, is often done, won’t do at all.
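
Again with invented numbers, a small illustration of how far apart P(E|H) and P(H|E) can be:

    # A toy illustration of the gap between P(E|H) and P(H|E), using
    # Bayes' theorem: P(H|E) = P(E|H) * P(H) / P(E). All numbers invented.
    p_E_given_H     = 0.8    # chance of this evidence if the hypothesis holds
    p_E_given_not_H = 0.05   # chance of it turning up anyway
    p_H             = 0.1    # prior: 1 in 10 such hypotheses is true

    p_E = p_E_given_H * p_H + p_E_given_not_H * (1 - p_H)
    p_H_given_E = p_E_given_H * p_H / p_E

    print(p_H_given_E)   # ~0.64 - nothing like the 0.8 we started with

On these numbers the hypothesis is only about 64% likely given the evidence, and the remaining 36% is the same false positive risk that the simulation above turned up.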

And lastly an angle which appeals to me. He argues that setting up a dichotomy between something – say a possible medicine – having no effect and having a significant effect is not helpful. This world is better regarded as continuous rather than as dichotomous. Reference 5 talks of dichotomania, a mania which is by no means confined to this particular battle. So I go for the curse of the dichotomy.

Note that he does not deny the value of p-values. He grants that they are useful evidence. His point is that they are not proof. For that, more is needed.

Conclusions

Be wary of papers using the phrase p-value. If in doubt, get help!

References

Reference 1: http://psmv4.blogspot.com/2019/06/more-hard-for-me-to-know.html.

Reference 2: How the strange idea of ‘statistical significance’ was born: A mathematical ritual has led researchers astray for decades – Bruce Bower, Science News – 2021.

Reference 3: An investigation of the false discovery rate and the misinterpretation of p-values – David Colquhoun – 2014. Royal Society Open Science.

Reference 4: Why p-values can't tell you what you need to know. And what to do about it: the false positive risk - David Colquhoun - 2020. Given at the RIOT Science Club at KCL. To be found at http://www.dcscience.net/2020/10/18/why-p-values-cant-tell-you-what-you-need-to-know-and-what-to-do-about-it/

Reference 5: Scientists rise up against statistical significance: a call for an end to hyped claims and the dismissal of possibly crucial effects – Valentin Amrhein, Sander Greenland, Blake McShane and more than 800 signatories – 2019. Nature.

Reference 6: https://en.wikipedia.org/wiki/David_Colquhoun

Reference 7: Lectures on Biostatistics – D. Colquhoun – 1971. Freely available for download. I was amused by the quote at the top of chapter 1, included above. And puzzled by the heading just below because trying not to make a fool of oneself also figures large in reference 3, more than 40 years later. Clearly a matter of some importance, so I shall read on.
