As a welcome relief from trying to get my head round gender woo-woo I thought I’d address another issue that has been bugging me for a while. It centres around statistical woo-woo, or rather just how far the gap is between the ‘average’ understanding of stats and what’s really going on.
I’m absolutely not apportioning any blame here because understanding stats and probability really isn’t easy - and it’s common, even for people with some expertise, to get things horribly muddled up (mea culpa).
I’ve discussed the (in)famous Monty Hall problem before. It’s a seemingly simple probability question for which most people’s ‘intuition’ fails. The idea is that on a game show there are 3 boxes, in one of which is a diamond. The contestant is asked to choose one of the boxes (let’s say box 1 is chosen). All of the boxes remain unopened at this stage, and the chosen box is set to one side.
Now the host opens one of the unchosen boxes (box 2 or 3) that he knows does not contain the diamond (let’s say this is box 2). He then asks the contestant whether they’d like to change their mind and pick box 3 instead of their original choice of box 1. The question is, should the contestant change their mind?
The answer is : yes, definitely
The ‘intuitive’ answer here is to think along the lines of : I know the diamond is in either box 1 or box 3, so there’s a 50:50 chance of it being in box 1, and therefore no benefit in changing my mind. This intuitive answer is, however, incorrect.
It becomes clearer when you think in terms of taking box 1 into another room off-stage, before any opening. There’s a 1/3 chance the diamond is in that room - and a 2/3 chance the diamond is on the stage. When the host knowingly opens one of the empty boxes on stage, that fact doesn’t change at all : the 2/3 chance now sits entirely on the one unopened stage box, so switching doubles your chance of winning.
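If you don’t trust the argument, you can simply play the game many times and count. Here’s a minimal simulation sketch of my own (the box labels and trial count are arbitrary choices, not anything canonical):

```python
# Monte Carlo sketch of the standard Monty Hall game.
import random

def play(switch, trials=100_000):
    wins = 0
    for _ in range(trials):
        boxes = [1, 2, 3]
        diamond = random.choice(boxes)
        pick = random.choice(boxes)
        # The host knowingly opens an unchosen box that is empty.
        opened = random.choice([b for b in boxes if b != pick and b != diamond])
        if switch:
            # Switch to the one remaining unopened, unchosen box.
            pick = next(b for b in boxes if b != pick and b != opened)
        wins += (pick == diamond)
    return wins / trials

print("stick :", play(switch=False))   # ~0.333
print("switch:", play(switch=True))    # ~0.667
```

Sticking wins about a third of the time; switching wins about two thirds, exactly as the off-stage-room argument says.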
Even very, very brilliant mathematicians can get this wrong - and this actually happened in the case of the Monty Hall problem (Paul Erdős famously remained unconvinced until he was shown a simulation).
However, and it’s a big however, let’s now suppose the host has completely forgotten where the diamond is. He crosses his fingers behind his back and opens box 2. To his relief, the diamond isn’t there. Is there now any benefit to the contestant changing their mind?
Answering that one is not in the slightest bit ‘easy’. And in this second case there’s no benefit in changing your mind.
The reason is that you’ve got different information in the two cases. In the first case you know the host is definitely avoiding a diamond-containing box. In the second case you know he’s winging it and hoping for the best. The inference you can draw is therefore different, because you have different information to work with.
In the first case the host is not picking at random (indeed, if the diamond is in box 3 he has no choice but to open box 2). In the second case he’s doing an experiment and picking at random, and the result of that experiment gives us some information about the actual probabilities.
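Again, a simulation makes it concrete. In this sketch (my own illustration), we throw away the games in which the forgetful host accidentally reveals the diamond, because the contestant never faces the switch-or-stick decision in those games:

```python
# The forgetful-host variant: the host opens an unchosen box at random.
import random

def forgetful(switch, trials=200_000):
    wins = kept = 0
    for _ in range(trials):
        boxes = [1, 2, 3]
        diamond = random.choice(boxes)
        pick = random.choice(boxes)
        opened = random.choice([b for b in boxes if b != pick])  # host forgot!
        if opened == diamond:
            continue  # diamond revealed - discard this game
        kept += 1
        if switch:
            pick = next(b for b in boxes if b != pick and b != opened)
        wins += (pick == diamond)
    return wins / kept

print("stick :", forgetful(switch=False))  # ~0.5
print("switch:", forgetful(switch=True))   # ~0.5
```

Conditioned on the host getting lucky, sticking and switching both win half the time - a different answer from a situation that looks, on stage, completely identical.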
Welcome to the wonderful wacky world of probability and inference.
Now most of us have some ‘intuitive’ idea of what probability and randomness are about - usually through exposure at school to balls. No, I’m not talking about drag queen acts, but about the typical example of selecting balls at random from bags.
We need probability and stats in all sorts of ways. A typical example is quality control. You have some manufacturing process and you want to find out what the error rate is. In a run of 100,000 manufactured products, how many might we expect to be faulty? How do you go about that?
You could test all 100,000, and that might be appropriate if you’re making some very critical component that might impact safety, for example. But, in general, that’s going to be very expensive. If you can’t do non-destructive testing on a particular product it would also be pointless (how do you test whether a missile works? You blow it up).
The idea is that you test a small sample and see what you can infer about the overall reliability from the results of that sample testing.
You’re in a whole world of statistical pain now, and it gets very technical, very quickly, from here on in.
Things like ‘randomness’ and ‘probability’ are actually very difficult (and some might say impossible) to define in a precise mathematical way. Most people’s idea of probability is essentially what is technically known as the ‘frequentist’ position : probability is assigned as a ratio of ‘events’ to ‘trials’. If we imagine a manufacturing process that sometimes produces red widgets when they’re supposed to be blue, we might estimate the probability of producing a red widget by looking at a sample of 100 produced widgets. If 10 are red, we might infer that the probability our process produces a red widget is 10/100 = 0.1
We would adopt the position that this is the ‘correct’ probability with a certain probability! So, very loosely speaking, we might say something along the lines of “we know the error rate is going to be 0.1, give or take, and we’re 95% sure about that”. This is the business of putting stuff like confidence intervals on things. The more trials we do, the more confidence we say we have in our results, in general.
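To make that ‘give or take’ concrete, here’s a sketch of the standard frequentist calculation for our hypothetical widget sample. The simple normal approximation is my choice here purely for illustration - real QC work would likely use an exact or Wilson interval:

```python
# 95% confidence interval for the red-widget rate (normal approximation).
import math

events, trials = 10, 100
p_hat = events / trials                       # point estimate: 0.1
se = math.sqrt(p_hat * (1 - p_hat) / trials)  # standard error of the estimate
z = 1.96                                      # ~95% coverage for a normal
print(f"estimate {p_hat:.2f}, 95% CI [{p_hat - z*se:.3f}, {p_hat + z*se:.3f}]")
# -> roughly [0.041, 0.159]
```

Note the slow payoff : because the interval shrinks with the square root of the sample size, you have to quadruple the number of trials just to halve the ‘give or take’.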
I don’t want to get into the technical details of these things except to say that the various parameters you’ll read about in the science papers - things like significance, or confidence level, or confidence interval - should be treated with a large degree of caution. Quite apart from the fact that these are technical ideas that are difficult to understand for the average person (and even for those with some expertise), they also get horribly misused. In a nutshell, for me anyway, if you’re going to be involved in the business of drawing inferences you ought to be working within a more Bayesian understanding of statistics - and a Bayesian approach recognizes that you can’t make inferences without making assumptions.
A frequentist approach relies on testing and samples and trials, and that’s OK provided you do it right, but how does one go about assessing a meaningful probability question like “what’s the probability the defendant is guilty?” - it’s not like you can run hundreds of experiments and count the results. The frequentist approach hits a brick wall here. A Bayesian approach allows you to analyse (or at least attempt to) broader probability questions like these (although that’s far from its only utility).
You’ll notice that I’ve used the word inference a fair bit. This is important because drawing an inference is usually the whole point of the stats - and this is where things often go horribly wrong.
Take the oft-quoted example concerning black crime rates in the US. I don’t know what the actual figures are, but you’ll see statements along the lines of : 50% of violent crime is committed by black people. From this we are supposed to infer that black people are, on average, less law-abiding in this regard, because they only make up something like 13% of the population. It might be a correct inference to draw, but it also might not be. If, for example, you had a small number of black criminals who were extremely prolific (and this pattern wasn’t repeated in another community), it could well be possible for the black community to be more law-abiding, on average, than another community. And that’s only one way the initial bald statistic is not sufficient to allow us to draw that bold inference about a whole community.
You’ll note here just one assumption that went into the initial inference. It was an assumption about repeat offending rates. Like I said - you can’t make inferences without making assumptions.
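To see how that assumption can flip the conclusion, here are some toy numbers (every figure below is invented by me purely for illustration):

```python
# Toy illustration: repeat offending breaks the naive leap from
# "share of crimes" to "share of the community that offends".
pop_a, pop_b = 13_000, 87_000     # community A is 13% of the population
crimes_a, crimes_b = 500, 500     # each community accounts for 50% of crimes

# Assumption: in A the crimes are concentrated in a few prolific
# offenders; in B they are spread across many one-off offenders.
offenders_a = 50                  # 10 crimes each
offenders_b = 500                 # 1 crime each

print(f"A: {offenders_a/pop_a:.2%} of the community offends")  # ~0.38%
print(f"B: {offenders_b/pop_b:.2%} of the community offends")  # ~0.57%
```

Under these (invented) assumptions, community A commits half of all the crime and yet has the *lower* fraction of offenders - the bald statistic alone simply cannot tell you which picture is true.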
In general what you try to do is to frame things in terms of a ‘hypothesis’ - usually termed the null hypothesis. Often we can think of this as the status quo. So, for example, in the case of masks our null hypothesis would be that masks have no effect on the spread of covid (or we might phrase it slightly differently : masks have less than 1% effectiveness against the spread of covid, say). We’d then look at the “trial” data (or the real-world data we now have) and see if there is sufficient evidence to reject this null hypothesis.
You would come to some conclusion - and it seems to me to be pretty incontrovertible from the real-world data - that the null hypothesis cannot be rejected in the case of masks. We’d then look to put limits on that conclusion by trying to answer questions like “what’s the probability my conclusion is wrong?”
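For a flavour of the mechanics, here’s a toy two-proportion test. I must stress the numbers are conjured out of thin air for illustration - they are not real mask data:

```python
# Toy null hypothesis test with invented numbers.
# H0: masks have no effect (both groups share one infection rate).
import math

masked_n, masked_cases = 10_000, 480        # invented "masked" group
unmasked_n, unmasked_cases = 10_000, 500    # invented "unmasked" group

# Pooled rate under H0, and the standard error of the difference.
p_pool = (masked_cases + unmasked_cases) / (masked_n + unmasked_n)
se = math.sqrt(p_pool * (1 - p_pool) * (1/masked_n + 1/unmasked_n))
z = (unmasked_cases/unmasked_n - masked_cases/masked_n) / se
print(f"z = {z:.2f}")   # ~0.65 here - nowhere near the ~1.96 needed
                        # to reject H0 at the 5% level
```

With data like these you simply fail to reject ‘no effect’ - which is not the same thing as proving no effect, as the next paragraph spells out.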
In essence you end up with the position that masks might work, but so far you have not been able to find any evidence that they do. All of that verbiage is better expressed with the more public-facing statement : masks are shit.
The thing that bugged me, that inspired me to write this cautionary tale on the use of statistics, is the “all of my friends have had the Goo and no one has experienced any side effects” brigade. And, it has to be said, this kind of argumentation sometimes gets used by Team Sense too.
Let’s suppose that serious Goo side effects occur at a rate of 1 in 5,000. This is a figure not at all out of line with the estimates of vaccine-induced myocarditis in young men. Let’s further suppose you know 500 people. What’s the probability that, amongst those 500 people, there will not be a single serious side effect?
The answer turns out to be about 0.9
What this means is that 9 out of 10 people, each with a circle of 500 friends, will not know a single person who has experienced a serious side effect.
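The arithmetic is a one-liner, assuming the side effects strike independently at that 1-in-5,000 rate:

```python
# Chance that none of your 500 friends has a serious side effect,
# assuming independent 1-in-5,000 events.
p_clear = (1 - 1/5000) ** 500
print(round(p_clear, 3))   # 0.905 - about 9 in 10 friendship circles see nothing
```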
Yet a serious side effect occurring at a rate of 1 in 5,000 is a hideously high rate for vaccines. Or at least it used to be (allegedly - but who is prepared to fully trust the prior data on vaccines any more?).
A similarly bonkers and statistically nonsensical argument also gets used by Team Woo-Woo when they opine : I’ve had 12 boosters, both arms, both legs, both testicles, both eyeballs, and 2 in each nipple for good measure, and I still got covid - but thank God I was vaccinated because it would have been so much worse.
The problem is never with the data itself (provided the data is not in error or otherwise falsified), but with the inferences that are drawn from the data.
Let’s end with a quick look at another contentious issue. Suppose you’re the security manager for a football stadium. You have 25 security marshals at your disposal. Now, for the last 5 years, the fans of team red have had a bit of a history of throwing bottles onto the pitch : 50 incidents per 1,000 fans. Team blue’s fans are not quite so inclined, with only 2 incidents per 1,000 fans.
You don’t sell any bottles inside the stadium - so they have to be brought in. You therefore instigate a procedure whereby bags and belongings are checked at the entrance. But you can’t search everyone - it would take too long. So you have to do some kind of sampling here. Where do you concentrate your limited resources? Do you search red and blue fans equally?
If you think it makes sense to weight your search towards the red fans - you’ve just accepted the principle of profiling. You agree with the principle that it is sensible to concentrate your resources where the evidence suggests there is the greater problem.
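As a toy version of that resource decision (the 50:50 crowd split and the proportional allocation rule are my own assumptions, not the only defensible choices):

```python
# Weight marshals by each group's expected incidents, not by headcount.
red_fans, blue_fans = 20_000, 20_000     # assumed equal crowd split
red_rate, blue_rate = 50/1000, 2/1000    # historical incidents per fan

expected_red = red_fans * red_rate       # 1000 expected incidents
expected_blue = blue_fans * blue_rate    # 40 expected incidents

marshals = 25
red_share = expected_red / (expected_red + expected_blue)
print(f"marshals on the red gate: {round(marshals * red_share)}")  # ~24 of 25
```

On these numbers the historical evidence pushes almost your entire search effort towards the red fans - which is precisely the principle of profiling.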
Now, of course, in the case of racial profiling the inference that is drawn from a very selective dataset might be very suspect indeed, as we’ve seen above. The ‘evidence’ itself might be very suspect. But what if it wasn’t? What if after collecting a ton of stats and doing all the proper analysis it was actually found (that is, inferred with a high degree of confidence) that, for whatever reason, there really was a higher incidence of crime committed by one community?
Would profiling here be correct? Would we want to concentrate more resources towards that community? (and I don’t mean just more arrests and searches etc - resources might mean more positive community interaction, for example). What if you’ve already concentrated your resources somewhat and the higher incidence of observed criminality is itself a function of that increased vigilance?
Profiling when it comes to criminality is such a thorny issue - but, in essence, it’s all about sampling and inference. It’s a statistical issue that requires exactly the same degree of care and caution as we (should) exercise with any other statistical dataset.
But, a lot of people seemed to have no issue at all with profiling the unvaccinated - upon considerably less evidence. Or profiling Russians, these days. Taking the appropriate care with stats, and the inferences drawn from them, seems a little on the rare side these days - and both Team Sense and Team Woo Woo have been guilty (but nowhere near equally guilty) of recklessness in this regard. However, the consequences of this recklessness have been far worse in the case of Team Woo Woo. Team Sense would not have wasted billions upon billions of pounds (or dollars), or damaged children’s education and development, or littered the oceans with billions of masks, or closed businesses, or delayed essential treatments and diagnoses - to list just a very few things.
Our politicians didn’t do a very good job of profiling the impact of covid and the various interventions - so much was based on decidedly dodgy inferences drawn from the data (and, even worse, from the ‘models’).
I found it fascinating that Elon Musk was criticized for offering to spend about £35 billion to buy Twitter - which is less than the budget allocated (£37 billion) for the UK’s Test and Trace program. A program that failed, miserably, and made things much worse overall (and we told you this would happen - but, as ever, you didn’t listen). How do you spend that much on a test and trace program? And couldn’t you have ended world hunger instead? Elon’s critics certainly seemed to think that was a better way of spending his money.
I wonder what the world could have achieved if we’d put the trillions spent on fighting covid (and failing) to better use? Drawing the wrong inferences can be very costly indeed.