Titania McGrath is a well-known parody character created by Andrew Doyle. Andrew is a smart guy with a great sense of humour who has poked fun at a lot of the ‘woke’ nonsense. It’s worth catching up with some of his interviews. He has sometimes expressed the view that it’s getting harder and harder to write parody because it’s difficult, these days, to come up with something ridiculous enough that hasn’t already been said (in all seriousness) by one of the wokerati.
As you might imagine, the discussion that ensued after this sally from Titania was, erm, colourful.
One person opined:
One group excludes others to exploit and deprive those groups.
One group excludes that other group, to have space to breathe.
It's not just the action, it's also the intent. One group's intent is disgusting, the other healing. Get rid of the first, we don't need the second.
Which provoked further interesting commentary. One comment caught my eye. It was a defence of this modern day “healing” discrimination.
In system design and mathematics, systemic racism is a system design issue called 'probability hacking'. It's not a guessing contest. These are math questions. Some people ignore knowledge for instinct. You are wrong, and I don’t care. We need 2 systems, live as you want.
Here we have some strange appeal to mathematics and probability. Systemic racism is, allegedly, engineered through something called probability hacking. It is, we are told, a math question. The fact that the author of this tweet is confusing something called p-hacking (which does exist) with “probability hacking” (which doesn’t) is irrelevant.
What’s fascinating here is the appeal to ‘technicality’. It’s a tweet that is designed to give an air of authority, to provide a word-salad of technobabble that promotes the illusion the author knows what he’s talking about. It shares a familial relationship with lots of other things like stochastic terrorism, or the garbled pretentious claptrap that is post-modernism applied to societal structures. Stuff that is designed to sound all erudite and smart, but in reality the Emperor is stark bollock naked.
The “p” in p-hacking refers to something known as the p-value, which is a statistical parameter that purports to measure statistical ‘significance’. The idea behind p-hacking is essentially that, for whatever reason (incompetence or duplicity), data is selectively analysed to produce a result that appears to be ‘significant’. I don’t want to go down the rabbit hole of the concept of statistical significance, but suffice it to say it’s a concept that needs to be treated with a large degree of caution.
If you’re interested, William Briggs has written quite a few entertaining and insightful pieces on probability and the use of p-values.
Probability is an absolutely central concept these days. It’s everywhere. Whenever you hear the words risk, or chance, or effectiveness, or safety, amongst others, you know that somewhere underneath there be lurking the probability gremlins. Abandon all hope ye who enter here, because probability is a very slippery little gremlin indeed.
When it comes to probability, it’s often easier for most people to get it wrong than to get it right - and this includes people who have some degree of technical competence. I include myself in this category. Despite having had to use probability quite extensively in my work, I have to admit to an embarrassingly high frequency of buggering it up. I check, re-check, think and re-think, and I’m still more than capable of a glorious FUBAR when it comes to probability.
Most of us encounter ‘technical’ notions of probability at school. You’ll have rolled imaginary dice, played with imaginary balls by picking them out of imaginary sacks, and maybe even thought about a hand of poker or two. All of this is fine to introduce some kind of ‘intuitive’ notion of what probability is about. You have to start somewhere, after all.
When asked what’s the probability of rolling a 5 when throwing a dice, most people will, probably, be able to tell you it’s 1/6. This can be expressed in words as “a one in six chance”, or it might be expressed as a percentage: “a nearly 17% chance of rolling a 5”. What often gets excluded here is the rather important phrase, “assuming a uniformly random outcome”.
This means that we assume a fair dice so that one outcome (which number we get) is not favoured above any other. But the other word there that’s a wee bit problematic is the word ‘random’. You could have a dice that rolls a 5 with 50% probability with the other numbers being 10% probable - and it still would be a random outcome. It just wouldn’t be uniformly random.
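That distinction is easy to see in a quick simulation. The sketch below (the 50%/10% weights are just the ones from the example above) rolls a fair dice and a biased one - both outcomes are random, but only one is uniformly random:

```python
import random

def freq_of_five(weights, n=100_000):
    """Simulate n rolls of a six-sided dice with the given face weights
    and return the observed frequency of rolling a 5."""
    faces = [1, 2, 3, 4, 5, 6]
    rolls = random.choices(faces, weights=weights, k=n)
    return rolls.count(5) / n

random.seed(0)

# Fair dice: uniformly random, each face equally likely.
fair = freq_of_five([1, 1, 1, 1, 1, 1])

# Biased dice: still random, just not uniformly so.
biased = freq_of_five([0.1, 0.1, 0.1, 0.1, 0.5, 0.1])

print(f"fair dice:   P(5) is about {fair:.3f}")    # close to 1/6, i.e. 0.167
print(f"biased dice: P(5) is about {biased:.3f}")  # close to 0.5
```

Both dice produce unpredictable individual rolls; the difference only shows up in the long-run frequencies.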
It turns out that it’s really very difficult, if not impossible, to actually define what is meant by randomness in any fully rigorous technical way. There’s usually a kind of circularity involved. It’s hard to define randomness without either explicitly or implicitly talking about probability, but you can’t define probability without talking about randomness.
A lot of ‘foundational’ maths concepts are a bit like this. Once you start delving further you get yourself into all sorts of trouble. If I asked what’s the probability of picking the number 0.25 when selecting, uniformly at random, from the numbers between, and including, 0 and 1 you might be surprised to learn that the probability is zero.
You what? Well, there are a lot, and I mean a lot, of numbers between 0 and 1. You could make a start writing them down:
0.1
0.11
0.111
0.1111
.
.
.
and so on. Even without ever using any digit other than 1 (after the initial zero), you can see that you will never be able to complete this list. However many “1’s” you put after the decimal point, you’ll always be able to add another. It’s an infinitely long list, there’s no “stopping” point - and that’s just with “1” being used.
It’s even worse than this, though. It turns out that the infinity of numbers between 0 and 1 is actually a bigger infinity than the infinite set of natural numbers (the counting numbers). You have to distinguish between an infinity that is countable, and an infinity that is uncountable.
So when you select a number, uniformly at random, from the numbers between 0 and 1 you’re selecting from a set of things that is uncountably infinite. It’s not just infinite, but infinity that’s been to the gym.
We asked a simple question and very quickly ran straight into a very important foundational mathematical brick wall of confusion. It is said that Georg Cantor, the man who first discovered these different classes of infinity, was driven mad by thinking about infinity. Whether that’s actually true or not, it certainly wouldn’t surprise me.
Physicists have the luxury of, usually, being able to brush a lot of these thorny foundational issues under the grand waffle carpet. Mathematicians do not have that luxury and have to really try to pin down all the details.
Despite these kinds of technical difficulties with the foundational notions of probability and randomness, they are, nevertheless, very useful concepts. They’re often the only way we can actually proceed with any sort of analysis at all. Take a gas (technically an ideal gas) and think about how many molecules there would be in a box that is 1 cubic metre in volume if you filled it with this gas - that’s a cube that has a 1m side. A decent-sized box that is hard to get through your front door, but not very big at all when compared to the volume of a supermarket, say.
There are, about, 20,000,000,000,000,000,000,000,000 gas molecules in such a box (give or take one or two). That’s a 2 followed by 25 zeros. That’s more than the number of stars in the observable universe. These gas molecules are all whizzing about, bumping into one another and the walls of the container. We could, in principle, predict the properties by writing down the equations of motion for each molecule (assuming Newtonian mechanics was good enough) - but we’d also need to know the precise positions and velocities of each one at a particular instant in time in order to be able to solve the equations, even in principle.
Good luck with that. So, in order to proceed, a statistical model is constructed. Some probability distributions are assumed and the aggregate properties can be worked out from there.
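If you want to see where a number like that comes from, the ideal gas law PV = NkT gives it in two lines. The sketch below assumes atmospheric pressure and room temperature (20 °C); change those and the count shifts a little, but the order of magnitude doesn’t budge:

```python
# Number of molecules in a 1 cubic metre box of ideal gas,
# from the ideal gas law: P * V = N * k_B * T, so N = P * V / (k_B * T).
k_B = 1.380649e-23   # Boltzmann constant, J/K
P = 101_325.0        # atmospheric pressure, Pa
T = 293.15           # room temperature, K (20 degrees C)
V = 1.0              # volume of the box, m^3

N = P * V / (k_B * T)
print(f"N is about {N:.2e} molecules")  # roughly 2.5e25
```

Two and a half followed by twenty-five zeros’ worth of equations of motion. Good luck indeed.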
But what about notions of probability when applied to things of importance to society, or that impact our everyday life?
Here’s where it gets interesting, and messy, and where half-truths and duplicity come to glorious fruition. It is, in short, the world of the politician.
We saw this kind of thing in the definition of stochastic terrorism.
The definition referred to a violent event, the specifics of which could not be predicted (i.e. ‘random’) but which, nevertheless, was said to be “statistically probable”. The term ‘statistically probable’ here is not a scientific statement. It is a political statement.
The definition here is trying to seduce you with something that sounds all technical, preying on the implicit nudging that the ‘experts’ understand this stuff. It’s misdirection, of course, because the Emperor here is most definitely displaying his meat and two veg in all their shrivelled glory.
It’s easy to write ‘technical’ garbage. The heterogeneity of implicit association and the co-integration of auto-generated syllogisms leads to the amalgamation of narcissistic counter-commonalities coupled with the epistemological positivity of ethno-congruence and the pseudo-intersectionality of confluent ideation.
All you need is a dictionary, a thesaurus, and the desire to pull the wool over people’s eyes.
I really hate writing which is designed more to promote the author than the ideas contained therein.
That’s one kind of misdirection. The other kind of misdirection is to take a complex idea, like probability, and to misuse it for political ends. Some ideas are difficult to explain properly without being technical, but that’s no excuse for p-hacking the ideas themselves.
Take the notion of vaccine ‘effectiveness’. The vaccines were initially claimed to be 95% effective. Some even claimed 100% effectiveness. What did they mean by that? Well, let’s assume they were talking about effectiveness against serious symptoms - which, we can assume, also means at least 95% effectiveness against death. I don’t think covid is a disease where people walk around with a mild sniffle and just drop down dead. Except in China, of course.
What they mean is that your chance of dying from covid, if you’ve had the vaccine, is 5% of your chance of dying from covid if you have not. It is, in technical parlance, what’s called a conditional probability.
Here’s what it actually means. Suppose I split my population into vaxxed and unvaxxed and I put them into separate ‘rooms’. Now, suppose I pick one person, at random, from the vaxxed room - what is the probability I have picked someone who has died of covid?
The notion here is that you’ll be more likely, when selecting at random, to pick a covid fatality from the unvaxxed room than you would if you were selecting from the vaxxed room. Which probability you get is conditional on which room you’re picking from.
In symbolic terms you’re comparing the probability P(dead|vaxxed) with P(dead|unvaxxed). The vertical line here is to be read as “given”. So P(dead|vaxxed) is read as ‘the probability a person died of covid given that they have been vaccinated’.
But these are not the only two probabilities we can compare here. Let’s suppose we put everyone back into the same room and now select an individual at random. We can now consider the probability that we’ve picked someone who has died AND who has been vaccinated. In symbolic terms we’d write P(dead, vaxxed) where the comma here is standing in for the AND. These kinds of probabilities are called joint probabilities.
So in this room, we can compare P(dead, vaxxed) with P(dead, unvaxxed).
This, in probability terms, is the difference between a determination of relative risk vs absolute risk.
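The difference is easy to see with some entirely made-up numbers. Nothing below is real data - it’s just a sketch showing how the conditional and joint probabilities are computed from the same population of 20,000 people:

```python
# Hypothetical population (made-up counts, purely for illustration):
vaxxed_dead, vaxxed_alive = 5, 9_995
unvaxxed_dead, unvaxxed_alive = 100, 9_900

total = vaxxed_dead + vaxxed_alive + unvaxxed_dead + unvaxxed_alive

# Conditional probabilities: pick at random from each separate 'room'.
p_dead_given_vaxxed = vaxxed_dead / (vaxxed_dead + vaxxed_alive)
p_dead_given_unvaxxed = unvaxxed_dead / (unvaxxed_dead + unvaxxed_alive)

# Joint probabilities: everyone back in one room, pick at random.
p_dead_and_vaxxed = vaxxed_dead / total
p_dead_and_unvaxxed = unvaxxed_dead / total

relative_risk = p_dead_given_vaxxed / p_dead_given_unvaxxed

print(f"P(dead|vaxxed)   = {p_dead_given_vaxxed:.4f}")    # 0.0005
print(f"P(dead|unvaxxed) = {p_dead_given_unvaxxed:.4f}")  # 0.0100
print(f"P(dead, vaxxed)   = {p_dead_and_vaxxed:.5f}")     # 0.00025
print(f"P(dead, unvaxxed) = {p_dead_and_unvaxxed:.5f}")   # 0.00500
print(f"relative risk = {relative_risk:.2f}")             # 0.05, i.e. '95% effective'
```

Notice that with these numbers the headline “95% effective” is true - yet the absolute probabilities involved on both sides are tiny.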
Notice we’ve only got two ‘variables’ here. Dead/alive and vaxxed/unvaxxed - so we might call these variables life status and vax status. But we know these aren’t the only factors at play. There are co-morbidities, and age, for example.
Now it gets even more complicated to think about the probabilities. We could do our initial split into vaxxed/unvaxxed rooms and once we’ve done that split each of these rooms into under and over 50 rooms, say. So if we selected someone, at random, from the unvaxxed, over 50, room we could write the probability of picking a dead person given that they are unvaxxed AND over 50, for example.
Whenever you have a conditional probability, think about which ‘rooms’ things have been split into - it’ll help keep things a bit clearer.
What I’ve described above is one correct way to think about the probabilities involved (at least within a frequentist interpretation of statistics and probability). It’s far from being ‘straightforward’ unless you’re already familiar with the concepts involved. But look at how I’ve been careful to introduce the idea that we’re picking someone at random from one of these ‘rooms’. This is actually more important than most people realize.
A good way to think about this difference between relative and absolute risk (the difference between a comparison of conditional probabilities and a comparison of joint probabilities) is to consider the issue of lightning strikes. Suppose there was a rubber suit manufactured that was described as being 95% effective against death by lightning strike. What they mean is that if you were to be struck by lightning it would improve your chances of survival. Sounds great. So why don’t we all walk around wearing rubber suits?
It’s important to set a little red warning flag in your mind whenever you hear anybody, particularly a politician, or the Chief Executive of a Pharma company, talk about something that involves probability.
One might mischievously say it’s probable they are trying to deceive you.
Which leads me on to the final thought. Analysing stuff in terms of ‘rooms’ and picking stuff out of sacks etc, is all fine and dandy. When you have concrete data like this it’s useful to adopt an idea of probability that depends on a ratio of number of things of interest divided by total number of things (10 red balls, 90 blue balls in a sack - the probability of picking a red ball is number of red balls divided by total number of balls which works out to be 10/100 or 0.1). This is, essentially, what is known as the frequentist perspective on probability. It compares frequencies of events.
But what about the legitimate question “if a politician uses probability in his argument, what’s the probability that he’s trying to pull the wool over my eyes?”
This kind of question can’t really be properly addressed within a frequentist perspective - which would, if space allowed, lead into an even more tangled discussion to do with the difference between the frequentist and Bayesian perspectives on probability.
Be wary of anyone who tries to make simple ideas sound complex. But be equally wary of those who try to make complex ideas sound too simple.
In short: be wary.
Sir Roger Scruton (in "Fools, Frauds and Firebrands") on Lacan:
"After a few pages of that, convinced that the erectile penis (the primary object of meaning) under bourgeois conditions is no more potent than the square root of minus one (or maybe minus ego, if the ego is less than one, which it probably is), the reader is ready to accept that the distance between the object = a and the ex-sistent = e is no greater than that between Freud and Fraud."
If a politician uses probability in his argument, the probability that he’s trying to pull the wool over my eyes is large. But I might accept that if the politician really understood the maths, instead of just exploiting the fact that his counterpart doesn’t.