This is part of a series where I teach statistics from the ground up with no expectation of prior knowledge. Links to whole series can be found at the bottom of this post.
This post is more involved than the previous posts in this series, and for good reason - I intend to persuade you that probability is not what you think it is. Anecdotally, people seem to have quite strong intuitions about what the term means. If they’re told that something has a particular probability, or encounter something with an “obvious” probability (such as drawing a particular card from a deck or rolling a particular number on a die), people appear quite confident about how to interpret statements such as “the probability of drawing a queen is 1/13”.
This intuition lines up with the popular frequentist definition of probability. While it can be phrased more formally, the general definition is as follows,
probability is how often a particular outcome would occur if we repeated something many times1
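In symbols, as a sketch only: if we write n for the number of repetitions and n_A for the number of times outcome A occurs, the frequentist claim is

```latex
P(A) = \lim_{n \to \infty} \frac{n_A}{n}
```

i.e. the probability is the long-run relative frequency of the outcome.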
I will dodge the somewhat thorny debates about how to define the word outcome (as well as trial and experiment) and simply appeal to your intuition. The outcome of a draw from a deck is a card. The outcome of a die roll is a particular side. I will restrict myself to these well-defined problems in this post, both to avoid linguistic confusion and because I believe that if I can persuade you that the statement, “the probability of rolling a one on a die is 1/6” doesn’t mean what you think it does, I can persuade you about the term more generally.
I present to you that the above definition breaks down once we become precise about the meaning of two things - what exactly “something” is, and what it means to “repeat” that thing. To motivate this post, I will use the example of rolling a die. I suggest to you that insofar as the statement “the probability of rolling a one on a die is 1/6” seems coherent (i.e. you can pick up a die and roll it, and will see that you get a one about a sixth of the time), it is for entirely different reasons than you think.
Looping Time
One interpretation of “repeating” is if we could loop time and rerun the exact same roll of a die over and over again. Assuming this was possible, we could say the probability of rolling a one is the frequency of times a one occurs as we observe the same roll many times.
However, the term “roll” is vague. I am going to present two concrete examples to you. The first defines a roll as beginning the moment a die is released from someone’s hand. It tumbles through the air, hits a table, and bounces along until it comes to a stop.
The second example starts earlier. Another person picks up a die from the table, shakes it in their hand, and then releases it. Again, it tumbles through the air, hits the table, and bounces before coming to rest.
I encourage you to honestly ask yourself what frequency you believe a one might occur with for these two examples. Take some time, and reflect.
Now, I will share my opinion with you. In the first example, I don’t believe that a serious claim can be made that a one would occur at anything resembling a frequency of 1/6. I personally would expect a one to occur at one of four frequencies,
Always
Never
Close to always
Close to never
This is simply because the roll of the die is a process in physics. The difference in the second example is that we have introduced a human element, or more precisely, an element which might have free will and be able to make decisions about how and when to roll the die2. In the first example, I posit that whether there is inconsistency in outcomes depends solely on whether there exists structural randomness built into the fabric of our reality (i.e. quantum randomness), and whether it is significant enough to alter which side of the die is rolled3. In the second example, the same possible structural randomness is present, but free will could also influence the roll.
If we believe there is no free will, then there is no functional difference between our two examples. If we do, then our expectation about the probability which will emerge depends on our beliefs about how free will operates. Do we believe it is constrained, such that someone will only ever shake their hand for between 1 and 1.1 seconds? If so, each side will very likely be rolled with a different frequency. If the range is wider, the outcomes may come closer to an even distribution.
However, as we expand the scope of free will, we begin to enter scenarios where our roller might not even roll the die. They might crush it in their hand, or pocket it and walk away. This could also push the frequency of a one being rolled below 1/6.
My goal is not to persuade you of a particular frequency. My point is that stating that there is a “probability of rolling a one” demands a more specific definition of what a “roll” constitutes, as it will be different depending on how the roll is structured. Additionally, any assertion about a particular expected frequency of a one depends upon certain deep assumptions about the nature of randomness in our reality.
Trials Over Time
The other definition of “repeat” is to do so over time. The simplest approach would be to roll the same die over and over again. However, there is an important practical problem: even if we try to keep our rolls very similar, we are at risk of changes occurring. For instance, the die might gradually degrade unevenly such that one of the sides wears down, altering how the die rolls and producing some sides more than others.
I suggest there are better ways to repeat a roll. For instance, we could have one million identical dice produced by the same factory, each checked for symmetry as closely as we can tell. However, if we do this, we again need to form a more precise definition of what constitutes a roll.
One approach would be to try to define a detailed roll, and attempt to repeat that over and over. For example, we could build a robotic arm which we attempt to have repeat the motion of rolling a die as similarly as possible many times. Alternatively, we could simply provide very detailed instructions to someone, such as: pick up the die with it resting with a one facing up, shake for as close to 1 second as you can, release.
I present to you that this is basically just the time looping situation, with some additional room for error due to the limits of robotics or humans to follow instructions. Not only would I not expect either the robot or the human to produce a one with a frequency around 1/6, I believe that insofar as the human gets close, this is because they are rolling the die in different ways and failing to replicate their motions precisely. Put differently - much of the difference in sides being rolled would be a result of observable differences in how the roll occurred.
To hammer this point home, consider the following thought experiment. Imagine that sitting at the table where the die is being rolled is a robot with x-ray vision and an extremely fast processor. I posit to you that this robot would be able to predict which side will be rolled with much more accuracy than 1/64, because of its ability to better identify replication errors and distinguish between materially different rolls.
We are left with one of two options to salvage our definition of probability. The first is to loosen our definition of “repeat”, instead using it in a lazy and casual way. Here, the statement “the probability of rolling a one is 1/6” is actually just saying “on average, rolling a die produces a one about a sixth of the time”. While I believe this is perfectly acceptable for casual speech, note that this is quite different than the original probability definition. Getting to 1/6 basically requires us to be lumping observably different rolls of a die together.
Alternatively, we can preserve the original statement but let go of the idea that the probability of any given side is 1/6. Instead, we could define a particular roll in an extremely precise way, and explore what deviations (if any) would result as a result of true randomness. In this approach, we would functionally be exploring the existence and nature of true randomness (around quantum randomness and free will), as opposed to baking in assumptions about them ahead of time.
You may reasonably ask why I am hammering this point home so fervently. There is a quote from Leonard J. Savage, where he wrote,
It is unanimously agreed that statistics depends somehow on probability. But, as to what probability is and how it is connected with statistics, there has seldom been such complete disagreement and breakdown of communication since the Tower of Babel. Doubtless, much of the disagreement is merely terminological and would disappear under sufficiently sharp analysis.
I agree with Savage, in part. I concur that much of the disagreement is terminological. However, I am not persuaded that statistics must depend on probability. Why do we even need it? As I understand it, the nominal importance of probability stems from the roots of mathematical statistics in set theory, particularly Andrey Kolmogorov’s axioms. His third axiom is,
To each set A in F is assigned a non-negative real number P(A). This number P(A) is called the probability of the event A.
This is the key point. If we are willing to believe that there exists some “true” probability for a particular side being rolled, then the axioms of statistics shift from being simply assertions to being firmly rooted in reality. If this is the case, the goal of statistics becomes to explore and find these true probabilities.
However, it seems to me that how we have historically defined probability is to start with certain “probabilistic facts” (such as the probability of rolling a particular side being 1/6), and then fit a definition to those “facts”. On a technical level, for statistics to be pursuing truth, I believe we must be willing to abandon these assertions and instead explore probability as an exploration of true randomness.
For our purposes, statistics can be incredibly useful without even engaging with the term. For instance, we can return to statistics as quantitative history, and might observe that “I rolled a die 1000 times, and I rolled a one 167 times” or “I observed 1000 die rolls which were the same to the best of my knowledge, and we rolled a one 5 times”. These are potentially useful observations, and we can simply call these posterior frequencies rather than a probability. There is no need to imply that we have discovered some truth about the universe.
Alternatively, we can use mathematics purely theoretically. We could model the roll of a die metaphorically as a draw from a bag with 6 balls, one for each side. This bag-drawing metaphor fits Kolmogorov’s axioms cleanly, and then we simply need to argue that the metaphor is useful and applicable in order to apply conclusions based on this model. Again, no claim of discovering a probability needs to be made here.
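As a minimal sketch of this bag-drawing model (the function name, seed, and draw count here are my own illustration, not part of any statistics library):

```python
import random

def draw_from_bag(n_draws, seed=0):
    """Simulate n_draws draws (with replacement) from a bag of 6 balls,
    one labeled for each side of a die, and return the observed
    frequency of drawing the ball labeled 1."""
    rng = random.Random(seed)  # a fixed seed makes the "randomness" reproducible
    bag = [1, 2, 3, 4, 5, 6]
    ones = sum(1 for _ in range(n_draws) if rng.choice(bag) == 1)
    return ones / n_draws

# The observed frequency tends toward 1/6 as n_draws grows, but any
# finite run is just a posterior frequency, not a discovered "truth".
print(draw_from_bag(60_000))
```

Note that the model satisfies Kolmogorov’s third axiom by construction: each ball is assigned the non-negative number 1/6 before any draw is made.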
We could call these probabilities - but I suggest we shouldn’t. The term is already incredibly tortured and confused. While I am trying to stop using the term probability casually (to further clarify my thinking on the matter), in more formal settings, I believe we should constrain the term to a very strict definition which only captures sources of true randomness (quantum randomness and free will)5.
Lastly, I promised to explain why the statement “the probability of rolling a one is 1/6” appears so cogent. I believe it is because even if we observe many rolls which appear the same to our human eye, the variation in the unobserved conditions over time happens to produce a one around a sixth of the time. This is simpler to explain if I use the example of a computer and random number generation.
When a computer generates a “random” number, what it actually does is produce a number as a deterministic function of some fact of the world, such as the precise millisecond time or the temperature in a particular place. Because these things vary over time, we experience the output of the computer as random - but it is an illusion. In fact, if we knew how the random function worked and could observe the same facts it was based on, we wouldn’t experience this generation as random. I present to you that this is basically what is happening with die rolls.
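A minimal sketch of this determinism in Python, where a hard-coded seed stands in for the observed “fact of the world” (the seed value is arbitrary, chosen purely for illustration):

```python
import random

# The seed stands in for some fact of the world, e.g. a clock reading.
seed_from_world = 1234567

# Two observers who know the same fact of the world...
gen_a = random.Random(seed_from_world)
gen_b = random.Random(seed_from_world)

# ...predict exactly the same "random" die rolls.
rolls_a = [gen_a.randint(1, 6) for _ in range(10)]
rolls_b = [gen_b.randint(1, 6) for _ in range(10)]
assert rolls_a == rolls_b  # no randomness remains once the seed is known
```

To a third observer who cannot see the seed, the same sequence looks random; the randomness lives in their ignorance, not in the generator.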
In short, our experience of a particular probability (and by extension randomness) is overwhelmingly a result of our ignorance. This is the point about the x-ray vision robot above - it experiences randomness at a much lower level than us. I believe that our experience of randomness / probability is almost entirely subjective (if not entirely), rather than being an intrinsic property of dice or other so-called “random” processes in the world.
Part 1: The Roots are Shallow
Part 2: What Is Data?
Part 3: The Death of Probability
The more formal definition is the limit of the proportion of times a particular outcome occurs as the number of repetitions approaches infinity.
Technically, the first example could also be exposed to free will - maybe the person bumps the table or someone opens a door and a breeze flows in. I am ignoring this for simplicity.
The one other source of randomness that I can think of which could change the roll would be the interference of some higher power outside our reality, though I also ignore this for simplicity.
It is possible the roll could vary for reasons which are undetectable even by the robot. The formal point I am making is to distinguish between rolls which are theoretically distinguishable and those that are truly the same (which I consider truly random).
There are also other definitions of probability. These generally fall into one of two categories: those which depend on different epistemic assumptions (such as the propensity and classical definitions) and those which aren’t making truth statements at all and instead focus on using “probability” usefully (which I believe we shouldn’t even call a probability definition). If there is interest, I am happy to write posts digging more into these other approaches.