An Unbiased Biased View on Bayesian Statistics
Veritasium + Crash Course Statistics
This week we’re doing something a little different. First, there is a video from science communicator Derek Muller on his excellent YouTube channel Veritasium. At the end, there are two videos from Crash Course Statistics that go deeper into the statistical side of the Bayesian approach.
Note
To be honest, most of you will likely leave this class not thinking or caring about the underlying methodological and philosophical disagreement between the two types of statisticians described below. So why does it matter? In a nutshell, we are at a point in history where the dominant approach affects your everyday life, from credit to Internet speeds to predictions and reviews in sports to any other area where data, in particular “big data,” is present. So while you may not be personally invested in which statistical approach becomes the standard, your life will surely be affected by it, and it’s not a bad idea to know the underlying reasons why.
Full Disclosure
First, I consider myself a Bayesian statistician, so there is an inherent bias and uncertainty in this writing that will become apparent later. The benefit here is that, unlike the frequentist approach, which rejects or minimizes bias and favors outcomes stated with certainty, Bayesians recognize and use bias in their statistics: uncertainty exists as part of the problem, and solutions are given as multiple possibilities with varying levels of probability1.
Bit of History
The Bayesian way of thinking was quite popular in the 18th and 19th centuries, but it fell out of favor in the 20th century, which gave way to the frequentists. During that time, the few Bayesian statisticians who remained active were often shunned and even ridiculed by academics and practitioners alike. It looked like we knew everything about the foundations of statistics, and anything that needed a new approach was simply a matter of extending that viewpoint. But in the mid-to-late 2000s, large sets consisting of a combination of structured, semi-structured, and unstructured data - aka “big data” - began to pop up everywhere, and the technology of the day, in particular computing technology, was simply not able to handle and analyze these data sets. Moreover, for most of the public, computers were just a means to create or run an application; the idea that they could do anything beyond that was deemed science fiction that might become science in the distant future. But as life would have it, not only did we figure out how to teach a computer to learn and make decisions - aka machine learning (ML) - but many ML algorithms were significantly improved by attaching prior beliefs about how a data set should be viewed, creating a situation where a person gets a choice of multiple statistical models and can pick the one that works best for an analysis.
A Line is Drawn
Statisticians and those who practice statistics are at odds over how the world works. One group - the frequentists - thinks deductively and sees probability as a means for finding a single defined outcome, while the other - the Bayesians - thinks inductively and uses probability to describe the chance of many possible outcomes.
Frequentist Statisticians
Statisticians who view the world as deterministic, do not include subjectivity, and see probabilities as a way to explain how random events would look after a bunch of trials are known as frequentist statisticians. To show this through a statistical lens, let’s look at the traditional coin flipping example: As a frequentist statistician, we would
- first suppress any prior ideas of how the outcome should look;
- then flip a coin over and over and record the results; and
- find that, after enough flips, while we will likely never get exactly 50-50 odds, the data shows we are heading that way, with the idea that if we flipped the coin an infinite number of times, 50-50 would be the true outcome.
This is the idea of something being deterministic, where probabilities are used to describe a fixed value, which in this case is always going to be 50-50. For any social scientists, the epistemological perspective is that frequentists for the most part believe in a single truth. A small simulation of this long-run view is sketched below.
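Here is a minimal Python sketch of that long-run, repeated-trial view; the random seed, the coin’s bias of 0.5, and the checkpoint counts are assumptions chosen purely for illustration.

```python
import random

random.seed(42)   # reproducible flips
true_p = 0.5      # the single fixed value a frequentist is trying to estimate
flips = []

# Flip more and more coins and watch the running proportion of heads
# drift toward the fixed "true" value as the number of trials grows.
for n in [10, 100, 1_000, 10_000, 100_000]:
    while len(flips) < n:
        flips.append(random.random() < true_p)   # True counts as heads
    proportion = sum(flips) / len(flips)
    print(f"after {n:>7,} flips the proportion of heads is {proportion:.4f}")
```

The proportion never lands exactly on 0.5 at any checkpoint, but it wanders ever closer, which is exactly the “infinite number of flips” intuition described above.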
Bayesian Statisticians
Statisticians who view the world as probabilistic, allow for prior beliefs about a phenomenon, and update the probability of those beliefs with new evidence are known as Bayesian statisticians. To again show this through a statistical lens, let’s look at the traditional coin flipping example: As a Bayesian statistician, we would
- first have a prior belief about what the probability of getting heads or tails is, say 50-50;
- then flip a coin over and over and record the results; and
- find that, after enough flips, we will never get exactly 50-50 odds, implying that there are multiple possible outcomes, each with its own associated probability.
This is the idea of something being probabilistic, where probabilities are used to describe multiple values, which in this case may be 50-50, but could also be 40-60, 60-40, 20-80, and so on, each associated with some chance of being true. For any social scientists, the epistemological perspective is that Bayesians for the most part believe in multiple truths, with some more likely than others. A small sketch of this updating process follows below.
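Here is a minimal Python sketch of that updating view using the standard Beta-Binomial conjugate model; the Beta(5, 5) prior (a mild belief that the coin is roughly fair) and the hypothetical counts of 62 heads and 38 tails are assumptions chosen purely for illustration.

```python
from scipy.stats import beta

prior_a, prior_b = 5, 5      # prior belief: the coin is probably close to fair
heads, tails = 62, 38        # hypothetical results from 100 flips

# Conjugate Beta-Binomial update: posterior is Beta(prior_a + heads, prior_b + tails)
posterior = beta(prior_a + heads, prior_b + tails)

# Rather than a single fixed answer, we get a distribution of plausible values
# for P(heads), each with its own associated probability.
for p in (0.4, 0.5, 0.6, 0.7):
    chance = posterior.cdf(p + 0.05) - posterior.cdf(p - 0.05)
    print(f"chance the coin's bias is within 0.05 of {p:.1f}: {chance:.3f}")
```

The output assigns some probability to each candidate bias rather than declaring one of them the single truth, which is the “multiple truths, some more likely than others” idea in action.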
Mostly Non Stats Example
Rather than provide an example that uses a bunch of statistics, let’s look at it practically.
Situation: You have misplaced your iPhone somewhere in your apartment or home. You can use the Find My® app to make it beep so you can hear it.
Problem: Which room in your apartment or home should you search?
Approaches
Frequentist: You hear the phone beeping and you use some mental model to help figure out which room the beeping is coming from. So you use inferences from the beeps alone to locate the room in your home you must search to find the phone.
Bayesian: You hear the phone beeping and, along with that mental model, you also know all of the rooms where you have misplaced the phone before, which together help figure out which room the beeping is coming from. So you use inferences from the beeps and prior knowledge to locate the room in your home you must search to find the phone. A small Bayes’ rule version of this example is sketched below.
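Here is a minimal Python sketch of the lost-iPhone example as a Bayes’ rule calculation; the rooms, the prior (how often the phone has been misplaced in each room before), and the likelihoods (how well the beep matches each room) are all made-up numbers for illustration.

```python
# Prior: how often the phone has been misplaced in each room before.
prior = {"bedroom": 0.50, "kitchen": 0.30, "bathroom": 0.15, "closet": 0.05}

# Likelihood: P(what you hear | phone is in this room); the beep sounds
# like it is coming from somewhere near the kitchen.
likelihood = {"bedroom": 0.10, "kitchen": 0.60, "bathroom": 0.25, "closet": 0.05}

# Bayes' rule: posterior is proportional to prior x likelihood, then normalize.
unnormalized = {room: prior[room] * likelihood[room] for room in prior}
total = sum(unnormalized.values())
posterior = {room: value / total for room, value in unnormalized.items()}

for room, p in sorted(posterior.items(), key=lambda kv: -kv[1]):
    print(f"search the {room:<8} with probability {p:.2f}")
```

Keeping only the likelihoods is the frequentist-style answer (search wherever the beep sounds loudest), while multiplying by the prior and normalizing gives the Bayesian posterior over rooms, which can shift the search toward a room where you habitually lose the phone.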
Table of Differences
There are multiple differences between the two sides, not just in how problems should be framed and solved, but in how they see the nature of reality itself. The table below gives some overarching ones.
| Type | Information Used | What is Random? | Type of Reasoning | Terminology | Observed Data |
|---|---|---|---|---|---|
| Frequentist statistician | Outcomes derived strictly from experiments | Observed data. Any data that has been collected from an experiment | Deductive logic | Common terms like p-value, significant, null hypothesis, and confidence interval | Unknown and comes only from experiments |
| Bayesian statistician | Prior beliefs about what the truth might be, which are iteratively updated as experiments progress | Population parameters. Any summary number, like an average or percentage, that describes the entire population | Inductive logic | Uncommon terms like prior probability, noninformative priors, and credible intervals | Known since we already know what we know |
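To make the terminology column a bit more concrete, here is a minimal Python sketch that computes a frequentist 95% confidence interval and a Bayesian 95% credible interval for the same hypothetical coin-flip data; the counts of 62 heads in 100 flips, the normal (Wald) approximation, and the flat Beta(1, 1) prior are assumptions chosen purely for illustration.

```python
import math
from scipy.stats import beta

heads, n = 62, 100
p_hat = heads / n

# Frequentist: normal-approximation (Wald) 95% confidence interval for P(heads)
se = math.sqrt(p_hat * (1 - p_hat) / n)
conf_int = (p_hat - 1.96 * se, p_hat + 1.96 * se)

# Bayesian: 95% credible interval from a flat Beta(1, 1) prior updated with the data
posterior = beta(1 + heads, 1 + (n - heads))
cred_int = (posterior.ppf(0.025), posterior.ppf(0.975))

print(f"95% confidence interval: ({conf_int[0]:.3f}, {conf_int[1]:.3f})")
print(f"95% credible interval:   ({cred_int[0]:.3f}, {cred_int[1]:.3f})")
```

The two intervals come out numerically similar here, but they answer different questions: the confidence interval is a statement about how the procedure behaves over repeated experiments, while the credible interval is a direct probability statement about the parameter given the observed data and the prior.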
So Who is Right?
We honestly just don’t know right now. Similar to physics, where there are two seemingly incompatible sets of laws governing the universe - relativity for large objects and quantum mechanics for small ones - statisticians have the same difficulty regarding inference. Frequentist thinking is by far the dominant approach because it involves fairly “simple” and concrete calculations that can be tested and verified when explaining the phenomena around us. In fact, the approach aligns with the sense of certainty that humans like and need to make sense of the world, but there is a limit to what it can explain, and nowhere has that been more apparent than with computing. Practically speaking, ignoring the human element in a study absolutely makes it easier to perform analyses, but the very act of removing it has an effect from start to finish that results in outcomes that are at the very least imprecise and at most harmful.
The actual explanation for which side is right is likely some combination of both approaches - yet there may also be another method that can unify both into an all-encompassing statistical approach that we just haven’t considered yet, or maybe both approaches are wrong and we need to rethink statistics from the ground up. For now, all we have are these two somewhat related yet also contradictory views on how the world works.
So it may be that frequentist statistics is a needed field in order to get to the Bayesian way of looking at the world, similar to needing an understanding of descriptive statistics before moving into inferential statistics.
Frequentist Statistics
Bayesian Statistics
Fun fact: The fact that you are able to view this exact page right now is based on probability. ↩︎