Statistics are a fundamental and vital component of science, and a good grasp of statistics is absolutely essential if you want to be able to understand scientific results. Nevertheless, most people have little or no knowledge of statistics, and this leads to a great deal of confusion. Indeed, the vast majority of anti-science arguments that I have heard can be traced back to a poor understanding of statistics. It is therefore my goal to use this series of posts as a brief primer on statistics. In it, I will attempt to explain some of the core concepts which are so often confused and misused. To really be qualified to understand science, you should take several courses in statistics and practice repeatedly, but these posts should at least help you to see through some of the crap on the internet.

The first fundamental topic that I want to discuss is the law of large numbers. In its most basic form, this law states that as you increase the number of repetitions in an experiment, your calculated value will approach the true value. In other words, you need a large sample size to be confident of your results. Let me illustrate using coins. Suppose I flip a coin ten times and get three heads and seven tails. No one would be surprised by this because my sample size is very small. If, however, I flip the coin 100 times and get 30 heads and 70 tails, we would be surprised and become suspicious that the coin is not balanced. This is the case because 100 is a reasonably large sample size (for coin flipping), so we expect the results to be close to 50:50. Further, if I flipped the coin 1,000 times and got 300 heads and 700 tails, we would be extremely surprised. In fact, statistically, it would be almost certain that the coin was not balanced (statistics can never prove anything with 100% certainty, but that is a topic for another post).
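To put rough numbers on the coin example, here is a minimal Python sketch that computes the exact chance of a fair coin giving that few heads (or fewer) at each sample size. The function name `prob_at_most` is my own for illustration, and the calculation assumes a perfectly balanced coin and independent flips.

```python
from math import comb


def prob_at_most(heads: int, flips: int) -> float:
    """Exact probability that a fair coin gives `heads` or fewer heads in `flips` flips."""
    # Count every sequence with 0..heads heads, then divide by all 2**flips sequences.
    favorable = sum(comb(flips, k) for k in range(heads + 1))
    return favorable / 2**flips


# The three scenarios from the text: the same 30% heads at growing sample sizes.
for heads, flips in [(3, 10), (30, 100), (300, 1000)]:
    print(f"{heads} or fewer heads in {flips} flips: {prob_at_most(heads, flips):.3g}")
```

With ten flips, three or fewer heads turns up about 17% of the time, which is why no one is surprised. The same 30% proportion becomes dramatically less likely at 100 flips, and vanishingly unlikely at 1,000.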

Intuitively, most people realize that the law of large numbers is true, yet in practice, they do not apply it. The law tells us that in order to have any real confidence in our results, we need a large sample size. This fact automatically eliminates all anecdotal evidence. For example, the fact that you know three people who were vaccinated against the flu and still got the flu is meaningless because your sample size is tiny. It’s no different from flipping a coin ten times and getting three heads.

Similarly, this law is why scientists place very little weight on case reports and studies with very small sample sizes, but place a great deal of weight on studies with large sample sizes like the recent meta-analysis that used over 1.2 million children to compare autism rates among vaccinated and unvaccinated kids. Based on the law of large numbers, by the time you reach that large of a sample size, you expect to see a significant difference between those two groups if one actually exists. The fact that no statistical difference was found is, therefore, very powerful evidence that vaccines do not cause autism.

So far, everything that I have said is pretty straightforward, and most people are intuitively aware of it, even if they frequently choose to ignore it and cling to anecdotes, but there is a logical extension of the law of large numbers that is less well known. Namely, given an infinite number of repetitions, every outcome that is possible (i.e., has a probability greater than zero) will eventually occur. In other words, if you do something enough times, you will eventually get even the most unlikely outcomes. This is why people win the lottery. The odds of any one ticket winning are extremely low, but so many people play that eventually someone wins. To give another example, the probability of flipping a coin 20 times and getting all heads is 0.00000095. That’s roughly a one in a million chance. If I gave you a coin and had you flip it 20 times, we would all be very surprised if you got all heads, but suppose I gave you a million coins and asked you to flip each one of them 20 times. We would be very surprised if you actually obeyed me, but we would not be at all surprised if one of your coins landed on heads every time. In fact, the expected number of all-heads coins is just under one, and there is roughly a 61% chance that at least one of them will land on heads all 20 times.
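The million-coin example can be checked with a few lines of arithmetic. This is only a sketch, assuming fair coins and independent flips; the variable names are mine.

```python
# Probability that one fair coin lands heads on all 20 flips.
p_all_heads = 0.5 ** 20

# A million coins, each flipped 20 times.
n_coins = 1_000_000

# Average number of all-heads coins we would see if we repeated the whole exercise many times.
expected_all_heads = n_coins * p_all_heads

# Chance that at least one coin comes up all heads:
# one minus the chance that every single coin fails to do so.
p_at_least_one = 1 - (1 - p_all_heads) ** n_coins

print(f"P(20 heads in a row, one coin): {p_all_heads:.8f}")
print(f"Expected all-heads coins out of a million: {expected_all_heads:.2f}")
print(f"P(at least one all-heads coin): {p_at_least_one:.2f}")
```

A one in a million event for a single coin becomes more likely than not once a million coins are in play: on average just under one coin comes up all heads, and the chance of at least one is roughly 61%.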

This has a very powerful impact on understanding seemingly improbable events. For example, on several occasions, I have had someone try to convince me that a particular person was psychic because of their impressive track record of being right, but (even ignoring the many tricks supposed psychics use) when you consider the millions of people who claim to be psychics and the countless predictions that they make, we would expect a few of them to be right most of the time just by sheer dumb luck. Similarly, when someone is told that they have a 1 in 100 chance of surviving cancer and then beats it, many people are quick to jump to the conclusion that it was a miracle. The reality is that 99% of the people who are told that don’t make it, and the 1% who survive are not miracles; they are simply statistical outliers that are perfectly predictable and understandable via the laws of mathematics.
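The psychic example works the same way. The sketch below uses made-up numbers purely for illustration: one million self-proclaimed psychics, each making ten yes/no predictions by pure guessing. None of these figures come from real data, and counting "at least nine out of ten correct" as an impressive track record is my own arbitrary threshold.

```python
from math import comb

# Hypothetical numbers, chosen only to illustrate the argument.
N_PSYCHICS = 1_000_000   # assumed number of people making predictions
N_PREDICTIONS = 10       # assumed yes/no predictions per person
MIN_CORRECT = 9          # assumed threshold for an "impressive" record

# Chance that a single pure guesser gets at least MIN_CORRECT predictions right.
p_impressive = sum(
    comb(N_PREDICTIONS, k) for k in range(MIN_CORRECT, N_PREDICTIONS + 1)
) / 2**N_PREDICTIONS

# Number of guessers we would expect to look impressive by luck alone.
expected_impressive = N_PSYCHICS * p_impressive

print(f"P(one guesser gets >= {MIN_CORRECT}/{N_PREDICTIONS}): {p_impressive:.4f}")
print(f"Expected lucky guessers out of {N_PSYCHICS:,}: {expected_impressive:,.0f}")
```

Under these assumptions, any one guesser has only about a 1% chance of such a record, yet more than ten thousand of them would be expected to achieve it by chance alone. Those are the ones you hear about.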

This brings me to my final point, which relates back to my previous comment about anecdotal evidence. People in the anti-science movement love outliers. In fact, they base many of their arguments around them, but the law of large numbers tells us that outliers are to be expected just by chance and it is the central trends in the data that we need to concern ourselves with. For example, the fact that a small group of vaccinated people got the disease that they had been vaccinated against does not prove that vaccines don’t work because that group comprises the outliers. The vast majority of vaccinated people are safe, and the vaccinated community gets the disease much less frequently than the unvaccinated community. So you see, you have to apply the law of large numbers.

The take-home message here is that you should avoid being duped by anecdotal evidence, small sample sizes, and seemingly improbable events. You absolutely have to have large sample sizes in order to understand what is going on, and once you have those sample sizes, it quickly becomes clear that many of the anecdotes were merely statistical outliers, the results of small sample sizes were due to chance, and seemingly miraculous events were nothing more than the law of large numbers playing itself out.

*Other posts on statistics:*

- Basic Statistics Part 2: Correlation vs. Causation
- Basic Statistics Part 3: The Dangers of Large Data Sets: A Tale of P values, Error Rates, and Bonferroni Corrections
- Basic statistics part 4: Understanding P values