Case-fatality rates don’t indicate how well a country contained COVID

Recently, I have frequently seen the argument that the USA has actually done a very good job of dealing with COVID because its case-fatality rate (i.e., the percentage of infected individuals who die from COVID) is lower than that of many other countries, including many European countries like Italy and the UK. This claim presents a good opportunity to look at some aspects of data analysis, cherry-picking, and how stats can be abused and misused, so I want to take a very brief look at the claim and simply address the following two questions:

  1. Are case-fatalities a good metric for how well a country responded to the outbreak?
  2. Does the US have a particularly good case-fatality rate?

While I am focusing on those two questions because I think they are instructive, much of what I am going to describe applies to many other arguments floating around the internet regarding COVID (e.g., a faulty comparison I keep seeing of countries that did and did not use hydroxychloroquine). So, as you read this, really try to understand the reasoning behind the answers, because that will help you analyze other claims/questions you encounter.

I want to be 100% clear at the start that this is not a political post. People often make the mistake of assuming that any discussion of any topic even remotely related to politics is inherently a political discussion. That is incorrect. Facts aren’t political. The questions I am going to address are strictly factual, scientific questions. They can (and must) be answered with evidence and facts, not politics. Now, you can certainly use those answers to make political arguments about whom to vote for, policies that should be put in place, etc. but the answers themselves are not political. They are simple facts that are not affected by political views. They are about objective reality, not politics.

Are case-fatalities a good metric for how well a country responded to the outbreak?

How relevant case-fatalities are depends on exactly what is being claimed/discussed. If we want to look at how well countries did at treating infected patients, then case-fatalities are relevant (with a lot of caveats; see question 2), because they help to describe the outcomes for people who became infected. However, many people (including the POTUS) keep using case-fatalities to make a more general argument about how well countries responded to the virus, and that’s a problem.

Case-fatalities are not a valid metric for how well a country contained the virus, because they only describe what happened to people who became infected. A country with 10 cases and a country with 10,000,000 cases could have exactly the same case-fatality rate. Indeed, if a large country allowed its entire population to become infected, and a full 2% of that population died from the virus, it would still have a case-fatality rate that is better than the global average (for countries of comparable size). So, case-fatalities simply don’t show how well a country prevented outbreaks.

If we want to know how well a country did at containing the virus, we need to look at metrics like the number of cases relative to population size. This shows the proportion of the population that became infected, and thus is the relevant metric for looking at how well the spread of the virus was controlled (there are still lots of caveats here, because things like population density have a big impact on spread). When we look at that, the US is the 10th worst country in the world. In other words, there are only 9 countries with more cases per capita (Qatar, Bahrain, French Guiana, Panama, Aruba, Chile, San Marino, Kuwait, and Peru). Further, many of those countries have actually had very small outbreaks, but their populations are so small that it’s a large per-capita rate. At the time of writing this, San Marino only had 735 cases, Aruba only had 2,358, and French Guiana 9,276; so those aren’t really fair comparisons. Regardless of whether you want to include those three, however, the point stands that the US has done one of the worst jobs of any country in the world at containing the virus and has done worse than the European countries people keep comparing it to. That’s not a political statement; that is a simple, empirical fact (again, there are caveats that make it hard to actually precisely rank countries, but it is very clear from the data that the US is on the bad end of the distribution).
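To make the per-capita point concrete, here is a minimal sketch of the calculation. The case counts below are illustrative placeholders, not the real 05-Sep-2020 figures (San Marino’s population is only approximate):

```python
def cases_per_100k(cases, population):
    """Cases per 100,000 people -- the containment-relevant metric."""
    return cases / population * 100_000

# Illustrative numbers, not the real 05-Sep-2020 figures:
print(cases_per_100k(6_300_000, 331_000_000))   # big country, big outbreak

# A tiny population (~34,000 -- approximate) makes even 735 cases
# a large per-capita figure, which is why small countries can be
# unfair comparisons:
print(cases_per_100k(735, 34_000))
```

Note how a country with only a few hundred cases can still end up near the top of a per-capita ranking purely because of its small denominator.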

Does the US have a particularly good case-fatality rate?

Let’s now turn our attention to the question of whether or not the US has a good case-fatality rate, and this is going to bring in several important points about data analysis.

First, for all of the comparisons I’m going to talk about, I’ve limited the data to countries that have had at least 10,000 cases of COVID-19 (I decided to do that before I ran any analyses). The reason for this is that percentages can be very unreliable when dealing with small sample sizes. As a result, including countries with few infections generates a lot of what we call “noise” in the data and it makes it hard to see the patterns that are really there, because those patterns are obscured by chance variation in small sample sizes. Also, only using countries that had fairly large outbreaks allows us to compare apples to apples and reduce some of the confounding factors (more on that in a minute).
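This kind of pre-registered filtering step is trivial to express in code. A minimal sketch with made-up rows (the country names and counts are hypothetical):

```python
MIN_CASES = 10_000  # inclusion threshold, chosen before any analyses

# Hypothetical rows: (country, total cases, total deaths)
rows = [
    ("A", 250_000, 7_500),
    ("B", 4_000, 300),    # small outbreak: its percentage would be noisy
    ("C", 90_000, 2_200),
]

filtered = [row for row in rows if row[1] >= MIN_CASES]
cfrs = {country: deaths / cases for country, cases, deaths in filtered}
print(cfrs)  # country B is excluded before any rates are computed
```

Country B’s 300/4,000 would swing wildly with a handful of extra deaths, which is exactly the noise the threshold is meant to exclude.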

So, when we do that, how well has the US done? There’s actually a lot of variation in these data, ranging from Singapore with a case-fatality rate <0.1% to Italy with a 13% fatality rate. The mean value is 3.1%, but because of a few extremes like Italy, you could make the case that the median is more appropriate, and it is 2.4%. So, how does the US compare? Its case-fatality rate is 3.0%, which is extremely average. It’s ever so slightly better than the mean value, and slightly worse than the median value, but either way it’s pretty close to the average. Not terrible, but also not great.
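The mean-versus-median distinction is easy to see with a small example. The rates below are hypothetical stand-ins for the real distribution (which ran from Singapore’s <0.1% to Italy’s ~13%):

```python
from statistics import mean, median

# Hypothetical case-fatality rates (%); the real data ranged from
# <0.1% (Singapore) to ~13% (Italy).
cfr = [0.1, 1.8, 2.1, 2.4, 2.6, 3.0, 4.5, 13.0]

print(mean(cfr))    # ~3.7 -- pulled upward by the extreme value
print(median(cfr))  # 2.5  -- robust to the extremes
```

A single extreme value (here, the 13.0) drags the mean well above the “typical” country, which is why the median can be the better summary for skewed data like this.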

At this point, you may be thinking, “fine, the US did not do a good job of containing the virus, and it has done an average job of treating cases, but it still did better than most European countries,” but there is more going on here than is revealed by the crude percentages. We also have to consider confounding factors: things other than the variable we are interested in that vary among the groups we are comparing (countries in this case). It is obviously true that there are many differences among these countries other than simply how they treated COVID-19 patients.

To give one obvious example, as I perused these data and looked at the lists of European countries that people kept saying the US did better than, I realized that most of those countries have older populations than the US does. We know that there is a strong relationship between death from COVID and age, with the elderly being far more likely to die following an infection. So, we’d naturally expect countries with older populations to have more deaths per case load (i.e., population age is a confounding factor).


Correlation between median population age of a country and its case-fatality rate.

To actually examine this, I ran a regression between the median age of the population in each country and the case-fatality rate. Unsurprisingly, there is a statistically significant positive relationship (P = 0.003). In other words, just as we’d expect, countries with older populations have higher case-fatality rates on average, and, as I said, the European countries that have higher case-fatality rates than the US tend to have older populations. In other words, population age is at least part of the reason why the US has a lower case-fatality rate than many European countries.
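For readers who want to see what such a regression looks like in practice, here is a sketch on synthetic data (NOT the real country dataset; it simply simulates countries whose case-fatality rate rises with median age, using the fitted line quoted later in this post plus random noise), assuming NumPy and SciPy are available:

```python
import numpy as np
from scipy import stats

# Synthetic illustration only -- NOT the real country data.
# Simulate "countries" whose case-fatality rate rises with median age,
# following y = 0.0009x + 0.0012 plus noise.
rng = np.random.default_rng(42)
median_age = rng.uniform(20, 48, size=60)
cfr = 0.0009 * median_age + 0.0012 + rng.normal(0, 0.01, size=60)

res = stats.linregress(median_age, cfr)
print(f"slope = {res.slope:.5f}, P = {res.pvalue:.4g}, R^2 = {res.rvalue**2:.3f}")
```

Because the synthetic data were built with a real positive slope, the regression recovers a significantly positive relationship; with real data, the P value and R² tell you whether and how strongly such a relationship exists.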

Now you may be wondering how much of a role population age is playing, so to get at least a crude answer to that, let’s dig a bit deeper into the scatter plot. The closer points are to the line, the more they conform to the central tendency of the data (by definition). In other words, when points are right on the line, then the general relationship between the X and Y variables is doing a good job of explaining those points. When points are further from the line, then other factors are at play and are explaining some of the variation in the data (the vertical distance between a point and the line is called the “residual” and is the variation that is not explained by the relationship between X and Y). For the total data set here, the R² value (the coefficient of determination) is 0.092, which indicates that 9.2% of the variation in the entire data set is explained by the relationship between age and case-fatalities. Some countries (like the US) are very close to the line, whereas others (like Italy) are much further. In other words, we know that median population age is explaining some of the variation in the data, and the US is close to where we’d expect it to be, based on that factor. So, age does a good job of explaining why the US is where it is on this graph.

To flip what I mean by that, we can use the equation of this line to predict a country’s case-fatality rate based on its population age. The equation of this line is y = 0.0009x + 0.0012 and the median age for the US is 38.5 (x). So, the predicted case-fatality rate based entirely on population age is 3.6%. Its actual case-fatality rate is 3.0%. Thus, the actual case-fatality is lower than expected (based on this single factor; there are others) but not by a huge amount, and overall, population age does a fairly good job of predicting where America falls.
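You can verify that prediction with a couple of lines of arithmetic:

```python
# Plugging the US median age into the line from the post:
# y = 0.0009x + 0.0012
slope, intercept = 0.0009, 0.0012
us_median_age = 38.5

predicted = slope * us_median_age + intercept
print(f"predicted: {predicted:.1%}")   # ~3.6%
print(f"actual:    {0.030:.1%}")       # slightly below the prediction
```

The residual (actual minus predicted, about −0.6 percentage points) is the part of the US’s position that age alone does not explain.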

What about the rest of the variation? Well, there are lots of other confounding factors. Things like high population density (which results in very rapid, very localized outbreaks that overwhelm health-care systems) can have a huge impact. Further, some of the variation will inevitably be due to chance.

So, when we add all of that up, how has the US done at actually preventing deaths once people become infected? It’s hard to rank it precisely without doing a full, in-depth statistical analysis that takes all confounding factors into account, but based on the available data, it has done OK at treating patients once they become infected, but not exceptionally well. It has had an average response, and we know that a lot of the variation in the data is explained by confounding factors like age (also, again, that is just for treating people once they became infected; preventing infections is another story).


In short, you cannot use case-fatality rates to argue that a country did well or poorly at containing the virus and preventing its spread, because that metric shows what happened when people became infected, rather than reflecting the proportion of the country that became infected. It is possible for 100% of a population to be infected with COVID and still have a good case-fatality rate. Further, the USA has an average case-fatality rate. It’s not great, but it’s not terrible either. However, these comparisons are inherently problematic because there are many confounding factors. Things like the median age of a population have an effect, and many of the countries that have worse case-fatality rates than the USA also have older populations than the USA. We need really rigorous statistical analyses that measure and account for all of the confounding factors to get a clearer picture of why case fatalities are high in some countries and low in others.

Finally, I want to stress again that none of this is political. These are simple facts. You can certainly use those facts to make political arguments, but the facts themselves are not political. They are objective statements of reality.

Note: Although case-fatalities are not a good metric of how well a virus was contained, they can be influenced by the course a virus took in the country. A very large, localized outbreak that overwhelms the healthcare system will have a higher case-fatality rate than an outbreak that is very spread out (this is why there was so much focus on flattening the curve). However, it is still not a good metric because of all the reasons I’ve listed, and because there are many scenarios in which it is influenced by the outbreak course, but not in a way that is reflective of how well the outbreak was contained. Consider, for example, one country that had a single, localized outbreak that was contained and didn’t spread beyond that area, but did overwhelm the resources in that area, and compare that to another country that never had that sort of extreme local outbreak, but failed to contain the virus and let it infect most of the country. The former clearly did better at containing the virus and will have far fewer cases and deaths per capita but will also likely have a higher case-fatality rate.

Update: As several people mentioned on social media, it is also worth pointing out that case-fatality rates are sensitive to the level of testing employed. When little testing is done, it often results in a high case-fatality rate because many people with COVID aren’t included. So for case-fatalities to be accurate, you need broad testing. Additionally, making comparisons can be complicated by differences in testing procedures and standards.

Data sources

  • The data on infections per capita were obtained from WorldOMeter on 05-Sep-2020
  • Data on population ages were obtained from on 04-Sep-2020
  • Date on case-fatalities were obtained from Johns Hopkins on 04-Sep-2020
    (both worldometer and Johns Hopkins have case-fatality data which are in close agreement with each other, but I considered Johns Hopkins to be more reputable and therefore used their data)


COVID comorbidities are not analogous to car crashes: Debunking the 6% mortality claim

Recently, the CDC released data on COVID comorbidities, including data showing that 6% of COVID-19 deaths only listed COVID on the death certificate, while the remaining 94% of COVID deaths also listed other conditions. Many have jumped on this as proof that COVID is far less deadly than previously claimed, and they are arguing that most reported COVID deaths are actually just people who died of some other condition while happening to have COVID. In particular, I keep seeing an analogy of someone who has COVID getting hit by a car, then the death being attributed to COVID. This is a very bad analogy (and faulty argument in general) that horribly mischaracterizes these data. So, I want to briefly explain what is actually going on.

First, you need to realize that when a patient dies, doctors list all of the factors that contributed to the death. This often includes multiple conditions, at which point we call them “comorbidities.” In the case of COVID, two main things are happening. First, in some cases, people have a pre-existing condition that interacts with COVID and makes them more likely to die from COVID. Second, COVID leads to conditions that then contribute to the death.

Let’s start with the pre-existing condition situation. We know that people with some health conditions are more prone to die from COVID than people without those conditions, because those conditions make them more vulnerable to COVID. Thus, there is an interaction between COVID and the pre-existing condition, with both contributing to the death. Importantly, however, in most cases, the person would not have died at this particular point in time had it not been for COVID. In other words, something like an existing respiratory problem makes people more sensitive to COVID, resulting in a higher death rate when infected with COVID. That does not mean that COVID wasn’t a key factor in their deaths. It is simply that it was not the only factor.

By way of analogy, imagine that someone with asthma gets trapped in an environment with lots of smog, ultimately resulting in an inability to breathe and subsequent death. What killed them? Well, both the asthma and the smog played a role. The smog was a serious problem because of the asthma, but conversely, they could have kept on living with the asthma had it not been for the smog. If we could have prevented them from being exposed to the smog, they would have lived.

So yes, for many people, COVID is fatal because of interactions with other conditions, but that still means that COVID was fatal. It still means that they would have lived had it not been for COVID.

To give one final analogy, imagine a disease that is far deadlier in men than in women. Imagine that we look at the mortality statistics from that disease and see that 94% of deaths were from men. It would clearly be absurd to say, “they didn’t die from the disease, it was being a male that killed them.” That would obviously be nuts. It would be apparent to everyone that there was an interaction between the disease and sex that causes men to be more sensitive to it. Likewise, there are interactions between many pre-existing conditions and COVID that make people with those conditions more sensitive to COVID and more likely to die from it.

On the flip side, many of the reported comorbidities are actually caused by COVID. Look at the data from the CDC. The single most common comorbidity category* (68,004) was influenza/pneumonia. These diseases are often secondary infections that happen as a result of viral infections. Similarly, respiratory failure was present in 54,803 cases. Again, this is something that we know COVID causes. So many of these comorbidities are actually caused by COVID!

*Technically, the most common category was “other” which includes a very wide range of conditions that were grouped together because each was too uncommon to merit its own category. Thus the influenza/pneumonia category was the most common category for discrete diseases, rather than the large hodgepodge of conditions.

By way of analogy, the argument being made by science deniers is no different from someone bleeding out from a gunshot wound, then someone else saying, “bullets aren’t dangerous, because she died from blood loss, not the bullet.” That’s obviously a dumb argument. She only lost the blood because of the bullet. Likewise, many people are only dying from conditions like respiratory failure or heart failure because of COVID-19.

It is also worth noting that, as is often the case, this argument is straight out of the anti-vaccine playbook. For diseases like measles, secondary infections with diseases like pneumonia often contribute to children’s deaths. Thus, anti-vaxxers incorrectly argue that measles isn’t deadly because the pneumonia is what killed them. Just like COVID and my gunshot example, however, they only developed pneumonia because of measles.

So now, with all of that in place, let’s circle back to the analogy of someone getting hit by a car. I like analogies a lot. I have frequently argued that they are valuable for testing whether consistent reasoning is being applied. However, as I have explained before, for the analogies to be useful, they must follow the same logical structure as the original argument. That is very clearly not the case here. Someone who happens to have COVID getting hit by a car is a very, very different thing from either someone with a pre-existing condition that predisposes them to complications from COVID dying from an interaction between the condition and COVID or COVID itself causing a secondary condition.

Do you see the difference? The vast majority of comorbidities listed are directly related to COVID either as a factor that exacerbates the situation or as a result of COVID. In contrast, the car accident has nothing to do with COVID. They are not analogous, and anyone who would use such a clearly terrible argument obviously does not know what they are talking about.

Having said all of that, there are almost certainly some cases in this database where COVID truly wasn’t the cause. There are probably some cases where someone who had COVID just happened to have a heart attack that would have happened without the COVID, or where someone who had COVID was in an accident, but when you start looking closely at the data, those are clearly a very tiny minority, and the vast majority of comorbidities relate to COVID. Indeed, beyond these data and all the data looking at how COVID attacks the body, we also know that there have been far more deaths this year in the US than there were during the same time period last year (Weinberger et al. 2020). Indeed, there are more excess deaths than the total number of reported COVID deaths. Understanding exactly what that means is very complicated because there are many contributing factors. We may be underestimating COVID deaths, but also, there may be increased deaths due to factors like people not seeking medical help for conditions for which they normally would seek help. Conversely, things like a decrease in car accidents could pull the number in the other direction. However, several pieces of evidence (such as a spike in excess deaths in places that had large outbreaks with many reported COVID deaths; e.g., New York City) indicate that COVID is a key factor in the number of excess deaths seen this year, and it is very unlikely that we are grossly overestimating the COVID mortalities.

As others have pointed out, the correct way to look at this 6% figure is not that only 6% of reported COVID deaths were actually from COVID. Rather, it means that of all the people who died from COVID, 6% did not have any other reported conditions. In other words, these data show that some people are more vulnerable to COVID than others due to existing health conditions (which we already knew) and COVID often results in secondary problems which contribute to patients’ demise (again, which we already knew). Stop trying to twist science to fit your personal agenda and look rationally at the facts. Think critically and don’t blindly believe something just because you saw it on Facebook or Twitter.


What does “statistically significant” mean?

Lately, social media has been flooded with people sharing studies about various aspects of COVID. This is potentially great. I’m all for people being more engaged with science. Unfortunately, many people lack a good foundation for understanding science, and a common point of confusion is the meaning of “statistically significant.” I’ve written about this at length several times before (e.g., here and here), so for this post, I’m just trying to give a brief overview to hopefully clear up some confusion. In short, “statistically significant” means that there is a low probability that a result as great or greater than the observed result could arise if there is actually no effect of the thing being tested. Statistical tests are designed to show you how likely it is that the sample in a study is representative of the entire population from which the sample was taken. I’ll elaborate on what that means below (don’t worry, I’m not going to do any complex math).

Let’s imagine a randomized controlled drug trial where we take 100 patients, randomly split them into two groups (50 people each), give one group a placebo, give the other group the drug, then record how many of them develop a particular disease over the next month. In the control group, 20% of patients (10 individuals) develop the disease, whereas in the treatment group only 10% (5 patients) developed it. Does the drug work?

This is where the confusion comes in. Many people would look at those results and say, “obviously it helped, because 10% is lower than 20%”. When we do a statistical test on it (in this case a chi-square test), however, we find that it is not statistically significant (P = 0.263), from which we should conclude that this study failed to find evidence that the drug prevents the disease. You may be wondering how that is possible. How can we say that taking the drug doesn’t result in an improvement when there was clearly a difference between our groups? How can 10% not be different from 20%?
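For anyone who wants to check that P value themselves, it can be reproduced in a few lines (assuming SciPy is available; note that `chi2_contingency` applies Yates’ continuity correction to 2×2 tables by default, which is what yields the 0.263 figure):

```python
from scipy.stats import chi2_contingency

# 2x2 table: rows = control / treatment; columns = disease / no disease
table = [[10, 40],   # control: 10 of 50 developed the disease
         [5, 45]]    # treatment: 5 of 50 developed it

chi2, p, dof, expected = chi2_contingency(table)
print(f"P = {p:.3f}")   # ~0.263 -- not statistically significant
```

Even though 10% looks very different from 20%, the test says a gap this large arises quite often by chance with only 50 people per group.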

To understand this, you need to understand the difference between a population and sample and the reason that we do these tests. This hypothetical experiment did find a difference between the groups for the individuals in the study. In other words, the treatment and control groups were different in this sample, but that’s not very useful. What we really want to know is whether or not this result can be generalized. We really want to know whether, in general, for the entire population, taking the drug will reduce your odds of getting the disease.

To elaborate on that, in statistics, we are interested in the population mean (or percentage). This may be a literal population of people (as in my example) but it applies more generally, and is simply the distribution of data from which a sample was taken. The only way to actually know the population mean (or percentage) is to test the entire population, but that is clearly not possible. So instead, we take a sample or subset of the population, and test it, then apply that result to the population. So, in our example, those 100 people are our sample, and the percentages we observed (10% and 20%) are our sample percentages.

I know this is starting to get complicated, but we are almost there, so bear with me. Now that we have sample percentages we want to know how confident we can be that they accurately represent the population percentages. This is where statistics come in. We need to know how likely it is that we could get a result like ours or greater if there is no effect of the drug, and that’s precisely what statistical tests do. They take a data set and look at things like the mean (or proportion), the sample size, and the variation in the data, and they determine how likely it is that a result as great or greater than the one that was observed could have arisen if there is no effect of treatment. In other words, they assume that the treatment (drug in this case) does not do anything (i.e., they assume that all results are from chance), then they see how likely it is that a result as great or greater than the observed result could be observed given the assumption that all results are due to chance. Sample size becomes important here, because the larger the sample size, the more confident we can be that a sample result reflects the true population value.

So, in our case, we got P = 0.263. What that means is that if the drug doesn’t do anything, there is still a 26.3% chance of getting a result as great or greater than ours. In other words, even if the drug doesn’t work, there is a really good chance that we’d get the type of difference we observed (10% and 20%). Thus, we cannot be confident that our results were not from chance variation, and we cannot confidently apply those percentages to the entire population.

Having said that, let’s see what happens if we increase the sample size. Imagine we have 1,000 people, but still get 10% for the treatment group and 20% for the control group. Now we get a highly significant result of P = 0.00001. In other words, if the drug doesn’t do anything, there is only a 0.001% chance of getting a difference as great or greater than the one we observed. Why? Well, quite simply, the larger the sample the more representative it is of the population, and the less likely we are to get spurious results. From this, we’d conclude that the drug probably does have an effect.

A jar of red and blue marbles labeled “population,” with five randomly selected marbles labeled “sample.”

Another useful way to think about this is to imagine a jar full of red and blue marbles (this is your population). You want to know if there are more of one color than the other, so you reach in and randomly grab several (this is your sample). Suppose you get more blue than red. Can you conclude that there are more blue marbles than red marbles in the jar (population)? This clearly depends on the size of your sample. The larger it is, the more confident you can be that your sample represents the population.

To try to illustrate all of this, imagine flipping a coin. You want to know if a coin is biased, so you flip it 10 times and get 4 heads and 6 tails. That is your sample: 40% heads. Now, is the coin biased? In other words, if you flipped the coin 10,000 times, would you expect, based on your sample, that you’d get roughly 40% heads? How confident are you that your sample result applies to the population? You probably aren’t very confident. We all intuitively know that it is entirely possible for even a totally fair coin to give 4 heads and 6 tails in a mere 10 flips. No one would scream, “but in your test the coin was biased!” We all realize that the sample may not be representative of the population.
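That intuition can be checked with exact arithmetic using only the standard library (this exact binomial calculation is a complement to the chi-square-based P values quoted elsewhere in this post, not a reproduction of them):

```python
from math import comb

# Exact probability that a fair coin gives a split at least as
# lopsided as 4 heads / 6 tails in 10 flips (anything but exactly 5/5).
n = 10
p_extreme = sum(comb(n, k) for k in range(n + 1) if k <= 4 or k >= 6) / 2**n
print(p_extreme)   # ~0.754: very likely even for a perfectly fair coin
```

In other words, a fair coin misses a perfect 5/5 split about three times out of four, which is why no one takes 4 heads in 10 flips as evidence of bias.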

Now, however, imagine that you flip it 100 times and get 40 heads and 60 tails. This is your new sample. Now how confident are you? Probably more confident than before, but also probably not that confident. Again, we all realize that there is chance variation. Indeed, if we ran the actual stats on this, we’d get P = 0.2008. In other words, this test says, “assuming that the coin is not biased, there is a 20.08% chance of getting a difference as great or greater than a 40%/60% split.” But what if we did 1,000 flips and got 400 heads and 600 tails? Now, we’d probably think that the coin truly was biased. At that point, we’d expect that our sample probably does apply to the population and that continuing to flip the coin will continue to yield percentages of roughly 40% and 60%. If we actually run the stats on that, our conclusion would be justified. The P value is less than 0.00001, meaning that if the coin is not biased, there is less than a 0.001% chance of getting a result as great or greater than ours. This would be good evidence that the coin itself (the population) is likely biased, and our results are unlikely to be from chance variation in our sample.

That is, in a nutshell, what statistical tests (at least frequentist statistical tests) are doing, and we only consider something to be “statistically significant” when the probability that a result like it (or greater) could arise (given the assumption that there is no effect of treatment) is below some pre-defined threshold. In most fields, that threshold is P = 0.05. In other words, we only conclude that the sample results apply to the population if there is less than a 5% chance of getting a result as great or greater than the one we observed if the thing being tested actually has no effect.

Note: An important topic not covered here is confidence intervals. These give a range of plausible population values for a given sample value. So, for example, if you had a sample mean of 20 with a 95% confidence interval of 10–30, then (strictly speaking) if you repeated the study many times, 95% of the intervals constructed this way would contain the population mean; informally, this is often described as being 95% sure that the population mean is between 10 and 30.
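A rough sketch of how such an interval is computed, with made-up numbers (z = 1.96 is the normal-approximation multiplier; a t-multiplier would be more appropriate for a sample this small):

```python
from math import sqrt
from statistics import mean, stdev

# Made-up sample; rough 95% CI for the mean via the normal approximation.
data = [18, 22, 19, 25, 21, 17, 23, 20, 24, 16]
m = mean(data)
se = stdev(data) / sqrt(len(data))   # standard error of the mean
ci = (m - 1.96 * se, m + 1.96 * se)
print(m, ci)
```

Notice that the interval narrows as the sample grows (the standard error shrinks with the square root of n), which mirrors the sample-size story above.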

Note: This post was revised to change statements that a P value showed the probability that a result as great or greater than the one that was observed could arise by chance to statements that the P value showed the probability that a result as great or greater than the one that was observed could arise if there is no effect of the treatment.


Increased testing does not explain the increase in US COVID cases

The US is experiencing another sharp increase in COVID19 cases. This is a simple fact, but as always seems to be the case in today’s world, this fact is being treated as an opinion. Countless people (including prominent politicians and even the president) are claiming that cases are not actually increasing, and that the apparent increase is simply the result of increased testing. This claim is dangerous and untrue, but it also offers a good opportunity to teach some lessons in data analysis. Obviously, an increase in testing will result in an increase in the number of cases that are documented; that much is true. But that doesn’t necessarily mean that the entirety of the increase is from increased testing. So how can we tell whether the true number of cases is increasing? There are multiple ways to examine this, and I’m going to walk through several of them and try to explain the stats in a non-technical way so that everyone can really grasp these concepts.

To begin with, I’m not actually going to talk about coronavirus. That topic has, unfortunately, become such a political battleground (even though it should be entirely scientific) that it is difficult to get people to think clearly and without bias about it. So instead, let’s start by talking about Willy Wonka’s chocolate factory. Like most chocolate factories, it sometimes gets insects in its chocolate bars, and it tests subsets of them to see how often this occurs. This situation is analogous to testing for a disease, and the math is the same, so let’s use it as an example to understand the math, then we’ll apply that understanding to coronavirus.

For the sake of example, let’s say that Wonka produces 10,000 chocolate bars a day, and examines 2,000 of them for the presence of insects (these are the tests). Further, as you might have guessed, his chocolate factory has rather lax hygiene standards, so out of those 10,000 bars, 1,000 actually have insects. How many do we expect to have insects (i.e., be positive cases) in the sample of 2,000 tests? This is easy to calculate. 1,000 is 10% of 10,000, so we expect 10% of the tests to be positive. Thus, out of 2,000 tests, we expect to get 200 bars with insects (i.e., documented cases; note that I am acting as if testing is random to make the math easy for all to follow; this is a simplification, but doesn’t actually change the point; see note at the end).
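If you want to check that arithmetic yourself, it is a one-line calculation (the numbers are from the Wonka example):

```python
# The arithmetic above as a tiny function (numbers from the Wonka example).
def expected_positives(total_bars, bars_with_insects, n_tests):
    """Expected number of positive tests under random sampling."""
    return (bars_with_insects / total_bars) * n_tests

print(expected_positives(10_000, 1_000, 2_000))  # 200.0
```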

Now, suppose that Wonka increases the testing and gets higher numbers of positives (more cases). What does that mean? It could simply mean that the number of bars with insects is unchanged, but more are found due to more testing. However, it is also possible that both testing and the true number of bars with insects are both increasing. How can we tell which is occurring?

Figure 1: Changes in the percent of tests that are positive under different scenarios. For each line, testing increases by 10% of its starting value each day, but the number of actual cases (not observed cases) varies, and the lines show the percent of tests that were positive. Blue lines show a decrease in actual cases over time, the grey line shows no change in actual cases, and the red lines show an increase in actual cases. As you can see, anytime that the total number of cases increases, the percent of tests that are positive will increase, whereas if the total number of cases is unchanged or decreases, the percent of positives will either remain stable or decrease, even if testing increases.

The answer lies in the percentage of tests that are positive. If the actual number of bars with insects is unchanged, and the increase in positives is simply due to increased testing, then the percent of tests that are positive will remain constant even though the total number of positive tests goes up (Figure 1). Think about the math from earlier. 10% of bars have insects. So, we expect roughly 10% of tests to be positive, regardless of how many tests we do (though the percentage will be more accurate with a larger sample size). So, if we do 2,000 tests, we expect 200 bars with insects (10% positive). If we do 4,000 tests, we expect 400 bars with insects (10% positive). If we do 6,000 tests, we expect 600 bars with insects (10% positive), etc. The total number of bars with insects (cases) increases as testing increases, but the percentage of those tests that are positive remains the same. As another example, imagine that you have a bag with 500 blue marbles and 500 red marbles. You reach into the bag and grab a handful. You expect to get roughly 50% of each color regardless of how many you grab (though you expect the value to be closer to 50% [more accurate] as sample size increases). It’s the same with testing.

So, if the increase is entirely from testing, the percent of tests that are positive should be unchanged, but what happens if the number of bars with insects is actually decreasing while testing is increasing? Well, the total number of positive test results may either go up or down (depending on the sizes of the decrease in insects and increase in testing), but the percentage of tests that are positive will always go down (Figure 1). Going back to the example, we expect 10% of tests to be positive when 1,000 out of 10,000 bars actually have insects and 2,000 tests are conducted. Now, suppose that the number of bars with insects is cut in half (500) and testing is tripled (6,000). Now, we expect only 5% of tests to be positive, but 5% of 6,000 is 300. So, while the total number of observed positive cases increased, the percent of tests that were positive decreased. This tells us that the actual number of bars with insects is decreasing, despite the increase in testing.

Conversely, if more bars actually have insects, we expect a higher percentage of tests to be positive, even if the level of testing increases. Imagine, for example, that the number of bars with insects increases to 2,000 out of 10,000, while the number of tests also doubles (4,000). Now, we expect 20% of tests to be positive, resulting in 800 cases. See how that works?

I have illustrated all of these patterns in Figure 1, showing the hypothetical situation I have been describing with changes in testing and, sometimes, changes in the actual number of bars with insects over a 20-day period. Each line shows the percent of tests that were positive. The grey line shows the situation where testing increases but the actual number of bars with insects (cases) does not, the blue lines show increased testing with a decrease in the actual number of cases, and red lines show increased testing coupled with an increase in the actual number of cases. As you can hopefully see, the only way to get a decreasing percentage of positive tests is if the actual number of cases (not simply the number of documented cases) decreases, and any time that the actual number of cases increases, the percent of tests that are positive will also increase. This percentage of positive tests is key for understanding what is actually happening.
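If you would like to play with these scenarios yourself, here is a small sketch along the lines of Figure 1. The parameter choices are my own (not necessarily the exact values behind the figure), and it assumes random testing, as in the examples above.

```python
# Each day, testing increases while the actual number of bars with
# insects decreases, stays flat, or increases. Parameters are my own
# illustrative choices, not the exact ones behind Figure 1.
def simulate(actual_start, actual_daily_change, tests_start,
             tests_daily_increase, population=10_000, days=20):
    """Return (percent of tests positive, positive tests) for each day."""
    pct, positives = [], []
    for day in range(days):
        actual = actual_start + actual_daily_change * day
        tests = tests_start + tests_daily_increase * day
        pct.append(100 * actual / population)   # depends only on prevalence
        positives.append(tests * actual / population)
    return pct, positives

# Actual cases cut roughly in half while daily testing nearly triples:
pct, pos = simulate(1_000, -25, 2_000, 200)
print(pct[0], pct[-1])  # 10.0 5.25  -> percent positive falls
print(pos[0], pos[-1])  # 200.0 304.5 -> yet total positives still rise
```

Note how the decreasing-cases scenario produces more total positives each day even though the percent positive falls steadily, which is exactly the pattern described above.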

Figure 2: Percent of coronavirus tests that were positive for June. The first panel shows the data for the whole country, and the second shows two states with large outbreaks (Florida and Arizona). They are presented in separate panels simply so that the change for the whole country is not obscured by the much larger change for individual states. Data were downloaded from the Covid Tracking Project late on 28-June-20.

Now, with all of that in mind, let’s look at coronavirus in the US. If the situation is truly improving, the actual number of cases is truly decreasing, and the apparent recent increase in cases is just a result of increased testing, as many argue, then we should see that the percent of tests that are positive has continued to decrease. That is not, however, what we see. It was decreasing for a while, but if we look at June (when things have been opening back up and when the spike in cases occurred) we see a statistically significant (P < 0.0001) increase in the percentage of tests that are positive (Figure 2). In other words, the increase in tests simply cannot explain the entirety of the increase in cases. It probably is a contributing factor, but the actual number of coronavirus cases in the US is going up rapidly. That is a fact. To be clear, exactly what is happening varies by state, and some states are experiencing decreases in the rates of positive tests, but many others are experiencing sharp increases, particularly states like Florida and Arizona (Figure 2). They are very much experiencing viral outbreaks (Johns Hopkins has some very nice data and graphs for state data that I recommend looking at).

There is another really useful way to examine this, which is to look at the percent change in the number of tests and the number of observed cases (positive tests). Sticking with the chocolate bar example and using the data presented in Figure 1, we find that when testing increased by 100 tests each day, but the actual number of cases remained constant, the number of tests increased by 145% over time and the number of positive tests per day (cases) also increased by 145%. This is what we expect if the actual number of bars with insects is constant, but the testing increases: the percent change should be the same for both the total number of tests and the number of observed cases (positive tests). When testing increased by 100 tests a day and the actual number of bars with insects increased by 1% of the original level each day, however, the percent change in tests was still 145%, but the number of positive tests (cases) increased by 216%, and when actual cases increased by 5% of the original level each day, the number of positive tests increased by 500%! Do you see how that works? If the increase is entirely from increased testing (while the actual number of cases remains the same), then both the increase in tests and the increase in observed cases will match. In contrast, if actual cases are also increasing, then the increase in positive tests will outpace the increase in testing.
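As a quick sketch of that comparison (the before/after values below are hypothetical 7-day averages I made up to illustrate the arithmetic, not figures from any real dataset):

```python
# If only testing grows, tests and positives grow by the same percentage;
# if positives outpace tests, actual cases must be rising too.
# The before/after values are hypothetical 7-day averages.
def percent_change(old, new):
    return 100 * (new - old) / old

tests_before, tests_after = 400_000, 560_000
cases_before, cases_after = 20_000, 36_000

print(percent_change(tests_before, tests_after))  # 40.0
print(percent_change(cases_before, cases_after))  # 80.0 -> cases outpace tests
```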

So, what do we find for coronavirus in the US? Well, if we compare the last 7 days of May (7-day average) to the past 7 days of June (with the 28th being the most recent date based on when I downloaded the data), we find that the number of tests increased by 40.5%, while observed cases increased by 83.0%! In other words, the increase in cases substantially outpaces the increase in testing, clearly indicating that we are experiencing a real increase in coronavirus cases, not simply an increase in known cases due to increased testing. The situation is even more dire when you start looking at the states where the largest outbreaks are occurring. In Arizona, for example, again comparing the last 7 days of May to the past 7 days of June, we find that testing increased by 116.9%, but daily new cases increased by 498.2%. Florida is a similar story. Testing has increased by 88.3%, but daily new cases have increased by an astounding 726.7%! This is undeniably an outbreak.

Indeed, you can get a sense for these general trends just by looking at a comparison of testing rates and numbers of new cases over time (Figure 3). As you can see, at first, testing lagged well behind cases as we experienced the initial outbreak. Then, cases started declining, even though the number of tests continued a steady increase. It is only in the past few weeks (i.e., since social distancing restrictions, closures, etc. have been lifted) that we see a spike in cases. Further, the recent spike in cases does not correspond to a spike in testing. Testing has been increasing at a steady rate, whereas cases suddenly shifted from a steady decrease to an exponential increase. In other words, the number of observed cases does not track well with the number of tests. If the current increase in cases were really a result of increased testing, then new cases should have been tracking with testing all along. They should have continued to increase after March, because testing increased. That’s not at all what we see, however. Again, testing simply can’t explain the trends. That doesn’t mean that there is no impact of testing, obviously there is, but it is clearly not the key thing driving the trends.

Figure 3: Coronavirus testing and cases for the USA. As you can see, cases are a poor match for testing, indicating that testing alone does not explain the recent increase in cases. The x-axis labels show the start of each month. Data were downloaded from the Covid Tracking Project late on 28-June-20.

Yet more evidence comes from hospitalization rates. The “it’s just more testing” argument relies on the notion of many asymptomatic people (or at least people with very mild cases) who have only been detected recently due to increased testing. If that were the case, then hospitalization rates should be remaining level or going down (if the virus is truly going away), yet many states are experiencing increased hospitalization rates, with the Texas Medical Center (an enormous complex) hitting 100% capacity for its ICU. That simply cannot be explained as a result of increased testing.

Fortunately, deaths have not started spiking yet. There are several reasons for this. One is that, this time, more young people are getting the disease. Another is simply that death rates inevitably lag behind infection rates, and it is very likely that death rates will increase in the coming weeks (though many experts are hopeful that we will be able to avoid the type of enormous spike we saw a few months ago).

In short, an actual examination of the data clearly and unequivocally shows that the current increase in coronavirus cases in the US cannot be explained simply as a result of increased testing. The percent of tests that are positive is increasing, which is a clear indication that the actual number of cases is increasing. Further, in states like Arizona and Florida, the numbers are truly shocking, with the increases in new cases massively outpacing the increases in testing. We are clearly still in the middle of a deadly outbreak, and it is getting worse. This isn’t a liberal conspiracy to undermine Donald Trump; it is a fact, and facts don’t change based on your political party.

Note: Please refrain from political comments. This post is about science and evidence and comments should likewise be about science and evidence (see Comment Rules).

Note: someone might object that my examples assume random testing, while testing is actually somewhat targeted, and people who are symptomatic or are known to have been in contact with someone who is infected are more likely to be tested. This fact is true, but actually doesn’t substantially change anything I’ve said. It does affect the exact percentages but doesn’t change my point about the trends. It is still true that the only way to get an increasing percentage of positive tests while the testing rate is increasing is for the actual number of total cases to be increasing (technically, this could also happen if we learned to do a much better job at targeting our tests, but there is no indication of this that I have seen; certainly not enough to cause the numbers we are seeing, and it still would not explain the increases in hospitalization rates).

Data source: The data I presented here were downloaded from the Covid Tracking Project late on 28-June-20.


Science is a path to knowledge

There are a lot of misconceptions about what science actually is, and, as a result, there are a lot of incorrect conclusions about the reliability and utility of science. I frequently encounter people who expect science to give absolute answers. They act as though science is a method for proving what is true with 100% certainty. As a result, they view cases where science led to an incorrect conclusion as evidence that science itself is flawed. You can clearly see this in arguments that a current scientific result doesn’t need to be accepted because “science has been wrong before” or “there used to be a scientific consensus that the earth was flat” (there wasn’t, but that’s another topic), etc. Similarly, there is a false view that a scientific conclusion is either 100% right or 100% wrong. In reality, science is a path to knowledge. It is a way of testing ideas and slowly building a body of knowledge based on the results of those tests. Sometimes, the path takes wrong turns, but unlike every other path to knowledge that has ever been invented, science is systematic and self-correcting and steers itself back in the correct direction, resulting in a gradual accumulation of knowledge.

Before I go any further, I want to acknowledge that this description of science as a “path to knowledge” is not original with me and was coined by my friend and fellow skeptic, The Credible Hulk. So, go check out their blog and Facebook page for more great science content.

I really love this description of science as a “path to knowledge” because it beautifully encapsulates what science is and why it works. You see, science does not give absolute results. In other words, it does not “prove” anything with utter certainty. Rather, science is all about probabilities. As I often like to say, science simply shows us what is most likely to be true given the current evidence. That probability can, however, always change with future evidence. Any scientific result can be overturned as new evidence comes to light.

The tentative nature of a scientific result is one of its great strengths, but it can lead to confusion. People often make the incorrect leap from, “science does not give definitive answers” to “science is uncertain and therefore I don’t have to accept a given result.” This is a flawed way of understanding science. Remember, it is a way of telling us what is most likely true given the current evidence. Therefore, its results should be accepted until such time as future evidence arises to discredit those results. Sticking with our path analogy, a lack of 100% certainty that a path is going the right direction would not justify abandoning the path altogether and wandering aimlessly through the forest. Further, a lack of 100% certainty does not mean that we cannot be highly confident in a result. There are some things that have been so thoroughly tested so many times in so many ways that it is extraordinarily unlikely that they are wrong. In other words, some paths are marked well enough that you can be really confident in them.

On the other end of the spectrum, people ignore the tentative nature of scientific conclusions and act as though science should give definitive answers, leading to the flawed arguments about science having been wrong in the past. These arguments are problematic in a number of important ways. First, they treat the inherently self-correcting nature of science as if it is a bad thing, when in fact, it is another great strength of science. Really think about this. If you are going to argue that, “I don’t have to accept a scientific result because scientists used to think the sun moved around the earth,” my question would be, “why do we no longer think that the sun moves around the earth?” The answer is very clearly that other scientists continued conducting tests and discredited the previous view. Science corrected itself. This is not a weakness, but rather a strength. No other path to knowledge does this. No other system of understanding repeatedly and systematically tests its conclusions and updates its information by rejecting debunked results and accepting new results.

Further, because of the way that science advances, the argument that “science has been wrong before” is inherently self-defeating. Sticking with the orbit of the earth for a minute, we only know that the earth orbits the sun because science debunked the notion that the sun orbits the earth, so you can’t use that as an argument that science doesn’t work, because the argument inherently includes the premise that science works! In other words, if this argument gives us carte blanche to disregard scientific results, then why should we accept the result that the earth moves around the sun? That result was produced by science, and this argument claims that we don’t have to accept scientific results, so why should we accept the result that the earth moves around the sun? We only know that science was wrong before because of science. Again, this self-correction is one of the best things about science.

Additionally, it is important to realize that scientific results are often incomplete more than actually wrong, and there are degrees of wrongness. The progression of physics is a great example of this that I use frequently. Newton made enormous strides in physics. He moved us far along the path, but we later found out that he was slightly off course. Einstein showed that Newton’s work was incomplete and his conclusions did not apply universally. However, that didn’t mean that we threw Newton out the window and went all the way back to the trail marker Newton started at. Newton moved us closer to the truth, and Newtonian physics is still taught and applied all around the world, but his picture was incomplete, and Einstein took Newton’s results and shifted us back on track. Think of it like this: we needed to go north, and Newton took us slightly northwest. He still moved us much closer to our goal, but we needed Einstein to reorient us and get us back on track.

This gradual accumulation of knowledge is another key aspect of science. Yes, science sometimes makes mistakes, but because it corrects those mistakes, we gradually get closer and closer to the truth. People who thought the sun revolved around the earth were less wrong than people who thought the sun was a god. Galileo was less wrong than the people who thought the sun moved around the earth. Newton was less wrong than Galileo. Einstein was less wrong than Newton, etc. At each step, we got closer and closer to the truth. This is also another reason why it is so absurd to blindly disregard modern scientific results on the basis that science has been wrong before. Science is a gradual accumulation of knowledge, and although there certainly are things about which we are wrong today, we are less wrong than previous generations, and we know this because we tested the views of previous generations and built on that knowledge.

To give another example, there are certainly things about which modern medicine is wrong. That is inevitable due to the tentative and probabilistic nature of science, but modern medicine is less wrong than medicine was 20 years ago, and medicine 20 years ago was less wrong than medicine 40 years ago, and medicine 40 years ago was less wrong than medicine 60 years ago, etc. Further, I can demonstrate this extremely easily. Imagine you need a major medical intervention and you can be treated using the technology and knowledge from any of the following time points: 200 years ago, 100 years ago, 50 years ago, 25 years ago, or current. Rank those options from least to most preferred. I’m willing to bet that your ranking ran in reverse chronological order, with treatment based on our current knowledge as your top choice, and there is a very good reason why that is the correct way to rank things. Namely, science works! It’s not perfect, but it is a path that moves generally in the right direction, and we all intuitively realize that science has helped us progress and, thanks to science, we know more than any generation before us knew.

Further, we can extend my medical analogy to just about any field of science. Imagine that you are on a game show run by omniscient aliens with perfect knowledge of the universe. They ask you a chemistry question, and you have a lifeline that will let you call a random chemist from the current year, or from 25 years ago, or from 50 years ago, etc. Whom do you call? Obviously, you call the chemist from the current year. Again, we all intuitively accept that science works and gradually builds knowledge. Even those who like to argue that “science has been wrong before” must admit that, thanks to science, we know more now than at any other point in our history. Science has a proven track record of moving us in the right direction.

Finally, if you are not convinced by anything I’ve said thus far, then my question for you is simply, “what’s the alternative?” Really think about this. What other path to knowledge can compete with science? As I’ve explained before, science is responsible for our modern society. All of the technological and medical marvels around you are the result of gradually testing ideas and accumulating knowledge. Look at all the previously fatal diseases that we can now cure or even prevent, look at the decreases in mortality rates, etc. All of that is because of science. So why should we go back to unsystematic guesswork? We tried other systems (like relying on anecdotes) for millennia, and they didn’t work. It was science that brought us out of the dark ages, and it is science that will allow us to continue our advancement as a species. Again, that doesn’t make science perfect or infallible. It simply shows us what is most likely true given the current evidence, but by constantly testing, by constantly self-correcting, by constantly updating, it gradually moves us closer and closer to the truth. It’s not perfect, and it certainly isn’t a straight path, but it’s the best path to knowledge that we have.

Note: To anyone who is about to reply with a snarky remark about doctors/scientists saying that smoking is safe, please read this post. The reality is that there was never a scientific consensus that smoking was safe and, in fact, science had shown that it caused cancer all the way back in the 1930s. Indeed, actual studies consistently showed that it was dangerous. Tobacco companies simply did a good job of creating the illusion that science was on their side; meanwhile, actual science was continuing along the correct path.
