The US is experiencing another sharp increase in COVID19 cases. This is a simple fact, but as always seems to be the case in today’s world, this fact is being treated as an opinion. Countless people (including prominent politicians and even the president) are claiming that cases are not actually increasing, and the apparent increase is simply the result of increased testing. This claim is dangerous and untrue, but it also offers a good opportunity to teach some lessons in data analysis. Obviously, an increase in testing will result in an increase in the number of cases that are documented, that much is true, but that doesn’t necessarily mean that the entirety of the increase is from increased testing. So how can we tell whether the true number of cases is increasing? There are multiple ways to examine this, and I’m going to walk through several of them and try to explain the stats in a non-technical way so that everyone can really grasp these concepts.
To begin with, I’m not actually going to talk about coronavirus. That topic has, unfortunately, become such a political battleground (even though it should be entirely scientific) that it is difficult to get people to think clearly and without bias about it. So instead, let’s start by talking about Willy Wonka’s chocolate factory. Like most chocolate factories, they sometimes get insects in their chocolate bars and they test subsets of them to see how often this occurs. This situation is analogous to testing for a disease, and the math is the same, so let’s use it as an example to understand the math, then we’ll apply that understanding to coronavirus.
For the sake of example, let’s say that Wonka produces 10,000 chocolate bars a day, and examines 2,000 of them for the presence of insects (these are the tests). Further, as you might have guessed, his chocolate factory has rather lax hygiene standards, so out of those 10,000 bars, 1,000 actually have insects. How many do we expect to have insects (i.e., be positive cases) in the sample of 2,000 tests? This is easy to calculate. 1,000 is 10% of 10,000, so we expect 10% of the tests to be positive. Thus, out of 2,000 tests, we expect to get 200 bars with insects (i.e., documented cases; note that I am acting as if testing is random to make the math easy for all to follow; this is a simplification, but doesn’t actually change the point; see note at the end).
Now, suppose that Wonka increases the testing and gets higher numbers of positives (more cases). What does that mean? It could simply mean that the number of bars with insects is unchanged, but more are found due to more testing. However, it is also possible that testing and the true number of bars with insects are both increasing. How can we tell which is occurring?
The answer lies in the percentage of tests that are positive. If the actual number of bars with insects is unchanged, and the increase in positives is simply due to increased testing, then the percent of tests that are positive will remain constant even though the total number of positive tests goes up (Figure 1). Think about the math from earlier. 10% of bars have insects. So, we expect roughly 10% of tests to be positive, regardless of how many tests we do (though the percentage will be more accurate with a larger sample size). So, if we do 2,000 tests, we expect 200 bars with insects (10% positive). If we do 4,000 tests, we expect 400 bars with insects (10% positive). If we do 6,000 tests, we expect 600 bars with insects (10% positive), etc. The total number of bars found with insects (cases) increases as testing increases, but the percentage of those tests that are positive remains the same. As another example, imagine that you have a bag with 500 blue marbles and 500 red marbles. You reach into the bag and grab a handful. You expect to get roughly 50% of each color regardless of how many you grab (though you expect the value to be closer to 50% [more accurate] as sample size increases). It’s the same with testing.
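The marble example is easy to try for yourself. Here is a minimal Python sketch of it; the function name red_fraction and the fixed random seed are my own illustrative choices, and the handful sizes are arbitrary.

```python
import random


def red_fraction(handful_size, n_red=500, n_blue=500, seed=0):
    """Grab a handful without replacement and return the fraction that is red."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    bag = ["red"] * n_red + ["blue"] * n_blue
    handful = rng.sample(bag, handful_size)
    return handful.count("red") / handful_size


# Small handfuls bounce around 50%; large ones settle close to it.
for size in (10, 100, 500, 1000):
    print(size, red_fraction(size))
```

Grabbing all 1,000 marbles gives exactly 50% red, and smaller handfuls give values scattered around 50%, with the scatter shrinking as the handful gets bigger.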
So, if the increase is entirely from testing, the percent of tests that are positive should be unchanged, but what happens if the number of bars with insects is actually decreasing while testing is increasing? Well, the total number of positive test results may either go up or down (depending on the sizes of the decrease in insects and increase in testing), but the percentage of tests that are positive will always go down (Figure 1). Going back to the example, we expect 10% of tests to be positive when 1,000 out of 10,000 bars actually have insects and 2,000 tests are conducted. Now, suppose that the number of bars with insects is cut in half (500) and testing is tripled (6,000). Now, we expect only 5% of tests to be positive, but 5% of 6,000 is 300. So, while the total number of observed positive cases increased (from 200 to 300), the percent of tests that were positive decreased. This tells us that the actual number of bars with insects is decreasing, despite the increase in testing.
Conversely, if more bars actually have insects, we expect a higher percentage of tests to be positive, even if the level of testing increases. Imagine, for example, that the number of bars with insects increases to 2,000 out of 10,000, while the number of tests also doubles (4,000). Now, we expect 20% of tests to be positive, resulting in 800 cases. See how that works?
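For anyone who wants to check the arithmetic in these chocolate bar scenarios, here is a small Python sketch. It assumes random testing, as the examples do, and the function name expected_positives and the scenario labels are just illustrative.

```python
def expected_positives(bars_with_insects, total_bars, tests):
    """Return (expected positive tests, percent of tests positive), assuming random testing."""
    positives = tests * bars_with_insects / total_bars
    percent_positive = 100 * bars_with_insects / total_bars
    return positives, percent_positive


# (bars with insects, total bars, tests) for each scenario described in the text
scenarios = {
    "baseline": (1000, 10000, 2000),
    "more testing only": (1000, 10000, 6000),
    "fewer insects, more testing": (500, 10000, 6000),
    "more insects, more testing": (2000, 10000, 4000),
}

for name, args in scenarios.items():
    positives, pct = expected_positives(*args)
    print(f"{name}: {positives:.0f} positives, {pct:.0f}% of tests positive")
```

The count of positives goes up in every scenario where testing goes up, but the percent positive only moves when the actual number of bars with insects changes.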
I have illustrated all of these patterns in Figure 1, showing the hypothetical situation I have been describing with changes in testing and, sometimes, changes in the actual number of bars with insects over a 20-day period. Each line shows the percent of tests that were positive. The grey line shows the situation where testing increases but the actual number of bars with insects (cases) does not, the blue lines show increased testing with a decrease in the actual number of cases, and red lines show increased testing coupled with an increase in the actual number of cases. As you can hopefully see, the only way to get a decreasing percentage of positive tests is if the actual number of cases (not simply the number of documented cases) decreases, and any time that the actual number of cases increases, the percent of tests that are positive will also increase. This percentage of positive tests is key for understanding what is actually happening.
Now, with all of that in mind, let’s look at coronavirus in the US. If the situation is truly improving and the actual number of cases is truly decreasing and the apparent recent increase in cases is just a result of increased testing, as many argue, then we should see that the percent of tests that are positive has continued to decrease. That is not, however, what we see. It was decreasing for a while, but if we look at June (when things have been opening back up and when the spike in cases occurred) we see a statistically significant (P < 0.0001) increase in the percentage of tests that are positive (Figure 2). In other words, the increase in tests simply cannot explain the entirety of the increase in cases. It probably is a contributing factor, but the actual number of coronavirus cases in the US is going up rapidly. That is a fact. To be clear, exactly what is happening varies by state, and some states are experiencing decreases in the rates of positive tests, but many others are experiencing sharp increases, particularly states like Florida and Arizona (Figure 2). They are very much experiencing viral outbreaks (Johns Hopkins has some very nice data and graphs for state data that I recommend looking at).
There is another really useful way to examine this, which is to look at the percent change for number of tests and number of observed cases (positive tests). Sticking with the chocolate bar example and using the data presented in Figure 1, we find that when testing increased by 100 tests each day, but the actual number of cases remained constant, the number of tests increased by 145% over time and the number of positive tests per day (cases) increased by 145%. This is what we expect if the actual number of bars with insects is constant, but the testing increases: the percent difference should be the same for both the number of tests and the number of observed cases (positive tests). When testing increased by 100 tests a day and the actual number of bars with insects increased by 1% of the original level each day, however, the percent difference in tests was still 145%, but the number of positive tests (cases) increased by 216%, and when actual cases increased by 5% of the original level each day, the number of positive tests increased by 500%! Do you see how that works? If the increase is entirely from increased testing (while the actual number of cases remains the same), then the increase in tests and the increase in observed cases will match. In contrast, if actual cases are also increasing, then the increase in positive tests will outpace the increase in testing.
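Here is a rough Python sketch of that percent-change comparison. The starting point of 2,000 tests and the 29 daily increases of 100 tests are my assumptions, chosen because they reproduce the 145% figure; the names simulate and percent_change are illustrative only.

```python
def percent_change(start, end):
    """Percent change from start to end."""
    return 100 * (end - start) / start


DAYS = 30           # assumption: first day plus 29 daily increases
START_TESTS = 2000  # assumption: same starting level as the earlier example
TRUE_SHARE = 0.10   # 10% of bars have insects on the first day


def simulate(daily_case_growth):
    """Return (% change in tests, % change in positive tests) from first to last day.

    daily_case_growth is the daily rise in the true share of bars with
    insects, as a fraction of the original level (0.01 means 1% per day).
    """
    tests = [START_TESTS + 100 * day for day in range(DAYS)]
    share = [TRUE_SHARE * (1 + daily_case_growth * day) for day in range(DAYS)]
    positives = [t * s for t, s in zip(tests, share)]
    return (percent_change(tests[0], tests[-1]),
            percent_change(positives[0], positives[-1]))


for growth in (0.0, 0.01, 0.05):
    test_change, case_change = simulate(growth)
    print(f"growth {growth:.0%}: tests +{test_change:.0f}%, positives +{case_change:.0f}%")
```

Under those assumptions, tests rise by 145% in every run, while positive tests rise by roughly 145%, 216%, and 500% as the actual number of bars with insects stays flat, grows by 1% a day, or grows by 5% a day.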
So, what do we find for coronavirus in the US? Well, if we compare the last 7 days of May (7-day average) to the past 7 days of June (with the 28th being the most recent date based on when I downloaded the data), we find that the number of tests increased by 40.5%, while observed cases increased by 83.0%! In other words, the increase in cases substantially outpaces the increase in testing, clearly indicating that we are actually experiencing a real increase in coronavirus cases, not simply an increase in known cases due to increased testing. The situation is even more dire when you start looking at states where the largest outbreaks are occurring. In Arizona, for example, again comparing the last 7 days of May to the past 7 days of June, we find testing increased by 116.9%, but daily new cases increased by 498.2%. Florida is a similar story. Testing has increased by 88.3%, but daily new cases have increased by an astounding 726.7%! This is undeniably an outbreak.
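If you want to run this kind of comparison on downloaded data yourself, a sketch along the following lines should work. The numbers in it are made-up placeholders purely for illustration (they are not real testing or case counts), and the function names are my own.

```python
from statistics import mean


def percent_change(before, after):
    """Percent change from before to after."""
    return 100 * (after - before) / before


def weekly_change(week_before, week_after):
    """Percent change between the 7-day averages of two windows of daily counts."""
    if len(week_before) != 7 or len(week_after) != 7:
        raise ValueError("each window must contain exactly 7 daily values")
    return percent_change(mean(week_before), mean(week_after))


# Placeholder daily counts, NOT real data: swap in the downloaded values.
tests_before = [100, 110, 90, 105, 95, 100, 100]
tests_after = [150, 150, 150, 150, 150, 150, 150]
cases_before = [10, 10, 10, 10, 10, 10, 10]
cases_after = [20, 20, 20, 20, 20, 20, 20]

test_change = weekly_change(tests_before, tests_after)
case_change = weekly_change(cases_before, cases_after)
print(f"tests: {test_change:+.1f}%, cases: {case_change:+.1f}%")
print("cases outpacing tests:", case_change > test_change)
```

Averaging over 7 days smooths out day-of-week swings in reporting, so the comparison is not thrown off by a single unusually high or low day.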
Indeed, you can get a sense for these general trends just by looking at a comparison of testing rates and numbers of new cases over time (Figure 3). As you can see, at first, testing lagged well behind cases as we experienced the initial outbreak. Then, cases started declining, even though the number of tests continued a steady increase. It is only in the past few weeks (i.e., since social distancing restrictions, closures, etc. started being lifted) that we see a spike in cases. Further, the recent spike in cases does not correspond to a spike in testing. Testing has been increasing at a steady rate, whereas cases suddenly shifted from a steady decrease to an exponential increase. In other words, the number of observed cases does not track well with the number of tests. If the current increase in cases were really a result of increased testing, then new cases should have been tracking with testing all along. They should have continued to increase after March, because testing increased. That’s not at all what we see, however. Again, testing simply can’t explain the trends. That doesn’t mean that there is no impact of testing (obviously there is), but it is clearly not the key thing driving trends.
Yet more evidence comes from hospitalization rates. The “it’s just more testing” argument relies on the notion of many asymptomatic people (or at least people with very mild cases) who have only been detected recently due to increased testing. If that were the case, then hospitalization rates should be remaining level or going down (if the virus is truly going away), yet many states are experiencing increased hospitalization rates, with the Texas Medical Center (an enormous complex) hitting 100% capacity for its ICU. That simply cannot be explained as a result of increased testing.
Fortunately, deaths have not started spiking yet. There are several reasons for this. One is that, this time, more young people are getting the disease. Another is simply that death rates inevitably lag behind infection rates, and it is very likely that death rates will increase in the coming weeks (though many experts are hopeful that we will be able to avoid the type of enormous spike we saw a few months ago).
In short, an actual examination of the data clearly and unequivocally shows that the current increase in coronavirus cases in the US cannot be explained simply as a result of increased testing. The percent of tests that are positive is increasing, which is a clear indication that the actual number of cases is increasing. Further, in states like Arizona and Florida, the numbers are truly shocking, with the increases in new cases massively outpacing the increases in testing. We are clearly still in the middle of a deadly outbreak, and it is getting worse. This isn’t a liberal conspiracy to undermine Donald Trump; it is a fact, and facts don’t change based on your political party.
Note: Please refrain from political comments. This post is about science and evidence and comments should likewise be about science and evidence (see Comment Rules).
Note: someone might object that my examples assume random testing, while testing is actually somewhat targeted, and people who are symptomatic or are known to have been in contact with someone who is infected are more likely to be tested. This fact is true, but actually doesn’t substantially change anything I’ve said. It does affect the exact percentages but doesn’t change my point about the trends. It is still true that the only way to get an increasing percentage of positive tests while the testing rate is increasing is for the actual number of total cases to be increasing (technically, this could also happen if we learned to do a much better job at targeting our tests, but there is no indication of this that I have seen; certainly not enough to cause the numbers we are seeing, and it still would not explain the increases in hospitalization rates).
Data source: The data I presented here were downloaded from the Covid Tracking Project late on 28-June-20.