The Logic of Science

5 reasons why anecdotes are totally worthless

Posted on February 10, 2016 by Fallacy Man

Personal anecdotes are often the primary ammunition of those who deny science. If you ask anyone in the alternative medicine or anti-vaccine movements for their evidence, you will almost certainly get flooded with anecdotes. A quick internet search will reveal countless people who are insisting that totally worthless treatments like homeopathy work because they took them and then felt better. These accounts are often accompanied by emotional stories about how they “tried everything but only [insert nonsense miracle cure] worked.” Similarly, I frequently encounter people who are adamant that detox solutions aren’t scams or that organic food is better than GMOs because “they just feel healthier when they eat organic/use the detox supplement.”

Anti-vaccers are probably the worst group for using anecdotes. They use personal anecdotes to blame vaccines for every ailment imaginable, but they don’t just stop there. For them, collections of reported symptoms such as the vaccine package inserts, VAERS, and cases from the NVICP are the gold standards of evidence that vaccines are bad. Those sources are, however, really just collections of anecdotes. Similarly, even when anti-vaccers attempt to use the scientific literature, they often end up accumulating case reports, which are essentially glorified anecdotes.

All of this would be fine if anecdotes were actually useful pieces of evidence, but they aren’t. As I will explain in this post, they are worthless, and if your argument is built on anecdotes, then your argument should be rejected.

Before I begin, I want to clarify what I mean when I say that anecdotes are worthless. They are worthless as evidence, and you cannot use them to establish causal relationships. You can’t, for example, say “Bob took X, then got better; therefore, X works.” You can, however, say “Bob took X, then got better; therefore, X might be an interesting topic for future research.” In other words, anecdotes can be useful in helping researchers decide what topics to study, what potential drugs to investigate, etc. However, in the absence of those large, carefully controlled studies, you cannot jump to the conclusion that a causal relationship exists. In other words, you can’t assume that X works until X has been properly tested, and, perhaps most importantly, if the tests disagree with the anecdotes, you must reject the anecdote, not the tests.

There are also a few other situations in which anecdotes can potentially be useful (e.g., if a patient is dying and a doctor has exhausted all science-based options, then and only then would it be appropriate to try a treatment which has only anecdotal evidence to support it). For the purpose of this post, however, I am just going to focus on why they are completely and totally invalid as evidence for causal relationships.

1). If you are using anecdotes, you are committing a logical fallacy
Anytime that someone uses an anecdote to argue that X causes Y, they are committing a logical fallacy known as post hoc ergo propter hoc (often abbreviated as simply post hoc). The Latin translates to “after this, therefore because of this,” and it occurs whenever an argument takes the following form:

X happened before Y
Therefore, X caused Y

The astute reader will quickly notice that the vast majority of personal anecdotes are identical to that syllogism. For example, if you say, “I took this supplement, then I felt better; therefore, the supplement works” you are committing a logical fallacy. Similarly, if you say, “I vaccinated my child, then he developed autism; therefore, vaccines cause autism” you are committing a logical fallacy. Also, if you say, “I switched to an organic diet, then I started feeling better; therefore, an organic diet is healthier” you are committing a logical fallacy. Am I making my point clear? Using personal anecdotes as evidence of causation is logically invalid, and the rules of logic tell us that any argument that contains a logical fallacy is unreliable and must be rejected.

The reason that post hoc arguments are invalid should be obvious: the fact that Y happened after X does not mean that X caused Y. Let’s say, for example, that you fill your vehicle with fuel from a reputable gas station, and your car breaks down just a few miles later. Can you conclude that the bad gas killed your car? No. It is certainly possible that bad gas was at fault, but it is also possible your car died from something totally unrelated to the gas, and getting gas was just a coincidence. Similarly, the fact that you got better after taking X does not mean that X worked because there are many other factors that could have caused your recovery.

It is worth noting that you can use the order of events to make a legitimate argument if you are making a probabilistic argument, and if a causal relationship has already been established. In other words, if you know based on actual evidence (not anecdotes) that X can cause Y, then if Y happens after X, it is not unreasonable to conclude that X probably caused Y. So, you can say,

Item X is known to cause Y
I took X, then Y happened
Therefore, X probably caused Y

There is nothing wrong with that if and only if there is actually valid, scientific evidence that X can in fact cause Y. Also, the strength of the argument will depend on the strength of the relationship between X and Y (e.g., if X causes Y in 99% of cases, then it is a very strong argument, but if X only causes Y in 0.0000001% of cases, then it’s not a good argument because X almost never causes Y).
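To make the role of that percentage concrete, here is a minimal sketch in Python. It rests on a simple model that is my own assumption, not something established above: X causes Y with some known probability, and Y can also arise from unrelated causes at some background rate, independently of X. The function name and the 10% background rate are purely illustrative.

```python
def prob_x_was_cause(p_x_causes_y: float, p_y_anyway: float) -> float:
    """Probability that X was the cause, given that Y followed X.

    Illustrative model (an assumption, not an established method):
    - X causes Y with probability p_x_causes_y.
    - Independently, Y happens for unrelated reasons with probability p_y_anyway.
    - If both would have produced Y, the case is attributed to X.
    """
    # Total probability that Y follows X for any reason.
    p_y_after_x = p_x_causes_y + (1 - p_x_causes_y) * p_y_anyway
    if p_y_after_x == 0:
        raise ValueError("Y never follows X under these inputs")
    return p_x_causes_y / p_y_after_x


# X causes Y in 99% of cases; hypothetical 10% background rate of Y.
strong = prob_x_was_cause(0.99, 0.10)

# X causes Y in 0.0000001% of cases (1e-9 as a fraction); same background rate.
weak = prob_x_was_cause(1e-9, 0.10)

print(f"strong relationship: {strong:.4f}")
print(f"weak relationship:   {weak:.2e}")
```

Under these assumptions, the first argument is very strong and the second is hopeless, even though both have exactly the same form ("X is known to cause Y; I took X, then Y happened").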

2). Anecdotes aren’t representative
Another major problem with anecdotes is that they don’t give you a proper representation of either the effects of X or the causes of Y. Let’s say, for example, that you are interested in miracle cure X, and when you get online, you find several people claiming that it worked for them. That doesn’t actually tell you much because it doesn’t tell you how many people X didn’t work for, nor does it tell you how many people recovered without X.

To give another example, anti-vaccers love to cite anecdotes of a symptom that followed a vaccine, but for every anecdote that they supply, I can supply anecdotes of people (like me) who received the full recommended vaccine schedule and are perfectly fine. Neither set of anecdotes is actually meaningful, because neither set is representative. To actually know whether or not X caused Y, we need the actual rates of Y relative to X, not just scattered reports. In other words, we need to know how many times Y followed X, how many times Y occurred without X occurring, and how many times X occurred but was not followed by Y (in some situations you may only need one of the latter two, but you have to have at least one).
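The counts described above can be laid out as a simple two-by-two comparison. The following Python sketch uses invented numbers (they are not from any study) and a hypothetical function name; it only illustrates why the rate without X matters as much as the rate with X.

```python
def recovery_rates(recovered_with_x: int, not_recovered_with_x: int,
                   recovered_without_x: int, not_recovered_without_x: int):
    """Return (rate with X, rate without X, ratio of the two rates)."""
    rate_with = recovered_with_x / (recovered_with_x + not_recovered_with_x)
    rate_without = recovered_without_x / (recovered_without_x + not_recovered_without_x)
    return rate_with, rate_without, rate_with / rate_without


# Hypothetical counts: 1000 people took X and 700 of them recovered,
# but 700 out of 1000 people who never took X recovered as well.
rate_with, rate_without, ratio = recovery_rates(700, 300, 700, 300)

print(f"recovered with X:    {rate_with:.0%}")
print(f"recovered without X: {rate_without:.0%}")
print(f"ratio of rates:      {ratio:.2f}")
```

In this made-up scenario, there are 700 sincere "I took X and got better" anecdotes, yet the recovery rate is identical with or without X. The anecdotes alone could never reveal that.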

3). Anecdotes aren’t controlled
The third major problem with anecdotal evidence is the fact that anecdotes don’t control for all possible factors. In other words, you can’t say, “I took X, then got better; therefore X works” because there may be something other than X that caused you to get better. In many cases, people simply get better on their own. For example, I often see people take a “remedy” for the common cold, continue to be sick for a day (or often several days), then get better, but after recovering, they insist that the remedy worked. The problem is, of course, that people normally get over colds in a few days. Therefore, it is utterly impossible (based on that anecdote) to determine if the remedy worked, or if their body simply took care of itself. As I explained in #2, this is why it is so important to know the actual rates of event Y relative to X.

The placebo effect is another huge confounder. The placebo effect is often misunderstood and misrepresented (you can find good explanations/discussions here and here), but it is true that in many situations, people will report feeling better if they think that they are taking something that will help them, even if the treatment is totally worthless. This is especially true with highly subjective measurements like pain. So in some cases, people may report feeling better even if the treatment itself didn’t actually do anything.

There are many other potential factors that people fail to account for. Alternative medicine, for example, is famous for recommending a whole slew of treatments, then picking one as the responsible party. For example, I often hear people say things like, “I know X works, because my naturopath told me to exercise more, eat more vegetables, and take X, and I feel great now.” It seems rather silly to give X the credit if you also started exercising more and eating healthier (both of which are actually supported by scientific evidence). Another one that I often encounter is, “my naturopath told me to do A, B, C, and eat less gluten, and I feel much better now, so gluten must be bad for you.” Again, how do you know that it was gluten and not A, B, or C? Those two examples contain pretty obvious confounding factors, but confounding factors may be much more subtle, and you may not even be aware of them. So, even if, to your knowledge, X is the only thing that has changed, there may be some other change that you haven’t thought about or just aren’t aware of.

Finally, it’s worth noting that the fact that, in some situations, we cannot identify the actual cause of an event does not mean that you can assume that it was X. In other words, if you say, “X causes Y, because I took X and Y happened,” and someone calls you out for using an anecdote, you can’t respond with, “Well if it wasn’t X, then what was it? Unless you can prove that it was something else, it must have been X.” That argument is actually another logical fallacy. Specifically, it is an argument from ignorance fallacy. The fact that I don’t know what caused Y doesn’t mean that it was X, and it’s not logically valid for you to jump to that conclusion. To put this another way, by claiming that X causes Y, you are placing the burden of proof on yourself, and it is your job to provide actual evidence that X causes Y. It’s not my job to provide evidence that X doesn’t cause Y.

4). An anecdote is a sample size of N=1
The importance of sample size is one of the most fundamental concepts in statistics. The larger your sample size, the more power you have and the more confident you can be in your results. An anecdote, however, is simply a single observation, and extrapolating from a single observation to a general trend is an absurd thing to do. Imagine, for example, that you want to know whether or not a coin is biased, so you flip it twice and it lands on heads both times. Should you conclude that the coin is biased? Of course not. A sample size that small is meaningless because it is entirely possible (even likely) that you got a biased result just by chance. The same thing is true with anecdotes. Saying, “I vaccinated my kid, then he developed autism; therefore, vaccines cause autism” isn’t substantially different (as far as sample size) from saying, “I flipped the coin twice and got heads both times; therefore, the coin is biased.” Tiny sample sizes simply aren’t reliable.
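The coin example is easy to check with a few lines of arithmetic. This Python sketch (the function names are mine, for illustration only) computes how often a perfectly fair coin produces a lopsided-looking result, and how quickly that chance shrinks as the number of flips grows.

```python
def prob_all_heads(n_flips: int) -> float:
    """Chance that a fair coin lands heads on every one of n_flips flips."""
    return 0.5 ** n_flips


def prob_all_same_side(n_flips: int) -> float:
    """Chance that a fair coin lands on the same side (all heads or all tails) every time."""
    return 2 * 0.5 ** n_flips


for n in (2, 5, 10, 20):
    print(f"{n:>2} flips: all heads {prob_all_heads(n):.6f}, "
          f"all one side {prob_all_same_side(n):.6f}")
```

With two flips, a fair coin gives heads both times a quarter of the time, and gives the same side twice half of the time, so that result tells you essentially nothing. With twenty flips, the same pattern would be extremely unlikely for a fair coin, which is why larger samples let you draw conclusions that a single observation cannot.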

5). Anecdotes aren’t collected systematically
Following my argument in #4, you may be thinking, “but I have met lots of people on the internet with identical anecdotes, so my sample size is much larger than just one.” The problem with that argument is that the anecdotes were not collected in a systematic way. This is really an overarching problem which overlaps substantially with points 2, 3, and 4, but it is important enough that I want to talk about it separately.

One of the hallmarks of science is being systematic. Real research is done in a careful, planned, controlled, repeatable fashion, and that systematic approach is a big part of why science is such a powerful tool for understanding the universe. For example, when we want to answer a question like, “do vaccines cause autism?” we don’t just haphazardly find someone on the internet. Rather, we carefully select a representative study population, control for confounding factors, use large sample sizes, and measure the actual rates of autism in both children with and without vaccines (for example, Taylor et al. 2014). That approach and that approach alone allows us to overcome the problems described in #2-4 and actually achieve a reliable answer. Anecdotes, on the other hand, are in no way systematic, which makes them exceedingly unreliable and unscientific.

Conclusion
In summary, using anecdotes as evidence of causation commits a logical fallacy, which means that anecdotal arguments must be rejected. Further, anecdotes don’t give you a fair representation of the effects of X on Y, nor do they account for potential confounding factors. Therefore, anecdotes are worthless as evidence. They simply cannot demonstrate causal relationships. As I often say on this blog, if you want to know whether or not X causes Y, the one and only way to do it is by conducting large, properly controlled studies that account for confounding variables. Nothing else will suffice. It doesn’t matter if you have “seen it work,” it doesn’t matter if something has been used for centuries, and it doesn’t matter if a symptom has been reported in a database like VAERS or printed on a package insert. Unless proper scientific testing has shown that X causes Y, you cannot conclude that there is a causal relationship between the two.

Posted in GMO, Nature of Science, Rules of Logic, Vaccines/Alternative Medicine | Tagged alternative medicine, anecdotal evidence, anti vaccine arguments, autism, Bad arguments, evaluating evidence, GMOs, peer-reviewed studies, post hoc ergo propter hoc fallacies, rules of logic, Vaccines

Global warming hasn’t paused

Posted on February 1, 2016 by Fallacy Man

The notion that there has been a recent pause or hiatus in global climate change is one of those myths that just will not die. Numerous studies have shown that it simply isn’t true, and the claim is based on cherry-picked evidence and shoddy statistics. Nevertheless, despite 2015 replacing 2014 as the warmest year on record (based on surface temperature data), the myth lives on. Therefore, I want to provide a simple explanation of why this argument is fraudulent, as well as briefly reviewing several fairly recent studies that have thoroughly demolished the myth of the global warming pause. In short, the “pause” is actually just a normal fluctuation, and there have been multiple similar “pauses” prior to this one. There is nothing truly unique or special about the past two decades, and the climate is still warming.

Cherry-picked dates
Before I talk about the climate data itself, I need to make a few general points about analyzing trends. Generally, when you want to see if something is changing over time, you are going to do a regression analysis to see if there is a significant change in the variable of interest as time progresses (i.e., does it increase or decrease over time). Whether or not you get a significant trend is, however, highly dependent on the dates that you use, and it can be skewed by either starting or ending on an extreme year. This means that for almost any large data set, you can cherry-pick some subset of the data which fits your preconceived view.

These are fictional data intended to show what happens when you cherry-pick your starting point. The top panel is statistically significant, whereas the bottom panel is not.

Let me use the following fictional data set to illustrate this (right). I deliberately left the Y axis blank so that you can pretend that these data are whatever you want them to be (net earnings, population size, temperature, etc.). When we look at the full data set, we can clearly see that there is an overall upward trend, and we get a statistically significant increase over time (P < 0.001; I explained P values in detail here, but for now just realize that anything less than 0.05 is typically considered to be statistically significant). Nevertheless, if we cherry-pick our starting point, we can create the illusion of a pause. For example, you’ll notice that 1998 was a particularly high and unusual year, and if we use that as our starting point, we find that there has not been a statistically significant increase since that time (P = 0.717). If we start with 1999, however, we find a significant increase again (P = 0.013). In other words, if you deliberately start with an unusual year, you can mask the overall trend (which, in my book, is fraudulent).

The example above is obviously extreme because 1998 was such a huge outlier, but we can do the same thing with less extreme situations. Consider, for example, that if we cherry-pick the years 1990–1997 we get a fairly flat, non-significant line (P = 0.8694). Similarly, if we start with 2003 and go through 2015, we get a non-significant result of P = 0.061. Further, let’s imagine that it was currently 2010, so you only had the data going up to 2009. In that case, if you started with the 2003 data, you would actually find a significant negative trend (P = 0.004) even though the overall trend is clearly a positive one.
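The effect of the starting point can be reproduced with a small synthetic series. The Python sketch below is not the fictional data set from my figure; it is a separate, invented series (a steady upward trend, a small alternating wiggle, and one large spike in 1998), and it compares only the fitted least-squares slopes rather than P values, because it sticks to the standard library.

```python
def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def make_series():
    """Invented data: +0.02 per year, a +/-0.05 wiggle, and a +1.0 spike in 1998."""
    years = list(range(1990, 2016))
    values = []
    for year in years:
        value = 0.02 * (year - 1990)
        value += 0.05 if year % 2 == 0 else -0.05
        if year == 1998:
            value += 1.0  # the unusually high year
        values.append(value)
    return years, values


def slope_from(start_year, years, values):
    """Fit the trend using only the data from start_year onward."""
    kept = [(y, v) for y, v in zip(years, values) if y >= start_year]
    return ols_slope([y for y, _ in kept], [v for _, v in kept])


years, values = make_series()
full_slope = slope_from(1990, years, values)
slope_1998 = slope_from(1998, years, values)
slope_1999 = slope_from(1999, years, values)

print(f"all years:       {full_slope:.4f} per year")
print(f"start in 1998:   {slope_1998:.4f} per year")
print(f"start in 1999:   {slope_1999:.4f} per year")
```

In this toy series the underlying trend never changes, yet starting the fit on the spike year yields a slope that is less than a tenth of the slope obtained by starting one year later. A proper analysis would also test each slope for significance, as I did for the figure.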

My point here is simply that you can tell almost any story that you want if you cherry-pick your data carefully enough. There will always be natural fluctuations in the data, so if you cherry-pick where you start your analysis, you can twist the data to fit your preconceptions. Doing so is, however, completely inappropriate, yet it is exactly what has happened with the climate data. The people who claim that global climate change has paused nearly always start the pause in either 1997 or 1998 even though we have data going back much further than that. Why do they use those years? Quite simply, because those are the years that fit their story. 1998 was an extremely strong El Niño year, which made it unusually warm. This is particularly pronounced in the satellite data, which is typically the data set that I see people citing as evidence of a pause. Thus, just like 1998 in my fictional example, starting the climate trend in 1998 biases the analysis. In fact, if we start in either 1996 or 1999, we find a significant warming trend (P = 0.047 and 0.021 respectively). So why should we say, “there has been no warming since 1997” when we could also say, “there has been significant warming since 1996” or “there has been significant warming since 1999”?

Short term data are unreliable if you are interested in long term trends. As a result, if you cherry-pick your years, you can find quite a few “pauses” in climate change. Image via Skeptical Science. Note: some people have claimed that Skeptical Science cherry-picked their data set for the fourth flat section, but that is irrelevant since the entire point of this image is to illustrate that you shouldn’t cherry-pick data because you can misrepresent it by doing so (i.e., climate change contrarians cherry-pick data all of the time, and this image shows why that is a bad idea).

In fact, any starting point prior to 1997 is significant, and there are multiple significant starting points after 1998 (from both the RSS data and NASA’s surface temperature data set). Similarly, Skeptical Science put together a great image (left) for a surface temperature data set showing that if we cherry-pick our years, we can find many “pauses” despite the clear overarching trend. In other words, our current “pause” is nothing more than a natural fluctuation, and it is in no way unique.

This shows the temperature data once the effects of El Niños, solar fluctuations, and volcanoes have been removed. Image via Open Mind.

Nevertheless, even though you have to cherry-pick to see a hiatus, it is true that starting the trend in either 1997 or 1998 will give you a flat line (using the satellite data), and some people are understandably bothered by that, so let’s look at it a bit further. Part of the issue is sample size. The smaller your sample size, the harder it is to detect trends (which is also a big part of why the Skeptical Science figure was able to produce so many flat lines). The second and probably more important reason is that there are many factors that influence the climate (output from the sun, volcanic activity, El Niños, etc.) and cause natural fluctuations. These factors create noise that can make it difficult to see changes over the short term. In other words, over long periods of time, the impact of human activities has a strong enough effect that it is obvious, but over short periods of time, human-induced changes can be masked by natural factors. To illustrate this, look at what happens to the data when we account for the natural factors (figure above and Foster and Rahmstorf 2011). The RSS and UAH data sets are the same satellite data sets that are generally used to show a flat line starting in 1997/1998, but when we account for the natural factors, suddenly, clear warming patterns emerge, even if we use 1997/1998 as our starting point (i.e., as we remove natural factors, the influence of human activity becomes clearer).

Cherry-picked data sets
A second major problem with the claim that there has been a pause in climate change is the choice of data set. You see, the term “global warming” is somewhat misleading. Yes, the average temperature of the planet has and will increase, but there is a lot more happening than just the temperature changing, and we should really be more concerned with the total amount of heat energy that the earth is trapping, rather than changes to the surface temperature. This is why most scientists prefer the more accurate term “global climate change.”

Because climate change involves a lot more than just the surface temperature changing, there are multiple data sets that we could use to look at it, such as land surface temperatures, ocean surface temperatures, lower atmosphere temperatures (which is what satellites record), and deep ocean temperatures. All of these should ultimately be affected by climate change, but not necessarily at the same rates. Satellite readings are, for example, particularly sensitive to the effects of El Niños. Also, water has a high heat capacity, which means that the oceans will absorb heat energy far more readily than the earth’s surface.

Therefore, if climate change has actually halted, it should be reflected in all of the major data sets, but it’s not. In other words, there is nothing in the science of climate change that says that all areas of the earth will be warmer all of the time. Rather, different components of the earth will warm at different rates, and some may even cool. So we don’t actually expect every year to be warmer than the last across all of the data sets. If it has truly paused, however, then you should not see trends of increasing temperatures in any of the data sets.

So what do the data sets actually show? For a while, there was no statistically significant increase from either the satellite data or the surface data, but as time has progressed, that has changed, and if you look at NASA’s global Land-Surface Air and Sea-Surface Water Temperature Anomalies data set, you will find a significant increase no matter what year you start the analyses in (note: that is only true if you use each month as a data point; if you use the yearly means, then it is significant for any starting point prior to 2005, and after that you start to lose significance, largely because of the small sample size). Further, many of the previous analyses of the surface data sets failed to account for methodological changes and reached incorrect conclusions as a result (more on that later).

Meanwhile, the satellite data for the lower atmosphere (such as the RSS data) do show a flat line for some starting points within the past two decades. As I noted earlier, however, there are also years that yield a significant increase. Further, as noted in an earlier figure, once you correct for natural factors, the warming trend becomes obvious. Additionally, there is some debate among scientists about how reliable the satellite data actually are (Weng et al. 2014). The situation is extremely complicated, so I’m not going to attempt to explain it in detail in this post, but in short, satellites don’t actually measure the temperature. Rather, they measure several wavelengths of radiation and use those measurements to infer the temperature. The problem is that particulates and various gases can interfere with those radiation measurements. Also, satellites tend to drift over time, which makes it hard to get long term measurements from a fixed point. To be clear, I’m not suggesting that the satellites are worthless; rather, I am just pointing out that they have clear limitations, and elevating them to the status of irrefutable evidence while ignoring the other data sets makes no sense whatsoever.

The accumulation of energy over time. You’ll notice that most of the energy is getting trapped in the oceans. Image via Rhein et al. 2013.

Finally, let’s turn to a very important source of data: the oceans. Our oceans are massive heat sinks. Indeed, it’s estimated that over 90% of the excess energy that the earth has trapped via global warming is stored in the oceans (Rhein et al. 2013). This is because water has a very high heat capacity, which makes it excellent at absorbing and storing heat energy. So, what’s happening to our oceans? Quite simply, they are trapping more heat energy (Balmaseda et al. 2013; Gleckler et al. 2016). Look, for example, at Figure 1 in the Balmaseda et al. study or Box 3.1 (page 262) of Rhein et al. (2013) (left). Yes, there are fluctuations, and in Balmaseda et al. you can see a large peak around 1998 (just like all of the other data sets), but there is also a very clear upward trend. In other words, the oceans are continuing to trap more and more heat.

In short, there are many different data sets that we could use to ask whether or not the climate is still changing, and only the satellite data show any indication of a pause, and even then, the pause is only present if you cherry-pick your dates and fail to correct for confounding factors. As I frequently argue, we have to look at all of the data, not just the subset that conforms to our preconceptions, and when we do that, it is very clear that the planet is still warming.

Scientific analyses show that there is no pause
At this point, I have attempted to explain the statistical problems with the claim that there is a pause, but I clearly don’t expect you to take a blogger’s word for it (even though I am also a scientist). I do, however, expect you to accept results from the peer-reviewed literature. Several research groups have looked at the data to see whether or not the recent “pause” is actually unusual, and they have all reached the same conclusion. Namely, there are large natural fluctuations in the earth’s temperature, and, as a result, if you cherry-pick subsets of years, you can find numerous “pauses,” but those simply represent normal fluctuations, not actual hiatuses in climate change (this is the same thing that the Skeptical Science image shows; Easterling and Wehner 2009; Santer et al. 2011; Lewandowsky et al. 2015a; Lewandowsky et al. 2015b). In other words, the fact that (according to some data sets) the earth’s temperature has not risen significantly in recent years is not actually an indication that global warming has slowed or is no longer happening.

Further, Karl et al. (2015) found that the surface temperature data sets had numerous biases resulting from inconsistent methodologies, and once those biases were removed, the data showed that the rate of climate change over the past 15 years is just as great as the rate in the preceding decades. People often have a knee-jerk response to statements like that and say, “see, they are committing fraud and manipulating the data to make it show warming,” but that’s not what is happening here. You can find a really great, thorough explanation of why the data have to be adjusted here, so I’ll just give you the Cliff Notes version (I also previously wrote a post that was specifically about adjustments in the GHCN data set). In short, climate data have been collected for decades from all over the world using many different methodologies, and those methodologies and collection locations have changed over the years. Those differences and changes create biases in the data that have to be accounted for. These types of corrections are normal for real data sets, and failing to make them will give you incorrect results.

Let me give one really simplistic explanation to illustrate why adjustments are necessary. Let’s say that you have been recording the temperature in your back yard for years, and originally the thermometer was in an open area. Over time, however, a large tree has grown and now shades the thermometer. If you don’t account for the presence of that tree, you are going to get an incorrect cooling trend. The same type of thing happens with real data, and we have to account for any biases and changes in the collection methodologies if we want accurate trends. There is nothing dishonest or fraudulent about that.
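The back yard example can be sketched numerically. In the Python snippet below, every number is invented for illustration: a true warming of 0.03 degrees per year, and a tree that starts shading the thermometer in year 10 and lowers each reading by exactly 1 degree. It also assumes that the size and timing of the shading bias are known, which is a big simplification of how real adjustments are worked out.

```python
def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


SHADE_START = 10     # hypothetical year in which the tree begins shading the thermometer
SHADE_OFFSET = -1.0  # hypothetical bias (degrees) that the shade adds to each reading

years = list(range(20))
true_temps = [15.0 + 0.03 * t for t in years]  # the real, steadily warming back yard

# What the thermometer actually records: biased low once the tree shades it.
raw_readings = [temp + (SHADE_OFFSET if t >= SHADE_START else 0.0)
                for t, temp in zip(years, true_temps)]

# Adjustment: remove the known bias from the affected readings.
adjusted_readings = [reading - (SHADE_OFFSET if t >= SHADE_START else 0.0)
                     for t, reading in zip(years, raw_readings)]

raw_slope = ols_slope(years, raw_readings)
adjusted_slope = ols_slope(years, adjusted_readings)

print(f"raw trend:      {raw_slope:+.4f} degrees per year")
print(f"adjusted trend: {adjusted_slope:+.4f} degrees per year")
```

In this toy case the unadjusted readings show a cooling trend even though the yard is warming the whole time, and the adjusted readings recover the true trend. The adjustment is what makes the answer correct, not what makes it biased.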

In blind tests, experts and non-experts reject the idea of a pause
People are obviously prone to biases, and when looking at something like a temperature data set, your biases can cause you to see a pause that isn’t there or make you ignore a pause that is real. Some studies have, however, overcome this problem by doing blind tests. In other words, they present people with the data and ask them to determine whether or not there has been a recent pause, but they don’t tell them that the data are climate data, thus eliminating the biases. These studies have found that once the biases are eliminated, people don’t detect a pause (sometimes they don’t detect a pause even with the biases).

The first of these is not actually a peer-reviewed study, but it is nevertheless informative. In 2009, The Associated Press sent two temperature data sets to four different professional statisticians, and asked them to look for trends and determine whether or not there has been a recent pause, but they did not tell the statisticians that the data sets contained climate data. All four of them said that there was no pause. In other words, when professional statisticians unbiasedly examined the data, they did not detect a hiatus.

The next study was actually peer-reviewed and was conducted by Lewandowsky (2011). This study did not use experts, but instead showed long term climate data to 200 pedestrians and asked them to predict the next three data points. Half of the people were told that the data were share prices, and the other half were told that they were climate data. Interestingly, both groups predicted that the next three points would increase, even if the subjects didn’t think that humans were causing the climate to change.

The final study was also conducted by Lewandowsky et al. (2015b), but it used experts instead of non-experts. It did not, however, use climatologists. Rather, it used economists. This may sound strange at first, but it actually makes good sense because both groups are adept at analyzing trends and making predictions about future events based on those trends. All 25 participants had at least a master’s or Ph.D. in economics or a relevant field, and all but four of them had at least five years of professional experience. They were shown the climate data, but they were told that the data were for the world’s agricultural output, and they were asked to analyze the data in light of the following statement:

“A prominent Australian critic of conventional economics, Mr. X., publicly stated in 2006, that ‘There IS a problem with the growth in world agricultural output—it stopped in 1998.’ A few months ago, Mr. X. reiterated that ‘. . . there’s no trend, 2010 is not significantly more productive in any way than 1998.’”

Finally, they were asked several questions about whether or not the data supported that statement. The majority of them did not think that the data supported that statement, and almost two-thirds of them went as far as saying that the claim may be fraudulent.

All three of these tests tell the same story: the data do not actually support the notion that climate change has paused, which implies that people are latching onto the idea of a pause for ideological reasons rather than scientific ones.

Conclusion
In summary, if you want to claim that the earth is no longer warming, you are going to have to violate numerous principles of both scientific and logical investigation. First, you have to cherry-pick your data set and focus on the satellite data even though the surface and ocean data sets show clear evidence of continued climate change. Then, you are going to have to cherry-pick within those data sets to select the years that match your preconceptions, while simultaneously failing to account for factors such as El Niños and volcanoes. You will also have to ignore the expert analyses of the data which found that there was no pause, and you will have to ignore the fact that there have been multiple other similar “pauses” in the past.

In short, global warming has not paused. The past two decades simply represent normal fluctuations, not a hiatus. So please, I beg of you, stop claiming that climate change is no longer happening, because that claim simply isn’t supported by the data. It is happening, and it will continue to happen until we finally decide to take serious action.
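To see how easy it is to manufacture a “pause” by choosing your start year, here is a rough Python sketch. Everything in it is made up for illustration: the series is synthetic (a steady rise of 0.02 units per year plus random noise), not real temperature data, and the 10-year window length is an arbitrary assumption. It fits a trend line to the full series and then to every 10-year window within it.

```python
import random

def ols_slope(ys):
    """Least-squares slope of ys plotted against 0, 1, 2, ..."""
    n = len(ys)
    x_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(ys))
    den = sum((x - x_mean) ** 2 for x in range(n))
    return num / den

rng = random.Random(0)

# Synthetic data (NOT real temperatures): a constant upward trend plus noise.
series = [0.02 * year + rng.gauss(0, 0.15) for year in range(60)]

# Trend over the whole record.
full_slope = ols_slope(series)

# Trend over every possible 10-year window (window length is an assumption).
window = 10
short_slopes = [
    ols_slope(series[start:start + window])
    for start in range(len(series) - window + 1)
]
flat_or_falling = sum(1 for s in short_slopes if s <= 0)

print(f"Slope over all 60 years: {full_slope:.3f} per year")
print(f"10-year windows with a flat or falling trend: "
      f"{flat_or_falling} of {len(short_slopes)}")
```

The underlying trend in this toy series never changes, yet short windows can wander well away from it, and some may look flat or even negative. Picking one of those windows and calling it a “pause” tells you about the noise, not about the trend.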

Note: I can already hear the keyboards clicking away as people misuse my statements about “natural fluctuations” to assert that climate change itself is just a natural fluctuation. So let me be clear, that claim is in no way, shape, or form justified. The overall trend is far, far greater than what is caused by natural fluctuations (over the given time frame), and we are extremely certain that we are the cause of the current warming.

Literature Cited:

Balmaseda et al. 2013. Distinctive climate signals in reanalysis of global ocean heat content. Geophysical Research Letters 40:1754–1759.
Easterling and Wehner 2009. Is the climate warming or cooling? Geophysical Research Letters 36.
Foster and Rahmstorf 2011. Global temperature evolution 1979–2010. Environmental Research Letters 7:011002.
Gleckler et al. 2016. Industrial-era global ocean heat uptake doubles in recent decades. Nature Climate Change.
Karl et al. 2015. Possible artifacts of data biases in the recent global surface warming hiatus. Science 348:1469–1472.
Lewandowsky 2011. Popular consensus: Climate change is set to continue. Psychological Science 22:460–463.
Lewandowsky et al. 2015a. On the definition and identifiability of the alleged hiatus in global warming. Scientific Reports 5:16784.
Lewandowsky et al. 2015b. The “pause” in global warming: Turning a routine fluctuation into a problem for science. Bulletin of the American Meteorological Society.
Rhein et al. 2013. Observations: Ocean. In: Climate Change 2013: The Physical Science Basis. Contribution of Working Group I to the Fifth Assessment Report of the Intergovernmental Panel on Climate Change. Stocker et al. (eds.). Cambridge University Press, Cambridge, United Kingdom and New York, NY, USA.
Santer et al. 2011. Separating signal and noise in atmospheric temperature changes: The importance of timescale. Journal of Geophysical Research: Atmospheres 116.
Weng et al. 2014. Uncertainty of AMSU-A derived temperature trends in relationship with clouds and precipitation over ocean. Climate Dynamics 43:1439–1448.

Research, you’re doing it wrong: A look at Tenpenny’s “Vaccine Research Library”

Posted on January 25, 2016 by Fallacy Man

“I’ve done my research.” If you’ve ever debated someone who disagrees with a scientific consensus, then you’ve probably encountered that sentence, especially if they were an anti-vaccer. It is the mantra of the anti-science movement, but it’s nearly always misused. You see, in science, doing research generally means conducting a scientific study and adding new information to the general body of scientific knowledge. Nevertheless, I don’t want to dwell on semantics, and I think that people should educate themselves; however, if you are going to educate yourself, then you have to read good sources (which in science means the peer-reviewed literature), and you can’t cherry-pick which papers to read and which papers to ignore. This is where our story turns to Sherri Tenpenny’s “Vaccine Research Library” (VRL).

The title sounds great, doesn’t it? A single library that houses all of the literature on vaccines would be a wonderful tool; however, the VRL does not contain all of the literature on vaccines. Instead, it only contains the papers that oppose vaccines. So, rather than being a legitimate research tool, it is actually the most glorious confirmation bias generator that I have ever encountered. I could not have asked for a more beautiful example of cherry-picking sources. Therefore, in this post I will not only explain why the VRL is a load of crap, but I will also use it as an illustration of how not to do research.

Note: Although they don’t house all of the literature on vaccines, you can find most studies on PubMed and Google Scholar. So use them if you want to actually be well-informed.

What is the purpose of the VRL?
I’m going to let Tenpenny answer this question for me, because her statements are better than anything I could write (I suggest that you don’t drink anything while reading this section because her justification for this website is honestly pretty funny).

Pro-vaccine information is as abundant and as easy to find as ice in Antarctica. But there is a large body of overlooked medical and scientific research that shows the other side – and chronicles the heartbreaking disasters and long-term health consequences caused by vaccines. The problem is that locating this information can be challenging, difficult to interpret and very time consuming to dig out.

On a different part of the site, she says,

In 2011, we realized how difficult – and time consuming – it is to find mainstream medical references documenting the harm being caused by vaccines. Finding these “needles in the haystack” is a tedious and time-consuming task.

Now, a rational person would think that maybe there is a scientific reason that pro-vaccine papers are so predominant, but that doesn’t stop Tenpenny from plowing forward. Further, she clearly contradicts herself. First, she says that there is a large body of anti-vaccine literature, then she goes on about how hard it is to find these papers, and she refers to them as “needles in the haystack.” So which is it? Are they abundant or aren’t they, and if this body of “overlooked” research is so large, then why is it hard to locate? Why do you have to dig it out? The vast majority of journals archive their abstracts in Google Scholar so if there is actually a large body of literature showing that vaccines are dangerous, then it should be easy to find those papers. The fact that it is difficult to find anti-vaccine publications actually demonstrates just how weak the anti-vaccine position truly is. So Tenpenny is really defeating her own argument.

Another section of the page says (the bold text and bizarre capitalization are in the original):

Convinced that Vaccines are Unsafe but Need Scientific Proof? You need information that gives you “The Other Side of the Story.”

Here we have the real problem. As I have frequently argued, anti-vaccers (and anti-scientists in general) have no interest in being well-informed. They don’t actually care about facts. Rather, they care about protecting their preconceptions. This “library” is not designed for people who actually want to learn about vaccines. Rather, it is intended for those who have already decided that vaccines are dangerous. Stephen Colbert brilliantly described this way of thinking when he coined the word “truthiness,” and it aptly describes the purpose of this website. It isn’t for people who want to carefully analyze the facts and evidence. Rather, it’s for people who know in their gut that vaccines are bad, and it is intended to bolster an existing belief rather than help people to evaluate evidence. Tenpenny makes this explicitly clear with statements like,

They want evidence to support what they intuitively know: The Party Line about vaccines is a charade, perpetuated to bolster profits and expand Big Pharma’s cartel.

Once again, it’s about cherry-picking evidence to support a belief rather than actually informing yourself about the topic. According to Tenpenny, however, her site will help to balance your knowledge.

Now, all in one place, is the irrefutable science you need to defend your position against vaccines. You will be able to prove your point, protect your health and that of your children, write balanced news stories, or support legal cases.

Think about how absurd this is for a minute. First, she claims that reading a tiny subset of the literature will give you irrefutable evidence. Then, she claims that totally ignoring the majority of the literature will help you to write balanced news stories! It’s like me saying, “here is a paper proving that the earth is flat. It disproves all of the papers saying that the earth is round, and it will let you write a balanced news story on why the earth is actually flat.”

To conclude this section, I want to give and discuss one final quote from her site which I find particularly amusing (again the emphasis is in the original).

Concerned that reviewing all this information will be time consuming? “Pre-search” takes the “grunt work” out of your research.

How much time do you spend on the Internet searching and researching…, searching and researching…, and searching and researching…..for reliable scientific facts about the problems associated with vaccines? Because browsers and web crawlers deliver a large number of results, it can take hours to troll through page after page…after page…after page of search results. Then clicking on link after link. Then skimming through reams of material to find a particular fact. What’s worse is the exasperation you feel when you come up empty-handed – after investing so much time, you didn’t find what you were looking for.
…
Now think about how much you are paid per hour in your Day Job. Take that dollar amount times the hundreds, even thousands, of hours you spend on the Internet, searching for information that can be frustratingly difficult to find.
…
The annual membership rate has been drastically reduced: A one year membership to the Library is worth thousands of dollars and hundreds of hours of your time, you can have full access to thousands of references for only $9.98 per month (for quick research of a specific topic) or only $99 – for a full year!

Let me paraphrase this, “Are you tired of spending hours trying to find that one anecdote that supports your preconceptions? Is cherry-picking data taking up too much of your time? Are you annoyed with having to scroll past website after website that says you’re wrong? Well then I have a deal for you, because I’ve cherry-picked the internet for you! Now, for the low price of $100 per year, you can have all of the information that conforms to your distorted view of reality without having to be bothered with the thousands of studies that say you’re full of crap! Order now, and we’ll even include a free jar of cherries.”

Addendum (26-1-2016): Originally, there were two paragraphs here that questioned Tenpenny’s financial motives for making this site, but as someone pointed out in the comments, they were admittedly speculative, and I don’t really think that they are relevant to the point that I am trying to make, so I removed them.

What is in the VRL?
At this point, I think it is clear that the VRL is not motivated by an honest desire to be well-informed. Nevertheless, let’s look closer because regardless of the motivations for constructing this site, if Tenpenny actually found a large body of properly conducted studies showing that vaccines are dangerous, then we should take those studies seriously. I’m clearly not about to give Tenpenny one penny of my money, however, so I activated a free trial version of the VRL. This admittedly only gave me access to part of the library, but I see no reason to think that the rest of it would be substantially different.

Before I describe the contents of the library, I want to remind everyone that not all scientific studies are equal. Some designs produce very robust, reliable results, whereas others produce very weak, unreliable results. So you should always be careful to avoid the trap of latching onto a study just because it agrees with you. You have to carefully evaluate the study and look at the design that was used to determine whether or not the results are reliable (I explained the hierarchy of evidence in more detail here).

With that in mind, it probably won’t surprise you to learn that the vast majority of studies in the library rank very low on the hierarchy of evidence. For example, there are a large number of case reports. These are the lowest category on the hierarchy of evidence because they are basically just glorified anecdotes. If a doctor observes someone having a heart attack after receiving a vaccine, for example, they would write a case report on it, but that does not in any way, shape, or form prove that the vaccine caused the heart attack. It could be a total coincidence that the person had a heart attack after the vaccine. In fact, using anecdotes and case reports to draw causal conclusions is a logical fallacy known as post hoc ergo propter hoc. So, rather than proving that vaccines are dangerous, these case reports should be (and are) used as the basis for starting large, robustly designed studies to actually test whether or not vaccines cause the reported symptoms, but you don’t see many of those large studies in the VRL because they tend not to fit anti-vaccers’ preconceptions.
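To make the “total coincidence” point concrete, here is a back-of-the-envelope sketch in Python. All of the numbers are invented purely for illustration (they are not real vaccination or heart attack statistics), and the function name is my own. The idea is simply that if an event happens at some background rate regardless of vaccination, then some number of people will experience it shortly after a vaccine by chance alone.

```python
def expected_coincidences(people_vaccinated, annual_event_rate, window_days):
    """Expected number of events falling within `window_days` of a vaccine
    purely by chance, assuming the event is unrelated to the vaccine and
    occurs at a constant background rate throughout the year."""
    return people_vaccinated * annual_event_rate * window_days / 365

# Hypothetical inputs, chosen only to illustrate the arithmetic:
#   1,000,000 people vaccinated
#   background rate of 2 events per 1,000 people per year
#   "shortly after" defined as within 7 days
by_chance = expected_coincidences(1_000_000, 0.002, 7)
print(f"Expected coincidental events: about {by_chance:.0f}")
```

Under those made-up assumptions, you would expect a few dozen events in the week after vaccination even if the vaccine had no effect whatsoever. Each one could become a sincere anecdote or a case report, which is exactly why you need a controlled comparison against the background rate before drawing a causal conclusion.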

Of the studies that did use robust designs, the sample sizes tended to be small, and many of them suffered serious methodological flaws, were published in questionable journals, etc. So rather than being a collection of studies that prove that vaccines are dangerous, the VRL is really a collection of the lowest quality, weakest studies on vaccines. To be clear, there are a few decent studies in the list, but many of those are misrepresented, and you always have to consider scientific papers within the broader context of the literature (more on that later).

What really amazed me about the contents of the VRL, however, was Tenpenny’s ability to cherry-pick within a study. For example, I was very surprised to see a review paper (Shepard et al. 2006) on Hepatitis B infections and vaccinations (remember, reviews are one of the highest levels of evidence). The presence of this paper confused me because it is overwhelmingly supportive of vaccines. Here is an excerpt from the abstract:

Vaccination against HBV infection can be started at birth and provides long-term protection against infection in more than 90% of healthy people. In the 1990s, many industrialized countries and a few less-developed countries implemented universal hepatitis B immunization and experienced measurable reductions in HBV-related disease…Further progress towards the elimination of HBV transmission will require sustainable vaccination programs with improved vaccination coverage, practical methods of measuring the impact of vaccination programs, and targeted vaccination efforts for communities at high risk of infection.

So why on earth is a paper that encourages increased vaccination efforts in the library that supposedly proves that vaccines are dangerous? It’s there because of three sentences.

The earliest recognition of the public health importance of hepatitis B virus (HBV) infection is thought to have occurred when it appeared as an adverse event associated with a vaccination campaign. In 1883 in Bremen, Germany, 15 percent of 1,289 shipyard workers inoculated with a smallpox vaccine made from human lymph fell ill with jaundice during the weeks following vaccination. The etiology of “serum hepatitis,” as it was known for many years, was not identified until the 1960s, and only following the subsequent development of laboratory markers for infection was its significance as a major cause of morbidity and mortality worldwide fully appreciated.

Before I talk about those sentences, I want to make something else clear about the VRL. Access to this library does not give you access to the papers themselves (despite the fact that her page about the VRL clearly implies that you get the full papers). Rather, you get abstracts and a brief blurb from Tenpenny where she has highlighted the “important” parts of the paper for you. In other words, she is cherry-picking within studies! She is actually encouraging people to not only pick and choose which studies to accept, but to actually pick and choose which sentences to accept. Her excerpt from the Shepard et al. study illustrates that perfectly (the emphasis was hers, btw). Out of an entire review that talks about the massive body of literature showing that the Hepatitis B vaccine is useful, she wants you to read just three sentences. In other words, this entire paper describes why she is full of crap, but she wants you to ignore that and focus on three sentences from the introduction instead. It’s the most absurd and outlandish level of cherry-picking that I have ever seen.

Further, why she thinks that these three sentences show that vaccines are dangerous is beyond me. My guess is that she is arguing that the vaccine was contaminated with Hep B, to which I respond, so what? It makes absolutely no sense to say, “the vaccine was contaminated in 1883, therefore it is dangerous now.” Medical technologies have come a long way since 1883. It’s like saying, “the earliest computers were massive and slow, therefore modern computers are no good.” It seems that Tenpenny is suggesting that we should ignore the massive body of evidence supporting the vaccine and focus instead on a mistake that was made (and corrected) more than a century ago.

Note: Someone is probably getting ready to accuse me of hypocrisy since I also highlighted just a few sentences from the paper, but before you do that, realize that I was simply using those sentences to show that the paper was pro-vaccine. I am not in any way, shape, or form suggesting that you use those sentences as evidence that the vaccine is safe. For that, you need to read the entire paper (not just the abstract) as well as the rest of the literature on the topic. Finally, unlike Tenpenny’s quote, mine was actually representative of the paper.

Why Tenpenny’s method doesn’t work
Science is a messy process, and reaching a firm conclusion generally involves lots of studies from numerous research groups. As a result, the body of literature on any given topic will contain lots of statistical noise. In other words, there will generally be lots of preliminary studies with small sample sizes or weak designs, and there will be multiple studies that reached the wrong conclusion just by chance. This is why whenever you are trying to learn about a scientific topic, you have to look at the entire body of literature, not just a few cherry-picked studies. There is so much research being done that there are lots of bad papers out there (sometimes through no fault of the authors), and you can find a paper to support almost any position that you can think of. There are, for example, still people who think that the earth is flat, and if you start with that assumption, you can find “evidence” and even a few scientific papers to support it (for example, Benard et al. 1904, which you can find an excerpt from here). This is why it is so important that you avoid the single study syndrome. Individual studies have a high probability of being wrong, but it is far less likely that a large body of studies is wrong.
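If you want to see how studies can reach the wrong conclusion “just by chance,” here is a small simulation sketch in Python. It is a toy model built on my own assumptions, not a reconstruction of any real study: it runs 1,000 imaginary trials of a treatment that does nothing at all (both groups are drawn from the same distribution), analyzes each with a simple approximate z-test, and counts how many come out “statistically significant” at the conventional 0.05 threshold. The group size of 50 and the choice of test are arbitrary.

```python
import random
import statistics
from math import erf, sqrt

def two_sided_p_value(group_a, group_b):
    """Approximate two-sided p-value for a difference in means (z-test)."""
    diff = statistics.mean(group_a) - statistics.mean(group_b)
    se = sqrt(statistics.variance(group_a) / len(group_a)
              + statistics.variance(group_b) / len(group_b))
    z = abs(diff) / se
    # Standard normal CDF computed from the error function.
    return 2 * (1 - 0.5 * (1 + erf(z / sqrt(2))))

def simulate_null_studies(n_studies=1000, n_per_group=50, alpha=0.05, seed=1):
    """Count 'significant' results when the treatment truly has no effect."""
    rng = random.Random(seed)
    false_positives = 0
    for _ in range(n_studies):
        # Both groups come from the same distribution: the treatment is useless.
        treated = [rng.gauss(0, 1) for _ in range(n_per_group)]
        control = [rng.gauss(0, 1) for _ in range(n_per_group)]
        if two_sided_p_value(treated, control) < alpha:
            false_positives += 1
    return false_positives

count = simulate_null_studies()
print(f"{count} of 1000 studies of a useless treatment looked 'significant'")
```

With a 0.05 threshold, you should expect somewhere in the neighborhood of 5% of these studies to report an effect that isn’t there. If you collected only those studies into a “library” and ignored the other 95% or so, you would have dozens of papers “proving” that a worthless treatment works.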

I’m not sure who created this image, so if it’s yours, please let me know so that I can give credit.

You should never latch onto a single study as irrefutable proof of your position, but that is exactly what Tenpenny is encouraging you to do. In her mind (and in the minds of anti-scientists more generally) all that you have to do to prove your position is find one study that agrees with you (or even one sentence). It doesn’t matter if the study was done correctly, it doesn’t matter what the sample size was, it doesn’t matter if the study used a robust design, it doesn’t matter if there are a thousand other studies that disagree with you. According to her way of thinking, finding that one study is all that you need, but that’s clearly not how science or logic actually works. Replication is one of the central tenets of science, and scientists only reach a consensus after a result has been replicated multiple times and supported by numerous studies. So Tenpenny is ignoring a fundamental principle of science. Further, what she is doing is actually a logical fallacy known as the Texas sharpshooter fallacy. This fallacy occurs whenever you focus on the subset of data that appears to support your position, while ignoring a much larger body of data that refutes your position.

Additionally, she is ignoring a fundamental principle of rational thought: you always have to start with an unbiased question. It’s fine to ask a question like, “are vaccines safe?” then look for answers to that question, but Tenpenny and her followers are starting with the assumption that they are dangerous, then looking for evidence to support that assumption. The problem is that if you do that, if you start with a conclusion, then you will always find something which supports that conclusion (at least in your mind).

Now, invariably some anti-vaccer reading this is going to say, “you’re committing a hasty generalization fallacy. Not all anti-vaccers are like that. I actually have looked at both sides and become well-informed.” In which case, my response is, why do you reject the thousands of papers that clearly demonstrate that vaccines are safe and effective? I’m guessing that it’s either because you have read a few faulty, low-quality studies and are choosing to rigidly cling to them (in which case you are doing exactly what Tenpenny is) or you are blindly rejecting them for one of the flawed reasons that I described here. To put this another way, where’s your evidence? If your position is actually based on an unbiased review of the data, then surely you can provide me with a large body of high-quality, properly-controlled, robustly-designed studies that have been replicated by other research groups which show that vaccines are dangerous and which provide a valid explanation for why thousands of other studies disagree with them. Unless you can do that, then you are succumbing to the same confirmation bias as Tenpenny, and you are picking and choosing what evidence to accept (no, the vaccine inserts, VAERS, and NVICP do not count as evidence that vaccines are dangerous, see the links for details).

Conclusion
In this post, I have been focusing specifically on Tenpenny and the anti-vaccers who follow her, but everything that I have been talking about is widely applicable to everyone. We are all prone to confirmation biases (myself included). It’s ingrained in our psychology to latch onto evidence that supports our views and disregard evidence that doesn’t. The key, therefore, is to acknowledge that tendency and strive to overcome it. If we are going to actually be well-informed on any topic, then we must ensure that we are not simply succumbing to confirmation biases. We have to look at the entire body of evidence, not just the subset that conforms to our preconceptions. That’s why I find the VRL so infuriating. Rather than helping people to become truly open-minded, it insists that people should close their minds to any evidence that supports vaccines, and it openly encourages people to adhere to confirmation biases. It equates gut feelings with actual evidence, and it encourages people to seek out “proof” for their views rather than testing whether or not those views are actually justified. This, in my opinion, is the worst form of pseudoscience and pseudoskepticism, because it doesn’t just mislead people about the evidence. Rather, it misleads them about the way to evaluate the evidence. If you want to truly understand our marvelous universe, then you must train yourself to recognize and avoid this false skepticism, and you must always accept the possibility that you might be wrong. So to any anti-vaccers reading this, I’m not trying to attack you, and I don’t think that you’re stupid, but you have been seriously misled and misinformed about the evidence and how to evaluate that evidence. You need to learn to recognize confirmation biases and you have to consider the entire body of evidence, not just the pieces of evidence that support your view.

The genetic fallacy: When is it okay to criticize a source?

Posted on January 18, 2016 by Fallacy Man

Last week, I wrote a post on the hierarchy of scientific evidence which included the figure to the right. In that post, I explained why some types of scientific papers produced more robust results than others. Some people, however, took issue with that and accused me of committing a genetic fallacy because I was attacking the source of their information rather than the information itself. They were specifically unhappy about my claim that personal anecdotes, gut feelings, counter-factual websites, etc. did not constitute scientific evidence. After all, how dare I assert that their opinions weren’t as valuable as a carefully controlled study (note the immense sarcasm). In reality, of course, my argument was not fallacious, and they were simply misunderstanding how the genetic fallacy works. This misunderstanding is, however, quite common and somewhat understandable. The genetic fallacy can admittedly be very confusing. Therefore, I want to briefly explain what this fallacy is, how to spot it, and when it is and is not acceptable to criticize the source of an argument/piece of information.

If you’re a regular reader of this blog, then much of this may sound very familiar. That is because I have already covered a lot of the key points in a previous post on ad hominem fallacies. The ad hominem fallacy is generally considered to be a type of genetic fallacy; therefore, the same general rules apply.

Note: in this post, I am going to specifically deal with this fallacy as it pertains to scientific issues.

What is the genetic fallacy?
As its name suggests, the genetic fallacy results from attacking the source or origin of information, rather than the information itself. If you think about that for a second, the reason for the confusion becomes clear. On the one hand, the reason that genetic fallacies don’t work is obvious: the truth of a claim is not dependent on the one who is making the claim. Even someone who is wrong 99.9% of the time will occasionally be right. On the other hand, however, the source of the information is clearly important. It’s intuitively obvious that not all sources are equal, and some sources are more authoritative than others. Imagine, for example, that during a trial, the prosecution brought in some random guy off of the street and asked him to testify about the forensic evidence of the case. The defense would very correctly attack the source of that information by arguing that this person was not a credentialed expert and, therefore, his testimony should not be trusted. There is obviously nothing fallacious about that, and the prosecution clearly couldn’t respond by accusing the defense of a genetic fallacy (they also couldn’t respond by saying “well he watched some Youtube videos on crime scene investigations and he’s read some blogs and done thousands of hours of research”).

So how do we resolve this apparent dilemma? The answer is that attacking the source of a claim is only fallacious if the source is irrelevant to the veracity and trustworthiness of that claim. The Internet Encyclopedia of Philosophy defines it like this (my emphasis):

A critic uses the Genetic Fallacy if the critic attempts to discredit or support a claim or an argument because of its origin (genesis) when such an appeal to origins is irrelevant.

In other words, there is nothing wrong with attacking a source if the source of the information is actually germane to whether or not you should trust the information. So, if someone cites questionable sources like Youtube videos or personal anecdotes, there is nothing wrong with you saying that we shouldn’t trust that information, because the sources actually are unreliable. That’s no different from not trusting some random guy off the street as an expert witness in a courtroom. Remember that the burden of proof is always on the person making the claim, so it is their responsibility to provide you with evidence from a trustworthy source. As a result, if they make a claim like, “vaccines are dangerous” and their “evidence” is an Info Wars article, you are under no obligation to discredit that article. Rather, it is their obligation to provide you with evidence from a reliable source.

It’s important to note, however, that you can only use attacks against a source to show that the information cannot be trusted. You cannot use them to say that the information is false. For example, if someone presents you with “evidence” from a Natural News article, there is nothing wrong with saying, “Natural News is not a reliable source, therefore we should not trust that information.” It would, however, be fallacious to say, “Natural News is not a reliable source, therefore that information is wrong” (technically that would be a special case of the fallacy fallacy). Even an extremely unreliable source may be right every once in a while.

In addition to assaults on the source of the information, the genetic fallacy can also occur when you attack the reason for a person holding a particular view. For example, I frequently see creationists attack their opponents by saying, “you only accept evolution because you are an atheist who doesn’t want to believe in God.” Even if that premise was true (which it often isn’t), it’s irrelevant. It has no bearing on whether or not evolution is true, and is, therefore, a genetic fallacy.

Finally, it’s important to realize that for an argument to be a genetic fallacy the assault on the source has to actually be the argument. For example, if you show me a scientific study, and I respond by saying, “well the authors of that study are just ugly idiots so I don’t need to listen to them,” then I would have committed a genetic fallacy (specifically, an ad hominem fallacy). If, however, I explained at length why the study was flawed, then concluded with a Trump-like jab at the authors’ appearance/intelligence, I would not have committed a fallacy. It would be uncouth and inappropriate for me to do that, but it wouldn’t actually be a fallacy because the attack on the source was tangential to my argument.

Addendum (19-Jan-16): The genetic fallacy also occurs if you assert that something is true because of its source (i.e., the appeal to authority fallacy is actually a type of genetic fallacy), but in this post, my focus was on attacking sources, rather than using them as proof of a position.

The genetic fallacy vs. the hierarchy of scientific evidence
Now that you understand what this fallacy is, let’s bring it to bear on the topic that inspired this post: the hierarchy of scientific evidence. It should by now be clear that using the hierarchy of evidence to assess the validity of a scientific claim is not the same thing as committing a genetic fallacy. Nevertheless, let’s look closer.

First, let’s look at my assertion that personal opinions, anecdotes, anti-science websites, etc. do not count as scientific evidence. It’s worth noting that I didn’t actually say that they aren’t trustworthy. Rather, I simply said that they aren’t scientific evidence, and that claim is demonstrably true because those sources do not produce evidence via the standards and methodologies of science. Therefore, they are, by definition, not scientific evidence. If I ask someone to give me scientific evidence for a position, then I am asking for actual original research. I want to see the peer-reviewed paper that found the result that they are reporting, not the YouTube video they watched.

[Figure: flowchart of how to publish a scientific peer-reviewed paper vs. a blog post]

This flowchart summarizes the steps required to publish a peer-reviewed paper and the steps required to publish a blog post. Take a careful look at this difference, then honestly tell me that you think that blogs are a better source of information about science (more details here).

Nevertheless, although I didn’t claim that non-scientific sources are untrustworthy in the original post, I clearly think that they are. People often take issue with this, but if you stop and think about it for a second, the claim is self-evident. All that I am saying is that for scientific topics, we have to use scientific evidence, which necessarily comes from the peer-reviewed literature. Websites, YouTube videos, etc. are inherently second-hand information, which may or may not be reliable. The scientific literature, on the other hand, is primary information. When you read a scientific paper, you can see the actual results of an experiment rather than simply reading someone’s biased explanation of those results. Further, publishing a peer-reviewed paper takes a tremendous amount of work. You have to pass a rigorous peer-review process during which numerous other scientists will evaluate your work to ensure that it was done correctly. In contrast, any idiot with a computer and internet connection can make a website/YouTube video with absolutely no assurance of quality control. To be clear, that doesn’t automatically mean that the information contained in second-hand sources is wrong, but it does mean that you don’t have any reason to trust that information, which is why they aren’t valid sources for scientific topics. Further, websites like Natural News, Info Wars, Answers in Genesis, etc. are notorious for containing inaccurate information, which gives you an extremely strong, relevant, and legitimate reason not to trust them.

Even within the scientific literature, however, you should be looking critically at the sources. Some experimental designs are simply more powerful than others and produce more reliable results. For example, if you have a meta-analysis of randomized controlled trials vs. a cross sectional analysis, it would not be a genetic fallacy to say that the cross sectional analysis is less reliable than the meta-analysis. From a strictly mathematical point of view, cross sectional studies are weak. They simply cannot make causal conclusions. In contrast, randomized controlled trials are very powerful and can make causal conclusions, and meta-analyses are even better because they combine multiple data sets, thus greatly increasing the sample size and reducing the chance of reaching a faulty conclusion. It’s a simple mathematical fact that meta-analyses are better than cross sectional analyses. Therefore, the type of study (i.e., the source of the information) is extremely relevant to the trustworthiness of a study, and using that information in a debate does not constitute a genetic fallacy.

[Figure: ad hominem fallacy flowchart]

Note: this flowchart only works when you are making an attack. Appeals to authority are also a type of genetic fallacy which I did not cover in this post or flowchart (you can find an explanation of them here).

Conclusion
Genetic fallacies occur when you make an irrelevant attack on the source of information rather than the information itself. That does not mean, however, that it is always fallacious to attack the source of information. Some sources clearly are better than others, and the burden of proof is always on the person making the claim. Thus, it is their responsibility to provide high quality sources, and you are not responsible for disproving the information from extremely low quality sources. Nevertheless, determining when attacks on sources are fallacious can admittedly be confusing. Therefore, I have constructed the flowchart on the right to help you determine when you can and cannot attack a source.

Note: Just to be clear, arbitrarily accusing someone of being a shill without providing actual evidence that they are being paid off does not constitute a legitimate, relevant concern.


The hierarchy of evidence: Is the study’s design robust?

Posted on January 12, 2016 by Fallacy Man

People are extraordinarily prone to confirmation biases. We have a strong tendency to latch onto anything that supports our position and blindly ignore anything that doesn’t. This is especially true when it comes to scientific topics. People love to think that science is on their side, and they often use scientific papers to bolster their position. Citing scientific literature can, of course, be a very good thing. In fact, I frequently insist that we have to rely on the peer-reviewed literature for scientific matters. The problem is that not all scientific papers are of a high quality. Shoddy research does sometimes get published, and we’ve reached a point in history where there is so much research being published that if you look hard enough, you can find at least one paper in support of almost any position that you can imagine. Therefore, we must always be cautious about eagerly accepting papers that agree with our preconceptions, and we should always carefully examine publications. I have previously dealt with this topic by describing both good and bad criteria for rejecting a paper; however, both of those posts were concerned primarily with telling whether or not the study itself was done correctly, and the situation is substantially more complicated than that. You see, there are many different types of scientific studies and some designs are more robust and powerful than others. Thus, you can have two studies that were both done correctly, but both reached very different conclusions. Therefore, when examining a paper, it is critical that you take a look at the type of experimental design that was used and consider whether or not it is robust. To aid you in that endeavor, I am going to provide you with a brief description of some of the more common designs, starting with the least powerful and moving to the most authoritative.

Note: Before I begin, I want to make a few clarifications. First, this hierarchy of evidence is a general guideline, not an absolute rule. There certainly are cases where a study that used a relatively weak design can trump a study that used a more robust design (I’ll discuss some of these instances in the post), and there is no one universally agreed upon hierarchy, but it is widely agreed that the order presented here does rank the study designs themselves in order of robustness (many of the different hierarchies include criteria that I am not discussing because I am focusing entirely on the design of the study). Second, the exact order of the designs that I have ranked as “very weak” and “weak” is debatable, but the key point is that they are always considered to be the lowest forms of evidence. Third, for the sake of brevity, I am only going to describe the different types of research designs in their most general terms. There are subcategories for most of them which I won’t go into. Fourth, this hierarchy is most germane to issues of human health (i.e., the causes of a particular disease, the safety of a pharmaceutical or food item, the effectiveness of a medication, etc.). Many other disciplines do, however, use similar methodologies and much of this post applies to them as well (for example, meta-analyses and systematic reviews are always at the top). Finally, realize that for the sake of this post, I am assuming that all of the studies themselves were done correctly and used the controls, randomization, etc. that are appropriate for that particular type of study. In reality, those are things which you must carefully examine when reading a paper.

Opinions/letters (strength = very weak)
Some journals publish opinion pieces and letters. These are rather unusual for academic publications because they aren’t actually research. Rather, they consist of the author(s) arguing for a particular position, explaining why research needs to start moving in a certain direction, explaining problems with a particular paper, etc. These can be quite good as they are generally written by experts in the relevant fields, but you shouldn’t mistake them for new scientific evidence. They should be based on evidence, but they generally do not contain any new information. Thus, it would be disingenuous to describe one by saying, “a study found that…” Rather, you can say, “this scientist made the following argument, and it is compelling…” but you cannot elevate an argument to the status of evidence. To be clear, arguments can be very informative and they often drive future research, but you can’t make a claim like, “vaccines cause autism because this scientist said so in this opinion piece.” Opinions should always guide research rather than being treated as research.

Case reports (strength = very weak)
These are essentially glorified anecdotes. They are typically reports of some single event. In medicine, these are typically centered on a single patient and can include things like a novel reaction to a treatment, a strange physiological malformation, the success of a novel treatment, the progression of a rare disease, etc. Other fields often have similar publications. For example, in zoology, we have “natural history notes” which are observations of some novel attribute or behavior (e.g., the first report of albinism in a species, a new diet record, etc.).

Case reports can be very useful as the starting point for further investigation, but they are generally a single data point, so you should not place much weight on them. For example, let’s suppose that a novel vaccine is made, and during its first year of use, a doctor has a patient who starts having seizures shortly after receiving the vaccine. Therefore, he writes a case report about it. That report should (and likely would) be taken seriously by the scientific/medical community who would then set up a study to test whether or not the vaccine actually causes seizures, but you couldn’t use that case report as strong evidence that the vaccine is dangerous. You would have to wait for a large study before reaching a conclusion. Never forget that the fact that event A happened before event B does not mean that event A caused event B (that’s actually a logical fallacy known as post hoc ergo propter hoc). It is entirely possible that the seizure was caused by something totally unrelated to the vaccine, and it just happened to occur shortly after the vaccine was administered.

Animal studies (strength = weak)
Animal studies simply use animals to test pharmaceuticals, GMOs, etc. to get an idea of whether or not they are safe/effective before moving on to human trials. Exactly where animal trials fall on the hierarchy of evidence is debatable, but they are always placed near the bottom. The reason for this is really quite simple: human physiology is different from the physiology of other animals, so a drug may act differently in humans than it does in mice, pigs, etc. Also, the strength of an animal study will be dependent on how closely the physiology of the test animal matches human physiology (e.g., in most cases a trial with chimpanzees will be more convincing than a trial with mice).

Because animal studies are inherently limited, they are generally used simply as the starting point for future research. For example, when a new drug is developed, it will generally be tried on animals before being tried on humans. If it shows promise during animal trials, then human trials will be approved. Once the human trials have been conducted, however, the results of the animal trials become fairly irrelevant. So you should be very cautious about basing your position/argument on animal trials.

It should be noted, however, that there are certain lines of investigation that necessarily end with animals. For example, when we are studying acute toxicity and attempting to determine the lethal dose of a chemical, it would obviously be extremely unethical to use human subjects. Therefore, we rely on animal studies, rather than actually using humans to determine the dose at which a chemical becomes lethal.

Finally, I want to stress that the problem with animal studies is not a statistical one; rather, it is a problem of applicability. You can (and should) do animal studies by using a randomized controlled design. This will give you extraordinary statistical power, but the result that you get may not actually be applicable to humans. In other words, you may have very convincingly demonstrated how X behaves in mice, but that doesn’t necessarily mean that it will behave the same way in humans.

In vitro studies (strength = weak)
In vitro is Latin for “in glass,” and it is used to refer to “test tube studies.” In other words, these are laboratory trials that use isolated cells, biological molecules, etc. rather than complex multi-cellular organisms. For example, if we want to know whether or not pharmaceutical X treats cancer, we might start with an in vitro study where we take a plate of isolated cancer cells and expose it to X to see what happens.

The problem is that in a controlled, limited environment like a test tube, chemicals often behave very differently than they do in an exceedingly complex environment like the human body. Every second, there are thousands of chemical reactions going on inside of the human body, and these may interact with the drug that is being tested and prevent it from functioning as desired. For something like a chemical that kills cancer cells to work, it has to be transported through the body to the cancer cells, ignore the healthy cells, not interact with all of the thousands of other chemicals that are present (or at least not interact in a way that is harmful or prevents it from functioning), and it has to actually kill the cancer cells. So, showing that a drug kills cancer cells in a petri dish only solves one very small part of a very large and very complex puzzle. Therefore, in vitro studies should be the start of an area of research, rather than its conclusion. People often don’t seem to realize this, however, and I frequently see in vitro studies being hailed as proof of some new miracle cure, proof that GMOs are dangerous, proof that vaccines cause autism, etc. In reality, you have to wait for studies with a substantially more robust design before drawing a conclusion. To be clear, as with animal studies, this is an application problem, not a statistical problem.

Cross sectional study (strength = weak-moderate)
Cross sectional studies (also called transversal studies and prevalence studies) determine the prevalence of a particular trait in a particular population at a particular time, and they often look at associations between that trait and one or more variables. These studies are observational only. In other words, they collect data without interfering with or affecting the patients. Generally, they are done via either questionnaires or examinations of medical records. For example, you might do a cross sectional study to determine the current rates of heart disease in a given population at a particular time, and while doing so, you might collect data on other variables (such as medication use or diet) in order to see if they correlate with heart disease. In other words, these studies are generally simply looking for prevalence and correlations.

There are several problems with this approach, which generally result in it being fairly weak. First, there’s no randomization, which makes it very hard to account for confounding variables. Further, you are often relying on people’s abilities to remember details accurately and respond truthfully. Perhaps most importantly, cross sectional studies cannot be used to establish cause and effect. Let’s say, for example, that you do the study that I mentioned on heart disease, and you find a strong relationship between people having heart disease and people taking pharmaceutical X. That does not mean that pharmaceutical X causes heart disease. Because cross sectional studies inherently look only at one point in time, they are incapable of disentangling cause and effect. Perhaps the heart disease causes other problems which in turn result in people taking pharmaceutical X (thus, the disease causes the drug use rather than the other way around). Alternatively, there could be some third variable that you didn’t account for which is causing both the heart disease and the need for X.
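
To see how a third variable can manufacture a correlation, here is a minimal Python simulation. Everything in it is an assumption that I made up for illustration: the confounder (being older), all of the probabilities, and the function names. In this made-up world, X has no effect whatsoever on heart disease, but older people are more likely both to take X and to have heart disease.

```python
import random

def simulate_confounded_survey(n_people, seed=0):
    """Simulate a survey in which age drives both X use and heart disease.

    X itself has NO effect on disease here. All probabilities are hypothetical.
    Returns a dict mapping (older, takes_x) -> [people, cases].
    """
    rng = random.Random(seed)
    counts = {}
    for _ in range(n_people):
        older = rng.random() < 0.5
        takes_x = rng.random() < (0.6 if older else 0.2)
        disease = rng.random() < (0.3 if older else 0.1)  # depends on age only
        cell = counts.setdefault((older, takes_x), [0, 0])
        cell[0] += 1
        cell[1] += disease
    return counts

def disease_rate(counts, takes_x, older=None):
    """Disease prevalence among takers or non-takers, optionally within one age group."""
    people = cases = 0
    for (cell_older, cell_x), (n, d) in counts.items():
        if cell_x == takes_x and (older is None or cell_older == older):
            people += n
            cases += d
    return cases / people

counts = simulate_confounded_survey(100_000, seed=1)

# Whole population: takers of X show more heart disease than non-takers.
print(disease_rate(counts, True), disease_rate(counts, False))

# Within the older group only: the difference largely disappears.
print(disease_rate(counts, True, older=True), disease_rate(counts, False, older=True))
```

The survey as a whole shows a clear association between X and heart disease, yet once you compare like with like (older with older), the association mostly vanishes. A real cross sectional study rarely knows every confounder, which is exactly the problem.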

Therefore, cross sectional studies should be used either to learn about the prevalence of a trait (such as a disease) in a given population (this is in fact their primary function), or as a starting point for future research. Finding the relationship between heart disease and X, for example, would likely prompt a randomized controlled trial to determine whether or not X actually does cause heart disease. This type of study can also be useful, however, in showing that two variables are not related. In other words, if you find that X and heart disease are correlated, then all that you can say is that there is an association, but you can’t say what the cause is; however, if you find that X and heart disease are not correlated, then you can say that the evidence does not support the conclusion that X causes heart disease (at least within the power and detectable effect size of that study).

Case-control studies (strength = moderate)
Case-control studies are also observational, and they work somewhat backwards from how we typically think of experiments. They start with the outcome, then try to figure out what caused it. Typically, this is done by having two groups: a group with the outcome of interest, and a group without the outcome of interest (i.e., the control group). Then, they look at the frequency of some potential cause within each group.

To illustrate this, let’s keep using heart disease and X, but this time, let’s set up a case-control study. To do that, we will have one group of people who have heart disease, and a second group of people who do not have heart disease (i.e., the control group). Importantly, these two groups should be matched for confounding factors. For example, you couldn’t compare a group of poor people with heart disease to a group of rich people without heart disease because economic status would be a confounding variable (i.e., that might be what’s causing the difference, rather than X). Therefore, you would need to compare rich people with heart disease to rich people without heart disease (or poor with poor, as well as matching for sex, age, etc.).

Now that we have our two groups (people with and without heart disease, matched for confounders) we can look at the usage of X in each group. If X causes heart disease, then we should see significantly higher levels of it being used in the heart disease category; whereas, if it does not cause heart disease, the usage of X should be the same in both groups. Importantly, like cross sectional studies, this design also struggles to disentangle cause and effect. In certain circumstances, however, it does have the potential to show cause and effect if it can be established that the predictor variable occurred before the outcome, and if all confounders were accounted for. As a general rule, however, at least one of those conditions is not met and this type of study is prone to biases (for example, people who suffer heart disease are more likely to remember something like taking X than people who don’t suffer heart disease). As a result, it is generally not possible to draw causal conclusions from case-controlled studies.
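
The comparison of X usage between the two groups is usually summarized as an odds ratio. Here is a minimal Python sketch of that calculation. The counts are hypothetical numbers that I invented for illustration (they are not from any real study), and the function name is my own.

```python
def odds_ratio(cases_exposed, cases_unexposed, controls_exposed, controls_unexposed):
    """Odds of having taken X among cases divided by the same odds among controls."""
    odds_cases = cases_exposed / cases_unexposed
    odds_controls = controls_exposed / controls_unexposed
    return odds_cases / odds_controls

# Hypothetical counts: 200 people with heart disease and 200 matched controls.
or_x = odds_ratio(cases_exposed=60, cases_unexposed=140,
                  controls_exposed=30, controls_unexposed=170)
print(round(or_x, 2))  # 2.43

# If X were used equally often in both groups, the odds ratio would be 1.
print(odds_ratio(50, 150, 50, 150))  # 1.0
```

An odds ratio well above 1 tells you that X use is more common among the people with heart disease, but for the reasons given above (recall bias, unmatched confounders), it does not by itself tell you that X caused anything.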

Probably the biggest advantage of this type of study, however, is the fact that it can deal with rare outcomes. Let’s say, for example, that you were interested in trying to study some rare symptom that only occurred in 1 out of every 1,000 people. Doing a cross-sectional study or cohort study would be extremely difficult because you would need hundreds of thousands of people in order to get enough people with the symptom for you to have any statistical power. With a case-control study, however, you can get around that because you start with a group of people who have the symptom and simply match that group with a group that doesn’t have the symptom. Thus, you can have a large amount of statistical power to study rare events that couldn’t be studied otherwise.
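
The arithmetic behind that claim is simple enough to sketch in a few lines of Python. The target of 200 cases and the one-control-per-case matching are assumptions that I picked for illustration, and the function name is hypothetical.

```python
def people_to_screen(cases_wanted, one_in_n):
    """Expected number of randomly sampled people needed to find `cases_wanted` cases
    when the outcome occurs in 1 out of every `one_in_n` people."""
    return cases_wanted * one_in_n

cases_wanted = 200  # hypothetical number of cases needed for adequate power

# Cross sectional or cohort design: sample people first, then wait for cases to turn up.
random_sample_size = people_to_screen(cases_wanted, one_in_n=1000)

# Case-control design: recruit the cases directly, plus one matched control each.
case_control_size = cases_wanted * 2

print(random_sample_size)  # 200000
print(case_control_size)   # 400
```

Under those assumptions, the case-control design needs 400 participants where a design based on random sampling would need roughly 200,000.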

Cohort studies (strength = moderate-strong)
Cohort studies can be done either prospectively or retrospectively (case-controlled studies are always retrospective). In a prospective study, you take a group of people who do not have the outcome that you are interested in (e.g., heart disease) and who differ (or will differ) in their exposure to some potential cause (e.g., X). Then, you follow them for a given period of time to see if they develop the outcome that you are interested in. To be clear, this is another observational study, so you don’t actually expose them to the potential cause. Rather, you choose a population in which some individuals will already be exposed to it without you intervening. So in our example, you would be seeing if people who take X are more likely to develop heart disease over several years. Retrospective studies can also be done if you have access to detailed medical records. In that case, you select your starting population in the same way, but instead of actually following the population, you just look at their medical records for the next several years (this of course relies on you having access to good records for a large number of people).

This type of study is often very expensive and time consuming, but it has a huge advantage over the other methods in that it can actually detect causal relationships. Because you actually follow the progression of the outcome, you can see if the potential cause actually preceded the outcome (e.g., did the people with heart disease take X before developing it). Importantly, you still have to account for all possible confounding factors, but if you can do that, then you can provide evidence of causation (albeit, not as powerfully as you can with a randomized controlled trial). Additionally, cohort studies generally allow you to calculate the risk associated with a particular treatment/activity (e.g., the risk of heart disease if you take X vs. if you don’t take X).
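
As a rough illustration of that risk calculation, here is a short Python sketch. The cohort sizes and case counts are hypothetical numbers that I made up, and the function names are mine.

```python
def risk(events, total):
    """Proportion of a group that developed the outcome during follow-up."""
    return events / total

def relative_risk(exposed_events, exposed_total, unexposed_events, unexposed_total):
    """Risk in the exposed group divided by risk in the unexposed group."""
    return risk(exposed_events, exposed_total) / risk(unexposed_events, unexposed_total)

# Hypothetical cohort: 5,000 people who take X and 5,000 who don't, followed for several years.
risk_with_x = risk(150, 5000)     # 3% developed heart disease
risk_without_x = risk(100, 5000)  # 2% developed heart disease
rr = relative_risk(150, 5000, 100, 5000)

print(round(risk_with_x, 3), round(risk_without_x, 3), round(rr, 2))
```

In this made-up cohort, people taking X were about 1.5 times as likely to develop heart disease. Note that you can only compute risks like this because a cohort study follows a defined group forward in time; a case-control study fixes the number of cases in advance, so it cannot give you the risk directly.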

Randomized controlled trial (strength = strong)
Randomized controlled trials (often abbreviated RCT) are the gold standard of scientific research. They are the most powerful experimental design and provide the most definitive results. They are also the design that most people are familiar with. To set one of these up, first, you select a study population that has as few confounding variables as possible (i.e., everyone in the group should be as similar as possible in age, sex, ethnicity, economic status, health, etc.). Next, you randomly select half the people and put them into the control group, and then you put the other half into the treatment group. The importance of this randomization step cannot be overstated, and it is one of the key features that makes this such a powerful design. In all of the previous designs, you can’t randomly decide who gets the treatment and who doesn’t, which greatly limits your power to account for confounding factors and makes it difficult to ensure that your two groups are the same in all respects except the treatment of interest. In randomized controlled trials, however, you can (and must) randomize, which gives you a major boost in power.
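
The randomization step itself is conceptually very simple. Here is a minimal Python sketch of it, assuming simple (unstratified) randomization of participant ID numbers; real trials often use more elaborate schemes such as block or stratified randomization, and the function name is my own.

```python
import random

def randomize(participants, seed=None):
    """Shuffle the participants, then split them into two equal-sized groups."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical study population of 20 participant ID numbers.
control, treatment = randomize(range(1, 21), seed=42)
print(sorted(control))
print(sorted(treatment))
```

Because chance alone decides who ends up in which group, any confounder (known or unknown) should be spread roughly evenly across both groups, at least when the sample is large enough.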

In addition to randomizing, these studies should be placebo controlled. This means that the people in the treatment group get the thing that you are testing (e.g., X), and the people in the control group get a sham treatment that is actually inert. Ideally, this should be done in a double blind fashion. In other words, neither the patients nor the researchers know who is in which group. This avoids both the placebo effect and researcher bias. Both placebos and blinding are features that are lacking in the other designs. In a case-control study, for example, people know whether or not they are taking X, which can affect the results.

When you think about all of these factors, the reason that this design is so powerful should become clear. Because you select your study subjects beforehand, you have unparalleled power for controlling confounding factors, and you can randomize across the factors that you can’t control for. Further, you can account for placebo effects and eliminate researcher bias (at least during the data collection phase). All of these factors combine to make randomized controlled studies the best possible design.

Now you may be wondering, if they are so great, then why don’t we just use them all the time? There are a myriad of reasons that we don’t always use them, but I will just mention a few. First, it is often unethical to do so. For example, using these studies to test the safety of vaccines is generally considered unethical because we know that vaccines work; therefore, doing that study would mean knowingly preventing children from getting a lifesaving treatment. Similarly, studies that deliberately expose people to substances that are known to be harmful are unethical. So, in those cases, we have to rely on other designs in which we do not actually manipulate the patients.

Another reason for not doing these studies is that the outcome that you are interested in may be extremely rare. If, for example, you think that a pharmaceutical causes a serious reaction in 1 out of every 10,000 people, then it is going to be nearly impossible for you to get a sufficient sample size for this type of study, and you will need to use a case-control study instead.

Cost and effort are also big factors. These studies tend to be expensive and time consuming, and researchers often simply don’t have the necessary resources to invest in them. Also, in many cases, the medical records needed for the other designs are readily available, so it makes sense to learn as much as we can from them.

Systematic reviews and meta-analyses (strength = very strong)
Sitting at the very top of the evidence pyramid, we have systematic reviews and meta-analyses. These are not experiments themselves, but rather are reviews and analyses of previous experiments. Systematic reviews carefully comb through the literature for information on a given topic, then condense the results of numerous trials into a single paper that discusses everything that we know about that topic. Meta-analyses go a step further and actually combine the data sets from multiple papers and run a statistical analysis across all of them.

Both of these designs produce very powerful results because they avoid the trap of relying on any one study. One of the single most important things for you to keep in mind when reading scientific papers is that you should always beware of the single study syndrome. Bad papers and papers with incorrect conclusions do occasionally get published (sometimes through no fault of the authors). Therefore, you always have to look at the general body of literature, rather than latching onto one or two papers, and meta-analyses and reviews do that for you. Let’s say, for example, that there are 19 papers saying that X does not cause heart disease, and one paper saying that it does. People would be very prone to latch onto that one paper, but the review would correct that error by putting that one study in the broader context of all of the other studies that disagree with it, and the meta-analysis would deal with it by running a single analysis over the entire data set (combined from all 20 papers).
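
To give a feel for how pooling tames a single outlying paper, here is a minimal Python sketch. It uses fixed-effect inverse-variance weighting, which is one common way of combining study results; I am using it as a stand-in, not claiming that it is what any particular meta-analysis does. The effect sizes and standard errors are hypothetical numbers invented for the 19-vs-1 example, and the function name is mine.

```python
import math

def pooled_effect(effects, std_errors):
    """Fixed-effect inverse-variance pooling: weight each study by 1 / SE^2."""
    weights = [1 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    estimate = sum(w * e for w, e in zip(weights, effects)) / total_weight
    pooled_se = math.sqrt(1 / total_weight)
    return estimate, pooled_se

# Hypothetical effect sizes (e.g., log risk ratios, where 0 means "no effect"):
# 19 studies that found nothing and one that found a large effect.
effects = [0.0] * 19 + [0.8]
std_errors = [0.2] * 20  # assume every study was equally precise

estimate, pooled_se = pooled_effect(effects, std_errors)
print(round(estimate, 3), round(pooled_se, 3))  # 0.04 0.045
```

On its own, the outlier reports an effect of 0.8 with a standard error of 0.2, which looks impressive. Pooled with the other 19 studies, the estimate shrinks to 0.04 with a standard error of about 0.045, which is entirely consistent with no effect at all.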

Importantly, garbage in = garbage out. These papers should always list their inclusion and exclusion criteria, and you should look carefully at them. A systematic review of cross sectional analyses, for example, would not be particularly powerful, and could easily be trumped by a few randomized controlled trials. Conversely, a meta-analysis of randomized controlled trials would be exceedingly powerful. Therefore, these papers tend to be designed such that they eliminate the low quality studies and focus on high quality studies (sample size may also be an inclusion criterion). These criteria can, however, be manipulated such that they only include papers that fit the researchers’ preconceptions, so you should watch out for that.

Finally, even if the inclusion criteria seem reasonable and unbiased, you should still take a look at the papers that were eliminated. Let’s say, for example, that you had a meta-analysis/review that only looked at randomized controlled trials that tested X (which is a reasonable criterion), but there are only five papers like that, and they all have small sample sizes. Meanwhile, there are dozens of case-control and cohort studies on X that have large sample sizes and disagree with the meta-analysis/review. In that case, I would be pretty hesitant to rely on the meta-analysis/review.

The importance of sample size
As you have probably noticed by now, this hierarchy of evidence is a general guideline rather than a hard and fast rule, and there are exceptions. The biggest of these is caused by sample size. It’s really the wild card in this discussion because a small sample size can rob a robust design of its power, and a large sample size can supercharge an otherwise weak design.

Let’s say, for example, that there was a meta-analysis of 10 randomized controlled trials looking at the effects of X, and each of those 10 studies only included 100 subjects (thus the total sample size is 1000). Then, after the meta-analysis, someone published a randomized controlled trial with a sample size of 10,000 people, and that study disagreed with the meta-analysis. In that situation, I would place far more confidence in the large study than in the meta-analysis. Honestly, even if that study was a cohort or case-controlled study, I would probably be more confident in its results than in the meta-analysis, because that large of a sample size should give it extraordinary power; whereas, the relatively small sample size of the meta-analysis gives it fairly low power.
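
One way to see why the larger study deserves more confidence is to look at how the uncertainty in an estimate shrinks as the sample grows. The Python sketch below computes the standard error of a difference between two proportions. The event rates are hypothetical, the function name is mine, and I am treating the pooled meta-analysis as if it were a single trial of 1,000 people split evenly into two groups, which is a simplification.

```python
import math

def se_of_difference(p_control, p_treatment, n_per_group):
    """Standard error of the difference between two independent proportions."""
    return math.sqrt(p_control * (1 - p_control) / n_per_group
                     + p_treatment * (1 - p_treatment) / n_per_group)

# Hypothetical event rates: 10% in the control group, 12% in the treatment group.
se_small = se_of_difference(0.10, 0.12, n_per_group=500)    # 1,000 people in total
se_large = se_of_difference(0.10, 0.12, n_per_group=5000)   # 10,000 people in total

print(round(se_small, 4), round(se_large, 4))
print(round(se_small / se_large, 2))  # 3.16
```

With ten times as many people, the standard error is smaller by a factor of the square root of ten (about 3.16). In this example, the true difference of 0.02 is only about one standard error wide in the small sample, so it would be lost in the noise, whereas in the large sample it is more than three standard errors wide.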

Unfortunately, however, there are very few clear guidelines about when sample size can trump the hierarchy. The lowest level studies generally cannot be rescued by sample size (e.g., I have great difficulty imagining a scenario in which sample size would allow an animal study or in vitro trial to trump a randomized controlled trial, and it is very rare for a cross sectional analysis to do so), but for the more robust designs, things become quite complicated. For example, let’s say that we have a cohort study with a sample size of 10,000, and a randomized controlled trial with a sample size of 7,000. Which should we trust? I honestly don’t know. If both of them were conducted properly, and both produced very clear results, then, in the absence of additional evidence, I would have a very hard time determining which one was correct.

This brings me back to one of my central points: you have to look at the entire body of research, not just one or two papers. The odds of a single study being flawed are fairly high, but the odds of a large body of studies all being flawed in the same direction are much lower. In some cases, this will mean that you simply can’t reach a conclusion yet, and that’s fine. The whole reason that we do science is that there are things that we don’t know, and sometimes it takes many years to accumulate enough evidence to see through the statistical noise and detect the central trends. So, there is absolutely nothing wrong with saying, “we don’t know yet, but we are looking for answers.”
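
The arithmetic behind that claim can be sketched in a few lines of Python. This is a deliberately simplified illustration: it assumes that each study has the same chance of reaching a misleading conclusion (the 20 percent figure is hypothetical) and that studies err independently of one another. Real studies can share biases, so treat the result as a best case rather than a guarantee. The function name `prob_majority_wrong` is my own.

```python
from math import comb


def prob_majority_wrong(num_studies, p_wrong):
    """Probability that more than half of the studies are misleading.

    Assumes every study is misleading with the same probability p_wrong
    and that the studies err independently (a strong simplification).
    """
    needed = num_studies // 2 + 1  # smallest count that forms a majority
    return sum(
        comb(num_studies, k) * p_wrong**k * (1 - p_wrong) ** (num_studies - k)
        for k in range(needed, num_studies + 1)
    )


# Hypothetical per-study chance of a misleading result.
ASSUMED_P_WRONG = 0.2

for num_studies in (1, 5, 11):
    prob = prob_majority_wrong(num_studies, ASSUMED_P_WRONG)
    print(f"{num_studies} studies: P(majority misleading) = {prob:.4f}")
```

With one study, the chance of being misled is simply the assumed 20 percent. With eleven independent studies, the chance that most of them point the wrong way falls to roughly 1 percent.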

Conclusion
I have tried to present you with a general overview of some of the more common types of scientific studies, as well as information about how robust they are. You should always keep this in mind when reading scientific papers, but I want to stress again that this hierarchy is a general guideline only, and you must always take a long, hard look at the paper itself to make sure that it was done correctly. While doing so, make sure to look at its sample size and see if it actually had the power necessary to detect meaningful differences between its groups. Perhaps most importantly, always look at the entire body of evidence, rather than just one or two studies. For many anti-science and pseudoscience topics like homeopathy, the supposed dangers of vaccines and GMOs, etc., you can find papers in support of them, but those papers generally have small sample sizes and use weak designs, whereas many much larger studies with more robust designs have reached the opposite conclusions. This should tell you that those small studies are simply statistical noise, and you should rely on the large, robustly designed studies instead.

Suggested reading:

Evans. 2002. Hierarchy of evidence: a framework for ranking evidence evaluating healthcare interventions. Journal of Clinical Nursing 77-84.
Lewallen and Courtright. 1998. Epidemiology in practice: Case-control studies. Community Eye Health 11: 57-58.
Mann. 2003. Observational research methods. Research design II: cohort, cross sectional, and case-control studies. Emergency Medicine Journal 20:54-60.
Silva (ed.). 1999. Cancer Epidemiology: Principles and Methods. WHO: International Agency for Research on Cancer.
(This book has several very good, easy to read chapters on research designs. You can also get to the individual chapters through the IARC website)
Song and Chung. 2010. Observational studies: Cohort and case-control studies. Plastic and Reconstructive Surgery 126:2234-2242.

Posted in Nature of Science | Tagged evaluating evidence, peer-reviewed studies, statistics
