Ask Professor Puzzler
Dear Professor Puzzler,
Can you explain to me the difference between correlation and causation? Is there a difference?
Correlation is a statistical link between two sets of data: when one changes, the other tends to change along with it. For example, suppose I created a graph of electricity usage by month from January to December, and then superimposed on it a graph of ice cream consumption per person by month. You might find a correlation between the two sets of data: as ice cream consumption increases, so does electricity usage. Similarly, as ice cream consumption decreases, electricity usage does also. We say there is a correlation, or link, between those two data sets. We haven't made any claims about the reason for that link; we have simply acknowledged that the link exists.
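One common way to put a number on a correlation is the Pearson correlation coefficient, r, which runs from -1 (one series rises exactly as the other falls) through 0 (no linear link) to +1 (the two rise and fall together). Here is a minimal sketch that computes it by hand; the monthly figures are invented for illustration and are not real ice cream or electricity data.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2 values)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How x and y move together around their own means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # How much each sequence spreads around its own mean
    sx = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sy = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical monthly figures, January..December (invented, not real data)
ice_cream = [2, 2, 3, 4, 6, 8, 9, 9, 7, 5, 3, 2]                                # liters per person
electricity = [700, 690, 720, 760, 850, 980, 1050, 1040, 900, 800, 730, 710]  # kWh per household

r = pearson_r(ice_cream, electricity)
print(f"r = {r:.2f}")
```

With these made-up numbers r comes out close to +1. Notice that nothing in the calculation says *why* the two series move together; it only measures how closely they do.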
Causation is a word that is used to indicate that one thing causes another. Causation is NOT the same as correlation. Consider the example of the ice cream and the energy usage. Does eating ice cream cause you to use more electricity? Of course not! Does using electricity cause you to eat ice cream? No! Those are both silly ideas!
So what's going on here? It's simple: as temperatures increase, people are more eager to eat ice cream, because it cools them down. But at the same time, people are running air conditioners more, and their refrigerators are working harder because of the heat. Both the ice cream consumption and the energy consumption are a result of the changing temperatures from one month to the next.
This is why it's very important to understand that correlation does not imply causation. Just because two things have a correlation does not mean that one causes the other!
Here's another example. Take a look at the chart below. It's used as proof that autism is caused by the use of herbicides on crops:
It shows a very definite correlation between herbicide use and autism. Later on we'll tear this graph apart, because it is one of the most atrociously unscientific graphs I've ever seen, and the fact that it's linked from the home page of an MIT researcher is downright embarrassing. But we'll get to that a bit later.
Now that you've looked at this graph and drawn the obvious conclusion (herbicide causes autism), take a look at the next graph, which shows another, quite surprising correlation.
Yes, that's right - you're reading that graph correctly. There is a very clear correlation between purchase of organic foods and autism. The conclusion is very clear: Organic foods cause autism!
I'm sorry, you can't have it both ways. If the first graph (which is merely a correlation) is proof of causation, then so is the second one.
The real challenge is this: if you can't eat chemically treated crops, and you can't eat organic crops, how do you avoid autism without starving to death?
The Embarrassing Glyphosate and Autism Graph
I promised we'd tear apart that herbicide/autism graph, and we will. This graph has been shared on the home page of an MIT researcher (whose postgraduate degrees are in Electrical Engineering and Computer Science). Never mind the fact that she probably doesn't have any business meddling in biology-related stuff; if you have a master's degree or a Ph.D. in any subject area, you know enough to realize that this chart is horrifically nonsensical. For crying out loud, my high school science students probably know enough to realize how dumb this chart is.
So, is this graph really so bad? And if so, how? Oh, let me count the ways...
- The graph uses total numbers of autism cases instead of autism cases per capita, which means it does not take into account population increase (which was significant between 1990 and 2010).
- The graph shows only autistic children served by IDEA (Individuals with Disabilities Education Act). What's wrong with that? Oh, I don't know, maybe the fact that IDEA started serving autistic children in 1990, which means the numbers in the first few years of that graph climb simply because IDEA itself was ramping up.
- The graph does not take into account the fact that diagnosis rates have significantly increased over that time interval; more and more children who used to be diagnosed with mental retardation are now being diagnosed as being on the autism spectrum.
- The graph shows total gallons of herbicide, without any reference to gallons per acre, or number of acres of cultivated land.
- One chart represents USA numbers (autistic children served by IDEA) while the other chart is worldwide (do you really think we don't export significant amounts of corn and soy? Check the USDA site if you doubt it!) If you're comparing US numbers with global numbers, you're not just comparing apples with oranges, you're comparing apples with harmonicas. Or some equally absurd comparison.
- The correlation goes even further out the window if you attempt to limit your gallons of herbicide to ONLY crops used for human consumption. Based on a quick scan of USDA and NCGA websites, it appears that the amount of corn used for ethanol is from 20% to 40% of the entire corn harvest, and that number is rapidly increasing.
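Several of the objections above come down to dividing a raw total by the right base before comparing it with anything. Here is a minimal sketch of that arithmetic; the helper name `rate` and every figure are hypothetical, invented only to show how totals can rise while the underlying rates stay flat.

```python
def rate(count, base, per=1):
    """Turn a raw total into a rate: count per `per` units of `base`."""
    if base <= 0:
        raise ValueError("base must be positive")
    return count * per / base

# Hypothetical figures, invented only to show the arithmetic (not real data).
cases      = {"year 1": 1_000,     "year 2": 1_200}      # total diagnosed cases
population = {"year 1": 1_000_000, "year 2": 1_200_000}  # total population
gallons    = {"year 1": 5_000,     "year 2": 8_000}      # total herbicide applied
acres      = {"year 1": 1_000,     "year 2": 1_600}      # cultivated acres

for year in cases:
    per_10k = rate(cases[year], population[year], per=10_000)
    per_acre = rate(gallons[year], acres[year])
    print(f"{year}: {cases[year]:,} cases = {per_10k:.1f} per 10,000 people; "
          f"{gallons[year]:,} gallons = {per_acre:.1f} gallons per acre")
```

With these made-up numbers, the case total rises 20% and the herbicide total rises 60%, yet both rates stay exactly where they were. A graph of raw totals would show two "trends" that disappear once each total is divided by its base.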
Are there more problems with this graph? Oh, undoubtedly! Those are just the ones I came up with off the top of my head, without doing any serious research. And yet, ironically, the "researcher" who shares this on her home page has the audacity to whine about scientific journals that have unfair standards of statistical analysis.
So, Martin, you probably got more than you bargained for - this turned into a little rant (hope you don't mind). But I hope it'll help you remember that just because two things have a correlation doesn't mean that one caused the other!
P.S. If you're wondering why I didn't rip the second chart to shreds, it's because no one actually believes the second one; it was created entirely to prove a point. The organic-food chart (along with a diabetes chart) can be found here.