Correlation, Causation, and Qualitative Research
October 9, 2014
Kathleen Marker, PhD
CNN recently published an article in which the author stated that “doctors have been less willing to prescribe medications, especially in states like Florida, formerly known for its pill mills, where tighter restrictions on prescribers led to a 23% drop in overdose deaths between 2010 and 2012.”
The problem with this statement is that the author is describing a correlation (overdose deaths dropped 23% over the same period in which tighter restrictions on prescribers came into effect) but telling the reader that the drop was caused by those restrictions. The author is teaching the reader that correlation equals causation.
Unfortunately, arguments based on correlation are common for a wide range of reasons: they are easier to report, make arguments clearer, take up less space, and give power to quantitative data. At times, authors themselves are unaware of their error and the difference between correlation and causation.
Extensive and systematic qualitative research allows for a deeper exploration of causation. However, qualitative research costs money, is time-consuming, and often presents findings in stories that are more complex than the neatly packaged 23% drop. Qualitative researchers would likely discover that a range of causes is responsible for the drop in overdoses (e.g., changes in racism, increases in employment, immigration adjustments, new housing policies, advancements in drug treatment), which means the author would need to deliver a messier, less clearly actionable plan. Controlling prescribers may not reduce deaths.
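To make the gap between correlation and causation concrete, here is a minimal sketch in Python. It uses made-up numbers, not the Florida data. It assumes a hypothetical hidden factor (the variable hidden, which could stand in for something like employment or access to treatment) that pushes restrictions up and overdose deaths down, while restrictions have no effect on deaths at all. Every name and number in it is an illustrative assumption.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def residuals(ys, xs):
    """What is left of ys after removing its best straight-line fit on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return [y - mean_y - slope * (x - mean_x) for x, y in zip(xs, ys)]


def simulate(n=5000, seed=0):
    """Made-up data: a hidden factor drives both series; neither affects the other."""
    rng = random.Random(seed)
    hidden = [rng.gauss(0, 1) for _ in range(n)]          # hypothetical third factor
    restrictions = [h + rng.gauss(0, 1) for h in hidden]  # rises with the hidden factor
    deaths = [-h + rng.gauss(0, 1) for h in hidden]       # falls with the hidden factor
    return hidden, restrictions, deaths


hidden, restrictions, deaths = simulate()

# Raw correlation: clearly negative, even though restrictions never
# appear in the formula that generates deaths.
raw = pearson(restrictions, deaths)

# Correlation after removing the hidden factor from both series.
adjusted = pearson(residuals(restrictions, hidden), residuals(deaths, hidden))

print(f"raw correlation:      {raw:.2f}")
print(f"adjusted correlation: {adjusted:.2f}")
```

Under these assumptions the raw correlation should land near -0.5, while the adjusted one should sit close to zero. A reader who saw only the raw figure could easily conclude that restrictions reduce deaths, yet in this toy world they do nothing. The sketch does not show that this is what happened in Florida, only that a correlation by itself cannot rule it out.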
Using a correlation to make an argument about causation has two consequences. First, it creates an inaccurate and overly simplistic picture of the world for the reader. It prevents the author from encouraging the reader to think about larger societal issues, such as economic health, racism, health care policy, ageism, immigration, and religion; the list goes on. Second, in this particular instance, it can lead to bad policies for drug users and our health care system. Policies that tighten restrictions on the strength of this correlation may end up increasing drug overdoses.
This is all to say that as daily consumers of data, we need to be more critical of where our data are coming from and how they are being used. Additionally, as researchers who use data daily to tell stories, we need to be less fearful of complex and even contradictory stories. We should not shy away from data that baffle our hypotheses. We need to want to find all possible explanations in order to tell the most accurate stories about causation possible. Correlations are a great start in data generation, but getting to the bottom of why data correlate will allow us to build better practices for businesses, governments, and schools, while teaching Americans that society is much more complex and interesting than we are commonly led to believe.