Correlation vs. Causation: What’s the Difference?

Written by Coursera Staff

Learn about correlation versus causation and how to differentiate these two terms from one another when describing the relationship between variables.

In analytics, correlation and causation both describe relationships between variables. However, the two terms are not interchangeable and have significant differences. Causation indicates that one event causes another. Correlation only identifies that there is a relationship between two events or outcomes.

When two variables change together, it is tempting to assume that one caused the other or that the two are somehow directly connected. However, this isn’t always the case, which makes it important to be able to distinguish between correlation and causation.

What is meant by correlation vs. causation?

The distinction between correlation and causation comes down to whether two events are simply related to each other or whether one caused the other to happen. It is an important consideration because the presence of a correlation between two variables doesn’t mean one causes the other. When a clear relationship exists between variables, it can be tempting to conclude that a cause-and-effect relationship is present.

The problem with drawing this conclusion is that you may fail to consider other factors or variables that could produce the correlation. The correlation you observe may reflect causation, since both can be present at once, but correlation alone isn’t enough to establish causation.

What is correlation?

Correlation measures the strength and direction of the linear relationship between variables. In a positive correlation, the two variables move in the same direction: when one goes up, the other tends to go up, and when one goes down, the other tends to go down, too.

A negative correlation describes the opposite: as one variable goes up, the other tends to go down, so the two variables move in opposite directions. If no linear relationship exists between the variables, you would say there’s zero correlation [1].

You can represent the strength of the relationship between variables using a correlation coefficient ranging from -1 to +1. The closer the coefficient is to zero, the weaker the linear relationship:

  • 1 = Perfect positive correlation

  • 0.5 = Moderate positive correlation (exact cutoffs vary by field)

  • 0 = Zero correlation

  • -0.5 = Moderate negative correlation (exact cutoffs vary by field)

  • -1 = Perfect negative correlation
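To make the scale above concrete, here is a minimal sketch of how a Pearson correlation coefficient can be computed by hand. The function name `pearson_r` and the sample numbers are made up for illustration; in practice you would more likely call a statistics library.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    # Sum of products of deviations, scaled by the spread of each variable.
    cov = sum(a * b for a, b in zip(dx, dy))
    norm = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if norm == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / norm

# Illustrative data only (not from the article).
hours = [1, 2, 3, 4, 5]
score_up = [52, 54, 56, 58, 60]    # rises in lockstep with hours
score_down = [60, 58, 56, 54, 52]  # falls in lockstep with hours

r_up = pearson_r(hours, score_up)
r_down = pearson_r(hours, score_down)
print(r_up, r_down)  # one value at each end of the -1 to +1 scale
```

Real data rarely lines up this neatly, so coefficients usually fall somewhere between the two extremes.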

You can also use scatter plots to visualize correlations. With a positive correlation, the points trend upward from left to right. With a negative correlation, they trend downward from left to right. A scatter plot representing variables with no correlation will have points that appear spread throughout the graph with no clear trend [2].

There are limits to how much you can learn from correlations, as correlation alone isn’t enough to prove causation. Additionally, the correlation coefficient captures only linear relationships between variables, so a strong curved relationship can still produce a coefficient near zero.
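The linear-only limitation can be seen in a small sketch. Here `y` is completely determined by `x`, yet the coefficient comes out at zero because the relationship is a curve rather than a line. The helper `pearson_r` and the data are illustrative assumptions.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# y depends entirely on x, but through a symmetric curve (y = x squared).
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x * x for x in xs]

r_curve = pearson_r(xs, ys)
print(r_curve)  # zero correlation despite a perfect (nonlinear) dependence
```

This is one reason a scatter plot is worth checking alongside the coefficient.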

Even when variables are strongly correlated, that doesn’t prove a change in one variable caused the change in the other. To show that, you must establish causation.

What is causation?

Causation occurs when one variable is directly responsible for the change in the other. In other words, a change in one variable causes a change in another variable. This is much more difficult to prove than correlation and typically requires a controlled experiment in which you change an independent variable while holding other variables constant.

In order to prove causation, you need a properly designed experiment that demonstrates these three conditions: 

  • Temporal sequencing: X, the variable causing the change, must occur before Y, the variable that changes.

  • Non-spurious relationship: You can show that the relationship between X and Y is unlikely to have occurred simply by chance.

  • Elimination of alternative causes: You can show that the relationship between X and Y isn’t due to outside variables that aren’t considered part of the experiment.

If your experiment fails to demonstrate temporal sequencing, establish a non-spurious relationship, or eliminate plausible alternative causes, you can’t prove causation [3].

Does correlation imply causation?

Although it’s possible for both correlation and causation to occur at the same time, correlation doesn’t imply causation. The relationship between the variables could be due to a third variable, or it could simply be a coincidence.

Examples of correlation vs. causation

If you were to collect data on the sale of ice cream cones and swimming pools throughout the year, you would likely find a strong positive correlation between the two as sales of both increase during the summer months. If you make the mistake of assuming correlation implies causation, you would incorrectly claim that an increase in ice cream cone sales causes people to buy swimming pools. However, this isn’t the case since you can attribute the increase in both to another variable—likely the warmer weather people experience during the summer. So although a correlation is present, you can't support causation. 
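The ice cream and swimming pool example can be sketched as a small simulation. Everything here is an assumption made for illustration: the sales formulas, the noise levels, and the choice to "control for" temperature by removing a straight-line fit from each series. No real sales data is involved.

```python
import math
import random

def mean(values):
    return sum(values) / len(values)

def pearson_r(xs, ys):
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def residuals(ys, xs):
    """What is left of ys after subtracting a straight-line fit on xs."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [y - (my + slope * (x - mx)) for x, y in zip(xs, ys)]

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical daily data: temperature drives both kinds of sales,
# and neither kind of sale has any effect on the other.
temperature = [rng.uniform(0, 35) for _ in range(365)]
ice_cream = [20 + 3.0 * t + rng.gauss(0, 10) for t in temperature]
pools = [5 + 0.5 * t + rng.gauss(0, 3) for t in temperature]

raw_r = pearson_r(ice_cream, pools)
adjusted_r = pearson_r(residuals(ice_cream, temperature),
                       residuals(pools, temperature))

print(f"raw correlation:           {raw_r:.2f}")
print(f"after removing temperature: {adjusted_r:.2f}")
```

In this simulated setup the raw correlation is strong, while the correlation that remains after accounting for temperature is close to zero. That is the pattern you would expect when a third variable is behind the relationship.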

In another correlation versus causation example, it may not be as easy to identify whether causation is present with two variables. For example, you could find a correlation between the amount someone exercises and their reported levels of happiness. While it’s possible an increase in exercise is causing an increase in happiness, you can't say for sure that it’s the cause since there could be another unknown variable that has a more significant influence on a person's mood.

Reliable ways to determine causation

To reliably determine causation, you can perform randomized A/B/n testing, which works like an A/B test but compares any number of additional variants. Because subjects are assigned to variants at random, other possible factors tend to balance out across the groups, so a difference in outcomes can be attributed to the variant itself.
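The value of random assignment can be sketched with a simulation. For brevity it uses two variants rather than several, and every number in it (the baseline distribution, the true effect of 5, the sample size) is an assumption chosen for illustration. It compares random assignment with a setup in which users with a higher baseline end up in the treated group.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable
TRUE_EFFECT = 5.0       # assumed effect of the new variant in this simulation

def mean(values):
    return sum(values) / len(values)

def outcome(baseline, treated):
    """Simulated result for one user: baseline, plus the effect if treated."""
    return baseline + (TRUE_EFFECT if treated else 0.0) + rng.gauss(0, 1)

def estimate_effect(baselines, assignments):
    """Difference in mean outcome between treated and control users."""
    treated = [outcome(b, True) for b, t in zip(baselines, assignments) if t]
    control = [outcome(b, False) for b, t in zip(baselines, assignments) if not t]
    return mean(treated) - mean(control)

# Each user has a hidden baseline that also influences the outcome.
baselines = [rng.gauss(50, 10) for _ in range(2000)]

# Randomized test: a coin flip decides who sees the new variant.
random_assignment = [rng.random() < 0.5 for _ in baselines]
# Non-randomized comparison: high-baseline users end up in the treated group.
self_selected = [b > 50 for b in baselines]

est_random = estimate_effect(baselines, random_assignment)
est_selected = estimate_effect(baselines, self_selected)

print(f"true effect:              {TRUE_EFFECT:.1f}")
print(f"randomized estimate:      {est_random:.1f}")
print(f"self-selected estimate:   {est_selected:.1f}")
```

With random assignment the hidden baseline is spread roughly evenly across both groups, so the estimate lands near the true effect. In the self-selected comparison the baseline difference is mixed into the estimate and inflates it.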

The other method that supports conclusions about causation is hypothesis testing. In hypothesis testing, you test your primary hypothesis against a null hypothesis, which states that no effect or relationship exists. If the data you collect would be very unlikely under the null hypothesis, you reject it in favor of your primary hypothesis, which helps you be as confident as possible that your results aren’t due to chance.
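One simple way to run such a test is a permutation test, sketched below. It is only one of several possible approaches and is not named in the article. The function name `permutation_p_value` and the measurements are invented for illustration. The null hypothesis is that the group labels make no difference, so shuffling the labels shows how large a difference chance alone tends to produce.

```python
import random

def mean(values):
    return sum(values) / len(values)

def permutation_p_value(group_a, group_b, n_shuffles=5000, seed=0):
    """Estimate how often shuffled labels give a difference in means
    at least as large as the observed one (two-sided)."""
    rng = random.Random(seed)
    observed = abs(mean(group_b) - mean(group_a))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[n_a:]) - mean(pooled[:n_a]))
        if diff >= observed:
            extreme += 1
    # Add one to both counts so the estimate is never exactly zero.
    return (extreme + 1) / (n_shuffles + 1)

# Made-up measurements for a control group and a variant group.
control = [12, 15, 11, 14, 13, 16, 12, 15]
variant = [18, 17, 19, 16, 20, 18, 17, 21]

p_value = permutation_p_value(control, variant)
print(f"p-value: {p_value:.4f}")
if p_value < 0.05:  # 0.05 is a common but arbitrary threshold
    print("Reject the null hypothesis of no difference between groups.")
else:
    print("Not enough evidence to reject the null hypothesis.")
```

A small p-value speaks to the non-spurious condition only. Whether the difference is causal still depends on how the groups were formed, for example through random assignment.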

Learn more with Coursera 

To develop important analytical skills, such as data collection, data calculations, and data analysis, consider earning a Google Data Analytics Professional Certificate on Coursera. With this certificate, you can qualify for in-demand positions in less than six months, such as a data analyst or junior data analyst. 

The University of Colorado Boulder’s Statistical Inference and Hypothesis Testing in Data Science Applications and Data Analysis Tools from Wesleyan University on Coursera are also great courses to learn more about how you can properly implement hypothesis testing.

Article sources

1. University of North Carolina Wilmington. "Bivariate Correlations (Pearson's r)," http://people.uncw.edu/pricej/teaching/statistics/correlations.htm. Accessed August 18, 2023.
