Correlation vs. Causation: What’s the Difference?

Written by Coursera Staff • Updated on

Learn about correlation versus causation and how to differentiate these two terms from one another when describing the relationship between variables.

[Feature image] A woman experiencing the nuances of correlation and causation through a game of Pickup Sticks.

In analytics, correlation and causation both describe relationships between variables. However, the two terms are not interchangeable and have significant differences. Causation indicates that one event causes another. Correlation only identifies that a relationship exists between two events or outcomes.

When two variables change together, you may assume that one caused the other or that the two are somehow directly connected. However, this isn’t always the case, which makes it important to be able to distinguish between correlation and causation. Explore correlation versus causation as well as how to differentiate these two terms from one another when describing the relationship between variables.

What is meant by correlation vs. causation?

The distinction between correlation and causation comes down to whether two events are simply related to each other or whether one caused the other to happen. It is an important consideration because the presence of a correlation between two variables doesn’t mean one causes the other. When a clear relationship exists between variables, it can be tempting to conclude that a cause-and-effect relationship is present.

Jumping to that conclusion, though, may prevent you from considering other factors or variables that could explain the correlation. The correlation you are observing may reflect causation, as both can be present at once, but correlation alone isn’t enough to establish causation.

What is correlation?

Correlation measures the linear relationship between variables. In a positive correlation, when the value of one variable goes up, the other does as well. When one variable goes down, the other variable descends, too.

A negative correlation describes the opposite: as one variable goes up, the other goes down, with the two variables moving in opposite directions. If no linear relationship exists between variables, you would say zero correlation is present [1].

You can represent the strength of the relationship between variables using a correlation coefficient ranging from -1 to +1, where the closer the coefficient is to zero, the weaker the linear relationship is:

  • 1 = Perfect positive correlation

  • 0.5 = Moderate positive correlation

  • 0 = Zero correlation

  • -0.5 = Moderate negative correlation

  • -1 = Perfect negative correlation
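
To make the scale above concrete, here is a minimal sketch of how a Pearson correlation coefficient can be computed by hand. The function name `pearson_r` and the sample data are illustrative assumptions, not part of any particular library.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations from the means (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: one series that rises with x and one that falls.
hours = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]
falling = [10, 8, 6, 4, 2]

r_rising = pearson_r(hours, rising)    # perfect positive correlation
r_falling = pearson_r(hours, falling)  # perfect negative correlation
print(r_rising, r_falling)
```

Real data rarely lands exactly on +1 or -1. Those values appear here only because the sample series are exact straight lines.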

You can also use scatter plots to visualize correlations. With a positive correlation, the points on the scatter plot trend upward from left to right. With a negative correlation, they trend downward from left to right. A scatter plot representing variables with no correlation will have points that appear spread throughout the graph [2].

Limitations exist when it comes to how much you can learn from correlations, as correlation alone isn’t enough to prove causation. Additionally, the correlation coefficient captures only linear relationships between variables, so a strong nonlinear relationship can still produce a coefficient near zero.
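
As a small illustration of the linearity limitation, the sketch below uses made-up data in which one variable is completely determined by the other (y is x squared), yet the correlation coefficient comes out as zero. The helper `pearson_r` is a hand-written function for this example, not a library call.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: y depends entirely on x, but along a U-shaped curve.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

r = pearson_r(xs, ys)
print(r)  # the rising and falling halves cancel out, so r is 0
```

This is why a scatter plot is worth checking alongside the coefficient: the curve is obvious on a plot even though the number suggests no relationship.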

Even when variables are strongly correlated, that doesn’t prove a change in one variable caused the change in the other. To be able to do that, you must establish causation.

What is causation?

Causation occurs when one variable is directly responsible for the change in the other. In other words, a change in one variable causes a change in another variable. Causation can be more challenging to prove than correlation and requires experimentation using both independent and controlled variables. 

In order to prove causation, you need a properly designed experiment that demonstrates these three conditions: 

  • Temporal sequencing: X, the variable causing the change, comes before Y, the variable that changes.

  • Non-spurious relationship: You can demonstrate that the relationship between X and Y is very unlikely to have occurred simply by chance.

  • Elimination of alternative causes: The relationship between X and Y isn’t due to other outside variables that aren’t considered part of the experiment.

If your experiment fails to demonstrate temporal sequencing, establish a non-spurious relationship, or eliminate possible alternative causes, you can’t prove causation [3]. Meeting all three conditions is what makes causation harder to establish than correlation.

What is causality in simple terms?

Essentially, causality is understanding how one thing influences another thing and how a cause produces an effect. Nothing in the world tends to happen without something having caused it. Change is a consistent aspect of reality, and causality is rooted in identifying the incident that caused the change. Take a look at two examples of causality you might recognize:

1. If you plant a seed (cause), a tree might grow (effect).

2. If you press the gas pedal (cause), your car will move forward (effect).

Does correlation imply causation?

Although it’s possible for both correlation and causation to occur at the same time, correlation doesn’t imply causation. This is because the relationship between variables could either be due to a third variable or simply a coincidence. 

Examples of correlation vs. causation

If you were to collect data on the sale of ice cream cones and swimming pools throughout the year, you would likely find a strong positive correlation between the two as sales of both increase during the summer months. If you make the mistake of assuming correlation implies causation, you might claim that an increase in ice cream cone sales causes people to buy swimming pools. However, this isn’t the case since you can attribute the increase in both to another variable—likely the warmer weather people experience during the summer. Therefore, although a correlation is present, you can't support causation. 
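
The ice cream and swimming pool example can be sketched as a small simulation. Everything here is an assumption made for illustration: the data is randomly generated, the numbers are invented, and temperature is built in as the only driver of both sales figures. The sketch compares the raw correlation with the correlation that remains after the effect of temperature is removed from each variable.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def residuals(ys, xs):
    """What is left of ys after subtracting a least-squares line fitted on xs."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return [(y - mean_y) - slope * (x - mean_x) for x, y in zip(xs, ys)]

rng = random.Random(0)  # fixed seed so the simulation is repeatable

# Simulated days: temperature drives both sales figures, plus independent noise.
temperature = [rng.uniform(0, 35) for _ in range(2000)]
ice_cream = [10 * t + rng.gauss(0, 30) for t in temperature]
pools = [2 * t + rng.gauss(0, 10) for t in temperature]

raw_r = pearson_r(ice_cream, pools)
adjusted_r = pearson_r(residuals(ice_cream, temperature),
                       residuals(pools, temperature))

print(f"raw correlation: {raw_r:.2f}")
print(f"after removing temperature: {adjusted_r:.2f}")
```

In this simulation the raw correlation is strong, while the adjusted correlation is close to zero, which is what you would expect when a third variable explains the relationship. With real data, you would rarely know every relevant third variable in advance.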

In other cases, you may find it more challenging to identify whether causation is present. For example, you could find a correlation between the amount someone exercises and their reported levels of happiness. While it’s possible an increase in exercise is causing an increase in happiness, you can't say for sure that it’s the cause since there could be another unknown variable that has a more significant influence on a person's mood.

Reliable ways to determine causation

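To reliably determine causation, you can perform randomized A/B/n testing, which works the same way as an A/B test but compares any number of additional variants. Because participants are randomly assigned to each variant, other possible factors are spread evenly across the groups on average, so a difference in outcomes can be attributed to the variant itself.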
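
The following is a minimal sketch of the random assignment step in an A/B/n test. The variant names, the number of users, and the hidden "engagement" trait are all hypothetical. The point is only to show that random assignment tends to balance a trait you never measured across the groups.

```python
import random

rng = random.Random(42)  # fixed seed so the example is repeatable
variants = ["A", "B", "C"]

# Each simulated user has a hidden trait that could influence the outcome.
users = [{"engagement": rng.random()} for _ in range(3000)]

# Randomly assign every user to one variant.
groups = {variant: [] for variant in variants}
for user in users:
    groups[rng.choice(variants)].append(user["engagement"])

# Compare the average hidden trait across the groups.
means = {variant: sum(values) / len(values) for variant, values in groups.items()}
spread = max(means.values()) - min(means.values())

for variant in variants:
    print(variant, len(groups[variant]), round(means[variant], 3))
print("largest gap between group averages:", round(spread, 3))
```

Because the groups start out similar on average, a later difference in the outcome you measure is more plausibly due to the variant than to who happened to be in each group.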

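Hypothesis testing is another method you can use, typically alongside an experiment. In hypothesis testing, you test your primary hypothesis against a null hypothesis, which usually states that no effect or no difference exists. If the data you observe would be very unlikely under the null hypothesis, you reject it in favor of your primary hypothesis, which helps you be as certain as possible about your results. On its own, a hypothesis test shows only that a relationship is unlikely to be due to chance, so it supports a causal claim best when the data comes from a randomized, controlled design.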
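
One simple kind of hypothesis test is a permutation test, sketched below. It is used here only as an illustration and is not named in the text above. The two groups of measurements are hypothetical. The null hypothesis is that group labels make no difference, so the sketch checks how often a relabeling of the same values produces a gap between group averages at least as large as the one observed.

```python
from itertools import combinations

def mean_diff(treated, control):
    """Difference between the average of treated and the average of control."""
    return sum(treated) / len(treated) - sum(control) / len(control)

def exact_permutation_p_value(control, treated):
    """Two-sided p-value from every possible relabeling of the pooled values."""
    pooled = control + treated
    observed = abs(mean_diff(treated, control))
    extreme = 0
    total = 0
    for indices in combinations(range(len(pooled)), len(treated)):
        chosen = set(indices)
        relabeled_treated = [pooled[i] for i in indices]
        relabeled_control = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Small tolerance guards against floating-point rounding.
        if abs(mean_diff(relabeled_treated, relabeled_control)) >= observed - 1e-12:
            extreme += 1
    return extreme / total

# Hypothetical outcomes from a randomized experiment with five units per group.
control = [12, 14, 15, 16, 18]
treated = [20, 22, 23, 25, 27]

p_value = exact_permutation_p_value(control, treated)
print(round(p_value, 4))
```

A small p-value means the observed gap would be rare if the null hypothesis were true. Enumerating every relabeling is only practical for small samples like this one; larger samples are usually handled by sampling relabelings at random.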

Explore correlation vs. causation with Coursera 

Although the difference between correlation and causation can be challenging to identify, you can do so with a detailed and structured analytical approach. To develop important analytical skills, such as data collection, data calculations, and data analysis, consider earning a Google Data Analytics Professional Certificate on Coursera. With this Professional Certificate, you can qualify for in-demand positions, such as a data analyst or junior data analyst, in less than six months.

The University of Colorado Boulder’s Statistical Inference and Hypothesis Testing in Data Science Applications and Data Analysis Tools from Wesleyan University on Coursera are also great courses where you can learn more about how to properly implement hypothesis testing.

Article sources

1. University of North Carolina Wilmington. "Bivariate Correlations (Pearson's r)," http://people.uncw.edu/pricej/teaching/statistics/correlations.htm. Accessed November 18, 2024.
