Sunday, September 22, 2019

Causation vs. Correlation


Causation is when one thing is the result of another: a change in one variable brings about a change in the other. Correlation is a statistical relationship between variables. A correlation between two things does not mean that one caused the other. People often make mistakes by confusing the two.
Causation can be tested through experiments. An experiment needs a treatment group and a control group, with subjects randomly assigned to each, so that the only systematic difference between the groups is the treatment. This lets us look for a cause-and-effect relationship and then use statistics, including hypothesis tests, to judge whether the effect is significant. For example, if we wanted to see whether a dosage of medicine changed the heart rate of patients, we would take a sample (hopefully larger than 30), give the medicine to the treatment group but not to the control group, and record the data. The experiment is one-way, meaning that a change in medicine dosage could change the heart rate, but the heart rate could not change the dosage of medicine a patient receives. After recording the data and performing the tests, if we end up with a p-value less than 0.05, we have statistically significant evidence that the dosage affects heart rate. Because the groups were randomly assigned, we can read that as evidence of a causal relationship.
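The experiment described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the heart rates are simulated, the 5 bpm effect and the group sizes are made up, and a permutation test stands in for whichever hypothesis test a stats class would actually use (a two-sample t-test would be the more common choice).

```python
import random


def mean(values):
    return sum(values) / len(values)


def permutation_test(treatment, control, n_permutations=5000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Returns (observed difference, estimated p-value).
    """
    rng = random.Random(seed)
    observed = mean(treatment) - mean(control)
    pooled = list(treatment) + list(control)
    n_treat = len(treatment)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        # Reshuffle the group labels as if the treatment had no effect.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_treat]) - mean(pooled[n_treat:])
        if abs(diff) >= abs(observed):
            at_least_as_extreme += 1
    # Add-one correction keeps the estimated p-value above zero.
    return observed, (at_least_as_extreme + 1) / (n_permutations + 1)


# Simulated data (an assumption, not real measurements).
rng = random.Random(42)
patients = list(range(80))
rng.shuffle(patients)  # random assignment to the two groups
treated_ids, control_ids = patients[:40], patients[40:]

# Hypothetical effect: the medicine lowers resting heart rate by 5 bpm.
control_hr = [rng.gauss(72, 6) for _ in control_ids]
treated_hr = [rng.gauss(67, 6) for _ in treated_ids]

diff, p_value = permutation_test(treated_hr, control_hr)
print(f"difference in mean heart rate: {diff:.2f} bpm, p-value: {p_value:.4f}")
```

The random shuffle of patients is what justifies a causal reading of a small p-value; the same test run on groups that patients chose for themselves would only show an association.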
Correlation describes the strength and direction of a linear relationship. In statistics, r represents the correlation between two variables, and it has a value between -1 and 1. It goes with the least squares regression line, the line that minimizes the sum of the squared residuals (the vertical distances from the points to the line of fit); r always has the same sign as the slope of that line. If r is negative, then as x increases y tends to decrease. If r is positive, as x increases y tends to increase. To use r, we would say that there is a strong/weak positive/negative correlation. If we take r² (r squared), we get the coefficient of determination, the proportion of the variation in y that is explained by the linear relationship with x. We would say that [r² as a percent]% of the variation in y can be explained by x. r² is always between 0 and 1, inclusive. For example, if we were to create a least squares regression line for the relationship between the amount of sleep someone gets per night and their average test scores, we might have a positive r close to 1 and say that there is a strong positive correlation, and based on r² we'd say what percent of the variation in test scores can be explained by sleep. This does not show cause and effect, simply a relationship between variables. Correlation is also symmetric: nothing in r says whether x comes before y or vice versa, unlike the one-way causal relationship in an experiment. An economics example: "The demand line has a negative correlation" (as price rises, quantity demanded falls).
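To make r, r², and the least squares line concrete, here is a small sketch that computes all three by hand. The sleep and test score numbers are invented for illustration, and the function names are my own, not from any stats library.

```python
import math


def pearson_r(xs, ys):
    """Correlation coefficient r between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def least_squares_line(xs, ys):
    """Slope and intercept of the line minimizing squared residuals."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared_from_residuals(xs, ys):
    """Share of the variation in y explained by the fitted line."""
    slope, intercept = least_squares_line(xs, ys)
    mean_y = sum(ys) / len(ys)
    ss_residual = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_residual / ss_total


# Made-up data: hours of sleep per night and average test score.
hours_of_sleep = [5, 6, 6.5, 7, 7.5, 8, 8.5, 9]
test_scores = [62, 68, 71, 75, 74, 82, 85, 88]

r = pearson_r(hours_of_sleep, test_scores)
slope, intercept = least_squares_line(hours_of_sleep, test_scores)
r_squared = r_squared_from_residuals(hours_of_sleep, test_scores)

print(f"r = {r:.3f}")
print(f"line of fit: score = {slope:.2f} * hours + {intercept:.2f}")
print(f"r squared = {r_squared:.3f} ({r_squared:.1%} of variation explained)")
```

Swapping the two lists in pearson_r gives the same r, which is the symmetry mentioned above: the number alone cannot say which variable comes first.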
In economics, it is easy to confuse correlation with causation. In the post hoc fallacy, people see that one event follows another and assume the first caused the second. Being aware of the difference between causation and correlation is important so that we do not come to invalid conclusions. Because of the fallacy, people may focus on changing one of the “causes” only to find that it does not affect the other variable.
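A quick simulation shows how changing a supposed "cause" can fail to move the other variable. Everything here is an assumption for illustration: a hidden factor drives both x and y, so they are strongly correlated, but x never enters the equation for y. When we set x ourselves, the correlation disappears.

```python
import math
import random


def pearson_r(xs, ys):
    """Correlation coefficient r between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(0)
n = 2000

# Hypothetical hidden factor that influences both variables.
hidden_factor = [rng.gauss(0, 1) for _ in range(n)]

# Observed world: x and y each follow the hidden factor plus noise.
x_observed = [z + rng.gauss(0, 0.5) for z in hidden_factor]
y_observed = [z + rng.gauss(0, 0.5) for z in hidden_factor]
r_observed = pearson_r(x_observed, y_observed)

# Intervention: we choose x ourselves, independent of the hidden factor.
# y is generated exactly as before, because x was never one of its causes.
x_intervened = [rng.gauss(0, 1) for _ in range(n)]
y_after = [z + rng.gauss(0, 0.5) for z in hidden_factor]
r_intervened = pearson_r(x_intervened, y_after)

print(f"r when x is only observed: {r_observed:.2f}")
print(f"r when x is set by us:     {r_intervened:.2f}")
```

The first r should come out strongly positive and the second close to zero, which is the situation the paragraph above warns about.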
[Image: correlation r vs. r squared]


Source: my Stats notebook

2 comments:

  1. I think that this post does a good job relating statistics to economics. Using the principles of statistics, we can see how important a control group is in telling causation from correlation. However, when we study economics, it is nearly impossible to create true control and experimental groups. Understanding the complexities associated with economic studies can help us draw more accurate conclusions.

  2. Wow, you seem to know a lot about statistics, Mika. It's almost as if you took it straight out of your stats notes. I agree, correlation does not necessarily mean causation, and those who confuse the two may draw vastly incorrect conclusions and make unhelpful economic predictions. In economics, can we ever really know if something directly impacts something else? It seems as if there are always going to be confounding variables; however, economists can do their best to mitigate them.

