Commentary I
This week's lesson deals with regression analysis, which identifies a link between two or more variables in a (hopefully) useful way. However, users have to be very careful when running regressions. There are any number of pitfalls that might befall an incautious statistician, and this discussion will highlight one of the most common: assuming that an observed correlation between variables implies a causal relationship. In other words, just because two variables seem to have a mathematical relationship between them does not mean that movements in one variable drive movements in the other. Note: For anyone curious what a causal relationship is, please refer to the "Correlation and Causation" link in the Week Discussion folder.
In many regression scenarios, the hope is to identify a causal relationship. In some cases, however, no causal relationship exists at all, and we are merely observing correlation between two sets of data. This is known as spurious correlation, and it occurs more frequently than most people realize. Many examples of these correlations are collected on Tyler Vigen's spurious correlations page.
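A common source of spurious correlation is two series that happen to trend in the same direction over the same period. The minimal Python sketch below uses made-up data, not any series from Vigen's page: it generates two series that share nothing except an upward drift over time, then compares the correlation of their levels with the correlation of their year-over-year changes. The trend slopes, noise levels, 30-year window, and helper names (`pearson`, `changes`, `simulate`) are all assumptions chosen for illustration.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def changes(series):
    """Period-over-period differences: series[t] - series[t - 1]."""
    return [curr - prev for prev, curr in zip(series, series[1:])]


def simulate(seed, years=30):
    """Build two unrelated trending series (made-up data).

    Returns (correlation of levels, correlation of year-over-year changes).
    """
    rng = random.Random(seed)
    # Each series grows with time plus its own independent noise;
    # neither series appears in the formula for the other.
    x = [2.0 * t + rng.gauss(0, 3) for t in range(years)]
    y = [0.5 * t + rng.gauss(0, 1) for t in range(years)]
    return pearson(x, y), pearson(changes(x), changes(y))


if __name__ == "__main__":
    r_levels, r_changes = simulate(seed=1)
    print(f"correlation of levels:  {r_levels:.2f}")
    print(f"correlation of changes: {r_changes:.2f}")
```

With this setup the correlation of the levels typically comes out close to 1, while the correlation of the changes hovers around 0: once the shared time trend is removed, nothing links the two series. Differencing is only a rough check for this one pattern, not a general test for spuriousness.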
In other cases, correlation does reflect a causal relationship, but only if we look at it from the correct perspective. Consider a company's expenditures on advertising over the course of a year and the sales revenues the company achieves. Generally, we believe that increasing advertising spending will increase sales volumes, because advertising helps more consumers learn about our product. Thus, we would expect positive correlation between sales revenues and advertising expenditures. A regression of either of these variables on the other could yield indicators of positive correlation. However, it makes more sense that advertising drives sales than that next period's sales drive the current period's advertising. So the appropriate regression is one where sales is the dependent variable and advertising is the independent variable. In this setting, a causal story makes sense. But if we swap the variables, a causal story is much more difficult to believe, even with significant positive correlation.
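The claim that a regression in either direction shows positive correlation can be made concrete. The sketch below uses hypothetical monthly data in which advertising drives sales by construction; the 24 months, the $10k to $50k spending range, the coefficient of 3, and the noise level are all invented. It fits the regression both ways: the correlation coefficient r is identical, and the two slopes multiply to r², so nothing in the fitted numbers reveals which variable is the cause. That judgment has to come from timing and business logic.

```python
import math
import random


def ols_slope(xs, ys):
    """Least-squares slope when ys is regressed on xs (ys = a + b * xs)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def pearson(xs, ys):
    """Sample Pearson correlation coefficient (symmetric in its arguments)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(42)
# Hypothetical monthly data (in $ thousands): advertising drives sales by construction.
advertising = [rng.uniform(10, 50) for _ in range(24)]
sales = [100 + 3.0 * ad + rng.gauss(0, 10) for ad in advertising]

slope_sales_on_ads = ols_slope(advertising, sales)  # sales as the dependent variable
slope_ads_on_sales = ols_slope(sales, advertising)  # advertising as the dependent variable
r = pearson(advertising, sales)

print(f"sales on advertising: slope = {slope_sales_on_ads:.3f}")
print(f"advertising on sales: slope = {slope_ads_on_sales:.3f}")
print(f"r = {r:.3f}; product of slopes = {slope_sales_on_ads * slope_ads_on_sales:.3f}; r^2 = {r * r:.3f}")
```

Swapping which variable is dependent changes the slope and how we interpret it, but not the strength of the association, which is why the direction of the regression has to be justified outside the statistics.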
Part of what makes the previous example easy to believe is that advertising expenditures come before the sales revenues. A company pays to advertise, the commercials or ads run, consumers see the product, and then they buy it. Because advertising comes first, it makes sense that it affects sales figures. However, this is not always the case: just because one thing follows another does not mean that the first thing causes the second. This is the notion behind the Latin phrase "Post hoc, ergo propter hoc," which translates to "after this, therefore because of this," or "A follows B, so B must cause A." It is a common logical fallacy, and one that is easy to make.
A solid example of this fallacy involves political candidates who don't campaign much in states they are guaranteed to lose (think Republican presidential candidates and California). A casual observer might see a Republican not campaigning much in California, and then see the candidate subsequently not getting many votes there. The conclusion that might get drawn is that the Republican candidate lost California because the candidate didn't campaign much there. But this isn't true: no amount of campaigning was going to win California for a Republican, so the candidate didn't bother campaigning much there. The loss of California wasn't driven by the lack of campaigning, even though the election followed it.
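The California example has the structure of a common cause: a state's underlying partisan lean drives both how much the candidate campaigns there and how many votes the candidate receives. The sketch below simulates this with invented numbers for 200 hypothetical states where the candidate is not favored (baseline vote share between 25% and 50%). Campaign effort is an arbitrary index chosen from the lean, and vote share depends only on the lean, so by construction campaigning has no effect on votes. The raw correlation between effort and votes is still strongly positive; after removing the part of each variable explained by lean (a partial correlation), it falls to roughly zero. The helper names and all parameters are assumptions for illustration.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def residuals(xs, ys):
    """What is left of ys after removing its least-squares linear fit on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


rng = random.Random(7)
# Hypothetical states where the candidate is not favored: baseline vote share of 25%-50%.
lean = [rng.uniform(25, 50) for _ in range(200)]
# Campaign effort (arbitrary, non-negative index) is *chosen* from the lean:
# closer races get more attention, plus independent noise.
effort = [0.8 * (share - 25) + rng.uniform(0, 4) for share in lean]
# Vote share depends only on the lean; by construction, effort has zero effect on votes.
votes = [share + rng.gauss(0, 2) for share in lean]

raw_r = pearson(effort, votes)
# Partial correlation: correlate what is left of each variable after accounting for lean.
partial_r = pearson(residuals(lean, effort), residuals(lean, votes))

print(f"raw correlation of effort and votes:    {raw_r:.2f}")
print(f"correlation after controlling for lean: {partial_r:.2f}")
```

The observer in the example sees only the raw relationship; the sketch shows how a third factor can produce it even when the supposed cause does nothing.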
The last scenario I want to discuss with respect to causation is one where regression identifies a relationship between two variables, but it is difficult to say exactly what is going on. A personal experience should help illuminate this.
In my doctoral dissertation, I examined the connection between companies with retirement savings plans and mutual funds. Mutual fund companies own shares of publicly traded companies, and they are supposed to monitor the actions of the firms as shareholders. But mutual fund companies also manage retirement savings for companies. For example, Fidelity Investments operates mutual funds that own shares of Verizon (and they own a LOT of shares of Verizon). At the same time, Verizon has more than $20B in retirement assets that they contract Fidelity to manage, and they pay Fidelity substantial fees for this service. So Verizon writes Fidelity a big check every year for retirement management, all while Fidelity is supposed to use its large share position to make sure Verizon is making good decisions. But Verizon can pick anyone they want to manage their retirement assets, so Fidelity might not police the actions of Verizon very closely because Fidelity does not want to risk upsetting management at Verizon and losing that large yearly check. This relationship (which I called the Retirement Asset Management relationship) represents a rather severe conflict of interest, and is present for a number of publicly traded firms.
My dissertation detailed how this relationship affected the behavior of firms like Verizon, and I showed that when the relationship with a large shareholder was present, firms like Verizon were more likely to take actions that benefited management at the expense of shareholders. There was a clear link between the two. Unfortunately, I could not clearly show whether the Retirement Asset Management relationship caused good managers to behave poorly, or whether bad managers actively sought to create a Retirement Asset Management relationship as a buffer so they could continue to make self-serving decisions. In other words, I was unable to determine whether the relationship created bad managers or bad managers created the relationship (whether A caused B or B caused A). The general consensus was that my findings were interesting, but that this ambiguity limited the policy implications of my work. Such is life!
In the folder for this week's discussion, you will see several links. One details spurious correlations, others explore the Post Hoc fallacy, and the final one shows another example of how difficult it can be to figure out what the correlation between two variables means. Hopefully you enjoy the readings and videos!
Questions:
- What is spurious correlation? Please discuss at least one example from Tyler Vigen's page and explain why the correlation between the two variables qualifies as spurious.
- What is the Post Hoc fallacy?
- Have you ever found yourself falling victim to the Post Hoc fallacy? How so?
- In the article detailing examples of the Post Hoc fallacy, one example describes how Republicans pass a tax reform bill and then the stock market falls shortly thereafter. The article was published in 2009, but, interestingly enough, we just experienced this exact scenario in 2018. Republicans passed a sweeping tax reform bill in late 2017, and in February-March of 2018, the stock market lost more than 6% of its value. Many journalists and individuals concluded that this nosedive was the result of the tax reform bill. Is this conclusion an example of the Post Hoc fallacy? If so, what else might have caused stock prices to decline? The article on the stock market may help.
- Read the article on having sex and making more money. After reading it, do you think the frequency of sexual intercourse is related to the amount of money an individual makes? Do you think there is a causal relationship between the two, and if so, which way does the causality run? That is, does more sex cause more money, or does having more money result in having more sex?