
Problem 3 - Using Regression to Measure The Impact of Social Pressure on Voter Turnout [34 points]

Does shaming people into doing good work? Or does it just cause people to react negatively, even causing them to do the opposite? In the United States, voting is considered a social norm, such that when people learn that others have not voted, they judge them more negatively, just as people judge others more negatively when they learn they don't recycle or don't pay their taxes. Knowing they could be negatively judged in this way, do many people vote in order to avoid the shame of being a nonvoter? And can social innovators use this social norm to their benefit?

A study by Alan Gerber and Don Green (you do not need to read it) examined this question. They obtained the publicly available voter rolls from the Michigan Secretary of State and randomly assigned voters to one of five conditions ahead of the 2006 primary election. One condition was a control group that received no mailing. Each of the other four conditions received one of the four mailings given here. The last of the four, the "Neighbors" mailing, contained a list of the voter turnout for all residents of one's own household and for all the other households on the same street.

Please download a cleaned version of the data: https://www.dropbox.com/s/q1r8zyaj553rn40/social_pressure_cleaned.csv?dl=0

Please use RStudio.

a. Multiple voters often live within the same household, but it would not be feasible to randomly assign different voters in the same household to different mailings; they would show the mailings to each other (creating an issue we will discuss in a future week). Therefore, the authors randomly assigned treatment at the household level, so that all people within a household always get the same treatment. This is an example of clustering. Should you cluster standard errors when estimating the ATE? At which level? [5 points]

b. Estimate the ATE of all four treatments (treatment_civicduty, etc.) on the outcome variable, voted, using regression. Do not use any controls and do not worry about clustering standard errors yet. [5 points]

c. Now let's correctly cluster the standard errors to account for the fact that treatment was randomized at the household level. Take the regression you ran in part b) and take the clustering into account. Base R, delightfully, does not have a built-in function for computing clustered standard errors. Insert the code below, and then call its function using cl(output.from.lm.here, cluster.variable.here). Report the standard errors for the estimated effects of the four treatments. [5 points]

    # Clustered standard errors for an lm() fit.
    # fm: output of lm(); cluster: vector identifying each row's cluster.
    # cluster must line up row-for-row with the rows lm() actually used
    # (rows dropped for missing values must be dropped here too).
    cl <- function(fm, cluster) {
      require(sandwich, quietly = TRUE)
      require(lmtest, quietly = TRUE)
      M <- length(unique(cluster))                   # number of clusters
      N <- length(cluster)                           # number of observations
      K <- fm$rank                                   # number of coefficients
      dfc <- (M / (M - 1)) * ((N - 1) / (N - K))     # small-sample adjustment
      # sum each observation's score contribution within its cluster
      uj <- apply(estfun(fm), 2, function(x) tapply(x, cluster, sum))
      vcovCL <- dfc * sandwich(fm, meat. = crossprod(uj) / N)
      coeftest(fm, vcovCL)
    }

    # To use cl(), first specify your model, like this:
    lm.result <- lm(y ~ x, data)
    # then use this to return the clustered standard errors:
    cl(lm.result, data$rename.this.to.the.variable.that.identifies.clusters)

    # To install the required packages, you may need to run this line once
    # in the "Console" tab of RStudio:
    install.packages(c('sandwich', 'lmtest'))

d. Run the code from part c), but now add controls for whether someone voted in previous elections, the variables called g2000, g2002, p2000, p2002, p2004 (p is for primary; g is for general; the number is the year of the election), as well as their age (measured by yob, year of birth) and gender. Report the standard errors for the estimated effects of the four treatments. [5 points]

e. Why do the standard errors change in part d)? [4 points]

f. Given the content of the four mailings and the results of your regressions, what do you conclude about the efficacy of social pressure for increasing voter turnout? [4 points]

g. The voted variable records whether someone voted in the 2006 primary election in August 2006. The mailers were sent in July 2006. Imagine the authors also had data on whether these voters later voted in the 2006 general election, held three months later in November 2006. Would you recommend adding November 2006 turnout as an additional covariate to the regression you ran in part d), which estimates the effect of the mailers on August 2006 turnout? Why or why not? [3 points]

h. Imagine you were interested in the effect of sending these mailers in July 2006 on voter turnout in November 2006, and so ran a regression with November 2006 turnout as the outcome. Would you add voter turnout in August 2006 to this regression? Why or why not? [3 points]
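The regressions in parts b) through d) can be sketched end to end in R. This is a minimal sketch on simulated data with the same layout as the problem (treatment assigned per household, several voters per household), so it runs without the download; it is not the study's data and its numbers carry no substantive meaning. Apart from voted, treatment_civicduty, g2000, g2002, p2000, p2002, p2004 and yob, the column names (hh_id, treatment_hawthorne, treatment_self, treatment_neighbors) are hypothetical stand-ins, and gender is left out because the problem does not give its column name. Instead of calling sandwich, the hypothetical helper cluster_se() writes out the same calculation as cl() in base R: the (X'X)^-1 "bread" around a "meat" built from within-household sums of x_i * e_i, scaled by the same M/(M-1) * (N-1)/(N-K) adjustment.

```r
# Sketch for parts b)-d) on SIMULATED data (not the study's data).
# Column names other than voted, treatment_civicduty, g2000, g2002, p2000,
# p2002, p2004 and yob are HYPOTHETICAL stand-ins.
set.seed(2006)

n_hh    <- 3000                                 # households (clusters)
hh_size <- sample(1:3, n_hh, replace = TRUE)    # voters per household
hh_id   <- rep(seq_len(n_hh), times = hh_size)  # hypothetical cluster id
n       <- length(hh_id)

# Randomize at the household level, then copy the arm to every member
arms <- c("control", "civicduty", "hawthorne", "self", "neighbors")
arm  <- sample(arms, n_hh, replace = TRUE)[hh_id]

d <- data.frame(
  hh_id               = hh_id,
  treatment_civicduty = as.integer(arm == "civicduty"),
  treatment_hawthorne = as.integer(arm == "hawthorne"),  # hypothetical name
  treatment_self      = as.integer(arm == "self"),       # hypothetical name
  treatment_neighbors = as.integer(arm == "neighbors"),  # hypothetical name
  yob                 = sample(1920:1985, n, replace = TRUE)
)
for (v in c("g2000", "g2002", "p2000", "p2002", "p2004")) {
  d[[v]] <- rbinom(n, 1, 0.5)                   # simulated past turnout
}

# Simulated outcome with a shared household shock, so turnout is
# correlated within households (the reason to cluster)
hh_shock <- rnorm(n_hh, sd = 0.15)[hh_id]
p <- 0.15 + 0.02 * d$treatment_civicduty + 0.03 * d$treatment_hawthorne +
  0.05 * d$treatment_self + 0.08 * d$treatment_neighbors +
  0.20 * d$p2004 + 0.10 * d$g2002 + hh_shock
d$voted <- rbinom(n, 1, pmin(pmax(p, 0), 1))

# Cluster-robust standard errors in base R, same formula as cl():
# dfc * (X'X)^-1 (sum_g u_g u_g') (X'X)^-1, with u_g = sum of x_i * e_i in cluster g
cluster_se <- function(fit, cluster) {
  X <- model.matrix(fit)
  e <- residuals(fit)
  bread <- solve(crossprod(X))
  u <- rowsum(X * e, group = cluster)           # G x K cluster score sums
  G <- nrow(u); N <- nrow(X); K <- ncol(X)
  dfc <- (G / (G - 1)) * ((N - 1) / (N - K))
  V <- dfc * bread %*% crossprod(u) %*% bread
  setNames(sqrt(diag(V)), colnames(X))
}

treat <- c("treatment_civicduty", "treatment_hawthorne",
           "treatment_self", "treatment_neighbors")
f_b <- reformulate(treat, response = "voted")
f_d <- reformulate(c(treat, "g2000", "g2002", "p2000", "p2002", "p2004",
                     "yob"), response = "voted")

fit_b    <- lm(f_b, data = d)                              # part b)
se_b_iid <- summary(fit_b)$coefficients[, "Std. Error"]    # conventional SEs
se_b_cl  <- cluster_se(fit_b, d$hh_id)                     # part c)
fit_d    <- lm(f_d, data = d)                              # part d)
se_d_cl  <- cluster_se(fit_d, d$hh_id)

round(cbind(ate = coef(fit_b)[treat], se_iid = se_b_iid[treat],
            se_cl = se_b_cl[treat], se_cl_controls = se_d_cl[treat]), 4)
```

With the real file, the simulated d would be replaced by read.csv() on the downloaded data, and the household identifier column passed to cl() or cluster_se(). Newer releases of the sandwich package also provide vcovCL() for clustered covariance matrices.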

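Parts g) and h) turn on whether to control for a variable measured after treatment. A small simulation, under made-up effect sizes, illustrates what such a control does. In this hypothetical setup the mailer raises August turnout by 0.15, November turnout depends on August turnout and on an unobserved habit of voting that also drives August turnout, so the mailer's built-in total effect on November turnout is 0.2 * 0.15 = 0.03. The names mailer, habit, aug and nov and all coefficients are assumptions for illustration only. Comparing the mailer coefficient with and without the post-treatment covariate shows how far the estimate moves from the effect built into the simulation.

```r
# Sketch for parts g) and h): SIMULATED data with made-up effect sizes,
# illustrating a regression that controls for a post-treatment variable.
set.seed(7)
n <- 200000

mailer <- rbinom(n, 1, 0.5)   # randomized treatment, sent in July
habit  <- runif(n)            # unobserved propensity to vote
# August turnout: the mailer raises it by 0.15 (probabilities stay in [0, 1])
aug <- rbinom(n, 1, 0.05 + 0.15 * mailer + 0.6 * habit)
# November turnout: depends on August turnout and on the same habit,
# so the mailer affects it only through August (total effect 0.2 * 0.15)
nov <- rbinom(n, 1, 0.10 + 0.20 * aug + 0.6 * habit)

# Coefficient on mailer from a linear probability model
mailer_coef <- function(f) unname(coef(lm(f))["mailer"])

# g) effect on August turnout, without and with November as a covariate
b_aug     <- mailer_coef(aug ~ mailer)
b_aug_nov <- mailer_coef(aug ~ mailer + nov)
# h) effect on November turnout, without and with August as a covariate
b_nov     <- mailer_coef(nov ~ mailer)
b_nov_aug <- mailer_coef(nov ~ mailer + aug)

round(c(aug = b_aug, aug_ctrl_nov = b_aug_nov,
        nov = b_nov, nov_ctrl_aug = b_nov_aug), 3)
```

Because the mailer is randomized, the regressions without the extra covariate recover the built-in effects up to sampling noise; adding a covariate that the mailer itself can move absorbs part of the effect and, through the shared unobserved habit, can push the estimate below zero.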