Introduction

This is an rmarkdown document, which combines text with R code. Text can be formatted with markdown syntax, e.g. double asterisks for bold or hashtags as above for titles. When the document is knit (using the Knit button towards the top of RStudio) it'll render into a PDF file. When submitting answers it'll likely be convenient to use your own rmarkdown file so that you can interlace code with output, but you can also just run the code and attach the output separately (e.g. from screenshots).

You can write formulas/LaTeX using dollar signs directly in markdown, e.g.

$$\lim_{n\to\infty}\left(\sum_{i}|x_i-\mu_x|^{n}\right)^{1/n} = \max_i |x_i-\mu_x|$$

This should have all of the major syntax you need for any math, plus it's a cool result (that isn't relevant to this class).

R code is written in blocks using triple backticks. The code and the output can be either shown in the knit document or hidden based on your preference.

    1 + 1
    ## [1] 2

Question 0

Run the following code and put the output somewhere in your answer (whether by rendering in rmarkdown, taking a screenshot, or pasting the result). If using Stata or Python you can skip this question.

    set.seed(Sys.time())
    runif(1)

Question 1: Estimating Equation

The Survey of Income and Program Participation (SIPP) surveys households and gathers information on household characteristics and participation in welfare programs. Aggregated data from 2007 for a small number of variables is posted on Blackboard under the name SIPPsmall.csv. Download this file and read it into R. If running from rmarkdown your working directory will default to where this .Rmd file is saved, so if you save your data file in the same folder as your markdown file you just need to run

    df […] mutate(residual = FoodStampAmount - predicted)
    df[4,]
    ##   hhsize nchildren EarnedIncome FoodStampAmount state predicted residual
    ## 4      4         2           56             950    AK  28.18373 921.8163

Now, calculate the predicted amount of food stamps for someone with an income of $100,000. Does your answer make sense?
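A minimal sketch of how predict() produces a prediction at a given income, assuming the model is a simple linear regression of FoodStampAmount on EarnedIncome (the actual model specification is not shown in the text above). The data frame `toy_df` and its simulated values are hypothetical stand-ins for SIPPsmall.csv, not real SIPP numbers.

```r
# Hypothetical simulated data standing in for SIPPsmall.csv (NOT real SIPP values)
set.seed(1)
n <- 500
toy_df <- data.frame(EarnedIncome = runif(n, min = 0, max = 60000))
# Assumed data-generating process: benefits fall with earned income, plus noise
toy_df$FoodStampAmount <- 300 - 0.004 * toy_df$EarnedIncome + rnorm(n, sd = 50)

# Assumed model: simple linear regression of food stamps on earned income
fit <- lm(FoodStampAmount ~ EarnedIncome, data = toy_df)

# predict() needs newdata with the same column name as the regressor
pred_100k <- predict(fit, newdata = data.frame(EarnedIncome = 100000))
print(pred_100k)

# The same prediction by hand: intercept + slope * income
by_hand <- coef(fit)[["(Intercept)"]] + coef(fit)[["EarnedIncome"]] * 100000
print(by_hand)
```

Both prints show the same number. Since 100,000 lies outside the 0 to 60,000 range of the simulated incomes, this prediction extrapolates the fitted line beyond the data.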
Question 4: Error Term

Your predicted food stamp amount is generally very different from your actual food stamp amount. Name 2 factors in the error term of this model.

Question 5

You can find the correlation between variables using the cor function in R. You are concerned that the estimates from your model are biased because other variables are correlated with food stamp usage. You find the correlation between every numeric variable in your model using the following code. To use the table, look up the row and column to get the correlation between these two variables, e.g. in row 1, column 2 you have cor(hhsize, nchildren) = .74. Note that if you go to row 2, column 1 you have cor(nchildren, hhsize) = .74, i.e. the table is symmetric since cor(x, y) = cor(y, x). Note also that the diagonal is all 1: any variable is always perfectly correlated with itself.

    library(dplyr)  # provides %>% and select()
    df %>%
      select(hhsize, nchildren, EarnedIncome, FoodStampAmount) %>%
      cor()
    ##                    hhsize nchildren EarnedIncome FoodStampAmount
    ## hhsize          1.0000000 0.7401305    0.2878217       0.2236396
    ## nchildren       0.7401305 1.0000000    0.1507106       0.2411061
    ## EarnedIncome    0.2878217 0.1507106    1.0000000      -0.1239451
    ## FoodStampAmount 0.2236396 0.2411061   -0.1239451       1.0000000

You look at the fourth column and notice that all of these variables are correlated with FoodStampAmount, and therefore your estimate is biased. Is this interpretation correct? If so, why? If not, what should you be looking at to determine if there is potential bias in your model?

Question 6: Basic Visualization: histogram

Create a histogram of food stamp amount using the hist function. Adjust the number of bins (n) to produce a nice-looking graph.

    hist(df$FoodStampAmount, n = 1000)  # n partially matches hist()'s nclass argument

[Figure: Histogram of df$FoodStampAmount; x-axis: df$FoodStampAmount, y-axis: Frequency]

Question 7: Basic Visualization: scatterplot

This dataset has nearly 182,000 observations, so a scatterplot will be very ugly unless we aggregate the data first.
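Aggregating here means replacing many households with a single average per income bin. A minimal base-R sketch of that idea on a handful of hypothetical incomes and benefit amounts (made-up values for illustration only): round(x, -3) rounds to the nearest multiple of 1,000, and tapply() then averages food stamps within each bin.

```r
# Hypothetical incomes and food stamp amounts, for illustration only
income <- c(56, 420, 1499, 12345, 11800, 98760)
stamps <- c(950, 300, 200, 80, 120, 0)

# round(x, -3) rounds to the nearest thousand: 56 -> 0, 1499 -> 1000, 12345 -> 12000
income_bucket <- round(income, -3)

# Average food stamp amount within each income bucket (one point per bucket)
bucket_means <- tapply(stamps, income_bucket, mean)
print(bucket_means)
```

Six observations collapse to four points, one per bucket; the dplyr pipeline in this question does the same with group_by() and summarize().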
Below we bin the data by rounding to the nearest thousand. Code is provided below to plot food stamps vs income.

Create a scatterplot of food stamps vs income.

    df_agg <- df %>%
      mutate(income_bucket = round(EarnedIncome, -3)) %>%
      group_by(income_bucket) %>%
      summarize(FoodStampAmount = mean(FoodStampAmount))
    plot(df_agg$income_bucket, df_agg$FoodStampAmount)

The right tail is a bit long and noisy. We can filter to only the left-hand side to get a clearer picture:

    df_agg <- df %>%
      mutate(income_bucket = round(EarnedIncome, -3)) %>%
      group_by(income_bucket) %>%
      filter(income_bucket < …) %>%
      summarize(FoodStampAmount = mean(FoodStampAmount))
    plot(df_agg$income_bucket, df_agg$FoodStampAmount)

What does this imply about the relationship between income and the amount of food stamps received? Is the effect linear?

Extra credit

Redo the income scatterplot above, but add in your line of best fit and make it look nice using ggplot2.
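One way to overlay a line of best fit on binned points is to fit lm() on the bins and draw it with abline(). This is a minimal base-R sketch rather than the requested ggplot2 version; `df_agg_toy` is a hypothetical data frame shaped like df_agg (columns income_bucket and FoodStampAmount) filled with simulated values, not SIPP data.

```r
# Hypothetical binned data shaped like df_agg (simulated, NOT real SIPP values)
set.seed(42)
df_agg_toy <- data.frame(income_bucket = seq(0, 50000, by = 1000))
df_agg_toy$FoodStampAmount <- 400 - 0.005 * df_agg_toy$income_bucket +
  rnorm(nrow(df_agg_toy), sd = 20)

# Line of best fit (ordinary least squares) through the binned means
line_fit <- lm(FoodStampAmount ~ income_bucket, data = df_agg_toy)

# Scatterplot of the bins with the fitted line drawn on top
plot(df_agg_toy$income_bucket, df_agg_toy$FoodStampAmount,
     pch = 19,
     xlab = "Earned income (rounded to nearest $1,000)",
     ylab = "Mean food stamp amount",
     main = "Food stamps vs. income (simulated)")
abline(line_fit, col = "red", lwd = 2)
```

In ggplot2 the analogous plot is built from geom_point() plus geom_smooth(method = "lm", se = FALSE) on the same columns.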
