
Introduction

This is an rmarkdown document, which combines text with R code. Text can be formatted with markdown syntax, e.g. double asterisks for bold or hashtags, as above, for titles. When the document is knit (towards the top of RStudio) it will render into a PDF file. When submitting answers it will likely be convenient to use your own rmarkdown file so that you can interlace code with output, but you can also just run the code and attach the output separately (e.g. from screenshots).

You can write formulas/LaTeX using dollar signs directly in markdown, e.g.

$\lim_{n \to \infty} \left( \sum (x - \mu_x)^n \right)^{1/n} = \max(x - \mu_x)$

This should have all of the major syntax you need for any math, plus it's a cool result (that isn't relevant to this class).

R code is written in blocks using triple backticks. The code and the output can be either shown in the knit document or hidden, based on your preference.

    1+1
    ## [1] 2

Question 0

Run the following code and put the output somewhere in your answer (whether by rendering in rmarkdown, taking a screenshot, or pasting the result). If using Stata or Python you can skip this question.

    set.seed(Sys.time())
    runif(1)

Question 1: Estimating Equation

The Survey of Income and Program Participation (SIPP) surveys households and gathers information on household characteristics and participation in welfare programs. Aggregated data from 2007 for a small number of variables is posted on Blackboard under the name SIPPsmall.csv. Download this file and read it into R. If running from rmarkdown, your working directory will default to where this .Rmd file is saved, so if you save your data file in the same folder as your markdown file you just need to run

    df …

    … mutate(residual = FoodStampAmount - predicted)
    df[4,]
    ##   hhsize nchildren EarnedIncome FoodStampAmount state predicted residual
    ## 4      4         2           56             950    AK  28.18373 921.8163

Now, calculate the predicted amount of food stamps for someone with an income of $100,000. Does your answer make sense?

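The mechanics of predicting from a fitted line and computing a residual can be sketched as follows. This is only an illustration under stated assumptions: the data frame `sim` is simulated (it is not SIPPsmall.csv), and the model is assumed to be a simple linear regression of FoodStampAmount on EarnedIncome, which the surviving text of this assignment does not confirm.

```r
# Simulated stand-in data (hypothetical; NOT the SIPP file).
set.seed(42)
n <- 1000
sim <- data.frame(EarnedIncome = runif(n, 0, 40000))
sim$FoodStampAmount <- pmax(0, 200 - 0.004 * sim$EarnedIncome + rnorm(n, sd = 40))

# Assumed model: food stamp amount as a linear function of earned income.
fit <- lm(FoodStampAmount ~ EarnedIncome, data = sim)

# In-sample predictions and residuals (residual = actual - predicted).
sim$predicted <- as.numeric(predict(fit))
sim$residual  <- sim$FoodStampAmount - sim$predicted

# Out-of-sample prediction for an income far above anything in the data.
pred_100k <- as.numeric(predict(fit, newdata = data.frame(EarnedIncome = 100000)))

sim[4, ]
pred_100k
```

In this simulated setup the fitted slope is negative, so the prediction at $100,000 falls below every in-sample prediction. That is the kind of extrapolation problem the question asks you to think about.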
Question 4: Error Term

Your predicted food stamp amount is generally very different from your actual food stamp amount. Name 2 factors in the error term of this model.

Question 5

You can find the correlation between variables using the cor function in R. You are concerned that the estimates from your model are biased because other variables are correlated with food stamp usage. You find the correlation between every numeric variable in your model using the following code. To use the table, look up the row and column to get the correlation between these two variables, e.g. in row 1, column 2 you have cor(hhsize, nchildren) = .74. Note that if you go to the second row, first column you have cor(nchildren, hhsize) = .74, i.e. it's symmetric since cor(x,y) = cor(y,x). Note also that the diagonal is all 1: any variable is always perfectly correlated with itself.

    df %>% select(hhsize, nchildren, EarnedIncome, FoodStampAmount) %>% cor()
    ##                    hhsize nchildren EarnedIncome FoodStampAmount
    ## hhsize          1.0000000 0.7401305    0.2878217       0.2236396
    ## nchildren       0.7401305 1.0000000    0.1507106       0.2411061
    ## EarnedIncome    0.2878217 0.1507106    1.0000000      -0.1239451
    ## FoodStampAmount 0.2236396 0.2411061   -0.1239451       1.0000000

You look at the fourth column and notice that all of these variables are correlated with FoodStampAmount, and therefore conclude that your estimate is biased. Is this interpretation correct? If so, why? If not, what should you be looking at to determine if there is potential bias in your model?

Question 6: Basic Visualization: histogram

Create a histogram of food stamp amount using the hist function. Adjust the number of bins (n) to produce a nice looking graph.

    hist(df$FoodStampAmount, n=1000)

[Figure: "Histogram of df$FoodStampAmount"; x-axis: df$FoodStampAmount, y-axis: Frequency]

Question 7: Basic Visualization: scatterplot

This dataset has nearly 182,000 observations, so a scatterplot will be very ugly unless we aggregate the data first.

Below we bin the data by rounding to the nearest thousand. Code is provided below to plot food stamps vs income. Create a scatterplot of food stamps vs income.

    df_agg <- df %>%
      mutate(income_bucket = round(EarnedIncome, -3)) %>%
      group_by(income_bucket) %>%
      summarize(FoodStampAmount = mean(FoodStampAmount))
    plot(df_agg$income_bucket, df_agg$FoodStampAmount)

The right tail is a bit long and noisy. We can filter to only the left-hand side to get a clearer picture.

    df_agg <- df %>%
      mutate(income_bucket = round(EarnedIncome, -3)) %>%
      group_by(income_bucket) %>%
      filter(income_bucket < …) %>%
      summarize(FoodStampAmount = mean(FoodStampAmount))
    plot(df_agg$income_bucket, df_agg$FoodStampAmount)

What does this imply about the relationship between income and amount of food stamps received? Is the effect linear?

Extra credit

Redo the income scatterplot above, but add in your line of best fit and make it look nice using ggplot2.
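The binning step and a ggplot2 version of the scatterplot can be sketched on simulated data as below. Everything here is an assumption for illustration: `sim` is a made-up data frame standing in for the SIPP file, the axis labels are placeholders, and `geom_smooth(method = "lm")` is just one way to draw a line of best fit.

```r
library(dplyr)
library(ggplot2)

# Simulated stand-in data (hypothetical; NOT the SIPP file).
set.seed(1)
n <- 5000
sim <- data.frame(EarnedIncome = runif(n, 0, 60000))
sim$FoodStampAmount <- pmax(0, 300 - 0.01 * sim$EarnedIncome + rnorm(n, sd = 50))

# round(x, -3) rounds to the nearest thousand, so each bucket is a
# multiple of 1000; the mean is then taken within each bucket.
sim_agg <- sim %>%
  mutate(income_bucket = round(EarnedIncome, -3)) %>%
  group_by(income_bucket) %>%
  summarize(FoodStampAmount = mean(FoodStampAmount))

# Scatterplot of bucket means with a straight least-squares line overlaid.
p <- ggplot(sim_agg, aes(x = income_bucket, y = FoodStampAmount)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(x = "Earned income (nearest $1,000)",
       y = "Mean food stamp amount") +
  theme_minimal()
print(p)
```

Note that the line here is fit to the bucket means rather than to the individual households, so it will generally differ somewhat from a regression on the raw data.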

