Question: Consider the data set "k401ksubs.csv" posted with this assignment. It contains information on 9275 individuals. The dependent variable is pira, equal to 1 if the subject has an IRA; the other variables listed below serve as covariates.

e401k: =1 if eligible for 401(k)

inc: annual income, $1000s

marr: =1 if married

male: =1 if male respondent

age: in years

fsize: family size

nettfa: net total financial assets, $1000s

p401k: =1 if participates in a 401(k)

pira: =1 if has an IRA (Individual Retirement Account); dependent variable

incsq: income squared

agesq: age squared

Estimate two linear probability models (LPMs) with pira as the dependent variable:

(1) using all of the covariates (Model 1), and

(2) using only the covariates you deem important (Model 2).
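Both models are ordinary least squares regressions of the 0/1 outcome pira on the chosen covariates. The sketch below assumes NumPy is available and reports heteroskedasticity-robust (HC1) standard errors, since the error term of an LPM is heteroskedastic by construction. The function name `fit_lpm` is hypothetical, and the commented usage assumes pandas and that the CSV column names match the variable names in the question.

```python
import numpy as np


def fit_lpm(X, y):
    """Fit a linear probability model y = b0 + X @ b + u by OLS.

    Returns (beta, se, r2): coefficients with the intercept first,
    heteroskedasticity-robust (HC1) standard errors, and R-squared.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = X.shape[0]
    Xc = np.column_stack([np.ones(n), X])  # prepend an intercept column
    k = Xc.shape[1]                        # parameters incl. intercept
    beta, *_ = np.linalg.lstsq(Xc, y, rcond=None)
    resid = y - Xc @ beta
    # HC1 sandwich estimator: (X'X)^-1 X' diag(u_i^2) X (X'X)^-1 * n/(n-k).
    # LPM errors are heteroskedastic, so the usual OLS SEs are not reliable.
    XtX_inv = np.linalg.inv(Xc.T @ Xc)
    meat = Xc.T @ (Xc * resid[:, None] ** 2)
    cov = XtX_inv @ meat @ XtX_inv * n / (n - k)
    se = np.sqrt(np.diag(cov))
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return beta, se, r2


# Hypothetical usage on the assignment file (assumes pandas is installed and
# that the CSV column names match the variable names in the question):
#   import pandas as pd
#   df = pd.read_csv("k401ksubs.csv")
#   model1 = ["e401k", "inc", "marr", "male", "age", "fsize",
#             "nettfa", "p401k", "incsq", "agesq"]
#   beta1, se1, r2_1 = fit_lpm(df[model1], df["pira"])
```

Model 2 is fit the same way with a shorter covariate list; in practice a package such as statsmodels gives the same point estimates along with p-values.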

a. In Model 1, interpret the effects of e401k, inc, and marr on the probability of participation, even if they are not statistically significant. Because incsq also enters the model, the effect of inc must be assessed jointly with incsq.
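For the dummies e401k and marr, each LPM coefficient is the change in the probability that pira = 1 when the dummy switches from 0 to 1, holding the other covariates fixed (multiply by 100 for percentage points). Because income enters as b_inc*inc + b_incsq*inc^2, its effect depends on the income level. The helper names below are hypothetical, and the coefficient values in the example are made up purely to show the arithmetic; they are not estimates from k401ksubs.csv.

```python
def income_partial_effect(b_inc, b_incsq, inc):
    """Marginal effect of income on P(pira = 1) at income level `inc`.

    With b_inc*inc + b_incsq*inc**2 in the model, the derivative is
    b_inc + 2*b_incsq*inc: the change in probability per extra $1000
    of income, evaluated at `inc` (measured in $1000s).
    """
    return b_inc + 2.0 * b_incsq * inc


def income_turning_point(b_inc, b_incsq):
    """Income level (in $1000s) where the marginal effect changes sign."""
    return -b_inc / (2.0 * b_incsq)


def income_change_effect(b_inc, b_incsq, inc0, inc1):
    """Exact change in the fitted probability when income moves inc0 -> inc1."""
    return b_inc * (inc1 - inc0) + b_incsq * (inc1 ** 2 - inc0 ** 2)


# Hypothetical coefficients, chosen only to illustrate the arithmetic;
# they are NOT estimates from k401ksubs.csv.
b_inc, b_incsq = 0.01, -0.00005
print(income_partial_effect(b_inc, b_incsq, 30))     # marginal effect at $30,000
print(income_turning_point(b_inc, b_incsq))          # probability peaks here if b_incsq < 0
print(income_change_effect(b_inc, b_incsq, 30, 40))  # moving from $30k to $40k
```

Evaluating the marginal effect at a few representative incomes (for example the sample mean) usually communicates the income effect more clearly than either coefficient alone.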

b. Explain how you arrived at Model 2; in particular, give your reasoning for eliminating variables.
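One common way to justify dropping a group of variables is a joint exclusion test of Model 2 (restricted) against Model 1 (unrestricted), rather than dropping variables one at a time by their individual t statistics. Related terms such as inc and incsq, or age and agesq, are normally kept or dropped together, and p401k can only equal 1 when e401k does, so those two carry overlapping information. The sketch below computes the classical F statistic; the function names are hypothetical, and because LPM errors are heteroskedastic this SSR-based F is only approximate, with a robust Wald test being the more defensible choice.

```python
def f_stat_exclusion(ssr_r, ssr_ur, q, n, k):
    """F statistic for H0: the q dropped coefficients are jointly zero.

    ssr_r  -- sum of squared residuals, restricted model (Model 2)
    ssr_ur -- sum of squared residuals, unrestricted model (Model 1)
    q      -- number of variables dropped from Model 1 to get Model 2
    n      -- number of observations
    k      -- number of regressors in Model 1, excluding the intercept
    """
    df_resid = n - k - 1
    return ((ssr_r - ssr_ur) / q) / (ssr_ur / df_resid)


def f_stat_from_r2(r2_r, r2_ur, q, n, k):
    """Equivalent R-squared form; valid because both models share the same y."""
    return ((r2_ur - r2_r) / q) / ((1.0 - r2_ur) / (n - k - 1))


# Compare the statistic with the F(q, n - k - 1) critical value. If SciPy is
# available, scipy.stats.f.sf(F, q, n - k - 1) gives the p-value.
```

A large F (small p-value) argues against dropping the group; a small F supports the simpler Model 2.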

c. Discuss which model better explains the variability in the probability of participation.
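R-squared never decreases when regressors are added, so Model 1 will always have an R-squared at least as large as Model 2; adjusted R-squared penalizes the extra regressors and is the fairer comparison. For a binary outcome, the percent correctly predicted (classifying an observation as 1 when its fitted value is at least 0.5) is another commonly reported measure. The 0.5 cutoff is a convention, and the helper names below are hypothetical.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared; k = number of regressors excluding the intercept."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


def percent_correctly_predicted(y, p_hat, cutoff=0.5):
    """Share of observations where (p_hat >= cutoff) matches the observed 0/1 y."""
    hits = sum(int(p >= cutoff) == int(obs) for obs, p in zip(y, p_hat))
    return hits / len(y)
```

Computing both measures for Model 1 and Model 2 on the same sample gives a like-for-like comparison.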

d. Predict the probability of participation for the first 10 observations in the data set.
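In an LPM the predicted probability is just the fitted value b0 + x'b for each row. A known limitation is that these fitted values can fall below 0 or above 1, so it is worth flagging them. The sketch below assumes NumPy; the function names are hypothetical, and the commented usage assumes a pandas DataFrame `df` loaded from k401ksubs.csv, a list `model1` of covariate names, and a coefficient vector `beta1` (intercept first) from an OLS fit, all of which are illustrative names.

```python
import numpy as np


def predict_lpm(beta, X):
    """Fitted probabilities b0 + X @ b from an LPM; beta[0] is the intercept."""
    X = np.asarray(X, dtype=float)
    beta = np.asarray(beta, dtype=float)
    return beta[0] + X @ beta[1:]


def flag_out_of_range(p_hat):
    """True where a fitted 'probability' falls outside [0, 1]."""
    p_hat = np.asarray(p_hat, dtype=float)
    return (p_hat < 0.0) | (p_hat > 1.0)


# Hypothetical usage (df, model1, and beta1 are assumed names, see above):
#   p10 = predict_lpm(beta1, df[model1].iloc[:10])
#   print(p10, flag_out_of_range(p10))
```

The same call with the Model 2 coefficients and covariates gives that model's predictions for the first 10 observations.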
