Consider the Boston housing dataset. This small data set provides information about houses in different areas in Boston. The following table provides the dictionary of the variables in the data set.

Variable    Description
CRIM        Crime rate
ZN          Percentage of residential land zoned for lots over 25,000 ft²
INDUS       Percentage of land occupied by nonretail business
CHAS        Does tract bound Charles River (= 1 if tract bounds river, = 0 otherwise)
NOX         Nitric oxide concentration (parts per 10 million)
RM          Average number of rooms per dwelling
AGE         Percentage of owner-occupied units built prior to 1940
TAX         Full-value property tax rate per $10,000
PTRATIO     Pupil-to-teacher ratio by town
LSTAT       Percentage of lower status of the population
MEDV        Median value of owner-occupied homes in $1000s
CAT.MEDV    Is median value of owner-occupied homes in tract above $30,000 (CAT.MEDV = 1) or not (CAT.MEDV = 0)
DIS         Weighted distances to five Boston employment centers
RAD         Index of accessibility to radial highways

The data set has two target variables: MEDV is the target for the multiple linear regression model, and CAT.MEDV is the target for the decision tree model developed in this assignment. Neither target may be used as a predictor variable. Using R, answer the following questions about the Boston housing data set. Your solution document must include the answers, the R code, and the results of the analysis.

Apply the str() and summary() commands and display the findings. What are the ranges of TAX and AGE?
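A minimal sketch of this step. The assignment's data file is not shown, so this assumes R's built-in MASS::Boston as a stand-in, renames its columns to match the dictionary, and derives CAT.MEDV from MEDV; the name `housing` is hypothetical.

```r
library(MASS)  # assumption: MASS::Boston stands in for the course data file

# Match the assignment's dictionary: drop the extra 'black' column, upper-case names
housing <- Boston[, setdiff(names(Boston), "black")]
names(housing) <- toupper(names(housing))
# Derived here for illustration; the course file is expected to ship this column
housing$CAT.MEDV <- as.integer(housing$MEDV > 30)

str(housing)      # variable types and the first few values
summary(housing)  # min, quartiles, mean, and max for each variable

tax_range <- range(housing$TAX)  # c(min, max) of TAX
age_range <- range(housing$AGE)  # c(min, max) of AGE
print(tax_range)
print(age_range)
```

range() returns the minimum and maximum as a length-two vector, the same values that appear in the Min. and Max. rows of summary().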

CAT.MEDV categorizes the median value of homes: 0 = value < $30,000; 1 = value > $30,000. Convert this variable to a two-level categorical (factor) variable and save it in the dataset. Show the R code and its results.
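One way to do the conversion, sketched under the same assumption that MASS::Boston stands in for the course file and that CAT.MEDV is stored as 0/1 integers (here it is derived from MEDV for illustration).

```r
library(MASS)  # assumption: MASS::Boston stands in for the course data file

housing <- Boston[, setdiff(names(Boston), "black")]
names(housing) <- toupper(names(housing))
housing$CAT.MEDV <- as.integer(housing$MEDV > 30)  # derived for illustration

# Convert the 0/1 indicator to a two-level factor and store it back in the data frame
housing$CAT.MEDV <- factor(housing$CAT.MEDV, levels = c(0, 1))

str(housing$CAT.MEDV)     # should now report a factor with 2 levels
levels(housing$CAT.MEDV)  # the level labels
table(housing$CAT.MEDV)   # number of tracts in each class
```

Fixing `levels = c(0, 1)` explicitly keeps the level order stable even if a subset of the data happens to contain only one class.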

Generate a training set with 400 rows and a test set with 106 rows from the given dataset.
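A possible random split, again assuming MASS::Boston as the stand-in data. The seed value is arbitrary and only makes the split reproducible; `train_idx`, `train`, and `test` are hypothetical names.

```r
library(MASS)  # assumption: MASS::Boston stands in for the course data file

housing <- Boston[, setdiff(names(Boston), "black")]
names(housing) <- toupper(names(housing))
housing$CAT.MEDV <- factor(as.integer(housing$MEDV > 30), levels = c(0, 1))

set.seed(123)  # arbitrary seed, chosen only for reproducibility
train_idx <- sample(nrow(housing), 400)  # 400 row indices drawn without replacement

train <- housing[train_idx, ]   # 400 training rows
test  <- housing[-train_idx, ]  # the remaining 106 rows

print(dim(train))
print(dim(test))
```

Because sample() draws without replacement by default, no row appears in both sets.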

Develop a multiple regression model to predict MEDV. Consider all other variables as predictors (except CAT.MEDV). Provide a summary of results.

Which variables are statistically significant?
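The regression and the significance check might be sketched as follows, assuming MASS::Boston as the stand-in data, an arbitrary seed, and a conventional 5% significance level (the assignment does not state a threshold).

```r
library(MASS)  # assumption: MASS::Boston stands in for the course data file

housing <- Boston[, setdiff(names(Boston), "black")]
names(housing) <- toupper(names(housing))
housing$CAT.MEDV <- factor(as.integer(housing$MEDV > 30), levels = c(0, 1))

set.seed(123)  # arbitrary seed
train_idx <- sample(nrow(housing), 400)
train <- housing[train_idx, ]

# MEDV on every other variable except CAT.MEDV
model <- lm(MEDV ~ . - CAT.MEDV, data = train)
print(summary(model))

# Terms whose p-value falls below the assumed 0.05 threshold
coefs <- summary(model)$coefficients
significant <- rownames(coefs)[coefs[, "Pr(>|t|)"] < 0.05]
print(significant)
```

The same information appears as significance stars in the summary() printout; extracting the p-value column just makes the list explicit.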

Evaluate the prediction performance of the model using R-squared and RMSE. Explain your findings.
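A sketch of the evaluation, under the same assumptions (MASS::Boston as stand-in data, arbitrary seed). It reports the training R-squared from the fitted model and computes RMSE and an out-of-sample R-squared on the 106 test rows; whether the assignment expects the training or the test R-squared is not stated, so both are shown.

```r
library(MASS)  # assumption: MASS::Boston stands in for the course data file

housing <- Boston[, setdiff(names(Boston), "black")]
names(housing) <- toupper(names(housing))
housing$CAT.MEDV <- factor(as.integer(housing$MEDV > 30), levels = c(0, 1))

set.seed(123)  # arbitrary seed
train_idx <- sample(nrow(housing), 400)
train <- housing[train_idx, ]
test  <- housing[-train_idx, ]

model <- lm(MEDV ~ . - CAT.MEDV, data = train)
pred  <- predict(model, newdata = test)

# Root mean squared error on the test set, in the units of MEDV ($1000s)
rmse <- sqrt(mean((test$MEDV - pred)^2))

# R-squared: in-sample from the model, out-of-sample as 1 - SSE/SST on the test set
r2_train <- summary(model)$r.squared
r2_test  <- 1 - sum((test$MEDV - pred)^2) / sum((test$MEDV - mean(test$MEDV))^2)

print(c(RMSE = rmse, R2_train = r2_train, R2_test = r2_test))
```

A test R-squared far below the training value would suggest overfitting; an RMSE of, say, 5 would mean predictions are off by roughly $5,000 on average.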

Develop a decision tree model to predict CAT.MEDV. Consider all other variables as predictors (except MEDV). Provide a summary of results.
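A minimal classification-tree sketch using the rpart package, which is one common choice; the assignment does not name a package. MASS::Boston again stands in for the course file, and MEDV is dropped from the training data so it cannot be used as a predictor.

```r
library(MASS)   # assumption: MASS::Boston stands in for the course data file
library(rpart)  # assumption: rpart is used for the tree; other packages would also work

housing <- Boston[, setdiff(names(Boston), "black")]
names(housing) <- toupper(names(housing))
housing$CAT.MEDV <- factor(as.integer(housing$MEDV > 30), levels = c(0, 1))

set.seed(123)  # arbitrary seed
train_idx <- sample(nrow(housing), 400)
train <- housing[train_idx, ]

# Remove MEDV so that the tree only sees the allowed predictors
tree_data <- subset(train, select = -MEDV)
tree <- rpart(CAT.MEDV ~ ., data = tree_data, method = "class")

print(tree)    # splits, node sizes, and predicted class per node
printcp(tree)  # complexity table with cross-validated error
```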

Generate a training set with 400 rows and a test set with 106 rows from the given dataset. Create a confusion matrix for the test set. Explain the results in terms of overall prediction error.
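The confusion matrix and overall error rate could be computed along these lines. This is a sketch under the same assumptions as before: MASS::Boston as stand-in data, an arbitrary seed, and rpart as the tree package.

```r
library(MASS)   # assumption: MASS::Boston stands in for the course data file
library(rpart)  # assumption: rpart is used for the tree

housing <- Boston[, setdiff(names(Boston), "black")]
names(housing) <- toupper(names(housing))
housing$CAT.MEDV <- factor(as.integer(housing$MEDV > 30), levels = c(0, 1))

set.seed(123)  # arbitrary seed
train_idx <- sample(nrow(housing), 400)
train <- housing[train_idx, ]
test  <- housing[-train_idx, ]

tree <- rpart(CAT.MEDV ~ ., data = subset(train, select = -MEDV), method = "class")

# Predicted classes for the 106 held-out rows
pred_class <- predict(tree, newdata = test, type = "class")

# Rows are actual classes, columns are predicted classes
conf_mat <- table(Actual = test$CAT.MEDV, Predicted = pred_class)
print(conf_mat)

# Overall prediction error: share of test rows off the diagonal
error_rate <- 1 - sum(diag(conf_mat)) / sum(conf_mat)
print(error_rate)
```

The diagonal holds the correctly classified tracts, so the overall error is the off-diagonal count divided by 106. Since far fewer tracts fall in class 1, it is worth reading the two off-diagonal cells separately rather than relying on the overall rate alone.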
