Question: Note: Quiz 2B in eMajor will be based on this Assignment. Please have your R program available in running condition when you take the quiz. You will need the solutions from your program to take the quiz.

The data set needed for the assignment (HousePrices.csv) is available in the DATA SETS folder under Table of Contents. The data set is also available on the Assignment page in GoView.

Part 1 (100 points)

The HousePrices data set is a cross-sectional data set on house prices and other features (e.g., number of bedrooms) of houses in Windsor, Ontario. The data were gathered during the summer of 1987.

Use the HousePrices data to perform the following tests using Linear Regression settings:

(i) Construct summary statistics (mean, median, max, min) for all the variables in the HousePrices data. (10 points)

(ii) What percentage of houses in the data have a driveway, gas heat, and air conditioning, respectively? (Hint: create a dummy variable from each of the driveway, gasheat, and aircon variables, then find the mean of each dummy variable.) (30 points)

(iii) Construct a linear regression model to test whether the number of bedrooms influences house prices. Provide a summary of the linear regression model using the summary() function. (30 points)

The online quiz (Q1 to Q4) will be related to the following concepts. You do not have to respond to the following questions in the R program:

How do you interpret the coefficient of Number of Bedrooms in the model?

What is the null hypothesis related to the model to test the effect of number of bedrooms on house price?

To infer the effect of number of bedrooms on house price, draw your conclusion based on p-value.

Comment on model accuracy: R-squared

(iv) Construct a multiple linear regression model that includes all variables as predictors of house prices (the response variable) and observe the effect on house prices. Provide a summary of the regression model using the summary() function. (30 points)

Variable description of HousePrices data: A data frame containing 546 observations on 12 variables.

price: Sale price of a house.

lotsize: Lot size of a property in square feet.

bedrooms: Number of bedrooms.

bathrooms: Number of full bathrooms.

stories: Number of stories excluding basement.

driveway: Factor. Does the house have a driveway?

recreation: Factor. Does the house have a recreational room?

fullbase: Factor. Does the house have a full finished basement?

gasheat: Factor. Does the house use gas for hot water heating?

aircon: Factor. Is there central air conditioning?

garage: Number of garage places.

prefer: Factor. Is the house located in the preferred neighborhood of the city?

Deliverables:

Submit R scripts electronically in eMajor in the corresponding Assignment tab.

The assignment is an individual submission. You are not allowed to work with anyone else or get any outside help.

Please submit one R program (one R script) containing all parts of the assignment, with comments marking each part so that the parts are clearly separated. The comments should state which section of the assignment each block of code is intended for.

The assignment submission grade will be based on whether you have completed each part of the analysis and whether your R code runs through all parts of the analysis. Your grade will be based not only on the correctness of the program but also on how efficiently the program executes the tasks.

Note that you do not have to write your responses to the above questions on the interpretation of the model results in the R code. If you do write the responses in the R program, they will not be part of your grade.

Quiz 2B will have questions that test your conceptual understanding of the output/results of the model. Make sure you understand the relevant concepts of the analysis in each part before you take the online quiz. Have your R code and the output of the program available when you take Quiz 2B.

Do not submit a separate word document explaining your results.

THE FOLLOWING PROVIDES SOME HELPFUL TIPS ON ASSIGNMENT 1

LOADING THE HousePrices DATA in the R Environment:

# HousePrices data is available from AER package

install.packages("AER")

library(AER)

# See which data sets are available in the R environment

data()

# Load a particular data set

data("HousePrices")

head(HousePrices)

IF YOU ARE HAVING DIFFICULTY IN LOADING THE DATA USING AER PACKAGE, YOU CAN USE THE FOLLOWING OPTION:

Download the HousePrices data from the GoView Assignment page to a folder on your computer. You can save it in any folder; just make sure you give the correct path to the folder where the data is stored, for example:

setwd("C:/My Documents/FTA_4005/Class_Data")

getwd()

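# Load the data file (stringsAsFactors = TRUE reads text columns such as driveway as factors)

house <- read.csv("HousePrices.csv", header = TRUE, stringsAsFactors = TRUE)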

attach(house)
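After reading the CSV file, it is worth checking that the data loaded as expected. The following is a minimal sketch, not the required solution: it writes a small hypothetical file (HousePrices_toy.csv, with made-up values) to a temporary folder so the example runs on its own, then reads it back and inspects it. With the real HousePrices.csv, the variable description suggests you should see 546 observations.

```r
# Hypothetical toy file with made-up values, used only so this sketch is
# self-contained. Replace csv_path with the path to your HousePrices.csv.
csv_path <- file.path(tempdir(), "HousePrices_toy.csv")
toy <- data.frame(
  price    = c(40000, 55000, 70000),
  bedrooms = c(2, 3, 4),
  driveway = c("yes", "no", "yes")
)
write.csv(toy, csv_path, row.names = FALSE)

# Read the file; stringsAsFactors = TRUE turns text columns into factors.
house <- read.csv(csv_path, header = TRUE, stringsAsFactors = TRUE)

# Check the number of rows and columns, and the type of each column.
print(dim(house))
str(house)
```

Referring to columns as house$price, house$bedrooms, and so on works whether or not attach(house) has been called.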

For Part 1(i):

After loading the HousePrices data set into the R environment, use the summary() function on the loaded data.

For example, if the object holding the HousePrices data is named house, run summary(house) to compute the mean, median, max, and min.

This is explained in unit 2: Part 2 Introduction to R Read Write Data File (19 min)
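The summary() tip for Part 1(i) can be sketched as follows. This is an illustration under stated assumptions, not the graded solution: the data frame below is a hypothetical toy example with made-up values and only a few of the HousePrices column names, so that the code runs without the real file.

```r
# Hypothetical toy data (made-up values); use your loaded HousePrices
# data frame in place of this one.
house <- data.frame(
  price    = c(40000, 50000, 60000, 70000, 80000, 90000),
  lotsize  = c(3000, 4000, 5000, 6000, 7000, 8000),
  bedrooms = c(2, 3, 3, 4, 3, 2),
  driveway = factor(c("yes", "yes", "no", "yes", "yes", "no"))
)

# summary() reports min, quartiles, median, mean, and max for numeric
# columns and the count of each level for factor columns.
print(summary(house))

# Optional: the four requested statistics for the numeric columns only,
# collected in one table (rows = statistics, columns = variables).
numeric_cols <- house[sapply(house, is.numeric)]
stats <- sapply(numeric_cols, function(x) {
  c(mean = mean(x), median = median(x), max = max(x), min = min(x))
})
print(stats)
```

For factor variables such as driveway, a mean or median is not defined, which is why summary() shows level counts for them instead.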

For Part 1(ii)

Watch the following video and work with the R program in Unit 6 to see how to create dummy variables:

Part 2_Multiple_LinearRegession_Credit_Analysis.R

After creating each dummy variable, use mean() to compute the % of houses with Driveway, Gas-Heat, etc.
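The dummy-variable hint for Part 1(ii) can be sketched as follows. This is one possible approach, not necessarily the one used in the Unit 6 program. It assumes the factor levels are "yes" and "no" (check with levels(house$driveway) on your data), and it uses a hypothetical toy data frame with made-up values; the helper name to_dummy is also made up for this example.

```r
# Hypothetical toy data (made-up values); use your loaded HousePrices
# data frame in place of this one.
house <- data.frame(
  driveway = factor(c("yes", "yes", "yes", "no")),
  gasheat  = factor(c("no", "no", "no", "yes")),
  aircon   = factor(c("yes", "no", "yes", "no"))
)

# Hypothetical helper: 1 if the feature is present, 0 otherwise.
# Assumes the levels are exactly "yes" and "no".
to_dummy <- function(x) ifelse(x == "yes", 1, 0)

driveway_dummy <- to_dummy(house$driveway)
gasheat_dummy  <- to_dummy(house$gasheat)
aircon_dummy   <- to_dummy(house$aircon)

# The mean of a 0/1 variable is the proportion of ones;
# multiply by 100 to express it as a percentage.
pct <- 100 * c(
  driveway = mean(driveway_dummy),
  gasheat  = mean(gasheat_dummy),
  aircon   = mean(aircon_dummy)
)
print(pct)
```

The mean works here because summing a 0/1 variable counts the houses with the feature, and dividing by the number of rows gives the share.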

For Part 1(iii) and (iv)

Please watch the video and work with the R program of the video in Unit 5

Part 4_Multiple_LinearRegression_Advertising_Sales_LAB.R

You can watch Part 3: Multiple Regression Advertising Data (12 min) to understand the concepts.
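The regression steps in Part 1(iii) and (iv) can be sketched as follows. This is a minimal illustration, not the graded solution: the data are simulated (synthetic, with an arbitrary made-up relationship) and contain only a few of the HousePrices column names, so the printed numbers say nothing about the real data.

```r
# Synthetic data for illustration only; use your loaded HousePrices
# data frame in place of this one.
set.seed(1)
n <- 50
house <- data.frame(
  bedrooms = sample(2:5, n, replace = TRUE),
  lotsize  = round(runif(n, 2000, 10000)),
  aircon   = factor(sample(c("no", "yes"), n, replace = TRUE))
)
house$price <- 20000 + 8000 * house$bedrooms + 4 * house$lotsize +
  10000 * (house$aircon == "yes") + rnorm(n, sd = 5000)

# Part 1(iii): simple linear regression of price on bedrooms.
simple_fit <- lm(price ~ bedrooms, data = house)
simple_summary <- summary(simple_fit)
print(simple_summary)

# Quantities related to the quiz concepts, pulled from the summary:
coef_table     <- coef(simple_summary)                # coefficient table
bedrooms_slope <- coef_table["bedrooms", "Estimate"]  # change in price per extra bedroom
bedrooms_p     <- coef_table["bedrooms", "Pr(>|t|)"]  # p-value for H0: slope = 0
r_squared      <- simple_summary$r.squared            # share of variance explained

# Part 1(iv): multiple regression; "price ~ ." uses every other
# column in the data frame as a predictor.
full_fit <- lm(price ~ ., data = house)
print(summary(full_fit))
```

Passing data = house to lm() keeps the model tied to one data frame, and the `.` shorthand avoids typing all eleven predictor names when the real data are used.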
