Questions 11-20 are based on the files HW7_training.csv and HW7_scoring.csv. Your goal is to predict the demand for product consumption. You will use linear regression to predict annual heating oil consumption and help a company secure a certain amount of oil based on your predictions. You will build a model on the training dataset and apply the results of your model to the scoring dataset.
Question 11
Upload the Training and Scoring datasets to RapidMiner. (Note: Insulation_Rating is measured on a scale of 1-10, where 10 is the highest; Outdoor_Temp is the outdoor temperature in °F; Num_Occupants is the total number of residents per home; Home_Age is the age in years; Home_Size is the size in sq ft; Heating_Oil_Used is the number of units of oil purchased in a recent month.)
Run the process and explore the results. In the Statistics view, check the ranges for each attribute in both the training and scoring datasets. Note that the ranges for Home_Size differ between the two datasets. Make the ranges identical by applying a filter to the Scoring dataset. Use Home_Size ≥ 489 and Home_Size ≤ 7081.
Run the process again. How many records are now in the Scoring dataset?
7081
9436
9442
9988
5 points
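The range filter in Question 11 can also be checked outside RapidMiner. The following is a minimal Python sketch, not the RapidMiner process itself. It assumes both bounds are inclusive, and the sample rows are made up for illustration; the real values come from HW7_scoring.csv.

```python
# Bounds taken from the question; treating both as inclusive is an assumption.
HOME_SIZE_MIN = 489
HOME_SIZE_MAX = 7081

# Hypothetical rows for illustration only; real data lives in HW7_scoring.csv.
sample_rows = [
    {"Home_Size": 450},
    {"Home_Size": 489},
    {"Home_Size": 3000},
    {"Home_Size": 7081},
    {"Home_Size": 7500},
]


def filter_home_size(rows, low=HOME_SIZE_MIN, high=HOME_SIZE_MAX):
    """Keep only the rows whose Home_Size falls inside [low, high]."""
    return [row for row in rows if low <= row["Home_Size"] <= high]


filtered_rows = filter_home_size(sample_rows)
print(len(filtered_rows), "of", len(sample_rows), "sample rows remain")
```

With the real scoring file loaded into the same list-of-dicts shape (for example via csv.DictReader, converting Home_Size to a number first), the length of the filtered list would be the record count the question asks for.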
Question 12
In the Training dataset, add operator Set Role, and assign the role of label to attribute Heating_Oil_Used. How many regular attributes are now in the dataset?
3
5
7
9
Question 13
In the Training dataset, add operator Linear Regression, keep the defaults, run the process. Explore the Linear Regression Coefficients. Which attribute has the heaviest weight (highest coefficient)?
Insulation_Rating
Outdoor_Temp
Home_Age
Home_Size
Num_Occupants
5 points
Question 14
In the Linear Regression Coefficients table, explore the significance of attributes. Which attribute is not significant and has been automatically removed from the model?
Insulation_Rating
Outdoor_Temp
Home_Age
Home_Size
Num_Occupants
Question 15
In the Linear Regression Coefficients table, explore the Intercept. What is the coefficient of the Intercept?
210.679
3.329
2.05
160.634
5 points
Question 16
Based on the coefficients in the Linear Regression Coefficients table, create a Regression formula and calculate oil consumption for the house with the following attributes: Insulation_Rating =5, Outdoor_Temp =75, Home_Age =30, Home_Size =2200, Num_Occupants =5. What is the result of your calculation? Round up the number if necessary.
180
189
201
220
5 points
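A regression formula of the kind Question 16 asks for is the intercept plus each coefficient multiplied by its attribute value. Here is a hedged Python sketch of that arithmetic. The intercept and coefficients are placeholders, not the values from the RapidMiner table; substitute the real ones, and delete the entry for any attribute the model removed.

```python
# PLACEHOLDER values for illustration only; replace them with the numbers
# shown in your own Linear Regression Coefficients table.
placeholder_intercept = 100.0
placeholder_coefficients = {
    "Insulation_Rating": 2.0,
    "Outdoor_Temp": -1.0,
    "Home_Age": 1.0,
    "Home_Size": 0.01,
    "Num_Occupants": 0.5,
}

# Attribute values given in Question 16.
home = {
    "Insulation_Rating": 5,
    "Outdoor_Temp": 75,
    "Home_Age": 30,
    "Home_Size": 2200,
    "Num_Occupants": 5,
}


def predict_heating_oil(intercept, coefficients, attributes):
    """Return intercept + sum(coefficient * attribute value)."""
    return intercept + sum(
        coef * attributes[name] for name, coef in coefficients.items()
    )


prediction = predict_heating_oil(
    placeholder_intercept, placeholder_coefficients, home
)
print(round(prediction))
```

The same function can be reused for other houses by swapping in a different attribute dictionary.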
Question 17
Based on a Regression formula created in Question 16, calculate oil consumption for the house with the following attributes: Insulation_Rating =9, Outdoor_Temp =45, Home_Age =15, Home_Size =1600, Num_Occupants =2. What is the result of your calculation? Round up the number if necessary.
134
152
161
187
Question 18
Apply the Linear Regression model to the Scoring dataset. Run the process and explore the results. What is the average predicted oil consumption in the scoring dataset? Round up the answer if necessary.
198.7
218.8
221.5
304.4
5 points
Question 19
Add operator Aggregate (Note: connect the lab port of Apply Model and exa port of Aggregate). In the Aggregate parameters, edit the list of aggregation attributes, and calculate the sum and median of the Prediction(Heating_Oil_Used). Based on your calculations, how many units of oil will be required to satisfy the demand? Round up the answer if necessary.
1,587,999
1,865,345
1,989,868
2,064,756
5 points
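The Aggregate step reduces the prediction column to a sum (total units of oil needed) and a median. As a rough cross-check outside RapidMiner, the same two figures can be computed with Python's standard library. The prediction values below are invented for illustration and are not taken from the scoring dataset.

```python
from statistics import mean, median

# Hypothetical predictions; the real column is Prediction(Heating_Oil_Used)
# produced by Apply Model on the scoring dataset.
sample_predictions = [180.0, 220.5, 199.5, 240.0, 160.0]

total_units = sum(sample_predictions)         # total demand to secure
median_units = median(sample_predictions)     # middle value of the predictions
average_units = mean(sample_predictions)      # average predicted consumption

print("sum:", total_units)
print("median:", median_units)
print("average:", average_units)
```

The sum answers how much oil is needed to satisfy the demand overall, while the median and average describe a typical home's predicted consumption.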
Question 20
Based on the aggregate results, what is the median value for the Prediction(Heating_Oil_Used)? Round up the answer if necessary.
198.7
218.8
221.5
304.4
