Question: PLEASE ANSWER USING PYTHON ONLY 9 . 3 Predicting Prices of Used Cars ( Regression Trees ) . The file ToyotaCorolla.csv contains the data on

PLEASE ANSWER USING PYTHON ONLY
9.3 Predicting Prices of Used Cars (Regression Trees). The file ToyotaCorolla.csv contains the data on used cars (Toyota Corolla) on sale during late summer of 2004 in the Netherlands. It has 1436 records containing details on 38 attributes, including Price, Age, Kilometers, HP, and other specifications. The goal is to predict the price of a used Toyota Corolla based on its specifications. (The example in Section 9.7 is a subset of this dataset.)
Data Preprocessing. Split the data into training (60%), and validation (40%) datasets.
a. Run a full-grown regression tree (RT) with outcome variable Price and pre- dictors Age_08_04, KM, Fuel_Type (first convert to dummies), HP, Auto- matic, Doors, Quarterly_Tax, Mfr_Guarantee, Guarantee_Period, Airco, Auto- matic_airco, CD_Player, Powered_Windows, Sport_Model, and Tow_Bar. Set random_state=1.
i. Which appear to be the three or four most important car specifications for predicting the cars price?
ii. Comparethepredictionerrorsofthetrainingandvalidationsetsbyexamining their RMS error and by plotting the two boxplots. How does the predictive performance of the validation set compare to the training set? Why does this occur?
iii. Howmightweachievebettervalidationpredictiveperformanceattheexpense of training performance?
iv. CreateasmallertreebyusingGridSearchCV()withcv=5tofindafine-tuned tree. Compared to the full-grown tree, what is the predictive performance on the validation set?
PROBLEMS 249
2509 CLASSIFICATION AND REGRESSION TREES
b. Let us see the effect of turning the price variable into a categorical variable. First, create a new variable that categorizes price into 20 bins. Now repartition the data keeping Binned_Price instead of Price. Run a classification tree with the same set of input variables as in the RT, and with Binned_Price as the output variable. As in the less deep regression tree, create a smaller tree by using GridSearchCV() with cv =5 to find a fine-tuned tree.
i. ComparethesmallertreegeneratedbytheCTwiththesmallertreegenerated by RT. Are they different? (Look at structure, the top predictors, size of tree, etc.) Why?
ii. Predicttheprice,usingthesmallerRTandCT,ofausedToyotaCorollawith the specifications listed in Table 9.10.
TABLE 9.10
SPECIFICATIONS FOR A PARTICULAR
TOYOTA COROLLA
iii. Compare the predictions in terms of the predictors that were used, the mag- nitude of the difference between the two predictions, and the advantages and disadvantages of the two methods.
 PLEASE ANSWER USING PYTHON ONLY 9.3 Predicting Prices of Used Cars

Step by Step Solution

There are 3 Steps involved in it

1 Expert Approved Answer
Step: 1 Unlock blur-text-image
Question Has Been Solved by an Expert!

Get step-by-step solutions from verified subject matter experts

Step: 2 Unlock
Step: 3 Unlock

Students Have Also Explored These Related Databases Questions!