Question:

# numpy and pandas
import numpy as np
import pandas as pd
import math

# graphics with matplotlib
import matplotlib.pyplot as plt
plt.style.use('seaborn')  # on matplotlib >= 3.6 this style is named 'seaborn-v0_8'
%matplotlib inline

# model, train/test split, dummies (one-hot-encoding), rmse metric from scikit learn.
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelBinarizer
from sklearn.metrics import mean_squared_error

Get the data

Let's read in the used cars data. We will use only the features mileage and color.

cd = pd.read_csv("https://bitbucket.org/remcc/rob-data-sets/downloads/susedcars.csv")
cd = cd[['price','mileage','color']]
cd['price'] = cd['price']/1000
cd['mileage'] = cd['mileage']/1000
cd.head()

We are fitting the model:

$$\text{price} = \beta_0 + \beta_1\,\text{mileage} + \beta_2\,\text{mileage}^2 + \epsilon$$

What do you think?
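One way to fit a model of this form is to add a squared-mileage column and run an ordinary linear regression on both columns. The sketch below is only illustrative: it uses synthetic data in place of the real `cd` frame so that it runs without a download, and the names `demo`, `mileage2`, and `quad` are made up for this example.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for the used cars data (assumption: the real `cd`
# frame has 'price' and 'mileage' columns, both already divided by 1000).
rng = np.random.default_rng(0)
mileage = rng.uniform(0, 150, size=500)
price = 60 - 0.6 * mileage + 0.002 * mileage**2 + rng.normal(0, 3, size=500)
demo = pd.DataFrame({"price": price, "mileage": mileage})

# Design matrix with mileage and mileage squared; LinearRegression
# adds the intercept beta_0 itself.
X = pd.DataFrame({"mileage": demo["mileage"], "mileage2": demo["mileage"] ** 2})
y = demo["price"]

quad = LinearRegression().fit(X, y)
b0 = quad.intercept_
b1, b2 = quad.coef_
print(f"beta0={b0:.3f}, beta1={b1:.4f}, beta2={b2:.5f}")
```

With the real data, replacing `demo` by `cd` should give the corresponding estimates. The model is still linear in the coefficients, which is why a plain linear regression can fit the curve.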

Homework:

Use out-of-sample performance (a train/test split) to decide which of these two models is better:

linear model of log(y) on mileage and color

linear model of y on mileage, mileage squared, and color

Use out-of-sample RMSE and graphics to compare the two models.

How would I code this in Python to check the two models?
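A possible approach, offered as a sketch and not as the course's reference solution: one-hot encode color, split the rows once, fit both models on the same training rows, and compare RMSE on the test rows. Because the first model predicts log(price), its predictions are exponentiated before computing RMSE so that both errors are on the price scale (a simple back-transform that ignores any retransformation bias correction). The function name `compare_models`, the 25% test fraction, the seed, and the synthetic `demo` frame are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error


def compare_models(df, test_size=0.25, seed=0):
    """Return out-of-sample RMSE (price scale) and test-set predictions for both models."""
    # One-hot encode color; drop one level to avoid collinearity with the intercept.
    dummies = pd.get_dummies(df["color"], prefix="color", drop_first=True, dtype=float)
    X_lin = pd.concat([df[["mileage"]], dummies], axis=1)
    X_quad = X_lin.assign(mileage2=df["mileage"] ** 2)
    y = df["price"]

    # Split row positions once so both models use the same train and test rows.
    tr, te = train_test_split(np.arange(len(df)), test_size=test_size, random_state=seed)

    # Model 1: log(price) on mileage and color; map predictions back with exp.
    m_log = LinearRegression().fit(X_lin.iloc[tr], np.log(y.iloc[tr]))
    pred_log = np.exp(m_log.predict(X_lin.iloc[te]))

    # Model 2: price on mileage, mileage squared, and color.
    m_quad = LinearRegression().fit(X_quad.iloc[tr], y.iloc[tr])
    pred_quad = m_quad.predict(X_quad.iloc[te])

    y_test = y.iloc[te]
    rmse = {
        "log": float(np.sqrt(mean_squared_error(y_test, pred_log))),
        "quadratic": float(np.sqrt(mean_squared_error(y_test, pred_quad))),
    }
    preds = pd.DataFrame({"price": y_test, "log": pred_log, "quadratic": pred_quad})
    return rmse, preds


# Synthetic stand-in for `cd` (assumption: columns 'price', 'mileage', 'color').
rng = np.random.default_rng(1)
n = 600
demo = pd.DataFrame({
    "mileage": rng.uniform(0, 150, size=n),
    "color": rng.choice(["black", "silver", "white", "other"], size=n),
})
demo["price"] = np.exp(4.0 - 0.012 * demo["mileage"] + rng.normal(0, 0.15, size=n))

rmse, preds = compare_models(demo)
print(rmse)
```

On the real data, `compare_models(cd)` would return the two RMSE values to compare. For the graphics part, one option is to scatter `preds["price"]` against `preds["log"]` and `preds["quadratic"]` with `plt.scatter` and add a 45-degree line. Since a single split can be noisy, repeating the comparison over several seeds gives a steadier picture.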
