Regression Models
1. Create a Python file named myregressor.py. Import the following packages.
import pickle
import numpy as np
from sklearn import linear_model
import sklearn.metrics as sm
import matplotlib.pyplot as plt
2. Add the following lines. Read these lines and explain their purpose.
input_file = 'regressor_data.txt'
data = np.loadtxt(input_file, delimiter=',')
X, y = data[:, :-1], data[:, -1]
num_training = int(0.8 * len(X))
num_test = len(X) - num_training
X_train, y_train = X[:num_training], y[:num_training]
X_test, y_test = X[num_training:], y[num_training:]
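As a rough illustration of what the slicing in Question 2 does, here is a minimal sketch. It assumes a small synthetic array in place of regressor_data.txt (whose contents are not shown here), so the values and shapes are illustrative only.

```python
import numpy as np

# Hypothetical stand-in for the file contents: 10 rows, with one feature
# column followed by one target column (y = 2x + 1).
data = np.column_stack((np.arange(10.0), 2.0 * np.arange(10.0) + 1.0))

# Every column except the last is a feature; the last column is the target.
X, y = data[:, :-1], data[:, -1]

# The first 80% of the rows are used for training, the rest for testing.
num_training = int(0.8 * len(X))
num_test = len(X) - num_training
X_train, y_train = X[:num_training], y[:num_training]
X_test, y_test = X[num_training:], y[num_training:]

print(X_train.shape, y_train.shape)  # (8, 1) (8,)
print(X_test.shape, y_test.shape)    # (2, 1) (2,)
```

Note that this split keeps the rows in file order, so it is only representative if the file is not sorted in some meaningful way.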
3. Add the following lines. What is the purpose of these added lines?
regressor = linear_model.LinearRegression()
regressor.fit(X_train, y_train)
y_test_pred = regressor.predict(X_test)
4. Add the following lines. Run the program. Save the plot diagram to your local computer and insert the diagram below.
plt.scatter(X_test, y_test, color='green')
plt.plot(X_test, y_test_pred, color='black', linewidth=4)
plt.xticks(())
plt.yticks(())
plt.show()
5. Explain what has been drawn in the graph.
6. Modify the above code to display, in the same diagram, scatter plots of (1) the training data set in blue, (2) the testing data set in green, and (3) the predicted data set in red. Show your code and insert the diagram you saved.
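One possible way to approach Question 6 is sketched below. It is not the only layout that satisfies the question. The data is synthetic (an assumption, since regressor_data.txt is not available here), the output file name regression_scatter.png is hypothetical, and the Agg backend is chosen only so the sketch runs without a display; in the lab, plt.show() would be used instead.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption for this sketch
import matplotlib.pyplot as plt
import numpy as np
from sklearn import linear_model

# Synthetic stand-in data (not the lab's file): y = 3x + 2 plus noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(50, 1))
y = 3.0 * X[:, 0] + 2.0 + rng.normal(0, 1, size=50)

# Same 80/20 slice-based split as in Question 2.
num_training = int(0.8 * len(X))
X_train, y_train = X[:num_training], y[:num_training]
X_test, y_test = X[num_training:], y[num_training:]

regressor = linear_model.LinearRegression()
regressor.fit(X_train, y_train)
y_test_pred = regressor.predict(X_test)

# Three scatter plots on the same axes, one colour per data set.
fig, ax = plt.subplots()
ax.scatter(X_train, y_train, color='blue', label='training data')
ax.scatter(X_test, y_test, color='green', label='testing data')
ax.scatter(X_test, y_test_pred, color='red', label='predicted data')
ax.legend()
fig.savefig('regression_scatter.png')  # hypothetical output file name
```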
7. Add the following lines and run your program. Please show the printout.
print("Linear regressor performance:")
print("Mean absolute error =", round(sm.mean_absolute_error(y_test, y_test_pred), 2))
print("Mean squared error =", round(sm.mean_squared_error(y_test, y_test_pred), 2))
print("Median absolute error =", round(sm.median_absolute_error(y_test, y_test_pred), 2))
print("Explained variance score =", round(sm.explained_variance_score(y_test, y_test_pred), 2))
print("R2 score =", round(sm.r2_score(y_test, y_test_pred), 2))
8. Use equations to explain what mean_absolute_error and mean_squared_error are.
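For reference, the standard definitions are MAE = (1/n) * sum(|y_i - yhat_i|) and MSE = (1/n) * sum((y_i - yhat_i)^2), where y_i is the true value and yhat_i the prediction. The sketch below computes both by hand on made-up numbers. The helper names are illustrative and are not the sklearn functions themselves.

```python
def mean_absolute_error(y_true, y_pred):
    # Average of the absolute differences between truth and prediction.
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    # Average of the squared differences; large errors are penalised more.
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up values for illustration; the errors are 1, 0, -2, -1.
y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.0, 5.0, 4.0, 8.0]

mae = mean_absolute_error(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)
print("MAE =", mae)  # MAE = 1.0
print("MSE =", mse)  # MSE = 1.5
```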
9. From the provided document, learn what explained variation and R squared are.
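The provided document is not reproduced here, so the sketch below uses the commonly stated definitions, which may differ in notation from that document: R squared = 1 - SS_res / SS_tot, and explained variance score = 1 - Var(y - yhat) / Var(y). The helper names and the numbers are illustrative only.

```python
def mean(values):
    return sum(values) / len(values)


def variance(values):
    # Population variance (divide by n).
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def r2_score(y_true, y_pred):
    # 1 - (residual sum of squares) / (total sum of squares).
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def explained_variance_score(y_true, y_pred):
    # 1 - Var(residuals) / Var(y); unlike R squared, a constant offset
    # in the residuals is not penalised.
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    return 1.0 - variance(residuals) / variance(y_true)


# Made-up values for illustration.
y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.0, 5.0, 4.0, 8.0]

r2 = r2_score(y_true, y_pred)
ev = explained_variance_score(y_true, y_pred)
print("R2 score =", round(r2, 2))                  # R2 score = 0.59
print("Explained variance score =", round(ev, 2))  # Explained variance score = 0.66
```

The two scores coincide when the residuals have zero mean; here the residuals average -0.5, so the explained variance score comes out higher than R squared.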
10. Add the following lines and run your program. What is the printout?
output_model_file = 'myregressor.pkl'
with open(output_model_file, 'wb') as f:
    pickle.dump(regressor, f)
with open(output_model_file, 'rb') as f:
    regressor_model = pickle.load(f)
y_test_pred_new = regressor_model.predict(X_test)
print(" New mean absolute error =", round(sm.mean_absolute_error(y_test, y_test_pred_new), 2))
11. Read the lines above and explain their intent.
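The save-and-reload pattern in Question 10 can be checked in isolation with a sketch like the one below. It assumes a tiny made-up data set and a temporary directory rather than the lab's myregressor.pkl, and simply confirms that the reloaded model gives the same predictions as the original.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn import linear_model

# Tiny made-up data set lying exactly on y = 2x + 1.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])

regressor = linear_model.LinearRegression().fit(X, y)
before = regressor.predict(X)

# Write the fitted model to disk, then read it back into a new object.
with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, 'model.pkl')  # hypothetical file name
    with open(model_path, 'wb') as f:
        pickle.dump(regressor, f)
    with open(model_path, 'rb') as f:
        restored = pickle.load(f)

after = restored.predict(X)
print(np.allclose(before, after))  # True
```

Only unpickle files from sources you trust, since pickle.load can execute arbitrary code.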
12. Following the previous labs, consider how to use model_selection to split the data into training and testing sets. Answer the following questions: (1) Which library package should be imported? (2) Which function is used to split the data set? (3) Write the code to replace the last four lines of the code in Question 2. (4) Run your code and show the printout only.
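As a hedged sketch of the idea behind Question 12, sklearn.model_selection provides train_test_split, which can stand in for the manual slicing. The data below is synthetic, and test_size=0.2 and random_state=0 are assumptions chosen to mirror the 80/20 split in Question 2.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the file contents: one feature column, one target.
data = np.column_stack((np.arange(10.0), 2.0 * np.arange(10.0) + 1.0))
X, y = data[:, :-1], data[:, -1]

# 20% of the rows are held out for testing; random_state fixes the shuffle.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

print(X_train.shape, X_test.shape)  # (8, 1) (2, 1)
print(y_train.shape, y_test.shape)  # (8,) (2,)
```

Unlike the slice-based split, train_test_split shuffles the rows by default, so the reported metrics will generally differ from those in Question 7; passing shuffle=False keeps the original row order.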
