Question:

1.1 Preprocess the raw data

When given a new dataset, we need to deal with the missing values and categorical features.

In [10]:

    import pandas as pd
    import numpy as np
    from sklearn.preprocessing import LabelEncoder
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import StandardScaler
    from sklearn.linear_model import LinearRegression, Ridge, Lasso
    from sklearn.metrics import mean_absolute_error, mean_squared_error
    import matplotlib.pyplot as plt

    df = pd.read_csv('housing.csv')

    # 0. fill in missing values
    mean_val = df['total_bedrooms'].mean()
    df['total_bedrooms'] = df['total_bedrooms'].fillna(mean_val)
    print(df.isnull().sum())

    # 1. convert categorical features to numerical values
    labelencoder = LabelEncoder()
    df['ocean_proximity'] = labelencoder.fit_transform(df['ocean_proximity'])
    print(df.info())

Output:

    longitude             0
    latitude              0
    housing_median_age    0
    total_rooms           0
    total_bedrooms        0
    population            0
    households            0
    median_income         0
    median_house_value    0
    ocean_proximity       0
    dtype: int64
    RangeIndex: 20640 entries, 0 to 20639
    Data columns (total 10 columns):
    longitude             20640 non-null float64
    latitude              20640 non-null float64
    housing_median_age    20640 non-null int64
    total_rooms           20640 non-null int64
    total_bedrooms        20640 non-null float64
    population            20640 non-null int64
    households            20640 non-null int64
    median_income         20640 non-null float64
    median_house_value    20640 non-null int64
    ocean_proximity       20640 non-null int64
    dtypes: float64(4), int64(6)
    memory usage: 1.6 MB
    None

1.5 Use the ridge regression model to do prediction

    $\min_w \|y - Xw\|_2^2 + \lambda \|w\|_2^2$

1.5.1 Compare its performance on the testing set with that of the standard linear regression model

    $\min_w \|y - Xw\|_2^2$

1.5.2 Use different $\lambda$ to see how it affects the performance of the ridge regression model on the testing set

In [18]:

    # your code
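One possible way to approach 1.5.1 and 1.5.2 is sketched below; it is an illustration under stated assumptions, not the expected answer. The helper name `compare_ridge`, the 80/20 split, the use of test-set mean squared error as the metric, and the particular regularization values are all assumptions. The demo runs on synthetic stand-in data so that it is self-contained; it does not reproduce results for `housing.csv`. In scikit-learn, the `alpha` parameter of `Ridge` plays the role of the regularization weight in the ridge objective.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def compare_ridge(X, y, alphas, seed=0):
    """Return test-set MSE for linear regression and for ridge at each alpha.

    Hypothetical helper: the name, split ratio, and metric are assumptions.
    """
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    # Fit the scaler on the training split only, so no test statistics leak in.
    scaler = StandardScaler().fit(X_train)
    X_train = scaler.transform(X_train)
    X_test = scaler.transform(X_test)

    # 1.5.1: baseline, standard linear regression.
    linear = LinearRegression().fit(X_train, y_train)
    results = {"linear": mean_squared_error(y_test, linear.predict(X_test))}

    # 1.5.2: ridge regression for several regularization strengths.
    for alpha in alphas:
        ridge = Ridge(alpha=alpha).fit(X_train, y_train)
        results[alpha] = mean_squared_error(y_test, ridge.predict(X_test))
    return results


# Synthetic stand-in data (an assumption). With the preprocessed DataFrame one
# would instead pass df.drop(columns="median_house_value") as X and
# df["median_house_value"] as y.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(500, 9))
true_w = rng.normal(size=9)
y_demo = X_demo @ true_w + rng.normal(scale=0.5, size=500)

results = compare_ridge(X_demo, y_demo, alphas=[0.01, 1.0, 100.0, 10000.0])
for name, mse in results.items():
    print(f"{name}: test MSE = {mse:.4f}")
```

With a very small `alpha`, ridge is close to the standard linear model; as `alpha` grows, the weights are shrunk toward zero and the test error eventually rises. Plotting test MSE against `alpha` on a log scale is a common way to show the effect.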

