Question: 3b) Dropping feature columns [20 pts]
![3b) Dropping feature columns [20 pts] Imputation is the process of](https://dsd5zvtm8ll6.cloudfront.net/si.experts.images/questions/2024/09/66f902f737883_25566f902f72189f.jpg)

3b) Dropping feature columns [20 pts]

Imputation is the process of replacing missing data with substituted values. Let's assume we want to keep columns where 5% or less of the values are null (keep and impute) and discard any column where more than 5% of the values are null (throw). Treat the string "None" as a category and not a null value.

3b-i) According to the above condition (5% threshold), how many features can be kept and imputed? [5 pts]
3b-ii) Which columns have null values in 5% or less of their rows, so we can impute? [5 pts]
3b-iii) Which columns have null values in more than 5% of their rows, so we should throw? [5 pts]

In [23]:

    # your code here
    df = pd.read_csv('data/house_data.csv')
    features = df.drop("SalePrice", axis=1)

    # 3b-i Hint: In the previous question 2c we calculated null_counts of all the features.
    # We can split that into 2 lists, i.e. features_to_impute and features_to_throw.
    boolean_throw = null_counts[:-1] / len(features) > 0.05
    boolean_impute = null_counts[:-1] / len(features) <= 0.05  # 5%

    # Complete the code below by uncommenting and changing the values of features_to_impute and features_to_throw.
    # Each should be a list of feature names (e.g. ['LotFrontage', 'Alley', ...]). Do not change the variable names.
    # There are hidden tests which will grade the above three questions.
    features_to_impute = list(features.columns[boolean_impute])
    features_to_throw = list(features.columns[boolean_throw])
    print(len(features_to_impute), features_to_impute)
    print(len(features_to_throw), features_to_throw)

0 0
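The 5% split described above can be sketched on a small synthetic table. This is only an illustration under stated assumptions: the `toy` DataFrame and its values are hypothetical stand-ins for `data/house_data.csv`, and the null fraction is computed directly with `isnull().mean()` rather than from the `null_counts` series of question 2c, which is not shown here.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data (40 rows) standing in for data/house_data.csv.
n = 40
toy = pd.DataFrame({
    "LotArea": np.arange(n, dtype=float),               # no nulls
    "MasVnrType": ["None"] * n,                         # the string "None", a category
    "Electrical": ["SBrkr"] * (n - 1) + [np.nan],       # 1/40 = 2.5% null
    "Alley": ["Grvl"] * (n - 3) + [np.nan] * 3,         # 3/40 = 7.5% null
    "SalePrice": np.arange(n, dtype=float),             # target, not a feature
})
features = toy.drop("SalePrice", axis=1)

# Fraction of null values per feature column.
null_fraction = features.isnull().mean()

# Keep-and-impute at 5% or less; throw above 5%.
boolean_impute = null_fraction <= 0.05
boolean_throw = null_fraction > 0.05

features_to_impute = list(features.columns[boolean_impute])
features_to_throw = list(features.columns[boolean_throw])

print(len(features_to_impute), features_to_impute)
print(len(features_to_throw), features_to_throw)
```

One thing to check on the real file: depending on the pandas version, `pd.read_csv` may parse the literal text `None` as a missing value by default. If that happens, the `keep_default_na` and `na_values` parameters of `read_csv` control which strings are treated as null, so that "None" stays a category as the question requires.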
