Question:

3b) Dropping feature columns [20 pts]

Imputation is the process of replacing missing data with substituted values. Let's assume we want to keep columns where 5% or less of the values are null (keep and impute) and discard any column where more than 5% of the values are null (throw). Treat the string "None" as a category and not a null value.

3b-i) According to the above condition (5% threshold), how many features can be kept and imputed? [5 pts]
3b-ii) Which columns have null values in 5% or less of their rows, so we can impute them? [5 pts]
3b-iii) Which columns have null values in more than 5% of their rows, so we should throw them away? [5 pts]

In [23]:
# your code here
df = pd.read_csv('data/house_data.csv')
features = df.drop("SalePrice", axis=1)

# 3b-i Hint: In the previous question 2c we calculated null_counts of all the features.
# We can split that into 2 lists, i.e. features_to_impute and features_to_throw.
boolean_throw = null_counts[:-1] / len(features) > 0.05
boolean_impute = null_counts[:-1] / len(features) <= 0.05  # 5% or less

# Complete the code below by uncommenting and changing the values of
# features_to_impute and features_to_throw.
# Each should be a list of feature names (e.g. ['LotFrontage', 'Alley', ...]).
# Do not change the variable names.
# There are hidden tests which will grade the above three questions.
features_to_impute = list(features.columns[boolean_impute])
features_to_throw = list(features.columns[boolean_throw])
print(len(features_to_impute), features_to_impute)
print(len(features_to_throw), features_to_throw)
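The 5% split described in the question can be sketched without relying on null_counts from question 2c. This is a minimal illustration only: the toy DataFrame below is made up and stands in for data/house_data.csv, null_fraction is a name introduced here, and computing the fraction with isnull().mean() is one possible approach, not necessarily the one the assignment expects.

```python
import pandas as pd

# Hypothetical toy data standing in for data/house_data.csv (40 rows).
df = pd.DataFrame({
    "LotArea": [8450, 9600, 11250, 9550] * 10,                # no nulls
    "MasVnrType": ["None", "Stone", "BrkFace", "None"] * 10,  # string "None" is a category
    "Electrical": ["SBrkr"] * 39 + [None],                    # 1/40 = 2.5% null
    "Alley": [None] * 30 + ["Grvl"] * 10,                     # 75% null
    "SalePrice": [208500, 181500, 223500, 140000] * 10,
})

features = df.drop("SalePrice", axis=1)

# Fraction of null values in each feature column (assumed helper name).
null_fraction = features.isnull().mean()

# Boolean masks over the feature columns, converted to plain arrays for indexing.
boolean_impute = (null_fraction <= 0.05).to_numpy()  # 5% or less: keep and impute
boolean_throw = (null_fraction > 0.05).to_numpy()    # more than 5%: throw

features_to_impute = list(features.columns[boolean_impute])
features_to_throw = list(features.columns[boolean_throw])

print(len(features_to_impute), features_to_impute)
print(len(features_to_throw), features_to_throw)
```

In this toy frame the string "None" in MasVnrType is not counted as null, because isnull() only flags real missing values. When loading the real CSV, recent pandas versions may parse the literal text None as missing by default; if that happens, passing keep_default_na=False together with an explicit na_values list to pd.read_csv is one way to keep it as a category.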
