Question:
![Problem B [40 Marks]: Consider the data given in "HW2_DataB" Microsoft](https://dsd5zvtm8ll6.cloudfront.net/si.experts.images/questions/2024/09/66f529f6e5b2a_09466f529f650005.jpg)


Problem B [40 Marks]: Consider the data given in the "HW2_DataB" Microsoft Excel (.csv) file and described in Table 1.

Note: Solve all the following questions using Python. Use the Pandas & Sklearn libraries for all the following analyses. Using the given data, do the following:

B-1. [3 marks]: Read and display the data. Identify the number of rows and columns. Does any column have missing data? If yes, provide their names.

B-2. [2 marks]: Type consistency: For each column, identify its field type and verify that Python identifies each column correctly. If there is any discrepancy, indicate it.

B-3. [5 marks]: Filter noise: Looking at the data, some values in the numeric column ("age") were entered as less than 1 (by mistake). Fix these inconsistencies. Furthermore, find the unique categorical values and remove unknowns (if any).

B-4. [7 marks]: Handling NaN values: Drop all columns containing 30% or more missing values. Then impute the remaining columns that have missing values.

B-5. [5 marks]: Normalization/Transformation: Normalize all numeric columns to a mean of zero and a standard deviation of one, and print only the normalized columns.

B-6. [5 marks]: Encoding: Convert "work_type" using a label encoder.

B-7. [5 marks]: Encoding: Convert "ever_married" to binary values (0 and 1). Do not drop any new column(s).

B-8. [8 marks]: General questions (write your answers in a Jupyter notebook):

(i) When is it best to use a label encoder rather than one-hot encoding?

(ii) What are data cube aggregation and discretization?

(iii) Give a real-world example of direct and indirect data acquisition approaches.

(iv) Give a real-world example of structured data and unstructured data.

(v) Why is there a need to scale numerical data with a Min-Max scaler?
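One possible way to approach B-1 to B-7 with Pandas and Sklearn is sketched below. It is a minimal sketch under stated assumptions, not a reference solution. Because HW2_DataB and Table 1 are not reproduced here, it builds a small hypothetical DataFrame: the columns `bmi` and `smoking_status`, and every value, are invented for illustration. It also assumes that ages below 1 are entry errors to be treated as missing, that the string "Unknown" marks a missing category, that median/mode imputation is acceptable for B-4, and that B-7 means 0/1 indicator columns with none dropped.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder, StandardScaler

# B-1: With the real file this would be df = pd.read_csv("HW2_DataB.csv").
# HYPOTHETICAL stand-in data: column names and values are illustrative only.
df = pd.DataFrame({
    "age": [67.0, 0.08, 49.0, 79.0, 81.0, 58.0, np.nan, 45.0, 33.0, 62.0],
    "bmi": [36.6, np.nan, 32.5, 24.0, 28.9, 29.0, 27.3, 30.1, np.nan, 26.4],
    "work_type": ["Private", "children", "Govt_job", "Self-employed", "Private",
                  "Private", "Govt_job", "Private", "Self-employed", "Private"],
    "ever_married": ["Yes", "No", "Yes", "Yes", "Yes", "Yes", "No", "Yes", "No", "Yes"],
    "smoking_status": ["smokes", "Unknown", "never smoked", "Unknown", "Unknown",
                       "smokes", "never smoked", "Unknown", "never smoked", "smokes"],
})
print(df)
n_rows, n_cols = df.shape
print(f"{n_rows} rows, {n_cols} columns")
missing_cols = df.columns[df.isna().any()].tolist()
print("Columns with missing data:", missing_cols)

# B-2: Show the dtype pandas inferred for each column (compare with Table 1).
print(df.dtypes)

# B-3: Assumption: ages below 1 are entry errors, so mark them as missing
# and let B-4 impute them.
df.loc[df["age"] < 1, "age"] = np.nan
cat_cols = df.select_dtypes(exclude="number").columns
for col in cat_cols:
    print(col, df[col].unique())
# Assumption: the placeholder "Unknown" means missing, so turn it into NaN.
df[cat_cols] = df[cat_cols].replace("Unknown", np.nan)

# B-4: Drop columns with >= 30% missing values, then impute the rest
# (median for numeric columns, mode for categorical ones; both are choices).
missing_frac = df.isna().mean()
df = df.drop(columns=missing_frac[missing_frac >= 0.30].index)
num_cols = df.select_dtypes(include="number").columns
cat_cols = df.select_dtypes(exclude="number").columns
df[num_cols] = df[num_cols].fillna(df[num_cols].median())
for col in cat_cols:
    df[col] = df[col].fillna(df[col].mode()[0])

# B-5: Standardize every numeric column to mean 0 and std 1
# (StandardScaler divides by the population standard deviation).
scaler = StandardScaler()
df[num_cols] = scaler.fit_transform(df[num_cols])
print(df[num_cols])

# B-6: Label-encode work_type; codes follow the sorted order of the classes.
le = LabelEncoder()
df["work_type"] = le.fit_transform(df["work_type"])

# B-7: Encode ever_married as 0/1 indicator columns, keeping every new column.
df = pd.get_dummies(df, columns=["ever_married"], dtype=int, drop_first=False)
print(df)
```

With the real file, only the DataFrame construction changes; the printed `df.dtypes` is what you compare against the field types in Table 1 for B-2. For contrast with B-5, a Min-Max scaler maps each column to [0, 1] via (x - min) / (max - min) instead of centering it on the mean.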
