Question: In Python 3: Part 1: Data preparation (5 Points) Create a data preparation and cleaning function that does the following Has a single input that

In Python 3:

In Python 3: Part 1: Data preparation (5 Points) Create a data

Part 1: Data preparation (5 Points) Create a data preparation and cleaning function that does the following Has a single input that is a file name string Reads data (the data is comma separated, has a row header and the first column EventID is the index) into a pandas dataframe Cleans the data Convert the feature Label to numeric (choose the minority class to be equal to 1) Create a feature Y with numeric label o Drop the feature Label If a feature has missing values (i.e., -999 ) o Create a dummy variable for the missing value Call the variable orig_var_name _mv where orig_var_name is the name of the actual var with a missing value o Give this new variable a 1 if the original variable is missing o Replace the missing value with the average of the feature (make sure to compute the mean on records where the value isn't missing). You may find pandas' .replace() function useful. After the above is done, rescales the features so that each feature has zero mean and unit variance (hint: look up sklearn.preprocessing) Returns the cleaned and rescaled dataset Grading guideline: if this function is done in more than 30 lines (not including empty lines), we will deduct 2 points. In [3]: #Don't forget to import the packages you'll need here. def cleanBosonData(infile_name): # code here return data_clean Part 1: Data preparation (5 Points) Create a data preparation and cleaning function that does the following Has a single input that is a file name string Reads data (the data is comma separated, has a row header and the first column EventID is the index) into a pandas dataframe Cleans the data Convert the feature Label to numeric (choose the minority class to be equal to 1) Create a feature Y with numeric label o Drop the feature Label If a feature has missing values (i.e., -999 ) o Create a dummy variable for the missing value Call the variable orig_var_name _mv where orig_var_name is the name of the actual var with a missing value o Give this new variable a 1 if the original variable is missing o Replace the missing value with the average of the feature (make sure to compute the mean on records where the value isn't missing). You may find pandas' .replace() function useful. After the above is done, rescales the features so that each feature has zero mean and unit variance (hint: look up sklearn.preprocessing) Returns the cleaned and rescaled dataset Grading guideline: if this function is done in more than 30 lines (not including empty lines), we will deduct 2 points. In [3]: #Don't forget to import the packages you'll need here. def cleanBosonData(infile_name): # code here return data_clean

Step by Step Solution

There are 3 Steps involved in it

1 Expert Approved Answer

Step: 1 Unlock blur-text-image

Question Has Been Solved by an Expert!

Get step-by-step solutions from verified subject matter experts

Step: 2 Unlock

Step: 3 Unlock

Students Have Also Explored These Related Databases Questions!

The final exam consists entirely of timed coding exercises in a Jupyter Notebook created by the student. Submit your completed Notebook file at the Blackboard link before the deadline on Monday,...

Python 3 Part III: Grocery Shopping (20 points) Write a function shop () that takes the following arguments, in this order: 1. filename: a file that you will need to input. The file contains many...

This Mini Project involves the creation of a structure that will hold a dataset and statistical information about the data set. First, lets describe the properties of our structure, Data. Data will...

This is Everything they gave us. They want us to write the code: The dataframe for your team is called your_team_df. The variable 'pts' represents the points scored by your team. Calculate and print...

I'm having alot of trouble with the program below: Part I: PA3 Preparation (10 points). Create an outline in comments/psuedocode for the programming assignment below. Place your comments in the...

Project Two: Hypothesis Testing . You are a data analyst for a basketball team and have access to a large set of historical data that you can use to analyze performance patterns. The coach of the...

Project Two: Hypothesis Testing This notebook contains the step-by-step directions for Project Two. It is very important to run through the steps in order. Some steps depend on the outputs of earlier...

Project Two: Hypothesis Testing . You are a data analyst for a basketball team and have access to a large set of historical data that you can use to analyze performance patterns. The coach of the...

ADM 1370 - Applications of Information Technology for Business - Winter 2022 - Assignment 3 Database Table Descriptions VOLUNTEERS Attribute Type Volunteer ID Number Volunteer_Last Name Short Text...

Payback Refer to Problem 12-1. What is the projects payback period? Data from Problem 12-1 Project K has a cost of $52,125, its expected net cash inflows are $12,000 per year for 8 years, and its...

Are people living in a relationship-based governance system likely to be unethical in business dealings?

Varnon Company has a defined benetit pension plan. On December 3 1 ( the end of the fiscal year ) , the company received the PBO report trom the actuary. The following information was included in the...

Seved Help 14 Wisconsin Snowmobile Corp. is considering a switch to level production Cost efficiencies would occur under level production, and aftertax costs would decline by $31,500, but inventory...

=+3 Describe the kinds of health, safety, and security problems that international

=+4 Why is designing and implementing a global human resource information system so difficult?

=+2 Why have family-friendly and work-life balance programs become so important?