Question: Use pandas library only: Task 1: Using the skeleton code, create a subset of the data which removes 25% of the population. The 75% subset

Use pandas library only:

Task 1: Using the skeleton code, create a subset of the data which removes 25% of the population (the skeleton code specifies a 75%/25% split). The 75% subset will be called the sample. The 25% subset is named the validation set. The split must be random.
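One possible way to do the random split with pandas alone is sketched below. The helper name `split_sample_and_validation`, the toy data, and the column names `gender` and `weight` are assumptions for illustration; the real file's columns are not shown in the question. The sketch also assumes the DataFrame index is unique, since the validation set is built by dropping the sampled index labels.

```python
import pandas as pd


def split_sample_and_validation(df, sample_frac=0.75, seed=None):
    """Randomly split df into a sample and a validation set (hypothetical helper)."""
    # DataFrame.sample draws rows at random without replacement.
    sample_df = df.sample(frac=sample_frac, random_state=seed)
    # Every row not drawn into the sample goes to the validation set.
    # This relies on the index labels being unique.
    validation_df = df.drop(sample_df.index)
    return sample_df, validation_df


# Toy data standing in for the assignment file (column names are assumed).
trolls = pd.DataFrame({
    "gender": ["MALE", "FEMALE"] * 10,
    "weight": range(100, 120),
})
sample_df, validation_df = split_sample_and_validation(trolls, seed=0)
print(len(sample_df), len(validation_df))
```

Passing `seed=None` gives a different split on every run; a fixed seed is only useful for reproducing a result while debugging.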

Task 2: Get the proportion of the population that is MALE and has a mass greater than or equal to a given weight in the sample.

NOTE: (given_weight = self.weight)
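A minimal sketch of Task 2 follows. It reads the task as a conditional proportion (the share of MALE rows among rows at or above the cutoff), which is what the skeleton's method name suggests. The function name, the column names `gender` and `weight`, and the toy data are assumptions.

```python
import pandas as pd


def proportion_male_given_heavy(df, weight, gender_col="gender", weight_col="weight"):
    """Share of MALE rows among rows with weight >= the cutoff (hypothetical helper)."""
    # Keep only rows at or above the cutoff.
    heavy = df[df[weight_col] >= weight]
    if heavy.empty:
        # No row meets the cutoff, so the proportion is undefined.
        return float("nan")
    # The mean of a boolean Series is the fraction of True values.
    return (heavy[gender_col] == "MALE").mean()


# Toy sample (assumed column names and values).
sample_df = pd.DataFrame({
    "gender": ["MALE", "MALE", "FEMALE", "FEMALE", "MALE"],
    "weight": [200, 180, 150, 190, 120],
})
p = proportion_male_given_heavy(sample_df, weight=180)
print(p)
```

If the task is instead read as a joint proportion over the whole sample, the expression would be `((df["gender"] == "MALE") & (df["weight"] >= weight)).mean()`.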

Task 3: Get the proportion of the population that is MALE and has a mass greater than or equal to a given weight in the validation set. Assume this is the true value for the population and return the percent error of the sample's proportion (note: a percentage, not a proportion).
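The percent error can be sketched as below, following the skeleton's comment `percent_error = (tested - true)/true` and multiplying by 100 to get a percentage. The helper names, column names, and toy data are assumptions. The sign is kept as in the skeleton's formula, so a negative result means the sample underestimates the validation value.

```python
import pandas as pd


def male_share_above(df, weight):
    """Share of MALE rows among rows with weight >= the cutoff (assumed columns)."""
    heavy = df[df["weight"] >= weight]
    return (heavy["gender"] == "MALE").mean()


def percent_error(sample_df, validation_df, weight):
    """Percent error of the sample estimate, treating validation as the true value."""
    tested = male_share_above(sample_df, weight)
    true = male_share_above(validation_df, weight)
    # Undefined when the validation proportion is zero; this sketch does not guard against it.
    return (tested - true) / true * 100


# Toy data (assumed column names and values).
sample_df = pd.DataFrame({
    "gender": ["MALE", "MALE", "FEMALE", "MALE"],
    "weight": [200, 190, 185, 100],
})
validation_df = pd.DataFrame({
    "gender": ["MALE", "FEMALE"],
    "weight": [200, 190],
})
err = percent_error(sample_df, validation_df, weight=180)
print(err)
```

Some courses define percent error with an absolute value; wrapping the result in `abs(...)` would give that variant.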

Task 4: Using any method you deem reasonable, decide whether this weight cutoff is a reasonable predictor of whether a troll is MALE for the supplied data.

NOTE: this returns True or False where True means it is reasonable and False means it is not.
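Since the method is left open, here is one possible criterion, offered as a sketch rather than the expected answer: treat "weight >= cutoff" as a prediction of MALE, measure how often it matches the actual label, and call it reasonable only if that accuracy beats both always guessing the majority gender and a chosen threshold. The function name, the `min_accuracy` value of 0.7, the column names, and the toy data are all assumptions.

```python
import pandas as pd


def weight_cutoff_is_reasonable(df, weight, min_accuracy=0.7):
    """Return True if the cutoff predicts MALE well enough (hypothetical criterion)."""
    is_male = df["gender"] == "MALE"
    predicted_male = df["weight"] >= weight
    # Fraction of rows where the prediction agrees with the actual label.
    accuracy = (predicted_male == is_male).mean()
    # Accuracy obtained by always guessing the more common gender.
    baseline = max(is_male.mean(), 1 - is_male.mean())
    return bool(accuracy > baseline and accuracy >= min_accuracy)


# Toy data (assumed column names and values).
trolls = pd.DataFrame({
    "gender": ["MALE", "MALE", "MALE", "FEMALE", "FEMALE", "FEMALE"],
    "weight": [200, 190, 185, 150, 160, 170],
})
good_cutoff = weight_cutoff_is_reasonable(trolls, weight=180)
poor_cutoff = weight_cutoff_is_reasonable(trolls, weight=100)
print(good_cutoff, poor_cutoff)
```

In the toy data a cutoff of 180 separates the two groups exactly, while a cutoff of 100 labels every row MALE and does no better than the baseline.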

Please explain each line of code.


import pandas as pd
from assignment_1_grader import get_file, get_weight, h_assignment_1_grader, r


class Assignment_1:
    def __init__(self):
        self.name = "INPUT YOUR NAME"
        self.file = get_file()
        self.weight = get_weight()
        # answer 1 (sample - 75%, validation is 25%)
        self.sample_df, self.validation_df = self.get_validation_and_sample()
        # answer 2 (For this I will feed it your sample df)
        a_df = pd.DataFrame([])
        self.get_probability_male_given_weight_greater_than_specified_weight(a_df)
        # answer 3
        self.get_percent_error_of_sample_predicting_validation()
        # answer 4
        self.evaluate_reasonableness_of_weight_as_a_predictor_of_gender_for_given_population_and_weight()
        h_assignment_1_grader(self)

    def get_validation_and_sample(self):
        file = self.file
        validation_df = pd.DataFrame([])
        sample_df = pd.DataFrame([])
        # load a big df
        # split data frame
        # head(int(df.shape[0] * .75))
        # code that divides the file randomly into a sample (75%) and validation (25%)
        # you will be penalized if it is not random
        return sample_df, validation_df

    def get_probability_male_given_weight_greater_than_specified_weight(self, df):
        probability = r.random()
        weight = self.weight
        # code that assigns a value to probability
        # what equation? is this just averages...
        # get only those who are heavier than weight
        return probability

    def get_percent_error_of_sample_predicting_validation(self):
        percent_error = r.random()
        # code that calculates percent error
        # treat validation set as true value
        # percent_error = (tested - true)/true
        return percent_error

    def evaluate_reasonableness_of_weight_as_a_predictor_of_gender_for_given_population_and_weight(self):
        is_reasonable = r.choice([True, False])
        weight = self.weight
        # code that assigns is_reasonable True or False
        return is_reasonable

