Question: Analyzing data by group

Analyzing data by group
When you're analyzing a data set, you often want to
calculate metrics broken down by each category of
another variable.
In the previous homework, you wrote a function
(generateNumericSummary) to summarize a numeric
variable by another categorical variable.
By generalizing and extending the previous code, we will
design a data object to analyze a variable by another
categorical variable.
Class dataByGroup
dataByGroup (abstract class)
    Contains two pandas Series: dat and group
numericDataByGroup
    Sub-class of dataByGroup
    dat should be numeric.
categoricalDataByGroup
    Sub-class of dataByGroup
    dat should be categorical.
[Class diagram: dataByGroup at the top, with sub-classes numericDataByGroup and categoricalDataByGroup below it]
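The outline says dat "should be numeric" for numericDataByGroup but does not say how, or whether, this must be enforced. One possible way to check it is sketched below. The helper name check_numeric is hypothetical and not part of the assignment; is_numeric_dtype comes from pandas.api.types.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

def check_numeric(dat):
    """Hypothetical helper: return dat unchanged, or raise if it is not numeric."""
    if not is_numeric_dtype(dat):
        raise TypeError("dat should be numeric, got dtype %s" % dat.dtype)
    return dat

# toy data, made up for illustration
ages = check_numeric(pd.Series([22.0, 35.5, 41.0], name="age"))
print(ages.dtype)

try:
    check_numeric(pd.Series(["a", "b"], name="cabin"))
except TypeError as err:
    print("rejected:", err)
```

A matching dtype check for categoricalDataByGroup is probably not wanted: the test case passes the 0/1 integer column survived as categorical data, and a strict check would reject it.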
Class dataByGroup
Write a class definition for object dataByGroup with the
following specification:
Object attributes
    dat: (pandas) Series
    group: categorical (pandas) Series of the same length as dat
    _isBinary: Boolean (True/False), indicating whether dat is binary.
Methods
    __init__(self, dat, group): takes dat and group as input data and initializes dat and group of object dataByGroup.
    __str__(self): prints dat and group (combined together) as a (pandas) DataFrame
    isBinary(self): returns the value of _isBinary. (Accessor method)
    getNumMissings(self): returns the number of missing values in dat. (Accessor method)
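The two checks behind _isBinary and getNumMissings can be tried on small Series before wiring them into the class. This is a minimal sketch: the function names is_binary and num_missings are hypothetical, the data is made up, and "binary" is assumed to mean that every non-missing value is 0 or 1.

```python
import numpy as np
import pandas as pd

def is_binary(dat):
    """True if every non-missing value of dat is 0 or 1 (assumed definition)."""
    return bool(dat.dropna().isin([0, 1]).all())

def num_missings(dat):
    """Number of missing (NaN/None) entries in dat."""
    return int(dat.isnull().sum())

# toy data, made up for illustration
survived = pd.Series([1, 0, np.nan, 1], name="survived")
age = pd.Series([22.0, np.nan, 35.0, np.nan], name="age")

print(is_binary(survived), num_missings(survived))
print(is_binary(age), num_missings(age))
```

Dropping missing values before the 0/1 test means a column with NaN entries can still count as binary. Whether that matches the intended definition is an assumption.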
Class dataByGroup
Class header
import pandas as pd
import numpy as np

class dataByGroup(object):
    def __init__(self, dat, group):
        #write your code
        # __init__ should initialize three object attributes
        # dat, group, _isBinary

    def __str__(self):
        #write your code

    def isBinary(self):
        #write your code

    def getNumMissings(self):
        #write your code
Sub-Class numericDataByGroup
Write a class definition for object numericDataByGroup
with the following specification:
Object attributes
    Inherits all attributes of its super-class - dataByGroup
Methods
    Inherits all methods of its super-class - dataByGroup
    getMeans(self): returns means of dat across the different levels of group.
        Return data type: pandas Series
    getSTD(self): returns standard deviations of dat across the different levels of group.
        Return data type: pandas Series
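Grouping one Series by another is the core of getMeans and getSTD. A small sketch on made-up data shows what the two results look like. The values below are illustrative only and do not come from titanic3.csv.

```python
import numpy as np
import pandas as pd

# toy data, made up for illustration
dat = pd.Series([10.0, 20.0, 30.0, np.nan, 50.0], name="age")
group = pd.Series([0, 0, 1, 1, 1], name="survived")

# one value per level of group; missing values in dat are skipped
means = dat.groupby(group).mean()
stds = dat.groupby(group).std()   # sample standard deviation (ddof=1)

print(means)
print(stds)
```

Both results are pandas Series indexed by the levels of group, which matches the return type in the specification. Note that std() divides by n - 1 by default.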
Sub-Class numericDataByGroup
Class header
class numericDataByGroup(dataByGroup):
    def __init__(self, dat, group):
        #write your code

    def getMeans(self):
        #write your code

    def getSTD(self):
        #write your code
Sub-Class categoricalDataByGroup
Write a class definition for object
categoricalDataByGroup with the following
specification:
Object attributes
    Inherits all attributes of its super-class - dataByGroup
Methods
    Inherits all methods of its super-class - dataByGroup
    getTallies(self): returns tabulated counts (tallies) by dat and group
        Return data type: pandas Series
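The specification does not name the pandas call to use for the tallies. One option that returns a Series is value_counts on the grouped data, sketched here on made-up values.

```python
import pandas as pd

# toy data, made up for illustration
dat = pd.Series([1, 0, 1, 1, 0, 0], name="survived")
group = pd.Series([1, 1, 2, 2, 3, 3], name="pclass")

# count each value of dat within each level of group
tallies = dat.groupby(group).value_counts()

print(tallies)
print(tallies.loc[(2, 1)])   # count of survived == 1 in pclass == 2
```

The result is a Series with a two-level index (group level, then dat value). Combinations that never occur are simply absent, and missing values in dat are not counted by default. pd.crosstab(group, dat) gives the same counts as a table, but it returns a DataFrame rather than the Series the specification asks for.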
Sub-Class categoricalDataByGroup
Class header
class categoricalDataByGroup(dataByGroup):
    def __init__(self, dat, group):
        #write your code

    def getTallies(self):
        #write your code
Output
Test case 1: categorical data
def main():
    titanic = pd.read_csv("titanic3.csv", header=0)
    survivedByPclass = categoricalDataByGroup(titanic['survived'], titanic['pclass'])
    print("Data and Group: ")
    print(survivedByPclass)  ## __str__ is invoked
    print("Is the data binary? : " + str(survivedByPclass.isBinary()))
    print("The number of missing values : " + str(survivedByPclass.getNumMissings()))
    print("Tallies: ")
    print(survivedByPclass.getTallies())
Test case 2: numerical data
def main():
    # titanic must be loaded here as well, since it is local to each main()
    titanic = pd.read_csv("titanic3.csv", header=0)
    ageBySurvived = numericDataByGroup(titanic['age'], titanic['survived'])
    print("Data and Group: ")
    print(ageBySurvived)  ## __str__ is invoked
    print("Is the data binary? : " + str(ageBySurvived.isBinary()))
    print("The number of missing values : " + str(ageBySurvived.getNumMissings()))
    print("Means: ")
    print(ageBySurvived.getMeans())
    print("Standard Deviations: ")
    print(ageBySurvived.getSTD())
Here is my code:
import numpy as np
import pandas as pd

class dataByGroup(object):
    def __init__(self, dat, group):
        self.dat = dat
        self.group = group
        # dat counts as binary when every non-missing value is 0 or 1
        self._isBinary = bool(dat.dropna().isin([0, 1]).all())

    def __str__(self):
        # place dat and group side by side in one DataFrame
        df_merged = pd.concat([self.dat, self.group], axis=1)
        return df_merged.to_string()

    def isBinary(self):
        # accessor for _isBinary
        return self._isBinary

    def getNumMissings(self):
        # isnull() marks missing entries; summing the booleans counts them
        return int(self.dat.isnull().sum())

class numericDataByGroup(dataByGroup):
    def __init__(self, dat, group):
        dataByGroup.__init__(self, dat, group)

    def getMeans(self):
        # mean of dat within each level of group
        return self.dat.groupby(self.group).mean()

    def getSTD(self):
        # sample standard deviation (ddof=1) of dat within each level of group
        return self.dat.groupby(self.group).std()
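The code above stops before categoricalDataByGroup, which the first test case needs. A possible shape for that sub-class is sketched below. To keep the block runnable on its own it restates a pared-down dataByGroup (attributes only), and it uses a small made-up DataFrame instead of titanic3.csv. Both are stand-ins, not part of the assignment.

```python
import pandas as pd

class dataByGroup(object):
    # pared-down stand-in for the base class, only so this block runs alone
    def __init__(self, dat, group):
        self.dat = dat
        self.group = group

class categoricalDataByGroup(dataByGroup):
    def __init__(self, dat, group):
        dataByGroup.__init__(self, dat, group)

    def getTallies(self):
        # counts of each dat value within each level of group, as a Series
        return self.dat.groupby(self.group).value_counts()

# toy stand-in for titanic3.csv, made up for illustration
toy = pd.DataFrame({"survived": [1, 0, 1, 1, 0, 0],
                    "pclass":   [1, 1, 2, 2, 3, 3]})

survivedByPclass = categoricalDataByGroup(toy["survived"], toy["pclass"])
print("Tallies: ")
print(survivedByPclass.getTallies())
```

With the full base class in place, the same sub-class also inherits __str__, isBinary and getNumMissings, so the calls in test case 1 should work without further changes.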
