Question: The goal of this problem is to do basic data analysis on a simple data set using the pandas package in Python (no machine learning for now). As has been emphasized in the lectures, we need to have a good understanding of the data before training a machine learning model. In this assignment, you are asked to analyze the UCI Adult data set. The Adult data set is a standard machine learning data set that contains demographic information about US residents.
This data was extracted from the census bureau database found at: http://www.census.gov/ftp/pub/DES/www/welcome.html. The data set contains 32561 instances and 15 features (please check the notebook for possible values of each feature) of different types (categorical and continuous).
The data is provided as a csv file and can be loaded into a pandas DataFrame object as shown:
data = pd.read_csv('adult.data.csv')
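Before answering the questions, it can help to confirm that the file loaded as expected. The following is a minimal sketch, assuming adult.data.csv sits in the working directory and has a header row; the helper name load_and_inspect is hypothetical and not part of the assignment notebook.

```python
from pathlib import Path

import pandas as pd


def load_and_inspect(path="adult.data.csv"):
    """Read the CSV and print a quick overview of its structure.

    `load_and_inspect` is a hypothetical helper used only for illustration.
    """
    data = pd.read_csv(path)
    print(data.shape)   # (rows, columns); the question states 32561 rows and 15 columns
    print(data.dtypes)  # object columns are typically categorical, numeric ones continuous
    print(data.head())  # first five rows
    return data


# Only runs when the file is actually present in the working directory.
if Path("adult.data.csv").exists():
    data = load_and_inspect()
```

If the reported shape differs from 32561 rows and 15 columns, the file may lack a header row or use a different separator, which is worth checking before going further.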
You are asked to answer the following questions about this data set. Please note that you need to use pandas functionality to answer these questions, rather than implementing them in pure Python code.
1. How many men and women (sex feature) are represented in this data set?
2. What is the average age (age feature) of women?
3. What is the percentage of German citizens (native-country feature)?
4. What are the mean and standard deviation of age for those who earn more than 50K per year (salary feature) and those who earn less than 50K per year?
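The four questions above map onto a handful of standard pandas operations. The sketch below runs on a small invented DataFrame so that it is self-contained. The column names come from the question, but the value labels ('Male', 'Female', 'Germany', '>50K', '<=50K') are assumptions about how the real file encodes them and should be checked against the notebook.

```python
import pandas as pd

# Toy stand-in for the real file; all values are invented for illustration only.
data = pd.DataFrame({
    "age": [25, 40, 35, 50, 30, 60],
    "sex": ["Male", "Female", "Female", "Male", "Male", "Female"],
    "native-country": ["United-States", "Germany", "United-States",
                       "United-States", "Germany", "United-States"],
    "salary": ["<=50K", ">50K", "<=50K", ">50K", "<=50K", ">50K"],
})

# 1. Number of men and women: count the occurrences of each value in `sex`.
sex_counts = data["sex"].value_counts()

# 2. Average age of women: filter rows with a boolean mask, then average `age`.
mean_age_women = data.loc[data["sex"] == "Female", "age"].mean()

# 3. Percentage of German citizens: the mean of a boolean Series is the
#    fraction of True values, so multiply by 100 for a percentage.
german_pct = (data["native-country"] == "Germany").mean() * 100

# 4. Mean and standard deviation of age for each salary group.
#    Note: pandas computes the sample standard deviation (ddof=1) by default.
age_by_salary = data.groupby("salary")["age"].agg(["mean", "std"])

print(sex_counts)
print(mean_age_women)
print(round(german_pct, 2))
print(age_by_salary)
```

To apply this to the assignment, replace the toy DataFrame with the one returned by pd.read_csv('adult.data.csv'). If string values in the real file carry leading spaces, as they do in some distributions of this data set, the equality comparisons would need the values stripped first.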
To answer these questions, you are provided with a Jupyter notebook containing the questions. Please complete the notebook with your code to answer them. You are encouraged to install the Anaconda distribution of Python to run the Jupyter notebook, or to use JupyterLab directly to complete this problem.
