Question: PYTHON & PANDAS.
Please write your code with the assumption that input files will be in a folder called "names", which will be in the same directory as your file.
Input data: We will use the Name Popularity data as our input data set. The data set contains name popularity (frequency) data for years 1880-2010. Please study the data set carefully prior to beginning to write code. This is an important part of the planning process that can save you a lot of coding time.
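As a starting point, the yearly files could be combined into a single table. The sketch below is one possible approach, not a required one: the function name `load_names` and the column names `name`, `sex`, `count`, and `year` are assumptions made for illustration (the files themselves have no header row), and the year is taken from the file name.

```python
from pathlib import Path

import pandas as pd


def load_names(folder="names"):
    """Read every yobYYYY.txt file in `folder` into one long DataFrame."""
    frames = []
    for path in sorted(Path(folder).glob("yob*.txt")):
        # File names look like "yob1880.txt"; the digits after "yob" are the year.
        year = int(path.stem[3:])
        # The files have no header, so column names are supplied here (assumed names).
        frame = pd.read_csv(path, names=["name", "sex", "count"])
        frame["year"] = year
        frames.append(frame)
    # One row per (name, sex, year) record across all files.
    return pd.concat(frames, ignore_index=True)
```

Calling `load_names()` from a notebook that sits next to the "names" folder should then give one row per name, sex, and year, which makes it easy to inspect the data before writing any analysis code.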
Background/Motivation: Skewness and Kurtosis are two interesting and important concepts in Statistics. Note: I don't recommend skipping this -- you will not be able to answer many of the questions in the assignment if you do not understand what skewness and kurtosis tell you about data. In short, both describe the shape of the data distribution. Understanding distribution shape is often an important part of the analysis process. We can extract much more useful information from data if we augment common statistical measures (like mean, median, variance/standard deviation) with information about the distribution's shape. In addition to coding practice, the goal of this assignment is to enable you to integrate these measures into the analysis process.
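To make the idea of distribution shape concrete, here is a small illustration on made-up numbers (not taken from the data set). It uses the pandas methods `Series.skew()` and `Series.kurt()`; note that `kurt()` reports excess kurtosis, so a normal distribution scores about 0 rather than 3.

```python
import pandas as pd

# Made-up values, for illustration only.
symmetric = pd.Series([1, 2, 3, 4, 5])        # evenly spread around its mean
right_tailed = pd.Series([1, 1, 1, 2, 50])    # one very large value on the right

# Skewness: about zero for a symmetric shape, positive when the long tail is on the right.
print("skew:", symmetric.skew(), right_tailed.skew())

# Excess kurtosis: negative for flat shapes, positive when a few extreme values dominate.
print("kurt:", symmetric.kurt(), right_tailed.kurt())
```

Both series could be summarized by a mean and a standard deviation, but only the shape measures reveal that the second one is dominated by a single extreme value.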
Assignment Part I: Choose any three male and three female names that are listed in the input data. Some of your choices may be very uncommon, but be careful in picking such names -- make sure there is actually data available for them in every year in the data set (or at least in most years). Using your code (not analysis "by eye" or by any other software), answer the following questions in your Jupyter notebook. Answering via print statements or comments would be fine, whichever you prefer. Please include all code in the same notebook, organized neatly.
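Before committing to a name, it may help to check which years have no record for it. The helper below is a hypothetical sketch: `missing_years` is not part of pandas, and it assumes a long table with the columns `name`, `sex`, and `year` (names chosen for illustration).

```python
import pandas as pd


def missing_years(df, name, sex, years=range(1880, 2011)):
    """Return the years in `years` that have no record for (name, sex)."""
    mask = (df["name"] == name) & (df["sex"] == sex)
    seen = set(df.loc[mask, "year"].tolist())
    return [year for year in years if year not in seen]
```

An empty list means the name appears in every year of the range; a long list suggests picking a different name.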
1. What are the mean and median popularity of each name you chose (for the time period 1880-2010)?
2. What is the standard deviation for each name during this time period?
3. What is the Skewness for each name? What does it tell you (it should tell you something, "nothing" is an incorrect answer)?
4. What is the Kurtosis for each name? What does it tell you (again, "nothing" == incorrect)?
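The Part I questions can be answered with a handful of pandas aggregations. The following is a minimal sketch under stated assumptions: popularity is taken to be the raw yearly count, the input is a long table with columns `name`, `sex`, `count`, and `year` (assumed names), years with no record are treated as a count of 0, and `name_stats` is a hypothetical helper rather than a library function.

```python
import pandas as pd


def name_stats(df, picks, years=range(1880, 2011)):
    """Summarize yearly counts for each (name, sex) pair in `picks`."""
    rows = {}
    for name, sex in picks:
        mask = (df["name"] == name) & (df["sex"] == sex)
        counts = (
            df.loc[mask]
            .set_index("year")["count"]
            # Assumption: a year with no record counts as zero births.
            .reindex(years, fill_value=0)
        )
        # "kurt" is excess kurtosis (0 for a normal distribution).
        rows[f"{name} ({sex})"] = counts.agg(["mean", "median", "std", "skew", "kurt"])
    return pd.DataFrame(rows).T
```

Printing the result of `name_stats(df, [("Mary", "F"), ("Emma", "F")])`, for example, would show one row per chosen name with the five measures as columns. Interpreting the skewness and kurtosis values is still up to you.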
Assignment Part II: In the same notebook, via your code (not "by eye" or using software), answer the following questions:
5. What two names (one male, one female) have been the most consistently popular between 1880 and 2010? Which statistical measure supports your answer quantitatively? Make your code print the appropriate numbers as part of your answer.
6. What names (one male, one female) could be described using the phrase "brief time of extreme fame, followed and preceded by near-total obscurity"? What statistical measure supports your answer? Make your code print the appropriate numbers as part of your answer.
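One way to approach Part II is to compute the same measures for every name at once and then rank them. The sketch below is only one possible reading of the questions: it uses the coefficient of variation (standard deviation divided by mean) as the consistency measure and excess kurtosis as the "brief fame" measure. The helper names, the assumed columns `name`, `sex`, `count`, and `year`, the zero-fill for missing years, and the `min_peak` threshold are all assumptions made for illustration.

```python
import pandas as pd


def shape_table(df, years=range(1880, 2011)):
    """One row per (sex, name) with spread and shape measures over `years`."""
    wide = df.pivot_table(
        index="year", columns=["sex", "name"], values="count", aggfunc="sum"
    ).reindex(years)
    filled = wide.fillna(0)  # assumption: no record means zero births that year
    table = pd.DataFrame(
        {
            "mean": filled.mean(),
            "std": filled.std(),
            "peak": filled.max(),
            "kurt": filled.kurt(),  # excess kurtosis
            "every_year": wide.notna().all(),
        }
    )
    # Coefficient of variation: spread relative to the name's own average level.
    table["cv"] = table["std"] / table["mean"]
    return table


def most_consistent(table, sex):
    """Name with the smallest coefficient of variation among names seen every year."""
    rows = table.loc[sex]
    rows = rows[rows["every_year"]]
    return rows["cv"].idxmin()


def brief_fame(table, sex, min_peak=1000):
    """Name with the largest kurtosis among names whose best year reached `min_peak`."""
    rows = table.loc[sex]
    rows = rows[rows["peak"] >= min_peak]
    return rows["kurt"].idxmax()
```

The `min_peak` filter is there because a name recorded in only one year with a handful of births also has very high kurtosis, yet it was never famous. Whether the coefficient of variation is the right notion of "consistently popular" (it ignores how large the counts are) is a judgment call to justify in your answer.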
Here are partial excerpts from two of the text files:
First file name and extension, for year 1880:
yob1880.txt
Mary,F,7065
Anna,F,2604
Emma,F,2003
Elizabeth,F,1939
Minnie,F,1746
Margaret,F,1578
Ida,F,1472
Alice,F,1414
Bertha,F,1320
Sarah,F,1288
Annie,F,1258
Clara,F,1226
Ella,F,1156
Florence,F,1063
Cora,F,1045
Martha,F,1040
Laura,F,1012
Nellie,F,995
Grace,F,982
Carrie,F,949
Last file, for year 2010:
yob2010.txt
Isabella,F,22731
Sophia,F,20477
Emma,F,17179
Olivia,F,16860
Ava,F,15300
Emily,F,14172
Abigail,F,14124
Madison,F,13070
Chloe,F,11656
Mia,F,10541
Addison,F,10253
Elizabeth,F,10135
Ella,F,9796
Natalie,F,8715
Samantha,F,8334
Alexis,F,8181
Lily,F,7900
Grace,F,7598
Hailey,F,6969
Alyssa,F,6934
Lillian,F,6898
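Each record in these files is a comma-separated triple of name, sex, and count, with no header row. As a quick check of that layout, the first three 1880 records can be parsed from an in-memory string; the column names here are assumptions chosen for illustration.

```python
import io

import pandas as pd

# First three records of yob1880.txt, one per line.
sample = "Mary,F,7065\nAnna,F,2604\nEmma,F,2003\n"

# No header row in the file, so column names are supplied explicitly (assumed names).
frame = pd.read_csv(io.StringIO(sample), names=["name", "sex", "count"])

print(frame.dtypes)  # the count column should come out as an integer type
print(frame)
```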
