Question:
In this project, you will work with the data files provided in the "Names" folder in the files section of this class. These files contain national data on the relative frequency of given names in the population of U.S. births where the individual has a Social Security Number. For each year of birth YYYY after 1879, there is a comma-delimited file called yobYYYY.txt. Each record in the individual annual files has the format "name,sex,number," where "name" is 2 to 15 characters, "sex" is M (male) or F (female) and "number" is the number of occurrences of the name. Each file is sorted first on sex and then on number of occurrences in descending order.
You are tasked with creating an application that will show how often a certain name has been used throughout history in the United States. Your specific objectives are explained below.
Part 1: Python
In this part, you will analyze the given data to feed a graphical user interface created in MATLAB. Write a program that does the following:
Ask the user for a desired name and a sex
Go through all the data files and find how many times the user-given name has been used for the user-given sex each year
Calculate the popularity of the user-given name per year as a percentage of the total number of name occurrences used per year
Find the best linear fit to the name occurrence data using the least squares method. This should be done for both the absolute number of occurrences and the popularity percentages.
Your results should be saved in txt format and will be used for part 2.
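The counting and popularity steps above could be sketched in Python roughly as follows. This is only one possible approach, not the assignment's reference solution. The function names `yearly_stats` and `save_stats`, the output layout `year,count,percent`, and the case-insensitive name match are all assumptions made for illustration. The popularity figure here divides by the total of all occurrences in the year's file (both sexes); the assignment text could also be read as dividing by the total for the chosen sex only.

```python
import csv
import glob
import os
import re


def yearly_stats(folder, name, sex):
    """Return a list of (year, count, percent) tuples, one per yobYYYY.txt file."""
    results = []
    for path in sorted(glob.glob(os.path.join(folder, "yob*.txt"))):
        match = re.fullmatch(r"yob(\d{4})\.txt", os.path.basename(path))
        if match is None:
            continue  # skip files that do not follow the yobYYYY.txt pattern
        year = int(match.group(1))
        count = 0  # occurrences of (name, sex) in this year
        total = 0  # all name occurrences in this year (assumption: both sexes)
        with open(path, newline="") as handle:
            for row in csv.reader(handle):
                if len(row) < 3:
                    continue  # ignore blank or malformed lines
                rec_name, rec_sex, rec_number = row[0], row[1], int(row[2])
                total += rec_number
                if rec_name.lower() == name.lower() and rec_sex == sex.upper():
                    count += rec_number
        percent = 100.0 * count / total if total else 0.0
        results.append((year, count, percent))
    return results


def save_stats(results, out_path):
    """Write one 'year,count,percent' line per year to a text file."""
    with open(out_path, "w") as handle:
        for year, count, percent in results:
            handle.write(f"{year},{count},{percent:.6f}\n")


# Possible use (the output file name is hypothetical):
#   stats = yearly_stats("Names", input("Name: "), input("Sex (M/F): "))
#   save_stats(stats, "name_stats.txt")
```

Years in which the name does not appear are recorded with a count of 0, so the saved series has one row per data file.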
The least squares linear fit approximates a set of (x, y) data as a straight line, y = mx + b, where m is the slope and b is the intercept. This method is based on minimizing the error between the actual data points and the line one gets using the calculated values of m and b.
To use the least squares method, assume that each of your x values is called x_i and each of your y values is called y_i, so that a single data point is (x_i, y_i). The counter i goes from 1 to n, which is the number of data points you have. You can then define average x and y values:
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$$
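And you can use these values to determine m and b:
$$m = \frac{\sum_{i=1}^{n} x_i y_i - n\,\bar{x}\,\bar{y}}{\sum_{i=1}^{n} x_i^2 - n\,\bar{x}^2}, \qquad b = \bar{y} - m\,\bar{x}$$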
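A minimal Python sketch of the least squares fit, assuming the x values are years and the y values are either counts or percentages. The function name `least_squares_fit` is made up for this example. It uses the mean-centered form of the slope, which is algebraically the same as the usual textbook expression but tends to lose less precision when the x values are large numbers such as years.

```python
def least_squares_fit(xs, ys):
    """Return (m, b) for the line y = m*x + b that best fits the points."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    x_bar = sum(xs) / n  # average x
    y_bar = sum(ys) / n  # average y
    # Sums of products of deviations from the averages
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    m = sxy / sxx
    b = y_bar - m * x_bar
    return m, b
```

The same function can be called twice, once with the yearly counts and once with the yearly percentages, to get the two fit lines the assignment asks for.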
Part 2: MATLAB
Now that you have processed the data (and saved it in text format), you will build a graphical user interface (GUI) for visualization. Write a program that does the following:
Import the text data into MATLAB
Create a GUI that will show the historic occurrences of the user-given name over time. Your GUI should include a plot of name occurrences (y-axis) vs year (x-axis) as well as buttons (or drop-down menus, or checkboxes, or any other control structure you prefer) to choose whether to plot the absolute number of occurrences or the popularity percentages
Plot the appropriate least squares fit line on top of the historic data
Your GUI should be able to refresh whenever the user selects to plot absolute number of occurrences or the popularity percentage. Your plots should contain all relevant information such as axis labels, title, legend, markers, etc.
