Question:

In this project, you will work with the data files provided in the "Names" folder in the files section of this class. These files contain national data on the relative frequency of given names in the population of U.S. births where the individual has a Social Security Number. For each year of birth YYYY after 1879, there is a comma-delimited file called yobYYYY.txt. Each record in the individual annual files has the format "name,sex,number," where "name" is 2 to 15 characters, "sex" is M (male) or F (female) and "number" is the number of occurrences of the name. Each file is sorted first on sex and then on number of occurrences in descending order.
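As a minimal sketch of the record layout described above, the following Python snippet parses "name,sex,number" lines into typed records. The names `NameRecord` and `parse_records` are hypothetical helpers introduced here for illustration, and the sample rows are made up rather than taken from the real files.

```python
import csv
import io
from typing import Iterator, NamedTuple


class NameRecord(NamedTuple):
    name: str   # given name, 2 to 15 characters per the format description
    sex: str    # "M" or "F"
    count: int  # number of occurrences of the name in that year


def parse_records(lines) -> Iterator[NameRecord]:
    """Yield one NameRecord per non-empty "name,sex,number" line."""
    for row in csv.reader(lines):
        if len(row) < 3:
            continue  # skip blank or incomplete lines
        yield NameRecord(row[0].strip(), row[1].strip().upper(), int(row[2]))


# Made-up sample rows in the documented layout (not real data).
sample = io.StringIO("Alice,F,120\nBeth,F,45\nAdam,M,200\n")
records = list(parse_records(sample))
print(records[0])
```

In practice the same function could be given an open file handle for one yobYYYY.txt file instead of the in-memory sample.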

You are tasked with creating an application that shows how often a given name has been used throughout history in the United States. Your specific objectives are explained below.

Part 1: Python

In this part, you will analyze the given data to feed a graphical user interface created in MATLAB. Write a program that does the following:

Ask the user for a desired name and a sex

Go through all the data files and find how many times the user-given name has been used for the user-given sex each year

Calculate the popularity of the user-given name per year as a percentage of the total number of name occurrences used per year

Find the best linear fit to the name occurrence data using the least squares method. This should be done for both the absolute number of occurrences and the popularity percentages.

Your results should be saved in txt format and will be used for part 2.
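The counting, popularity, and saving steps above could be sketched in Python roughly as follows. This is one possible reading, not a reference solution: it assumes "total number of name occurrences per year" means the sum of the number field over every record (both sexes) in that year's file, treats a name missing from a file as 0 occurrences, and writes a comma-separated text file with a header line. The function names `yearly_stats` and `save_stats`, the output layout, and the tiny data files are all hypothetical. The name and sex are hard-coded so the sketch runs unattended; `input()` could supply them instead.

```python
import csv
import os
import re
import tempfile


def yearly_stats(folder, name, sex):
    """Return (year, count, percent) tuples, one per yobYYYY.txt file in folder."""
    pattern = re.compile(r"^yob(\d{4})\.txt$")
    results = []
    for filename in sorted(os.listdir(folder)):
        match = pattern.match(filename)
        if not match:
            continue  # ignore anything that is not a yearly data file
        count = 0  # occurrences of the requested name and sex this year
        total = 0  # all occurrences this year (assumption: both sexes)
        with open(os.path.join(folder, filename), newline="") as handle:
            for row in csv.reader(handle):
                if len(row) < 3:
                    continue
                number = int(row[2])
                total += number
                if row[0].lower() == name.lower() and row[1].upper() == sex.upper():
                    count += number
        percent = 100.0 * count / total if total else 0.0
        results.append((int(match.group(1)), count, percent))
    return results


def save_stats(path, rows):
    """Write the per-year results as comma-separated text."""
    with open(path, "w") as out:
        out.write("year,count,percent\n")
        for year, count, percent in rows:
            out.write(f"{year},{count},{percent:.6f}\n")


# Demonstration on two made-up files written to a temporary folder.
with tempfile.TemporaryDirectory() as folder:
    with open(os.path.join(folder, "yob1900.txt"), "w") as f:
        f.write("Alice,F,30\nBeth,F,20\nAdam,M,50\n")
    with open(os.path.join(folder, "yob1901.txt"), "w") as f:
        f.write("Beth,F,60\nAdam,M,40\n")
    stats = yearly_stats(folder, "Alice", "F")
    out_path = os.path.join(folder, "alice_F.txt")
    save_stats(out_path, stats)
    with open(out_path) as f:
        saved_lines = f.read().splitlines()

print(stats)
```

Keeping one row per year, including years with zero occurrences, gives the fit and the plot a value for every year.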

The least squares linear fit approximates a set of (x, y) data as a straight line, y = mx + b, where m is the slope and b is the y-intercept. This method is based on minimizing the error between the actual data points and the line one gets using the calculated values of m and b.

To use the least squares method, assume that each of your x values is called x_i and each of your y values is called y_i, so that a single data point is (x_i, y_i). The counter i goes from 1 to n, which is the number of data points you have. You can then define average x and y values:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$$

You can then use these averages to determine m and b:

$$m = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad b = \bar{y} - m\,\bar{x}$$
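A short Python sketch of the least squares fit described above follows. It assumes the standard closed-form expressions built from the averages (slope equal to the sum of centered products divided by the sum of squared centered x values, and intercept equal to the average y minus the slope times the average x); the course material may present an algebraically equivalent form. The function name `least_squares_fit` and the sample data are hypothetical.

```python
def least_squares_fit(xs, ys):
    """Return (m, b) for the line y = m*x + b that best fits the points."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")

    # Average x and y values.
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n

    # Sums of centered squares and centered products.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")

    m = sxy / sxx          # slope
    b = y_bar - m * x_bar  # intercept
    return m, b


# Made-up data lying exactly on the line y = 2*x - 3790.
years = [1900, 1901, 1902, 1903]
counts = [10, 12, 14, 16]
m, b = least_squares_fit(years, counts)
print(m, b)
```

The same function can be called twice, once with the absolute counts and once with the popularity percentages, since the assignment asks for a fit to both.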

Part 2: MATLAB

Now that you have processed the data (and saved it in text format), you will build a graphical user interface (GUI) for visualization. Write a program that does the following:

Import the text data into MATLAB

Create a GUI that will show the historic occurrences of the user-given name over time. Your GUI should include a plot of name occurrences (y-axis) vs year (x-axis) as well as buttons (or drop-down menus, or checkboxes, or any other control structure you prefer) to choose whether to plot the absolute number of occurrences or the popularity percentages

Plot the appropriate least squares fit line on top of the historic data

Your GUI should refresh whenever the user switches between plotting the absolute number of occurrences and the popularity percentages. Your plots should contain all relevant information such as axis labels, title, legend, markers, etc.
