Writing Prompts
1. Constructing Charts from Categorical Data
Here is a spreadsheet showing data about our class that I collected from the Introduction survey:
::Class Data Spreadsheet::
Your job for this part of the assignment is to create displays that collect and organize this data, as described below.
You can make choices about how to group and organize the data into categories. This is fairly straightforward for some variables but more complicated for others. There is no "best" way to do this; it depends on what kind of story you're trying to tell with the data. As a rule of thumb, too many categories usually produces information overload, whereas too few categories don't tell an interesting enough story. Finding the middle ground depends on the tastes and preferences of the graph-maker (that's you!).
a. Simple Bar Graph
Choose one of the variables "Where are you from?", "What is your favorite subject in school?", or "What is your favorite music?"
Create a bar chart to summarize this class data for the variable you chose. Along the way, you will need to organize the responses into your own categories. When you do this, explain to your reader your organizational choices and why you made them.
(Note: Google Sheets and Excel can make bar charts automatically, but that is not my intention for this assignment. Rather than just pressing a button to create the chart, I want you to construct it yourself by hand.)
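If you want to double-check your hand tallies before drawing the bars, here is a minimal Python sketch that counts raw survey answers after mapping them to categories of your choosing. The `CATEGORY_MAP` grouping and the sample responses are hypothetical placeholders, not the actual class data; swap in your own. It only does the counting; the chart itself should still be drawn by hand.

```python
from collections import Counter

# Hypothetical grouping: maps a raw answer (lowercased) to a category you choose.
CATEGORY_MAP = {
    "math": "STEM",
    "biology": "STEM",
    "chemistry": "STEM",
    "english": "Humanities",
    "history": "Humanities",
    "art": "Arts",
    "music": "Arts",
}


def tally(responses, category_map, other="Other"):
    """Count responses per category; answers missing from the map go to `other`."""
    counts = Counter()
    for answer in responses:
        key = answer.strip().lower()  # treat "Math" and "math " as the same answer
        counts[category_map.get(key, other)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical responses, not the real survey data.
    responses = ["Math", "history", "Biology", "Art", "math ", "Gym"]
    for category, n in tally(responses, CATEGORY_MAP).most_common():
        # A row of '#' characters as a quick check on relative bar heights.
        print(f"{category:<12}{'#' * n} {n}")
```

The catch-all `other` bucket is one design choice among several: a long tail of one-off answers can be lumped together, which is exactly the "too many categories" trade-off described above.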
Once you've created your chart, write a description of any interesting patterns or trends you notice.
b. Contingency Table and Compound Bar Graph
Now choose two of the categorical variables you didn't use above that you think are the most interesting to compare. Construct a contingency table showing the relationship between those variables.
Again, if you need to make organizational choices, explain them to your reader, along with your motivation for making them.
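A contingency table is just a count of how often each pair of categories occurs together. The Python sketch below builds one from (row category, column category) pairs and prints row and column totals. The pairs are hypothetical and assume you have already grouped each answer into your own categories.

```python
from collections import Counter


def contingency_table(pairs):
    """Build a two-way table of counts from (row_category, column_category) pairs.

    Returns (row_labels, col_labels, table) where table[r][c] is the number of
    respondents in row category r and column category c.
    """
    pairs = list(pairs)
    counts = Counter(pairs)
    row_labels = sorted({r for r, _ in pairs})
    col_labels = sorted({c for _, c in pairs})
    table = {r: {c: counts[(r, c)] for c in col_labels} for r in row_labels}
    return row_labels, col_labels, table


def print_table(row_labels, col_labels, table):
    """Print the table with row totals, column totals, and the grand total."""
    print("".ljust(12) + "".join(c.ljust(12) for c in col_labels) + "Total")
    for r in row_labels:
        cells = [table[r][c] for c in col_labels]
        print(r.ljust(12) + "".join(str(n).ljust(12) for n in cells) + str(sum(cells)))
    col_totals = [sum(table[r][c] for r in row_labels) for c in col_labels]
    print("Total".ljust(12) + "".join(str(n).ljust(12) for n in col_totals) + str(sum(col_totals)))


if __name__ == "__main__":
    # Hypothetical (region, favorite music) pairs after grouping.
    pairs = [
        ("West", "Pop"), ("West", "Rock"), ("South", "Pop"),
        ("South", "Country"), ("West", "Pop"), ("Abroad", "Rock"),
    ]
    print_table(*contingency_table(pairs))
```

Reading the output: in a stacked bar chart, each row total is the height of one bar and each cell is one segment; in a multi-column chart, each cell becomes its own column within its row's group.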
After you've made a contingency table, use it to create either a stacked bar chart or a multi-column bar chart to show this information. Explain to your reader why you chose the format that you did.
Once you've created your charts, write a description of any interesting patterns or trends you notice.
2. Numerical Data
::Here is a data set:: downloaded from our textbook's website showing information about all of the counties in each US state. The data was collected between 2010 and 2017 and shows how these quantities changed in each county between those two years.
Your job is to choose three states and compare one of the variables between those states using the numerical analysis techniques we've been working on this week.
Getting Set Up
First, choose the variable you're interested in studying. Each column on the spreadsheet is a different variable and represents a change (delta) in that measurement between 2010 and 2017:
- change in population
- change in median household income
- change in median age
- change in percent of the population that is white
- change in percent of the population with a bachelor's degree
- change in percent of the population living in poverty
(Keep in mind that since the data shows the "change in" these quantities, these data sets likely include negative numbers. A negative value means that the county decreased in that variable from 2010 to 2017.)
Then, choose your three states. This works best when the states you choose have more than a few counties.
If you do want to analyze states with only a few counties, you could group some similar states together for one of your choices, for instance "New England" or "The Dakotas and Montana".
Be sure your states or groups of states have at least 20 counties.
The "State" column is sorted in alphabetical order. Highlight the entries for the counties in the state you wish to study, and paste them into your own spreadsheet. See ::this video:: for a demonstration.
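As an alternative to copying and pasting, here is a minimal Python sketch that pulls one variable's values for your chosen states out of the data after it has been downloaded as a CSV file (Google Sheets offers File > Download > Comma-separated values). Only the "State" column name comes from the assignment; the variable column name `pop_change` and the sample rows are hypothetical, so check the real header row before using it.

```python
import csv
import io


def values_by_state(csv_file, states, value_column, state_column="State"):
    """Collect numeric values of `value_column` for each state in `states`.

    `csv_file` is any open text file (or file-like object) with a header row.
    Rows with a blank value are skipped.
    """
    result = {state: [] for state in states}
    for row in csv.DictReader(csv_file):
        raw = row[value_column].strip()
        if row[state_column] in result and raw:
            result[row[state_column]].append(float(raw))
    return result


if __name__ == "__main__":
    # Hypothetical sample rows standing in for the downloaded file.
    sample = io.StringIO(
        "State,County,pop_change\n"
        "Iowa,Adair,-3.1\n"
        "Iowa,Adams,-1.2\n"
        "Ohio,Adams,0.5\n"
    )
    data = values_by_state(sample, ["Iowa", "Ohio"], "pop_change")
    for state, values in data.items():
        # The assignment asks for at least 20 counties per state or group.
        flag = "" if len(values) >= 20 else "  (fewer than 20 counties)"
        print(state, len(values), flag)
```

With a real file you would pass `open("your_file.csv", newline="")` (a placeholder name) instead of the `io.StringIO` sample.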
::This worksheet:: provides a guide to help you use Google Sheets to do this analysis. The videos posted this week also show examples of doing this analysis with Google Sheets.
(You're welcome to use Excel instead of Google Sheets, but some of the details may vary.)
1. Box Plots, medians, and quartiles
For each state, use your spreadsheet to compute the min, Q1, median, Q3, and max of your data.
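To cross-check the spreadsheet, here is a minimal Python sketch of the five-number summary. It assumes Python 3.8+ and uses `statistics.quantiles(..., method="inclusive")`, which uses the same interpolation as Excel's QUARTILE.INC; we believe Google Sheets' QUARTILE follows that convention too, but treat that as an assumption. Some textbooks instead take medians of the lower and upper halves, which can give slightly different Q1 and Q3.

```python
import statistics


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using inclusive quartile interpolation."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, median, q3, data[-1]


if __name__ == "__main__":
    # Hypothetical county-level changes for one state.
    changes = [-3.1, -1.2, 0.0, 0.4, 1.5, 2.2, 2.9, 4.0, 7.5]
    print(five_number_summary(changes))
```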
Describe what this information tells you about these states. Be specific.
For each state, use a spreadsheet to compute the interquartile range (IQR), the fence length, and the upper and lower fences.
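The Python sketch below computes the IQR, fence length, and fences, then lists the values outside the fences. It assumes the common convention that the fence length is 1.5 × IQR (so lower fence = Q1 - 1.5 × IQR and upper fence = Q3 + 1.5 × IQR); if our course uses a different multiplier, change `k`. Quartiles use the same inclusive method as the spreadsheet cross-check.

```python
import statistics


def fences(values, k=1.5):
    """Return (IQR, fence length, lower fence, upper fence).

    Assumes fence length = k * IQR, with the common default k = 1.5.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    fence_length = k * iqr
    return iqr, fence_length, q1 - fence_length, q3 + fence_length


def outliers(values, k=1.5):
    """Values strictly below the lower fence or strictly above the upper fence."""
    _, _, lower, upper = fences(values, k)
    return [v for v in values if v < lower or v > upper]


if __name__ == "__main__":
    # Hypothetical data with one unusually large value.
    data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 40]
    print(fences(data))    # IQR, fence length, lower fence, upper fence
    print(outliers(data))
```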
Use this information to draw adjacent boxplots of the variable you are studying for each state. Draw these boxplots along the same axis so that we can directly compare them. Use the computations from above, but draw your boxplots by hand instead of using the spreadsheet's graph features.
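When drawing by hand, two practical questions come up: where do the whiskers stop, and what axis range fits all three states? This minimal Python sketch answers both under an assumed Tukey-style convention (whiskers end at the most extreme data values still inside the 1.5 × IQR fences, and anything beyond is drawn as a separate point). Check this against the convention used in class.

```python
import statistics


def box_plot_points(values, k=1.5):
    """Points needed to draw one box plot by hand (assumed Tukey-style whiskers)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    fence_length = k * (q3 - q1)
    lower, upper = q1 - fence_length, q3 + fence_length
    inside = [v for v in values if lower <= v <= upper]
    return {
        "whisker_low": min(inside),   # smallest value inside the fences
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_high": max(inside),  # largest value inside the fences
        "outliers": sorted(v for v in values if v < lower or v > upper),
    }


def shared_axis(states):
    """One (min, max) axis range covering every state's data."""
    return (
        min(min(v) for v in states.values()),
        max(max(v) for v in states.values()),
    )


if __name__ == "__main__":
    # Hypothetical data for two placeholder states.
    states = {"State A": [1, 2, 3, 4, 5, 6, 7, 8, 9, 40], "State B": [-5, 0, 5]}
    print("Axis range:", shared_axis(states))
    for name, data in states.items():
        print(name, box_plot_points(data))
```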
Use your box plots and computations to write a comparison between the data sets.
- Rank the states in order of their "center" from lowest to highest. (Which measure of center is appropriate in this context?)
- Rank the states in order of their "spread" from lowest to highest. (Which measure of spread is appropriate in this context?)
- Which states have outliers? How can you tell from the data set or box plot?
- If a state has outliers, identify which counties give the outlying values and briefly research why we might observe an outlying value in that county.
- Briefly summarize and interpret your observations about the three states (or groups of states) using these measurements and pictures.
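Ranking is just sorting the states by a chosen statistic. In this Python sketch the measure is passed in as a function, so the same code ranks by whichever measure of center or spread you decide is appropriate. The state names and values are hypothetical placeholders.

```python
import statistics


def iqr(values):
    """Interquartile range using inclusive quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1


def rank_states(states, measure):
    """State names ordered from lowest to highest value of `measure`."""
    return sorted(states, key=lambda name: measure(states[name]))


if __name__ == "__main__":
    # Hypothetical data for three placeholder states.
    states = {
        "State A": [1, 2, 3],
        "State B": [0, 5, 10],
        "State C": [-4, -1, 2],
    }
    print("By median:", rank_states(states, statistics.median))
    print("By IQR:   ", rank_states(states, iqr))
```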
2. Histograms, means, standard deviations
For each state, use a spreadsheet to compute the mean and standard deviation.
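A minimal Python check for the spreadsheet's mean and standard deviation. It assumes you want the sample standard deviation (dividing by n - 1), which is what Google Sheets' STDEV computes; STDEVP and `statistics.pstdev` are the population versions.

```python
import statistics


def mean_and_sd(values):
    """Mean and sample standard deviation (n - 1 in the denominator)."""
    return statistics.mean(values), statistics.stdev(values)


if __name__ == "__main__":
    # Hypothetical county-level changes for one state.
    data = [2, 4, 4, 4, 5, 5, 7, 9]
    m, s = mean_and_sd(data)
    print(f"mean = {m}, sample SD = {s:.4f}")
```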
Describe what this information tells you about these states. Be specific.
For each state, use the spreadsheet to draw a histogram of the variable you are studying.
- Use the same lower and upper limits on the x-axis for all of your histograms.
- Use the same bin size for all of your graphs so that you can directly compare them.
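Using a shared range and bin width is what makes the histograms comparable. This Python sketch counts values per bin on one common scale. It assumes half-open bins [lower, lower + width), with a value exactly at `upper` placed in the last bin; spreadsheet histogram tools may handle bin edges differently, so counts for values sitting exactly on an edge could differ.

```python
import math


def bin_counts(values, lower, upper, width):
    """Count values in equal-width bins starting at `lower`.

    Bins are [lower, lower + width), [lower + width, lower + 2*width), ...;
    a value exactly equal to `upper` is put in the last bin.
    """
    n_bins = math.ceil((upper - lower) / width)
    counts = [0] * n_bins
    for v in values:
        if v < lower or v > upper:
            raise ValueError(f"{v} is outside the shared range [{lower}, {upper}]")
        k = int((v - lower) // width)
        counts[min(k, n_bins - 1)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical data for two placeholder states, binned on the SAME scale.
    lower, upper, width = -20, 20, 10
    for name, data in {"State A": [-12, -3, 0, 4, 9, 15, 20],
                       "State B": [-18, -5, -2, 1]}.items():
        print(name, bin_counts(data, lower, upper, width))
```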
Use your histograms and computations to write a comparison between the data sets.
- Rank the states in order of their "center." (Which measure of center is appropriate in this context? How can we see this visually on a histogram?)
- Rank the states in order of their "spread." (Which measure of spread is appropriate in this context? How can we see this visually on a histogram?)
- Which states have outliers? How can you tell from the histogram?
- Briefly summarize and interpret your observations about the three states (or groups of states) using these measurements and pictures.
3. Summary and Interpretation
Compare the rankings for your two measures of center. Does the ranking order change? Do the actual numerical measurements greatly vary?
Compare the rankings for your two measures of spread. Does the ranking order change? Why or why not?
If you identified any outliers above, use the raw data to find out which counties those are. Briefly research why these counties may be outliers with respect to your variable.
What information and conclusions do you think are easier to see using the boxplots?
What information and conclusions do you think are easier to see using the histograms?
Using your observations from your computations and graphs as evidence, write 6-8 sentences speculating about why your variable might be like this in the three states you've studied. What could cause the average to be high or low in each of your states? What could cause the spread of this variable to be high or low across the counties in your particular states? What could cause your data to have the shape that it does in your states? Try to be as specific and detailed as possible.
4. Other informative statistics
(This part is optional, if you're interested)
Beyond the basic measurements we've talked about above, the field of statistics is rich with fancier computations and more clever ways of describing and summarizing a data set. Baseball statistics come to mind: statisticians invent more complicated metrics to measure a player's performance in order to rate their value to the team more accurately.
Here are a few other things you can measure about your chosen states:
Shape
Compute "Mean / Median" for each of your states. What kinds of numbers do you get? What does the value of this number tell you about the skew of your data?
Compute "(Q3 - Median) / (Median - Q1)" for each of your states. What does this value tell you about the shape of the data?
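Here is a minimal Python sketch of both shape ratios. One caution specific to this data: because the values are changes, the median can be zero or negative. A zero median makes "Mean / Median" undefined (returned as `None` here), and a negative median flips how the ratio compares to 1, so interpret it with care. Likewise, the quartile ratio is undefined when the median equals Q1.

```python
import statistics


def shape_ratios(values):
    """Return (mean / median, (Q3 - median) / (median - Q1)).

    Either ratio is None when its denominator is zero.
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    mean = statistics.mean(values)
    mean_over_median = mean / median if median != 0 else None
    quartile_ratio = (q3 - median) / (median - q1) if median != q1 else None
    return mean_over_median, quartile_ratio


if __name__ == "__main__":
    # Hypothetical data: symmetric, then stretched out to the right.
    print(shape_ratios([1, 2, 3, 4, 5, 6, 7, 8, 9]))
    print(shape_ratios([1, 2, 3, 4, 5, 8, 12, 20, 30]))
```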
Can you come up with your own formula that would give you more info about the shape of your data?
Relative Spread
Both the IQR and the standard deviation give absolute measurements of the spread of the data. As a general example, suppose you measured the yearly salaries of college presidents and found that the salaries had a standard deviation of $10. You also measured all of their heights and found a standard deviation of 10 inches. Even though the standard deviation is the same number (10) for both data sets, it has very different meanings in context. (For which data set is this variation more extreme?)
We can compute the relative spread by dividing the absolute spread measurements by the centers. This "scales" the spread to relate it to the magnitude of the data points themselves.
Compute the Relative Standard Deviation for each state, "Standard Deviation / Mean".
Compute the Relative Interquartile Range for each state, "IQR / Median".
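A minimal Python sketch of both relative spread measures, assuming the sample standard deviation and inclusive quartiles used in the earlier checks. As with the shape ratios, a mean or median near zero (quite possible for change data) makes these ratios very large or undefined, and a negative center makes them negative, so look at the centers before comparing.

```python
import statistics


def relative_spread(values):
    """Return (SD / mean, IQR / median); None where the center is zero."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    rel_sd = sd / mean if mean != 0 else None
    rel_iqr = (q3 - q1) / median if median != 0 else None
    return rel_sd, rel_iqr


if __name__ == "__main__":
    # Hypothetical county-level changes for one state.
    print(relative_spread([2, 4, 4, 4, 5, 5, 7, 9]))
```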
How do the rankings for relative spread for your states compare to the above rankings for spread?
