Question: I mostly need help with steps 4 and 5, but it would be very helpful if you could show all the work for the other steps as well.
In this lab, you will read in a dataset and work with it in a dataframe. Then you will explore the distribution within the dataset.
Step 1: Create a function (named readStates) to read a CSV file into R
You need to read a URL, not a local file to your computer.
The file is a dataset on state populations (within the United States).
The URL is:
http://www2.census.gov/programs-surveys/popest/tables/2010-2011/state/totals/nst-est2011-01.csv
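A minimal sketch of Step 1, assuming base R's read.csv is acceptable for the lab. read.csv accepts a URL as well as a local path, so no download step is needed. The parameter name urlToRead is my own choice, and this version only reads the raw file; the cleaning from Step 2 would still need to be added inside the function.

```r
# Sketch for Step 1: read the raw CSV straight from the URL.
# "urlToRead" is an illustrative parameter name, not one the lab requires.
readStates <- function(urlToRead = "http://www2.census.gov/programs-surveys/popest/tables/2010-2011/state/totals/nst-est2011-01.csv") {
  # stringsAsFactors = FALSE keeps text columns as character strings,
  # which makes the later clean-up (removing commas, renaming) simpler
  raw <- read.csv(urlToRead, stringsAsFactors = FALSE)
  raw
}
```

Calling head() or str() on the result of readStates() is a quick way to spot the extra rows, extra columns, and unhelpful column names that Step 2 asks you to fix.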
Step 2: Clean the dataframe
Note the issues that need to be fixed (removing columns, removing rows, changing column names).
Within your function, make sure there are 51 rows (one per state plus the District of Columbia). Make sure there are only 5 columns, with the following names: stateName, Census, Estimates, Pop2010, Pop2011.
Make sure the last four columns are numbers (i.e. not strings).
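One possible way to do the cleaning in Step 2 is sketched below. It rests on assumptions about the raw file that you should verify with head() and tail(): that each state row has a first column starting with a period (for example ".Alabama"), that summary and footnote rows do not, and that the numbers are stored as text with comma separators. The helper name cleanStates, the expectedRows parameter, and the toy input are all made up for illustration; the toy values are not census data.

```r
# Sketch for Step 2: clean a raw data frame into the 5 required columns.
# ASSUMPTION: state rows are the ones whose first column starts with ".".
cleanStates <- function(raw, expectedRows = 51) {
  isState <- grepl("^\\.", raw[[1]])
  df <- raw[isState, 1:5]                      # keep state rows, first 5 columns
  names(df) <- c("stateName", "Census", "Estimates", "Pop2010", "Pop2011")
  df$stateName <- sub("^\\.", "", df$stateName)  # drop the leading period
  numCols <- c("Census", "Estimates", "Pop2010", "Pop2011")
  # Remove thousands separators, then convert text to numbers
  df[numCols] <- lapply(df[numCols], function(x) as.numeric(gsub(",", "", x)))
  # Fail loudly if the shape is not what the lab requires
  stopifnot(nrow(df) == expectedRows, ncol(df) == 5)
  rownames(df) <- NULL
  df
}

# Toy input mimicking the assumed layout (made-up values)
rawToy <- data.frame(
  c1 = c("Title line", "United States", ".StateA", ".StateB", "Footnote"),
  c2 = c("", "3,000", "1,000", "2,000", ""),
  c3 = c("", "3,010", "1,005", "2,005", ""),
  c4 = c("", "3,020", "1,010", "2,010", ""),
  c5 = c("", "3,030", "1,015", "2,015", ""),
  c6 = c("", "", "", "", ""),
  stringsAsFactors = FALSE
)
toy <- cleanStates(rawToy, expectedRows = 2)
str(toy)
```

If your readStates function ends by returning the cleaned data frame, the default expectedRows = 51 check will stop with an error whenever the row filter keeps too many or too few rows.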
Step 3: Store and explore the dataset
Store the dataset into a dataframe, called dfStates.
Test your dataframe by calculating the mean for the 2011 data, by doing:
mean(dfStates$Pop2011)
***you should get an answer of 6,109,645
Step 4: Find the state with the highest population
Based on the 2011 data, what is the population of the state with the highest population? What is the name of that state?
Sort the data, in increasing order, based on the 2011 data.
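For Step 4, a hedged sketch using base R's which.max and order. To keep the example runnable on its own, it builds a tiny stand-in for dfStates with made-up values; with the real dfStates you would skip that part and run only the last four statements.

```r
# Toy stand-in for dfStates (made-up values, not census data)
dfStates <- data.frame(
  stateName = c("StateA", "StateB", "StateC"),
  Pop2011   = c(300, 100, 200),
  stringsAsFactors = FALSE
)

# Row index of the largest 2011 population
maxRow  <- which.max(dfStates$Pop2011)
maxPop  <- dfStates$Pop2011[maxRow]     # population of the largest state
maxName <- dfStates$stateName[maxRow]   # name of the largest state

# Sort the whole data frame in increasing order of Pop2011
dfSorted <- dfStates[order(dfStates$Pop2011), ]
```

Because the sort is in increasing order, the largest state is also the last row of dfSorted, so tail(dfSorted, 1) is a quick cross-check on maxName and maxPop.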
Step 5: Explore the distribution of the states
Write a function (function name: "Distribution") that takes two parameters. The first is a vector and the second is a number. For example, Distribution <- function(vector, number). This step is just a setup for the following instruction.
The function will return the percentage of elements within the vector that are less than the number (i.e. the cumulative distribution below the value provided). For example, (1) only keep the elements within the vector that are less than the number, and store the number of eligible elements into the variable "count": count <- length(vector[vector < number])
Test the function with the vector dfStates$Pop2011 and the mean of dfStates$Pop2011.
*** you should get 0.6666667 as a result
There are many ways to write this function (described in point 10), so please try to write multiple versions of this function. Which do you think is best?
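Three possible versions of the Step 5 function are sketched below. The expected result of 0.6666667 suggests the "percentage" is meant as a proportion between 0 and 1, so none of them multiplies by 100. The names Distribution2 and Distribution3 are only there so that all three versions can live side by side.

```r
# Version 1: follow the hint and count the elements below the number
Distribution <- function(vector, number) {
  count <- length(vector[vector < number])
  count / length(vector)
}

# Version 2: the mean of a logical vector is the proportion of TRUE values
Distribution2 <- function(vector, number) {
  mean(vector < number)
}

# Version 3: sum() counts the TRUE values directly
Distribution3 <- function(vector, number) {
  sum(vector < number) / length(vector)
}

# Small check: only 1 of the 5 elements is below 2
Distribution(c(1, 2, 3, 4, 5), 2)
```

With the real data, the call would be Distribution(dfStates$Pop2011, mean(dfStates$Pop2011)). Version 2 is the shortest; the versions can disagree when the vector contains NA values, so which is "best" depends partly on how you want missing data handled.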
