Question: I mostly need help with Steps 4 and 5, but it would be very helpful if you could also show all the work for the other steps.

In this lab, you need to read in a dataset and work with it in a dataframe. Then, we will explore the distribution within the dataset.

Step 1: Create a function (named readStates) to read a CSV file into R

You need to read from a URL, not from a local file on your computer.

The file is a dataset on state populations (within the United States).

The URL is:

http://www2.census.gov/programs-surveys/popest/tables/2010-2011/state/totals/nst-est2011-01.csv

Step 2: Clean the dataframe

Note the issues that need to be fixed (removing columns, removing rows, changing column names).

Within your function, make sure there are 51 rows (one per state plus the District of Columbia). Make sure there are only 5 columns, with the following names: stateName, Census, Estimates, Pop2010, Pop2011.

Make sure the last four columns are numbers (i.e. not strings).
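Steps 1 and 2 could be sketched roughly as follows. This is only one possible approach, and it rests on assumptions about the file layout that are not stated in the lab: that the first five columns hold the name and the four population figures, that each state row (and the District of Columbia) has a name starting with a leading ".", and that the numbers contain comma thousands separators. The helper name cleanStates is hypothetical. If the layout differs, the final stopifnot check will fail rather than return a wrong dataframe.

```r
# Hypothetical helper: turn the raw CSV contents into the 51 x 5 dataframe.
cleanStates <- function(raw) {
  # Assumption: the first five columns are name, census, estimates base, 2010, 2011
  df <- raw[, 1:5]
  names(df) <- c("stateName", "Census", "Estimates", "Pop2010", "Pop2011")

  # Assumption: state rows are exactly the rows whose name starts with "."
  df <- df[grepl("^\\.", df$stateName), ]
  df$stateName <- sub("^\\.", "", df$stateName)

  # Strip thousands separators and convert the last four columns to numbers
  for (col in c("Census", "Estimates", "Pop2010", "Pop2011")) {
    df[[col]] <- as.numeric(gsub(",", "", df[[col]]))
  }

  rownames(df) <- NULL
  # Fail loudly if the layout assumptions above do not hold
  stopifnot(nrow(df) == 51, ncol(df) == 5)
  df
}

readStates <- function(urlToRead = "http://www2.census.gov/programs-surveys/popest/tables/2010-2011/state/totals/nst-est2011-01.csv") {
  # Read from the URL (not a local file) and keep text columns as strings
  raw <- read.csv(url(urlToRead), stringsAsFactors = FALSE)
  cleanStates(raw)
}

# Usage (requires network access):
# dfStates <- readStates()
# mean(dfStates$Pop2011)
```

Splitting the cleaning into its own function is a design choice made here so that the cleaning logic can be checked on a small hand-made dataframe without downloading the file.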

Step 3: Store and explore the dataset

Store the dataset into a dataframe, called dfStates.

Test your dataframe by calculating the mean for the 2011 data, by doing:

mean(dfStates$Pop2011)

***you should get an answer of 6,109,645

Step 4: Find the state with the highest population

Based on the 2011 data, what is the population of the state with the highest population? What is the name of that state?

Sort the data, in increasing order, based on the 2011 data.
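One way to approach Step 4 is sketched below. To keep the example runnable on its own, it uses a tiny stand-in dataframe with made-up names and values (StateA, StateB, StateC are hypothetical, not census data). With the real dfStates, the same three expressions would apply unchanged.

```r
# Toy stand-in for dfStates (hypothetical values, NOT census figures)
dfStates <- data.frame(
  stateName = c("StateA", "StateB", "StateC"),
  Pop2011 = c(300, 100, 200),
  stringsAsFactors = FALSE
)

# Population of the most populous state
maxPop <- max(dfStates$Pop2011)

# Name of that state: which.max returns the row index of the largest value
maxState <- dfStates$stateName[which.max(dfStates$Pop2011)]

# Sort the rows in increasing order of the 2011 population
sortedStates <- dfStates[order(dfStates$Pop2011), ]
```

After sorting in increasing order, the state with the highest population is also the last row of sortedStates, which gives a second way to check the answer.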

Step 5: Explore the distribution of the states

Write a function (function name: "Distribution") that takes two parameters. The first is a vector and the second is a number. For example, Distribution <- function(vector, number). This step is just a setup for the following instruction.

The function will return the percentage of elements within the vector that are less than the number (i.e. cumulative distribution below the value provided). For example, (1) only keep the elements within the vector that are less than the number, and store the number of eligible elements into the variable "count": count <- length(vector[vector < number])

Test the function with the vector dfStates$Pop2011, and the mean of dfStates$Pop2011. *** you should get 0.6666667 as a result
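A minimal sketch of the Distribution function, following the "count" approach the lab describes, might look like this. Note that although the lab says "percentage", the expected test result (0.6666667) is a proportion between 0 and 1, so this sketch returns the fraction rather than multiplying by 100. The toy vector is made up for illustration, and the sketch assumes the vector has no missing values.

```r
Distribution <- function(vector, number) {
  # Count the elements strictly below the threshold
  count <- length(vector[vector < number])
  # Divide by the total number of elements to get the proportion
  count / length(vector)
}

# Toy check: the mean of c(1, 2, 3, 10) is 4, and three of the four values are below it
toy <- c(1, 2, 3, 10)
toyResult <- Distribution(toy, mean(toy))

# With the real data, the lab's test would be:
# Distribution(dfStates$Pop2011, mean(dfStates$Pop2011))
```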

There are many ways to write this function (described in point 10), so please try to write multiple versions of it. Which do you think is best?
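A few alternative versions are sketched below for comparison. Point 10 is not reproduced here, so these are not necessarily the variants the lab has in mind, and the function names (DistributionMean, DistributionSum, DistributionLoop) are hypothetical. All three assume the vector contains no missing values.

```r
# Version A: compare, then take the mean of the TRUE/FALSE values
# (TRUE counts as 1 and FALSE as 0)
DistributionMean <- function(vector, number) {
  mean(vector < number)
}

# Version B: count the TRUE values explicitly
DistributionSum <- function(vector, number) {
  sum(vector < number) / length(vector)
}

# Version C: explicit loop, closest to a step-by-step description
DistributionLoop <- function(vector, number) {
  count <- 0
  for (value in vector) {
    if (value < number) count <- count + 1
  }
  count / length(vector)
}

# All three should agree on the same input
x <- c(1, 2, 3, 10)
results <- c(
  DistributionMean(x, mean(x)),
  DistributionSum(x, mean(x)),
  DistributionLoop(x, mean(x))
)
```

Which is "best" is a judgment call: Version A is the shortest, Version B makes the counting explicit, and Version C is the easiest to trace by hand but the most verbose and typically the slowest on long vectors.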
