# Function 1: Create a function called "readStates"
#Step 1: Create a function named readStates to read a CSV file into R (within the function)
#Q You need to read the file from a URL, not from a local file on your computer.
#Q The file is a dataset on state populations within the United States.
#Step: Clean the dataframe (within the function)
#Q Note the issues that need to be fixed: removing columns, removing rows, and changing column names.
#Q Within your function, make sure there is one row per state plus the District of Columbia. Make sure there are only the following columns, with these names: stateName, Census, Estimates, Pop, Pop.
#Q Make sure the last four columns are numbers (i.e., not strings).
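The reading and cleaning steps above can be sketched in R as follows. This is a minimal sketch, not the expected solution: it assumes that the state and District of Columbia rows begin with a leading dot (as the table note about subparts suggests), that the name column and the four value columns are the first five columns of the file, and that the numbers may contain thousands separators. The column names PopYear1 and PopYear2, and the helpers stateCols and toNumber, are hypothetical placeholders.

```r
# Hypothetical placeholder column names: the last two required names begin
# with "Pop", but their full spelling is not preserved here.
stateCols <- c("stateName", "Census", "Estimates", "PopYear1", "PopYear2")

# Turn strings such as "4,779,736" into numbers by removing the commas.
toNumber <- function(x) as.numeric(gsub(",", "", x))

readStates <- function(urlToRead, colNames = stateCols) {
  # readLines() accepts a URL as well as a local path.
  lines <- readLines(urlToRead, warn = FALSE)

  # Title rows have fewer fields than data rows, so size the table from the
  # widest line instead of letting read.csv() guess from the first lines.
  nCols <- max(count.fields(textConnection(lines), sep = ",", quote = "\"",
                            comment.char = ""), na.rm = TRUE)
  raw <- read.csv(text = lines, header = FALSE, fill = TRUE,
                  col.names = paste0("V", seq_len(nCols)),
                  colClasses = "character")

  # Remove columns: keep the name column plus the next four value columns.
  raw <- raw[, 1:5]

  # Remove rows: assume state (and District of Columbia) rows start with a
  # leading dot, e.g. ".Alabama"; title, national, regional, and note rows
  # do not.
  df <- raw[grepl("^\\.", raw[[1]]), ]

  # Change the column names and strip the leading dot from each state name.
  colnames(df) <- colNames
  df$stateName <- sub("^\\.", "", df$stateName)

  # Make the last four columns numeric rather than strings.
  df[, 2:5] <- lapply(df[, 2:5], toNumber)

  rownames(df) <- NULL
  df
}
```

With the URL stored in urlToRead, `dfStates <- readStates(urlToRead)` returns the cleaned dataframe. Filtering on the leading dot, rather than dropping fixed row numbers, keeps working if the number of title or footnote rows changes; if the file marks state rows differently, that filter is the line to adjust.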
#Step: Store and explore the dataset (outside the function)
#Q Store the dataset in a dataframe called dfStates.
# When you run the following, it should print a clean dataframe. Please include the output of "dfStates" in the compiled file by running dfStates as below.
dfStates <- readStates(urlToRead)
dfStates
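To confirm that dfStates meets the cleaning requirements (one row per state plus the District of Columbia, only the required columns starting with stateName, and the last four columns numeric), a hypothetical helper such as checkStates can report each requirement separately. It assumes 51 rows (the 50 states plus DC) and five columns (stateName plus the four numeric columns).

```r
# Hypothetical helper: returns one TRUE/FALSE per cleaning requirement.
checkStates <- function(df) {
  c(
    rows_ok    = nrow(df) == 51,               # 50 states + District of Columbia
    ncols_ok   = ncol(df) == 5,                # stateName + four numeric columns
    name_ok    = names(df)[1] == "stateName",  # first column is the state name
    numeric_ok = all(vapply(df[-1], is.numeric, logical(1)))  # no strings left
  )
}
```

`checkStates(dfStates)` should return TRUE for every entry; a FALSE points to the requirement that still needs work.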
#Q Test your dataframe by calculating the mean of the data as follows (include your output):
mean(dfStates$Pop)
# You should get an answer of
#Step: Find the state with the highest population (outside the function)
#Q Based on the data, what is the population of the state with the highest population? What is the name of that state, and what is the value of the population?
#Q Sort the data in increasing order, based on the population data.
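Finding the most populous state and sorting the rows can be done with base R's which.max() and order(). The helpers highestState and sortStates below are hypothetical, and popCol stands for the name of whichever population column the question refers to, since the exact column name is not preserved here.

```r
# Hypothetical helper: name and value of the largest population in popCol.
highestState <- function(df, popCol) {
  i <- which.max(df[[popCol]])   # index of the (first) maximum
  list(stateName = df$stateName[i], population = df[[popCol]][i])
}

# Hypothetical helper: rows reordered by popCol in increasing order.
sortStates <- function(df, popCol) {
  df[order(df[[popCol]]), ]      # order() sorts ascending by default
}
```

Passing the name of the July population column of dfStates as popCol answers both questions. which.max() returns the first row on ties, which is harmless here because state populations are distinct.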
# Function 2: Create a function called "Distribution"
#Step: Explore the distribution of the states (create a new function called "Distribution")
#Q You will write a function to calculate the percentage of states whose population is lower than the average. The function (name: "Distribution") takes two parameters: the first is a vector and the second is a number. For example: Distribution <- function(vector, number)
# The function will return the percentage of elements within the vector that are less than the number (i.e., the cumulative distribution below the value provided).
# Think about this: keep only the elements within the vector that are less than the number, and store the number of eligible elements in the variable "count". Populate XXXX to complete this line of code:
count <- length(vector[XXXX])
# Then, you will calculate the percentage and return the results. Populate XXXX to complete this line of code:
return(count/XXXX)
# Test the function using the vector dfStates$Pop and the mean of dfStates$Pop; you should get as a result.
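One possible way to complete Distribution is sketched below. It assumes the function should return a fraction between 0 and 1 (multiply by 100 for a percentage on a 0 to 100 scale), and elements equal to the number are not counted, since the text asks for elements that are less than it.

```r
# A sketch of Distribution(vector, number): the share of elements of
# `vector` that are strictly less than `number`.
Distribution <- function(vector, number) {
  # keep only the elements below the threshold and count them
  count <- length(vector[vector < number])
  # divide by the total number of elements to get a proportion
  return(count / length(vector))
}

Distribution(c(1, 2, 3, 4), 2.5)  # returns 0.5
```

Called as `Distribution(dfStates$Pop, mean(dfStates$Pop))`, it gives the share of states below the average population. If the vector contains NA values, remove them first: `vector[vector < number]` keeps an NA for each NA comparison, so they would be counted.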
Table: Annual Estimates of the Population for the United States, Regions, States, and Puerto Rico: April to July. Row headers are in column A (Geographic Area: United States; the Northeast, Midwest, South, and West regions; then the individual states, including the District of Columbia, where leading dots indicate subparts). Column headers: Census and Estimates Base (as of April), followed by Population Estimates as of July.
