Question:

# Overview ----------------------------------------------------------------

	# Assignment 1: Analysis of the protest data from Count Love
	# For each question/prompt, write the necessary code to calculate the answer.
	# For grading, it's important that you store your answers in the variable names
	# listed with each question in `backticks`.
	# For each prompt marked `Reflection`, please write a response
	# in your `README.md` file.



	# Part 1: Set up ----------------------------------------------------------

	# In this section, you're loading the data and necessary packages.
	# Load the `stringr` package, which you'll use later.

	# Load the data from https://countlove.org/data/data.csv
	# into a variable called `protests`

	# How many protests are in the dataset? `num_protests`

	# How much information is available about each protest? `num_features`
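
A minimal sketch of the loading and counting steps, assuming each row of the CSV is one protest and each column is one piece of information. The inline `sample_csv` text is invented stand-in data so the example runs offline; the same `read.csv()` call also accepts the URL given above.

```r
# Hypothetical three-row stand-in for the real CSV (all values are invented).
sample_csv <- "Date,Location,Attendees
2020-05-31,\"Seattle, WA\",500
2020-05-31,\"Minneapolis, MN\",
2019-07-04,\"Washington, DC\",120"

# read.csv() accepts a URL, a file path, or inline text via `text =`.
protests <- read.csv(text = sample_csv, stringsAsFactors = FALSE)

num_protests <- nrow(protests)  # one row per protest
num_features <- ncol(protests)  # one column per feature
```

Note that the empty `Attendees` field in the second row is read as `NA`, which matters for the calculations in Part 2.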


	# Part 2: Attendees -------------------------------------------------------

	# In this section, you're exploring the number of attendees.

	# Extract the `Attendees` column into a variable called `num_attendees`

	# What is the lowest number of attendees? `min_attendees`
	# (hint for this and other calculations: you'll need to consider missing values)

	# What is the highest number of attendees? `max_attendees`

	# What is the mean number of attendees? `mean_attendees`

	# What is the median number of attendees? `median_attendees`

	# What is the difference between the mean and median number of attendees?
	# `mean_median_diff`
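
The summary statistics above can be sketched as follows. The vector is made-up sample data, not values from the real dataset; the point is the `na.rm = TRUE` argument that the hint refers to.

```r
# Hypothetical attendee counts; NA marks a protest with no recorded size.
num_attendees <- c(10, 50, 100, NA, 5000)

# Without na.rm = TRUE, each of these calls would return NA.
min_attendees <- min(num_attendees, na.rm = TRUE)
max_attendees <- max(num_attendees, na.rm = TRUE)
mean_attendees <- mean(num_attendees, na.rm = TRUE)
median_attendees <- median(num_attendees, na.rm = TRUE)

# A large positive difference suggests a few very large values pull the mean up.
mean_median_diff <- mean_attendees - median_attendees
```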

	# Reflection: What does the difference between the mean and the median
	# tell you about the distribution of the data? (if you're unfamiliar with
	# working with distributions, feel free to ask your TA for clarification)

	# To further assess the distribution of values, create a boxplot of the number
	# of attendees using the `boxplot()` function.
	# Store the plot in a variable called `attendess_distribution`
	# (Note, we'll use much more refined plotting methods, and pay far
	# more attention to detail later in the course)

	# Create another boxplot of the log of the number of attendees.
	# Store the plot in a variable `log_attendees_distribution`.
	# (note, you will see a warning in the console, which is expected)
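
A rough sketch of the two boxplots, using invented sample values. In base R, `boxplot()` draws the plot as a side effect and invisibly returns a list of summary statistics, so that list is what ends up stored in the variable. The `pdf(NULL)` call is an addition for this example only, so it also runs without a display. The toy data contains no zeros; the expected warning on the real data presumably comes from values whose log is not finite, which is an assumption here.

```r
# Hypothetical attendee counts with one large outlier and one missing value.
num_attendees <- c(10, 50, 100, NA, 5000)

pdf(NULL)  # null graphics device: nothing is written to disk

# The returned list holds the five-number summary, sample size, and outliers.
attendess_distribution <- boxplot(num_attendees, main = "Attendees")

# On the log scale the outlier no longer flattens the rest of the box.
log_attendees_distribution <- boxplot(log(num_attendees),
                                      main = "log(Attendees)")

dev.off()
```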


	# Part 3: Locations -------------------------------------------------------

	# In this section, you're exploring where protests happened.

	# Extract the `Location` column into a variable called `locations`

	# How many unique locations are in the dataset? `num_locations`

	# How many protests occurred in Washington? `num_in_wa`
	# (hint: use a function from the stringr package to detect the letters "WA")

	# What proportion of protests occurred in Washington? `prop_in_wa`
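
One possible way to count and take the proportion, sketched on four invented locations. `str_detect()` is case-sensitive, so "Washington, DC" does not match the pattern "WA".

```r
library(stringr)

# Hypothetical locations in the "City, ST" form used by the dataset.
locations <- c("Seattle, WA", "Spokane, WA", "Washington, DC", "Minneapolis, MN")

num_locations <- length(unique(locations))

# str_detect() returns TRUE/FALSE per element; sum() counts the TRUEs.
in_wa <- str_detect(locations, "WA")
num_in_wa <- sum(in_wa, na.rm = TRUE)

prop_in_wa <- num_in_wa / length(locations)
```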

	# Reflection: Does the number of protests in Washington surprise you?
	# Why or why not?

	# Write a function `count_in_location()` that accepts (as a parameter)
	# a `location` name, and returns the sentence (note: spacing and punctuation):
	# "There were N protests in LOCATION.", where N is the number of
	# protests that occurred in that location, and LOCATION is the parameter that
	# was provided into the function.
	# Note, you should count the number of locations that match the parameter
	# put into the function, so `Seattle` should be a match for "Seattle, WA"

	# Use your function above to describe the number of protests in "Washington, DC"
	# `dc_summary`

	# Use your function above to describe the number of protests in "Minneapolis"
	# `minneapolis_summary`
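
A sketch of what `count_in_location()` could look like, assuming a `locations` vector already exists in the global environment (here filled with invented values). Wrapping the input in `fixed()` is a design choice of this example, so that characters such as "." in a location name are matched literally rather than as a regular expression.

```r
library(stringr)

# Hypothetical locations; the real vector comes from the `Location` column.
locations <- c("Seattle, WA", "Spokane, WA", "Washington, DC", "Minneapolis, MN")

count_in_location <- function(location) {
  # Count every element of `locations` that contains the given text.
  n <- sum(str_detect(locations, fixed(location)), na.rm = TRUE)
  paste0("There were ", n, " protests in ", location, ".")
}

dc_summary <- count_in_location("Washington, DC")
minneapolis_summary <- count_in_location("Minneapolis")
```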

	# Create a new vector `states` which is the last two characters of each
	# value in the `locations` vector. Hint, you may want to again use the
	# `stringr` package

	# Create a vector of the unique states in your dataset. `uniq_states`

	# Create a summary sentence for each state by passing your `uniq_states`
	# variable and `count_in_location` function to the `sapply()` function.
	# Store your results in `state_summary`
	# (don't miss how amazing this is! Very powerful to apply your function to an
	# entire vector at once with `sapply()`)
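
The state extraction and the `sapply()` call might look like this. The locations are invented, and `count_in_location()` is repeated here only so the example runs on its own.

```r
library(stringr)

# Hypothetical locations in the "City, ST" form.
locations <- c("Seattle, WA", "Spokane, WA", "Washington, DC", "Minneapolis, MN")

count_in_location <- function(location) {
  n <- sum(str_detect(locations, fixed(location)), na.rm = TRUE)
  paste0("There were ", n, " protests in ", location, ".")
}

# Negative positions in str_sub() count from the end of the string.
states <- str_sub(locations, -2, -1)
uniq_states <- unique(states)

# sapply() calls the function once per state and collects the sentences
# in a character vector named by state.
state_summary <- sapply(uniq_states, count_in_location)
```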

	# Create a summary table by passing your `states` variable to the `table()`
	# function, and storing the result in a variable `state_table`.

	# Optional: use the View() function to more easily read the table

	# Reflection: Looking at the `state_table` variable, what data quality issues
	# do you notice, and how would you use that to change your analysis (no need
	# to actually change your analysis)?

	# What was the maximum number of protests in a state? `max_in_state`
	# (hint: use your `state_table` variable)
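
A short sketch of the table step on an invented `states` vector. `table()` counts how often each distinct value appears, and `max()` can be applied to the result directly.

```r
# Hypothetical two-letter state codes, one per protest.
states <- c("WA", "WA", "DC", "MN", "WA", "MN")

state_table <- table(states)

# The largest count in the table.
max_in_state <- max(state_table)

# The name of the state (or states) with that count.
names(state_table)[state_table == max_in_state]
```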


	# Part 4: Dates -----------------------------------------------------------

	# In this section, you're exploring when protests happened.

	# Extract the `Date` column into a variable called `dates` by passing the
	# column to the `as.Date()` function (this will process the values as dates,
	# which are luckily already in an optimal format for parsing)

	# What is the most recent date in the dataset? `most_recent`

	# What is the earliest date in the dataset? `earliest`

	# What is the length of the timespan of the dataset? `time_span`
	# hint: R can do math with dates pretty well by default!
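
Date arithmetic can be sketched as below. The dates are invented, written in the `YYYY-MM-DD` form that `as.Date()` parses without a format string.

```r
# Hypothetical date strings standing in for the `Date` column.
raw_dates <- c("2019-07-04", "2020-05-24", "2020-05-31", "2020-07-04")
dates <- as.Date(raw_dates)

most_recent <- max(dates)
earliest <- min(dates)

# Subtracting two Date values gives a difftime measured in days.
time_span <- most_recent - earliest
```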

	# Create a vector of the dates that are in 2020. `in_2020`

	# Create a vector of the dates that are in 2019. `in_2019`

	# What is the ratio of the number of protests in 2020 compared to 2019?
	# `ratio_2020_2019`
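
Two equivalent ways to filter by year, shown on invented dates: comparing against the year's boundaries, or comparing the formatted year string. Either is a reasonable choice; both are shown only for illustration.

```r
# Hypothetical protest dates.
dates <- as.Date(c("2019-07-04", "2019-11-02", "2020-05-24",
                   "2020-05-31", "2020-05-31", "2020-07-04"))

# Option 1: keep dates between the first and last day of the year.
in_2020 <- dates[dates >= as.Date("2020-01-01") & dates <= as.Date("2020-12-31")]

# Option 2: extract the year as text and compare.
in_2019 <- dates[format(dates, "%Y") == "2019"]

ratio_2020_2019 <- length(in_2020) / length(in_2019)
```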

	# Reflection: Does the change in the number of protests from 2019 to 2020
	# surprise you? Why or why not?

	# Write a function `count_on_date()` that accepts as a parameter a `date`,
	# and returns the sentence:
	# "There were N protests on DATE.", where N is the number of protests on that
	# date, and DATE is the date provided

	# Using your function you just wrote, how many protests were there on
	# May 24th, 2020? `num_may_24`

	# Using your function you just wrote, how many protests were there on
	# May 31st, 2020? `num_on_may_31`
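
A sketch of `count_on_date()`, assuming a global `dates` vector of class `Date` (filled here with invented values) and a `date` argument given as a `"YYYY-MM-DD"` string.

```r
# Hypothetical protest dates.
dates <- as.Date(c("2020-05-24", "2020-05-31", "2020-05-31", "2020-07-04"))

count_on_date <- function(date) {
  # Convert the argument so strings and Date objects are both accepted.
  n <- sum(dates == as.Date(date), na.rm = TRUE)
  paste0("There were ", n, " protests on ", date, ".")
}

num_may_24 <- count_on_date("2020-05-24")
num_on_may_31 <- count_on_date("2020-05-31")
```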

	# For more on this timeline, see:
	# https://www.nytimes.com/article/george-floyd-protests-timeline.html

	# How many protests occurred each month in 2020? `by_month_table`
	# Hint: use the `months()` function, your `in_2020` dates, and the `table()`
	# function. If you like, you can do this in multiple different steps.

	# As a comparison, let's assess the change between July 2019 and July 2020.
	# What is the difference in the number of protests between July 2020 and
	# July 2019? You'll want to do this in multiple steps as you see fit, though
	# your answer should be stored in the variable `change_july_protests`.
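
The monthly table and the July comparison could be approached as follows, on invented dates. `months()` returns month names in the current locale, so this sketch uses `format(dates, "%Y-%m")` for the July comparison to avoid depending on the spelling of "July"; that substitution is a choice made for this example, not something the prompt requires.

```r
# Hypothetical protest dates spanning two Julys.
dates <- as.Date(c("2019-07-04", "2019-07-20", "2020-05-31",
                   "2020-07-04", "2020-07-10", "2020-07-25"))

in_2020 <- dates[format(dates, "%Y") == "2020"]

# One count per month name that appears in the 2020 dates.
by_month_table <- table(months(in_2020))

# "YYYY-MM" labels make the year and month explicit in one comparison.
year_month <- format(dates, "%Y-%m")
july_2020 <- sum(year_month == "2020-07")
july_2019 <- sum(year_month == "2019-07")

change_july_protests <- july_2020 - july_2019
```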

	# Reflection: do a bit of research. Find at least two specific policies that
	# have been changed as a result of protests in 2020. These may be at the
	# city, state, or university level. Please provide a basic summary, as well as a
	# link to each article.


	# Part 5: Protest Purpose -------------------------------------------------

	# In this section, you're exploring why protests happened
	# Extract the `Event..legacy..see.tags.` column into a variable called `purpose`

	# How many different purposes are listed in the dataset? `num_purposes`

	# That's quite a few! If you look at the vector with `View()`, you'll notice
	# a common pattern for each purpose. It's listed as:
	# SOME_PURPOSE (additional_detail)
	# To get a higher level summary, create a variable `high_level_purpse` by
	# extracting everything before the first parenthesis in each value
	# in the vector. For example, from "Civil Rights (Black Women's March)"
	# you would extract "Civil Rights". You'll also have to remove the space
	# before the first parenthesis.
	# Hint: this will take a little bit of googling // trial and error. Be patient!

	# How many "high level" purposes have you identified? `num_high_level`

	# Create a table that counts the number of protests for each high level purpose
	# `high_level_table`
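
One way to strip the parenthesized detail, sketched with `stringr` on invented purposes. The regular expression `"\\(.*"` matches from the first opening parenthesis to the end of the string; other approaches (for example `str_extract()` or base R `sub()`) would work equally well.

```r
library(stringr)

# Hypothetical values in the "SOME_PURPOSE (additional_detail)" form.
purpose <- c("Civil Rights (Black Women's March)",
             "Civil Rights (Racial Injustice)",
             "Healthcare (Medicare)",
             "Environment")

# Drop everything from the first "(" onward, then trim the trailing space.
high_level_purpse <- str_trim(str_replace(purpose, "\\(.*", ""))

num_high_level <- length(unique(high_level_purpse))
high_level_table <- table(high_level_purpse)
```

Values without any parenthesis, such as "Environment" here, pass through unchanged.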

	# Reflection: Take a look at (`View()`) your `high_level_table` variable. What
	# picture does this paint of the U.S.?


	# Part 6: Independent Exploration -----------------------------------------

	# As a last step, you should write your own function that allows you to
	# quickly ask questions of the dataset. For example, in the above sections,
	# you wrote functions to ask the same question about different months, or
	# locations. If you need any guidance here, feel free to ask!
