In this lab, we will be exploring ggplot. For all questions, please put appropriate labels and...
Question:
In this lab, we will be exploring ggplot. For all questions, please put appropriate labels and titles. It is good practice to get in the habit of this, since you will be presenting ideas through graphs in data science. Feel free to make your plots prettier and explore new commands. Try to get the basic idea of the graph down first, then add little details once you feel comfortable.

We will be working with `presidential_races.RData` for **Part 1 and 3**. The data set includes state information from several decades of presidential elections. Here are the names of the columns:

* year
* state
* state_po
* state_fips
* state_cens
* state_ic
* office
* candidate
* party_detailed
* writein
* candidatevotes
* totalvotes
* version
* notes
* party_simplified

Load in the presidential races data. Load {tidyverse}.

```{r}
```

**NOTE**: All the categorical variables in the `presidential_races.RData` data need to be changed to a `factor`, because they are read in weirdly. Write code to fix this below:

```{r}
load("presidential_races.RData")
library(tidyverse)
```

# Part 1 (Base R)

Using the `presidential_races.RData` data, create a ...

simple scatter plot

```{r}
```

simple barplot

```{r}
```

simple histogram

```{r}
```

simple lineplot

```{r}
```

simple boxplot

```{r}
```

# Part 2 {ggplot2}

A dataset containing the prices and other attributes of almost 54,000 diamonds. The variables are as follows:

* `price` = price in US dollars ($326-$18,823)
* `carat` = weight of the diamond (0.2-5.01)
* `cut` = quality of the cut (Fair, Good, Very Good, Premium, Ideal)
* `color` = diamond color, from J (worst) to D (best)
* `clarity` = a measurement of how clear the diamond is (I1 (worst), SI1, SI2, VS1, VS2, VVS1, VVS2, IF (best))
* `x` = length in mm (0-10.74)
* `y` = width in mm (0-58.9)
* `z` = depth in mm (0-31.8)
* `depth` = total depth percentage = z / mean(x, y) = 2 * z / (x + y) (43-79)
* `table` = width of top of diamond relative to widest point (43-95)

The `diamonds` data frame is large, so let's take a random sample of 100 records.

```{r}
```

Produce a scatter plot in the minimalist theme that communicates the relationship between the `price` (y-axis) of diamonds and the `carat` (x-axis). Does the price also depend on the color of the diamond? Fill the scatter points with the `color` variable. Fit a smoother to the data and display the smooth and its standard error. Remember to put a title and labels for the axes.

```{r}
```

Produce a bar chart that shows the frequency of each diamond `cut` type. Change the color of the bars to "coral" and set the y-axis limits from 0 to 40. Display the frequency on top of each bar.

```{r}
```

Produce a multiple box plot chart that displays the distribution of `price` across the different diamond `color`s. Fill each box with a different color corresponding to the diamond color. Display the outliers as red (`color`) crosses (`shape`) for each box-and-whisker. Adjust the transparency level to `0.5`.

```{r}
```

Produce a histogram that represents the distribution of diamond `carat`, and fill each bin with the color corresponding to the diamond `clarity`. Adjust the `binwidth` to a value that is suitable for the context.

```{r}
```

Produce a density chart that represents the distribution of `carat` (limit 0 to 3) by diamond `color`.

```{r}
```

# Part 3 (ggplot)

1. Preliminary Items

a. What are the dimensions of the data? What is the structure? Look at the first few rows.

```{r}
```

b. We need to do some cleaning. For this lab, we will only be concerned with the columns `year`, `state`, `state_po`, `candidate`, `candidatevotes`, `totalvotes`, and `party_simplified`. Create a new data frame called `presidents` with only these columns selected.

```{r}
```

c. For this lab, we will also only be considering the Democrat and Republican candidates. Filter the data so that it only contains candidates who are `Democrat` or `Republican`. Store the results in `presidents`.

```{r}
```

d. There is one `NA` in this data set. Use `na.omit()` to remove that observation from the data set.

```{r}
```

2. The goal of this question is to create plots for an analysis of presidential results in the state of Tennessee.

a. Filter the data set to only include the results of Tennessee from the past several decades. Call the data set `TN`.

```{r}
```

b. Using the `TN` data set, create a plot of the total voter turnout for each election year. Plot it with both lines and points. Also, find a way to have an x-axis tick mark for each election year (e.g., `1976`, `1980`, `1984`, ..., `2020`). Add `theme_minimal()` to obtain a plain background.

```{r}
```

c. Based on the previous plot, make a plot for both Republican and Democratic candidates showing the votes each candidate obtained in each year. Change the color of the lines: "red" for Republican and "blue" for Democrat.

```{r}
```

d. Explain why raw voter turnout can be misleading in these figures.

3. It may be more interesting to look at how the percentage of votes changes over time. Create a plot similar to the one from the previous problem, but use a different state and plot the values based on percentage of votes rather than total votes.

a. Create a new variable in `presidents` called `TotalPer`, which is the percentage of votes for each candidate.

```{r}
```

b. Filter the data to a state of your interest.

```{r}
```

c. Create a plot of the percentage of candidate votes for this new state over the past decades.

```{r}
```

4. Create a bar plot for the percentage of votes by each candidate in all 13 battleground states in one plot.

a. Filter the data for just the election of 2020 and the state `AZ`. Call this data set `AZ2020`.

```{r}
```

b. Create a bar plot with just the information from `AZ2020`.

```{r}
```

c. Now, filter the data set to the 13 battleground states for the 2020 election: `AZ`, `FL`, `GA`, `IA`, `MI`, `MN`, `NV`, `NH`, `NC`, `OH`, `PA`, `TX`, `WI`. Call this data set `bgpres2020`.

```{r}
```

d. Create a bar plot with the 13 battleground states and the candidate percentage comparison for each state.

```{r}
```
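As one possible starting point for the Part 2 plots, the sketch below draws a 100-row sample from the `diamonds` data set that ships with ggplot2 and builds the scatter plot, bar chart, and box plot described in the question. The object names (`diamonds_small`, `p_scatter`, `p_bar`, `p_box`), the seed, and the titles are assumptions made for illustration, not part of the lab handout.

```r
library(ggplot2)

set.seed(1)  # arbitrary seed (an assumption) so the sample is reproducible
diamonds_small <- diamonds[sample(nrow(diamonds), 100), ]

# Scatter plot of price against carat. shape = 21 is a fillable point,
# so the fill aesthetic can carry the diamond colour. The smoother is
# fitted to all points and drawn with its standard-error band.
p_scatter <- ggplot(diamonds_small, aes(x = carat, y = price)) +
  geom_point(aes(fill = color), shape = 21, size = 2) +
  geom_smooth(se = TRUE) +
  labs(title = "Diamond price by carat", x = "Carat",
       y = "Price (US dollars)", fill = "Color") +
  theme_minimal()

# Bar chart of cut frequencies with the count printed above each bar.
# coord_cartesian() zooms the axis without dropping bars taller than 40.
p_bar <- ggplot(diamonds_small, aes(x = cut)) +
  geom_bar(fill = "coral") +
  geom_text(stat = "count", aes(label = after_stat(count)), vjust = -0.5) +
  coord_cartesian(ylim = c(0, 40)) +
  labs(title = "Frequency of diamond cuts", x = "Cut", y = "Count")

# Box plots of price by colour, semi-transparent, outliers as red crosses.
p_box <- ggplot(diamonds_small, aes(x = color, y = price, fill = color)) +
  geom_boxplot(alpha = 0.5, outlier.colour = "red", outlier.shape = 4) +
  labs(title = "Diamond price by color", x = "Color",
       y = "Price (US dollars)", fill = "Color")
```

Printing `p_scatter`, `p_bar`, or `p_box` renders the corresponding plot. `coord_cartesian()` is used here instead of `ylim()` because `ylim()` would silently remove any bar whose count exceeds 40 in a given random sample.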
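The cleaning and Tennessee steps in Part 3 could be sketched roughly as follows. Because `presidential_races.RData` is not available here, the code runs on a small made-up data frame (`toy`) with the same column names. The vote counts, the candidate names, and the upper-case labels such as `"DEMOCRAT"` and `"TENNESSEE"` are assumptions and should be checked against the real data.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for the lab data (values are invented).
toy <- data.frame(
  year = rep(c(1976, 1980, 1984), each = 3),
  state = "TENNESSEE",
  state_po = "TN",
  candidate = paste("Candidate", 1:9),
  candidatevotes = c(800, 600, 50, 780, 790, 40, 710, 990, 30),
  totalvotes = rep(c(1450, 1610, 1730), each = 3),
  party_simplified = rep(c("DEMOCRAT", "REPUBLICAN", "OTHER"), times = 3)
)

# Keep the columns of interest, the two major parties, and complete rows,
# then add each candidate's share of the total vote as a percentage.
presidents <- toy %>%
  select(year, state, state_po, candidate,
         candidatevotes, totalvotes, party_simplified) %>%
  filter(party_simplified %in% c("DEMOCRAT", "REPUBLICAN")) %>%
  na.omit() %>%
  mutate(TotalPer = 100 * candidatevotes / totalvotes)

TN <- presidents %>% filter(state_po == "TN")

# Total turnout per election year, one tick mark per year.
p_turnout <- ggplot(distinct(TN, year, totalvotes),
                    aes(x = year, y = totalvotes)) +
  geom_line() +
  geom_point() +
  scale_x_continuous(breaks = unique(TN$year)) +
  labs(title = "Tennessee voter turnout", x = "Election year",
       y = "Total votes") +
  theme_minimal()

# Votes per party over time, blue for Democrat and red for Republican.
p_party <- ggplot(TN, aes(x = year, y = candidatevotes,
                          color = party_simplified)) +
  geom_line() +
  geom_point() +
  scale_color_manual(values = c(DEMOCRAT = "blue", REPUBLICAN = "red")) +
  scale_x_continuous(breaks = unique(TN$year)) +
  labs(title = "Tennessee votes by party", x = "Election year",
       y = "Candidate votes", color = "Party") +
  theme_minimal()
```

Swapping `candidatevotes` for `TotalPer` on the y-axis gives the percentage version asked for in question 3, and the same `filter()` pattern with `year == 2020` and a vector of state abbreviations would cover question 4.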
Expert Answer:
Part 1 (Base R): Load the necessary packages and data with `load("presidential_races.RData")` and `library(tidyverse)`. Fix categorical variables to factor: presidentialraces mutat...
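The answer preview is cut off at the factor conversion. One common way to do that step, shown on a made-up data frame because the name of the object stored in the `.RData` file is not given, is `mutate()` with `across()`. The data frame `races` and its values are hypothetical, and this is a sketch of the general technique rather than the original answer.

```r
library(dplyr)

# Hypothetical stand-in; the real object comes from presidential_races.RData.
races <- data.frame(
  year = c(1976, 1976),
  state = c("ALABAMA", "ALASKA"),
  party_simplified = c("DEMOCRAT", "REPUBLICAN"),
  candidatevotes = c(100, 200),
  stringsAsFactors = FALSE
)

# Convert every character column to a factor and leave numeric columns alone.
races <- races %>%
  mutate(across(where(is.character), as.factor))

str(races)
```

If the categorical columns in the real data are stored as something other than character, the `where(is.character)` selector would need to be replaced by an explicit list of column names.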