Question:

# Run this cell first. Do NOT edit this cell.
Answer1 = Answer2 = Answer3 = Answer4 = Answer5 = Answer6 = None
import pandas
import numpy
import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline
from sklearn.linear_model import LinearRegression
elect = pandas.read_csv('2020_US_County_Level_Presidential_Results.csv')
county = pandas.read_csv('County_demographics.csv')
state = pandas.read_csv('StateNameData.csv')
elect.shape, county.shape, state.shape
Output: ((3152, 10), (3139, 43), (51, 3))
Problem 1
Interesting data science projects often combine data from multiple sources to investigate novel relationships. Process the above datasets to link per_point_diff in the 2020 Presidential Election results to Population.Population per Square Mile in the County Demographics data (2020 census).
For each county, per_point_diff measures the difference between the vote shares received by the two major parties, computed as percent_gop minus percent_dem.
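The election file already provides per_point_diff, so the following is only a minimal sketch of the sign convention, using made-up vote counts for two hypothetical counties. The column names percent_gop and percent_dem follow the problem text; the actual CSV may name them differently (for example per_gop and per_dem) and may store shares as fractions rather than percentages, so it is worth checking elect.columns first.

```python
import pandas

# Made-up vote counts for two hypothetical counties (illustrative values only).
votes = pandas.DataFrame({
    "county_name": ["Alpha County", "Beta County"],
    "votes_gop": [600, 300],
    "votes_dem": [380, 680],
    "total_votes": [1000, 1000],
})

# Party shares as fractions of all votes cast in each county.
# Column names are assumed from the problem statement, not read from the CSV.
votes["percent_gop"] = votes["votes_gop"] / votes["total_votes"]
votes["percent_dem"] = votes["votes_dem"] / votes["total_votes"]

# Positive values lean Republican, negative values lean Democratic.
votes["per_point_diff"] = votes["percent_gop"] - votes["percent_dem"]
print(votes[["county_name", "per_point_diff"]])
```

Alpha County (60% GOP, 38% Dem) comes out at about +0.22, a 22-point GOP margin, while Beta County (30% GOP, 68% Dem) comes out at about -0.38.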
Because population density is heavily skewed, we measure density as numpy.log10(Population.Population per Square Mile) and call it Log_Pop_SqMi.
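To see why the log transform helps with skew, here is a short sketch on a few hypothetical density values spanning five orders of magnitude. Only the column names come from the problem; the numbers are illustrative, chosen as powers of ten so the result is easy to read.

```python
import numpy
import pandas

# Hypothetical densities (people per square mile), illustrative only.
density = pandas.DataFrame({
    "Population.Population per Square Mile": [1.0, 10.0, 1000.0, 100000.0],
})

# log10 compresses the long right tail: each unit step is a tenfold change in density.
density["Log_Pop_SqMi"] = numpy.log10(density["Population.Population per Square Mile"])
print(density)
```

The resulting Log_Pop_SqMi values are 0, 1, 3 and 5. Note that numpy.log10 returns -inf (with a RuntimeWarning) for a density of exactly 0, so any such rows would need attention before modeling.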
In Answer1, create a DataFrame with the following form, sorted by per_point_diff. An example row is shown. You should be able to link the variables for 3109 counties using the given data; ignore any other counties.
Hints: StateNameData might help make the link. It is possible to join on multiple key columns.
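The hints suggest a two-step join: first map full state names to abbreviations through StateNameData, then join on state and county together. The sketch below shows that pattern on tiny made-up frames. Every column name is an assumption about the real files: state_name, county_name (election data), State and County (demographics), and the hypothetical Name and Abbreviation in StateNameData. The state and county names and all numbers are invented. The output column selection is also an assumption; the required layout is the one in the assignment's example row.

```python
import numpy
import pandas

# Toy stand-ins for the three CSV files; all column names and values are assumed.
elect = pandas.DataFrame({
    "state_name": ["Alpha", "Alpha", "Beta", "Gamma"],
    "county_name": ["North County", "South County", "North County", "East County"],
    "per_point_diff": [0.35, -0.10, 0.05, 0.20],
})
county = pandas.DataFrame({
    "State": ["AA", "AA", "BB"],  # two-letter state codes
    "County": ["North County", "South County", "North County"],
    "Population.Population per Square Mile": [12.0, 850.0, 95.0],
})
state = pandas.DataFrame({
    "Name": ["Alpha", "Beta", "Gamma"],  # hypothetical column names
    "Abbreviation": ["AA", "BB", "GG"],
})

# Step 1: translate full state names in the election data to abbreviations.
elect_abbr = elect.merge(state, left_on="state_name", right_on="Name", how="inner")

# Step 2: join on BOTH state and county. "North County" exists in two states,
# so the county name alone would be ambiguous.
linked = elect_abbr.merge(
    county,
    left_on=["Abbreviation", "county_name"],
    right_on=["State", "County"],
    how="inner",  # drops counties missing from either source
)

# Step 3: derive the density measure, keep the columns of interest, and sort.
linked["Log_Pop_SqMi"] = numpy.log10(linked["Population.Population per Square Mile"])
result = (
    linked[["State", "County", "per_point_diff", "Log_Pop_SqMi"]]
    .sort_values("per_point_diff")
    .reset_index(drop=True)
)
print(result)
```

Here East County has no demographics row, so the inner join drops it and three rows remain. On the real files, the problem's figure of 3109 linked counties is a useful check on len(result); if the count comes out lower, passing indicator=True to DataFrame.merge with how="outer" shows which rows failed to match, which often points to name differences that need normalizing first.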