Question: In this assignment we will use survey data on preferred programming languages for data science in a number of cities. We will use these results to predict the favorite programming languages for places that were NOT part of the survey.
There are two data files for this assignment: the longitude and latitude of states, and the survey data. You will use the longitude and latitude data to draw maps of states and display the survey results on the map. For this assignment we will use the survey data for New York state.
Step 1: Load data files longitude_latitude.csv and ny_survey.csv
The language column of the survey data contains the preferred programming language at the city located at the given longitude and latitude.
The two data files are shared at the link below.
https://drive.google.com/file/d/11L63R3G9LlGkDE6v6EM6gFXmZxHmqZID/view?usp=sharing
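Step 1 might be sketched in R as follows. The helper name load_assignment_data and its dir argument are made up for illustration, and the sketch assumes both files have already been downloaded into one folder. The assignment does not list the column names, so str() is suggested for inspecting them before going further.

```r
# Hypothetical helper: read both assignment files from one folder.
load_assignment_data <- function(dir = ".") {
  mapData <- read.csv(file.path(dir, "longitude_latitude.csv"),
                      stringsAsFactors = FALSE)
  surveyData <- read.csv(file.path(dir, "ny_survey.csv"),
                         stringsAsFactors = FALSE)
  list(map = mapData, survey = surveyData)
}

# Usage, assuming the files are in the working directory:
# dat <- load_assignment_data()
# str(dat$map)     # check the actual column names before plotting
# str(dat$survey)
```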
Step 2: Using the longitude and latitude data in the file longitude_latitude.csv, plot the map of California.
Use the function geom_path() from the ggplot2 library.
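A minimal sketch of drawing a state outline with geom_path() is shown here. The helper name plot_state is hypothetical, and the column names state, longitude and latitude, as well as the spelling of the state value, are assumptions that should be checked against the actual file.

```r
library(ggplot2)

# Hypothetical helper: outline one state with geom_path().
# Assumed columns in mapData: state, longitude, latitude (check with names()).
plot_state <- function(mapData, stateName) {
  stateData <- mapData[mapData$state == stateName, ]
  ggplot(stateData, aes(x = longitude, y = latitude)) +
    geom_path() +        # connects the points in row order
    coord_quickmap()     # keeps the map's proportions roughly right
}

# Usage (the state spelling is an assumption):
# caMap <- plot_state(mapData, "california")
```

geom_path() joins the points in the order they appear in the data, so the rows should already be in boundary order. If the file stores a state as several separate pieces, a grouping column would also be needed in aes(group = ...).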
Step 3: Draw map and display data
Part A) Using the longitude and latitude data in the file longitude_latitude.csv, plot the map of New York state.
Part B) Using the data in the file ny_survey.csv, plot the survey responses on the map you created in Part A.
Hint: You can plot survey data on the same map by adding the following statement to the code that generated the map/graph in Part A:
geom_point(data=surveyDataFrame, size = 3, aes(colour = factor(language)))
where surveyDataFrame is the data from the file ny_survey.csv
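Putting the hint together with the map code, Part B could look roughly like this. The helper name plot_survey_map is hypothetical, and the sketch assumes both data frames name their coordinate columns longitude and latitude.

```r
library(ggplot2)

# Hypothetical helper: state outline plus survey responses.
# Assumed columns: longitude, latitude in both data frames;
# language in surveyDataFrame.
plot_survey_map <- function(stateData, surveyDataFrame) {
  ggplot(stateData, aes(x = longitude, y = latitude)) +
    geom_path() +
    # x and y are inherited from ggplot(), so the survey columns must
    # carry the same names as the map columns
    geom_point(data = surveyDataFrame, size = 3,
               aes(colour = factor(language)))
}

# Usage: nyMap <- plot_survey_map(nyData, surveyDataFrame)
```

Because geom_point() inherits x and y from the ggplot() call, any aesthetic that exists only in the map data (such as a group column) is better placed inside geom_path() than in ggplot(), otherwise the point layer would look for it in the survey data.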
Step 4: Apply KNN
In this part we want to use the R implementation of K-Nearest Neighbors on the survey data for New York state to predict the favorite programming languages for places that were not part of the survey.
Use the knn() function to answer the following questions.
Question 1: For k=1, what is the preferred language for longitude=-75 and latitude=43.65?
Question 2: For k=2, what is the preferred language for longitude=-75 and latitude=43.65?
Question 3: For k=3, what is the preferred language for longitude=-75 and latitude=43.65?
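One way to set up these questions is sketched below, assuming the knn() meant here is the one in the class package. The helper name predict_language is hypothetical, and the survey column names longitude, latitude and language are assumptions. The sketch does not state the answers, since they depend on the actual survey data.

```r
library(class)

# Hypothetical helper around class::knn().
# Assumed columns in surveyData: longitude, latitude, language.
predict_language <- function(surveyData, lon, lat, k) {
  train <- surveyData[, c("longitude", "latitude")]
  test  <- data.frame(longitude = lon, latitude = lat)
  knn(train = train, test = test, cl = factor(surveyData$language), k = k)
}

# Usage for the three questions:
# for (k in 1:3) print(predict_language(surveyData, -75, 43.65, k))
```

knn() measures Euclidean distance on the raw coordinate values and breaks voting ties at random, so with k=2 the result can differ between runs when the two nearest cities disagree. Calling set.seed() first makes such a run repeatable.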
Step 5: Display the new point on the state map.
Use the map from Part B in Step 3 to display the data point at longitude=-75 and latitude=43.65.
You can use the same technique as before to add the new point to the existing map. Simply add the following statement to the code that generated the map in Part B in Step 3:
geom_point(data=testingDataFrame, shape=25, fill="blue", color="darkred", size=5)
where testingDataFrame is the test data frame for longitude=-75 and latitude=43.65
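The test data frame and the extra layer might be built as in this sketch. The helper name add_test_point is hypothetical, and the column names longitude and latitude are assumed to match the x and y aesthetics of the existing map.

```r
library(ggplot2)

# Hypothetical helper: add the query location to an existing map.
# Assumes the map's x and y aesthetics are named longitude and latitude.
add_test_point <- function(mapPlot, lon, lat) {
  testingDataFrame <- data.frame(longitude = lon, latitude = lat)
  mapPlot +
    geom_point(data = testingDataFrame, shape = 25, fill = "blue",
               color = "darkred", size = 5)
}

# Usage, where nyMap is the plot from Part B in Step 3:
# add_test_point(nyMap, -75, 43.65)
```

Shape 25 is a filled downward-pointing triangle, so fill sets its interior and color sets its border.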
