Question: In this assignment we will use survey data on preferred programming languages in data science from a number of cities. We will use these results to predict the favorite programming languages for places that were NOT part of the survey.

There are two data files for this assignment: the longitude and latitude of states, and the survey data. You will use the longitude and latitude data to draw maps of states and display the survey results on the map. For this assignment we will use the survey data for New York state.

Step 1: Load data files longitude_latitude.csv and ny_survey.csv

The language column of the survey data contains the preferred programming language in the city located at the given longitude and latitude.

The two data files are shared at the link below.

https://drive.google.com/file/d/11L63R3G9LlGkDE6v6EM6gFXmZxHmqZID/view?usp=sharing
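
A minimal sketch of Step 1, assuming both CSV files have been downloaded into the R working directory. The helper load_if_present() and the name stateDataFrame are hypothetical; surveyDataFrame is the name used in the hint in Step 3. The assignment does not list the column names, so inspect them with str() before plotting.

```r
# Hypothetical helper: read a CSV only if it exists in the working directory.
load_if_present <- function(path) {
  if (!file.exists(path)) {
    message("File not found: ", path)
    return(NULL)
  }
  read.csv(path, stringsAsFactors = FALSE)
}

stateDataFrame  <- load_if_present("longitude_latitude.csv")
surveyDataFrame <- load_if_present("ny_survey.csv")

# The column names are not given in the assignment, so check them first.
if (!is.null(stateDataFrame))  str(stateDataFrame)
if (!is.null(surveyDataFrame)) str(surveyDataFrame)
```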

Step 2: Using the longitude and latitude data in the file longitude_latitude.csv, plot the map of the state of California.

Use the function geom_path() from the ggplot2 library.
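
The following is a sketch of how geom_path() draws an outline, not a solution built on the real file. The data frame below is a made-up closed polygon, and the column names long, lat, and group are assumptions. With the real longitude_latitude.csv you would first keep only the rows for California, using whichever column identifies the state.

```r
library(ggplot2)

# Made-up outline (a closed polygon), NOT real state boundary data.
# Assumed column names: long, lat, group.
outline <- data.frame(
  long  = c(-124, -120, -114, -117, -124),
  lat   = c(42, 42, 35, 32.5, 42),
  group = 1
)

stateMap <- ggplot(outline, aes(x = long, y = lat)) +
  geom_path(aes(group = group)) +  # connects the points in row order
  coord_quickmap()                 # approximate aspect ratio for lat/long data

print(stateMap)
```

geom_path() joins observations in the order they appear in the data, so the boundary rows must stay in their original order. If the data contains several separate pieces, the group aesthetic keeps them from being joined to each other.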

Step 3: Draw map and display data

Part A) Using the longitude and latitude data in the file longitude_latitude.csv plot the map of the New York state.

Part B) Using the data in the file ny_survey.csv plot the survey responses on the map you created in Part A.

Hint: You can plot the survey data on the same map by adding the following statement to the code that generated the map in Part A:

geom_point(data=surveyDataFrame, size = 3, aes(colour = factor(language)))

where surveyDataFrame is the data from the file ny_survey.csv
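
A sketch of how the hint fits into a full plot, using made-up stand-in data rather than the real files. The column names long and lat are assumptions; language is the column named in Step 1. The hinted geom_point() call sets no x or y of its own, so it inherits them from ggplot(), which means the survey data frame needs the same coordinate column names as the map data. For that reason the group aesthetic is placed on geom_path() only here, so the point layer does not look for a group column.

```r
library(ggplot2)

# Made-up stand-ins, NOT the real assignment data.
outline <- data.frame(
  long  = c(-79.8, -73.3, -73.3, -79.8, -79.8),
  lat   = c(45.0, 45.0, 40.5, 40.5, 45.0),
  group = 1
)
surveyDataFrame <- data.frame(
  long     = c(-74.0, -75.5, -77.6, -73.8),
  lat      = c(40.7, 43.1, 43.2, 42.7),
  language = c("R", "Python", "R", "Python")
)

nyMap <- ggplot(outline, aes(x = long, y = lat)) +
  geom_path(aes(group = group)) +
  # Statement from the hint: one point per survey row, coloured by language.
  geom_point(data = surveyDataFrame, size = 3, aes(colour = factor(language)))

print(nyMap)
```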

Step 4: Apply KNN

In this part we want to use the R implementation of K-Nearest Neighbors on the survey data for New York state to predict the favorite programming languages for places that were not part of the survey.

Use knn() function to answer the following questions.

Question 1: For k=1, what is the preferred language for longitude=-75 and latitude=43.65?

Question 2: For k=2, what is the preferred language for longitude=-75 and latitude=43.65?

Question 3: For k=3, what is the preferred language for longitude=-75 and latitude=43.65?
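
A sketch of the knn() call from the class package, run on made-up survey rows, so its output says nothing about the answers for the real ny_survey.csv. The column names long and lat are assumptions. knn() takes the training coordinates, the test coordinates, and the training labels as a factor. It decides by majority vote and breaks voting ties at random, so with an even k such as 2 the result can change between runs unless a seed is set.

```r
library(class)

# Made-up survey rows, NOT the real ny_survey.csv contents.
surveyDataFrame <- data.frame(
  long     = c(-75.1, -74.0, -77.6, -73.8, -76.1),
  lat      = c(43.6, 40.7, 43.2, 42.7, 43.0),
  language = c("Python", "R", "R", "Python", "R")
)
testingDataFrame <- data.frame(long = -75, lat = 43.65)

train  <- surveyDataFrame[, c("long", "lat")]  # predictors only
labels <- factor(surveyDataFrame$language)     # class labels

set.seed(1)  # makes random tie-breaking reproducible (relevant for k = 2)
preds <- sapply(1:3, function(k) {
  as.character(knn(train = train, test = testingDataFrame, cl = labels, k = k))
})
names(preds) <- paste0("k=", 1:3)
print(preds)
```

The distance here is plain Euclidean distance on the longitude and latitude values in degrees, which is what knn() computes on the columns it is given.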

Step 5: Display the new point on the state map.

Use the map from Part B in Step 3 to display the data point at longitude=-75 and latitude=43.65

You can use the same technique as before to add the new point to the existing map. Simply add the following statement to the code that generated the map in Part B of Step 3:

geom_point(data=testingDataFrame, shape=25, fill="blue", color="darkred", size=5)

where testingDataFrame is the test data frame for longitude=-75 and latitude=43.65
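
A sketch of the final plot with the test point added, again on made-up stand-in data with assumed column names long and lat. Because the colour mapping to language lives inside the survey layer, testingDataFrame only needs the two coordinate columns.

```r
library(ggplot2)

# Made-up stand-ins, NOT the real assignment data.
outline <- data.frame(
  long  = c(-79.8, -73.3, -73.3, -79.8, -79.8),
  lat   = c(45.0, 45.0, 40.5, 40.5, 45.0),
  group = 1
)
surveyDataFrame <- data.frame(
  long     = c(-74.0, -75.5, -77.6, -73.8),
  lat      = c(40.7, 43.1, 43.2, 42.7),
  language = c("R", "Python", "R", "Python")
)
testingDataFrame <- data.frame(long = -75, lat = 43.65)

finalMap <- ggplot(outline, aes(x = long, y = lat)) +
  geom_path(aes(group = group)) +
  geom_point(data = surveyDataFrame, size = 3, aes(colour = factor(language))) +
  # Statement from the assignment: a filled downward triangle marks the new point.
  geom_point(data = testingDataFrame, shape = 25, fill = "blue",
             color = "darkred", size = 5)

print(finalMap)
```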
