Question: This dataset is extracted from the original source (shown above) to include weather information observed on July 15, 20
Description
This dataset is extracted from the original source shown above to include weather information
observed on July 15 only. Features of the data include:
date: month, day, and year when the observation is made
station id: ID of a location where the observation is made
lat: latitude coordinate
lon: longitude coordinate
mean temp: average temperature
wind speed
precipitation
The observations come from stations in states across the United States. Our goal is to apply
clustering methods to group locations with similar weather conditions.
Preprocessing
Import the data from weather.csv and save it as myData. Hint: use read.csv().
Use summary() to check whether any columns contain NA values.
Remove rows with an NA station ID.
Check whether the data contain duplicated station IDs.
There are still NA values in the wind speed and precipitation columns. We will fill in those missing
values using these steps:
a) Fill NA precipitation with value. Hint: is.na() returns TRUE for NA entries and FALSE otherwise.
b) Fill NA wind speed values with the national median.
c) Can you think of other ways of filling in the missing values?
Remove observations from Alaska and Hawaii.
Let us visualize the data. Make a scatter plot using ggplot(), with longitude on the x-axis and
latitude on the y-axis, and use the column mean temp to color the points.
Optional: We can change the color gradient by adding
scale_color_gradient(low = "color1", high = "color2")
where color1 and color2 are color names, e.g., gold, red, etc.
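The preprocessing steps above can be sketched in R as follows. This is a hedged sketch, not the official solution: the column names (station_id, lat, lon, mean_temp, wind_speed, precipitation) are assumptions about the file's header, the precipitation fill value of 0 is a hypothetical stand-in for the value that is missing from the question, and Alaska and Hawaii are removed with a rough latitude/longitude box because no state column is listed among the features. If weather.csv is not present, a few hypothetical toy rows are used so the code still runs.

```r
# ASSUMPTION: column names station_id, lat, lon, mean_temp, wind_speed,
# precipitation; adjust them to the real header of weather.csv.
if (file.exists("weather.csv")) {
  myData <- read.csv("weather.csv")
} else {
  # HYPOTHETICAL toy rows, only so the sketch runs without the real file
  myData <- data.frame(
    station_id    = c("A", "B", NA, "C", "D", "E"),
    lat           = c(40.7, 34.1, 41.9, 61.2, 21.3, 39.7),
    lon           = c(-74.0, -118.2, -87.6, -149.9, -157.9, -105.0),
    mean_temp     = c(78, 75, 74, 60, 82, 72),
    wind_speed    = c(8.1, NA, 10.2, 6.5, 12.0, 7.3),
    precipitation = c(0.1, NA, 0.0, 0.3, NA, 0.0)
  )
}

summary(myData)                                  # shows NA counts per column

# Drop rows whose station ID is missing
myData <- myData[!is.na(myData$station_id), ]

# TRUE if any station ID appears more than once
any(duplicated(myData$station_id))

# a) Fill NA precipitation; 0 is a HYPOTHETICAL choice (value elided in source)
precip_fill <- 0
myData$precipitation[is.na(myData$precipitation)] <- precip_fill

# b) Fill NA wind speed with the national median, ignoring NAs
wind_median <- median(myData$wind_speed, na.rm = TRUE)
myData$wind_speed[is.na(myData$wind_speed)] <- wind_median

# Remove Alaska and Hawaii. ASSUMPTION: no state column, so keep only points
# inside a rough lat/lon box around the contiguous United States.
in_lower48 <- myData$lat >= 24 & myData$lat <= 50 &
  myData$lon >= -125 & myData$lon <= -66
myData <- myData[in_lower48, ]

# Scatter plot of station locations, colored by mean temperature
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(myData, aes(x = lon, y = lat, color = mean_temp)) +
    geom_point() +
    scale_color_gradient(low = "gold", high = "red")  # optional gradient
  print(p)
}
```

If the file does include a state column, filtering on it directly is more precise than the box, which would also drop any stations in territories outside the contiguous states.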
Clustering: We are going to apply clustering algorithms using the weather features average
temperature, wind speed, and precipitation.
Save a subset of myData containing the above columns only.
Apply the K-means clustering algorithm to group the data points into clusters.
a) Report the number of data points in each cluster.
b) For each cluster, report the average temperature, wind speed, and precipitation.
c) Repeat the earlier scatter-plot step, but this time use cluster membership to color the points.
Use as.factor() to convert the cluster membership before passing it to ggplot().
Optional: We can manually choose colors for each group by adding
scale_color_manual(values = c("color1", "color2", ...))
with one color name per cluster.
Apply hierarchical clustering using the complete-linkage method on the subset created above.
a) Make a dendrogram.
b) Cut the tree to get clusters.
c) Repeat the scatter-plot step to visualize the clustering output.
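The clustering steps above can be sketched in R with the base functions kmeans(), hclust(), dist(), and cutree(). This is a hedged sketch under stated assumptions: the number of clusters k = 3 is hypothetical (the required number is missing from the question), the feature column names are assumed, and a toy data frame is generated only when myData from the preprocessing steps is not available.

```r
# ASSUMPTION: myData is the preprocessed data frame with columns lat, lon,
# mean_temp, wind_speed, precipitation. If it is missing, build HYPOTHETICAL
# toy data so the sketch still runs.
if (!exists("myData")) {
  set.seed(1)
  myData <- data.frame(
    lat           = runif(30, 25, 49),
    lon           = runif(30, -124, -67),
    mean_temp     = c(rnorm(15, 70, 3), rnorm(15, 85, 3)),
    wind_speed    = runif(30, 3, 15),
    precipitation = c(rep(0, 20), runif(10, 0.1, 1))
  )
}

# Subset containing only the three weather features
features <- myData[, c("mean_temp", "wind_speed", "precipitation")]

k <- 3  # HYPOTHETICAL number of clusters; the required value is elided

# K-means starts from random centers, so fix the seed and try several starts
set.seed(42)
km <- kmeans(features, centers = k, nstart = 20)

table(km$cluster)                                                 # K-means a): sizes
aggregate(features, by = list(cluster = km$cluster), FUN = mean)  # K-means b): means

# Hierarchical clustering on Euclidean distances with complete linkage
hc <- hclust(dist(features), method = "complete")
plot(hc, labels = FALSE, main = "Complete-linkage dendrogram")     # dendrogram
hc_cluster <- cutree(hc, k = k)                                     # cut into k groups
table(hc_cluster)

# Maps colored by cluster membership; factors give discrete colors
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  myData$km_cluster <- as.factor(km$cluster)
  myData$hc_cluster <- as.factor(hc_cluster)
  print(ggplot(myData, aes(x = lon, y = lat, color = km_cluster)) + geom_point())
  print(ggplot(myData, aes(x = lon, y = lat, color = hc_cluster)) + geom_point())
}
```

The sketch clusters the features on their original scales, as the question describes. Because temperature typically spans a much wider numeric range than precipitation, it tends to dominate the Euclidean distances; standardizing first with scale(features) is a common alternative worth comparing.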
