
Description

This dataset is extracted from the original source (shown above) to include weather information observed on July 15, 2020 only. Features of data include:

date: month, day, and year when the observation is made

station id: ID of a location where the observation is made

lat: latitude coordinate

lon: longitude coordinate

mean temp: average temperature

wind speed

precipitation

The observations are made from 50 states across the United States. Our goal is to apply clustering methods to group locations with similar weather conditions.

Pre-processing

1. Import data from weather.csv, save it as myData. Hint: use read.csv().
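
A minimal sketch of the import step. The real weather.csv is not available here, so the block first writes a tiny stand-in file; the column names (`station_id`, `mean_temp`, `wind_speed`, and so on) and every value are assumptions made for illustration and may not match the actual file.

```r
# Toy stand-in for weather.csv; column names and values are made up.
csv_path <- file.path(tempdir(), "weather.csv")
writeLines(c(
  "date,station_id,lat,lon,mean_temp,wind_speed,precipitation",
  "2020-07-15,A001,40.7,-74.0,78.5,6.2,0.00",
  "2020-07-15,A002,34.1,-118.2,72.3,,",
  "2020-07-15,,61.2,-149.9,58.0,4.1,0.12"
), csv_path)

# read.csv() uses the first row as column names. Listing "" in na.strings
# makes empty text fields (such as a blank station ID) come in as NA too.
myData <- read.csv(csv_path, na.strings = c("", "NA"))
str(myData)
```

With the real file, the call would likely reduce to `myData <- read.csv("weather.csv")`, run from the folder that contains it.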

2. Use summary() to check if there are columns with NA values.
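
One possible way to look for missing values, shown on a small made-up data frame (the column names are assumptions). `summary()` reports an `NA's` count for numeric columns, and `colSums(is.na(...))` gives a count for every column, including text ones.

```r
# Made-up rows standing in for myData.
myData <- data.frame(
  station_id    = c("A001", "A002", NA, "A004"),
  mean_temp     = c(78.5, 72.3, 58.0, 81.1),
  wind_speed    = c(6.2, NA, 4.1, 7.5),
  precipitation = c(0, NA, 0.12, NA)
)

# summary() shows an "NA's" entry for each numeric column with missing values.
print(summary(myData))

# A direct per-column count of NA values, covering character columns as well.
na_counts <- colSums(is.na(myData))
print(na_counts)
```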

3. Remove rows with NA station ID.

4. Check if data contains duplicated station ID.
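
Steps 3 and 4 could be sketched as follows, again on made-up rows. The column name `station_id` is an assumption about how the ID column is spelled in the file.

```r
# Made-up rows: one missing ID and one repeated ID.
myData <- data.frame(
  station_id = c("A001", "A002", NA, "A002", "A005"),
  mean_temp  = c(78.5, 72.3, 58.0, 72.9, 81.1)
)

# Step 3: keep only the rows whose station ID is present.
myData <- myData[!is.na(myData$station_id), ]

# Step 4: duplicated() flags the second and later occurrences of a value,
# so its sum is the number of repeated rows.
n_dup <- sum(duplicated(myData$station_id))
dup_ids <- unique(myData$station_id[duplicated(myData$station_id)])
print(n_dup)
print(dup_ids)
```

A result of 0 from `sum(duplicated(...))` would mean every station appears once.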

5. There are still NA values in columns wind speed and precipitation. We will fill those empty values using these steps:

(a) Fill NA precipitation with value 0. Hint: is.na() returns TRUE/FALSE for NA values.

(b) Fill NA values of wind speed with the national median value.

(c) Can you think of other ways of filling in the missing values?
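
A small sketch of parts (a) and (b), assuming the columns are named `precipitation` and `wind_speed`. Here "national median" is read as the median over all stations in the data, which is an interpretation rather than something the assignment spells out.

```r
# Made-up values with gaps in both columns.
myData <- data.frame(
  wind_speed    = c(6.2, NA, 4.1, 7.5, NA),
  precipitation = c(0, NA, 0.12, NA, 0.30)
)

# (a) is.na() returns a logical vector, which can index the entries to overwrite.
myData$precipitation[is.na(myData$precipitation)] <- 0

# (b) Median of the observed wind speeds; na.rm = TRUE skips the NA entries.
wind_median <- median(myData$wind_speed, na.rm = TRUE)
myData$wind_speed[is.na(myData$wind_speed)] <- wind_median

print(myData)
```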

6. Remove observations from Alaska and Hawaii.
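
The feature list has no state column, so one possible approach is to filter on coordinates. The sketch below keeps stations inside an approximate bounding box for the contiguous United States; the box limits and the station IDs are assumptions chosen for illustration. If the real file does carry a state field, filtering on that would be simpler.

```r
# Made-up stations: three in the lower 48, one in Alaska, one in Hawaii.
myData <- data.frame(
  station_id = c("NYC", "LAX", "ANC", "HNL", "MIA"),
  lat = c(40.7, 34.1, 61.2, 21.3, 25.8),
  lon = c(-74.0, -118.2, -149.9, -157.9, -80.2)
)

# Approximate bounding box of the contiguous United States (an assumption).
# Alaska lies north of it and Hawaii lies south and west of it.
in_lower48 <- myData$lat >= 24 & myData$lat <= 50 &
  myData$lon >= -125 & myData$lon <= -66

myData <- myData[in_lower48, ]
print(myData$station_id)
```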

7. Let us visualize the data. Make a scatter plot using ggplot. Use longitude (x-axis) and latitude (y-axis) for coordinates, and use column mean temp to color points. (Optional) We can change the color gradient by adding + scale_color_gradient(low = "color1", high = "color2"), where color1 and color2 are color names (e.g., gold, red, etc.).
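
A minimal sketch of such a plot with ggplot2, using made-up coordinates and temperatures. The column names `lon`, `lat`, and `mean_temp` are assumptions, and gold and red are just the example colors named in the question.

```r
library(ggplot2)

# Made-up stations in the contiguous United States.
myData <- data.frame(
  lat = c(40.7, 34.1, 25.8, 47.6, 39.7),
  lon = c(-74.0, -118.2, -80.2, -122.3, -105.0),
  mean_temp = c(78.5, 72.3, 84.0, 66.2, 75.1)
)

# Longitude on x, latitude on y, and a continuous color scale for temperature.
p <- ggplot(myData, aes(x = lon, y = lat, color = mean_temp)) +
  geom_point() +
  scale_color_gradient(low = "gold", high = "red")
print(p)
```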

Clustering: We are going to apply clustering algorithms using weather features (average temperature, wind speed, and precipitation).

8. Save a subset of myData containing the above 3 columns only.
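
Selecting the three weather columns could look like the sketch below. The column names and the name `weather` for the subset are assumptions.

```r
# Made-up rows with identifier, coordinate, and weather columns.
myData <- data.frame(
  station_id    = c("A001", "A002", "A003"),
  lat           = c(40.7, 34.1, 25.8),
  lon           = c(-74.0, -118.2, -80.2),
  mean_temp     = c(78.5, 72.3, 84.0),
  wind_speed    = c(6.2, 5.0, 9.1),
  precipitation = c(0, 0, 0.4)
)

# Index by column name to keep only the three clustering features.
weather <- myData[, c("mean_temp", "wind_speed", "precipitation")]
print(weather)
```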

9. Apply K-means clustering algorithm to group the data points to 5 clusters.

(a) Report the number of data points in each cluster.

(b) For each cluster, report the average temperature, wind speed, and precipitation.

(c) Repeat Question 7 to make a scatter plot. This time, we will use cluster membership (e.g., 1, 2, 3, 4, 5) to show different colors. Use as.factor() to convert the cluster membership before providing it to ggplot. (Optional) We can manually choose colors for each group by adding + scale_color_manual(values = c("color1", "color2", "color3", "color4", "color5")).
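
A hedged sketch of the K-means step on randomly generated stand-in data. The column names, the `weather` subset name, the seed, the `nstart` value, and the five colors are all assumptions; only `kmeans()`, `as.factor()`, and `scale_color_manual()` come from the question or base R and ggplot2.

```r
library(ggplot2)

set.seed(1)  # k-means starts from random centers, so fix the seed to repeat results
n <- 60
myData <- data.frame(  # random stand-in for the cleaned weather data
  lat = runif(n, 25, 49),
  lon = runif(n, -124, -67),
  mean_temp = rnorm(n, mean = 75, sd = 8),
  wind_speed = runif(n, 0, 15),
  precipitation = rexp(n, rate = 4)
)
weather <- myData[, c("mean_temp", "wind_speed", "precipitation")]

# nstart = 10 tries several random starts and keeps the best one.
km <- kmeans(weather, centers = 5, nstart = 10)

# (a) Number of points in each cluster.
print(km$size)

# (b) Mean of each feature within each cluster.
cluster_means <- aggregate(weather, by = list(cluster = km$cluster), FUN = mean)
print(cluster_means)

# (c) Same map as in Question 7, colored by cluster membership.
myData$cluster <- as.factor(km$cluster)
p <- ggplot(myData, aes(x = lon, y = lat, color = cluster)) +
  geom_point() +
  scale_color_manual(values = c("gold", "red", "blue", "darkgreen", "purple"))
print(p)
```

The features are clustered on their raw scales here, as the question asks; since temperature has the widest numeric range, it will tend to dominate the distances.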

10. Apply hierarchical clustering (use complete link method) on the subset data created in Question 8.

(a) Make a dendrogram.

(b) Trim the clustering to get 3 clusters.

(c) Repeat Question 7 to visualize the clustering output.
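
One way the hierarchical clustering step might look, again on randomly generated stand-in data with assumed column names. It uses Euclidean distance, which is the default of `dist()` and is an assumption since the question does not name a distance.

```r
library(ggplot2)

set.seed(1)
n <- 40
myData <- data.frame(  # random stand-in for the cleaned weather data
  lat = runif(n, 25, 49),
  lon = runif(n, -124, -67),
  mean_temp = rnorm(n, mean = 75, sd = 8),
  wind_speed = runif(n, 0, 15),
  precipitation = rexp(n, rate = 4)
)
weather <- myData[, c("mean_temp", "wind_speed", "precipitation")]

# hclust() takes a distance matrix; method = "complete" is complete linkage.
hc <- hclust(dist(weather), method = "complete")

# (a) Dendrogram; leaf labels are hidden to keep the plot readable.
plot(hc, labels = FALSE, main = "Complete-link clustering")

# (b) Cut the tree so that exactly 3 clusters remain.
groups <- cutree(hc, k = 3)
print(table(groups))

# (c) Same map as in Question 7, colored by the 3 cluster labels.
myData$cluster <- as.factor(groups)
p <- ggplot(myData, aes(x = lon, y = lat, color = cluster)) +
  geom_point()
print(p)
```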
