Fisher's (1936) iris data set provides the measurements in centimeters of the variables sepal length and...
Question:
Transcribed Image Text:
Fisher's (1936) iris data set provides the measurements, in centimeters, of sepal length and width and petal length and width for 150 flowers (50 flowers from each of the 3 species of iris). The species are Iris setosa, versicolor, and virginica. iris is a data frame with 150 cases (rows) and 5 variables (columns) named Sepal.Length, Sepal.Width, Petal.Length, Petal.Width, and Species.

a) After exploratory data analysis, we concluded that Petal.Length and Petal.Width were similar within the same species but varied considerably between species. We therefore decided to conduct our clustering analysis using the variables Petal.Length and Petal.Width. What data pre-processing steps might you perform to get the raw data ready for the desired clustering analysis? Consider the first 6 observations of the data presented in Figure 1, and explain your reasoning. (1 pt)

> head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa
Figure 1: First six observations of the data set

b) Figure 2 was produced by k-means clustering with 25 randomly selected starting points. It gives the within sum of squares for a selected range of cluster numbers. Using this output and the elbow method, determine the number of clusters and explain your reasoning. (1 pt)

c) For a given number of clusters, the ratio of the between sum of squares to the total sum of squares is 94.3 percent. What does this value mean with respect to the quality of the clustering?

[Figure 2: Within sum of squares values for selected numbers of clusters. x-axis: Number of Clusters (up to 14); y-axis: Within Sum of Squares (0 to 500).]

d) Figure 3 on the following page provides the dendrogram for a sample of the data set. How many clusters would be appropriate given the species labels? Explain your reasoning.

e) Using Figure 3, in order to group the data into 3 clusters, at which height (value or range) would you draw the horizontal line?

[Figure 3: Dendrogram. "Cluster Dendrogram", hclust(*, "ward.D") on dist(iris40); leaves carry species labels such as C and D; y-axis: Height, 0 to 40.]
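The question does not show the code behind Figures 2 and 3. The R sketch below is one plausible way to produce comparable output with base R's kmeans, hclust and cutree; it is not the question author's code. Assumptions not stated in the question: the raw (unscaled) petal columns are clustered, the seed is 1, k runs from 1 to 14 to match the axis of Figure 2, k = 3 is used for the part (c) ratio, and iris40 is a hypothetical random sample of 40 rows, since the question does not say how its sample was drawn.

```r
# Assumption: cluster the raw petal columns (both already in centimeters)
x <- iris[, c("Petal.Length", "Petal.Width")]

set.seed(1)  # arbitrary seed so the random starting points are reproducible

# Parts (b): total within-cluster sum of squares for k = 1..14, 25 random starts each
ks <- 1:14
wss <- sapply(ks, function(k) kmeans(x, centers = k, nstart = 25)$tot.withinss)
plot(ks, wss, type = "b",
     xlab = "Number of Clusters", ylab = "Within Sum of Squares")
# Elbow method: pick the k after which the curve stops dropping sharply

# Part (c): share of the total variation accounted for by the cluster split
fit <- kmeans(x, centers = 3, nstart = 25)  # k = 3 is an assumption here
ratio <- fit$betweenss / fit$totss          # betweenss + tot.withinss = totss
print(round(100 * ratio, 1))

# Parts (d)-(e): Ward hierarchical clustering on a hypothetical 40-row sample
idx <- sample(nrow(iris), 40)
iris40 <- x[idx, ]
hc <- hclust(dist(iris40), method = "ward.D")

# Merge heights in increasing order; a horizontal cut strictly between the
# third-highest and second-highest merge leaves exactly 3 clusters
h <- sort(hc$height)
lower <- h[length(h) - 2]
upper <- h[length(h) - 1]
groups <- cutree(hc, h = (lower + upper) / 2)

# Compare the 3 clusters with the species labels of the sampled rows
table(groups, iris$Species[idx])
```

A ratio close to 100 percent means most of the total sum of squares lies between clusters rather than within them, i.e. the clusters are compact and well separated relative to the overall spread.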
Expert Answer:
Explanation (a): Based on the information provided in the image, the data pre-processing steps I would recommend for conducting the desired clustering analysis using the variables Petal.Length and Petal.Wid...
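The answer above is cut off. As a rough illustration of the kind of pre-processing part (a) asks about, here is a minimal R sketch: keep only the two clustering variables, drop the Species label (it is what the clusters are later compared against, not an input), check for missing values, and optionally standardize. Standardizing is an assumption rather than a requirement here: both variables are in centimeters, but Petal.Length spans a wider range (about 1 to 6.9 cm) than Petal.Width (about 0.1 to 2.5 cm), so on the raw scale it carries more weight in Euclidean distances.

```r
# Keep only the two clustering variables; Species is a label, not an input
petal <- iris[, c("Petal.Length", "Petal.Width")]

# Distances cannot be computed with missing values, so check for them first
stopifnot(!anyNA(petal))

# Optional (assumption): z-score each column so both variables contribute
# on a comparable scale to the Euclidean distance
petal_scaled <- scale(petal)

head(petal)         # the petal columns of the six rows shown in Figure 1
head(petal_scaled)  # the same rows after centering and scaling
```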