Step 1 (Exploratory Data Analysis)

In this task, you will predict the species of a flower from its characteristics. In particular, you will cluster Iris flowers according to which of the following three Iris species they belong to: Setosa, Versicolour, or Virginica. To perform this task, you need a data set containing the characteristics of various flowers of these three species. One such data set is the well-known Iris data set from the University of California at Irvine Machine Learning Repository:

https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data

It consists of information on 150 Iris flowers, 50 from each of three Iris species: Setosa, Versicolour, and Virginica. Each flower is characterized by four attributes:

sepal length in centimetres
sepal width in centimetres
petal length in centimetres
petal width in centimetres

In this step, as demonstrated in the case study, you are going to use Python code to:

retrieve the data from the above repository and clean it, i.e., check for any NaN (Not a Number) or NA values, remove them if present, and convert the data to an appropriate format so that it contains only the required attributes or labels, i.e., sepal width, sepal length, petal length, and petal width. Use the head() method to display your data; you must show the first 10 rows.

extract meaningful statistics. These include summary statistics of your data: the number of records (count), mean, standard deviation, minimum, and maximum for each label or attribute. Then obtain the count, mean, standard deviation, minimum, and maximum for each species.
provide appropriate visualisations of your dataset that show statistical significance and clusters in your data. You must use at least pairplot() and scatterplot(). Your scatter plot should show the petal width attribute against the petal length attribute from your dataset.
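
The retrieval, cleaning, and summary steps listed above might be sketched with pandas as follows. This is only one possible approach, not the case study's own code: the column names are assumptions (the raw file has no header row), and the species column is kept here so that per-species statistics can be computed.

```python
import pandas as pd

# URL quoted in the task. The column names below are assumed for illustration.
IRIS_URL = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
FEATURES = ["sepal_length", "sepal_width", "petal_length", "petal_width"]


def load_iris(source=IRIS_URL):
    """Read the raw CSV (URL, path, or file-like object) and drop incomplete rows."""
    df = pd.read_csv(source, header=None, names=FEATURES + ["species"])
    print("Missing values per column:")
    print(df.isna().sum())
    df = df.dropna().reset_index(drop=True)
    # Make sure the four measurements are stored as floats.
    df[FEATURES] = df[FEATURES].astype(float)
    return df


def summarise(df):
    """Return count/mean/std/min/max overall and for each species."""
    stats = ["count", "mean", "std", "min", "max"]
    overall = df[FEATURES].agg(stats)
    per_species = df.groupby("species")[FEATURES].agg(stats)
    return overall, per_species
```

With these helpers, `load_iris().head(10)` would show the first 10 rows, and `summarise(df)` returns two tables: one for the whole data set and one broken down by species.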
From the above Exploratory Data Analysis, you should conclude and show that there are 3 clusters in your dataset.

Step 2 (Plotting a single-linkage dendrogram)

In this step, you limit your dataset to the first 6 rows of sepal width and sepal length.

You are going to develop Python code that uses the Agglomerative Hierarchical Clustering algorithm with Euclidean distance and single linkage (merging the two clusters with the minimum distance between them) to cluster the above 6 data records and display the result as a dendrogram.
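
As a cross-check for hand-written code, SciPy's hierarchical clustering routines can produce the same kind of single-linkage tree. The sketch below is not the step-by-step algorithm the task asks for. It assumes the six points are the first six rows of iris.data, transcribed here by hand, and the labels P1 to P6 are made up for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage

# First six rows of iris.data as (x, y) = (sepal width, sepal length).
# Transcribed from the UCI file; verify against your own download.
points = np.array([
    [3.5, 5.1],
    [3.0, 4.9],
    [3.2, 4.7],
    [3.1, 4.6],
    [3.6, 5.0],
    [3.9, 5.4],
])

# Each row of Z records one merge:
# [cluster index a, cluster index b, merge distance, size of the new cluster].
Z = linkage(points, method="single", metric="euclidean")
print(np.round(Z, 4))

# no_plot=True returns the tree layout without drawing anything;
# remove it and call matplotlib's plt.show() to see the figure.
labels = [f"P{i + 1}" for i in range(len(points))]
tree = dendrogram(Z, labels=labels, no_plot=True)
print(tree["ivl"])  # leaf order along the horizontal axis
```

The third column of `Z` gives the heights at which clusters join, which is what the dendrogram's vertical axis shows.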

Here you are required to look at your data and try to understand it. You can treat sepal width as the 'x' coordinate and sepal length as the 'y' coordinate. To start, you can use scatterplot() to draw each point. Then use your lecture on Hierarchical Clustering Algorithms to plot a dendrogram, using single linkage to merge the two closest clusters at each step. You must write a step-by-step algorithm for your specific Agglomerative Hierarchical Clustering, with its associated flow chart. For each epoch your code must show:

Your calculations.
Your clusters based on the updated distance matrix.
Your dendrogram progress and the new clusters in your dendrogram plot.
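
The per-epoch requirements above could be met with a loop like the following minimal sketch. It uses only the standard library, prints the distance calculations and the resulting clusters for each epoch, and leaves the dendrogram drawing out. The points are assumed to be the first six rows of iris.data, transcribed by hand, and the names `POINTS`, `single_link`, and `agglomerate` are hypothetical.

```python
import math
from itertools import combinations

# First six rows of iris.data as (x, y) = (sepal width, sepal length).
# Transcribed from the UCI file; verify against your own download.
# P1..P6 are illustrative labels, not part of the data set.
POINTS = {
    "P1": (3.5, 5.1),
    "P2": (3.0, 4.9),
    "P3": (3.2, 4.7),
    "P4": (3.1, 4.6),
    "P5": (3.6, 5.0),
    "P6": (3.9, 5.4),
}


def single_link(a, b):
    """Single linkage: distance between the closest pair of points in a and b."""
    return min(math.dist(POINTS[p], POINTS[q]) for p in a for q in b)


def agglomerate(verbose=True):
    """Merge the two closest clusters until one remains; return the merge log."""
    clusters = [(name,) for name in POINTS]  # every point starts as its own cluster
    merges = []
    while len(clusters) > 1:
        # Recompute the cluster-to-cluster distance matrix for this epoch.
        dist = {(a, b): single_link(a, b) for a, b in combinations(clusters, 2)}
        # Pick the closest pair; if several pairs tie, the first one found is used.
        (a, b), height = min(dist.items(), key=lambda item: item[1])
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
        merges.append((a, b, height))
        if verbose:
            print(f"Epoch {len(merges)}")
            for (u, v), d in dist.items():
                print(f"  d({'+'.join(u)}, {'+'.join(v)}) = {d:.4f}")
            print(f"  merge {'+'.join(a)} and {'+'.join(b)} at height {height:.4f}")
            print(f"  clusters: {['+'.join(c) for c in clusters]}")
    return merges


merges = agglomerate()
```

Each entry of `merges` holds the two clusters joined and the height of the join, which is the information needed to draw the dendrogram one level at a time. Several pairs in this sample are equally far apart, so the order of tied merges depends on the tie-breaking rule chosen.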

Step 3

Answer the following questions:

How do you interpret your dendrogram? How many species does it show?
