Question: Please help to solve this problem. For the coding part, please use Java or Python.
In this problem the Iris data set will be used to begin understanding how to apply the algorithms in the first four modules to a well-known data set. The Iris Plants Database contains 3 classes of 50 instances each, where each class refers to a type of Iris plant. Four attributes/features (in centimeters) were collected for each plant instance. A fifth attribute is provided, which is the class label of the plant type. The data can be downloaded as iris.arff from the Sample Weka Data Sets webpage (https://storm.cis.fordham.edu/~gweiss/data-mining/datasets.html).


4. Outlier Removal (25 points)
(a) Develop an algorithm (pseudocode) to remove, in sequential order, the observations that are furthest from the data class mean.
(b) Provide the running time and total running time of your algorithm in O-notation and T(n).
(c) Implement your algorithm in your language of choice.
(d) Determine if the data contains an outlier by plotting each class individually. The key is to plot two features at a time in different combinations, e.g., feature 1 vs. feature 2, etc.
(e) Provide an explanation of the results:
    i. Was there any class that had obvious outliers? If so, how did you determine the outlier? If not, why not?
    ii. What was the metric used to determine separation? Explain why the metric was chosen.
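One possible starting point for parts (a) and (c) is sketched below in Python. It is only a sketch under stated assumptions, not a definitive answer: the problem does not fix a distance metric, so Euclidean distance is assumed here; the class mean is assumed to be recomputed after every removal; and the number of observations to remove, k, is assumed to be chosen by the user. The names class_mean, remove_farthest, and sample are hypothetical, and the toy rows are made up for illustration (they are not real Iris measurements).

```python
import math


def class_mean(rows):
    """Return the per-feature mean of a non-empty list of feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def remove_farthest(rows, k):
    """Remove up to k observations one at a time.

    Each pass drops the row with the largest Euclidean distance
    (an assumed metric) to the current class mean, then the mean is
    recomputed on the remaining rows.
    Returns (kept_rows, removed_rows); removed_rows is in removal order.
    """
    kept = [list(r) for r in rows]  # copy so the caller's data is untouched
    removed = []
    for _ in range(min(k, len(kept) - 1)):  # always keep at least one row
        mean = class_mean(kept)
        # index of the row farthest from the current mean
        far = max(range(len(kept)), key=lambda i: math.dist(kept[i], mean))
        removed.append(kept.pop(far))
    return kept, removed


# Hypothetical toy data for a single class with two features
sample = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.1], [5.0, 5.0]]
kept, removed = remove_farthest(sample, 1)
```

For the real data, the function would be applied to the rows of one class at a time. Under these assumptions, each pass computes the mean and the distances over n rows with d features, so one removal costs O(n·d) and removing k observations costs O(k·n·d) in total, which is relevant to part (b).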
