14.9 Project 5: Machine Learning

Overview

In this project, we will explore some basic concepts in artificial intelligence. Using the concepts you have learned thus far in the course, you will design a machine learning method that can identify a flower based on four characteristics:

- Sepal length
- Sepal width
- Petal length
- Petal width

Your program will differentiate between three types of iris flowers:

- Iris-setosa
- Iris-versicolor
- Iris-virginica

You must write these 9 functions, but you can write more if you wish:

- readData: read data from data files
- display: display the loaded data
- mean: calculate the average across an array of values
- stddev: calculate the standard deviation across an array of values
- stats: display mean and standard deviation of each characteristic
- distance: how similar two flowers are based on Euclidean distance
- nearestNeighbor: find the flower most similar to another
- accuracy: calculate how accurate your machine learning method is
- main: the main function

These functions are discussed in more detail in the following sections. Functions can and should make use of each other. For example, the stddev function would call mean as part of calculating the standard deviation.

Command-line Arguments

The program will accept three command-line arguments:

- training data filename
- testing data filename
- action, one of:
    - display
    - stats
    - accuracy
    - classify

Example:

    ./a.out train.data test.data display

If the number of arguments is less than or greater than expected, print the following message and terminate:

    Usage: ./a train_filename test_filename [display|stats|accuracy|classify]

If the action is invalid, print the following message and terminate:

    Invalid action
    Usage: ./a train_filename test_filename [display|stats|accuracy|classify]

Read Data

You have been given two example files: train.data and test.data. For now, we will focus on train.data. Your first task will be to read the data in this file. There will be up to 1,000 entries (inclusive) in this file.
The first four columns correspond to the four characteristics mentioned in the overview. The fifth column is the flower type these characteristics describe, called the 'label'. You will write a function to read the data in this file into five arrays. The first four arrays are for the four characteristics, and the fifth array stores the flower type.

The function definition should be:

    int readData(char filename[], double sepal_lengths[], double sepal_widths[], double petal_lengths[], double petal_widths[], int labels[], int *length);

You will notice the labels array is an array of integers. This is because in machine learning, it's common to number each label, as numbers are easier to work with than strings. When reading the file, store Iris-setosa as 0, Iris-versicolor as 1, and Iris-virginica as 2.

The number of records read from the file is passed back through the final parameter, length, which is a pointer. For the example files, this would be 120 for train.data and 30 for test.data. If the file does not exist, return 0; otherwise return 1.

In your main function, you should read the data for both files before doing anything else. If either call returns 0, immediately print the following error and terminate:

    Unable to open file FILENAME

where FILENAME is the filename passed to the function. If neither file can be opened, only print the training data error.

Examples:

    ./a.out not_a_file.txt another_fake_file.txt display
    Unable to open file not_a_file.txt

    ./a.out train.data another_fake_file.txt display
    Unable to open file another_fake_file.txt

Display Data

To ensure the data was loaded properly, you will write a function to print out all the stored values. The display function will iterate over each flower and print its sepal length, sepal width, petal length, petal width, and label.
Formatted as:

    (sepal length, sepal width, petal length, petal width)=> label

The function definition should be:

    display(double sepal_lengths[], double sepal_widths[], double petal_lengths[], double petal_widths[], int labels[], int length);

where the last parameter, length, is how many flowers there are (length of the arrays).

Example (first three lines when calling display on the train.data data):

    (5.100000,3.500000,1.400000,0.200000)=>0
    (4.900000,3.000000,1.400000,0.200000)=>0
    (4.700000,3.200000,1.300000,0.200000)=>0

Statistics

When working on a machine learning project, it's always important for the data scientist to become familiar with their data. One way to do this is to look at the statistics of your dataset. In this case, we will be interested in the mean and standard deviation for each of the values for each flower.

Mean

    double mean(double values[], int labels[], int filter, int length);

The mean method will take an array of values and an array of labels. However, we want to know the mean for a specific flower type. The desired flower type will be passed as filter.
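One possible reading of the filtered mean described above, given only as a minimal sketch: it averages the entries of values whose label equals filter and ignores the rest. The signature is the one given in the project text. Returning 0.0 when no entry matches the filter is an assumption, since the description does not say what should happen in that case.

```c
/* Sketch of a filtered mean: averages values[i] only where
   labels[i] == filter (0 = Iris-setosa, 1 = Iris-versicolor,
   2 = Iris-virginica, per the label numbering in the project text). */
double mean(double values[], int labels[], int filter, int length) {
    double sum = 0.0;
    int count = 0; /* number of entries whose label matches filter */

    for (int i = 0; i < length; i++) {
        if (labels[i] == filter) {
            sum += values[i];
            count++;
        }
    }

    /* Assumption: return 0.0 when no entry matches, which also
       avoids dividing by zero. */
    if (count == 0) {
        return 0.0;
    }
    return sum / count;
}
```

With this sketch, a call such as mean(petal_lengths, labels, 0, length) would give the average petal length over the Iris-setosa entries only. Dividing by the number of matching entries rather than by length is the key point: dividing by length would understate the mean whenever other flower types are present in the arrays.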
