Question: Section B: Programming Assignment Please solve the following problem by coding Python programs Data: You will be working on MNIST data, a dataset of thousands

Section B: Programming Assignment

Please solve the following problem by coding Python programs.

Data: You will be working on MNIST data, a dataset of thousands of images of handwritten digits. You can download the dataset here - https://www.kaggle.com/c/digit-recognizer/data.

"The data files train.csv and test.csv contain gray-scale images of hand-drawn digits, from zero through nine. Each image is 28 pixels in height and 28 pixels in width, for a total of 784 pixels in total. Each pixel has a single pixel-value associated with it, indicating the lightness or darkness of that pixel, with higher numbers meaning darker. This pixel-value is an integer between 0 and 255, inclusive. The training data set, (train.csv), has 785 columns. The first column, called "label", is the digit that was drawn by the user. The rest of the columns contain the pixel-values of the associated image."

You only need to download the train.csv file; test.csv is not required. The train.csv file contains 42k samples of images. To reduce the running time of the program, you will work with only 1,000 randomly selected samples out of these, with an equal number of samples belonging to each label (i.e., 100 samples of label '0', 100 samples of label '1', and so on).

Problem Statement:
1. Perform Naive Bayes (NB) classification and K Nearest Neighbor (KNN) classification on the above data.
2. Calculate the accuracy obtained using the NB and KNN methods.
3. Perform cross-validation using the NB and KNN methods, and compare the results with problem 2.

Task 1. You can use the implementations from sklearn for classification. You will need to sample 100 instances of each label (1,000 instances in total).

Task 2. Apply the NB and KNN methods on the whole dataset (described in Task 1) as the training dataset, and calculate the training accuracy for each method.

Task 3. Apply k-fold cross-validation using the sampled dataset generated in Task 1. Perform the k-fold cross-validation experiment using the following values: k = 2 and k = 4.

Task 4. Compared with the results you got in homework 2 (Decision Tree and Multi-Layer Perceptron), what is your observation regarding their results?

Deliverables:
1. Python source code in a zipped file;
2. Brief report including all your results and observations.
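
One possible way to approach Tasks 1 to 3 is sketched below. It is an illustration under stated assumptions, not a reference solution: the assignment does not say which Naive Bayes variant or how many neighbours to use, so GaussianNB and n_neighbors=5 are assumed here, and StratifiedKFold with shuffling is chosen so that every fold keeps the balanced label distribution (plain KFold would match the wording of the task equally well). The function names sample_balanced, make_models and run_experiment are hypothetical helpers introduced only for this sketch. Note that the k in k-fold is unrelated to the K in KNN.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier


def sample_balanced(df, per_class=100, seed=0):
    """Task 1: draw `per_class` random rows for every value of `label`."""
    return df.groupby("label").sample(n=per_class, random_state=seed)


def make_models():
    # Assumptions: Gaussian NB and 5 neighbours; the assignment fixes neither.
    return {"NB": GaussianNB(), "KNN": KNeighborsClassifier(n_neighbors=5)}


def run_experiment(df, per_class=100, folds=(2, 4), seed=0):
    """Return {"NB": {...}, "KNN": {...}} with training and CV accuracies."""
    sample = sample_balanced(df, per_class=per_class, seed=seed)
    X = sample.drop(columns="label").to_numpy()
    y = sample["label"].to_numpy()

    results = {}
    for name, model in make_models().items():
        # Task 2: fit on the whole sample and score on the same data.
        scores = {"train": model.fit(X, y).score(X, y)}
        # Task 3: mean held-out accuracy for each requested number of folds.
        for k in folds:
            cv = StratifiedKFold(n_splits=k, shuffle=True, random_state=seed)
            scores[f"cv{k}"] = cross_val_score(model, X, y, cv=cv).mean()
        results[name] = scores
    return results
```

With train.csv in the working directory, run_experiment(pd.read_csv("train.csv")) would return the training accuracy and the 2-fold and 4-fold cross-validation accuracy for each method. Training accuracy is measured on data the model has already seen, so it is usually an optimistic estimate compared with the cross-validation figures, which is the comparison problem 3 asks for.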
