Question: The classic MNIST Digit RecognizerLinks to an external site. problem is a competition on Kaggle.com, and you will compete in this competition. For this assignment,

The classic MNIST Digit RecognizerLinks to an external site. problem is a competition on Kaggle.com, and you will compete in this competition. For this assignment, you will develop a classifier that may be used to predict which of the 10 digits is being written.
Management/Research Question
In laymans terms, what is the management/research question of interest, and why would anyone care?
Requirements
Fit a random forest classifier using the full set of explanatory variables and the model training set (csv).
Record the time it takes to fit the model and then evaluate the model on the csvdata by submitting to Kaggle.com. Provide your Kaggle.com score and user ID.
Execute principal components analysis (PCA) on the combined training and test set data together, generating principal components that represent 95 percent of the variability in the explanatory variables. The number of principal components in the solution should be substantially fewer than the explanatory variables.
Record the time it takes to identify the principal components.
Using the identified principal components from step (2), use thecsvto build another random forest classifier.
Record the time it takes to fit the model and to evaluate the model on the csvdata by submitting to Kaggle.com. Provide your Kaggle.com score and user ID.
Use k-means clustering to group MNIST observations into 1 of 10 categories and then assign labels. (Follow the example here if needed: kmeans mnist.pdf Download kmeans mnist.pdf).kmeans mnist-2.pdf Download kmeans mnist-2.pdf
Submit the RF Classifier, the PCA RF, and k-means estimations to Kaggle.com, and provide screen snapshots of your scores as well as your Kaggle.com user name.
The experiment we have proposed has a major design flaw. Identify the flaw. Fix it. Rerun the experiment in a way that is consistent with a training-and-test regimen, and submit this to Kaggle.com.
Report total elapsed time measures for the training set analysis. It is sufficient to run a single time-elapsed test for this assignment. In practice, we might consider the possibility of repeated executions of the relevant portions of the programs, much as the Benchmark Example programs do. Some code that might help you with reporting elapsed total time follows.
start=datetime.now()
rf2.fit(trainimages,labels)
end=datetime.now()
print(end-start)
Python Programming
All programming will be done in Python.

Step by Step Solution

There are 3 Steps involved in it

1 Expert Approved Answer
Step: 1 Unlock blur-text-image
Question Has Been Solved by an Expert!

Get step-by-step solutions from verified subject matter experts

Step: 2 Unlock
Step: 3 Unlock

Students Have Also Explored These Related Databases Questions!