Question: In this exercise you need to implement a 3 - layer MLP model ( one input layer, one hidden layer with tanh activation and one

In this exercise you need to implement a 3-layer MLP model (one input layer, one hidden layer with tanh activation and one output layer) in any platform, which will be used to classify the images from the MNIST dataset.
You can use the built-in modules in the platform to build your model, such as Linear, Dropout, Tanh, etc. You also need to write the training function (training), and should explore the following hyperparameter settings:
Batch size: Number of examples per training iteration.
Hidden size: Try using different number of hidden nodes in your model and compare the performances.
Dropout: Dropout is an effective strategy to defend against overfitting. Adding a dropout layer after the hidden layer, and try using different dropout rate to compare the performances.
Optimizer: Try using different optimizers such as SGD, Adam, RMSProp.
Regularization (weight decay): L2 regularization can be specified by setting the weight_decay
parameter in optimizer. Try using different regularization factor and check the performance.
Learning rate, Learning rate scheduler: Learning rate is key hyperparameter in model training, and you can gradually decreasing the learning rate to further improve your model. Try using different learning rate and different learning rate scheduler to compare the performance.
To get full credit, you should explore at least 4 different types of hyperparameters (from listed above), and choose at least 3 different values for each hyperparameters. For simplicity, you could analyze one hyperparameter at a time (i.e. fixing all others to some reasonable value), rather than performing grid search. If you use TensorBoard to monitor your training, you can directly attach the screenshots of the training curves (accuracy) in your report.
To evaluate the performance of trained model, you also need to write a function (evaluation) which loads the trained model and evaluate its performance on train/test set. In your report, please clearly state what hyperparameters you explored, and what accuracy the model achieved on train/test set.

Step by Step Solution

There are 3 Steps involved in it

1 Expert Approved Answer
Step: 1 Unlock blur-text-image
Question Has Been Solved by an Expert!

Get step-by-step solutions from verified subject matter experts

Step: 2 Unlock
Step: 3 Unlock

Students Have Also Explored These Related Databases Questions!