Question: Please help with this for deep learning. Code in Python.
Problem 3: For a given number of parameters P, let m_P(k) be the number of nodes per layer such that Params(k, m) = P (or as close to P as possible). A network with k layers and m_P(k) nodes per layer should therefore have approximately P total parameters. For such a network, we can define trainLoss(k, P) and testLoss(k, P), for each k from 1 to k_P.
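As a starting point, here is a minimal sketch of how Params(k, m), m_P(k), and k_P could be computed, assuming a plain fully connected network (k hidden layers of equal width m, with biases) and placeholder MNIST-like dimensions d = 784 and C = 10. The architecture, the dimensions, and the function names are assumptions for illustration, not part of the problem statement.

```python
import math

def params(k, m, d=784, C=10):
    """Total parameter count (weights + biases) of an MLP with k hidden
    layers of width m, input dimension d and C output classes.
    d=784 and C=10 are placeholder MNIST-like dimensions (an assumption)."""
    return (d * m + m) + (k - 1) * (m * m + m) + (m * C + C)

def m_P(k, P, d=784, C=10):
    """Width m such that params(k, m) is as close to P as possible.
    Solves (k-1)*m^2 + (d + k + C)*m + (C - P) = 0 for m, then checks
    the neighbouring integers."""
    if k == 1:
        m = (P - C) / (d + C + 1)
    else:
        a, b, c = k - 1, d + k + C, C - P
        m = (-b + math.sqrt(b * b - 4 * a * c)) / (2 * a)
    m = max(1, round(m))
    return min((m - 1, m, m + 1),
               key=lambda mm: abs(params(k, mm, d, C) - P) if mm >= 1 else float("inf"))

def k_P(P, d=784, C=10, min_width=1):
    """Largest depth k for which hidden layers of width min_width still fit
    the budget P (each extra such layer costs min_width^2 + min_width params).
    With min_width=1 this can be very large; raise it or cap k to stay tractable."""
    w = min_width
    base = params(1, w, d, C)
    return 1 if P < base else 1 + (P - base) // (w * w + w)
```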
Identify 10 values of P, sufficiently distinct so that the resulting networks differ noticeably in shape and scale. How did you make your choice? Use the linear softmax model as a baseline.
Note: because of integer rounding, your P values should be distinct enough that you don't accidentally create networks with the same m and k for two distinct P values.
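One reasonable (but not the only) way to pick the 10 values of P is to space them geometrically above the parameter count of the linear softmax baseline, which for the placeholder dimensions above is d*C + C. The sketch below also trains that baseline in PyTorch; the framework, the specific multipliers, and the synthetic stand-in data are assumptions to be replaced with the actual dataset.

```python
import torch
import torch.nn as nn

d, C = 784, 10                       # placeholder input/output dims (assumption)

# The linear softmax baseline has d*C + C parameters; spacing the ten P values
# geometrically above that keeps the resulting (k, m) shapes clearly distinct.
P_baseline = d * C + C
P_values = [P_baseline * r for r in (2, 4, 8, 16, 32, 64, 128, 256, 512, 1024)]

# Synthetic stand-in data; replace with the real train/test sets.
X, y = torch.randn(1024, d), torch.randint(0, C, (1024,))
X_test, y_test = torch.randn(256, d), torch.randint(0, C, (256,))

# Linear softmax baseline: a single linear layer trained with cross-entropy
# (the softmax is folded into nn.CrossEntropyLoss).
baseline = nn.Linear(d, C)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(baseline.parameters(), lr=1e-3)
for epoch in range(50):
    optimizer.zero_grad()
    loss = criterion(baseline(X), y)
    loss.backward()
    optimizer.step()
with torch.no_grad():
    baseline_test = criterion(baseline(X_test), y_test).item()
print(f"baseline: train {loss.item():.3f}  test {baseline_test:.3f}")
```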
Plot (overlaying the curves for each P) trainLoss(k, P), with the x-axis as k/k_P from 0 to 1.
Note: you probably do not want to test every possible k value, due to time constraints, but test enough k values that the trend is clear.
Plot (overlaying the curves for each P) testLoss(k, P), with the x-axis as k/k_P from 0 to 1.
How do the results compare to the baseline performance of the linear softmax model?
What do you notice about the underlying trends? Is there a point where layers become too narrow to be useful, and if so, where is it? What seems to be the sweet spot, if any, for network shape? How does it depend on P?
Create a plot showing (overlaying the curves for each P) total training time in terms of passes through the data. Set the x-axis to k/k_P for ease of comparison. Does this change your assessment of the network shape tradeoff at all?
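A sketch of the experiment loop and the three overlaid plots (train loss, test loss, and passes through the data, each against k/k_P) might look like the following. It builds on m_P, k_P, P_values, and the placeholder data from the sketches above, tests only a handful of k values per P as suggested, and stops training when the loss stops improving; the early-stopping rule, learning rate, and epoch cap are all assumptions, not requirements.

```python
import torch
import torch.nn as nn
import matplotlib.pyplot as plt

def make_mlp(k, m, d=784, C=10):
    """Equal-width MLP: k hidden layers of m ReLU units, then a linear output."""
    layers, width_in = [], d
    for _ in range(k):
        layers += [nn.Linear(width_in, m), nn.ReLU()]
        width_in = m
    layers.append(nn.Linear(width_in, C))
    return nn.Sequential(*layers)

def train(model, X, y, max_epochs=200, lr=1e-3, tol=1e-3):
    """Full-batch training until the loss stops improving by more than tol.
    Returns (final train loss, passes through the data)."""
    criterion = nn.CrossEntropyLoss()
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    prev = float("inf")
    for epoch in range(1, max_epochs + 1):
        opt.zero_grad()
        loss = criterion(model(X), y)
        loss.backward()
        opt.step()
        if prev - loss.item() < tol:
            break
        prev = loss.item()
    return loss.item(), epoch

results = {}                          # P -> list of (k/k_P, train, test, epochs)
for P in P_values:                    # P_values, m_P, k_P, data from the sketches above
    kmax = k_P(P)                     # for large P, cap kmax (or drop the largest
    fracs = (0.02, 0.05, 0.1, 0.2, 0.4, 0.7, 1.0)   # fractions) to keep runtime sane
    ks = sorted(set(max(1, round(kmax * f)) for f in fracs))
    for k in ks:
        model = make_mlp(k, m_P(k, P))
        tr_loss, epochs = train(model, X, y)
        with torch.no_grad():
            te_loss = nn.CrossEntropyLoss()(model(X_test), y_test).item()
        results.setdefault(P, []).append((k / kmax, tr_loss, te_loss, epochs))

for idx, title in ((1, "trainLoss"), (2, "testLoss"), (3, "passes through the data")):
    plt.figure()
    for P, rows in results.items():
        plt.plot([r[0] for r in rows], [r[idx] for r in rows], marker="o", label=f"P≈{P}")
    plt.xlabel("k / k_P"); plt.ylabel(title); plt.title(title); plt.legend()
plt.show()
```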
Bonus:
Is total parameters P a fair comparison point? Try to find a better one. Justify it.
Does introducing regularization (weight decay) or normalization layers help?
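For this bonus, both variants are small changes: weight decay is an argument to the optimizer, and a normalization layer can be inserted after each hidden linear layer. A minimal sketch, assuming the same placeholder setup as above (note that each LayerNorm adds parameters, so the P budget shifts slightly):

```python
import torch
import torch.nn as nn

def make_mlp_norm(k, m, d=784, C=10, norm=True):
    """Equal-width MLP with an optional LayerNorm after each hidden linear layer.
    Each LayerNorm adds 2*m parameters, so Params(k, m) grows slightly."""
    layers, width_in = [], d
    for _ in range(k):
        layers.append(nn.Linear(width_in, m))
        if norm:
            layers.append(nn.LayerNorm(m))      # or nn.BatchNorm1d(m)
        layers.append(nn.ReLU())
        width_in = m
    layers.append(nn.Linear(width_in, C))
    return nn.Sequential(*layers)

# Weight decay (L2 regularization) is just an optimizer argument.
model = make_mlp_norm(4, 64)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
```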
Problem 4: For a P of your choice and the optimal network shape as determined above, try to find an even better network shape (layers of unequal size, for instance) that gives better results for the same (approximate) total number of parameters. Is it better to have uniform layers? Layers of decreasing size? Increasing size? Experiment with it, and summarize your results. You may want to save the best model you find.
Bonus: Does regularization help?
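For Problem 4, one way to compare uniform, decreasing, and increasing width profiles at roughly the same parameter budget is to fix a relative profile and scale it until the total parameter count lands near P, then keep whichever trained model does best (the weight-decay snippet above applies unchanged for the bonus). A sketch, reusing train() and the placeholder data from the sketches above; the specific P, the profiles, and the bisection-based scaling are assumptions for illustration.

```python
import torch
import torch.nn as nn

def params_of(widths, d=784, C=10):
    """Parameter count of an MLP with the given list of hidden-layer widths."""
    dims = [d] + list(widths) + [C]
    return sum(a * b + b for a, b in zip(dims[:-1], dims[1:]))

def scale_widths(profile, P, d=784, C=10):
    """Scale a relative width profile (e.g. [8, 4, 2, 1]) so that the total
    parameter count lands as close to P as possible (bisection on the scale)."""
    lo, hi = 1.0, 1e4
    for _ in range(60):
        s = (lo + hi) / 2
        if params_of([max(1, round(s * w)) for w in profile], d, C) < P:
            lo = s
        else:
            hi = s
    return [max(1, round(lo * w)) for w in profile]

def make_mlp_widths(widths, d=784, C=10):
    """MLP whose hidden layers follow the given width list."""
    layers, width_in = [], d
    for w in widths:
        layers += [nn.Linear(width_in, w), nn.ReLU()]
        width_in = w
    layers.append(nn.Linear(width_in, C))
    return nn.Sequential(*layers)

P = 500_000                                     # example budget (an assumption)
profiles = {"uniform": [1, 1, 1, 1],
            "decreasing": [8, 4, 2, 1],
            "increasing": [1, 2, 4, 8]}

best = (float("inf"), None, None)
for name, profile in profiles.items():
    widths = scale_widths(profile, P)
    model = make_mlp_widths(widths)
    tr_loss, _ = train(model, X, y)             # train() and data from the sketches above
    with torch.no_grad():
        te_loss = nn.CrossEntropyLoss()(model(X_test), y_test).item()
    print(f"{name:>10} widths={widths} params={params_of(widths)} test={te_loss:.3f}")
    if te_loss < best[0]:
        best = (te_loss, name, model)

torch.save(best[2].state_dict(), "best_shape.pt")   # keep the best model found
```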