Question: Please help with this for deep learning. Code in Python.
Problem 3: For a given number of parameters P, let m_P(k) be the number of nodes per layer such that Params(k, m) = P (or as close to P as possible). A network with k layers and m_P(k) nodes per layer should therefore have approximately P total parameters. For such a network, we can define trainLoss(k, P) and testLoss(k, P), for each k from 1 to k_P.
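As a starting point, here is a minimal sketch of how Params(k, m), m_P(k), and k_P could be computed, assuming a plain fully connected network (k hidden layers of equal width m, with biases) and placeholder MNIST-like dimensions d = 784 and C = 10. The architecture, the dimensions, and the function names are assumptions for illustration, not part of the problem statement.

```python
import math

def params(k, m, d=784, C=10):
    """Total parameter count (weights + biases) of an MLP with k hidden
    layers of width m, input dimension d and C output classes.
    d=784 and C=10 are placeholder MNIST-like dimensions (an assumption)."""
    return (d * m + m) + (k - 1) * (m * m + m) + (m * C + C)

def m_P(k, P, d=784, C=10):
    """Width m such that params(k, m) is as close to P as possible.
    Solves (k-1)*m^2 + (d + k + C)*m + (C - P) = 0 for m, then checks
    the neighbouring integers."""
    if k == 1:
        m = (P - C) / (d + C + 1)
    else:
        a, b, c = k - 1, d + k + C, C - P
        m = (-b + math.sqrt(b * b - 4 * a * c)) / (2 * a)
    m = max(1, round(m))
    return min((m - 1, m, m + 1),
               key=lambda mm: abs(params(k, mm, d, C) - P) if mm >= 1 else float("inf"))

def k_P(P, d=784, C=10, min_width=1):
    """Largest depth k for which hidden layers of width min_width still fit
    the budget P (each extra such layer costs min_width^2 + min_width params).
    With min_width=1 this can be very large; raise it or cap k to stay tractable."""
    w = min_width
    base = params(1, w, d, C)
    return 1 if P < base else 1 + (P - base) // (w * w + w)
```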
Identify 10 values of P, sufficiently distinct so that the resulting networks differ noticeably in shape and scale. How did you make your choice? Use the linear softmax model as a baseline.
Note: because of integer rounding, your P values should be distinct enough that you don't accidentally create networks with the same m and k for two distinct P values.
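One reasonable (but not the only) way to pick the 10 values of P is to space them geometrically above the parameter count of the linear softmax baseline, which for the placeholder dimensions above is d*C + C. The sketch below also trains that baseline in PyTorch; the framework, the specific multipliers, and the synthetic stand-in data are assumptions to be replaced with the actual dataset.

```python
import torch
import torch.nn as nn

d, C = 784, 10                       # placeholder input/output dims (assumption)

# The linear softmax baseline has d*C + C parameters; spacing the ten P values
# geometrically above that keeps the resulting (k, m) shapes clearly distinct.
P_baseline = d * C + C
P_values = [P_baseline * r for r in (2, 4, 8, 16, 32, 64, 128, 256, 512, 1024)]

# Synthetic stand-in data; replace with the real train/test sets.
X, y = torch.randn(1024, d), torch.randint(0, C, (1024,))
X_test, y_test = torch.randn(256, d), torch.randint(0, C, (256,))

# Linear softmax baseline: a single linear layer trained with cross-entropy
# (the softmax is folded into nn.CrossEntropyLoss).
baseline = nn.Linear(d, C)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(baseline.parameters(), lr=1e-3)
for epoch in range(50):
    optimizer.zero_grad()
    loss = criterion(baseline(X), y)
    loss.backward()
    optimizer.step()
with torch.no_grad():
    baseline_test = criterion(baseline(X_test), y_test).item()
print(f"baseline: train {loss.item():.3f}  test {baseline_test:.3f}")
```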
Plot (overlaying the curves for each P) trainLoss(k, P), with the x-axis as k/k_P from 0 to 1.
Note: you probably do not want to test every possible k value, due to time constraints, but test enough k values that the trend is clear.
Plot (overlaying the curves for each P) testLoss(k, P), with the x-axis as k/k_P from 0 to 1.
How do the results compare to the baseline performance of the linear softmax model?
What do you notice about the underlying trends? Is there a point where layers become too narrow to be useful, and if so, where is it? What seems to be the sweet spot, if any, for network shape? How does it depend on P?
Create a plot showing (overlaying the curves for each P) total training time in terms of passes through the data. Set the x-axis to k/k_P for ease of comparison. Does this change your assessment of the network shape tradeoff at all?
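A sketch of the experiment loop and the three overlaid plots (train loss, test loss, and passes through the data, each against k/k_P) might look like the following. It builds on m_P, k_P, P_values, and the placeholder data from the sketches above, tests only a handful of k values per P as suggested, and stops training when the loss stops improving; the early-stopping rule, learning rate, and epoch cap are all assumptions, not requirements.

```python
import torch
import torch.nn as nn
import matplotlib.pyplot as plt

def make_mlp(k, m, d=784, C=10):
    """Equal-width MLP: k hidden layers of m ReLU units, then a linear output."""
    layers, width_in = [], d
    for _ in range(k):
        layers += [nn.Linear(width_in, m), nn.ReLU()]
        width_in = m
    layers.append(nn.Linear(width_in, C))
    return nn.Sequential(*layers)

def train(model, X, y, max_epochs=200, lr=1e-3, tol=1e-3):
    """Full-batch training until the loss stops improving by more than tol.
    Returns (final train loss, passes through the data)."""
    criterion = nn.CrossEntropyLoss()
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    prev = float("inf")
    for epoch in range(1, max_epochs + 1):
        opt.zero_grad()
        loss = criterion(model(X), y)
        loss.backward()
        opt.step()
        if prev - loss.item() < tol:
            break
        prev = loss.item()
    return loss.item(), epoch

results = {}                          # P -> list of (k/k_P, train, test, epochs)
for P in P_values:                    # P_values, m_P, k_P, data from the sketches above
    kmax = k_P(P)                     # for large P, cap kmax (or drop the largest
    fracs = (0.02, 0.05, 0.1, 0.2, 0.4, 0.7, 1.0)   # fractions) to keep runtime sane
    ks = sorted(set(max(1, round(kmax * f)) for f in fracs))
    for k in ks:
        model = make_mlp(k, m_P(k, P))
        tr_loss, epochs = train(model, X, y)
        with torch.no_grad():
            te_loss = nn.CrossEntropyLoss()(model(X_test), y_test).item()
        results.setdefault(P, []).append((k / kmax, tr_loss, te_loss, epochs))

for idx, title in ((1, "trainLoss"), (2, "testLoss"), (3, "passes through the data")):
    plt.figure()
    for P, rows in results.items():
        plt.plot([r[0] for r in rows], [r[idx] for r in rows], marker="o", label=f"P≈{P}")
    plt.xlabel("k / k_P"); plt.ylabel(title); plt.title(title); plt.legend()
plt.show()
```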
Bonus:
Is total parameters P a fair comparison point? Try to find a better one. Justify it.
Does introducing regularization (weight decay) or normalization layers help?
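For this bonus, both variants are small changes: weight decay is an argument to the optimizer, and a normalization layer can be inserted after each hidden linear layer. A minimal sketch, assuming the same placeholder setup as above (note that each LayerNorm adds parameters, so the P budget shifts slightly):

```python
import torch
import torch.nn as nn

def make_mlp_norm(k, m, d=784, C=10, norm=True):
    """Equal-width MLP with an optional LayerNorm after each hidden linear layer.
    Each LayerNorm adds 2*m parameters, so Params(k, m) grows slightly."""
    layers, width_in = [], d
    for _ in range(k):
        layers.append(nn.Linear(width_in, m))
        if norm:
            layers.append(nn.LayerNorm(m))      # or nn.BatchNorm1d(m)
        layers.append(nn.ReLU())
        width_in = m
    layers.append(nn.Linear(width_in, C))
    return nn.Sequential(*layers)

# Weight decay (L2 regularization) is just an optimizer argument.
model = make_mlp_norm(4, 64)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
```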
Problem 4: For a P of your choice and the optimal network shape as determined above, try to find an even better network shape (layers of unequal size, for instance) that gives better results for the same (approximate) total number of parameters. Is it better to have uniform layers? Layers of decreasing size? Increasing size? Experiment with it, and summarize your results. You may want to save the best model you find.
Bonus: Does regularization help?
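For Problem 4, one way to compare uniform, decreasing, and increasing width profiles at roughly the same parameter budget is to fix a relative profile and scale it until the total parameter count lands near P, then keep whichever trained model does best (the weight-decay snippet above applies unchanged for the bonus). A sketch, reusing train() and the placeholder data from the sketches above; the specific P, the profiles, and the bisection-based scaling are assumptions for illustration.

```python
import torch
import torch.nn as nn

def params_of(widths, d=784, C=10):
    """Parameter count of an MLP with the given list of hidden-layer widths."""
    dims = [d] + list(widths) + [C]
    return sum(a * b + b for a, b in zip(dims[:-1], dims[1:]))

def scale_widths(profile, P, d=784, C=10):
    """Scale a relative width profile (e.g. [8, 4, 2, 1]) so that the total
    parameter count lands as close to P as possible (bisection on the scale)."""
    lo, hi = 1.0, 1e4
    for _ in range(60):
        s = (lo + hi) / 2
        if params_of([max(1, round(s * w)) for w in profile], d, C) < P:
            lo = s
        else:
            hi = s
    return [max(1, round(lo * w)) for w in profile]

def make_mlp_widths(widths, d=784, C=10):
    """MLP whose hidden layers follow the given width list."""
    layers, width_in = [], d
    for w in widths:
        layers += [nn.Linear(width_in, w), nn.ReLU()]
        width_in = w
    layers.append(nn.Linear(width_in, C))
    return nn.Sequential(*layers)

P = 500_000                                     # example budget (an assumption)
profiles = {"uniform": [1, 1, 1, 1],
            "decreasing": [8, 4, 2, 1],
            "increasing": [1, 2, 4, 8]}

best = (float("inf"), None, None)
for name, profile in profiles.items():
    widths = scale_widths(profile, P)
    model = make_mlp_widths(widths)
    tr_loss, _ = train(model, X, y)             # train() and data from the sketches above
    with torch.no_grad():
        te_loss = nn.CrossEntropyLoss()(model(X_test), y_test).item()
    print(f"{name:>10} widths={widths} params={params_of(widths)} test={te_loss:.3f}")
    if te_loss < best[0]:
        best = (te_loss, name, model)

torch.save(best[2].state_dict(), "best_shape.pt")   # keep the best model found
```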