4. This question is on softmax. Softmax is briefly described in Jurafsky and Martin, chapter 7, section 7.5.1. (This whole chapter is a good brief introduction to feedforward neural networks.)
(a) Calculate the probabilities for the following sets of inputs to softmax (the inputs to softmax are generally called logits): [-2, -1, 0, 1], [3, 3, 3, 3], [0.1, 0.2, 0.0, 0.4]. (A numerical check is sketched in the first code block after part (d).)
(b) If the logits are [y1, y2, y3], are the softmax probabilities the same for [y1 + c, y2 + c, y3 + c] for any value c? Give an argument justifying your answer.
(c) What would the softmax output be for logits that are [3.3, 3.3, -inf, -inf], where -inf is the Python expression for minus infinity? (Hint: you don't need to do any calculation! Remark: -inf is actually used as an input to softmax in the GPT attention mechanism.)
(d) Suppose the predicted probabilities of three possible outcomes (A, B, and C) are [0.05, 0.8, 0.1] respectively. What is the log-loss (cross-entropy loss) if the true outcome (i.e. the label in the training set) is A? What is it if the true label is B? And if it is C? (See the second sketch below.)
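
The following is a minimal Python sketch (assuming NumPy; the function name and structure are my own, not from the assignment) for checking parts (a)-(c). It uses the standard definition softmax(y)_i = exp(y_i) / sum_j exp(y_j), and the max-shift it applies for numerical stability is exactly the shift-invariance that part (b) asks you to justify.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    z = np.asarray(logits, dtype=float)
    z = z - np.max(z)      # subtracting a constant c leaves the output unchanged; see part (b)
    e = np.exp(z)
    return e / e.sum()

# Part (a): the three sets of logits from the question.
for logits in ([-2, -1, 0, 1], [3, 3, 3, 3], [0.1, 0.2, 0.0, 0.4]):
    print(logits, "->", np.round(softmax(logits), 4))
# Note that [3, 3, 3, 3] gives the uniform distribution [0.25, 0.25, 0.25, 0.25].

# Part (b): adding a constant c to every logit cancels in the ratio, since
# exp(y_i + c) / sum_j exp(y_j + c) = exp(c) exp(y_i) / (exp(c) sum_j exp(y_j)).
c = 5.0
print(np.allclose(softmax([1.0, 2.0, 3.0]),
                  softmax([1.0 + c, 2.0 + c, 3.0 + c])))   # True

# Part (c): exp(-inf) = 0, so -inf entries get probability 0 and the
# remaining (equal) logits split the probability mass evenly.
print(softmax([3.3, 3.3, -np.inf, -np.inf]))   # [0.5, 0.5, 0.0, 0.0]
```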
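For part (d), recall that for a single example whose true label is assigned predicted probability p, the cross-entropy (log) loss reduces to -log(p). A short sketch, using the natural log (the usual convention, and the one in Jurafsky and Martin); the helper name and dict layout are mine:

```python
import math

# Predicted probabilities from the question, keyed by outcome.
probs = {"A": 0.05, "B": 0.8, "C": 0.1}

def log_loss(true_label):
    # Loss is -log p(true label): 0 only when p = 1, large when p is small.
    return -math.log(probs[true_label])

for label in ("A", "B", "C"):
    print(f"true label {label}: loss = {log_loss(label):.4f}")
# -ln(0.05) ~ 2.996, -ln(0.8) ~ 0.223, -ln(0.1) ~ 2.303
```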
