Question:

4- Assuming that all Q-values are initialized to 0, what are the Q-values for the following state-action pairs after running [tabular] Q-learning for the first episode? [Skip/disregard episodes 2 and 3.] Use discount factor $\gamma = 0.8$ and learning rate $\alpha = 0.6$.

Q( , Down)

Q(B, Up)
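As a worked step, here is a sketch that only substitutes the given constants ($\gamma = 0.8$, $\alpha = 0.6$) into the running-average Q-learning update stated in the hint; every transition of episode 1 is processed with this same arithmetic:

$$\hat{Q}(s,a) \leftarrow 0.4\,\hat{Q}(s,a) + 0.6\left[R(s,a,s') + 0.8\max_{a'}\hat{Q}(s',a')\right] = 0.4\,\hat{Q}(s,a) + 0.6\,R(s,a,s') + 0.48\max_{a'}\hat{Q}(s',a')$$

In particular, the first time a pair is updated, when $\hat{Q}(s,a) = 0$ and every $\hat{Q}(s',a')$ is still 0, the result is simply $0.6\,R(s,a,s')$.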

Hint: Use the following equations and update Q-values after each transition until the end of episode 1.

Consider your new sample estimate:

$$\text{target} = R(s, a, s') + \gamma \max_{a'} \hat{Q}(s', a')$$

Incorporate the new estimate into a running average:

$$\hat{Q}(s, a) \leftarrow (1 - \alpha)\,\hat{Q}(s, a) + \alpha\,[\text{target}]$$
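The two hint equations can be sketched in code as a single pass over one episode, updating after every transition. This is a minimal illustration, not a solution: the tuple format `(s, a, s_next, r)`, the `actions` dictionary, the function name `q_learning_episode`, and the episode with states `X`, `Y`, `DONE` are all hypothetical, because the assignment's actual experience sequence is not reproduced in the text.

```python
from collections import defaultdict

GAMMA = 0.8  # discount factor given in the problem
ALPHA = 0.6  # learning rate given in the problem


def q_learning_episode(transitions, actions, gamma=GAMMA, alpha=ALPHA):
    """Run tabular Q-learning over one episode, updating after every transition.

    transitions: ordered list of (s, a, s_next, r) tuples (an assumed format).
    actions: dict mapping each state to its legal actions; a terminal state
             maps to an empty list, so its max term counts as 0.
    """
    q = defaultdict(float)  # every Q(s, a) starts at 0
    for s, a, s_next, r in transitions:
        next_values = [q[(s_next, a2)] for a2 in actions.get(s_next, [])]
        # Sample estimate: reward plus discounted best next Q-value.
        target = r + gamma * (max(next_values) if next_values else 0.0)
        # Running average of the old estimate and the new sample.
        q[(s, a)] = (1 - alpha) * q[(s, a)] + alpha * target
    return q


# Hypothetical episode for illustration only (not the assignment's data):
# Y --Up--> X (r = 5), X --Down--> Y (r = 0), Y --Down--> DONE (r = 0).
actions = {"X": ["Up", "Down"], "Y": ["Up", "Down"], "DONE": []}
episode = [("Y", "Up", "X", 5.0), ("X", "Down", "Y", 0.0), ("Y", "Down", "DONE", 0.0)]
q = q_learning_episode(episode, actions)
for key in [("Y", "Up"), ("X", "Down"), ("Y", "Down")]:
    print(key, round(q[key], 4))
```

Because updates happen after each transition, later transitions see earlier results: here $\hat{Q}(Y, Up)$ becomes $0.6 \cdot 5 = 3$, and the next update for $(X, Down)$ already uses $\max_{a'}\hat{Q}(Y, a') = 3$, giving $0.6 \cdot 0.8 \cdot 3 = 1.44$.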

5- Repeat part 4 if you run SARSA (temporal difference) with the above experience sequence (again assume that all Q-values are initialized to 0 and use only episode 1). Use discount factor $\gamma = 0.8$ and learning rate $\alpha = 0.6$.

Hint: Use the following equations and update Q-values after each transition until the end of episode 1.

Sample of $\hat{Q}(s, a)$:

$$\text{target} = R(s, a, s') + \gamma\,\hat{Q}(s', a')$$

Update $\hat{Q}(s, a)$:

$$\hat{Q}(s, a) \leftarrow (1 - \alpha)\,\hat{Q}(s, a) + \alpha\,[\text{target}]$$
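A minimal sketch of the SARSA version, under the same kind of assumptions: the tuple format `(s, a, s_next, r)`, the function name `sarsa_episode`, and the episode below are hypothetical. Following the standard SARSA convention, $a'$ is the action actually taken in $s'$ (read from the next transition), and the successor value after the final transition is taken as 0, which assumes the episode ends in a terminal state.

```python
from collections import defaultdict

GAMMA = 0.8  # discount factor given in the problem
ALPHA = 0.6  # learning rate given in the problem


def sarsa_episode(transitions, gamma=GAMMA, alpha=ALPHA):
    """Run tabular SARSA over one episode, updating after every transition.

    transitions: ordered list of (s, a, s_next, r) tuples (an assumed format).
    The next action a' is the action taken in the following transition.
    After the last transition the successor value is taken as 0, which
    assumes the episode ends in a terminal state.
    """
    q = defaultdict(float)  # every Q(s, a) starts at 0
    for i, (s, a, s_next, r) in enumerate(transitions):
        if i + 1 < len(transitions):
            s2, a_next = transitions[i + 1][0], transitions[i + 1][1]
            assert s2 == s_next, "transitions must form one contiguous episode"
            next_value = q[(s_next, a_next)]
        else:
            next_value = 0.0
        # Sample uses the action actually taken next, not the max.
        target = r + gamma * next_value
        q[(s, a)] = (1 - alpha) * q[(s, a)] + alpha * target
    return q


# Hypothetical episode for illustration only (not the assignment's data):
# Y --Up--> X (r = 5), X --Down--> Y (r = 0), Y --Down--> DONE (r = 0).
episode = [("Y", "Up", "X", 5.0), ("X", "Down", "Y", 0.0), ("Y", "Down", "DONE", 0.0)]
q = sarsa_episode(episode)
for key in [("Y", "Up"), ("X", "Down"), ("Y", "Down")]:
    print(key, round(q[key], 4))
```

On this made-up episode SARSA leaves $\hat{Q}(X, Down)$ at 0, because its sample uses $\hat{Q}(Y, Down)$, the action actually taken next, which is still 0. The Q-learning update from part 4 would instead use $\max_{a'}\hat{Q}(Y, a') = \hat{Q}(Y, Up) = 3$ and give $0.6 \cdot 0.8 \cdot 3 = 1.44$.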

