Question 6. (6 marks)

Consider the following MDP: the set of states is S = {s0, s1, …} and the set of actions available at each state is …. Each episode of the MDP starts in … and terminates in ….
You do not know the transition probabilities or the reward function of the MDP, so you are using Sarsa to find the optimal policy. Suppose the current Q-values are:
Suppose the next episode is as follows:
(a) (… marks) Do all the Sarsa updates to the Q-values that would result from this episode, using α = … and γ = …. Show your working.
(b) (1 mark) Based on the updated Q-values, give the final policy determined by them, i.e. give π(…) and π(…). Show your working.
(c) (1 mark) Give an ε-greedy policy based on the Q-values obtained in (a).
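
For reference, the general Sarsa update asked for in part (a) and the greedy/ε-greedy policies asked for in parts (b) and (c) can be sketched as below. This is a minimal illustration only: the states s0/s1, actions a0/a1, and the values alpha = 0.5, gamma = 0.9, epsilon = 0.1 are assumed placeholders, not the numbers from the question itself.

import random

# Hypothetical Q-table: the question's actual initial Q-values are not reproduced here.
Q = {
    ("s0", "a0"): 0.0, ("s0", "a1"): 0.0,
    ("s1", "a0"): 0.0, ("s1", "a1"): 0.0,
}
alpha, gamma, epsilon = 0.5, 0.9, 0.1  # assumed values for illustration only

def sarsa_update(s, a, r, s_next, a_next, terminal=False):
    # One Sarsa update: Q(s,a) <- Q(s,a) + alpha * [r + gamma * Q(s',a') - Q(s,a)].
    # For a transition into the terminal state, Q(s',a') is taken to be 0.
    target = r if terminal else r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (target - Q[(s, a)])

def greedy_policy(s, actions=("a0", "a1")):
    # Part (b): the policy determined by Q picks argmax_a Q(s, a).
    return max(actions, key=lambda a: Q[(s, a)])

def epsilon_greedy_policy(s, actions=("a0", "a1")):
    # Part (c): with probability epsilon choose a random action, otherwise act greedily.
    if random.random() < epsilon:
        return random.choice(actions)
    return greedy_policy(s, actions)

# Example: apply one update for a single (s, a, r, s', a') step of an episode.
sarsa_update("s0", "a0", r=1.0, s_next="s1", a_next="a1")
print(Q[("s0", "a0")])  # 0.5 with the placeholder numbers above

Part (a) amounts to applying one such update for each step of the given episode, in order; parts (b) and (c) then read the policy off the resulting Q-values.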
