Problem 2. (15 points) Consider the following deterministic Markov Decision Process (MDP), describing a simple robot grid world with 6 states and 4 actions RIGHT, LEFT, UP, and DOWN (not all actions can be taken in all states). State s5 is a terminal state; once the agent reaches it, it stays there. The values of the immediate rewards are written next to the transitions. Transitions with no value have an immediate reward of 0: the agent is rewarded only when it goes from s3 to s5 with the RIGHT action (reward 50) or from s6 to s5 with the UP action (reward 100). Assume the discount factor is γ = 0.8.
a) For each state s ∈ {s1, s2, …, s6}, compute the optimal value V*(s).
b) What action should be taken at each state under the optimal policy? Include this in your solution by marking the state-action transition arrows in the figure that correspond to one optimal policy. If there is a tie, always choose the successor state with the smallest index.
c) How many complete iterations of Value Iteration are sufficient to guarantee finding the optimal policy for this MDP? Assume that values are initialized to zero, and that states are considered in an arbitrary order on each iteration.
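The figure referenced by the problem is not reproduced here, so the exact grid layout is unknown. The following is a minimal value-iteration sketch under an assumed layout: a 2×3 grid with s5 in the top-right corner, s3 to its left, and s6 directly below s5, with s1, s2, and s4 filling the remaining cells. Only the two rewarded transitions (s3 →RIGHT→ s5 with reward 50, s6 →UP→ s5 with reward 100), the terminal state s5, and γ = 0.8 come from the problem statement; every other transition in the table below is a guess and would need to be checked against the actual figure.

```python
# Value iteration sketch. ASSUMPTION: the grid layout below is guessed,
# since the figure is missing; only the s3->s5 (+50) and s6->s5 (+100)
# transitions, the terminal state s5, and GAMMA = 0.8 are given.
GAMMA = 0.8

# transitions[state][action] = (next_state, immediate_reward)
transitions = {
    "s1": {"RIGHT": ("s3", 0), "DOWN": ("s2", 0)},
    "s2": {"UP": ("s1", 0), "RIGHT": ("s4", 0)},
    "s3": {"LEFT": ("s1", 0), "DOWN": ("s4", 0), "RIGHT": ("s5", 50)},
    "s4": {"LEFT": ("s2", 0), "UP": ("s3", 0), "RIGHT": ("s6", 0)},
    "s5": {},  # terminal: the agent stays here and collects no further reward
    "s6": {"LEFT": ("s4", 0), "UP": ("s5", 100)},
}

# Synchronous value iteration: V_{k+1}(s) = max_a [ r(s,a) + GAMMA * V_k(s') ]
V = {s: 0.0 for s in transitions}
for _ in range(100):  # far more sweeps than this tiny MDP needs to converge
    V = {s: max((r + GAMMA * V[ns] for ns, r in acts.values()), default=0.0)
         for s, acts in transitions.items()}

# Greedy policy read-off: the action maximizing r + GAMMA * V(s')
policy = {s: max(acts, key=lambda a: acts[a][1] + GAMMA * V[acts[a][0]])
          for s, acts in transitions.items() if acts}
```

Under this assumed layout the values settle after a handful of sweeps (new rewards stop propagating once every state's longest useful path to s5 has been backed up), which is the kind of argument part (c) asks for; with a different layout the numbers and the required iteration count would change.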