Question: (20 points) Figure 4 shows the gridworld MDP and the transition function. The states are grid squares, identified by their row and column number (row first). The agent always starts in state (1, 1), marked with the letter S. There are two terminal goal states, (2, 3) with reward +5 and (1, 3) with reward -5. Rewards are 0 in non-terminal states. (The reward for a state is received as the agent moves into the state.) The transition function is such that the intended agent movement (North, South, West, or East) happens with probability 0.8. With probability 0.1 each, the agent ends up in one of the states perpendicular to the intended direction. If a collision with a wall happens, the agent stays in the same state. Table 1 is the optimal policy for this grid.
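The transition rule described above can be sketched in Python. Figure 4 is not available here, so the grid size (2 rows by 3 columns) and the convention that North means a larger row number are assumptions, as are all the names (`ROWS`, `COLS`, `step`, `transition`); adjust them to match the actual figure.

```python
from collections import defaultdict

# Assumptions (the figure is not available): a 2x3 grid with 1-indexed
# (row, col) states, and North meaning row + 1.
ROWS, COLS = 2, 3
MOVES = {"North": (1, 0), "South": (-1, 0), "West": (0, -1), "East": (0, 1)}
PERPENDICULAR = {
    "North": ("West", "East"),
    "South": ("West", "East"),
    "West": ("North", "South"),
    "East": ("North", "South"),
}


def step(state, direction):
    """Deterministic move; colliding with a wall leaves the state unchanged."""
    r, c = state
    dr, dc = MOVES[direction]
    nr, nc = r + dr, c + dc
    if 1 <= nr <= ROWS and 1 <= nc <= COLS:
        return (nr, nc)
    return state


def transition(state, action):
    """Return {next_state: probability} for taking `action` in `state`."""
    probs = defaultdict(float)
    probs[step(state, action)] += 0.8      # intended direction
    for side in PERPENDICULAR[action]:     # the two perpendicular slips
        probs[step(state, side)] += 0.1
    return dict(probs)


print(transition((1, 1), "North"))
```

Under these assumptions, moving North from (1, 1) reaches (2, 1) with probability 0.8, slips East to (1, 2) with probability 0.1, and bumps into the west wall (staying in (1, 1)) with probability 0.1.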

[Figure 4: Gridworld MDP; Transition function]

[Table 1: Optimal policy]

6-1. (4 points) Write the optimal policy when the agent is in (1, 1).

[Answer box]

6-2. (4 points) Write the optimal policy when the agent is in (1, 2).

[Answer box]

6-3. (4 points) Write the optimal policy when the agent is in (2, 2).

[Answer box]
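One way to check a policy for the states asked about is value iteration followed by greedy action selection. The sketch below is illustrative only: the question as given does not state a discount factor, so `GAMMA = 0.9` is a hypothetical value, and the 2x3 grid with North meaning a larger row number is an assumption because Figure 4 is not available. The resulting policy can change with these choices, so Table 1 remains the reference.

```python
# Assumptions: 2x3 grid, North = row + 1, and a hypothetical discount GAMMA.
ROWS, COLS, GAMMA = 2, 3, 0.9
TERMINAL_REWARD = {(2, 3): 5.0, (1, 3): -5.0}
MOVES = {"North": (1, 0), "South": (-1, 0), "West": (0, -1), "East": (0, 1)}
SIDES = {
    "North": ("West", "East"),
    "South": ("West", "East"),
    "West": ("North", "South"),
    "East": ("North", "South"),
}


def step(state, direction):
    """Deterministic move; colliding with a wall leaves the state unchanged."""
    r, c = state
    dr, dc = MOVES[direction]
    nr, nc = r + dr, c + dc
    return (nr, nc) if 1 <= nr <= ROWS and 1 <= nc <= COLS else state


def outcomes(state, action):
    """List of (probability, next_state): 0.8 intended, 0.1 per side slip."""
    return [(0.8, step(state, action))] + [
        (0.1, step(state, side)) for side in SIDES[action]
    ]


def q_value(V, state, action):
    # The reward is received on entering a state; terminal states keep value 0.
    return sum(
        p * (TERMINAL_REWARD.get(nxt, 0.0) + GAMMA * V[nxt])
        for p, nxt in outcomes(state, action)
    )


def value_iteration(sweeps=200):
    """Run a fixed number of synchronous Bellman backups."""
    states = [(r, c) for r in range(1, ROWS + 1) for c in range(1, COLS + 1)]
    V = {s: 0.0 for s in states}
    for _ in range(sweeps):
        V = {
            s: 0.0 if s in TERMINAL_REWARD
            else max(q_value(V, s, a) for a in MOVES)
            for s in states
        }
    return V


def greedy_policy(V):
    """Pick the action with the highest one-step lookahead value."""
    return {
        s: max(MOVES, key=lambda a: q_value(V, s, a))
        for s in V if s not in TERMINAL_REWARD
    }


policy = greedy_policy(value_iteration())
for state in [(1, 1), (1, 2), (2, 2)]:
    print(state, policy[state])
```

Changing `GAMMA` or the row orientation and rerunning shows how sensitive the action in (1, 2), the square next to the -5 terminal, is to those assumptions.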
