Question 4 [4 pts]: Figure 3.a shows a 3x4 robot navigation field. The shaded squares are obstacles. The three cells [2,4], [3,2], and [3,3] are terminal states, and the values shown in them are their rewards (each cell is also a state). The reward for each remaining state (excluding obstacles and terminal states) is -0.05. To train a robot to navigate the field, the stochastic transition model shown in Figure 3.b is used. At any location, if the robot cannot move in a certain direction (e.g., there is a wall or an obstacle), it remains in the same position. For example, when the robot is at [1,1], it cannot move left because of the wall. The discount factor is 0.9, and the initial utility value of every state is 0.
Figure 3.b
(1) Use the value iteration algorithm to find the utility values of cells [2,2] and [2,3] after the FIRST iteration (terminal states and obstacles are excluded). Solutions must show the calculations; values for other cells are not required. [4 pts]
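The first-iteration update asked for here is a single Bellman backup, U1(s) = R(s) + discount * max over actions a of the sum over s' of P(s'|s,a) * U0(s'). The following is a minimal sketch of that backup, not the expert solution. Because Figures 3.a and 3.b are not reproduced in this text, several things in it are assumptions: the transition model (0.8 for the intended direction, 0.1 for each perpendicular direction), the orientation of "up" as a decreasing row index, the empty obstacle set, and all function and variable names.

```python
GAMMA = 0.9          # discount factor from the question
STEP_REWARD = -0.05  # reward of every non-terminal, non-obstacle state
ROWS, COLS = 3, 4    # cells are addressed as (row, col), 1-indexed

# ASSUMPTION: "up" decreases the row index; the figure's orientation is unknown.
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
# Perpendicular slips for each intended direction.
SIDEWAYS = {"up": ("left", "right"), "down": ("left", "right"),
            "left": ("up", "down"), "right": ("up", "down")}


def step(state, direction, obstacles):
    """Return the cell reached by moving once; stay put on a wall or obstacle."""
    r, c = state
    dr, dc = MOVES[direction]
    nxt = (r + dr, c + dc)
    inside = 1 <= nxt[0] <= ROWS and 1 <= nxt[1] <= COLS
    return nxt if inside and nxt not in obstacles else state


def bellman_update(state, U, obstacles, p_intended=0.8, p_side=0.1):
    """One Bellman backup for a non-terminal state.

    ASSUMPTION: p_intended and p_side stand in for the model of Figure 3.b.
    """
    best = max(
        p_intended * U[step(state, a, obstacles)]
        + sum(p_side * U[step(state, s, obstacles)] for s in SIDEWAYS[a])
        for a in MOVES
    )
    return STEP_REWARD + GAMMA * best


# ASSUMPTION: placeholder obstacle set; fill in the shaded cells of Figure 3.a.
obstacles = set()
# Initial utilities: 0 for every state, as stated in the question.
U0 = {(r, c): 0.0 for r in range(1, ROWS + 1) for c in range(1, COLS + 1)}

u22 = bellman_update((2, 2), U0, obstacles)
u23 = bellman_update((2, 3), U0, obstacles)
print(u22, u23)
```

When every initial utility is 0, the expected next-state utility is 0 for every action, so the backup reduces to -0.05 + 0.9 * 0 = -0.05 for both [2,2] and [2,3], whatever the obstacle layout or transition probabilities are. Under the alternative convention in which terminal cells start at their rewards, the entries U0[(2, 4)], U0[(3, 2)], and U0[(3, 3)] would be set from Figure 3.a before calling bellman_update, and the result would then depend on the figure's values.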