Question:

1. Consider the maze shown on the second page. The maze contains walls, which the agent cannot enter, and bumps and oil patches, which yield negative rewards when the agent moves onto them. For simplicity, treat the maze as an 18 × 18 matrix in which each element is one of the following: Empty, Full (Wall), Bump, Oil.

State Space (S): The state space contains every cell of the maze except the walls, i.e., every cell the agent can occupy (18 × 18 − 76 = 248 states, 76 being the number of wall cells).

Action Space (A): In any state, the agent can take one of four actions: up (U), down (D), right (R), and left (L).

Transition Probabilities: After choosing an action, the agent either moves to one of the neighboring cells or stays in its current cell. With probability 1 − p, the agent moves to the anticipated cell; with probability p/3 each, it moves instead toward one of the other three neighboring cells. Consider the following example: [figure not available]. Note that if the neighboring cell the agent would move into is a wall, the agent stays in its current cell.

Reward Function: The primary objective is to find the optimal policy
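The transition rule described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the intended solution: the cell codes (`EMPTY`, `WALL`, `BUMP`, `OIL`), the function names, and the convention that row 0 is the top row are all hypothetical. The sketch also assumes the wall rule applies separately to each attempted direction, and that stepping off the grid is treated like hitting a wall.

```python
from collections import defaultdict

# Hypothetical cell codes; the question only names the four cell types.
EMPTY, WALL, BUMP, OIL = 0, 1, 2, 3

# (row, column) offsets per action, assuming row 0 is the top of the maze.
MOVES = {"U": (-1, 0), "D": (1, 0), "R": (0, 1), "L": (0, -1)}


def transition_probs(grid, state, action, p):
    """Return {next_state: probability} for taking `action` in `state`."""
    probs = defaultdict(float)
    for direction, (dr, dc) in MOVES.items():
        # Intended direction gets 1 - p; each of the other three gets p / 3.
        weight = 1.0 - p if direction == action else p / 3.0
        r, c = state[0] + dr, state[1] + dc
        inside = 0 <= r < len(grid) and 0 <= c < len(grid[0])
        # Assumption: a move into a wall (or off the grid) leaves the agent in place.
        target = (r, c) if inside and grid[r][c] != WALL else state
        probs[target] += weight
    return dict(probs)


def count_states(grid):
    """Count the non-wall cells, i.e. the size of the state space."""
    return sum(cell != WALL for row in grid for cell in row)
```

For example, on a small 3 × 3 grid with a wall directly above the center cell, calling `transition_probs(grid, (1, 1), "U", 0.3)` places the probability mass of the blocked upward move on the current cell, and the returned probabilities sum to 1. Applied to the full 18 × 18 maze, `count_states` should return the 248 states given in the question.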
