Question:

Consider the general domain of grid-world navigation tasks, where there is a goal state, obstacles, and a discount factor γ < 1. The actions are stochastic, so the agent may slip into a different cell when trying to move. There are five possible actions: go north, south, east, west, or stay in the same location. Consider the situation in which a negative reward (a cost) is incurred when bumping into walls. Can you draw a 3×3 example environment in which the best action in at least one state is to stay? If so, specify the actions, rewards, and transition probabilities. If not, explain why.
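One way to explore the question is to build a candidate environment and check it numerically. The sketch below is only an illustration under assumed numbers, none of which come from the question: the goal is an absorbing centre cell worth +1 on entry, bumping into the outer wall costs 10 (reward -10) and leaves the agent in place, a move goes in the intended direction with probability 0.8 and slips to each perpendicular direction with probability 0.1, "stay" never slips, and γ = 0.9. All names (`outcomes`, `value_iteration`, and so on) are hypothetical. Plain value iteration then reports the greedy action in each cell.

```python
# Hypothetical 3x3 grid world; every number below is an illustrative assumption.
GAMMA = 0.9          # discount factor, gamma < 1
SLIP = 0.1           # probability of slipping to each perpendicular direction
GOAL = (1, 1)        # absorbing goal in the centre cell
GOAL_REWARD = 1.0    # reward for entering the goal
WALL_COST = -10.0    # negative reward for bumping into the outer wall

MOVES = {"north": (-1, 0), "south": (1, 0), "east": (0, 1), "west": (0, -1)}
PERPENDICULAR = {"north": ("east", "west"), "south": ("east", "west"),
                 "east": ("north", "south"), "west": ("north", "south")}
ACTIONS = ["stay"] + list(MOVES)
STATES = [(r, c) for r in range(3) for c in range(3)]


def step(state, direction):
    """Deterministic result of moving one cell: (next_state, reward)."""
    r = state[0] + MOVES[direction][0]
    c = state[1] + MOVES[direction][1]
    if not (0 <= r < 3 and 0 <= c < 3):
        return state, WALL_COST          # bumped into a wall, agent stays put
    if (r, c) == GOAL:
        return (r, c), GOAL_REWARD
    return (r, c), 0.0


def outcomes(state, action):
    """List of (probability, next_state, reward) for a state-action pair."""
    if state == GOAL or action == "stay":
        return [(1.0, state, 0.0)]       # staying never slips; goal is absorbing
    side_a, side_b = PERPENDICULAR[action]
    return [(1 - 2 * SLIP, *step(state, action)),
            (SLIP, *step(state, side_a)),
            (SLIP, *step(state, side_b))]


def q_value(state, action, values):
    """Expected discounted return of taking `action` in `state`."""
    return sum(p * (reward + GAMMA * values[nxt])
               for p, nxt, reward in outcomes(state, action))


def value_iteration(sweeps=200):
    """Run synchronous value iteration and return (values, greedy policy)."""
    values = {s: 0.0 for s in STATES}
    for _ in range(sweeps):
        values = {s: max(q_value(s, a, values) for a in ACTIONS) for s in STATES}
    policy = {s: max(ACTIONS, key=lambda a: q_value(s, a, values)) for s in STATES}
    return values, policy


values, policy = value_iteration()
for row in range(3):
    print([policy[(row, col)] for col in range(3)])
```

Under these assumed numbers, any move out of a corner carries at least a 0.1 chance of a wall bump, so its expected immediate reward is at most -1, while the discounted future return can never exceed the single goal reward of +1. Staying yields exactly 0, so the greedy action in the corners is to stay, which suggests the answer to the question is yes. With a smaller wall cost or a larger goal reward the same code may report a different policy.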
