Question: Above is a "windy gridworld". The arrows push the agent upward when it moves onto them (the number at the bottom of each column indicates the strength of the wind). S is the start state and G is the goal state. The agent should learn to reach the goal from the start in the minimum number of steps. Formulate this as a reinforcement learning problem in which each move receives a reward of -1. Solve it using both (1) SARSA and (2) Q-learning. Produce a graph showing the total cost of each episode over the training run.
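One possible way to set this up is sketched below. The figure is not reproduced here, so the grid size (7x10), the wind strengths, and the positions of S and G are assumptions borrowed from the commonly used windy gridworld layout; replace them with the values from the actual figure. The names `step`, `choose`, and `train`, and the hyperparameters `alpha`, `gamma`, and `epsilon`, are illustrative choices and not part of the question. The wind applied is that of the column the agent moves onto, following the wording of the question; some formulations use the column the agent leaves instead.

```python
import random

# ASSUMED layout: the figure is missing, so these values are placeholders.
ROWS, COLS = 7, 10
WIND = [0, 0, 0, 1, 1, 1, 2, 2, 1, 0]         # assumed upward push per column
START, GOAL = (3, 0), (3, 7)                  # assumed (row, col); row 0 is the top
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def step(state, action):
    """Move, then apply the wind of the column moved onto. Reward is -1 per move."""
    row, col = state
    d_row, d_col = ACTIONS[action]
    new_col = min(max(col + d_col, 0), COLS - 1)
    new_row = row + d_row - WIND[new_col]     # wind pushes toward row 0
    new_row = min(max(new_row, 0), ROWS - 1)
    return (new_row, new_col), -1


def choose(Q, state, epsilon, rng):
    """Epsilon-greedy action selection with random tie-breaking."""
    if rng.random() < epsilon:
        return rng.randrange(len(ACTIONS))
    best = max(Q[state])
    return rng.choice([a for a, v in enumerate(Q[state]) if v == best])


def train(method, episodes=200, alpha=0.5, gamma=1.0, epsilon=0.1, seed=0):
    """Run tabular SARSA or Q-learning; return the total cost of each episode."""
    rng = random.Random(seed)
    Q = {(r, c): [0.0] * len(ACTIONS) for r in range(ROWS) for c in range(COLS)}
    costs = []
    for _ in range(episodes):
        state, cost = START, 0
        action = choose(Q, state, epsilon, rng)
        while state != GOAL:
            next_state, reward = step(state, action)
            if method == "sarsa":
                # On-policy: bootstrap from the action actually taken next.
                next_action = choose(Q, next_state, epsilon, rng)
                target = Q[next_state][next_action]
            else:
                # Off-policy: bootstrap from the greedy value of the next state.
                target = max(Q[next_state])
            Q[state][action] += alpha * (reward + gamma * target - Q[state][action])
            if method != "sarsa":
                # Q-learning picks its next action after the update.
                next_action = choose(Q, next_state, epsilon, rng)
            state, action = next_state, next_action
            cost -= reward                    # cost = number of moves taken
        costs.append(cost)
    return costs


sarsa_costs = train("sarsa")
q_learning_costs = train("q_learning")
```

Each returned list holds one total cost per episode, so plotting `sarsa_costs` and `q_learning_costs` against the episode index (for example with matplotlib's `plt.plot`) gives the requested graph. The only difference between the two methods in this sketch is the bootstrap target: SARSA uses the value of the action it will actually take next, while Q-learning uses the maximum value in the next state.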
