Question: Above is a "windy gridworld". The arrows push an agent up when it moves onto them (the number at the bottom of each column indicates the force of the wind in that column). S is the start state and G is the goal state. The idea is for the agent to learn to get from the start to the goal in the minimum number of steps. Formulate this as a reinforcement learning problem in which each move receives a reward of -1. Solve it using both (1) SARSA and (2) Q-learning. Produce a graph showing the total cost of each episode throughout the training run.
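
One way to approach this is sketched below. It is a minimal tabular implementation under stated assumptions, not a verified solution. The figure is not reproduced here, so the grid size, wind strengths, and start and goal positions are assumed to match the standard windy gridworld example from Sutton and Barto (7 rows by 10 columns). The wind is assumed to be that of the column the agent is leaving, which is the textbook convention and may differ from the figure. The hyperparameters (alpha, gamma, epsilon), the step cap, and all function names are illustrative choices.

```python
import random

# ASSUMED layout (figure not available): standard 7x10 windy gridworld.
ROWS, COLS = 7, 10
WIND = [0, 0, 0, 1, 1, 1, 2, 2, 1, 0]  # assumed upward push per column
START, GOAL = (3, 0), (3, 7)           # assumed (row, col); row 0 is the top
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def step(state, action):
    """Apply the move plus the wind of the column being left; reward is -1."""
    row, col = state
    d_row, d_col = ACTIONS[action]
    new_row = min(max(row + d_row - WIND[col], 0), ROWS - 1)
    new_col = min(max(col + d_col, 0), COLS - 1)
    return (new_row, new_col), -1


def epsilon_greedy(q, state, epsilon, rng):
    """Pick a random action with probability epsilon, else a greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(ACTIONS))
    best = max(q[state])
    return rng.choice([a for a, v in enumerate(q[state]) if v == best])


def train(method, episodes=200, alpha=0.5, gamma=1.0, epsilon=0.1,
          seed=0, max_steps=10_000):
    """Train with method 'sarsa' or 'q_learning'; return (q, cost per episode)."""
    rng = random.Random(seed)
    q = {(r, c): [0.0] * len(ACTIONS)
         for r in range(ROWS) for c in range(COLS)}
    costs = []
    for _ in range(episodes):
        state, steps = START, 0
        action = epsilon_greedy(q, state, epsilon, rng)
        # max_steps is a safety cap so that an episode always terminates.
        while state != GOAL and steps < max_steps:
            next_state, reward = step(state, action)
            if method == "sarsa":
                # On-policy: bootstrap from the action actually taken next.
                next_action = epsilon_greedy(q, next_state, epsilon, rng)
                bootstrap = q[next_state][next_action]
            else:
                # Off-policy: bootstrap from the greedy value of the next state.
                bootstrap = max(q[next_state])
            td_error = reward + gamma * bootstrap - q[state][action]
            q[state][action] += alpha * td_error
            if method != "sarsa":
                next_action = epsilon_greedy(q, next_state, epsilon, rng)
            state, action = next_state, next_action
            steps += 1
        costs.append(steps)  # each move costs 1, so cost equals step count
    return q, costs


def plot_costs(curves):
    """Plot cost per episode for each method (needs matplotlib installed)."""
    import matplotlib.pyplot as plt
    for label, costs in curves.items():
        plt.plot(costs, label=label)
    plt.xlabel("Episode")
    plt.ylabel("Total cost (number of moves)")
    plt.yscale("log")
    plt.legend()
    plt.show()
```

Calling `plot_costs({"SARSA": train("sarsa")[1], "Q-learning": train("q_learning")[1]})` should draw both learning curves on one graph. The two methods differ only in the bootstrap term: SARSA uses the value of the action it will actually take next, while Q-learning uses the maximum value in the next state. The goal state's values are never updated, so they stay at zero and act as the terminal value.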
