Question:
Consider a robot that needs to learn the best possible path out of a house. The house has
several rooms and one "exit" room. A graph representing it is given below. In this graph every
room is a node, and each arrow is an action that can be taken from that node. The value on an
arrow is the immediate reward the agent receives for taking that action in that room. We choose
the reinforcement learning environment so that it gives a reward for every room that is not the
exit room and a reward in the target room. The problem also specifies a discount factor and a
learning rate. An episode starts at a random start node and ends upon reaching the target room.
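
One common way to approach a setup like this is tabular Q-learning, which the question does not name explicitly, so treat the following as a minimal sketch under stated assumptions rather than the expected solution. The room graph (`EDGES`), the two reward values, `GAMMA`, and `ALPHA` are all hypothetical placeholders, because the graph and the numeric values are not available in the question text; substitute the real ones before relying on any result. Each step applies the update Q(s, a) <- Q(s, a) + ALPHA * (r + GAMMA * max Q(s', a') - Q(s, a)).

```python
import random

# Hypothetical room graph: NOT the graph from the question.
# Keys are rooms, values are the rooms reachable by one action.
EDGES = {
    0: [4],
    1: [3, 5],
    2: [3],
    3: [1, 2, 4],
    4: [0, 3, 5],
    5: [1, 4, 5],
}
EXIT = 5             # assumed exit (target) room
EXIT_REWARD = 100.0  # assumed reward for entering the exit room
STEP_REWARD = 0.0    # assumed reward for entering any other room
GAMMA = 0.8          # assumed discount factor
ALPHA = 0.5          # assumed learning rate


def reward(next_room):
    """Immediate reward for an action that leads into next_room."""
    return EXIT_REWARD if next_room == EXIT else STEP_REWARD


def train(episodes=2000, seed=0):
    """Run tabular Q-learning with purely random exploration."""
    rng = random.Random(seed)
    q = {s: {a: 0.0 for a in nbrs} for s, nbrs in EDGES.items()}
    starts = [s for s in EDGES if s != EXIT]
    for _ in range(episodes):
        state = rng.choice(starts)  # each episode starts in a random room
        while state != EXIT:        # and ends on reaching the exit room
            action = rng.choice(EDGES[state])
            nxt = action            # an action is "move to that room"
            target = reward(nxt) + GAMMA * max(q[nxt].values())
            q[state][action] += ALPHA * (target - q[state][action])
            state = nxt
    return q


def greedy_path(q, start, max_steps=20):
    """Follow the highest-valued action from start until the exit."""
    path = [start]
    while path[-1] != EXIT and len(path) <= max_steps:
        actions = q[path[-1]]
        path.append(max(actions, key=actions.get))
    return path


if __name__ == "__main__":
    q_table = train()
    print(greedy_path(q_table, start=2))
```

Because the exit room's own Q-values are never updated in this sketch (the episode stops there), the value of any action entering the exit settles at the assumed exit reward, and values further away shrink by a factor of `GAMMA` per step.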
