Question: Consider the given scenario : Consider a robot that needs to learn how to leave a house in the best path possible. We have a

Consider the given scenario : Consider a robot that needs to learn how to leave a house in the best
path possible. We have a house with 5 rooms, and one "exit" room. A graph representing it is given
below. On this graph all rooms are nodes, and the arrows the actions that can be taken on each
node. The arrow values are the immediate rewards that the agent receives by taking some action
on a specific room. We choose our reinforcement learning environment to give 0 reward for all
rooms that are not the exit room. In our target room we give a 100 reward. Let the discount factor
be 0.7 and the learning rate be 0.4. An episode starts with a random start node and ends upon
reaching the target room.

Step by Step Solution

There are 3 Steps involved in it

1 Expert Approved Answer
Step: 1 Unlock blur-text-image
Question Has Been Solved by an Expert!

Get step-by-step solutions from verified subject matter experts

Step: 2 Unlock
Step: 3 Unlock

Students Have Also Explored These Related Programming Questions!