Q-Learning. In the following grid-world, the agent tries to learn the optimal policy. When the agent enters a state marked with a number in Fig. (a), the corresponding reward is awarded during the transition. All states marked with a number in Fig. (b) are terminal states. All other states have four actions (NORTH, EAST, SOUTH, WEST). The start state is (1,3), denoted by Start. We assume that Q-learning uses a learning rate α = 0.5 and a discount factor γ = 0.5. There is no stochasticity (i.e., the agent moves deterministically).
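As a minimal sketch of the update this setup implies, the standard tabular Q-learning rule Q(s,a) ← (1 − α)·Q(s,a) + α·(r + γ·max Q(s',a')) can be written as below. The learning rate, discount factor, and action names come from the problem statement. The function name `q_update`, the target cells (2,3) and (3,3), and the reward of 10 are hypothetical, since the figures are not reproduced here, and the Q-table is assumed to start at zero.

```python
ALPHA = 0.5  # learning rate, from the problem statement
GAMMA = 0.5  # discount factor, from the problem statement
ACTIONS = ("NORTH", "EAST", "SOUTH", "WEST")


def q_update(q, state, action, reward, next_state, terminal):
    """Apply one tabular Q-learning update in place and return the new value.

    q maps (state, action) pairs to values; missing entries count as 0.0
    (an assumed zero initialization).
    """
    old = q.get((state, action), 0.0)
    # A terminal state has no actions, so its future value is 0.
    if terminal:
        best_next = 0.0
    else:
        best_next = max(q.get((next_state, a), 0.0) for a in ACTIONS)
    new = (1 - ALPHA) * old + ALPHA * (reward + GAMMA * best_next)
    q[(state, action)] = new
    return new


q = {}
# Hypothetical episode: the layout and the reward of 10 are illustrative only.
v1 = q_update(q, (1, 3), "EAST", 0.0, (2, 3), terminal=False)
v2 = q_update(q, (2, 3), "EAST", 10.0, (3, 3), terminal=True)
# Revisiting the start state now propagates the learned value backwards.
v3 = q_update(q, (1, 3), "EAST", 0.0, (2, 3), terminal=False)
print(v1, v2, v3)
```

Because the moves are deterministic, each update depends only on the observed reward and the current estimate for the next state, so the values can be checked by hand one transition at a time.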
