Reinforcement Learning problem:

Consider the following Reinforcement Learning problem with states 1...7, of which state 7 is a terminal state. The rewards R are attached to the transitions, and the transition probabilities are unknown. Let the initial values of all states be 0, and set the discount factor γ = 1. The learning rate α = 0.5 is fixed. What are the values of all states after each epoch (episode) when Temporal Difference learning is applied to the following episodes?

Episode 1: {1, 3, 5, 4, 2, 7}
Episode 2: {2, 3, 5, 6, 4, 7}
Episode 3: {5, 4, 2, 7}

[Figure: transition diagram of states 1...7 with a reward R on each transition; legible labels include R=4 and R=-1.]
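The standard TD(0) update is applied at every transition s → s′. With α = 0.5 and γ = 1 it reduces to an average of the old value and the one-step target. The diagram's rewards are not fully legible here, so the derivation writes them symbolically as R_{s→s′}; this notation is introduced only for illustration. In Episode 1, every next state still has value 0 when it is used as a target. As a result, each visited state ends the episode at half of its outgoing reward.

$$
\begin{aligned}
V(s) &\leftarrow V(s) + \alpha\,\bigl[R_{s\to s'} + \gamma\,V(s') - V(s)\bigr] \\
     &= 0.5\,V(s) + 0.5\,\bigl[R_{s\to s'} + V(s')\bigr] \qquad (\alpha = 0.5,\ \gamma = 1)
\end{aligned}
$$

$$
\text{After Episode 1:}\quad V(1) = 0.5\,R_{1\to 3},\ V(3) = 0.5\,R_{3\to 5},\ V(5) = 0.5\,R_{5\to 4},\ V(4) = 0.5\,R_{4\to 2},\ V(2) = 0.5\,R_{2\to 7},\ V(6) = V(7) = 0
$$

Episodes 2 and 3 follow the same rule. Because they reuse values learned earlier, their numbers depend on the actual rewards in the diagram.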
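Below is a minimal tabular TD(0) sketch in Python. It replays the three episodes and records V after each one. The reward lookup is passed in as a function because the diagram's rewards cannot be recovered from this copy. The names `td0` and `reward` are hypothetical, and the sketch assumes online updates, meaning each V(s) changes immediately after its transition.

```python
from typing import Callable, Dict, List

# Episodes from the question, given as state sequences (state 7 is terminal).
EPISODES: List[List[int]] = [
    [1, 3, 5, 4, 2, 7],
    [2, 3, 5, 6, 4, 7],
    [5, 4, 2, 7],
]


def td0(
    episodes: List[List[int]],
    reward: Callable[[int, int], float],  # hypothetical: R for transition s -> s_next
    alpha: float = 0.5,
    gamma: float = 1.0,
    n_states: int = 7,
    terminal: int = 7,
) -> List[Dict[int, float]]:
    """Tabular TD(0); returns a snapshot of V after each episode."""
    V: Dict[int, float] = {s: 0.0 for s in range(1, n_states + 1)}
    history: List[Dict[int, float]] = []
    for episode in episodes:
        # Walk consecutive pairs (s, s_next) and update V(s) in place.
        for s, s_next in zip(episode, episode[1:]):
            r = reward(s, s_next)
            # The terminal state's value is fixed at 0.
            v_next = 0.0 if s_next == terminal else V[s_next]
            V[s] += alpha * (r + gamma * v_next - V[s])
        history.append(dict(V))  # copy so later updates do not alter the snapshot
    return history
```

To get the actual answer, supply a `reward(s, s_next)` that returns the R labeled on each arrow of the diagram. For example, call `td0(EPISODES, lambda s, s2: REWARDS[(s, s2)])` with a hypothetical `REWARDS` dictionary keyed by transition.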
