Question:

Problem 3 (20 marks)

Consider the following Reinforcement Learning problem (the rewards R are tagged to the transitions; the transition probabilities are unknown) with states 1...7, of which state 7 is a terminal state. Let the initial values of all states be 0. Initialize the discount factor γ = 1. What are the values of all states (after each epoch) when Temporal Difference learning is used after the following episodes? The learning parameter α = 0.5 is fixed.

Episode 1: {1, 3, 5, 4, 2, 7}
Episode 2: {2, 3, 5, 6, 4, 7}
Episode 3: {5, 4, 2, 7}

[Figure: state-transition diagram over states 1 to 7 with a reward R on each transition. The reward labels that survive extraction are R=4, R=-1, R=-2, R=2, R=1, R=-2, R=2, R=-2, R=3, R=4; which transition each label belongs to is not recoverable from the text.]
Step by Step Solution
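The numerical answer depends on the per-transition rewards in the diagram, which are not legible in this copy, so what follows is only a minimal sketch of the update the question asks for, assuming "Temporal Difference learning" means tabular TD(0): V(s) ← V(s) + α [R + γ V(s') − V(s)], applied to each transition in the order it appears in the episode. The function name `td0_values`, the `rewards` dictionary keyed by (state, next state), and the three-state chain in the usage part are all hypothetical and are not taken from the problem.

```python
from typing import Dict, List, Tuple


def td0_values(
    episodes: List[List[int]],
    rewards: Dict[Tuple[int, int], float],
    states: List[int],
    alpha: float = 0.5,
    gamma: float = 1.0,
) -> List[Dict[int, float]]:
    """Tabular TD(0). Returns a snapshot of the value table after each episode."""
    values = {s: 0.0 for s in states}  # all values start at 0, as in the question
    history = []
    for episode in episodes:
        # Walk the episode one transition (s -> s_next) at a time.
        for s, s_next in zip(episode, episode[1:]):
            r = rewards[(s, s_next)]  # reward is attached to the transition
            td_error = r + gamma * values[s_next] - values[s]
            values[s] += alpha * td_error
        history.append(dict(values))  # copy, so later updates do not alter it
    return history


# Hypothetical toy chain 1 -> 2 -> 3 (3 is terminal); NOT the graph from the problem.
toy_rewards = {(1, 2): 1.0, (2, 3): 2.0}
history = td0_values([[1, 2, 3], [1, 2, 3]], toy_rewards, states=[1, 2, 3])
for i, snapshot in enumerate(history, start=1):
    print(f"after episode {i}: {snapshot}")
# after episode 1: {1: 0.5, 2: 1.0, 3: 0.0}
# after episode 2: {1: 1.25, 2: 1.5, 3: 0.0}
```

To apply this to the problem itself, one would pass the three episodes listed in the question, `states=[1, 2, 3, 4, 5, 6, 7]`, and a `rewards` dictionary read off the original diagram. The terminal state 7 is never the source of a transition, so its value stays at 0 throughout.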
