Question:


Problem 3 (20 marks) Consider the following Reinforcement Learning problem (the rewards R are tagged to the transitions; the transition probabilities are unknown) with states 1...7, of which state 7 is a terminal state. Let the initial values of all states be 0, and set the discount factor γ = 1. What are the values of all states (after each epoch) when Temporal Difference learning is applied to the following episodes? The learning parameter α = 0.5 is fixed.

Episode 1: {1, 3, 5, 4, 2, 7}
Episode 2: {2, 3, 5, 6, 4, 7}
Episode 3: {5, 4, 2, 7}

[Figure: transition graph over states 1...7 with a reward R on each transition. The edge-to-reward mapping did not survive extraction; the labels read R=4, R=-1, R=-2, R=2, R=1, R=-2, R=2, R=-2, R=3, R=4.]
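A minimal sketch of tabular TD(0) prediction in Python, assuming the standard online update V(s) ← V(s) + α[r + γV(s′) − V(s)] applied after every transition, in visiting order, with the terminal state's value held at 0. Because the figure's edge-to-reward mapping was lost, `PLACEHOLDER_REWARDS` below is hypothetical (every transition set to 1.0) and is not the problem's data; replace it with the R values read from the figure. The names `td0`, `EPISODES` and `PLACEHOLDER_REWARDS` are illustrative only.

```python
from typing import Dict, List, Tuple

State = int
Transition = Tuple[State, State]


def td0(
    episodes: List[List[State]],
    rewards: Dict[Transition, float],
    n_states: int = 7,
    terminal: State = 7,
    alpha: float = 0.5,
    gamma: float = 1.0,
) -> List[Dict[State, float]]:
    """Tabular TD(0) prediction; returns a copy of V after each episode.

    rewards[(s, s_next)] is the reward tagged to the transition s -> s_next.
    A missing transition raises KeyError rather than silently using 0.
    """
    # All state values start at 0.
    values: Dict[State, float] = {s: 0.0 for s in range(1, n_states + 1)}
    snapshots: List[Dict[State, float]] = []
    for episode in episodes:
        # Update online, one transition at a time, in visiting order.
        for s, s_next in zip(episode, episode[1:]):
            r = rewards[(s, s_next)]
            # The terminal state's value is held at 0.
            next_value = 0.0 if s_next == terminal else values[s_next]
            td_error = r + gamma * next_value - values[s]
            values[s] += alpha * td_error
        snapshots.append(dict(values))
    return snapshots


EPISODES = [
    [1, 3, 5, 4, 2, 7],
    [2, 3, 5, 6, 4, 7],
    [5, 4, 2, 7],
]

# HYPOTHETICAL placeholder: every transition gets reward 1.0.
# These are NOT the figure's rewards; copy the real R values from the figure.
PLACEHOLDER_REWARDS: Dict[Transition, float] = {
    (s, s_next): 1.0
    for episode in EPISODES
    for s, s_next in zip(episode, episode[1:])
}

snapshots = td0(EPISODES, PLACEHOLDER_REWARDS)
for i, snap in enumerate(snapshots, start=1):
    print(f"after episode {i}: {snap}")
```

With α = 0.5 and γ = 1 the update reduces to V(s) ← ½V(s) + ½(r + V(s′)). In Episode 1 no state repeats and every successor is still at 0 when it is used, so each visited state ends that episode at exactly half the reward on its outgoing transition. Earlier estimates only start to propagate in Episodes 2 and 3, and V(1) keeps its Episode 1 value because state 1 is never visited again.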
