Question 4 [20 marks]
Given the following sequence of XO (tic-tac-toe) states along with their values:
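State 1 (value 0.500):
O - -
- X -
X O -

State 2 (value 0.576):
O - -
- X X
X O -

State 3 (value 0.545):
O - -
- X X
X O O

State 4 (value 0.556):
O X -
- X X
X O O

State 5 (value 0.577):
O X -
O X X
X O O

State 6 (value 1.000):
O X X
O X X
X O O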
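
A minimal Python sketch for checking how the states above fit together, assuming the nine cells of each state are listed row by row (top-left to bottom-right) and '-' marks an empty cell. It checks that consecutive states differ by exactly one new mark, reports which player made each move, and confirms that the final state is an X win. The names `EPISODE`, `LINES`, `winner` and `moves` are hypothetical helpers introduced only for illustration.

```python
from typing import List, Optional

# The six listed states, each flattened row by row (assumed 3x3, row-major),
# paired with its given value. '-' marks an empty cell.
EPISODE = [
    ("O--" "-X-" "XO-", 0.500),
    ("O--" "-XX" "XO-", 0.576),
    ("O--" "-XX" "XOO", 0.545),
    ("OX-" "-XX" "XOO", 0.556),
    ("OX-" "OXX" "XOO", 0.577),
    ("OXX" "OXX" "XOO", 1.000),
]

# All eight winning lines as index triples into the flattened board.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def winner(board: str) -> Optional[str]:
    """Return 'X' or 'O' if that player has three in a line, else None."""
    for a, b, c in LINES:
        if board[a] != "-" and board[a] == board[b] == board[c]:
            return board[a]
    return None


def moves(episode) -> List[str]:
    """Return which player placed the single new mark at each step."""
    movers = []
    for (prev, _), (curr, _) in zip(episode, episode[1:]):
        changed = [i for i in range(9) if prev[i] != curr[i]]
        # Exactly one previously empty cell should be filled per move.
        assert len(changed) == 1 and prev[changed[0]] == "-"
        movers.append(curr[changed[0]])
    return movers


print(moves(EPISODE))          # ['X', 'O', 'X', 'O', 'X']
print(winner(EPISODE[-1][0]))  # X
```

Under this reading, X makes the first move in the sequence and completes the anti-diagonal on the fifth move, which is consistent with the final state having value 1.000 from X's point of view.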
a. Assuming a learning rate of 0.73, what will the updated values be when a gradient-based state-value update is applied after each move?
b. Assuming a learning rate (α) of 0.86 and a discount factor (γ) of 0.82, what will the updated values be when a TD-based state-value update is applied after each move? Assume all rewards are -1 except for the action leading to the goal state with respect to the X player.
c. Compare the two algorithms applied in a. and b.
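
The two update rules asked for in a. and b. can be sketched in Python. This is a minimal sketch under stated assumptions, not a model answer: it takes part (a) to mean the tabular rule V(s_t) ← V(s_t) + α[V(s_{t+1}) − V(s_t)] and part (b) to mean TD(0), V(s_t) ← V(s_t) + α[r + γV(s_{t+1}) − V(s_t)]. It updates every listed state, uses the original (pre-update) value of each successor because the update is applied as each move is made, and leaves the terminal state unchanged. The question does not give the reward for the winning move, so `goal_reward` defaults to a hypothetical +1; `gradient_update` and `td0_update` are illustrative names.

```python
from typing import List

# Values of the six states in the order they occur (taken from the question).
VALUES = [0.500, 0.576, 0.545, 0.556, 0.577, 1.000]


def gradient_update(values: List[float], alpha: float) -> List[float]:
    """Part (a), assumed rule: V(s_t) <- V(s_t) + alpha * (V(s_{t+1}) - V(s_t)).

    Each non-terminal state moves toward the *original* value of its
    successor; the terminal (last) state keeps its value.
    """
    new = list(values)
    for t in range(len(values) - 1):
        new[t] = values[t] + alpha * (values[t + 1] - values[t])
    return new


def td0_update(values: List[float], alpha: float, gamma: float,
               step_reward: float = -1.0,
               goal_reward: float = 1.0) -> List[float]:
    """Part (b), TD(0): V(s_t) <- V(s_t) + alpha * (r + gamma * V(s_{t+1}) - V(s_t)).

    r = step_reward for every move except the one into the goal (X-win)
    state, which gets goal_reward. goal_reward = 1.0 is a HYPOTHETICAL
    placeholder: the question does not state its value.
    """
    new = list(values)
    last = len(values) - 2  # index of the state just before the goal state
    for t in range(len(values) - 1):
        r = goal_reward if t == last else step_reward
        new[t] = values[t] + alpha * (r + gamma * values[t + 1] - values[t])
    return new


for name, vals in [
    ("gradient (a)", gradient_update(VALUES, alpha=0.73)),
    ("TD(0) (b)", td0_update(VALUES, alpha=0.86, gamma=0.82)),
]:
    print(name, [round(v, 4) for v in vals])
```

By hand, the first updates are 0.500 + 0.73 × (0.576 − 0.500) = 0.55548 for part (a) and 0.500 + 0.86 × (−1 + 0.82 × 0.576 − 0.500) = −0.3838048 for part (b). For part (c), calling `td0_update` with `step_reward=0.0`, `goal_reward=0.0` and `gamma=1.0` reproduces `gradient_update`, so under these assumptions the rule in (a) is the special case of TD(0) with no rewards and no discounting.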