![Q6. [15 pts] Reinforcement Learning Imagine an unknown environments with four](https://dsd5zvtm8ll6.cloudfront.net/si.experts.images/questions/2024/10/6718cbf802c48_3276718cbf7d9c06.jpg)
Q6. [15 pts] Reinforcement Learning

Imagine an unknown environment with four states (A, B, C, and X) and two actions. An agent acting in this environment has recorded the following episode:

| s | a | s' | r | Q-learning iteration numbers (for part b) |
|---|---|----|---|-------------------------------------------|
|   |   |    |   | 1, 10, 19, ... |
|   |   |    |   | 2, 11, 20, ... |
|   |   |    |   | 3, 12, 21, ... |
|   |   |    |   | 4, 13, 22, ... |
|   |   |    |   | 5, 14, 23, ... |
|   |   |    |   | 6, 15, 24, ... |
|   |   |    |   | 7, 16, 25, ... |
|   |   |    |   | 8, 17, 26, ... |
|   |   |    |   | 9, 18, 27, ... |

(a) [4 pts] Consider running model-based reinforcement learning based on the episode above. Calculate the following quantities:

T(B, -, C) =
R(C, -, X) =

(b) [5 pts] Now consider running Q-learning, repeating the above series of transitions in an infinite sequence. Each transition is seen at multiple iterations of Q-learning, with the iteration numbers shown in the table above. After which iteration of Q-learning does each of the following quantities first become nonzero? (If it always remains zero, write "never".)

Q(A, →)?
Q(B,
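Both parts rest on standard estimators. In part (a), model-based RL estimates T(s, a, s') as count(s, a, s') / count(s, a) and R(s, a, s') as the average reward observed on that transition. In part (b), each replayed transition applies the Q-learning update Q(s, a) ← (1 − α) Q(s, a) + α [r + γ max_a' Q(s', a')]. With nonnegative rewards, a Q-value can only become nonzero once its target is nonzero, so nonzero values spread backward one transition at a time from the rewarding transition. The sketch below implements both ideas, replaying transition k at iterations k, k + 9, k + 18, ... as in the table. Because the s, a, s', r columns are not legible in this copy, the episode, the action names `left`/`right`, and the values α = 0.5 and γ = 1 are hypothetical assumptions for illustration, not the exam's data.

```python
from collections import defaultdict

# Hypothetical episode for illustration only (NOT the exam's table): (s, a, s', r).
EPISODE = [
    ("A", "right", "B", 0),
    ("B", "right", "A", 0),
    ("A", "right", "B", 0),
    ("B", "right", "C", 0),
    ("C", "right", "X", 1),
]
ACTIONS = ("left", "right")  # hypothetical action names


def estimate_model(episode):
    """Model-based RL: T(s,a,s') = count(s,a,s') / count(s,a);
    R(s,a,s') = average reward observed for (s,a,s')."""
    sa_counts = defaultdict(int)
    sas_counts = defaultdict(int)
    reward_sums = defaultdict(float)
    for s, a, s2, r in episode:
        sa_counts[(s, a)] += 1
        sas_counts[(s, a, s2)] += 1
        reward_sums[(s, a, s2)] += r
    T = {k: n / sa_counts[k[:2]] for k, n in sas_counts.items()}
    R = {k: reward_sums[k] / n for k, n in sas_counts.items()}
    return T, R


def first_nonzero_iterations(episode, actions, alpha=0.5, gamma=1.0, max_iters=1000):
    """Replay the episode cyclically (iteration i uses transition (i-1) mod len)
    with tabular Q-learning; record when each Q(s, a) first becomes nonzero."""
    # Unseen Q(s, a) default to 0; X never appears as a start state here,
    # so its Q-values stay 0 (it behaves as a terminal state).
    Q = defaultdict(float)
    first = {}
    for it in range(1, max_iters + 1):
        s, a, s2, r = episode[(it - 1) % len(episode)]
        target = r + gamma * max(Q[(s2, b)] for b in actions)
        Q[(s, a)] = (1 - alpha) * Q[(s, a)] + alpha * target
        if Q[(s, a)] != 0 and (s, a) not in first:
            first[(s, a)] = it
    return first


if __name__ == "__main__":
    T, R = estimate_model(EPISODE)
    print("T(B, right, C) =", T[("B", "right", "C")])
    print("R(C, right, X) =", R[("C", "right", "X")])
    print(first_nonzero_iterations(EPISODE, ACTIONS))
```

With this hypothetical five-step episode, T(B, right, C) = 0.5 because (B, right) was seen twice and led to C once. The reward at step 5 makes Q(C, right) nonzero at iteration 5; it then propagates backward, making Q(B, right) nonzero at iteration 9 and Q(A, right) nonzero at iteration 11.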