Question:

## 4.D - Success Rate Plot
Use Matplotlib to create a line plot showing the progress in the success rate for the Monte Carlo agent. The y-values for the line plot should come from the success rate list created in the previous cell. The x-values should be the corresponding number of episodes. The figure should also have the following characteristics (a sketch follows the list).
* A figsize of `[4,3]`.
* The title should read "MC Agent Success Rate".
* The x and y axes should be labeled "Number of Episodes" and "Success Rate", respectively.
* Add a grid to your plot.
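A minimal sketch of such a figure, assuming the success rates from the previous cell are stored in a list named `s_rates` and that each entry was recorded after another `num_eps` training episodes; both names are assumptions and should be matched to your earlier cell.

```python
import matplotlib.pyplot as plt

# Assumed names: s_rates is the success-rate list built in the previous cell,
# and num_eps is the number of episodes run per training iteration there.
episodes = [i * num_eps for i in range(1, len(s_rates) + 1)]  # x-values

plt.figure(figsize=[4, 3])
plt.plot(episodes, s_rates)
plt.title("MC Agent Success Rate")
plt.xlabel("Number of Episodes")
plt.ylabel("Success Rate")
plt.grid(True)
plt.show()
```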
## 4.E - Display Policy
Calculate the mean absolute difference between the optimal state-value function and the current estimate produced by Monte Carlo control. Print the message shown below with the blank filled in with the appropriate value, rounded to 2 decimal places.
The mean absolute difference in V is ____.
Display the environment from 4.A, setting `fill` to shade the cells according to their value under the policy found by MC control, and set `contents` to display that policy. When calling display(), set `size=2` and `show_nums=False`.
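A possible sketch of this step, assuming `V_opt` holds the optimal state-value function computed earlier, that the MC agent exposes its value estimate and greedy policy as `mc_agent.V` and `mc_agent.policy`, and that `display()` is a method on the 4.A environment `env`; all of these names are assumptions and should be replaced by the course's actual objects.

```python
import numpy as np

# Assumed names: V_opt (optimal state values) and mc_agent.V (MC estimate)
# are arrays defined over the same set of states.
mad = np.mean(np.abs(np.asarray(V_opt) - np.asarray(mc_agent.V)))
print(f"The mean absolute difference in V is {mad:.2f}.")

# Shade each cell by its estimated value and overlay the greedy MC policy.
env.display(fill=mc_agent.V, contents=mc_agent.policy, size=2, show_nums=False)
```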
## 4.F - Q-Learning
Starter code has been provided in the cell below. Complete this code to repeat the process outlined in Step 4.C, but using Q-learning instead of MC control. The process is identical to that described in Step 4.C, with two exceptions:
1. You will use Q-learning instead of MC control.
2. The characters "MC" in the output should be replaced with "TD".
```python
______ = TDAgent(env=______, gamma=1, random_state=1)
s_rates_2 = []
for i in range(1, 11):
    num_eps = ______
    ______.q_learning(episodes=num_eps, epsilon=10**(-i), alpha=0.01, max_steps=200, exploring_starts=______)
    sr = success_rate(env=______, policy=______, episodes=1000, max_steps=200, random_state=i)
    s_rates_2.append(sr)
    print(f"After {i * num_eps} episodes, the TD agent's success rate was {sr:.3f}.")
```
## 4.G - Success Rate Plot
Repeat the steps outlined in Step 4.D, but using the list created for Q-Learning in 4.F instead. The title of this figure should be "TD Agent Success Rate".
## 4.H - Display Policy
Repeat the steps outlined in Step 4.E, but using the policy and state-value function estimates found using Q-learning rather than those found by Monte Carlo control.
