Question: help with Q 1

import numpy as np
import matplotlib.pyplot as plt
import time
from tqdm import tqdm
from aitools.algs import DPAgent, MCAgent
from aitools.envs import FrozenPlatform
Create Environment
An instance of the FrozenPlatform environment has been provided for you in this cell. Call the display() method of this instance with fill='slip' and contents='slip' to display the environment with the slip probabilities for each state.
Run the cells below.
pi1={0:0,1:2,2:2,3:2,4:3,5:1,6:1,7:2,8:0,9:0,10:1,11:2,12:2,13:0,14:1,15:1,16:0}
pi2={0:0,1:2,2:2,3:2,4:3,5:1,6:2,7:2,8:0,9:0,10:1,11:2,12:2,13:0,14:1,15:1,16:0}
plt.subplot(1,2,1)
fp1.display(contents=pi1, fill=None, show_fig=False)
plt.subplot(1,2,2)
fp1.display(contents=pi2, fill=None, show_fig=False)
plt.show()
Create two instances of the DPAgent class, each using the environment created in Step 1.A, and each with gamma=1. One of the agents should be set to have policy pi1 and the other should have policy pi2. Run policy evaluation for both agents to evaluate the two policies.
Then display a 1x2 grid of subplots. Each subplot should show a display of the environment along with a policy. The first subplot should display pi1 and have cells shaded according to the value function for pi1. The second plot should be similar, but should use policy pi2 and its value function.
Note: You can copy the code for the subplots from 1.B, adjusting the arguments used for the fill and contents parameters.
Print the value of State 1 (the initial state) under each policy.
You will now estimate the agent's success rate when following each policy. This will be accomplished by generating 10,000 episodes according to each policy and then calculating the proportion of episodes that were successful.
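To make concrete what "running policy evaluation" computes, here is a minimal sketch in plain Python. It does not use aitools or DPAgent, whose internals are not shown in this question. The four-state model, its action names, and the helper evaluate_policy are all hypothetical and exist only for illustration.

```python
# Hypothetical toy model (NOT FrozenPlatform). P[state][action] is a list of
# (probability, next_state, reward) outcomes. States missing from P are terminal.
STATES = ["start", "mid", "goal", "hole"]
P = {
    "start": {
        "safe":  [(0.9, "mid", 0.0), (0.1, "hole", 0.0)],
        "risky": [(0.5, "goal", 1.0), (0.5, "hole", 0.0)],
    },
    "mid": {
        "safe":  [(0.9, "goal", 1.0), (0.1, "hole", 0.0)],
        "risky": [(0.5, "goal", 1.0), (0.5, "hole", 0.0)],
    },
}


def evaluate_policy(states, P, policy, gamma=1.0, tol=1e-10):
    """Iterative policy evaluation with in-place sweeps over the states."""
    V = {s: 0.0 for s in states}  # terminal states keep value 0
    while True:
        delta = 0.0
        for s in states:
            if s not in P:  # terminal state: nothing to update
                continue
            # Expected one-step return under the action the policy picks in s.
            new_v = sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][policy[s]])
            delta = max(delta, abs(new_v - V[s]))
            V[s] = new_v
        if delta < tol:  # stop once a full sweep changes nothing noticeable
            return V


policy_a = {"start": "safe", "mid": "safe"}
policy_b = {"start": "risky", "mid": "safe"}

V_a = evaluate_policy(STATES, P, policy_a)
V_b = evaluate_policy(STATES, P, policy_b)

print(f"Value of the start state under policy A: {V_a['start']:.4f}")
print(f"Value of the start state under policy B: {V_b['start']:.4f}")
```

In this toy model the only reward is +1 for reaching the goal and gamma is 1, so the value of the start state equals the probability of reaching the goal. Whether the same holds for FrozenPlatform depends on its reward structure, which is not given here.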
Fill in the blanks in order to accomplish the requested task. Then print the two messages shown below, with the blanks filled in with the appropriate success rates, rounded to 4 decimal places. Aside from filling in the blanks, do not change any code provided.
N =10000
goals1=0
goals2=0
np.random.seed(1)
for i in range(N):
    ep1=______.generate_episode(policy=______)
    ep2=______.generate_episode(policy=______)
    if ep1.state == ep1.______:
        goals1+=1
    if ep2.state == ep2.______:
        goals2+=1
sr1=______
sr2=______
print(f"Under policy 1, the agent's success rate was {______:.4f}.")
print(f"Under policy 2, the agent's success rate was {______:.4f}.")
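The idea behind this cell, estimating a success rate as the fraction of simulated episodes that end at the goal, can be sketched without aitools. The example below is an illustration under stated assumptions, not a way to fill in the blanks above: the toy model, the policies, and the helper run_episode are hypothetical, and it uses Python's random module rather than the MCAgent API.

```python
import random

# Hypothetical toy model (NOT FrozenPlatform). P[state][action] is a list of
# (probability, next_state) outcomes. States missing from P are terminal.
P = {
    "start": {
        "safe":  [(0.9, "mid"), (0.1, "hole")],
        "risky": [(0.5, "goal"), (0.5, "hole")],
    },
    "mid": {
        "safe":  [(0.9, "goal"), (0.1, "hole")],
        "risky": [(0.5, "goal"), (0.5, "hole")],
    },
}


def run_episode(P, policy, start, rng):
    """Follow the policy from start until a terminal state; return that state."""
    s = start
    while s in P:
        outcomes = P[s][policy[s]]
        probs = [p for p, _ in outcomes]
        nexts = [s2 for _, s2 in outcomes]
        s = rng.choices(nexts, weights=probs)[0]  # sample the next state
    return s


def success_rate(P, policy, n_episodes, rng):
    """Proportion of episodes that terminate in the goal state."""
    goals = sum(run_episode(P, policy, "start", rng) == "goal"
                for _ in range(n_episodes))
    return goals / n_episodes


N = 10_000
rng = random.Random(1)  # fixed seed so the estimate is reproducible

policy_a = {"start": "safe", "mid": "safe"}
policy_b = {"start": "risky", "mid": "safe"}

sr_a = success_rate(P, policy_a, N, rng)
sr_b = success_rate(P, policy_b, N, rng)

print(f"Under policy A, the estimated success rate was {sr_a:.4f}.")
print(f"Under policy B, the estimated success rate was {sr_b:.4f}.")
```

For this toy model the exact success probabilities are 0.81 for policy A and 0.5 for policy B, so with 10,000 episodes the estimates should land close to those figures.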
