Question: Python, help with Question 2.
2.A - Create Environment
Create a 5x5 instance of the FrozenPlatform environment with sp_range=[0.1,0.3], start=1, holes=3, and with random_state=1. Display this environment with the cells shaded to indicate their slip probabilities and with the cell contents left blank.
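A plausible way to write this cell, assuming FrozenPlatform is the course-provided class. The variable is named fp2 to match the code in Step 2.B; the size argument and the values passed to display() are assumptions about the class's API, since only sp_range, start, holes, and random_state are named in the prompt.

# Create the 5x5 environment with the parameters given in the prompt.
# The 'size' keyword is an assumed parameter name.
fp2 = FrozenPlatform(size=(5, 5), sp_range=[0.1, 0.3], start=1, holes=3, random_state=1)

# Shade cells by slip probability and leave the cell contents blank.
# The values passed to fill and contents are assumptions about the display() API.
fp2.display(fill='slip_prob', contents=None)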
2.B - Random Actions
You will now estimate the agent's success rate when taking random actions. We will compare this with the success rate of a trained agent later.
Fill in the blanks in this cell to accomplish this task. Then print the message shown below, with the blank filled in with the appropriate success rate rounded to 4 decimal places.
N = 10000
goals = 0
np.random.seed(1)
for i in range(N):
    ep = fp2.copy()
    while ep.terminal == False:
        a = np.random.choice(ep.get_actions())
        ep = ep.take_action(a)
    if ep.state == ep.______:
        goals += 1
sr = ______
print(f"When acting randomly, the agent's success rate was {______:.4f}")
2.C - Policy Iteration
Create an instance of the DPAgent class for the environment created in Step 2.A, with gamma=1 and random_state=1. Run policy iteration with the default parameters.
Then call the show_history() method of the DPAgent instance to display a sequence of plots showing the policy and value function after each step of policy iteration.
Finally, call the report() method of the agent to show a summary of each step of policy iteration.
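A sketch of what this cell might look like, assuming DPAgent takes the environment as its first argument and that the policy-iteration routine is called policy_iteration(); the method name is an assumption, while show_history() and report() are named in the prompt.

# Dynamic-programming agent for the environment from Step 2.A.
agent = DPAgent(fp2, gamma=1, random_state=1)

# Run policy iteration with the default parameters (method name assumed).
agent.policy_iteration()

# Plot the policy and value function after each step of policy iteration.
agent.show_history()

# Print a summary of each policy-iteration step.
agent.report()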
2.D - Value of Initial State
Print the value of State 1 (the initial state) under the optimal policy.
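Assuming the agent stores its state-value estimates in an array-like attribute (the name V below is an assumption; check your DPAgent class for the actual name), this could be as simple as:

# Value of State 1 (the initial state) under the optimal policy.
print(agent.V[1])   # 'V' is an assumed attribute name for the value function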
2.E - Success Rates
You will now estimate the agent's success rate when following the optimal policy. This will be accomplished by generating 10,000 episodes under the policy and then calculating the proportion of episodes that were successful.
Fill in the blanks in order to accomplish the requested task. Then print the three messages shown below, with the blanks filled in with the appropriate values, rounded to 4 decimal places. Aside from filling in the blanks, do not change any code provided.
# 2.E
N = 10000
goals = 0
total_return = 0
np.random.seed(1)
for i in tqdm(range(N)):
    ep = ______.generate_episode(policy=______.policy)
    total_return += np.sum(ep.rewards)
    if ep.state == ep.______:
        goals += 1
sr = ______
avg_ret = ______
print('\nWhen working under the optimal policy:')
print(f"The agent's success rate was {______:.4f}.")
print(f"The agent's average return was {______:.4f}.")
2.F - Successful Episode
Use the generate_episode() method of the environment to simulate an episode following the optimal policy found by policy iteration. Set show_result=True and set a value of your choice for random_state.
Call the display() method of the environment, setting the fill, contents, and show_path parameters so that cells are shaded to indicate the optimal state-value function, arrows for the optimal policy are displayed, and the path taken during the episode is shown.
Experiment with the value of random_state to find one that results in the agent finding the goal. Use that value for your final submission.
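A sketch under the same assumptions. The seed below is just an example to experiment with, and the values passed to fill and contents are assumptions about the display() API (agent.V is the assumed value-function attribute from Step 2.D; check the class definition for the exact options display() accepts).

# Simulate one episode under the optimal policy and show the result.
ep = fp2.generate_episode(policy=agent.policy, show_result=True, random_state=7)

# Shade cells by the optimal state-value function, draw the policy arrows,
# and show the path taken during the episode.
fp2.display(fill=agent.V, contents=agent.policy, show_path=True)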
2.G - Failed Episode
Repeat Step 2.F, but this time find a value for random_state that results in a failed episode with at least 4 steps.
