Question: Python. Help with 2 2 . A - Create Environment. Create a 5 x 5 instance of the FrozenPlatform environment with sp_range =
A Create Environment
Create a 5 x 5 instance of the FrozenPlatform environment with the assigned values for sp_range, start, and holes, and with the assigned random_state. Display this environment with the cells shaded to indicate their slip probabilities and with the cell contents left blank.
B Random Actions
You will now estimate the agent's success rate when taking random actions. We will compare this with the success rate of a trained agent later.
Fill in the blanks in this cell to accomplish this task. Then print the message shown below, with the blank filled in with the appropriate success rate, rounded to the required number of decimal places.
N = ____
goals = ____
np.random.seed(____)
for i in range(N):
    ep = fp.copy()
    while ep.terminal == False:
        a = np.random.choice(ep.get_actions())
        ep = ep.take_action(a)
    if ep.state == ep.____:
        goals += ____
sr = ____
print(f"When acting randomly, the agent's success rate was {____:.____f}")
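The estimation loop above can be sketched without the course library. The example below is only an illustration under stated assumptions: `ToyLine` is a hypothetical stand-in for FrozenPlatform (a five-cell corridor with a hole at one end and a goal at the other), and its `goal` attribute is an invented name, not necessarily the attribute the real environment uses. The method names `copy`, `get_actions`, and `take_action` mirror the ones in the cell.

```python
import numpy as np


class ToyLine:
    """Hypothetical stand-in environment: cells 0..4, hole at 0, goal at 4."""

    goal = 4  # assumed attribute name, for illustration only

    def __init__(self, state=2):
        self.state = state
        self.terminal = state in (0, 4)

    def copy(self):
        return ToyLine(self.state)

    def get_actions(self):
        return [-1, 1]  # step left or step right

    def take_action(self, a):
        # Returns a new environment object, like ep = ep.take_action(a) above.
        return ToyLine(self.state + a)


def random_success_rate(env, n_episodes, seed):
    """Fraction of random-action episodes that end in the goal cell."""
    rng = np.random.default_rng(seed)
    goals = 0
    for _ in range(n_episodes):
        ep = env.copy()
        while not ep.terminal:
            a = rng.choice(ep.get_actions())
            ep = ep.take_action(a)
        if ep.state == ep.goal:
            goals += 1
    return goals / n_episodes


sr = random_success_rate(ToyLine(), n_episodes=2000, seed=0)
print(f"When acting randomly, the toy agent's success rate was {sr:.4f}")
```

Seeding the generator once, before the loop, keeps the whole batch of episodes reproducible while still letting each episode differ.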
C Policy Iteration
Create an instance of the DPAgent class for the environment created in Step A, with the assigned values for gamma and random_state. Run policy iteration with the default parameters.
Then call the show_history method of the DPAgent instance to display a sequence of plots showing the policy and value function after each step of policy iteration.
Finally, call the report method of the agent to show a summary of each step of policy iteration.
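For readers who want to see what policy iteration does internally, here is a minimal NumPy sketch on a tiny tabular MDP. It is not the DPAgent implementation; the function name `policy_iteration`, the array layout (`P[a, s, s2]`, `R[s, a]`), and the three-state example are all assumptions made for illustration. Evaluation is done by solving the linear system exactly, which is one common variant.

```python
import numpy as np


def policy_iteration(P, R, gamma, tol=1e-10):
    """P[a, s, s2]: transition probabilities; R[s, a]: expected reward."""
    n_actions, n_states, _ = P.shape
    states = np.arange(n_states)
    policy = np.zeros(n_states, dtype=int)
    while True:
        # Policy evaluation: solve (I - gamma * P_pi) v = r_pi.
        P_pi = P[policy, states]  # row s is P[policy[s], s, :]
        r_pi = R[states, policy]
        v = np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)
        # Policy improvement: act greedily with respect to q(s, a).
        q = R + gamma * np.einsum("ast,t->sa", P, v)
        # Keep the current action on (near-)ties so the loop cannot cycle.
        keep = q[states, policy] >= q.max(axis=1) - tol
        new_policy = np.where(keep, policy, q.argmax(axis=1))
        if np.array_equal(new_policy, policy):
            return policy, v
        policy = new_policy


# Toy MDP: state 0 = start, 1 = goal (absorbing), 2 = hole (absorbing).
# Action 0 is "risky" (50% goal), action 1 is "safe" (80% goal).
P = np.zeros((2, 3, 3))
P[0, 0] = [0.0, 0.5, 0.5]
P[1, 0] = [0.0, 0.8, 0.2]
P[:, 1, 1] = 1.0
P[:, 2, 2] = 1.0
R = np.array([[0.5, 0.8], [0.0, 0.0], [0.0, 0.0]])  # +1 for reaching the goal

policy, v = policy_iteration(P, R, gamma=0.9)
print("policy:", policy, "value of start state:", round(float(v[0]), 4))
```

Each pass through the loop corresponds to one evaluate-then-improve step; the loop stops when improvement leaves the policy unchanged.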
D Value of Initial State
Print the value of the initial state under the optimal policy.
E Success Rates
You will now estimate the agent's success rate when following the optimal policy. This will be accomplished by generating episodes according to the policy and then calculating the proportion of episodes that were successful.
Fill in the blanks in order to accomplish the requested task. Then print the three messages shown below, with the blanks filled in with the appropriate values, rounded to the required number of decimal places. Aside from filling in the blanks, do not change any code provided.
# E
N = ____
goals = ____
total_return = ____
np.random.seed(____)
for i in tqdm(range(N)):
    ep = ____.generate_episode(policy=____.policy)
    total_return += np.sum(ep.rewards)
    if ep.state == ep.____:
        goals += ____
sr = ____
avg_ret = ____
print()
print("When working under the optimal policy:")
print(f"The agent's success rate was {____:.____f}")
print(f"The agent's average return was {____:.____f}")
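The same bookkeeping (count goals, accumulate returns, divide by the number of episodes) can be sketched on a self-contained toy problem. Everything here is hypothetical: `run_episode`, the corridor layout, the slip rule, and the reward values (-1 per step, +10 for entering the goal) are invented for illustration and are not the FrozenPlatform rewards.

```python
import numpy as np


def run_episode(policy, slip, rng, max_steps=100):
    """Hypothetical corridor: cells 0..4, hole at 0, goal at 4, start at 2.

    With probability `slip` the intended move is reversed.
    """
    state, rewards = 2, []
    for _ in range(max_steps):
        if state in (0, 4):
            break
        move = policy[state]
        if rng.random() < slip:
            move = -move
        state += move
        rewards.append(10.0 if state == 4 else -1.0)
    return state, rewards


def evaluate(policy, slip, n_episodes, seed):
    """Return (success rate, average undiscounted return) over n_episodes."""
    rng = np.random.default_rng(seed)
    goals, total_return = 0, 0.0
    for _ in range(n_episodes):
        final_state, rewards = run_episode(policy, slip, rng)
        total_return += np.sum(rewards)
        goals += int(final_state == 4)
    return goals / n_episodes, total_return / n_episodes


always_right = {1: 1, 2: 1, 3: 1}  # move toward the goal from every cell
sr, avg_ret = evaluate(always_right, slip=0.2, n_episodes=2000, seed=1)
print(f"The agent's success rate was {sr:.4f}")
print(f"The agent's average return was {avg_ret:.4f}")
```

Even a policy that always heads for the goal fails sometimes when moves can slip, which is why the success rate is estimated from many episodes rather than read off a single run.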
F Successful Episode
Use the generate_episode method of the environment to simulate an episode following the optimal policy found by policy iteration. Set show_result=True and set a value of your choice for random_state.
Call the display method of the environment, setting the fill, contents, and show_path parameters so that cells are shaded to indicate the optimal state-value function, arrows for the optimal policy are displayed, and the path taken during the episode is shown.
Experiment with the value of random_state to find one that results in the agent finding the goal. Use that value for your final submission.
G Failed Episode
Repeat Step F, but this time find a value for random_state that results in a failed episode with at least the required number of steps.
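Rather than trying seeds by hand in Steps F and G, the search can be automated. The sketch below shows the idea on a made-up simulator: `simulate` and `find_seed` are hypothetical helpers, and the corridor with a fixed slip probability stands in for a call such as generate_episode with a given random_state. The minimum step count of 4 is an arbitrary example value.

```python
import numpy as np


def simulate(seed, slip=0.3, max_steps=50):
    """Hypothetical episode: cells 0..4, hole at 0, goal at 4, start at 2.

    The agent always tries to move right and slips left with probability `slip`.
    Returns (reached_goal, path).
    """
    rng = np.random.default_rng(seed)
    state, path = 2, [2]
    while state not in (0, 4) and len(path) <= max_steps:
        state += -1 if rng.random() < slip else 1
        path.append(state)
    return state == 4, path


def find_seed(want_success, min_steps=0, max_seed=10_000):
    """Return the first seed whose episode matches the request, else None."""
    for seed in range(max_seed):
        success, path = simulate(seed)
        if success == want_success and len(path) - 1 >= min_steps:
            return seed
    return None


good_seed = find_seed(want_success=True)
bad_seed = find_seed(want_success=False, min_steps=4)
print("seed for a successful episode:", good_seed)
print("seed for a failed episode with at least 4 steps:", bad_seed)
```

Once a suitable seed is found, passing it back as the random state reproduces the same episode for display.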
