Question: Python, help with Question 2.
2.A - Create Environment
Create a 5x5 instance of the FrozenPlatform environment with sp_range=[0.1,0.3], start=1, holes=3, and with random_state=1. Display this environment with the cells shaded to indicate their slip probabilities and with the cell contents left blank.
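A plausible way to write this cell, assuming FrozenPlatform is the course-provided class. The variable is named fp2 to match the code in Step 2.B; the size argument and the values passed to display() are assumptions about the class's API, since only sp_range, start, holes, and random_state are named in the prompt.

# Create the 5x5 environment with the parameters given in the prompt.
# The 'size' keyword is an assumed parameter name.
fp2 = FrozenPlatform(size=(5, 5), sp_range=[0.1, 0.3], start=1, holes=3, random_state=1)

# Shade cells by slip probability and leave the cell contents blank.
# The values passed to fill and contents are assumptions about the display() API.
fp2.display(fill='slip_prob', contents=None)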
2.B - Random Actions
You will now estimate the agent's success rate when taking random actions. We will compare this with the success rate of a trained agent later.
Fill in the blanks in this cell to accomplish this task. Then print the message shown below, with the blank filled in with the appropriate success rate rounded to 4 decimal places.
N = 10000
goals = 0
np.random.seed(1)
for i in range(N):
    ep = fp2.copy()
    while ep.terminal == False:
        a = np.random.choice(ep.get_actions())
        ep = ep.take_action(a)
    if ep.state == ep.______:
        goals += 1
sr = ______
print(f"When acting randomly, the agent's success rate was {______:.4f}")
2.C - Policy Iteration
Create an instance of the DPAgent class for the environment created in Step 2.A, with gamma=1 and random_state=1. Run policy iteration with the default parameters.
Then call the show_history() method of the DPAgent instance to display a sequence of plots showing the policy and value function after each step of policy iteration.
Finally, call the report() method of the agent to show a summary of each step of policy iteration.
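A sketch of what this cell might look like, assuming DPAgent takes the environment as its first argument and that the policy-iteration routine is called policy_iteration(); the method name is an assumption, while show_history() and report() are named in the prompt.

# Dynamic-programming agent for the environment from Step 2.A.
agent = DPAgent(fp2, gamma=1, random_state=1)

# Run policy iteration with the default parameters (method name assumed).
agent.policy_iteration()

# Plot the policy and value function after each step of policy iteration.
agent.show_history()

# Print a summary of each policy-iteration step.
agent.report()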
2.D - Value of Initial State
Print the value of State 1 (the initial state) under the optimal policy.
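Assuming the agent stores its state-value estimates in an array-like attribute (the name V below is an assumption; check your DPAgent class for the actual name), this could be as simple as:

# Value of State 1 (the initial state) under the optimal policy.
print(agent.V[1])   # 'V' is an assumed attribute name for the value function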
2.E - Success Rates
You will now estimate the agent's success rate when following the optimal policy. This will be accomplished by generating 10,000 episodes under the policy and then calculating the proportion of episodes that were successful.
Fill in the blanks in order to accomplish the requested task. Then print the three messages shown below, with the blanks filled in with the appropriate values, rounded to 4 decimal places. Aside from filling in the blanks, do not change any code provided.
# 2.E
N = 10000
goals = 0
total_return = 0
np.random.seed(1)
for i in tqdm(range(N)):
    ep = ______.generate_episode(policy=______.policy)
    total_return += np.sum(ep.rewards)
    if ep.state == ep.______:
        goals += 1
sr = ______
avg_ret = ______
print('\nWhen working under the optimal policy:')
print(f"The agent's success rate was {______:.4f}.")
print(f"The agent's average return was {______:.4f}.")
2.F - Successful Episode
Use the generate_episode() method of the environment to simulate an episode following the optimal policy found by policy iteration. Set show_result=True and set a value of your choice for random_state.
Call the display() method of the environment, setting the fill, contents, and show_path parameters so that cells are shaded to indicate the optimal state-value function, arrows for the optimal policy are displayed, and the path taken during the episode is shown.
Experiment with the value of random_state to find one that results in the agent finding the goal. Use that value for your final submission.
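A sketch under the same assumptions. The seed below is just an example to experiment with, and the values passed to fill and contents are assumptions about the display() API (agent.V is the assumed value-function attribute from Step 2.D; check the class definition for the exact options display() accepts).

# Simulate one episode under the optimal policy and show the result.
ep = fp2.generate_episode(policy=agent.policy, show_result=True, random_state=7)

# Shade cells by the optimal state-value function, draw the policy arrows,
# and show the path taken during the episode.
fp2.display(fill=agent.V, contents=agent.policy, show_path=True)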
2.G - Failed Episode
Repeat Step 2.F, but this time find a value for random_state that results in a failed episode with at least 4 steps.
