Question: help with Q1

import numpy as np

import matplotlib.pyplot as plt

import time

from tqdm import tqdm

from aitools.algs import DPAgent, MCAgent

from aitools.envs import FrozenPlatform

Create Environment

An instance of the FrozenPlatform environment has been provided for you in this cell. Call the display() method of this instance with fill='slip' and contents='slip' to display the environment with the slip probabilities for each state.

Run the cells below:

pi1 = {0: 0, 1: 2, 2: 2, 3: 2, 4: 3, 5: 1, 6: 1, 7: 2, 8: 0, 9: 0, 10: 1, 11: 2, 12: 2, 13: 0, 14: 1, 15: 1, 16: 0}

pi2 = {0: 0, 1: 2, 2: 2, 3: 2, 4: 3, 5: 1, 6: 2, 7: 2, 8: 0, 9: 0, 10: 1, 11: 2, 12: 2, 13: 0, 14: 1, 15: 1, 16: 0}

# "?1" stands for the environment variable, whose name did not survive extraction.
plt.subplot(1, 2, 1)
?1.display(contents=pi1, fill=None, show_fig=False)
plt.subplot(1, 2, 2)
?1.display(contents=pi2, fill=None, show_fig=False)
plt.show()
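The two policies listed above are nearly identical, so it can help to see exactly where they disagree before comparing their value functions. The following is a minimal sketch in plain Python; it assumes the dictionaries map state numbers to action indices, as the listing suggests, and it does not use the aitools package.

```python
# Policies copied from the listing above: state -> action index.
pi1 = {0: 0, 1: 2, 2: 2, 3: 2, 4: 3, 5: 1, 6: 1, 7: 2, 8: 0,
       9: 0, 10: 1, 11: 2, 12: 2, 13: 0, 14: 1, 15: 1, 16: 0}
pi2 = {0: 0, 1: 2, 2: 2, 3: 2, 4: 3, 5: 1, 6: 2, 7: 2, 8: 0,
       9: 0, 10: 1, 11: 2, 12: 2, 13: 0, 14: 1, 15: 1, 16: 0}

# Collect the states where the two policies choose different actions.
diff_states = [s for s in pi1 if pi1[s] != pi2[s]]

for s in diff_states:
    print(f"state {s}: pi1 -> {pi1[s]}, pi2 -> {pi2[s]}")
# state 6: pi1 -> 1, pi2 -> 2
```

Any difference between the two value functions therefore comes from the single action change in state 6.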

Create two instances of the DPAgent class, each using the environment created in Step 1, and each with gamma=1. One of the agents should be set to have policy pi1 and the other should have policy pi2.

Run policy evaluation for both agents to evaluate the two policies.

Then display a 1x2 grid of subplots. Each subplot should show a display of the environment along with a policy. The first subplot should display pi1 and have cells shaded according to the value function for pi1. The second subplot should be similar, but should use policy pi2 and its value function.

Note: You can copy the subplot code shown above, adjusting the arguments used for the fill and contents parameters.

Print the value of State 1 (the initial state) under each policy.
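For intuition about what policy evaluation computes, here is a small sketch of iterative policy evaluation in plain Python. It is not the DPAgent implementation from aitools, whose internals are not shown in the question. The four-state corridor, the slip probabilities, the reward of 1 for reaching the goal, and the names `evaluate_policy`, `outcomes`, `pi_a`, and `pi_b` are all hypothetical and chosen only for illustration.

```python
def evaluate_policy(states, transitions, policy, gamma=1.0, tol=1e-10):
    """Iterative policy evaluation.

    transitions[s][a] is a list of (probability, next_state, reward) tuples.
    States absent from `transitions` are terminal and keep value 0.
    """
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in transitions:
            # Expected one-step return when following the policy in state s.
            new_v = sum(p * (r + gamma * V[s2])
                        for p, s2, r in transitions[s][policy[s]])
            delta = max(delta, abs(new_v - V[s]))
            V[s] = new_v
        if delta < tol:
            return V


# Hypothetical corridor: 0 = pit (terminal), 1 = start, 2 = middle, 3 = goal (terminal).
# Actions: 0 = left, 1 = right. A slip sends the agent straight to the pit.
SLIP = {1: 0.2, 2: 0.5}
LEFT, RIGHT = 0, 1


def outcomes(s, a):
    """Possible results of taking action a in state s."""
    target = s + 1 if a == RIGHT else max(s - 1, 1)
    reward = 1.0 if target == 3 else 0.0
    return [(1 - SLIP[s], target, reward), (SLIP[s], 0, 0.0)]


states = [0, 1, 2, 3]
transitions = {s: {a: outcomes(s, a) for a in (LEFT, RIGHT)} for s in SLIP}

pi_a = {1: RIGHT, 2: RIGHT}  # always head for the goal
pi_b = {1: RIGHT, 2: LEFT}   # turns back in state 2 and never reaches the goal

V_a = evaluate_policy(states, transitions, pi_a)
V_b = evaluate_policy(states, transitions, pi_b)
print(f"Value of state 1 under pi_a: {V_a[1]:.4f}")  # 0.4000
print(f"Value of state 1 under pi_b: {V_b[1]:.4f}")  # 0.0000
```

In this toy setup, with gamma=1 and a reward of 1 only for reaching the goal, the value of the start state equals the probability of success (0.8 x 0.5 = 0.4 for `pi_a`). Whether the same holds for FrozenPlatform depends on its reward structure, which the question does not state.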

You will now estimate the agent's success rate when following each policy. This will be accomplished by generating 10,000 episodes according to each policy and then calculating the proportion of episodes that were successful.

Fill in the blanks in order to accomplish the requested task. Then print the two messages shown below, with the blanks filled in with the appropriate success rates, rounded to 4 decimal places. Aside from filling in the blanks, do not change any code provided.

# "?" marks identifiers that did not survive extraction from the original page.
? = 10000
goals1 = 0
goals2 = 0
np.random.seed(1)
for i in range(?):
    ep1 = ______.generate_episode(policy=______)
    ep2 = ______.generate_episode(policy=______)
    if ep1.state == ?1.______:
        goals1 += 1
    if ep2.state == ?2.______:
        goals2 += 1

?1 = ______
?2 = ______
print(f"Under policy 1, the agent's success rate was {______:.4f}.")
print(f"Under policy 2, the agent's success rate was {______:.4f}.")
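The template above follows a common Monte Carlo pattern: simulate many episodes, count those that end in the goal state, divide by the number of episodes, and format the result to 4 decimal places. The sketch below shows that pattern on a hypothetical three-step corridor using only the standard library. It does not use aitools and does not fill in the blanks; `SLIP`, `run_episode`, and `estimate_success_rate` are illustrative names, not part of the assignment.

```python
import random

# Hypothetical corridor: 0 = pit, 1 = start, 2 = middle, 3 = goal.
# In each non-terminal state the agent slips into the pit with the given probability.
SLIP = {1: 0.2, 2: 0.5}
PIT, START, GOAL = 0, 1, 3


def run_episode(rng):
    """Walk right from START and return the terminal state reached."""
    state = START
    while state not in (PIT, GOAL):
        state = PIT if rng.random() < SLIP[state] else state + 1
    return state


def estimate_success_rate(n_episodes, seed=1):
    """Fraction of simulated episodes that end in the goal state."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    goals = sum(run_episode(rng) == GOAL for _ in range(n_episodes))
    return goals / n_episodes


rate = estimate_success_rate(10_000)
print(f"Under this toy policy, the agent's success rate was {rate:.4f}.")
```

The exact success probability in this toy corridor is 0.8 x 0.5 = 0.4, so with 10,000 episodes the estimate should land close to that value. The `:.4f` format specifier is what rounds the printed rate to 4 decimal places.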
