Question: Implement a passive learning agent in a simple environment, such as the 4 3 world. For the case of an initially unknown environment model,

Implement a passive learning agent in a simple environment, such as the 4 × 3 world. For the case of an initially unknown environment model, compare the learning performance of the direct utility estimation, TD, and ADP algorithms. Do the comparison for the optimal policy and for several random policies. For which do the utility estimates converge faster? What happens when the size of the environment is increased? (Try environments with and without obstacles.)

Step by Step Solution

3.47 Rating (167 Votes )

There are 3 Steps involved in it

1 Expert Approved Answer
Step: 1 Unlock

The code repository shows an example of this implemented in the passi... View full answer

blur-text-image
Question Has Been Solved by an Expert!

Get step-by-step solutions from verified subject matter experts

Step: 2 Unlock
Step: 3 Unlock

Students Have Also Explored These Related Artificial Intelligence A Modern approach Questions!