Question 2 (RL) [50 points - each part 12.5 points]: Consider the following grid world with five different states. The actions are move east, west, south, north, and exit if it is in a terminal state.

(a) We would like to use model-based learning using the following four observations. What are the estimated transition and reward functions based on these observations?

(b) Implement direct evaluation, a model-free learning method, on those four observations and calculate the value of each state. Assume γ = 0.9.

(c) We would like to use TD learning and Q-learning to find the values of these states. Suppose that we have observed the following transitions (s, a, s', r): (B, East, C, 3), (C, South, E, 3), (C, East, E, 4), (D, West, C, 1), (A, South, C, 3). The initial value of each state is 0. Assume that γ = 0.9 and α = 0.4. What are the learned values from TD learning after all five observations? Show the process of computing these values.

(d) What are the learned Q-values from Q-learning after all five observations? Show the process of computing these values.
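The update rules behind parts (c) and (d) can be checked with a short Python sketch. It replays the five transitions in the order given, applying the TD(0) update V(s) ← (1 - α)V(s) + α[r + γV(s')] and the Q-learning update Q(s, a) ← (1 - α)Q(s, a) + α[r + γ max_a' Q(s', a')], reading γ = 0.9 as the discount and α = 0.4 as the learning rate. The function names `td_learning` and `q_learning` are illustrative only. It also assumes that unvisited states and state-action pairs keep their initial value of 0, and that the max in the Q-learning target ranges over all five actions; that choice would only matter if learned values could go negative, and every reward here is positive.

```python
# Hypothetical helper names; a sketch of TD(0) and Q-learning on the
# five observed transitions from part (c).

# Observed transitions (s, a, s', r), in the order they were seen.
TRANSITIONS = [
    ("B", "East", "C", 3),
    ("C", "South", "E", 3),
    ("C", "East", "E", 4),
    ("D", "West", "C", 1),
    ("A", "South", "C", 3),
]
GAMMA = 0.9  # discount factor (assumed reading of the question)
ALPHA = 0.4  # learning rate (assumed reading of the question)
# Assumption: the max in the Q-learning target ranges over all five actions;
# any (state, action) pair never updated keeps its initial value of 0.
ACTIONS = ("East", "West", "South", "North", "Exit")


def td_learning(transitions, alpha=ALPHA, gamma=GAMMA):
    """TD(0) evaluation: V(s) <- (1 - alpha) V(s) + alpha [r + gamma V(s')]."""
    v = {}  # states absent from the dict have value 0
    for s, _a, s_next, r in transitions:
        old = v.get(s, 0.0)
        v_next = v.get(s_next, 0.0)
        v[s] = (1 - alpha) * old + alpha * (r + gamma * v_next)
        # Print each step so the computation is visible.
        print(f"V({s}) = {1 - alpha:.1f}*{old:.4f} + {alpha:.1f}*"
              f"({r} + {gamma:.1f}*{v_next:.4f}) = {v[s]:.4f}")
    return v


def q_learning(transitions, alpha=ALPHA, gamma=GAMMA):
    """Q-learning: Q(s,a) <- (1 - alpha) Q(s,a) + alpha [r + gamma max_a' Q(s',a')]."""
    q = {}  # (state, action) pairs absent from the dict have value 0
    for s, a, s_next, r in transitions:
        old = q.get((s, a), 0.0)
        best_next = max(q.get((s_next, a2), 0.0) for a2 in ACTIONS)
        q[(s, a)] = (1 - alpha) * old + alpha * (r + gamma * best_next)
        print(f"Q({s},{a}) = {1 - alpha:.1f}*{old:.4f} + {alpha:.1f}*"
              f"({r} + {gamma:.1f}*{best_next:.4f}) = {q[(s, a)]:.4f}")
    return q


if __name__ == "__main__":
    print("TD learning:")
    print(td_learning(TRANSITIONS))
    print("Q-learning:")
    print(q_learning(TRANSITIONS))
```

Running it gives V(A) = 2.0352, V(B) = 1.2, V(C) = 2.32, V(D) = 1.2352 and V(E) = 0 for TD learning, and Q(B, East) = 1.2, Q(C, South) = 1.2, Q(C, East) = 1.6, Q(D, West) = 0.976 and Q(A, South) = 1.776 for Q-learning. The order of the updates matters: the transitions out of D and A both use C's value after its two updates (V(C) = 2.32 for TD, max Q(C, ·) = 1.6 for Q-learning).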
