Question: A simple maze is shown below. The reward is always 0 until reaching the goal (reward = 1). With a discount factor that you decide, please provide the Q-learning formula and the parameters you are using. A true value table is your final answer; there is no need to provide a step-by-step visit of the trial.
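The tabular Q-learning update the question asks for is usually written as

    Q(s, a) <- Q(s, a) + alpha * [ r + gamma * max_a' Q(s', a') - Q(s, a) ]

where alpha is the learning rate and gamma is the discount factor. The maze figure did not survive on this page, so the sketch below runs that update on a hypothetical 3x3 open grid with the goal in the bottom-right corner. The grid layout, the four action names, and the parameter choices (gamma = 0.9, alpha = 0.5, epsilon = 0.2) are all assumptions made for illustration, not the maze or the settings from the original question.

```python
import random

# Hypothetical 3x3 open grid; the real maze layout is not available.
ROWS, COLS = 3, 3
GOAL = (2, 2)
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

GAMMA = 0.9    # discount factor (assumed choice)
ALPHA = 0.5    # learning rate (assumed choice)
EPSILON = 0.2  # exploration rate for epsilon-greedy (assumed choice)


def step(state, action):
    """Deterministic move; walking into the border leaves the state unchanged."""
    dr, dc = ACTIONS[action]
    r, c = state[0] + dr, state[1] + dc
    if not (0 <= r < ROWS and 0 <= c < COLS):
        r, c = state
    nxt = (r, c)
    reward = 1.0 if nxt == GOAL else 0.0  # reward is 0 until the goal is reached
    return nxt, reward


def q_learning(episodes=2000, max_steps=100, seed=0):
    """Run tabular Q-learning and return the Q-table as a dict keyed by (state, action)."""
    rng = random.Random(seed)
    states = [(r, c) for r in range(ROWS) for c in range(COLS)]
    starts = [s for s in states if s != GOAL]
    q = {(s, a): 0.0 for s in states for a in ACTIONS}
    for _ in range(episodes):
        s = rng.choice(starts)
        for _ in range(max_steps):
            # Epsilon-greedy action selection.
            if rng.random() < EPSILON:
                a = rng.choice(list(ACTIONS))
            else:
                a = max(ACTIONS, key=lambda act: q[(s, act)])
            s2, reward = step(s, a)
            # The goal is terminal, so nothing is bootstrapped from it.
            best_next = 0.0 if s2 == GOAL else max(q[(s2, b)] for b in ACTIONS)
            # Q-learning update.
            q[(s, a)] += ALPHA * (reward + GAMMA * best_next - q[(s, a)])
            s = s2
            if s == GOAL:
                break
    return q


def value_table(q):
    """State values V(s) = max_a Q(s, a), laid out as a grid of rows."""
    return [[max(q[((r, c), a)] for a in ACTIONS) for c in range(COLS)]
            for r in range(ROWS)]


if __name__ == "__main__":
    for row in value_table(q_learning()):
        print(["%.3f" % v for v in row])
```

In a deterministic maze with this reward scheme, the learned value of a state should approach gamma^(d - 1), where d is the number of steps on the shortest path from that state to the goal; the goal cell itself keeps the value 0 because it is terminal. The same check can be applied to the actual maze once its walls are encoded in `step`.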
