Question: The direct utility estimation method in Section 21.2 uses distinguished terminal states to indicate the end of a trial. How could it be modified for environments with discounted rewards and no terminal states?
Step by Step Solution
When there are no terminal states, there are no sequences, so we ne...
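One possible way to define such sequences under discounting, offered as a sketch and not as the remainder of the truncated answer above: because rewards k steps ahead are weighted by gamma^k, the reward-to-go from any state can be cut off after a fixed horizon H with an error of at most gamma^H * r_max / (1 - gamma), where r_max bounds the absolute reward. Every window of H steps in one long run can then serve as a sample for the state it starts in. The function names, the parameters, and the assumption that rewards[t] is the reward received in states[t] are all hypothetical choices made for this illustration.

```python
import math
from collections import defaultdict


def truncation_horizon(gamma, r_max, epsilon):
    """Smallest H such that gamma**H * r_max / (1 - gamma) <= epsilon.

    Assumes 0 < gamma < 1 and that r_max bounds the absolute reward.
    """
    tail_bound = r_max / (1 - gamma)
    if tail_bound <= epsilon:
        return 0  # even the full discounted tail is within tolerance
    return math.ceil(math.log(epsilon / tail_bound) / math.log(gamma))


def direct_utility_estimate(states, rewards, gamma, horizon):
    """Average the truncated discounted reward-to-go observed from each state.

    states[t] is the state at time t and rewards[t] the reward received there
    (an assumption of this sketch). Only time steps followed by a full window
    of `horizon` rewards are used as samples.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for t in range(len(states) - horizon + 1):
        reward_to_go = sum(gamma ** k * rewards[t + k] for k in range(horizon))
        totals[states[t]] += reward_to_go
        counts[states[t]] += 1
    return {s: totals[s] / counts[s] for s in totals}


# Usage: a two-state cycle a -> b -> a -> ... with reward 1 in a and 0 in b.
gamma = 0.5
horizon = truncation_horizon(gamma, r_max=1.0, epsilon=0.01)
states = ["a", "b"] * 10
rewards = [1.0, 0.0] * 10
estimates = direct_utility_estimate(states, rewards, gamma, horizon)
```

In this toy cycle the exact discounted utilities are 4/3 for a and 2/3 for b, so the truncated estimates should land within the chosen tolerance of those values. A smaller epsilon gives a longer horizon and therefore fewer usable samples from a run of the same length.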
