Question: Consider the MCTS example below. Each node here is a state, the connections between nodes are actions, and transitions are deterministic.
Within the nodes we have two numbers: the number of rollouts ending in success, and the total number of rollouts that were experienced from that node. The value of a particular node is the ratio of these two numbers, i.e., a probability of success. In the example below, the value of the root node is
If an action from a particular node has not been experienced, we show it ending in a black dot. Since we start with an empty root node, the number of rollouts at the root is simply the sum of the rollouts of all its children.
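As a small illustration of how node values and root counts relate, the sketch below uses made-up counts (the figure's actual numbers are not reproduced here) and represents each node by a hypothetical `(wins, visits)` pair.

```python
from fractions import Fraction

# Hypothetical (wins, visits) pairs for the root's children, keyed by state
# index; these are NOT the numbers from the original figure.
children = {2: (2, 3), 3: (0, 1)}


def value(wins: int, visits: int) -> Fraction:
    # Value of a node = fraction of its rollouts that ended in success.
    return Fraction(wins, visits)


# The root had no rollout of its own, so its counts are the sums over children.
root_wins = sum(w for w, _ in children.values())
root_visits = sum(v for _, v in children.values())
print(root_wins, root_visits, value(root_wins, root_visits))  # prints: 2 4 1/2
```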
Each state below is indexed with a number at its top left.
Now, assume that a greedy tree policy is used to traverse the tree and select the next node to expand, and that a simulation run from that node results in success. Compute the value of each node after the tree is updated by backing up this return.
Think about how the value will get propagated back up the tree.
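One selection, expansion and backup step can be sketched in Python as below. This is a minimal sketch under stated assumptions, not a reference implementation: the tree is hypothetical (the figure with the actual counts is not reproduced here), the names `Node`, `select_greedy`, `backup` and `mcts_step` are illustrative, and we assume the greedy policy moves to the highest-value child while the current node is fully expanded, then expands one untried action (a black dot). The simulation itself is not modeled; its result is passed in as `success`.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Node:
    index: int                 # state index (top-left number in the figure)
    wins: int = 0              # rollouts from this node ending in success
    visits: int = 0            # total rollouts experienced from this node
    untried_actions: int = 0   # actions drawn ending in a black dot
    parent: Optional["Node"] = field(default=None, repr=False)
    children: List["Node"] = field(default_factory=list, repr=False)

    def value(self) -> float:
        # Value = estimated probability of success.
        return self.wins / self.visits if self.visits else 0.0


def add_child(parent: Node, child: Node) -> Node:
    child.parent = parent
    parent.children.append(child)
    return child


def select_greedy(root: Node) -> Node:
    # Assumed greedy tree policy: while the node is fully expanded, move to the
    # child with the highest value; stop at a node with an untried action.
    node = root
    while node.untried_actions == 0 and node.children:
        node = max(node.children, key=lambda c: c.value())
    return node


def backup(node: Optional[Node], success: bool) -> None:
    # Every node on the path from the new node up to the root gets one more
    # rollout, and one more success if the simulation succeeded.
    while node is not None:
        node.visits += 1
        node.wins += int(success)
        node = node.parent


def mcts_step(root: Node, new_index: int, success: bool) -> Node:
    leaf = select_greedy(root)
    if leaf.untried_actions > 0:
        # Expansion: take one unexperienced action, creating a new child.
        leaf.untried_actions -= 1
        leaf = add_child(leaf, Node(new_index))
    backup(leaf, success)  # simulation result is supplied by the caller
    return leaf


# Hypothetical tree; the root's counts equal the sum of its children's.
root = Node(1, wins=2, visits=4)
n2 = add_child(root, Node(2, wins=2, visits=3))
n3 = add_child(root, Node(3, wins=0, visits=1, untried_actions=2))
n4 = add_child(n2, Node(4, wins=0, visits=1, untried_actions=2))
n5 = add_child(n2, Node(5, wins=1, visits=1, untried_actions=2))

new = mcts_step(root, new_index=6, success=True)
for n in (root, n2, n3, n4, n5, new):
    print(n.index, f"{n.wins}/{n.visits}", round(n.value(), 3))
# prints:
# 1 3/5 0.6
# 2 3/4 0.75
# 3 0/1 0.0
# 4 0/1 0.0
# 5 2/2 1.0
# 6 1/1 1.0
```

Only the nodes on the selected path change: each gets one more rollout and, because the simulation succeeded, one more success. Siblings off the path (states 3 and 4 in this made-up tree) keep their old values, and the root's counts remain the sum of its children's counts.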
