Bellman equation
@wiki https://en.wikipedia.org/wiki/Bellman_equation
The Bellman equation was formulated by Richard Bellman as a way to relate the value function to the future actions, states, and rewards of a Markov decision process (MDP).
It defines the value of a state as the expected sum of discounted future rewards, where each reward is weighted by the probability of its occurrence under a given policy π.
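Putting the terms explained below together, the state-value form of the Bellman equation reads

Vπ(s) = Σ_a π(a∣s) Σ_s′ P(s′∣s,a) (R + γVπ(s′))

where γ ∈ [0, 1] is the discount factor and R is the reward received on the transition from s to s′.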
π(a∣s) is the probability of taking an action a in a state s under the policy π.
The policy π simply specifies what the agent ought to do in each state; it can be stochastic or deterministic.
P(s′∣s,a) is the probability of moving to the next state s′ and receiving the reward R, given that the current state is s and action a is taken.
(R + γVπ(s′)) is the immediate reward R plus the discounted value of the next state s′.
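Because the equation expresses Vπ(s) as a fixed point of its own right-hand side, it can be evaluated by repeatedly applying that right-hand side as an update (iterative policy evaluation). The sketch below illustrates this on a small made-up MDP; the two states, two actions, transition probabilities, rewards, and policy are illustrative assumptions rather than anything from the article, and the reward is simplified to a function R(s, a).

```python
import numpy as np

# Minimal sketch: iterative policy evaluation on a made-up 2-state,
# 2-action MDP. All numbers below are illustrative assumptions.

gamma = 0.9                      # discount factor γ
n_states, n_actions = 2, 2

# P[s, a, s2]: probability of landing in state s2 after taking action a in state s
P = np.array([
    [[0.8, 0.2], [0.1, 0.9]],
    [[0.5, 0.5], [0.0, 1.0]],
])

# R[s, a]: expected immediate reward for taking action a in state s
# (simplified; in the article R is the reward received on the transition to s′)
R = np.array([
    [1.0, 0.0],
    [0.0, 2.0],
])

# pi[s, a]: probability of taking action a in state s (a stochastic policy)
pi = np.array([
    [0.5, 0.5],
    [0.2, 0.8],
])

# Repeatedly apply the Bellman expectation backup
#   V(s) <- Σ_a π(a|s) Σ_s2 P(s2|s,a) (R + γ V(s2))
V = np.zeros(n_states)
for _ in range(1000):
    V_new = np.array([
        sum(pi[s, a] * (R[s, a] + gamma * P[s, a] @ V)
            for a in range(n_actions))
        for s in range(n_states)
    ])
    if np.max(np.abs(V_new - V)) < 1e-8:   # stop once V has converged
        V = V_new
        break
    V = V_new

print(V)   # approximate state values under the policy π
```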