# Bellman equation

@wiki <https://en.wikipedia.org/wiki/Bellman_equation>

The Bellman equation was formulated by [Richard Bellman](http://www.iumj.indiana.edu/IUMJ/FULLTEXT/1957/6/56038) as a way to relate the value of a state to the values of all the future actions and states of an MDP.

![Bellman equation](https://4092223458-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MjY9ZUOIiOq3c33tSsV%2Fuploads%2FoIKrOcq72uiEEZON4ucC%2Fimage.png?alt=media\&token=7d41e955-2460-4f25-9383-a9c17e759a23)

The equation defines the value of a state as the expected sum of discounted future rewards, where each reward is weighted by the probability of its occurrence under a given policy π.
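Written out, the equation in the figure above takes the standard form, where R denotes the reward received on the transition and γ the discount factor:

```latex
V^{\pi}(s) = \sum_{a} \pi(a \mid s) \sum_{s'} P(s' \mid s, a)\,\bigl[R + \gamma V^{\pi}(s')\bigr]
```

Each factor in this sum is explained term by term below.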

π(a∣s) is the term giving the probability of taking an action a in a state s under the policy π.

π simply states what the agent ought to do in each state. The policy can be stochastic or deterministic.
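As a concrete illustration (the states and actions here are invented for the example), a deterministic policy maps each state to a single action, while a stochastic policy maps each state to a probability distribution over actions:

```python
# Hypothetical toy example with two states and two actions.

# A deterministic policy: exactly one action per state.
deterministic_pi = {"s0": "left", "s1": "right"}

# A stochastic policy: a probability distribution over actions per state.
stochastic_pi = {
    "s0": {"left": 0.8, "right": 0.2},
    "s1": {"left": 0.5, "right": 0.5},
}

def pi(a, s, policy=stochastic_pi):
    """Return pi(a|s): the probability of taking action a in state s."""
    return policy[s].get(a, 0.0)

print(pi("left", "s0"))  # 0.8
```

A deterministic policy is just the special case where one action has probability 1 and all others have probability 0.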

P(s′∣s,a) is the probability of moving to the next state s′ and receiving the reward R, given our current state s and taking action a.

(R + γVπ(s′)) is the immediate reward plus the discounted value of the next state s′.
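A minimal sketch of how the Bellman equation can be solved numerically via iterative policy evaluation; the tiny two-state MDP (its transition probabilities and rewards) is made up purely for illustration:

```python
# Hypothetical MDP: P[s][a] is a list of (probability, next_state, reward).
P = {
    "s0": {"stay": [(1.0, "s0", 0.0)],
           "go":   [(0.9, "s1", 1.0), (0.1, "s0", 0.0)]},
    "s1": {"stay": [(1.0, "s1", 2.0)],
           "go":   [(1.0, "s0", 0.0)]},
}
# Uniform random policy: pi(a|s) = 0.5 for both actions in each state.
pi = {s: {a: 0.5 for a in P[s]} for s in P}
gamma = 0.9  # discount factor

# Start from V(s) = 0 and repeatedly apply the Bellman equation as an
# update rule; the values converge to the fixed point V^pi.
V = {s: 0.0 for s in P}
for _ in range(1000):
    V = {
        s: sum(
            pi[s][a] * sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
            for a in P[s]
        )
        for s in P
    }

print({s: round(v, 3) for s, v in V.items()})
```

At convergence, plugging V back into the right-hand side of the Bellman equation reproduces V itself, which is exactly what it means for V to satisfy the equation.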
