Reinforcement Learning

@Glimpse In Reinforcement Learning, the algorithm has to understand the quality of its last steps, that is, how well or badly an action worked.

What we mean by goals and purposes can be well thought of as maximizing the expected value of the cumulative sum of a received scalar signal (reward). - mobi dick
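To make "expected value of the cumulative sum" concrete, here is a minimal sketch that sums the rewards of one episode and averages that sum over many sampled episodes. The coin-flip episode and all function names are hypothetical and chosen only for illustration; they do not come from the linked notebook.

```python
import random


def cumulative_reward(rewards):
    """Sum of the scalar rewards received over one episode."""
    return sum(rewards)


def estimate_expected_return(sample_episode, n_episodes=1000, seed=0):
    """Monte Carlo estimate: average cumulative reward over sampled episodes."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_episodes):
        total += cumulative_reward(sample_episode(rng))
    return total / n_episodes


def coin_flip_episode(rng, n_steps=10):
    """Hypothetical toy episode: each step pays 1 with probability 0.5, else 0."""
    return [1 if rng.random() < 0.5 else 0 for _ in range(n_steps)]


estimate = estimate_expected_return(coin_flip_episode)
print(estimate)  # should land near the true expected value of 5.0
```

An agent that "maximizes expected cumulative reward" would, in this picture, prefer whichever behavior makes this average as large as possible.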

@BCP [ Document ]

@git https://github.com/ohioh/IntroductionQuantumComputing/blob/main/GettingStarted/ReinforcementLearning.ipynb

Reward-Strategy:

( optimal behavior is achieved by reaching the goal )

One way is to give a reward of +1 in any state where the goal is achieved and a reward of 0 in all other states; this is sometimes called the goal-reward encoding. -Littman

Another is to penalize the agent with a -1 on each step in which the goal has not been achieved. Once the goal is achieved, there is no more cost; this is the action-penalty representation.

(And both schemes can lead to big problems for goals with really long horizons)
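The two reward schemes above can be compared on a toy problem. The sketch below assumes a hypothetical one-dimensional corridor with the goal at position 4 and no discounting; the corridor, the function names, and the two action sequences are illustrative assumptions, not part of the original notes.

```python
def goal_reward(reached_goal):
    """Goal-reward scheme: +1 when the goal is achieved, 0 otherwise."""
    return 1 if reached_goal else 0


def action_penalty(reached_goal):
    """Action-penalty scheme: -1 per step until the goal is achieved, then no cost."""
    return 0 if reached_goal else -1


def episode_return(actions, reward_fn, start=0, goal=4):
    """Walk the corridor and sum the rewards; the episode ends at the goal."""
    position = start
    total = 0
    for action in actions:  # each action is -1 (left) or +1 (right)
        position = max(0, position + action)  # the left wall is at position 0
        reached = position == goal
        total += reward_fn(reached)
        if reached:
            break
    return total


direct = [+1, +1, +1, +1]          # four steps straight to the goal
detour = [+1, -1, +1, +1, +1, +1]  # six steps, with one step backwards

print(episode_return(direct, goal_reward))     # 1
print(episode_return(detour, goal_reward))     # 1
print(episode_return(direct, action_penalty))  # -3
print(episode_return(detour, action_penalty))  # -5
```

In this undiscounted toy, the goal-reward scheme gives both paths the same return, while the action-penalty scheme favors the shorter path. With the goal-reward scheme the agent also sees only zeros until the final step, which hints at why very long horizons are hard.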

@Fun Computer scientists love recursion.

In inverse reinforcement learning, a trainer demonstrates an example of the desired behavior, and the learner figures out what rewards the trainer must have been maximizing for this behavior to be optimal.

Meta reinforcement learning: learning at the evolutionary level that creates better ways of learning at the individual level.
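The inverse reinforcement learning idea can be sketched in a deliberately simplified setting. The example below assumes one-step decisions (each action's reward is immediate, so "optimal" just means picking the highest-reward action) and a small hand-written set of candidate reward functions. The states, actions, and candidate names are all hypothetical; practical inverse reinforcement learning has to handle sequential decisions and far larger reward spaces.

```python
ACTIONS = ("left", "right")

# Hypothetical candidate reward functions, indexed by (state, action).
CANDIDATE_REWARDS = {
    "prefers_right": {
        ("A", "left"): 0, ("A", "right"): 1,
        ("B", "left"): 0, ("B", "right"): 1,
    },
    "prefers_left": {
        ("A", "left"): 1, ("A", "right"): 0,
        ("B", "left"): 1, ("B", "right"): 0,
    },
    "right_in_A_left_in_B": {
        ("A", "left"): 0, ("A", "right"): 1,
        ("B", "left"): 1, ("B", "right"): 0,
    },
}


def best_action(reward, state):
    """Action with the highest immediate reward in the given state."""
    return max(ACTIONS, key=lambda action: reward[(state, action)])


def consistent_rewards(demonstration, candidates):
    """Names of candidate rewards under which every demonstrated action is optimal."""
    return [
        name
        for name, reward in candidates.items()
        if all(best_action(reward, state) == action for state, action in demonstration)
    ]


# The trainer's demonstration: (state, action) pairs.
demonstration = [("A", "right"), ("B", "left")]

explained = consistent_rewards(demonstration, CANDIDATE_REWARDS)
print(explained)  # ['right_in_A_left_in_B']
```

With a shorter demonstration, such as only `("A", "right")`, two candidates would remain consistent, which illustrates that demonstrations alone may not pin down a single reward function.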

Recommended Documents:

# Reinforcement Learning: Bellman Equation and Optimality [ Part 1 ] , [ Part 2 ] , [ Part 3 ]

@wiki [ Optimalitätsprinzip von Bellman ]


Last updated 4 years ago
