Basic Concepts

约 130 字小于 1 分钟

2024-05-15

Basic Concepts

State

describes the agent's status with respect to the environment. In the grid world example, the state corresponds to the agent's location.

Action
State transtion
Policy

A policy tells the agent which actions to take at every state

Reward

Reward is one of the most unique concepts in reinforcement learning

Trajectories

A trajectory is a state-action-reward chain. For example, given the policy shown in Fingure 1.6(a), if the agent can move along a trajectory as follows:

s_1 \underset{r=0}{a_2} s_2 \underset{r=0}{a_3} s_5 \underset{r=0}{a_3} s_8 \underset{r=1}{a_2} s_9 .

Returns
episodes