An Introduction to RL

Preface

While I am in no way an expert, and honestly still fairly inexperienced in reinforcement learning, I think a lot of the modern material gives poorly explained overviews of what is actually happening, leaving people scratching their heads at the bigger picture even when they understand the smaller fine points. I wanted to write an article that helps alleviate this problem. The material in this article is heavily structured off of the class I took last semester at Cornell, Wen Sun's 4789: Introduction to Reinforcement Learning.

The thing we care about

At some level, all of reinforcement learning comes back to the idea of a Markov Decision Process, or MDP. Specifically, we care about optimizing behavior within an MDP. A Markov decision process consists of several components; recall the following definition.

Definition:

Markov Decision Process. An MDP is defined as the tuple \(M = \{S, A, P, r, \mu\}\), where $S$ is the state space, $A$ is the action space, $P$ is the transition function (with $P(s' \mid s, a)$ the probability of landing in state $s'$ after taking action $a$ in state $s$), $r$ is the reward function, and $\mu$ is the initial state distribution. A policy $\pi$ maps each state to an action (or a distribution over actions); "optimizing" the MDP means finding a policy that maximizes expected reward.
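
To make this concrete, here is a minimal sketch of a tabular MDP in Python. The class and field names (`TabularMDP`, `P`, `r`, `mu`, `step`) are my own illustrative choices, not anything from the course.

```python
import numpy as np

# A minimal sketch of a tabular MDP as a plain container.
# The names here are illustrative assumptions, not a standard API.
class TabularMDP:
    def __init__(self, n_states, n_actions, P, r, mu):
        # P[s, a, s'] = probability of landing in s' after taking action a in state s
        assert P.shape == (n_states, n_actions, n_states)
        # r[s, a] = immediate reward for taking action a in state s
        assert r.shape == (n_states, n_actions)
        # mu[s] = probability that an episode starts in state s
        assert mu.shape == (n_states,)
        self.n_states, self.n_actions = n_states, n_actions
        self.P, self.r, self.mu = P, r, mu

    def reset(self, rng):
        """Sample an initial state from mu."""
        return rng.choice(self.n_states, p=self.mu)

    def step(self, s, a, rng):
        """Sample the next state from P(. | s, a) and return it with the reward."""
        s_next = rng.choice(self.n_states, p=self.P[s, a])
        return s_next, self.r[s, a]
```

A policy $\pi$ then just picks an action in each state, and optimizing the MDP means choosing $\pi$ so that the rewards collected by repeatedly calling `step` are as large as possible in expectation.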

Now, that being said, we actually need to make a distinction between two settings: finite horizon and infinite horizon MDPs.

Definition:

Finite Horizon Markov Decision Process. A finite horizon MDP is an MDP equipped with a horizon $H \in \mathbb{N}$: \(M = \{S, A, P, r, H, \mu\}\). An episode starts at $s_0 \sim \mu$, lasts exactly $H$ steps, and the goal is to find a (possibly time-dependent) policy $\pi = (\pi_0, \dots, \pi_{H-1})$ that maximizes the expected total reward \(\mathbb{E}\left[\sum_{h=0}^{H-1} r(s_h, a_h)\right]\).
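
To make the finite-horizon objective concrete, here is a small sketch of evaluating a fixed policy by backward induction, reusing the `TabularMDP` container from above. The function name and the `pi[h][s]` interface are again just illustrative assumptions.

```python
import numpy as np

def finite_horizon_policy_value(mdp, pi, H):
    """Evaluate a time-dependent deterministic policy over horizon H.

    pi[h][s] is the action taken in state s at step h (an illustrative interface).
    """
    # V[h, s] = expected reward-to-go from state s at step h.
    # V[H] = 0 because nothing is collected after the horizon.
    V = np.zeros((H + 1, mdp.n_states))
    for h in reversed(range(H)):
        for s in range(mdp.n_states):
            a = pi[h][s]
            # Bellman backup: immediate reward plus expected value at step h+1
            V[h, s] = mdp.r[s, a] + mdp.P[s, a] @ V[h + 1]
    # The objective is the expected value under the initial state distribution
    return mdp.mu @ V[0]
```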

Definition:

Infinite Horizon Markov Decision Process. An infinite horizon (discounted) MDP is an MDP equipped with a discount factor $\gamma \in [0, 1)$: \(M = \{S, A, P, r, \gamma, \mu\}\). The episode never ends, and the goal is to find a stationary policy $\pi$ that maximizes the expected discounted return \(\mathbb{E}\left[\sum_{t=0}^{\infty} \gamma^t r(s_t, a_t)\right]\). The discount factor keeps this sum finite and weights near-term rewards more heavily than distant ones.
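
For the discounted setting, the classic way to see why the discount factor matters is value iteration: the Bellman optimality backup is a $\gamma$-contraction, so repeatedly applying it converges to the optimal value function. Here is a rough sketch, again reusing the `TabularMDP` container from above (the function name and tolerance are my own choices).

```python
import numpy as np

def value_iteration(mdp, gamma, tol=1e-8):
    """Approximately compute the optimal value function of a discounted MDP."""
    V = np.zeros(mdp.n_states)
    while True:
        # Q[s, a] = r(s, a) + gamma * sum_{s'} P(s' | s, a) * V(s')
        Q = mdp.r + gamma * mdp.P @ V
        V_new = Q.max(axis=1)
        # The backup is a gamma-contraction in the sup norm, so this loop terminates
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
```

Acting greedily with respect to the returned value function (taking `Q.argmax(axis=1)`) gives a near-optimal stationary policy.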

Convergence of RL Systems with Finite Policy Action Space

Psych, you thought I would actually fill this out? I'm so lazy, lol.

Convergence of RL Systems with Parameterized Policy Space

Fine Tuning RL Models

Imitation Learning