Preparatory Material
Lecture 01 - Probability Basics 1
Lecture 02 - Probability Basics 2
Lecture 03 - Linear Algebra 1
Lecture 04 - Linear Algebra 2
Introduction to RL and Immediate RL
Lecture 05 - Introduction to RL
Lecture 06 - RL Framework and Applications
Lecture 07 - Introduction to Immediate RL
Lecture 08 - Bandit Optimalities
Lecture 09 - Value Function Based Methods
Bandit Algorithms
Lecture 10 - Upper Confidence Bound 1 (UCB 1)
Lecture 11 - Concentration Bounds
Lecture 12 - UCB 1 Theorem
Lecture 13 - Probably Approximately Correct (PAC) Bounds
Lecture 14 - Median Elimination
Lecture 15 - Thompson Sampling
Policy Gradient Methods and Introduction to Full RL
Lecture 16 - Policy Search
Lecture 17 - REINFORCE
Lecture 18 - Contextual Bandits
Lecture 19 - Full RL Introduction
Lecture 20 - Returns, Value Functions and Markov Decision Processes (MDPs)
MDP Formulation, Bellman Equations and Optimality Proofs
Lecture 21 - MDP Modelling
Lecture 22 - Bellman Equation
Lecture 23 - Bellman Optimality Equation
Lecture 24 - Cauchy Sequence and Green's Equation
Lecture 25 - Banach Fixed Point Theorem
Lecture 26 - Convergence Proof
Dynamic Programming and Monte Carlo Methods
Lecture 27 - Lπ Convergence
Lecture 28 - Value Iteration
Lecture 29 - Policy Iteration
Lecture 30 - Dynamic Programming
Lecture 31 - Monte Carlo
Lecture 32 - Control in Monte Carlo
Monte Carlo and Temporal Difference Methods
Lecture 33 - Off-Policy Monte Carlo
Lecture 34 - UCT (Upper Confidence Bound 1 applied to Trees)
Lecture 35 - TD(0)
Lecture 36 - TD(0) Control
Lecture 37 - Q-Learning
Lecture 38 - Afterstates
Eligibility Traces
Lecture 39 - Eligibility Traces
Lecture 40 - Backward View of Eligibility Traces
Lecture 41 - Eligibility Trace Control
Lecture 42 - Thompson Sampling Recap
Function Approximation
Lecture 43 - Function Approximation
Lecture 44 - Linear Parameterization
Lecture 45 - State Aggregation Methods
Lecture 46 - Function Approximation and Eligibility Traces
Lecture 47 - Least-Squares Temporal Difference (LSTD) and LSTDQ
Lecture 48 - LSPI and Fitted Q
DQN, Fitted Q and Policy Gradient Approaches
Lecture 49 - DQN and Fitted Q-Iteration
Lecture 50 - Policy Gradient Approach
Lecture 51 - Actor-Critic and REINFORCE
Lecture 52 - REINFORCE (cont.)
Lecture 53 - Policy Gradient with Function Approximation
Hierarchical Reinforcement Learning
Lecture 54 - Hierarchical Reinforcement Learning
Lecture 55 - Types of Optimality
Lecture 56 - Semi-Markov Decision Processes
Lecture 57 - Options
Lecture 58 - Learning with Options
Lecture 59 - Hierarchical Abstract Machines
Hierarchical RL: MAXQ
Lecture 60 - MAXQ
Lecture 61 - MAXQ Value Function Decomposition
Lecture 62 - Option Discovery
POMDPs (Partially Observable Markov Decision Processes)
Lecture 63 - POMDP Introduction
Lecture 64 - Solving POMDPs