Alberto Castellini
REINFORCEMENT LEARNING (2022/2023) (official webpage)
Master's degree in Artificial Intelligence, University of Verona


Syllabus

Introduction to RL. Multi-armed bandits. Markov Decision Processes. RL based on Dynamic Programming (e.g., value iteration and policy iteration). RL based on Monte Carlo methods. RL based on Temporal-Difference learning (e.g., Q-learning, Sarsa). Planning and learning: model-based RL. Deep Q-learning. RL with approximate solutions (on-policy prediction and control with approximation). Policy gradient methods (REINFORCE). RL with actor-critic methods (A2C). TRPO and PPO. RL in partially observable environments (POMCP).
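As a minimal sketch of one of the topics listed above, temporal-difference control with Q-learning, the Python below learns a tabular action-value function on a hypothetical five-state chain. The environment, the function names (`step`, `epsilon_greedy`, `q_learning`) and the hyperparameters (`alpha`, `gamma`, `epsilon`, number of episodes) are illustrative assumptions, not course material; the lab repository may structure its code quite differently.

```python
import random

# Hypothetical toy environment (an assumption, not from the course):
# a 1-D chain of states 0..N_STATES-1. Action 0 moves left, action 1 moves right.
# Reaching the last state gives reward +1 and ends the episode; other steps give 0.
N_STATES = 5
ACTIONS = (0, 1)


def step(state, action):
    """Return (next_state, reward, done) for the toy chain."""
    if action == 0:
        next_state = max(0, state - 1)
    else:
        next_state = min(N_STATES - 1, state + 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def epsilon_greedy(q, state, epsilon, rng):
    """Random action with probability epsilon, otherwise a greedy one (ties broken randomly)."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q[state])
    return rng.choice([a for a in ACTIONS if q[state][a] == best])


def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning; returns the table q[state][action]."""
    rng = random.Random(seed)
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = epsilon_greedy(q, state, epsilon, rng)
            next_state, reward, done = step(state, action)
            # Off-policy TD target: bootstrap from the greedy value of the next state;
            # a terminal next state contributes no future value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q = q_learning()
# Greedy policy for the non-terminal states (1 = move right).
greedy_policy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
print(greedy_policy)
```

The `max(q[next_state])` term is what makes this Q-learning rather than Sarsa: the target uses the best next action regardless of which action the epsilon-greedy behaviour policy actually takes next, whereas Sarsa would bootstrap from the value of that chosen next action.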

Learning outcomes

The reinforcement learning course introduces students to reinforcement learning and planning under uncertainty. In particular, it focuses on the design of algorithms that enable machines to learn from reinforcements, that is, from partial, implicit, and delayed feedback obtained by repeatedly interacting with the environment or with users. At the end of the course, students will have to demonstrate that they have acquired the ability to i) tackle sequential decision problems with reinforcement learning techniques, ii) identify and apply the most effective and efficient algorithms to solve specific sequential decision problems, and iii) design new reinforcement learning algorithms. In particular, the acquired knowledge concerns advanced techniques for solving Markov Decision Processes (e.g., search with Monte Carlo methods), bandit problems, model-based and model-free reinforcement learning, Bayesian reinforcement learning, deep reinforcement learning, and advanced reinforcement learning techniques (safe policy improvement, partially observable environments, hierarchical reinforcement learning, imitation learning, inverse reinforcement learning, and meta-learning).

Reference books

Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction (second edition). MIT Press, 2018. (pdf)
Slides
  • Introduction to reinforcement learning (pdf)
  • Multi-armed bandits (pdf)
  • Markov Decision Processes (pdf)
  • RL based on Dynamic Programming (pdf)
  • RL based on Monte Carlo (pdf)
  • RL based on Temporal Difference (pdf)
  • On-policy prediction with approximation (pdf)
  • On-policy control with approximation (Deep Q Networks) (pdf)
  • Policy Gradient Methods (pdf)
Lab
Please refer to https://github.com/d-corsi/RL-Lab