Master's degree in Artificial Intelligence, Verona University
Syllabus
Introduction to RL. Multi-armed bandits. Markov Decision Processes. RL based on Dynamic Programming (e.g., value and policy iteration). RL based on Monte Carlo methods. RL based on Temporal-Difference learning (e.g., Q-learning, Sarsa). Planning and learning: model-based RL. Deep Q-learning. RL with approximate solutions (on-policy prediction and control with approximation). Policy Gradient methods (REINFORCE). RL with actor-critic methods (A2C). TRPO and PPO. RL in partially observable environments (POMCP).
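As a flavour of the tabular methods listed above, the following is a minimal sketch of Q-learning with epsilon-greedy exploration. It is an illustrative example, not course material: the 4x4 grid-world environment, the reward of -1 per step, and all hyperparameter values are assumptions chosen only for the sake of the example.

```python
# Illustrative sketch (not from the course): tabular Q-learning on a toy
# 4x4 grid world. The agent starts at (0, 0), must reach (3, 3), and
# receives a reward of -1 per step until it gets there.
import random
from collections import defaultdict

N = 4                                          # grid side length (assumption)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # up, down, left, right
GOAL = (N - 1, N - 1)

def step(state, action):
    """Apply an action; moves that would leave the grid keep the agent in place."""
    r = min(max(state[0] + action[0], 0), N - 1)
    c = min(max(state[1] + action[1], 0), N - 1)
    next_state = (r, c)
    return next_state, -1.0, next_state == GOAL

def q_learning(episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Run tabular Q-learning and return the learned action-value table."""
    Q = defaultdict(float)                     # Q[(state, action_index)] -> value
    for _ in range(episodes):
        state, done = (0, 0), False
        while not done:
            # epsilon-greedy action selection
            if random.random() < epsilon:
                a = random.randrange(len(ACTIONS))
            else:
                a = max(range(len(ACTIONS)), key=lambda i: Q[(state, i)])
            next_state, reward, done = step(state, ACTIONS[a])
            # Q-learning update: bootstrap from the greedy value of the next state
            best_next = 0.0 if done else max(Q[(next_state, i)] for i in range(len(ACTIONS)))
            Q[(state, a)] += alpha * (reward + gamma * best_next - Q[(state, a)])
            state = next_state
    return Q

if __name__ == "__main__":
    Q = q_learning()
    greedy = max(range(len(ACTIONS)), key=lambda i: Q[((0, 0), i)])
    print("Greedy action in the start state:", ACTIONS[greedy])
```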
Learning outcomes
The reinforcement learning course introduces students to reinforcement learning and planning under uncertainty. In particular, it focuses on the design of algorithms that enable machines to learn from reinforcements, that is, from the partial, implicit and delayed feedback obtained by repeatedly interacting with the environment or with users. At the end of the course, students will have to demonstrate that they have acquired the ability to i) tackle sequential decision problems with reinforcement learning techniques, ii) identify and apply the most effective and efficient algorithms for specific sequential decision problems, and iii) design new reinforcement learning algorithms. In particular, the acquired knowledge concerns advanced techniques for solving Markov Decision Processes (e.g., search with Monte Carlo methods), bandit problems, model-based and model-free reinforcement learning, Bayesian reinforcement learning, deep reinforcement learning, and advanced reinforcement learning techniques (safe policy improvement, partially observable environments, hierarchical reinforcement learning, imitation-based learning, inverse reinforcement learning, and meta-learning).
Reference books
Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction (second edition). MIT Press, 2018.