Alberto Castellini
STATISTICAL LEARNING - PART II (2021/2022) (official webpage)
Master's degree in Mathematics, University of Verona


Syllabus

Introduction to data analysis with R and Python. Linear methods for regression (linear regression, least squares, MLE: estimation, prediction, tests under Gaussian assumptions, variable/subset selection). Shrinkage/regularization methods (ridge regression, least absolute shrinkage and selection operator (LASSO), [elastic net, least angle regression]). Linear methods for classification (logistic regression, MLE: estimation, prediction, variable selection). Linear model assessment and selection (cross-validation, bootstrap methods). Clustering analysis (k-means, principal component analysis and spectral clustering).
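As a small illustration of two syllabus topics, least squares and ridge regression, the following is a minimal sketch in Python with NumPy. It is not taken from the course material: the function name `fit_ridge`, the simulated data and the penalty value are assumptions chosen only for illustration. The sketch uses the closed-form ridge estimate on centered data, so the intercept is not penalized, and setting the penalty to zero gives the ordinary least squares fit.

```python
import numpy as np


def fit_ridge(X, y, lam=0.0):
    """Closed-form ridge estimate (hypothetical helper, not course code).

    Solves (Xc^T Xc + lam * I) beta = Xc^T yc on centered data, so the
    intercept is left unpenalized. With lam = 0 this is ordinary least
    squares, provided Xc^T Xc is invertible.
    """
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean          # center the predictors
    yc = y - y_mean          # center the response
    p = X.shape[1]
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)
    intercept = y_mean - x_mean @ beta
    return intercept, beta


# Simulated data (an assumption for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_beta = np.array([2.0, -1.0, 0.0])
y = 1.0 + X @ true_beta + 0.1 * rng.normal(size=200)

b0_ols, beta_ols = fit_ridge(X, y, lam=0.0)       # least squares
b0_ridge, beta_ridge = fit_ridge(X, y, lam=50.0)  # shrunken coefficients

print("OLS coefficients:  ", beta_ols)
print("Ridge coefficients:", beta_ridge)
```

For any positive penalty the ridge coefficient vector has a smaller Euclidean norm than the least squares one, which is the shrinkage effect the syllabus refers to. In practice the penalty would be chosen by cross-validation rather than fixed by hand.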

Learning outcomes

The objective is to introduce students to statistical modelling and exploratory data analysis. The mathematical foundations of Statistical Learning (supervised and unsupervised learning, deep learning) are developed with emphasis on the underlying abstract mathematical framework, aiming to provide a rigorous, self-contained derivation and theoretical analysis of the main models currently used in applications. Complementary laboratory sessions will illustrate the key algorithms on relevant case studies, mainly using standard software environments such as R or Python.

Reference books

T. Hastie, R. Tibshirani, J. Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction (2nd ed.). Springer, 2009. (pdf)
Slides
  • Introduction to data analysis with Python and R in Kaggle (pdf)
  • Linear methods for regression (pdf)
  • Variable Subset Selection (pdf)
  • Shrinkage (regularization) methods for variable selection (pdf)
  • Unsupervised learning: clustering analysis (pdf)
  • Artificial neural networks - Prediction of house value: California housing dataset (pdf)
Exercises
  • Exercise 1 (Part 1): Telco Customer Churn first data analysis using Python (pdf)
  • Exercise 1 (Part 2): Telco Customer Churn first data analysis using Python (pdf)
  • Exercise 2: Prediction on the prostate cancer dataset using OLS regression in Python (pdf)
  • Exercise 3: Variable subset selection with OLS regression on the prostate cancer dataset in Python (pdf)
  • Exercise 4: Shrinkage (regularization) methods with regression on the prostate cancer dataset in Python (pdf)
  • Exercise 5: Clustering on the human tumor dataset in Python (pdf)
  • Exercise 6: Artificial neural networks - Prediction of house value: California housing dataset (pdf - see last slide)
Other material
  • Python 3 tutorial (pdf)