Alberto Castellini
STATISTICAL METHODS FOR DATA ANALYSIS - MACHINE LEARNING (2018/2019) (official webpage)
Master's degree in Mathematics, University of Verona


Syllabus

Introduction to data analysis with R and Python. Linear methods for regression (linear regression, least squares, MLE: estimation, prediction, tests under Gaussian assumptions, variable/subset selection). Shrinkage/regularization methods (ridge regression, least absolute shrinkage and selection operator (LASSO), [elastic net, least angle regression]). Linear methods for classification (logistic regression, MLE: estimation, prediction, variable selection). Linear model assessment and selection (cross-validation, bootstrap methods). Clustering analysis (k-means, principal component analysis and spectral clustering).

Learning outcomes

The objective is to introduce students to statistical modelling and exploratory data analysis. The mathematical foundations of statistical learning (supervised and unsupervised learning, deep learning) are developed with emphasis on the underlying abstract mathematical framework, aiming to provide a rigorous, self-contained derivation and theoretical analysis of the main models currently used in applications. Complementary laboratory sessions will illustrate the key algorithms on relevant case studies, mainly using standard software environments such as R or Python.

Reference books

T. Hastie, R. Tibshirani, J. Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction (2nd ed.). Springer, 2009. (pdf)
Slides
  • Introduction to data analysis with Python and R in Kaggle (pdf)
  • Linear methods for regression (pdf)
  • Variable Subset Selection (pdf)
  • Shrinkage (regularization) methods for variable selection (pdf)
  • Regression methods based on derived input directions (pdf)
  • Unsupervised learning: clustering analysis (pdf)
Exercises
  • Exercise 1 (Part 1): Telco Customer Churn first data analysis using Python (pdf)
  • Exercise 1 (Part 2): Telco Customer Churn first data analysis using Python (pdf)
  • Exercise 2: Prediction on the prostate cancer dataset using OLS regression in Python (pdf)
  • Exercise 3: Variable subset selection with OLS regression on the prostate cancer dataset in Python (pdf)
  • Exercise 4: Shrinkage (regularization) methods with regression on the prostate cancer dataset in Python (pdf)
  • Exercise 5: Regularization methods based on derived input directions on the prostate cancer dataset in Python (pdf)
  • Exercise 6: Clustering on the human tumor dataset in Python (pdf)
Other material
  • Python 3 tutorial (pdf)