Alberto Castellini

STATISTICAL METHODS FOR DATA ANALYSIS - MACHINE LEARNING (2018/2019) (official webpage)

Master's degree in Mathematics, University of Verona

Syllabus

Introduction to data analysis with R and Python. Linear methods for regression (linear regression, least squares, MLE: estimation, prediction, tests under Gaussian assumptions, variable/subset selection). Shrinkage/regularization methods (ridge regression, least absolute shrinkage and selection operator, [elastic net, least angle regression]). Linear methods for classification (logistic regression, MLE: estimation, prediction, variable selection). Linear model assessment and selection (cross-validation, bootstrap methods). Clustering analysis (k-means, principal component analysis and spectral clustering).
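
As a small illustration of two syllabus topics, least squares and ridge regression, the following is a minimal sketch in Python, assuming NumPy is available. The function names fit_least_squares and fit_ridge, the synthetic data and the penalty value are illustrative assumptions and are not taken from the course material. The ridge fit omits an intercept for brevity.

```python
import numpy as np


def fit_least_squares(X, y):
    """Ordinary least squares: minimise ||y - X b||^2 over b."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def fit_ridge(X, y, lam):
    """Ridge regression without intercept:
    minimise ||y - X b||^2 + lam * ||b||^2 over b,
    solved via the normal equations (X'X + lam I) b = X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)


# Synthetic data (hypothetical): 100 observations, 3 predictors.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_beta = np.array([2.0, -1.0, 0.5])
y = X @ true_beta + 0.1 * rng.normal(size=100)

beta_ols = fit_least_squares(X, y)
beta_ridge = fit_ridge(X, y, lam=10.0)

print("least squares:", beta_ols)
print("ridge (lam=10):", beta_ridge)
```

With a positive penalty the ridge coefficients are shrunk towards zero relative to the least squares solution. With lam = 0 the two estimates coincide whenever X'X is invertible.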

Learning outcomes

The objective is to introduce students to statistical modelling and exploratory data analysis. The mathematical foundations of statistical learning (supervised and unsupervised learning, deep learning) are developed with emphasis on the underlying abstract mathematical framework, aiming to provide a rigorous, self-contained derivation and theoretical analysis of the main models currently used in applications. Complementary laboratory sessions will illustrate the key algorithms and their application to relevant case studies, mainly using standard software environments such as R or Python.

Reference books

T. Hastie, R. Tibshirani, J. Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction (2nd ed.). Springer, 2009. (pdf)