Alberto Castellini
STATISTICAL LEARNING (2022/2023) (official webpage)
Master's degree in Data Science, University of Verona


Syllabus

Theory:
  • Linear models for regression
  • Cross-validation
  • Variable and model selection in linear regression models
  • Regularization for linear regression models
  • Methods for dimensionality reduction
  • Classification models (Logistic Regression, Linear Discriminant Analysis)
  • Tree-based methods (Decision Trees, Bagging, Random Forest, Boosting)
  • Unsupervised methods (Principal Component Analysis, K-Means Clustering, Hierarchical Clustering)
  • Introduction to Neural Networks (single-layer neural networks, training a neural network)

Laboratory:
  • Introduction to data analysis with Python
  • Linear regression (Python)
  • Variable and model selection in linear models (Python)
  • Ridge and Lasso regularization for linear regression models (Python)
  • Classification with logistic regression (Python)
  • Data clustering with k-means and hierarchical approaches (Python)
  • Artificial Neural Networks (Python)
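To give a flavour of the laboratory sessions, the sketch below shows a lab-style linear regression workflow in Python with scikit-learn. It uses synthetic data and illustrative variable names rather than the course datasets, and is only an indicative example of the kind of code covered in the labs.

```python
# Minimal sketch of a lab-style linear regression workflow
# (synthetic data, not the course datasets; assumes numpy and scikit-learn).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                      # three synthetic predictors
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.5, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

ols = LinearRegression().fit(X_train, y_train)     # ordinary least squares fit
print("coefficients:", ols.coef_)
print("test MSE:", mean_squared_error(y_test, ols.predict(X_test)))

# 5-fold cross-validation, as covered in the theory part of the course
print("mean CV R^2:", cross_val_score(LinearRegression(), X, y, cv=5).mean())
```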

Learning outcomes

The course aims to introduce students to the statistical models used in data science. The foundations of statistical learning (supervised and unsupervised) are developed with emphasis on the mathematical basis of the different state-of-the-art methodologies. The course also provides rigorous derivations of the methods currently used in industrial and scientific applications, so that students understand the requirements for their correct use. Laboratory sessions illustrate the fundamental algorithms on industrial case studies, in which students learn to analyze real datasets using Python.

At the end of the course students must demonstrate the following skills:
  • knowledge of the main stages of data preparation, model creation and evaluation;
  • ability to develop solutions for feature selection;
  • knowledge and ability to use the main regression and regularization models (e.g., LASSO, Ridge Regression);
  • knowledge and ability to use the main methods for dimensionality reduction (e.g., Principal Component Regression, Partial Least Squares);
  • knowledge and ability to use the main methods for classification (e.g., KNN, Logistic Regression, LDA);
  • knowledge and ability to use the main methods for tree-based regression and classification (e.g., decision trees, random forests);
  • knowledge and ability to use the main methods for unsupervised data analysis (e.g., K-means clustering, hierarchical clustering).
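As an illustration of the regularization models listed above, the sketch below fits Ridge and LASSO regression in Python with scikit-learn. The data are synthetic and the regularization strength alpha is arbitrary; this is an indicative example, not the lab material.

```python
# Illustrative sketch of Ridge and Lasso regularization with scikit-learn
# (synthetic data; the regularization strength alpha is chosen arbitrarily).
import numpy as np
from sklearn.linear_model import Ridge, Lasso
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(1)
X = rng.normal(size=(150, 10))                     # ten predictors, only two informative
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.3, size=150)

ridge = make_pipeline(StandardScaler(), Ridge(alpha=1.0)).fit(X, y)
lasso = make_pipeline(StandardScaler(), Lasso(alpha=0.1)).fit(X, y)

# Lasso tends to drive irrelevant coefficients exactly to zero (variable selection),
# while ridge only shrinks them towards zero.
print("ridge coefficients:", ridge[-1].coef_.round(2))
print("lasso coefficients:", lasso[-1].coef_.round(2))
```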

Reference books

  • G. James, D. Witten, T. Hastie, R. Tibshirani. An Introduction to Statistical Learning with Applications in R (2nd ed.). Springer, 2021. (pdf)
  • T. Hastie, R. Tibshirani, J. Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction (2nd ed.). Springer, 2009. (pdf)

Theory: coming soon.


Lab:
Slides
  • Introduction to data analysis with Python and R in Kaggle (pdf)
  • Linear methods for regression (pdf)
  • Variable Subset Selection (pdf)
  • Shrinkage (regularization) methods for variable selection (pdf)
  • Unsupervised learning: clustering analysis (pdf)
  • Artificial neural networks - Prediction of house value: California housing dataset (pdf)
Exercises
  • Exercise 1 (Part 1): Telco Customer Churn first data analysis using Python (pdf)
  • Exercise 1 (Part 2): Telco Customer Churn first data analysis using Python (pdf)
  • Exercise 2: Prediction on the prostate cancer dataset using OLS regression in Python (pdf)
  • Exercise 3: Variable subset selection with OLS regression on the prostate cancer dataset in Python (pdf)
  • Exercise 4: Shrinkage (regularization) methods with regression on the prostate cancer dataset in Python (pdf)
  • Exercise 5: Clustering on the human tumor dataset in Python (pdf)
  • Exercise 6: Artificial neural networks - Prediction of house value: California housing dataset (pdf - see last slide)
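As an indication of the kind of workflow Exercise 6 asks for, the sketch below trains a neural-network regressor on the California housing dataset. It assumes scikit-learn's MLPRegressor with illustrative hyperparameters and is not the official solution to the exercise.

```python
# Minimal sketch of neural-network regression on the California housing dataset
# (dataset fetched via scikit-learn on first use; hyperparameters are illustrative).
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.neural_network import MLPRegressor

X, y = fetch_california_housing(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Scaling the inputs matters for gradient-based training of the network.
model = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(64, 32), max_iter=500, random_state=0),
)
model.fit(X_train, y_train)
print("test R^2:", model.score(X_test, y_test))
```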
Other material
  • Python 3 tutorial (pdf)