Aim

This is an overview of the learning material in the course TMA4268 Statistical learning, given at the Department of Mathematical Sciences at NTNU. See the course description for formal information.

Module pages

These are links to (html versions of) the module pages in TMA4268 Statistical learning, which was held in the spring semester of 2018 at NTNU. (To see the .Rmd and .pdf versions, use .Rmd or .pdf in place of .html for most of the module pages.)

  1. Introduction
  2. Statistical learning
  3. Linear regression
  4. Classification
  5. Resampling methods
  6. Linear model selection and regularization: 2 files
  7. Moving beyond linearity
  8. Tree-based methods
  9. Support vector machines
  10. Unsupervised learning: 6 files
  11. Neural networks: 13 files
  12. Summing up

Compulsory exercises

Planned changes for 2019

  • Change the reading list for Module 11 to focus on basic concepts; the theory presented in Module 11 above will instead be covered in the PhD course MA8701 in the spring of 2019.
  • Add more information on handling big data, possibly partly in Module 3.
  • Have parts of a compulsory exercise where the analysis methods are chosen by the students; that is, a data set with a description and aim is given (or can be chosen by the student?). This will focus on data analysis skills in R.

Connections to other statistics courses at IMF/NTNU

These courses are useful to have before, after, or at the same time as TMA4268:

  • (V)TMA4267 Linear statistical models. Multiple linear regression. Analysis of variance. Experimental design. Multivariate normal distribution. Multiple testing.
  • (V)TMA4180 Optimization 1. First and second order necessary and sufficient (Karush-Kuhn-Tucker) optimality conditions for unconstrained and constrained optimization problems in finite-dimensional vector spaces. Basics of convex analysis and Lagrangian duality theory and their application to optimization problems and algorithms. An overview of modern optimization techniques and algorithms for smooth problems (including line-search/trust-region, quasi-Newton, interior point and active set methods, SQP and augmented Lagrangian approaches). Basic derivative-free and non-smooth optimization methods.

Expanding on topics mentioned in TMA4268

  • (V)TMA4250 Spatial statistics. Parameter estimation, simulation and applications of Gaussian random fields, point fields and discrete Markov random fields. Examples from image analysis, and environmental and natural resource applications.
  • (V)TMA4275 Lifetime analysis. Basic concepts in lifetime modelling. Censored observations. Nonparametric estimation and graphical plotting for lifetime data (Kaplan-Meier, Nelson-plot). Estimation and testing in parametric lifetime distributions. Analysis of lifetimes with covariates (Cox-regression, accelerated lifetime testing). Modelling and analysis of recurrent events. Nonhomogeneous Poisson-processes. Nelson-Aalen estimators.
  • (H)TMA4285 Time series models. Autoregressive and moving average based models for stationary and non-stationary time series. Parameter estimation. Model identification. Forecasting. ARCH and GARCH models for volatility. State space models (linear dynamic models) and the Kalman filter.
  • (H)TMA4295 Statistical inference. Transformations and moments of random variables. Families of distributions. Inequalities and convergence theorems. Sufficient statistics. Frequentist and Bayesian estimators. Methods of constructing point estimators, interval estimators and hypothesis tests, and optimality of these. Asymptotic properties of estimators and hypothesis tests.
  • (V)TMA4300 Computational statistics. Classical and Markov chain methods for stochastic simulation. Hierarchical Bayesian models and inference in these. The expectation maximisation (EM) algorithm. Bootstrapping, cross-validation and non-parametric methods.
  • (H)TMA4315 Generalized linear models. Univariate exponential family. Multiple linear regression. Logistic regression. Poisson regression. General formulation for generalised linear models with canonical link. Likelihood-based inference with score function and expected Fisher information. Deviance. AIC. Wald and likelihood-ratio test. Linear mixed effects models with random components of general structure. Random intercept and random slope. Generalised linear mixed effects models. Strong emphasis on programming in R. Possible extensions: quasi-likelihood, over-dispersion, models for multinomial data, analysis of contingency tables, quantile regression.

PhD courses