Aim

This is an overview of the learning material in the course TMA4268 Statistical learning, given at the Department of Mathematical Sciences at NTNU. See the course description for formal information.

Module pages

These are links to (html versions of) the module pages in TMA4268 Statistical learning, which was held in the spring semester of 2018 at NTNU. (To see the .Rmd and .pdf versions, use .Rmd or .pdf in place of .html for most of the module pages.)

  1. Introduction
  2. Statistical learning
  3. Linear regression
  4. Classification
  5. Resampling methods
  6. Linear model selection and regularization: 2 files
  7. Moving beyond linearity
  8. Tree-based methods
  9. Support vector machines
  10. Unsupervised learning: 6 files
  11. Neural networks: 13 files
  12. Summing up

Compulsory exercises

Planned changes for 2019

  • Change the reading list for Module 11 to focus on basic concepts; the theory presented in Module 11 above will instead be covered in the PhD course MA8701 in the spring of 2019.
  • Add more information on handling big data, possibly partly in Module 3.
  • Have parts of a compulsory exercise where the analysis methods are chosen by the students; that is, a data set with a description and aim is given (or can be chosen by the student?). This will focus on data analysis skills in R.

Connections to other statistics courses at IMF/NTNU

These courses are useful to have before, after, or at the same time as TMA4268:

  • (V)TMA4267 Linear statistical models. Multiple linear regression. Analysis of variance. Experimental design. Multivariate normal distribution. Multiple testing.
  • (V)TMA4180 Optimization 1. First and second order necessary and sufficient (Karush-Kuhn-Tucker) optimality conditions for unconstrained and constrained optimization problems in finite-dimensional vector spaces. Basics of convex analysis and Lagrangian duality theory and their application to optimization problems and algorithms. An overview of modern optimization techniques and algorithms for smooth problems (including line-search/trust-region, quasi-Newton, interior point and active set methods, SQP and augmented Lagrangian approaches). Basic derivative-free and non-smooth optimization methods.

Expanding on topics mentioned in TMA4268

  • (V)TMA4250 Spatial statistics. Parameter estimation, simulation and applications of Gaussian random fields, point fields and discrete Markov random fields. Examples from image analysis, and environmental and natural resource applications.
  • (V)TMA4275 Lifetime analysis. Basic concepts in lifetime modelling. Censored observations. Nonparametric estimation and graphical plotting for lifetime data (Kaplan-Meier, Nelson-plot). Estimation and testing in parametric lifetime distributions. Analysis of lifetimes with covariates (Cox-regression, accelerated lifetime testing). Modelling and analysis of recurrent events. Nonhomogeneous Poisson-processes. Nelson-Aalen estimators.
  • (H)TMA4285 Time series models. Autoregressive and moving average based models for stationary and non-stationary time series. Parameter estimation. Model identification. Forecasting. ARCH and GARCH models for volatility. State space models (linear dynamic models) and the Kalman filter.
  • (H)TMA4295 Statistical inference. Transformations and moments of random variables. Families of distributions. Inequalities and convergence theorems. Sufficient statistics. Frequentist and Bayesian estimators. Methods of constructing point estimators, interval estimators and hypothesis tests, and optimality of these. Asymptotic properties of estimators and hypothesis tests.
  • (V)TMA4300 Computational statistics. Classical and Markov chain methods for stochastic simulation. Hierarchical Bayesian models and inference in these. The expectation maximisation (EM) algorithm. Bootstrapping, cross-validation and non-parametric methods.
  • (H)TMA4315 Generalized linear models. Univariate exponential family. Multiple linear regression. Logistic regression. Poisson regression. General formulation for generalised linear models with canonical link. Likelihood-based inference with score function and expected Fisher information. Deviance. AIC. Wald and likelihood-ratio test. Linear mixed effects models with random components of general structure. Random intercept and random slope. Generalised linear mixed effects models. Strong emphasis on programming in R. Possible extensions: quasi-likelihood, over-dispersion, models for multinomial data, analysis of contingency tables, quantile regression.

PhD courses