Report errors in material, broken links etc to .

Aim

This is an overview of learning material in the course TMA4268 Statistical learning given at the Department of Mathematical Sciences at NTNU. Course description with formal information.

Textbook

James et al: Introduction to Statistical Learning, with Applications in R. The textbook can be downloaded here: https://www-bcf.usc.edu/~gareth/ISL/

The ebook can also be downloaded from Springer: https://www.springer.com/gp/book/9781461471370 (NB: you need to be on the NTNU network or connected via VPN.)

Springer sells an inexpensive black-and-white version (from the website), and a more costly colour version is available at Akademika.

There are 15 hours of YouTube videos by two of the authors of the book, Trevor Hastie and Rob Tibshirani (the inventors of statistical learning). All links are here: https://www.r-bloggers.com/in-depth-introduction-to-machine-learning-in-15-hours-of-expert-videos/

Module pages

These are links to (HTML versions of) the module pages in TMA4268 Statistical learning for the spring semester of 2019 at NTNU. (To see the .Rmd or .pdf version, replace .html in the link with .Rmd or .pdf; this works for most of the module pages.) Links are updated weekly (one new module becomes active each week).

An overview with dates and all files together: 2019 table with dates for course activity

  1. Introduction
  2. Statistical learning
  3. Linear regression
  4. Classification
  5. Resampling methods
  6. Linear model selection and regularization: first lecture and second lecture
  7. Moving beyond linearity
  8. Tree-based methods
  9. Support vector machines
  10. Unsupervised learning. Lecture 1 with Lab1 and New York Times stories. Lecture 2 with Lab2 and Lab3, plus extra material on PCA (due to many questions on rank for Compulsory 2)
  11. Neural networks
  12. Final

Exam

2019

Digital exam in Inspera.

2018

Exams from course using the same textbook

  • STK2100 Maskinlæring og statistiske metoder for prediksjon og klassifikasjon UiO V2017: exam problems, solutions
  • STK2100 Maskinlæring og statistiske metoder for prediksjon og klassifikasjon UiO V2018: exam problems, solutions

Compulsory exercises

See the pages for 2018 for Compulsory exercises given in 2018.

Introductions to R

Connections to other statistics courses at IMF/NTNU

Nice to take before, after, or at the same time as TMA4268:

  • (V)TMA4267 Linear statistical models. Multiple linear regression. Analysis of variance. Experimental design. Multivariate normal distribution. Multiple testing.
  • (V)TMA4180 Optimization 1. First and second order necessary and sufficient (Karush-Kuhn-Tucker) optimality conditions for unconstrained and constrained optimization problems in finite-dimensional vector spaces. Basics of convex analysis and Lagrangian duality theory and their application to optimization problems and algorithms. An overview of modern optimization techniques and algorithms for smooth problems (including line-search/trust-region, quasi-Newton, interior point and active set methods, SQP and augmented Lagrangian approaches). Basic derivative-free and non-smooth optimization methods.

Expanding on topics mentioned in TMA4268

  • (V)TMA4250 Spatial statistics. Parameter estimation, simulation and applications of Gaussian random fields, point fields and discrete Markov random fields. Examples from image analysis, and environmental and natural resource applications.
  • (V)TMA4275 Lifetime analysis. Basic concepts in lifetime modelling. Censored observations. Nonparametric estimation and graphical plotting for lifetime data (Kaplan-Meier, Nelson plot). Estimation and testing in parametric lifetime distributions. Analysis of lifetimes with covariates (Cox regression, accelerated lifetime testing). Modelling and analysis of recurrent events. Nonhomogeneous Poisson processes. Nelson-Aalen estimators.
  • (H)TMA4285 Time series models. Autoregressive and moving average based models for stationary and non-stationary time series. Parameter estimation. Model identification. Forecasting. ARCH and GARCH models for volatility. State space models (linear dynamic models) and the Kalman filter.
  • (H)TMA4295 Statistical inference. Transformations and moments of random variables. Families of distributions. Inequalities and convergence theorems. Sufficient statistics. Frequentist and Bayesian estimators. Methods of constructing point estimators, interval estimators and hypothesis tests, and optimality of these. Asymptotic properties of estimators and hypothesis tests.
  • (V)TMA4300 Computational statistics. Classical and Markov chain methods for stochastic simulation. Hierarchical Bayesian models and inference in these. The expectation maximisation (EM) algorithm. Bootstrapping, cross-validation and non-parametric methods.
  • (H)TMA4315 Generalized linear models. Univariate exponential family. Multiple linear regression. Logistic regression. Poisson regression. General formulation for generalised linear models with canonical link. Likelihood-based inference with score function and expected Fisher information. Deviance. AIC. Wald and likelihood-ratio test. Linear mixed effects models with random components of general structure. Random intercept and random slope. Generalised linear mixed effects models. Strong emphasis on programming in R. Possible extensions: quasi-likelihood, over-dispersion, models for multinomial data, analysis of contingency tables, quantile regression.

PhD courses