Last changes: (08.01: added link to slides from Thiago, and contact information for lecturers, removed student names)
taken from https://www.ntnu.edu/studies/courses/MA8701#tab=omEmnet
The course is usually given every second year, and only if a sufficient number of students register. It is next given in Spring 2019. If too few students register, the course is given as a guided self-study.
The course provides a broad introduction to the basic principles and methods of statistical inference and prediction.
The course will give a detailed presentation of selected contemporary advanced topics in statistical inference and learning.
Together with course MA8704 Probability theory and asymptotic techniques it provides a theoretical basis for PhD students in statistics.
Old content (changed due to the new course TMA4268 Statistical learning):
The course includes: Introduction to supervised learning. Linear methods for regression and classification. Basic expansions and regularization. Kernel smoothing methods. Likelihood inference and asymptotic methods. Model inference, assessment and selection. Empirical Bayes methods.
New content
Part 1: Regularized linear and generalized linear models [2w]
Part 2: Smoothing and splines [2w]
Part 3: Experimental design in statistical learning [1w]
Part 4: Deep neural nets [3w]
Part 5: Active learning [2w]
The course gives the student a thorough background in selected topics relevant for statistical inference and learning. Together with the course MA8704 Probability theory and asymptotic techniques it provides a theoretical basis for PhD students in statistics, and together with MA8702 Advanced computer intensive statistical methods it provides a computational basis.
After completing this course the students are able to use advanced techniques in statistical inference and learning for analysing complex and large amounts of data.
The students will be able to participate in scientific discussions and carry out research in statistics at high international level. They will be able to participate in applied projects involving statistical methods and to apply their knowledge to problems in theoretical statistics.
The following courses are recommended background for the students:
TMA4267 Linear statistical models
TMA4295 Statistical inference
TMA4300 Computer intensive statistical methods
TMA4315 Generalized linear models
TMA4268 Statistical learning
TMA4180 Optimization
In addition, good programming skills in either R or Python are needed. It is also preferable to have some knowledge of unix commands and to be able to run a script on a computer cluster.
Suggested help for unix shell: http://swcarpentry.github.com/shell-novice/
A larger project (in R or Python) will count for 30% of the grade and is to be presented orally by the students. The project work can be done in teams of 2 or 3 students.
There will be a final oral exam counting 70% of the grade (date not decided).
The grade for this course is pass/fail, and 70/100 score is required to pass (this is the standard rule for PhD courses at NTNU).
Lecture: week 2 08.01 - this lecture!
We expand on what you have learned about the lasso and ridge regression in TMA4268, and marry this with the GLMs from TMA4315.
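As a warm-up reminder (an illustrative sketch, not course material), ridge regression has the closed form (X'X + lambda*I)^{-1} X'y; the lasso has no closed form and is usually fit by coordinate descent, e.g. in glmnet. The simulated data below are purely hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 100, 5
X = rng.standard_normal((n, p))
beta = np.array([3.0, -2.0, 0.0, 0.0, 1.0])   # true coefficients, two are zero
y = X @ beta + rng.standard_normal(n)

def ridge(X, y, lam):
    """Closed-form ridge estimate (X'X + lam*I)^{-1} X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

beta_ols = ridge(X, y, 0.0)      # lam = 0 recovers ordinary least squares
beta_ridge = ridge(X, y, 10.0)
# the ridge penalty shrinks the coefficient vector towards zero:
print(np.linalg.norm(beta_ols), np.linalg.norm(beta_ridge))
```

Note that ridge shrinks all coefficients but sets none exactly to zero; that selection property is what the lasso adds.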
Lectures: week 3 15.01 and week 4 22.01.
Readings:
Benjamin elaborates!
Day 1 (15.01.2019)
Day 2 (22.01.2019)
Lecture: week 5 29.01
Erlend Aune will talk about how to perform computations (hopefully on a GPU cluster) for the compulsory project.
In addition: those of you who are not familiar with Classification can attend the lectures in TMA4268 Statistical learning Monday 28.01 at 08.15 in S4 and Thursday 31.01 at 14.15-16.00 in F6 with Mette Langaas.
Lectures: week 6 05.02 and week 7 12.02
Readings: Hastie, Tibshirani and Friedman: The Elements of Statistical Learning. Chapters 5 and 6. Book at https://web.stanford.edu/~hastie/ElemStatLearn/ (free)
Slides from Bo: https://www.math.ntnu.no/emner/MA8701/2019v/Smoothingandsplines.pdf
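As a small taste of regression splines (a sketch with simulated data, not taken from the slides): a cubic spline can be fit by ordinary least squares on the truncated power basis 1, x, x^2, x^3, (x - k)_+^3 for chosen knots k, as in Chapter 5 of the Elements book.

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(0, 1, 100)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(100)   # noisy sine

def truncated_power_basis(x, knots):
    """Cubic regression-spline basis: 1, x, x^2, x^3, (x - k)_+^3."""
    cols = [np.ones_like(x), x, x**2, x**3]
    cols += [np.clip(x - k, 0, None) ** 3 for k in knots]
    return np.column_stack(cols)

B = truncated_power_basis(x, knots=[0.25, 0.5, 0.75])
coef, *_ = np.linalg.lstsq(B, y, rcond=None)   # least-squares fit in the basis
fitted = B @ coef

sse_spline = np.sum((y - fitted) ** 2)
sse_line = np.sum((y - np.polyval(np.polyfit(x, y, 1), x)) ** 2)
print(sse_spline, sse_line)   # the spline tracks the sine far better than a line
```

Smoothing splines, in contrast, place a knot at every data point and control flexibility with a roughness penalty instead of the number of knots.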
Lectures: week 8 19.02 12.15-14.00 and 21.02 10.15-12.00
Readings: TBA
John elaborates:
Machine learning methods are often complex and may resist formal analysis. Therefore, to learn about and better understand their behaviour, we need to do empirical investigations, which are best performed using controlled experiments. In these lectures we will mainly concentrate on three topics:
all based on methods within Design of Experiments.
In this paper, we propose a design of experiments (DOE) methodology as the first step to screen the most significant hyperparameters (factors) of an ML algorithm.
Reducing the number of factors to a subset which has the greatest effect on model performance considerably reduces the number of model-fitting runs in the next round of hyperparameter tuning experiments.
The screening phase is done using fractional factorial designs, which are well-suited for scenarios in which we do not have the luxury of running many experiments; screening may also be done using other designs, as explained at the end of Section 2.1.
Once the main factors are identified, a full factorial experiment can be run on the factors as a confirmatory procedure.
The second phase of our method consists of applying response surface methodology (RSM) to model a first- or second-order polynomial which approximates the performance of the model given different hyperparameter configurations.
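The screening idea can be sketched as follows (a toy illustration, not from the paper: the "performance" function, the factor names and their effect sizes are all made up). Each hyperparameter is coded to two levels, -1 and +1, and the main effect of a factor is the mean response at +1 minus the mean at -1.

```python
import itertools
import numpy as np

# Hypothetical "validation accuracy" as a function of three coded
# hyperparameters in {-1, +1}; factor A matters most, C is inert.
def performance(a, b, c):
    return 0.70 + 0.10 * a + 0.03 * b + 0.00 * c

# Full 2^3 factorial design (8 runs); a fractional factorial design
# would run only a carefully chosen half of these rows.
design = np.array(list(itertools.product([-1, 1], repeat=3)))
y = np.array([performance(*row) for row in design])

# Main effect of each factor: mean response at +1 minus mean at -1.
effects = {name: y[design[:, j] == 1].mean() - y[design[:, j] == -1].mean()
           for j, name in enumerate("ABC")}
print(effects)   # A has by far the largest effect; C has none
```

After screening out inert factors like C, the RSM phase would fit a first- or second-order polynomial in the remaining factors to locate good hyperparameter settings.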
Lectures: week 9 26.02, week 10 05.03 and week 11 12.03
Readings:
You choose what you read; both books are built on the keras package. The book has to be bought, as an ebook from Manning or on paper from Akademika. The library should also have 2 copies of the R book.
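The course material uses keras; just to fix ideas before the lectures, here is a minimal numpy sketch (my own illustration, not from the books) of what a small dense network computes in its forward pass: an affine map, a ReLU hidden layer, and a sigmoid output.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0)

def forward(x, params):
    """Forward pass of a 2-layer dense net: ReLU hidden layer, sigmoid output."""
    W1, b1, W2, b2 = params
    h = relu(x @ W1 + b1)                 # hidden layer: affine map + ReLU
    return 1 / (1 + np.exp(-(h @ W2 + b2)))   # output: sigmoid probability

rng = np.random.default_rng(0)
params = (rng.standard_normal((4, 8)), np.zeros(8),    # 4 inputs -> 8 hidden
          rng.standard_normal((8, 1)), np.zeros(1))    # 8 hidden -> 1 output
x = rng.standard_normal((3, 4))           # a batch of 3 inputs with 4 features
out = forward(x, params)
print(out.shape)                          # (3, 1), probabilities in (0, 1)
```

Training then amounts to adjusting the weights by gradient descent on a loss, which keras handles for you via automatic differentiation.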
Lectures: week 12 19.03 and week 13 26.03
Readings: Review article to be announced, but this should be a good read http://burrsettles.com/pub/settles.activelearning.pdf
Erlend refers us to the Wiki page for Active learning: https://en.wikipedia.org/wiki/Active_learning_(machine_learning) and says that he will probably also cover cutting-edge transfer learning.
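One core idea from the Settles survey is uncertainty sampling: query the label of the unlabelled point the current model is least sure about. For a binary classifier this is simply the pool point with predicted probability closest to 0.5, as in this toy sketch (the pool probabilities are invented for illustration):

```python
import numpy as np

def uncertainty_sampling(probs):
    """Pick the pool index whose predicted probability is closest to 0.5."""
    probs = np.asarray(probs)
    return int(np.argmin(np.abs(probs - 0.5)))

pool_probs = [0.95, 0.10, 0.52, 0.80]   # model's P(y=1) for 4 unlabelled points
query = uncertainty_sampling(pool_probs)
print(query)   # 2: the model is least certain about the third point
```

In a full active-learning loop one would query the label of that point, retrain on the enlarged labelled set, and repeat.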
Thiago elaborates!
Link to data set: https://www.kaggle.com/c/avito-demand-prediction/overview
Some of you have not taken our first course on statistical learning, TMA4268. (Who are you?)
Lectures are Mondays at 08.15-10.00 in S4 and Thursdays 14.15-16.00 in F6/Smia.
Textbook is James et al.: Introduction to Statistical Learning, with Applications in R, which is written as a gentler version of the Elements book that we use in Part 2.
Here is an overview of the topics we cover and when: https://www.math.ntnu.no/emner/TMA4268/2019v/TMA4268overview.html