Module 6

Shrinkage methods

(This was Problem 2 for Compulsory exercise 2 in 2018.)

In this exercise we will study lasso and ridge regression. We continue using the ourAutoTrain dataset from Problem 1 (of Compulsory exercise 2 in 2018).

library(ISLR)
ourAuto=data.frame("mpg"=Auto$mpg,"cylinders"=factor(cut(Auto$cylinders,2)),
                   "displace"=Auto$displacement,"horsepower"=Auto$horsepower,
                   "weight"=Auto$weight,"acceleration"=Auto$acceleration, 
                   "year"=Auto$year,"origin"=as.factor(Auto$origin))
colnames(ourAuto)

## [1] "mpg"          "cylinders"    "displace"     "horsepower"  
## [5] "weight"       "acceleration" "year"         "origin"

ntot=dim(ourAuto)[1]
ntot

## [1] 392

set.seed(4268)
testids=sort(sample(1:ntot,ceiling(0.2*ntot),replace=FALSE))
ourAutoTrain=ourAuto[-testids,]
ourAutoTest=ourAuto[testids,]

a) Lasso and ridge regression [2 points]

In a regression model with \(p\) predictors the ridge regression coefficients are the values that minimize

\[ \sum_{i=1}^{n}\Big(y_i-\beta_0-\sum_{j=1}^p\beta_j x_{ij}\Big)^2+\lambda \sum_{j=1}^{p}\beta_j^2, \]

while the lasso regression coefficients are the values that minimize

\[ \sum_{i=1}^{n}\Big(y_i-\beta_0-\sum_{j=1}^p\beta_j x_{ij}\Big)^2+\lambda \sum_{j=1}^{p}\lvert \beta_j \rvert. \]

In Figure 1 and Figure 2 you see the results from lasso and ridge regression applied to ourAutoTrain. The standardized coefficients \(\hat{\beta}_1,\ldots,\hat{\beta}_8\) are plotted against the tuning parameter \(\lambda\).

Q11. Which figure (1 or 2) corresponds to ridge and which figure corresponds to lasso? Justify your answer.
Q12. Use the two figures and the above formulas to explain the impact of the tuning parameter \(\lambda\) on the coefficients \(\beta_j\), and on the bias and variance of the resulting predictions. In particular, what happens when \(\lambda=0\) and when \(\lambda \rightarrow \infty\)?
Q13. Can you use lasso and/or ridge regression to perform model selection similar to what you did in Problem 1? Explain. Compare what you see in Figure 1 and Figure 2 to the results in Problem 1b.
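
For intuition on Q11 and Q12, it can help to look at a special case where both criteria have closed-form solutions. The following is a sketch under simplifying assumptions that do not hold for ourAutoTrain: \(n = p\), the design matrix is the identity, and there is no intercept. Each coefficient can then be optimized separately. The superscripts \(R\) and \(L\) are notation introduced here for ridge and lasso.

```latex
% Ridge: minimize (y_j - \beta_j)^2 + \lambda \beta_j^2 for each j
\[ \hat{\beta}_j^{R} = \frac{y_j}{1+\lambda} \]

% Lasso: minimize (y_j - \beta_j)^2 + \lambda \lvert \beta_j \rvert for each j
\[ \hat{\beta}_j^{L} =
   \begin{cases}
     y_j - \lambda/2 & \text{if } y_j > \lambda/2, \\
     y_j + \lambda/2 & \text{if } y_j < -\lambda/2, \\
     0               & \text{if } \lvert y_j \rvert \le \lambda/2.
   \end{cases} \]
```

In this special case ridge shrinks every coefficient by the same proportion and reaches zero only in the limit \(\lambda \rightarrow \infty\), whereas lasso shifts each coefficient towards zero by a constant amount and sets it exactly to zero once \(\lvert y_j \rvert \le \lambda/2\). Both give the least squares solution \(\hat{\beta}_j = y_j\) at \(\lambda = 0\).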

Figure 1.

Figure 2.

b) Finding the optimal \(\lambda\) [1 point]

In the following, we will use functions in the glmnet package to perform lasso regression. The first step is to find the optimal tuning parameter \(\lambda\). This is done by cross-validation using the cv.glmnet() function:

library(glmnet)
set.seed(4268)

x=model.matrix(mpg~.,ourAutoTrain)[,-1] #-1 to remove the intercept.
head(x)
y=ourAutoTrain$mpg

lambda=c(seq(from=5,to=0.1,length.out=150),0.01,0.0001) #Create a set of tuning parameters, adding low value to also see least squares fit
cv.out=cv.glmnet(x,y,alpha=1,nfolds=10,lambda=lambda, standardize=TRUE) #alpha=1 gives lasso, alpha=0 gives ridge

plot(cv.out)

##   cylinders(5.5,8.01] displace horsepower weight acceleration year origin2
## 1                   1      307        130   3504         12.0   70       0
## 2                   1      350        165   3693         11.5   70       0
## 4                   1      304        150   3433         12.0   70       0
## 5                   1      302        140   3449         10.5   70       0
## 8                   1      440        215   4312          8.5   70       0
## 9                   1      455        225   4425         10.0   70       0
##   origin3
## 1       0
## 2       0
## 4       0
## 5       0
## 8       0
## 9       0

Q14. Explain what the function cv.glmnet does. Hint: help(cv.glmnet).
Q15. Explain what we see in the above plot. How can it be used to identify the optimal \(\lambda\)? Remark: To find the optimal \(\lambda\) a popular choice is to choose the \(\lambda\) giving the lowest cross-validated MSE. Another choice is called the 1se-rule. See help(cv.glmnet).
Q16. Use the output from cv.glmnet and the 1se-rule to choose the “optimal” \(\lambda\).
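
As an aid for Q15 and Q16, here is a minimal sketch of how the two selection rules relate to what cv.glmnet stores. It uses simulated data rather than ourAutoTrain, and the names x_sim, y_sim, cv_sim, lam_min and lam_1se are hypothetical. The manual reconstruction assumes that the 1se-rule picks the largest \(\lambda\) whose cross-validated error is within one standard error of the minimum.

```r
library(glmnet)

set.seed(1)                           # hypothetical seed, for reproducibility only
n <- 200
p <- 6
x_sim <- matrix(rnorm(n * p), n, p)   # simulated predictors (not ourAutoTrain)
y_sim <- 2 * x_sim[, 1] - x_sim[, 2] + rnorm(n)  # only two predictors matter

# 10-fold cross-validation over glmnet's default lambda grid, lasso penalty
cv_sim <- cv.glmnet(x_sim, y_sim, alpha = 1, nfolds = 10)

# Rebuild both rules by hand from the stored CV curve:
# cvm = mean CV error per lambda, cvsd = its estimated standard error
i_min     <- which.min(cv_sim$cvm)
lam_min   <- cv_sim$lambda[i_min]
threshold <- cv_sim$cvm[i_min] + cv_sim$cvsd[i_min]
lam_1se   <- max(cv_sim$lambda[cv_sim$cvm <= threshold])

# Compare with the values reported by cv.glmnet itself
c(lambda.min = cv_sim$lambda.min, manual_min = lam_min)
c(lambda.1se = cv_sim$lambda.1se, manual_1se = lam_1se)
```

Since lambda.1se is never smaller than lambda.min, the 1se-rule gives a model that is at least as heavily penalized as the one minimizing the cross-validated error.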

c) Prediction [1 point]

Q17. Use lasso regression to fit the model corresponding to the optimal \(\lambda\) from Q16. What are the coefficient estimates? Write down the model fit.
Q18. Assume that a car has 4 cylinders, displace=150, horsepower=100, weight=3000, acceleration=10, year=82 and comes from Europe. What is the predicted mpg for this car given the chosen model from Q17? Hint: you need to construct the new observation in the same way as the observations in the model matrix x (using the same dummy variable coding for cylinders and origin), and newx needs to be a matrix: newx=matrix(c(0,150,100,3000,10,82,1,0),nrow=1).
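
The mechanics behind Q17 and Q18 can be sketched on a small simulated example. This is not the fit asked for above: the data, the variable names (x_sim, y_sim, lam_grid, lam, new_obs) and the value of \(\lambda\) are all hypothetical. The sketch shows that a prediction from glmnet is the intercept plus the new row of the model matrix multiplied by the coefficients at the chosen \(\lambda\).

```r
library(glmnet)

set.seed(1)                                   # hypothetical seed
n <- 100
# Two simulated predictors: a 0/1 dummy and a continuous variable
x_sim <- cbind(group = rbinom(n, 1, 0.5),
               size  = rnorm(n, mean = 150, sd = 30))
y_sim <- 30 - 3 * x_sim[, "group"] - 0.05 * x_sim[, "size"] + rnorm(n)

# Fit the lasso path on a small grid that contains the lambda we want to use
lam_grid <- c(1, 0.5, 0.1, 0.01)
fit <- glmnet(x_sim, y_sim, alpha = 1, lambda = lam_grid)

lam <- 0.1                                    # stands in for the lambda chosen by CV
b   <- coef(fit, s = lam)                     # sparse (p+1) x 1 matrix, intercept first
b

# New observation: same column order and dummy coding as x_sim, as a 1-row matrix
new_obs <- matrix(c(0, 150), nrow = 1)
pred    <- predict(fit, newx = new_obs, s = lam)

# The same prediction computed by hand from the coefficients
manual <- b[1, 1] + sum(new_obs * b[-1, 1])
c(predict = as.numeric(pred), manual = manual)
```

The coefficients returned by coef() are on the original scale of the predictors even when standardize=TRUE, which is why the new observation is entered without any rescaling.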

Module 7

Problem 1:

Write the design matrix for a natural spline with \(X\) = year and one knot \(c_1 = 2006\). Let the boundary knots be the extreme values of year, that is \(c_0 = 2003\) and \(c_2 = 2009\). A general basis for a natural spline is

\[ b_1(x_i) = x_i, \quad b_{k+2}(x_i) = d_k(x_i)-d_K(x_i),\; k = 0, \ldots, K - 1, \]

where

\[ d_k(x_i) = \frac{(x_i-c_k)^3_+-(x_i-c_{K+1})^3_+}{c_{K+1}-c_k}. \]
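
To check a hand-written design matrix, the basis formula above can be coded directly. The sketch below is one possible implementation, not a library routine: pos_cube() and natural_basis() are hypothetical helper names, the toy values x_toy and the knots 0, 3, 6 are made up for illustration, and an intercept column is added on the assumption that the model contains one.

```r
# Truncated cubic: (u)^3_+ equals u^3 for u > 0 and 0 otherwise
pos_cube <- function(u) pmax(u, 0)^3

# knots = c(c_0, c_1, ..., c_K, c_{K+1}): boundary knots first and last
natural_basis <- function(x, knots) {
  K <- length(knots) - 2              # number of interior knots
  stopifnot(K >= 1)
  c_last <- knots[K + 2]              # c_{K+1}
  # d_k(x) from the formula; k runs from 0, R indexes from 1
  d <- function(k) {
    (pos_cube(x - knots[k + 1]) - pos_cube(x - c_last)) / (c_last - knots[k + 1])
  }
  B <- cbind(1, x)                    # intercept column and b_1(x) = x
  for (k in 0:(K - 1)) {
    B <- cbind(B, d(k) - d(K))        # b_{k+2}(x) = d_k(x) - d_K(x)
  }
  colnames(B) <- c("intercept", paste0("b", 1:(K + 1)))
  B
}

x_toy <- 0:6                          # toy covariate values
X_toy <- natural_basis(x_toy, knots = c(0, 3, 6))
X_toy
```

With one interior knot the matrix has three columns. In this toy example b2 equals 0 at the lower boundary, 4.5 at the interior knot and 27 at the upper boundary.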

Problem 2:

Load the Wage dataset by writing library(ISLR) and attach(Wage). Use library(gam) to fit an additive model with wage as response, a polynomial for age and a cubic spline for year. Use 4 basis functions for each covariate.
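
One way to set up such a fit is sketched below. It rests on two assumptions about the wording: that "4 basis functions" means a polynomial of degree 4 for age, and a cubic regression spline with df = 4 (a single interior knot) for year. It also passes data = Wage instead of calling attach(Wage). The object name fit_gam is hypothetical.

```r
library(ISLR)      # Wage data
library(gam)       # gam()
library(splines)   # bs()

# Additive model: degree-4 polynomial in age + cubic spline in year (4 basis functions each)
fit_gam <- gam(wage ~ poly(age, 4) + bs(year, df = 4), data = Wage)

# Intercept plus 4 + 4 basis coefficients
length(coef(fit_gam))
summary(fit_gam)
```

Since both terms are ordinary basis expansions, the same model could also be fitted with lm(); gam() is used here because the problem asks for it.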

Team Kahoot! for modules 6 and 7!

Interactive lecture module 6 and 7

TMA4268 Statistical learning

Mette Langaas, Andreas Strand

26 February, 2019