Module 6

Shrinkage methods

(This was Problem 2 for Compulsory exercise 2 in 2018.)

In this exercise we will study lasso and ridge regression. We continue using the ourAutoTrain dataset from Problem 1 (of Compulsory exercise 2 in 2018).

library(ISLR)
# Keep the variables of interest; cylinders is cut into two intervals and origin is coded as a factor
ourAuto=data.frame("mpg"=Auto$mpg,"cylinders"=factor(cut(Auto$cylinders,2)),
                   "displace"=Auto$displacement,"horsepower"=Auto$horsepower,
                   "weight"=Auto$weight,"acceleration"=Auto$acceleration, 
                   "year"=Auto$year,"origin"=as.factor(Auto$origin))
colnames(ourAuto)
## [1] "mpg"          "cylinders"    "displace"     "horsepower"  
## [5] "weight"       "acceleration" "year"         "origin"
ntot=dim(ourAuto)[1]
ntot
## [1] 392
set.seed(4268)
# Randomly hold out 20% of the observations as a test set
testids=sort(sample(1:ntot,ceiling(0.2*ntot),replace=FALSE))
ourAutoTrain=ourAuto[-testids,]
ourAutoTest=ourAuto[testids,]

a) Lasso and ridge regression [2 points]

In a regression model with \(p\) predictors, the ridge regression coefficients are the values that minimize

\[ \sum_{i=1}^{n}\Big(y_i-\beta_0-\sum_{j=1}^p\beta_j x_{ij}\Big)^2+\lambda \sum_{j=1}^{p}\beta_j^2, \]

while the lasso regression coefficients are the values that minimize

\[ \sum_{i=1}^{n}\Big(y_i-\beta_0-\sum_{j=1}^p\beta_j x_{ij}\Big)^2+\lambda \sum_{j=1}^{p}\lvert \beta_j \rvert. \]

In Figure 1 and Figure 2 you see the results from lasso and ridge regression applied to ourAutoTrain. The standardized coefficients \(\hat{\beta}_1,\ldots,\hat{\beta}_8\) are plotted against the tuning parameter \(\lambda\).
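For intuition about the ridge criterion, here is a minimal base-R sketch on simulated data (not ourAutoTrain). It assumes the predictors are centered and scaled with scale(), so the unpenalized intercept is simply the mean of y, and uses the closed-form minimizer \(\hat\beta(\lambda)=(X^TX+\lambda I)^{-1}X^T(y-\bar y)\). The function name ridge_closed_form is hypothetical, and glmnet computes its fits differently (by coordinate descent).

```r
# Minimal sketch: ridge regression via its closed form on simulated data.
# Assumes the columns of X are centered and scaled, so the intercept is mean(y)
# and is not penalized, matching the criterion with lambda * sum(beta_j^2).
ridge_closed_form <- function(X, y, lambda) {
  p <- ncol(X)
  # solve (X'X + lambda I) beta = X'(y - ybar)
  beta <- solve(crossprod(X) + lambda * diag(p), crossprod(X, y - mean(y)))
  c(intercept = mean(y), setNames(drop(beta), colnames(X)))
}

set.seed(4268)
n <- 100
X <- scale(matrix(rnorm(n * 3), n, 3))       # centered and scaled predictors
colnames(X) <- c("x1", "x2", "x3")
y <- 2 + 3 * X[, 1] - 2 * X[, 2] + rnorm(n)  # x3 has no effect on y

# lambda = 0 gives least squares; larger lambda shrinks the slopes towards 0
for (lambda in c(0, 10, 100, 10000)) {
  b <- ridge_closed_form(X, y, lambda)
  cat("lambda =", lambda, " slopes:", round(b[-1], 3), "\n")
}
```

At \(\lambda=0\) the slopes agree with coef(lm(y ~ X)); as \(\lambda\) grows their overall size shrinks, but they are not set exactly to zero.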

  • Q11. Which figure (1 or 2) corresponds to ridge and which figure corresponds to lasso? Justify your answer.
  • Q12. Use the two figures and the above formulas to explain the impact of the tuning parameter \(\lambda\) on the coefficients \(\beta_j\), and on the bias and variance of the resulting predictions. In particular, what happens when \(\lambda=0\) and when \(\lambda \rightarrow \infty\)?
  • Q13. Can you use lasso and/or ridge regression to perform model selection similar to what you did in Problem 1? Explain. Compare what you see in Figure 1 and Figure 2 to the results in Problem 1b.

Figure 1.

Figure 2.
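To see how the absolute-value penalty can set coefficients exactly to zero, here is a minimal sketch of coordinate descent for the lasso criterion above, on simulated data with centered and scaled predictors. Each coefficient is updated with the soft-thresholding operator \(S(z,\gamma)=\operatorname{sign}(z)\max(|z|-\gamma,0)\). The helper names lasso_cd and soft_threshold are hypothetical. glmnet uses a more refined coordinate descent and scales the residual sum of squares by \(1/(2n)\), so its \(\lambda\) values are not on the same scale as here.

```r
# Soft-thresholding operator S(z, g) = sign(z) * max(|z| - g, 0)
soft_threshold <- function(z, g) sign(z) * pmax(abs(z) - g, 0)

# Minimal coordinate-descent lasso for the criterion
#   sum_i (y_i - b0 - sum_j b_j x_ij)^2 + lambda * sum_j |b_j|
# Assumes X has centered columns; y is centered inside, so b0 = mean(y).
lasso_cd <- function(X, y, lambda, n_iter = 500) {
  y_c <- y - mean(y)
  beta <- rep(0, ncol(X))
  col_ss <- colSums(X^2)                 # sum_i x_ij^2 for each column
  for (it in seq_len(n_iter)) {
    for (j in seq_len(ncol(X))) {
      # partial residual: remove the contributions of all predictors except j
      r_j <- y_c - X[, -j, drop = FALSE] %*% beta[-j]
      beta[j] <- soft_threshold(sum(X[, j] * r_j), lambda / 2) / col_ss[j]
    }
  }
  c(intercept = mean(y), setNames(beta, colnames(X)))
}

set.seed(4268)
n <- 100
X <- scale(matrix(rnorm(n * 4), n, 4))         # centered and scaled predictors
colnames(X) <- paste0("x", 1:4)
y <- 1 + 3 * X[, 1] - 2 * X[, 2] + rnorm(n)    # x3 and x4 have no effect

for (lambda in c(0, 50, 200, 1000)) {
  print(round(lasso_cd(X, y, lambda), 3))
}
```

A coefficient is set exactly to zero whenever \(|\sum_i x_{ij} r_{ij}| \le \lambda/2\) for its partial residual \(r_{ij}\); with \(\lambda=0\) the result is the least squares fit, and for large enough \(\lambda\) every slope is zero.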

b) Finding the optimal \(\lambda\) [1 point]

In the following, we will use functions in the glmnet package to perform lasso regression. The first step is to find the optimal tuning parameter \(\lambda\), which is done by cross-validation using the cv.glmnet() function:

library(glmnet)
set.seed(4268)

x=model.matrix(mpg~.,ourAutoTrain)[,-1] # [,-1] removes the intercept column
head(x)
y=ourAutoTrain$mpg

# Grid of tuning parameters; the very small values 0.01 and 0.0001 are added so that a fit close to least squares is included
lambda=c(seq(from=5,to=0.1,length.out=150),0.01,0.0001)
cv.out=cv.glmnet(x,y,alpha=1,nfolds=10,lambda=lambda, standardize=TRUE) # alpha=1 gives lasso, alpha=0 gives ridge

plot(cv.out) # cross-validated MSE against log(lambda)

##   cylinders(5.5,8.01] displace horsepower weight acceleration year origin2 origin3
## 1                   1      307        130   3504         12.0   70       0       0
## 2                   1      350        165   3693         11.5   70       0       0
## 4                   1      304        150   3433         12.0   70       0       0
## 5                   1      302        140   3449         10.5   70       0       0
## 8                   1      440        215   4312          8.5   70       0       0
## 9                   1      455        225   4425         10.0   70       0       0
  • Q14. Explain what the function cv.glmnet does. Hint: help(cv.glmnet).
  • Q15. Explain what we see in the above plot. How can it be used to identify the optimal \(\lambda\)? Remark: To find the optimal \(\lambda\) a popular choice is to choose the \(\lambda\) giving the lowest cross-validated MSE. Another choice is called the 1se-rule. See help(cv.glmnet).
  • Q16. Use the output from cv.glmnet and the 1se-rule to choose the “optimal” \(\lambda\).
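To make the remark about the lowest cross-validated MSE and the 1se-rule concrete, here is a minimal base-R sketch of K-fold cross-validation over a \(\lambda\) grid. It follows the idea behind cv.glmnet, but as an assumption it uses ridge regression in closed form as the model (so it does not depend on glmnet), simulated data, and a simple unweighted standard error; the helper names ridge_fit, ridge_predict and cv_lambda are hypothetical. The 1se-rule picks the largest \(\lambda\) whose cross-validated MSE is within one standard error of the minimum.

```r
# Hypothetical helpers: ridge fit on standardized X (closed form) and prediction
ridge_fit <- function(X, y, lambda) {
  mu <- colMeans(X); s <- apply(X, 2, sd)
  Z <- scale(X, center = mu, scale = s)
  beta <- solve(crossprod(Z) + lambda * diag(ncol(X)), crossprod(Z, y - mean(y)))
  list(mu = mu, s = s, b0 = mean(y), beta = drop(beta))
}
ridge_predict <- function(fit, Xnew) {
  drop(fit$b0 + scale(Xnew, center = fit$mu, scale = fit$s) %*% fit$beta)
}

cv_lambda <- function(X, y, lambdas, nfolds = 10) {
  folds <- sample(rep(seq_len(nfolds), length.out = nrow(X)))
  # err[k, l]: mean squared prediction error on fold k for lambdas[l]
  err <- matrix(NA_real_, nfolds, length(lambdas))
  for (k in seq_len(nfolds)) {
    train <- folds != k
    for (l in seq_along(lambdas)) {
      fit <- ridge_fit(X[train, , drop = FALSE], y[train], lambdas[l])
      pred <- ridge_predict(fit, X[!train, , drop = FALSE])
      err[k, l] <- mean((y[!train] - pred)^2)
    }
  }
  cvm <- colMeans(err)                      # cross-validated MSE per lambda
  cvsd <- apply(err, 2, sd) / sqrt(nfolds)  # standard error of cvm
  i_min <- which.min(cvm)
  # 1se-rule: largest lambda whose CV error is within one SE of the minimum
  lambda_1se <- max(lambdas[cvm <= cvm[i_min] + cvsd[i_min]])
  list(cvm = cvm, cvsd = cvsd, lambda.min = lambdas[i_min], lambda.1se = lambda_1se)
}

set.seed(4268)
n <- 150
X <- matrix(rnorm(n * 5), n, 5)
y <- 1 + 2 * X[, 1] - X[, 2] + rnorm(n, sd = 2)
lambdas <- c(seq(from = 200, to = 1, length.out = 50), 0.01)
cv <- cv_lambda(X, y, lambdas)
print(c(lambda.min = cv$lambda.min, lambda.1se = cv$lambda.1se))
```

For the lasso fit in this exercise, the corresponding values are stored in cv.out$lambda.min and cv.out$lambda.1se.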

c) Prediction [1 point]

  • Q17. Use lasso regression to fit the model corresponding to the optimal \(\lambda\) from Q16. What are the coefficient estimates? Write down the model fit.
  • Q18. Assume that a car has 4 cylinders, displace=150, horsepower=100, weight=3000, acceleration=10, year=82 and comes from Europe. What is the predicted mpg for this car given the chosen model from Q17? Hint: you need to construct the new observation in the same way as the observations in the model matrix x (note the dummy-variable coding for cylinders and origin), and newx needs to be a matrix: newx=matrix(c(0,150,100,3000,10,82,1,0),nrow=1).
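Instead of typing the dummy coding by hand, the new observation can be built with model.matrix() from a one-row data frame whose factors carry the same levels as in ourAutoTrain. A sketch, assuming that Auto$cylinders ranges from 3 to 8 (so cut() applied to c(3, 8) gives the same two intervals as cut(Auto$cylinders,2)) and that origin 2 denotes Europe, as in the ISLR documentation of Auto.

```r
# Sketch: build newx for the car in Q18 with model.matrix(), so the dummy coding
# matches the model matrix x. Assumes Auto$cylinders ranges from 3 to 8, so cut()
# on c(3, 8) yields the same two intervals as cut(Auto$cylinders, 2).
cyl_levels <- levels(cut(c(3, 8), 2))
origin_levels <- c("1", "2", "3")  # ISLR coding: 1 = American, 2 = European, 3 = Japanese

newcar <- data.frame(
  cylinders    = factor(cyl_levels[1], levels = cyl_levels),  # 4 cylinders: lower interval
  displace     = 150,
  horsepower   = 100,
  weight       = 3000,
  acceleration = 10,
  year         = 82,
  origin       = factor("2", levels = origin_levels)          # Europe
)

# Same predictors as in mpg ~ . but without a response; [, -1] drops the intercept column
newx <- model.matrix(~ cylinders + displace + horsepower + weight + acceleration +
                       year + origin, data = newcar)[, -1, drop = FALSE]
print(newx)
```

The resulting 1 × 8 matrix has the same values as the hand-coded newx in the hint, and can be passed as newx to predict(cv.out, newx = newx, s = "lambda.1se").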

Module 7

Problem 1:

Write the design matrix for a natural spline with \(X\) = year and one knot \(c_1 = 2006\). Let the boundary knots be the extreme values of year, that is, \(c_0 = 2003\) and \(c_2 = 2009\). A general basis for a natural spline with \(K\) interior knots \(c_1,\ldots,c_K\) and boundary knots \(c_0\) and \(c_{K+1}\) is \[ b_1(x_i) = x_i, \qquad b_{k+2}(x_i) = d_k(x_i)-d_K(x_i), \quad k = 0, \ldots, K - 1, \] where \[ d_k(x_i) = \frac{(x_i-c_k)^3_+-(x_i-c_{K+1})^3_+}{c_{K+1}-c_k}. \]
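A small base-R sketch of this basis for the knots \(c_0=2003\), \(c_1=2006\), \(c_2=2009\) (so \(K=1\)), evaluated at the years 2003 to 2009 with an intercept column added. The function name ns_basis is hypothetical, and one row per distinct year is shown; the full design matrix for the Wage data would have one row per observation.

```r
# Sketch: natural cubic spline basis with knots c_0 < c_1 < ... < c_{K+1},
# using b_1(x) = x and b_{k+2}(x) = d_k(x) - d_K(x), k = 0, ..., K - 1.
ns_basis <- function(x, knots) {
  K <- length(knots) - 2                # number of interior knots
  pos3 <- function(u) pmax(u, 0)^3      # truncated cube (u)_+^3
  cK1 <- knots[K + 2]                   # last boundary knot c_{K+1}
  # d_k(x) for knot c_k (k = 0, ..., K); R indexes c_k as knots[k + 1]
  d <- function(k) (pos3(x - knots[k + 1]) - pos3(x - cK1)) / (cK1 - knots[k + 1])
  B <- cbind(1, x)                      # intercept and b_1(x) = x
  for (k in seq_len(K) - 1) B <- cbind(B, d(k) - d(K))
  colnames(B) <- c("intercept", "b1", paste0("b", seq_len(K) + 1))
  B
}

years <- 2003:2009
X <- ns_basis(years, knots = c(2003, 2006, 2009))
print(X)
```

Here \(b_2\) is 0 at 2003 and 27 at 2009, and beyond the boundary knot 2009 the basis function is linear in year, as required for a natural spline.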

Problem 2:

Load the Wage dataset by writing library(ISLR) and attach(Wage). Use library(gam) to fit an additive model with wage as response, a polynomial for age and a cubic spline for year. Use 4 basis functions for each covariate.
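Because a polynomial in age and a cubic spline in year are both fixed basis expansions, this additive model is linear in its coefficients and can also be fitted by least squares. A minimal sketch with lm() and splines::bs() on simulated stand-in data (the Wage data are not loaded here, so the variables below are made up for illustration): poly(age, 4) and bs(year, df = 4) each contribute 4 basis functions. The gam() function from the gam package should accept the same formula in place of lm().

```r
library(splines)   # bs(): B-spline basis for polynomial (here cubic) splines

# Simulated stand-in for the Wage data (assumption: not the real data set)
set.seed(4268)
n <- 300
age  <- runif(n, 18, 80)
year <- sample(2003:2009, n, replace = TRUE)
wage <- 50 + 2 * age - 0.02 * age^2 + 1.5 * (year - 2003) + rnorm(n, sd = 20)
sim <- data.frame(wage, age, year)

# 4 basis functions each: orthogonal polynomial of degree 4 in age, and a
# cubic B-spline with df = 4 in year (one interior knot, at the median of year)
fit <- lm(wage ~ poly(age, 4) + bs(year, df = 4), data = sim)
print(coef(fit))
```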

Team Kahoot!

for modules 6 and 7!