Module 6

Shrinkage methods

In this exercise we will study lasso and ridge regression. We continue using the ourAutoTrain dataset from Problem 1 (of Compulsory exercise 2 in 2018).

a) Lasso and ridge regression [2 points]

In a regression model with \(p\) predictors the ridge regression coefficients are the values that minimize

\[ \sum_{i=1}^{n}\Big(y_i-\beta_0-\sum_{j=1}^p\beta_j x_{ij}\Big)^2+\lambda \sum_{j=1}^{p}\beta_j^2, \]

while the lasso regression coefficients are the values that minimize

\[ \sum_{i=1}^{n}\Big(y_i-\beta_0-\sum_{j=1}^p\beta_j x_{ij}\Big)^2+\lambda \sum_{j=1}^{p}\lvert \beta_j \rvert. \]

In Figure 1 and Figure 2 you see the results from lasso and ridge regression applied to ourAutoTrain. Standardized coefficients \(\hat{\beta}_1,\ldots,\hat{\beta}_8\) are plotted against the tuning parameter \(\lambda\).

  • Q11. Which figure (1 or 2) corresponds to ridge and which figure corresponds to lasso? Justify your answer.
  • Q12. Use the two figures and the above formulas to explain the impact of the tuning parameter \(\lambda\) on the coefficients \(\beta_j\), and on the bias and variance of the resulting predictions. In particular, what happens when \(\lambda=0\) and when \(\lambda \rightarrow \infty\)?
  • Q13. Can you use lasso and/or ridge regression to perform model selection similar to what you did in Problem 1? Explain. Compare what you see in Figure 1 and Figure 2 to the results in Problem 1b.
Figure 1.

Figure 2.
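
The ridge criterion above has a closed-form minimizer, which makes it easy to experiment with different values of \(\lambda\) in base R. The following is a minimal sketch on simulated data (not ourAutoTrain); the data, the helper name ridge_coef and the grid of \(\lambda\) values are assumptions made for illustration. The predictors are standardized and the response is centered, so the intercept drops out. The lasso has no closed form in general and is left out here. Note also that glmnet scales the residual sum of squares by \(1/(2n)\), so its \(\lambda\) values are not directly comparable to the ones used in this sketch.

```r
# Simulated data (an assumption for illustration, not ourAutoTrain)
set.seed(1)
n <- 50
p <- 3
X <- scale(matrix(rnorm(n * p), n, p))     # standardized predictors
y <- drop(X %*% c(2, -1, 0) + rnorm(n))
y <- y - mean(y)                           # centered response, so beta_0 = 0

# Minimizer of sum((y - X beta)^2) + lambda * sum(beta^2)
ridge_coef <- function(X, y, lambda) {
  drop(solve(crossprod(X) + lambda * diag(ncol(X)), crossprod(X, y)))
}

# One column of coefficients per value of lambda
lambdas <- c(0, 1, 10, 100, 1e6)
path <- sapply(lambdas, function(l) ridge_coef(X, y, l))
colnames(path) <- paste0("lambda=", lambdas)
print(round(path, 3))
```

Reading the printed columns from left to right gives a numerical counterpart to the coefficient paths in the figures.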

b) Finding the optimal \(\lambda\) [1 point]

In the following, we will use functions in the glmnet package to perform lasso regression. The first step is to find the optimal tuning parameter \(\lambda\). This is done by cross-validation using the cv.glmnet() function:


library(glmnet)
x=model.matrix(mpg~.,ourAutoTrain)[,-1] #-1 to remove the intercept column.
y=ourAutoTrain$mpg #response vector

lambda=c(seq(from=5,to=0.1,length.out=150),0.01,0.0001) #Create a decreasing set of tuning parameters, adding low values to also see the least squares fit
cv.out=cv.glmnet(x,y,alpha=1,nfolds=10,lambda=lambda, standardize=TRUE) #alpha=1 gives lasso, alpha=0 gives ridge
plot(cv.out) #cross-validated MSE against log(lambda)


  • Q14. Explain what the function cv.glmnet does. Hint: help(cv.glmnet).
  • Q15. Explain what we see in the above plot. How can it be used to identify the optimal \(\lambda\)? Remark: To find the optimal \(\lambda\) a popular choice is to choose the \(\lambda\) giving the lowest cross-validated MSE. Another choice is called the 1se-rule. See help(cv.glmnet).
  • Q16. Use the output from cv.glmnet and the 1se-rule to choose the “optimal” \(\lambda\).
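
The two selection rules mentioned in Q15 can be sketched in base R on a small table of cross-validation results. The numbers below are made up for illustration and do not come from ourAutoTrain; the vector names mirror the lambda, cvm and cvsd components of a cv.glmnet object, which also stores the two choices as lambda.min and lambda.1se.

```r
# Hypothetical cross-validation summary (made-up numbers)
lambda <- c(2.0, 1.0, 0.5, 0.1, 0.01)
cvm    <- c(30.0, 16.0, 12.5, 12.0, 12.2)   # mean CV error for each lambda
cvsd   <- c(2.0, 1.2, 1.0, 0.9, 0.9)        # standard error of each mean

# Rule 1: the lambda with the lowest mean CV error
i_min      <- which.min(cvm)
lambda_min <- lambda[i_min]

# Rule 2 (1se): the largest lambda whose mean CV error is within one
# standard error of the minimum
threshold  <- cvm[i_min] + cvsd[i_min]
lambda_1se <- max(lambda[cvm <= threshold])

print(c(lambda_min = lambda_min, lambda_1se = lambda_1se))
```

The 1se-rule trades a slightly higher estimated error for a more heavily penalized model.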

c) Prediction [1 point]

  • Q17. Use lasso regression to fit the model corresponding to the optimal \(\lambda\) from Q16. What are the coefficient estimates? Write down the model fit.
  • Q18. Assume that a car has 4 cylinders, displace=150, horsepower=100, weight=3000, acceleration=10, year=82 and comes from Europe. What is the predicted mpg for this car given the chosen model from Q17? Hint: you need to construct the new observation in the same way as the observations in the model matrix x (using the same dummy variable coding for cylinders and origin), and newx needs to be a matrix, for example newx=matrix(c(0,150,100,3000,10,82,1,0),nrow=1).
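
The dummy variable coding mentioned in the hint can be inspected on a small toy example. The sketch below uses a hypothetical data frame toy and hypothetical coefficients beta0 and beta; none of these values come from ourAutoTrain or from a fitted model. It shows how model.matrix expands a three-level factor into two indicator columns, and how a new observation must follow the same column order before it is multiplied with the coefficients.

```r
# Hypothetical toy data: origin is a factor with levels 1, 2, 3
toy <- data.frame(
  mpg    = c(30, 25, 18, 15, 22),
  weight = c(2000, 2500, 3500, 4000, 3000),
  origin = factor(c(1, 2, 3, 2, 1))
)

# Same construction as for x: drop the intercept column
x_toy <- model.matrix(mpg ~ ., toy)[, -1]
print(colnames(x_toy))   # level 1 is the reference level and gets no column
print(x_toy)

# Hypothetical coefficients, in the column order of x_toy
beta0 <- 40
beta  <- c(weight = -0.006, origin2 = 1.5, origin3 = 2.0)

# New observation: weight = 3000 and origin = 2 (European in the Auto coding)
newx <- matrix(c(3000, 1, 0), nrow = 1)
pred <- beta0 + drop(newx %*% beta)
print(pred)
```

With a fitted glmnet model, predict() with the newx argument performs the same multiplication using the estimated coefficients.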

Module 7

Problem 1:

Write the design matrix for a natural spline with \(X\) = year and one knot \(c_1 = 2006\). Let the boundary knots be the extreme values of year, that is \(c_0 = 2003\) and \(c_2 = 2009\). A general basis for a natural spline is

\[ b_1(x_i) = x_i, \quad b_{k+2}(x_i) = d_k(x_i)-d_K(x_i),\; k = 0, \ldots, K - 1, \]

\[ d_k(x_i) = \frac{(x_i-c_k)^3_+-(x_i-c_{K+1})^3_+}{c_{K+1}-c_k}. \]
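
The basis above can be evaluated numerically as a check on a hand-written design matrix. The following is a sketch under stated assumptions: the function names pos_cube and natural_spline_design are made up for illustration, the knot vector is passed as \((c_0, c_1, \ldots, c_{K+1})\) with the boundary knots included, and a leading column of ones is added for the intercept.

```r
# Truncated cubic (u)^3_+
pos_cube <- function(u) pmax(u, 0)^3

# Design matrix with columns 1, b_1(x), b_2(x), ..., b_{K+1}(x)
natural_spline_design <- function(x, knots) {
  K <- length(knots) - 2          # number of interior knots
  stopifnot(K >= 1)
  c_last <- knots[K + 2]          # upper boundary knot c_{K+1}
  d <- function(k) {
    c_k <- knots[k + 1]           # R is 1-indexed, so c_k is knots[k + 1]
    (pos_cube(x - c_k) - pos_cube(x - c_last)) / (c_last - c_k)
  }
  B <- sapply(0:(K - 1), function(k) d(k) - d(K))
  B <- matrix(B, nrow = length(x))
  colnames(B) <- paste0("b", 2:(K + 1))
  cbind(intercept = 1, b1 = x, B)
}

# Evaluate at the integer years between the boundary knots
X <- natural_spline_design(2003:2009, knots = c(2003, 2006, 2009))
print(X)
```

Each row of X corresponds to one value of year, so the design matrix for the full data set repeats these rows according to the observed years.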

Problem 2:

Load the Wage dataset by writing library(ISLR) and attach(Wage). Use library(gam) to fit an additive model with wage as response, a polynomial for age and a cubic spline for year. Use 4 basis functions for each covariate.

