Module 7

Problem 1

We are asked for the design matrix of a natural cubic spline with \(X\) = year and one internal knot \(c_1 = 2006\). The boundary knots are the extreme values of year, that is \(c_0 = 2003\) and \(c_2 = 2009\). A general basis for a natural spline with \(K\) internal knots is \[ b_1(x_i) = x_i, \quad b_{k+2}(x_i) = d_k(x_i)-d_K(x_i),\; k = 0, \ldots, K - 1, \] \[ d_k(x_i) = \frac{(x_i-c_k)^3_+-(x_i-c_{K+1})^3_+}{c_{K+1}-c_k}, \] where \((u)^3_+\) equals \(u^3\) if \(u > 0\) and 0 otherwise.

In our case we have one internal knot, that is \(K=1\), so \(k\) takes only the value 0. The two basis functions are \[\begin{align*} b_1(x_i) &= x_i,\\ b_2(x_i) &= d_0(x_i)-d_1(x_i)\\ &= \frac{(x_i-c_0)^3_+-(x_i-c_2)^3_+}{c_2-c_0} - \frac{(x_i-c_1)^3_+-(x_i-c_2)^3_+}{c_2-c_1}\\ &= \frac{1}{c_2-c_0}(x_i-c_0)^3_+ - \frac{1}{c_2-c_1}(x_i-c_1)^3_+ + \left(\frac{1}{c_2-c_1}-\frac{1}{c_2-c_0}\right)(x_i-c_{2})^3_+\\ &= \frac{1}{6}(x_i-2003)^3_+ - \frac{1}{3}(x_i-2006)^3_+ + \frac{1}{6}(x_i-2009)^3_+. \end{align*}\]

The design matrix has entries \(\mathbf X_{ij} = b_j(x_i)\). We can simplify the second basis function further by using the fact that the boundary knots are the extreme values of \(x_i\), that is \(2003 \leq x_i \leq 2009\). Then \((x_i-2003)^3_+ = (x_i-2003)^3\) and \((x_i-2009)^3_+ = 0\) for every observation. Thus, \[ b_2(x_i) = \frac{1}{6}(x_i-2003)^3 - \frac{1}{3}(x_i-2006)^3_+. \]
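As a numerical check, the basis can be evaluated at the distinct values of year. The following is a minimal sketch, assuming the distinct years are 2003, ..., 2009; the helper names `pos_cube`, `b1` and `b2` are introduced here for illustration only and are not part of any package.

```r
# Distinct years, assumed to be 2003, ..., 2009 (the boundary knots are the extremes).
year <- 2003:2009

# Truncated cubic (u)^3_+ : u^3 where u > 0, and 0 otherwise.
pos_cube <- function(u) pmax(u, 0)^3

# Basis functions from the derivation, with knots c0 = 2003, c1 = 2006, c2 = 2009.
b1 <- function(x) x
b2 <- function(x) {
  pos_cube(x - 2003) / 6 - pos_cube(x - 2006) / 3 + pos_cube(x - 2009) / 6
}

# Design matrix with entries X[i, j] = b_j(x_i), one row per distinct year.
X <- cbind(b1 = b1(year), b2 = b2(year))
X
```

The second column runs from 0 at year 2003 to 27 at year 2009, with the value 27/6 = 4.5 at the internal knot 2006. For the full data set, the same functions would be applied to the whole year vector instead of the distinct values.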

Problem 2

Load the Wage dataset by writing library(ISLR) and attach(Wage). Use library(gam) to fit an additive model with wage as response, a polynomial for age and a cubic spline for year. Use 4 basis functions for each covariate.

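library(ISLR)  # provides the Wage data set
attach(Wage)   # makes wage, age and year available by name
library(gam)   # also loads splines, which provides bs()
# Additive model: degree-4 polynomial in age, cubic B-spline with df = 4 in year
fit <- gam(wage ~ poly(age, 4) + bs(year, df = 4))
par(mfrow = c(1, 2))  # show the two fitted terms side by side
plot(fit)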
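To confirm that each covariate contributes 4 basis functions, the two bases can be inspected on their own. This is a rough sketch that uses made-up stand-ins for age and year rather than the Wage data, so it runs with base R and the splines package only; the values of `age` and `year` below are assumptions for illustration.

```r
library(splines)  # ships with R; provides bs()

# Hypothetical stand-ins for the Wage columns, used only to inspect the bases.
age  <- seq(18, 80, length.out = 70)
year <- rep(2003:2009, times = 10)

age_basis  <- poly(age, 4)      # orthogonal polynomials of degree 1 to 4
year_basis <- bs(year, df = 4)  # cubic B-spline; df = 4 gives one interior knot

ncol(age_basis)            # number of basis functions for age
ncol(year_basis)           # number of basis functions for year
attr(year_basis, "knots")  # location of the interior knot chosen by bs()
```

Both column counts are 4. With `df = 4` and the default cubic degree, `bs()` places a single interior knot at the median of its input, which is 2006 for these stand-in years.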