Solutions - see Problem 2: https://www.math.ntnu.no/emner/TMA4268/2018v/Compulsory2solutions.html
year and one knot \(c_1 = 2006\). The boundary knots are the extreme values of year, that is \(c_0 = 2003\) and \(c_2 = 2009\). A general basis for a natural spline is \[
b_1(x_i) = x_i, \quad b_{k+2}(x_i) = d_k(x_i)-d_K(x_i),\; k = 0, \ldots, K - 1,
\] where \[
d_k(x_i) = \frac{(x_i-c_k)^3_+-(x_i-c_{K+1})^3_+}{c_{K+1}-c_k}.
\] In our case we have one internal knot, that is \(K=1\). Thus, \(k\) takes only the value 0. The two basis functions are
\[\begin{align*}
b_1(x_i) &= x_i,\\
b_2(x_i) &= d_0(x_i)-d_1(x_i)\\
&= \frac{(x_i-c_0)^3_+-(x_i-c_2)^3_+}{c_2-c_0} - \frac{(x_i-c_1)^3_+-(x_i-c_2)^3_+}{c_2-c_1}\\
&= \frac{1}{c_2-c_0}(x_i-c_0)^3_+ - \frac{1}{c_2-c_1}(x_i-c_1)^3_+ + \left(\frac{1}{c_2-c_1}-\frac{1}{c_2-c_0}\right)(x_i-c_{2})^3_+\\
&= \frac{1}{6}(x_i-2003)^3_+ - \frac{1}{3}(x_i-2006)^3_+ + \frac{1}{6}(x_i-2009)^3_+.
\end{align*}\]
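As a quick sanity check on these coefficients (an addition, not part of the original solution): a natural cubic spline must be linear beyond the boundary knots. For \(x_i > 2009\) all three truncated powers are active, and writing \(u = x_i - 2006\) gives
\[\begin{align*}
b_2(x_i) &= \frac{1}{6}(u+3)^3 - \frac{1}{3}u^3 + \frac{1}{6}(u-3)^3\\
&= \frac{1}{6}\left(2u^3 + 54u\right) - \frac{1}{3}u^3\\
&= 9u = 9(x_i - 2006),
\end{align*}\]
which is linear in \(x_i\). For \(x_i < 2003\) every truncated term vanishes, so \(b_2(x_i) = 0\) there.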
The design matrix is obtained by \(\mathbf X_{ij} = b_j(x_i)\). We can simplify the second basis function further by using the fact that the boundary knots are the extreme values of \(x_i\), that is \(2003 \leq x_i \leq 2009\). On this range \((x_i-2003)^3_+ = (x_i-2003)^3\) and \((x_i-2009)^3_+ = 0\). Thus, \[ b_2(x_i) = \frac{1}{6}(x_i-2003)^3 - \frac{1}{3}(x_i-2006)^3_+. \]
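The two basis functions can be evaluated numerically to see what the design matrix looks like. The following is a minimal sketch in base R, not the code from the original solution. The helper names pos_cube, b1, b2 and the vector year_values are made up for illustration, and the inputs are simply the seven calendar years rather than the Wage data.

```r
# Truncated cubic (x - c)^3_+ : equals (x - c)^3 when x > c, and 0 otherwise
pos_cube <- function(x, c) pmax(x - c, 0)^3

# Basis functions from the derivation, with knots c0 = 2003, c1 = 2006, c2 = 2009
b1 <- function(x) x
b2 <- function(x) {
  pos_cube(x, 2003) / 6 - pos_cube(x, 2006) / 3 + pos_cube(x, 2009) / 6
}

# Illustrative inputs (assumption): one row per calendar year
year_values <- 2003:2009

# Design matrix with entries X[i, j] = b_j(x_i)
X <- cbind(b1 = b1(year_values), b2 = b2(year_values))
print(X)
```

The b2 column starts at 0 for 2003 and reaches 27 for 2009. On these inputs the last truncated term is always zero, so the full and the simplified form of \(b_2\) give the same matrix.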
Load the Wage dataset by writing library(ISLR) and attach(Wage). Use library(gam) to fit an additive model with wage as response, a polynomial for age and a cubic spline for year. Use 4 basis functions for each covariate.
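library(ISLR)   # provides the Wage data set
attach(Wage)    # makes wage, age and year available by name
library(gam)    # provides gam(); bs() comes from the splines package
# Degree-4 polynomial in age and cubic B-spline in year, 4 basis functions each
fit = gam(wage ~ poly(age, 4) + bs(year, df = 4))
plot(fit)       # one plot of the fitted function per term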
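To see what "4 basis functions for each covariate" means for the two terms in the model formula, the bases can be built on their own and their dimensions inspected. This is a hedged sketch that uses only packages shipped with R. The age and year vectors are hypothetical stand-ins, not the Wage data, and the names P and B are chosen for illustration.

```r
library(splines)  # ships with R; provides bs()

# Hypothetical stand-in values (assumption): 50 observations
age  <- seq(18, 80, length.out = 50)
year <- rep(2003:2009, length.out = 50)

P <- poly(age, 4)       # orthogonal polynomial basis, degrees 1 to 4
B <- bs(year, df = 4)   # cubic B-spline basis with 4 degrees of freedom

ncol(P)                 # number of polynomial basis functions
ncol(B)                 # number of spline basis functions
attr(B, "knots")        # interior knot(s) chosen by bs() from quantiles of year
```

With df = 4 and the default cubic degree, bs() places a single interior knot, which is why both terms contribute four columns to the design matrix.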