(Latest changes: 08.11.2018 - typos).
(Warning: some changes may occur before the second week.)
Aim: Present methods for analysing correlated responses in a (normal/Gaussian) regression setting.
We will only consider two-level models and in particular focus on random intercept and random slope models.
Textbook: Fahrmeir et al (2013): Chapter 2.4, 7.1-7.3, 7.7. In greater detail: pages 349-354 (not “Alternative view on the random intercept model”), 356-365 (not 7.1.5 “Stochastic Covariates”“), 368-377 (not”Bayesian Covariance Matrix“”), 379-380 (not “Testing Random Effects or Variance Parameters”“, only last part on page 383), 383 (middle), 401-409 (orange juice). Note: Bayesian solutions not on the reading list.
Alternative readings: Zuur et al. (2009): “Mixed Effects Models and Extensions in Ecology with R”, chapter 5 (pages 101-142). Available as free ebook from Springer for NTNU students. More explanations and less mathematics than Fahrmeir et al (2013), more focus on understanding. Link to ebook Chapter 5
Jump to interactive (week 1)
lmer
in package lme4
Jump to interactive (week 2)
We have \(n_i\) repeated observations (\(j=1,\ldots, n_i\)) from each of \(i=1,\ldots,m\) clusters (or individuals).
The covariates \({\bf x}_{ij}\) are \(p\times 1\) vectors (as before, \(k\) covariates and one intercept so \(p=k+1\)).
We start with a simple linear regression (one fixed effect).
First, only one covariate (in addition to the intercept), observed for cluster (beach) \(i\) on occation \(j\) we have \(x_{ij}\)
\[ Y_{ij}=\beta_0+\beta_1 x_{ij}+\varepsilon_{ij} \text{ where } \varepsilon_{ij} \text{ i.i.d. } N(0,\sigma^2)\]
but, we know that \(Y_{i1}\) and \(Y_{i2}\) are observed for the same cluster and should not be independent (i.e. from same beach), to fix that we insert a random intercept.
\[Y_{ij}=\beta_0+\beta_1x_{ij}+\gamma_{0i}+\varepsilon_{ij} \text{ where } \varepsilon_{ij} \text{i.i.d.} N(0,\sigma^2)\]
The clusters are assumed to be random samples from a large population, and for the cluster deviation intercept we assume \[ \gamma_{0i}\sim N(0,\tau_0^2)\] and that the \(\gamma_{0i}\)s and the \(\varepsilon_{ij}\)s are mutually independent random variables. So, we have now two error terms.
Q: What are now our unknown parameters?
A: \({\boldsymbol \gamma}_0, \beta_1, \sigma^2, \tau_0^2\). Remark that the \(\gamma_{0i}\)s are not parameters but random variables.
(will talk about parameter estimation with ML and REML later)
Q: Try to identify \(\hat{\beta_0}, \hat{\beta_1}, \hat{\sigma}^2, \hat{\tau}_0^2\) in the print-out.
## Linear mixed model fit by REML ['lmerMod']
## Formula: Richness ~ NAP + (1 | Beach)
## Data: RIKZ
##
## REML criterion at convergence: 239.5
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -1.4227 -0.4848 -0.1576 0.2519 3.9794
##
## Random effects:
## Groups Name Variance Std.Dev.
## Beach (Intercept) 8.668 2.944
## Residual 9.362 3.060
## Number of obs: 45, groups: Beach, 9
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 6.5819 1.0958 6.007
## NAP -2.5684 0.4947 -5.192
##
## Correlation of Fixed Effects:
## (Intr)
## NAP -0.157
A: \(\hat{\beta_0}=\) 6.5818929, \(\hat{\beta_1}=\) -2.5683996 , \(\hat{\sigma}^2=\) 9.3621916, and \(\hat{\tau}_0^2=\) 8.6675181
The conditional distribution of \(Y_{ij}\) given the value of \(\gamma_{0i}\) (=regarding \(\gamma_{0i}\) as known) is \[Y_{ij}\mid \gamma_{0i} \sim N(\beta_0+\beta_1x_{ij}+\gamma_{0i},\sigma^2)\]
One motivation for inserting this new random intercept was to make sure that observations from the same cluster are dependent, but between clusters are independent. This means that we need to look at \(\text{Cov}(Y_{ij},Y_{kl})\) when \(i=k\) and when \(i\neq k\). To do that we need the (joint) marginal distribution of the responses.
What is the marginal distribution for \(Y_{ij}\)? \[ Y_{ij}=\beta_0+\beta_1x_{ij}+\gamma_{0i}+\varepsilon_{ij}\]
\[ Y_{ij} \sim N(\beta_0+\beta_1 x_{ij},\sigma^2+\tau_0^2)\]
We consider \(Y_{ij}\), and \(Y_{kl}\), where
\[ Y_{kl}=\beta_0+\beta_1x_{kl}+\gamma_{0k}+\varepsilon_{kl}\sim N(\beta_0+\beta_1 x_{kl},\sigma^2+\tau_0^2)\]
Now to the covariance between \(Y_{ij}\) and \(Y_{kl}\).
\[\text{Cov}(Y_{ij},Y_{kl})=\text{E}[(Y_{ij}-\mu_{ij})(Y_{kl}-\mu_{kl})]\]
\(\oplus\) derivations on the board in class
\[\text{Cov}(Y_{ij},Y_{kl})=\left\{\begin{array}{lr} \tau_0^2+\sigma^2=\text{Var}(Y_{ij}) & \text{for } i=k, j=l\\ \tau_0^2 & \text{for } i=k, j\neq l\\ 0 & \text{for } i\neq k, j\neq l \end{array}\right\} \]
If we put this into a covariance matrix for the vector of responses for cluster \(i\) this type of structure is called compound symmetry.
\(\oplus\) write this out on the board in class \[\text{Cov}({\bf Y}_i)=\tau_0^2 \mathbf{11}^T + \sigma^2 \mathbf{I}\]
where as before \({\bf 1}\) is a \(n_i\times 1\) vector of 1s and \(\mathbf{I}\) is a \(n_i\times n_i\) identity matrix.
The correlation between \(Y_{ij}\) and \(Y_{il}\) (two observations in the same cluster that is - same beach) is called the within subject or within cluster correlation coefficient, and is for our random intercept model
\[\text{Corr}(Y_{ij},Y_{il})=\frac{\text{Cov}(Y_{ij},Y_{il})}{\sqrt{\text{Var}(Y_{ij})\text{Var}(Y_{il})}}=\frac{\tau_0^2}{\tau_0^2+\sigma^2} \text{ for }j\neq l\]
Inserted parameter estimates this is called the intra class correlation (ICC) for the random intercept model.
## Linear mixed model fit by REML ['lmerMod']
## Formula: Richness ~ NAP + (1 | Beach)
## Data: RIKZ
##
## REML criterion at convergence: 239.5
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -1.4227 -0.4848 -0.1576 0.2519 3.9794
##
## Random effects:
## Groups Name Variance Std.Dev.
## Beach (Intercept) 8.668 2.944
## Residual 9.362 3.060
## Number of obs: 45, groups: Beach, 9
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 6.5819 1.0958 6.007
## NAP -2.5684 0.4947 -5.192
##
## Correlation of Fixed Effects:
## (Intr)
## NAP -0.157
Q: What is the ICC for our fit? Hint: \(\text{Corr}(Y_{ij},Y_{il})=\frac{\tau_0^2}{\tau_0^2+\sigma^2} \text{ for }j\neq l\).
A: 8.688/(8.688+9.362)=0.48.
We have now looked at models with a random intercept, which is a special case of linear mixed models (LMM). We have now see that we have used two components in our model:
But, what if we also need the different clusters to have different slopes? The generalization of the random intercept model is the random slope model which can be written as:
\[Y_{ij}=\beta_0+\beta_1x_{ij}+\gamma_{0i}+\gamma_{1i}x_{ij}+\varepsilon_{ij}\]
and where the new parts are
More about this model in week 2.
We will now imagine that the fixed effects can be of form \({\bf X}_i{\boldsymbol \beta}\) - as before (in previous modules - but now remember that we have \(n_i\) observations for cluster \(i\) so therefore we write \({\bf X}_i\) and not \({\bf x}_i^T\)). But it is also possible to expand the random effects into a similar form \({\bf U}_i{\boldsymbol \gamma}\).
We now define the LMM from a measurement model and distributional assumption for clusters \(i=1,\ldots, m\).
for the \(i\)th cluster:
\[{\bf Y}_i= {\bf X}_i {\boldsymbol \beta} + {\bf U}_i {\boldsymbol \gamma}_{i} + {\boldsymbol \varepsilon}_i\]
for the \(i\)th cluster:
\[{\boldsymbol \gamma}_i \sim N({\bf 0},{\bf Q})\] \[{\boldsymbol \varepsilon}_i \sim N({\bf 0},\sigma^2 {\bf I})\]
All elements of \({\boldsymbol \gamma}_1, {\boldsymbol \gamma}_2, \ldots, {\boldsymbol \gamma}_m\) and \({\boldsymbol\varepsilon}_1,{\boldsymbol\varepsilon}_2,\ldots, {\boldsymbol\varepsilon}_m\) are mutually independent. The dimension of \({\bf I}\) is \(n_i\times n_i\), and \({\bf Q}\) is \((q+1)\times (q+1)\).
Remark: one possible generalization is to assume \({\boldsymbol\varepsilon}_i \sim N({\bf 0},\Sigma)\) where \(\Sigma\) is general, but we will not look into that in our course. This can be needed for example if time series structure (like AR1) is present.
\[{\bf Y}_i= {\bf X}_i {\boldsymbol\beta} + {\bf U}_i {\boldsymbol\gamma}_{i} + {\boldsymbol\varepsilon}_i\] \[{\boldsymbol\gamma}_i \sim N({\bf 0},{\bf Q})\] \[{\boldsymbol\varepsilon}_i \sim N({\bf 0},\sigma^2 {\bf I})\]
Q:
A:
Conditional Gaussian model for the response \({\bf Y}_i\) given the random effect \({\boldsymbol\gamma}_i\): \[{\bf Y}_i\mid {\boldsymbol \gamma}_i \sim N({\bf X}_i {\boldsymbol \beta} + {\bf U}_i {\boldsymbol \gamma}_{i}, \sigma^2{\bf I})\]
which gives \[{\bf Y}_i\sim N(\mu_i={\bf X}_i {\boldsymbol \beta},{\bf V}_i=\sigma^2{\bf I}+{\bf U}_i {\bf Q} {\bf U}_i^T)\]
From the cluster specific model: \[{\bf Y}_i= {\bf X}_i {\boldsymbol \beta} + {\bf U}_i {\boldsymbol \gamma}_{i} + {\boldsymbol \varepsilon}_i\] into the global model for all clusters: \[{\bf Y}={\bf X}{\boldsymbol \beta} + {\bf U} {\boldsymbol \gamma} +{\boldsymbol \varepsilon}\] where \[ {\bf Y}=\begin{pmatrix} {\bf Y}_1\\ {\bf Y}_2\\ \vdots \\ {\bf Y}_m \end{pmatrix}, {\bf X}=\begin{pmatrix} {\bf X}_1\\ {\bf X}_2 \\ \vdots \\ {\bf X}_m \end{pmatrix}, {\bf U}=\begin{pmatrix} {\bf U}_1 & {\bf 0} & \ldots &{\bf 0}\\ {\bf 0 } & {\bf U}_2 & \ldots &{\bf 0}\\ {\bf 0 } & {\bf 0} & \ddots &{\bf 0}\\ {\bf 0 } & {\bf 0} & \ldots &{\bf U}_m\\ \end{pmatrix}, {\boldsymbol \gamma}=\begin{pmatrix} {\boldsymbol \gamma}_1\\ {\boldsymbol \gamma}_2\\ \vdots \\ {\boldsymbol \gamma}_m \end{pmatrix}, {\boldsymbol \varepsilon}=\begin{pmatrix} {\boldsymbol \varepsilon}_1\\ {\boldsymbol \varepsilon}_2 \\ \vdots \\ {\boldsymbol \varepsilon}_m \end{pmatrix} \]
Let \(N=\sum_{i=1}^m n_i\), then dimensions are:
for the response \({\bf Y}\) given the random effect \({\boldsymbol \gamma}\): \[{\bf Y}\mid {\boldsymbol \gamma} \sim N({\bf X} {\boldsymbol \beta} + {\bf U} {\boldsymbol \gamma}, \sigma^2{\bf I})\] Now \({\bf I}\) is \(N \times N\) where \(N=\sum_{i=1}^m n_i\).
Here \({\bf G}\) is a \(m\dot (q+1)\) block-diagonal matrix with \({\bf Q}\) \(m\) times on the diagonal (see below). This gives
\[{\bf Y}\sim N({\bf X} {\boldsymbol \beta},{\bf V}=\sigma^2{\bf I}+{\bf U} {\bf G} {\bf U}^T)\]
\[ {\bf G}=\begin{pmatrix} {\bf Q} & {\bf 0} & \ldots &{\bf 0}\\ {\bf 0 } & {\bf Q} & \ldots &{\bf 0}\\ {\bf 0 } & {\bf 0} & \ddots &{\bf 0}\\ {\bf 0 } & {\bf 0} & \ldots &{\bf Q}\\ \end{pmatrix} \]
For the random effects \({\boldsymbol \gamma}_i\) and \({\boldsymbol \varepsilon}_i\) we also provide predictions
Cluster specific model: \[{\bf Y}_i\sim N(\mu_i={\bf X}_i {\boldsymbol \beta},{\bf V}_i=\sigma^2{\bf I}+{\bf U}_i {\bf Q} {\bf U}_i^T)\] Global model: \[{\bf Y}\sim N({\bf X} {\boldsymbol \beta},{\bf V}=\sigma^2{\bf I}+{\bf U} {\bf G} {\bf U}^T)\]
The log-likelihood function is then (\(\pm\) some constants) \[l({\boldsymbol \beta},{\bf V})=-\frac{1}{2}\ln \lvert {\bf V}\rvert -\frac{1}{2}({\bf y}-{\bf X}{\boldsymbol \beta})^T {\bf V}^{-1}({\bf y}-{\bf X}{\boldsymbol \beta})\]
We assume that the parameters in \({\bf V}\) are known, and transform our problem with \({\bf V}^{-1/2}\) to get the standard multiple linear regression model.
In TMA4267 Exam 2014 Problem 4, the weighted least squares estimator was derived from a similar situation. Now we have
\[{\bf Y}={\bf X} {\boldsymbol \beta}+{\boldsymbol \varepsilon}^*\]
where \[{\boldsymbol \varepsilon}^{*} \sim N({\bf 0},{\bf V})\] We then define \({\bf V}^{-1/2}\) based on eigenvalue-vector decomposition of \({\bf V}\), and premultiply the above equation to get: \[{\bf V}^{-1/2}{\bf Y}={\bf V}^{-1/2}{\bf X} {\boldsymbol \beta}+{\bf V}^{-1/2}{\boldsymbol \varepsilon}^*\] \[{\bf Y}^{\bullet}={\bf X}^{\bullet} {\boldsymbol \beta}+{\boldsymbol \varepsilon}^{\bullet}\] where \({\boldsymbol \varepsilon}^{\bullet}\sim N({\bf 0}, {\bf I})\). We then know that the ML estimator for \({\boldsymbol \beta}\) is given as
\[\hat{\boldsymbol \beta}=({\bf X}^{\bullet T}{\bf X}^{\bullet})^{-1}{\bf X}^{\bullet T}{\bf Y}^{\bullet}\]
Inserting then for \({\bf X}^{\bullet T}\) and \({\bf Y}^{\bullet}\) leads to the weighted least squares solution for \({\boldsymbol \beta}\)
\[\hat{{\boldsymbol \beta}}=({\bf X}^T{\bf V}^{-1}{\bf X})^{-1}{\bf X}^T {\bf V}^{-1}{\bf Y}\]
Since we have independence between clusters, we had block-diagonal \({\bf V}\) and then \[\hat{{\boldsymbol \beta}}=(\sum_{i=1}^m {\bf X}_i^T{\bf V}_i^{-1}{\bf X}_i)^{-1} \sum_{i=1}^m {\bf X}_i^T {\bf V}_i^{-1}{\bf Y}_i\]
But, we really don’t know \({\bf V}\) so an estimate must be inserted (week 2 with REML).
## Linear mixed model fit by REML ['lmerMod']
## Formula: Richness ~ NAP + (1 | Beach)
## Data: RIKZ
##
## REML criterion at convergence: 239.5
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -1.4227 -0.4848 -0.1576 0.2519 3.9794
##
## Random effects:
## Groups Name Variance Std.Dev.
## Beach (Intercept) 8.668 2.944
## Residual 9.362 3.060
## Number of obs: 45, groups: Beach, 9
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 6.5819 1.0958 6.007
## NAP -2.5684 0.4947 -5.192
##
## Correlation of Fixed Effects:
## (Intr)
## NAP -0.157
Q: Where is the parameter estimat for effect of NAP, with standard deviation? What is the interpretation of “Correlation of Fixed Effects”?
A:“Fixed effects, NAP, Estimate and Std. Error”. Correlation of fixed effects is the off-diagonal elements of \(\text{Cov}(\hat{{\boldsymbol \beta}})\) which here is only \(\text{Cov}(\hat{\beta}_0,\hat{\beta}_1)\).
Since \(\hat{{\boldsymbol \beta}}\) is a linear function of \({\bf Y}_i\), it has a multivariate normal distribution - let us write with \[\hat{{\boldsymbol \beta}}=({\bf X}^T{\bf V}^{-1}{\bf X})^{-1}{\bf X}^T {\bf V}^{-1}{\bf Y}={\bf A} {\bf Y}\]
Mean: \[\text{E}(\hat{{\boldsymbol \beta}})={\bf A}\text{E}({\bf Y})=({\bf X}^T{\bf V}^{-1}{\bf X})^{-1}{\bf X}^T {\bf V}^{-1}{\bf X}{\boldsymbol\beta}={\boldsymbol \beta}\]
so \(\hat{{\boldsymbol \beta}}\) is unbiased.
Variance-covariance matrix: \[\text{Cov}(\hat{{\boldsymbol \beta}}={\bf A}\text{Cov}({\bf Y}){\bf A}^T={\bf A}{\bf V}{\bf A}^T\] \[=({\bf X}^T{\bf V}^{-1}{\bf X})^{-1}{\bf X}^T {\bf V}^{-1}{\bf V}{\bf V}^{-1}{\bf X}({\bf X}^T{\bf V}^{-1}{\bf X})^{-1}=({\bf X}^T{\bf V}^{-1}{\bf X})^{-1}\]
Since \({\bf V}\) is block diagonal this can be written as a sum with the clusters: \[ \text{Cov}(\hat{\boldsymbol \beta})= (\sum_{i=1}^m {\bf X}_i^T{\bf V}_i^{-1}{\bf X}_i)^{-1}\]
Also, if we insert \(\hat{\bf V}\) as an estimate for \({\bf V}\) then \[\hat{{\boldsymbol \beta}}=(\sum_{i=1}^m {\bf X}_i^T\hat{\bf V}_i^{-1}{\bf X}_i)^{-1}\sum_{i=1}^m {\bf X}_i^T \hat{\bf V}_i^{-1}{\bf Y}_i\]
is asymptotic multivariate normal (under regularity conditions).
Remark: this is only asymptotically, so we need large samples for this to hold. And, we do not arrive at a \(t\)-distribution here - however approximations for \(t\) (and F) exists, but the problem is the number of degrees of freedom (in particular when we have time varying fixed effects).
In this course we will only use the asymptotic normality for confidence intervals (and when a Wald type test is desired). We will consider hypothesis testing with likelihood ratio test under “Model selection” in the end of this module.
## 2.5 % 97.5 %
## .sig01 1.484204 5.100814
## .sigma 2.435429 3.877572
## (Intercept) 4.324560 8.824909
## NAP -3.566901 -1.599779
Q: Interpret! First the last two rows. The first two we have not covered yet - since we don’t know how to estimate the parameters in \({\bf V}\) but what do you think this is?
A: Estimate \(\pm\) 1.96 times standard error of estimate.
Plan:
We will in this exercise look at the following model:
\[Y_{ij} = \mathbf{x}_{ij}^T \pmb{\beta} + \gamma_{0i} + {\varepsilon}_{ij}, \ \varepsilon_{ij} \sim N(0, \sigma^2) \] \[\gamma_{0i} \sim N(0, \tau_0^2)\]
where \(i \in \{1, \dots, m\}\) is the group index, and \(j \in \{1, \dots, n_i\}\) is the index for repeated measurements in each group (cluster). We assume that all random variables are independent.
a) What is this model called? Discuss where and when such models are useful.
b) What is the marginal model for \(\mathbf{Y}_i = (Y_{i1}, \dots, Y_{in_i})\)? Show how you arrive at your answer and write with vectors and matrices (for cluster \(i\)).
What are the advantages of having an expression for the marginal distribution of \(\mathbf{Y}_i\) when doing estimation?
In the rest of the exercise we will look at a dataset consisting of observations of cod from the year 2000. The dataset is from Havforskningsinstitutten in Bergen, and we use only a subset of a large dataset with information about fish in Barentshavet. The following variables are available on a random sample within each catch (a draft of trawl (trål)):
length
: the length of fish (in cm)weight
: the weight of fish (in grams)age
: the age of fish (in years)haulsize
: the size of the total catch (in ton (1000 kg))Let \(i\) be the index for catch, and \(j\) be the index for an individual fish in a catch. We will not use the age variable in this exercise.
We start by looking at the model
\[ \log(\texttt{weight}_{ij}) = \beta_0 + \beta_1 \log(\texttt{length}_{ij}) + \beta_2\log(\texttt{haulsize}_i) + \gamma_{0i} + \varepsilon_{ij} \]
where \(\gamma_{0i}\sim N(0, \tau_0^2)\), \(\varepsilon_{ij} \sim N(0,\sigma^2)\), and all random effects are independent. Below is an excerpt from the output from fitting this model in R
.
library(lme4)
filepath <- "https://www.math.ntnu.no/emner/TMA4315/2018h/fishdata.dat"
fish <- read.table(filepath, header = TRUE)
fit1 <- lmer(log(weight) ~ log(length) + log(haulsize) + (1 | haul), data = fish)
summary(fit1)
## Linear mixed model fit by REML ['lmerMod']
## Formula: log(weight) ~ log(length) + log(haulsize) + (1 | haul)
## Data: fish
##
## REML criterion at convergence: -10188.2
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -4.1204 -0.6107 -0.0314 0.5813 5.1458
##
## Random effects:
## Groups Name Variance Std.Dev.
## haul (Intercept) 0.003030 0.05505
## Residual 0.008594 0.09270
## Number of obs: 5433, groups: haul, 63
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) -4.286029 0.041554 -103.143
## log(length) 2.901037 0.009703 298.984
## log(haulsize) 0.006169 0.005328 1.158
##
## Correlation of Fixed Effects:
## (Intr) lg(ln)
## log(length) -0.980
## log(haulsz) 0.047 0.056
c) Write down the estimates for all the parameters in the model. What is the correlation between two weight-variables from the same catch (haul)?
d) We are now interested in \(\theta = \exp(\beta_0 + \beta_1 \log(66) + \beta_2 \log(0.46))\). The standard deviation of \(\hat{\beta}_0 + \hat{\beta}_1 \log(66) + \hat{\beta}_2 \log(0.46)\) is 0.007. Explain how you can calculate this value (you do not have to do the calculation). Then create and calculate a 95 % confidence interval for \(\theta\).
Remark: skip (e), we will not focus so much on model selection. However, a top-down-strategy (which this exam question points to) will be explained in the end of this module page (week 2)
e) Assume that you want to compare different models with respect to which random effects and fixed effects that should be included in the model. Write down a general strategy for doing model evaluation on these kinds of models.
New: f) Below is a (horizontal version of a) catepillarplot of the random effects with confidence intervals, and a density plot for the random effects and a QQ-plot for the random effects. Explain what you see.
library(sjPlot)
plot_model(fit1, y.offset = 0.5, type = "re")
ggplot() + geom_density(aes(x = ranef(fit1)$haul[[1]])) + theme_minimal() +
labs(x = "x", y = "y", title = "Density")
gg <- plot_model(fit1, type = "diag", prnt.plot = FALSE, title = "Quantile plot",
geom.size = 1)
gg[[2]]
## $haul
The data in this problem is part of a longitudinal study of income in the US, the Panel Study of Income Dynamics, begun in 1968. The subset consists of 42 heads of household who were aged 25-39 in 1968. The variables included are
age
, age in 1968cyear
, coded as -10 in 1968, 0 in 1978 and 10 in 1988educ
, years of education in 1968sex
, M = male, F = femaleperson
, containing information on person idBelow is an excerpt from the output from fitting a linear mixed model (LMM) with the function lmer
in R
. The log income (lincm
) is the response,
\[\texttt{lincm}_{ij} = \beta_0 + \beta_1 \texttt{age}_i + \beta_2 \texttt{cyear}_{ij} + \beta_3 \texttt{educ}_i + \beta_4 \texttt{sex}_i + \gamma_{0i} + \varepsilon_{ij}\] \[i = 1, \dots, 42, \ j = 1968, 1978, 1988\]
where \(\gamma_{0i}\) represent the random effects.
## Linear mixed model fit by REML ['lmerMod']
## Formula: lincm ~ age + cyear + educ + sex + (1 | person)
## Data: psid2
##
## REML criterion at convergence: 306
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.4411 -0.4003 0.1071 0.5602 1.6038
##
## Random effects:
## Groups Name Variance Std.Dev.
## person (Intercept) 0.001757 0.04192
## Residual 0.563294 0.75053
## Number of obs: 126, groups: person, 42
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 7.386823 0.631710 11.693
## age -0.020930 0.015207 -1.376
## cyear 0.084163 0.008189 10.278
## educ 0.116343 0.027582 4.218
## sexM 1.311661 0.142247 9.221
##
## Correlation of Fixed Effects:
## (Intr) age cyear educ
## age -0.831
## cyear 0.000 0.000
## educ -0.685 0.201 0.000
## sexM 0.003 -0.217 0.000 0.041
a) Formulate the model on matrix form and explain the meaning of the different parts. State the model assumptions.
b) Determine an approximate 95 % interval for the coefficient of cyear
. Do you think the nominal income has been constant in the period covered by the survey?
e) Use the values in the R
-output to calculate the estimated covariance matrix for the response \((Y_{i1}, Y_{i2}, Y_{i3})^T\) by hand. Hint: compound symmetry due to random intercept model.
Remark Skip (c), which will be lectured in week 2.
c) Explain how one can test the simultaneous significance of two fixed effects (e.g. age
and educ
) in LMMs.
d) Describe how the random effects \(\gamma_{0i}\) can be predicted/estimated. What information is missing for you to calculate these.
In part c) and d) it was not necessary to do any numerical calculations at the exam.
f) New: do c) and d) in R
(after you have done c) and d) by hand!!!, i.e., do numerical calculations (choose \(i = 2\) in d)). The rmd-file of this module contains necessary code to download the dataset, modify it, and fit the full model. Alternatively use purl
to extract the code.
beach
-example - which we saw was a special case of linear mixed models (LMM).Words to know: measurement model, distributional model, conditional model, marginal model, global model.
Measurement model
\[{\bf Y}_i= {\bf X}_i {\boldsymbol \beta} + {\bf U}_i {\boldsymbol \gamma}_{i} + {\boldsymbol \varepsilon}_i\]
Distributional model
\[{\boldsymbol \gamma}_i \sim N({\bf 0},{\bf Q})\] \[{\boldsymbol \varepsilon}_i \sim N({\bf 0},\sigma^2 {\bf I})\] This gave the marginal model for each cluster:
\[{\bf Y}_i \sim N(\mu_i={\bf X}_i {\boldsymbol \beta},{\bf V}_i=\sigma^2 {\bf I}+{\bf U}_i {\bf Q}{\bf U}_i^T)\]
Q: explain what the different parameters and random variables are, and what are their dimensions.
A:
## Linear mixed model fit by REML ['lmerMod']
## Formula: Richness ~ NAP + (1 | Beach)
## Data: RIKZ
##
## REML criterion at convergence: 239.5
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -1.4227 -0.4848 -0.1576 0.2519 3.9794
##
## Random effects:
## Groups Name Variance Std.Dev.
## Beach (Intercept) 8.668 2.944
## Residual 9.362 3.060
## Number of obs: 45, groups: Beach, 9
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 6.5819 1.0958 6.007
## NAP -2.5684 0.4947 -5.192
##
## Correlation of Fixed Effects:
## (Intr)
## NAP -0.157
For all cluster together (even more letters now) \[{\bf Y}={\bf X}{\boldsymbol \beta} + {\bf U} {\boldsymbol \gamma} +{\boldsymbol \varepsilon}\]
\[ {\bf Y}=\begin{pmatrix} {\bf Y}_1\\ {\bf Y}_2\\ \vdots \\ {\bf Y}_m \end{pmatrix}, {\bf X}=\begin{pmatrix} {\bf X}_1\\ {\bf X}_2 \\ \vdots \\ {\bf X}_m \end{pmatrix}, {\bf U}=\begin{pmatrix} {\bf U}_1 & {\bf 0} & \ldots &{\bf 0}\\ {\bf 0 } & {\bf U}_2 & \ldots &{\bf 0}\\ {\bf 0 } & {\bf 0} & \ddots &{\bf 0}\\ {\bf 0 } & {\bf 0} & \ldots &{\bf U}_m\\ \end{pmatrix} \]
\[ {\boldsymbol \gamma}=\begin{pmatrix} {\boldsymbol \gamma}_1\\ {\boldsymbol \gamma}_2\\ \vdots \\ {\boldsymbol \gamma}_m \end{pmatrix}, {\boldsymbol \varepsilon}=\begin{pmatrix} {\boldsymbol \varepsilon}_1\\ {\boldsymbol \varepsilon}_2 \\ \vdots \\ {\boldsymbol \varepsilon}_m \end{pmatrix} \]
\[\begin{align*} {\bf Y}&= {\bf X} {\boldsymbol \beta} + {\bf U} {\boldsymbol \gamma} + {\boldsymbol \varepsilon}={\bf X}{\boldsymbol \beta} + {\boldsymbol \varepsilon}^{*}\\ {\boldsymbol \varepsilon}^{*} &= {\bf U} {\boldsymbol \gamma} + {\boldsymbol \varepsilon}\\ {\bf V}&=\text{Cov}({\boldsymbol \varepsilon}^{*})=\text{Cov}({\boldsymbol \varepsilon})+\text{Cov}({\bf U}{\boldsymbol \gamma})=\sigma^2 {\bf I}+{\bf U} {\bf G} {\bf U}^T\\ {\boldsymbol \varepsilon}^{*} &\sim N({\bf 0},{\bf V})\\ \end{align*}\]Here \({\bf G}\) is a \(m\dot (q+1)\) block-diagonal matrix with \({\bf Q}\) \(m\) times on the diagonal, which gives \[{\bf Y}\sim N({\bf X} {\boldsymbol \beta},{\bf V}=\sigma^2{\bf I}+{\bf U} {\bf G} {\bf U}^T)\]
estimated using maximum likelihood, with the marginal distribution as starting point: \[{\bf Y}\sim N({\bf X} {\boldsymbol \beta},{\bf V}=\sigma^2{\bf I}+{\bf U} {\bf G} {\bf U}^T)\]
We assume that the parameters in \({\bf V}\) are known, then we get the weighted least squares solution for \({\boldsymbol \beta}\). \[\hat{{\boldsymbol \beta}}=({\bf X}^T{\bf V}^{-1}{\bf X})^{-1}{\bf X}^T {\bf V}^{-1}{\bf Y}=(\sum_{i=1}^m {\bf X}_i^T{\bf V}_i^{-1}{\bf X}_i)^{-1} \sum_{i=1}^m {\bf X}_i^T {\bf V}_i^{-1}{\bf Y}_i\] \[\hat{{\boldsymbol \beta}}\sim N({\boldsymbol \beta}, (\sum_{i=1}^m {\bf X}_i^T{\bf V}_i^{-1}{\bf X}_i)^{-1})\]
We insert estimates for \({\bf V}_i\) (which we will find next), and the same distribution - but only asymptotically - to be used for inference for the fixed effects.
\[\hat{{\boldsymbol \beta}}=(\sum_{i=1}^m {\bf X}_i^T\hat{\bf V}_i^{-1}{\bf X}_i)^{-1}\sum_{i=1}^m {\bf X}_i^T \hat{\bf V}_i^{-1}{\bf Y}_i\]
\[ \approx N({\boldsymbol \beta}, (\sum_{i=1}^m {\bf X}_i^T\hat{\bf V}_i^{-1}{\bf X}_i)^{-1})\]
Now follows:
Then, for the random effects \({\boldsymbol \gamma}_i\) and \({\boldsymbol \varepsilon}_i\) we also provide predictions
There are two ways to explain the REML - a transformation method (we start with this), and an integration method (to come next).
To aid in our understanding we start by looking at the REML solution for multippel linear regression (Module 2), \[{\bf Y=X {\boldsymbol \beta}}+{{\boldsymbol \varepsilon}} \text{ with } {\boldsymbol \varepsilon} \sim N({\bf 0},\sigma^2 {\bf I})\] where \({\bf X}\) is a \(n\times p\) design matrix. Remember that \(\text{SSE}={\bf Y}^T({\bf I}-{\bf H}){\bf{Y}}\) where the hat matrix is \({\bf H}={\bf X}({\bf X}^T{\bf X})^{-1}{\bf X}^T\).
This is called the REML estimate for \(\sigma^2\).
Remark: There are many solutions to \({\bf A}\) but to get \(\text{E}({\bf A}^T{\bf Y})={\bf A}^T{\bf X}{\boldsymbol \beta}={\bf 0}\) then \({\bf A}\) can be chosen to have linearly independent columns orthogonal to columns space of the design matrix.
Remark: We can not choose \({\bf A}={\bf I}-{\bf H}\) since we need \({\bf A}\) to have dimension \(n \times n-p\). But we can for example choose an (othogonal) basis with \(n-p\) vectors for the column space of \({\bf I}-{\bf H}\).
Now, move to our linear mixed effects model. We have the model \[{\bf Y}={\bf X}{\boldsymbol \beta} + {\bf U} {\boldsymbol \gamma} +{\boldsymbol \varepsilon}\] with the marginal distribution \[{\bf Y}\sim N({\bf X} {\boldsymbol \beta},{\bf V}=\sigma^2{\bf I}+{\bf U} {\bf G} {\bf U}^T)\]
Even if \({\boldsymbol \beta}\) is not estimated in this optimization we already know that \[\hat{{\boldsymbol \beta}}=({\bf X}^T{\bf V}^{-1}{\bf X})^{-1}{\bf X}^T {\bf V}^{-1}{\bf Y}\] and now we have a new \(\hat{\bf V}\) which we insert in this equation, and thus get a new REML-estimator for \({\boldsymbol \beta}\): \[\hat{{\boldsymbol \beta}}=({\bf X}^T\hat{\bf V}^{-1}{\bf X})^{-1}{\bf X}^T \hat{\bf V}^{-1}{\bf Y}\]
This means, that when using REML-estimation for our linear mixed effects model this will influence both the fixed effects and the random effects parameters. However, asymptotically we will still have the same asymptotic distribution for the fixed effects as with ML estimation.
In addition the main justification for using REML is that in the absence of information on \({\boldsymbol \beta}\) then no information about the parameters in \(\vartheta\) is lost when likelihood estimation is based on \({\bf A}^T {\bf Y}\) instead of on \({\bf Y}\). In statistical inference this is referred to as \({\bf A}^T{\bf Y}\) is marginally sufficient for \(\vartheta\) (but this is way beyond the scope of this course).
In addition, according to Verbeke and Molenbergs (2000, page 46, Equation 5.8), the likelihood function of \({\bf A}^T{\bf Y}\) is \[L(\vartheta)= C \lvert \sum_{i=1}^m {\bf X}_i^T {\bf V}(\vartheta)_i^{-1} {\bf X}_i\rvert^{-1/2} L_{ML}(\hat{\boldsymbol \beta}({\vartheta}),\vartheta) \] where \(C\) is a constant not depending on \(\vartheta\), and \(L_{ML}\) is the marginal likelihood of \({\bf}\). Also, the term \(\lvert \sum_{i=1}^m {\bf X}_i^T {\bf V}(\vartheta)_i^{-1} {\bf X}_i\rvert\) does not depend on \({\boldsymbol \beta}\).
Therefore both \({\boldsymbol \beta}\) and \(\vartheta\) can be found by maximizing what is referred to as the REML likelihood function: \[L_{REML}(\vartheta,{\boldsymbol \beta})= \lvert \sum_{i=1}^m {\bf X}_i^T {\bf V}(\vartheta)_i^{-1} {\bf X}_i\rvert^{-1/2} L_{ML}({\boldsymbol \beta},\vartheta) \]
Further reading: Theoretical explanation for REML (beyond the scope of this course) by Inge Helland, UiO and also by Verbeke and Molenberghs (2000), Section 5.3 (free ebook from Springer for NTNU students).
fitREML = lmer(Richness ~ NAP + (1 | Beach), data = RIKZ)
fitML = lmer(Richness ~ NAP + (1 | Beach), data = RIKZ, REML = FALSE)
REMLest = c(fixef(fitREML), as.data.frame(VarCorr(fitREML))[, 4])
MLest = c(fixef(fitML), as.data.frame(VarCorr(fitML))[, 4])
df = data.frame(REML = REMLest, ML = MLest)
rownames(df) = c("$\\beta_0$", "$\\beta_1$", "$\\tau_0$", "$\\sigma$")
kable(df, digits = 4)
REML | ML | |
---|---|---|
\(\beta_0\) | 6.5819 | 6.5844 |
\(\beta_1\) | -2.5684 | -2.5757 |
\(\tau_0\) | 8.6675 | 7.5068 |
\(\sigma\) | 9.3622 | 9.1110 |
Q: Comment on what you see.
Remark: the default for lmer
is REML, and we need to write REML=FALSE
to get ML.
\[{\bf Y}\sim N({\bf X} {\boldsymbol \beta},{\bf V}(\vartheta)=\sigma^2{\bf I}+{\bf U} {\bf G} {\bf U}^T)\]
For the fixed effects we started with the log-likelihood function and maximized to get estimator for \({\boldsymbol \beta}\) dependent on \(\vartheta\). If we now assume that we have found \({\boldsymbol \beta}(\vartheta)\) and insert this estimate into the loglikelihood then the profile log-likelihood is (discregarding an additive constant)
\[l_{P}(\vartheta)=-\frac{1}{2}\ln \lvert {\bf V(\vartheta)}\rvert -\frac{1}{2}({\bf y}-{\bf X}\hat{\boldsymbol \beta}(\vartheta))^T {\bf V(\vartheta)}^{-1}({\bf y}-{\bf X}\hat{\boldsymbol \beta}(\vartheta))\]
The integration method (can be motivated from the Bayesian perspective by assuming a flat prior on \({\boldsymbol \beta}\)) constructs a marginal or restricted log-likelihood by integrating \({\boldsymbol \beta}\) out of the likelihood \[ l_{\text{REML}}(\vartheta)=\ln \int L({\boldsymbol \beta},\vartheta)d{\boldsymbol \beta}\]
It can be shown that the REML log-likelihood is \[l_{\text{REML}}(\vartheta)= l_{P}(\vartheta)-\frac{1}{2}\ln \lvert \sum_{i=1}^m {\bf X}_i^T {\bf V}(\vartheta)_i^{-1} {\bf X}_i\rvert \]
Maximizing of \(l_{\text{REML}}(\vartheta)\) provides the REML estimator for \(\vartheta\).
lmer
function for fitting LMMs in the lme4
-package in R
.## Linear mixed model fit by REML ['lmerMod']
## Formula: Richness ~ NAP + (1 | Beach)
## Data: RIKZ
##
## REML criterion at convergence: 239.5
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -1.4227 -0.4848 -0.1576 0.2519 3.9794
##
## Random effects:
## Groups Name Variance Std.Dev.
## Beach (Intercept) 8.668 2.944
## Residual 9.362 3.060
## Number of obs: 45, groups: Beach, 9
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 6.5819 1.0958 6.007
## NAP -2.5684 0.4947 -5.192
##
## Correlation of Fixed Effects:
## (Intr)
## NAP -0.157
Q: What have we covered so far, and what is missing? Explain the elements of the print-out!
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: Richness ~ NAP + (1 | Beach)
## Data: RIKZ
##
## AIC BIC logLik deviance df.resid
## 249.8 257.1 -120.9 241.8 41
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -1.4258 -0.5010 -0.1791 0.2452 4.0452
##
## Random effects:
## Groups Name Variance Std.Dev.
## Beach (Intercept) 7.507 2.740
## Residual 9.111 3.018
## Number of obs: 45, groups: Beach, 9
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 6.5844 1.0321 6.380
## NAP -2.5757 0.4873 -5.285
##
## Correlation of Fixed Effects:
## (Intr)
## NAP -0.164
Q: Look for differences between the REML and ML output.
Why do we want a prediction? To rank the beaches (or schools, patients?). Small-area-estimations.
Model check can also use this to check that \({\boldsymbol \gamma}\) is normal is in agreement with our fitted model (same as when using residuals to check distribution of errors).
fit = lmer(Richness ~ NAP + (1 | Beach), data = RIKZ)
plot_model(fit, type = "re")
The joint distribution of \({\bf Y}\) and \({\boldsymbol \gamma}\) is: \[\begin{pmatrix} {\bf Y}\\ {\boldsymbol \gamma} \end{pmatrix} \sim N(\begin{pmatrix} {\bf X}{\boldsymbol \beta} \\ {\bf 0}\end{pmatrix}, \begin{pmatrix}{\bf V}={\bf U}{\bf G}{\bf U}^T+\sigma^2{\bf I} & {\bf U}{\bf G}\\ {\bf G}{\bf U}^T & {\bf G}\end{pmatrix}) \]
This is maximized with respect to \({\boldsymbol \beta}\) and \({\boldsymbol \gamma}\), to give \[\hat{{\boldsymbol \gamma}}={\bf G}{\bf U}^T{\bf V}^{-1}({\bf Y}-{\bf X}\hat{{\boldsymbol \beta}})\]
But, here the elements of \({\bf G}\) and \({\bf V}\) needs to be estimated, and we get: \[\hat{{\boldsymbol \gamma}}=\hat{\bf G}{\bf U}^T\hat{\bf V}^{-1}({\bf Y}-{\bf X}\hat{{\boldsymbol \beta}})\] \[\hat{{\boldsymbol \gamma}}_i=\hat{\bf Q}{\bf U}_i^T\hat{\bf V}_i^{-1}({\bf Y}_i-{\bf X}_i\hat{{\boldsymbol \beta}})\]
Remark: For details on this calculation - involving the Henderson’s mixed model equations, see pages 371-372 of Fahrmeir et al (2013), or pages 98-99 in Verbeke and Molenberghs (2000), Section 5.3 (free ebook from Springer for NTNU students).
Remember: \[\begin{pmatrix} {\bf Y}\\ {\boldsymbol \gamma} \end{pmatrix} \sim N(\begin{pmatrix} {\bf X}{\boldsymbol \beta} \\ {\bf 0}\end{pmatrix}, \begin{pmatrix}{\bf V}={\bf U}{\bf G}{\bf U}^T+\sigma^2{\bf I} & {\bf U}{\bf G}\\ {\bf G}{\bf U}^T & {\bf G}\end{pmatrix}) \]
(Alternatively) The predicted random effects can be found as the mean of the conditional distribution of \({\boldsymbol \gamma}\) given \({\bf Y}\). If we also calculate the covariance of the estimated \({\boldsymbol \gamma}\) we can make approximate prediction intervals for the predicted random effects.
The general formula for the conditional multivariate normal \({\bf X}\) (known from TMA4267) is: \[{\bf X} \sim N(\mu,\Sigma)\] \[{\bf X}_2 \mid ({\bf X}_1={\bf x}_1) \sim N(\mu_2+\Sigma_{21}\Sigma_{11}^{-1} ({\bf x}_1-\mu_1),\Sigma_{22}-\Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12})\]
If we use the formula for the mean with \({\bf X}_1={\bf Y}\) and \({\bf X}_2={\boldsymbol \gamma}\), then \[\text{E}({\boldsymbol \gamma} \mid {\bf Y})={\bf 0}+{\bf G}{\bf U}^T{\bf V}^{-1}({\bf Y}-{\bf X}{\boldsymbol \beta})\] which can be used (inserted parameter estimates) to give our estimated random effects.
We may find the covariance matrix of \({\bf G}{\bf U}^T{\bf V}^{-1}({\bf Y}-{\bf X}\hat{{\boldsymbol \beta}})\) (directly), and for each \(\hat{{\boldsymbol \gamma}}_i\) this is given as \[ {\bf Q}{\bf U}_i^T\left( {\bf V}_i^{-1}-{\bf V}_i^{-1}{\bf X}_i (\sum_{i=1}^m {\bf X}_i^T {\bf V}_i^{-1} {\bf X}_i)^{-1} {\bf X}_i^T {\bf V}_i^{-1} \right) {\bf U}_i {\bf Q} \] (according to Verbeke and Molenberghs (2000), page 78). We insert estimates \({\bf Q}\) and \({\bf V}_i\) (thus underestimating the variability) and get the estimated covariance matrix for the random effect. Such an estimate is used in the catepillar plot.
For \(i=1, \ldots, m\): \[ {\bf Y}_i = {\bf X}_i {\boldsymbol \beta} + {\bf U}_i {\boldsymbol \gamma}_{0i}+{\boldsymbol \varepsilon}_i \] where \[ {\boldsymbol \varepsilon}_i \sim N({\bf 0},\sigma^2 {\bf I}) \text{ and } \gamma_{0i}\sim N(0,\tau_0^2)\] and \({\bf U}_i\) is a \(n_i \times 1\) vector of ones. Further, the \(n_i \times n_i\) marginal covariance matrix for \({\bf Y}_i\) is \[ {\bf V}_i = \sigma^2 {\bf I}+ \tau_0^2 {\bf 1}{\bf 1}^T \text{ with inverse } {\bf V}_i^{-1} = \frac{1}{\sigma^2} ({\bf I}- \frac{\tau_0^2}{\sigma^2+n_i \tau_0^2} {\bf 1}{\bf 1}^T)\] which means that the elements on the main diagonal for \({\bf V}^{-1}\) are \[\frac{1}{\sigma^2(\sigma^2+n_i \tau_0^2)}\] and the off-diagonal entries are \(-\tau_0^2\).
The fixed effect estimate use this inverse matrix as the weighting matrix \({\bf V}_i^{-1}\) in \[\hat{{\boldsymbol \beta}}=(\sum_{i=1}^m {\bf X}_i^T\hat{\bf V}_i^{-1}{\bf X}_i)^{-1}\sum_{i=1}^m {\bf X}_i^T \hat{\bf V}_i^{-1}{\bf Y}_i\]
The predicted random intercepts are \[\hat{\gamma}_{0i}=\hat{\bf Q}{\bf U}_i^T\hat{\bf V}_i^{-1} ({\bf Y}_i-{\bf X}_i\hat{{\boldsymbol \beta}})=\cdots =\frac{n_i \hat{\tau}_{0}^2}{\hat{\sigma}^2+n_i \hat{\tau}_{0}^2}e_i\] where \(e_i\) is the average (raw, level 0 - see below) residual \[ e_i=\frac{1}{n_i} \sum_{j=1}^{n_i} (Y_{ij}-{\bf x}_{ij}^T\hat{{\boldsymbol \beta}})\]
\[\hat{\gamma}_{0i}=\frac{n_i \hat{\tau}_{0}^2}{\hat{\sigma}^2+n_i \hat{\tau}_{0}^2}e_i\]
Remember \[\text{E}({\boldsymbol \gamma} \mid {\bf Y})={\bf 0}+{\bf G}{\bf U}^T{\bf V}^{-1}({\bf Y}-{\bf X}{\boldsymbol \beta})\]
The formula for \(\hat{\gamma}_{i0}\) can be seen as a weighted sum between the conditional expectation \(0\) and the average residual \(e_i\), with weighting factor \(\frac{n_i \hat{\tau}_{0}^2}{\hat{\sigma}^2+n_i \hat{\tau}_{0}^2}\) for the average residual (and 1-this for 0). The larger the \(n_i\) the closer the weight is to \(1\) and the smaller the shrinkage. Shrinkage is also high if the error variance \(\sigma^2\) is large compared to the random effect variance \(\tau_{0}^2\). The latter gives a very small ICC, so then it makes sense to have random effects close to \(0\).
We can also use the R package sjPlot
to produce plots and outputs from fitting linear mixed effects models with function lmer
in the R package lme4
. This plotting package can also be used to produce nice plots for lm
and glm
. The package uses ggplot2
and other tidyverse
packages.
For details see
For our beach-example: First the predicted values for the estimated random effect for each Beach - with confidence intervals. Unsorted and sorted version. (horisontal version of catepillar plot). Then QQ-plots for the estimated random effects.
plot_model(fit, type = "re", y.offset = 0.4)
plot_model(fit, type = "re", sort.est = "(Intercept)", y.offset = 0.4)
Q: comment on what you see.
Let \(\mu_i\) denote \(\text{E}({\bf Y}_i)\). Fitted values for the LMM can be made on two levels:
\[\begin{align*} \text{Level 0, marginal} &: \hat{\mu}_i={\bf X}_i \hat{{\boldsymbol \beta}}\\ & e_i={\bf Y}_i-{\bf X}_i {\boldsymbol \beta}\\ \text{Level 1, conditional} &: \tilde{\mu}_i={\bf X}_i \hat{{\boldsymbol \beta}}+{\bf U}_i\hat{{\boldsymbol \gamma}}_i\\ & e_i={\bf Y}_i-{\bf X}_i {\boldsymbol \beta}-{\bf U}_i\hat{{\boldsymbol \gamma}}_i \end{align*}\]For lmer
the function fitted
gives the level 1 fitted values (for our two-level models). This means that raw residuals can also be made on two levels, and the default is level 1 for lmer
.
In addition to raw residuals, also Pearson residuals (standardized) are popular.
The residuals can be used in the same way as for the Multiple linear model (module 2).
fit = lmer(Richness ~ NAP + (1 | Beach), data = RIKZ)
df = data.frame(fitted = fitted(fit), resid = residuals(fit, scaled = TRUE))
ggplot(df, aes(fitted, resid)) + geom_point(pch = 21) + geom_hline(yintercept = 0,
linetype = "dashed") + geom_smooth(se = FALSE, col = "red", size = 0.5,
method = "loess") + labs(x = "Fitted values", y = "Residuals", title = "Residuals vs Fitted values")
Q: any trend? homoscedastic?
ggplot(df, aes(sample = resid)) + stat_qq(pch = 19) + geom_abline(intercept = 0,
slope = 1, linetype = "dotted") + labs(x = "Theoretical quantiles", y = "Standardized residuals",
title = "Normal Q-Q")
Q: normally distributed?
In a study on the effect of sleep deprivation the average reaction time per day were measured. On day 0 the subjects had their normal amount of sleep. Starting that night they were restricted to 3 hours of sleep per night. The observations represent the average reaction time on a series of tests given each day to each subject. This was measured for 18 subjects for 10 days (days 0-9).
We observe that each subject’s reaction time increases approximately linearly with the number of sleepdeprived days. But, it appears that subjects have different slopes and intercepts.
As a first model we may assume that there is a common intercept and slope for the population - called fixed effects, but allow for random deviations for the intercept and slope for each individual. This is called a random intercept and slope model.
fm1 <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)
summary(fm1)
## Linear mixed model fit by REML ['lmerMod']
## Formula: Reaction ~ Days + (Days | Subject)
## Data: sleepstudy
##
## REML criterion at convergence: 1743.6
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.9536 -0.4634 0.0231 0.4634 5.1793
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## Subject (Intercept) 612.09 24.740
## Days 35.07 5.922 0.07
## Residual 654.94 25.592
## Number of obs: 180, groups: Subject, 18
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 251.405 6.825 36.838
## Days 10.467 1.546 6.771
##
## Correlation of Fixed Effects:
## (Intr)
## Days -0.138
Q: What are our parameter estimates and their interpretation?
A: Here the population fixed effects estimates are an intercept of 251.4 ms and a slope of 10.47 ms/day. The random effects for the intercept and the slope have estimated standard deviations 24.74 ms and 5.92 ms/day.
library(sjPlot)
plot_model(fm1, type = "re")
\[Y_{ij}=\beta_0+\beta_1x_{ij}+\gamma_{0i}+\gamma_{1i}x_{ij}+{\varepsilon}_{ij}\]
\[ {\boldsymbol \varepsilon}_i \sim N({\bf 0},\sigma^2 {\bf I})\] \[{\boldsymbol \gamma}_i =\begin{pmatrix} \gamma_{0i}\\ \gamma_{1i} \end{pmatrix} \sim N \left( \begin{pmatrix} 0\\ 0 \end{pmatrix},{\bf Q}= \begin{pmatrix} \tau_0^2 & \tau_{01}\\ \tau_{01} & \tau_1^2 \end{pmatrix} \right)\]
The parameter \(\tau_{01}\) gives the covariance beween the random intercept and random slope.
\[\text{Cov}(Y_{ij},Y_{kl})=\text{E}[(Y_{ij}-\mu_{ij})(Y_{kl}-\mu_{kl})]\]
\[\text{Cov}(Y_{ij},Y_{kl})=\left\{\begin{array}{lr} \tau_0^2+2\tau_{01}x_{ij}+\tau_1^2 x_{ij}^2+\sigma^2=\text{Var}(Y_{ij}) & \text{for } i=k, j=l\\ \tau_0^2 \tau_{01}x_{ij}+\tau_{01}x_{il}+\tau_i^2x_{ij}x_{il}& \text{for } i=k, j\neq l\\ 0 & \text{for } i\neq k, j\neq l \end{array}\right\} \]
The correlation between \(Y_{ij}\) and \(Y_{il}\) (two observations in the same cluster, that is, same beach) dependes in a complicated way on the observed values for \(x\) and is rather difficult to interpret.
\[\text{Corr}(Y_{ij},Y_{il})=\frac{\text{Cov}(Y_{ij},Y_{il})}{\sqrt{\text{Var}(Y_{ij})\text{Var}(Y_{il})}}\]
\[\hat{{\boldsymbol \beta}}=(\sum_{i=1}^m {\bf X}_i^T\hat{\bf V}_i^{-1}{\bf X}_i)^{-1}\sum_{i=1}^m {\bf X}_i^T \hat{\bf V}_i^{-1}{\bf Y}_i\approx N({\boldsymbol \beta}, (\sum_{i=1}^m {\bf X}_i^T\hat{\bf V}_i^{-1}{\bf X}_i)^{-1})\]
\[{\bf C}{\boldsymbol \beta}={\bf d} \text{ vs. } {\bf C}{\boldsymbol \beta}\neq {\bf d} \] where \({\bf C}\) is a \(r \times p\) constant matrix and \({\bf d}\) a \(r \times 1\) constant vector. Then: \[(\hat{{\boldsymbol \beta}}-{\boldsymbol \beta})^T {\bf C}^T [{\bf C}(\sum_{i=1}^m {\bf X}_i^T\hat{\bf V}_i^{-1}{\bf X}_i)^{-1} {\bf C}^T]^{-1}{\bf C}(\hat{{\boldsymbol \beta}}-{\boldsymbol \beta})\] asymptotically follows a \(\chi^2\)-distribution with \(r\)-degrees of freedom.
fit = lmer(Richness ~ NAP + (1 | Beach), data = RIKZ, REML = FALSE)
summary(fit)
z = (summary(fit)$coefficients[2, 3])
1 - pchisq(z^2, 1)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: Richness ~ NAP + (1 | Beach)
## Data: RIKZ
##
## AIC BIC logLik deviance df.resid
## 249.8 257.1 -120.9 241.8 41
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -1.4258 -0.5010 -0.1791 0.2452 4.0452
##
## Random effects:
## Groups Name Variance Std.Dev.
## Beach (Intercept) 7.507 2.740
## Residual 9.111 3.018
## Number of obs: 45, groups: Beach, 9
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 6.5844 1.0321 6.380
## NAP -2.5757 0.4873 -5.285
##
## Correlation of Fixed Effects:
## (Intr)
## NAP -0.164
## [1] 1.254807e-07
Notation:
The random effects parts of these models are assumed to be the same, while the changes are only to the fixed effects part.
The likelihood ratio statistic is defined as \[- 2\ln \lambda=-2(\ln L(\hat{{\boldsymbol \beta}}_B)-\ln L(\hat{{\boldsymbol \beta}}_A)) \] which under the null is asymptotically \(\chi^2\)-distributed with degrees of freedom equal the difference in the number of parameters in the large and the small model. Again, \(p\)-values are calculated in the upper tail of the \(\chi^2\)-distribution.
Remark: this is the log-likelihood, not the REML version.
Remark: this result is not valid if the the models are fitted using REML istead of ML. The reason for this is that the mean structure of the model fitted under the null hypothesis is not the same mean structure under the alternative hypothesis, which leads to that different matrices \({\bf A}\) must be used for the REML method. Therefore these REML log-likelihoods are based on different observations and are therefore not comparable.
fit = lmer(Richness ~ NAP + (1 | Beach), data = RIKZ, REML = FALSE)
fit0 = lmer(Richness ~ 1 + (1 | Beach), data = RIKZ, REML = FALSE)
anova(fit0, fit)
## Data: RIKZ
## Models:
## fit0: Richness ~ 1 + (1 | Beach)
## fit: Richness ~ NAP + (1 | Beach)
## Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
## fit0 3 269.30 274.72 -131.65 263.30
## fit 4 249.83 257.06 -120.92 241.83 21.474 1 3.586e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Q: Which is model A (large) and model B (small)? What do we conclude? Compare to the Wald test result (above).
In most situations the fixed effects model is of prime interest, however, a good choice of covariance structure is useful for interpreting the data and essential to be able to perform valid inference for the fixed effects.
Wald test can also be used for random effects parameters \(\vartheta\) in \({\bf Q}\) and \(\sigma^2\), and
But there is a problem: the performance of the normal approximation depends strongly on the true value of \(\vartheta\) and large samples are needed for values of \(\vartheta\) that are close to the boundary of the parameter space (for the hyptothesis tested), and when on the boundary the normal approximation fails.
We will not dive deep into this matter in this course, but report that the solution to this - both for the Wald and the likelihood ratio test (preferably using the REML log-likelihood) is to use a mixture of \(\chi^2\) distributions in these cases.
The fixed effects parts of these models are assumed to be the same, while the changes are only to the random effects part - and the changes gives nested models.
The likelihood ratio statistic is defined as \[- 2\ln \lambda=-2(\ln L(\hat{\vartheta}_B)-\ln L(\hat{\vartheta}_A)) \] and the REML-likelihood is preferred.
For testing a random intercept model vs. no random intercept (need for this random effect) then \[H_0: {\bf Q}=0 \text{ vs. } H_1: {\bf Q}=\tau_0^2\] asymptotically \(-2\ln \lambda\) is a mixture of \(\chi_1^2\) and \(\chi_0^2\) with equal weights. Here \(\chi_0^2\) is the distribution that gives probability mass 1 to the value 0.
If we instead had used the classical null distribution (\(\chi_1^2\)) then the \(p\)-values would be too large and the null hypotheses kept to often.
For testing a random intercept versus a random slope (with \({\bf Q}\) having three parameters) when \(p\)-values is found from \(0.5\) times a \(\chi^2_1\) and a \(\chi^2_2\) distribution.
Similar strategies for other situations - see Verbeke and Molenberghs (2000) pages 69-72.
There are two main strategies:
\[ \text{AIC}= -2\cdot l(\hat{{\boldsymbol \beta}},\hat{\vartheta}) +2\cdot r\] \[ \text{BIC}= -2\cdot l(\hat{{\boldsymbol \beta}},\hat{\vartheta}) +\ln(N)\cdot r\]
This can be used directly in the ML estimation, and as before BIC will give a smaller model than AIC.
\[ \text{AIC}= -2\cdot l(\hat{{\boldsymbol \beta}},\hat{\vartheta}) +2\cdot r\] \[ \text{BIC}= -2\cdot l(\hat{{\boldsymbol \beta}},\hat{\vartheta}) +\ln(N-p)\cdot r\]
Remember: \[l_{\text{REML}}(\vartheta)= l_{P}(\vartheta)-\frac{1}{2}\ln \lvert \sum_{i=1}^m {\bf X}_i^T {\bf V}(\vartheta)_i^{-1} {\bf X}_i\rvert \]
fm1 = lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)
# random slope and intercept, correlated
fm2 = lmer(Reaction ~ Days + ((1 | Subject) + (0 + Days | Subject)), data = sleepstudy)
# random slope and intercept, uncorrelated
fm3 = lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)
# random intercept
AIC(fm1, fm2, fm3) #with REML
extractAIC(fm1)
extractAIC(fm2)
extractAIC(fm3) #refit and give ML
## df AIC
## fm1 6 1755.628
## fm2 5 1753.669
## fm3 4 1794.465
## [1] 6.000 1763.939
## [1] 5.000 1762.003
## [1] 4.000 1802.079
Q: What are the three models? Which model to choose?
A:
All three models have a population intercept and a slope in Days
, but the random part of the models differ.
fm1
is the most complex with a random intercept and a random slope, and a full \(2\times 2\) matrix \({\bf Q}\)fm2
also has a random intercept and slope, but the \({\bf Q}\)-matrix is diagonal.fm3
only has a random intercept and only one element in the \({\bf Q}\)-matrix.The fm2
model gives the lowest AIC.
Start with model with all explanatory variables and possible interactions for the fixed effects - called a beyond optimal model. (Nothing is really done her, just decide on the largest possible fixed part).
With this beyond optimal fixed effects model we now focus on the random effects. The idea is that since we have many explanatory variables in the fixed effects the random component should not contain information that we would prefer to have in the fixed effect. To do this we may either use testing or AIC or BIC. Testing is problematic due to that the null hypotheses tested is on the boundary of the parameter values tested (\(\tau^2=0\)). REML must be used (to get as unbiased estimates as possible).
Now we have the optimal random effect, so we focus on the optimal fixed effect model. ML must be used because different fixed effects will give incomparable REML-log-likelihoods. Testing or AIC or BIC can be used.
The final model is then presented with REML estimates.
This example is taken from Zuur et al. (2009), pages 127-128.
We deside on fixed model with intercept, main effect of NAP
and Exposure
and the interaction thereof. (fExp
is just a slight adjustment of Exposure
by letting level 8 be 10 - just replicate what Zuur did, so fExp
only has values 10 and 11. )
Fit the fixed model from 1 to “no random effect”, “random intercept” and “random interept and slope for NAP
”. Use REML.
B1 = gls(Richness ~ 1 + NAP * fExp, data = RIKZ, method = "REML")
B2 = lmer(Richness ~ 1 + NAP * fExp + (1 | Beach), data = RIKZ)
B3 = lmer(Richness ~ 1 + NAP * fExp + (1 + NAP | Beach), data = RIKZ)
AIC(B1, B2, B3)
## df AIC
## B1 5 238.5329
## B2 6 236.4925
## B3 8 237.1331
Conclusion: choose the random intercept model (lowest AIC).
F2 = lmer(Richness ~ 1 + NAP * fExp + (1 | Beach), data = RIKZ, REML = FALSE)
confint(F2)
F2a = lmer(Richness ~ 1 + NAP + fExp + (1 | Beach), data = RIKZ, REML = FALSE)
confint(F2a)
## 2.5 % 97.5 %
## .sig01 0.0000000 3.145294
## .sigma 2.3114668 3.681773
## (Intercept) 6.9045813 10.804288
## NAP -4.7299177 -2.275315
## fExp11 -8.1969707 -2.303772
## NAP:fExp11 0.1919491 3.877650
## 2.5 % 97.5 %
## .sig01 0.000000 3.297744
## .sigma 2.435935 3.879425
## (Intercept) 6.562780 10.630777
## NAP -3.578864 -1.644351
## fExp11 -7.563508 -1.505626
F2b = lmer(Richness ~ 1 + NAP + (1 | Beach), data = RIKZ, REML = FALSE)
confint(F2b)
F2c = lmer(Richness ~ 1 + fExp + (1 | Beach), data = RIKZ, REML = FALSE)
confint(F2c)
AIC(F2, F2a, F2b, F2c)
## 2.5 % 97.5 %
## .sig01 1.484204 5.100814
## .sigma 2.435429 3.877572
## (Intercept) 4.324560 8.824909
## NAP -3.566901 -1.599779
## 2.5 % 97.5 %
## .sig01 0.000000 3.95707
## .sigma 3.178126 5.05812
## (Intercept) 5.385254 10.29475
## fExp11 -8.522119 -1.15788
## df AIC
## F2 6 242.1135
## F2a 5 244.7589
## F2b 4 249.8291
## F2c 4 265.4332
Conclusion: keep the full model. No confidence intervals cover 0, and the AIC supports the full model.
Remark: In Zuur et al (2009), page 128, the conclusion was to use the additive model (model F2a above), based on asymptotic \(p\)-values (but this was smaller than 0.05) using the nlme
package. We have instead used AIC for this selection.
summary(B2)
## Linear mixed model fit by REML ['lmerMod']
## Formula: Richness ~ 1 + NAP * fExp + (1 | Beach)
## Data: RIKZ
##
## REML criterion at convergence: 224.5
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -1.4849 -0.4161 -0.0770 0.1521 3.7313
##
## Random effects:
## Groups Name Variance Std.Dev.
## Beach (Intercept) 3.307 1.819
## Residual 8.660 2.943
## Number of obs: 45, groups: Beach, 9
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 8.8611 1.0208 8.680
## NAP -3.4637 0.6279 -5.517
## fExp11 -5.2556 1.5452 -3.401
## NAP:fExp11 2.0005 0.9461 2.114
##
## Correlation of Fixed Effects:
## (Intr) NAP fExp11
## NAP -0.181
## fExp11 -0.661 0.120
## NAP:fExp11 0.120 -0.664 -0.221
lmer
in package lme4
This is based on the article Fitting Linear Mixed-Effects Models Using lme4 by Bates, Bolker, Mächler and Walker (2015) in Journal of Statistical Software, and in particular pages 30 and onwards.
We use a data set called the ergonometrics experiment data set ergoStool
for illustration.
effort
: the effort required (socalled Borg scale) to arise from a stool (krakk) - this is our responseType
: the type of stool - types T1, T2, T3 and T4 studied.Subject
: each of nine different subjects tested the four different stools (in random order?). SubjectsWas there any clear winner among the stools, when the goal was to minimize effort?
ergoStool
data setis found in the MEMSS
package.
library(MEMSS)
summary(ergoStool)
table(ergoStool$Subject)
contrasts(ergoStool$Type) #default contrast used
## effort Type Subject
## Min. : 7.00 T1:9 A : 4
## 1st Qu.: 8.00 T2:9 B : 4
## Median :10.00 T3:9 C : 4
## Mean :10.25 T4:9 D : 4
## 3rd Qu.:12.00 E : 4
## Max. :15.00 F : 4
## (Other):12
##
## A B C D E F G H I
## 4 4 4 4 4 4 4 4 4
## T2 T3 T4
## T1 0 0 0
## T2 1 0 0
## T3 0 1 0
## T4 0 0 1
Observe that the type of stool is coded as dummy variable, with T1 as reference category.
lmer
: summary
library(lme4)
fit = lmer(effort ~ Type + (1 | Subject), data = ergoStool)
summary(fit)
## Linear mixed model fit by REML ['lmerMod']
## Formula: effort ~ Type + (1 | Subject)
## Data: ergoStool
##
## REML criterion at convergence: 121.1
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -1.80200 -0.64317 0.05783 0.70100 1.63142
##
## Random effects:
## Groups Name Variance Std.Dev.
## Subject (Intercept) 1.775 1.332
## Residual 1.211 1.100
## Number of obs: 36, groups: Subject, 9
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 8.5556 0.5760 14.853
## TypeT2 3.8889 0.5187 7.498
## TypeT3 2.2222 0.5187 4.284
## TypeT4 0.6667 0.5187 1.285
##
## Correlation of Fixed Effects:
## (Intr) TypeT2 TypeT3
## TypeT2 -0.450
## TypeT3 -0.450 0.500
## TypeT4 -0.450 0.500 0.500
The model formula gives first the fixed effects, which here is an intercept and then type of stool (with T1 as reference, so estimate difference from T1). We use a random intercept for each Subject, given as (1|Subject)
.
From the print-out from summary
we see that REML is used to fit the model, and quantiles of scaled Pearson residuals. Could also have used:
formula(fit)
REMLcrit(fit)
quantile(residuals(fit, "pearson", scaled = TRUE))
## effort ~ Type + (1 | Subject)
## [1] 121.1308
## 0% 25% 50% 75% 100%
## -1.80200345 -0.64316591 0.05783115 0.70099706 1.63142054
Then there is a part on the fitted random effects and residual variation. The intra class correlation could also be calculated from VarCorr(fit)
(an object of class VarCorr.merMod
). Observe the very high ICC of 0.6.
vc = VarCorr(fit)
print(vc, comp = "Variance")
df = as.data.frame(vc)
print(df)
print(vc)
nobs(fit)
ngrps(fit)
sigma(fit)
ICC = df[4][[1]][1]/sum(df[4][[1]])
ICC
## Groups Name Variance
## Subject (Intercept) 1.7755
## Residual 1.2106
## grp var1 var2 vcov sdcor
## 1 Subject (Intercept) <NA> 1.775463 1.332465
## 2 Residual <NA> <NA> 1.210648 1.100295
## Groups Name Std.Dev.
## Subject (Intercept) 1.3325
## Residual 1.1003
## [1] 36
## Subject
## 9
## [1] 1.100295
## [1] 0.5945736
Then to the fitted fixed effects, which is interpreted as for lm
with treatment contrast (dummy effect coding), but without any \(p\)-values, and anova
gives the analysis of variance table. See help("pvalues")
to explore your options to find \(p\)-values for testing fixed effects.
fixef(fit)
coef(summary(fit))
anova(fit)
help("pvalues")
## (Intercept) TypeT2 TypeT3 TypeT4
## 8.5555556 3.8888889 2.2222222 0.6666667
## Estimate Std. Error t value
## (Intercept) 8.5555556 0.5760123 14.853079
## TypeT2 3.8888889 0.5186838 7.497610
## TypeT3 2.2222222 0.5186838 4.284348
## TypeT4 0.6666667 0.5186838 1.285304
## Analysis of Variance Table
## Df Sum Sq Mean Sq F value
## Type 3 81.194 27.065 22.356
It is easiest to rise from the stool of Type T1, followed by Type T4 and the Type T3 and finally Type T2.
We may also get confidence intervals for the fixed effects (and random effects variances), based on the profile likelihood, which might be thought of as analogues to \(p\)-values. However, keep in mind our coding of Type (dummy), so if the intervals contain 0 the Type is not different from the reference Type T1 (the best type wrt effort to arise). The confidence intervals are made using the Wald approximation for the fixed effects. A bootstrap confidence interval can also be provided.
confint(fit)
## 2.5 % 97.5 %
## .sig01 0.7342354 2.287261
## .sigma 0.8119798 1.390104
## (Intercept) 7.4238425 9.687269
## TypeT2 2.8953043 4.882473
## TypeT3 1.2286377 3.215807
## TypeT4 -0.3269179 1.660251
Finally, there is a part on the correlation between estimated fixed effects, here we have the estimated correlation between the three levels of type of stool. We can get the variance-covariance matrix with vcov
, and can calculated correlations from that matrix.
vcov(fit)
## 4 x 4 Matrix of class "dpoMatrix"
## (Intercept) TypeT2 TypeT3 TypeT4
## (Intercept) 0.3317901 -0.1345165 -0.1345165 -0.1345165
## TypeT2 -0.1345165 0.2690329 0.1345165 0.1345165
## TypeT3 -0.1345165 0.1345165 0.2690329 0.1345165
## TypeT4 -0.1345165 0.1345165 0.1345165 0.2690329
Fitted vs. residuals and normal qq-plot (from lattice
). Not ggplot - see below for more plotting.
plot(fit, type = c("p", "smooth"))
library(lattice)
qqmath(fit, id = 0.05)
We may also use anova
to compare models. Assume that we want to compare to the (probably very bad) model where type of stool is not taken into account (which is stupic if we want to investigate the types) - so just go show (better example for sleep study).
fit0 = lmer(effort ~ 1 + (1 | Subject), data = ergoStool)
anova(fit0, fit)
## Data: ergoStool
## Models:
## fit0: effort ~ 1 + (1 | Subject)
## fit: effort ~ Type + (1 | Subject)
## Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
## fit0 3 164.15 168.90 -79.075 158.15
## fit 6 134.14 143.65 -61.072 122.14 36.006 3 7.468e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The comparison is based on the likelihood ratio test with ML (not REML), and also gives \(p\)-values.
lme4
Excerpt from Fitting Linear Mixed-Effects Models Using lme4 by Bates, Bolker, Mächler and Walker (2015) in Journal of Statistical Software page 35:
Computing p values One of the more controversial design decisions of lme4 has been to omit the output of p values associated with sequential ANOVA decompositions of fixed effects. The absence of analytical results for null distributions of parameter estimates in complex situations (e.g., unbalanced or partially crossed designs) is a long-standing problem in mixed-model inference. While the null distributions (and the sampling distributions of non-null estimates) are asymptotically normal, these distributions are not t distributed for finite size samples – nor are the corresponding null distributions of differences in scaled deviances F distributed. Thus approximate methods for computing the approximate degrees of freedom for t distributions, or the denominator degrees of freedom for F statistics (Satterthwaite 1946; Kenward and Roger 1997), are at best ad hoc solutions.
However, computing finite-size-corrected p values is sometimes necessary. Therefore, although the package does not provide them (except via parametric bootstrapping, Section 5.1), we have provided a help page to guide users in finding appropriate methods:
R> help("pvalues")
pvalues {lme4} R Documentation Getting p-values for fitted models
Description
One of the most frequently asked questions about lme4 is “how do I calculate p-values for estimated parameters?” Previous versions of lme4 provided the mcmcsamp function, which efficiently generated a Markov chain Monte Carlo sample from the posterior distribution of the parameters, assuming flat (scaled likelihood) priors. Due to difficulty in constructing a version of mcmcsamp that was reliable even in cases where the estimated random effect variances were near zero (e.g. https://stat.ethz.ch/pipermail/r-sig-mixed-models/2009q4/003115.html), mcmcsamp has been withdrawn (or more precisely, not updated to work with lme4 versions >=1.0.0).
Many users, including users of the aovlmer.fnc function from the languageR package which relies on mcmcsamp, will be deeply disappointed by this lacuna. Users who need p-values have a variety of options. In the list below, the methods marked MC provide explicit model comparisons; CI denotes confidence intervals; and P denotes parameter-level or sequential tests of all effects in a model. The starred (*) suggestions provide finite-size corrections (important when the number of groups is <50); those marked (+) support GLMMs as well as LMMs.
likelihood ratio tests via anova or drop1 (MC,+)
profile confidence intervals via profile.merMod and confint.merMod (CI,+)
parametric bootstrap confidence intervals and model comparisons via bootMer (or PBmodcomp in the pbkrtest package) (MC/CI,*,+)
for random effects, simulation tests via the RLRsim package (MC,*)
for fixed effects, F tests via Kenward-Roger approximation using KRmodcomp from the pbkrtest package (MC,*)
car::Anova and lmerTest::anova provide wrappers for Kenward-Roger-corrected tests using pbkrtest: lmerTest::anova also provides t tests via the Satterthwaite approximation (P,*)
afex::mixed is another wrapper for pbkrtest and anova providing “Type 3” tests of all effects (P,*,+)
arm::sim, or bootMer, can be used to compute confidence intervals on predictions.
For glmer models, the summary output provides p-values based on asymptotic Wald tests (P); while this is standard practice for generalized linear models, these tests make assumptions both about the shape of the log-likelihood surface and about the accuracy of a chi-squared approximation to differences in log-likelihoods.
When all else fails, don’t forget to keep p-values in perspective: http://www.phdcomics.com/comics/archive.php?comicid=905
Plan:
The data in this problem is based on four measurements of a particular bone for 20 boys. The meaurements were taken at 8, 8.5, 9 and 9.5 years of age. The variables in the dataset are
bone
: length of bone in millimetersredage
: centered age (i.e. age - 8.75)Below is an excerpt from the output from fitting a linear mixed model (LMM) with the procedure lmer
in R
where the length of the bone, bone
, is the response,
\[ \texttt{bone}_{ij} = \beta_0 + \beta_1 \texttt{redage}_{ij} + \gamma_{0i} + \gamma_{1i}\texttt{redage}_{ij} + \varepsilon_{ij} \ i = 1, \dots, 20, \ j = 1,2,3,4 \]
where \(\mathbf{{\boldsymbol \gamma}}_i = (\gamma_{0i}, \gamma_{1i})^T\), \(i = 1, \dots, 20\) represent the random effects.
filepath <- "https://www.math.ntnu.no/emner/TMA4315/2018h/bonedata.txt"
bone <- read.table(filepath, header = TRUE)
library(lme4)
fit2 <- lmer(bone ~ redage + (1 + redage | boy), data = bone)
summary(fit2)
## Linear mixed model fit by REML ['lmerMod']
## Formula: bone ~ redage + (1 + redage | boy)
## Data: bone
##
## REML criterion at convergence: 231
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.80183 -0.30917 -0.09912 0.39174 2.07317
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## boy (Intercept) 6.2267 2.4953
## redage 1.2011 1.0960 0.13
## Residual 0.1794 0.4236
## Number of obs: 80, groups: boy, 20
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 50.0713 0.5600 89.416
## redage 1.8630 0.2593 7.185
##
## Correlation of Fixed Effects:
## (Intr)
## redage 0.122
a) Formulate the model on matrix form and explain the meaning and interpretation of the different parts. State the usual model assumptions carefully.
New: explain what you see in the code and output below:
library(reshape2)
library(ggplot2)
fit2df <- data.frame(boy = paste0("boy", 1:20), low = coef(fit2)$boy[, 1] +
coef(fit2)$boy[, 2] * min(bone$redage), high = coef(fit2)$boy[, 1] + coef(fit2)$boy[,
2] * max(bone$redage))
fit2df <- melt(fit2df, id.vars = "boy", value.name = "bone", variable.name = "redage")
fit2df$redage <- (as.numeric(fit2df$redage) - 1) * (max(bone$redage) - min(bone$redage)) +
min(bone$redage)
ggplot(data = fit2df, mapping = aes(x = redage, y = bone, colour = boy)) + geom_line()
b) Describe the distribution of the response \(\mathbf{y}_i = (y_{i1}, y_{i2}, y_{i3}, y_{i4})^T\), \(i = 1, \dots, 20\), and explain how you can find numerical values using the R
-output.
c) New: derive the joint distribution of \((\pmb{\gamma}_i,\mathbf{y}_i)\)?
Find the conditional expectation of a random effect \(\pmb{\gamma}_i\), \(i = 1, \dots, 20\) given the observations, i.e. E\((\pmb{\gamma}_i|\mathbf{y}_1, \dots, \mathbf{y}_{20})\). Describe how the random effects, \(\pmb{\gamma}_i\), \(i = 1, \dots, 20\), can be predicted/estimated.
In the exam, no numerical calculations were necessary in b) and c).
d) New: do b) and c) in R
(i.e. do numerical calculations). The R markdown file of this module contains necessary code to download the dataset, and fit the full model. In addition, plot the distribution of \((\gamma_{0i}, \gamma_{1i})^T\) (this is a multivariate distribution).
The data used in this problem conserns expenses in the the social security system Medicare in US. Average expenses per hospitalization, denoted as ccpd
, were in six years recorded for 54 regions: the fifty US states, Puerto Rico, Virgin Islands, District of Columbia and an unspecified other. Thus there are \(6\times54 = 324\) observations. The expenses are treated as response. The covariates are j = YEAR
which can take values \(1, \dots, 6\) and a factor indicating the average length of stay at hospital, AVETD
, in each region and year. This factor has tree levels, 1
= six days or less, 2
= 7-9 days, 3
= 10 days or more. “Six days or less”" is the reference level and the others are denoted as AVETD2
and AVETD3
.
Below you find the output from fitting the linear mixed effects model
\[y_{ij} = \beta_0 + j \times \beta_1 + \beta_2 \texttt{AVETD2}_{ij} + \beta_3 \texttt{AVETD3}_{ij} + \gamma_{0i} + j \times \gamma_{1i} + \varepsilon_{ij} \]
\(j = 1, \dots, n_i\), $i = 1, , 54 $ and \(N=\sum_{i=1}^m n_i=324\).
filepath <- "https://www.math.ntnu.no/emner/TMA4315/2018h/medicare.dat"
medicare <- read.table(filepath, header = TRUE, colClasses = c("numeric", "numeric",
"factor", "factor"))
library(lme4)
fit1 <- lmer(ccpd ~ YEAR + AVETD + (1 + YEAR | fstate), data = medicare)
summary(fit1)
## Linear mixed model fit by REML ['lmerMod']
## Formula: ccpd ~ YEAR + AVETD + (1 + YEAR | fstate)
## Data: medicare
##
## REML criterion at convergence: 5185
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.1929 -0.6034 0.0180 0.6183 3.5134
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## fstate (Intercept) 5811948 2410.8
## YEAR 69022 262.7 0.42
## Residual 184566 429.6
## Number of obs: 324, groups: fstate, 54
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 7419.85 386.05 19.220
## YEAR 706.04 39.55 17.850
## AVETD2 567.72 183.92 3.087
## AVETD3 1008.34 244.25 4.128
##
## Correlation of Fixed Effects:
## (Intr) YEAR AVETD2
## YEAR 0.170
## AVETD2 -0.488 0.168
## AVETD3 -0.468 0.239 0.781
a) Formulate the model in matrix form and explain the what the usual assumptions are.
b) Compute a 95 % confidence interval for the fixed effect coefficient for YEAR
.
Remark: for c we have not focus on testing random effects in our course. c) Explain how we can find out if we can simplify the model by removing the random effect \(\gamma_{1i}\).
d) What is the expectation and covariate matrix in the marginal model of the response \((Y_{i1}, Y_{i2}, Y_{i3}, Y_{i4}, Y_{i5}, Y_{i6})^T\)?
e) Explain how the null hypothesis \(H_0: \beta_3 = 2\times\beta_2\) versus the alternative hypothesis \(H_1: \beta_3 \neq 2\times\beta_2\) can be tested? In this part no numerical calculations are expected.
f) New: Do c) and e) in R
. The R Markdown file of this module contains necessary code to download the dataset, and fit the full model.
[20 points]
New on the reading list this year is how to handle correlated data in a regression setting, with the aid of the linear mixed effects model. The simplest version of such a model is the random intercept model.
Write a short introduction to the random intercept linear mixed effects model and its practical usage, for a student with a good background in multiple linear regression. The introduction should include an example and emphasis should be on:
Topics you do not need to address are: REML, hypothesis testing, AIC.
# packages needed to run this module page
install.packages("lme4")
install.packages("devtools")
library(devtools)
install_github("romunov/AED")
install.packages("lattice")
install.packages("MEMSS")
install.packages("sjPlot")
install.packages("sjmisc")
install.packages("ggpubr")
install.packages("faraway")
install.packages("kableExtra")
install.packages("spcadjust")