#packages needed to run this module page
install.packages("lme4")
install.packages("devtools")
library(devtools)
install_github("romunov/AED")
install.packages("lattice")
install.packages("MEMSS")
install.packages("sjPlot")
install.packages("sjmisc")
install.packages("ggpubr")
install.packages("faraway")
install.packages("kableExtra")
install.packages("spcadjust")
Aim: Present methods for analysing correlated responses in a (normal/Gaussian) regression setting.
We will only consider two-level models and in particular focus on random intercept and random slope models.
Textbook: Fahrmeir et al (2013): Chapter 2.4, 7.1-7.3, 7.7. In greater detail: pages 349-354 (not “Alternative view on the random intercept model”), 356-365 (not 7.1.5 “Stochastic Covariates”“), 368-377 (not”Bayesian Covariance Matrix“”), 379-380 (not “Testing Random Effects or Variance Parameters”“, only last part on page 383), 383 (middle), 401-409 (orange juice). Note: Bayesian solutions not on the reading list.
Alternative readings: Zuur et al. (2009): “Mixed Effects Models and Extensions in Ecology with R”, chapter 5 (pages 101-142). Available as free ebook from Springer for NTNU students. More explanations and less mathematics than Fahrmeir et al (2013), more focus on understanding. Link to ebook Chapter 5
Classnotes from first week (30.10.2017)
Jump to interactive (week 1)
residuals: marginal and conditional
model selection
lmer
in package lme4
what have we not covered?
Jump to interactive (week 2)
This example is taken from Zuur et al. (2009, chapter 5, pages 101-142), and data are referred to as RIKZ because they were collected by a Dutch institute with that name.
Data were collected on nine different beaches in the Netherlands with the aim to investigate if there is a relationship between the
Data: 45 observations, taken at 9 beaches:
In addition also
We want to use the richness of species as the response in a regression model. This is a count, so we could have used Poisson regression, but to make things simpler we assume that the counts are such that we instead can assume a normal distribution (remember we also did that for our DOE experiment in TMA4267).
library("AED")
data(RIKZ)
summary(RIKZ)
## Sample Richness Exposure NAP
## Min. : 1 Min. : 0.000 Min. : 8.00 Min. :-1.3360
## 1st Qu.:12 1st Qu.: 3.000 1st Qu.:10.00 1st Qu.:-0.3750
## Median :23 Median : 4.000 Median :10.00 Median : 0.1670
## Mean :23 Mean : 5.689 Mean :10.22 Mean : 0.3477
## 3rd Qu.:34 3rd Qu.: 8.000 3rd Qu.:11.00 3rd Qu.: 1.1170
## Max. :45 Max. :22.000 Max. :11.00 Max. : 2.2550
## Beach
## Min. :1
## 1st Qu.:3
## Median :5
## Mean :5
## 3rd Qu.:7
## Max. :9
We first consider models with reponse Richness
and one covariate NAP
, and analyse data from each beach separately.
## 1 2 3 4 5 6
## (Intercept) 10.8218944 13.345694 3.400702 3.087716 12.782828 4.324634
## NAP -0.3718279 -4.175271 -1.755353 -1.248577 -8.900178 -1.388512
## 7 8 9
## (Intercept) 3.520626 4.951455 6.295053
## NAP -1.517613 -1.893066 -2.967530
The intercepts differ but the slopes are not that different.
Q: What if we now want to combine the regression models for the nine beaches into one model (for the population of beaches), so we can answer the question about relationship between species richness and NAP on the population level. What can we do then?
Possible solutions:
Use all 45 observations in a regression - with a common intercept and linear effect in NAP. Problem: violation of assumption of independent observations lead to wrong estimates for variances of parameter estimates.
Add beach as covariate to regression - then we estimate one regression coefficient for each beach (intercepts) in addition to the linear effect in NAP. Problem: why do we want to add 8 extra parameters to estimate (why 8?) values for the 9 beaches? Loss of power and what would we use the beach estimates for?
Add beach as a random covariate to the regression: this is called random intercept models. Problem: new stuff - slightly complicated. We do this because beaches not of interest in themselves, only random sample from population of beaches, and therefore we only need to account for beaches, not estimate separate parameters.
Solution 1: all observations together - standard errors not correct
fitall=lm(Richness~NAP,data=RIKZ)
summary(fitall)
##
## Call:
## lm(formula = Richness ~ NAP, data = RIKZ)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.0675 -2.7607 -0.8029 1.3534 13.8723
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.6857 0.6578 10.164 5.25e-13 ***
## NAP -2.8669 0.6307 -4.545 4.42e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.16 on 43 degrees of freedom
## Multiple R-squared: 0.3245, Adjusted R-squared: 0.3088
## F-statistic: 20.66 on 1 and 43 DF, p-value: 4.418e-05
—-
Solution 2: fixed effects for each beach - many estimates not so much of interest when population is in focus.
RIKZ$beachfactor=as.factor(RIKZ$Beach)
fitbeach=lm(Richness~NAP+beachfactor,data=RIKZ,contrasts=list(beachfactor="contr.sum"))
summary(fitbeach)
##
## Call:
## lm(formula = Richness ~ NAP + beachfactor, data = RIKZ, contrasts = list(beachfactor = "contr.sum"))
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.8518 -1.5188 -0.1376 0.7905 11.8384
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.5556 0.4885 13.421 2.30e-15 ***
## NAP -2.4928 0.5023 -4.963 1.79e-05 ***
## beachfactor1 3.2503 1.3554 2.398 0.0220 *
## beachfactor2 6.3284 1.2908 4.903 2.15e-05 ***
## beachfactor3 -3.1546 1.3020 -2.423 0.0207 *
## beachfactor4 -2.7826 1.2943 -2.150 0.0386 *
## beachfactor5 2.3520 1.2967 1.814 0.0783 .
## beachfactor6 -1.9728 1.2915 -1.528 0.1356
## beachfactor7 -2.1864 1.3167 -1.661 0.1057
## beachfactor8 -1.3027 1.2926 -1.008 0.3204
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.06 on 35 degrees of freedom
## Multiple R-squared: 0.7025, Adjusted R-squared: 0.626
## F-statistic: 9.183 on 9 and 35 DF, p-value: 5.645e-07
For the rest - we focus on solution 3!
We have \(n_i\) repeated observations (\(j=1,\ldots, n_i\)) from each of \(i=1,\ldots,m\) clusters (or individuals).
The covariates \({\bf x}_{ij}\) are \(p\times 1\) vectors (as before, \(k\) covariates and one intercept so \(p=k+1\)).
We start with a single linear regression (one fixed effect):
First, only one covariate (in addition to the intercept), observed for cluster (beach) \(i\) on occation \(j\) we have \(x_{ij}\)
\[ Y_{ij}=\beta_0+\beta_1 x_{ij}+\varepsilon_{ij} \text{ where } \varepsilon_{ij} \text{ i.i.d. } N(0,\sigma^2)\]
but, we know that \(Y_{i1}\) and \(Y_{i2}\) are observed for the same cluster and should not be independent (i.e. from same beach), to fix that we insert a random intercept
New: cluster-specific parameters \(\gamma_{0i}\):
\[Y_{ij}=\beta_0+\beta_1x_{ij}+\gamma_{0i}+\varepsilon_{ij} \text{ where } \varepsilon_{ij} \text{i.i.d.} N(0,\sigma^2)\]
Now, the clusters are assumed to be random samples from a large population, and the cluster deviation intercept is assumed \[ \gamma_{0i}\sim N(0,\tau_0^2)\] and that the \(\gamma_{0i}\)s and the \(\varepsilon_{ij}\)s are mutually independent random variables.
So, we have now two error terms.
Q: What are now our unknown parameters? A: \(\beta_0, \beta_1, \sigma^2, \tau_0^2\). Remark that the \(\gamma_{0i}\)s are not parameters but random variables.
(will talk about parameter estimation with ML and REML later)
## Linear mixed model fit by REML ['lmerMod']
## Formula: Richness ~ NAP + (1 | Beach)
## Data: RIKZ
##
## REML criterion at convergence: 239.5
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -1.4227 -0.4848 -0.1576 0.2519 3.9794
##
## Random effects:
## Groups Name Variance Std.Dev.
## Beach (Intercept) 8.668 2.944
## Residual 9.362 3.060
## Number of obs: 45, groups: Beach, 9
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 6.5819 1.0958 6.007
## NAP -2.5684 0.4947 -5.192
##
## Correlation of Fixed Effects:
## (Intr)
## NAP -0.157
Q: Try to identify \(\hat{\beta_0}, \hat{\beta_1}, \hat{\sigma}^2, \hat{\tau}_0^2\) in the print-out. A: \(\hat{\beta_0}=\) 6.5818929, \(\hat{\beta_1}=\) -2.5683996 , \(\hat{\sigma}^2=\) 9.3621916, and \(\hat{\tau}_0^2=\) 8.6675181
The conditional distribution of \(Y_{ij}\) given the value of \(\gamma_{0i}\) (=regarding \(\gamma_{0i}\) as known) is \[Y_{ij}\mid \gamma_{0i} \sim N(\beta_0+\beta_1x_{ij}+\gamma_{0i},\sigma^2)\]
One motivation for inserting this new random intercept was to make sure that observations from the same cluster are dependent, but between clusters are independent. This means that we need to look at \(\text{Cov}(Y_{ij},Y_{kl})\) when \(i=k\) and when \(i\neq k\). To do that we need the (joint) marginal distribution of the responses.
What is the marginal distribution for \(Y_{ij}\)? \[ Y_{ij}=\beta_0+\beta_1x_{ij}+\gamma_{0i}+\varepsilon_{ij}\]
We consider \(Y_{ij}\) and \(Y_{kl}\): \[ Y_{kl}=\beta_0+\beta_1x_{kl}+\gamma_{0k}+\varepsilon_{kl}\]
Now to the covariance between \(Y_{ij}\) and \(Y_{kl}\).
\[\text{Cov}(Y_{ij},Y_{kl})=\text{E}[(Y_{ij}-\mu_{ij})(Y_{kl}-\mu_{kl})]\]
\(\oplus\) derivations on the board in class
\[\text{Cov}(Y_{ij},Y_{kl})=\left\{\begin{array}{lr} \tau_0^2+\sigma^2=\text{Var}(Y_{ij}) & \text{for } i=k, j=l\\ \tau_0^2 & \text{for } i=k, j\neq l\\ 0 & \text{for } i\neq k, j\neq l \end{array}\right\} \]
If we put this into a covariance matrix for the vector of responses for cluster \(i\) this type of structure is called compound symmetry.
\(\oplus\) write this out on the board in class \[\text{Cov}({\bf Y}_i)=\tau_0^2 \mathbf{11}^T + \sigma^2 \mathbf{I}\]
The correlation between \(Y_{ij}\) and \(Y_{il}\) (two observations in the same cluster that is - same beach) is called the within subject or within cluster correlation coefficient, and is for our random intercept model
\[\text{Corr}(Y_{ij},Y_{il})=\frac{\text{Cov}(Y_{ij},Y_{il})}{\sqrt{\text{Var}(Y_{ij})\text{Var}(Y_{il})}}=\frac{\tau_0^2}{\tau_0^2+\sigma^2} \text{ for }j\neq l\]
Inserted parameter estimates this is called the intra class correlation (ICC) for the random intercept model.
## Linear mixed model fit by REML ['lmerMod']
## Formula: Richness ~ NAP + (1 | Beach)
## Data: RIKZ
##
## REML criterion at convergence: 239.5
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -1.4227 -0.4848 -0.1576 0.2519 3.9794
##
## Random effects:
## Groups Name Variance Std.Dev.
## Beach (Intercept) 8.668 2.944
## Residual 9.362 3.060
## Number of obs: 45, groups: Beach, 9
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 6.5819 1.0958 6.007
## NAP -2.5684 0.4947 -5.192
##
## Correlation of Fixed Effects:
## (Intr)
## NAP -0.157
Q: What is the ICC for our fit? Hint: \(\text{Corr}(Y_{ij},Y_{il})=\frac{\tau_0^2}{\tau_0^2+\sigma^2} \text{ for }j\neq l\). A: 8.688/(8.688+9.362)=0.48.
Within vs. between cluster variability: See Figure 7.2 in Regression textbook.
We have now looked at models with a random intercept, which is a special case of linear mixed models (LMM). We have now see that we have used two components in our model:
We might imagine that the fixed effects can be of form \({\bf X}_i\beta\) - as before, but it is also possible to expand the random effects into a similar form \({\bf U}_i\gamma\). We do that now, and look particularily into the random slope model in week 2.
For cluster \(i=1,\ldots, m\).
Measurement model \[{\bf Y}_i= {\bf X}_i \beta + {\bf U}_i \gamma_{i} + \varepsilon_i\]
Distributional assumptions \[\gamma_i \sim N({\bf 0},{\bf Q})\] \[\varepsilon_i \sim N({\bf 0},\sigma^2 {\bf I})\] All elements of \(\gamma_1, \gamma_2, \ldots, \gamma_m\) and \(\varepsilon_1,\varepsilon_2,\ldots, \varepsilon_m\) are mutually independent. The dimension of \({\bf I}\) is \(n_i\times n_i\), and \({\bf Q}\) is \((q+1)\times (q+1)\).
Remark: one possible generalization is to assume \(\varepsilon_i \sim N({\bf 0},\Sigma)\) where \(\Sigma\) is general, but we will not look into that in our course. This can be needed for example if time series structure (like AR1) is present.
Questions:
\[{\bf Y}_i= {\bf X}_i \beta + {\bf U}_i \gamma_{i} + \varepsilon_i\] \[\gamma_i \sim N({\bf 0},{\bf Q})\] \[\varepsilon_i \sim N({\bf 0},\sigma^2 {\bf I})\]
Q: What is \({\bf U}_i\), \(\gamma_i\) and \({\bf Q}\) for the random intercept model? A: We had \({\bf U}_i={\bf 1}\) (\(n_i\times 1\)) and scalar \(\gamma_i=\gamma_{0i}\), and \({\bf Q}=\tau_{0}^2\).
Q: General: what is the marginal distribution of \({\bf Y}_i\)? A: \({\bf Y}_i \sim N({\bf X}_i \beta,{\bf U}_i{\bf Q}{\bf U}_i^T+ \sigma^2 {\bf I})\).
Conditional Gaussian model for the response \({\bf Y}_i\) given the random effect \(\gamma_i\): \[{\bf Y}_i\mid \gamma_i \sim N({\bf X}_i \beta + {\bf U}_i \gamma_{i}, \sigma^2{\bf I})\]
Marginal Gaussian model for the response \({\bf Y}_i\) (Laird and Ware 1982 formulation) \[\begin{align*} {\bf Y}_i&= {\bf X}_i \beta + {\bf U}_i \gamma_{i} + \varepsilon_i={\bf X}_i \beta + \varepsilon^{*}_i\\ \varepsilon^{*}_i &= {\bf U}_i \gamma_{i} + \varepsilon_i\\ {\bf V}_i&=\text{Cov}(\varepsilon^{*}_i)=\text{Cov}({\bf U}_i\gamma_i)+\text{Cov}(\varepsilon_i)={\bf U}_i {\bf Q} {\bf U}_i^T+\sigma^2 {\bf I}\\ \varepsilon^{*}_i &\sim N({\bf 0},{\bf V}_i)\\ \end{align*}\]which gives \[{\bf Y}_i\sim N(\mu_i={\bf X}_i \beta,{\bf V}_i=\sigma^2{\bf I}+{\bf U}_i {\bf Q} {\bf U}_i^T)\]
From \[{\bf Y}_i= {\bf X}_i \beta + {\bf U}_i \gamma_{i} + \varepsilon_i\] into \[{\bf Y}={\bf X}\beta + {\bf U} \gamma +\varepsilon\] where \[ {\bf Y}=\begin{pmatrix} {\bf Y}_1\\ {\bf Y}_2\\ \vdots \\ {\bf Y}_m \end{pmatrix}, {\bf X}=\begin{pmatrix} {\bf X}_1\\ {\bf X}_2 \\ \vdots \\ {\bf X}_m \end{pmatrix}, {\bf U}=\begin{pmatrix} {\bf U}_1 & {\bf 0} & \ldots &{\bf 0}\\ {\bf 0 } & {\bf U}_2 & \ldots &{\bf 0}\\ {\bf 0 } & {\bf 0} & \ddots &{\bf 0}\\ {\bf 0 } & {\bf 0} & \ldots &{\bf U}_m\\ \end{pmatrix}, \gamma=\begin{pmatrix} \gamma_1\\ \gamma_2\\ \vdots \\ \gamma_m \end{pmatrix}, \varepsilon=\begin{pmatrix} \varepsilon_1\\ \varepsilon_2 \\ \vdots \\ \varepsilon_m \end{pmatrix} \] Let \(N=\sum_{i=1}^m n_i\), then dimensions are:
Conditional Gaussian model for the response \({\bf Y}\) given the random effect \(\gamma\): \[{\bf Y}\mid \gamma \sim N({\bf X} \beta + {\bf U} \gamma, \sigma^2{\bf I})\] Now \({\bf I}\) is \(N \times N\) where \(N=\sum_{i=1}^m n_i\).
Marginal Gaussian model for the response \({\bf Y}\) \[\begin{align*} {\bf Y}&= {\bf X} \beta + {\bf U} \gamma + \varepsilon={\bf X}\beta + \varepsilon^{*}\\ \varepsilon^{*} &= {\bf U} \gamma + \varepsilon\\ {\bf V}&=\text{Cov}(\varepsilon^{*})=\text{Cov}(\varepsilon)+\text{Cov}({\bf U}\gamma)=\sigma^2 {\bf I}+{\bf U} {\bf G} {\bf U}^T\\ \varepsilon^{*} &\sim N({\bf 0},{\bf V})\\ \end{align*}\]Here \({\bf G}\) is a \(m\dot (q+1)\) block-diagonal matrix with \({\bf Q}\) \(m\) times on the diagonal, which gives \[{\bf Y}\sim N({\bf X} \beta,{\bf V}=\sigma^2{\bf I}+{\bf U} {\bf G} {\bf U}^T)\]
For the random effects \(\gamma_i\) and \(\varepsilon_i\) we also provide predictions
REML, predicted random effects and residuals: week 2.
\[{\bf Y}\sim N({\bf X} \beta,{\bf V}=\sigma^2{\bf I}+{\bf U} {\bf G} {\bf U}^T)\]
The log-likelihood function is then (\(\pm\) some constants) \[l(\beta,{\bf V})=-\frac{1}{2}\ln \lvert {\bf V}\rvert -\frac{1}{2}({\bf y}-{\bf X}\beta)^T {\bf V}^{-1}({\bf y}-{\bf X}\beta)\]
We assume that the parameters in \({\bf V}\) are known, and transform our problem with \({\bf V}^{-1/2}\) to get the standard multiple linear regression model. This leads to the weighted least squares solution for \(\beta\).
\[\hat{\beta}=({\bf X}^T{\bf V}^{-1}{\bf X})^{-1}{\bf X}^T {\bf V}^{-1}{\bf Y}\]
Since we have independence between clusters, we had block-diagonal \({\bf V}\) and then \[\hat{\beta}=(\sum_{i=1}^m {\bf X}_i^T{\bf V}_i^{-1}{\bf X}_i)^{-1} \sum_{i=1}^m {\bf X}_i^T {\bf V}_i^{-1}{\bf Y}_i\]
But, we really don’t know \({\bf V}\) so an estimate must be inserted (week 2 with REML).
## Linear mixed model fit by REML ['lmerMod']
## Formula: Richness ~ NAP + (1 | Beach)
## Data: RIKZ
##
## REML criterion at convergence: 239.5
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -1.4227 -0.4848 -0.1576 0.2519 3.9794
##
## Random effects:
## Groups Name Variance Std.Dev.
## Beach (Intercept) 8.668 2.944
## Residual 9.362 3.060
## Number of obs: 45, groups: Beach, 9
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 6.5819 1.0958 6.007
## NAP -2.5684 0.4947 -5.192
##
## Correlation of Fixed Effects:
## (Intr)
## NAP -0.157
Q: Where is the parameter estimat for effect of NAP, with standard deviation? What is the interpretation of “Correlation of Fixed Effects”? A:“Fixed effects, NAP, Estimate and Std. Error”. Correlation of fixed effects is the off-diagonal elements of \(\text{Cov}(\hat{\beta})\) which here is only \(\text{Cov}(\hat{\beta}_0,\hat{\beta}_1)\).
is still asymptotic multivariate normal (under regularity conditions).
Remark: this is only asymptotically, so we need large samples for this to hold. And, we do not arrive at a \(t\)-distribution here - however approximations for \(t\) (and F) exists, but the problem is the number of degrees of freedom (in particular when we have time varying fixed effects). In this course we will only use the asymptotic normality for confidence intervals (and when a Wald type test is desired). We will consider hypothesis testing with likelihood ratio test under “Model selection” in the end of this module.
## 2.5 % 97.5 %
## .sig01 1.484204 5.100814
## .sigma 2.435429 3.877572
## (Intercept) 4.324560 8.824909
## NAP -3.566901 -1.599779
Q: Interpret! First the last two rows. The first two we have not covered yet - since we don’t know how to estimate the parameters in \({\bf V}\) but what do you think this is? A: Estimate \(\pm\) 1.96 times standard error of estimate.
Plan: work in randomly assigned groups.
We will in this exercise look at the following model:
\[Y_{ij} = \mathbf{x}_{ij}^T \pmb{\beta} + \gamma_{0i} + \varepsilon_{ij}, \ \varepsilon_{ij} \sim N(0, \sigma^2) \] \[\gamma_{0i} \sim N(0, \tau_0^2)\]
where \(i \in \{1, \dots, m\}\) is the group index, and \(j \in \{1, \dots, n_i\}\) is the index for repeated measurements in each group (cluster). We assume that all random variables are independent.
a) What is this model called? Discuss where and when such models are useful.
b) What is the marginal model for \(\mathbf{Y}_i = (Y_{i1}, \dots, Y_{in_i})\)? Show how you arrive at your answer and write with vectors and matrices (for cluster \(i\)).
New: also write down the formula for \({\bf Y}\) (all clusters together).
What are the advantages of having an expression for the marginal distribution of \(\mathbf{Y}_i\) when doing estimation?
In the rest of the exercise we will look at a dataset consisting of observations of cod from the year 2000. The dataset is from Havforskningsinstitutten in Bergen, and we use only a subset of a large dataset with information about fish in Barentshavet. The following variables are available on a random sample within each catch (a draft of trawl (trål)):
length
: the length of fish (in cm)weight
: the weight of fish (in grams)age
: the age of fish (in years)haulsize
: the size of the total catch (in ton (1000 kg))Let \(i\) be the index for catch, and \(j\) be the index for an individual fish in a catch. We will not use the age variable in this exercise, but it will probably be used in Module 7 Generalized linear mixed models.
We start by looking at the model
\[ \log(\texttt{weight}_{ij}) = \beta_0 + \beta_1 \log(\texttt{length}_{ij}) + \beta_2\log(\texttt{haulsize}_i) + \gamma_{0i} + \varepsilon_{ij} \]
where \(\gamma_{0i}\sim N(0, \tau_0^2)\), \(\varepsilon_{ij} \sim N(0,\sigma^2)\), and all random effects are independent. Below is an excerpt from the output from fitting this model in R
.
library(lme4)
filepath <- "https://www.math.ntnu.no/emner/TMA4315/2017h/fishdata.dat"
fish <- read.table(filepath, header = TRUE)
fit1 <- lmer(log(weight) ~ log(length) + log(haulsize) + (1 | haul), data = fish)
summary(fit1)
c) Write down the estimates for all the parameters in the model. What is the correlation between two weight-variables from the same catch (haul)?
d) We are now interested in \(\theta = \exp(\beta_0 + \beta_1 \log(66) + \beta_2 \log(0.46))\). The standard deviation of \(\hat{\beta}_0 + \hat{\beta}_1 \log(66) + \hat{\beta}_2 \log(0.46)\) is 0.007. Explain how you can calculate this value (you do not have to do the calculation). Then create and calculate a 95 % confidence interval for \(\theta\).
Remark: skip (e), we will not focus so much on model selection. However, a top-down-strategy (which this exam question points to) will be explained in the end of this module page (week 2)
e) Assume that you want to compare different models with respect to which random effects and fixed effects that should be included in the model. Write down a general strategy for doing model evaluation on these kinds of models.
New: f) Below is a (horizontal version of a) catepillarplot of the random effects with confidence intervals, and a density plot for the random effects and a qq-plot for the random effects. Explain what you see.
library(sjPlot)
sjp.lmer(fit1,y.offset=0.5)
ggplot() + geom_density(aes(x = ranef(fit1)$haul[[1]])) + labs(x = "x", y = "y", title = "Density")
sjp.lmer(fit1,type="re.qq")
The data in this problem is part of a longitudinal study of income in the US, the Panel Study of Income Dynamics, begun in 1968. The subset consists of 42 heads of household who were aged 25-39 in 1968. The variables included are
age
, age in 1968cyear
, coded as -10 in 1968, 0 in 1978 and 10 in 1988educ
, years of education in 1968sex
, M = male, F = femaleperson
, containing information on person idBelow is an excerpt from the output from fitting a linear mixed model (LMM) with the function lmer
in R
. The log income (lincm
) is the response,
\[\texttt{lincm}_{ij} = \beta_0 + \beta_1 \texttt{age}_i + \beta_2 \texttt{cyear}_{ij} + \beta_3 \texttt{educ}_i + \beta_4 \texttt{sex}_i + \gamma_{0i} + \varepsilon_{ij}\] \[i = 1, \dots, 42, \ j = 1968, 1978, 1988\]
where \(\gamma_{0i}\) represent the random effects.
## Linear mixed model fit by REML ['lmerMod']
## Formula: lincm ~ age + cyear + educ + sex + (1 | person)
## Data: psid2
##
## REML criterion at convergence: 306
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.4411 -0.4003 0.1071 0.5602 1.6038
##
## Random effects:
## Groups Name Variance Std.Dev.
## person (Intercept) 0.001757 0.04192
## Residual 0.563294 0.75053
## Number of obs: 126, groups: person, 42
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 7.386823 0.631710 11.693
## age -0.020930 0.015207 -1.376
## cyear 0.084163 0.008189 10.278
## educ 0.116343 0.027582 4.218
## sexM 1.311661 0.142247 9.221
##
## Correlation of Fixed Effects:
## (Intr) age cyear educ
## age -0.831
## cyear 0.000 0.000
## educ -0.685 0.201 0.000
## sexM 0.003 -0.217 0.000 0.041
a) Formulate the model on matrix form and explain the meaning of the different parts. State the model assumptions.
b) Determine an approximate 95 % interval for the coefficient of cyear
. Do you think the nominal income has been constant in the period covered by the survey?
e) Use the values in the R
-output to calculate the estimated covariance matrix for the response \((Y_{i1}, Y_{i2}, Y_{i3})^T\) by hand. Hint: compound symmetry due to random intercept model.
Remark Skip (c), which will be lectured in week 2.
c) Explain how one can test the simultaneous significance of two fixed effects (e.g. age
and educ
) in LMMs.
d) Describe how the random effects \(\gamma_{0i}\) can be predicted/estimated. What information is missing for you to calculate these.
In part c) and d) it was not necessary to do any numerical calculations at the exam.
f) New: do c) and d) in R
(after you have done c) and d) by hand!!!, i.e., do numerical calculations (choose \(i = 2\) in d)). The rmd-file of this module contains necessary code to download the dataset, modify it, and fit the full model. Alternatively use purl
to extract the code.
beach
-example - which we saw was a special case of linear mixed models (LMM).\[{\bf Y}_i= {\bf X}_i \beta + {\bf U}_i \gamma_{i} + \varepsilon_i\] \[\gamma_i \sim N({\bf 0},{\bf Q})\] \[\varepsilon_i \sim N({\bf 0},\sigma^2 {\bf I})\]
This gave the marginal model for each cluster:
\[{\bf Y}_i \sim N(\mu_i={\bf X}_i \beta,{\bf V}_i=\sigma^2 {\bf I}+{\bf U}_i {\bf Q}{\bf U}_i^T)\]
## Linear mixed model fit by REML ['lmerMod']
## Formula: Richness ~ NAP + (1 | Beach)
## Data: RIKZ
##
## REML criterion at convergence: 239.5
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -1.4227 -0.4848 -0.1576 0.2519 3.9794
##
## Random effects:
## Groups Name Variance Std.Dev.
## Beach (Intercept) 8.668 2.944
## Residual 9.362 3.060
## Number of obs: 45, groups: Beach, 9
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 6.5819 1.0958 6.007
## NAP -2.5684 0.4947 -5.192
##
## Correlation of Fixed Effects:
## (Intr)
## NAP -0.157
\[ {\bf Y}=\begin{pmatrix} {\bf Y}_1\\ {\bf Y}_2\\ \vdots \\ {\bf Y}_m \end{pmatrix}, {\bf X}=\begin{pmatrix} {\bf X}_1\\ {\bf X}_2 \\ \vdots \\ {\bf X}_m \end{pmatrix}, {\bf U}=\begin{pmatrix} {\bf U}_1 & {\bf 0} & \ldots &{\bf 0}\\ {\bf 0 } & {\bf U}_2 & \ldots &{\bf 0}\\ {\bf 0 } & {\bf 0} & \ddots &{\bf 0}\\ {\bf 0 } & {\bf 0} & \ldots &{\bf U}_m\\ \end{pmatrix}, \gamma=\begin{pmatrix} \gamma_1\\ \gamma_2\\ \vdots \\ \gamma_m \end{pmatrix}, \varepsilon=\begin{pmatrix} \varepsilon_1\\ \varepsilon_2 \\ \vdots \\ \varepsilon_m \end{pmatrix} \]
\[\begin{align*} {\bf Y}&= {\bf X} \beta + {\bf U} \gamma + \varepsilon={\bf X}\beta + \varepsilon^{*}\\ \varepsilon^{*} &= {\bf U} \gamma + \varepsilon\\ {\bf V}&=\text{Cov}(\varepsilon^{*})=\text{Cov}(\varepsilon)+\text{Cov}({\bf U}\gamma)=\sigma^2 {\bf I}+{\bf U} {\bf G} {\bf U}^T\\ \varepsilon^{*} &\sim N({\bf 0},{\bf V})\\ \end{align*}\]Here \({\bf G}\) is a \(m\dot (q+1)\) block-diagonal matrix with \({\bf Q}\) \(m\) times on the diagonal, which gives \[{\bf Y}\sim N({\bf X} \beta,{\bf V}=\sigma^2{\bf I}+{\bf U} {\bf G} {\bf U}^T)\]
Fixed effects \(\beta\): estimated using maximum likelihood, with the marginal distribution as starting point: \[{\bf Y}\sim N({\bf X} \beta,{\bf V}=\sigma^2{\bf I}+{\bf U} {\bf G} {\bf U}^T)\]
We assume that the parameters in \({\bf V}\) are known, then we get the weighted least squares solution for \(\beta\). \[\hat{\beta}=({\bf X}^T{\bf V}^{-1}{\bf X})^{-1}{\bf X}^T {\bf V}^{-1}{\bf Y}=(\sum_{i=1}^m {\bf X}_i^T{\bf V}_i^{-1}{\bf X}_i)^{-1} \sum_{i=1}^m {\bf X}_i^T {\bf V}_i^{-1}{\bf Y}_i\] \[\hat{\beta}\sim N(\beta, (\sum_{i=1}^m {\bf X}_i^T{\bf V}_i^{-1}{\bf X}_i)^{-1})\]
We insert estimates for \({\bf V}_i\) (which we will find next), and the same distribution - but only asymptotically - to be used for inference for the fixed effects. \[\hat{\beta}=(\sum_{i=1}^m {\bf X}_i^T\hat{\bf V}_i^{-1}{\bf X}_i)^{-1}\sum_{i=1}^m {\bf X}_i^T \hat{\bf V}_i^{-1}{\bf Y}_i\approx N(\beta, \sum_{i=1}^m {\bf X}_i^T\hat{\bf V}_i^{-1}{\bf X}_i)^{-1})\]
Now follows:
For the random effects \(\gamma_i\) and \(\varepsilon_i\) we also provide predictions
There are two ways to explain the REML - a transformation method (we start with this), and an integration method (to come next).
To aid in our understanding we start by looking at the REML solution for multippel linear regression (Module 2), \[{\bf Y=X \beta}+{\varepsilon} \text{ with } \varepsilon \sim N({\bf 0},\sigma^2 {\bf I})\] where \({\bf X}\) is a \(n\times p\) design matrix. Remember that \(\text{SSE}={\bf Y}^T({\bf I}-{\bf H}){\bf{Y}}\) where the hat matrix is \({\bf H}={\bf X}({\bf X}^T{\bf X})^{-1}{\bf X}^T\).
This is called the REML estimate for \(\sigma^2\).
Remark: There are many solutions to \({\bf A}\) but to get \(\text{E}({\bf A}^T{\bf Y})={\bf A}^T{\bf X}\beta={\bf 0}\) then \({\bf A}\) need to be chosen have linearly independent columns orthogonal to columns space of the design matrix.
Now, move to our linear mixed effects model. We have the model \[{\bf Y}={\bf X}\beta + {\bf U} \gamma +\varepsilon\] with the marginal distribution \[{\bf Y}\sim N({\bf X} \beta,{\bf V}=\sigma^2{\bf I}+{\bf U} {\bf G} {\bf U}^T)\]
Even if \(\beta\) is not estimated in this optimization we already know that \[\hat{\beta}=({\bf X}^T{\bf V}^{-1}{\bf X})^{-1}{\bf X}^T {\bf V}^{-1}{\bf Y}\] and now we have a new \(\hat{\bf V}\) which we insert in this equation, and thus get a new REML-estimator for \(\beta\): \[\hat{\beta}=({\bf X}^T\hat{\bf V}^{-1}{\bf X})^{-1}{\bf X}^T \hat{\bf V}^{-1}{\bf Y}\]
This means, that when using REML-estimation for our linear mixed effects model this will influence both the fixed effects and the random effects parameters. However, asymptotically we will still have the same asymptotic distribution for the fixed effects as with ML estimation.
In addition the main justification for using REML is that in the absence of information on \(\beta\) then no information about the parameters in \(\vartheta\) is lost when likelihood estimation is based on \({\bf A}^T {\bf Y}\) instead of on \({\bf Y}\). In statistical inference this is referred to as \({\bf A}^T{\bf Y}\) is marginally sufficient for \(\vartheta\) (but this is way beyond the scope of this course).
Further reading: Theoretical explanation for REML (beyond the scope of this course) by Inge Helland, UiO and also by Verbeke and Molenberghs (2000), Section 5.3 (free ebook from Springer for NTNU students).
fitREML=lmer(Richness~NAP +(1|Beach),data=RIKZ)
fitML=lmer(Richness~NAP +(1|Beach),data=RIKZ,REML=FALSE)
REMLest=c(fixef(fitREML),as.data.frame(VarCorr(fitREML))[,4])
MLest=c(fixef(fitML),as.data.frame(VarCorr(fitML))[,4])
df=data.frame("REML"=REMLest,"ML"=MLest)
rownames(df)=c("$\\beta_0$","$\\beta_1$","$\\tau_0$","$\\sigma$")
kable(df, digits = 4)
REML | ML | |
---|---|---|
\(\beta_0\) | 6.5819 | 6.5844 |
\(\beta_1\) | -2.5684 | -2.5757 |
\(\tau_0\) | 8.6675 | 7.5068 |
\(\sigma\) | 9.3622 | 9.1110 |
Q: Comment on what you see.
\[{\bf Y}\sim N({\bf X} \beta,{\bf V}(\vartheta)=\sigma^2{\bf I}+{\bf U} {\bf G} {\bf U}^T)\]
For the fixed effects we started log-likelihood function and maximized to get estimator for \(\beta\) dependent on \(\vartheta\). If we now assume that we have found \(\beta(\vartheta)\) then the profile log-likelihood is
\[l_{P}(\vartheta)=-\frac{1}{2}\ln \lvert {\bf V(\vartheta)}\rvert -\frac{1}{2}({\bf y}-{\bf X}\beta(\vartheta))^T {\bf V(\vartheta)}^{-1}({\bf y}-{\bf X}\beta(\vartheta))\]
The integration method (can be motivated from the Bayesian perspective by assuming a flat prior on \(\beta\)) constructs a marginal or restricted log-likelihood by integrating \(\beta\) out of the likelihood \[ l_{\text{REML}}(\vartheta)=\ln \int L(\beta,\vartheta)d\beta\]
It can be shown that the REML log-likelihood is \[l_{\text{REML}}(\vartheta)= l_{P}(\vartheta)-\frac{1}{2}\ln \lvert \sum_{i=1}^m {\bf X}_i^T {\bf V}(\vartheta)_i^{-1} {\bf X}_i\rvert \]
Maximizing of \(l_{\text{REML}}(\vartheta)\) provides the REML estimator for \(\vartheta\).
What do you need to know about REML?
lmer
function for fitting LMMs in the lme4
-package in R
.## Linear mixed model fit by REML ['lmerMod']
## Formula: Richness ~ NAP + (1 | Beach)
## Data: RIKZ
##
## REML criterion at convergence: 239.5
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -1.4227 -0.4848 -0.1576 0.2519 3.9794
##
## Random effects:
## Groups Name Variance Std.Dev.
## Beach (Intercept) 8.668 2.944
## Residual 9.362 3.060
## Number of obs: 45, groups: Beach, 9
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 6.5819 1.0958 6.007
## NAP -2.5684 0.4947 -5.192
##
## Correlation of Fixed Effects:
## (Intr)
## NAP -0.157
Q: What have we covered so far, and what is missing? Explain the elements of the print-out!
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: Richness ~ NAP + (1 | Beach)
## Data: RIKZ
##
## AIC BIC logLik deviance df.resid
## 249.8 257.1 -120.9 241.8 41
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -1.4258 -0.5010 -0.1791 0.2452 4.0452
##
## Random effects:
## Groups Name Variance Std.Dev.
## Beach (Intercept) 7.507 2.740
## Residual 9.111 3.018
## Number of obs: 45, groups: Beach, 9
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 6.5844 1.0321 6.380
## NAP -2.5757 0.4873 -5.285
##
## Correlation of Fixed Effects:
## (Intr)
## NAP -0.164
Q: Look for differences between the REML and ML output.
Why do we want a prediction? To rank the beaches (or schools, patients?). Small-area-estimations.
Model check can also use this to check that \(\gamma\) is normal is in agreement with our fitted model (same as when using residuals to check distribution of errors).
Best Linear Unbiased Predictor (BLUP): \(\hat{\gamma_i}\)
library(spcadjust)
data(RIKZ)
library(lme4)
fit=lmer(Richness~NAP +(1|Beach),data=RIKZ)
plot_model(fit, type="slope")
The joint distribution of \({\bf Y}\) and \(\gamma\) is: \[\begin{pmatrix} {\bf Y}\\ \gamma \end{pmatrix} \sim N(\begin{pmatrix} {\bf X}\beta \\ {\bf 0}\end{pmatrix}, \begin{pmatrix}{\bf V}={\bf U}{\bf G}{\bf U}^T+\sigma^2{\bf I} & {\bf U}{\bf G}\\ {\bf G}{\bf U}^T & {\bf G}\end{pmatrix}) \]
This is maximized with respect to \(\beta\) and \(\gamma\), to give \[\hat{\gamma}={\bf G}{\bf U}^T{\bf V}^{-1}({\bf Y}-{\bf X}\hat{\beta})\]
But, here the elements of \({\bf G}\) and \({\bf V}\) needs to be estimated, and we get: \[\hat{\gamma}=\hat{\bf G}{\bf U}^T\hat{\bf V}^{-1}({\bf Y}-{\bf X}\hat{\beta})\] \[\hat{\gamma}_i=\hat{\bf Q}{\bf U}_i^T\hat{\bf V}_i^{-1}({\bf Y}_i-{\bf X}_i\hat{\beta})\]
Remark: For details on this calculation - involving the Henderson’s mixed model equations, see pages 371-372 of Fahrmeir et al (2013), or pages 98-99 in Verbeke and Molenberghs (2000), Section 5.3 (free ebook from Springer for NTNU students).
Conditional mean:
This can also be found as the mean of the conditional distribution of \(\gamma\) given \({\bf Y}\). If we also calculate the covariance of the estimated \(\gamma\) we can make approximate prediction intervals for the predicted random effects.
The general formula for the conditional multivariate normal \({\bf X}\) (known from TMA4267) is: \[{\bf X} \sim N(\mu,\Sigma)\] \[{\bf X}_2 \mid ({\bf X}_1={\bf x}_1) \sim N(\mu_2+\Sigma_{21}\Sigma_{11}^{-1} ({\bf x}_1-\mu_1),\Sigma_{22}-\Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12})\]
If we use the formula for the mean with \({\bf X}_1={\bf Y}\) and \({\bf X}_2=\gamma\), then \[\text{E}(\gamma \mid {\bf Y})={\bf 0}+{\bf G}{\bf U}^T{\bf V}^{-1}({\bf Y}-{\bf X}\beta)\] which can be used (inserted parameter estimates) to give our estimated random effects.
We may find the covariance matrix of \({\bf G}{\bf U}^T{\bf V}^{-1}({\bf Y}-{\bf X}\hat{\beta})\) (directly), and for each \(\hat{\gamma}_i\) this is given as \[ {\bf Q}{\bf U}_i^T\left( {\bf V}_i^{-1}-{\bf V}_i^{-1}{\bf X}_i (\sum_{i=1}^m {\bf X}_i^T {\bf V}_i^{-1} {\bf X}_i)^{-1} {\bf X}_i^T {\bf V}_i^{-1} \right) {\bf U}_i {\bf Q} \] (according to Verbeke and Molenberghs (2000), page 78). We insert estimates \({\bf Q}\) and \({\bf V}_i\) (thus underestimating the variability) and get the estimated covariance matrix for the random effect. Such an estimate is used in the catepillar plot below.
For \(i=1, \ldots, m\): \[ {\bf Y}_i = {\bf X}_i \beta + {\bf U}_i \gamma_{0i}+\varepsilon_i \] where \[ \varepsilon_i \sim N({\bf 0},\sigma^2 {\bf I}) \text{ and } \gamma_{0i}\sim N(0,\tau_0^2)\] and \({\bf U}_i\) is a \(n_i \times 1\) vector of ones. Further, the \(n_i \times n_i\) marginal covariance matrix for \({\bf Y}_i\) is \[ {\bf V}_i = \sigma^2 {\bf I}+ \tau_0^2 {\bf 1}{\bf 1}^T \text{ with inverse } {\bf V}_i^{-1} = \frac{1}{\sigma^2} ({\bf I}- \frac{\tau_0^2}{\sigma^2+n_i \tau_0^2} {\bf 1}{\bf 1}^T)\] which means that the elements on the main diagonal for \({\bf V}^{-1}\) are \[\frac{1}{\sigma^2(\sigma^2+n_i \tau_0^2)}\] and the off-diagonal entries are \(-\tau_0^2\).
The fixed effect estimate use this inverse matrix as the weighting matrix \({\bf V}_i^{-1}\) in \[\hat{\beta}=(\sum_{i=1}^m {\bf X}_i^T\hat{\bf V}_i^{-1}{\bf X}_i)^{-1}\sum_{i=1}^m {\bf X}_i^T \hat{\bf V}_i^{-1}{\bf Y}_i\]
The predicted random intercepts are \[\hat{\gamma}_{0i}=\hat{\bf Q}{\bf U}_i^T\hat{\bf V}_i^{-1} ({\bf Y}_i-{\bf X}_i\hat{\beta})=\cdots =\frac{n_i \hat{\tau}_{0}^2}{\hat{\sigma}^2+n_i \hat{\tau}_{0}^2}e_i\] where \(e_i\) is the average (raw, level 0 - see below) residual \[ e_i=\frac{1}{n_i} \sum_{j=1}^{n_i} (Y_{ij}-{\bf x}_{ij}^T\hat{\beta})\]
Interpretation:
\[\hat{\gamma}_{0i}=\frac{n_i \hat{\tau}_{0}^2}{\hat{\sigma}^2+n_i \hat{\tau}_{0}^2}e_i\]
Remember \[\text{E}(\gamma \mid {\bf Y})={\bf 0}+{\bf G}{\bf U}^T{\bf V}^{-1}({\bf Y}-{\bf X}\beta)\]
The formula for \(\hat{\gamma}_{i0}\) can be seen as a weighted sum between the conditional expectation \(0\) and the average residual \(e_i\), with weighting factor \(\frac{n_i \hat{\tau}_{0}^2}{\hat{\sigma}^2+n_i \hat{\tau}_{0}^2}\) for the average residual (and 1-this for 0). The larger the \(n_i\) the closer the weight is to \(1\) and the smaller the shrinkage. Shrinkage is also high if the error variance \(\sigma^2\) is large compared to the random effect variance \(\tau_{0}^2\). The latter gives a very small ICC, so then it makes sense to have random effects close to \(0\).
We can also use the R package sjPlot
to produce plots and outputs from fitting linear mixed effects models with function lmer
in the R package lme4
. This plotting package can also be used to produce nice plots for lm
and glm
. The package uses ggplot2
and other tidyverse
packages.
For details see
For our beach-example: First the predicted values for the estimated random effect for each Beach - with confidence intervals. Unsorted and sorted version. (horisontal version of catepillar plot). Then QQ-plots for the estimated random effects.
plot_model(fit,type="re",y.offset=0.4)
plot_model(fit,type="re",sort.est="(Intercept)",y.offset=0.4)
Q: comment on what you see.
Q: which assumption can be assessed with this plot, and what is the conclusion?
Let \(\mu_i\) denote \(\text{E}({\bf Y}_i)\). Fitted values for the LMM can be made on two levels:
\[\begin{align*} \text{Level 0, marginal} &: \hat{\mu}_i={\bf X}_i \hat{\beta}\\ \text{Level 1, conditional} &: \tilde{\mu}_i={\bf X}_i \hat{\beta}+{\bf U}_i \hat{\gamma}_i \end{align*}\]For lmer
the function fitted
gives the level 1 fitted values (for our two-level models). This means that raw residuals can also be made on two levels, and the default is level 1 for lmer
.
In addition to raw residuals, also Pearson residuals (standardized) are popular.
The residuals can be used in the same way as for the Multiple linear model (module 2).
fit=lmer(Richness~NAP +(1|Beach),data=RIKZ)
df=data.frame(fitted=fitted(fit),resid=residuals(fit,scaled=TRUE))
ggplot(df, aes(fitted,resid)) +
geom_point(pch = 21) +
geom_hline(yintercept = 0, linetype = "dashed") +
geom_smooth(se = FALSE, col = "red", size = 0.5, method = "loess") +
labs(x = "Fitted values", y = "Residuals", title = "Residuals vs Fitted values")
Q: any trend? homoscedastic?
ggplot(df, aes(sample=resid)) +
stat_qq(pch = 19) +
geom_abline(intercept = 0, slope = 1, linetype = "dotted") +
labs(x = "Theoretical quantiles", y = "Standardized residuals", title = "Normal Q-Q")
Q: normally distributed?
In a study on the effect of sleep deprivation the average reaction time per day were measured. On day 0 the subjects had their normal amount of sleep. Starting that night they were restricted to 3 hours of sleep per night. The observations represent the average reaction time on a series of tests given each day to each subject. This was measured for 18 subjects for 10 days (days 0-9).
We observe that each subject’s reaction time increases approximately linearly with the number of sleepdeprived days. But, it appears that subjects have different slopes and intercepts.
As a first model we may assume that there is a common intercept and slope for the population - called fixed effects, but allow for random deviations for the intercept and slope for each individual. This is called a random intercept and slope model.
fm1 <- lmer(Reaction ~ Days + (Days | Subject), sleepstudy)
summary(fm1)
## Linear mixed model fit by REML ['lmerMod']
## Formula: Reaction ~ Days + (Days | Subject)
## Data: sleepstudy
##
## REML criterion at convergence: 1743.6
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -3.9536 -0.4634 0.0231 0.4634 5.1793
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## Subject (Intercept) 612.09 24.740
## Days 35.07 5.922 0.07
## Residual 654.94 25.592
## Number of obs: 180, groups: Subject, 18
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 251.405 6.825 36.838
## Days 10.467 1.546 6.771
##
## Correlation of Fixed Effects:
## (Intr)
## Days -0.138
Q: What are our parameter estimates and their interpretation?
library(sjPlot)
sjp.lmer(fm1, y.offset = .4,sort.est="Days")
sjp.lmer(fm1, type="rs.ri")
\[Y_{ij}=\beta_0+\beta_1x_{ij}+\gamma_{0i}+\gamma_{1i}x_{ij}+\varepsilon_{ij}\]
\[ \varepsilon_i \sim N({\bf 0},\sigma^2 {\bf I})\] \[\gamma_i =\begin{pmatrix} \gamma_{0i}\\ \gamma_{1i} \end{pmatrix} \sim N \left( \begin{pmatrix} 0\\ 0 \end{pmatrix},{\bf Q}= \begin{pmatrix} \tau_0^2 & \tau_{01}\\ \tau_{01} & \tau_1^2 \end{pmatrix} \right)\]
The parameter \(\tau_{01}\) gives the covariance beween the random intercept and random slope.
\[\text{Cov}(Y_{ij},Y_{kl})=\text{E}[(Y_{ij}-\mu_{ij})(Y_{kl}-\mu_{kl})]\]
\[\text{Cov}(Y_{ij},Y_{kl})=\left\{\begin{array}{lr} \tau_0^2+2\tau_{01}x_{ij}+\tau_1^2 x_{ij}^2+\sigma^2=\text{Var}(Y_{ij}) & \text{for } i=k, j=l\\ \tau_0^2 \tau_{01}x_{ij}+\tau_{01}x_{il}+\tau_i^2x_{ij}x_{il}& \text{for } i=k, j\neq l\\ 0 & \text{for } i\neq k, j\neq l \end{array}\right\} \]
The correlation between \(Y_{ij}\) and \(Y_{il}\) (two observations in the same cluster, that is, same beach) dependes in a complicated way on the observed values for \(x\) and is rather difficult to interpret.
\[\text{Corr}(Y_{ij},Y_{il})=\frac{\text{Cov}(Y_{ij},Y_{il})}{\sqrt{\text{Var}(Y_{ij})\text{Var}(Y_{il})}}\]
There are two main strategies:
\[\hat{\beta}=(\sum_{i=1}^m {\bf X}_i^T\hat{\bf V}_i^{-1}{\bf X}_i)^{-1}\sum_{i=1}^m {\bf X}_i^T \hat{\bf V}_i^{-1}{\bf Y}_i\approx N(\beta, (\sum_{i=1}^m {\bf X}_i^T\hat{\bf V}_i^{-1}{\bf X}_i)^{-1})\]
\[{\bf C}\beta={\bf d} \text{ vs. } {\bf C}\beta\neq {\bf d} \] where \({\bf C}\) is a \(r \times p\) constant matrix and \({\bf d}\) a \(r \times 1\) constant vector. Then: \[(\hat{\beta}-\beta)^T {\bf C}^T [{\bf C}(\sum_{i=1}^m {\bf X}_i^T\hat{\bf V}_i^{-1}{\bf X}_i)^{-1} {\bf C}^T]^{-1}{\bf C}(\hat{\beta}-\beta)\] asymptotically follows a \(\chi^2\)-distribution with \(r\)-degrees of freedom.
fit=lmer(Richness~NAP +(1|Beach),data=RIKZ,REML=FALSE)
summary(fit)
1-pchisq((summary(fit)$coefficients[2,3])^2,1)
## Linear mixed model fit by maximum likelihood ['lmerMod']
## Formula: Richness ~ NAP + (1 | Beach)
## Data: RIKZ
##
## AIC BIC logLik deviance df.resid
## 249.8 257.1 -120.9 241.8 41
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -1.4258 -0.5010 -0.1791 0.2452 4.0452
##
## Random effects:
## Groups Name Variance Std.Dev.
## Beach (Intercept) 7.507 2.740
## Residual 9.111 3.018
## Number of obs: 45, groups: Beach, 9
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 6.5844 1.0321 6.380
## NAP -2.5757 0.4873 -5.285
##
## Correlation of Fixed Effects:
## (Intr)
## NAP -0.164
## [1] 1.254807e-07
Notation:
The random effects parts of these models are assumed to be the same, while the changes are only to the fixed effects part.
The likelihood ratio statistic is defined as \[- 2\ln \lambda=-2(\ln L(\hat{\beta}_B)-\ln L(\hat{\beta}_A)) \] which under the null is asymptotically \(\chi^2\)-distributed with degrees of freedom equal the difference in the number of parameters in the large and the small model. Again, \(p\)-values are calculated in the upper tail of the \(\chi^2\)-distribution.
Remark: this is the log-likelihood, not the REML version.
Remark: this result is not valid if the the models are fitted using REML istead of ML. The reason for this is that the mean structure of the model fitted under the null hypothesis is not the sam mean structure under the alternative hypothesis, which leads to that different matrices \({\bf A}\) must be used for the REML method. Therefore these REML log-likelihoods are based on different observations and are therefore not comparable.
fit=lmer(Richness~NAP +(1|Beach),data=RIKZ,REML=FALSE)
fit0=lmer(Richness~1+(1|Beach),data=RIKZ,REML=FALSE)
anova(fit0,fit)
## Data: RIKZ
## Models:
## fit0: Richness ~ 1 + (1 | Beach)
## fit: Richness ~ NAP + (1 | Beach)
## Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
## fit0 3 269.30 274.72 -131.65 263.30
## fit 4 249.83 257.06 -120.92 241.83 21.474 1 3.586e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Q: Which is model A (large) and model B (small)? What do we conclude? Compare to the Wald test result (previous page).
In most situations the fixed effects model is of prime interest, however, a good choice of covariance structure is useful for interpreting the data and essential to be able to perform valid inference for the fixed effects.
Wald test can also be used for random effects parameters \(\vartheta\) in \({\bf Q}\) and \(\sigma^2\), and
But there is a problem: the performance of the normal approximation depends strongly on the true value of \(\vartheta\) and large samples are needed for values of \(\vartheta\) that are close to the boundary of the parameter space (for the hyptothesis tested), and when on the boundary the normal approximation fails.
We will not dive deep into this matter in this course, but report that the solution to this - both for the Wald and the likelihood ratio test (preferably using the REML log-likelihood) is to use a mixture of \(\chi^2\) distributions in these cases.
The fixed effects parts of these models are assumed to be the same, while the changes are only to the random effects part - and the changes gives nested models.
The likelihood ratio statistic is defined as \[- 2\ln \lambda=-2(\ln L(\hat{\vartheta}_B)-\ln L(\hat{\vartheta}_A)) \] and the REML-likelihood is preferred.
For testing a random intercept model vs. no random intercept (need for this random effect) then \[H_0: {\bf Q}=0 \text{ vs. } H_1: {\bf Q}=\tau_0^2\] asymptotically \(-2\ln \lambda\) is a mixture of \(\chi_1^2\) and \(\chi_0^2\) with equal weights. Here \(\chi_0^2\) is the distribution that gives probability mass 1 to the value 0.
If we instead had used the classical null distribution (\(\chi_1^2\)) then the \(p\)-values would be too large and the null hypotheses kept to often.
Similar strategies for other situations - see Verbeke and Molenberghs (2000) pages 69-72.
\[ \text{AIC}= -2\cdot l(\hat{\beta},\hat{\vartheta}) +2\cdot p\] \[ \text{BIC}= -2\cdot l(\hat{\beta},\hat{\vartheta}) +\ln(N)\cdot p\]
This can be used directly in the ML estimation, and as before BIC will give a smaller model than AIC.
\[ \text{AIC}= -2\cdot l(\hat{\beta},\hat{\vartheta}) +2\cdot p\] \[ \text{BIC}= -2\cdot l(\hat{\beta},\hat{\vartheta}) +\ln(N-p)\cdot p\]
Remember: \[l_{\text{REML}}(\vartheta)= l_{P}(\vartheta)-\frac{1}{2}\ln \lvert \sum_{i=1}^m {\bf X}_i^T {\bf V}(\vartheta)_i^{-1} {\bf X}_i\rvert \]
fm1=lmer(Reaction~Days+(Days|Subject),data=sleepstudy)# random slope and intercept, correlated
fm2=lmer(Reaction~Days+((1|Subject)+(0+Days|Subject)),data=sleepstudy)# random slope and intercept, uncorrelated
fm3=lmer(Reaction~Days+(1|Subject),data=sleepstudy)# random intercept
AIC(fm1,fm2,fm3)
## df AIC
## fm1 6 1755.628
## fm2 5 1753.669
## fm3 4 1794.465
Q: Which model to choose?
Start with model with all explanatory variables and possible interactions for the fixed effects - called a beyond optimal model. (Nothing is really done her, just decide on the fixed part).
With this beyond optimal fixed effects model we now focus on the random effects. The idea is that since we have many explanatory variables in the fixed effects the random component should not contain information that we would prefer to have in the fixed effect. To do this we may either use testing or AIC or BIC. Testing is problematic due to that the null hypotheses tested is on the boundary of the parameter values tested (\(\tau^2=0\)). REML must be used (to get as unbiased estimates as possible).
Now we have the optimal random effect, so we focus on the optimal fixed effect model. ML must be used because different fixed effects will give incomparable REML-log-likelihoods. Testing or AIC or BIC can be used.
The final model is then presented with REML estimates.
This example is taken from Zuur et al. (2009), pages 127-128.
We deside on fixed model with intercept, main effect of NAP
and Exposure
and the interaction thereof.
Fit the fixed model from 1 to “no random effect”, “random intercept” and random interept and slope for NAP
. Use REML.
B1=gls(Richness~1+NAP*fExp,data=RIKZ,method="REML")
B2=lmer(Richness~1+NAP*fExp+(1|Beach),data=RIKZ)
B3=lmer(Richness~1+NAP*fExp+(1+NAP|Beach),data=RIKZ)
AIC(B1,B2,B3)
## df AIC
## B1 5 238.5329
## B2 6 236.4925
## B3 8 237.1331
Conclusion: choose the random intercept model (lowest AIC).
F2=lmer(Richness~1+NAP*fExp+(1|Beach),data=RIKZ,REML=FALSE)
confint(F2)
## 2.5 % 97.5 %
## .sig01 0.0000000 3.145294
## .sigma 2.3114668 3.681773
## (Intercept) 6.9045813 10.804288
## NAP -4.7299177 -2.275315
## fExp11 -8.1969707 -2.303772
## NAP:fExp11 0.1919491 3.877650
F2a=lmer(Richness~1+NAP+fExp+(1|Beach),data=RIKZ,REML=FALSE)
confint(F2a)
## 2.5 % 97.5 %
## .sig01 0.000000 3.297744
## .sigma 2.435935 3.879425
## (Intercept) 6.562780 10.630777
## NAP -3.578864 -1.644351
## fExp11 -7.563508 -1.505626
F2b=lmer(Richness~1+NAP+(1|Beach),data=RIKZ,REML=FALSE)
confint(F2b)
## 2.5 % 97.5 %
## .sig01 1.484204 5.100814
## .sigma 2.435429 3.877572
## (Intercept) 4.324560 8.824909
## NAP -3.566901 -1.599779
F2c=lmer(Richness~1+fExp+(1|Beach),data=RIKZ,REML=FALSE)
confint(F2c)
## 2.5 % 97.5 %
## .sig01 0.000000 3.95707
## .sigma 3.178126 5.05812
## (Intercept) 5.385254 10.29475
## fExp11 -8.522119 -1.15788
AIC(F2,F2a,F2b,F2c)
## df AIC
## F2 6 242.1135
## F2a 5 244.7589
## F2b 4 249.8291
## F2c 4 265.4332
Conclusion: keep the full model. No confidence intervals cover 0, and the AIC supports the full model.
summary(B2)
## Linear mixed model fit by REML ['lmerMod']
## Formula: Richness ~ 1 + NAP * fExp + (1 | Beach)
## Data: RIKZ
##
## REML criterion at convergence: 224.5
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -1.4849 -0.4161 -0.0770 0.1521 3.7313
##
## Random effects:
## Groups Name Variance Std.Dev.
## Beach (Intercept) 3.307 1.819
## Residual 8.660 2.943
## Number of obs: 45, groups: Beach, 9
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 8.8611 1.0208 8.680
## NAP -3.4637 0.6279 -5.517
## fExp11 -5.2556 1.5452 -3.401
## NAP:fExp11 2.0005 0.9461 2.114
##
## Correlation of Fixed Effects:
## (Intr) NAP fExp11
## NAP -0.181
## fExp11 -0.661 0.120
## NAP:fExp11 0.120 -0.664 -0.221
Remark: In Zuur et al (2009), page 128, the conclusion was to use the additive model (model F2a above), based on asymptotic \(p\)-values (but this was smaller than 0.05) using the nlme
package.
lmer
in package lme4
This is based on the article Fitting Linear Mixed-Effects Models Using lme4 by Bates, Bolker, Mächler and Walker (2015) in Journal of Statistical Software, and in particular pages 30 and onwards.
We use a data set called the ergonometrics experiment data set ergoStool
for illustration.
effort
: the effort required (socalled Borg scale) to arise from a stool (krakk) - this is our responseType
: the type of stool - types T1, T2, T3 and T4 studied.Subject
: each of nine different subjects tested the four different stools (in random order?). SubjectsWas there any clear winner among the stools, when the goal was to minimize effort?
ergoStool
data setis found in the MEMSS
package.
library(MEMSS)
summary(ergoStool)
table(ergoStool$Subject)
contrasts(ergoStool$Type) #default contrast used
## effort Type Subject
## Min. : 7.00 T1:9 A : 4
## 1st Qu.: 8.00 T2:9 B : 4
## Median :10.00 T3:9 C : 4
## Mean :10.25 T4:9 D : 4
## 3rd Qu.:12.00 E : 4
## Max. :15.00 F : 4
## (Other):12
##
## A B C D E F G H I
## 4 4 4 4 4 4 4 4 4
## T2 T3 T4
## T1 0 0 0
## T2 1 0 0
## T3 0 1 0
## T4 0 0 1
Observe that the type of stool is coded as dummy variable, with T1 as reference category.
lmer
: summary
library(lme4)
fit=lmer(effort~Type + (1|Subject), data=ergoStool)
summary(fit)
## Linear mixed model fit by REML ['lmerMod']
## Formula: effort ~ Type + (1 | Subject)
## Data: ergoStool
##
## REML criterion at convergence: 121.1
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -1.80200 -0.64317 0.05783 0.70100 1.63142
##
## Random effects:
## Groups Name Variance Std.Dev.
## Subject (Intercept) 1.775 1.332
## Residual 1.211 1.100
## Number of obs: 36, groups: Subject, 9
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 8.5556 0.5760 14.853
## TypeT2 3.8889 0.5187 7.498
## TypeT3 2.2222 0.5187 4.284
## TypeT4 0.6667 0.5187 1.285
##
## Correlation of Fixed Effects:
## (Intr) TypeT2 TypeT3
## TypeT2 -0.450
## TypeT3 -0.450 0.500
## TypeT4 -0.450 0.500 0.500
The model formula gives first the fixed effects, which here is an intercept and then type of stool (with T1 as reference, so estimate difference from T1). We use a random intercept for each Subject, given as (1|Subject)
.
From the print-out from summary
we see that REML is used to fit the model, and quantiles of scaled Pearson residuals. Could also have used:
formula(fit)
REMLcrit(fit)
quantile(residuals(fit,"pearson",scaled=TRUE))
## effort ~ Type + (1 | Subject)
## [1] 121.1308
## 0% 25% 50% 75% 100%
## -1.80200345 -0.64316591 0.05783115 0.70099706 1.63142054
Then there is a part on the fitted random effects and residual variation. The intra class correlation could also be calculated from VarCorr(fit)
(an object of class VarCorr.merMod
). Observe the very high ICC of 0.6.
vc=VarCorr(fit)
print(vc,comp="Variance")
df=as.data.frame(vc)
print(df)
print(vc)
nobs(fit)
ngrps(fit)
sigma(fit)
ICC=df[4][[1]][1]/sum(df[4][[1]])
ICC
## Groups Name Variance
## Subject (Intercept) 1.7755
## Residual 1.2106
## grp var1 var2 vcov sdcor
## 1 Subject (Intercept) <NA> 1.775463 1.332465
## 2 Residual <NA> <NA> 1.210648 1.100295
## Groups Name Std.Dev.
## Subject (Intercept) 1.3325
## Residual 1.1003
## [1] 36
## Subject
## 9
## [1] 1.100295
## [1] 0.5945736
Then to the fitted fixed effects, which is interpreted as for lm
with treatment contrast (dummy effect coding), but without any \(p\)-values, and anova
gives the analysis of variance table. See help("pvalues")
to explore your options to find \(p\)-values for testing fixed effects.
fixef(fit)
coef(summary(fit))
anova(fit)
help("pvalues")
## (Intercept) TypeT2 TypeT3 TypeT4
## 8.5555556 3.8888889 2.2222222 0.6666667
## Estimate Std. Error t value
## (Intercept) 8.5555556 0.5760123 14.853079
## TypeT2 3.8888889 0.5186838 7.497610
## TypeT3 2.2222222 0.5186838 4.284348
## TypeT4 0.6666667 0.5186838 1.285304
## Analysis of Variance Table
## Df Sum Sq Mean Sq F value
## Type 3 81.194 27.065 22.356
It is easiest to rise from the stool of Type T1, followed by Type T4 and the Type T3 and finally Type T2.
We may also get confidence intervals for the fixed effects (and random effects variances), based on the profile likelihood, which might be thought of as analogues to \(p\)-values. However, keep in mind our coding of Type (dummy), so if the intervals contain 0 the Type is not different from the reference Type T1 (the best type wrt effort to arise). The confidence intervals are made using the Wald approximation for the fixed effects. A bootstrap confidence interval can also be provided.
confint(fit)
## 2.5 % 97.5 %
## .sig01 0.7342354 2.287261
## .sigma 0.8119798 1.390104
## (Intercept) 7.4238425 9.687269
## TypeT2 2.8953043 4.882473
## TypeT3 1.2286377 3.215807
## TypeT4 -0.3269179 1.660251
Finally, there is a part on the correlation between estimated fixed effects, here we have the estimated correlation between the three levels of type of stool. We can get the variance-covariance matrix with vcov
, and can calculated correlations from that matrix.
vcov(fit)
## 4 x 4 Matrix of class "dpoMatrix"
## (Intercept) TypeT2 TypeT3 TypeT4
## (Intercept) 0.3317901 -0.1345165 -0.1345165 -0.1345165
## TypeT2 -0.1345165 0.2690329 0.1345165 0.1345165
## TypeT3 -0.1345165 0.1345165 0.2690329 0.1345165
## TypeT4 -0.1345165 0.1345165 0.1345165 0.2690329
Fitted vs. residuals and normal qq-plot (from lattice
). Not ggplot - see below for more plotting.
plot(fit, type=c("p","smooth"))
library(lattice)
qqmath(fit,id=0.05)
We may also use anova
to compare models. Assume that we want to compare to the (probably very bad) model where type of stool is not taken into account (which is stupic if we want to investigate the types) - so just go show (better example for sleep study).
fit0=lmer(effort~1 + (1|Subject), data=ergoStool)
anova(fit0,fit)
## Data: ergoStool
## Models:
## fit0: effort ~ 1 + (1 | Subject)
## fit: effort ~ Type + (1 | Subject)
## Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
## fit0 3 164.15 168.90 -79.075 158.15
## fit 6 134.14 143.65 -61.072 122.14 36.006 3 7.468e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The comparison is based on the likelihood ratio test with ML (not REML), and also gives \(p\)-values.
lme4
Excerpt from Fitting Linear Mixed-Effects Models Using lme4 by Bates, Bolker, Mächler and Walker (2015) in Journal of Statistical Software page 35: Computing p values One of the more controversial design decisions of lme4 has been to omit the output of p values associated with sequential ANOVA decompositions of fixed effects. The absence of analytical results for null distributions of parameter estimates in complex situations (e.g., unbalanced or partially crossed designs) is a long-standing problem in mixed-model inference. While the null distributions (and the sampling distributions of non-null estimates) are asymptotically normal, these distributions are not t distributed for finite size samples – nor are the corresponding null distributions of differences in scaled deviances F distributed. Thus approximate methods for computing the approximate degrees of freedom for t distributions, or the denominator degrees of freedom for F statistics (Satterthwaite 1946; Kenward and Roger 1997), are at best ad hoc solutions. However, computing finite-size-corrected p values is sometimes necessary. Therefore, although the package does not provide them (except via parametric bootstrapping, Section 5.1), we have provided a help page to guide users in finding appropriate methods: R> help("pvalues")
Plan: work in randomly assigned groups.
The data in this problem is based on four measurements of a particular bone for 20 boys. The meaurements were taken at 8, 8.5, 9 and 9.5 years of age. The variables in the dataset are
bone
: length of bone in millimetersredage
: centered age (i.e. age - 8.75)Below is an excerpt from the output from fitting a linear mixed model (LMM) with the procedure lmer
in R
where the length of the bone, bone
, is the response,
\[ \texttt{bone}_{ij} = \beta_0 + \beta_1 \texttt{redage}_{ij} + \gamma_{0i} + \gamma_{1i}\texttt{redage}_{ij} + \varepsilon_{ij} \ i = 1, \dots, 20, \ j = 1,2,3,4 \]
where \(\mathbf{\gamma}_i = (\gamma_{0i}, \gamma_{1i})^T\), \(i = 1, \dots, 20\) represent the random effects.
filepath <- "https://www.math.ntnu.no/emner/TMA4315/2017h/bonedata.txt"
bone <- read.table(filepath, header = TRUE)
library(lme4)
fit2 <- lmer(bone ~ redage + (1 + redage | boy), data = bone)
summary(fit2)
a) Formulate the model on matrix form and explain the meaning and interpretation of the different parts. State the usual model assumptions carefully.
b) Describe the distribution of the response \(\mathbf{y}_i = (y_{i1}, y_{i2}, y_{i3}, y_{i4})^T\), \(i = 1, \dots, 20\), and explain how you can find numerical values using the R
-output.
Remark: c is be a bit go technical for us without access to our textbook/module pages.
c) Find the conditional expectation of a random effect \(\pmb{\gamma}_i\), \(i = 1, \dots, 20\) given the observations, i.e. E\((\pmb{\gamma}_i|\mathbf{y}_1, \dots, \mathbf{y}_{20})\). Describe how the random effects, \(\pmb{\gamma}_i\), \(i = 1, \dots, 20\), can be predicted/estimated.
In the exam, no numerical calculations was necessary in b) and c).
d) New: do b) and c) in R
(after you have done b) and c) by hand!!!, i.e. do numerical calculations). The R markdown file of this module contains necessary code to download the dataset, and fit the full model. In addition, plot the distribution of \((\gamma_{0i}, \gamma_{1i})^T\) (this is a multivariate distribution).
The data used in this problem conserns expenses in the the social security system Medicare in US. Average expenses per hospitalization, denoted as ccpd
, were in six years recorded for 54 regions: the fifty US states, Puerto Rico, Virgin Islands, District of Columbia and an unspecified other. Thus there are \(6\times54 = 324\) observations. The expenses are treated as response. The covariates are j = YEAR
which can take values \(1, \dots, 6\) and a factor indicating the average length of stay at hospital, AVETD
, in each region and year. This factor has tree levels, 1
= six days or less, 2
= 7-9 days, 3
= 10 days or more. “Six days or less”" is the reference level and the others are denoted as AVETD2
and AVETD3
.
Below you find the output from fitting the linear mixed effects model
\[y_{ij} = \beta_0 + j \times \beta_1 + \beta_2 \texttt{AVETD2}_{ij} + \beta_3 \texttt{AVETD3}_{ij} + \gamma_{0i} + j \times \gamma_{1i} + \varepsilon_{ij} \]
\(j = 1, \dots, n_i\), $i = 1, , 54 $ and \(N=\sum_{i=1}^m n_i=324\).
filepath <- "https://www.math.ntnu.no/emner/TMA4315/2017h/medicare.dat"
medicare <- read.table(filepath, header = TRUE, colClasses = c("numeric", "numeric", "factor", "factor"))
library(lme4)
fit1 <- lmer(ccpd ~ YEAR + AVETD + (1 + YEAR | fstate), data = medicare)
summary(fit1)
a) Formulate the model in matrix form and explain the what the usual assumptions are.
b) Compute a 95 % confidence interval for the fixed effect coefficient for YEAR
.
Remark: for c we have not focus on testing random effects in our course. c) Explain how we can find out if we can simplify the model by removing the random effect \(\gamma_{1i}\).
d) What is the expectation and covariate matrix in the marginal model of the response \((Y_{i1}, Y_{i2}, Y_{i3}, Y_{i4}, Y_{i5}, Y_{i6})^T\)?
e) Explain how the null hypothesis \(H_0: \beta_3 = 2\times\beta_2\) versus the alternative hypothesis \(H_1: \beta_3 \neq 2\times\beta_2\) can be tested? In this part no numerical calculations are expected.
f) New: Do c) and e) in R
. The R Markdown file of this module contains necessary code to download the dataset, and fit the full model.