(Latest changes: 08.11.2018 - typos).

(Warning: some changes may occur before the second week.)

Overview

Aim: Present methods for analysing correlated responses in a (normal/Gaussian) regression setting.

We will only consider two-level models and in particular focus on random intercept and random slope models.


Learning material

  • Textbook: Fahrmeir et al (2013): Chapter 2.4, 7.1-7.3, 7.7. In greater detail: pages 349-354 (not "Alternative view on the random intercept model"), 356-365 (not 7.1.5 "Stochastic Covariates"), 368-377 (not "Bayesian Covariance Matrix"), 379-380 (not "Testing Random Effects or Variance Parameters", only last part on page 383), 383 (middle), 401-409 (orange juice). Note: Bayesian solutions not on the reading list.

  • Alternative readings: Zuur et al. (2009): “Mixed Effects Models and Extensions in Ecology with R”, chapter 5 (pages 101-142). Available as free ebook from Springer for NTNU students. More explanations and less mathematics than Fahrmeir et al (2013), more focus on understanding. Link to ebook Chapter 5

  • Classnotes 01.11.2018

  • Classnotes 08.11.2018

Interactive session week 1

Plan:

  • Problem 1abcd (not ef - covered in week 2).
  • Problem 2abe (not cd - covered in week 2).
  • If time: Exam 2017, Problem 3

Exercise 1: Taken from UiO, STK3100, 2011, problem 1

We will in this exercise look at the following model:

\[Y_{ij} = \mathbf{x}_{ij}^T \pmb{\beta} + \gamma_{0i} + {\varepsilon}_{ij}, \ \varepsilon_{ij} \sim N(0, \sigma^2) \] \[\gamma_{0i} \sim N(0, \tau_0^2)\]

where \(i \in \{1, \dots, m\}\) is the group index, and \(j \in \{1, \dots, n_i\}\) is the index for repeated measurements in each group (cluster). We assume that all random variables are independent.

a) What is this model called? Discuss where and when such models are useful.

Answer This is a random intercept model. Such models are useful when you want to model correlation between responses from the same group (clustered data) or the same individual (longitudinal data).
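
To make the setup concrete, here is a minimal simulation sketch of a random intercept model fitted with lmer; all names and parameter values (m, ni, beta, tau0, sigma) are made up for illustration.

library(lme4)
set.seed(1)
m <- 10  # number of groups
ni <- 5  # observations per group
beta <- c(1, 2)  # fixed effects: intercept and slope
tau0 <- 0.7  # standard deviation of the random intercepts
sigma <- 0.3  # residual standard deviation
group <- rep(1:m, each = ni)
x <- runif(m * ni)
gamma0 <- rnorm(m, mean = 0, sd = tau0)  # one random intercept per group
y <- beta[1] + beta[2] * x + gamma0[group] + rnorm(m * ni, sd = sigma)
simdat <- data.frame(y = y, x = x, group = factor(group))
summary(lmer(y ~ x + (1 | group), data = simdat))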

b) What is the marginal model for \(\mathbf{Y}_i = (Y_{i1}, \dots, Y_{in_i})\)? Show how you arrive at your answer and write with vectors and matrices (for cluster \(i\)).

What are the advantages of having an expression for the marginal distribution of \(\mathbf{Y}_i\) when doing estimation?

Answer

We have

\[\mathbf{Y}_i = \mathbf{X}_i \pmb{\beta} + \mathbf{1}\gamma_{0i} + \pmb{\varepsilon}_i, \ \pmb{\varepsilon}_i \sim N(0, \sigma^2\mathbf{I}) \]

which gives

\[ \mathbf{Y}_i \sim N(\mathbf{X}_i\pmb{\beta}, \tau_0^2 \mathbf{11}^T + \sigma^2 \mathbf{I}) \]

where \(\mathbf{1}\) is a column vector of ones.

Advantage of having the marginal distribution: we then have an explicit expression for the likelihood

\[L(\pmb{\beta}, \tau_0^2, \sigma^2) = \prod_{i = 1}^m f(\mathbf{y}_i; \pmb{\beta}, \tau_0^2, \sigma^2) \]

which we need to find MLE and REML estimates.
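
To illustrate, here is a minimal sketch of how one cluster's contribution to this likelihood can be evaluated from the marginal distribution, assuming the mvtnorm package; the cluster size and parameter values (ni, tau0sq, sigmasq, mu_i) are placeholders.

library(mvtnorm)
ni <- 4  # placeholder cluster size
tau0sq <- 0.5  # placeholder value for tau_0^2
sigmasq <- 1  # placeholder value for sigma^2
Vi <- tau0sq * matrix(1, ni, ni) + sigmasq * diag(ni)  # marginal covariance of Y_i
mu_i <- rep(0, ni)  # placeholder for X_i %*% beta
y_i <- rnorm(ni)  # some observed responses for cluster i
dmvnorm(y_i, mean = mu_i, sigma = Vi, log = TRUE)  # log-likelihood contribution of cluster i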

In the rest of the exercise we will look at a dataset consisting of observations of cod from the year 2000. The dataset is from the Institute of Marine Research (Havforskningsinstituttet) in Bergen, and we use only a subset of a large dataset with information about fish in the Barents Sea. The following variables are available on a random sample within each catch (a haul of the trawl):

  • length: the length of fish (in cm)
  • weight: the weight of fish (in grams)
  • age: the age of fish (in years)
  • haulsize: the size of the total catch (in tonnes (1000 kg))

Let \(i\) be the index for catch, and \(j\) be the index for an individual fish in a catch. We will not use the age variable in this exercise.

We start by looking at the model

\[ \log(\texttt{weight}_{ij}) = \beta_0 + \beta_1 \log(\texttt{length}_{ij}) + \beta_2\log(\texttt{haulsize}_i) + \gamma_{0i} + \varepsilon_{ij} \]

where \(\gamma_{0i}\sim N(0, \tau_0^2)\), \(\varepsilon_{ij} \sim N(0,\sigma^2)\), and all random effects are independent. Below is an excerpt from the output from fitting this model in R.

library(lme4)

filepath <- "https://www.math.ntnu.no/emner/TMA4315/2018h/fishdata.dat"
fish <- read.table(filepath, header = TRUE)

fit1 <- lmer(log(weight) ~ log(length) + log(haulsize) + (1 | haul), data = fish)

summary(fit1)
## Linear mixed model fit by REML ['lmerMod']
## Formula: log(weight) ~ log(length) + log(haulsize) + (1 | haul)
##    Data: fish
## 
## REML criterion at convergence: -10188.2
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -4.1204 -0.6107 -0.0314  0.5813  5.1458 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  haul     (Intercept) 0.003030 0.05505 
##  Residual             0.008594 0.09270 
## Number of obs: 5433, groups:  haul, 63
## 
## Fixed effects:
##                Estimate Std. Error  t value
## (Intercept)   -4.286029   0.041554 -103.143
## log(length)    2.901037   0.009703  298.984
## log(haulsize)  0.006169   0.005328    1.158
## 
## Correlation of Fixed Effects:
##             (Intr) lg(ln)
## log(length) -0.980       
## log(haulsz)  0.047  0.056

c) Write down the estimates for all the parameters in the model. What is the correlation between two weight-variables from the same catch (haul)?

Answer

We have the following parameter estimates:

Parameter     Estimate
\(\beta_0\)   -4.2860
\(\beta_1\)    2.9010
\(\beta_2\)    0.0062
\(\tau_0\)     0.0550
\(\sigma\)     0.0927

The correlation between two weight-variables from the same catch (haul) is found by inserting the parameter estimates into \[\text{Corr}(Y_{ij},Y_{il})=\frac{\tau_0^2}{\tau_0^2+\sigma^2} \text{ for }j\neq l\]

vc <- VarCorr(fit1)
print(vc)
df <- as.data.frame(vc)
numerator <- df$vcov[1]  # tau_0^2 (haul variance)
denominator <- sum(df$vcov)  # tau_0^2 + sigma^2
cat("numerator:", numerator, "\n")
cat("denominator:", denominator, "\n")
ICC <- numerator/denominator
ICC
##  Groups   Name        Std.Dev.
##  haul     (Intercept) 0.055048
##  Residual             0.092701
## numerator: 0.003030271 
## denominator: 0.01162381 
## [1] 0.2606951

Then, \[\widehat{\frac{\tau_0^2}{\tau_0^2 + \sigma^2}} = \frac{0.0030303}{0.0116238} = 0.2606951\]

d) We are now interested in \(\theta = \exp(\beta_0 + \beta_1 \log(66) + \beta_2 \log(0.46))\). The standard deviation of \(\hat{\beta}_0 + \hat{\beta}_1 \log(66) + \hat{\beta}_2 \log(0.46)\) is 0.007. Explain how you can calculate this value (you do not have to do the calculation). Then create and calculate a 95 % confidence interval for \(\theta\).

Answer

\(\pmb{\hat{\beta}}\) is approximately Gaussian (normal), and then

\[\log(\hat{\theta}) = \hat{\mu} = \hat{\beta}_0 + \hat{\beta}_1 \log(66) + \hat{\beta}_2 \log(0.46)\]

is also Gaussian with variance

\[Var(\hat{\mu}) = Var(\hat{\beta}_0) + \log(66)^2Var(\hat{\beta}_1) + \log(0.46)^2Var(\hat{\beta}_2) + \\ 2\cdot \log(66)Cov(\hat{\beta}_0, \hat{\beta}_1) + 2\cdot \log(0.46)Cov(\hat{\beta}_0, \hat{\beta}_2) + 2\log(66)\log(0.46)Cov(\hat{\beta}_1, \hat{\beta}_2) = 0.00706^2\]

Then we get a 95 % confidence interval for \(\mu\):

\[\hat{\mu} \pm 1.96 \cdot \sqrt{Var(\hat{\mu})} = [7.849683, 7.877365] \]

We take the exponential of this interval to get a confidence interval for \(\theta\): \([2564.922, 2636.914]\)

# it is much simpler to just use vcov(fit1), but here we write down everything
# so we can compare with the printout
vbeta0 <- summary(fit1)$coefficients[1, 2]^2
vbeta1 <- summary(fit1)$coefficients[2, 2]^2
vbeta2 <- summary(fit1)$coefficients[3, 2]^2
# cov2cor gives correlations from a covariance matrix
cbeta01 <- sqrt(vbeta0) * sqrt(vbeta1) * cov2cor(vcov(fit1))[1, 2]
cbeta02 <- sqrt(vbeta0) * sqrt(vbeta2) * cov2cor(vcov(fit1))[1, 3]
cbeta12 <- sqrt(vbeta1) * sqrt(vbeta2) * cov2cor(vcov(fit1))[2, 3]

variance <- vbeta0 + log(66)^2 * vbeta1 + log(0.46)^2 * vbeta2 + 2 * log(66) * cbeta01 +
    2 * log(0.46) * cbeta02 + 2 * log(66) * log(0.46) * cbeta12
SD <- sqrt(variance)

# OR

SD2 <- as.numeric(sqrt(c(1, log(66), log(0.46)) %*% vcov(fit1) %*% c(1, log(66),
    log(0.46))))

cat(SD, SD2, "\n")

critval <- qnorm(0.975)

CI <- c(-critval * SD, critval * SD) + c(fixef(fit1) %*% c(1, log(66), log(0.46)))
cat(CI, "\n")
cat(exp(CI), "\n")
## 0.007061665 0.007061665 
## 7.849683 7.877365 
## 2564.922 2636.914

Remark: skip (e); we will not focus much on model selection. However, a top-down strategy (which this exam question points to) will be explained at the end of this module page (week 2).

e) Assume that you want to compare different models with respect to which random effects and fixed effects should be included in the model. Write down a general strategy for doing model evaluation on these kinds of models.

Answer
  1. Start with a model as close to the saturated model as possible (all covariates and as many interactions as possible)
  2. Find the optimal structure on all random effects (use REML)
  3. Find the optimal structure on all fixed effects (use ML)
  4. Present the final model with REML estimates (a sketch of this strategy is given below)
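
A sketch of what steps 2-4 could look like with lmer, using the fish data from Exercise 1; the candidate models below are chosen for illustration only.

# step 2: compare random-effects structures using REML fits (the lmer default)
fitRI <- lmer(log(weight) ~ log(length) + log(haulsize) + (1 | haul), data = fish)
fitRS <- lmer(log(weight) ~ log(length) + log(haulsize) + (1 + log(length) | haul),
    data = fish)
# step 3: compare fixed-effects structures using ML fits (REML = FALSE)
fitML_full <- lmer(log(weight) ~ log(length) + log(haulsize) + (1 | haul), data = fish,
    REML = FALSE)
fitML_red <- lmer(log(weight) ~ log(length) + (1 | haul), data = fish, REML = FALSE)
anova(fitML_red, fitML_full)
# step 4: refit the chosen model with REML and report those estimates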

f) Below is a (horizontal version of a) caterpillar plot of the random effects with confidence intervals, a density plot of the random effects, and a QQ-plot of the random effects. Explain what you see.

library(ggplot2)
library(sjPlot)
plot_model(fit1, y.offset = 0.5, type = "re")

ggplot() + geom_density(aes(x = ranef(fit1)$haul[[1]])) + theme_minimal() + labs(x = "x",
    y = "y", title = "Density")

gg <- plot_model(fit1, type = "diag", prnt.plot = FALSE, title = "Quantile plot",
    geom.size = 1)
gg[[2]]

Answer

From the plots we see that it is reasonable to say that the random intercepts follow a normal distribution with mean 0 and standard deviation around 0.05 (estimated value is 0.05505):

ggplot(data = data.frame(x = ranef(fit1)$haul[[1]]), aes(x = x)) + geom_density() +
    theme_minimal() + labs(x = "x", y = "y", title = "Density") + stat_function(fun = dnorm,
    args = list(mean = 0, sd = attr(VarCorr(fit1)$haul, "stddev")), col = "red")


Exercise 2: Taken from UiO, STK3100, 2013, problem 2

The data in this problem is part of a longitudinal study of income in the US, the Panel Study of Income Dynamics, begun in 1968. The subset consists of 42 heads of household who were aged 25-39 in 1968. The variables included are

  • annual nominal income, which is the response variable
  • age, age in 1968
  • cyear, coded as -10 in 1968, 0 in 1978 and 10 in 1988
  • educ, years of education in 1968
  • sex, M = male, F = female
  • person, containing information on person id

Below is an excerpt from the output from fitting a linear mixed model (LMM) with the function lmer in R. The log income (lincm) is the response,

\[\texttt{lincm}_{ij} = \beta_0 + \beta_1 \texttt{age}_i + \beta_2 \texttt{cyear}_{ij} + \beta_3 \texttt{educ}_i + \beta_4 \texttt{sex}_i + \gamma_{0i} + \varepsilon_{ij}\] \[i = 1, \dots, 42, \ j = 1968, 1978, 1988\]

where \(\gamma_{0i}\) represents the random effect.

## Linear mixed model fit by REML ['lmerMod']
## Formula: lincm ~ age + cyear + educ + sex + (1 | person)
##    Data: psid2
## 
## REML criterion at convergence: 306
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.4411 -0.4003  0.1071  0.5602  1.6038 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  person   (Intercept) 0.001757 0.04192 
##  Residual             0.563294 0.75053 
## Number of obs: 126, groups:  person, 42
## 
## Fixed effects:
##              Estimate Std. Error t value
## (Intercept)  7.386823   0.631710  11.693
## age         -0.020930   0.015207  -1.376
## cyear        0.084163   0.008189  10.278
## educ         0.116343   0.027582   4.218
## sexM         1.311661   0.142247   9.221
## 
## Correlation of Fixed Effects:
##       (Intr) age    cyear  educ  
## age   -0.831                     
## cyear  0.000  0.000              
## educ  -0.685  0.201  0.000       
## sexM   0.003 -0.217  0.000  0.041

a) Formulate the model in matrix form and explain the meaning of the different parts. State the model assumptions.

Answer

In vector form, for responses \(\mathbf{Y}_i = (Y_{i1}, Y_{i2}, Y_{i3})^T\):

\[ \begin{pmatrix} Y_{i1} \\ Y_{i2} \\ Y_{i3} \end{pmatrix} = \begin{pmatrix} 1 & \texttt{age}_{i} & \texttt{cyear}_{i1} & \texttt{educ}_i & \texttt{sex}_i \\ 1 & \texttt{age}_{i} & \texttt{cyear}_{i2} & \texttt{educ}_i & \texttt{sex}_i \\ 1 & \texttt{age}_{i} & \texttt{cyear}_{i3} & \texttt{educ}_i & \texttt{sex}_i \end{pmatrix} \begin{pmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \beta_3 \\ \beta_4 \end{pmatrix} + \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} \gamma_{0i} + \begin{pmatrix} \varepsilon_{i1} \\ \varepsilon_{i2} \\ \varepsilon_{i3} \end{pmatrix} \]

This has the form of a linear mixed model \(\mathbf{Y}_i = X_i\pmb{\beta} + U_i\gamma_{0i} + \pmb{\varepsilon}_i\), where the columns of the \(n_i \times (q+1)\)-matrix \(U_i\) are a subset of the columns of the \(n_i \times p\)-matrix \(X_i\). Here, \(\pmb{\gamma}_{0i}\) and \(\varepsilon_i\) are independent, and

\[\pmb{\gamma}_{0i} \sim N_{q+1}(0, Q) \text{ and } \varepsilon_i = \begin{pmatrix} \varepsilon_{i1} \\ \varepsilon_{i2} \\ \varepsilon_{i3} \end{pmatrix} \sim N_{n_i}(0, \Sigma_i), \ i = 1, \dots, N \]

In our case, \(N = 42\), \(n_i = 3\), \(p = 5\), \(q = 0\) (since we only have a random intercept), and \(\Sigma_i = \sigma^2 I_3\) where \(I_3\) is the \(3 \times 3\) identity matrix. Here \(\beta_0, \dots, \beta_4\) describe the fixed effects and \(\gamma_{0i}\) the random effect.

Assumptions: \(\pmb{\gamma}_{0i}\) and \(\varepsilon_{i}\) must be independent, and we need a linear relationship between the response and the covariates. In general, the distributional assumptions of the model must hold (e.g. Gaussian random effects and errors).
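
As a quick check, once the model is fitted with lmer (see fit2 in the answer to b) below), the stacked design matrices can be inspected directly; a short sketch:

head(model.matrix(fit2))  # rows of the stacked X_i matrices (126 x 5)
dim(getME(fit2, "Z"))  # random-effects design matrix: 126 x 42, one column per person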

b) Determine an approximate 95 % interval for the coefficient of cyear. Do you think the nominal income has been constant in the period covered by the survey?

Answer
library(faraway)
library(lme4)

data(psid, package = "faraway")
psid$cyear <- psid$year - 78
psid2 <- subset(psid, cyear %in% c(-10, 0, 10))
psid2 <- psid2[psid2$person %in% names(which(table(psid2$person) >= 3)), ]
psid2$lincm <- log(psid2$income)

fit2 <- lmer(lincm ~ age + cyear + educ + sex + (1 | person), data = psid2)

summary(fit2)

critval <- qnorm(0.025, lower.tail = FALSE)
interval <- summary(fit2)$coefficients[3, 1] + critval * summary(fit2)$coefficients[3,
    2] * c(-1, 1)  # approximate 95 % CI for the coefficient of cyear
Q <- c(summary(fit2)$varcor$person)  # only a random intercept, so Q is a scalar (a 1x1 matrix)
sigma_i <- sigma(fit2)^2 * diag(3)
## Linear mixed model fit by REML ['lmerMod']
## Formula: lincm ~ age + cyear + educ + sex + (1 | person)
##    Data: psid2
## 
## REML criterion at convergence: 306
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.4411 -0.4003  0.1071  0.5602  1.6038 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  person   (Intercept) 0.001757 0.04192 
##  Residual             0.563294 0.75053 
## Number of obs: 126, groups:  person, 42
## 
## Fixed effects:
##              Estimate Std. Error t value
## (Intercept)  7.386823   0.631710  11.693
## age         -0.020930   0.015207  -1.376
## cyear        0.084163   0.008189  10.278
## educ         0.116343   0.027582   4.218
## sexM         1.311661   0.142247   9.221
## 
## Correlation of Fixed Effects:
##       (Intr) age    cyear  educ  
## age   -0.831                     
## cyear  0.000  0.000              
## educ  -0.685  0.201  0.000       
## sexM   0.003 -0.217  0.000  0.041

From the output we see that the coefficient of cyear, \(\beta_2\), is estimated to be \(\hat{\beta}_2 = 0.08416257\) with estimated standard error 0.008188946. Hence, an approximate interval has limits \(0.08416257 \pm 1.96 \times 0.008188946\), so the interval is \((0.06811253, 0.10021261)\). The interval does not contain zero, so the hypothesis of constant nominal income would be rejected.

e) Use the values in the R-output to calculate the estimated covariance matrix for the response \((Y_{i1}, Y_{i2}, Y_{i3})^T\) by hand. Hint: compound symmetry due to random intercept model.

Answer

The elements in the covariance matrix are

\[ \text{Cov}(Y_{ij}, Y_{kl}) = \begin{cases} \tau_0^2 + \sigma^2 = \text{Var}(Y_{ij}) & \text{ for } i=k, \ j=l \\ \tau_0^2 & \text{ for } i=k, \ j \neq l \\ 0 & \text{ for } i \neq k \end{cases} \]

As in Exercise 1b), the marginal covariance matrix is \(\hat{\Sigma}_{\mathbf{Y}_i} = U_i\hat{Q}U_i^T + \hat{\Sigma}_i\). In this case \(U_i = (1,1,1)^T\) and \(\hat{\Sigma}_i = \hat{\sigma}^2 I_3\). From the output we have that \(\hat{Q} = \hat{\tau}_0^2 =\) 0.0017571 \(\approx 0.0018\) and \(\hat{\sigma}^2 =\) 0.5632943 \(\approx 0.563\). Thus we get

\[\hat{\Sigma}_{\mathbf{Y}_i} = \hat{\tau}_0^2 \begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix} + \hat{\sigma}^2 \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}\]

\[ = \begin{pmatrix} 0.0018+0.563 & 0.0018 & 0.0018 \\ 0.0018 & 0.0018+0.563 & 0.0018 \\ 0.0018 & 0.0018 & 0.0018+0.563 \end{pmatrix} \]

\[ = \begin{pmatrix} 0.5648 & 0.0018 & 0.0018 \\ 0.0018 & 0.5648 & 0.0018 \\ 0.0018 & 0.0018 & 0.5648 \end{pmatrix} \]

(diagonal elements all equal to \(\hat{\sigma}^2 + \hat{\tau}_0^2\) and off-diagonal elements all equal to \(\hat{\tau}_0^2\)).
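
The same calculation can be done in R (anticipating f); a short sketch using fit2 from b):

U_i <- matrix(1, nrow = 3, ncol = 1)
tau0sq <- c(VarCorr(fit2)$person)  # estimated tau_0^2
Sigma_i <- sigma(fit2)^2 * diag(3)  # estimated sigma^2 * I_3
Sigma_Yi <- tau0sq * U_i %*% t(U_i) + Sigma_i  # compound symmetry structure
Sigma_Yi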

Remark Skip (c), which will be covered in the lectures in week 2.

c) Explain how one can test the simultaneous significance of two fixed effects (e.g. age and educ) in LMMs.

Answer

The question involves the fixed effects. One way to do it is to fit two models:

  • one full model containing all the fixed effects, and
  • one nested model where the two effects age and educ are omitted,

and obtain the maximal values of the likelihood for the two models, call them \(L_{max, full}\) and \(L_{max, nested}\). The likelihood ratio test consists of comparing

\[-2 \log(L_{max, nested}/L_{max, full}) =-2(l_{max, nested}-l_{max, full})\]

to a \(\chi^2\)-distribution where the degrees of freedom equal the difference in the number of parameters between the two models, i.e., the number of restrictions, which is two in this case.

It is important that the ordinary maximum likelihood estimates are used. The restricted maximum likelihood method (REML) consists of basing the estimates on a linear transformation of the data. These transformations are different for the full and the nested model and involve unequal reductions of the data. Therefore it does not make sense to compare the REML likelihoods since they are based on different data.

An alternative is to use the Wald test. The estimator of the coefficients is approximately Gaussian with covariance matrix

\[\Sigma_{\hat{\beta}} = \left( \sum_{i = 1}^N (X_i^T \Sigma_i(\hat{\theta})^{-1}X_i) \right)^{-1}\]

where \(\Sigma_i(\theta)\) is the covariance matrix of \(\mathbf{Y}_i\) and \(\theta\) are the parameters that describe this covariance. The Wald statistic for the null hypothesis

\[H_0 : C\beta = d\]

with \(C\) an \(r \times p\)-matrix, where \(r\) is the number of restrictions in the hypothesis, is

\[(C\hat{\beta}-d)^T(C\hat{\Sigma}_{\hat{\beta}}C^T)^{-1}(C\hat{\beta}-d)\]

and is approximately \(\chi^2\)-distributed with \(r\) degrees of freedom under \(H_0\). In our case, \(r = 2\); working with the subvector of \(\hat{\beta}\) corresponding to age and educ, we can take \(C = I_2\) and \(d = (0,0)^T\).

# lrt:

# fit with ML (REML = FALSE): as explained above, REML likelihoods for models
# with different fixed effects are not comparable
fit_full <- lmer(lincm ~ age + cyear + educ + sex + (1 | person), data = psid2,
    REML = FALSE)
fit_nest <- lmer(lincm ~ cyear + sex + (1 | person), data = psid2, REML = FALSE)

lrt <- -2 * (c(logLik(fit_nest)) - c(logLik(fit_full)))
lrt
pchisq(lrt, 2, lower.tail = FALSE)  # reject H0


# wald:

C <- diag(2)
beta <- as.vector(summary(fit2)$coefficients[c(2, 4), 1])  # age and educ
sigma <- as.matrix(vcov(fit2)[c(2, 4), c(2, 4)])
d <- matrix(c(0, 0), ncol = 1)

# since d is a zero vector and C is the identity matrix, the general statistic
# wald <- t(C %*% beta - d) %*% solve(C %*% sigma %*% t(C)) %*% (C %*% beta - d)
# simplifies to
wald <- t(beta) %*% solve(sigma) %*% beta
wald

pchisq(wald, 2, lower.tail = FALSE)  # small p-value, reject H0
## [1] 6.925417
## [1] 0.03134475
##          [,1]
## [1,] 22.94555
##             [,1]
## [1,] 1.04097e-05

Note that you do not need any new information to find the Wald statistic; everything you need is in the summary. Calculating it was not expected on the exam.

beta  # both under 'Fixed effects' in the summary
sigma  # the square roots of the diagonal are under 'Fixed effects' in the summary (standard errors)
sqrt(diag(sigma))

# the covariances are trickier, but can be found by using that the correlation
# between age and educ is 0.201, found under 'Correlation of Fixed Effects',
# and then multiplying by the standard errors
0.201 * 0.015207 * 0.027582
## [1] -0.02092952  0.11634275
##               age         educ
## age  2.312500e-04 8.428403e-05
## educ 8.428403e-05 7.607833e-04
##        age       educ 
## 0.01520691 0.02758230 
## [1] 8.430733e-05

d) Describe how the random effects \(\gamma_{0i}\) can be predicted/estimated. What information is missing for you to calculate these?

Answer

The estimates of \(\pmb{\gamma}_{0i}\) are based on the conditional expectations \(\text{E}(\pmb{\gamma}_{0i} | \mathbf{Y}_1, \dots, \mathbf{Y}_N) = \text{E}(\pmb{\gamma}_{0i} | \mathbf{Y}_i)\), since \(\mathbf{Y}_1, \dots, \mathbf{Y}_N\) are independent and \(\pmb{\gamma}_{0i}\) only depends on \(\mathbf{Y}_i\). The joint distribution of \(\pmb{\gamma}_{0i}\) and \(\mathbf{Y}_i\) is a \((q + 1 + n_i)\)-dimensional multivariate normal with expectation and covariance matrix

\[\begin{pmatrix} 0 \\ X_i\beta \end{pmatrix} \text{ and } \begin{pmatrix} Q & QU_i^T \\ U_iQ & U_iQU_i^T + \Sigma_i \end{pmatrix} \]

It then follows from the properties of the multivariate normal distribution that

\[E(\pmb{\gamma}_{0i} | \mathbf{Y}_i) = 0 + QU_i^T(U_iQU_i^T + \Sigma_i)^{-1}(\mathbf{Y}_i - X_i\beta) \]

Hence \(\pmb{\gamma}_{0i}\) is estimated by

\[\hat{Q}U_i^T(U_i\hat{Q}U_i^T + \hat{\Sigma}_i)^{-1}(\mathbf{Y}_i - X_i\hat{\beta}) \]

where all estimates are the REML estimates.

U_i <- matrix(c(1, 1, 1), ncol = 1)
Q <- c(summary(fit2)$varcor$person)  # only a random intercept, so Q is a scalar (a 1x1 matrix)
beta <- as.matrix(fixef(fit2))
sigma_i <- sigma(fit2)^2 * diag(3)

# choosing i = 2
i <- 2
Y_i <- as.matrix(subset(psid2, person == i)$lincm)
# build X_i: intercept, then age, cyear, educ (columns 1, 7, 2 of psid2), then a sex dummy
X_i <- as.matrix(data.frame(int = c(1, 1, 1), subset(psid2, person == i)[, c(1, 7,
    2)], sex = ifelse(psid2$sex[psid2$person == i][1] == "M", c(1, 1, 1), c(0, 0,
    0))))


Q %*% t(U_i) %*% solve(U_i %*% Q %*% t(U_i) + sigma_i) %*% (Y_i - X_i %*% beta)
# can check and see if the estimate coincides with the estimate from lmer
ranef(fit2)$person[which(row.names(ranef(fit2)$person) == i), ]  # and it does
##             [,1]
## [1,] -0.00168172
## [1] -0.00168172

We need \(Y_i\) and \(X_i\); everything else can be found (with some calculations) from the summary.

In parts c) and d) it was not necessary to do any numerical calculations at the exam.

f) New: do c) and d) in R (after you have done c) and d) by hand!!!), i.e., do the numerical calculations (choose \(i = 2\) in d)). The Rmd file of this module contains the necessary code to download the dataset, modify it, and fit the full model. Alternatively, use purl to extract the code.

Answer See the R code included in the answers to c) and d) above.