Table 1 shows the results from a study comparing two injection plans for the neuroleptic preparation perphenazine decanoate (from P. Knudsen, L. B. Hansen, K. Højholdt, N. E. Larsen, Acta Psychiatrica Scandinavica, 1985).
A group of 19 psychotic patients was given injections every second week, while another group of 19 patients was given injections every third week. The patients were studied for six months, and the effect of the treatment was evaluated at the end. Clinical evaluations were done using a six-point scale called CGI (Clinical Global Impression), where a higher score means a worse state for the patient.
The 12 rows in Table 1 correspond to 12 different combinations of the three explanatory variables \(x_1\), \(x_2\) and \(x_3\):
\[ x_1 = \begin{cases} 0 \text{, if injections are given every second week} \\ 1 \text{, if injections are given every third week} \end{cases} \\ x_2 = \begin{cases} 0 \text{, if patient is female} \\ 1 \text{, if patient is male} \end{cases} \\ x_3 = \text{ CGI at beginning of treatment (initial CGI)} \]
Interval (x1) | Sex (x2) | Initial CGI (x3) | # with final CGI 0 (y0) | # with final CGI 1 (y1) | # with final CGI 2 (y2) |
---|---|---|---|---|---|
2 | F | 2 | 1 | 0 | 0 |
2 | F | 3 | 3 | 1 | 0 |
2 | F | 4 | 0 | 1 | 0 |
2 | M | 3 | 4 | 4 | 1 |
2 | M | 4 | 0 | 2 | 1 |
2 | M | 5 | 0 | 0 | 1 |
3 | F | 2 | 1 | 0 | 0 |
3 | F | 3 | 2 | 1 | 0 |
3 | F | 4 | 1 | 2 | 0 |
3 | M | 2 | 3 | 1 | 0 |
3 | M | 3 | 0 | 5 | 0 |
3 | M | 4 | 0 | 3 | 0 |
The corresponding responses are counts for each combination of explanatory variables:
\[ y_0 = \text{ number with (CGI = 0) after treatment} \\ y_1 = \text{ number with (CGI = 1) after treatment} \\ y_2 = \text{ number with (CGI = 2) after treatment} \]
Note that no patients had final CGI above 2 after the treatment. We use \(y_2\) as the reference category. Assume that the CGI for a patient with covariate vector \(\mathbf{x} = (x_1, x_2, x_3)\) takes response value \(j\), \(j = 0, 1, 2\), with probabilities
\[\pi_j = \text{Prob}(\text{CGI} = j | \mathbf{x}) \text{ for } j = 0, 1, 2.\]
The response \(\mathbf{y} = (y_0, y_1, y_2)\) for a row in the table is assumed to come from a multinomial distribution vector \(\mathbf{Y} = (Y_0, Y_1, Y_2)\) with probability vector \(\mathbf{\pi} = (\pi_0, \pi_1, \pi_2)\), where \(\mathbf{\pi}\) depends on \(\mathbf{x}\). Note that \(x_3\) is numeric, not categorical.
The response is ordinal and multinomial. The proportional odds model assumes a latent continuous variable underlying the CGI, and models the two cumulative logits with common slope parameters:
\[\log \left( \frac{\pi_0}{\pi_1 + \pi_2} \right) = \theta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3\] \[\log \left( \frac{\pi_0 + \pi_1}{\pi_2} \right) = \theta_1 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3\]
Since the inverse of the logit function is the logistic function (sometimes called the expit function), we have:
\[\pi_0 = \frac{e^{\theta_0 + \pmb{x}^T \pmb{\beta}}}{1 + e^{\theta_0 + \pmb{x}^T \pmb{\beta}}}\]
\[\pi_0 + \pi_1 = \frac{e^{\theta_1 + \pmb{x}^T \pmb{\beta}}}{1 + e^{\theta_1 + \pmb{x}^T \pmb{\beta}}}\]
\[\implies \pi_1 = \frac{e^{\theta_1 + \pmb{x}^T \pmb{\beta}}}{1 + e^{\theta_1 + \pmb{x}^T \pmb{\beta}}} - \pi_0\]
\[\pi_2 = 1 - \pi_0 - \pi_1\]
The cumulative odds are then
\[\frac{\text{Prob}(\text{CGI} \leq 0 | \mathbf{x})}{\text{Prob}(\text{CGI} > 0 | \mathbf{x})} = \frac{\pi_0}{\pi_1 + \pi_2} = e^{\theta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3}\]
\[\frac{\text{Prob}(\text{CGI} \leq 1 | \mathbf{x})}{\text{Prob}(\text{CGI} > 1 | \mathbf{x})} = \frac{\pi_0 + \pi_1}{\pi_2} = e^{\theta_1 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3}\]
We can see that if \(x_3\) increases by 1 (with \(x_1\) and \(x_2\) unchanged), we get an extra \(+\beta_3\) in the exponent, which means that the cumulative odds are multiplied by \(e^{\beta_3}\).
From \(\text{Prob}(\text{CGI} \leq 0 | \mathbf{x})/\text{Prob}(\text{CGI} > 0 | \mathbf{x})\) we see that increasing \(x_1\) or \(x_2\) will in the same way multiply the odds by \(e^{\beta_1}\) and \(e^{\beta_2}\), respectively. If \(e^{\beta_1} < 1\), going from treatment every second week (\(x_1 = 0\)) to every third week (\(x_1 = 1\)) reduces the odds of the “good” states 0 and 1. If \(e^{\beta_2} < 1\), males have lower odds of being in states 0 and 1.
Note that we want as low a state (i.e., \(y\)) as possible.
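To make the later `R` output concrete, here is a minimal sketch of how Table 1 can be entered and the proportional odds model fitted with `VGAM`; the data frame name `data2` matches the `Call` in the summary further down, but the exact entry format is an assumption.

library(VGAM)
# Table 1 entered row by row; x2 is a factor so its coefficient is named x2M
data2 <- data.frame(
    x1 = rep(c(0, 1), each = 6),                      # 0: every 2nd week, 1: every 3rd week
    x2 = factor(rep(rep(c("F", "M"), each = 3), 2)),  # sex
    x3 = c(2, 3, 4, 3, 4, 5, 2, 3, 4, 2, 3, 4),       # initial CGI
    y0 = c(1, 3, 0, 4, 0, 0, 1, 2, 1, 3, 0, 0),
    y1 = c(0, 1, 1, 4, 2, 0, 0, 1, 2, 1, 5, 3),
    y2 = c(0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0))
# proportional odds: parallel = TRUE forces common slopes for both cumulative logits
fit <- vglm(cbind(y0, y1, y2) ~ x1 + x2 + x3, family = cumulative(parallel = TRUE),
    data = data2)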
Saturated model: Each row in the table has its own \(\pmb{\pi}\)-vector \([\pi_{i0}, \pi_{i1}, \pi_{i2}]\). The model will thus have \(12 \cdot 2 = 24\) free parameters since we have \(\pi_{i0} + \pi_{i1} + \pi_{i2} = 1\).
The deviance is given by
\[D = -2(l(\text{candidate model}) - l(\text{saturated model})).\]
From the module pages:
\[D = 2 \sum_{i=1}^{12} \sum_{j=0}^2 y_{ij} \log\left(\frac{y_{ij}}{\hat{y}_{ij}}\right)\] where \(\hat{y}_{ij} = n_i \hat{\pi}_{ij}\), and \(\pmb{y}_i = [y_{i0}, y_{i1}, y_{i2}]\) is the response for the \(i\)th row in the table. \(\hat{\pi}_{ij}\) is the estimate of the probability \(\pi_{ij}\) under the candidate model.
Degrees of freedom for the deviance = number of free parameters in the saturated model minus number of free parameters in the candidate model. Here the candidate model \(x_1 + x_2 + x_3\) has \(2 + 3 = 5\) parameters (two intercepts and three slopes), giving \(24 - 5 = 19\) degrees of freedom.
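As a sketch, this formula can be evaluated by hand for the \(x_1 + x_2 + x_3\) model (using `fit` and `data2` from the sketch above, with the convention \(0 \cdot \log 0 = 0\)):

# fitted cell counts n_i * pi_hat_ij under the candidate model
yhat <- fitted(fit) * rowSums(data2[, c("y0", "y1", "y2")])
yobs <- as.matrix(data2[, c("y0", "y1", "y2")])
# deviance; terms with y_ij = 0 contribute 0
D <- 2 * sum(ifelse(yobs > 0, yobs * log(yobs/yhat), 0))
D  # should agree with deviance(fit), about 10.56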
Below you can see the deviance for all proportional odds models that contain the variable \(x_3\) (initial CGI). The model formulas work in the same way as for `lm` and `glm`.
Model | Deviance |
---|---|
cbind(y0, y1, y2) ~ x3 | 17.68 |
cbind(y0, y1, y2) ~ x1 + x3 | 17.66 |
cbind(y0, y1, y2) ~ x2 + x3 | 10.64 |
cbind(y0, y1, y2) ~ x1 * x3 | 14.43 |
cbind(y0, y1, y2) ~ x2 * x3 | 10.63 |
cbind(y0, y1, y2) ~ x1 + x2 + x3 | 10.56 |
cbind(y0, y1, y2) ~ x1 * x2 + x3 | 10.33 |
cbind(y0, y1, y2) ~ x1 * x3 + x2 | 8.52 |
cbind(y0, y1, y2) ~ x1 + x2 * x3 | 10.51 |
cbind(y0, y1, y2) ~ x1 * x2 + x1 * x3 | 8.33 |
cbind(y0, y1, y2) ~ x1 * x2 + x2 * x3 | 10.31 |
cbind(y0, y1, y2) ~ x1 * x3 + x2 * x3 | 8.47 |
cbind(y0, y1, y2) ~ x1 * x2 + x1 * x3 + x2 * x3 | 8.28 |
cbind(y0, y1, y2) ~ x1 * x2 * x3 | 8.28 |
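A sketch of how this table could have been generated; the list name `allmodels` and the ordering are assumptions, chosen so that `allmodels[[6]]`, used further down, is the `x1 + x2 + x3` model:

# right-hand sides in the same order as the table above
rhs <- c("x3", "x1 + x3", "x2 + x3", "x1 * x3", "x2 * x3", "x1 + x2 + x3",
    "x1 * x2 + x3", "x1 * x3 + x2", "x1 + x2 * x3", "x1 * x2 + x1 * x3",
    "x1 * x2 + x2 * x3", "x1 * x3 + x2 * x3", "x1 * x2 + x1 * x3 + x2 * x3",
    "x1 * x2 * x3")
allmodels <- lapply(rhs, function(f) vglm(as.formula(paste("cbind(y0, y1, y2) ~", f)),
    family = cumulative(parallel = TRUE), data = data2))
round(sapply(allmodels, deviance), 2)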
What does `cbind(y0, y1, y2) ~ x1 + x2*x3` mean? (OBS: Ask if you are not sure before moving on!)

`cbind(y0, y1, y2) ~ x1 + x2*x3` means that we have a model with the covariates `x1`, `x2`, `x3` and `x2*x3` (`x2` times `x3`).
Consider the model `x1*x2 + x1*x3`. How many parameters are in this model? How many degrees of freedom for the deviance?
`x1*x2 + x1*x3` has the linear predictor (all \(x\)'s are vectors here)
\[\theta_j + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_1 x_2 + \beta_5 x_1 x_3\]
Note that this is what we put on the right-hand side of \(\log \left( \frac{\pi_0}{\pi_1 + \pi_2} \right)\) and \(\log \left( \frac{\pi_0 + \pi_1}{\pi_2} \right)\) from a). This model has 7 parameters (\(\beta_{1:5}\) and two intercept parameters), i.e. \(24 - 7 = 17\) degrees of freedom for the deviance.
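A quick sketch to verify the count by fitting this model (reusing `data2` from the sketch above):

mod <- vglm(cbind(y0, y1, y2) ~ x1 * x2 + x1 * x3, family = cumulative(parallel = TRUE),
    data = data2)
length(coef(mod))  # 7 parameters, so 24 - 7 = 17 df for the deviance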
Consider `x2 + x3`, `x1 + x2 + x3`, `x1*x2 + x3` and `x1*x2 + x1*x3` as candidates for “the best model”. Which of these models would you choose based on the deviances? Reason using hypothesis testing (you have to choose one model for the null hypothesis, which?).

The number of parameters and degrees of freedom in each of the four models (`x2 + x3`, `x1 + x2 + x3`, `x1*x2 + x3` and `x1*x2 + x1*x3`) are listed below:
Model | # parameters | df |
---|---|---|
x2 + x3 | 4 | 20 |
x1 + x2 + x3 | 5 | 19 |
x1*x2 + x3 | 6 | 18 |
x1*x2 + x1*x3 | 7 | 17 |
We are testing \(H_0\): `x2 + x3` against each of the three larger models:

- `x1 + x2 + x3`: \(\Delta D = 10.64 - 10.56 = 0.08\) on 1 df
- `x1*x2 + x3`: \(\Delta D = 10.64 - 10.33 = 0.31\) on 2 df
- `x1*x2 + x1*x3`: \(\Delta D = 10.64 - 8.33 = 2.31\) on 3 df

The deviance difference is asymptotically \(\chi^2\)-distributed with the difference in df as its degrees of freedom. The critical value at the 5 % significance level is 3.841 for 1 df, and even larger for more df (see below), so none of the differences are significant and we choose the model `x2 + x3`. This means that `x1` does not contribute to the model, i.e., the treatment interval does not matter.
# critical values (95th percentiles) of the chi-squared distribution for
# various degrees of freedom
qchisq(0.95, 1:4)
## [1] 3.841459 5.991465 7.814728 9.487729
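Equivalently, a sketch of the corresponding p-values for the three deviance differences above:

# Delta D = 0.08, 0.31, 2.31 on 1, 2, 3 df; all p-values are far above 0.05
pchisq(c(0.08, 0.31, 2.31), df = 1:3, lower.tail = FALSE)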
Below you see a (slightly edited) `R` summary of the `x1 + x2 + x3` model. Assume we still use the model from a).
Call:
vglm(formula = x, family = cumulative(parallel = TRUE), data = data2)
Pearson residuals:
Min 1Q Median 3Q Max
logit(P[Y<=1]) -1.294 -0.33737 -0.08605 0.1788 1.211
logit(P[Y<=2]) -1.442 -0.08222 0.12144 0.2428 1.100
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept):1 8.0355 2.5079 3.204 0.00136 **
(Intercept):2 12.4324 3.1752 3.916 9.02e-05 ***
x1 -0.2199 0.7561 -0.291 0.77114
x2M -2.1576 0.8875 -2.431 0.01506 *
x3 -2.2725 0.6985 -3.253 0.00114 **
---
Number of linear predictors: 2
Names of linear predictors: logit(P[Y<=1]), logit(P[Y<=2])
Residual deviance: 10.5552 on ?? degrees of freedom
Log-likelihood: -11.6754 on ?? degrees of freedom
And this is the estimated covariance matrix for the estimators:
## (Intercept):1 (Intercept):2 x1 x2M x3
## (Intercept):1 6.2898 7.6696 -0.6459 -1.1575 -1.6602
## (Intercept):2 7.6696 10.0816 -0.7281 -1.4997 -2.0776
## x1 -0.6459 -0.7281 0.5716 0.0921 0.0986
## x2M -1.1575 -1.4997 0.0921 0.7877 0.1982
## x3 -1.6602 -2.0776 0.0986 0.1982 0.4879
Use the `R` summary to estimate \(e^{\beta_k}\) for \(k = 1, 2, 3\). Estimates:
exp(coefficients(allmodels[[6]])[3:5])
## x1 x2M x3
## 0.8025806 0.1156044 0.1030548
Approximate confidence interval (approximate because we assume normality of the \(\hat\beta\)'s); we choose a 95 % confidence level:
\[\hat{\beta}_1 \pm 1.96 \cdot \sqrt{\widehat{\text{Var}}(\hat{\beta}_1)},\]
and exponentiating the endpoints gives the interval for \(e^{\beta_1}\):
exp(coefficients(allmodels[[6]])[3] + qnorm(0.975) * sqrt(vcov(allmodels[[6]])[3,
3]) * c(-1, 1))
## [1] 0.1823578 3.5322631
This confidence interval contains 1, so we cannot reject \(H_0\): \(e^{\beta_1} = 1\) (i.e., \(\beta_1 = 0\)). This means that the interval of treatment does not matter for the final CGI levels. This is the same conclusion as the one from d).
We want the estimated probability of final CGI equal to 0 for a female patient given injections every second week, with initial CGI equal to 5 (i.e., \(x_1 = 0\), \(x_2 = 0\), \(x_3 = 5\)).
From a):
\[\hat{\pi}_0 = \frac{e^{\hat{\theta}_0 + 0 \hat{\beta}_1 + 0\hat{\beta}_2 + 5\hat{\beta}_3}}{1 + e^{\hat{\theta}_0 + 0 \hat{\beta}_1 + 0\hat{\beta}_2 + 5\hat{\beta}_3}} = \frac{e^{\hat{\theta}_0 + 5\hat{\beta}_3}}{1 + e^{\hat{\theta}_0 + 5\hat{\beta}_3}}\]
Numbers:
pi0hat <- exp(sum(coefficients(allmodels[[6]]) * c(1, 0, 0, 0, 5)))/(1 + exp(sum(coefficients(allmodels[[6]]) *
c(1, 0, 0, 0, 5))))
pi0hat
## [1] 0.03465839
Standard deviations are found by Taylor expanding \(\hat{\pi}_0\) around \((\theta_0, \beta_3)\), i.e. the delta method (see e.g. https://en.wikipedia.org/wiki/Taylor_series#Taylor_series_in_several_variables). Let
\[f(x, y) = \frac{e^{x + 5y}}{1 + e^{x + 5y}}\]
Then \[ \frac{\partial f}{\partial x} = \frac{e^{x + 5y}}{(1 + e^{x + 5y})^2} = f(1-f) \text{ and } \frac{\partial f}{\partial y} = \frac{5e^{x + 5y}}{(1 + e^{x + 5y})^2} = 5f(1-f)\]
Thus, to first order, \(\hat{\pi}_0 \approx \pi_0 + \pi_0(1 - \pi_0)(\hat{\theta}_0 - \theta_0) + 5\pi_0(1 - \pi_0)(\hat{\beta}_3 - \beta_3)\), such that the variance is
\[\text{Var}(\hat{\pi}_0) = \pi_0^2 (1-\pi_0)^2 \left[ \text{Var}(\hat{\theta}_0) + 25 \text{Var}(\hat{\beta}_3) + 10 \text{Cov}(\hat{\theta}_0, \hat{\beta}_3) \right]\]
We find the estimate by plugging in \(\hat{\pi}_0\) for \(\pi_0\), and by reading the variances and covariances off the `R` print. Then we get:
pi0hat * (1 - pi0hat) * sqrt(vcov(allmodels[[6]])[1, 1] + 25 * vcov(allmodels[[6]])[5,
5] + 10 * vcov(allmodels[[6]])[1, 5])
## [1] 0.04594256
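As a cross-check, a sketch of the same standard deviation via the full delta method: the gradient of \(\hat{\pi}_0\) with respect to all five parameters has zeros for \((\text{Intercept}){:}2\), `x1` and `x2M`, since \(\pi_0\) only involves \(\theta_0\) and we set \(x_1 = x_2 = 0\).

# gradient in the parameter order ((Intercept):1, (Intercept):2, x1, x2M, x3)
grad <- pi0hat * (1 - pi0hat) * c(1, 0, 0, 0, 5)
# delta method: sqrt(g^T Sigma g), should reproduce the number above
sqrt(t(grad) %*% vcov(allmodels[[6]]) %*% grad)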
We will analyse an extended version of the alligator data, where the gender of the alligator is also included.
In the data below the following columns are given: `lake`, `gender`, `size`, and the counts `y1`–`y5` for the five food categories (`y1` = fish, which is used as the base level below, then invertebrate, reptile, bird and other).
First get the data:
library(VGAM)
data2 = "http://www.stat.ufl.edu/~aa/glm/data/Alligators2.dat"
ali2 = read.table(data2, header = T)
head(ali2)
## lake gender size y1 y2 y3 y4 y5
## 1 1 1 1 7 1 0 0 5
## 2 1 1 0 4 0 0 1 2
## 3 1 0 1 16 3 2 2 3
## 4 1 0 0 3 0 1 2 3
## 5 2 1 1 2 2 0 0 1
## 6 2 1 0 13 7 6 0 0
ali2$lake <- factor(ali2$lake)
ali2$size <- factor(ali2$size)
ali2$gender <- factor(ali2$gender)
fit = vglm(cbind(y2, y3, y4, y5, y1) ~ lake + size + gender, family = multinomial,
data = ali2)
Now fit the 7 candidate models to investigate:
all <- list(
    fit = vglm(cbind(y2, y3, y4, y5, y1) ~ lake + size + gender,
        family = multinomial, data = ali2),
    fit.lake = vglm(cbind(y2, y3, y4, y5, y1) ~ lake,
        family = multinomial, data = ali2),
    fit.size = vglm(cbind(y2, y3, y4, y5, y1) ~ size,
        family = multinomial, data = ali2),
    fit.gender = vglm(cbind(y2, y3, y4, y5, y1) ~ gender,
        family = multinomial, data = ali2),
    fit.lake.size = vglm(cbind(y2, y3, y4, y5, y1) ~ lake + size,
        family = multinomial, data = ali2),
    fit.lake.gender = vglm(cbind(y2, y3, y4, y5, y1) ~ lake + gender,
        family = multinomial, data = ali2),
    fit.gender.size = vglm(cbind(y2, y3, y4, y5, y1) ~ size + gender,
        family = multinomial, data = ali2))
(AICall <- lapply(all, AIC))
## $fit
## [1] 199.6089
##
## $fit.lake
## [1] 201.9464
##
## $fit.size
## [1] 221.7073
##
## $fit.gender
## [1] 227.0375
##
## $fit.lake.size
## [1] 196.089
##
## $fit.lake.gender
## [1] 204.2444
##
## $fit.gender.size
## [1] 228.4214
The best model has the lowest AIC, so it is the model with both lake and size.
We can test if the residual deviance is larger than would be expected if the model fitted well:
best = all[[which(AICall == min(unlist(AICall)))]]
pchisq(deviance(best), df.residual(best), lower.tail = FALSE)
## [1] 0.05729729
The p-value is just above 0.05, so any evidence of model misfit is (at best) weak. The deviance is only about 1.36 times larger than its expected value, so it is not worryingly large.
(Question for you: how much would the standard errors change if we accounted for this extra variation?)
Have fun…
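A sketch of one way to gauge this, assuming a simple quasi-likelihood style adjustment where standard errors are inflated by the square root of the dispersion \(\hat\varphi = D/\text{df}\):

# estimated dispersion: deviance relative to its degrees of freedom
phihat <- deviance(best) / df.residual(best)
sqrt(phihat)  # about 1.16, so the standard errors would grow by roughly 16 %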
best = all[[which(AICall == min(unlist(AICall)))]]
(Summ <- summary(best))
##
## Call:
## vglm(formula = cbind(y2, y3, y4, y5, y1) ~ lake + size, family = multinomial,
## data = ali2)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept):1 -3.0655 0.6485 -4.727 2.28e-06 ***
## (Intercept):2 -2.0646 0.7296 -2.830 0.00466 **
## (Intercept):3 -1.6166 0.6401 -2.526 0.01155 *
## (Intercept):4 -0.9041 0.4666 -1.938 0.05266 .
## lake2:1 2.9195 0.7032 4.152 3.30e-05 ***
## lake2:2 1.1457 0.8360 1.370 0.17055
## lake2:3 -1.2545 1.1961 -1.049 0.29428
## lake2:4 -0.8960 0.7625 -1.175 0.23998
## lake3:1 2.7219 0.6700 4.063 4.85e-05 ***
## lake3:2 1.6939 0.7843 2.160 0.03080 *
## lake3:3 0.5238 0.7804 0.671 0.50211
## lake3:4 0.6064 0.5584 1.086 0.27746
## lake4:1 1.6417 0.6116 2.684 0.00727 **
## lake4:2 -1.2422 1.1859 -1.047 0.29489
## lake4:3 -0.6408 0.7782 -0.823 0.41025
## lake4:4 -0.8563 0.5573 -1.537 0.12436
## size1:1 1.2966 0.4228 3.067 0.00216 **
## size1:2 -0.3604 0.6458 -0.558 0.57680
## size1:3 -0.2609 0.6444 -0.405 0.68563
## size1:4 0.0956 0.4591 0.208 0.83507
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Names of linear predictors: log(mu[,1]/mu[,5]), log(mu[,2]/mu[,5]),
## log(mu[,3]/mu[,5]), log(mu[,4]/mu[,5])
##
## Residual deviance: 59.7085 on 44 degrees of freedom
##
## Log-likelihood: -78.0445 on 44 degrees of freedom
##
## Number of Fisher scoring iterations: 6
##
## No Hauck-Donner effect found in any of the estimates
##
##
## Reference group is level 5 of the response
There are a lot of parameters, so we can plot the interesting ones:
best = all[[which(AICall == min(unlist(AICall)))]]
# keep only the lake and size coefficients (drop the intercepts)
Summ1 <- Summ@coef3[!grepl("\\(", rownames(Summ@coef3)), ]
# point estimates with approximate 95 % Wald confidence limits
Ests <- data.frame(Estimate = Summ1[, "Estimate"], L95 = Summ1[, "Estimate"] - 1.96 *
    Summ1[, "Std. Error"], U95 = Summ1[, "Estimate"] + 1.96 * Summ1[, "Std. Error"],
    Variable = gsub(":.*", "", rownames(Summ1)), Level = gsub(".*:", "", rownames(Summ1)))
# vertical plotting positions, with a gap between variables
Ests$At.x = 1:nrow(Summ1) + as.numeric(as.factor(Ests$Variable))
Vars <- tapply(Ests$At.x, list(Ests$Variable), mean)
# response categories for linear predictors 1-4 (fish is the reference)
Levels <- c("invertebrate", "reptile", "bird", "other")
plot(Ests[, "Estimate"], Ests$At.x, xlim = range(Ests[, c("L95", "U95")]), ann = FALSE,
    col = Ests$Level, yaxt = "n")
segments(Ests[, "L95"], Ests$At.x, Ests[, "U95"], Ests$At.x, col = Ests$Level)
axis(1)
axis(2, labels = names(Vars), at = Vars)
mtext("Coefficient", 1, line = 3)
legend(2.4, 20, legend = Levels, fill = 1:4)
abline(v = 0, col = "grey", lty = 3)  # zero reference line
Remember, fish is the base level, so (for example) more invertebrates than fish are eaten in lakes 2–4 compared to lake 1. Similarly, adults (size = 1) seem to eat more invertebrates than fish, compared to juveniles.
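To read these effects on the odds scale, a small sketch: exponentiating the invertebrate-vs-fish coefficients (the `:1` linear predictor) gives estimated odds ratios relative to lake 1 and size 0.

# odds ratios for invertebrate vs. fish, relative to the reference levels
exp(coef(best)[c("lake2:1", "lake3:1", "lake4:1", "size1:1")])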
None found at NTNU or UiO - except the IL-problem.
install.packages(c("VGAM", "ggplot2", "statmod", "knitr"))