The response is ordinal (an ordered multinomial). The proportional odds model assumes a latent continuous variable that measures the CGI.
\[\log \left( \frac{\pi_0}{\pi_1 + \pi_2} \right) = \theta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3\] \[\log \left( \frac{\pi_0 + \pi_1}{\pi_2} \right) = \theta_1 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3\]
Since the inverse of the logit function is the logistic function (sometimes called the expit function), we have:
\[\pi_0 = \frac{e^{\theta_0 + \pmb{x}^T \pmb{\beta}}}{1 + e^{\theta_0 + \pmb{x}^T \pmb{\beta}}}\]
\[\pi_0 + \pi_1 = \frac{e^{\theta_1 + \pmb{x}^T \pmb{\beta}}}{1 + e^{\theta_1 + \pmb{x}^T \pmb{\beta}}}\]
\[\implies \pi_1 = \frac{e^{\theta_1 + \pmb{x}^T \pmb{\beta}}}{1 + e^{\theta_1 + \pmb{x}^T \pmb{\beta}}} - \pi_0\]
\[\pi_2 = 1 - \pi_0 - \pi_1\]
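To make the mapping from the two cumulative logits to the three category probabilities concrete, here is a minimal R sketch using plogis (the expit function); all parameter and covariate values are hypothetical, chosen only for illustration.

# hypothetical parameters, for illustration only
theta <- c(-1, 1)            # intercepts theta_0 < theta_1
beta <- c(-0.2, -2.1, -2.3)  # beta_1, beta_2, beta_3
x <- c(0, 0, 1)              # covariate vector (x_1, x_2, x_3)

cum <- plogis(theta + sum(x * beta))  # P(CGI <= 0) and P(CGI <= 1)
pi0 <- cum[1]
pi1 <- cum[2] - cum[1]
pi2 <- 1 - cum[2]
c(pi0, pi1, pi2)  # the three category probabilities, summing to 1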
\[\frac{\text{Prob}(\text{CGI} \leq 0 | \mathbf{x})}{\text{Prob}(\text{CGI} > 0 | \mathbf{x})} = \frac{\pi_0}{\pi_1 + \pi_2} = e^{\theta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3}\]
\[\frac{\text{Prob}(\text{CGI} \leq 1 | \mathbf{x})}{\text{Prob}(\text{CGI} > 1 | \mathbf{x})} = \frac{\pi_0 + \pi_1}{\pi_2} = e^{\theta_1 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3}\]
We can see that if \(x_3\) increases by 1 (and \(x_1\) and \(x_2\) are unchanged), we get an extra \(+\beta_3\) in the exponent, which means that the cumulative odds ratio is multiplied by \(e^{\beta_3}\).
Interpretation: If \(e^{\beta_3} < 1\) (i.e. \(\beta_3 < 0\)), increasing the initial CGI reduces the odds of being in state 0 or 1 after the treatment (and increases the odds if \(\beta_3 > 0\)).
From \(\text{Prob}(\text{CGI} \leq 0 | \mathbf{x})/\text{Prob}(\text{CGI} > 0 | \mathbf{x})\) we see that increasing \(x_1\) and \(x_2\) will in the same way lead to multiplying by \(e^{\beta_1}\) and \(e^{\beta_2}\), respectively. If \(e^{\beta_1} < 1\), going from treatment every second week (\(x_1 = 0\)) to every third week (\(x_1 = 1\)) gives a reduction in the odds for the “good” states 0 and 1. If \(e^{\beta_2} < 1\), males will have lower odds of being in states 0 and 1.
Note that we want as low a state (i.e., \(y\)) as possible.
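A quick numerical check with the hypothetical values above confirms the claim: increasing \(x_3\) by 1 multiplies the cumulative odds by exactly \(e^{\beta_3}\).

odds <- function(x) {  # cumulative odds P(CGI <= 0)/P(CGI > 0)
  p <- plogis(theta[1] + sum(x * beta))
  p/(1 - p)
}
odds(c(0, 0, 2))/odds(c(0, 0, 1))  # equals exp(beta[3])
exp(beta[3])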
Saturated model: Each row in the table has its own \(\pmb{\pi}\)-vector \([\pi_{i0}, \pi_{i1}, \pi_{i2}]\). The model will thus have \(12 \cdot 2 = 24\) free parameters since we have \(\pi_{i0} + \pi_{i1} + \pi_{i2} = 1\).
The deviance is given by
\[D = -2(l(\text{candidate model}) - l(\text{saturated model})).\]
From the module pages:
\[D = 2 \sum_{i=1}^{12} \sum_{j=0}^2 y_{ij} \log\left(\frac{y_{ij}}{\hat{y}_{ij}}\right)\] where \(\hat{y}_{ij} = n_i \hat{\pi}_{ij}\), and \(\pmb{y}_i = [y_{i0}, y_{i1}, y_{i2}]\) is the response for the \(i\)th line in the table. \(\hat{\pi}_{ij}\) is the estimate of the probability \(\pi_{ij}\) under the candidate model.
Degrees of freedom for the deviance = number of free parameters in the saturated model minus number of free parameters in the candidate model = \(24 - 5 = 19\).
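As a sketch of how this deviance could be computed by hand, assuming hypothetical 12 x 3 matrices y of observed counts and yhat of fitted counts \(n_i \hat{\pi}_{ij}\):

# D = 2 * sum of y * log(y/yhat) over all cells, with the convention 0 * log(0) = 0
deviance_manual <- function(y, yhat) {
  2 * sum(ifelse(y == 0, 0, y * log(y/yhat)))
}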
`cbind(y0, y1, y2) ~ x1 + x2*x3` means that we have a model with the covariates `x1`, `x2`, `x3` and `x2:x3` (`x2` times `x3`).
`x1*x2 + x1*x3` has the linear predictor (all \(x\)'s are vectors here)
\[\theta_j + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_1 x_2 + \beta_5 x_1 x_3\]
Note that this is what we put on the right-hand side of \(\log \left( \frac{\pi_0}{\pi_1 + \pi_2} \right)\) and \(\log \left( \frac{\pi_0 + \pi_1}{\pi_2} \right)\) from a). This model has 7 parameters (\(\beta_{1:5}\) and two intercept parameters), i.e. \(24 - 7 = 17\) degrees of freedom for the deviance.
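To see how R expands such a formula into model-matrix columns, one can inspect a small hypothetical data frame (the covariate values are arbitrary):

# hypothetical covariate values, just to show the column expansion
df <- data.frame(x1 = c(0, 1), x2 = c(0, 1), x3 = c(2, 5))
colnames(model.matrix(~ x1 * x2 + x1 * x3, data = df))
# gives (Intercept), x1, x2, x3, x1:x2, x1:x3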
The number of parameters and degrees of freedom in each of the four models (`x2 + x3`, `x1 + x2 + x3`, `x1*x2 + x3` and `x1*x2 + x1*x3`) are listed below:
model | parameters | df |
---|---|---|
`x2 + x3` | 4 | 20 |
`x1 + x2 + x3` | 5 | 19 |
`x1*x2 + x3` | 6 | 18 |
`x1*x2 + x1*x3` | 7 | 17 |
We are testing \(H_0\): `x2 + x3` against each of the three larger models:

- `x1 + x2 + x3`: \(\Delta D = 10.64 - 10.56 = 0.08\)
- `x1*x2 + x3`: \(\Delta D = 10.64 - 8.52 = 2.12\)
- `x1*x2 + x1*x3`: \(\Delta D = 10.64 - 8.33 = 2.23\)

The difference in deviance is asymptotically \(\chi^2\)-distributed with degrees of freedom equal to the difference in the number of parameters. The critical value at the 5 % significance level is 3.841 for 1 df, and even larger for more df (see below), so none of the differences are significant and we choose the model `x2 + x3`. This means that `x1` does not contribute to the model, i.e., the interval of treatment does not matter.
# critical values at the 5 % significance level (i.e. 95 % quantiles)
# of the chi-squared distribution for various degrees of freedom
qchisq(0.95, 1:4)
## [1] 3.841459 5.991465 7.814728 9.487729
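Equivalently, we can compute approximate p-values for the three deviance differences above (with df 1, 2 and 3, the differences in the number of parameters from the table):

pchisq(c(0.08, 2.12, 2.23), df = c(1, 2, 3), lower.tail = FALSE)
# all p-values are well above 0.05, consistent with keeping x2 + x3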
Estimated odds ratios \(e^{\hat{\beta}_1}\), \(e^{\hat{\beta}_2}\) and \(e^{\hat{\beta}_3}\):
exp(coefficients(allmodels[[6]])[3:5])
## x1 x2M x3
## 0.8025806 0.1156044 0.1030548
Approximate confidence interval (approximate because the normality of the \(\hat{\beta}\)'s is only asymptotic); we choose a 95 % confidence level:
\[\hat{\beta}_1 \pm 1.96 \cdot \sqrt{\text{Var}(\hat{\beta}_1)}\]
exp(coefficients(allmodels[[6]])[3] + qnorm(0.975) * sqrt(vcov(allmodels[[6]])[3,
3]) * c(-1, 1))
## [1] 0.1823578 3.5322631
This confidence interval contains 1, so we cannot reject \(H_0\): \(e^{\beta_1} = 1\) (i.e., \(\beta_1 = 0\)). This means that the interval of treatment does not matter for the final CGI levels. This is the same conclusion as the one from d).
Probability of final CGI equal to 0 for a patient with \(x_1 = 0\) (treatment every second week), \(x_2 = 0\) (female) and initial CGI \(x_3 = 5\):
From a):
\[\hat{\pi}_0 = \frac{e^{\hat{\theta}_0 + 0 \hat{\beta}_1 + 0\hat{\beta}_2 + 5\hat{\beta}_3}}{1 + e^{\hat{\theta}_0 + 0 \hat{\beta}_1 + 0\hat{\beta}_2 + 5\hat{\beta}_3}} = \frac{e^{\hat{\theta}_0 + 5\hat{\beta}_3}}{1 + e^{\hat{\theta}_0 + 5\hat{\beta}_3}}\]
Numbers:
pi0hat <- exp(sum(coefficients(allmodels[[6]]) * c(1, 0, 0, 0, 5)))/(1 +
exp(sum(coefficients(allmodels[[6]]) * c(1, 0, 0, 0, 5))))
pi0hat
## [1] 0.03465839
The standard deviation is found by a first-order Taylor expansion (the delta method) of \(\hat{\pi}_0\) around \((\theta_0, \beta_3)\) (see e.g. https://en.wikipedia.org/wiki/Taylor_series#Taylor_series_in_several_variables). Let
\[f(x, y) = \frac{e^{x + 5y}}{1 + e^{x + 5y}}\]
Then \[ \frac{\partial f}{\partial x} = \frac{e^{x + 5y}}{(1 + e^{x + 5y})^2} = f(1-f) \quad \text{and} \quad \frac{\partial f}{\partial y} = \frac{5e^{x + 5y}}{(1 + e^{x + 5y})^2} = 5f(1-f)\]
Thus \(\hat{\pi}_0 \approx \pi_0 + \pi_0(1 - \pi_0)(\hat{\theta}_0 - \theta_0) + 5\pi_0(1 - \pi_0)(\hat{\beta}_3 - \beta_3)\), such that the approximate variance is
\[\text{Var}(\hat{\pi}_0) = \pi_0^2 (1-\pi_0)^2 \left[ \text{Var}(\hat{\theta}_0) + 25 \text{Var}(\hat{\beta}_3) + 10 \text{Cov}(\hat{\theta}_0, \hat{\beta}_3) \right]\]
We find the estimate by plugging in \(\hat{\pi}_0\) for \(\pi_0\), and by reading the variances and covariances from the `R` output. Then we get:
pi0hat * (1 - pi0hat) * sqrt(vcov(allmodels[[6]])[1, 1] + 25 * vcov(allmodels[[6]])[5,
5] + 10 * vcov(allmodels[[6]])[1, 5])
## [1] 0.04594256
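Equivalently, the delta-method standard deviation is \(\sqrt{\nabla f^T \hat{\Sigma} \nabla f}\) in matrix form, where \(\nabla f = \hat{\pi}_0(1 - \hat{\pi}_0) \cdot (1, 0, 0, 0, 5)^T\) is the gradient with respect to all five parameters (zeros for the coefficients that do not enter \(\hat{\pi}_0\)):

grad <- pi0hat * (1 - pi0hat) * c(1, 0, 0, 0, 5)
sqrt(t(grad) %*% vcov(allmodels[[6]]) %*% grad)  # same value as above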
library(VGAM)
data2 = "http://www.stat.ufl.edu/~aa/glm/data/Alligators2.dat"
ali2 = read.table(data2, header = T)
ali2
## lake gender size y1 y2 y3 y4 y5
## 1 1 1 1 7 1 0 0 5
## 2 1 1 0 4 0 0 1 2
## 3 1 0 1 16 3 2 2 3
## 4 1 0 0 3 0 1 2 3
## 5 2 1 1 2 2 0 0 1
## 6 2 1 0 13 7 6 0 0
## 7 2 0 1 0 1 0 1 0
## 8 2 0 0 3 9 1 0 2
## 9 3 1 1 3 7 1 0 1
## 10 3 1 0 8 6 6 3 5
## 11 3 0 1 2 4 1 1 4
## 12 3 0 0 0 1 0 0 0
## 13 4 1 1 13 10 0 2 2
## 14 4 1 0 9 0 0 1 2
## 15 4 0 1 3 9 1 0 1
## 16 4 0 0 8 1 0 0 1
colnames(ali2)
## [1] "lake" "gender" "size" "y1" "y2" "y3" "y4" "y5"
fit = vglm(cbind(y2, y3, y4, y5, y1) ~ factor(lake) + factor(size) + factor(gender),
family = multinomial, data = ali2)
# 6 possible models to investigate: only lake, only gender, only size, lake + gender, lake + size, size + gender
fit.lake = vglm(cbind(y2, y3, y4, y5, y1) ~ factor(lake), family = multinomial, data = ali2)
fit.size = vglm(cbind(y2, y3, y4, y5, y1) ~ factor(size), family = multinomial, data = ali2)
fit.gender = vglm(cbind(y2, y3, y4, y5, y1) ~ factor(gender), family = multinomial, data = ali2)
fit.lake.size = vglm(cbind(y2, y3, y4, y5, y1) ~ factor(lake) + factor(size), family = multinomial, data = ali2)
fit.lake.gender = vglm(cbind(y2, y3, y4, y5, y1) ~ factor(lake) + factor(gender), family = multinomial, data = ali2)
fit.gender.size = vglm(cbind(y2, y3, y4, y5, y1) ~ factor(size) + factor(gender), family = multinomial, data = ali2)
all = list(fit = fit,
fit.lake = fit.lake,
fit.size = fit.size,
fit.gender = fit.gender,
fit.lake.size = fit.lake.size,
fit.lake.gender = fit.lake.gender,
fit.gender.size = fit.gender.size)
sapply(all, AIC)
## fit fit.lake fit.size fit.gender
## 199.6089 201.9464 221.7073 227.0375
## fit.lake.size fit.lake.gender fit.gender.size
## 196.0890 204.2444 228.4214
which.min(sapply(all, AIC))
## fit.lake.size
## 5
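AIC selects fit.lake.size. As a complementary check (a sketch, using the deviance and residual degrees of freedom that VGAM provides), a likelihood-ratio test of fit.lake.size against the full model tells the same story:

dD <- deviance(fit.lake.size) - deviance(fit)  # extra deviance from dropping gender
ddf <- df.residual(fit.lake.size) - df.residual(fit)  # gender enters all 4 logits
pchisq(dD, ddf, lower.tail = FALSE)  # large p-value: gender can be dropped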
# choose the model with the lowest AIC
best = fit.lake.size
summary(best)
##
## Call:
## vglm(formula = cbind(y2, y3, y4, y5, y1) ~ factor(lake) + factor(size),
## family = multinomial, data = ali2)
##
##
## Pearson residuals:
## Min 1Q Median 3Q Max
## log(mu[,1]/mu[,5]) -1.4594 -0.6356 -0.35705 0.2798 2.083
## log(mu[,2]/mu[,5]) -0.8534 -0.5865 -0.32591 0.2850 2.219
## log(mu[,3]/mu[,5]) -1.0038 -0.5733 -0.19778 0.3225 7.196
## log(mu[,4]/mu[,5]) -1.4653 -0.3244 -0.04672 0.9444 1.557
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept):1 -3.0655 0.6485 -4.727 2.28e-06 ***
## (Intercept):2 -2.0646 0.7296 -2.830 0.00466 **
## (Intercept):3 -1.6166 0.6401 -2.526 0.01155 *
## (Intercept):4 -0.9041 0.4666 -1.938 0.05266 .
## factor(lake)2:1 2.9195 0.7032 4.152 3.30e-05 ***
## factor(lake)2:2 1.1457 0.8360 1.370 0.17055
## factor(lake)2:3 -1.2545 1.1961 -1.049 0.29428
## factor(lake)2:4 -0.8960 0.7625 -1.175 0.23998
## factor(lake)3:1 2.7219 0.6700 4.063 4.85e-05 ***
## factor(lake)3:2 1.6939 0.7843 2.160 0.03080 *
## factor(lake)3:3 0.5238 0.7804 0.671 0.50211
## factor(lake)3:4 0.6064 0.5584 1.086 0.27746
## factor(lake)4:1 1.6417 0.6116 2.684 0.00727 **
## factor(lake)4:2 -1.2422 1.1859 -1.047 0.29489
## factor(lake)4:3 -0.6408 0.7782 -0.823 0.41025
## factor(lake)4:4 -0.8563 0.5573 -1.537 0.12436
## factor(size)1:1 1.2966 0.4228 3.067 0.00216 **
## factor(size)1:2 -0.3604 0.6458 -0.558 0.57680
## factor(size)1:3 -0.2609 0.6444 -0.405 0.68563
## factor(size)1:4 0.0956 0.4591 0.208 0.83507
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Number of linear predictors: 4
##
## Names of linear predictors:
## log(mu[,1]/mu[,5]), log(mu[,2]/mu[,5]), log(mu[,3]/mu[,5]), log(mu[,4]/mu[,5])
##
## Residual deviance: 59.7085 on 44 degrees of freedom
##
## Log-likelihood: -78.0445 on 44 degrees of freedom
##
## Number of iterations: 6
##
## No Hauck-Donner effect found in any of the estimates
##
## Reference group is level 5 of the response
A goodness-of-fit test compares the residual deviance to its asymptotic \(\chi^2\) distribution:

pchisq(deviance(best), df.residual(best), lower.tail = FALSE)
## [1] 0.05729729

The p-value is just above 0.05, so there is no significant lack of fit for the lake + size model.
Approximate 95 % confidence intervals for the coefficients:

confint(best)
## 2.5 % 97.5 %
## (Intercept):1 -4.3366459 -1.79445187
## (Intercept):2 -3.4946851 -0.63453968
## (Intercept):3 -2.8711154 -0.36203090
## (Intercept):4 -1.8185871 0.01038597
## factor(lake)2:1 1.5412579 4.29773209
## factor(lake)2:2 -0.4928535 2.78426689
## factor(lake)2:3 -3.5988371 1.08990444
## factor(lake)2:4 -2.3905051 0.59852714
## factor(lake)3:1 1.4088292 4.03496737
## factor(lake)3:2 0.1566514 3.23119160
## factor(lake)3:3 -1.0058406 2.05346715
## factor(lake)3:4 -0.4879643 1.70077991
## factor(lake)4:1 0.4429103 2.84041268
## factor(lake)4:2 -3.5665767 1.08216020
## factor(lake)4:3 -2.1659520 0.88438551
## factor(lake)4:4 -1.9485591 0.23586426
## factor(size)1:1 0.4679141 2.12533733
## factor(size)1:2 -1.6261025 0.90532436
## factor(size)1:3 -1.5239492 1.00222010
## factor(size)1:4 -0.8043007 0.99549511