The response is ordinal (an ordered multinomial). The proportional odds model assumes a latent continuous variable that measures the CGI.
\[\log \left( \frac{\pi_0}{\pi_1 + \pi_2} \right) = \theta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3\] \[\log \left( \frac{\pi_0 + \pi_1}{\pi_2} \right) = \theta_1 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3\]
Since the inverse of the logit function is the logistic function (sometimes called the expit function), we have:
\[\pi_0 = \frac{e^{\theta_0 + \pmb{x}^T \pmb{\beta}}}{1 + e^{\theta_0 + \pmb{x}^T \pmb{\beta}}}\]
\[\pi_0 + \pi_1 = \frac{e^{\theta_1 + \pmb{x}^T \pmb{\beta}}}{1 + e^{\theta_1 + \pmb{x}^T \pmb{\beta}}}\]
\[\implies \pi_1 = \frac{e^{\theta_1 + \pmb{x}^T \pmb{\beta}}}{1 + e^{\theta_1 + \pmb{x}^T \pmb{\beta}}} - \pi_0\]
\[\pi_2 = 1 - \pi_0 - \pi_1\]
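To make the mapping from the two cumulative logits to the three category probabilities concrete, here is a minimal R sketch using plogis (the expit function); all parameter and covariate values are hypothetical, chosen only for illustration.

# hypothetical parameters, for illustration only
theta <- c(-1, 1)            # intercepts theta_0 < theta_1
beta <- c(-0.2, -2.1, -2.3)  # beta_1, beta_2, beta_3
x <- c(0, 0, 1)              # covariate vector (x_1, x_2, x_3)

cum <- plogis(theta + sum(x * beta))  # P(CGI <= 0) and P(CGI <= 1)
pi0 <- cum[1]
pi1 <- cum[2] - cum[1]
pi2 <- 1 - cum[2]
c(pi0, pi1, pi2)  # the three category probabilities, summing to 1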
\[\frac{\text{Prob}(\text{CGI} \leq 0 | \mathbf{x})}{\text{Prob}(\text{CGI} > 0 | \mathbf{x})} = \frac{\pi_0}{\pi_1 + \pi_2} = e^{\theta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3}\]
\[\frac{\text{Prob}(\text{CGI} \leq 1 | \mathbf{x})}{\text{Prob}(\text{CGI} > 1 | \mathbf{x})} = \frac{\pi_0 + \pi_1}{\pi_2} = e^{\theta_1 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3}\]
We can see that if \(x_3\) increases by 1 (and \(x_1\) and \(x_2\) are unchanged), we get an extra \(+\beta_3\) in the exponent, which means that the cumulative odds ratio is multiplied by \(e^{\beta_3}\).
Interpretation: If \(e^{\beta_3} < 1\) (i.e. \(\beta_3 < 0\)), increasing the initial CGI reduces the odds of being in state 0 or 1 after the treatment (and increases the odds if \(\beta_3 > 0\)).
From \(\text{Prob}(\text{CGI} \leq 0 | \mathbf{x})/\text{Prob}(\text{CGI} > 0 | \mathbf{x})\) we see that increasing \(x_1\) and \(x_2\) will in the same way lead to multiplying by \(e^{\beta_1}\) and \(e^{\beta_2}\), respectively. If \(e^{\beta_1} < 1\), going from treatment every second week (\(x_1 = 0\)) to every third week (\(x_1 = 1\)) gives a reduction in the odds for the “good” states 0 and 1. If \(e^{\beta_2} < 1\), males will have lower odds of being in states 0 and 1.
Note that we want as low a state (i.e., \(y\)) as possible.
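A quick numerical check with the hypothetical values above confirms the claim: increasing \(x_3\) by 1 multiplies the cumulative odds by exactly \(e^{\beta_3}\).

odds <- function(x) {  # cumulative odds P(CGI <= 0)/P(CGI > 0)
  p <- plogis(theta[1] + sum(x * beta))
  p/(1 - p)
}
odds(c(0, 0, 2))/odds(c(0, 0, 1))  # equals exp(beta[3])
exp(beta[3])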
Saturated model: Each row in the table has its own \(\pmb{\pi}\)-vector \([\pi_{i0}, \pi_{i1}, \pi_{i2}]\). The model will thus have \(12 \cdot 2 = 24\) free parameters since we have \(\pi_{i0} + \pi_{i1} + \pi_{i2} = 1\).
The deviance is given by
\[D = -2(l(\text{candidate model}) - l(\text{saturated model})).\]
From the module pages:
\[D = 2 \sum_{i=1}^{12} \sum_{j=0}^2 y_{ij} \log\left(\frac{y_{ij}}{\hat{y}_{ij}}\right)\] where \(\hat{y}_{ij} = n_i \hat{\pi}_{ij}\), and \(\pmb{y}_i = [y_{i0}, y_{i1}, y_{i2}]\) is the response for the \(i\)th line in the table. \(\hat{\pi}_{ij}\) is the estimate of the probability \(\pi_{ij}\) under the candidate model.
Degrees of freedom for the deviance = number of free parameters in the saturated model minus number of free parameters in the candidate model = \(24 - 5 = 19\).
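As a sketch of how this deviance could be computed by hand, assuming hypothetical 12 x 3 matrices y of observed counts and yhat of fitted counts \(n_i \hat{\pi}_{ij}\):

# D = 2 * sum of y * log(y/yhat) over all cells, with the convention 0 * log(0) = 0
deviance_manual <- function(y, yhat) {
  2 * sum(ifelse(y == 0, 0, y * log(y/yhat)))
}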
`cbind(y0, y1, y2) ~ x1 + x2*x3` means that we have a model with the covariates `x1`, `x2`, `x3` and `x2:x3` (`x2` times `x3`).
`x1*x2 + x1*x3` has the linear predictor (all \(x\)'s are vectors here)
\[\theta_j + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_1 x_2 + \beta_5 x_1 x_3\]
Note that this is what we put on the right-hand side of \(\log \left( \frac{\pi_0}{\pi_1 + \pi_2} \right)\) and \(\log \left( \frac{\pi_0 + \pi_1}{\pi_2} \right)\) from a). This model has 7 parameters (\(\beta_{1:5}\) and two intercept parameters), i.e. \(24 - 7 = 17\) degrees of freedom for the deviance.
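To see how R expands such a formula into model-matrix columns, one can inspect a small hypothetical data frame (the covariate values are arbitrary):

# hypothetical covariate values, just to show the column expansion
df <- data.frame(x1 = c(0, 1), x2 = c(0, 1), x3 = c(2, 5))
colnames(model.matrix(~ x1 * x2 + x1 * x3, data = df))
# gives (Intercept), x1, x2, x3, x1:x2, x1:x3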
The number of parameters and degrees of freedom in each of the four models (`x2 + x3`, `x1 + x2 + x3`, `x1*x2 + x3` and `x1*x2 + x1*x3`) are listed below:
model | parameters | df |
---|---|---|
`x2 + x3` | 4 | 20 |
`x1 + x2 + x3` | 5 | 19 |
`x1*x2 + x3` | 6 | 18 |
`x1*x2 + x1*x3` | 7 | 17 |
We are testing \(H_0\): `x2 + x3` against each of the three larger models:

- `x1 + x2 + x3`: \(\Delta D = 10.64 - 10.56 = 0.08\)
- `x1*x2 + x3`: \(\Delta D = 10.64 - 8.52 = 2.12\)
- `x1*x2 + x1*x3`: \(\Delta D = 10.64 - 8.33 = 2.23\)

The difference in deviance is asymptotically \(\chi^2\)-distributed with degrees of freedom equal to the difference in the number of parameters. The critical value at the 5 % significance level is 3.841 for 1 df, and even larger for more df (see below), so none of the differences are significant and we choose the model `x2 + x3`. This means that `x1` does not contribute to the model, i.e., the interval of treatment does not matter.
# critical values at the 5 % significance level (i.e. 95 % quantiles)
# of the chi-squared distribution for various degrees of freedom
qchisq(0.95, 1:4)
## [1] 3.841459 5.991465 7.814728 9.487729
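Equivalently, we can compute approximate p-values for the three deviance differences above (with df 1, 2 and 3, the differences in the number of parameters from the table):

pchisq(c(0.08, 2.12, 2.23), df = c(1, 2, 3), lower.tail = FALSE)
# all p-values are well above 0.05, consistent with keeping x2 + x3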
Estimated odds ratios \(e^{\hat{\beta}_1}\), \(e^{\hat{\beta}_2}\) and \(e^{\hat{\beta}_3}\):
exp(coefficients(allmodels[[6]])[3:5])
## x1 x2M x3
## 0.8025806 0.1156044 0.1030548
Approximate confidence interval (approximate because the normality of the \(\hat{\beta}\)'s is only asymptotic); we choose a 95 % confidence level:
\[\hat{\beta}_1 \pm 1.96 \cdot \sqrt{\text{Var}(\hat{\beta}_1)}\]
exp(coefficients(allmodels[[6]])[3] + qnorm(0.975) * sqrt(vcov(allmodels[[6]])[3,
3]) * c(-1, 1))
## [1] 0.1823578 3.5322631
This confidence interval contains 1, so we cannot reject \(H_0\): \(e^{\beta_1} = 1\) (i.e., \(\beta_1 = 0\)). This means that the interval of treatment does not matter for the final CGI levels. This is the same conclusion as the one from d).
Probability of final CGI equal to 0 for a patient with \(x_1 = 0\) (treatment every second week), \(x_2 = 0\) (female) and initial CGI \(x_3 = 5\):
From a):
\[\hat{\pi}_0 = \frac{e^{\hat{\theta}_0 + 0 \hat{\beta}_1 + 0\hat{\beta}_2 + 5\hat{\beta}_3}}{1 + e^{\hat{\theta}_0 + 0 \hat{\beta}_1 + 0\hat{\beta}_2 + 5\hat{\beta}_3}} = \frac{e^{\hat{\theta}_0 + 5\hat{\beta}_3}}{1 + e^{\hat{\theta}_0 + 5\hat{\beta}_3}}\]
Numbers:
pi0hat <- exp(sum(coefficients(allmodels[[6]]) * c(1, 0, 0, 0, 5)))/(1 +
exp(sum(coefficients(allmodels[[6]]) * c(1, 0, 0, 0, 5))))
pi0hat
## [1] 0.03465839
The standard deviation is found by a first-order Taylor expansion (the delta method) of \(\hat{\pi}_0\) around \((\theta_0, \beta_3)\) (see e.g. https://en.wikipedia.org/wiki/Taylor_series#Taylor_series_in_several_variables). Let
\[f(x, y) = \frac{e^{x + 5y}}{1 + e^{x + 5y}}\]
Then \[ \frac{\partial f}{\partial x} = \frac{e^{x + 5y}}{(1 + e^{x + 5y})^2} = f(1-f) \quad \text{and} \quad \frac{\partial f}{\partial y} = \frac{5e^{x + 5y}}{(1 + e^{x + 5y})^2} = 5f(1-f)\]
Thus \(\hat{\pi}_0 \approx \pi_0 + \pi_0(1 - \pi_0)(\hat{\theta}_0 - \theta_0) + 5\pi_0(1 - \pi_0)(\hat{\beta}_3 - \beta_3)\), such that the approximate variance is
\[\text{Var}(\hat{\pi}_0) = \pi_0^2 (1-\pi_0)^2 \left[ \text{Var}(\hat{\theta}_0) + 25 \text{Var}(\hat{\beta}_3) + 10 \text{Cov}(\hat{\theta}_0, \hat{\beta}_3) \right]\]
We find the estimate by plugging in \(\hat{\pi}_0\) for \(\pi_0\), and by reading the variances and covariances from the `R` output. Then we get:
pi0hat * (1 - pi0hat) * sqrt(vcov(allmodels[[6]])[1, 1] + 25 * vcov(allmodels[[6]])[5,
5] + 10 * vcov(allmodels[[6]])[1, 5])
## [1] 0.04594256
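Equivalently, the delta-method standard deviation is \(\sqrt{\nabla f^T \hat{\Sigma} \nabla f}\) in matrix form, where \(\nabla f = \hat{\pi}_0(1 - \hat{\pi}_0) \cdot (1, 0, 0, 0, 5)^T\) is the gradient with respect to all five parameters (zeros for the coefficients that do not enter \(\hat{\pi}_0\)):

grad <- pi0hat * (1 - pi0hat) * c(1, 0, 0, 0, 5)
sqrt(t(grad) %*% vcov(allmodels[[6]]) %*% grad)  # same value as above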
library(VGAM)
data2 = "http://www.stat.ufl.edu/~aa/glm/data/Alligators2.dat"
ali2 = read.table(data2, header = T)
ali2
## lake gender size y1 y2 y3 y4 y5
## 1 1 1 1 7 1 0 0 5
## 2 1 1 0 4 0 0 1 2
## 3 1 0 1 16 3 2 2 3
## 4 1 0 0 3 0 1 2 3
## 5 2 1 1 2 2 0 0 1
## 6 2 1 0 13 7 6 0 0
## 7 2 0 1 0 1 0 1 0
## 8 2 0 0 3 9 1 0 2
## 9 3 1 1 3 7 1 0 1
## 10 3 1 0 8 6 6 3 5
## 11 3 0 1 2 4 1 1 4
## 12 3 0 0 0 1 0 0 0
## 13 4 1 1 13 10 0 2 2
## 14 4 1 0 9 0 0 1 2
## 15 4 0 1 3 9 1 0 1
## 16 4 0 0 8 1 0 0 1
colnames(ali2)
## [1] "lake" "gender" "size" "y1" "y2" "y3" "y4" "y5"
fit = vglm(cbind(y2, y3, y4, y5, y1) ~ factor(lake) + factor(size) + factor(gender),
family = multinomial, data = ali2)
# 6 possible models to investigate: only lake, only gender, only size, lake + gender, lake + size, size + gender
fit.lake = vglm(cbind(y2, y3, y4, y5, y1) ~ factor(lake), family = multinomial, data = ali2)
fit.size = vglm(cbind(y2, y3, y4, y5, y1) ~ factor(size), family = multinomial, data = ali2)
fit.gender = vglm(cbind(y2, y3, y4, y5, y1) ~ factor(gender), family = multinomial, data = ali2)
fit.lake.size = vglm(cbind(y2, y3, y4, y5, y1) ~ factor(lake) + factor(size), family = multinomial, data = ali2)
fit.lake.gender = vglm(cbind(y2, y3, y4, y5, y1) ~ factor(lake) + factor(gender), family = multinomial, data = ali2)
fit.gender.size = vglm(cbind(y2, y3, y4, y5, y1) ~ factor(size) + factor(gender), family = multinomial, data = ali2)
all = list(fit = fit,
fit.lake = fit.lake,
fit.size = fit.size,
fit.gender = fit.gender,
fit.lake.size = fit.lake.size,
fit.lake.gender = fit.lake.gender,
fit.gender.size = fit.gender.size)
sapply(all, AIC)
## fit fit.lake fit.size fit.gender
## 199.6089 201.9464 221.7073 227.0375
## fit.lake.size fit.lake.gender fit.gender.size
## 196.0890 204.2444 228.4214
which.min(sapply(all, AIC))
## fit.lake.size
## 5
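AIC selects fit.lake.size. As a complementary check (a sketch, using the deviance and residual degrees of freedom that VGAM provides), a likelihood-ratio test of fit.lake.size against the full model tells the same story:

dD <- deviance(fit.lake.size) - deviance(fit)  # extra deviance from dropping gender
ddf <- df.residual(fit.lake.size) - df.residual(fit)  # gender enters all 4 logits
pchisq(dD, ddf, lower.tail = FALSE)  # large p-value: gender can be dropped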
# choose the model with the lowest AIC
best = fit.lake.size
summary(best)
##
## Call:
## vglm(formula = cbind(y2, y3, y4, y5, y1) ~ factor(lake) + factor(size),
## family = multinomial, data = ali2)
##
##
## Pearson residuals:
## Min 1Q Median 3Q Max
## log(mu[,1]/mu[,5]) -1.4594 -0.6356 -0.35705 0.2798 2.083
## log(mu[,2]/mu[,5]) -0.8534 -0.5865 -0.32591 0.2850 2.219
## log(mu[,3]/mu[,5]) -1.0038 -0.5733 -0.19778 0.3225 7.196
## log(mu[,4]/mu[,5]) -1.4653 -0.3244 -0.04672 0.9444 1.557
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept):1 -3.0655 0.6485 -4.727 2.28e-06 ***
## (Intercept):2 -2.0646 0.7296 -2.830 0.00466 **
## (Intercept):3 -1.6166 0.6401 -2.526 0.01155 *
## (Intercept):4 -0.9041 0.4666 -1.938 0.05266 .
## factor(lake)2:1 2.9195 0.7032 4.152 3.30e-05 ***
## factor(lake)2:2 1.1457 0.8360 1.370 0.17055
## factor(lake)2:3 -1.2545 1.1961 -1.049 0.29428
## factor(lake)2:4 -0.8960 0.7625 -1.175 0.23998
## factor(lake)3:1 2.7219 0.6700 4.063 4.85e-05 ***
## factor(lake)3:2 1.6939 0.7843 2.160 0.03080 *
## factor(lake)3:3 0.5238 0.7804 0.671 0.50211
## factor(lake)3:4 0.6064 0.5584 1.086 0.27746
## factor(lake)4:1 1.6417 0.6116 2.684 0.00727 **
## factor(lake)4:2 -1.2422 1.1859 -1.047 0.29489
## factor(lake)4:3 -0.6408 0.7782 -0.823 0.41025
## factor(lake)4:4 -0.8563 0.5573 -1.537 0.12436
## factor(size)1:1 1.2966 0.4228 3.067 0.00216 **
## factor(size)1:2 -0.3604 0.6458 -0.558 0.57680
## factor(size)1:3 -0.2609 0.6444 -0.405 0.68563
## factor(size)1:4 0.0956 0.4591 0.208 0.83507
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Number of linear predictors: 4
##
## Names of linear predictors:
## log(mu[,1]/mu[,5]), log(mu[,2]/mu[,5]), log(mu[,3]/mu[,5]), log(mu[,4]/mu[,5])
##
## Residual deviance: 59.7085 on 44 degrees of freedom
##
## Log-likelihood: -78.0445 on 44 degrees of freedom
##
## Number of iterations: 6
##
## No Hauck-Donner effect found in any of the estimates
##
## Reference group is level 5 of the response
A goodness-of-fit test compares the residual deviance to its asymptotic \(\chi^2\) distribution:

pchisq(deviance(best), df.residual(best), lower.tail = FALSE)
## [1] 0.05729729

The p-value is just above 0.05, so there is no significant lack of fit for the lake + size model.
Approximate 95 % confidence intervals for the coefficients:

confint(best)
## 2.5 % 97.5 %
## (Intercept):1 -4.3366459 -1.79445187
## (Intercept):2 -3.4946851 -0.63453968
## (Intercept):3 -2.8711154 -0.36203090
## (Intercept):4 -1.8185871 0.01038597
## factor(lake)2:1 1.5412579 4.29773209
## factor(lake)2:2 -0.4928535 2.78426689
## factor(lake)2:3 -3.5988371 1.08990444
## factor(lake)2:4 -2.3905051 0.59852714
## factor(lake)3:1 1.4088292 4.03496737
## factor(lake)3:2 0.1566514 3.23119160
## factor(lake)3:3 -1.0058406 2.05346715
## factor(lake)3:4 -0.4879643 1.70077991
## factor(lake)4:1 0.4429103 2.84041268
## factor(lake)4:2 -3.5665767 1.08216020
## factor(lake)4:3 -2.1659520 0.88438551
## factor(lake)4:4 -1.9485591 0.23586426
## factor(size)1:1 0.4679141 2.12533733
## factor(size)1:2 -1.6261025 0.90532436
## factor(size)1:3 -1.5239492 1.00222010
## factor(size)1:4 -0.8043007 0.99549511