(Latest changes: 13.12: typo with \(w_i\) for multivariate exp fam, 27.10 a few typos corrected).
This topic is new on the reading list this year.
multinomial random component
nominal vs. ordinal response
ungrouped and grouped data
multivariate exponential family
nominal response and logit models
ordinal reponse and logit models - based on a latent model
likelihood inference
Jump to interactive.
We consider a situation where our random variable (response) is given as one of \(c+1\) possible categories (where we will look at category \(c+1\) as the reference category).
The categories will either be
Assumptions:
When coding the response variable we use a dummy variable coding with \(c\) elements (the \(c+1\) category is the reference level). This means that if we have that \(\pi_{ir}=1\) then \({\bf y}_i=(0,0,\ldots,0,1,0,\ldots,0)\) with a value of \(1\) in the \(r\)th element of \({\bf y}_i\). If observation \(i\) comes from category \(c+1\) we have \({\bf y}_i=(0,0,\ldots,0)\).
is modelling and estimating the probabilites \(\pi_{ir}=P(Y_i=r)=P(Y_{ir}=1)\) as a function of the covariates \({\bf x}_i\). The modelling is done differently for nominal (unordered) and ordered categories, but both rely upon the multinomial distribution.
Probability mass function for one observation:
\[f({\bf y})=\pi_1^{y_{1}} \pi_2^{y_{2}} \cdots \pi_c^{y_c} (1-\pi_1-\pi_2-\cdots-\pi_c)^{1-y_1-y_2-\cdots-y_c}\] where then \({\bf y}=(y_1,y_2,\ldots,y_c)\) and \(y_r=1\) if the observation comes from the \(r\)th category.
If we then have \(m\) independent trials then \({\bf y}=(y_1,y_2,\ldots,y_c)\) is summed over our \(m\) responses, so that \(y_r\) is the number of observations where the response is from the \(r\)th category.
\[f({\bf y})=\frac{m!}{y_1!\cdots y_c! (m-y_1-\cdots -y_c)!}\pi_1^{y_{1}} \pi_2^{y_{2}} \cdots \pi_c^{y_c}(1-\pi_1-\pi_2-\cdots-\pi_c)^{m-y_1-y_2-\cdots-y_c}\]
The mean and the covariance matrix of the random vector \({\bf Y}\) are given by:
\[\text{E}({\bf Y})=m\boldsymbol{\pi}=\begin{pmatrix}m\pi_1\\m\pi_2\\ \vdots \\ m\pi_c\end{pmatrix}\] \[\text{Cov}({\bf Y})=m \begin{pmatrix}\pi_1(1-\pi_1)& -\pi_1\pi_2&\cdots &-\pi_1\pi_c\\ -\pi_2\pi_1& \pi_2(1-\pi_2)&\cdots &-\pi_2\pi_c\\ \vdots & \vdots & \ddots & \vdots\\ -\pi_c\pi_1& -\pi_c\pi_2&\cdots & \pi_c(1-\pi_c)\end{pmatrix}\]
Q: what about \(\text{E}(Y_{c+1})\) and \(\text{Cov}(Y_1,Y_{c+1})\)?
Finally, if we look at \(\bar{Y}_r=\frac{1}{m}Y_r\) then \(\bar{\bf Y}=\frac{1}{m}{\bf Y}\) follows a scaled multinomial distribution \(\bar{\bf Y}\sim \frac{1}{m}M(m,{\boldsymbol \pi})\) with \(\text{E}(\bar{\bf Y})={\boldsymbol \pi}\) and \(\text{Cov}(\bar{\bf Y})=\frac{1}{m^2}\text{Cov}({\bf Y})\).
\[{\bf Y}=\begin{pmatrix} Y_{11}& Y_{12} & \cdots & Y_{1c}\\ Y_{21}& Y_{22} & \cdots & Y_{2c}\\ \vdots & \vdots & \ddots & \vdots\\ Y_{n1}& Y_{n2} & \cdots & Y_{nc}\\ \end{pmatrix}\]
and \({\bf X}\) is an \(n \times p\) matrix as usual.
As for the binomial case we look at the number of occurences with a group - that is, one covariate pattern.
\[{\bf Y}=\begin{pmatrix} Y_{11}& Y_{12} & \cdots & Y_{1c}\\ Y_{21}& Y_{22} & \cdots & Y_{2c}\\ \vdots & \vdots & \ddots & \vdots\\ Y_{G1}& Y_{G2} & \cdots & Y_{Gc}\\ \end{pmatrix}\]
The notation here is that we have \(n_i\) observation for each covariate pattern (group) \(i\) for \(i=1,\ldots,G\). This will replace the \(m\) used for the multinomial distribution above.
nominal=unordered
Agresti (2015, p203): “The model treats the response variable as nominal scale in the following sense: if the model holds and the outcome categories are permuted in any way, the model still holds with the corresponding permuatation of the effects.”
This is a generalization of the binary logit model with \(P(Y=1)\) vs \(P(Y=0)\), to \(c\) models of \(\pi_{ir}\) vs \(\pi_{i,c+1}\) for \(r=1,\ldots,c\).
The models can be written using log ratios: \[ \ln(\frac{\pi_{ir}}{\pi_{i,c+1}})={\bf x}_i^T {\boldsymbol \beta}_r\]
Remark: \({\boldsymbol \beta}_r\) is the \(p\times 1\) coefficient vector for the \(r\)th response
Using this we may also look at the log ratio for any two probabilites \(\pi_{ia}\) and \(\pi_{ib}\):
\[\ln(\frac{\pi_{ia}}{\pi_{ib}})=\ln(\frac{\pi_{ia}}{\pi_{i,c+1}})-\ln(\frac{\pi_{ib}}{\pi_{i,c+1}})={\bf x}_i^T ({\boldsymbol \beta}_a-{\boldsymbol \beta}_b)\]
Alternatively, we may write out the model for the probabilites:
\[P(Y_i=r)=\pi_{ir}=\frac{\exp({\bf x}_i^T {\boldsymbol\beta}_r)}{1+\sum_{s=1}^{c}\exp({\bf x}_i^T {\boldsymbol\beta}_s)}\]
\[P(Y_i=c+1)=\pi_{i,c+1}=1-\pi_{i1}-\cdots \pi_{ic}=\frac{1}{1+\sum_{s=1}^{c}\exp({\bf x}_i^T {\boldsymbol\beta}_s)}\]
This is a multivariate GLM and the multinomial distribution is a multivariate exponential family.
\[f({\bf y}_i,{\boldsymbol \theta}_i,\phi)=\exp(\frac{{\bf y}_i^T{\boldsymbol \theta}_i-b({\boldsymbol \theta}_i)}{\phi}w_i+c({\bf y}_i,\phi,w_i))\]
where \({\boldsymbol \theta}\) has dimension \(c\).
\[{\boldsymbol \mu}_i=\text{E}({\bf Y}_i)= {\boldsymbol \pi}_i=\begin{pmatrix} \pi_{i1}\\ \pi_{i2}\\ \vdots \\ \pi_{i,c+1}\end{pmatrix}\] Remark: if grouped data we instead look at \(\bar{\bf Y}_i\sim \frac{1}{n_i}M(n_i,\pi_i)\) so that the mean is \({\boldsymbol \pi}_i\)
\[{\boldsymbol \eta}_i=\begin{pmatrix} \eta_{i1}\\ \eta_{i2}\\ \vdots \\ \eta_{i,c}\end{pmatrix}= \begin{pmatrix} {\bf x}_i^T{\boldsymbol \beta}_{1}\\ {\bf x}_i^T{\boldsymbol \beta}_{2}\\ \vdots \\ {\bf x}_i^T{\boldsymbol \beta}_{c} \end{pmatrix}\]
\[ g_{r}(\boldsymbol{\mu}_i)=\ln(\frac{\mu_{ir}}{1-\mu_{i1}-\cdots-\mu_{ic}})= \ln(\frac{\pi_{ir}}{1-\pi_{i1}-\cdots-\pi_{ic}})\]
We also define response functions \((\bf h)\) with elements \(h_r\) given by \(\pi_{ir}=h_r(\eta_{i1},\eta_{i2},\ldots,\eta_{ic})\), and we have for the nominal data model
\[\pi_{ir}=h_r((\eta_{i1},\eta_{i2},\ldots,\eta_{ic})=\frac{\exp(\eta_{ir})}{1+\sum_{s=1}^c \exp(\eta_{is})}\]
It turns out that the reference category logits are the canonical links for the multinomial distribution GLM.
In this case, as for the univariate eksponential family GLM the loglikelihood is concave with an unique maximum (if it exists) and the expected and observed Fisher information matrices are equal.
As before, we find maximum likelihood parameter estimates from the Fisher scoring or Newton Raphson method.
Remember: now we have \(p\times c\) parameters to estimate — \(p\) for each category \(c\). All of these coefficients may either be put into a long vector (length \(p\cdot c\)) — which might be easiest to understand for the estimation, or into a matrix of dimension \(p \times c\) — might be easier for viewing.
(grouped data)
With the notation that \({\boldsymbol \beta}\) is a long vector with the coefficients for the \(c\) categories stacked upon eachother.
\[L({\boldsymbol \beta})=\Pi_{i=1}^G f({\bf y}_i\mid {\boldsymbol \pi})\] where \(f\) is the multinomial distribution function.
\[ l({\boldsymbol \beta})\propto\sum_{i=1}^G \sum_{s=1}^{c+1} y_{is}\ln(\pi_{is})\]
where we remember that \(y_{i,c+1}=n_i-y_{i1}-\cdots-y_{ic}\), and \(1-\pi_{i1}-\cdots \pi_{ic}\).
(This formula is also correct for the ordinal model of the next section.) General formulas for the score function and expected Fisher information matrix follow later.
The derivation used for the binary GLM model generalizes directly ot the multinomial GLM. The fitted probabilities are \(\hat\pi_{ij}\) (group \(i\) and category \(j\)) and the saturated model (grouped data) is \(n_i \tilde\pi_{ij}=y_{ij}\). \[D=2 \sum_{i=1}^G \sum_{s=1}^{c+1} y_{is}\ln(\frac{{y}_{is}}{n_i \hat{\pi}_{is}})\]
The asymptotic distribution is as before \(\chi^2\) with “the number of groups times number of categories minus 1 (Gc)” minus “the number of covariates (cp)”, giving \(Gc-cp=c(G-p)\) degrees of freedom for the nominal model.
The deviance can be used for model check with grouped data (\(G\) groups with \(n_i\) observations), but can be used to compare nested unsaturated models also for individual (ungrouped) data, with again an asymptotic \(\chi^2\) distribution with the difference of number of parameters between the two models.
This formula is also correct for the ordinal model of the next section, except that the number of parameters estimated differ. For the ordinal model to come next we have to estimate \(k+c\) instead of \(p\) parameters, so the formula for degrees of freedom for the deviance for the ordinal model is \(Gc-k-c\).
Example and data are taken from Agresti (2015, pages 217-219).
Research question: what is the factors influencing the primary food choice of alligators?
Data are from 219 captured alligators from four lakes in Florida, where the stomack contents of the alligators were investigated. The weight of different types of food was measured, and then the primary food choice (highest weight) was noted. The primary choice is given as y1:y5 below. In addition the size of the alligator (non-adult or adult) was registered.
These data are grouped, and we let y1:fish be the reference category.
# data from Agresti (2015), section 6, with use of the VGAM packages
data = "http://www.stat.ufl.edu/~aa/glm/data/Alligators.dat"
ali = read.table(data, header = T)
ali
attach(ali)
## lake size y1 y2 y3 y4 y5
## 1 1 1 23 4 2 2 8
## 2 1 0 7 0 1 3 5
## 3 2 1 5 11 1 0 3
## 4 2 0 13 8 6 1 0
## 5 3 1 5 11 2 1 5
## 6 3 0 8 7 6 3 5
## 7 4 1 16 19 1 2 3
## 8 4 0 17 1 0 1 3
y.data = cbind(y2, y3, y4, y5, y1)
y.data
dim(y.data)
x.data = model.matrix(~size + factor(lake), data = ali)
x.data
dim(x.data)
## y2 y3 y4 y5 y1
## [1,] 4 2 2 8 23
## [2,] 0 1 3 5 7
## [3,] 11 1 0 3 5
## [4,] 8 6 1 0 13
## [5,] 11 2 1 5 5
## [6,] 7 6 3 5 8
## [7,] 19 1 2 3 16
## [8,] 1 0 1 3 17
## [1] 8 5
## (Intercept) size factor(lake)2 factor(lake)3 factor(lake)4
## 1 1 1 0 0 0
## 2 1 0 0 0 0
## 3 1 1 1 0 0
## 4 1 0 1 0 0
## 5 1 1 0 1 0
## 6 1 0 0 1 0
## 7 1 1 0 0 1
## 8 1 0 0 0 1
## attr(,"assign")
## [1] 0 1 2 2 2
## attr(,"contrasts")
## attr(,"contrasts")$`factor(lake)`
## [1] "contr.treatment"
##
## [1] 8 5
# We use library VGAM:
library(VGAM)
# We fit a multinomial logit model with fish (y1) as the reference category:
fit.main = vglm(cbind(y2, y3, y4, y5, y1) ~ size + factor(lake), family = multinomial,
data = ali)
summary(fit.main)
pchisq(deviance(fit.main), df.residual(fit.main), lower.tail = FALSE)
##
## Call:
## vglm(formula = cbind(y2, y3, y4, y5, y1) ~ size + factor(lake),
## family = multinomial, data = ali)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept):1 -3.2074 0.6387 -5.021 5.13e-07 ***
## (Intercept):2 -2.0718 0.7067 -2.931 0.003373 **
## (Intercept):3 -1.3980 0.6085 -2.297 0.021601 *
## (Intercept):4 -1.0781 0.4709 -2.289 0.022061 *
## size:1 1.4582 0.3959 3.683 0.000231 ***
## size:2 -0.3513 0.5800 -0.606 0.544786
## size:3 -0.6307 0.6425 -0.982 0.326296
## size:4 0.3316 0.4482 0.740 0.459506
## factor(lake)2:1 2.5956 0.6597 3.934 8.34e-05 ***
## factor(lake)2:2 1.2161 0.7860 1.547 0.121824
## factor(lake)2:3 -1.3483 1.1635 -1.159 0.246529
## factor(lake)2:4 -0.8205 0.7296 -1.125 0.260713
## factor(lake)3:1 2.7803 0.6712 4.142 3.44e-05 ***
## factor(lake)3:2 1.6925 0.7804 2.169 0.030113 *
## factor(lake)3:3 0.3926 0.7818 0.502 0.615487
## factor(lake)3:4 0.6902 0.5597 1.233 0.217511
## factor(lake)4:1 1.6584 0.6129 2.706 0.006813 **
## factor(lake)4:2 -1.2428 1.1854 -1.048 0.294466
## factor(lake)4:3 -0.6951 0.7813 -0.890 0.373608
## factor(lake)4:4 -0.8262 0.5575 -1.482 0.138378
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Names of linear predictors: log(mu[,1]/mu[,5]), log(mu[,2]/mu[,5]),
## log(mu[,3]/mu[,5]), log(mu[,4]/mu[,5])
##
## Residual deviance: 17.0798 on 12 degrees of freedom
##
## Log-likelihood: -47.5138 on 12 degrees of freedom
##
## Number of Fisher scoring iterations: 5
##
## Warning: Hauck-Donner effect detected in the following estimate(s):
## '(Intercept):1'
##
##
## Reference group is level 5 of the response
## [1] 0.1466189
Q:
exp(coefficients(fit.main))
## (Intercept):1 (Intercept):2 (Intercept):3 (Intercept):4 size:1
## 0.0404626 0.1259644 0.2471007 0.3402497 4.2982356
## size:2 size:3 size:4 factor(lake)2:1 factor(lake)2:2
## 0.7037987 0.5322405 1.3931262 13.4043318 3.3739877
## factor(lake)2:3 factor(lake)2:4 factor(lake)3:1 factor(lake)3:2 factor(lake)3:3
## 0.2596748 0.4401925 16.1245576 5.4329197 1.4808988
## factor(lake)3:4 factor(lake)4:1 factor(lake)4:2 factor(lake)4:3 factor(lake)4:4
## 1.9940595 5.2506853 0.2885818 0.4990158 0.4377111
Testing out other models, and comparing with LRT-test - by using deviances for different models.
# Fit model with only lake:
fit.lake = vglm(cbind(y2, y3, y4, y5, y1) ~ factor(lake), family = multinomial, data = ali)
summary(fit.lake)
# Test effect of size (no anova command is available)
G2 = deviance(fit.lake) - deviance(fit.main)
G2
df.diff = df.residual(fit.lake) - df.residual(fit.main)
df.diff
1 - pchisq(G2, df.diff)
# Size has a significant effect
##
## Call:
## vglm(formula = cbind(y2, y3, y4, y5, y1) ~ factor(lake), family = multinomial,
## data = ali)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept):1 -2.0149 0.5323 -3.785 0.000153 ***
## (Intercept):2 -2.3026 0.6055 -3.803 0.000143 ***
## (Intercept):3 -1.7918 0.4830 -3.709 0.000208 ***
## (Intercept):4 -0.8362 0.3320 -2.518 0.011787 *
## factor(lake)2:1 2.0690 0.6257 3.307 0.000944 ***
## factor(lake)2:2 1.3581 0.7517 1.807 0.070810 .
## factor(lake)2:3 -1.0986 1.1353 -0.968 0.333199
## factor(lake)2:4 -0.9555 0.7065 -1.352 0.176228
## factor(lake)3:1 2.3403 0.6448 3.629 0.000284 ***
## factor(lake)3:2 1.8171 0.7540 2.410 0.015963 *
## factor(lake)3:3 0.6131 0.7485 0.819 0.412725
## factor(lake)3:4 0.5739 0.5359 1.071 0.284216
## factor(lake)4:1 1.5141 0.6030 2.511 0.012042 *
## factor(lake)4:2 -1.1939 1.1819 -1.010 0.312427
## factor(lake)4:3 -0.6061 0.7726 -0.785 0.432746
## factor(lake)4:4 -0.8685 0.5543 -1.567 0.117138
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Names of linear predictors: log(mu[,1]/mu[,5]), log(mu[,2]/mu[,5]),
## log(mu[,3]/mu[,5]), log(mu[,4]/mu[,5])
##
## Residual deviance: 38.1672 on 16 degrees of freedom
##
## Log-likelihood: -58.0575 on 16 degrees of freedom
##
## Number of Fisher scoring iterations: 5
##
## No Hauck-Donner effect found in any of the estimates
##
##
## Reference group is level 5 of the response
## [1] 21.08741
## [1] 4
## [1] 0.0003042796
# Fit model with only size:
fit.size = vglm(cbind(y2, y3, y4, y5, y1) ~ size, family = multinomial, data = ali)
summary(fit.size)
# Test effect of lake
G2 = deviance(fit.size) - deviance(fit.main)
G2
df.diff = df.residual(fit.size) - df.residual(fit.main)
df.diff
1 - pchisq(G2, df.diff)
# Lake has a significant effect
##
## Call:
## vglm(formula = cbind(y2, y3, y4, y5, y1) ~ size, family = multinomial,
## data = ali)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept):1 -1.0341 0.2911 -3.553 0.000381 ***
## (Intercept):2 -1.2417 0.3149 -3.944 8.03e-05 ***
## (Intercept):3 -1.7272 0.3837 -4.502 6.74e-06 ***
## (Intercept):4 -1.2417 0.3149 -3.944 8.03e-05 ***
## size:1 0.9489 0.3569 2.659 0.007837 **
## size:2 -0.8583 0.5350 -1.604 0.108626
## size:3 -0.5552 0.6063 -0.916 0.359864
## size:4 0.2943 0.4150 0.709 0.478129
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Names of linear predictors: log(mu[,1]/mu[,5]), log(mu[,2]/mu[,5]),
## log(mu[,3]/mu[,5]), log(mu[,4]/mu[,5])
##
## Residual deviance: 66.2129 on 24 degrees of freedom
##
## Log-likelihood: -72.0803 on 24 degrees of freedom
##
## Number of Fisher scoring iterations: 4
##
## No Hauck-Donner effect found in any of the estimates
##
##
## Reference group is level 5 of the response
## [1] 49.13308
## [1] 12
## [1] 1.982524e-06
Q: explain what is presented below, in particular “what is the probability that the main food source is fish given size=0 and lake=1”?
library(knitr)
# Fitted values for main effect model 'fit.main':
fitted = data.frame(fitted(fit.main), lake = ali$lake, size = ali$size)
kable(fitted)
y2 | y3 | y4 | y5 | y1 | lake | size |
---|---|---|---|---|---|---|
0.0930988 | 0.0474566 | 0.0704015 | 0.2537396 | 0.5353035 | 1 | 1 |
0.0230717 | 0.0718246 | 0.1408963 | 0.1940096 | 0.5701978 | 1 | 0 |
0.6018967 | 0.0772276 | 0.0088175 | 0.0538721 | 0.2581861 | 2 | 1 |
0.2486452 | 0.1948374 | 0.0294161 | 0.0686628 | 0.4584385 | 2 | 0 |
0.5168385 | 0.0887672 | 0.0358947 | 0.1742005 | 0.1842990 | 3 | 1 |
0.1929612 | 0.2023995 | 0.1082251 | 0.2006616 | 0.2957525 | 3 | 0 |
0.4128558 | 0.0115665 | 0.0296712 | 0.0938024 | 0.4521040 | 4 | 1 |
0.1396778 | 0.0238987 | 0.0810674 | 0.0979136 | 0.6574425 | 4 | 0 |
(we will only consider cumulative models - and not sequential models)
An unobservable latent variable \(U_i\) drives the observed category \(Y_i\).
\[ Y_i=r \Leftrightarrow \theta_{r-1}\le U_i \le \theta_r\] where these \(\theta\)s are our unobservable thresholds, and the thresholds are monotonely increasing, \(-\infty=\theta_0< \theta_1 < \cdots < \theta_{c+1}=\infty\).
We further assume that the latent variables are dependent on our covariates through
\[ U_i=-{\bf x}_i^T{\boldsymbol \beta} + \varepsilon_i\]
where we have a new random variable that has cumulative distribution function (cdf) \(F\). No intercept is included due to identifiability issuse (shift in intercept would produce the same effect as negative shift in threshold).
We get rid of the latent variable \(U_i\) by considereing
\[P(Y_i\le r)=P(U_i\le \theta_r)=P(-{\bf x}_i^T {\boldsymbol \beta}+\varepsilon_i \le \theta_r)\] \[=P(\varepsilon_i \le \theta_r+{\bf x}_i^T {\boldsymbol \beta})=F(\theta_r+{\bf x}_i^T {\boldsymbol \beta}) \]
Observe that the final expression does not include the latent variable \(U_i\), but includes the unknown threshold and \(k\) regression parameters.
Different choices of \(F\) will give different models, and we will only consider \(F\) to be the cdf for the logistic distribution. (Another popular choice is the cdf of the standard normal distribution.)
\[P(Y_i\le r)=\frac{\exp(\theta_r+{\bf x}_i^T {\boldsymbol \beta})}{1+\exp(\theta_r+{\bf x}_i^T {\boldsymbol \beta})}\]
which also can be written as
\[ \ln(\frac{P(Y_i\le r)}{P(Y_i> r)})=\theta_r+{\bf x}_i^T {\boldsymbol \beta}\]
Our model is a proportional odds model, in the sense that the cumulative odds are proportional across categories
\[\frac{\frac{P(Y_i\le r\mid {\bf x}_i)}{P(Y_i> r\mid {\bf x}_i)}} {\frac{P(Y_i\le r\mid {\bf x}^*_i)}{P(Y_i> r\mid {\bf x}^*_i)}}= \exp(({\bf x}_i-{\bf x}^*_i)^T {\boldsymbol \beta})\]
Observe that this is independent of \(r\).
If the only change from \({\bf x}\) to \({\bf x}^*\) is that one covariate (say covariate \(k\)) change with one unit - then \(\exp(({\bf x}_i-{\bf x}^*_i)^T {\boldsymbol \beta})=\exp(\beta_k)\), and the proportional odds model makes us able to explain what \(\beta_k\) means.
What is the response function here?
\[\pi_{i1}=F(\eta_{i1})\] \[\pi_{ir}=F(\eta_{ir})-F(\eta_{i,r-1})\]
where \(\eta_{ir}=\theta_r+{\bf x}_i^T{\boldsymbol \beta}\), and \(F\) is the logistic cdf.
We have five categories, where the fifth is the reference category.
True parameters
The graph shows the cumulative probability \(P(Y\le r)\) for \(r=1\) (red), \(r=2\) (organge), \(r=3\) (green), \(r=4\) (purple).
Observe the parallell lines.
What would \(P(Y\le 5)\) be? Why is this missing from the plot?
Example and data are taken from Agresti (2015, pages 219-223).
Research question: understand mental health issues.
The data comes from a random sample of size 40 of adult residents of Alachua County, Florida, USA.
These data are ungrouped (but could be grouped). In the original study several other explanatory variables were studied.
# Read mental health data from the web:
library(knitr)
data = "http://www.stat.ufl.edu/~aa/glm/data/Mental.dat"
mental = read.table(data, header = T)
colnames(mental)
apply(mental, 2, table)
# kable(mental)
## [1] "impair" "ses" "life"
## $impair
##
## 1 2 3 4
## 12 12 7 9
##
## $ses
##
## 0 1
## 18 22
##
## $life
##
## 0 1 2 3 4 5 6 7 8 9
## 2 5 4 8 5 4 2 2 4 4
library(VGAM)
# We fit a cumulative logit model with main effects of 'ses' and 'life':
fit.imp = vglm(impair ~ life + ses, family = cumulative(parallel = T), data = mental)
# parallell=T gives proportional odds structure - only intercepts differ
summary(fit.imp)
##
## Call:
## vglm(formula = impair ~ life + ses, family = cumulative(parallel = T),
## data = mental)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept):1 -0.2819 0.6231 -0.452 0.65096
## (Intercept):2 1.2128 0.6511 1.863 0.06251 .
## (Intercept):3 2.2094 0.7171 3.081 0.00206 **
## life -0.3189 0.1194 -2.670 0.00759 **
## ses 1.1112 0.6143 1.809 0.07045 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Names of linear predictors: logitlink(P[Y<=1]), logitlink(P[Y<=2]),
## logitlink(P[Y<=3])
##
## Residual deviance: 99.0979 on 115 degrees of freedom
##
## Log-likelihood: -49.5489 on 115 degrees of freedom
##
## Number of Fisher scoring iterations: 5
##
## No Hauck-Donner effect found in any of the estimates
##
##
## Exponentiated coefficients:
## life ses
## 0.7269742 3.0380707
The ML fit for this model can be written as
\[ \text{logit}(\hat{P}(y_i\le r))=\hat{\theta}_r+0.319 x_{i1}+1.111 x_{i2}\]
Q: give an interpretation of this model!
Remember:
Q: How can you interpret the last line below? Why is it exp(CI(beta)) and not CI(exp(beta))?
exp(confint(fit.imp))
## 2.5 % 97.5 %
## (Intercept):1 0.2224503 2.5581328
## (Intercept):2 0.9385968 12.0489557
## (Intercept):3 2.2342109 37.1467162
## life 0.5752574 0.9187045
## ses 0.9114465 10.1266209
Q: How are these predictions calculated? What is the interpretation?
fitted = data.frame(fitted(fit.imp), ses = mental$ses, life = mental$life)
fitted[c(6, 18, 10), ] #0,7 not fitted
xs = cbind(c(2, 7, 2, 7), c(0, 0, 1, 1))
coeff = coefficients(fit.imp)
linpreds = cbind(coeff[1] + xs %*% coeff[4:5], coeff[2] + xs %*% coeff[4:5], coeff[3] +
xs %*% coeff[4:5])
cprobs = exp(linpreds)/(1 + exp(linpreds))
cprobs
pprobs = cbind(cprobs[, 1], cprobs[, 2] - cprobs[, 1], cprobs[, 3] - cprobs[, 2],
1 - cprobs[, 3])
pprobs
## X1 X2 X3 X4 ses life
## 6 0.2850362 0.3548973 0.18808559 0.17198084 0 2
## 18 0.5477558 0.2959810 0.09227184 0.06399141 1 2
## 10 0.1973858 0.3255923 0.22513134 0.25189056 1 7
## [,1] [,2] [,3]
## [1,] 0.28503623 0.6399336 0.8280192
## [2,] 0.07488691 0.2651744 0.4943332
## [3,] 0.54775576 0.8437368 0.9360086
## [4,] 0.19738576 0.5229781 0.7481094
## [,1] [,2] [,3] [,4]
## [1,] 0.28503623 0.3548973 0.18808559 0.17198084
## [2,] 0.07488691 0.1902875 0.22915882 0.50566678
## [3,] 0.54775576 0.2959810 0.09227184 0.06399141
## [4,] 0.19738576 0.3255923 0.22513134 0.25189056
Q: What do you see here, and what is the formula for this matrix?
vcov(fit.imp)
## (Intercept):1 (Intercept):2 (Intercept):3 life ses
## (Intercept):1 0.38819709 0.32992954 0.32615019 -0.04231112 -0.15615761
## (Intercept):2 0.32992954 0.42395851 0.40844529 -0.05427393 -0.11440293
## (Intercept):3 0.32615019 0.40844529 0.51423495 -0.06185757 -0.09055667
## life -0.04231112 -0.05427393 -0.06185757 0.01426291 -0.01855824
## ses -0.15615761 -0.11440293 -0.09055667 -0.01855824 0.37732634
# We consider a model with interaction between 'ses' and 'life':
fit.int = vglm(impair ~ life + ses + life:ses, family = cumulative(parallel = T),
data = mental)
summary(fit.int)
##
## Call:
## vglm(formula = impair ~ life + ses + life:ses, family = cumulative(parallel = T),
## data = mental)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept):1 0.09807 0.81102 0.121 0.90375
## (Intercept):2 1.59248 0.83717 1.902 0.05714 .
## (Intercept):3 2.60660 0.90966 2.865 0.00416 **
## life -0.42045 0.19031 -2.209 0.02715 *
## ses 0.37090 1.13022 0.328 0.74279
## life:ses 0.18131 0.23611 0.768 0.44255
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Names of linear predictors: logitlink(P[Y<=1]), logitlink(P[Y<=2]),
## logitlink(P[Y<=3])
##
## Residual deviance: 98.5044 on 114 degrees of freedom
##
## Log-likelihood: -49.2522 on 114 degrees of freedom
##
## Number of Fisher scoring iterations: 5
##
## No Hauck-Donner effect found in any of the estimates
##
##
## Exponentiated coefficients:
## life ses life:ses
## 0.6567529 1.4490350 1.1987822
# And test if there is a significant effect of interaction:
G2 = deviance(fit.imp) - deviance(fit.int)
df.diff = df.residual(fit.imp) - df.residual(fit.int)
1 - pchisq(G2, df.diff)
# The effect of interaction is not significant
## [1] 0.4410848
# We consider a model where the effect of the covariates may differ between the
# cumulative logits - so not parallell lines for the cdfs
fit.nopar = vglm(impair ~ life + ses, family = cumulative, data = mental)
summary(fit.nopar)
# The change in the deviance compared to the model 'fit.imp' is
# 99.0979-96.7486=2.3493 with df.diff=115-111=4, which is not significant
# So model 'fit.imp'seems fine.
##
## Call:
## vglm(formula = impair ~ life + ses, family = cumulative, data = mental)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept):1 -0.1930 0.7387 -0.261 0.7938
## (Intercept):2 0.8278 0.7036 1.176 0.2394
## (Intercept):3 2.8049 0.9615 NA NA
## life:1 -0.3182 0.1597 -1.993 0.0463 *
## life:2 -0.2739 0.1372 -1.997 0.0458 *
## life:3 -0.3964 0.1592 -2.490 0.0128 *
## ses:1 0.9732 0.7720 1.261 0.2074
## ses:2 1.4962 0.7460 2.006 0.0449 *
## ses:3 0.7518 0.8358 0.899 0.3684
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Names of linear predictors: logitlink(P[Y<=1]), logitlink(P[Y<=2]),
## logitlink(P[Y<=3])
##
## Residual deviance: 96.7486 on 111 degrees of freedom
##
## Log-likelihood: -48.3743 on 111 degrees of freedom
##
## Number of Fisher scoring iterations: 14
##
## Warning: Hauck-Donner effect detected in the following estimate(s):
## '(Intercept):3'
##
##
## Exponentiated coefficients:
## life:1 life:2 life:3 ses:1 ses:2 ses:3
## 0.7274592 0.7604194 0.6727572 2.6465169 4.4645713 2.1207587
Based on Agresti (2015, p 214-216)
To use MLR the ordinal categories need to be replaced with numerical values, and we then need to assume a normal error structure. The following are questions to be answered and possible limitation to be assumed for using MLR instead of ordinal regression:
We use the notation that \({\boldsymbol \beta}\) is a long vector with all regression parameters. The content of this vector is slightly different for our two models, with intercept and \(k\) covariate effects for each response category for the nominal model - and with \(c\) thresholds but the same \(k\)-dimensional \({\boldsymbol \beta}\) vector for all categories.
Full matrix versions (over all \(i\)) can be found in our textbook, page 345-346.
We have seen that the loglikelihood is: \[ l({\boldsymbol \beta})\propto\sum_{i=1}^n \sum_{s=1}^{c+1} y_{is}\ln(\pi_{is})\]
where we remember that \(y_{i,c+1}=n_i-y_{i1}-\cdots-y_{ic}\), and \(1-\pi_{i1}-\cdots \pi_{ic}\).
The design matrix \({\bf X}\) and coefficient vector are different for our nominal logit model and our ordinal cumulative model.
\[{\bf X}_i=\text{diag}({\bf x}_i^T)=\begin{pmatrix}{\bf x}_i^T & 0 & \cdots & 0\\ 0 & {\bf x}_i^T & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & {\bf x}_i^T\\ \end{pmatrix}\]
where the \(0\)s are \(1\times p\) vectors. The dimension of the design matrix for covariate pattern \(i\) is \(c \times c\cdot p\).
The vector of coefficients has dimension \(c\cdot p \times 1\). \[{\boldsymbol \beta}=\begin{pmatrix}{\boldsymbol \beta}_1\\{\boldsymbol \beta}_2\\\vdots\\{\boldsymbol \beta}_c\\ \end{pmatrix}\]
\[{\bf X}_i=\begin{pmatrix}1 & 0 & \cdots & 0 & {\bf x}_i^T\\ 0 & 1 & \cdots & 0 & {\bf x}_i^T\\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 & {\bf x}_i^T\\ \end{pmatrix}\] he dimension of the design matrix for covariate pattern \(i\) is \(c \times (c+k)\)
The vector of coefficients has dimension \((c+ k) \times 1\) (where \(p=k+1\)), and now the thresholds replace the intercept and are put first in the vector, and the effects of the covariates are the same for all categories. \[{\boldsymbol \beta}=\begin{pmatrix}\theta_1\\ \theta_2 \\ \vdots \\ \theta_c \\{\boldsymbol \beta}\\ \end{pmatrix}\]
\[{\bf s}({\boldsymbol \beta})=\sum_{i=1}^G {\bf X}_i^T {\bf D}_i {\boldsymbol \Sigma}_i^{-1}({\bf y}_i-n_i{\boldsymbol \pi}_i)\]
where
The dimension of the matrix is \(cp\times cp\) for the nominal case and \((c+k) \times (c+k)\) for the ordinal case studied.
\[F({\boldsymbol \beta})=\sum_{i=1}^G {\bf X}_i^T {\bf W}_i {\bf X}_i\]
where \({\bf W}_i\) is given as \({\bf D}_i{\boldsymbol \Sigma}_i^{-1}{\bf D}_i^T\).
As in modules 1-5 we find the ML estimate by the Fisher scoring or Newton Raphson method.
As in modules 1-5 the ML estimator \(\hat{\boldsymbol \beta}\) asymptotically follows a multivariate normal distribution with unbiased mean and covariance matrix given by the inverse of the expected Fisher information matrix.
Table 1 shows the results from a study where two injection plans for the the neuroleptic preparation perphenazine decanoate have been compared (from P. Knudsen, L. B. Hansen, K. Højholdt, N. E. Larsen, Acta Psychiatrica Scandinavica, 1985).
A group of 19 psycotic patients was given injections every second week, while another group of 19 patients was given injections every third week. The patients were studied for six months, and the effect of the treatment was evaluated in the end. Clinical evaluations was done using a six-point scale calles CGI (Clinical Global Impression), where a higher score means a worse state for the patient.
The 12 rows in Table 1 corresponds to 12 different combinations of the three explanatory variables \(x_1\), \(x_2\) and \(x_3\):
\[ x_1 = \begin{cases} 0 \text{, if injections are given every second week} \\ 1 \text{, if injections are given every third week} \end{cases} \\ x_2 = \begin{cases} 0 \text{, if patient is female} \\ 1 \text{, if patient is male} \end{cases} \\ x_3 = \text{ CGI at beginning of treatment (initial CGI)} \]
Interval (x1) | Sex (x2) | Initial CGI (x3) | Final CGI 0 (y0) | Final CGI 1 (y1) | Final CGI 2 (y2) |
---|---|---|---|---|---|
2 | F | 2 | 1 | 0 | 0 |
2 | F | 3 | 3 | 1 | 0 |
2 | F | 4 | 0 | 1 | 0 |
2 | M | 3 | 4 | 4 | 1 |
2 | M | 4 | 0 | 2 | 1 |
2 | M | 5 | 0 | 0 | 1 |
3 | F | 2 | 1 | 0 | 0 |
3 | F | 3 | 2 | 1 | 0 |
3 | F | 4 | 1 | 2 | 0 |
3 | M | 2 | 3 | 1 | 0 |
3 | M | 3 | 0 | 5 | 0 |
3 | M | 4 | 0 | 3 | 0 |
The corresponding responses are counts for each combination of explanatory variables:
\[ y_0 = \text{ number with (CGI = 0) after treatment} \\ y_1 = \text{ number with (CGI = 1) after treatment} \\ y_2 = \text{ number with (CGI = 2) after treatment} \]
Note that no patients had final CGI above 2 after the treatment.. We use \(y_2\) as the reference category. Assume that the CGI for a patient with covariate vector \(\mathbf{x} = (x_1, x_2, x_3)\) has response value \(j\), \(j = 0, 1, 2\), with probabilities
\[\pi_j = \text{Prob}(\text{CGI} = j | \mathbf{x}) \text{ for } j = 0, 1, 2.\]
The response \(\mathbf{y} = (y_0, y_1, y_2)\) for a row in the table is assumed to come from a multinomial distribution vector \(\mathbf{Y} = (Y_0, Y_1, Y_2)\) with probability vector \(\mathbf{\pi} = (\pi_0, \pi_1, \pi_2)\), and \(\mathbf{\pi}\) depends on \(\mathbf{x}\). Note that \(x_3\) is numeric, not cathegorical.
Below you can see the deviance for all proportional odds models that
contain the variable \(x_3\) (initial
CGI). The formulas work in the same way as for the lm
and
glm
formulas.
Model | Deviance |
---|---|
cbind(y0, y1, y2) ~ x3 | 17.68 |
cbind(y0, y1, y2) ~ x1 + x3 | 17.66 |
cbind(y0, y1, y2) ~ x2 + x3 | 10.64 |
cbind(y0, y1, y2) ~ x1 * x3 | 14.43 |
cbind(y0, y1, y2) ~ x2 * x3 | 10.63 |
cbind(y0, y1, y2) ~ x1 + x2 + x3 | 10.56 |
cbind(y0, y1, y2) ~ x1 * x2 + x3 | 10.33 |
cbind(y0, y1, y2) ~ x1 * x3 + x2 | 8.52 |
cbind(y0, y1, y2) ~ x1 + x2 * x3 | 10.51 |
cbind(y0, y1, y2) ~ x1 * x2 + x1 * x3 | 8.33 |
cbind(y0, y1, y2) ~ x1 * x2 + x2 * x3 | 10.31 |
cbind(y0, y1, y2) ~ x1 * x3 + x2 * x3 | 8.47 |
cbind(y0, y1, y2) ~ x1 * x2 + x1 * x3 + x2 * x3 | 8.28 |
cbind(y0, y1, y2) ~ x1 * x2 * x3 | 8.28 |
cbind(y0, y1, y2) ~ x1 + x2*x3
? OBS: Ask Mette/Ingeborg if
you are not sure before moving on!x1*x2 + x1*x3
.
How many parameters are in this model? How many degrees of freedom for
the deviance?x2 + x3
,
x1 + x2 + x3
, x1*x2 + x3
and
x1*x2 + x1*x3
as candidates for “the best model”. Which of
these models would you choose based on the deviances? Reason using
hypothesis testing (you have to choose one model for the
null-hypothesis, which?).Below you see (a slightly edited) R
-summary of the
x1 + x2 + x3
model. Assume we still use the model from
a).
Call:
vglm(formula = x, family = cumulative(parallel = TRUE), data = data2)
Pearson residuals:
Min 1Q Median 3Q Max
logit(P[Y<=1]) -1.294 -0.33737 -0.08605 0.1788 1.211
logit(P[Y<=2]) -1.442 -0.08222 0.12144 0.2428 1.100
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept):1 8.0355 2.5079 3.204 0.00136 **
(Intercept):2 12.4324 3.1752 3.916 9.02e-05 ***
x1 -0.2199 0.7561 -0.291 0.77114
x2M -2.1576 0.8875 -2.431 0.01506 *
x3 -2.2725 0.6985 -3.253 0.00114 **
---
Number of linear predictors: 2
Names of linear predictors: logit(P[Y<=1]), logit(P[Y<=2])
Residual deviance: 10.5552 on ?? degrees of freedom
Log-likelihood: -11.6754 on ?? degrees of freedom
And this is the estimated covariace matrix for the estimators:
## (Intercept):1 (Intercept):2 x1 x2M x3
## (Intercept):1 6.2898 7.6696 -0.6459 -1.1575 -1.6602
## (Intercept):2 7.6696 10.0816 -0.7281 -1.4997 -2.0776
## x1 -0.6459 -0.7281 0.5716 0.0921 0.0986
## x2M -1.1575 -1.4997 0.0921 0.7877 0.1982
## x3 -1.6602 -2.0776 0.0986 0.1982 0.4879
R
-summary to estimate \(e^{\beta_k}\) for \(k = 1, 2, 3\).We will analyses an extended version of the alligators data, where also the gender of the alligator is included.
In the data below the following column names are given:
library(VGAM)
data2 = "http://www.stat.ufl.edu/~aa/glm/data/Alligators2.dat"
ali2 = read.table(data2, header = T)
ali2
## lake gender size y1 y2 y3 y4 y5
## 1 1 1 1 7 1 0 0 5
## 2 1 1 0 4 0 0 1 2
## 3 1 0 1 16 3 2 2 3
## 4 1 0 0 3 0 1 2 3
## 5 2 1 1 2 2 0 0 1
## 6 2 1 0 13 7 6 0 0
## 7 2 0 1 0 1 0 1 0
## 8 2 0 0 3 9 1 0 2
## 9 3 1 1 3 7 1 0 1
## 10 3 1 0 8 6 6 3 5
## 11 3 0 1 2 4 1 1 4
## 12 3 0 0 0 1 0 0 0
## 13 4 1 1 13 10 0 2 2
## 14 4 1 0 9 0 0 1 2
## 15 4 0 1 3 9 1 0 1
## 16 4 0 0 8 1 0 0 1
colnames(ali2)
## [1] "lake" "gender" "size" "y1" "y2" "y3" "y4" "y5"
fit = vglm(cbind(y2, y3, y4, y5, y1) ~ factor(lake) + factor(size) + factor(gender),
family = multinomial, data = ali2)
# 6 possible models to investigate: only lake, only gender, only size,
# lake+gender, lake+size, size+gender
fit.lake = vglm(cbind(y2, y3, y4, y5, y1) ~ factor(lake), family = multinomial, data = ali2)
fit.size = vglm(cbind(y2, y3, y4, y5, y1) ~ factor(size), family = multinomial, data = ali2)
fit.gender = vglm(cbind(y2, y3, y4, y5, y1) ~ factor(gender), family = multinomial,
data = ali2)
fit.lake.size = vglm(cbind(y2, y3, y4, y5, y1) ~ factor(lake) + factor(size), family = multinomial,
data = ali2)
fit.lake.gender = vglm(cbind(y2, y3, y4, y5, y1) ~ factor(lake) + factor(gender),
family = multinomial, data = ali2)
fit.gender.size = vglm(cbind(y2, y3, y4, y5, y1) ~ factor(size) + factor(gender),
family = multinomial, data = ali2)
all = list(fit = fit, fit.lake = fit.lake, fit.size = fit.size, fit.gender = fit.gender,
fit.lake.size = fit.lake.size, fit.lake.gender = fit.lake.gender, fit.gender.size = fit.gender.size)
lapply(all, AIC)
## $fit
## [1] 199.6089
##
## $fit.lake
## [1] 201.9464
##
## $fit.size
## [1] 221.7073
##
## $fit.gender
## [1] 227.0375
##
## $fit.lake.size
## [1] 196.089
##
## $fit.lake.gender
## [1] 204.2444
##
## $fit.gender.size
## [1] 228.4214
# you may also look at deviance tests if you prefer that to AIC
# what is best? toggle to match your choice
best = fit
summary(best)
##
## Call:
## vglm(formula = cbind(y2, y3, y4, y5, y1) ~ factor(lake) + factor(size) +
## factor(gender), family = multinomial, data = ali2)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept):1 -2.78138 0.66023 -4.213 2.52e-05 ***
## (Intercept):2 -1.94388 0.78487 -2.477 0.01326 *
## (Intercept):3 -1.41471 0.69227 -2.044 0.04100 *
## (Intercept):4 -0.78063 0.50933 -1.533 0.12536
## factor(lake)2:1 3.14631 0.71871 4.378 1.20e-05 ***
## factor(lake)2:2 1.25128 0.84769 1.476 0.13992
## factor(lake)2:3 -1.10312 1.20848 -0.913 0.36134
## factor(lake)2:4 -0.79033 0.77202 -1.024 0.30597
## factor(lake)3:1 3.04532 0.70100 4.344 1.40e-05 ***
## factor(lake)3:2 1.84442 0.83227 2.216 0.02668 *
## factor(lake)3:3 0.76742 0.84320 0.910 0.36276
## factor(lake)3:4 0.76354 0.59707 1.279 0.20097
## factor(lake)4:1 1.87573 0.62803 2.987 0.00282 **
## factor(lake)4:2 -1.15467 1.19439 -0.967 0.33367
## factor(lake)4:3 -0.50671 0.79598 -0.637 0.52439
## factor(lake)4:4 -0.76568 0.57046 -1.342 0.17953
## factor(size)1:1 1.24700 0.42426 2.939 0.00329 **
## factor(size)1:2 -0.35772 0.65755 -0.544 0.58643
## factor(size)1:3 -0.28685 0.65065 -0.441 0.65931
## factor(size)1:4 0.09454 0.46435 0.204 0.83867
## factor(gender)1:1 -0.78010 0.37880 -2.059 0.03946 *
## factor(gender)1:2 -0.32559 0.60986 -0.534 0.59343
## factor(gender)1:3 -0.52430 0.66647 -0.787 0.43147
## factor(gender)1:4 -0.33231 0.46161 -0.720 0.47159
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Names of linear predictors: log(mu[,1]/mu[,5]), log(mu[,2]/mu[,5]),
## log(mu[,3]/mu[,5]), log(mu[,4]/mu[,5])
##
## Residual deviance: 55.2285 on 40 degrees of freedom
##
## Log-likelihood: -75.8045 on 40 degrees of freedom
##
## Number of Fisher scoring iterations: 6
##
## No Hauck-Donner effect found in any of the estimates
##
##
## Reference group is level 5 of the response
pchisq(deviance(best), df.residual(best), lower.tail = FALSE)
## [1] 0.05511724
confint(best)
## 2.5 % 97.5 %
## (Intercept):1 -4.0754134 -1.48734693
## (Intercept):2 -3.4821892 -0.40556893
## (Intercept):3 -2.7715358 -0.05787731
## (Intercept):4 -1.7789047 0.21764692
## factor(lake)2:1 1.7376626 4.55495618
## factor(lake)2:2 -0.4101634 2.91273242
## factor(lake)2:3 -3.4717099 1.26546012
## factor(lake)2:4 -2.3034619 0.72280210
## factor(lake)3:1 1.6713798 4.41925474
## factor(lake)3:2 0.2131968 3.47564154
## factor(lake)3:3 -0.8852275 2.42007378
## factor(lake)3:4 -0.4067037 1.93378524
## factor(lake)4:1 0.6448037 3.10665210
## factor(lake)4:2 -3.4956349 1.18628645
## factor(lake)4:3 -2.0668098 1.05338460
## factor(lake)4:4 -1.8837510 0.35239661
## factor(size)1:1 0.4154726 2.07853673
## factor(size)1:2 -1.6464982 0.93105746
## factor(size)1:3 -1.5620902 0.98839240
## factor(size)1:4 -0.8155718 1.00464755
## factor(gender)1:1 -1.5225425 -0.03766570
## factor(gender)1:2 -1.5208982 0.86971475
## factor(gender)1:3 -1.8305675 0.78196426
## factor(gender)1:4 -1.2370402 0.57242028
None found at NTNU or UiO - except the IL-problem.
install.packages(c("VGAM", "ggplot2", "statmod", "knitr"))