Interactive session

Problem 1: Exam 2006, problem 1 (ordinal model)

Table 1 shows the results from a study where two injection plans for the neuroleptic preparation perphenazine decanoate have been compared (from P. Knudsen, L. B. Hansen, K. Højholdt, N. E. Larsen, Acta Psychiatrica Scandinavica, 1985).

A group of 19 psychotic patients was given injections every second week, while another group of 19 patients was given injections every third week. The patients were studied for six months, and the effect of the treatment was evaluated at the end. Clinical evaluations were done using a six-point scale called CGI (Clinical Global Impression), where a higher score means a worse state for the patient.

The 12 rows in Table 1 correspond to 12 different combinations of the three explanatory variables \(x_1\), \(x_2\) and \(x_3\):

\[ x_1 = \begin{cases} 0 \text{, if injections are given every second week} \\ 1 \text{, if injections are given every third week} \end{cases} \\ x_2 = \begin{cases} 0 \text{, if patient is female} \\ 1 \text{, if patient is male} \end{cases} \\ x_3 = \text{ CGI at beginning of treatment (initial CGI)} \]

Table 1: Data for neuroleptic treatment

| Interval (x1) | Sex (x2) | Initial CGI (x3) | Final CGI 0 (y0) | Final CGI 1 (y1) | Final CGI 2 (y2) |
|:-:|:-:|:-:|:-:|:-:|:-:|
| 2 | F | 2 | 1 | 0 | 0 |
| 2 | F | 3 | 3 | 1 | 0 |
| 2 | F | 4 | 0 | 1 | 0 |
| 2 | M | 3 | 4 | 4 | 1 |
| 2 | M | 4 | 0 | 2 | 1 |
| 2 | M | 5 | 0 | 0 | 1 |
| 3 | F | 2 | 1 | 0 | 0 |
| 3 | F | 3 | 2 | 1 | 0 |
| 3 | F | 4 | 1 | 2 | 0 |
| 3 | M | 2 | 3 | 1 | 0 |
| 3 | M | 3 | 0 | 5 | 0 |
| 3 | M | 4 | 0 | 3 | 0 |

The corresponding responses are counts for each combination of explanatory variables:

\[ y_0 = \text{ number with (CGI = 0) after treatment} \\ y_1 = \text{ number with (CGI = 1) after treatment} \\ y_2 = \text{ number with (CGI = 2) after treatment} \]

Note that no patients had final CGI above 2 after the treatment. We use \(y_2\) as the reference category. Assume that the CGI for a patient with covariate vector \(\mathbf{x} = (x_1, x_2, x_3)\) has response value \(j\), \(j = 0, 1, 2\), with probabilities

\[\pi_j = \text{Prob}(\text{CGI} = j | \mathbf{x}) \text{ for } j = 0, 1, 2.\]

The response \(\mathbf{y} = (y_0, y_1, y_2)\) for a row in the table is assumed to be an observation of a multinomially distributed vector \(\mathbf{Y} = (Y_0, Y_1, Y_2)\) with probability vector \(\mathbf{\pi} = (\pi_0, \pi_1, \pi_2)\), where \(\mathbf{\pi}\) depends on \(\mathbf{x}\). Note that \(x_3\) is numeric, not categorical.

a)

  • Write down the proportional odds model for these data, and discuss it briefly. Assume there are no interactions between \(x_1\), \(x_2\) and \(x_3\). Remember that \(y_2\) is the reference category.
Answer

The response is ordinal and multinomial. The proportional odds model can be motivated by a latent continuous variable underlying the observed CGI.

\[\log \left( \frac{\pi_0}{\pi_1 + \pi_2} \right) = \theta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3\] \[\log \left( \frac{\pi_0 + \pi_1}{\pi_2} \right) = \theta_1 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3\]
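For reference, such a model can be fitted with vglm from the VGAM package; a minimal sketch, assuming the data of Table 1 are stored in a data frame data2 with columns y0, y1, y2, x1, x2, x3 (the same setup as the summary shown in e)):

library(VGAM)
# Proportional odds model: cumulative logits with common slopes
# (parallel = TRUE); the last response column, y2, is the reference.
fit <- vglm(cbind(y0, y1, y2) ~ x1 + x2 + x3,
    family = cumulative(parallel = TRUE), data = data2)
summary(fit)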

  • Express \(\pi_j\), \(j = 0, 1, 2\) as functions of the \(\mathbf{\theta}\)’s and \(\mathbf{\beta}\)’s in the model and \(\mathbf{x}\).
Answer

Since the inverse of the logit function is the logistic function (sometimes called the expit function), we have:

\[\pi_0 = \frac{e^{\theta_0 + \pmb{x}^T \pmb{\beta}}}{1 + e^{\theta_0 + \pmb{x}^T \pmb{\beta}}}\]

\[\pi_0 + \pi_1 = \frac{e^{\theta_1 + \pmb{x}^T \pmb{\beta}}}{1 + e^{\theta_1 + \pmb{x}^T \pmb{\beta}}}\]

\[\implies \pi_1 = \frac{e^{\theta_1 + \pmb{x}^T \pmb{\beta}}}{1 + e^{\theta_1 + \pmb{x}^T \pmb{\beta}}} - \pi_0\]

\[\pi_2 = 1 - \pi_0 - \pi_1\]
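To make the formulas concrete, a small numerical sketch; the parameter values are roughly the estimates that appear in e) and are used here only for illustration:

# Illustration: (pi0, pi1, pi2) for one covariate vector x,
# using made-up values close to the estimates in e)
expit <- function(eta) exp(eta)/(1 + exp(eta))
theta <- c(8.04, 12.43)  # theta_0, theta_1
beta <- c(-0.22, -2.16, -2.27)  # beta_1, beta_2, beta_3
x <- c(0, 1, 3)  # injections every second week, male, initial CGI 3
pi0 <- expit(theta[1] + sum(beta * x))
pi1 <- expit(theta[2] + sum(beta * x)) - pi0
pi2 <- 1 - pi0 - pi1
c(pi0, pi1, pi2)  # sums to 1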

b)

  • \(\text{Prob}(\text{CGI} \leq j | \mathbf{x})/\text{Prob}(\text{CGI} > j | \mathbf{x})\) for \(j = 0, 1\) are called the cumulative odds ratios for a patient with covariate vector \(\mathbf{x}\). Show that if initial CGI increases by 1 in the model from a), then the cumulative odds ratios will be multiplied by \(e^{\beta_3}\). Here \(\beta_3\) is the coefficient belonging to \(x_3\) in the linear predictor of the model.
Answer

\[\frac{\text{Prob}(\text{CGI} \leq 0 | \mathbf{x})}{\text{Prob}(\text{CGI} > 0 | \mathbf{x})} = \frac{\pi_0}{\pi_1 + \pi_2} = e^{\theta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3}\]

\[\frac{\text{Prob}(\text{CGI} \leq 1 | \mathbf{x})}{\text{Prob}(\text{CGI} > 1 | \mathbf{x})} = \frac{\pi_0 + \pi_1}{\pi_2} = e^{\theta_1 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3}\]

We can see that if \(x_3\) increases by 1 (with \(x_1\) and \(x_2\) unchanged), the exponent increases by \(\beta_3\), which means that the cumulative odds ratio is multiplied by \(e^{\beta_3}\).

  • Interpret the value \(e^{\beta_3}\).
Answer If \(e^{\beta_3} < 1\) (i.e. \(\beta_3 < 0\)), increasing the initial CGI reduces the odds of being in state 0 or 1 after the treatment (and increases the odds if \(\beta_3 > 0\)).
  • Interpret also the values \(e^{\beta_1}\) and \(e^{\beta_2}\).
Answer

From \(\text{Prob}(\text{CGI} \leq 0 | \mathbf{x})/\text{Prob}(\text{CGI} > 0 | \mathbf{x})\) we see that increasing \(x_1\) or \(x_2\) by 1 will in the same way lead to multiplying by \(e^{\beta_1}\) and \(e^{\beta_2}\), respectively. If \(e^{\beta_1} < 1\), going from treatment every second week (\(x_1 = 0\)) to every third week (\(x_1 = 1\)) gives a reduction in the odds for the “good” states 0 and 1. If \(e^{\beta_2} < 1\), males will have lower odds of being in states 0 and 1.

Note that we want as low state (i.e., \(y\)) as possible.

c)

  • Describe the saturated model for these data. How many free parameters does it have? (Remark: how many “parameters” can be estimated.)
Answer

Saturated model: each row in the table has its own \(\pmb{\pi}\)-vector \([\pi_{i0}, \pi_{i1}, \pi_{i2}]\). Since \(\pi_{i0} + \pi_{i1} + \pi_{i2} = 1\), each row contributes two free probabilities, so the model has \(12 \cdot 2 = 24\) free parameters.

  • How would you calculate the deviance for the model from a)? (Just explain using words, no calculations necessary!)
Answer

The deviance is given by

\[D = -2(l(\text{candidate model}) - l(\text{saturated model})).\]

From the module pages:

\[D = 2 \sum_{i=1}^{12} \sum_{j=0}^2 y_{ij} \log\left(\frac{y_{ij}}{\hat{y}_{ij}}\right)\] where \(\hat{y}_{ij} = n_i \hat{\pi}_{ij}\), and \(\pmb{y}_i = [y_{i0}, y_{i1}, y_{i2}]\) is the response for the \(i\)th row in the table. \(\hat{\pi}_{ij}\) is the estimated probability \(\pi_{ij}\) under the candidate model.
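A sketch of this computation in R, assuming (hypothetical names) y is the 12 × 3 matrix of observed counts and yhat the matrix of fitted counts \(n_i \hat{\pi}_{ij}\); terms with \(y_{ij} = 0\) contribute zero:

# Deviance from observed and fitted counts; 0 * log(0/yhat) is set to 0
devianceD <- function(y, yhat) 2 * sum(ifelse(y == 0, 0, y * log(y/yhat)))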

  • How many degrees of freedom does the deviance have here? Give reasons for your answer.
Answer

Degrees of freedom for the deviance = number of free parameters in the saturated model minus number of free parameters in the candidate model = \(24 - 5 = 19\).

Below you can see the deviance for all proportional odds models that contain the variable x3 (initial CGI). The model formulas work in the same way as in lm and glm.

| Model | Deviance |
|:--|--:|
| cbind(y0, y1, y2) ~ x3 | 17.68 |
| cbind(y0, y1, y2) ~ x1 + x3 | 17.66 |
| cbind(y0, y1, y2) ~ x2 + x3 | 10.64 |
| cbind(y0, y1, y2) ~ x1 * x3 | 14.43 |
| cbind(y0, y1, y2) ~ x2 * x3 | 10.63 |
| cbind(y0, y1, y2) ~ x1 + x2 + x3 | 10.56 |
| cbind(y0, y1, y2) ~ x1 * x2 + x3 | 10.33 |
| cbind(y0, y1, y2) ~ x1 * x3 + x2 | 8.52 |
| cbind(y0, y1, y2) ~ x1 + x2 * x3 | 10.51 |
| cbind(y0, y1, y2) ~ x1 * x2 + x1 * x3 | 8.33 |
| cbind(y0, y1, y2) ~ x1 * x2 + x2 * x3 | 10.31 |
| cbind(y0, y1, y2) ~ x1 * x3 + x2 * x3 | 8.47 |
| cbind(y0, y1, y2) ~ x1 * x2 + x1 * x3 + x2 * x3 | 8.28 |
| cbind(y0, y1, y2) ~ x1 * x2 * x3 | 8.28 |
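A sketch of how such a table (and the list allmodels used later) could be built; only the first six formulas are written out, so that allmodels[[6]] is the x1 + x2 + x3 model used below:

# Fit each formula and collect the deviances
forms <- c("cbind(y0, y1, y2) ~ x3", "cbind(y0, y1, y2) ~ x1 + x3",
    "cbind(y0, y1, y2) ~ x2 + x3", "cbind(y0, y1, y2) ~ x1 * x3",
    "cbind(y0, y1, y2) ~ x2 * x3", "cbind(y0, y1, y2) ~ x1 + x2 + x3")  # etc.
allmodels <- lapply(forms, function(f) vglm(as.formula(f),
    family = cumulative(parallel = TRUE), data = data2))
round(sapply(allmodels, deviance), 2)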

d)

  • New: What do we mean by the formula cbind(y0, y1, y2) ~ x1 + x2*x3? Note: Ask if you are not sure before moving on!
Answer

cbind(y0, y1, y2) ~ x1 + x2*x3 means that we have a model with the covariates x1, x2, x3 and the interaction x2·x3 (x2 times x3).

  • Describe the model that corresponds to x1*x2 + x1*x3. How many parameters are in this model? How many degrees of freedom for the deviance?
Answer

x1*x2 + x1*x3 has the linear predictor (all \(x\)’s are vectors here)

\[\theta_j + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_1 x_2 + \beta_5 x_1 x_3\]

Note that this is what we put on the right-hand side of \(\log \left( \frac{\pi_0}{\pi_1 + \pi_2} \right)\) and \(\log \left( \frac{\pi_0 + \pi_1}{\pi_2} \right)\) from a). This model has 7 parameters (\(\beta_1, \ldots, \beta_5\) and the two intercept parameters \(\theta_0, \theta_1\)), i.e. \(24 - 7 = 17\) degrees of freedom for the deviance.

  • A statistician has picked the models x2 + x3, x1 + x2 + x3, x1*x2 + x3 and x1*x2 + x1*x3 as candidates for “the best model”. Which of these models would you choose based on the deviances? Reason using hypothesis testing (you have to choose one model for the null-hypothesis, which?).
Answer

The number of free parameters and the deviance degrees of freedom for each of the four models (x2 + x3, x1 + x2 + x3, x1*x2 + x3 and x1*x2 + x1*x3) are listed below:

| Model | Parameters | df |
|:--|--:|--:|
| x2 + x3 | 4 | 20 |
| x1 + x2 + x3 | 5 | 19 |
| x1*x2 + x3 | 6 | 18 |
| x1*x2 + x1*x3 | 7 | 17 |

We are testing \(H_0\): x2 + x3 vs the three other models:

  • vs x1 + x2 + x3: \(\Delta D = 10.64 - 10.56 = 0.08\)
  • vs x1*x2 + x3: \(\Delta D = 10.64 - 10.33 = 0.31\)
  • vs x1*x2 + x1*x3: \(\Delta D = 10.64 - 8.33 = 2.31\)

The deviance difference is approximately \(\chi^2\)-distributed with degrees of freedom equal to the difference in the number of parameters (1, 2 and 3 here). The critical value at the 5 % significance level is 3.841 for 1 df, and even larger for more df (see below), so none of the differences are significant and we choose the model x2 + x3. This suggests that x1 contributes nothing to the model, i.e., the injection interval does not seem to matter.

# critical values (0.95 quantiles) of the chi-squared distribution, used
# for testing at the 5 % significance level, for various degrees of freedom
qchisq(0.95, 1:4)
## [1] 3.841459 5.991465 7.814728 9.487729
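Equivalently, we can compute the p-values of the three tests directly; a small sketch, with degrees of freedom equal to the parameter differences 1, 2 and 3:

# p-values for the deviance differences against x2 + x3
pchisq(c(0.08, 0.31, 2.31), df = 1:3, lower.tail = FALSE)
# all well above 0.05, consistent with keeping x2 + x3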

e)

Below you see (a slightly edited) R-summary of the x1 + x2 + x3 model. Assume we still use the model from a).


Call:
vglm(formula = x, family = cumulative(parallel = TRUE), data = data2)


Pearson residuals:
                  Min       1Q   Median     3Q   Max
logit(P[Y<=1]) -1.294 -0.33737 -0.08605 0.1788 1.211
logit(P[Y<=2]) -1.442 -0.08222  0.12144 0.2428 1.100

Coefficients: 
              Estimate Std. Error z value Pr(>|z|)    
(Intercept):1   8.0355     2.5079   3.204  0.00136 ** 
(Intercept):2  12.4324     3.1752   3.916 9.02e-05 ***
x1             -0.2199     0.7561  -0.291  0.77114    
x2M            -2.1576     0.8875  -2.431  0.01506 *  
x3             -2.2725     0.6985  -3.253  0.00114 ** 
---

Number of linear predictors:  2 

Names of linear predictors: logit(P[Y<=1]), logit(P[Y<=2])

Residual deviance: 10.5552 on ?? degrees of freedom

Log-likelihood: -11.6754 on ?? degrees of freedom

And this is the estimated covariance matrix of the estimators:

##               (Intercept):1 (Intercept):2      x1     x2M      x3
## (Intercept):1        6.2898        7.6696 -0.6459 -1.1575 -1.6602
## (Intercept):2        7.6696       10.0816 -0.7281 -1.4997 -2.0776
## x1                  -0.6459       -0.7281  0.5716  0.0921  0.0986
## x2M                 -1.1575       -1.4997  0.0921  0.7877  0.1982
## x3                  -1.6602       -2.0776  0.0986  0.1982  0.4879
  • Use the R-summary to estimate \(e^{\beta_k}\) for \(k = 1, 2, 3\).
Answer

Estimates:

exp(coefficients(allmodels[[6]])[3:5])
##        x1       x2M        x3 
## 0.8025806 0.1156044 0.1030548
  • Find an approximate confidence interval for \(e^{\beta_1}\). Comment on this considering the model choice you made in d).
Answer

Approximate confidence interval (approximate because we rely on the asymptotic normality of the \(\hat{\beta}\)'s); we use a 95 % confidence level:

\[\exp\left(\hat{\beta}_1 \pm 1.96 \cdot \sqrt{\widehat{\text{Var}}(\hat{\beta}_1)}\right)\]

exp(coefficients(allmodels[[6]])[3] + qnorm(0.975) * sqrt(vcov(allmodels[[6]])[3,
    3]) * c(-1, 1))
## [1] 0.1823578 3.5322631

This confidence interval contains 1, so we cannot reject \(H_0\): \(e^{\beta_1} = 1\) (i.e., \(\beta_1 = 0\)). This means that the interval of treatment does not matter for the final CGI levels. This is the same conclusion as the one from d).

  • Estimate the probability to get final CGI equal to 0 for a female with injections every second week and initial CGI equal to 5.
Answer

Final CGI equal to 0 for

  • injections every second week \(\implies \ x_1 = 0\),
  • female \(\implies \ x_2 = 0\)
  • initial CGI = 5 \(\implies \ x_3 = 5\)

From a):

\[\hat{\pi}_0 = \frac{e^{\hat{\theta}_0 + 0 \hat{\beta}_1 + 0\hat{\beta}_2 + 5\hat{\beta}_3}}{1 + e^{\hat{\theta}_0 + 0 \hat{\beta}_1 + 0\hat{\beta}_2 + 5\hat{\beta}_3}} = \frac{e^{\hat{\theta}_0 + 5\hat{\beta}_3}}{1 + e^{\hat{\theta}_0 + 5\hat{\beta}_3}}\]

Numbers:

pi0hat <- exp(sum(coefficients(allmodels[[6]]) * c(1, 0, 0, 0, 5)))/(1 + exp(sum(coefficients(allmodels[[6]]) *
    c(1, 0, 0, 0, 5))))
pi0hat
## [1] 0.03465839

Standard deviations can be found by Taylor expanding \(\hat{\pi}_0\) around \((\theta_0, \beta_3)\), i.e. the delta method (see e.g. https://en.wikipedia.org/wiki/Taylor_series#Taylor_series_in_several_variables). Let

\[f(x, y) = \frac{e^{x + 5y}}{1 + e^{x + 5y}}\]

Then \[ \frac{\partial f}{\partial x} = \frac{e^{x + 5y}}{(1 + e^{x + 5y})^2} = f(1-f) \text{ and } \frac{\partial f}{\partial y} = \frac{5e^{x + 5y}}{(1 + e^{x + 5y})^2} = 5f(1-f)\]

Thus \(\hat{\pi}_0 \approx \pi_0 + \pi_0(1 - \pi_0)(\hat{\theta}_0 - \theta_0) + 5\pi_0(1 - \pi_0)(\hat{\beta}_3 - \beta_3)\), so that the approximate variance is

\[\text{Var}(\hat{\pi}_0) = \pi_0^2 (1-\pi_0)^2 \left[ \text{Var}(\hat{\theta}_0) + 25 \text{Var}(\hat{\beta}_3) + 10 \text{Cov}(\hat{\theta}_0, \hat{\beta}_3) \right]\]

We estimate this by plugging in \(\hat{\pi}_0\) for \(\pi_0\) and reading the variances and covariances off the R print-out. The estimated standard deviation is then:

pi0hat * (1 - pi0hat) * sqrt(vcov(allmodels[[6]])[1, 1] + 25 * vcov(allmodels[[6]])[5,
    5] + 10 * vcov(allmodels[[6]])[1, 5])

## [1] 0.04594256
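As a cross-check, the same delta-method standard deviation can be computed with the full gradient vector; a sketch (gradient entries for parameters that do not enter \(\hat{\pi}_0\) are zero):

est <- coefficients(allmodels[[6]])
Sigma <- vcov(allmodels[[6]])
xvec <- c(1, 0, 0, 0, 5)  # picks out theta_0 and 5 * beta_3
p <- exp(sum(est * xvec))/(1 + exp(sum(est * xvec)))
grad <- p * (1 - p) * xvec  # chain rule: d pi0 / d param_k = pi0(1 - pi0) x_k
sqrt(t(grad) %*% Sigma %*% grad)  # same value as above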

  • Explain how you can find an estimated standard deviation for this estimate (you do not need to do all calculations). Hint: You need Taylor expansions here!
Answer This was done above: a first-order Taylor expansion (the delta method) of \(\hat{\pi}_0\) around \((\theta_0, \beta_3)\) gives the approximate variance, and its square root is the estimated standard deviation.

Problem 2: More alligators (nominal model)

We will analyse an extended version of the alligator data, where the gender of the alligator is also included.

In the data below the following column names are given:

  • lake: each of the 4 lakes in Florida (1:4)
  • gender: gender of alligator (0:) and (1:) – not given in data file, what do you think?
  • size: the size of the alligator (0: 2.3 meters or smaller) and (1: larger than 2.3 meters)
  • y1: fish
  • y2: invertebrate
  • y3: reptile
  • y4: bird
  • y5: other
  1. Investigate different models and select the best. Call this the best model.
Answer

First get the data:

library(VGAM)
data2 = "http://www.stat.ufl.edu/~aa/glm/data/Alligators2.dat"
ali2 = read.table(data2, header = T)
head(ali2)
##   lake gender size y1 y2 y3 y4 y5
## 1    1      1    1  7  1  0  0  5
## 2    1      1    0  4  0  0  1  2
## 3    1      0    1 16  3  2  2  3
## 4    1      0    0  3  0  1  2  3
## 5    2      1    1  2  2  0  0  1
## 6    2      1    0 13  7  6  0  0
ali2$lake <- factor(ali2$lake)
ali2$size <- factor(ali2$size)
ali2$gender <- factor(ali2$gender)
fit = vglm(cbind(y2, y3, y4, y5, y1) ~ lake + size + gender, family = multinomial,
    data = ali2)

Now fit 6 simpler models to investigate (the full model is kept in the list as well):

  • only lake,
  • only gender,
  • only size,
  • lake+gender,
  • lake+size,
  • size+gender
all <- list(
    fit = vglm(cbind(y2, y3, y4, y5, y1) ~ lake + size + gender,
        family = multinomial, data = ali2),
    fit.lake = vglm(cbind(y2, y3, y4, y5, y1) ~ lake,
        family = multinomial, data = ali2),
    fit.size = vglm(cbind(y2, y3, y4, y5, y1) ~ size,
        family = multinomial, data = ali2),
    fit.gender = vglm(cbind(y2, y3, y4, y5, y1) ~ gender,
        family = multinomial, data = ali2),
    fit.lake.size = vglm(cbind(y2, y3, y4, y5, y1) ~ lake + size,
        family = multinomial, data = ali2),
    fit.lake.gender = vglm(cbind(y2, y3, y4, y5, y1) ~ lake + gender,
        family = multinomial, data = ali2),
    fit.gender.size = vglm(cbind(y2, y3, y4, y5, y1) ~ size + gender,
        family = multinomial, data = ali2))

(AICall <- lapply(all, AIC))
## $fit
## [1] 199.6089
## 
## $fit.lake
## [1] 201.9464
## 
## $fit.size
## [1] 221.7073
## 
## $fit.gender
## [1] 227.0375
## 
## $fit.lake.size
## [1] 196.089
## 
## $fit.lake.gender
## [1] 204.2444
## 
## $fit.gender.size
## [1] 228.4214

The best model has the lowest AIC, so it is the model with both lake and size.

  2. Assess the model fit of this best model.
Answer

We can test if the residual deviance is larger than would be expected if the model fitted well:

best = all[[which(AICall == min(unlist(AICall)))]]
pchisq(deviance(best), df.residual(best), lower.tail = FALSE)
## [1] 0.05729729

The p-value is just above 0.05, so any evidence of model misfit is weak at best. The deviance is only about 1.36 times what would be expected, so it is not alarmingly large.

(Question for you: how much would the standard errors change if we accounted for this extra variation? A hint follows below.)
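A hint as a sketch: under a quasi-likelihood correction the standard errors would be multiplied by the square root of the estimated dispersion:

# dispersion estimated from the deviance
phi <- deviance(best)/df.residual(best)  # about 1.36
sqrt(phi)  # SEs would grow by roughly 16 %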

  3. Interpret effects and perform inference for the best model.
Answer

Have fun…

best = all[[which(AICall == min(unlist(AICall)))]]
(Summ <- summary(best))
## 
## Call:
## vglm(formula = cbind(y2, y3, y4, y5, y1) ~ lake + size, family = multinomial, 
##     data = ali2)
## 
## Coefficients: 
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept):1  -3.0655     0.6485  -4.727 2.28e-06 ***
## (Intercept):2  -2.0646     0.7296  -2.830  0.00466 ** 
## (Intercept):3  -1.6166     0.6401  -2.526  0.01155 *  
## (Intercept):4  -0.9041     0.4666  -1.938  0.05266 .  
## lake2:1         2.9195     0.7032   4.152 3.30e-05 ***
## lake2:2         1.1457     0.8360   1.370  0.17055    
## lake2:3        -1.2545     1.1961  -1.049  0.29428    
## lake2:4        -0.8960     0.7625  -1.175  0.23998    
## lake3:1         2.7219     0.6700   4.063 4.85e-05 ***
## lake3:2         1.6939     0.7843   2.160  0.03080 *  
## lake3:3         0.5238     0.7804   0.671  0.50211    
## lake3:4         0.6064     0.5584   1.086  0.27746    
## lake4:1         1.6417     0.6116   2.684  0.00727 ** 
## lake4:2        -1.2422     1.1859  -1.047  0.29489    
## lake4:3        -0.6408     0.7782  -0.823  0.41025    
## lake4:4        -0.8563     0.5573  -1.537  0.12436    
## size1:1         1.2966     0.4228   3.067  0.00216 ** 
## size1:2        -0.3604     0.6458  -0.558  0.57680    
## size1:3        -0.2609     0.6444  -0.405  0.68563    
## size1:4         0.0956     0.4591   0.208  0.83507    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Names of linear predictors: log(mu[,1]/mu[,5]), log(mu[,2]/mu[,5]), 
## log(mu[,3]/mu[,5]), log(mu[,4]/mu[,5])
## 
## Residual deviance: 59.7085 on 44 degrees of freedom
## 
## Log-likelihood: -78.0445 on 44 degrees of freedom
## 
## Number of Fisher scoring iterations: 6 
## 
## No Hauck-Donner effect found in any of the estimates
## 
## 
## Reference group is level  5  of the response

There are a lot of parameters, so we can plot the interesting ones:

Summ1 <- Summ@coef3[!grepl("\\(", rownames(Summ@coef3)), ]
Ests <- data.frame(Estimate = Summ1[, "Estimate"], L95 = Summ1[, "Estimate"] - 1.96 *
    Summ1[, "Std. Error"], U95 = Summ1[, "Estimate"] + 1.96 * Summ1[, "Std. Error"],
    Variable = gsub(":.*", "", rownames(Summ1)), Level = gsub(".*:", "", rownames(Summ1)))
Ests$At.x = 1:nrow(Summ1) + as.numeric(as.factor(Ests$Variable))

Vars <- tapply(Ests$At.x, list(Ests$Variable), mean)
Levels <- c("inverterbrate", "reptile", "bird", "other")

plot(Ests[, "Estimate"], Ests$At.x, xlim = range(Ests[, c("L95", "U95")]), ann = FALSE,
    col = Ests$Level, yaxt = "n")
segments(Ests[, "L95"], Ests$At.x, Ests[, "U95"], Ests$At.x, col = Ests$Level)
axis(1)
axis(2, labels = names(Vars), at = Vars)
mtext("Coefficient", 1, line = 3)
legend(2.4, 20, legend = Levels, fill = 1:4)
abline(v = 0, col = "grey", lty = 3)

Remember, fish is the base level, so (for example) more invertebrates than fish are eaten in lakes 2 to 4 compared to lake 1. Similarly, adults (size = 1) seem to eat more invertebrates relative to fish than juveniles do.
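To see these effects on the probability scale, we can list the fitted diet probabilities per row of the data; a sketch (fitted() on a vglm object returns one row of probabilities per data row, in the response order y2, y3, y4, y5, y1):

# Rows differing only in gender get identical fitted probabilities,
# since gender is not in the best model
cbind(ali2[, c("lake", "size")], round(fitted(best), 2))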

Exam questions

None found at NTNU or UiO - except the IL-problem.

R packages

install.packages(c("VGAM", "ggplot2", "statmod", "knitr"))

Further reading

  • A. Agresti (2015): “Foundations of Linear and Generalized Linear Models.” Chapter 6. Wiley.