Hints and reminders are in bold

Questions appear in blue.

Some hints and answers are hidden away in a fold, like this: Well done, you found me!

The exam style questions are optional, but will be helpful for, well, the exam. They will also help you now to see if you understand the content.

Interactive lecture - second week

We will use a data set on contraceptive use in Fiji (data from 1975). The data is to be analysed with “current use of contraceptive” as response and some (or all) of “age”, “level of education” and “desire for more children” as covariates.

The data is available at https://grodri.github.io/datasets/cuse.dat with the following description:

  • Contraceptive use: yes (using) or no (notUsing)
  • age: categorical variable with 4 levels: “<25”, “25-29”, “30-39”, “40-49”
  • education: categorical variable with 2 levels giving the highest level of education obtained: “low” and “high”
  • wantsMore: Desires more children: yes or no
ds=read.table("https://grodri.github.io/datasets/cuse.dat",header=TRUE)
names(ds)
## [1] "age"       "education" "wantsMore" "notUsing"  "using"
summary(ds)
##      age             education          wantsMore            notUsing     
##  Length:16          Length:16          Length:16          Min.   :  8.00  
##  Class :character   Class :character   Class :character   1st Qu.: 31.00  
##  Mode  :character   Mode  :character   Mode  :character   Median : 56.50  
##                                                           Mean   : 68.75  
##                                                           3rd Qu.: 85.75  
##                                                           Max.   :212.00  
##      using      
##  Min.   : 4.00  
##  1st Qu.: 9.50  
##  Median :29.00  
##  Mean   :31.69  
##  3rd Qu.:49.00  
##  Max.   :80.00
dim(ds)
## [1] 16  5
head(ds)
##     age education wantsMore notUsing using
## 1   <25       low       yes       53     6
## 2   <25       low        no       10     4
## 3   <25      high       yes      212    52
## 4   <25      high        no       50    10
## 5 25-29       low       yes       60    14
## 6 25-29       low        no       19    10

We will study binary regression using the logit model, and we will work with grouped data.

Plan: Start with Problem 2, then move to 3 and 4, and if you have time, look at Problem 1.

Problem 1: The null model - no covariates included.

(This is the most theoretical of the problems and rather technical - but with some cool results on the null model.)

a) Fit the null model and call it fit0. Explain what you see.

Answer
fit0=glm(cbind(using,notUsing)~1,data=ds,family=binomial(link=logit))
summary(fit0)
## 
## Call:
## glm(formula = cbind(using, notUsing) ~ 1, family = binomial(link = logit), 
##     data = ds)
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -0.77455    0.05368  -14.43   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 165.77  on 15  degrees of freedom
## Residual deviance: 165.77  on 15  degrees of freedom
## AIC: 239.28
## 
## Number of Fisher Scoring iterations: 4

We see some output. The estimate of the intercept is -0.77. This is on the log-odds scale: since it is negative, the estimated probability that a woman uses contraception is below 0.5. (The response is written as cbind(success, failure); there are other ways of writing it, see the help page for glm.)

The null and residual deviances are the same, because this is the null model.

Observe: When we only have an intercept in the model, the model matrix is a \(G \times 1\) matrix of ones (one row per covariate pattern). Then \({\bf x}_j^T\boldsymbol{\beta}=\beta_0\). The log-likelihood can be written as (with \(j=1,\ldots,G\))

\[\begin{align} l(\boldsymbol{\beta}) &=\sum_{j=1}^G (y_j{\bf x}_j^T \boldsymbol{\beta} - n_j \log(1+\exp({\bf x}_j^T\boldsymbol{\beta})))= \sum_{j=1}^G (y_j\beta_0 - n_j \log(1+\exp(\beta_0)))\\ &= \beta_0 \sum_{j=1}^G y_j - \log(1+\exp(\beta_0))\sum_{j=1}^G n_j =\beta_0 N_1 - \log(1+\exp(\beta_0))N \end{align}\] where \(N_1=\sum_{j=1}^G y_j\) is the total number of successes and \(N=\sum_{j=1}^G n_j\) is the total number of trials (summed over all covariate patterns; here, \(N\) is the number of women in the data set). Also, \(N_2=N-N_1\) is the total number of failures.

We will use this log-likelihood in the next question.

b) What is the relationship between your estimated coefficient and the proportion in the data set using contraception (\(N_1/N\))?

Hint 0 What would be your intuitive estimator for \(\pi\) (common to all groups)?
Hint 1 Find the derivative of the log-likelihood with respect to \(\beta_0\), and use this to find the MLE for \(\beta_0\).
Hint 2 It may be easier to first see what \(\hat{\pi}\) is, where \(\hat{\pi}=\frac{\exp(\hat{\beta}_0)}{1+\exp(\hat{\beta}_0)}\) (so plogis), and then \(\hat{\beta}_0=\text{logit}(\hat{\pi})\) (so qlogis).
Hint 3 You can verify by using the estimate from glm.
Hint 3 You can verify by using the estimate from glm.
Answer

The intuitive estimate of \(\pi\) is \(\hat{\pi} = N_1/N\). From above, \(l(\beta_0) = \beta_0 N_1 - \log(1+\exp(\beta_0))N\), so we can differentiate (to get the score function), set it to zero and solve…

\[ \begin{align} \frac{dl(\beta_0)}{d\beta_0} &= N_1 - N\,\frac{d}{d\beta_0}\log(1+\exp(\beta_0)) \\ &= N_1 - N\,\frac{d\log(u)}{du}\,\frac{du}{d\beta_0} && \text{using } u=1+\exp(\beta_0) \\ &= N_1 - N\,\frac{\exp(\beta_0)}{1+\exp(\beta_0)} = 0 \\ \Rightarrow \quad \hat{\pi} &= \frac{\exp(\hat{\beta}_0)}{1+\exp(\hat{\beta}_0)} = \frac{N_1}{N} \end{align} \]

Recognise this? It’s the logistic function, so applying the inverse (the logit link) gives \(\hat{\beta}_0 = \log(\hat{\pi}/(1-\hat{\pi})) = \log(N_1/N_2)\).

We can verify this in R:

N = sum(ds$using + ds$notUsing)
N1 = sum(ds$using)
N2 = N - N1

log(N1/N2)
## [1] -0.7745545
fit0$coefficients
## (Intercept) 
##  -0.7745545

c) We know that the (asymptotic) estimated covariance matrix of the ML estimate is the inverse of the expected Fisher information matrix. Here the matrix is a scalar, so the covariance matrix is just the variance.

Find the mathematical expression for the estimated variance of our estimated intercept.

Hint 1 We have \(F(\boldsymbol{\beta}) = \sum_{j=1}^G x_jx_j^T n_j \pi_j(1-\pi_j)\), and then insert \(x_j = 1\) and \(\pi_j(1-\pi_j)=\pi(1-\pi)\), and hopefully you found above that \(\hat{\pi}=N_1/N\) in our model with only intercept.
Hint 2 \(\frac{N}{N_1 \cdot N_2}=\frac{1}{N_1}+\frac{1}{N_2}\) to make things prettier.
Answer

\(\text{Var}(\hat{\beta}_0) = F^{-1}(\hat{\beta}_0)\), so find \(F(\beta_0)\):

\[ \begin{aligned} F(\hat{\beta}_0) &= \sum_{j=1}^G x_jx_j^T n_j \hat{\pi}_j(1-\hat{\pi}_j) \\ &= N \hat{\pi}(1-\hat{\pi}) \\ &= N \frac{N_1}{N}\left(\frac{N_2}{N}\right) \\ &= \frac{N_1 N_2}{N} \end{aligned} \]

So, \(\text{Var}(\hat{\beta}_0) = F^{-1}(\hat{\beta}_0) = \frac{N}{N_1 N_2} = \frac{1}{N_1}+\frac{1}{N_2}\).

d) What is the estimated standard deviation (numerical value) of the parameter estimate? Did your calculation above give the same result?

Hint vcov(fit0)
Answer
vcov(fit0)
##             (Intercept)
## (Intercept) 0.002881477
Note that vcov(fit0) returns the variance; the standard deviation is sqrt(0.002881477) ≈ 0.0537, which matches the Std. Error in the summary. Hopefully your calculation agreed, since 1/sum(ds$notUsing) + 1/sum(ds$using) = 0.0028815, as derived above.

e) What is the asymptotic distribution of the estimated regression coefficient? Use this to write down the formula for the 95% confidence interval for the intercept, and calculate the interval in R. Compare numerically to confint.default and confint.

Answer

The coefficient is asymptotically normally distributed, so the (approximate) 95% confidence interval is

\(\hat{\beta}_0 \pm 1.96 \sqrt{\sigma^2_{\beta_0}} =\) -0.77 \(\pm 1.96 \times\) 0.05 = (-0.88, -0.67).

c(coef(fit0) - 1.96*sqrt(vcov(fit0)), coef(fit0) + 1.96*sqrt(vcov(fit0)))
## [1] -0.8797661 -0.6693428
(ci=confint.default(fit0))
##                  2.5 %     97.5 %
## (Intercept) -0.8797641 -0.6693448
confint(fit0)
##      2.5 %     97.5 % 
## -0.8804716 -0.6700014

So, the asymptotic answer is (almost) the same as the one confint.default() gives. confint() (which here dispatches to confint.glm()) differs in the third decimal place because it uses a different approach, the profile likelihood.

f) Translate the 95% CI to the probability scale.
Hint plogis is the inverse logit. Or write your own function.
Answer
plogis(ci)
##                 2.5 %    97.5 %
## (Intercept) 0.2932267 0.3386436
fit0$family$linkinv(ci)
##                 2.5 %    97.5 %
## (Intercept) 0.2932267 0.3386436

Problem 2: We then study the effect of the covariate “wantsMore”

(a little less maths)

Fit a regression with wantsMore as covariate, and call this fit1.

a) Explain what the estimated coefficient (\(\beta_1\)) means.

Hint

Interpretation using odds: if `wantsMore` goes from 0=no to 1=yes, then…

Also, check whether a “success” is using or not using contraceptives.
Answer
fit1=glm(cbind(using,notUsing)~wantsMore,data=ds,family=binomial)
summary(fit1)
## 
## Call:
## glm(formula = cbind(using, notUsing) ~ wantsMore, family = binomial, 
##     data = ds)
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -0.18636    0.07971  -2.338   0.0194 *  
## wantsMoreyes -1.04863    0.11067  -9.475   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 165.772  on 15  degrees of freedom
## Residual deviance:  74.098  on 14  degrees of freedom
## AIC: 149.61
## 
## Number of Fisher Scoring iterations: 4
exp(fit1$coefficients)
##  (Intercept) wantsMoreyes 
##    0.8299712    0.3504178
\(\beta_1\) is the difference in the log-odds between the two levels of wantsMore, i.e. between women who want more children and those who do not. The estimate is negative (-1.05), so a woman who wants more children is less likely to be using contraceptives. On the odds scale, the odds for a woman who wants more children are exp(-1.05) ≈ 0.35 times the odds for a woman who does not, i.e. 65% lower. Put the other way around, the odds are 2.9 (\(=1/0.35\)) times higher for women who do not want more children.

b) Is this covariate significant?

  • Write down the null and alternative hypothesis to test.
  • Then test using the Wald test - write down the formula yourself.
  • Use vcov(fit1) to access the inverse of the Fisher information matrix.
  • What are the degrees of freedom for this test?
  • Compare to the print-out from summary.
Answer
  • Write down the null and alternative hypothesis to test.

\(H_0: \beta_1=0\) vs. \(H_1: \beta_1 \ne 0\)

  • Then test using the Wald test - write down the formula yourself.

Formula, where \(a_{kk}(\hat{\boldsymbol{\beta}})\) is the diagonal element of the inverse Fisher information \(F^{-1}(\hat{\boldsymbol{\beta}})\) corresponding to \(\beta_1\) (i.e. the estimated variance of \(\hat{\beta}_1\)):

\[ z = \frac{\hat{\beta}_1 - 0}{\sqrt{a_{kk}(\hat{\boldsymbol{\beta}})}}, \qquad w = z^2 \]

Under \(H_0\), \(z\) is asymptotically standard normal and, equivalently, \(w\) is asymptotically \(\chi^2_1\)-distributed.

  • Use vcov(fit1) to access the inverse of the Fisher information matrix.

In code:

(w <- coef(fit1)["wantsMoreyes"]^2/vcov(fit1)["wantsMoreyes", "wantsMoreyes"])
## wantsMoreyes 
##     89.77763
(z <- coef(fit1)["wantsMoreyes"]/sqrt(vcov(fit1)["wantsMoreyes", "wantsMoreyes"]))
## wantsMoreyes 
##    -9.475106
  • What are the degrees of freedom for this test?

1: we test a single parameter, so \(w \sim \chi^2_1\) under \(H_0\) (equivalently, \(z\) is compared to the standard normal).

  • Compare to the print-out from summary.
summary(fit1)
## 
## Call:
## glm(formula = cbind(using, notUsing) ~ wantsMore, family = binomial, 
##     data = ds)
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -0.18636    0.07971  -2.338   0.0194 *  
## wantsMoreyes -1.04863    0.11067  -9.475   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 165.772  on 15  degrees of freedom
## Residual deviance:  74.098  on 14  degrees of freedom
## AIC: 149.61
## 
## Number of Fisher Scoring iterations: 4

They are the same!
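As an extra check, the p-value can be computed directly from either \(w\) or \(z\) defined above; both should reproduce the Pr(>|z|) column of the summary:

pchisq(w, df = 1, lower.tail = FALSE)
2 * pnorm(-abs(z))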

c) Alternatively, the likelihood ratio test can be used.

Write down the formula for the likelihood ratio test statistic. Use this in combination with the fit0 and fit1 objects to calculate the likelihood ratio statistic. What is the \(p\)-value?

Hint Write a function to calculate the log likelihoods
Answer

\[ - 2\ln \lambda=-2(\ln L(\hat{\boldsymbol{\beta}}_B)-\ln L(\hat{\boldsymbol{\beta}}_A)) \] where B is the smaller model (here fit0), nested within the larger model A (here fit1); under \(H_0\) the statistic is asymptotically \(\chi^2\)-distributed with df equal to the difference in the number of parameters. We can calculate the log likelihoods:

# Grouped binomial log-likelihood for the logit model, up to the additive
# log-binomial-coefficient constant:
# l(beta) = sum_j [ y_j * x_j'beta - n_j * log(1 + exp(x_j'beta)) ]
calcloglik <- function(par, args) {
    y <- args$y  # number of successes per covariate pattern
    x <- args$x  # model matrix
    n <- args$n  # number of trials per covariate pattern
    sum(y * (x %*% par) - n * log(1 + exp(x %*% par)))
}

args0 <- list(y = ds$using, n = ds$using + ds$notUsing, x = cbind(rep(1, nrow(ds))))
args1 <- list(y = ds$using, n = ds$using + ds$notUsing, x = cbind(rep(1, nrow(ds)), 
    as.numeric(ds$wantsMore=="yes")))

betas0 <- fit0$coefficients
betas1 <- fit1$coefficients

(ll0 <- calcloglik(betas0, args0))
## [1] -1001.847
(ll1 <- calcloglik(betas1, args1))
## [1] -956.0096
(LRTstat <- -2*(ll0-ll1))
## [1] 91.6744
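The two models differ by one parameter, so LRTstat is compared to a \(\chi^2_1\) distribution. (Note that calcloglik omits the additive \(\log\binom{n_j}{y_j}\) term, so ll0 and ll1 differ from logLik(fit0) and logLik(fit1) by the same constant, which cancels in the difference.) A quick way to get the p-value:

pchisq(LRTstat, df = 1, lower.tail = FALSE)

The p-value is vanishingly small, so the conclusion is the same as for the Wald test.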

d) The likelihood ratio test statistic can alternatively be calculated using the residual deviance in fit1 and fit0, and is given as fit0$deviance-fit1$deviance.

Do you see why?

Hint

“No” may be correct, but is not the answer we are looking for.

Also, think about deviances being additive.
Answer

Yes, of course I see why! Each residual deviance compares a model to the same saturated model, \(D = -2(l(\text{model}) - l(\text{saturated}))\), so when we take the difference fit0$deviance - fit1$deviance the saturated term cancels and we are left with \(-2(l_0 - l_1)\): exactly the likelihood ratio statistic.

fit0$deviance-fit1$deviance
## [1] 91.6744
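For reference, R can do the same comparison in one call; a quick cross-check using anova (it reports the same deviance difference):

anova(fit0, fit1, test = "Chisq")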

e) Are the two test statistics (Wald and LRT) equal? Do the two tests give the same conclusions?

Answer No, they are not quite equal (the Wald statistic is \(w = 89.8\), while the LRT gives \(91.7\)), but they lead to the same conclusion: there is strong evidence for an effect of wanting more children on the probability of using contraception.

Problem 3: Now two covariates - deviance and model comparison

(no maths - only definitions and print-out)

Now we study the response together with age and wantsMore. We will consider the following 5 models. See R-code and print-out below.

ds$Y <- cbind(ds$using, ds$notUsing)
models <- list(
  null     = glm(Y ~ 1, family=binomial, data=ds),
  age      = glm(Y ~ age, family=binomial, data=ds),
  desire   = glm(Y ~ wantsMore, family=binomial, data=ds),
  additive = glm(Y ~ age + wantsMore, family=binomial, data=ds),
  interact = glm(Y ~ age*wantsMore, family=binomial, data=ds)
)
models
## $null
## 
## Call:  glm(formula = Y ~ 1, family = binomial, data = ds)
## 
## Coefficients:
## (Intercept)  
##     -0.7746  
## 
## Degrees of Freedom: 15 Total (i.e. Null);  15 Residual
## Null Deviance:       165.8 
## Residual Deviance: 165.8     AIC: 239.3
## 
## $age
## 
## Call:  glm(formula = Y ~ age, family = binomial, data = ds)
## 
## Coefficients:
## (Intercept)     age25-29     age30-39     age40-49  
##     -1.5072       0.4607       1.0483       1.4246  
## 
## Degrees of Freedom: 15 Total (i.e. Null);  12 Residual
## Null Deviance:       165.8 
## Residual Deviance: 86.58     AIC: 166.1
## 
## $desire
## 
## Call:  glm(formula = Y ~ wantsMore, family = binomial, data = ds)
## 
## Coefficients:
##  (Intercept)  wantsMoreyes  
##      -0.1864       -1.0486  
## 
## Degrees of Freedom: 15 Total (i.e. Null);  14 Residual
## Null Deviance:       165.8 
## Residual Deviance: 74.1  AIC: 149.6
## 
## $additive
## 
## Call:  glm(formula = Y ~ age + wantsMore, family = binomial, data = ds)
## 
## Coefficients:
##  (Intercept)      age25-29      age30-39      age40-49  wantsMoreyes  
##      -0.8698        0.3678        0.8078        1.0226       -0.8241  
## 
## Degrees of Freedom: 15 Total (i.e. Null);  11 Residual
## Null Deviance:       165.8 
## Residual Deviance: 36.89     AIC: 118.4
## 
## $interact
## 
## Call:  glm(formula = Y ~ age * wantsMore, family = binomial, data = ds)
## 
## Coefficients:
##           (Intercept)               age25-29               age30-39  
##               -1.4553                 0.6354                 1.5411  
##              age40-49           wantsMoreyes  age25-29:wantsMoreyes  
##                1.7643                -0.0640                -0.2672  
## age30-39:wantsMoreyes  age40-49:wantsMoreyes  
##               -1.0905                -1.3671  
## 
## Degrees of Freedom: 15 Total (i.e. Null);  8 Residual
## Null Deviance:       165.8 
## Residual Deviance: 20.1  AIC: 107.6
lapply(models,summary)
## $null
## 
## Call:
## glm(formula = Y ~ 1, family = binomial, data = ds)
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -0.77455    0.05368  -14.43   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 165.77  on 15  degrees of freedom
## Residual deviance: 165.77  on 15  degrees of freedom
## AIC: 239.28
## 
## Number of Fisher Scoring iterations: 4
## 
## 
## $age
## 
## Call:
## glm(formula = Y ~ age, family = binomial, data = ds)
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -1.5072     0.1303 -11.571  < 2e-16 ***
## age25-29      0.4607     0.1727   2.667  0.00765 ** 
## age30-39      1.0483     0.1544   6.788 1.14e-11 ***
## age40-49      1.4246     0.1940   7.345 2.06e-13 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 165.772  on 15  degrees of freedom
## Residual deviance:  86.581  on 12  degrees of freedom
## AIC: 166.09
## 
## Number of Fisher Scoring iterations: 4
## 
## 
## $desire
## 
## Call:
## glm(formula = Y ~ wantsMore, family = binomial, data = ds)
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -0.18636    0.07971  -2.338   0.0194 *  
## wantsMoreyes -1.04863    0.11067  -9.475   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 165.772  on 15  degrees of freedom
## Residual deviance:  74.098  on 14  degrees of freedom
## AIC: 149.61
## 
## Number of Fisher Scoring iterations: 4
## 
## 
## $additive
## 
## Call:
## glm(formula = Y ~ age + wantsMore, family = binomial, data = ds)
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept)   -0.8698     0.1571  -5.536 3.10e-08 ***
## age25-29       0.3678     0.1754   2.097    0.036 *  
## age30-39       0.8078     0.1598   5.056 4.27e-07 ***
## age40-49       1.0226     0.2039   5.014 5.32e-07 ***
## wantsMoreyes  -0.8241     0.1171  -7.037 1.97e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 165.772  on 15  degrees of freedom
## Residual deviance:  36.888  on 11  degrees of freedom
## AIC: 118.4
## 
## Number of Fisher Scoring iterations: 4
## 
## 
## $interact
## 
## Call:
## glm(formula = Y ~ age * wantsMore, family = binomial, data = ds)
## 
## Coefficients:
##                       Estimate Std. Error z value Pr(>|z|)    
## (Intercept)            -1.4553     0.2968  -4.903 9.43e-07 ***
## age25-29                0.6354     0.3564   1.783  0.07463 .  
## age30-39                1.5412     0.3183   4.842 1.29e-06 ***
## age40-49                1.7643     0.3435   5.136 2.80e-07 ***
## wantsMoreyes           -0.0640     0.3303  -0.194  0.84637    
## age25-29:wantsMoreyes  -0.2672     0.4091  -0.653  0.51366    
## age30-39:wantsMoreyes  -1.0905     0.3733  -2.921  0.00349 ** 
## age40-49:wantsMoreyes  -1.3672     0.4834  -2.828  0.00468 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 165.772  on 15  degrees of freedom
## Residual deviance:  20.099  on  8  degrees of freedom
## AIC: 107.61
## 
## Number of Fisher Scoring iterations: 4
data.frame(deviance = round(unlist(lapply(models, deviance)), 2),
           df = unlist(lapply(models, df.residual)),
           aic = round(unlist(lapply(models, AIC))))
##          deviance df aic
## null       165.77 15 239
## age         86.58 12 166
## desire      74.10 14 150
## additive    36.89 11 118
## interact    20.10  8 108

Note: The function lapply (list apply) will apply a function to each element of a list.

a) Explain what each of these models include.

Hint We’re interested in which covariate is in which model
Answer
  • null: intercept only, no covariate effects. This is the same as the model in Problem 1.
  • age: only has age as an effect.
  • desire: only has whether the woman wants more children (wantsMore). This is the model in Problem 2.
  • additive: has the two main effects from the age and desire models.
  • interact: additionally includes their interaction, i.e. the effect of age depends on whether the woman wants more children.

b) What is the definition of the deviance and its df? What can we use the deviance for?

Answer

The deviance is defined as the likelihood ratio statistic, where we put the saturated model in place of the larger model A and our candidate model in place of the smaller model B:

\[ \begin{aligned} D&=-2(\ln L(\text{candidate model})-\ln L(\text{saturated model})) \\ &=-2(l(\hat{\pi})-l(\tilde{\pi}))= -2\sum_{j=1}^G(l_j(\hat{\pi}_j)-l_j(\tilde{\pi}_j)) \end{aligned} \]

The df is the difference in the number of parameters between the saturated model and the candidate model: the saturated model has one parameter per group, so here df \(= G - p\), where \(p\) is the number of parameters in the candidate model.

We can use it to (a) compare nested models, (b) test the goodness of fit of a model, and (c) check for overdispersion.

Optional: derive the formula for the deviance for the logit model (the derivation is not given on the module pages).

Hint Compare the log-likelihood at the fitted values \(\hat{\pi}_j\) with the saturated fit \(\tilde{\pi}_j = y_j/n_j\).
Answer
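A sketch: the grouped log-likelihood is \(l(\pi)=\sum_{j=1}^G \left(\log\binom{n_j}{y_j} + y_j\log \pi_j + (n_j-y_j)\log(1-\pi_j)\right)\), and the binomial coefficients cancel in the difference between the candidate and saturated models, so

\[ \begin{aligned} D &= 2\sum_{j=1}^G\left( y_j\log\frac{\tilde{\pi}_j}{\hat{\pi}_j} + (n_j-y_j)\log\frac{1-\tilde{\pi}_j}{1-\hat{\pi}_j} \right) \\ &= 2\sum_{j=1}^G\left( y_j\log\frac{y_j}{n_j\hat{\pi}_j} + (n_j-y_j)\log\frac{n_j-y_j}{n_j-n_j\hat{\pi}_j} \right), \end{aligned} \]

where the last step inserts \(\tilde{\pi}_j = y_j/n_j\).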
c) Perform a likelihood ratio test - based on the deviance results given in the data frame in the R chunk above - to compare the additive and interact models. First write down the null and alternative hypotheses you are testing. Which of the two models would you prefer?
Answer
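The additive model is nested within the interaction model, so we test \(H_0\): the three age:wantsMore interaction coefficients are all zero (the additive model is adequate) against \(H_1\): at least one of them is non-zero. The LRT statistic is the difference in residual deviances, and its df is the difference in residual df; a quick sketch using the fitted objects from above:

(lrt <- deviance(models$additive) - deviance(models$interact))
(dfdiff <- df.residual(models$additive) - df.residual(models$interact))
pchisq(lrt, df = dfdiff, lower.tail = FALSE)

The statistic is \(36.89 - 20.10 = 16.79\) on \(11 - 8 = 3\) df, giving a p-value of roughly 0.0008, so we reject \(H_0\) and prefer the interaction model.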

d) What if you use the AIC for model selection, which model (out of the 5) will you then use?

Answer
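From the AIC column of the table above, the interaction model has the smallest AIC (about 108, against 118 for the additive model), so AIC also selects the interaction model. A one-line check:

sapply(models, AIC)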

Problem 4: Plotting

(yes, mainly R and trying to understand)

Finally, we want to use the additive model with all three covariates: Y~age+education+wantsMore to study different plots.

fit3add=glm(cbind(using,notUsing)~age+education+wantsMore,data=ds,family=binomial)
summary(fit3add)
## 
## Call:
## glm(formula = cbind(using, notUsing) ~ age + education + wantsMore, 
##     family = binomial, data = ds)
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept)   -0.8082     0.1590  -5.083 3.71e-07 ***
## age25-29       0.3894     0.1759   2.214  0.02681 *  
## age30-39       0.9086     0.1646   5.519 3.40e-08 ***
## age40-49       1.1892     0.2144   5.546 2.92e-08 ***
## educationlow  -0.3250     0.1240  -2.620  0.00879 ** 
## wantsMoreyes  -0.8330     0.1175  -7.091 1.33e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 165.772  on 15  degrees of freedom
## Residual deviance:  29.917  on 10  degrees of freedom
## AIC: 113.43
## 
## Number of Fisher Scoring iterations: 4
exp(fit3add$coefficients)
##  (Intercept)     age25-29     age30-39     age40-49 educationlow wantsMoreyes 
##    0.4456506    1.4760678    2.4808804    3.2845805    0.7225312    0.4347628

a) Comment on the output from summary. Use the deviance to test whether this is a good model. Which model do you then compare to (what is the saturated model here)?

Answer
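All coefficients in the summary are significant (age25-29 and educationlow at the 5% and 1% levels, the others much more strongly), with the same signs as before. The deviance tests the model against the saturated model, which here has one probability per covariate pattern, i.e. 16 parameters (one per row of the grouped data). Under the null hypothesis that the additive model is adequate, the residual deviance of 29.92 is approximately \(\chi^2_{10}\)-distributed:

pchisq(deviance(fit3add), df = df.residual(fit3add), lower.tail = FALSE)

This gives a p-value of roughly 0.001, so there is evidence of lack of fit; Problem 3 suggests a missing age:wantsMore interaction as one culprit.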

b) Look at the plot of fitted values vs. deviance residuals. Do you see a trend?

Answer
library(ggplot2)
# fit3add$residuals are working residuals, so ask for deviance residuals explicitly
plotdf <- data.frame(dres = residuals(fit3add, type = "deviance"),
    fitted = fit3add$fitted.values, age = ds$age)
ggplot(plotdf, aes(x = fitted, y = dres)) + geom_point() +
    labs(x = "Fitted values", y = "Deviance residuals")

c) Here is code to produce a plot of the fitted values for the 16 covariate patterns together with the saturated model (called frac), on the logit scale.

library(ggplot2)
frac <- ds$using/(ds$using + ds$notUsing)
logitdf <- data.frame(lfitted = qlogis(fit3add$fitted), lfrac = qlogis(frac),
    age = ds$age, wantsMore = ds$wantsMore, education = ds$education)
ggplot(logitdf, aes(x = age)) +
    geom_point(aes(y = lfrac, colour = "saturated")) +
    geom_point(aes(y = lfitted, colour = "fitted")) +
    facet_wrap(facets = ~wantsMore * education) +
    labs(x = "age", y = "")

How can you see from the dots for the fitted values that we are assuming an additive model on the logit scale? (With an additive model, the pattern of logit fitted values against age is the same in every panel, only shifted.) Do you also see that for the observed values? What does that imply? Any other observations?

Answer
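An additive model on the logit scale means the age effect is the same in every panel: the fitted points trace the same age pattern, only shifted vertically by the education and wantsMore effects. The observed (saturated) points do not follow one common pattern, in particular the age pattern differs between women who want more children and those who do not, which is consistent with the significant age:wantsMore interaction found in Problem 3.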

To see what Rodríguez has found to be the best model, look at the bottom of http://data.princeton.edu/wws509/R/c3s6.html.

Note: This exercise is based on the excellent notes of Germán Rodríguez at Princeton, see in particular the R logs at http://data.princeton.edu/wws509/R/c3s1.html.

Exam questions

For this module the following are exam questions to work with

In addition, these essay-type exam questions are closely related to this module.

December 2014

There are two main asymptotic results that are behind essentially everything we have done with respect to inference for generalized linear models. These are

  1. asymptotic normality of maximum likelihood estimators (MLE), and

  2. asymptotic result for likelihood ratio tests (LRT).

State, describe and discuss the main assumptions behind these two asymptotic results, and how they are used in practice to do inference for parameters, test various hypotheses, and compare nested models in logistic regression.

December 2016

We will consider (binomial or binary) logistic regression, where we have independent observations \[Y_i \sim \text{Binomial}(n_i,\pi_i), \text{ } i=1,\ldots,n\] so that \[P(Y_i=y_i)=\binom{n_i}{y_i}\pi_i^{y_i}(1-\pi_i)^{n_i-y_i}, \text{ } y_i=0,1,\ldots, n_i.\] The linear predictor is \[ \eta_i={\bf x}_i^T\boldsymbol{\beta}\] and \[ \pi_i=\frac{\exp(\eta_i)}{1+\exp(\eta_i)}\] or \[ \text{logit}(\pi_i)=\eta_i.\] Here, \({\bf x}_i\) is a vector of the \(p\) covariates for the \(i\)th observation \(y_i\) with size (number of trials) \(n_i\), and \(\boldsymbol{\beta}\) is the vector of \(p\) unknown regression coefficients.

Write an introduction to logistic regression and its practical usage, for a student with a good background in statistics but no knowledge about Generalized Linear Models (GLM). Topics you may want to consider are:

  • When to use it? Underlying assumptions.
  • Parameter estimation, limiting results for the MLE, Fisher information and observed Fisher information, confidence intervals and hypothesis testing.
  • Output analysis, residual plots (when possible) and interpretation of the \(\boldsymbol{\beta}\)-coefficients
  • Deviance and its usage.

R packages

install.packages(c("tidyverse",
                   "investr",
                   "knitr",
                   "kableExtra",
                   "faraway",
                   "viridis",
                   "statmod"))

References for further reading

  • A. Agresti (2015). “Foundations of Linear and Generalized Linear Models”. Wiley.
  • A. J. Dobson and A. G. Barnett (2008). “An Introduction to Generalized Linear Models”, third edition. Chapman & Hall/CRC.
  • J. Faraway (2015). “Extending the Linear Model with R”, second edition. Chapman & Hall/CRC. http://www.maths.bath.ac.uk/~jjf23/ELM/
  • P. McCullagh and J. A. Nelder (1989). “Generalized Linear Models”, second edition. Chapman & Hall.