Hints and reminders are in bold
Questions appear in blue.
The exam style questions are optional, but will be helpful for, well, the exam. They will also help you now to see if you understand the content.
We will use a data set on contraceptive use in Fiji (data from 1975). The data is to be analysed with “current use of contraceptive” as response and some (or all of) “age”, “level of education”, “desire for more children” as covariates.
The data is available at https://grodri.github.io/datasets/cuse.dat with the following description:
using
) or no
(notUsing
)age
: categorical variable with 5 levels: “<25”,
“25-29”, “30-39”, “40-49”education
: categorical variable with 2 levels giving
highest level of education obtained: Lower and UpperwantsMore
: Desires more children: yes or nods=read.table("https://grodri.github.io/datasets/cuse.dat",header=TRUE)
names(ds)
## [1] "age" "education" "wantsMore" "notUsing" "using"
summary(ds)
## age education wantsMore notUsing
## Length:16 Length:16 Length:16 Min. : 8.00
## Class :character Class :character Class :character 1st Qu.: 31.00
## Mode :character Mode :character Mode :character Median : 56.50
## Mean : 68.75
## 3rd Qu.: 85.75
## Max. :212.00
## using
## Min. : 4.00
## 1st Qu.: 9.50
## Median :29.00
## Mean :31.69
## 3rd Qu.:49.00
## Max. :80.00
dim(ds)
## [1] 16 5
head(ds)
## age education wantsMore notUsing using
## 1 <25 low yes 53 6
## 2 <25 low no 10 4
## 3 <25 high yes 212 52
## 4 <25 high no 50 10
## 5 25-29 low yes 60 14
## 6 25-29 low no 19 10
We will study binary regression using the logit model, and we will work with grouped data.
Plan: Start with Problem 2, then move to 3 and 4, and if you have time you look at Problem 1.
(This is the most theoretical of the problems and rather technical - but with some cool results on the null model.)
a) Fit the null model and
call it fit0
. Explain what you see.
fit0=glm(cbind(using,notUsing)~1,data=ds,family=binomial(link=logit))
summary(fit0)
##
## Call:
## glm(formula = cbind(using, notUsing) ~ 1, family = binomial(link = logit),
## data = ds)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.77455 0.05368 -14.43 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 165.77 on 15 degrees of freedom
## Residual deviance: 165.77 on 15 degrees of freedom
## AIC: 239.28
##
## Number of Fisher Scoring iterations: 4
We see some output. The estimate of the intercept is -0.77: this is
less than 0.5, so the mean probability that a woman uses contraception
is <0.5 (the response is written as
cbind(success, failure)
: there are other ways of writing
it, see the help page for glm
).
The null and residual deviances are the same, because this is the null model.
Observe: When we only have an intercept in the model, the model matrix will be an \(n \times 1\)-matrix with only ones. Then \({\bf x}_j^T\boldsymbol{\beta}=\beta_0\). The log-likelihood can be written as (let \(j=1,\ldots,G\))
\[\begin{align} l(\boldsymbol{\beta}) &=\sum_{j=1}^G (y_j{\bf x}_j^T \boldsymbol{\beta} - n_j \log(1+\exp({\bf x}_j^T\boldsymbol{\beta})))= \sum_{j=1}^G (y_j\beta_0 - n_j \log(1+\exp(\beta_0)))\\ &= \beta_0 \sum_{j=1}^G y_j - \log(1+\exp(\beta_0))\sum_{j=1}^G n_j =\beta_0 N_1 - \log(1+\exp(\beta_0))N \end{align}\] where \(N_1=\sum_{j=1}^G y_j\) is the total number of successes and \(N=\sum_{j=1}^G n_j\) is the total number of trials (over all covariates patterns, that is, here the number of individuals in the data set). Also \(N_2=N-N_1\) is the total number of failures.
We will use this loglikelihood in the next question.
b) What is the relationship between your estimated coefficient and the proportion in the data set using contraception (\(N_1/N\))?
plogis
), and then \(\hat{\boldsymbol{\beta}}_0=\text{logit}(\hat{\pi})\)
(so qlogis
).
glm
.
The intuitive estimate of \(\hat{\pi}\) is \(N_1/N\). From above, \(l(\boldsymbol{\beta}) = \beta_0 N_1 - \log(1+\exp(\beta_0))N\), so we can differentiate (to get the score), set that to 0 and solve…
\[ \begin{align} \frac{dl(\boldsymbol{\beta})}{d\beta_0} &= N_1 - \frac{\log(1+\exp(\beta_0))N}{d\beta_0} \\ &= N_1 - N \frac{\log(u)}{du}\frac{(1+\exp(\beta_0))}{d\beta_0} && \text{using } u=1+\exp(\beta_0) \\ &= N_1 - N \frac{1}{1+\exp(\beta_0)}\exp(\beta_0) = 0 \\ \hat{\pi} &= \frac{N_1}{N} = \frac{\exp(\beta_0)}{1+\exp(\beta_0)} \end{align} \]
Recognise this? it’s the logistic function, so the inverse is the logit link: \(\beta_0 = \log(\hat{\pi}/(1-\hat{\pi}))\).
We can verify this in R:
N = sum(ds$using + ds$notUsing)
N1 = sum(ds$using)
N2 = N - N1
log(N1/N2)
## [1] -0.7745545
fit0$coefficients
## (Intercept)
## -0.7745545
c) We know that the (asymptotic) estimated covariance matrix of the ML estimate is the inverse of the expected Fisher information matrix, here the matrix is only a scalar and the covariance matrix is just the variance.
Find the mathematical expression for the estimated variance of our estimated intercept.
\(Var(\boldsymbol{\beta}) = F^{-1}(\boldsymbol{\beta})\), so find \(F(\boldsymbol{\beta})\):
\[ \begin{aligned} F(\boldsymbol{\beta}) &= \sum_{j=1}^G x_jx_j^T n_j \pi_j(1-\pi_j) \\ &= N \pi(1-\pi) \\ &= N \frac{N_1}{N}\left(\frac{N_2}{N}\right) \\ &= \frac{N_1 N_2}{N} \end{aligned} \]
So, \(F^{-1}(\boldsymbol{\beta}) = Var(\boldsymbol{\beta}) = N/(N_1 N_2) = \frac{1}{N_1}+\frac{1}{N_2}\)d) What is the estimated (numerical value) standard deviation of the parameter estimate? Did your calculation above gives the same result?
vcov(fit0)
vcov(fit0)
## (Intercept)
## (Intercept) 0.002881477
Hopefully, “yes”, especially if
1/sum(ds$notUsing) + 1/sum(ds$using) =
0.0028815
e) What is the asymptotic
distribution of the estimated regression coefficient? Use this to write
down the formula for the 95% confidence interval for the intercept, and
calculate the interval in R. Compare numerically to
confint.default
and confint
.
The coefficient is asymptotically normally distributed, so the (approximate) 95% confidence interval is
\(\hat{\beta}_0 \pm 1.96 \sqrt{\sigma^2_{\beta_0}} =\) -0.77 \(\pm 1.96 \times\) 0.05 = (-0.88, -0.67).
c(coef(fit0) - 1.96*sqrt(vcov(fit0)), coef(fit0) + 1.96*sqrt(vcov(fit0)))
## [1] -0.8797661 -0.6693428
(ci=confint.default(fit0))
## 2.5 % 97.5 %
## (Intercept) -0.8797641 -0.6693448
confint(fit0)
## 2.5 % 97.5 %
## -0.8804716 -0.6700014
So, the asymptotic answer is (almost) the same as the one
confint.default()
gives. confint()
(which is
actually confint.glm()
) is different in the third decimal
place. It uses a different approach, something called a profile
likelihood.
plogis
is the inverse logit. Or write your own function.
plogis(ci)
## 2.5 % 97.5 %
## (Intercept) 0.2932267 0.3386436
fit0$family$linkinv(ci)
## 2.5 % 97.5 %
## (Intercept) 0.2932267 0.3386436
(a little less maths)
Fit a regression with wantsMore
as covariate, and call
this fit1
. a)
Explain what the estimated coefficient (\(\beta_1\)) means.
Interpretation using odds – if `wantsMore´ goes from 0=no to 1=yes, then…
Also, check whether a “success” is using or not using contraceptives.fit1=glm(cbind(using,notUsing)~wantsMore,data=ds,family=binomial)
summary(fit1)
##
## Call:
## glm(formula = cbind(using, notUsing) ~ wantsMore, family = binomial,
## data = ds)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.18636 0.07971 -2.338 0.0194 *
## wantsMoreyes -1.04863 0.11067 -9.475 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 165.772 on 15 degrees of freedom
## Residual deviance: 74.098 on 14 degrees of freedom
## AIC: 149.61
##
## Number of Fisher Scoring iterations: 4
exp(fit1$coefficients)
## (Intercept) wantsMoreyes
## 0.8299712 0.3504178
\(\beta_1\) is the difference between
the two levels of wantsMore
, i.e. it is the difference in
the log odds between women who want more children and those who do not.
The estimate is negative (-1.05), so a woman who wants more children is
less likely to be using contraceptives. The odds for a woman who wants
more children are 35% lower than they are for those who want more
children, . Put the other way around, they are 2.9 (\(=1/0.35\)) times higher for women who do
not want more children.
b) Is this covariate significant?
vcov(fit1)
to access the inverse of the Fisher
information matrix.summary
.\(H_0: \beta_1=0\) \(H_1: \beta_1 \ne 0\)
Formula:
\[ w = \frac{\beta_1 - 0}{\sqrt{a_{kk}(\hat{\boldsymbol{\beta}})}} \]
vcov(fit1)
to access the inverse of the Fisher
information matrix.In code:
(w <- coef(fit1)["wantsMoreyes"]^2/vcov(fit1)["wantsMoreyes", "wantsMoreyes"])
## wantsMoreyes
## 89.77763
(z <- coef(fit1)["wantsMoreyes"]/sqrt(vcov(fit1)["wantsMoreyes", "wantsMoreyes"]))
## wantsMoreyes
## -9.475106
1
summary
.summary(fit1)
##
## Call:
## glm(formula = cbind(using, notUsing) ~ wantsMore, family = binomial,
## data = ds)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.18636 0.07971 -2.338 0.0194 *
## wantsMoreyes -1.04863 0.11067 -9.475 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 165.772 on 15 degrees of freedom
## Residual deviance: 74.098 on 14 degrees of freedom
## AIC: 149.61
##
## Number of Fisher Scoring iterations: 4
They are the same!
c) Alternatively, the likelihood ratio test can be used.
Write down the formula for the likelihood
ratio test statistic. Use this in combination with the fit0
and fit1
objects to calculate the likelihood ratio
statistic. What is the \(p\)-value?
\[ - 2\ln \lambda=-2(\ln L(\hat{\boldsymbol{\beta}}_B)-\ln L(\hat{\boldsymbol{\beta}}_A)) \] We can calculate the log likelihoods:
calcloglik <- function(par, args) {
y <- args$y
x <- args$x
n <- args$n
res <- sum(y * x %*% par - n * log(1 + exp(x %*% par)))
return(res)
}
args0 <- list(y = ds$using, n = ds$using + ds$notUsing, x = cbind(rep(1, nrow(ds))))
args1 <- list(y = ds$using, n = ds$using + ds$notUsing, x = cbind(rep(1, nrow(ds)),
as.numeric(ds$wantsMore=="yes")))
betas0 <- fit0$coefficients
betas1 <- fit1$coefficients
(ll0 <- calcloglik(betas0, args0))
## [1] -1001.847
(ll1 <- calcloglik(betas1, args1))
## [1] -956.0096
(LRTstat <- -2*(ll0-ll1))
## [1] 91.6744
d) The likelihood ratio test statistic can
alternatively be calculated using the residual deviance in
fit1
and fit0
, and is given as
fit0$deviance-fit1$deviance
.
Do you see why?
“No” may be correct, but is not the answer we are looking for.
Also, think about deviances being additive.Yes, of course I see why! The total deviance is the sum of the “explained” and residual deviance, so we can look at either the increase in deviance or the decrease in residual deviance: they are the same.
fit0$deviance-fit1$deviance
## [1] 91.6744
e) Are the two test statistics (Wald and LRT) equal? Do the two tests give the same conclusions?
(no maths - only definitions and print-out)
Now we study the response together with age
and
wantsMore
. We will consider the following 5 models. See
R-code and print-out below.
ds$Y <- cbind(ds$using, ds$notUsing)
models <- list(
null = glm(Y ~ 1, family=binomial, data=ds),
age = glm(Y ~ age, family=binomial, data=ds),
desire = glm(Y ~ wantsMore, family=binomial, data=ds),
additive = glm(Y ~ age + wantsMore, family=binomial, data=ds),
interact = glm(Y ~ age*wantsMore, family=binomial, data=ds)
)
models
## $null
##
## Call: glm(formula = Y ~ 1, family = binomial, data = ds)
##
## Coefficients:
## (Intercept)
## -0.7746
##
## Degrees of Freedom: 15 Total (i.e. Null); 15 Residual
## Null Deviance: 165.8
## Residual Deviance: 165.8 AIC: 239.3
##
## $age
##
## Call: glm(formula = Y ~ age, family = binomial, data = ds)
##
## Coefficients:
## (Intercept) age25-29 age30-39 age40-49
## -1.5072 0.4607 1.0483 1.4246
##
## Degrees of Freedom: 15 Total (i.e. Null); 12 Residual
## Null Deviance: 165.8
## Residual Deviance: 86.58 AIC: 166.1
##
## $desire
##
## Call: glm(formula = Y ~ wantsMore, family = binomial, data = ds)
##
## Coefficients:
## (Intercept) wantsMoreyes
## -0.1864 -1.0486
##
## Degrees of Freedom: 15 Total (i.e. Null); 14 Residual
## Null Deviance: 165.8
## Residual Deviance: 74.1 AIC: 149.6
##
## $additive
##
## Call: glm(formula = Y ~ age + wantsMore, family = binomial, data = ds)
##
## Coefficients:
## (Intercept) age25-29 age30-39 age40-49 wantsMoreyes
## -0.8698 0.3678 0.8078 1.0226 -0.8241
##
## Degrees of Freedom: 15 Total (i.e. Null); 11 Residual
## Null Deviance: 165.8
## Residual Deviance: 36.89 AIC: 118.4
##
## $interact
##
## Call: glm(formula = Y ~ age * wantsMore, family = binomial, data = ds)
##
## Coefficients:
## (Intercept) age25-29 age30-39
## -1.4553 0.6354 1.5411
## age40-49 wantsMoreyes age25-29:wantsMoreyes
## 1.7643 -0.0640 -0.2672
## age30-39:wantsMoreyes age40-49:wantsMoreyes
## -1.0905 -1.3671
##
## Degrees of Freedom: 15 Total (i.e. Null); 8 Residual
## Null Deviance: 165.8
## Residual Deviance: 20.1 AIC: 107.6
lapply(models,summary)
## $null
##
## Call:
## glm(formula = Y ~ 1, family = binomial, data = ds)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.77455 0.05368 -14.43 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 165.77 on 15 degrees of freedom
## Residual deviance: 165.77 on 15 degrees of freedom
## AIC: 239.28
##
## Number of Fisher Scoring iterations: 4
##
##
## $age
##
## Call:
## glm(formula = Y ~ age, family = binomial, data = ds)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.5072 0.1303 -11.571 < 2e-16 ***
## age25-29 0.4607 0.1727 2.667 0.00765 **
## age30-39 1.0483 0.1544 6.788 1.14e-11 ***
## age40-49 1.4246 0.1940 7.345 2.06e-13 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 165.772 on 15 degrees of freedom
## Residual deviance: 86.581 on 12 degrees of freedom
## AIC: 166.09
##
## Number of Fisher Scoring iterations: 4
##
##
## $desire
##
## Call:
## glm(formula = Y ~ wantsMore, family = binomial, data = ds)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.18636 0.07971 -2.338 0.0194 *
## wantsMoreyes -1.04863 0.11067 -9.475 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 165.772 on 15 degrees of freedom
## Residual deviance: 74.098 on 14 degrees of freedom
## AIC: 149.61
##
## Number of Fisher Scoring iterations: 4
##
##
## $additive
##
## Call:
## glm(formula = Y ~ age + wantsMore, family = binomial, data = ds)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.8698 0.1571 -5.536 3.10e-08 ***
## age25-29 0.3678 0.1754 2.097 0.036 *
## age30-39 0.8078 0.1598 5.056 4.27e-07 ***
## age40-49 1.0226 0.2039 5.014 5.32e-07 ***
## wantsMoreyes -0.8241 0.1171 -7.037 1.97e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 165.772 on 15 degrees of freedom
## Residual deviance: 36.888 on 11 degrees of freedom
## AIC: 118.4
##
## Number of Fisher Scoring iterations: 4
##
##
## $interact
##
## Call:
## glm(formula = Y ~ age * wantsMore, family = binomial, data = ds)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.4553 0.2968 -4.903 9.43e-07 ***
## age25-29 0.6354 0.3564 1.783 0.07463 .
## age30-39 1.5412 0.3183 4.842 1.29e-06 ***
## age40-49 1.7643 0.3435 5.136 2.80e-07 ***
## wantsMoreyes -0.0640 0.3303 -0.194 0.84637
## age25-29:wantsMoreyes -0.2672 0.4091 -0.653 0.51366
## age30-39:wantsMoreyes -1.0905 0.3733 -2.921 0.00349 **
## age40-49:wantsMoreyes -1.3672 0.4834 -2.828 0.00468 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 165.772 on 15 degrees of freedom
## Residual deviance: 20.099 on 8 degrees of freedom
## AIC: 107.61
##
## Number of Fisher Scoring iterations: 4
data.frame(deviance=round(unlist(lapply(models,deviance)),2),df=unlist(lapply(models,df.residual)),aic=round(unlist(lapply(models,AIC))))
## deviance df aic
## null 165.77 15 239
## age 86.58 12 166
## desire 74.10 14 150
## additive 36.89 11 118
## interact 20.10 8 108
Note: The function lapply
(list apply) will apply a
function to each element of a list.
a) Explain what each of these models include.
b) What is the definition of the deviance and df. What can we use the deviance for?
The deviance is defined as the likelihood ratio statistic, where we put the saturated model in place of the larger model A and our candidate model in place of the smaller model B:
\[ \begin{aligned} D&=-2(\ln L(\text{candidate model})-\ln L(\text{saturated model})) \\ &=-2(l(\hat{\pi})-l(\tilde{\pi}))= -2\sum_{j=1}^G(l_j(\hat{\pi}_j)-l_j(\tilde{\pi}_j)) \end{aligned} \]
The df is the difference in the number of parameters of the models being compared.
We can use it to (a) compare models, (b) test howgoodness of fit of a model, and (c) test for overdispersion.
Optional: derive the formula for the deviance for the logit model (the derivation is not given on the module pages.)
d) What if you use the AIC for model selection, which model (out of the 5) will you then use?
(yes, mainly R and trying to understand)
Finally, we want to use the additive model with all three covariates:
Y~age+education+wantsMore
to study different plots.
fit3add=glm(cbind(using,notUsing)~age+education+wantsMore,data=ds,family=binomial)
summary(fit3add)
##
## Call:
## glm(formula = cbind(using, notUsing) ~ age + education + wantsMore,
## family = binomial, data = ds)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.8082 0.1590 -5.083 3.71e-07 ***
## age25-29 0.3894 0.1759 2.214 0.02681 *
## age30-39 0.9086 0.1646 5.519 3.40e-08 ***
## age40-49 1.1892 0.2144 5.546 2.92e-08 ***
## educationlow -0.3250 0.1240 -2.620 0.00879 **
## wantsMoreyes -0.8330 0.1175 -7.091 1.33e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 165.772 on 15 degrees of freedom
## Residual deviance: 29.917 on 10 degrees of freedom
## AIC: 113.43
##
## Number of Fisher Scoring iterations: 4
exp(fit3add$coefficients)
## (Intercept) age25-29 age30-39 age40-49 educationlow wantsMoreyes
## 0.4456506 1.4760678 2.4808804 3.2845805 0.7225312 0.4347628
a) Comment on the output from
summary
. Use the deviance to test if this is a good model.
Which model do you then compare to (what is the saturated model
here)?.
b) Look at the plot of fitted values vs. deviance residuals. Do you see a trend?
library(ggplot2)
plotdf <- data.frame(dres = fit3add$residuals, fitted = fit3add$fitted.values,
age = ds$age)
ggplot(plotdf, aes(x = fitted, y = dres)) + geom_point() + labs(x = "Fitted values",
y = "Deviance residuals")
c) Here is a code to produce a plot of fitted values for the 16 covariate patterns together with a saturated model (called frac) - on the logit scale.
library(ggplot2)
frac=ds$using/(ds$using+ds$notUsing)
logitdf=data.frame(lfitted=qlogis(fit3add$fitted),lfrac=qlogis(frac),age=ds$age,wantsMore=ds$wantsMore,education=ds$education)
ggplot(logitdf,aes(x=age))+geom_point(aes(y=lfrac,colour="saturated"))+geom_point(aes(y=lfitted,colour="fitted"))+facet_wrap(facets=~wantsMore*education) + labs(x = "age", y = "")
How can you see from the dots for fitted values that we are assuming an additive model on the logit scale (that is, the pattern for each panel for logit fitted vs age is the same for each panel if we have an additive model on the logit scale - you see that from the fitted points)? Do you also see that for the observed values? Implication? Any other observations?
To see what Rodríguez have found to be the best model, look at the bottom of http://data.princeton.edu/wws509/R/c3s6.html.
Note: This exercise is based on the excellent notes of Germán Rodríguez at Princeton, see in particular the R logs at http://data.princeton.edu/wws509/R/c3s1.html.
For this module the following are exam questions to work with
In addition these essay-type exam questions are closely related to this module.
There are two main asymptotic results that are behind essentially everything we have done with respect to inference for generalized linear models. These are
asymptotic normality of maximum likelihood estimators (MLE), and
asymptotic result for likelihood ratio tests (LRT).
State, describe and discuss the main assumptions behind these two asymptotic results, how these results are used in practice to do inference for parameters, testing various hypotheses and comparing nested models, for logistic regression.
We will consider (binomial or binary) logistic regression, where we have independent observations \[Y_i \sim \text{Binomial}(n_i,\pi_i) \text{ } i=1,\ldots,n\] so that \[P(Y_i)=\binom{n_i}{y_i}\pi_i^{y_i}(1-\pi_i)^{n_i-y_i}, \text{ } y_i=0,1,\ldots, n_i\] The linear predictor is \[ \eta_i={\bf x}_i^T\boldsymbol{\beta}\] and \[ \pi_i=\frac{\exp(\eta_i)}{1+\exp(\eta_i)}\] or \[ \text{logit}(\pi_i)=\eta_i.\] Here, \({\bf x}_i\) is a vector of the \(p\) covariates for the \(i\)th observation \(y_i\) with size (number of trials) \(n_i\), and \(\boldsymbol{\beta}\) is the vector of \(p\) unknown regression coefficients.
Write an introduction to logistic regression and its practical usage, for a student with a good background in statistics, but no knowledge about Generalized Linear Models (GLM). Topics you may want to consider, are
install.packages(c("tidyverse",
"investr",
"knitr",
"kableExtra",
"faraway",
"viridis",
"statmod"))