R packages

install.packages(c("formatR", "gamlss.data", "leaps"))

Problem 1: Model assumptions

a)

We assume:

  1. Binary (or binomial) data \(Y\)
  2. \(\eta = \mathbf{X\beta}\)
  3. Link function \(g(\pi) = \text{logit}(\pi) = \eta\)

b)

We use the logit link function (\(\text{logit}(x) = \ln(x/(1-x))\)) and the expit response function (\(\text{expit}(x) = \exp(x)/(1+\exp(x))\)).
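
As a small illustration (not part of the original solution), the two functions can be written directly in R; they are also available as the built-ins qlogis and plogis:

logit <- function(x) log(x/(1 - x))  # link function
expit <- function(x) exp(x)/(1 + exp(x))  # response (inverse link) function
logit(0.75)  # same as qlogis(0.75)
expit(logit(0.75))  # the two functions are inverses, so this returns 0.75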

c)

They are the same!

Problem 2: Log-likelihood

library(investr)
library(ggplot2)
library(viridis)

# from aggregated to individual data (the beetle data come aggregated:
# n beetles per dose, of which y were killed)
ldose <- rep(investr::beetle$ldose, investr::beetle$n)
y <- NULL
for (i in seq_len(nrow(investr::beetle))) y <- c(y, rep(0, investr::beetle$n[i] - investr::beetle$y[i]), 
    rep(1, investr::beetle$y[i]))
beetleds <- data.frame(killed = y, ldose = ldose)

# log-likelihood (without the binomial coefficient); args is a list with the
# response y, the design matrix x and the number of trials n per observation
loglik <- function(par, args) {
    y <- args$y
    x <- args$x
    n <- args$n
    res <- sum(y * x %*% par - n * log(1 + exp(x %*% par)))
    return(res)
}

a)

The log-likelihood is the natural logarithm of the likelihood. The likelihood of a parameter value (or vector of parameter values) given the outcomes is equal to the joint distribution of those outcomes given the parameter value(s) (for discrete variables the joint distribution is a probability, but for continuous variables it is a joint density). When the observations are independent, the log-likelihood is \(l(\beta) = \ln(\prod_{j = 1}^n f(y_j))\).

b)

We find this expression by using that the observations are assumed to be independent, so the likelihood is the product \(L = \prod_{j=1}^n f(y_j)\), and then taking the logarithm, which turns the product into a sum.

c)

Log-likelihood for individual data:

\[l(\mathbf{\beta}) = \sum_{j=1}^n \left( y_j \ln\left(\frac{\pi_j}{1-\pi_j}\right) + \ln(1-\pi_j) \right) = \sum_{j=1}^n \left( y_j \ln(\pi_j) + (1-y_j)\ln(1-\pi_j) \right)\]

d)

\(\text{logit}(\pi_j) = \mathbf{x}_j^T\mathbf{\beta}\) and thus \(\pi_j = (1 + e^{-\mathbf{x}_j^T\mathbf{\beta}})^{-1}\) and \(1-\pi_j = (1 + e^{\mathbf{x}_j^T\mathbf{\beta}})^{-1}\), so \(l(\mathbf{\beta})\) can be written as:

\[l(\mathbf{\beta}) = \sum_{j=1}^n \left( y_j \mathbf{x}_j^T\mathbf{\beta} - \ln(1 + e^{\mathbf{x}_j^T\mathbf{\beta}}) \right)\]
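
As a quick check (a sketch, not part of the original solution; the candidate value \(\beta = (-60, 34)\) is an arbitrary choice), this is exactly what the loglik function above computes, and for individual data it agrees with the sum of Bernoulli log-densities:

par <- c(-60, 34)  # arbitrary candidate (beta_0, beta_1)
x <- cbind(1, beetleds$ldose)  # design matrix with intercept column
args <- list(y = beetleds$killed, x = x, n = rep(1, nrow(beetleds)))
loglik(par, args)
sum(dbinom(beetleds$killed, size = 1, prob = plogis(x %*% par), log = TRUE))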

e)

If \(n_j = 1\), then \(y_j\) is either 0 or 1 and the binomial coefficient equals one; \(\ln(1) = 0\), so it disappears. More generally, since the log-likelihood is used to find the regression parameters, we differentiate with respect to \(\beta\) and this additive constant disappears. In an optimization algorithm it is likewise superfluous, as it does not change when \(\beta\) changes. The binomial coefficient can also become very large (and the factorials involved in computing it become VERY large), so it is wise to leave it out of computational calculations. See lchoose if you absolutely need the logarithm of the normalization constant.
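
A small illustration (not from the original solution) of why working on the log scale is the safe choice:

choose(2000, 1000)  # Inf: the binomial coefficient overflows in double precision
lchoose(2000, 1000)  # its logarithm is a perfectly manageable number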

f)

Since we have two parameters (\(\beta_0\) and \(\beta_1\)), the log-likelihood is a surface over the \((\beta_0, \beta_1)\)-plane, so we get a 3D plot (or, equivalently, a contour plot).
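
A sketch of such a plot (not the original figure; the grid ranges around \(\beta_0 \approx -60\) and \(\beta_1 \approx 34\) are an assumption based on the estimates discussed in Problem 6), reusing the loglik function and the args list from above:

# evaluate the log-likelihood on a grid of (beta_0, beta_1) values
grid <- expand.grid(beta0 = seq(-90, -30, length.out = 100), beta1 = seq(15, 
    50, length.out = 100))
grid$loglik <- apply(grid[, c("beta0", "beta1")], 1, loglik, args = args)
ggplot(grid, aes(x = beta0, y = beta1)) + geom_raster(aes(fill = loglik)) + 
    geom_contour(aes(z = loglik), colour = "white") + scale_fill_viridis()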

Problem 3: Score function

a)

The score function is a \(p\times 1\) vector of the partial derivatives of the log-likelihood with respect to each regression coefficient.

b)

Use the definition:

\[s(\boldsymbol{\beta})=\frac{\partial l(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}} = \sum_{i=1}^n \frac{\partial l_i(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}}\]

c)

It is needed to find the MLE (the MLE solves \(s(\hat{\boldsymbol{\beta}}) = \mathbf{0}\)), and it tells us how sensitive the log-likelihood is to changes in the parameter values. The covariance matrix of the score function is the expected Fisher information matrix.
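
For the model in Problem 2 the definition gives \(s(\boldsymbol{\beta}) = \sum_{j=1}^n \mathbf{x}_j(y_j - \pi_j)\). A minimal sketch of this in R (not part of the original solution; it reuses the args list from Problem 2):

# score function s(beta) = sum_j x_j (y_j - pi_j) for the logit model
score <- function(par, args) {
    pi <- plogis(args$x %*% par)  # fitted probabilities
    as.vector(t(args$x) %*% (args$y - pi))
}
score(c(-60, 34), args)  # the MLE is the beta for which this vector is zero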

Problem 4: Fisher information

a)

See the module pages for the definitions: https://www.math.ntnu.no/emner/TMA4315/2018h/3BinReg.html#the_expected_fisher_information_matrix_(f(boldsymbol{beta})). The Fisher information is the covariance matrix of the score function. The expected and observed Fisher information are equal for canonical link functions (and we mainly use canonical links in this course).

b)

The Fisher information matrix is used to calculate the asymptotic covariance matrix for the MLE. It can also be used in the formulation of test statistics, such as the Wald test (week 2 of Module 3).

c)

Also here, use \(\text{logit}(\pi_j) = \mathbf{x}_j^T\mathbf{\beta}\), \(\pi_j = (1 + e^{-\mathbf{x}_j^T\mathbf{\beta}})^{-1}\) and \(1-\pi_j = (1 + e^{\mathbf{x}_j^T\mathbf{\beta}})^{-1}\) to obtain an expression in terms of \(\beta\):

\[F(\mathbf{\beta}) = \sum_{j=1}^G {\bf x}_j {\bf x}_j^T n_j \frac{e^{\mathbf{x}_j^T\mathbf{\beta}}}{(1 + e^{\mathbf{x}_j^T\mathbf{\beta}})^2}\]

d)

\[F(\boldsymbol{\beta})=\sum_{j=1}^n {\bf x}_j {\bf x}_j^T \pi_j (1-\pi_j)\]
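
A small sketch of this matrix for the individual data from Problem 2 (not part of the original solution; it reuses the args list defined above). Note that \(\pi_j(1-\pi_j) = e^{\mathbf{x}_j^T\mathbf{\beta}}/(1 + e^{\mathbf{x}_j^T\mathbf{\beta}})^2\), so the expressions in c) and d) agree:

# expected Fisher information F(beta) = sum_j x_j x_j^T pi_j (1 - pi_j),
# i.e. t(X) %*% W %*% X with W = diag(pi * (1 - pi))
fisher <- function(par, args) {
    pi <- as.vector(plogis(args$x %*% par))
    t(args$x) %*% (args$x * (pi * (1 - pi)))
}
fisher(c(-60, 34), args)
# inverted at the MLE, this gives the asymptotic covariance matrix of the MLE
solve(fisher(c(-60, 34), args))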

Problem 5: Maximum likelihood

a)

No solution.

b)

Convergence criterion for glm: glm.control(epsilon = 1e-8, maxit = 25, trace = FALSE) is the default. See ?glm.control for more.
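
A sketch (one possible approach, not necessarily the intended solution): maximise the log-likelihood from Problem 2 numerically and compare with the glm fit on the individual data:

# fnscale = -1 makes optim maximise instead of minimise
opt <- optim(par = c(0, 0), fn = loglik, args = args, method = "BFGS", 
    control = list(fnscale = -1))
opt$par
# glm fits the same model by iteratively reweighted least squares
coef(glm(killed ~ ldose, family = binomial(link = "logit"), data = beetleds))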

c)

No solution.

Problem 6: Interpreting results

a)

Intercept: For large negative arguments, \(\text{expit}(x) \approx \exp(x)\), and the exponential of a large negative number is very close to zero. So the probability of dying when the logarithm of the dose is zero (which is when the dose is 1) is extremely small, essentially zero.

ldose: if the logarithm of the dose increases by one unit, the odds of dying are multiplied by approximately \(e^{34} \approx 6 \cdot 10^{14}\). This is an enormous factor, but looking at the data, a one-unit change does not actually make sense, as the observed values of ldose only range from about 1.6 to 1.9.

b)

# fit the model to the aggregated data: successes and failures per dose
mod <- glm(cbind(y, n - y) ~ ldose, family = "binomial", data = beetle)

# plot the fitted probabilities of dying against log dose
ggplot(data = data.frame(x = beetle$ldose, y = mod$fitted.values), mapping = aes(x = x, 
    y = y)) + geom_point() + geom_line()
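
A short check of the interpretation in a) (a sketch, not part of the original solution), using the fitted model:

coef(mod)
exp(coef(mod)["ldose"])  # odds of dying multiplied by about e^34 per unit increase in ldose
plogis(coef(mod)["(Intercept)"])  # fitted probability of dying at ldose = 0, essentially zero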