Hints and reminders are in bold

Questions appear in blue.

Some hints and answers are hidden away in a fold like this

Well done, you found me!

The exam style questions are optional, but will be helpful for, well, the exam. They will also help you now to see if you understand the content.

Interactive session - first week

Problem 1: Exam 2005 (Problem 1d-f - slightly modified) - Female horseshoe crabs and satellites

(Only a subset of 20 horseshoe crabs in the original data was used in the exam problems, but we will use the full data set of 173 horseshoe crabs - so results will not be the same. Permitted aids for the exam was “all printed and handwritten material and all calculators” - NB that is not the case for 2018!)

We assume that the number of satellites for each female horseshoe crab follows a Poisson distribution and want to perform a Poisson regression using a log-link to find if there is a connection between the expected number of satellites Sa and the width W and colour C of the carapace of the female horseshoe crab.

C: the color of the female horseshoe crab (1=light medium, 2=medium, 3=dark medium, 4=dark)
W: width of carapace (cm)

The following model was fitted.

crab <- read.table("https://www.math.ntnu.no/emner/TMA4315/2018h/crab.txt")
colnames(crab) <- c("Obs", "C", "S", "W", "Wt", "Sa")
crab <- crab[,-1] #remove column with Obs
crab$C <- as.factor(crab$C)
modelEXAM <- glm(Sa ~ W + C, family = poisson(link = log), data = crab, contrasts = list(C = "contr.sum"))
summary(modelEXAM)
library(car)
Anova(modelEXAM, type = "III", test.statistic = "Wald")

## 
## Call:
## glm(formula = Sa ~ W + C, family = poisson(link = log), data = crab, 
##     contrasts = list(C = "contr.sum"))
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -2.92089    0.56010  -5.215 1.84e-07 ***
## W            0.14934    0.02084   7.166 7.73e-13 ***
## C1           0.27085    0.11784   2.298   0.0215 *  
## C2           0.07117    0.07296   0.975   0.3294    
## C3          -0.16551    0.09316  -1.777   0.0756 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 632.79  on 172  degrees of freedom
## Residual deviance: 559.34  on 168  degrees of freedom
## AIC: 924.64
## 
## Number of Fisher Scoring iterations: 6
## 
## Analysis of Deviance Table (Type III tests)
## 
## Response: Sa
##             Df  Chisq Pr(>Chisq)    
## (Intercept)  1 27.196  1.839e-07 ***
## W            1 51.350  7.726e-13 ***
## C            3  8.518    0.03644 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

a) Write down the estimated model and evaluate the significance of the estimated coefficients.

Answer

The estimated model is

\[ \begin{aligned} Y_i &\sim \text{Poisson}(\lambda_i) \\ \log{\lambda_i} &= -2.92 + 0.15 \text{W} + 0.27 X_1 + 0.07 X_2 - 0.17 X_3 + \xi X_4 \end{aligned} \]

where \(W\) is the width, and \(X_j\) is an indicator of whether the female is in colour category \(j\). \(\xi\) is the effect of \(X_4\), to be calculated later. Note that we use a sum to zero contrast.

The summary() gives Wald statistics and their associated p-values. The effects of width (W) is significant: p<<0. For the colour effects, C2 is clearly not significantly different from C4: C1 and C3 are both just either side of the critical value at 5%. It is notlcealr why the contrast to C4 is important, though.

How would you proceed to evaluate the significance of the coefficient for the colour C4. Explain (but only do the calculation if you really feel like it).

Hint

First, work out what the coefficient is. Then explain what test you would use, and where you would get the components of it.

If you want to do the calculation, you will need to do a bit of matrix manipulation in R.

Answer

The coefficient is \(-\sum_{i=1}^3{\beta_i} = -(0.27 + 0.07 - 0.17) = -0.17\)

We can calculate the standard error using the contrast, \(C\) and can plug this into

\[ H_0: {\bf C}{\bf \beta}={\bf d} \text{ vs. } H_1: {\bf C}{\bf \beta}\neq {\bf d} \]

i.e. \(C=\{0,0-1,-1,-1 \}\) (the 0s are for the intercept and W term), and \(\beta = \{-2.92, 0.15, 0.27, 0.07, -0.17\}\) so we test if \(-(0.27 + 0.07 - 0.17) = -0.17\) is different from 0. We can use the Wald statistic:

\[ w=({\bf C} \hat{\boldsymbol{\beta}}-{\bf d})^{\text T}[{\bf C}F^{-1}(\hat{\beta}){\bf C}^{\text T}]^{-1}({\bf C}\hat{{\boldsymbol \beta}}-{\bf d}) \]

where \(F^{-1}(\hat{\beta})\) is the covariance matrix, which we can extract with vcov():

vcov(modelEXAM)

##              (Intercept)             W            C1            C2
## (Intercept)  0.313712405 -0.0116075311  0.0090846333  5.635922e-03
## W           -0.011607531  0.0004343333 -0.0002800538 -3.120317e-04
## C1           0.009084633 -0.0002800538  0.0138864871 -2.197738e-03
## C2           0.005635922 -0.0003120317 -0.0021977383  5.323442e-03
## C3          -0.005792980  0.0001811855 -0.0042680583  2.192067e-05
##                        C3
## (Intercept) -5.792980e-03
## W            1.811855e-04
## C1          -4.268058e-03
## C2           2.192067e-05
## C3           8.679453e-03

C = t(as.vector(c(0, 0, -1, -1, -1)))
betahat <- coef(modelEXAM)
d <- 0
invF <- vcov(modelEXAM)

# solve() calculates the inverse of the matrix
(w <- t(C %*% betahat - d) %*% solve(C %*% invF %*% t(C)) %*% (C %*% betahat - d))

##          [,1]
## [1,] 2.076764

(the calculation of w involves a bit of over-kill and matrix operations)

We are testing one hypothesis , so \(r=1\), so we compare this to a \(\chi^1_1\) distribution:

pchisq(w, 1, lower.tail = FALSE)

##           [,1]
## [1,] 0.1495569

So this is not significant at 5%.

Perform a test to evaluate the fit of the model. What is your conclusion? Use a 5% significance level for your evaluations.

Hint

We are asking if any variation not explained by the model could just be random Poisson noise.

Answer

We want to test if the residual deviance follows a \(\chi^2\) distribution. Luckily summary() gives us what we need:

summary(modelEXAM)

## 
## Call:
## glm(formula = Sa ~ W + C, family = poisson(link = log), data = crab, 
##     contrasts = list(C = "contr.sum"))
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -2.92089    0.56010  -5.215 1.84e-07 ***
## W            0.14934    0.02084   7.166 7.73e-13 ***
## C1           0.27085    0.11784   2.298   0.0215 *  
## C2           0.07117    0.07296   0.975   0.3294    
## C3          -0.16551    0.09316  -1.777   0.0756 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 632.79  on 172  degrees of freedom
## Residual deviance: 559.34  on 168  degrees of freedom
## AIC: 924.64
## 
## Number of Fisher Scoring iterations: 6

The residual deviance is 559.344785, with 168 degrees of freedom. The p-value is thus:

pchisq(modelEXAM$deviance, modelEXAM$df.residual, lower = FALSE)

## [1] 1.471037e-43

which is horribly significant.

(In Problem 3d we derive the mathematical formula for the test statistic this test is based on.)

b) Make a sketch by hand illustrating the connection between the expected number of satellites and the width of the carapace for each of the four colours of the horseshoe crab.

Answer

These are the coefficients:

coef(modelEXAM)

## (Intercept)           W          C1          C2          C3 
## -2.92089322  0.14934271  0.27085229  0.07116581 -0.16551073

Start by drawing on the log scale, so we get straight lines. The C4 effect is about -0.17, so C3 and C4 cross the x axis at about -3.1, and the y-axis at about 3.1/0.15=~20. C2 crosses the x axis at about -2.85, and C1 at -2.65.

But in the data W ranges between about 21 and 33, so the values of \(\log{\lambda}\) are all positive: for C3 \(\log{\lambda}\) is between about \(-3.1+21\times 0.15 = 0.05\) and \(-3.1+33\times 0.15 = 1.85\).

On the log scale these are all parallel. But on the data sale, they will diverge. So we end up with something like this mess:

Two hand drawn graphs of the effect of W on expected number of satellites

The estimated multiplicative change in the expected number of satellites when the width of the carapace is changed by 1 cm is, according to the model, independent of the colour.

Explain why.

Answer

Because the model does not include any interaction between W and C.

Also find a 95% confidence interval for this change.

Answer

exp(confint(modelEXAM)["W", ])

##    2.5 %   97.5 % 
## 1.114412 1.209282

Let \(\hat{\eta}(x_{W})\) be the estimated linear predictor when the width of the carapace is \(x_{W}\) and the horseshoe crab is “light medium”.

What is the distribution of \(\hat{\eta}(x_{W})\)?

Answer

It is, asymptotically at least, a linear combination of normal random variables (the estimates of the parameters), so must be Normal.

Which value of \(x_W\) would give estimated mean number of satellites equal to 5?

Answer

We need to plug in the slope for the W effect, and the “light medium” (=C1) effect:

\[ \ln(5) = -2.92 + 0.15\times W_5 + 0.27 = -2.65 + 0.15\times W_5 \]

i.e.

\[ W_5 = \frac{\ln(5) + 2.65}{0.15} = 28.4 \]

Or, in R:

(W5 = (log(5) - (coef(modelEXAM)["(Intercept)"] + coef(modelEXAM)["C1"]))/coef(modelEXAM)["W"])

## (Intercept) 
##     28.5215

The slight difference is due to rounding.

Optional: Using R – for this value of \(x_W\), construct a 95% confidence interval for the mean number of satellites.

Hint

use the predict function: ?predict.glm tells you more.

Answer

We do this be predicting the value.

newd <- data.frame(C = factor(1, levels = levels(crab$C)), W = W5)
(pred <- predict(modelEXAM, newdata = newd, type = "link", se.fit = TRUE))

# 95% CI:

(c(pred$fit + 1.96 * pred$se.fit, pred$fit - 1.96 * pred$se.fit))

## $fit
## (Intercept) 
##    1.609438 
## 
## $se.fit
## [1] 0.1448597
## 
## $residual.scale
## [1] 1
## 
## (Intercept) (Intercept) 
##    1.893363    1.325513

Optional: Use ggplot to create (and improve) the sketch from b).

Answer/Hint (not yet added)

Problem 2: Exam 2017 (Problem 1) - Poisson regression

Consider a random variable \(Y\). In our course we have considered the univariate exponential family having distribution (probability density function for continuous variables and probability mass function for discrete variables) \[ f(y)=\exp \left( \frac{y \theta-b(\theta)}{\phi}w + c(y,\phi,w) \right)\] where \(\theta\) is called the natural parameter (or parameter of interest) and \(\phi\) the dispersion parameter.

The Poisson distribution is a discrete distribution with probability mass function \[ f(y)=\frac{\lambda^{y}}{y!}\exp(- \lambda), \text{ for } y=0,1,\ldots\] where \(\lambda>0\).

a) (10 points)

Show that the Poisson distribution is a univariate exponential family, and specify what the elements of the exponential family (\(\theta\), \(\phi\), \(b(\theta)\), \(w\), \(c(y,\phi,w)\)) are.

Hint

This means re-writing the Poisson log likelihood in the exponential form, and playing “spot the \(\theta\)” etc.

Answer

The log likelihood for a single datum, \(y\), can be written as

\[ \begin{aligned} l(\theta)&=\ln L(\theta)=[y \ln(\lambda)-\lambda-\ln(y!)] \\ &= \frac{y \ln(\lambda)-\lambda}{1}1 + \ln(y!) \end{aligned} \]

So \(\theta=\ln(\lambda)\), \(=b(\theta)=\lambda\), \(w=\phi=1\) and \(c(y,\phi,w)=\ln(y!)\).

What is the connections between \(\text{E}(Y)\) and elements of the exponential family?

Hint

What is \(\text{E}(Y)\) for a Poisson?

Answer

\(\text{E}(Y) = e^\theta\). It is also \(b(\theta)\), and also \(b'(\theta)\), \(b''(\theta)\), \(b'''(\theta)\) etc. We will see the general relationship between \(\text{E}(Y)\) and hte parts of the exponential family in module 5.

What is the connections between \(\text{Var}(Y)\) and elements of the exponential family?

Answer

Because \(\text{E}(Y) = \text{Var}(Y)\) this is the same as the previous answer! Again, we will see the general form later (but if you look back at the binomial module, you might eb able to guess it)

Use these connections to derive the mean and variance for the Poisson distribution.

Answer/Hint (not yet added)

If the Poisson distribution is used as the distribution for the response in a generalized linear model, what is the canonical link function?

Answer

The log link.

b) (15 points)

We consider a Poisson regression with log link \(\eta_i=g(\mu_i)=\ln(\mu_i)\), and linear predictor equal to \(\eta_i={\bf x}_i^T \boldsymbol{\beta}\). Further, let \(p\) be the number of regression parameters in \(\boldsymbol{\beta}\) (intercept included). The response–covariate pairs \((Y_i, {\bf x}_i)\) are independent for \(i=1,\ldots, n\).

Does this set-up satisfy the requirements of a GLM model? Explain.

Hint (not yet added)

Answer

Yes it does. The Poisson distribution is a member of the exponential family, indeed the log link it the cononical link for the Poisson. The data are independent pairs, and the covaraites have a linear effect on \(\eta_i\).

Write down the log-likelihood.

Answer

\[ \log(f(y)) =\log \left(\frac{\lambda^{y}}{y!}\exp(- \lambda)\right) \]

Strictly this would be correct, but a bit naughty. This is better:

\[ l(y) = y \log \lambda - \lambda - \log(y!) \]

And if you forget the \(- \log(y!)\), it’s still right (if not exact).

From the log-likelihood, derive the formula for the score function \({\bf s}(\boldsymbol{\beta})\) and the expected Fisher information matrix, \({\bf F}(\boldsymbol{\beta})\).
What are the dimensions of \({\bf s}(\boldsymbol{\beta})\) and \({\bf F}(\boldsymbol{\beta})\)?

Answer

Above we wrote “let \(p\) be the number of regression parameters in \(\boldsymbol{\beta}\) (intercept included)”, so

\({\bf s}(\boldsymbol{\beta})\) is a vector of length \(p\), and
\({\bf F}(\boldsymbol{\beta})\) is a \(p \times p\) matrix.

How can \({\bf s}(\boldsymbol{\beta})\) and \({\bf F}(\boldsymbol{\beta})\) be used to arrive at a maximum likelihood estimate for \(\boldsymbol{\beta}\)?

Answer

The MLE for \(\boldsymbol{\beta}\) is the solution to \({\bf s}(\boldsymbol{\beta}) = 0\). We can solve this using Fisher scoring, which is also a Newton-Raphson appraoch (because the observed and expected Fisher information are the same). i.e. we iterate

\[ \beta^{(t+1)} = \beta^{(t)} + F(\beta^{(t)})^{-1} s(\beta^{(t)}) \] until \(\beta^{(t+1)} - \beta^{(t)} < \epsilon\), for some small \(\epsilon\).

c) (10 points)

We now look at a data set giving the number of species of tortoise on the various Galapagos Islands (Data taken from the book “Practical Regression and Anova using R” by Julian J. Faraway.).

The data set contains measurements on 30 islands, and we study the following variables:

Species: The number of species of tortoise found on the island.
Area: The area of the island (km\(^2\)).
Elevation: The highest elevation of the island (m).
Nearest: The distance from the nearest island (km).
Scruz: The distance from Santa Cruz island (km).
Adjacent: The area of the adjacent island (km\(^2\)).

We have fitted a Poisson regression with log link to Species as response, and the other five variables are used as continuous covariates. Print-out from the fitted model is given in below.

library(faraway)
fit <- glm(Species ~ Area + Elevation + Nearest + Scruz + Adjacent, data = gala,
    family = poisson(link = log))
summary(fit)

## 
## Call:
## glm(formula = Species ~ Area + Elevation + Nearest + Scruz + 
##     Adjacent, family = poisson(link = log), data = gala)
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  3.155e+00  5.175e-02  60.963  < 2e-16 ***
## Area        -5.799e-04  2.627e-05 -22.074  < 2e-16 ***
## Elevation    3.541e-03  8.741e-05  40.507  < 2e-16 ***
## Nearest      8.826e-03  1.821e-03   4.846 1.26e-06 ***
## Scruz       -5.709e-03  6.256e-04  -9.126  < 2e-16 ***
## Adjacent    -6.630e-04  2.933e-05 -22.608  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 3510.73  on 29  degrees of freedom
## Residual deviance:  716.85  on 24  degrees of freedom
## AIC: 889.68
## 
## Number of Fisher Scoring iterations: 5

Let \(\boldsymbol{\beta}\) be a \(6 \times 1\) column vector with the regression coefficients (intercept included), and let \(\boldsymbol{\hat{\beta}}\) be the maximum likelihood estimator for \(\boldsymbol{\beta}\).

Write down the asymptotic distribution for \(\boldsymbol{\hat{\beta}}\), and specify how the covariance matrix for \(\boldsymbol{\hat{\beta}}\) is estimated.

Answer

Assymptotically \(\boldsymbol{\hat{\beta}} \sim N(\boldsymbol{\beta}), F(\beta))\) where \(F(\beta) = \text{Cov}(\beta)\), i.e. the expected Fisher information matrix.

We can approximate \(F(\beta)\) with the observed Fisher information calculated at \(\boldsymbol{\hat{\beta}}\), i.e.

\[ H(\boldsymbol{\hat{\beta}}) = -\frac{\partial s(\boldsymbol{\hat{\beta}})}{\partial \boldsymbol{\hat{\beta}}^T} \]

We will focus on the effect of Elevation, and denote the corresponding regression coefficient \(\beta_2\).

Write down the maximum likelihood estimate for \(\beta_2\) in the print-out above.

Answer

3.541e-03

(or coef(fit)["Elevation"] = 0.0035406 if you want to extract it).

How can you explain this value to a biologist interested in understanding the effect of Elevation on the number of species of tortoise found on the islands?

Answer

It say that the number of species increase \(e^{0.00335} = 1.00035\) times for each meter higher an island is.

This is small, but so is a 1m change in elevation. So we can talk about a change in 100m: this would increase the number of species \(e^{100*0.00335} = 1.42\) times.

In fact, a change of 200m doubles the numer of species.

What is numerical value for the estimated standard deviation of \(\hat{\beta}_2\) given in above?

Answer

8.741e-05

(or sqrt(vcov(fit)["Elevation", "Elevation"]) = 8.7407037^{-5} if you want to extract it).

Construct an approximate 95% confidence interval for \(\beta_2\).

Answer

Approximately this is \({\hat \beta}_2 \pm 1.96 \hat{\sigma}_2\),

i.e. 3.541e-03 \(\pm 1.96 \times\) 8.741e-05 = (3.37e-03, 3.71e-03).

Problem 3: Exam December 2017 from UiO, Problem 1.

(written out - with small changes to fit the notation we use - from https://www.uio.no/studier/emner/matnat/math/STK3100/h17/stk3100-4100_2017_2eng.pdf)

Assume that the random variable \(Y\) is Poisson distributed with probability mass function (pmf)

\[\text{P}(Y = y | \lambda)= \frac{\lambda^y}{y!} \exp(-\lambda), \ y = 0, 1, 2, \dots.\]

a) Show that the distribution of \(Y\) is an exponential family, that is, show that the pmf can be written in the form \[\exp\left\{\frac{y \theta - b(\theta)}{\phi}w + c(y, \phi, w)\right\},\] and determine \(\theta\), \(\phi\), \(w\), \(b(\theta)\) and \(c(y, \phi, w)\).

Answer

The log likelihood for a single datum, \(y\), can be written as

\[ \begin{aligned} l(\theta)&=\ln L(\theta)=[y \ln(\lambda)-\lambda-\ln(y!)] \\ &= \frac{y \ln(\lambda)-\lambda}{1}1 + \ln(y!) \end{aligned} \]

So \(\theta=\ln(\lambda)\), \(=b(\theta)=\lambda\), \(w=\phi=1\) and \(c(y,\phi,w)=\ln(y!)\).

(seem familiar? I just copied and pasted from Q2)

We then assume that \(Y_1, Y_2, \dots, Y_n\) are independent with the pmf from a), and let \(\mu_i = \text{E}(Y_i)\), \(i = 1, \dots, n\).

b) Explain what we mean by a generalized linear model (GLM) for \(Y_1, Y_2, \dots, Y_n\) with link function \(g(\cdot)\), and determine the canonical link function.

Answer

A GLM is a model for the data, \(Y_1, Y_2, \dots, Y_n\), where we assume \(g(\text{E}(Y_i)) = \bf{x}_i' \beta\), where \(g()\) is the link function.

For the Poisson model we see \(\theta=\ln(\lambda)\), where \(\lambda\) is \(\text{E}(Y))\), so \(\log\) must be the canonical link.

c) Derive an expression for the log-likelihood function \(l(\mathbf{\mu}; \mathbf{y})\), where \(\mathbf{y} = (y_1, \dots, y_n)^T\) is the observed value of \(\mathbf{Y} = (Y_1, Y_2, \dots, Y_n)^T\) and \(\mathbf{\mu} = (\mu_1, \dots, \mu_n)^T\).

Hint

Use the answer to (a).

Answer

First, \(l(\mathbf{\mu}; \mathbf{y}) = \sum_{i=1}^n{\log \text{P}(Y_i = y_i | \mu_i)}\), so

\(l(\mathbf{\mu}; \mathbf{y}) = \sum_{i=1}^n{\log \text{P}(Y_i = y_i | \mu_i)}\), so from answer (a) we see \(l(\mu_i; y_i)=y_i \ln(\mu_i)-\lambda_i-\ln(y_i!)\), and thus

\[ l(\mathbf{\mu}; \mathbf{y}) = \sum_{i=1}^n{y_i \ln(\mu_i)} - \sum_{i=1}^n{\mu_i}- \sum_{i=1}^n{\ln(y_i!)} \]

d) Explain what we mean by a saturated model and determine the maximum of \(l(\mathbf{\mu}; \mathbf{y})\) for the saturated model.

Answer

A saturated model is one where there is a parameter for every data point, i.e. \(p=n\)

We can find \(l(\mathbf{\mu}; \mathbf{y})\) for the saturated model by solving \(\partial l(\mu_i; y_i)/\partial \mu_i =0\)

\[ \begin{aligned} \frac{\partial l(\mu_i; y_i)}{\partial \mu_i}&=\frac{\partial \left(y \ln(\mu_i)-\mu_i-\ln(y_i!)\right)}{\partial \mu_i} =0\\ &= \frac{y_i}{\mu_i} -1=0 \end{aligned} \]

So \(y_i = \mu_i\)

e) Explain what we mean by the deviance \(D(\mathbf{y}; \mathbf{\hat{\mu}})\) of a Poisson GLM, find an expression for the deviance, and discuss how it may be used.

Answer

The deviance of a model is -2 times the difference between the log likelihood for the model and the saturated likelihood, i.e. \(- 2 (l(\mathbf{\beta}; \mathbf{y}) - l(\mathbf{\mu}; \mathbf{y}))\).

So for the Poisson distribution, using \(\tilde{\mu}_i\) for the estimate of \(\mu_i\) at the model of interest, and substituting \(\mu_i = y_i\) for the saturated model, we get

\[ \begin{aligned} - 2 (l(\mathbf{\beta}; y_i) - l(\mu_i; y_i)) &= -2\left(\left(y_i \ln(\tilde{\mu}_i)-\tilde{\mu}_i-\ln(y_i!)\right) - \left(y_i \ln(y_i)-y_i-\ln(y_i!) \right)\right)\\ &= -2\left( y_i (\ln \tilde{\mu}_i -\ln y_i) - (\tilde{\mu}_i - y_i) \right) \\ &= 2y_i \ln \frac{y_i}{\tilde{\mu}_i} - 2( y_i - \tilde{\mu}_i) \end{aligned} \]

(there is some flexibility on where to stop fiddling to make this look the most elegant)

R packages

install.packages(c("tidyverse", "ggplot2", "statmod", "corrplot", "ggplot2", "GGally",
    "boot"))

TMA4315 Generalized linear models H2018

Module 4: Count and continuous positive response data (Poisson and gamma regression)

Mette Langaas, Department of Mathematical Sciences, NTNU – with contributions from Ingeborg Hem

27.09.2018 and 04.10.2018 [PL], 28.09.2018 and 05.10.2018 [IL]

Interactive session - first week

Problem 1: Exam 2005 (Problem 1d-f - slightly modified) - Female horseshoe crabs and satellites

Problem 2: Exam 2017 (Problem 1) - Poisson regression

Problem 3: Exam December 2017 from UiO, Problem 1.

R packages

Further reading