smoking <- read.table(file = "https://www.math.ntnu.no/emner/TMA4315/2018h/smoking.txt")
head(smoking)
## dead pop age ageLevel smoke
## 1 18 656 40-44 1 no
## 2 22 359 45-59 2 no
## 3 19 249 50-54 3 no
## 4 55 632 55-59 4 no
## 5 117 1067 60-64 5 no
## 6 170 897 65-69 6 no
nrow(smoking)
## [1] 36
model1 <- glm(dead ~ age + smoke, family = "poisson", data = smoking,
offset = log(pop)) # note that the population is the offset here
summary(model1)
##
## Call:
## glm(formula = dead ~ age + smoke, family = "poisson", data = smoking,
## offset = log(pop))
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.06055 -0.54773 0.06431 0.29963 1.48348
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -3.63222 0.06783 -53.552 < 2e-16 ***
## age45-59 0.55388 0.07999 6.924 4.38e-12 ***
## age50-54 0.98039 0.07682 12.762 < 2e-16 ***
## age55-59 1.37946 0.06526 21.138 < 2e-16 ***
## age60-64 1.65423 0.06257 26.439 < 2e-16 ***
## age65-69 1.99817 0.06279 31.824 < 2e-16 ***
## age70-74 2.27141 0.06435 35.296 < 2e-16 ***
## age75-79 2.55858 0.06778 37.746 < 2e-16 ***
## age80+ 2.84692 0.07242 39.310 < 2e-16 ***
## smokecigarretteOnly 0.36915 0.03791 9.737 < 2e-16 ***
## smokecigarrettePlus 0.17015 0.03643 4.671 3.00e-06 ***
## smokeno -0.04781 0.04699 -1.017 0.309
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 4055.984 on 35 degrees of freedom
## Residual deviance: 21.487 on 24 degrees of freedom
## AIC: 285.51
##
## Number of Fisher Scoring iterations: 4
An offset is somewthing we include in the model to model the rate of something happening, rather than the occurrence. This is useful when we have samples where the group sizes differ. It is rather similar to our grouped version of the binomial. The offset is not a quantity to be estimated, we know the offset for each (aggregated) observation.
Estimated number of deaths per person over 6 years for 53-year old non-smoker: \(\hat{\lambda} = \exp(\hat{\beta_0} + \hat{\beta}_{50-54} + \hat{\beta}_{no})\), values below.
names(model1$coefficients)
## [1] "(Intercept)" "age45-59" "age50-54"
## [4] "age55-59" "age60-64" "age65-69"
## [7] "age70-74" "age75-79" "age80+"
## [10] "smokecigarretteOnly" "smokecigarrettePlus" "smokeno"
beta_0 <- coef(model1)[1]
beta_53 <- coef(model1)[3]
beta_no <- coef(model1)[12]
cat("Estimated number of cases of lung cancer:", as.numeric(exp(beta_0 +
beta_53 + beta_no)))
## Estimated number of cases of lung cancer: 0.06722989
Number of df: no. of observations minus no. of estimated parameters. With 12 estimated parameters and 36 observations this gives 24 df.
Goodness of fit: The \(\chi^2\) critical value with 24 df is:
qchisq(0.05, 24, lower.tail = FALSE)
## [1] 36.41503
The residual deviance is lower, thus the model fits the data well. \(p\)-value is
1 - pchisq(model1$deviance, model1$df.residual)
## [1] 0.609872
First - simple notation: Let \(\lambda(a,s_1) = \exp(\log(\text{pop}) + \beta_0 + \beta_a + \beta_{s_1})\) and \(\lambda(a,s_2) = \exp(\log(\text{pop}) + \beta_0 + \beta_a + \beta_{s_2})\), the the ratio is
\[ r(a,s_1,s_2)=\frac{\exp(\log(\text{pop}) + \beta_0+\beta_a+\beta_{s_1})}{\exp(\log(\text{pop}) + \beta_0+\beta_a+\beta_{s_2})}=\frac{\exp(\beta_{s_1})}{\exp(\beta_{s_2})}\]
and does not vary as a function of \(a\). Remark: for our choice the \(\beta_{s_1}\) is the reference category in the dummy variable coding, so here replaced by \(0\).
Alternative notation (more close to original lf): \[\log \lambda(a,s) = \log(\text{pop}) + \beta_0 + \sum_{i = 2}^9 \beta_{i, age} I(a = i) + \sum_{i = 2}^4 \beta_{i, smoke} I(s = i) \] where \(I(a = i)\) is one if the age is at level number \(i\) (where 40-44
is 1, 45-49
is 2 and so on) and 0 else, and \(I(s = i)\) is one if the smoke level is at level \(i\) (cigarPipeOnly
is 1, cigaretteOnly
is 2, cigarettePipe
is 3 and no
is 4). Note that the sums starts at \(i=2\), this is because the first age level (40-44
) and the first smokelevel (cigarPipeOnly
) are included in the intercept (which means that persons between 40 and 44 that smokes only cigar and/or pipe will have \(\exp(beta_0)\) estimated number of deaths due to lung cancer per person). The logatithm of \(r\) is (when the age and offset terms cancels):
\[\log r(a,s_1.s_2) = \log \lambda (a, s_1) - \log \lambda (a, s_2) = \sum_{i = 2}^4\beta_{i, smoke}[I(s_1 = i) - I(s_2 = i)] = \log r(s_1, s_2) \]
As we can see; independent of \(a\) (note that the intercept cancels as well). If we had included an interaction term between the age and smoking status then the ratio would have been dependent on the age.
For \(s_1 =\) cigarPipeOnly
and \(s_2 =\) cigaretteOnly
:
\[ r(a, s_1 = 1, s_2 = 2) = \exp(0-1\cdot\beta_{2, smoke}) \approx \exp(-0.36915) = 0.6913217\]
exp(-coef(model1)[10])
## smokecigarretteOnly
## 0.6913196
90 % CI: \(r(s_1, s_2) \pm z_{0.05} SD(r(s_1,s_2))\). We find it on log-scale, and transform it (as we need the assumption that the \(\beta\)’s are normal, which is not true for the exponentials!)
Estimate: \(\beta_{2, smoke} = 0.36915 = -\log r(s_1,s_2)\), with standard deviation \(0.03791\). So we get the interval (using that \(z_{0.05} = 1.645\)) \([0.6495248, 0.7358038]\) for \(r(s_1, s_2)\) (\([-0.4315143, -0.3067918]\) on log-scale).
# the minus is there because we want the estimate for r and not the
# coefficient
estimate <- -coef(model1)[10]
sd_estimate <- sqrt(vcov(model1)[10, 10])
critval <- qnorm(0.05, lower.tail = FALSE)
lower <- estimate - critval * sd_estimate
upper <- estimate + critval * sd_estimate
cat("log-scale:\nValue: ", estimate, "\nInterval: [", lower, ", ", upper,
"]", sep = "")
## log-scale:
## Value: -0.3691531
## Interval: [-0.4315143, -0.3067918]
cat("Real scale:\nValue: ", exp(estimate), "\nInterval: [", exp(lower),
", ", exp(upper), "]", sep = "")
## Real scale:
## Value: 0.6913196
## Interval: [0.6495248, 0.7358038]
If the ratio between the means equal 1, we can not distinguish between the two situations. As 1 is not in the interval, the difference between only cigarettes and only cigar/pipe is significant (can also see that the interval on log-scale does not contain 0).
model2 <- glm(dead ~ smoke, family = "poisson", data = smoking, offset = log(pop))
model3 <- glm(dead ~ ageLevel + smoke, family = "poisson", data = smoking,
offset = log(pop))
summary(model2)
##
## Call:
## glm(formula = dead ~ smoke, family = "poisson", data = smoking,
## offset = log(pop))
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -24.546 -5.892 -2.310 8.343 15.612
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.47319 0.03099 -47.532 < 2e-16 ***
## smokecigarretteOnly -0.31219 0.03576 -8.729 < 2e-16 ***
## smokecigarrettePlus -0.43013 0.03468 -12.402 < 2e-16 ***
## smokeno -0.36678 0.04669 -7.855 3.98e-15 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 4056.0 on 35 degrees of freedom
## Residual deviance: 3910.7 on 32 degrees of freedom
## AIC: 4158.7
##
## Number of Fisher Scoring iterations: 5
summary(model3)
##
## Call:
## glm(formula = dead ~ ageLevel + smoke, family = "poisson", data = smoking,
## offset = log(pop))
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -4.0918 -1.1673 -0.2755 0.7803 2.6364
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -3.705950 0.050717 -73.071 < 2e-16 ***
## ageLevel 0.333006 0.005591 59.559 < 2e-16 ***
## smokecigarretteOnly 0.405019 0.037463 10.811 < 2e-16 ***
## smokecigarrettePlus 0.203426 0.035996 5.651 1.59e-08 ***
## smokeno -0.032927 0.046894 -0.702 0.483
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 4055.984 on 35 degrees of freedom
## Residual deviance: 75.734 on 31 degrees of freedom
## AIC: 325.76
##
## Number of Fisher Scoring iterations: 4
First, test the goodness of fit for each model as compared to the saturated model using the deviance.
At the exam the students would need to find critical values for the \(\chi^2\) distribution, and for the two models these are
qchisq(0.05, 32, lower.tail = FALSE)
qchisq(0.05, 31, lower.tail = FALSE)
1 - pchisq(model2$deviance, model2$df.residual)
1 - pchisq(model3$deviance, model3$df.residual)
## [1] 46.19426
## [1] 44.98534
## [1] 0
## [1] 1.300128e-05
Neighter model2
or model3
are good, reject the null hypothesis that the model is good. Thus only model1
is a good model (not reject the null).
We may use the deviance to compare models that are nested within each other, so we could compare model2
and model3
, and also model2
and model1
. We do this with the likelihood ratio test.
# model2 nested within model1
model2$deviance - model1$deviance
model2$df.residual - model1$df.residual
1 - pchisq(model2$deviance - model1$deviance, model2$df.residual - model1$df.residual)
# H0 is to prefer the small model and H1 to prefer the large model,
# here we reject H0
## [1] 3889.218
## [1] 8
## [1] 0
# model2 nested within model3
model2$deviance - model3$deviance
model2$df.residual - model3$df.residual
1 - pchisq(model2$deviance - model3$deviance, model2$df.residual - model3$df.residual)
# H0 is to prefer the small model and H1 to prefer the large model,
# here we reject H0
## [1] 3834.97
## [1] 1
## [1] 0
The conclusion is to prefer model1
to model2
, and to prefer model3
to model2
. But, we can not compare model1
to model3
since theses are not nested. Overall - we should stay with model1
that was the only passing the goodness of fit test.
library(ggplot2); library(reshape2)
model3 <- glm(dead ~ ageLevel + smoke, family = "poisson", data = smoking, offset = log(pop))
age <- 1:9
beta_0 <- coef(model3)[1]
beta_1 <- coef(model3)[2]
cigO <- coef(model3)[3]
cigP <- coef(model3)[4]
no <- coef(model3)[5]
model3_frame <- data.frame(age = age,
cigarPipeOnly = beta_0 + age*beta_1,
cigaretteOnly = beta_0 + age*beta_1 + cigO,
cigarettePlus = beta_0 + age*beta_1 + cigP,
no = beta_0 + age*beta_1 + no)
model3_df <- melt(model3_frame, id.vars = "age", variable.name = "smoking", value.name = "lambda")
gg1 <- ggplot(model3_df, aes(x = age, y = lambda, col = smoking)) + geom_line() +
labs(x = "Age", y = expression(log(lambda))) +
scale_x_discrete(limits = 1:9, labels = as.character(unique(smoking$age))) +
theme(axis.text.x = element_text(angle = 45))
gg2 <- ggplot(model3_df, aes(x = age, y = exp(lambda), col = smoking)) + geom_line() +
labs(x = "Age", y = expression(lambda)) +
scale_x_discrete(limits = 1:9, labels = as.character(unique(smoking$age))) +
theme(axis.text.x = element_text(angle = 45))
ggpubr::ggarrange(gg1, gg2, common.legend = TRUE)
# smokeno not significant
model3_frame2 <- data.frame(age = age,
cigarPipeOnly = beta_0 + age*beta_1,
cigaretteOnly = beta_0 + age*beta_1 + cigO,
cigarettePlus = beta_0 + age*beta_1 + cigP,
no = beta_0 + age*beta_1 + no*0)
model3_df2 <- melt(model3_frame2, id.vars = "age", variable.name = "smoking", value.name = "lambda")
gg3 <- ggplot(model3_df2, aes(x = age, y = lambda, col = smoking)) + geom_line() +
labs(x = "Age", y = expression(log(lambda))) +
scale_x_discrete(limits = 1:9, labels = as.character(unique(smoking$age))) +
theme(axis.text.x = element_text(angle = 45))
gg4 <- ggplot(model3_df2, aes(x = age, y = exp(lambda), col = smoking)) + geom_line() +
labs(x = "Age", y = expression(lambda)) +
scale_x_discrete(limits = 1:9, labels = as.character(unique(smoking$age))) +
theme(axis.text.x = element_text(angle = 45))
ggpubr::ggarrange(gg3, gg4, common.legend = TRUE)
We see from the first (and second) plot that the line for smokeno
and cigarPipeOnly
are very close to each other, so it is no surprise that smokeno
is not significant.
Member of exponential family if
\[ f(y) = \exp\left(\frac{y\theta - b(\theta)}{\phi} w + c(y, \phi, w)\right). \]
\(Gamma(\nu, \mu)\) can be written as
\[ f_Y(y) = \exp\left( \frac{\frac{-1}{\mu}y + \log\left(-\frac{-1}{\mu}\right)}{\frac{1}{\nu}} + \nu \log(\nu) + (\nu-1)\log(y) - \log(\Gamma(\nu)) \right) \]
See classnotes for details: derivation from Module 1
which means that
\[ \theta = \frac{-1}{\mu}, \ \ \phi = \frac{1}{\nu}, \ \ w = 1, \text{ and } b(\theta) = - \log\left(-\frac{-1}{\mu}\right) = - \log(-\theta). \]
Thus the Gamma distribution with unknown \(\mu\) and known \(\nu\) is an exponential family.
The expected value: E\((Y) = b'(\theta) = \frac{1}{-\theta} = \mu\).
The variance: Var\((Y) = b''(\theta) \frac{\phi}{w} = \frac{1}{\theta^2} \frac{\phi}{w} = \mu^2 \frac{1}{\nu} = \frac{\mu^2}{\nu}\).
See classnotes in Module 5 for proof for these general formulas- link to be added.
Canonical link: Given by \(\theta = -1/\mu\). But we usually use log-link, which means that the expected and observed Fisher information will differ.
A saturated model is a model with as many parameters as possible. In most cases one parameter per observation.
The log-likelihood for one observation:
\[l_i(\mu_i) = \log(f_Y(y)) = \frac{\frac{-1}{\mu_i}y_i + \log\left(-\frac{-1}{\mu_i}\right)}{\frac{1}{\nu}} + \nu \log(\nu) + (\nu-1)\log(y_i) - \log(\Gamma(\nu))\]
which has the derivative
\[l_i' = \nu \frac{d}{d\mu_i} \left( \frac{-1}{\mu_i}y_i + \log\left(\frac{1}{\mu_i}\right) \right) = \nu \left( \frac{y_i}{\mu_i^2} - \frac{1}{\mu_i} \right) = \frac{\nu}{\mu_i} \left( \frac{y_i}{\mu_i} - 1 \right). \]
This leads to the MLE \(\hat{\mu_i} = y_i\).
The deviance based on all \(N\) observations for a model is
\[ D = 2(l_{saturated} - l_{model}) = 2\sum_{i=1}^N(l(\mu_i = y_i)-l(\mu_i = \hat{\mu_i})) \\ = 2\sum_{i=1}^N\left(\nu\left(\frac{-y_i}{y_i} - \log(y_i)\right) - \nu\left( \frac{-y_i}{\hat{\mu_i}}-\log(\hat{\mu_i}) \right)\right) = -2\sum_{i=1}^N\nu\left( \frac{\hat{\mu_i}-y_i}{\hat{\mu_i}} + \log\left(\frac{y_i}{\hat{\mu_i}}\right) \right) \]
It is reasonable to use a Gamma distribution for amount of precipitation given precipitation. From a) and b) we have learned that the Gamma distribution is a member of the exponential family, and hence it can be used as response for a GLM. Gamma distributions require a positive expected value, and hence a log-link (or an other monotone differential function that ensures positive \(\mu\)) should be used. Note that the log-link is not the canoical link! The linear component \(\eta_i = \beta_0 + \beta_1 x_i\) is one alternative. As log-link is used, we get E\((Y_i) = \exp(\eta_i)\). Alternatively, \(\eta_i = \beta_0 + \beta_1\log(x_i)\) is another possible choice.
filepath <- "https://www.math.ntnu.no/emner/TMA4315/2017h/breastcancer.txt"
brc <- read.table(filepath, header = TRUE, colClasses = c(rep("numeric",
2), rep("factor", 4)))
summary(glm(formula = cbind(surv, nsurv) ~ fapp + fcountry, family = binomial,
data = brc))
##
## Call:
## glm(formula = cbind(surv, nsurv) ~ fapp + fcountry, family = binomial,
## data = brc)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.8142 -0.7279 0.2147 0.7576 1.8715
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.0803 0.1656 6.522 6.93e-11 ***
## fapp2 0.5157 0.1662 3.103 0.001913 **
## fcountry2 -0.6589 0.1998 -3.298 0.000972 ***
## fcountry3 -0.4945 0.2071 -2.388 0.016946 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 57.588 on 34 degrees of freedom
## Residual deviance: 36.625 on 31 degrees of freedom
## AIC: 139.18
##
## Number of Fisher Scoring iterations: 4
Within the same hospital, \(e^{\hat{\beta_1}} = 1.67\) represents the predicted proportional increase of the odds of survival of having a benign tumor (level 2) with respect to having a malignant tumor. (Yes, survival should increase from malignant to benign.)
The predicted odds for survival within country \(j\) with benign tumor is
\[\frac{\hat{\pi}_{bj}}{1-\hat{\pi}_{bj}} = \begin{cases} e^{\hat{\beta_0} + \hat{\beta_1}} \text{ if } j = 1 \\ e^{\hat{\beta_0} + \hat{\beta_1} + \hat{\beta_2}} \text{ if } j = 2 \\ e^{\hat{\beta_0} + \hat{\beta_1} + \hat{\beta_3}} \text{ if } j = 3 \end{cases}\]
The predicted odds for survival within country \(j\) with malign tumor is
\[\frac{\hat{\pi}_{mj}}{1-\hat{\pi}_{mj}} = \begin{cases} e^{\hat{\beta_0}} \text{ if } j = 1 \\ e^{\hat{\beta_0} + \hat{\beta_2}} \text{ if } j = 2 \\ e^{\hat{\beta_0} + \hat{\beta_3}} \text{ if } j = 3 \end{cases}\]
Thus, the odds ratio is \(\frac{\hat{\pi}_{bj}}{1-\hat{\pi}_{bj}}/\frac{\hat{\pi}_{mj}}{1-\hat{\pi}_{mj}} = e^{\hat{\beta_1}}\) for all three countries \(j = 1, 2, 3\), or the log odds ratio = \(\hat{\beta_1}\).
The categorical covariates gives \(2\cdot2\cdot3\cdot3 = 36\) possible outcomes. This gives 35 df for the null model (1 df to the intercept). We have only 34 df in the null model. This is because one of the combinations of covariates has no observations, no survivors and no non-survivors (there are no observations for fapp = 2, finlf = 2, fage = 1, fcountry = 2
, which corresponds to benign tumor, moderate or severe inflammatory reaction, under 50 years old, and from the US).
formula <- list(cbind(surv, nsurv) ~ fapp + fage + fcountry, cbind(surv,
nsurv) ~ fapp + fage + finfl + fcountry, cbind(surv, nsurv) ~ fapp +
finfl + fage * fcountry, cbind(surv, nsurv) ~ fapp * finfl + fage *
fcountry, cbind(surv, nsurv) ~ fapp * finfl + fapp * fage + fage *
fcountry, cbind(surv, nsurv) ~ fapp * finfl * fage * fcountry)
models <- lapply(formula, glm, data = brc, family = binomial)
anova(models[[1]], models[[2]], models[[3]], models[[4]], models[[5]],
models[[6]])
## Analysis of Deviance Table
##
## Model 1: cbind(surv, nsurv) ~ fapp + fage + fcountry
## Model 2: cbind(surv, nsurv) ~ fapp + fage + finfl + fcountry
## Model 3: cbind(surv, nsurv) ~ fapp + finfl + fage * fcountry
## Model 4: cbind(surv, nsurv) ~ fapp * finfl + fage * fcountry
## Model 5: cbind(surv, nsurv) ~ fapp * finfl + fapp * fage + fage * fcountry
## Model 6: cbind(surv, nsurv) ~ fapp * finfl * fage * fcountry
## Resid. Df Resid. Dev Df Deviance
## 1 29 33.102
## 2 28 33.095 1 0.0065
## 3 24 25.674 4 7.4210
## 4 23 25.504 1 0.1704
## 5 21 22.021 2 3.4823
## 6 0 0.000 21 22.0214
Use the formula that is factor A has \(a\) levels, and factor \(B\) has \(b\) levels, A*B means intercept + (\(a-1\)) main effect parameters of A, + (\(b-1\)) main effect parameters of B and (\(a-1\))(\(b-1\)) interactions ( = \(1+(a-1)+(b-1) + (a-1)(b-1)\)). Hence, remembering that the intercept and the main effects of a factor can only be counted once in a model specification. \(n = 35\), the intercept, fapp
and finfl
takes 1 df each (when included linearly), and fage
and fcountry
takes 2 df each.
When we are missing an observation, the saturated model (model 6) will have 35, and not 36, estimated coefficients. Thus one combination in the saturated model will not be estimated.
See module pages for more on the Wald statistic.
We need the estimate for the coefficients we want to test: \(\hat{\beta_2} + \hat{\beta_3} + 1 = -0.6589 -0.4945 + 1 = -0.1534\)
And the variance (standard deviation): Var(\(\hat{\beta_2} + \hat{\beta_3} + 1\)) = Var(\(\hat{\beta_2}\)) + Var(\(\hat{\beta_3}\)) + 2Cov(\(\hat{\beta_2}, \hat{\beta_3}\)) = 0.040 + 0.043 + 2 \(\cdot\) 0.021 = 0.125 so the standard error of \(\hat{\beta_2} + \hat{\beta_3} + 1\) is \(\sqrt{0.125} \approx 0.354\). The Wald statistic is then
\[\left(\frac{-0.1534}{0.354}\right)^2 \approx (-0.433)^2 \approx 0.187\]
which has a p-value \(P(X \geq 0.187) \approx 0.665\) for \(X \sim \chi_1^2\), so we do not reject the null hypothesis.
fcountry2
corresponds to a dummy variable,dum2
, which is equal to 1 when the level of country is 2, i.e., hospital is in US, and 0 for all combinations.fcountry3
corresponds to a dummy variable,dum3
, which is equal to 1 when the level of country is 3, i.e., hospital is in UK, and 0 for all combinations.The model from part a) corresponds to a model
\[H_1\text{-model}: \beta_0 + \beta_1 \texttt{fapp} + \beta_2 \texttt{dum2} + \beta_3 \texttt{dum3}\]
Using that \(\beta_2 + \beta_3 = 1\), the model under \(H_0\) becomes
\[\beta_0 + \beta_1 \texttt{fapp} + \beta_2 \texttt{dum2} + (-1-\beta_2) \texttt{dum3} = \beta_0 + \beta_1 \texttt{fapp} + \beta_2 (\texttt{dum2-dum3}) - \texttt{dum3} \]
This can be fitted by specifying a model of the form
\[H_0\text{-model}: \text{offset(−{\tt dum3}) +} \beta_1 \text{fapp} + \beta_2 (\text{\tt{dum2-dum3}})\]
Here dum2-dum3
is a variable which is 0 for treatments which takes place in Japan, 1 for treatments in US and -1 for treatments in UK. The test now consists of comparing the two deviances [of this model and the model from a) - marked with \(H_1\) and \(H_0\)-model above], and using a \(\chi^2\)-distribution as reference.
# Doing the test dum2 is 1 when you are in US (country = 2) dum3 is 1
# when you are in UK (country = 3) so dum2 - dum3 is 0 when you are
# in Japan, 1 in US and -1 in UK
brc$dum2 <- as.numeric(brc$fcountry == 2)
brc$dum3 <- as.numeric(brc$fcountry == 3)
model1 <- glm(formula = cbind(surv, nsurv) ~ fapp + fcountry, family = binomial,
data = brc)
model2 <- glm(cbind(surv, nsurv) ~ fapp + I(dum2 - dum3), offset = -dum3,
data = brc, family = binomial)
summary(model2)
##
## Call:
## glm(formula = cbind(surv, nsurv) ~ fapp + I(dum2 - dum3), family = binomial,
## data = brc, offset = -dum3)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.8824 -0.6593 0.1562 0.7491 2.0316
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.0292 0.1157 8.893 < 2e-16 ***
## fapp2 0.5140 0.1658 3.100 0.00194 **
## I(dum2 - dum3) -0.5841 0.1012 -5.770 7.91e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 75.819 on 34 degrees of freedom
## Residual deviance: 36.814 on 32 degrees of freedom
## AIC: 137.37
##
## Number of Fisher Scoring iterations: 4
anova(model2, model1)
## Analysis of Deviance Table
##
## Model 1: cbind(surv, nsurv) ~ fapp + I(dum2 - dum3)
## Model 2: cbind(surv, nsurv) ~ fapp + fcountry
## Resid. Df Resid. Dev Df Deviance
## 1 32 36.814
## 2 31 36.625 1 0.18967
The difference in deviances is \(0.18967\), which with 1 degree of freedom is \(\chi^2\)-distributed with a p-value of about 0.66. Thus we keep the null hypothesis.
Remark: compare to the finding with Wald! Same asymptotic distribution.
Remark: this looks so much easier to do with Wald than with LRT!