Project 1

In this project we will use data from Ward-Fear et. al. containing the survival times of a lizard species exposed to a toxic prey (for details see the paper). The lizards have been given two treatments; one group has been trained to recognise the toxic prey and another group has been given no training. Load the data and the survival-package into R using the commands

library(survival)
data <- read.csv("https://www.math.ntnu.no/emner/TMA4275/2018v/lizards.txt",sep="\t",header=TRUE)

Note that the variable Censor unconventionally takes a value of 1 for right-censored observations and 0 otherwise.

  1. Compute and plot the Kaplan-Meier estimate of the survival fuction \(R(t)\) assuming that there is no difference between the two groups. Hints: You’ll need the R-functions Surv, survfit, and plot.survfit (plot-methods of objects of class survfit).

  2. Plot the corresponding estimate of the cumulative hazard function. Do this by defining a suitable R-function that can be supplied as the fun-argument to plot.survfit. Based on the plot, do you see any indication of deviations from exponentiality?

  3. Suppose that data survival times are distributed as a mixture of two exponential distributions with parameters \(\lambda_1>\lambda_2\) in proportions \(p\) and \(1-p\). Derive an expression for the resulting hazard function \(z(t)\). In particular, do this mechanism generate decreasing or increasing hazard? Comment on the hazard \(z(t)\) at time \(t=0\) and the limiting value of the hazard as \(t\rightarrow\infty\).

  4. Write a general R-function that generates a Total-time-on-test (TTT) plot (as a side effect) of the data and carries out the Barlow-Proschan test of exponentiality and returns this as an object of class htest (the output will then be nicely formatted by the print.htest-function). This list should include a \(p\)-value. See the help pages and source code of chisq.test and prop.test for examples. The input argument should be an object of class Surv. Additional arguments should specify if the alternative hypothesis is an increasing or decreasing hazard or if two-sided test should be conducted. Apply your function to the lizard data specifying and alternative hypothesis corresponding to the pattern described in point 3.

  5. Use the log-rank-test (see survdiff) to test if there is a difference between the survival functions for the two groups (the lizards given training vs the ones that were not).

  6. Test for a difference in survival using survival regression based on the Weibull distribution (available via survreg function). Also do a likelihood ratio test of the null hypothesis that the survival times within each category comes from an exponential distribution against the more general Weibull model, again by using the survreg function.

  7. Relying on the simpler exponential model in point 6, let \(\theta_1\) and \(\theta_2\) denote the scale parameters of the two exponential distributions. Derive and plot the profile likelihood of \(\theta_1/\theta_2\) given the observed lizard data and use this to derive an approximate 95%-confidence interval for \(\theta_1/\theta_2\). Derive an alternative approximate confidence interval for the same ratio based on asymptotic normality of parameter estimates in the exponential survival regression in point 6.

  8. Under the same model as in point 7, derive the Bayesian posterior distribution and a 95% credible interval for \(\theta_1/\theta_2\) using independent improper scale priors (Berger, 1986, p. 86) on \(\theta_1\) and \(\theta_2\), \(\pi(\theta_i) \propto 1/\theta_i\), \(i=1,2\).