Project 2, TMA4275 spring 2018

Left trunctation

Suppose that we have left truncated, right censored data from a Weibull distribution with scale parameter \(\theta\) and shape parameter \(\alpha\). Left truncation means that individuals that die prior to the left truncation point do not appear in the sample. Let \((u_i, y_i, \delta_i)\), \(i=1,2,\dots,n\) denote the left truncation point, the time of the event, and the censoring indicator for each individual in the sample. Derive the likelihood and log-likelihood function for the observed data. Hint: you need to consider the conditional probability of being alive at time \(y_i\) and the conditional density of \(T_i\) given a survival time \(T_i\ge u_i\).
Implement an R function that computes the negative of the above log likelihood. The first argument should be a vector containing \(\theta\) and \(\alpha\) and the second argument a data.frame containing the data \((u_i,y_i,\delta_i)\). You’re allowed to use builtin functions dweibull and pweibull. To test your function, first write code that simulates a predetermined number \(n\) of left truncated observations (perhaps using a while-loop) from a Weibull distribution with parameters \(\theta=1\), \(\alpha=2\), with left truncation points uniformly distributed on \((0,1)\) and right censoring points uniformly distributed on \((1,4)\) (use functions rweibull and runif). Second, fit the model to the simulated data using function optim in R. Compute approximate standard errors based on the observed Fisher information (set the argument hessian=TRUE in the call to optim). Do the MLEs appear to fall reasonably close to the true parameter values?

The Cox model for recurrent event data.

The Cox model can be used also for recurrent event data. Here, we will use the dataset bladder2 in the survival package containing data on recurrence of bladder cancer in 118 patients. The only difference from an ordinary analysis based on the Cox model is that different patients possibly remains in the risk set when the event occur (if they are successfully treated). To fit the model, the history of a patients is represented as a series of left trunctated observations that are either right censored or where we observe the event after a given time.

Load the dataset with the command data(bladder2) in R. Each row in the data frame contains the variables start, stop, a censoring indicator event and rx (placebo or thiotepa treatment) (see ?bladder2 for details). You can now fit the model using coxph(Surv(start,stop,event) ~ rx, data=bladder2). By how many percent does the intensity change for patients that are given thiotepa treatment relative to those given placebo? Find an approximate confidence interval for this percentwise change.
Examine the proportional hazard (or intensity) assumption of the Cox model by plotting the Schoenfeld residuals of the fitted models against the observed event times. Use the function smoothSEcurve from Moore to estimate the mean of the residuals as function of time. Interpret what you see.
Using functions survfit.coxph and plot.survfit, plot estimates of the cumulativ hazard (or intensity) functions and associated confidence intervals for each treatment group defined by rx as well as the baseline cumulative hazard function.

Parametric inference for NHPPs

Consider a non-homogeneous Poisson process (NHPP) with intensity function \[w(t;a,b,c) = e^{a + b\cos(2\pi(t-c)/T)}.\]

Explain how this may model events that occur at a rate exhibiting deterministic periodic fluctuations and give a brief interpretation of the parameters \(a,b,c\) and \(T\). Write R code that computes maximum likelihood estimates and approximate standard errors of the parameters \(a\), \(b\) and \(c\) given \(n\) observed event times \(s_1,s_2,\dots,s_n\) on an interval \((0,\tau]\) using the same numerical methods as in point 1. You’ll need to use the integrate fuction in R to compute the integral \(W(\tau;a,b,c)=\int_0^\tau w(u;a,b,c) du\) (one of the terms in the log likelihood). The parameter \(T\) can be assumed to be known.
Data available via the command below contains the event times for mentions of the phrase “blå ekstra” (a type of ski wax) in the norwegian press between the dates Janary 1, 2012 and January 1, 2018 (data retrieved from https://www.retriever.no/product/mediearkiv/). Event times are given in number of days since the beginning of the interval. Fit the above NHPP model to the data using \(T=365.25\). Make a plot of the estimated intensity function \(w(t)\) and the observed data. Briefly discuss if the assumptions of the model may be reasonable or not.

s <- c(13, 13, 22, 22, 22, 22, 25, 25, 25, 25, 32, 32, 33, 33, 37, 
37, 40, 40, 40, 40, 44, 44, 45, 45, 45, 45, 48, 48, 75, 75, 82, 
82, 83, 83, 94, 94, 114, 114, 152, 152, 376, 376, 377, 377, 409, 
409, 412, 412, 415, 415, 436, 436, 446, 446, 447, 447, 448, 448, 
450, 450, 451, 451, 457, 457, 459, 459, 586, 600, 676, 678, 678, 
683, 689, 692, 696, 718, 739, 744, 746, 748, 748, 751, 758, 761, 
779, 779, 783, 790, 810, 818, 862, 989, 1065, 1067, 1084, 1088, 
1123, 1135, 1137, 1139, 1160, 1181, 1186, 1189, 1395, 1472, 1475, 
1476, 1479, 1481, 1489, 1489, 1497, 1497, 1506, 1506, 1511, 1511, 
1512, 1512, 1513, 1513, 1531, 1531, 1532, 1532, 1533, 1533, 1534, 
1534, 1535, 1535, 1732, 1732, 1798, 1798, 1805, 1805, 1811, 1811, 
1811, 1811, 1811, 1811, 1812, 1812, 1814, 1814, 1814, 1814, 1816, 
1816, 1822, 1822, 1822, 1822, 1833, 1833, 1839, 1839, 1844, 1844, 
1847, 1847, 1856, 1856, 1866, 1866, 1872, 1872, 1880, 1880, 1882, 
1882, 1902, 1902, 1903, 1903, 1903, 1903, 1905, 1905, 1905, 1905, 
1906, 1906, 1912, 1912, 1914, 1914, 1935, 1935, 2150, 2150, 2150, 
2150, 2152, 2152, 2152, 2152, 2161, 2161, 2178, 2178, 2179, 2179, 
2181, 2181)