The exam is here. It was the first exam I wrote, so the questions are more or less the same, but the balance of questions has improved since this one.

Problem 1: Olympic Records

a) Write down the model for the women’s times in mathematical notation.

T = a + b * Year + e

(4 marks, exact symbols don’t matter)

b) What assumptions are we making when using the model?

Linear response, Gaussian residuals, independence

(6 marks)

c) What unknown parameters does the model contain?

intercept, slope

(2 marks)

d) What is the predicted change in times from one Olympic games to the next (i.e. 4 years later)?

Slope = -0.017, so the change is a reduction of \(0.017 \times 4 = 0.068\) s

(2 marks)

e) Write down the estimates for the regression parameters from the men’s and women’s models, and comment on the values.

Slopes: Women: -0.016822, Men: -0.0110056.

The women’s line is steeper and has a higher intercept

(4 marks)

f) Which test in the ANOVA table tests whether the women’s times are changing at a different rate to the men’s? What distribution is used in the test?

the Race:Year line

(1 mark)

g) Does the model suggest that the women’s times are changing at a different rate to the men’s times? What are the test statistic and p-value?

Yes, the model suggests a difference. Test statistic: 7.86, p-value: 0.0079 (if they get the previous question wrong, give marks if they extract the correct values for the wrong hypothesis)

(3 marks)

h) What are the equations for the men’s times and for the women’s times?

Men: Time = 31.82 - 0.011 * Year, Women: Time = 31.82 + 12.52 - (0.011 + 0.0058) * Year

(4 marks)

i) By how much are the women’s times changing compared to the men’s?

The women’s times decrease by 0.0058 s per year more than the men’s, or 0.023 s per Olympic games (marks given for either answer)

(2 marks)

j) In a response to this model of winning times, a statistician suggested that the race in 2636 would, according to this analysis, be “far more interesting”. Why?

Because that will be when a woman first records a negative time.

(2 marks)
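The year in this joke can be checked from the women’s line. A minimal sketch in Python, using the rounded coefficients quoted in the answers above; the function names and the starting year of 2024 are illustrative assumptions, not part of the exam:

```python
# Coefficients quoted in the answers above (rounded output from the fitted model).
WOMEN_INTERCEPT = 31.82 + 12.52   # men's intercept plus the women's offset
WOMEN_SLOPE = -0.016822           # seconds per year


def predicted_time(intercept, slope, year):
    """Predicted winning time (seconds) from a straight-line model."""
    return intercept + slope * year


def first_negative_games(intercept, slope, start_year=2024):
    """First Olympic year (every 4 years from start_year, an assumed
    starting point) at which the predicted time is negative."""
    year = start_year
    while predicted_time(intercept, slope, year) >= 0:
        year += 4
    return year


# Year at which the women's line crosses zero.
women_zero_year = -WOMEN_INTERCEPT / WOMEN_SLOPE
women_first_negative = first_negative_games(WOMEN_INTERCEPT, WOMEN_SLOPE)

print(f"Women's line reaches zero at year {women_zero_year:.1f}")
print(f"First games with a negative predicted time: {women_first_negative}")
```

With these rounded values the line crosses zero shortly before the 2636 games, which matches the year in the question.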

k) Do you think this is a reasonable model? Explain (briefly!) your thinking.

No for prediction, because we are extrapolating, and if we go too far, it gets silly.

Yes, as a description of the data.

(4 marks)

l) Do the residuals look normally distributed? Explain your answer (in 1 or 2 sentences).

No. The plot suggests that the tails are too thick

(2 marks)

m) There is a concern that not all model assumptions are satisfied. Do the residual plots support this?

Either: Yes, there doesn’t seem to be much pattern, or No, it seems heteroscedastic (the variance increases with either the mean or the fitted value)

(2 marks)

n) If you wanted to test whether there was a non-linear effect of time, how could you do it?

Fit a quadratic term, or use a Box-Cox transformation (or something else sensible).

(2 marks)

Problem 2: House Sparrows in North America

a) Write down the assumptions of this model and an equation for the expected number of sparrows at a site.

Poisson distribution, log link, linear response to precipitation and temperature

(4 marks)
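One way the requested equation could be written, assuming the model contains only linear terms for the scaled mean precipitation and mean temperature. The symbols \(N_i\), \(\mu_i\), \(\beta_0\), \(\beta_1\), \(\beta_2\), \(\text{prec}_i\) and \(\text{temp}_i\) are notation chosen here for illustration, and exact symbols should not matter:

\[
N_i \sim \text{Poisson}(\mu_i), \qquad \log \mu_i = \beta_0 + \beta_1 \, \text{prec}_i + \beta_2 \, \text{temp}_i
\]

so the expected number of sparrows at site \(i\) is \(\mu_i = \exp(\beta_0 + \beta_1 \, \text{prec}_i + \beta_2 \, \text{temp}_i)\).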

b) What is the estimated coefficient for the effect of mean precipitation (prec.mean.sc)? And what is the 95% confidence interval for this estimate?

Estimate: 0.040401 (1 mark), 95% CI: \(0.040401 \pm 1.96 \times0.010177= (0.0205, 0.0603)\)

(5 marks)
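The interval arithmetic can be reproduced with a short Python sketch. It assumes the usual normal approximation (multiplier 1.96), as in the answer above; the variable names are illustrative:

```python
# Estimate and standard error for prec.mean.sc, as quoted in the answer above.
estimate = 0.040401
std_error = 0.010177

# Normal-approximation multiplier for a 95% interval.
z = 1.96

half_width = z * std_error
lower = estimate - half_width
upper = estimate + half_width

print(f"95% CI: ({lower:.4f}, {upper:.4f})")
```

Note that this interval is on the scale of the linear predictor (the log scale), not on the scale of counts.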

c) Are there any signs of over- or under-dispersion in the data?

Overdispersion: the residual deviance is 8425.5 on the residual df, so the deviance ratio is 4.9, and p < almost anything

(3 marks: 1 mark for “overdispersed”, 2 for reporting stats)

d) What is the predicted number of sparrows in the following sites:

1. at a site near Seattle where the mean temperature is 0.3 standard deviations below the mean, and mean precipitation is 0.3 standard deviations above the mean?

2. at a site just west of Seattle, in a temperate rainforest, where the mean temperature is 0.4 standard deviations below the average, but the precipitation is 5.8 standard deviations above the average.

\(\exp(1.85 + 0.3 \times 0.040 + (-0.3) \times (-0.04519139)) = \exp(1.88) = 6.5\)

\(\exp(1.85 + 5.8 \times 0.040 + (-0.4) \times (-0.045)) = \exp(2.1) = 8.2\)

(4 marks)
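The two predictions from the linear model can be checked with a small Python sketch. It uses the rounded coefficients from the answer above (intercept 1.85, precipitation 0.040, temperature -0.045); the function and variable names are illustrative assumptions:

```python
import math

# Rounded coefficients from the linear Poisson model, as quoted above.
INTERCEPT = 1.85
B_PREC = 0.040    # effect of scaled mean precipitation
B_TEMP = -0.045   # effect of scaled mean temperature


def predict_linear(prec, temp):
    """Expected count: the inverse of the log link is exp()."""
    return math.exp(INTERCEPT + B_PREC * prec + B_TEMP * temp)


# Covariates are in standard deviations from the mean;
# "below the mean" enters as a negative value.
near_seattle = predict_linear(prec=0.3, temp=-0.3)
west_of_seattle = predict_linear(prec=5.8, temp=-0.4)

print(f"Near Seattle:    {near_seattle:.1f}")
print(f"West of Seattle: {west_of_seattle:.1f}")
```

Because the coefficients are rounded, the results agree with the quoted answers only to about one decimal place.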

e) Does the ecological theory seem reasonable for this data? What parameter values tell you about this?

Yes it does. The theory would suggest that the quadratic terms should be negative.

(2 marks)

f) What is the predicted number of sparrows for the two sites mentioned above?

Near Seattle: \(\exp(2.1 - 0.0012 \times 0.3 - 0.038 \times (-0.3) - 0.105 \times 0.3^2 - 0.168 \times (-0.3)^2) = \exp(2.09) = 8.11\)

West of Seattle: \(\exp(2.1 - 0.0012 \times 5.8 - 0.038 \times (-0.4) - 0.105 \times 5.8^2 - 0.168 \times (-0.4)^2) = \exp(-1.44) = 0.23\)

(4 marks)
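The same check for the quadratic model, as a minimal Python sketch. It assumes the rounded coefficients that appear in the answer above (intercept 2.1, linear terms -0.0012 and -0.038, quadratic terms -0.105 and -0.168), with temperature below the mean entered as a negative value; the names are illustrative:

```python
import math

# Rounded coefficients from the quadratic Poisson model, as quoted above.
INTERCEPT = 2.1
B_PREC = -0.0012   # linear precipitation term
B_TEMP = -0.038    # linear temperature term
B_PREC2 = -0.105   # quadratic precipitation term
B_TEMP2 = -0.168   # quadratic temperature term


def predict_quadratic(prec, temp):
    """Expected count under the log link with linear and squared terms."""
    eta = (INTERCEPT
           + B_PREC * prec + B_TEMP * temp
           + B_PREC2 * prec ** 2 + B_TEMP2 * temp ** 2)
    return math.exp(eta)


near_seattle = predict_quadratic(prec=0.3, temp=-0.3)
west_of_seattle = predict_quadratic(prec=5.8, temp=-0.4)

print(f"Near Seattle:    {near_seattle:.2f}")
print(f"West of Seattle: {west_of_seattle:.2f}")
```

With rounded coefficients the results are close to, but not exactly, the quoted 8.11 and 0.23. The squared precipitation term dominates at the wet site, which is what pulls its prediction down so sharply.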

g) Comment briefly on the differences in the predictions from the linear and quadratic model.

The wet site has a very different prediction, because it is extremely wet: in the linear model the effect of rainfall can only keep increasing, whereas the quadratic model is more flexible and lets it decrease at high rainfall.

(4 marks)