Question one

  1. What specific hypothesis does this analysis test?

1 mark for stating that they are testing the hypothesis that energy use differs between castes.

  1. What can you conclude from this test? Explain how you reach your conclusion.

1 mark for stating that there is no statistically distinguishable difference between castes, 1 mark for giving the p-value or confidence interval.

  1. Write the mathematical equation for the model fitted with this code

\[y = \alpha + \beta_1 \,\text{caste} + \beta_2 \,\text{mass}\], where \(y\) is log energy use, \(\alpha\) is an intercept, \(\beta_1\) is the effect of caste, and \(\beta_2\) is the effect of log mass.
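As an illustration of what the equation implies, assume caste enters as a 0/1 indicator with the lazy caste as the baseline (this coding is an assumption, though it matches the direction of the caste effect reported in the later answers). The model then describes two parallel lines:

\[\text{lazy: } y = \alpha + \beta_2 \,\text{mass}, \qquad \text{worker: } y = (\alpha + \beta_1) + \beta_2 \,\text{mass}\]

so \(\beta_1\) is the vertical gap between the two lines and \(\beta_2\) is their shared slope.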

  1. How does this model differ from the t-test in Figure 2?

The t-test finds no effect of caste (the p-value is 0.4988, which is > 0.05), whereas this new model finds one (the p-value changes to 0.0112, which is < 0.05) (2 MARKS). Body mass is included in the model in Figure 3 but not in the t-test; it explains some of the variation in energy use that the t-test leaves unexplained, so including it gives a clearer estimate of the caste effect (4 MARKS).

  1. What statistical conclusions can you draw from these results?

There is a statistically distinguishable positive effect of caste on energy usage, with workers using on average 0.39 log kJ more energy per day than the lazy caste. The direction of this effect is clearly distinguished, as the p-value is < 0.05 (0.0112) and the confidence interval does not cross 0 (0.09778 to 0.68222). There is also a statistically distinguishable effect of body mass on energy usage. The effect is positive, with every additional 1 log gram of body mass associated with mole rats using 0.89 log kJ more energy per day. This was also statistically significant (p-value much less than 0.05 and confidence interval of 0.51 to 1.27).

  1. What specific hypothesis does this ANOVA analysis test?

1 mark for stating that they are testing the hypothesis of whether there is an interaction between caste and body mass.

  1. What can you conclude from this analysis? Explain how you reach your conclusion.

Any statement of accepting the null hypothesis or incorrect direction of result leads to 0 marks here. 2 marks for discussion of the Pr(>F) value of 0.3206 as the probability of getting our observed F value or higher if the null were true. 2 marks for stating that, because this Pr(>F) value is over the usual threshold of 0.05, we say the interaction is not statistically significant: we are quite likely to see this result even if the null is true.

  1. What are the assumptions of a regression model like model1?

1 mark each for stating the assumptions: equal variance, linearity, normality of residuals, no outliers, independence of residuals. (total here = 6).

  1. How good is the model fit for model1? Explain your answer in the context of assumptions of the model.

1 mark for: There seems to be equal variance across the fitted values; it is not heteroscedastic. 1 mark for: There does not seem to be any evidence of non-linearity. The residuals fit along the theoretical line in the normal QQ plot, so the normality assumption is met. 1 mark for: There are 3 outliers marked on the Cook's distance plot; these points might need investigating. 1 mark for an opinion: e.g. Yes, it is ok. No is also acceptable if supported by a valid reason. (But yes seems more logical).

  1. What can you conclude biologically about the association between log energy expenditure and log mass from the results of these analyses?

There is a positive effect of caste on energy expenditure: log energy expenditure is higher for workers. There is also a positive effect of log mass on log energy expenditure. 1 mark for comparing the effect sizes: the effect of body mass is larger than the effect of caste even for a single unit change. We can see from the graph that the range of log(mass) is just greater than 1, so across our data log body mass has a larger effect than caste. The two confidence intervals do overlap slightly, so there are plausible values where the effects could be similar, but the plausible range for the effect of log(mass) extends much higher. 2 marks for bringing the results to biological meaning: e.g. the results have shown that bigger individuals have higher energy expenditure, which we would expect as they have higher costs to keep their body going. It supports the assumption of the researchers prior to the analysis. The interpretations may include subjective components. This is ok: as long as the interpretation is justified biologically or statistically, marks are still given.

Problem 2: Dragon Breeding

  1. Which locus has the largest estimated effect on whether a dragon is a Norwegian Ridgeback?

Locus 2 (1 mark)

  1. What is the size of that effect?

-1.36

  1. What is the (approximate) 95% confidence interval for this effect?

Take the estimate of the effect ± 2 * the standard error, which gives an interval of -2.3503104 to -0.3741402.
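As a rough check of the arithmetic, the interval can be reproduced in a few lines of Python. The estimate and standard error below are assumptions: they are back-calculated from the interval endpoints quoted in the answer (midpoint, and half-width divided by 2), not read from the model output.

```python
# Approximate 95% confidence interval: estimate ± 2 * standard error.
# ASSUMPTION: both inputs are back-calculated from the interval quoted in the
# answer, not copied from the model summary.
estimate = -1.3622253
std_error = 0.49404255


def approx_ci(estimate, std_error, multiplier=2.0):
    """Return (lower, upper) for estimate ± multiplier * std_error."""
    half_width = multiplier * std_error
    return estimate - half_width, estimate + half_width


lower, upper = approx_ci(estimate, std_error)
print(f"approx. 95% CI: ({lower:.4f}, {upper:.4f})")
```

Because both ends of the interval are below zero, the direction of the effect is clearly negative.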

The researchers decided to carry out model selection.

  1. Why might this be a good idea here, as opposed to just using the analysis above?

There are lots of variables; a simpler model is easier to interpret and gives better estimates of the effects that remain.

  1. What form of model selection is this: exploratory or confirmatory?

exploratory

  1. Suggest a situation where the other form of model selection would be preferred (you can invent one involving dragons)

The example should be one where a specific biological hypothesis is tested, e.g. if we wanted to test whether country of origin changed the probability of being Norwegian or Swedish. (2 marks)

  1. What alternative is there to using AIC, and why might one be preferred to the other?

BIC (1 mark). AIC is preferred when we want to predict; BIC is preferred when we want a smaller model (2 marks), OR SOMETHING SIMILAR.
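The reason BIC tends to give smaller models can be sketched from the standard definitions, AIC = 2k - 2 ln(L) and BIC = k ln(n) - 2 ln(L). The log-likelihoods, parameter counts and sample size below are hypothetical values chosen for illustration, not results from the dragon data.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k ln(n) - 2 ln(L)."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


# HYPOTHETICAL fits, for illustration only (not taken from the exam figures).
n_obs = 100
small = {"log_likelihood": -60.0, "n_params": 3}
large = {"log_likelihood": -57.5, "n_params": 5}

aic_small, aic_large = aic(**small), aic(**large)
bic_small = bic(n_obs=n_obs, **small)
bic_large = bic(n_obs=n_obs, **large)

# Lower is better for both criteria.
print("AIC prefers:", "large" if aic_large < aic_small else "small")
print("BIC prefers:", "large" if bic_large < bic_small else "small")
```

With these made-up numbers AIC picks the larger model and BIC the smaller one, because BIC charges ln(n) per parameter rather than 2, and ln(n) exceeds 2 once n is 8 or more.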

  1. Write down the final model selected in Figure 7

logit(p) = alpha + beta1*Locus2 + beta2*Locus9, where p is the probability that a dragon is a Norwegian Ridgeback (I think this is asking for the equation)

  1. If we have a dragon that is homozygous for “long” alleles (i.e. it has 2 long alleles at each locus), what is the probability that it will be a Norwegian Ridgeback?

logit(p) = -0.6503017 + 2 * -1.2478049 + 2 * 0.8178833 = -1.5101449 (1 Mark). p = 0.1809173, using the formula exp(logit(p)) / (1 + exp(logit(p))) (1 Mark). ANSWER MAY NOT BE EXACT AS THERE IS SIMULATION

  1. If we have a dragon that is homozygous for “short” alleles (i.e. it has 0 long alleles at each locus), what is the probability that it will be a Norwegian Ridgeback?

logit(p) = -0.6503017 + 0 * -1.2478049 + 0 * 0.8178833 = -0.6503017 (1 Mark). p = 0.3429216 (1 Mark). ANSWER MAY NOT BE EXACT AS THERE IS SIMULATION
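The two predictions above can be checked with a short Python sketch of the inverse-logit calculation. The coefficients are the ones quoted in the answers. The names `slope_a` and `slope_b` are placeholders, since the answers do not say which coefficient belongs to which locus.

```python
import math

# Coefficients as quoted in the answers above.
intercept = -0.6503017
slope_a = -1.2478049  # placeholder name: effect per long allele at one locus
slope_b = 0.8178833   # placeholder name: effect per long allele at the other


def inv_logit(x):
    """Convert a value on the logit scale to a probability."""
    return math.exp(x) / (1 + math.exp(x))


def prob_norwegian(n_long_a, n_long_b):
    """Predicted probability given the number of long alleles at each locus."""
    linear_predictor = intercept + slope_a * n_long_a + slope_b * n_long_b
    return inv_logit(linear_predictor)


p_long = prob_norwegian(2, 2)   # homozygous for long alleles
p_short = prob_norwegian(0, 0)  # homozygous for short alleles
print(f"long/long: {p_long:.4f}, short/short: {p_short:.4f}")
```

Note the parentheses around the denominator in `inv_logit`: without them the expression would divide by 1 and then add exp(x), giving a value that is not a probability.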

  1. From these predictions, how important are these loci for determining whether a dragon is a Norwegian Ridgeback or a Swedish Short-snout?

Quite a bit, because the predicted probability changes from about 18% to about 34% between the two genotypes above (exact values will vary, as there is simulation).

  1. How could we change the model if we did not think the effects would be additive? (think about this for a single locus)

One could either add a quadratic term, or treat the number of alleles as a factor. 2 marks for either.
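The difference between the codings can be sketched for a single locus as the columns each one would contribute to the model. The function names are hypothetical and the code only illustrates the coding, not a fitted model.

```python
def additive_coding(n_long):
    """Additive model: one column, the count of long alleles (0, 1 or 2)."""
    return [n_long]


def quadratic_coding(n_long):
    """Additive term plus a quadratic term."""
    return [n_long, n_long ** 2]


def factor_coding(n_long):
    """Factor model: indicators for 1 and 2 long alleles (0 is the baseline)."""
    return [int(n_long == 1), int(n_long == 2)]


for n_long in (0, 1, 2):
    print(n_long, additive_coding(n_long), quadratic_coding(n_long),
          factor_coding(n_long))
```

The additive coding forces the step from 0 to 1 long allele to equal the step from 1 to 2. The other two codings each spend two parameters on the locus, so with only three genotypes they can fit any pattern across 0, 1 and 2 long alleles.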