Dataset 1: Cow diets

[Image: cow. Credit: Wikimedia Commons]

As described in the module, these data come from an experiment on cows. The cows were split into 6 groups, each given a different diet treatment. Here you will look at how these treatments influenced the dry matter intake (DMI) of the cows.

The data are in a .csv file with a header; the URL is given in the code below.

Important! When you import the data, make sure it is in the right format. Here you need Treatment to be a factor and Baseline to be numeric. See the code below.

cowdata <- read.csv("https://www.math.ntnu.no/emner/ST2304/2020v/Week09/cowdata.csv", header = TRUE)

# str() checks the data structure
str(cowdata)

# we can see that the variables are not in the right format
# so we fix them
cowdata$Treatment <- as.factor(cowdata$Treatment)
cowdata$Baseline <- as.numeric(cowdata$Baseline)

# now check again
str(cowdata)
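
To see why the format matters, here is a minimal sketch on simulated data. The data frame `toy` and its values are made up for illustration and are not the real cow measurements. If Treatment is left as a number, lm() fits a single slope across the group codes; as a factor, it estimates a separate difference from group 1 for each of the other groups.

```r
# Simulated stand-in for the cow data (hypothetical values, NOT the real data)
set.seed(1)
toy <- data.frame(
  Treatment = rep(1:6, each = 5),            # group codes stored as integers
  Baseline  = rnorm(30, mean = 20, sd = 2)
)
toy$DMI <- 5 + 0.5 * toy$Baseline + rnorm(30)

# Treatment left as a number: one straight-line slope across the codes 1 to 6
fit_numeric <- lm(DMI ~ Treatment + Baseline, data = toy)

# Treatment as a factor: one coefficient per group 2 to 6, each relative to group 1
toy$Treatment <- as.factor(toy$Treatment)
fit_factor <- lm(DMI ~ Treatment + Baseline, data = toy)

names(coef(fit_numeric))
names(coef(fit_factor))
```

The numeric version has a single Treatment coefficient, while the factor version has Treatment2 to Treatment6, which is what the questions below assume.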

The columns in the dataframe are:

DMI = the dry matter intake during the experiment (grams)
Baseline = the baseline dry matter intake before the experiment (grams)
Treatment = the diet treatment group (1 to 6)

You want to find out how diet treatment influenced dry matter intake while controlling for the baseline intake of each cow.

1. What model will you use to answer this? (1 mark)

1 mark for any of: linear model, regression, ANCOVA. No marks for ANOVA or t-test. There is one categorical and one continuous explanatory variable here.

2. What type of variables do you have and which are response or explanatory? (3 marks)

1 mark for each of: DMI = response, continuous. Treatment = explanatory, categorical. Baseline = explanatory, continuous.

We have given code to run a model for the data below. Think about what type of model is being run. It is good practice to consider whether you would have chosen the same one.

# Model 1
model1 <- lm(DMI ~ Treatment + Baseline, data = cowdata)

coef(model1)

confint(model1)

3. Interpret the output of the model. What does it tell you about the effect of diet on DMI? (5 marks)

1 mark for: the intercept is the intercept of the line for Treatment group 1, i.e. the expected DMI for group 1 when Baseline is 0. 1 mark for: the effect of Baseline intake is positive; cows that ate more before the experiment also ate more during it (0.05 to 0.12 grams of DMI per gram of Baseline). 1 mark for: this slope is the same for all groups. 1 mark for: treatments 2 to 6 all seem to have a negative effect relative to Treatment group 1 (mention the actual estimate of each effect). 1 mark for: the confidence intervals cross 0 for groups 2, 3, 4, and 6, so only group 5 has a distinguishable difference from group 1 (-4.45 to -0.29 grams). To get each mark you should mention both the coefficient estimate and its confidence interval.
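
One way to read estimates and intervals together is to put them side by side in one table. The sketch below does this on simulated data so that it runs on its own; `toy`, `toy_model`, and the effect sizes are hypothetical and are not the cow results. The last few lines should work the same way on model1.

```r
# Simulated stand-in data (hypothetical effects, NOT the real cow data)
set.seed(42)
toy <- data.frame(
  Treatment = factor(rep(1:6, each = 10)),
  Baseline  = rnorm(60, mean = 20, sd = 3)
)
# Assumed truth for the simulation: Baseline slope 0.1, group 5 eats less
toy$DMI <- 10 + 0.1 * toy$Baseline - 3 * (toy$Treatment == "5") + rnorm(60)

toy_model <- lm(DMI ~ Treatment + Baseline, data = toy)

# Estimates next to their 95% confidence intervals
results <- cbind(estimate = coef(toy_model), confint(toy_model))

# TRUE where the whole interval lies on one side of zero
excludes_zero <- results[, "2.5 %"] > 0 | results[, "97.5 %"] < 0

print(round(results, 2))
print(excludes_zero)
```

Rows where `excludes_zero` is TRUE are the effects whose direction the data can distinguish; the Treatment rows are always differences from group 1.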

Below is the code to make some graphs to check the model fit.

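# Graph 1: residuals vs fitted values
# (names chosen so they do not mask the residuals() and fitted() functions)
model1_residuals <- residuals(model1)
model1_fitted <- fitted(model1)
plot(model1_fitted, model1_residuals)

# Graph 2: normal QQ plot of the residuals
qqnorm(model1_residuals)
qqline(model1_residuals)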

4. What are the assumptions of the model? (5 marks)

1 mark each for: linearity, equal variance, independence, no outliers, normality of residuals.

5. Are the assumptions met? Reference which plot you use to decide and why you make the choice. (6 marks)

3 marks for: yes, equal variance holds; checked using the residuals vs fitted plot, which shows no structure. 3 marks for: yes, normality of residuals holds; checked using the normal QQ plot, where the points fall mostly along the line with some deviation at the ends.

6. What other plot might you also want for checking assumptions? (1 mark)

1 mark: Cook's distance.
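
A Cook's distance plot can be made with base R. The sketch below uses simulated data (`toy` and `toy_model` are hypothetical stand-ins, not the cow data) so that it runs on its own; the same calls should apply to model1.

```r
# Simulated stand-in data (NOT the real cow data)
set.seed(7)
toy <- data.frame(
  Treatment = factor(rep(1:6, each = 10)),
  Baseline  = rnorm(60, mean = 20, sd = 3)
)
toy$DMI <- 10 + 0.1 * toy$Baseline + rnorm(60)
toy_model <- lm(DMI ~ Treatment + Baseline, data = toy)

# One Cook's distance per observation
cooks <- cooks.distance(toy_model)

# which = 4 selects the Cook's distance plot from plot.lm()
plot(toy_model, which = 4)

# The three observations with the most influence on the fit
head(sort(cooks, decreasing = TRUE), 3)
```

Observations that stand far above the rest in this plot are the ones to look at as possible outliers.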

Here is code for another model on the same data.

# Model 2
model2 <- lm(DMI ~ Treatment * Baseline, data = cowdata)

coef(model2)

confint(model2)

7. How is this model different to the first one? (1 mark)

1 mark: it now includes an interaction between Treatment and Baseline.

8. Given the new model, does this change your interpretation of the effect of diet on DMI? Why? (4 marks)

1 mark for: yes. 1 mark for: Treatment 5 no longer has a clear effect, as its confidence interval now spans 0. 1 mark for: the estimated effect of Baseline has decreased and its confidence interval also now spans 0. 1 mark for: all other effects have a similar interpretation. It is important to know here that each interaction term tells you the difference in the slope of Baseline on DMI between that treatment group and group 1.
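
To make the meaning of the interaction terms concrete, the sketch below rebuilds the Baseline slope for each group from the coefficients of an interaction model. It uses simulated data; `toy`, `toy_model2`, and the values are hypothetical and not the cow results. It assumes R's default treatment contrasts, with group 1 as the reference level.

```r
# Simulated stand-in data (NOT the real cow data)
set.seed(3)
toy <- data.frame(
  Treatment = factor(rep(1:6, each = 10)),
  Baseline  = rnorm(60, mean = 20, sd = 3)
)
toy$DMI <- 10 + 0.1 * toy$Baseline + rnorm(60)
toy_model2 <- lm(DMI ~ Treatment * Baseline, data = toy)

b <- coef(toy_model2)

# "Baseline" is the slope for the reference group (group 1)
slope_group1 <- b["Baseline"]

# "TreatmentK:Baseline" is the DIFFERENCE in slope between group K and group 1
slope_diffs <- b[grep(":Baseline", names(b), fixed = TRUE)]

# Slope for each group = group 1 slope + that group's difference
slopes <- c(slope_group1, slope_group1 + slope_diffs)
names(slopes) <- paste0("group", 1:6)
slopes
```

If the confidence intervals of all the interaction terms span 0, the data cannot distinguish these slopes from the single shared slope in a model without the interaction.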

9. Which model do you prefer, why? (4 marks)

1 mark for: I prefer Model 1. 1 mark for: because it gave clearer estimates of some effects. 1 mark for: Model 2 could not identify the direction of any effects (the confidence intervals span 0). 1 mark for: as none of the interaction terms had a clear direction, I can conclude that the interaction is not needed.