Cow female black white

image source: Keith Weller/USDA, Public domain, via Wikimedia Commons

These data are from an experiment on cows. Cows were split into 6 groups, each given a different diet treatment. Here you will look at how these different treatments influenced the dry matter intake (DMI) of the cows.

You can find the data here: it is a .csv with a header.

Important! When you import the data it is important to make sure it is in the right format. Here you need Treatment to be a factor and Baseline to be numeric. See code below.

cowdata <- read.csv("https://www.math.ntnu.no/emner/ST2304/2020v/Week09/cowdata.csv", header=T)

# str() checks the data structure
str(cowdata)
## 'data.frame':    57 obs. of  3 variables:
##  $ Treatment: int  4 5 4 3 5 2 1 3 3 1 ...
##  $ Baseline : chr  "22.4757" "24.5157" "22.1100" "18.9529" ...
##  $ DMI      : num  24.5 26.8 25.4 27.7 17.1 ...
# we can see that the variables are not the right format
# so we fix it
cowdata$Treatment <- as.factor(cowdata$Treatment)
cowdata$Baseline <- as.numeric(cowdata$Baseline)
## Warning: NAs introduced by coercion
# now check again
str(cowdata)
## 'data.frame':    57 obs. of  3 variables:
##  $ Treatment: Factor w/ 6 levels "1","2","3","4",..: 4 5 4 3 5 2 1 3 3 1 ...
##  $ Baseline : num  22.5 24.5 22.1 19 20.1 ...
##  $ DMI      : num  24.5 26.8 25.4 27.7 17.1 ...

The columns in the dataframe are:

  • DMI = the dry matter intake during the experiment (grams)
  • Baseline = the baseline dry matter intake before the experiment (grams)
  • Treatment = the diet treatment group (1 to 6)

You want to find out how diet treatment influenced dry matter intake while controlling for the baseline intake of each cow.

1. What model will you use to answer this? (1 mark)

2. What type of variables do you have and which are response or explanatory? (3 marks)

We have given code to run a model for the data below. Think about what type of model is being run? It is good practice to consider if you would have chosen the same one.

# Model 1
model1 <- lm(DMI ~ Treatment+Baseline, data = cowdata)

coef(model1)
## (Intercept)  Treatment2  Treatment3  Treatment4  Treatment5  Treatment6 
##  15.5566663  -0.2420337  -1.0015958  -1.3042739  -2.3390966  -1.6270854 
##    Baseline 
##   0.4998618
confint(model1)
##                  2.5 %     97.5 %
## (Intercept) 11.0734299 20.0399026
## Treatment2  -2.2612061  1.7771386
## Treatment3  -3.1054750  1.1022834
## Treatment4  -3.3179503  0.7094025
## Treatment5  -4.4809290 -0.1972641
## Treatment6  -3.6633444  0.4091737
## Baseline     0.2933972  0.7063264

3. Interpret the output of the model. What does it tell you about the effect of diet on DMI? (5 marks)

Below is the code to make some graphs to check the model fit.

# Graph 1

residuals <- residuals(model1)
fitted <- fitted(model1)
plot(fitted, residuals)

qqnorm(residuals)
qqline(residuals)

4. What are the assumptions of the model? (5 marks)

5. Are the assumptions met? Reference which plot you use to decide and why you make the choice. (6 marks)

6. What other plot might you also want for checking assumptions? (1 mark)

Here is code for another model on the same data.

# Model 2
model2 <- lm(DMI ~ Treatment*Baseline, data = cowdata)

coef(model2)
##         (Intercept)          Treatment2          Treatment3          Treatment4 
##         14.96397551          1.83547186         -4.57792728          4.53811818 
##          Treatment5          Treatment6            Baseline Treatment2:Baseline 
##         -7.35426532         -0.21405539          0.52882628         -0.09883937 
## Treatment3:Baseline Treatment4:Baseline Treatment5:Baseline Treatment6:Baseline 
##          0.15638806         -0.28162245          0.22723154         -0.06631897
confint(model2)
##                           2.5 %     97.5 %
## (Intercept)          -0.4488745 30.3768256
## Treatment2          -16.7418094 20.4127532
## Treatment3          -22.9288747 13.7730201
## Treatment4          -13.9556395 23.0318758
## Treatment5          -29.2072749 14.4987443
## Treatment6          -18.7696470 18.3415362
## Baseline             -0.2205945  1.2782471
## Treatment2:Baseline  -0.9908036  0.7931248
## Treatment3:Baseline  -0.7116666  1.0244427
## Treatment4:Baseline  -1.1755869  0.6123420
## Treatment5:Baseline  -0.8011423  1.2556053
## Treatment6:Baseline  -0.9489395  0.8163016
# knitr::kable is used here to remove some decimals.
knitr::kable(anova(model2), digits = 2)
Df Sum Sq Mean Sq F value Pr(>F)
Treatment 5 24.06 4.81 1.02 0.42
Baseline 1 105.52 105.52 22.45 0.00
Treatment:Baseline 5 11.64 2.33 0.50 0.78
Residuals 43 202.12 4.70 NA NA

7. How is this model different to the first one? (1 mark)

8. Given the new model, does this change your interpretation of the effect of diet on DMI? Why? (4 marks)

9. Which model do you prefer, why? (4 marks)