Iris virginica

These data come from three species of iris plant. They include measurements of petal width and petal length. Here you will look at how petal length influences petal width.

The data are in a .csv file with a header row, at the URL used in the code below.

Important! When you import the data, make sure each column is in the right format. Here, Species needs to be a factor and PetalLength needs to be numeric. See the code below.

# read in the data (the file has a header row)
irisdata <- read.csv("https://www.math.ntnu.no/emner/ST2304/2020v/Week09/irisdata.csv", header = TRUE)

# str() checks the data structure
str(irisdata)
## 'data.frame':    150 obs. of  3 variables:
##  $ PetalWidth : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
##  $ PetalLength: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
##  $ Species    : chr  "Setosa" "Setosa" "Setosa" "Setosa" ...
# PetalLength is already numeric, but Species is character (chr),
# so convert it to a factor (the numeric conversion is just a safeguard)
irisdata$Species <- as.factor(irisdata$Species)
irisdata$PetalLength <- as.numeric(irisdata$PetalLength)

# now check again
str(irisdata)
## 'data.frame':    150 obs. of  3 variables:
##  $ PetalWidth : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
##  $ PetalLength: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
##  $ Species    : Factor w/ 3 levels "Color","Setosa",..: 2 2 2 2 2 2 2 2 2 2 ...
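In the output above, as.factor() has sorted the levels alphabetically, so "Color" comes first. By default lm() uses the first level as the baseline (reference) that the other Species coefficients are compared with. A minimal sketch for checking the levels, changing the baseline with relevel(), and safely converting a number column that was read in as a factor is below. It assumes R's built-in `iris` data frame (datasets package), with columns renamed to match, is a fair stand-in for the course file so that it runs offline; its species are labelled setosa, versicolor and virginica, so the level names differ from the course file.

```r
# Stand-in data: R's built-in iris, renamed to match the course file.
# Species is stored as text first, as read.csv() would give it.
irisdata <- data.frame(PetalWidth  = iris$Petal.Width,
                       PetalLength = iris$Petal.Length,
                       Species     = as.character(iris$Species))

# as.factor() sorts the levels alphabetically; the first one is the lm() baseline
irisdata$Species <- as.factor(irisdata$Species)
levels(irisdata$Species)
table(irisdata$Species)   # number of plants per species

# relevel() moves the chosen level to the front, making it the new baseline
irisdata$Species <- relevel(irisdata$Species, ref = "virginica")
levels(irisdata$Species)

# Pitfall: as.numeric() on a factor returns the level codes, not the values,
# so convert a factor of numbers via character first
x <- factor(c("1.4", "4.7", "6.0"))
as.numeric(x)                 # level codes: 1 2 3
as.numeric(as.character(x))   # actual values: 1.4 4.7 6.0
```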

The columns in the dataframe are:

  • PetalWidth = width of the petal in cm.
  • PetalLength = length of the petal in cm.
  • Species = which species it is.

You want to find out how petal length and species affect petal width.

1. What model will you use to answer this? (1 mark)

2. What types of variables do you have, and which are response and which are explanatory variables? (3 marks)

The code below fits a model to the data. Think about what type of model is being fitted: it is good practice to consider whether you would have chosen the same one.

# Model 1
model1 <- lm(PetalWidth ~ Species+PetalLength, data = irisdata)

# estimated coefficients
coef(model1)
##      (Intercept)    SpeciesSetosa SpeciesVirginica      PetalLength 
##        0.3445409       -0.4353703        0.4023368        0.2303895
# 95% confidence intervals for the coefficients
confint(model1)
##                        2.5 %     97.5 %
## (Intercept)       0.05034518  0.6387367
## SpeciesSetosa    -0.63857536 -0.2321652
## SpeciesVirginica  0.28932647  0.5153472
## PetalLength       0.16234263  0.2984363
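One way to see what a model with Species + PetalLength implies is to predict petal width for each species at two petal lengths and look at the difference. A minimal sketch using predict() is below; it again assumes R's built-in `iris` data, renamed to match, as a stand-in, so the baseline species is setosa and the coefficient names differ from the output above.

```r
# Stand-in data: R's built-in iris, renamed to match the course file
irisdata <- data.frame(PetalWidth  = iris$Petal.Width,
                       PetalLength = iris$Petal.Length,
                       Species     = iris$Species)
model1 <- lm(PetalWidth ~ Species + PetalLength, data = irisdata)

# Every species at petal lengths of 4 cm and 5 cm
newdata <- expand.grid(Species     = levels(irisdata$Species),
                       PetalLength = c(4, 5))
newdata$PredWidth <- predict(model1, newdata = newdata)
newdata

# Change in predicted width for one extra cm of petal length, per species
change <- with(newdata,
               PredWidth[PetalLength == 5] - PredWidth[PetalLength == 4])
names(change) <- levels(irisdata$Species)
change
coef(model1)["PetalLength"]   # compare with the PetalLength coefficient
```

Comparing `change` with the PetalLength coefficient shows how that coefficient enters the prediction for every species in this model.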

3. Interpret the output of the model. What does it tell you about the effect of petal length on petal width? (5 marks)

The code below makes two graphs to check the model fit.

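# Graph 1: residuals against fitted values
residuals <- residuals(model1)
fitted <- fitted(model1)
plot(fitted, residuals)
abline(h = 0, lty = 2)  # dashed reference line at zero

# Graph 2: normal Q-Q plot of the residuals
qqnorm(residuals)
qqline(residuals)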
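The residuals in these plots are the observed petal widths minus the fitted values. A minimal sketch that checks this by hand and then draws the same two diagnostic plots with the plot() method for lm objects (which = 1 gives residuals against fitted values, which = 2 the normal Q-Q plot) is below. It assumes R's built-in `iris` data, renamed to match, as a stand-in for the course file.

```r
# Stand-in data: R's built-in iris, renamed to match the course file
irisdata <- data.frame(PetalWidth  = iris$Petal.Width,
                       PetalLength = iris$Petal.Length,
                       Species     = iris$Species)
model1 <- lm(PetalWidth ~ Species + PetalLength, data = irisdata)

# Residuals are observed values minus fitted values
res_by_hand <- irisdata$PetalWidth - fitted(model1)
all.equal(res_by_hand, residuals(model1))

# The same two diagnostic plots via plot.lm, side by side
op <- par(mfrow = c(1, 2))
plot(model1, which = 1:2)
par(op)   # restore the previous plotting layout
```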

4. What are the assumptions of the model? (5 marks)

5. Are the assumptions met? Say which plot you used to decide each one and why you chose it. (6 marks)

6. What other plot might you also want for checking assumptions? (1 mark)

Here is code for another model on the same data.

# Model 2
model2 <- lm(PetalWidth ~ Species*PetalLength, data = irisdata)

# estimated coefficients
coef(model2)
##                  (Intercept)                SpeciesSetosa 
##                  -0.08428835                   0.03606803 
##             SpeciesVirginica                  PetalLength 
##                   1.22031966                   0.33105360 
##    SpeciesSetosa:PetalLength SpeciesVirginica:PetalLength 
##                  -0.12980851                  -0.17075665
# 95% confidence intervals for the coefficients
confint(model2)
##                                   2.5 %      97.5 %
## (Intercept)                  -0.5408795  0.37230284
## SpeciesSetosa                -0.5873051  0.65944111
## SpeciesVirginica              0.5386868  1.90195255
## PetalLength                   0.2245060  0.43760126
## SpeciesSetosa:PetalLength    -0.4371702  0.17755320
## SpeciesVirginica:PetalLength -0.3106942 -0.03081906
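In a model with Species * PetalLength, the PetalLength coefficient is the slope for the baseline species, and each Species...:PetalLength coefficient is how much another species' slope differs from that baseline slope. A minimal sketch that adds these up and checks them against a separate regression for each species is below; it assumes R's built-in `iris` data, renamed to match, as a stand-in (baseline setosa, so the names differ from the output above).

```r
# Stand-in data: R's built-in iris, renamed to match the course file
irisdata <- data.frame(PetalWidth  = iris$Petal.Width,
                       PetalLength = iris$Petal.Length,
                       Species     = iris$Species)
model2 <- lm(PetalWidth ~ Species * PetalLength, data = irisdata)
b <- coef(model2)

# Slope for each species: baseline slope plus that species' interaction term
slopes <- c(
  setosa     = unname(b["PetalLength"]),
  versicolor = unname(b["PetalLength"] + b["Speciesversicolor:PetalLength"]),
  virginica  = unname(b["PetalLength"] + b["Speciesvirginica:PetalLength"])
)
slopes

# The same slopes from a separate regression fitted to each species
separate <- sapply(split(irisdata, irisdata$Species), function(d)
  unname(coef(lm(PetalWidth ~ PetalLength, data = d))["PetalLength"]))
separate
all.equal(slopes, separate)
```

The point estimates match, but the confidence intervals from the combined model use one residual variance pooled over all three species, so they differ from those of the separate fits.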

7. How is this model different to the first one? (1 mark)

8. Given the new model, does this change your interpretation of the effect of petal length on petal width? Why? (4 marks)

9. Which model do you prefer, and why? (3 marks)
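If a formal comparison is wanted, one option for two linear models where one contains all the terms of the other is an F-test with anova(); AIC is another common summary. A minimal sketch is below, assuming R's built-in `iris` data, renamed to match, as a stand-in for the course file; it shows the mechanics only, and the choice of model should still draw on the interpretation and the diagnostic plots.

```r
# Stand-in data: R's built-in iris, renamed to match the course file
irisdata <- data.frame(PetalWidth  = iris$Petal.Width,
                       PetalLength = iris$Petal.Length,
                       Species     = iris$Species)
model1 <- lm(PetalWidth ~ Species + PetalLength, data = irisdata)
model2 <- lm(PetalWidth ~ Species * PetalLength, data = irisdata)

# F-test: compares the residual sums of squares of the two models,
# taking the extra parameters in model2 into account
comparison <- anova(model1, model2)
comparison

# AIC: lower values indicate a better balance of fit and number of parameters
aic <- AIC(model1, model2)
aic
```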