Confirmatory model selection

Instructions:

The variable names here differ from yours because the examples use a different dataset. The results will be different too.


Help and hints (and pictures because it was a bit boring)

Below is a list of click-down sections, each covering a different part of the analysis. Use whichever you need. Try to have a go on your own first when you can; you can always check afterwards whether you got it right.

Useful R code

anova(YourModel1, YourModel2) 
## Analysis of Variance Table
## 
## Model 1: height ~ 1
## Model 2: height ~ type
##   Res.Df    RSS Df Sum of Sq      F  Pr(>F)  
## 1     29 293.44                              
## 2     28 242.08  1    51.352 5.9395 0.02141 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Hint: check the Df column to see whether you put the models in the right order (simpler model first). The degrees of freedom should make sense, i.e. be positive.
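A minimal sketch of what this hint means, using R's built-in PlantGrowth data as a stand-in. The dataset and the object names ModelH0 and ModelH1 are assumptions for illustration only, not your data. With the simpler model first the Df is positive; swapping the models flips its sign.

```r
# PlantGrowth ships with R; it stands in for your own data here
ModelH0 <- lm(weight ~ 1, data = PlantGrowth)      # null: no effect of group
ModelH1 <- lm(weight ~ group, data = PlantGrowth)  # alternative: group matters

right_order <- anova(ModelH0, ModelH1)  # simpler model first
wrong_order <- anova(ModelH1, ModelH0)  # models swapped

right_order$Df[2]  # positive: the extra parameters in ModelH1
wrong_order$Df[2]  # negative: a sign that the order is wrong
```

If you see a negative Df in your own output, swap the two arguments to anova() and run it again.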

Hints on which steps to include

In confirmatory model selection you want to test a specific hypothesis.

Therefore, you need to think about a null and alternative hypothesis and a way to compare them statistically. (This part involves a few steps).

Then, as with every analysis, you will want to look at the results and draw a conclusion. Hint: this includes deciding whether or not to reject the null hypothesis.

I really don't know which steps!! (or I just want to be sure)

The first step in confirmatory model selection is to work out what your H0 (null hypothesis) and H1 (alternative hypothesis) are.

Once you have worked this out you should run a linear model for each of these. You should know the code to do this by now. Make sure to save them as objects e.g. YourModelH0 <- lm(), because you will use them in other functions later.

Then compare the two models statistically. You can do this with the anova() function.

Now you need to interpret the output of anova(). The next section gives an example from a different dataset.

Finally, you need to interpret the output of the 'chosen' model.
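The steps above can be sketched as follows. This is only an illustration under stated assumptions: it uses R's built-in PlantGrowth data (weight and group) in place of your data, and the hypotheses written in the comments are made up for that dataset.

```r
# Step 1: state the hypotheses (PlantGrowth stands in for your data)
#   H0: weight does not depend on group
#   H1: weight depends on group

# Step 2: fit one linear model per hypothesis and save each as an object
YourModelH0 <- lm(weight ~ 1, data = PlantGrowth)
YourModelH1 <- lm(weight ~ group, data = PlantGrowth)

# Step 3: compare the two models statistically (simpler model first)
comparison <- anova(YourModelH0, YourModelH1)
print(comparison)

# Step 4: read the p-value in the table, then look at the model you end
# up with (H1 is shown here purely as an example of the call)
summary(YourModelH1)
```

Replace weight, group and PlantGrowth with your own response, explanatory variable and data frame.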

How to read outputs of confirmatory model selection

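The residual degrees of freedom (Res.Df) here start from n (the number of data points) and lose one for every parameter the model estimates. The intercept-only model estimates one parameter, so its Res.Df is n-1 (29 below); the model with type estimates two, so its Res.Df is n-2 (28 below).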

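The columns are: Res.Df (residual degrees of freedom of each model), RSS (residual sum of squares of each model), Df (difference in degrees of freedom between the two models), Sum of Sq (how much the RSS drops from Model 1 to Model 2), F (the F statistic) and Pr(>F) (the p-value).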

anova(YourModel1, YourModel2) 
## Analysis of Variance Table
## 
## Model 1: height ~ 1
## Model 2: height ~ type
##   Res.Df    RSS Df Sum of Sq      F  Pr(>F)  
## 1     29 293.44                              
## 2     28 242.08  1    51.352 5.9395 0.02141 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
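To see how the columns fit together, the sketch below rebuilds Df, Sum of Sq, F and Pr(>F) by hand from the Res.Df and RSS values printed in the table above. The variable names are made up for illustration, and the results will only match the table up to rounding because the inputs are the rounded printed values.

```r
# Values copied from the anova() table above
rss_h0 <- 293.44; res_df_h0 <- 29   # Model 1: height ~ 1
rss_h1 <- 242.08; res_df_h1 <- 28   # Model 2: height ~ type

df_diff   <- res_df_h0 - res_df_h1   # Df column
sum_of_sq <- rss_h0 - rss_h1         # Sum of Sq column (up to rounding)

# F: variation explained per extra parameter, relative to the
# residual variation per degree of freedom in the larger model
f_value <- (sum_of_sq / df_diff) / (rss_h1 / res_df_h1)

# Pr(>F): upper tail of the F distribution with (df_diff, res_df_h1) df
p_value <- pf(f_value, df_diff, res_df_h1, lower.tail = FALSE)

c(Df = df_diff, SumOfSq = sum_of_sq, F = f_value, p = p_value)
```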

To interpret this output, the main thing we need is the Pr(>F) value, called the p-value. You might have come across this in another course. It is the probability of seeing an F value at least as large as the one observed if H0 were true. The typical threshold is 0.05: a p-value below this means a result at least this extreme would turn up less than 5% of the time under H0, so the data are considered unlikely if H0 were true. This threshold is a bit arbitrary, though. You could choose 0.01 instead if you want to be really sure, or 0.1 if you don't need much confidence. The p-value does not tell you anything about the strength of any relationship; for that you need the coefficients of the chosen lm() and their confidence intervals.
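A short sketch of pulling the p-value out of the table and then looking at effect sizes. It assumes R's built-in PlantGrowth data as a stand-in for yours; the names ModelH0, ModelH1, alpha and reject_H0 are made up for illustration.

```r
ModelH0 <- lm(weight ~ 1, data = PlantGrowth)
ModelH1 <- lm(weight ~ group, data = PlantGrowth)

alpha <- 0.05  # threshold, chosen before looking at the result

# Pr(>F) sits in the second row of the anova table
comparison <- anova(ModelH0, ModelH1)
p_value <- comparison[["Pr(>F)"]][2]

reject_H0 <- p_value < alpha
reject_H0

# The p-value says nothing about the size of the effect; for that use
# the coefficients of the fitted model and their confidence intervals
coef(ModelH1)
confint(ModelH1, level = 0.95)
```

Changing alpha to 0.01 or 0.1 changes only the decision rule, not the p-value or the coefficients.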

To draw a conclusion, focus on whether or not you reject H0.

Never accept H0 or H1; just reject H0 or don't. The p-value does not tell you how likely either H0 or H1 is to be true.