Learning Outcomes

This is a question that you should try to answer.

Answer This is where the answer will be, unless it says "Hint", in which case it is a hint to help you.

Categorical Variables and ANOVA

In the last module we looked at ANOVA, where we have a categorical variable with \(k\) levels. But we often have more than one variable: in our example from last week we had different fertiliser treatments, but the experiment was also done with different potato varieties, and this second variable might also be important. The book describes this as a Randomized Block Design. That is indeed one example of a randomised block design, but only one: the same model can be used in other situations, where we don't have blocks.

Every observation must have a value for every factor: there has to be a fertiliser treatment and a variety of potato. So the data might look like this:

Variety          B     Cl    S
Ajax             3.09  3.24  3.69
Arran comrade    2.25  2.07  2.46
British queen    2.94  3.19  3.28
Duke of York     2.02  1.77  1.54
Epicure          2.16  2.23  2.10

We can assume that there are row and column effects. Following the book, we will describe them as Block and Treatment effects. Our model is

\[ Y_{ij} = \mu_j + \beta_i + \varepsilon_{ij} \]

where \(i=1,\dots,b\), \(j=1,\dots,k\). We are looking at a problem where there is only one observation for each combination of \(i\) and \(j\), but this assumption can be relaxed.

Note that the model assumes that the effect of a row is the same for every column, e.g. a fertiliser has the same effect on every potato variety.

We hit a slight problem here: if \(Y_{ij} = \mu_j + \beta_i + \varepsilon_{ij}\), we can substitute \(\mu_j+c\) and \(\beta_i-c\) and get the same value of \(Y_{ij}\). So here we will set \(\sum_{i=1}^b\beta_i=0\), and \(\frac{1}{k}\sum_{j=1}^k\mu_j = \mu\). Then estimating the effect of a block is not difficult.

What is \(E(\bar{Y}_{i.})\)? And thus show that \(\hat{\beta}_i=\bar{Y}_{i.} - \bar{Y}_{..}\) is unbiased.

Answer

\(\bar{Y}_{i.} = \frac{1}{k}\sum_{j=1}^{k}Y_{ij}\) so

\[ \begin{align} E(\bar{Y}_{i.}) &= \frac{1}{k} E\left(\sum_{j=1}^{k}Y_{ij}\right) \\ &= \frac{1}{k}\sum_{j=1}^{k}E(Y_{ij}) \\ &= \frac{1}{k}\sum_{j=1}^{k}E(\mu_j + \beta_i + \varepsilon_{ij}) \\ &= \frac{1}{k}\left(\sum_{j=1}^{k} E(\mu_j) + \sum_{j=1}^{k}E(\beta_i) + \sum_{j=1}^{k} E(\varepsilon_{ij}) \right)\\ &= \mu + \beta_i \\ \end{align} \]

because \(E(\varepsilon_{ij})=0\), and \(\frac{1}{k}\sum_{j=1}^{k} \mu_j=\mu\).

Then, because \(E(\bar{Y}_{..})=\mu\), we have \(E(\bar{Y}_{i.}) = \mu + \beta_i = E(\bar{Y}_{..}) + \beta_i\), so \(\beta_i = E(\bar{Y}_{i.}) - E(\bar{Y}_{..})\), and thus \(\hat{\beta}_i = \bar{Y}_{i.} - \bar{Y}_{..}\) is unbiased.
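To make this concrete, here is a small simulation sketch in R. The values of \(b\), \(k\), the \(\mu_j\), the \(\beta_i\) and \(\sigma\) are made up purely for illustration; the point is that the averages of the \(\hat{\beta}_i\) come out close to the true \(\beta_i\).

```r
# A small simulation to check that beta-hat_i = Ybar_i. - Ybar_.. is unbiased.
# All parameter values here are hypothetical, chosen for illustration only.
set.seed(1)
b <- 4; k <- 3
mu    <- c(2.0, 2.5, 3.0)        # treatment means, mu_j
beta  <- c(-0.3, 0.1, 0, 0.2)    # block effects, constrained to sum to 0
sigma <- 0.5

beta.hat <- replicate(10000, {
  Y <- outer(beta, mu, "+") + matrix(rnorm(b * k, sd = sigma), b, k)
  rowMeans(Y) - mean(Y)          # beta-hat_i for each block
})
rowMeans(beta.hat)               # should be close to beta
```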

Estimating Main Effects

When we looked at the one-way ANOVA, we could calculate the means for each level of the factor. Here we can do almost the same thing, calculating the mean for each level of one factor by averaging over the levels of the second factor:

\[ \begin{align} \bar{Y}_{i.} &= \frac{1}{k} \sum_{j=1}^{k} Y_{ij} \\ \bar{Y}_{.j} &= \frac{1}{b} \sum_{i=1}^{b} Y_{ij} \\ \end{align} \]

These are, in essence, row means and column means.
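In R these are just row and column means. A minimal sketch, assuming the data are held in a hypothetical \(b \times k\) matrix called yields, with blocks as rows and treatments as columns:

```r
# Row (block) and column (treatment) means, assuming a hypothetical b x k
# matrix 'yields' with one row per block and one column per treatment.
row.means  <- rowMeans(yields)   # Ybar_i.
col.means  <- colMeans(yields)   # Ybar_.j
grand.mean <- mean(yields)       # Ybar_..
```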

F-test for the two-way design

Before we look at estimating any contrasts, we will look at the tests, to see if there are any effects at all. The approach is similar to a one-way ANOVA: we work with the variance, and decompose it into terms.

From the one-way ANOVA we had the following sums of squares:

\[ \begin{align} SS_{Tot} &= \sum_{j=1}^k\sum_{i=1}^{b}(Y_{ij} - \bar{Y}_{..})^2 \\ SS_{TR} &= \sum_{j=1}^k\sum_{i=1}^{b}(\bar{Y}_{.j} - \bar{Y}_{..})^2 \\ SS_{E} &= \sum_{j=1}^k\sum_{i=1}^{b}(Y_{ij} - \bar{Y}_{.j})^2 \end{align} \]

And \(SS_{Tot} = SS_{TR} + SS_{E}\). But now we also have a second dimension, our blocks. We can deal with their variance by noting that their variation must be in what was \(SS_E\), so we can decompose \(\sum_{j=1}^k\sum_{i=1}^{b}(Y_{ij} - \bar{Y}_{.j})^2\) into terms involving the block and the (new) error. We are aiming for a block sum of squares

\[ SS_B =\sum_{i=1}^b\sum_{j=1}^{k}(\bar{Y}_{i.} - \bar{Y}_{..})^2 \] and the error sum of squares

\[ SS_E =\sum_{i=1}^b\sum_{j=1}^{k}(Y_{ij} - \bar{Y}_{.j} - \bar{Y}_{i.} + \bar{Y}_{..})^2 \] Note that \(\bar{Y}_{.j} + \bar{Y}_{i.} - \bar{Y}_{..}\) is the natural estimate of \(E(Y_{ij}) = \mu_j + \beta_i\), so \(SS_E\) is a sum of squared residuals.

We are going to get there from the "old" \(SS_E\), \(\sum_{j=1}^k\sum_{i=1}^{b}(Y_{ij} - \bar{Y}_{.j})^2\). We can do this by adding and subtracting \(\bar{Y}_{i.} - \bar{Y}_{..}\) inside the square, and re-arranging:

\[ \sum_{j=1}^k\sum_{i=1}^{b}(Y_{ij} - \bar{Y}_{.j})^2 = \sum_{j=1}^k\sum_{i=1}^{b}\left[(\bar{Y}_{i.} - \bar{Y}_{..}) + (Y_{ij} - \bar{Y}_{.j} - \bar{Y}_{i.} + \bar{Y}_{..})\right]^2 \]

Show that \(\sum_{j=1}^k\sum_{i=1}^{b}(Y_{ij} - \bar{Y}_{.j})^2 = \sum_{j=1}^k\sum_{i=1}^{b}(\bar{Y}_{i.} - \bar{Y}_{..})^2 + \sum_{j=1}^k\sum_{i=1}^{b}(Y_{ij} - \bar{Y}_{.j} - \bar{Y}_{i.} + \bar{Y}_{..})^2\)

Hint

Expand the quadratic, and show that the cross-product term is 0.

Also, there is a trick: you can write the cross-product as \(\sum_i A_i \sum_j B_{ij}\), and the inner sum \(\sum_j B_{ij}\) is 0.
Answer This is equation 13.2.3: the proof is on pp. 615-616.

So we now have \(SS_{Tot} = SS_{B} + SS_{TR} + SS_{E}\), or Row + Column + Error.

What is \(E(SS_{B})\)?

Hint This is similar to Theorem 12.2.1.
Answer

\[ \begin{align} SS_B &=\sum_{i=1}^b\sum_{j=1}^{k}(\bar{Y}_{i.} - \bar{Y}_{..})^2 \\ &=\sum_{i=1}^b\sum_{j=1}^{k}[(\bar{Y}_{i.} - \mu) - (\bar{Y}_{..}-\mu)]^2 \\ &=\sum_{i=1}^b\sum_{j=1}^{k} (\bar{Y}_{i.} - \mu)^2 + \sum_{i=1}^b\sum_{j=1}^{k}(\bar{Y}_{..}-\mu)^2 - \sum_{i=1}^b\sum_{j=1}^{k}2(\bar{Y}_{i.} - \mu)(\bar{Y}_{..}-\mu) \\ &= k\sum_{i=1}^b (\bar{Y}_{i.} - \mu)^2 + bk (\bar{Y}_{..}-\mu)^2 - 2(\bar{Y}_{..}-\mu)\sum_{i=1}^b\sum_{j=1}^{k}(\bar{Y}_{i.} - \mu) \\ &= k\sum_{i=1}^b (\bar{Y}_{i.} - \mu)^2 + bk (\bar{Y}_{..}-\mu)^2 - 2(\bar{Y}_{..}-\mu)(bk\bar{Y}_{..} - bk\mu) \\ &= k\sum_{i=1}^b (\bar{Y}_{i.} - \mu)^2 - bk (\bar{Y}_{..}-\mu)^2\\ \end{align} \] (because \(\sum_{i=1}^b \bar{Y}_{i.} = b\bar{Y}_{..}\))

So, \(E(SS_B) = E\left(k\sum_{i=1}^b (\bar{Y}_{i.} - \mu)^2 - bk (\bar{Y}_{..}-\mu)^2\right) = k \sum_{i=1}^b E(\bar{Y}_{i.} - \mu)^2 - bk E(\bar{Y}_{..}-\mu)^2\).

For the second term, \(E(\bar{Y}_{..}-\mu)^2 = \sigma^2/bk\).

For the first term, \(E(\bar{Y}_{i.} - \mu)^2 = Var(\bar{Y}_{i.} - \mu) + \left[E(\bar{Y}_{i.} - \mu) \right]^2\). But \(Var(\bar{Y}_{i.} - \mu)=Var(\bar{Y}_{i.}) = \sigma^2/k\), and \(E(\bar{Y}_{i.} - \mu) = \mu + \beta_i-\mu = \beta_i\).

So \(E(\bar{Y}_{i.} - \mu)^2 = \sigma^2/k + \beta_i^2\).

\[ \begin{align} E(SS_B) &= k \sum_{i=1}^b E(\bar{Y}_{i.} - \mu)^2 - bk E(\bar{Y}_{..}-\mu)^2 \\ &= k \sum_{i=1}^b \left(\sigma^2/k + \beta_i^2\right) - bk\,\sigma^2/bk \\ &= b\sigma^2 + k\sum_{i=1}^b\beta_i^2 - \sigma^2 \\ &= (b-1)\sigma^2 + k\sum_{i=1}^b\beta_i^2 \\ \end{align} \]

Under \(H_0\), what are the distributions of \(SS_{B}/\sigma^2\) and \(SS_{E}/\sigma^2\)?

Hint You should be able to take a guess, or look at previous weeks.
Answer Both follow \(\chi^2\) distributions, with \(b-1\) and \((b-1)(k-1)\) degrees of freedom. See Theorem 13.2.2.

So, under \(H_0\), we can test whether there is a block effect with

\[ F = \frac{SS_{B}/(b-1)}{SS_{E}/[(b-1)(k-1)]} \] which will follow an F distribution with \(b-1\) and \((b-1)(k-1)\) degrees of freedom. Note that this test is independent of the test of the treatment effect.
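As a sketch in R, assuming the sums of squares and the dimensions are already available as objects SS.B, SS.E, b and k (computed, for example, as in the by-hand sketch below):

```r
# Block F-test, assuming SS.B, SS.E, b and k have already been computed.
F.block <- (SS.B / (b - 1)) / (SS.E / ((b - 1) * (k - 1)))
p.block <- pf(F.block, df1 = b - 1, df2 = (b - 1) * (k - 1), lower.tail = FALSE)
```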

We now have a couple of tests, and with more complex experiments we will have even more. It is often convenient to summarise these in tables. These are called ANOVA tables:

Source df SS MS F p
Treatments \(k-1\) \(SS_{TR}\) \(SS_{TR}/(k-1)\) \(\frac{SS_{TR}/(k-1)}{SS_E/(b-1)(k-1)}\) \(Pr(F_{k-1,(b-1)(k-1)} \ge obs. f)\)
Blocks \(b-1\) \(SS_B\) \(SS_B/(b-1)\) \(\frac{SS_{B}/(b-1)}{SS_E/(b-1)(k-1)}\) \(Pr(F_{b-1,(b-1)(k-1)} \ge obs. f)\)
Error \((b-1)(k-1)\) \(SS_E\) \(SS_E/(b-1)(k-1)\)
Total \(n-1\) \(SS_{Tot}\)

where df is the degrees of freedom, SS the sum of squares, MS = SS/df the mean square, and \(n = bk\) is the total number of observations.

And, to remind you,

\[ \begin{align} SS_{TR} &=\sum_{j=1}^k\sum_{i=1}^{b}(\bar{Y}_{.j} - \bar{Y}_{..})^2= b\sum_{j=1}^k(\bar{Y}_{.j} - \bar{Y}_{..})^2 \\ SS_B &=\sum_{i=1}^b\sum_{j=1}^{k}(\bar{Y}_{i.} - \bar{Y}_{..})^2 \\ SS_E &=\sum_{i=1}^b\sum_{j=1}^{k}(Y_{ij} - \bar{Y}_{.j} - \bar{Y}_{i.} + \bar{Y}_{..})^2 \\ SS_{Tot} &= \sum_{j=1}^k\sum_{i=1}^{b}(Y_{ij} - \bar{Y}_{..})^2 \end{align} \]

Calculating these is nowadays best done by computer. But there is some value in doing it by hand a few times, so you know what is actually done.
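Here is a sketch of the "by hand" calculation in R, using the same hypothetical \(b \times k\) matrix yields as above:

```r
# Sums of squares 'by hand', assuming the hypothetical b x k matrix 'yields'
# (blocks as rows, treatments as columns) from earlier.
b <- nrow(yields); k <- ncol(yields)
grand  <- mean(yields)
SS.B   <- k * sum((rowMeans(yields) - grand)^2)  # blocks
SS.TR  <- b * sum((colMeans(yields) - grand)^2)  # treatments
SS.Tot <- sum((yields - grand)^2)
SS.E   <- SS.Tot - SS.B - SS.TR                  # error, by the decomposition
```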

Potato Yields

We can do these calculations for the potatoes. Here we are using the varieties as blocks, and the mean in each block/treatment combination as the single observation. This is the table:

Df Sum Sq Mean Sq F value Pr(>F)
Fert 2 0.117 0.058 1.755 0.196
Variety 11 14.546 1.322 39.833 0.000
Residuals 22 0.730 0.033
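A table like this is what R's aov() produces. A sketch, assuming a long-format data frame called potato with columns yield, Fert and Variety (these names are made up for illustration; the real data may be organised differently):

```r
# Two-way ANOVA, assuming a hypothetical long-format data frame 'potato'
# with a numeric 'yield' and factors 'Fert' and 'Variety'.
fit2 <- aov(yield ~ Fert + Variety, data = potato)
summary(fit2)   # gives a table like the one above
```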

Is there evidence against \(H_0: \mu_1 = \mu_2 = \dots = \mu_k\), i.e. is there evidence for a treatment (fertiliser) effect?

Answer No, the p-value is \(>0.05\), so there is no evidence that the fertiliser means differ.

Is there evidence against \(H_0: \beta_i=0, \forall i\), i.e. is there evidence that the varieties differ?

Answer Yes, the p-value is \(\ll 0.05\), which suggests some differences between varieties.

We can compare this with the ANOVA table where we only have Treatment:

Df Sum Sq Mean Sq F value Pr(>F)
Fert 2 0.117 0.058 0.126 0.882
Residuals 33 15.276 0.463
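This one-way table corresponds to dropping Variety from the model (a sketch, using the same hypothetical data frame as above):

```r
# One-way ANOVA, ignoring Variety (hypothetical 'potato' data frame as before).
fit1 <- aov(yield ~ Fert, data = potato)
summary(fit1)
```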

The conclusion is still that there is no effect of treatment, but the F ratio is lower and the p-value is higher.

What causes this difference in F- and p- values?

Answer There is no difference in the Treatment SS and MS, but in the two-way model the \(SS_E\) is smaller. This is because some of the variation has been moved into the Variety effect. So the denominator is smaller. In other words, we are more certain about any differences there are (even if they are still not statistically significant).

On Error

On pp. 619-621 the book discusses what is meant by "error". I think this is an important point about how statistical modelling is done. So go away and read it. I'll wait. Then I have a few questions…

We have already seen examples of analyses where this approach to error is used (i.e. where there is no replication). What were they?

Answer

Regression is the main example where we did not have any replication: each observation could have its own covariate value.

A second, subtler, example is the Goodness of Fit tests. Each predicted count was only used once. One might argue that each observation is a replicate. This can be correct, but there can also be cases where there is more variation, e.g. within a class, that does not follow a Poisson/multinomial error. A typical example would be estimating the abundance of a species: if we look on one day we will count them, but come back the next day and our counts could be quite different, even if the abundance is (presumably) the same on each day.

For the randomised block design analysis here, when could it go wrong and give the wrong results? And what could be done about it?

Answer

The randomised block design model assumes that the block and treatment effects are independent, i.e. the treatment effect is the same in every block. But the treatment effect may depend on the block, e.g. some varieties of potato may respond well to adding sulphate, but others may not. This is called an interaction between variables.

The solution to this is to replicate, i.e. to have more than one observation of a treatment in each block. This means that the replications can be used to estimate \(SS_E\), and the means of each Block/Treatment combination can be used for the interaction.

We will not look at interactions more in this course. The maths is “more of the same”.
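(For completeness: with replication, the interaction model is easy to ask for in R. A sketch, using the same hypothetical potato data frame as above, now assuming more than one observation per Fert/Variety combination:)

```r
# Main effects plus interaction, assuming replication within each
# Fert/Variety combination of the hypothetical 'potato' data frame.
fit.int <- aov(yield ~ Fert * Variety, data = potato)
summary(fit.int)
```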

Contrasts

This section is easy: see last week. The only difference is that the degrees of freedom associated with \(SS_E\) are now \((b-1)(k-1)\).
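As a reminder, here is a sketch of the contrast test in this setting, assuming a balanced design so that each treatment mean is based on \(b\) observations. For coefficients \(c_1,\dots,c_k\) with \(\sum_j c_j=0\),

\[ t = \frac{\sum_{j=1}^k c_j \bar{Y}_{.j}}{\sqrt{\dfrac{SS_E}{(b-1)(k-1)} \sum_{j=1}^k \dfrac{c_j^2}{b}}} \sim t_{(b-1)(k-1)} \quad \text{under } H_0: \sum_{j=1}^k c_j\mu_j = 0. \]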

Paired t-tests

Paired t-tests are a bit of an odd beast, because they can be viewed in different ways.

They are based on a particular experimental design, where observations are paired. For example, we might have a before/after study: we measure how fast people run before and after they become zombies, and this tells us about whether becoming a zombie slows people down. This is the same as a block design: each individual is a block, and they get every treatment:

Person       Not Zombie   Zombie
Alexander    14.4         15.9
Boris        16.5         13.0
C            15.1         15.3
de Pfeffel   13.6         16.9

(names have been redacted to protect the innocent)

Ideally the order of the treatments should be randomised (imagine if the “Not Zombie” measurements were taken when the people were 25, and the “Zombie” taken when they were 45): if this is not possible there are other experimental designs.

The question a paired t-test is designed to answer is whether there is a difference between the treatments. This is, of course, exactly the same question as a two-sample t-test.

Our data for each observation is a triplet, \((Y_{ij}, i, j)\), where \(i \in \{1,\dots,b\}\), \(j \in \{1,2\}\). So \(i\) denotes which pair the data is from, and \(j\) is which treatment (e.g. 'before' or 'after').

Once we have this data, how do we analyse it? There are three approaches, but two of them are the same.

Two-sample t-test: ignoring the pairs

We can use our normal two-sample t-test by ignoring the blocks. This is the same as a one-way ANOVA with two treatment levels. From this our t statistic is

\[ t = \frac{(\bar{Y}_{.1} - \bar{Y}_{.2}) - (\mu_{1}-\mu_2)}{ \sqrt{\frac{SS_E^O}{2(b-1)} \left(\frac{1}{b} + \frac{1}{b}\right)}} \]

where, from week 10 (or p. 584), \(SS_E^O = \sum_{j=1}^2\sum_{i=1}^b (Y_{ij} - \bar{Y}_{.j})^2\) is the \(SS_E\) for a one-way ANOVA, with \(2(b-1)\) degrees of freedom. We can look at the expectation and variance of \(\bar{D}=\bar{Y}_{.1} - \bar{Y}_{.2}\).

What are the expectation and variance of \(\bar{D}=\bar{Y}_{.1} - \bar{Y}_{.2}\)?

Answer

Expectation:

\[ \bar{D} = \bar{Y}_{.1} - \bar{Y}_{.2} = \frac{1}{b}\sum_{i=1}^b(\mu_1 + \beta_{i} + \varepsilon_{i1}) - \frac{1}{b}\sum_{i=1}^b(\mu_2 + \beta_i + \varepsilon_{i2}) = \mu_1 -\mu_2 + \frac{1}{b}\sum_{i=1}^b(\varepsilon_{i1} - \varepsilon_{i2}) \]

\[ \begin{align} E(\bar{D}) &= E(\bar{Y}_{.1} - \bar{Y}_{.2}) = \frac{1}{b}\sum_{i=1}^b E(\mu_1 + \beta_{i} + \varepsilon_{i1}) - \frac{1}{b}\sum_{i=1}^b E(\mu_2 + \beta_i + \varepsilon_{i2}) \\ &= \mu_1 - \mu_2 + \frac{1}{b}\sum_{i=1}^b\left(E(\varepsilon_{i1}) - E(\varepsilon_{i2})\right) \\ &= \mu_1 - \mu_2 \end{align} \]

Variance:

\[ Var(\bar{D}) = Var(\bar{Y}_{.1} - \bar{Y}_{.2}) = Var(\bar{Y}_{.1}) + Var(\bar{Y}_{.2}) \] if we treat the two sample means as independent, which is what ignoring the pairing amounts to. But, treating the block effects as random,

\[ \begin{align} Var(\bar{Y}_{.j}) &= Var \left(\frac{1}{b}\sum_{i=1}^b(\mu_j + \beta_{i} + \varepsilon_{ij}) \right) \\ &= \frac{1}{b^2} \sum_{i=1}^b Var (\mu_j + \beta_{i} + \varepsilon_{ij}) \\ &= \frac{1}{b} (Var(\beta_{i}) + \sigma^2)\\ \end{align} \] So, noting that the same \(\beta_i\)s appear in both treatment groups because of the pairing,

\[ \begin{align} Var(\bar{D}) &= Var(\bar{Y}_{.1}) + Var(\bar{Y}_{.2}) \\ &= \frac{1}{b} (Var(\beta_{i}) + \sigma^2) + \frac{1}{b} (Var(\beta_{i}) + \sigma^2)\\ &= \frac{2}{b} (Var(\beta_i) + \sigma^2) \end{align} \]

The main point here is that the error variance includes a term involving the \(\beta_i\)s.
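In R this "ignore the pairing" analysis is just the ordinary pooled two-sample test. A sketch, assuming hypothetical vectors not.zombie and zombie holding the two columns of the table above, in the same order of people:

```r
# Ordinary (unpaired) pooled two-sample t-test, ignoring the pairing.
# 'not.zombie' and 'zombie' are hypothetical vectors, one entry per person.
t.test(not.zombie, zombie, var.equal = TRUE)
```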

Paired two-sample t-test

If we take into account that we have paired data, we can estimate \(\mu_{1}-\mu_2\) by calculating \(D_i = Y_{i1} - Y_{i2}\) for each pair, and then testing whether \(E(D_i) = 0\). This works because \(Y_{ij} = \mu_j + \beta_i + \varepsilon_{ij}\), so

\[ D_i = Y_{i1} - Y_{i2} = \mu_1 + \beta_i + \varepsilon_{i1} - (\mu_2 + \beta_i + \varepsilon_{i2}) = \mu_1 -\mu_2 + \varepsilon_{i1} - \varepsilon_{i2} \]

What are the expectation and variance of \(D_i\), and what distribution does it follow?

Answer

Expectation: \[ \begin{align} E(D_i) &= E(\mu_1 -\mu_2 + \varepsilon_{i1} - \varepsilon_{i2}) \\ &= \mu_1 - \mu_2 + E(\varepsilon_{i1}) - E(\varepsilon_{i2}) \\ &= \mu_1 - \mu_2 \end{align} \]

Variance: \[ \begin{align} Var(D_i) &= Var(\mu_1 -\mu_2 + \varepsilon_{i1} - \varepsilon_{i2}) \\ &= Var(\varepsilon_{i1}) + Var(\varepsilon_{i2}) \\ &= 2 \sigma^2 \end{align} \]

And finally, because \(Y_{i1}\) and \(Y_{i2}\) are normally distributed, so is their difference: \(D_i \sim N(\mu_1-\mu_2, 2\sigma^2)\).

So we have a one-sample normal distribution (see Week 2 or Chapter 7), and can simply use a t-test:

\[ t_D = \frac{\bar{D} - \mu_D}{S_D/\sqrt{b}} \sim t_{b-1} \]

where \(S_D\) is the sample standard deviation of the \(D_i\).

Note that the error variance is smaller than in the unpaired analysis if there are block effects. So the test will be better, i.e. more likely to reject the null hypothesis when the actual difference is not 0.
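In R, a sketch using the same hypothetical vectors not.zombie and zombie as above:

```r
# Paired t-test, and the equivalent one-sample t-test on the differences D_i.
t.test(not.zombie, zombie, paired = TRUE)

D <- not.zombie - zombie
t.test(D, mu = 0)   # identical t statistic and p-value
```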

Two and a half ways to analyse paired data

We could also analyse this data as a two-way ANOVA, but the results will be exactly the same. Proving this is an exercise for this week, so you don’t have to believe me, but you do have to do the hard work.

We have three ways of looking at this paired test:

  • as a “paired t test”
  • as a t-test on the differences
  • as a two-way ANOVA

They are all the same, but approached from different points of view. The first sees this as a specific test, one of a long list of tests with a different test for each sort of data. The second is perhaps the mathematician’s approach: reduce the test down to something simpler that you already know. The third is to view this as a specific example of a general approach: ANOVA is a much more general method, and we have barely scratched the surface.