class: center, middle, inverse, title-slide

# Multiple comparisons

### Prof. Maria Tackett

---
class: middle, center

## [Click here for PDF of slides](09-multiple-comparisons.pdf)

---

## Topics

- Next steps after ANOVA
- Individual vs. family-wise Type I error
- Multiple comparisons using Bonferroni correction

---

## Aldrin in the Wolf River

<img src="img/07/wolf.png" width="40%" style="display: block; margin: auto;" />

- The Wolf River in Tennessee flows past an abandoned site once used by the pesticide industry for dumping wastes, including chlordane (pesticide), aldrin, and dieldrin (both insecticides).
- These highly toxic organic compounds can cause various cancers and birth defects.

---

## Aldrin in the Wolf River

- The standard method to test whether these substances are present in a river is to take samples at six-tenths depth.
- Because these compounds are denser than water and their molecules tend to stick to particles of sediment, they are more likely to be found in higher concentrations near the bottom than near mid-depth.
- We will compare mean concentration levels (in nanograms per liter) for three depths.

---
class: middle

## Is there a difference between the mean aldrin concentrations among the three depth levels?
---

## ANOVA

<table>
 <thead>
  <tr> <th style="text-align:left;"> term </th> <th style="text-align:right;"> df </th> <th style="text-align:right;"> sumsq </th> <th style="text-align:right;"> meansq </th> <th style="text-align:right;"> statistic </th> <th style="text-align:right;"> p.value </th> </tr>
 </thead>
<tbody>
  <tr> <td style="text-align:left;"> depth </td> <td style="text-align:right;"> 2 </td> <td style="text-align:right;"> 16.961 </td> <td style="text-align:right;"> 8.480 </td> <td style="text-align:right;"> 6.134 </td> <td style="text-align:right;"> 0.006 </td> </tr>
  <tr> <td style="text-align:left;"> Residuals </td> <td style="text-align:right;"> 27 </td> <td style="text-align:right;"> 37.329 </td> <td style="text-align:right;"> 1.383 </td> <td style="text-align:right;"> </td> <td style="text-align:right;"> </td> </tr>
</tbody>
</table>

<br>

.eq[
$$
`\begin{aligned}
&H_0: \mu_1 = \mu_2 = \mu_3\\
&H_a: \text{At least one depth level has }\mu_i \text{ that is not equal to the others}
\end{aligned}`
$$
]

---

## ANOVA

<table>
 <thead>
  <tr> <th style="text-align:left;"> term </th> <th style="text-align:right;"> df </th> <th style="text-align:right;"> sumsq </th> <th style="text-align:right;"> meansq </th> <th style="text-align:right;"> statistic </th> <th style="text-align:right;"> p.value </th> </tr>
 </thead>
<tbody>
  <tr> <td style="text-align:left;"> depth </td> <td style="text-align:right;"> 2 </td> <td style="text-align:right;"> 16.961 </td> <td style="text-align:right;"> 8.480 </td> <td style="text-align:right;"> 6.134 </td> <td style="text-align:right;background-color: #dce5b2 !important;"> 0.006 </td> </tr>
  <tr> <td style="text-align:left;"> Residuals </td> <td style="text-align:right;"> 27 </td> <td style="text-align:right;"> 37.329 </td> <td style="text-align:right;"> 1.383 </td> <td style="text-align:right;"> </td> <td style="text-align:right;background-color: #dce5b2 !important;"> </td> </tr>
</tbody>
</table>

The p-value is small `\((\approx 0.006)\)`, so we reject `\(H_0\)`. The data provide sufficient evidence that at least one depth level has a mean aldrin concentration that differs from the others.

---
class: middle

We know at least one depth level has a mean aldrin concentration that differs from the others.

The next question we want to answer in our analysis is .vocab[*which one*]?

---

## Difference in means

We can use confidence intervals to estimate the difference between the means, `\(\mu_i-\mu_j\)`, for each pair of groups.

.alert[
`$$(\bar{y}_i-\bar{y}_j) \pm t^* \times \sqrt{MS_{Within} \Big(\frac{1}{n_i}+\frac{1}{n_j}\Big)}$$`

where the critical value `\(t^*\)` is calculated from a `\(t\)` distribution with `\(n-K\)` degrees of freedom.
]

--

If we have `\(K\)` groups, we will make `\({K \choose 2} = K(K-1)/2\)` such comparisons.

---

## Comparisons for the Aldrin data set

There are 3 depth levels in our data, so we can make `\({3 \choose 2} = 3(3-1)/2 = 3\)` comparisons.

--

`$$\small{(\bar{y}_{middepth}-\bar{y}_{bottom}) \pm t^* \times \sqrt{MS_{Within}\Big(\frac{1}{n_{middepth}}+\frac{1}{n_{bottom}}\Big)}}$$`

`$$\small{(\bar{y}_{surface}-\bar{y}_{bottom}) \pm t^* \times \sqrt{MS_{Within}\Big(\frac{1}{n_{surface}}+\frac{1}{n_{bottom}}\Big)}}$$`

`$$\small{(\bar{y}_{surface}-\bar{y}_{middepth}) \pm t^* \times \sqrt{MS_{Within}\Big(\frac{1}{n_{surface}}+\frac{1}{n_{middepth}}\Big)}}$$`

---

## Individual vs. Family-wise Type I Error

.vocab[Type I error]: Incorrectly reject `\(H_0\)`.
- In our example, incorrectly reject the null hypothesis that mean aldrin concentration levels are equal
- Based on our confidence interval, we incorrectly conclude there is a difference in the mean aldrin concentration for the two groups

--

.vocab[Individual Type I error]: Incorrectly reject `\(H_0\)` for **one specific** comparison of group means

--

.vocab[Family-wise Type I error]: Incorrectly reject `\(H_0\)` for **at least one** comparison of group means

---

## Multiple Comparisons

- The probability of making an individual Type I error is `\(\color{#87037B}{\alpha = 1 - C}\)`, where `\(C\)` is the confidence level
- Even if the probability of making an individual Type I error is low, the probability of making a family-wise Type I error becomes much larger when we make multiple comparisons

---

## xkcd "Significant"

.pull-left[
<img src="img/09/xkcd-significant-1.png" width="75%" style="display: block; margin: auto;" />
]

--

.pull-right[
<img src="img/09/xkcd-significant-2.png" width="75%" style="display: block; margin: auto;" />
]

.midi[source: https://xkcd.com/882/]

---

## Correcting for multiple comparisons

.alert[
`$$(\bar{y}_i-\bar{y}_j) \pm t^* \sqrt{MS_{Within}\Big(\frac{1}{n_i}+\frac{1}{n_j}\Big)}$$`

where the critical value `\(t^*\)` is calculated from a `\(t\)` distribution with `\(n-K\)` degrees of freedom.
]

When we make multiple comparisons, we will select the critical value `\(t^*\)` to control the probability of making a **family-wise Type I error**.

---

## Bonferroni correction

.vocab[Goal: ] Choose the critical value `\(t^*\)` such that the probability of making a .vocab[family-wise Type I error is ] `\(\color{#87037B}{\alpha}\)`.
To do so, we will choose `\(t^*\)` such that the probability of making an .vocab[individual Type I error is ] `\(\color{#87037B}{\frac{\alpha}{m}}\)`, where `\(m\)` is the number of comparisons.

In other words, we will find `\(t^*\)` that corresponds to a .vocab[confidence level of ] `\(\color{#87037B}{1 - \alpha/m}\)`.

---

## Comparisons for the Aldrin data set

We want the probability of making a family-wise Type I error to be `\(\alpha = 0.05\)`.

--

We are making 3 comparisons. Therefore, we want the probability of making an individual Type I error to be `\(\alpha / m = 0.05 / 3\)`.

--

.alert[
We calculate each confidence interval using the critical value `\(t^*\)` that corresponds to a confidence level of `\(C = 1 - 0.05/3 \approx 0.9833\)` in the `\(t\)` distribution with `\(30 - 3 = 27\)` degrees of freedom.
]

---

## Pairwise comparisons in R

```r
library(pairwiseCI) # pairwiseCI()
library(knitr)      # kable()
library(magrittr)   # %>% pipe

pairwiseCI(aldrin ~ depth, data = aldrin,
           conf.level = 1 - 0.05/3, var.equal = TRUE) %>%
  kable(digits = 3)
```

<table>
 <thead>
  <tr> <th style="text-align:right;"> estimate </th> <th style="text-align:right;"> lower </th> <th style="text-align:right;"> upper </th> <th style="text-align:left;"> comparison </th> </tr>
 </thead>
<tbody>
  <tr> <td style="text-align:right;"> -0.99 </td> <td style="text-align:right;"> -2.598 </td> <td style="text-align:right;"> 0.618 </td> <td style="text-align:left;"> middepth-bottom </td> </tr>
  <tr> <td style="text-align:right;"> -1.84 </td> <td style="text-align:right;"> -3.268 </td> <td style="text-align:right;"> -0.412 </td> <td style="text-align:left;"> surface-bottom </td> </tr>
  <tr> <td style="text-align:right;"> -0.85 </td> <td style="text-align:right;"> -1.923 </td> <td style="text-align:right;"> 0.223 </td> <td style="text-align:left;"> surface-middepth </td> </tr>
</tbody>
</table>

---

## Comparing Aldrin concentrations

<table>
 <thead>
  <tr> <th style="text-align:right;"> estimate </th> <th style="text-align:right;"> lower </th> <th style="text-align:right;"> upper </th> <th style="text-align:left;"> comparison </th> </tr>
 </thead>
<tbody>
  <tr> <td style="text-align:right;"> -0.99 </td> <td style="text-align:right;"> -2.598 </td> <td style="text-align:right;"> 0.618 </td> <td style="text-align:left;"> middepth-bottom </td> </tr>
  <tr> <td style="text-align:right;"> -1.84 </td> <td style="text-align:right;"> -3.268 </td> <td style="text-align:right;"> -0.412 </td> <td style="text-align:left;"> surface-bottom </td> </tr>
  <tr> <td style="text-align:right;"> -0.85 </td> <td style="text-align:right;"> -1.923 </td> <td style="text-align:right;"> 0.223 </td> <td style="text-align:left;"> surface-middepth </td> </tr>
</tbody>
</table>

Only the surface-bottom interval excludes 0, so there is a statistically significant difference between the mean aldrin concentrations at the surface and at the bottom.

--

More specifically, we are 98.3% confident that the mean aldrin concentration is between 0.412 and 3.268 nanograms per liter lower at the surface than at the bottom.

---

## Recap

- Next steps after ANOVA
- Individual vs. family-wise Type I error
- Multiple comparisons using Bonferroni correction
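
---

## Sketch: Bonferroni critical value in R

The Bonferroni calculation described in these slides can be sketched with base R's `qt()`. This is a minimal illustration, not the code behind the intervals shown earlier: the variable names are made up for the example, the inputs (`\(\alpha = 0.05\)`, 3 depth levels, 27 degrees of freedom) are taken from the slides, and the unadjusted family-wise error rate assumes the three comparisons are independent, which is only an approximation here.

```r
alpha <- 0.05          # desired family-wise Type I error rate (from the slides)
K     <- 3             # number of depth levels
df    <- 30 - K        # residual degrees of freedom, n - K = 27

m          <- choose(K, 2)     # number of pairwise comparisons
alpha_ind  <- alpha / m        # Bonferroni individual Type I error rate
conf_level <- 1 - alpha_ind    # confidence level used for each interval

# Two-sided intervals put half of the error rate in each tail
t_unadj <- qt(1 - alpha / 2, df)       # critical value with no correction
t_bonf  <- qt(1 - alpha_ind / 2, df)   # Bonferroni-adjusted critical value

# Family-wise error rate if every interval used alpha = 0.05 and the
# comparisons were independent (an assumption made for illustration)
fwer_unadj <- 1 - (1 - alpha)^m

round(c(m = m, conf_level = conf_level, t_unadj = t_unadj,
        t_bonf = t_bonf, fwer_unadj = fwer_unadj), 3)
```

The adjusted critical value is larger than the unadjusted one, so each Bonferroni interval is wider. That extra width is the price of keeping the family-wise Type I error rate at or below `\(\alpha\)`.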