The Wolf River in Tennessee flows past an abandoned site once used by the pesticide industry for dumping wastes, including chlordane (pesticide), aldrin, and dieldrin (both insecticides).
These highly toxic organic compounds can cause various cancers and birth defects.
aldrin | depth |
---|---|
3.8 | bottom |
4.8 | bottom |
4.9 | bottom |
5.3 | bottom |
5.4 | bottom |
5.7 | bottom |
6.3 | bottom |
7.3 | bottom |
8.1 | bottom |
8.8 | bottom |

… with 20 more rows (30 rows and 2 columns in total)

depth | n | mean | sd |
---|---|---|---|
bottom | 10 | 6.04 | 1.579 |
middepth | 10 | 5.05 | 1.104 |
surface | 10 | 4.20 | 0.660 |
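As a quick check, the bottom row of the summary table can be reproduced from the ten bottom-depth readings shown in the data preview. This is a minimal sketch using Python's standard library; the variable names are illustrative, and only the bottom group is used because the middepth and surface readings are not listed here.

```python
from statistics import mean, stdev

# The ten bottom-depth aldrin readings listed in the data preview.
bottom = [3.8, 4.8, 4.9, 5.3, 5.4, 5.7, 6.3, 7.3, 8.1, 8.8]

n_bottom = len(bottom)
mean_bottom = mean(bottom)   # sample mean
sd_bottom = stdev(bottom)    # sample standard deviation (divides by n - 1)

print(f"bottom | {n_bottom} | {mean_bottom:.2f} | {sd_bottom:.3f}")
```

Note that `stdev` uses the n − 1 denominator, which is the convention the summary table appears to follow.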
So far, we have used a quantitative predictor variable to understand the variation in a quantitative response variable.
Now, we will use a categorical (qualitative) predictor variable to understand the variation in a quantitative response variable.
$K$ is the number of mutually exclusive groups. We index the groups as $i = 1, \ldots, K$.
$n_i$ is the number of observations in group $i$.
$n = n_1 + n_2 + \cdots + n_K$ is the total number of observations in the data.
$y_{ij}$ is the $j^{th}$ observation in group $i$, for all $i, j$.
$\mu_i$ is the population mean for group $i$, for $i = 1, \ldots, K$.
Question of interest: Is the mean value of the response $y$ the same for all groups, or is there at least one group with a significantly different mean value?
To answer this question, we will test the following hypotheses:
$$H_0: \mu_1 = \mu_2 = \cdots = \mu_K$$
$$H_a: \text{At least one } \mu_i \text{ is not equal to the others}$$
Main Idea: Decompose the total variation in the data into the variation between groups (model) and the variation within each group (residuals)
$$\sum_{i=1}^{K}\sum_{j=1}^{n_i}(y_{ij}-\bar{y})^2 = \sum_{i=1}^{K} n_i(\bar{y}_i-\bar{y})^2 + \sum_{i=1}^{K}\sum_{j=1}^{n_i}(y_{ij}-\bar{y}_i)^2$$
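The decomposition can be checked against the aldrin example using only the group summary statistics. The sketch below assumes the rounded values from the summary table (n, mean, sd per depth) and uses the identity that the within-group sum of squares for group i equals (n_i − 1) times its sample variance; the variable names are illustrative.

```python
# Group summaries from the table in the text: (n_i, mean_i, sd_i).
groups = {
    "bottom":   (10, 6.04, 1.579),
    "middepth": (10, 5.05, 1.104),
    "surface":  (10, 4.20, 0.660),
}

n_total = sum(n for n, _, _ in groups.values())

# Grand mean: weighted average of the group means.
grand_mean = sum(n * m for n, m, _ in groups.values()) / n_total

# Between-group sum of squares: sum of n_i * (ybar_i - ybar)^2.
ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups.values())

# Within-group sum of squares: sum of (n_i - 1) * s_i^2.
ss_within = sum((n - 1) * s ** 2 for n, _, s in groups.values())

# Total sum of squares follows from the decomposition.
ss_total = ss_between + ss_within

print(f"SS between = {ss_between:.3f}")
print(f"SS within  = {ss_within:.3f}")
print(f"SS total   = {ss_total:.3f}")
```

Because the inputs are rounded, the results agree with the sums of squares reported for this example only to about three decimal places.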
term | df | sumsq | meansq | statistic | p.value |
---|---|---|---|---|---|
depth | 2 | 16.961 | 8.480 | 6.134 | 0.006 |
Residuals | 27 | 37.329 | 1.383 |
Total variation: variation between and within groups
$SS_{Total} = 16.961 + 37.329 = 54.290$
$DF_{Total} = 2 + 27 = 29$
$s_y^2 = \frac{SS_{Total}}{DF_{Total}} = \frac{54.290}{29} = 1.872$
Between variation: variation in the group means
$SS_{Between} = 16.961$
$DF_{Between} = 2$
$MS_{Between} = \frac{SS_{Between}}{DF_{Between}} = \frac{16.961}{2} = 8.480$
Within variation: variation within each group
$SS_{Within} = 37.329$
$DF_{Within} = 27$
$MS_{Within} = \frac{SS_{Within}}{DF_{Within}} = \frac{37.329}{27} = 1.383$
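The degrees of freedom and mean squares for the aldrin example can be reproduced from the sums of squares in the ANOVA table. This is a small sketch; it assumes K = 3 depth groups and n = 30 observations as in the data, and the variable names are illustrative.

```python
# Values taken from the ANOVA table in the text.
K = 3                # number of depth groups
n = 30               # total number of observations
ss_between = 16.961
ss_within = 37.329

# Degrees of freedom for each source of variation.
df_between = K - 1
df_within = n - K
df_total = n - 1     # equals df_between + df_within

# Mean squares: sum of squares divided by its degrees of freedom.
ms_between = ss_between / df_between
ms_within = ss_within / df_within

# The sample variance of the response is the total mean square.
total_variance = (ss_between + ss_within) / df_total

print(df_between, df_within, df_total)
print(ms_between, ms_within, total_variance)
```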
$$H_0: \mu_1 = \mu_2 = \mu_3$$
$$H_a: \text{At least one depth level has } \mu_i \text{ that is not equal to the others}$$
Test statistic: Ratio of between-group to within-group variation
$F = \frac{MS_{Between}}{MS_{Within}} = \frac{8.480}{1.383} = 6.134$ (computed from the unrounded mean squares)
Calculate the p-value using an F distribution with $K-1$ and $n-K$ degrees of freedom (here, 2 and 27)
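The test statistic and p-value for the aldrin example can be reproduced without a statistics library, because the upper tail of an F distribution has a simple closed form when the numerator degrees of freedom equal 2: P(F > f) = (1 + 2f / df2)^(−df2 / 2). The sketch below relies on that special case only; the helper name `f_upper_tail_df1_is_2` is hypothetical, and for other numerator degrees of freedom a library routine for the F distribution would be needed.

```python
def f_upper_tail_df1_is_2(f_stat: float, df2: int) -> float:
    """Upper-tail probability P(F > f_stat) for an F(2, df2) distribution.

    Valid ONLY when the numerator degrees of freedom are 2.
    """
    return (1.0 + 2.0 * f_stat / df2) ** (-df2 / 2.0)


# Sums of squares and degrees of freedom from the ANOVA table in the text.
ss_between, df_between = 16.961, 2
ss_within, df_within = 37.329, 27

# F statistic: ratio of the between-group and within-group mean squares.
f_stat = (ss_between / df_between) / (ss_within / df_within)

# p-value: probability of an F statistic at least this large under H0.
p_value = f_upper_tail_df1_is_2(f_stat, df_within)

print(f"F = {f_stat:.3f}, p-value = {p_value:.3f}")
```

Using the unrounded mean squares here is what gives agreement with the table to three decimal places.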
P-value: Probability of observing a test statistic at least as extreme as the observed F statistic, given that the group means are equal
The p-value is small (0.006), so we reject $H_0$. The data provide sufficient evidence that at least one depth level has a mean aldrin concentration that differs from the others.
1️⃣ Normality: $y_{ij} \sim N(\mu_i, \sigma^2)$
2️⃣ Constant variance: The population distribution for each group has a common variance, $\sigma^2$
3️⃣ Independence: The observations are independent from each other
For ANOVA, we can typically check these assumptions in the exploratory data analysis
✅ No major skewness or outliers.
✅ Points fall relatively along the diagonal line.
depth | n | mean | sd |
---|---|---|---|
bottom | 10 | 6.04 | 1.58 |
middepth | 10 | 5.05 | 1.10 |
surface | 10 | 4.2 | 0.660 |

✅ The largest standard deviation is about 2.4 times the smallest one. This is OK given the small sample size.
✅ Based on what we know about the study, we have no reason to believe that the aldrin concentrations are not independent of each other.
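The standard-deviation ratio quoted in the constant-variance check above can be computed directly from the group summaries. This is a minimal sketch using the standard deviations reported for this example; the dictionary name is illustrative.

```python
# Group standard deviations from the summary table in the text.
sds = {"bottom": 1.579, "middepth": 1.104, "surface": 0.660}

# Ratio of the largest to the smallest group standard deviation.
ratio = max(sds.values()) / min(sds.values())

print(f"largest sd / smallest sd = {ratio:.2f}")
```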
Normality: $y_{ij} \sim N(\mu_i, \sigma^2)$
Independence: There is independence within and across groups