This lab focuses on using analysis of variance (ANOVA) to analyze data about penguins observed in the Palmer Archipelago near Palmer Station.
By the end of the lab you will be able to…
A repository has already been created for you and your teammates. Everyone in your team has access to the same repo.
Go to the course organization on GitHub.
In addition to your private individual repositories, you should now see a repo named lab-05-. Go to that repository.
Each person on the team should clone the repository and open a new project in RStudio.
Now that you have had some experience collaborating as a team on GitHub, it will be up to your team to decide who types the responses to each question. Every team member should contribute to the discussion about each lab exercise even if they are not the one typing the team’s responses.
Every team member should have at least one meaningful commit to the repo on GitHub.
We’ll use the following packages in this lab.
library(tidyverse)
library(knitr)
library(broom)
library(pairwiseCI)
# add more packages as needed
The data is the penguins data set from the palmerpenguins R package maintained by Dr. Allison Horst. This data set contains measurements and other characteristics for 344 penguins observed near Palmer Station in Antarctica. The data were originally collected by Dr. Kristen Gorman.
The data set is in penguins.csv, located in the data folder.
The following variables are in the penguins data set.
variable | class | description |
---|---|---|
species | factor | Penguin species (Adelie, Gentoo, Chinstrap) |
island | factor | Island where recorded (Biscoe, Dream, Torgersen) |
bill_length_mm | double | Bill length in millimeters (also known as culmen length) |
bill_depth_mm | double | Bill depth in millimeters (also known as culmen depth) |
flipper_length_mm | integer | Flipper length in millimeters |
body_mass_g | integer | Body mass in grams |
sex | factor | Sex of the animal |
year | integer | Year recorded |
Can we differentiate penguin species based on their bill length? To analyze this question we will use Analysis of Variance to compare the mean bill length for Adelie, Gentoo, and Chinstrap penguins found near Palmer Station in Antarctica.
As you complete the lab
Let’s take a look at bill_length_mm by species. Create a visualization to explore the relationship between the two variables. Interpret the visualization in the context of the data.
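One possible way to compare a quantitative variable across groups is a set of side-by-side boxplots. The sketch below is only an illustration: the `toy` data frame and its values are made up (they are not the lab data), and a boxplot is one reasonable choice among several. In the lab you would substitute the penguins data.

```r
library(tidyverse)

# Hypothetical toy data, NOT the lab data: three made-up groups, four values each
toy <- tibble(
  species = rep(c("A", "B", "C"), each = 4),
  bill_length_mm = c(38, 39, 40, 41, 46, 48, 49, 51, 45, 46, 47, 48)
)

# Side-by-side boxplots: one box per group, response on the y-axis
p <- ggplot(toy, aes(x = species, y = bill_length_mm)) +
  geom_boxplot() +
  labs(
    x = "Species",
    y = "Bill length (mm)",
    title = "Bill length by species (toy data)"
  )

p
```

Informative axis labels and a title make the plot easier to interpret in the context of the data.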
We’d like to use ANOVA to assess if there’s a relationship between bill length and species. Write the null and alternative hypotheses for the ANOVA test (a) using mathematical notation (b) using words.
Before conducting the test, let’s check the conditions for ANOVA. For each condition write whether or not it is satisfied. Briefly explain your response for each condition, including any plots, summary statistics, etc. used to make your determination.
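Group-level summary statistics can support the discussion of the conditions. The sketch below uses a made-up `toy` data frame (not the lab data) and computes the sample size, mean, and standard deviation in each group, along with the ratio of the largest to the smallest standard deviation. Comparing that ratio to 2 is one common rule of thumb for the equal-variance condition; it is an assumption here, so use whatever criterion your course specifies.

```r
library(tidyverse)

# Hypothetical toy data, NOT the lab data
toy <- tibble(
  species = rep(c("A", "B", "C"), each = 4),
  bill_length_mm = c(38, 39, 40, 41, 46, 48, 49, 51, 45, 46, 47, 48)
)

# Sample size, mean, and standard deviation of the response in each group
cond_summary <- toy %>%
  group_by(species) %>%
  summarise(
    n = n(),
    mean = mean(bill_length_mm),
    sd = sd(bill_length_mm)
  )

cond_summary

# Ratio of the largest to the smallest group standard deviation
sd_ratio <- max(cond_summary$sd) / min(cond_summary$sd)
sd_ratio
```

Summary statistics alone do not address the shape of each group's distribution or independence, so plots and knowledge of how the data were collected are still needed.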
Type ?kable in the console to access the help file for the kable function.
Display the ANOVA table obtained from R. Update the column names in the table, so they are more informative for a general audience (e.g. “Sum of Squares” instead of “sumsq”).
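A minimal sketch of one way to produce and relabel an ANOVA table, using made-up `toy` data rather than the lab data. It assumes the model is fit with `aov()` and tidied with `broom::tidy()`; your course may fit the model differently, and the replacement column names shown are only suggestions.

```r
library(tidyverse)
library(broom)
library(knitr)

# Hypothetical toy data, NOT the lab data
toy <- tibble(
  species = rep(c("A", "B", "C"), each = 4),
  bill_length_mm = c(38, 39, 40, 41, 46, 48, 49, 51, 45, 46, 47, 48)
)

# Fit the one-way ANOVA model and convert the table to a data frame
fit <- aov(bill_length_mm ~ species, data = toy)
anova_tbl <- tidy(fit)

# Display the table with more readable column names
kable(
  anova_tbl,
  digits = 3,
  col.names = c("Term", "Df", "Sum of Squares", "Mean Square",
                "F Statistic", "P-value")
)
```

The `col.names` vector must have one entry per column of the table, in the same order as the original columns.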
What is \(s_y^2\), the variance of the distribution of bill length? Show how you calculated this value from the ANOVA table.
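In a one-way ANOVA, the sums of squares and degrees of freedom in the table add up to their totals, so the sample variance of the response is \(s_y^2 = SS_{Total} / (n - 1)\). The sketch below checks this on made-up `toy` data (not the lab data); the object names are illustrative.

```r
library(tidyverse)
library(broom)

# Hypothetical toy data, NOT the lab data
toy <- tibble(
  species = rep(c("A", "B", "C"), each = 4),
  bill_length_mm = c(38, 39, 40, 41, 46, 48, 49, 51, 45, 46, 47, 48)
)

anova_tbl <- tidy(aov(bill_length_mm ~ species, data = toy))

# Total sum of squares divided by total degrees of freedom (n - 1)
s2_from_table <- sum(anova_tbl$sumsq) / sum(anova_tbl$df)

# Sample variance computed directly from the response
s2_direct <- var(toy$bill_length_mm)

# The two values should agree up to floating-point error
all.equal(s2_from_table, s2_direct)
```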
What is the conclusion from the ANOVA test? State the conclusion in the context of the data.
Let’s dive into the data further and use pairwise comparisons to understand how bill length compares across species. We want the probability of making a family-wise Type I error to be 0.08. Briefly explain what this means in the context of the data.
To achieve the desired family-wise Type I error rate from the previous question, what confidence level should be used for the individual confidence intervals? Briefly explain your response.
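If the Bonferroni correction is the intended approach (an assumption; the lab text does not name a method), the family-wise error rate is split evenly across the pairwise comparisons. The arithmetic can be sketched as follows, with illustrative variable names.

```r
# Desired family-wise Type I error rate, from the lab text
family_alpha <- 0.08

# Three species give choose(3, 2) = 3 pairwise comparisons
n_groups <- 3
n_comparisons <- choose(n_groups, 2)

# Bonferroni: divide the family-wise rate evenly across comparisons
alpha_each <- family_alpha / n_comparisons

# Confidence level for each individual interval
conf_level_each <- 1 - alpha_each
conf_level_each
```

Other multiple-comparison procedures would lead to a different individual confidence level.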
Calculate the pairwise confidence intervals. You may calculate them in R using the pairwiseCI function.
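A hedged sketch of a `pairwiseCI()` call on made-up `toy` data (not the lab data). It assumes intervals for differences in means via `method = "Param.diff"`; the value of `conf_level` is a placeholder, so substitute the level you determined in the previous exercise. Check `?pairwiseCI` for the arguments available in your installed version.

```r
library(pairwiseCI)

# Hypothetical toy data, NOT the lab data; the grouping variable is a factor
toy <- data.frame(
  species = factor(rep(c("A", "B", "C"), each = 4)),
  bill_length_mm = c(38, 39, 40, 41, 46, 48, 49, 51, 45, 46, 47, 48)
)

# Placeholder confidence level for each individual interval
conf_level <- 0.95

# Confidence intervals for the difference in means for every pair of groups
cis <- pairwiseCI(
  bill_length_mm ~ species,
  data = toy,
  method = "Param.diff",
  conf.level = conf_level
)

cis
```

An interval that does not contain 0 suggests the two group means differ at the chosen confidence level.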
How does bill length compare across the species of penguins? In your description, include numerical and inferential conclusions based on your analysis.
One team member: Upload the team’s PDF to Gradescope. Be sure to include every team member’s name in the Gradescope submission. Associate the “Overall” graded section with the first page of your PDF, and mark where the answer to each exercise is located. If any answer spans multiple pages, then mark all pages.
There should only be one submission per team on Gradescope.