In this lab, we will use logistic regression to explore what factors are associated with people’s opinions on etiquette while flying on an airplane. More specifically, we’ll take a look at the the factors associated with whether or not a person thinks it’s rude to recline the seat on a plane.
By the end of the lab you will be able to…
A repository has already been created for you and your teammates. Everyone in your team has access to the same repo.
Go to course organization on GitHub.
In addition to your private individual repositories, you should now see a repo named lab-08-. Go to that repository.
Each person on the team should clone the repository and open a new project in RStudio.
Now that you have had some experience collaborating as a team on GitHub, it will be up to your team to decide who types the responses to each question. Every team member should contribute to the discussion about each lab exercise even if they are not the one typing the team’s responses.
Every team member should have at least one meaningful commit to the repo on GitHub.
If you would like your git password cached for a week for this project, type the following in the Terminal in RStudio:
git config --global credential.helper 'cache --timeout 604800'
We will use the following packages in today’s lab. Feel free to add any other packages as needed.
library(tidyverse)
library(broom)
library(knitr)
library(Stat2Data)
The data in this lab is from a survey conducted for the FiveThirtyEight article “41 Percent of Fliers Think You’re Rude if You Recline Your Seat” by Walt Hickey. The data were collected through an online poll conducted August 29 - 30, 2014.
Use the code below to load the data set from the FiveThirtyEight GitHub repo and rename a few columns we’ll use in the analysis. We’re renaming the columns to make them easier to use in our code throughout the analysis.
We will also create a category if Age called “Missing” if a respondent didn’t report their age, and remove respondents who did not answer the primary question of interest of whether or not it’s rude to recline the seat on a plane.
<- read_csv("http://bit.ly/538-flying-survey") %>%
flying rename(frequency = `How often do you travel by plane?`,
height = `How tall are you?`,
have_children = `Do you have any children under 18?`,
recline_rude = `Is itrude to recline your seat on a plane?`,
%>%
) mutate(Age = if_else(is.na(Age), "Missing", Age)) %>% #category for missing age
drop_na(recline_rude)
The variables we’ll use in this lab are
Age
: Age group of respondentheight
: Response to “How tall are you?”have_children
: Response to “Do you have any children under 18?”frequency
: Response to “How often do you travel by plane?”recline_rude
: Response to “Is it rude to recline your seat on a plane?”Let’s create a binary indicator for whether or not a person thinks it’s rude to recline the seat on a plane. Create a new variable that takes value “1” if the respondent said that reclining a seat on a plane is “Yes, somewhat rude” or “Yes, very rude” and “0” otherwise.
Are the data consistent with the title of the article? Show any code / output used to make your assessment.
Sometimes when we work on analysis, there are variables that could be useful but are not recorded in the data set in a format that’s easy to use. In this data, that variable is height, which is currently in the data as a categorical variable. Height is currently recorded in feet and inches; for example, 5'9"
means “five feet and nine inches”.
Create a new numeric variable for the height recorded in inches. For example, 5'9"
will be recorded as 69
in the new variable. Your new variable should also incorporate the following -
Under 5 ft.
, record the height as 60 inches.6'6" and above
, record the height as 78 inches.Visualize the distribution of the newly created variable.
Hint: Use the separate
function to split the feet and inches into separate columns.
Create a visualization of the relationship between how often a person travels by plane and whether they think it’s rude to recline the seat on an airplane. Interpret the visualization.
Create an empirical logit plot of the relationship between a person’s height and whether they think it’s rude to recline the seat on an airplane. Interpret the visualization. Hint: See the lecture notes on creating empirical logit plots.
Fit a model using the height (in inches), frequency of traveling by plane, whether a person has children, and age group to understand the odds a person thinks it’s rude to recline the seat on an airplane. Display the model using 3 digits.
Use the model from the previous exercise to interpret the relationship between the following variables and the odds a person thinks it’s rude to recline the seat on a plane:
Check the conditions for logistic regression. For each condition, state whether it is satisfied and a brief explanation of your response, including any tables or visualizations used to make the conclusion.
One team member: Upload the team’s PDF to Gradescope. Be sure to include every team member’s name in the Gradescope submission Associate the “Overall” graded section with the first page of your PDF, and mark where each answer is to the exercises. If any answer spans multiple pages, then mark all pages.
There should only be one submission per team on Gradescope.
Note that your submission on Gradescope should be identical to the PDF that is rendered when we knit the Rmd file in your repo.