Lab 08: Reclining the seat on an airplane

due Sun, Apr 11, at 11:59p EDT

In this lab, we will use logistic regression to explore what factors are associated with people’s opinions on etiquette while flying on an airplane. More specifically, we’ll take a look at the the factors associated with whether or not a person thinks it’s rude to recline the seat on a plane.

Learning goals

By the end of the lab you will be able to…

Getting started

Clone assignment repo + start new project

A repository has already been created for you and your teammates. Everyone in your team has access to the same repo.

Workflow: Using git and GitHub as a team

Now that you have had some experience collaborating as a team on GitHub, it will be up to your team to decide who types the responses to each question. Every team member should contribute to the discussion about each lab exercise even if they are not the one typing the team’s responses.

Every team member should have at least one meaningful commit to the repo on GitHub.

Password caching

If you would like your git password cached for a week for this project, type the following in the Terminal in RStudio:

git config --global credential.helper 'cache --timeout 604800'

Packages + data

Packages

We will use the following packages in today’s lab. Feel free to add any other packages as needed.

library(tidyverse)
library(broom)
library(knitr)
library(Stat2Data)

Data

The data in this lab is from a survey conducted for the FiveThirtyEight article “41 Percent of Fliers Think You’re Rude if You Recline Your Seat” by Walt Hickey. The data were collected through an online poll conducted August 29 - 30, 2014.

Use the code below to load the data set from the FiveThirtyEight GitHub repo and rename a few columns we’ll use in the analysis. We’re renaming the columns to make them easier to use in our code throughout the analysis.

We will also create a category if Age called “Missing” if a respondent didn’t report their age, and remove respondents who did not answer the primary question of interest of whether or not it’s rude to recline the seat on a plane.

flying <- read_csv("http://bit.ly/538-flying-survey") %>%
  rename(frequency = `How often do you travel by plane?`, 
         height = `How tall are you?`,
         have_children = `Do you have any children under 18?`, 
         recline_rude = `Is itrude to recline your seat on a plane?`, 
         ) %>%
  mutate(Age = if_else(is.na(Age), "Missing", Age)) %>% #category for missing age
  drop_na(recline_rude) 

The variables we’ll use in this lab are

Exercises

  1. Let’s create a binary indicator for whether or not a person thinks it’s rude to recline the seat on a plane. Create a new variable that takes value “1” if the respondent said that reclining a seat on a plane is “Yes, somewhat rude” or “Yes, very rude” and “0” otherwise.

    Are the data consistent with the title of the article? Show any code / output used to make your assessment.

  2. Sometimes when we work on analysis, there are variables that could be useful but are not recorded in the data set in a format that’s easy to use. In this data, that variable is height, which is currently in the data as a categorical variable. Height is currently recorded in feet and inches; for example, 5'9" means “five feet and nine inches”.

    Create a new numeric variable for the height recorded in inches. For example, 5'9" will be recorded as 69 in the new variable. Your new variable should also incorporate the following -

    • If a person has a height of Under 5 ft., record the height as 60 inches.
    • If a person has a height of 6'6" and above, record the height as 78 inches.

    Visualize the distribution of the newly created variable.

Hint: Use the separate function to split the feet and inches into separate columns.

  1. Create a visualization of the relationship between how often a person travels by plane and whether they think it’s rude to recline the seat on an airplane. Interpret the visualization.

  2. Create an empirical logit plot of the relationship between a person’s height and whether they think it’s rude to recline the seat on an airplane. Interpret the visualization. Hint: See the lecture notes on creating empirical logit plots.

  3. Fit a model using the height (in inches), frequency of traveling by plane, whether a person has children, and age group to understand the odds a person thinks it’s rude to recline the seat on an airplane. Display the model using 3 digits.

  4. Use the model from the previous exercise to interpret the relationship between the following variables and the odds a person thinks it’s rude to recline the seat on a plane:

    • Whether a person has children under the age of 18
    • Age
  5. Check the conditions for logistic regression. For each condition, state whether it is satisfied and a brief explanation of your response, including any tables or visualizations used to make the conclusion.

Submission

One team member: Upload the team’s PDF to Gradescope. Be sure to include every team member’s name in the Gradescope submission Associate the “Overall” graded section with the first page of your PDF, and mark where each answer is to the exercises. If any answer spans multiple pages, then mark all pages.

There should only be one submission per team on Gradescope.

Note that your submission on Gradescope should be identical to the PDF that is rendered when we knit the Rmd file in your repo.