
Exploring multivariable relationships

Prof. Maria Tackett


Carbohydrates in Starbucks food

  • Starbucks often displays the total calories for its food items but not other nutritional information.

  • Our goal is to analyze the relationship between the calories and total carbohydrates (carbs) in Starbucks food items, and to assess whether it differs by type of food item (bakery, salad, sandwich, etc.)

  • We can use our analysis to estimate the total carbs in a given food item from its total calories and type

Starbucks data

  • Observations: 77 Starbucks food items

  • Variables:

    • carb: Total carbohydrates (in grams)
    • calories: Total calories
    • bakery: 1 = bakery food item, 0 = other food type
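
As a quick check of the data description above, here is a minimal base R sketch. It assumes the openintro package is installed (it provides `starbucks`) and builds the same 0/1 bakery indicator with `ifelse()` instead of dplyr so that it runs without the tidyverse; `n_items` is an illustrative name.

```r
# Assumes the openintro package is installed; it provides the starbucks data
starbucks <- openintro::starbucks

# Number of food items (observations)
n_items <- nrow(starbucks)
n_items

# 0/1 indicator: 1 for bakery items, 0 for every other food type
starbucks$bakery <- factor(ifelse(starbucks$type == "bakery", 1, 0))

# How many items fall in each group
table(starbucks$bakery)
```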

Terminology

  • carb is the response variable
    • variable whose variation we want to understand / variable we wish to predict
    • also known as outcome or dependent variable


  • calories, bakery are the predictor variables
    • variables used to account for variation in the outcome
    • also known as explanatory, independent, or input variables
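
In R, a model formula encodes this terminology directly: the response goes to the left of `~` and the predictors go to the right. A small sketch using only base R (no data needed); `f` is an illustrative name.

```r
# Response on the left of ~, predictors on the right
f <- carb ~ calories + bakery

# All variables in the formula; the response is listed first
all.vars(f)                     # "carb" "calories" "bakery"

# Only the predictor terms
attr(terms(f), "term.labels")   # "calories" "bakery"
```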

Let's look at the data

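library(tidyverse)   # dplyr (mutate, if_else, %>%) and ggplot2
library(patchwork)   # combine ggplots with + and /

# Add a 0/1 indicator for bakery items, stored as a factor
starbucks <- openintro::starbucks %>%
  mutate(bakery = factor(if_else(type == "bakery", 1, 0)))

# Distribution of the response: total carbohydrates
p1 <- ggplot(data = starbucks, aes(x = carb)) +
  geom_histogram(fill = "steelblue", color = "black") +
  labs(x = "Carbohydrates (in grams)",
       y = "Count")

# Distribution of calories
p2 <- ggplot(data = starbucks, aes(x = calories)) +
  geom_histogram(fill = "steelblue", color = "black") +
  labs(x = "Calories",
       y = "Count")

# Number of bakery vs. other items
p3 <- ggplot(data = starbucks, aes(x = bakery)) +
  geom_bar(fill = "steelblue", color = "black") +
  labs(x = "Bakery Item",
       y = "Count")

# Carbs histogram on the left; calories and bakery stacked on the right
p1 + (p2 / p3)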
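
The plots can be complemented with numeric summaries of the same variables. A minimal sketch, assuming openintro is installed; it rebuilds the bakery indicator in base R so it runs on its own.

```r
# Assumes openintro is installed; rebuild the bakery indicator in base R
starbucks <- openintro::starbucks
starbucks$bakery <- factor(ifelse(starbucks$type == "bakery", 1, 0))

# Min, quartiles, median, and mean for the two quantitative variables
summary(starbucks$carb)
summary(starbucks$calories)

# Counts for the categorical predictor
table(starbucks$bakery)
```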

Response vs. Predictors

$$\text{carbs} = f(\text{calories}, \text{bakery}) + \epsilon$$

# Scatterplot of carbs vs. calories with the least-squares line
p1 <- ggplot(data = starbucks, aes(x = calories, y = carb)) +
  geom_point(alpha = 0.5) +
  geom_smooth(method = "lm", se = FALSE, color = "steelblue") +
  labs(x = "Calories",
       y = "Carbohydrates (grams)")

# Carbs for bakery (1) vs. other (0) items
p2 <- ggplot(data = starbucks, aes(x = bakery, y = carb)) +
  geom_boxplot(fill = "steelblue", color = "black") +
  labs(x = "Bakery",
       y = "Carbohydrates (grams)")

# Display the two plots side by side (patchwork)
p1 + p2
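
Numeric counterparts to these two plots: the correlation summarizes the linear trend in the scatterplot, and group medians and means summarize the boxplot. A sketch assuming openintro is installed; `r`, `group_means`, and `group_medians` are illustrative names.

```r
# Assumes openintro is installed; rebuild the bakery indicator in base R
starbucks <- openintro::starbucks
starbucks$bakery <- factor(ifelse(starbucks$type == "bakery", 1, 0))

# Strength and direction of the linear relationship between carbs and calories
r <- cor(starbucks$calories, starbucks$carb)
r

# Mean and median carbs for non-bakery (0) and bakery (1) items
group_means   <- tapply(starbucks$carb, starbucks$bakery, mean)
group_medians <- tapply(starbucks$carb, starbucks$bakery, median)
group_means
group_medians
```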

Model

$$\text{carbs} = f(\text{calories}, \text{bakery}) + \epsilon$$

  • Goal: Determine f

  • How do we determine f?

    • Make an assumption about the functional form of f
    • Use the data to fit a model based on that form

Determine f

In general,

1) Choose the functional form of f, i.e., choose the appropriate model given the response variable

  • Suppose f is linear, so the model is $y = f(\mathbf{X}) + \epsilon = \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p + \epsilon$

2) Use the data to fit (or train) the model, i.e., estimate the model parameters

  • Estimate $\beta_0, \beta_1, \ldots, \beta_p$
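
For a linear model, step 2 is typically done by least squares, where $\hat{\boldsymbol\beta}$ solves the normal equations $(\mathbf{X}^\top\mathbf{X})\boldsymbol\beta = \mathbf{X}^\top\mathbf{y}$. A minimal sketch, assuming openintro is installed, builds the design matrix with `model.matrix()`, solves those equations directly, and compares the result with `lm()`. `X`, `y`, `beta_hat`, and `fit` are illustrative names.

```r
# Assumes openintro is installed; rebuild the bakery indicator in base R
starbucks <- openintro::starbucks
starbucks$bakery <- factor(ifelse(starbucks$type == "bakery", 1, 0))

# Design matrix: intercept column, calories, and a 0/1 column for bakery = 1
X <- model.matrix(~ calories + bakery, data = starbucks)
y <- starbucks$carb

# Least-squares estimates solve the normal equations (X'X) beta = X'y
beta_hat <- solve(crossprod(X), crossprod(X, y))
beta_hat

# lm() gives the same estimates (it uses a QR decomposition internally)
fit <- lm(carb ~ calories + bakery, data = starbucks)
coef(fit)
```

Solving the normal equations by hand is shown only to make the estimation step concrete; in practice `lm()` is the numerically safer choice.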

Carbs vs. Calories

$$\text{carbs} = \beta_0 + \beta_1 \, \text{calories} + \epsilon$$
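
A sketch of fitting this model with `lm()`, assuming openintro is installed; `slr_fit` is an illustrative name. The two coefficients are the estimates of $\beta_0$ and $\beta_1$.

```r
# Assumes openintro is installed
starbucks <- openintro::starbucks

# Least-squares fit of carbs on calories
slr_fit <- lm(carb ~ calories, data = starbucks)

# Estimated beta_0 (intercept) and beta_1 (expected change in carbs per calorie)
coef(slr_fit)
```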


Carbs vs. Calories + Bakery

$$\text{carbs} = \beta_0 + \beta_1 \, \text{calories} + \beta_2 \, \text{bakery} + \epsilon$$
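
Without an interaction term, bakery only shifts the intercept by $\beta_2$, so the fitted lines for the two groups are parallel. A sketch assuming openintro is installed; with `bakery` stored as a factor with levels 0/1, R's default treatment coding names the indicator coefficient `bakery1`. Other names are illustrative.

```r
# Assumes openintro is installed; rebuild the bakery indicator in base R
starbucks <- openintro::starbucks
starbucks$bakery <- factor(ifelse(starbucks$type == "bakery", 1, 0))

# Additive model: one calories slope, bakery shifts the intercept
add_fit <- lm(carb ~ calories + bakery, data = starbucks)
b <- coef(add_fit)

# Intercepts for non-bakery (bakery = 0) and bakery (bakery = 1) items
intercept_other  <- b[["(Intercept)"]]
intercept_bakery <- b[["(Intercept)"]] + b[["bakery1"]]

# Both groups share one slope, so the two fitted lines are parallel
slope <- b[["calories"]]
c(intercept_other = intercept_other,
  intercept_bakery = intercept_bakery,
  slope = slope)
```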


Carbs vs. Calories + Bakery (with interaction)

$$\text{carbs} = \beta_0 + \beta_1 \, \text{calories} + \beta_2 \, \text{bakery} + \beta_3 \, \text{calories} \times \text{bakery} + \epsilon$$
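
Plugging in the two values of the bakery indicator shows what the interaction term does: bakery items get both a different intercept and a different slope.

```latex
\begin{aligned}
\text{bakery} = 0: \quad \text{carbs} &= \beta_0 + \beta_1 \, \text{calories} + \epsilon \\
\text{bakery} = 1: \quad \text{carbs} &= \beta_0 + \beta_1 \, \text{calories} + \beta_2 + \beta_3 \, \text{calories} + \epsilon \\
&= (\beta_0 + \beta_2) + (\beta_1 + \beta_3) \, \text{calories} + \epsilon
\end{aligned}
```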


Code for plot on previous slide

# color = bakery groups the data, so geom_smooth fits a separate
# least-squares line for bakery and non-bakery items
ggplot(data = starbucks, aes(x = calories, y = carb, color = bakery)) +
  geom_point(alpha = 0.5) +
  geom_smooth(method = "lm", se = FALSE) +
  labs(x = "Calories",
       y = "Carbohydrates (grams)",
       color = "Bakery",
       title = "Total Carbohydrates vs. Calories",
       subtitle = "With Interaction") +
  scale_color_manual(values = c("#1B9E77", "#7570B3"))
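
The grouped lines in this plot are the same lines the interaction model produces: with a binary predictor fully interacted with calories, the model's slopes match separate regressions fit within each group. A sketch checking this, assuming openintro is installed; `calories:bakery1` is the name R's default treatment coding gives the interaction coefficient, and the other names are illustrative.

```r
# Assumes openintro is installed; rebuild the bakery indicator in base R
starbucks <- openintro::starbucks
starbucks$bakery <- factor(ifelse(starbucks$type == "bakery", 1, 0))

# calories * bakery expands to calories + bakery + calories:bakery
int_fit <- lm(carb ~ calories * bakery, data = starbucks)
b <- coef(int_fit)

# Slope for non-bakery items is beta_1; for bakery items it is beta_1 + beta_3
slope_other  <- b[["calories"]]
slope_bakery <- b[["calories"]] + b[["calories:bakery1"]]

# Separate regressions within each group give the same slopes
fit_other  <- lm(carb ~ calories, data = subset(starbucks, bakery == "0"))
fit_bakery <- lm(carb ~ calories, data = subset(starbucks, bakery == "1"))
c(slope_other, coef(fit_other)[["calories"]])
c(slope_bakery, coef(fit_bakery)[["calories"]])
```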

Why?

$$\text{carbs} = \beta_0 + \beta_1 \, \text{calories} + \beta_2 \, \text{bakery} + \beta_3 \, \text{calories} \times \text{bakery} + \epsilon$$

Prediction:

What do we expect the total carbohydrates to be in a piece of Starbucks pumpkin bread, a bakery item with 410 calories?

Inference:

What is the relationship between the calories and total carbohydrates for bakery items at Starbucks? For non-bakery items?
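
Both questions can be answered from the fitted interaction model. A minimal sketch, assuming openintro is installed: `predict()` with a prediction interval addresses the pumpkin bread question, and `confint()` gives intervals for the coefficients used in inference. `int_fit`, `new_item`, and `pred` are illustrative names.

```r
# Assumes openintro is installed; rebuild the bakery indicator in base R
starbucks <- openintro::starbucks
starbucks$bakery <- factor(ifelse(starbucks$type == "bakery", 1, 0))

int_fit <- lm(carb ~ calories * bakery, data = starbucks)

# Prediction: expected carbs for a 410-calorie bakery item, with a prediction interval
new_item <- data.frame(calories = 410, bakery = factor(1, levels = c(0, 1)))
pred <- predict(int_fit, newdata = new_item, interval = "prediction")
pred

# Inference: confidence intervals for the coefficients; the bakery and
# non-bakery slopes differ by beta_3 (the calories:bakery1 coefficient)
confint(int_fit)
```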


Course Outline

Unit 1: Quantitative Response Variables

  • Simple Linear Regression
  • Multiple Linear Regression

Unit 2: Categorical Response Variable

  • Logistic Regression
  • Multinomial Logistic Regression

Unit 3: Looking Ahead

  • Log-linear Regression
  • Weighted Least Squares
  • Presenting statistical results