class: center, middle, inverse, title-slide # Model comparison ### Prof. Maria Tackett --- class: middle, center ## [Click here for PDF of slides](16-model-comparison.pdf) --- ## Topics - ANOVA for Multiple Linear Regression - Nested F Test - `\(R^2\)` vs. Adj. `\(R^2\)` - AIC & BIC --- ## Restaurant tips What affects the amount customers tip at a restaurant? - **Response:** - <font class="vocab">`Tip`</font>: amount of the tip - **Predictors:** - <font class="vocab">`Party`</font>: number of people in the party - <font class="vocab">`Meal`</font>: time of day (Lunch, Dinner, Late Night) - <font class="vocab">`Age`</font>: age category of person paying the bill (Yadult, Middle, SenCit) --- ## Response Variable <img src="16-model-comparison_files/figure-html/unnamed-chunk-4-1.png" style="display: block; margin: auto;" /> --- ## Predictor Variables <img src="16-model-comparison_files/figure-html/unnamed-chunk-5-1.png" style="display: block; margin: auto;" /> --- ## Response vs. Predictors <img src="16-model-comparison_files/figure-html/unnamed-chunk-6-1.png" style="display: block; margin: auto;" /> --- ## Restaurant tips: model |term | estimate| std.error| statistic| p.value| conf.low| conf.high| |:-----------|--------:|---------:|---------:|-------:|--------:|---------:| |(Intercept) | 0.838| 0.397| 2.112| 0.036| 0.055| 1.622| |Party | 1.837| 0.124| 14.758| 0.000| 1.591| 2.083| |AgeSenCit | 0.379| 0.410| 0.925| 0.356| -0.430| 1.189| |AgeYadult | -1.009| 0.408| -2.475| 0.014| -1.813| -0.204| <br> .center[ .question[ **Is this the best model to explain variation in Tips?** ] ] --- ## ANOVA test for MLR Using the ANOVA table, we can test whether any variable in the model is a significant predictor of the response. 
We conduct this test using the following hypotheses: .eq[ `$$\begin{aligned}&H_0: \beta_{1} = \beta_{2} = \dots = \beta_p = 0 \\ &H_a: \text{at least one }\beta_j \text{ is not equal to 0}\end{aligned}$$` ] - The statistic for this test is the `\(F\)` test statistic in the ANOVA table - We calculate the p-value using an `\(F\)` distribution with `\(p\)` and `\((n-p-1)\)` degrees of freedom --- ## Tips: ANOVA Test <table> <thead> <tr> <th style="text-align:left;"> term </th> <th style="text-align:right;"> df </th> <th style="text-align:right;"> sumsq </th> <th style="text-align:right;"> meansq </th> <th style="text-align:right;"> statistic </th> <th style="text-align:right;"> p.value </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;background-color: #dce5b2 !important;"> Party </td> <td style="text-align:right;background-color: #dce5b2 !important;"> 1 </td> <td style="text-align:right;background-color: #dce5b2 !important;"> 1188.636 </td> <td style="text-align:right;background-color: #dce5b2 !important;"> 1188.636 </td> <td style="text-align:right;background-color: #dce5b2 !important;"> 285.712 </td> <td style="text-align:right;background-color: #dce5b2 !important;"> 0.000 </td> </tr> <tr> <td style="text-align:left;background-color: #dce5b2 !important;"> Age </td> <td style="text-align:right;background-color: #dce5b2 !important;"> 2 </td> <td style="text-align:right;background-color: #dce5b2 !important;"> 38.028 </td> <td style="text-align:right;background-color: #dce5b2 !important;"> 19.014 </td> <td style="text-align:right;background-color: #dce5b2 !important;"> 4.570 </td> <td style="text-align:right;background-color: #dce5b2 !important;"> 0.012 </td> </tr> <tr> <td style="text-align:left;"> Residuals </td> <td style="text-align:right;"> 165 </td> <td style="text-align:right;"> 686.444 </td> <td style="text-align:right;"> 4.160 </td> <td style="text-align:right;"> </td> <td style="text-align:right;"> </td> </tr> </tbody> </table> -- **Model df**: 3 
**Model SS**: 1188.636 + 38.028 = 1226.664 **Model MS**: 1226.664 / 3 = 408.888 -- **F Stat**: 408.888 / 4.160 = 98.290 **P-value**: P(F > 98.290) `\(\approx 0\)` - calculated using an F distribution with 3 and 165 degrees of freedom --- ## Tips: ANOVA Test <table> <thead> <tr> <th style="text-align:left;"> term </th> <th style="text-align:right;"> df </th> <th style="text-align:right;"> sumsq </th> <th style="text-align:right;"> meansq </th> <th style="text-align:right;"> statistic </th> <th style="text-align:right;"> p.value </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;background-color: #dce5b2 !important;"> Party </td> <td style="text-align:right;background-color: #dce5b2 !important;"> 1 </td> <td style="text-align:right;background-color: #dce5b2 !important;"> 1188.636 </td> <td style="text-align:right;background-color: #dce5b2 !important;"> 1188.636 </td> <td style="text-align:right;background-color: #dce5b2 !important;"> 285.712 </td> <td style="text-align:right;background-color: #dce5b2 !important;"> 0.000 </td> </tr> <tr> <td style="text-align:left;background-color: #dce5b2 !important;"> Age </td> <td style="text-align:right;background-color: #dce5b2 !important;"> 2 </td> <td style="text-align:right;background-color: #dce5b2 !important;"> 38.028 </td> <td style="text-align:right;background-color: #dce5b2 !important;"> 19.014 </td> <td style="text-align:right;background-color: #dce5b2 !important;"> 4.570 </td> <td style="text-align:right;background-color: #dce5b2 !important;"> 0.012 </td> </tr> <tr> <td style="text-align:left;"> Residuals </td> <td style="text-align:right;"> 165 </td> <td style="text-align:right;"> 686.444 </td> <td style="text-align:right;"> 4.160 </td> <td style="text-align:right;"> </td> <td style="text-align:right;"> </td> </tr> </tbody> </table> .vocab[The data provide sufficient evidence to conclude that at least one coefficient is non-zero, i.e. at least one predictor in the model is significant.] 
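The p-value above can be computed directly with base R's `pf()`; a short sketch, with the numbers copied from the ANOVA table on this slide:

```r
# Model mean square = Model SS / Model df
model_ms <- (1188.636 + 38.028) / 3   # 408.888

# F statistic = Model MS / Residual MS
f_stat <- model_ms / 4.160            # approximately 98.29

# P(F > f_stat) for an F distribution with 3 and 165 degrees of freedom
pf(f_stat, df1 = 3, df2 = 165, lower.tail = FALSE)
```

The result is numerically indistinguishable from 0, matching the `\(\approx 0\)` reported above.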
--- ## Testing subset of coefficients - Sometimes we want to test whether a **subset of coefficients** are all equal to 0 - This is often the case when we want to test - whether a categorical variable with `\(k\)` levels is a significant predictor of the response - whether the interaction between a categorical and quantitative variable is significant - To do so, we will use the .vocab[Nested (Partial) F Test] --- ## Nested (Partial) F Test - Suppose we have a full and reduced model: `$$\begin{aligned}&\text{Full}: y = \beta_0 + \beta_1 x_1 + \dots + \beta_q x_q + \beta_{q+1} x_{q+1} + \dots + \beta_p x_p \\ &\text{Reduced}: y = \beta_0 + \beta_1 x_1 + \dots + \beta_q x_q\end{aligned}$$` -- - We want to test whether any of the variables `\(x_{q+1}, x_{q+2}, \ldots, x_p\)` are significant predictors. To do so, we will test the hypotheses: .eq[ `$$\begin{aligned}&H_0: \beta_{q+1} = \beta_{q+2} = \dots = \beta_p = 0 \\ &H_a: \text{at least one }\beta_j \text{ is not equal to 0}\end{aligned}$$` ] --- ## Nested F Test - The test statistic for this test is `$$F = \frac{(SSE_{reduced} - SSE_{full})\big/\text{# predictors tested}}{SSE_{full}\big/(n-p_{full}-1)}$$` <br> - Calculate the p-value using the F distribution with df1 = # predictors tested and df2 = `\((n-p_{full}-1)\)` --- ## Is `Meal` a significant predictor of tips? 
<table> <thead> <tr> <th style="text-align:left;"> term </th> <th style="text-align:right;"> estimate </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> (Intercept) </td> <td style="text-align:right;"> 1.254 </td> </tr> <tr> <td style="text-align:left;"> Party </td> <td style="text-align:right;"> 1.808 </td> </tr> <tr> <td style="text-align:left;"> AgeSenCit </td> <td style="text-align:right;"> 0.390 </td> </tr> <tr> <td style="text-align:left;"> AgeYadult </td> <td style="text-align:right;"> -0.505 </td> </tr> <tr> <td style="text-align:left;background-color: #dce5b2 !important;"> MealLate Night </td> <td style="text-align:right;background-color: #dce5b2 !important;"> -1.632 </td> </tr> <tr> <td style="text-align:left;background-color: #dce5b2 !important;"> MealLunch </td> <td style="text-align:right;background-color: #dce5b2 !important;"> -0.612 </td> </tr> </tbody> </table> --- ## Tips: Nested F test `$$\begin{aligned}&H_0: \beta_{late night} = \beta_{lunch} = 0\\ &H_a: \text{ at least one }\beta_j \text{ is not equal to 0}\end{aligned}$$` -- ```r reduced <- lm(Tip ~ Party + Age, data = tips) ``` -- ```r full <- lm(Tip ~ Party + Age + Meal, data = tips) ``` -- ```r #Nested F test in R anova(reduced, full) ``` --- ## Tips: Nested F test | Res.Df| RSS| Df| Sum of Sq| F| Pr(>F)| |------:|-------:|--:|---------:|-----:|------:| | 165| 686.444| | | | | | 163| 622.979| 2| 63.465| 8.303| 0| -- .vocab[F Stat]: `\(\frac{(686.444 - 622.979)/2}{622.979/(169 - 5 - 1)} = 8.303\)` -- .vocab[P-value]: P(F > 8.303) = 0.0003 - calculated using an F distribution with 2 and 163 degrees of freedom -- .vocab[The data provide sufficient evidence to conclude that at least one coefficient associated with `Meal` is not zero. Therefore, `Meal` is a significant predictor of `Tips`.] 
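The output of `anova(reduced, full)` can be verified by hand with base R, using the RSS values from the table above (a sketch):

```r
# Nested F statistic: drop in SSE per predictor tested,
# relative to the full model's mean squared error
f_stat <- ((686.444 - 622.979) / 2) / (622.979 / (169 - 5 - 1))  # 8.303

# P(F > f_stat) with 2 and 163 degrees of freedom
pf(f_stat, df1 = 2, df2 = 163, lower.tail = FALSE)  # approximately 0.0003
```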
--- ## Model with `Meal` <table> <thead> <tr> <th style="text-align:left;"> term </th> <th style="text-align:right;"> estimate </th> <th style="text-align:right;"> std.error </th> <th style="text-align:right;"> statistic </th> <th style="text-align:right;"> p.value </th> <th style="text-align:right;"> conf.low </th> <th style="text-align:right;"> conf.high </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> (Intercept) </td> <td style="text-align:right;"> 1.254 </td> <td style="text-align:right;"> 0.394 </td> <td style="text-align:right;"> 3.182 </td> <td style="text-align:right;"> 0.002 </td> <td style="text-align:right;"> 0.476 </td> <td style="text-align:right;"> 2.032 </td> </tr> <tr> <td style="text-align:left;"> Party </td> <td style="text-align:right;"> 1.808 </td> <td style="text-align:right;"> 0.121 </td> <td style="text-align:right;"> 14.909 </td> <td style="text-align:right;"> 0.000 </td> <td style="text-align:right;"> 1.568 </td> <td style="text-align:right;"> 2.047 </td> </tr> <tr> <td style="text-align:left;"> AgeSenCit </td> <td style="text-align:right;"> 0.390 </td> <td style="text-align:right;"> 0.394 </td> <td style="text-align:right;"> 0.990 </td> <td style="text-align:right;"> 0.324 </td> <td style="text-align:right;"> -0.388 </td> <td style="text-align:right;"> 1.168 </td> </tr> <tr> <td style="text-align:left;"> AgeYadult </td> <td style="text-align:right;"> -0.505 </td> <td style="text-align:right;"> 0.412 </td> <td style="text-align:right;"> -1.227 </td> <td style="text-align:right;"> 0.222 </td> <td style="text-align:right;"> -1.319 </td> <td style="text-align:right;"> 0.308 </td> </tr> <tr> <td style="text-align:left;"> MealLate Night </td> <td style="text-align:right;"> -1.632 </td> <td style="text-align:right;"> 0.407 </td> <td style="text-align:right;"> -4.013 </td> <td style="text-align:right;"> 0.000 </td> <td style="text-align:right;"> -2.435 </td> <td style="text-align:right;"> -0.829 </td> </tr> <tr> <td 
style="text-align:left;"> MealLunch </td> <td style="text-align:right;"> -0.612 </td> <td style="text-align:right;"> 0.402 </td> <td style="text-align:right;"> -1.523 </td> <td style="text-align:right;"> 0.130 </td> <td style="text-align:right;"> -1.405 </td> <td style="text-align:right;"> 0.181 </td> </tr> </tbody> </table> --- ## Including interactions Does the effect of `Party` differ based on the `Meal` time? |term | estimate| |:--------------------|--------:| |(Intercept) | 1.276| |Party | 1.795| |AgeSenCit | 0.401| |AgeYadult | -0.470| |MealLate Night | -1.845| |MealLunch | -0.461| |Party:MealLate Night | 0.111| |Party:MealLunch | -0.050| --- ## Nested F test for interactions Let's use a Nested F test to determine if the `Party*Meal` interaction is statistically significant. ```r reduced <- lm(Tip ~ Party + Age + Meal, data = tips) ``` -- ```r full <- lm(Tip ~ Party + Age + Meal + Meal * Party, data = tips) ``` -- | Res.Df| RSS| Df| Sum of Sq| F| Pr(>F)| |------:|-------:|--:|---------:|-----:|------:| | 163| 622.979| | | | | | 161| 621.965| 2| 1.014| 0.131| 0.877| --- ## Final model for now We conclude that the effect of **`Party`** does not differ based on **`Meal`**. Therefore, we will use the original model that only included main effects. 
<table> <thead> <tr> <th style="text-align:left;"> term </th> <th style="text-align:right;"> estimate </th> <th style="text-align:right;"> std.error </th> <th style="text-align:right;"> statistic </th> <th style="text-align:right;"> p.value </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> (Intercept) </td> <td style="text-align:right;"> 1.254 </td> <td style="text-align:right;"> 0.394 </td> <td style="text-align:right;"> 3.182 </td> <td style="text-align:right;"> 0.002 </td> </tr> <tr> <td style="text-align:left;"> Party </td> <td style="text-align:right;"> 1.808 </td> <td style="text-align:right;"> 0.121 </td> <td style="text-align:right;"> 14.909 </td> <td style="text-align:right;"> 0.000 </td> </tr> <tr> <td style="text-align:left;"> AgeSenCit </td> <td style="text-align:right;"> 0.390 </td> <td style="text-align:right;"> 0.394 </td> <td style="text-align:right;"> 0.990 </td> <td style="text-align:right;"> 0.324 </td> </tr> <tr> <td style="text-align:left;"> AgeYadult </td> <td style="text-align:right;"> -0.505 </td> <td style="text-align:right;"> 0.412 </td> <td style="text-align:right;"> -1.227 </td> <td style="text-align:right;"> 0.222 </td> </tr> <tr> <td style="text-align:left;"> MealLate Night </td> <td style="text-align:right;"> -1.632 </td> <td style="text-align:right;"> 0.407 </td> <td style="text-align:right;"> -4.013 </td> <td style="text-align:right;"> 0.000 </td> </tr> <tr> <td style="text-align:left;"> MealLunch </td> <td style="text-align:right;"> -0.612 </td> <td style="text-align:right;"> 0.402 </td> <td style="text-align:right;"> -1.523 </td> <td style="text-align:right;"> 0.130 </td> </tr> </tbody> </table> --- class: middle, center ## Model comparison --- ## `\(R^2\)` **Recall**: `\(\color{purple}{R^2}\)` is the proportion of the variation in the response variable explained by the regression model -- `\(R^2\)` will always increase as we add more variables to the model + If we add enough variables, we can always achieve 
`\(R^2=100\%\)` -- If we only use `\(R^2\)` to choose a best-fit model, we will be prone to choose the model with the most predictor variables --- ## Adjusted `\(R^2\)` .vocab[Adjusted R<sup>2</sup>]: measure that includes a penalty for unnecessary predictor variables -- Similar to `\(R^2\)`, it is a measure of the amount of variation in the response that is explained by the regression model -- Differs from `\(R^2\)` by using the mean squares rather than sums of squares and therefore adjusting for the number of predictor variables --- ## `\(R^2\)` and Adjusted `\(R^2\)` `$$R^2 = \frac{SS_{Model}}{SS_{Total}} = 1 - \frac{SS_{Error}}{SS_{Total}}$$` <br> -- .eq[ `$$Adj. R^2 = 1 - \frac{SS_{Error}/(n-p-1)}{SS_{Total}/(n-1)}$$` ] --- ## Using `\(R^2\)` and `\(Adj. R^2\)` `\(Adj. R^2\)` can be used as a quick assessment to compare the fit of multiple models; however, it should not be the only assessment! -- Use `\(R^2\)` when describing the relationship between the response and predictor variables --- ## Tips: Comparing models Let's compare two models: .small[ ```r model1 <- lm(Tip ~ Party + Age + Meal, data = tips) glance(model1) %>% select(r.squared, adj.r.squared) ``` ``` ## # A tibble: 1 x 2 ## r.squared adj.r.squared ## <dbl> <dbl> ## 1 0.674 0.664 ``` ```r model2 <- lm(Tip ~ Party + Age + Meal + Day, data = tips) glance(model2) %>% select(r.squared, adj.r.squared) ``` ``` ## # A tibble: 1 x 2 ## r.squared adj.r.squared ## <dbl> <dbl> ## 1 0.683 0.662 ``` ] --- ## AIC & BIC .vocab[Akaike's Information Criterion (AIC):] `$$AIC = n\log(SS_\text{Error}) - n \log(n) + 2(p+1)$$` <br> .vocab[Schwarz's Bayesian Information Criterion (BIC):] `$$BIC = n\log(SS_\text{Error}) - n\log(n) + \log(n)\times(p+1)$$` <br> <br> See the [supplemental note](https://github.com/sta210-sp20/supplemental-notes/blob/master/model-selection-criteria.pdf) on AIC & BIC for derivations. 
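The formulas above can be applied directly to `model1`; a sketch using `\(n = 169\)`, the SSE of 622.979 from the nested F test table, and `\(p = 5\)` predictor terms (one for `Party`, two each for `Age` and `Meal`):

```r
n   <- 169      # number of observations
sse <- 622.979  # SSE for model1 (Tip ~ Party + Age + Meal)
p   <- 5        # number of predictor terms

aic <- n * log(sse) - n * log(n) + 2 * (p + 1)
bic <- n * log(sse) - n * log(n) + log(n) * (p + 1)
```

Note these values differ from R's `AIC()` and `BIC()` by an additive constant that is the same for every model fit to the same data, so model comparisons (smaller is better) are unaffected.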
--- ## AIC & BIC `$$\begin{aligned} & AIC = \color{blue}{n\log(SS_\text{Error})} - n \log(n) + 2(p+1) \\ & BIC = \color{blue}{n\log(SS_\text{Error})} - n\log(n) + \log(n)\times(p+1) \end{aligned}$$` -- <br> .vocab3[First Term: Decreases as *p* increases] --- ## AIC & BIC `$$\begin{aligned} & AIC = n\log(SS_\text{Error}) - \color{blue}{n \log(n)} + 2(p+1) \\ & BIC = n\log(SS_\text{Error}) - \color{blue}{n\log(n)} + \log(n)\times(p+1) \end{aligned}$$` <br> .vocab3[Second Term: Fixed for a given sample size *n*] --- ## AIC & BIC `$$\begin{aligned} & AIC = n\log(SS_\text{Error}) - n\log(n) + \color{blue}{2(p+1)} \\ & BIC = n\log(SS_\text{Error}) - n\log(n) + \color{blue}{\log(n)\times(p+1)} \end{aligned}$$` <br> .vocab3[Third Term: Increases as *p* increases] --- ## Using AIC & BIC `$$\begin{aligned} & AIC = n\log(SS_{Error}) - n \log(n) + \color{red}{2(p+1)} \\ & BIC = n\log(SS_{Error}) - n\log(n) + \color{red}{\log(n)\times(p+1)} \end{aligned}$$` <br> <br> - Choose model with the smaller value of AIC or BIC - If `\(n \geq 8\)`, the <font color="red">penalty</font> for BIC is larger than that of AIC, so BIC tends to favor *more parsimonious* models (i.e. models with fewer terms) --- ## Tips: AIC & BIC .small[ ```r model1 <- lm(Tip ~ Party + Age + Meal, data = tips) glance(model1) %>% select(AIC, BIC) ``` ``` ## # A tibble: 1 x 2 ## AIC BIC ## <dbl> <dbl> ## 1 714. 736. ``` ```r model2 <- lm(Tip ~ Party + Age + Meal + Day, data = tips) glance(model2) %>% select(AIC, BIC) ``` ``` ## # A tibble: 1 x 2 ## AIC BIC ## <dbl> <dbl> ## 1 720. 757. ``` ] --- ## Recap - ANOVA for Multiple Linear Regression - Nested F Test - `\(R^2\)` vs. Adj. `\(R^2\)` - AIC & BIC