class: center, middle, inverse, title-slide # Model comparison ### Prof. Maria Tackett --- class: middle, center ## [Click here for PDF of slides](16-model-comparison.pdf) --- ## Topics - ANOVA for Multiple Linear Regression - Nested F Test - `\(R^2\)` vs. Adj. `\(R^2\)` - AIC & BIC --- ## Restaurant tips What affects the amount customers tip at a restaurant? - **Response:** - <font class="vocab">`Tip`</font>: amount of the tip - **Predictors:** - <font class="vocab">`Party`</font>: number of people in the party - <font class="vocab">`Meal`</font>: time of day (Lunch, Dinner, Late Night) - <font class="vocab">`Age`</font>: age category of person paying the bill (Yadult, Middle, SenCit) --- ## Response Variable <img src="16-model-comparison_files/figure-html/unnamed-chunk-4-1.png" style="display: block; margin: auto;" /> --- ## Predictor Variables <img src="16-model-comparison_files/figure-html/unnamed-chunk-5-1.png" style="display: block; margin: auto;" /> --- ## Response vs. Predictors <img src="16-model-comparison_files/figure-html/unnamed-chunk-6-1.png" style="display: block; margin: auto;" /> --- ## Restaurant tips: model |term | estimate| std.error| statistic| p.value| conf.low| conf.high| |:-----------|--------:|---------:|---------:|-------:|--------:|---------:| |(Intercept) | 0.838| 0.397| 2.112| 0.036| 0.055| 1.622| |Party | 1.837| 0.124| 14.758| 0.000| 1.591| 2.083| |AgeSenCit | 0.379| 0.410| 0.925| 0.356| -0.430| 1.189| |AgeYadult | -1.009| 0.408| -2.475| 0.014| -1.813| -0.204| <br> .center[ .question[ **Is this the best model to explain variation in Tips?** ] ] --- ## ANOVA test for MLR Using the ANOVA table, we can test whether any variable in the model is a significant predictor of the response. 
We conduct this test using the following hypotheses: .eq[ `$$\begin{aligned}&H_0: \beta_{1} = \beta_{2} = \dots = \beta_p = 0 \\ &H_a: \text{at least one }\beta_j \text{ is not equal to 0}\end{aligned}$$` ] - The statistic for this test is the `\(F\)` test statistic in the ANOVA table - We calculate the p-value using an `\(F\)` distribution with `\(p\)` and `\((n-p-1)\)` degrees of freedom --- ## Tips: ANOVA Test <table> <thead> <tr> <th style="text-align:left;"> term </th> <th style="text-align:right;"> df </th> <th style="text-align:right;"> sumsq </th> <th style="text-align:right;"> meansq </th> <th style="text-align:right;"> statistic </th> <th style="text-align:right;"> p.value </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;background-color: #dce5b2 !important;"> Party </td> <td style="text-align:right;background-color: #dce5b2 !important;"> 1 </td> <td style="text-align:right;background-color: #dce5b2 !important;"> 1188.636 </td> <td style="text-align:right;background-color: #dce5b2 !important;"> 1188.636 </td> <td style="text-align:right;background-color: #dce5b2 !important;"> 285.712 </td> <td style="text-align:right;background-color: #dce5b2 !important;"> 0.000 </td> </tr> <tr> <td style="text-align:left;background-color: #dce5b2 !important;"> Age </td> <td style="text-align:right;background-color: #dce5b2 !important;"> 2 </td> <td style="text-align:right;background-color: #dce5b2 !important;"> 38.028 </td> <td style="text-align:right;background-color: #dce5b2 !important;"> 19.014 </td> <td style="text-align:right;background-color: #dce5b2 !important;"> 4.570 </td> <td style="text-align:right;background-color: #dce5b2 !important;"> 0.012 </td> </tr> <tr> <td style="text-align:left;"> Residuals </td> <td style="text-align:right;"> 165 </td> <td style="text-align:right;"> 686.444 </td> <td style="text-align:right;"> 4.160 </td> <td style="text-align:right;"> </td> <td style="text-align:right;"> </td> </tr> </tbody> </table> -- **Model df**: 3 
**Model SS**: 1188.636 + 38.028 = 1226.664 **Model MS**: 1226.664 / 3 = 408.888 -- **F Stat**: 408.888 / 4.160 = 98.290 **P-value**: P(F > 98.290) `\(\approx 0\)` - calculated using an F distribution with 3 and 165 degrees of freedom --- ## Tips: ANOVA Test <table> <thead> <tr> <th style="text-align:left;"> term </th> <th style="text-align:right;"> df </th> <th style="text-align:right;"> sumsq </th> <th style="text-align:right;"> meansq </th> <th style="text-align:right;"> statistic </th> <th style="text-align:right;"> p.value </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;background-color: #dce5b2 !important;"> Party </td> <td style="text-align:right;background-color: #dce5b2 !important;"> 1 </td> <td style="text-align:right;background-color: #dce5b2 !important;"> 1188.636 </td> <td style="text-align:right;background-color: #dce5b2 !important;"> 1188.636 </td> <td style="text-align:right;background-color: #dce5b2 !important;"> 285.712 </td> <td style="text-align:right;background-color: #dce5b2 !important;"> 0.000 </td> </tr> <tr> <td style="text-align:left;background-color: #dce5b2 !important;"> Age </td> <td style="text-align:right;background-color: #dce5b2 !important;"> 2 </td> <td style="text-align:right;background-color: #dce5b2 !important;"> 38.028 </td> <td style="text-align:right;background-color: #dce5b2 !important;"> 19.014 </td> <td style="text-align:right;background-color: #dce5b2 !important;"> 4.570 </td> <td style="text-align:right;background-color: #dce5b2 !important;"> 0.012 </td> </tr> <tr> <td style="text-align:left;"> Residuals </td> <td style="text-align:right;"> 165 </td> <td style="text-align:right;"> 686.444 </td> <td style="text-align:right;"> 4.160 </td> <td style="text-align:right;"> </td> <td style="text-align:right;"> </td> </tr> </tbody> </table> .vocab[The data provide sufficient evidence to conclude that at least one coefficient is non-zero, i.e. at least one predictor in the model is significant.] 
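The p-value above can be computed directly with base R's `pf()`; a short sketch, with the numbers copied from the ANOVA table on this slide:

```r
# Model mean square = Model SS / Model df
model_ms <- (1188.636 + 38.028) / 3   # 408.888

# F statistic = Model MS / Residual MS
f_stat <- model_ms / 4.160            # approximately 98.29

# P(F > f_stat) for an F distribution with 3 and 165 degrees of freedom
pf(f_stat, df1 = 3, df2 = 165, lower.tail = FALSE)
```

The result is numerically indistinguishable from 0, matching the `\(\approx 0\)` reported above.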
--- ## Testing subset of coefficients - Sometimes we want to test whether a **subset of coefficients** are all equal to 0 - This is often the case when we want to test - whether a categorical variable with `\(k\)` levels is a significant predictor of the response - whether the interaction between a categorical and quantitative variable is significant - To do so, we will use the .vocab[Nested (Partial) F Test] --- ## Nested (Partial) F Test - Suppose we have a full and reduced model: `$$\begin{aligned}&\text{Full}: y = \beta_0 + \beta_1 x_1 + \dots + \beta_q x_q + \beta_{q+1} x_{q+1} + \dots + \beta_p x_p \\ &\text{Reduced}: y = \beta_0 + \beta_1 x_1 + \dots + \beta_q x_q\end{aligned}$$` -- - We want to test whether any of the variables `\(x_{q+1}, x_{q+2}, \ldots, x_p\)` are significant predictors. To do so, we will test the hypotheses: .eq[ `$$\begin{aligned}&H_0: \beta_{q+1} = \beta_{q+2} = \dots = \beta_p = 0 \\ &H_a: \text{at least one }\beta_j \text{ is not equal to 0}\end{aligned}$$` ] --- ## Nested F Test - The test statistic for this test is `$$F = \frac{(SSE_{reduced} - SSE_{full})\big/\text{# predictors tested}}{SSE_{full}\big/(n-p_{full}-1)}$$` <br> - Calculate the p-value using the F distribution with df1 = # predictors tested and df2 = `\((n-p_{full}-1)\)` --- ## Is `Meal` a significant predictor of tips? 
<table> <thead> <tr> <th style="text-align:left;"> term </th> <th style="text-align:right;"> estimate </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> (Intercept) </td> <td style="text-align:right;"> 1.254 </td> </tr> <tr> <td style="text-align:left;"> Party </td> <td style="text-align:right;"> 1.808 </td> </tr> <tr> <td style="text-align:left;"> AgeSenCit </td> <td style="text-align:right;"> 0.390 </td> </tr> <tr> <td style="text-align:left;"> AgeYadult </td> <td style="text-align:right;"> -0.505 </td> </tr> <tr> <td style="text-align:left;background-color: #dce5b2 !important;"> MealLate Night </td> <td style="text-align:right;background-color: #dce5b2 !important;"> -1.632 </td> </tr> <tr> <td style="text-align:left;background-color: #dce5b2 !important;"> MealLunch </td> <td style="text-align:right;background-color: #dce5b2 !important;"> -0.612 </td> </tr> </tbody> </table> --- ## Tips: Nested F test `$$\begin{aligned}&H_0: \beta_{late night} = \beta_{lunch} = 0\\ &H_a: \text{ at least one }\beta_j \text{ is not equal to 0}\end{aligned}$$` -- ```r reduced <- lm(Tip ~ Party + Age, data = tips) ``` -- ```r full <- lm(Tip ~ Party + Age + Meal, data = tips) ``` -- ```r #Nested F test in R anova(reduced, full) ``` --- ## Tips: Nested F test | Res.Df| RSS| Df| Sum of Sq| F| Pr(>F)| |------:|-------:|--:|---------:|-----:|------:| | 165| 686.444| | | | | | 163| 622.979| 2| 63.465| 8.303| 0| -- .vocab[F Stat]: `\(\frac{(686.444 - 622.979)/2}{622.979/(169 - 5 - 1)} = 8.303\)` -- .vocab[P-value]: P(F > 8.303) = 0.0003 - calculated using an F distribution with 2 and 163 degrees of freedom -- .vocab[The data provide sufficient evidence to conclude that at least one coefficient associated with `Meal` is not zero. Therefore, `Meal` is a significant predictor of `Tips`.] 
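The output of `anova(reduced, full)` can be verified by hand with base R, using the RSS values from the table above (a sketch):

```r
# Nested F statistic: drop in SSE per predictor tested,
# relative to the full model's mean squared error
f_stat <- ((686.444 - 622.979) / 2) / (622.979 / (169 - 5 - 1))  # 8.303

# P(F > f_stat) with 2 and 163 degrees of freedom
pf(f_stat, df1 = 2, df2 = 163, lower.tail = FALSE)  # approximately 0.0003
```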
--- ## Model with `Meal` <table> <thead> <tr> <th style="text-align:left;"> term </th> <th style="text-align:right;"> estimate </th> <th style="text-align:right;"> std.error </th> <th style="text-align:right;"> statistic </th> <th style="text-align:right;"> p.value </th> <th style="text-align:right;"> conf.low </th> <th style="text-align:right;"> conf.high </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> (Intercept) </td> <td style="text-align:right;"> 1.254 </td> <td style="text-align:right;"> 0.394 </td> <td style="text-align:right;"> 3.182 </td> <td style="text-align:right;"> 0.002 </td> <td style="text-align:right;"> 0.476 </td> <td style="text-align:right;"> 2.032 </td> </tr> <tr> <td style="text-align:left;"> Party </td> <td style="text-align:right;"> 1.808 </td> <td style="text-align:right;"> 0.121 </td> <td style="text-align:right;"> 14.909 </td> <td style="text-align:right;"> 0.000 </td> <td style="text-align:right;"> 1.568 </td> <td style="text-align:right;"> 2.047 </td> </tr> <tr> <td style="text-align:left;"> AgeSenCit </td> <td style="text-align:right;"> 0.390 </td> <td style="text-align:right;"> 0.394 </td> <td style="text-align:right;"> 0.990 </td> <td style="text-align:right;"> 0.324 </td> <td style="text-align:right;"> -0.388 </td> <td style="text-align:right;"> 1.168 </td> </tr> <tr> <td style="text-align:left;"> AgeYadult </td> <td style="text-align:right;"> -0.505 </td> <td style="text-align:right;"> 0.412 </td> <td style="text-align:right;"> -1.227 </td> <td style="text-align:right;"> 0.222 </td> <td style="text-align:right;"> -1.319 </td> <td style="text-align:right;"> 0.308 </td> </tr> <tr> <td style="text-align:left;"> MealLate Night </td> <td style="text-align:right;"> -1.632 </td> <td style="text-align:right;"> 0.407 </td> <td style="text-align:right;"> -4.013 </td> <td style="text-align:right;"> 0.000 </td> <td style="text-align:right;"> -2.435 </td> <td style="text-align:right;"> -0.829 </td> </tr> <tr> <td 
style="text-align:left;"> MealLunch </td> <td style="text-align:right;"> -0.612 </td> <td style="text-align:right;"> 0.402 </td> <td style="text-align:right;"> -1.523 </td> <td style="text-align:right;"> 0.130 </td> <td style="text-align:right;"> -1.405 </td> <td style="text-align:right;"> 0.181 </td> </tr> </tbody> </table> --- ## Including interactions Does the effect of `Party` differ based on the `Meal` time? |term | estimate| |:--------------------|--------:| |(Intercept) | 1.276| |Party | 1.795| |AgeSenCit | 0.401| |AgeYadult | -0.470| |MealLate Night | -1.845| |MealLunch | -0.461| |Party:MealLate Night | 0.111| |Party:MealLunch | -0.050| --- ## Nested F test for interactions Let's use a Nested F test to determine if the `Party*Meal` interaction is statistically significant. ```r reduced <- lm(Tip ~ Party + Age + Meal, data = tips) ``` -- ```r full <- lm(Tip ~ Party + Age + Meal + Meal * Party, data = tips) ``` -- | Res.Df| RSS| Df| Sum of Sq| F| Pr(>F)| |------:|-------:|--:|---------:|-----:|------:| | 163| 622.979| | | | | | 161| 621.965| 2| 1.014| 0.131| 0.877| --- ## Final model for now We conclude that the effect of **`Party`** does not differ based on **`Meal`**. Therefore, we will use the original model that only included main effects. 
<table> <thead> <tr> <th style="text-align:left;"> term </th> <th style="text-align:right;"> estimate </th> <th style="text-align:right;"> std.error </th> <th style="text-align:right;"> statistic </th> <th style="text-align:right;"> p.value </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> (Intercept) </td> <td style="text-align:right;"> 1.254 </td> <td style="text-align:right;"> 0.394 </td> <td style="text-align:right;"> 3.182 </td> <td style="text-align:right;"> 0.002 </td> </tr> <tr> <td style="text-align:left;"> Party </td> <td style="text-align:right;"> 1.808 </td> <td style="text-align:right;"> 0.121 </td> <td style="text-align:right;"> 14.909 </td> <td style="text-align:right;"> 0.000 </td> </tr> <tr> <td style="text-align:left;"> AgeSenCit </td> <td style="text-align:right;"> 0.390 </td> <td style="text-align:right;"> 0.394 </td> <td style="text-align:right;"> 0.990 </td> <td style="text-align:right;"> 0.324 </td> </tr> <tr> <td style="text-align:left;"> AgeYadult </td> <td style="text-align:right;"> -0.505 </td> <td style="text-align:right;"> 0.412 </td> <td style="text-align:right;"> -1.227 </td> <td style="text-align:right;"> 0.222 </td> </tr> <tr> <td style="text-align:left;"> MealLate Night </td> <td style="text-align:right;"> -1.632 </td> <td style="text-align:right;"> 0.407 </td> <td style="text-align:right;"> -4.013 </td> <td style="text-align:right;"> 0.000 </td> </tr> <tr> <td style="text-align:left;"> MealLunch </td> <td style="text-align:right;"> -0.612 </td> <td style="text-align:right;"> 0.402 </td> <td style="text-align:right;"> -1.523 </td> <td style="text-align:right;"> 0.130 </td> </tr> </tbody> </table> --- class: middle, center ## Model comparison --- ## `\(R^2\)` **Recall**: `\(\color{purple}{R^2}\)` is the proportion of the variation in the response variable explained by the regression model -- `\(R^2\)` will always increase as we add more variables to the model + If we add enough variables, we can always achieve 
`\(R^2=100\%\)` -- If we only use `\(R^2\)` to choose a best-fit model, we will be prone to choose the model with the most predictor variables --- ## Adjusted `\(R^2\)` .vocab[Adjusted R<sup>2</sup>]: measure that includes a penalty for unnecessary predictor variables -- Similar to `\(R^2\)`, it is a measure of the amount of variation in the response that is explained by the regression model -- Differs from `\(R^2\)` by using the mean squares rather than sums of squares and therefore adjusting for the number of predictor variables --- ## `\(R^2\)` and Adjusted `\(R^2\)` `$$R^2 = \frac{SS_{Model}}{SS_{Total}} = 1 - \frac{SS_{Error}}{SS_{Total}}$$` <br> -- .eq[ `$$Adj. R^2 = 1 - \frac{SS_{Error}/(n-p-1)}{SS_{Total}/(n-1)}$$` ] --- ## Using `\(R^2\)` and `\(Adj. R^2\)` `\(Adj. R^2\)` can be used as a quick assessment to compare the fit of multiple models; however, it should not be the only assessment! -- Use `\(R^2\)` when describing the relationship between the response and predictor variables --- ## Tips: Comparing models Let's compare two models: .small[ ```r model1 <- lm(Tip ~ Party + Age + Meal, data = tips) glance(model1) %>% select(r.squared, adj.r.squared) ``` ``` ## # A tibble: 1 x 2 ## r.squared adj.r.squared ## <dbl> <dbl> ## 1 0.674 0.664 ``` ```r model2 <- lm(Tip ~ Party + Age + Meal + Day, data = tips) glance(model2) %>% select(r.squared, adj.r.squared) ``` ``` ## # A tibble: 1 x 2 ## r.squared adj.r.squared ## <dbl> <dbl> ## 1 0.683 0.662 ``` ] --- ## AIC & BIC .vocab[Akaike's Information Criterion (AIC):] `$$AIC = n\log(SS_\text{Error}) - n \log(n) + 2(p+1)$$` <br> .vocab[Schwarz's Bayesian Information Criterion (BIC):] `$$BIC = n\log(SS_\text{Error}) - n\log(n) + \log(n)\times(p+1)$$` <br> <br> See the [supplemental note](https://github.com/sta210-sp20/supplemental-notes/blob/master/model-selection-criteria.pdf) on AIC & BIC for derivations. 
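The formulas above can be applied directly to `model1`; a sketch using `\(n = 169\)`, the SSE of 622.979 from the nested F test table, and `\(p = 5\)` predictor terms (one for `Party`, two each for `Age` and `Meal`):

```r
n   <- 169      # number of observations
sse <- 622.979  # SSE for model1 (Tip ~ Party + Age + Meal)
p   <- 5        # number of predictor terms

aic <- n * log(sse) - n * log(n) + 2 * (p + 1)
bic <- n * log(sse) - n * log(n) + log(n) * (p + 1)
```

Note these values differ from R's `AIC()` and `BIC()` by an additive constant that is the same for every model fit to the same data, so model comparisons (smaller is better) are unaffected.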
--- ## AIC & BIC `$$\begin{aligned} & AIC = \color{blue}{n\log(SS_\text{Error})} - n \log(n) + 2(p+1) \\ & BIC = \color{blue}{n\log(SS_\text{Error})} - n\log(n) + \log(n)\times(p+1) \end{aligned}$$` -- <br> .vocab3[First Term: Decreases as *p* increases] --- ## AIC & BIC `$$\begin{aligned} & AIC = n\log(SS_\text{Error}) - \color{blue}{n \log(n)} + 2(p+1) \\ & BIC = n\log(SS_\text{Error}) - \color{blue}{n\log(n)} + \log(n)\times(p+1) \end{aligned}$$` <br> .vocab3[Second Term: Fixed for a given sample size *n*] --- ## AIC & BIC `$$\begin{aligned} & AIC = n\log(SS_\text{Error}) - n\log(n) + \color{blue}{2(p+1)} \\ & BIC = n\log(SS_\text{Error}) - n\log(n) + \color{blue}{\log(n)\times(p+1)} \end{aligned}$$` <br> .vocab3[Third Term: Increases as *p* increases] --- ## Using AIC & BIC `$$\begin{aligned} & AIC = n\log(SS_{Error}) - n \log(n) + \color{red}{2(p+1)} \\ & BIC = n\log(SS_{Error}) - n\log(n) + \color{red}{\log(n)\times(p+1)} \end{aligned}$$` <br> <br> - Choose model with the smaller value of AIC or BIC - If `\(n \geq 8\)`, the <font color="red">penalty</font> for BIC is larger than that of AIC, so BIC tends to favor *more parsimonious* models (i.e. models with fewer terms) --- ## Tips: AIC & BIC .small[ ```r model1 <- lm(Tip ~ Party + Age + Meal, data = tips) glance(model1) %>% select(AIC, BIC) ``` ``` ## # A tibble: 1 x 2 ## AIC BIC ## <dbl> <dbl> ## 1 714. 736. ``` ```r model2 <- lm(Tip ~ Party + Age + Meal + Day, data = tips) glance(model2) %>% select(AIC, BIC) ``` ``` ## # A tibble: 1 x 2 ## AIC BIC ## <dbl> <dbl> ## 1 720. 757. ``` ] --- ## Recap - ANOVA for Multiple Linear Regression - Nested F Test - `\(R^2\)` vs. Adj. `\(R^2\)` - AIC & BIC