You can find the hw-04 repo in the sta210-sp21 organization on GitHub. This repo contains the starter documents and data set needed to complete the assignment.
See the Lab 01 instructions for more details about cloning the repo, starting a new RStudio project, and configuring git.
Here are some tips as you complete the assignment:
We will use the following packages in this assignment:
```r
library(tidyverse)
library(broom)
library(knitr)
# add other packages as needed
```
For Questions 1 - 4, we will use data from an analysis by Siddarth et al. (2018) on the relationship between time spent sitting (sedentary behavior) and the thickness of a participant’s medial temporal lobe (MTL). Their 2018 paper is titled “Sedentary behavior associated with reduced medial temporal lobe thickness in middle-aged and older adults”.
It is important to understand MTL volume, since it is negatively associated with Alzheimer’s disease and memory impairment. Their data on 35 adults can be found in `sitting.csv`. Key variables include:
- `MTL` = Medial temporal lobe thickness in mm
- `sitting` = Reported hours per day spent sitting
- `MET` = Reported metabolic equivalent unit minutes per week
- `age` = Age in years
- `sex` = Sex (`M` = Male, `F` = Female)
- `education` = Years of education completed
In their article’s introduction, Siddarth et al. (2018) differentiate their analysis on sedentary behavior from an analysis on active behavior by citing evidence supporting the claim that, “one can be highly active yet still be sedentary for most of the day.” Fit your own linear model with `MET` and `sitting` as your predictor and response variables, respectively.
Fit a preliminary model with `MTL` as the response and `sitting` as the sole predictor variable. Then, interpret the coefficient of `sitting` in the context of the data.
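As a rough sketch of the mechanics only, the code below fits a simple linear regression and pulls out the slope and its 95% confidence interval. It uses simulated values as a stand-in for `sitting.csv` (the numbers and the name `sitting_sim` are made up for illustration), so the estimates say nothing about the real study. It sticks to base R so it runs on its own; `tidy()` from broom returns the same coefficient table as a tibble.

```r
# Simulated stand-in for sitting.csv: 35 made-up rows, NOT the study data.
set.seed(210)
sitting_sim <- data.frame(sitting = runif(35, min = 3, max = 15))
sitting_sim$MTL <- 2.8 - 0.02 * sitting_sim$sitting + rnorm(35, sd = 0.1)

# Regress MTL thickness (mm) on hours per day spent sitting.
fit_simple <- lm(MTL ~ sitting, data = sitting_sim)

# The slope is the expected change in MTL (mm) for each additional
# hour per day of sitting.
slope <- coef(fit_simple)[["sitting"]]
ci <- confint(fit_simple, "sitting", level = 0.95)

print(coef(summary(fit_simple)))  # estimates, std. errors, t values, p-values
print(ci)                         # 95% confidence interval for the slope
```

An interpretation in context names the units on both sides: for each additional hour per day spent sitting, MTL thickness is expected to change by the slope, in mm, on average.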
Next, let’s extend the model from the previous exercise and consider the model using `sitting`, `MET`, and mean-centered `age` as predictor variables. Fit the model, then interpret the coefficients of `sitting` and `age` in the context of the data.
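To illustrate what mean-centering does, here is a minimal sketch on simulated data (again a made-up stand-in, not the study data; `sim` and `age_cent` are hypothetical names). It fits the model with raw and with centered age so the two sets of coefficients can be compared. In a tidyverse workflow, the centering step could equally be written with `mutate(age_cent = age - mean(age))`.

```r
# Simulated stand-in for sitting.csv: made-up values, NOT the study data.
set.seed(210)
n <- 35
sim <- data.frame(
  sitting = runif(n, 3, 15),
  MET     = runif(n, 100, 6000),
  age     = round(runif(n, 45, 75))
)
sim$MTL <- 2.8 - 0.02 * sim$sitting - 0.005 * (sim$age - 60) + rnorm(n, sd = 0.1)

# Mean-center age: subtract the sample mean so that 0 means "average age".
sim$age_cent <- sim$age - mean(sim$age)

fit_raw  <- lm(MTL ~ sitting + MET + age, data = sim)
fit_cent <- lm(MTL ~ sitting + MET + age_cent, data = sim)

# The slopes are identical in both fits; only the intercept changes.
print(coef(fit_raw))
print(coef(fit_cent))
```

Centering does not change the slopes or the fit. It only moves the intercept, which becomes the expected MTL for a participant of average age when the other predictors are 0, instead of for a participant of age 0.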
Compare the two models you’ve fit to understand variability in `MTL`. Which model would you select based on each of the following criteria: (1) adjusted \(R^2\), (2) AIC, (3) BIC? For each criterion, state your choice and briefly explain your reasoning.
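One way to line up the three criteria side by side is sketched below, using simulated data as a stand-in for the real file (all values and the names `sim`, `fit_small`, and `fit_large` are illustrative assumptions). It uses base R; `glance()` from broom reports the same quantities in its `adj.r.squared`, `AIC`, and `BIC` columns.

```r
# Simulated stand-in for sitting.csv: made-up values, NOT the study data.
set.seed(210)
n <- 35
sim <- data.frame(
  sitting = runif(n, 3, 15),
  MET     = runif(n, 100, 6000),
  age     = round(runif(n, 45, 75))
)
sim$MTL <- 2.8 - 0.02 * sim$sitting - 0.005 * (sim$age - 60) + rnorm(n, sd = 0.1)
sim$age_cent <- sim$age - mean(sim$age)

fit_small <- lm(MTL ~ sitting, data = sim)
fit_large <- lm(MTL ~ sitting + MET + age_cent, data = sim)

# Collect the three criteria for both models in one table.
comparison <- data.frame(
  model         = c("sitting only", "sitting + MET + age_cent"),
  adj_r_squared = c(summary(fit_small)$adj.r.squared,
                    summary(fit_large)$adj.r.squared),
  AIC           = c(AIC(fit_small), AIC(fit_large)),
  BIC           = c(BIC(fit_small), BIC(fit_large))
)
print(comparison)
```

A larger adjusted \(R^2\) is preferred, while smaller AIC and BIC are preferred. With 35 observations, BIC charges \(\log(35) \approx 3.56\) per estimated parameter versus 2 for AIC, so BIC favors smaller models more strongly and the criteria need not agree.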
For Questions 5 - 7 we will go back to data on houses in King County you used in Lab 06. The data set `KingCountyHouses.csv` contains the price and other characteristics of over 20,000 houses sold in King County, Washington (the county that includes Seattle). We will focus on the following variables:
- `price`: selling price of the house in US dollars
- `sqft`: interior square footage
- `waterfront`: 1 if the house has a view of the waterfront, 0 otherwise
As in Lab 06, use only the observations with 1 to 4 bedrooms.
We are interested in fitting a model using the square footage, whether the house has a waterfront view, and the interaction between the two variables to help explain variability in the price. Make a visualization of the price versus square footage with the points differentiated by `waterfront`. Interpret the visualization.
Fit a model with the log-transformed price (see Lab 06 to see why we use log-transformed price!) as the response and `sqft`, `waterfront`, and their interaction as the predictors.
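The sketch below shows the general shape of such a model on simulated data. The values are made up and do not come from `KingCountyHouses.csv`, and the column name `bedrooms` is an assumption about how the bedroom count is stored. It also shows how the interaction term gives each group its own slope, and how exponentiating turns a slope on the log scale into a multiplicative change in price.

```r
# Simulated stand-in for KingCountyHouses.csv: made-up values, NOT the real data.
# The column name `bedrooms` is an assumption for illustration.
set.seed(210)
n <- 200
houses_sim <- data.frame(
  sqft       = round(runif(n, 600, 4000)),
  waterfront = rbinom(n, 1, 0.2),
  bedrooms   = sample(1:6, n, replace = TRUE)
)
houses_sim$price <- exp(
  12 + 0.0004 * houses_sim$sqft + 0.5 * houses_sim$waterfront +
    0.0001 * houses_sim$sqft * houses_sim$waterfront + rnorm(n, sd = 0.2)
)

# Keep only houses with 1 to 4 bedrooms.
houses_sub <- subset(houses_sim, bedrooms >= 1 & bedrooms <= 4)

# sqft * waterfront expands to sqft + waterfront + sqft:waterfront.
fit_int <- lm(log(price) ~ sqft * waterfront, data = houses_sub)
b <- coef(fit_int)

# Slope of log(price) on sqft within each group.
slope_no_wf <- b[["sqft"]]
slope_wf    <- b[["sqft"]] + b[["sqft:waterfront"]]

# Multiplicative change in price per additional 100 square feet.
mult_no_wf <- exp(100 * slope_no_wf)
mult_wf    <- exp(100 * slope_wf)
print(c(no_waterfront = mult_no_wf, waterfront = mult_wf))
```

Because the response is `log(price)`, a multiplier of, say, 1.04 reads as a 4% higher price for each additional 100 square feet. Strictly, this describes the median price rather than the mean. If `waterfront` is stored as a factor rather than as 0/1, the interaction coefficient will have a different name, such as `sqft:waterfront1`.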
Interpret the effect of square footage on the price of a house for
Use the results from Questions 6 - 8 to write a short paragraph (~3-5 sentences) about the relationship between square footage and the price of houses in King County, WA, and how (if at all) the relationship differs based on whether the house has a waterfront view. The paragraph should be written in a way that is practical and can be easily understood by a general audience.
Knit, commit, and push your final changes to your GitHub repo. Then, submit the PDF on Gradescope. See Lab 01 for more detailed submission instructions.
Component | Points |
---|---|
Part 1: Conceptual questions | 40 |
Part 2: Communicating results | 5 |
Document a PDF rendered from the Rmd file with clear question headers | 3 |
At least 3 informative commit messages | 2 |
Total | 50 |
The questions from Part 1 are adapted from Beyond Multiple Linear Regression.