Introduce multinomial logistic regression
Interpret model coefficients
Inference for a coefficient $\beta_{jk}$
In practice, there are many different types of response variables, including binary, nominal (unordered categories), and ordered categories
Models for these response types are all examples of generalized linear models (GLMs), a broader class of models that generalizes the multiple linear regression model
See Generalized Linear Models: A Unifying Theory for more details about GLMs
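$$\log\left(\frac{\hat{\pi}_i}{1 - \hat{\pi}_i}\right) = \hat{\beta}_0 + \hat{\beta}_1 x_i$$
$$\hat{\pi}_i = \frac{\exp\{\hat{\beta}_0 + \hat{\beta}_1 x_i\}}{1 + \exp\{\hat{\beta}_0 + \hat{\beta}_1 x_i\}}$$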
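The two equations above are inverses of one another. A minimal sketch in base R shows the conversion from log-odds to probability and back. The coefficient values `b0` and `b1` are made up for illustration and are not estimates from any model in these slides; `plogis()` and `qlogis()` are base R's inverse-logit and logit functions.

```r
# Hypothetical coefficients, chosen only for illustration
b0 <- -1.5
b1 <- 0.8
x  <- c(0, 1, 2, 3)

# Linear predictor = predicted log-odds
log_odds <- b0 + b1 * x

# Inverse logit written out by hand, and via the built-in plogis()
p_manual  <- exp(log_odds) / (1 + exp(log_odds))
p_builtin <- plogis(log_odds)

# Taking the logit of the probabilities recovers the log-odds
log_odds_back <- qlogis(p_builtin)

data.frame(x, log_odds, p_manual, p_builtin, log_odds_back)
```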
Suppose we consider $y = 0$ the baseline category such that
$$P(y_i = 1 \mid x_i) = \hat{\pi}_{i1} \quad \text{and} \quad P(y_i = 0 \mid x_i) = \hat{\pi}_{i0}$$
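Then the logistic regression model is
$$\log\left(\frac{\hat{\pi}_{i1}}{1 - \hat{\pi}_{i1}}\right) = \log\left(\frac{\hat{\pi}_{i1}}{\hat{\pi}_{i0}}\right) = \hat{\beta}_0 + \hat{\beta}_1 x_i$$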
Slope, $\hat{\beta}_1$: When $x$ increases by one unit, the odds of $y = 1$ versus the baseline $y = 0$ are expected to multiply by a factor of $\exp\{\hat{\beta}_1\}$
Intercept, $\hat{\beta}_0$: When $x = 0$, the predicted odds of $y = 1$ versus the baseline $y = 0$ are $\exp\{\hat{\beta}_0\}$
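Both interpretations follow directly from the model equation. As a short derivation, write $\text{odds}(x)$ for the predicted odds of $y = 1$ versus $y = 0$ at a given value of $x$ (this notation is introduced only for this sketch):

$$
\begin{aligned}
\text{odds}(x) &= \exp\{\hat{\beta}_0 + \hat{\beta}_1 x\} \\
\text{odds}(x + 1) &= \exp\{\hat{\beta}_0 + \hat{\beta}_1 (x + 1)\} = \exp\{\hat{\beta}_1\} \cdot \text{odds}(x) \\
\text{odds}(0) &= \exp\{\hat{\beta}_0\}
\end{aligned}
$$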
Suppose the response variable $y$ is categorical and can take values $1, 2, \ldots, K$, where $K > 2$
$$P(y = 1) = \pi_1, \; P(y = 2) = \pi_2, \; \ldots, \; P(y = K) = \pi_K$$
such that $\sum_{k=1}^{K} \pi_k = 1$
If we have an explanatory variable $x$, then we want to fit a model such that $P(y = k) = \pi_k$ is a function of $x$
Choose a baseline category. Let's choose $y = 1$. Then,
$$\log\left(\frac{\pi_{ik}}{\pi_{i1}}\right) = \beta_{0k} + \beta_{1k} x_i$$
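The equation above holds for each non-baseline category $k = 2, \ldots, K$, giving $K - 1$ equations. Combining them with the constraint that the probabilities sum to 1 recovers each category's probability. A short derivation using the same symbols:

$$
\begin{aligned}
\pi_{ik} &= \pi_{i1} \exp\{\beta_{0k} + \beta_{1k} x_i\}, \quad k = 2, \ldots, K \\
1 = \sum_{k=1}^{K} \pi_{ik} &= \pi_{i1} \left(1 + \sum_{k=2}^{K} \exp\{\beta_{0k} + \beta_{1k} x_i\}\right) \\
\Rightarrow \quad \pi_{i1} &= \frac{1}{1 + \sum_{l=2}^{K} \exp\{\beta_{0l} + \beta_{1l} x_i\}}, \qquad
\pi_{ik} = \frac{\exp\{\beta_{0k} + \beta_{1k} x_i\}}{1 + \sum_{l=2}^{K} \exp\{\beta_{0l} + \beta_{1l} x_i\}}
\end{aligned}
$$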
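Suppose we have a response variable $y$ that can take three possible outcomes that are coded as "A", "B", "C"
Let "A" be the baseline category. Then
$$\log\left(\frac{\pi_{iB}}{\pi_{iA}}\right) = \beta_{0B} + \beta_{1B} x_i$$
$$\log\left(\frac{\pi_{iC}}{\pi_{iA}}\right) = \beta_{0C} + \beta_{1C} x_i$$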
The National Health and Nutrition Examination Survey (NHANES) is conducted by the National Center for Health Statistics (NCHS)
The goal is to "assess the health and nutritional status of adults and children in the United States"
This survey includes an interview and a physical examination
We will use the data from the `NHANES` R package
Contains 75 variables for the 2009 - 2010 and 2011 - 2012 sample years
The data in this package is modified for educational purposes and should not be used for research
Original data can be obtained from the NCHS website for research purposes
Type `?NHANES` in the console to see the list of variables and definitions
Question: Can we use a person's age and whether they do regular physical activity to predict their self-reported health rating?
We will analyze the following variables:
`HealthGen`: Self-reported rating of participant's health in general. Excellent, Vgood, Good, Fair, or Poor.
`Age`: Age at time of screening (in years). Participants 80 or older were recorded as 80.
`PhysActive`: Participant does moderate to vigorous-intensity sports, fitness or recreational activities
```
## Rows: 6,710
## Columns: 4
## $ HealthGen  <fct> Good, Good, Good, Good, Vgood, Vgood, Vgood, Vgood, Vgood, …
## $ Age        <int> 34, 34, 34, 49, 45, 45, 45, 66, 58, 54, 50, 33, 60, 56, 56,…
## $ PhysActive <fct> No, No, No, No, Yes, Yes, Yes, Yes, Yes, Yes, Yes, No, No, …
## $ obs_num    <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, …
```
Use the `multinom()` function in the `nnet` package

```r
library(nnet)

health_m <- multinom(HealthGen ~ Age + PhysActive, data = nhanes_adult)
```

Put `results = "hide"` in the code chunk header to suppress the convergence output

```r
library(broom)  # tidy()
library(knitr)  # kable()
library(dplyr)  # %>%

tidy(health_m, conf.int = TRUE, exponentiate = FALSE) %>%
  kable(digits = 3, format = "markdown")
```
y.level | term | estimate | std.error | statistic | p.value | conf.low | conf.high |
---|---|---|---|---|---|---|---|
Vgood | (Intercept) | 1.205 | 0.145 | 8.325 | 0.000 | 0.922 | 1.489 |
Vgood | Age | 0.001 | 0.002 | 0.369 | 0.712 | -0.004 | 0.006 |
Vgood | PhysActiveYes | -0.321 | 0.093 | -3.454 | 0.001 | -0.503 | -0.139 |
Good | (Intercept) | 1.948 | 0.141 | 13.844 | 0.000 | 1.672 | 2.223 |
Good | Age | -0.002 | 0.002 | -0.977 | 0.329 | -0.007 | 0.002 |
Good | PhysActiveYes | -1.001 | 0.090 | -11.120 | 0.000 | -1.178 | -0.825 |
Fair | (Intercept) | 0.915 | 0.164 | 5.566 | 0.000 | 0.592 | 1.237 |
Fair | Age | 0.003 | 0.003 | 1.058 | 0.290 | -0.003 | 0.009 |
Fair | PhysActiveYes | -1.645 | 0.107 | -15.319 | 0.000 | -1.856 | -1.435 |
Poor | (Intercept) | -1.521 | 0.290 | -5.238 | 0.000 | -2.090 | -0.952 |
Poor | Age | 0.022 | 0.005 | 4.522 | 0.000 | 0.013 | 0.032 |
Poor | PhysActiveYes | -2.656 | 0.236 | -11.275 | 0.000 | -3.117 | -2.194 |
The baseline category for the model is `Excellent`.
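Because Excellent is the baseline, its log-odds against itself are 0, and the fitted coefficients can be turned into predicted probabilities for all five ratings. The sketch below does this by hand in base R using the rounded estimates from the table. The person described (40 years old, physically active) is a hypothetical example, and the results will differ slightly from `predict(health_m, type = "probs")` because of rounding.

```r
# Rounded estimates copied from the coefficient table
# (rows: non-baseline categories; columns: intercept, Age, PhysActiveYes)
coefs <- rbind(
  Vgood = c( 1.205,  0.001, -0.321),
  Good  = c( 1.948, -0.002, -1.001),
  Fair  = c( 0.915,  0.003, -1.645),
  Poor  = c(-1.521,  0.022, -2.656)
)

# Hypothetical person: intercept term, Age = 40, PhysActiveYes = 1
x_new <- c(1, 40, 1)

# Log-odds of each category versus Excellent; the baseline's log-odds is 0
log_odds <- c(Excellent = 0, drop(coefs %*% x_new))

# Convert log-odds to probabilities that sum to 1
probs <- exp(log_odds) / sum(exp(log_odds))
round(probs, 3)
```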
The model equation for the log-odds a person rates themselves as having "Fair" health vs. "Excellent" is
$$\log\left(\frac{\hat{\pi}_{Fair}}{\hat{\pi}_{Excellent}}\right) = 0.915 + 0.003 ~ \text{age} - 1.645 ~ \text{PhysActive}$$
For each additional year in age, the odds a person rates themselves as having fair health versus excellent health are expected to multiply by 1.003 (exp(0.003)), holding physical activity constant.
The odds a person who does physical activity will rate themselves as having fair health versus excellent health are expected to be 0.193 (exp(-1.645)) times the odds for a person who doesn't do physical activity, holding age constant.
The odds a 0-year-old person who doesn't do physical activity rates themselves as having fair health vs. excellent health are 2.497 (exp(0.915)).
⚠️ Need to mean-center age for the intercept to have a meaningful interpretation!
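The three interpretations above can be reproduced by exponentiating the Fair-versus-Excellent estimates. This base R sketch also evaluates the odds at age 45 for someone who is not physically active. The value 45 is an arbitrary illustrative age (not the sample mean, which is not given here), included only to show the kind of quantity a mean-centered intercept would describe.

```r
# Rounded Fair vs. Excellent estimates from the coefficient table
fair <- c(intercept = 0.915, age = 0.003, phys_active = -1.645)

# Odds at age 0 (not active) and the multiplicative effects on the odds
exp_fair <- round(exp(fair), 3)
exp_fair

# Odds of Fair vs. Excellent at a hypothetical age of 45, not physically active
odds_age_45 <- exp(fair[["intercept"]] + fair[["age"]] * 45)
odds_age_45
```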
The test of significance for the coefficient $\beta_{jk}$ is
Hypotheses: $H_0: \beta_{jk} = 0$ vs. $H_a: \beta_{jk} \neq 0$
Test Statistic: $$z = \frac{\hat{\beta}_{jk} - 0}{SE(\hat{\beta}_{jk})}$$
P-value: $P(|Z| > |z|)$, where $Z \sim N(0, 1)$, the Standard Normal distribution
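The $C\%$ confidence interval for $\beta_{jk}$ is
$$\hat{\beta}_{jk} \pm z^{*} SE(\hat{\beta}_{jk})$$
We are $C\%$ confident that for every one-unit increase in $x_j$, the odds of $y = k$ versus the baseline will multiply by a factor of $\exp\{\hat{\beta}_{jk} - z^{*} SE(\hat{\beta}_{jk})\}$ to $\exp\{\hat{\beta}_{jk} + z^{*} SE(\hat{\beta}_{jk})\}$, holding all else constant.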
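As a rough check of these formulas, the base R sketch below computes the test statistic, two-sided p-value, and confidence interval from an estimate and its standard error. The helper name `wald_summary` is made up for this example, and the inputs are the rounded `PhysActiveYes` values from the Fair rows of the model output, so the results only approximately match the table.

```r
# Hypothetical helper: Wald z-test and confidence interval for one coefficient
wald_summary <- function(estimate, std_error, level = 0.95) {
  z      <- (estimate - 0) / std_error             # test statistic under H0: beta = 0
  p      <- 2 * pnorm(abs(z), lower.tail = FALSE)  # P(|Z| > |z|)
  z_star <- qnorm(1 - (1 - level) / 2)             # critical value, about 1.96 for 95%
  c(statistic = z,
    p.value   = p,
    conf.low  = estimate - z_star * std_error,
    conf.high = estimate + z_star * std_error)
}

# PhysActiveYes, Fair vs. Excellent (rounded values from the model output)
phys <- wald_summary(estimate = -1.645, std_error = 0.107)
round(phys, 3)
```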
y.level | term | estimate | std.error | statistic | p.value | conf.low | conf.high |
---|---|---|---|---|---|---|---|
Fair | (Intercept) | 0.915 | 0.164 | 5.566 | 0.00 | 0.592 | 1.237 |
Fair | Age | 0.003 | 0.003 | 1.058 | 0.29 | -0.003 | 0.009 |
Fair | PhysActiveYes | -1.645 | 0.107 | -15.319 | 0.00 | -1.856 | -1.435 |
We are 95% confident that for each additional year in age, the odds a person rates themselves as having fair health versus excellent health will multiply by 0.997 (exp(-0.003)) to 1.009 (exp(0.009)), holding physical activity constant.
We are 95% confident that the odds a person who does physical activity will rate themselves as having fair health versus excellent health are 0.156 (exp(-1.856)) to 0.238 (exp(-1.435)) times the odds for a person who doesn't do physical activity, holding age constant.
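The odds-scale intervals quoted above come from exponentiating the endpoints of the log-odds intervals. A short base R sketch, using the rounded `conf.low` and `conf.high` values from the Fair rows of the table:

```r
# Log-odds confidence limits for the Fair vs. Excellent coefficients
ci_log_odds <- rbind(
  Age           = c(conf.low = -0.003, conf.high =  0.009),
  PhysActiveYes = c(conf.low = -1.856, conf.high = -1.435)
)

# Exponentiate the endpoints to get intervals for the multiplicative change in odds
ci_odds <- exp(ci_log_odds)
round(ci_odds, 3)
```

Setting `exponentiate = TRUE` in the `tidy()` call should report the estimates and intervals on this scale directly.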