class: center, middle, inverse, title-slide

# Statistical inference review

### Prof. Maria Tackett

---

class: middle, center

## [Click for PDF of slides](03-inference-review.pdf)

---

## Topics

- Sampling distributions and the Central Limit Theorem

- Hypothesis test to test a claim about a population parameter

- Confidence interval to estimate a population parameter

---

class: center, middle

## Sample Statistics and Sampling Distributions

---

## Terminology

.vocab[Population]: a group of individuals or objects we are interested in studying

.vocab[Parameter]: a numerical quantity derived from the population (almost always unknown)

If we had data from every unit in the population, we could just calculate population parameters and be done!

--

**Unfortunately, we usually cannot do this.**

.vocab[Sample]: a subset of our population of interest

.vocab[Statistic]: a numerical quantity derived from a sample

---

## Inference

If the sample is .vocab[representative], then we can use the tools of probability and statistical inference to make .vocab[generalizable] conclusions about the broader population of interest.

<img src="img/03/soup.png" width="406" height="234" style="display: block; margin: auto;" />

This is similar to tasting a spoonful of soup while cooking to make an inference about the entire pot.

---

## Statistical inference

.vocab[Statistical inference] is the process of using sample data to make conclusions about the underlying population the sample came from.

- .vocab[Estimation]: using the sample to estimate a plausible range of values for the unknown parameter

- .vocab[Testing]: evaluating whether our observed sample provides evidence for or against some claim about the population

---

## Let's \*virtually\* go to Asheville!

.center[
<img src="img/03/asheville.jpg" width="50%" style="display: block; margin: auto;" />

**How much should we expect to pay for an Airbnb in Asheville?**
]

---

## Asheville data

[Inside Airbnb](http://insideairbnb.com/) scraped all Airbnb listings in Asheville, NC, that were active on June 25, 2020.

**Population of interest**: listings in Asheville with at least ten reviews.

**Parameter of interest**: mean price per guest per night among these listings.

.question[
What is the mean price per guest per night among Airbnb rentals with at least ten reviews in Asheville (zip codes 28801 - 28806) in June 2020?
]

---

## Visualizing our sample

We have data on the price per guest (`ppg`) for a random sample of 50 Airbnb listings.

<img src="03-inference-review_files/figure-html/unnamed-chunk-4-1.png" width="70%" style="display: block; margin: auto;" />

---

## Sample statistic

A .vocab[sample statistic (point estimate)] is a single value of a statistic computed from the sample data to serve as the "best guess", or estimate, for the population parameter.

```r
abb %>%
  summarize(mean_price = mean(ppg))
```

```
## # A tibble: 1 x 1
##   mean_price
##        <dbl>
## 1       76.6
```

--

If we took another random sample of 50 Airbnbs in Asheville, we'd likely have a different sample statistic.

---

## Variability of sample statistics

- Each sample from the population yields a slightly different sample statistic.

- The sample-to-sample difference is called .vocab[sampling variability].

- We can use theory to help us understand the underlying .vocab[sampling distribution] and quantify this sample-to-sample variability.
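---

## Sampling variability in code

To make sampling variability concrete, here is a minimal sketch of how one could draw another random sample of 50 listings and recompute the mean. It assumes access to a data frame of all eligible Asheville listings, called `asheville_listings` below with a `ppg` column; this population data frame is hypothetical and is not the `abb` sample used in these slides.

```r
library(dplyr)

# draw a new random sample of 50 listings from the (hypothetical) population
# and compute its mean price per guest per night
set.seed(20200625)
asheville_listings %>%
  sample_n(size = 50) %>%
  summarize(mean_price = mean(ppg))
```

Re-running this with a different seed would give a slightly different `mean_price`; that spread across repeated samples is exactly the sampling variability described on the previous slide.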
---

## The goal of statistical inference

- Statistical inference is the act of generalizing from a sample in order to make conclusions regarding a population.

- We are interested in population parameters, which we do not observe. Instead, we must calculate statistics from our sample in order to learn about them.

- As part of this process, we must quantify the degree of uncertainty in our sample statistic.

---

## Sampling distribution of the mean

We're interested in the mean price per guest per night at Airbnbs in Asheville, so suppose we were able to do the following:

--

1. Take a random sample of size `\(n\)` from this population, and calculate the mean price per guest per night in this sample, `\(\bar{X}_1\)`

--

2. Put the sample back, take a second random sample of size `\(n\)`, and calculate the mean price per guest per night from this new sample, `\(\bar{X}_2\)`

--

3. Put the sample back, take a third random sample of size `\(n\)`, and calculate the mean price per guest per night from this sample, too...

--

...and so on.

---

## Sampling distribution of the mean

After repeating this many times, we have a dataset that has the `\(K\)` sample averages from the population: `\(\bar{X}_1\)`, `\(\bar{X}_2\)`, `\(\cdots\)`, `\(\bar{X}_K\)`

--

.question[
Can we say anything about the distribution of these sample means (that is, the .vocab[sampling distribution] of the mean)?
]

---

class: center, middle

## The Central Limit Theorem

---

class: middle

A quick caveat...

For now, let's assume we know the underlying standard deviation, `\(\sigma\)`, of our distribution.

---

## The Central Limit Theorem

For a population with a well-defined mean `\(\mu\)` and standard deviation `\(\sigma\)`, these three properties hold for the distribution of the sample average `\(\bar{X}\)`, provided certain conditions are met:

--

1. The mean of the sampling distribution of the mean is identical to the population mean `\(\mu\)`.

--

2. The standard deviation of the distribution of the sample averages is `\(\sigma/\sqrt{n}\)`.
  - This is called the .vocab[standard error] (SE) of the mean.

--

3. For `\(n\)` large enough, the shape of the sampling distribution of means is approximately .vocab[normally distributed].

---

## The normal (Gaussian) distribution

The normal distribution is unimodal and symmetric and is described by its .vocab[density function]:

If a random variable `\(X\)` follows the normal distribution, then

`$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left\{ -\frac{1}{2}\frac{(x - \mu)^2}{\sigma^2} \right\}$$`

where `\(\mu\)` is the mean and `\(\sigma^2\)` is the variance `\((\sigma \text{ is the standard deviation})\)`.

.alert[
We often write `\(N(\mu, \sigma)\)` to describe this distribution.
]

---

## The normal distribution (graphically)

<img src="03-inference-review_files/figure-html/unnamed-chunk-6-1.png" style="display: block; margin: auto;" />

---

## Wait, *any* population distribution?

The Central Limit Theorem tells us that *<b>sample means</b>* are normally distributed, **if** we have enough data and certain conditions hold.

This is true *even if the population distribution is not normally distributed*.

Click [here](http://onlinestatbook.com/stat_sim/sampling_dist/index.html) to see an interactive demonstration of this idea.
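---

## A quick simulation check

As a complement to the interactive demo linked above, here is a minimal sketch that repeatedly samples from a clearly non-normal population (an exponential distribution) and looks at the resulting sample means. The population choice and the simulation settings are assumptions made purely for illustration; they are not part of the Asheville data.

```r
set.seed(123)

n_samples <- 5000   # number of repeated samples
n <- 50             # size of each sample

# population: exponential with mean 16 (strongly right-skewed, not normal)
sample_means <- replicate(n_samples, mean(rexp(n, rate = 1 / 16)))

# the mean of the sample means should be close to 16, and their standard
# deviation should be close to the theoretical standard error 16 / sqrt(50)
c(mean = mean(sample_means), se = sd(sample_means), theory_se = 16 / sqrt(50))

# the histogram of the sample means looks roughly bell-shaped
hist(sample_means)
```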
---

## Conditions for CLT

We need to check two conditions for the CLT to hold: independence and sample size/distribution.

--

✅ .vocab[Independence:] The sampled observations must be independent.

This is difficult to check, but the following are useful guidelines:

- the sample must be randomly taken
- if sampling without replacement, the sample size must be less than 10% of the population size

--

If observations are independent, then by definition one observation's value does not "influence" another observation's value.

---

## Conditions for CLT

✅ .vocab[Sample size / distribution:]

- if data are numerical, usually `\(n > 30\)` is considered a large enough sample for the CLT to kick in
- if we know for sure that the underlying data are normally distributed, then the distribution of sample averages will also be exactly normal, regardless of the sample size
- if data are categorical, we need at least 10 successes and 10 failures

---

class: middle, center

## Let's run our own simulation

---

### Underlying population (not observed in real life!)

.small[
]

<img src="03-inference-review_files/figure-html/unnamed-chunk-8-1.png" style="display: block; margin: auto;" />

.small[

```
## # A tibble: 1 x 2
##      mu sigma
##   <dbl> <dbl>
## 1  16.6  14.0
```
]

---

## Sampling from the population - 1

```r
set.seed(1)
samp_1 <- rs_pop %>%
  sample_n(size = 50) %>%
  summarise(x_bar = mean(x))
```

```r
samp_1
```

```
## # A tibble: 1 x 1
##   x_bar
##   <dbl>
## 1  16.4
```

---

## Sampling from the population - 2

```r
set.seed(2)
samp_2 <- rs_pop %>%
  sample_n(size = 50) %>%
  summarise(x_bar = mean(x))
```

```r
samp_2
```

```
## # A tibble: 1 x 1
##   x_bar
##   <dbl>
## 1  13.3
```

---

## Sampling from the population - 3

```r
set.seed(3)
samp_3 <- rs_pop %>%
  sample_n(size = 50) %>%
  summarise(x_bar = mean(x))
```

```r
samp_3
```

```
## # A tibble: 1 x 1
##   x_bar
##   <dbl>
## 1  17.8
```

--

keep repeating...

---

## Sampling distribution

.small[
]

<img src="03-inference-review_files/figure-html/unnamed-chunk-17-1.png" style="display: block; margin: auto;" />

```
## # A tibble: 1 x 2
##    mean    se
##   <dbl> <dbl>
## 1  16.6  2.02
```

---

.question[
How do the shapes, centers, and spreads of these distributions compare?
]

<img src="03-inference-review_files/figure-html/unnamed-chunk-19-1.png" style="display: block; margin: auto;" />

---

## CLT Recap

- If certain conditions are satisfied, regardless of the shape of the population distribution, the sampling distribution of the mean follows an approximately normal distribution.

--

- The center of the sampling distribution is at the center of the population distribution.

--

- The sampling distribution is less variable than the population distribution (and we can quantify by how much).

--

.alert[
`$$\bar{X} \sim N\Big(\mu, \frac{\sigma}{\sqrt{n}}\Big)$$`
]

---

## Back to Asheville

✅ .vocab[Independence]

- The Airbnbs in this data set were randomly selected
- 50 is less than 10% of all Airbnbs in Asheville

--

✅ .vocab[Sample size / distribution]

- The sample size 50 is sufficiently large, `\((n > 30)\)`

---

## Back to Asheville

.alert[
Let `\(\bar{X}\)` be the mean price per guest per night in a sample of 50 Airbnbs. Since the conditions are satisfied, we know by the CLT

`$$\bar{X} \sim N\Big(\mu, \frac{\sigma}{\sqrt{50}}\Big)$$`

where `\(\mu\)` is the population mean price per guest per night, and `\(\sigma\)` is the population standard deviation.
]

- We will use the CLT to draw conclusions about `\(\mu\)`, and we'll see shortly how to deal with the unknown `\(\sigma\)`.
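---

## Estimating the standard error

Since `\(\sigma\)` is unknown, we estimate the standard error of `\(\bar{X}\)` with the sample standard deviation. Below is a minimal sketch of this calculation, assuming the `abb` sample data frame from the earlier slides; with the summary values reported in the upcoming test statistic slides (mean 76.587, standard deviation 50.141, `\(n = 50\)`), the estimated standard error is about `\(50.141/\sqrt{50} \approx 7.09\)`.

```r
abb %>%
  summarize(
    x_bar = mean(ppg),             # sample mean price per guest per night
    s     = sd(ppg),               # sample standard deviation
    n     = n(),                   # sample size
    se    = sd(ppg) / sqrt(n())    # estimated standard error of the mean
  )
```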
---

## Why do we care?

Knowing the distribution of the sample statistic `\(\bar{X}\)` can help us

--

- estimate a population parameter as **sample statistic** `\(\boldsymbol{\pm}\)` **margin of error**
  - the .vocab[margin of error] combines a measure of how confident we want to be and a measure of how variable the sample statistic is

--

<br>

- test a claim about a population parameter by evaluating how likely it is to obtain the observed sample statistic, assuming the null hypothesis is true
  - this probability will depend on how variable the sampling distribution is

---

class: center, middle

## Inference based on the CLT

---

## Inference based on the CLT

If the necessary conditions are met, we can also use inference methods based on the CLT. Suppose we know the true population standard deviation.

--

Then the CLT tells us that `\(\bar{X}\)` approximately has the distribution `\(N\left(\mu, \sigma/\sqrt{n}\right)\)`.

That is,

`$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$$`

---

## What if `\(\sigma\)` isn't known?

<img src="img/03/guinness.jpg" width="80%" style="display: block; margin: auto;" />

---

## T distribution

- In practice, we never know the true value of `\(\sigma\)`, and so we estimate it from our data with `\(s\)`.

- We will use the `\(t\)` distribution instead of the standard normal distribution when we conduct statistical inference for the mean (and, eventually, linear regression coefficients).

.question[
For the sample mean `\(\bar{X}\)`,

$$Z = \frac{\bar{X} - \mu}{\boldsymbol{\sigma}/\sqrt{n}} \sim N(0, 1) \hspace{5mm} \Rightarrow \hspace{5mm} T = \frac{\bar{X} - \mu}{\mathbf{s}/\sqrt{n}} \sim t_{n-1}$$
]

---

## T distribution

The t-distribution is also unimodal and symmetric, and is centered at 0.

--

It has thicker tails than the normal distribution.

- This is to make up for the additional variability introduced by using `\(s\)` instead of `\(\sigma\)` in the calculation of the .vocab[standard error (SE)], `\(s/\sqrt{n}\)`.

---

## T vs Z distributions

<img src="03-inference-review_files/figure-html/unnamed-chunk-21-1.png" style="display: block; margin: auto;" />

---

class: middle, center

## Hypothesis testing

---

## Mean price per guest per night

.question[
Do the data provide sufficient evidence that the mean price per guest per night for Airbnbs in Asheville differs from $80?
]

---

## Outline of a hypothesis test

--

1️⃣ State the hypotheses.

--

2️⃣ Calculate the test statistic.

--

3️⃣ Calculate the p-value.

--

4️⃣ State the conclusion.

---

## 1️⃣ State the hypotheses

.pull-left[
.small-box[
`$$\large{\begin{aligned}& H_0: \mu = 80 \\& H_a: \mu \neq 80\end{aligned}}$$`
]
]

.pull-right[
<font color = "white">place-holder</font>

.big[.vocab[Null hypothesis]]

.big[.vocab[Alternative hypothesis]]
]

<br>

--

- We define the hypotheses **<u>before</u>** analyzing the data.

- We will assume the null hypothesis is true and assess the strength of evidence **<u>against</u>** the null hypothesis.

---

## 2️⃣ Calculate the test statistic.

**From our data**

|  x_bar|     sd|  n|
|------:|------:|--:|
| 76.587| 50.141| 50|

<br>

--

.eq[
`$$\text{test statistic} = \frac{\text{Estimate} - \text{Hypothesized}}{\text{Standard error}}$$`
]

---

## 2️⃣ Calculate the test statistic.

**From our data**

|  x_bar|     sd|  n|
|------:|------:|--:|
| 76.587| 50.141| 50|

<br>

.eq[
$$t = \frac{\bar{X} - \mu_0}{\mathbf{s}/\sqrt{n}} = \frac{76.587 - 80}{50.141 / \sqrt{50}} = -0.481$$
]
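---

## Computing the test statistic in R

The same calculation can be done directly in R. This is a minimal sketch using the summary values from the table above; it could equivalently use `mean(abb$ppg)`, `sd(abb$ppg)`, and `nrow(abb)` from the sample data frame.

```r
x_bar <- 76.587   # sample mean
s     <- 50.141   # sample standard deviation
n     <- 50       # sample size
mu_0  <- 80       # hypothesized mean under the null hypothesis

t_stat <- (x_bar - mu_0) / (s / sqrt(n))
t_stat
```

This returns approximately -0.481, matching the value above.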
---

## 3️⃣ Calculate the p-value.

.eq[
`$$\text{p-value} = P(|t| \geq |\text{test statistic}|)$$`

Calculated from a `\(t\)` distribution with `\(n-1\)` degrees of freedom.
]

--

The p-value is the probability of observing a test statistic at least as extreme as the one we've observed, given the null hypothesis is true.

---

## 3️⃣ Calculate the p-value

The test statistic follows a `\(t\)` distribution with 49 degrees of freedom.

.pull-left[
<img src="03-inference-review_files/figure-html/unnamed-chunk-24-1.png" style="display: block; margin: auto;" />
]

.pull-right[

```r
pval <- 2 * pt(abs(-0.481), 49, lower.tail = FALSE)
pval
```

```
## [1] 0.6326574
```
]

---

## Understanding the p-value

| Magnitude of p-value  | Interpretation                             |
|:---------------------:|:------------------------------------------:|
| p-value < 0.01        | strong evidence against `\(H_0\)`          |
| 0.01 < p-value < 0.05 | moderate evidence against `\(H_0\)`        |
| 0.05 < p-value < 0.1  | weak evidence against `\(H_0\)`            |
| p-value > 0.1         | effectively no evidence against `\(H_0\)`  |

<br>
<br>

*These are general guidelines. The strength of evidence depends on the context of the problem.*

---

## 4️⃣ State the conclusion

The p-value of 0.633 is large, so we fail to reject the null hypothesis.

.vocab[The data do not provide sufficient evidence that the mean price per guest per night for Airbnbs in Asheville differs from $80.]

---

class: middle, center

### What is a plausible estimate for the mean price per guest per night?

---

## Confidence interval

.eq[
`$$\text{ Estimate} \pm \text{ (critical value) } \times \text{SE}$$`
]

--

.middle[
.eq[
Confidence interval for `\(\mu\)`

`$$\bar{X} \pm t^* \times \frac{s}{\sqrt{n}}$$`
]
]

<br>

`\(t^*\)` is calculated from a `\(t\)` distribution with `\(n-1\)` degrees of freedom.

---

## Calculating the 95% CI for `\(\mu\)`

|  x_bar|     sd|  n|
|------:|------:|--:|
| 76.587| 50.141| 50|

```r
t_star <- qt(0.975, 49)
t_star
```

```
## [1] 2.009575
```

--

.eq[
`$$76.587 \pm 2.01 \times \frac{50.141}{\sqrt{50}} \\[8pt] \mathbf{[62.334, 90.840]}$$`
]

---

## Interpretation

.eq[
`$$\mathbf{[62.334, 90.840]}$$`
]

--

<br>

.vocab[We are 95% confident the true mean price per guest per night for Airbnbs in Asheville is between $62.33 and $90.84.]

--

Note that this is consistent with the conclusion from our hypothesis test.

---

## One-sample t-test functions in R (both work!)

```r
library(infer)
t_test(abb, response = ppg, mu = 80)
```

```
## # A tibble: 1 x 6
##   statistic  t_df p_value alternative lower_ci upper_ci
##       <dbl> <dbl>   <dbl> <chr>          <dbl>    <dbl>
## 1    -0.481    49   0.632 two.sided       62.3     90.8
```

--

```r
library(broom)
t.test(abb$ppg, mu = 80) %>%
  tidy()
```

```
## # A tibble: 1 x 8
##   estimate statistic p.value parameter conf.low conf.high method     alternative
##      <dbl>     <dbl>   <dbl>     <dbl>    <dbl>     <dbl> <chr>      <chr>
## 1     76.6    -0.481   0.632        49     62.3      90.8 One Sampl… two.sided
```

---

## Recap

- Sampling distributions and the Central Limit Theorem

- Hypothesis test to test a claim about a population parameter

- Confidence interval to estimate a population parameter

---

## Acknowledgements

Some slides were adapted from [Data Science in a Box](datasciencebox.org).