Section 7.2 Sampling Distribution of the Sample Mean
The Central Limit Theorem
It seems reasonable to estimate the mean µ of a population by using a sample mean from a
representative simple random sample drawn from the population. For example, I might estimate the
mean height µ for all APSU students enrolled in spring 2011 by taking a SRS of 50 students and finding
the mean height for those 50 students. Recall that we call this a point estimate for µ.
So how good is our estimate? Well, the sample mean is a random variable (it varies from sample to
sample) and so it has a distribution. Knowing the distribution of the sample mean helps us to know
how good our estimate is. Let’s look at an example to see what we can say about the mean and
standard deviation of the distribution of the sample mean x̄. Another example will help us see what the
shape of the distribution should be.
Example 1 The heights in inches of 5 starting players on a men’s basketball team are as follows.
Alfred: 76
Bob: 79
Carl: 85
Dennis: 82
Edgar: 78
We will now answer the following questions.
The population mean height µ = _____________________.
How many samples of size n = 2 players can be chosen? __________________
Now list all the possible samples of size 2 and then calculate the sample mean.
The mean of the sample means, µ(x̄), is _____________________.
What you have just noticed is not a coincidence! The mean of all the sample means is always equal to
the population mean.
The population standard deviation of the players’ heights is ____________________.
The standard deviation of the mean heights is ______________________.
These numbers are different! It turns out that the standard deviation for the distribution of sample
means depends on n, the sample size. If the sample size is small relative to the population size, then for
samples of size n, the standard deviation for the distribution of the sample means is given by
σ(x̄) = σ/√n,
where σ is the population standard deviation. We will call this the exact standard error of the mean.
Notice that, because we are dividing by √n, the larger the size n of the sample, the closer the sample
means will be packed around the population mean.
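The questions in Example 1 can be checked by brute force. The following is a minimal Python sketch, assuming the samples are drawn without replacement (so there are 10 distinct pairs); the variable names are illustrative only. It also computes a finite population correction factor, which is not part of these notes but explains why σ/√n is only approximate when the sample is not small relative to the population, as with n = 2 out of N = 5.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, pstdev

# Heights (inches) of the five players in Example 1.
heights = {"Alfred": 76, "Bob": 79, "Carl": 85, "Dennis": 82, "Edgar": 78}
values = list(heights.values())

pop_mean = mean(values)    # population mean
pop_sd = pstdev(values)    # population standard deviation (divides by N)

# Every possible sample of size n = 2, drawn without replacement.
n = 2
samples = list(combinations(heights, n))
sample_means = [mean(heights[name] for name in pair) for pair in samples]

mean_of_means = mean(sample_means)   # mean of all the sample means
sd_of_means = pstdev(sample_means)   # standard deviation of the sample means

# sigma / sqrt(n): the formula for samples that are small relative to the population.
approx_se = pop_sd / sqrt(n)

# Assumption (not in the notes): finite population correction for
# sampling without replacement from a population of size N.
N = len(values)
corrected_se = approx_se * sqrt((N - n) / (N - 1))

print("number of samples:", len(samples))
print("population mean:", pop_mean, " mean of sample means:", mean_of_means)
print("population sd:", round(pop_sd, 4), " sd of sample means:", round(sd_of_means, 4))
print("sigma/sqrt(n):", round(approx_se, 4), " with correction:", round(corrected_se, 4))
```

Here the sample is 40% of the population, so σ/√n overstates the spread of the sample means, while the corrected value matches the brute-force result.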
Example Suppose that in Tennessee the mean living space for a single family detached home is
µ = 1742 ft² with a standard deviation of σ = 568 ft².
a) For samples of size n = 25, give the mean and standard deviation for the distribution of sample
means.
µ(x̄) = ______________
σ(x̄) = __________________
b) For samples of size n = 500, give the mean and standard deviation for the distribution of sample
means.
µ(x̄) = ______________
σ(x̄) = __________________
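One way to check your answers to parts a) and b) is a short Python sketch of the σ/√n formula. The helper name standard_error is hypothetical and chosen only for illustration.

```python
from math import sqrt

mu = 1742.0     # population mean living space, in square feet
sigma = 568.0   # population standard deviation, in square feet


def standard_error(sigma, n):
    """Standard deviation of the sample mean for samples of size n."""
    return sigma / sqrt(n)


se_25 = standard_error(sigma, 25)
se_500 = standard_error(sigma, 500)

# The mean of the sample means equals the population mean for any n.
for n, se in [(25, se_25), (500, se_500)]:
    print(f"n = {n}: mean of sample means = {mu}, standard error = {se:.2f}")
```

The mean stays at 1742 ft² in both cases; only the standard error shrinks as n grows.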
Shape of the distribution of the sample means
So now we know what the mean and standard deviation are for the distribution of sample means from
samples of size n. But what shape does that distribution have? Is it unimodal? bimodal? symmetric?
Let’s look at an example to see if we can find out.
Example Consider the following table giving the number of people per household and the relative
frequency for each number.
# of people      1      2      3      4      5      6      7
Relative freq.   .232   .317   .175   .154   .073   .030   .019
What are the population mean and standard deviation? [Enter one column in L1 on your calculator, the
other in L2 and do 1-var stat L1,L2 on your home screen to get the answers.] You should get µ = 2.685
persons and σ = 1.47 persons. What shape distribution does this population have? To get an idea we’ll
take a simple random sample of size n = 1000 (using Minitab) to see what it might look like. What shape
do you see?
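If you do not have a calculator handy, the population mean and standard deviation can also be computed directly from the relative frequencies. This is a minimal Python sketch that treats the relative frequencies as probabilities; the variable names are illustrative.

```python
from math import sqrt

# Household sizes and their relative frequencies from the table.
sizes = [1, 2, 3, 4, 5, 6, 7]
rel_freq = [0.232, 0.317, 0.175, 0.154, 0.073, 0.030, 0.019]

# Population mean: each value weighted by its relative frequency.
mu = sum(x * p for x, p in zip(sizes, rel_freq))

# Population variance: weighted squared deviations from the mean.
variance = sum(p * (x - mu) ** 2 for x, p in zip(sizes, rel_freq))
sigma = sqrt(variance)

print(f"mu = {mu:.3f} persons, sigma = {sigma:.2f} persons")
```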
Now let’s take 10,000 samples of size n = 30, calculate the mean for each sample and look at the
distribution of those means. (Again, using Minitab) What shape does this distribution have? Is it the
same as the population?
We have seen that the sample means are approximately normally distributed with a mean µ(x̄) =
2.685 (the population mean) and σ(x̄) = 0.2684 (the population standard deviation divided by the square
root of the sample size). If we were to increase our sample size, the distribution would still be
approximately normal, centered at the population mean of 2.685 persons, but the standard deviation
would decrease, so the distribution ‘tightens’ around 2.685. (There is less variation in sample means as the
sample size gets larger.)
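The experiment can be imitated without Minitab. The sketch below uses Python's standard random module instead, with an arbitrary seed so that the run is repeatable; exact results will differ slightly from any Minitab run. It draws 10,000 samples of size n = 30 from the household-size distribution and summarizes the sample means.

```python
import random
from statistics import mean, pstdev

# Household sizes and their relative frequencies from the table.
sizes = [1, 2, 3, 4, 5, 6, 7]
rel_freq = [0.232, 0.317, 0.175, 0.154, 0.073, 0.030, 0.019]

rng = random.Random(2011)   # arbitrary seed, chosen only for repeatability
n = 30                      # size of each sample
num_samples = 10_000        # number of samples drawn

# Draw each sample with replacement according to the relative
# frequencies, and record its mean.
sample_means = [
    sum(rng.choices(sizes, weights=rel_freq, k=n)) / n
    for _ in range(num_samples)
]

center = mean(sample_means)     # should land near the population mean
spread = pstdev(sample_means)   # should land near sigma / sqrt(n)

# For a normal model, roughly 68% of values fall within one
# standard deviation of the mean.
within_one_sd = sum(abs(m - center) <= spread for m in sample_means) / num_samples

print(f"mean of sample means: {center:.3f}")
print(f"sd of sample means:   {spread:.4f}")
print(f"fraction within 1 sd: {within_one_sd:.3f}")
```

Changing n to a larger value and rerunning should show the same center with a smaller spread.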
This is the content of the ‘fundamental theorem of statistics’, the Central Limit Theorem (CLT).
CLT As the sample size n increases, the sample mean x̄ has a distribution that tends toward a normal
distribution N(µ(x̄), σ(x̄)), where µ(x̄) = population mean and σ(x̄) = population standard deviation
divided by the square root of the sample size: σ/√n.
Note: If the population distribution is itself normal or very nearly so, then the distribution of x̄ will have
a normal model for samples of any size. As a rule of thumb, we can use a normal model if the sample size n is at
least 30, whatever the shape of the population distribution. If the population is ‘somewhat normal’, then we can
use a normal model even for a sample size of 10 or 12.
Example Suppose that for adults the mean weight is 175 lb with a standard deviation of 25 lb and that
the weights have approximately a normal distribution. An elevator has a weight limit of 10 people or
2000 lb. What is the probability that the 10 people who get on the elevator will go over its weight limit?
Solution: We are really asking ‘What is the probability that the mean weight of a sample of 10 people is
more than 200 lb (2000 lb/10)?’ We will assume that the 10 people are a random sample and that the
weights are independent. [Is this always necessarily so? Think of an elementary school field trip, a
football team at a hotel, a weight loss clinic on the 4th floor etc.]
Our population mean is 175 lb and our population standard deviation is 25 lb. Since our population
distribution is approximately normal, the CLT says we can use a normal model for the distribution of
sample means from samples of size n = 10. This model will have mean = 175 lb (the population mean)
and standard deviation = 25/√10 = 7.91 lb, correct to two decimal places. To see where a mean of 200
lb would be in this distribution we calculate its z-score:
z = (200 − 175)/7.91 ≈ 3.16.
Thus, P(x̄ > 200) = P(z > 3.16) = .0008. Our conclusion is that there is only a very slight chance that
10 people would overload the elevator.
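The elevator calculation can be reproduced with Python's statistics.NormalDist. This is a sketch under the same assumptions as the solution: independent weights and a normal model for the sample mean. Because it carries the unrounded standard error, the last digits may differ slightly from a table lookup.

```python
from math import sqrt
from statistics import NormalDist

mu = 175.0      # population mean weight, lb
sigma = 25.0    # population standard deviation, lb
n = 10          # number of people on the elevator
limit = 2000.0  # elevator weight limit, lb

limit_mean = limit / n   # mean weight per person that reaches the limit
se = sigma / sqrt(n)     # standard deviation of the sample mean

z = (limit_mean - mu) / se

# Upper-tail probability that the sample mean exceeds the limit.
p_over = 1 - NormalDist(mu, se).cdf(limit_mean)

print(f"standard error = {se:.2f} lb, z = {z:.2f}, P(overload) = {p_over:.4f}")
```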
Sampling Distribution Models: A Recap
• The statistic (mean, proportion, etc.) is a random variable.
• The sampling distribution shows us the distribution of possible values the statistic could have.
• For the sample mean and the sample proportion the CLT tells us that we can model the
sampling distribution with a normal model for samples of an appropriate size.
• Key idea: The CLT states that the sampling distribution model for the sample mean (and the
sample proportion) is approximately normal for large n, regardless of the shape of the
population distribution, as long as the observations are independent.
Note: A proportion can always be viewed as a mean by letting a ‘success’ be indicated by a 1, a ‘failure’
by a 0. Then the mean of the 1’s and 0’s gives the proportion of successes!
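A small Python sketch of this note, using a made-up list of ten outcomes (hypothetical data, for illustration only):

```python
from statistics import mean

# Hypothetical outcomes of ten trials (illustrative data only).
outcomes = ["success", "failure", "failure", "success", "failure",
            "success", "failure", "failure", "success", "failure"]

# Code each success as 1 and each failure as 0.
indicators = [1 if outcome == "success" else 0 for outcome in outcomes]

proportion = outcomes.count("success") / len(outcomes)
mean_of_indicators = mean(indicators)

print("proportion of successes:", proportion)
print("mean of the 1's and 0's:", mean_of_indicators)
```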