7.3 The Sampling Distribution of the Sample Mean

I. Sampling Distribution:
• Up to this point we have only partly described the sampling distribution of the
sample mean: we have shown that the mean and standard deviation of the
variable $\bar{x}$ can be expressed in terms of the sample size and the
population mean and standard deviation:
$\mu_{\bar{x}} = \mu$
and
$\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$
• Now we finish describing the sampling distribution of the sample mean by
using a very important mathematical fact:
If the variable under consideration is normally distributed,
then so is the variable $\bar{x}$.
[The proof of this fact requires advanced mathematics, which
we will not concern ourselves with here. An example will serve to make this
fact plausible.]
Ex:
Intelligence Quotients
Intelligence quotients (IQs) measured on the
Stanford-Binet scale are normally distributed
with a mean of 100 and a standard deviation of
16. For a sample size of 4, we will use
simulation to make plausible the fact that $\bar{x}$ is
normally distributed, i.e., that the possible
sample mean IQs for samples of 4 people have a
normal distribution.
Solution:
a. $\mu_{\bar{x}} = \mu = 100$ and $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}} = \dfrac{16}{\sqrt{4}} = 8$
b. If we simulate on Minitab 1000 samples of n = 4 IQs each and
determine $\bar{x}$ for each sample, we obtain the following histogram:
[Figure: Intelligence Quotients, histogram of the sample means with a normal curve]
c. We superimpose on the histogram a normal curve with a mean of 100 and a
standard deviation of 8. Note that the two are roughly the same shape, which
makes it plausible that $\bar{x}$ is normally distributed.
• TECHNOLOGY: Do the simulation on Minitab.
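The same simulation can be tried outside Minitab. The following is a minimal Python sketch, not the Minitab procedure itself: it uses only the standard library, and the seed value and the variable names are arbitrary choices made for illustration. It draws 1000 samples of 4 IQs from a normal population with mean 100 and standard deviation 16, then compares the mean and standard deviation of the 1000 sample means with the theoretical values 100 and 8. As a rough check on shape, it also reports the fraction of sample means within one standard deviation (8) of 100, which should be near 0.68 for a normal distribution.

```python
import random
import statistics

random.seed(7)  # arbitrary fixed seed so the run is repeatable

MU, SIGMA = 100, 16          # population mean and standard deviation of IQs
N, NUM_SAMPLES = 4, 1000     # sample size and number of simulated samples

# Draw 1000 samples of 4 IQs each and record the mean of each sample.
sample_means = []
for _ in range(NUM_SAMPLES):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    sample_means.append(statistics.fmean(sample))

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)

# Fraction of sample means within one theoretical standard deviation of MU.
sd_theory = SIGMA / N ** 0.5
within_one_sd = sum(abs(m - MU) <= sd_theory for m in sample_means) / NUM_SAMPLES

print(f"mean of sample means: {mean_of_means:.2f} (theory: {MU})")
print(f"sd of sample means:   {sd_of_means:.2f} (theory: {sd_theory:.0f})")
print(f"fraction within 1 sd: {within_one_sd:.3f} (normal: about 0.68)")
```

The exact numbers depend on the seed, but they should land close to the theoretical values.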
• From the above example we may generalize as follows:
a. Suppose that a variable x of a population is normally distributed with
a mean of µ and a standard deviation of σ. Then, for samples of size
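n, the variable $\bar{x}$ is normally distributed and has a mean of µ and a
standard deviation of $\dfrac{\sigma}{\sqrt{n}}$.
This is seen in the following figure for the
population and samples of size 4 and size 16:
[Figure: Intelligence Quotients, normal curves for the population and for sample means with n = 4 and n = 16]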
b. From these curves we note the following:
i. Each curve is centered at the population mean, i.e.
$\mu_{\bar{x}} = \mu$.
ii. The spread or dispersion becomes less extensive as the
sample size increases, i.e.
$\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$.
iii. As the sample size increases, the possible sample means
cluster more closely around the population mean.
iv. The larger the sample size, the smaller the sampling error
tends to be in estimating a population mean by a sample mean
(inferential statistics).
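To put numbers on how the spread shrinks, here is a small Python sketch that evaluates σ/√n for the IQ population (σ = 16) at n = 1, 4, and 16. The function name `sd_of_sample_mean` is made up for this illustration; n = 1 corresponds to the population curve itself.

```python
import math

SIGMA = 16  # population standard deviation of IQs, from the example


def sd_of_sample_mean(sigma, n):
    """Standard deviation of the sample mean for samples of size n."""
    return sigma / math.sqrt(n)


for n in (1, 4, 16):
    print(f"n = {n:2d}: sd of sample mean = {sd_of_sample_mean(SIGMA, n)}")
# n =  1: sd of sample mean = 16.0
# n =  4: sd of sample mean = 8.0
# n = 16: sd of sample mean = 4.0
```

Quadrupling the sample size halves the standard deviation of the sample mean, which is why the curves for n = 4 and n = 16 are progressively narrower.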
II. The Central Limit Theorem:
• We can further extend the concept of the distribution of the variable $\bar{x}$ by the
CENTRAL LIMIT THEOREM, which is especially important in statistics:
Central Limit Theorem
For a relatively large sample size (n > 30), the
variable $\bar{x}$ is approximately normally distributed,
regardless of the distribution of the variable under consideration.
The approximation becomes better with
increasing sample size.
• We can illustrate the CLT the same way we illustrated the mathematical fact
above: we can simulate non-normal distributions on the computer, take
samples from these variables (of size greater than 30), and show that the
resulting sample means are approximately normally distributed.
• TECHNOLOGY: show on Minitab
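As an alternative to Minitab, the following Python sketch illustrates the CLT under some assumptions chosen here for illustration only: the non-normal population is exponential with mean 1 (strongly right-skewed), the sample size is 36, and 2000 samples are drawn. The helper `skewness` is a hypothetical name for a simple moment-based skewness measure, which is 0 for a symmetric distribution such as the normal. If the CLT holds, the sample means should have mean near 1, standard deviation near 1/√36 ≈ 0.167, and much less skewness than the population.

```python
import random
import statistics

random.seed(11)  # arbitrary fixed seed so the run is repeatable

N, NUM_SAMPLES = 36, 2000  # sample size (> 30) and number of samples


def skewness(data):
    """Moment-based skewness: average cubed z-score (0 for symmetric data)."""
    m = statistics.fmean(data)
    s = statistics.pstdev(data)
    return statistics.fmean([((x - m) / s) ** 3 for x in data])


# Individual observations from the skewed population (exponential, mean 1).
population_draws = [random.expovariate(1.0) for _ in range(NUM_SAMPLES)]

# Means of 2000 samples of size 36 from the same population.
sample_means = [
    statistics.fmean([random.expovariate(1.0) for _ in range(N)])
    for _ in range(NUM_SAMPLES)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
skew_population = skewness(population_draws)
skew_means = skewness(sample_means)

print(f"mean of sample means: {mean_of_means:.3f} (theory: 1)")
print(f"sd of sample means:   {sd_of_means:.3f} (theory: {1 / N ** 0.5:.3f})")
print(f"skewness of population draws: {skew_population:.2f}")
print(f"skewness of sample means:     {skew_means:.2f}")
```

The skewness of the sample means does not drop all the way to 0 at n = 36, which matches the statement that the normal approximation improves as the sample size increases.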
• Here’s a summary of the sampling distribution of the sample mean, including the Central Limit Theorem:
Sampling Distribution of $\bar{x}$
If a variable x of a population has mean µ and
standard deviation σ, then for samples of size n,
1. $\mu_{\bar{x}} = \mu$
2. $\sigma_{\bar{x}} = \sigma / \sqrt{n}$
3. if x is normally distributed, then so is $\bar{x}$,
regardless of n.
4. if n is large (n > 30), $\bar{x}$ is approximately
normally distributed, regardless of the
distribution of x.
• To show how we use the CLT, let’s look at the following example:
Example
An article by Scott M. Berry titled “Drive for
Show and Putt for Dough” (Chance, 1999,
Vol. 12(4), pp. 50-54) discussed driving
distances of PGA players. The mean distance
for tee-shots on the 1999 men’s PGA tour is
272.2 yards with a standard deviation of
8.12 yards.
(a) Determine the sampling distribution of the
sample mean for sample size of 100.
(b) Determine the sampling distribution of the
sample mean for sample size of 200.
(c) Must you assume that the tee-shot distances
are normally distributed to answer parts a
and b? Explain.
(d) What is the probability that the sampling
error made in estimating the population mean
tee-shot distance by that of a random sample
of 100 tee-shot distances will be at most 1
yard?
(e) Same as (d) for sample size of 200?
Solution:
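a. The sampling distribution of the sample mean for samples of size 100
is approximately normal with
$\mu_{\bar{x}} = \mu = 272.2$ yards and $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}} = \dfrac{8.12}{\sqrt{100}} = 0.812$ yards.
b. The sampling distribution of the sample mean for samples of size 200
is approximately normal with
$\mu_{\bar{x}} = \mu = 272.2$ yards and $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}} = \dfrac{8.12}{\sqrt{200}} = 0.574$ yards.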
c. We do not have to assume that the tee-shot distances are normally
distributed. Both sample sizes (100 and 200) are greater than 30, and the
CLT tells us that for such sample sizes the sample mean is
approximately normally distributed regardless of the distribution of the
variable in the population.
d. A sampling error of at most 1 yard means that $\bar{x}$ is within 1 yard of
µ = 272.2, so we are interested in finding $P(271.2 \le \bar{x} \le 273.2)$.
Computing the z-scores and their associated areas we get:
$z = \dfrac{271.2 - 272.2}{0.812} = -1.23$, with an associated area to its left of 0.1093,
and
$z = \dfrac{273.2 - 272.2}{0.812} = 1.23$, with an associated area to its left of 0.8907.
Thus, the total area is: 0.8907 − 0.1093 = 0.7814
Interpretation: There is a 0.7814 probability that the sampling error
will be at most 1 yard for samples of size 100.
e. From part (b), $\sigma_{\bar{x}} = 0.574$, and we are again interested in
obtaining $P(271.2 \le \bar{x} \le 273.2)$. So,
$z = \dfrac{271.2 - 272.2}{0.574} = -1.74$, with an associated area to its left of 0.0409,
and
$z = \dfrac{273.2 - 272.2}{0.574} = 1.74$, with an associated area to its left of 0.9591.
Thus, the total area is: 0.9591 − 0.0409 = 0.9182
Interpretation: There is a 0.9182 probability that the sampling error
will be at most 1 yard for samples of size 200.
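The table lookups in parts (d) and (e) can be checked numerically. The sketch below uses `NormalDist` from Python's standard `statistics` module. The function name `prob_error_at_most` is a hypothetical helper introduced for this illustration. Because the code does not round z to two decimal places, its results differ slightly in the third or fourth decimal place from the table-based values 0.7814 and 0.9182.

```python
from math import sqrt
from statistics import NormalDist

SIGMA = 8.12      # population standard deviation (yards), from the example
MAX_ERROR = 1.0   # largest sampling error of interest (yards)


def prob_error_at_most(max_error, sigma, n):
    """P(|sample mean - population mean| <= max_error), using the normal
    approximation for the sample mean. The population mean cancels out,
    so only sigma and n are needed."""
    sd_xbar = sigma / sqrt(n)        # standard deviation of the sample mean
    z = max_error / sd_xbar          # z-score of the upper limit
    std_normal = NormalDist()        # standard normal: mean 0, sd 1
    return std_normal.cdf(z) - std_normal.cdf(-z)


p100 = prob_error_at_most(MAX_ERROR, SIGMA, 100)
p200 = prob_error_at_most(MAX_ERROR, SIGMA, 200)

print(f"n = 100: sd = {SIGMA / sqrt(100):.3f}, probability = {p100:.4f}")
print(f"n = 200: sd = {SIGMA / sqrt(200):.3f}, probability = {p200:.4f}")
```

Doubling the sample size from 100 to 200 raises the probability of being within 1 yard of the population mean from about 0.78 to about 0.92.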