7.3 The Sampling Distribution of the Sample Mean

I. Sampling Distribution:
• Up to this point we have only partly described the sampling distribution of the sample mean. We have shown that the mean and standard deviation of the sampling distribution of $\bar{x}$ can be expressed in terms of the sample size and the population mean and standard deviation:
$\mu_{\bar{x}} = \mu$ and $\sigma_{\bar{x}} = \sigma/\sqrt{n}$
• Now we finish describing the sampling distribution of the sample mean by using a very important mathematical fact:
If the variable under consideration is normally distributed, then so is the variable $\bar{x}$.
[The proof of this fact requires advanced mathematics, which we will not concern ourselves with here. An example will serve to make the fact plausible.]

Ex: Intelligence Quotients
Intelligence quotients (IQs) measured on the Stanford-Binet scale are normally distributed with a mean of 100 and a standard deviation of 16. For a sample size of 4, we will use simulation to make plausible the fact that $\bar{x}$ is normally distributed, i.e., that the possible sample mean IQs for samples of 4 people have a normal distribution.
Solution:
a. $\mu_{\bar{x}} = \mu = 100$ and $\sigma_{\bar{x}} = \sigma/\sqrt{n} = 16/\sqrt{4} = 8$.
b. If we simulate on Minitab 1000 samples of n = 4 IQs each and determine $\bar{x}$ for each sample, we obtain the following histogram:
[Figure: Intelligence Quotients, Histogram / Normal Curve]
c. We superimpose on the histogram a normal curve with a mean of 100 and a standard deviation of 8. Note that the two are shaped roughly alike, which is consistent with $\bar{x}$ being normally distributed.
• TECHNOLOGY: Do the simulation on Minitab.
• From the above example we may generalize as follows: Suppose that a variable x of a population is normally distributed with mean $\mu$ and standard deviation $\sigma$. Then, for samples of size n, the variable $\bar{x}$ is normally distributed with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$. This is shown in the following figure for the population and for samples of size 4 and size 16:
[Figure: Intelligence Quotients, distribution of x for the population and of $\bar{x}$ for n = 4 and n = 16]
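The Minitab simulation in the IQ example can be sketched in Python. This is a minimal illustration under stated assumptions, not the procedure that produced the histogram: it uses Python's standard `random` and `statistics` modules in place of Minitab, and the function names `simulate_sample_means` and `fraction_within` are hypothetical. For n = 4 and n = 16 it compares the simulated mean and standard deviation of $\bar{x}$ with $\mu = 100$ and $\sigma/\sqrt{n}$, and, as a rough normality check, reports the share of sample means within one standard error of 100 (about 0.683 for a normal variable).

```python
import random
import statistics

def simulate_sample_means(mu, sigma, n, reps, seed=0):
    """Draw `reps` random samples of size n from a normal population
    with mean mu and SD sigma; return the list of sample means."""
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    return [statistics.mean([rng.gauss(mu, sigma) for _ in range(n)])
            for _ in range(reps)]

def fraction_within(values, center, half_width):
    """Share of values in [center - half_width, center + half_width]."""
    return sum(abs(v - center) <= half_width for v in values) / len(values)

MU, SIGMA = 100, 16  # Stanford-Binet IQ parameters from the example

for n in (4, 16):
    means = simulate_sample_means(MU, SIGMA, n, reps=1000)
    se = SIGMA / n ** 0.5  # theoretical sigma_xbar = sigma / sqrt(n)
    print(f"n={n:2d}: mean(x-bar)={statistics.mean(means):6.2f}  "
          f"sd(x-bar)={statistics.stdev(means):5.2f}  theory={se:.2f}  "
          f"within 1 SE={fraction_within(means, MU, se):.3f}")
```

Because the draws are random, the simulated values only approximate the theoretical ones; with n = 16 the simulated standard deviation should land near 4, half of the value for n = 4.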
From the curves in the figure above we note the following:
i. Each curve is centered at the population mean, i.e., $\mu_{\bar{x}} = \mu$.
ii. The spread decreases as the sample size increases, i.e., $\sigma_{\bar{x}} = \sigma/\sqrt{n}$.
iii. As the sample size increases, the possible sample means cluster more closely around the population mean.
iv. The larger the sample size, the smaller the sampling error tends to be when a sample mean is used to estimate a population mean (inferential statistics).

II. The Central Limit Theorem:
• We can further extend what we know about the distribution of $\bar{x}$ with the CENTRAL LIMIT THEOREM, which is especially important in statistics:
Central Limit Theorem
For a relatively large sample size (n > 30), the variable $\bar{x}$ is approximately normally distributed, regardless of the distribution of the variable under consideration. The approximation becomes better with increasing sample size.
• We can illustrate the CLT the same way we illustrated the mathematical fact above: we simulate non-normal variables on the computer, take many samples (of size greater than 30) from each, and show that the resulting sample means are approximately normally distributed.
• TECHNOLOGY: show on Minitab
• Here’s a summary of the sampling distribution of $\bar{x}$, including the Central Limit Theorem (item 4):
Sampling Distribution of $\bar{x}$
If a variable x of a population has mean $\mu$ and standard deviation $\sigma$, then for samples of size n,
1. $\mu_{\bar{x}} = \mu$
2. $\sigma_{\bar{x}} = \sigma/\sqrt{n}$
3. if x is normally distributed, then so is $\bar{x}$, regardless of n.
4. if n is large (n > 30), $\bar{x}$ is approximately normally distributed, regardless of the distribution of x.
• To show how we use the CLT, let’s look at the following example:
Example
An article by Scott M. Berry titled “Drive for Show and Putt for Dough” (Chance, 1999, Vol. 12(4), pp. 50-54) discussed driving distances of PGA players. The mean distance for tee-shots on the 1999 men’s PGA tour is 272.2 yards with a standard deviation of 8.12 yards.
(a) Determine the sampling distribution of the sample mean for a sample size of 100.
(b) Determine the sampling distribution of the sample mean for a sample size of 200.
(c) Must you assume that the tee-shot distances are normally distributed to answer parts (a) and (b)? Explain.
(d) What is the probability that the sampling error made in estimating the population mean tee-shot distance by the mean of a random sample of 100 tee-shot distances will be at most 1 yard?
(e) Same as (d) for a sample size of 200.
Solution:
a. The sampling distribution of the sample mean for samples of size 100 is approximately normal with $\mu_{\bar{x}} = \mu = 272.2$ yards and $\sigma_{\bar{x}} = \sigma/\sqrt{n} = 8.12/\sqrt{100} = 0.812$ yards.
b. For samples of size 200, the sampling distribution of the sample mean is approximately normal with $\mu_{\bar{x}} = \mu = 272.2$ yards and $\sigma_{\bar{x}} = \sigma/\sqrt{n} = 8.12/\sqrt{200} \approx 0.574$ yards.
c. No. We do not have to assume that the tee-shot distances are normally distributed, because the CLT tells us that for a large sample size (n > 30; here n = 100 and n = 200), the sample mean is approximately normally distributed regardless of the distribution of the variable in the population.
d. A sampling error of at most 1 yard means that $\bar{x}$ lies within 1 yard of $\mu = 272.2$, so we want $P(271.2 \le \bar{x} \le 273.2)$. Computing the z-scores and their associated areas:
$z = (271.2 - 272.2)/0.812 = -1.23$, with an area to its left of 0.1093, and
$z = (273.2 - 272.2)/0.812 = 1.23$, with an area to its left of 0.8907.
Thus, the total area is 0.8907 − 0.1093 = 0.7814.
Interpretation: There is a 0.7814 probability that the sampling error will be at most 1 yard for samples of size 100.
e. From part (b), $\sigma_{\bar{x}} \approx 0.574$, and again we want $P(271.2 \le \bar{x} \le 273.2)$. So,
$z = (271.2 - 272.2)/0.574 = -1.74$, with an area to its left of 0.0409, and
$z = (273.2 - 272.2)/0.574 = 1.74$, with an area to its left of 0.9591.
Thus, the total area is 0.9591 − 0.0409 = 0.9182.
Interpretation: There is a 0.9182 probability that the sampling error will be at most 1 yard for samples of size 200.
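The arithmetic in parts (a), (b), (d) and (e) of the tee-shot example can be checked with a short Python sketch. It assumes Python 3.8+ and uses the standard library's `statistics.NormalDist` in place of a z-table; the helper names `sampling_distribution` and `prob_error_at_most` are hypothetical. As in part (c), it treats $\bar{x}$ as approximately normal because of the CLT.

```python
from math import sqrt
from statistics import NormalDist

MU, SIGMA = 272.2, 8.12  # 1999 PGA tee-shot mean and SD in yards, from the example

def sampling_distribution(mu, sigma, n):
    """Return (mu_xbar, sigma_xbar) = (mu, sigma / sqrt(n))."""
    return mu, sigma / sqrt(n)

def prob_error_at_most(mu, sigma, n, e):
    """P(|x-bar - mu| <= e), treating x-bar as approximately normal (CLT)."""
    m, se = sampling_distribution(mu, sigma, n)
    xbar = NormalDist(m, se)
    return xbar.cdf(mu + e) - xbar.cdf(mu - e)

for n in (100, 200):
    _, se = sampling_distribution(MU, SIGMA, n)
    p = prob_error_at_most(MU, SIGMA, n, 1.0)
    print(f"n={n}: sigma_xbar = {se:.3f} yd, P(error <= 1 yd) = {p:.4f}")
```

Because this sketch does not round z to two decimals before looking up areas, its probabilities can differ from the table-based answers (0.7814 and 0.9182) in the fourth decimal place.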
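Part II suggests illustrating the CLT by simulating a non-normal variable on the computer. A minimal Python sketch of that idea, standing in for the Minitab demonstration, assumes an exponential population with mean 1 and standard deviation 1 (a strongly right-skewed choice made here for illustration, not taken from these notes). It reports the skewness (0 for a symmetric distribution such as the normal) of the raw population draws and of sample means for n = 2, 10 and 40; the names `sample_means`, `skewness` and `exponential` are hypothetical.

```python
import random
import statistics

def sample_means(draw, n, reps, rng):
    """Return `reps` sample means, each computed from n values of draw(rng)."""
    return [statistics.mean([draw(rng) for _ in range(n)]) for _ in range(reps)]

def skewness(values):
    """Moment skewness: 0 for symmetric data, about 2 for exponential data."""
    m = statistics.mean(values)
    s = statistics.pstdev(values)
    return sum((v - m) ** 3 for v in values) / (len(values) * s ** 3)

def exponential(rng):
    # Assumed non-normal population: exponential with mean 1 and SD 1.
    return rng.expovariate(1.0)

rng = random.Random(1)  # fixed seed for reproducibility
population = [exponential(rng) for _ in range(2000)]
print(f"population: skewness={skewness(population):.2f}")

for n in (2, 10, 40):
    means = sample_means(exponential, n, reps=2000, rng=rng)
    print(f"n={n:2d}: mean(x-bar)={statistics.mean(means):.3f}  "
          f"sd(x-bar)={statistics.stdev(means):.3f}  theory={1 / n ** 0.5:.3f}  "
          f"skewness={skewness(means):.2f}")
```

For this population the skewness of $\bar{x}$ is $2/\sqrt{n}$ in theory, so the printed skewness should shrink toward 0 as n grows while the standard deviation tracks $\sigma/\sqrt{n} = 1/\sqrt{n}$.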
© Copyright 2024