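IT223 - Qualls - Week 6 (Updated 5/12/2013)

IT223 Data Analysis
Week 6
Confidence Intervals and Sample Sizes
Part 1: Means
DePaul University
Bill Qualls

Objectives
At the end of this section you should be able to answer questions concerning sampling distributions of sample means. Specifically, you should understand:
• what is a sampling distribution (SD)
• the mean and standard deviation of a SD
• the Central Limit Theorem

Sampling Distributions
• Recall $\mu = \sum x \, p(x)$ and $\sigma^2 = \sum (x - \mu)^2 \, p(x)$.
• Given p(x=1) = .25, p(x=2) = .50, and p(x=3) = .25, find the mean and variance.

Sampling Distributions
|-------|--------|----------|---|-------|--------|------------|
| x     | p(x)   | xp(x)    | µ | (x-µ) | (x-µ)² | (x-µ)²p(x) |
|-------|--------|----------|---|-------|--------|------------|
| 1.0   | .2500  | .2500    | 2 | -1.0  | 1.00   | .25000     |
| 2.0   | .5000  | 1.0000   | 2 | 0.0   | 0.00   | .00000     |
| 3.0   | .2500  | .7500    | 2 | 1.0   | 1.00   | .25000     |
|-------|--------|----------|---|-------|--------|------------|
| Total | 1.0000 | 2.0000=µ |   |       |        | σ²=.50     |
|-------|--------|----------|---|-------|--------|------------|

Sampling Distributions
• An experiment consists of drawing a sample of size 2, with replacement, and finding the sample mean.
• Find the mean and variance.

Sampling Distributions
In the table below, x-bar is the mean of the sample (x1, x2) and p(x-bar) = p(x1)p(x2).
|-------|----|-------|-------|-------|----------|---------------|---|-----------|------------|-------------------|
| x1    | x2 | x-bar | p(x1) | p(x2) | p(x-bar) | x-bar·p(x-bar) | µ | (x-bar-µ) | (x-bar-µ)² | (x-bar-µ)²p(x-bar) |
|-------|----|-------|-------|-------|----------|---------------|---|-----------|------------|-------------------|
| 1     | 1  | 1.0   | .25   | .25   | .0625    | .0625         | 2 | -1.0      | 1.00       | .06250            |
| 1     | 2  | 1.5   | .25   | .50   | .1250    | .1875         | 2 | -0.5      | 0.25       | .03125            |
| 1     | 3  | 2.0   | .25   | .25   | .0625    | .1250         | 2 | 0.0       | 0.00       | .00000            |
| 2     | 1  | 1.5   | .50   | .25   | .1250    | .1875         | 2 | -0.5      | 0.25       | .03125            |
| 2     | 2  | 2.0   | .50   | .50   | .2500    | .5000         | 2 | 0.0       | 0.00       | .00000            |
| 2     | 3  | 2.5   | .50   | .25   | .1250    | .3125         | 2 | 0.5       | 0.25       | .03125            |
| 3     | 1  | 2.0   | .25   | .25   | .0625    | .1250         | 2 | 0.0       | 0.00       | .00000            |
| 3     | 2  | 2.5   | .25   | .50   | .1250    | .3125         | 2 | 0.5       | 0.25       | .03125            |
| 3     | 3  | 3.0   | .25   | .25   | .0625    | .1875         | 2 | 1.0       | 1.00       | .06250            |
|-------|----|-------|-------|-------|----------|---------------|---|-----------|------------|-------------------|
| Total |    |       |       |       | 1.0000   | 2.0000=µ      |   |           |            | σ²=.25, σ=.50     |
|-------|----|-------|-------|-------|----------|---------------|---|-----------|------------|-------------------|

Sampling Distributions
• Sampling with n=2: DistributionOfSampleMeans2.xls
• Sampling with n=3: DistributionOfSampleMeans3.xls
• Sampling with n=4: DistributionOfSampleMeans4.xls

Sampling Distributions
Summary
• the mean of the distribution of sample means is equal to the population mean
• the standard deviation of the distribution of the sample means is equal to the population standard deviation divided by the square root of the sample size

$\mu_{\bar{x}} = \mu$
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$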
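The two summary facts can be checked by brute force. The sketch below is a minimal illustration, not the method used in the course spreadsheets: it enumerates every sample of size n drawn with replacement from the three-value population above and recomputes the mean and variance of the sample means. The names population, mean_and_variance, and sampling_distribution are made up for this example.

```python
from itertools import product
from math import isclose, prod, sqrt

# Population distribution from the slides: value -> probability.
population = {1: 0.25, 2: 0.50, 3: 0.25}


def mean_and_variance(dist):
    """Return (mean, variance) of a discrete distribution {value: probability}."""
    mu = sum(x * p for x, p in dist.items())
    var = sum((x - mu) ** 2 * p for x, p in dist.items())
    return mu, var


def sampling_distribution(dist, n):
    """Distribution of the sample mean for samples of size n, with replacement."""
    result = {}
    for sample in product(dist, repeat=n):
        xbar = sum(sample) / n
        # Draws are independent, so a sample's probability is the product
        # of the probabilities of its individual values.
        p = prod(dist[x] for x in sample)
        result[xbar] = result.get(xbar, 0.0) + p
    return result


mu, var = mean_and_variance(population)
for n in (2, 3, 4):
    mu_xbar, var_xbar = mean_and_variance(sampling_distribution(population, n))
    print(f"n={n}: mean={mu_xbar:.4f}  variance={var_xbar:.4f}  sd={sqrt(var_xbar):.4f}")
    # Mean of the sample means equals the population mean; the variance is
    # the population variance divided by n (so the sd is sigma / sqrt(n)).
    assert isclose(mu_xbar, mu) and isclose(var_xbar, var / n)
```

For n=2 this reproduces the table above: mean 2, variance .25, standard deviation .50.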
Together
The numbers of sales per day made by a telemarketer in four days: 1, 11, 9, 3. Assume that samples of size 2 are randomly selected with replacement from this population of four values.
a. List the 16 different possible samples and find the mean of each of them.
b. Identify the probability of each sample, then describe the sampling distribution of sample means.
c. Find the mean of the sampling distribution.
d. Is the mean of the sampling distribution from part c equal to the mean of the population of the four listed values? Are those means always equal?

Central Limit Theorem
• The Central Limit Theorem tells us that regardless of the shape of the distribution of the population, given n sufficiently large, the distribution of the sample means is approximately normally distributed.

Together
Assume adult males' weights are normally distributed with a mean of 180 pounds and a standard deviation of 30 pounds.
• Find the probability that an adult male selected at random weighs over 200 pounds.
• Find the probability that the mean weight of 9 adult males selected at random is over 200 pounds.

Central Limit Theorem
• When working with an individual value from a normally distributed population, use
$z = \frac{x - \mu}{\sigma}$
• When working with a mean of a sample drawn from a population which is normally distributed, be sure to use the value of $\sigma / \sqrt{n}$ for the standard deviation of sample means, and use:
$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$

Together
Assume adult males' weights are normally distributed with a mean of 180 pounds and a standard deviation of 30 pounds.
• Find the probability that an adult male selected at random weighs between 175 pounds and 190 pounds.

Together
Assume IQ scores are normally distributed with a mean of 100 and a standard deviation of 15.
• If an individual is selected at random, what is the probability that their IQ score is less than 95?
• If 16 individuals are selected at random, what is the probability that the mean of their IQ scores is less than 95?
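The difference between the two z formulas (σ for an individual, σ/√n for a sample mean) can be sketched with Python's standard library. This is an illustration only; the helper prob_between is a hypothetical name, and it assumes the population is normal, as the exercises state.

```python
from math import sqrt
from statistics import NormalDist


def prob_between(mu, sigma, low=None, high=None, n=1):
    """P(low < mean of n draws < high) for a normal population.

    With n=1 this is the probability for a single individual. For n > 1 the
    standard deviation of the sample mean is sigma / sqrt(n).
    """
    dist = NormalDist(mu, sigma / sqrt(n))
    lower = dist.cdf(low) if low is not None else 0.0
    upper = dist.cdf(high) if high is not None else 1.0
    return upper - lower


# Adult male weights from the slides: mean 180 lb, standard deviation 30 lb.
one_man = prob_between(180, 30, low=200)        # z = (200 - 180) / 30
nine_men = prob_between(180, 30, low=200, n=9)  # z = (200 - 180) / (30 / sqrt(9))
print(f"P(x > 200)          = {one_man:.4f}")
print(f"P(x-bar > 200), n=9 = {nine_men:.4f}")
```

The same helper handles the "between" and "less than" questions by passing low, high, or both. The sample mean is far less likely than an individual to exceed 200 pounds, because its distribution is narrower by a factor of √n.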
Together
Again assume adult males' weights are normally distributed with a mean of 180 pounds and a standard deviation of 30 pounds.
• Find the probability that the mean weight of 16 adult males selected at random is between 175 pounds and 190 pounds.

Together
Assume IQ scores are normally distributed with a mean of 100 and a standard deviation of 15.
• If an individual is selected at random, what is the probability that their IQ score is between 105 and 110?
• If 25 individuals are selected at random, what is the probability that the mean of their IQ scores is between 105 and 110?

Objectives
At the end of this section you should be able to answer questions concerning confidence intervals for a population mean. Specifically, you should understand:
• how to calculate a confidence interval for a population mean
• how and when to use z vs. t distributions
• how to determine the margin of error
• how to determine the requisite sample size given a desired confidence level and margin of error.

Confidence Intervals about a Population Mean

Point Estimate of a Population Mean
We saw the following example in week 1…
• The following are the invoice amounts for 25 invoices drawn at random from last quarter's sales data:

 82  105  126   76   86
 77  112   71   67   94
 97   68   97  109   77
100   93   84   83  121
 99   72   98  100  115

Point Estimate of a Population Mean
• A point estimate is a single value used to approximate a population parameter.
• The best point estimate of the population mean (µ) is the sample mean (x-bar).
• The best point estimate of the population standard deviation (σ) is the sample standard deviation (s).

Point Estimate of a Population Mean
[Figure: relative frequency histogram of Sales. Horizontal axis: class boundaries 59.5 to 129.5 in steps of 10. Vertical axis: Relative Frequency, 0 to 28%.]

Point Estimate of a Population Mean
• For the given data, n=25, x-bar = 92.4 and s=16.7.
• What if we sampled n=250 invoices and found the same sample mean? We would intuitively have more confidence in the second statistic than in the first.
• But the problem with a point estimate is that we cannot assign a statistical level of confidence to it.

Interval Estimates
• We can, however, assign a level of confidence to an interval estimate.
• If you were asked to come up with a 95% confidence interval for the first case (x-bar = 92.4, n = 25), you might say you were 95% confident that the true mean is in the interval 92.4 ± 10.
• But in the second case (x-bar = 92.4, n=250), you might say you were 95% confident that the true mean is in the interval 92.4 ± 4.
(Numbers used above are "guesses" only, for illustrative purposes.)

90% Confidence Interval
[Figure]

CI for Population Mean (σ known)
• The formula for the confidence interval (CI) for a population mean is usually shown as:
$\mu = \bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$
• Or sometimes
$\mu = \bar{x} \pm E$
where E is the margin of error and is calculated as:
$E = z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$
• We use σ if known (only in stats textbooks!), otherwise we use s.

95% Confidence Interval
[Figure]

99% Confidence Interval
[Figure]

Together
• Find the 95% confidence interval for the mean invoice amount using the sample data: n=250, x-bar = 92.4. Assume σ is known to be 16.7.

Calculating Confidence Intervals
• Solution:
$\mu = \bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}} = 92.4 \pm 1.96 \cdot \frac{16.7}{\sqrt{250}} = 92.4 \pm 2.1 = (90.3, 94.5)$

Interpretation
So what does it mean?

Interpretation
Wrong: We are 95% confident that the population mean is between 90.3 and 94.5.
Correct: If the sampling process were repeated many times, and the interval calculated each time, 95% of those intervals would capture the true mean.

Together
Find the 99% confidence interval for the population mean µ of the gambling losses suffered by Packers fans following the infamous substitute referee debacle of September 24, 2012 given n = 40 and x-bar = $189. Assume σ is known to be $87.
Aside from mentioning the Packers, what's wrong with this question?

Margin of Error
Given a confidence interval of [10.2, 16.4]:
• What is the mean? (Answer: 13.3)
• What is the margin of error? (Answer: 3.1)
[Diagram: the interval from 10.2 to 16.4, with the margin of error E marked on each side of the mean.]
• What is the margin of error for the previous problem?

Confidence Intervals about a Population Mean
t-Distribution

When to use z vs. t
[Figure]

t Distribution
• Sometimes we need to use the t distribution instead of the z distribution (when to use which is discussed shortly)
• The t distribution has the following properties:
– it is a family of distributions (infinitely many)
– it has mean = 0
– it has standard deviation > 1
– it is flatter, more spread out, than z
– it approaches z as n gets larger

t is a Family of Distributions
[Figure]

CI for Population Mean (n<30)
• The formula for the confidence interval (CI) for a population mean is usually shown as:
$\mu = \bar{x} \pm t_{\alpha/2} \frac{s}{\sqrt{n}}$
• Or sometimes
$\mu = \bar{x} \pm E$
where E is the margin of error and is calculated as:
$E = t_{\alpha/2} \frac{s}{\sqrt{n}}$
• Use n-1 degrees of freedom (df).

Together
• Find the 95% confidence interval for the mean invoice amount using the sample data: n=25, x-bar = 92.4 and s=16.7. Assume the population is normal.
• Solution (24 df, two tails .05, so t = 2.064):
$\mu = \bar{x} \pm t_{\alpha/2} \frac{s}{\sqrt{n}} = 92.4 \pm 2.064 \cdot \frac{16.7}{\sqrt{25}} = 92.4 \pm 6.9 = (85.5, 99.3)$
Here 6.9 is the margin of error.
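Both invoice solutions follow the same pattern: a critical value times the standard deviation over √n. The sketch below is a hedged illustration of that pattern, not code from the course. The function name confidence_interval is hypothetical. The z critical value comes from the standard library, while the t critical value is copied from the slide (2.064 for 24 df, two tails .05), because Python's standard library has no t distribution.

```python
from math import sqrt
from statistics import NormalDist


def confidence_interval(xbar, sd, n, critical):
    """Return (margin of error, lower limit, upper limit) for a mean."""
    margin = critical * sd / sqrt(n)
    return margin, xbar - margin, xbar + margin


# sigma known, n = 250: the critical value is z for alpha/2 = .025.
z = NormalDist().inv_cdf(1 - 0.05 / 2)  # about 1.96
e_z, lo_z, hi_z = confidence_interval(92.4, 16.7, 250, z)

# sigma unknown, n = 25: t for 24 df, two tails .05, copied from the slide.
T_24_DF = 2.064
e_t, lo_t, hi_t = confidence_interval(92.4, 16.7, 25, T_24_DF)

print(f"z interval: 92.4 ± {e_z:.1f} = ({lo_z:.1f}, {hi_z:.1f})")
print(f"t interval: 92.4 ± {e_t:.1f} = ({lo_t:.1f}, {hi_t:.1f})")
```

The smaller sample widens the interval in two ways: √n is smaller, and the t critical value is larger than the z critical value.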
t table (extract)
For 24 df and two tails .05 the table gives t = 2.064, the value behind the interval (85.5, 99.3) for the n = 25 invoice sample.

|----|-----------|-----------|-----------|-----------|-----------|
|    |                             α                             |
|    |   .005    |    .01    |   .025    |    .05    |    .10    |
|    | (1 tail)  | (1 tail)  | (1 tail)  | (1 tail)  | (1 tail)  |
|    |   .01     |    .02    |    .05    |    .10    |    .20    |
| df | (2 tails) | (2 tails) | (2 tails) | (2 tails) | (2 tails) |
|----|-----------|-----------|-----------|-----------|-----------|
| 21 |   2.831   |   2.518   |   2.080   |   1.721   |   1.323   |
| 22 |   2.819   |   2.508   |   2.074   |   1.717   |   1.321   |
| 23 |   2.807   |   2.500   |   2.069   |   1.714   |   1.320   |
| 24 |   2.797   |   2.492   |   2.064   |   1.711   |   1.318   |
| 25 |   2.787   |   2.485   |   2.060   |   1.708   |   1.316   |
|----|-----------|-----------|-----------|-----------|-----------|

Comparing Confidence Intervals
[Figure]

Together
• Use the given confidence level and sample statistics to find (a) the margin of error, and (b) the 90% confidence interval for the population mean lifespan µ of a home furnace: n = 25, x-bar = 8.5 years, s = 3.1 years. Assume the population is normally distributed.

Together
• Given the following sample, find the 90% confidence interval for the mean lifespan of a home furnace in years. Assume the population is normally distributed.

9.4  12.3  6.3  8.7  9.5  6.1  10.6  7.3  8.1  8.4  14.7  9.2

• Find the margin of error.

Determining the Proper Sample Size

Sample Size
• How large does the sample need to be to get an estimate of µ with an acceptable margin of error?
$E = z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$ → solve for n → $n = \left( \frac{z_{\alpha/2} \, \sigma}{E} \right)^2$
• In the above formula, E might be, for example, 400 as in ±400 dollars.
• If the population standard deviation (σ) is unknown, then use the sample standard deviation (s).

Together
• How many invoices do I need to sample to get a 95% confidence interval of the mean invoice amount with a $4 margin of error? Previous sampling has yielded a sample standard deviation of s=16.7.

Comparing Confidence Intervals
[Figure]

Together
You want to estimate the mean weight loss of people one year after using the Atkins diet. How many dieters must be surveyed if we want to be 95% confident that the sample mean weight loss is within 0.25 lb. of the true population mean? Assume that the population standard deviation is known to be 10.6 lb (based on data from "Comparison of the Atkins, Ornish, Weight Watchers, and Zone Diets for Weight Loss and Heart Disease Risk Reduction", by Dansinger et al., Journal of the American Medical Association, Vol. 293, No. 1).
Source: Triola, Page 348, Section 7-3, #35
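The sample-size formula can be sketched as a small function. This is a minimal example under stated assumptions: the name required_sample_size is hypothetical, the z critical value is computed from the standard library rather than read from a table, and the result is rounded up, since rounding down would leave the margin of error slightly larger than requested.

```python
from math import ceil
from statistics import NormalDist


def required_sample_size(sd, margin, confidence=0.95):
    """Smallest n for which z * sd / sqrt(n) <= margin.

    Implements n = (z * sd / E)^2, rounded up to a whole observation.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z * sd / margin) ** 2)


# Invoice question: s = 16.7, E = $4, 95% confidence.
invoices = required_sample_size(16.7, 4)
# Atkins question: sigma = 10.6 lb, E = 0.25 lb, 95% confidence.
dieters = required_sample_size(10.6, 0.25)
print(invoices, dieters)
```

Because E appears squared in the denominator, halving the margin of error roughly quadruples the required sample size.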
© Copyright 2024