Stat 104: Quantitative Methods for Economists

Class 17: Confidence Intervals - One Sample Proportion

What is a sample proportion?

Say we survey n people and ask them if they like the movie The Social Network (yes or no). We obtain n responses of the form

$$X_i = \begin{cases} 1 & \text{if yes} \\ 0 & \text{if no} \end{cases}$$

The true (but unknown) population proportion of people who like the movie is p, so that $P(X_i = 1) = p$. We can estimate p by using

$$\hat{p} = \bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$$

For example, the responses 1, 0, 1, 1, 1, 0 give $\hat{p} = 4/6$.

Another Student Survey

Sample Member | Student ID | Do you smoke regularly? | Numerical Coding
1             | 232923     | No                      | 0
2             | 234932     | Yes                     | 1
3             |            | Yes                     | 1
4             |            | No                      | 0
:             | :          | :                       | :
49            |            | No                      | 0
50            |            | Yes                     | 1

Suppose that 11 of the 50 students surveyed report that they regularly smoke. The sample proportion is $\hat{p} = 11/50 = 22\%$.

The Central Limit Theorem Works for Proportions

If a random sample of size n is obtained from some population where the probability of having some characteristic is p, then (for large sample sizes)

$$\hat{p} \sim N\left(p, \frac{p(1-p)}{n}\right)$$

Example

If the true proportion of voters who support Proposition A is p = .4, what is the probability that a sample of size 200 yields a sample proportion between .40 and .45? That is, if p = .4 and n = 200, what is $P(.40 \le \hat{p} \le .45)$?

Find $\sigma_{\hat{p}}$:

$$\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{.4(1-.4)}{200}} = .03464$$

Convert to standard normal:

$$P(.40 \le \hat{p} \le .45) = P\left(\frac{.40 - .40}{.03464} \le z \le \frac{.45 - .40}{.03464}\right) = P(0 \le z \le 1.44)$$

Use the standard normal table: $P(0 \le z \le 1.44) = .4251$

Confidence Intervals for Proportions

Using the same logic as before for means, a $(1-\alpha)100\%$ confidence interval is given by

$$\hat{p} \pm z_{\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

We always assume n is big (larger than 30) when estimating proportions.

The critical value

The 95% confidence interval ($z_{\alpha/2} = 1.96$) is usually used, but some other favorites are 90% ($z_{\alpha/2} = 1.645$) and 99% ($z_{\alpha/2} = 2.576$).

Example

A marketing research firm contacts a random sample of 100 men in Chicago and finds that 40% of them prefer the Gillette Sensor razor to all other brands. The 95% C.I. for the proportion of all men in Chicago who prefer the Gillette Sensor is determined as follows:

$$0.40 \pm 1.96 \sqrt{\frac{0.40(1-0.40)}{100}} = 0.40 \pm 1.96(0.04899) = (0.30398, 0.49602)$$

So with 95% confidence, we estimate the proportion of all men in Chicago who prefer the Gillette Sensor to be somewhere between 30 and 50 percent (pretty good market share).

Calculating the CI in Stata

[Stata screenshot not recovered; its annotations label the number of trials and the number of successes.]

There are actually several ways to calculate a confidence interval for a proportion (more details in a few slides). This interval is called a Wald interval.

Example: CI for proportion

As the new manager of a bank's credit card department, you have been asked to persuade the 1.5 million cardholders to spend an extra $10/month on credit card insurance. Because this insurance mainly protects the bank, you have doubts about people's willingness to buy the new product. You take a random sample of 300 cardholders and 33 say they would buy the insurance.

$$\frac{33}{300} \pm 1.96 \sqrt{\frac{.11(1-.11)}{300}} = .11 \pm 1.96(.0181) = (.075, .145)$$

With 95% confidence the population proportion of those who might buy the insurance is estimated to lie between 7.5% and 14.5%.

Example: Survey Data

A CNN/USA Today/Gallup Poll asked 299 parents of K-12 children the following question (during March 2009): Thinking about your oldest child, when he or she is at school, do you fear for his or her physical safety? Of the parents surveyed, 136 (45.5%) answered "Yes" and 163 (54.5%) answered "No." The pollsters reported a margin of error of +/- 6 percent. Where does this 6% come from?

Stata Output

[Stata output not recovered.]

Examine the Formula

Using our binomial confidence interval formula, the confidence interval for the proportion of "yes" responses is

$$\hat{p} \pm 2 \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = 0.455 \pm 2 \sqrt{\frac{.455(.545)}{299}} = 0.455 \pm 0.058$$

The 0.058 is the 6% mentioned on the last slide. This is what pollsters call the "margin of error".

Oops! I get lazy sometimes and use 2 instead of 1.96.

Determining Sample Size

Say we want to perform a survey. How many people do we need to poll to be, oh, within 3% of the true value? In the interval

$$\hat{p} \pm 1.96 \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

we need the margin of error to equal .03:

$$1.96 \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = .03 \quad \text{or} \quad n = (1.96)^2 \, \hat{p}(1-\hat{p}) / (.03)^2$$

Example: Calculating Sample Size

How do we find n? We need to know a value for p:

$$n = (1.96)^2 \, \hat{p}(1-\hat{p}) / (.03)^2$$

What's the worst case scenario for p? The value of $\hat{p}(1-\hat{p})$ is maximized when $\hat{p} = 0.5$, so use $\hat{p} = 0.5$ as the "worst case scenario" when computing the sample size. The desired sample size is

$$n = (1.96)^2 \, \hat{p}(1-\hat{p}) / (.03)^2 = (3.8416)(0.5)(0.5) / (.03)^2 = (0.9604) / (.03)^2 = 1067.11$$

So we should sample 1068 people; we always round up.

What if there are no successes?

I once taught a short course to twenty students at General Electric. As part of a class survey, I asked how many were vegetarians. None of them were. What happens to the confidence interval in this case?

Confidence Interval

What does Stata give if we observe no successes?

[Stata output not recovered.]

The interval is (0,0), which is not that useful.

Other CI's for the Proportion

Stata has several different types of CI's for the proportion. The one we just discussed, and will use on homeworks and exams, is called the Wald interval. The Wald interval is OK, but not great if n is small or p is near 0 or 1.

The Agresti Interval

Alan Agresti teaches Stat 101 in the fall. He wrote an extensive paper a few years ago comparing and discussing all the different confidence intervals for the proportion.

The Agresti Interval (cont)

We usually define the sample proportion as:

$$\hat{p} = \frac{x}{n} = \frac{\text{number of successes in the sample}}{\text{sample size}}$$

Under the Agresti approach, we define it as

$$\hat{p} = \frac{x+2}{n+4}$$

Then do the CI formula we have been using:

$$\hat{p} \pm 1.96 \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

Stata Example

Consider the Gillette example from earlier, with 40 out of 100 men preferring the Sensor razor.
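Since the Stata output for this slide is not available here, the comparison can be sketched in Python instead. This is a minimal sketch, not the course's method: the function names `wald_interval` and `agresti_interval` are made up for illustration, the adjusted interval uses n + 4 in the standard error (the usual "plus four" convention, which is an assumption on top of the slide's formula), and endpoints are clipped to [0, 1]. Stata's own Agresti option may give slightly different numbers.

```python
import math


def wald_interval(x, n, z=1.96):
    """Wald interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = x / n
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


def agresti_interval(x, n, z=1.96):
    """Adjusted interval: add 2 successes and 2 failures, then apply the Wald formula.

    Assumptions (not taken from the slides): the standard error uses n + 4,
    and the endpoints are clipped to the range [0, 1].
    """
    n_adj = n + 4
    p_adj = (x + 2) / n_adj
    half_width = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
    return max(0.0, p_adj - half_width), min(1.0, p_adj + half_width)


# Gillette example: 40 of 100 men prefer the Sensor razor.
gillette_wald = wald_interval(40, 100)
gillette_agresti = agresti_interval(40, 100)
print("Wald:    (%.4f, %.4f)" % gillette_wald)
print("Agresti: (%.4f, %.4f)" % gillette_agresti)
```

With n = 100 and a proportion near the middle, the two intervals are close. Calling both functions with x = 0 and n = 20 shows the difference that motivates the adjustment: the Wald interval collapses to (0, 0), while the adjusted interval keeps a positive upper endpoint.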
Stata Example

Now consider 0 out of 20 people being vegetarians:

[Stata output not recovered.]

On homeworks and exams we will use the original confidence interval.

Practice

A university dean is interested in determining the proportion of students who receive some sort of financial aid. Rather than examine the records for all students, the dean randomly selects 200 students and finds that 118 of them are receiving financial aid. Compute a 95% confidence interval for the true proportion of students on financial aid.

Practice

The amount of caffeine is measured for a random sample of n=31 cups of coffee served at a local diner. Across the 31 cups, the sample mean is 110 milligrams, and the sample standard deviation is 15 milligrams. Calculate a 95% confidence interval for the population mean caffeine content.

Practice

You are told that a 95% confidence interval for the population mean is 17.3 to 24.5. If the sample standard deviation is 18.2, how large was the sample?

Practice

A campaign was designed to convince car owners that they should fill their tires with nitrogen instead of air. At a cost of about $5 per tire, nitrogen supposedly has the advantage of leaking at a much slower rate than air, so that the ideal tire pressure can be maintained more consistently. Before spending huge sums to advertise the nitrogen, it would be wise to conduct a survey to determine the percentage of car owners who would pay for the nitrogen. At the 95% confidence level, how many randomly selected car owners should be surveyed? Assume that we want to be 95% confident that the sample percentage is within three percentage points of the true percentage of all car owners who would be willing to pay for the nitrogen.

Practice

Two confidence interval estimates from the same sample are (16.4,29.8) and (14.3,31.9). What is the sample mean, and if one estimate is at the 95% level while the other is at the 99% level, which is which?

a) (16.4,29.8) is the 95% level
b) (16.4,29.8) is the 99% level
c) It is impossible to completely answer this question without knowing the sample size
d) It is impossible to completely answer this question without knowing the sample standard deviation
e) It is impossible to completely answer this question without knowing both the sample standard deviation and the sample size.

Midterm Exam: March 9

6pm, locations to be announced
Open book/Open notes
Topics Include
  Mean and Variance of Data, Correlation, Covariance, Regression
  Basic probability, 2x2 tables, conditional, indep.
  Random Variables, means, variance, sums
  Binomial and Normal Distributions
  CI for means and proportions (all 95%)

Things you should know

What is a confidence interval
How to calculate CI's for means and proportions

C.I. Summary:

Truth | Guess       | Confidence Interval
$\mu$ | $\bar{X}$   | $\bar{X} \pm z_{\alpha/2} \, \frac{s}{\sqrt{n}}$
$p$   | $\hat{p}$   | $\hat{p} \pm z_{\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$

Readings from the book: Sections 4.1, 4.2 and 5.3 (skip 5.3.3)
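The two rows of the C.I. summary, plus the sample-size and normal-approximation calculations from this class, can be collected into a few small Python helpers. This is only a sketch under stated assumptions: the function names (`mean_ci`, `proportion_ci`, `sample_size`, `prob_p_hat_between`) are made up for illustration, z defaults to 1.96 for 95% confidence, and the numbers passed to `mean_ci` are hypothetical rather than taken from the slides.

```python
import math
from statistics import NormalDist


def mean_ci(xbar, s, n, z=1.96):
    """CI for a mean: xbar +/- z * s / sqrt(n)."""
    half_width = z * s / math.sqrt(n)
    return xbar - half_width, xbar + half_width


def proportion_ci(x, n, z=1.96):
    """Wald CI for a proportion: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = x / n
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


def sample_size(margin, p_guess=0.5, z=1.96):
    """Smallest n with z * sqrt(p(1-p)/n) <= margin; p_guess=0.5 is the worst case."""
    return math.ceil(z ** 2 * p_guess * (1 - p_guess) / margin ** 2)


def prob_p_hat_between(low, high, p, n):
    """P(low <= p_hat <= high) using p_hat ~ N(p, p(1-p)/n)."""
    sigma = math.sqrt(p * (1 - p) / n)
    sampling_dist = NormalDist(mu=p, sigma=sigma)
    return sampling_dist.cdf(high) - sampling_dist.cdf(low)


# Credit card example: 33 of 300 cardholders would buy the insurance.
card_ci = proportion_ci(33, 300)
# Poll within 3 percentage points at 95% confidence, worst case p = 0.5.
poll_n = sample_size(0.03)
# Proposition A example: p = .4, n = 200, P(.40 <= p_hat <= .45).
prop_a = prob_p_hat_between(0.40, 0.45, p=0.4, n=200)
# Hypothetical numbers, only to exercise the mean formula.
demo_mean_ci = mean_ci(xbar=50.0, s=10.0, n=100)

print("credit card CI: (%.4f, %.4f)" % card_ci)
print("poll sample size:", poll_n)
print("P(.40 <= p_hat <= .45): %.4f" % prop_a)
print("demo mean CI: (%.2f, %.2f)" % demo_mean_ci)
```

Two small differences from hand calculation are worth expecting. With the unrounded $1.96^2 = 3.8416$, the sample-size formula gives about 1067.1 before rounding up. The Proposition A probability comes out near .4255 rather than the table value .4251, because the table lookup rounds z to 1.44.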