Stat 104: Quantitative Methods for Economists
Class 17: Confidence Intervals - One Sample Proportion

What is a sample proportion?
Say we survey n people and ask them if they like the
movie The Social Network (yes or no). We obtain n
responses of the form
$$X_i = \begin{cases} 1 & \text{if yes} \\ 0 & \text{if no} \end{cases}$$
The true (but unknown) population proportion of people
who like the movie is p, so that $P(X_i = 1) = p$.
We can estimate p by using
$$\hat{p} = \bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$$
For example, the six responses 1, 0, 1, 1, 1, 0 give $\hat{p} = 4/6$.

Another Student Survey

Sample Member   Student ID   Do you smoke regularly?   Numerical Coding
1               232923       No                        0
2               234932       Yes                       1
3                            Yes                       1
4                            No                        0
:               :            :                         :
49                           No                        0
50                           Yes                       1

Suppose that 11 of the 50 students surveyed report that they
regularly smoke. The sample proportion is $\hat{p} = 11/50 = 22\%$.
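
Because the responses are coded 0/1, the sample proportion is just the sample mean of the coded column. A minimal Python sketch of that idea follows; the list `coded` is a stand-in built from the counts on the slide (11 "Yes", 39 "No"), not the actual survey file, and the order of its entries is arbitrary.

```python
# Stand-in for the "Numerical Coding" column: 1 = smokes regularly, 0 = does not.
# Only the counts (11 ones, 39 zeros) come from the slide; the order is made up.
coded = [1] * 11 + [0] * 39

n = len(coded)
p_hat = sum(coded) / n  # sample proportion = sample mean of the 0/1 values

print(f"n = {n}, p_hat = {p_hat:.2f}")
```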

The Central Limit Theorem Works for Proportions
If a random sample of size n is obtained from
some population where the probability of
having some characteristic is p, then (for
large sample sizes)
$$\hat{p} \sim N\left(p, \frac{p(1-p)}{n}\right)$$
that is, $\hat{p}$ is approximately normal with mean p and
standard deviation $\sigma_{\hat{p}} = \sqrt{p(1-p)/n}$.

Example
If the true proportion of voters who support
Proposition A is p = .4, what is the probability
that a sample of size 200 yields a sample
proportion between .40 and .45?
i.e.: if p = .4 and n = 200, what is
$$P(.40 \le \hat{p} \le .45)?$$

Example
If p = .4 and n = 200, what is $P(.40 \le \hat{p} \le .45)$?
Find $\sigma_{\hat{p}}$:
$$\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{.4(1-.4)}{200}} = .03464$$
Convert to standard normal:
$$P(.40 \le \hat{p} \le .45) = P\left(\frac{.40-.40}{.03464} \le z \le \frac{.45-.40}{.03464}\right) = P(0 \le z \le 1.44)$$
Use standard normal table:
$$P(0 \le z \le 1.44) = .4251$$

Confidence Intervals for Proportions
Using the same logic as before for means,
a (1-α)100% confidence interval is given
by
$$\hat{p} \pm z_{\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
We always assume n is big (larger than 30)
when estimating proportions.
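
Returning to the Proposition A example (p = .4, n = 200), the table lookup can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library; it is only a check of the normal approximation, and it should come out slightly above the table value .4251 because z is not rounded to 1.44.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.4, 200

# Standard deviation of the sample proportion under the normal approximation
sigma = sqrt(p * (1 - p) / n)

# Approximate sampling distribution of p_hat
approx = NormalDist(mu=p, sigma=sigma)

# P(.40 <= p_hat <= .45)
prob = approx.cdf(0.45) - approx.cdf(0.40)

print(f"sigma = {sigma:.5f}")
print(f"P(.40 <= p_hat <= .45) = {prob:.4f}")
```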

The critical value
The 95% confidence interval is usually used,
but some other favorites are 90% and 99%.

Example: A marketing research firm contacts a
random sample of 100 men in Chicago and finds that
40% of them prefer the Gillette Sensor razor to all
other brands. The 95% C.I. for the proportion of all
men in Chicago who prefer the Gillette Sensor is
determined as follows:
$$0.40 \pm 1.96\sqrt{\frac{0.40(1-0.40)}{100}} = 0.40 \pm 1.96(0.049) = (0.30398, 0.49602)$$
So with 95% confidence, we estimate the proportion
of all men in Chicago who prefer the Gillette Sensor
to be somewhere between 30 and 50 percent (pretty
good market share).
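
To see how the critical value changes the interval, here is a small Python sketch that recomputes the Gillette interval at the 90%, 95% and 99% levels. The function name `wald_interval` is made up for this illustration, and the critical value is taken from `statistics.NormalDist().inv_cdf` rather than from a table, so the 95% value is 1.95996... instead of the rounded 1.96.

```python
from math import sqrt
from statistics import NormalDist

def wald_interval(p_hat, n, confidence=0.95):
    """Return p_hat +/- z_{alpha/2} * sqrt(p_hat * (1 - p_hat) / n)."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # critical value z_{alpha/2}
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

# Gillette example: 40% of a sample of 100 men
for level in (0.90, 0.95, 0.99):
    low, high = wald_interval(0.40, 100, level)
    print(f"{level:.0%}: ({low:.4f}, {high:.4f})")
```

Higher confidence means a larger critical value and therefore a wider interval around the same point estimate.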

Example: CI for proportion
As the new manager of a bank’s credit card department,
you have been asked to persuade the 1.5 million
cardholders to spend an extra $10/month on credit card
insurance. Because this insurance mainly protects the
bank, you have doubts about people’s willingness to buy
the new product. You take a random sample of 300
cardholders and 33 say they would buy the insurance.
$$\frac{33}{300} \pm 1.96\sqrt{\frac{.11(1-.11)}{300}} = .11 \pm 1.96(.0181) = (.075, .145)$$
With 95% confidence the population proportion of
those who might buy the insurance is estimated to lie
between 7.5% and 14.5%.

Calculating the CI in Stata
There are actually several ways to calculate a
confidence interval for a proportion (more
details in a few slides).
This interval is called a Wald interval.
[Stata screenshot, annotated with "Number of trials" and "Number of successes"]

Example: Survey Data
A CNN/USA Today/Gallup Poll asked 299 parents of K-12
children the following question (during March 2009):
Thinking about your oldest child, when he or she is at
school, do you fear for his or her physical safety?
Of the parents surveyed, 136 (45.5%) answered “Yes” and
163 (54.5%) answered “No.” The pollsters reported a
margin of error of +/– 6 percent.
Where does this 6% come from?

Stata Output
[Stata screenshot not reproduced]

Examine the Formula
Using our binomial confidence interval formula,
the confidence interval for the proportion of “yes”
responses is
$$\hat{p} \pm 2\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = 0.455 \pm 2\sqrt{\frac{.455(.545)}{299}} = 0.455 \pm 0.058$$
This is the 6% mentioned in the poll report. It
is what pollsters call the “margin of error”.
Oops! I get lazy sometimes and use 2 instead of 1.96.

Determining Sample Size
Say we want to perform a survey. How many people do
we need to poll to be, oh, within 3% of the true value?
The interval is
$$\hat{p} \pm 1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
so we need
$$1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = .03 \quad \text{or} \quad n = (1.96)^2\, \hat{p}(1-\hat{p}) / (.03)^2$$
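
As a check on the margin-of-error arithmetic for the Gallup poll (136 "Yes" out of 299), the following Python sketch computes the half-width with both multipliers, 2 and 1.96. The variable names are chosen for this illustration only.

```python
from math import sqrt

yes, n = 136, 299
p_hat = yes / n

# Estimated standard error of the sample proportion
standard_error = sqrt(p_hat * (1 - p_hat) / n)

margin_with_2 = 2 * standard_error       # the "lazy" multiplier
margin_with_196 = 1.96 * standard_error  # the 95% critical value

print(f"p_hat = {p_hat:.3f}")
print(f"margin using 2:    {margin_with_2:.3f}")
print(f"margin using 1.96: {margin_with_196:.3f}")
```

Either multiplier rounds to the 6 percent that the pollsters reported.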

Example: Calculating Sample Size
How do we find n? We need to know a value
for p:
$$n = (1.96)^2\, \hat{p}(1-\hat{p}) / (.03)^2$$
What’s the worst case scenario for p?
The value is maximized when $\hat{p} = 0.5$.

Example: Calculating Sample Size (cont)
So use $\hat{p} = 0.5$ as the “worst case scenario”
when performing sample size calculations.
So the desired sample size is
$$n = (1.96)^2\, \hat{p}(1-\hat{p}) / (.03)^2 = (3.84)(0.5)(0.5) / (.03)^2 = (0.96) / (.03)^2 = 1066.67$$
So we should sample 1067
people; we always round up.
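
The worst-case argument and the sample size arithmetic can be sketched in a few lines of Python. The function `sample_size` and its parameter names are hypothetical, introduced only for this illustration. One thing to watch: the slide rounds $(1.96)^2$ to 3.84 and gets 1066.67, which rounds up to 1067; with the unrounded 3.8416 the quotient is about 1067.1, which rounds up to 1068.

```python
from math import ceil

def sample_size(margin, p_guess=0.5, z=1.96):
    """Smallest n with z * sqrt(p_guess * (1 - p_guess) / n) <= margin."""
    return ceil(z ** 2 * p_guess * (1 - p_guess) / margin ** 2)

# p * (1 - p) over a grid of candidate values peaks at p = 0.5
grid = [i / 100 for i in range(1, 100)]
worst_p = max(grid, key=lambda p: p * (1 - p))
print("worst-case p:", worst_p)

# Unrounded (1.96)^2 = 3.8416
n_exact = sample_size(0.03)
print("n with (1.96)^2 = 3.8416:", n_exact)

# The slide's arithmetic, with (1.96)^2 rounded to 3.84
n_slide = ceil(3.84 * 0.5 * 0.5 / 0.03 ** 2)
print("n with 3.84:", n_slide)
```

The two answers differ by one person, which is immaterial for planning a survey but explains why both 1067 and 1068 show up for this calculation.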

What if there are no successes?
I once taught a short course to twenty
students at General Electric.
As part of a class survey, I asked how many
were vegetarians.
None of them were.
What happens to the confidence interval in
this case?

Confidence Interval
What does Stata give if we observe no
successes?
The interval is (0, 0), which is not that useful.
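
The collapse to (0, 0) follows directly from the Wald formula: with no successes, the estimated standard error is zero. A minimal Python sketch using the vegetarian survey counts (0 out of 20):

```python
from math import sqrt

x, n = 0, 20  # no vegetarians among twenty students
p_hat = x / n

# With p_hat = 0 the term p_hat * (1 - p_hat) is 0, so the half-width is 0
half_width = 1.96 * sqrt(p_hat * (1 - p_hat) / n)
interval = (p_hat - half_width, p_hat + half_width)

print(interval)  # (0.0, 0.0)
```

The same thing happens when every response is a success: the interval collapses to (1, 1).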

Other CI’s for the Proportion
Stata has several different types of CI’s for
the proportion.
The one we just discussed and will use on
hw’s and exams is called the Wald Interval.
The Wald interval is ok, but not great if
n is small and/or p is near 0 or 1.

The Agresti Interval
Alan Agresti teaches Stat 101 in the fall.
He wrote an extensive paper a few years ago
comparing and discussing all the different
confidence intervals for the proportion.

The Agresti Interval (cont)
We usually define the sample proportion as:
$$\hat{p} = \frac{x}{n} = \frac{\text{number of successes in the sample}}{\text{sample size}}$$
Under the Agresti approach, we define it as
$$\hat{p} = \frac{x+2}{n+4}$$
Then do the CI formula we have been using:
$$\hat{p} \pm 1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

Stata Example
Consider the Gillette Example from earlier,
with 40 out of 100 men preferring the Sensor
razor.
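
A rough Python sketch of the adjusted interval, applied to the Gillette counts (40 of 100) and to the earlier vegetarian survey (0 of 20). Two assumptions are made here that the slide does not state: the standard error uses n + 4 as well (the usual "plus four" form of the adjustment), and the limits are clipped to the range [0, 1]. The function name `plus_four_interval` is made up, and the numbers need not match what Stata prints for its Agresti option.

```python
from math import sqrt

def plus_four_interval(x, n, z=1.96):
    """Adjusted interval based on p = (x + 2) / (n + 4).

    Assumptions (not from the slide): the standard error also uses
    n + 4, and the limits are clipped to [0, 1].
    """
    n_adj = n + 4
    p_adj = (x + 2) / n_adj
    half_width = z * sqrt(p_adj * (1 - p_adj) / n_adj)
    return max(0.0, p_adj - half_width), min(1.0, p_adj + half_width)

low, high = plus_four_interval(40, 100)
print(f"40 of 100: ({low:.4f}, {high:.4f})")

low0, high0 = plus_four_interval(0, 20)
print(f" 0 of 20:  ({low0:.4f}, {high0:.4f})")
```

For 40 of 100 the adjustment barely moves the interval, while for 0 of 20 it gives an interval with positive width instead of (0, 0).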

Stata Example
Now consider the case where 0 out of 20 people are
vegetarians:
[Stata screenshot not reproduced]
On homeworks and exams we will use the original confidence interval.

Practice
A university dean is interested in determining the proportion of
students who receive some sort of financial aid. Rather than
examine the records for all students, the dean randomly selects 200
students and finds that 118 of them are receiving financial aid.
Compute a 95% confidence interval for the true proportion of
students on financial aid.

Practice
The amount of caffeine is measured for a random sample of n=31
cups of coffee served at a local diner. Across the 31 cups, the
sample mean is 110 milligrams, and the sample standard deviation
is 15 milligrams. Calculate a 95% confidence interval for the
population mean caffeine content.

Practice
You are told that a 95% confidence interval for the
population mean is 17.3 to 24.5. If the sample standard
deviation is 18.2, how large was the sample?

Practice
A campaign was designed to convince car owners that they should
fill their tires with nitrogen instead of air.
At a cost of about $5 per tire, nitrogen supposedly has the
advantage of leaking at a much slower rate than air, so that the ideal
tire pressure can be maintained more consistently. Before spending
huge sums to advertise the nitrogen, it would be wise to conduct a
survey to determine the percentage of car owners who would pay for
the nitrogen.
At the 95% confidence level, how many randomly selected car
owners should be surveyed? Assume that we want to be 95%
confident that the sample percentage is within three percentage
points of the true percentage of all car owners who would be willing
to pay for the nitrogen.

Practice
Two confidence interval estimates from the same sample are
(16.4,29.8) and (14.3,31.9). What is the sample mean, and if one
estimate is at the 95% level while the other is at the 99% level,
which is which?
a) (16.4,29.8) is the 95% level
b) (16.4,29.8) is the 99% level
c) It is impossible to completely answer this question without knowing the sample size
d) It is impossible to completely answer this question without knowing the sample standard
deviation
e) It is impossible to completely answer this question without knowing both the sample
standard deviation and the sample size.

Midterm Exam: March 9
6pm; locations to be announced
Open book/Open notes
Topics Include
Mean and Variance of Data, Correlation,
Covariance, Regression
Basic probability, 2x2 tables, conditional, indep.
Random Variables, means, variance, sums
Binomial and Normal Distributions
CI for means and proportions (all 95%)

Things you should know
What is a confidence interval
How to calculate CI’s for means and proportions
C.I. Summary:

Truth   Guess       Confidence Interval
µ       $\bar{X}$   $\bar{X} \pm z_{\alpha/2} \dfrac{s}{\sqrt{n}}$
p       $\hat{p}$   $\hat{p} \pm z_{\alpha/2} \sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}$

Readings from the book
Sections 4.1, 4.2 and 5.3 (skip 5.3.3)