INTERVAL ESTIMATION (Proportion – Large Sample)
GIVEN:
The proportion of members of a large population who possess some particular
characteristic is p. The value of p is unknown.
AIM:
To take a random sample of size n (n is large), with or without replacement,
from the population and then find an interval which may contain p.
METHOD
1st step.
Select a random sample of size n (n is large) from the population.
2nd step.
Choose a number α between 0 and 1. Usually α = 0.05, so that 1 − α = 95%,
or α = 0.10, so that 1 − α = 90%.
3rd step.
Compute the values
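$$
\frac{\dfrac{x}{n} + \dfrac{z_{\alpha/2}^{2}}{2n} \pm \dfrac{z_{\alpha/2}}{\sqrt{n}} \sqrt{\left(\dfrac{x}{n}\right)\left(1 - \dfrac{x}{n}\right) + \dfrac{z_{\alpha/2}^{2}}{4n}}}{1 + \dfrac{z_{\alpha/2}^{2}}{n}}
$$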
where x is the number of members in the random sample who possess the
characteristic, and z_{α/2} is the value that a standard normal random
variable exceeds with probability α/2.
4th step.
The interval with endpoints
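$$
\frac{\dfrac{x}{n} + \dfrac{z_{\alpha/2}^{2}}{2n} \pm \dfrac{z_{\alpha/2}}{\sqrt{n}} \sqrt{\left(\dfrac{x}{n}\right)\left(1 - \dfrac{x}{n}\right) + \dfrac{z_{\alpha/2}^{2}}{4n}}}{1 + \dfrac{z_{\alpha/2}^{2}}{n}}
$$
is called a (1 − α)100% confidence interval for p.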
If 1 − α = 95 % , then the interval is called a 95% confidence interval for p.
If 1 − α = 90 % , then the interval is called a 90% confidence interval for p.
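The steps above can be sketched in Python as follows. This is only an illustrative sketch: the function name `proportion_interval` and the sample figures (36 successes out of 100) are assumptions made for the illustration, and z_{α/2} is taken from the standard library's `statistics.NormalDist` rather than from a table.

```python
from math import sqrt
from statistics import NormalDist


def proportion_interval(x, n, alpha=0.05):
    """Endpoints of the (1 - alpha)100% interval computed in the 3rd step."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}
    p_hat = x / n                            # sample proportion x/n
    centre = p_hat + z**2 / (2 * n)
    half_width = (z / sqrt(n)) * sqrt(p_hat * (1 - p_hat) + z**2 / (4 * n))
    denom = 1 + z**2 / n
    return (centre - half_width) / denom, (centre + half_width) / denom


# Illustrative data (an assumption): 36 of 100 sampled members have the characteristic.
lower, upper = proportion_interval(36, 100, alpha=0.05)
print(f"{lower:.4f} < p < {upper:.4f}")
```

The numerator and denominator are kept as separate variables so that each term can be matched against the formula in the 3rd step.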
NOTE
For large values of n,
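$$
\frac{\dfrac{x}{n} + \dfrac{z_{\alpha/2}^{2}}{2n} \pm \dfrac{z_{\alpha/2}}{\sqrt{n}} \sqrt{\left(\dfrac{x}{n}\right)\left(1 - \dfrac{x}{n}\right) + \dfrac{z_{\alpha/2}^{2}}{4n}}}{1 + \dfrac{z_{\alpha/2}^{2}}{n}}
\approx \frac{x}{n} \pm z_{\alpha/2} \sqrt{\frac{\dfrac{x}{n}\left(1 - \dfrac{x}{n}\right)}{n}} .
$$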
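A small numerical check can make the approximation in the NOTE concrete. The sketch below compares the endpoints of the full formula with those of the simpler large-sample formula for a fixed sample proportion of 0.36 (an assumed value) and increasing n. The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def full_endpoints(x, n, z):
    """Endpoints from the full formula (left-hand side of the NOTE)."""
    p_hat = x / n
    centre = p_hat + z**2 / (2 * n)
    half = (z / sqrt(n)) * sqrt(p_hat * (1 - p_hat) + z**2 / (4 * n))
    denom = 1 + z**2 / n
    return (centre - half) / denom, (centre + half) / denom


def simple_endpoints(x, n, z):
    """Endpoints from the large-n approximation (right-hand side of the NOTE)."""
    p_hat = x / n
    half = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half


def max_gap(x, n, alpha=0.05):
    """Largest difference between corresponding endpoints of the two formulas."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    lo1, hi1 = full_endpoints(x, n, z)
    lo2, hi2 = simple_endpoints(x, n, z)
    return max(abs(lo1 - lo2), abs(hi1 - hi2))


for n in (100, 1_000, 10_000):
    x = round(0.36 * n)  # keep the sample proportion fixed at 0.36
    print(n, f"{max_gap(x, n):.5f}")
```

The printed gap shrinks as n grows, which is the sense in which the two sides of the NOTE agree for large n.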
EXAMPLE 1
If 36 out of 100 persons interviewed are familiar with the tax incentives for
installing certain energy-saving devices, construct a 95% confidence interval for the
corresponding true proportion. What can we assert with 95% confidence about the possible
size of the error?
SOLUTION
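$$
\begin{aligned}
\frac{x}{n} \pm z_{\alpha/2} \sqrt{\frac{\dfrac{x}{n}\left(1 - \dfrac{x}{n}\right)}{n}}
&= \frac{36}{100} \pm z_{0.05/2} \sqrt{\frac{\dfrac{36}{100}\left(1 - \dfrac{36}{100}\right)}{100}} \\
&= 0.36 \pm 1.96 \sqrt{\frac{0.36 \times 0.64}{100}} \\
&= 0.266 \text{ and } 0.454
\end{aligned}
$$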
A 95% confidence interval for p is approximately:
0.266 < p < 0.454
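The arithmetic of this example can be reproduced with a short Python sketch. The function name `large_sample_interval` is hypothetical, and z_{0.025} is computed with `statistics.NormalDist` (about 1.95996) instead of the tabulated 1.96, which does not change the rounded results.

```python
from math import sqrt
from statistics import NormalDist


def large_sample_interval(x, n, alpha=0.05):
    """Return (lower, upper, margin) for x/n +/- z * sqrt((x/n)(1 - x/n)/n)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    p_hat = x / n
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin, margin


lower, upper, margin = large_sample_interval(36, 100)
print(round(lower, 3), round(upper, 3), round(margin, 3))
```

The third returned value is the half-width of the interval, which is the quantity the example calls the size of error.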
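To find the size of the error, we use the following:
$$
P\left[\frac{X}{n} - 1.96 \sqrt{\frac{\dfrac{X}{n}\left(1 - \dfrac{X}{n}\right)}{n}} < p < \frac{X}{n} + 1.96 \sqrt{\frac{\dfrac{X}{n}\left(1 - \dfrac{X}{n}\right)}{n}}\right] \approx 95\%
$$
$$
P\left[-1.96 \sqrt{\frac{\dfrac{X}{n}\left(1 - \dfrac{X}{n}\right)}{n}} < \frac{X}{n} - p < 1.96 \sqrt{\frac{\dfrac{X}{n}\left(1 - \dfrac{X}{n}\right)}{n}}\right] \approx 95\%
$$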
With 95% confidence, the size of the error is accordingly at most
$$
1.96 \sqrt{\frac{\dfrac{X}{n}\left(1 - \dfrac{X}{n}\right)}{n}} = 1.96 \sqrt{\frac{\dfrac{36}{100}\left(1 - \dfrac{36}{100}\right)}{100}} = 0.094
$$
Supplementary Reading
MEMBERS WHO POSSESS SOME PARTICULAR CHARACTERISTIC
Population: true proportion p
Sample of size n (LARGE): sample proportion X/n
Since the population size is large, we have:
$$
P(X = x) \approx C^{n}_{x}\, p^{x} (1 - p)^{n - x}
$$
Since the sample size n is large, we can approximate the binomial distribution by the normal
distribution. Hence:
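$$
P\left(-z_{\alpha/2} < \frac{X - np}{\sqrt{np(1 - p)}} < z_{\alpha/2}\right) \approx 1 - \alpha
$$
$$
P\left(-z_{\alpha/2} < \frac{\dfrac{X}{n} - p}{\sqrt{\dfrac{p(1 - p)}{n}}} < z_{\alpha/2}\right) \approx 1 - \alpha
$$
$$
P\left(\left|\frac{\dfrac{X}{n} - p}{\sqrt{\dfrac{p(1 - p)}{n}}}\right| < z_{\alpha/2}\right) \approx 1 - \alpha
$$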
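Note that the following are equivalent:
$$
\begin{aligned}
& \left|\frac{\dfrac{X}{n} - p}{\sqrt{\dfrac{p(1 - p)}{n}}}\right| < z_{\alpha/2} \\
\Leftrightarrow\quad & \left(\frac{X}{n} - p\right)^{2} < z_{\alpha/2}^{2} \left(\frac{p(1 - p)}{n}\right) \\
\Leftrightarrow\quad & p^{2}\left(1 + \frac{z_{\alpha/2}^{2}}{n}\right) - p\left(\frac{2X}{n} + \frac{z_{\alpha/2}^{2}}{n}\right) + \left(\frac{X}{n}\right)^{2} < 0 \\
\Leftrightarrow\quad & \frac{\dfrac{X}{n} + \dfrac{z_{\alpha/2}^{2}}{2n} - \dfrac{z_{\alpha/2}}{\sqrt{n}} \sqrt{\left(\dfrac{X}{n}\right)\left(1 - \dfrac{X}{n}\right) + \dfrac{z_{\alpha/2}^{2}}{4n}}}{1 + \dfrac{z_{\alpha/2}^{2}}{n}} < p < \frac{\dfrac{X}{n} + \dfrac{z_{\alpha/2}^{2}}{2n} + \dfrac{z_{\alpha/2}}{\sqrt{n}} \sqrt{\left(\dfrac{X}{n}\right)\left(1 - \dfrac{X}{n}\right) + \dfrac{z_{\alpha/2}^{2}}{4n}}}{1 + \dfrac{z_{\alpha/2}^{2}}{n}}
\end{aligned}
$$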
Thus:
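$$
P\left[\frac{\dfrac{X}{n} + \dfrac{z_{\alpha/2}^{2}}{2n} - \dfrac{z_{\alpha/2}}{\sqrt{n}} \sqrt{\left(\dfrac{X}{n}\right)\left(1 - \dfrac{X}{n}\right) + \dfrac{z_{\alpha/2}^{2}}{4n}}}{1 + \dfrac{z_{\alpha/2}^{2}}{n}} < p < \frac{\dfrac{X}{n} + \dfrac{z_{\alpha/2}^{2}}{2n} + \dfrac{z_{\alpha/2}}{\sqrt{n}} \sqrt{\left(\dfrac{X}{n}\right)\left(1 - \dfrac{X}{n}\right) + \dfrac{z_{\alpha/2}^{2}}{4n}}}{1 + \dfrac{z_{\alpha/2}^{2}}{n}}\right] \approx 1 - \alpha
$$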
It follows that the endpoints of a (1 - α)100 % confidence interval for p are
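$$
\frac{\dfrac{x}{n} + \dfrac{z_{\alpha/2}^{2}}{2n} \pm \dfrac{z_{\alpha/2}}{\sqrt{n}} \sqrt{\left(\dfrac{x}{n}\right)\left(1 - \dfrac{x}{n}\right) + \dfrac{z_{\alpha/2}^{2}}{4n}}}{1 + \dfrac{z_{\alpha/2}^{2}}{n}} .
$$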
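The algebra above can be checked numerically: the two endpoints should be the roots of the quadratic in p, and the quadratic should be negative between them. The following sketch does this for assumed illustrative values x = 36 and n = 100. The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def quadratic(p, x, n, z):
    """Left-hand side of p^2(1 + z^2/n) - p(2x/n + z^2/n) + (x/n)^2 < 0."""
    return p**2 * (1 + z**2 / n) - p * (2 * x / n + z**2 / n) + (x / n) ** 2


def endpoints(x, n, z):
    """Endpoints of the confidence interval derived above."""
    p_hat = x / n
    centre = p_hat + z**2 / (2 * n)
    half = (z / sqrt(n)) * sqrt(p_hat * (1 - p_hat) + z**2 / (4 * n))
    denom = 1 + z**2 / n
    return (centre - half) / denom, (centre + half) / denom


x, n = 36, 100                    # illustrative values (an assumption)
z = NormalDist().inv_cdf(0.975)   # z_{alpha/2} for alpha = 0.05
lower, upper = endpoints(x, n, z)

# Both endpoints should make the quadratic (numerically) zero ...
print(quadratic(lower, x, n, z), quadratic(upper, x, n, z))
# ... and any p strictly between them should make it negative.
print(quadratic((lower + upper) / 2, x, n, z) < 0)
```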
END
EXERCISE
1.
In a random sample of 1000 houses in a certain city it is found that 288 own color
television sets. Find a 98% confidence interval for the fraction of homes in this city
that have color sets.
Ans.
(0.2547, 0.3213)
2.
(a)
A random sample of 500 cigarette smokers is selected and 86 are found to
have a preference for brand X. Find a 90% confidence interval for the fraction
of the population of cigarette smokers who prefer brand X.
(b)
What can be asserted with 90% confidence about the possible size of our error?
Ans.
(a)
(0.1442, 0.1998)
(b)
0.0278
THEOREM 1
Suppose
(a) E > 0 is given,
(b) $n \ge \dfrac{z_{\alpha/2}^{2}\, p(1 - p)}{E^{2}}$,
(c) n is large.
Then
$$
P\left(-E < \frac{X}{n} - p < E\right) \ge 1 - \alpha .
$$
Proof (for reference only)
Since n is large, we have:
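$$
P\left(-z_{\alpha/2} < \frac{X - np}{\sqrt{np(1 - p)}} < z_{\alpha/2}\right) = 1 - \alpha
$$
$$
P\left(-z_{\alpha/2} < \frac{\dfrac{X}{n} - p}{\sqrt{\dfrac{p(1 - p)}{n}}} < z_{\alpha/2}\right) = 1 - \alpha
$$
$$
P\left(-z_{\alpha/2} \sqrt{\frac{p(1 - p)}{n}} < \frac{X}{n} - p < z_{\alpha/2} \sqrt{\frac{p(1 - p)}{n}}\right) = 1 - \alpha
$$
If $n \ge \dfrac{z_{\alpha/2}^{2}\, p(1 - p)}{E^{2}}$, then:
$$
\sqrt{n} \ge \frac{z_{\alpha/2} \sqrt{p(1 - p)}}{E}
$$
$$
z_{\alpha/2} \sqrt{\frac{p(1 - p)}{n}} \le E
$$
Thus:
$$
\begin{aligned}
P\left(-E < \frac{X}{n} - p < E\right) &\ge P\left(-z_{\alpha/2} \sqrt{\frac{p(1 - p)}{n}} < \frac{X}{n} - p < z_{\alpha/2} \sqrt{\frac{p(1 - p)}{n}}\right) \\
&= 1 - \alpha
\end{aligned}
$$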
EXAMPLE 2
Suppose that we want to estimate the true proportion of defectives in a very large shipment
of adobe bricks, and that we want to be at least 95% confident that the error is at most 0.04.
How large a sample will we need if:
(a) We have no idea what the true proportion might be.
(b) We know that the true proportion does not exceed 0.12.
SOLUTION
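(a)
Since p(1 − p) attains its maximum value 1/4 at p = 1/2:
$$
\begin{aligned}
\max_{0 \le p \le 1} \frac{z_{\alpha/2}^{2}\, p(1 - p)}{E^{2}}
&= \max_{0 \le p \le 1} \frac{1.96^{2}\, p(1 - p)}{0.04^{2}} \\
&= \frac{1.96^{2}}{0.04^{2}} \times \frac{1}{4} \\
&= 600.25
\end{aligned}
$$
Hence n = 601
(b)
Since p(1 − p) is increasing for 0 ≤ p ≤ 1/2, its maximum over 0 ≤ p ≤ 0.12 is attained at p = 0.12:
$$
\begin{aligned}
\max_{0 \le p \le 0.12} \frac{z_{\alpha/2}^{2}\, p(1 - p)}{E^{2}}
&= \max_{0 \le p \le 0.12} \frac{1.96^{2}\, p(1 - p)}{0.04^{2}} \\
&= \frac{1.96^{2} \times 0.12 \times (1 - 0.12)}{0.04^{2}} \\
&= 253.55
\end{aligned}
$$
Hence n = 254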
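Both parts of this example apply condition (b) of Theorem 1 with the largest possible value of p(1 − p). A minimal Python sketch of that calculation is given below. The function name `sample_size` and its `p_upper` parameter are hypothetical, and z_{α/2} is computed with `statistics.NormalDist` rather than rounded to 1.96, which leaves the rounded-up answers unchanged here.

```python
from math import ceil
from statistics import NormalDist


def sample_size(error, alpha=0.05, p_upper=None):
    """Smallest integer n with n >= z^2 * p(1 - p) / E^2 for the worst-case p."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    # p(1 - p) increases on [0, 1/2], so the worst case is min(p_upper, 1/2);
    # with no prior knowledge of p, the worst case is p = 1/2.
    p = 0.5 if p_upper is None else min(p_upper, 0.5)
    return ceil(z**2 * p * (1 - p) / error**2)


print(sample_size(0.04))                 # part (a): nothing known about p
print(sample_size(0.04, p_upper=0.12))   # part (b): p known not to exceed 0.12
```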
HOW TO MAKE A GUESS – SOME OBSERVATIONS
Suppose X is a random variable with a binomial probability distribution given by
$$
P(X = x) = C^{60}_{x}\, p^{x} (1 - p)^{60 - x} .
$$
We try to make a guess about the binomial parameter p. Suppose we want to test the
following three possibilities:
p < 0.7
p = 0.7
p > 0.7
The following diagrams are helpful in making a decision.
The above diagrams show that an observation of X gives a hint of the possible value of p.
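As a rough numerical counterpart to this observation, the sketch below evaluates the binomial probabilities for n = 60 and reports the most likely value of X for a few values of p. The chosen values of p (0.5, 0.7, 0.9) and the function names are assumptions made for illustration only.

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(X = x) for a binomial distribution with parameters n and p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def most_likely_value(n, p):
    """Value of x in 0..n with the largest binomial probability."""
    return max(range(n + 1), key=lambda x: binomial_pmf(x, n, p))


for p in (0.5, 0.7, 0.9):
    print(p, most_likely_value(60, p))
```

Smaller values of p put the bulk of the distribution at smaller values of X, so an observed X well below or well above 60 × 0.7 = 42 hints at p < 0.7 or p > 0.7 respectively.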
TESTS CONCERNING p
AIM
To set up two hypotheses about the true population proportion p. The first is
called the null hypothesis and is denoted by H0. The second is called the
alternative hypothesis and is denoted by H1. The null hypothesis and the
alternative hypothesis are given in the table below:

Suppose you want to test one of the
following hypotheses against the other    Tested as              H0            H1
p = p0    p ≠ p0                          p = p0    p ≠ p0       H0: p = p0    H1: p ≠ p0
p ≥ p0    p < p0                          p = p0    p < p0       H0: p = p0    H1: p < p0
p ≤ p0    p > p0                          p = p0    p > p0       H0: p = p0    H1: p > p0

METHOD
To compute the value of a statistic and then determine whether the null
hypothesis or the alternative hypothesis should be accepted. The decision may
be correct or wrong (refer to the table below).
IF:           Decision                        Remark
H0 is true    Reject H0 (i.e. accept H1)      Type I error
H0 is true    Accept H0                       Decision is correct
H1 is true    Accept H0                       Type II error
H1 is true    Reject H0 (i.e. accept H1)      Decision is correct
The number α = P(H0 is rejected | H0 is true) is called the level of
significance.
EXAMPLE 3
A pheasant hunter claims that he hits 70% of the birds he shoots at. Do you agree with his
claim if on a given day he brings down 38 of the 60 pheasants he shoots at? Use a 0.05 level
of significance.
SOLUTION
Let p be the probability that the hunter hits a pheasant and X be the number of pheasants hit
by the hunter when he shoots at 60 pheasants. Then:
$$
P(X = x) = C^{60}_{x}\, p^{x} (1 - p)^{60 - x} .
$$
1.
H0: p = 0.70
H1: p ≠ 0.70
2.
α = 0.05
3.
Critical region:
Reject H0 if X < 35 or X > 49.
Note:
$$
\sum_{x=0}^{34} C^{60}_{x}\, 0.70^{x} (1 - 0.70)^{60 - x} = 0.0196 < \frac{\alpha}{2}
$$
$$
\sum_{x=0}^{35} C^{60}_{x}\, 0.70^{x} (1 - 0.70)^{60 - x} = 0.0362 > \frac{\alpha}{2}
$$
$$
\sum_{x=50}^{60} C^{60}_{x}\, 0.70^{x} (1 - 0.70)^{60 - x} = 0.0139 < \frac{\alpha}{2}
$$
$$
\sum_{x=49}^{60} C^{60}_{x}\, 0.70^{x} (1 - 0.70)^{60 - x} = 0.0295 > \frac{\alpha}{2}
$$
4.
Decision:
Since 38 does not fall in the critical region, we do not reject the
null hypothesis H0: p = 0.70.
Understanding the level of significance and type I error
For this example, we have:
$$
\begin{aligned}
P(\text{the null hypothesis is rejected} \mid p = 0.7) &= P(X < 35 \text{ or } X > 49 \mid p = 0.7) \\
&= 0.0335 \\
&\ne \alpha
\end{aligned}
$$
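The critical region and its actual probability under H0 can be recomputed from exact binomial sums. The sketch below is one possible way to do so. The function names are hypothetical, and the rule used (each tail may carry probability at most α/2) is the one implied by the Note in step 3.

```python
from math import comb


def pmf(x, n, p):
    """Binomial probability P(X = x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def two_sided_region(n, p0, alpha):
    """Return (lower, upper, size): reject when X <= lower or X >= upper,
    where each tail has probability at most alpha / 2 under p = p0."""
    lower, tail = -1, 0.0
    for x in range(n + 1):                 # grow the lower tail from x = 0
        if tail + pmf(x, n, p0) > alpha / 2:
            break
        tail += pmf(x, n, p0)
        lower = x
    lower_tail = tail

    upper, tail = n + 1, 0.0
    for x in range(n, -1, -1):             # grow the upper tail from x = n
        if tail + pmf(x, n, p0) > alpha / 2:
            break
        tail += pmf(x, n, p0)
        upper = x
    return lower, upper, lower_tail + tail


lower, upper, size = two_sided_region(60, 0.70, 0.05)
print(f"Reject H0 if X <= {lower} or X >= {upper}; actual size = {size:.4f}")
```

Because X is discrete, the attainable tail probabilities jump from one value to the next, which is why the actual probability of a Type I error falls below the nominal α.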
COMPUTER SIMULATION
We can use computer simulation to understand the method used to solve the above example.
Case 1: p = 0.7
The above figure shows that we usually accept the null hypothesis.
Case 2: p < 0.7
If p = 0.5 < 0.7, the values of X are usually smaller and we usually reject H0 and conclude
that p is less than 0.7.
Case 3: p > 0.7
If p = 0.88 > 0.7, the values of X are usually larger and we usually reject H0 and conclude
that p is greater than 0.7.
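A simulation of this kind can be sketched in a few lines of Python. This is not the program used to produce the figures. The function name, the number of trials and the random seed are all assumptions, and each value of X is generated as the number of successes in 60 independent trials.

```python
import random


def rejection_rate(p_true, trials=2000, n=60, seed=0):
    """Fraction of simulated samples for which H0: p = 0.7 is rejected."""
    rng = random.Random(seed)              # fixed seed so the run is repeatable
    rejected = 0
    for _ in range(trials):
        # X = number of hits in n independent shots, each hit with prob. p_true
        x = sum(rng.random() < p_true for _ in range(n))
        if x < 35 or x > 49:               # critical region from Example 3
            rejected += 1
    return rejected / trials


for p_true in (0.7, 0.5, 0.88):            # Case 1, Case 2, Case 3
    print(p_true, rejection_rate(p_true))
```

When p = 0.7 the rejection rate should stay near the Type I error probability, while for p = 0.5 and p = 0.88 most simulated samples should fall in the critical region.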
EXAMPLE 4
A manufacturer claimed that at least 90% of the equipment which he supplied to a factory
conformed to specifications. An examination of a random sample of 30 pieces of equipment
revealed that 23 were not faulty. Test the claim of the manufacturer at a 0.05 level of
significance.
SOLUTION
Aim: Let p be the proportion of pieces of equipment which are not faulty. We want to test
whether p ≥ 0.90 is true or not.
Note: Let X be the number of pieces of equipment in the sample which were not faulty. Then:
$$
P(X = x) \approx C^{30}_{x}\, p^{x} (1 - p)^{30 - x}
$$
1.
H0: p = 0.90
H1: p < 0.90
2.
α = 0.05
3.
Critical region:
Reject H0 if X < 24.
Note:
$$
\sum_{x=0}^{23} C^{30}_{x}\, 0.90^{x} (1 - 0.90)^{30 - x} = 0.0258 < \alpha
$$
$$
\sum_{x=0}^{24} C^{30}_{x}\, 0.90^{x} (1 - 0.90)^{30 - x} = 0.0732 > \alpha
$$
4.
Decision:
Since 23 falls in the critical region, we reject the null
hypothesis and conclude that p < 0.90.
Understanding the level of significance and type I error
For this example, we have:
$$
\begin{aligned}
P(\text{the null hypothesis is rejected} \mid p = 0.90) &= P(X < 24 \mid p = 0.90) \\
&= 0.0258 \\
&\ne \alpha
\end{aligned}
$$
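For a one-sided test such as this one, the critical value can be found by accumulating the lower-tail binomial probabilities until they would exceed α. The sketch below illustrates the idea with hypothetical function names. It is a direct recomputation of the sums in step 3, not a general-purpose routine.

```python
from math import comb


def lower_tail(c, n, p):
    """P(X <= c) for a binomial distribution with parameters n and p."""
    return sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(c + 1))


def lower_critical_value(n, p0, alpha):
    """Largest c with P(X <= c | p = p0) <= alpha; reject H0 when X <= c."""
    c = -1
    while lower_tail(c + 1, n, p0) <= alpha:
        c += 1
    return c


c = lower_critical_value(30, 0.90, 0.05)
print(c, round(lower_tail(c, 30, 0.90), 4))
print("reject H0" if 23 <= c else "do not reject H0")  # observed X = 23
```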
COMPUTER SIMULATION
We can use computer simulation to understand the method used to solve the above example.
Case 1: p = 0.90
The above figure shows that we usually accept the null hypothesis.
Case 2: p < 0.90
If p = 0.7 < 0.90, many of the values of X are smaller and the probability of rejecting H0
and concluding that p < 0.90 is high.
Case 3: p > 0.90
If p = 0.95 > 0.90, the values of X are usually larger and we usually do not reject H0 and
conclude that p is greater than or equal to 0.90.
EXAMPLE 5
A basketball player has hit on 75% of his shots from the floor. If on the next 60 shots he
makes 50 baskets, can you conclude that his shooting has improved? Use a 0.05 level of
significance.
SOLUTION
Aim: Let p be the probability that the player makes a basket. We want to test whether
p > 0.75 is true or not.
Note: Let X be the number of baskets made on 60 shots. Then
$$
P(X = x) \approx C^{60}_{x}\, p^{x} (1 - p)^{60 - x}
$$
1.
H0: p = 0.75
H1: p > 0.75
2.
α = 0.05
3.
Critical region:
Since n = 60 is large, we use the normal approximation to the
binomial distribution and the critical region is
$$
\frac{X - 60 \times 0.75}{\sqrt{60 \times 0.75 \times 0.25}} > z_{0.05} = 1.645
$$
where X is the number of baskets made by the player.
4.
Calculations:
$$
\frac{50 - 60 \times 0.75}{\sqrt{60 \times 0.75 \times 0.25}} = 1.4907
$$
5.
Decision:
Since 1.4907 is not in the critical region, we do not reject the
hypothesis H0: p = 0.75 at the 0.05 level of significance and conclude that his
shooting percentage has not improved.
Understanding the level of significance and type I error
For this example, we have:
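$$
\begin{aligned}
P(\text{the null hypothesis is rejected} \mid p = 0.75) &= P\left(\frac{X - 60 \times 0.75}{\sqrt{60 \times 0.75 \times 0.25}} > 1.645 \;\middle|\; p = 0.75\right) \\
&\approx 0.05
\end{aligned}
$$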
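The test statistic and the critical value used in this example can be recomputed as in the following sketch. The function name `upper_tail_z_test` is hypothetical, and the critical value is obtained from `statistics.NormalDist` instead of a normal table.

```python
from math import sqrt
from statistics import NormalDist


def upper_tail_z_test(x, n, p0, alpha):
    """Normal-approximation test of H0: p = p0 against H1: p > p0."""
    z = (x - n * p0) / sqrt(n * p0 * (1 - p0))    # standardised value of X
    critical = NormalDist().inv_cdf(1 - alpha)    # z_alpha
    return z, critical, z > critical


z, critical, reject = upper_tail_z_test(50, 60, 0.75, 0.05)
print(round(z, 4), round(critical, 3), reject)
```

The third returned value is True only when the statistic falls in the critical region, so it mirrors the decision in step 5.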
COMPUTER SIMULATION
We can use computer simulation to understand the method used to solve the above example.
NOTE:
You cannot use the cumulative binomial probabilities table to obtain the
number 50. Hence you have to use the normal approximation to the binomial
distribution.
Case 1: p = 0.75
The above figure shows that we usually accept the null hypothesis.
Case 2: p < 0.75
If p = 0.6 < 0.75, many of the values of X are smaller and we usually do not reject the null
hypothesis and conclude that p is less than or equal to 0.75.
Case 3: p > 0.75
If p = 0.90 > 0.75, the values of X are usually larger and we usually reject H0 and conclude
that p is greater than 0.75.
EXERCISE
3.
In a random sample of 1000 houses in a certain city, it is found that 618 own color
television sets. Is this sufficient evidence to conclude that 2/3 of the houses in this city
have color television sets? Use a 0.02 level of significance.
Ans.
Reject the hypothesis.
4.
It is believed that at least 60% of the residents in a certain area favor an annexation
suit by a neighboring city. What conclusion would you draw if only 110 in a sample
of 200 voters favor the suit? Use a 0.04 level of significance.
Ans.
Do not reject the hypothesis p = 0.6.
5.
The manufacturer of a patent medicine claimed that it was at least 90% effective in
relieving an allergy for a period of 8 hours. In a sample of 200 people who had the
allergy, the medicine provided relief for 160 people. Using a 0.01 level of
significance, determine whether the manufacturer’s claim is legitimate.
Ans.
Reject the claim.