(Proportion – Large Sample)

INTERVAL ESTIMATION

GIVEN: The proportion of members of a large population who possess some particular characteristic is p. The value of p is unknown.

AIM: To take a random sample of size n (n large), with or without replacement, from the population, and then use it to find an interval that is likely to contain p.

METHOD

1st step. Select a random sample of size n (n large) from the population.

2nd step. Choose a number α between 0 and 1. Usually α = 0.05, so that 1 − α = 95%, or α = 0.10, so that 1 − α = 90%.

3rd step. Compute the values

$$\frac{\dfrac{x}{n} + \dfrac{z_{\alpha/2}^2}{2n} \pm \dfrac{z_{\alpha/2}}{\sqrt{n}}\sqrt{\dfrac{x}{n}\left(1-\dfrac{x}{n}\right) + \dfrac{z_{\alpha/2}^2}{4n}}}{1 + \dfrac{z_{\alpha/2}^2}{n}}$$

where x is the number of members in the random sample who possess the characteristic, and $z_{\alpha/2}$ is the number such that $P(Z > z_{\alpha/2}) = \alpha/2$ for a standard normal random variable Z.

4th step. The interval with these two values as endpoints is called a (1 − α)100% confidence interval for p. If 1 − α = 95%, the interval is called a 95% confidence interval for p; if 1 − α = 90%, it is called a 90% confidence interval for p.

NOTE For large values of n, the terms containing $z_{\alpha/2}^2/n$ become negligible, and

$$\frac{\dfrac{x}{n} + \dfrac{z_{\alpha/2}^2}{2n} \pm \dfrac{z_{\alpha/2}}{\sqrt{n}}\sqrt{\dfrac{x}{n}\left(1-\dfrac{x}{n}\right) + \dfrac{z_{\alpha/2}^2}{4n}}}{1 + \dfrac{z_{\alpha/2}^2}{n}} \approx \frac{x}{n} \pm z_{\alpha/2}\sqrt{\frac{1}{n}\cdot\frac{x}{n}\left(1-\frac{x}{n}\right)}.$$

EXAMPLE 1 If 36 out of 100 persons interviewed are familiar with the tax incentives (inducements) for installing certain energy-saving devices, construct a 95% confidence interval for the corresponding true proportion. What can we assert with 95% confidence about the possible size of the error?
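The interval formula of the METHOD and its large-n approximation in the NOTE can be checked numerically. The following is a minimal Python sketch (Python 3.9 or later assumed) evaluated on the data of Example 1. The function names are illustrative, and $z_{\alpha/2}$ is obtained from the standard library's `statistics.NormalDist().inv_cdf`, so it equals 1.95996... rather than the rounded 1.96; the last digits may therefore differ slightly from hand calculations.

```python
from math import sqrt
from statistics import NormalDist


def z_upper(alpha: float) -> float:
    """z_{alpha/2}: the point with upper-tail area alpha/2 under N(0, 1)."""
    return NormalDist().inv_cdf(1 - alpha / 2)


def score_interval(x: int, n: int, alpha: float) -> tuple[float, float]:
    """Endpoints of the METHOD formula:
    (x/n + z^2/(2n) -/+ z/sqrt(n) * sqrt((x/n)(1 - x/n) + z^2/(4n))) / (1 + z^2/n)."""
    z = z_upper(alpha)
    phat = x / n
    centre = phat + z * z / (2 * n)
    half = z / sqrt(n) * sqrt(phat * (1 - phat) + z * z / (4 * n))
    denom = 1 + z * z / n
    return (centre - half) / denom, (centre + half) / denom


def approx_interval(x: int, n: int, alpha: float) -> tuple[float, float]:
    """Large-n approximation of the NOTE: x/n -/+ z * sqrt((1/n)(x/n)(1 - x/n))."""
    z = z_upper(alpha)
    phat = x / n
    half = z * sqrt(phat * (1 - phat) / n)
    return phat - half, phat + half


if __name__ == "__main__":
    # Data of Example 1: 36 of 100 persons, 95% confidence (alpha = 0.05).
    a, b = score_interval(36, 100, 0.05)
    c, d = approx_interval(36, 100, 0.05)
    print(f"formula of the METHOD: ({a:.4f}, {b:.4f})")
    print(f"large-n approximation: ({c:.4f}, {d:.4f})")
```

With these inputs the approximate interval agrees with the hand solution (about 0.266 to 0.454), while the interval from the full formula is shifted slightly toward 1/2, roughly 0.273 to 0.458. The same `approx_interval` call with (288, 1000, 0.02) and (86, 500, 0.10) can be used to check the answers to Exercises 1 and 2.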
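SOLUTION

$$\frac{x}{n} \pm z_{0.05/2}\sqrt{\frac{1}{n}\cdot\frac{x}{n}\left(1-\frac{x}{n}\right)} = \frac{36}{100} \pm 1.96\sqrt{\frac{1}{100}\cdot\frac{36}{100}\left(1-\frac{36}{100}\right)} = 0.36 \pm 1.96\sqrt{\frac{0.36\times0.64}{100}} = 0.266 \text{ and } 0.454$$

A 95% confidence interval for p is approximately 0.266 < p < 0.454.

To find the size of the error, we use the following:

$$P\left(\frac{X}{n} - 1.96\sqrt{\frac{1}{n}\cdot\frac{X}{n}\left(1-\frac{X}{n}\right)} < p < \frac{X}{n} + 1.96\sqrt{\frac{1}{n}\cdot\frac{X}{n}\left(1-\frac{X}{n}\right)}\right) \approx 95\%$$

$$P\left(-1.96\sqrt{\frac{1}{n}\cdot\frac{X}{n}\left(1-\frac{X}{n}\right)} < \frac{X}{n} - p < 1.96\sqrt{\frac{1}{n}\cdot\frac{X}{n}\left(1-\frac{X}{n}\right)}\right) \approx 95\%$$

The size of the error with 95% confidence is accordingly

$$1.96\sqrt{\frac{1}{n}\cdot\frac{X}{n}\left(1-\frac{X}{n}\right)} = 1.96\sqrt{\frac{1}{100}\cdot\frac{36}{100}\left(1-\frac{36}{100}\right)} = 0.094.$$

Supplementary Reading

[Diagram: a population in which the true proportion of members who possess some particular characteristic is p; a sample of (large) size n with sample proportion X/n.]

Since the population size is large, we have:

$$P(X = x) \approx \binom{n}{x} p^x (1-p)^{n-x}$$

Since the sample size n is large, we can approximate the binomial distribution by the normal distribution. Hence:

$$P\left(-z_{\alpha/2} < \frac{X - np}{\sqrt{np(1-p)}} < z_{\alpha/2}\right) \approx 1-\alpha$$

$$P\left(-z_{\alpha/2} < \frac{\frac{X}{n} - p}{\sqrt{\frac{p(1-p)}{n}}} < z_{\alpha/2}\right) \approx 1-\alpha$$

$$P\left(\left|\frac{\frac{X}{n} - p}{\sqrt{\frac{p(1-p)}{n}}}\right| < z_{\alpha/2}\right) \approx 1-\alpha$$

Note that the following are equivalent:

$$
\begin{aligned}
\left|\frac{\frac{X}{n}-p}{\sqrt{\frac{p(1-p)}{n}}}\right| < z_{\alpha/2}
&\iff \left(\frac{X}{n}-p\right)^2 < z_{\alpha/2}^2\,\frac{p(1-p)}{n} \\
&\iff p^2\left(1+\frac{z_{\alpha/2}^2}{n}\right) - p\left(\frac{2X}{n}+\frac{z_{\alpha/2}^2}{n}\right) + \left(\frac{X}{n}\right)^2 < 0 \\
&\iff \frac{\frac{X}{n}+\frac{z_{\alpha/2}^2}{2n}-\frac{z_{\alpha/2}}{\sqrt{n}}\sqrt{\frac{X}{n}\left(1-\frac{X}{n}\right)+\frac{z_{\alpha/2}^2}{4n}}}{1+\frac{z_{\alpha/2}^2}{n}} < p < \frac{\frac{X}{n}+\frac{z_{\alpha/2}^2}{2n}+\frac{z_{\alpha/2}}{\sqrt{n}}\sqrt{\frac{X}{n}\left(1-\frac{X}{n}\right)+\frac{z_{\alpha/2}^2}{4n}}}{1+\frac{z_{\alpha/2}^2}{n}}
\end{aligned}
$$

The last step holds because a quadratic in p with positive leading coefficient is negative exactly between its two roots, and the quadratic formula gives these roots as the two bounds shown.

Thus:

$$P\left(\frac{\frac{X}{n}+\frac{z_{\alpha/2}^2}{2n}-\frac{z_{\alpha/2}}{\sqrt{n}}\sqrt{\frac{X}{n}\left(1-\frac{X}{n}\right)+\frac{z_{\alpha/2}^2}{4n}}}{1+\frac{z_{\alpha/2}^2}{n}} < p < \frac{\frac{X}{n}+\frac{z_{\alpha/2}^2}{2n}+\frac{z_{\alpha/2}}{\sqrt{n}}\sqrt{\frac{X}{n}\left(1-\frac{X}{n}\right)+\frac{z_{\alpha/2}^2}{4n}}}{1+\frac{z_{\alpha/2}^2}{n}}\right) \approx 1-\alpha$$

It follows that the endpoints of a (1 − α)100% confidence interval for p are

$$\frac{\frac{x}{n}+\frac{z_{\alpha/2}^2}{2n}\pm\frac{z_{\alpha/2}}{\sqrt{n}}\sqrt{\frac{x}{n}\left(1-\frac{x}{n}\right)+\frac{z_{\alpha/2}^2}{4n}}}{1+\frac{z_{\alpha/2}^2}{n}}.$$

END

EXERCISE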
1. In a random sample of 1000 houses in a certain city, it is found that 288 own color television sets. Find a 98% confidence interval for the fraction of homes in this city that have color sets.
Ans. (0.2547, 0.3213)

2. (a) A random sample of 500 cigarette smokers is selected, and 86 are found to have a preference for brand X. Find a 90% confidence interval for the fraction of the population of cigarette smokers who prefer brand X.
(b) What can be asserted with 90% confidence about the possible size of our error?
Ans. (a) (0.1442, 0.1998) (b) 0.0278

THEOREM 1 Suppose

(a) E > 0 is given,
(b) $n \ge \dfrac{z_{\alpha/2}^2\, p(1-p)}{E^2}$,
(c) n is large.

Then, approximately,

$$P\left(-E < \frac{X}{n} - p < E\right) \ge 1-\alpha.$$

Proof (for reference only) Since n is large, we have:

$$P\left(-z_{\alpha/2} < \frac{X - np}{\sqrt{np(1-p)}} < z_{\alpha/2}\right) \approx 1-\alpha$$

$$P\left(-z_{\alpha/2} < \frac{\frac{X}{n} - p}{\sqrt{\frac{p(1-p)}{n}}} < z_{\alpha/2}\right) \approx 1-\alpha$$

$$P\left(-z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}} < \frac{X}{n} - p < z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}\right) \approx 1-\alpha$$

If $n \ge \dfrac{z_{\alpha/2}^2\, p(1-p)}{E^2}$, then:

$$\sqrt{n} \ge \frac{z_{\alpha/2}\sqrt{p(1-p)}}{E}, \qquad\text{that is,}\qquad z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}} \le E.$$

Thus:

$$P\left(-E < \frac{X}{n} - p < E\right) \ge P\left(-z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}} < \frac{X}{n} - p < z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}\right) \approx 1-\alpha$$

EXAMPLE 2 Suppose that we want to estimate the true proportion of defectives in a very large shipment of adobe bricks, and that we want to be at least 95% confident that the error is at most 0.04. How large a sample will we need if:

(a) we have no idea what the true proportion might be;
(b) we know that the true proportion does not exceed 0.12?

SOLUTION By Theorem 1, it suffices that n is at least the maximum of $z_{\alpha/2}^2 p(1-p)/E^2$ over all values of p that are still possible.

(a) Since p(1 − p) attains its largest value 1/4 at p = 1/2,

$$\max_{0\le p\le 1}\frac{z_{\alpha/2}^2\, p(1-p)}{E^2} = \max_{0\le p\le 1}\frac{1.96^2\, p(1-p)}{0.04^2} = \frac{1.96^2}{0.04^2}\times\frac{1}{4} = 600.25$$

Hence n = 601.

(b) Since p(1 − p) is increasing for 0 ≤ p ≤ 1/2, its maximum over 0 ≤ p ≤ 0.12 occurs at p = 0.12:

$$\max_{0\le p\le 0.12}\frac{z_{\alpha/2}^2\, p(1-p)}{E^2} = \max_{0\le p\le 0.12}\frac{1.96^2\, p(1-p)}{0.04^2} = \frac{1.96^2\times0.12\times(1-0.12)}{0.04^2} = 253.55$$

Hence n = 254.

HOW TO MAKE A GUESS – SOME OBSERVATIONS

Suppose X is a random variable with a binomial probability distribution given by

$$P(X = x) = \binom{60}{x} p^x (1-p)^{60-x}.$$

We try to make a guess about the binomial parameter p.
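To see numerically how an observation of X hints at p, one can compute where most of the probability of X lies for a few values of p. The Python sketch below (Python 3.9 or later assumed; function names illustrative) uses the values 0.5, 0.7 and 0.88, the same values used in the computer simulations for Example 3, and reports for each a range [a, b] with P(a ≤ X ≤ b) ≥ 0.95, where a and b are the 2.5% and 97.5% points of the distribution.

```python
from math import comb


def binom_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) = C(n, x) p^x (1 - p)^(n - x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def central_range(n: int, p: float, coverage: float = 0.95) -> tuple[int, int]:
    """Return (a, b): a is the smallest x with P(X <= x) >= tail and b the
    smallest x with P(X <= x) >= 1 - tail, where tail = (1 - coverage) / 2.
    Then P(a <= X <= b) >= coverage."""
    tail = (1 - coverage) / 2
    cdf = 0.0
    low = None
    for x in range(n + 1):
        cdf += binom_pmf(x, n, p)
        if low is None and cdf >= tail:
            low = x
        if cdf >= 1 - tail:
            return low, x
    return low, n  # reached only if rounding keeps the running sum below 1 - tail


if __name__ == "__main__":
    n = 60
    for p in (0.5, 0.7, 0.88):
        a, b = central_range(n, p)
        print(f"p = {p}: mean of X = {n * p:.1f}, P({a} <= X <= {b}) >= 0.95")
```

The three ranges barely overlap, which is why a single observed value of X already points toward one of the possibilities below.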
Suppose we want to test the following three possibilities:

p < 0.7    p = 0.7    p > 0.7

The following diagrams are helpful in making a decision.

[Diagrams for the three cases p < 0.7, p = 0.7 and p > 0.7]

The diagrams show that an observed value of X gives a hint about the possible value of p.

TESTS CONCERNING p

AIM To set up two hypotheses about the true population proportion p. The first is called the null hypothesis and is denoted by $H_0$. The second is called the alternative hypothesis and is denoted by $H_1$. Suppose you want to test one of the following hypotheses against the other. The null hypothesis and the alternative hypothesis are given in the table below:

Hypotheses to be tested                                           Null hypothesis     Alternative hypothesis
$p = p_0$ against $p \ne p_0$                                     $H_0: p = p_0$      $H_1: p \ne p_0$
$p \ge p_0$ (or $p = p_0$) against $p < p_0$                      $H_0: p = p_0$      $H_1: p < p_0$
$p \le p_0$ (or $p = p_0$) against $p > p_0$                      $H_0: p = p_0$      $H_1: p > p_0$

METHOD To compute the value of a statistic and then determine whether the null hypothesis or the alternative hypothesis should be accepted. The decision may be correct or wrong (refer to the table below).

If                  Decision                             Remark
$H_0$ is true       Reject $H_0$ (i.e. accept $H_1$)     Type I error
$H_0$ is true       Accept $H_0$                         Decision is correct
$H_1$ is true       Accept $H_0$                         Type II error
$H_1$ is true       Reject $H_0$ (i.e. accept $H_1$)     Decision is correct

The number $\alpha = P(H_0 \text{ is rejected} \mid H_0 \text{ is true})$ is called the level of significance.

EXAMPLE 3 A pheasant hunter claims that he hits 70% of the birds he shoots at. Do you agree with his claim if on a given day he brings down 38 of the 60 pheasants he shoots at? Use a 0.05 level of significance.

SOLUTION Let p be the probability that the hunter hits a pheasant, and let X be the number of pheasants hit by the hunter when he shoots at 60 pheasants. Then:

$$P(X = x) = \binom{60}{x} p^x (1-p)^{60-x}$$

1. $H_0: p = 0.70$, $H_1: p \ne 0.70$
2. α = 0.05
3. Critical region: Reject $H_0$ if X < 35 or X > 49.
Note:

$$\sum_{x=0}^{34}\binom{60}{x}0.70^x(1-0.70)^{60-x} = 0.0196 < \frac{\alpha}{2}, \qquad \sum_{x=0}^{35}\binom{60}{x}0.70^x(1-0.70)^{60-x} = 0.0362 > \frac{\alpha}{2}$$

$$\sum_{x=50}^{60}\binom{60}{x}0.70^x(1-0.70)^{60-x} = 0.0139 < \frac{\alpha}{2}, \qquad \sum_{x=49}^{60}\binom{60}{x}0.70^x(1-0.70)^{60-x} = 0.0295 > \frac{\alpha}{2}$$

4. Decision: Since 38 does not fall in the critical region, we do not reject the null hypothesis $H_0: p = 0.70$.

Understanding the level of significance and type I error

For this example, we have:

$$P(\text{the null hypothesis is rejected} \mid p = 0.7) = P(X < 35 \text{ or } X > 49 \mid p = 0.7) = 0.0196 + 0.0139 = 0.0335 \ne \alpha$$

Because X is discrete, no critical region of this form has probability exactly α = 0.05 under $H_0$; here the probability of a type I error is 0.0335, which is below α.

COMPUTER SIMULATION

We can use computer simulation to understand the method used to solve the above example.

Case 1: p = 0.7. The simulation shows that we usually accept the null hypothesis.

Case 2: p < 0.7. If p = 0.5 < 0.7, the values of X are usually smaller, so we usually reject $H_0$ and conclude that p is less than 0.7.

Case 3: p > 0.7. If p = 0.88 > 0.7, the values of X are usually larger, so we usually reject $H_0$ and conclude that p is greater than 0.7.

EXAMPLE 4 A manufacturer claimed that at least 90% of the equipment which he supplied to a factory conformed to specifications. An examination of a random sample of 30 pieces of equipment revealed that 23 were not faulty. Test the claim of the manufacturer at a 0.05 level of significance.

SOLUTION

Aim: Let p be the proportion of pieces of equipment that are not faulty. We want to test whether p ≥ 0.90 is true or not.

Note: Let X be the number of pieces of equipment in the sample that are not faulty. Then:

$$P(X = x) \approx \binom{30}{x} p^x (1-p)^{30-x}$$

1. $H_0: p = 0.90$, $H_1: p < 0.90$
2. α = 0.05
3. Critical region: Reject $H_0$ if X < 24.

Note:

$$\sum_{x=0}^{23}\binom{30}{x}0.90^x(1-0.90)^{30-x} = 0.0258 < \alpha, \qquad \sum_{x=0}^{24}\binom{30}{x}0.90^x(1-0.90)^{30-x} = 0.0732 > \alpha$$

4. Decision: Since 23 falls in the critical region, we reject the null hypothesis and conclude that p < 0.90.
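The critical regions of Examples 3 and 4 were found by searching cumulative binomial probabilities. A minimal Python sketch of that search, assuming the rule used in the notes: for a two-sided test put at most α/2 in each tail, and for a one-sided test put at most α in the single tail. The helper names are illustrative; only `math.comb` from the standard library is used.

```python
from math import comb


def binom_cdf(c: int, n: int, p: float) -> float:
    """P(X <= c) for X ~ Binomial(n, p); equals 0 when c < 0."""
    return sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(0, c + 1))


def lower_cutoff(n: int, p0: float, level: float) -> int:
    """Largest c with P(X <= c | p0) <= level, so H0 is rejected if X <= c.
    Returns -1 if even P(X <= 0) exceeds the level."""
    c = -1
    while c + 1 <= n and binom_cdf(c + 1, n, p0) <= level:
        c += 1
    return c


def upper_cutoff(n: int, p0: float, level: float) -> int:
    """Smallest c with P(X >= c | p0) <= level, so H0 is rejected if X >= c.
    Returns n + 1 if even P(X >= n) exceeds the level."""
    c = n + 1
    # P(X >= c - 1) = 1 - P(X <= c - 2)
    while c - 1 >= 0 and 1 - binom_cdf(c - 2, n, p0) <= level:
        c -= 1
    return c


if __name__ == "__main__":
    # Example 3: H1: p != 0.70, n = 60, alpha = 0.05 split as 0.025 per tail.
    lo, hi = lower_cutoff(60, 0.70, 0.025), upper_cutoff(60, 0.70, 0.025)
    size = binom_cdf(lo, 60, 0.70) + 1 - binom_cdf(hi - 1, 60, 0.70)
    print(f"Example 3: reject H0 if X <= {lo} or X >= {hi}; actual size {size:.4f}")
    # Example 4: H1: p < 0.90, n = 30, alpha = 0.05 in the lower tail.
    c = lower_cutoff(30, 0.90, 0.05)
    print(f"Example 4: reject H0 if X <= {c}; actual size {binom_cdf(c, 30, 0.90):.4f}")
```

"Reject if X ≤ 34 or X ≥ 50" is the same region as "X < 35 or X > 49" in Example 3, and "X ≤ 23" is the same as "X < 24" in Example 4; the printed sizes are the probabilities of a type I error discussed in the notes.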
Understanding the level of significance and type I error

For this example, we have:

$$P(\text{the null hypothesis is rejected} \mid p = 0.90) = P(X < 24 \mid p = 0.90) = 0.0258 \ne \alpha$$

COMPUTER SIMULATION

We can use computer simulation to understand the method used to solve the above example.

Case 1: p = 0.90. The simulation shows that we usually accept the null hypothesis.

Case 2: p < 0.90. If p = 0.7 < 0.90, many of the values of X are smaller, and the probability of rejecting $H_0$ and concluding that p < 0.90 is high.

Case 3: p > 0.90. If p = 0.95 > 0.90, the values of X are usually larger, and we usually do not reject $H_0$, which is consistent with p ≥ 0.90.

EXAMPLE 5 A basketball player has hit on 75% of his shots from the floor. If on the next 60 shots he makes 50 baskets, can you conclude that his shooting has improved? Use a 0.05 level of significance.

SOLUTION

Aim: Let p be the probability that the player makes a basket. We want to test whether p > 0.75 is true or not.

Note: Let X be the number of baskets made in 60 shots. Then

$$P(X = x) \approx \binom{60}{x} p^x (1-p)^{60-x}$$

1. $H_0: p = 0.75$, $H_1: p > 0.75$
2. α = 0.05
3. Critical region: Since n = 60 is large, we use the normal approximation to the binomial distribution, and the critical region is

$$\frac{X - 60\times0.75}{\sqrt{60\times0.75\times0.25}} > z_{0.05} = 1.645$$

where X is the number of baskets made by the player.

4. Calculations:

$$\frac{50 - 60\times0.75}{\sqrt{60\times0.75\times0.25}} = 1.4907$$

5. Decision: Since 1.4907 is not in the critical region, we do not reject the null hypothesis $H_0: p = 0.75$ at the 0.05 level of significance; the data do not show that his shooting has improved.

Understanding the level of significance and type I error

For this example, by the normal approximation, we have:

$$P(\text{the null hypothesis is rejected} \mid p = 0.75) = P\left(\frac{X - 60\times0.75}{\sqrt{60\times0.75\times0.25}} > 1.645 \,\middle|\, p = 0.75\right) \approx 0.05$$

COMPUTER SIMULATION

We can use computer simulation to understand the method used to solve the above example.
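One way to carry out such a simulation for Example 5 is sketched below in Python. It assumes 10,000 simulated sets of 60 shots for each of the values p = 0.75, 0.6 and 0.90 used in the three cases, and applies the normal-approximation rule of Example 5 to each simulated X. The number of repetitions, the seed and the function names are arbitrary illustrative choices.

```python
import random
from math import sqrt


def rejects(x: int, n: int = 60, p0: float = 0.75, z_crit: float = 1.645) -> bool:
    """Rule of Example 5: reject H0 if (x - n p0) / sqrt(n p0 (1 - p0)) > z_crit."""
    return (x - n * p0) / sqrt(n * p0 * (1 - p0)) > z_crit


def rejection_rate(p: float, trials: int = 10_000, n: int = 60, seed: int = 1) -> float:
    """Fraction of simulated X ~ Binomial(n, p) for which H0: p = 0.75 is rejected."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    count = 0
    for _ in range(trials):
        x = sum(rng.random() < p for _ in range(n))  # baskets in one set of n shots
        count += rejects(x, n)
    return count / trials


if __name__ == "__main__":
    for p in (0.75, 0.6, 0.90):  # Cases 1, 2 and 3
        print(f"p = {p}: H0 rejected in {rejection_rate(p):.1%} of simulated samples")
```

The rejection rate at p = 0.75 estimates the actual probability of a type I error of this approximate test, so it need not come out exactly 0.05; at p = 0.6 rejections are very rare, and at p = 0.90 they are the usual outcome.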
NOTE: The cumulative binomial probability table cannot be used here to obtain the probability associated with the observed value 50. Hence we use the normal approximation to the binomial distribution.

Case 1: p = 0.75. The simulation shows that we usually accept the null hypothesis.

Case 2: p < 0.75. If p = 0.6 < 0.75, many of the values of X are smaller, so we usually do not reject the null hypothesis, which is consistent with p ≤ 0.75.

Case 3: p > 0.75. If p = 0.90 > 0.75, the values of X are usually larger, so we usually reject $H_0$ and conclude that p is greater than 0.75.

EXERCISE

3. In a random sample of 1000 houses in a certain city, it is found that 618 own color television sets. Is this sufficient evidence to conclude that 2/3 of the houses in this city have color television sets? Use a 0.02 level of significance.
Ans. Reject the hypothesis.

4. It is believed that at least 60% of the residents in a certain area favor an annexation suit by a neighboring city. What conclusion would you draw if only 110 in a sample of 200 voters favor the suit? Use a 0.04 level of significance.
Ans. Do not reject the hypothesis p = 0.6.

5. The manufacturer of a patent medicine claimed that it was at least 90% effective in relieving an allergy for a period of 8 hours. In a sample of 200 people who had the allergy, the medicine provided relief for 160 people. Using a 0.01 level of significance, determine whether the manufacturer's claim is legitimate.
Ans. Reject the claim.
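The answers to Exercises 3 to 5 can be checked with the large-sample statistic used in Example 5. A Python sketch, assuming the alternatives are read as follows: Exercise 3 two-sided ($p \ne 2/3$), and Exercises 4 and 5 one-sided ($p < 0.6$ and $p < 0.90$), since those claims are of the form "at least". The function names and the `alternative` labels are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def z_statistic(x: int, n: int, p0: float) -> float:
    """Large-sample statistic (x - n p0) / sqrt(n p0 (1 - p0))."""
    return (x - n * p0) / sqrt(n * p0 * (1 - p0))


def reject(x: int, n: int, p0: float, alpha: float, alternative: str) -> bool:
    """Decide H0: p = p0 against 'two-sided' (p != p0), 'less' (p < p0) or 'greater' (p > p0)."""
    z = z_statistic(x, n, p0)
    std = NormalDist()
    if alternative == "two-sided":
        return abs(z) > std.inv_cdf(1 - alpha / 2)
    if alternative == "less":
        return z < -std.inv_cdf(1 - alpha)
    if alternative == "greater":
        return z > std.inv_cdf(1 - alpha)
    raise ValueError(f"unknown alternative: {alternative!r}")


if __name__ == "__main__":
    cases = {
        "Exercise 3": (618, 1000, 2 / 3, 0.02, "two-sided"),
        "Exercise 4": (110, 200, 0.6, 0.04, "less"),
        "Exercise 5": (160, 200, 0.9, 0.01, "less"),
    }
    for name, (x, n, p0, alpha, alt) in cases.items():
        decision = "reject H0" if reject(x, n, p0, alpha, alt) else "do not reject H0"
        print(f"{name}: z = {z_statistic(x, n, p0):.3f}, {decision}")
```

The same `z_statistic(50, 60, 0.75)` call reproduces the value 1.4907 of Example 5.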
© Copyright 2024