Chapter 8 Statistical Intervals for a Single Sample 8-1 INTRODUCTION

Chapter 8 Statistical Intervals for a Single Sample
8-1 INTRODUCTION
z
An interval estimate for a population parameter is called a confidence
interval, the confidence interval is constructed so that we have high
confidence that it does contain the unknown population parameter.
z
A tolerance interval is another important type of interval estimate.
z
For a normal distribution, we know that 95% of the distribution is in
the interval
µ − 1.96σ, µ + 1.96σ
z
(8-1)
However, this is not a useful tolerance interval because the parameters
µ and σ are unknown.
z
Point estimates such as x and s can be used in Equation 8-1 for µ
and σ . The result is an interval of the form
x − ks , x + ks
(8-2)
where k is an appropriate constant (that is larger than 1.96 to account
for the estimation error).
z
A prediction interval provides bounds on one (or more) future
observations from the population.
z
Keep the purpose of the three types of interval estimates clear:
z
A confidence interval bounds population or distribution
parameters (such as the mean viscosity).
z
A tolerance interval bounds a selected proportion of a
distribution.
A prediction interval bounds future observations from the
z
population or distribution.
8-2 CONFIDENCE INTERVAL ON THE MEAN OF A
NORMAL DISTRIBUTION, VARIANCE KNOWN
8-2.1 Development of the Confidence Interval and its Basic
Properties
z
Suppose that X1, X2, , Xn is a random sample from a normal
distribution with unknown mean µ and known variance σ 2 .
z
Sample mean X is normally distributed with mean µ and variance
σ2 / n .
X −µ
σ/ n
z
Standardize X : Z =
z
A confidence interval (CI) estimate for µ is an interval of the form
(8-3)
l ≤ µ ≤ u , where the endpoints l and u are computed from the sample
data.
z
Because different samples will produce different values of l and u,
these end-points are values of random variables L and U.
z
Determine values of L and U such that the following probability
statement is true:
P{L ≤ µ ≤ U } = 1 − α
z
Once
we
have
selected
the
sample,
(8-4)
so
that
X 1 = x1 ,
X 2 = x2 , … , X n = x n , and computed l and u, the resulting confidence
interval for µ is
l ≤µ≤u
z
(8-5)
The end-points or bounds l and u are called the lower- and
upper-confidence limits.
z
1 − α is called the confidence coefficient.
X −µ
⎧
⎫
≤ zα / 2 ⎬ = 1 − α
P ⎨− z α / 2 ≤
σ n
⎩
⎭
σ
σ ⎫
⎧
≤ X − µ ≤ zα / 2 ⋅
P ⎨− z α / 2 ⋅
⎬ = 1− α
n
n
⎩
⎭
σ
σ ⎫
⎧
≤ −µ ≤ − X + z α / 2 ⋅
P ⎨− X − z α / 2 ⋅
⎬ = 1− α
n
n⎭
⎩
σ
σ ⎫
⎧
≤ µ ≤ X − zα / 2 ⋅
P⎨ X + zα / 2 ⋅
⎬ = 1− α
n
n⎭
⎩
1− α
α/2
− zα /2
α/2
+ zα /2
Definition
If is the sample mean of a random sample of size n from a normal
population with known variance σ 2 , a 100(1 − α )% CI on µ is given by
x − zα / 2 σ / n ≤ µ ≤ x + zα / 2σ / n
(8-7)
where z α / 2 σ is the upper 100α / 2 percentage point of the standard
normal distribution.
EXAMPLE 8-1
z
Ten measurements of impact energy (J) on specimens of A238
steel cut at 60ºC are as follows: 64.1, 64.7, 64.5, 64.6, 64.5, 64.3,
64.6, 64.8, 64.2, and 64.3.
z
Assume that impact energy is normally distributed with σ = 1J.
z
Find a 95% CI for µ .
z
The required quantities are z α / 2 = z 0.025 = 1.96 , n = 10 , σ = 1 ,
and x = 64.46 .
95%
2.5%
− z 0 .025
2.5%
z 0 .025
P( z < z 0.025 ) = (1 − 95%) / 2 + 95% = 97.5%
⇒ z 0.025 = 1.96
σ
σ
≤ µ ≤ x + zα / 2
n
n
1
1
64.46 − 1.96
≤ µ ≤ 64.46 + 1.96
10
10
63.84 ≤ µ ≤ 65.08
x − zα / 2
z
Find a 99% CI for µ !
z
Confidence Level and Precision of Estimation
z
In Example 8-1
z
At α = 0.01, we find z α / 2 = z 0.01 / 2 = z 0.005 = 2.58 , while for α = 0.05 ,
z 0.025 = 1.96 .
z
The length of the 95% confidence interval
2(1.96σ / n ) = 3.92σ / n
z
The length of the 99% CI
2(2.58σ / n ) = 5.16σ / n
z
Thus, the 99% CI is longer than the 95% CI.
z
Generally, for a fixed sample size n and standard deviation σ , the
higher the confidence level, the longer the resulting CI.
z
The length of a confidence interval is a measure of the precision of
estimation. CI ∝
1
percision of estimation
8-2.2 Choice of Sample Size
z
The precision of the confidence interval in Equation 8-7 is
2 zσ / 2σ / n .
z
The error E = x − µ is less than or equal to z α / 2 σ / n with
confidence 100(1 − α) . See Fig. 8-2.
Figure 8-2 Error in estimating µ with x .
z
The appropriate sample size is found by choosing n such that
zα / 2σ / n = E .
Definition
If is used as an estimate of µ , we can be 100(1 − α) % confident that the
error x − µ will not exceed a specified amount E when the sample size
is
⎛ z σ⎞
n = ⎜ α/2 ⎟
⎝ E ⎠
2
,
CI = 2 E
(8-8)
z
Notice that 2E is the length of the resulting confidence interval.
EXAMPLE 8-2
z
In Example 8-1, suppose that we wanted to determine how many
specimens must be tested to ensure that the 95% CI on µ for
A238 steel cut at 60°C has a length of at most 1.0J.
z
Since the bound on error in estimation E is one-half of the length
of the CI, to determine n we use Equation 8-8 with E = 0.5 ,
σ = 1, and z α / 2 = 0.025 .
⎛ z σ⎞
⎡ (1.96)1⎤
n = ⎜ α/2 ⎟ = ⎢
= 15.37 ≅ 16
⎣ 0.5 ⎥⎦
⎝ E ⎠
2
2
8-2.3 One-Sided Confidence Bounds
Definition
A 100(1 − α) % upper-confidence bound for µ is
µ ≤ u = x + zα σ / n
(8-9)
and a 100(1 − α) % lower-confidence bound for µ is
x − zα σ / n = l ≤ µ
(8-10)
8-2.5 A Large-Sample Confidence Interval for µ
z
It turns out that when n is large, replacing σ by the sample
standard deviation S has little effect on the distribution of Z.
Definition
When n is large, the quantity
X −µ
S/ n
has an approximate standard normal distribution. Consequently,
x − zα / 2
s
s
≤ µ ≤ x + zα / 2
n
n
(8-13)
is a large sample confidence interval for µ , with confidence level of
approximately 100(1 − α) %.
z
Generally n should be at least 40 to use this result reliably.
EXAMPLE 8-3
z
A sample of fish was selected from 53 Florida lakes
z
Mercury concentration in the muscle tissue was measured (ppm).
z
The mercury concentration values are
z
The summary statistics from Minitab are displayed below:
Descriptive Statistics: Concentration
Variable
N
Mean
Median
TrMean
StDev
SE Mean
Concentration 53
0.5250
0.4900
0.5094
0.3486
0.0479
Variable
Maximum Q1
Q3
1.3300
0.7900
Minimum
Concentration 0.0400
z
0.2300
Figure 8-3(a) and (b) presents the histogram and normal
probability plot of the mercury concentration data.
z
Find an approximate 95% CI on µ .
z
n = 53, x = 0.5250, s = 0.3486, and z 0.025 = 1.96.
z
Because n>40, we can use Eq 8-13.
s
s
≤ µ ≤ x + z 0.025
n
n
0.3486
0.3486
≤ µ ≤ 0.5250 + 1.96
0.5250 − 1.96
53
53
0.4311 ≤ µ ≤ 0.6189
x − z 0.025
z
Find an approximate 99% CI on µ .
Figure 8-3 Mercury concentration in largemouth bass (a) Histogram. (b)
Normal probability plot.
z
A General Large Sample Confidence Interval.
z
The large-sample confidence interval for µ in Equation 8-13 is a
special case.
z
Suppose that θ is a parameter of a probability distribution.
z
ˆ be an estimator of θ .
Let Θ
z
A large-sample approximate CI for θ is given by
θˆ − z α / 2 σ Θˆ ≤ θ ≤ θˆ + z α / 2 σ Θˆ
Exercise 8-2 : 8-1, 8-3, 8-5, 8-7, 8-9, 8-11, 8-13, 8-15.
(8-14)
8-3 CONFIDENCE INTERVAL ON THE MEAN OF A
NORMAL DISTRIBUTION, VARIANCE UNKNOWN
z
We wish to construct a two-sided CI on µ .
z
If the variance σ 2 is known, we know that Z = ( X − µ) /(σ / n )
has a standard normal distribution.
z
When σ 2 is unknown, a logical procedure is to replace σ with the
sample standard deviation S.
z
The random variable Z now becomes T = ( X − µ) /( S / n ) .
z
A logical question is what effect does replacing σ by S have on the
distribution of the random variable T?
z
If n is large, the answer to this question is “very little.”
z
However, n is usually small in most engineering problems, and in this
situation a different distribution must be employed to construct the CI.
8-3.1 The t Distribution
Definition
Let X 1 , X 2 , ... , X n be a random sample from a normal distribution with
unknown mean µ and unknown variance σ 2 . The random variable
T=
X −µ
S/ n
has a t distribution with n - 1 degrees of freedom.
(8-15)
z
The t probability density function
f ( x) =
Γ[(k + 1) / 2]
1
⋅ 2
-∞ < x < ∞
( k +1) / 2
πk Γ(k / 2) [( x / k ) + 1]
(8-16)
where k is the number of degrees of freedom. The mean and variance
of the t distribution are zero and k/(k - 2) (for k > 2), respectively.
z
Appendix Table IV provides percentage points of the t distribution.
z
Let t α ,k be the value of the random variable T with k degrees of
freedom above which we find an area (or probability) α .
Figure 8-5 Percentage points of the t distribution.
z
Ex: The t-value with 10 degrees of freedom having an area of 0.05 to
the right is t 0.05,10 = 1.812 .
查表
P (T10 > t 0.05,10 ) = P (T10 > 1.812) = 0.05
99%
0.05%
0.05%
t 0.05,10
8-3.3 The t Confidence Interval on µ
Definition
If and s are the mean and standard deviation of a random sample from a
normal distribution with unknown variance σ 2 , a 100(1 − α) percent
confidence interval on µ is given by
x − t α / 2,n−1 s / n ≤ µ ≤ x + t α / 2 ,n−1 s / n
(8-18)
where tα / 2,n−1 is the upper 100α / 2 percentage point of the t distribution
with n – 1 degrees of freedom. (n < 40)
EXAMPLE 8-4 (n < 40)
z
The load at specimen failure is as follows (in megapascals):
19.8 10.1 14.9
7.5
15.4 15.4
15.4 18.5 7.9
12.7 11.9 11.4
11.4 14.1 17.6 16.7 15.8
19.5 8.8
z
13.6
11.9
11.4
The sample mean is x = 13.71 , and the sample standard
deviation is s = 3.55.
z
We want to find a 95% CI on µ .
z
Since n = 22, we have n - 1 = 21 degrees of freedom for t, so
t 0.025, 21 = 2.080 .
x − t α / 2 ,n−1 s / n ≤ µ ≤ x + t α / 2 ,n−1 s / n
13.71 − 2.080(3.55) / 22 ≤ µ ≤ 13.71 + 2.080(3.55) / 22
13.71 − 1.57 ≤ µ ≤ 13.71 + 1.57
12.14 ≤ µ ≤ 15.28
Exercise 8-3 : 8-21, 8-23, 8-25, 8-27, 8-29, 8-31.
8-4 CONFIDENCE INTERVAL ON THE VARIANCE AND
STANDARD DEVIATION OF A NORMAL POPULATION
Definition
Let X 1 , X 2 , ... , X n be a random sample from a normal distribution with
mean µ and variance σ 2 , and S 2 let be the sample variance. Then the
random variable
(n − 1) S 2
X =
σ2
2
(8-19)
has a chi-square ( X 2 ) distribution with n - 1 degrees of freedom.
z
The probability density function of a χ 2 random variable is
f ( x) =
1
x ( k / 2 )−1e − x / 2
k/2
2 Γ ( k / 2)
x>0
where k is the number of degrees of freedom.
z
Several chi-square distributions are shown in Fig. 8-8.
(8-20)
Figure 8-8 Probability density functions of several χ 2 distributions.
z
Define χ α2 ,k as the percentage point or value of the chi-square random
variable with k degrees of freedom such that the probability that X 2
exceeds this value is α .
P( X > χ ) =
2
2
α ,k
∞
∫ f (u )du = α
χα2 ,k
This probability is shown as the shaded area in Fig. 8-9(a).
Figure 8-9 Percentage point of the χ 2 distribution. (a) The percentage
point χ α2 ,k . (b) The upper percentage point χ 02.05,10 = 18.31 and the lower
percentage point χ 02.95,10 = 3.94 .
z
z
z
The construction of the 100(1 − α) % CI for σ 2 .
(n − 1) S 2
σ2
Chi-square with n − 1 degrees of freedom, we may write
Because X 2 =
P(χ12−α / 2 ,n−1 ≤ X 2 ≤ χ α2 / 2 ,n−1 ) = 1 − α
(n − 1) S 2
⎛ 2
⎞
P⎜ χ1−α / 2 ,n−1 ≤
≤ χ α2 / 2 ,n−1 ⎟ = 1 − α
2
σ
⎝
⎠
⎛ (n − 1) S 2
(n − 1) S 2 ⎞
⎟ = 1− α
P⎜⎜ 2
≤ σ2 ≤ 2
⎟
χ
χ
1−α / 2 ,n −1 ⎠
⎝ α / 2 ,n−1
1− α
α/2
α/2
χ12−α / 2,n−1
χ α2 / 2,n−1
Definition
If s 2 is the sample variance from a random sample of n observations
from a normal distribution with unknown variance
σ 2 , then
a 100(1 − α )% confidence interval on σ 2 is
(n − 1) S 2
(n − 1) S 2
2
≤σ ≤ 2
χ α2 / 2,n−1
χ1−α / 2,n−1
where
χ α2 / 2,n −1
and
χ12−α / 2,n−1
(8-21)
are the upper and lower 100α / 2
percentage points of the chi-square distribution with n − 1 degrees of
freedom, respectively. A confidence interval for σ has lower and
upper limits that are the square roots of the corresponding limits in
Equation 8-21.
The 100(1 − α) % lower and upper confidence bounds on σ 2 are
(n − 1) s 2
≤ σ2
2
χ α , n −1
and
(n − 1) s 2
σ ≤ 2
χ1− α , n −1
2
(8-22)
respectively.
EXAMPLE 8-5
z
An automatic filling machine is used to fill bottles with liquid
detergent.
z
A random sample of 20 bottles results in a sample variance of fill
volume of s 2 = 0.0153 (fluid ounces) 2 .
z
We will assume that the fill volume is approximately normally
distributed.
z
A 95% upper-confidence interval is found from Equation 8-22.
(n − 1) s 2
σ ≤ 2
χ 0.95,19
2
σ2 ≤
z
(19)0.0153
= 0.0287 (fluid ounce) 2
10.117
σ ≤ 0.17
Therefore, at the 95% level of confidence, the data indicate that
the process standard deviation could be as large as 0.17 fluid
ounce.
Exercise 8-4 : 8-33, 8-35, 8-37, 8-39, 8-41
8-5 A LARGE-SAMPLE CONFIDENCE INTERVAL FOR A
POPULATION PROPORTION
z
Construct confidence intervals on a population proportion.
z
A random sample of size n has been taken from a large (possibly
infinite) population.
z
X (≤ n) observations in this sample belong to a class of interest.
z
Pˆ = X / n is a point estimator of the proportion of the population p
that belongs to this class.
z
n and p are the parameters of a binomial distribution.
z
The sampling distribution of Pˆ is approximately normal with mean p
and variance p (1 − p ) / n .
X
Pˆ =
n
X
n
interest class
interest class
=p
population
⎡X ⎤ 1
E[ Pˆ ] = E ⎢ ⎥ = ⋅ np = p
⎣n⎦ n
1
1
1
1
V [ Pˆ ] = 2 V [ X ] = 2 ⋅ npq = pq = p ⋅ (1 − p )
n
n
n
n
Definition
If n is large, the distribution of
Z=
X − np
=
np (1 − p )
is approximately standard normal.
Pˆ − p
p (1 − p )
n
z
To construct the confidence interval on p
P(− z α / 2 ≤ Z ≤ z α / 2 ) ≅ 1 − α
⎛
⎜
P⎜ − z α / 2 ≤
⎜
⎜
⎝
⎞
⎟
ˆ
P− p
≤ zα / 2 ⎟ ≅ 1 − α
⎟
p (1 − p )
⎟
n
⎠
⎛
P⎜⎜ Pˆ − z α / 2
⎝
p (1 − p )
≤ p ≤ Pˆ + z α / 2
n
p (1 − p ) ⎞
⎟⎟ ≅ 1 − α
n
⎠
standard error of the point estimator Pˆ
z
Equation 8-23 contain the unknown parameter p.
z
To replace Pˆ by in the standard error.
Definition
If pˆ is the proportion of observations in a random sample of size n that
belongs to a class of interest, an approximate 100(1 − α) % confidence
interval on the proportion p of the population that belongs to this class is
pˆ − z α / 2
pˆ (1 − pˆ )
≤ p ≤ pˆ + z α / 2
n
pˆ (1 − pˆ )
n
(8-25)
where z α / 2 is the upper α / 2 percentage point of the standard normal
distribution.
EXAMPLE 8-6
z
In a random sample of 85 automobile engine crankshaft bearings.
z
10 have a surface finish that is rougher than the specifications
allow.
z
z
pˆ = x / n = 10 / 85 = 0.12.
A 95% two-sided confidence interval for p is computed from
Equation 8-25 as
pˆ (1 − pˆ )
≤ p ≤ pˆ + z 0.025
n
pˆ − z 0.025
pˆ (1 − pˆ )
n
0.12(0.88)
0.12(0.88)
≤ p ≤ 0.12 + 1.96
85
85
0.05 ≤ p ≤ 0.19
0.12 − 1.96
Choice of Sample Size
z
Pˆ is the point estimator of p.
z
The error in estimating p by Pˆ as E = p − Pˆ .
z
Set E = z α / 2 p (1 − p) / n and solve for n.
E = (zα / 2 )
2
2
p (1 − p )
⎛z ⎞
⋅
⇒ n = ⎜ α / 2 ⎟ ⋅ p (1 − p )
n
⎝ E ⎠
2
2
⎛z ⎞
n = ⎜ α / 2 ⎟ p (1 − p )
⎝ E ⎠
z
(8-26)
Choosing n uses the fact that the sample size from Equation 8-26 will
always be a maximum for p = 0.5.
upper bound on n
2
⎛z ⎞
n = ⎜ α / 2 ⎟ (0.25)
⎝ E ⎠
Set p = 0.5
(8-27)
EXAMPLE 8-7
z
See Ex 8-6.
z
How large a sample is required if we want to be 95% confident
that the error in using pˆ to estimate p is less than 0.05?
z
Using pˆ = 0.12 as an initial estimate of p.
2
2
⎛z ⎞
⎛ 1.96 ⎞
n = ⎜ 0.025 ⎟ pˆ (1 − pˆ ) = ⎜
⎟ 0.12(0.88) ≅ 163
⎝ 0.05 ⎠
⎝ E ⎠
z
At least 95% confident that our estimate pˆ of the true
proportion p was within 0.05 regardless of the value of p, we
would use Equation 8-27 to find the sample size
2
2
⎛z ⎞
⎛ 1.96 ⎞
n = ⎜ 0.025 ⎟ (0.25) = ⎜
⎟ (0.25) ≅ 385
⎝ 0.05 ⎠
⎝ E ⎠
One-Sided Confidence Bounds
The approximate 100(1 − α) % lower and upper confidence bounds are
pˆ − z α
pˆ (1 − pˆ )
≤p
n
respectively.
Exercise 8-5 : 8-43, 8-45, 8-47.
and
p ≤ pˆ + z α
pˆ (1 − pˆ )
n
(8-28)
8-6
A
PREDICTION
INTERVAL
FOR
A
FUTURE
OBSERVATION
z
In this section we show how to obtain a 100(1 − α) % prediction
interval on a future value of a normal random variable.
z
Suppose that X 1 , X 2 , ..., X n is a random sample from a normal
population.
z
We wish to predict the value X n+1 , a single future observation.
z
A point prediction of X n+1 is X , the sample mean.
z
The prediction error is X n+1 − X .
z
The expected value of the prediction error is
E ( X n+1 − X ) = µ − µ = 0
z
The variance of the prediction error
V ( X n+1 − X ) = σ 2 +
z
The prediction error X n+1 − X is normally distributed.
Z=
z
σ2
1
= σ 2 (1 + )
n
n
X n+1 − X
1
σ 1+
n
Replacing σ with S results in
T=
X n+1 − X
1
S 1+
n
which has a t distribution with n − 1 degrees of freedom.
Definition
A 100(1 − α) % prediction interval on a single future observation from a
normal distribution is given by
x − t α / 2,n−1 s 1 +
1
1
≤ X n+1 ≤ x + t α / 2 ,n−1 s 1 +
n
n
(8-29)
EXAMPLE 8-8
z
See Example 8-4.
z
The load at failure for n = 22 specimens was observed.
z
x = 13.71 and s = 3.55 .
z
95% confidence interval on µ was 12.14 ≤ µ ≤ 15.28 .
z
We plan to test a twenty-third specimen.
z
A 95% prediction interval on the load at failure for this specimen
is
x − t α / 2,n−1 s 1 +
1
1
≤ X n+1 ≤ x + t α / 2,n−1 s 1 +
n
n
13.71 − (2.080)3.55 1 +
1
1
≤ X 23 ≤ 13.71 + (2.080)3.55 1 +
22
22
6.16 ≤ X 23 ≤ 21.26
z
Notice that the prediction interval is considerably longer than
the CI.
Exercise 8-6 : 8-49, 8-51, 8-55, 8-57, 8-59.
8-7
TOLERANCE
INTERVALS
FOR
A
NORMAL
DISTRIBUTION
z
The interval from µ − z α / 2 to µ − z α / 2 σ is called a tolerance interval.
Definition
A tolerance interval for capturing at least γ % of the values in a normal
distribution with confidence level 100(1 − α) % is
x − ks,
x + ks
where k is a tolerance interval factor found in Appendix Table XI.
Values are given for γ = 90%, 95%, and 95% and for 95% and 99%
confidence.
EXAMPLE 8-9
z
See Example 8-4.
z
The load at failure for n = 22 specimens was observed, and we
found that x = 31.71 and s = 3.55 .
z
Find a tolerance interval for the load at failure that includes 90%
of the values in the population with 95% confidence.
z
The tolerance factor k for n = 22 , γ = 0.90 , and 95% confidence
is k = 2.264 .
z
The desired tolerance interval is
( x − ks, x + ks)or[31.71 − (2.264)3.55,31.71 + (2.264)3.55]
= (23.67,39.75)
z
95% confident that at least 90% of the values of load at failure for
this particular alloy lie between 23.67 and 39.75 megapascals.
Exercise 8-7 : 8-61, 8-63, 8-65, 8-67, 8-69, 8-71, 8-77, 8-79, 8-81, 8-83,
8-85.