One-sample Tests of Hypothesis
Ka-fu WONG
23 June 2007
Abstract
Like a confidence interval, a hypothesis test is a probability statement. Often, we have to decide whether
we should accept or reject a claim about the population parameters made by others,
based on information from a random sample. The concepts covered in this chapter are very important
and can be extended to more complicated situations, so a thorough understanding of this chapter is
essential. Although it might not appear obvious, there is in fact a close connection between constructing
a confidence interval and testing a hypothesis. Thus, this chapter will also serve as a test of our understanding
of confidence intervals.
We all have some beliefs about things around us. Most of the time, a belief is correct. Sometimes a
belief turns out to be incorrect. When do we reject our original belief about these things? We do so only if we
see strong evidence against it. In the early days it was believed that the earth was a flat plane. Some pioneering
astronomers proposed that the earth took the shape of a ball. Although it is now obvious to us, it actually
took very strong evidence and decades to convince people that the earth was indeed ball-shaped.
1 Hypothesis
Definition 1 (Hypothesis): A Hypothesis is a statement about the value of a population parameter developed for the purpose of testing.
Example 1 (Hypothesis):
1. The current unemployment rate is 5%.
2. The current unemployment rate is larger than 5%.
3. At least 20% of people in the local economy earn less than 2000 dollars a month.
4. The mean monthly income for systems analysts is $3,625.
5. Twenty percent of all customers at Spaghetti House return for another meal within a month.
To illustrate the idea of testing hypothesis, consider the following example.
Example 2 (Extreme case – rejecting a hypothesis based on only one observation): After a midterm exam, the teaching assistant of a class of 127 students announced that the average mark
was 70. David got only 65. He suspected that the true average mark was lower than 70. Based
on his own result of 65, which is less than the announced average of 70, should David reject the
teaching assistant’s statement about the population average mark and favor his own hypothesis?
Probably not, because in most cases (with some reasonable standard deviation), the chance of
seeing a random draw of 65 when the true average is 70 is quite high. For instance, suppose the
marks of the 127 students are normally distributed with mean 70 and standard deviation 8.
Then the chance of seeing a random draw of 65 or lower is
$$\mathrm{Prob}(x < 65 \mid x \sim N(70, 8)) = \mathrm{Prob}\left(\frac{x-70}{8} < \frac{65-70}{8}\right) = \mathrm{Prob}(z < -0.625) = 0.266$$
which is usually not viewed as small.
For the sake of illustration, suppose instead that the marks of the 127 students are normally distributed with
mean 70 and standard deviation 2. Then the chance of seeing a random draw of 65 or lower is
$$\mathrm{Prob}(x < 65 \mid x \sim N(70, 2)) = \mathrm{Prob}\left(\frac{x-70}{2} < \frac{65-70}{2}\right) = \mathrm{Prob}(z < -2.5) = 0.00621$$
which is usually viewed as small.
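The two probabilities in Example 2 can be checked numerically. The following is a minimal sketch using only Python's standard library; the helper name `prob_below` is ours and is not part of the original notes.

```python
from statistics import NormalDist

def prob_below(x, mean, sd):
    """Return Prob(X < x) for X normally distributed with the given mean and sd."""
    return NormalDist(mu=mean, sigma=sd).cdf(x)

# Standard deviation 8: a mark of 65 or lower is not unusual.
p_wide = prob_below(65, mean=70, sd=8)
# Standard deviation 2: a mark of 65 or lower is rare.
p_narrow = prob_below(65, mean=70, sd=2)

print(f"sd = 8: {p_wide:.3f}")
print(f"sd = 2: {p_narrow:.5f}")
```

The same observation (65) is weak evidence against the announced average when the marks are widely dispersed and strong evidence when they are tightly clustered.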
Thus, depending on the dispersion (variance) of the population, often one observation cannot be considered as hard evidence against a hypothesis. As we learn from previous chapters, it is possible to improve our
precision of the estimator about the population parameter with a larger sample. Thus, if the standard deviation of the distribution of marks were 8, David might want to collect more information before he concludes
whether to reject the statement.
Example 3 (Extreme case – rejecting a hypothesis based on the population): After a mid-term
exam, the teaching assistant of a class of 127 announced that the average mark was 70. David
who believed himself to have done a very good job in the exam got only 65. He was shocked to
know that the average mark was 70. He suspected that the true average mark was lower than
70. To verify his hypothesis, he sent an email to all students taking the course to gather their
marks. Suppose all students told him truthfully about their marks. He was able to compute
the population average mark, which turned out to be 60. Therefore, he rejected the teaching
assistant’s statement about the population average mark and favored his own hypothesis.
In this extreme example, David obtained the population average mark to check against the statement
made by the teaching assistant. In this case, rejecting the TA’s statement about the average midterm mark is
not a probability statement.
However, in a lot of situations, it is impossible to obtain information about the whole population. Instead,
we have only a random sample (i.e., a subset) of the population.
For instance, in the last example, it could be that only 30 students reply to David’s email. Suppose
the responses were random and the average from the sample was 60. Can we reject the teaching assistant’s
statement about the population average mark? Yes, we might, but with a chance (probability) of making
a mistake. When we reject the statement, we want to inform the readers how likely (i.e., this chance or
probability) we will be making a mistake (rejecting the statement when the statement is actually correct).
Generally, we would want to minimize this chance of making a mistake. That is, the hypothesis that
“the average mark is 70” is maintained until we observe very strong evidence against it. In a sense, we are
giving the benefit of the doubt to this hypothesis, the so-called null hypothesis.1 The null hypothesis is presumed
true until we prove beyond reasonable doubt that it is false. “Beyond reasonable doubt” means that “the
probability of rejecting our maintained hypothesis when the null hypothesis is true” is less than an a priori
level of significance (usually, 10%, 5% or 1%).
Definition 2 (Null and alternative hypothesis): The null hypothesis (often written as H0) is a
maintained hypothesis. The alternative hypothesis (often written as Ha or H1) is the hypothesis
we will accept when the maintained hypothesis is rejected.
Example 4 (Null and alternative hypothesis): In the example, the null and alternative hypotheses
can be stated as one of the following:
1. H0 : The average mark is 70.
H1 : The average mark is not 70.
2. H0 : The average mark is 70.
H1 : The average mark is less than 70.
Note that the second pair of hypotheses differs from the first one in that the alternative is one-sided (or one-tail): “less than 70” instead of “not equal”. In essence, “the average mark is not 70”
means that the average mark is either less than or greater than 70. Thus, the first set of hypotheses
may be called a two-sided test or a two-tail test.
1 In court, the defendant is presumed innocent until proven beyond reasonable doubt to be guilty of stated charges.
In the examples above, David’s claim that the average mark is lower than 70 is like charging a person
(the teaching assistant) with being guilty. Like the court, we will give the benefit of the doubt to the opposite statement
that the average mark is 70.
2 Type I and Type II Errors
Our acceptance or rejection of the null hypothesis is based on sample information. Sampling errors prevent
us from knowing the truth exactly. Thus, there is some probability that we will make a mistake in our
acceptance or rejection decision based on the sample. There are two types of mistakes/errors:
1. Type I error: Reject the null hypothesis when the null hypothesis is actually correct.
2. Type II error: Accept the null hypothesis when the null hypothesis is actually false.
                        Truth: null true    Truth: null false
Decision: Accept null   Correct decision    Type II error
Decision: Reject null   Type I error        Correct decision
Theoretically, we would like to minimize both the probability of Type I errors and the probability of
Type II errors. However, there is a trade-off between committing the two errors. In the extreme case, we
can avoid Type I error completely by never rejecting the null, i.e., Prob(reject null | null true) = 0. But,
in this case, we will be committing Type II error too often, i.e., Prob(accept null | null false) = 1. In the
other extreme, to avoid Type II error completely, we always reject the null, i.e., Prob(reject null | null false)
= 1. But, in this case, we will be committing Type I error too often, i.e., Prob(reject null | null true) = 1.
In practice, for practitioners at least, we often focus on reducing the probability of committing the Type I
error to some tolerable level. This tolerable level of probability of committing the Type I error is known as
the level of significance.
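To see the trade-off between the two errors in numbers, the sketch below evaluates both error probabilities for a rule of the form "reject when the standardized sample mean exceeds c in absolute value". The setting (testing a zero mean with n = 50 observations of known variance 1, and a true mean of 0.3 under the alternative) and the function name are our own illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()

def error_probabilities(c, true_mu, n=50, sigma=1.0):
    """Type I and Type II error probabilities for the rule 'reject if |z| > c'.

    z = m / sqrt(sigma^2 / n) tests the null that the mean is 0; true_mu is
    the mean used to evaluate the Type II error.
    """
    shift = true_mu * sqrt(n) / sigma        # mean of z when the true mean is true_mu
    type_1 = 2 * (1 - STD_NORMAL.cdf(c))     # Prob(reject null | null true)
    # Prob(accept null | null false): z is N(shift, 1) in that case.
    type_2 = STD_NORMAL.cdf(c - shift) - STD_NORMAL.cdf(-c - shift)
    return type_1, type_2

for c in (1.0, 1.645, 1.96, 2.576, 3.5):
    a, b = error_probabilities(c, true_mu=0.3)
    print(f"c = {c:5.3f}  Prob(Type I) = {a:.4f}  Prob(Type II) = {b:.4f}")
```

Raising c lowers the probability of a Type I error and raises the probability of a Type II error, which is the trade-off described above.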
Definition 3 (Level of significance): Level of significance (often denoted as α) is the probability
of rejecting the null hypothesis when it is actually true. That is, Prob(reject null | null true) =
α.
A test of hypothesis is based on the sample information. A summary of the sample information is
called a statistic. A test statistic is a statistic developed for the purpose of testing a hypothesis.
Definition 4 (Test statistic): Test statistic is a value, determined from sample information, used
to determine whether or not to reject the null hypothesis.
If, under the null hypothesis, the probability of observing the sample is less than α, the null is rejected.
From a sample, we can derive many statistics. Some statistics are better than the others in testing our
hypothesis. What statistics to use in testing the hypothesis depends on the type of hypothesis we have. If
we are testing whether the population mean is equal to zero, naturally we will use the sample mean as a test
statistic.
In order to talk about the probability of observing the sample (or sample statistic), we will have to
know the probability distribution of the statistic, which is generally a random variable. Often, we will try to
standardize the test statistic so that it has some common distribution, such as the standard
normal distribution or the Student-t distribution. From the distribution of the “random” test statistic, we can
find the critical value for our decision.
Definition 5 (Critical value): Critical value is the dividing point between the region where the
null hypothesis is rejected and the region where it is not rejected.
3 Testing population mean
Often, we are asked to test whether the population mean equals some number.
Suppose we have a population with mean µ and variance σ². Suppose that σ² is known. We are asked
to test whether the population mean is equal to k (H0 : µ = k versus H1 : µ ≠ k) based on a sample of n
observations (say, n > 30) drawn from the population.
First, the natural test statistic is the sample mean. Let m denote the sample mean, viewed as a random variable.
If the observed sample mean differs very much from k (the value that the population mean is assumed to take
under H0), i.e., when m is too small or too big, we will reject the null hypothesis. Second, by the Central
Limit Theorem, the sample mean of n i.i.d. observations from the population is approximately normal, with
mean k and variance σ²/n under the null. Hence, under the null,
$$z = \frac{m-k}{\sqrt{\sigma^2/n}} \overset{A}{\sim} N(0, 1)$$
where the A above ∼ indicates that the distribution holds approximately.
Thus, if the observed z (denoted as ẑ) differs very much from 0 (the population mean of z under H0), i.e.,
when ẑ is much smaller than 0 or much bigger than 0, we will reject the null hypothesis.
Suppose we want to keep the probability of Type I error at 0.05 (i.e., α = 0.05). Then, we will be
looking for a c1 (much larger than zero) and a c2 (much smaller than zero) such that Prob(z > c1) + Prob(z <
c2) = 0.05. We will reject the null when ẑ > c1 or ẑ < c2. Because of the symmetry of the normal
distribution and for convenience, we often set c2 = −c1. From the standard normal table, we read that c1
would have to be 1.96 for Prob(z > c1) + Prob(z < −c1) = 0.05, i.e., Prob(|z| > 1.96) = 0.05.
Of course, the critical values c1 and c2 for z can be converted to critical values for m. Let d1 and d2 be such
critical values for m. They may be computed using the following relationships:
$$c_1 = \frac{d_1 - k}{\sqrt{\sigma^2/n}} \;\Rightarrow\; d_1 = k + c_1 \times \sqrt{\sigma^2/n}$$
$$c_2 = \frac{d_2 - k}{\sqrt{\sigma^2/n}} \;\Rightarrow\; d_2 = k + c_2 \times \sqrt{\sigma^2/n}$$
We reject the null if m > d1 or m < d2.
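The conversion from critical values for z to critical values for m can be sketched as follows. This is an illustration only: the function name and the numbers in the usage lines (k = 70, σ² = 64, n = 36) are hypothetical and do not come from the notes.

```python
from math import sqrt
from statistics import NormalDist

def critical_values(alpha, k, sigma2, n):
    """Two-sided critical values for z and for the sample mean m.

    Returns (c1, c2, d1, d2) with c2 = -c1, so that
    Prob(z > c1) + Prob(z < c2) = alpha under H0: mu = k.
    """
    c1 = NormalDist().inv_cdf(1 - alpha / 2)   # upper critical value for z
    c2 = -c1                                   # lower critical value, by symmetry
    se = sqrt(sigma2 / n)                      # standard deviation of m
    d1 = k + c1 * se
    d2 = k + c2 * se
    return c1, c2, d1, d2

# Hypothetical numbers: H0: mu = 70, sigma^2 = 64, n = 36, alpha = 0.05.
c1, c2, d1, d2 = critical_values(alpha=0.05, k=70, sigma2=64, n=36)
print(f"reject if z > {c1:.3f} or z < {c2:.3f}")
print(f"reject if m > {d1:.3f} or m < {d2:.3f}")
```

Working with z or with m leads to the same decision; the two rules are just different scales for the same rejection region.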
Example 5 (Significance level, critical value and the null): Suppose we want to test the population mean, µ, using a sample with n (large enough) observations from a huge population with
variance σ². The null and alternative are given by H0 : µ = k versus H1 : µ ≠ k. The Central
Limit Theorem gives us the following graph of the distribution of the sample mean, m:
[Figure: distribution of the sample mean m centered at k, with critical values d2 (below k) and d1 (above k); m2 lies below d2, m3 lies between d2 and d1, and m1 lies above d1.]
As mentioned above, whenever our sample mean satisfies m < d2 or m > d1, as with m1 and m2 in the graph,
we are ready to reject H0. But we do not reject H0 for a sample mean (like m3) falling in between
d2 and d1.
Simulation 1 (Testing hypothesis): Let’s simulate a test of the hypothesis H0 : µ = 0 versus
H1 : µ ≠ 0 when the true µ takes different values.
1. Fix a µ, say, µ = 0.
2. Generate n (= 50) observations from x = µ + ε, ε ∼ N(0, 1), so that x ∼ N(µ, 1).
3. For this sample, compute
$$z = \frac{m - 0}{\sqrt{\sigma^2/n}}$$
Reject H0 and favor H1 if |z| > z_{α/2}. Do this for different values of α.
4. Repeat the last two steps 1000 times and compute the percentage of samples that reject H0.
The following table reports the percentage of the simulated samples in which the null (H0 : µ = 0)
is rejected.
α      Rejection rule   µ = 0    µ = 0.1   µ = 0.3   µ = 0.5   µ = 1     µ = 2     µ = 5
0.01   |z| > 2.576      1.20%    2.80%     32.90%    85.90%    100.00%   100.00%   100.00%
0.05   |z| > 1.960      4.40%    10.80%    60.80%    94.80%    100.00%   100.00%   100.00%
0.10   |z| > 1.645      8.70%    18.40%    70.80%    97.40%    100.00%   100.00%   100.00%
We can see that the percentage of simulated samples in which the null is rejected is close to α when the null is true, i.e., µ
is in fact zero. When µ is not zero, the rejection rate is higher than α because m is more likely to
be far from the value it is expected to take under the null. This higher rejection rate is more apparent
when the true mean differs from the hypothesized value very much.
[Reference: Sim1.xls]
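The notes refer to a spreadsheet for this simulation. A rough Python equivalent is sketched below, using only the standard library; the function name and the seed are our own choices, and the exact percentages will differ from the table because the random draws differ.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def rejection_rate(mu, alpha, n=50, reps=1000, sigma2=1.0, seed=0):
    """Share of simulated samples in which H0: mu = 0 is rejected.

    Each sample has n draws from N(mu, sigma2); the test rejects when
    |z| > z_{alpha/2}, with z = (m - 0) / sqrt(sigma2 / n).
    """
    rng = random.Random(seed)                      # seeded for reproducibility
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    sd = sqrt(sigma2)
    rejections = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sd) for _ in range(n)]
        z = (fmean(sample) - 0) / sqrt(sigma2 / n)
        if abs(z) > z_crit:
            rejections += 1
    return rejections / reps

for alpha in (0.01, 0.05, 0.10):
    rates = [rejection_rate(mu, alpha) for mu in (0, 0.1, 0.3, 0.5, 1)]
    print(alpha, " ".join(f"{r:7.2%}" for r in rates))
```

With µ = 0 the printed rate should be near α, and it should rise toward 100% as µ moves away from zero.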
4 Testing hypothesis when the variance is unknown
4.1 Variance unknown but sample size is reasonably large
In the discussion above, we have assumed that the population variance is known. However, in real situations,
the population variance is not known and has to be estimated. It turns out that, as the following simulation
shows, we can still assume the test statistic to be normal if the sample size is reasonably large (n > 30).
Simulation 2 (Does it matter if we know the population variance?): Let’s simulate a test of the
hypothesis H0 : µ = 0 versus H1 : µ ≠ 0 when the true µ takes different values, and when the
population variance has to be estimated.
1. Fix a µ, say, µ = 0.
2. Generate n (= 50) observations from x = µ + ε, ε ∼ N(0, 1), so that x ∼ N(µ, 1).
3. For this sample, compute
$$z = \frac{m - 0}{\sqrt{s^2/n}}$$
Reject H0 and favor H1 if |z| > z_{α/2}. Do this for different values of α.
4. Repeat the last two steps 1000 times and compute the percentage of samples that reject H0.
The following table summarizes our simulation results.
α      Rejection rule   µ = 0    µ = 0.1   µ = 0.3   µ = 0.5   µ = 1     µ = 2     µ = 5
0.01   |z| > 2.576      1.20%    3.30%     35.00%    85.00%    100.00%   100.00%   100.00%
0.05   |z| > 1.960      4.80%    11.30%    59.90%    94.30%    100.00%   100.00%   100.00%
0.10   |z| > 1.645      8.80%    19.50%    71.20%    96.80%    100.00%   100.00%   100.00%
Thus, using the estimated variance in place of the population variance is fine when the sample size is
large.
[Reference: Sim2.xls]
4.2 Variance is unknown and sample size is small
In the examples above, n > 30. What if the number of observations is less than 30? We have discussed similar
situations in the last chapter. In fact, the discussion here is almost identical to that in the last chapter on
constructing confidence intervals. The small-sample case is more complicated. When n < 30, we cannot apply the Central Limit
Theorem to get normality. However, if we are willing to impose some additional assumptions, we can still
conduct hypothesis testing, but in a slightly different manner. Basically, we have to impose the assumption
of normality of the population distribution. If the population distribution is normal, the random variable m
will be normal with mean k and variance σ²/n under the null.
1. If σ² is known, we will have
$$z = \frac{m-k}{\sqrt{\sigma^2/n}} \overset{A}{\sim} N(0, 1)$$
2. If σ² is unknown, we have to use the sample estimate of σ² as a substitute. Let’s denote the sample estimate
by s². We have
$$z = \frac{m-k}{\sqrt{s^2/n}} \sim t(df = n-1)$$
Can we check the normality assumption (of the underlying population)? Theoretically, the normality
assumption might be checked. Note, however, that this assumption is needed when the sample size is small.
When we have a small sample, most tests of normality are likely to be unreliable. Due to this technical difficulty,
the normality assumption is often made without any check.2
Simulation 3 (Standard normal or Student-t): We would like to investigate, via simulations, whether the test of a hypothesis about the population mean depends on the knowledge of σ. We test the hypothesis
H0 : µ = 0 versus H1 : µ ≠ 0.
1. Generate one sample of n observations drawn from a probability distribution, N(0, 1) or
U(−2, 2). The mean and variance of the uniform random variable (denoted as µ and σ², respectively) may be computed.
2. Compute the sample mean $m = \frac{1}{n}\sum_{i=1}^{n} x_i$ and the sample variance $s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - m)^2$.
Test the hypothesis using three different test statistics:
(a) $t_1 = \frac{m-0}{\sqrt{\sigma^2/n}}$, where σ² is assumed known, and reject H0 if |t1| > z_{α/2}, where z_{α/2} is
the value of the standard normal distribution such that the probability of the standard normal
random variable z being larger than z_{α/2} equals α/2, i.e., Prob(z > z_{α/2}) = α/2. In essence,
we assume $t_1 = \frac{m-\mu}{\sigma_m}$, where $\sigma_m = \sigma/\sqrt{n}$, to be standard normal.
(b) $t_2 = \frac{m-0}{\sqrt{s^2/n}}$, where s² is the estimated variance, and reject H0 if |t2| > z_{α/2}, where z_{α/2} is
the value of the standard normal distribution such that the probability of the standard normal
random variable z being larger than z_{α/2} equals α/2, i.e., Prob(z > z_{α/2}) = α/2. In essence,
we assume $t_2 = \frac{m-\mu}{s_m}$, where $s_m = s/\sqrt{n}$, to be standard normal.
(c) $t_3 = \frac{m-0}{\sqrt{s^2/n}}$, where σ² is assumed unknown and is replaced by s², and reject H0 if
|t3| > t_{n−1,α/2}, where t_{n−1,α/2} is the value of the Student-t distribution with n − 1 degrees
of freedom such that the probability of the Student-t random variable t being larger than t_{n−1,α/2}
equals α/2, i.e., Prob(t > t_{n−1,α/2}) = α/2. In essence, we assume $t_3 = \frac{m-\mu}{s_m}$, where
$s_m = s/\sqrt{n}$, to be Student-t with n − 1 degrees of freedom.
3. Repeat the last two steps 1000 times. Compute the percentage of rejection in the simulated
samples.
2 There are advanced statistical procedures to test hypotheses without the assumption of normality. Examples include the Jackknife
and the Bootstrap. Interested readers may take a look at Efron and Tibshirani (1993).
Distribution      n    α      t1       t2       t3
xi ∼ U(−2, 2)     16   0.01   1.10%    3.40%    1.60%
xi ∼ U(−2, 2)     40   0.01   0.90%    1.40%    1.10%
xi ∼ U(−2, 2)     16   0.05   6.60%    8.50%    6.30%
xi ∼ U(−2, 2)     40   0.05   5.20%    6.00%    5.20%
xi ∼ U(−2, 2)     16   0.10   11.20%   12.70%   11.10%
xi ∼ U(−2, 2)     40   0.10   9.60%    10.30%   9.80%
xi ∼ N(0, 1)      16   0.01   0.90%    1.40%    0.60%
xi ∼ N(0, 1)      40   0.01   1.40%    1.40%    1.30%
xi ∼ N(0, 1)      16   0.05   5.10%    6.90%    4.20%
xi ∼ N(0, 1)      40   0.05   5.00%    5.40%    4.90%
xi ∼ N(0, 1)      16   0.10   10.30%   12.40%   9.80%
xi ∼ N(0, 1)      40   0.10   9.30%    10.60%   9.50%
We note that the simulated rejection rates of t1 and t3 are closer to the theoretical rejection rate of
α than that of t2. The simulated rejection rates of t1, t2 and t3 are about equally close to the theoretical
rejection rate when the sample size is large. Thus, when the sample size is large (say, n > 30), using
any of the three ways to test the hypothesis makes no difference. When the sample size is small, it
is better to use t3.
[Reference: Sim3.xls]
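The small-sample comparison between the normal cutoff (t2) and the Student-t cutoff (t3) can be sketched in Python as follows. The standard library has no Student-t quantile function, so the cutoff for 15 degrees of freedom is copied from a Student-t table; that constant, the function name and the seed are our own assumptions, and the sketch covers only the case n = 16, α = 0.05 with normal data.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean, variance

Z_CRIT = NormalDist().inv_cdf(0.975)   # z_{alpha/2} for alpha = 0.05
T_CRIT_15 = 2.131                      # t_{15, 0.025} from a Student-t table; valid for n = 16 only

def small_sample_rates(reps=2000, seed=1):
    """Rejection rates of t2 (normal cutoff) and t3 (Student-t cutoff) for n = 16.

    Samples come from N(0, 1), so H0: mu = 0 is true and both rates
    estimate the actual probability of a Type I error.
    """
    n = 16
    rng = random.Random(seed)
    reject_t2 = reject_t3 = 0
    for _ in range(reps):
        x = [rng.gauss(0, 1) for _ in range(n)]
        stat = fmean(x) / sqrt(variance(x) / n)   # variance() divides by n - 1
        reject_t2 += abs(stat) > Z_CRIT
        reject_t3 += abs(stat) > T_CRIT_15
    return reject_t2 / reps, reject_t3 / reps

rate_t2, rate_t3 = small_sample_rates()
print(f"t2 (normal cutoff):    {rate_t2:.2%}")
print(f"t3 (Student-t cutoff): {rate_t3:.2%}")
```

Because the Student-t cutoff is larger than the normal cutoff, t3 can never reject more often than t2 on the same samples; the question the simulation answers is which rate is closer to α.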
When to use the normal approximation? When to use Student-t? The rule is: always use Student-t if a
computer is readily available to compute the t values or probabilities. Use normal or Student-t according to
the discussion above when only statistical tables are available.
5 Testing proportions
When the population mean in question is really a population proportion, the discussion above goes
through, but the variance takes a special form. If π is the population proportion, the variance of a randomly
drawn observation is π(1 − π). Thus, under the null that π = k, we have
$$z = \frac{p-k}{\sqrt{k(1-k)/n}} \overset{A}{\sim} N(0, 1)$$
where p is the sample proportion.
What if the variance is unknown for the proportion under the null? It simply cannot happen because under
the null that π = k, the variance of a randomly drawn observation is k(1 − k), and hence known. Some
might argue that we can also use p(1 − p)/n instead of k(1 − k)/n as an estimate for the variance of p. To
convince ourselves, we can do a simulation as follows.
Simulation 4 (Using the sample variance or the variance under the null?): We would like to
test H0 : π = 0.3 versus H1 : π ≠ 0.3, using different variance estimates of p.
1. Draw n (= 50) observations from a Bernoulli distribution with parameter π (i.e., Prob(success) =
π), π = 0.1, 0.3, 0.5, 0.7, 0.9.
2. For the sample, obtain the sample proportion of success, p. Test the hypothesis using the
two different statistics at various levels of significance (α), with k = 0.3:
$$t_1 = \frac{p-k}{\sqrt{k(1-k)/n}} \qquad t_2 = \frac{p-k}{\sqrt{p(1-p)/n}}$$
3. Repeat the last two steps 1000 times. Compute the percentage of rejection in the simulated
samples.
Test statistic t1:
α      Rejection rule   π = 0.1   π = 0.3   π = 0.5   π = 0.7   π = 0.9
0.01   |t1| > 2.576     79.00%    1.20%     68.30%    100.00%   100.00%
0.05   |t1| > 1.960     94.30%    5.40%     84.50%    100.00%   100.00%
0.10   |t1| > 1.645     97.60%    9.40%     89.50%    100.00%   100.00%

Test statistic t2:
α      Rejection rule   π = 0.1   π = 0.3   π = 0.5   π = 0.7   π = 0.9
0.01   |t2| > 2.576     93.80%    2.60%     57.60%    100.00%   99.50%
0.05   |t2| > 1.960     97.10%    7.30%     84.50%    100.00%   99.50%
0.10   |t2| > 1.645     98.50%    13.10%    89.50%    100.00%   99.50%
The simulation results show that the test statistic t1, with the null value imposed in the computation
of the variance, performs better than t2, which does not use the null value in the computation of the variance.
Specifically, t1 yields a simulated rejection rate closer to the theoretical rejection rate when the
null is correct (column labeled π = 0.3); t1’s ability to reject the null (π = 0.3) when the null
is false (i.e., π ≠ 0.3) is comparable to that of t2, if not better.
[Reference: Sim4.xls]
Thus, the simulation suggests that in testing hypotheses about proportions, it is better to use t1 with the
null value imposed in the computation of the variance.
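A sketch of this comparison in Python is given below. The function name and seed are ours. One detail is an assumption on our part: when the sample proportion is exactly 0 or 1, the denominator of t2 is zero, and the sketch counts that sample as a rejection; the notes do not say how this case was handled.

```python
import random
from math import sqrt
from statistics import NormalDist

def proportion_rejection_rates(pi, k=0.3, n=50, alpha=0.05, reps=1000, seed=2):
    """Rejection rates of H0: pi = k using t1 (null variance) and t2 (sample variance)."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    reject_t1 = reject_t2 = 0
    for _ in range(reps):
        successes = sum(rng.random() < pi for _ in range(n))   # n Bernoulli(pi) draws
        p = successes / n
        t1 = (p - k) / sqrt(k * (1 - k) / n)
        reject_t1 += abs(t1) > z_crit
        if successes == 0 or successes == n:
            # p(1 - p) = 0, so t2 is infinite; counted as a rejection (our assumption).
            reject_t2 += 1
        else:
            t2 = (p - k) / sqrt(p * (1 - p) / n)
            reject_t2 += abs(t2) > z_crit
    return reject_t1 / reps, reject_t2 / reps

for pi in (0.1, 0.3, 0.5, 0.7, 0.9):
    r1, r2 = proportion_rejection_rates(pi)
    print(f"pi = {pi:.1f}: t1 rejects {r1:.2%}, t2 rejects {r2:.2%}")
```

The row for π = 0.3 is the one to compare with α, since that is where the null is true.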
6 One-sided tests
6.1 One-sided test with simple null, H0 : µ = k versus H1 : µ > k
The examples so far deal with hypotheses of the form H0 : µ = k versus H1 : µ ≠ k. What if we have an inequality
structure like H0 : µ = k versus H1 : µ > k? Basically the same analysis goes through. However, we will be
looking for a c1 such that Prob(z > c1) = 0.05, and we will reject the null when the test statistic z is larger
than c1. From the standard normal table, we read that c1 would have to be 1.645 for Prob(z > c1) = 0.05.
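Instead of reading the table, the one-sided critical value can be computed directly. The short sketch below (the function name is ours) also prints the two-sided value at the same α for comparison.

```python
from statistics import NormalDist

def upper_critical_value(alpha):
    """Return c1 such that Prob(z > c1) = alpha for a standard normal z."""
    return NormalDist().inv_cdf(1 - alpha)

for alpha in (0.01, 0.05, 0.10):
    one_sided = upper_critical_value(alpha)
    two_sided = upper_critical_value(alpha / 2)   # all of alpha split over two tails
    print(f"alpha = {alpha:.2f}: one-sided {one_sided:.3f}, two-sided {two_sided:.3f}")
```

At the same α, the one-sided cutoff is smaller than the two-sided one because the whole rejection probability sits in a single tail.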
Simulation 5 (One-sided tests with simple null): We would like to convince ourselves that the
one-sided test may be conducted as described. Let’s consider the null H0 : µ = k versus H1 :
µ > k, where k is set to 0 for convenience.
1. Fix a µ, say, µ = 0.
2. Generate n (= 50) observations from x = µ + ε, ε ∼ N(0, 1), so that x ∼ N(µ, 1).
3. For this sample, compute
$$z = \frac{m - 0}{\sqrt{s^2/n}}$$
At α level of significance, reject H0 and favor H1 if z > z_α. Do this for different values of α.
4. Repeat the last two steps 1000 times and compute the percentage of samples that reject H0.
The following table summarizes our simulation results.
       H0 : µ = 0 versus H1 : µ > 0           H0 : µ = 0 versus H1 : µ < 0
α      Rejection rule   µ = 0    µ = 1        Rejection rule   µ = 0    µ = −1
0.01   z > 2.326        1.20%    100.00%      z < −2.326       1.20%    100.00%
0.05   z > 1.645        4.80%    100.00%      z < −1.645       4.80%    100.00%
0.10   z > 1.282        10.00%   100.00%      z < −1.282       10.00%   100.00%
It is obvious that the simulated rejection rates are close to the theoretical rejection rates. Thus, the
simulation confirms that our suggested procedure for testing the one-sided hypothesis is valid.
[Reference: Sim5.xls]
Note that in both panels of the simulation above, we reported only the test of the hypothesis when the
true parameter equals the null value or lies in the alternative. For instance, in the left panel, when we are
testing H0 : µ = 0 versus H1 : µ > 0, we consider a true parameter of 0 (µ = 0) and a true parameter
larger than 0 (µ = 1 in the example above), but not the case of µ < 0, because µ < 0 is not feasible
under either the null or the alternative.
Situations similar to this hypothesis structure happen in real life. For instance, we might have
1. H0 : The mean income of students at a local primary school is zero.
H1 : The mean income of students at a local primary school is larger than zero.
(Mean income below zero is impossible, theoretically.)
2. H0 : The mean number of spouses of a married person in Hong Kong is one.
H1 : The mean number of spouses of a married person in Hong Kong is larger than one.
(Mean number of spouses of a married person in Hong Kong cannot be less than one, theoretically.)
Example 6 (one-sided test): In the past, 15% of the mail order solicitations for a political party
resulted in a financial contribution. A new solicitation letter that has been drafted is sent to a
sample of 200 people and 40 responded with a contribution. At the 0.05 significance level, can it
be concluded that the new letter is more effective?
The problem is solved in the following steps:
1. State the null and the alternative hypothesis:
H0 : π = 0.15 (new letter is as effective as the old letter)
H1 : π > 0.15 (new letter is more effective)
2. Identify the test statistic and its distribution:
z = (p − 0.15)/std(p) ∼ N(0, 1), where p is the sample proportion
3. State the decision rule:
The null hypothesis is rejected if z is greater than 1.645, since Prob(z > 1.645) = 0.05.
4. Make a decision and interpret the results:
$$z = \frac{p-\pi}{\sqrt{\pi(1-\pi)/n}} = \frac{\frac{40}{200} - 0.15}{\sqrt{0.15(1-0.15)/200}} = 1.98$$
Since 1.98 > 1.645, the null hypothesis is rejected.
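[Figure: distribution of p under H0, centered at π = 0.15 (z = 0), with the rejection region (α = 0.05) to the right of z = 1.65; the observed p = 0.2 corresponds to z = 1.98. Standardized to standard normal: z = (p − π)/std(p).]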
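The arithmetic of Example 6 can be reproduced with a few lines of Python. This is a sketch under the normal approximation used in the example; the function name is ours.

```python
from math import sqrt
from statistics import NormalDist

def one_sided_proportion_test(successes, n, pi0, alpha=0.05):
    """Test H0: pi = pi0 against H1: pi > pi0 with the normal approximation."""
    p = successes / n
    z = (p - pi0) / sqrt(pi0 * (1 - pi0) / n)   # variance uses the null value pi0
    z_crit = NormalDist().inv_cdf(1 - alpha)    # one-sided critical value
    return z, z_crit, z > z_crit

z, z_crit, reject = one_sided_proportion_test(successes=40, n=200, pi0=0.15)
print(f"z = {z:.2f}, critical value = {z_crit:.3f}, reject H0: {reject}")
```

The decision depends on which critical value is used: 1.98 exceeds both the one-sided cutoff and 1.96, so the conclusion here would be the same either way, but that will not hold for every sample.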
Can you give other examples that fit into the one-sided hypothesis structure?
6.2 One-sided test with composite null, H0 : µ ≤ k versus H1 : µ > k
There are many situations when it appears logical to consider a hypothesis with an inequality structure like
H0 : µ ≤ k versus H1 : µ > k. For instance,
1. H0 : The unemployment rate is lower than or equal to 5 percent.
H1 : The unemployment rate is higher than 5 percent.
2. H0 : The average mark of students in the mid-term is lower than or equal to 70.
H1 : The average mark of students in the mid-term is higher than 70.
Note that under the null, µ can take a range of values, e.g., µ = k and µ = k − 0.001. We call this kind of
null hypothesis, in which the population parameter can take a set of values, a “composite null”. This composite null
makes the hypothesis testing very different from what we have discussed earlier.
Simulation 6 (One-sided tests with composite null): We would like to explore the properties
of the one-sided test with composite null. Let’s consider the null H0 : µ ≤ k versus H1 : µ > k,
where k is set to 0 for convenience.
1. Fix a µ, say, µ = 0.
2. Generate n (= 50) observations from x = µ + ε, ε ∼ N(0, 1), so that x ∼ N(µ, 1).
3. For this sample, compute
$$z = \frac{m - 0}{\sqrt{s^2/n}}$$
At α level of significance, reject H0 and favor H1 if z > z_α. Do this for different values of α.
4. Repeat the last two steps 1000 times and compute the percentage of samples that reject H0.
5. Repeat the simulation with different µ.
The following table summarizes our simulation results.
H0 : µ ≤ 0 versus H1 : µ > 0
α      Rejection rule   µ = −1   µ = −0.5   µ = 0    µ = 0.5   µ = 1
0.01   z > 2.326        0.00%    0.00%      1.20%    89.90%    100.00%
0.05   z > 1.645        0.00%    0.00%      4.80%    96.80%    100.00%
0.10   z > 1.282        0.00%    0.00%      10.00%   98.80%    100.00%
[Reference: Sim6.xls]
From the simulation results, we see that when the true parameter is less than zero, the probability
of rejecting the null is much less than α. In addition, when the true parameter is equal to zero,
the probability of rejecting the null is close to α.
Based on the simulation, we conclude that mechanically treating the composite null (hypothesized parameter taking
a range of values) as a simple null (taking only one value) will yield
1. a rejection rate that is less than the intended rejection rate (i.e., α) when the true parameter lies under the null but differs
from the boundary value of the composite null.
2. a rejection rate the same as the intended rejection rate (i.e., α) only when the true parameter lies at
the boundary value of the composite null.
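The two conclusions above can be illustrated with a short Python simulation. This is a sketch with our own function name and seed; the same seed is reused for every µ so that the samples differ only by a shift in the mean.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean, variance

def one_sided_rejection_rate(mu, alpha=0.05, n=50, reps=1000, seed=3):
    """Share of samples from N(mu, 1) with z > z_alpha, where z = m / sqrt(s^2 / n)."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha)
    rejections = 0
    for _ in range(reps):
        x = [rng.gauss(mu, 1) for _ in range(n)]
        z = fmean(x) / sqrt(variance(x) / n)   # variance() is the sample variance s^2
        rejections += z > z_crit
    return rejections / reps

for mu in (-1, -0.5, 0, 0.5, 1):
    print(f"mu = {mu:5.1f}: rejection rate {one_sided_rejection_rate(mu):.2%}")
```

Only at the boundary µ = 0 should the rate be near α = 0.05; for µ below the boundary it should be essentially zero.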
Thus, if we treat the composite null (taking a range of values) as a simple null (taking only one value)
mechanically, the level of significance (α) is really the maximum level of significance, correct only when the
true parameter is at the boundary. Should we be concerned? Yes! Because when we report our test, we
typically say “We reject H0 at 0.05 level of significance” which means “The probability of rejecting the null
when the null is true is 0.05”. When the “probability of rejecting the null when the null is true” (i.e., the
true rejection rate) is less than 0.05 and we state that it is 0.05, we are making a false statement.
It is not too difficult to understand why we have difficulty in finding a correct level of significance for
a composite null. Hypothesis testing builds on the probability distribution of a sample statistic, which is
often characterized by the population parameters (such as mean and variance). Under a composite null,
we really have infinitely many possible null parameters, and hence correspondingly infinitely many probability distributions
(one for each parameter value), and correspondingly infinitely many levels of significance (one for each
parameter value).
In most empirical studies, researchers often avoid this kind of composite null because the interpretation
of the level of significance is unclear. We suggest following this conventional practice and avoiding the composite
null in our own work.3 However, at times it is more intuitive to consider the composite null. In that case,
we had better remember what α means.
7 Level of significance versus p-value
Suppose that, given a sample statistic, we want to claim a rejection of the null but we want to make an honest
probability statement. That is, we would like to find the smallest possible level of significance (p) such that
we can reject the null at this level of significance.
Definition 6 (p-value): A p-value is the probability, assuming that the null hypothesis is true,
of finding a value of the test statistic at least as extreme as the computed value
for the test (denoted as ẑ):
$$p\text{-value} = \mathrm{Prob}(z > \hat{z})$$
where z is the corresponding random variable.
The p-value can be used to make acceptance and rejection decisions.
3 Most textbooks in statistics either never mention the composite null or give the wrong interpretation when they mention
it. See Liu and Stone (1999) for a discussion.
1. If the p-value is smaller than the level of significance, H0 is rejected.
2. If the p-value is larger than the level of significance, H0 is not rejected.
To understand the relation between the level of significance and the p-value, consider the following algorithm
for finding the p-value. Consider H0 : µ = k versus H1 : µ ≠ k. Suppose the observed statistic is denoted as ẑ
(= (m̂ − k)/s_m), and its corresponding random variable is denoted as z (= (m − k)/s_m), where m̂ denotes a realized
sample mean, and m denotes a random mean (not yet realized).
1. Set the level of significance to its largest possible value, i.e., α = 1.
2. Test the hypothesis at the α level of significance.
3. Update α with the following set of rules.
(a) If the hypothesis is not rejected at the α level of significance, stop the process and set the p-value to α.
(b) If the hypothesis is rejected at the α level of significance, replace α with α − ∆ (where ∆ is a small
number, say, 0.0001) and repeat the last two steps.
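The search above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the test is a two-sided z-test with a standard normal statistic, and the function names (`rejects_two_sided`, `p_value_by_search`, `p_value_exact`) are hypothetical names chosen here, not part of the original text.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal N(0, 1)


def rejects_two_sided(z_hat: float, alpha: float) -> bool:
    """Two-sided z-test: reject H0 when |z_hat| exceeds z_{alpha/2}."""
    critical = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    return abs(z_hat) > critical


def p_value_by_search(z_hat: float, delta: float = 0.0001) -> float:
    """Lower alpha from 1 in steps of delta until H0 is no longer rejected."""
    steps = round(1.0 / delta)
    # Use integer step counts so that alpha = i * delta does not drift.
    for i in range(steps, 0, -1):
        alpha = i * delta
        if not rejects_two_sided(z_hat, alpha):
            return alpha  # first alpha at which we fail to reject
    return 0.0  # rejected at every alpha on the grid


def p_value_exact(z_hat: float) -> float:
    """Closed-form two-sided p-value, for comparison with the search."""
    return 2.0 * (1.0 - STD_NORMAL.cdf(abs(z_hat)))


if __name__ == "__main__":
    z_hat = 1.96
    print(p_value_by_search(z_hat), p_value_exact(z_hat))
```

The search result agrees with the closed-form value up to the grid width ∆, which is why the p-value can be read as the smallest level of significance at which the null is still rejected.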
Simulation 7 (Relating the p-value to level of significance): We generate one sample of 50
observations from a N(0, 1) population, i.e., the true population mean is 0. We are interested
in the hypothesis H0 : µ = k versus H1 : µ ≠ k, for different k. We use the above algorithm to
find the p-value for each null. The following table reports whether the hypothesis is
rejected at different levels of significance α.
α       k = 0   k = 0.1   k = 0.3   k = 0.5   k = 0.7   k = 0.9
1       Yes     Yes       Yes       Yes       Yes       Yes
0.9     No      Yes       Yes       Yes       Yes       Yes
0.8     No      Yes       Yes       Yes       Yes       Yes
0.7     No      Yes       Yes       Yes       Yes       Yes
0.6     No      Yes       Yes       Yes       Yes       Yes
0.5     No      Yes       Yes       Yes       Yes       Yes
0.4     No      No        Yes       Yes       Yes       Yes
0.3     No      No        Yes       Yes       Yes       Yes
0.2     No      No        Yes       Yes       Yes       Yes
0.1     No      No        Yes       Yes       Yes       Yes
0.05    No      No        Yes       Yes       Yes       Yes
0.01    No      No        No        Yes       Yes       Yes
We can reject the null with k = 0.5, the null with k = 0.7 and the null with k = 0.9 at the 0.01 level.
That is, if one of these nulls is correct, the chance of observing the sample statistic (which is
in fact generated with N(0, 1)) is extremely small (i.e., less than 0.01). If the null with k = 0.3
is correct, the chance of observing the sample statistic is small, but not as small as when k = 0.9.
When the null with k = 0 is correct, the chance of observing the sample statistic is large, and we
were not able to reject the null at any α ≤ 0.9.
In fact, the p-values for the hypotheses were found to be
k          0        0.1      0.3      0.5      0.7      0.9
p-value    0.9642   0.4632   0.0220   0.0001   0.0000   0.0000
It is easy to verify that the null is rejected at the α significance level if p-value < α.
[Reference: Sim7.xls]
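A rough Python counterpart of Simulation 7 is sketched below. It is not the original spreadsheet: it assumes the population standard deviation is known to be 1 (so the statistic is a z-statistic), uses an arbitrary seed, and the helper name `two_sided_p_value` is hypothetical. The numbers it prints will therefore differ from the table above, but the pattern (reject exactly when p-value < α) should be the same.

```python
import random
from statistics import NormalDist, fmean


def two_sided_p_value(sample, k, sigma=1.0):
    """Two-sided p-value for H0: mu = k, assuming a known sigma."""
    n = len(sample)
    z_hat = (fmean(sample) - k) / (sigma / n ** 0.5)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z_hat)))


rng = random.Random(7)  # arbitrary seed, an assumption of this sketch
sample = [rng.gauss(0.0, 1.0) for _ in range(50)]  # true mean is 0

ks = [0.0, 0.1, 0.3, 0.5, 0.7, 0.9]
alphas = [0.9, 0.5, 0.1, 0.05, 0.01]
p_values = {k: two_sided_p_value(sample, k) for k in ks}

for k in ks:
    # The null is rejected at level alpha exactly when p-value < alpha.
    decisions = ["Yes" if p_values[k] < a else "No" for a in alphas]
    print(f"k = {k:<4} p-value = {p_values[k]:.4f}  rejected at {alphas}: {decisions}")
```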
Example 7 (p-value of a one-sided test): In the past, 15% of the mail order solicitations for a
political party resulted in a financial contribution. A new solicitation letter that has been drafted
is sent to a sample of 200 people and 40 responded with a contribution. We would like to test
H0 : π = 0.15 (new letter is as effective as the old letter) versus H1 : π > 0.15 (new letter is more
effective). What is the p-value of rejecting the null?
1. The relevant test statistic and its distribution are z = (p − 0.15)/std(p) ∼ N(0, 1), where
p is the sample proportion.
2. The sample statistic is:
$$\hat{z} = \frac{p - \pi}{\sqrt{\pi(1-\pi)/n}} = \frac{40/200 - 0.15}{\sqrt{0.15(1-0.15)/200}} = 1.98$$
3. The p-value is
$$\mathrm{Prob}(z > \hat{z}) = \mathrm{Prob}(z > 1.98) = 0.023835$$
α       Critical value   Test z-stat.   Decision
0.05    1.644854         1.98           Reject
0.04    1.750686         1.98           Reject
0.03    1.880794         1.98           Reject
0.02    2.053749         1.98           Not reject
0.01    2.326348         1.98           Not reject
[Figure: sampling distribution of the test statistic, with the test statistic from the sample marked and the p-value shown as the tail area beyond it.]
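The numbers in Example 7 can be reproduced with a short script. This is a sketch only: the function name `one_sided_proportion_test` is hypothetical, and the standard error is computed under the null value π = 0.15, as in the example.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def one_sided_proportion_test(successes, n, pi0):
    """Upper-tailed z-test for a proportion; returns (z_hat, p_value)."""
    p_hat = successes / n
    se = (pi0 * (1.0 - pi0) / n) ** 0.5  # standard error under H0
    z_hat = (p_hat - pi0) / se
    p_value = 1.0 - STD_NORMAL.cdf(z_hat)
    return z_hat, p_value


z_hat, p_value = one_sided_proportion_test(40, 200, 0.15)
print(f"z = {z_hat:.2f}, p-value = {p_value:.6f}")

# Reproduce the decision table: reject when z_hat exceeds the critical value.
for alpha in (0.05, 0.04, 0.03, 0.02, 0.01):
    critical = STD_NORMAL.inv_cdf(1.0 - alpha)
    decision = "reject" if z_hat > critical else "not reject"
    print(f"alpha = {alpha:.2f}  critical = {critical:.6f}  {decision}")
```

The decision switches from "reject" to "not reject" between α = 0.03 and α = 0.02, which brackets the p-value of about 0.0238.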
Example 8 (p-value of a two-sided test): The supervisor of a production line believes that the
average time to assemble an electronic component is 14 minutes. Assume that assembly time is
normally distributed with a standard deviation of 3.4 minutes. The supervisor times the assembly
of 14 components, and finds that the average time for completion was 16.6 minutes. What is the
smallest significance level at which the null hypothesis H0 : µ = µ0 = 14 could be rejected?
Test statistic = $(m^* - \mu_0)/\mathrm{std}(m) = (16.6 - 14)/(3.4/14^{1/2}) = 2.86 > 0$.
P-value = 2 × P (Z > 2.86) = 2 × 0.0021 = 0.0042.
Note that p-value = Prob [ |z| > the absolute value of test statistic | H0 ]
= 2× Prob [ z < value of test statistic | H0 ] if value of test statistic < 0
= 2× Prob[ z > value of test statistic | H0 ] if value of test statistic > 0
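[Figure: sampling distribution of the test statistic, with the test statistic from the sample marked; the p-value is the sum of the two tail areas, A + B.]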
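As a check on Example 8, the two-sided p-value can be computed directly. The sketch below assumes a known population standard deviation, as stated in the example; the function name `two_sided_mean_test` is a hypothetical name used only here.

```python
from statistics import NormalDist


def two_sided_mean_test(m_hat, mu0, sigma, n):
    """Two-sided z-test for a mean with known sigma; returns (z_hat, p_value)."""
    z_hat = (m_hat - mu0) / (sigma / n ** 0.5)
    # Double the tail area beyond |z_hat| to account for both tails.
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z_hat)))
    return z_hat, p_value


z_hat, p_value = two_sided_mean_test(m_hat=16.6, mu0=14.0, sigma=3.4, n=14)
print(f"z = {z_hat:.2f}, two-sided p-value = {p_value:.4f}")
```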
8
Power
On a starry night, we noticed a bright spot in the sky. From the map, we knew that it was a system of binary
stars. However, our naked eyes could not tell the two stars apart. So, we borrowed a telescope and looked at it again. Given
the same telescope (or naked eyes), the larger the distance between the two stars, the more likely we can
distinguish between them. Given the same two stars, the larger the Magnification Coefficient (MC) of our
telescope, the more likely we can distinguish between the two stars.
While the discussion about star-gazing might appear totally irrelevant to statistics, star-gazing actually
is analogous to the power of hypothesis testing. In the context of hypothesis testing, the power of the test
is the ability of the test to tell us that the null is wrong when the true parameter is different from the null.
Given a test, such ability depends on how far apart the null is from the true parameter. The
larger the difference between the truth and the null, the larger the power of a given test.
In this sense, there is some similarity between statistical power and the power of a telescope.
Statistical test                            Telescope (star gazing)

Power of a test in distinguishing           Magnification coefficient (MC) of
between two values of parameter.            your telescope.

The larger the difference between the       The larger the distance between the two
truth and the null, the larger the          stars, the more likely we can distinguish
power.                                      between them using the same telescope.

Given the same truth and null, the          Given the same two stars, the larger the
larger is the power, the more accurate      MC of our telescope, the more likely we
is the test.                                can distinguish the two stars.
Definition 7 (Power): The power of a test is a measure of the ability of a test to distinguish
between two possible values of the parameter of interest. The power of a test against an alternative
value of the parameter (different from the null value) is the probability of rejecting the null
value when the true parameter equals the alternative value.
Example 9 (Power): A random sample of 802 supermarket shoppers had 378 shoppers that
preferred generic brand items if the price was lower. Test at the 10% level the null hypothesis
that at least one-half of all shoppers preferred generic brand items against the alternative that
the population proportion is less than one-half. Find the power of a 10% level test if, in fact,
45% of the supermarket shoppers preferred generic brand items.
1. The hypotheses are H0 : π = 0.5 versus H1 : π < 0.5.
2. The variance of the sample proportion p under H0 is 0.5 × (1 − 0.5)/802 = 0.000312.
3. The level of significance is 0.1. Reject H0 if the sample proportion p is too small. At this level of
significance, z = −1.28. The upper limit of the rejection region is 0.5 + z × [std. dev. under H0] =
0.4774. Therefore, H0 is rejected when the sample proportion is less than 0.4774. The observed
sample proportion is 378/802 = 0.4713 < 0.4774, so H0 is rejected.
4. If the real proportion is 0.45:
(a) H0 is false since π = 0.45 < 0.5.
(b) The power is Prob(rejecting H0 | π = 0.45) = Prob(p < 0.4774 | π = 0.45).
(c) The variance under π = 0.45 is 0.45 × (1 − 0.45)/802 = 0.000309.
(d) $\mathrm{Prob}(p < 0.4774 \mid \pi = 0.45) = \mathrm{Prob}\big(z < (0.4774 - 0.45)/\sqrt{0.000309}\big) = 0.9404$
5. Thus, 0.9404 is the probability that H0 (π = 0.5) is correctly rejected when the truth is
π = 0.45.
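The power calculation in Example 9 can be retraced numerically. The following is a minimal sketch using the normal approximation from the example; `power_lower_tail` and its parameter names are hypothetical and introduced only for illustration.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def power_lower_tail(pi0, pi1, n, alpha):
    """Power of a lower-tailed proportion test of H0: pi = pi0 when pi = pi1.

    Returns (cutoff, power), where H0 is rejected if the sample
    proportion falls below the cutoff.
    """
    z_alpha = STD_NORMAL.inv_cdf(alpha)  # negative, about -1.28 for alpha = 0.1
    se_null = (pi0 * (1.0 - pi0) / n) ** 0.5
    cutoff = pi0 + z_alpha * se_null
    se_alt = (pi1 * (1.0 - pi1) / n) ** 0.5  # std. dev. under the alternative
    power = STD_NORMAL.cdf((cutoff - pi1) / se_alt)
    return cutoff, power


cutoff, power = power_lower_tail(pi0=0.5, pi1=0.45, n=802, alpha=0.10)
print(f"reject H0 if p < {cutoff:.4f}; power against pi = 0.45 is {power:.4f}")
print("observed proportion:", round(378 / 802, 4))
```

Note that the rejection cutoff is computed with the variance under the null, while the probability of falling below it is computed with the variance under the alternative, exactly as in steps 2 and 4(c).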
Simulation 8 (Power): We would like to simulate the power of a simple test of the population
mean, H0 : µ = 0 versus H1 : µ ≠ 0.
1. Fix a population mean µ = 0 + 0.1 × i, for i = −10, −9, ..., −1, 0, 1, ..., 10.
2. Generate a sample of 30 observations from a population of N(µ, 1).
3. Perform the test of hypothesis H0 : µ = 0 versus H1 : µ ≠ 0 at the α level of significance.
4. Repeat the last two steps 1000 times. Compute the percentage of rejections in the simulations, for different α.
5. Repeat with different µ.
The simulated rejection rate is plotted against different µ in the following chart.
[Figure: simulated rejection rate (vertical axis, 0% to 100%) plotted against the true mean (horizontal axis, −1 to 1).]
[Reference: Sim8.xls]
Note that if the test were extremely powerful, it would reject the null whenever the true µ is not zero.
This ideal scenario does not happen. What often happens is that a test has much power
against the null when the true µ is very far away from the null value, and very little power against
the null when the true µ is very close to the null value.
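The steps of Simulation 8 can be sketched in Python as follows. This is not the original spreadsheet: it assumes the population variance is known to be 1 (a z-test), fixes α = 0.05, and uses an arbitrary seed; `rejection_rate` is a hypothetical helper name.

```python
import random
from statistics import NormalDist, fmean


def rejection_rate(mu, n=30, alpha=0.05, reps=1000, rng=None):
    """Share of simulated samples from N(mu, 1) in which H0: mu = 0 is rejected."""
    if rng is None:
        rng = random.Random()
    critical = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    rejections = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, 1.0) for _ in range(n)]
        z_hat = fmean(sample) / (1.0 / n ** 0.5)  # (m - 0) / (sigma / sqrt(n))
        if abs(z_hat) > critical:
            rejections += 1
    return rejections / reps


rng = random.Random(8)  # arbitrary seed, an assumption of this sketch
power_curve = {i / 10: rejection_rate(i / 10, rng=rng) for i in range(-10, 11)}

for mu, rate in power_curve.items():
    print(f"true mean = {mu:+.1f}  rejection rate = {rate:.1%}")
```

At µ = 0 the rejection rate should be close to α (the size of the test), and it should rise toward 100% as the true mean moves away from zero in either direction.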
9
Switching Null and alternative hypothesis
Why must we set one hypothesis as the null and the other as the alternative? We have discussed the argument earlier:
“giving the benefit of doubt to the defendant”. However, in that section, we were not able to see the impact
of switching the null and alternative hypothesis on the conclusion because the execution of hypothesis tests
was not yet discussed. Now we are ready. To see the impact, it is better to consider the following example.
Example 10 (Switching null and alternative): In the past, 15% of the mail order solicitations
for a certain charity resulted in a financial contribution. In past years, the letters were drafted
by a staff member, Mr A. A new solicitation letter has been drafted by a job applicant, Mr B. The letter
is sent to a sample of 200 people and 30 responded with a contribution. At the 0.05 significance
level, can it be concluded that the new letter is more effective? Can we conclude that the job
applicant Mr B is better than Mr A?
Suppose we give the benefit of doubt to the old letter (or Mr A). That is, unless the new letter
performs much better than the old one, we will use the old one.
1. Let π be the rate of the mail order solicitations that resulted in a financial contribution.
State the null and the alternate hypothesis:
H0 : π ≤ 0.15 (new letter is no more effective than the old letter)
H1 : π > 0.15 (new letter is more effective)
2. Identify the test statistic and its distribution:
z = (p − 0.15)/std(p) ∼ N(0, 1), where p is the sample proportion.
3. State the decision rule:
The null hypothesis is rejected if z is greater than 1.65, i.e., Prob(z > 1.65) = 0.05.
4. Make a decision and interpret the results:
$$z = \frac{p - \pi}{\sqrt{\pi(1-\pi)/n}} = \frac{30/200 - 0.15}{\sqrt{0.15(1-0.15)/200}} = 0 < 1.65$$
The null is not rejected.
What if we switch the null and the alternative, and give the benefit of doubt to the new letter
(or Mr B)? That is, unless the new letter performs much worse than the old one, we will use the
new one.
1. State the null and the alternate hypothesis:
H0 : π > 0.15 (new letter is more effective)
H1 : π ≤ 0.15 (new letter is no more effective than the old letter)
2. Identify the test statistic and its distribution:
z = (p − 0.15)/std(p) ∼ N(0, 1), where p is the sample proportion.
3. State the decision rule:
The null hypothesis is rejected if z is smaller than −1.65, i.e., Prob(z < −1.65) = 0.05.
4. Make a decision and interpret the results:
$$z = \frac{p - \pi}{\sqrt{\pi(1-\pi)/n}} = \frac{30/200 - 0.15}{\sqrt{0.15(1-0.15)/200}} = 0 > -1.65$$
Hence the null that the new letter is more effective than the old letter is not rejected.
In this example, if we give the benefit of the doubt to Mr A, we will keep Mr A. If we give the
benefit of doubt to Mr B, we will fire Mr A and hire Mr B. In most organizations, we tend to
keep the existing staff unless the contender is much better.
Note that this example is very extreme: we have p = π but we reach opposite conclusions when we
switch the null and the alternative. In this example, the logical choice of null and alternative is rather
obvious. Unfortunately, in many examples, the choice is not as easy as this one. It takes a lot of practice
to acquire the skill of formulating the null and the alternative.
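The two versions of the test in Example 10 can be put side by side in a few lines. This sketch uses the exact 5% critical value (about 1.645) rather than the rounded 1.65, and the variable names are hypothetical labels for the two choices of null.

```python
from statistics import NormalDist


def proportion_z_stat(successes, n, pi0):
    """z-statistic for a sample proportion, using the variance at pi0."""
    p_hat = successes / n
    return (p_hat - pi0) / (pi0 * (1.0 - pi0) / n) ** 0.5


z_hat = proportion_z_stat(30, 200, 0.15)
critical = NormalDist().inv_cdf(0.95)  # about 1.645

# Benefit of doubt to the old letter: H0: pi <= 0.15, reject for large z.
reject_h0_old_letter_favoured = z_hat > critical
# Benefit of doubt to the new letter: H0: pi > 0.15, reject for small z.
reject_h0_new_letter_favoured = z_hat < -critical

print(f"z = {z_hat:.2f}")
print("reject 'new letter is no better'?", reject_h0_old_letter_favoured)
print("reject 'new letter is better'?   ", reject_h0_new_letter_favoured)
```

Neither null is rejected, so whichever hypothesis is given the benefit of the doubt survives; this is why the choice of null drives the conclusion in this example.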
10
Relation between testing hypothesis and confidence intervals
Suppose we are interested in inference about the population mean µ. Using a sample, we may compute
the sample mean (m) and use it as an estimate of the population mean. We may further construct a
(1 − α) × 100% confidence interval for the population mean, $(m - z_{\alpha/2}\,\sigma_m,\ m + z_{\alpha/2}\,\sigma_m)$. Recall that
a (1 − α) × 100% confidence interval for the population mean means that (1 − α) × 100% of the random
intervals constructed in this way are expected to cover the population mean, and α × 100% of the random
intervals constructed in this way are expected not to cover the population mean.
With the sample mean, we can also test, at the α level of significance, the hypothesis that the population
mean equals a given value µ0, H0 : µ = µ0 versus H1 : µ ≠ µ0. We reject H0 if $(m - \mu_0)/\sigma_m > z_{\alpha/2}$ or
$(m - \mu_0)/\sigma_m < -z_{\alpha/2}$. Equivalently, we reject H0 if $m > \mu_0 + z_{\alpha/2}\,\sigma_m$ or $m < \mu_0 - z_{\alpha/2}\,\sigma_m$. Or, written
in a slightly more complicated way, we reject H0 if $m - z_{\alpha/2}\,\sigma_m > \mu_0$ or $\mu_0 > m + z_{\alpha/2}\,\sigma_m$. That is, we
do not reject the null if µ0 lies in the interval $(m - z_{\alpha/2}\,\sigma_m,\ m + z_{\alpha/2}\,\sigma_m)$.
Theorem 1 (Equivalence of hypothesis test and confidence intervals): Consider the hypothesis
test H0 : µ = µ0 versus H1 : µ ≠ µ0 at the α level of significance. We reject H0 if and only if µ0 does
not lie in the (1 − α) × 100% confidence interval of µ.
Simulation 9 (Relation between testing hypothesis and confidence intervals): We illustrate
that testing hypothesis and CI are equivalent, as stated in the above theorem. We will test the hypothesis H0 :
µ = k versus H1 : µ ≠ k at the α level of significance.
1. Fix µ = 0.
2. Generate a sample of 50 observations according to N(µ, σ²), with σ² = 1.
3. Construct the (1 − α) × 100% confidence interval for µ. Test the hypothesis using the CI.
4. Test the hypothesis using the conventional method at the α level of significance, with α =
0.01, 0.05, 0.10.
5. Repeat the above steps 1000 times. Compute the rejection rate for different
k using both procedures.
Simulated rejection rate of testing H0 : µ = k versus H1 : µ ≠ k at the α level of significance

α       k = 0     k = 0.1   k = 0.3   k = 0.5   k = 0.7
0.10    9.20%     15.20%    67.50%    97.10%    100.00%
0.05    4.70%     8.70%     52.50%    94.00%    99.80%
0.01    1.20%     2.90%     28.80%    83.40%    99.00%
Simulated coverage rate of the (1 − α) × 100% confidence interval covering µ = k

α       k = 0     k = 0.1   k = 0.3   k = 0.5   k = 0.7
0.10    90.80%    84.80%    32.50%    2.90%     0.00%
0.05    95.30%    91.30%    47.50%    6.00%     0.20%
0.01    98.80%    97.10%    71.20%    16.60%    1.00%
Thus, it is obvious that the rejection rate and the coverage rate of the same null value of µ are
related:
rejection rate = 1 − coverage rate.
[Reference: Sim9.xls]
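A small Python version of Simulation 9 is sketched below, under the assumption that σ is known (so the interval is m ± z_{α/2} σ/√n) and with an arbitrary seed. The function name `simulate_test_and_ci` is hypothetical. For each simulated sample it applies both the conventional test and the confidence-interval check, then reports the two rates.

```python
import random
from statistics import NormalDist, fmean


def simulate_test_and_ci(k, alpha, mu=0.0, sigma=1.0, n=50, reps=1000, seed=9):
    """Return (rejection rate of H0: mu = k, coverage rate of the CI for k)."""
    rng = random.Random(seed)  # arbitrary seed, an assumption of this sketch
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    se = sigma / n ** 0.5
    rejections = 0
    covers = 0
    for _ in range(reps):
        m = fmean([rng.gauss(mu, sigma) for _ in range(n)])
        # Conventional two-sided z-test of H0: mu = k.
        if abs(m - k) / se > z_crit:
            rejections += 1
        # Does the (1 - alpha) confidence interval cover k?
        if m - z_crit * se <= k <= m + z_crit * se:
            covers += 1
    return rejections / reps, covers / reps


for alpha in (0.10, 0.05, 0.01):
    for k in (0.0, 0.1, 0.3, 0.5, 0.7):
        rej, cov = simulate_test_and_ci(k, alpha)
        print(f"alpha = {alpha:.2f}  k = {k:.1f}  rejection = {rej:.1%}  coverage = {cov:.1%}")
```

Because each sample is either rejected by the test or covered by the interval, the two rates printed on every line should add up to 100%.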
11
A general procedure of testing hypothesis
Suppose we are interested in testing whether the population parameter θ is equal to k:
H0 : θ = k versus H1 : θ ≠ k.
1. We need to get a sample estimate (q) of the population parameter θ.
2. We know that, in most cases, the test statistic will be of the following form:
$$t = \frac{q - k}{\sigma_q}$$
where $\sigma_q$ is the standard deviation of q under the null. The form of $\sigma_q$ depends on what q is.
3. The sample size and the null at hand determine the distribution of the statistic. If θ is the population mean
and the sample size is larger than 30, t is approximately standard normal.
12
Testing population variance
The general procedure of testing hypothesis suggests that it is essential to know the distribution of the
sample analog of the population parameter of concern. If such distribution is not directly available, we may
try to obtain the distribution of a transformation of the sample analog. This is the case when we test the
population variance.
Consider the case that we are interested in testing $H_0: \sigma^2 = \sigma_0^2$ versus $H_1: \sigma^2 > \sigma_0^2$. To test the
hypothesis, we have to rely on the sample variance as an estimate, i.e., $s^2 = \sum_{i=1}^{n}(x_i - m)^2/(n-1)$. We can show
that $E(s^2) = \sigma^2$ and $Var(s^2) = 2\sigma^4/(n-1)$. Because neither $s^2$ nor $s$ is normally distributed, we cannot
rely on the procedure we discussed earlier. Fortunately, we can show that the following statistic⁴
$$\frac{(n-1)s^2}{\sigma^2}$$
has a χ² (Chi-square) distribution with n − 1 degrees of freedom. Thus we can formulate our testing procedure
as
1. Compute the sample variance, $s^2$.
2. Compute the statistic $\chi^2 = (n-1)s^2/\sigma_0^2$.
3. Reject the null at the α level of significance if $\chi^2 > \chi^2_{n-1,\alpha}$, where $\chi^2_{n-1,\alpha}$ is the critical value such that
$\mathrm{Prob}(\chi^2 > \chi^2_{n-1,\alpha}) = \alpha$.
Example 11 (Example): The time it takes to complete the assembly of an electronic component
is normally distributed with a standard deviation of 4.38 minutes. If we select 20 components
at random, what is the probability that the standard deviation for the time of assembly of these
units is less than 3.0 minutes?
Let $s^2$ denote the sample variance, and hence $s$ denote the sample standard deviation. We are
trying to compute Prob(s < 3.0). But $\mathrm{Prob}(s < 3.0) = \mathrm{Prob}(s^2 < 9.0) = \mathrm{Prob}((n-1)s^2/\sigma^2 < (n-1)\,9.0/\sigma^2)$.
We know that $(n-1)s^2/\sigma^2$ has a Chi-square distribution with n − 1 degrees of freedom. With
$\sigma^2 = 4.38^2 = 19.18$, we just have to find $\mathrm{Prob}(\chi^2 < 19 \times 9.0/19.18) = \mathrm{Prob}(\chi^2 < 8.91) = 1 - 0.975 = 0.025$.
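The probability in Example 11 can be checked by simulation without a Chi-square table. The sketch below is a Monte Carlo approximation, not the table lookup used in the example; the function name `prob_sample_sd_below`, the number of replications and the seed are all assumptions made here.

```python
import random


def prob_sample_sd_below(threshold, sigma, n, reps=20000, seed=11):
    """Monte Carlo estimate of Prob(s < threshold) for normal samples of size n."""
    rng = random.Random(seed)  # arbitrary seed
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        m = sum(sample) / n
        s2 = sum((x - m) ** 2 for x in sample) / (n - 1)  # sample variance
        if s2 < threshold ** 2:
            hits += 1
    return hits / reps


n, sigma, threshold = 20, 4.38, 3.0
chi2_bound = (n - 1) * threshold ** 2 / sigma ** 2
estimate = prob_sample_sd_below(threshold, sigma, n)
print(f"(n - 1) * 9.0 / sigma^2 = {chi2_bound:.2f}")
print(f"simulated Prob(s < 3.0) = {estimate:.4f}")
```

The simulated probability should land near 0.025, in line with the Chi-square calculation, up to simulation noise.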
4 Suppose $x_1, x_2, \ldots, x_n$ are independent standard normal random variables. Then
$$\sum_{i=1}^{n} x_i^2 \sim \chi^2(n),$$
a Chi-square distribution with n degrees of freedom.
References
[1] Efron, Bradley, and Robert J. Tibshirani (1993): An Introduction to the Bootstrap, Chapman & Hall.
[2] Liu, Tung, and Courtenay C. Stone (1999): “A Critique of One-Tailed Hypothesis Test Procedures in
Business and Economics Statistics Textbooks,” Journal of Economic Education, 30(1): 59-63.
Problem sets
We have tried to include some of the most important examples in the text. To get a good understanding of
the concepts, it is most useful to re-do the examples and simulations in the text. Work on the following
problems only if you need extra practice or if your instructor assigns them as an assignment. Of course, the
more you work on these problems, the more you learn.
[To be written]