Chapter 9: TESTS OF HYPOTHESES FOR A SINGLE SAMPLE

Part 1: Intro to Hypothesis Testing
Sections 9-1, 9-2, 9-3
Statistical Inference
We infer something about the population as a
whole from the information in a sample.
[Figure: a sample and the population it is drawn from]
- Point estimation ✓
- Confidence intervals ✓
- Hypothesis testing (introduced in Chapter 9)
Hypothesis Testing
We’ll start with an illustration...
• Example: Reduction of car emissions
A certain automobile engine emits 100 mg
of nitrogen oxides per second on average. A
modification to the engine has been proposed
that may reduce the emissions.
The new design will be put into production
IF it can be demonstrated that its mean emission rate is less than 100 mg/s.
To make a decision, a random sample of
n = 50 modified engines is taken and
emission measurements are recorded.
The sample mean is x̄ = 92 mg/s and the
sample standard deviation is s = 21 mg/s.
A normal probability plot suggests emissions
follow a normal distribution.
Isn’t 92 far enough below 100 for us to say
the modified engine is better?
Is there enough evidence to completely change
the manufacturing line and switch which
engine is produced?
STATISTICAL QUESTION:
Could we have gotten a sample mean emission x̄ this low
even if the modified engine
WASN’T any better than the first (i.e. its
population mean was actually 100)?
Could we have grabbed a sample that happened to have many low emission values even though the population mean was 100?
To make a decision on the engines, we want
to quantify the above question with a probability:
“Given that the true population mean emission is 100 mg/s, what is the probability
of observing a sample mean emission x̄ this low or
lower?”
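The question above can also be explored by simulation. The sketch below is only an illustration: it assumes emissions are normally distributed and, as a simplification not made in the slides, treats the sample standard deviation s = 21 as if it were the true σ. It repeatedly draws samples of n = 50 from a population with mean 100 and counts how often the sample mean comes out at 92 or lower. The seed and the number of repetitions are arbitrary choices.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

MU_NULL = 100.0        # population mean if the modification does nothing
SIGMA_ASSUMED = 21.0   # assumption: s = 21 used as a stand-in for sigma
N = 50                 # sample size from the example
X_BAR_OBS = 92.0       # observed sample mean
REPS = 10_000          # number of simulated samples (arbitrary choice)

def simulated_sample_mean():
    """Mean of one simulated sample of N engines drawn under mu = 100."""
    return statistics.fmean(
        random.gauss(MU_NULL, SIGMA_ASSUMED) for _ in range(N)
    )

# Count the simulated samples whose mean is as low as the observed one
low_count = sum(simulated_sample_mean() <= X_BAR_OBS for _ in range(REPS))
fraction_low = low_count / REPS
print(f"Fraction of samples with mean <= {X_BAR_OBS}: {fraction_low:.4f}")
```

Under these assumptions only a small fraction of the simulated samples (well under 1%) should have a mean this low, which is the intuition that the probability calculation makes precise.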
Recall from the last chapter:
If we assume µ = 100 and n is large, we have
$\bar{X} \sim N(100, \sigma^2/n)$.
This is a known behavior of the sample mean.
Probability of interest:
Given µ = 100 (engine not any better),
$P(\bar{X} \le 92) = ?$
Since σ² is unknown in this case, we have
$T = \frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1}$
where S is the sample standard deviation
and T has a t distribution with n − 1 degrees
of freedom (and n = 50 in this example).
$P(\bar{X} \le 92) = P\left( \frac{\bar{X} - \mu}{S/\sqrt{n}} \le \frac{92 - 100}{21/\sqrt{50}} \right) = P(T \le -2.69) = 0.0049$
because $T \sim t_{49}$.
[Figure: density of the t distribution with 49 df; horizontal axis T]
NOT VERY LIKELY...
The probability of observing a sample mean emission x̄
this low or lower, given that the true population mean is 100 mg/s, is
0.0049.
This suggests that our initial assumption in
the calculation, that the true mean was 100,
is perhaps incorrect.
For this reason, we reject the assumption of
µ = 100 in favor of the ‘alternative’, that
the true mean emissions IS LESS THAN 100
mg/s.
We don’t know FOR SURE, but there’s strong
evidence against someone saying that the mean
of the modified engine is 100 mg/s.
If it were 100 mg/s, we would very rarely see
an x̄ this low (could happen, but not likely).
What’s unlikely enough to actually reject
the initial assumption (that the two engine
models were equal)?
There’s some opinion here, but we often use
0.05 as a threshold. Anything less than this
is considered rather unlikely.
————————————————————
We have essentially just performed a hypothesis test. Now we will formalize the procedure...
• General set-up for testing a
hypothesis for µ
1. State your null H0 and alternative H1
hypotheses.
(The null is what we assume to be true.)
H0 : µ = µ0
(The subscript on µ0 is used to emphasize
that this value is the assumed mean under
the null hypothesis being true.)
There are 3 choices for the alternative,
either...
* H1 : µ ≠ µ0 (two-sided alternative)
* H1 : µ < µ0 (one-sided alternative)
* H1 : µ > µ0 (one-sided alternative)
2. Calculate the test statistic (either a Z or T).
(In the emissions example, the test statistic was a
T; we’ll draw a conclusion based on it.)
3. Compute the probability of observing a test
statistic this extreme, or more extreme,
under the null being true.
(This probability is called a p-value.)
4. State your conclusion with respect to the
problem:
Either...
‘Reject the null’
or
‘Fail to reject the null’.
5. Be sure to verify any assumptions that were
needed.
(This is usually a normal probability plot
to verify normality, which is needed to
have $T \sim t_{n-1}$.)
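For reference, the probability in step 3 depends on which alternative was chosen in step 1. Writing $t_0$ for the observed value of the T statistic (notation assumed here), the standard p-value definitions are as follows; for a Z statistic, replace T with Z.

```latex
\begin{align*}
H_1: \mu < \mu_0 &\quad\Rightarrow\quad \text{p-value} = P(T \le t_0) \\
H_1: \mu > \mu_0 &\quad\Rightarrow\quad \text{p-value} = P(T \ge t_0) \\
H_1: \mu \ne \mu_0 &\quad\Rightarrow\quad \text{p-value} = 2\,P(T \ge |t_0|)
\end{align*}
```

In each case the probability is computed under H0, where $T \sim t_{n-1}$.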
• Example: Formalizing the emissions
hypothesis test
1. State your null H0 and alternative H1
hypotheses.
H0 : µ = 100
H1 : µ < 100
(this is a one-sided
hypothesis test with
µ0 = 100)
2. Calculate the observed test statistic.
$t_0 = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{92 - 100}{21/\sqrt{50}} = -2.69$
(The subscript on t0 is used to emphasize
the fact that we’re assuming the mean to
be µ0.)
3. Compute the probability of observing a
test statistic this extreme, or more extreme,
under the null being true (i.e. compute the
p-value).
Under H0 true, $T_0 = \frac{\bar{X} - \mu_0}{S/\sqrt{n}} \sim t_{49}$, and
$P(T_0 \le -2.69) = 0.0049$
[Figure: density of the t distribution with 49 df; horizontal axis T]
Thus, because this is a one-sided hypothesis test, the p-value = 0.0049.
p-value=0.0049...
“If the true mean is really µ = 100, then
the probability of observing a sample mean
(from a sample of size n = 50) this far below 100 (or even farther) is only 0.0049.”
4. State your conclusion for the hypothesis
test:
Using 0.05 (or 5/100) as a threshold for ‘unlikeliness’, we have
p-value = 0.0049 < 0.05
and we reject the null in favor of the
alternative, which is that µ < 100.
5. Be sure to verify any assumptions that
were needed.
As stated earlier, we checked the normal
probability plot of the emission values and
it was OK, and the needed requirement for
T0 ∼ t49 (that the parent population was
normally distributed) was fulfilled.
When we reject H0, we say the test was significant.
For this example, we say there was significant
statistical evidence that the modified engine
has a mean emission rate lower than 100 mg/s.
So, there was strong evidence that the modified engine is better.
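Steps 2 to 4 of the emissions test can be reproduced numerically. The following is a minimal sketch using only the Python standard library; because the standard library has no t distribution, the helper functions `t_pdf` and `t_cdf` are illustrative names introduced here, and the tail area is approximated with Simpson's rule rather than taken from a table.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=2000):
    """Approximate P(T <= x): Simpson's rule on [0, |x|] plus symmetry about 0."""
    a = abs(x)
    if a == 0.0:
        return 0.5
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # approximates P(0 <= T <= |x|)
    return 0.5 + area if x > 0 else 0.5 - area

# Values from the emissions example
x_bar, s, n = 92.0, 21.0, 50
mu_0 = 100.0
alpha = 0.05

t_0 = (x_bar - mu_0) / (s / math.sqrt(n))  # step 2: observed test statistic
p_value = t_cdf(t_0, df=n - 1)             # step 3: lower tail, since H1 is mu < mu_0
reject_null = p_value < alpha              # step 4: decision at level alpha

print(f"t0 = {t_0:.2f}, p-value = {p_value:.4f}, reject H0: {reject_null}")
```

Because `t_0` is not rounded to −2.69 before the tail area is computed, the printed p-value may differ from the 0.0049 above in the last digit; the decision to reject H0 is unaffected.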
• Why do we use this test statistic T0 to test
H0 : µ = µ0?
$T_0 = \frac{\bar{X} - \mu_0}{S/\sqrt{n}}$
Let’s pick apart this statistic...
– Under H0 true, E(X̄) = µ0 and the expected value of the numerator in T0 is 0,
and the distribution of T0 is unimodal and centered at zero.
– If X̄ is far from µ0 in either direction, the
numerator in T0 will be ‘large’ (+ or −),
leading to a ‘large’ T0, leading to rejection
of H0.
A ‘large’ or ‘extreme’ T0 would not be expected if H0 were true (we expect T0 to
‘bounce around’ 0 if H0 is true).
– But what is a ‘large’ difference X̄ − µ0?
This is where the denominator comes into
play. ‘Large’ is based on our sample size
and the variability in the population σ²
(which shows up in S).
For one thing, scale matters. A ‘large’ difference in X̄ − µ0 on a nanoscale will probably not be the same as a large difference
in kilometers (S will make this adjustment
here).
We also know that the expected squared
distance of X̄ from µ goes down as n increases. This also has to be taken into
account for deciding what is ‘large’.
Bottom line... if we observe a realized t0
value that is in the far tail of the T0 distribution, it suggests we should reject H0.
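The role of the denominator can be seen by computing the statistic for a few inputs. In the sketch below, `t_statistic` is an illustrative helper name, the first call uses the emissions numbers, and the remaining inputs are hypothetical variations chosen only to show the effect of units, sample size, and variability.

```python
import math

def t_statistic(x_bar, mu_0, s, n):
    """Observed one-sample t statistic: (x_bar - mu_0) / (s / sqrt(n))."""
    return (x_bar - mu_0) / (s / math.sqrt(n))

# Emissions example: difference of -8 mg/s, s = 21, n = 50
base = t_statistic(92.0, 100.0, 21.0, 50)

# Same data expressed in micrograms/s (every value multiplied by 1000):
# the numerator and s both scale, so the statistic is unchanged.
rescaled = t_statistic(92_000.0, 100_000.0, 21_000.0, 50)

# Hypothetical variations (not from the slides): the same -8 difference,
# but with a smaller sample or a more variable population.
small_n = t_statistic(92.0, 100.0, 21.0, 10)
noisy = t_statistic(92.0, 100.0, 63.0, 50)

for label, value in [("base", base), ("rescaled units", rescaled),
                     ("n = 10", small_n), ("s = 63", noisy)]:
    print(f"{label:>15}: t0 = {value:.2f}")
```

The same raw difference of 8 gives a less extreme statistic when n is smaller or s is larger, while a change of units leaves it untouched.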
Some comments on terminology...
• The Null Hypothesis:
– It is what we assume to be true upon entering the hypothesis test.
In many formal arguments, we often assume something to be true, and then see
if we can contradict this assumption
later.
We’re not looking to prove something
here, but we may find that the data were
not very likely to have occurred under the
null being true, which was the assumption
we made (in which case we reject the null).
– Often, the null is the less interesting statement to the researcher.
– Innocent until proven guilty.
We’re being cautious, we’re giving the
status-quo the benefit of the doubt.
– The situation is assumed uninteresting
until evidence can show (beyond reasonable doubt) that something interesting is
going on.
– Symbolized by H0.
– It is a statement about a population parameter, not a statistic.
– Example: the modified engine data,
H0 : µ = 100
• P-value:
– The p-value represents the probability of
obtaining a test statistic as extreme as (or
more extreme than) the observed test statistic, assuming H0 is true.
– If you perform a two-sided hypothesis test
H0 : µ = µ0 vs. H1 : µ ≠ µ0,
the p-value is the probability in both tails
(see the juice-can example at the end of this part).
– Large test statistic (in absolute value) ⇔
small p-value
– Small p-values are evidence against the
null hypothesis (as are large test statistics)
– When we make a decision to reject H0 it
is because the p-value is small
– A small p-value says we would have been
very unlikely to have gotten a sample with
data like this if H0 were true
– The p-value is not the probability that H0
is true
– We use the calculated p-value to make a
conclusion or decision on the hypothesis
test based on a chosen significance level α
(on next slide):
∗ Reject the null hypothesis
∗ Fail to reject the null hypothesis
(i.e. accept the null hypothesis)
– We do not prove the null hypothesis true;
that is not how the procedure is set up. We
assume it to be true right from the start
of the procedure.
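The link between a large test statistic and a small p-value can be tabulated directly. As a simplification, the sketch below uses the standard normal distribution (a Z statistic) in place of a t distribution, since it is available in the Python standard library; `two_sided_p` is an illustrative helper name.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal, used here in place of a t distribution

def two_sided_p(z):
    """Two-sided p-value for an observed Z statistic: probability in both tails."""
    return 2 * std_normal.cdf(-abs(z))

# The p-value shrinks as the magnitude of the statistic grows
for z in [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]:
    print(f"|z0| = {z:.1f}  ->  two-sided p-value = {two_sided_p(z):.4f}")
```

Only the magnitude of the statistic matters for the two-sided p-value, so `two_sided_p(-2.0)` and `two_sided_p(2.0)` give the same result.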
• The significance level α:
– How low must a p-value be to reject the
null?
– We set a threshold that will control our
chance of making a particular mistake.
What mistake?
REJECTING H0 WHEN H0 IS
ACTUALLY TRUE.
This is called a type I error.
This is often seen as a big mistake.
In the emissions example, the company
would completely re-do their engine manufacturing set-up if they reject. This would
be a big waste if the modified engine actually wasn’t any better.
– We set the chance of such a mistake to be
α which is often set at 0.05 (though 0.01
and others are also seen).
We simply accept a 5% chance that we
make a type I error. For most situations,
this chance of a mistake is considered low
enough.
– By only rejecting when the p-value is less
than α we control the type I error at the
α level.
α = P (type I error)
= P (reject H0 when H0 is true)
= P (reject H0|H0 is true)
= P (a false positive occurring)
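The statement α = P(reject H0 | H0 is true) can be checked by simulation. The sketch below is an illustration under stated assumptions: every data set is generated with H0 true (µ = 100), the data are normal, and σ = 21 is treated as known so that a Z statistic with a standard normal reference distribution can be used. The seed and the number of repetitions are arbitrary.

```python
import math
import random
from statistics import NormalDist, fmean

random.seed(2)  # fixed seed so the run is repeatable

ALPHA = 0.05
MU_0 = 100.0   # H0 is true in every simulated data set
SIGMA = 21.0   # treated as known here (assumption for this sketch)
N = 50
REPS = 5_000   # number of simulated experiments (arbitrary choice)

std_normal = NormalDist()

def lower_tail_p_value(sample):
    """p-value of the Z test of H0: mu = MU_0 versus H1: mu < MU_0."""
    z_0 = (fmean(sample) - MU_0) / (SIGMA / math.sqrt(N))
    return std_normal.cdf(z_0)

rejections = 0
for _ in range(REPS):
    sample = [random.gauss(MU_0, SIGMA) for _ in range(N)]
    if lower_tail_p_value(sample) < ALPHA:
        rejections += 1  # a type I error: H0 rejected although it is true

type_1_rate = rejections / REPS
print(f"Estimated P(type I error) = {type_1_rate:.3f} (alpha = {ALPHA})")
```

The estimated rejection rate should land close to α, which is what "controlling the type I error at the α level" means in practice.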
• Example: An example where σ² is known
or you have a very large sample
If σ² is known, or you have a very large
sample, the test statistic will be the
Z test statistic, instead of the T.
An inspector measured the fill volume of a
simple random sample of n = 100 cans of
juice that were labeled as containing 12 oz.
The sample had a mean volume of 11.98 oz
and a standard deviation of 0.19 oz.
Let µ represent the mean fill volume for all
cans of juice recently filled by the machine.
Perform a hypothesis test that µ = 12 versus
µ ≠ 12 at the α = 0.05 significance level.
ANS:
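One way to carry out the requested test is sketched below. It assumes, following the set-up of this example, that with n = 100 the sample standard deviation s = 0.19 can stand in for σ and that the Z statistic is approximately standard normal under H0.

```python
import math
from statistics import NormalDist

# Values from the juice-can example
x_bar, s, n = 11.98, 0.19, 100
mu_0 = 12.0
alpha = 0.05

# Assumption: with n = 100, s is used in place of sigma and Z is treated as N(0, 1)
z_0 = (x_bar - mu_0) / (s / math.sqrt(n))

# Two-sided alternative: add up the probability in both tails
p_value = 2 * NormalDist().cdf(-abs(z_0))
reject_null = p_value < alpha

print(f"z0 = {z_0:.2f}, p-value = {p_value:.3f}, reject H0: {reject_null}")
```

Under these assumptions the p-value is well above 0.05, so the sketch fails to reject H0: the sample does not give significant evidence that the mean fill volume differs from 12 oz.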