3.3 Statistical Inference with one sample from a population

3.3.1 Introduction
A hypothesis is a theory that has neither been proven nor
disproven.
A statistical test will never prove nor disprove a hypothesis with
100% certainty.
Statistical Hypotheses
A statistical hypothesis is one that can be tested using a random
sample or samples.
It relates to a parameter of the population, commonly the
population mean or proportion e.g.
a) 30% of the adult population smoke.
b) On average, Americans are heavier than Japanese.
In a statistical test we test between two opposing hypotheses, the
null hypothesis H0 and the alternative HA (sometimes referred to
as H1 ).
The null hypothesis always contains an equality e.g. in case a) we
test H0 : p = 0.3.
In case b), the hypothesis in words states that two population
means (say µX : mean mass of Americans and µY : mean mass of
Japanese) are unequal.
This cannot be the null hypothesis. The null hypothesis in this
case is H0 : µX = µY i.e. on average Americans and Japanese
weigh the same.
One-tailed and two-tailed tests
The alternative HA can be either directional (one-tailed) or
non-directional (two-tailed).
Two-tailed alternatives simply state that there is a difference.
One-tailed alternatives state what type of difference there is.
e.g. in case a) HA : p ≠ 0.3 would be the appropriate two-tailed
alternative hypothesis i.e. the proportion of smokers in the
population simply differs from 30%.
In case b) the appropriate one-tailed alternative hypothesis would
be HA : µX > µY i.e. on average Americans weigh more than
Japanese.
We use two-tailed alternatives when the initial hypothesis (written
in words) i) does not state what sort of difference is expected and
ii) the source of the hypothesis is unknown or can be assumed to
be unbiased.
This is true in case a), hence we use a two-tailed alternative.
This is not true in case b): the hypothesis given in words states
that on average Americans weigh more than Japanese.
Hence, the alternative in this case is one-tailed i.e. HA : µX > µY .
Suppose the hypothesis is given by a source that is likely to be biased,
for example, a producer's hypothesis regarding one of his products.
The alternative should state that the product is worse than the
producer states, e.g. if a car producer states that his car consumes
µ0 litres/100km, then the alternative should be that the car
consumes more petrol, i.e. HA : µ > µ0 .
General procedure of statistical testing
Before the test is carried out, it is assumed that the null hypothesis
(H0 ) is correct.
The test is based on a sample of observations. If the sample is
close to what we would expect under H0 , then we DO NOT
REJECT H0 .
This does not indicate that H0 is true, but it is a reasonable
hypothesis given our data.
If the sample is far away from what we would expect under H0 ,
then we conclude in favour of HA , i.e. WE REJECT H0 .
In this case it is likely that H0 is false, but we can never be 100%
sure of this.
3.3.2 Types of Error
A type I error is committed when a true null hypothesis is rejected.
A type II error is committed when a false null hypothesis is not
rejected.
Suppose a drug company is testing whether a new drug is more
effective than the presently used drug.
The null hypothesis here is that the two drugs are as effective as
each other. The alternative is that the new drug is better than the
old drug.
Types of statistical error
A type I error occurs when the test indicates that the new drug is
better, but in fact it is not (H0 is incorrectly rejected).
This leads to costs, since such a conclusion implies the
introduction of a new drug which is no better (and possibly worse)
than the old drug.
A type II error occurs when the test indicates that the new drug is
not better, but in fact it is (H0 should be rejected, but is not).
This leads to costs, since such a conclusion implies that an
improved drug is not introduced to the market.
The probability of a type I error is the significance level of a test
and denoted by α.
The significance level is defined by the person carrying out the test.
Clearly, the significance level of a test should be small (commonly
≤ 5% ).
The stronger the evidence required to reject H0 , the lower the
significance level (e.g. when the costs of rejecting H0 are high, we
wish to avoid wrongly rejecting H0 and so we reduce α).
However, the more we try to avoid type I errors (as the significance
level is decreased), the more likely type II errors become.
The probability of a type II error is denoted by β.
The power of a test is defined to be 1-β.
This is the probability of correctly rejecting a false null hypothesis.
The power of a test cannot be measured in practice, since it
depends on the (unknown) value of the parameter which is the
subject of the test.
The power of a test
For example, consider the test of the hypothesis that 30% of the
adult population smoke.
H0 : p = 0.3 versus H1 : p ≠ 0.3.
If in reality p = 0.5, the power of this test will be greater than if
p = 0.4 (the further H0 is from reality, the more likely it is that we
reject it).
For a fixed significance level, the power of a test increases as the
sample size increases.
Ideally, we would like a test to have a low significance level and
high power (i.e. the likelihood of either type of error is small).
This can only be achieved if we have a large sample.
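The dependence of power on the true proportion and on the sample size can be illustrated numerically. The following is a minimal sketch, assuming the large-sample normal approximation for the sample proportion. The function name `power_two_sided_proportion` and the sample sizes 50 and 200 are illustrative choices, not part of these notes.

```python
from math import sqrt
from statistics import NormalDist


def power_two_sided_proportion(p_true, p0, n, alpha):
    """Approximate power of the two-sided test of H0: p = p0
    when the true population proportion is p_true."""
    z = NormalDist()                          # standard normal distribution
    crit = z.inv_cdf(1 - alpha / 2)           # critical value Z_{alpha/2}
    se0 = sqrt(p0 * (1 - p0) / n)             # standard error assumed under H0
    se1 = sqrt(p_true * (1 - p_true) / n)     # standard error under the true p
    # H0 is rejected when the sample proportion falls outside p0 +/- crit * se0.
    upper = (p0 + crit * se0 - p_true) / se1
    lower = (p0 - crit * se0 - p_true) / se1
    return (1 - z.cdf(upper)) + z.cdf(lower)


for p_true in (0.4, 0.5):
    for n in (50, 200):
        power = power_two_sided_proportion(p_true, 0.3, n, 0.05)
        print(f"true p = {p_true}, n = {n}: power = {power:.3f}")
```

When the true proportion equals p0, the function returns α, since rejecting H0 is then a type I error.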
3.3.3 The process of hypothesis testing
The procedure is as follows
1. State H0 and HA .
2. Choose the appropriate test statistic, T . This can be
thought of as a measure of distance from H0 . If e.g.
H0 : p = 0.3 is true, then we expect that close to
30% of our sample will smoke (there will be some
random variation around this population proportion).
3. Calculate the realisation of the test statistic, t, based
on the sample.
4. Either a) calculate the p-value of the test (this is a
measure of the "credibility" of a null hypothesis.
Statistical packages give this value). If the p-value of
the test is less than the significance level then we
reject H0 , or
b) determine the appropriate critical value (this is a
critical "distance" from H0 ). If the realisation of the
test statistic exceeds this critical value then we reject
H0 .
5. Based on the p-value of the test (or the critical value
and realisation of the test statistic) state your
conclusion in words.
3.3.4 Testing hypotheses for a population mean (σ known
or n > 30)
The null hypothesis is H0 : µ = µ0 (µ0 given).
The test statistic is
$$Z = \frac{\bar{X} - \mu_0}{\mathrm{S.E.}(\bar{X})}.$$
Given that the null hypothesis is true, then this statistic has
approximately a standard normal distribution (independently of the
distribution that the observations come from).
Note that when the sample mean is close to µ0 , then the realisation
of the test statistic is close to 0. In this case, we do not reject H0 .
Realisations of the test statistic far from 0 correspond to the
sample mean being "significantly different" from µ0 . In this case,
we should (in general) reject H0 .
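If the population variance σ² (or standard deviation σ) is known,
we use
$$\mathrm{S.E.}(\bar{X}) = \frac{\sigma}{\sqrt{n}}.$$
When σ is unknown, we use
$$\mathrm{S.E.}(\bar{X}) \approx \frac{s}{\sqrt{n}},$$
where s is the sample standard deviation.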
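The standard error and the test statistic are simple to compute. The following is a minimal sketch. The function names and the input numbers (sample mean 10.4, µ0 = 10, standard deviation 2, n = 64) are hypothetical and serve only as an illustration.

```python
from math import sqrt


def standard_error(sd, n):
    """Standard error of the sample mean: sd / sqrt(n).
    sd is sigma if known, otherwise the sample standard deviation s."""
    return sd / sqrt(n)


def z_statistic(sample_mean, mu0, sd, n):
    """Realisation of Z = (sample mean - mu0) / S.E.(sample mean)."""
    return (sample_mean - mu0) / standard_error(sd, n)


# Hypothetical data: S.E. = 2 / 8 = 0.25, so the statistic is 0.4 / 0.25.
print(z_statistic(10.4, 10.0, 2.0, 64))
```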
Calculation of the p-value for two-sided tests
Suppose the alternative is non-directional i.e. HA : µ ≠ µ0 .
The p-value of the test is given by p = P(|Z | > |t|) = 2P(Z > |t|),
where t is the realisation of the test statistic.
The p-value is the probability that given H0 is true a randomly
chosen sample favours the alternative more than the sample
observed.
Low values of the p-value indicate that H0 should be rejected.
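In place of normal tables, the two-sided p-value can be computed directly. The following is a minimal sketch using `NormalDist` from the Python standard library. The function name `two_sided_p_value` is an illustrative choice.

```python
from statistics import NormalDist


def two_sided_p_value(t):
    """p = P(|Z| > |t|) = 2 P(Z > |t|) for a standard normal Z."""
    return 2 * (1 - NormalDist().cdf(abs(t)))


# The sign of the realisation does not matter for a two-sided test.
print(two_sided_p_value(-2.5))
print(two_sided_p_value(2.5))
```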
Interpretation of the p-value
p > 0.05 indicates that there is no evidence against H0 (do not
reject at the 5% level).
0.01 < p < 0.05 indicates that there is evidence against H0 (reject
at the 5% level but not at the 1% level).
0.001 < p < 0.01 indicates that there is strong evidence against
H0 (reject at the 1% level but not at the 0.1% level).
p < 0.001 indicates that there is very strong evidence against H0
(reject at the 0.1% level).
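These bands can be written as a small lookup. The following is a sketch only. The function name is hypothetical, and the treatment of a p-value lying exactly on a boundary (assigned here to the weaker category) is an assumption, since the bands above use strict inequalities.

```python
def evidence_against_h0(p):
    """Verbal interpretation of a p-value using the bands given above.
    Assumption: a p-value exactly on a boundary gets the weaker label."""
    if p < 0.001:
        return "very strong evidence"
    if p < 0.01:
        return "strong evidence"
    if p < 0.05:
        return "evidence"
    return "no evidence"


for p in (0.2, 0.03, 0.004, 0.0002):
    print(p, "->", evidence_against_h0(p))
```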
The critical value of such a test
The critical value of such a test at a significance level of α is
Zα/2 = t∞,α/2 .
We reject H0 if and only if |t| > Zα/2 = t∞,α/2 .
It should be noted that if |t| > Zα/2 = t∞,α/2 then the p-value is
less than α.
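The agreement between the critical-value rule and the p-value rule can be checked numerically. The following is a minimal sketch with illustrative function names. The critical value is obtained from the inverse of the standard normal distribution function rather than from a table.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def critical_value_two_sided(alpha):
    """Z_{alpha/2}: the value exceeded by Z with probability alpha/2."""
    return Z.inv_cdf(1 - alpha / 2)


def reject_by_critical_value(t, alpha):
    return abs(t) > critical_value_two_sided(alpha)


def reject_by_p_value(t, alpha):
    return 2 * (1 - Z.cdf(abs(t))) < alpha


# Both rules should give the same decision for each realisation t.
for t in (-3.0, -2.5, -1.0, 0.5, 2.0, 2.7):
    print(t, reject_by_critical_value(t, 0.01), reject_by_p_value(t, 0.01))
```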
Example 3.3.1
The average weight of a sample of 100 students is 72kg with a
standard deviation of 12kg.
Test the hypothesis that on average students weigh 75kg at a
significance level of 1%.
i) First, we state the hypotheses
H0 : µ = 75;
HA : µ ≠ 75.
ii) Second, we choose the appropriate test statistic. For a
hypothesis regarding the population mean with one large sample,
we use
$$Z = \frac{\bar{X} - \mu_0}{\mathrm{S.E.}(\bar{X})}.$$
We use the approximation
$$\mathrm{S.E.}(\bar{X}) \approx \frac{s}{\sqrt{n}} = \frac{12}{\sqrt{100}} = 1.2.$$
iii) We calculate the realisation of the test statistic
$$t = \frac{72 - 75}{1.2} = -2.5.$$
iv) We can calculate the p-value of the test
p = 2P(Z > |t|) = 2P(Z > 2.5) = 2 × 0.00621 = 0.01242
v) Based on this, we can state our conclusion.
Since p > α = 0.01, we do not reject H0 at a significance level of
1%.
Hence, we do not reject the hypothesis that the average weight of
students is 75kg.
Instead of calculating the p-value, we can base our conclusion on
the appropriate critical value.
iv) The critical value for a non-directional test is
Zα/2 = t∞,α/2 = t∞,0.005 = 2.576.
v) Based on this, we make our conclusion. Since |t| = 2.5 < 2.576,
we do not reject H0 .
Hence, we do not reject the hypothesis that the average weight of
students is 75kg.
Duality between confidence intervals and two-sided tests
Result
Suppose we are testing H0 : µ = µ0 against HA : µ ≠ µ0 . We
should reject H0 at a significance level of 100α% if and only if µ0
does not belong to the 100(1 − α)% confidence interval for the
population mean.
e.g. we reject H0 at a significance level of 5% if and only if µ0 does
not belong to the 95% confidence interval for the population mean.
"Confidence level + Significance level" = 100%.
Intuition: The values in the confidence interval are credible values
of the population mean at the appropriate significance level.
Hence, we can carry out the test H0 : µ = µ0 against HA : µ ≠ µ0
by calculating the appropriate confidence interval and basing our
conclusion on the confidence interval.
Example 3.3.2
The average weight of a sample of 100 students was 72kg with a
standard deviation of 12kg.
Test the hypothesis that on average students weigh 75kg at a
significance level of 1%.
Since the significance level is 1%, we calculate a 99% confidence
interval for the population mean.
Since we have a large sample, the confidence interval is given by
$$\bar{X} \pm t_{\infty,\alpha/2}\,\mathrm{S.E.}(\bar{X}) \approx \bar{X} \pm \frac{s\,t_{\infty,\alpha/2}}{\sqrt{n}}.$$
We have
α = 0.01, t∞,α/2 = t∞,0.005 = 2.576.
The 99% confidence interval is given by
$$72 \pm \frac{12 \times 2.576}{\sqrt{100}} = 72 \pm 3.1 = [68.9, 75.1].$$
Since 75 belongs to this confidence interval, we do not reject H0
(the hypothesis that the average weight of all students is 75kg).
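The interval calculation can be reproduced as follows. This is a minimal sketch with illustrative function names. It uses the unrounded normal quantile, so the end points may differ slightly from values obtained with the rounded table entry 2.576.

```python
from math import sqrt
from statistics import NormalDist


def mean_confidence_interval(sample_mean, sd, n, alpha):
    """Large-sample 100(1 - alpha)% confidence interval for the mean."""
    crit = NormalDist().inv_cdf(1 - alpha / 2)   # t_{infinity, alpha/2}
    half_width = crit * sd / sqrt(n)
    return sample_mean - half_width, sample_mean + half_width


def reject_by_interval(mu0, interval):
    """Duality: reject H0: mu = mu0 iff mu0 lies outside the interval."""
    low, high = interval
    return not (low <= mu0 <= high)


interval = mean_confidence_interval(72, 12, 100, 0.01)
print(interval)
print(reject_by_interval(75, interval))
```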
Use of the duality theorem
Note that using duality we can test a number of different
hypotheses at a given significance level.
e.g. in this case, we would not reject the null hypothesis that the
average weight of students is 70kg at a significance level of 1%,
since 70 also belongs to the 99% confidence interval.
By calculating the p-value (or realisation of the test statistic), we
can test a particular null hypothesis at various significance levels.
i.e. we can give more precise information on the weight of evidence
against a null hypothesis.
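A single p-value settles the decision at every significance level at once. The following is a minimal sketch, with an illustrative function name and the realisation t = −2.5 of Example 3.3.1 as input.

```python
from statistics import NormalDist


def decisions_at_levels(t, levels=(0.05, 0.01, 0.001)):
    """Two-sided p-value for realisation t, and whether H0 is rejected
    at each of the given significance levels."""
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return p, {alpha: p < alpha for alpha in levels}


p, rejected = decisions_at_levels(-2.5)
print(p)
for alpha, reject in rejected.items():
    print(f"alpha = {alpha}: {'reject H0' if reject else 'do not reject H0'}")
```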
Right-sided tests
We consider two types of one-sided tests. The first are right-sided tests:
H0 : µ = µ0 ;
HA : µ > µ0 .
In this case, we reject H0 if the sample mean is "significantly
greater" than µ0 .
The p-value is given by p = P(Z > t).
Note, large positive realisations of the test statistic (associated
with small p-values) occur when the sample mean is significantly
greater than the hypothetical population mean µ0 .
As before, the null hypothesis is rejected if p < α.
The critical value for a right-sided test is Zα = t∞,α .
We reject H0 if t > Zα = t∞,α .
Left-sided tests
In this case we test between
H0 : µ = µ0 ;
HA : µ < µ0 .
In this case, we reject H0 if the sample mean is "significantly
lower" than µ0 .
The p-value is given by p = P(Z < t).
Note, large negative realisations of the test statistic (associated
with small p-values) occur when the sample mean is significantly
lower than the hypothetical population mean µ0 .
As before, the null hypothesis is rejected if p < α.
The critical value is given by −Zα = −t∞,α . We reject H0 if
t < −Zα = −t∞,α .
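The p-values and rejection rules for both kinds of one-sided test can be sketched as follows. The function names and the string labels "right" and "left" are illustrative choices, and t = −2 is used purely as a sample realisation.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def one_sided_p_value(t, tail):
    """p = P(Z > t) for a right-sided test, P(Z < t) for a left-sided test."""
    if tail == "right":
        return 1 - Z.cdf(t)
    if tail == "left":
        return Z.cdf(t)
    raise ValueError("tail must be 'right' or 'left'")


def one_sided_reject(t, alpha, tail):
    """Critical-value rule: t > Z_alpha (right) or t < -Z_alpha (left)."""
    crit = Z.inv_cdf(1 - alpha)
    if tail == "right":
        return t > crit
    if tail == "left":
        return t < -crit
    raise ValueError("tail must be 'right' or 'left'")


print(one_sided_p_value(-2.0, "left"), one_sided_reject(-2.0, 0.05, "left"))
print(one_sided_p_value(-2.0, "right"), one_sided_reject(-2.0, 0.05, "right"))
```

A large negative realisation gives a small p-value only for the left-sided test. For the right-sided test the same realisation gives a p-value close to 1.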
Example 3.3.3
A manufacturer states that his light bulbs function on average for
1000hrs.
The mean working life of a sample of 81 bulbs was measured to be
920hrs with a standard deviation of 360hrs.
Is the manufacturer's claim reasonable at a significance level of 5%?
i) We state our hypotheses
H0 : µ = 1000;
HA : µ < 1000.
Note that this alternative states that the bulbs are worse than the
producer states. i.e. this is a left-sided test.
ii) The appropriate test statistic is
$$Z = \frac{\bar{X} - \mu_0}{\mathrm{S.E.}(\bar{X})}.$$
iii) We calculate the realisation of the test statistic
$$\mathrm{S.E.}(\bar{X}) \approx \frac{s}{\sqrt{n}} = \frac{360}{\sqrt{81}} = 40$$
$$t = \frac{920 - 1000}{40} = -2.$$
iv) The p-value for this test is
p = P(Z < t) = P(Z < −2) = P(Z > 2) = 0.02275.
v) Conclusion. Since p < 0.05 = α, we reject H0 .
We have evidence that the statement of the producer is unfounded.
iv) We can also base our conclusion on the appropriate critical
value.
For a left-sided test, the appropriate critical value is
−Zα = −t∞,α = −1.645.
v) Since t = −2 < −Zα , we reject H0 at the 5% level.
We have evidence that the statement of the producer is unfounded.
3.3.5 Testing hypotheses for a population mean (with a
small sample, n < 30)
In this case we use the test statistic
$$T = \frac{\bar{X} - \mu_0}{\mathrm{S.E.}(\bar{X})},$$
where
$$\mathrm{S.E.}(\bar{X}) = \frac{s}{\sqrt{n}}.$$
Given H0 is true, if the observations come from a normal
distribution, then this statistic has a Student t distribution with
n − 1 degrees of freedom.
Note: if the observations come from a distribution which is not
normal, then this will not be true.
We cannot calculate p-values by hand using tables.
Hence, inference is based on the appropriate critical value read
from the table for the Student t-distribution (Table 7).
Again, the test statistic is a measure of how far the data are away
from H0 .
Two-sided tests
We reject the null hypothesis if and only if
|t| > tn−1,α/2 ,
where tn−1,p satisfies P(T > tn−1,p ) = p when T has a Student
t-distribution with n − 1 degrees of freedom.
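When a table is not to hand, the critical value tn−1,p can be approximated numerically from its definition. The following is a rough sketch using only the Python standard library: it integrates the Student t density with Simpson's rule and solves P(T > t) = p by bisection. The function names, the number of integration steps and the search range [0, 100] are assumptions made for illustration. The sketch assumes 0 < p < 0.5 and moderate degrees of freedom (below roughly 300).

```python
from math import gamma, pi, sqrt


def t_upper_tail(x, df, steps=2000):
    """P(T > x) for x >= 0, where T has a Student t-distribution with df
    degrees of freedom. Uses Simpson's rule on [0, x]; steps must be even."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))

    def density(u):
        return c * (1 + u * u / df) ** (-(df + 1) / 2)

    h = x / steps
    total = density(0.0) + density(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(i * h)
    # By symmetry P(T > 0) = 0.5; subtract the area between 0 and x.
    return 0.5 - total * h / 3


def t_critical(df, p, low=0.0, high=100.0):
    """Approximate t_{df,p}, the value with P(T > t_{df,p}) = p, by bisection.
    Assumes the answer lies between low and high."""
    for _ in range(60):
        mid = (low + high) / 2
        if t_upper_tail(mid, df) > p:
            low = mid    # tail still too large: move to the right
        else:
            high = mid
    return (low + high) / 2


print(round(t_critical(24, 0.025), 3))
print(round(t_critical(9, 0.05), 3))
```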
Example 3.3.4
The average weight of a sample of 25 students was 72kg with a
standard deviation of 12kg.
Test the hypothesis that on average students weigh 75kg at a
significance level of 5% .
i) We state the hypotheses
H0 : µ = 75;
HA : µ ≠ 75.
ii) We choose the appropriate test statistic
$$T = \frac{\bar{X} - \mu_0}{\mathrm{S.E.}(\bar{X})}.$$
Given H0 is true and the data come from a normal distribution, this
statistic has a Student t-distribution with n − 1 degrees of freedom.
iii) We calculate the realisation of the test statistic
$$\mathrm{S.E.}(\bar{X}) = \frac{s}{\sqrt{n}} = \frac{12}{\sqrt{25}} = 2.4$$
$$t = \frac{72 - 75}{2.4} = -1.25.$$
iv) We read the appropriate critical value from the table for the
Student t-distribution.
Since this is a two-tailed test with significance level α = 0.05
and the sample size is small, the critical value is
tn−1,α/2 = t24,0.025 = 2.064.
v) We state our conclusion. Since
|t| = 1.25 < t24,0.025 = 2.064,
we do not reject H0 (the hypothesis that the average weight of
students is 75kg).
It should be noted that weight does not have a normal distribution.
However, its distribution is not highly asymmetrical and the
number of observations is not very low.
Hence, the distribution of the test statistic will be reasonably close
to the Student t-distribution.
Also, the realisation of the test statistic is not particularly close to
the critical value. Hence, our conclusion seems reasonable.
Use of duality for two-sided tests
We can also use the duality between confidence intervals and
two-sided tests.
In this case since the significance level is 5%, the appropriate
confidence level is 95%. The appropriate confidence interval for
the population mean (n < 30) is given by
$$\bar{X} \pm t_{n-1,\alpha/2}\,\mathrm{S.E.}(\bar{X}) = \bar{X} \pm \frac{s\,t_{n-1,\alpha/2}}{\sqrt{n}} = 72 \pm \frac{12 \times 2.064}{\sqrt{25}} = 72 \pm 4.95 = [67.05, 76.95].$$
Since 75 belongs to this confidence interval, we do not reject H0 .
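The small-sample interval can be checked with a few lines of code. This is a minimal sketch. The function name is illustrative, and the critical value 2.064 is passed in as read from the Student t table rather than computed.

```python
from math import sqrt


def t_confidence_interval(sample_mean, sd, n, t_crit):
    """Confidence interval: sample mean +/- t_crit * s / sqrt(n),
    where t_crit = t_{n-1, alpha/2} is supplied by the user."""
    half_width = t_crit * sd / sqrt(n)
    return sample_mean - half_width, sample_mean + half_width


low, high = t_confidence_interval(72, 12, 25, 2.064)
print(round(low, 2), round(high, 2))
print("reject H0" if not (low <= 75 <= high) else "do not reject H0")
```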
Right-sided tests
We consider two types of one-sided tests. The first are right-sided
tests.
These are tests of the form
H0 : µ = µ0 ;
HA : µ > µ0 .
We reject H0 only if the sample mean is significantly greater than
µ0 .
This corresponds to realisations of the test statistic significantly
greater than 0.
Precisely, we reject the null hypothesis if and only if t > tn−1,α .
Left-sided tests
The second type of tests are left-sided tests. These are tests of the
form
H0 : µ = µ0 ; HA : µ < µ0 .
We reject H0 only if the sample mean is significantly smaller than
µ0 .
This corresponds to realisations of the test statistic significantly
smaller than 0.
Precisely, we reject the null hypothesis if and only if t < −tn−1,α .
Example 3.3.5
A car producer states that one of his cars burns 6.2 litres of petrol
per 100km.
10 magazines tested the car. The average of their results was 6.5
litres/100 km with a standard deviation of 0.3 litres/100 km.
Is the statement of the producer reasonable at a 5% significance
level?
i) In this case the hypothesis H0 : µ = 6.2 is from a producer.
The alternative states that the product is worse than the producer
states (i.e. consumes more petrol). Hence, HA : µ > 6.2.
ii) The test statistic is
$$T = \frac{\bar{X} - \mu_0}{\mathrm{S.E.}(\bar{X})},$$
where
$$\mathrm{S.E.}(\bar{X}) = \frac{s}{\sqrt{n}}.$$
If the observations come from a normal distribution, then this has
a Student t-distribution with n − 1 degrees of freedom.
iii) We calculate the realisation of the test statistic.
$$\mathrm{S.E.}(\bar{X}) = \frac{s}{\sqrt{n}} = \frac{0.3}{\sqrt{10}} \approx 0.0949$$
$$t = \frac{6.5 - 6.2}{0.0949} \approx 3.16.$$
iv) We read the appropriate critical value. Since this is a
right-sided test the critical value is given by
tn−1,α = t9,0.05 = 1.833.
v) We state our conclusion. This is a right-sided test. Since
t = 3.16 > tn−1,α = t9,0.05 = 1.833,
we reject H0 at a significance level of 5%.
Hence, there is evidence that the producer's statement is
unfounded.
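The arithmetic of this example can be verified as follows. This is a minimal sketch. The function name is illustrative and the critical value t9,0.05 = 1.833 is taken from the table, as in the text.

```python
from math import sqrt


def t_statistic(sample_mean, mu0, sd, n):
    """Realisation of T = (sample mean - mu0) / (s / sqrt(n))."""
    return (sample_mean - mu0) / (sd / sqrt(n))


se = 0.3 / sqrt(10)                  # standard error of the sample mean
t = t_statistic(6.5, 6.2, 0.3, 10)   # realisation of the test statistic
t_crit = 1.833                       # t_{9, 0.05}, read from the table

print(round(se, 4), round(t, 2))
# Right-sided test: reject H0 when t exceeds the critical value.
print("reject H0" if t > t_crit else "do not reject H0")
```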
3.3.6 Tests for a population proportion
We only consider such tests with large samples (n > 30).
The null hypothesis is H0 : p = p0 .
Under the null hypothesis the standard error of the sample
proportion, p̂, is
$$\mathrm{S.E.}(\hat{p}) = \sqrt{\frac{p_0(1 - p_0)}{n}}.$$
The test statistic,
$$Z = \frac{\hat{p} - p_0}{\mathrm{S.E.}(\hat{p})},$$
has approximately a standard normal distribution.
Note that this statistic is analogous to the statistic for large
sample tests for a population mean.
The test statistic is a measure of the distance between the sample
proportion and the hypothesised population proportion p0 .
We reject H0 if this difference is significantly large.
The p-values and critical values for such tests can be calculated in
the same way as for tests for the population mean with a large
sample.
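A two-sided test for a proportion can be sketched as follows, assuming a large sample so that the normal approximation is reasonable. The function name is illustrative, and the input (100 successes out of 300, p0 = 0.3) is used only as sample data.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z_test(successes, n, p0):
    """Realisation of Z = (p_hat - p0) / S.E.(p_hat) and the two-sided
    p-value, with the standard error computed under H0: p = p0."""
    p_hat = successes / n
    se0 = sqrt(p0 * (1 - p0) / n)
    t = (p_hat - p0) / se0
    p_value = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, p_value


t, p_value = proportion_z_test(100, 300, 0.3)
print(round(t, 2), round(p_value, 4))
```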
Example 3.3.6
100 of 300 people stated that they wanted to vote for Fine Gael at
the next election.
Test the hypothesis that 30% of the population wish to vote for
Fine Gael at a significance level of 5%.
i) We state our hypotheses
H0 : p = 0.3;
HA : p ≠ 0.3
Since we do not know where this hypothesis is from, we use a
two-sided test.
ii) The test statistic is
$$Z = \frac{\hat{p} - p_0}{\mathrm{S.E.}(\hat{p})}.$$
iii) We calculate the realisation of the test statistic
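$$\hat{p} = \frac{100}{300} = \frac{1}{3}$$
$$\mathrm{S.E.}(\hat{p}) = \sqrt{\frac{p_0(1 - p_0)}{n}} = \sqrt{\frac{0.3 \times 0.7}{300}} = \sqrt{0.0007} \approx 0.02646.$$
Hence,
$$t = \frac{1/3 - 0.3}{0.02646} \approx 1.26.$$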
iv) We can calculate the p-value of the test. For a two-sided test
p = 2P(Z > |t|) = 2P(Z > 1.26) = 2 × 0.1038 = 0.2076.
v) Since p > α = 0.05, there is no evidence that this proportion
deviates from 30% (we do not reject H0 ).
Note that this conclusion can also be based on the appropriate
critical value.
For a two-sided test this is
Zα/2 = t∞,α/2 = t∞,0.025 = 1.96.
Since |t| = 1.26 < t∞,0.025 = 1.96, we do not reject H0 at a
significance level of 5%.
There is no evidence that the population proportion deviates from
30%.
The duality between confidence intervals and two-sided tests also
works for tests for the population proportion.
However, when we calculate a confidence interval for a proportion,
the estimate of the standard error is based on the sample
proportion and not (as in the hypothesis test) on the supposed
population proportion.
Hence, the duality in this case is only approximate. For example, if
we base the conclusion of a test on a 99% confidence interval for a
population proportion, then the significance level is approximately
1%.
Example 3.3.7
100 of 300 people stated that they wanted to vote for Fine Gael at
the next election.
Calculate a 95% confidence interval for the proportion of the
population wishing to vote for Fine Gael.
On the basis of this confidence interval test the hypothesis that
30% of the population wish to vote for Fine Gael.
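The 95% confidence interval for the population proportion is
$$\hat{p} \pm t_{\infty,\alpha/2}\,\mathrm{S.E.}(\hat{p}),$$
where
$$\mathrm{S.E.}(\hat{p}) \approx \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} = \sqrt{\frac{1/3 \times 2/3}{300}} = \sqrt{0.000741} \approx 0.02722.$$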
t∞,α/2 = t∞,0.025 = 1.96.
The confidence interval is given by
$$\hat{p} \pm t_{\infty,\alpha/2}\,\mathrm{S.E.}(\hat{p}) = \frac{1}{3} \pm 1.96 \times 0.02722 = 0.333 \pm 0.053 = [0.280, 0.386].$$
Since 0.3 belongs to this interval, we do not reject the null
hypothesis that 30% of the population wish to vote for Fine Gael.
The significance level of this test is approximately 5%.
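The interval can be reproduced in code. The following is a minimal sketch with an illustrative function name. Unlike the hypothesis test, the standard error here is estimated from the sample proportion, which is why the duality is only approximate. Because the sketch works with unrounded values, the upper end point may differ in the third decimal place from a hand calculation with rounded intermediate values.

```python
from math import sqrt
from statistics import NormalDist


def proportion_confidence_interval(successes, n, alpha):
    """Large-sample 100(1 - alpha)% confidence interval for a proportion,
    with the standard error estimated from the sample proportion."""
    p_hat = successes / n
    se = sqrt(p_hat * (1 - p_hat) / n)
    crit = NormalDist().inv_cdf(1 - alpha / 2)   # t_{infinity, alpha/2}
    return p_hat - crit * se, p_hat + crit * se


low, high = proportion_confidence_interval(100, 300, 0.05)
print(round(low, 3), round(high, 3))
print("reject H0" if not (low <= 0.3 <= high) else "do not reject H0")
```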