
Two Sample t Test and its Applications
Data Analysis II
1. Introduction
The two sample t-test is used to compare two groups. The test has several variations
depending on whether the two groups are independent, paired, or a combination of both,
whether the sample size is large enough, and whether the data meet certain assumptions.
According to the Central Limit Theorem, sufficiently large samples (usually taken to mean
more than 30 observations) have approximately normally distributed sample means
regardless of the population distributions they came from. Even so, it is safer to test the
data, since they may need a transformation, especially since statistical software makes
testing for normality relatively simple.
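As a minimal sketch of such a check in Python (using scipy's Shapiro-Wilk test as one common normality test; the sample here is made up for illustration):

```python
import numpy as np
from scipy import stats

# made-up, deliberately skewed sample for illustration
rng = np.random.default_rng(0)
sample = rng.lognormal(mean=0.0, sigma=0.5, size=40)

# Shapiro-Wilk test: H0 is that the sample comes from a normal distribution
stat, p = stats.shapiro(sample)
if p < 0.05:
    print("normality rejected; consider a transformation or a nonparametric test")
else:
    print("no evidence against normality")
```

The Kolmogorov-Smirnov test (`scipy.stats.kstest`) can be used similarly, though it requires the hypothesized distribution's parameters to be fully specified.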
If the assumptions for the t-test are not met, the corresponding nonparametric test should
be used. Nonparametric tests also have assumptions, but these are less rigorous than
those of the parametric tests.
In some situations one might use the independent test where the paired test is more
appropriate, or vice versa. It is important to choose the right test, since each gives valid
results only when properly applied.
The different t tests can be performed with the aid of statistical packages such as SAS and
SPSS, preferably; if these are not available, Microsoft Excel can also be used.
Independent or Paired
Independent refers to uncorrelated groups, while paired refers to two different
treatments applied to the same group.
Independent
Examples of independent groups are comparing the GPA of males versus females,
consumers’ preference for one brand versus another, or one method of production versus
another. As the name suggests, the independent test compares two independent groups.
Paired
Examples of experiments on paired groups would be comparing test performance of
students before and after a tutoring program; amount of some body fluid before and after
some treatment; operation of some system before and after a change.
Although in general the paired test applies to a before-and-after type of situation on the
same subject, pairs can also be formed without using the same subject, for example by
comparing some aspect using twins, married couples, or any two closely related subjects.
These examples are better described as matched pairs.
Note that paired samples have less variability than independent samples. A paired test is
the same as a one-sample t test on the differences of the two samples.
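This equivalence is easy to verify in Python (scipy assumed available; the before/after scores are made up):

```python
import numpy as np
from scipy import stats

# made-up before/after scores for the same six subjects
before = np.array([72.0, 65.0, 80.0, 58.0, 77.0, 69.0])
after = np.array([75.0, 70.0, 79.0, 63.0, 82.0, 74.0])

# a paired t test ...
t_paired, p_paired = stats.ttest_rel(before, after)
# ... equals a one-sample t test of the differences against zero
t_one, p_one = stats.ttest_1samp(before - after, 0.0)

assert np.isclose(t_paired, t_one) and np.isclose(p_paired, p_one)
```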
Assumptions
Once paired or independent has been decided, the next step is to check the assumptions.
Alternative procedures exist in cases where the assumptions for a particular test are not
met. Using the wrong test, or not checking the assumptions, can give invalid results.
Independent t-test Assumptions
The two samples are randomly and independently selected from the two populations.
The two populations are normally distributed.
The two sample variances are equal.
Note: For large samples (n1 and n2 ≥ 30) the sample variances do not have to be equal.
For small samples (n1 or n2 or both < 30) there is an alternative test when the variances
cannot be assumed equal. The F test checks the equal variance assumption.
Paired t-test Assumptions
Since the observations are paired in this case, the test is applied to the differences.
The population of differences has a normal distribution.
The differences are independent.
The differences have a common mean and variance.
Histograms in Excel, or Q-Q plots and the Kolmogorov-Smirnov test in statistical
packages, can check the normality assumption. If the data do not follow a normal
distribution, transformations can be applied. For example, the log transformation usually
works for right-skewed data; if the log-transformed data are normal, the original data are
called lognormal. Reciprocal and square root transformations are also used, depending on
what the original data look like.
Transformed data lose some information. The final result can be converted back to the
original scale, but rather than interpreting a difference in means, the result is then
expressed as a ratio of the means.
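A sketch of the log transformation and the ratio-of-means interpretation (made-up lognormal data; scipy assumed available):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# made-up right-skewed (lognormal) measurements for two groups
a = rng.lognormal(mean=1.0, sigma=0.6, size=50)
b = rng.lognormal(mean=1.3, sigma=0.6, size=50)

log_a, log_b = np.log(a), np.log(b)  # lognormal data are normal after the log

t, p = stats.ttest_ind(log_a, log_b)
# back-transforming the difference of log means yields a ratio of (geometric) means
ratio = np.exp(log_a.mean() - log_b.mean())
print(f"t = {t:.3f}, p = {p:.4f}, estimated ratio of means = {ratio:.2f}")
```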
Statistical packages, as well as Excel, can also check for equal variances. Statistical
packages show two results: one with equal variances assumed and one with equal
variances not assumed. In Excel, one uses the F-test to check for equal variances and
then chooses between the t test for equal variances and the t test for unequal variances,
depending of course on the result of the F-test.
If the data do not follow a normal distribution, the corresponding nonparametric test has
to be used.
Nonparametric tests
Note: Nonparametric tests use the median and/or ranks rather than the mean and have
fewer restrictions than parametric tests.
For Independent samples: Mann-Whitney
Assumptions
The samples are random and independent each from populations with unknown medians.
Continuous variable measured on at least an ordinal scale.
The distributions differ only with respect to location.
For Paired samples: Wilcoxon Matched-Pairs Signed-Ranks
Assumptions
The differences come from random samples of matched continuous variables.
The differences are independent, and measured on at least an interval scale.
The distribution of differences is symmetric about the median.
Note: Although not as popular as the above tests, the Median test for independent
samples and the Sign test for paired samples are also available as nonparametric
alternatives.
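Both of the main nonparametric tests above are available in scipy; a minimal sketch with made-up skewed data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
x = rng.exponential(scale=1.0, size=30)  # made-up skewed samples
y = rng.exponential(scale=1.5, size=30)

# Mann-Whitney U test for two independent samples
u, p_u = stats.mannwhitneyu(x, y, alternative="two-sided")

# Wilcoxon signed-rank test, treating x[i] and y[i] as pairs (illustrative only)
w, p_w = stats.wilcoxon(x, y)
print(f"Mann-Whitney p = {p_u:.4f}, Wilcoxon p = {p_w:.4f}")
```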
2. Methodology
Independent Samples
Hypotheses
A. Ho: µ1 − µ2 = δ0 vs. Ha: µ1 − µ2 ≠ δ0
B. Ho: µ1 − µ2 ≥ δ0 vs. Ha: µ1 − µ2 < δ0
C. Ho: µ1 − µ2 ≤ δ0 vs. Ha: µ1 − µ2 > δ0
Large sample size: n1 and n2 ≥ 30
Test Statistic
z = ((x̄1 − x̄2) − δ0) / √(σ1²/n1 + σ2²/n2)
Rejection Region
A. Reject Ho if |z| > zα/2
B. Reject Ho if z < -zα
C. Reject Ho if z > zα
100(1- α)% Confidence Interval
A. (x̄1 − x̄2) ± zα/2 √(σ1²/n1 + σ2²/n2)
B. µ1 − µ2 ≤ (x̄1 − x̄2) + zα √(σ1²/n1 + σ2²/n2)
C. µ1 − µ2 ≥ (x̄1 − x̄2) − zα √(σ1²/n1 + σ2²/n2)
Small samples: n1 or n2 or both < 30
Equal variances assumed
σ1 = σ2
The F test checks for equal variances:
Ho: σ1² = σ2² vs. Ha: σ1² ≠ σ2²
F = s1²/s2²
Reject Ho if F > f(n1−1, n2−1, α/2) or F < f(n1−1, n2−1, 1−α/2)
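The F test above can be sketched in Python as follows (made-up samples; scipy assumed available):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
x = rng.normal(10.0, 2.0, 15)  # made-up samples
y = rng.normal(10.0, 2.0, 12)

F = np.var(x, ddof=1) / np.var(y, ddof=1)  # F = s1^2 / s2^2
df1, df2 = len(x) - 1, len(y) - 1
alpha = 0.05
lower = stats.f.ppf(alpha / 2, df1, df2)      # lower critical value
upper = stats.f.ppf(1 - alpha / 2, df1, df2)  # upper critical value
equal_var = lower < F < upper  # True -> fail to reject equal variances
```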
Test Statistic
t = ((x̄1 − x̄2) − δ0) / (SP √(1/n1 + 1/n2))
where SP is the pooled standard deviation, SP² is the pooled variance, and
SP² = [(n1 − 1)s1² + (n2 − 1)s2²] / (n1 + n2 − 2)
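The pooled statistic can be checked against scipy's equal-variance t test (made-up small samples):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
x = rng.normal(50.0, 5.0, 14)  # made-up small samples
y = rng.normal(53.0, 5.0, 11)

n1, n2 = len(x), len(y)
# pooled variance, then the pooled t statistic
sp2 = ((n1 - 1) * x.var(ddof=1) + (n2 - 1) * y.var(ddof=1)) / (n1 + n2 - 2)
t_manual = (x.mean() - y.mean()) / np.sqrt(sp2 * (1 / n1 + 1 / n2))

t_scipy, p = stats.ttest_ind(x, y, equal_var=True)
assert np.isclose(t_manual, t_scipy)
```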
Rejection Region
A. Reject Ho if |t| > tv,α/2
B. Reject Ho if t < -tv,α
C. Reject Ho if t > tv,α
Where v stands for degrees of freedom and v = n1+ n2 - 2
100(1- α)% Confidence Interval
A. (x̄1 − x̄2) ± t(v, α/2) SP √(1/n1 + 1/n2)
B. µ1 − µ2 ≤ (x̄1 − x̄2) + t(v, α) SP √(1/n1 + 1/n2)
C. µ1 − µ2 ≥ (x̄1 − x̄2) − t(v, α) SP √(1/n1 + 1/n2)
Equal variances not assumed
σ1 ≠ σ2
Test Statistic
t = ((x̄1 − x̄2) − δ0) / √(s1²/n1 + s2²/n2)
Rejection Region
A. Reject Ho if |t| > tv,α/2
B. Reject Ho if t < -tv,α
C. Reject Ho if t > tv,α
Where v stands for degrees of freedom and
v = (w1 + w2)² / [w1²/(n1 − 1) + w2²/(n2 − 1)], where w1 = s1²/n1 and w2 = s2²/n2
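The Welch statistic and its degrees of freedom can be verified against scipy (made-up samples with unequal spread):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
x = rng.normal(50.0, 3.0, 14)  # made-up samples with clearly unequal spread
y = rng.normal(53.0, 9.0, 11)

w1 = x.var(ddof=1) / len(x)
w2 = y.var(ddof=1) / len(y)
t_manual = (x.mean() - y.mean()) / np.sqrt(w1 + w2)
# Welch-Satterthwaite degrees of freedom
v = (w1 + w2) ** 2 / (w1 ** 2 / (len(x) - 1) + w2 ** 2 / (len(y) - 1))

t_scipy, p = stats.ttest_ind(x, y, equal_var=False)
assert np.isclose(t_manual, t_scipy)
```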
100(1- α)% Confidence Interval
A. (x̄1 − x̄2) ± t(v, α/2) √(s1²/n1 + s2²/n2)
B. µ1 − µ2 ≤ (x̄1 − x̄2) + t(v, α) √(s1²/n1 + s2²/n2)
C. µ1 − µ2 ≥ (x̄1 − x̄2) − t(v, α) √(s1²/n1 + s2²/n2)
Paired Samples
Hypotheses
A. Ho: µd = δ0 vs. Ha: µd ≠ δ0
B. Ho: µd ≥ δ0 vs. Ha: µd < δ0
C. Ho: µd ≤ δ0 vs. Ha: µd > δ0
where µd = µ1 − µ2
Large sample
Test Statistic
z = (x̄d − δ0) / (σd/√nd)
where x̄d = sample mean of the differences,
σd (or sd) = standard deviation of the differences,
nd = number of differences (pairs)
Rejection Region
A. Reject Ho if |z| > zα/2
B. Reject Ho if z < -zα
C. Reject Ho if z > zα
100(1- α)% Confidence Interval
A. x̄d ± zα/2 σd/√nd   (for large samples, σd ≈ sd)
B. µd ≤ x̄d + zα σd/√nd
C. µd ≥ x̄d − zα σd/√nd
Small samples
Test Statistic
t = (x̄d − δ0) / (sd/√nd)
Rejection Region
A. Reject Ho if |t| > t(nd−1, α/2)
B. Reject Ho if t < −t(nd−1, α)
C. Reject Ho if t > t(nd−1, α)
100(1- α)% Confidence Interval
A. x̄d ± t(nd−1, α/2) sd/√nd
B. µd ≤ x̄d + t(nd−1, α) sd/√nd
C. µd ≥ x̄d − t(nd−1, α) sd/√nd
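The small-sample paired statistic reduces to scipy's one-sample test on the differences (made-up differences for illustration):

```python
import numpy as np
from scipy import stats

d = np.array([4.0, -1.0, 6.0, 3.0, 2.0, 5.0, -2.0, 7.0])  # made-up differences
nd = len(d)

# t = (mean difference - 0) / (sd / sqrt(nd)), testing delta0 = 0
t_manual = d.mean() / (d.std(ddof=1) / np.sqrt(nd))
t_scipy, p = stats.ttest_1samp(d, 0.0)
assert np.isclose(t_manual, t_scipy)

# 95% confidence interval for the mean difference
half = stats.t.ppf(0.975, nd - 1) * d.std(ddof=1) / np.sqrt(nd)
ci = (d.mean() - half, d.mean() + half)
```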
Note: Paper 3 suggests a corrected z test for samples with combined independent and
paired observations explaining how this test will give more accurate results than either of
the above tests.
3. Applications
Independent Samples
Equal Variances
This example was taken from http://bmj.bmjjournals.com/collections/statsbk/7.shtml
The addition of bran to the diet has been reported to benefit patients with diverticulosis.
Several different bran preparations are available, and a clinician wants to test the efficacy
of two of them on patients, since favorable claims have been made for each. Among the
consequences of administering bran that requires testing is the transit time through the
alimentary canal. Does it differ in the two groups of patients taking these two
preparations?
The null hypothesis is that the two groups come from the same population. By random
allocation the clinician selects two groups of patients aged 40-64 with diverticulosis of
comparable severity. Sample 1 contains 15 patients who are given treatment A, and
sample 2 contains 12 patients who are given treatment B. The transit times of food
through the gut are measured by a standard technique with marked pellets and the results
are recorded, in order of increasing time, in Table 7.1 .
These data are shown in figure 7.1 . The assumptions of approximate Normality and
equality of variance are satisfied. The design suggests that the observations are indeed
independent. Since it is possible for the difference in mean transit times for A − B to be
positive or negative, we will employ a two sided test.
With treatment A the mean transit time was 68.40 h and with treatment B 83.42 h. What
is the significance of the difference, 15.02 h?
For the transit times of table 7.1 the standard error of the difference between the means
is 6.582, so t = 15.02/6.582 = 2.282 at 25 degrees of freedom (that is, (15 − 1) + (12 − 1)).
The table of the t distribution, Table B (appendix), which gives two sided P values, shows
that t = 2.282 lies between 2.060 and 2.485, and consequently 0.02 < P < 0.05. This degree
of probability is smaller than the conventional level of 5%. The null hypothesis that there
is no difference between the means is therefore somewhat unlikely.
A 95% confidence interval is given by 83.42 − 68.40 ± 2.06 × 6.582, that is,
15.02 − 13.56 to 15.02 + 13.56, or 1.46 to 18.58 h.
Unequal standard deviations
If the standard deviations in the two groups are markedly different, for example if the
ratio of the larger to the smaller is greater than two, then one of the assumptions of the
t test (that the two samples come from populations with the same standard deviation) is
unlikely to hold.
The unequal variance t test tends to be less powerful than the usual t test if the variances
are in fact the same, since it uses fewer assumptions. However, it should not be used
indiscriminately because, if the standard deviations are different, how can we interpret a
nonsignificant difference in means, for example? Often a better strategy is to try a data
transformation, such as taking logarithms as described in Chapter 2. Transformations that
render distributions closer to Normality often also make the standard deviations similar.
If a log transformation is successful, use the usual t test on the logged data.
Applying this method to the data of Table 7.1 gives d.f. = 22.9, or approximately 23. The
tabulated values for 5% and 2% from table B are 2.069 and 2.500, so again
0.02 < P < 0.05, as before. This might be expected, because the standard deviations in the
original data set are very similar and the sample sizes are close, and so the unequal
variances t test gives very similar results to the t test which assumes equal variances.
Unequal Variances
This Example was taken from
http://www.people.vcu.edu/~wsstreet/courses/314_20033/hyptest2ex.pdf
Two sections of a class in statistics were taught by two different methods. Students’
scores on a standardized test are shown below. Do the results present evidence of a
difference in the effectiveness of the two methods? (Use α = 0.01.)
Step 1 : Hypotheses
H0: µA − µB = 0
Ha: µA − µB ≠ 0
Step 2 : Significance Level
α= 0.01
Step 3 : Critical Value(s) and Rejection Region(s)
Since we don’t know the population variances and don’t think that they are equal, we’ll
use the non-pooled t-test.
Reject the null hypothesis if T ≤ −2.82 or if T ≥ 2.82.
Step 4 : Test Statistic
T = 1.2193
p - value = 0.242
Step 5 : Conclusion
Since –2.82 ≤ 1.2193 ≤ 2.82 ( p-value ≈ 0.242 > 0.01), we fail to reject
the null hypothesis.
Step 6 : State conclusion in words
At the α = 0.01 level of significance, there is not enough evidence to conclude
that there is a difference in the effectiveness of the two methods.
Paired Samples
This example was taken from http://www.physics.csbsju.edu/stats/t-test.html
Cedar-apple rust is a (non-fatal) disease that affects apple trees. Its most obvious
symptom is rust-colored spots on apple leaves. Red cedar trees are the immediate source
of the fungus that infects the apple trees. If you could remove all red cedar trees within a
few miles of the orchard, you should eliminate the problem. In the first year of this
experiment the number of affected leaves on 8 trees was counted; the following winter all
red cedar trees within 100 yards of the orchard were removed and the following year the
same trees were examined for affected leaves. The results are recorded below:
tree         | rusted leaves, year 1 | rusted leaves, year 2 | difference: 1 − 2
1            | 38                    | 32                    | 6
2            | 10                    | 16                    | −6
3            | 84                    | 57                    | 27
4            | 36                    | 28                    | 8
5            | 50                    | 55                    | −5
6            | 35                    | 12                    | 23
7            | 73                    | 61                    | 12
8            | 48                    | 29                    | 19
average      | 46.8                  | 36.2                  | 10.5
standard dev | 23                    | 19                    | 12
As you can see there is substantial natural variation in the number of affected leaves; in
fact, an unpaired t-test comparing the results in year 1 and year 2 would find no
significant difference. (Note that an unpaired t-test should not be applied to this data
because the second sample was not in fact randomly selected.) However, if we focus on
the difference we find that the average difference is significantly different from zero. The
paired t-test focuses on the difference between the paired data and reports the probability
that the actual mean difference is consistent with zero. This comparison is aided by the
reduction in variance achieved by taking the differences.
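The contrast between the unpaired and paired analyses can be reproduced from the table's data (scipy assumed available):

```python
import numpy as np
from scipy import stats

year1 = np.array([38, 10, 84, 36, 50, 35, 73, 48], dtype=float)
year2 = np.array([32, 16, 57, 28, 55, 12, 61, 29], dtype=float)

# unpaired test: swamped by the large tree-to-tree variation
t_u, p_u = stats.ttest_ind(year1, year2)

# paired test: works on the within-tree differences, so that variation cancels
t_p, p_p = stats.ttest_rel(year1, year2)

print(f"unpaired p = {p_u:.3f}, paired p = {p_p:.3f}")
```

As the text notes, the unpaired test is not actually appropriate for these data; the side-by-side comparison is purely illustrative.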
4. Discussion
Since the t test has variations according to the situations explained above, one must
carefully decide whether the samples are independent or paired (or even a combination of
independent and paired observations), and whether the required assumptions for the
chosen test hold, in order to avoid obtaining invalid results.
A test for paired samples reduces to a one-sample t test performed on the differences.
The independent samples t test has more power when the respective variances can be
assumed equal (after checking them with the F test first). Paired experiments are
preferable since they involve less variation; however, applying the paired t test to
independent samples gives invalid results (and vice versa).
Sometimes a sample could contain a combination of paired and independent
observations. In this case a corrected z test (as explained in “Paper 3”) should be
employed.
References
1. Papers 1-4
2. Tamhane, Ajit C. and Dunlop, Dorothy D. (2000). Statistics and Data Analysis from
Elementary to Intermediate. Upper Saddle River: Prentice Hall, Inc.
3. Daniel, Wayne W. (1990). Applied Nonparametric Statistics, Second Edition, Boston:
PWS-Kent Publishing Company.
4. http://www.physics.csbsju.edu/stats/t-test.html
5. http://www.people.vcu.edu/~wsstreet/courses/314_20033/hyptest2ex.pdf
6. http://bmj.bmjjournals.com/collections/statsbk/7.shtml