One-Sample t-Test

> t.test(rnorm(20, mean=2, sd=2))

	One Sample t-test

data:  rnorm(20, mean = 2, sd = 2)
t = 2.384, df = 19, p-value = 0.02771
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 0.1558692 2.3985594
sample estimates:
mean of x
 1.277214

> t.test(rnorm(20, mean=0, sd=2))

	One Sample t-test

data:  rnorm(20, mean = 0, sd = 2)
t = -0.9111, df = 19, p-value = 0.3737
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 -1.5884003  0.6249483
sample estimates:
mean of x
-0.481726

Power Curves for t-test
α = .05, X1, . . . , Xn ∼ N(µ, 1)

[Figure: power of the one-sample t-test as a function of µ, one panel each for n = 5, n = 10, and n = 20.]
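Curves like these can be drawn with R's built-in power.t.test(). A minimal sketch for the n = 20 panel (the plotting details are an assumption, not the original script):

```r
# Theoretical power of the one-sample t-test, alpha = .05, sd = 1, n = 20
mu  <- seq(0, 3, by = 0.1)
pow <- sapply(mu, function(m)
  power.t.test(n = 20, delta = m, sd = 1, sig.level = 0.05,
               type = "one.sample")$power)
plot(mu, pow, type = "l", xlab = "mu", ylab = "power", main = "n=20")
```

At µ = 0 the "power" is just the level α = .05, and the curve rises toward 1 as µ grows.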
Shoe Wear

> library(MASS)
> shoes
$A
 [1] 13.2  8.2 10.9 14.3 10.7  6.6  9.5 10.8  8.8 13.3
$B
 [1] 14.0  8.8 11.2 14.2 11.8  6.4  9.8 11.3  9.3 13.6
> layout(rbind(c(1,2), c(1,3)))
> boxplot(list(A=shoes$A, B=shoes$B),
+         notch=T, horizontal=T, boxwex=.5)
> qqnorm(shoes$A); qqline(shoes$A)
> qqnorm(shoes$B); qqline(shoes$B)

[Figure: horizontal notched boxplots of A and B, plus a normal Q-Q plot for each sample.]

Paired t-Test

> t.test(shoes$A, shoes$B, paired=T)

	Paired t-test

data:  shoes$A and shoes$B
t = -3.3489, df = 9, p-value = 0.008539
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.6869539 -0.1330461
sample estimates:
mean of the differences
                  -0.41

> t.test(shoes$A - shoes$B)

	One Sample t-test

data:  shoes$A - shoes$B
t = -3.3489, df = 9, p-value = 0.008539
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 -0.6869539 -0.1330461
sample estimates:
mean of x
    -0.41
Paired vs. Unpaired

> t.test(shoes$A, shoes$B, paired=T)

	Paired t-test

data:  shoes$A and shoes$B
t = -3.3489, df = 9, p-value = 0.008539
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.6869539 -0.1330461
sample estimates:
mean of the differences
                  -0.41

# bad idea: using paired=F
> t.test(shoes$A, shoes$B)

	Welch Two Sample t-test

data:  shoes$A and shoes$B
t = -0.3689, df = 17.987, p-value = 0.7165
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -2.745046  1.925046
sample estimates:
mean of x mean of y
    10.63     11.04

Non-normality of Traffic Data

> library(MASS)
> attach(Traffic)
> qqnorm(y[limit=="yes"]); qqline(y[limit=="yes"])
> qqnorm(y[limit=="no"]); qqline(y[limit=="no"])
> shapiro.test(y[limit=="yes"])

	Shapiro-Wilk normality test

data:  y[limit == "yes"]
W = 0.9213, p-value = 0.0003330

> shapiro.test(y[limit=="no"])

	Shapiro-Wilk normality test

data:  y[limit == "no"]
W = 0.9516, p-value = 0.0003928

[Figure: normal Q-Q plots of y for limit=="yes" and limit=="no"; both bend away from the reference line.]
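The unpaired analysis fails here because wear is measured on both materials for the same ten boys: A and B are strongly correlated, so the within-pair differences vary far less than either raw sample. A quick check (a sketch against the MASS shoes data used above):

```r
library(MASS)                 # shoes: wear for materials A and B on the same boys
cor(shoes$A, shoes$B)         # strong positive correlation
sd(shoes$A - shoes$B)         # the differences vary little...
c(sd(shoes$A), sd(shoes$B))   # ...compared with the raw samples
```

The paired test exploits that small difference variance; pooling the two samples throws it away.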
Normality (?) of Log Traffic Data

> qqnorm(log(y[limit=="yes"]))
> qqnorm(log(y[limit=="no"]))
> shapiro.test(log(y[limit=="yes"]))

	Shapiro-Wilk normality test

data:  log(y[limit == "yes"])
W = 0.9832, p-value = 0.4814

> shapiro.test(log(y[limit=="no"]))

	Shapiro-Wilk normality test

data:  log(y[limit == "no"])
W = 0.9868, p-value = 0.3235

[Figure: normal Q-Q plots of log(y) for both groups, now close to straight lines.]

Two-Sample t-Test

> ly.yes <- log(y[limit=="yes"])
> ly.no <- log(y[limit=="no"])
> t.test(ly.yes, ly.no)

	Welch Two Sample t-test

data:  ly.yes and ly.no
t = -3.2954, df = 147.673, p-value = 0.001231
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.31472006 -0.07876154
sample estimates:
mean of x mean of y
 2.866770  3.063511

> var.test(ly.yes, ly.no)

	F test to compare two variances

data:  ly.yes and ly.no
F = 0.9262, num df = 68, denom df = 114, p-value = 0.7389
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
 0.6109935 1.4372433
sample estimates:
ratio of variances
         0.9262063

> t.test(ly.yes, ly.no, var.equal=T)

	Two Sample t-test

data:  ly.yes and ly.no
t = -3.2638, df = 182, p-value = 0.001313
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.31567636 -0.07780524
sample estimates:
mean of x mean of y
 2.866770  3.063511
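The F statistic reported by var.test is nothing mysterious: it is just the ratio of the two sample variances, with n1 − 1 and n2 − 1 degrees of freedom. Recomputing it from the Traffic data (sketch):

```r
library(MASS)                 # Traffic: daily accident counts y, with/without speed limit
ly.yes <- log(Traffic$y[Traffic$limit == "yes"])
ly.no  <- log(Traffic$y[Traffic$limit == "no"])
var(ly.yes) / var(ly.no)                  # 0.9262..., var.test's F
c(length(ly.yes) - 1, length(ly.no) - 1)  # its num and denom df: 68 and 114
```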
Approximate Z-Tests in R

Instead of log-transforming the traffic data, let's do an approximate Z-test on the untransformed data:

• Same statistic, so use the same test function:

> t.test(y[limit=="yes"], y[limit=="no"])

	Welch Two Sample t-test

data:  y[limit == "yes"] and y[limit == "no"]
t = -3.3995, df = 165.545, p-value = 0.000846
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -6.666816 -1.767967
sample estimates:
mean of x mean of y
 18.91304  23.13043

• If you want to be picky, use the p-value based on Z instead of t(165.545):

> 2*pt(-3.3995, df=165.545, lower.tail=T)
[1] 0.0008459695
> 2*pnorm(-3.3995, lower.tail=T)
[1] 0.0006750918

but this is all so approximate anyway, why not take the more conservative t-based p-value?

Recall that the "exact" p-value for the log-transformed data, assuming normality and equal variances, was p = 0.001313.

Wilcoxon Signed-Rank Test

• For one sample:

> x <- c(8.5,8.6,6.4,12.1,8.2,7.4,7.8,8.3,10.3,8.4)
> wilcox.test(x, mu=10, conf.int=T)

	Wilcoxon signed rank test

data:  x
V = 8, p-value = 0.04883
alternative hypothesis: true mu is not equal to 10
95 percent confidence interval:
 7.50 9.95
sample estimates:
(pseudo)median
          8.35

• For paired data:

> wilcox.test(shoes$A, shoes$B, paired=T)

	Wilcoxon signed rank test with continuity correction

data:  shoes$A and shoes$B
V = 3, p-value = 0.01431
alternative hypothesis: true mu is not equal to 0

Warning message: Cannot compute exact p-value with ties in:
	wilcox.test.default(shoes$A, shoes$B, paired = T)

> wilcox.test(shoes$A - shoes$B)

	Wilcoxon signed rank test with continuity correction

data:  shoes$A - shoes$B
V = 3, p-value = 0.01431
alternative hypothesis: true mu is not equal to 0

Warning message: Cannot compute exact p-value with ties in:
	wilcox.test.default(shoes$A - shoes$B)
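The signed-rank statistic V can be computed by hand: rank the absolute deviations |xi − µ0| and sum the ranks of the observations above µ0. For the one-sample example above:

```r
# Recompute V for the one-sample example by hand
x <- c(8.5, 8.6, 6.4, 12.1, 8.2, 7.4, 7.8, 8.3, 10.3, 8.4)
d <- x - 10                  # deviations from mu0 = 10
sum(rank(abs(d))[d > 0])     # 8, matching V from wilcox.test(x, mu=10)
```

Only two observations (12.1 and 10.3) lie above 10, and their ranks among the |d| values sum to 8.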
Why I Love the Wilcoxon Test

[Figure: simulated power vs. µ at α = .05 for samples of size n = 20, one panel each for a Normal Sample (sd=1), a t-Dist Sample (df=2), and a Mixture of 90% N(mu,sd=1) and 10% N(mu,sd=15). Curves: t-test (theoretical under normality), t-test (simulated, "t"), Wilcoxon (simulated, "w").]

Why I Love It Even More

[Figure: simulated power vs. µ for the t-test ("t") and Wilcoxon test ("w"), one panel for each small sample size n = 2 through n = 10.]
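A simulation in the spirit of these figures — settings are assumed here, since the original script is not shown — estimates power as the rejection rate over repeated samples:

```r
# Estimated power of the t-test vs. the Wilcoxon test at shift mu (sketch)
sim.power <- function(rsamp, mu = 1, n = 20, B = 2000, alpha = 0.05) {
  t.rej <- w.rej <- 0
  for (i in 1:B) {
    x <- mu + rsamp(n)                  # sample n errors, shift by mu
    if (t.test(x)$p.value < alpha)      t.rej <- t.rej + 1
    if (wilcox.test(x)$p.value < alpha) w.rej <- w.rej + 1
  }
  c(t = t.rej / B, w = w.rej / B)
}
sim.power(rnorm)                      # normal errors: the two tests are close
sim.power(function(n) rt(n, df = 2))  # heavy tails: Wilcoxon typically ahead
```

Sweeping mu over a grid and plotting the two rejection rates reproduces curves of this kind.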
Two-Sample Wilcoxon Test

> wilcox.test(y.yes, y.no)

	Wilcoxon rank sum test with continuity correction

data:  y.yes and y.no
W = 2878, p-value = 0.001830
alternative hypothesis: true mu is not equal to 0

> wilcox.test(log(y.yes), log(y.no))

	Wilcoxon rank sum test with continuity correction

data:  log(y.yes) and log(y.no)
W = 2878, p-value = 0.001830
alternative hypothesis: true mu is not equal to 0
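The two calls above return identical W and p-values because a rank test sees the data only through its ranks, and a strictly increasing transform such as log leaves ranks unchanged:

```r
# Ranks are invariant under strictly increasing transforms
set.seed(1)
y <- rexp(20)
stopifnot(identical(rank(y), rank(log(y))))
```

So, unlike the t-test, there is no need to agonize over whether to transform first.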
Power of Shapiro Test

> rejects <- 0
> for (i in 1:10000) {
+     x <- rnorm(30, mean=45, sd=20)
+     if (shapiro.test(x)$p.value < 0.05) rejects <- rejects + 1
+ }
> rejects
[1] 530
> rejects <- 0
> for (i in 1:10000) {
+     x <- rgamma(30, shape=2, rate=20)
+     if (shapiro.test(x)$p.value < 0.05) rejects <- rejects + 1
+ }
> rejects
[1] 7513
> rejects <- 0
> for (i in 1:10000) {
+     x <- 30+45*rt(30, df=4)
+     if (shapiro.test(x)$p.value < 0.05) rejects <- rejects + 1
+ }
> rejects
[1] 3249
Power of Shapiro Test

[Figure: normal Q-Q plots of one sample of each type from the simulation: Normal Sample (Shapiro p=0.24), Gamma Sample (Shapiro p=0.000063), t Sample (Shapiro p=0.097).]

Multigroup Location Tests

> names(PlantGrowth)
[1] "weight" "group"
> levels(PlantGrowth$group)
[1] "ctrl" "trt1" "trt2"
> oneway.test(weight ~ group, data=PlantGrowth)

	One-way analysis of means (not assuming equal variances)

data:  weight and group
F = 5.181, num df = 2.000, denom df = 17.128, p-value = 0.01739

> oneway.test(weight ~ group, data=PlantGrowth, var.equal=T)

	One-way analysis of means

data:  weight and group
F = 4.8461, num df = 2, denom df = 27, p-value = 0.01591

> summary(aov(weight ~ group, data=PlantGrowth))
            Df  Sum Sq Mean Sq F value  Pr(>F)
group        2  3.7663  1.8832  4.8461 0.01591 *
Residuals   27 10.4921  0.3886
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> kruskal.test(weight ~ group, data=PlantGrowth)

	Kruskal-Wallis rank sum test

data:  weight by group
Kruskal-Wallis chi-squared = 7.9882, df = 2, p-value = 0.01842
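All three analyses agree that the group means differ, but not which groups differ. A natural follow-up (not shown on the slide) is pairwise t-tests with multiplicity-adjusted p-values:

```r
# Which pairs of groups differ? Holm-adjusted pairwise t-tests (the default)
pairwise.t.test(PlantGrowth$weight, PlantGrowth$group)
```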
Testing Two-Way Contingencies

> table(matlab, stat404)
      stat404
matlab  -  n  y
     0  5  9 17
     1  0  6  5
     2  1  1  1
     3  0  0  2
> chisq.test(table(matlab, stat404))

	Pearson’s Chi-squared test

data:  table(matlab, stat404)
X-squared = 6.3824, df = 6, p-value = 0.3817

Warning message:
Chi-squared approximation may be incorrect in:
	chisq.test(table(matlab, stat404))

> fisher.test(table(matlab, stat404))

	Fisher’s Exact Test for Count Data

data:  table(matlab, stat404)
p-value = 0.3786
alternative hypothesis: two.sided

> table(r, stat404)
   stat404
r    -  n  y
  0  2  4  0
  1  3  4  1
  2  0  7 14
  3  1  1 10
> chisq.test(table(r, stat404))

	Pearson’s Chi-squared test

data:  table(r, stat404)
X-squared = 21.9431, df = 6, p-value = 0.00124

Warning message:
Chi-squared approximation may be incorrect in:
	chisq.test(table(r, stat404))

> fisher.test(table(r, stat404))

	Fisher’s Exact Test for Count Data

data:  table(r, stat404)
p-value = 0.0001139
alternative hypothesis: two.sided
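The "approximation may be incorrect" warning refers to small expected cell counts (the usual rule of thumb wants them all at least 5). Rebuilding the first table by hand — a sketch, since the original survey data isn't available — makes the problem visible:

```r
# Rebuild the matlab-by-stat404 table and inspect the expected counts
tab <- matrix(c(5, 9, 17,
                0, 6,  5,
                1, 1,  1,
                0, 0,  2),
              nrow = 4, byrow = TRUE,
              dimnames = list(matlab = 0:3, stat404 = c("-", "n", "y")))
suppressWarnings(chisq.test(tab))$expected  # several cells are well below 5
```

This is exactly the situation where fisher.test is the safer choice.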
Bigger Example

> library(MASS)
> names(Melanoma)
[1] "time"      "status"    "sex"       "age"       "year"
[6] "thickness" "ulcer"
> attach(Melanoma)
> table(sex, ulcer)
   ulcer
sex  0  1
  0 79 47
  1 36 43
> chisq.test(table(sex, ulcer))

	Pearson’s Chi-squared test with Yates’ continuity correction

data:  table(sex, ulcer)
X-squared = 5.1099, df = 1, p-value = 0.02379

> fisher.test(table(sex, ulcer))

	Fisher’s Exact Test for Count Data

data:  table(sex, ulcer)
p-value = 0.02061
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
 1.089871 3.700065
sample estimates:
odds ratio
  2.000701

For Unpaired (Independent) Samples

Independent, iid samples:
X1, . . . , X20 ∼ N(µX, 1)
Y1, . . . , Y20 ∼ N(µY, 1)
At α = 0.05,

[Figure: Simulated Power for Unpaired Samples (n=20) — power vs. difference in means for the paired (wrong!, "p"), unpaired (okay, "u"), and Welch (okay, "w") t-tests.]
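A sketch of the kind of simulation behind this figure (the sample size, shift, and replication count are assumed):

```r
# Rejection rates of three analyses on genuinely independent samples
set.seed(1)
B <- 2000; n <- 20; delta <- 1
rej <- c(paired = 0, unpaired = 0, welch = 0)
for (i in 1:B) {
  x <- rnorm(n); y <- rnorm(n, mean = delta)
  rej["paired"]   <- rej["paired"]   + (t.test(x, y, paired = TRUE)$p.value < 0.05)
  rej["unpaired"] <- rej["unpaired"] + (t.test(x, y, var.equal = TRUE)$p.value < 0.05)
  rej["welch"]    <- rej["welch"]    + (t.test(x, y)$p.value < 0.05)
}
rej / B   # pairing independent data mainly wastes degrees of freedom
```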
For Paired Data

(X1, Y1), . . . , (X20, Y20) iid bivariate normal.
At α = 0.05,

• Positive correlation (usual) ρ = +0.9

[Figure: Simulated Power for Paired Data (n=20), ρ = +0.9 — power vs. difference in means for the paired (okay, "p"), unpaired (wrong!, "u"), and Welch (wrong!, "w") t-tests.]

• Negative correlation (unusual) ρ = −0.9

[Figure: Simulated Power for Paired Data (n=20), ρ = −0.9 — the same three analyses.]

Summary of Tests

Location tests:

• t.test: t-tests for one sample, paired, two-sample (equal variance and Welch’s approximation). Can also be used for approximate Z-tests.

• wilcox.test: Wilcoxon signed-rank test for one sample or paired data, Wilcoxon/Mann-Whitney rank-sum test for two samples.

• oneway.test: F-test for difference in means of two or more samples (equal variance and Welch’s approximation).

• kruskal.test: Kruskal-Wallis rank-sum test for difference in means of two or more samples.

Categorical data tests:

• chisq.test: χ²-test for one-way and interaction in two-way contingency tables.

• fisher.test: Fisher’s exact test for interaction in two-way contingency tables.

Other tests:

• shapiro.test: Shapiro-Wilk test for H0: sample is normal.

• var.test: F-test for H0: samples have equal variances.