2.3.4 Fisher’s exact test

HELP example: see 2.6.3
SAS
proc freq data=ds;
tables x * y / exact;
run;
or
proc freq data=ds;
tables x * y;
exact fisher / mc n=bnum;
run;
Note: The former requests only the exact p-value; the latter generates a Monte Carlo p-value, an asymptotically equivalent test based on bnum random tables simulated using the
observed margins.
R
fisher.test(y, x)
or
fisher.test(ymat)
Note: The fisher.test() command can accept either two class vectors or a matrix of
counts (here denoted by ymat). For tables with many rows and/or columns, p-values can
be computed by Monte Carlo simulation with the simulate.p.value option.
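As a minimal sketch of the matrix form, the following uses made-up counts. The name ymat follows the text, while bigmat and all of the counts are hypothetical and purely illustrative. It also shows the simulate.p.value option on a larger table, where the argument B sets the number of Monte Carlo replicates.

```r
# Illustrative 2x2 table of counts (hypothetical data)
ymat <- matrix(c(3, 1, 1, 3), nrow = 2,
               dimnames = list(x = c("a", "b"), y = c("no", "yes")))
res <- fisher.test(ymat)
res$p.value      # exact two-sided p-value

# Larger (3x4) table of hypothetical counts: Monte Carlo p-value
set.seed(42)     # make the simulated p-value reproducible
bigmat <- matrix(c(4, 2, 1, 3, 5, 2, 1, 2, 6, 2, 3, 4), nrow = 3)
res_mc <- fisher.test(bigmat, simulate.p.value = TRUE, B = 10000)
res_mc$p.value   # estimate based on B simulated tables
```

Setting a seed before the simulated version makes the Monte Carlo p-value reproducible from run to run.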
2.3.5 McNemar’s test
McNemar’s test evaluates the null hypothesis that the proportions are equal across matched
pairs, for example, when two raters assess a population.
SAS
proc freq data=ds;
tables x * y / agree;
run;
R
mcnemar.test(y, x)
Note: The mcnemar.test() command can accept either two class vectors or a matrix with
counts.
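The matrix form might look like the sketch below, which assumes a hypothetical 2x2 table of paired ratings (the object name ratings and the counts are invented for illustration). The statistic depends only on the two discordant cells, and R applies a continuity correction by default for 2x2 tables.

```r
# Hypothetical paired ratings: rows = rater 1, columns = rater 2
ratings <- matrix(c(20, 5, 15, 10), nrow = 2,
                  dimnames = list(rater1 = c("neg", "pos"),
                                  rater2 = c("neg", "pos")))
res <- mcnemar.test(ratings)   # continuity correction applied by default
res$statistic                  # (|15 - 5| - 1)^2 / (15 + 5)
res$p.value

# Same test without the continuity correction
res_nc <- mcnemar.test(ratings, correct = FALSE)
res_nc$statistic               # (15 - 5)^2 / (15 + 5)
```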
2.4 Two sample tests for continuous variables
2.4.1 Student’s t-test
SAS
proc ttest data=ds;
class x;
var y;
run;
HELP example: see 2.6.4
Note: The variable x takes on two values. The output contains both equal and unequal-variance t-tests, as well as a test of the null hypothesis of equal variance.
R
t.test(y1, y2)
or
t.test(y ~ x)
Note: The t.test() command can take two vectors (y1 and y2) to compare, as in the first
example, or, as in the second, a single vector corresponding to the outcome (y) together
with another vector indicating group membership (x), using a formula
interface (see sections B.4.6 and 3.1.1). By default, the two-sample t-test uses an unequal
variance assumption. The option var.equal=TRUE can be added to specify an equal variance
assumption. The command var.test() can be used to formally test equality of variances.
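The options described in this note can be compared side by side. The sketch below assumes simulated data (the vectors x and y are generated here only for illustration) and uses the formula interface throughout.

```r
set.seed(1)
x <- rep(c("a", "b"), each = 20)                  # group membership (hypothetical)
y <- rnorm(40, mean = ifelse(x == "a", 0, 0.5))   # simulated outcome

welch  <- t.test(y ~ x)                    # default: unequal variances (Welch)
pooled <- t.test(y ~ x, var.equal = TRUE)  # equal variance assumption
ftest  <- var.test(y ~ x)                  # F test of equality of variances

welch$parameter    # Welch degrees of freedom (generally non-integer)
pooled$parameter   # degrees of freedom: 20 + 20 - 2 = 38
ftest$p.value
```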
2.4.2 Nonparametric tests
SAS
proc npar1way data=ds wilcoxon edf median;
class y;
var x;
run;
HELP example: see 2.6.4
Note: Many tests can be requested as options to the proc npar1way statement. Here we
show a Wilcoxon test, a Kolmogorov–Smirnov test, and a median test, respectively. Exact
tests can be generated by using an exact statement with these names, e.g., the exact
median statement will generate the exact median test.
R
wilcox.test(y1, y2)
ks.test(y1, y2)
library(coin)
median_test(y ~ x)
Note: By default, the wilcox.test() function uses a continuity correction in the normal
approximation for the p-value. The ks.test() function does not calculate an exact p-value when there are ties. The median test shown will generate an exact p-value with the
distribution="exact" option.
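A small illustration of the base R functions, assuming two short hypothetical samples without ties (the values of y1 and y2 are invented). With small, untied samples wilcox.test() computes an exact p-value by default. The normal approximation can be requested with exact=FALSE, and its continuity correction switched off with correct=FALSE.

```r
# Hypothetical samples with no ties
y1 <- c(1.1, 2.3, 3.8, 4.0, 5.2)
y2 <- c(6.5, 7.1, 8.4, 9.9, 10.3)

w <- wilcox.test(y1, y2)    # small samples, no ties: exact p-value
w$p.value                   # 2 / choose(10, 5), as the samples do not overlap

# Normal approximation, without the continuity correction
w_norm <- wilcox.test(y1, y2, exact = FALSE, correct = FALSE)
w_norm$p.value

k <- ks.test(y1, y2)        # two-sample Kolmogorov-Smirnov test
k$statistic                 # D = 1: the empirical CDFs are fully separated
```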
2.4.3 Permutation test
SAS
proc npar1way data=ds;
class y;
var x;
exact scores=data;
run;
HELP example: see 2.6.4
or
proc npar1way data=ds;
class y;
var x;
exact scores=data / mc n=bnum;
run;
Note: Any test described in 2.4.2 can be named in place of scores=data to get an exact
test based on those statistics. The mc option generates an empirical p-value (asymptotically
equivalent to the exact p-value) based on bnum Monte Carlo replicates.
R
library(coin)
oneway_test(y ~ as.factor(x), distribution=approximate(B=bnum))
Note: The oneway_test function in the coin package implements a variety of permutation-based
tests (see also the exactRankTests package). The distribution=approximate syntax generates an empirical p-value (asymptotically equivalent to the exact p-value) based
on bnum Monte Carlo replicates.
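To make the idea of Monte Carlo replicates concrete, here is a hand-rolled sketch in base R. It is not the coin implementation: the data are simulated, the helper mean_diff is hypothetical, the statistic is a simple difference in group means, and the add-one p-value is one common convention that may differ in detail from what coin reports.

```r
set.seed(123)
x <- rep(c(0, 1), each = 10)                      # group indicator (hypothetical)
y <- c(rnorm(10, mean = 0), rnorm(10, mean = 1))  # simulated outcome
bnum <- 2000                                      # number of Monte Carlo replicates

# Test statistic: difference in group means (illustrative helper)
mean_diff <- function(y, x) mean(y[x == 1]) - mean(y[x == 0])
observed <- mean_diff(y, x)

# Recompute the statistic after randomly shuffling the group labels
permuted <- replicate(bnum, mean_diff(y, sample(x)))

# Two-sided empirical p-value (add-one form, so it is never exactly 0)
pval <- (sum(abs(permuted) >= abs(observed)) + 1) / (bnum + 1)
pval
```

Shuffling the labels while holding the outcomes fixed mimics the null hypothesis that group membership is unrelated to the outcome.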
2.4.4 Logrank test
HELP example: see 2.6.5
See also 5.1.19 (Kaplan–Meier plot) and 4.3.1 (Cox proportional hazards model)
SAS
proc phreg data=ds;
model timevar*cens(0) = x;
run;
or
proc lifetest data=ds;
time timevar*cens(0);
strata x;
run;
Note: If cens is equal to 0, then proc phreg and proc lifetest treat timevar as the time of
censoring; otherwise it is treated as the time of the event. The default output from proc lifetest
includes the logrank and Wilcoxon tests. Other tests, corresponding to different weight
functions, can be produced with the test option to the strata statement. These include
test=fleming(ρ1, ρ2), a superset of the G-rho family of Fleming and Harrington [23], which
simplifies to the G-rho family when ρ2 = 0.
R
library(survival)
survdiff(Surv(timevar, cens) ~ x)
Note: Other tests within the G-rho family of Fleming and Harrington [23] are supported by
specifying the rho option.
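A minimal sketch of the rho option, assuming a small hypothetical data set (the values of timevar, cens, and x are invented, with cens equal to 1 for an observed event and 0 for censoring). The default rho=0 gives the logrank test, and rho=1 corresponds to the Peto and Peto modification of the Gehan-Wilcoxon test.

```r
library(survival)

# Hypothetical data: cens = 1 for an observed event, 0 for censoring
timevar <- c(5, 8, 12, 14, 20, 3, 4, 6, 9, 11)
cens    <- c(1, 0, 1, 1, 0, 1, 1, 1, 0, 1)
x       <- rep(c("A", "B"), each = 5)

logrank <- survdiff(Surv(timevar, cens) ~ x)           # rho = 0 (default)
peto    <- survdiff(Surv(timevar, cens) ~ x, rho = 1)  # weights early events more

logrank$chisq
# p-value from a chi-square distribution with (number of groups - 1) df
1 - pchisq(logrank$chisq, df = 1)
peto$chisq
```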
2.5 Further resources
Comprehensive introductions to using SAS to fit common statistical models can be found
in [9] and [15]. Similar methods in R are accessibly presented in [95]. Efron and Tibshi-