Two-Sample Hypothesis Tests in R

Example (dependent samples). A Calculus professor gives their students a 10-question algebra pretest on the first day of class, and a similar test towards the end of the course. The idea is that while taking Calculus, maybe the students would remember some long-forgotten things from algebra. The results are below.

Student     1  2  3  4  5  6  7
pretest     1  8  9  5  7  3 10
posttest    2  9  8  6  8  5 10

Determine whether the students performed significantly better on the posttest, using alpha = .05. Assume differences in scores would be normally distributed, and that the only plausible alternate hypothesis of interest is an improvement on the posttest (a semester of Calculus surely won't make students worse at doing algebra!).

Solution 1.

> # create a data frame with the pretest and posttest data
> pretest = c(1,8,9,5,7,3,10)
> posttest = c(2,9,8,6,8,5,10)
> Calculus = data.frame(pretest, posttest)
> Calculus
  pretest posttest
1       1        2
2       8        9
3       9        8
4       5        6
5       7        8
6       3        5
7      10       10
> # compute the differences, posttest minus pretest
> d = posttest - pretest
> d
[1]  1  1 -1  1  1  2  0
> # test whether d tends to be positive, against the null that the mean of d is 0
> t.test(d, alternative = 'greater', mu = 0, conf.level = .95)

        One Sample t-test

data:  d
t = 1.9868, df = 6, p-value = 0.04707
alternative hypothesis: true mean is greater than 0
95 percent confidence interval:
 0.01568146        Inf
sample estimates:
mean of x
0.7142857

Since the p-value is less than the chosen alpha, we reject the null hypothesis (that there is no difference between the pre- and post-tests) and conclude that overall the students did significantly better on the posttest at the 5% level.

Solution 2. This can be done a bit more efficiently.

> t.test(Calculus$posttest, Calculus$pretest, alternative = 'greater', conf.level = .95, paired = TRUE)

        Paired t-test

data:  Calculus$posttest and Calculus$pretest
t = 1.9868, df = 6, p-value = 0.04707
alternative hypothesis: true difference in means is greater than 0
95 percent confidence interval:
 0.01568146        Inf
sample estimates:
mean of the differences
              0.7142857

The general syntax is t.test(dataset1, dataset2, alternative, conf.level, paired). The first two (ordered) arguments are the sets of data to be used (here, two vectors of the data frame we've named Calculus). Setting alternative = 'greater' specifies that the alternate hypothesis is that the mean of dataset1 is greater than the mean of dataset2. Putting paired = TRUE tells R that we are directly comparing (side-by-side, if you will) each pair of data points in the two chosen vectors.
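As a quick check, here is a short sketch (mine, not part of the original transcript) reproducing the paired t-statistic directly from its formula, t = dbar/(s_d/sqrt(n)); the name t.stat is just for illustration.

> d = c(1, 1, -1, 1, 1, 2, 0)                   # the differences from above
> t.stat = mean(d) / (sd(d) / sqrt(length(d)))  # dbar divided by its standard error
> t.stat                                        # reproduces the t = 1.9868 reported above
> pt(t.stat, df = length(d) - 1, lower.tail = FALSE)  # one-sided p-value, 0.04707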
On the other hand, sometimes a side-by-side comparison of data in this manner would be nonsensical or impossible.

Example. Dr. Smith is teaching two sections of statistics, with 15 and 19 students respectively. The grades on an exam are as follows.

Section 1: 100, 95, 90, 90, 90, 90, 85, 83, 80, 79, 71, 71, 70, 66, 48
Section 2: 100, 100, 100, 100, 98, 98, 98, 93, 93, 90, 86, 83, 81, 79, 79, 76, 61, 48, 41

One of the classes asks if they did significantly better or worse on the exam than the other class. Using alpha = .10, what should Smith tell them?

Doing some calculation (I'll omit the R code which creates the necessary vectors; a sketch of it appears at the end of this discussion)...

> numSummary(section1)
     mean       sd 0% 25% 50% 75% 100%  n
 80.53333 13.46353 48  71  83  90  100 15
> numSummary(section2)
     mean       sd 0% 25% 50% 75% 100%  n
 84.42105 17.65193 41  79  90  98  100 19

The question is now whether this roughly 4-point difference in mean score between the two classes is any big deal statistically. However, a side-by-side comparison of the exam scores would make no sense in this situation (these are two independent samples of data!).

> t.test(section1, section2, alternative = "two.sided", paired = FALSE, var.equal = FALSE)

        Welch Two Sample t-test

data:  section1 and section2
t = -0.7284, df = 31.977, p-value = 0.4716
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -14.759210   6.983771
sample estimates:
mean of x mean of y
 80.53333  84.42105

We use a two-sided test since there was no initial inclination that section 1 was superior (or inferior) to section 2. We set paired = FALSE since we do not want a side-by-side comparison of the data (in fact, R will inform you that this makes no damn sense if you ask it for the paired test). Setting var.equal = FALSE means we do not assume that the two sets of data have the same variance (same standard deviation); this is a technical issue in calculating the t-statistic, and many textbooks would call this assumption "not pooling the data". In any event, since the p-value is not less than alpha, we do not reject the null, and conclude there's no significant difference between the exam results in the two classes. (Note that the printed interval is the default 95% confidence interval; for the test itself we simply compare the p-value to our chosen alpha of .10.)

About using the t.test

In the previous example,

> t.test(section1, section2, paired = FALSE)

gives the output

        Welch Two Sample t-test

data:  section1 and section2
t = -0.7284, df = 31.977, p-value = 0.4716
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -14.759210   6.983771
sample estimates:
mean of x mean of y
 80.53333  84.42105

Note that not specifying an alternate hypothesis defaults to a two-tailed test, and var.equal automatically defaults to FALSE, as these are the most conservative assumptions.
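For completeness, here is a sketch (mine, not from the original transcript) of the omitted setup code; the scores are copied from the problem statement. Note that numSummary() is not part of base R; I am assuming it comes from the RcmdrMisc package used alongside R Commander. The last lines illustrate the Welch-Satterthwaite formula that produces the fractional df = 31.977 seen above.

> section1 = c(100,95,90,90,90,90,85,83,80,79,71,71,70,66,48)
> section2 = c(100,100,100,100,98,98,98,93,93,90,86,83,81,79,79,76,61,48,41)
> # library(RcmdrMisc)   # assumed source of numSummary()
> # Welch df: (v1+v2)^2 / (v1^2/(n1-1) + v2^2/(n2-1)), where vi = si^2/ni
> v1 = var(section1)/length(section1)
> v2 = var(section2)/length(section2)
> (v1 + v2)^2 / (v1^2/(length(section1)-1) + v2^2/(length(section2)-1))  # about 31.977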
Example. Comparing proportions in two different samples: the artificially simple example. A drug company has manufactured a topical compound which is meant to cure certain skin infections. 200 patients with infections are given the treatment, and it is found that in 145 of them, the condition had vanished within six days of the treatment. 200 patients with the infection are given a placebo, and 158 of them see the condition vanish within six days. Test whether the cure rate with the treatment is significantly different than with the placebo, using alpha = .01.

> prop.test(c(145,158), c(200,200), alternative = "two.sided", conf.level = .99)

        2-sample test for equality of proportions with continuity correction

data:  c(145, 158) out of c(200, 200)
X-squared = 1.9598, df = 1, p-value = 0.1615
alternative hypothesis: two.sided
99 percent confidence interval:
 -0.18008092  0.05008092
sample estimates:
prop 1 prop 2
 0.725  0.790

Because of the high p-value, we do not reject the null; this sample does not convince us that the treatment is significantly different from the placebo at the 1% significance level.

Example. Comparing proportions in two different samples: a more 'real' example. Here, I've created a data frame called 'Success'. The idea is that a group of 20 students took a course, and the gender of each student was noted, as was whether or not they passed the course. The question is whether there is a significant difference in the pass rate between males and females.

> Success
   gender passed
1       M      Y
2       M      N
3       M      N
4       F      Y
5       F      N
6       F      Y
7       F      Y
8       F      N
9       M      Y
10      M      N
11      F      Y
12      F      Y
13      F      N
14      M      N
15      F      N
16      F      N
17      M      Y
18      F      N
19      F      Y
20      F      Y

Given a data frame with non-numeric entries, the table command will create a 2x2 table summarizing the relationships between two of the variables in the data frame. The syntax is intuitive, so I won't comment further on that.

> table(Success$gender, Success$passed)

    N Y
  F 6 7
  M 4 3

You can see that there were 13 females, of which 7 passed, and there were 7 males, of which 3 passed. Here, it should be apparent that the sample sizes are far too small to make any conclusions, but let's proceed anyway.

> prop.test(table(Success$gender, Success$passed), correct = FALSE)
Warning in prop.test(table(Success$gender, Success$passed), correct = FALSE) :
  Chi-squared approximation may be incorrect

        2-sample test for equality of proportions without continuity correction

data:  table(Success$gender, Success$passed)
X-squared = 0.2198, df = 1, p-value = 0.6392
alternative hypothesis: two.sided
95 percent confidence interval:
 -0.5657762  0.3459960
sample estimates:
   prop 1    prop 2
0.4615385 0.5714286

In case it wasn't obvious (the difference in pass rate between men and women is only 11% in a sample of size twenty!), the high p-value tells us that on the basis of this sample, we cannot conclude that there is any significant difference in pass rate between men and women. Note that correct = FALSE means we used no continuity correction; but since many of the cell counts were small in our tabled data, R warns us that not using the continuity correction is probably a bad idea (in fact, the p-value is probably even higher than .6392; see the sketch below).
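To see where the continuity correction enters, here is a sketch (mine, not from the original notes) reproducing the X-squared statistic of the drug example by hand; the names x, n, p.pool, se, and d.cc are just for illustration. The last line reruns the pass-rate test with the correction turned on, which, consistent with R's warning, gives an even larger p-value.

> x = c(145, 158); n = c(200, 200)
> p.pool = sum(x)/sum(n)                            # pooled proportion, 303/400 = 0.7575
> se = sqrt(p.pool*(1 - p.pool)*(1/n[1] + 1/n[2]))  # null standard error of the difference
> d.cc = abs(x[1]/n[1] - x[2]/n[2]) - 0.5*(1/n[1] + 1/n[2])  # |0.725 - 0.790|, less the correction
> (d.cc/se)^2                                       # about 1.9598, matching X-squared above
> prop.test(table(Success$gender, Success$passed), correct = TRUE)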