BMED2803 BV

Two Sample t-Test

Dependent Samples

Formulas. The sample from population 1, $X_{11}, X_{12}, \dots, X_{1n}$, is paired with the sample from population 2, $X_{21}, X_{22}, \dots, X_{2n}$, such that $(X_{1i}, X_{2i})$ represents the $i$th pair. Usually, the observations in a pair are taken on the same subject or on dependent subjects (brother-sister, two subjects with the same IQ, etc.). Examples are numerous: pretest-posttest, measurement before treatment and measurement after treatment, etc. It is assumed that the samples come from normal populations with possibly different means (the subject of the test) and with the same, but unknown, variances. Then the differences $d_i = X_{1i} - X_{2i}$ are also normal. Define

$$t = \frac{\bar{d}}{s_d/\sqrt{n}},$$

where $\bar{d}$ is the average of the differences $d_i$ and $s_d$ is the sample standard deviation of the differences. The sample size $n$ is equal to the number of pairs. We are interested in testing $H_0: \mu_1 = \mu_2$ versus one of the alternatives $H_1: \mu_1 >, \neq, < \mu_2$, and the test statistic $t$ has Student's t distribution with $n - 1$ degrees of freedom when $H_0$ is true. This test coincides with the one-sample t-test in which the sample consists of all the differences and we test that the mean in the population of differences is equal to 0.

Alternative                                          α-level rejection region                                              p-value
$H_1: \mu_1 > \mu_2$, i.e., $\mu_1 - \mu_2 > 0$      $[t_{n-1,1-\alpha}, \infty)$                                          1-tcdf(t, n-1)
$H_1: \mu_1 \neq \mu_2$, i.e., $\mu_1 - \mu_2 \neq 0$  $(-\infty, t_{n-1,\alpha/2}] \cup [t_{n-1,1-\alpha/2}, \infty)$         2*tcdf(-abs(t), n-1)
$H_1: \mu_1 < \mu_2$, i.e., $\mu_1 - \mu_2 < 0$      $(-\infty, t_{n-1,\alpha}]$                                           tcdf(t, n-1)

Controlling Blood Pressure. In the past, many bodily functions were thought to be beyond conscious control. However, recent experimentation suggests that it may be possible for a person to control certain body functions if that person is trained in a program of biofeedback exercises. An experiment is conducted to show that blood pressure levels can be consciously reduced in people trained in this program.
The blood pressure measurements (in millimeters of mercury) listed in the table represent readings before and after the biofeedback training of five subjects.

Subject    1     2     3     4     5
Before     137   201   167   150   173
After      130   180   150   153   162

(a) If we want to test whether the mean blood pressure decreases after the training, what are the appropriate null and alternative hypotheses?
(b) Perform the test in (a) with α = 0.05.
(c) What assumptions are needed to assure the validity of the results?

[(a) $H_0: \mu_1 = \mu_2$ versus $H_1: \mu_1 > \mu_2$ or, in terms of differences, $H_0: \mu_1 - \mu_2 = 0$ versus $H_1: \mu_1 - \mu_2 > 0$.
(b) To match the alternative $H_1$, the differences $d_i$ should be taken as $X_{1i} - X_{2i}$. Here, $d = \{7, 21, 17, -3, 11\}$, $\bar{d} = 10.6$, $s_d = 9.32$, $t = 10.6/(9.32/\sqrt{5}) = 2.54$, and $t_{4,0.95} = 2.131847$. Since $2.54 > 2.13$, $H_0$ is rejected at the 5% level.
(c) The populations are normal and the variances are the same.]

Marijuana. Investigators have studied the effects of marijuana on human physiology. One common belief held by laypersons is that marijuana affects pupil size. Weil et al. [1] studied a number of subjects. Each was administered a high dose of marijuana by smoking a potent marijuana cigarette. The subjects were all males, 21 to 26 years of age, all of whom smoked tobacco cigarettes regularly but had never tried marijuana. In this study, pupil size was measured with a millimeter rule under constant illumination with eyes focused on an object at a constant distance. Pupil size was measured before and after smoking marijuana. Part of the data is given below.

Individual    Before marijuana    After marijuana
1             6                   6
2             5                   7
3             3                   9
4             3                   5
5             5                   9
6             3                   9

1. Describe the hypotheses of interest for testing. (Hint: the alternative should be one-sided.)
2. What is the type II error (error of the second kind) in terms of the problem?
3. Perform the test at the 5% significance level.
4. You assumed the data come from normal populations. Why, then, can you not use z cut-points?

IQ test pairing. In a study, children were first given an IQ test.
The two lowest-scoring children were randomly assigned, one to a "noun-first" task, the other to a "noun-last" task. The two next-lowest IQ children were similarly assigned, one to the "noun-first" task, the other to the "noun-last" task, and so on until all children were assigned. The data (scores on a word-recall task) are shown here, listed in order from lowest to highest IQ score.

Noun-first    Noun-last
12            10
21            12
12            23
16            14
20            16
39            8
26            16
29            22
30            32
35            13
38            32
34            35

1. Are these two samples (Noun-first, Noun-last) independent?
2. Test the hypothesis that the population mean difference is 0, assuming the two-sided alternative. Take α = 10%. The following information may be useful: the sample mean of the differences is 6.583 and the sample standard deviation of the differences is 11.041.

% Noun First Example
disp('Noun First Example')
nounfirst = [12 21 12 16 20 39 26 29 30 35 38 34];
nounlast  = [10 12 23 14 16  8 16 22 32 13 32 35];
d = nounfirst - nounlast;
dbar = mean(d)                  %dbar = 6.5833
sd = std(d)                     %sd = 11.0409
n = length(d)                   %n = 12
t = dbar/(sd/sqrt(n))           %t = 2.0655
pval = 1-tcdf(t, n-1)           %pval = 0.0316 (one-sided)
pval2 = 2*tcdf(-abs(t), n-1)    %two-sided p-value, twice the one-sided value (about 0.063)

[1] Weil, A. T., Zinberg, N. E., and Nelson, J. (1968). Clinical and psychological effects of marijuana in man. Science, 162, 1234-1242.

Fatigue. According to the article "Practice and Fatigue Effects on the Programming of a Coincident Timing Response," published in the Journal of Human Movement Studies in 1976, practice under fatigued conditions distorts mechanisms which govern performance. An experiment was conducted using 15 college males who were trained to make a continuous horizontal right-to-left arm movement from a micro-switch to a barrier, knocking over the barrier coincident with the arrival of a clock sweephand at the 6 o'clock position. The absolute value of the difference between the time, in milliseconds, that it took to knock over the barrier and the time for the sweephand to reach the 6 o'clock position (500 msec) was recorded.
Each participant performed the task five times under pre-fatigue and post-fatigue conditions, and the sums of the absolute differences for the five performances were recorded as follows:

           Absolute time differences (msec)
Subject    Pre-fatigue    Post-fatigue
1          158            91
2          92             59
3          65             215
4          98             226
5          33             223
6          89             91
7          148            92
8          58             177
9          142            134
10         117            116
11         74             153
12         66             219
13         109            143
14         57             164
15         85             100

An increase in the mean absolute time differences when the task is performed under post-fatigue conditions would support the claim that practice under fatigued conditions distorts mechanisms that govern performance. Assuming the populations to be normally distributed, test this claim at level α = 0.01.

Presidents. "No man who ever held the office of President would congratulate a friend on obtaining it." JOHN ADAMS.

In this partial list of American presidents two variables are recorded: X, life expectancy after 1st inauguration, and Y, actual years lived after 1st inauguration. Test the hypothesis that the number of actual years lived is substantially smaller than the life expectancy. Use α = 0.05.

Name                   X - life expectancy          Y - actual years lived
                       after 1st inauguration       after 1st inauguration
Andrew Johnson         17.2                         10.3
Ulysses Grant          22.8                         16.4
Rutherford Hayes       18.0                         15.9
James Garfield         21.2                         0.5
Chester Arthur         20.1                         5.2
Grover Cleveland       22.1                         23.3
Benjamin Harrison      17.2                         12.0
William McKinley       18.2                         4.5
Theodore Roosevelt     26.1                         17.3
William Taft           20.3                         21.2
Woodrow Wilson         17.1                         10.9
Warren Harding         18.1                         2.4
Calvin Coolidge        21.4                         9.4
Herbert Hoover         19.0                         35.6
Franklin Roosevelt     21.7                         12.1
Harry Truman           15.3                         27.7
Dwight Eisenhower      14.7                         16.2
John Kennedy           28.5                         2.8
Lyndon Johnson         19.3                         9.2

lexp = [17.2 22.8 18.0 21.2 20.1 22.1 17.2 18.2 26.1 20.3 17.1 ...
    18.1 21.4 19.0 21.7 15.3 14.7 28.5 19.3];
ylived = [10.3 16.4 15.9 0.5 5.2 23.3 12.0 4.5 17.3 21.2 10.9 ...
    2.4 9.4 35.6 12.1 27.7 16.2 2.8 9.2];
d = lexp - ylived;
n = length(d)
dbar = mean(d)
sd = std(d)
t = dbar/(sd/sqrt(n))
pval = 1 - tcdf(t, n-1)
% n = 19; dbar = 6.6000; sd = 10.3425; t = 2.7816; pval = 0.0062

Independent Samples

We consider testing the equality of two normal means when the variances are not known and the populations/samples are independent. Assume we observed $X_{11}, X_{12}, \dots, X_{1,n_1}$ from a population with distribution $N(\mu_1, \sigma_1^2)$ and $X_{21}, X_{22}, \dots, X_{2,n_2}$ from $N(\mu_2, \sigma_2^2)$. We are interested in testing the hypothesis $H_0: \mu_1 = \mu_2$ versus the alternative $H_1: \mu_1 >, \neq, < \mu_2$ at significance level α. There are two scenarios, depending on the population variances.

Scenario 1: Variances unknown but assumed equal. In this case the common variance $\sigma^2$ is estimated by both $s_1^2$ and $s_2^2$. The weighted average of $s_1^2$ and $s_2^2$ is a better estimator than either individual $s^2$, and the weights depend on the sample sizes:

$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{n_1 - 1}{n_1 + n_2 - 2}\, s_1^2 + \frac{n_2 - 1}{n_1 + n_2 - 2}\, s_2^2 = w s_1^2 + (1 - w) s_2^2.$$

One can show that when $H_0$ is true, i.e., when $\mu_1 = \mu_2$, the statistic

$$t = \frac{\bar{X}_1 - \bar{X}_2}{s_p \sqrt{1/n_1 + 1/n_2}}$$

has Student's t distribution with $df = n_1 + n_2 - 2$ degrees of freedom.

Scenario 2: No assumption about the variances. In this case, when $H_0$ is true, i.e., when $\mu_1 = \mu_2$, the statistic

$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}$$

has a t distribution with approximately

$$df = \frac{(s_1^2/n_1 + s_2^2/n_2)^2}{(s_1^2/n_1)^2/(n_1 - 1) + (s_2^2/n_2)^2/(n_2 - 1)}$$

degrees of freedom. This is a special case of the so-called Welch-Satterthwaite formula, which approximates the degrees of freedom for a linear combination of chi-square distributions.

In both cases:

Alternative                                          α-level rejection region                                        p-value
$H_1: \mu_1 > \mu_2$, i.e., $\mu_1 - \mu_2 > 0$      $[t_{df,1-\alpha}, \infty)$                                     1-tcdf(t, df)
$H_1: \mu_1 \neq \mu_2$, i.e., $\mu_1 - \mu_2 \neq 0$  $(-\infty, t_{df,\alpha/2}] \cup [t_{df,1-\alpha/2}, \infty)$     2*tcdf(-abs(t), df)
$H_1: \mu_1 < \mu_2$, i.e., $\mu_1 - \mu_2 < 0$      $(-\infty, t_{df,\alpha}]$                                      tcdf(t, df)

Exposure to lead.
To verify the hypothesis that blood lead levels tend to be higher for children whose parents work in a factory that uses lead in the manufacturing process, researchers examined lead levels in the blood of 12 children whose parents worked in a battery manufacturing factory. The results for the "case children" $X_{11}, X_{12}, \dots, X_{1,12}$ are compared to a "control" sample $X_{21}, X_{22}, \dots, X_{2,15}$ consisting of children selected randomly from families where the parents did not work in a factory that uses lead. The resulting sample means and sample standard deviations were $\bar{X}_1 = .010$, $s_1 = .004$, $\bar{X}_2 = .006$, and $s_2 = .006$.

(i) Formulate the hypotheses to be tested and use the one-sided alternative.
(ii) Perform the test of hypotheses from (i) at the level α = 0.05. To select the test, analyze the equality of population variances, also at the α = 0.05 level.
(iii) Find the 95% confidence interval for the difference in population means, $\mu_1 - \mu_2$.
(iv) What power does this test have against the alternative $H_1: \mu_1 - \mu_2 = 0.005$?
(v) In designing a future experiment to test the same phenomenon as in (i), it is desired that the α = 5% test achieve a power of $1 - \beta = 90\%$ against the alternative $H_1: \mu_1 - \mu_2 = 0.005$. What sample size is needed?

Solution:

(i) Only two research hypotheses make sense in this context, $H_1: \mu_1 > \mu_2$ and $H_1: \mu_1 \neq \mu_2$. The one-sided alternative leads to a more precise analysis. Thus, we will test $H_0: \mu_1 = \mu_2$ versus $H_1: \mu_1 > \mu_2$.

(ii) Before testing for the equality of means we need to make an assumption about the variances. This is guided by an additional test of equality of variances. The test for variances is described in your text [pages 310-317].

To test $H_0: \sigma_1^2 = \sigma_2^2$ versus $H_1: \sigma_1^2 \neq \sigma_2^2$, we compute the ratio $F = s_1^2/s_2^2$, which has Fisher's F distribution with $n_1 - 1$ and $n_2 - 1$ degrees of freedom. Then we find the p-value as 2*fcdf(F, n1-1, n2-1) if the statistic F < 1 and 2*(1-fcdf(F, n1-1, n2-1)) if the statistic F > 1.
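One way to convince yourself that the two branches of this rule are consistent is to swap the sample labels: F becomes 1/F, the degrees of freedom trade places, and the other branch applies, yet the p-value should not change. The following is only a sketch under assumptions: `ftest2` is a hypothetical helper name introduced here for illustration, and `fcdf` is assumed to be available from the Statistics Toolbox.

```matlab
% Sketch: two-sided p-value for H0: sigma1^2 = sigma2^2 from the F ratio.
% ftest2 is a hypothetical helper (not from the text): it uses the lower
% tail when F < 1 and the upper tail otherwise, as in the rule above.
ftest2 = @(F, df1, df2) (F < 1)  * 2*fcdf(F, df1, df2) + ...
                        (F >= 1) * 2*(1 - fcdf(F, df1, df2));

s1 = 0.004; n1 = 12;    % case children
s2 = 0.006; n2 = 15;    % control children

p12 = ftest2(s1^2/s2^2, n1-1, n2-1)   % F = 0.4444 < 1, lower-tail branch
p21 = ftest2(s2^2/s1^2, n2-1, n1-1)   % F = 2.25 > 1, upper-tail branch
% p12 and p21 agree up to rounding, so the labeling of the samples is immaterial
```

The agreement follows from the fact that if F has an F distribution with (a, b) degrees of freedom, then 1/F has an F distribution with (b, a) degrees of freedom.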
In our case F was less than 1, and

n1 = 12; X1bar = 0.010; s1 = 0.004;
n2 = 15; X2bar = 0.006; s2 = 0.006;
Fstat = s1^2/s2^2
%% Fstat = 0.4444
% Since Fstat < 1 the p-value is
pval = 2 * fcdf(0.4444, n1-1, n2-1)
%% pval = 0.1825

Guided by the test of variances, we assume that the population variances are the same and use the t statistic with the pooled standard deviation. The test statistic is

$$t = \frac{\bar{X}_1 - \bar{X}_2}{s_p \sqrt{1/n_1 + 1/n_2}}, \quad \text{where} \quad s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}},$$

and it is t-distributed with $n_1 + n_2 - 2$ degrees of freedom.

sp = sqrt( ((n1-1)*s1^2 + (n2-1)*s2^2)/(n1 + n2 - 2) )
%% sp = 0.0052
df = n1 + n2 - 2
%% df = 25
tstat = (X1bar - X2bar)/(sp * sqrt(1/n1 + 1/n2))
%% tstat = 1.9803
pvalue = 1 - tcdf(tstat, n1+n2-2)
%% pvalue = 0.0294, approx 3%

The null hypothesis is rejected at the 5% level. If we wanted to use the rejection-region method, the alternative is one-sided and the rejection region is $RR = [t_{n_1+n_2-2,1-\alpha}, \infty)$.

tinv(1-0.05, df)
%% ans = 1.7081

By the rejection-region argument, the hypothesis $H_0$ is rejected since $t > t_{n_1+n_2-2,1-\alpha}$, that is, 1.9803 > 1.7081.

(iii) The expression for the confidence interval for the difference of population means $\mu_1 - \mu_2$ follows from the form of the t statistic in (ii); see the text, pages 308-309. The confidence interval of confidence level $(1 - \alpha) \times 100\%$ is

$$\left[\bar{X}_1 - \bar{X}_2 - t_{n_1+n_2-2,1-\alpha/2}\, s_p \sqrt{1/n_1 + 1/n_2},\;\; \bar{X}_1 - \bar{X}_2 + t_{n_1+n_2-2,1-\alpha/2}\, s_p \sqrt{1/n_1 + 1/n_2}\right].$$

LB = X1bar - X2bar - tinv(0.975, df)*sp * sqrt(1/n1 + 1/n2)
%% LB = -0.00016
UB = X1bar - X2bar + tinv(0.975, df)*sp * sqrt(1/n1 + 1/n2)
%% UB = 0.0082

(iv) Equation 8.28, modified for the one-sided alternative (text page 333, $z_\alpha = -z_{1-\alpha}$),

$$1 - \beta = \Phi\left(z_\alpha + \frac{\Delta}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}\right),$$

gives the power of the one-sided, level α test against the alternative $H_1: \mu_1 - \mu_2 = 0.005\ (= \Delta)$. The normal approximation is used, and $s_1^2$ and $s_2^2$ are plugged in place of $\sigma_1^2$ and $\sigma_2^2$.

power = normcdf( norminv(0.05) + 0.005/sqrt(s1^2/n1 + s2^2/n2) )
%% power = 0.8271

Thus the power is about 83%.
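The power formula in (iv) can be evaluated for several values of Δ at once, which makes its behavior easier to see. The following is a sketch only: the names `powerfun` and `deltas` are introduced here for illustration and are not part of the original solution, and `normcdf` and `norminv` are assumed to be available from the Statistics Toolbox. At Δ = 0 the formula returns α itself, which is a quick sanity check, and at Δ = 0.005 it reproduces the power of about 0.83.

```matlab
% Sketch: power of the one-sided, level-alpha test as a function of Delta,
% using the normal approximation with s1, s2 plugged in for sigma1, sigma2.
% powerfun and deltas are illustrative names (assumptions, not from the text).
n1 = 12; s1 = 0.004;
n2 = 15; s2 = 0.006;
alpha = 0.05;

se = sqrt(s1^2/n1 + s2^2/n2);           % standard error of X1bar - X2bar
powerfun = @(delta) normcdf(norminv(alpha) + delta/se);

deltas = [0 0.0025 0.005];              % alternatives mu1 - mu2 = Delta
powers = powerfun(deltas)               % powers(1) equals alpha; powers(3) is about 0.8271
```

Because Φ is increasing, the power grows with Δ: alternatives farther from the null are easier to detect with the same sample sizes.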
(v) The sample-size calculation is prospective in nature, and we assume that $\sigma_1^2$ and $\sigma_2^2$ are known and equal to $s_1^2$ and $s_2^2$ from the study (now considered to be a pilot study). Formula 8.26 (text page 332), adjusted for the one-sided test, is

$$n = \frac{(\sigma_1^2 + \sigma_2^2)(z_{1-\alpha} + z_{1-\beta})^2}{\Delta^2},$$

which in MATLAB gives

ssize = (s1^2 + s2^2)*(norminv(0.95) + norminv(0.9))^2/(0.005^2)
%% ssize = 17.8128, approx 18 each

The number of children needed is 18 per group if equal sample sizes are desired, $n_1 = n_2$. Consult the book for the case when $n_2 = k \times n_1$, if such sampling is desired.

Stress, Diet and Acids. In the study "Interrelationships Between Stress, Dietary Intake, and Plasma Ascorbic Acid During Pregnancy" conducted at the Virginia Polytechnic Institute and State University, the plasma ascorbic acid levels of pregnant women were compared for smokers versus non-smokers. Thirty-two women in the last three months of pregnancy, free of major health disorders, and ranging in age from 15 to 32 years were selected for the study. Prior to the collection of 20 ml of blood, the participants were told to avoid breakfast, forego their vitamin supplements, and avoid foods high in ascorbic acid content. From the blood samples, the following plasma ascorbic acid values of each subject were determined in milligrams per 100 milliliters:

Plasma Ascorbic Acid Values
Non-smokers        Smokers
0.97    1.06       0.48
0.72    0.86       0.81
1.00    0.85       0.98
0.81    0.58       0.68
0.62    0.57       1.18
1.22    0.64       1.36
1.24    0.98       0.88
0.89    1.09       1.64
0.90    0.92
0.74    0.78
0.88    1.14
0.94    1.18

Propose statistical inference.

nonsmo = [0.97 0.72 1.00 0.81 0.62 1.32 1.24 0.99 ...
    0.90 0.74 0.88 0.94 1.06 0.86 0.85 0.58 0.57 ...
    0.64 0.98 1.09 0.92 0.78 1.14 1.18];
smo = [0.48 0.81 0.98 0.68 1.18 1.36 0.78 1.64];
% test hypothesis that the plasma ascorbic acid levels are different for
% the two groups. Use alpha=0.05.
X1bar = mean(nonsmo); s1 = std(nonsmo); n1 = length(nonsmo);
X2bar = mean(smo); s2 = std(smo); n2 = length(smo);
% s1 = 0.2045, s2 = 0.3833; we will check for equality of variances
F = s1^2/s2^2    % is smaller than 1
pval1 = 2*fcdf(F, n1-1, n2-1)
% pval1 = 0.0208 < 5% and we will not assume equality of variance in
% comparing the two means.
% The "nasty" df for the t test is [text p. 318]
ndf = (s1^2/n1 + s2^2/n2)^2 / ( (s1^2/n1)^2/(n1-1) + (s2^2/n2)^2/(n2-1) )
t = (X1bar - X2bar)/sqrt( s1^2/n1 + s2^2/n2 )
pval = 2*tcdf(-abs(t), ndf)
% ndf = 8.3689; t = -0.5730; pval = 0.5817
% the two means are not significantly different at level 5%

Satterthwaite, F. E. (1946). An approximate distribution of estimates of variance components. Biometrics Bulletin, 2, 110-114.

Welch, B. L. (1947). The generalization of "Student's" problem when several different population variances are involved. Biometrika, 34, 28-35.