Two-Sample Designs
Q560: Experimental Methods in Cognitive Science
Lecture 9

Why not z-test: An Example
It is thought that we are genetically hardwired to recognize human faces. In a preferential looking paradigm, newborns are presented with two stimuli: one representing a face, and one containing the same features in a different configuration. The experimenter records how long the infants look at the face stimulus during a 60-second presentation (let's assume they always look at one or the other).
By chance, we would expect them to look at the face stimulus for only 30 seconds, but they look for 35 seconds. Is this effect significant?

Sample Variance
We don't know the variability of the population, but we do know the variability of the sample.
Sample variance: $s^2 = \frac{SS}{n-1} = \frac{SS}{df}$
Sample standard deviation: $s = \sqrt{s^2}$

Estimated Standard Error
We can use the estimated standard error as an estimate of the real standard error.
Standard error: $\sigma_M = \frac{\sigma}{\sqrt{n}} = \sqrt{\frac{\sigma^2}{n}}$
Estimated standard error: $s_M = \frac{s}{\sqrt{n}} = \sqrt{\frac{s^2}{n}}$

t Statistic: Definition
Substituting the estimated standard error into the formula for the z-score gives us the following:
$t = \frac{M - \mu}{s_M}$
The t statistic approximates a z-score, using the sample variance instead of the population variance (which is unknown). How well does that work?

Degrees of Freedom and t Statistic
Degrees of freedom describe the number of scores in a sample that are free to vary.
degrees of freedom = df = n - 1
The greater the df, the better the t statistic approximates the z-score.
The set of t statistics for a given df (n) forms a t distribution.
For large df (large n), the t distribution approximates the normal distribution.

t Distribution: Shape
[Figure: shape of the t distribution]

Hypothesis Tests Using the t Statistic
Same procedure as with z-scores, except using the t statistic instead.
Step 1: State the hypotheses in terms of the population parameter µ.
Step 2: Determine the critical region, using α and df, and looking up t.
Step 3: Collect data and calculate the value of t using the estimated standard error.
Step 4: Decide, based on whether the t value for the sample falls within the critical region.

One-Sample t-Test: An Example
We'll go back to our preferential looking paradigm and newborn babies. We show them the two stimuli for 60 seconds and measure how long they look at the facial configuration.
Our null hypothesis is that they will not look at it for longer than half the time (µ ≤ 30; the test uses the boundary value µ = 30).
Our alternative hypothesis is that they will look at the face stimulus longer, because face recognition is hardwired in their brain, not learned (directional).
Our sample of n = 26 babies looks at the face stimulus for M = 35 seconds, s = 16 seconds.
Test our hypotheses (α = .05, one-tailed).

Step 1: Hypotheses
In sentences:
Null: Babies look at the face stimulus for less than or equal to half the time.
Alternative: Babies look at the face stimulus for more than half the time.
In symbols:
$H_0: \mu \le 30$
$H_1: \mu > 30$

Step 2: Determine Critical Region
The population variance is not known, so use the sample variance to estimate it.
n = 26 babies; df = n - 1 = 25
Look up the value of t at the limit of the critical region in the table of critical values of t.
Set α = .05, one-tailed: $t_{crit} = +1.708$

Step 3: Calculate t statistic from sample
a) Sample variance: $s^2 = 16^2 = 256$
b) Estimated standard error: $s_M = \sqrt{\frac{s^2}{n}} = \sqrt{\frac{256}{26}} = 3.14$
c) t statistic: $t = \frac{M - \mu}{s_M} = \frac{35 - 30}{3.14} = 1.59$

Step 4: Decision and Conclusion
The obtained $t_{obt} = 1.59$ does not exceed $t_{crit} = 1.708$.
∴ We must retain the null hypothesis.
Conclusion: Babies do not look at the face stimulus longer than expected by chance, t(25) = +1.59, n.s., one-tailed. Our results do not support the hypothesis that face processing is innate.
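The arithmetic in Steps 3 and 4 of the one-sample example can be checked with a few lines of Python. This is a minimal sketch and not part of the original lecture: the function name `one_sample_t` is hypothetical, and the critical value is copied from the lecture's table lookup rather than computed.

```python
import math


def one_sample_t(mean, sd, n, mu0):
    """Return (t, df) for a one-sample t test from summary statistics."""
    # Estimated standard error: s_M = sqrt(s^2 / n)
    se = math.sqrt(sd ** 2 / n)
    # t = (M - mu) / s_M
    t = (mean - mu0) / se
    return t, n - 1


# Values from the preferential-looking example: M = 35, s = 16, n = 26, mu = 30
t_obt, df = one_sample_t(mean=35, sd=16, n=26, mu0=30)

# Critical value copied from the lecture's t table (df = 25, alpha = .05, one-tailed)
T_CRIT = 1.708
reject_null = t_obt > T_CRIT

print(f"t({df}) = {t_obt:.2f}, reject H0: {reject_null}")
```

With the lecture's numbers this gives t(25) = 1.59 and does not reject H0, in agreement with Step 4.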
Two-sample designs

T-Tests with Unknown Populations
So far, we have focused on comparing a sample to a population to see whether the (treated) sample differs from the (expected) population.
More commonly, we are interested in determining whether two samples come from different populations:
•  experimental vs. control group
•  pure text vs. animated text .ppt
We need to use different forms of the t-test depending on whether we are analyzing data from a between-subjects or a within-subjects design.
Recall:
Between-subjects (independent-measures) designs involve two (or more) groups of different individuals.
Within-subjects (repeated-measures) designs involve two (or more) sets of scores from the same individuals.

t Statistic for Independent-Measures Design
The goal of an independent-measures research study: to evaluate the difference of the means between two populations (or between two treatments).
Mean of first population: $\mu_1$
Mean of second population: $\mu_2$
Difference between the means: $\mu_1 - \mu_2$

Hypothesis test:
Null hypothesis: no change = no effect = no difference
$H_0: \mu_1 - \mu_2 = 0$
Alternative hypothesis: there is a difference
$H_1: \mu_1 - \mu_2 \ne 0$

The formula:
t = (data - hypothesis) / error
$t = \frac{(M_1 - M_2) - (\mu_1 - \mu_2)}{s_{(M_1 - M_2)}}$
where $s_{(M_1 - M_2)}$ is the standard error of the difference between the sample means.

But there is a problem: a standard error that gives both sample variances equal weight is appropriate only when $n_1 = n_2$. It is not appropriate when $n_1 \ne n_2$, because variances obtained from larger samples tend to be better estimates than variances obtained from smaller samples.
→ Averaging, or pooling, of the variances.

One sample: sample variance $s^2 = \frac{SS}{df}$
Two samples (pooled variance): $s_p^2 = \frac{SS_1 + SS_2}{df_1 + df_2}$

Formula for the two-sample (independent-measures) standard error:
$s_{(M_1 - M_2)} = \sqrt{\frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}}$
Formula for the independent-measures t statistic:
$t = \frac{(M_1 - M_2) - (\mu_1 - \mu_2)}{s_{(M_1 - M_2)}}$

[Table: comparison of the t statistic for single-sample and independent-measures designs]

Value for degrees of freedom: $df = df_1 + df_2$
Now we're ready to use the independent-measures t statistic to test hypotheses about differences between population means (using differences between sample means).

Hypothesis Testing: An Example
Research question: Does the use of mental images help memory?
Experiment: Two groups of subjects are given a single list of 40 pairs of nouns for 5 minutes: dog/bicycle, chair/rug, book/flower, etc.
All subjects are instructed to memorize the list. Subjects in group 1 are instructed to form a mental image for each of the pairs. Subjects in group 2 are given no further instructions.
Later, both groups of subjects are given a memory test. Here are the results (in number of pairs recalled):
Group 1 (IMAGES): 18, 31, 19, 29, 23, 26, 29, 21, 30, 24. $M_1 = 25$, $SS_1 = 200$
Group 2 (NO IMAGES): 24, 13, 23, 17, 16, 20, 17, 15, 19, 26. $M_2 = 19$, $SS_2 = 160$
Note: n = 10 for both groups.

Realize: This is an independent-measures design!
Step 1:
("mental images have no effect") $H_0: \mu_1 - \mu_2 = 0$
("mental images have an effect") $H_1: \mu_1 - \mu_2 \ne 0$
Set α = .05.

Step 2:
$df = df_1 + df_2 = 9 + 9 = 18$ for this example.
Look up the t distribution for df = 18, α = .05 (two-tailed). The boundaries are t = ±2.101.

Step 3: Obtain data (see above), then calculate the t statistic.
a) Find the pooled variance:
$s_p^2 = \frac{SS_1 + SS_2}{df_1 + df_2} = \frac{200 + 160}{9 + 9} = 20$
b) Use the pooled variance to compute the standard error:
$s_{(M_1 - M_2)} = \sqrt{\frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}} = \sqrt{\frac{20}{10} + \frac{20}{10}} = 2$
c) Now use the standard error to calculate the t statistic for the data:
$t = \frac{(M_1 - M_2) - (\mu_1 - \mu_2)}{s_{(M_1 - M_2)}} = \frac{M_1 - M_2}{s_{(M_1 - M_2)}} = \frac{25 - 19}{2} = 3$

Step 4: Make a decision.
In this case, t = 3.00 is in the critical region → reject $H_0$.
Write paper! Report the result like this:
The group using mental images recalled more word pairs (M = 25, SD = 4.71) than the group that did not use mental images (M = 19, SD = 4.22). This difference was significant, t(18) = 3.00, p < .05, two-tailed.

[Figure: visualizing the distributions]

Directional Tests
State the hypotheses in terms of a prediction, or expectation, about the outcome.
Step 1: In our previous example:
$H_0: \mu_{images} \le \mu_{no\ images}$
$H_1: \mu_{images} > \mu_{no\ images}$
Step 2: When locating the critical region, there is ONE tail only! Does the sample mean difference go in the right direction (favoring $H_1$)? If yes, continue. If no, retain $H_0$.
df = 18, α = .05 → t = 1.734
Step 3: The data give t(18) = 3.00, which is greater than the boundary of the critical region.
Step 4: Decision → reject $H_0$.

Dependent Samples t-test

Repeated-Measures and Matched-Subjects
Definition: A repeated-measures study is one in which a single sample of individuals is measured more than once on the same dependent variable.
Main benefit: the two sets of data are from the same subjects.
A matched-subjects study attempts to simulate a repeated-measures study by matching the subjects in two groups.

t Statistic for Related Samples
The t statistic for related samples is based on difference scores.
difference score = $D = X_2 - X_1$
Example (4 subjects): $X_1$ = score before treatment, $X_2$ = score after treatment

Hypothesis Tests for Related Samples
What are we interested in?
Population of difference scores: $\mu_D$
Hypotheses:
Null hypothesis $H_0: \mu_D = 0$
Alternative hypothesis $H_1: \mu_D \ne 0$

Remember, the single-sample t statistic:
$t = \frac{M - \mu}{s_M}$
Now, the repeated-measures t statistic:
$t = \frac{M_D - \mu_D}{s_{M_D}}$

The calculation of the standard error is analogous to the single-sample case, except that difference scores (not raw scores) are used.
$s^2 = \frac{SS}{n-1} = \frac{SS}{df}$
$s_{M_D} = \frac{s}{\sqrt{n}} = \sqrt{\frac{s^2}{n}}$

Hypothesis Testing: An Example
Posner, M. I., & Mitchell, R. F. (1967). Chronometric analysis of classification. Psychological Review, 74, 392-409.
Physical matching instructions: AA or EE are correct, Aa or AE are incorrect.
Name matching instructions: AA or Aa are correct, AE or Ae are incorrect.

Data (fictitious):
Subject   XName   XPhys   D    D²
A         309     304     5    25
B         304     301     3    9
C         305     305     0    0
D         304     300     4    16
E         305     301     4    16
∑D = 16, ∑D² = 66, $M_D = 3.2$, SS = 14.8

Realize: This is a repeated-measures design!
Step 1:
("there is no difference between physical and name matching instructions") $H_0: \mu_D = 0$
("there is a difference between physical and name matching instructions") $H_1: \mu_D \ne 0$
Set α = .05 (two-tailed).

Step 2:
df = n - 1 = 4 for this example.
Look up the t distribution for df = 4, α = .05. The boundaries are t = ±2.776.

Step 3: Obtain data; calculate the t statistic.
$s^2 = \frac{SS}{n-1} = \frac{14.8}{4} = 3.7$
$s_{M_D} = \sqrt{\frac{s^2}{n}} = \sqrt{\frac{3.7}{5}} = 0.86$
$t = \frac{M_D - \mu_D}{s_{M_D}} = \frac{3.2 - 0}{0.86} = 3.72$

Step 4: Make a decision.
In this case, t = 3.72 exceeds the boundary of 2.776: reject $H_0$.
Name matching instructions produced an increase in matching time over physical matching instructions (mean difference M = 3.2, SD = 1.92). This difference was statistically significant, t(4) = 3.72, p < .05, two-tailed.
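Both two-sample worked examples in this lecture (mental imagery, independent measures; name vs. physical matching, repeated measures) can be reproduced from the raw scores. The sketch below is an illustration using only the Python standard library and is not part of the original slides: the function names `sum_of_squares`, `independent_t`, and `related_t` are hypothetical, and the result of each test is still judged against the critical values looked up in the lecture's t table.

```python
import math
from statistics import mean


def sum_of_squares(scores):
    """SS: sum of squared deviations from the sample mean."""
    m = mean(scores)
    return sum((x - m) ** 2 for x in scores)


def independent_t(group1, group2):
    """Pooled-variance t statistic and df for H0: mu1 - mu2 = 0."""
    df1, df2 = len(group1) - 1, len(group2) - 1
    # Pooled variance: (SS1 + SS2) / (df1 + df2)
    pooled_var = (sum_of_squares(group1) + sum_of_squares(group2)) / (df1 + df2)
    # Standard error of the difference between the sample means
    se = math.sqrt(pooled_var / len(group1) + pooled_var / len(group2))
    return (mean(group1) - mean(group2)) / se, df1 + df2


def related_t(scores_a, scores_b):
    """t statistic and df on difference scores D = a - b for H0: mu_D = 0."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    # Variance of the difference scores, then estimated standard error of M_D
    var_d = sum_of_squares(diffs) / (n - 1)
    se = math.sqrt(var_d / n)
    return mean(diffs) / se, n - 1


# Independent-measures example: number of noun pairs recalled
images = [18, 31, 19, 29, 23, 26, 29, 21, 30, 24]
no_images = [24, 13, 23, 17, 16, 20, 17, 15, 19, 26]
t_ind, df_ind = independent_t(images, no_images)

# Repeated-measures example (fictitious data): D = XName - XPhys
x_name = [309, 304, 305, 304, 305]
x_phys = [304, 301, 305, 300, 301]
t_rel, df_rel = related_t(x_name, x_phys)

print(f"independent measures: t({df_ind}) = {t_ind:.2f}")
print(f"repeated measures:    t({df_rel}) = {t_rel:.2f}")
```

With the lecture's data this yields t(18) = 3.00 for the imagery example and t(4) = 3.72 for the matching example, both beyond the two-tailed boundaries of ±2.101 and ±2.776 given above.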