Two-Sample Designs Q560: Experimental Methods in Cognitive Science Lecture 9

Why Not a z-Test? An Example:
It is thought that we are genetically hardwired to
recognize human faces.
In a preferential looking paradigm, newborns are
presented with two stimuli: one representing a
face, and one containing the same features, but
in a different configuration. The experimenter
records how long the infants look at the face
stimulus during a 60-sec presentation (let's
assume they always look at one or the other).
By chance, we would expect them to look at
the face stimulus for only 30 seconds, but they look
for 35 seconds. Is this effect significant?
Sample Variance
We don't know the variability of the population.
But: we do know the variability of the sample.
Sample variance: $s^2 = \frac{SS}{n-1} = \frac{SS}{df}$
Sample standard deviation: $s = \sqrt{s^2}$
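As a minimal sketch of these definitions, the Python below computes SS, the sample variance, and the sample standard deviation. The scores are hypothetical and the function names are illustrative; neither comes from the lecture.

```python
import math

def sum_of_squares(scores):
    """SS: sum of squared deviations from the sample mean."""
    m = sum(scores) / len(scores)
    return sum((x - m) ** 2 for x in scores)

def sample_variance(scores):
    """s^2 = SS / df, with df = n - 1."""
    return sum_of_squares(scores) / (len(scores) - 1)

scores = [2, 4, 4, 6, 9]          # hypothetical scores, n = 5
ss = sum_of_squares(scores)       # SS
s2 = sample_variance(scores)      # sample variance
s = math.sqrt(s2)                 # sample standard deviation
print(ss, s2, s)
```

Dividing by n - 1 rather than n is what makes this the sample (not population) variance.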
Estimated Standard Error
We can use the estimated standard error as an
estimate of the real standard error.
Standard error: $\sigma_M = \frac{\sigma}{\sqrt{n}} = \sqrt{\frac{\sigma^2}{n}}$
Estimated standard error: $s_M = \frac{s}{\sqrt{n}} = \sqrt{\frac{s^2}{n}}$
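A short sketch of the estimated standard error, assuming a hypothetical sample standard deviation of s = 10. It illustrates how the estimate shrinks as n grows; the function name is illustrative only.

```python
import math

def estimated_standard_error(s, n):
    """s_M = s / sqrt(n): standard error estimated from the sample SD."""
    return s / math.sqrt(n)

# Hypothetical sample SD of 10 with increasing sample sizes.
for n in (4, 25, 100):
    print(n, estimated_standard_error(10, n))
```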
t Statistic: Definition
Substituting the estimated standard error in the
formula for the z-score gives us the following:
t statistic: $t = \frac{M - \mu}{s_M}$
The t-statistic approximates a z-score, using the
sample variance instead of the population variance
(which is unknown).
How well does that work?
Degrees of Freedom and t Statistic
Degrees of freedom describes the number of
scores in a sample that are free to vary.
degrees of freedom = df = n-1
The greater df, the better the t-statistic
approximates the z-score.
The set of t statistics for a given df (n) forms a t
distribution.
For large df (large n) the t distribution
approximates the normal distribution.
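To illustrate how the t distribution approaches the normal as df grows, here is a rough sketch that finds one-tailed critical values numerically, using only the Python standard library. The integration scheme (Simpson's rule plus bisection) is a choice made for this example, not a method from the lecture; a statistics package would normally be used instead.

```python
import math
from statistics import NormalDist

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(t * t / df))

def t_upper_tail(t, df, steps=1000):
    """P(T > t) for t >= 0, integrating the density on [0, t] (Simpson's rule)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

def t_critical(alpha, df):
    """One-tailed critical value: the t with P(T > t) = alpha, by bisection."""
    lo, hi = 0.0, 20.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > alpha:
            lo = mid          # tail still too large: move right
        else:
            hi = mid
    return (lo + hi) / 2

z_crit = NormalDist().inv_cdf(0.95)   # one-tailed z boundary for alpha = .05
for df in (4, 18, 25, 120):
    print(df, round(t_critical(0.05, df), 3))
print("z", round(z_crit, 3))
```

The critical values decrease toward the z boundary as df increases, which is why a t table matters most for small samples.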
t Distribution: Shape
Hypothesis Tests Using the t Statistic
Same procedure as with z-scores, except using the
t statistic instead.
Step 1: State hypothesis, in terms of population
parameter µ.
Step 2: Determine critical region, using α, df, and
looking up t.
Step 3: Collect data and calculate value for t using
estimated standard error.
Step 4: Decide, based on whether t value for
sample falls within critical region
One sample T-Test: An Example
We'll go back to our preferential looking paradigm
and newborn babies. We show them the two
stimuli for 60 seconds, and measure how long they
look at the facial configuration. Our null
assumption is that they will not look at it for longer
than half the time, µ = 30
Our alternate hypothesis is that they will look at
the face stimulus longer because face recognition is
hardwired in their brain, not learned (directional)
Our sample of n = 26 babies looks at the face
stimulus for M = 35 seconds, s = 16 seconds
Test our hypotheses (α = .05, one-tailed)
Step 1: Hypotheses
Sentence:
Null: Babies look at the face stimulus for less than
or equal to half the time
Alternate: Babies look at the face stimulus for
more than half the time
Code Symbols:
H0: µ ≤ 30
H1: µ > 30
Step 2: Determine Critical Region
Population variance is not known, so use sample
variance to estimate
n = 26 babies; df = n-1 = 25
Look up values for t at the limits of the critical
region from our critical values of t table
Set α = .05; one-tailed
tcrit = +1.708
Step 3: Calculate t statistic from sample
a) Sample variance: $s^2 = 16^2 = 256$
b) Estimated standard error: $s_M = \sqrt{\frac{s^2}{n}} = \sqrt{\frac{256}{26}} = 3.14$
c) t statistic: $t = \frac{M - \mu}{s_M} = \frac{35 - 30}{3.14} = 1.59$
Step 4: Decision and Conclusion
The obtained t = 1.59 does not exceed tcrit = 1.708
∴  We must retain the null hypothesis
Conclusion: Babies do not look at the face
stimulus longer than expected by chance, t(25) = +1.59,
n.s., one-tailed. Our results do not support the
hypothesis that face processing is innate.
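The four steps of this example can be sketched in a few lines of Python. The summary values (M = 35, s = 16, n = 26, µ = 30) and the critical value 1.708 are taken from the slides; the function name is illustrative.

```python
import math

def one_sample_t(m, mu, s, n):
    """t = (M - mu) / s_M, where s_M = sqrt(s^2 / n)."""
    s_m = math.sqrt(s ** 2 / n)   # estimated standard error
    return (m - mu) / s_m

t_obt = one_sample_t(m=35, mu=30, s=16, n=26)
t_crit = 1.708                    # from the t table: df = 25, alpha = .05, one-tailed

# Directional test: reject H0 only if t falls beyond the upper boundary.
decision = "reject H0" if t_obt > t_crit else "retain H0"
print(round(t_obt, 2), decision)
```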
Two-sample designs
T-Tests with Unknown Populations
So far, we have focused on comparing a sample to a
population to see if the (treated) sample differs from
the (expected) population
More commonly, we are interested in determining if
two samples are from different populations:
•  experimental vs. control group
•  pure text vs. animated text .ppt
We need to use different forms of the t-test
depending on whether we are analyzing data from a
between-subjects or within-subjects design
Recall:
Between-subjects (independent-measures) designs
involve two (or more) groups of different individuals
Within-subjects (repeated-measures) designs involve
two (or more) sets of scores from the same
individuals
t Statistic for Independent-Measures Design
The goal of an independent-measures research
study:
To evaluate the difference of the means between
two populations (or between two treatments).
Mean of first population: µ1
Mean of second population: µ2
Difference between the means:
µ1- µ2
Hypothesis Test:
Null hypothesis: no change = no effect = no
difference
H0: µ1- µ2 = 0
Alternative hypothesis: there is a difference
H1: µ1- µ2 ≠ 0
The formula:
$t = \frac{\text{data} - \text{hypothesis}}{\text{error}} = \frac{(M_1 - M_2) - (\mu_1 - \mu_2)}{\text{standard error}}$
standard error $= s_{(M_1 - M_2)}$
But, there is a problem:
The formula is limited to the case n1 = n2. It is not
appropriate for n1 ≠ n2, because variances
obtained from larger samples tend to be
better estimates than variances obtained from smaller
samples.
→ Solution: average (pool) the two variances, weighting each by its df.
One sample:
sample variance: $s^2 = \frac{SS}{df}$
Two samples (pooled variance):
$s_p^2 = \frac{SS_1 + SS_2}{df_1 + df_2}$
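A small sketch of why pooling matters when the sample sizes differ. The SS and df values are hypothetical, chosen so that the two sample variances are 10 and 20; the pooled value lands closer to the variance of the larger sample.

```python
def pooled_variance(ss1, df1, ss2, df2):
    """s_p^2 = (SS1 + SS2) / (df1 + df2)."""
    return (ss1 + ss2) / (df1 + df2)

# Hypothetical samples: n1 = 4 and n2 = 10.
ss1, df1 = 30, 3      # s1^2 = 30 / 3 = 10
ss2, df2 = 180, 9     # s2^2 = 180 / 9 = 20
sp2 = pooled_variance(ss1, df1, ss2, df2)

# Equivalent form: a df-weighted average of the two sample variances.
weighted = (df1 * (ss1 / df1) + df2 * (ss2 / df2)) / (df1 + df2)
simple_average = ((ss1 / df1) + (ss2 / df2)) / 2
print(sp2, weighted, simple_average)
```

With equal sample sizes the pooled variance and the simple average coincide, which is why the issue only shows up when n1 ≠ n2.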
Formula for two-sample (independent-measures)
standard error:
$s_{(M_1 - M_2)} = \sqrt{\frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}}$
Formula for independent-measures t statistic:
$t = \frac{(M_1 - M_2) - (\mu_1 - \mu_2)}{s_{(M_1 - M_2)}}$
Comparison of t statistic for single-sample and
independent-measures designs:
Value for degrees of freedom:
df = df1 + df2
Now we're ready to use the independent-measures t statistic to test hypotheses about
differences between population means (using
differences between sample means).
Hypothesis Testing: An Example
Research question:
Does use of mental images help memory?
Experiment:
Two groups of subjects are given a single list of 40
pairs of nouns for 5 minutes:
dog/bicycle
chair/rug
book/flower etc.
All subjects are instructed to memorize the list.
Subjects in group 1 are instructed to form a mental
image for each of the pairs. Subjects in group 2 are
given no further instructions.
Later both groups of subjects are given a memory
test. Here are the results (in number of pairs
recalled):
Group 1 (IMAGES): 18, 31, 19, 29, 23, 26, 29, 21,
30, 24.
M1 = 25
SS1 = 200
Group 2 (NO IMAGES): 24, 13, 23, 17, 16, 20, 17,
15, 19, 26.
M2 = 19
SS2 = 160
Note: n=10 for both groups.
Realize: This is an independent measures design!
Step 1:
(mental images have no effect)
H0: µ1- µ2 = 0
(mental images have an effect)
H1: µ1- µ2 ≠ 0
Set α=.05.
Step 2:
df = df1 + df2
df = 9 + 9 = 18 for this example
Look up the t distribution for df = 18, α = .05 (two-tailed).
Boundaries are t = ±2.101.
Step 3: Obtain data (see above),
then calculate t statistic.
a) Find pooled variance:
$s_p^2 = \frac{SS_1 + SS_2}{df_1 + df_2} = \frac{200 + 160}{9 + 9} = 20$
b) Use pooled variance to compute standard error:
$s_{(M_1 - M_2)} = \sqrt{\frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}} = \sqrt{\frac{20}{10} + \frac{20}{10}} = 2$
Step 3 (continued):
c) Now, use the standard error to
calculate the t statistic for the data:
$t = \frac{(M_1 - M_2) - (\mu_1 - \mu_2)}{s_{(M_1 - M_2)}} = \frac{M_1 - M_2}{s_{(M_1 - M_2)}} = \frac{25 - 19}{2} = 3$
Step 4: Make a decision. In this case, t = 3.00
is in the critical region → reject H0.
Write paper! Report result like this:
The group using mental images recalled more
words (M=25, SD=4.71) than the group that did
not use mental images (M=19, SD=4.22). This
difference was significant, t(18)=3.00, p<.05,
two-tailed.
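The whole mental-imagery example can be reproduced from the raw scores with a short Python sketch. The scores and the critical value ±2.101 come from the slides; the helper names are illustrative only.

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def ss(xs):
    """Sum of squared deviations from the mean."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs)

def independent_t(g1, g2, mu_diff=0.0):
    """Independent-measures t using the pooled variance; returns (t, df)."""
    df1, df2 = len(g1) - 1, len(g2) - 1
    sp2 = (ss(g1) + ss(g2)) / (df1 + df2)             # pooled variance
    se = math.sqrt(sp2 / len(g1) + sp2 / len(g2))     # standard error of M1 - M2
    t = ((mean(g1) - mean(g2)) - mu_diff) / se
    return t, df1 + df2

images = [18, 31, 19, 29, 23, 26, 29, 21, 30, 24]
no_images = [24, 13, 23, 17, 16, 20, 17, 15, 19, 26]

t_obt, df = independent_t(images, no_images)
t_crit = 2.101                                        # df = 18, alpha = .05, two-tailed
sd1 = math.sqrt(ss(images) / (len(images) - 1))       # SDs for the written report
sd2 = math.sqrt(ss(no_images) / (len(no_images) - 1))
print(round(t_obt, 2), df, abs(t_obt) > t_crit, round(sd1, 2), round(sd2, 2))
```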
Visualizing the distributions:
Directional Tests
State the hypotheses in terms of a
prediction, or expectation, about the
outcome…
Step 1: In our previous example:
H0: µimages ≤ µno images
H1: µimages > µno images
Step 2: When locating critical region, there is
ONE tail only!
Does the sample mean difference go in the right
direction (favoring H1)? If yes, continue… If
no, retain H0.
df = 18, α = .05 (one-tailed) → tcrit = 1.734
Step 3: Data give t(18) = 3.00. (t(18) is greater
than the boundary of the critical region)
Step 4: Decision → reject H0.
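The decision rule for a directional test can be sketched as a small function: first check that the difference goes in the predicted direction, then compare against the single-tail boundary. The function and its `predicted_sign` parameter are illustrative, not part of the lecture.

```python
def directional_decision(t_obt, t_crit, predicted_sign=+1):
    """One-tailed decision. t_crit is the positive boundary from the t table;
    predicted_sign is +1 if H1 predicts a positive difference, -1 otherwise."""
    if t_obt * predicted_sign <= 0:
        return "retain H0"        # difference is zero or in the wrong direction
    return "reject H0" if abs(t_obt) > t_crit else "retain H0"

print(directional_decision(3.00, 1.734))    # mental-imagery example, df = 18
print(directional_decision(-3.00, 1.734))   # same size, opposite direction
```

A large t in the unpredicted direction never leads to rejection under a one-tailed test.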
Dependent Samples t-test
Repeated-Measures and
Matched-Subjects
Definition:
A repeated-measures study is one in which a
single sample of individuals is measured more
than once on the same dependent variable.
Main benefit: two sets of data are from the same
subjects.
A matched-subjects study attempts to simulate
a repeated-measures study by matching two
groups of subjects.
t Statistic for Related Samples
t Statistic for related samples is based on
difference scores.
difference score = D = X2 – X1
Example:
X1 = score before treatment
X2 = score after treatment
4 subjects
Hypothesis Tests for Related Samples
What are we interested in?
The mean of the population of difference scores, µD
Hypotheses:
Null hypothesis H0: µD = 0
Alternative hypothesis H1: µD ≠ 0
Remember: single sample t statistic
$t = \frac{M - \mu}{s_M}$
Now: repeated-measures t statistic
$t = \frac{M_D - \mu_D}{s_{M_D}}$
Calculation of the standard error is analogous to
the single-sample case, except that difference
scores (not raw scores) are used.
$s^2 = \frac{SS}{n-1} = \frac{SS}{df}$
$s_{M_D} = \frac{s}{\sqrt{n}} = \sqrt{\frac{s^2}{n}}$
Hypothesis Testing: An Example
Posner, M.I., & Mitchell, R.F. (1967). Chronometric
analysis of classification. Psychological Review, 74,
392-409.
Physical matching instructions:
AA or EE are correct, Aa or AE are incorrect
Name matching instructions:
AA or Aa are correct, AE or Ae are incorrect
Data (fictitious), with D = XName – XPhys:
Subject   XName   XPhys   D    D2
A         309     304     5    25
B         304     301     3    9
C         305     305     0    0
D         304     300     4    16
E         305     301     4    16
∑D = 16
∑D2 = 66
MD = 3.2
SS = 14.8
Realize: This is a repeated-measures design!
Step 1:
(there is no difference between physical and
name matching instructions)
H0: µD = 0
(there is a difference between physical and
name matching instructions)
H1: µD ≠ 0
Set α=.05 (two-tailed).
Step 2:
df = n - 1 = 4 for the example
Look up t distribution for df=4, α=.05.
Boundaries are t = ±2.776.
Step 3: Obtain data; calculate t statistic.
$s^2 = \frac{SS}{n-1} = \frac{14.8}{4} = 3.7$
$s_{M_D} = \sqrt{\frac{s^2}{n}} = \sqrt{\frac{3.7}{5}} = 0.86$
$t = \frac{M_D - \mu_D}{s_{M_D}} = \frac{3.2 - 0}{0.86} = 3.72$
Step 4: Make a decision.
In this case: Reject H0
Name similarity instructions produced an
increase in matching time over
physical similarity instructions (MD = 3.2,
SD = 1.92). This difference was statistically
significant, t(4) = 3.72, p < .05, two-tailed.
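The repeated-measures example can be checked with a short Python sketch. It assumes the fictitious data are read as XName = 309, 304, 305, 304, 305 and XPhys = 304, 301, 305, 300, 301 (the column order is reconstructed from the slide), and it uses the table value ±2.776; the function name is illustrative.

```python
import math

def related_samples_t(x_a, x_b, mu_d=0.0):
    """Repeated-measures t computed on difference scores D = x_a - x_b.
    Returns (t, df, M_D, SD of D)."""
    d = [a - b for a, b in zip(x_a, x_b)]
    n = len(d)
    m_d = sum(d) / n                           # mean difference
    ss = sum((x - m_d) ** 2 for x in d)        # SS of the difference scores
    s2 = ss / (n - 1)                          # variance of the difference scores
    se = math.sqrt(s2 / n)                     # estimated standard error of M_D
    return (m_d - mu_d) / se, n - 1, m_d, math.sqrt(s2)

x_name = [309, 304, 305, 304, 305]   # name-matching times (fictitious)
x_phys = [304, 301, 305, 300, 301]   # physical-matching times (fictitious)

t_obt, df, m_d, sd = related_samples_t(x_name, x_phys)
t_crit = 2.776                       # df = 4, alpha = .05, two-tailed
print(round(t_obt, 2), df, round(m_d, 1), round(sd, 2), abs(t_obt) > t_crit)
```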