(a crash course in)
Statistics II
• What is the General Linear Model (GLM)?
• How are the GLM and the t-test related?
• Nuts and bolts in Matlab
The General Linear Model
• Also known as “Linear regression” or “Multiple regression”
• A model that is linear in its unknowns (parameters).
For example: how does blood pressure change with age?
[Figure: scatter plot of blood pressure (110–170) against age (40–70)]
In GLM terms, blood pressure is the dependent (or response) variable and age is the independent variable (or regressor).
The General Linear Model
So we “fit” a line to the data.
• But what does “fit” mean?
• And how do we know if it is “significant”?
[Figure: the same scatter plot with a fitted straight line]
What does a “fit” mean?
Every point y_i is associated with an “error” e_i: the vertical distance from the point to the line.
Different “fits” have different sets of “errors” e_i.
The “best fit” is the one that minimizes the sum of squared errors

SSE = \sum_{i=1}^{N} e_i^2

[Figure: the fitted line with the error e_i drawn from one data point y_i down to the line]
So, how do we find that then?
Start with the data and a model:

y_i = \beta_0 + \beta_1 \times \mathrm{age}_i + e_i, \quad \text{or in general} \quad y_i = \beta_0 + \beta_1 x_i + e_i

Define a “cost function”:

SSE = \sum_{i=1}^{N} \left( y_i - (\beta_0 + \beta_1 x_i) \right)^2

Find the parameters \beta_0 and \beta_1 that minimize the SSE.
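As a minimal sketch of those nuts and bolts in Matlab (the data values below are invented for illustration; they are not the numbers behind the slide’s figure):

```matlab
% Hypothetical data: ages and blood pressures (made-up values).
age = [56; 49; 62; 44; 58; 53; 47; 66];
bp  = [140; 132; 147; 128; 143; 136; 131; 151];

X    = [ones(size(age)) age];  % design matrix: intercept and slope columns
beta = X \ bp;                 % least-squares estimates [beta0; beta1]

e   = bp - X*beta;             % the errors e_i for this fit
SSE = sum(e.^2);               % the cost function that the fit minimizes
```

The backslash operator solves the least-squares problem directly, so beta is exactly the minimizer of the SSE.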
And in Matrix formulation
For didactic purposes we use different data for this: data that comes from two groups.
[Figure: a scatter of “some response” (26–38) against observation index (0–40); the first group of observations scatters around one level, the second around a higher level]
And we will model it as a “linear combination” of these two “basis functions”.
[Figure: two indicator (“box-car”) basis functions, each equal to 1 over one group’s observations and 0 over the other’s]
I.e. our model is

y_i = \beta_0 x_{i1} + \beta_1 x_{i2} + e_i

where x_{i1} and x_{i2} are indicator variables (the two basis functions): the data are modelled as \beta_0 times the first basis function plus \beta_1 times the second, plus an error term e.
Different parameter values give different fits: for example \beta_0 = 30 and \beta_1 = 32 gives one candidate fit, and \beta_0 = 29.8 and \beta_1 = 34.3 gives another.
[Figure: the two-group data overlaid with the model prediction for each choice of parameters]
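A sketch of this two-group model in Matlab. The group sizes and simulated response values are assumptions made up for illustration (the slide does not give the raw data):

```matlab
n  = 20;                                 % assumed observations per group
y  = [30 + randn(n,1); 34 + randn(n,1)]; % simulated two-group response
x1 = [ones(n,1); zeros(n,1)];            % indicator variable for group 1
x2 = [zeros(n,1); ones(n,1)];            % indicator variable for group 2

beta = [x1 x2] \ y;  % with indicator regressors, the estimates are the group means
```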
And in Matrix formulation
Written out in matrices, the model is

\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_{n-1} \\ y_n \end{bmatrix} =
\begin{bmatrix} 1 & 0 \\ 1 & 0 \\ \vdots & \vdots \\ 0 & 1 \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix} +
\begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_{n-1} \\ e_n \end{bmatrix}

or compactly

y = X\beta + e

where the columns of the design matrix X are the two basis functions.
And in Matrix formulation
The least-squares solution is

\hat{\beta} = \underbrace{(X^T X)^{-1} X^T}_{\text{pseudoinverse}}\, y

I.e. the parameters are a linear combination of the data.
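In Matlab this can be written several equivalent ways (a sketch; forming the inverse explicitly is shown only to mirror the formula):

```matlab
beta_hat = inv(X'*X) * X' * y;  % the normal-equations formula, verbatim
beta_hat = pinv(X) * y;         % same thing via the Moore-Penrose pseudoinverse
beta_hat = X \ y;               % same thing again, and numerically the most stable
```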
And in Matrix formulation
So, let us return to our original example. There the model is

y = X \begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix} + e, \quad \text{or written out} \quad
\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_{n-1} \\ y_n \end{bmatrix} =
\begin{bmatrix} 1 & 56 \\ 1 & 49 \\ \vdots & \vdots \\ 1 & 47 \\ 1 & 53 \end{bmatrix}
\begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix} +
\begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_{n-1} \\ e_n \end{bmatrix}

with a column of ones (the intercept) and a column of ages in X.
Applying the pseudoinverse to the blood-pressure data gives

\begin{bmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \end{bmatrix} = (X^T X)^{-1} X^T y = \begin{bmatrix} 81.1 \\ 1.06 \end{bmatrix}
So, \hat{\beta} is what we think \beta should be: what we think the parameters should be. What else can we calculate?

\hat{y} = X\hat{\beta}, \quad \text{i.e.} \quad \hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i

\hat{y} is what we think the data should be: the fitted values.
Substituting the estimate gives

\hat{y} = \underbrace{X (X^T X)^{-1} X^T}_{\text{the “Hat matrix” } H}\, y
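A minimal sketch of the hat matrix in Matlab:

```matlab
H     = X * inv(X'*X) * X';  % the hat matrix: projects y onto the columns of X
y_hat = H * y;               % “puts the hat on y”: the fitted values
```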
But are they “significant”?
What constitutes a “significant” parameter?
[Figure: the fitted blood-pressure-versus-age line again]
Do you remember the “sample variance”? The “sample variance” is (per degree of freedom) the residual sum of squares after fitting a “model” consisting of just a mean. But we can also calculate the residual error after fitting the “full model”. In fact the sums of squares add up in a very neat way:

SS_{tot} = SS_{mean} + SS_{slope} + SS_{err}, \quad \text{or in general} \quad SS_{tot} = SS_{model} + SS_{err}

The question now is: is SS_{slope} a bigger value than we would expect from “any old” variable? The degrees of freedom decompose the same way:

df_{tot} = df_{mean} + df_{slope} + df_{err}
But are they “significant”?
What would you expect of these under the null-hypothesis?

SS_{slope}/df_{slope} \quad \text{vs} \quad SS_{err}/df_{err}

So what would you expect of their ratio under the null-hypothesis?

\frac{SS_{slope}/df_{slope}}{SS_{err}/df_{err}}
But are they “significant”?
The F-distribution
• The ratio above follows the F-distribution under the null-hypothesis, which is parametric with parameters df_1 and df_2.
[Figure: F-distribution densities for F_{1,38}, F_{5,5} and F_{5,35}]
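Put together, the F-test for the slope in the simple-regression example looks like this in Matlab (a sketch; fcdf assumes the Statistics Toolbox is available):

```matlab
n       = length(y);
y_hat   = X * (X \ y);                % fitted values from the full model
SSerr   = sum((y - y_hat).^2);        % residual sum of squares
SSslope = sum((y_hat - mean(y)).^2);  % sum of squares explained by the slope

F = (SSslope/1) / (SSerr/(n - 2));    % df_slope = 1, df_err = n - 2
p = 1 - fcdf(F, 1, n - 2);            % p-value under the null-hypothesis
```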
But are they “significant”?
Or we can use our old friend the t-test:

t = \sqrt{n}\, \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\sigma^2}}

Large difference: trustworthy. Small variability: trustworthy. Many measurements: trustworthy.
In the GLM the same statistic becomes

t = \frac{c^T \hat{\beta}}{\sqrt{\hat{\sigma}^2\, c^T (X^T X)^{-1} c}}

where the contrast c^T\hat{\beta} plays the role of the difference (large difference: trustworthy), \hat{\sigma}^2 measures the variability (small variability: trustworthy), and (X^T X)^{-1} shrinks as the number of measurements grows (many measurements: trustworthy).
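A sketch of that GLM t-statistic in Matlab; the contrast c = [0; 1] (testing the second parameter, e.g. the slope) is an illustrative assumption:

```matlab
beta_hat   = X \ y;                          % parameter estimates
df         = length(y) - rank(X);            % residual degrees of freedom
sigma2_hat = sum((y - X*beta_hat).^2) / df;  % residual variance estimate

c = [0; 1];                                  % contrast picking out the second parameter
t = (c'*beta_hat) / sqrt(sigma2_hat * c'*inv(X'*X)*c);
p = 1 - tcdf(t, df);                         % one-sided p-value (Statistics Toolbox)
```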
(a crash course in)
Statistics I
• A statistic, Y, can be “good”, like e.g. the mean or the median of X (where X = [X1, X2, ..., Xn]).
• It can also be “poor”, like e.g. Y = max(X) (biased) or Y = X1 (unbiased, but poor precision).
Let us say we “take a sample”, i.e. ask 10 British women how tall they are and calculate the average. What do you think we would get? 160 cm? 163 cm? 165 cm?
Do you think we would get the same result if we repeated the experiment?
(a crash course in)
Statistics I

Y = g(X_1, X_2, \ldots, X_n)

Why did I write it like this?
• To stress that Y is a stochastic variable.
• Hence, Y has a distribution, i.e. there is a probability associated with each value of Y.
• The distribution associated with a statistic is called a “sampling distribution”.
(a crash course in)
Statistics I
• What would our “sampling distribution” look like then?
• It depends on the underlying distribution in the population, so we must make some assumption about that.

Y \sim N(\mu, \sigma^2), \quad \text{or equivalently} \quad Y \sim \mu + N(0, \sigma^2)

The normal distribution: a very common assumption, but how do we motivate it?
Motivations for assuming a normal distribution
• It makes life a lot easier.
- Not the greatest motivation, but often the true one.
• It is a good approximation to many other distributions (e.g. the binomial).
• When errors are “multi-factorial” the resulting total error tends to be normally distributed. Think e.g. of I.Q., where we can assume that a large number of factors together decide its value.
How about our sample then?
Let’s say we are a bit lazy and take a sample of n = 1. To take that sample is equivalent to drawing a single value from the underlying distribution.
[Figure: the population density N(\mu = 164 cm, \sigma = 10 cm), plotted over 120–200 cm]
What do you think would happen if we repeated the “experiment”? And again, and again? And if I repeated this many times, how would the resulting distribution look?
[Figure: the histogram of repeated n = 1 samples builds up next to the population density, and takes on the same shape]
Hmm, that looks kind of familiar... Oh yeah, that was it: the underlying distribution itself. So our “sampling distribution” is identical to the underlying distribution. What does that imply for the precision of our “sample mean”?
How about our sample then?
Let’s be slightly less lazy and take a sample of n = 3. Now we draw three values from the underlying distribution and calculate their mean. And as before we repeat the “experiment”, again and again. How would the resulting distribution look this time?
[Figure: the histogram of n = 3 sample means builds up next to the N(164, 10) population density]
I think that looks more narrow. So this “sampling distribution” is narrower than the underlying distribution. What does that imply for the precision of our “sample mean”?
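This repeated-sampling “experiment” is easy to simulate in Matlab (a sketch; the 10,000 repetitions are an arbitrary choice):

```matlab
mu = 164; sigma = 10; nrep = 10000;

m1 = mean(mu + sigma*randn(1, nrep), 1);  % sample means for n = 1
m3 = mean(mu + sigma*randn(3, nrep), 1);  % sample means for n = 3

hist(m1, 40);          % same spread as the population
figure; hist(m3, 40);  % visibly narrower
```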
The “sampling distribution”?
So, what can we say about this “sampling distribution”?
Useful fact about the normal distribution: “Any linear combination of normally distributed stochastic variables is itself normally distributed.”

\bar{x} = \tfrac{1}{3} x_1 + \tfrac{1}{3} x_2 + \tfrac{1}{3} x_3

i.e. the sample mean is a linear combination. Hence, our “sampling distribution” is normal. But what can we say about its parameters?
The “sampling distribution”?
If we take a sample of size n from a population \sim N(\mu, \sigma), the sample mean will be \sim N(\mu, \sigma/\sqrt{n}).
[Figure: the population density N(164, 10) alongside the narrower sampling distribution N(164, 10/\sqrt{3})]
What do we know this far?
• A statistic is a function of a sample taken from some population.
• In other words: a statistic is a number that one calculates from some data one has acquired.
• The statistic is a stochastic variable, and will hence have a probability distribution.
• That distribution is called a “sampling distribution”.
• If the distribution of the population is normal, then any statistic that is a linear combination of the sample will have a normal sampling distribution.
- Even if the population is not normally distributed, the sample mean is often approximately normal.
Hypothesis testing
Up until now we have been talking about descriptive “summary statistics”. Often when we talk about statistics we mean some sort of comparison about which we wish to make some categorical statement. For example:
• The blood pressure is lower after taking my new medicine.
• People experience less pain when distracted.
• Boys are taller than girls.
Hypothesis testing
So we want to make a “comparison” in order to detect a difference. One way of doing that would be to calculate a statistic that reflects how reliable/trustworthy a difference is. What is it then that makes us trust/distrust a difference?
A question of trust
What is it then that makes us trust/distrust a difference? Do you trust either of these?
[Figures: a series of example comparisons, some “trustworthy” and some “dodgy”]
i.e. we trust: a big difference, a small variance, a large sample.
Can we think of a statistic that summarizes that?
How about a t-statistic?

t = \sqrt{n}\, \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\sigma^2}}

Large difference: trustworthy. Small variability: trustworthy. Many measurements: trustworthy.
Can we all agree that the t-statistic summarizes our intuition for when a difference is trustworthy or not? But when is it really trustworthy? I.e., when can we say: “There is a difference”? When t is 2? When t is 3? 4? To answer that we need some more tools.
The “null-hypothesis”
• The null-hypothesis is a statement about some comparison.
• It is typically the opposite of what we really want to show.
• For example:
- H0: Surely there can be no effect of a blood pressure medicine developed by little me.
- Really means: Dear god, let there be a difference so I get to keep my job.
The “null-hypothesis”
• The null-hypothesis is a statement about the underlying population.
• It is framed in terms of parameters of the underlying distribution, e.g. H_0: \mu_1 = \mu_2.
[Figure: two population densities: the “population” of people not treated with my drug, centred on \mu_1, and the “population” of people treated with my drug, centred on \mu_2]
The “null-distribution”
• Associated with the “null-hypothesis” is a “null-distribution”.
• The “null-distribution” is what the sampling distribution of our statistic would be like if the null-hypothesis was true.
• Lots of red there. Let’s see if we can work out what that really means.
The “null-distribution”
• Let us start out with a hypothetical experiment.
• We feed 10 subjects our medicine and measure their blood pressure before (x) and after (y), giving the differences

z = \begin{bmatrix} z_1 \\ z_2 \\ \vdots \\ z_{10} \end{bmatrix} =
\begin{bmatrix} y_1 - x_1 \\ y_2 - x_2 \\ \vdots \\ y_{10} - x_{10} \end{bmatrix}

• We will postulate an underlying distribution Z \sim N(0, \sigma^2), i.e. H_0: \mu = 0.
• We will then calculate a t-statistic from the sample z:

t = \sqrt{10}\, \frac{\bar{z}}{\sqrt{\sigma^2}}

[Figure: the postulated null population density, centred on \mu = 0]
The “null-distribution”
• So, how might this be distributed then? First of all, where do we get \sigma from? We don’t know \sigma, so we try to calculate it from the sample:

s = \sqrt{\frac{1}{n-1}\left( (z_1 - \bar{z})^2 + (z_2 - \bar{z})^2 + \ldots + (z_n - \bar{z})^2 \right)}

• What is s?
• It is a stochastic variable.
• It is a statistic.
• Hence it has a sampling distribution.
• It is known as the “sample standard deviation”.
The “sample variance”
• Let us see if we can find out how s is distributed.
• When doing that, let us assume Z \sim N(0, 1).

s = \sqrt{\frac{1}{n-1}\left( (z_1 - \bar{z})^2 + \ldots + (z_n - \bar{z})^2 \right)}

What range of values do you think s might take?
The “sample variance”
Let’s now fill in that range (0 to 3). Take a sample of 10 from N(0, 1), calculate \bar{z}, and calculate s = 1.45. Take another sample of 10: s = 1.04. And another: s = 0.91. And another: 0.91. And another: 1.01...
[Figure: each sample of 10 shown against the N(0, 1) density; on the right, a histogram of the s values slowly filling in over the range 0–3]
And let’s stop faffing around: repeating this many times fills in the whole histogram for n = 10. How do you think it would have looked for n = 100? What about n = 3?
[Figure: histograms of s for n = 3, n = 10 and n = 100: the larger the sample, the more tightly s concentrates around 1]
The “sample variance”
• And it turns out there is a parametric distribution for the sampling distribution of s:

f(x; n) = \frac{x^{n-1} e^{-x^2/2}}{2^{n/2-1}\, \Gamma(n/2)}

[Figure: the parametric density overlaid on the simulated histograms for n = 3, n = 10 and n = 100]
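The same simulation is a few lines of Matlab (a sketch):

```matlab
n = 10; nrep = 10000;
z = randn(n, nrep);  % each column is one sample of n values from N(0,1)
s = std(z, 0, 1);    % sample standard deviation of each column (1/(n-1) normalization)
hist(s, 40);         % right-skewed for small n, concentrating around 1 as n grows
```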
What about \bar{z} then?
Well, we already know that. Do you remember the slide on the sampling distribution of the mean? We can now assemble

t = \sqrt{10}\, \frac{\bar{z}}{\sqrt{s^2}}
The sampling distribution of t

t = \sqrt{n}\, \frac{\bar{z}}{\sqrt{s^2}}, \quad \text{where } \bar{z} \text{ is distributed as } N(0, \sigma/\sqrt{n})

How then do you think \sqrt{n}\,\bar{z} is distributed? That’s right: \sqrt{n}\,\bar{z} \sim N(0, \sigma).
The sampling distribution of t
So the numerator of t is distributed as N(0, \sigma). And let’s say we have a huge sample: what do you expect the denominator \sqrt{s^2} to be? (Hint: look at the histograms of s for n = 3, 10 and 100, and remember our assumption about \sigma.)
The sampling distribution of t
When the sample is \infty, s equals \sigma, and t is then distributed as N(0, 1).
But n is not \infty. So then what? For a finite n, s varies from sample to sample, and the smaller n, the more it varies.
The sampling distribution of t
Look at the histogram of s for n = 10. What will happen to the t-values from samples where s falls in the area well below \sigma? The denominator is too small, so those t-values get inflated and pushed out into the tails. And those that fall in the area around \sigma? They behave much like the N(0, 1) case. And those in the area above \sigma? They get shrunk in towards zero.
The sampling distribution of t
So, we are looking for a distribution that is kind of normal, but is more peaked AND has longer tails. Enter the t-distribution:

f(t; \nu) = \frac{\Gamma\!\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\, \Gamma\!\left(\frac{\nu}{2}\right)} \left( 1 + \frac{t^2}{\nu} \right)^{-\frac{\nu+1}{2}}

[Figure: the t_9 density against N(0, 1), over −4 to 4]
The sampling distribution of t
Really it is a family of distributions, one for each value of the parameter \nu (the degrees of freedom).
[Figure: the t_2, t_9 and t_{99} densities; as \nu grows, the t-distribution approaches N(0, 1)]
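The family is easy to plot against the standard normal in Matlab (a sketch; tpdf and normpdf assume the Statistics Toolbox):

```matlab
t = -4:0.01:4;
plot(t, tpdf(t,2), t, tpdf(t,9), t, tpdf(t,99), t, normpdf(t,0,1));
legend('t_2', 't_9', 't_{99}', 'N(0,1)');  % t_{99} is nearly indistinguishable from N(0,1)
```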
The sampling distribution of t
N.B. it is a parametric distribution. Why is that important? Because once we have “planned” our experiment we can calculate this curve: the degrees of freedom \nu are fixed by the design, before we see any data.
[Figure: the calculated t density over −4 to 4]
The sampling distribution of t
It gives the probability of observing a given t-value, provided the null-hypothesis is true.
The sampling distribution of t
• Let us return to our experiment. We feed 10 subjects our medicine and measure their blood pressure before and after.
• We calculate t = \sqrt{10}\,\bar{z}/\sqrt{s^2}, and we got t = 2.9. Now what?
• We go to our calculated t_9 curve and ask: “What is the chance I would have obtained this value (or larger) if the null-hypothesis was true?”
• That chance is the area under the curve for t > 2.9.
• Which is ~0.009.
[Figure: the t_9 density with the observed value t = 2.9 marked and the tail area t > 2.9 shaded]
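That tail area is one line of Matlab (a sketch; tcdf assumes the Statistics Toolbox):

```matlab
p = 1 - tcdf(2.9, 9)  % ~0.009: P(t > 2.9) under the null-hypothesis, with 9 df
```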
The sampling distribution of t
• Then comes the leap of faith. We say: “If the null-hypothesis was true, there is only a ~1% chance I would observe a value this large (or larger). So it’s not very likely the null-hypothesis is true then. I will reject it.”
Summary?
• When we want to test for some comparison (effect) we start with a “null-hypothesis”.
• We will then use our data to calculate a “test-statistic”.
• A test-statistic summarizes our intuition about “trust”.
• The sampling distribution of our test-statistic is known “under the null-hypothesis”.
• Once we observe a value, we can calculate how likely that value would be if the null-hypothesis was true.
• If that value is unlikely, then we say that the null-hypothesis is unlikely to be true. We reject it.
False positives/negatives
• I am sure you have all heard about “false positives” and “false negatives”. But what does that actually mean?
• We want to perform an experiment, and as part of that we define a null-hypothesis, e.g. H_0: \mu = 0.
• Now what can happen? The true state of affairs is either “H_0 is true” or “H_0 is false”, and our decision is either “we don’t reject H_0” or “we reject H_0”:

                 We don’t reject H_0                 We reject H_0
H_0 is true      correct ☺                           False positive (Type I error)
H_0 is false     False negative (Type II error)      correct ☺