6. The Sample Variance II
We continue our discussion of the sample variance from the last section, but now we assume that the variables are random. Thus, suppose that we have a basic random experiment, and that X is a real-valued random variable for the experiment with mean μ and standard deviation σ. We will need some higher-order moments as well. Let σ₃ = 𝔼[(X − μ)³] and σ₄ = 𝔼[(X − μ)⁴] denote the 3rd and 4th moments about the mean. Recall that σ₃/σ³ = skew(X), the skewness of X, and σ₄/σ⁴ = kurt(X), the kurtosis of X. We assume that σ₄ < ∞.
We repeat the basic experiment n times to form a new, compound experiment, with a sequence of independent random variables X = (X₁, Xβ‚‚, …, Xβ‚™), each with the same distribution as X. In statistical terms, X is a random sample of size n from the distribution of X. All of the statistics in the previous section make sense for X, of course, but now these statistics are random variables. We will use the notation established in that section, except for the usual convention of denoting random variables by capital letters. Finally, note that the deterministic properties and relations established in the last section still hold.
In addition to being a measure of the center of the data X, the sample mean
\[ M = \frac{1}{n} \sum_{i=1}^n X_i \]
is a natural estimator of the distribution mean μ. In this section, we will derive statistics that are natural estimators of the distribution variance σ². The statistics that we will derive are different depending on whether μ is known or unknown; for this reason, μ is referred to as a nuisance parameter for the problem of estimating σ².
A Special Sample Variance
First we will assume that μ is known. Although this is almost always an artificial assumption, it is a nice place to start because the analysis is relatively easy and will give us insight for the standard case. A natural estimator of σ² is the following statistic, which we will refer to as the special sample variance:
\[ W^2 = \frac{1}{n} \sum_{i=1}^n (X_i - \mu)^2 \]
1. W² is the sample mean for a random sample of size n from the distribution of (X − μ)², and satisfies the following properties:
a. 𝔼(W²) = σ²
b. var(W²) = (σ₄ − σ⁴)/n
c. W² → σ² as n → ∞ with probability 1
d. The distribution of √n (W² − σ²)/√(σ₄ − σ⁴) converges to the standard normal distribution as n → ∞.
Proof:
These results follow immediately from standard results in the sections on the Law of Large Numbers and the Central Limit Theorem. For part (b), note that
\[ \operatorname{var}[(X - \mu)^2] = \mathbb{E}[(X - \mu)^4] - \left(\mathbb{E}[(X - \mu)^2]\right)^2 = \sigma_4 - \sigma^4 \]
In particular, part (a) means that WΒ² is an unbiased estimator of σ². From part (b), note that var(WΒ²) β†’ 0 as n β†’ ∞; this means that WΒ² is a consistent estimator of σ². The square root of the special sample variance is a special version of the sample standard deviation, denoted W.
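As a concrete check of parts (a) and (b), here is a minimal simulation sketch in Python with NumPy (not part of the original text), assuming an exponential sampling distribution with rate 1, for which σ² = 1 and σ₄ = 9 (see Exercise 16 below):

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumption: exponential(1) sampling distribution, so mu = 1, sigma^2 = 1, sigma_4 = 9.
mu, sigma2, sigma4 = 1.0, 1.0, 9.0
n, reps = 10, 200_000

x = rng.exponential(1.0, size=(reps, n))
W2 = ((x - mu) ** 2).mean(axis=1)          # special sample variance

print(W2.mean(), sigma2)                   # part (a): E(W^2) = sigma^2
print(W2.var(), (sigma4 - sigma2**2) / n)  # part (b): var(W^2) = (sigma_4 - sigma^4)/n
```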
2. 𝔼(W) ≀ Οƒ. Thus, W is a negatively biased estimator that tends to underestimate Οƒ.
Proof:
This follows from Theorem 1(a) and Jensen's inequality. Since w ↦ √w is concave downward on [0, ∞), we have
\[ \mathbb{E}(W) = \mathbb{E}\left(\sqrt{W^2}\right) \le \sqrt{\mathbb{E}(W^2)} = \sqrt{\sigma^2} = \sigma \]
Next we compute the covariance and correlation between the sample mean and the special sample variance.
3. The covariance and correlation of M and W² are
a. cov(M, W²) = σ₃/n
b. cor(M, W²) = σ₃/√(σ²(σ₄ − σ⁴))
Proof:
From the bilinearity of the covariance operator and by independence,
\[ \operatorname{cov}(M, W^2) = \operatorname{cov}\left[\frac{1}{n} \sum_{i=1}^n X_i, \ \frac{1}{n} \sum_{j=1}^n (X_j - \mu)^2\right] = \frac{1}{n^2} \sum_{i=1}^n \operatorname{cov}[X_i, (X_i - \mu)^2] \]
But cov[Xᵢ, (Xᵢ − μ)²] = cov[Xᵢ − μ, (Xᵢ − μ)²] = 𝔼[(Xᵢ − μ)³] − 𝔼(Xᵢ − μ) 𝔼[(Xᵢ − μ)²] = σ₃. Substituting gives part (a). Part (b) follows from part (a), Theorem 1(b), and our previous result that var(M) = σ²/n.
Note that the correlation does not depend on the sample size, and that the sample mean and the special sample variance are uncorrelated if σ₃ = 0 (equivalently, skew(X) = 0).
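Part (a) is easy to check numerically; a sketch, again assuming the exponential distribution with rate 1, for which σ₃ = 2:

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumption: exponential(1) sampling distribution, so mu = 1, sigma_3 = 2.
n, reps, mu, sigma3 = 10, 400_000, 1.0, 2.0
x = rng.exponential(1.0, size=(reps, n))

M = x.mean(axis=1)
W2 = ((x - mu) ** 2).mean(axis=1)
print(np.cov(M, W2)[0, 1], sigma3 / n)   # Theorem 3(a): cov(M, W^2) = sigma_3/n
```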
The Standard Sample Variance
Consider now the more realistic case in which μ is unknown. In this case, a natural approach is to average, in some sense, the squared deviations (Xᵢ − M)² over i ∈ {1, 2, …, n}. It might seem that we should average by dividing by n. However, another approach is to divide by whatever constant would give us an unbiased estimator of σ². This constant turns out to be n − 1, leading to the standard sample variance:
\[ S^2 = \frac{1}{n-1} \sum_{i=1}^n (X_i - M)^2 \]
4. 𝔼(S²) = σ².
Proof:
By expanding (as was shown in the last section),
\[ \sum_{i=1}^n (X_i - M)^2 = \sum_{i=1}^n X_i^2 - n M^2 \]
Recall that 𝔼(M) = μ and var(M) = σ²/n, so that 𝔼(M²) = σ²/n + μ². Taking expected values in the displayed equation gives
\[ \mathbb{E}\left(\sum_{i=1}^n (X_i - M)^2\right) = \sum_{i=1}^n (\sigma^2 + \mu^2) - n\left(\frac{\sigma^2}{n} + \mu^2\right) = n(\sigma^2 + \mu^2) - n\left(\frac{\sigma^2}{n} + \mu^2\right) = (n - 1)\sigma^2 \]
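The effect of the divisor shows up clearly in simulation; a sketch (not from the original text), assuming a normal sampling distribution with σ = 2, comparing the divisors n − 1 and n:

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 5, 200_000
x = rng.normal(0.0, 2.0, size=(reps, n))   # assumption: normal, sigma^2 = 4

print(x.var(axis=1, ddof=1).mean())   # divisor n - 1: approx 4 (unbiased)
print(x.var(axis=1, ddof=0).mean())   # divisor n: approx 4 (n-1)/n = 3.2 (biased low)
```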
Of course, the square root of the sample variance is the sample standard deviation, denoted S .
5. 𝔼(S) ≀ Οƒ. Thus, S is a negatively biased estimator that tends to underestimate Οƒ.
Proof:
The proof is exactly the same as in Theorem 2.
6. S² → σ² as n → ∞ with probability 1.
Proof:
This follows from the strong law of large numbers. Recall again that
\[ S^2 = \frac{1}{n-1} \sum_{i=1}^n X_i^2 - \frac{n}{n-1} M^2 = \frac{n}{n-1}\left[M(X^2) - M^2(X)\right] \]
But with probability 1, M(X²) → σ² + μ² as n → ∞ and M²(X) → μ² as n → ∞.
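A one-path illustration of this convergence (a sketch, assuming a standard normal sampling distribution so that σ² = 1):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.standard_normal(1_000_000)   # assumption: standard normal, sigma^2 = 1

for n in (10, 1_000, 1_000_000):
    print(n, x[:n].var(ddof=1))      # S^2 approaches sigma^2 = 1 as n grows
```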
Since S² is an unbiased estimator of σ², the variance of S² is the mean square error, a measure of the quality of the estimator.
7. var(S²) = (1/n)(σ₄ − σ⁴(n − 3)/(n − 1)).
Proof:
Recall from the last section that
\[ S^2 = \frac{1}{2n(n-1)} \sum_{i=1}^n \sum_{j=1}^n (X_i - X_j)^2 \]
Hence, using the bilinear property of covariance, we have
\[ \operatorname{var}(S^2) = \operatorname{cov}(S^2, S^2) = \frac{1}{4n^2(n-1)^2} \sum_{i=1}^n \sum_{j=1}^n \sum_{k=1}^n \sum_{l=1}^n \operatorname{cov}[(X_i - X_j)^2, (X_k - X_l)^2] \]
We compute the covariances in this sum by considering disjoint cases:
cov[(Xᵢ − Xⱼ)², (Xₖ − Xₗ)²] = 0 if i = j or k = l, and there are 2n³ − n² such terms.
cov[(Xᵢ − Xⱼ)², (Xₖ − Xₗ)²] = 0 if i, j, k, l are distinct, and there are n(n − 1)(n − 2)(n − 3) such terms.
cov[(Xᵢ − Xⱼ)², (Xₖ − Xₗ)²] = 2σ₄ + 2σ⁴ if i ≠ j and {k, l} = {i, j}, and there are 2n(n − 1) such terms.
cov[(Xᵢ − Xⱼ)², (Xₖ − Xₗ)²] = σ₄ − σ⁴ if i ≠ j, k ≠ l, and #({i, j} ∩ {k, l}) = 1, and there are 4n(n − 1)(n − 2) such terms.
Substituting gives the result.
Note that var(S²) → 0 as n → ∞, and hence S² is a consistent estimator of σ². On the other hand, it is not surprising that the variance of the standard sample variance (where we assume that μ is unknown) is greater than the variance of the special sample variance (in which we assume that μ is known).
8. var(S²) > var(W²).
Proof:
From Theorem 1, Theorem 7, and simple algebra,
\[ \operatorname{var}(S^2) - \operatorname{var}(W^2) = \frac{2\sigma^4}{n(n-1)} \]
Note however that the difference goes to 0 as n β†’ ∞.
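Both Theorem 7 and Theorem 8 are easy to check by simulation; a sketch, again assuming the exponential distribution with rate 1 (σ² = 1, σ₄ = 9):

```python
import numpy as np

rng = np.random.default_rng(4)

# Assumption: exponential(1) sampling distribution, so mu = 1, sigma^2 = 1, sigma_4 = 9.
n, reps, sigma2, sigma4 = 5, 400_000, 1.0, 9.0
x = rng.exponential(1.0, size=(reps, n))

W2 = ((x - 1.0) ** 2).mean(axis=1)     # special sample variance (mu = 1 known)
S2 = x.var(axis=1, ddof=1)             # standard sample variance (mu unknown)

print(S2.var(), (sigma4 - (n - 3) / (n - 1) * sigma2**2) / n)   # Theorem 7
print(S2.var() - W2.var(), 2 * sigma2**2 / (n * (n - 1)))       # Theorem 8
```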
Next we compute the covariance between the sample mean and the sample variance.
9. The covariance and correlation between the sample mean and sample variance are
a. cov(M, S²) = σ₃/n
b. cor(M, S²) = σ₃/(σ√(σ₄ − σ⁴(n − 3)/(n − 1)))
Proof:
Recall again that
\[ M = \frac{1}{n} \sum_{i=1}^n X_i, \qquad S^2 = \frac{1}{2n(n-1)} \sum_{j=1}^n \sum_{k=1}^n (X_j - X_k)^2 \]
Hence, using the bilinear property of covariance, we have
\[ \operatorname{cov}(M, S^2) = \frac{1}{2n^2(n-1)} \sum_{i=1}^n \sum_{j=1}^n \sum_{k=1}^n \operatorname{cov}[X_i, (X_j - X_k)^2] \]
We compute the covariances in this sum by considering disjoint cases:
cov[Xᵢ, (Xⱼ − Xₖ)²] = 0 if j = k, and there are n² such terms.
cov[Xᵢ, (Xⱼ − Xₖ)²] = 0 if i, j, k are distinct, and there are n(n − 1)(n − 2) such terms.
cov[Xᵢ, (Xⱼ − Xₖ)²] = σ₃ if j ≠ k and i ∈ {j, k}, and there are 2n(n − 1) such terms.
Substituting gives part (a). Part (b) follows from part (a), Theorem 7, and var(M) = σ²/n.
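A numerical check of part (a), which also anticipates the equality cov(M, SΒ²) = cov(M, WΒ²) noted below (a sketch with the same exponential setup, σ₃ = 2):

```python
import numpy as np

rng = np.random.default_rng(5)

# Assumption: exponential(1) sampling distribution, so mu = 1, sigma_3 = 2.
n, reps, mu, sigma3 = 5, 400_000, 1.0, 2.0
x = rng.exponential(1.0, size=(reps, n))

M = x.mean(axis=1)
S2 = x.var(axis=1, ddof=1)
W2 = ((x - mu) ** 2).mean(axis=1)

print(np.cov(M, S2)[0, 1], sigma3 / n)   # Theorem 9(a)
print(np.cov(M, W2)[0, 1], sigma3 / n)   # Theorem 3(a): the same value
```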
In particular, note that cov(M, S²) = cov(M, W²). Again, the sample mean and variance are uncorrelated if σ₃ = 0 (equivalently, skew(X) = 0). Our last result gives the covariance and correlation between the special sample variance and the standard one. Curiously, the covariance is the same as the variance of the special sample variance.
10. The covariance and correlation between W² and S² are
a. cov(W², S²) = (σ₄ − σ⁴)/n
b. cor(W², S²) = √((σ₄ − σ⁴)/(σ₄ − σ⁴(n − 3)/(n − 1)))
Proof:
Recall again that
\[ W^2 = \frac{1}{n} \sum_{i=1}^n (X_i - \mu)^2, \qquad S^2 = \frac{1}{2n(n-1)} \sum_{j=1}^n \sum_{k=1}^n (X_j - X_k)^2 \]
so by the bilinear property of covariance we have
\[ \operatorname{cov}(W^2, S^2) = \frac{1}{2n^2(n-1)} \sum_{i=1}^n \sum_{j=1}^n \sum_{k=1}^n \operatorname{cov}[(X_i - \mu)^2, (X_j - X_k)^2] \]
Once again, we compute the covariances in this sum by considering disjoint cases:
cov[(Xᵢ − μ)², (Xⱼ − Xₖ)²] = 0 if j = k, and there are n² such terms.
cov[(Xᵢ − μ)², (Xⱼ − Xₖ)²] = 0 if i, j, k are distinct, and there are n(n − 1)(n − 2) such terms.
cov[(Xᵢ − μ)², (Xⱼ − Xₖ)²] = σ₄ − σ⁴ if j ≠ k and i ∈ {j, k}, and there are 2n(n − 1) such terms.
Substituting gives part (a). Part (b) follows from part (a) and Theorems 1 and 7.
Note that cor(W², S²) → 1 as n → ∞, which is not surprising since, with probability 1, S² → σ² and W² → σ² as n → ∞.
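The convergence of the correlation appears quickly in simulation; a sketch with the exponential setup (σ² = 1, σ₄ = 9):

```python
import numpy as np

rng = np.random.default_rng(6)

# Assumption: exponential(1) sampling distribution, so mu = 1, sigma^2 = 1, sigma_4 = 9.
mu, sigma2, sigma4 = 1.0, 1.0, 9.0
for n in (5, 50, 500):
    x = rng.exponential(1.0, size=(20_000, n))
    W2 = ((x - mu) ** 2).mean(axis=1)
    S2 = x.var(axis=1, ddof=1)
    exact = np.sqrt((sigma4 - sigma2**2) /
                    (sigma4 - sigma2**2 * (n - 3) / (n - 1)))
    print(n, np.corrcoef(W2, S2)[0, 1], exact)   # both tend to 1 as n grows
```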
Exercises
Simulation Exercises
Many of the applets in this project are simulations of experiments with a basic random variable of interest. When you run the simulation, you are performing independent replications of the experiment. In most cases, the applet displays the standard deviation of the distribution, both numerically in a table and graphically as the radius of the blue horizontal bar in the graph box. When you run the simulation, the sample standard deviation is also displayed numerically in the table and graphically as the radius of the red horizontal bar in the graph box.
11. In the binomial coin experiment, the random variable is the number of heads. For various values of the parameters n
(the number of coins) and p (the probability of heads), run the simulation 1000 times and note the apparent agreement
between the sample standard deviation and the distribution standard deviation.
12. In the simulation of the matching experiment, the random variable is the number of matches. For selected values of n
(the number of balls), run the simulation 1000 times and note the apparent agreement between the sample standard
deviation and the distribution standard deviation.
13. Run the simulation of the gamma experiment 1000 times for various values of the rate parameter r and the shape
parameter k . Note the apparent agreement between the sample standard deviation and the distribution standard
deviation.
Computational Exercises
14. Suppose that X has probability density function f(x) = 12x²(1 − x) for 0 ≤ x ≤ 1. The distribution of X is a member of the beta family. Compute each of the following:
a. μ = 𝔼(X)
b. σ² = var(X)
c. d₃ = 𝔼[(X − μ)³]
d. d₄ = 𝔼[(X − μ)⁴]
Answer:
a. 3/5
b. 1/25
c. βˆ’2/875
d. 33/8750
15. Suppose now that (X₁, X₂, …, X₁₀) is a random sample of size 10 from the beta distribution in the previous problem. Find each of the following:
a. 𝔼(M)
b. var(M)
c. 𝔼(W 2 )
d. var(W 2 )
e. 𝔼(S 2 )
f. var(S 2 )
g. cov(M, W 2 )
h. cov(M, S 2 )
i. cov(W 2 , S 2 )
Answer:
a. 3/5
b. 1/250
c. 1/25
d. 19/87500
e. 1/25
f. 199/787500
g. βˆ’2/8750
h. βˆ’2/8750
i. 19/87500
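These values follow by plugging the moments from Exercise 14 into Theorems 1, 3, 7, 9, and 10; an exact-arithmetic sketch with Python's fractions module reproduces them:

```python
from fractions import Fraction as F

# Beta density 12 x^2 (1 - x): E(X^k) = 12 / ((k + 3)(k + 4))
E = {k: F(12, (k + 3) * (k + 4)) for k in range(1, 5)}
mu = E[1]
s2 = E[2] - mu**2
s3 = E[3] - 3 * mu * E[2] + 2 * mu**3
s4 = E[4] - 4 * mu * E[3] + 6 * mu**2 * E[2] - 3 * mu**4
print(mu, s2, s3, s4)                      # 3/5, 1/25, -2/875, 33/8750

n = 10
print(s2 / n)                              # var(M) = 1/250
print((s4 - s2**2) / n)                    # var(W^2) = cov(W^2, S^2) = 19/87500
print((s4 - F(n - 3, n - 1) * s2**2) / n)  # var(S^2) = 199/787500
print(s3 / n)                              # cov(M, W^2) = cov(M, S^2) = -2/8750 = -1/4375
```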
16. Suppose that X has probability density function f(x) = λe^{−λx} for 0 ≤ x < ∞, where λ > 0 is a parameter. Thus X has the exponential distribution with rate parameter λ. Compute each of the following:
a. μ = 𝔼(X)
b. σ² = var(X)
c. d₃ = 𝔼[(X − μ)³]
d. d₄ = 𝔼[(X − μ)⁴]
Answer:
a. 1/λ
b. 1/λ²
c. 2/λ³
d. 9/λ⁴
17. Suppose now that (X₁, X₂, …, X₅) is a random sample of size 5 from the exponential distribution in the previous problem. Find each of the following:
a. 𝔼(M)
b. var(M)
c. 𝔼(W 2 )
d. var(W 2 )
e. 𝔼(S 2 )
f. var(S 2 )
g. cov(M, W 2 )
h. cov(M, S 2 )
i. cov(W 2 , S 2 )
Answer:
a. 1/λ
b. 1/(5λ²)
c. 1/λ²
d. 8/(5λ⁴)
e. 1/λ²
f. 17/(10λ⁴)
g. 2/(5λ³)
h. 2/(5λ³)
i. 8/(5λ⁴)
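A Monte Carlo check of selected parts; a sketch assuming λ = 2 (any positive rate works):

```python
import numpy as np

rng = np.random.default_rng(7)
lam, n, reps = 2.0, 5, 400_000                   # assumption: lambda = 2
x = rng.exponential(1.0 / lam, size=(reps, n))   # NumPy uses scale = 1/lambda

M = x.mean(axis=1)
W2 = ((x - 1.0 / lam) ** 2).mean(axis=1)
S2 = x.var(axis=1, ddof=1)

print(M.var(), 1 / (5 * lam**2))              # (b)
print(W2.var(), 8 / (5 * lam**4))             # (d)
print(S2.var(), 17 / (10 * lam**4))           # (f)
print(np.cov(M, S2)[0, 1], 2 / (5 * lam**3))  # (h)
```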
18. Recall that for an ace-six flat die, faces 1 and 6 have probability 1/4 each, while faces 2, 3, 4, and 5 have probability 1/8 each. Let X denote the score when an ace-six flat die is thrown. Compute each of the following:
a. μ = 𝔼(X)
b. σ² = var(X)
c. d₃ = 𝔼[(X − μ)³]
d. d₄ = 𝔼[(X − μ)⁴]
Answer:
a. 7/2
b. 15/4
c. 0
d. 333/16
19. Suppose now that an ace-six flat die is tossed 8 times. Find each of the following:
a. 𝔼(M)
b. var(M)
c. 𝔼(W 2 )
d. var(W 2 )
e. 𝔼(S 2 )
f. var(S 2 )
g. cov(M, W 2 )
h. cov(M, S 2 )
i. cov(W 2 , S 2 )
Answer:
a. 7/2
b. 15/32
c. 15/4
d. 27/32
e. 15/4
f. 603/448
g. 0
h. 0
i. 27/32
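Again, these values follow from the moments in Exercise 18 via Theorems 1, 7, and 10; an exact-arithmetic sketch:

```python
from fractions import Fraction as F

# Ace-six flat die: P(1) = P(6) = 1/4, P(2) = P(3) = P(4) = P(5) = 1/8
pmf = {1: F(1, 4), 2: F(1, 8), 3: F(1, 8), 4: F(1, 8), 5: F(1, 8), 6: F(1, 4)}

mu = sum(x * p for x, p in pmf.items())
s2 = sum((x - mu) ** 2 * p for x, p in pmf.items())
s4 = sum((x - mu) ** 4 * p for x, p in pmf.items())
print(mu, s2, s4)                          # 7/2, 15/4, 333/16

n = 8
print(s2 / n)                              # (b) var(M) = 15/32
print((s4 - s2**2) / n)                    # (d) var(W^2) = cov(W^2, S^2) = 27/32
print((s4 - F(n - 3, n - 1) * s2**2) / n)  # (f) var(S^2) = 603/448
```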
A particularly important special case occurs when the sampling distribution is normal. This case is explored in the section
on Special Properties of Normal Samples.