6. The Sample Variance II

We continue our discussion of the sample variance from the last section, but now we assume that the variables are random.
Thus, suppose that we have a basic random experiment, and that X is a real-valued random variable for the experiment
with mean μ and standard deviation σ. We will need some higher-order moments as well. Let σ₃ = 𝔼[(X − μ)³] and σ₄ = 𝔼[(X − μ)⁴] denote the 3rd and 4th moments about the mean. Recall that σ₃/σ³ = skew(X), the skewness of X, and σ₄/σ⁴ = kurt(X), the kurtosis of X. We assume that σ₄ < ∞.
We repeat the basic experiment n times to form a new, compound experiment, with a sequence of independent random
variables X = (X1 , X2 , … , Xn ), each with the same distribution as X . In statistical terms, X is a random sample of size n
from the distribution of X . All of the statistics in the previous section make sense for X, of course, but now these statistics
are random variables. We will use the notation established in that section, except for the usual convention of denoting
random variables by capital letters. Finally, note that the deterministic properties and relations established in the last
section still hold.
In addition to being a measure of the center of the data X, the sample mean
M = (1/n) ∑_{i=1}^n X_i
is a natural estimator of the distribution mean μ. In this section, we will derive statistics that are natural estimators of the distribution variance σ². The statistics that we will derive are different, depending on whether μ is known or unknown; for this reason, μ is referred to as a nuisance parameter for the problem of estimating σ².
A Special Sample Variance
First we will assume that μ is known. Although this is almost always an artificial assumption, it is a nice place to start
because the analysis is relatively easy and will give us insight for the standard case. A natural estimator of σ² is the
following statistic, which we will refer to as the special sample variance.
W² = (1/n) ∑_{i=1}^n (X_i − μ)²
1. W² is the sample mean for a random sample of size n from the distribution of (X − μ)², and satisfies the following properties:
a. 𝔼(W²) = σ²
b. var(W²) = (1/n)(σ₄ − σ⁴)
c. W² → σ² as n → ∞ with probability 1
d. The distribution of √n (W² − σ²)/√(σ₄ − σ⁴) converges to the standard normal distribution as n → ∞.
Proof:
These results follow immediately from standard results in the section on the Law of Large Numbers and the section on the Central Limit Theorem. For part (b), note that
var[(X − μ)²] = 𝔼[(X − μ)⁴] − (𝔼[(X − μ)²])² = σ₄ − σ⁴
In particular, part (a) means that W² is an unbiased estimator of σ². From part (b), note that var(W²) → 0 as n → ∞; this means that W² is a consistent estimator of σ². The square root of the special sample variance is a special version of the sample standard deviation, denoted W.
2. 𝔼(W) ≤ σ. Thus, W is a negatively biased estimator that tends to underestimate σ.
Proof:
This follows from Theorem 1(a) and Jensen's inequality. Since w ↦ √w is concave downward on [0, ∞), we have
𝔼(W) = 𝔼(√(W²)) ≤ √(𝔼(W²)) = √(σ²) = σ.
Next we compute the covariance and correlation between the sample mean and the special sample variance.
3. The covariance and correlation of M and W² are
a. cov(M, W²) = σ₃/n
b. cor(M, W²) = σ₃ / √(σ²(σ₄ − σ⁴))
Proof:
From the bilinearity of the covariance operator and by independence,
cov(M, W²) = cov[(1/n) ∑_{i=1}^n X_i, (1/n) ∑_{j=1}^n (X_j − μ)²] = (1/n²) ∑_{i=1}^n cov[X_i, (X_i − μ)²]
But cov[X_i, (X_i − μ)²] = cov[X_i − μ, (X_i − μ)²] = 𝔼[(X_i − μ)³] − 𝔼(X_i − μ) 𝔼[(X_i − μ)²] = σ₃. Substituting gives part (a). Part (b) follows from part (a), Theorem 1(b), and our previous result that var(M) = σ²/n.
Note that the correlation does not depend on the sample size, and that the sample mean and the special sample variance are
uncorrelated if σ₃ = 0 (equivalently, skew(X) = 0).
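Theorem 3(a) can also be checked by simulation. The sketch below, again assuming NumPy, uses the exponential distribution with an arbitrary rate λ, for which σ₃ = 2/λ³; the empirical covariance of M and W² over many replications should be close to σ₃/n.

import numpy as np

rng = np.random.default_rng(seed=1)
lam, n, reps = 2.0, 10, 200_000
mu, sigma3 = 1 / lam, 2 / lam**3            # mean and third central moment of the exponential

samples = rng.exponential(1 / lam, size=(reps, n))
m = samples.mean(axis=1)                    # sample means
w2 = ((samples - mu) ** 2).mean(axis=1)     # special sample variances

# Empirical covariance versus the theoretical value sigma_3 / n from Theorem 3(a)
print(np.cov(m, w2)[0, 1], sigma3 / n)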
The Standard Sample Variance
Consider now the more realistic case in which μ is unknown. In this case, a natural approach is to average, in some sense, the squared deviations (X_i − M)² over i ∈ {1, 2, …, n}. It might seem that we should average by dividing by n. However, another approach is to divide by whatever constant would give us an unbiased estimator of σ². This constant turns out to be n − 1, leading to the standard sample variance:
S² = (1/(n − 1)) ∑_{i=1}^n (X_i − M)²
4. 𝔼(S²) = σ².
Proof:
By expanding (as was shown in the last section),
∑_{i=1}^n (X_i − M)² = ∑_{i=1}^n X_i² − n M²
Recall that 𝔼(M) = μ and var(M) = σ²/n, so that 𝔼(M²) = σ²/n + μ². Taking expected values in the displayed equation gives
𝔼[∑_{i=1}^n (X_i − M)²] = ∑_{i=1}^n (σ² + μ²) − n(σ²/n + μ²) = n(σ² + μ²) − n(σ²/n + μ²) = (n − 1)σ²
Of course, the square root of the sample variance is the sample standard deviation, denoted S .
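The effect of the n − 1 divisor in Theorem 4 can be seen directly by simulation. The sketch below is an illustration only, assuming NumPy, with an arbitrary normal sampling distribution; ddof=1 gives the standard sample variance S², while ddof=0 gives the version that divides by n. Averaged over many samples, only the former is centered at σ².

import numpy as np

rng = np.random.default_rng(seed=2)
sigma2, n, reps = 4.0, 5, 200_000                 # true variance, sample size, replications

samples = rng.normal(loc=0.0, scale=np.sqrt(sigma2), size=(reps, n))
s2 = samples.var(axis=1, ddof=1)                  # divide by n - 1 (standard sample variance)
v2 = samples.var(axis=1, ddof=0)                  # divide by n (biased version)

print(np.mean(s2))    # close to sigma^2 = 4.0
print(np.mean(v2))    # close to (n - 1)/n * sigma^2 = 3.2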
5. 𝔼(S) ≤ σ. Thus, S is a negatively biased estimator that tends to underestimate σ.
Proof:
The proof is exactly the same as in Theorem 2.
6. S² → σ² as n → ∞ with probability 1.
Proof:
This follows from the strong law of large numbers. Recall again that
S² = (1/(n − 1)) ∑_{i=1}^n X_i² − (n/(n − 1)) M² = (n/(n − 1)) [M(X²) − M²(X)]
But with probability 1, M(X²) → σ² + μ² as n → ∞ and M²(X) → μ² as n → ∞. Hence, with probability 1, S² → (σ² + μ²) − μ² = σ² as n → ∞.
Since S² is an unbiased estimator of σ², the variance of S² is the mean square error, a measure of the quality of the estimator.
7. var(S²) = (1/n)(σ₄ − ((n − 3)/(n − 1)) σ⁴).
Proof:
Recall from the last section that
S² = (1/(2n(n − 1))) ∑_{i=1}^n ∑_{j=1}^n (X_i − X_j)²
Hence, using the bilinear property of covariance we have
var(S²) = cov(S², S²) = (1/(4n²(n − 1)²)) ∑_{i=1}^n ∑_{j=1}^n ∑_{k=1}^n ∑_{l=1}^n cov[(X_i − X_j)², (X_k − X_l)²]
We compute the covariances in this sum by considering disjoint cases:
cov[(X_i − X_j)², (X_k − X_l)²] = 0 if i = j or k = l, and there are 2n³ − n² such terms.
cov[(X_i − X_j)², (X_k − X_l)²] = 0 if i, j, k, l are distinct, and there are n(n − 1)(n − 2)(n − 3) such terms.
cov[(X_i − X_j)², (X_k − X_l)²] = 2σ₄ + 2σ⁴ if i ≠ j and {k, l} = {i, j}, and there are 2n(n − 1) such terms.
cov[(X_i − X_j)², (X_k − X_l)²] = σ₄ − σ⁴ if i ≠ j, k ≠ l and #({i, j} ∩ {k, l}) = 1, and there are 4n(n − 1)(n − 2) such terms.
Substituting gives the result.
Note that var(S²) → 0 as n → ∞, and hence S² is a consistent estimator of σ². On the other hand, it's not surprising that the variance of the standard sample variance (where we assume that μ is unknown) is greater than the variance of the special sample variance (in which we assume that μ is known).
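Theorem 7 can also be checked numerically. The sketch below, assuming NumPy, draws many exponential samples of size n = 5 (an arbitrary choice), for which σ₄ = 9/λ⁴ and σ⁴ = 1/λ⁴, and compares the empirical variance of S² with the formula; with λ = 1 both values should be near 17/10, the value that reappears in Exercise 17(f).

import numpy as np

rng = np.random.default_rng(seed=3)
lam, n, reps = 1.0, 5, 500_000
sigma2, sigma4 = 1 / lam**2, 9 / lam**4           # variance and fourth central moment

samples = rng.exponential(1 / lam, size=(reps, n))
s2 = samples.var(axis=1, ddof=1)                  # standard sample variances

theory = (sigma4 - (n - 3) / (n - 1) * sigma2**2) / n   # Theorem 7
print(s2.var(), theory)                           # both should be about 1.7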
8. var(S²) > var(W²).
Proof:
From Theorem 1, Theorem 7, and simple algebra,
var(S²) − var(W²) = 2σ⁴ / (n(n − 1))
Note however that the difference goes to 0 as n → ∞.
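For example, for a random sample of size 5 from an exponential distribution (Exercise 17 below), var(S²) − var(W²) = 17/(10λ⁴) − 8/(5λ⁴) = 1/(10λ⁴) = 2σ⁴/(5 · 4), in agreement with the formula.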
Next we compute the covariance between the sample mean and the sample variance.
9. The covariance and correlation between the sample mean and sample variance are
a. cov(M, S²) = σ₃/n
b. cor(M, S²) = σ₃ / (σ √(σ₄ − ((n − 3)/(n − 1)) σ⁴))
Proof:
Recall again that
M = (1/n) ∑_{i=1}^n X_i,   S² = (1/(2n(n − 1))) ∑_{j=1}^n ∑_{k=1}^n (X_j − X_k)²
Hence, using the bilinear property of covariance we have
cov(M, S²) = (1/(2n²(n − 1))) ∑_{i=1}^n ∑_{j=1}^n ∑_{k=1}^n cov[X_i, (X_j − X_k)²]
We compute the covariances in this sum by considering disjoint cases:
cov[X_i, (X_j − X_k)²] = 0 if j = k, and there are n² such terms.
cov[X_i, (X_j − X_k)²] = 0 if i, j, k are distinct, and there are n(n − 1)(n − 2) such terms.
cov[X_i, (X_j − X_k)²] = σ₃ if j ≠ k and i ∈ {j, k}, and there are 2n(n − 1) such terms.
Substituting gives part (a). Part (b) follows from part (a), Theorem 7, and var(M) = σ²/n.
In particular, note that cov(M, S²) = cov(M, W²). Again, the sample mean and variance are uncorrelated if σ₃ = 0, so that skew(X) = 0. Our last result gives the covariance and correlation between the special sample variance and the standard one. Curiously, the covariance is the same as the variance of the special sample variance.
10. The covariance and correlation between W² and S² are
a. cov(W², S²) = (σ₄ − σ⁴)/n
b. cor(W², S²) = √((σ₄ − σ⁴) / (σ₄ − ((n − 3)/(n − 1)) σ⁴))
Proof:
Recall again that
W² = (1/n) ∑_{i=1}^n (X_i − μ)²,   S² = (1/(2n(n − 1))) ∑_{j=1}^n ∑_{k=1}^n (X_j − X_k)²
so by the bilinear property of covariance we have
cov(W², S²) = (1/(2n²(n − 1))) ∑_{i=1}^n ∑_{j=1}^n ∑_{k=1}^n cov[(X_i − μ)², (X_j − X_k)²]
Once again, we compute the covariances in this sum by considering disjoint cases:
cov[(X_i − μ)², (X_j − X_k)²] = 0 if j = k, and there are n² such terms.
cov[(X_i − μ)², (X_j − X_k)²] = 0 if i, j, k are distinct, and there are n(n − 1)(n − 2) such terms.
cov[(X_i − μ)², (X_j − X_k)²] = σ₄ − σ⁴ if j ≠ k and i ∈ {j, k}, and there are 2n(n − 1) such terms.
Substituting gives part (a). Part (b) follows from part (a) and Theorems 1 and 7.
Note that cor(W², S²) → 1 as n → ∞, not surprising since, with probability 1, S² → σ² and W² → σ² as n → ∞.
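This convergence can be visualized with a short simulation. The sketch below, assuming NumPy, uses the uniform distribution on [0, 1] as an arbitrary example, with its known mean μ = 1/2; the printed correlations climb toward 1 as n increases.

import numpy as np

rng = np.random.default_rng(seed=4)
mu, reps = 0.5, 50_000                      # known mean of uniform(0, 1); number of replications

for n in (2, 10, 100):
    samples = rng.random(size=(reps, n))    # uniform(0, 1) samples
    w2 = ((samples - mu) ** 2).mean(axis=1) # special sample variance (mu known)
    s2 = samples.var(axis=1, ddof=1)        # standard sample variance
    print(n, np.corrcoef(w2, s2)[0, 1])     # empirical cor(W^2, S^2)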
Exercises
Simulation Exercises
Many of the applets in this project are simulations of experiments with a basic random variable of interest. When you run
the simulation, you are performing independent replications of the experiment. In most cases, the applet displays the
standard deviation of the distribution, both numerically in a table and graphically as the radius of the blue, horizontal bar
in the graph box. When you run the simulation, the sample standard deviation is also displayed numerically in the table
and graphically as the radius of the red horizontal bar in the graph box.
11. In the binomial coin experiment, the random variable is the number of heads. For various values of the parameters n
(the number of coins) and p (the probability of heads), run the simulation 1000 times and note the apparent agreement
between the sample standard deviation and the distribution standard deviation.
12. In the simulation of the matching experiment, the random variable is the number of matches. For selected values of n
(the number of balls), run the simulation 1000 times and note the apparent agreement between the sample standard
deviation and the distribution standard deviation.
13. Run the simulation of the gamma experiment 1000 times for various values of the rate parameter r and the shape
parameter k . Note the apparent agreement between the sample standard deviation and the distribution standard
deviation.
Computational Exercises
14. Suppose that X has probability density function f(x) = 12x²(1 − x) for 0 ≤ x ≤ 1. The distribution of X is a member of the beta family. Compute each of the following:
a. μ = 𝔼(X)
b. σ² = var(X)
c. d₃ = 𝔼[(X − μ)³]
d. d₄ = 𝔼[(X − μ)⁴]
Answer:
a. 3/5
b. 1/25
c. −2/875
d. 33/8750
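These moments can be verified numerically. The sketch below, assuming SciPy is available, integrates against the given density f(x) = 12x²(1 − x); the printed values should match 3/5, 1/25, −2/875, and 33/8750.

from scipy.integrate import quad

f = lambda x: 12 * x**2 * (1 - x)                       # the given density on [0, 1]

mu = quad(lambda x: x * f(x), 0, 1)[0]                  # E(X)          = 3/5
var = quad(lambda x: (x - mu) ** 2 * f(x), 0, 1)[0]     # var(X)        = 1/25
d3 = quad(lambda x: (x - mu) ** 3 * f(x), 0, 1)[0]      # E[(X - mu)^3] = -2/875
d4 = quad(lambda x: (x - mu) ** 4 * f(x), 0, 1)[0]      # E[(X - mu)^4] = 33/8750
print(mu, var, d3, d4)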
15. Suppose now that (X₁, X₂, …, X₁₀) is a random sample of size 10 from the beta distribution in the previous problem. Find each of the following:
a. 𝔼(M)
b. var(M)
c. 𝔼(W²)
d. var(W²)
e. 𝔼(S²)
f. var(S²)
g. cov(M, W²)
h. cov(M, S²)
i. cov(W², S²)
Answer:
a. 3/5
b. 1/250
c. 1/25
d. 19/87500
e. 1/25
f. 199/787500
g. −2/8750
h. −2/8750
i. 19/87500
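These answers follow by plugging the moments from Exercise 14 into Theorems 1, 3, 4, 7, 9, and 10 with n = 10. A short sketch in exact rational arithmetic, using Python's fractions module, reproduces them (fractions are printed in lowest terms, so −2/8750 appears as −1/4375):

from fractions import Fraction as F

# Moments of the beta distribution from Exercise 14
mu, sigma2, d3, d4 = F(3, 5), F(1, 25), F(-2, 875), F(33, 8750)
n = 10

print(mu)                                      # E(M)            = 3/5
print(sigma2 / n)                              # var(M)          = 1/250
print(sigma2)                                  # E(W^2) = E(S^2) = 1/25
print((d4 - sigma2**2) / n)                    # var(W^2)        = 19/87500
print((d4 - F(n - 3, n - 1) * sigma2**2) / n)  # var(S^2)        = 199/787500
print(d3 / n)                                  # cov(M, W^2) = cov(M, S^2) = -2/8750 = -1/4375
print((d4 - sigma2**2) / n)                    # cov(W^2, S^2)   = 19/87500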
16. Suppose that X has probability density function f(x) = λe^(−λx) for 0 ≤ x < ∞, where λ > 0 is a parameter. Thus X has the exponential distribution with rate parameter λ. Compute each of the following:
a. μ = 𝔼(X)
b. σ² = var(X)
c. d₃ = 𝔼[(X − μ)³]
d. d₄ = 𝔼[(X − μ)⁴]
Answer:
a. 1/λ
b. 1/λ²
c. 2/λ³
d. 9/λ⁴
17. Suppose now that (X₁, X₂, …, X₅) is a random sample of size 5 from the exponential distribution in the previous problem. Find each of the following:
a. 𝔼(M)
b. var(M)
c. 𝔼(W²)
d. var(W²)
e. 𝔼(S²)
f. var(S²)
g. cov(M, W²)
h. cov(M, S²)
i. cov(W², S²)
Answer:
a. 1/λ
b. 1/(5λ²)
c. 1/λ²
d. 8/(5λ⁴)
e. 1/λ²
f. 17/(10λ⁴)
g. 2/(5λ³)
h. 2/(5λ³)
i. 8/(5λ⁴)
18. Recall that for an ace-six flat die, faces 1 and 6 have probability 1/4 each, while faces 2, 3, 4, and 5 have probability 1/8 each. Let X denote the score when an ace-six flat die is thrown. Compute each of the following:
a. μ = 𝔼(X)
b. σ² = var(X)
c. d₃ = 𝔼[(X − μ)³]
d. d₄ = 𝔼[(X − μ)⁴]
Answer:
a. 7/2
b. 15/4
c. 0
d. 333/16
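These moments can be computed directly from the definition of the ace-six flat die. A short sketch in exact arithmetic, using Python's fractions module:

from fractions import Fraction as F

# Ace-six flat die: faces 1 and 6 have probability 1/4, faces 2-5 have probability 1/8
faces = [1, 2, 3, 4, 5, 6]
probs = [F(1, 4), F(1, 8), F(1, 8), F(1, 8), F(1, 8), F(1, 4)]

mu = sum(p * x for x, p in zip(faces, probs))                 # 7/2
var = sum(p * (x - mu) ** 2 for x, p in zip(faces, probs))    # 15/4
d3 = sum(p * (x - mu) ** 3 for x, p in zip(faces, probs))     # 0
d4 = sum(p * (x - mu) ** 4 for x, p in zip(faces, probs))     # 333/16
print(mu, var, d3, d4)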
19. Suppose now that an ace-six flat die is tossed 8 times. Find each of the following:
a. 𝔼(M)
b. var(M)
c. 𝔼(W²)
d. var(W²)
e. 𝔼(S²)
f. var(S²)
g. cov(M, W²)
h. cov(M, S²)
i. cov(W², S²)
Answer:
a. 7/2
b. 15/32
c. 15/4
d. 27/32
e. 15/4
f. 603/448
g. 0
h. 0
i. 27/32
A particularly important special case occurs when the sampling distribution is normal. This case is explored in the section
on Special Properties of Normal Samples.