
Homework 6 - Solutions
Intermediate Statistics - 36705
Problem 1. [20 points: 5+10+5]
(a) First note that:
\[
\theta = P(X_i = 0) = e^{-\lambda}.
\]
In HW5, Problem 3, we saw that the MLE for $\lambda$ is given by the sample mean $\bar{X}$. So by equivariance, the MLE $\hat{\theta}$ for $\theta$ is $\hat{\theta} = e^{-\bar{X}}$.
(b) By the Central Limit Theorem:
\[
\frac{\sqrt{n}(\bar{X} - \lambda)}{\sqrt{\lambda}} \rightsquigarrow N(0,1).
\]
So by the Delta Method with $g(t) = e^{-t}$, the limiting distribution of $\hat{\theta} = e^{-\bar{X}}$ is given by
\[
\frac{\sqrt{n}\left(e^{-\bar{X}} - e^{-\lambda}\right)}{e^{-\lambda}\sqrt{\lambda}} \rightsquigarrow N(0,1).
\]
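A quick Monte Carlo sketch (my addition, not part of the original solution) can illustrate this limit; the values $\lambda = 2$, $n = 500$, $B = 10000$ below are arbitrary:

# Monte Carlo check of the Delta Method limit for theta-hat = exp(-Xbar).
# lambda, n and B are arbitrary illustrative choices.
set.seed(1)
lambda <- 2; n <- 500; B <- 10000
z <- replicate(B, {
  xbar <- mean(rpois(n, lambda))
  sqrt(n) * (exp(-xbar) - exp(-lambda)) / (exp(-lambda) * sqrt(lambda))
})
c(mean(z), var(z))  # should be close to 0 and 1
# qqnorm(z) yields an approximately straight line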
(c) By the Weak Law of Large Numbers, $\bar{X} \xrightarrow{P} \lambda$. Since the function $e^{-t}$ is continuous, by the Continuous Mapping Theorem $\hat{\theta} = e^{-\bar{X}} \xrightarrow{P} e^{-\lambda}$, i.e. $\hat{\theta}$ is consistent.

Note: notice that we also have $\frac{\sqrt{n}(\bar{X} - \lambda)}{\sqrt{\bar{X}}} \rightsquigarrow N(0,1)$. How would you formally show it?
Problem 2. [30 points: 5+5+10+10]
(a) MOME: we have
\[
E[X_1] = \frac{\theta}{2} \implies \hat{\theta}_1 = 2\bar{X}.
\]
MLE: the likelihood function is $L(\theta) = \frac{1}{\theta^n} I_{[X_{(n)},\infty)}(\theta)$. This function is equal to zero on $(-\infty, X_{(n)})$, at $X_{(n)}$ it jumps to $1/X_{(n)}^n > 0$, and then it decreases asymptotically to zero as $\theta \to \infty$. This means that the supremum is attained at $\theta = X_{(n)}$, i.e. the MLE for $\theta$ is
\[
\hat{\theta}_2 = X_{(n)}.
\]
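As a small illustration (my addition), both estimators are one-liners on simulated $\mathrm{Uniform}(0,\theta)$ data; $\theta = 3$ and $n = 50$ are arbitrary choices:

# MOME vs MLE for Uniform(0, theta); theta and n are arbitrary.
set.seed(1)
theta <- 3; n <- 50
x <- runif(n, 0, theta)
theta1 <- 2 * mean(x)  # MOME: twice the sample mean
theta2 <- max(x)       # MLE: the sample maximum X_(n)
c(theta1, theta2)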
(b) By the Weak Law of Large Numbers we know that $\bar{X} \xrightarrow{P} \theta/2$, so by the Continuous Mapping Theorem $\hat{\theta}_1$ is consistent. To show consistency of $\hat{\theta}_2$, let $0 < \epsilon < \theta$ and then, using results from previous HWs and Test 1:
\[
P(|\theta - X_{(n)}| > \epsilon) = P(\theta - \epsilon > X_{(n)}) = \left(1 - \frac{\epsilon}{\theta}\right)^n \to 0
\]
as $n \to \infty$, which proves the claim.
(c) MOME. We have that $\mathrm{Var}(\hat{\theta}_1) = \frac{\theta^2}{3n}$. Thus, by the Central Limit Theorem:
\[
\frac{\sqrt{n}(2\bar{X} - \theta)}{\theta/\sqrt{3}} = \frac{\hat{\theta}_1 - \theta}{\mathrm{sd}(\hat{\theta}_1)} \rightsquigarrow N(0,1).
\]
MLE. To find the limiting distribution of $\hat{\theta}_2 = X_{(n)}$, note that for any $c > 0$:
\[
P\left(\frac{n(\theta - X_{(n)})}{\theta} \le c\right) = P(\theta - X_{(n)} \le \theta c/n) = 1 - \left(1 - \frac{c}{n}\right)^n \to 1 - e^{-c},
\]
which implies that:
\[
\frac{n(\theta - X_{(n)})}{\theta} \rightsquigarrow \mathrm{Exp}(1).
\]
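A simulation sketch (my addition; $\theta = 3$, $n = 200$, $B = 10000$ are arbitrary) makes the $\mathrm{Exp}(1)$ limit visible:

# Empirical distribution of n*(theta - X_(n))/theta vs Exp(1).
set.seed(1)
theta <- 3; n <- 200; B <- 10000
w <- replicate(B, n * (theta - max(runif(n, 0, theta))) / theta)
mean(w)  # Exp(1) has mean 1
# A QQ-plot against qexp(ppoints(B)) supports the limit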
(d) MOME. For every $\epsilon > 0$ we can find $K$ such that
\[
P(\sqrt{n}\,|\hat{\theta}_1 - \theta| > K) = P(|\hat{\theta}_1 - \theta| > K/\sqrt{n}) \le \frac{n\,\mathrm{Var}(\hat{\theta}_1)}{K^2} = \frac{\theta^2}{3K^2} < \epsilon,
\]
in particular for every $K > \theta/\sqrt{3\epsilon}$. Thus, $\hat{\theta}_1 - \theta = O_P(1/\sqrt{n})$.
MLE. Recall that if $W_i \sim \mathrm{Uniform}(0,1)$, then $W_{(n)} \sim \mathrm{Beta}(n,1)$, so that $E[W_{(n)}] = \frac{n}{n+1}$ and $\mathrm{Var}(W_{(n)}) = \frac{n}{(n+1)^2(n+2)}$. Thus $E[\hat{\theta}_2] = \frac{\theta n}{n+1}$, $\mathrm{Var}(\hat{\theta}_2) = \frac{\theta^2 n}{(n+1)^2(n+2)}$ and $\mathrm{Bias}(\hat{\theta}_2) = -\frac{\theta}{n+1}$. Therefore, for every $\epsilon > 0$ we can find $K$ such that
\[
P(n|\hat{\theta}_2 - \theta| > K) \le \frac{n^2}{K^2} E[(\hat{\theta}_2 - \theta)^2] = \frac{n^2}{K^2}\left(\frac{\theta^2}{(n+1)^2} + \frac{\theta^2 n}{(n+1)^2(n+2)}\right) \le \frac{n^2}{K^2}\left(\frac{\theta^2}{n^2} + \frac{\theta^2 n}{n^3}\right) = \frac{2\theta^2}{K^2} < \epsilon,
\]
in particular for every $K > \theta\sqrt{2/\epsilon}$. This proves that $\hat{\theta}_2 - \theta = O_P(1/n)$.
In terms of speed of convergence, $\hat{\theta}_2$ is preferable.
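A sketch comparing the two error rates on simulated data (my addition; $\theta = 3$ and the sample sizes are arbitrary):

# Scaled errors: sqrt(n)*|MOME error| and n*|MLE error| both stay
# stable as n grows, illustrating the O_P rates above.
set.seed(1)
theta <- 3; B <- 5000
for (n in c(50, 200, 800)) {
  x <- matrix(runif(n * B, 0, theta), nrow = B)
  e1 <- sqrt(n) * abs(2 * rowMeans(x) - theta)  # MOME, sqrt(n)-scaled
  e2 <- n * abs(apply(x, 1, max) - theta)       # MLE, n-scaled
  cat(n, median(e1), median(e2), "\n")
}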
Problem 3. [10 points]
For the Normal distribution we know that the MLE for its mean is given by $\bar{X}$. We also have that in this problem (where $X_i \sim N(\mu, 1)$) $\mathrm{sd}(\bar{X}) = \frac{1}{\sqrt{n}}$. Thus, the Wald statistic for this problem is given by:
\[
W = \sqrt{n}(\bar{X} - \mu_0),
\]
and under the null hypothesis $H_0: \mu = \mu_0$ we have $W \sim N(0,1)$ (exact distribution).
The alternative hypothesis is $H_1: \mu \ne \mu_0$, which means that we are doing a two-sided test. Thus, in the case of equal tails, the rejection rule of a size $\alpha$ Wald test has the form:
\[
W < -z_{\alpha/2} \ \text{ or } \ W > z_{\alpha/2} \iff |W| > z_{\alpha/2} \iff |\sqrt{n}(\bar{X} - \mu_0)| > z_{\alpha/2},
\]
where $z_\beta = \Phi^{-1}(1-\beta)$ and $\Phi$ is the CDF of $Z \sim N(0,1)$. Notice that the absolute value comes from the symmetry about zero.
The p-value for the observed $w = \sqrt{n}(\bar{x} - \mu_0)$ is:
\[
P_{\mu_0}(|W| > |w|) = 2\Phi(-|w|).
\]
If $X_i \sim N(\mu, 1)$ then we have that $W \sim N(\sqrt{n}(\mu - \mu_0), 1)$, where $\mu$ is the true value. Thus, the CDF of the p-value is given by:
\[
P_\mu(2\Phi(-|W|) \le \alpha) = P_\mu(\Phi(-|W|) \le \alpha/2) = P_\mu(-|W| \le -z_{\alpha/2}) = P_\mu(W \le -z_{\alpha/2}) + P_\mu(W \ge z_{\alpha/2}).
\]
Writing $W = Z + \sqrt{n}(\mu - \mu_0)$, where $Z \sim N(0,1)$, the above equals
\[
P_\mu(Z \le -z_{\alpha/2} - \sqrt{n}(\mu - \mu_0)) + P_\mu(Z \ge z_{\alpha/2} - \sqrt{n}(\mu - \mu_0)) = \Phi(-z_{\alpha/2} - \sqrt{n}(\mu - \mu_0)) + \Phi(-z_{\alpha/2} + \sqrt{n}(\mu - \mu_0)).
\]
In the particular case $\mu = \mu_0$ the previous expression equals $\alpha$, which proves that the p-value has a $\mathrm{Uniform}(0,1)$ distribution under the null hypothesis.
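A simulation sketch of the p-value distribution under the null (my addition; $n = 30$, $\mu_0 = 0$ are arbitrary):

# Under H0 the p-values should be approximately Uniform(0,1).
set.seed(1)
n <- 30; mu0 <- 0; B <- 10000
pvals <- replicate(B, {
  w <- sqrt(n) * (mean(rnorm(n, mu0, 1)) - mu0)
  2 * pnorm(-abs(w))
})
mean(pvals <= 0.05)  # close to 0.05, the size of the test
# hist(pvals) is approximately flat on [0, 1]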
Problem 4. [40 points: 10+10+10+10]
(a) We compute the MLE when both $\mu, \sigma^2$ are unknown for a given sample $X_1, \ldots, X_n$:
\[
L(\mu, \sigma^2) = \prod_{i=1}^n \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{(X_i - \mu)^2}{2\sigma^2}\right].
\]
Taking the log and finding the partial derivatives:
\[
\frac{\partial}{\partial \mu} \log L(\mu, \sigma^2) = \frac{1}{\sigma^2} \sum_{i=1}^n (X_i - \mu),
\]
\[
\frac{\partial}{\partial \sigma^2} \log L(\mu, \sigma^2) = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4} \sum_{i=1}^n (X_i - \mu)^2.
\]
Setting both partial derivatives equal to 0 and solving the system of equations, we obtain the estimators:
\[
\hat{\mu} = \bar{X}, \qquad \hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^n (X_i - \bar{X})^2.
\]
This is exactly the same as Example 4 in Lecture 7. See also Lecture 9, Example 9, to see that the second order conditions are satisfied. Since the MLE $\hat{\sigma}$ is a consistent estimator of $\sigma$, by Slutsky's Theorem
\[
\frac{\bar{X} - \mu}{\hat{\sigma}/\sqrt{n}} \rightsquigarrow N(0,1).
\]
Thus, the test statistic of the Wald test for $H_0: \mu = \mu_0$ versus $H_1: \mu \ne \mu_0$ is
\[
W = \frac{\bar{X} - \mu_0}{\hat{\sigma}/\sqrt{n}}.
\]
Hence, under $H_0$ we have $W \approx N(0,1)$. The alternative hypothesis is $H_1: \mu \ne \mu_0$, which means that we are doing a two-sided test. Thus, in the case of equal tails, the rejection rule of a size $\alpha$ Wald test has the form:
\[
W < -z_{\alpha/2} \ \text{ or } \ W > z_{\alpha/2} \iff |W| > z_{\alpha/2}.
\]
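A minimal R sketch of this Wald test (my addition; the function name and interface are hypothetical):

# Wald test for H0: mu = mu0 with unknown variance, as derived above.
wald_test_mean <- function(x, mu0, alpha = 0.05) {
  n <- length(x)
  sigma_hat <- sqrt(mean((x - mean(x))^2))  # MLE of sigma (divides by n)
  W <- (mean(x) - mu0) / (sigma_hat / sqrt(n))
  list(W = W, reject = abs(W) > qnorm(1 - alpha / 2))
}
# Example: wald_test_mean(rnorm(100, 0.3, 2), mu0 = 0)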
(b) Likelihood Ratio Test for $H_0: \mu = \mu_0$ vs $H_1: \mu \ne \mu_0$. We reject $H_0$ if:
\[
\lambda(X^n) = \frac{\sup_{\theta \in \Theta_0} L(\theta)}{\sup_{\theta \in \Theta} L(\theta)} = \frac{L(\hat{\theta}_0)}{L(\hat{\theta})} < k,
\]
where $\Theta_0 = \{\mu_0\} \times (0, +\infty)$, and $k$ is some constant that will make $P(\text{Type I error}) \le \alpha$.
We have:
\[
L(\mu_0; \sigma^2) = (2\pi\sigma^2)^{-n/2} \exp\left\{-\frac{1}{2\sigma^2} \sum_{i=1}^n (X_i - \mu_0)^2\right\},
\]
\[
\ell(\mu_0; \sigma^2) = -\frac{n}{2} \log(2\pi\sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^n (X_i - \mu_0)^2.
\]
Moreover, letting $\xi = \sigma^2$:
\[
\frac{\partial \ell}{\partial \xi} = -\frac{n}{2\xi} + \frac{\sum_{i=1}^n (X_i - \mu_0)^2}{2\xi^2} = 0
\]
is solved by $\hat{\sigma}^2 = \xi^* = \frac{1}{n}\sum_{i=1}^n (X_i - \mu_0)^2$.
The second order condition is satisfied: at $\xi = \xi^*$,
\[
\frac{\partial^2 \ell}{\partial \xi^2} = \frac{n}{2\xi^{*2}} - \frac{n}{\xi^{*2}} = -\frac{n}{2\xi^{*2}} < 0.
\]
Thus
\[
L(\hat{\theta}_0) = L(\mu_0, \hat{\sigma}^2) = \left(2\pi \frac{\sum_{i=1}^n (X_i - \mu_0)^2}{n}\right)^{-n/2} \exp\left\{-\frac{n}{2}\right\}.
\]
The MLE estimators for $\mu$ and $\sigma^2$ are $\bar{X}$ and $\frac{1}{n}\sum_{i=1}^n (X_i - \bar{X})^2$, respectively. Thus:
\[
L(\hat{\theta}) = \left(2\pi \frac{\sum_{i=1}^n (X_i - \bar{X})^2}{n}\right)^{-n/2} \exp\left\{-\frac{n}{2}\right\}.
\]
Therefore, we will reject $H_0$ if
\[
\lambda(X^n) = \frac{L(\hat{\theta}_0)}{L(\hat{\theta})} = \left(\frac{\sum_{i=1}^n (X_i - \mu_0)^2}{\sum_{i=1}^n (X_i - \bar{X})^2}\right)^{-n/2} = \left(\frac{\sum_{i=1}^n (X_i - \bar{X})^2 + n(\bar{X} - \mu_0)^2}{\sum_{i=1}^n (X_i - \bar{X})^2}\right)^{-n/2} = \left(1 + \frac{n(\bar{X} - \mu_0)^2}{\sum_{i=1}^n (X_i - \bar{X})^2}\right)^{-n/2} < k,
\]
where we used the identity $\sum_{i=1}^n (X_i - \mu_0)^2 = \sum_{i=1}^n (X_i - \bar{X})^2 + n(\bar{X} - \mu_0)^2$; i.e. we reject $H_0$ if
\[
\frac{n(\bar{X} - \mu_0)^2}{\sum_{i=1}^n (X_i - \bar{X})^2}
\]
is sufficiently large. Equivalently, we reject $H_0$ when $T = \frac{\bar{X} - \mu_0}{\sqrt{S^2/n}}$ is very large or very small, where $S^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar{X})^2$. Since $T \sim t_{n-1}$ under $H_0$, for a fixed $\alpha$ we reject $H_0$ if $|T| > t_{n-1,\alpha/2}$. However, under the null hypothesis, for large $n$, $T \approx N(0,1)$, so that we can also use $z_{\alpha/2}$ as the critical value. Thus, compare with the results in (a).
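A numerical sketch (my addition) verifying on one simulated sample that $\lambda(X^n)$ is a decreasing function of $|T|$, so that thresholding $\lambda$ is equivalent to thresholding $|T|$:

# Check: lambda = (1 + T^2/(n-1))^(-n/2) on a simulated sample.
set.seed(1)
n <- 25; mu0 <- 0
x <- rnorm(n, 0.5, 2)
ss <- sum((x - mean(x))^2)
Tstat <- (mean(x) - mu0) / sqrt(ss / (n - 1) / n)
lambda <- (1 + n * (mean(x) - mu0)^2 / ss)^(-n / 2)
all.equal(lambda, (1 + Tstat^2 / (n - 1))^(-n / 2))  # TRUE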
(c) We have that
\[
\frac{\sum_{i=1}^n (X_i - \bar{X})^2}{\sigma^2} \sim \chi^2_{n-1},
\]
so that for the MLE
\[
\hat{\sigma}^2 = \frac{\sigma^2}{n} \cdot \frac{\sum_{i=1}^n (X_i - \bar{X})^2}{\sigma^2} \sim \frac{\sigma^2}{n}\,\chi^2_{n-1}.
\]
Thus, $\mathrm{sd}(\hat{\sigma}^2) = \frac{\sigma^2}{n}\sqrt{2(n-1)}$, which we can estimate by
\[
\widehat{\mathrm{sd}}(\hat{\sigma}^2) = \frac{\hat{\sigma}^2}{n}\sqrt{2(n-1)}.
\]
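A simulation sketch of the scaled chi-square law of $\hat{\sigma}^2$ (my addition; $\sigma = 1.5$, $n = 40$ are arbitrary):

# sigma-hat^2 should behave like (sigma^2/n) * chi^2_{n-1}.
set.seed(1)
sigma <- 1.5; n <- 40; B <- 10000
s2_mle <- replicate(B, { x <- rnorm(n, 0, sigma); mean((x - mean(x))^2) })
c(mean(s2_mle), sigma^2 * (n - 1) / n)         # means agree
c(var(s2_mle), (sigma^2 / n)^2 * 2 * (n - 1))  # variances agree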
So by asymptotic normality of the MLE and Slutsky's Theorem, we have
\[
\frac{\hat{\sigma}^2 - \sigma^2}{\frac{\hat{\sigma}^2}{n}\sqrt{2(n-1)}} \approx \frac{\hat{\sigma}^2 - \sigma^2}{\hat{\sigma}^2\sqrt{2/n}} \rightsquigarrow N(0,1).
\]
So the test statistic is
\[
W = \frac{\hat{\sigma}^2 - \sigma_0^2}{\hat{\sigma}^2\sqrt{2/n}},
\]
and the rejection region of the level $\alpha$ Wald test is defined by
\[
|W| > z_{\alpha/2},
\]
for reasons similar to those explained in (a).
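A minimal R sketch of this variance Wald test (my addition; names are hypothetical):

# Wald test for H0: sigma^2 = sigma0_sq, as derived above.
wald_test_var <- function(x, sigma0_sq, alpha = 0.05) {
  n <- length(x)
  sigma_hat_sq <- mean((x - mean(x))^2)  # MLE of sigma^2
  W <- (sigma_hat_sq - sigma0_sq) / (sigma_hat_sq * sqrt(2 / n))
  list(W = W, reject = abs(W) > qnorm(1 - alpha / 2))
}
# Example: wald_test_var(rnorm(200, 0, 1.3), sigma0_sq = 1)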
(d) Likelihood Ratio Test for $H_0: \sigma = \sigma_0$ vs $H_1: \sigma \ne \sigma_0$. We reject $H_0$ if:
\[
\lambda(X^n) = \frac{\sup_{\theta \in \Theta_0} L(\theta)}{\sup_{\theta \in \Theta} L(\theta)} = \frac{L(\hat{\theta}_0)}{L(\hat{\theta})} < k,
\]
where $\Theta_0 = \mathbb{R} \times \{\sigma_0\}$ and $k$ is a constant. For every fixed value of $\sigma^2$, we have that the MLE for $\mu$ is $\bar{X}$. Thus:
\[
L(\hat{\theta}_0) = (2\pi\sigma_0^2)^{-n/2} \exp\left\{-\frac{1}{2}\frac{\sum_{i=1}^n (X_i - \bar{X})^2}{\sigma_0^2}\right\}.
\]
The MLE estimators for $\mu$ and $\sigma^2$ are $\bar{X}$ and $\frac{1}{n}\sum_{i=1}^n (X_i - \bar{X})^2$, respectively. Thus:
\[
L(\hat{\theta}) = \left(2\pi \frac{\sum_{i=1}^n (X_i - \bar{X})^2}{n}\right)^{-n/2} \exp\left\{-\frac{n}{2}\right\}.
\]
Thus:
\[
\lambda(X^n) = \frac{(2\pi\sigma_0^2)^{-n/2} \exp\left\{-\frac{1}{2}\frac{\sum_{i=1}^n (X_i - \bar{X})^2}{\sigma_0^2}\right\}}{\left(2\pi \frac{\sum_{i=1}^n (X_i - \bar{X})^2}{n}\right)^{-n/2} \exp\left\{-\frac{n}{2}\right\}} = \left(\frac{\sum_{i=1}^n (X_i - \bar{X})^2}{n\sigma_0^2}\right)^{n/2} \exp\left\{-\frac{\sum_{i=1}^n (X_i - \bar{X})^2}{2\sigma_0^2} + \frac{n}{2}\right\}.
\]
We can see that $\lambda(X^n)$ is a function of the kind $(g(y))^{n/2}$, where $g(y) = ye^{1-y}$ and
\[
y = \frac{\sum_{i=1}^n (X_i - \bar{X})^2}{n\sigma_0^2}.
\]
The function $g$ is positive with a unique maximum at $y = 1$ (see the figure below), so $\lambda(X^n) < k$ holds exactly when $y < k_1$ or $y > k_2$ for suitable constants $k_1 < 1 < k_2$. We can then use the statistic $T = \frac{\sum_{i=1}^n (X_i - \bar{X})^2}{\sigma_0^2} = ny$, which satisfies $T \sim \chi^2_{n-1}$ under the null hypothesis $H_0$. In particular, we will reject $H_0$ if $T < a$ or $T > b$, for some particular $a, b$ such that $P_{\sigma_0}(\{T < a\} \cup \{T > b\}) = \alpha$. The exact choice would be $\{a, b\} = n \cdot g^{-1}(k_\alpha^{2/n})$, where $k_\alpha$ is such that $P_{\sigma_0}(\lambda(X^n) < k_\alpha) = \alpha$ and $g^{-1}$ is the generalized inverse of $g$. However, we might simply consider the equal-tailed quantiles of the $\chi^2_{n-1}$ distribution (i.e. $a = \chi^2_{n-1;1-\alpha/2}$, $b = \chi^2_{n-1;\alpha/2}$), or choose $a, b$ delimiting the shortest interval with coverage $1-\alpha$ of the $\chi^2_{n-1}$ distribution (see the figure below). Those $a, b$ satisfy the following system of equations:
\[
\begin{cases}
0 < a < b \\
f(a) = f(b) \\
\int_a^b f(t)\,dt = 1 - \alpha
\end{cases}
\]
where $f$ is the p.d.f. of the $\chi^2_{n-1}$ distribution. These values can be found numerically (e.g. by using R).
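For instance, a sketch of such a numerical solution in R (my addition; $n = 20$ and $\alpha = 0.05$ are arbitrary), parameterizing the interval by $a$, enforcing the coverage constraint through $b(a)$, and solving $f(a) = f(b(a))$:

# Shortest 1-alpha coverage interval [a, b] for the chi^2_{n-1} density.
n <- 20; alpha <- 0.05; df <- n - 1
b_of_a <- function(a) qchisq(pchisq(a, df) + 1 - alpha, df)  # coverage = 1-alpha
h <- function(a) dchisq(a, df) - dchisq(b_of_a(a), df)       # want f(a) = f(b)
a <- uniroot(h, c(1e-6, qchisq(alpha, df) - 1e-6))$root
b <- b_of_a(a)
c(a, b)  # shorter than the equal-tailed qchisq(c(alpha/2, 1 - alpha/2), df)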
Alternatively, we could use the asymptotic result $\kappa = -2\log\lambda(X^n) \approx \chi^2_1$ (1 degree of freedom, because under $H_1$ there are 2 free parameters, while under $H_0$ we have 1 free parameter) and reject $H_0$ if $\kappa > \chi^2_{1,\alpha}$. Yet, in this exercise we prefer the previous non-asymptotic solutions.
[Figures: left, the function $g(y) = y\exp(1-y)$ plotted for $y \in (0, 20)$; right, "Chi-square. LR-Test on sigma": the $\chi^2_{n-1}$ density $f(t)$ for $t \in (0, 10)$, from which the shortest $1-\alpha$ interval $[a, b]$ can be read off.]