Lecture 1 for BST 632: Statistical Theory II – Kui Zhang, Spring 2010
Chapter 5 – Properties of a Random Sample
Section 5.4 – Order Statistics
Definition 5.4.1: The order statistics of a random sample $X_1, \ldots, X_n$ are the sample values placed in ascending order. They are denoted by $X_{(1)}, \ldots, X_{(n)}$, where $X_{(1)} \le \cdots \le X_{(n)}$.
Theorem 5.4.4: Let $X_{(1)}, \ldots, X_{(n)}$ denote the order statistics of a random sample, $X_1, \ldots, X_n$, from a continuous population with cdf $F_X(x)$ and pdf $f_X(x)$. Then the pdf of $X_{(j)}$ is
$$f_{X_{(j)}}(x) = \frac{n!}{(j-1)!\,(n-j)!}\, f_X(x)\,[F_X(x)]^{j-1}\,[1 - F_X(x)]^{n-j}.$$
Example 5.4.5: (Uniform order statistic pdf) The $j$th order statistic from a uniform$(0,1)$ sample has a beta$(j, n-j+1)$ distribution. Consequently,
$$E(X_{(j)}) = \frac{j}{n+1} \quad \text{and} \quad \operatorname{Var} X_{(j)} = \frac{j(n-j+1)}{(n+1)^2(n+2)}.$$
Theorem 5.4.6: Let $X_{(1)}, \ldots, X_{(n)}$ denote the order statistics of a random sample, $X_1, \ldots, X_n$, from a continuous population with cdf $F_X(x)$ and pdf $f_X(x)$. Then the joint pdf of $X_{(i)}$ and $X_{(j)}$, $1 \le i < j \le n$, is
$$f_{X_{(i)}, X_{(j)}}(u, v) = \frac{n!}{(i-1)!\,(j-i-1)!\,(n-j)!}\, f_X(u)\, f_X(v)\,[F_X(u)]^{i-1}[F_X(v) - F_X(u)]^{j-i-1}[1 - F_X(v)]^{n-j}$$
for $-\infty < u < v < \infty$.
The joint pdf of $X_{(1)}, \ldots, X_{(n)}$ is given by
$$f_{X_{(1)}, \ldots, X_{(n)}}(x_1, \ldots, x_n) = \begin{cases} n!\, f_X(x_1) \cdots f_X(x_n), & -\infty < x_1 < \cdots < x_n < \infty; \\ 0, & \text{otherwise.} \end{cases}$$
Example: Let $X_1, \ldots, X_n$ be continuous, independent random variables and let $X_{(1)}, \ldots, X_{(n)}$ denote their order statistics. Assume that $X_i \sim f_{X_i}(x)$. What are the pdfs of $X_{(1)}$ and $X_{(n)}$?
Solution: The cdf of $X_{(1)}$ is
$$F_{X_{(1)}}(x) = P(X_{(1)} \le x) = 1 - P(X_{(1)} > x) = 1 - P(X_1 > x, \ldots, X_n > x) = 1 - \prod_{i=1}^n P(X_i > x) = 1 - \prod_{i=1}^n [1 - F_{X_i}(x)].$$
So the pdf of $X_{(1)}$ is
$$f_{X_{(1)}}(x) = \frac{d}{dx} F_{X_{(1)}}(x) = \sum_{i=1}^n f_{X_i}(x) \prod_{j \ne i} [1 - F_{X_j}(x)].$$
The cdf of $X_{(n)}$ is
$$F_{X_{(n)}}(x) = P(X_{(n)} \le x) = P(X_1 \le x, \ldots, X_n \le x) = \prod_{i=1}^n P(X_i \le x) = \prod_{i=1}^n F_{X_i}(x).$$
So the pdf of $X_{(n)}$ is
$$f_{X_{(n)}}(x) = \frac{d}{dx} F_{X_{(n)}}(x) = \sum_{i=1}^n f_{X_i}(x) \prod_{j \ne i} F_{X_j}(x).$$
If $f_{X_i}(x) = f(x)$ for all $i$, then $f_{X_{(1)}}(x) = n f(x)[1 - F(x)]^{n-1}$ and $f_{X_{(n)}}(x) = n f(x)[F(x)]^{n-1}$, agreeing with Theorem 5.4.4.
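One concrete case of the minimum formula (a sketch added for illustration, with rates and seed chosen arbitrarily): for independent $X_i \sim$ exponential with rate $\lambda_i$, the product formula gives $F_{X_{(1)}}(x) = 1 - \exp(-x \sum_i \lambda_i)$, so the minimum is exponential with rate $\sum_i \lambda_i$.

import numpy as np

# For independent X_i ~ exponential(rate lam_i), the formula
# F_min(x) = 1 - prod_i [1 - F_i(x)] reduces to 1 - exp(-x * sum(lam)),
# so the minimum is exponential with rate sum(lam). Verify by simulation.
rng = np.random.default_rng(1)
lam = np.array([0.5, 1.0, 2.0])
reps = 200_000

x = rng.exponential(scale=1.0 / lam, size=(reps, lam.size))
x_min = x.min(axis=1)

print("empirical E[X_(1)]:", x_min.mean(), " theory:", 1.0 / lam.sum())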
Chapter 6 – Principles of Data Reduction
Section 6.2.1: Sufficient Statistics
Definition 6.2.1 A statistic $T(\mathbf{X})$ is a sufficient statistic for $\theta$ if the conditional distribution of the sample $\mathbf{X}$ given the value of $T(\mathbf{X})$ does not depend on $\theta$.
Example 6.2.3 (Binomial sufficient statistic) Let $X_1, \ldots, X_n$ be iid Bernoulli with parameter $\theta$, $0 < \theta < 1$. Show that $T(\mathbf{X}) = X_1 + \cdots + X_n$ is a sufficient statistic for $\theta$.
Example 6.2.4 (Normal sufficient statistic) Let $X_1, \ldots, X_n$ be iid $n(\mu, \sigma^2)$, where $\sigma^2$ is known. Show that $T(\mathbf{X}) = \bar{X}$ is a sufficient statistic for $\mu$.
Example (Example of a Statistic that Is Not Sufficient) Consider the model of Example 6.2.2 again with $n = 3$. Then $T = X_1 + X_2 + X_3$ is sufficient, while $T' = X_1 + 2X_2 + X_3$ is not sufficient because
$$P(X_1 = 1, X_2 = 0, X_3 = 1 \mid X_1 + 2X_2 + X_3 = 2) = \theta,$$
which depends on $\theta$.
Example (Sufficient Statistic for Poisson Family) Let $X_1, \ldots, X_n$ be iid from a Poisson population with parameter $\lambda > 0$. Then $T(\mathbf{X}) = \sum_{i=1}^n X_i$ is a sufficient statistic for $\lambda$.
Theorem 6.2.6 (Factorization Theorem) Let $f(\mathbf{x} \mid \theta)$ denote the joint pdf or pmf of a sample $\mathbf{X}$. A statistic $T(\mathbf{X})$ is a sufficient statistic for $\theta$ if and only if there exist functions $g(t \mid \theta)$ and $h(\mathbf{x})$ such that, for all sample points $\mathbf{x}$ and all parameter points $\theta$,
$$f(\mathbf{x} \mid \theta) = g(T(\mathbf{x}) \mid \theta)\, h(\mathbf{x}).$$
Example 6.2.7 Let $X_1, \ldots, X_n$ be iid $n(\mu, \sigma^2)$, where $\sigma^2$ is known. Show that $T(\mathbf{X}) = \bar{X}$ is a sufficient statistic for $\mu$ using the Factorization Theorem.
Example 6.2.8 (Uniform sufficient statistic) Let $X_1, \ldots, X_n$ be iid from a discrete uniform distribution on $1, \ldots, \theta$. Show that $T(\mathbf{X}) = X_{(n)} = \max_{1 \le i \le n} X_i$ is a sufficient statistic for $\theta$.
Example 6.2.9 (Normal sufficient statistic, $\mu$ and $\sigma^2$ unknown) Let $X_1, \ldots, X_n$ be iid $n(\mu, \sigma^2)$. Show that $T(\mathbf{X}) = (T_1(\mathbf{X}), T_2(\mathbf{X})) = (\bar{X}, S^2)$ is a sufficient statistic for $(\mu, \sigma^2)$.
Example (Sufficient Statistic for Poisson Family) Let $X_1, \ldots, X_n$ be iid from a Poisson population with parameter $\lambda > 0$. Use the Factorization Theorem to show that both $T(\mathbf{X}) = \sum_{i=1}^n X_i$ and $T'(\mathbf{X}) = (X_1, \sum_{i=2}^n X_i)$ are sufficient statistics for $\lambda$.
Theorem 6.2.10 Let $X_1, \ldots, X_n$ be iid from a pdf or pmf $f(x \mid \boldsymbol{\theta})$ that belongs to an exponential family given by
$$f(x \mid \boldsymbol{\theta}) = h(x)\, c(\boldsymbol{\theta}) \exp\Big(\sum_{i=1}^k w_i(\boldsymbol{\theta})\, t_i(x)\Big),$$
where $\boldsymbol{\theta} = (\theta_1, \ldots, \theta_d)$, $d \le k$. Then
$$T(\mathbf{X}) = \Big(\sum_{j=1}^n t_1(X_j), \sum_{j=1}^n t_2(X_j), \ldots, \sum_{j=1}^n t_k(X_j)\Big)$$
is a sufficient statistic for $\boldsymbol{\theta}$.
Example (Sufficient Statistic for Poisson Family) Let $X_1, \ldots, X_n$ be iid from a Poisson population with parameter $\lambda > 0$. Then
$$f(x \mid \lambda) = \exp(-\lambda)\frac{\lambda^x}{x!} = \frac{1}{x!}\exp(-\lambda)\exp(x \log \lambda),$$
so we have $h(x) = 1/x!$, $c(\lambda) = \exp(-\lambda)$, $w(\lambda) = \log \lambda$, and $t(x) = x$. Hence $T(\mathbf{X}) = \sum_{i=1}^n t(X_i) = \sum_{i=1}^n X_i$ is a sufficient statistic for $\lambda$.
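To see what sufficiency means here concretely (a simulation sketch added for illustration; the values of $n$, $t$, and the two $\lambda$'s are arbitrary): conditional on $T = \sum X_i = t$, the sample is multinomial$(t, (1/n, \ldots, 1/n))$ no matter what $\lambda$ is, so the conditional distribution of, say, $X_1$ should look identical across $\lambda$ values.

import numpy as np

# Conditional on T = sum(X_i) = t, a Poisson sample is multinomial with
# equal cell probabilities, regardless of lambda. Check that the
# conditional distribution of X_1 given T = t matches for two lambdas.
rng = np.random.default_rng(2)
n, t, reps = 5, 10, 400_000

for lam in (0.8, 3.0):
    x = rng.poisson(lam, size=(reps, n))
    keep = x[x.sum(axis=1) == t]               # condition on T = t
    freq = np.bincount(keep[:, 0], minlength=t + 1) / len(keep)
    print(f"lambda={lam}: P(X1=k | T={t}) ~", np.round(freq[:4], 3))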
Section 6.2.2: Minimal Sufficient Statistics
Definition 6.2.11 A sufficient statistic $T(\mathbf{X})$ is called a minimal sufficient statistic if, for any other sufficient statistic $T'(\mathbf{X})$, $T(\mathbf{X})$ is a function of $T'(\mathbf{X})$.
Theorem 6.2.13 Let $f(\mathbf{x} \mid \theta)$ be the pmf or pdf of a sample $\mathbf{X}$. Suppose there exists a function $T(\mathbf{x})$ such that, for every two sample points $\mathbf{x}$ and $\mathbf{y}$, the ratio $f(\mathbf{x} \mid \theta)/f(\mathbf{y} \mid \theta)$ is constant as a function of $\theta$ if and only if $T(\mathbf{x}) = T(\mathbf{y})$. Then $T(\mathbf{X})$ is a minimal sufficient statistic for $\theta$.
Example 6.2.14 (Normal minimal sufficient statistic) Let $X_1, \ldots, X_n$ be iid $n(\mu, \sigma^2)$, where both $\mu$ and $\sigma^2$ are unknown. Show that $(\bar{X}, S^2)$ is a minimal sufficient statistic for $(\mu, \sigma^2)$.
Example 6.2.15 (Uniform minimal sufficient statistic) Suppose $X_1, \ldots, X_n$ are iid uniform observations on the interval $(\theta, \theta + 1)$, $-\infty < \theta < \infty$. Show that $T(\mathbf{X}) = (X_{(1)}, X_{(n)})$ is a minimal sufficient statistic. (In this example, the dimension of the minimal sufficient statistic does not match the dimension of the parameter.)
Section 6.2.3 Ancillary Statistics
Definition 6.2.16 A statistic $S(\mathbf{X})$ whose distribution does not depend on the parameter $\theta$ is called an ancillary statistic.
Example 6.2.17 (Uniform ancillary statistic) Let $X_1, \ldots, X_n$ be iid uniform observations on the interval $(\theta, \theta + 1)$, $-\infty < \theta < \infty$. Show that the range statistic, $R = X_{(n)} - X_{(1)}$, is an ancillary statistic.
Example 6.2.18 (Location family ancillary statistic) Suppose $X_1, \ldots, X_n$ are iid observations from a location parameter family with cdf $F(x - \theta)$, $-\infty < \theta < \infty$. Show that the range statistic, $R = X_{(n)} - X_{(1)}$, is an ancillary statistic.
Example 6.2.19 (Scale family ancillary statistic) Suppose $X_1, \ldots, X_n$ are iid observations from a scale parameter family with cdf $F(x/\sigma)$, $\sigma > 0$. Then any statistic that depends on the sample only through the $n - 1$ values $X_1/X_n, \ldots, X_{n-1}/X_n$ is an ancillary statistic.
Section 6.2.4 Sufficient, Ancillary and Complete Statistics
Definition 6.2.21 Let $f(t \mid \theta)$ be a family of pdfs or pmfs for a statistic $T(\mathbf{X})$. The family of probability distributions is called complete if $E_\theta\, g(T) = 0$ for all $\theta$ implies $P_\theta(g(T) = 0) = 1$ for all $\theta$. Equivalently, $T(\mathbf{X})$ is called a complete statistic.
Example 6.2.22 (Binomial complete sufficient statistic) Suppose that $T$ has a binomial$(n, p)$ distribution with $0 < p < 1$. Show that $T$ is a complete statistic.
Example 6.2.23 (Uniform complete sufficient statistic) Let $X_1, \ldots, X_n$ be iid uniform$(0, \theta)$ observations, $0 < \theta < \infty$. Show that $T = X_{(n)}$ is a complete statistic.
Theorem 6.2.24 (Basu's Theorem) If $T(\mathbf{X})$ is a complete and minimal sufficient statistic, then $T(\mathbf{X})$ is independent of every ancillary statistic.
Theorem 6.2.25 (Complete statistics in the exponential family) Let $X_1, \ldots, X_n$ be iid observations from an exponential family with pdf or pmf of the form
$$f(x \mid \boldsymbol{\theta}) = h(x)\, c(\boldsymbol{\theta}) \exp\Big(\sum_{i=1}^k w_i(\boldsymbol{\theta})\, t_i(x)\Big),$$
where $\boldsymbol{\theta} = (\theta_1, \ldots, \theta_k)$. Then the statistic
$$T(\mathbf{X}) = \Big(\sum_{j=1}^n t_1(X_j), \sum_{j=1}^n t_2(X_j), \ldots, \sum_{j=1}^n t_k(X_j)\Big)$$
is complete as long as the parameter space contains an open set in $\mathbb{R}^k$.
Example 6.2.26 (Using Basu's Theorem – I) Let $X_1, \ldots, X_n$ be iid exponential($\theta$) observations. Compute $E_\theta\, g(\mathbf{X})$, where
$$g(\mathbf{X}) = \frac{X_n}{X_1 + \cdots + X_n}.$$
Example 6.2.27 (Using Basu's Theorem – II) Let $X_1, \ldots, X_n$ be iid observations from an $n(\mu, \sigma^2)$ population. Using Basu's Theorem, show that $\bar{X}$ and $S^2$ are independent.
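A simulation sketch of this independence (added for illustration; sample size, seed, and the exponential contrast are arbitrary choices): for normal data the sample correlation of $\bar{X}$ and $S^2$ should be near 0, while for a skewed population such as the exponential it is clearly positive.

import numpy as np

# For normal samples, Xbar and S^2 are independent (Basu), so their
# sample correlation should be ~0; for a skewed population it is not.
rng = np.random.default_rng(4)
n, reps = 10, 100_000

normal = rng.normal(size=(reps, n))
expo = rng.exponential(size=(reps, n))

for name, x in (("normal", normal), ("exponential", expo)):
    xbar = x.mean(axis=1)
    s2 = x.var(axis=1, ddof=1)
    print(name, "corr(Xbar, S^2) =", round(np.corrcoef(xbar, s2)[0, 1], 4))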
Example (A minimal sufficient statistic that is not complete) Let $X_1, \ldots, X_n$ be iid $n(\theta, \theta^2)$; we know that $T = (\bar{X}, S^2)$ is a minimal sufficient statistic. Let
$$g(T) = \frac{n}{n+1}\bar{X}^2 - S^2.$$
Because $\bar{X} \sim n(\theta, \theta^2/n)$, we have
$$E(\bar{X}^2) = (E\bar{X})^2 + \operatorname{Var}(\bar{X}) = \theta^2 + \frac{\theta^2}{n} = \frac{n+1}{n}\theta^2.$$
We also have $E S^2 = \theta^2$, so
$$E(g(T)) = E\Big(\frac{n}{n+1}\bar{X}^2\Big) - E(S^2) = \frac{n}{n+1} \cdot \frac{n+1}{n}\theta^2 - \theta^2 = 0.$$
However, $P\big(g(T) = 0\big) = P\big(\frac{n}{n+1}\bar{X}^2 - S^2 = 0\big) = 0$, because $\bar{X}$ and $S^2$ are independent continuous random variables. So $T(\mathbf{X}) = (\bar{X}, S^2)$ is not complete.
Section 6.3 – Likelihood Principle
Definition 6.3.1 Let $f(\mathbf{x} \mid \theta)$ denote the joint pdf or pmf of the sample $\mathbf{X} = (X_1, \ldots, X_n)$. Then, given that $\mathbf{X} = \mathbf{x}$ is observed, the function of $\theta$ defined by
$$L(\theta \mid \mathbf{x}) = f(\mathbf{x} \mid \theta)$$
is called the likelihood function.
Example (Likelihood Function for the Uniform Distribution) Let $X_1, \ldots, X_n$ be iid uniform$(0, \theta)$. Then the likelihood function is
$$L(\theta \mid \mathbf{x}) = \frac{1}{\theta^n}\, I_{[0 \le x_{(n)} \le \theta]}(x_1, \ldots, x_n) = \frac{1}{\theta^n}\, I_{(0, \theta)}(x_{(n)}).$$
Chapter 7 – Methods of Finding Estimators
Section 7.1 – Introduction
Definition 7.1.1 A point estimator is any function $W(\mathbf{X}) = W(X_1, X_2, \ldots, X_n)$ of a sample; that is, any statistic is a point estimator.
7.2.1 Method of Moments (MME)
Let $X_1, \ldots, X_n$ be iid from a pmf or pdf $f(x \mid \theta_1, \ldots, \theta_k)$. We have

1st sample moment: $m_1 = \frac{1}{n}\sum_{i=1}^n X_i$; 1st population moment: $\mu_1' = EX = \mu_1'(\theta_1, \ldots, \theta_k)$;
$\vdots$
$k$th sample moment: $m_k = \frac{1}{n}\sum_{i=1}^n X_i^k$; $k$th population moment: $\mu_k' = EX^k = \mu_k'(\theta_1, \ldots, \theta_k)$.

To get the MME: "equate" the first $k$ sample moments to the corresponding $k$ population moments and solve these equations for $(\theta_1, \ldots, \theta_k)$ in terms of $(m_1, \ldots, m_k) = \big(\frac{1}{n}\sum_{i=1}^n X_i,\ \frac{1}{n}\sum_{i=1}^n X_i^2,\ \ldots,\ \frac{1}{n}\sum_{i=1}^n X_i^k\big)$.
Example 7.2.1 (Normal method of moments) Suppose $X_1, \ldots, X_n$ are iid from an $n(\mu, \sigma^2)$. In this case, $k = 2$, $\theta_1 = \mu$, and $\theta_2 = \sigma^2$.
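A sketch of the resulting estimators (added for illustration; the true parameter values and seed are arbitrary): equating $m_1 = EX = \mu$ and $m_2 = EX^2 = \mu^2 + \sigma^2$ gives $\hat{\mu} = m_1$ and $\hat{\sigma}^2 = m_2 - m_1^2$.

import numpy as np

# Normal method of moments: mu_hat = m1, sigma2_hat = m2 - m1^2.
rng = np.random.default_rng(6)
x = rng.normal(loc=3.0, scale=2.0, size=5000)   # mu = 3, sigma^2 = 4

m1, m2 = x.mean(), (x ** 2).mean()
mu_hat, sigma2_hat = m1, m2 - m1 ** 2
print("mu_hat =", round(mu_hat, 3), " sigma2_hat =", round(sigma2_hat, 3))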
7.2.2 Maximum Likelihood (MLE)
Let $X_1, \ldots, X_n$ be iid from a pmf or pdf $f(x \mid \theta_1, \ldots, \theta_k)$. The likelihood function is defined by
$$L(\boldsymbol{\theta} \mid \mathbf{x}) = L(\theta_1, \ldots, \theta_k \mid x_1, \ldots, x_n) = \prod_{i=1}^n f(x_i \mid \theta_1, \ldots, \theta_k).$$
Definition 7.2.4 For each sample point $\mathbf{x}$, let $\hat{\boldsymbol{\theta}}(\mathbf{x})$ be a parameter value at which $L(\boldsymbol{\theta} \mid \mathbf{x})$ attains its maximum as a function of $\boldsymbol{\theta}$, with $\mathbf{x}$ held fixed. A maximum likelihood estimator (MLE) of the parameter $\boldsymbol{\theta}$ based on a sample $\mathbf{X}$ is $\hat{\boldsymbol{\theta}}(\mathbf{X})$.
Example 7.2.5 (Normal likelihood) Let $X_1, \ldots, X_n$ be iid from an $n(\mu, 1)$. Show that $\bar{X}$ is the MLE of $\mu$ using derivatives.
Solution:
Step 1: Find the solutions of the equation $\frac{d}{d\mu} L(\mu \mid \mathbf{x}) = 0$; these are the candidate maximizers.
Step 2: Verify that the solution achieves a global maximum (here, $\frac{d^2}{d\mu^2} L(\mu \mid \mathbf{x}) \big|_{\mu = \bar{x}} < 0$).
Step 3: Check the boundaries ($\mu \to \pm\infty$ here; this step turns out not to be needed in this case).
Example 7.2.7 (Bernoulli MLE) Let $X_1, \ldots, X_n$ be iid Bernoulli($p$). Find the MLE of $p$, where $0 \le p \le 1$. Note that we include the possibility that $p = 0$ or $p = 1$.
Solution: Use the natural log of the likelihood function.
Example 7.2.8 (Restricted range MLE) Let $X_1, \ldots, X_n$ be iid from an $n(\theta, 1)$, where $\theta \ge 0$.
Solution: Without any restriction, $\bar{X}$ is the MLE. So when $\bar{x} \ge 0$, $\hat{\theta} = \bar{x}$. When $\bar{x} < 0$, $L(\theta \mid \mathbf{x})$ achieves its maximum over $\theta \ge 0$ at $\hat{\theta} = 0$, so $\hat{\theta} = 0$ in this situation. In summary:
$$\hat{\theta} = \bar{X}\, I_{[0, \infty)}(\bar{X}) = \begin{cases} \bar{X}, & \bar{X} \ge 0; \\ 0, & \bar{X} < 0. \end{cases}$$
Invariance Property of Maximum Likelihood Estimators
Theorem 7.2.10 (Invariance Property of MLEs) If $\hat{\theta}$ is the MLE of $\theta$, then for any function $\tau(\theta)$, the MLE of $\tau(\theta)$ is $\tau(\hat{\theta})$.
Example Let $X_1, \ldots, X_n$ be iid $n(\theta, 1)$; the MLE of $\theta^2$ is $\bar{X}^2$.
Example Let $X_1, \ldots, X_n$ be iid Poisson($\lambda$). Find the MLE of $P(X = 0)$.
Solution: The MLE of $\lambda$ is $\hat{\lambda} = \bar{X}$. Because $P(X = 0) = \exp(-\lambda)$, the MLE of $P(X = 0)$ is $\exp(-\bar{X})$.
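A sketch of the invariance property in action (added for illustration; the true $\lambda$, sample size, and seed are arbitrary):

import numpy as np

# Invariance: the MLE of lambda is Xbar, so the MLE of P(X = 0) =
# exp(-lambda) is exp(-Xbar). Compare with the empirical fraction of zeros.
rng = np.random.default_rng(7)
lam_true = 1.5
x = rng.poisson(lam_true, size=2000)

lam_hat = x.mean()
print("MLE of P(X=0):", round(np.exp(-lam_hat), 4))
print("empirical P(X=0):", round(np.mean(x == 0), 4))
print("true P(X=0):", round(np.exp(-lam_true), 4))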
Section 7.3 – Methods of Evaluating Estimators
7.3.1 Mean Squared Error
Definition 7.3.1 The mean squared error (MSE) of an estimator $W$ of a parameter $\theta$ is the function of $\theta$ defined by $E_\theta(W - \theta)^2 = \operatorname{Var}_\theta W + (\operatorname{Bias}_\theta W)^2$, where $\operatorname{Bias}_\theta W = E_\theta W - \theta$.
Definition 7.3.1 (general form) The mean squared error (MSE) of an estimator $W$ of a function $\tau(\theta)$ of the parameter is $E_\theta(W - \tau(\theta))^2 = \operatorname{Var}_\theta W + (\operatorname{Bias}_\theta W)^2$, where $\operatorname{Bias}_\theta W = E_\theta W - \tau(\theta)$.
Definition 7.3.2 The bias of a point estimator $W$ of a parameter $\theta$ is the difference between the expected value of $W$ and $\theta$. An estimator whose bias is identically (in $\theta$) equal to 0 is called unbiased and satisfies $E_\theta W = \theta$ for all $\theta$.
Definition 7.3.2 (general form) The bias of a point estimator $W$ of $\tau(\theta)$ is the difference between the expected value of $W$ and $\tau(\theta)$. An estimator whose bias is identically equal to 0 is called unbiased and satisfies $E_\theta W = \tau(\theta)$ for all $\theta$.
Example 7.3.3 (Normal MSE) Let $X_1, \ldots, X_n$ be iid from an $n(\mu, \sigma^2)$. We know that $\bar{X}$ and $S^2$ are unbiased estimators of $\mu$ and $\sigma^2$, respectively: $E\bar{X} = \mu$ and $ES^2 = \sigma^2$. Their MSEs are
$$\operatorname{MSE}(\bar{X}) = E(\bar{X} - \mu)^2 = \frac{\sigma^2}{n} \quad \text{and} \quad \operatorname{MSE}(S^2) = E(S^2 - \sigma^2)^2 = \operatorname{Var} S^2 = \frac{2\sigma^4}{n-1}.$$
Example 7.3.4 Let $X_1, \ldots, X_n$ be iid from an $n(\mu, \sigma^2)$. Recall that the MLE (and MME) of $\sigma^2$ is
$$\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^n (X_i - \bar{X})^2 = \frac{n-1}{n} S^2.$$
Verify that $\hat{\sigma}^2$ has smaller MSE than $S^2$.
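A Monte Carlo sketch of this comparison (added for illustration; $n$, $\sigma^2$, and the seed are arbitrary): the biased MLE should come out with the smaller MSE.

import numpy as np

# Compare Monte Carlo MSEs of sigma2_hat = ((n-1)/n) S^2 and S^2.
rng = np.random.default_rng(8)
n, sigma2, reps = 10, 4.0, 200_000

x = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
s2 = x.var(axis=1, ddof=1)
sigma2_hat = (n - 1) / n * s2

print("MSE(S^2):        ", round(np.mean((s2 - sigma2) ** 2), 4))
print("MSE(sigma2_hat): ", round(np.mean((sigma2_hat - sigma2) ** 2), 4))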
7.3.2 Best Unbiased Estimator
Consider the class of estimators
$$C_\tau = \{W : E_\theta W = \tau(\theta)\}.$$
For any $W_1, W_2 \in C_\tau$, we have $\operatorname{Bias}_\theta W_1 - \operatorname{Bias}_\theta W_2 = \tau(\theta) - \tau(\theta) = 0$, so
$$\operatorname{MSE}(W_1) - \operatorname{MSE}(W_2) = E_\theta(W_1 - \tau(\theta))^2 - E_\theta(W_2 - \tau(\theta))^2 = \operatorname{Var}_\theta(W_1) - \operatorname{Var}_\theta(W_2).$$
Definition 7.3.7 An estimator $W^*$ is a best unbiased estimator of $\tau(\theta)$ if it satisfies $E_\theta W^* = \tau(\theta)$ for all $\theta$ and, for any other estimator $W$ with $E_\theta W = \tau(\theta)$, we have $\operatorname{Var}_\theta(W^*) \le \operatorname{Var}_\theta(W)$ for all $\theta$. $W^*$ is also called a uniform minimum variance unbiased estimator (UMVUE) of $\tau(\theta)$.
Theorem 7.3.9 (Cramér-Rao inequality) Let $X_1, \ldots, X_n$ be a sample with pdf $f(\mathbf{x} \mid \theta)$ and let $W(\mathbf{X}) = W(X_1, \ldots, X_n)$ be any estimator satisfying
$$\frac{d}{d\theta} E_\theta W(\mathbf{X}) = \int_{\mathcal{X}} \frac{\partial}{\partial\theta}\,[W(\mathbf{x}) f(\mathbf{x} \mid \theta)]\, d\mathbf{x}$$
and $\operatorname{Var}_\theta(W(\mathbf{X})) < \infty$. Then
$$\operatorname{Var}_\theta(W(\mathbf{X})) \ge \frac{\big(\frac{d}{d\theta} E_\theta W(\mathbf{X})\big)^2}{E_\theta\Big(\big(\frac{\partial}{\partial\theta}\log f(\mathbf{X} \mid \theta)\big)^2\Big)},$$
where log is the natural logarithm.
Corollary 7.3.10 (Cramér-Rao inequality, iid case) If the assumptions of Theorem 7.3.9 are satisfied and, additionally, $X_1, \ldots, X_n$ are iid with pdf $f(x \mid \theta)$, then
$$\operatorname{Var}_\theta(W(\mathbf{X})) \ge \frac{\big(\frac{d}{d\theta} E_\theta W(\mathbf{X})\big)^2}{n E_\theta\Big(\big(\frac{\partial}{\partial\theta}\log f(X \mid \theta)\big)^2\Big)}.$$
Lemma 7.3.11 If $f(x \mid \theta)$ satisfies
$$\frac{d}{d\theta} E_\theta\Big(\frac{\partial}{\partial\theta}\log f(X \mid \theta)\Big) = \int \frac{\partial}{\partial\theta}\Big[\Big(\frac{\partial}{\partial\theta}\log f(x \mid \theta)\Big) f(x \mid \theta)\Big]\, dx$$
(true for an exponential family), then
$$E_\theta\Big(\Big[\frac{\partial}{\partial\theta}\log f(X \mid \theta)\Big]^2\Big) = -E_\theta\Big(\frac{\partial^2}{\partial\theta^2}\log f(X \mid \theta)\Big).$$
Example 7.3.12 Recall the Poisson problem. We will show that $\bar{X}$ is the UMVUE of $\lambda$.
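A numerical sketch of the key calculation (added for illustration): for Poisson($\lambda$), the score is $\frac{\partial}{\partial\lambda}\log f(x \mid \lambda) = x/\lambda - 1$, so the information in one observation is $E[(X/\lambda - 1)^2] = 1/\lambda$, and the Cramér-Rao bound for unbiased estimators of $\lambda$ is $\lambda/n$, which is exactly $\operatorname{Var}(\bar{X})$.

import numpy as np
from scipy import stats

# Check the Poisson information identity E[score^2] = 1/lambda numerically.
lam = 2.5
k = np.arange(0, 200)                       # effectively the whole support
pmf = stats.poisson.pmf(k, lam)
score = k / lam - 1.0                       # d/dlambda log f(k | lambda)

info = np.sum(score ** 2 * pmf)
print("E[score^2] =", round(info, 6), " 1/lambda =", round(1 / lam, 6))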
7.3.3 Sufficiency and Unbiasedness
Theorem 7.3.17 (Rao-Blackwell) Let $W$ be any unbiased estimator of $\tau(\theta)$, and let $T$ be a sufficient statistic for $\theta$. Define $\phi(T) = E(W \mid T)$. Then $E_\theta(\phi(T)) = \tau(\theta)$ and $\operatorname{Var}_\theta(\phi(T)) \le \operatorname{Var}_\theta(W)$ for all $\theta$; i.e., $\phi(T)$ is a uniformly better unbiased estimator of $\tau(\theta)$.
Theorem 7.3.19 If $W$ is a best unbiased estimator of $\tau(\theta)$, then $W$ is unique.
Theorem 7.3.23 Let $T$ be a complete sufficient statistic for a parameter $\theta$, and let $\phi(T)$ be any estimator based only on $T$. Then $\phi(T)$ is the unique best unbiased estimator of its expected value.
Example 7.3.24 (Binomial best unbiased estimation) Let $X_1, \ldots, X_n$ be iid binomial$(k, \theta)$. We want to estimate
$$\tau(\theta) = P_\theta(X = 1) = k\theta(1 - \theta)^{k-1}.$$
Chapter 8 – Hypothesis Testing
Section 8.1 – Introduction
Definition 8.1.1 A hypothesis is a statement about a population parameter.
Definition 8.1.2 The two complementary hypotheses in a hypothesis testing problem are called the null hypothesis and the alternative hypothesis. They are denoted by $H_0$ and $H_1$, respectively.
Setting: Let $\theta$ be a parameter of interest with $\theta \in \Theta$:
$$H_0: \theta \in \Theta_0 \quad \text{versus} \quad H_1: \theta \in \Theta_1 = \Theta_0^c,$$
where $\Theta_0 \subset \Theta$ and $\Theta_0^c = \Theta \setminus \Theta_0$.
Definition 8.1.3 A hypothesis testing procedure or hypothesis test is a rule that specifies:
1. For which sample values the decision is made to accept $H_0$ as true.
2. For which sample values $H_0$ is rejected and $H_1$ is accepted as true.
Rejection Region or Critical Region: the subset of the sample space for which $H_0$ is rejected.
Acceptance Region: the complement of the rejection region.
Section 8.2 – Methods of Finding Tests
8.2.1 Likelihood Ratio Tests
Definition 8.2.1 The likelihood ratio test statistic for testing $H_0: \theta \in \Theta_0$ versus $H_1: \theta \in \Theta_0^c$ is
$$\lambda(\mathbf{x}) = \frac{\sup_{\Theta_0} L(\theta \mid \mathbf{x})}{\sup_{\Theta} L(\theta \mid \mathbf{x})}.$$
A likelihood ratio test (LRT) is any test that has a rejection region of the form $\{\mathbf{x} : \lambda(\mathbf{x}) \le c\}$, where $c$ is any number satisfying $0 \le c \le 1$.
LRT and MLE: Let $\hat{\theta}$ be the MLE of $\theta$ over the unrestricted parameter space $\Theta$ and $\hat{\theta}_0$ be the MLE of $\theta$ over the restricted parameter space $\Theta_0$. Then
$$\lambda(\mathbf{x}) = \frac{L(\hat{\theta}_0 \mid \mathbf{x})}{L(\hat{\theta} \mid \mathbf{x})}.$$
Example 8.2.2 (Normal LRT) Let $X_1, \ldots, X_n$ be iid from $n(\theta, 1)$. We want to test $H_0: \theta = \theta_0$ versus $H_1: \theta \ne \theta_0$, where $\theta_0$ is a fixed number set by the experimenter. Show that $\lambda(\mathbf{x}) = \exp[-n(\bar{x} - \theta_0)^2/2]$, so that the LRT rejects $H_0$ for small values of $\lambda(\mathbf{x})$. Therefore the rejection region is $\{\mathbf{x} : \lambda(\mathbf{x}) \le c\}$, which is equivalent to $\{\mathbf{x} : |\bar{x} - \theta_0| \ge \sqrt{-2(\log c)/n}\}$.
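A sketch computing this LRT statistic and the equivalent rejection rule (added for illustration; $\theta_0$, $n$, $c$, and the seed are arbitrary choices):

import numpy as np

# Normal LRT: lambda(x) = exp(-n (xbar - theta0)^2 / 2), with the
# equivalent |xbar - theta0| >= sqrt(-2 log(c) / n) form of the rule.
rng = np.random.default_rng(9)
theta0, n, c = 0.0, 25, 0.1

x = rng.normal(0.5, 1.0, size=n)            # data generated away from H0
xbar = x.mean()
lam = np.exp(-n * (xbar - theta0) ** 2 / 2)
cutoff = np.sqrt(-2 * np.log(c) / n)

print("lambda(x) =", round(lam, 5), " reject:", lam <= c)
print("|xbar - theta0| =", round(abs(xbar - theta0), 4), " cutoff:", round(cutoff, 4))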
Example 8.2.3 (Exponential LRT) Let $X_1, \ldots, X_n$ be iid from an exponential population with pdf $f(x \mid \theta) = \exp[-(x - \theta)]\, I_{[\theta, \infty)}(x)$, where $-\infty < \theta < \infty$. We want to test $H_0: \theta \le \theta_0$ versus $H_1: \theta > \theta_0$, where $\theta_0$ is a fixed number set by the experimenter. Show that
$$\lambda(\mathbf{x}) = \begin{cases} 1, & x_{(1)} \le \theta_0; \\ \exp[-n(x_{(1)} - \theta_0)], & x_{(1)} > \theta_0, \end{cases}$$
so that the LRT rejects $H_0$ for small values of $\lambda(\mathbf{x})$. Therefore the rejection region is $\{\mathbf{x} : \lambda(\mathbf{x}) \le c\}$, which is equivalent to $\{\mathbf{x} : x_{(1)} \ge \theta_0 - \frac{\log c}{n}\}$.
Theorem 8.2.4 If $T(\mathbf{X})$ is a sufficient statistic for $\theta$, and $\lambda^*(t)$ and $\lambda(\mathbf{x})$ are the LRT statistics based on $T$ and $\mathbf{X}$, respectively, then $\lambda^*(T(\mathbf{x})) = \lambda(\mathbf{x})$ for every $\mathbf{x}$ in the sample space.
Example 8.2.5 (LRT and Sufficiency):
• In Example 8.2.2, we could have used the likelihood associated with the sufficient statistic $\bar{X}$, using the fact that $\bar{X} \sim n(\theta, 1/n)$; the resulting test rejects for large values of $|\bar{X} - \theta_0|$.
• Similarly, in Example 8.2.3, we can use the likelihood associated with the sufficient statistic $X_{(1)}$, namely $L(\theta \mid x_{(1)}) = n \exp[-n(x_{(1)} - \theta)]\, I_{[\theta, \infty)}(x_{(1)})$; the resulting test rejects for large values of $X_{(1)}$.
Example 8.2.6 (Normal LRT with unknown variance) Let $X_1, \ldots, X_n$ be iid from $n(\mu, \sigma^2)$, and suppose an experimenter is interested only in testing $H_0: \mu \le \mu_0$ versus $H_1: \mu > \mu_0$. The LRT statistic is
$$\lambda(\mathbf{x}) = \frac{\max_{\{\mu \le \mu_0,\, \sigma^2 \ge 0\}} L(\mu, \sigma^2 \mid \mathbf{x})}{\max_{\{-\infty < \mu < \infty,\, \sigma^2 \ge 0\}} L(\mu, \sigma^2 \mid \mathbf{x})} = \begin{cases} 1, & \hat{\mu} \le \mu_0; \\ L(\mu_0, \hat{\sigma}_0^2 \mid \mathbf{x}) / L(\hat{\mu}, \hat{\sigma}^2 \mid \mathbf{x}), & \hat{\mu} > \mu_0, \end{cases}$$
where $\hat{\sigma}_0^2 = \sum_{i=1}^n (x_i - \mu_0)^2 / n$.
Section 8.3 – Methods of Evaluating Tests
8.3.1 Error Probabilities and the Power Function
Two types of Error:
• Type I error: $\theta \in \Theta_0$ but the hypothesis test incorrectly decides to reject $H_0$.
• Type II error: $\theta \in \Theta_0^c$ but the hypothesis test incorrectly decides to accept $H_0$.
Definition 8.3.1 The power function of a hypothesis test with rejection region $R$ is the function of $\theta$ defined by $\beta(\theta) = P_\theta(\mathbf{X} \in R)$.
Example 8.3.2 (Binomial power function) Let $X \sim$ binomial$(5, \theta)$. Consider $H_0: \theta \le 1/2$ versus $H_1: \theta > 1/2$ and calculate the power function of the following tests:
Test 1: $R = \{X = 5\}$ (all "successes" are observed);
Test 2: $R = \{X = 3, 4, \text{ or } 5\}$.
Example 8.3.3 (Normal power function) Let $X_1, \ldots, X_n$ be iid from $n(\mu, \sigma^2)$, where $\sigma^2$ is known. Consider $H_0: \mu \le \mu_0$ versus $H_1: \mu > \mu_0$. The LRT for this test has rejection region
$$R = \Big\{\frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} > c\Big\}.$$
Therefore the power function is
$$\beta(\mu) = P_\mu\Big(\frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} > c\Big) = P\Big(Z > c - \frac{\mu - \mu_0}{\sigma/\sqrt{n}}\Big).$$
Example 8.3.4 (continuation of Example 8.3.3) Suppose that the experimenter would like the maximum probability of a Type I error to be 0.1 and the maximum probability of a Type II error to be 0.2 if $\mu \ge \mu_0 + \sigma$. How do we choose $c$ and $n$?
Definition 8.3.5 For $0 \le \alpha \le 1$, a test with power function $\beta(\theta)$ is a size $\alpha$ test if $\sup_{\theta \in \Theta_0} \beta(\theta) = \alpha$.
Definition 8.3.6 For $0 \le \alpha \le 1$, a test with power function $\beta(\theta)$ is a level $\alpha$ test if $\sup_{\theta \in \Theta_0} \beta(\theta) \le \alpha$.
Example 8.3.7 (Size of LRT) A size $\alpha$ LRT is constructed by choosing the appropriate $c$ such that
$$\sup_{\theta \in \Theta_0} P_\theta(\lambda(\mathbf{X}) \le c) = \alpha.$$
In Example 8.2.2, $H_0: \theta = \theta_0$ versus $H_1: \theta \ne \theta_0$, so that $R = \{|\bar{X} - \theta_0| \ge z_{\alpha/2}/\sqrt{n}\}$.
In Example 8.2.3, $H_0: \theta \le \theta_0$ versus $H_1: \theta > \theta_0$, so that $P_{\theta_0}(X_{(1)} \ge c) = \exp(-n(c - \theta_0)) = \alpha$ if $c = -(\log \alpha)/n + \theta_0$.
8.3.2 Most Powerful Tests
Definition 8.3.11 Let $\mathcal{C}$ be a class of tests for testing $H_0: \theta \in \Theta_0$ versus $H_1: \theta \in \Theta_0^c$. A test in class $\mathcal{C}$, with power function $\beta(\theta)$, is a uniformly most powerful (UMP) class $\mathcal{C}$ test if $\beta(\theta) \ge \beta'(\theta)$ for every $\theta \in \Theta_0^c$ and every $\beta'(\theta)$ that is a power function of a test in class $\mathcal{C}$.
Theorem 8.3.12 (Neyman-Pearson Lemma) Consider testing $H_0: \theta = \theta_0$ versus $H_1: \theta = \theta_1$, where the pdf or pmf corresponding to $\theta_i$ is $f(\mathbf{x} \mid \theta_i)$, $i = 0, 1$. Then any test with rejection region $R$ is a UMP level $\alpha$ test if it satisfies
$$\mathbf{x} \in R \text{ if } f(\mathbf{x} \mid \theta_1) > k f(\mathbf{x} \mid \theta_0), \text{ or equivalently, } \frac{f(\mathbf{x} \mid \theta_1)}{f(\mathbf{x} \mid \theta_0)} > k,$$
and
$$\mathbf{x} \in R^c \text{ if } f(\mathbf{x} \mid \theta_1) < k f(\mathbf{x} \mid \theta_0), \text{ or equivalently, } \frac{f(\mathbf{x} \mid \theta_1)}{f(\mathbf{x} \mid \theta_0)} < k,$$
for some $k \ge 0$, with $\alpha = P_{\theta_0}(\mathbf{X} \in R)$.
Corollary 8.3.13 Consider the hypothesis testing problem posed in Theorem 8.3.12. Suppose $T(\mathbf{X})$ is a sufficient statistic for $\theta$ and $g(t \mid \theta_i)$ is the pdf or pmf of $T$ corresponding to $\theta_i$, $i = 0, 1$. Then any test based on $T$ with rejection region $S$ (a subset of the sample space of $T$) is a UMP level $\alpha$ test if it satisfies
$$t \in S \text{ if } g(t \mid \theta_1) > k g(t \mid \theta_0), \text{ or equivalently, } \frac{g(t \mid \theta_1)}{g(t \mid \theta_0)} > k,$$
and
$$t \in S^c \text{ if } g(t \mid \theta_1) < k g(t \mid \theta_0), \text{ or equivalently, } \frac{g(t \mid \theta_1)}{g(t \mid \theta_0)} < k,$$
for some $k \ge 0$, with $\alpha = P_{\theta_0}(T \in S)$.
Example 8.3.14 (UMP Binomial Test) Let $X \sim$ binomial$(2, \theta)$. We want to test $H_0: \theta = 1/2$ versus $H_1: \theta = 3/4$. Find the UMP test.
Example 8.3.15 (UMP Normal Test) Let $X_1, \ldots, X_n$ be iid from $n(\mu, \sigma^2)$, where $\sigma^2$ is known. Find the UMP test for $H_0: \mu = \mu_0$ versus $H_1: \mu = \mu_1$, where $\mu_0 > \mu_1$, and find the exact rejection region for the size $\alpha$ test.
Types of Hypotheses:
1. Simple Hypothesis: $H: \theta = \theta_0$ (a single distribution).
2. Composite Hypothesis: more than one possible distribution.
 a. One-sided Hypothesis: $H: \theta \ge \theta_0$ (or $H: \theta \le \theta_0$).
 b. Two-sided Hypothesis: $H: \theta \ne \theta_0$.
Definition 8.3.16 A family of pdfs or pmfs $\{g(t \mid \theta) : \theta \in \Theta\}$ for a univariate random variable $T$ with real-valued parameter $\theta$ has a monotone likelihood ratio (MLR) if, for every $\theta_2 > \theta_1$, $g(t \mid \theta_2)/g(t \mid \theta_1)$ is a monotone (nonincreasing or nondecreasing) function of $t$ on $\{t : g(t \mid \theta_1) > 0 \text{ or } g(t \mid \theta_2) > 0\}$. Note that $c/0 = \infty$ if $c > 0$.
Theorem 8.3.17 (Karlin-Rubin) Consider testing $H_0: \theta \le \theta_0$ versus $H_1: \theta > \theta_0$. Suppose that $T$ is a sufficient statistic for $\theta$ and the family of pdfs or pmfs $\{g(t \mid \theta) : \theta \in \Theta\}$ of $T$ has an MLR, with $g(t \mid \theta_2)/g(t \mid \theta_1)$ nondecreasing in $t$ for $\theta_2 > \theta_1$. Then for any $t_0$, the test that rejects $H_0$ if and only if $T > t_0$ is a UMP level $\alpha$ test, where $\alpha = P_{\theta_0}(T > t_0)$.
Note (the remaining combinations of hypothesis direction and MLR direction follow the same pattern):
• Consider testing $H_0: \theta \le \theta_0$ versus $H_1: \theta > \theta_0$. If $T$ is a sufficient statistic for $\theta$ and $g(t \mid \theta_2)/g(t \mid \theta_1)$ is nonincreasing in $t$ for $\theta_2 > \theta_1$, then the test that rejects $H_0$ if and only if $T < t_0$ is a UMP level $\alpha$ test, where $\alpha = P_{\theta_0}(T < t_0)$.
• Consider testing $H_0: \theta \ge \theta_0$ versus $H_1: \theta < \theta_0$. If $T$ is a sufficient statistic for $\theta$ and $g(t \mid \theta_2)/g(t \mid \theta_1)$ is nondecreasing in $t$ for $\theta_2 > \theta_1$, then the test that rejects $H_0$ if and only if $T < t_0$ is a UMP level $\alpha$ test, where $\alpha = P_{\theta_0}(T < t_0)$.
• Consider testing $H_0: \theta \ge \theta_0$ versus $H_1: \theta < \theta_0$. If $T$ is a sufficient statistic for $\theta$ and $g(t \mid \theta_2)/g(t \mid \theta_1)$ is nonincreasing in $t$ for $\theta_2 > \theta_1$, then the test that rejects $H_0$ if and only if $T > t_0$ is a UMP level $\alpha$ test, where $\alpha = P_{\theta_0}(T > t_0)$.
Note: For many problems there is no UMP level $\alpha$ test, because the class of level $\alpha$ tests is so large that no one test dominates all the others in terms of power.
Example 8.3.18 (Continuation of Example 8.3.15) Consider testing $H_0': \mu \ge \mu_0$ versus $H_1': \mu < \mu_0$. Show that the test that rejects $H_0'$ if
$$\bar{X} < -\frac{z_\alpha \sigma}{\sqrt{n}} + \mu_0$$
is a UMP size $\alpha$ test.
8.3.4 p-values
Definition 8.3.26 A p-value $p(\mathbf{X})$ is a test statistic satisfying $0 \le p(\mathbf{x}) \le 1$ for every sample point $\mathbf{x}$. Small values of $p(\mathbf{X})$ give evidence that $H_1$ is true. A p-value is valid if, for every $\theta \in \Theta_0$ and every $0 \le \alpha \le 1$,
$$P_\theta(p(\mathbf{X}) \le \alpha) \le \alpha.$$
Theorem 8.3.27 Let $W(\mathbf{X})$ be a test statistic such that large values of $W$ give evidence that $H_1$ is true. For each sample point $\mathbf{x}$, define
$$p(\mathbf{x}) = \sup_{\theta \in \Theta_0} P_\theta(W(\mathbf{X}) \ge W(\mathbf{x})).$$
Then $p(\mathbf{X})$ is a valid p-value.
Example 8.3.28 (Two-sided normal p-value) Let $X_1, \ldots, X_n$ be iid from $n(\mu, \sigma^2)$. The LRT (Exercise 8.38) for $H_0: \mu = \mu_0$ versus $H_1: \mu \ne \mu_0$ rejects $H_0$ when $\frac{|\bar{X} - \mu_0|}{S/\sqrt{n}}$ is large. Show that
$$p(\mathbf{x}) = 2 P\Big(T_{n-1} \ge \frac{|\bar{x} - \mu_0|}{s/\sqrt{n}}\Big)$$
is a valid p-value.
Example 8.3.29 (One-sided normal p-value) Let $X_1, \ldots, X_n$ be iid from $n(\mu, \sigma^2)$. The LRT (Exercise 8.38) for $H_0: \mu \le \mu_0$ versus $H_1: \mu > \mu_0$ rejects $H_0$ when $\frac{\bar{X} - \mu_0}{S/\sqrt{n}}$ is large. Show that
$$p(\mathbf{x}) = P(T_{n-1} \ge W(\mathbf{x})) = P\Big(T_{n-1} \ge \frac{\bar{x} - \mu_0}{s/\sqrt{n}}\Big)$$
is a valid p-value.
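A sketch computing both p-values from these formulas (added for illustration; the data-generating values and seed are arbitrary):

import numpy as np
from scipy import stats

# One- and two-sided t-test p-values from Examples 8.3.28 / 8.3.29.
rng = np.random.default_rng(10)
mu0 = 0.0
x = rng.normal(0.4, 1.0, size=15)
n, xbar, s = x.size, x.mean(), x.std(ddof=1)

t_stat = (xbar - mu0) / (s / np.sqrt(n))
p_two = 2 * stats.t.sf(abs(t_stat), df=n - 1)   # H1: mu != mu0
p_one = stats.t.sf(t_stat, df=n - 1)            # H1: mu > mu0
print("t =", round(t_stat, 3), " two-sided p =", round(p_two, 4),
      " one-sided p =", round(p_one, 4))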
Chapter 9 – Interval Estimation
Section 9.1 – Introduction
Definition 9.1.1 An interval estimate of a real-valued parameter $\theta$ is any pair of functions, $L(x_1, \ldots, x_n)$ and $U(x_1, \ldots, x_n)$, of a sample that satisfy $L(\mathbf{x}) \le U(\mathbf{x})$ for all $\mathbf{x} \in \mathcal{X}$. If $\mathbf{X} = \mathbf{x}$ is observed, the inference $L(\mathbf{x}) \le \theta \le U(\mathbf{x})$ is made. The random interval $[L(\mathbf{X}), U(\mathbf{X})]$ is called an interval estimator.
Example 9.1.2 (Interval estimator) Let $X_1, \ldots, X_4$ be a random sample from $n(\mu, 1)$. A possible interval estimator for $\mu$ is $[\bar{X} - 1, \bar{X} + 1]$. This means that we assert that (the true) $\mu$ is in this interval.
Example 9.1.3 (Continuation of Example 9.1.2) Note that in this case $P(\bar{X} = \mu) = 0$ (why?). Now consider the interval estimator $[\bar{X} - 1, \bar{X} + 1]$. Find $P(\mu \in [\bar{X} - 1, \bar{X} + 1])$.
Definition 9.1.4 For an interval estimator $[L(\mathbf{X}), U(\mathbf{X})]$ of a parameter $\theta$, the coverage probability of $[L(\mathbf{X}), U(\mathbf{X})]$ is the probability that the random interval $[L(\mathbf{X}), U(\mathbf{X})]$ covers the true parameter, $\theta$. In symbols, it is denoted by either $P_\theta(\theta \in [L(\mathbf{X}), U(\mathbf{X})])$ or $P(\theta \in [L(\mathbf{X}), U(\mathbf{X})] \mid \theta)$.
Definition 9.1.5 For an interval estimator $[L(\mathbf{X}), U(\mathbf{X})]$ of a parameter $\theta$, the confidence coefficient of $[L(\mathbf{X}), U(\mathbf{X})]$ is the infimum of the coverage probabilities, $\inf_\theta P_\theta(\theta \in [L(\mathbf{X}), U(\mathbf{X})])$.
Notes:
• Interval estimators are random quantities, not parameters, so the probability in $P_\theta(\theta \in [L(\mathbf{X}), U(\mathbf{X})])$ is not a statement about $\theta$ but about the random functions $L(\mathbf{X})$ and $U(\mathbf{X})$.
• An interval estimator together with a measure of confidence is usually referred to as a confidence interval.
• In general, we work with confidence sets rather than simple intervals when no closed form is available for the set.
• A confidence set with confidence coefficient equal to $1 - \alpha$ is called a $1 - \alpha$ confidence set.
Example 9.1.6 (Scale uniform interval estimator) Let $X_1, \ldots, X_n$ be a random sample from uniform$(0, \theta)$. Let $Y = X_{(n)} = \max(X_1, \ldots, X_n)$. Consider the following interval estimators for $\theta$:
• Candidate 1: $[aY, bY]$, $1 \le a < b$;
• Candidate 2: $[Y + c, Y + d]$, $0 \le c < d$,
where $a, b, c, d$ are specified constants. Find the coverage probabilities and confidence coefficients of each interval estimator.
Section 9.2 – Methods of Finding Interval Estimators
9.2.1 Inverting a Test Statistic
Example 9.2.1 (Inverting a normal test) Let $X_1, \ldots, X_n$ be iid $n(\mu, \sigma^2)$ and consider testing $H_0: \mu = \mu_0$ versus $H_1: \mu \ne \mu_0$. For a fixed level $\alpha$, a most powerful unbiased test rejects $H_0$ when $|\bar{x} - \mu_0| > z_{\alpha/2}\,\sigma/\sqrt{n}$, and we accept $H_0: \mu = \mu_0$ when $|\bar{x} - \mu_0| \le z_{\alpha/2}\,\sigma/\sqrt{n}$, that is, when
$$\bar{x} - z_{\alpha/2}\,\sigma/\sqrt{n} \le \mu_0 \le \bar{x} + z_{\alpha/2}\,\sigma/\sqrt{n}.$$
Note that
$$P(\bar{X} - z_{\alpha/2}\,\sigma/\sqrt{n} \le \mu_0 \le \bar{X} + z_{\alpha/2}\,\sigma/\sqrt{n} \mid \mu = \mu_0) = 1 - \alpha \quad \text{(why?)}$$
for every $\mu_0$, so that
$$P_\mu(\bar{X} - z_{\alpha/2}\,\sigma/\sqrt{n} \le \mu \le \bar{X} + z_{\alpha/2}\,\sigma/\sqrt{n}) = 1 - \alpha \quad \text{(why?)}$$
Therefore, a $1 - \alpha$ confidence interval for $\mu$ is given by
$$[\bar{x} - z_{\alpha/2}\,\sigma/\sqrt{n},\ \bar{x} + z_{\alpha/2}\,\sigma/\sqrt{n}].$$
Note: (Correspondence between tests and confidence sets) The acceptance region of the hypothesis test is
$$A(\mu_0) = \{\mathbf{x} : \mu_0 - z_{\alpha/2}\,\sigma/\sqrt{n} \le \bar{x} \le \mu_0 + z_{\alpha/2}\,\sigma/\sqrt{n}\},$$
and the confidence interval is given by
$$C(\mathbf{x}) = \{\mu : \bar{x} - z_{\alpha/2}\,\sigma/\sqrt{n} \le \mu \le \bar{x} + z_{\alpha/2}\,\sigma/\sqrt{n}\}.$$
Therefore,
$$\mathbf{x} \in A(\mu_0) \iff \mu_0 \in C(\mathbf{x}).$$
Theorem 9.2.2 For each $\theta_0 \in \Theta$, let $A(\theta_0)$ be the acceptance region of a level $\alpha$ test of $H_0: \theta = \theta_0$. For each $\mathbf{x} \in \mathcal{X}$, define a set $C(\mathbf{x})$ in the parameter space by
$$C(\mathbf{x}) = \{\theta_0 : \mathbf{x} \in A(\theta_0)\}.$$
Then the random set $C(\mathbf{X})$ is a $1 - \alpha$ confidence set. Conversely, let $C(\mathbf{X})$ be a $1 - \alpha$ confidence set. For any $\theta_0 \in \Theta$, define
$$A(\theta_0) = \{\mathbf{x} : \theta_0 \in C(\mathbf{x})\}.$$
Then $A(\theta_0)$ is the acceptance region of a level $\alpha$ test of $H_0: \theta = \theta_0$.
Note:
• All of the techniques we have for obtaining tests can immediately be applied to constructing confidence sets.
• In most cases, one-sided tests give one-sided intervals, while two-sided tests give two-sided intervals. Strangely shaped acceptance regions give strangely shaped confidence sets.
• The properties of the inverted test also carry over (suitably modified) to the confidence set.
• Since we can confine attention to sufficient statistics when looking for a good test, we can likewise confine attention to sufficient statistics when looking for "good" confidence sets.
Example 9.2.4 (Normal one-sided confidence bound) Let $X_1, \ldots, X_n$ be a random sample from an $n(\mu, \sigma^2)$. We will construct a one-sided $1 - \alpha$ confidence interval by inverting the test of $H_0: \mu = \mu_0$ versus $H_1: \mu > \mu_0$. Recall that the size $\alpha$ LRT of $H_0$ versus $H_1$ rejects $H_0$ if
$$\frac{\bar{x} - \mu_0}{s/\sqrt{n}} > t_{n-1, \alpha},$$
and the acceptance region is defined by
$$A(\mu_0) = \{\mathbf{x} : \bar{x} - \mu_0 \le t_{n-1,\alpha}\,(s/\sqrt{n})\}.$$
Then the resulting $1 - \alpha$ one-sided confidence interval is
$$C(\mathbf{x}) = \{\mu_0 : \mathbf{x} \in A(\mu_0)\} = \Big\{\mu_0 : \mu_0 \ge \bar{x} - t_{n-1,\alpha}\frac{s}{\sqrt{n}}\Big\}.$$

9.2.2 Pivotal Quantities
Definition 9.2.6 A random variable $Q(\mathbf{X}, \theta)$ is a pivotal quantity (or pivot) if the distribution of $Q(\mathbf{X}, \theta)$ is independent of all parameters. That is, if $\mathbf{X} \sim F(\mathbf{x} \mid \theta)$, then $Q(\mathbf{X}, \theta)$ has the same distribution for all values of $\theta$.
Example 9.2.7 (Location-scale pivots)
• Pivot for a location family with pdf $f(x - \mu)$: $\bar{X} - \mu$.
• Pivot for a scale family with pdf $\frac{1}{\sigma} f\big(\frac{x}{\sigma}\big)$: $\bar{X}/\sigma$.
• Pivot for a location-scale family with pdf $\frac{1}{\sigma} f\big(\frac{x - \mu}{\sigma}\big)$ (where $\sigma$ is a nuisance parameter): $\frac{\bar{X} - \mu}{S_X}$.
Example 9.2.8 (Gamma pivot) Let $X_1, \ldots, X_n$ be iid exponential($\lambda$). Then $T = \sum_{i=1}^n X_i$ is sufficient for $\lambda$ and $T \sim$ gamma$(n, \lambda)$ (which is a scale family). Hence pivots that may be used are
$$Q_1(T, \lambda) = T/\lambda \sim \text{gamma}(n, 1) \quad \text{or} \quad Q_2(T, \lambda) = 2T/\lambda \sim \text{gamma}(n, 2) = \chi^2_{2n}.$$
Given a pivot $Q(\mathbf{X}, \theta)$, we find numbers $a$ and $b$ such that
$$P(a \le Q(\mathbf{X}, \theta) \le b) = 1 - \alpha.$$
The acceptance region of a level $\alpha$ test of $H_0: \theta = \theta_0$ is given by
$$A(\theta_0) = \{\mathbf{x} : a \le Q(\mathbf{x}, \theta_0) \le b\}.$$
By Theorem 9.2.2, inverting this test gives a $1 - \alpha$ confidence set,
$$C(\mathbf{x}) = \{\theta : a \le Q(\mathbf{x}, \theta) \le b\}.$$
If $\theta$ is real-valued and, for each $\mathbf{x} \in \mathcal{X}$, $Q(\mathbf{x}, \theta)$ is a monotone function of $\theta$, then $C(\mathbf{x})$ is an interval. If $Q(\mathbf{x}, \theta)$ is an increasing function of $\theta$, then $C(\mathbf{x})$ has the form $L(\mathbf{x}, a) \le \theta \le U(\mathbf{x}, b)$; if $Q(\mathbf{x}, \theta)$ is a decreasing function of $\theta$ (which is typical), then $C(\mathbf{x})$ has the form $L(\mathbf{x}, b) \le \theta \le U(\mathbf{x}, a)$.
Example 9.2.9 (continuation of Example 9.2.8) Recall the pivot for $\lambda$ is $Q(T, \lambda) = 2T/\lambda \sim \chi^2_{2n}$. Choose $a$ and $b$ such that
$$P\Big(a \le \frac{2T}{\lambda} \le b\Big) = 1 - \alpha;$$
inverting $a \le 2T/\lambda \le b$ then gives the $1 - \alpha$ interval $2T/b \le \lambda \le 2T/a$.
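A sketch computing this pivotal interval (added for illustration; the true $\lambda$, $n$, $\alpha$, and seed are arbitrary choices, with $a$ and $b$ taken as the equal-tail chi-square quantiles):

import numpy as np
from scipy import stats

# 2T/lambda ~ chi^2_{2n} for iid exponential(lambda) data, so a 1-alpha
# interval for lambda is [2T/b, 2T/a] with equal-tail chi-square quantiles.
rng = np.random.default_rng(11)
lam_true, n, alpha = 3.0, 30, 0.05

x = rng.exponential(scale=lam_true, size=n)   # scale parameterization
T = x.sum()
a = stats.chi2.ppf(alpha / 2, df=2 * n)
b = stats.chi2.ppf(1 - alpha / 2, df=2 * n)
print("95% CI for lambda: [", round(2 * T / b, 3), ",", round(2 * T / a, 3), "]")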
Example 9.2.10 (Normal pivotal interval)
(1) Consider a normal population with $\sigma^2$ known. We want to find a confidence interval for $\mu$. Find a pivotal quantity and construct a confidence interval for $\mu$ based on this pivot.
(2) Consider a normal population with $\sigma^2$ also unknown, and suppose we want to find a confidence interval for $\sigma^2$. Find a pivotal quantity and construct a confidence interval for $\sigma^2$ based on this pivot.
Section 9.3 – Methods of Evaluating Interval Estimators (not required for the Final Exam)
Chapter 5 – Properties of a Random Sample
Section 5.5 – Convergence Concepts
Definition 5.5.1: A sequence of random variables, $X_1, X_2, \ldots$, converges in probability to a random variable $X$ if, for every $\epsilon > 0$,
$$\lim_{n\to\infty} P(|X_n - X| \ge \epsilon) = 0, \quad \text{or equivalently,} \quad \lim_{n\to\infty} P(|X_n - X| < \epsilon) = 1.$$
Theorem 5.5.2 (Weak Law of Large Numbers): Let $X_1, X_2, \ldots$ be iid random variables with $EX_i = \mu$ and $\operatorname{Var} X_i = \sigma^2 < \infty$. Define $\bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i$. Then for every $\epsilon > 0$,
$$\lim_{n\to\infty} P(|\bar{X}_n - \mu| < \epsilon) = 1.$$
Proof: Use Chebychev’s Inequality.
Example 5.5.3 (Consistency of $S^2$ and $S$). Let $X_1, X_2, \ldots$ be iid random variables with $EX_i = \mu$ and $\operatorname{Var} X_i = \sigma^2 < \infty$, and define $S_n^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar{X}_n)^2$. Can we prove a WLLN for $S_n^2$? Using Chebychev's Inequality, we have
$$P(|S_n^2 - \sigma^2| \ge \epsilon) \le \frac{E(S_n^2 - \sigma^2)^2}{\epsilon^2} = \frac{\operatorname{Var}(S_n^2)}{\epsilon^2},$$
and thus a sufficient condition for $S_n^2$ to converge in probability to $\sigma^2$ is that $\operatorname{Var}(S_n^2) \to 0$ as $n \to \infty$.
Theorem 5.5.4: Suppose that $X_1, X_2, \ldots$ converges in probability to a random variable $X$ and that $h$ is a continuous function. Then $h(X_1), h(X_2), \ldots$ converges in probability to $h(X)$.
Definition 5.5.10: A sequence of random variables, $X_1, X_2, \ldots$, converges in distribution to a random variable $X$ if $\lim_{n\to\infty} F_{X_n}(x) = F_X(x)$ at all points $x$ where $F_X(x)$ is continuous.
Theorem 5.5.12: If the sequence of random variables, $X_1, X_2, \ldots$, converges in probability to $X$, then the sequence also converges in distribution to $X$.
Theorem 5.5.13: The sequence of random variables, $X_1, X_2, \ldots$, converges in probability to a constant $\mu$ if and only if the sequence also converges in distribution to $\mu$. That is, the statement
$$\lim_{n\to\infty} P(|X_n - \mu| > \epsilon) = 0 \quad \text{for every } \epsilon > 0$$
is equivalent to
$$\lim_{n\to\infty} P(X_n \le x) = \begin{cases} 0, & x < \mu; \\ 1, & x > \mu. \end{cases}$$
Theorem 5.5.15 (Stronger form of the Central Limit Theorem) Let $X_1, X_2, \ldots$ be a sequence of iid random variables with $EX_i = \mu$ and $0 < \operatorname{Var} X_i = \sigma^2 < \infty$. Define $\bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i$, and let $G_n(x)$ denote the cdf of $\sqrt{n}(\bar{X}_n - \mu)/\sigma$. Then, for any $-\infty < x < \infty$,
$$\lim_{n\to\infty} G_n(x) = \int_{-\infty}^x \frac{1}{\sqrt{2\pi}} \exp\Big(-\frac{y^2}{2}\Big)\, dy;$$
that is, $\sqrt{n}(\bar{X}_n - \mu)/\sigma$ has a limiting standard normal distribution.
Some Notes:
• Assumptions: independence, identical distributions, and existence of the mean and variance.
• A finite variance is necessary for convergence to normality (the CLT does not apply to rvs from a Cauchy distribution).
• How good the approximation is depends, in general, on the original distribution.
Example (Normal Approximation to the binomial). Suppose $X_1, X_2, \ldots, X_n$ are a random sample from a Bernoulli($p$). We know that $EX_1 = p$ and $\operatorname{Var}(X_1) = p(1-p)$. The Central Limit Theorem tells us that
$$\frac{\sqrt{n}(\bar{X}_n - p)}{\sqrt{p(1-p)}}$$
is approximately $n(0, 1)$. Some comparisons between the exact and approximate calculations of $P(\bar{X}_n \le 0.7)$ for $p = 0.6$ are given in the following table:

                                   n = 20   n = 40   n = 60   n = 80   n = 100  n = 120
$P(\bar{X}_n \le 0.7)$ (Exact)     0.8744   0.9291   0.9587   0.9755   0.9852   0.9910
$P(\bar{X}_n \le 0.7)$ (Approx.)   0.8193   0.9016   0.9430   0.9661   0.9794   0.9873
Difference                         0.0551   0.0274   0.0156   0.0094   0.0058   0.0037
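A sketch reproducing this table (added for illustration):

import numpy as np
from scipy import stats

# Exact binomial vs normal-approximate P(Xbar_n <= 0.7), Bernoulli(p=0.6).
p, thresh = 0.6, 0.7
for n in (20, 40, 60, 80, 100, 120):
    exact = stats.binom.cdf(np.floor(n * thresh), n, p)
    approx = stats.norm.cdf(np.sqrt(n) * (thresh - p) / np.sqrt(p * (1 - p)))
    print(f"n={n:3d}  exact={exact:.4f}  approx={approx:.4f}  diff={exact - approx:.4f}")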
Theorem 5.5.17 (Slutsky's Theorem) If $X_n \to X$ in distribution and $Y_n \to a$, a constant, in probability, then
a. $Y_n X_n \to aX$ in distribution;
b. $X_n + Y_n \to X + a$ in distribution.
Example 5.5.18 (Normal approximation with estimated variance) Suppose that
$$\frac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma} \to n(0, 1),$$
but the value of $\sigma$ is unknown. If we can prove that $S_n^2 \to \sigma^2$ in probability, then by Exercise 5.32 we have $\sigma/S_n \to 1$ in probability. By Theorem 5.5.17,
$$\frac{\sqrt{n}(\bar{X}_n - \mu)}{S_n} = \frac{\sigma}{S_n} \cdot \frac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma} \to n(0, 1).$$
Notes (relationships among the convergence concepts):
1. Convergence in probability implies convergence in distribution.
2. Convergence in probability to a constant is equivalent to convergence in distribution to that constant.
3. Slutsky's Theorem.
Example 5.5.19 (Estimating the odds) Suppose that $X_1, X_2, \ldots, X_n$ are iid Bernoulli($p$) random variables. The typical parameter of interest is $p$, which can be estimated by $\bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i$. We can obtain the distribution of $n\bar{X}_n$, which is binomial$(n, p)$. Sometimes we are interested in the odds $\frac{p}{1-p}$, which may be estimated by $\frac{\bar{X}_n}{1 - \bar{X}_n}$. What are the properties of this estimator? For example, how do we calculate its variance? The exact calculation may be difficult, but an approximation can be obtained.

For the statistical application of Taylor's Theorem, we are most concerned with the first-order Taylor series. Let $T$ be a random variable with mean $\theta$, and suppose that $g$ is a differentiable function. Then
$$g(t) \approx g(\theta) + g'(\theta)(t - \theta).$$
Then we have
$$E(g(T)) \approx E(g(\theta)) + g'(\theta) E(T - \theta) = g(\theta),$$
and
$$\operatorname{Var}(g(T)) \approx E[g(T) - g(\theta)]^2 \approx E[g'(\theta)(T - \theta)]^2 = [g'(\theta)]^2 \operatorname{Var}(T).$$
Theorem 5.5.24 (Delta Method) Let $Y_n$ be a sequence of random variables that satisfies $\sqrt{n}(Y_n - \theta) \to n(0, \sigma^2)$ in distribution. For a given function $g$ and a specific value of $\theta$, suppose that $g'(\theta)$ exists and is not 0. Then
$$\sqrt{n}[g(Y_n) - g(\theta)] \to n(0, \sigma^2 [g'(\theta)]^2) \text{ in distribution.}$$
Example 5.5.22 (Continuation of Example 5.5.19) Recall that we are interested in the properties of $\frac{\bar{X}_n}{1 - \bar{X}_n}$. Let $g(t) = \frac{t}{1-t}$ and $\theta = E(\bar{X}_n) = p$; then $g'(t) = \frac{1}{(1-t)^2}$, and thus
$$E\Big(g(\bar{X}_n)\Big) \approx g(p) = \frac{p}{1-p},$$
and
$$\operatorname{Var}\Big(g(\bar{X}_n)\Big) \approx [g'(p)]^2 \operatorname{Var}(\bar{X}_n) = \Big[\frac{1}{(1-p)^2}\Big]^2 \cdot \frac{p(1-p)}{n} = \frac{p}{n(1-p)^3}.$$
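A Monte Carlo sketch of this delta-method variance (added for illustration; $p$, $n$, and the seed are arbitrary):

import numpy as np

# Compare the Monte Carlo variance of the odds estimator Xbar/(1 - Xbar)
# with the delta-method approximation p / (n (1-p)^3).
rng = np.random.default_rng(12)
p, n, reps = 0.3, 200, 100_000

xbar = rng.binomial(n, p, size=reps) / n
odds_hat = xbar / (1 - xbar)

print("MC variance:   ", round(odds_hat.var(), 6))
print("delta method:  ", round(p / (n * (1 - p) ** 3), 6))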
Chapter 10 – Asymptotic Evaluations
Section 10.1 – Point Estimation
Section 10.1.1 – Consistency
Definition 10.1.1 A sequence of estimators $W_n = W_n(X_1, \ldots, X_n)$ is a consistent sequence of estimators of the parameter $\theta$ if, for every $\epsilon > 0$ and every $\theta \in \Theta$,
$$\lim_{n\to\infty} P_\theta(|W_n - \theta| < \epsilon) = 1, \quad \text{or} \quad \lim_{n\to\infty} P_\theta(|W_n - \theta| \ge \epsilon) = 0.$$
Recall from Chapter 5 that we then say $W_n$ converges in probability to $\theta$.
Also, recall an application of Chebychev's Inequality:
$$P_\theta(|W_n - \theta| \ge \epsilon) \le \frac{E_\theta[(W_n - \theta)^2]}{\epsilon^2} = \frac{\operatorname{Var}_\theta W_n + (\operatorname{Bias}_\theta W_n)^2}{\epsilon^2}.$$
Theorem 10.1.3 If $W_n$ is a sequence of estimators of a parameter $\theta$ satisfying
(i) $\lim_{n\to\infty} \operatorname{Var}_\theta W_n = 0$,
(ii) $\lim_{n\to\infty} \operatorname{Bias}_\theta W_n = 0$,
for every $\theta \in \Theta$, then $W_n$ is a consistent sequence of estimators of $\theta$.
Example 10.1.2 (Consistency of $\bar{X}$) Let $X_1, \ldots, X_n$ be a random sample from $n(\mu, 1)$, and consider the consistency of $\bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i$.
Example (Consistency of $S^2$): Let $X_1, \ldots, X_n$ be a random sample from $n(\mu, \sigma^2)$. Consider the estimators of $\sigma^2$: $S_n^2$ and $\hat{\sigma}_n^2 = \frac{n-1}{n} S_n^2$ (the MLE).
Theorem 10.1.6 (Consistency of MLEs) Let $X_1, \ldots, X_n$ be a random sample from $f(x \mid \theta)$, and let $L(\theta \mid \mathbf{x}) = \prod_{i=1}^n f(x_i \mid \theta)$ be the likelihood function. Let $\hat{\theta}$ denote the MLE of $\theta$, and let $\tau(\theta)$ be a continuous function of $\theta$. Under the regularity conditions in Miscellanea 10.6.2 on $f(x \mid \theta)$ and, hence, $L(\theta \mid \mathbf{x})$, for every $\epsilon > 0$ and every $\theta \in \Theta$,
$$\lim_{n\to\infty} P_\theta(|\tau(\hat{\theta}) - \tau(\theta)| \ge \epsilon) = 0.$$
That is, $\tau(\hat{\theta})$ is a consistent estimator of $\tau(\theta)$.
Section 10.1.2 – Efficiency
Definition 10.1.11 A sequence of estimators $W_n$ is asymptotically efficient for a parameter $\tau(\theta)$ if $\sqrt{n}(W_n - \tau(\theta)) \to n(0, v(\theta))$ in distribution and
$$v(\theta) = \frac{[\tau'(\theta)]^2}{E_\theta\Big(\big(\frac{\partial}{\partial\theta}\log f(X \mid \theta)\big)^2\Big)};$$
i.e., the asymptotic variance of $W_n$ achieves the Cramér-Rao Lower Bound.
Theorem 10.1.12 (Consistency and asymptotic efficiency of the MLEs) Let $X_1, \ldots, X_n$ be iid $f(x \mid \theta)$, let $\hat{\theta}$ denote the MLE of $\theta$, and let $\tau(\theta)$ be a continuous function of $\theta$. Under the regularity conditions in Miscellanea 10.6.2 (p. 516) on $f(x \mid \theta)$ and, hence, $L(\theta \mid \mathbf{x})$,
$$\sqrt{n}[\tau(\hat{\theta}) - \tau(\theta)] \to n(0, v(\theta)),$$
where $v(\theta)$ is the Cramér-Rao Lower Bound. That is, $\tau(\hat{\theta})$ is a consistent and asymptotically efficient estimator of $\tau(\theta)$.
In other words, under the conditions of Theorem 10.1.12:
• $\tau(\hat{\theta})$ is a consistent estimator of $\tau(\theta)$.
• $\tau(\hat{\theta})$ has an asymptotic normal distribution, with asymptotic variance equal to the Cramér-Rao Lower Bound. Therefore,
$$\frac{\sqrt{n}[\tau(\hat{\theta}) - \tau(\theta)]}{\sqrt{v(\theta)}} \to n(0, 1) \text{ in distribution.}$$
Notes:
• Most of the common distributions (for instance, the regular exponential family of distributions) satisfy the conditions of Theorem 10.1.12.
• However, when the support depends on the parameter $\theta$, Theorem 10.1.12 is not applicable.
Section 10.1.3 Calculations and Comparisons
From the Delta Method and the asymptotic efficiency of MLEs, the approximate variance of $h(\hat{\theta})$ is
$$\operatorname{Var}(h(\hat{\theta}) \mid \theta) \approx \frac{[h'(\theta)]^2}{I_n(\theta)}, \quad \text{where } I_n(\theta) = E_\theta\Big(-\frac{\partial^2}{\partial\theta^2}\log L(\theta \mid \mathbf{X})\Big)$$
is the expected information number. A variance estimate (the MLE of $\operatorname{Var}(h(\hat{\theta}))$) is obtained by evaluating everything at $\theta = \hat{\theta}$:
$$\widehat{\operatorname{Var}}(h(\hat{\theta})) = \frac{[h'(\theta)]^2\big|_{\theta = \hat{\theta}}}{-\frac{\partial^2}{\partial\theta^2}\log L(\theta \mid \mathbf{X})\big|_{\theta = \hat{\theta}}}.$$
Example 10.1.14 (Approximate binomial variance) In Example 7.2.7 we saw that $\hat{p} = \bar{X}$ is the MLE of $p$, where $X_1, \ldots, X_n$ are iid Bernoulli($p$). By direct calculation,
$$\operatorname{Var}_p(\hat{p}) = \frac{p(1-p)}{n}, \quad \text{so} \quad \widehat{\operatorname{Var}}_{\hat{p}}(\hat{p}) = \frac{\hat{p}(1-\hat{p})}{n}.$$
We can also obtain this from the expected information number:
$$-E\Big(\frac{\partial^2}{\partial p^2}\log L(p \mid \mathbf{X})\Big)\Big|_{p=\hat{p}} = -E\Big\{\frac{\partial^2}{\partial p^2}[n\hat{p}\log p + n(1 - \hat{p})\log(1 - p)]\Big\}\Big|_{p=\hat{p}} = \Big(\frac{np}{p^2} + \frac{n(1-p)}{(1-p)^2}\Big)\Big|_{p=\hat{p}} = \frac{n}{\hat{p}(1-\hat{p})}.$$
We can calculate the variance of $\frac{\hat{p}}{1-\hat{p}}$ using the Delta Method:
$$\sqrt{n}(\hat{p} - p) \to n(0, p(1-p)) \text{ in distribution}, \quad \frac{\partial}{\partial p}\Big(\frac{p}{1-p}\Big) = \frac{1}{(1-p)^2},$$
so
$$\sqrt{n}\Big(\frac{\hat{p}}{1-\hat{p}} - \frac{p}{1-p}\Big) \to n\Big(0,\ p(1-p)\Big[\frac{1}{(1-p)^2}\Big]^2\Big) = n\Big(0, \frac{p}{(1-p)^3}\Big).$$
So
$$\widehat{\operatorname{Var}}_{\hat{p}}\Big(\frac{\hat{p}}{1-\hat{p}}\Big) \approx \frac{\hat{p}}{n(1-\hat{p})^3}.$$
Section 10.3 Hypothesis Testing
Section 10.3.1 Asymptotic Distribution of LRTs
Theorem 10.3.1 (Asymptotic distribution of the LRT – simple $H_0$) For testing $H_0: \theta = \theta_0$ versus $H_1: \theta \ne \theta_0$, suppose $X_1, \ldots, X_n$ are iid $f(x \mid \theta)$, $\hat{\theta}$ is the MLE of $\theta$, and $f(x \mid \theta)$ satisfies the regularity conditions in Miscellanea 10.6.2. Then under $H_0$, as $n \to \infty$,
$$-2\log\lambda(\mathbf{x}) \to \chi^2_1 \text{ in distribution,}$$
where $\chi^2_1$ is a $\chi^2$ random variable with 1 degree of freedom. Therefore, an approximate level $\alpha$ test of $H_0: \theta = \theta_0$ versus $H_1: \theta \ne \theta_0$ rejects $H_0$ when $-2\log\lambda(\mathbf{x}) \ge \chi^2_{1,\alpha}$.
Example 10.3.2 (Poisson LRT) For testing $H_0: \lambda = \lambda_0$ versus $H_1: \lambda \ne \lambda_0$ based on $X_1, \ldots, X_n$ iid Poisson($\lambda$), we have
$$-2\log\lambda(\mathbf{x}) = -2\log\Big(\frac{\exp(-n\lambda_0)\,\lambda_0^{n\bar{x}}}{\exp(-n\hat{\lambda})\,\hat{\lambda}^{n\bar{x}}}\Big) = 2n[(\lambda_0 - \hat{\lambda}) - \hat{\lambda}\log(\lambda_0/\hat{\lambda})],$$
where $\hat{\lambda} = \bar{x}$ is the MLE of $\lambda$.
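A sketch of this Poisson LRT in action (added for illustration; $\lambda_0$, $n$, $\alpha$, the data-generating $\lambda$, and the seed are arbitrary choices):

import numpy as np
from scipy import stats

# Poisson LRT: -2 log lambda(x) = 2n[(lam0 - lam_hat) - lam_hat*log(lam0/lam_hat)],
# compared against the chi-square(1) critical value.
rng = np.random.default_rng(13)
lam0, n, alpha = 2.0, 50, 0.05

x = rng.poisson(2.6, size=n)                 # data generated away from H0
lam_hat = x.mean()
stat = 2 * n * ((lam0 - lam_hat) - lam_hat * np.log(lam0 / lam_hat))

crit = stats.chi2.ppf(1 - alpha, df=1)
print("-2 log lambda =", round(stat, 3), " critical value =", round(crit, 3),
      " reject:", stat >= crit)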
Section 10.3.2 Other Large-Sample Tests
Definition: A Wald test is a test based on a statistic of the form
$$Z_n = \frac{W_n - \theta_0}{S_n},$$
where $\theta_0$ is a hypothesized value of the parameter $\theta$, $W_n$ is an estimator of $\theta$, and $S_n$ is a standard error of $W_n$, i.e., an estimator of the standard deviation of $W_n$.
Application of Theorem 10.1.12: Let $\hat{v}(\theta)$ be a consistent estimator of $v(\theta)$; then
$$\frac{\sqrt{n}[\tau(\hat{\theta}) - \tau(\theta)]}{\sqrt{\hat{v}(\theta)}} \to n(0, 1). \tag{1}$$
The Wald statistic is asymptotically standard normal by Slutsky's Theorem.
Notes
• We can use the Wald statistic as a test statistic for constructing approximate (asymptotic, or large-sample) tests for $\tau(\theta)$. The resulting test is known as the Wald test.
• Inverting the Wald test gives an approximate (large-sample) confidence interval for $\tau(\theta)$. This is equivalent to treating equation (1) as a pivotal quantity.
• Inference procedures based on the Wald statistic do not perform very well in small samples.
• Estimating the variance using the Cramér-Rao Lower Bound will usually result in an underestimate of the true variance.
Example 10.3.5 (Large-sample binomial tests) Let $X_1, \ldots, X_n$ be iid Bernoulli($p$). Consider $H_0: p \le p_0$ versus $H_1: p > p_0$, where $0 < p_0 < 1$ is a specified value. Consider the Wald test: since
$$\frac{\hat{p}_n - p}{\sqrt{\hat{p}_n(1 - \hat{p}_n)/n}} \to n(0, 1),$$
when $p = p_0$ the statistic $Z_n = \frac{\hat{p}_n - p_0}{\sqrt{\hat{p}_n(1 - \hat{p}_n)/n}}$ is approximately $n(0, 1)$. We reject $H_0$ if $Z_n > z_\alpha$. The same statistic $Z_n$ is obtained if we use the information number to derive a standard error for $\hat{p}_n$.
If we are interested in $H_0: p = p_0$ versus $H_1: p \ne p_0$, we know that
$$\frac{\hat{p}_n - p}{\sqrt{p(1 - p)/n}} \to n(0, 1),$$
so we can use $Z_n' = \frac{\hat{p}_n - p_0}{\sqrt{p_0(1 - p_0)/n}} \to n(0, 1)$ as a test statistic, rejecting $H_0$ if $|Z_n'| \ge z_{\alpha/2}$. We can also use the test statistic $Z_n = \frac{\hat{p}_n - p_0}{\sqrt{\hat{p}_n(1 - \hat{p}_n)/n}}$ in this situation.
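A sketch of the one-sided Wald test (added for illustration; $p_0$, $n$, $\alpha$, the data-generating $p$, and the seed are arbitrary choices):

import numpy as np
from scipy import stats

# One-sided Wald test of H0: p <= p0, using the estimated standard error
# sqrt(p_hat (1 - p_hat) / n).
rng = np.random.default_rng(14)
p0, n, alpha = 0.5, 400, 0.05

x = rng.binomial(1, 0.56, size=n)
p_hat = x.mean()
z = (p_hat - p0) / np.sqrt(p_hat * (1 - p_hat) / n)

print("Z_n =", round(z, 3), " z_alpha =", round(stats.norm.ppf(1 - alpha), 3),
      " reject:", z > stats.norm.ppf(1 - alpha))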
Section 10.4 Interval Estimation
Section 10.4.1 MLE-Based Method
If $X_1, \ldots, X_n$ are iid from $f(x \mid \theta)$ and $\hat{\theta}$ is the MLE of $\theta$, then the variance of $h(\hat{\theta})$ can be estimated by
$$\widehat{\operatorname{Var}}(h(\hat{\theta}) \mid \theta) = \frac{[h'(\theta)]^2\big|_{\theta=\hat{\theta}}}{-\frac{\partial^2}{\partial\theta^2}\log L(\theta \mid \mathbf{X})\big|_{\theta=\hat{\theta}}} \quad \text{or} \quad \widehat{\operatorname{Var}}(h(\hat{\theta}) \mid \theta) = \frac{[h'(\theta)]^2\big|_{\theta=\hat{\theta}}}{E\big\{-\frac{\partial^2}{\partial\theta^2}\log L(\theta \mid \mathbf{X})\big\}\big|_{\theta=\hat{\theta}}}.$$
Then
$$\frac{h(\hat{\theta}) - h(\theta)}{\sqrt{\widehat{\operatorname{Var}}(h(\hat{\theta}) \mid \theta)}} \to n(0, 1),$$
so the approximate $1 - \alpha$ confidence interval is
$$h(\hat{\theta}) - z_{\alpha/2}\sqrt{\widehat{\operatorname{Var}}(h(\hat{\theta}) \mid \theta)} \le h(\theta) \le h(\hat{\theta}) + z_{\alpha/2}\sqrt{\widehat{\operatorname{Var}}(h(\hat{\theta}) \mid \theta)}.$$
Example 10.4.1 (Confidence Interval for Odds) We know that the MLE of the odds $p/(1-p)$ is $\hat{p}/(1-\hat{p})$, and its approximate variance is
$$\widehat{\operatorname{Var}}_{\hat{p}}\Big(\frac{\hat{p}}{1-\hat{p}}\Big) \approx \frac{\hat{p}}{n(1-\hat{p})^3}.$$
The approximate $1 - \alpha$ confidence interval for the odds is therefore
$$\frac{\hat{p}}{1-\hat{p}} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}}{n(1-\hat{p})^3}}.$$
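A sketch computing this Wald-style interval for the odds (added for illustration; the true $p$, $n$, $\alpha$, and seed are arbitrary choices):

import numpy as np
from scipy import stats

# Approximate 1-alpha confidence interval for the odds p/(1-p), using the
# delta-method variance p_hat / (n (1 - p_hat)^3).
rng = np.random.default_rng(15)
p_true, n, alpha = 0.4, 300, 0.05

x = rng.binomial(1, p_true, size=n)
p_hat = x.mean()
odds_hat = p_hat / (1 - p_hat)
se = np.sqrt(p_hat / (n * (1 - p_hat) ** 3))
z = stats.norm.ppf(1 - alpha / 2)

print("odds_hat =", round(odds_hat, 3),
      " CI = [", round(odds_hat - z * se, 3), ",", round(odds_hat + z * se, 3), "]")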