Lecture 1 for BST 632: Statistical Theory II – Kui Zhang, Spring 2010

Chapter 5 – Properties of a Random Sample

Section 5.4 – Order Statistics

Definition 5.4.1: The order statistics of a random sample $X_1, \ldots, X_n$ are the sample values placed in ascending order. They are denoted by $X_{(1)}, \ldots, X_{(n)}$, where $X_{(1)} \le \cdots \le X_{(n)}$.

Theorem 5.4.4: Let $X_{(1)}, \ldots, X_{(n)}$ denote the order statistics of a random sample $X_1, \ldots, X_n$ from a continuous population with cdf $F_X(x)$ and pdf $f_X(x)$. Then the pdf of $X_{(j)}$ is
$$ f_{X_{(j)}}(x) = \frac{n!}{(j-1)!\,(n-j)!} f_X(x)\,[F_X(x)]^{j-1}\,[1-F_X(x)]^{n-j}. $$

Example 5.4.5 (Uniform order statistic pdf): The $j$th order statistic from a uniform(0,1) sample has a beta$(j, n-j+1)$ distribution. Consequently,
$$ E(X_{(j)}) = \frac{j}{n+1} \quad \text{and} \quad \mathrm{Var}\,X_{(j)} = \frac{j\,(n-j+1)}{(n+1)^2(n+2)}. $$

Theorem 5.4.6: Let $X_{(1)}, \ldots, X_{(n)}$ denote the order statistics of a random sample $X_1, \ldots, X_n$ from a continuous population with cdf $F_X(x)$ and pdf $f_X(x)$. Then the joint pdf of $X_{(i)}$ and $X_{(j)}$, $1 \le i < j \le n$, is
$$ f_{X_{(i)}, X_{(j)}}(u, v) = \frac{n!}{(i-1)!\,(j-i-1)!\,(n-j)!} f_X(u) f_X(v)\,[F_X(u)]^{i-1}\,[F_X(v)-F_X(u)]^{j-i-1}\,[1-F_X(v)]^{n-j} $$
for $u < v$.

The joint pdf of $X_{(1)}, \ldots, X_{(n)}$ is given by
$$ f_{X_{(1)}, \ldots, X_{(n)}}(x_1, \ldots, x_n) = \begin{cases} n!\, f_X(x_1)\cdots f_X(x_n), & x_1 < \cdots < x_n; \\ 0, & \text{otherwise.} \end{cases} $$

Example: Let $X_1, \ldots, X_n$ be continuous, independent random variables and let $X_{(1)}, \ldots, X_{(n)}$ denote their order statistics. Assume that $X_i \sim f_{X_i}(x)$. What are the pdfs of $X_{(1)}$ and $X_{(n)}$?

Solution: The cdf of $X_{(1)}$ is
$$ F_{X_{(1)}}(x) = P(X_{(1)} \le x) = 1 - P(X_{(1)} > x) = 1 - P(X_1 > x, \ldots, X_n > x) = 1 - \prod_{i=1}^n P(X_i > x) = 1 - \prod_{i=1}^n [1 - F_{X_i}(x)]. $$
So the pdf of $X_{(1)}$ is
$$ f_{X_{(1)}}(x) = \frac{d}{dx} F_{X_{(1)}}(x) = \sum_{i=1}^n f_{X_i}(x) \prod_{j \ne i} [1 - F_{X_j}(x)]. $$
The cdf of $X_{(n)}$ is
$$ F_{X_{(n)}}(x) = P(X_{(n)} \le x) = P(X_1 \le x, \ldots, X_n \le x) = \prod_{i=1}^n P(X_i \le x) = \prod_{i=1}^n F_{X_i}(x). $$
So the pdf of $X_{(n)}$ is
$$ f_{X_{(n)}}(x) = \frac{d}{dx} F_{X_{(n)}}(x) = \sum_{i=1}^n f_{X_i}(x) \prod_{j \ne i} F_{X_j}(x). $$
If $f_{X_i}(x) \equiv f(x)$ with cdf $F(x)$, then $f_{X_{(1)}}(x) = n f(x)[1-F(x)]^{n-1}$ and $f_{X_{(n)}}(x) = n f(x)[F(x)]^{n-1}$ (the same as Theorem 5.4.4).

Chapter 6 – Principles of Data Reduction

Section 6.2.1 – Sufficient Statistics

Definition 6.2.1: A statistic $T(\mathbf{X})$ is a sufficient statistic for $\theta$ if the conditional distribution of the sample $\mathbf{X}$ given the value of $T(\mathbf{X})$ does not depend on $\theta$.

Example 6.2.3 (Binomial sufficient statistic): Let $X_1, \ldots, X_n$ be iid Bernoulli with parameter $\theta$, $0 < \theta < 1$. Show that $T(\mathbf{X}) = X_1 + \cdots + X_n$ is a sufficient statistic for $\theta$.

Example 6.2.4 (Normal sufficient statistic): Let $X_1, \ldots, X_n$ be iid $n(\mu, \sigma^2)$, where $\sigma^2$ is known. Show that $T(\mathbf{X}) = \bar{X}$ is a sufficient statistic for $\mu$.

Example (A statistic that is not sufficient): Consider the model of Example 6.2.2 again with $n = 3$. Then $T = X_1 + X_2 + X_3$ is sufficient, while $T' = X_1 + 2X_2 + X_3$ is not sufficient, because $P(X_1 = 1, X_2 = 0, X_3 = 1 \mid X_1 + 2X_2 + X_3 = 2)$ depends on $\theta$.

Example (Sufficient statistic for the Poisson family): Let $X_1, \ldots, X_n$ be iid Poisson with parameter $\lambda > 0$. Then $T(\mathbf{X}) = \sum_{i=1}^n X_i$ is a sufficient statistic for $\lambda$.

Theorem 6.2.6 (Factorization Theorem): Let $f(\mathbf{x} \mid \theta)$ denote the joint pdf or pmf of a sample $\mathbf{X}$. A statistic $T(\mathbf{X})$ is a sufficient statistic for $\theta$ if and only if there exist functions $g(t \mid \theta)$ and $h(\mathbf{x})$ such that, for all sample points $\mathbf{x}$ and all parameter points $\theta$,
$$ f(\mathbf{x} \mid \theta) = g(T(\mathbf{x}) \mid \theta)\, h(\mathbf{x}). $$
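Before further examples of the Factorization Theorem, here is a small numerical illustration of Definition 6.2.1 and Example 6.2.3. It is a minimal sketch, not part of the original notes: it assumes numpy is available, and the sample size, values of $\theta$, and number of replications are arbitrary choices.

```python
import numpy as np

# Minimal sketch: for iid Bernoulli(theta) data, the conditional distribution of the
# sample pattern given T = sum(X_i) should be the same for every theta
# (Definition 6.2.1 / Example 6.2.3).  n, t_value, theta values, n_sim are arbitrary.
rng = np.random.default_rng(0)
n, t_value, n_sim = 3, 2, 200_000

for theta in (0.2, 0.5, 0.8):
    samples = rng.binomial(1, theta, size=(n_sim, n))
    keep = samples[samples.sum(axis=1) == t_value]          # condition on T = 2
    patterns, counts = np.unique(keep, axis=0, return_counts=True)
    freqs = counts / counts.sum()
    print(f"theta={theta}:")
    for pattern, freq in zip(patterns, freqs):
        print("   pattern", pattern, "conditional freq ~", round(freq, 3))
# Each of the 3 patterns with two successes appears with conditional frequency ~1/3,
# regardless of theta, consistent with sufficiency of T = X1 + X2 + X3.
```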
Example 6.2.7: Let $X_1, \ldots, X_n$ be iid $n(\mu, \sigma^2)$, where $\sigma^2$ is known. Show that $T(\mathbf{X}) = \bar{X}$ is a sufficient statistic for $\mu$ using the Factorization Theorem.

Example 6.2.8 (Uniform sufficient statistic): Let $X_1, \ldots, X_n$ be iid from a discrete uniform distribution on $1, \ldots, \theta$. Show that $T(\mathbf{X}) = X_{(n)} = \max_{1 \le i \le n} X_i$ is a sufficient statistic for $\theta$.

Example 6.2.9 (Normal sufficient statistic, $\mu$ and $\sigma^2$ unknown): Let $X_1, \ldots, X_n$ be iid $n(\mu, \sigma^2)$. Show that $T(\mathbf{X}) = (T_1(\mathbf{X}), T_2(\mathbf{X})) = (\bar{X}, S^2)$ is a sufficient statistic for $(\mu, \sigma^2)$.

Example (Sufficient statistic for the Poisson family): Let $X_1, \ldots, X_n$ be iid Poisson with parameter $\lambda > 0$. Use the Factorization Theorem to show that both $T(\mathbf{X}) = \sum_{i=1}^n X_i$ and $T'(\mathbf{X}) = (X_1, \sum_{i=2}^n X_i)$ are sufficient statistics for $\lambda$.

Theorem 6.2.10: Let $X_1, \ldots, X_n$ be iid from a pdf or pmf $f(x \mid \boldsymbol{\theta})$ that belongs to an exponential family given by
$$ f(x \mid \boldsymbol{\theta}) = h(x)\, c(\boldsymbol{\theta}) \exp\Big( \sum_{i=1}^k w_i(\boldsymbol{\theta}) t_i(x) \Big), $$
where $\boldsymbol{\theta} = (\theta_1, \ldots, \theta_d)$, $d \le k$. Then
$$ T(\mathbf{X}) = \Big( \sum_{j=1}^n t_1(X_j), \sum_{j=1}^n t_2(X_j), \ldots, \sum_{j=1}^n t_k(X_j) \Big) $$
is a sufficient statistic for $\boldsymbol{\theta}$.

Example (Sufficient statistic for the Poisson family): Let $X_1, \ldots, X_n$ be iid Poisson with parameter $\lambda > 0$. Then
$$ f(x \mid \lambda) = \frac{\exp(-\lambda)\, \lambda^x}{x!} = \frac{1}{x!} \exp(-\lambda) \exp(x \log \lambda), $$
so $h(x) = 1/x!$, $c(\lambda) = \exp(-\lambda)$, $w(\lambda) = \log \lambda$, and $t(x) = x$. Hence $T(\mathbf{X}) = \sum_{i=1}^n t(X_i) = \sum_{i=1}^n X_i$ is a sufficient statistic for $\lambda$.

Section 6.2.2 – Minimal Sufficient Statistics

Definition 6.2.11: A sufficient statistic $T(\mathbf{X})$ is called a minimal sufficient statistic if, for any other sufficient statistic $T'(\mathbf{X})$, $T(\mathbf{X})$ is a function of $T'(\mathbf{X})$.

Theorem 6.2.13: Let $f(\mathbf{x} \mid \theta)$ be the pmf or pdf of a sample $\mathbf{X}$. Suppose there exists a function $T(\mathbf{x})$ such that, for every two sample points $\mathbf{x}$ and $\mathbf{y}$, the ratio $f(\mathbf{x} \mid \theta)/f(\mathbf{y} \mid \theta)$ is constant as a function of $\theta$ if and only if $T(\mathbf{x}) = T(\mathbf{y})$. Then $T(\mathbf{X})$ is a minimal sufficient statistic for $\theta$.

Example 6.2.14 (Normal minimal sufficient statistic): Let $X_1, \ldots, X_n$ be iid $n(\mu, \sigma^2)$, where both $\mu$ and $\sigma^2$ are unknown. Show that $(\bar{X}, S^2)$ is a minimal sufficient statistic for $(\mu, \sigma^2)$.

Example 6.2.15 (Uniform minimal sufficient statistic): Suppose $X_1, \ldots, X_n$ are iid uniform observations on the interval $(\theta, \theta+1)$, $-\infty < \theta < \infty$. Show that $T(\mathbf{X}) = (X_{(1)}, X_{(n)})$ is a minimal sufficient statistic. (In this example, the dimension of the minimal sufficient statistic does not match the dimension of the parameter.)

Section 6.2.3 – Ancillary Statistics

Definition 6.2.16: A statistic $S(\mathbf{X})$ whose distribution does not depend on the parameter $\theta$ is called an ancillary statistic.

Example 6.2.17 (Uniform ancillary statistic): Let $X_1, \ldots, X_n$ be iid uniform observations on the interval $(\theta, \theta+1)$, $-\infty < \theta < \infty$. Show that the range statistic, $R = X_{(n)} - X_{(1)}$, is an ancillary statistic.

Example 6.2.18 (Location family ancillary statistic): Suppose $X_1, \ldots, X_n$ are iid observations from a location parameter family with cdf $F(x - \theta)$, $-\infty < \theta < \infty$. Show that the range statistic, $R = X_{(n)} - X_{(1)}$, is an ancillary statistic.

Example 6.2.19 (Scale family ancillary statistic): Suppose $X_1, \ldots, X_n$ are iid observations from a scale parameter family with cdf $F(x/\sigma)$, $\sigma > 0$. Then any statistic that depends on the sample only through the $n-1$ values $X_1/X_n, \ldots, X_{n-1}/X_n$ is an ancillary statistic.
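The following is a minimal simulation sketch of Example 6.2.17, not part of the original notes; it assumes numpy is available, and the sample size, $\theta$ values, and replication count are arbitrary.

```python
import numpy as np

# Minimal sketch for Example 6.2.17: for iid uniform(theta, theta + 1) samples, the
# distribution of the range R = X_(n) - X_(1) should not depend on theta.
# n, the theta values, and n_sim are arbitrary illustration choices.
rng = np.random.default_rng(1)
n, n_sim = 10, 100_000

for theta in (-5.0, 0.0, 3.7):
    x = rng.uniform(theta, theta + 1, size=(n_sim, n))
    r = x.max(axis=1) - x.min(axis=1)
    print(f"theta={theta:5.1f}: mean(R)={r.mean():.4f}, sd(R)={r.std():.4f}")
# The summaries of R agree across theta values, consistent with R being ancillary.
```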
Section 6.2.4 – Sufficient, Ancillary, and Complete Statistics

Definition 6.2.21: Let $f(t \mid \theta)$ be a family of pdfs or pmfs for a statistic $T(\mathbf{X})$. The family of probability distributions is called complete if $E_\theta g(T) = 0$ for all $\theta$ implies $P_\theta(g(T) = 0) = 1$ for all $\theta$. Equivalently, $T(\mathbf{X})$ is called a complete statistic.

Example 6.2.22 (Binomial complete sufficient statistic): Suppose that $T$ has a binomial$(n, p)$ distribution with $0 < p < 1$. Show that $T$ is a complete statistic.

Example 6.2.23 (Uniform complete sufficient statistic): Let $X_1, \ldots, X_n$ be iid uniform$(0, \theta)$ observations, $0 < \theta < \infty$. Show that $T = X_{(n)}$ is a complete statistic.

Theorem 6.2.24 (Basu's Theorem): If $T(\mathbf{X})$ is a complete and minimal sufficient statistic, then $T(\mathbf{X})$ is independent of every ancillary statistic.

Theorem 6.2.25 (Complete statistics in the exponential family): Let $X_1, \ldots, X_n$ be iid observations from an exponential family with pdf or pmf of the form
$$ f(x \mid \boldsymbol{\theta}) = h(x)\, c(\boldsymbol{\theta}) \exp\Big( \sum_{i=1}^k w_i(\boldsymbol{\theta}) t_i(x) \Big), $$
where $\boldsymbol{\theta} = (\theta_1, \ldots, \theta_k)$. Then the statistic
$$ T(\mathbf{X}) = \Big( \sum_{j=1}^n t_1(X_j), \sum_{j=1}^n t_2(X_j), \ldots, \sum_{j=1}^n t_k(X_j) \Big) $$
is complete as long as the parameter space $\Theta$ contains an open set in $\mathbb{R}^k$.

Example 6.2.26 (Using Basu's Theorem – I): Let $X_1, \ldots, X_n$ be iid exponential$(\theta)$ observations. Compute $E_\theta g(\mathbf{X})$, where
$$ g(\mathbf{X}) = \frac{X_n}{X_1 + \cdots + X_n}. $$

Example 6.2.27 (Using Basu's Theorem – II): Let $X_1, \ldots, X_n$ be iid observations from an $n(\mu, \sigma^2)$ population. Using Basu's Theorem, show that $\bar{X}$ and $S^2$ are independent.

Example (A minimal sufficient statistic that is not complete): Let $X_1, \ldots, X_n$ be iid $n(\theta, \theta^2)$, $\theta > 0$. We know that $T = (\bar{X}, S^2)$ is a minimal sufficient statistic. Let
$$ g(T) = \frac{n}{n+1} \bar{X}^2 - S^2. $$
Because $\bar{X} \sim n(\theta, \theta^2/n)$, we have
$$ E(\bar{X}^2) = (E\bar{X})^2 + \mathrm{Var}(\bar{X}) = \theta^2 + \frac{\theta^2}{n} = \frac{n+1}{n}\theta^2. $$
We also have $E S^2 = \theta^2$, so
$$ E(g(T)) = E\Big( \frac{n}{n+1} \bar{X}^2 \Big) - E(S^2) = \frac{n}{n+1} \cdot \frac{n+1}{n} \theta^2 - \theta^2 = 0 \quad \text{for all } \theta. $$
However,
$$ P_\theta(g(T) = 0) = P_\theta\Big( \frac{n}{n+1} \bar{X}^2 - S^2 = 0 \Big) = 0, $$
because $\bar{X}$ and $S^2$ are independent continuous random variables. So $T(\mathbf{X}) = (\bar{X}, S^2)$ is not complete.

Section 6.3 – Likelihood Principle

Definition 6.3.1: Let $f(\mathbf{x} \mid \theta)$ denote the joint pdf or pmf of the sample $\mathbf{X} = (X_1, \ldots, X_n)$. Then, given that $\mathbf{X} = \mathbf{x}$ is observed, the function of $\theta$ defined by $L(\theta \mid \mathbf{x}) = f(\mathbf{x} \mid \theta)$ is called the likelihood function.

Example (Likelihood function for the uniform distribution): Let $X_1, \ldots, X_n$ be iid uniform$(0, \theta)$. Then the likelihood function is
$$ L(\theta \mid \mathbf{x}) = \frac{1}{\theta^n} I_{[0 \le x_{(n)} \le \theta]}(x_1, \ldots, x_n) = \frac{1}{\theta^n} I_{(0, \theta]}(x_{(n)}). $$

Chapter 7 – Methods of Finding Estimators

Section 7.1 – Introduction

Definition 7.1.1: A point estimator is any function $W(\mathbf{X}) = W(X_1, X_2, \ldots, X_n)$ of a sample; that is, any statistic is a point estimator.

Section 7.2.1 – Method of Moments (MME)

Let $X_1, \ldots, X_n$ be iid from a pmf or pdf $f(x \mid \theta_1, \ldots, \theta_k)$. We have:

1st sample moment: $m_1 = \frac{1}{n} \sum_{i=1}^n X_i$; 1st population moment: $\mu_1' = E X = \mu_1'(\theta_1, \ldots, \theta_k)$.

$k$th sample moment: $m_k = \frac{1}{n} \sum_{i=1}^n X_i^k$; $k$th population moment: $\mu_k' = E X^k = \mu_k'(\theta_1, \ldots, \theta_k)$.

To get the MME, equate the first $k$ sample moments to the corresponding $k$ population moments and solve these equations for $(\theta_1, \ldots, \theta_k)$ in terms of
$$ (m_1, \ldots, m_k) = \Big( \frac{1}{n}\sum_{i=1}^n X_i, \frac{1}{n}\sum_{i=1}^n X_i^2, \ldots, \frac{1}{n}\sum_{i=1}^n X_i^k \Big). $$

Example 7.2.1 (Normal method of moments): Suppose $X_1, \ldots, X_n$ are iid from $n(\mu, \sigma^2)$. In this case, $k = 2$, $\theta_1 = \mu$, and $\theta_2 = \sigma^2$.
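A minimal numerical sketch of Example 7.2.1 follows; it is not part of the original notes, assumes numpy is available, and uses arbitrary "true" parameter values and sample size.

```python
import numpy as np

# Minimal sketch of Example 7.2.1 (normal method of moments): equate the first two
# sample moments to mu and mu^2 + sigma^2 and solve.  mu_true, sigma_true, n are
# arbitrary illustration choices.
rng = np.random.default_rng(2)
mu_true, sigma_true, n = 3.0, 2.0, 5_000
x = rng.normal(mu_true, sigma_true, size=n)

m1 = x.mean()                 # first sample moment
m2 = np.mean(x ** 2)          # second sample moment
mu_mme = m1                   # from m1 = mu
sigma2_mme = m2 - m1 ** 2     # from m2 = mu^2 + sigma^2

print("MME of mu:     ", round(mu_mme, 3))
print("MME of sigma^2:", round(sigma2_mme, 3))   # close to sigma_true**2 = 4
```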
Section 7.2.2 – Maximum Likelihood Estimation (MLE)

Let $X_1, \ldots, X_n$ be iid from a pmf or pdf $f(x \mid \theta_1, \ldots, \theta_k)$. The likelihood function is defined by
$$ L(\boldsymbol{\theta} \mid \mathbf{x}) = L(\theta_1, \ldots, \theta_k \mid x_1, \ldots, x_n) = \prod_{i=1}^n f(x_i \mid \theta_1, \ldots, \theta_k). $$

Definition 7.2.4: For each sample point $\mathbf{x}$, let $\hat{\boldsymbol{\theta}}(\mathbf{x})$ be a parameter value at which $L(\boldsymbol{\theta} \mid \mathbf{x})$ attains its maximum as a function of $\boldsymbol{\theta}$, with $\mathbf{x}$ held fixed. A maximum likelihood estimator (MLE) of the parameter $\boldsymbol{\theta}$ based on a sample $\mathbf{X}$ is $\hat{\boldsymbol{\theta}}(\mathbf{X})$.

Example 7.2.5 (Normal likelihood): Let $X_1, \ldots, X_n$ be iid from $n(\theta, 1)$. Show that $\bar{X}$ is the MLE of $\theta$ using derivatives.

Solution:
Step 1: Find the solutions of $\frac{d}{d\theta} L(\theta \mid \mathbf{x}) = 0$; these are the candidate maximizers.
Step 2: Verify that the solution achieves a global maximum ($\frac{d^2}{d\theta^2} L(\theta \mid \mathbf{x}) \big|_{\theta=\bar{x}} < 0$ in this case).
Step 3: Check the boundaries ($\theta \to \pm\infty$ in this case; this step turns out not to be necessary here).

Example 7.2.7 (Bernoulli MLE): Let $X_1, \ldots, X_n$ be iid Bernoulli$(p)$. Find the MLE of $p$, where $0 \le p \le 1$. Note that we include the possibility that $p = 0$ or $p = 1$.

Solution: Use the natural log of the likelihood function.

Example 7.2.8 (Restricted range MLE): Let $X_1, \ldots, X_n$ be iid from $n(\mu, 1)$, where $\mu \ge 0$.

Solution: Without any restriction, $\bar{X}$ is the MLE. So when $\bar{x} \ge 0$, $\hat{\mu} = \bar{x}$. When $\bar{x} < 0$, $L(\mu \mid \mathbf{x})$ achieves its maximum over $\mu \ge 0$ at $\hat{\mu} = 0$, so $\hat{\mu} = 0$ in this situation. In summary,
$$ \hat{\mu} = \bar{X}\, I_{[0, \infty)}(\bar{X}) = \begin{cases} \bar{X}, & \bar{X} \ge 0; \\ 0, & \bar{X} < 0. \end{cases} $$

Invariance Property of Maximum Likelihood Estimators

Theorem 7.2.10 (Invariance property of MLEs): If $\hat{\theta}$ is the MLE of $\theta$, then for any function $\tau(\theta)$, the MLE of $\tau(\theta)$ is $\tau(\hat{\theta})$.

Example: Let $X_1, \ldots, X_n$ be iid $n(\theta, 1)$. The MLE of $\theta^2$ is $\bar{X}^2$.

Example: Let $X_1, \ldots, X_n$ be iid Poisson$(\lambda)$. Find the MLE of $P(X = 0)$.

Solution: The MLE of $\lambda$ is $\hat{\lambda} = \bar{X}$. Because $P(X = 0) = \exp(-\lambda)$, the MLE of $P(X = 0)$ is $\exp(-\bar{X})$.

Section 7.3 – Methods of Evaluating Estimators

Section 7.3.1 – Mean Squared Error

Definition 7.3.1: The mean squared error (MSE) of an estimator $W$ of a parameter $\theta$ is the function of $\theta$ defined by
$$ \mathrm{MSE} = E_\theta(W - \theta)^2 = \mathrm{Var}_\theta W + (\mathrm{Bias}_\theta W)^2, \quad \text{where } \mathrm{Bias}_\theta W = E_\theta W - \theta. $$

Definition 7.3.1 (for a function of the parameter): The MSE of an estimator $W$ of $\tau(\theta)$ is the function of $\theta$ defined by
$$ \mathrm{MSE} = E_\theta(W - \tau(\theta))^2 = \mathrm{Var}_\theta W + (\mathrm{Bias}_\theta W)^2, \quad \text{where } \mathrm{Bias}_\theta W = E_\theta W - \tau(\theta). $$

Definition 7.3.2: The bias of a point estimator $W$ of a parameter $\theta$ is the difference between the expected value of $W$ and $\theta$. An estimator whose bias is identically (in $\theta$) equal to 0 is called unbiased and satisfies $E_\theta W = \theta$ for all $\theta$.

Definition 7.3.2 (for a function of the parameter): The bias of a point estimator $W$ of $\tau(\theta)$ is the difference between the expected value of $W$ and $\tau(\theta)$. An estimator whose bias is identically (in $\theta$) equal to 0 is called unbiased and satisfies $E_\theta W = \tau(\theta)$ for all $\theta$.

Example 7.3.3 (Normal MSE): Let $X_1, \ldots, X_n$ be iid from $n(\mu, \sigma^2)$. We know that $\bar{X}$ and $S^2$ are unbiased estimators of $\mu$ and $\sigma^2$, respectively: $E\bar{X} = \mu$ and $E S^2 = \sigma^2$. Their MSEs are
$$ \mathrm{MSE}(\bar{X}) = E(\bar{X} - \mu)^2 = \frac{\sigma^2}{n} \quad \text{and} \quad \mathrm{MSE}(S^2) = E(S^2 - \sigma^2)^2 = \mathrm{Var}\,S^2 = \frac{2\sigma^4}{n-1}. $$

Example 7.3.4: Let $X_1, \ldots, X_n$ be iid from $n(\mu, \sigma^2)$. Recall that the MLE (and MME) of $\sigma^2$ is
$$ \hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^n (X_i - \bar{X})^2 = \frac{n-1}{n} S^2. $$
Verify that $\hat{\sigma}^2$ has smaller MSE than $S^2$.
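A minimal simulation sketch of the comparison in Example 7.3.4 follows; it is not part of the original notes, assumes numpy is available, and the population values, sample size, and replication count are arbitrary choices.

```python
import numpy as np

# Minimal sketch for Example 7.3.4: compare the MSE of the MLE
# sigma_hat^2 = (n-1)/n * S^2 with the MSE of the unbiased S^2.
rng = np.random.default_rng(3)
mu, sigma, n, n_sim = 0.0, 1.5, 10, 200_000
sigma2 = sigma ** 2

x = rng.normal(mu, sigma, size=(n_sim, n))
s2 = x.var(axis=1, ddof=1)          # unbiased S^2
mle = x.var(axis=1, ddof=0)         # MLE = (n-1)/n * S^2

print("MSE(S^2)         ~", round(np.mean((s2 - sigma2) ** 2), 4),
      " (theory 2*sigma^4/(n-1) =", round(2 * sigma ** 4 / (n - 1), 4), ")")
print("MSE(sigma_hat^2) ~", round(np.mean((mle - sigma2) ** 2), 4),
      " (theory (2n-1)*sigma^4/n^2 =", round((2 * n - 1) * sigma ** 4 / n ** 2, 4), ")")
# The biased MLE trades a little bias for a smaller variance and a smaller MSE.
```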
Section 7.3.2 – Best Unbiased Estimator

Consider the class of estimators
$$ \mathcal{C}_\tau = \{ W : E_\theta W = \tau(\theta) \}. $$
For any $W_1, W_2 \in \mathcal{C}_\tau$, we have $\mathrm{Bias}_\theta W_1 - \mathrm{Bias}_\theta W_2 = \tau(\theta) - \tau(\theta) = 0$, so
$$ \mathrm{MSE}(W_1) - \mathrm{MSE}(W_2) = E_\theta(W_1 - \tau(\theta))^2 - E_\theta(W_2 - \tau(\theta))^2 = \mathrm{Var}_\theta(W_1) - \mathrm{Var}_\theta(W_2). $$

Definition 7.3.7: An estimator $W^*$ is a best unbiased estimator of $\tau(\theta)$ if it satisfies $E_\theta W^* = \tau(\theta)$ for all $\theta$ and, for any other estimator $W$ with $E_\theta W = \tau(\theta)$, we have $\mathrm{Var}_\theta(W^*) \le \mathrm{Var}_\theta(W)$ for all $\theta$. $W^*$ is also called a uniform minimum variance unbiased estimator (UMVUE) of $\tau(\theta)$.

Theorem 7.3.9 (Cramér–Rao inequality): Let $X_1, \ldots, X_n$ be a sample with joint pdf $f(\mathbf{x} \mid \theta)$, and let $W(\mathbf{X}) = W(X_1, \ldots, X_n)$ be any estimator satisfying
$$ \frac{d}{d\theta} E_\theta W(\mathbf{X}) = \int_{\mathcal{X}} \frac{\partial}{\partial \theta} \big[ W(\mathbf{x}) f(\mathbf{x} \mid \theta) \big]\, d\mathbf{x} \quad \text{and} \quad \mathrm{Var}_\theta(W(\mathbf{X})) < \infty. $$
Then
$$ \mathrm{Var}_\theta(W(\mathbf{X})) \ge \frac{\Big( \frac{d}{d\theta} E_\theta W(\mathbf{X}) \Big)^2}{E_\theta \Big( \big( \frac{\partial}{\partial \theta} \log f(\mathbf{X} \mid \theta) \big)^2 \Big)}, $$
where $\log$ is the natural logarithm.

Corollary 7.3.10 (Cramér–Rao inequality, iid case): If the assumptions of Theorem 7.3.9 are satisfied and, additionally, $X_1, \ldots, X_n$ are iid with pdf $f(x \mid \theta)$, then
$$ \mathrm{Var}_\theta(W(\mathbf{X})) \ge \frac{\Big( \frac{d}{d\theta} E_\theta W(\mathbf{X}) \Big)^2}{n\, E_\theta \Big( \big( \frac{\partial}{\partial \theta} \log f(X \mid \theta) \big)^2 \Big)}. $$

Lemma 7.3.11: If $f(x \mid \theta)$ satisfies
$$ \frac{d}{d\theta} E_\theta \Big( \frac{\partial}{\partial \theta} \log f(X \mid \theta) \Big) = \int \frac{\partial}{\partial \theta} \Big[ \Big( \frac{\partial}{\partial \theta} \log f(x \mid \theta) \Big) f(x \mid \theta) \Big] dx $$
(true for an exponential family), then
$$ E_\theta \Big\{ \Big( \frac{\partial}{\partial \theta} \log f(X \mid \theta) \Big)^2 \Big\} = -E_\theta \Big( \frac{\partial^2}{\partial \theta^2} \log f(X \mid \theta) \Big). $$

Example 7.3.12: Recall the Poisson problem. We will show that $\bar{X}$ is the UMVUE of $\lambda$.

Section 7.3.3 – Sufficiency and Unbiasedness

Theorem 7.3.17 (Rao–Blackwell): Let $W$ be any unbiased estimator of $\tau(\theta)$, and let $T$ be a sufficient statistic for $\theta$. Define $\phi(T) = E(W \mid T)$. Then $E_\theta(\phi(T)) = \tau(\theta)$ and $\mathrm{Var}_\theta(\phi(T)) \le \mathrm{Var}_\theta(W)$ for all $\theta$; that is, $\phi(T)$ is a uniformly better unbiased estimator of $\tau(\theta)$.

Theorem 7.3.19: If $W$ is a best unbiased estimator of $\tau(\theta)$, then $W$ is unique.

Theorem 7.3.23: Let $T$ be a complete sufficient statistic for a parameter $\theta$, and let $\phi(T)$ be any estimator based only on $T$. Then $\phi(T)$ is the unique best unbiased estimator of its expected value.

Example 7.3.24 (Binomial best unbiased estimation): Let $X_1, \ldots, X_n$ be iid binomial$(k, \theta)$. We want to estimate $\tau(\theta) = P_\theta(X = 1) = k\theta(1-\theta)^{k-1}$.

Chapter 8 – Hypothesis Testing

Section 8.1 – Introduction

Definition 8.1.1: A hypothesis is a statement about a population parameter.

Definition 8.1.2: The two complementary hypotheses in a hypothesis testing problem are called the null hypothesis and the alternative hypothesis. They are denoted by $H_0$ and $H_1$, respectively.

Setting: Let $\theta$ be a parameter of interest, and test $H_0: \theta \in \Theta_0$ versus $H_1: \theta \in \Theta_0^c$, where $\Theta_0 \subset \Theta$ and $\Theta_0^c$ is its complement in $\Theta$.

Definition 8.1.3: A hypothesis testing procedure or hypothesis test is a rule that specifies:
1. For which sample values the decision is made to accept $H_0$ as true.
2. For which sample values $H_0$ is rejected and $H_1$ is accepted as true.

Rejection region (or critical region): the subset of the sample space for which $H_0$ will be rejected.

Acceptance region: the complement of the rejection region.

Section 8.2 – Methods of Finding Tests

Section 8.2.1 – Likelihood Ratio Tests

Definition 8.2.1: The likelihood ratio test statistic for testing $H_0: \theta \in \Theta_0$ versus $H_1: \theta \in \Theta_0^c$ is
$$ \lambda(\mathbf{x}) = \frac{\sup_{\Theta_0} L(\theta \mid \mathbf{x})}{\sup_{\Theta} L(\theta \mid \mathbf{x})}. $$
A likelihood ratio test (LRT) is any test that has a rejection region of the form $\{\mathbf{x} : \lambda(\mathbf{x}) \le c\}$, where $c$ is any number satisfying $0 \le c \le 1$.

LRT and MLE: Let $\hat{\theta}$ be the MLE of $\theta$ over the unrestricted parameter space $\Theta$, and let $\hat{\theta}_0$ be the MLE of $\theta$ over the restricted parameter space $\Theta_0$. Then
$$ \lambda(\mathbf{x}) = \frac{L(\hat{\theta}_0 \mid \mathbf{x})}{L(\hat{\theta} \mid \mathbf{x})}. $$
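The short sketch below (not from the original notes) computes $\lambda(\mathbf{x}) = L(\hat{\theta}_0 \mid \mathbf{x})/L(\hat{\theta} \mid \mathbf{x})$ numerically for iid $n(\theta, 1)$ data with $H_0: \theta = \theta_0$, where the restricted MLE is $\theta_0$ itself and the unrestricted MLE is $\bar{x}$. It assumes numpy and scipy are available; the sample and $\theta_0$ are arbitrary, and it also checks the closed form derived in Example 8.2.2 below.

```python
import numpy as np
from scipy.stats import norm

# Minimal sketch: lambda(x) = L(theta0 | x) / L(theta_hat | x) for iid n(theta, 1)
# data with H0: theta = theta0.  theta0, n, and the simulated data are arbitrary.
rng = np.random.default_rng(4)
theta0, n = 0.0, 25
x = rng.normal(0.3, 1.0, size=n)      # data generated under some nearby theta

def loglik(theta, data):
    return norm.logpdf(data, loc=theta, scale=1.0).sum()

theta_hat = x.mean()                                     # unrestricted MLE
lam = np.exp(loglik(theta0, x) - loglik(theta_hat, x))   # LRT statistic
closed_form = np.exp(-n * (x.mean() - theta0) ** 2 / 2)
print("lambda(x)   =", round(lam, 6))
print("closed form =", round(closed_form, 6))            # matches Example 8.2.2
```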
Example 8.2.2 (Normal LRT): Let $X_1, \ldots, X_n$ be iid from $n(\theta, 1)$. We want to test $H_0: \theta = \theta_0$ versus $H_1: \theta \ne \theta_0$, where $\theta_0$ is a fixed number set by the experimenter. Show that
$$ \lambda(\mathbf{x}) = \exp[-n(\bar{x} - \theta_0)^2/2], $$
so the LRT rejects $H_0$ for small values of $\lambda(\mathbf{x})$. Therefore the rejection region $\{\mathbf{x} : \lambda(\mathbf{x}) \le c\}$ is equivalent to
$$ \{ \mathbf{x} : |\bar{x} - \theta_0| \ge \sqrt{-2(\log c)/n} \}. $$

Example 8.2.3 (Exponential LRT): Let $X_1, \ldots, X_n$ be iid from an exponential population with pdf
$$ f(x \mid \theta) = \exp[-(x - \theta)]\, I_{[\theta, \infty)}(x), \quad -\infty < \theta < \infty. $$
We want to test $H_0: \theta \le \theta_0$ versus $H_1: \theta > \theta_0$, where $\theta_0$ is a fixed number set by the experimenter. Show that
$$ \lambda(\mathbf{x}) = \begin{cases} 1, & x_{(1)} \le \theta_0; \\ \exp[-n(x_{(1)} - \theta_0)], & x_{(1)} > \theta_0, \end{cases} $$
so the LRT rejects $H_0$ for small values of $\lambda(\mathbf{x})$. Therefore the rejection region $\{\mathbf{x} : \lambda(\mathbf{x}) \le c\}$ is equivalent to
$$ \Big\{ \mathbf{x} : x_{(1)} \ge \theta_0 - \frac{\log c}{n} \Big\}. $$

Theorem 8.2.4: If $T(\mathbf{X})$ is a sufficient statistic for $\theta$, and $\lambda^*(t)$ and $\lambda(\mathbf{x})$ are the LRT statistics based on $T$ and $\mathbf{X}$, respectively, then $\lambda^*(T(\mathbf{x})) = \lambda(\mathbf{x})$ for every $\mathbf{x}$ in the sample space.

Example 8.2.5 (LRT and sufficiency): In Example 8.2.2, we could have used the likelihood associated with the sufficient statistic $\bar{X}$, using the fact that $\bar{X} \sim n(\theta, 1/n)$; this test rejects for large values of $|\bar{X} - \theta_0|$. Similarly, in Example 8.2.3, we can use the likelihood associated with the sufficient statistic $X_{(1)}$,
$$ L(\theta \mid x_{(1)}) = n \exp[-n(x_{(1)} - \theta)]\, I_{[\theta, \infty)}(x_{(1)}), $$
which rejects for large values of $X_{(1)}$.

Example 8.2.6 (Normal LRT with unknown variance): Let $X_1, \ldots, X_n$ be iid from $n(\mu, \sigma^2)$, and suppose the experimenter is interested only in testing $H_0: \mu \le \mu_0$ versus $H_1: \mu > \mu_0$. The LRT statistic is
$$ \lambda(\mathbf{x}) = \frac{\max_{\{\mu \le \mu_0,\, \sigma^2 > 0\}} L(\mu, \sigma^2 \mid \mathbf{x})}{\max_{\{-\infty < \mu < \infty,\, \sigma^2 > 0\}} L(\mu, \sigma^2 \mid \mathbf{x})} = \begin{cases} 1, & \hat{\mu} \le \mu_0; \\ L(\mu_0, \hat{\sigma}_0^2 \mid \mathbf{x}) / L(\hat{\mu}, \hat{\sigma}^2 \mid \mathbf{x}), & \hat{\mu} > \mu_0, \end{cases} $$
where $\hat{\sigma}_0^2 = \sum_{i=1}^n (x_i - \mu_0)^2 / n$.

Section 8.3 – Methods of Evaluating Tests

Section 8.3.1 – Error Probabilities and the Power Function

Two types of error:
Type I error: $\theta \in \Theta_0$, but the hypothesis test incorrectly decides to reject $H_0$.
Type II error: $\theta \in \Theta_0^c$, but the hypothesis test decides to accept $H_0$.

Definition 8.3.1: The power function of a hypothesis test with rejection region $R$ is the function of $\theta$ defined by $\beta(\theta) = P_\theta(\mathbf{X} \in R)$.

Example 8.3.2 (Binomial power function): Let $X \sim$ binomial$(5, \theta)$. Consider $H_0: \theta \le 1/2$ versus $H_1: \theta > 1/2$ and calculate the power function of the following tests:
Test 1: $R = \{ \text{all 5 "successes" are observed} \}$, i.e., reject only when $X = 5$.
Test 2: $R = \{ X = 3, 4, \text{or } 5 \}$.

Example 8.3.3 (Normal power function): Let $X_1, \ldots, X_n$ be iid from $n(\mu, \sigma^2)$, where $\sigma^2$ is known. Consider $H_0: \mu \le \mu_0$ versus $H_1: \mu > \mu_0$. The LRT for this test has a rejection region of the form
$$ R = \Big\{ \mathbf{x} : \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} > c \Big\}. $$
Therefore the power function is
$$ \beta(\mu) = P_\mu\Big( \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} > c \Big) = P\Big( Z > c - \frac{\mu - \mu_0}{\sigma/\sqrt{n}} \Big). $$

Example 8.3.4 (Continuation of Example 8.3.3): Suppose that the experimenter would like a maximum probability of a Type I error of 0.1 and a maximum probability of a Type II error of 0.2 if $\mu \ge \mu_0 + \sigma$. How do we choose $c$ and $n$?

Definition 8.3.5: For $0 \le \alpha \le 1$, a test with power function $\beta(\theta)$ is a size $\alpha$ test if $\sup_{\theta \in \Theta_0} \beta(\theta) = \alpha$.

Definition 8.3.6: For $0 \le \alpha \le 1$, a test with power function $\beta(\theta)$ is a level $\alpha$ test if $\sup_{\theta \in \Theta_0} \beta(\theta) \le \alpha$.

Example 8.3.7 (Size of LRT): A size $\alpha$ LRT is constructed by choosing the appropriate $c$ such that $\sup_{\theta \in \Theta_0} P_\theta(\lambda(\mathbf{X}) \le c) = \alpha$. In Example 8.2.2 ($H_0: \theta = \theta_0$ versus $H_1: \theta \ne \theta_0$), this gives $R = \{ |\bar{x} - \theta_0| \ge z_{\alpha/2}/\sqrt{n} \}$. In Example 8.2.3 ($H_0: \theta \le \theta_0$ versus $H_1: \theta > \theta_0$), $P_{\theta_0}(X_{(1)} \ge c) = \exp(-n(c - \theta_0)) = \alpha$ if $c = \theta_0 - \log(\alpha)/n \ge \theta_0$.
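Below is a minimal sketch (not from the original notes) of the power computation in Examples 8.3.3 and 8.3.4; it assumes scipy is available, and it takes $\mu_0 = 0$ and $\sigma = 1$ purely for illustration.

```python
from math import ceil, sqrt
from scipy.stats import norm

# Minimal sketch for Examples 8.3.3-8.3.4: power function of the test that rejects
# H0: mu <= mu0 when (xbar - mu0)/(sigma/sqrt(n)) > c, and a choice of c and n giving
# Type I error <= 0.1 and Type II error <= 0.2 at mu = mu0 + sigma.
def power(mu, mu0, sigma, n, c):
    # beta(mu) = P(Z > c - (mu - mu0)/(sigma/sqrt(n)))
    return norm.sf(c - (mu - mu0) / (sigma / sqrt(n)))

alpha, power_target = 0.10, 0.80          # size 0.1, power 0.8 at mu0 + sigma
c = norm.isf(alpha)                        # c = z_0.10 ~ 1.28 gives size alpha
# Need power(mu0 + sigma) >= 0.8, i.e. c - sqrt(n) <= z_0.80:
n = ceil((c - norm.isf(power_target)) ** 2)   # smallest integer n, here n = 5
print("c =", round(c, 3), " n =", n)
print("size               =", round(power(0.0, 0.0, 1.0, n, c), 3))
print("power at mu0+sigma =", round(power(1.0, 0.0, 1.0, n, c), 3))
```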
Section 8.3.2 – Most Powerful Tests

Definition 8.3.11: Let $\mathcal{C}$ be a class of tests for testing $H_0: \theta \in \Theta_0$ versus $H_1: \theta \in \Theta_0^c$. A test in class $\mathcal{C}$, with power function $\beta(\theta)$, is a uniformly most powerful (UMP) class $\mathcal{C}$ test if $\beta(\theta) \ge \beta'(\theta)$ for every $\theta \in \Theta_0^c$ and every $\beta'(\theta)$ that is a power function of a test in class $\mathcal{C}$.

Theorem 8.3.12 (Neyman–Pearson Lemma): Consider testing $H_0: \theta = \theta_0$ versus $H_1: \theta = \theta_1$, where the pdf or pmf corresponding to $\theta_i$ is $f(\mathbf{x} \mid \theta_i)$, $i = 0, 1$. Then any test with rejection region $R$ satisfying
$$ \mathbf{x} \in R \ \text{ if } \ f(\mathbf{x} \mid \theta_1) > k f(\mathbf{x} \mid \theta_0), \quad \text{or equivalently } \frac{f(\mathbf{x} \mid \theta_1)}{f(\mathbf{x} \mid \theta_0)} > k, $$
and
$$ \mathbf{x} \in R^c \ \text{ if } \ f(\mathbf{x} \mid \theta_1) < k f(\mathbf{x} \mid \theta_0), \quad \text{or equivalently } \frac{f(\mathbf{x} \mid \theta_1)}{f(\mathbf{x} \mid \theta_0)} < k, $$
for some $k \ge 0$, with $\alpha = P_{\theta_0}(\mathbf{X} \in R)$, is a UMP level $\alpha$ test.

Corollary 8.3.13: Consider the hypothesis testing problem posed in Theorem 8.3.12. Suppose $T(\mathbf{X})$ is a sufficient statistic for $\theta$ and $g(t \mid \theta_i)$ is the pdf or pmf of $T$ corresponding to $\theta_i$, $i = 0, 1$. Then any test based on $T$ with rejection region $S$ (a subset of the sample space $\mathcal{T}$ of $T$) is a UMP level $\alpha$ test if it satisfies
$$ t \in S \ \text{ if } \ g(t \mid \theta_1) > k g(t \mid \theta_0), \quad \text{or equivalently } \frac{g(t \mid \theta_1)}{g(t \mid \theta_0)} > k, $$
and
$$ t \in S^c \ \text{ if } \ g(t \mid \theta_1) < k g(t \mid \theta_0), \quad \text{or equivalently } \frac{g(t \mid \theta_1)}{g(t \mid \theta_0)} < k, $$
for some $k \ge 0$, with $\alpha = P_{\theta_0}(T \in S)$.

Example 8.3.14 (UMP binomial test): Let $X \sim$ binomial$(2, \theta)$. We want to test $H_0: \theta = 1/2$ versus $H_1: \theta = 3/4$. Find the UMP test.

Example 8.3.15 (UMP normal test): Let $X_1, \ldots, X_n$ be iid from $n(\mu, \sigma^2)$, where $\sigma^2$ is known. Find the UMP test for $H_0: \mu = \mu_0$ versus $H_1: \mu = \mu_1$, where $\mu_0 > \mu_1$, and find the exact rejection region for such a size $\alpha$ test.

Types of hypotheses:
1. Simple hypothesis: $H: \theta = \theta_0$.
2. Composite hypothesis: more than one possible distribution.
   a. One-sided hypothesis: $H: \theta \ge \theta_0$ (or $H: \theta \le \theta_0$).
   b. Two-sided hypothesis: $H: \theta \ne \theta_0$.

Definition 8.3.16: A family of pdfs or pmfs $\{ g(t \mid \theta) : \theta \in \Theta \}$ for a univariate random variable $T$ with real-valued parameter $\theta$ has a monotone likelihood ratio (MLR) if, for every $\theta_2 > \theta_1$, $g(t \mid \theta_2)/g(t \mid \theta_1)$ is a monotone (nonincreasing or nondecreasing) function of $t$ on $\{ t : g(t \mid \theta_1) > 0 \text{ or } g(t \mid \theta_2) > 0 \}$. Note that $c/0$ is defined to be $\infty$ if $c > 0$.

Theorem 8.3.17 (Karlin–Rubin): Consider testing $H_0: \theta \le \theta_0$ versus $H_1: \theta > \theta_0$. Suppose that $T$ is a sufficient statistic for $\theta$ and that the family of pdfs or pmfs $\{ g(t \mid \theta) : \theta \in \Theta \}$ of $T$ has an MLR, with $g(t \mid \theta_2)/g(t \mid \theta_1)$ nondecreasing in $t$ for $\theta_2 > \theta_1$. Then for any $t_0$, the test that rejects $H_0$ if and only if $T > t_0$ is a UMP level $\alpha$ test, where $\alpha = P_{\theta_0}(T > t_0)$.

Note (the remaining one-sided cases):
Consider testing $H_0: \theta \le \theta_0$ versus $H_1: \theta > \theta_0$. If $T$ is sufficient and $g(t \mid \theta_2)/g(t \mid \theta_1)$ is nonincreasing in $t$ for $\theta_2 > \theta_1$, then the test that rejects $H_0$ if and only if $T < t_0$ is a UMP level $\alpha$ test, where $\alpha = P_{\theta_0}(T < t_0)$.

Consider testing $H_0: \theta \ge \theta_0$ versus $H_1: \theta < \theta_0$. If $T$ is sufficient and $g(t \mid \theta_2)/g(t \mid \theta_1)$ is nondecreasing in $t$ for $\theta_2 > \theta_1$, then the test that rejects $H_0$ if and only if $T < t_0$ is a UMP level $\alpha$ test, where $\alpha = P_{\theta_0}(T < t_0)$.

Consider testing $H_0: \theta \ge \theta_0$ versus $H_1: \theta < \theta_0$. If $T$ is sufficient and $g(t \mid \theta_2)/g(t \mid \theta_1)$ is nonincreasing in $t$ for $\theta_2 > \theta_1$, then the test that rejects $H_0$ if and only if $T > t_0$ is a UMP level $\alpha$ test, where $\alpha = P_{\theta_0}(T > t_0)$.
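The following is a minimal simulation sketch of the Karlin–Rubin construction for the normal mean with known variance; it is not from the original notes, assumes numpy and scipy are available, and uses arbitrary parameter and design values.

```python
import numpy as np
from scipy.stats import norm

# Minimal sketch of Karlin-Rubin for iid n(mu, sigma^2) data with sigma known:
# T = Xbar is sufficient with MLR, so rejecting H0: mu <= mu0 when
# Xbar > t0 = mu0 + z_alpha * sigma/sqrt(n) is a UMP level alpha test.
rng = np.random.default_rng(5)
mu0, sigma, n, alpha, n_sim = 0.0, 1.0, 20, 0.05, 100_000
t0 = mu0 + norm.isf(alpha) * sigma / np.sqrt(n)

for mu in (mu0 - 0.3, mu0, mu0 + 0.3, mu0 + 0.6):
    xbar = rng.normal(mu, sigma / np.sqrt(n), size=n_sim)   # distribution of Xbar
    print(f"mu={mu:5.2f}: rejection rate ~ {np.mean(xbar > t0):.3f}")
# The rejection rate stays at or below ~alpha for mu <= mu0 and rises for mu > mu0,
# as expected for the one-sided UMP test.
```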
Note: For many problems there is no UMP level $\alpha$ test. This is because the class of level $\alpha$ tests is so large that no one test dominates all the others in terms of power.

Example 8.3.18 (Continuation of Example 8.3.15): Consider testing $H_0': \mu \ge \mu_0$ versus $H_1': \mu < \mu_0$. Show that the test that rejects $H_0'$ if $\bar{X} < \mu_0 - z_\alpha \sigma/\sqrt{n}$ is a UMP size $\alpha$ test.

Section 8.3.4 – p-values

Definition 8.3.26: A p-value $p(\mathbf{X})$ is a test statistic satisfying $0 \le p(\mathbf{x}) \le 1$ for every sample point $\mathbf{x}$. Small values of $p(\mathbf{X})$ give evidence that $H_1$ is true. A p-value is valid if, for every $\theta \in \Theta_0$ and every $0 \le \alpha \le 1$,
$$ P_\theta(p(\mathbf{X}) \le \alpha) \le \alpha. $$

Theorem 8.3.27: Let $W(\mathbf{X})$ be a test statistic such that large values of $W$ give evidence that $H_1$ is true. For each sample point $\mathbf{x}$, define
$$ p(\mathbf{x}) = \sup_{\theta \in \Theta_0} P_\theta(W(\mathbf{X}) \ge W(\mathbf{x})). $$
Then $p(\mathbf{X})$ is a valid p-value.

Example 8.3.28 (Two-sided normal p-value): Let $X_1, \ldots, X_n$ be iid from $n(\mu, \sigma^2)$. The LRT (Exercise 8.38) for $H_0: \mu = \mu_0$ versus $H_1: \mu \ne \mu_0$ rejects $H_0$ when $\frac{|\bar{X} - \mu_0|}{S/\sqrt{n}}$ is large. Show that
$$ p(\mathbf{x}) = 2 P\Big( T_{n-1} \ge \frac{|\bar{x} - \mu_0|}{s/\sqrt{n}} \Big) $$
is a valid p-value.

Example 8.3.29 (One-sided normal p-value): Let $X_1, \ldots, X_n$ be iid from $n(\mu, \sigma^2)$. The LRT (Exercise 8.38) for $H_0: \mu \le \mu_0$ versus $H_1: \mu > \mu_0$ rejects $H_0$ when $\frac{\bar{X} - \mu_0}{S/\sqrt{n}}$ is large. Show that
$$ p(\mathbf{x}) = P(T_{n-1} \ge W(\mathbf{x})) = P\Big( T_{n-1} \ge \frac{\bar{x} - \mu_0}{s/\sqrt{n}} \Big) $$
is a valid p-value.

Chapter 9 – Interval Estimation

Section 9.1 – Introduction

Definition 9.1.1: An interval estimate of a real-valued parameter $\theta$ is any pair of functions, $L(x_1, \ldots, x_n)$ and $U(x_1, \ldots, x_n)$, of a sample that satisfy $L(\mathbf{x}) \le U(\mathbf{x})$ for all $\mathbf{x}$. If $\mathbf{X} = \mathbf{x}$ is observed, the inference $L(\mathbf{x}) \le \theta \le U(\mathbf{x})$ is made. The random interval $[L(\mathbf{X}), U(\mathbf{X})]$ is called an interval estimator.

Example 9.1.2 (Interval estimator): Let $X_1, \ldots, X_4$ be a random sample from $n(\mu, 1)$. A possible interval estimator for $\mu$ is $[\bar{X} - 1, \bar{X} + 1]$. This means that we assert that (the true) $\mu$ is in this interval.

Example 9.1.3 (Continuation of Example 9.1.2): Note that in this case $P(\bar{X} = \mu) = 0$ (why?). Now consider the interval estimator $[\bar{X} - 1, \bar{X} + 1]$. Find $P(\mu \in [\bar{X} - 1, \bar{X} + 1])$.

Definition 9.1.4: For an interval estimator $[L(\mathbf{X}), U(\mathbf{X})]$ of a parameter $\theta$, the coverage probability of $[L(\mathbf{X}), U(\mathbf{X})]$ is the probability that the random interval $[L(\mathbf{X}), U(\mathbf{X})]$ covers the true parameter $\theta$. In symbols, it is denoted by either $P_\theta(\theta \in [L(\mathbf{X}), U(\mathbf{X})])$ or $P(\theta \in [L(\mathbf{X}), U(\mathbf{X})] \mid \theta)$.

Definition 9.1.5: For an interval estimator $[L(\mathbf{X}), U(\mathbf{X})]$ of a parameter $\theta$, the confidence coefficient of $[L(\mathbf{X}), U(\mathbf{X})]$ is the infimum of the coverage probabilities, $\inf_\theta P_\theta(\theta \in [L(\mathbf{X}), U(\mathbf{X})])$.

Notes:
- Interval estimators are random quantities, not parameters, so the probability in $P_\theta(\theta \in [L(\mathbf{X}), U(\mathbf{X})])$ is not a statement about the probability of $\theta$ but about the probability behavior of the functions of $\mathbf{X}$.
- Interval estimators together with a measure of confidence are usually referred to as confidence intervals.
- In general, we will work with confidence sets rather than simple intervals when no closed-form interval is available.
- A confidence set with confidence coefficient equal to $1 - \alpha$ is called a $1 - \alpha$ confidence set.

Example 9.1.6 (Scale uniform interval estimator): Let $X_1, \ldots, X_n$ be a random sample from uniform$(0, \theta)$. Let $Y = X_{(n)} = \max(X_1, \ldots, X_n)$. Consider the following interval estimators for $\theta$:
Candidate 1: $[aY, bY]$, $1 \le a < b$.
Candidate 2: $[Y + c, Y + d]$, $0 \le c < d$,
where $a, b, c, d$ are specified constants. Find the coverage probabilities and confidence coefficients of each interval estimator.
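A minimal simulation sketch of Example 9.1.6 follows (not from the original notes); it assumes numpy is available, and the choices of $n$, $a$, $b$, $c$, $d$, the $\theta$ grid, and the replication count are arbitrary.

```python
import numpy as np

# Minimal sketch for Example 9.1.6: coverage of two interval estimators for theta
# based on Y = X_(n) from a uniform(0, theta) sample.
rng = np.random.default_rng(6)
n, n_sim = 5, 200_000
a, b = 1.0, 1.5          # candidate 1: [aY, bY]
c, d = 0.2, 0.8          # candidate 2: [Y + c, Y + d]

for theta in (1.0, 2.0, 10.0):
    y = theta * rng.uniform(size=(n_sim, n)).max(axis=1)   # Y = X_(n)
    cover1 = np.mean((a * y <= theta) & (theta <= b * y))
    cover2 = np.mean((y + c <= theta) & (theta <= y + d))
    print(f"theta={theta:5.1f}: coverage [aY,bY] ~ {cover1:.3f}, "
          f"coverage [Y+c,Y+d] ~ {cover2:.3f}")
# Candidate 1's coverage, (1/a)^n - (1/b)^n, is the same for every theta, while
# candidate 2's coverage changes with theta (its infimum over theta is 0).
```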
Section 9.2 – Methods of Finding Interval Estimators

Section 9.2.1 – Inverting a Test Statistic

Example 9.2.1 (Inverting a normal test): Let $X_1, \ldots, X_n$ be iid $n(\mu, \sigma^2)$ and consider testing $H_0: \mu = \mu_0$ versus $H_1: \mu \ne \mu_0$. For a fixed level $\alpha$, a most powerful unbiased test rejects $H_0$ when $\{ \mathbf{x} : |\bar{x} - \mu_0| > z_{\alpha/2}\, \sigma/\sqrt{n} \}$ and accepts $H_0: \mu = \mu_0$ when $\{ \mathbf{x} : |\bar{x} - \mu_0| \le z_{\alpha/2}\, \sigma/\sqrt{n} \}$, that is, when
$$ \bar{x} - z_{\alpha/2}\, \sigma/\sqrt{n} \le \mu_0 \le \bar{x} + z_{\alpha/2}\, \sigma/\sqrt{n}. $$
Note that
$$ P(\bar{X} - z_{\alpha/2}\, \sigma/\sqrt{n} \le \mu_0 \le \bar{X} + z_{\alpha/2}\, \sigma/\sqrt{n} \mid \mu = \mu_0) = 1 - \alpha \quad \text{(why?)} $$
for every $\mu_0$, so that
$$ P_\mu(\bar{X} - z_{\alpha/2}\, \sigma/\sqrt{n} \le \mu \le \bar{X} + z_{\alpha/2}\, \sigma/\sqrt{n}) = 1 - \alpha \quad \text{(why?)}. $$
Therefore, a $1 - \alpha$ confidence interval for $\mu$ is given by $[\bar{x} - z_{\alpha/2}\, \sigma/\sqrt{n},\ \bar{x} + z_{\alpha/2}\, \sigma/\sqrt{n}]$.

Note (Correspondence between tests and confidence sets): The acceptance region of the hypothesis test is
$$ A(\mu_0) = \{ \mathbf{x} : \mu_0 - z_{\alpha/2}\, \sigma/\sqrt{n} \le \bar{x} \le \mu_0 + z_{\alpha/2}\, \sigma/\sqrt{n} \}, $$
and the confidence interval is given by
$$ C(\mathbf{x}) = \{ \mu : \bar{x} - z_{\alpha/2}\, \sigma/\sqrt{n} \le \mu \le \bar{x} + z_{\alpha/2}\, \sigma/\sqrt{n} \}. $$
Therefore, $\mathbf{x} \in A(\mu_0) \iff \mu_0 \in C(\mathbf{x})$.

Theorem 9.2.2: For each $\theta_0 \in \Theta$, let $A(\theta_0)$ be the acceptance region of a level $\alpha$ test of $H_0: \theta = \theta_0$. For each $\mathbf{x}$, define a set $C(\mathbf{x})$ in the parameter space by
$$ C(\mathbf{x}) = \{ \theta_0 : \mathbf{x} \in A(\theta_0) \}. $$
Then the random set $C(\mathbf{X})$ is a $1 - \alpha$ confidence set. Conversely, let $C(\mathbf{x})$ be a $1 - \alpha$ confidence set. For any $\theta_0 \in \Theta$, define
$$ A(\theta_0) = \{ \mathbf{x} : \theta_0 \in C(\mathbf{x}) \}. $$
Then $A(\theta_0)$ is the acceptance region of a level $\alpha$ test of $H_0: \theta = \theta_0$.

Notes:
- All of the techniques we have for obtaining tests can immediately be applied to constructing confidence sets.
- In most cases, one-sided tests give one-sided intervals, two-sided tests give two-sided intervals, and strangely shaped acceptance regions give strangely shaped confidence sets.
- The properties of the inverted test also carry over (suitably modified) to the confidence set. Since we can confine attention to sufficient statistics when looking for a good test, we can confine attention to sufficient statistics when looking for "good" confidence sets.

Example 9.2.4 (Normal one-sided confidence bound): Let $X_1, \ldots, X_n$ be a random sample from $n(\mu, \sigma^2)$. We construct a one-sided $1 - \alpha$ confidence interval by inverting the test of $H_0: \mu = \mu_0$ versus $H_1: \mu > \mu_0$. Recall that the size $\alpha$ LRT of $H_0$ versus $H_1$ rejects $H_0$ if
$$ \frac{\bar{X} - \mu_0}{S/\sqrt{n}} > t_{n-1, \alpha}, $$
so the acceptance region is
$$ A(\mu_0) = \{ \mathbf{x} : \bar{x} \le \mu_0 + t_{n-1, \alpha}\,(s/\sqrt{n}) \}. $$
The resulting $1 - \alpha$ one-sided confidence interval is
$$ C(\mathbf{x}) = \{ \mu_0 : \mathbf{x} \in A(\mu_0) \} = \Big\{ \mu_0 : \mu_0 \ge \bar{x} - t_{n-1, \alpha} \frac{s}{\sqrt{n}} \Big\}. $$

Section 9.2.2 – Pivotal Quantities

Definition 9.2.6: A random variable $Q(\mathbf{X}, \theta)$ is a pivotal quantity (or pivot) if the distribution of $Q(\mathbf{X}, \theta)$ is independent of all parameters. That is, if $\mathbf{X} \sim F(\mathbf{x} \mid \theta)$, then $Q(\mathbf{X}, \theta)$ has the same distribution for all values of $\theta$.

Example 9.2.7 (Location-scale pivots):
- Pivot for a location family with pdf $f(x - \mu)$: $\bar{X} - \mu$.
- Pivot for a scale family with pdf $\frac{1}{\sigma} f(x/\sigma)$: $\bar{X}/\sigma$.
- Pivot for a location-scale family with pdf $\frac{1}{\sigma} f\big( \frac{x - \mu}{\sigma} \big)$ (where $\sigma$ is a nuisance parameter): $\frac{\bar{X} - \mu}{S_X}$.

Example 9.2.8 (Gamma pivot): Let $X_1, \ldots, X_n$ be iid exponential$(\beta)$. Then $T = \sum_{i=1}^n X_i$ is sufficient for $\beta$ and $T \sim$ gamma$(n, \beta)$ (which is a scale family). Hence a pivot that may be used is $Q_1(T, \beta) = T/\beta \sim$ gamma$(n, 1)$, or $Q_2(T, \beta) = 2T/\beta \sim$ gamma$(n, 2) = \chi^2_{2n}$.
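The short sketch below (not from the original notes) turns the chi-square pivot of Example 9.2.8 into an interval estimate; it assumes numpy and scipy are available, and the simulated data, $\beta$, $n$, and $\alpha$ are arbitrary choices.

```python
import numpy as np
from scipy.stats import chi2

# Minimal sketch for Example 9.2.8: interval estimate for the exponential scale beta
# using the pivot 2T/beta ~ chi-square with 2n degrees of freedom.
rng = np.random.default_rng(7)
beta_true, n, alpha = 2.0, 15, 0.05
x = rng.exponential(beta_true, size=n)
t = x.sum()

a = chi2.ppf(alpha / 2, df=2 * n)      # lower chi-square quantile
b = chi2.ppf(1 - alpha / 2, df=2 * n)  # upper chi-square quantile
# a <= 2T/beta <= b  is equivalent to  2T/b <= beta <= 2T/a
print("95% interval for beta: [", round(2 * t / b, 3), ",", round(2 * t / a, 3), "]")
```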
Given a pivot $Q(\mathbf{X}, \theta)$, we find numbers $a$ and $b$ such that $P_\theta(a \le Q(\mathbf{X}, \theta) \le b) = 1 - \alpha$. The acceptance region for a level $\alpha$ test of $H_0: \theta = \theta_0$ is then
$$ A(\theta_0) = \{ \mathbf{x} : a \le Q(\mathbf{x}, \theta_0) \le b \}. $$
By Theorem 9.2.2, inverting this test gives the $1 - \alpha$ confidence set
$$ C(\mathbf{x}) = \{ \theta : a \le Q(\mathbf{x}, \theta) \le b \}. $$
If $\theta$ is real-valued and, for each $\mathbf{x}$, $Q(\mathbf{x}, \theta)$ is a monotone function of $\theta$, then $C(\mathbf{x})$ is an interval. If $Q(\mathbf{x}, \theta)$ is an increasing function of $\theta$, then $C(\mathbf{x})$ has the form $L(\mathbf{x}, a) \le \theta \le U(\mathbf{x}, b)$. If $Q(\mathbf{x}, \theta)$ is a decreasing function of $\theta$ (which is typical), then $C(\mathbf{x})$ has the form $L(\mathbf{x}, b) \le \theta \le U(\mathbf{x}, a)$.

Example 9.2.9 (Continuation of Example 9.2.8): Recall that the pivot for $\beta$ is $Q(T, \beta) = 2T/\beta \sim \chi^2_{2n}$. Choose $a$ and $b$ such that $P(a \le 2T/\beta \le b) = 1 - \alpha$.

Example 9.2.10 (Normal pivotal interval):
(1) Consider a normal population with $\sigma^2$ known. We want to find a confidence interval for $\mu$. Find a pivotal quantity and construct a confidence interval for $\mu$ based on this pivot.
(2) Now suppose $\sigma^2$ is also unknown and we want a confidence interval for $\sigma^2$. Find a pivotal quantity and construct a confidence interval for $\sigma^2$ based on this pivot.

Section 9.3 – Methods of Evaluating Interval Estimators (not required for the final exam)

Chapter 5 – Properties of a Random Sample

Section 5.5 – Convergence Concepts

Definition 5.5.1: A sequence of random variables, $X_1, X_2, \ldots$, converges in probability to a random variable $X$ if, for every $\epsilon > 0$,
$$ \lim_{n \to \infty} P(|X_n - X| \ge \epsilon) = 0, \quad \text{or equivalently,} \quad \lim_{n \to \infty} P(|X_n - X| < \epsilon) = 1. $$

Theorem 5.5.2 (Weak Law of Large Numbers): Let $X_1, X_2, \ldots$ be iid random variables with $E X_i = \mu$ and $\mathrm{Var}\,X_i = \sigma^2 < \infty$. Define $\bar{X}_n = \frac{1}{n} \sum_{i=1}^n X_i$. Then for every $\epsilon > 0$,
$$ \lim_{n \to \infty} P(|\bar{X}_n - \mu| < \epsilon) = 1. $$

Proof: Use Chebychev's Inequality.

Example 5.5.3 (Consistency of $S^2$ and $S$): Let $X_1, X_2, \ldots$ be iid random variables with $E X_i = \mu$ and $\mathrm{Var}\,X_i = \sigma^2$, and define $S_n^2 = \frac{1}{n-1} \sum_{i=1}^n (X_i - \bar{X}_n)^2$. Can we prove a WLLN for $S_n^2$? Using Chebychev's Inequality, we have
$$ P(|S_n^2 - \sigma^2| \ge \epsilon) \le \frac{E(S_n^2 - \sigma^2)^2}{\epsilon^2} = \frac{\mathrm{Var}(S_n^2)}{\epsilon^2}, $$
and thus a sufficient condition for $S_n^2$ to converge in probability to $\sigma^2$ is that $\mathrm{Var}(S_n^2) \to 0$ as $n \to \infty$.

Theorem 5.5.4: Suppose that $X_1, X_2, \ldots$ converges in probability to a random variable $X$ and that $h$ is a continuous function. Then $h(X_1), h(X_2), \ldots$ converges in probability to $h(X)$.

Definition 5.5.10: A sequence of random variables, $X_1, X_2, \ldots$, converges in distribution to a random variable $X$ if $\lim_{n \to \infty} F_{X_n}(x) = F_X(x)$ at all points $x$ where $F_X(x)$ is continuous.

Theorem 5.5.12: If the sequence of random variables $X_1, X_2, \ldots$ converges in probability to $X$, then the sequence also converges in distribution to $X$.

Theorem 5.5.13: The sequence of random variables $X_1, X_2, \ldots$ converges in probability to a constant $\mu$ if and only if the sequence also converges in distribution to $\mu$. That is, the statement
$$ \lim_{n \to \infty} P(|X_n - \mu| > \epsilon) = 0 \quad \text{for every } \epsilon > 0 $$
is equivalent to
$$ \lim_{n \to \infty} P(X_n \le x) = \begin{cases} 0, & x < \mu; \\ 1, & x > \mu. \end{cases} $$

Theorem 5.5.15 (Stronger form of the Central Limit Theorem): Let $X_1, X_2, \ldots$ be a sequence of iid random variables with $E X_i = \mu$ and $0 < \mathrm{Var}\,X_i = \sigma^2 < \infty$. Define $\bar{X}_n = \frac{1}{n} \sum_{i=1}^n X_i$, and let $G_n(x)$ denote the cdf of $\sqrt{n}(\bar{X}_n - \mu)/\sigma$. Then, for any $x$, $-\infty < x < \infty$,
$$ \lim_{n \to \infty} G_n(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}} \exp\Big( -\frac{y^2}{2} \Big) dy. $$
That is, $\sqrt{n}(\bar{X}_n - \mu)/\sigma$ has a limiting standard normal distribution.
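The following is a minimal simulation sketch of Theorem 5.5.15 (not from the original notes); it assumes numpy and scipy are available, and the exponential population, sample sizes, evaluation points, and replication count are arbitrary.

```python
import numpy as np
from scipy.stats import norm

# Minimal sketch of the CLT: the cdf of sqrt(n)(Xbar_n - mu)/sigma approaches the
# standard normal cdf as n grows.
rng = np.random.default_rng(9)
mu = sigma = 1.0                     # exponential(1): mean 1, sd 1
n_sim, points = 50_000, (-1.0, 0.0, 1.0)

for n in (5, 30, 200):
    z = np.sqrt(n) * (rng.exponential(mu, size=(n_sim, n)).mean(axis=1) - mu) / sigma
    approx = [round(np.mean(z <= x), 3) for x in points]
    exact = [round(norm.cdf(x), 3) for x in points]
    print(f"n={n:4d}: G_n at {points} ~ {approx}  (Phi: {exact})")
```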
Some notes on the CLT:
- Assumptions: independence, identical distributions, and existence of the mean and variance.
- A finite variance is necessary for convergence to normality (the CLT does not apply to random variables from a Cauchy distribution).
- How good the approximation is in general depends on the original distribution.

Example (Normal approximation to the binomial): Suppose $X_1, X_2, \ldots, X_n$ are a random sample from a Bernoulli$(p)$ distribution. We know that $E X_1 = p$ and $\mathrm{Var}(X_1) = p(1-p)$. The Central Limit Theorem tells us that
$$ \frac{\sqrt{n}(\bar{X}_n - p)}{\sqrt{p(1-p)}} $$
is approximately $n(0, 1)$. Some comparisons between the exact and approximate calculations of $P(\bar{X}_n \le 0.7)$ when $p = 0.6$ are given in the following table:

                                  n = 20   n = 40   n = 60   n = 80   n = 100  n = 120
  P(X̄_n ≤ 0.7), exact            0.8744   0.9291   0.9587   0.9755   0.9852   0.9910
  P(X̄_n ≤ 0.7), approximate      0.8193   0.9016   0.9430   0.9661   0.9794   0.9873
  Difference                      0.0551   0.0274   0.0156   0.0094   0.0058   0.0037

Theorem 5.5.17 (Slutsky's Theorem): If $X_n \to X$ in distribution and $Y_n \to a$, a constant, in probability, then
(a) $Y_n X_n \to aX$ in distribution;
(b) $X_n + Y_n \to X + a$ in distribution.

Example 5.5.18 (Normal approximation with estimated variance): Suppose that
$$ \frac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma} \to n(0, 1), $$
but the value of $\sigma$ is unknown. If we can prove that $S_n^2 \to \sigma^2$ in probability, then by Exercise 5.32 we have $\sigma/S_n \to 1$ in probability. By Theorem 5.5.17,
$$ \frac{\sqrt{n}(\bar{X}_n - \mu)}{S_n} = \frac{\sigma}{S_n} \cdot \frac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma} \to n(0, 1). $$

Notes (relationships between the modes of convergence):
1. Convergence in probability implies convergence in distribution.
2. Convergence in probability to a constant is equivalent to convergence in distribution to that constant.
3. Slutsky's Theorem combines the two modes of convergence.

Example 5.5.19 (Estimating the odds): Suppose that $X_1, X_2, \ldots, X_n$ are iid Bernoulli$(p)$ random variables. The typical parameter of interest is $p$, which can be estimated by $\bar{X}_n = \frac{1}{n} \sum_{i=1}^n X_i$. We can obtain the distribution of $n\bar{X}_n$, which is binomial$(n, p)$. Sometimes we are interested in the odds, $\frac{p}{1-p}$, which may be estimated by $\frac{\bar{X}_n}{1 - \bar{X}_n}$. What are the properties of this estimator? For example, how do we calculate its variance? The exact calculation may be difficult, but an approximation can be obtained.

For statistical applications of Taylor's Theorem, we are most often concerned with the first-order Taylor series. Let $T$ be a random variable with mean $\theta$, and suppose that $g$ is a differentiable function; then
$$ g(t) \approx g(\theta) + g'(\theta)(t - \theta). $$
Then we have
$$ E(g(T)) \approx E(g(\theta)) + g'(\theta)\, E(T - \theta) = g(\theta), $$
and
$$ \mathrm{Var}(g(T)) \approx E[g(T) - g(\theta)]^2 \approx E[g'(\theta)(T - \theta)]^2 = [g'(\theta)]^2\, \mathrm{Var}(T). $$

Theorem 5.5.24 (Delta Method): Let $Y_n$ be a sequence of random variables that satisfies $\sqrt{n}(Y_n - \theta) \to n(0, \sigma^2)$ in distribution. For a given function $g$ and a specific value of $\theta$, suppose that $g'(\theta)$ exists and is not 0. Then
$$ \sqrt{n}[g(Y_n) - g(\theta)] \to n(0, \sigma^2 [g'(\theta)]^2) \quad \text{in distribution.} $$

Example 5.5.22 (Continuation of Example 5.5.19): Recall that we are interested in the properties of $\frac{\bar{X}_n}{1 - \bar{X}_n}$. Let $g(t) = \frac{t}{1-t}$, and note that $E(\bar{X}_n) = p$ and $g'(t) = \frac{1}{(1-t)^2}$; thus
$$ E(g(\bar{X}_n)) \approx g(p) = \frac{p}{1-p}, $$
and
$$ \mathrm{Var}(g(\bar{X}_n)) \approx [g'(p)]^2\, \mathrm{Var}(\bar{X}_n) = \Big[ \frac{1}{(1-p)^2} \Big]^2 \frac{p(1-p)}{n} = \frac{p}{n(1-p)^3}. $$
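A minimal simulation sketch of the delta-method approximation in Example 5.5.22 follows (not from the original notes); it assumes numpy is available, and $p$, $n$, and the replication count are arbitrary choices.

```python
import numpy as np

# Minimal sketch for Example 5.5.22: compare the empirical variance of the estimated
# odds Xbar/(1 - Xbar) with the delta-method approximation p / (n*(1-p)^3).
rng = np.random.default_rng(10)
p, n, n_sim = 0.3, 200, 100_000

xbar = rng.binomial(n, p, size=n_sim) / n
odds_hat = xbar / (1 - xbar)

print("empirical variance  ~", round(odds_hat.var(), 6))
print("delta-method approx =", round(p / (n * (1 - p) ** 3), 6))
print("empirical mean ~", round(odds_hat.mean(), 4), " true odds =", round(p / (1 - p), 4))
```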
Chapter 10 – Asymptotic Evaluations

Section 10.1 – Point Estimation

Section 10.1.1 – Consistency

Definition 10.1.1: A sequence of estimators $W_n = W_n(X_1, \ldots, X_n)$ is a consistent sequence of estimators of the parameter $\theta$ if, for every $\epsilon > 0$ and every $\theta \in \Theta$,
$$ \lim_{n \to \infty} P_\theta(|W_n - \theta| < \epsilon) = 1, \quad \text{or} \quad \lim_{n \to \infty} P_\theta(|W_n - \theta| \ge \epsilon) = 0. $$
Recall from Chapter 5 that this says $W_n$ converges in probability to $\theta$.

Also, recall an application of Chebychev's Inequality:
$$ P_\theta(|W_n - \theta| \ge \epsilon) \le \frac{E_\theta[(W_n - \theta)^2]}{\epsilon^2} = \frac{\mathrm{Var}_\theta W_n + (\mathrm{Bias}_\theta W_n)^2}{\epsilon^2}. $$

Theorem 10.1.3: If $W_n$ is a sequence of estimators of a parameter $\theta$ satisfying
(i) $\lim_{n \to \infty} \mathrm{Var}_\theta W_n = 0$,
(ii) $\lim_{n \to \infty} \mathrm{Bias}_\theta W_n = 0$,
for every $\theta \in \Theta$, then $W_n$ is a consistent sequence of estimators of $\theta$.

Example 10.1.2 (Consistency of $\bar{X}$): Let $X_1, \ldots, X_n$ be a random sample from $n(\theta, 1)$, and consider the consistency of $\bar{X}_n = \frac{1}{n} \sum_{i=1}^n X_i$.

Example (Consistency of $S^2$): Let $X_1, \ldots, X_n$ be a random sample from $n(\mu, \sigma^2)$. Consider the estimators of $\sigma^2$: $S_n^2$ and $\hat{\sigma}_n^2 = \frac{n-1}{n} S_n^2$ (the MLE).

Theorem 10.1.6 (Consistency of MLEs): Let $X_1, \ldots, X_n$ be a random sample from $f(x \mid \theta)$, and let $L(\theta \mid \mathbf{x}) = \prod_{i=1}^n f(x_i \mid \theta)$ be the likelihood function. Let $\hat{\theta}$ denote the MLE of $\theta$, and let $\tau(\theta)$ be a continuous function of $\theta$. Under the regularity conditions in Miscellanea 10.6.2 on $f(x \mid \theta)$ and, hence, on $L(\theta \mid \mathbf{x})$, for every $\epsilon > 0$ and every $\theta \in \Theta$,
$$ \lim_{n \to \infty} P_\theta(|\tau(\hat{\theta}) - \tau(\theta)| \ge \epsilon) = 0. $$
That is, $\tau(\hat{\theta})$ is a consistent estimator of $\tau(\theta)$.

Section 10.1.2 – Efficiency

Definition 10.1.11: A sequence of estimators $W_n$ is asymptotically efficient for a parameter $\tau(\theta)$ if $\sqrt{n}(W_n - \tau(\theta)) \to n(0, v(\theta))$ in distribution and
$$ v(\theta) = \frac{[\tau'(\theta)]^2}{E_\theta\Big( \big( \frac{\partial}{\partial \theta} \log f(X \mid \theta) \big)^2 \Big)}; $$
that is, the asymptotic variance of $W_n$ achieves the Cramér–Rao Lower Bound.

Theorem 10.1.12 (Consistency and asymptotic efficiency of MLEs): Let $X_1, \ldots, X_n$ be iid $f(x \mid \theta)$, let $\hat{\theta}$ denote the MLE of $\theta$, and let $\tau(\theta)$ be a continuous function of $\theta$. Under the regularity conditions in Miscellanea 10.6.2 (p. 516) on $f(x \mid \theta)$ and, hence, on $L(\theta \mid \mathbf{x})$,
$$ \sqrt{n}[\tau(\hat{\theta}) - \tau(\theta)] \to n(0, v(\theta)), $$
where $v(\theta)$ is the Cramér–Rao Lower Bound. That is, $\tau(\hat{\theta})$ is a consistent and asymptotically efficient estimator of $\tau(\theta)$.

In other words, under the conditions of Theorem 10.1.12:
- $\tau(\hat{\theta})$ is a consistent estimator of $\tau(\theta)$.
- $\tau(\hat{\theta})$ has an asymptotic normal distribution whose asymptotic variance equals the Cramér–Rao Lower Bound. Therefore,
$$ \frac{\sqrt{n}[\tau(\hat{\theta}) - \tau(\theta)]}{\sqrt{v(\theta)}} \to n(0, 1) \quad \text{in distribution.} $$

Notes: Most of the common distributions (for instance, the regular exponential families) satisfy the conditions of Theorem 10.1.12. However, when the support depends on the parameter $\theta$, Theorem 10.1.12 is not applicable.

Section 10.1.3 – Calculations and Comparisons

From the Delta Method and the asymptotic efficiency of MLEs, the approximate variance of $h(\hat{\theta})$ is
$$ \mathrm{Var}(h(\hat{\theta}) \mid \theta) \approx \frac{[h'(\theta)]^2}{I_n(\theta)}, \quad \text{where } I_n(\theta) = E_\theta\Big( -\frac{\partial^2}{\partial \theta^2} \log L(\theta \mid \mathbf{X}) \Big) $$
is the expected information number. This variance can be estimated by evaluating everything at the MLE:
$$ \widehat{\mathrm{Var}}(h(\hat{\theta})) = \frac{[h'(\theta)]^2 \big|_{\theta = \hat{\theta}}}{-\frac{\partial^2}{\partial \theta^2} \log L(\theta \mid \mathbf{X}) \big|_{\theta = \hat{\theta}}} \quad \text{or} \quad \frac{[h'(\theta)]^2 \big|_{\theta = \hat{\theta}}}{E_\theta\Big( -\frac{\partial^2}{\partial \theta^2} \log L(\theta \mid \mathbf{X}) \Big) \Big|_{\theta = \hat{\theta}}}. $$

Example 10.1.14 (Approximate binomial variance): In Example 7.2.7 we saw that $\hat{p} = \bar{X}$ is the MLE of $p$, where $X_1, \ldots, X_n$ are iid Bernoulli$(p)$. By direct calculation, $\mathrm{Var}_p(\hat{p}) = \frac{p(1-p)}{n}$, so a direct variance estimate is $\widehat{\mathrm{Var}}_{\hat{p}}(\hat{p}) = \frac{\hat{p}(1-\hat{p})}{n}$. We can obtain the same estimate by calculating the expected information number: writing $\log L(p \mid \mathbf{X}) = n\hat{p}\log p + n(1-\hat{p})\log(1-p)$,
$$ E_p\Big( -\frac{\partial^2}{\partial p^2} \log L(p \mid \mathbf{X}) \Big) = \frac{np}{p^2} + \frac{n(1-p)}{(1-p)^2} = \frac{n}{p(1-p)}, $$
which, evaluated at $p = \hat{p}$, gives $n/[\hat{p}(1-\hat{p})]$ and hence the same estimate $\hat{p}(1-\hat{p})/n$.

We can also calculate the approximate variance of $\frac{\hat{p}}{1-\hat{p}}$ using the Delta Method:
$$ \sqrt{n}(\hat{p} - p) \to n(0, p(1-p)) \quad \text{in distribution}, \qquad \frac{d}{dp}\Big( \frac{p}{1-p} \Big) = \frac{1}{(1-p)^2}, $$
so
$$ \sqrt{n}\Big( \frac{\hat{p}}{1-\hat{p}} - \frac{p}{1-p} \Big) \to n\Big( 0,\ p(1-p)\Big[ \frac{1}{(1-p)^2} \Big]^2 \Big) = n\Big( 0, \frac{p}{(1-p)^3} \Big). $$
So
$$ \widehat{\mathrm{Var}}_{\hat{p}}\Big( \frac{\hat{p}}{1-\hat{p}} \Big) = \frac{\hat{p}}{n(1-\hat{p})^3}. $$
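The short sketch below (not from the original notes) carries out the two variance estimates of Example 10.1.14 on a single simulated Bernoulli sample; it assumes numpy is available, and $p$ and $n$ are arbitrary choices.

```python
import numpy as np

# Minimal sketch for Example 10.1.14: estimate Var(p_hat) via the information number
# evaluated at the MLE, and Var(p_hat/(1 - p_hat)) via the delta method.
rng = np.random.default_rng(11)
p_true, n = 0.4, 500
x = rng.binomial(1, p_true, size=n)

p_hat = x.mean()                                   # MLE of p
info = n / (p_hat * (1 - p_hat))                   # information number at p_hat
var_p_hat = 1 / info                               # = p_hat*(1 - p_hat)/n
var_odds = p_hat / (n * (1 - p_hat) ** 3)          # delta-method variance of the odds

print("p_hat =", round(p_hat, 4))
print("estimated Var(p_hat)           =", round(var_p_hat, 6))
print("estimated Var(p_hat/(1-p_hat)) =", round(var_odds, 6))
```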
Section 10.3 – Hypothesis Testing

Section 10.3.1 – Asymptotic Distribution of LRTs

Theorem 10.3.1 (Asymptotic distribution of the LRT, simple $H_0$): For testing $H_0: \theta = \theta_0$ versus $H_1: \theta \ne \theta_0$, suppose $X_1, \ldots, X_n$ are iid $f(x \mid \theta)$, $\hat{\theta}$ is the MLE of $\theta$, and $f(x \mid \theta)$ satisfies the regularity conditions in Miscellanea 10.6.2. Then under $H_0$, as $n \to \infty$,
$$ -2 \log \lambda(\mathbf{x}) \to \chi^2_1 \quad \text{in distribution,} $$
where $\chi^2_1$ is a chi-squared random variable with 1 degree of freedom. Therefore, an approximate level $\alpha$ test of $H_0: \theta = \theta_0$ versus $H_1: \theta \ne \theta_0$ rejects $H_0$ when $-2 \log \lambda(\mathbf{x}) \ge \chi^2_{1, \alpha}$.

Example 10.3.2 (Poisson LRT): For testing $H_0: \lambda = \lambda_0$ versus $H_1: \lambda \ne \lambda_0$ based on $X_1, \ldots, X_n$ iid Poisson$(\lambda)$, we have
$$ -2 \log \lambda(\mathbf{x}) = -2 \log\Big( \frac{\exp(-n\lambda_0)\, \lambda_0^{n\bar{x}}}{\exp(-n\hat{\lambda})\, \hat{\lambda}^{n\bar{x}}} \Big) = 2n\big[ (\lambda_0 - \hat{\lambda}) - \hat{\lambda} \log(\lambda_0/\hat{\lambda}) \big], $$
where $\hat{\lambda} = \bar{x}$ is the MLE of $\lambda$.

Section 10.3.2 – Other Large-Sample Tests

Definition: A Wald test is a test based on a statistic of the form
$$ Z_n = \frac{W_n - \theta_0}{S_n}, $$
where $\theta_0$ is a hypothesized value of the parameter $\theta$, $W_n$ is an estimator of $\theta$, and $S_n$ is a standard error for $W_n$, that is, an estimator of the standard deviation of $W_n$.

Application of Theorem 10.1.12: Let $\hat{v}(\theta)$ be a consistent estimator of $v(\theta)$. Then
$$ \frac{\sqrt{n}[\tau(\hat{\theta}) - \tau(\theta)]}{\sqrt{\hat{v}(\theta)}} \to n(0, 1), \tag{1} $$
so the Wald statistic is asymptotically standard normal by Slutsky's Theorem.

Notes:
- We can use the Wald statistic as a test statistic for constructing approximate (asymptotic, large-sample) tests about $\tau(\theta)$; the resulting test is known as the Wald test.
- Inverting the Wald test gives an approximate (asymptotic, large-sample) confidence interval for $\tau(\theta)$. This is equivalent to treating equation (1) as a pivotal quantity.
- Inference procedures based on the Wald statistic do not perform very well in small samples.
- Estimating the variance using the Cramér–Rao Lower Bound will usually result in an underestimate of the true variance.

Example 10.3.5 (Large-sample binomial tests): Let $X_1, \ldots, X_n$ be iid Bernoulli$(p)$. Consider $H_0: p \le p_0$ versus $H_1: p > p_0$, where $0 < p_0 < 1$ is a specified value. Consider the Wald test: since
$$ \frac{\hat{p}_n - p}{\sqrt{\hat{p}_n(1-\hat{p}_n)/n}} \to n(0, 1), $$
when $p = p_0$ the statistic $Z_n = \frac{\hat{p}_n - p_0}{\sqrt{\hat{p}_n(1-\hat{p}_n)/n}}$ is approximately $n(0, 1)$, and we reject $H_0$ if $Z_n > z_\alpha$. The same statistic $Z_n$ is obtained if we use the information number to derive a standard error for $\hat{p}_n$.

If we are interested in $H_0: p = p_0$ versus $H_1: p \ne p_0$, we know that
$$ \frac{\hat{p}_n - p}{\sqrt{p(1-p)/n}} \to n(0, 1), $$
so we can use $Z_n' = \frac{\hat{p}_n - p_0}{\sqrt{p_0(1-p_0)/n}}$, approximately $n(0, 1)$, as a test statistic; we reject $H_0$ if $|Z_n'| > z_{\alpha/2}$. We can also use the statistic $Z_n = \frac{\hat{p}_n - p_0}{\sqrt{\hat{p}_n(1-\hat{p}_n)/n}}$ in this situation.

Section 10.4 – Interval Estimation

Section 10.4.1 – MLE-Based Method

If $X_1, \ldots, X_n$ are from $f(x \mid \theta)$ and $\hat{\theta}$ is the MLE of $\theta$, then
$$ \widehat{\mathrm{Var}}(h(\hat{\theta}) \mid \theta) = \frac{[h'(\theta)]^2 \big|_{\theta=\hat{\theta}}}{-\frac{\partial^2}{\partial \theta^2} \log L(\theta \mid \mathbf{X}) \big|_{\theta=\hat{\theta}}} \quad \text{or} \quad \widehat{\mathrm{Var}}(h(\hat{\theta}) \mid \theta) = \frac{[h'(\theta)]^2 \big|_{\theta=\hat{\theta}}}{E\Big\{ -\frac{\partial^2}{\partial \theta^2} \log L(\theta \mid \mathbf{X}) \Big\} \Big|_{\theta=\hat{\theta}}}. $$
Then
$$ \frac{h(\hat{\theta}) - h(\theta)}{\sqrt{\widehat{\mathrm{Var}}(h(\hat{\theta}) \mid \theta)}} \to n(0, 1), $$
so the approximate confidence interval is
$$ h(\hat{\theta}) - z_{\alpha/2} \sqrt{\widehat{\mathrm{Var}}(h(\hat{\theta}) \mid \theta)} \le h(\theta) \le h(\hat{\theta}) + z_{\alpha/2} \sqrt{\widehat{\mathrm{Var}}(h(\hat{\theta}) \mid \theta)}. $$

Example 10.4.1 (Confidence interval for the odds): We know that the MLE of the odds $p/(1-p)$ is $\hat{p}/(1-\hat{p})$, and that its approximate variance is
$$ \widehat{\mathrm{Var}}_{\hat{p}}\Big( \frac{\hat{p}}{1-\hat{p}} \Big) = \frac{\hat{p}}{n(1-\hat{p})^3}. $$
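The following minimal sketch (not from the original notes) builds the approximate confidence interval of Example 10.4.1 from a simulated Bernoulli sample; it assumes numpy and scipy are available, and the true $p$, $n$, and $\alpha$ are arbitrary choices.

```python
import numpy as np
from scipy.stats import norm

# Minimal sketch for Example 10.4.1: approximate 1-alpha confidence interval for the
# odds p/(1-p) based on the MLE and its delta-method variance p_hat/(n*(1-p_hat)^3).
rng = np.random.default_rng(12)
p_true, n, alpha = 0.35, 400, 0.05
x = rng.binomial(1, p_true, size=n)

p_hat = x.mean()
odds_hat = p_hat / (1 - p_hat)
se = np.sqrt(p_hat / (n * (1 - p_hat) ** 3))
z = norm.ppf(1 - alpha / 2)

print("estimated odds:", round(odds_hat, 3))
print("approx 95% CI: [", round(odds_hat - z * se, 3), ",", round(odds_hat + z * se, 3), "]")
```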