Chapter 5: Properties of a Random Sample

1 Basic Concepts of a Random Sample

Def: The random variables $X_1, \dots, X_n$ are called a random sample of size $n$ from the population $f(x)$ if $X_1, \dots, X_n$ are mutually independent random variables and the marginal pdf or pmf of each $X_i$ is the same function $f(x)$. They are also called independent and identically distributed (iid) random variables with pdf or pmf $f(x)$. The joint pdf or pmf of the iid random variables $(X_1, \dots, X_n)$ is given by
$$f(x_1, \dots, x_n) = f(x_1) f(x_2) \cdots f(x_n) = \prod_{i=1}^n f(x_i).$$

Example. Assume $X_1, \dots, X_n$ are an i.i.d. sample from $\exp(\beta)$. (1) Find the joint distribution of the sample. (2) Compute $P(X_1 > 2, \dots, X_n > 2)$.

Remarks:
• A sample drawn with replacement from a finite population of size $N$ is iid.
• A sample drawn without replacement from a finite population of size $N$ is identically distributed but not independent.

2 Summary of a Random Sample

Def: A statistic is a real- or vector-valued function of the random sample: $Y = T(X_1, \dots, X_n)$. Statistics provide good summaries of the sample.

Def: The probability distribution of $Y$ is called the sampling distribution of $Y$ (because this distribution is derived from the distributions of the random variables in the random sample).

Examples of commonly used statistics:
sample sum: $T = \sum X_i$
sample mean: $\bar{X} = \frac{1}{n} \sum X_i$
sample variance: $S^2 = \frac{1}{n-1} \sum (X_i - \bar{X})^2$
sample standard deviation: $S = \sqrt{S^2}$
minimum of sample: $T = \min(X_1, \dots, X_n)$
maximum of sample: $T = \max(X_1, \dots, X_n)$
part of sample: $T = (\sum X_i, \sum X_i^2)$

Remark: A statistic cannot be a function of a parameter.

Example. Assume $X_1, \dots, X_n$ are an i.i.d. sample from $\exp(\beta)$. Let $T = \min(X_1, \dots, X_n)$. Find the sampling distribution of $T$.

2.1 Sum and Mean of a Random Sample

Theorem. Let $X_1, \dots, X_n$ be a random sample from a population with mean $\mu$ and variance $\sigma^2$. Then
(1) $E(\sum_{i=1}^n X_i) = n\mu$ and $\mathrm{Var}(\sum_{i=1}^n X_i) = n\sigma^2$;
(2) $E(\bar{X}) = \mu$ and $\mathrm{Var}(\bar{X}) = \sigma^2/n$;
(3) $E(S^2) = \sigma^2$.

Remark: We say $\bar{X}$ is an unbiased estimator of $\mu$, and $S^2$ is an unbiased estimator of $\sigma^2$.

Theorem (distribution of the sum and mean of a random sample). Let $X_1, \dots, X_n$ be iid. Define $Y = X_1 + \cdots + X_n$ and $\bar{X} = \frac{1}{n} Y$. Then
$$M_Y(t) = [M_X(t)]^n, \qquad M_{\bar{X}}(t) = M_Y(t/n) = [M_X(t/n)]^n.$$

Example 1:
$X_i \sim \mathrm{binom}(k, p)$ i.i.d. $\Longrightarrow \sum_{i=1}^n X_i \sim \mathrm{binom}(nk, p)$
$X_i \sim \mathrm{Poi}(\lambda)$ i.i.d. $\Longrightarrow \sum_{i=1}^n X_i \sim \mathrm{Poi}(n\lambda)$
$X_i \sim N(\mu, \sigma^2)$ i.i.d. $\Longrightarrow \sum_{i=1}^n X_i \sim N(n\mu, n\sigma^2)$
$X_i \sim \mathrm{Gamma}(\alpha, \beta)$ i.i.d. $\Longrightarrow \sum_{i=1}^n X_i \sim \mathrm{Gamma}(n\alpha, \beta)$

3 Sampling From the Normal Distribution

3.1 Sample Mean and Sample Variance

Theorem. For a random sample from $N(\mu, \sigma^2)$, let the sample mean be $\bar{X} = \frac{1}{n} \sum X_i$ and the sample variance be $S^2 = \frac{1}{n-1} \sum (X_i - \bar{X})^2$. Then
(a) $\bar{X} \perp S^2$;
(b) $\bar{X} \sim N(\mu, \sigma^2/n)$;
(c) $(n-1) S^2 / \sigma^2 \sim \chi^2_{n-1}$.

Proof:

Theorem. Assume $X_j \sim N(\mu_j, \sigma_j^2)$, $j = 1, \dots, n$, are independent. For constants $a_{ij}$ and $b_{rj}$ ($j = 1, \dots, n$; $i = 1, \dots, k$; $r = 1, \dots, m$), where $k + m \le n$, define
$$U_i = \sum_{j=1}^n a_{ij} X_j, \quad i = 1, \dots, k, \qquad V_r = \sum_{j=1}^n b_{rj} X_j, \quad r = 1, \dots, m.$$
(a) The random variables $U_i$ and $V_r$ are independent if and only if $\mathrm{Cov}(U_i, V_r) = 0$.
(b) $\mathrm{Cov}(U_i, V_r) = \sum_{j=1}^n a_{ij} b_{rj} \sigma_j^2$.
(c) The random vectors $(U_1, \dots, U_k)$ and $(V_1, \dots, V_m)$ are independent if and only if $U_i$ is independent of $V_r$ for every pair $(i, r)$, where $i = 1, \dots, k$ and $r = 1, \dots, m$.

Remark:
(i) Based on (a), if we start with independent normal random variables, covariance and independence are equivalent for linear functions of these random variables. Thus, we can check independence for normal variables by merely checking the covariance term.
Example: Let $X_1, \dots, X_n$ be a random sample from the $N(\mu, \sigma^2)$ distribution. Then $\mathrm{Cov}(\bar{X}, X_j - \bar{X}) = 0$ for all $j = 1, \dots, n$.
(ii) Based on part (c), pairwise independence implies vector independence for linear functions of independent normal random variables. (This is not true in general.)
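The first theorem of this subsection can be illustrated numerically. Below is a minimal Python simulation sketch (the choices $\mu = 2$, $\sigma = 3$, $n = 10$, and the replication count are arbitrary): it estimates the correlation between $\bar{X}$ and $S^2$ across replications and compares the empirical distribution of $(n-1)S^2/\sigma^2$ with the $\chi^2_{n-1}$ cdf. A near-zero correlation is consistent with (a), although zero correlation alone does not prove independence.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    mu, sigma, n, reps = 2.0, 3.0, 10, 100_000  # arbitrary illustration values

    x = rng.normal(mu, sigma, size=(reps, n))
    xbar = x.mean(axis=1)                       # sample mean, one per replication
    s2 = x.var(axis=1, ddof=1)                  # sample variance with the n-1 divisor

    # (a): X-bar and S^2 are independent, so their sample correlation should be near 0
    print(np.corrcoef(xbar, s2)[0, 1])

    # (c): (n-1) S^2 / sigma^2 should follow a chi-square distribution with n-1 df,
    # so the Kolmogorov-Smirnov distance to that cdf should be small
    w = (n - 1) * s2 / sigma**2
    print(stats.kstest(w, stats.chi2(df=n - 1).cdf).statistic)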
3.2 Distributions Derived from the Normal

$\chi^2_p$ distribution: the same as $\mathrm{Gamma}(p/2, 2)$, with pdf
$$f(x) = \frac{1}{\Gamma(p/2)\, 2^{p/2}}\, x^{p/2 - 1} e^{-x/2}, \qquad x > 0.$$
(1) If $X \sim N(0, 1)$, then $X^2 \sim \chi^2_1$.
(2) If $X_1, \dots, X_n$ are independent and $X_i \sim \chi^2_{p_i}$, then $\sum_{i=1}^n X_i \sim \chi^2_{p_1 + \cdots + p_n}$.

Note:

Student's $t$ distribution: If $U \sim N(0, 1)$ and $V \sim \chi^2_p$, and $U$ and $V$ are independent, then $U/\sqrt{V/p}$ has Student's $t_p$ distribution, with pdf
$$f(t) = \frac{\Gamma((p+1)/2)}{\Gamma(p/2)\sqrt{p\pi}}\, (1 + t^2/p)^{-(p+1)/2}, \qquad -\infty < t < \infty.$$
If $T_p$ is a random variable with a $t_p$ distribution, then
(i) $T_p$ has no mgf, as it does not have moments of all orders;
(ii) $T_p$ has only $p - 1$ moments;
(iii) $T_1$ has no mean, and $T_2$ has no variance;
(iv) $E T_p = 0$ for $p > 1$, and $\mathrm{Var}\, T_p = \frac{p}{p-2}$ for $p > 2$.

Example: Let $X_1, \dots, X_n$ be a random sample from $N(\mu, \sigma^2)$. Then
$$T = \frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1}.$$

$F$ distribution: If $U \sim \chi^2_p$ and $V \sim \chi^2_q$, and $U$ and $V$ are independent, then $\frac{U/p}{V/q}$ has Snedecor's $F_{p,q}$ distribution, with pdf
$$f(x) = \frac{\Gamma((p+q)/2)}{\Gamma(p/2)\Gamma(q/2)} \left(\frac{p}{q}\right)^{p/2} \frac{x^{p/2 - 1}}{[1 + (p/q)x]^{(p+q)/2}}, \qquad 0 < x < \infty.$$
(i) $F_{p,q}$ has no mgf, as it does not have moments of all orders;
(ii) $F_{p,q}$ has (finite) moments of order $m$ where $m < q/2$;
(iii) $E F_{p,q} = \frac{q}{q-2}$ if $q > 2$.

Example (variance ratio): Let $X_1, \dots, X_n$ be a random sample from $N(\mu_X, \sigma_X^2)$ and $Y_1, \dots, Y_m$ be a random sample from $N(\mu_Y, \sigma_Y^2)$. Then
$$F = \frac{S_X^2 / \sigma_X^2}{S_Y^2 / \sigma_Y^2} \sim F_{n-1, m-1}.$$

Theorem.
• If $X \sim F_{p,q}$, then $1/X \sim F_{q,p}$.
• If $X \sim t_q$, then $X^2 \sim F_{1,q}$.
• If $X \sim F_{p,q}$, then $\frac{(p/q)X}{1 + (p/q)X} \sim \mathrm{Beta}(p/2, q/2)$.

4 Convergence Concepts

Assume there is a sequence of random variables $X_1, X_2, \dots, X_n, \dots$. In this section, we study how the distribution of $X_n$ converges to some limiting distribution as $n \to \infty$.

4.1 Convergence in Probability

Def: A sequence of random variables $X_1, X_2, \dots, X_n, \dots$ converges in probability to $X$ (written $X_n \to_p X$) if for every $\epsilon > 0$,
$$\lim_{n \to \infty} P(|X_n - X| < \epsilon) = 1,$$
or equivalently,
$$\lim_{n \to \infty} P(|X_n - X| \ge \epsilon) = 0.$$

Remark:
• The sequence $X_1, X_2, \dots$ here is not required to be iid. In many cases, the distribution of $X_n$ changes as $n$ changes.
• The limiting variable $X$ may be a constant or a random variable.

Example: Let $X$ be a random variable having the distribution $\exp(1)$. Define $X_n = (1 + \frac{1}{n}) X$ for $n = 1, 2, \dots$. Show that $X_n \to_p X$.

Weak Law of Large Numbers (WLLN): Let $X_1, \dots, X_n$ be iid with $E(X) = \mu$ and $\mathrm{var}(X) < \infty$. Then
$$\frac{X_1 + \cdots + X_n}{n} \to_p \mu.$$
Proof: Chebychev's Inequality. (A simulation illustration of the WLLN appears at the end of this subsection.)
[The sample mean goes to the population mean in probability; this property is called consistency.]

Examples:
$X_1, \dots, X_n \sim N(\mu, \sigma^2)$ iid. Then $\bar{X} \to_p \mu$.
$X_1, \dots, X_n \sim \mathrm{Bin}(1, p)$ iid. Then $\bar{X} \to_p p$. (The sample proportion goes to the population proportion.)

Convergence of transformations: Assume $X_n \to_p X$ and $Y_n \to_p Y$. Then
(1) $a X_n + b Y_n \to_p a X + b Y$ and $X_n Y_n \to_p X Y$;
(2) $X_n / Y_n \to_p X / Y$ if $P(Y = 0) = 0$;
(3) if $g$ is a continuous function, then $g(X_n) \to_p g(X)$;
(4) if $h$ is a continuous function, then $h(X_n, Y_n) \to_p h(X, Y)$.
Proof for (3):

Example: If $\mathrm{Var}(X_i^2) < \infty$, or equivalently $E(X_i^4) < \infty$, then $S_n^2 \to_p \sigma^2$, and $S_n$ is a consistent estimator of $\sigma$.
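As the simulation illustration promised above: the following Python sketch estimates $P(|\bar{X}_n - \mu| \ge \epsilon)$ by Monte Carlo for an $\exp(1)$ population (the population, the tolerance $\epsilon = 0.1$, and the sample sizes are arbitrary choices). The estimated probabilities should shrink toward 0 as $n$ grows, which is exactly the statement $\bar{X}_n \to_p \mu$.

    import numpy as np

    rng = np.random.default_rng(1)
    mu, eps, reps = 1.0, 0.1, 20_000   # exp(1) has mean 1; eps and reps are arbitrary

    for n in [10, 100, 1000, 10000]:
        xbar = rng.exponential(mu, size=(reps, n)).mean(axis=1)
        # Monte Carlo estimate of P(|X-bar_n - mu| >= eps); should decrease toward 0
        print(n, np.mean(np.abs(xbar - mu) >= eps))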
4.2 Almost Sure Convergence

Def: We say $X_n \to X$ almost surely (a.s.) if
$$P(\{\omega \in S : X_n(\omega) \to X(\omega)\}) = 1,$$
where $S$ is the sample space. In some sense, this convergence can be regarded as pointwise convergence (almost everywhere).

Example: Let $S = [0, 1]$, with the uniform probability distribution. Define $X_n(\omega) = \omega + \omega^n$ and $X(\omega) = \omega$. Then $X_n \to X$ a.s.

Borel-Cantelli Lemma: If for every $\epsilon > 0$, $\sum_{n=1}^{\infty} P(|X_n - X| > \epsilon) < \infty$, then $X_n \to X$ a.s.

Strong Law of Large Numbers (SLLN): Suppose $X_1, \dots, X_n$ are iid with $E(X) = \mu$ and $\mathrm{var}(X) = \sigma^2 < \infty$. Then
$$\frac{X_1 + \cdots + X_n}{n} \to \mu \quad \text{a.s.}$$
Remark: The sample mean goes to the population mean a.s. In particular, the sample proportion goes to the population proportion.

Convergence of transformations: Assume $X_n \to X$ a.s. and $Y_n \to Y$ a.s. Then
(1) $a X_n + b Y_n \to a X + b Y$ a.s. and $X_n Y_n \to X Y$ a.s.;
(2) $X_n / Y_n \to X / Y$ a.s. if $P(Y = 0) = 0$;
(3) if $g$ is a continuous function, then $g(X_n) \to g(X)$ a.s.;
(4) if $h$ is a continuous function, then $h(X_n, Y_n) \to h(X, Y)$ a.s.

Example: $S_n^2 = (n-1)^{-1} \sum_{i=1}^n (X_i - \bar{X}_n)^2 \to \sigma^2$ a.s.

4.3 Convergence in Distribution

Def: We say $X_n \to_d X$ if the sequence of distribution functions $F_{X_n}$ of $X_n$ converges to that of $X$ in an appropriate sense:
$$F_{X_n}(x) \to F_X(x) \quad \text{for all } x \text{ where } F_X \text{ is continuous}.$$

Remark: Convergence in distribution is very indirect, unlike convergence in probability or a.s. convergence. It is a property of the distribution, not of a specific random variable. The random variables are secondary, and the $X_n$'s and $X$ need not even be related.

Example: Let $X_n$ be a random variable following the distribution $\exp(\beta_n)$, where $\beta_n = 1 + \frac{1}{n}$, for $n = 1, 2, \dots$. Let $X$ be a random variable having the distribution $\exp(1)$. Show that $X_n \to_d X$.

Result: Direct verification of $X_n \to_d X$ is often difficult. A very useful criterion for $X_n \to_d X$ is the convergence of mgfs, that is,
$$M_{X_n}(t) \to M_X(t) \quad \text{for all } t \text{ in a neighborhood of } 0.$$

We have seen previously:
Example: Consider the sequence $X_1, \dots, X_r, \dots$ with $X_r \sim \mathrm{NB}(r, p)$. If $r \to \infty$ and $p \to 1$ so that $r(1 - p) \to \lambda$, then $X_r \to_d X$ with $X \sim \mathrm{Poisson}(\lambda)$.
Example: Consider the sequence $X_1, \dots, X_n, \dots$ with $X_n \sim \mathrm{Bin}(n, p)$. If $n \to \infty$ and $p \to 0$ with $np \to \lambda$, then $X_n \to_d X$ where $X \sim \mathrm{Poisson}(\lambda)$.

Theorem: If $X_n \to_d X$ and $g$ is a continuous function, then $g(X_n) \to_d g(X)$. This holds even if $X$ is a random vector.

Example: Let $X_1, X_2, \dots$ be random variables. If $X_n \to_d X$, where $X$ has the $N(0, 1)$ distribution, then $X_n^2 \to_d X^2$, which has the $\chi^2_1$ distribution.

Example: Let $(X_1, Y_1), (X_2, Y_2), \dots$ be a sequence of random bivariate vectors satisfying $(X_n, Y_n) \to_d (X, Y)$, where $X$ and $Y$ independently follow $N(0, 1)$. Then $X_n / Y_n \to_d X / Y$, which has the Cauchy distribution.

4.4 Relationship Among Different Types of Convergence

(i) $X_n \to X$ a.s. $\Longrightarrow$ $X_n \to_p X$ $\Longrightarrow$ $X_n \to_d X$.
(ii) In general, the reverse implications do not hold. See Example 5.5.8.
(iii) When $X = c$ is a constant (i.e., $X$ takes the deterministic value $c$ with probability one), $X_n \to_p c$ is equivalent to $X_n \to_d c$.
(iv) Slutsky's Theorem. If $X_n \to_d X$ and $Y_n \to_p c$ (a constant), then
(a) $X_n + Y_n \to_d X + c$;
(b) $X_n Y_n \to_d c X$;
(c) $X_n / Y_n \to_d X / c$ if $c \ne 0$.

5 Central Limit Theorem

Let $X_1, X_2, \dots$ be iid with mean $\mu$ and variance $\sigma^2$. Then
$$\frac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma} \to_d N(0, 1).$$

Example: A shooter hits a target with probability $p$ independently in each attempt. She decides to hit the target $r$ times. Let $X$ stand for the number of attempts she needs.
(a) State the distribution of $X$.
(b) If $r \to \infty$ and $0 < p < 1$ remains fixed, show that the distribution of $r^{-1/2}(X - r/p)$ converges to a normal distribution. Specify the mean and variance of the limiting distribution.
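The CLT itself is easy to check by simulation. The sketch below (in Python, with an $\exp(1)$ population, $n = 200$, and the replication count chosen arbitrarily) simulates the standardized statistic $\sqrt{n}(\bar{X}_n - \mu)/\sigma$ and compares it with $N(0, 1)$; for $\exp(1)$ we have $\mu = \sigma = 1$.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)
    n, reps = 200, 50_000              # arbitrary illustration values
    mu = sigma = 1.0                   # exp(1) has mean 1 and standard deviation 1

    x = rng.exponential(1.0, size=(reps, n))
    z = np.sqrt(n) * (x.mean(axis=1) - mu) / sigma   # the CLT statistic

    # z should be approximately N(0, 1): small KS distance, mean near 0, variance near 1
    print(stats.kstest(z, stats.norm.cdf).statistic)
    print(z.mean(), z.var())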
Example (a useful application): By the CLT, $\sqrt{n}(\bar{X}_n - \mu)/\sigma \to_d N(0, 1)$. However, $\sigma$ is often unknown, so we use its estimate
$$S_n = \Big[\sum_{i=1}^n (X_i - \bar{X}_n)^2 / (n - 1)\Big]^{1/2}.$$
By Slutsky's theorem,
$$\frac{\sqrt{n}(\bar{X}_n - \mu)}{S_n} \to_d N(0, 1).$$
This can be used to test hypotheses or construct confidence intervals for $\mu$.

6 Delta Methods

If $X$ is normal, any linear transform of $X$ is also normal. This is, however, not true for nonlinear transformations. If $X_n$ is asymptotically normal (with mean $\mu$ and variance $\sigma_n^2$ going to zero), then for any smooth function $g$, $g(X_n)$ is also asymptotically normal. In most applications, the CLT gives $\sigma_n^2 = \tau^2 / n$ for sample averages. The delta method can be applied in such situations to calculate the asymptotic distribution of functions of a sample average.

Intuitively, when $\sigma_n^2$ is small, $X_n$ is concentrated near $\mu$, and thus only the behavior of $g(x)$ near $\mu$ matters. Any smooth function behaves locally like a linear function. More formally, $g(x)$ can be expanded near $\mu$ as
$$g(x) = g(\mu) + g'(\mu)(x - \mu) + o(|x - \mu|).$$
Thus $g(X_n) = g(\mu) + g'(\mu)(X_n - \mu) + \text{Remainder}$.

(i) First-order delta method:
Let $X_n$ be a sequence of r.v.s satisfying $\sqrt{n}(X_n - \mu) \to_d N(0, \sigma^2)$. If $g'(\mu) \ne 0$, then
$$\sqrt{n}\,[g(X_n) - g(\mu)] \to_d N(0, \sigma^2 [g'(\mu)]^2)$$
by Slutsky's theorem. In other words, $g(X_n)$ is asymptotically normal with mean $g(\mu)$ and variance $[g'(\mu)]^2 \sigma^2 / n$.

Example 1: $X_1, \dots, X_n$ iid $\mathrm{Poi}(\lambda)$. Find the asymptotic distributions of $\bar{X}_n$ and $\sqrt{\bar{X}_n}$. Further, $\lambda$ can be substituted by Slutsky. (A simulation sketch of this example appears after the two exercises below.)

Example 2: $X_1, \dots, X_n$ iid $\mathrm{binom}(1, p)$. Define $Y_n = \bar{X}_n$.
(1) Find the asymptotic distribution of $-\log Y_n$.
(2) Find the asymptotic distribution of $\bar{X}_n (1 - \bar{X}_n)$, assuming $p \ne \frac{1}{2}$.
Further, $p$ can be substituted by Slutsky.

(ii) Second-order delta method:
Let $X_n$ be a sequence of r.v.s satisfying $\sqrt{n}(X_n - \mu) \to_d N(0, \sigma^2)$. If $g'(\mu) = 0$ and $g''(\mu) \ne 0$, then
$$n\,[g(X_n) - g(\mu)] \to_d \sigma^2 \frac{g''(\mu)}{2} \chi^2_1.$$
Note that $g(X_n) = g(\mu) + 0 + \frac{g''(\mu)}{2}(X_n - \mu)^2 + \text{Remainder}$.

Example: $X_1, \dots, X_n$ iid $\mathrm{binom}(1, p)$. Find the asymptotic distribution of $\bar{X}_n(1 - \bar{X}_n)$, assuming $p = \frac{1}{2}$.

(iii) Multivariate delta method:
Let $X_1, \dots, X_n$ be i.i.d. $p$-dimensional random vectors with $E(X) = \mu$ and $D(X) = \Sigma$. Define $\bar{X}_j = \frac{1}{n} \sum_{i=1}^n X_{ij}$ for $j = 1, \dots, p$. Then
$$\sqrt{n}\,[g(\bar{X}_1, \dots, \bar{X}_p) - g(\mu_1, \dots, \mu_p)] \to_d N(0, [g'(\mu)]^T \Sigma\, [g'(\mu)]),$$
where $g'(\mu) = (\frac{\partial g(\mu)}{\partial \mu_1}, \dots, \frac{\partial g(\mu)}{\partial \mu_p})^T$. In other words, $g(\bar{X}_1, \dots, \bar{X}_p)$ is $\mathrm{AN}(g(\mu), n^{-1}[g'(\mu)]^T \Sigma\, [g'(\mu)])$.
Note $g(x) = g(\mu) + [g'(\mu)]^T (x - \mu) + o(\|x - \mu\|)$.

Some Exercises

Example: Let $X_1, \dots, X_n$ be $\mathrm{Normal}(0, \theta)$ and $Y_1, \dots, Y_n$ be $\mathrm{Normal}(0, 1)$, and let all the variables be mutually independent. Consider
$$V_n = \frac{X_1^2 + \cdots + X_n^2}{Y_1^2 + \cdots + Y_n^2}.$$
(a) Show that $E(V_n) = \frac{n\theta}{n-2}$.
(b) Using the fact that
$$V_n - \theta = \frac{\sum_{i=1}^n (X_i^2 - \theta Y_i^2)}{\sum_{i=1}^n Y_i^2},$$
show that $\sqrt{n}(V_n - \theta)$ converges in distribution to $\mathrm{Normal}(0, 4\theta^2)$.
(c) Is it true that $\sqrt{n}(V_n - E(V_n))$ converges in distribution to $\mathrm{Normal}(0, 4\theta^2)$?
(d) Obtain the asymptotic distribution of $\log V_n$.

Example: Let $X_i$, $i = 1, 2, \dots$, be independent uniform random variables on the interval $(0, 1)$. Define $Y_n = \prod_{i=1}^n X_i$, where $n$ is a positive integer.
(a) Derive the distribution of $-\log(Y_n)$.
(b) Define $T_n = (Y_n)^{-1/n}$. Show that $T_n$ converges in probability to $e$ as $n$ goes to infinity.
(c) Show that $\sqrt{n}(T_n - e)$ converges in distribution to a normal random variable $N(0, \tau)$ as $n$ goes to infinity. Determine the value of $\tau$.
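Before the remaining exercises, here is the simulation sketch promised in Example 1 of the first-order delta method (in Python; the values $\lambda = 4$, $n = 500$, and the replication count are arbitrary illustration choices). With $g(x) = \sqrt{x}$ we have $g'(\lambda) = 1/(2\sqrt{\lambda})$, so the delta method gives $\sqrt{n}(\sqrt{\bar{X}_n} - \sqrt{\lambda}) \to_d N(0, \lambda [g'(\lambda)]^2) = N(0, 1/4)$, and the simulated statistic should have mean near 0 and variance near 0.25.

    import numpy as np

    rng = np.random.default_rng(3)
    lam, n, reps = 4.0, 500, 50_000    # arbitrary illustration values

    xbar = rng.poisson(lam, size=(reps, n)).mean(axis=1)
    # First-order delta method with g(x) = sqrt(x):
    # sqrt(n) * (g(xbar) - g(lam)) is approximately N(0, lam * g'(lam)^2) = N(0, 1/4)
    z = np.sqrt(n) * (np.sqrt(xbar) - np.sqrt(lam))
    print(z.mean(), z.var())           # should be near 0 and 0.25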
Example: $(X, Y)$ is a bivariate random variable; define $\theta = \Pr(X < Y)$.
(a) Define the function $H(X, Y)$ to take the value 1 if $X < Y$ and 0 otherwise. Show that $E[H(X, Y)] = \theta$.
(b) Let the pairs $(X_i, Y_i)$, $i = 1, \dots, n$, be i.i.d. samples with the same distribution as $(X, Y)$. Define $T = \sum_{i=1}^n H(X_i, Y_i)$. What is the distribution of $T$?
(c) Show that $T/n \to_p \theta$.
(d) Describe the limiting distribution of $n^{-1/2}(T - n\theta)$.

Example: Let $X_i$, $i = 1, \dots, n$, and $Y_j$, $j = 1, \dots, m$, be independent normal random variables with mean $\mu$ and variance $\sigma^2$. Let $\bar{X}$ and $S_x^2$ denote the sample mean and sample variance of the $X_i$'s; similarly, define $\bar{Y}$ and $S_y^2$ for the $Y_j$'s.
(a) Find the distribution of $\bar{X} - \bar{Y}$.
(b) Find the distribution of $[(n-1) S_x^2 + (m-1) S_y^2] / \sigma^2$.
(c) Define
$$T = \sqrt{\frac{nm(n+m-2)}{n+m}} \cdot \frac{\bar{X} - \bar{Y}}{\sqrt{(n-1) S_x^2 + (m-1) S_y^2}}.$$
Show that $T$ follows a $t$-distribution with $n + m - 2$ degrees of freedom.
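The statistic $T$ in part (c) of the last exercise can also be checked by simulation. Below is a minimal Python sketch (with arbitrary choices $\mu = 0$, $\sigma = 2$, $n = 8$, $m = 12$) that compares the simulated $T$ with the $t_{n+m-2}$ cdf:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(4)
    mu, sigma, n, m, reps = 0.0, 2.0, 8, 12, 50_000  # arbitrary illustration values

    x = rng.normal(mu, sigma, size=(reps, n))
    y = rng.normal(mu, sigma, size=(reps, m))
    num = x.mean(axis=1) - y.mean(axis=1)
    pooled = (n - 1) * x.var(axis=1, ddof=1) + (m - 1) * y.var(axis=1, ddof=1)
    t = np.sqrt(n * m * (n + m - 2) / (n + m)) * num / np.sqrt(pooled)

    # T should follow a t-distribution with n + m - 2 degrees of freedom,
    # so the KS distance to that cdf should be small
    print(stats.kstest(t, stats.t(df=n + m - 2).cdf).statistic)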