CLT for Linear Spectral Statistics of Large Dimensional Sample Covariance Matrices

By Z.D. Bai(1) and Jack W. Silverstein(2)

National University of Singapore and North Carolina State University

Summary. Let B_n = (1/N) T_n^{1/2} X_n X_n^* T_n^{1/2}, where X_n = (X_ij) is n × N with i.i.d. complex standardized entries having finite fourth moment, and T_n^{1/2} is a Hermitian square root of the nonnegative definite Hermitian matrix T_n. The limiting behavior, as n → ∞ with n/N approaching a positive constant, of functionals of the eigenvalues of B_n, where each eigenvalue is given equal weight, is studied. Due to the limiting behavior of the empirical spectral distribution of B_n, it is known that these linear spectral statistics converge a.s. to a nonrandom quantity. This paper shows their rate of convergence to be 1/n by proving that, after proper scaling, they form a tight sequence. Moreover, if EX_{11}^2 = 0 and E|X_{11}|^4 = 2, or if X_{11} and T_n are real and EX_{11}^4 = 3, they are shown to have Gaussian limits.

(1) Research supported in part by Research Grant R-155-000-014-112 from the National University of Singapore.
(2) Research supported by the National Science Foundation under grant DMS-9703591.

AMS 2000 subject classifications. Primary 15A52, 60F05; Secondary 62H99.

Key Words and Phrases. Linear spectral statistics, random matrix, empirical distribution function of eigenvalues, Stieltjes transform.

1. Introduction. Due to the rapid development of modern technology, statisticians are confronted with the task of analyzing data with ever increasing dimension. For example, stock market analysis can now include a large number of companies. The study of DNA can now incorporate a sizable number of its base pairs. Computers can easily perform computations with high dimensional data. Indeed, within several milliseconds a mainframe can complete the spectral decomposition of a 1000 × 1000 symmetric matrix, a feat unachievable only 20 years ago.
In the past, so-called dimension reduction schemes played the main role in dealing with high dimensional data, but a large portion of the information contained in the original data would inevitably get lost. For example, in variable selection of multivariate linear regression models, all information contained in the unselected variables is lost; in principal component analysis, all information contained in the components deemed "non-principal" is gone. Now, when dimension reduction is performed, it is usually not due to computational restrictions. However, even though the technology exists to compute much of what is needed, there is a fundamental problem with the analytical tools used by statisticians. Their use relies on their asymptotic behavior as the number of samples increases. It is to be expected that larger dimension will require larger samples in order to maintain the same level of behavior. But the required increase is typically orders of magnitude larger, sample sizes that are simply unattainable in most situations. With a necessary limitation on the number of samples, many frequently used statistics in multivariate analysis perform in a completely different manner than they do on data of low dimension with no restriction on sample size. Some methods behave very poorly (see Bai and Saranadasa (1996)), and some are not even applicable (see Dempster (1958)).

Consider the following example. Let X_ij be i.i.d. standard normal variables. Write

  S_N = \left( \frac{1}{N} \sum_{k=1}^N X_{ik} X_{jk} \right)_{i,j=1}^n,

which can be considered as a sample covariance matrix, formed from N samples of an n-dimensional mean zero random vector with population covariance matrix I. An important statistic in multivariate analysis is

  T_N = \ln(\det S_N) = \sum_{j=1}^n \ln(\lambda_{N,j}),

where \lambda_{N,j}, j = 1, ..., n, are the eigenvalues of S_N. When n is fixed, \lambda_{N,j} → 1 almost surely as N → ∞, and thus T_N → 0 a.s. Further, by taking a Taylor expansion of \ln x, one can show that

  \sqrt{N/n}\, T_N \to_D N(0, 2)

for any fixed n.
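The fixed-n claim above is easy to spot-check by simulation. The following is a minimal sketch (plain NumPy; the helper name t_stat and all parameter choices are ours, not from the paper): for a small fixed dimension n and large N, \sqrt{N/n} T_N should have mean near 0 and variance near 2.

```python
import numpy as np

rng = np.random.default_rng(0)

def t_stat(n, N, rng):
    """T_N = ln det S_N, with S_N = (1/N) X X^T and X an n x N standard normal matrix."""
    X = rng.standard_normal((n, N))
    # slogdet avoids overflow/underflow in the determinant itself
    _, logdet = np.linalg.slogdet(X @ X.T / N)
    return logdet

# fixed small n, large N: sqrt(N/n) * T_N is approximately N(0, 2)
n, N, reps = 5, 2000, 500
vals = np.array([np.sqrt(N / n) * t_stat(n, N, rng) for _ in range(reps)])
print(np.mean(vals), np.var(vals))  # mean near 0, variance near 2
```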
This suggests the possibility that T_N is asymptotically normal, provided that n = O(N). However, this is not the case. Let us see what happens when n/N → c ∈ (0, 1) as n → ∞. Using results on the limiting spectral distribution of {S_N} [see Marčenko and Pastur (1967), Bai (1999)], we have with probability one

(1.1)  \frac{1}{n} T_N \to \int_{a(c)}^{b(c)} \frac{\ln x}{2\pi x c} \sqrt{(b(c) - x)(x - a(c))}\, dx = \frac{c - 1}{c} \ln(1 - c) - 1 \equiv d(c) < 0,

where a(c) = (1 - \sqrt{c})^2, b(c) = (1 + \sqrt{c})^2 (see Section 5 for a derivation of the integral). This shows that almost surely

  \sqrt{N/n}\, T_N \sim d(c) \sqrt{Nn} \to -\infty.

Thus, any test which assumes asymptotic normality of \sqrt{N/n}\, T_N will result in a serious error.

Besides demonstrating problems with relying on traditional methodology when sample size is restricted, the example introduces one of several results that can be used to handle data with large dimension n, proportional to the sample size N. They are limit theorems, as n approaches infinity, on the eigenvalues of a class of random matrices of sample covariance type [Yin and Krishnaiah (1983), Yin (1986), Silverstein (1995), Bai and Silverstein (1998), (1999)]. They take the form

  B_n = \frac{1}{N} T_n^{1/2} X_n X_n^* T_n^{1/2},

where X_n = (X_{ij}^n) is n × N, the X_{ij}^n ∈ C are i.i.d. with E|X_{11}^n - EX_{11}^n|^2 = 1, and T_n^{1/2} is n × n random Hermitian, with X_n and T_n^{1/2} independent. When X_{11}^n is known to have mean zero and T_n is nonrandom, B_n can be viewed as a sample covariance matrix, which includes any Wishart matrix, formed from N samples of the random vector T_n^{1/2} X_{\cdot 1}^n (X_{\cdot 1}^n denoting the first column of X_n), which has population covariance matrix T_n ≡ (T_n^{1/2})^2. Besides sample covariance matrices, B_n, whose eigenvalues are the same as those of (1/N) X_n X_n^* T_n, models the spectral behavior of other matrices important to multivariate statistics, in particular multivariate F matrices, where X_{11}^n is N(0, 1) and T_n is the inverse of another Wishart matrix.
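The limit d(c) in (1.1) can be checked three ways at once: against the closed form, against a direct quadrature of the integral, and against one simulated sample covariance matrix. A sketch (parameter choices are ours):

```python
import numpy as np

def d(c):
    """The a.s. limit of (1/n) T_N in (1.1)."""
    return (c - 1) / c * np.log(1 - c) - 1

c = 0.4
a, b = (1 - np.sqrt(c))**2, (1 + np.sqrt(c))**2

# quadrature check of the integral in (1.1) (simple Riemann sum)
x = np.linspace(a, b, 200001)[1:-1]
integrand = np.log(x) * np.sqrt((b - x) * (x - a)) / (2 * np.pi * x * c)
quad = np.sum(integrand) * (x[1] - x[0])

# empirical check: (1/n) ln det S_N for one large sample covariance matrix
rng = np.random.default_rng(1)
n, N = 400, 1000
X = rng.standard_normal((n, N))
_, logdet = np.linalg.slogdet(X @ X.T / N)

print(d(c), quad, logdet / n)  # all three nearly equal (about -0.234)
```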
The basic limit theorem on the eigenvalues of B_n concerns its empirical spectral distribution F^{B_n}, where for any matrix A with real eigenvalues, F^A denotes the empirical distribution function of the eigenvalues of A; that is, if A is n × n, then

  F^A(x) = \frac{1}{n} (\text{number of eigenvalues of } A \le x).

If

1) for all n, i, j, the X_{ij}^n are identically distributed,
2) with probability one, F^{T_n} →_D H, a proper cumulative distribution function (c.d.f.), and
3) n/N → c > 0 as n → ∞,

then with probability one F^{B_n} converges in distribution to F^{c,H}, a nonrandom proper c.d.f. The case when H distributes its mass at one positive number (called the Marčenko–Pastur law with dimension-to-sample-size ratio c ∈ (0, 1)), as in the above example, is one of the few non-trivial cases where an explicit expression for F^{c,H} is known (the others include the multivariate F matrix case [Silverstein (1985)] and, as will be seen below, the case where H is discrete with at most three mass points). However, a good deal of information, including a way to compute F^{c,H}, can be extracted from an equation satisfied by its Stieltjes transform, defined for any c.d.f. G to be

  m_G(z) \equiv \int \frac{1}{\lambda - z}\, dG(\lambda),   z ∈ C^+ \equiv \{z ∈ C : \Im z > 0\}.

For each z ∈ C^+, the Stieltjes transform m(z) ≡ m_{F^{c,H}}(z) is the unique solution to

  m = \int \frac{1}{\lambda(1 - c - czm) - z}\, dH(\lambda)

in the set \{m ∈ C : -(1 - c)/z + cm ∈ C^+\}. The equation takes on a simpler form when F^{c,H} is replaced by

  \underline{F}^{c,H} \equiv (1 - c) I_{[0,\infty)} + c F^{c,H}

(I_A denoting the indicator function of the set A), which is the limiting empirical distribution function of \underline{B}_n ≡ (1/N) X_n^* T_n X_n (whose spectrum differs from that of B_n by |n - N| zeros). Its Stieltjes transform

  \underline{m}(z) \equiv m_{\underline{F}^{c,H}}(z) = -\frac{1 - c}{z} + c\, m(z)

has inverse

(1.2)  z = z(\underline{m}) = -\frac{1}{\underline{m}} + c \int \frac{t}{1 + t\underline{m}}\, dH(t).

Using (1.2) it is shown in Silverstein and Choi (1995) that, on (0, ∞), F^{c,H} has a continuous density, which is analytic inside its support and is given by

(1.3)  f^{c,H}(x) = c^{-1} \frac{d}{dx} \underline{F}^{c,H}(x) = (c\pi)^{-1} \Im\, \underline{m}(x) \equiv (c\pi)^{-1} \lim_{z \to x} \Im\, \underline{m}(z).

Also, F^{c,H}(0) = \max[1 - c^{-1}, H(0)].
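When H is a point mass at 1 (the Marčenko–Pastur case), (1.2) reduces to a quadratic in \underline{m}, so (1.3) can be checked directly against the classical Marčenko–Pastur density. A small sketch (the helper name m_underbar and the test points are ours):

```python
import numpy as np

def m_underbar(z, c):
    """Stieltjes transform of underline-F^{c,H} for H = point mass at 1:
    from (1.2), z = -1/m + c/(1+m), i.e. z m^2 + (z + 1 - c) m + 1 = 0;
    for z in C^+, take the root with positive imaginary part."""
    r = np.roots([z, z + 1.0 - c, 1.0])
    return r[np.argmax(r.imag)]

c = 0.4
a, b = (1 - np.sqrt(c))**2, (1 + np.sqrt(c))**2
for x in [0.5, 1.0, 2.0]:
    m = m_underbar(x + 1e-9j, c)       # approximate the boundary limit in (1.3)
    f_via_1_3 = m.imag / (c * np.pi)   # f^{c,H}(x) = Im m(x) / (c pi)
    f_mp = np.sqrt(max((b - x) * (x - a), 0.0)) / (2 * np.pi * c * x)
    print(x, f_via_1_3, f_mp)          # the two density values agree
```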
Moreover, considering (1.2) for \underline{m} real, the range of values where it is increasing constitutes the complement of the support of \underline{F}^{c,H} on (0, ∞) [Marčenko and Pastur (1967), Silverstein and Choi (1995)]. From (1.2) and (1.3), f^{c,H}(x) can be computed using Newton's method for each x ∈ (0, ∞) inside its support (see Bai and Silverstein (1998) for an illustration of the density when c = .1 and H places mass 0.2, 0.4, and 0.4 at, respectively, 1, 3, and 10). Notice in (1.2) that when H is discrete with at most three positive mass points the density has an explicit expression, since \underline{m}(z) is the root of a polynomial of degree at most four.

Convergence in distribution of F^{B_n} of course reveals no information on the number of eigenvalues of B_n appearing in any interval [a, b] outside the support of F^{c,H}, other than that the number is almost surely o(n). In Bai and Silverstein (1998) it is shown that, with probability one, no eigenvalues of B_n appear in [a, b] for all n large under the following additional assumptions:

1') X_n is the first n rows and N columns of a doubly infinite array of i.i.d. random variables, with EX_{11} = 0, E|X_{11}|^2 = 1, and E|X_{11}|^4 < ∞,
2') T_n is nonrandom and \|T_n\|, the spectral norm of T_n, is bounded in n, and
3') [a, b] lies in an open subset of (0, ∞) which is outside the support of F^{c_n,H_n} for all n large, where c_n ≡ n/N and H_n ≡ F^{T_n}.

The result extends what has been previously known on the extreme eigenvalues of (1/N) X_n X_n^* (T_n = I). Let \lambda_{\max}^A, \lambda_{\min}^A denote, respectively, the largest and smallest eigenvalues of the Hermitian matrix A. Under condition 1'), Yin, Bai, and Krishnaiah (1988) showed, as n → ∞,

  \lambda_{\max}^{(1/N) X_n X_n^*} \to (1 + \sqrt{c})^2  a.s.,

while in Bai and Yin (1993), for c ≤ 1,

  \lambda_{\min}^{(1/N) X_n X_n^*} \to (1 - \sqrt{c})^2  a.s.

If [a, b] separates the support of F^{c,H} in (0, ∞) into two non-empty sets, then associated with it is another interval J which separates the eigenvalues of T_n for all n large.
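Returning to the density computation mentioned above: for the cited example (c = .1, H with mass 0.2, 0.4, 0.4 at 1, 3, 10), clearing denominators in (1.2) gives a quartic in \underline{m}, and the root in C^+ yields the density via (1.3). A sketch along those lines, with our own helper names, using polynomial root finding rather than Newton's method (either works; the quartic route sidesteps the choice of starting point):

```python
import numpy as np

c = 0.1
t = np.array([1.0, 3.0, 10.0])   # mass points of H
w = np.array([0.2, 0.4, 0.4])    # their weights

def density(x):
    """f^{c,H}(x) = Im m(x)/(c pi): clearing denominators in (1.2) at z = x gives
    x*m*prod_k(1+t_k m) + prod_k(1+t_k m) - c*m*sum_k w_k t_k prod_{j!=k}(1+t_j m) = 0,
    a quartic in m; a root in C^+ (when one exists) is the Stieltjes transform."""
    prod = [1.0]
    for tk in t:
        prod = np.polymul(prod, [tk, 1.0])            # coefficients, highest degree first
    poly = np.polyadd(x * np.polymul(prod, [1.0, 0.0]), prod)
    for k in range(len(t)):
        rest = [1.0]
        for j in range(len(t)):
            if j != k:
                rest = np.polymul(rest, [t[j], 1.0])
        poly = np.polysub(poly, c * w[k] * t[k] * np.polymul(rest, [1.0, 0.0]))
    return max(np.roots(poly).imag.max(), 0.0) / (c * np.pi)

xs = np.arange(0.01, 20.0, 0.01)
fs = np.array([density(x) for x in xs])
print(np.sum(fs) * 0.01)   # total mass close to 1 (no atom at 0 here since c < 1)
```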
The mass F^{c_n,H_n} places, say, to the right of b equals the proportion of eigenvalues of T_n lying to the right of J. In Bai and Silverstein (1999) it is proven that, with probability one, the number of eigenvalues of B_n and of T_n lying on the same side of their respective intervals is the same for all n large. The above two results are intuitively plausible when viewing B_n as an approximation of T_n, especially when c_n is small (it can be shown that F^{c,H} →_D H as c → 0). However, regardless of the size of c, when separation in the support of F^{c,H} on (0, ∞) associated with a gap in the spectrum of T_n occurs, there will be exact splitting of the eigenvalues of B_n.

These results can be used in applications where the location of eigenvalues of the population covariance matrix is needed, as in the detection problem in array signal processing [see Silverstein and Combettes (1992)]. Here, each entry of the sampled vector is a reading off a sensor, due to an unknown number q of sources emitting signals in a noise-filled environment (q < n). The problem is to determine q. The smallest eigenvalue of the population covariance matrix is positive with multiplicity n - q (the so-called "noise" eigenvalues). The traditional approach has been to sample enough times so that the sample covariance matrix is close to the population matrix, relying on fixed dimension, large sample asymptotic analysis. However, it may be impossible to sample enough times if q is sizable. The above results show that in order to determine the number of sources, it suffices to sample enough times so that the eigenvalues of B_n split into two discernible groups. The number of eigenvalues in the group on the right will, with high probability, equal q.

The results also enable us to understand the true behavior of statistics such as T_N in the above example when n and N are large but on the same order of magnitude: T_N is not close to zero; rather, n^{-1} T_N is close to the quantity d(c) in (1.1), or perhaps more appropriately d(c_n).
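A toy version of the detection scheme just described (all parameter choices are ours, not from the cited reference): with q = 2 strong population eigenvalues above a bulk of 1's, the eigenvalues of B_n split into two visibly separated groups once N is moderately large.

```python
import numpy as np

rng = np.random.default_rng(2)
n, N, q = 100, 1000, 2                  # c_n = 0.1; q "signal" sources
pop_eigs = np.concatenate([[10.0, 5.0], np.ones(n - q)])   # population spectrum
T_half = np.diag(np.sqrt(pop_eigs))     # a Hermitian square root of T_n
X = rng.standard_normal((n, N))
B = T_half @ (X @ X.T / N) @ T_half
lam = np.sort(np.linalg.eigvalsh(B))[::-1]
bulk_edge = (1 + np.sqrt(n / N))**2     # right edge of the noise bulk when T_n = I
print(lam[:4], bulk_edge)  # two eigenvalues far to the right, the rest below the edge
```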
However, in order to fully utilize n^{-1} T_N, typically in hypothesis testing, it is important to establish the limiting distribution of T_N - n d(c_n). We come to the main purpose of this paper: to study the limiting distribution of normalized spectral functionals like T_N - n d(c) and, as a by-product, the rate of convergence of statistics such as n^{-1} T_N, functionals of the eigenvalues of B_n where each is given equal weight. We will call them linear spectral statistics, quantities of the form

  \frac{1}{n} \sum_{j=1}^n f(\lambda_j) = \int f(x)\, dF^{B_n}(x)

(\lambda_1, ..., \lambda_n denoting the eigenvalues of B_n), where f is a function on [0, ∞).

We will show, under the assumption E|X_{11}|^4 < ∞ and the analyticity of f, that the rate at which \int f(x)\, dF^{B_n}(x) - \int f(x)\, dF^{c_n,H_n}(x) approaches zero is essentially 1/n. Define

  G_n(x) = n[F^{B_n}(x) - F^{c_n,H_n}(x)].

The main result is stated in the following theorem.

Theorem 1.1. Assume:

(a) For each n, the X_{ij} = X_{ij}^n, i ≤ n, j ≤ N, are i.i.d., identically distributed for all n, i, j, with EX_{11} = 0, E|X_{11}|^2 = 1, E|X_{11}|^4 < ∞, and n/N → c; and
(b) assumption 2') above.

Let f_1, ..., f_r be C^1 functions on R with bounded derivatives, analytic on an open interval containing

(1.4)  [\liminf_n \lambda_{\min}^{T_n} I_{(0,1)}(c)(1 - \sqrt{c})^2,\ \limsup_n \lambda_{\max}^{T_n} (1 + \sqrt{c})^2].

Then:

(1) The random vector

(1.5)  \left( \int f_1(x)\, dG_n(x), \ldots, \int f_r(x)\, dG_n(x) \right)

forms a tight sequence in n.

(2) If X_{11} and T_n are real and E(X_{11}^4) = 3, then (1.5) converges weakly to a Gaussian vector (X_{f_1}, ..., X_{f_r}), with means

(1.6)  EX_f = -\frac{1}{2\pi i} \oint f(z)\, \frac{c \int \frac{\underline{m}(z)^3 t^2}{(1 + t\underline{m}(z))^3}\, dH(t)}{\left( 1 - c \int \frac{\underline{m}(z)^2 t^2}{(1 + t\underline{m}(z))^2}\, dH(t) \right)^2}\, dz

and covariance function

(1.7)  \mathrm{Cov}(X_f, X_g) = -\frac{1}{2\pi^2} \oint\oint \frac{f(z_1) g(z_2)}{(\underline{m}(z_1) - \underline{m}(z_2))^2} \frac{d\underline{m}(z_1)}{dz_1} \frac{d\underline{m}(z_2)}{dz_2}\, dz_1\, dz_2

(f, g ∈ \{f_1, ..., f_r\}). The contours in (1.6) and (1.7) (two in (1.7), which we may assume to be non-overlapping) are closed and are taken in the positive direction in the complex plane, each enclosing the support of F^{c,H}.
(3) If X_{11} is complex with E(X_{11}^2) = 0 and E(|X_{11}|^4) = 2, then (2) also holds, except that the means are zero and the covariance function is 1/2 the function given in (1.7).

We begin the proof of Theorem 1.1 here with the replacement of the entries of X_n by truncated and centralized variables. For m = 1, 2, ..., find n_m (n_m > n_{m-1}) satisfying

  m^4 \int_{\{|X_{11}| \ge \sqrt{n}/m\}} |X_{11}|^4 < 2^{-m}

for all n ≥ n_m. Define \delta_n = 1/m for all n ∈ [n_m, n_{m+1}) (= 1 for n < n_1). Then, as n → ∞, \delta_n → 0 and

(1.8)  \delta_n^{-4} \int_{\{|X_{11}| \ge \delta_n \sqrt{n}\}} |X_{11}|^4 \to 0.

Let now, for each n, \delta_n be the larger of the \delta_n constructed above and the \delta_n created in the proof of Lemma 2.2 of Yin, Bai, and Krishnaiah (1988) with r = 1/2 and satisfying \delta_n n^{1/4} → ∞. Let \hat{B}_n = (1/N) T_n^{1/2} \hat{X}_n \hat{X}_n^* T_n^{1/2}, with \hat{X}_n n × N having (i, j)-th entry X_{ij} I_{\{|X_{ij}| < \delta_n \sqrt{n}\}}. We have then

  P(\hat{B}_n \ne B_n) \le nN\, P(|X_{11}| \ge \delta_n \sqrt{n}) \le K \delta_n^{-4} \int_{\{|X_{11}| \ge \delta_n \sqrt{n}\}} |X_{11}|^4 = o(1).

Define \tilde{B}_n = (1/N) T_n^{1/2} \tilde{X}_n \tilde{X}_n^* T_n^{1/2}, with \tilde{X}_n n × N having (i, j)-th entry (\hat{X}_{ij} - E\hat{X}_{ij})/\sigma_n, where \sigma_n^2 = E|\hat{X}_{ij} - E\hat{X}_{ij}|^2. From Yin, Bai, and Krishnaiah (1988) we know that both \limsup_n \lambda_{\max}^{\hat{B}_n} and \limsup_n \lambda_{\max}^{\tilde{B}_n} are almost surely bounded by \limsup_n \|T_n\| (1 + \sqrt{c})^2.

We use \hat{G}_n(x) and \tilde{G}_n(x) to denote the analogues of G_n(x) with the matrix B_n replaced by \hat{B}_n and \tilde{B}_n, respectively. Let \lambda_k^A denote the k-th smallest eigenvalue of the Hermitian matrix A. Using the same approach and bounds that are used in the proof of Lemma 2.7 of Bai (1999), we have, for each j = 1, 2, ..., r,

  \left| \int f_j(x)\, d\hat{G}_n(x) - \int f_j(x)\, d\tilde{G}_n(x) \right| \le K_j \sum_{k=1}^n |\lambda_k^{\hat{B}_n} - \lambda_k^{\tilde{B}_n}|
  \le 2 K_j \left( N^{-1}\, tr\, T_n^{1/2} (\hat{X}_n - \tilde{X}_n)(\hat{X}_n - \tilde{X}_n)^* T_n^{1/2} \right)^{1/2} \left( n (\lambda_{\max}^{\hat{B}_n} + \lambda_{\max}^{\tilde{B}_n}) \right)^{1/2},

where K_j is a bound on |f_j'(z)|. From (1.8) we have

  |\sigma_n^2 - 1| \le 2 \int_{\{|X_{11}| \ge \delta_n \sqrt{n}\}} |X_{11}|^2 \le 2 \delta_n^{-2} n^{-1} \int_{\{|X_{11}| \ge \delta_n \sqrt{n}\}} |X_{11}|^4 = o(\delta_n^2 n^{-1}).

Moreover,

  |E\hat{X}_{11}| = \left| \int_{\{|X_{11}| \ge \delta_n \sqrt{n}\}} X_{11} \right| = o(\delta_n n^{-3/2}).

These give us
  \left( N^{-1}\, tr\, T_n^{1/2} (\hat{X}_n - \tilde{X}_n)(\hat{X}_n - \tilde{X}_n)^* T_n^{1/2} \right)^{1/2}
  \le \left( N^{-1}\, tr\, (1 - 1/\sigma_n)^2 \hat{X}_n \hat{X}_n^* \right)^{1/2} + \left( N^{-1}\, tr\, \sigma_n^{-2} E\hat{X}_n (E\hat{X}_n)^* \right)^{1/2}
  \le \left( \frac{(1 - \sigma_n^2)^2}{\sigma_n^2 (1 + \sigma_n)^2}\, n \lambda_{\max}^{\hat{B}_n} \right)^{1/2} + n^{1/2} \sigma_n^{-1} |E\hat{X}_{11}| = o(\delta_n n^{-1/2})(\lambda_{\max}^{\hat{B}_n})^{1/2} + o(\delta_n n^{-1}).

From the above estimates, we obtain

  \int f_j(x)\, dG_n(x) = \int f_j(x)\, d\tilde{G}_n(x) + o_p(1)

(o_p(1) → 0 in probability as n → ∞). Therefore, we only need to find the limiting distribution of \{\int f_j(x)\, d\tilde{G}_n(x), j = 1, ..., r\}. Hence, in the sequel, we shall assume the underlying variables are truncated at \delta_n \sqrt{n}, centralized, and renormalized. For simplicity, we shall suppress all sub- or superscripts on the variables and assume that |X_{ij}| < \delta_n \sqrt{n}, EX_{ij} = 0, E|X_{ij}|^2 = 1, E|X_{ij}|^4 < ∞, and, in the real Gaussian case, E|X_{11}|^4 = 3 + o(1), while in the complex Gaussian case EX_{11}^2 = o(1/n) and E|X_{11}|^4 = 2 + o(1).

Since the truncation steps are identical to those in Yin, Bai, and Krishnaiah (1988), we have for any \eta > (1 + \sqrt{c})^2 the existence of \{k_n\} with k_n / \ln n → ∞ for which

  E\|(1/N) X_n X_n^*\|^{k_n} \le \eta^{k_n}

for all n large. Therefore,

(1.9a)  P(\|B_n\| \ge \eta) = o(n^{-\ell})

for any \eta > \limsup_n \|T_n\| (1 + \sqrt{c})^2 and any positive \ell. By modifying the proof in Bai and Yin (1993) on the smallest eigenvalue of (1/N) X_n X_n^*, it follows that, when x_l > 0,

(1.9b)  P(\lambda_{\min}^{B_n} \le \eta) = o(n^{-\ell})

whenever 0 < \eta < \liminf_n \lambda_{\min}^{T_n} (1 - \sqrt{c})^2. The modification is given in the appendix.

After truncation and centralization, our proof of the main theorem relies on establishing limiting results on

  M_n(z) = n[m_{F^{B_n}}(z) - m_{F^{c_n,H_n}}(z)] = N[m_{\underline{F}^{B_n}}(z) - m_{\underline{F}^{c_n,H_n}}(z)],

or, more precisely, on \hat{M}_n(\cdot), a truncated version of M_n(\cdot) when viewed as a random two-dimensional process defined on a contour C of the complex plane, described as follows. Let v_0 > 0 be arbitrary. Let x_r be any number greater than the right end-point of the interval (1.4). Let x_l be any negative number if the left end-point of (1.4) is zero. Otherwise choose x_l ∈ (0, \liminf_n \lambda_{\min}^{T_n} (1 - \sqrt{c})^2).
Let C_u = \{x + iv_0 : x ∈ [x_l, x_r]\}. Then

  C \equiv \{x_l + iv : v ∈ [0, v_0]\} \cup C_u \cup \{x_r + iv : v ∈ [0, v_0]\}.

We define now the subsets C_n of C on which M_n(\cdot) agrees with \hat{M}_n(\cdot). Choose a sequence \{\epsilon_n\} decreasing to zero satisfying, for some \alpha ∈ (0, 1),

(1.10)  \epsilon_n \ge n^{-\alpha}.

Let

  C_l = \{x_l + iv : v ∈ [n^{-1}\epsilon_n, v_0]\}  if x_l > 0,
  C_l = \{x_l + iv : v ∈ [0, v_0]\}  if x_l < 0,

and C_r = \{x_r + iv : v ∈ [n^{-1}\epsilon_n, v_0]\}. Then C_n = C_l \cup C_u \cup C_r. The process \hat{M}_n(\cdot) can now be defined. For z = x + iv we have:

(1.11)  \hat{M}_n(z) = M_n(z) for z ∈ C_n;
        \hat{M}_n(z) = M_n(x_r + i n^{-1}\epsilon_n) for x = x_r, v ∈ [0, n^{-1}\epsilon_n]; and, if x_l > 0,
        \hat{M}_n(z) = M_n(x_l + i n^{-1}\epsilon_n) for x = x_l, v ∈ [0, n^{-1}\epsilon_n].

\hat{M}_n(\cdot) is viewed as a random element in the metric space C(C, R^2) of continuous functions from C to R^2. All of Chapter 2 of Billingsley (1968) applies to continuous functions from a set such as C (homeomorphic to [0, 1]) to finite dimensional Euclidean space, with | · | interpreted as Euclidean distance. Most of the paper will deal with proving

Lemma 1.1. Under conditions (a) and (b) of Theorem 1.1, \{\hat{M}_n(\cdot)\} forms a tight sequence on C. Moreover, if the assumptions in (2) or (3) of Theorem 1.1 on X_{11} hold, then \hat{M}_n(\cdot) converges weakly to a two-dimensional Gaussian process M(\cdot) satisfying, for z ∈ C, under the assumptions in (2),

(1.12)  EM(z) = \frac{c \int \frac{\underline{m}(z)^3 t^2}{(1 + t\underline{m}(z))^3}\, dH(t)}{\left( 1 - c \int \frac{\underline{m}(z)^2 t^2}{(1 + t\underline{m}(z))^2}\, dH(t) \right)^2}

and, for z_1, z_2 ∈ C,

(1.13)  \mathrm{Cov}(M(z_1), M(z_2)) \equiv E[(M(z_1) - EM(z_1))(M(z_2) - EM(z_2))]
        = \frac{\underline{m}'(z_1)\, \underline{m}'(z_2)}{(\underline{m}(z_1) - \underline{m}(z_2))^2} - \frac{1}{(z_1 - z_2)^2},

while under the assumptions in (3), EM(z) = 0, and the "covariance" function analogous to (1.13) is 1/2 of (1.13).

We show now how Theorem 1.1 follows from the above lemma. We use the identity

(1.14)  \int f(x)\, dG(x) = -\frac{1}{2\pi i} \oint f(z)\, m_G(z)\, dz,

valid for any c.d.f. G and f analytic on an open set containing the support of G. The complex integral on the right is over any positively oriented contour enclosing the support of G on which f is analytic. Choose v_0, x_r, and x_l so that f_1, . . .
, f_r are all analytic on and inside the resulting C. Due to the a.s. convergence of the extreme eigenvalues of (1/N) X_n X_n^* and the bounds

  \lambda_{\max}^{AB} \le \lambda_{\max}^A \lambda_{\max}^B,   \lambda_{\min}^{AB} \ge \lambda_{\min}^A \lambda_{\min}^B,

valid for n × n Hermitian non-negative definite A and B, we have with probability one

  \liminf_{n \to \infty} \min(x_r - \lambda_{\max}^{B_n}, \lambda_{\min}^{B_n} - x_l) > 0.

It also follows that the support of F^{c_n,H_n} is contained in

  [\lambda_{\min}^{T_n} I_{(0,1)}(c_n)(1 - \sqrt{c_n})^2,\ \lambda_{\max}^{T_n} (1 + \sqrt{c_n})^2].

Therefore, for any f ∈ \{f_1, ..., f_r\}, with probability one,

  \int f(x)\, dG_n(x) = -\frac{1}{2\pi i} \oint f(z)\, M_n(z)\, dz

for all n large, where the complex integral is over C \cup \bar{C}, with \bar{C} \equiv \{\bar{z} : z ∈ C\}. Moreover, with probability one, for all n large,

  \left| \oint f(z)(M_n(z) - \hat{M}_n(z))\, dz \right| \le 4K\epsilon_n \left( |\max(\lambda_{\max}^{T_n}(1 + \sqrt{c_n})^2, \lambda_{\max}^{B_n}) - x_r|^{-1} + |\min(\lambda_{\min}^{T_n} I_{(0,1)}(c_n)(1 - \sqrt{c_n})^2, \lambda_{\min}^{B_n}) - x_l|^{-1} \right),

which converges to zero as n → ∞. Here K is a bound on f over C. Since

  \hat{M}_n(\cdot) \longrightarrow \left( -\frac{1}{2\pi i} \oint f_1(z)\, \hat{M}_n(z)\, dz, \ldots, -\frac{1}{2\pi i} \oint f_r(z)\, \hat{M}_n(z)\, dz \right)

is a continuous mapping of C(C, R^2) into R^r, it follows that the above vector, and subsequently (1.5), form tight sequences. Letting M(\cdot) denote the limit of any weakly converging subsequence of \{\hat{M}_n(\cdot)\}, we have the weak limit of (1.5) equal in distribution to

  \left( -\frac{1}{2\pi i} \oint f_1(z)\, M(z)\, dz, \ldots, -\frac{1}{2\pi i} \oint f_r(z)\, M(z)\, dz \right).

The fact that this vector, under the assumptions in (2) or (3), is multivariate Gaussian follows from the fact that Riemann sums corresponding to these integrals are multivariate Gaussian, and that weak limits of Gaussian vectors can only be Gaussian. The limiting expressions for the mean and covariance follow immediately.

Notice the assumptions in (2) and (3) require X_{11} to have the same first, second, and fourth moments as either a real or complex Gaussian variable, the latter having real and imaginary parts i.i.d. N(0, 1/2). We will use the terms "real Gaussian" and "complex Gaussian" to refer to these conditions.
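The identity (1.14) is easy to verify numerically: for the empirical spectral distribution G of a small Hermitian matrix, m_G has simple poles at the eigenvalues with residue -1/n each, so a discretized contour integral recovers (1/n) \sum_k f(\lambda_k). A sketch (the test matrix and contour are our own choices):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 6
A = rng.standard_normal((n, n))
A = (A + A.T) / 2                       # real symmetric (Hermitian) test matrix
lam = np.linalg.eigvalsh(A)

f = np.exp                              # any function analytic near the spectrum
center = (lam.max() + lam.min()) / 2
radius = (lam.max() - lam.min()) / 2 + 1.0   # circle enclosing all eigenvalues
theta = np.linspace(0.0, 2 * np.pi, 4001)[:-1]
z = center + radius * np.exp(1j * theta)
dz = 1j * radius * np.exp(1j * theta) * (theta[1] - theta[0])
m_G = np.array([np.mean(1.0 / (lam - zz)) for zz in z])
lhs = np.mean(f(lam))                   # integral of f against dG
rhs = -np.sum(f(z) * m_G * dz) / (2j * np.pi)
print(lhs, rhs.real)                    # agree to many digits
```

The trapezoid rule on a periodic analytic integrand converges spectrally fast, which is why a plain Riemann sum over the circle suffices here.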
The reason why concrete results are at present only obtained under the assumptions in (2) and (3) is mainly due to the identity

(1.15)  E(X_{\cdot 1}^* A X_{\cdot 1} - tr\, A)(X_{\cdot 1}^* B X_{\cdot 1} - tr\, B) = (E|X_{11}|^4 - |EX_{11}^2|^2 - 2) \sum_{i=1}^n a_{ii} b_{ii} + |EX_{11}^2|^2\, tr\, AB^T + tr\, AB,

valid for n × n A = (a_{ij}) and B = (b_{ij}), which is needed in several places in the proof of Lemma 1.1. The assumptions in (3) leave only the last term on the right, whereas those in (2) leave the last two, but in this case the matrix B will always be symmetric. This also accounts for the relation between the two covariance functions, and for the difficulty in obtaining explicit results more generally: as will be seen in the proof, whenever (1.15) is used, little is known about the limiting behavior of \sum a_{ii} b_{ii}.

Simple substitution reveals

(1.16)  \text{R.H.S. of (1.7)} = -\frac{1}{2\pi^2} \oint\oint \frac{f(z(\underline{m}_1))\, g(z(\underline{m}_2))}{(\underline{m}_1 - \underline{m}_2)^2}\, d\underline{m}_1\, d\underline{m}_2.

However, the \underline{m}_1, \underline{m}_2 contours depend on the z_1, z_2 contours, and cannot be arbitrarily chosen. It is also true that

(1.17)  (1.7) = \frac{1}{\pi^2} \int\!\!\int f'(x)\, g'(y)\, \ln \left| \frac{\underline{m}(x) - \overline{\underline{m}(y)}}{\underline{m}(x) - \underline{m}(y)} \right|\, dx\, dy
        = \frac{1}{2\pi^2} \int\!\!\int f'(x)\, g'(y)\, \ln \left( 1 + \frac{4\, \underline{m}_i(x)\, \underline{m}_i(y)}{|\underline{m}(x) - \underline{m}(y)|^2} \right)\, dx\, dy

and

(1.18)  EX_f = \frac{1}{2\pi} \int f'(x)\, \arg \left( 1 - c \int \frac{t^2 \underline{m}^2(x)}{(1 + t\underline{m}(x))^2}\, dH(t) \right)\, dx.

Here, for 0 \ne x ∈ R,

(1.19)  \underline{m}(x) = \lim_{z \to x,\, z ∈ C^+} \underline{m}(z),

known to exist and to satisfy (1.2) (see Silverstein and Choi (1995)), and \underline{m}_i(x) = \Im\, \underline{m}(x). The term

  j(x) = \arg \left( 1 - c \int \frac{t^2 \underline{m}^2(x)}{(1 + t\underline{m}(x))^2}\, dH(t) \right)

in (1.18) is well defined for almost every x and takes values in (-\pi/2, \pi/2). Section 5 contains proofs of (1.17) and (1.18), along with showing

(1.20)  k(x, y) \equiv \ln \left( 1 + \frac{4\, \underline{m}_i(x)\, \underline{m}_i(y)}{|\underline{m}(x) - \underline{m}(y)|^2} \right)

to be Lebesgue integrable on R^2. It is interesting to note that the support of k(x, y) matches the support of f^{c,H} on R - \{0\}:

  k(x, y) = 0 \iff \min(f^{c,H}(x), f^{c,H}(y)) = 0.

We also have f^{c,H}(x) = 0 \implies j(x) = 0. Section 5 also contains derivations of the relevant quantities associated with the example given at the beginning of this section.
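The identity (1.15) can be spot-checked by simulation. In the real standard normal case, E|X_{11}|^4 = 3 and EX_{11}^2 = 1, so the first term on the right drops out and the right side reduces to tr AB^T + tr AB. A minimal check with our own test setup:

```python
import numpy as np

rng = np.random.default_rng(4)
n, reps = 4, 200000
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))
X = rng.standard_normal((reps, n))      # reps i.i.d. copies of the vector X_{.1}
qa = np.einsum('ri,ij,rj->r', X, A, X) - np.trace(A)
qb = np.einsum('ri,ij,rj->r', X, B, X) - np.trace(B)
lhs = np.mean(qa * qb)                  # Monte Carlo E(X*AX - tr A)(X*BX - tr B)
rhs = np.trace(A @ B.T) + np.trace(A @ B)   # (1.15) with the first coefficient = 0
print(lhs, rhs)  # agree up to Monte Carlo error
```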
The linear spectral statistic (1/n) T_N has a.s. limit d(c), as stated in (1.1). The quantity T_N - n\, d(n/N) converges weakly to a Gaussian random variable X_{\ln} with

(1.21)  EX_{\ln} = \frac{1}{2} \ln(1 - c)

and

(1.22)  \mathrm{Var}\, X_{\ln} = -2 \ln(1 - c).

Results on both T_N - ET_N and n[\int x^r\, dF^{S_N}(x) - E\int x^r\, dF^{S_N}(x)] for positive integer r are derived in Jonsson (1982). Included in Section 5 are derivations of the following expressions for the means and covariances in this case (H = I_{[1,\infty)}). We have

(1.23)  EX_{x^r} = \frac{1}{4} \left( (1 - \sqrt{c})^{2r} + (1 + \sqrt{c})^{2r} \right) - \frac{1}{2} \sum_{j=0}^r \binom{r}{j}^2 c^j

and

(1.24)  \mathrm{Cov}(X_{x^{r_1}}, X_{x^{r_2}}) = 2 c^{r_1 + r_2} \sum_{k_1=0}^{r_1-1} \sum_{k_2=0}^{r_2} \left( \frac{1 - c}{c} \right)^{k_1 + k_2} \sum_{\ell=1}^{r_1 - k_1} \ell \binom{r_1}{k_1} \binom{r_2}{k_2} \binom{2r_1 - 1 - (k_1 + \ell)}{r_1 - 1} \binom{2r_2 - 1 - k_2 + \ell}{r_2 - 1}.

It is noteworthy to mention here a consequence of (1.17), namely that if the assumptions in (2) or (3) of Theorem 1.1 were to hold, then G_n, considered as a random element in D[0, ∞) (the space of functions on [0, ∞) that are right-continuous with left-hand limits, together with the Skorohod metric), cannot form a tight sequence in D[0, ∞). Indeed, under either set of assumptions, if G(x) were a weak limit of a subsequence then, because of Theorem 1.1, it is straightforward to conclude that for any x_0 in the interior of the support of F^{c,H} and positive \epsilon,

  \int_{x_0}^{x_0 + \epsilon} G(x)\, dx

would be Gaussian, and therefore so would

  G(x_0) = \lim_{\epsilon \to 0} \frac{1}{\epsilon} \int_{x_0}^{x_0 + \epsilon} G(x)\, dx.

However, the variance would necessarily be

  \lim_{\epsilon \to 0} \frac{1}{2\pi^2 \epsilon^2} \int_{x_0}^{x_0 + \epsilon} \int_{x_0}^{x_0 + \epsilon} k(x, y)\, dx\, dy = \infty.

Still, under the assumptions in (2) or (3), a limit may exist for \{G_n\} when G_n is viewed as a linear functional

  f \longrightarrow \int f(x)\, dG_n(x),

that is, a limit expressed in terms of a measure in a space of generalized functions. The characterization of the limiting measure of course depends on the space, which in turn relies on the set of test functions, which for now is restricted to functions analytic on the support of F^{c,H}. Work in this area is currently being pursued.
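The normal limit (1.21)-(1.22) for T_N lends itself to a direct Monte Carlo check (simulation parameters are our own):

```python
import numpy as np

rng = np.random.default_rng(5)
n, N, reps = 200, 500, 400
c = n / N
d_c = (c - 1) / c * np.log(1 - c) - 1   # d(c_n) from (1.1)
vals = np.empty(reps)
for i in range(reps):
    X = rng.standard_normal((n, N))
    _, logdet = np.linalg.slogdet(X @ X.T / N)
    vals[i] = logdet - n * d_c          # T_N - n d(c_n)
print(np.mean(vals), 0.5 * np.log(1 - c))    # mean near (1/2) ln(1 - c)
print(np.var(vals), -2.0 * np.log(1 - c))    # variance near -2 ln(1 - c)
```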
The proof of Lemma 1.1 is divided into three sections. Sections 2 and 3 handle the limiting behavior of the centralized M_n, while Section 4 analyzes the non-random part. In each of the three sections the reader will be referred to work done in Bai and Silverstein (1998).

2. Convergence of finite dimensional distributions. Write, for z ∈ C_n,

  M_n(z) = M_n^1(z) + M_n^2(z),

where

  M_n^1(z) = n[m_{F^{B_n}}(z) - Em_{F^{B_n}}(z)]  and  M_n^2(z) = n[Em_{F^{B_n}}(z) - m_{F^{c_n,H_n}}(z)].

In this section we will show, for any positive integer r, that the sum

  \sum_{i=1}^r \alpha_i M_n^1(z_i)   (\Im z_i \ne 0),

whenever it is real, is tight, and, under the assumptions in (2) or (3) of Theorem 1.1, converges in distribution to a Gaussian random variable. Formula (1.13) will also be derived. We begin with a list of results.

Lemma 2.1 [Burkholder (1973)]. Let \{X_k\} be a complex martingale difference sequence with respect to the increasing \sigma-fields \{F_k\}. Then, for p > 1,

  E\left| \sum X_k \right|^p \le K_p\, E\left( \sum |X_k|^2 \right)^{p/2}.

(Note: The reference considers only real variables. Extending to complex variables is straightforward.)

Lemma 2.2 [Lemma 2.7 in Bai and Silverstein (1998)]. For X = (X_1, ..., X_n)^T with i.i.d. standardized (complex) entries and C an n × n (complex) matrix, we have for any p ≥ 2

  E|X^* C X - tr\, C|^p \le K_p \left( (E|X_1|^4\, tr\, CC^*)^{p/2} + E|X_1|^{2p}\, tr\, (CC^*)^{p/2} \right).

Lemma 2.3. Let f_1, f_2, ... be analytic in D, a connected open set of C, satisfying |f_n(z)| ≤ M for every n and z in D, with f_n(z) converging, as n → ∞, for each z in a subset of D having a limit point in D. Then there exists a function f, analytic in D, for which f_n(z) → f(z) and f_n'(z) → f'(z) for all z ∈ D. Moreover, on any set bounded by a contour interior to D, the convergence is uniform and \{f_n'(z)\} is uniformly bounded.

Proof. The conclusions on \{f_n\} follow from Vitali's convergence theorem (see Titchmarsh (1939), p. 168). Those on \{f_n'\} follow from the dominated convergence theorem (d.c.t.) and the identity

  f_n'(z) = \frac{1}{2\pi i} \oint_C \frac{f_n(w)}{(w - z)^2}\, dw.
Lemma 2.4 [Theorem 35.12 of Billingsley (1995)]. Suppose for each n that Y_{n1}, Y_{n2}, ..., Y_{nr_n} is a real martingale difference sequence with respect to the increasing \sigma-fields \{F_{nj}\}, having second moments. If, as n → ∞,

(i)  \sum_{j=1}^{r_n} E(Y_{nj}^2 \mid F_{n,j-1}) \to \sigma^2 in probability,

where \sigma^2 is a positive constant, and, for each \epsilon > 0,

(ii)  \sum_{j=1}^{r_n} E\left( Y_{nj}^2 I_{(|Y_{nj}| \ge \epsilon)} \right) \to 0,

then

  \sum_{j=1}^{r_n} Y_{nj} \to_D N(0, \sigma^2).

Recalling the truncation and centralization steps, we get from Lemma 2.2

(2.1)  E|X_{\cdot 1}^* C X_{\cdot 1} - tr\, C|^p \le K_p \|C\|^p [n^{p/2} + \delta_n^{2p-4} n^{p-1}] \le K_p \|C\|^p \delta_n^{2p-4} n^{p-1},   p \ge 2.

Let v = \Im z. For the following analysis we will assume v > 0. To facilitate notation, we will let T = T_n. Because of assumption 2') we may assume \|T\| ≤ 1 for all n. Constants appearing in inequalities will be denoted by K and may take on different values from one expression to the next. Let

  r_j = (1/\sqrt{N})\, T^{1/2} X_{\cdot j},   D(z) = B_n - zI,   D_j(z) = D(z) - r_j r_j^*,

  \epsilon_j(z) = r_j^* D_j^{-1}(z) r_j - \frac{1}{N}\, tr\, T D_j^{-1}(z),   \delta_j(z) = r_j^* D_j^{-2}(z) r_j - \frac{1}{N}\, tr\, T D_j^{-2}(z) = \frac{d}{dz} \epsilon_j(z),

and

  \beta_j(z) = \frac{1}{1 + r_j^* D_j^{-1}(z) r_j},   \bar{\beta}_j(z) = \frac{1}{1 + N^{-1}\, tr\, T D_j^{-1}(z)},   b_n(z) = \frac{1}{1 + N^{-1} E\, tr\, T D_1^{-1}(z)}.

All three of the latter quantities are bounded in absolute value by |z|/v (see (3.4) of Bai and Silverstein (1998)). We have

  D^{-1}(z) - D_j^{-1}(z) = -D_j^{-1}(z) r_j r_j^* D_j^{-1}(z) \beta_j(z),

and from Lemma 2.10 of Bai and Silverstein (1998), for any n × n A,

(2.2)  |tr\, (D^{-1}(z) - D_j^{-1}(z)) A| \le \frac{\|A\|}{\Im z}.

For nonrandom n × n A_k, k = 1, ..., p, and B_l, l = 1, ..., q, we shall establish the following inequality:

(2.3)  \left| E\left( \prod_{k=1}^p r_1^* A_k r_1 \prod_{l=1}^q (r_1^* B_l r_1 - N^{-1}\, tr\, T B_l) \right) \right| \le K N^{-(1 \wedge q)} \delta_n^{(2q-4) \vee 0} \prod_{k=1}^p \|A_k\| \prod_{l=1}^q \|B_l\|,   p \ge 0,\ q \ge 0.

When p = 0, q = 1, the left side is 0. When p = 0, q > 1, (2.3) is a consequence of (2.1) and Hölder's inequality.
If p ≥ 1, then by induction on p we have

  \left| E\left( \prod_{k=1}^p r_1^* A_k r_1 \prod_{l=1}^q (r_1^* B_l r_1 - N^{-1}\, tr\, T B_l) \right) \right|
  \le \left| E\left( \prod_{k=1}^{p-1} r_1^* A_k r_1\, (r_1^* A_p r_1 - N^{-1}\, tr\, T A_p) \prod_{l=1}^q (r_1^* B_l r_1 - N^{-1}\, tr\, T B_l) \right) \right|
  + n N^{-1} \|A_p\| \left| E\left( \prod_{k=1}^{p-1} r_1^* A_k r_1 \prod_{l=1}^q (r_1^* B_l r_1 - N^{-1}\, tr\, T B_l) \right) \right|
  \le K N^{-1} \delta_n^{(2q-4) \vee 0} \prod_{k=1}^p \|A_k\| \prod_{l=1}^q \|B_l\|.

We have proved the case q > 0. When q = 0, (2.3) is a trivial consequence of (2.1).

Let E_0(\cdot) denote expectation and E_j(\cdot) denote conditional expectation with respect to the \sigma-field generated by r_1, ..., r_j. We have

  n[m_{F^{B_n}}(z) - Em_{F^{B_n}}(z)] = tr\, [D^{-1}(z) - ED^{-1}(z)] = \sum_{j=1}^N \left( tr\, E_j D^{-1}(z) - tr\, E_{j-1} D^{-1}(z) \right)
  = \sum_{j=1}^N \left( tr\, E_j [D^{-1}(z) - D_j^{-1}(z)] - tr\, E_{j-1} [D^{-1}(z) - D_j^{-1}(z)] \right)
  = -\sum_{j=1}^N (E_j - E_{j-1})\, \beta_j(z)\, r_j^* D_j^{-2}(z) r_j.

Write

  \beta_j(z) = \bar{\beta}_j(z) - \beta_j(z) \bar{\beta}_j(z) \epsilon_j(z) = \bar{\beta}_j(z) - \bar{\beta}_j^2(z) \epsilon_j(z) + \bar{\beta}_j^2(z) \beta_j(z) \epsilon_j^2(z).

We have then

  (E_j - E_{j-1})\, \beta_j(z)\, r_j^* D_j^{-2}(z) r_j
  = (E_j - E_{j-1}) \left( \bar{\beta}_j(z) \delta_j(z) - \bar{\beta}_j^2(z) \epsilon_j(z) \delta_j(z) - \bar{\beta}_j^2(z) \epsilon_j(z) \frac{1}{N}\, tr\, T D_j^{-2}(z) + \bar{\beta}_j^2(z) \beta_j(z) \epsilon_j^2(z)\, r_j^* D_j^{-2}(z) r_j \right)
  = E_j \left( \bar{\beta}_j(z) \delta_j(z) - \bar{\beta}_j^2(z) \epsilon_j(z) \frac{1}{N}\, tr\, T D_j^{-2}(z) \right) - (E_j - E_{j-1})\, \bar{\beta}_j^2(z) \left( \epsilon_j(z) \delta_j(z) - \beta_j(z)\, r_j^* D_j^{-2}(z) r_j\, \epsilon_j^2(z) \right).

Using (2.3), we have

  E\left| \sum_{j=1}^N (E_j - E_{j-1})\, \bar{\beta}_j^2(z) \epsilon_j(z) \delta_j(z) \right|^2 = \sum_{j=1}^N E\left| (E_j - E_{j-1})\, \bar{\beta}_j^2(z) \epsilon_j(z) \delta_j(z) \right|^2 \le 4 \sum_{j=1}^N E|\bar{\beta}_j^2(z) \epsilon_j(z) \delta_j(z)|^2 = o(1).

Therefore, \sum_{j=1}^N (E_j - E_{j-1}) \bar{\beta}_j^2(z) \epsilon_j(z) \delta_j(z) converges to zero in probability. By the same argument, we have

  \sum_{j=1}^N (E_j - E_{j-1})\, \bar{\beta}_j^2(z) \beta_j(z)\, r_j^* D_j^{-2}(z) r_j\, \epsilon_j^2(z) \to 0 in probability.

Therefore we need only consider the sum

  \sum_{i=1}^r \alpha_i \sum_{j=1}^N Y_j(z_i) = \sum_{j=1}^N \sum_{i=1}^r \alpha_i Y_j(z_i),

where

  Y_j(z) = -E_j \left( \bar{\beta}_j(z) \delta_j(z) - \bar{\beta}_j^2(z) \epsilon_j(z) \frac{1}{N}\, tr\, T D_j^{-2}(z) \right) = -E_j \frac{d}{dz}\, \bar{\beta}_j(z) \epsilon_j(z).

Again using (2.3), we obtain

  E|Y_j(z)|^4 \le K \left( \frac{|z|^4}{v^4} E|\delta_j(z)|^4 + \frac{|z|^8}{v^{16}} \left( \frac{n}{N} \right)^4 E|\epsilon_j(z)|^4 \right) = o(N^{-1}),

which implies, for any \epsilon > 0,

  \sum_{j=1}^N E\left( \left| \sum_{i=1}^r \alpha_i Y_j(z_i) \right|^2 I_{(|\sum_{i=1}^r \alpha_i Y_j(z_i)| \ge \epsilon)} \right) \le \frac{1}{\epsilon^2} \sum_{j=1}^N E\left| \sum_{i=1}^r \alpha_i Y_j(z_i) \right|^4 \to 0

as n → ∞.
Therefore condition (ii) of Lemma 2.4 is satisfied, and it is enough to prove, under the assumptions in (2) or (3), that for z_1, z_2 ∈ C^+

(2.4)  \sum_{j=1}^N E_{j-1}[Y_j(z_1) Y_j(z_2)]

converges in probability to a constant (and to determine the constant).

We show here, for future use, the tightness of the sequence \{\sum_{i=1}^r \alpha_i M_n^1(z_i)\}. From (2.3) we easily get E|Y_j(z)|^2 = O(N^{-1}), so that

(2.5)  E\left| \sum_{j=1}^N \sum_{i=1}^r \alpha_i Y_j(z_i) \right|^2 = \sum_{j=1}^N E\left| \sum_{i=1}^r \alpha_i Y_j(z_i) \right|^2 \le r \sum_{j=1}^N \sum_{i=1}^r |\alpha_i|^2 E|Y_j(z_i)|^2 \le K.

Consider the sum

(2.6)  \sum_{j=1}^N E_{j-1}\left[ E_j(\bar{\beta}_j(z_1) \epsilon_j(z_1))\, E_j(\bar{\beta}_j(z_2) \epsilon_j(z_2)) \right].

In the j-th term (viewed as an expectation with respect to r_{j+1}, ..., r_N) we apply the d.c.t. to the difference quotient defined by \bar{\beta}_j(z) \epsilon_j(z) to get

  \frac{\partial^2}{\partial z_2\, \partial z_1} (2.6) = (2.4).

Let v_0 be a lower bound on \Im z_i. For each j, let A_{ij} = (1/N) T^{1/2} E_j D_j^{-1}(z_i) T^{1/2}, i = 1, 2. Then tr\, A_{ij} A_{ij}^* \le n (v_0 N)^{-2}. Using (2.1) we see, therefore, that (2.6) is bounded. We can then appeal to Lemma 2.3. Suppose (2.6) converges in probability for each pair z_k, z_l ∈ \{z_i\}, bounded away from the imaginary axis and having a limit point. Then, by a diagonalization argument, for any subsequence of the natural numbers there is a further subsequence such that, with probability one, (2.6) converges for each pair z_k, z_l. Applying Lemma 2.3 twice, we see that almost surely (2.4) will converge on the subsequence for each pair. That is enough to imply convergence in probability of (2.4). Therefore we need only show that (2.6) converges in probability.

From the derivation above (4.3) of Bai and Silverstein (1998) we get

  E|\bar{\beta}_j(z_i) - b_n(z_i)|^2 \le K \frac{|z_i|^4}{v_0^6} N^{-1}.

This implies

  E\left| E_{j-1}[E_j(\bar{\beta}_j(z_1) \epsilon_j(z_1))\, E_j(\bar{\beta}_j(z_2) \epsilon_j(z_2))] - E_{j-1}[E_j(b_n(z_1) \epsilon_j(z_1))\, E_j(b_n(z_2) \epsilon_j(z_2))] \right| = O(N^{-3/2}),

from which we get

  \sum_{j=1}^N E_{j-1}[E_j(\bar{\beta}_j(z_1) \epsilon_j(z_1))\, E_j(\bar{\beta}_j(z_2) \epsilon_j(z_2))] - b_n(z_1) b_n(z_2) \sum_{j=1}^N E_{j-1}[E_j(\epsilon_j(z_1))\, E_j(\epsilon_j(z_2))] \to 0 in probability.
Thus the goal is to show that
\[
(2.7)\qquad b_n(z_1)b_n(z_2)\sum_{j=1}^{N}E_{j-1}\big[E_j(\varepsilon_j(z_1))E_j(\varepsilon_j(z_2))\big]
\]
converges in probability, and to determine its limit. The latter's second mixed partial derivative will yield the limit of (2.4).

We now assume the "complex Gaussian" case, namely $EX_{11}^2=o(1/n)$ and $E|X_{11}|^4=2+o(1)$, so that, using (1.15), (2.7) becomes
\[
b_n(z_1)b_n(z_2)\frac{1}{N^2}\sum_{j=1}^{N}\big(\mathrm{tr}\,T^{1/2}E_j(D_j^{-1}(z_1))TE_j(D_j^{-1}(z_2))T^{1/2}+o(1)A_n\big),
\]
where
\[
|A_n|\le K\Big(\mathrm{tr}\,TE_j(D_j^{-1}(z_1))TE_j(\overline{D_j^{-1}(z_1)})\;\mathrm{tr}\,TE_j(D_j^{-1}(z_2))TE_j(\overline{D_j^{-1}(z_2)})\Big)^{1/2}=O(N).
\]
Thus we study
\[
(2.8)\qquad b_n(z_1)b_n(z_2)\frac{1}{N^2}\sum_{j=1}^{N}\mathrm{tr}\,T^{1/2}E_j(D_j^{-1}(z_1))TE_j(D_j^{-1}(z_2))T^{1/2}.
\]
The limit in the "real Gaussian" case ($T_n$, $X_{11}$ real, $E|X_{11}|^4=3+o(1)$) will be double that of the limit of (2.8).

Let
\[
D_{ij}(z)=D(z)-r_ir_i^*-r_jr_j^*,\qquad
\beta_{ij}(z)=\frac{1}{1+r_i^*D_{ij}^{-1}(z)r_i},\qquad
b_1(z)=\frac{1}{1+N^{-1}E\,\mathrm{tr}\,TD_{12}^{-1}(z)}.
\]
We write
\[
D_j(z_1)+z_1I-\frac{N-1}{N}b_1(z_1)T=\sum_{i\ne j}r_ir_i^*-\frac{N-1}{N}b_1(z_1)T.
\]
Multiplying by $(z_1I-\frac{N-1}{N}b_1(z_1)T)^{-1}$ on the left, $D_j^{-1}(z_1)$ on the right, and using
\[
r_i^*D_j^{-1}(z_1)=\beta_{ij}(z_1)r_i^*D_{ij}^{-1}(z_1),
\]
we get
\[
D_j^{-1}(z_1)=-\Big(z_1I-\frac{N-1}{N}b_1(z_1)T\Big)^{-1}
+\sum_{i\ne j}\beta_{ij}(z_1)\Big(z_1I-\frac{N-1}{N}b_1(z_1)T\Big)^{-1}r_ir_i^*D_{ij}^{-1}(z_1)
-\frac{N-1}{N}b_1(z_1)\Big(z_1I-\frac{N-1}{N}b_1(z_1)T\Big)^{-1}TD_j^{-1}(z_1)
\]
\[
(2.9)\qquad=-\Big(z_1I-\frac{N-1}{N}b_1(z_1)T\Big)^{-1}+b_1(z_1)A(z_1)+B(z_1)+C(z_1),
\]
where
\[
A(z_1)=\sum_{i\ne j}\Big(z_1I-\frac{N-1}{N}b_1(z_1)T\Big)^{-1}(r_ir_i^*-N^{-1}T)D_{ij}^{-1}(z_1),
\]
\[
B(z_1)=\sum_{i\ne j}(\beta_{ij}(z_1)-b_1(z_1))\Big(z_1I-\frac{N-1}{N}b_1(z_1)T\Big)^{-1}r_ir_i^*D_{ij}^{-1}(z_1),
\]
and
\[
C(z_1)=N^{-1}b_1(z_1)\Big(z_1I-\frac{N-1}{N}b_1(z_1)T\Big)^{-1}T\sum_{i\ne j}\big(D_{ij}^{-1}(z_1)-D_j^{-1}(z_1)\big).
\]
It is easy to verify, for any real $t$,
\[
\Big|1-\frac{t}{z(1+N^{-1}E\,\mathrm{tr}\,TD_{12}^{-1}(z))}\Big|^{-1}
\le\frac{|z(1+N^{-1}E\,\mathrm{tr}\,TD_{12}^{-1}(z))|}{\mathrm{Im}\,z(1+N^{-1}E\,\mathrm{tr}\,TD_{12}^{-1}(z))}
\le\frac{|z|(1+n/(Nv_0))}{v_0}.
\]
Thus
\[
(2.10)\qquad\Big\|\Big(z_1I-\frac{N-1}{N}b_1(z_1)T\Big)^{-1}\Big\|\le\frac{1+n/(Nv_0)}{v_0}.
\]
Let $M$ be $n\times n$ and let $|||M|||$ denote a nonrandom bound on the spectral norm of $M$ for all parameters governing $M$ and under all realizations of $M$. From (4.3) of Bai and Silverstein (1998), (2.3), and (2.10) we get
\[
(2.11)\qquad E|\mathrm{tr}\,B(z_1)M|
\le NE^{1/2}\big(|\beta_{12}(z_1)-b_1(z_1)|^2\big)\,E^{1/2}\Big(\Big|r_i^*D_{ij}^{-1}(z_1)M\Big(z_1I-\frac{N-1}{N}b_1(z_1)T\Big)^{-1}r_i\Big|^2\Big)
\le\frac{|z_1|^2(1+n/(Nv_0))}{v_0^5}K|||M|||\,N^{1/2}.
\]
From (2.2) we have
\[
(2.12)\qquad|\mathrm{tr}\,C(z_1)M|\le|||M|||\,\frac{|z_1|(1+n/(Nv_0))}{v_0^3}.
\]
From (2.3) and (2.10) we get, for $M$ nonrandom,
\[
(2.13)\qquad E|\mathrm{tr}\,A(z_1)M|
\le KE^{1/2}\,\mathrm{tr}\,T^{1/2}D_{ij}^{-1}(z_1)M\Big(z_1I-\tfrac{N-1}{N}b_1(z_1)T\Big)^{-1}T\Big(\bar z_1I-\tfrac{N-1}{N}b_1(\bar z_1)T\Big)^{-1}M^*D_{ij}^{-1}(\bar z_1)T^{1/2}
\le K\|M\|\,\frac{1+n/(Nv_0)}{v_0^2}N^{1/2}.
\]
We write (using the identity above (2.2))
\[
(2.14)\qquad\mathrm{tr}\,E_j(A(z_1))TD_j^{-1}(z_2)T=A_1(z_1,z_2)+A_2(z_1,z_2)+A_3(z_1,z_2),
\]
where
\[
A_1(z_1,z_2)=-\,\mathrm{tr}\sum_{i<j}\Big(z_1I-\frac{N-1}{N}b_1(z_1)T\Big)^{-1}r_ir_i^*E_j(D_{ij}^{-1}(z_1))T\beta_{ij}(z_2)D_{ij}^{-1}(z_2)r_ir_i^*D_{ij}^{-1}(z_2)T
\]
\[
=-\sum_{i<j}\beta_{ij}(z_2)\,r_i^*E_j(D_{ij}^{-1}(z_1))TD_{ij}^{-1}(z_2)r_i\;r_i^*D_{ij}^{-1}(z_2)T\Big(z_1I-\frac{N-1}{N}b_1(z_1)T\Big)^{-1}r_i,
\]
\[
A_2(z_1,z_2)=-\,\mathrm{tr}\sum_{i<j}\Big(z_1I-\frac{N-1}{N}b_1(z_1)T\Big)^{-1}N^{-1}TE_j(D_{ij}^{-1}(z_1))T\big(D_j^{-1}(z_2)-D_{ij}^{-1}(z_2)\big)T,
\]
and
\[
A_3(z_1,z_2)=\mathrm{tr}\sum_{i<j}\Big(z_1I-\frac{N-1}{N}b_1(z_1)T\Big)^{-1}(r_ir_i^*-N^{-1}T)E_j(D_{ij}^{-1}(z_1))TD_{ij}^{-1}(z_2)T.
\]
We get from (2.2) and (2.10)
\[
(2.15)\qquad|A_2(z_1,z_2)|\le\frac{1+n/(Nv_0)}{v_0^2},
\]
and similar to (2.11) we have
\[
E|A_3(z_1,z_2)|\le\frac{1+n/(Nv_0)}{v_0^3}N^{1/2}.
\]
Using (2.1), (2.3), and (4.3) of Bai and Silverstein (1998) we have, for $i<j$,
\[
E\Big|\beta_{ij}(z_2)\,r_i^*E_j(D_{ij}^{-1}(z_1))TD_{ij}^{-1}(z_2)r_i\;r_i^*D_{ij}^{-1}(z_2)T\Big(z_1I-\frac{N-1}{N}b_1(z_1)T\Big)^{-1}r_i
\]
\[
-\,b_1(z_2)N^{-2}\,\mathrm{tr}\big(E_j(D_{ij}^{-1}(z_1))TD_{ij}^{-1}(z_2)T\big)\,\mathrm{tr}\Big(D_{ij}^{-1}(z_2)T\Big(z_1I-\frac{N-1}{N}b_1(z_1)T\Big)^{-1}T\Big)\Big|\le KN^{-1/2}
\]
($K$ now depending as well on the $z_i$ and on $n/N$). Using (2.2) we have
\[
\Big|\mathrm{tr}\big(E_j(D_{ij}^{-1}(z_1))TD_{ij}^{-1}(z_2)T\big)\,\mathrm{tr}\Big(D_{ij}^{-1}(z_2)T\Big(z_1I-\frac{N-1}{N}b_1(z_1)T\Big)^{-1}T\Big)
\]
\[
-\,\mathrm{tr}\big(E_j(D_j^{-1}(z_1))TD_j^{-1}(z_2)T\big)\,\mathrm{tr}\Big(D_j^{-1}(z_2)T\Big(z_1I-\frac{N-1}{N}b_1(z_1)T\Big)^{-1}T\Big)\Big|\le KN.
\]
It follows that
\[
(2.16)\qquad E\Big|A_1(z_1,z_2)+\frac{j-1}{N^2}b_1(z_2)\,\mathrm{tr}\big(E_j(D_j^{-1}(z_1))TD_j^{-1}(z_2)T\big)\,\mathrm{tr}\Big(D_j^{-1}(z_2)T\Big(z_1I-\frac{N-1}{N}b_1(z_1)T\Big)^{-1}T\Big)\Big|\le KN^{1/2}.
\]
Therefore, from (2.9)--(2.16) we can write
\[
\mathrm{tr}\big(E_j(D_j^{-1}(z_1))TD_j^{-1}(z_2)T\big)\Big[1+\frac{j-1}{N^2}b_1(z_1)b_1(z_2)\,\mathrm{tr}\Big(D_j^{-1}(z_2)T\Big(z_1I-\frac{N-1}{N}b_1(z_1)T\Big)^{-1}T\Big)\Big]
\]
\[
=-\,\mathrm{tr}\Big(\Big(z_1I-\frac{N-1}{N}b_1(z_1)T\Big)^{-1}TD_j^{-1}(z_2)T\Big)+A_4(z_1,z_2),
\]
where $E|A_4(z_1,z_2)|\le KN^{1/2}$. Using the expression for $D_j^{-1}(z_2)$ in (2.9) and (2.11)--(2.13) we find that
\[
\mathrm{tr}\big(E_j(D_j^{-1}(z_1))TD_j^{-1}(z_2)T\big)\Big[1-\frac{j-1}{N^2}b_1(z_1)b_1(z_2)\,\mathrm{tr}\Big(\Big(z_2I-\frac{N-1}{N}b_1(z_2)T\Big)^{-1}T\Big(z_1I-\frac{N-1}{N}b_1(z_1)T\Big)^{-1}T\Big)\Big]
\]
\[
=\mathrm{tr}\Big(\Big(z_2I-\frac{N-1}{N}b_1(z_2)T\Big)^{-1}T\Big(z_1I-\frac{N-1}{N}b_1(z_1)T\Big)^{-1}T\Big)+A_5(z_1,z_2),
\]
where $E|A_5(z_1,z_2)|\le KN^{1/2}$.

From (2.2) we have $|b_1(z)-b_n(z)|\le KN^{-1}$. From (4.3) of Bai and Silverstein (1998) we have $|b_n(z)-E\beta_1(z)|\le KN^{-1/2}$. From the formula
\[
m_n=-\frac{1}{zN}\sum_{j=1}^{N}\beta_j(z)
\]
[(2.2) of Silverstein (1995)] we get $E\beta_1(z)=-zEm_n(z)$. Section 5 of Bai and Silverstein (1998) proves that $|Em_n(z)-m_n^0(z)|\le KN^{-1}$. Therefore we have
\[
(2.17)\qquad|b_1(z)+zm_n^0(z)|\le KN^{-1/2},
\]
so that we can write
\[
(2.18)\qquad\mathrm{tr}\big(E_j(D_j^{-1}(z_1))TD_j^{-1}(z_2)T\big)\Big[1-\frac{j-1}{N^2}m_n^0(z_1)m_n^0(z_2)\,\mathrm{tr}\big((I+m_n^0(z_2)T)^{-1}T(I+m_n^0(z_1)T)^{-1}T\big)\Big]
\]
\[
=\frac{1}{z_1z_2}\,\mathrm{tr}\big((I+m_n^0(z_2)T)^{-1}T(I+m_n^0(z_1)T)^{-1}T\big)+A_6(z_1,z_2),
\]
where $E|A_6(z_1,z_2)|\le KN^{1/2}$. Rewrite (2.18) as
\[
\mathrm{tr}\big(E_j(D_j^{-1}(z_1))TD_j^{-1}(z_2)T\big)\Big[1-\frac{j-1}{N}c_nm_n^0(z_1)m_n^0(z_2)\int\frac{t^2\,dH_n(t)}{(1+tm_n^0(z_1))(1+tm_n^0(z_2))}\Big]
=\frac{Nc_n}{z_1z_2}\int\frac{t^2\,dH_n(t)}{(1+tm_n^0(z_1))(1+tm_n^0(z_2))}+A_6(z_1,z_2).
\]
Using (3.9) and (3.16) in Bai and Silverstein (1998) we find
\[
(2.19)\qquad\Big|c_nm_n^0(z_1)m_n^0(z_2)\int\frac{t^2\,dH_n(t)}{(1+tm_n^0(z_1))(1+tm_n^0(z_2))}\Big|
=\Bigg|\frac{c_n\int\frac{t^2\,dH_n(t)}{(1+tm_n^0(z_1))(1+tm_n^0(z_2))}}
{\Big(-z_1+c_n\int\frac{t\,dH_n(t)}{1+tm_n^0(z_1)}\Big)\Big(-z_2+c_n\int\frac{t\,dH_n(t)}{1+tm_n^0(z_2)}\Big)}\Bigg|
\]
\[
\le\Bigg(\frac{c_n\int\frac{t^2\,dH_n(t)}{|1+tm_n^0(z_1)|^2}}{\big|-z_1+c_n\int\frac{t\,dH_n(t)}{1+tm_n^0(z_1)}\big|^2}\Bigg)^{1/2}
\Bigg(\frac{c_n\int\frac{t^2\,dH_n(t)}{|1+tm_n^0(z_2)|^2}}{\big|-z_2+c_n\int\frac{t\,dH_n(t)}{1+tm_n^0(z_2)}\big|^2}\Bigg)^{1/2}
\]
\[
=\Bigg(\frac{\mathrm{Im}\,m_n^0(z_1)\,c_n\int\frac{t^2\,dH_n(t)}{|1+tm_n^0(z_1)|^2}}{\mathrm{Im}\,z_1+\mathrm{Im}\,m_n^0(z_1)\,c_n\int\frac{t^2\,dH_n(t)}{|1+tm_n^0(z_1)|^2}}\Bigg)^{1/2}
\Bigg(\frac{\mathrm{Im}\,m_n^0(z_2)\,c_n\int\frac{t^2\,dH_n(t)}{|1+tm_n^0(z_2)|^2}}{\mathrm{Im}\,z_2+\mathrm{Im}\,m_n^0(z_2)\,c_n\int\frac{t^2\,dH_n(t)}{|1+tm_n^0(z_2)|^2}}\Bigg)^{1/2}<1,
\]
since
\[
\frac{\mathrm{Im}\,z}{\mathrm{Im}\,m_n^0(z)\,c_n\int\frac{t^2\,dH_n(t)}{|1+tm_n^0(z)|^2}}
\]
is bounded away from 0.

Therefore, using (2.17) and letting $a_n(z_1,z_2)$ denote the expression inside the absolute value sign in (2.19), we find that (2.8) can be written as
\[
\frac{1}{N}\sum_{j=1}^{N}\frac{a_n(z_1,z_2)}{1-\frac{j-1}{N}a_n(z_1,z_2)}+A_7(z_1,z_2),
\]
where $E|A_7(z_1,z_2)|\le KN^{-1/2}$. We see then that
\[
(2.8)\xrightarrow{i.p.}\int_0^1\frac{a(z_1,z_2)}{1-ta(z_1,z_2)}\,dt=\int_0^{a(z_1,z_2)}\frac{1}{1-z}\,dz,
\]
where
\[
a(z_1,z_2)=cm(z_1)m(z_2)\int\frac{t^2\,dH(t)}{(1+tm(z_1))(1+tm(z_2))}
=\frac{m(z_1)m(z_2)}{m(z_2)-m(z_1)}\Big(c\int\frac{t\,dH(t)}{1+tm(z_1)}-c\int\frac{t\,dH(t)}{1+tm(z_2)}\Big)
=1+\frac{m(z_1)m(z_2)(z_1-z_2)}{m(z_2)-m(z_1)}.
\]
Thus the i.p. limit of (2.4) under the "complex Gaussian" case is
\[
\frac{\partial^2}{\partial z_2\,\partial z_1}\int_0^{a(z_1,z_2)}\frac{1}{1-z}\,dz
=\frac{\partial}{\partial z_2}\Bigg(\frac{\frac{\partial}{\partial z_1}a(z_1,z_2)}{1-a(z_1,z_2)}\Bigg)
\]
\[
=\frac{\partial}{\partial z_2}\Bigg[\frac{(m(z_2)-m(z_1))\big(m'(z_1)m(z_2)(z_1-z_2)+m(z_1)m(z_2)\big)+m(z_1)m(z_2)(z_1-z_2)m'(z_1)}{(m(z_2)-m(z_1))^2}
\times\frac{m(z_2)-m(z_1)}{m(z_1)m(z_2)(z_2-z_1)}\Bigg]
\]
\[
=-\frac{\partial}{\partial z_2}\Bigg(\frac{m'(z_1)}{m(z_1)}+\frac{1}{z_1-z_2}+\frac{m'(z_1)}{m(z_2)-m(z_1)}\Bigg)
=\frac{m'(z_1)m'(z_2)}{(m(z_2)-m(z_1))^2}-\frac{1}{(z_1-z_2)^2},
\]
which is (1.13).

3. Tightness of $M_n^1(z)$. We proceed to prove tightness of the sequence of random functions $\widehat M_n^1(z)$ for $z\in\mathcal{C}$ defined by (1.11). We will use Theorem 12.3 (p. 96) of Billingsley (1968).
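As a numerical aside (ours, not part of the paper) to the Section 2 computation above: in the Marchenko--Pastur case $H=\delta_1$, $m(z)$ is a root of $zm^2+(z+1-c)m+1=0$, and the closed form $a(z_1,z_2)=1+m(z_1)m(z_2)(z_1-z_2)/(m(z_2)-m(z_1))$ can be checked against the defining integral. A minimal Python sketch (function names are ours):

```python
import cmath

def m_mp(z, c):
    # Root of z m^2 + (z + 1 - c) m + 1 = 0, i.e. of z = -1/m + c/(1+m)
    # (H = point mass at 1), with the branch Im m > 0 when Im z > 0.
    r = cmath.sqrt((z + 1 - c) ** 2 - 4 * z)
    m = (-(z + 1 - c) + r) / (2 * z)
    if m.imag * z.imag < 0:
        m = (-(z + 1 - c) - r) / (2 * z)
    return m

def a_integral(z1, z2, c):
    # a(z1,z2) = c m1 m2 * t^2/((1+t m1)(1+t m2)) evaluated at t = 1
    m1, m2 = m_mp(z1, c), m_mp(z2, c)
    return c * m1 * m2 / ((1 + m1) * (1 + m2))

def a_closed(z1, z2, c):
    # claimed closed form: 1 + m1 m2 (z1 - z2)/(m2 - m1)
    m1, m2 = m_mp(z1, c), m_mp(z2, c)
    return 1 + m1 * m2 * (z1 - z2) / (m2 - m1)
```

The agreement is exact (up to rounding) for any root of the quadratic, since the derivation only uses $z=-1/m+c\int t\,dH/(1+tm)$.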
It is easy to verify from the proof of the Arzelà--Ascoli theorem (p. 221 of Billingsley (1968)) that condition (i) of Theorem 12.3 can be replaced with the assumption of tightness at any point in $[0,1]$. From (2.5) we see that this condition is satisfied. We will verify condition (ii) of Theorem 12.3 by proving the moment condition (12.51) of Billingsley (1968): we will show that
\[
\sup_n\ \sup_{z_1,z_2\in\mathcal{C}_n}\frac{E|M_n^1(z_1)-M_n^1(z_2)|^2}{|z_1-z_2|^2}
\]
is finite.

We claim that moments of $\|D^{-1}(z)\|$, $\|D_j^{-1}(z)\|$, and $\|D_{ij}^{-1}(z)\|$ are bounded in $n$ and $z\in\mathcal{C}_n$. This is clearly true for $z\in\mathcal{C}_u$, and for $z\in\mathcal{C}_l$ if $x_l<0$. For $z\in\mathcal{C}_r$ or, if $x_l>0$, $z\in\mathcal{C}_l$, we use (1.9) and (1.10) on, for example, $B_{(1)}=B_n-r_1r_1^*$, to get
\[
E\|D_j^{-1}(z)\|^p\le K_1+v^{-p}P\big(\|B_{(1)}\|\ge\eta_r\ \text{or}\ \lambda_{\min}^{B_{(1)}}\le\eta_l\big)
\le K_1+K_2n^p\varepsilon_n^{-p}n^{-\ell}\le K
\]
for suitably large $\ell$. Here $\eta_r$ is any fixed number between $\limsup_n\|T\|(1+\sqrt c)^2$ and $x_r$, and, if $x_l>0$, $\eta_l$ is any fixed number between $x_l$ and $\liminf_n\lambda_{\min}^T(1-\sqrt c)^2$ (take $\eta_l<0$ if $x_l<0$). Therefore, for any positive $p$,
\[
(3.1)\qquad\max\big(E\|D^{-1}(z)\|^p,\ E\|D_j^{-1}(z)\|^p,\ E\|D_{ij}^{-1}(z)\|^p\big)\le K_p.
\]
We can use the above argument to extend (2.3). Using (1.8) and (2.3) we get
\[
(3.2)\qquad\Big|E\Big(a(v)\prod_{l=1}^{q}(r_1^*B_l(v)r_1-N^{-1}\mathrm{tr}\,TB_l(v))\Big)\Big|\le KN^{-(1\wedge q)}\delta_n^{(2q-4)\vee0},
\qquad p\ge0,\ q\ge0,
\]
where now the matrices $B_l(v)$ are independent of $r_1$ and
\[
\max\big(|a(v)|,\ \|B_l(v)\|\big)\le K\big(1+n^sI\big(\|B_n\|\ge\eta_r\ \text{or}\ \lambda_{\min}^{\widetilde B}\le\eta_l\big)\big)
\]
for some positive $s$, with $\widetilde B$ being $B_n$ or $B_n$ with one or two of the $r_j$'s removed.

We remark here that when $q\ge2$, (3.2) still holds if the absolute value sign is inside the expectation sign; this is a consequence of (3.2) and an application of the Cauchy--Schwarz inequality:
\[
E\Big|a(v)\prod_{l=1}^{q}\big(r_1^*B_l(v)r_1-N^{-1}\mathrm{tr}\,TB_l(v)\big)\Big|
\le E^{1/2}\Big(a(v)\bar a(v)\prod_{l=1}^{q-1}\big(r_1^*B_l(v)r_1-N^{-1}\mathrm{tr}\,TB_l(v)\big)\big(r_1^*\bar B_l(v)r_1-N^{-1}\mathrm{tr}\,T\bar B_l(v)\big)\Big)
\]
\[
\times E^{1/2}\Big(\big(r_1^*B_q(v)r_1-N^{-1}\mathrm{tr}\,TB_q(v)\big)\big(r_1^*\bar B_q(v)r_1-N^{-1}\mathrm{tr}\,T\bar B_q(v)\big)\Big)
\le KN^{-(1\wedge q)}\delta_n^{(2q-4)\vee0}.
\]
Also, we would like to inform the reader that in applications of (3.2), $a(v)$ is a product of factors of the form $\beta_1(z)$ or $r_1^*A(z)r_1$, where $A$ is a product of one or several $D_1^{-1}(z_j)$, $j=1,2$, or similarly defined $D^{-1}$ matrices. The matrices $B_l$ also have this form. For example, we have
\[
|r_1^*D_1^{-1}(z_1)D_1^{-1}(z_2)r_1|\le|r_1|^2\|D_1^{-1}(z_1)D_1^{-1}(z_2)\|
\le K^2\eta_r+|z|n^{3+2\alpha}I\big(\|B_n\|\ge\eta_r\ \text{or}\ \lambda_{\min}^{B_{(1)}}\le\eta_l\big),
\]
where $K$ can be taken to be $\max\big((x_r-\eta_r)^{-1},(\eta_l-x_l)^{-1},v_0^{-1}\big)$, and where we have used (1.10) and the fact that $|r_1|^2\le\eta_r$ if $\|B_n\|<\eta_r$ and $|r_1|^2\le n$ otherwise. The matrices $B_l$ obviously satisfy this condition. We also have $\beta_1(z)$ satisfying this condition, since from (3.3) (see below)
\[
|\beta_1(z)|=|1-r_1^*D^{-1}(z)r_1|\le1+K\eta_r+|z|n^{2+\alpha}I\big(\|B_n\|\ge\eta_r\ \text{or}\ \lambda_{\min}^{B_n}\le\eta_l\big).
\]
In the sequel we shall freely use (3.2) without verifying these conditions, even for similarly defined $\beta_j$ functions and $A$, $B$ matrices. We have
\[
(3.3)\qquad D^{-1}(z)-D_j^{-1}(z)=-\frac{D_j^{-1}(z)r_jr_j^*D_j^{-1}(z)}{1+r_j^*D_j^{-1}(z)r_j}=-\beta_j(z)D_j^{-1}(z)r_jr_j^*D_j^{-1}(z).
\]
Let $\gamma_j(z)=r_j^*D_j^{-1}(z)r_j-N^{-1}E\,\mathrm{tr}(D_j^{-1}(z)T)$. We first derive bounds on the moments of $\gamma_j(z)$ and $\varepsilon_j(z)$. Using (3.2) we have
\[
(3.4)\qquad E|\varepsilon_j(z)|^p\le K_pN^{-1}\delta_n^{2p-4},\qquad p\ge2.
\]
It should be noted that the constants obtained do not depend on $z\in\mathcal{C}_n$. Using Lemma 2.1, (3.2), and Hölder's inequality, we have for all $p\ge2$
\[
E|\gamma_j(z)-\varepsilon_j(z)|^p=E|\gamma_1(z)-\varepsilon_1(z)|^p
=E\Big|\frac1N\sum_{j=2}^{N}\big(E_j\,\mathrm{tr}\,TD_1^{-1}(z)-E_{j-1}\,\mathrm{tr}\,TD_1^{-1}(z)\big)\Big|^p
\]
\[
=E\Big|\frac1N\sum_{j=2}^{N}\big(E_j\,\mathrm{tr}\,T(D_1^{-1}(z)-D_{1j}^{-1}(z))-E_{j-1}\,\mathrm{tr}\,T(D_1^{-1}(z)-D_{1j}^{-1}(z))\big)\Big|^p
=\frac{1}{N^p}E\Big|\sum_{j=2}^{N}(E_j-E_{j-1})\beta_{1j}(z)r_j^*D_{1j}^{-1}(z)TD_{1j}^{-1}(z)r_j\Big|^p
\]
\[
\le\frac{K_p}{N^p}E\Big(\sum_{j=2}^{N}\big|(E_j-E_{j-1})\beta_{1j}(z)r_j^*D_{1j}^{-1}(z)TD_{1j}^{-1}(z)r_j\big|^2\Big)^{p/2}
\le\frac{K_p}{N^{1+p/2}}\sum_{j=2}^{N}E\big|(E_j-E_{j-1})\beta_{1j}(z)r_j^*D_{1j}^{-1}(z)TD_{1j}^{-1}(z)r_j\big|^p
\]
\[
\le\frac{K_p}{N^{p/2}}E\big|\beta_{12}(z)r_2^*D_{12}^{-1}(z)TD_{12}^{-1}(z)r_2\big|^p\le\frac{K_p}{N^{p/2}}.
\]
Therefore
\[
(3.5)\qquad E|\gamma_j(z)|^p\le K_pN^{-1}\delta_n^{2p-4},\qquad p\ge2.
\]
We next prove that $b_n(z)$ is bounded for all $n$.
From (3.2) we find, for any $p\ge1$,
\[
(3.6)\qquad E|\beta_1(z)|^p\le K_p.
\]
Since $b_n=\beta_1(z)+\beta_1(z)b_n(z)\gamma_1(z)$, we get from (3.5), (3.6)
\[
|b_n(z)|=|E\beta_1(z)+E\beta_1(z)b_n(z)\gamma_1(z)|\le K_1+K_2|b_n(z)|N^{-1/2}.
\]
Thus for all $n$ large
\[
|b_n(z)|\le\frac{K_1}{1-K_2N^{-1/2}},
\]
and subsequently $b_n(z)$ is bounded for all $n$.

From (3.3) we have
\[
D^{-1}(z_1)D^{-1}(z_2)-D_j^{-1}(z_1)D_j^{-1}(z_2)
=(D^{-1}(z_1)-D_j^{-1}(z_1))(D^{-1}(z_2)-D_j^{-1}(z_2))
+(D^{-1}(z_1)-D_j^{-1}(z_1))D_j^{-1}(z_2)+D_j^{-1}(z_1)(D^{-1}(z_2)-D_j^{-1}(z_2))
\]
\[
=\beta_j(z_1)\beta_j(z_2)D_j^{-1}(z_1)r_jr_j^*D_j^{-1}(z_1)D_j^{-1}(z_2)r_jr_j^*D_j^{-1}(z_2)
-\beta_j(z_1)D_j^{-1}(z_1)r_jr_j^*D_j^{-1}(z_1)D_j^{-1}(z_2)
-\beta_j(z_2)D_j^{-1}(z_1)D_j^{-1}(z_2)r_jr_j^*D_j^{-1}(z_2).
\]
Therefore,
\[
(3.7)\qquad\mathrm{tr}\big(D^{-1}(z_1)D^{-1}(z_2)-D_j^{-1}(z_1)D_j^{-1}(z_2)\big)
=\beta_j(z_1)\beta_j(z_2)\big(r_j^*D_j^{-1}(z_1)D_j^{-1}(z_2)r_j\big)^2
-\beta_j(z_1)r_j^*D_j^{-2}(z_1)D_j^{-1}(z_2)r_j-\beta_j(z_2)r_j^*D_j^{-2}(z_2)D_j^{-1}(z_1)r_j.
\]
We write
\[
m_n(z_1)-m_n(z_2)=\frac1n\mathrm{tr}\big(D^{-1}(z_1)-D^{-1}(z_2)\big)=\frac{z_1-z_2}{n}\mathrm{tr}\,D^{-1}(z_1)D^{-1}(z_2).
\]
Therefore, from (3.7) we have
\[
(3.8)\qquad\frac{n}{z_1-z_2}\big[m_n(z_1)-m_n(z_2)-E(m_n(z_1)-m_n(z_2))\big]
=\sum_{j=1}^{N}(E_j-E_{j-1})\,\mathrm{tr}\,D^{-1}(z_1)D^{-1}(z_2)
\]
\[
=\sum_{j=1}^{N}(E_j-E_{j-1})\beta_j(z_1)\beta_j(z_2)\big(r_j^*D_j^{-1}(z_1)D_j^{-1}(z_2)r_j\big)^2
-\sum_{j=1}^{N}(E_j-E_{j-1})\beta_j(z_1)r_j^*D_j^{-2}(z_1)D_j^{-1}(z_2)r_j
-\sum_{j=1}^{N}(E_j-E_{j-1})\beta_j(z_2)r_j^*D_j^{-2}(z_2)D_j^{-1}(z_1)r_j.
\]
Our goal is to show the absolute second moment of (3.8) is bounded. We begin with the second sum in (3.8). We have
\[
\sum_{j=1}^{N}(E_j-E_{j-1})\beta_j(z_1)r_j^*D_j^{-2}(z_1)D_j^{-1}(z_2)r_j
=\sum_{j=1}^{N}(E_j-E_{j-1})\big(b_n(z_1)r_j^*D_j^{-2}(z_1)D_j^{-1}(z_2)r_j
-\beta_j(z_1)b_n(z_1)r_j^*D_j^{-2}(z_1)D_j^{-1}(z_2)r_j\,\gamma_j(z_1)\big)
\]
\[
=b_n(z_1)\sum_{j=1}^{N}E_j\big(r_j^*D_j^{-2}(z_1)D_j^{-1}(z_2)r_j-N^{-1}\mathrm{tr}\,T^{1/2}D_j^{-2}(z_1)D_j^{-1}(z_2)T^{1/2}\big)
-b_n(z_1)\sum_{j=1}^{N}(E_j-E_{j-1})\beta_j(z_1)r_j^*D_j^{-2}(z_1)D_j^{-1}(z_2)r_j\,\gamma_j(z_1)
\]
\[
\equiv b_n(z_1)W_1-b_n(z_1)W_2.
\]
Using (3.2) we have
\[
E|W_1|^2=\sum_{j=1}^{N}E\big|E_j\big(r_j^*D_j^{-2}(z_1)D_j^{-1}(z_2)r_j-N^{-1}\mathrm{tr}\,T^{1/2}D_j^{-2}(z_1)D_j^{-1}(z_2)T^{1/2}\big)\big|^2\le K.
\]
Using (3.5), and the bounds for $\beta_1(z_1)$ and $r_1^*D_1^{-2}(z_1)D_1^{-1}(z_2)r_1$ given in the remark to (3.2), we have
\[
E|W_2|^2=\sum_{j=1}^{N}E\big|(E_j-E_{j-1})\beta_j(z_1)r_j^*D_j^{-2}(z_1)D_j^{-1}(z_2)r_j\,\gamma_j(z_1)\big|^2
\le KN\big[E|\gamma_1(z_1)|^2+v^{-10}n^2P\big(\|B_n\|>\eta_r\ \text{or}\ \lambda_{\min}^{B_{(1)}}<\eta_l\big)\big]\le K.
\]
This argument of course handles the third sum in (3.8). For the first sum in (3.8) we have
\[
\sum_{j=1}^{N}(E_j-E_{j-1})\beta_j(z_1)\beta_j(z_2)\big(r_j^*D_j^{-1}(z_1)D_j^{-1}(z_2)r_j\big)^2
\]
\[
=b_n(z_1)b_n(z_2)\sum_{j=1}^{N}(E_j-E_{j-1})\big[\big(r_j^*D_j^{-1}(z_1)D_j^{-1}(z_2)r_j\big)^2
-\big(N^{-1}\mathrm{tr}\,T^{1/2}D_j^{-1}(z_1)D_j^{-1}(z_2)T^{1/2}\big)^2\big]
\]
\[
-\,b_n(z_2)\sum_{j=1}^{N}(E_j-E_{j-1})\beta_j(z_1)\beta_j(z_2)\big(r_j^*D_j^{-1}(z_1)D_j^{-1}(z_2)r_j\big)^2\gamma_j(z_2)
-b_n(z_1)b_n(z_2)\sum_{j=1}^{N}(E_j-E_{j-1})\beta_j(z_1)\big(r_j^*D_j^{-1}(z_1)D_j^{-1}(z_2)r_j\big)^2\gamma_j(z_1)
\]
\[
=b_n(z_1)b_n(z_2)Y_1-b_n(z_2)Y_2-b_n(z_1)b_n(z_2)Y_3.
\]
Both $Y_2$ and $Y_3$ are handled the same way as $W_2$ above. Using (3.2) we have
\[
E|Y_1|^2\le NE\big|\big(r_1^*D_1^{-1}(z_1)D_1^{-1}(z_2)r_1\big)^2-\big(N^{-1}\mathrm{tr}\,T^{1/2}D_1^{-1}(z_1)D_1^{-1}(z_2)T^{1/2}\big)^2\big|^2
\]
\[
\le N\Big(2E\big|r_1^*D_1^{-1}(z_1)D_1^{-1}(z_2)r_1-N^{-1}\mathrm{tr}\,T^{1/2}D_1^{-1}(z_1)D_1^{-1}(z_2)T^{1/2}\big|^4
\]
\[
+\,4(nN^{-1})^2E\big|\big(r_1^*D_1^{-1}(z_1)D_1^{-1}(z_2)r_1-N^{-1}\mathrm{tr}\,T^{1/2}D_1^{-1}(z_1)D_1^{-1}(z_2)T^{1/2}\big)\|D_1^{-1}(z_1)D_1^{-1}(z_2)\|\big|^2\Big)\le K.
\]
Therefore condition (ii) of Theorem 12.3 in Billingsley (1968) is satisfied, and we conclude that $\{\widehat M_n^1(z)\}$ is tight.

4. Convergence of $M_n^2(z)$. The proof of Lemma 1.1 is completed by verifying that $\{M_n^2(z)\}$ for $z\in\mathcal{C}_n$ is bounded and forms an equicontinuous family, and that it converges to (1.12) under the assumptions in (2) and to zero under those in (3).

In order to simplify the exposition, we let $\mathcal{C}_1=\mathcal{C}_u$ or $\mathcal{C}_u\cup\mathcal{C}_l$ if $x_l<0$, and $\mathcal{C}_2=\mathcal{C}_2(n)=\mathcal{C}_r$ or $\mathcal{C}_r\cup\mathcal{C}_l$ if $x_l>0$. We begin with proving
\[
(4.1)\qquad\sup_{z\in\mathcal{C}_n}|Em_n(z)-m(z)|\to0
\]
as $n\to\infty$. Since $F^{B_n}\to_D F^{c,H}$ almost surely, we get from the d.c.t. $EF^{B_n}\to_D F^{c,H}$. It is easy to verify that $EF^{B_n}$ is a proper c.d.f. Since, as $z$ ranges in $\mathcal{C}_1$, the functions $(\lambda-z)^{-1}$ in $\lambda\in[0,\infty)$ form a bounded, equicontinuous family, it follows [see, e.g., Billingsley (1968), Problem 8, p. 17] that
\[
\sup_{z\in\mathcal{C}_1}|Em_n(z)-m(z)|\to0.
\]
For $z\in\mathcal{C}_2$ we write ($\eta_l,\eta_r$ defined as in the previous section)
\[
Em_n(z)-m(z)=\int\frac{1}{\lambda-z}I_{[\eta_l,\eta_r]}(\lambda)\,d\big(EF^{B_n}(\lambda)-F^{c,H}(\lambda)\big)
+E\int\frac{1}{\lambda-z}I_{[\eta_l,\eta_r]^c}(\lambda)\,dF^{B_n}(\lambda).
\]
As above, the first term converges uniformly to zero. For the second term we use (1.9) with $\ell\ge2$ to get
\[
\sup_{z\in\mathcal{C}_2}\Big|E\int\frac{1}{\lambda-z}I_{[\eta_l,\eta_r]^c}(\lambda)\,dF^{B_n}(\lambda)\Big|
\le(\varepsilon_n/n)^{-1}P\big(\|B_n\|\ge\eta_r\ \text{or}\ \lambda_{\min}^{B_n}\le\eta_l\big)
\le Kn\varepsilon_n^{-1}n^{-\ell}\to0.
\]
Thus (4.1) holds.

From the fact that $F^{c_n,H_n}\to_D F^{c,H}$ [see Bai and Silverstein (1998), below (3.10)], along with the fact that $\mathcal{C}$ lies outside the support of $F^{c,H}$, it is straightforward to verify that
\[
(4.2)\qquad\sup_{z\in\mathcal{C}}|m_n^0(z)-m(z)|\to0
\]
as $n\to\infty$. We now show that
\[
(4.3)\qquad\sup_{n,\,z\in\mathcal{C}_n}\|(Em_n(z)T+I)^{-1}\|<\infty.
\]
From Lemma 2.11 of Bai and Silverstein (1998), $\|(Em_n(z)T+I)^{-1}\|$ is bounded by $\max(2,4v_0^{-1})$ on $\mathcal{C}_u$. Let $x=x_l$ or $x_r$. Since $x$ is outside the support of $F^{c,H}$, it follows from Theorem 4.1 of Silverstein and Choi (1995) that $m(x)t+1\ne0$ for any $t$ in the support of $H$. Since $m(z)$ is continuous on $\mathcal{C}_0\equiv\{x+iv:v\in[0,v_0]\}$, there exist positive constants $\delta_1$ and $\mu_0$ such that, for any $t_0$ in the support of $H$,
\[
\inf_{z\in\mathcal{C}_0}|m(z)t_0+1|>\delta_1\qquad\text{and}\qquad\sup_{z\in\mathcal{C}_0}|m(z)|<\mu_0.
\]
Using $H_n\to_D H$ and (4.1), for all large $n$ every eigenvalue $\lambda^T$ of $T$ lies within $\delta_1/4\mu_0$ of some $t_0$ in the support of $H$, and $\sup_{z\in\mathcal{C}_l\cup\mathcal{C}_r}|Em_n(z)-m(z)|<\delta_1/4$. Therefore we have
\[
\inf_{z\in\mathcal{C}_l\cup\mathcal{C}_r}|Em_n(z)\lambda^T+1|>\delta_1/2,
\]
which completes the proof of (4.3).

Next we show the existence of $\xi\in(0,1)$ such that for all $n$ large
\[
(4.4)\qquad\sup_{z\in\mathcal{C}_n}\Big|c_n(Em_n(z))^2\int\frac{t^2\,dH_n(t)}{(1+tEm_n(z))^2}\Big|<\xi.
\]
From the identity (1.1) of Bai and Silverstein (1998),
\[
m(z)=\frac{1}{-z+c\int\frac{t}{1+tm(z)}\,dH(t)},
\]
valid for $z=x+iv$ outside the support of $F^{c,H}$, we find
\[
\mathrm{Im}\,m(z)=\frac{v+\mathrm{Im}\,m(z)\,c\int\frac{t^2}{|1+tm(z)|^2}\,dH(t)}{\big|-z+c\int\frac{t}{1+tm(z)}\,dH(t)\big|^2}.
\]
Therefore
\[
(4.5)\qquad\Big|cm(z)^2\int\frac{t^2\,dH(t)}{(1+tm(z))^2}\Big|
\le\frac{c\int\frac{t^2}{|1+tm(z)|^2}\,dH(t)}{\big|-z+c\int\frac{t}{1+tm(z)}\,dH(t)\big|^2}
=\frac{\mathrm{Im}\,m(z)\,c\int\frac{t^2}{|1+tm(z)|^2}\,dH(t)}{v+\mathrm{Im}\,m(z)\,c\int\frac{t^2}{|1+tm(z)|^2}\,dH(t)}
\]
\[
=\frac{c\int\frac{1}{|x-z|^2}\,dF^{c,H}(x)\int\frac{t^2}{|1+tm(z)|^2}\,dH(t)}
{1+c\int\frac{1}{|x-z|^2}\,dF^{c,H}(x)\int\frac{t^2}{|1+tm(z)|^2}\,dH(t)}<1
\]
for all $z\in\mathcal{C}$. By continuity, we have the existence of $\xi_1<1$ such that
\[
(4.6)\qquad\sup_{z\in\mathcal{C}}\Big|cm(z)^2\int\frac{t^2\,dH(t)}{(1+tm(z))^2}\Big|<\xi_1.
\]
Therefore, using (4.1), (4.4) follows.

We proceed with some improved bounds on quantities appearing earlier. Let $M$ be a nonrandom $n\times n$ matrix. Then, using (3.2) and the argument used to derive the bound on $E|W_2|^2$, we find
\[
E|\mathrm{tr}\,D^{-1}M-E\,\mathrm{tr}\,D^{-1}M|^2
=E\Big|\sum_{j=1}^{N}\big(E_j\,\mathrm{tr}\,D^{-1}M-E_{j-1}\,\mathrm{tr}\,D^{-1}M\big)\Big|^2
=E\Big|\sum_{j=1}^{N}(E_j-E_{j-1})\,\mathrm{tr}\,(D^{-1}-D_j^{-1})M\Big|^2
\]
\[
=\sum_{j=1}^{N}E\big|(E_j-E_{j-1})\beta_jr_j^*D_j^{-1}MD_j^{-1}r_j\big|^2
\le2\sum_{j=1}^{N}\Big(E\big|\beta_j\big(r_j^*D_j^{-1}MD_j^{-1}r_j-N^{-1}\mathrm{tr}(TD_j^{-1}MD_j^{-1})\big)\big|^2
+E|\beta_j-\bar\beta_j|^2\big|N^{-1}\mathrm{tr}(TD_j^{-1}MD_j^{-1})\big|^2\Big)
\]
\[
(4.7)\qquad\le K\|M\|^2.
\]
The same argument holds for $D_1^{-1}$, so we also have
\[
(4.8)\qquad E|\mathrm{tr}\,D_1^{-1}M-E\,\mathrm{tr}\,D_1^{-1}M|^2\le K\|M\|^2.
\]
Our next task is to investigate the limiting behavior of
\[
N\Big(c_n\int\frac{dH_n(t)}{1+tEm_n}+zc_nEm_n\Big)
=NE\beta_1\Big[r_1^*D_1^{-1}(Em_nT+I)^{-1}r_1-\frac1N\mathrm{tr}\,\big((Em_nT+I)^{-1}TD^{-1}\big)\Big]
\]
for $z\in\mathcal{C}_n$ [see (5.2) in Bai and Silverstein (1998)]. Throughout the following, all bounds, including $O(\cdot)$ and $o(\cdot)$ expressions, and convergence statements hold uniformly for $z\in\mathcal{C}_n$.

We have
\[
(4.9)\qquad E\,\mathrm{tr}\,(Em_nT+I)^{-1}TD_1^{-1}-E\,\mathrm{tr}\,(Em_nT+I)^{-1}TD^{-1}
=E\beta_1\,\mathrm{tr}\,(Em_nT+I)^{-1}TD_1^{-1}r_1r_1^*D_1^{-1}
=b_nE(1-\beta_1\gamma_1)\,r_1^*D_1^{-1}(Em_nT+I)^{-1}TD_1^{-1}r_1.
\]
From (3.2), (3.5), (4.3) we get
\[
\big|E\beta_1\gamma_1r_1^*D_1^{-1}(Em_nT+I)^{-1}TD_1^{-1}r_1\big|\le KN^{-1}.
\]
Therefore
\[
\big|(4.9)-N^{-1}b_nE\,\mathrm{tr}\,D_1^{-1}(Em_nT+I)^{-1}TD_1^{-1}T\big|\le KN^{-1}.
\]
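The strict inequality in (4.5) can be observed numerically in the Marchenko--Pastur case $H=\delta_1$. The sketch below is ours, not the paper's (function names are our own); it evaluates $|cm(z)^2/(1+m(z))^2|$ off the real axis, where the bound above guarantees a value strictly below one:

```python
import cmath

def m_mp(z, c):
    # Root of z m^2 + (z + 1 - c) m + 1 = 0 (H = point mass at 1),
    # with the Stieltjes-transform branch Im m > 0 for Im z > 0.
    r = cmath.sqrt((z + 1 - c) ** 2 - 4 * z)
    m = (-(z + 1 - c) + r) / (2 * z)
    if m.imag * z.imag < 0:
        m = (-(z + 1 - c) - r) / (2 * z)
    return m

def contraction(z, c):
    # |c m(z)^2 t^2/(1+t m(z))^2| at t = 1; (4.5) says this is < 1 off the axis
    m = m_mp(z, c)
    return abs(c * m ** 2 / (1 + m) ** 2)
```

The quantity approaches one only at the spectral edges on the real axis, which is consistent with the blow-up of $m'(z)$ there noted in Section 5.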
Since $\beta_1=b_n-b_n^2\gamma_1+\beta_1b_n^2\gamma_1^2$ we have
\[
NE\beta_1r_1^*D_1^{-1}(Em_nT+I)^{-1}r_1-E\beta_1\,E\,\mathrm{tr}\,(Em_nT+I)^{-1}TD_1^{-1}
\]
\[
=-b_n^2NE\gamma_1r_1^*D_1^{-1}(Em_nT+I)^{-1}r_1
+b_n^2\big(NE\beta_1\gamma_1^2r_1^*D_1^{-1}(Em_nT+I)^{-1}r_1-(E\beta_1\gamma_1^2)E\,\mathrm{tr}\,(Em_nT+I)^{-1}TD_1^{-1}\big)
\]
\[
=-b_n^2NE\gamma_1r_1^*D_1^{-1}(Em_nT+I)^{-1}r_1
+b_n^2E\big[N\beta_1\gamma_1^2r_1^*D_1^{-1}(Em_nT+I)^{-1}r_1-\beta_1\gamma_1^2\,\mathrm{tr}\,D_1^{-1}(Em_nT+I)^{-1}T\big]
+b_n^2\,\mathrm{Cov}\big(\beta_1\gamma_1^2,\ \mathrm{tr}\,D_1^{-1}(Em_nT+I)^{-1}T\big)
\]
($\mathrm{Cov}(X,Y)=EXY-EXEY$). Using (3.2), (3.5), and (4.3), we have
\[
\big|E\big[N\beta_1\gamma_1^2r_1^*D_1^{-1}(Em_nT+I)^{-1}r_1-\beta_1\gamma_1^2\,\mathrm{tr}\,D_1^{-1}(Em_nT+I)^{-1}T\big]\big|\le K\delta_n^2.
\]
Using (3.5), (3.6), (4.3), and (4.8) we have
\[
\big|\mathrm{Cov}\big(\beta_1\gamma_1^2,\ \mathrm{tr}\,D_1^{-1}(Em_nT+I)^{-1}T\big)\big|
\le(E|\beta_1|^4)^{1/4}(E|\gamma_1|^8)^{1/4}\big(E\big|\mathrm{tr}\,D_1^{-1}(Em_nT+I)^{-1}T-E\,\mathrm{tr}\,D_1^{-1}(Em_nT+I)^{-1}T\big|^2\big)^{1/2}
\le K\delta_n^3N^{-1/4}.
\]
Since $\beta_1=b_n-b_n\beta_1\gamma_1$, we get from (3.5) and (3.6)
\[
E\beta_1=b_n+O(N^{-1/2}).
\]
Write
\[
NE\gamma_1r_1^*D_1^{-1}(Em_nT+I)^{-1}r_1
=NE\big[(r_1^*D_1^{-1}r_1-N^{-1}\mathrm{tr}\,D_1^{-1}T)\big(r_1^*D_1^{-1}(Em_nT+I)^{-1}r_1-N^{-1}\mathrm{tr}\,D_1^{-1}(Em_nT+I)^{-1}T\big)\big]
\]
\[
+\,N^{-1}\mathrm{Cov}\big(\mathrm{tr}\,D_1^{-1}T,\ \mathrm{tr}\,D_1^{-1}(Em_nT+I)^{-1}T\big).
\]
From (4.8) we see the second term above is $O(N^{-1})$. Therefore we arrive at
\[
(4.10)\qquad N\Big(c_n\int\frac{dH_n(t)}{1+tEm_n}+zc_nEm_n\Big)
=b_n^2N^{-1}E\,\mathrm{tr}\,D_1^{-1}(Em_nT+I)^{-1}TD_1^{-1}T
\]
\[
-\,b_n^2NE\big[(r_1^*D_1^{-1}r_1-N^{-1}\mathrm{tr}\,D_1^{-1}T)\big(r_1^*D_1^{-1}(Em_nT+I)^{-1}r_1-N^{-1}\mathrm{tr}\,D_1^{-1}(Em_nT+I)^{-1}T\big)\big]+o(1).
\]
Using (1.15) on (4.10), and arguing the same way (1.15) is used in Section 2 (below (2.7)), we see that under the assumptions in (3), the "complex Gaussian" case,
\[
(4.11)\qquad N\Big(c_n\int\frac{dH_n(t)}{1+tEm_n}+zc_nEm_n\Big)\to0
\]
as $n\to\infty$, while under the assumptions in (2), the "real Gaussian" case,
\[
N\Big(c_n\int\frac{dH_n(t)}{1+tEm_n}+zc_nEm_n\Big)
=-b_n^2N^{-1}E\,\mathrm{tr}\,D_1^{-1}(Em_nT+I)^{-1}TD_1^{-1}T+o(1).
\]
Let $A_n(z)=c_n\int\frac{dH_n(t)}{1+tEm_n(z)}+zc_nEm_n(z)$. Using the identity
\[
Em_n(z)=-\frac{1-c_n}{z}+c_nEm_{F^{B_n}}(z),
\]
which relates $m_n$ to the Stieltjes transform of $F^{B_n}$, we have
\[
A_n(z)=c_n\int\frac{dH_n(t)}{1+tEm_n(z)}-c_n+zEm_n(z)+1
=-Em_n(z)\Big(-z-\frac{1}{Em_n(z)}+c_n\int\frac{t\,dH_n(t)}{1+tEm_n(z)}\Big).
\]
It follows that
\[
Em_n(z)=\frac{1}{-z+c_n\int\frac{t\,dH_n(t)}{1+tEm_n(z)}+A_n(z)/Em_n(z)}.
\]
From this, together with the analogous identity [below (4.4)], we get
\[
(4.12)\qquad Em_n(z)-m_n^0(z)=-\frac{m_n^0A_n}{1-c_nEm_n\,m_n^0\int\frac{t^2\,dH_n(t)}{(1+tEm_n)(1+tm_n^0)}}.
\]
We see from (4.4), and the corresponding bound involving $m_n^0(z)$, that the denominator of (4.12) is bounded away from zero. Therefore, from (4.11), in the "complex Gaussian" case
\[
\sup_{z\in\mathcal{C}_n}|M_n^2(z)|\to0
\]
as $n\to\infty$.

We now find the limit of $N^{-1}E\,\mathrm{tr}\,D_1^{-1}(Em_nT+I)^{-1}TD_1^{-1}T$. Applications of (3.1)--(3.3), (3.6), and (4.3) show that both
\[
E\,\mathrm{tr}\,D_1^{-1}(Em_nT+I)^{-1}TD_1^{-1}T-E\,\mathrm{tr}\,D^{-1}(Em_nT+I)^{-1}TD_1^{-1}T
\]
and
\[
E\,\mathrm{tr}\,D^{-1}(Em_nT+I)^{-1}TD_1^{-1}T-E\,\mathrm{tr}\,D^{-1}(Em_nT+I)^{-1}TD^{-1}T
\]
are bounded. Therefore it is sufficient to consider
\[
N^{-1}E\,\mathrm{tr}\,D^{-1}(Em_nT+I)^{-1}TD^{-1}T.
\]
Write
\[
D(z)+zI-b_n(z)T=\sum_{j=1}^{N}r_jr_j^*-b_n(z)T.
\]
It is straightforward to verify that $zI-b_n(z)T$ is nonsingular. Taking inverses we get
\[
D^{-1}(z)=-(zI-b_n(z)T)^{-1}+\sum_{j=1}^{N}\beta_j(z)(zI-b_n(z)T)^{-1}r_jr_j^*D_j^{-1}(z)-b_n(z)(zI-b_n(z)T)^{-1}TD^{-1}(z)
\]
\[
(4.13)\qquad=-(zI-b_n(z)T)^{-1}+b_n(z)A(z)+B(z)+C(z),
\]
where
\[
A(z)=\sum_{j=1}^{N}(zI-b_n(z)T)^{-1}(r_jr_j^*-N^{-1}T)D_j^{-1}(z),
\]
\[
B(z)=\sum_{j=1}^{N}(\beta_j(z)-b_n(z))(zI-b_n(z)T)^{-1}r_jr_j^*D_j^{-1}(z),
\]
and
\[
C(z)=N^{-1}b_n(z)(zI-b_n(z)T)^{-1}T\sum_{j=1}^{N}\big(D_j^{-1}(z)-D^{-1}(z)\big)
=N^{-1}b_n(z)(zI-b_n(z)T)^{-1}T\sum_{j=1}^{N}\beta_j(z)D_j^{-1}(z)r_jr_j^*D_j^{-1}(z).
\]
Since $E\beta_1=-zEm_n$ and $E\beta_1=b_n+O(N^{-1})$, we have $b_n\to-zm$. From (4.3) it follows that $\|(zI-b_n(z)T)^{-1}\|$ is bounded. We have by (3.5) and (3.6)
\[
(4.14)\qquad E|\beta_1-b_n|^2=|b_n|^2E|\beta_1\gamma_1|^2\le KN^{-1}.
\]
Let $M$ be $n\times n$. From (3.1), (3.2), (3.6), and (4.14) we get
\[
(4.15)\qquad|N^{-1}E\,\mathrm{tr}\,B(z)M|
\le K\big(E|\beta_1-b_n|^2\big)^{1/2}\big(E\big|r_1^*r_1\|D_1^{-1}M\|\big|^2\big)^{1/2}
\le KN^{-1/2}\big(E\|M\|^4\big)^{1/4}
\]
and
\[
(4.16)\qquad|N^{-1}E\,\mathrm{tr}\,C(z)M|
\le KN^{-1}E|\beta_1|r_1^*r_1\|D_1^{-1}\|^2\|M\|
\le KN^{-1}\big(E\|M\|^2\big)^{1/2}.
\]
In the following, $M$ is $n\times n$, nonrandom, and bounded in norm.
Write
\[
(4.17)\qquad\mathrm{tr}\,A(z)TD^{-1}M=A_1(z)+A_2(z)+A_3(z),
\]
where
\[
A_1(z)=\mathrm{tr}\sum_{j=1}^{N}(zI-b_nT)^{-1}r_jr_j^*D_j^{-1}T(D^{-1}-D_j^{-1})M,
\]
\[
A_2(z)=\mathrm{tr}\sum_{j=1}^{N}(zI-b_nT)^{-1}\big(r_jr_j^*D_j^{-1}TD_j^{-1}-N^{-1}TD_j^{-1}TD_j^{-1}\big)M,
\]
and
\[
A_3(z)=\mathrm{tr}\sum_{j=1}^{N}(zI-b_nT)^{-1}N^{-1}TD_j^{-1}T(D_j^{-1}-D^{-1})M.
\]
We have $EA_2(z)=0$, and similar to (4.16) we have
\[
(4.18)\qquad|EN^{-1}A_3(z)|\le KN^{-1}.
\]
Using (3.2) and (4.14) we get
\[
EN^{-1}A_1(z)=-E\beta_1\,r_1^*D_1^{-1}TD_1^{-1}r_1\,r_1^*D_1^{-1}M(zI-b_nT)^{-1}r_1
\]
\[
=-b_nE\big(N^{-1}\mathrm{tr}\,D_1^{-1}TD_1^{-1}T\big)\big(N^{-1}\mathrm{tr}\,D_1^{-1}M(zI-b_nT)^{-1}T\big)+o(1)
=-b_nE\big(N^{-1}\mathrm{tr}\,D^{-1}TD^{-1}T\big)\big(N^{-1}\mathrm{tr}\,D^{-1}M(zI-b_nT)^{-1}T\big)+o(1).
\]
Using (3.1) and (4.7) we find
\[
\big|\mathrm{Cov}\big(N^{-1}\mathrm{tr}\,D^{-1}TD^{-1}T,\ N^{-1}\mathrm{tr}\,D^{-1}M(zI-b_nT)^{-1}T\big)\big|
\]
\[
\le\big(E|N^{-1}\mathrm{tr}\,D^{-1}TD^{-1}T|^2\big)^{1/2}N^{-1}\big(E\big|\mathrm{tr}\,D^{-1}M(zI-b_nT)^{-1}T-E\,\mathrm{tr}\,D^{-1}M(zI-b_nT)^{-1}T\big|^2\big)^{1/2}\le KN^{-1}.
\]
Therefore
\[
(4.19)\qquad EN^{-1}A_1(z)=-b_n\big(EN^{-1}\mathrm{tr}\,D^{-1}TD^{-1}T\big)\big(EN^{-1}\mathrm{tr}\,D^{-1}M(zI-b_nT)^{-1}T\big)+o(1).
\]
From (4.13), (4.15), and (4.16) we get
\[
(4.20)\qquad EN^{-1}\mathrm{tr}\,D^{-1}T(zI-b_nT)^{-1}T
=N^{-1}\mathrm{tr}\big(-(zI-b_nT)^{-1}+EB(z)+EC(z)\big)T(zI-b_nT)^{-1}T
=-\frac{c_n}{z^2}\int\frac{t^2\,dH_n(t)}{(1+tEm_n)^2}+o(1).
\]
Similarly,
\[
(4.21)\qquad EN^{-1}\mathrm{tr}\,D^{-1}(Em_nT+I)^{-1}T(zI-b_nT)^{-1}T
=-\frac{c_n}{z^2}\int\frac{t^2\,dH_n(t)}{(1+tEm_n)^3}+o(1).
\]
Using (4.13), (4.15)--(4.20) we get
\[
EN^{-1}\mathrm{tr}\,D^{-1}TD^{-1}T
=-EN^{-1}\mathrm{tr}\,D^{-1}T(zI-b_nT)^{-1}T
-b_n^2\big(EN^{-1}\mathrm{tr}\,D^{-1}TD^{-1}T\big)\big(EN^{-1}\mathrm{tr}\,D^{-1}T(zI-b_nT)^{-1}T\big)+o(1)
\]
\[
=\frac{c_n}{z^2}\int\frac{t^2\,dH_n(t)}{(1+tEm_n)^2}\Big(1+z^2Em_n^2\,EN^{-1}\mathrm{tr}\,D^{-1}TD^{-1}T\Big)+o(1).
\]
Therefore
\[
(4.22)\qquad EN^{-1}\mathrm{tr}\,D^{-1}TD^{-1}T
=\frac{\frac{c_n}{z^2}\int\frac{t^2\,dH_n(t)}{(1+tEm_n)^2}}{1-c_nEm_n^2\int\frac{t^2\,dH_n(t)}{(1+tEm_n)^2}}+o(1).
\]
Finally, we have from (4.13)--(4.19), (4.21), and (4.22)
\[
N^{-1}E\,\mathrm{tr}\,D^{-1}(Em_nT+I)^{-1}TD^{-1}T
\]
\[
=-EN^{-1}\mathrm{tr}\,D^{-1}(Em_nT+I)^{-1}T(zI-b_nT)^{-1}T
-b_n^2\big(EN^{-1}\mathrm{tr}\,D^{-1}TD^{-1}T\big)\big(EN^{-1}\mathrm{tr}\,D^{-1}(Em_nT+I)^{-1}T(zI-b_nT)^{-1}T\big)+o(1)
\]
\[
=\frac{c_n}{z^2}\int\frac{t^2\,dH_n(t)}{(1+tEm_n)^3}\Bigg(1+z^2Em_n^2\,\frac{\frac{c_n}{z^2}\int\frac{t^2\,dH_n(t)}{(1+tEm_n)^2}}{1-c_nEm_n^2\int\frac{t^2\,dH_n(t)}{(1+tEm_n)^2}}\Bigg)+o(1)
=\frac{\frac{c_n}{z^2}\int\frac{t^2\,dH_n(t)}{(1+tEm_n)^3}}{1-c_nEm_n^2\int\frac{t^2\,dH_n(t)}{(1+tEm_n)^2}}+o(1).
\]
Therefore, from (4.12) we conclude that in the "real Gaussian" case $M_n^2(z)$ converges uniformly on $\mathcal{C}_n$:
\[
M_n^2(z)\to\frac{c\int\frac{m(z)^3t^2\,dH(t)}{(1+tm(z))^3}}{\Big(1-c\int\frac{m(z)^2t^2\,dH(t)}{(1+tm(z))^2}\Big)^2}
\]
as $n\to\infty$, which is (1.12).

Finally, for general standardized $X_{11}$, we see that, in light of the above work, in order to show that $\{M_n^2(z)\}$ for $z\in\mathcal{C}_n$ is bounded and equicontinuous it is sufficient to prove that $\{f_n'(z)\}$ is bounded, where
\[
f_n(z)\equiv NE\big[(r_1^*D_1^{-1}r_1-N^{-1}\mathrm{tr}\,D_1^{-1}T)\big(r_1^*D_1^{-1}(Em_nT+I)^{-1}r_1-N^{-1}\mathrm{tr}\,D_1^{-1}(Em_nT+I)^{-1}T\big)\big].
\]
Using (2.3) we find
\[
|f_n'(z)|\le KN^{-1}\Big(\big(E\,\mathrm{tr}\,D_1^{-2}T\bar D_1^{-2}T\ E\,\mathrm{tr}\,D_1^{-1}(Em_nT+I)^{-1}T(\overline{Em_n}T+I)^{-1}\bar D_1^{-1}T\big)^{1/2}
\]
\[
+\big(E\,\mathrm{tr}\,D_1^{-1}T\bar D_1^{-1}T\ E\,\mathrm{tr}\,D_1^{-2}(Em_nT+I)^{-1}T(\overline{Em_n}T+I)^{-1}\bar D_1^{-2}T\big)^{1/2}
\]
\[
+|Em_n'|\big(E\,\mathrm{tr}\,D_1^{-1}T\bar D_1^{-1}T\ E\,\mathrm{tr}\,D_1^{-1}(Em_nT+I)^{-2}T^3(\overline{Em_n}T+I)^{-2}\bar D_1^{-1}T\big)^{1/2}\Big),
\]
where $\bar D_1=D_1(\bar z)$. Using the same argument resulting in (3.1), it is a simple matter to conclude that $Em_n'(z)$ is bounded for $z\in\mathcal{C}_n$. All the remaining expected values are $O(N)$ due to (3.1) and (4.3), and we are done.

5. Some Derivations and Calculations. Choose $f,g\in\{f_1,\ldots,f_r\}$. Let $S_F$ denote the support of $F^{c,H}$, and let $a,b$ be such that $S_F$ is a subset of $(a,b)$ and is contained in a proper rectangle on which $f$ and $g$ are analytic. We proceed to show (1.17). Assume the $z_1$ contour encloses the $z_2$ contour. Then, because the mapping $z\to m(z)$ is one-to-one on $\mathbb{C}-S_F$ and increasing on intervals in $\mathbb{R}-S_F$, the contour that $m(z_2)$ forms encloses the resulting $m(z_1)$ contour (if $b\le x_2<x_1$, then $m(x_2)<m(x_1)<0$; similarly, $x_1<x_2\le a\Rightarrow0<m(x_1)<m(x_2)$). Using integration by parts twice, first with respect to $z_2$ and then with respect to $z_1$, we get
\[
(1.7)=\frac{1}{2\pi^2}\oint\oint\frac{f(z_1)g'(z_2)}{(m(z_1)-m(z_2))^2}\,\frac{d}{dz_1}m(z_1)\,dz_2\,dz_1
=-\frac{1}{2\pi^2}\oint\oint f'(z_1)g'(z_2)\,\mathrm{Log}\big(m(z_1)-m(z_2)\big)\,dz_1\,dz_2.
\]
Here Log is any branch of the logarithm. For fixed $z_2$, $m(z_1)-m(z_2)$ does not encircle the origin as $z_1$ moves around its contour.
Therefore $\mathrm{Log}(m(z_1)-m(z_2))$ is single-valued, so we can write
\[
(5.1)\qquad\text{RHS of (1.7)}=-\frac{1}{2\pi^2}\oint\oint f'(z_1)g'(z_2)\big[\ln|m(z_1)-m(z_2)|+i\arg(m(z_1)-m(z_2))\big]\,dz_1\,dz_2,
\]
where, say, for fixed $z_2$ the $z_1$ contour begins at $x_1\in\mathbb{R}$ to the right of $b$, $\arg(m(z_1)-m(z_2))\in[0,2\pi)$ for $z_1=x_1$, and varies continuously from that angle.

We choose the contours to be rectangles with sides parallel to the axes, symmetric with respect to the real axis, the height of the rectangle corresponding to $z_1$ twice that of the $z_2$ rectangle. Since the integral is independent of the selection of the contours, we may compute the limit of the integral as the height approaches zero. Let $A(z_1,z_2)$ be a function. Then
\[
\oint\oint A(z_1,z_2)\,dz_1\,dz_2=I_1+I_2+I_3+I_4,
\]
where
\[
I_1=\int_a^{b}\int_{a-\varepsilon}^{b+\varepsilon}\big[A(z_1,z_2)+A(\bar z_1,\bar z_2)-A(z_1,\bar z_2)-A(\bar z_1,z_2)\big]\,dy\,dx
\]
with $z_1=x+2iv$ and $z_2=y+iv$; $I_2$ is the integral in which $z_1=x+2iv$ and $z_2$ runs along the vertical sides; $I_3$ is its analogue with the roles of $z_1$ and $z_2$ reversed; and $I_4$ is the integral in which both $z_1$ and $z_2$ run along the vertical sides.

We first consider the case $A=f'(z_1)g'(z_2)\arg[m(z_1)-m(z_2)]$. Since the integrand is bounded, we conclude that $I_j\to0$ for $j=2,3,4$. In $I_1$, with $z_1=x+2iv$ and $z_2=y+iv$, the integrand satisfies
\[
A(z_1,z_2)+A(\bar z_1,\bar z_2)-A(z_1,\bar z_2)-A(\bar z_1,z_2)
\sim f'(x)g'(y)\big[\arg(m(z_1)-m(z_2))+\arg(m(\bar z_1)-m(\bar z_2))-\arg(m(z_1)-m(\bar z_2))-\arg(m(\bar z_1)-m(z_2))\big],
\]
which converges to zero as $v\to0$. Hence $I_1\to0$ by the d.c.t.

Next, let us consider the case $A(z_1,z_2)=f'(z_1)g'(z_2)\ln|m(z_1)-m(z_2)|$. We first show that $I_2$, $I_3$, and $I_4\to0$. It is obvious that for any $z_1=x\pm2iu$ and $z_2=y\pm iu$ with $u>0$,
\[
2/u\ge|m(z_1)-m(z_2)|=|z_1-z_2|\Big|\int\frac{dF^{c,H}(x)}{(x-z_1)(x-z_2)}\Big|\ge u/K.
\]
Thus $|I_2|\le K\int_0^v\ln(1/u)\,du\to0$. Similarly, $I_3\to0$ and $I_4\to0$. Finally, we find the limit of $I_1$.
Note that $|f'(z_1)g'(z_2)-f'(x)g'(y)|\le Kv$ for some constant $K$, and that $\ln|m(z_1)-m(z_2)|\le K\ln(1/v)$ when $z_1=x\pm2iv$ and $z_2=y\pm iv$. The integrand satisfies
\[
A(z_1,z_2)+A(\bar z_1,\bar z_2)-A(z_1,\bar z_2)-A(\bar z_1,z_2)
=2f'(x)g'(y)\big[\ln|m(x+2iv)-m(y+iv)|-\ln|m(x+2iv)-m(y-iv)|\big]+R_n(z_1,z_2)
\]
with $|R_n(z_1,z_2)|\le Kv\ln(1/v)$. Thus
\[
I_1=2\int_a^b\int_{a-\varepsilon}^{b+\varepsilon}f'(x)g'(y)\ln\frac{|m(x+2iv)-m(y+iv)|}{|m(x+2iv)-m(y-iv)|}\,dy\,dx+o(1).
\]
By (5.1), we have
\[
(5.2)\qquad\text{RHS of (1.7)}=\frac{1}{\pi^2}\int_a^b\int_{a-\varepsilon}^{b+\varepsilon}f'(x)g'(y)\ln\frac{|m(x+2iv)-m(y-iv)|}{|m(x+2iv)-m(y+iv)|}\,dy\,dx+o(1).
\]
The proof of (1.17) is then complete if we can show that the limit as $v\downarrow0$ can be taken under the integral sign on the right-hand side of (5.2). To this end, by applying the d.c.t., we need only establish a bound on the integrand. From (1.2) we get
\[
\frac{m(z_1)-m(z_2)}{z_1-z_2}=m(z_1)m(z_2)\Big(1-c\int\frac{m(z_1)m(z_2)t^2\,dH(t)}{(1+tm(z_1))(1+tm(z_2))}\Big)^{-1}.
\]
This implies that
\[
(5.3)\qquad\Big|\frac{m(z_1)-m(z_2)}{z_1-z_2}\Big|\ge|m(z_1)m(z_2)|.
\]
Rewrite
\[
(5.4)\qquad\ln\Big|\frac{m(x+2iv)-m(y-iv)}{m(x+2iv)-m(y+iv)}\Big|
=\frac12\ln\Big(1+\frac{4m_i(x+2iv)m_i(y+iv)}{|m(x+2iv)-m(y+iv)|^2}\Big),
\]
where $m_i$ denotes the imaginary part of $m$. From (5.3) we get
\[
\text{RHS of (5.4)}\le\ln\Big(1+\frac{2m_i(x+2iv)m_i(y+iv)}{(x-y)^2|m(x+2iv)m(y+iv)|^2}\Big).
\]
We claim that
\[
(5.5)\qquad\sup_{v\in(0,1),\ x,y\in[a-\varepsilon,b+\varepsilon]}\frac{m_i(x+2iv)m_i(y+iv)}{|m(x+2iv)m(y+iv)|^2}<\infty.
\]
Suppose not. Then there exists a sequence $\{z_n\}\subset\mathbb{C}^+$ which converges to a number in $[a-\varepsilon,b+\varepsilon]$ and for which $m(z_n)\to0$. From (1.2) we must have
\[
(5.6)\qquad c\int\frac{tm(z_n)}{1+tm(z_n)}\,dH(t)\to1.
\]
But the limit of the left-hand side of (5.6) is obviously 0, because $m(z_n)\to0$. The contradiction proves our assertion. Therefore there exists a $K>0$ for which the right-hand side of (5.4) is bounded by
\[
(5.7)\qquad\ln\Big(1+\frac{K}{(x-y)^2}\Big)
\]
for $x,y\in[a-\varepsilon,b+\varepsilon]$. The integrand of the integral in (5.2) is therefore bounded by the integrable function
\[
|f'(x)g'(y)|\ln\Big(1+\frac{K}{(x-y)^2}\Big).
\]
An application of the d.c.t. then yields the exchange of the limiting and integration procedures, and (1.17) follows.
We now verify (1.18). From (1.2) we have
\[
\frac{d}{dz}m(z)=\frac{m^2(z)}{1-c\int\frac{t^2m^2(z)}{(1+tm(z))^2}\,dH(t)}.
\]
In Silverstein and Choi (1995) it is argued that the only places where $m'(z)$ can possibly become unbounded are near the origin and near the boundary $\partial S_F$ of $S_F$. It is a simple matter to verify that
\[
EX_f=\frac{1}{4\pi i}\oint f(z)\,\frac{d}{dz}\mathrm{Log}\Big(1-c\int\frac{t^2m^2(z)}{(1+tm(z))^2}\,dH(t)\Big)\,dz
=-\frac{1}{4\pi i}\oint f'(z)\,\mathrm{Log}\Big(1-c\int\frac{t^2m^2(z)}{(1+tm(z))^2}\,dH(t)\Big)\,dz,
\]
where, because of (4.6), the arg term for Log can be continuously taken within an interval of length $2\pi$. We choose a contour as above. From (3.17) of Bai and Silverstein (1998),
\[
\Big|1-c\int\frac{t^2m^2(x+iv)}{(1+tm(x+iv))^2}\,dH(t)\Big|\ge v^2/2,
\]
so the integrals on the two vertical sides are bounded by $Kv|\ln v|\to0$. The integral on the two horizontal sides equals
\[
\frac{1}{2\pi}\int_a^b\big(f'(x+iv)-f'(x-iv)\big)\ln\Big|1-c\int\frac{t^2m^2(x+iv)}{(1+tm(x+iv))^2}\,dH(t)\Big|\,dx
+\frac{1}{2\pi}\int_a^b\big(f'(x-iv)+f'(x+iv)\big)\arg\Big(1-c\int\frac{t^2m^2(x+iv)}{(1+tm(x+iv))^2}\,dH(t)\Big)\,dx
\]
\[
(5.8)\qquad=\frac{1}{\pi}\int_a^bf'(x)\arg\Big(1-c\int\frac{t^2m^2(x+iv)}{(1+tm(x+iv))^2}\,dH(t)\Big)\,dx+o(1).
\]
The equality (1.18) then follows by applying the d.c.t. to the above equation.

We now derive $d(c)$ ($c\in(0,1)$) in (1.1), (1.21), and the variance in (1.22). The first two rely on Poisson's integral formula
\[
(5.9)\qquad u(z)=\frac{1}{2\pi}\int_0^{2\pi}u(e^{i\theta})\,\frac{1-r^2}{1+r^2-2r\cos(\theta-\phi)}\,d\theta,
\]
where $u$ is harmonic on the unit disk in $\mathbb{C}$, and $z=re^{i\phi}$ with $r\in[0,1)$. Making the substitution $x=1+c-2\sqrt c\cos\theta$ we get
\[
d(c)=\frac{1}{\pi}\int_0^{2\pi}\frac{\sin^2\theta}{1+c-2\sqrt c\cos\theta}\ln\big(1+c-2\sqrt c\cos\theta\big)\,d\theta
=\frac{1}{2\pi}\int_0^{2\pi}\frac{2\sin^2\theta}{1+c-2\sqrt c\cos\theta}\ln|1-\sqrt ce^{i\theta}|^2\,d\theta.
\]
It is straightforward to verify that
\[
f(z)\equiv-(z-z^{-1})^2\big(\mathrm{Log}(1-\sqrt cz)+\sqrt cz\big)-\sqrt c(z-z^3)
\]
is analytic on the unit disk and that
\[
\mathrm{Re}\,f(e^{i\theta})=2\sin^2\theta\,\ln|1-\sqrt ce^{i\theta}|^2.
\]
Therefore, from (5.9) we have
\[
d(c)=\frac{f(\sqrt c)}{1-c}=\frac{c-1}{c}\ln(1-c)-1.
\]
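The closed form for $d(c)$ just obtained can be sanity-checked numerically against the trigonometric integral it was derived from. The check below is ours, not part of the paper (function names are our own); the trapezoid rule on a smooth periodic integrand is extremely accurate here:

```python
import math

def d_closed(c):
    # closed form obtained via Poisson's formula: d(c) = ((c-1)/c) ln(1-c) - 1
    return (c - 1) / c * math.log(1 - c) - 1

def d_numeric(c, n=20000):
    # d(c) = (1/pi) * integral_0^{2pi} sin^2(t) ln(1+c-2 sqrt(c) cos t)
    #        / (1+c-2 sqrt(c) cos t) dt, via a Riemann sum on the periodic
    # integrand (note 1+c-2 sqrt(c) cos t = |1 - sqrt(c) e^{it}|^2 > 0 for c < 1)
    s = 0.0
    for k in range(n):
        t = 2 * math.pi * k / n
        w = 1 + c - 2 * math.sqrt(c) * math.cos(t)
        s += math.sin(t) ** 2 * math.log(w) / w
    return s / math.pi * (2 * math.pi / n)
```

For $c<1$ the integrand is analytic and periodic, so the equispaced sum converges geometrically and the two values agree to many digits.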
1−c c For (1.21) we use (1.18). From (1.2), with H(t) = I[1,∞) (t) we have for z ∈ C+ z=− (5.10) c 1 + . m(z) 1 + m(z) Solving for m(z) we find m(z) = −(z + 1 − c) + p p (z + 1 − c)2 − 4z −(z + 1 − c) + (z − 1 − c)2 − 4c = , 2z 2z the square roots defined to yield positive imaginary parts for z ∈ C+ . As z → x ∈ [a(y), b(y)] (limits defined below (1.1)) we get p p −(x + 1 − c) + 4c − (x − 1 − c)2 i −(x + 1 − c) + (x − a(c))(b(c) − x) i m(x) = = . 2x 2x The identity (5.10) still holds with z replaced by x and from it we get m(x) 1 + xm(x) = , 1 + m(x) c so that m2 (x) 1 1−c =1− 2 (1 + m(x)) c à −(x − 1 − c) + p 4c − (x − 1 − c)2 i 2 !2 p ´ 4c − (x − 1 − c)2 ³p = 4c − (x − 1 − c)2 + (x − 1 − c)i 2c Therefore, from (1.18) à ! Z b(c) x−1−c 1 dx f 0 (x) tan−1 p (5.11) EXf = 2π a(c) 4c − (x − 1 − c)2 Z b(c) f (a(c)) + f (b(c)) f (x) 1 p = dx. − 4 2π a(c) 4c − (x − 1 − c)2 To compute the last integral when f (x) = ln x we make the same substitution as before, arriving at Z 2π √ 1 ln |1 − ceiθ |2 dθ. 4π 0 41 √ We apply (5.9) where now u(z) = ln |1 − cz|2 , which is harmonic, and r = 0. Therefore, the integral must be zero, and we conclude EXln = ln(a(c)b(c)) 1 = ln(1 − c). 4 c To derive (1.22) we use (1.16). Since the z1 , z2 contours cannot enclose the origin (because of the logarithm), neither can the resulting m1 , m2 contours. Indeed, either from √ the graph of x(m) or from m(x) we see that x > b(c) ⇐⇒ m(x) ∈ (−(1 + c)−1 , 0) √ and x ∈ (0, a(y)) ⇐⇒ m(x) < ( c − 1)−1 . For our analysis it is sufficient to know that the m1 , m2 contours, non-intersecting and both taken in the positive direction, enclose (c − 1)−1 and −1, but not 0. Assume the m2 contour encloses the m1 contour. For fixed m2 , using (5.10) we have Z c Z 12 − 1 Log(z(m1 )) (1+m1 )2 m1 dm1 = dm1 1 c 2 (m1 − m2 ) − m1 + 1+m1 (m1 − m2 ) à ! à Z −1 (1 + m1 )2 − cm21 1 1 1 dm1 = 2πi = + − 1 cm1 (m1 − m2 ) m1 + 1 m1 − c−1 m2 + 1 m2 − ! 1 c−1 . ! 1 1 Log(z(m))dm − 1 m + 1 m − c−1 # à # ! 
Z " Z " 1 m − c−1 1 1 1 1 1 1 = Log Log(m)dm. − dm− − 1 1 πi m + 1 m − c−1 m+1 πi m + 1 m − c−1 Therefore 1 VarXln = πi Z à The first integral is zero since the integrand has antiderivative " à !#2 1 m − c−1 1 − Log 2 m+1 which is single valued along the contour. Therefore we conclude that VarXln = −2[Log(−1) − Log((c − 1)−1 )] = −2 ln(1 − c). Finally, we compute expressions for (1.23) and (1.24). Using (5.11) we have EX xr (a(c))r + (b(c))r 1 = − 4 2π 42 Z b(c) a(c) xr p dx 4c − (x − 1 − c)2 (a(c))r + (b(c))r 1 = − 4 4π Z 2π 0 |1 − √ ceiθ |2r dθ (a(c))r + (b(c))r 1 = − 4 4π Z 0 2π r µ ¶µ ¶ X √ r r (− c)j+k ei(j−k)θ dθ j k j,k=0 r µ ¶2 √ 2r √ 2r 1 1X r = ((1 − c) + (1 + c) ) − cj , 4 2 j=0 j which is (1.23). For (1.24) we use (1.16) and rely on observations made in deriving (1.22). For c ∈ (0, 1) the contours can again be made enclosing −1 and not the origin. However, because of the fact that (1.7) derives from (1.14) and the support of F c,I[1,∞) on R+ is [a(c), b(c)], we may also take the contours taken in the same way when c > 1. The case c = 1 simply follows from the continuous dependence of (1.16) on c. Keeping m2 fixed, we have on a contour within 1 of −1 Z (− m11 + c r1 1+m1 ) (m1 − m2 )2 dm1 µ ¶r ¶−2 m1 + 1 1 1−c 1 −r1 −2 1− =c (1 − (m1 + 1)) (m2 + 1) dm1 + m1 + 1 c m2 + 1 ¶k Z X r1 µ ¶ µ r1 1−c 1 r1 =c (1 + m1 )k−r1 k1 c k1 =0 µ ¶ ¶`−1 ∞ ∞ µ X X r1 + j − 1 m1 + 1 j −2 ` dm1 (m1 + 1) (m2 + 1) j m + 1 2 j=0 Z µ r1 `=1 r1 = 2πic rX −k1 1 −1 r1 X k1 =0 `=1 µ ¶µ ¶k µ ¶ r1 1 − c 1 2r1 − 1 − (k1 + `) `(m2 + 1)−`−1 . r1 − 1 k1 c Therefore, ¶k µ ¶ r1 −1 r1X −k1 µ ¶ µ r1 1 − c 1 2r1 − 1 − (k1 + `) i r1 +r2 X Cov(Xxr1 , Xxr2 ) = − c ` r1 − 1 π k1 c k1 =0 `=1 µ ¶ Z r ∞ µ 2 X r2 ¶ µ 1 − c ¶k2 X r2 + j − 1 −`−1 k2 −r2 × (m2 + 1) (m2 + 1) (m2 + 1)j dm2 j k2 c j=0 k2 =0 43 rX 1 −1 r1 +r2 = 2c ¶k +k r −k µ ¶µ ¶ r2 µ ¶µ ¶µ X r1 r2 1 − c 1 2 1X1 2r1 −1−(k1 + `) 2r2 −1 −k2 + ` ` , r1 − 1 r2 − 1 k1 k2 c k1 =0 k2 =0 `=1 which is (1.24), and we are done. 
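The closed forms just obtained lend themselves to a quick numerical sanity check. The sketch below is our own illustration, not part of the paper; all function names are ours. For a sample value of $c\in(0,1)$ it verifies, with plain midpoint sums, the integral representation of $d(c)$, the vanishing of the mean of $\ln|1-\sqrt c\,e^{i\theta}|^2$ over the circle used for (1.21), and the combinatorial identity $\frac1{2\pi}\int_0^{2\pi}|1-\sqrt c\,e^{i\theta}|^{2r}\,d\theta = \sum_j\binom{r}{j}^2c^j$ behind (1.23):

```python
import math

def d_closed(c):
    # closed form derived above: d(c) = ((c-1)/c) ln(1-c) - 1
    return (c - 1) / c * math.log(1 - c) - 1

def d_integral(c, n=4096):
    # (1/pi) * int_0^{2pi} sin^2(t)/(1+c-2*sqrt(c)cos t) * ln(1+c-2*sqrt(c)cos t) dt,
    # computed with a midpoint rule (very accurate for smooth periodic integrands)
    h = 2 * math.pi / n
    s = 0.0
    for k in range(n):
        t = (k + 0.5) * h
        w = 1 + c - 2 * math.sqrt(c) * math.cos(t)  # = |1 - sqrt(c) e^{it}|^2
        s += math.sin(t) ** 2 / w * math.log(w)
    return s * h / math.pi

def mean_log_circle(c, n=4096):
    # (1/4pi) * int_0^{2pi} ln|1 - sqrt(c) e^{it}|^2 dt; equals u(0) = 0 by (5.9)
    h = 2 * math.pi / n
    return sum(math.log(1 + c - 2 * math.sqrt(c) * math.cos((k + 0.5) * h))
               for k in range(n)) * h / (4 * math.pi)

def moment_lhs(r, c, n=4096):
    # (1/2pi) * int_0^{2pi} |1 - sqrt(c) e^{it}|^{2r} dt
    h = 2 * math.pi / n
    return sum((1 + c - 2 * math.sqrt(c) * math.cos((k + 0.5) * h)) ** r
               for k in range(n)) * h / (2 * math.pi)

def moment_rhs(r, c):
    # sum_{j=0}^{r} binom(r,j)^2 c^j, the value claimed in deriving (1.23)
    return sum(math.comb(r, j) ** 2 * c ** j for j in range(r + 1))
```

For instance, with $c = 0.5$ one finds `d_closed(0.5)` $= \ln 2 - 1 \approx -0.3069$, matched by `d_integral(0.5)`; `mean_log_circle(0.5)` is numerically zero; and `moment_lhs(3, 0.5)` agrees with `moment_rhs(3, 0.5) = 7.875`.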
APPENDIX

We verify (1.9b) by modifying the proof in Bai and Yin (1993) (hereafter referred to as BY (1993)). To avoid confusion we maintain as much as possible the original notation used in BY (1993).

Theorem. Let $Z_{ij}\in\mathbb{C}$, $i=1,\ldots,p$, $j=1,\ldots,n$, be i.i.d. with $EZ_{11}=0$, $E|Z_{11}|^2=1$, and $E|Z_{11}|^4<\infty$. Let $S_n = (1/n)XX^*$, where $X = (X_{ij})$ is $p\times n$ with
$$X_{ij} = X_{ij}(n) = Z_{ij}I_{\{|Z_{ij}|\le\delta_n\sqrt n\}} - EZ_{ij}I_{\{|Z_{ij}|\le\delta_n\sqrt n\}},$$
where $\delta_n\to0$ more slowly than the sequence constructed in the proof of Lemma 2.2 of Yin, Bai, and Krishnaiah (1988), and satisfying $\delta_n n^{1/3}\to\infty$. Assume $p/n\to y\in(0,1)$ as $n\to\infty$. Then for any $\eta<(1-\sqrt y)^2$ and any $\ell>0$,
$$P(\lambda_{\min}(S_n) < \eta) = o(n^{-\ell}).$$

Proof. We follow along the proof of Theorem 1 in BY (1993). The conclusions of Lemmas 1 and 3-8 need to be improved from "almost sure" statements to ones reflecting tail probabilities. We shall denote the augmented lemmas with primes ($'$) after the numbers. We remark here that the proof in BY (1993) assumes the entries $Z_{ij}$ to be real, but all the arguments can easily be modified to allow complex variables.

For Lemma 1 it has been shown that for the Hermitian matrices $T(l)$ defined in (2.2), and integers $m_n$ satisfying $m_n/\ln n\to\infty$, $m_n\delta_n^{1/3}/\ln n\to0$, and $m_n/(\delta_n\sqrt n)\to0$,
$$E\,\mathrm{tr}\,T^{2m_n}(l) \le n^2\big((2l+1)(l+1)\big)^{2m_n}(p/n)^{m_n(l-1)}(1+o(1))^{4m_nl}$$
((2.13) of BY (1993)). Therefore, writing $m_n = k_n\ln n$, for any $\varepsilon>0$ there exists an $a\in(0,1)$ such that for all $n$ large
$$(A.1)\qquad P\big(\mathrm{tr}\,T(l) > (2l+1)(l+1)y^{(l-1)/2} + \varepsilon\big) \le n^2a^{m_n} = n^{2+k_n\ln a} = o(n^{-\ell})$$
for any positive $\ell$. We call (A.1) Lemma 1$'$.

We next replace Lemma 2 of BY (1993) with

Lemma 2$'$. For every $n$ let $X_1,X_2,\ldots,X_n$ be i.i.d. with $X_1 = X_1(n) \sim X_{11}(n)$. Then for any $\varepsilon>0$ and $\ell>0$
$$P\Big(\Big|n^{-1}\sum_{i=1}^n|X_i|^2 - 1\Big| > \varepsilon\Big) = o(n^{-\ell}),$$
and for any $f>1$
$$P\Big(n^{-f}\sum_{i=1}^n|X_i|^{2f} > \varepsilon\Big) = o(n^{-\ell}).$$

Proof.
Since, as $n\to\infty$, $E|X_1|^2\to1$,
$$n^{-f}\sum_{i=1}^nE|X_i|^{2f} \le 2^{2f}E|Z_{11}|^{2f}\,n^{1-f}\to0 \quad\text{for } f\in(1,2],$$
and
$$n^{-f}\sum_{i=1}^nE|X_i|^{2f} \le 2^{2f}E|Z_{11}|^4\,\delta_n^{2f-4}\,n^{1-f+(2f-4)/2} \le Kn^{-1}\to0 \quad\text{for } f>2,$$
it is sufficient to show, for $f\ge1$,
$$(A.2)\qquad P\Big(n^{-f}\Big|\sum_{i=1}^n\big(|X_i|^{2f}-E|X_i|^{2f}\big)\Big| > \varepsilon\Big) = o(n^{-\ell}).$$
For any positive integer $m$ we have this probability bounded by
$$\varepsilon^{-2m}n^{-2mf}\,E\Big[\sum_{i=1}^n\big(|X_i|^{2f}-E|X_i|^{2f}\big)\Big]^{2m}$$
$$= \varepsilon^{-2m}n^{-2mf}\sum_{\substack{i_1\ge0,\ldots,i_n\ge0\\ i_1+\cdots+i_n=2m}}\binom{2m}{i_1\cdots i_n}\prod_{t=1}^nE\big(|X_t|^{2f}-E|X_t|^{2f}\big)^{i_t}$$
$$= \varepsilon^{-2m}n^{-2mf}\sum_{k=1}^m\binom{n}{k}\sum_{\substack{i_1\ge2,\ldots,i_k\ge2\\ i_1+\cdots+i_k=2m}}\binom{2m}{i_1\cdots i_k}\prod_{t=1}^kE\big(|X_1|^{2f}-E|X_1|^{2f}\big)^{i_t}$$
$$\le 2^{2m}\varepsilon^{-2m}n^{-2mf}\sum_{k=1}^mn^k\sum_{\substack{i_1\ge2,\ldots,i_k\ge2\\ i_1+\cdots+i_k=2m}}\binom{2m}{i_1\cdots i_k}\prod_{t=1}^kE|X_1|^{2fi_t}$$
$$\le 2^{2m}\varepsilon^{-2m}n^{-2mf}\sum_{k=1}^mn^k\sum_{\substack{i_1\ge2,\ldots,i_k\ge2\\ i_1+\cdots+i_k=2m}}\binom{2m}{i_1\cdots i_k}\prod_{t=1}^k(2\delta_n\sqrt n)^{2fi_t-4}E|Z_{11}|^4$$
$$\le 2^{2m}\varepsilon^{-2m}n^{-2mf}\sum_{k=1}^mk^{2m}(2\delta_n\sqrt n)^{4fm-4k}\,n^k\,(E|Z_{11}|^4)^k = 2^{2m}\varepsilon^{-2m}\sum_{k=1}^mk^{2m}(2\delta_n)^{4fm}(E|Z_{11}|^4)^k(4\delta_n^2n)^{-k}$$
$$\le \text{(for all $n$ large)}\quad \left(\frac{(2\delta_n)^{2f}\,4m}{\varepsilon\,\ln\!\big(4\delta_n^2n/E|Z_{11}|^4\big)}\right)^{2m},$$
where we have used the inequality $a^{-x}x^b \le (b/\ln a)^b$, valid for all $a>1$, $b>0$, $x\ge1$. Choose $m = m_n = k_n\ln n$ with $k_n\to\infty$ and $\delta_n^{2f}k_n\to0$. Since $\delta_nn^{1/3}\ge1$ for all $n$ large, we get for these $n$ that $\ln(\delta_n^2n)\ge(1/3)\ln n$. Using this and the fact that $\lim_{x\to\infty}x^{1/x}=1$, we have the existence of $a\in(0,1)$ for which
$$\left(\frac{(2\delta_n)^{2f}\,4m}{\varepsilon\,\ln\!\big(4\delta_n^2n/E|Z_{11}|^4\big)}\right)^{2m} \le a^{2k_n\ln n} = n^{2k_n\ln a}$$
for all $n$ large. Therefore (A.2) holds.

Redefining the matrix $X^{(f)}$ in BY (1993) to be $[|X_{uv}|^f]$, Lemma 3$'$ states that for any positive integer $f$
$$P\big(\lambda_{\max}\{n^{-f}X^{(f)}X^{(f)*}\} > 7+\varepsilon\big) = o(n^{-\ell})$$
for any positive $\varepsilon$ and $\ell$. Its proof relies on Lemmas 1$'$, 2$'$ (for $f=1,2$) and on the bounds used in the proof of Lemma 3 in BY (1993). In particular we have the Geršgorin bound
$$(A.3)\qquad \lambda_{\max}\{n^{-f}X^{(f)}X^{(f)*}\} \le \max_in^{-f}\sum_{j=1}^n|X_{ij}|^{2f} + \max_in^{-f}\sum_{k\ne i}\sum_{j=1}^n|X_{ij}|^f|X_{kj}|^f$$
$$\le \max_in^{-f}\sum_{j=1}^n|X_{ij}|^{2f} + \Big(\max_in^{-1}\sum_{j=1}^n|X_{ij}|^f\Big)\Big(\max_jn^{-1}\sum_{k=1}^p|X_{kj}|^f\Big).$$
We show the steps involved for $f=2$. With $\varepsilon_1>0$ satisfying $(p/n+\varepsilon_1)(1+\varepsilon_1) < 7+\varepsilon$ for all $n$, we have from Lemma 2$'$ and (A.3)
$$P\big(\lambda_{\max}\{n^{-2}X^{(2)}X^{(2)*}\} > 7+\varepsilon\big) \le pP\Big(n^{-2}\sum_{j=1}^n|X_{1j}|^4 > \varepsilon_1\Big) + pP\Big(\Big|n^{-1}\sum_{j=1}^n|X_{1j}|^2 - 1\Big| > \varepsilon_1\Big) + nP\Big(\Big|p^{-1}\sum_{k=1}^p|X_{k1}|^2 - 1\Big| > \varepsilon_1\Big) = o(n^{-\ell}).$$
The same argument can be used to prove Lemma 4$'$, which states that for integer $f>2$
$$P\big(\|n^{-f/2}X^{(f)}\| > \varepsilon\big) = o(n^{-\ell})$$
for any positive $\varepsilon$ and $\ell$. The proofs of Lemmas 4$'$-8$'$ are handled using the arguments in BY (1993) and those used above: each quantity $L_n$ in BY (1993) that is $o(1)$ a.s. can be shown to satisfy $P(|L_n|>\varepsilon) = o(n^{-\ell})$.

From Lemmas 1$'$ and 8$'$ there exists a positive $C$ such that for every integer $k>0$ and positive $\varepsilon$ and $\ell$
$$(A.4)\qquad P\big(\|T-yI\|^k > Ck^42^ky^{k/2} + \varepsilon\big) = o(n^{-\ell}).$$
For given $\varepsilon>0$ let the integer $k>0$ be such that $\big|2\sqrt y\big(1-(Ck^4)^{1/k}\big)\big| < \varepsilon/2$. Then
$$2\sqrt y + \varepsilon > 2\sqrt y(Ck^4)^{1/k} + \varepsilon/2 \ge \big(Ck^42^ky^{k/2} + (\varepsilon/2)^k\big)^{1/k}.$$
Therefore from (A.4) we get for any $\ell>0$
$$(A.5)\qquad P\big(\|T-yI\| > 2\sqrt y + \varepsilon\big) = o(n^{-\ell}).$$
From Lemma 2$'$ and (A.5) we get for positive $\varepsilon$ and $\ell$
$$P\big(\|S_n-(1+y)I\| > 2\sqrt y + \varepsilon\big) \le P\big(\|S_n-I-T\| > \varepsilon/2\big) + o(n^{-\ell}) = P\Big(\max_{i\le p}\Big|n^{-1}\sum_{j=1}^n|X_{ij}|^2 - 1\Big| > \varepsilon/2\Big) + o(n^{-\ell}) = o(n^{-\ell}).$$
Finally, for any positive $\eta<(1-\sqrt y)^2$ and $\ell>0$,
$$P(\lambda_{\min}(S_n) < \eta) = P\big(\lambda_{\min}(S_n-(1+y)I) < \eta - (1-\sqrt y)^2 - 2\sqrt y\big) \le P\big(\|S_n-(1+y)I\| > 2\sqrt y + (1-\sqrt y)^2 - \eta\big) = o(n^{-\ell}),$$
and we are done.

Acknowledgments. Part of this work was done while J. W. Silverstein visited the Department of Statistics and Applied Probability at the National University of Singapore. He thanks the members of the department for their hospitality.

REFERENCES

Bai, Z. D. (1999). Methodologies in spectral analysis of large dimensional random matrices, a review. Statistica Sinica 9 611-677.

Bai, Z. D. and Saranadasa, H. (1996). Effect of high dimension: comparison of significance tests for a high dimensional two sample problem. Statistica Sinica 6 311-329.

Bai, Z. D. and Silverstein, J. W. (1998). No eigenvalues outside the support of the limiting spectral distribution of large dimensional sample covariance matrices. Ann. Probab. 26 316-345.

Bai, Z. D. and Silverstein, J. W.
(1999). Exact separation of eigenvalues of large dimensional sample covariance matrices. Ann. Probab. 27 1536-1555.

Bai, Z. D., Silverstein, J. W. and Yin, Y. Q. (1988). A note on the largest eigenvalue of a large dimensional sample covariance matrix. J. Multivariate Anal. 26 166-168.

Bai, Z. D. and Yin, Y. Q. (1993). Limit of the smallest eigenvalue of a large dimensional sample covariance matrix. Ann. Probab. 21 1275-1294.

Billingsley, P. (1968). Convergence of Probability Measures. Wiley, New York.

Billingsley, P. (1995). Probability and Measure, Third Edition. Wiley, New York.

Burkholder, D. L. (1973). Distribution function inequalities for martingales. Ann. Probab. 1 19-42.

Dempster, A. P. (1958). A high dimensional two sample significance test. Ann. Math. Statist. 29 995-1010.

Jonsson, D. (1982). Some limit theorems for the eigenvalues of a sample covariance matrix. J. Multivariate Anal. 12 1-38.

Marčenko, V. A. and Pastur, L. A. (1967). Distribution of eigenvalues for some sets of random matrices. Math. USSR-Sb. 1 457-483.

Silverstein, J. W. and Choi, S. I. (1995). Analysis of the limiting spectral distribution of large dimensional random matrices. J. Multivariate Anal. 54 295-309.

Silverstein, J. W. and Combettes, P. L. (1992). Signal detection via spectral theory of large dimensional random matrices. IEEE Trans. Signal Processing 40 2100-2105.

Titchmarsh, E. C. (1939). The Theory of Functions, Second Edition. Oxford University Press, London.

Yin, Y. Q. (1986). Limiting spectral distribution for a class of random matrices. J. Multivariate Anal. 20 50-68.

Yin, Y. Q., Bai, Z. D. and Krishnaiah, P. R. (1988). On limit of the largest eigenvalue of the large dimensional sample covariance matrix. Probab. Theory Related Fields 78 509-521.

Yin, Y. Q. and Krishnaiah, P. R. (1983). A limit theorem for the eigenvalues of product of two random matrices. J. Multivariate Anal. 13 489-507.
Department of Statistics and Applied Probability
National University of Singapore
10 Kent Ridge Crescent
Singapore 119260
E-mail: [email protected]

Department of Mathematics
Box 8205
North Carolina State University
Raleigh, North Carolina 27695-8205
E-mail: [email protected]