CLT for Linear Spectral Statistics of Large Dimensional Sample Covariance Matrices
By Z.D. Bai¹ and Jack W. Silverstein²
National University of Singapore and North Carolina State University
Summary
Let $B_n = (1/N) T_n^{1/2} X_n X_n^* T_n^{1/2}$, where $X_n = (X_{ij})$ is $n \times N$ with i.i.d. complex standardized entries having finite fourth moment, and $T_n^{1/2}$ is a Hermitian square root of the nonnegative definite Hermitian matrix $T_n$. The limiting behavior, as $n \to \infty$ with $n/N$ approaching a positive constant, of functionals of the eigenvalues of $B_n$, where each is given equal weight, is studied. Due to the limiting behavior of the empirical spectral distribution of $B_n$, it is known that these linear spectral statistics converge a.s. to a nonrandom quantity. This paper shows their rate of convergence to be $1/n$ by proving that, after proper scaling, they form a tight sequence. Moreover, if $EX_{11}^2 = 0$ and $E|X_{11}|^4 = 2$, or if $X_{11}$ and $T_n$ are real and $EX_{11}^4 = 3$, they are shown to have Gaussian limits.
¹ Research supported in part by Research Grant R-155-000-014-112 from the National University of Singapore.
² Research supported by the National Science Foundation under grant DMS-9703591.
AMS 2000 subject classifications. Primary 15A52, 60F05; Secondary 62H99.
Key Words and Phrases. Linear spectral statistics, random matrix, empirical distribution function of eigenvalues, Stieltjes transform.
1. Introduction. Due to the rapid development of modern technology, statisticians
are confronted with the task of analyzing data with ever increasing dimension. For example, stock market analysis can now include a large number of companies. The study
of DNA can now incorporate a sizable number of its base pairs. Computers can easily
perform computations with high dimensional data. Indeed, within several milliseconds, a
mainframe can complete the spectral decomposition of a 1000 × 1000 symmetric matrix, a
feat unachievable only 20 years ago. In the past, so-called dimension reduction schemes
played the main role in dealing with high dimensional data. But a large portion of the information contained in the original data would inevitably be lost. For example, in variable
selection of multivariate linear regression models, one loses all information contained
in the unselected variables; in principal component analysis, all information contained in
the components deemed “non-principal” would be gone. Now, when dimension reduction
is performed, it is usually not due to computational restrictions.
However, even though the technology exists to compute much of what is needed, there
is a fundamental problem with the analytical tools used by statisticians. Their use relies
on their asymptotic behavior as the number of samples increases. It is to be expected
that larger dimension will require larger samples in order to maintain the same level of
behavior. But the required increase is typically orders of magnitude larger, demanding sample sizes
that are simply unattainable in most situations. With a necessary limitation on the number
of samples, many frequently used statistics in multivariate analysis perform in a completely
different manner than they do on data of low dimension with no restriction on sample size.
Some methods behave very poorly (see Bai and Saranadasa (1996)), and some are even
not applicable (see Dempster (1958)). Consider the following example.
Let $X_{ij}$ be i.i.d. standard normal variables. Write
$$S_N = \left(\frac{1}{N}\sum_{k=1}^{N} X_{ik}X_{jk}\right)_{i,j=1}^{n},$$
which can be considered a sample covariance matrix formed from $N$ samples of an $n$-dimensional mean zero random vector with population covariance matrix $I$. An important statistic in multivariate
analysis is
$$T_N = \ln(\det S_N) = \sum_{j=1}^{n} \ln(\lambda_{N,j}),$$
where $\lambda_{N,j}$, $j = 1, \dots, n$, are the eigenvalues of $S_N$. When $n$ is fixed, $\lambda_{N,j} \to 1$ almost surely as $N \to \infty$, and thus $T_N \xrightarrow{a.s.} 0$.
Further, by taking a Taylor expansion of $\ln x$, one can show that
$$\sqrt{N/n}\; T_N \to_D N(0, 2)$$
for any fixed n. This suggests the possibility that TN is asymptotically normal, provided
that $n = O(N)$. However, this is not the case. Let us see what happens when $n/N \to c \in (0, 1)$ as $n \to \infty$. Using results on the limiting spectral distribution of $\{S_N\}$ [see Marčenko and Pastur (1967), Bai (1999)], we have with probability one

(1.1)  $$\frac{1}{n}\, T_N \to \int_{a(c)}^{b(c)} \frac{\ln x}{2\pi x c}\sqrt{(b(c) - x)(x - a(c))}\, dx = \frac{c - 1}{c}\ln(1 - c) - 1 \equiv d(c) < 0,$$

where $a(c) = (1 - \sqrt{c})^2$, $b(c) = (1 + \sqrt{c})^2$ (see Section 5 for a derivation of the integral).
This shows that, almost surely,
$$\sqrt{N/n}\; T_N \sim d(c)\sqrt{Nn} \to -\infty.$$
Thus, any test which assumes asymptotic normality of $\sqrt{N/n}\; T_N$ will result in a serious error.
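The breakdown is easy to observe numerically. The following Monte Carlo sketch (my own illustration, not part of the paper; the dimensions are arbitrary) shows that $(1/n)T_N$ sits near $d(c)$ rather than near 0:

```python
import numpy as np

# Simulation sketch: for n/N = c fixed, (1/n) T_N concentrates near
# d(c) = ((c-1)/c) ln(1-c) - 1, not near 0.
rng = np.random.default_rng(0)
n, N = 200, 400                          # c = n/N = 0.5
c = n / N
d_c = (c - 1) / c * np.log(1 - c) - 1    # a.s. limit of (1/n) T_N

X = rng.standard_normal((n, N))
S = X @ X.T / N                          # sample covariance matrix S_N
T = np.linalg.slogdet(S)[1]              # T_N = ln det S_N
print(T / n, d_c)                        # the two values nearly agree
```

Since $T_N/n$ stays near the negative constant $d(c)$, $\sqrt{N/n}\,T_N$ is of order $-\sqrt{Nn}$, far outside the range predicted by $N(0,2)$.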
Besides demonstrating problems with relying on traditional methodology when sample size is restricted, the example introduces one of several results that can be used to
handle data with large dimension n, proportional to N , the sample size. They are limit
theorems, as n approaches infinity, on the eigenvalues of a class of random matrices of
sample covariance type [Yin and Krishnaiah (1983), Yin (1986), Silverstein (1995), Bai
and Silverstein (1998), (1999)]. They take the form
$$B_n = \frac{1}{N}\, T_n^{1/2} X_n X_n^* T_n^{1/2}$$
where $X_n = (X_{ij}^n)$ is $n \times N$, $X_{ij}^n \in \mathbb{C}$ are i.i.d. with $E|X_{11}^n - EX_{11}^n|^2 = 1$, $T_n^{1/2}$ is $n \times n$ random Hermitian, with $X_n$ and $T_n^{1/2}$ independent. When $X_{11}^n$ is known to have mean zero and $T_n$ is nonrandom, $B_n$ can be viewed as a sample covariance matrix, which includes any Wishart matrix, formed from $N$ samples of the random vector $T_n^{1/2} X_{\cdot 1}^n$ ($X_{\cdot 1}^n$ denoting the first column of $X_n$), which has population covariance matrix $T_n \equiv (T_n^{1/2})^2$. Besides sample covariance matrices, $B_n$, whose eigenvalues are the same as those of $(1/N) X_n X_n^* T_n$, models the spectral behavior of other matrices important to multivariate statistics, in particular multivariate $F$ matrices, where $X_{11}^n$ is $N(0, 1)$ and $T_n$ is the inverse of another Wishart matrix.
The basic limit theorem on the eigenvalues of Bn concerns its empirical spectral distribution F Bn , where for any matrix A with real eigenvalues, F A denotes the empirical
distribution function of the eigenvalues of A, that is, if A is n × n then
$$F^A(x) = \frac{1}{n}\,(\text{number of eigenvalues of } A \le x).$$
If
1) for all $n$, $i$, $j$, $X_{ij}^n$ are i.d. (identically distributed),
2) with probability one, $F^{T_n} \to_D H$, a proper cumulative distribution function (c.d.f.), and
3) $n/N \to c > 0$ as $n \to \infty$,
then with probability one $F^{B_n}$ converges in distribution to $F^{c,H}$, a nonrandom proper c.d.f.
The case when $H$ distributes its mass at one positive number (called the Marčenko-Pastur law with dimension to sample size ratio $c \in (0, 1)$), as in the above example, is one of seven non-trivial cases where an explicit expression for $F^{c,H}$ is known (the multivariate $F$ matrix case [Silverstein (1985)] and, as will be seen below, when $H$ is discrete with at most three mass points). However, a good deal of information, including a way to compute $F^{c,H}$, can be extracted from an equation satisfied by its Stieltjes transform,
defined for any c.d.f. $G$ to be
$$m_G(z) \equiv \int \frac{1}{\lambda - z}\, dG(\lambda), \qquad z \in \mathbb{C}^+ \equiv \{z \in \mathbb{C} : \operatorname{Im} z > 0\}.$$
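The definition is straightforward to evaluate on an empirical spectral distribution. A small numerical sketch (my illustration; $T_n = I$ is assumed, so $H$ is a point mass at 1, and the limiting value comes from the quadratic that the transform satisfies in that case):

```python
import numpy as np

# Compare the empirical Stieltjes transform of F^{S_N} at a fixed z in C+
# with its Marcenko-Pastur limit (T = I, so H = delta_1).
rng = np.random.default_rng(1)
n, N = 300, 600                       # c = 0.5
c = n / N
z = 1.0 + 0.5j                        # a fixed point in C+

X = rng.standard_normal((n, N))
lam = np.linalg.eigvalsh(X @ X.T / N)
m_emp = np.mean(1.0 / (lam - z))      # m_G(z) = int (lambda - z)^{-1} dG(lambda)

# For H = delta_1 the companion transform solves z*m^2 + (z + 1 - c)*m + 1 = 0;
# at this z exactly one root lies in C+.
um = [r for r in np.roots([z, z + 1 - c, 1]) if r.imag > 0][0]
m_th = (um + (1 - c) / z) / c         # recover m from the companion transform
print(abs(m_emp - m_th))              # small for large n
```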
For each $z \in \mathbb{C}^+$, the Stieltjes transform $m(z) \equiv m_{F^{c,H}}(z)$ is the unique solution to
$$m = \int \frac{1}{\lambda(1 - c - czm) - z}\, dH(\lambda)$$
in the set $\{m \in \mathbb{C} : -\frac{1-c}{z} + cm \in \mathbb{C}^+\}$. The equation takes on a simpler form when $F^{c,H}$ is replaced by
$$\underline{F}^{c,H} \equiv (1 - c) I_{[0,\infty)} + c F^{c,H}$$
($I_A$ denoting the indicator function on the set $A$), which is the limiting empirical distribution function of $\underline{B}_n \equiv (1/N) X_n^* T_n X_n$ (the spectrum of which differs from that of $B_n$ by $|n - N|$ zeros). Its Stieltjes transform
$$\underline{m}(z) \equiv m_{\underline{F}^{c,H}}(z) = -\frac{1 - c}{z} + c\, m(z)$$
has inverse

(1.2)  $$z = z(\underline{m}) = -\frac{1}{\underline{m}} + c \int \frac{t}{1 + t\underline{m}}\, dH(t).$$
Using (1.2) it is shown in Silverstein and Choi (1995) that, on $(0, \infty)$, $F^{c,H}$ has a continuous density, analytic inside its support, given by

(1.3)  $$f^{c,H}(x) = c^{-1}\frac{d}{dx}\underline{F}^{c,H}(x) = (c\pi)^{-1}\operatorname{Im}\underline{m}(x) \equiv (c\pi)^{-1}\lim_{z \to x}\operatorname{Im}\underline{m}(z).$$
Also, $F^{c,H}(0) = \max[1 - c^{-1}, H(0)]$. Moreover, considering (1.2) for $\underline{m}$ real, the range of values where it is increasing constitutes the complement of the support of $F^{c,H}$ on $(0, \infty)$ [Marčenko and Pastur (1967), Silverstein and Choi (1995)]. From (1.2) and (1.3), $f^{c,H}(x)$ can be computed using Newton's method for each $x \in (0, \infty)$ inside its support (see Bai and Silverstein (1998) for an illustration of the density when $c = .1$ and $H$ places mass 0.2, 0.4, and 0.4 at, respectively, 1, 3, and 10).
Notice in (1.2) when H is discrete with at most three positive mass points the density
has an explicit expression, since m(z) is the root of a polynomial of degree at most four.
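A minimal sketch of the Newton computation mentioned above (my own implementation; the discrete $H$, the starting point, and the iteration count are illustrative choices, not the paper's): for $H$ placing mass $w_k$ at $t_k$, solve $z(\underline m) = x$ from (1.2) and read off the density via (1.3).

```python
import numpy as np

# Solve z(m) = x from (1.2) by Newton's method for discrete H, then
# f(x) = |Im m| / (c*pi) by (1.3).  t, w hold the mass points and weights.
def spectral_density(x, c, t, w, m0=1j, iters=200):
    m = m0
    for _ in range(iters):
        g = -1 / m + c * np.sum(w * t / (1 + t * m)) - x             # z(m) - x
        dg = 1 / m ** 2 - c * np.sum(w * t ** 2 / (1 + t * m) ** 2)  # z'(m)
        m = m - g / dg                                               # Newton step
    return abs(m.imag) / (c * np.pi)

# Check against the explicit Marcenko-Pastur density (H a point mass at 1):
c = 0.5
a, b = (1 - np.sqrt(c)) ** 2, (1 + np.sqrt(c)) ** 2
x = 1.0
f_newton = spectral_density(x, c, np.array([1.0]), np.array([1.0]))
f_explicit = np.sqrt((b - x) * (x - a)) / (2 * np.pi * c * x)
print(f_newton, f_explicit)       # both approximately 0.421
```

For a point-mass $H$ the equation is a quadratic, so the Newton output can be verified exactly; for two or three mass points the same routine applies with longer `t`, `w` arrays.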
Convergence in distribution of $F^{B_n}$ of course reveals no information on the number of eigenvalues of $B_n$ appearing in any interval $[a, b]$ outside the support of $F^{c,H}$, other than that the number is almost surely $o(n)$. In Bai and Silverstein (1998) it is shown that, with
probability one, no eigenvalues of Bn appear in [a, b] for all n large under the following
additional assumptions:
1′) $X_n$ is the first $n$ rows and $N$ columns of a doubly infinite array of i.i.d. random variables, with $EX_{11} = 0$, $E|X_{11}|^2 = 1$, and $E|X_{11}|^4 < \infty$,
2′) $T_n$ is nonrandom, $\|T_n\|$, the spectral norm of $T_n$, is bounded in $n$, and
3′) $[a, b]$ lies in an open subset of $(0, \infty)$ which is outside the support of $F^{c_n,H_n}$ for all $n$ large, where $c_n \equiv n/N$ and $H_n \equiv F^{T_n}$.
The result extends what has been previously known on the extreme eigenvalues of $(1/N) X_n X_n^*$ ($T_n = I$). Let $\lambda_{\max}^A$, $\lambda_{\min}^A$ denote, respectively, the largest and smallest eigenvalues of the Hermitian matrix $A$. Under condition 1′), Yin, Bai, and Krishnaiah (1988) showed, as $n \to \infty$,
$$\lambda_{\max}^{(1/N)X_nX_n^*} \xrightarrow{a.s.} (1 + \sqrt{c})^2,$$
while in Bai and Yin (1993), for $c \le 1$,
$$\lambda_{\min}^{(1/N)X_nX_n^*} \xrightarrow{a.s.} (1 - \sqrt{c})^2.$$
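These edge limits are already visible at moderate dimensions. A quick simulation sketch (my illustration; dimensions arbitrary):

```python
import numpy as np

# The extreme eigenvalues of (1/N) X X* settle near (1 -+ sqrt(c))^2.
rng = np.random.default_rng(2)
n, N = 400, 1600                         # c = 0.25
c = n / N
X = rng.standard_normal((n, N))
lam = np.linalg.eigvalsh(X @ X.T / N)
print(lam.max(), (1 + np.sqrt(c)) ** 2)  # largest eigenvalue vs 2.25
print(lam.min(), (1 - np.sqrt(c)) ** 2)  # smallest eigenvalue vs 0.25
```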
If [a, b] separates the support of F c,H in (0, ∞) into two non-empty sets, then associated with it is another interval J which separates the eigenvalues of Tn for all n large. The
mass F cn ,Hn places, say, to the right of b equals the proportion of eigenvalues of Tn lying
to the right of J. In Bai and Silverstein (1999) it is proven that, with probability one, the
number of eigenvalues of Bn and Tn lying on the same side of their respective intervals is
the same for all n large.
The above two results are intuitively plausible when viewing Bn as an approximation
of Tn , especially when cn is small (it can be shown that F c,H →D H as c → 0). However,
regardless of the size of c, when separation in the support of F c,H on (0, ∞) associated
with a gap in the spectrum of Tn occurs, there will be exact splitting of the eigenvalues of
Bn .
These results can be used in applications where location of eigenvalues of the population covariance matrix is needed, as in the detection problem in array signal processing [see
Silverstein and Combettes (1992)]. Here, each entry of the sampled vector is a reading off
a sensor, due to an unknown number q of sources emitting signals in a noise-filled environment (q < n). The problem is to determine q. The smallest eigenvalue of the population
covariance matrix is positive with multiplicity n − q (the so-called “noise” eigenvalues).
The traditional approach has been to sample enough times so that the sample covariance
matrix is close to the population matrix, relying on fixed dimension, large sample asymptotic analysis. However, it may be impossible to sample enough times if q is sizable. The
above results show that in order to determine the number of sources, one need only sample enough times so that the eigenvalues of $B_n$ split into two discernible groups. The number in the group on the right will, with high probability, equal $q$.
The results also enable us to understand the true behavior of statistics such as TN
in the above example when n and N are large but on the same order of magnitude: TN
is not close to zero, rather n−1 TN is close to the quantity d(c) in (1.1), or perhaps more
appropriately d(cn ).
However, in order to fully utilize $n^{-1}T_N$, typically in hypothesis testing, it is important to establish the limiting distribution of $T_N - nd(c_n)$. We come to the main purpose of this paper: to study the limiting distribution of normalized spectral functionals like $T_N - nd(c_n)$ and, as a by-product, the rate of convergence of statistics such as $n^{-1}T_N$, functionals of the eigenvalues of $B_n$ where each is given equal weight. We will call them linear spectral
statistics, quantities of the form
$$\frac{1}{n}\sum_{j=1}^{n} f(\lambda_j) = \int f(x)\, dF^{B_n}(x) \qquad (\lambda_1, \dots, \lambda_n \text{ denoting the eigenvalues of } B_n),$$
where $f$ is a function on $[0, \infty)$.
We will show, under the assumption $E|X_{11}|^4 < \infty$ and the analyticity of $f$, that the rate at which $\int f(x)\, dF^{B_n}(x) - \int f(x)\, dF^{c_n,H_n}(x)$ approaches zero is essentially $1/n$. Define
$$G_n(x) = n[F^{B_n}(x) - F^{c_n,H_n}(x)].$$
The main result is stated in the following theorem.
Theorem 1.1. Assume:
(a) For each $n$, $X_{ij} = X_{ij}^n$, $i \le n$, $j \le N$, are i.i.d., i.d. for all $n$, $i$, $j$, with $EX_{11} = 0$, $E|X_{11}|^2 = 1$, $E|X_{11}|^4 < \infty$, and $n/N \to c$;
(b) assumption 2′) above.
Let $f_1, \dots, f_r$ be $C^1$ functions on $\mathbb{R}$ with bounded derivatives, analytic on an open interval containing

(1.4)  $$\left[\liminf_n \lambda_{\min}^{T_n} I_{(0,1)}(c)(1 - \sqrt{c})^2,\ \limsup_n \lambda_{\max}^{T_n}(1 + \sqrt{c})^2\right].$$
Then
(1) the random vector

(1.5)  $$\left(\int f_1(x)\, dG_n(x),\ \dots,\ \int f_r(x)\, dG_n(x)\right)$$

forms a tight sequence in $n$.
(2) If $X_{11}$ and $T_n$ are real and $E(X_{11}^4) = 3$, then (1.5) converges weakly to a Gaussian vector $(X_{f_1}, \dots, X_{f_r})$, with means

(1.6)  $$EX_f = -\frac{1}{2\pi i}\oint f(z)\, \frac{c\displaystyle\int \frac{\underline{m}(z)^3 t^2\, dH(t)}{(1 + t\underline{m}(z))^3}}{\left(1 - c\displaystyle\int \frac{\underline{m}(z)^2 t^2\, dH(t)}{(1 + t\underline{m}(z))^2}\right)^{2}}\, dz$$

and covariance function

(1.7)  $$\operatorname{Cov}(X_f, X_g) = -\frac{1}{2\pi^2}\oint\!\!\oint \frac{f(z_1)\, g(z_2)}{(\underline{m}(z_1) - \underline{m}(z_2))^2}\, \frac{d\underline{m}(z_1)}{dz_1}\frac{d\underline{m}(z_2)}{dz_2}\, dz_1\, dz_2$$

($f, g \in \{f_1, \dots, f_r\}$). The contours in (1.6) and (1.7) (two in (1.7), which we may assume to be non-overlapping) are closed and are taken in the positive direction in the complex plane, each enclosing the support of $F^{c,H}$.
(3) If $X_{11}$ is complex with $E(X_{11}^2) = 0$ and $E(|X_{11}|^4) = 2$, then (2) also holds, except the means are zero and the covariance function is 1/2 the function given in (1.7).
We begin the proof of Theorem 1.1 here with the replacement of the entries of $X_n$ with truncated and centralized variables. For $m = 1, 2, \dots$ find $n_m$ ($n_m > n_{m-1}$) satisfying
$$m^4 \int_{\{|X_{11}| \ge \sqrt{n}/m\}} |X_{11}|^4 < 2^{-m}$$
for all $n \ge n_m$. Define $\delta_n = 1/m$ for all $n \in [n_m, n_{m+1})$ ($= 1$ for $n < n_1$). Then, as $n \to \infty$, $\delta_n \to 0$ and

(1.8)  $$\delta_n^{-4} \int_{\{|X_{11}| \ge \delta_n\sqrt{n}\}} |X_{11}|^4 \to 0.$$

Let now, for each $n$, $\delta_n$ be the larger of the $\delta_n$ constructed above and the $\delta_n$ created in the proof of Lemma 2.2 of Yin, Bai, and Krishnaiah (1988) with $r = 1/2$ and satisfying $\delta_n n^{1/4} \to \infty$.
Let $\hat{B}_n = (1/N)\, T_n^{1/2}\hat{X}_n\hat{X}_n^* T_n^{1/2}$ with $\hat{X}_n$ $n \times N$ having $(i, j)$-th entry $X_{ij} I_{\{|X_{ij}| < \delta_n\sqrt{n}\}}$. We have then
$$P(\hat{B}_n \ne B_n) \le nN\, P(|X_{11}| \ge \delta_n\sqrt{n}) \le K\delta_n^{-4}\int_{\{|X_{11}| \ge \delta_n\sqrt{n}\}} |X_{11}|^4 = o(1).$$
Define $\tilde{B}_n = (1/N)\, T_n^{1/2}\tilde{X}_n\tilde{X}_n^* T_n^{1/2}$ with $\tilde{X}_n$ $n \times N$ having $(i, j)$-th entry $(\hat{X}_{ij} - E\hat{X}_{ij})/\sigma_n$, where $\sigma_n^2 = E|\hat{X}_{ij} - E\hat{X}_{ij}|^2$. From Yin, Bai, and Krishnaiah (1988) we know that both $\limsup_n \lambda_{\max}^{\hat{B}_n}$ and $\limsup_n \lambda_{\max}^{\tilde{B}_n}$ are almost surely bounded by $\limsup_n \|T_n\|(1 + \sqrt{c})^2$.
We use $\hat{G}_n(x)$ and $\tilde{G}_n(x)$ to denote the analogues of $G_n(x)$ with the matrix $B_n$ replaced by $\hat{B}_n$ and $\tilde{B}_n$ respectively. Let $\lambda_k^A$ denote the $k$-th smallest eigenvalue of Hermitian $A$. Using the same approach and bounds that are used in the proof of Lemma 2.7 of Bai (1999), we have, for each $j = 1, 2, \dots, r$,
$$\left|\int f_j(x)\, d\hat{G}_n(x) - \int f_j(x)\, d\tilde{G}_n(x)\right| \le K_j \sum_{k=1}^{n} \big|\lambda_k^{\hat{B}_n} - \lambda_k^{\tilde{B}_n}\big|$$
$$\le 2K_j\left(N^{-1}\operatorname{tr}\big(T_n^{1/2}(\hat{X}_n - \tilde{X}_n)(\hat{X}_n - \tilde{X}_n)^* T_n^{1/2}\big)\right)^{1/2}\left(n\big(\lambda_{\max}^{\hat{B}_n} + \lambda_{\max}^{\tilde{B}_n}\big)\right)^{1/2}$$
where $K_j$ is a bound on $|f_j'(z)|$. From (1.8) we have
$$|\sigma_n^2 - 1| \le 2\int_{\{|X_{11}| \ge \delta_n\sqrt{n}\}} |X_{11}|^2 \le 2\delta_n^{-2} n^{-1}\int_{\{|X_{11}| \ge \delta_n\sqrt{n}\}} |X_{11}|^4 = o(\delta_n^2 n^{-1}).$$
Moreover,
$$|E\hat{X}_{11}| = \left|\int_{\{|X_{11}| \ge \delta_n\sqrt{n}\}} X_{11}\right| = o(\delta_n n^{-3/2}).$$
These give us
$$\left(N^{-1}\operatorname{tr}\big(T_n^{1/2}(\hat{X}_n - \tilde{X}_n)(\hat{X}_n - \tilde{X}_n)^* T_n^{1/2}\big)\right)^{1/2}$$
$$\le \left(N^{-1}\operatorname{tr}\big((1 - 1/\sigma_n)^2 \hat{X}_n\hat{X}_n^*\big)\right)^{1/2} + \left(N^{-1}\operatorname{tr}\big(\sigma_n^{-2}(E\hat{X}_n)(E\hat{X}_n)^*\big)\right)^{1/2}$$
$$\le \left(\frac{(1 - \sigma_n^2)^2}{\sigma_n^2(1 + \sigma_n)^2}\, n\lambda_{\max}^{\hat{B}_n}\right)^{1/2} + n^{1/2}\sigma_n^{-1}|E\hat{X}_{11}| = o(\delta_n n^{-1/2})\big(\lambda_{\max}^{\hat{B}_n}\big)^{1/2} + o(\delta_n n^{-1}).$$
From the above estimates, we obtain
$$\int f_j(x)\, dG_n(x) = \int f_j(x)\, d\tilde{G}_n(x) + o_p(1)$$
($o_p(1) \xrightarrow{i.p.} 0$ as $n \to \infty$). Therefore, we only need to find the limiting distribution of $\{\int f_j(x)\, d\tilde{G}_n(x),\ j = 1, \dots, r\}$. Hence, in the sequel, we shall assume the underlying variables are truncated at $\delta_n\sqrt{n}$, centralized, and renormalized. For simplicity, we shall suppress all sub- and superscripts on the variables and assume that $|X_{ij}| < \delta_n\sqrt{n}$, $EX_{ij} = 0$, $E|X_{ij}|^2 = 1$, $E|X_{ij}|^4 < \infty$, and in the real Gaussian case $E|X_{11}|^4 = 3 + o(1)$, while in the complex Gaussian case $EX_{11}^2 = o(1/n)$ and $E|X_{11}|^4 = 2 + o(1)$.
Since the truncation steps are identical to those in Yin, Bai, and Krishnaiah (1988), we have for any $\eta > (1 + \sqrt{c})^2$ the existence of $\{k_n\}$ for which $k_n/\ln n \to \infty$ and
$$E\big\|(1/N) X_n X_n^*\big\|^{k_n} \le \eta^{k_n}$$
for all $n$ large. Therefore,

(1.9a)  $$P(\|B_n\| \ge \eta) = o(n^{-\ell})$$

for any $\eta > \limsup_n \|T_n\|(1 + \sqrt{c})^2$ and any positive $\ell$. By modifying the proof in Bai and Yin (1993) on the smallest eigenvalue of $(1/N) X_n X_n^*$, it follows that, when $x_l > 0$,

(1.9b)  $$P(\lambda_{\min}^{B_n} \le \eta) = o(n^{-\ell})$$

whenever $0 < \eta < \liminf_n \lambda_{\min}^{T_n}(1 - \sqrt{c})^2$. The modification is given in the appendix.
After truncation and centralization, our proof of the main theorem relies on establishing limiting results on
$$M_n(z) = n[m_{F^{B_n}}(z) - m_{F^{c_n,H_n}}(z)] = N[m_{\underline{F}^{B_n}}(z) - m_{\underline{F}^{c_n,H_n}}(z)],$$
or more precisely, on $\widehat{M}_n(\cdot)$, a truncated version of $M_n(\cdot)$ when viewed as a random two-dimensional process defined on a contour $\mathcal{C}$ of the complex plane, described as follows. Let $v_0 > 0$ be arbitrary. Let $x_r$ be any number greater than the right end-point of interval (1.4). Let $x_l$ be any negative number if the left end-point of (1.4) is zero. Otherwise choose $x_l \in (0, \liminf_n \lambda_{\min}^{T_n}(1 - \sqrt{c})^2)$. Let
$$\mathcal{C}_u = \{x + iv_0 : x \in [x_l, x_r]\}.$$
Then
$$\mathcal{C} \equiv \{x_l + iv : v \in [0, v_0]\} \cup \mathcal{C}_u \cup \{x_r + iv : v \in [0, v_0]\}.$$
We define now the subsets $\mathcal{C}_n$ of $\mathcal{C}$ on which $M_n(\cdot)$ agrees with $\widehat{M}_n(\cdot)$. Choose a sequence $\{\varepsilon_n\}$ decreasing to zero and satisfying, for some $\alpha \in (0, 1)$,

(1.10)  $$\varepsilon_n \ge n^{-\alpha}.$$

Let
$$\mathcal{C}_l = \begin{cases} \{x_l + iv : v \in [n^{-1}\varepsilon_n, v_0]\} & \text{if } x_l > 0, \\ \{x_l + iv : v \in [0, v_0]\} & \text{if } x_l < 0, \end{cases}$$
and
$$\mathcal{C}_r = \{x_r + iv : v \in [n^{-1}\varepsilon_n, v_0]\}.$$
Then $\mathcal{C}_n = \mathcal{C}_l \cup \mathcal{C}_u \cup \mathcal{C}_r$. The process $\widehat{M}_n(\cdot)$ can now be defined. For $z = x + iv$ we have:

(1.11)  $$\widehat{M}_n(z) = \begin{cases} M_n(z) & \text{for } z \in \mathcal{C}_n, \\ M_n(x_r + in^{-1}\varepsilon_n) & \text{for } x = x_r,\ v \in [0, n^{-1}\varepsilon_n], \\ M_n(x_l + in^{-1}\varepsilon_n) & \text{for } x = x_l,\ v \in [0, n^{-1}\varepsilon_n],\ \text{if } x_l > 0. \end{cases}$$

$\widehat{M}_n(\cdot)$ is viewed as a random element in the metric space $C(\mathcal{C}, \mathbb{R}^2)$ of continuous functions from $\mathcal{C}$ to $\mathbb{R}^2$. All of Chapter 2 of Billingsley (1968) applies to continuous functions from a set such as $\mathcal{C}$ (homeomorphic to $[0, 1]$) to finite dimensional Euclidean space, with $|\cdot|$ interpreted as Euclidean distance.
Most of the paper will deal with proving

Lemma 1.1. Under conditions (a) and (b) of Theorem 1.1, $\{\widehat{M}_n(\cdot)\}$ forms a tight sequence on $\mathcal{C}$. Moreover, if the assumptions in (2) or (3) of Theorem 1.1 on $X_{11}$ hold, then $\widehat{M}_n(\cdot)$ converges weakly to a two-dimensional Gaussian process $M(\cdot)$ satisfying, for $z \in \mathcal{C}$, under the assumptions in (2),

(1.12)  $$EM(z) = \frac{c\displaystyle\int \frac{\underline{m}(z)^3 t^2\, dH(t)}{(1 + t\underline{m}(z))^3}}{\left(1 - c\displaystyle\int \frac{\underline{m}(z)^2 t^2\, dH(t)}{(1 + t\underline{m}(z))^2}\right)^{2}}$$

and, for $z_1, z_2 \in \mathcal{C}$,

(1.13)  $$\operatorname{Cov}(M(z_1), M(z_2)) \equiv E[(M(z_1) - EM(z_1))(M(z_2) - EM(z_2))] = \frac{\underline{m}'(z_1)\underline{m}'(z_2)}{(\underline{m}(z_1) - \underline{m}(z_2))^2} - \frac{1}{(z_1 - z_2)^2},$$

while under the assumptions in (3), $EM(z) = 0$, and the “covariance” function analogous to (1.13) is 1/2 of (1.13).
We show now how Theorem 1.1 follows from the above Lemma. We use the identity

(1.14)  $$\int f(x)\, dG(x) = -\frac{1}{2\pi i}\oint f(z)\, m_G(z)\, dz,$$

valid for any c.d.f. $G$ and $f$ analytic on an open set containing the support of $G$. The complex integral on the right is over any positively oriented contour enclosing the support of $G$ and on which $f$ is analytic. Choose $v_0$, $x_r$, and $x_l$ so that $f_1, \dots, f_r$ are all analytic on and inside the resulting $\mathcal{C}$.
Due to the a.s. convergence of the extreme eigenvalues of $(1/N) X_n X_n^*$ and the bounds
$$\lambda_{\max}^{AB} \le \lambda_{\max}^{A}\lambda_{\max}^{B}, \qquad \lambda_{\min}^{AB} \ge \lambda_{\min}^{A}\lambda_{\min}^{B},$$
valid for $n \times n$ Hermitian nonnegative definite $A$ and $B$, we have with probability one
$$\liminf_{n\to\infty}\, \min\big(x_r - \lambda_{\max}^{B_n},\ \lambda_{\min}^{B_n} - x_l\big) > 0.$$
It also follows that the support of $F^{c_n,H_n}$ is contained in
$$\left[\lambda_{\min}^{T_n} I_{(0,1)}(c_n)(1 - \sqrt{c_n})^2,\ \lambda_{\max}^{T_n}(1 + \sqrt{c_n})^2\right].$$
Therefore for any $f \in \{f_1, \dots, f_r\}$, with probability one,
$$\int f(x)\, dG_n(x) = -\frac{1}{2\pi i}\int f(z)\, M_n(z)\, dz$$
for all $n$ large, where the complex integral is over $\mathcal{C} \cup \bar{\mathcal{C}}$, with $\bar{\mathcal{C}} \equiv \{\bar{z} : z \in \mathcal{C}\}$. Moreover, with probability one, for all $n$ large,
$$\left|\int f(z)\big(M_n(z) - \widehat{M}_n(z)\big)\, dz\right| \le 4K\varepsilon_n\Big(\big|\max\big(\lambda_{\max}^{T_n}(1 + \sqrt{c_n})^2,\ \lambda_{\max}^{B_n}\big) - x_r\big|^{-1} + \big|\min\big(\lambda_{\min}^{T_n} I_{(0,1)}(c_n)(1 - \sqrt{c_n})^2,\ \lambda_{\min}^{B_n}\big) - x_l\big|^{-1}\Big),$$
which converges to zero as $n \to \infty$. Here $K$ is a bound on $f$ over $\mathcal{C}$.
Since
$$\widehat{M}_n(\cdot) \longrightarrow \left(-\frac{1}{2\pi i}\int f_1(z)\,\widehat{M}_n(z)\, dz,\ \dots,\ -\frac{1}{2\pi i}\int f_r(z)\,\widehat{M}_n(z)\, dz\right)$$
is a continuous mapping of $C(\mathcal{C}, \mathbb{R}^2)$ into $\mathbb{R}^r$, it follows that the above vector, and subsequently (1.5), form tight sequences. Letting $M(\cdot)$ denote the limit of any weakly converging subsequence of $\{\widehat{M}_n(\cdot)\}$, we have the weak limit of (1.5) equal in distribution to
$$\left(-\frac{1}{2\pi i}\int f_1(z)\, M(z)\, dz,\ \dots,\ -\frac{1}{2\pi i}\int f_r(z)\, M(z)\, dz\right).$$
The fact that this vector, under the assumptions in (2) or (3), is multivariate Gaussian follows from the fact that Riemann sums corresponding to these integrals are multivariate Gaussian, and that weak limits of Gaussian vectors can only be Gaussian. The limiting expressions for the mean and covariance follow immediately.
Notice the assumptions in (2) and (3) require $X_{11}$ to have the same first, second, and fourth moments as either a real or complex Gaussian variable, the latter having real and imaginary parts i.i.d. $N(0, 1/2)$. We will use the terms “real Gaussian” and “complex Gaussian” to refer to these conditions.
The reason why concrete results are at present only obtained under the assumptions in (2) and (3) is mainly due to the identity

(1.15)  $$E(X_{\cdot 1}^* A X_{\cdot 1} - \operatorname{tr} A)(X_{\cdot 1}^* B X_{\cdot 1} - \operatorname{tr} B) = \big(E|X_{11}|^4 - |EX_{11}^2|^2 - 2\big)\sum_{i=1}^{n} a_{ii} b_{ii} + |EX_{11}^2|^2\operatorname{tr} AB^T + \operatorname{tr} AB,$$

valid for $n \times n$ $A = (a_{ij})$ and $B = (b_{ij})$, which is needed in several places in the proof of Lemma 1.1. The assumptions in (3) leave only the last term on the right, whereas those in (2) leave the last two, but in this case the matrix $B$ will always be symmetric. This also accounts for the relation between the two covariance functions, and for the difficulty in obtaining explicit results more generally. As will be seen in the proof, whenever (1.15) is used, little is known about the limiting behavior of $\sum a_{ii} b_{ii}$.
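The identity (1.15) can be sanity-checked by simulation. A sketch for the real Gaussian case (my illustration; the dimension `k` and the matrices `A`, `B` are arbitrary): with real $N(0,1)$ entries, $E|X_{11}|^4 = 3$ and $EX_{11}^2 = 1$, so the first term in (1.15) vanishes and the right side reduces to $\operatorname{tr} AB^T + \operatorname{tr} AB$.

```python
import numpy as np

# Monte Carlo check of (1.15) for real N(0,1) entries.
rng = np.random.default_rng(3)
k = 4
A = rng.standard_normal((k, k))
B = rng.standard_normal((k, k))

reps = 200_000
X = rng.standard_normal((reps, k))
qA = np.einsum('ri,ij,rj->r', X, A, X) - np.trace(A)   # X* A X - tr A
qB = np.einsum('ri,ij,rj->r', X, B, X) - np.trace(B)   # X* B X - tr B
lhs = np.mean(qA * qB)
rhs = np.trace(A @ B.T) + np.trace(A @ B)              # reduced (1.15)
print(lhs, rhs)        # agree up to Monte Carlo error
```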
Simple substitution reveals

(1.16)  $$\text{R.H.S. of (1.7)} = -\frac{1}{2\pi^2}\iint \frac{f(z(\underline{m}_1))\, g(z(\underline{m}_2))}{(\underline{m}_1 - \underline{m}_2)^2}\, d\underline{m}_1\, d\underline{m}_2.$$

However, the $\underline{m}_1$, $\underline{m}_2$ contours depend on the $z_1$, $z_2$ contours, and cannot be arbitrarily chosen. It is also true that
(1.17)  $$\text{(1.7)} = \frac{1}{\pi^2}\iint f'(x)\, g'(y)\, \ln\left|\frac{\underline{m}(x) - \overline{\underline{m}(y)}}{\underline{m}(x) - \underline{m}(y)}\right| dx\, dy = \frac{1}{2\pi^2}\iint f'(x)\, g'(y)\, \ln\left(1 + 4\,\frac{\underline{m}_i(x)\,\underline{m}_i(y)}{|\underline{m}(x) - \underline{m}(y)|^2}\right) dx\, dy$$

and

(1.18)  $$EX_f = \frac{1}{2\pi}\int f'(x)\, \arg\left(1 - c\int \frac{t^2\underline{m}^2(x)}{(1 + t\underline{m}(x))^2}\, dH(t)\right) dx.$$
Here, for $0 \ne x \in \mathbb{R}$,

(1.19)  $$\underline{m}(x) = \lim_{\substack{z \to x \\ z \in \mathbb{C}^+}} \underline{m}(z),$$

known to exist and to satisfy (1.2) (see Silverstein and Choi (1995)), and $\underline{m}_i(x) = \operatorname{Im}\underline{m}(x)$. The term
$$j(x) = \arg\left(1 - c\int \frac{t^2\underline{m}^2(x)}{(1 + t\underline{m}(x))^2}\, dH(t)\right)$$
in (1.18) is well defined for almost every $x$ and takes values in $(-\pi/2, \pi/2)$. Section 5 contains proofs of (1.17) and (1.18), along with showing

(1.20)  $$k(x, y) \equiv \ln\left(1 + 4\,\frac{\underline{m}_i(x)\,\underline{m}_i(y)}{|\underline{m}(x) - \underline{m}(y)|^2}\right)$$
to be Lebesgue integrable on R2 . It is interesting to note that the support of k(x, y)
matches the support of f c,H on R − {0}: k(x, y) = 0 ⇐⇒ min(f c,H (x), f c,H (y)) = 0. We
also have f c,H (x) = 0 =⇒ j(x) = 0.
Section 5 also contains derivations of the relevant quantities associated with the example given at the beginning of this section. The linear spectral statistic $(1/n)T_N$ has a.s. limit $d(c)$ as stated in (1.1). The quantity $T_N - nd(n/N)$ converges weakly to a Gaussian random variable $X_{\ln}$ with

(1.21)  $$EX_{\ln} = \tfrac{1}{2}\ln(1 - c)$$

and

(1.22)  $$\operatorname{Var} X_{\ln} = -2\ln(1 - c).$$
Results on both $T_N - ET_N$ and $n[\int x^r\, dF^{S_N}(x) - E\int x^r\, dF^{S_N}(x)]$ for positive integer $r$ are derived in Jonsson (1982). Included in Section 5 are derivations of the following expressions for the means and covariances in this case ($H = I_{[1,\infty)}$). We have
(1.23)  $$EX_{x^r} = \frac{1}{4}\left((1 - \sqrt{c})^{2r} + (1 + \sqrt{c})^{2r}\right) - \frac{1}{2}\sum_{j=0}^{r}\binom{r}{j}^{2} c^{j}$$

and

(1.24)  $$\operatorname{Cov}(X_{x^{r_1}}, X_{x^{r_2}}) = 2c^{r_1+r_2}\sum_{k_1=0}^{r_1-1}\sum_{k_2=0}^{r_2}\left(\frac{1 - c}{c}\right)^{k_1+k_2}\binom{r_1}{k_1}\binom{r_2}{k_2}\sum_{\ell=1}^{r_1-k_1}\ell\binom{2r_1 - 1 - (k_1 + \ell)}{r_1 - 1}\binom{2r_2 - 1 - k_2 + \ell}{r_2 - 1}.$$
`=1
It is noteworthy to mention here a consequence of (1.17), namely that if the assumptions in (2) or (3) of Theorem 1.1 were to hold, then Gn , considered as a random element
in D[0, ∞) (the space of functions on [0, ∞) that are right-continuous with left-hand limits, together with the Skorohod metric) cannot form a tight sequence in D[0, ∞). Indeed,
under the assumptions of either one, if G(x) were a weak limit of a subsequence, then,
because of Theorem 1.1, it is straightforward to conclude that for any x0 in the interior of
the support of F and positive ²
Z
x0 +²
G(x)dx
x0
would be Gaussian, and therefore so would
1
G(x) = lim
²→0 ²
Z
x0 +²
G(x)dx.
x0
12
However, the variance would necessarily be
$$\lim_{\varepsilon\to 0}\frac{1}{2\pi^2}\frac{1}{\varepsilon^2}\int_{x_0}^{x_0+\varepsilon}\!\int_{x_0}^{x_0+\varepsilon} k(x, y)\, dx\, dy = \infty.$$
Still, under the assumptions in (2) or (3), a limit may exist for {Gn } when Gn is
viewed as a linear functional
$$f \longrightarrow \int f(x)\, dG_n(x),$$
that is, a limit expressed in terms of a measure in a space of generalized functions. The
characterization of the limiting measure of course depends on the space, which in turn
relies on the set of test functions, which for now is restricted to functions analytic on the
support of F . Work in this area is currently being pursued.
The proof of Lemma 1.1 is divided into three sections. Sections 2 and 3 handle the limiting behavior of the centralized $M_n$, while Section 4 analyzes the non-random part. In each of the three sections the reader will be referred to work done in Bai and Silverstein (1998).
2. Convergence of finite dimensional distributions. Write, for $z \in \mathcal{C}_n$, $M_n(z) = M_n^1(z) + M_n^2(z)$, where
$$M_n^1(z) = n[m_{F^{B_n}}(z) - Em_{F^{B_n}}(z)]$$
and
$$M_n^2(z) = n[m_{EF^{B_n}}(z) - m_{F^{c_n,H_n}}(z)].$$
In this section we will show, for any positive integer $r$, that the sum
$$\sum_{i=1}^{r}\alpha_i M_n^1(z_i) \qquad (\operatorname{Im} z_i \ne 0),$$
whenever it is real, is tight, and, under the assumptions in (2) or (3) of Theorem 1.1, converges in distribution to a Gaussian random variable. Formula (1.13) will also be derived. We begin with a list of results.
Lemma 2.1 [Burkholder (1973)]. Let $\{X_k\}$ be a complex martingale difference sequence with respect to the increasing $\sigma$-field $\{\mathcal{F}_k\}$. Then for $p > 1$
$$E\left|\sum X_k\right|^{p} \le K_p\, E\left(\sum |X_k|^2\right)^{p/2}.$$
(Note: The reference considers only real variables. Extending to complex variables is straightforward.)
Lemma 2.2 [Lemma 2.7 in Bai and Silverstein (1998)]. For $X = (X_1, \dots, X_n)^T$ with i.i.d. standardized (complex) entries and $C$ an $n \times n$ (complex) matrix, we have for any $p \ge 2$
$$E|X^* C X - \operatorname{tr} C|^{p} \le K_p\left(\big(E|X_1|^4\operatorname{tr} CC^*\big)^{p/2} + E|X_1|^{2p}\operatorname{tr}\,(CC^*)^{p/2}\right).$$
Lemma 2.3. Let $f_1, f_2, \dots$ be analytic in $D$, a connected open set of $\mathbb{C}$, satisfying $|f_n(z)| \le M$ for every $n$ and $z$ in $D$, and suppose that $f_n(z)$ converges, as $n \to \infty$, for each $z$ in a subset of $D$ having a limit point in $D$. Then there exists a function $f$, analytic in $D$, for which $f_n(z) \to f(z)$ and $f_n'(z) \to f'(z)$ for all $z \in D$. Moreover, on any set bounded by a contour interior to $D$, the convergence is uniform and $\{f_n'(z)\}$ is uniformly bounded.

Proof. The conclusions on $\{f_n\}$ are from Vitali's convergence theorem (see Titchmarsh (1939), p. 168). Those on $\{f_n'\}$ follow from the dominated convergence theorem (d.c.t.) and the identity
$$f_n'(z) = \frac{1}{2\pi i}\oint_{\mathcal{C}}\frac{f_n(w)}{(w - z)^2}\, dw.$$
Lemma 2.4 [Theorem 35.12 of Billingsley (1995)]. Suppose for each $n$ $Y_{n1}, Y_{n2}, \dots, Y_{nr_n}$ is a real martingale difference sequence with respect to the increasing $\sigma$-field $\{\mathcal{F}_{nj}\}$ having second moments. If, as $n \to \infty$,

(i)  $$\sum_{j=1}^{r_n} E\big(Y_{nj}^2 \,\big|\, \mathcal{F}_{n,j-1}\big) \xrightarrow{i.p.} \sigma^2,$$

where $\sigma^2$ is a positive constant, and, for each $\varepsilon > 0$,

(ii)  $$\sum_{j=1}^{r_n} E\big(Y_{nj}^2 I_{(|Y_{nj}| \ge \varepsilon)}\big) \to 0,$$

then
$$\sum_{j=1}^{r_n} Y_{nj} \to_D N(0, \sigma^2).$$
Recalling the truncation and centralization steps, we get from Lemma 2.2

(2.1)  $$E|X_{\cdot 1}^* C X_{\cdot 1} - \operatorname{tr} C|^{p} \le K_p \|C\|^{p}\left[n^{p/2} + \delta_n^{2p-4} n^{p-1}\right] \le K_p \|C\|^{p} \delta_n^{2p-4} n^{p-1}, \qquad p \ge 2.$$
Let $v = \operatorname{Im} z$. For the following analysis we will assume $v > 0$. To facilitate notation, we will let $T = T_n$. Because of assumption 2′) we may assume $\|T\| \le 1$ for all $n$. Constants appearing in inequalities will be denoted by $K$ and may take on different values from one expression to the next. Let $r_j = (1/\sqrt{N})\, T^{1/2} X_{\cdot j}$, $D(z) = B_n - zI$, $D_j(z) = D(z) - r_j r_j^*$,
$$\epsilon_j(z) = r_j^* D_j^{-1}(z) r_j - \frac{1}{N}\operatorname{tr} T D_j^{-1}(z), \qquad \delta_j(z) = r_j^* D_j^{-2}(z) r_j - \frac{1}{N}\operatorname{tr} T D_j^{-2}(z) = \frac{d}{dz}\epsilon_j(z),$$
and
$$\beta_j(z) = \frac{1}{1 + r_j^* D_j^{-1}(z) r_j}, \qquad \bar\beta_j(z) = \frac{1}{1 + N^{-1}\operatorname{tr} T_n D_j^{-1}(z)}, \qquad b_n(z) = \frac{1}{1 + N^{-1} E\operatorname{tr} T_n D_1^{-1}(z)}.$$
All three of the latter quantities are bounded in absolute value by $|z|/v$ (see (3.4) of Bai and Silverstein (1998)). We have
$$D^{-1}(z) - D_j^{-1}(z) = -D_j^{-1}(z)\, r_j r_j^*\, D_j^{-1}(z)\,\beta_j(z),$$
and from Lemma 2.10 of Bai and Silverstein (1998), for any $n \times n$ $A$,

(2.2)  $$\big|\operatorname{tr}\big((D^{-1}(z) - D_j^{-1}(z))A\big)\big| \le \frac{\|A\|}{\operatorname{Im} z}.$$
For nonrandom $n \times n$ matrices $A_k$, $k = 1, \dots, p$, and $B_l$, $l = 1, \dots, q$, we shall establish the following inequality:

(2.3)  $$\left|E\left(\prod_{k=1}^{p} r_1^* A_k r_1\prod_{l=1}^{q}\big(r_1^* B_l r_1 - N^{-1}\operatorname{tr} T B_l\big)\right)\right| \le K N^{-(1\wedge q)}\,\delta_n^{(2q-4)\vee 0}\prod_{k=1}^{p}\|A_k\|\prod_{l=1}^{q}\|B_l\|, \qquad p \ge 0,\ q \ge 0.$$
When $p = 0$, $q = 1$, the left side is 0. When $p = 0$, $q > 1$, (2.3) is a consequence of (2.1) and Hölder's inequality. If $p \ge 1$, then by induction on $p$ we have
$$\left|E\left(\prod_{k=1}^{p} r_1^* A_k r_1\prod_{l=1}^{q}\big(r_1^* B_l r_1 - N^{-1}\operatorname{tr} T B_l\big)\right)\right|$$
$$\le \left|E\left(\prod_{k=1}^{p-1} r_1^* A_k r_1\,\big(r_1^* A_p r_1 - N^{-1}\operatorname{tr} T A_p\big)\prod_{l=1}^{q}\big(r_1^* B_l r_1 - N^{-1}\operatorname{tr} T B_l\big)\right)\right|$$
$$\quad + nN^{-1}\|A_p\|\left|E\left(\prod_{k=1}^{p-1} r_1^* A_k r_1\prod_{l=1}^{q}\big(r_1^* B_l r_1 - N^{-1}\operatorname{tr} T B_l\big)\right)\right|$$
$$\le K N^{-1}\delta_n^{(2q-4)\vee 0}\prod_{k=1}^{p}\|A_k\|\prod_{l=1}^{q}\|B_l\|.$$
We have proved the case where $q > 0$. When $q = 0$, (2.3) is a trivial consequence of (2.1).
Let $E_0(\cdot)$ denote expectation and $E_j(\cdot)$ denote conditional expectation with respect to the $\sigma$-field generated by $r_1, \dots, r_j$. We have
$$n[m_{F^{B_n}}(z) - Em_{F^{B_n}}(z)] = \operatorname{tr}\,[D^{-1}(z) - ED^{-1}(z)] = \sum_{j=1}^{N}\big(\operatorname{tr} E_j D^{-1}(z) - \operatorname{tr} E_{j-1} D^{-1}(z)\big)$$
$$= \sum_{j=1}^{N}\big(\operatorname{tr} E_j[D^{-1}(z) - D_j^{-1}(z)] - \operatorname{tr} E_{j-1}[D^{-1}(z) - D_j^{-1}(z)]\big) = -\sum_{j=1}^{N}(E_j - E_{j-1})\,\beta_j(z)\, r_j^* D_j^{-2}(z) r_j.$$
Write $\beta_j(z) = \bar\beta_j(z) - \beta_j(z)\bar\beta_j(z)\epsilon_j(z) = \bar\beta_j(z) - \bar\beta_j^2(z)\epsilon_j(z) + \bar\beta_j^2(z)\beta_j(z)\epsilon_j^2(z)$. We have then
$$(E_j - E_{j-1})\,\beta_j(z)\, r_j^* D_j^{-2}(z) r_j = (E_j - E_{j-1})\Big(\bar\beta_j(z)\delta_j(z) - \bar\beta_j^2(z)\epsilon_j(z)\delta_j(z) - \bar\beta_j^2(z)\epsilon_j(z)\frac{1}{N}\operatorname{tr} T D_j^{-2}(z) + \bar\beta_j^2(z)\beta_j(z)\epsilon_j^2(z)\, r_j^* D_j^{-2}(z) r_j\Big)$$
$$= E_j\Big(\bar\beta_j(z)\delta_j(z) - \bar\beta_j^2(z)\epsilon_j(z)\frac{1}{N}\operatorname{tr} T D_j^{-2}(z)\Big) - (E_j - E_{j-1})\,\bar\beta_j^2(z)\big(\epsilon_j(z)\delta_j(z) - \beta_j(z)\, r_j^* D_j^{-2}(z) r_j\,\epsilon_j^2(z)\big).$$
Using (2.3), we have
$$E\left|\sum_{j=1}^{N}(E_j - E_{j-1})\,\bar\beta_j^2(z)\epsilon_j(z)\delta_j(z)\right|^2 = \sum_{j=1}^{N} E\big|(E_j - E_{j-1})\,\bar\beta_j^2(z)\epsilon_j(z)\delta_j(z)\big|^2 \le 4\sum_{j=1}^{N} E\big|\bar\beta_j^2(z)\epsilon_j(z)\delta_j(z)\big|^2 = o(1).$$
Therefore, $\sum_{j=1}^{N}(E_j - E_{j-1})\,\bar\beta_j^2(z)\epsilon_j(z)\delta_j(z)$ converges to zero in probability. By the same argument, we have
$$\sum_{j=1}^{N}(E_j - E_{j-1})\,\beta_j(z)\, r_j^* D_j^{-2}(z) r_j\,\epsilon_j^2(z) \xrightarrow{i.p.} 0.$$
Therefore we need only consider the sum
$$\sum_{i=1}^{r}\alpha_i\sum_{j=1}^{N} Y_j(z_i) = \sum_{j=1}^{N}\sum_{i=1}^{r}\alpha_i Y_j(z_i),$$
where
$$Y_j(z) = -E_j\Big(\bar\beta_j(z)\delta_j(z) - \bar\beta_j^2(z)\epsilon_j(z)\frac{1}{N}\operatorname{tr} T D_j^{-2}(z)\Big) = -E_j\frac{d}{dz}\,\bar\beta_j(z)\epsilon_j(z).$$
Again, by using (2.3), we obtain
$$E|Y_j(z)|^4 \le K\left(\frac{|z|^4}{v^4}\, E|\delta_j(z)|^4 + \frac{|z|^8}{v^{16}}\Big(\frac{n}{N}\Big)^4 E|\epsilon_j(z)|^4\right) = o(N^{-1}),$$
which implies, for any $\varepsilon > 0$,
$$\sum_{j=1}^{N} E\left(\Big|\sum_{i=1}^{r}\alpha_i Y_j(z_i)\Big|^2 I_{(|\sum_{i=1}^{r}\alpha_i Y_j(z_i)| \ge \varepsilon)}\right) \le \frac{1}{\varepsilon^2}\sum_{j=1}^{N} E\Big|\sum_{i=1}^{r}\alpha_i Y_j(z_i)\Big|^4 \to 0$$
as $n \to \infty$. Therefore condition (ii) of Lemma 2.4 is satisfied and it is enough to prove, under the assumptions in (2) or (3), for $z_1, z_2 \in \mathbb{C}^+$, that

(2.4)  $$\sum_{j=1}^{N} E_{j-1}[Y_j(z_1)\, Y_j(z_2)]$$

converges in probability to a constant (and to determine the constant).
We show here, for future use, the tightness of the sequence $\{\sum_{i=1}^{r}\alpha_i M_n^1(z_i)\}$. From (2.3) we easily get $E|Y_j(z)|^2 = O(N^{-1})$, so that

(2.5)  $$E\left|\sum_{i=1}^{r}\alpha_i\sum_{j=1}^{N} Y_j(z_i)\right|^2 = \sum_{j=1}^{N} E\left|\sum_{i=1}^{r}\alpha_i Y_j(z_i)\right|^2 \le r\sum_{j=1}^{N}\sum_{i=1}^{r}|\alpha_i|^2\, E|Y_j(z_i)|^2 \le K.$$
Consider the sum

(2.6)  $$\sum_{j=1}^{N} E_{j-1}\big[E_j(\bar\beta_j(z_1)\epsilon_j(z_1))\, E_j(\bar\beta_j(z_2)\epsilon_j(z_2))\big].$$

In the $j$-th term (viewed as an expectation with respect to $r_{j+1}, \dots, r_N$) we apply the d.c.t. to the difference quotient defined by $\bar\beta_j(z)\epsilon_j(z)$ to get
$$\frac{\partial^2}{\partial z_2\,\partial z_1}\,(2.6) = (2.4).$$
Let $v_0$ be a lower bound on $\operatorname{Im} z_i$. For each $j$ let $A_{ij} = (1/N)\, T^{1/2} E_j D_j^{-1}(z_i)\, T^{1/2}$, $i = 1, 2$. Then $\operatorname{tr} A_{ij} A_{ij}^* \le n(v_0 N)^{-2}$. Using (2.1) we see, therefore, that (2.6) is bounded.
We can then appeal to Lemma 2.3. Suppose (2.6) converges in probability for each
zk , zl ∈ {zi }, bounded away from the imaginary axis and having a limit point. Then by a
diagonalization argument, for any subsequence of the natural numbers, there is a further
subsequence such that, with probability one, (2.6) converges for each pair zk , zl . Applying
Lemma 2.3 twice we see that almost surely (2.4) will converge on the subsequence for each
pair. That is enough to imply convergence in probability of (2.4). Therefore we need only
show (2.6) converges in probability.
From the derivation above (4.3) of Bai and Silverstein (1998) we get
E|β¯j (zi ) − bn (zi )|2 ≤ K
|zi |4 −1
N .
v06
This implies
$$E\big|E_{j-1}[E_j(\bar\beta_j(z_1)\epsilon_j(z_1))E_j(\bar\beta_j(z_2)\epsilon_j(z_2))]-E_{j-1}[E_j(b_n(z_1)\epsilon_j(z_1))E_j(b_n(z_2)\epsilon_j(z_2))]\big|=O(N^{-3/2}),$$
from which we get
$$\sum_{j=1}^NE_{j-1}\big[E_j(\bar\beta_j(z_1)\epsilon_j(z_1))E_j(\bar\beta_j(z_2)\epsilon_j(z_2))\big]-b_n(z_1)b_n(z_2)\sum_{j=1}^NE_{j-1}\big[E_j(\epsilon_j(z_1))E_j(\epsilon_j(z_2))\big]\xrightarrow{i.p.}0.$$
Thus the goal is to show that
$$(2.7)\qquad b_n(z_1)b_n(z_2)\sum_{j=1}^NE_{j-1}\big[E_j(\epsilon_j(z_1))E_j(\epsilon_j(z_2))\big]$$
converges in probability, and to determine its limit. The latter's second mixed partial derivative will yield the limit of (2.4).
We now assume the "complex Gaussian" case, namely $EX_{11}^2=o(1/n)$ and $E|X_{11}|^4=2+o(1)$, so that, using (1.15), (2.7) becomes
$$b_n(z_1)b_n(z_2)\frac1{N^2}\sum_{j=1}^N\Big(\operatorname{tr}T^{1/2}E_j(D_j^{-1}(z_1))TE_j(D_j^{-1}(z_2))T^{1/2}+o(1)A_n\Big)$$
where
$$|A_n|\le K\Big(\operatorname{tr}TE_j(D_j^{-1}(z_1))TE_j(\bar D_j^{-1}(z_1))\,\operatorname{tr}TE_j(D_j^{-1}(z_2))TE_j(\bar D_j^{-1}(z_2))\Big)^{1/2}=O(N).$$
Thus we study
$$(2.8)\qquad b_n(z_1)b_n(z_2)\frac1{N^2}\sum_{j=1}^N\operatorname{tr}T^{1/2}E_j(D_j^{-1}(z_1))TE_j(D_j^{-1}(z_2))T^{1/2}.$$
The limit in the "real Gaussian" case ($T_n$, $X_{11}$ real, $EX_{11}^4=3+o(1)$) will be double the limit of (2.8).
Let $D_{ij}(z)=D(z)-r_ir_i^*-r_jr_j^*$,
$$\beta_{ij}(z)=\frac1{1+r_i^*D_{ij}^{-1}(z)r_i}\qquad\text{and}\qquad b_1(z)=\frac1{1+N^{-1}E\operatorname{tr}TD_{12}^{-1}(z)}.$$
We write
$$D_j(z_1)+z_1I-\tfrac{N-1}Nb_1(z_1)T=\sum_{i\ne j}r_ir_i^*-\tfrac{N-1}Nb_1(z_1)T.$$
Multiplying by $(z_1I-\frac{N-1}Nb_1(z_1)T)^{-1}$ on the left, $D_j^{-1}(z_1)$ on the right, and using
$$r_i^*D_j^{-1}(z_1)=\beta_{ij}(z_1)r_i^*D_{ij}^{-1}(z_1),$$
we get
$$D_j^{-1}(z_1)=-\Big(z_1I-\tfrac{N-1}Nb_1(z_1)T\Big)^{-1}+\sum_{i\ne j}\beta_{ij}(z_1)\Big(z_1I-\tfrac{N-1}Nb_1(z_1)T\Big)^{-1}r_ir_i^*D_{ij}^{-1}(z_1)-\tfrac{N-1}Nb_1(z_1)\Big(z_1I-\tfrac{N-1}Nb_1(z_1)T\Big)^{-1}TD_j^{-1}(z_1)$$
$$(2.9)\qquad=-\Big(z_1I-\tfrac{N-1}Nb_1(z_1)T\Big)^{-1}+b_1(z_1)A(z_1)+B(z_1)+C(z_1),$$
where
$$A(z_1)=\sum_{i\ne j}\Big(z_1I-\tfrac{N-1}Nb_1(z_1)T\Big)^{-1}\big(r_ir_i^*-N^{-1}T\big)D_{ij}^{-1}(z_1),$$
$$B(z_1)=\sum_{i\ne j}\big(\beta_{ij}(z_1)-b_1(z_1)\big)\Big(z_1I-\tfrac{N-1}Nb_1(z_1)T\Big)^{-1}r_ir_i^*D_{ij}^{-1}(z_1)$$
and
$$C(z_1)=N^{-1}b_1(z_1)\Big(z_1I-\tfrac{N-1}Nb_1(z_1)T\Big)^{-1}T\sum_{i\ne j}\big(D_{ij}^{-1}(z_1)-D_j^{-1}(z_1)\big).$$
It is easy to verify for any real $t$ that
$$\Big|1-\frac t{z(1+N^{-1}E\operatorname{tr}TD_{12}^{-1}(z))}\Big|^{-1}\le\frac{\big|z(1+N^{-1}E\operatorname{tr}TD_{12}^{-1}(z))\big|}{\operatorname{Im}z(1+N^{-1}E\operatorname{tr}TD_{12}^{-1}(z))}\le\frac{|z|(1+n/(Nv_0))}{v_0}.$$
Thus
$$(2.10)\qquad\Big\|\Big(z_1I-\tfrac{N-1}Nb_1(z_1)T\Big)^{-1}\Big\|\le\frac{1+n/(Nv_0)}{v_0}.$$
Let $M$ be $n\times n$, and let $|||M|||$ denote a nonrandom bound on the spectral norm of $M$ for all parameters governing $M$ and under all realizations of $M$. From (4.3) of Bai and Silverstein (1998), (2.3), and (2.10) we get
$$(2.11)\qquad E|\operatorname{tr}B(z_1)M|\le NE^{1/2}\big(|\beta_{12}(z_1)-b_1(z_1)|^2\big)E^{1/2}\Big(\big|r_1^*D_{12}^{-1}(z_1)M\big(z_1I-\tfrac{N-1}Nb_1(z_1)T\big)^{-1}r_1\big|^2\Big)\le K|||M|||\,\frac{|z_1|^2(1+n/(Nv_0))}{v_0^5}N^{1/2}.$$
From (2.2) we have
$$(2.12)\qquad|\operatorname{tr}C(z_1)M|\le|||M|||\,\frac{|z_1|(1+n/(Nv_0))}{v_0^3}.$$
From (2.3) and (2.10) we get, for $M$ nonrandom,
$$(2.13)\qquad E|\operatorname{tr}A(z_1)M|\le KE^{1/2}\operatorname{tr}T^{1/2}D_{ij}^{-1}(z_1)M\big(z_1I-\tfrac{N-1}Nb_1(z_1)T\big)^{-1}T\big(\bar z_1I-\tfrac{N-1}Nb_1(\bar z_1)T\big)^{-1}M^*D_{ij}^{-1}(\bar z_1)T^{1/2}\le K\|M\|\frac{1+n/(Nv_0)}{v_0^2}N^{1/2}.$$
We write (using the identity above (2.2))
$$(2.14)\qquad\operatorname{tr}E_j(A(z_1))TD_j^{-1}(z_2)T=A_1(z_1,z_2)+A_2(z_1,z_2)+A_3(z_1,z_2),$$
where
$$A_1(z_1,z_2)=-\operatorname{tr}\sum_{i<j}\big(z_1I-\tfrac{N-1}Nb_1(z_1)T\big)^{-1}r_ir_i^*E_j(D_{ij}^{-1}(z_1))T\beta_{ij}(z_2)D_{ij}^{-1}(z_2)r_ir_i^*D_{ij}^{-1}(z_2)T$$
$$=-\sum_{i<j}\beta_{ij}(z_2)\,r_i^*E_j(D_{ij}^{-1}(z_1))TD_{ij}^{-1}(z_2)r_i\;r_i^*D_{ij}^{-1}(z_2)T\big(z_1I-\tfrac{N-1}Nb_1(z_1)T\big)^{-1}r_i,$$
$$A_2(z_1,z_2)=-\operatorname{tr}\sum_{i<j}\big(z_1I-\tfrac{N-1}Nb_1(z_1)T\big)^{-1}N^{-1}TE_j(D_{ij}^{-1}(z_1))T\big(D_j^{-1}(z_2)-D_{ij}^{-1}(z_2)\big)T$$
and
$$A_3(z_1,z_2)=\operatorname{tr}\sum_{i<j}\big(z_1I-\tfrac{N-1}Nb_1(z_1)T\big)^{-1}\big(r_ir_i^*-N^{-1}T\big)E_j(D_{ij}^{-1}(z_1))TD_{ij}^{-1}(z_2)T.$$
We get from (2.2) and (2.10)
$$(2.15)\qquad|A_2(z_1,z_2)|\le\frac{1+n/(Nv_0)}{v_0^2},$$
and similar to (2.11) we have
$$E|A_3(z_1,z_2)|\le\frac{1+n/(Nv_0)}{v_0^3}N^{1/2}.$$
Using (2.1), (2.3), and (4.3) of Bai and Silverstein (1998) we have, for $i<j$,
$$E\Big|\beta_{ij}(z_2)\,r_i^*E_j(D_{ij}^{-1}(z_1))TD_{ij}^{-1}(z_2)r_i\;r_i^*D_{ij}^{-1}(z_2)T\big(z_1I-\tfrac{N-1}Nb_1(z_1)T\big)^{-1}r_i-b_1(z_2)N^{-2}\operatorname{tr}\big(E_j(D_{ij}^{-1}(z_1))TD_{ij}^{-1}(z_2)T\big)\operatorname{tr}\big(D_{ij}^{-1}(z_2)T\big(z_1I-\tfrac{N-1}Nb_1(z_1)T\big)^{-1}T\big)\Big|\le KN^{-1/2}$$
($K$ now depending as well on the $z_i$ and on $n/N$). Using (2.2) we have
$$\Big|\operatorname{tr}\big(E_j(D_{ij}^{-1}(z_1))TD_{ij}^{-1}(z_2)T\big)\operatorname{tr}\big(D_{ij}^{-1}(z_2)T\big(z_1I-\tfrac{N-1}Nb_1(z_1)T\big)^{-1}T\big)-\operatorname{tr}\big(E_j(D_j^{-1}(z_1))TD_j^{-1}(z_2)T\big)\operatorname{tr}\big(D_j^{-1}(z_2)T\big(z_1I-\tfrac{N-1}Nb_1(z_1)T\big)^{-1}T\big)\Big|\le KN.$$
It follows that
$$(2.16)\qquad E\Big|A_1(z_1,z_2)+\frac{j-1}{N^2}b_1(z_2)\operatorname{tr}\big(E_j(D_j^{-1}(z_1))TD_j^{-1}(z_2)T\big)\operatorname{tr}\big(D_j^{-1}(z_2)T\big(z_1I-\tfrac{N-1}Nb_1(z_1)T\big)^{-1}T\big)\Big|\le KN^{1/2}.$$
Therefore, from (2.9)–(2.16) we can write
$$\operatorname{tr}\big(E_j(D_j^{-1}(z_1))TD_j^{-1}(z_2)T\big)\Big[1+\frac{j-1}{N^2}b_1(z_1)b_1(z_2)\operatorname{tr}\big(D_j^{-1}(z_2)T\big(z_1I-\tfrac{N-1}Nb_1(z_1)T\big)^{-1}T\big)\Big]=-\operatorname{tr}\Big(\big(z_1I-\tfrac{N-1}Nb_1(z_1)T\big)^{-1}TD_j^{-1}(z_2)T\Big)+A_4(z_1,z_2),$$
where $E|A_4(z_1,z_2)|\le KN^{1/2}$. Using the expression for $D_j^{-1}(z_2)$ in (2.9) and (2.11)–(2.13) we find that
$$\operatorname{tr}\big(E_j(D_j^{-1}(z_1))TD_j^{-1}(z_2)T\big)\Big[1-\frac{j-1}{N^2}b_1(z_1)b_1(z_2)\operatorname{tr}\big(\big(z_2I-\tfrac{N-1}Nb_1(z_2)T\big)^{-1}T\big(z_1I-\tfrac{N-1}Nb_1(z_1)T\big)^{-1}T\big)\Big]=\operatorname{tr}\big(\big(z_2I-\tfrac{N-1}Nb_1(z_2)T\big)^{-1}T\big(z_1I-\tfrac{N-1}Nb_1(z_1)T\big)^{-1}T\big)+A_5(z_1,z_2),$$
where $E|A_5(z_1,z_2)|\le KN^{1/2}$.
From (2.2) we have $|b_1(z)-b_n(z)|\le KN^{-1}$, and from (4.3) of Bai and Silverstein (1998), $|b_n(z)-E\beta_1(z)|\le KN^{-1/2}$. From the formula
$$m_n(z)=-\frac1{zN}\sum_{j=1}^N\beta_j(z)$$
[(2.2) of Silverstein (1995)] we get $E\beta_1(z)=-zEm_n(z)$. Section 5 of Bai and Silverstein (1998) proves that $|Em_n(z)-m_n^0(z)|\le KN^{-1}$. Therefore we have
$$(2.17)\qquad|b_1(z)+zm_n^0(z)|\le KN^{-1/2},$$
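The formula quoted from Silverstein (1995) is an exact algebraic identity, so it can be checked directly on a small simulated data set. The sketch below is an illustration only (the dimensions, the choice $T=I$, and the value of $z$ are ours); here $m_n$ denotes the Stieltjes transform of the spectral distribution of $(1/N)X^*TX$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, N, z = 5, 8, 0.3 + 1.0j           # arbitrary small sizes, z in C+

X = rng.standard_normal((n, N))
T = np.eye(n)                         # assumption: T = I for simplicity
R = X / np.sqrt(N)                    # column j is r_j = N^{-1/2} T^{1/2} x_j
B = R @ R.T                           # B_n (n x n)

# Stieltjes transform of the spectral distribution of (1/N) X^* T X (N x N)
m_comp = np.trace(np.linalg.inv(R.T @ R - z * np.eye(N))) / N

# beta_j(z) = 1/(1 + r_j^* D_j^{-1}(z) r_j), with D_j(z) = B_n - r_j r_j^* - zI
beta_sum = 0.0
for j in range(N):
    rj = R[:, j]
    Dj = B - np.outer(rj, rj) - z * np.eye(n)
    beta_sum += 1.0 / (1.0 + rj @ np.linalg.solve(Dj, rj))

# the identity m_n(z) = -(1/(zN)) sum_j beta_j(z) holds exactly
assert abs(m_comp - (-beta_sum / (z * N))) < 1e-10
```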
so that we can write
$$(2.18)\qquad\operatorname{tr}\big(E_j(D_j^{-1}(z_1))TD_j^{-1}(z_2)T\big)\Big[1-\frac{j-1}{N^2}m_n^0(z_1)m_n^0(z_2)\operatorname{tr}\big((I+m_n^0(z_2)T)^{-1}T(I+m_n^0(z_1)T)^{-1}T\big)\Big]$$
$$=\frac1{z_1z_2}\operatorname{tr}\big((I+m_n^0(z_2)T)^{-1}T(I+m_n^0(z_1)T)^{-1}T\big)+A_6(z_1,z_2),$$
where $E|A_6(z_1,z_2)|\le KN^{1/2}$. Rewrite (2.18) as
$$\operatorname{tr}\big(E_j(D_j^{-1}(z_1))TD_j^{-1}(z_2)T\big)\Big[1-\frac{j-1}Nc_nm_n^0(z_1)m_n^0(z_2)\int\frac{t^2\,dH_n(t)}{(1+tm_n^0(z_1))(1+tm_n^0(z_2))}\Big]=\frac{Nc_n}{z_1z_2}\int\frac{t^2\,dH_n(t)}{(1+tm_n^0(z_1))(1+tm_n^0(z_2))}+A_6(z_1,z_2).$$
Using (3.9) and (3.16) in Bai and Silverstein (1998) we find
$$(2.19)\qquad\Big|c_nm_n^0(z_1)m_n^0(z_2)\int\frac{t^2\,dH_n(t)}{(1+tm_n^0(z_1))(1+tm_n^0(z_2))}\Big|=\Bigg|c_n\frac{\int\frac{t^2\,dH_n(t)}{(1+tm_n^0(z_1))(1+tm_n^0(z_2))}}{\Big(-z_1+c_n\int\frac{t\,dH_n(t)}{1+tm_n^0(z_1)}\Big)\Big(-z_2+c_n\int\frac{t\,dH_n(t)}{1+tm_n^0(z_2)}\Big)}\Bigg|$$
$$\le\Bigg(\frac{c_n\int\frac{t^2\,dH_n(t)}{|1+tm_n^0(z_1)|^2}}{\big|-z_1+c_n\int\frac{t\,dH_n(t)}{1+tm_n^0(z_1)}\big|^2}\Bigg)^{1/2}\Bigg(\frac{c_n\int\frac{t^2\,dH_n(t)}{|1+tm_n^0(z_2)|^2}}{\big|-z_2+c_n\int\frac{t\,dH_n(t)}{1+tm_n^0(z_2)}\big|^2}\Bigg)^{1/2}$$
$$=\Bigg(\frac{\operatorname{Im}m_n^0(z_1)\,c_n\int\frac{t^2\,dH_n(t)}{|1+tm_n^0(z_1)|^2}}{\operatorname{Im}z_1+\operatorname{Im}m_n^0(z_1)\,c_n\int\frac{t^2\,dH_n(t)}{|1+tm_n^0(z_1)|^2}}\Bigg)^{1/2}\Bigg(\frac{\operatorname{Im}m_n^0(z_2)\,c_n\int\frac{t^2\,dH_n(t)}{|1+tm_n^0(z_2)|^2}}{\operatorname{Im}z_2+\operatorname{Im}m_n^0(z_2)\,c_n\int\frac{t^2\,dH_n(t)}{|1+tm_n^0(z_2)|^2}}\Bigg)^{1/2}<1,$$
since
$$\frac{\operatorname{Im}z}{\operatorname{Im}m_n^0(z)\,c_n\int\frac{t^2\,dH_n(t)}{|1+tm_n^0(z)|^2}}$$
is bounded away from 0. Therefore, using (2.17) and letting $a_n(z_1,z_2)$ denote the expression inside the absolute value sign in (2.19), we find that (2.8) can be written as
$$\frac1N\sum_{j=1}^N\frac{a_n(z_1,z_2)}{1-\frac{j-1}Na_n(z_1,z_2)}+A_7(z_1,z_2),$$
where $E|A_7(z_1,z_2)|\le KN^{-1/2}$. We see then that
$$(2.8)\xrightarrow{i.p.}\int_0^1\frac{a(z_1,z_2)}{1-ta(z_1,z_2)}\,dt=\int_0^{a(z_1,z_2)}\frac1{1-z}\,dz,$$
where
$$a(z_1,z_2)=cm(z_1)m(z_2)\int\frac{t^2\,dH(t)}{(1+tm(z_1))(1+tm(z_2))}=\frac{cm(z_1)m(z_2)}{m(z_2)-m(z_1)}\Big(\int\frac{t\,dH(t)}{1+tm(z_1)}-\int\frac{t\,dH(t)}{1+tm(z_2)}\Big)=1+\frac{m(z_1)m(z_2)(z_1-z_2)}{m(z_2)-m(z_1)}.$$
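In the special case $T=I$ (so $H$ is a point mass at 1), $m(z)$ solves the quadratic $zm^2+(z+1-c)m+1=0$, and the two expressions for $a(z_1,z_2)$ can be evaluated and compared numerically. The sketch below is an illustration only (the function name and the parameter values are ours):

```python
import numpy as np

def m_mp(z, c):
    """Stieltjes transform for H = point mass at 1: the root of
    z*m^2 + (z+1-c)*m + 1 = 0 with positive imaginary part (z in C+)."""
    roots = np.roots([z, z + 1 - c, 1])
    return roots[np.argmax(roots.imag)]

c = 0.5
z1, z2 = 1.0 + 0.7j, -0.4 + 1.3j
m1, m2 = m_mp(z1, c), m_mp(z2, c)

# check the defining equation z = -1/m + c/(1+m)
assert abs(z1 - (-1 / m1 + c / (1 + m1))) < 1e-10

# a(z1,z2): integral form (H = delta_1) vs. the closed form above
a_int = c * m1 * m2 / ((1 + m1) * (1 + m2))
a_closed = 1 + m1 * m2 * (z1 - z2) / (m2 - m1)
assert abs(a_int - a_closed) < 1e-10
```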
Thus the i.p. limit of (2.4) under the "complex Gaussian" case is
$$\frac{\partial^2}{\partial z_2\,\partial z_1}\int_0^{a(z_1,z_2)}\frac1{1-z}\,dz=\frac\partial{\partial z_2}\Bigg(\frac{\frac\partial{\partial z_1}a(z_1,z_2)}{1-a(z_1,z_2)}\Bigg)$$
$$=\frac\partial{\partial z_2}\Bigg[\frac{(m(z_2)-m(z_1))\big(m'(z_1)m(z_2)(z_1-z_2)+m(z_1)m(z_2)\big)+m(z_1)m(z_2)(z_1-z_2)m'(z_1)}{(m(z_2)-m(z_1))^2}\times\frac{m(z_2)-m(z_1)}{m(z_1)m(z_2)(z_2-z_1)}\Bigg]$$
$$=-\frac\partial{\partial z_2}\Bigg(\frac{m'(z_1)}{m(z_1)}+\frac1{z_1-z_2}+\frac{m'(z_1)}{m(z_2)-m(z_1)}\Bigg)=\frac{m'(z_1)m'(z_2)}{(m(z_2)-m(z_1))^2}-\frac1{(z_1-z_2)^2},$$
which is (1.13).
3. Tightness of $M_{n1}(z)$. We proceed to prove tightness of the sequence of random functions $\widehat M_{n1}(z)$ for $z\in\mathcal{C}$ defined by (1.11). We will use Theorem 12.3 (p. 96) of Billingsley (1968). It is easy to verify from the proof of the Arzela–Ascoli theorem (p. 221 of Billingsley (1968)) that condition (i) of Theorem 12.3 can be replaced with the assumption of tightness at any point in $[0,1]$. From (2.5) we see that this condition is satisfied. We will verify condition (ii) of Theorem 12.3 by proving the moment condition (12.51) of Billingsley (1968): we will show that
$$\sup_n\sup_{z_1,z_2\in\mathcal{C}_n}\frac{E|M_{n1}(z_1)-M_{n1}(z_2)|^2}{|z_1-z_2|^2}$$
is finite.
We claim that the moments of $\|D^{-1}(z)\|$, $\|D_j^{-1}(z)\|$ and $\|D_{ij}^{-1}(z)\|$ are bounded in $n$ and $z\in\mathcal{C}_n$. This is clearly true for $z\in\mathcal{C}_u$, and for $z\in\mathcal{C}_l$ if $x_l<0$. For $z\in\mathcal{C}_r$ or, if $x_l>0$, $z\in\mathcal{C}_l$, we use (1.9) and (1.10) on, for example, $B_{(1)}=B_n-r_1r_1^*$, to get
$$E\|D_j^{-1}(z)\|^p\le K_1+v^{-p}P\big(\|B_{(1)}\|\ge\eta_r\text{ or }\lambda_{\min}^{B_{(1)}}\le\eta_l\big)\le K_1+K_2n^p\epsilon^{-p}n^{-\ell}\le K$$
for suitably large $\ell$. Here, $\eta_r$ is any fixed number between $\limsup_n\|T\|(1+\sqrt c)^2$ and $x_r$, and, if $x_l>0$, $\eta_l$ is any fixed number between $x_l$ and $\liminf_n\lambda_{\min}^T(1-\sqrt c)^2$ (take $\eta_l<0$ if $x_l<0$). Therefore, for any positive $p$,
$$(3.1)\qquad\max\big(E\|D^{-1}(z)\|^p,\,E\|D_j^{-1}(z)\|^p,\,E\|D_{ij}^{-1}(z)\|^p\big)\le K_p.$$
We can use the above argument to extend (2.3). Using (1.8) and (2.3) we get
$$(3.2)\qquad\Big|E\Big(a(v)\prod_{l=1}^q\big(r_1^*B_l(v)r_1-N^{-1}\operatorname{tr}TB_l(v)\big)\Big)\Big|\le KN^{-(1\wedge q)}\delta_n^{(2q-4)\vee0},\qquad p\ge0,\ q\ge0,$$
where now the matrices $B_l(v)$ are independent of $r_1$ and
$$\max\big(|a(v)|,\|B_l(v)\|\big)\le K\big(1+n^sI(\|B_n\|\ge\eta_r\text{ or }\lambda_{\min}^{\widetilde B}\le\eta_l)\big)$$
for some positive $s$, with $\widetilde B$ being $B_n$ or $B_n$ with one or two of the $r_j$'s removed.
We remark here that when $q\ge2$, (3.2) still holds if the absolute value sign is inside the expectation sign, which is a consequence of (3.2) and an application of the Cauchy–Schwarz inequality:
$$E\Big|a(v)\prod_{l=1}^q\big(r_1^*B_l(v)r_1-N^{-1}\operatorname{tr}TB_l(v)\big)\Big|$$
$$\le E^{1/2}\Big(a(v)\bar a(v)\prod_{l=1}^{q-1}\big(r_1^*B_l(v)r_1-N^{-1}\operatorname{tr}TB_l(v)\big)\big(r_1^*\bar B_l(v)r_1-N^{-1}\operatorname{tr}T\bar B_l(v)\big)\Big)$$
$$\times E^{1/2}\Big(\big(r_1^*B_q(v)r_1-N^{-1}\operatorname{tr}TB_q(v)\big)\big(r_1^*\bar B_q(v)r_1-N^{-1}\operatorname{tr}T\bar B_q(v)\big)\Big)\le KN^{-(1\wedge q)}\delta_n^{(2q-4)\vee0}.$$
We also point out that, in applications of (3.2), $a(v)$ is a product of factors of the form $\beta_1(z)$ or $r_1^*A(z)r_1$, where $A$ is a product of one or several matrices $D_1^{-1}(z_j)$, $j=1,2$, or similarly defined $D^{-1}$ matrices; the matrices $B_l$ also have this form. For example, we have $|r_1^*D_1^{-1}(z_1)D_1^{-1}(z_2)r_1|\le|r_1|^2\|D_1^{-1}(z_1)D_1^{-1}(z_2)\|\le K^2\eta_r+|z|n^{3+2\alpha}I(\|B_n\|\ge\eta_r\text{ or }\lambda_{\min}^{B_{(1)}}\le\eta_l)$, where $K$ can be taken to be $\max((x_r-\eta_r)^{-1},(\eta_l-x_l)^{-1},v_0^{-1})$, and where we have used (1.10) and the fact that $|r_1|^2\le\eta_r$ if $\|B_n\|<\eta_r$ and $|r_1|^2\le n$ otherwise. Clearly $\|B_l\|$ satisfies this condition. So does $\beta_1(z)$, since from (3.3) (see below) $|\beta_1(z)|=|1-r_1^*D^{-1}r_1|\le1+K\eta_r+|z|n^{2+\alpha}I(\|B_n\|\ge\eta_r\text{ or }\lambda_{\min}^{B_n}\le\eta_l)$. In the sequel we shall freely use (3.2) without verifying these conditions, even for similarly defined $\beta_j$ functions and $A$, $B$ matrices.
We have
$$(3.3)\qquad D^{-1}(z)-D_j^{-1}(z)=-\frac{D_j^{-1}(z)r_jr_j^*D_j^{-1}(z)}{1+r_j^*D_j^{-1}(z)r_j}=-\beta_j(z)D_j^{-1}(z)r_jr_j^*D_j^{-1}(z).$$
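The identity (3.3) is the Sherman–Morrison formula for the rank-one update $D(z)=D_j(z)+r_jr_j^*$, and it is easy to confirm numerically. The sketch below is illustrative only (the sizes, the real data, and the value of $z$ are ours):

```python
import numpy as np

rng = np.random.default_rng(1)
n, z = 6, 0.5 + 1.0j
A = rng.standard_normal((n, n))
B = A @ A.T                               # stand-in for B_n (symmetric, nonneg. definite)
r = rng.standard_normal(n) / np.sqrt(n)

D = B + np.outer(r, r) - z * np.eye(n)    # D(z), containing the rank-one piece r r^*
Dj = B - z * np.eye(n)                    # D_j(z): the same matrix with r r^* removed
Dji = np.linalg.inv(Dj)

beta = 1.0 / (1.0 + r @ Dji @ r)          # beta_j(z)
lhs = np.linalg.inv(D) - Dji
rhs = -beta * np.outer(Dji @ r, Dji.T @ r)  # -beta_j D_j^{-1} r r^* D_j^{-1}

assert np.abs(lhs - rhs).max() < 1e-10
```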
Let
$$\gamma_j(z)=r_j^*D_j^{-1}(z)r_j-N^{-1}E\operatorname{tr}\big(D_j^{-1}(z)T\big).$$
We first derive bounds on the moments of $\gamma_j(z)$ and $\epsilon_j(z)$. Using (3.2) we have
$$(3.4)\qquad E|\epsilon_j(z)|^p\le K_pN^{-1}\delta_n^{2p-4},\qquad p\ge2.$$
It should be noted that the constants obtained do not depend on $z\in\mathcal{C}_n$. Using Lemma 2.1, (3.2), and Hölder's inequality, we have for all $p\ge2$
$$E|\gamma_j(z)-\epsilon_j(z)|^p=E|\gamma_1(z)-\epsilon_1(z)|^p=E\Big|\frac1N\sum_{j=2}^N\big(E_j\operatorname{tr}TD_1^{-1}(z)-E_{j-1}\operatorname{tr}TD_1^{-1}(z)\big)\Big|^p$$
$$=E\Big|\frac1N\sum_{j=2}^N\Big(E_j\operatorname{tr}T\big(D_1^{-1}(z)-D_{1j}^{-1}(z)\big)-E_{j-1}\operatorname{tr}T\big(D_1^{-1}(z)-D_{1j}^{-1}(z)\big)\Big)\Big|^p$$
$$=\frac1{N^p}E\Big|\sum_{j=2}^N(E_j-E_{j-1})\beta_{1j}(z)r_j^*D_{1j}^{-1}(z)TD_{1j}^{-1}(z)r_j\Big|^p\le\frac{K_p}{N^p}E\Big(\sum_{j=2}^N\big|(E_j-E_{j-1})\beta_{1j}(z)r_j^*D_{1j}^{-1}(z)TD_{1j}^{-1}(z)r_j\big|^2\Big)^{p/2}$$
$$\le\frac{K_p}{N^{1+p/2}}\sum_{j=2}^NE\big|(E_j-E_{j-1})\beta_{1j}(z)r_j^*D_{1j}^{-1}(z)TD_{1j}^{-1}(z)r_j\big|^p\le\frac{K_p}{N^{p/2}}E\big|\beta_{12}(z)r_2^*D_{12}^{-1}(z)TD_{12}^{-1}(z)r_2\big|^p\le\frac{K_p}{N^{p/2}}.$$
Therefore
$$(3.5)\qquad E|\gamma_j|^p\le K_pN^{-1}\delta_n^{2p-4},\qquad p\ge2.$$
We next prove that $b_n(z)$ is bounded for all $n$. From (3.2) we find, for any $p\ge1$,
$$(3.6)\qquad E|\beta_1(z)|^p\le K_p.$$
Since $b_n=\beta_1(z)+\beta_1(z)b_n(z)\gamma_1(z)$, we get from (3.5) and (3.6)
$$|b_n(z)|=|E\beta_1(z)+E\beta_1(z)b_n(z)\gamma_1(z)|\le K_1+K_2|b_n(z)|N^{-1/2}.$$
Thus for all $n$ large
$$|b_n(z)|\le\frac{K_1}{1-K_2N^{-1/2}},$$
and consequently $b_n(z)$ is bounded for all $n$.
From (3.3) we have
$$D^{-1}(z_1)D^{-1}(z_2)-D_j^{-1}(z_1)D_j^{-1}(z_2)$$
$$=\big(D^{-1}(z_1)-D_j^{-1}(z_1)\big)\big(D^{-1}(z_2)-D_j^{-1}(z_2)\big)+\big(D^{-1}(z_1)-D_j^{-1}(z_1)\big)D_j^{-1}(z_2)+D_j^{-1}(z_1)\big(D^{-1}(z_2)-D_j^{-1}(z_2)\big)$$
$$=\beta_j(z_1)\beta_j(z_2)D_j^{-1}(z_1)r_jr_j^*D_j^{-1}(z_1)D_j^{-1}(z_2)r_jr_j^*D_j^{-1}(z_2)-\beta_j(z_1)D_j^{-1}(z_1)r_jr_j^*D_j^{-1}(z_1)D_j^{-1}(z_2)-\beta_j(z_2)D_j^{-1}(z_1)D_j^{-1}(z_2)r_jr_j^*D_j^{-1}(z_2).$$
Therefore,
$$(3.7)\qquad\operatorname{tr}\big(D^{-1}(z_1)D^{-1}(z_2)-D_j^{-1}(z_1)D_j^{-1}(z_2)\big)=\beta_j(z_1)\beta_j(z_2)\big(r_j^*D_j^{-1}(z_1)D_j^{-1}(z_2)r_j\big)^2-\beta_j(z_1)r_j^*D_j^{-2}(z_1)D_j^{-1}(z_2)r_j-\beta_j(z_2)r_j^*D_j^{-2}(z_2)D_j^{-1}(z_1)r_j.$$
We write
$$m_n(z_1)-m_n(z_2)=\frac1n\operatorname{tr}\big(D^{-1}(z_1)-D^{-1}(z_2)\big)=\frac{z_1-z_2}n\operatorname{tr}D^{-1}(z_1)D^{-1}(z_2).$$
Therefore, from (3.7) we have
$$n\,\frac{m_n(z_1)-m_n(z_2)-E\big(m_n(z_1)-m_n(z_2)\big)}{z_1-z_2}=\sum_{j=1}^N(E_j-E_{j-1})\operatorname{tr}D^{-1}(z_1)D^{-1}(z_2)$$
$$(3.8)\qquad=\sum_{j=1}^N(E_j-E_{j-1})\beta_j(z_1)\beta_j(z_2)\big(r_j^*D_j^{-1}(z_1)D_j^{-1}(z_2)r_j\big)^2-\sum_{j=1}^N(E_j-E_{j-1})\beta_j(z_1)r_j^*D_j^{-2}(z_1)D_j^{-1}(z_2)r_j-\sum_{j=1}^N(E_j-E_{j-1})\beta_j(z_2)r_j^*D_j^{-2}(z_2)D_j^{-1}(z_1)r_j.$$
Our goal is to show that the absolute second moment of (3.8) is bounded. We begin with the second sum in (3.8). We have
$$\sum_{j=1}^N(E_j-E_{j-1})\beta_j(z_1)r_j^*D_j^{-2}(z_1)D_j^{-1}(z_2)r_j=\sum_{j=1}^N(E_j-E_{j-1})\Big(b_n(z_1)r_j^*D_j^{-2}(z_1)D_j^{-1}(z_2)r_j-\beta_j(z_1)b_n(z_1)r_j^*D_j^{-2}(z_1)D_j^{-1}(z_2)r_j\,\gamma_j(z_1)\Big)$$
$$=b_n(z_1)\sum_{j=1}^NE_j\Big(r_j^*D_j^{-2}(z_1)D_j^{-1}(z_2)r_j-N^{-1}\operatorname{tr}T^{1/2}D_j^{-2}(z_1)D_j^{-1}(z_2)T^{1/2}\Big)-b_n(z_1)\sum_{j=1}^N(E_j-E_{j-1})\beta_j(z_1)r_j^*D_j^{-2}(z_1)D_j^{-1}(z_2)r_j\,\gamma_j(z_1)\equiv b_n(z_1)W_1-b_n(z_1)W_2.$$
Using (3.2) we have
$$E|W_1|^2=\sum_{j=1}^NE\Big|E_j\Big(r_j^*D_j^{-2}(z_1)D_j^{-1}(z_2)r_j-N^{-1}\operatorname{tr}T^{1/2}D_j^{-2}(z_1)D_j^{-1}(z_2)T^{1/2}\Big)\Big|^2\le K.$$
Using (3.5), and the bounds for $\beta_1(z_1)$ and $r_1^*D_1^{-2}(z_1)D_1^{-1}(z_2)r_1$ given in the remark to (3.2), we have
$$E|W_2|^2=\sum_{j=1}^NE\big|(E_j-E_{j-1})\beta_j(z_1)r_j^*D_j^{-2}(z_1)D_j^{-1}(z_2)r_j\,\gamma_j(z_1)\big|^2\le KN\Big[E|\gamma_1(z_1)|^2+v^{-10}n^2P\big(\|B_n\|>\eta_r\text{ or }\lambda_{\min}^{B_{(1)}}<\eta_l\big)\Big]\le K.$$
This argument of course handles the third sum in (3.8).
For the first sum in (3.8) we have
$$\sum_{j=1}^N(E_j-E_{j-1})\beta_j(z_1)\beta_j(z_2)\big(r_j^*D_j^{-1}(z_1)D_j^{-1}(z_2)r_j\big)^2$$
$$=b_n(z_1)b_n(z_2)\sum_{j=1}^N(E_j-E_{j-1})\Big[\big(r_j^*D_j^{-1}(z_1)D_j^{-1}(z_2)r_j\big)^2-\big(N^{-1}\operatorname{tr}T^{1/2}D_j^{-1}(z_1)D_j^{-1}(z_2)T^{1/2}\big)^2\Big]$$
$$-b_n(z_2)\sum_{j=1}^N(E_j-E_{j-1})\beta_j(z_1)\beta_j(z_2)\big(r_j^*D_j^{-1}(z_1)D_j^{-1}(z_2)r_j\big)^2\gamma_j(z_2)-b_n(z_1)b_n(z_2)\sum_{j=1}^N(E_j-E_{j-1})\beta_j(z_1)\big(r_j^*D_j^{-1}(z_1)D_j^{-1}(z_2)r_j\big)^2\gamma_j(z_1)$$
$$=b_n(z_1)b_n(z_2)Y_1-b_n(z_2)Y_2-b_n(z_1)b_n(z_2)Y_3.$$
Both $Y_2$ and $Y_3$ are handled the same way as $W_2$ above. Using (3.2) we have
$$E|Y_1|^2\le NE\Big|\big(r_1^*D_1^{-1}(z_1)D_1^{-1}(z_2)r_1\big)^2-\big(N^{-1}\operatorname{tr}T^{1/2}D_1^{-1}(z_1)D_1^{-1}(z_2)T^{1/2}\big)^2\Big|^2$$
$$\le N\Big(2E\big|r_1^*D_1^{-1}(z_1)D_1^{-1}(z_2)r_1-N^{-1}\operatorname{tr}T^{1/2}D_1^{-1}(z_1)D_1^{-1}(z_2)T^{1/2}\big|^4$$
$$+4(nN^{-1})^2E\big|\big(r_1^*D_1^{-1}(z_1)D_1^{-1}(z_2)r_1-N^{-1}\operatorname{tr}T^{1/2}D_1^{-1}(z_1)D_1^{-1}(z_2)T^{1/2}\big)\|D_1^{-1}(z_1)D_1^{-1}(z_2)\|\big|^2\Big)\le K.$$
Therefore, condition (ii) of Theorem 12.3 in Billingsley (1968) is satisfied, and we conclude that $\{\widehat M_{n1}(z)\}$ is tight.
4. Convergence of $M_{n2}(z)$. The proof of Lemma 1.1 is completed by verifying that $\{M_{n2}(z)\}$ for $z\in\mathcal{C}_n$ is bounded and forms an equicontinuous family, and that it converges to (1.12) under the assumptions in (2) and to zero under those in (3).
In order to simplify the exposition, we let $\mathcal{C}_1=\mathcal{C}_u$, or $\mathcal{C}_u\cup\mathcal{C}_l$ if $x_l<0$, and $\mathcal{C}_2=\mathcal{C}_2(n)=\mathcal{C}_r$, or $\mathcal{C}_r\cup\mathcal{C}_l$ if $x_l>0$. We begin by proving
$$(4.1)\qquad\sup_{z\in\mathcal{C}_n}|Em_n(z)-m(z)|\to0$$
as $n\to\infty$. Since $F^{B_n}\to_DF^{c,H}$ almost surely, we get from the d.c.t. $EF^{B_n}\to_DF^{c,H}$. It is easy to verify that $EF^{B_n}$ is a proper c.d.f. Since, as $z$ ranges in $\mathcal{C}_1$, the functions $(\lambda-z)^{-1}$ in $\lambda\in[0,\infty)$ form a bounded, equicontinuous family, it follows [see, e.g., Billingsley (1968), Problem 8, p. 17] that
$$\sup_{z\in\mathcal{C}_1}|Em_n(z)-m(z)|\to0.$$
For $z\in\mathcal{C}_2$ we write ($\eta_l,\eta_r$ defined as in the previous section)
$$Em_n(z)-m(z)=\int\frac1{\lambda-z}I_{[\eta_l,\eta_r]}(\lambda)\,d\big(EF^{B_n}(\lambda)-F^{c,H}(\lambda)\big)+E\int\frac1{\lambda-z}I_{[\eta_l,\eta_r]^c}(\lambda)\,dF^{B_n}(\lambda).$$
As above, the first term converges uniformly to zero. For the second term we use (1.9) with $\ell\ge2$ to get
$$\sup_{z\in\mathcal{C}_2}\Big|E\int\frac1{\lambda-z}I_{[\eta_l,\eta_r]^c}(\lambda)\,dF^{B_n}(\lambda)\Big|\le(\epsilon_n/n)^{-1}P\big(\|B_n\|\ge\eta_r\text{ or }\lambda_{\min}^{B_n}\le\eta_l\big)\le Kn\epsilon_n^{-1}n^{-\ell}\to0.$$
Thus (4.1) holds.
From the fact that $F^{c_n,H_n}\to_DF^{c,H}$ [see Bai and Silverstein (1998), below (3.10)], along with the fact that $\mathcal{C}$ lies outside the support of $F^{c,H}$, it is straightforward to verify that
$$(4.2)\qquad\sup_{z\in\mathcal{C}}|m_n^0(z)-m(z)|\to0$$
as $n\to\infty$.
We now show that
$$(4.3)\qquad\sup_n\sup_{z\in\mathcal{C}_n}\|(Em_n(z)T+I)^{-1}\|<\infty.$$
From Lemma 2.11 of Bai and Silverstein (1998), $\|(Em_n(z)T+I)^{-1}\|$ is bounded by $\max(2,4v_0^{-1})$ on $\mathcal{C}_u$. Let $x=x_l$ or $x_r$. Since $x$ is outside the support of $F^{c,H}$, it follows from Theorem 4.1 of Silverstein and Choi (1995) that for any $t$ in the support of $H$, $m(x)t+1\ne0$. Choose any $t_0$ in the support of $H$. Since $m(z)$ is continuous on $\mathcal{C}_0\equiv\{x+iv:v\in[0,v_0]\}$, there exist positive constants $\delta_1$ and $\mu_0$ such that
$$\inf_{z\in\mathcal{C}_0}|m(z)t_0+1|>\delta_1\qquad\text{and}\qquad\sup_{z\in\mathcal{C}_0}|m(z)|<\mu_0.$$
Using $H_n\to_DH$ and (4.1), for all large $n$ there exists an eigenvalue $\lambda^T$ of $T$ such that $|\lambda^T-t_0|<\delta_1/4\mu_0$ and $\sup_{z\in\mathcal{C}_l\cup\mathcal{C}_r}|Em_n(z)-m(z)|<\delta_1/4$. Therefore, we have
$$\inf_{z\in\mathcal{C}_l\cup\mathcal{C}_r}|Em_n(z)\lambda^T+1|>\delta_1/2,$$
which completes the proof of (4.3).
Next we show the existence of $\xi\in(0,1)$ such that for all $n$ large
$$(4.4)\qquad\sup_{z\in\mathcal{C}_n}\Big|c_nEm_n(z)^2\int\frac{t^2}{(1+tEm_n(z))^2}\,dH_n(t)\Big|<\xi.$$
From the identity (1.1) of Bai and Silverstein (1998),
$$m(z)=\frac1{-z+c\int\frac t{1+tm(z)}\,dH(t)},$$
valid for $z=x+iv$ outside the support of $F^{c,H}$, we find
$$\operatorname{Im}m(z)=\frac{v+\operatorname{Im}m(z)\,c\int\frac{t^2}{|1+tm(z)|^2}\,dH(t)}{\big|-z+c\int\frac t{1+tm(z)}\,dH(t)\big|^2}.$$
Therefore
$$(4.5)\qquad\Big|cm(z)^2\int\frac{t^2}{(1+tm(z))^2}\,dH(t)\Big|\le\frac{c\int\frac{t^2}{|1+tm(z)|^2}\,dH(t)}{\big|-z+c\int\frac t{1+tm(z)}\,dH(t)\big|^2}=\frac{\operatorname{Im}m(z)\,c\int\frac{t^2}{|1+tm(z)|^2}\,dH(t)}{v+\operatorname{Im}m(z)\,c\int\frac{t^2}{|1+tm(z)|^2}\,dH(t)}$$
$$=\frac{c\int\frac1{|x-z|^2}\,dF^{c,H}(x)\int\frac{t^2}{|1+tm(z)|^2}\,dH(t)}{1+c\int\frac1{|x-z|^2}\,dF^{c,H}(x)\int\frac{t^2}{|1+tm(z)|^2}\,dH(t)}<1$$
for all $z\in\mathcal{C}$. By continuity, there exists $\xi_1<1$ such that
$$(4.6)\qquad\sup_{z\in\mathcal{C}}\Big|cm(z)^2\int\frac{t^2}{(1+tm(z))^2}\,dH(t)\Big|<\xi_1.$$
Therefore, using (4.1), (4.4) follows.
We proceed with some improved bounds on quantities appearing earlier. Let $M$ be nonrandom $n\times n$. Then, using (3.2) and the argument used to derive the bound on $E|W_2|^2$, we find
$$E|\operatorname{tr}D^{-1}M-E\operatorname{tr}D^{-1}M|^2=E\Big|\sum_{j=1}^N\big(E_j\operatorname{tr}D^{-1}M-E_{j-1}\operatorname{tr}D^{-1}M\big)\Big|^2=E\Big|\sum_{j=1}^N(E_j-E_{j-1})\operatorname{tr}(D^{-1}-D_j^{-1})M\Big|^2$$
$$=\sum_{j=1}^NE\big|(E_j-E_{j-1})\beta_jr_j^*D_j^{-1}MD_j^{-1}r_j\big|^2\le2\sum_{j=1}^N\Big[E\big|\beta_j\big(r_j^*D_j^{-1}MD_j^{-1}r_j-N^{-1}\operatorname{tr}(TD_j^{-1}MD_j^{-1})\big)\big|^2+E|\beta_j-\bar\beta_j|^2\big|N^{-1}\operatorname{tr}(TD_j^{-1}MD_j^{-1})\big|^2\Big]$$
$$(4.7)\qquad\le K\|M\|^2.$$
The same argument holds for $D_1^{-1}$, so we also have
$$(4.8)\qquad E|\operatorname{tr}D_1^{-1}M-E\operatorname{tr}D_1^{-1}M|^2\le K\|M\|^2.$$
Our next task is to investigate the limiting behavior of
$$N\Big(c_n\int\frac{dH_n(t)}{1+tEm_n}+zc_nEm_n\Big)=NE\beta_1\Big[r_1^*D_1^{-1}(Em_nT+I)^{-1}r_1-\frac1NE\operatorname{tr}\big((Em_nT+I)^{-1}TD^{-1}\big)\Big]$$
for $z\in\mathcal{C}_n$ [see (5.2) in Bai and Silverstein (1998)]. Throughout the following, all bounds, including $O(\cdot)$ and $o(\cdot)$ expressions, and convergence statements hold uniformly for $z\in\mathcal{C}_n$.
We have
$$(4.9)\qquad E\operatorname{tr}(Em_nT+I)^{-1}TD_1^{-1}-E\operatorname{tr}(Em_nT+I)^{-1}TD^{-1}=E\beta_1\operatorname{tr}(Em_nT+I)^{-1}TD_1^{-1}r_1r_1^*D_1^{-1}=b_nE(1-\beta_1\gamma_1)r_1^*D_1^{-1}(Em_nT+I)^{-1}TD_1^{-1}r_1.$$
From (3.2), (3.5), and (4.3) we get
$$\big|E\beta_1\gamma_1r_1^*D_1^{-1}(Em_nT+I)^{-1}TD_1^{-1}r_1\big|\le KN^{-1}.$$
Therefore
$$\big|(4.9)-N^{-1}b_nE\operatorname{tr}D_1^{-1}(Em_nT+I)^{-1}TD_1^{-1}T\big|\le KN^{-1}.$$
Since $\beta_1=b_n-b_n^2\gamma_1+\beta_1b_n^2\gamma_1^2$, we have
$$NE\beta_1r_1^*D_1^{-1}(Em_nT+I)^{-1}r_1-E\beta_1E\operatorname{tr}(Em_nT+I)^{-1}TD_1^{-1}$$
$$=-b_n^2NE\gamma_1r_1^*D_1^{-1}(Em_nT+I)^{-1}r_1+b_n^2\Big(NE\beta_1\gamma_1^2r_1^*D_1^{-1}(Em_nT+I)^{-1}r_1-(E\beta_1\gamma_1^2)E\operatorname{tr}(Em_nT+I)^{-1}TD_1^{-1}\Big)$$
$$=-b_n^2NE\gamma_1r_1^*D_1^{-1}(Em_nT+I)^{-1}r_1+b_n^2E\big[N\beta_1\gamma_1^2r_1^*D_1^{-1}(Em_nT+I)^{-1}r_1-\beta_1\gamma_1^2\operatorname{tr}D_1^{-1}(Em_nT+I)^{-1}T\big]+b_n^2\operatorname{Cov}\big(\beta_1\gamma_1^2,\operatorname{tr}D_1^{-1}(Em_nT+I)^{-1}T\big)$$
(where $\operatorname{Cov}(X,Y)=EXY-EXEY$). Using (3.2), (3.5), and (4.3), we have
$$\big|E\big[N\beta_1\gamma_1^2r_1^*D_1^{-1}(Em_nT+I)^{-1}r_1-\beta_1\gamma_1^2\operatorname{tr}D_1^{-1}(Em_nT+I)^{-1}T\big]\big|\le K\delta_n^2.$$
Using (3.5), (3.6), (4.3) and (4.8) we have
$$\big|\operatorname{Cov}\big(\beta_1\gamma_1^2,\operatorname{tr}D_1^{-1}(Em_nT+I)^{-1}T\big)\big|\le\big(E|\beta_1|^4\big)^{1/4}\big(E|\gamma_1|^8\big)^{1/4}\big(E\big|\operatorname{tr}D_1^{-1}(Em_nT+I)^{-1}T-E\operatorname{tr}D_1^{-1}(Em_nT+I)^{-1}T\big|^2\big)^{1/2}\le K\delta_n^3N^{-1/4}.$$
Since $\beta_1=b_n-b_n\beta_1\gamma_1$, we get from (3.5) and (3.6) that $E\beta_1=b_n+O(N^{-1/2})$.
Write
$$EN\gamma_1r_1^*D_1^{-1}(Em_nT+I)^{-1}r_1=NE\Big[\big(r_1^*D_1^{-1}r_1-N^{-1}\operatorname{tr}D_1^{-1}T\big)\big(r_1^*D_1^{-1}(Em_nT+I)^{-1}r_1-N^{-1}\operatorname{tr}D_1^{-1}(Em_nT+I)^{-1}T\big)\Big]+N^{-1}\operatorname{Cov}\big(\operatorname{tr}D_1^{-1}T,\operatorname{tr}D_1^{-1}(Em_nT+I)^{-1}T\big).$$
From (4.8) we see the second term above is $O(N^{-1})$. Therefore, we arrive at
$$(4.10)\qquad N\Big(c_n\int\frac{dH_n(t)}{1+tEm_n}+zc_nEm_n\Big)=b_n^2N^{-1}E\operatorname{tr}D_1^{-1}(Em_nT+I)^{-1}TD_1^{-1}T$$
$$-b_n^2NE\Big[\big(r_1^*D_1^{-1}r_1-N^{-1}\operatorname{tr}D_1^{-1}T\big)\big(r_1^*D_1^{-1}(Em_nT+I)^{-1}r_1-N^{-1}\operatorname{tr}D_1^{-1}(Em_nT+I)^{-1}T\big)\Big]+o(1).$$
Using (1.15) on (4.10), and arguing the same way (1.15) is used in Section 2 (below (2.7)), we see that under the assumptions in (3), the "complex Gaussian" case,
$$(4.11)\qquad N\Big(c_n\int\frac{dH_n(t)}{1+tEm_n}+zc_nEm_n\Big)\to0\qquad\text{as }n\to\infty,$$
while under the assumptions in (2), the "real Gaussian" case,
$$N\Big(c_n\int\frac{dH_n(t)}{1+tEm_n}+zc_nEm_n\Big)=-b_n^2N^{-1}E\operatorname{tr}D_1^{-1}(Em_nT+I)^{-1}TD_1^{-1}T+o(1).$$
Let $A_n(z)=c_n\int\frac{dH_n(t)}{1+tEm_n(z)}+zc_nE\widetilde m_n(z)$, where $\widetilde m_n(z)=\frac1n\operatorname{tr}D^{-1}(z)$ denotes the Stieltjes transform of $F^{B_n}$ and $m_n$ that of the companion matrix $\frac1NX^*TX$. Using the identity
$$Em_n(z)=-\frac{1-c_n}z+c_nE\widetilde m_n(z)$$
we have
$$A_n(z)=c_n\int\frac{dH_n(t)}{1+tEm_n(z)}-c_n+zEm_n(z)+1=-Em_n(z)\Big(-z-\frac1{Em_n(z)}+c_n\int\frac{t\,dH_n(t)}{1+tEm_n(z)}\Big).$$
It follows that
$$Em_n(z)=\frac1{-z+c_n\int\frac{t\,dH_n(t)}{1+tEm_n(z)}+A_n(z)/Em_n(z)}.$$
From this, together with the analogous identity [below (4.4)], we get
$$(4.12)\qquad Em_n(z)-m_n^0(z)=-\frac{m_n^0A_n}{1-c_nEm_nm_n^0\int\frac{t^2\,dH_n(t)}{(1+tEm_n)(1+tm_n^0)}}.$$
We see from (4.4), and the corresponding bound involving $m_n^0(z)$, that the denominator of (4.12) is bounded away from zero. Therefore, from (4.11), in the "complex Gaussian" case
$$\sup_{z\in\mathcal{C}_n}|M_{n2}(z)|\to0$$
as $n\to\infty$.
We now find the limit of $N^{-1}E\operatorname{tr}D_1^{-1}(Em_nT+I)^{-1}TD_1^{-1}T$. Applications of (3.1)–(3.3), (3.6), and (4.3) show that both
$$E\operatorname{tr}D_1^{-1}(Em_nT+I)^{-1}TD_1^{-1}T-E\operatorname{tr}D^{-1}(Em_nT+I)^{-1}TD_1^{-1}T$$
and
$$E\operatorname{tr}D^{-1}(Em_nT+I)^{-1}TD_1^{-1}T-E\operatorname{tr}D^{-1}(Em_nT+I)^{-1}TD^{-1}T$$
are bounded. Therefore it is sufficient to consider
$$N^{-1}E\operatorname{tr}D^{-1}(Em_nT+I)^{-1}TD^{-1}T.$$
Write
$$D(z)+zI-b_n(z)T=\sum_{j=1}^Nr_jr_j^*-b_n(z)T.$$
It is straightforward to verify that $zI-b_n(z)T$ is nonsingular. Taking inverses, we get
$$D^{-1}(z)=-(zI-b_n(z)T)^{-1}+\sum_{j=1}^N\beta_j(z)(zI-b_n(z)T)^{-1}r_jr_j^*D_j^{-1}(z)-b_n(z)(zI-b_n(z)T)^{-1}TD^{-1}(z)$$
$$(4.13)\qquad=-(zI-b_n(z)T)^{-1}+b_n(z)A(z)+B(z)+C(z),$$
where
$$A(z)=\sum_{j=1}^N(zI-b_n(z)T)^{-1}\big(r_jr_j^*-N^{-1}T\big)D_j^{-1}(z),$$
$$B(z)=\sum_{j=1}^N\big(\beta_j(z)-b_n(z)\big)(zI-b_n(z)T)^{-1}r_jr_j^*D_j^{-1}(z)$$
and
$$C(z)=N^{-1}b_n(z)(zI-b_n(z)T)^{-1}T\sum_{j=1}^N\big(D_j^{-1}(z)-D^{-1}(z)\big)=N^{-1}b_n(z)(zI-b_n(z)T)^{-1}T\sum_{j=1}^N\beta_j(z)D_j^{-1}(z)r_jr_j^*D_j^{-1}(z).$$
Since $E\beta_1=-zEm_n$ and $E\beta_1=b_n+O(N^{-1})$, we have $b_n\to-zm$. From (4.3) it follows that $\|(zI-b_n(z)T)^{-1}\|$ is bounded. We have, by (3.5) and (3.6),
$$(4.14)\qquad E|\beta_1-b_n|^2=|b_n|^2E|\beta_1\gamma_1|^2\le KN^{-1}.$$
Let $M$ be $n\times n$. From (3.1), (3.2), (3.6), and (4.14) we get
$$(4.15)\qquad|N^{-1}E\operatorname{tr}B(z)M|\le K\big(E|\beta_1-b_n|^2\big)^{1/2}\big(E\big|r_1^*r_1\|D_1^{-1}M\|\big|^2\big)^{1/2}\le KN^{-1/2}\big(E\|M\|^4\big)^{1/4}$$
and
$$(4.16)\qquad|N^{-1}E\operatorname{tr}C(z)M|\le KN^{-1}E|\beta_1|\,r_1^*r_1\|D_1^{-1}\|^2\|M\|\le KN^{-1}\big(E\|M\|^2\big)^{1/2}.$$
For the following, $M$ ($n\times n$) is nonrandom and bounded in norm. Write
$$(4.17)\qquad\operatorname{tr}A(z)TD^{-1}M=A_1(z)+A_2(z)+A_3(z),$$
where
$$A_1(z)=\operatorname{tr}\sum_{j=1}^N(zI-b_nT)^{-1}r_jr_j^*D_j^{-1}T(D^{-1}-D_j^{-1})M,$$
$$A_2(z)=\operatorname{tr}\sum_{j=1}^N(zI-b_nT)^{-1}\big(r_jr_j^*D_j^{-1}TD_j^{-1}-N^{-1}TD_j^{-1}TD_j^{-1}\big)M$$
and
$$A_3(z)=\operatorname{tr}\sum_{j=1}^N(zI-b_nT)^{-1}N^{-1}TD_j^{-1}T(D_j^{-1}-D^{-1})M.$$
We have $EA_2(z)=0$, and similar to (4.16) we have
$$(4.18)\qquad|EN^{-1}A_3(z)|\le KN^{-1}.$$
Using (3.2) and (4.14) we get
$$EN^{-1}A_1(z)=-E\beta_1\,r_1^*D_1^{-1}TD_1^{-1}r_1\;r_1^*D_1^{-1}M(zI-b_nT)^{-1}r_1$$
$$=-b_nE\big(N^{-1}\operatorname{tr}D_1^{-1}TD_1^{-1}T\big)\big(N^{-1}\operatorname{tr}D_1^{-1}M(zI-b_nT)^{-1}T\big)+o(1)=-b_nE\big(N^{-1}\operatorname{tr}D^{-1}TD^{-1}T\big)\big(N^{-1}\operatorname{tr}D^{-1}M(zI-b_nT)^{-1}T\big)+o(1).$$
Using (3.1) and (4.7) we find
$$\big|\operatorname{Cov}\big(N^{-1}\operatorname{tr}D^{-1}TD^{-1}T,\,N^{-1}\operatorname{tr}D^{-1}M(zI-b_nT)^{-1}T\big)\big|\le\big(E|N^{-1}\operatorname{tr}D^{-1}TD^{-1}T|^2\big)^{1/2}N^{-1}\big(E\big|\operatorname{tr}D^{-1}M(zI-b_nT)^{-1}T-E\operatorname{tr}D^{-1}M(zI-b_nT)^{-1}T\big|^2\big)^{1/2}\le KN^{-1}.$$
Therefore
$$(4.19)\qquad EN^{-1}A_1(z)=-b_n\big(EN^{-1}\operatorname{tr}D^{-1}TD^{-1}T\big)\big(EN^{-1}\operatorname{tr}D^{-1}M(zI-b_nT)^{-1}T\big)+o(1).$$
From (4.13), (4.15), and (4.16) we get
$$(4.20)\qquad EN^{-1}\operatorname{tr}D^{-1}T(zI-b_nT)^{-1}T=N^{-1}\operatorname{tr}\big(-(zI-b_nT)^{-1}+EB(z)+EC(z)\big)T(zI-b_nT)^{-1}T=-\frac{c_n}{z^2}\int\frac{t^2\,dH_n(t)}{(1+tEm_n)^2}+o(1).$$
Similarly,
$$(4.21)\qquad EN^{-1}\operatorname{tr}D^{-1}(Em_nT+I)^{-1}T(zI-b_nT)^{-1}T=-\frac{c_n}{z^2}\int\frac{t^2\,dH_n(t)}{(1+tEm_n)^3}+o(1).$$
Using (4.13), (4.15)–(4.20) we get
$$EN^{-1}\operatorname{tr}D^{-1}TD^{-1}T=-EN^{-1}\operatorname{tr}D^{-1}T(zI-b_nT)^{-1}T-b_n^2\big(EN^{-1}\operatorname{tr}D^{-1}TD^{-1}T\big)\big(EN^{-1}\operatorname{tr}D^{-1}T(zI-b_nT)^{-1}T\big)+o(1)$$
$$=\frac{c_n}{z^2}\int\frac{t^2\,dH_n(t)}{(1+tEm_n)^2}\Big(1+z^2Em_n^2\,EN^{-1}\operatorname{tr}D^{-1}TD^{-1}T\Big)+o(1).$$
Therefore
$$(4.22)\qquad EN^{-1}\operatorname{tr}D^{-1}TD^{-1}T=\frac{\frac{c_n}{z^2}\int\frac{t^2\,dH_n(t)}{(1+tEm_n)^2}}{1-c_n\int\frac{Em_n^2t^2\,dH_n(t)}{(1+tEm_n)^2}}+o(1).$$
Finally, we have from (4.13)–(4.19), (4.21), and (4.22)
$$N^{-1}E\operatorname{tr}D^{-1}(Em_nT+I)^{-1}TD^{-1}T=-EN^{-1}\operatorname{tr}D^{-1}(Em_nT+I)^{-1}T(zI-b_nT)^{-1}T-b_n^2\big(EN^{-1}\operatorname{tr}D^{-1}TD^{-1}T\big)\big(EN^{-1}\operatorname{tr}D^{-1}(Em_nT+I)^{-1}T(zI-b_nT)^{-1}T\big)+o(1)$$
$$=\frac{c_n}{z^2}\int\frac{t^2\,dH_n(t)}{(1+tEm_n)^3}\Bigg(1+z^2Em_n^2\frac{\frac{c_n}{z^2}\int\frac{t^2\,dH_n(t)}{(1+tEm_n)^2}}{1-c_n\int\frac{Em_n^2t^2\,dH_n(t)}{(1+tEm_n)^2}}\Bigg)+o(1)=\frac{\frac{c_n}{z^2}\int\frac{t^2\,dH_n(t)}{(1+tEm_n)^3}}{1-c_n\int\frac{Em_n^2t^2\,dH_n(t)}{(1+tEm_n)^2}}+o(1).$$
Therefore, from (4.12), we conclude that in the "real Gaussian" case
$$M_{n2}(z)\to\frac{c\int\frac{m(z)^3t^2\,dH(t)}{(1+tm(z))^3}}{\Big(1-c\int\frac{m(z)^2t^2\,dH(t)}{(1+tm(z))^2}\Big)^2}$$
uniformly for $z\in\mathcal{C}_n$ as $n\to\infty$, which is (1.12).
Finally, for general standardized $X_{11}$, we see that, in light of the above work, in order to show that $\{M_{n2}(z)\}$ for $z\in\mathcal{C}_n$ is bounded and equicontinuous, it is sufficient to prove that $\{f_n'(z)\}$ is bounded, where
$$f_n(z)\equiv NE\Big[\big(r_1^*D_1^{-1}r_1-N^{-1}\operatorname{tr}D_1^{-1}T\big)\big(r_1^*D_1^{-1}(Em_nT+I)^{-1}r_1-N^{-1}\operatorname{tr}D_1^{-1}(Em_nT+I)^{-1}T\big)\Big].$$
Using (2.3) we find
$$|f_n'(z)|\le KN^{-1}\Big(\big(E\operatorname{tr}D_1^{-2}T\bar D_1^{-2}T\;E\operatorname{tr}D_1^{-1}(Em_nT+I)^{-1}T(Em_nT+I)^{-1}\bar D_1^{-1}T\big)^{1/2}$$
$$+\big(E\operatorname{tr}D_1^{-1}T\bar D_1^{-1}T\;E\operatorname{tr}D_1^{-2}(Em_nT+I)^{-1}T(Em_nT+I)^{-1}\bar D_1^{-2}T\big)^{1/2}$$
$$+|Em_n'|\big(E\operatorname{tr}D_1^{-1}T\bar D_1^{-1}T\;E\operatorname{tr}D_1^{-1}(Em_nT+I)^{-2}T^3(Em_nT+I)^{-2}\bar D_1^{-1}T\big)^{1/2}\Big),$$
where $\bar D_1=D_1(\bar z)$. Using the same argument resulting in (3.1), it is a simple matter to conclude that $Em_n'(z)$ is bounded for $z\in\mathcal{C}_n$. All the remaining expected values are $O(N)$ due to (3.1) and (4.3), and we are done.
5. Some Derivations and Calculations. Choose $f,g\in\{f_1,\ldots,f_r\}$. Let $S_F$ denote the support of $F^{c,H}$, and let $a,b$ be such that $S_F$ is a subset of $(a,b)$ and is contained in a proper rectangle on which $f$ and $g$ are analytic. We proceed to show (1.17). Assume the $z_1$ contour encloses the $z_2$ contour. Then, because the mapping $z\to m(z)$ is one-to-one on $\mathbb{C}-S_F$ and increasing on intervals in $\mathbb{R}-S_F$, the contour that $m(z_2)$ forms encloses the $m(z_1)$ contour (if $b\le x_2<x_1$, then $m(x_2)<m(x_1)<0$; similarly $x_1<x_2\le a\Rightarrow0<m(x_1)<m(x_2)$). Using integration by parts twice, first with respect to $z_2$ and then with respect to $z_1$, we get
$$\text{RHS of }(1.7)=\frac1{2\pi^2}\iint\frac{f(z_1)g'(z_2)}{m(z_1)-m(z_2)}\,\frac d{dz_1}m(z_1)\,dz_2\,dz_1=-\frac1{2\pi^2}\iint f'(z_1)g'(z_2)\operatorname{Log}\big(m(z_1)-m(z_2)\big)\,dz_1\,dz_2.$$
Here Log is any branch of the logarithm. For fixed $z_2$, $m(z_1)-m(z_2)$ does not encircle the origin as $z_1$ moves around its contour. Therefore $\operatorname{Log}(m(z_1)-m(z_2))$ is single-valued, so we can write
$$(5.1)\qquad\text{RHS of }(1.7)=-\frac1{2\pi^2}\iint f'(z_1)g'(z_2)\big[\ln|m(z_1)-m(z_2)|+i\arg\big(m(z_1)-m(z_2)\big)\big]\,dz_1\,dz_2,$$
where, say, for fixed $z_2$ the $z_1$ contour begins at $x_1\in\mathbb{R}$ to the right of $b$, $\arg(m(z_1)-m(z_2))\in[0,2\pi)$ for $z_1=x_1$, and varies continuously from that angle.
We choose the contours to be rectangles with sides parallel to the axes, symmetric with respect to the real axis, and with the height of the rectangle corresponding to $z_1$ twice that of the $z_2$ rectangle. Since the integral is independent of the selection of the contours, we may compute the limit of the integral as the height approaches zero. Let $A(z_1,z_2)$ be a function. Then
$$\iint_{\mathcal{C}}A(z_1,z_2)\,dz_1\,dz_2=I_1+I_2+I_3+I_4,$$
where
$$I_1=\int_a^b\int_{a-\epsilon}^{b+\epsilon}\big[A(z_1,z_2)+A(\bar z_1,\bar z_2)-A(z_1,\bar z_2)-A(\bar z_1,z_2)\big]\,dx\,dy,$$
with $z_1=x+2iv$ and $z_2=y+iv$; $I_2$ is the integral in which $z_1=x+2iv$ and $z_2$ runs on the vertical sides; $I_3$ is its analogue; and $I_4$ is the integral where both $z_1$ and $z_2$ run along the vertical sides.
We first consider the case $A=f'(z_1)g'(z_2)\arg[m(z_1)-m(z_2)]$. Since the integrand is bounded, we conclude that $I_j\to0$ for $j=2,3,4$. In $I_1$, with $z_1=x+2iv$ and $z_2=y+iv$, the integrand satisfies
$$A(z_1,z_2)+A(\bar z_1,\bar z_2)-A(z_1,\bar z_2)-A(\bar z_1,z_2)$$
$$\sim f'(x)g'(y)\big[\arg(m(z_1)-m(z_2))+\arg(m(\bar z_1)-m(\bar z_2))-\arg(m(z_1)-m(\bar z_2))-\arg(m(\bar z_1)-m(z_2))\big]=0.$$
Hence $I_1\to0$ by the d.c.t.
Next, let us consider the case $A(z_1,z_2)=f'(z_1)g'(z_2)\ln|m(z_1)-m(z_2)|$. We first show that $I_2$, $I_3$ and $I_4\to0$. It is obvious that for any $z_1=x\pm2iu$ and $z_2=y\pm iu$ with $u>0$,
$$2/u\ge|m(z_1)-m(z_2)|=|z_1-z_2|\Big|\int\frac{dF^{c,H}(\lambda)}{(\lambda-z_1)(\lambda-z_2)}\Big|\ge u/K.$$
Thus $|I_2|\le K\int_0^v\ln(1/u)\,du\to0$. Similarly, $I_3\to0$ and $I_4\to0$. Finally, we find the limit of $I_1$. Note that $|f'(z_1)g'(z_2)-f'(x)g'(y)|\le Kv$ for some constant $K$, and that $\ln|m(z_1)-m(z_2)|\le K\ln(1/v)$ when $z_1=x\pm2iv$ and $z_2=y\pm iv$. The integrand satisfies
$$A(z_1,z_2)+A(\bar z_1,\bar z_2)-A(z_1,\bar z_2)-A(\bar z_1,z_2)=2f'(x)g'(y)\big[\ln|m(x+2iv)-m(y+iv)|-\ln|m(x+2iv)-m(y-iv)|\big]+R_n(z_1,z_2),$$
with $|R_n(z_1,z_2)|\le Kv\ln(1/v)$. Thus,
$$I_1=2\int_a^b\int_{a-\epsilon}^{b+\epsilon}f'(x)g'(y)\ln\frac{|m(x+2iv)-m(y+iv)|}{|m(x+2iv)-m(y-iv)|}\,dx\,dy+o(1).$$
By (5.1), we have
$$(5.2)\qquad\text{RHS of }(1.7)=\frac1{\pi^2}\int_a^b\int_{a-\epsilon}^{b+\epsilon}f'(x)g'(y)\ln\frac{|m(x+2iv)-m(y-iv)|}{|m(x+2iv)-m(y+iv)|}\,dx\,dy+o(1).$$
The proof of (1.17) is then complete if we can show that the limiting procedure can be taken under the integral sign on the right-hand side of (5.2). To this end, by applying the d.c.t., we need only establish a bound for the integrand.
From (1.2) we get
$$z_1-z_2=\frac{m(z_1)-m(z_2)}{m(z_1)m(z_2)}\Big(1-c\int\frac{m(z_1)m(z_2)t^2\,dH(t)}{(1+tm(z_1))(1+tm(z_2))}\Big).$$
This implies that
$$(5.3)\qquad\Big|\frac{m(z_1)-m(z_2)}{z_1-z_2}\Big|\ge|m(z_1)m(z_2)|.$$
Rewrite
$$(5.4)\qquad\ln\frac{|m(x+2iv)-m(y-iv)|}{|m(x+2iv)-m(y+iv)|}=\frac12\ln\Big(1+\frac{4m_i(x+2iv)m_i(y+iv)}{|m(x+2iv)-m(y+iv)|^2}\Big),$$
where $m_i$ denotes the imaginary part of $m$. From (5.3) we get
$$\text{RHS of }(5.4)\le\ln\Big(1+\frac{2m_i(x+2iv)m_i(y+iv)}{(x-y)^2|m(x+2iv)m(y+iv)|^2}\Big).$$
We claim that
$$(5.5)\qquad\sup_{\substack{v\in(0,1)\\x,y\in[a-\epsilon,b+\epsilon]}}\frac{m_i(x+2iv)m_i(y+iv)}{|m(x+2iv)m(y+iv)|^2}<\infty.$$
Suppose not. Then there exists a sequence $\{z_n\}\subset\mathbb{C}^+$ which converges to a number in $[a-\epsilon,b+\epsilon]$ and for which $m(z_n)\to0$. From (1.2) we must have
$$(5.6)\qquad c\int\frac{tm(z_n)}{1+tm(z_n)}\,dH(t)\to1.$$
But the limit of the left-hand side of (5.6) is obviously 0, because $m(z_n)\to0$. The contradiction proves our assertion.
Therefore, there exists a $K>0$ for which the right-hand side of (5.4) is bounded by
$$(5.7)\qquad\ln\Big(1+\frac K{(x-y)^2}\Big)$$
for $x,y\in[a-\epsilon,b+\epsilon]$. The integrand of the integral in (5.2) is therefore bounded by the integrable function
$$|f'(x)g'(y)|\ln\Big(1+\frac K{(x-y)^2}\Big).$$
An application of the d.c.t. yields the exchangeability of the limiting and integration procedures, and (1.17) follows.
We now verify (1.18). From (1.2), we have
$$\frac d{dz}m(z)=\frac{m^2(z)}{1-c\int\frac{t^2m^2(z)}{(1+tm(z))^2}\,dH(t)}.$$
In Silverstein and Choi (1995) it is argued that the only places where $m'(z)$ can possibly become unbounded are near the origin and near the boundary $\partial S_F$ of $S_F$. It is a simple matter to verify that
$$EX_f=\frac1{4\pi i}\int f(z)\frac d{dz}\operatorname{Log}\Big(1-c\int\frac{t^2m^2(z)}{(1+tm(z))^2}\,dH(t)\Big)\,dz=-\frac1{4\pi i}\int f'(z)\operatorname{Log}\Big(1-c\int\frac{t^2m^2(z)}{(1+tm(z))^2}\,dH(t)\Big)\,dz,$$
where, because of (4.6), the arg term for Log can be taken continuously within an interval of length $2\pi$.
39
where, because of (4.6), the arg term for Log can be continuously taken within an interval
of length 2π. We choose a contour as above. From (3.17) of Bai and Silverstein (1998),
$$\left|1 - c\int\frac{t^2m^2(x+iv)}{(1+tm(x+iv))^2}\,dH(t)\right| \ge v^2/2,$$
so the integrals on the two vertical routes are controlled by $Kv|\ln v| \to 0$. The integral on the two horizontal routes is equal to
$$\frac{1}{4\pi i}\int_a^b \big(f'(x+iv)-f'(x-iv)\big)\,\ln\left|1 - c\int\frac{t^2m^2(x+iv)}{(1+tm(x+iv))^2}\,dH(t)\right|dx$$
$$+\ \frac{1}{4\pi}\int_a^b \big(f'(x+iv)+f'(x-iv)\big)\,\arg\left(1 - c\int\frac{t^2m^2(x+iv)}{(1+tm(x+iv))^2}\,dH(t)\right)dx$$
$$(5.8)\qquad =\ \frac{1}{2\pi}\int_a^b f'(x)\,\arg\left(1 - c\int\frac{t^2m^2(x+iv)}{(1+tm(x+iv))^2}\,dH(t)\right)dx + o(1).$$
The equality (1.18) then follows by applying d.c.t. to the above equation.
We now derive $d(c)$ ($c\in(0,1)$) in (1.1), (1.21), and the variance in (1.22). The first two rely on Poisson's integral formula
$$(5.9)\qquad u(z) = \frac{1}{2\pi}\int_0^{2\pi} u(e^{i\theta})\,\frac{1-r^2}{1+r^2-2r\cos(\theta-\phi)}\,d\theta,$$
where $u$ is harmonic on the unit disk in $\mathbb{C}$, and $z = re^{i\phi}$ with $r\in[0,1)$. Making the substitution $x = 1+c-2\sqrt c\cos\theta$ we get
$$d(c) = \frac{1}{\pi}\int_0^{2\pi}\frac{\sin^2\theta}{1+c-2\sqrt c\cos\theta}\,\ln(1+c-2\sqrt c\cos\theta)\,d\theta = \frac{1}{2\pi}\int_0^{2\pi}\frac{2\sin^2\theta}{1+c-2\sqrt c\cos\theta}\,\ln|1-\sqrt c\,e^{i\theta}|^2\,d\theta.$$
It is straightforward to verify that
$$f(z) \equiv -(z-z^{-1})^2\big(\mathrm{Log}(1-\sqrt c\,z) + \sqrt c\,z\big) - \sqrt c\,(z-z^3)$$
is analytic on the unit disk, and that
$$\mathrm{Re}\,f(e^{i\theta}) = 2\sin^2\theta\,\ln|1-\sqrt c\,e^{i\theta}|^2.$$
Therefore from (5.9) we have
$$d(c) = \frac{f(\sqrt c)}{1-c} = \frac{c-1}{c}\ln(1-c) - 1.$$
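This closed form is easy to check numerically against the definition of $d(c)$ as the logarithmic integral of the Marčenko–Pastur law (a quick sanity sketch only; the density formula used below is the standard one for $c<1$ and is not quoted from this section):

```python
import numpy as np

c = 0.3
a, b = (1 - np.sqrt(c))**2, (1 + np.sqrt(c))**2

# Marchenko-Pastur density for c < 1: sqrt((b-x)(x-a)) / (2 pi c x) on [a, b]
grid = np.linspace(a, b, 400001)
xm = (grid[:-1] + grid[1:]) / 2          # midpoints avoid the edge zeros
dx = np.diff(grid)
dens = np.sqrt((b - xm) * (xm - a)) / (2 * np.pi * c * xm)

numeric = np.sum(np.log(xm) * dens * dx)      # d(c) by quadrature
closed = (c - 1) / c * np.log(1 - c) - 1      # the closed form above

print(numeric, closed)   # both approximately -0.16776
```

The midpoint rule is used because the density vanishes like a square root at both edges of the support.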
For (1.21) we use (1.18). From (1.2), with $H(t) = I_{[1,\infty)}(t)$, we have for $z\in\mathbb{C}^+$
$$(5.10)\qquad z = -\frac{1}{m(z)} + \frac{c}{1+m(z)}.$$
Solving for $m(z)$ we find
$$m(z) = \frac{-(z+1-c) + \sqrt{(z+1-c)^2 - 4z}}{2z} = \frac{-(z+1-c) + \sqrt{(z-1-c)^2 - 4c}}{2z},$$
the square roots defined to yield positive imaginary parts for $z\in\mathbb{C}^+$. As $z\to x\in[a(c),b(c)]$ (limits defined below (1.1)) we get
$$m(x) = \frac{-(x+1-c) + \sqrt{4c-(x-1-c)^2}\,i}{2x} = \frac{-(x+1-c) + \sqrt{(x-a(c))(b(c)-x)}\,i}{2x}.$$
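A small numerical sketch (an illustration, not part of the argument) confirming that this root of the quadratic satisfies (5.10) and maps $\mathbb{C}^+$ to $\mathbb{C}^+$:

```python
import numpy as np

c = 0.4

def m(z):
    # root of z*m^2 + (z+1-c)*m + 1 = 0 whose square root has Im > 0
    s = np.sqrt((z + 1 - c)**2 - 4*z + 0j)
    if s.imag < 0:
        s = -s
    return (-(z + 1 - c) + s) / (2 * z)

z = 1.3 + 0.7j
mz = m(z)
print(mz.imag > 0)                       # Stieltjes transforms map C+ to C+
print(abs(z - (-1/mz + c/(1 + mz))))     # (5.10) holds up to rounding
```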
The identity (5.10) still holds with $z$ replaced by $x$, and from it we get
$$\frac{m(x)}{1+m(x)} = \frac{1+xm(x)}{c},$$
so that
$$1 - c\,\frac{m^2(x)}{(1+m(x))^2} = 1 - \frac{1}{c}\left(\frac{-(x-1-c)+\sqrt{4c-(x-1-c)^2}\,i}{2}\right)^2$$
$$= \frac{\sqrt{4c-(x-1-c)^2}}{2c}\left(\sqrt{4c-(x-1-c)^2} + (x-1-c)\,i\right).$$
Therefore, from (1.18),
$$(5.11)\qquad EX_f = \frac{1}{2\pi}\int_{a(c)}^{b(c)} f'(x)\,\tan^{-1}\!\left(\frac{x-1-c}{\sqrt{4c-(x-1-c)^2}}\right)dx$$
$$= \frac{f(a(c))+f(b(c))}{4} - \frac{1}{2\pi}\int_{a(c)}^{b(c)}\frac{f(x)}{\sqrt{4c-(x-1-c)^2}}\,dx.$$
To compute the last integral when $f(x) = \ln x$ we make the same substitution as before, arriving at
$$\frac{1}{4\pi}\int_0^{2\pi}\ln|1-\sqrt c\,e^{i\theta}|^2\,d\theta.$$
We apply (5.9), where now $u(z) = \ln|1-\sqrt c\,z|^2$, which is harmonic, and $r = 0$. Therefore the integral must be zero, and we conclude
$$EX_{\ln} = \frac{\ln(a(c)b(c))}{4} = \frac{1}{2}\ln(1-c).$$
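The two facts used here — that the substituted integral vanishes and that (5.11) then reduces to $\frac12\ln(1-c)$ — can be confirmed numerically (a sketch under the same substitution $x = 1+c-2\sqrt c\cos\theta$):

```python
import numpy as np

c = 0.36
a, b = (1 - np.sqrt(c))**2, (1 + np.sqrt(c))**2

# after x = 1 + c - 2 sqrt(c) cos(theta), dx / sqrt(4c - (x-1-c)^2) = d(theta)
theta = np.linspace(0, np.pi, 100001)
vals = np.log(1 + c - 2 * np.sqrt(c) * np.cos(theta))
integral = np.sum((vals[:-1] + vals[1:]) / 2 * np.diff(theta))  # trapezoid

mean = (np.log(a) + np.log(b)) / 4 - integral / (2 * np.pi)

print(integral)                      # approximately 0
print(mean, 0.5 * np.log(1 - c))    # both approximately -0.2231
```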
To derive (1.22) we use (1.16). Since the $z_1$, $z_2$ contours cannot enclose the origin (because of the logarithm), neither can the resulting $m_1$, $m_2$ contours. Indeed, either from the graph of $x(m)$ or from $m(x)$ we see that $x > b(c) \iff m(x)\in(-(1+\sqrt c)^{-1}, 0)$ and $x\in(0, a(c)) \iff m(x) < (\sqrt c-1)^{-1}$. For our analysis it is sufficient to know that the $m_1$, $m_2$ contours, non-intersecting and both taken in the positive direction, enclose $(c-1)^{-1}$ and $-1$, but not $0$. Assume the $m_2$ contour encloses the $m_1$ contour. For fixed
$m_2$, using (5.10) we have, after an integration by parts,
$$\int\frac{\mathrm{Log}(z(m_1))}{(m_1-m_2)^2}\,dm_1 = \int\frac{\frac{1}{m_1^2}-\frac{c}{(1+m_1)^2}}{\left(-\frac{1}{m_1}+\frac{c}{1+m_1}\right)(m_1-m_2)}\,dm_1$$
$$= \int\left(\frac{1}{m_1-\frac{1}{c-1}} - \frac{1}{m_1} - \frac{1}{m_1+1}\right)\frac{dm_1}{m_1-m_2} = 2\pi i\left(\frac{1}{m_2+1} - \frac{1}{m_2-\frac{1}{c-1}}\right).$$
Therefore
$$\mathrm{Var}\,X_{\ln} = \frac{1}{\pi i}\int\left(\frac{1}{m+1} - \frac{1}{m-\frac{1}{c-1}}\right)\mathrm{Log}(z(m))\,dm$$
$$= \frac{1}{\pi i}\int\left(\frac{1}{m+1} - \frac{1}{m-\frac{1}{c-1}}\right)\mathrm{Log}\left(\frac{m-\frac{1}{c-1}}{m+1}\right)dm - \frac{1}{\pi i}\int\left(\frac{1}{m+1} - \frac{1}{m-\frac{1}{c-1}}\right)\mathrm{Log}(m)\,dm$$
(the constant term coming from $z(m) = (c-1)\big(m-\frac{1}{c-1}\big)/(m(m+1))$ contributes nothing, since the residues of the factor in front cancel). The first integral is zero since the integrand has antiderivative
$$-\frac12\left[\mathrm{Log}\left(\frac{m-\frac{1}{c-1}}{m+1}\right)\right]^2,$$
which is single-valued along the contour. Therefore we conclude that
$$\mathrm{Var}\,X_{\ln} = -2\big[\mathrm{Log}(-1) - \mathrm{Log}((c-1)^{-1})\big] = -2\ln(1-c).$$
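The contour representation above can be evaluated numerically (a sketch; the circle below is one admissible choice of contour, enclosing $-1$ and $1/(c-1)$ but not $0$; since the total winding of $z(m)$ along it is zero, an unwrapped logarithm gives a continuous branch):

```python
import numpy as np

c = 0.5                       # so 1/(c-1) = -2
N = 20000
t = np.linspace(0, 2*np.pi, N, endpoint=False)
m = -1.5 + 0.8*np.exp(1j*t)   # encloses -1 and -2, excludes 0
dm = 0.8j*np.exp(1j*t) * (2*np.pi/N)

z = -1/m + c/(1 + m)
logz = np.log(np.abs(z)) + 1j*np.unwrap(np.angle(z))  # continuous branch of Log z(m)
g = 1/(m + 1) - 1/(m - 1/(c - 1))

var = np.sum(g * logz * dm) / (np.pi * 1j)
print(var.real, -2*np.log(1 - c))   # both approximately 1.3863
```

The additive branch ambiguity in the logarithm is harmless here because the factor $g$ integrates to zero around the contour.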
Finally, we compute expressions for (1.23) and (1.24). Using (5.11) we have
$$EX_{x^r} = \frac{(a(c))^r + (b(c))^r}{4} - \frac{1}{2\pi}\int_{a(c)}^{b(c)}\frac{x^r}{\sqrt{4c-(x-1-c)^2}}\,dx$$
$$= \frac{(a(c))^r + (b(c))^r}{4} - \frac{1}{4\pi}\int_0^{2\pi}|1-\sqrt c\,e^{i\theta}|^{2r}\,d\theta$$
$$= \frac{(a(c))^r + (b(c))^r}{4} - \frac{1}{4\pi}\int_0^{2\pi}\sum_{j,k=0}^r\binom{r}{j}\binom{r}{k}(-\sqrt c)^{j+k}e^{i(j-k)\theta}\,d\theta$$
$$= \frac14\big((1-\sqrt c)^{2r} + (1+\sqrt c)^{2r}\big) - \frac12\sum_{j=0}^r\binom{r}{j}^2c^j,$$
which is (1.23).
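A quick numerical cross-check of (1.23) against the quadrature form of (5.11) (illustrative only):

```python
import numpy as np
from math import comb

c = 0.4
a, b = (1 - np.sqrt(c))**2, (1 + np.sqrt(c))**2

def mean_formula(r):
    # right-hand side of (1.23)
    return ((1 - np.sqrt(c))**(2*r) + (1 + np.sqrt(c))**(2*r)) / 4 \
           - 0.5 * sum(comb(r, j)**2 * c**j for j in range(r + 1))

def mean_quadrature(r):
    # (5.11) with f(x) = x^r, after x = 1 + c - 2 sqrt(c) cos(theta)
    theta = np.linspace(0, np.pi, 200001)
    vals = (1 + c - 2*np.sqrt(c)*np.cos(theta))**r
    integral = np.sum((vals[:-1] + vals[1:]) / 2 * np.diff(theta))
    return (a**r + b**r) / 4 - integral / (2*np.pi)

for r in (1, 2, 3, 4):
    print(r, mean_formula(r), mean_quadrature(r))
```

For r = 1 both expressions are zero, as the first two terms of the expansion cancel exactly.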
For (1.24) we use (1.16) and rely on observations made in deriving (1.22). For $c\in(0,1)$ the contours can again be made to enclose $-1$ and not the origin. However, because (1.7) derives from (1.14) and the support of $F^{c,I_{[1,\infty)}}$ on $\mathbb{R}^+$ is $[a(c),b(c)]$, we may take the contours in the same way when $c>1$. The case $c=1$ simply follows from the continuous dependence of (1.16) on $c$.
Keeping $m_2$ fixed, we have on a contour within 1 of $-1$
$$\int\frac{\left(-\frac{1}{m_1}+\frac{c}{1+m_1}\right)^{r_1}}{(m_1-m_2)^2}\,dm_1 = c^{r_1}\int\left(\frac{1}{m_1+1}+\frac{1-c}{c}\right)^{r_1}\big(1-(m_1+1)\big)^{-r_1}(m_2+1)^{-2}\left(1-\frac{m_1+1}{m_2+1}\right)^{-2}dm_1$$
$$= c^{r_1}\int\sum_{k_1=0}^{r_1}\binom{r_1}{k_1}\left(\frac{1-c}{c}\right)^{k_1}(1+m_1)^{k_1-r_1}\sum_{j=0}^{\infty}\binom{r_1+j-1}{j}(m_1+1)^j\,(m_2+1)^{-2}\sum_{\ell=1}^{\infty}\ell\left(\frac{m_1+1}{m_2+1}\right)^{\ell-1}dm_1$$
$$= 2\pi i\,c^{r_1}\sum_{k_1=0}^{r_1-1}\sum_{\ell=1}^{r_1-k_1}\binom{r_1}{k_1}\left(\frac{1-c}{c}\right)^{k_1}\binom{2r_1-1-(k_1+\ell)}{r_1-1}\,\ell\,(m_2+1)^{-\ell-1}.$$
Therefore,
$$\mathrm{Cov}(X_{x^{r_1}},X_{x^{r_2}}) = -\frac{i}{\pi}\,c^{r_1+r_2}\sum_{k_1=0}^{r_1-1}\sum_{\ell=1}^{r_1-k_1}\binom{r_1}{k_1}\left(\frac{1-c}{c}\right)^{k_1}\binom{2r_1-1-(k_1+\ell)}{r_1-1}\,\ell$$
$$\times\int(m_2+1)^{-\ell-1}\sum_{k_2=0}^{r_2}\binom{r_2}{k_2}\left(\frac{1-c}{c}\right)^{k_2}(m_2+1)^{k_2-r_2}\sum_{j=0}^{\infty}\binom{r_2+j-1}{j}(m_2+1)^j\,dm_2$$
$$= 2c^{r_1+r_2}\sum_{k_1=0}^{r_1-1}\sum_{k_2=0}^{r_2}\binom{r_1}{k_1}\binom{r_2}{k_2}\left(\frac{1-c}{c}\right)^{k_1+k_2}\sum_{\ell=1}^{r_1-k_1}\ell\binom{2r_1-1-(k_1+\ell)}{r_1-1}\binom{2r_2-1-k_2+\ell}{r_2-1},$$
which is (1.24), and we are done.
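The closed form (1.24) can be cross-checked by evaluating the double contour integral directly (a numerical sketch; the normalization $-1/(2\pi^2)$ and the particular circles are assumptions made for this illustration, chosen so that both contours enclose $-1$ and $1/(c-1)$ but not $0$, with the $m_2$ circle enclosing the $m_1$ circle):

```python
import numpy as np
from math import comb

c = 0.25                      # 1/(c-1) = -4/3

def z(m):
    return -1/m + c/(1 + m)

N = 800
t = np.linspace(0, 2*np.pi, N, endpoint=False)
m1 = -7/6 + 0.3*np.exp(1j*t)          # inner contour
m2 = -7/6 + 0.5*np.exp(1j*t)          # outer contour, encloses the inner one
dm1 = 0.3j*np.exp(1j*t) * (2*np.pi/N)
dm2 = 0.5j*np.exp(1j*t) * (2*np.pi/N)

def cov_contour(r1, r2):
    # -(1/(2 pi^2)) * double contour integral of z(m1)^r1 z(m2)^r2 / (m1-m2)^2
    diff = m1[None, :] - m2[:, None]
    vals = (z(m1)**r1)[None, :] * (z(m2)**r2)[:, None] / diff**2
    return (-1/(2*np.pi**2)) * np.sum(vals * dm1[None, :] * dm2[:, None])

def cov_formula(r1, r2):
    # the double sum in (1.24)
    s = 0.0
    for k1 in range(r1):
        for k2 in range(r2 + 1):
            for l in range(1, r1 - k1 + 1):
                s += (comb(r1, k1) * comb(r2, k2) * ((1 - c)/c)**(k1 + k2)
                      * l * comb(2*r1 - 1 - (k1 + l), r1 - 1)
                      * comb(2*r2 - 1 - k2 + l, r2 - 1))
    return 2 * c**(r1 + r2) * s

for r1, r2 in [(1, 1), (2, 1), (2, 2)]:
    print(r1, r2, cov_contour(r1, r2).real, cov_formula(r1, r2))
```

Because the integrand is a rational function of $m_1$ and $m_2$, no branch of a logarithm is needed, and the trapezoid rule on the two circles converges rapidly.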
APPENDIX
We verify (1.9b) by modifying the proof in Bai and Yin (1993) (hereafter referred to
as BY (1993)). To avoid confusion we maintain as much as possible the original notation
used in BY (1993).
Theorem. For $Z_{ij}\in\mathbb{C}$, $i=1,\dots,p$, $j=1,\dots,n$, i.i.d. with $EZ_{11}=0$, $E|Z_{11}|^2=1$, and $E|Z_{11}|^4<\infty$, let $S_n = (1/n)XX^*$ where $X = (X_{ij})$ is $p\times n$ with
$$X_{ij} = X_{ij}(n) = Z_{ij}I_{\{|Z_{ij}|\le\delta_n\sqrt n\}} - EZ_{ij}I_{\{|Z_{ij}|\le\delta_n\sqrt n\}},$$
where $\delta_n\to 0$ more slowly than the sequence constructed in the proof of Lemma 2.2 of Yin, Bai, and Krishnaiah (1988), and satisfying $\delta_n n^{1/3}\to\infty$. Assume $p/n\to y\in(0,1)$ as $n\to\infty$. Then for any $\eta < (1-\sqrt y)^2$ and any $\ell>0$
$$P(\lambda_{\min}(S_n) < \eta) = o(n^{-\ell}).$$
Proof. We follow along the proof of Theorem 1 in BY (1993). The conclusions of
Lemmas 1 and 3–8 need to be improved from “almost sure” statements to ones reflecting
tail probabilities. We shall denote the augmented lemmas with primes (′) after the number. We remark here that the proof in BY (1993) assumes the entries $Z_{ij}$ to be real, but
all the arguments can be easily modified to allow complex variables.
For Lemma 1 it has been shown that for the Hermitian matrices $T(l)$ defined in (2.2), and integers $m_n$ satisfying $m_n/\ln n\to\infty$, $m_n\delta_n^{1/3}/\ln n\to 0$, and $m_n/(\delta_n\sqrt n)\to 0$,
$$E\,\mathrm{tr}\,T^{2m_n}(l) \le n^2\big((2l+1)(l+1)\big)^{2m_n}(p/n)^{m_n(l-1)}(1+o(1))^{4m_nl}$$
((2.13) of BY (1993)). Therefore, writing $m_n = k_n\ln n$, for any $\epsilon>0$ there exists an $a\in(0,1)$ such that for all $n$ large
$$(A.1)\qquad P\big(\mathrm{tr}\,T(l) > (2l+1)(l+1)y^{(l-1)/2}+\epsilon\big) \le n^2a^{m_n} = n^{2+k_n\ln a} = o(n^{-\ell})$$
for any positive $\ell$. We call (A.1) Lemma 1′.
We next replace Lemma 2 of BY (1993) with

Lemma 2′. For every $n$ let $X_1, X_2, \dots, X_n$ be i.i.d. with $X_1 = X_1(n)\sim X_{11}(n)$. Then for any $\epsilon>0$ and $\ell>0$
$$P\left(\Big|n^{-1}\sum_{i=1}^n|X_i|^2 - 1\Big| > \epsilon\right) = o(n^{-\ell}),$$
and for any $f>1$
$$P\left(n^{-f}\sum_{i=1}^n|X_i|^{2f} > \epsilon\right) = o(n^{-\ell}).$$
i=1
Proof. Since as n → ∞ E|X1 |2 → 1,
n
X
n−f
E|Xi |2f ≤ 22f E|Z1 1 |2f n1−f → 0
for f ∈ (1, 2],
i=1
and
−f
n
n
X
E|Xi |2f ≤ 22f E|Z1 1 |4 n1−f +(2f −4)/2 = Kn−1 → 0 for f > 2,
i=1
it is sufficient to show for f ≥ 1
¯X
¯
µ
¶
¯ n
¯
−f ¯
2f
2f ¯
P n ¯ (|Xi | − E|Xi | )¯ > ² = o(n−` ).
(A.2)
i=1
For any positive integer $m$ we have this probability bounded by
$$\epsilon^{-2m}n^{-2mf}\,E\Big[\sum_{i=1}^n\big(|X_i|^{2f}-E|X_i|^{2f}\big)\Big]^{2m}$$
$$= \epsilon^{-2m}n^{-2mf}\sum_{\substack{i_1\ge0,\dots,i_n\ge0\\ i_1+\cdots+i_n=2m}}\binom{2m}{i_1\cdots i_n}\prod_{t=1}^n E\big(|X_t|^{2f}-E|X_t|^{2f}\big)^{i_t}$$
$$= \epsilon^{-2m}n^{-2mf}\sum_{k=1}^m\binom{n}{k}\sum_{\substack{i_1\ge2,\dots,i_k\ge2\\ i_1+\cdots+i_k=2m}}\binom{2m}{i_1\cdots i_k}\prod_{t=1}^k E\big(|X_1|^{2f}-E|X_1|^{2f}\big)^{i_t}$$
$$\le 2^{2m}\epsilon^{-2m}n^{-2mf}\sum_{k=1}^m n^k\sum_{\substack{i_1\ge2,\dots,i_k\ge2\\ i_1+\cdots+i_k=2m}}\binom{2m}{i_1\cdots i_k}\prod_{t=1}^k E|X_1|^{2fi_t}$$
$$\le 2^{2m}\epsilon^{-2m}n^{-2mf}\sum_{k=1}^m n^k\sum_{\substack{i_1\ge2,\dots,i_k\ge2\\ i_1+\cdots+i_k=2m}}\binom{2m}{i_1\cdots i_k}\prod_{t=1}^k(2\delta_n\sqrt n)^{2fi_t-4}E|Z_{11}|^4$$
$$\le 2^{2m}\epsilon^{-2m}n^{-2mf}\sum_{k=1}^m k^{2m}(2\delta_n\sqrt n)^{4fm-4k}\,n^k\,(E|Z_{11}|^4)^k$$
$$= 2^{2m}\epsilon^{-2m}\sum_{k=1}^m(2\delta_n)^{4fm}(E|Z_{11}|^4)^k(4\delta_n^2n)^{-k}k^{2m}$$
$$\le m\left(\frac{(2\delta_n)^{2f}4m}{\epsilon\ln(4\delta_n^2n/E|Z_{11}|^4)}\right)^{2m}\quad\text{(for all $n$ large)},$$
where we have used the inequality $a^{-x}x^b \le (b/\ln a)^b$, valid for all $a>1$, $b>0$, $x\ge1$.
Choose $m_n = k_n\ln n$ with $k_n\to\infty$ and $\delta_n^{2f}k_n\to 0$. Since $\delta_n n^{1/3}\ge 1$ for $n$ large, we get for these $n$ that $\ln(\delta_n^2n)\ge(1/3)\ln n$. Using this and the fact that $\lim_{x\to\infty}x^{1/x}=1$, we have the existence of an $a\in(0,1)$ for which
$$m\left(\frac{(2\delta_n)^{2f}4m}{\epsilon\ln(4\delta_n^2n/E|Z_{11}|^4)}\right)^{2m} \le a^{2k_n\ln n} = n^{2k_n\ln a}$$
for all $n$ large. Therefore (A.2) holds.
Redefining the matrix $X^{(f)}$ in BY (1993) to be $[|X_{uv}|^f]$, Lemma 3′ states that for any positive integer $f$
$$P\big(\lambda_{\max}\{n^{-f}X^{(f)}X^{(f)*}\} > 7+\epsilon\big) = o(n^{-\ell})$$
for any positive $\epsilon$ and $\ell$.
Its proof relies on Lemmas 1′, 2′ (for $f = 1, 2$) and on the bounds used in the proof of Lemma 3 in BY (1993). In particular we have the Geršgorin bound
$$(A.3)\qquad \lambda_{\max}\{n^{-f}X^{(f)}X^{(f)*}\} \le \max_i n^{-f}\sum_{j=1}^n|X_{ij}|^{2f} + \max_i n^{-f}\sum_{k\ne i}\sum_{j=1}^n|X_{ij}|^f|X_{kj}|^f$$
$$\le \max_i n^{-f}\sum_{j=1}^n|X_{ij}|^{2f} + \left(\max_i n^{-1}\sum_{j=1}^n|X_{ij}|^f\right)\left(\max_j n^{-1}\sum_{k=1}^p|X_{kj}|^f\right).$$
We show the steps involved for $f = 2$. With $\epsilon_1>0$ satisfying $(p/n+\epsilon_1)(1+\epsilon_1) < 7+\epsilon$ for all $n$, we have from Lemma 2′ and (A.3)
$$P\big(\lambda_{\max}\{n^{-2}X^{(2)}X^{(2)*}\} > 7+\epsilon\big) \le pP\left(n^{-2}\sum_{j=1}^n|X_{1j}|^4 > \epsilon_1\right) + pP\left(\Big|n^{-1}\sum_{j=1}^n|X_{1j}|^2-1\Big| > \epsilon_1\right)$$
$$+\ nP\left(\Big|p^{-1}\sum_{k=1}^p|X_{k1}|^2-1\Big| > \epsilon_1\right) = o(n^{-\ell}).$$
The same argument can be used to prove Lemma 4′, which states for integer $f>2$
$$P\big(\|n^{-f/2}X^{(f)}\| > \epsilon\big) = o(n^{-\ell})$$
for any positive $\epsilon$ and $\ell$.
The proofs of Lemmas 4′–8′ are handled using the arguments in BY (1993) and those used above: each quantity $L_n$ in BY (1993) that is $o(1)$ a.s. can be shown to satisfy $P(|L_n| > \epsilon) = o(n^{-\ell})$.
From Lemmas 1′ and 8′ there exists a positive $C$ such that for every integer $k>0$ and positive $\epsilon$ and $\ell$
$$(A.4)\qquad P\big(\|T-yI\|^k > Ck^42^ky^{k/2}+\epsilon\big) = o(n^{-\ell}).$$
For given $\epsilon>0$ let the integer $k>0$ be such that
$$\big|2\sqrt y\,(1-(Ck^4)^{1/k})\big| < \epsilon/2.$$
Then
$$2\sqrt y+\epsilon > 2\sqrt y\,(Ck^4)^{1/k}+\epsilon/2 \ge \big(Ck^42^ky^{k/2}+(\epsilon/2)^k\big)^{1/k}.$$
Therefore from (A.4) we get for any $\ell>0$
$$(A.5)\qquad P\big(\|T-yI\| > 2\sqrt y+\epsilon\big) = o(n^{-\ell}).$$
From Lemma 2′ and (A.5) we get for positive $\epsilon$ and $\ell$
$$P\big(\|S_n-(1+y)I\| > 2\sqrt y+\epsilon\big) \le P\big(\|S_n-I-T\| > \epsilon/2\big) + o(n^{-\ell})$$
$$= P\left(\max_{i\le p}\Big|n^{-1}\sum_{j=1}^n|X_{ij}|^2-1\Big| > \epsilon/2\right) + o(n^{-\ell}) = o(n^{-\ell}).$$
Finally, for any positive $\eta < (1-\sqrt y)^2$ and $\ell>0$
$$P(\lambda_{\min}(S_n) < \eta) = P\big(\lambda_{\min}(S_n-(1+y)I) < \eta-(1-\sqrt y)^2-2\sqrt y\big)$$
$$\le P\big(\|S_n-(1+y)I\| > 2\sqrt y+(1-\sqrt y)^2-\eta\big) = o(n^{-\ell}),$$
and we are done.
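The theorem's content — that the smallest eigenvalue stays above any $\eta < (1-\sqrt y)^2$ — is easy to see in simulation (an illustration only; Gaussian entries and the particular sizes are arbitrary choices for this sketch):

```python
import numpy as np

rng = np.random.default_rng(0)
y = 0.25
n = 2000
p = int(y * n)                       # p/n = y

X = rng.standard_normal((p, n))
S = X @ X.T / n                      # S_n = (1/n) X X*
lam_min = np.linalg.eigvalsh(S)[0]

print(lam_min, (1 - np.sqrt(y))**2)  # lam_min is close to (1 - sqrt(y))^2 = 0.25
```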
Acknowledgments. Part of this work was done while J. W. Silverstein visited the
Department of Statistics and Applied Probability at National University of Singapore. He
thanks the members of the department for their hospitality.
REFERENCES
Bai, Z. D. (1999). Methodologies in spectral analysis of large dimensional random matrices, A review. Statistica Sinica 9 611-677.
Bai, Z. D. and Saranadasa, H. (1996). Effect of high dimension: by an example of a two sample problem. Statistica Sinica 6 311-329.
Bai, Z. D. and Silverstein, J. W. (1998). No eigenvalues outside the support of the limiting spectral distribution of large dimensional sample covariance matrices. Ann. Probab. 26 316-345.
Bai, Z. D. and Silverstein, J. W. (1999). Exact separation of eigenvalues of large
dimensional sample covariance matrices. Ann. Probab. 27 1536-1555.
Bai, Z.D., Silverstein, J.W., and Yin, Y.Q. (1988). A note on the largest eigenvalue
of a large dimensional sample covariance matrix. J. Multivariate Anal. 26 166-168.
Bai, Z.D. and Yin, Y.Q. (1993). Limit of the smallest eigenvalue of a large dimensional sample covariance matrix. Ann. Probab. 21 1275-1294.
Billingsley, P. (1968). Convergence of Probability Measures. Wiley, New York.
Billingsley, P. (1995). Probability and Measure, Third Edition. Wiley, New York.
Burkholder, D.L. (1973). Distribution function inequalities for martingales. Ann. Probab. 1 19-42.
Dempster, A.P. (1958). A high dimensional two sample significance test. Ann. Math. Statist. 29 995-1010.
Jonsson, D. (1982). Some limit theorems for the eigenvalues of a sample covariance matrix. J. Multivariate Anal. 12 1-38.
Marčenko, V.A. and Pastur, L.A. (1967). Distribution of eigenvalues for some sets of random matrices. Math. USSR-Sb. 1 457-483.
Silverstein, J. W. and Choi S.I. (1995). Analysis of the limiting spectral distribution
of large dimensional random matrices. J. Multivariate Anal. 54 295-309.
Silverstein, J.W. and Combettes, P.L. (1992). Signal detection via spectral theory of
large dimensional random matrices. IEEE Trans. Signal Processing 40 2100-2105.
Titchmarsh, E.C. (1939). The Theory of Functions, Second Edition. Oxford University
Press, London.
Yin, Y. Q. (1986). Limiting spectral distribution for a class of random matrices. J. Multivariate Anal. 20 50-68.
Yin, Y.Q., Bai, Z.D., and Krishnaiah, P.R. (1988). On the limit of the largest eigenvalue of the large dimensional sample covariance matrix. Probab. Theory Related Fields 78 509-521.
Yin, Y. Q. and Krishnaiah, P. R. (1983). A limit theorem for the eigenvalues of product of two random matrices. J. Multivariate Anal. 13 489-507.
Department of Statistics
and Applied Probability
National University of Singapore
10 Kent Ridge Crescent
Singapore 119260
E-mail: [email protected]
Department of Mathematics
Box 8205
North Carolina State University
Raleigh, North Carolina 27695-8205
E-mail: [email protected]