Addendum to Integrated Conditional Moment
Tests for Parametric Conditional Distributions
Herman J. Bierens
June 7, 2015
Abstract
In this addendum to
Bierens, H. J., and L. Wang (2012): "Integrated Conditional Moment Tests for Parametric Conditional Distributions", Econometric Theory 28, 328-362 [BW hereafter],
I will provide the proof of Mercer’s theorem for the complex case (Lemma
3 in BW), including the Hilbert-Schmidt theorem regarding the existence
of eigenvalues and eigenfunctions. Moreover, Lemma 4 in BW appears to
be incorrect. Therefore, a revised version of Lemma 4 will be provided as
well. Furthermore, on the basis of this revised Lemma 4 I will derive upper
bounds of the asymptotic critical values of the SICM test.
1. Introduction
The main purpose of this addendum to Bierens and Wang (2012) [BW hereafter] is
to provide a complete proof of Mercer’s theorem (Lemma 3 in BW), including the
underlying Hilbert-Schmidt theorem regarding the existence of eigenvalues and
eigenfunctions of complex-valued continuous symmetric positive definite kernels.
For the proof of Lemma 3, BW referred to an unpublished paper, Hadinejad-Mahram et al. (2002), which has disappeared from the internet, and a published
paper by Krein (1998). The latter author derived the complex Hilbert-Schmidt
and Mercer theorems as by-products of linear operator theory, which I am not
familiar with, and likely the same applies to most of my fellow econometricians.
Therefore, in this addendum I will provide my own proofs of the complex Hilbert-Schmidt and Mercer theorems.
The same problem occurred with the (real) version of Mercer's theorem in
Bierens and Ploberger (1997), for which the proof was a reference to an exercise in a textbook. Therefore, I decided to derive this proof myself. See Bierens
(2014b). The proofs in the current addendum are complex adaptations of the
ones in Bierens (2014b).
Also, it appears that Lemma 4 is incorrect. Therefore, in this addendum I will
provide a revised version of Lemma 4.
On the basis of the results in BW and in this addendum I have been able
to update my first consistent model specification testing paper, Bierens (1982),
by deriving the asymptotic null distribution of the test, which is similar to BW,
and upper bounds of the critical values on the basis of the revised Lemma 4. See
Bierens (2015) and section 7 below.
In this addendum I will use the same notations as in BW, except that in
integrals with respect to a probability measure µ on set B I will use the notation
dµ(β) instead of µ(dβ) because µ(β) can be interpreted as a distribution function.
The proofs in this addendum employ Hilbert space theory at the level of
Bierens (2014a), linear algebra at the level of Bierens (2004, Appendix I), complex
calculus at the level of Bierens (2004, Appendix III), and measure and probability
theory at the level of Bierens (2004, Ch. 1-3).
2. Complex covariance functions
In Lemma 3 in BW the covariance function is of the form
$$\begin{aligned}
\Gamma(\beta_1,\beta_2) &= E\left[Z(\beta_1)\overline{Z(\beta_2)}\right]\\
&= E\left[\left(\mathrm{Re}[Z(\beta_1)] + i.\mathrm{Im}[Z(\beta_1)]\right)\left(\mathrm{Re}[Z(\beta_2)] - i.\mathrm{Im}[Z(\beta_2)]\right)\right]\\
&= E\left[\mathrm{Re}[Z(\beta_1)].\mathrm{Re}[Z(\beta_2)]\right] + E\left[\mathrm{Im}[Z(\beta_1)].\mathrm{Im}[Z(\beta_2)]\right]\\
&\quad + i.E\left[\mathrm{Im}[Z(\beta_1)].\mathrm{Re}[Z(\beta_2)]\right] - i.E\left[\mathrm{Re}[Z(\beta_1)].\mathrm{Im}[Z(\beta_2)]\right],
\end{aligned} \quad (2.1)$$
where $Z(\beta)$ is a complex-valued zero-mean continuous Gaussian process on a
compact subset $B$ of a Euclidean space. This covariance function is symmetric
positive semidefinite in the following sense.
First, symmetry in the complex case means that
$$\Gamma(\beta_1,\beta_2) = \overline{\Gamma(\beta_2,\beta_1)}, \quad (2.2)$$
which follows straightforwardly from (2.1). In particular, writing
$$\Gamma(\beta_1,\beta_2) = \mathrm{Re}[\Gamma(\beta_1,\beta_2)] + i.\mathrm{Im}[\Gamma(\beta_1,\beta_2)], \quad (2.3)$$
the symmetry condition (2.2) implies that
$$\mathrm{Re}[\Gamma(\beta_1,\beta_2)] = \mathrm{Re}[\Gamma(\beta_2,\beta_1)], \quad \mathrm{Im}[\Gamma(\beta_1,\beta_2)] = -\mathrm{Im}[\Gamma(\beta_2,\beta_1)], \quad (2.4)$$
and thus
$$\mathrm{Im}[\Gamma(\beta,\beta)] = 0. \quad (2.5)$$
Second, positive semidefiniteness with respect to a probability measure $\mu$ on
$B$ means that
$$\int\int \overline{\varphi(\beta_1)}\,\Gamma(\beta_1,\beta_2)\varphi(\beta_2)\,d\mu(\beta_1)d\mu(\beta_2) \geq 0 \quad (2.6)$$
for all $\varphi \in L^2_{\mathbb{C}}(\mu)$, where:

Definition 2.1. $L^2_{\mathbb{C}}(\mu)$ denotes the Hilbert space of all Borel measurable complex-valued functions $\varphi$ on $B$ satisfying $\int |\varphi(\beta)|^2 d\mu(\beta) < \infty$, endowed with the inner product $\langle \varphi_1,\varphi_2\rangle = \int \varphi_1(\beta)\overline{\varphi_2(\beta)}\,d\mu(\beta)$ and associated norm
$$||\varphi|| = \sqrt{\langle \varphi,\varphi\rangle} = \sqrt{\int \varphi(\beta)\overline{\varphi(\beta)}\,d\mu(\beta)} = \sqrt{\int |\varphi(\beta)|^2 d\mu(\beta)}$$
and metric $||\varphi_1 - \varphi_2||$.
In particular, in the case (2.1) it can be shown, after some tedious but straightforward complex calculations, that
$$\begin{aligned}
&\int\int \overline{\varphi(\beta_1)}\,\Gamma(\beta_1,\beta_2)\varphi(\beta_2)\,d\mu(\beta_1)d\mu(\beta_2)\\
&= E\left[\left(\int \mathrm{Re}[\varphi(\beta)]\,\mathrm{Re}[Z(\beta)]\,d\mu(\beta) + \int \mathrm{Im}[\varphi(\beta)]\,\mathrm{Im}[Z(\beta)]\,d\mu(\beta)\right)^2\right]\\
&\quad + E\left[\left(\int \mathrm{Im}[\varphi(\beta)]\,\mathrm{Re}[Z(\beta)]\,d\mu(\beta) - \int \mathrm{Re}[\varphi(\beta)]\,\mathrm{Im}[Z(\beta)]\,d\mu(\beta)\right)^2\right] \geq 0.
\end{aligned} \quad (2.7)$$
Moreover, the covariance function Γ(β1 , β2 ) in (2.1) is continuous because
Re[Z(β)] and Im[Z(β)] are a.s. continuous.
In the mathematical literature such a function Γ(β1 , β2 ) is called a kernel, and
I will do so too.
The interpretation of $\Gamma(\beta_1,\beta_2)$ as a covariance function of a continuous complex-valued Gaussian process is irrelevant for the Hilbert-Schmidt and Mercer theorems. All we need to require is that:
Assumption 2.1. The kernel Γ(β1 , β2 ) on B × B is complex-valued, continuous,
and symmetric positive semidefinite with respect to a probability measure µ on B,
where B is a compact subset of a Euclidean space,
and
Assumption 2.2. $\int\int |\Gamma(\beta_1,\beta_2)|^2\,d\mu(\beta_1)d\mu(\beta_2) > 0.$

The latter excludes the case that $\Gamma(\beta_1,\beta_2)$ is identically zero.
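As a concrete illustration of Assumptions 2.1 and 2.2, the following sketch (my own toy construction, not from BW; the kernel and all names are hypothetical) discretizes $B = [0,1]$ under a uniform probability measure $\mu$ and checks the symmetry condition (2.2) and the positive semidefiniteness condition (2.6):

```python
import numpy as np

# Toy discretization (an assumption for illustration): B = [0, 1] with
# mu approximated by uniform weights 1/N on a grid of N points.
rng = np.random.default_rng(0)
N = 200
beta = np.linspace(0.0, 1.0, N)
D = beta[:, None] - beta[None, :]

# A complex-valued, continuous, symmetric PSD kernel: the Schur product of
# the rank-one PSD kernel exp(i*b1)*conj(exp(i*b2)) and a Gaussian kernel.
Gamma = np.exp(1j * D) * np.exp(-D**2)

# Symmetry (2.2): Gamma(b1, b2) = conj(Gamma(b2, b1)), hence (2.5):
# Im[Gamma(b, b)] = 0 on the diagonal.
assert np.allclose(Gamma, Gamma.conj().T)
assert np.allclose(np.diag(Gamma).imag, 0.0)

# Positive semidefiniteness (2.6): the quadratic form is real and >= 0
# (up to roundoff) for arbitrary phi in the discretized L2_C(mu).
for _ in range(5):
    phi = rng.standard_normal(N) + 1j * rng.standard_normal(N)
    quad = (phi.conj() @ Gamma @ phi) / N**2   # double integral wrt mu x mu
    assert abs(quad.imag) < 1e-10
    assert quad.real > -1e-10
```

The product of two positive semidefinite kernels is again positive semidefinite, which is why this particular choice satisfies Assumption 2.1 by construction.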
3. The Hilbert-Schmidt theorem for complex kernels
3.1. The eigenvalue-eigenfunction problem
The general eigenvalue problem is to find a scalar $\lambda$ and a function $\psi \in L^2_{\mathbb{C}}(\mu)$,
normalized to unit norm, $||\psi|| = 1$, such that¹
$$\int \Gamma(\beta_1,\beta_2)\psi(\beta_2)\,d\mu(\beta_2) = \lambda\psi(\beta_1) \text{ for all } \beta_1 \in B. \quad (3.1)$$
Taking complex conjugates in (3.1), the latter is equivalent to
$$\int \overline{\psi(\beta_2)}\,\Gamma(\beta_2,\beta_1)\,d\mu(\beta_2) = \overline{\lambda}\,\overline{\psi(\beta_1)}. \quad (3.2)$$
If such a pair $(\lambda,\psi)$ exists then by (2.6), (3.1), (3.2) and the normalization $||\psi|| = 1$,
$$0 \leq \int\int \overline{\psi(\beta_1)}\,\Gamma(\beta_1,\beta_2)\psi(\beta_2)\,d\mu(\beta_1)d\mu(\beta_2) = \lambda\int \overline{\psi(\beta_1)}\psi(\beta_1)\,d\mu(\beta_1) = \lambda\int |\psi(\beta)|^2 d\mu(\beta) = \lambda.$$

¹ In Lemma 3 in BW, (3.1) was incorrectly stated as $\lambda\psi(\beta_1) = \int \overline{\Gamma(\beta_1,\beta_2)}\,\psi(\beta_2)\,d\mu(\beta_2)$.
Thus, eigenvalues of positive semidefinite kernels are real-valued and nonnegative. Moreover, if a solution of (3.1) exists with $\lambda > 0$ then the corresponding
eigenfunction $\psi$ is continuous on $B$, because $\Gamma(\beta_1,\beta_2)$ is continuous on $B \times B$.
Now denote
$$G(\lambda,\psi) = \int \left|\int \Gamma(\beta_1,\beta_2)\psi(\beta_2)\,d\mu(\beta_2) - \lambda\psi(\beta_1)\right|^2 d\mu(\beta_1)$$
for $\lambda \in (0,\infty)$, $\psi \in L^2_{\mathbb{C}}(\mu)$ with $||\psi|| = 1$. Then the eigenvalue problem (3.1) for $\lambda > 0$ is equivalent² to the problem:
$$\text{Find a } \lambda > 0 \text{ and a } \psi \in L^2_{\mathbb{C}}(\mu) \text{ with } ||\psi|| = 1 \text{ such that } G(\lambda,\psi) = 0. \quad (3.3)$$
Observe that
$$\begin{aligned}
G(\lambda,\psi) &= \int \left(\int \overline{\psi(\beta_2)}\,\Gamma(\beta_2,\beta)\,d\mu(\beta_2) - \lambda\overline{\psi(\beta)}\right)\left(\int \Gamma(\beta,\beta_2)\psi(\beta_2)\,d\mu(\beta_2) - \lambda\psi(\beta)\right)d\mu(\beta)\\
&= \int \left(\int \overline{\psi(\beta_1)}\,\Gamma(\beta_1,\beta)\,d\mu(\beta_1) - \lambda\overline{\psi(\beta)}\right)\left(\int \Gamma(\beta,\beta_2)\psi(\beta_2)\,d\mu(\beta_2) - \lambda\psi(\beta)\right)d\mu(\beta)\\
&= \int\int \overline{\psi(\beta_1)}\,\Gamma_2(\beta_1,\beta_2)\psi(\beta_2)\,d\mu(\beta_1)d\mu(\beta_2)\\
&\quad - 2\lambda\int\int \overline{\psi(\beta_1)}\,\Gamma(\beta_1,\beta_2)\psi(\beta_2)\,d\mu(\beta_1)d\mu(\beta_2) + \lambda^2,
\end{aligned}$$
where
$$\Gamma_2(\beta_1,\beta_2) = \int \Gamma(\beta_1,\beta)\Gamma(\beta,\beta_2)\,d\mu(\beta).$$
Minimizing $G(\lambda,\psi)$ with respect to $\lambda$ yields
$$\lambda = \int\int \overline{\psi(\beta_1)}\,\Gamma(\beta_1,\beta_2)\psi(\beta_2)\,d\mu(\beta_1)d\mu(\beta_2), \quad (3.4)$$

² See Remark 4.1 in Bierens (2014b).
and substituting this solution in $G(\lambda,\psi)$ yields
$$\begin{aligned}
G(\psi) = \min_{\lambda>0} G(\lambda,\psi) &= \int\int \overline{\psi(\beta_1)}\,\Gamma_2(\beta_1,\beta_2)\psi(\beta_2)\,d\mu(\beta_1)d\mu(\beta_2)\\
&\quad - \left(\int\int \overline{\psi(\beta_1)}\,\Gamma(\beta_1,\beta_2)\psi(\beta_2)\,d\mu(\beta_1)d\mu(\beta_2)\right)^2.
\end{aligned} \quad (3.5)$$
Thus, the eigenfunction problem (3.3) is equivalent to the following problem:
$$\text{Find a } \psi \in L^2_{\mathbb{C}}(\mu) \text{ with } ||\psi|| = 1 \text{ such that } G(\psi) = 0. \quad (3.6)$$
Then the corresponding eigenvalue $\lambda$ is given by (3.4).
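The role of (3.4) as the minimizer of $G(\lambda,\psi)$ in $\lambda$ can be checked numerically. The sketch below (my own discretized construction, with hypothetical names; not from the paper) approximates $B = [0,1]$ by a grid with uniform weights and verifies that, for a fixed unit-norm $\psi$, $G(\lambda,\psi)$ is a quadratic in $\lambda$ minimized exactly at the quadratic form (3.4):

```python
import numpy as np

# Discretized analogue (assumed setting): uniform weights 1/N on [0, 1],
# so integrals become means and the operator is Gamma / N.
rng = np.random.default_rng(3)
N = 150
beta = np.linspace(0.0, 1.0, N)
D = beta[:, None] - beta[None, :]
Gamma = np.exp(1j * D) * np.exp(-D**2)        # complex Hermitian PSD kernel

psi = rng.standard_normal(N) + 1j * rng.standard_normal(N)
psi /= np.sqrt((np.abs(psi)**2).mean())       # ||psi|| = 1 in L2_C(mu)

def G(lam):
    # G(lambda, psi) = int | int Gamma(b1,b2) psi(b2) dmu(b2) - lam*psi(b1) |^2 dmu(b1)
    r = Gamma @ psi / N - lam * psi
    return (np.abs(r)**2).mean()

# (3.4): the minimizing lambda is the (real) quadratic form of psi
lam_star = (psi.conj() @ Gamma @ psi).real / N**2
grid = np.linspace(0.0, 2.0, 201)
vals = [G(l) for l in grid]
assert G(lam_star) <= min(vals) + 1e-12
assert lam_star > 0
```

Since $\Gamma$ is positive semidefinite, the minimizing $\lambda$ is nonnegative, consistent with the restriction $\lambda \in (0,\infty)$ in (3.3).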
3.2. The maximum eigenvalue problem
The solution (3.4) suggests that, possibly, the eigenfunction $\psi_1$ corresponding to
the largest eigenvalue $\lambda_1$, and $\lambda_1$ itself, can be determined by
$$\psi_1 = \arg\max_{\psi\in L^2_{\mathbb{C}}(\mu),\,||\psi||=1} \int\int \overline{\psi(\beta_1)}\,\Gamma(\beta_1,\beta_2)\psi(\beta_2)\,d\mu(\beta_1)d\mu(\beta_2), \quad (3.7)$$
$$\lambda_1 = \int\int \overline{\psi_1(\beta_1)}\,\Gamma(\beta_1,\beta_2)\psi_1(\beta_2)\,d\mu(\beta_1)d\mu(\beta_2). \quad (3.8)$$
If so, we must have that $G(\psi_1) = 0$, or equivalently,
$$\int\int \overline{\psi_1(\beta_1)}\,\Gamma_2(\beta_1,\beta_2)\psi_1(\beta_2)\,d\mu(\beta_1)d\mu(\beta_2) = \lambda_1^2. \quad (3.9)$$
Indeed, this conjecture is correct.
Theorem 3.1.
(a) Under Assumption 2.1 the pair $(\lambda_1,\psi_1)$ determined by (3.7) and (3.8) consists of the
maximum eigenvalue and a corresponding eigenfunction of the kernel $\Gamma(\beta_1,\beta_2)$ in
Assumption 2.1.
(b) Under the additional Assumption 2.2, $\lambda_1 > 0$.
(c) Otherwise, $\lambda_1 = 0$ implies that $\Gamma(\beta_1,\beta_2) \equiv 0$ on $B \times B$.
Proof. This theorem will be proved by converting the maximum eigenvalue
problem involved into real terms, as in the addendum to Bierens and Ploberger
(1997) [see Bierens (2014b)] for real-valued kernels, using the properties of the
real Hilbert space $L^2(\mu)$.
Definition 3.1. $L^2(\mu)$ is the Hilbert space of Borel measurable real functions $f$
on $B$ satisfying $\int f(\beta)^2 d\mu(\beta) < \infty$, endowed with inner product
$$\langle f,g\rangle = \int f(\beta)g(\beta)\,d\mu(\beta)$$
and associated norm $||f|| = \sqrt{\langle f,f\rangle}$ and metric $||f - g||$.
Lemma 3.1. The Hilbert space $L^2(\mu)$ has an orthonormal basis, say $\{\varphi_j(\beta)\}_{j=1}^{\infty}$,
so that every $f \in L^2(\mu)$ has the series representation
$$f(\beta) = \sum_{j=1}^{\infty} c_j\varphi_j(\beta), \quad (3.10)$$
where $c_j = \langle f,\varphi_j\rangle$, satisfying $\sum_{j=1}^{\infty} c_j^2 = \int f(\beta)^2 d\mu(\beta) < \infty$.

Note that the series representation (3.10) holds with probability 1, in the sense
that
$$\mu\left(\left\{\beta \in B : \lim_{n\to\infty} \sum_{j=1}^{n} c_j\varphi_j(\beta) = f(\beta)\right\}\right) = 1,$$
rather than exactly for all $\beta \in B$. C.f. Bierens (2014a,b).
Since the solution $\psi_1$ of (3.7) is an element of $L^2_{\mathbb{C}}(\mu)$, and therefore $\mathrm{Re}[\psi_1]$ and
$\mathrm{Im}[\psi_1]$ are elements of $L^2(\mu)$, $\psi_1$ has the series representation
$$\psi_1(\beta) = \sum_{j=1}^{\infty} (c_j + i.d_j)\varphi_j(\beta), \text{ where} \quad (3.11)$$
$$c_j = \langle \mathrm{Re}[\psi_1],\varphi_j\rangle, \quad d_j = \langle \mathrm{Im}[\psi_1],\varphi_j\rangle, \quad \sum_{j=1}^{\infty}\left(c_j^2 + d_j^2\right) = \int \psi_1(\beta)\overline{\psi_1(\beta)}\,d\mu(\beta) = 1.$$
In particular, denoting for $n \in \mathbb{N}$,
$$\psi_{1,n}(\beta) = \sum_{j=1}^{n} (c_{n,j} + i.d_{n,j})\varphi_j(\beta), \text{ where } c_{n,j} = \frac{c_j}{\sqrt{\sum_{i=1}^{n}(c_i^2 + d_i^2)}}, \quad d_{n,j} = \frac{d_j}{\sqrt{\sum_{i=1}^{n}(c_i^2 + d_i^2)}}, \quad (3.12)$$
it follows that
$$\lim_{n\to\infty} \int |\psi_1(\beta) - \psi_{1,n}(\beta)|^2 d\mu(\beta) = 0, \quad (3.13)$$
as is not hard to verify. See Bierens (2014b) for a similar result. This implies the
following result.

Lemma 3.2. Let $\psi_1$ in (3.11) be a solution of (3.7) and let $\psi_{1,n}$ be defined by
(3.12). Then
$$\lim_{n\to\infty} \int\int \overline{\psi_{1,n}(\beta_1)}\,\Gamma(\beta_1,\beta_2)\psi_{1,n}(\beta_2)\,d\mu(\beta_1)d\mu(\beta_2) = \int\int \overline{\psi_1(\beta_1)}\,\Gamma(\beta_1,\beta_2)\psi_1(\beta_2)\,d\mu(\beta_1)d\mu(\beta_2).$$
Proof. To prove Lemma 3.2, observe first that
$$\begin{aligned}
&\left|\int\int \overline{\psi_1(\beta_1)}\,\Gamma(\beta_1,\beta_2)\psi_1(\beta_2)\,d\mu(\beta_1)d\mu(\beta_2) - \int\int \overline{\psi_{1,n}(\beta_1)}\,\Gamma(\beta_1,\beta_2)\psi_{1,n}(\beta_2)\,d\mu(\beta_1)d\mu(\beta_2)\right|\\
&\leq \left|\int\int \left(\overline{\psi_1(\beta_1)} - \overline{\psi_{1,n}(\beta_1)}\right)\Gamma(\beta_1,\beta_2)\psi_1(\beta_2)\,d\mu(\beta_1)d\mu(\beta_2)\right.\\
&\qquad \left. + \int\int \overline{\psi_{1,n}(\beta_1)}\,\Gamma(\beta_1,\beta_2)\psi_1(\beta_2)\,d\mu(\beta_1)d\mu(\beta_2) - \int\int \overline{\psi_{1,n}(\beta_1)}\,\Gamma(\beta_1,\beta_2)\psi_{1,n}(\beta_2)\,d\mu(\beta_1)d\mu(\beta_2)\right|\\
&\leq \left|\int \left(\overline{\psi_1(\beta_1)} - \overline{\psi_{1,n}(\beta_1)}\right)\left(\int \Gamma(\beta_1,\beta_2)\psi_1(\beta_2)\,d\mu(\beta_2)\right)d\mu(\beta_1)\right|\\
&\qquad + \left|\int \left(\int \overline{\psi_{1,n}(\beta_1)}\,\Gamma(\beta_1,\beta_2)\,d\mu(\beta_1)\right)\left(\psi_1(\beta_2) - \psi_{1,n}(\beta_2)\right)d\mu(\beta_2)\right|\\
&\leq \sqrt{\int \left|\psi_1(\beta_1) - \psi_{1,n}(\beta_1)\right|^2 d\mu(\beta_1)} \times \sqrt{\int \left|\int \Gamma(\beta_1,\beta_2)\psi_1(\beta_2)\,d\mu(\beta_2)\right|^2 d\mu(\beta_1)}\\
&\qquad + \sqrt{\int \left|\int \overline{\psi_{1,n}(\beta_1)}\,\Gamma(\beta_1,\beta_2)\,d\mu(\beta_1)\right|^2 d\mu(\beta_2)} \times \sqrt{\int |\psi_1(\beta_2) - \psi_{1,n}(\beta_2)|^2 d\mu(\beta_2)}\\
&\leq 2\sup_{(\beta_1,\beta_2)\in B\times B} |\Gamma(\beta_1,\beta_2)| \times \sqrt{\int |\psi_1(\beta) - \psi_{1,n}(\beta)|^2 d\mu(\beta)},
\end{aligned}$$
where the third inequality follows from the Cauchy-Schwarz inequality for inner
products, $|\langle x,y\rangle| \leq ||x||.||y||$, which also holds in the complex case $\langle x,y\rangle = \overline{\langle y,x\rangle}$.
To prove the last inequality, observe that by the same Cauchy-Schwarz inequality,
$$\left|\int \Gamma(\beta_1,\beta_2)\psi_1(\beta_2)\,d\mu(\beta_2)\right| \leq \sqrt{\int |\Gamma(\beta_1,\beta_2)|^2 d\mu(\beta_2)}\sqrt{\int |\psi_1(\beta_2)|^2 d\mu(\beta_2)} = \sqrt{\int |\Gamma(\beta_1,\beta_2)|^2 d\mu(\beta_2)} \leq \sup_{(\beta_1,\beta_2)\in B\times B} |\Gamma(\beta_1,\beta_2)| < \infty,$$
where the last inequality follows from the uniform continuity of $\Gamma(\beta_1,\beta_2)$ on the compact set $B \times B$.
Hence,
$$\sqrt{\int \left|\int \Gamma(\beta_1,\beta_2)\psi_1(\beta_2)\,d\mu(\beta_2)\right|^2 d\mu(\beta_1)} \leq \sup_{(\beta_1,\beta_2)\in B\times B} |\Gamma(\beta_1,\beta_2)| < \infty,$$
and similarly
$$\sqrt{\int \left|\int \overline{\psi_{1,n}(\beta_1)}\,\Gamma(\beta_1,\beta_2)\,d\mu(\beta_1)\right|^2 d\mu(\beta_2)} \leq \sup_{(\beta_1,\beta_2)\in B\times B} |\Gamma(\beta_1,\beta_2)| < \infty.$$
Lemma 3.2 follows now from (3.13).
The following lemma is also a well-known result, related to Lemma 3.1.

Lemma 3.3. Given the orthonormal basis $\{\varphi_j(\beta)\}_{j=1}^{\infty}$ of $L^2(\mu)$ in Lemma 3.1,
every Borel measurable real function $g(\beta_1,\beta_2)$ on $B \times B$ satisfying
$$\int\int g(\beta_1,\beta_2)^2\,d\mu(\beta_1)d\mu(\beta_2) < \infty$$
has the series representation
$$g(\beta_1,\beta_2) = \sum_{i=1}^{\infty}\sum_{j=1}^{\infty} c_{i,j}\varphi_i(\beta_1)\varphi_j(\beta_2), \quad (3.14)$$
where
$$c_{i,j} = \int\int \varphi_i(\beta_1)g(\beta_1,\beta_2)\varphi_j(\beta_2)\,d\mu(\beta_1)d\mu(\beta_2), \text{ with } \sum_{i=1}^{\infty}\sum_{j=1}^{\infty} c_{i,j}^2 = \int\int g(\beta_1,\beta_2)^2\,d\mu(\beta_1)d\mu(\beta_2) < \infty.$$

Similar to Lemma 3.1, the series representation (3.14) holds with probability 1, in the sense that
$$\mu\times\mu\left(\left\{(\beta_1,\beta_2) \in B\times B : \lim_{\min(n_1,n_2)\to\infty} \sum_{i=1}^{n_1}\sum_{j=1}^{n_2} c_{i,j}\varphi_i(\beta_1)\varphi_j(\beta_2) = g(\beta_1,\beta_2)\right\}\right) = 1,$$
rather than exactly for all $(\beta_1,\beta_2) \in B \times B$, where $\mu\times\mu$ is the product measure
defined as follows.

Definition 3.2. Let $\widetilde{\beta}_1$ and $\widetilde{\beta}_2$ be independent random drawings from the distribution $\mu$. Then the product measure $\mu\times\mu$ is the probability measure on $B\times B$ induced by $(\widetilde{\beta}_1,\widetilde{\beta}_2)$.
Lemma 3.3 implies that
$$\mathrm{Re}[\Gamma(\beta_1,\beta_2)] = \sum_{i=1}^{\infty}\sum_{j=1}^{\infty} \alpha_{i,j}\varphi_i(\beta_1)\varphi_j(\beta_2), \text{ where} \quad (3.15)$$
$$\alpha_{i,j} = \int\int \varphi_i(\beta_1)\,\mathrm{Re}[\Gamma(\beta_1,\beta_2)]\,\varphi_j(\beta_2)\,d\mu(\beta_1)d\mu(\beta_2), \quad \sum_{i=1}^{\infty}\sum_{j=1}^{\infty} \alpha_{i,j}^2 = \int\int \left(\mathrm{Re}[\Gamma(\beta_1,\beta_2)]\right)^2 d\mu(\beta_1)d\mu(\beta_2) < \infty, \quad (3.16)$$
and
$$\mathrm{Im}[\Gamma(\beta_1,\beta_2)] = \sum_{i=1}^{\infty}\sum_{j=1}^{\infty} \gamma_{i,j}\varphi_i(\beta_1)\varphi_j(\beta_2), \text{ where} \quad (3.17)$$
$$\gamma_{i,j} = \int\int \varphi_i(\beta_1)\,\mathrm{Im}[\Gamma(\beta_1,\beta_2)]\,\varphi_j(\beta_2)\,d\mu(\beta_1)d\mu(\beta_2), \quad \sum_{i=1}^{\infty}\sum_{j=1}^{\infty} \gamma_{i,j}^2 = \int\int \left(\mathrm{Im}[\Gamma(\beta_1,\beta_2)]\right)^2 d\mu(\beta_1)d\mu(\beta_2) < \infty. \quad (3.18)$$
Note that by (2.4) and (2.5),
$$\alpha_{i,j} = \alpha_{j,i}, \quad \gamma_{i,j} = -\gamma_{j,i}, \quad \gamma_{i,i} = 0. \quad (3.19)$$
Hence,
$$\Gamma(\beta_1,\beta_2) = \sum_{i=1}^{\infty}\sum_{j=1}^{\infty} (\alpha_{i,j} + i.\gamma_{i,j})\varphi_i(\beta_1)\varphi_j(\beta_2), \quad (3.20)$$
and consequently
$$\begin{aligned}
\Gamma_2(\beta_1,\beta_2) &= \int \Gamma(\beta_1,\beta)\Gamma(\beta,\beta_2)\,d\mu(\beta)\\
&= \int \left(\sum_{i_1=1}^{\infty}\sum_{j_1=1}^{\infty} (\alpha_{i_1,j_1} + i.\gamma_{i_1,j_1})\varphi_{i_1}(\beta_1)\varphi_{j_1}(\beta)\right)\left(\sum_{j_2=1}^{\infty}\sum_{i_2=1}^{\infty} (\alpha_{j_2,i_2} + i.\gamma_{j_2,i_2})\varphi_{j_2}(\beta)\varphi_{i_2}(\beta_2)\right)d\mu(\beta)\\
&= \sum_{i_1=1}^{\infty}\sum_{j_1=1}^{\infty}\sum_{j_2=1}^{\infty}\sum_{i_2=1}^{\infty} (\alpha_{i_1,j_1} + i.\gamma_{i_1,j_1})(\alpha_{j_2,i_2} + i.\gamma_{j_2,i_2})\varphi_{i_1}(\beta_1)\varphi_{i_2}(\beta_2)\int \varphi_{j_1}(\beta)\varphi_{j_2}(\beta)\,d\mu(\beta)\\
&= \sum_{i_1=1}^{\infty}\sum_{i_2=1}^{\infty}\left(\sum_{j=1}^{\infty} (\alpha_{i_1,j} + i.\gamma_{i_1,j})(\alpha_{j,i_2} + i.\gamma_{j,i_2})\right)\varphi_{i_1}(\beta_1)\varphi_{i_2}(\beta_2)\\
&= \sum_{i_1=1}^{\infty}\sum_{i_2=1}^{\infty}\sum_{j=1}^{\infty} (\alpha_{i_1,j}\alpha_{j,i_2} - \gamma_{i_1,j}\gamma_{j,i_2})\varphi_{i_1}(\beta_1)\varphi_{i_2}(\beta_2)\\
&\quad + i.\sum_{i_1=1}^{\infty}\sum_{i_2=1}^{\infty}\sum_{j=1}^{\infty} (\alpha_{i_1,j}\gamma_{j,i_2} + \gamma_{i_1,j}\alpha_{j,i_2})\varphi_{i_1}(\beta_1)\varphi_{i_2}(\beta_2).
\end{aligned} \quad (3.21)$$
Combining (3.11) and (3.20) it follows that
$$\begin{aligned}
&\int\int \overline{\psi_1(\beta_1)}\,\Gamma(\beta_1,\beta_2)\psi_1(\beta_2)\,d\mu(\beta_1)d\mu(\beta_2)\\
&= \int\int \left(\sum_{m=1}^{\infty} (c_m - i.d_m)\varphi_m(\beta_1)\right)\left(\sum_{i=1}^{\infty}\sum_{j=1}^{\infty} (\alpha_{i,j} + i.\gamma_{i,j})\varphi_i(\beta_1)\varphi_j(\beta_2)\right)\left(\sum_{k=1}^{\infty} (c_k + i.d_k)\varphi_k(\beta_2)\right)d\mu(\beta_1)d\mu(\beta_2)\\
&= \sum_{m=1}^{\infty}\sum_{k=1}^{\infty} (c_m - i.d_m)(c_k + i.d_k)\sum_{i=1}^{\infty}\sum_{j=1}^{\infty} (\alpha_{i,j} + i.\gamma_{i,j})\int \varphi_i(\beta_1)\varphi_m(\beta_1)\,d\mu(\beta_1)\int \varphi_j(\beta_2)\varphi_k(\beta_2)\,d\mu(\beta_2)\\
&= \sum_{m=1}^{\infty}\sum_{k=1}^{\infty} (c_m - i.d_m)(c_k + i.d_k)(\alpha_{m,k} + i.\gamma_{m,k})\\
&= \sum_{m=1}^{\infty}\sum_{k=1}^{\infty} \left((c_mc_k + d_md_k) + i.(c_md_k - d_mc_k)\right)(\alpha_{m,k} + i.\gamma_{m,k})\\
&= \sum_{m=1}^{\infty}\sum_{k=1}^{\infty} c_m\alpha_{m,k}c_k + \sum_{m=1}^{\infty}\sum_{k=1}^{\infty} d_m\alpha_{m,k}d_k - \sum_{m=1}^{\infty}\sum_{k=1}^{\infty} (c_m\gamma_{m,k}d_k - d_m\gamma_{m,k}c_k)\\
&\quad + i.\sum_{m=1}^{\infty}\sum_{k=1}^{\infty} (c_m\alpha_{m,k}d_k - d_m\alpha_{m,k}c_k) + i.\sum_{m=1}^{\infty}\sum_{k=1}^{\infty} (c_m\gamma_{m,k}c_k + d_m\gamma_{m,k}d_k)\\
&= \sum_{m=1}^{\infty}\sum_{k=1}^{\infty} c_m\alpha_{m,k}c_k + \sum_{m=1}^{\infty}\sum_{k=1}^{\infty} d_m\alpha_{m,k}d_k - 2\sum_{m=1}^{\infty}\sum_{k=1}^{\infty} c_m\gamma_{m,k}d_k\\
&= \sum_{m=1}^{\infty}\sum_{k=1}^{\infty} (c_m,d_m)\begin{pmatrix} \alpha_{m,k} & -\gamma_{m,k}\\ \gamma_{m,k} & \alpha_{m,k}\end{pmatrix}\begin{pmatrix} c_k\\ d_k\end{pmatrix},
\end{aligned}$$
where the last two equalities follow from the fact that, by (3.19),
$$\sum_{m=1}^{\infty}\sum_{k=1}^{\infty} c_m\alpha_{m,k}d_k = \sum_{m=1}^{\infty}\sum_{k=1}^{\infty} d_m\alpha_{m,k}c_k, \qquad \sum_{m=1}^{\infty}\sum_{k=1}^{\infty} d_m\gamma_{m,k}c_k = -\sum_{m=1}^{\infty}\sum_{k=1}^{\infty} c_m\gamma_{m,k}d_k,$$
$$\sum_{m=1}^{\infty}\sum_{k=1}^{\infty} c_m\gamma_{m,k}c_k = 0, \qquad \sum_{m=1}^{\infty}\sum_{k=1}^{\infty} d_m\gamma_{m,k}d_k = 0,$$
and from the easy equality
$$\sum_{m=1}^{\infty}\sum_{k=1}^{\infty} c_m\alpha_{m,k}c_k + \sum_{m=1}^{\infty}\sum_{k=1}^{\infty} d_m\alpha_{m,k}d_k - 2\sum_{m=1}^{\infty}\sum_{k=1}^{\infty} c_m\gamma_{m,k}d_k = \sum_{m=1}^{\infty}\sum_{k=1}^{\infty} (c_m,d_m)\begin{pmatrix} \alpha_{m,k} & -\gamma_{m,k}\\ \gamma_{m,k} & \alpha_{m,k}\end{pmatrix}\begin{pmatrix} c_k\\ d_k\end{pmatrix}.$$
Thus, denoting
$$A_{m,k} = \begin{pmatrix} \alpha_{m,k} & -\gamma_{m,k}\\ \gamma_{m,k} & \alpha_{m,k}\end{pmatrix}, \qquad b_m = (c_m,d_m)', \quad (3.22)$$
we have
$$\int\int \overline{\psi_1(\beta_1)}\,\Gamma(\beta_1,\beta_2)\psi_1(\beta_2)\,d\mu(\beta_1)d\mu(\beta_2) = \sum_{m=1}^{\infty}\sum_{k=1}^{\infty} b_m'A_{m,k}b_k.$$
Similarly,
$$\int\int \overline{\psi_{1,n}(\beta_1)}\,\Gamma(\beta_1,\beta_2)\psi_{1,n}(\beta_2)\,d\mu(\beta_1)d\mu(\beta_2) = \sum_{m=1}^{n}\sum_{k=1}^{n} b_{n,m}'A_{m,k}b_{n,k}, \quad (3.23)$$
where $b_{n,m} = (c_{n,m},d_{n,m})'$. C.f. (3.12).
Now stack the $b_{n,m}$'s in a $2n \times 1$ vector $x_n$, and recall from (3.12) that
$x_n'x_n = 1$. Moreover, denote
$$A_n = \begin{pmatrix}
A_{1,1} & A_{1,2} & \cdots & A_{1,n-1} & A_{1,n}\\
A_{2,1} & A_{2,2} & \cdots & A_{2,n-1} & A_{2,n}\\
\vdots & \vdots & \ddots & \vdots & \vdots\\
A_{n-1,1} & A_{n-1,2} & \cdots & A_{n-1,n-1} & A_{n-1,n}\\
A_{n,1} & A_{n,2} & \cdots & A_{n,n-1} & A_{n,n}
\end{pmatrix}, \quad (3.24)$$
which is a symmetric $2n \times 2n$ matrix.³ Then (3.23) reads
$$\int\int \overline{\psi_{1,n}(\beta_1)}\,\Gamma(\beta_1,\beta_2)\psi_{1,n}(\beta_2)\,d\mu(\beta_1)d\mu(\beta_2) = x_n'A_nx_n, \quad (3.25)$$
whereas obviously,
$$x_n'A_nx_n \leq \int\int \overline{\psi_1(\beta_1)}\,\Gamma(\beta_1,\beta_2)\psi_1(\beta_2)\,d\mu(\beta_1)d\mu(\beta_2). \quad (3.26)$$
Similarly, replacing $\psi_{1,n}(\beta)$ by $\psi_n(\beta) = \sum_{j=1}^{n} (y_{1,j} + i.y_{2,j})\varphi_j(\beta)$, where the $y_{i,j}$'s
are arbitrary subject to the restriction $\sum_{j=1}^{n} y_{1,j}^2 + \sum_{j=1}^{n} y_{2,j}^2 = 1$, and denoting
$y_n = (y_{1,1},y_{2,1},y_{1,2},y_{2,2},...,y_{1,n},y_{2,n})'$, we have
$$\sup_{y_n\in\mathbb{R}^{2n}:\,y_n'y_n=1} y_n'A_ny_n \leq \int\int \overline{\psi_1(\beta_1)}\,\Gamma(\beta_1,\beta_2)\psi_1(\beta_2)\,d\mu(\beta_1)d\mu(\beta_2). \quad (3.27)$$
Recall from linear algebra that the maximum eigenvalue $\lambda_n$ of $A_n$ is equal to
$$\lambda_n = \sup_{y\in\mathbb{R}^{2n}:\,y'y=1} y'A_ny, \quad (3.28)$$
so that by (3.25), (3.26) and (3.27),
$$\int\int \overline{\psi_{1,n}(\beta_1)}\,\Gamma(\beta_1,\beta_2)\psi_{1,n}(\beta_2)\,d\mu(\beta_1)d\mu(\beta_2) \leq \lambda_n \leq \int\int \overline{\psi_1(\beta_1)}\,\Gamma(\beta_1,\beta_2)\psi_1(\beta_2)\,d\mu(\beta_1)d\mu(\beta_2). \quad (3.29)$$
Consequently, it follows from Lemma 3.2 that:

Lemma 3.4. $\int\int \overline{\psi_1(\beta_1)}\,\Gamma(\beta_1,\beta_2)\psi_1(\beta_2)\,d\mu(\beta_1)d\mu(\beta_2) = \lim_{n\to\infty}\lambda_n$, where $\lambda_n$ is
the maximum eigenvalue of the matrix $A_n$ in (3.24).
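Lemma 3.4 rests on the real $2n \times 2n$ block embedding (3.22)-(3.24) of the complex coefficient array $\alpha_{m,k} + i.\gamma_{m,k}$. The following sketch (my own finite-dimensional analogue, not from the paper) checks that the maximum eigenvalue of the block matrix $A_n$ coincides with the maximum eigenvalue of the corresponding Hermitian matrix $\alpha + i.\gamma$:

```python
import numpy as np

# Hypothetical finite-dimensional example: alpha symmetric, gamma
# antisymmetric (as in (3.19)), so H = alpha + i*gamma is Hermitian.
rng = np.random.default_rng(1)
n = 6
alpha = rng.standard_normal((n, n)); alpha = (alpha + alpha.T) / 2
gamma = rng.standard_normal((n, n)); gamma = (gamma - gamma.T) / 2
H = alpha + 1j * gamma

# Build A_n from the 2x2 blocks A_{m,k} of (3.22), arranged as in (3.24).
A = np.zeros((2 * n, 2 * n))
for m in range(n):
    for k in range(n):
        A[2*m:2*m+2, 2*k:2*k+2] = [[alpha[m, k], -gamma[m, k]],
                                   [gamma[m, k],  alpha[m, k]]]

lam_H = np.linalg.eigvalsh(H).max()   # eigenvalues of Hermitian H are real
lam_A = np.linalg.eigvalsh(A).max()   # A is symmetric by (3.19)
assert np.isclose(lam_H, lam_A)
```

In this embedding every eigenvalue of the Hermitian matrix occurs twice among the eigenvalues of $A_n$, so in particular the maximum is preserved.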
Next, let us focus on the case Γ2 (β1 , β2 ). Obviously, Lemma 3.2 carries over if
we replace Γ by Γ2 .
³ The symmetry follows from
$$A_{k,m} = \begin{pmatrix} \alpha_{k,m} & -\gamma_{k,m}\\ \gamma_{k,m} & \alpha_{k,m}\end{pmatrix} = \begin{pmatrix} \alpha_{m,k} & \gamma_{m,k}\\ -\gamma_{m,k} & \alpha_{m,k}\end{pmatrix} = A_{m,k}'.$$
Lemma 3.5. Let $\psi_1$ in (3.11) be a solution of (3.7) and let $\psi_{1,n}$ be defined by
(3.12). Then
$$\lim_{n\to\infty} \int\int \overline{\psi_{1,n}(\beta_1)}\,\Gamma_2(\beta_1,\beta_2)\psi_{1,n}(\beta_2)\,d\mu(\beta_1)d\mu(\beta_2) = \int\int \overline{\psi_1(\beta_1)}\,\Gamma_2(\beta_1,\beta_2)\psi_1(\beta_2)\,d\mu(\beta_1)d\mu(\beta_2).$$
Moreover, after some tedious complex calculus exercises it can be shown from
(3.21) and (3.11) that
$$\begin{aligned}
&\int\int \overline{\psi_1(\beta_1)}\,\Gamma_2(\beta_1,\beta_2)\psi_1(\beta_2)\,d\mu(\beta_1)d\mu(\beta_2)\\
&= \sum_{k=1}^{\infty}\sum_{m=1}^{\infty}\sum_{j=1}^{\infty} (c_k,d_k)\begin{pmatrix} \alpha_{k,j}\alpha_{j,m} - \gamma_{k,j}\gamma_{j,m} & -\alpha_{k,j}\gamma_{j,m} - \gamma_{k,j}\alpha_{j,m}\\ \alpha_{k,j}\gamma_{j,m} + \gamma_{k,j}\alpha_{j,m} & \alpha_{k,j}\alpha_{j,m} - \gamma_{k,j}\gamma_{j,m}\end{pmatrix}\begin{pmatrix} c_m\\ d_m\end{pmatrix}\\
&= \sum_{k=1}^{\infty}\sum_{m=1}^{\infty} b_k'\left(\sum_{j=1}^{\infty} A_{k,j}A_{j,m}\right)b_m,
\end{aligned} \quad (3.30)$$
where $A_{k,j}$, $A_{j,m}$ and $b_m$ are defined in (3.22). Furthermore, similar to (3.23) we
have
$$\int\int \overline{\psi_{1,n}(\beta_1)}\,\Gamma_2(\beta_1,\beta_2)\psi_{1,n}(\beta_2)\,d\mu(\beta_1)d\mu(\beta_2) = \sum_{k=1}^{n}\sum_{m=1}^{n} b_{n,k}'\left(\sum_{j=1}^{\infty} A_{k,j}A_{j,m}\right)b_{n,m}, \quad (3.31)$$
where $b_{n,m} = (c_{n,m},d_{n,m})'$. C.f. (3.12).
Stacking the $b_{n,m}$'s in a $2n \times 1$ vector $x_n$ as before, and denoting
$$C_n = \begin{pmatrix} C_{1,1}(n) & \cdots & C_{1,n}(n)\\ \vdots & \ddots & \vdots\\ C_{n,1}(n) & \cdots & C_{n,n}(n)\end{pmatrix}, \text{ where } C_{k,m}(n) = \sum_{j=n+1}^{\infty} A_{k,j}A_{j,m},$$
we can write the right-hand side of (3.31) as
$$\sum_{k=1}^{n}\sum_{m=1}^{n} b_{n,k}'\left(\sum_{j=1}^{n} A_{k,j}A_{j,m}\right)b_{n,m} + \sum_{k=1}^{n}\sum_{m=1}^{n} b_{n,k}'\left(\sum_{j=n+1}^{\infty} A_{k,j}A_{j,m}\right)b_{n,m} = x_n'A_n^2x_n + x_n'C_nx_n.$$
Because $x_n'x_n = 1$, the term $x_n'A_n^2x_n$ is dominated by the maximum eigenvalue
of $A_n^2$, which is the square of the maximum eigenvalue $\lambda_n$ of $A_n$, and $x_n'C_nx_n$ is
dominated by the trace of $C_n$, where
$$\begin{aligned}
\mathrm{trace}(C_n) &= \sum_{m=1}^{n} \mathrm{trace}(C_{m,m}(n)) = 2\sum_{m=1}^{n}\sum_{j=n+1}^{\infty} (\alpha_{m,j}\alpha_{j,m} - \gamma_{m,j}\gamma_{j,m})\\
&= 2\sum_{m=1}^{n}\sum_{j=n+1}^{\infty} \left(\alpha_{m,j}^2 + \gamma_{m,j}^2\right) \leq 2\left(\sum_{j=n+1}^{\infty}\sum_{m=1}^{\infty} \alpha_{m,j}^2 + \sum_{j=n+1}^{\infty}\sum_{m=1}^{\infty} \gamma_{m,j}^2\right).
\end{aligned}$$
Due to (3.16) and (3.18) the latter converges to zero as $n \to \infty$.
Thus, it has been shown that
$$\int\int \overline{\psi_{1,n}(\beta_1)}\,\Gamma_2(\beta_1,\beta_2)\psi_{1,n}(\beta_2)\,d\mu(\beta_1)d\mu(\beta_2) \leq \lambda_n^2 + o(1).$$
It follows now from Lemmas 3.4 and 3.5 that
$$\int\int \overline{\psi_1(\beta_1)}\,\Gamma_2(\beta_1,\beta_2)\psi_1(\beta_2)\,d\mu(\beta_1)d\mu(\beta_2) \leq \left(\int\int \overline{\psi_1(\beta_1)}\,\Gamma(\beta_1,\beta_2)\psi_1(\beta_2)\,d\mu(\beta_1)d\mu(\beta_2)\right)^2. \quad (3.32)$$
However, note that the function $G(\psi)$ in (3.5) is always nonnegative, which implies
that
$$\int\int \overline{\psi_1(\beta_1)}\,\Gamma_2(\beta_1,\beta_2)\psi_1(\beta_2)\,d\mu(\beta_1)d\mu(\beta_2) \geq \left(\int\int \overline{\psi_1(\beta_1)}\,\Gamma(\beta_1,\beta_2)\psi_1(\beta_2)\,d\mu(\beta_1)d\mu(\beta_2)\right)^2. \quad (3.33)$$
Part (a) of Theorem 3.1 now follows from (3.32) and (3.33).
Finally, observe from (3.28) and (3.29) that the sequence $\lambda_n$ is monotonic nondecreasing and bounded, hence $\lim_{n\to\infty}\lambda_n = \sup_{n\geq 1}\lambda_n$. If the latter supremum is
zero, then for all $n \geq 1$, $A_n = O_{2n,2n}$, hence the $\alpha_{i,j}$'s and $\gamma_{i,j}$'s are all zero and
thus, by (3.20), $\Gamma(\beta_1,\beta_2) \equiv 0$ on $B \times B$. This proves parts (b) and (c) of Theorem
3.1.
3.3. The other eigenvalues and eigenfunctions
Given that $\lambda_1 > 0$, let
$$\Gamma^{(2)}(\beta_1,\beta_2) = \Gamma(\beta_1,\beta_2) - \lambda_1\psi_1(\beta_1)\overline{\psi_1(\beta_2)},$$
which is symmetric and continuous on $B \times B$. To prove that $\Gamma^{(2)}(\beta_1,\beta_2)$ is positive
semidefinite, let $\phi \in L^2_{\mathbb{C}}(\mu)$ be arbitrary. We can write $\phi = \langle\phi,\psi_1\rangle\psi_1 + r$, where
$\langle r,\psi_1\rangle = 0$, hence
$$\int\int \overline{\phi(\beta_1)}\,\Gamma^{(2)}(\beta_1,\beta_2)\phi(\beta_2)\,d\mu(\beta_1)d\mu(\beta_2) = \int\int \overline{r(\beta_1)}\,\Gamma(\beta_1,\beta_2)r(\beta_2)\,d\mu(\beta_1)d\mu(\beta_2) \geq 0,$$
as is not hard to verify. Then the maximum eigenvalue $\lambda_2$ of $\Gamma^{(2)}(\beta_1,\beta_2)$ with
corresponding eigenfunction $\psi_2$ can be derived in the same way as before; these
are an eigenvalue and corresponding eigenfunction of $\Gamma(\beta_1,\beta_2)$ as well, with
$$\int \overline{\psi_1(\beta)}\,\psi_2(\beta)\,d\mu(\beta) = 0 \text{ if } \lambda_2 > 0.$$
To see this, note that $\int \overline{\psi_1(\beta_1)}\,\Gamma^{(2)}(\beta_1,\beta_2)\,d\mu(\beta_1) = 0$, hence
$$\lambda_2\int \overline{\psi_1(\beta_1)}\,\psi_2(\beta_1)\,d\mu(\beta_1) = \int\int \overline{\psi_1(\beta_1)}\,\Gamma^{(2)}(\beta_1,\beta_2)\psi_2(\beta_2)\,d\mu(\beta_1)d\mu(\beta_2) = 0.$$
Now
$$\begin{aligned}
\lambda_2\psi_2(\beta_1) &= \int \Gamma^{(2)}(\beta_1,\beta_2)\psi_2(\beta_2)\,d\mu(\beta_2)\\
&= \int \Gamma(\beta_1,\beta_2)\psi_2(\beta_2)\,d\mu(\beta_2) - \lambda_1\psi_1(\beta_1)\int \overline{\psi_1(\beta_2)}\,\psi_2(\beta_2)\,d\mu(\beta_2)\\
&= \int \Gamma(\beta_1,\beta_2)\psi_2(\beta_2)\,d\mu(\beta_2).
\end{aligned}$$
Repeating this construction $n$ times yields eigenvalues $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n$ of
$\Gamma(\beta_1,\beta_2)$ with corresponding orthonormal eigenfunctions $\psi_m$, $m = 1,2,...,n$. At
this point the next eigenvalue $\lambda_{n+1} \leq \lambda_n$ and eigenfunction $\psi_{n+1}$ are the maximum
eigenvalue and corresponding eigenfunction of the kernel
$$\Gamma^{(n)}(\beta_1,\beta_2) = \Gamma(\beta_1,\beta_2) - \sum_{m=1}^{n} \lambda_m\psi_m(\beta_1)\overline{\psi_m(\beta_2)}. \quad (3.34)$$
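The deflation step (3.34) can be illustrated numerically. In the discretized sketch below (my own construction, with hypothetical names), subtracting $\lambda_1\psi_1(\beta_1)\overline{\psi_1(\beta_2)}$ from the kernel leaves a positive semidefinite kernel whose maximum eigenvalue is the second-largest eigenvalue of the original kernel:

```python
import numpy as np

# Discretized toy setting: B = [0, 1] with uniform weights 1/N.
N = 200
beta = np.linspace(0.0, 1.0, N)
D = beta[:, None] - beta[None, :]
Gamma = np.exp(1j * D) * np.exp(-D**2)        # complex Hermitian PSD kernel

evals, evecs = np.linalg.eigh(Gamma / N)      # ascending eigenvalues
lam1, lam2 = evals[-1], evals[-2]
psi1 = np.sqrt(N) * evecs[:, -1]              # unit norm in L2_C(mu)

# Gamma^(2)(b1, b2) = Gamma(b1, b2) - lam1 * psi1(b1) * conj(psi1(b2))
Gamma2 = Gamma - lam1 * np.outer(psi1, psi1.conj())
evals2 = np.linalg.eigvalsh(Gamma2 / N)
assert evals2.min() > -1e-10                  # still PSD, up to roundoff
assert np.isclose(evals2.max(), lam2)         # next eigenvalue is lambda_2
```

Repeating the subtraction peels off the eigenvalues one by one, which is exactly how the sequence $\lambda_1 \geq \lambda_2 \geq \cdots$ is constructed in the text.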
Suppose that for some $n$, $\lambda_n > 0$ but $\lambda_{n+1} = 0$. Then the maximum eigenvalue
of $\Gamma^{(n)}(\beta_1,\beta_2)$ is zero, hence by part (c) of Theorem 3.1, $\Gamma^{(n)}(\beta_1,\beta_2) \equiv 0$ on $B \times B$
and thus
$$\Gamma(\beta_1,\beta_2) \equiv \sum_{m=1}^{n} \lambda_m\psi_m(\beta_1)\overline{\psi_m(\beta_2)}.$$
Then any function $\phi$ in the orthogonal complement of $\mathrm{span}(\{\psi_m\}_{m=1}^{n})$, i.e., the
space
$$U_n = \left\{\phi \in L^2_{\mathbb{C}}(\mu) : \langle\phi,\psi_m\rangle = 0 \text{ for } m = 1,2,...,n\right\},$$
is an eigenfunction of $\Gamma(\beta_1,\beta_2)$ with zero eigenvalue. Since $U_n$ is itself a Hilbert space
contained in $L^2_{\mathbb{C}}(\mu)$, it is possible to select an orthonormal basis $\{\psi_m\}_{m=n+1}^{\infty}$
for $U_n$, and the extended orthonormal sequence $\{\psi_m\}_{m=1}^{\infty}$ is then an orthonormal
basis of $L^2_{\mathbb{C}}(\mu)$, i.e.,
$$L^2_{\mathbb{C}}(\mu) = \mathrm{span}(\{\psi_m\}_{m=1}^{\infty}). \quad (3.35)$$
If there does not exist an $n$ such that $\lambda_n = 0$ then we can repeat this construction indefinitely, yielding a non-increasing sequence $\{\lambda_n\}_{n=1}^{\infty}$ of positive eigenvalues
of $\Gamma(\beta_1,\beta_2)$ with a corresponding orthonormal sequence $\{\psi_m\}_{m=1}^{\infty}$ of eigenfunctions.
However, that does not mean that $\Gamma(\beta_1,\beta_2)$ has only positive eigenvalues. If the
Hilbert space $U_\infty = \{\phi \in L^2_{\mathbb{C}}(\mu) : \langle\phi,\psi_m\rangle = 0 \text{ for all } m \in \mathbb{N}\}$ is non-trivial, in the
sense that it is larger than the singleton $\{0\}$, then by Mercer's theorem below, all
the functions in $U_\infty$ are eigenfunctions of $\Gamma(\beta_1,\beta_2)$ with zero eigenvalues.
3.4. The Hilbert-Schmidt theorem
Summarizing, the following main result has been proved.

Theorem 3.2. (Hilbert-Schmidt theorem) Under Assumption 2.1 the eigenvalue
problem "find a scalar $\lambda$ and a function $\psi \in L^2_{\mathbb{C}}(\mu)$, normalized to unit norm,
$||\psi|| = 1$, such that $\int \Gamma(\beta_1,\beta_2)\psi(\beta_2)\,d\mu(\beta_2) = \lambda\psi(\beta_1)$ for all $\beta_1 \in B$" has countably many solutions $\{\lambda_m,\psi_m\}_{m=1}^{\infty}$,⁴ i.e.,
$$\int \Gamma(\beta_1,\beta_2)\psi_m(\beta_2)\,d\mu(\beta_2) \equiv \lambda_m\psi_m(\beta_1), \quad (3.36)$$
where the eigenvalues $\lambda_m$ are real-valued and nonnegative and the eigenfunctions
$\psi_m$ are orthonormal. Moreover, the eigenfunctions corresponding to the positive
eigenvalues are continuous on $B$. If all the eigenvalues are zero then $\Gamma(\beta_1,\beta_2) \equiv 0$
on $B \times B$.
This theorem is named after David Hilbert and his Ph.D. student Erhard
Schmidt, who published a series of papers in the period 1904-1908 regarding the
existence of eigenvalues and corresponding eigenfunctions for real-valued kernels
on a rectangle $[a,b] \times [a,b]$. Their results are nowadays referred to as the Hilbert-Schmidt theorem. See for example Bernkopf (1966) and Siegmund-Schultze (1986)
and the references therein.

The complex case in Theorem 3.2 is not a new result, of course. See for
example Krein (1998). However, the proof of Theorem 3.2 is different from the proofs
I have seen in the literature, which are based on operator theory.
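Although Theorem 3.2 is an infinite-dimensional result, its content is easy to visualize with a Nyström-type discretization (a sketch of my own, with hypothetical names; not from the paper): on a grid approximating $B = [0,1]$ with uniform weights $1/N$, the integral equation (3.36) becomes a Hermitian matrix eigenvalue problem, whose eigenvalues are real and nonnegative and whose eigenvectors are orthonormal with respect to the discretized measure:

```python
import numpy as np

# Nystrom-style discretization of the eigenvalue problem (3.36):
#   int Gamma(b1, b2) psi(b2) dmu(b2) = lambda * psi(b1)
# becomes (Gamma / N) psi = lambda * psi with uniform weights 1/N.
N = 300
beta = np.linspace(0.0, 1.0, N)
D = beta[:, None] - beta[None, :]
Gamma = np.exp(1j * D) * np.exp(-D**2)        # complex Hermitian PSD kernel

evals, evecs = np.linalg.eigh(Gamma / N)      # real eigenvalues, ascending
assert np.all(evals > -1e-10)                 # nonnegative, up to roundoff

# Scaled eigenvectors approximate eigenfunctions with ||psi_m|| = 1 and
# <psi_m, psi_k> = 1(m = k) in the discretized L2_C(mu).
psi = np.sqrt(N) * evecs
gram = (psi.conj().T @ psi) / N
assert np.allclose(gram, np.eye(N))
```

Because the discretized operator is Hermitian, `numpy.linalg.eigh` is the appropriate routine here; it guarantees real eigenvalues and orthonormal eigenvectors.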
4. Mercer’s theorem for complex kernels
The original Mercer's theorem is due to Mercer (1909), who proved it for real-valued kernels on the rectangle $[a,b] \times [a,b]$.
The current version of Mercer's theorem can be stated somewhat more compactly than
in Lemma 3 in BW because some parts are already covered by Theorem 3.2.

Theorem 4.1. (Mercer's theorem) Under Assumptions 2.1 and 2.2 the complex
kernel $\Gamma(\beta_1,\beta_2)$ involved has the series representation
$$\Gamma(\beta_1,\beta_2) = \sum_{m=1}^{\infty} \lambda_m\psi_m(\beta_1)\overline{\psi_m(\beta_2)}, \quad (4.1)$$
where the $\lambda_m$'s are the eigenvalues of $\Gamma(\beta_1,\beta_2)$ and the $\psi_m$'s are the corresponding
orthonormal eigenfunctions. Then in addition to the results in Theorem 3.2 the
following hold.

⁴ In Lemma 3 in BW, (3.36) was incorrectly stated as $\lambda_m\psi_m(\beta_1) = \int \overline{\Gamma(\beta_1,\beta_2)}\,\psi_m(\beta_2)\,d\mu(\beta_2)$.
(a) The eigenvalues satisfy $\sum_{m=1}^{\infty} \lambda_m < \infty$.
(b) The convergence of the right-hand side of (4.1) is uniform on $B \times B$, i.e.,
$$\lim_{n\to\infty}\sup_{(\beta_1,\beta_2)\in B\times B}\left|\Gamma(\beta_1,\beta_2) - \sum_{m=1}^{n} \lambda_m\psi_m(\beta_1)\overline{\psi_m(\beta_2)}\right| = 0. \quad (4.2)$$
(c) The orthonormal sequence $\{\psi_m\}_{m=1}^{\infty}$ of eigenfunctions, including the eigenfunctions with zero eigenvalues, is complete in $L^2_{\mathbb{C}}(\mu)$, i.e., $L^2_{\mathbb{C}}(\mu) = \mathrm{span}(\{\psi_m\}_{m=1}^{\infty})$.
Proof. Let $\{\psi_m\}_{m=1}^{\infty}$ be the sequence of all eigenfunctions, thus including those
with zero eigenvalues. Denote
$$S_2 = \mathrm{span}\left(\{\psi_k(\beta_1)\overline{\psi_m(\beta_2)}\}_{k,m=1}^{\infty}\right),$$
which is a subspace of the Hilbert space $L^2_{\mathbb{C}}(\mu\times\mu)$ of all square-integrable complex-valued Borel measurable functions on $B \times B$, endowed with the usual inner product
and associated norm and metric, and note that $\Gamma \in L^2_{\mathbb{C}}(\mu\times\mu)$. By the projection
theorem, the projection of $\Gamma$ on $S_2$ takes the form
$$\bar{\Gamma}(\beta_1,\beta_2) = \sum_{m=1}^{\infty}\sum_{k=1}^{\infty} c_{k,m}\psi_m(\beta_1)\overline{\psi_k(\beta_2)}, \text{ where}$$
$$c_{k,m} = \int\int \overline{\psi_m(\beta_1)}\,\Gamma(\beta_1,\beta_2)\psi_k(\beta_2)\,d\mu(\beta_1)d\mu(\beta_2) = \lambda_k\int \overline{\psi_m(\beta_1)}\,\psi_k(\beta_1)\,d\mu(\beta_1) = \lambda_k 1(k=m),$$
where, as in BW, $1(\cdot)$ denotes the indicator function. Hence, the projection of $\Gamma$
on $S_2$ is
$$\bar{\Gamma}(\beta_1,\beta_2) = \sum_{m=1}^{\infty} \lambda_m\psi_m(\beta_1)\overline{\psi_m(\beta_2)},$$
with projection residual
$$R(\beta_1,\beta_2) = \Gamma(\beta_1,\beta_2) - \bar{\Gamma}(\beta_1,\beta_2) \in S_2^{\perp},$$
where $S_2^{\perp}$ is the orthogonal complement of $S_2$.
If $R(\beta_1,\beta_2)$ is continuous and symmetric positive semidefinite then, by Theorem
3.2, $R$ has an eigenfunction $\varphi \in S_2^{\perp}$. But then $\varphi$ is also an eigenfunction of $\Gamma$,
and therefore already contained in $S_2$. As in the proof of Mercer's theorem in the
real-valued case in Bierens (2014b), we then must have that $R(\beta_1,\beta_2) = 0$ on
$B \times B$.
It is obvious that $R(\beta_1,\beta_2)$ is symmetric. To prove that $R(\beta_1,\beta_2)$ is positive
semidefinite, let $f \in L^2_{\mathbb{C}}(\mu)$ be arbitrary. Project $f$ on $S_1 = \mathrm{span}(\{\psi_m\}_{m=1}^{\infty})$,
and let $f_1 \in S_1$ be the projection and $f_2 \in S_1^{\perp}$ be the projection residual. Then
$\int R(\beta_1,\beta_2)f_1(\beta_2)\,d\mu(\beta_2) = 0$ and $\int \bar{\Gamma}(\beta_1,\beta_2)f_2(\beta_2)\,d\mu(\beta_2) = 0$, hence
$$\begin{aligned}
\int\int \overline{f(\beta_1)}\,R(\beta_1,\beta_2)f(\beta_2)\,d\mu(\beta_1)d\mu(\beta_2) &= \int\int \overline{f_2(\beta_1)}\,R(\beta_1,\beta_2)f_2(\beta_2)\,d\mu(\beta_1)d\mu(\beta_2)\\
&= \int\int \overline{f_2(\beta_1)}\,\Gamma(\beta_1,\beta_2)f_2(\beta_2)\,d\mu(\beta_1)d\mu(\beta_2) \geq 0,
\end{aligned} \quad (4.3)$$
as is not hard to verify. Thus, $R(\beta_1,\beta_2)$ is positive semidefinite.
To prove that $R(\beta_1,\beta_2)$ is continuous it suffices to prove that $\bar{\Gamma}(\beta_1,\beta_2)$ is
continuous, as follows. Note that (4.3) implies that $R(\beta,\beta) \geq 0$ for all $\beta \in B$,
which in its turn implies that for all $\beta \in B$,
$$\sum_{m=1}^{\infty} \lambda_m|\psi_m(\beta)|^2 = \sum_{m=1}^{\infty} \lambda_m\psi_m(\beta)\overline{\psi_m(\beta)} = \bar{\Gamma}(\beta,\beta) \leq \Gamma(\beta,\beta) \leq \sup_{\beta\in B}\Gamma(\beta,\beta) < \infty. \quad (4.4)$$
Integrating $\beta$ out yields $\sum_{m=1}^{\infty} \lambda_m < \infty$, which is just part (a) of Theorem 4.1.
Next, observe that
$$\begin{aligned}
|\psi_m(\beta_1) - \psi_m(\beta_2)|^2 &= (\psi_m(\beta_1) - \psi_m(\beta_2))\left(\overline{\psi_m(\beta_1)} - \overline{\psi_m(\beta_2)}\right)\\
&= \psi_m(\beta_1)\overline{\psi_m(\beta_1)} - \psi_m(\beta_2)\overline{\psi_m(\beta_1)} - \psi_m(\beta_1)\overline{\psi_m(\beta_2)} + \psi_m(\beta_2)\overline{\psi_m(\beta_2)}\\
&= |\psi_m(\beta_1)|^2 + |\psi_m(\beta_2)|^2 - \psi_m(\beta_1)\overline{\psi_m(\beta_2)} - \psi_m(\beta_2)\overline{\psi_m(\beta_1)},
\end{aligned} \quad (4.5)$$
hence
$$\psi_m(\beta_1)\overline{\psi_m(\beta_2)} + \psi_m(\beta_2)\overline{\psi_m(\beta_1)} \leq |\psi_m(\beta_1)|^2 + |\psi_m(\beta_2)|^2,$$
where the left-hand side is real-valued. Similarly, replacing $-$ by $+$ in (4.5) yields
$$\psi_m(\beta_1)\overline{\psi_m(\beta_2)} + \psi_m(\beta_2)\overline{\psi_m(\beta_1)} \geq -|\psi_m(\beta_1)|^2 - |\psi_m(\beta_2)|^2.$$
Thus, since $|\psi_m(\beta_1)\overline{\psi_m(\beta_2)}| = |\psi_m(\beta_1)|.|\psi_m(\beta_2)|$,
$$|\psi_m(\beta_1)\overline{\psi_m(\beta_2)}| \leq \frac{1}{2}|\psi_m(\beta_1)|^2 + \frac{1}{2}|\psi_m(\beta_2)|^2. \quad (4.6)$$
It follows now from (4.4) and (4.6) that for all $(\beta_1,\beta_2) \in B \times B$,
$$\sum_{m=1}^{\infty} \lambda_m|\psi_m(\beta_1)\overline{\psi_m(\beta_2)}| \leq \frac{1}{2}\sum_{m=1}^{\infty} \lambda_m|\psi_m(\beta_1)|^2 + \frac{1}{2}\sum_{m=1}^{\infty} \lambda_m|\psi_m(\beta_2)|^2 \leq \sup_{\beta\in B}\Gamma(\beta,\beta) < \infty. \quad (4.7)$$
By the same argument as in the proof of Mercer's theorem for real-valued kernels
in Bierens (2014b) it follows that (4.7) implies
$$\lim_{n\to\infty}\sup_{(\beta_1,\beta_2)\in B\times B}\left|\bar{\Gamma}(\beta_1,\beta_2) - \sum_{m=1}^{n} \lambda_m\psi_m(\beta_1)\overline{\psi_m(\beta_2)}\right| = 0, \quad (4.8)$$
which in its turn implies, by the continuity of $\sum_{m=1}^{n} \lambda_m\psi_m(\beta_1)\overline{\psi_m(\beta_2)}$ for all
$n \in \mathbb{N}$, that $\bar{\Gamma}(\beta_1,\beta_2)$ is continuous on $B \times B$, and so is $R(\beta_1,\beta_2)$. But then
$R(\beta_1,\beta_2) \equiv 0$ on $B \times B$, hence
$$\Gamma(\beta_1,\beta_2) \equiv \bar{\Gamma}(\beta_1,\beta_2). \quad (4.9)$$
Part (b) of Theorem 4.1 follows now from (4.8) and (4.9).
As to part (c), suppose that $\{\psi_m\}_{m=1}^{\infty}$ is not complete in $L^2_{\mathbb{C}}(\mu)$. Then the
orthogonal complement $S_1^{\perp}$ of $S_1 = \mathrm{span}(\{\psi_m\}_{m=1}^{\infty})$ contains at least one nonzero
function $\varphi$ with unit norm. Since now (4.1) holds exactly on $B \times B$, this $\varphi$ is an
eigenfunction of $\Gamma$ with zero eigenvalue, but then $\varphi$ is already included in $\{\psi_m\}_{m=1}^{\infty}$.
Therefore, $S_1^{\perp} = \{0\}$ and thus $\{\psi_m\}_{m=1}^{\infty}$ is complete in $L^2_{\mathbb{C}}(\mu)$.
This completes the proof of Theorem 4.1.
Remark 4.1. Similar to Remark 5.2 in Bierens (2014b), the condition in Assumption 2.1 that $B$ is compact is only used in the proof of Theorem 3.2 to
guarantee that
$$\int\int |\Gamma(\beta_1,\beta_2)|^2\,d\mu(\beta_1)d\mu(\beta_2) < \infty, \quad (4.10)$$
and is only used in the proof of Theorem 4.1 to guarantee that $\sup_{\beta\in B}\Gamma(\beta,\beta) < \infty$. Therefore, Mercer's theorem carries over to probability measures $\mu$ on unbounded domains $B$ as long as (4.10) holds and $\Gamma(\beta,\beta)$ is uniformly bounded.
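Both conclusions of Theorem 4.1 that are used later — summability of the eigenvalues with $\sum_m \lambda_m = \int \Gamma(\beta,\beta)\,d\mu(\beta)$, and recovery of $\Gamma$ from the eigen-expansion (4.1) — can be checked in a discretized setting (my own construction, not from the paper):

```python
import numpy as np

# Discretized toy setting as before: B = [0, 1], uniform weights 1/N.
N = 300
beta = np.linspace(0.0, 1.0, N)
D = beta[:, None] - beta[None, :]
Gamma = np.exp(1j * D) * np.exp(-D**2)        # complex Hermitian PSD kernel

evals, evecs = np.linalg.eigh(Gamma / N)
psi = np.sqrt(N) * evecs                      # unit-norm eigenfunctions wrt mu

# (4.1): sum_m lambda_m psi_m(b1) conj(psi_m(b2)) recovers Gamma
series = (psi * evals) @ psi.conj().T
assert np.allclose(series, Gamma)

# Part (a): sum_m lambda_m = int Gamma(b, b) dmu(b)  (the trace identity)
assert np.isclose(evals.sum(), np.diag(Gamma).real.mean())
```

In the finite-dimensional approximation the series is a plain eigendecomposition, so the reconstruction is exact up to roundoff; the point of Theorem 4.1 is that the convergence survives, uniformly, in the infinite-dimensional limit.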
5. Lemma 4 revised
Lemma 4 in BW claims that, with $Z(\beta)$ a complex-valued continuous Gaussian
process on a compact subset $B$ of a Euclidean space and $\mu$ a probability measure
on $B$,
$$\int |Z(\beta)|^2 d\mu(\beta) = \sum_{m=1}^{\infty} \lambda_m e_m'e_m, \quad (5.1)$$
where the $\lambda_m$'s are the eigenvalues of the covariance kernel
$$\Gamma(\beta_1,\beta_2) = E\left[Z(\beta_1)\overline{Z(\beta_2)}\right]$$
and the $e_m$'s are independently $N_2[0,I_2]$ distributed.
However, it follows from Mercer's theorem that
$$E\left[\int |Z(\beta)|^2 d\mu(\beta)\right] = \int \Gamma(\beta,\beta)\,d\mu(\beta) = \sum_{m=1}^{\infty} \lambda_m,$$
whereas (5.1) implies $E\left[\int |Z(\beta)|^2 d\mu(\beta)\right] = 2\sum_{m=1}^{\infty}\lambda_m$. Apart from this impossibility result, the main flaw in the original proof of Lemma 4 is due to equation
(A.6) in BW, which reads
$$E[Z_2(\beta_1)Z_2(\beta_2)'] = \sum_{m=1}^{\infty} \lambda_mQ_m(\beta_1)Q_m(\beta_2)', \quad (5.2)$$
where
$$Z_2(\beta) = \begin{pmatrix} \mathrm{Re}[Z(\beta)]\\ \mathrm{Im}[Z(\beta)]\end{pmatrix}, \quad (5.3)$$
instead of the correct expression
$$E[Z_2(\beta_1)Z_2(\beta_2)' + Z_2^*(\beta_1)Z_2^*(\beta_2)'] = \sum_{m=1}^{\infty} \lambda_mQ_m(\beta_1)Q_m(\beta_2)', \quad (5.4)$$
where
$$Z_2^*(\beta) = \begin{pmatrix} \mathrm{Im}[Z(\beta)]\\ -\mathrm{Re}[Z(\beta)]\end{pmatrix}.$$
Actually, the following corrected version of Lemma 4 is related to Bierens and
Ploberger (1997, Theorem 3):
Theorem 5.1. (Revised Lemma 4 in BW) Let $Z(\beta)$ be a complex-valued continuous Gaussian process on a compact subset $B$ of a Euclidean space and let $\mu$ be a
probability measure on $B$. Then there exists a nonnegative sequence $\omega_m$ satisfying
$\sum_{m=1}^{\infty}\omega_m < \infty$ such that
$$\int |Z(\beta)|^2 d\mu(\beta) = \sum_{m=1}^{\infty} \omega_m\varepsilon_m^2,$$
where the $\varepsilon_m$'s are independent standard normally distributed.
Proof. Let $\{\lambda_m\}_{m=1}^{\infty}$ be the sequence of eigenvalues of the covariance kernel
$$\Gamma(\beta_1,\beta_2) = E\left[Z(\beta_1)\overline{Z(\beta_2)}\right]$$
with corresponding sequence $\{\psi_m(\beta)\}_{m=1}^{\infty}$ of orthonormal eigenfunctions (relative
to $\mu$). By the completeness of $\{\psi_m(\beta)\}_{m=1}^{\infty}$ we can write $Z(\beta) = \sum_{m=1}^{\infty} g_m\psi_m(\beta)$
a.e. $\mu$,⁵ where $g_m = \int Z(\beta)\overline{\psi_m(\beta)}\,d\mu(\beta)$. Consequently,
$$\int |Z(\beta)|^2 d\mu(\beta) = \sum_{m=1}^{\infty} |g_m|^2. \quad (5.5)$$
Since $Z(\beta)$ is zero-mean Gaussian, the $g_m$'s are jointly zero-mean complex-valued normally distributed. Moreover, by Mercer's theorem,
$$\begin{aligned}
E\left[g_k\overline{g_m}\right] &= \int\int \overline{\psi_k(\beta_2)}\,E\left[Z(\beta_2)\overline{Z(\beta_1)}\right]\psi_m(\beta_1)\,d\mu(\beta_1)d\mu(\beta_2)\\
&= \int\int \overline{\psi_k(\beta_2)}\,\Gamma(\beta_2,\beta_1)\psi_m(\beta_1)\,d\mu(\beta_1)d\mu(\beta_2)\\
&= \sum_{j=1}^{\infty} \lambda_j\int\int \overline{\psi_k(\beta_2)}\,\psi_j(\beta_2)\overline{\psi_j(\beta_1)}\,\psi_m(\beta_1)\,d\mu(\beta_1)d\mu(\beta_2)\\
&= \sum_{j=1}^{\infty} \lambda_j\int \overline{\psi_k(\beta_2)}\,\psi_j(\beta_2)\,d\mu(\beta_2)\int \overline{\psi_j(\beta_1)}\,\psi_m(\beta_1)\,d\mu(\beta_1)\\
&= \sum_{j=1}^{\infty} \lambda_j 1(k=j).1(m=j) = \lambda_m 1(k=m). \quad (5.6)
\end{aligned}$$

⁵ I.e., $\mu\left(\left\{\beta \in B : Z(\beta) = \sum_{m=1}^{\infty} g_m\psi_m(\beta)\right\}\right) = 1$.
By joint normality, (5.6) implies that the sequence $\{g_m\}_{m=1}^{\infty}$ is independent, and
so is the sequence $G_m = (\mathrm{Re}[g_m],\mathrm{Im}[g_m])'$. This is a well-known result, but will
be proved in Lemma 5.1 below.⁶
Each $G_m$ is bivariate zero-mean normally distributed, i.e., $G_m \sim N_2[0,\Sigma_m]$. Using the well-known decomposition $\Sigma_m = Q_m\Omega_mQ_m'$, where $\Omega_m = \mathrm{diag}(\omega_{1,m},\omega_{2,m})$
is the diagonal matrix of eigenvalues of $\Sigma_m$ and $Q_m$ is the orthogonal matrix of
the two corresponding eigenvectors, we can write
$$Q_m'G_m = \begin{pmatrix} \sqrt{\omega_{1,m}}\,e_{1,m}\\ \sqrt{\omega_{2,m}}\,e_{2,m}\end{pmatrix},$$
where the sequence $(e_{1,m},e_{2,m})'$ is i.i.d. $N_2[0,I_2]$. Now
$$|g_m|^2 = \overline{g_m}g_m = G_m'G_m = G_m'Q_mQ_m'G_m = \omega_{1,m}e_{1,m}^2 + \omega_{2,m}e_{2,m}^2,$$
and by (5.6) and Mercer's theorem, $\omega_{1,m} + \omega_{2,m} = \lambda_m$ and
$$\sum_{m=1}^{\infty}\omega_{1,m} + \sum_{m=1}^{\infty}\omega_{2,m} = \sum_{m=1}^{\infty} E\left[\overline{g_m}g_m\right] = \sum_{m=1}^{\infty} \lambda_m < \infty.$$
Thus (5.5) now reads
$$\int |Z(\beta)|^2 d\mu(\beta) = \sum_{m=1}^{\infty} \omega_{1,m}e_{1,m}^2 + \sum_{m=1}^{\infty} \omega_{2,m}e_{2,m}^2.$$
Finally, denoting for $m \in \mathbb{N}$, $\omega_{2m-1} = \omega_{1,m}$, $\omega_{2m} = \omega_{2,m}$, $\varepsilon_{2m-1} = e_{1,m}$, $\varepsilon_{2m} = e_{2,m}$, for example, the result of Theorem 5.1 follows.
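The correction behind Theorem 5.1 can be illustrated by Monte Carlo (my own toy construction; in this sketch $Z$ is a circularly symmetric complex Gaussian process, a special case in which $\omega_{1,m} = \omega_{2,m} = \lambda_m/2$): the mean of $\int |Z(\beta)|^2 d\mu(\beta)$ matches $\sum_m \lambda_m$, not $2\sum_m \lambda_m$ as (5.1) would imply.

```python
import numpy as np

rng = np.random.default_rng(2)
N, R = 100, 20000
beta = np.linspace(0.0, 1.0, N)
D = beta[:, None] - beta[None, :]
Gamma = np.exp(1j * D) * np.exp(-D**2)         # Hermitian PSD kernel
evals = np.linalg.eigvalsh(Gamma / N)
trace = evals.sum()                            # = int Gamma(b, b) dmu(b)

# Simulate Z with E[Z(b1) conj(Z(b2))] = Gamma(b1, b2) and E[Z Z^T] = 0
# (circular symmetry, an assumption of this sketch), via a matrix square root.
w, V = np.linalg.eigh(Gamma)
w = np.clip(w, 0.0, None)
S = V @ np.diag(np.sqrt(w / 2)) @ V.conj().T   # Hermitian, S @ S = Gamma / 2
E = rng.standard_normal((R, N)) + 1j * rng.standard_normal((R, N))
Z = E @ S.T                                    # rows are draws of Z(beta)

stat = (np.abs(Z)**2).mean(axis=1)             # int |Z|^2 dmu, per draw
assert abs(stat.mean() - trace) / trace < 0.05
```

The Monte Carlo mean of the statistic is close to $\sum_m \lambda_m$ (within sampling error), confirming the trace identity that exposes the flaw in the original Lemma 4.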
As said before, the claim that (5.6) implies that the sequence $\{g_m\}_{m=1}^{\infty}$ is
independent, and so is the sequence $G_m = (\mathrm{Re}[g_m],\mathrm{Im}[g_m])'$, is a well-known
result.⁶ However, as appears from the proof of the following lemma, this result is
far from obvious.

⁶ Because I could not find a formal proof in the literature.

Lemma 5.1. Let $\{g_m\}_{m=1}^{\infty}$ be a sequence of zero-mean complex-valued jointly
Gaussian random variables satisfying
$$E\left[\overline{g_m}g_k\right] = 0 \text{ for } k \neq m. \quad (5.7)$$
Then the sequence $G_m = (\mathrm{Re}[g_m],\mathrm{Im}[g_m])'$, $m \in \mathbb{N}$, is independent.
Proof. By the joint normality of the sequence $\{G_m\}_{m=1}^{\infty}$ it suffices to verify that for $m \neq k$, $E[G_kG_m'] = O$, as follows. First, note that
$$\overline{g_m}g_k = (\text{Re}[g_m]\text{Re}[g_k] + \text{Im}[g_m]\text{Im}[g_k]) + i.(\text{Re}[g_m]\text{Im}[g_k] - \text{Im}[g_m]\text{Re}[g_k]),$$
hence (5.7) implies
$$\begin{aligned}
E(\text{Re}[g_m]\text{Re}[g_k]) + E(\text{Im}[g_m]\text{Im}[g_k]) &= 0, \\
E(\text{Re}[g_m]\text{Im}[g_k]) - E(\text{Im}[g_m]\text{Re}[g_k]) &= 0.
\end{aligned} \qquad (5.8)$$
It follows straightforwardly from (5.8) that
$$E\left[\begin{pmatrix} \text{Re}[g_k] & -\text{Im}[g_k] \\ \text{Im}[g_k] & \text{Re}[g_k] \end{pmatrix}\begin{pmatrix} \text{Re}[g_m] & \text{Im}[g_m] \\ -\text{Im}[g_m] & \text{Re}[g_m] \end{pmatrix}\right] = O,$$
which can be written as
$$E[G_kG_m'] + P_2E[G_kG_m']P_2' = E\left[(G_k, P_2G_k)\begin{pmatrix} G_m' \\ G_m'P_2' \end{pmatrix}\right] = O, \qquad (5.9)$$
where
$$P_2 = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}.$$
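The matrix identity behind (5.9) — the displayed 2 × 2 product equals $G_kG_m' + P_2G_kG_m'P_2'$ for any complex $g_k$ and $g_m$, not merely in expectation — can be spot-checked numerically; a minimal sketch with random complex numbers:

```python
import numpy as np

rng = np.random.default_rng(2)
P2 = np.array([[0.0, 1.0],
               [-1.0, 0.0]])

max_err = 0.0
for _ in range(100):
    gk, gm = rng.standard_normal(2) + 1j * rng.standard_normal(2)
    Gk = np.array([gk.real, gk.imag])   # G_k = (Re[g_k], Im[g_k])'
    Gm = np.array([gm.real, gm.imag])
    A = np.array([[gk.real, -gk.imag],
                  [gk.imag,  gk.real]])
    B = np.array([[gm.real,  gm.imag],
                  [-gm.imag, gm.real]])
    # compare A @ B with G_k G_m' + P_2 G_k G_m' P_2'
    rhs = np.outer(Gk, Gm) + P2 @ np.outer(Gk, Gm) @ P2.T
    max_err = max(max_err, np.max(np.abs(A @ B - rhs)))
```

The deviation `max_err` stays at floating-point noise level, confirming the identity holds exactly.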
Next, observe from (5.8) that
$$E[G_kG_m'] = \begin{pmatrix} E(\text{Re}[g_k]\text{Re}[g_m]) & E(\text{Re}[g_k]\text{Im}[g_m]) \\ E(\text{Im}[g_k]\text{Re}[g_m]) & E(\text{Im}[g_k]\text{Im}[g_m]) \end{pmatrix} = \begin{pmatrix} E(\text{Re}[g_k]\text{Re}[g_m]) & E(\text{Re}[g_k]\text{Im}[g_m]) \\ E(\text{Re}[g_k]\text{Im}[g_m]) & -E(\text{Re}[g_k]\text{Re}[g_m]) \end{pmatrix},$$
hence $E[G_kG_m']$ is symmetric, with eigenvalues
$$\lambda_1 = \sqrt{(E(\text{Re}[g_m]\text{Re}[g_k]))^2 + (E(\text{Re}[g_m]\text{Im}[g_k]))^2}, \quad \lambda_2 = -\lambda_1.$$
Therefore, $E[G_mG_k']$ can be written as
$$E[G_mG_k'] = \lambda_1 Q_{k,m}\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}Q_{k,m}', \qquad (5.10)$$
where $Q_{k,m}$ is an orthogonal $2 \times 2$ matrix. Substituting the expression (5.10) in (5.9), and pre- and postmultiplying by $Q_{k,m}'$ and $Q_{k,m}$, respectively, yields
$$\lambda_1\left(\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} + Q_{k,m}'P_2Q_{k,m}\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}\left(Q_{k,m}'P_2Q_{k,m}\right)'\right) = O. \qquad (5.11)$$
Since $P_2$ and $Q_{k,m}$ are orthogonal, the matrix $Q_{k,m}'P_2Q_{k,m}$ is orthogonal, with determinant $\det(P_2) = 1$, and therefore in the $2 \times 2$ case takes the general form
$$Q_{k,m}'P_2Q_{k,m} = \begin{pmatrix} \cos(\varphi) & \sin(\varphi) \\ -\sin(\varphi) & \cos(\varphi) \end{pmatrix} \qquad (5.12)$$
for some $\varphi \in [0, 2\pi]$. It is now easy to verify that equation (5.11) reads
$$2\lambda_1\cos(\varphi)\begin{pmatrix} \cos(\varphi) & -\sin(\varphi) \\ -\sin(\varphi) & -\cos(\varphi) \end{pmatrix} = O. \qquad (5.13)$$
Since the matrix in (5.13) is non-zero for all $\varphi \in [0, 2\pi]$, it follows now that $\lambda_1\cos(\varphi) = 0$, so that either
$$\lambda_1 = 0 \text{ or } \varphi \in \{\pi/2, 3\pi/2\}. \qquad (5.14)$$
Suppose that $\varphi = \pi/2$. Then by (5.12),
$$Q_{k,m}'P_2Q_{k,m} = P_2. \qquad (5.15)$$
Again, without loss of generality we may assume that for some $\theta \in [0, 2\pi]$,
$$Q_{k,m} = \begin{pmatrix} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \end{pmatrix}.$$
Then it is easy to verify that
$$Q_{k,m}'P_2Q_{k,m} = \begin{pmatrix} 1 - 2\sin^2(\theta) & -2\cos(\theta)\sin(\theta) \\ -2\cos(\theta)\sin(\theta) & 1 - 2\cos^2(\theta) \end{pmatrix},$$
which is obviously unequal to $P_2$ for all $\theta \in [0, 2\pi]$. Thus, the equality (5.15) is not possible, hence $\varphi \neq \pi/2$. Similarly, $\varphi = 3\pi/2$ implies $Q_{k,m}'P_2Q_{k,m} = P_2'$, which is also not possible, so that $\varphi \neq 3\pi/2$ as well. Consequently, it follows from (5.14) that $\lambda_1 = 0$, which by (5.10) implies that $E[G_mG_k'] = O$.
6. Upper bounds of the critical values of the SICM test
Similar to Bierens and Ploberger (1997, Theorem 7) it follows from Theorem 5.1
that the following result holds.
Theorem 6.1. Let the conditions of Theorem 5.1 hold, and let
$$\overline{\chi}_1^2 = \sup_{n\geq 1}\frac{1}{n}\sum_{m=1}^{n}\varepsilon_m^2.$$
Then
$$\Pr\left[\left(\int \Gamma(\beta,\beta)\,d\mu(\beta)\right)^{-1}\int |Z(\beta)|^2\,d\mu(\beta) > t\right] = \Pr\left[\frac{\sum_{m=1}^{\infty}\omega_m\varepsilon_m^2}{\sum_{m=1}^{\infty}\omega_m} > t\right] \leq \Pr\left[\overline{\chi}_1^2 > t\right]$$
for all $t > 0$. Therefore, for $\alpha \in (0, 1)$ and $t(\alpha)$ such that $\Pr[\overline{\chi}_1^2 > t(\alpha)] = \alpha$,
$$\Pr\left[\int |Z(\beta)|^2\,d\mu(\beta) > t(\alpha).\int \Gamma(\beta,\beta)\,d\mu(\beta)\right] \leq \alpha.$$
The values of $t(\alpha)$ for $\alpha = 0.01$, $\alpha = 0.05$ and $\alpha = 0.10$ have been calculated in Bierens and Ploberger (1997), i.e.,
$$t(0.01) = 6.81, \quad t(0.05) = 4.26, \quad t(0.10) = 3.23. \qquad (6.1)$$
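The critical values in (6.1) can be reproduced approximately by Monte Carlo. The sketch below truncates the supremum over $n \geq 1$ at a finite `n_max` (a simulation shortcut of this sketch, harmless here because $n^{-1}\sum_{m\leq n}\varepsilon_m^2 \to 1$ a.s., so the supremum is almost always attained at small $n$):

```python
import numpy as np

rng = np.random.default_rng(3)
reps, n_max = 10000, 500          # n_max truncates the supremum over n >= 1

eps_sq = rng.standard_normal((reps, n_max)) ** 2
running_mean = np.cumsum(eps_sq, axis=1) / np.arange(1, n_max + 1)
chi_bar_sq = running_mean.max(axis=1)      # approximate draws of chi-bar_1^2

q01, q05, q10 = np.quantile(chi_bar_sq, [0.99, 0.95, 0.90])
# q01, q05, q10 should be close to 6.81, 4.26 and 3.23, up to simulation noise
```

With more replications and a larger `n_max` the simulated quantiles settle near the tabulated values.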
To apply these upper bounds of the asymptotic critical values to the SICM test we need a consistent estimate of $\int \Gamma(\beta,\beta)\,d\mu(\beta)$. Recall from Section 3 in BW that the empirical process on which the SICM test is based takes the form
$$\widehat{Z}_n^{(s)}(\tau,\xi) = \frac{1}{\sqrt{n}}\sum_{j=1}^{n}\left(\exp(i.\tau'Y_j) - \exp(i.\tau'\widetilde{Y}_j)\right)\exp(i.\xi'X_j),$$
where $Y_j$ and $X_j$ are (made) bounded random vectors, $\widetilde{Y}_j$ is a random drawing from the estimated conditional distribution $F(y|X_j;\widehat{\theta})$ of $Y_j$, and $(\tau,\xi) \in \Upsilon \times \Xi$.
The corresponding estimated covariance function takes the form
$$\begin{aligned}
\widehat{\Gamma}_n^{(s)}((\tau_1,\xi_1),(\tau_2,\xi_2)) &= \frac{1}{n}\sum_{j=1}^{n}\left(\exp(i.\tau_1'Y_j) - \exp(i.\tau_1'\widetilde{Y}_j)\right)\exp(i.\xi_1'X_j) \\
&\quad \times \left(\exp(-i.\tau_2'Y_j) - \exp(-i.\tau_2'\widetilde{Y}_j)\right)\exp(-i.\xi_2'X_j) \\
&= \frac{1}{n}\sum_{j=1}^{n}\left(\exp(i.(\tau_1-\tau_2)'Y_j) + \exp(i.(\tau_1-\tau_2)'\widetilde{Y}_j)\right. \\
&\quad \left. - \exp(i.\tau_1'Y_j)\exp(-i.\tau_2'\widetilde{Y}_j) - \exp(i.\tau_1'\widetilde{Y}_j)\exp(-i.\tau_2'Y_j)\right)\exp(i.(\xi_1-\xi_2)'X_j),
\end{aligned}$$
hence
$$\widehat{\Gamma}_n^{(s)}((\tau,\xi),(\tau,\xi)) = \frac{1}{n}\sum_{j=1}^{n}\left(2 - \exp(i.\tau'(Y_j-\widetilde{Y}_j)) - \exp(-i.\tau'(Y_j-\widetilde{Y}_j))\right) = 2 - \frac{2}{n}\sum_{j=1}^{n}\cos(\tau'(Y_j-\widetilde{Y}_j)).$$
Now let $\Upsilon = [-c,c]^m$ and $\Xi = [-c,c]^k$, and let $\mu$ be the uniform probability measure on $\Upsilon \times \Xi$. Then the expression for $\widehat{T}_n^{(s)}(c) = \int |\widehat{Z}_n^{(s)}(\tau,\xi)|^2\,d\mu(\tau,\xi)$ is given in equation (31) in BW, whereas
$$\widehat{R}_n^{(s)}(c) = \int \widehat{\Gamma}_n^{(s)}((\tau,\xi),(\tau,\xi))\,d\mu(\tau,\xi) = 2 - \frac{2}{n}\sum_{j=1}^{n}\prod_{i=1}^{m}\frac{\sin\left(c(Y_{i,j} - \widetilde{Y}_{i,j})\right)}{c(Y_{i,j} - \widetilde{Y}_{i,j})},$$
where $Y_{i,j}$ and $\widetilde{Y}_{i,j}$ are component $i$ of $Y_j$ and $\widetilde{Y}_j$, respectively. The upper bounds (6.1) are now applicable to the standardized SICM test statistic $\widehat{T}_n^{(s)}(c)/\widehat{R}_n^{(s)}(c)$, i.e., under the null hypothesis,
$$\limsup_{n\to\infty}\Pr\left[\widehat{T}_n^{(s)}(c)/\widehat{R}_n^{(s)}(c) > 6.81\right] \leq 0.01,$$
$$\limsup_{n\to\infty}\Pr\left[\widehat{T}_n^{(s)}(c)/\widehat{R}_n^{(s)}(c) > 4.26\right] \leq 0.05,$$
$$\limsup_{n\to\infty}\Pr\left[\widehat{T}_n^{(s)}(c)/\widehat{R}_n^{(s)}(c) > 3.23\right] \leq 0.10.$$
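A sketch of how $\widehat{R}_n^{(s)}(c)$ might be computed in practice. The data below are placeholder draws (not from BW), and `np.sinc` is used because sin(x)/x has a removable singularity at x = 0; note that `np.sinc(x)` is the normalized sinc sin(πx)/(πx):

```python
import numpy as np

def R_hat(Y, Y_tilde, c):
    """R_n^(s)(c) = 2 - (2/n) sum_j prod_i sin(c*d_ij)/(c*d_ij), d = Y - Y_tilde.
    Since np.sinc(x) = sin(pi*x)/(pi*x), sin(c*d)/(c*d) = np.sinc(c*d/pi)."""
    d = np.asarray(Y) - np.asarray(Y_tilde)       # n x m matrix of Y_j - Y~_j
    return 2.0 - 2.0 * np.mean(np.prod(np.sinc(c * d / np.pi), axis=1))

rng = np.random.default_rng(4)
n, m, c = 200, 2, 5.0
Y = rng.uniform(-1.0, 1.0, (n, m))                # placeholder (bounded) data
Y_tilde = rng.uniform(-1.0, 1.0, (n, m))          # placeholder simulated draws
r = R_hat(Y, Y_tilde, c)

# Cross-check the closed form against direct numerical integration of
# Gamma_n^(s)((tau,xi),(tau,xi)) over tau ~ uniform[-c, c], for one component:
d1 = Y[:, 0] - Y_tilde[:, 0]
tau = np.linspace(-c, c, 20001)
integrand = 2.0 - 2.0 * np.mean(np.cos(np.outer(d1, tau)), axis=0)
r1_quad = integrand.mean()                        # approximates the mu-integral
r1_closed = R_hat(Y[:, :1], Y_tilde[:, :1], c)
```

By construction $\widehat{R}_n^{(s)}(c) \in [0, 4]$, and it equals zero when each $\widetilde{Y}_j = Y_j$, consistent with the closed form above.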
7. Concluding remarks
In the mathematical literature the Hilbert-Schmidt and Mercer theorems are nowadays usually derived as by-products of linear operator theory,^7 which, however, is way over my head. Therefore, in this addendum to BW I have presented alternative proofs which only require elementary Hilbert space theory and basic knowledge of linear algebra and complex calculus. I do not claim any originality. It seems likely that these proofs have been given in this way somewhere before, but if so I am not aware of any references. The only originality claim I can make is that I figured out these proofs all by myself.
Note that we could have used moment generating functions rather than characteristic functions, because all the variables involved are bounded or made bounded. Then we could have used the Hilbert-Schmidt and Mercer theorems in Bierens and Ploberger (1997) and its addendum Bierens (2014b).
Finally, note that recently, in Bierens and Wang (2014), the SICM test has been generalized to parametric conditional distributions of stationary time series models, by combining the weighted ICM testing idea in Bierens (1984) with the approach in BW.
^7 See for example Krein (1998), among others.

References
Bernkopf, M. (1966): "The Development of Function Spaces with Particular Reference to their Origins in Integral Equation Theory", Archive for History of Exact Sciences 3, 1-96.
Bierens, H. J. (1982): "Consistent Model Specification Tests", Journal of Econometrics 20, 105-134.
Bierens, H. J. (1984): "Model Specification Testing of Time Series Regressions", Journal of Econometrics 26, 323-353.
Bierens, H. J. (1990): "A Consistent Conditional Moment Test of Functional Form", Econometrica 58, 1443-1458.
Bierens, H. J. (2004): Introduction to the Mathematical and Statistical Foundations of Econometrics, Cambridge University Press.
Bierens, H. J. (2014a): "The Hilbert Space Theoretical Foundation of Semi-Nonparametric Modeling", Chapter 1 in: J. Racine, L. Su and A. Ullah (eds), The Oxford Handbook of Applied Nonparametric and Semiparametric Econometrics and Statistics, Oxford University Press.
Bierens, H. J. (2014b): "Addendum to Asymptotic Theory of Integrated Conditional Moment Tests", http://grizzly.econ.psu.edu/~hbierens/ADDENDUM_BP1997.PDF.
Bierens, H. J. (2015): "Addendum to Consistent Model Specification Tests", http://grizzly.econ.psu.edu/~hbierens/ADDENDUM_B1982.PDF.
Bierens, H. J., and W. Ploberger (1997): "Asymptotic Theory of Integrated Conditional Moment Tests", Econometrica 65, 1129-1151.
Bierens, H. J., and L. Wang (2012): "Integrated Conditional Moment Tests for Parametric Conditional Distributions", Econometric Theory 28, 328-362.
Bierens, H. J., and L. Wang (2014): "Weighted Simulated Integrated Conditional Moment Tests for Parametric Conditional Distributions of Stationary Time Series Processes", forthcoming in Econometric Reviews.
Hadinejad-Mahram, H., D. Dahlhaus and D. Blomker (2002): "Karhunen-Loeve Expansion of Vector Random Processes", Technical Report No. IKT-NT 1019, Communications Technology Laboratory, Swiss Federal Institute of Technology, Zurich.
Krein, M. G. (1998): "Compact Linear Operators on Functional Spaces with Two Norms", Integral Equations and Operator Theory 30, 140-162.
Mercer, J. (1909): "Functions of Positive and Negative Type and their Connection with the Theory of Integral Equations", Philosophical Transactions of the Royal Society A 209, 415-446.
Siegmund-Schultze, R. (1986): "Der Beweis des Hilbert-Schmidt-Theorems", Archive for History of Exact Sciences 36, 251-270.