Doctoral Thesis
Sample Covariance Based
Parameter Estimation
for Digital Communications
Author: Javier Villares Piera
Advisor: Gregori Vázquez Grau
Departament de Teoria del Senyal i Comunicacions
Universitat Politècnica de Catalunya
Barcelona, May 2005
Abstract
This thesis deals with the problem of blind second-order estimation in digital communications.
In this field, the transmitted symbols appear as non-Gaussian nuisance parameters degrading
the estimator performance. In this context, the Maximum Likelihood (ML) estimator is generally unknown unless the signal-to-noise ratio (SNR) is very low. In that asymptotically low-SNR regime, the ML solution is quadratic in the received data or, equivalently, linear in
the sample covariance matrix. This significant feature is shared by other important ML-based
estimators such as the Gaussian and Conditional ML estimators. Likewise, MUSIC
and other related subspace methods are based on the eigendecomposition of the sample covariance matrix. From this background, the main contribution of this thesis is the deduction and
evaluation of the optimal second-order parameter estimator for any SNR and any distribution
of the nuisance parameters.
A unified framework is provided for the design of open- and closed-loop second-order estimators. In the first case, the minimum mean square error and minimum variance second-order
estimators are deduced considering that the wanted parameters are random variables of known
but arbitrary prior distribution. From this Bayesian approach, closed-loop estimators are derived by imposing an asymptotically informative prior. In this small-error scenario, the best
quadratic unbiased estimator (BQUE) is obtained without adopting any assumption about the
statistics of the nuisance parameters. In addition, the BQUE analysis yields the lower bound
on the performance of any blind estimator based on the sample covariance matrix.
Probably the main result in this thesis is the proof that quadratic estimators are able to
exploit the fourth-order statistical information about the nuisance parameters. Specifically,
the fourth-order cumulants of the nuisance parameters are shown to provide all the non-Gaussian
information that is utilizable for second-order estimation. This fourth-order information becomes
relevant in the case of constant-modulus nuisance parameters and medium-to-high SNRs. In this
situation, the Gaussian assumption is proved to yield inefficient second-order estimates.
Another original result in this thesis is the deduction of the quadratic extended Kalman filter
(QEKF). The QEKF study concludes that second-order trackers can simultaneously improve the
acquisition and steady-state performance if the fourth-order statistical information about the
nuisance parameters is taken into account. Once again, this improvement is significant in the case
of constant-modulus nuisance parameters and medium-to-high SNRs.
Finally, the proposed second-order estimation theory is applied to some classical estimation
problems in the field of digital communications such as non-data-aided digital synchronization,
the related problem of time-of-arrival estimation in multipath channels, blind channel impulse
response identification, and direction-of-arrival estimation in mobile multi-antenna communication systems. In these applications, an intensive asymptotic and numerical analysis is carried
out in order to evaluate the ultimate limits of second-order estimation.
Resum
This thesis studies the problem of blind second-order estimation in digital communications.
In this field, the transmitted symbols become non-Gaussian nuisance parameters that degrade
the estimator performance. In this context, the maximum likelihood (ML) estimator is normally
unknown unless the signal-to-noise ratio (SNR) is sufficiently low. In this particular case, the
ML estimator is a quadratic function of the received data vector or, equivalently, a linear
transformation of the sample covariance matrix. This feature is shared by other important
estimators based on the maximum likelihood principle, such as the Gaussian ML (GML) and
Conditional ML (CML) estimators. Likewise, the MUSIC estimator, and other related subspace
methods, are based on the eigendecomposition of the sample covariance matrix. Within this
framework, the main contribution of this thesis is the deduction and evaluation of the optimal
second-order estimator for any SNR and any distribution of the nuisance parameters.
The design of open- and closed-loop quadratic estimators is addressed in a unified way.
Regarding open-loop estimators, the minimum mean square error and minimum variance estimators
are derived considering that the parameters of interest are random variables with a known but
otherwise arbitrary prior distribution. From this Bayesian formulation, closed-loop estimators
can be obtained by assuming that the prior distribution of the parameters is highly informative.
In this small-error model, the best quadratic unbiased estimator (BQUE) is formulated without
adopting any particular statistics for the nuisance parameters. In addition, the analysis of the
BQUE yields the lower bound that no blind estimator based on the sample covariance matrix can
improve upon.
Probably the main result of the thesis is the proof that quadratic estimators are able to use
the fourth-order statistical information about the nuisance parameters. More specifically, it is
shown that all the non-Gaussian information in the data that second-order methods are able to
exploit appears reflected in the fourth-order cumulants of the nuisance parameters. In fact, this
fourth-order information becomes relevant if the nuisance parameters have constant modulus and
the SNR is moderate or high. Under these conditions, the Gaussian assumption on the nuisance
parameters is shown to yield inefficient quadratic estimators.
Another original result presented in this dissertation is the deduction of the second-order
extended Kalman filter, named QEKF. The QEKF study indicates that second-order tracking
algorithms can simultaneously improve their acquisition and steady-state performance if the
fourth-order statistical information about the nuisance parameters is taken into account. Once
again, this improvement is significant if the nuisance parameters have constant modulus and the
SNR is sufficiently high.
Finally, the proposed quadratic estimation theory has been applied to some classical estimation
problems in the field of digital communications, such as non-data-aided digital synchronization,
the problem of time-of-arrival estimation in multipath environments, the blind identification of
the channel impulse response and, lastly, angle-of-arrival estimation in mobile multi-antenna
communication systems. For each of these applications, an intensive analysis, both numerical and
asymptotic, of the performance achievable with second-order estimation methods has been carried
out.
to Natalia,
Agraïments
Without a doubt, this is the most important chapter of the thesis. Without this chapter, the
rest of the thesis would make no sense. In these five years I have found a "place in the world"
where I could develop the two vocations that give meaning to my life: learning and teaching.
Without a doubt, this place has been made possible by a group of exceptional people without whom
I would not be who I am. From these pages, I want to share this moment of joy and satisfaction
with them.
My first thanks go to Gregori: for always standing by me, for believing in me and, above all,
for his unconditional friendship. I also want to express my deepest gratitude to Xell, for helping
me from the very first day and offering me her friendship. Another big hug for Francesc, because
he has been like my big brother in the department... Remember that we still owe ourselves a
dinner to celebrate our theses! The other big brother left home, but I will always be grateful to
him for supporting me so much. Thank you, Xavi.
More thanks... to Jaume, because talking with him is a pleasure and he has always had a moment
to talk with me. To Jose, because he is one of the people I admire most, personally and
professionally. I sincerely hope we can work together for many years. I would like to say the
same to Enric. From here I wish him every happiness inside and outside his marriage... Josep,
the capacity theorems tremble when they hear your name... "be happy". And many more thanks to
all the colleagues in the department with whom I have shared so many pleasant moments. This
thesis has been an exciting adventure because I have lived it with all of you...
The last acknowledgment, though it is the first of them all, is for my family. This thesis is
yours; I have come this far thanks to your effort and affection. A big kiss for my grandparents,
for my parents, for Robert and his two girls, for my aunts, uncles and cousins. Also for all the
friends who have never stopped encouraging me. Thank you.
Now and always, to Natalia, for being the beginning and end of everything I do and feel. TQM.
This work has been partially supported by the European Commission (FEDER) and Spanish/Catalan Government under projects TIC2003-05482, TEC2004-04526 and 2001SGR-00268.
Contents

1 Introduction
  1.1 The nuisance unknowns in parameter estimation
  1.2 The Bayesian approach: the bias-variance dilemma
  1.3 Noncircular nuisance unknowns
  1.4 Self-noise in multivariate problems: interparameter interference
  1.5 Informative priors: estimation on track
  1.6 Limiting asymptotic performance
  1.7 Thesis Outline

2 Elements on Estimation Theory
  2.1 Classical vs. Bayesian Approach
  2.2 MMSE and MVU Estimation
  2.3 Maximum Likelihood Estimation
    2.3.1 Decision Directed ML Estimation
    2.3.2 Asymptotic properties
  2.4 Linear Signal Model
    2.4.1 Low-SNR Unconditional Maximum Likelihood
    2.4.2 Conditional Maximum Likelihood (CML)
    2.4.3 Gaussian Maximum Likelihood (GML)
  2.5 Maximum Likelihood Implementation
    2.5.1 ML-Based Closed-Loop Estimation
    2.5.2 ML-based Tracking
  2.6 Lower Bounds in Parameter Estimation
    2.6.1 Deterministic Bounds based on the Cauchy-Schwarz Inequality
    2.6.2 Bayesian Bounds based on the Cauchy-Schwarz Inequality
    2.6.3 Bayesian Bounds based on Kotelnikov's Inequality
  2.A UML for polyphase alphabets
  2.B Low-SNR UML results
  2.C CML results
  2.D GML asymptotic study
  2.E Closed-loop estimation efficiency
  2.F Computation of Σsv(θ) for the small-error bounds
  2.G MCRB, CCRB and UCRB derivation

3 Optimal Second-Order Estimation
  3.1 Definitions and Notation
  3.2 Second-Order MMSE Estimator
  3.3 Second-Order Minimum Variance Estimator
  3.4 A Case Study: Frequency Estimation
    3.4.1 Bias Analysis
    3.4.2 MSE Performance
  3.5 Conclusions
  3.A Second-order estimation in noncircular transmissions
  3.B Deduction of matrix Q(θ)
  3.C Fourth-order moments
  3.D Bayesian average in frequency estimation
  3.E Bias study in frequency estimation

4 Optimal Second-Order Small-Error Estimation
  4.1 Small-Error Assumption
  4.2 Second-Order Minimum Variance Estimator
  4.3 Second-Order Identifiability
  4.4 Generalized Second-Order Constrained Estimators
  4.5 A Case Study: Frequency Estimation
  4.6 Conclusions
  4.A Small-error matrices
  4.B Proof of bias cancellation

5 Quadratic Extended Kalman Filtering
  5.1 Signal Model
  5.2 Background and Notation
  5.3 Linearized Signal Model
  5.4 Quadratic Extended Kalman Filter (QEKF)
    5.4.1 Another QEKF derivation
    5.4.2 Kalman gains recursion
    5.4.3 QEKF programming
  5.5 Simulations
  5.6 Conclusions

6 Case Studies
  6.1 Non-Data-Aided Synchronization
    6.1.1 Overview
    6.1.2 Signal Model
    6.1.3 Open-Loop Timing Synchronization
    6.1.4 Closed-Loop Analysis and Optimization
    6.1.5 Numerical Results
  6.2 Carrier Phase Synchronization of Noncircular Modulations
    6.2.1 Signal Model
    6.2.2 NDA ML Estimation in Low-SNR Scenarios
    6.2.3 High-SNR Analysis: Self-noise
    6.2.4 Second-Order Optimal Estimation
    6.2.5 High SNR Study: Self-noise
    6.2.6 Numerical Results
  6.3 TOA Estimation in Multipath Scenarios
    6.3.1 Signal Model
    6.3.2 Optimal Second-Order NDA Estimator
    6.3.3 Optimal Second-Order DA Estimator
    6.3.4 Numerical Results
  6.4 Blind Channel Identification
    6.4.1 Signal Model
    6.4.2 Numerical Results
  6.5 Angle-of-Arrival (AoA) Tracking
    6.5.1 Signal Model
    6.5.2 Numerical Results
  6.A Computation of Q for carrier phase estimation
  6.B Asymptotic expressions for multiplicative channels

7 Asymptotic Studies
  7.1 Introduction
  7.2 Low SNR Study
  7.3 High SNR Study
    7.3.1 (Gaussian) Unconditional Cramér-Rao Bound
    7.3.2 Gaussian Maximum Likelihood
    7.3.3 Best Quadratic Unbiased Estimator
    7.3.4 Large Error Estimators
  7.4 Large Sample Study
    7.4.1 (Gaussian) Unconditional Cramér-Rao Bound
    7.4.2 Gaussian Maximum Likelihood
    7.4.3 Best Quadratic Unbiased Estimator
    7.4.4 Second-Order Estimation in Digital Communications
    7.4.5 Second-Order Estimation in Array Signal Processing
  7.5 Simulations
    7.5.1 SNR asymptotic results for the BQUE and GML estimators
    7.5.2 SNR asymptotic results for the large-error estimators
    7.5.3 Large sample asymptotic results for the BQUE and GML estimators
  7.6 Conclusions
  7.A Low-SNR ML scoring implementation
  7.B High-SNR limit of R−1(θ) and R̃−1(θ)
  7.C High-SNR limit of Q−1(θ) (K full-rank)
  7.D High-SNR limit of Q−1(θ) (K singular)
  7.E High-SNR results with A(θ) singular
  7.F High-SNR UCRB
  7.G High-SNR UCRB variance floor
  7.H High-SNR study in feedforward second-order estimation
  7.I High-SNR MSE floor under the Gaussian assumption
  7.J Performance limits in second-order frequency estimation
  7.K Asymptotic study for M → ∞
  7.L Asymptotic study for Ns → ∞

8 Conclusions and Topics for Future Research
  8.1 Further Research

A Notation

B Acronyms

Bibliography
Chapter 1
Introduction
This dissertation is the result of almost five years working in digital communications and signal
processing. During all this time, multiple estimation problems have been addressed, and the
experience gained in these applications is materialized in this document. We thought that an
attractive way of introducing this thesis is to explain its evolution from the original idea until
the completion of this dissertation. In the following sections, the series of obstacles that were
encountered along the way are briefly discussed and put into context. In short, this is the
history of the thesis...
1.1 The nuisance unknowns in parameter estimation
In the beginning, our research activity was focused on non-data-aided (blind) digital
synchronization [Gar88a][Men97][Vaz00]. In this field, the receiver has to estimate some
parameters from the received waveform in order to recover the transmitted data symbols. Basically,
the receiver has to determine the symbol timing and, in bandpass communications, the carrier phase
and frequency. With this aim, in most communication standards, a known training sequence is
transmitted to assist the receiver during signal synchronization. Once the training is finished,
the synchronizer has to maintain synchronism even though the parameters usually fluctuate due to
the time-varying propagation channel and the nonidealities of the terminal equipment. In these
conditions, the synchronizer has to cope with the thermal noise and, in addition, with the
so-called self-noise (or pattern noise) generated by the unknown random data symbols themselves.
In fact, the random data symbols can be regarded as nuisance parameters that complicate the
estimation of the parameters of interest. Special attention is given in this thesis to these
nuisance parameters and the self-noise they induce.
Most non-data-aided techniques in the literature are designed assuming a low SNR
[Men97][Vaz00]. This assumption is rather realistic in modern digital communications due to the
utilization of sophisticated error-correcting codes [Ber93]. In that case, it is well known that
the maximum likelihood (ML) estimator is quadratic in the received data in most estimation
problems such as, for instance, timing and frequency synchronization. The most important point is
that, whatever the actual SNR, the ML estimator is known to attain the Cramér-Rao bound (CRB)
if the observation time is sufficiently large [Kay93b]. Thus, the ML estimator is asymptotically
the minimum variance unbiased estimator [Kay93b].
Unfortunately, the ML estimator is generally unknown for medium-to-high SNRs in the presence of
nuisance parameters. Thus, the ML estimator is an unknown function of the observed data y that
can be approximated around the true parameter θ (small-error approximation) by means of the
following N-th order polynomial:

    \hat{\theta}_{ML} \simeq \sum_{n=0}^{N} \mathbf{M}_n(\theta)\, \mathbf{y}_n

where yn is the vector containing all the n-th order sample products of y, and Mn(θ) the
associated coefficients. Notice that the so-called small-error approximation can be achieved by
means of iterative and closed-loop schemes (Chapter 2).
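As a minimal sketch (our own illustration, not code from the thesis), the quadratic member of this polynomial family (N = 2) is an affine function of the pairwise sample products y[i]·conj(y[j]), i.e., of the entries of the sample covariance matrix. The toy coefficients below (M0 = 0 and M2 = I/N) are hypothetical choices that simply reduce the estimator to the average received power:

```python
def quadratic_estimate(y, M0, M2):
    """Evaluate theta_hat = M0 + sum_{i,j} M2[i][j] * y[i] * conj(y[j]),
    i.e., an estimate that is linear in the sample covariance entries."""
    acc = M0
    for i in range(len(y)):
        for j in range(len(y)):
            acc += M2[i][j] * y[i] * y[j].conjugate()
    return acc.real  # the wanted parameter is real-valued

# Toy instance: M0 = 0 and M2 = I/N turn the quadratic estimator into the
# average power (1/N) * sum |y[i]|^2 (the normalized sample-covariance trace).
y = [1 + 0j, 0 + 1j, -1 + 0j, 0 - 1j]
N = len(y)
M2 = [[1.0 / N if i == j else 0.0 for j in range(N)] for i in range(N)]
print(quadratic_estimate(y, 0.0, M2))  # unit-modulus samples -> 1.0
```

In the thesis framework, the coefficients are not fixed a priori as here but optimized (e.g., for minimum variance) for the estimation problem at hand.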
From the last expression, we conclude that higher-order techniques (i.e., N > 2) are generally
required to attain the CRB at medium-to-high SNR. For example, a heuristic fourth-order
closed-loop timing synchronizer was proposed in [And90][Men97, Sec. 9.4] for the minimum shift
keying (MSK) modulation that outperforms, at high SNR, any existing second-order technique. This
work was the motivation for deducing the optimal fourth-order estimator given by

    \hat{\theta}_4 = \mathbf{M}_0(\theta) + \mathbf{M}_2(\theta)\, \mathbf{y}_2 + \mathbf{M}_4(\theta)\, \mathbf{y}_4

where M0(θ), M2(θ) and M4(θ) are selected to minimize the estimator variance [Vil01b]. The
proposed estimator became quadratic at low SNR (M4 = 0) and exploited the fourth-order component
when the SNR was increased (M4 ≠ 0). For more details, the reader is referred to the original
paper [Vil01b]:
• “Fourth-order Non-Data-Aided Synchronization”. J. Villares, G. Vázquez, J. Riba. Proc.
of the IEEE Int. Conf. on Acoustics, Speech and Signal Processing 2001 (ICASSP 2001).
pp. 2345-2348. Salt Lake City (USA). May 2001.
Although the focus soon shifted from fourth-order to second-order methods, this contribution was
actually the basis of this thesis. In this work, the estimator coefficients were directly
optimized for a given observation length and estimator order (N = 4). To carry out this
optimization, the Kronecker product ⊗ and vec(·) operators were introduced in order to manipulate
the n-th order observation yn. Moreover, during the computation of the optimal coefficients M2
and M4, we realized that some fourth-order moments of the MSK modulation were ignored in other
well-known second-order ML-based approximations.
After this work, we wondered about the best quadratic non-data-aided estimator or, in other
words, what the optimal coefficients M0(θ) and M2(θ) are in

    \hat{\theta}_2 = \mathbf{M}_0(\theta) + \mathbf{M}_2(\theta)\, \mathbf{y}_2
for a given estimation problem. At low SNR, the optimal second-order estimator is given by the
low-SNR ML approximation, at least for sufficiently large data records. On the other hand, if
the SNR increases, the optimal second-order estimator is only known in the case of Gaussian data
symbols. However, the Gaussian assumption is clearly unrealistic in digital communications
because the transmitted symbols belong to a discrete alphabet. This intuition was confirmed
for the MSK modulation in the following paper [Vil01a].
• “Best Quadratic Unbiased Estimator (BQUE) for Timing and Frequency Synchronization”.
J. Villares, G. Vázquez. Proc. of the 11th IEEE Int. Workshop on Statistical Signal Processing
(SSP01). pp. 413-416. Singapore. August 2001. ISBN 0-7803-7011-2.
In this pioneering paper, the Gaussian assumption was found to yield suboptimal timing estimates
at high SNR when the observation time is short. Quadratic estimators were improved by considering
the fourth-order cumulants, or kurtosis, of the MSK constellation. However, this fourth-order
information was shown to be irrelevant as the number of observations grows. Thus, it was shown
via Monte Carlo simulations that the Gaussian assumption is asymptotically optimal in the problem
of timing synchronization. Additional simulations and remarks were given in the tutorial paper
presented at the ESA workshop [Vaz01].
1.2 The Bayesian approach: the bias-variance dilemma
Another contribution in the aforementioned paper was the formulation of optimal second-order
open-loop estimators. Open-loop schemes are very attractive in digital synchronization because
they allow reducing the acquisition time of closed-loop synchronizers [Men97][Rib97]. To design
open-loop estimators, the small-error approximation is abandoned and the parameter θ is assumed
to take values in a given interval Θ. In this large-error scenario, the N-th order expansion of
the ML estimator depends on the unknown value of θ ∈ Θ and, consequently, the ML estimator
cannot generally be implemented by means of a polynomial in y.
To overcome this limitation, the Bayesian formulation was adopted and the parameter θ was
modelled as a random variable of known prior distribution fθ(θ). Then, the prior was
applied to optimize the estimator coefficients {Mn} “on the average”, that is, considering all
the possible values of θ ∈ Θ and the associated probabilities fθ(θ). Actually, the Bayesian
approach encompasses both the small- and large-error scenarios, since the small-error
approximation can be imposed by considering an extremely informative prior.
It can be shown that, in the small-error regime, minimizing the estimator mean square error
(MSE) is equivalent to minimizing the estimator variance. However, in the large-error regime, the
minimum MSE estimator usually becomes biased with the aim of reducing the variance contribution.
In fact, the more corrupted the observation y is, the more biased the minimum MSE solution
becomes. The reason is that, when the observation is severely degraded by the thermal- and
self-noise terms, the minimum MSE estimator is not confident about the observation y and resorts
to the a priori knowledge of the parameters. In that way, the estimator reduces the variance
induced by the random terms (noise and self-noise), although in return it becomes biased unless
the value of θ coincides with the expected value of the prior.
In this early paper, the main problem in second-order open-loop estimation was identified
for the first time. In general, unbiased second-order open-loop estimators are not feasible. Even
if the estimator variance can usually be removed by extending the observation time, there is
always a residual bias that sets a limit on the performance of open-loop estimators. Despite
this conclusion, the design of almost unbiased open-loop second-order estimators was addressed
by imposing the unbiasedness constraint at L values of θ ∈ Θ. Actually, the L test points were
distributed regularly inside the parameter space Θ. The number of unbiased test points was in
practice a function of the observation time and the oversampling factor.
This formulation was further improved by allowing the estimator to automatically select the
best unbiased test points. In that way, the estimator can decide the number and position of the
test points in order to minimize the overall estimator bias. This formulation was developed in
the following conference paper for the problem of timing and frequency synchronization [Vil02b].
• “Sample Covariance Matrix Based Parameter Estimation for Digital Synchronization”.
J. Villares, G. Vázquez. Proc. of the IEEE Global Communications Conference 2002
(Globecom 2002). November 2002. Taipei (Taiwan).
Another important advance in this paper was the closed-form derivation of the kurtosis matrix K
for any linear modulation. This matrix contains all the fourth-order statistical information
about the transmitted symbols that is relevant for second-order estimation. Actually, matrix K
gathers all the statistical information about the digital modulation that is ignored when the
Gaussian assumption is adopted.
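To make the role of this fourth-order information tangible, the scalar fourth-order cumulant of a constellation can be computed directly from its moments. This sketch is our own illustration (not the thesis's derivation of K): it uses cum4 = E|x|⁴ − 2(E|x|²)² − |E{x²}|², which is exactly zero for circular complex Gaussian symbols, so a nonzero value measures the non-Gaussian information available to second-order methods:

```python
def fourth_order_cumulant(constellation):
    """Fourth-order cumulant of a zero-mean, equiprobable constellation:
    cum4 = E|x|^4 - 2*(E|x|^2)^2 - |E{x^2}|^2 (zero for circular Gaussian)."""
    n = len(constellation)
    m2 = sum(abs(x) ** 2 for x in constellation) / n  # E|x|^2
    m4 = sum(abs(x) ** 4 for x in constellation) / n  # E|x|^4
    c2 = sum(x * x for x in constellation) / n        # E{x^2} (noncircular moment)
    return m4 - 2 * m2 ** 2 - abs(c2) ** 2

bpsk = [1, -1]
qpsk = [1, 1j, -1, -1j]
print(fourth_order_cumulant(bpsk))  # -2.0 (real-valued, noncircular, constant modulus)
print(fourth_order_cumulant(qpsk))  # -1.0 (circular, constant modulus)
```

The strongly negative values for these constant-modulus alphabets reflect exactly the regime in which, as the thesis shows, the Gaussian assumption yields inefficient second-order estimates.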
The last two papers [Vil01a][Vil02b] are actually the foundation of Chapter 3 (open-loop
estimation) and Chapter 4 (closed-loop estimation).
In the same year, the results obtained in the last two papers [Vil01a][Vil02b] were extended to estimate the timing and frequency parameters in the presence of multipath propagation. This work was actually motivated by the participation in the EMILY European project
[Bou02a][Bou02b], in which advanced radiolocation techniques for wireless outdoor communication systems (e.g., GSM and UMTS) were studied. The results of this research were published
in the following paper [Vil02a] and are included in Section 6.3.
• “Optimal Quadratic Non-Assisted Parameter Estimation for Digital Synchronisation”.
J. Villares, G. Vázquez. Proc. of the Int. Zurich Seminar on Broadband Communications 2002
(IZS2002). pp. 46.1-46.4. Zurich (Switzerland). February 2002.
The Bayesian formulation adopted to design open-loop estimators requires, in most cases,
computing the estimator coefficients numerically. The reason is that, in most estimation
problems, the average with respect to the prior fθ(θ) does not admit an analytical solution.
Exceptionally, closed-form expressions can be obtained for the frequency estimation problem if
the prior is uniform. Thus, the exhaustive evaluation of open-loop second-order frequency
estimators was carried out in the following paper [Vil03a].
• “Sample Covariance Matrix Parameter Estimation: Carrier Frequency, A Case Study”. J. Villares, G. Vázquez. Proc. of the IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP). pp. VI-725 - VI-728. Hong Kong (China). April 2003.
In this paper, it was shown that unbiased second-order open-loop estimators can be obtained
by increasing the oversampling factor. In practice, unbiased open-loop estimators are feasible if
the sampling rate is greater than four times the maximum frequency error (Section 3.4).
1.3 Noncircular nuisance unknowns
Thus far, the second-order framework was only applied to formulate NDA timing and frequency synchronizers. The problem of carrier phase synchronization was ignored because higher-order methods are usually required to estimate the signal phase. However, this is not true in the case of noncircular modulations (e.g., PAM, BPSK, staggered formats and CPM). Remember
that the transmitted symbols {xi } belong to a noncircular constellation if the expected value of
xi xk is different from zero for certain values of i and k.
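This definition can be checked numerically; a minimal sketch (unit-power BPSK and QPSK alphabets assumed for illustration) shows that the unconjugated second moment E{x²} is nonzero for BPSK and, consequently, retains the carrier phase at second order:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 100_000

bpsk = rng.choice([-1.0, 1.0], N).astype(complex)
qpsk = rng.choice(np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2), N)

# Unconjugated second moment E{x_i x_i}: nonzero only for noncircular alphabets.
m2_bpsk = np.mean(bpsk ** 2)    # exactly 1: BPSK is noncircular
m2_qpsk = np.mean(qpsk ** 2)    # ~0: QPSK is circular at second order

# For a noncircular alphabet, rotating the symbols by phi multiplies E{x^2}
# by exp(2j*phi), so the phase phi can be recovered from second-order data.
phi = 0.3
phase_est = 0.5 * np.angle(np.mean((bpsk * np.exp(1j * phi)) ** 2))
```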
The problem of carrier phase synchronization in case of MSK-type modulations was addressed
in the following paper [Vil04b] and can be consulted in Section 6.2.
• “Self-Noise Free Second-Order Carrier Phase Synchronization of MSK-Type Signals”, J. Villares, G. Vázquez, Proc. of the IEEE Int. Conf. on Communications (ICC 2004). June 2004. Paris (France).
1.4 Self-noise in multivariate problems: interparameter interference
At high SNR, the dominant disturbance is entirely due to the randomness of the received symbols
(i.e., the self-noise). In this high-SNR scenario, the self-noise variance is minimized if the kurtosis
of the data symbols is taken into account. Otherwise, if the Gaussian assumption is imposed,
the variance of the self-noise term increases. However, the self-noise contribution is normally
negligible in digital synchronization and the Gaussian assumption is practically optimal. In
order to test the Gaussian assumption, we decided to study other estimation problems in which
the self-noise term was more critical.
With this purpose, the uniparametric formulation was generalized to encompass important
multivariate estimation problems in the context of digital communications such as direction-of-arrival (DOA) and channel estimation. These problems were selected because the self-noise
contribution was expected to degrade significantly the estimator performance at high SNR.
Hence, these two problems were valuable candidates for examining the Gaussian assumption.
In the DOA estimation problem, the DOA estimator is faced with the self-noise caused by the
user of interest and, in addition, by the other interfering users (multiple access interference). In
the channel estimation problem, the received pulse is severely distorted by the unknown channel
impulse response. Then, the intersymbol interference is enhanced at high SNR and hence the
self-noise variance is amplified.
The formulation of the optimal second-order multiparametric open- and closed-loop estimator
will appear in the IEEE Transactions on Signal Processing next July [Vil05]. The theoretical
material in this article is presented in Chapter 3 (open-loop or large-error estimation) and
Chapter 4 (closed-loop or small-error estimation).
• “Second-Order Parameter Estimation”. J. Villares, G. Vázquez. IEEE Transactions on Signal Processing. July 2005.
As it was expected, the performance of second-order DOA estimators was severely degraded
when the angular separation of the users was reduced because, in that case, the multiple access interference became the dominant impairment. In these singular scenarios, the Gaussian
assumption yielded a significant loss for practical SNRs if the transmitted symbols were drawn
from a constant-modulus constellation such as MPSK or CPM. The Gaussian assumption loss
was a function of the angular separation as well as the number of antennas. All these
important results were presented in the following paper [Vil03b] and are included in Section 6.5.
• “Second-Order DOA Estimation from Digitally Modulated Signals”, J. Villares, G. Vázquez, Proc. of the 37th IEEE Asilomar Conf. on Signals, Systems and Computers, Pacific Grove (USA), November 2003.
In this paper, the problem of tracking the DOA of multiple moving digitally-modulated users
is considered. In this scenario, the tracking condition can be lost at low SNR when two users
approach each other. In this paper, it is shown that this is usually the outcome if the tracker is
forced to cancel out the multiple access interference. On the other hand, if the multiple access
interference is incorporated as another random self-noise term in the tracker optimization, the
optimal second-order tracker is able to maintain the tracking condition even if the users cross
each other [Vil03b].
As it has been explained before, the problem of blind channel estimation was also a promising candidate for testing the Gaussian assumption. Some results are presented in Section 6.4
that confirm the interest of the optimal second-order estimator in the medium-to-high SNR
range when the nuisance parameters have constant modulus. In that case, estimators based on the Gaussian assumption cannot estimate the channel amplitude whereas the optimal solution yields self-noise
free estimates even if the channel amplitude is unknown (Section 6.4). This channel estimation
problem is currently being investigated in case of noncircular constant-modulus transmissions
[LS04][LS05a][LS05b].
1.5
Informative priors: estimation on track
Thus far, all the second-order closed-loop estimators and trackers had been designed and evaluated in the steady-state, that is, assuming that all the parameters were initially captured during the acquisition phase. In fact, once the acquisition is completed, the estimator begins to operate in the small-error regime. The estimator coefficients were precisely optimized under the small-error assumption. However, the acquisition performance had never been included in the estimator optimization.
After this reflection, we were concerned with the optimization of closed-loop second-order
estimators considering both the acquisition (large error) and steady-state (small-error) performance. With this aim, the Kalman filter formulation [And79][Kay93b] was adopted because
it is known to supply the optimal transition from the large to the small error regime when
the parameters and the observations are jointly Gaussian. Evidently, this assumption fails in
most estimation problems in digital communications and the suboptimal extended Kalman filter
(EKF) is only optimal in the steady-state [And79][Kay93b]. Despite this, the EKF provides a systematic and automatic procedure for updating the prior distribution fθ (θ) every time a
new observation is processed. In that way, it is possible to enhance the acquisition performance
without altering the optimal steady-state solution.
The research in this direction yielded the so-called quadratic EKF (QEKF) that extended
the classical EKF to deal with quadratic observations. The QEKF formulation was published
in the following conference paper [Vil04a] and it has been included in Chapter 5.
• “On the Quadratic Extended Kalman Filter”, J. Villares, G. Vázquez. Proc. of the Third IEEE Sensor Array and Multichannel Signal Processing Workshop (SAM 2004). July 2004. Sitges, Barcelona (Spain).
In this paper, the QEKF is designed and simulated for the aforementioned DOA estimation
problem. The most important conclusion is that, at high SNR, the Gaussian assumption is also
suboptimal during the acquisition phase if the data symbols are drawn from a constant-modulus
constellation. In that way, the acquisition time can be notably reduced if the QEKF takes into
account the kurtosis of the data symbols. Besides, the Gaussian assumption loss at high SNR is
shown to persist in the steady-state even if the tracker observation time is increased to infinity
(Chapter 5).
1.6 Limiting asymptotic performance
The last remark on the QEKF asymptotic performance persuaded us to study in detail the
performance limits in second-order estimation. The objective was to determine the asymptotic
conditions for the Gaussian assumption to apply. The asymptotic analysis confirmed that the
Gaussian assumption was optimal at low SNR but it was suboptimal at high SNR if the nuisance
parameters belonged to a constant-modulus alphabet. Finally, the performance of second-order
closed-loop estimators was evaluated when the number of samples went to infinity. The conclusion was that the Gaussian assumption applies in digital synchronization, and in DOA estimation
if the number of antennas goes to infinity. On the other hand, the Gaussian assumption fails for
the medium-to-high SNR range in the problem of channel estimation and for DOA estimation in
case of finite sensor arrays. All these asymptotic results were finally collected and are presented
for the first time in Chapter 7.
1.7 Thesis Outline
The structure of the dissertation is presented next. The main contents and contributions are
described chapter by chapter.
Chapter 2: Elements on Estimation Theory.
In this chapter, the most important concepts from the estimation theory are reviewed. The
problem of parameter estimation in the presence of nuisance parameters is introduced and
motivated. The maximum likelihood (ML) estimator is presented and the most important ML-based approaches in the literature are described. Special emphasis is put on the Gaussian ML
(GML) estimator because it converges to the ML estimator at low SNR and yields the conditional
ML (CML) solution at high SNR. The important point is that all these ML-based estimators
are quadratic in the observation. The GML estimator is actually the ML estimator in case of
having Gaussian nuisance parameters. However, the optimal second-order estimator is normally
unknown if the nuisance parameters are not Gaussian. This was actually the motivation for this
thesis.
The iterative implementation of the aforementioned ML-based estimators is considered and
the utilization of closed-loop schemes motivated. Finally, a survey on estimation bounds is
included for the interested reader.
Chapter 3: Optimal Second-Order Estimation.
In this chapter, the optimal second-order estimator is formulated from the known distribution
of both the wanted parameters and the nuisance parameters. The Bayesian formulation and two
different optimization criteria are considered. In the first case, the estimator mean square error
(MSE) is minimized in the Bayesian sense, that is, averaging the estimator MSE according to
the assumed prior distribution. In the second case, the estimator variance is minimized subject
to the minimum bias constraint. Again, the variance and bias are averaged by means of the
prior distribution.
The resulting large-error or open-loop estimators are evaluated for the problem of blind
frequency estimation. The minimum MSE solution is shown to make a trade-off between the
bias and variance terms. On the other hand, the minimum bias constraint is unable to completely eliminate the bias contribution even if the observation time is increased. Accordingly, the ultimate performance of quadratic open-loop estimators is usually limited by the residual bias.
Chapter 4: Optimal Second-Order Small-Error Estimation.
In this chapter, the design of closed-loop second-order estimators is addressed. Assuming that
all the parameters have been previously acquired, closed-loop estimators are meant to compensate
for small errors. In this context, the optimal second-order small-error estimator is derived from
the minimum variance estimator in Chapter 3 by considering an extremely informative prior.
The resulting estimator is the best quadratic unbiased estimator (BQUE) and its variance is the
(realizable) lower bound on the variance of any sample covariance based parameter estimator.
The BQUE is proved to exploit the kurtosis matrix of the nuisance parameters whereas the
Gaussian ML estimator ignores this information.
Later, the conditions for second-order identifiability are analyzed and some important remarks are made about the so-called interparameter interference in multiuser scenarios. The
frequency estimation problem is chosen once again to illustrate the main results of the chapter.
Some simulations are selected to illustrate how the Gaussian assumption fails at high SNR.
Chapter 5: Quadratic Extended Kalman Filtering.
In this chapter, the well-known extended Kalman filter (EKF) is adapted to deal with
quadratic observations. The coefficients of the quadratic EKF are calculated from the actual
distribution of nuisance parameters. The optimal tracker is shown to exploit the kurtosis matrix
of the nuisance parameters.
The Gaussian assumption is evaluated during the acquisition and the steady-state for the
problem of DOA estimation. It is shown that the acquisition time and the steady-state variance
can be reduced at high SNR if the transmitted symbols are drawn from a constant-modulus
alphabet (e.g., MPSK or CPM) and this information is incorporated.
Chapter 6: Case Studies.
In this chapter, the optimal second-order small-error estimator deduced in Chapter 4 is
applied to some relevant estimation problems. In the first section, some contributions in the
field of non-data-aided synchronization are presented. Specifically, Section 6.1 is devoted to the global optimization of second-order closed-loop synchronizers and the design of open-loop timing synchronizers in the frequency domain. In Section 6.2, the problem of second-order carrier
phase synchronization is addressed in case of noncircular transmissions. In this section, the ML
estimator is shown to be quadratic at low SNR for MSK-type modulations. Moreover, second-order self-noise free estimates are achieved at high SNR exploiting the non-Gaussian structure
of the digital modulation.
In Section 6.3, the problem of time-of-arrival estimation in wireless communications is studied. The frequency-selective multipath is shown to increase the number of nuisance parameters
and the Gaussian assumption is shown to apply in this case study. In Section 6.4, the classical
problem of blind channel identification is dealt with. The channel amplitude is shown not to be identifiable unless the transmitted symbols belong to a constant-modulus constellation and this
information is exploited by the estimator.
Finally, the problem of angle-of-arrival estimation in the context of cellular communications
is addressed in Section 6.5. The Gaussian assumption is clearly outperformed for practical SNRs
in case of constant-modulus nuisance parameters and closely spaced sources. In this section, the
importance of the multiple access interference (MAI) is emphasized and MAI-resistant second-order DOA trackers are derived and evaluated.
Chapter 7: Asymptotic Studies.
In this chapter, analytical expressions are obtained for the asymptotic performance of the
second-order estimators presented in Chapter 3 and Chapter 4. Firstly, the low SNR study
concludes that the nuisance parameters distribution is irrelevant at low SNR and, therefore,
the Gaussian assumption is optimal. On the other hand, the high SNR study states that the
Gaussian assumption does not apply in case of constant-modulus nuisance parameters. This
conclusion is related to the eigendecomposition of the nuisance parameters kurtosis matrix.
Finally, the large sample study confirms that the Gaussian assumption is optimal in digital
synchronization if the observation time goes to infinity. Likewise, the Gaussian assumption
applies in DOA estimation if the number of antennas goes to infinity. However, the Gaussian
assumption cannot be applied —even if the number of snapshots is infinite— in case multiple constant-modulus signals impinge on a finite array. Regarding the channel estimation problem,
the asymptotic study indicates that second-order estimates could be improved by considering
the actual distribution of the nuisance parameters.
Chapter 8: Conclusions.
This chapter concludes and summarizes the main results of this thesis. To finish, some topics
for further research are proposed.
Chapter 2
Elements on Estimation Theory
Estimation theory deals with the basic problem of inferring some relevant features of a random
experiment based on the observation of the experiment outcomes. In some cases, the experiment
mechanism is totally unknown to the observer and the use of nonparametric estimation methods
is necessary. The term “nonparametric” means that the observed experiment cannot be modelled
mathematically. Let us consider, for instance, the classical problem of spectral analysis that
consists in computing the power spectral density of the observed signal from a finite sample.
The performance of nonparametric methods is usually unsatisfactory when the observation time is limited. This situation is actually very common because the experiment output is only temporarily available; the experiment is not stationary; or the observer must supply the estimate in a short time. To design more efficient estimation techniques, it is advisable to first find a convenient mathematical model for the studied experiment. The result of the experiment is thus a function of a finite number of unknown parameters, say θ, and other random terms forming
the vector w. The vector w collects all the nuisance terms in the model that vary randomly
during the observation time as, for example, the measurement noise.
The objective is therefore to find the minimal parameterization concentrating, as much as possible, the uncertainty about the experiment. In those fields dealing with natural phenomena, the parametrization of the problem is definitely the most difficult point and, actually, the ultimate goal of scientists working in physics, sociology, economics, among others. Fortunately, the parameterization of human-made systems is normally accessible. In particular, in communication engineering, the received signal is known except for a finite set of parameters that must be estimated before recovering the transmitted information. Likewise, in radar applications, the received signal is known except for the time of arrival and, possibly, some other nuisance parameters. In the following, we will focus exclusively on parametric estimation methods assuming
that we are provided with a convenient parameterization or signal model.
In some of the examples above, it is possible to act on the experiment by introducing an
excitation signal. In that case, the random experiment can be seen as an unknown system that
is identified by observing how the system reacts to the applied excitation. This alternative
perspective is normally adopted in system engineering and, specifically, in the field of automatic
control. Unfortunately, in some scenarios, the observer is unaware of the existing input signal and
blind system identification is required. For example, in digital communications, the transmitted
symbols are usually unknown at the receiver side. This thesis is mainly concerned with blind
estimation problems in which the problem parameterization includes the unknown input.
Thus far, the formulation is rather general; the observation y ∈ ℂ^M is a given function of the input x ∈ ℂ^K, the vector of parameters θ ∈ ℝ^P and the random vector w ∈ ℂ^M of arbitrary known distribution. Formally, the general problem representation is considered in the following equation:

y = a(x, θ, w)    (2.1)

where the function a(·) should be univocal with respect to θ and x, that is, it should be possible to recover θ and x from y if the value of w were known. In that case, the estimation problem is not ambiguous. The basic problem is that multiple values of θ, x and w may yield the same observation y. Otherwise, it would not be an estimation problem but an inversion problem consisting in finding the inverse of a(·).
Then, the objective is to estimate the value of θ based on the observation of y without knowing the input x and the random vector w. Thus, the entries of x appear as nuisance parameters
increasing the uncertainty on the vector of parameters θ. In general, the vector of nuisance
parameters would include all the existing parameters which are not of the designer’s interest,
including the unknown inputs. For example, the signal amplitude is a nuisance parameter when
estimating the time-of-arrival in radar applications. This thesis is mainly concerned with the
treatment of these nuisance parameters in the context of digital communications.
An estimator of θ is a given function z(·) of the random observation y,

θ̂ = z(y),

yielding a random error e = θ̂ − θo, with θo the true value of θ. Evidently, the aim is to minimize the magnitude of e. Several criteria are proposed in the literature, each minimizing a different cost function C(e) as, for example, the mean square error ‖e‖² or the maximum error max{e}.
On the other hand, a vast number of estimators have been formulated by proposing ad hoc
functions z (·) whose performance is evaluated next. Some of them are briefly presented in
the following sections. For more details, the reader is referred to the excellent textbooks on
parametric estimation in the bibliography [Tre68][Sch91a][Kay93b].
2.1 Classical vs. Bayesian Approach
There are two important questions that must be addressed before designing a convenient estimator. The first one is why some terms of the signal model are classified as random variables (w)
whereas others are deterministic parameters (θ). The second question is whether the nuisance
parameters in x should be modelled as random or deterministic variables.
In the classical estimation theory, the wanted parameters θ are deterministic unknowns that
are constant along the observation interval. On the other hand, those unwanted terms varying
“chaotically” along the observation interval are usually modelled as random variables (e.g., the
measurement noise, the signal amplitude in fast fading scenarios, the received symbols in a
digital receiver, etc.).
Regarding the vector x, the nuisance parameters can be classified as deterministic constant
unknowns, say xc , or random variable unknowns, say xu . In the random case, we will assume
hereafter that the probability density function of xu is known. However, if this information
were not available, the entries of xu could be considered deterministic unknowns and estimated
together with θ.1
In the classical estimation theory, the likelihood function fy (y; xc , θ) supplies all the statistical information for the joint estimation of xc and θ. If some nuisance parameters are random, say xu , the conditional likelihood function fy/xu (y/xu ; xc , θ) must be averaged with respect to the prior distribution of xu , as indicated next:

fy (y; xc , θ) = Exu {fy/xu (y/xu ; xc , θ)} = ∫ fy/xu (y/xu ; xc , θ) fxu (xu ) dxu .    (2.2)
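The averaging in (2.2) can be sketched numerically. The following toy model (a single sample y = x·exp(jθ) + w with an equiprobable BPSK nuisance symbol x and an assumed noise variance) is an illustration, not a model taken from the thesis:

```python
import numpy as np

sigma2 = 0.5                          # complex-noise variance (assumed)
symbols = np.array([-1.0, 1.0])       # equiprobable BPSK nuisance alphabet

def cond_lik(y, x, theta):
    """Conditional likelihood f_{y/x}(y/x; theta) for y = x*exp(j*theta) + w."""
    return np.exp(-np.abs(y - x * np.exp(1j * theta)) ** 2 / sigma2) / (np.pi * sigma2)

def uncond_lik(y, theta):
    """Eq. (2.2): average the conditional likelihood over the prior of x."""
    return np.mean([cond_lik(y, x, theta) for x in symbols])

# With x unknown, the marginal likelihood still peaks near the phase of y
# (theta is restricted to (-pi/2, pi/2) to avoid the BPSK pi-ambiguity).
y = np.exp(1j * 0.4) + 0.1
thetas = np.linspace(-np.pi / 2, np.pi / 2, 721)
theta_hat = thetas[np.argmax([uncond_lik(y, t) for t in thetas])]
```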
On the other hand, modeling the constant nuisance parameters as random variables is rather
controversial. For example, the received carrier phase is almost constant when estimating the
signal timing in static communication systems. Even if these parameters come from a random
experiment and their p.d.f. is perfectly known, we are only observing a particular realization of
xc , which is most probably different from their mean value. Therefore, modeling these nuisance
parameters as random variables might yield biased estimates of θ. Evidently, this bias will be
cancelled out if several realizations of y were averaged, but only one realization is available!
This controversy is inherent to the Bayesian philosophy [Kay93b, Ch. 10]. In the Bayesian or stochastic approach, all the parameters —including the vector of wanted parameters θ— are modelled as random variables of known a priori distribution or prior. Then, the resulting estimators are designed to be optimal “on the average”, that is, averaging θ̂ with respect to the prior distributions of θ and x. Actually, all the classical concepts such as bias, variance, MSE, consistency and efficiency must be reinterpreted in the Bayesian sense.
¹Notice that this is not the only solution. For example, we can assume a non-informative prior for xu or, alternatively, we can apply Monte Carlo methods to evaluate numerically the unknown distribution of xu [Mer00][Mer01].
Bayesian estimators are able to outperform classical estimators when they are evaluated “on
the average”, mainly when the observation y is severely degraded in noisy scenarios. This is
possible because Bayesian estimators are able to exploit the a priori information on the unknown
parameters. Anyway, as S. M. Kay states in his book, ‘It is clear that comparing classical and
Bayesian estimators is like comparing apples and oranges’ [Kay93b, p. 312].
Bearing in mind the above explanation, let us consider that y, x, and θ are jointly distributed
random vectors. In that case, the whole statistical information about the parameters is given
by the joint p.d.f.
fy,x,θ (y, x, θ) = fy/x,θ (y/x, θ) fx (x) fθ (θ) ,
(2.3)
assuming that x and θ are statistically independent. The first conditional p.d.f. in (2.3) is
numerically identical to the conditional likelihood function fy/xu (y/xu ; xc , θ) in (2.2) but it
highlights the randomness of xc and θ in the adopted Bayesian model. The other terms fx (x) =
fxc (xc ) fxu (xu ) and fθ (θ) are the a priori distributions of x and θ, respectively.
Notice that the classical and Bayesian theories coincide in case of non-informative priors,
i.e., when fy (y; xc , θ) is significantly narrower than fxc (xc ) and fθ (θ) [Kay93b, Sec. 10.8].
In the sequel and for the sake of simplicity, all the nuisance parameters will be modelled as
random variables or, in other words, x = xu and xc = ∅. Thus,
fy (y; θ) = Ex fy/x,θ (y/x, θ)
will be referred to as the unconditional or stochastic likelihood function in opposition to the
joint or conditional likelihood function
fy (y; x, θ) = fy/x,θ (y/x, θ) ,
which is also referred to as the deterministic likelihood function in the literature.
2.2 MMSE and MVU Estimation
The ultimate goal in the classical estimation theory is the minimization of the estimator mean square error (MSE), that is given by

MSE(θ) ≜ Ey {‖θ̂ − θ‖²} = Ey {‖z(y) − θ‖²}

where Ey {·} involves, implicitly, the expectation over the random vectors w and x. The MSE can be decomposed as

MSE(θ) = BIAS²(θ) + VAR(θ)
where the estimator bias and variance are given by

BIAS²(θ) = ‖Ey {θ̂} − θ‖²

VAR(θ) = Ey {‖θ̂ − Ey {θ̂}‖²}.
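A quick Monte Carlo sketch of this decomposition, using a deliberately biased (shrunk sample-mean) estimator as a hypothetical example:

```python
import numpy as np

rng = np.random.default_rng(2)
theta_o = 2.0                 # true parameter (assumed for the experiment)
M, trials = 10, 100_000

# A deliberately biased estimator: the shrunk sample mean z(y) = 0.8 * mean(y).
y = theta_o + rng.standard_normal((trials, M))
theta_hat = 0.8 * y.mean(axis=1)

mse = np.mean((theta_hat - theta_o) ** 2)
bias2 = (np.mean(theta_hat) - theta_o) ** 2
var = np.var(theta_hat)
# mse equals bias2 + var up to floating-point rounding; for this example the
# theoretical values are BIAS^2 = (0.2 * 2)^2 = 0.16 and VAR = 0.64/10 = 0.064.
```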
The minimum MSE (MMSE) estimator finds a trade-off between the bias and the variance
for every value of θ. Unfortunately, the bias term is usually a function of θ and, consequently,
the MMSE estimator is generally not realizable because it depends on θo [Kay93b, Sec. 2.4.]. In
general, any estimator depending on the bias term will be unrealizable in the classical framework.
This limitation suggests focusing uniquely on unbiased estimators holding that BIAS²(θ) = 0 for all θ. Thus, the estimator MSE coincides with its variance and the resulting estimator is
usually referred to as the minimum variance unbiased (MVU) estimator [Kay93b, Ch. 2]. The
MVU estimator minimizes the variance subject to the unbiased constraint for every θ.
The Rao-Blackwell-Lehmann-Scheffé theorem facilitates a procedure for finding the MVU estimator [Kay93b, Ch. 5]. Unfortunately, this method is usually tedious and sometimes fails to produce the MVU estimator. Notice that the existence of the MVU estimator is not guaranteed either. Despite these difficulties, the MVU formulation is widely adopted because the maximum likelihood principle is known to provide approximately the MVU estimator under mild regularity conditions [Kay93b, Ch. 7].
If the classical framework is abandoned in favour of the Bayesian approach, the dependence
of MSE(θ) on the true parameter θ can be solved by averaging with respect to the prior fθ (θ).
Therefore, the Bayesian MMSE estimator can be formulated as the minimizer of

Eθ {MSE(θ)} = Eθ {Ey {‖θ̂ − θ‖²}} = ∫ Ey {‖θ̂ − θ‖²} fθ (θ) dθ,    (2.4)

that is known to be the mean of the posterior p.d.f. fθ/y (θ/y) [Kay93b, Eq. 10.5], i.e.,

θ̂MMSE = Eθ/y {θ/y} = fy⁻¹ (y) ∫ θ fy (y; θ) fθ (θ) dθ    (2.5)
where the Bayes’ rule is applied to write fθ/y (θ/y) in terms of the likelihood function and the prior:

fθ/y (θ/y) = fy (y; θ) fθ (θ) / fy (y) = fy (y; θ) fθ (θ) / ( ∫ fy (y; θ) fθ (θ) dθ ).
The Bayesian MMSE estimator is known to minimize the MSE “on the average” (2.4). This
means that the actual MSE will be high if the actual parameter θo is unlikely, and small if fθ (θ)
is distributed around the true parameter θo .
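For a scalar Gaussian toy model (an illustrative assumption, not a model taken from the thesis), the posterior mean in (2.5) can be evaluated on a grid and matches the well-known closed form y/2:

```python
import numpy as np

# Scalar toy model: y = theta + w with w ~ N(0,1) and prior theta ~ N(0,1).
thetas = np.linspace(-8.0, 8.0, 4001)
prior = np.exp(-thetas ** 2 / 2)

def mmse_estimate(y):
    lik = np.exp(-(y - thetas) ** 2 / 2)         # f_y(y; theta)
    post = lik * prior                           # numerator of Bayes' rule
    return np.sum(thetas * post) / np.sum(post)  # posterior mean, eq. (2.5)

# For this Gaussian pair the posterior mean is y/2: the prior pulls the raw
# observation towards the prior mean (zero), which is the Bayesian trade-off.
```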
2.3 Maximum Likelihood Estimation
Although there are other relevant criteria, the maximum likelihood (ML) principle has become
the most popular parametric method for deducing statistically optimal estimators of θ. In the
studied signal model (2.1), the observation is clearly a random variable due to the presence of
the random vectors w and x. Actually, we have a single observation yo of this random variable
from which the value of θ must be inferred. The ML estimator is the one choosing the value of
θ —and implicitly the value of w and x— that makes yo the most likely observation. Formally,
if fy (y; θ) is the probability density function of the random vector y parameterized by θ, the
ML estimator is given by
θ̂UML = arg max_θ {fy (yo ; θ)} ,    (2.6)
where yo is the vector of observed data2 and
fy (y; θ) = Ex {fy/x,θ (y/x, θ)} = ∫ fy (y; x, θ) fx (x) dx    (2.7)
is known as the unconditional likelihood function. Likewise, the estimator in (2.6) is known
as the unconditional or stochastic maximum likelihood (UML) estimator because the nuisance
parameters are modelled as random unknowns (Section 2.1). If the nuisance parameters are
really random variables, the UML estimator is actually the true ML estimator of θ.
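As a sketch, the UML estimator (2.6)-(2.7) can be evaluated by grid search for a hypothetical blind BPSK phase-estimation model; the model and its parameters below are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
sigma2, K, theta_o = 0.1, 64, 0.3     # noise variance, samples, true phase
x = rng.choice([-1.0, 1.0], K)        # random BPSK nuisance parameters
w = np.sqrt(sigma2 / 2) * (rng.standard_normal(K) + 1j * rng.standard_normal(K))
y = x * np.exp(1j * theta_o) + w

def uml_loglik(theta):
    """log f_y(y; theta): eq. (2.7) evaluated per sample for the BPSK prior."""
    d_plus = np.exp(-np.abs(y - np.exp(1j * theta)) ** 2 / sigma2)
    d_minus = np.exp(-np.abs(y + np.exp(1j * theta)) ** 2 / sigma2)
    return np.sum(np.log(0.5 * (d_plus + d_minus)))

thetas = np.linspace(-np.pi / 2, np.pi / 2, 1001)   # avoid the pi-ambiguity
theta_uml = thetas[np.argmax([uml_loglik(t) for t in thetas])]
```

Here the expectation over x is a two-term sum per sample, so (2.7) is exact; in general this average is the step that has no analytical solution.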
Alternatively, the nuisance parameters can be modelled as deterministic unknowns —as done
for θ. In the context of the ML theory, the deterministic or conditional model is unavoidable
when x is a constant unknown or there is no prior information about x (Section 2.1). Moreover,
even if the nuisance parameters are actually random, the CML approach is often adopted if the
expectation in (2.7) cannot be solved analytically. In that case, however, the CML solution is
generally suboptimal because it ignores the prior information about x. Thus, the deterministic
or conditional maximum likelihood (CML) estimator is formulated as follows

θ̂CML = arg max_θ {max_x fy (y; x, θ)} = arg max_θ {fy (y; x̂ML , θ)}    (2.8)

where fy (y; x, θ) is the joint or conditional likelihood function and

x̂ML = arg max_x {max_θ fy (y; x, θ)}    (2.9)

is the ML estimator of x.
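For the same kind of hypothetical BPSK phase model used above for illustration, the inner maximization in (2.8) reduces to a per-sample hard decision, which makes the CML cost easy to sketch:

```python
import numpy as np

rng = np.random.default_rng(4)
sigma2, K, theta_o = 0.1, 64, 0.3     # assumed noise variance, samples, phase
x = rng.choice([-1.0, 1.0], K)
w = np.sqrt(sigma2 / 2) * (rng.standard_normal(K) + 1j * rng.standard_normal(K))
y = x * np.exp(1j * theta_o) + w

def cml_cost(theta):
    """Inner max of eq. (2.8): per-sample hard decision between x = +1 and -1."""
    d = np.minimum(np.abs(y - np.exp(1j * theta)) ** 2,
                   np.abs(y + np.exp(1j * theta)) ** 2)
    return -np.sum(d)          # maximizing the likelihood = minimizing distance

thetas = np.linspace(-np.pi / 2, np.pi / 2, 1001)
theta_cml = thetas[np.argmax([cml_cost(t) for t in thetas])]
x_ml = np.sign(np.real(y * np.exp(-1j * theta_cml)))   # eq. (2.9): ML detector
```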
Comparing the UML and CML solutions in (2.6) and (2.8), we observe that in the unconditional model the nuisance parameters are averaged out using the prior fx (x) whereas in the conditional model fy (y; x, θ) is compressed by means of the ML estimate of x, namely x̂ML.
²In the sequel, the random variable y and the observation yo will be indistinctly named y for the sake of simplicity.
Also, it is worth noting that, if the nuisance parameters belong to a discrete alphabet, we are dealing with a detection problem and x̂ML is actually the ML detector. It is found that the estimation of θ is significantly improved by exploiting the discrete³ nature of x. This aspect is crucial when designing estimation techniques for digital communications in which x is the vector of transmitted symbols.
Finally, the following alternative estimator is proposed:

θ̂_CML2 = arg max_θ max_x f_y(y; x, θ) f_x(x) = arg max_θ { f_y(y; x̂_MAP, θ) },    (2.10)

where

x̂_MAP = arg max_x max_θ f_y(y; x, θ) f_x(x)    (2.11)

is the Maximum a Posteriori (MAP) detector exploiting the prior distribution of x. Notice that θ̂_CML = θ̂_CML2 in case of equally likely nuisance parameters.
2.3.1 Decision Directed ML Estimation
Focusing on those estimation problems dealing with discrete nuisance parameters, the conditional ML estimators in equations (2.8) and (2.10) exploit the hard decisions provided by the ML
or MAP detectors of x, respectively. In the context of digital communications, these estimation
techniques are referred to as decision directed (DD). Decision-directed estimators are usually
implemented iterating equations (2.8) and (2.9) for the ML detector, or (2.10) and (2.11) for
the MAP detector. The main drawback of iterative algorithms is the uncertain convergence to
the global maximum of fy (y; x, θ).
In some kinds of problems, decision-directed methods are efficient at high SNR. For example, in digital communications, DD synchronizers are known to attain the Cramér-Rao bound at high SNR [And94][Moe98]. However, when the noise variance is high, hard decisions are unreliable and it is better to compute soft decisions on the nuisance parameters. In digital communications, the estimation techniques based on soft decisions about the transmitted symbols are usually known as non-data-aided (NDA) [Men97]. Indeed, this interpretation is adopted in [Vaz00][Rib01b] to describe some ML-based NDA synchronizers.
In [Noe03], the Expectation-Maximization (EM) algorithm [Dem77][Fed88] is invoked to prove that UML estimation requires soft decisions from the MAP detector. More specifically, the soft information on the nuisance parameters is introduced by means of the a posteriori probabilities f_x(a_1/y, θ), ..., f_x(a_I/y, θ), where {a_i}_{i=1,...,I} are all the possible values of x. The EM algorithm is then applied to obtain an iterative implementation of the UML estimator following the so-called Turbo principle [Mie00][Ber93]. In these schemes, the estimator (2.6) is assisted with the decoder soft decisions and vice versa. The EM foundation ensures the convergence to the UML solution under fairly general conditions. The required soft decisions are provided by the optimal MAP decoder proposed in [Bah74], which supplies at each iteration the a posteriori probability f_x(x/y, θ) for every possible value of x.

³In order to unify the study of continuous and discrete nuisance parameters, the prior f_x(x) will be used indistinctly in both cases. To do so, if x ∈ {a_1, ..., a_I} with I the alphabet size, f_x(x) will be a finite sum of Dirac deltas, i.e., f_x(x) = Σ_{i=1}^{I} p(a_i) δ(x − a_i), with p(a_i) the probability of a_i.
It is worth noting that the UML estimator is able to exploit the statistical dependence
introduced by the encoder whereas the conditional approach in (2.8)-(2.10) does not. In the
conditional model, the estimator is only informed that the codeword x is a redundant vector
and, thus, it belongs to a reduced subset or codebook. In addition, the UML estimator is able to
exploit the statistical dependence of the nuisance parameters in order to reduce their uncertainty
at low SNR.
Another suitable implementation of the conditional estimators (2.8)-(2.10) is to assign a different estimator θ̂ to each survivor path in the Viterbi decoder, corresponding to a tentative sequence of symbols x. The estimator output is then used to recompute the metric of the associated path. These kinds of methods are usually referred to in the literature as Per Survivor Processing (PSP) [Pol95]. It can be shown that this approach attains the performance of the CML estimator in (2.8).
2.3.2 Asymptotic properties
The importance of the ML theory is that, under mild conditions, it supplies the minimum variance unbiased (MVU) estimator if the observed vector is sufficiently large. This result is a consequence of the asymptotic efficiency of the ML criterion, which is known to attain the Cramér-Rao lower bound as the number of observations increases (Section 2.6.1). Therefore, the ML theory provides a systematic procedure to formulate the MVU estimator in most estimation problems of interest.
In this section, the most relevant properties of the ML estimator are enunciated [Kay93b,
Sec. 7B]. If the observation size goes to infinity (M → ∞), it can be shown that
Property 1. The ML estimator is asymptotically Gaussian distributed with mean θ_o and covariance B_CRB(θ_o), where θ_o is the true parameter and B_CRB(θ_o) is the Cramér-Rao lower bound evaluated at θ_o (Section 2.6.1). This means that the ML estimator is asymptotically unbiased and efficient or, in other words, the ML estimator leads asymptotically (M → ∞) to
the minimum variance unbiased (MVU) estimator with

E_y{ θ̂_ML } → θ_o
E_y{ ( θ̂_ML − θ_o )( θ̂_ML − θ_o )^H } → B_CRB(θ_o).

Figure 2.1: This picture illustrates the significance of the term outlier in the context of ML estimation. [Figure: f_y(y; θ) in the small-error case, with a single maximum θ̂_ML close to θ_o, and in the large-error case, where a distant outlier maximum displaces θ̂_ML far from θ_o.]

Property 2. The ML estimator is asymptotically consistent, meaning that θ̂_ML → θ_o as the observation size goes to infinity. This property implies that the CRB tends to zero as M is increased, i.e., B_CRB(θ_o) → 0.
These properties are verified if the regularity condition

E_y{ ∂/∂θ ln f_y(y; θ) }|_{θ=θ_o} = 0    (2.12)
is guaranteed for every θ_o. Fortunately, most problems of interest verify the above regularity condition. The implicit requirement is that the support in y of f_y(y; θ) does not depend on the parameter θ, so that the integration limits in E_y{·} are independent of θ. This condition is needed to have unbiased estimates, since (2.12) guarantees that E_y{ln f_y(y; θ)} has a maximum at the true parameter θ_o whatever the value of θ_o.
As proved in [Kay93b, Theorem 7.5], the first property on the optimality of the ML estimator
is satisfied even for finite observations provided that the signal model is linear in θ and x.
However, a large number of estimation problems are nonlinear in the parameter vector θ. In
that case, it is very important to determine how many samples (M) are required to guarantee
the ML asymptotic efficiency (property 1). Fortunately, in most problems of interest this value
is not excessive. It is found that the minimum M depends on the signal model at hand as well
as the variance of the noise term w, say σ²_w. If the value of σ²_w is low and/or M is large, the log-likelihood function ln f_y(y; θ) exhibits a parabolic shape —quadratic form— with a unique maximum near the true parameter θ_o. Only in this small-error regime is the ML estimator statistically efficient, holding property 1.

Figure 2.2: This picture illustrates the existence of an SNR threshold in nonlinear estimation problems. This threshold divides the SNR axis into the small-error regime, in which the CRB is attained, and the large-error regime, in which efficient estimators do not exist. Notice that the threshold position can be moved to the left by increasing the observation size. [Figure: estimator variance versus SNR; the ML variance follows the CRB above the SNR threshold and departs from it in the large-error region below the threshold.]
On the other hand, if the value of σ²_w is high and/or M is not sufficiently large, the likelihood function f_y(y; θ) becomes multimodal and large errors are committed when the level of a distant maximum or outlier exceeds the true parameter maximum (Fig. 2.1). In this large-error regime, the variance of the ML estimator departs abruptly from the CRB. It is found that the estimator enters the large-error regime if the noise variance σ²_w exceeds a given threshold. This threshold can be augmented (i.e., σ²_w greater) if the observation size is increased and, therefore, the large-error region disappears as the observation size goes to infinity. This is actually the sense of the ML asymptotic efficiency (property 1).
The existence of a low-SNR threshold in nonlinear estimation problems suggests distinguishing between the small-error and large-error scenarios (Fig. 2.2). In the first case, ML estimators are efficient and, hence, they attain the CRB (Section 2.6.1). Thus, the ML principle becomes the systematic way of deducing the MVU estimator in the small-error regime. Moreover, in the small-error case, the ML estimator is also optimal in terms of mean square error (Section 2.2). This conclusion is important because MMSE estimators are generally not realizable since they depend on the unknown parameter θ_o.
On the other hand, efficient estimators do not exist in the large-error case and other lower bounds are needed to take into account the existence of large errors —or outliers— and predict the threshold effect (Section 2.6). In this large-error regime, unbiased estimators are generally not
optimal from the MSE point of view and the MMSE solution establishes a trade-off between the
variance and bias contribution (Section 2.2). In this context, the Bayesian theory allows deducing
realizable estimators minimizing the so-called Bayesian MSE, which is the MSE averaged over
all the possible values of the parameter θ [Kay93b, Sec. 10].
In this thesis, Chapter 3 is devoted to the design of optimal second-order large-error estimators, whereas these results are particularized in Chapter 4 to formulate the optimal second-order small-error estimator.
To conclude this brief introduction to the maximum likelihood theory, two additional properties are presented next. These properties are satisfied even if the observation interval is finite.
Property 3. Whenever an efficient estimator exists, it corresponds to the ML estimator.
In other words, if the MVU estimator attains the CRB, the ML estimator is also the MVU
estimator. Otherwise, if the MVU variance is higher than the CRB, nothing can be stated
about the optimality of the ML estimator for finite observations.
Property 4. The ML estimator is invariant in the sense that, if θ̂_ML stands for the ML estimator of θ, the ML estimator of α = g(θ) is simply α̂_ML = g(θ̂_ML) for any one-to-one function g(·). Otherwise, if g(·) is not one-to-one, α̂_ML maximizes f_y(y; α), which is obtained as max_θ f_y(y; θ) subject to g(θ) = α [Kay93b, Th. 7.2].
2.4 Linear Signal Model
The formulation of parameter estimation techniques from the general model introduced in (2.1)
is mostly fruitless. Accordingly, in the following, the focus will be on those linear systems
corrupted by an additive Gaussian noise, holding that
y = A(θ) x + w    (2.13)
where x ∈ C^K is the system input forming the vector of nuisance parameters, w ∈ C^M is the Gaussian noise vector and A(θ) ∈ C^{M×K} is the system response parameterized by the vector θ ∈ R^P. Despite its simplicity, the adopted linear model is really important because it appears in a vast number of engineering applications. In the context of digital communications, this model applies to any linear modulation as well as to continuous phase modulations (CPM) thanks to Laurent's expansion [Lau86][Men97, Sec. 4.2] (Section 6.1.2).
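As an illustration, the model (2.13) can be simulated directly. In the following Python sketch, the sizes, the mixing matrix A(θ) and the phase parameterization are hypothetical choices made for the example; only the structure y = A(θ)x + w, the unit-power QPSK nuisance parameters and the circular Gaussian noise follow the text:

```python
import numpy as np

rng = np.random.default_rng(0)
M, K = 16, 4          # observation length and number of nuisance parameters
theta = 0.3           # wanted parameter (here an illustrative phase)

# Hypothetical system response: a fixed M x K matrix rotated by the phase
# theta; any smooth parametric A(theta) fits the model in (2.13).
A0 = (rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))) / np.sqrt(2)
A = lambda th: A0 * np.exp(1j * th)

# QPSK nuisance parameters: zero-mean, unit-power, uncorrelated and circular
x = np.exp(1j * (np.pi / 4 + np.pi / 2 * rng.integers(0, 4, K)))

# Proper (circular) Gaussian noise with covariance R_w = sigma2 * I_M
sigma2 = 0.1
w = np.sqrt(sigma2 / 2) * (rng.standard_normal(M) + 1j * rng.standard_normal(M))

y = A(theta) @ x + w              # observation, equation (2.13)
R_hat = np.outer(y, y.conj())     # sample covariance matrix y y^H
```

The sample covariance matrix R̂ computed in the last line is the statistic on which all the second-order estimators discussed below operate.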
We will assume that the noise vector in (2.13) is zero-mean and its covariance matrix is a priori known, that is,

E{w} = 0,    E{w w^H} = R_w,

with R_w a given full-rank matrix. Furthermore, we will assume that w is a proper or circular random vector holding that E{w w^T} = 0 [Sch03][Pic96]. The statistical distribution of the noise samples is normal (Gaussian), although the results in the following chapters could be easily extended to admit any other noise distribution. Finally, the noise variance is defined as

σ²_w ≜ Tr(R_w) / M,

which is the variance of the noise samples [w]_m if they are identically distributed. Additionally, we introduce the matrix N, which is defined as

N ≜ σ_w^{-2} R_w.
In the unconditional model, the nuisance parameters are modelled as random variables of known probability density function f_x(x) with zero mean and uncorrelated entries⁴, meaning that

E{x} = ∫ x f_x(x) dx = 0
E{x x^H} = ∫ x x^H f_x(x) dx = I_K,

where f_x(x) would be composed of a finite number of Dirac deltas in case of discrete nuisance parameters. On the other hand, the nuisance parameters are possibly improper random variables with E{x x^T} ≠ 0 [Sch03][Pic96]. This consideration is especially important in digital communications because some relevant modulations (e.g., BPSK and CPM) are actually improper or noncircular, i.e., E{x x^T} ≠ 0.
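The proper/improper distinction is easy to verify numerically. This short sketch (an illustration, not part of the thesis) estimates the improper moment E{x x^T} for QPSK and BPSK symbol streams:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 200_000

# QPSK symbols are circular (proper): E{x x^T} -> 0
qpsk = np.exp(1j * (np.pi / 4 + np.pi / 2 * rng.integers(0, 4, N)))
# BPSK symbols are real-valued, hence improper: E{x x^T} = E{x x^H} = 1
bpsk = 2.0 * rng.integers(0, 2, N) - 1.0

improper_qpsk = np.mean(qpsk * qpsk)   # sample estimate of E{x^2}
improper_bpsk = np.mean(bpsk * bpsk)
```

For QPSK the sample moment vanishes as the number of symbols grows, whereas for BPSK it equals the ordinary power, which is exactly the noncircularity exploited later through the matrix Γ.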
In the linear signal model, the conditional or joint likelihood function is given by

f_y(y; θ, x) = (1 / (π^M det(R_w))) exp( −‖y − A(θ) x‖²_{R_w^{-1}} )
            = C_1 exp( 2 Re{ x^H A^H(θ) R_w^{-1} y } − x^H A^H(θ) R_w^{-1} A(θ) x ),    (2.14)

with

C_1 ≜ exp( −y^H R_w^{-1} y ) / ( π^M det(R_w) )

an irrelevant factor independent of θ.

⁴Notice that there is no loss of generality because the correlation of x can always be included into the matrix A(θ) in (2.13).
On the other hand, the unconditional likelihood function in (2.7) does not admit a general analytical solution, even for the linear model presented in this section. By replacing (2.14) into (2.7), it is found that the unconditional likelihood function is given by⁵

f_y(y; θ) = C_1 E_x{ exp( 2 Re{ x^H A^H R_w^{-1} y } − x^H A^H R_w^{-1} A x ) }.    (2.15)
Moreover, in case of i.i.d. nuisance parameters, the expectation with respect to x results in
the following expressions:
E_x{ exp( 2 Re{ x^H A^H R_w^{-1} y } ) } = ∏_{k=1}^{K} E_x{ exp( 2 Re{ x_k^* a_k^H R_w^{-1} y } ) }

E_x{ exp( x^H A^H R_w^{-1} A x ) } = ∏_{k=1}^{K} ∏_{l=1}^{K} E_x{ exp( x_k^* x_l a_k^H R_w^{-1} a_l ) }
                                  = ∏_{k=1}^{K} E_x{ exp( |x_k|² a_k^H R_w^{-1} a_k ) } ∏_{l>k} E_x{ exp( 2 Re{ x_k^* x_l a_k^H R_w^{-1} a_l } ) },

where x_k ≜ [x]_k and a_k ≜ [A]_k are the k-th element and column of x and A, respectively.
The above expectations over the nuisance parameters have only been solved analytically in case
of Gaussian nuisance parameters (Section 2.4.3) and polyphase discrete alphabets as shown in
Appendix 2.A. However, a general closed-form solution is not available. In the next subsections,
some alternative criteria are proposed to circumvent the computation of the exact unconditional
likelihood function (2.15).
2.4.1 Low-SNR Unconditional Maximum Likelihood
The usual way of finding the UML estimator is the evaluation of (2.15) assuming a very low SNR [Vaz00][Men97]. The low-SNR assumption constitutes a worst-case situation leading to robust estimators of θ. When the noise variance increases, the exponent of (2.15) is very small and, therefore, the exponential can be expanded into the following Taylor series:

f_y(y; θ) ≈ C_2 E_x{ 1 + χ(y; θ, x) + χ²(y; θ, x)/2 },    (2.16)

where χ(y; θ, x) ≜ 2 Re{ x^H A^H R_w^{-1} y } − x^H A^H R_w^{-1} A x is the exponent of (2.15) [Vaz00]. Assuming that the nuisance parameters are circular, zero-mean, unit-power and uncorrelated, the
expectation in (2.16) is evaluated obtaining that
E_x{ χ(y; θ, x) } = −Tr( A^H R_w^{-1} A ) = −σ_w^{-2} Tr( A^H N^{-1} A )

E_x{ χ²(y; θ, x) } = 2 Tr( R_w^{-1} A A^H R_w^{-1} R̂ ) + ζ(θ) = 2 σ_w^{-4} Tr( N^{-1} A A^H N^{-1} R̂ ) + ζ(θ),

where R̂ ≜ y y^H is the sample covariance matrix and ζ(θ) ≜ σ_w^{-4} E_x{ ( x^H A^H N^{-1} A x )² } has not been expanded because it is negligible compared to E_x{χ(y; θ, x)} for σ²_w → ∞.

⁵For the sake of clarity, the dependence on θ is omitted from A(θ) in the following expressions.
Finally, having in mind that ln(1 + x) ≈ x for x ≈ 0 and omitting constant terms, the low-SNR log-likelihood function becomes

ln f_y(y; θ) ∝ −Tr( A^H R_w^{-1} A ) + Tr( R_w^{-1} A A^H R_w^{-1} R̂ )
            = Tr( R_w^{-1} A A^H R_w^{-1} ( R̂ − R_w ) ),    (2.17)
proving that the sample covariance matrix R̂ ≜ y y^H is a sufficient statistic for the estimation of θ in the studied linear model if the SNR goes to zero. More precisely, the log-likelihood function in (2.17) is an affine transformation of the sample covariance matrix with

b(θ) = −Tr( A^H(θ) R_w^{-1} A(θ) )
M(θ) = R_w^{-1} A(θ) A^H(θ) R_w^{-1}
the independent term and the kernel of ln fy (y; θ), respectively. Notice that this result is
independent of the actual distribution of the nuisance parameters fx (x). Actually, the result is
valid for any circular distribution having zero mean and unitary variance.
Finally, the explicit formula for the UML estimator at low SNR is given by

θ̂_lowSNR = arg max_θ Tr( N^{-1} A A^H N^{-1} ( R̂ − R_w ) ).    (2.18)
This result is relevant because it states that in low SNR scenarios, second-order techniques
are asymptotically efficient for any estimation problem following the linear model in (2.13).
Actually, this conclusion was the starting point of this thesis.
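To make (2.18) concrete, the sketch below runs the low-SNR UML estimator on a hypothetical one-parameter problem: a single unit-power QPSK symbol carried by a complex exponential of unknown normalized frequency in white noise. The model, sizes, SNR and grid are illustrative assumptions, not taken from the thesis:

```python
import numpy as np

rng = np.random.default_rng(2)
M = 32
sigma2 = 2.0                 # strong noise: the low-SNR regime of this section

# Hypothetical single-parameter model: A(theta) is the M x 1 steering vector
# of a complex exponential of normalized frequency theta.
def A(th):
    return np.exp(2j * np.pi * th * np.arange(M))[:, None]

theta_true = 0.12
L = 50                       # independent snapshots averaged into R_hat
R_hat = np.zeros((M, M), complex)
for _ in range(L):
    x = np.exp(1j * (np.pi / 4 + np.pi / 2 * rng.integers(0, 4, 1)))  # QPSK
    w = np.sqrt(sigma2 / 2) * (rng.standard_normal(M) + 1j * rng.standard_normal(M))
    y = A(theta_true) @ x + w
    R_hat += np.outer(y, y.conj()) / L

def low_snr_cost(th):
    # Tr( A A^H (R_hat - R_w) ): equation (2.18) with white noise (N = I)
    a = A(th)
    return np.real(np.trace(a @ a.conj().T @ (R_hat - sigma2 * np.eye(M))))

grid = np.linspace(0.0, 0.5, 501)
theta_hat = grid[np.argmax([low_snr_cost(th) for th in grid])]
```

Averaging several snapshots into R̂ plays the role of a long observation here; with a single snapshot at this SNR the estimator would typically operate in the large-error region discussed in Section 2.3.2.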
Unfortunately, the low-SNR solution has some important inconveniences. In Appendix 2.B,
it is shown that the low SNR approximation usually yields biased estimates for any positive
SNR. Moreover, the low-SNR UML estimator might yield a significant variance floor when
applied in high SNR scenarios due to the variance induced by the random nuisance parameters
(Appendix 2.B). This variability is usually referred to as self-noise or pattern-noise in digital
synchronization [Men97].
Despite these potential problems, the low SNR approximation is extensively used in the
context of digital communications and ad hoc methods are introduced to mitigate or cancel the
self-noise contribution at high SNR. On the other hand, the ML-based estimators proposed in
the following sections are suitable candidates to cancel out the bias and self-noise terms at high
SNR. However, our main contribution in Chapter 4 is proving that all of them are suboptimal
in terms of self-noise cancelation when applied to polyphase alphabets such as MPSK.
To conclude this section, we notice that a term depending on R̃ ≜ y y^T also appears in E_x{χ²(y; θ, x)} when dealing with noncircular nuisance parameters. Therefore, the low-SNR log-likelihood function should be modified in the following way:

ln f_y(y; θ) ≈ −Tr( A^H R_w^{-1} A ) + Tr( Re{ R_w^{-1} A A^H R_w^{-1} R̂ + R_w^{-T} A^* Γ^H A^H R_w^{-1} R̃ } ),

with

Γ ≜ E{ x x^T }

the improper covariance matrix of x. Furthermore, if x is real-valued (e.g., in baseband communications or for the BPSK modulation), it follows that Γ = E{x x^H} = I_K. Notice that this second term is the one exploited in Section 6.2 to estimate the carrier phase because the term on R̂ does not provide information about the signal phase.
2.4.2 Conditional Maximum Likelihood (CML)
In this section, the CML criterion in (2.8) is formulated for the linear signal model in (2.13).
In that case, the conditional likelihood function in (2.14) can be compressed with respect to x
if the nuisance parameters are continuous variables, i.e., x ∈ CK . If the nuisance parameters
are discrete (e.g., in digital communications), this compression strategy yields a suboptimal
version of the CML estimator formulated in (2.8). This suboptimal CML estimator has been
successfully applied to different estimation problems in digital communications such as timing
synchronization [Rib01b]. Some degradation is incurred because the estimator does not exploit
the fact that x belongs to a finite alphabet. As it is shown in Section 2.4.1, this information is
irrelevant at low SNR but it is crucial when the noise term vanishes at high SNR. Nonetheless, in
the following, we will refer to this estimator as the CML estimator regardless of having discrete
or continuous nuisance parameters.
Therefore, if there is absolutely no information about x, the nuisance parameters must be assumed to be deterministic, continuous unknowns. Then, the ML estimator of x in (2.9) is obtained in the linear case by solving a classical weighted least squares (WLS) problem, leading to

x̂_ML(θ) = ( A^H(θ) R_w^{-1} A(θ) )^{-1} A^H(θ) R_w^{-1} y,

assuming that A(θ) is a tall matrix, i.e., M > K [Sch91a, Sec. 9.12]. After some algebra, the corresponding log-likelihood function is given by

ln f_y(y; θ, x̂_ML(θ)) ∝ −‖y − A(θ) x̂_ML(θ)‖²_{R_w^{-1}}
                      ∝ Tr( R_w^{-1} A ( A^H R_w^{-1} A )^{-1} A^H R_w^{-1} R̂ ),    (2.19)
becoming a linear transformation of the sample covariance matrix and, thus, a quadratic function
of the observation y. Finally, the CML estimator of θ is computed as follows:
θ̂_CML = arg max_θ ln f_y(y; θ, x̂_ML(θ))
      = arg max_θ Tr( N^{-1} A ( A^H N^{-1} A )^{-1} A^H N^{-1} R̂ )
      = arg max_θ Tr( M(θ) R̂ ),    (2.20)

with

M(θ) ≜ N^{-1} A(θ) ( A^H(θ) N^{-1} A(θ) )^{-1} A^H(θ) N^{-1}

the associated kernel.
The resulting estimator is actually orthogonally projecting the whitened observation N^{-1/2} y onto the subspace generated by the columns of N^{-1/2} A(θ). Clearly, the above solution is related to subspace methods like MUSIC [Sch79][Bie80][Sto89][Sto97]. In fact, the CML estimator in (2.20) is equivalent to a variant of the MUSIC algorithm proposed in [Sto89].
It can be seen that the CML estimator in (2.20) corresponds to the low-SNR UML estimator in (2.18) if R_w^{-1/2} A(θ) is unitary or, in other words,

A^H(θ) N^{-1} A(θ) ∝ I_K.    (2.21)
If the above equation is not fulfilled, the CML estimator might suffer from noise-enhancement
at low SNR when the observation length is limited. In that case, the low-SNR UML estimator
deduced in Section 2.4.1 outperforms the CML estimator in the low SNR regime because the
former exploits the a priori statistical knowledge about x.
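A minimal sketch of the CML estimator (2.20), reusing the same hypothetical single-exponential model as before: with white noise (N = I) and K = 1, the kernel M(θ) reduces to the orthogonal projector onto the span of the single column a(θ). All sizes and SNR values are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(3)
M = 32
sigma2 = 0.01                 # high SNR, the regime favouring the CML solution

def A(th):                    # hypothetical single-exponential model, M x 1
    return np.exp(2j * np.pi * th * np.arange(M))[:, None]

theta_true = 0.2
x = np.exp(1j * (np.pi / 4 + np.pi / 2 * rng.integers(0, 4, 1)))   # one QPSK symbol
w = np.sqrt(sigma2 / 2) * (rng.standard_normal(M) + 1j * rng.standard_normal(M))
y = A(theta_true) @ x + w
R_hat = np.outer(y, y.conj())                    # single-snapshot y y^H

def cml_cost(th):
    # Tr( A (A^H A)^{-1} A^H R_hat ): equation (2.20) with white noise (N = I)
    a = A(th)
    P = a @ np.linalg.inv(a.conj().T @ a) @ a.conj().T   # projector onto span{a}
    return np.real(np.trace(P @ R_hat))

grid = np.linspace(0.0, 0.5, 1001)
theta_hat = grid[np.argmax([cml_cost(th) for th in grid])]
```

Note that the cost is insensitive to the realization of the nuisance symbol x, which is the self-noise-free behaviour of the CML solution mentioned below.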
The CML solution is shown in Appendix 2.C to hold the following regularity condition:

E_y{ ∂/∂θ ln f_y(y; θ, x̂_ML(θ)) }|_{θ=θ_o} = 0

and, therefore, the CML estimator is always unbiased and self-noise free even for finite observations. Another significant feature of the CML solution is that it does not require knowledge of the noise variance σ²_w.
2.4.3 Gaussian Maximum Likelihood (GML)
The Gaussian assumption on the nuisance parameters is generally adopted when the actual distribution is unknown or becomes an obstacle to compute the expectation in (2.15). The Gaussian assumption is known to yield almost optimal second-order estimators on account of the Central Limit Theorem. This subject is addressed throughout this dissertation and the asymptotic efficiency of the Gaussian assumption is studied in Chapter 7.
If the nuisance parameters are Gaussian, the observed vector y is also Gaussian in the studied linear signal model. Thus, we have that

f_y(y; θ) = exp( −y^H R^{-1}(θ) y ) / ( π^M det(R(θ)) ),    (2.22)

where y is zero-mean and

R(θ) ≜ E{R̂} = E{y y^H} = A(θ) A^H(θ) + R_w    (2.23)

is the covariance matrix of y. Once again, the log-likelihood function is an affine transformation of the sample covariance matrix that, omitting constant additive terms, is given by

ln f_y(y; θ) = ln E_x{ f_y(y/x; θ) } = −ln det( R(θ) ) − Tr( R^{-1}(θ) R̂ ).    (2.24)

Therefore, having in mind that ln det(M) = Tr( ln(M) ), it is found that

b(θ) = −ln det( R(θ) ) = −Tr( ln R(θ) )
M(θ) = −R^{-1}(θ) = −( A(θ) A^H(θ) + R_w )^{-1}

are the independent term and the kernel of the GML likelihood function, respectively. Consequently, the GML estimator is computed as follows:

θ̂_GML = arg min_θ Tr( ln R(θ) + R^{-1}(θ) R̂ ).    (2.25)
In Appendix 2.D, we prove that the GML estimator converges to the low-SNR UML solution (2.18) for σ²_w → ∞ and to the CML solution (2.20) for σ²_w → 0. Therefore, the GML estimator is asymptotically efficient at low SNR and, evidently, for any SNR if the nuisance parameters are Gaussian. Indeed, any statistical assumption about the nuisance parameters leads to the UML solution (2.18) at low SNR. Consequently, the GML estimator can only be outperformed using quadratic techniques in the medium-to-high SNR interval if the nuisance parameters are non-Gaussian random variables. This subject is addressed thoroughly in subsequent chapters.
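The GML cost (2.25) can be evaluated directly from R(θ) in (2.23). The sketch below reuses the hypothetical single-exponential model with illustrative sizes, and computes ln det R(θ) with a log-determinant routine:

```python
import numpy as np

rng = np.random.default_rng(4)
M = 16
sigma2 = 0.5                  # moderate SNR, between the two limiting regimes

def A(th):                    # hypothetical single-exponential model, M x 1
    return np.exp(2j * np.pi * th * np.arange(M))[:, None]

theta_true = 0.15
L = 20                        # snapshots averaged into the sample covariance
R_hat = np.zeros((M, M), complex)
for _ in range(L):
    x = np.exp(1j * (np.pi / 4 + np.pi / 2 * rng.integers(0, 4, 1)))  # QPSK
    w = np.sqrt(sigma2 / 2) * (rng.standard_normal(M) + 1j * rng.standard_normal(M))
    y = A(theta_true) @ x + w
    R_hat += np.outer(y, y.conj()) / L

def gml_cost(th):
    a = A(th)
    R = a @ a.conj().T + sigma2 * np.eye(M)       # R(theta), equation (2.23)
    _, logdet = np.linalg.slogdet(R)              # ln det R = Tr ln R
    # ln det R(theta) + Tr( R^{-1}(theta) R_hat ): the cost minimized in (2.25)
    return logdet + np.real(np.trace(np.linalg.solve(R, R_hat)))

grid = np.linspace(0.0, 0.5, 501)
theta_hat = grid[np.argmin([gml_cost(th) for th in grid])]
```

Although the nuisance symbols here are QPSK rather than Gaussian, the Gaussian-assumed cost still locates the parameter, in line with the robustness of the Gaussian assumption discussed above.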
2.5 Maximum Likelihood Implementation
Generally, the ML-based estimators presented in the last section do not admit an analytical solution⁶ and the maximization of the associated log-likelihood function must be carried out using numerical techniques. In that case, the log-likelihood function should be sampled. If the samples separation is decided according to the sampling theorem, the ML estimate can be determined by means of ideal interpolation. Otherwise, if the sampling rate violates the Nyquist criterion, a gradient-based algorithm can be applied to find the maximum of ln f_y(y; θ). Moreover, if the gradient of ln f_y(y; θ) has a single root in the parameter space, a gradient-based algorithm is able to look for the maximum of ln f_y(y; θ) without any assistance. Nonetheless, in a multimodal problem, the same gradient-based method might converge to a local maximum unless a preliminary search of the global maximum is performed.

⁶An exception is the estimation of the carrier phase in digital communications (see Section 6.2).
The utilization of a gradient-based or iterative algorithm is generally preferred because it has a lower complexity than the grid-search implementation⁷. The convergence of gradient-based methods is guaranteed if and only if the Hessian matrix is negative definite —and lower bounded— in the closed subset Θ = { θ | f_y(y; θ) ≥ f_y(y; θ̂_0) }, with θ̂_0 the initial guess [Boy04, Sec. 8.3]. Among the existing gradient-based methods, the Newton-Raphson algorithm is extensively adopted because its convergence is quadratic —instead of linear— when the recursion approaches the log-likelihood maximum (θ̂_ML) [Boy04, Sec. 8.5]. Other methods are the steepest descent method, the conjugate gradient method and quasi-Newton methods, among many others (see [Boy04][Lue84] and references therein).
The Newton-Raphson iteration is given by

θ̂_{k+1} = θ̂_k − H^{-1}(y; θ̂_k) ∇(y; θ̂_k),    (2.26)

where k is the iteration index and

∇(y; θ) ≜ ∂ ln f_y(y; θ) / ∂θ
H(y; θ) ≜ ∂² ln f_y(y; θ) / ∂θ ∂θ^T

are the gradient and the Hessian of the log-likelihood function, respectively. Notice that, in a low-SNR scenario (2.17) and/or if the nuisance parameters are Gaussian (2.24), ∇(y; θ) is linear in the sample covariance matrix R̂ ≜ y y^H. In that case, the Newton-Raphson recursion in (2.26) is quadratic in the observation y.
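The recursion (2.26) is easily illustrated for a scalar parameter. In the toy problem below (Gaussian observations of unknown mean, chosen purely for illustration), the log-likelihood is exactly quadratic in θ, so the iteration reaches the ML solution, the sample mean, in a single step:

```python
import numpy as np

# Toy scalar problem (illustrative): y_m = theta + n_m with unit-variance
# Gaussian noise; ln f_y(y; theta) is quadratic in theta and its maximizer
# is the sample mean.
rng = np.random.default_rng(5)
y = 1.5 + rng.standard_normal(100)

def grad(th):                 # gradient of ln f_y(y; theta)
    return np.sum(y - th)

def hess(th):                 # Hessian (a negative constant here)
    return -float(len(y))

theta = 0.0                   # initial guess theta_0
for _ in range(3):            # k = 0, 1, 2; one iteration already suffices
    theta = theta - grad(theta) / hess(theta)     # equation (2.26), scalar case
```

This single-step convergence is the behaviour noted below for linear estimation problems with additive Gaussian noise.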
The quadratic convergence of the Newton-Raphson algorithm is accelerated when approaching θ̂_ML because ln f_y(y; θ) becomes approximately parabolic around the current estimate θ̂_k, that is,

ln f_y(y; θ) ≈ ln f_y(y; θ̂_k) + ∇^T(y; θ̂_k) (θ − θ̂_k) + (1/2) Tr( H(y; θ̂_k) (θ − θ̂_k)(θ − θ̂_k)^T )

⁷Recall that the parameter θ ∈ R^P is a continuous variable and we are assuming that f_y(y; θ) is continuously differentiable in θ.
and, therefore, (2.26) yields approximately the ML solution for ‖θ̂_k − θ̂_ML‖ sufficiently small. Notice that ln f_y(y; θ) is strictly quadratic in case of linear estimation problems having additive Gaussian noise [Kay93b, Theorem 3.5]. In that case, the ML estimate is obtained after a single iteration of the Newton-Raphson algorithm. Otherwise, the convergence rate is slow if the log-likelihood curvature is large near the maximum θ̂_ML. In that case, however, the estimation accuracy is found to be superior.
The Newton-Raphson method in (2.26) can be generalized to estimate a given transformation of the parameter [Sto01][Kay93b, Sec. 3.8] as follows:

α̂_{k+1} = α̂_k − D_g(θ̂_k) H^{-1}(y; θ̂_k) ∇(y; θ̂_k),    (2.27)

where α = g(θ) is the referred transformation, D_g(θ) ≜ ∂g(θ)/∂θ^T is the Jacobian of g(θ) and α̂_ML = g(θ̂_ML) holds from the invariance property of the ML estimator (Section 2.3.2).
According to the asymptotic properties of the ML estimator (Section 2.3.2), it follows that any iterative method converging to the ML solution is asymptotically (M → ∞) consistent and efficient if the ML regularity condition (2.12) is satisfied. In the asymptotic case, the small-error condition is verified and the ML estimator attains the Cramér-Rao bound (Section 2.6.1), which is given by

B_CRB(θ_o) ≜ D_g(θ_o) J^{-1}(θ_o) D_g^H(θ_o),

where

J(θ) ≜ −E_y{ H(y; θ) } = E_y{ ∇(y; θ) ∇^H(y; θ) }    (2.28)

is the Fisher information matrix (FIM) and the expectation is computed with respect to the random observation y. The last equality is a consequence of the regularity condition (2.12) [Kay93b, Appendix 3A].
The asymptotic efficiency is also guaranteed if the Newton-Raphson method (2.27) is substituted by the following scoring method:

α̂_{k+1} = α̂_k + D_g(θ̂_k) J^{-1}(θ̂_k) ∇(y; θ̂_k),    (2.29)
in which the Hessian matrix is replaced by the negative of its expected value (2.28). The method
of scoring is preferred because it improves the convergence to the ML solution for short data
records, mainly in multiparametric problems. However, both methods are equivalent if the
observation size goes to infinity.
2.5.1 ML-Based Closed-Loop Estimation
Conventionally, ML estimators are developed in batch mode, that is, the M samples of y are recorded first and, afterwards, ln f_y(y; θ) is iteratively maximized in order to find the ML estimate θ̂_ML. Unfortunately, the complexity and latency of this batch-mode implementation are excessive when long observations are required to comply with the specifications. To ameliorate this problem, the long observation y is fragmented into smaller blocks {z_n}_{n=1,...,N} that are ergodic realizations of the same distribution f_z(z; θ). The minimum block size is one sample, in which case the estimator would work on a sample-by-sample basis. Identically distributed blocks are feasible if the observation is (cyclo-)stationary.

Figure 2.3: Sequential processing of the received vector y in the context of digital communications. The observed blocks {z_n} last M = 4 samples and are taken every N_ss = 2 samples, where N_ss is the number of samples per symbol. [Figure: the samples y_1, ..., y_8 of y and the overlapping blocks z_1 and z_2.]
In Appendix 2.E, it is shown that the following closed-loop estimator,

α̂_{n+1} = α̂_n + μ D_g(θ̂_n) J_z^{-1}(θ̂_n) ∇_z(z_n; θ̂_n),    (2.30)

is efficient in the small-error regime if the N partial observations z_n are statistically independent, where

∇_z(z; θ) ≜ ∂ ln f_z(z; θ) / ∂θ
J_z(θ) ≜ −E_z{ ∂² ln f_z(z; θ) / ∂θ ∂θ^T } = E_z{ ∇_z(z; θ) ∇_z^H(z; θ) } = (1/N) J(θ)

are the gradient and the FIM for the block-size observations {z_n}_{n=1,...,N}, respectively. The step-size or forgetting factor μ is selected to achieve the same performance as the off-line recursions in (2.27) and (2.29). If N is sufficiently large, the parameter μ must be set to approximately 2/N (Appendix 2.E).
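A scalar sketch of the closed-loop recursion (2.30): taking g(θ) = θ and one-sample blocks z_n drawn from N(θ_o, 1), for which ∇_z(z; θ) = z − θ and J_z = 1 (a toy setup, not the signal model of this chapter), the loop acquires θ_o and then fluctuates around it with a variance controlled by μ:

```python
import numpy as np

rng = np.random.default_rng(6)
theta_o = 0.7                 # true parameter to be tracked
mu = 0.01                     # step-size: acquisition speed vs. variance
theta = 0.0                   # initial guess theta_0, far from theta_o

history = []
for _ in range(2000):
    z_n = theta_o + rng.standard_normal()   # new one-sample block z_n
    theta = theta + mu * (z_n - theta)      # (2.30) with D_g = J_z = 1
    history.append(theta)
```

Increasing μ shortens the acquisition transient at the price of a larger steady-state variance, which is the trade-off discussed at the end of this subsection.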
Although closed-loop estimators have the same aspect as their off-line versions in (2.27) and (2.29), the closed-loop scheme in (2.30) aims at maximizing the stochastic likelihood function f_z(z; θ), which has a time-varying shape. Therefore, the gradient ∇_z(z; θ) is also a random vector pointing in the direction of the maximum of f_z(z; θ). Thus, the ML-based closed-loop estimator proposed in (2.30) belongs to the family of stochastic gradient algorithms. Indeed, equation (2.30) is referred to as the natural gradient in the context of neural learning [Ama98].
Although the closed-loop estimator in (2.30) has been deduced assuming N independent blocks, the necessary and sufficient condition for efficiency is more general and is formulated next.
Proposition 2.1 The closed loop estimator proposed in (2.30) is efficient in the small-error
regime if and only if there is at least one block zn in which each sample [y]m (m = 1, ..., M) is
jointly processed with all the samples that are statistically dependent on it.
The above proposition implies in most cases the partial overlapping of the observed blocks.
This means that the same sample is processed more than once. For example, in digital communications the received signal is cyclostationary if we have Nss > 1 samples per symbol. The data
symbols are usually i.i.d. random variables that modulate a known pulse p(t) lasting LNss samples. In that case, the optimal performance is achieved if the block size is equal to LNss samples
and the block separation is one sample. However, in order to have identically distributed blocks,
the block separation is usually set to Nss , taking into account the signal cyclostationarity (see
Fig. 2.3).
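The overlapped block extraction described above can be sketched as follows. This is an illustrative sketch with made-up sizes, not code from the thesis: each block spans the pulse length L·Nss, and consecutive blocks are separated by Nss samples so that they remain identically distributed.

```python
import numpy as np

# Illustrative parameters (our choice): Nss = 4 samples per symbol,
# pulse spanning L = 3 symbols, so each block holds L * Nss samples.
Nss, L = 4, 3
block_len = L * Nss                      # samples jointly processed per block
y = np.arange(40, dtype=float)           # stand-in for the received samples

# Block separation of Nss samples -> identically distributed, overlapped blocks.
starts = range(0, len(y) - block_len + 1, Nss)
blocks = np.array([y[s:s + block_len] for s in starts])

# Each sample appears in up to L consecutive blocks (partial overlap).
print(blocks.shape)
```

With 40 samples, the slicing yields 8 blocks of 12 samples each; sample y[4] opens the second block while still belonging to the first, illustrating the partial overlap.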
As previously stated, closed-loop estimators yield efficient estimates if the small-error regime is attained in the steady state. However, the initial guess θ̂0 is usually far from the true parameter θo, and the algorithm has to converge towards θo. This initial convergence constitutes the estimator acquisition and has been studied for a long time [Mey90].
Unfortunately, only approximate results are available for the acquisition mean time, the lock-in and lock-out probabilities, etc. [Mey90]. The step-size µ in equation (2.30) can be adjusted to trade off acquisition speed (large µ) against steady-state performance (small µ).
Closed Loop Architecture
The ML-based closed loop proposed in (2.30) has two components (Fig. 2.4): a nonlinear discriminator (or detector) of the estimation error, and a first-order loop filter. The discriminator
input-output response is given by
ê(zn; θ) = Dg(θ) Jz^-1(θ) ∇z(zn; θ)
where θ = θ̂n is the current estimate serving as a reference to infer the estimation error g(θ̂n) − g(θo) at time n.
The mean value of the discriminator output is given by

Ez{ê(zn; θ)} = Dg(θ) Jz^-1(θ) Ez{∇z(zn; θ)}.    (2.31)
Figure 2.4: Block diagram for the ML closed-loop estimator in equation (2.30). The same scheme
is applicable to any other closed-loop estimator or tracker if the discriminator and/or the loop
filters are conveniently modified.
It can be shown that the discriminator output is unbiased in the neighbourhood of the equilibrium point θ = θo because

Ez{ê(zn; θo)} = 0

∂Ez{ê(zn; θ)}/∂θ^T |θ=θo = Dg(θo),
taking into account that

Ez{∇z(zn; θo)} = 0

∂Ez{∇z(zn; θ)}/∂θo^T |θ=θo = −∂Ez{∇z(zn; θ)}/∂θ^T |θ=θo = −Ez{∂² ln fz(zn; θ)/∂θ∂θ^T}|θ=θo = Jz(θo)
is always verified in the studied linear signal model (Section 2.4). The first equation is the classical regularity condition introduced in (2.12) and the second equation yields the Fisher information matrix Jz(θ). Precisely, Jz^-1(θ) normalizes the discriminator slope in (2.31) to have unbiased estimates of θ − θo. The Jacobian matrix Dg(θ) is then used to obtain unbiased estimates of g(θ) − g(θo), taking into account that g(θ) can be linearized around θ ≈ θo using the first-order Taylor expansion g(θ) ≈ g(θo) + Dg(θo)(θ − θo).
In some problems of digital communications, the discriminator mean value (2.31) only depends on the estimation error θ − θo and is named the discriminator S-curve because it looks like an "S" rotated by 90° [Men97][Mey90].
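The S-curve can be visualized numerically. The following is an illustrative sketch (model and noise level are our assumptions): for the phase discriminator e = Im{e^{-jθ} z} with z = e^{jθo} + w, the mean output is sin(θo − θ), which crosses zero at the equilibrium point θ = θo with positive slope.

```python
import numpy as np

# Monte Carlo estimate of the discriminator S-curve: mean output as a
# function of the estimation error theta_o - theta.
rng = np.random.default_rng(1)
theta_o = 0.0
errors = np.linspace(-np.pi, np.pi, 9)
trials = 20000
s_curve = []
for theta in theta_o - errors:                 # sweep the estimation error
    w = 0.3 * (rng.standard_normal(trials) + 1j * rng.standard_normal(trials))
    z = np.exp(1j * theta_o) + w
    s_curve.append(np.mean(np.imag(np.exp(-1j * theta) * z)))

# The mean output follows sin(error): zero (unbiased) at zero error.
print(np.round(s_curve, 2))
```

The zero crossing at zero error is the unbiasedness property stated above, and the slope at the origin is what Jz^-1(θ) normalizes in (2.31).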
2.5.2 ML-based Tracking
An important feature of the stochastic gradient methods previously presented is their ability to track the evolution of slowly time-varying parameters. Thus, let us consider that θn is a
time-varying parameter and αn = g (θn ) a given transformation. The closed loop in (2.30)
must be modified to track the parameter evolution and supply unbiased estimates of θn in the
steady-state.
A first-order loop filter was used in the last section because the parameter was constant, i.e.,
θn = θo (Fig. 2.4). However, if θn has a polynomial evolution in time, i.e.,

θn = θo + Σ_{r=1}^{R−1} δr n^r,

an Rth-order loop filter is required to track θn without systematic or pursuit errors [Men97][Mey90]. For example, if θo is the carrier phase and we are designing a phase-locked loop (PLL), δ1 corresponds to the Doppler frequency and δ2 to the Doppler rate.
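The effect of the extra loop-filter order can be illustrated numerically. This is a hedged sketch (discriminator, gains and noise level are our own illustrative choices, not the thesis design): a phase ramp θn = θo + δ1·n (constant Doppler) is tracked with a second-order loop whose integrator accumulates a frequency estimate, removing the pursuit error a first-order loop would exhibit.

```python
import numpy as np

# Second-order digital PLL tracking a linear phase ramp (constant Doppler).
rng = np.random.default_rng(2)
theta_o, delta1 = 0.3, 1e-3        # initial phase and Doppler (rad/sample)
mu1 = 0.05                          # proportional (first-order) gain
mu2 = mu1**2 / 2                    # integrator gain (~critically damped)
theta_hat, freq_hat = 0.0, 0.0
for n in range(4000):
    theta_n = theta_o + delta1 * n              # true time-varying phase
    w = 0.1 * (rng.standard_normal() + 1j * rng.standard_normal())
    z = np.exp(1j * theta_n) + w
    e = np.imag(np.exp(-1j * theta_hat) * z)    # phase discriminator
    freq_hat += mu2 * e                          # second-order branch
    theta_hat += mu1 * e + freq_hat              # proportional + frequency

print(abs(theta_hat - (theta_o + delta1 * 4000)))   # no pursuit error left
```

In the steady state freq_hat converges to the Doppler δ1, so the phase estimate follows the ramp without a systematic lag.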
Another alternative to take into account the parameter dynamics is the one adopted in the
Kalman filter theory [Kay93b, Ch.13]. In this framework, a dynamical model (or state-equation)
is assumed for the parameters of interest
θn+1 = f(θn),

where θn stacks all the parameters involved in the dynamical model, i.e., θ = [θo^T, δ1^T, ..., δ_{R−1}^T]^T for the polynomial model above. Although the parameter dynamics are generally nonlinear, they can be linearized around the actual estimate θ̂n, leading to the following approximation

f(θn) ≈ f(θ̂n) + Df(θ̂n)(θn − θ̂n)

where Df(θ) ≜ ∂f(θ)/∂θ^T is the Jacobian of f(θ).
If the parameter dynamics are incorporated into the original closed loop (2.30), we obtain the following higher-order tracker

α̂n+1 = h(θ̂n) + diag(µ) Dh(θ̂n) Jz^-1(θ̂n) ∇z(zn; θ̂n)    (2.32)

where h(θ) ≜ g(f(θ)) and

Dh(θ) ≜ ∂h(θ)/∂θ^T = Dg(θ) Df(θ)

is the Jacobian of the composite function h(θ) [Gra81, Sec. 4.3.]⁸. The vector of forgetting factors µ sets the (noise equivalent) loop bandwidth of each parameter in αn. In Appendix 2.E

⁸If the dynamical model is specified for αn, i.e., α̂n+1 = f(α̂n), the composition must be reversed, having h(θ) ≜ f(g(θ)).
it is shown that Bn ≈ µ/4. The loop bandwidth determines the maximum variability of the parameters that the closed loop is able to track, as well as the closed-loop effective observation time, which is approximately equal to N ≈ 0.5/Bn samples [Men97, Sec. 3.5.6] (see also Appendix 2.E).
A vast number of tracking techniques have been proposed in the fields of automatic control [Kai00][Söd89], signal processing [Kay93b] and communications [Men97][Mey90], e.g., least mean squares (LMS) and recursive least squares (RLS) [Hay91][Kai00], Kalman-Bucy filtering [And79][Hay91], machine learning [Mit97], etc. In fact, filtering, smoothing, prediction, deconvolution, source separation and other applications can be seen as particular cases of parameter
estimation or tracking in which the aim is to determine the input data at time n, say θn , from
a vector of noisy observations.
2.6 Lower Bounds in Parameter Estimation
The calculation of an attainable benchmark for the adopted performance criterion is necessary to identify whether a given estimation technique is efficient or not. For example, the ML estimator is known to be optimal in the small-error regime because it attains the Cramér-Rao lower bound. Once the optimal performance is known, suboptimal techniques can be devised trading off performance and complexity. Moreover, lower bounds usually give insight into the contribution of the different parameters (e.g., SNR, observation size and others) to the estimator performance. In the following sections, some important lower bounds are briefly described.
Focusing on the mean squared error (MSE), lower bounds can be classified as Bayesian or deterministic depending on whether the prior statistics of the parameters are exploited or not. On the other hand, lower bounds are also classified into small-error (or local) bounds and large-error (global) bounds. Furthermore, the lower bounds in the literature are derived from either the Cauchy-Schwarz or the Kotelnikov inequalities.
From the above classification criteria, the most important lower bounds in the literature
are described and interconnected in the following subsections. Finally, all these bounds are
organized and presented in a concluding table at the end of the section (Fig. 2.5).
NOTE: the material in the following section is not essential to understand the central chapters of the dissertation. Only those lower bounds derived from the CRB in the presence of nuisance parameters will be extensively used throughout the thesis. Thus, we recommend that the reader skip Section 2.6 on a first reading.
2.6.1 Deterministic Bounds based on the Cauchy-Schwarz Inequality
A large number of deterministic lower bounds on the mean square error (MSE) have been derived
from the Cauchy-Schwarz inequality, e.g., [Gor90, Eq. 10][Abe93, Eq. 5][McW93, Eq. 2][Rif75,
Eq. 13]. The Cauchy-Schwarz inequality states that

E{ee^H} ≥ E{es^H} (E{ss^H})^# E{se^H}    (2.33)

for two arbitrary random vectors e and s.⁹ The Moore-Penrose pseudo-inverse operator was introduced in [Gor90, Eq. 10][Sto01] to cover those cases in which E{ss^H} is singular. Notice that the expectation is computed with respect to the random components of e and s. Furthermore, equation (2.33) holds with equality if and only if the vectors e and s are connected as

⁹For the scalar case, we have the conventional Cauchy-Schwarz inequality, E{|e|²} ≥ |E{es}|²/E{|s|²}, as it appears in [Wei88b, Eq. 7].
follows

e = E{es^H} (E{ss^H})^# s.    (2.34)
Indeed, the Cauchy-Schwarz inequality is a consequence of the more general relation

[ A    B^H
  B    C   ] ≥ 0  ⇔  A ≥ B^H C^# B    (2.35)

which is valid if C is non-negative definite [Mag98, Ex. 3, p. 25]. This property is used in [Gor90, Lemma 1] to prove the vectorial Cauchy-Schwarz inequality (2.33). The proof is straightforward if (2.35) is applied to the matrix E{zz^H} with z ≜ [e^T, s^T]^T. Also, this matrix inequality is adopted in [McW93, Eq. 2] to analyze the geometry of several "quadratic covariance bounds".
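The matrix inequality (2.33) can be checked numerically. This is our own sanity-check sketch (dimensions and correlation structure are arbitrary): for correlated random vectors e and s, the gap E{ee^H} − E{es^H}(E{ss^H})^# E{se^H} must be non-negative definite.

```python
import numpy as np

# Numerical check of the vector Cauchy-Schwarz inequality (2.33).
rng = np.random.default_rng(3)
n, dim_e, dim_s = 100000, 2, 3
A = rng.standard_normal((dim_e + dim_s, dim_e + dim_s))
x = rng.standard_normal((n, dim_e + dim_s)) @ A.T   # correlated samples
e, s = x[:, :dim_e], x[:, dim_e:]

Ree = e.T @ e / n                  # sample E{ee^H}
Res = e.T @ s / n                  # sample E{es^H}
Rss = s.T @ s / n                  # sample E{ss^H}
gap = Ree - Res @ np.linalg.pinv(Rss) @ Res.T   # pseudo-inverse, as in (2.33)

print(np.linalg.eigvalsh(gap))     # all (numerically) non-negative
```

The gap is the Schur complement of the joint sample covariance, which is non-negative definite by the relation (2.35), so its eigenvalues are non-negative up to numerical error.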
Based on the Cauchy-Schwarz inequality (2.33), lower bounds on the estimation mean square error can be formulated considering that e = α̂ − g(θ) is the estimation error and s an arbitrary score function. In the deterministic case, both e and s are functions of the random observation y, which is distributed as fy(y; θ). Various deterministic lower bounds on the MSE have been deduced by selecting different score functions s, as, for instance, the following well-known bounds: Cramér-Rao [Kay93b, Chapter 3], Bhattacharyya [Bat46], Barankin [Bar49], Hammersley-Chapman-Robbins [Cha51][Ham50], Abel [Abe93] and Kiefer [Kie52], among others.
Because (2.33) is valid for any score function, the aim is to find the score function leading
to the highest lower bound on the estimator MSE and, if possible, the estimator attaining the
resulting bound. Conversely, if an estimator satisfies (2.33) with equality for a given score function, the resulting bound is the tightest attainable lower bound on the MSE. Furthermore, this estimator is the one satisfying (2.34).
In [McW93], it is shown that tight lower bounds are obtained provided that
P1: the score function is zero-mean, i.e.,

E{s(y, θ)} = ∫ s(y, θ) fy(y; θ) dy = 0

for every value of θ. Thus, we are only concerned with unbiased estimators since the estimation error is proportional to s(y, θ) (2.34);
P2: the score function is a function of the sufficient statistics of the estimation problem at hand. Recall that t(y) is a sufficient statistic if and only if fy(y; θ) depends on the parameter vector θ only through a function of the sufficient statistic t(y). Consequently, s(y, θ) can be any one-to-one function of the likelihood function fy(y; θ), as for instance its gradient ∇(y; θ). See the Neyman-Fisher factorization theorem in [Kay93b, Th. 5.3];
2.6. LOWER BOUNDS IN PARAMETER ESTIMATION
39
P3: the score function must satisfy (2.34). This means that s(y, θ) must span the estimation error subspace.
The first property is really important because it states that we only have to consider unbiased
estimators of the parameter. In fact, it can be shown that the bias term always increases the
overall MSE and it is not possible to trade bias for variance in the deterministic case. To show
this result, we have to decompose the estimation error as

e(y; θ) ≜ b(θ) + v(y; θ)

with b(θ) = E{α̂(y)} − g(θ) the estimator bias and v(y; θ) = α̂(y) − E{α̂(y)} the deviation with respect to the estimator mean. Consequently, the estimator MSE can be written as

Σee(θ) = b(θ) b^H(θ) + Σvv(θ)
where

Σxy ≜ E{xy^H}

stands for the cross-correlation matrix¹⁰. Then, the Cauchy-Schwarz inequality (2.33) can be applied to the covariance matrix Σvv(θ) in order to obtain the following lower bound on the MSE:

Σee(θ) ≥ b(θ) b^H(θ) + Σvs(θ) Σss^#(θ) Σsv(θ)    (2.36)
in which the bias function has been set to b (θ) [Abe93, Eq. 6]. Equation (2.36) is usually
referred to as the “covariance inequality” [Gor90][McW93][Abe93, Eq. 6]. Therefore, if the
covariance inequality in (2.36) is compared with the original bound,

Σee(θ) ≥ Σes(θ) Σss^#(θ) Σse(θ) = Σvs(θ) Σss^#(θ) Σsv(θ),

it follows that the bias term b(θ) b^H(θ) can never reduce the MSE matrix Σee(θ). In the last expression, we take into account that Σes = Σvs because the score function is zero-mean.
The Cauchy-Schwarz inequality can then be used to extend the concept of efficiency to other lower bounds besides the usual Cramér-Rao bound. Thus, α̂(y) is an efficient estimator of α = g(θ) if and only if it holds that

E{e(y, θ)} = 0    (2.37)

Σee(θ) = Σvv(θ) = Σvs(θ) Σss^#(θ) Σsv(θ)    (2.38)

for (at least) one score function s(y, θ).

¹⁰Notice that the transpose conjugate will be considered in the sequel for both real and complex vectors.
CHAPTER 2. ELEMENTS ON ESTIMATION THEORY
40
Additionally, we know from (2.34) that the estimator α̂(y) is efficient if and only if it verifies that

α̂(y) = g(θ) + Σvs(θ) Σss^#(θ) s(y, θ)    (2.39)

for any value of θ.
An important question is whether a realizable¹¹, unbiased estimator can attain the covariance inequality or not for a given score function. If this estimator were found, the resulting covariance would constitute the highest lower bound. Therefore, any other score function s(y, θ) would yield a weaker bound on the MSE, which would not be attainable. Next, a sufficient condition on s(y, θ) leading to realizable estimators is shown.
Proposition 2.2 If the zero-mean score function can be factorized as

s(y, θ) ≜ H(θ) z(y) − u(θ)

with z(y) a function of the sufficient statistics t(y) and

Σvs(θ) Σss^#(θ) H(θ) ≜ M^H
Σvs(θ) Σss^#(θ) u(θ) = g(θ),

the estimator α̂(y) = M^H z(y) is efficient and its covariance matrix is given by

Σee(θ) = Σvv(θ) = Σvs(θ) Σss^#(θ) Σsv(θ) = M^H Σzz M − g(θ) g^H(θ)

which therefore becomes the highest lower bound on the estimation error covariance.
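A concrete instance of Proposition 2.2 can be sketched as follows (our own example, not from the thesis): for y ~ N(θ·1, I) with M samples, the score factorizes as s(y, θ) = H z(y) − u(θ) with z(y) = mean(y) (a sufficient statistic), H = M and u(θ) = Mθ, so the sample mean is efficient and attains the lower bound 1/M for every θ.

```python
import numpy as np

# Monte Carlo check that the factorized-score estimator (the sample mean)
# attains the bound var = 1/M for Gaussian observations with unit variance.
rng = np.random.default_rng(4)
theta, M, trials = 1.3, 50, 20000
y = theta + rng.standard_normal((trials, M))   # trials independent experiments
theta_hat = y.mean(axis=1)                      # efficient estimator z(y)

print(theta_hat.mean(), theta_hat.var())        # unbiased, variance near 1/M
```

The empirical variance matches the bound 1/M = 0.02, and no unbiased estimator built from y can do better for this model.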
Unfortunately, most score functions of interest cannot be factorized as in the last proposition for all the values of θ. Consequently, efficient estimators are usually unrealizable in the deterministic framework. In that case, efficient deterministic estimators are only feasible in the small-error regime, once the value of θ has been iteratively learnt using a suitable gradient-based method. Notice that this was the adopted approach in the case of the ML estimator and the associated Cramér-Rao bound. Thus, the following scoring method

α̂k+1 = g(θ̂k) + Σvs(θ̂k) Σss^#(θ̂k) s(y, θ̂k)

is efficient in the small-error regime (i.e., θ̂k ≈ θ) for any score function.
Consequently, all the deterministic bounds converge to the Cramér-Rao bound in the small-error regime. However, the Cramér-Rao bound is not attained when the estimator operates in the large-error regime. In that case, tighter bounds can be formulated by using a better score function. Next, the score functions associated with the most important large-error and small-error deterministic bounds are presented.
¹¹The adjective "realizable" means that α̂(y) does not depend on the vector of unknown parameters θ.
Barankin Bound (BB)
The Barankin bound was originally formulated in [Bar49] for scalar, real-valued estimation problems. The Barankin bound is constructed by looking for the estimator minimizing its sth-order absolute central moment subject to the unbiasedness constraint over the whole parameter space Θ, i.e.,

α̂BB = arg min_α̂ E{|α̂ − g(θo)|^s}  subject to  E{α̂} = g(θ)    (2.40)

for every θ ∈ Θ. Focusing on the estimator variance (s = 2), it can only be stated that α̂BB is the minimum variance unbiased estimator in the neighbourhood of θo. Furthermore, if the obtained local solution is independent of θo, α̂BB turns out to be the global minimum variance unbiased estimator.
The Barankin bound has been extended in [Mar97] to multivariate estimation problems adopting a simpler formulation than the original one. In [Mar97, Eq. 9], the Barankin bound was shown to be a covariance inequality bound (2.38) with

s(y; θ) = ∫_{θ̃∈Θ} (fy(y; θ̃)/fy(y; θ)) f(θ̃, θ) dθ̃    (2.41)

the adopted score function, and f(θ̃, θ) ∈ R^P an arbitrary function that must be selected to supply the tightest covariance lower bound. Notice that tighter lower bounds will be obtained if the mean of the score function is null (property P1), i.e.,

E{s(y; θ)} = ∫_{θ̃∈Θ} f(θ̃, θ) dθ̃ = 0,

using that ∫ fy(y; θ̃) dy = 1. Therefore, the functions f(θ̃, θ) leading to the tightest lower bound must be proportional to the difference of two vectors of probability density functions f1(θ̃, θ) and f2(θ̃, θ), i.e.,

f(θ̃, θ) = κ (f1(θ̃, θ) − f2(θ̃, θ))

with κ an arbitrary constant (e.g., κ = 1) and ∫ f1(θ̃, θ) dθ̃ = ∫ f2(θ̃, θ) dθ̃ = 1. This relevant property of f(θ̃, θ) was taken into account in [Tre68, Pr. 2.4.18] to derive the Barankin bound in a different way. Also, a multidimensional version of the Kiefer bound [Kie52] can be obtained by replacing f2(θ̃, θ) with a multivariate delta measure δ(θ̃ − θ).
Using now the covariance inequality, we have that the Barankin bound for the estimation of α = g(θ) is given by

B_BB(θ) = sup_{f(θ̃,θ)} Σvs(θ) Σss^#(θ) Σsv(θ) ≤ Σvv(θ)

with

Σvs(θ) = ∫_{θ̃∈Θ} (g(θ̃) − g(θ)) f^H(θ̃, θ) dθ̃

Σss(θ) = ∫_{θ̃1,θ̃2∈Θ} E{ fy(y; θ̃1) fy(y; θ̃2) / fy²(y; θ) } f(θ̃1, θ) f^H(θ̃2, θ) dθ̃1 dθ̃2.    (2.42)
Notice that the original bound [Bar49] is somewhat more involved because the integral on θ̃ is formulated as a Riemann integral, that is,

∫_{θ̃∈Θ} ξ(θ̃) f(θ̃, θ) dθ̃ = lim_{Q→∞} Σ_{q=1}^{Q} ξ(θ̃q) f(θ̃q, θ)

where the so-called test points θ̃1, ..., θ̃Q are selected to span the whole parameter domain Θ and take into account the existence of large errors. In fact, we can understand the original approach as the bound obtained when the continuous function f(θ̃, θ) is sampled at the test points θ̃1, ..., θ̃Q. From sampling theory, the separation of the test points should be adjusted according to the variability of the selected function f(θ̃, θ). Specifically, a dense sampling (closer test points) should be applied to those regions where f(θ̃, θ) is more abrupt, and vice versa. An important consequence of the sampling theorem is that infinitely many test points are needed if the parameter range is finite, whatever the selected function f(θ̃, θ). This comment is related to the fact that unbiased estimators do not exist for all θ ∈ Θ when Θ is a finite set.¹²
If the number of test points is finite, the Barankin bound is only constrained to be unbiased at the test points θ̃1, ..., θ̃Q [For02]. Consequently, the resulting lower bound is not the highest Barankin bound (Q → ∞), but it is generally realizable even if the parameter range is finite.
The resulting bound can be improved by considering also the bias derivatives at the test points.
This idea has been applied to derive other hybrid lower bounds in [Abe93] or [For02]. Also, the
same reasoning was applied in [Vil01a] to design second-order almost-unbiased estimators.
The Barankin bound theory has been applied to determine the SNR threshold in many nonlinear estimation problems such as, for example, time delay estimation [Zei93][Zei94] or frequency estimation [Kno99]. A geometric interpretation of the Barankin bound is provided in [Alb73] and references therein.
¹²If an estimator were unbiased on the boundary of Θ, this would imply that the estimation error must be zero for these values of θ. Unfortunately, this situation is unrealistic and biased estimators are unavoidable along the boundary of Θ.
Hammersley-Chapman-Robbins Bound (HCRB)
The simplest Barankin bound was formulated by Chapman and Robbins [Cha51] and Hammersley [Ham50] simultaneously by considering a single test point per parameter, i.e., Q = P. This simplified version is by far the most usual variant of the Barankin bound. The original scalar bound was extended to deal with multidimensional problems by Gorman et al. [Gor90]. In that paper, every test point determines a single component of f(θ̃, θ) ∈ R^P in the following manner:

[f(θ̃, θ)]p = ( δ(θ̃ − θ − δp) − δ(θ̃ − θ) ) / δp,

where the P vectors δp ≜ θ̃p − θ are linearly independent and span the entire parameter space Θ. It can be shown that this is the optimal choice of f(θ̃, θ) in case of having P test points [Wei88b, Eq. 33]. Therefore, the p-th element of the score function (2.41) becomes

[s(y; θ)]p = ( fy(y; θ + δp) − fy(y; θ) ) / ( δp fy(y; θ) )  for p = 1, ..., P

and the multiparametric Hammersley-Chapman-Robbins bound is given by

B_HCRB(θ) = sup_{δ1,...,δP} Σvs(θ) Σss^#(θ) Σsv(θ) ≤ Σvv(θ)

with

[Σvs(θ)]p = ( g(θ + δp) − g(θ) ) / δp
Σss(θ) = E{s(y; θ) s^H(y; θ)}.
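A worked scalar example (ours, not from the thesis) makes the HCRB concrete: for a single sample y ~ N(θ, σ²) and g(θ) = θ, the chi-square-type expectation E{fy²(y; θ+δ)/fy²(y; θ)} = exp(δ²/σ²) yields the closed form B_HCRB = sup_δ δ²/(exp(δ²/σ²) − 1), which approaches the CRB σ² as the test point δ → 0 (this Gaussian problem has no threshold effect, so the Barankin-type bound adds nothing here).

```python
import numpy as np

# Scalar Hammersley-Chapman-Robbins bound for y ~ N(theta, sigma2):
#   HCRB(delta) = delta^2 / (exp(delta^2/sigma2) - 1)  <=  CRB = sigma2,
# with the supremum attained in the limit delta -> 0.
sigma2 = 2.0
deltas = np.array([2.0, 1.0, 0.5, 0.1, 0.01])   # shrinking test points
hcrb = deltas**2 / (np.exp(deltas**2 / sigma2) - 1.0)

print(hcrb)      # increasing towards the CRB sigma2 = 2.0
```

The numbers increase monotonically towards σ², illustrating that the HCRB can never exceed the CRB in this small-error-dominated problem, while in threshold problems a finite δ can give a strictly tighter bound.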
Cramér-Rao Bound (CRB)
The Cramér-Rao bound can be obtained from the Hammersley-Chapman-Robbins bound when the P test points converge to the true parameter θ [Gor90][For02]. This means that the CRB is only able to test the small-error region, whereas the Barankin-type bounds were able to test the large-error region as well. The CRB score function is shown to correspond to the projection of the log-likelihood gradient ∇y(y; θ) onto the directions determined by δ1, ..., δP, i.e.,

[s(y; θ)]p = lim_{δp→0} ( fy(y; θ + δp) − fy(y; θ) ) / ( δp fy(y; θ) ) = δp^H (∂fy(y; θ)/∂θ) / fy(y; θ) = δp^H ∇y(y; θ)    (2.43)

and, thus,

s(y; θ) = W^H ∇y(y; θ)    (2.44)
CHAPTER 2. ELEMENTS ON ESTIMATION THEORY
44
with W ≜ [δ1, ..., δP] the non-singular square matrix stacking the P linearly-independent directions. Therefore, the CRB is given by

B_CRB = lim sup_{δ1,...,δP→0} B_HCRB = Σvs(θ) Σss^#(θ) Σsv(θ) = Dg(θ) J^#(θ) Dg^H(θ) ≤ Σvv(θ)    (2.45)

where Σvs(θ) and Σss(θ) are given by

Dg(θ) ≜ ∂g(θ)/∂θ^T
J(θ) ≜ E{∇y(y; θ) ∇y^H(y; θ)} = −E{Hy(y; θ)},

respectively (Appendix 2.F). The matrix W becomes irrelevant provided that W^-1 exists and, thus, we can choose the canonical basis W = IP.
Notice that the CRB only makes sense in estimation problems, i.e., when the parameter is continuous and the first- and second-order derivatives exist for θ ∈ Θ. On the other hand, the above large-error bounds could also be applied to detection problems in which the parameters are discrete variables.
In [Fen59, Th. 1], it is shown that the necessary and sufficient condition for a statistic z(y) to attain the CRB is that fy(y; θ) belongs to the exponential family below

fy(y; θ) = exp{ h^T(θ) z(y) + u(θ) + v(y) }    (2.46)
whatever the content of h(θ), u(θ) or v (y). From the fourth property of the ML estimator
in Section 2.3.2, it follows that z (y) must be the maximum likelihood estimator. This result
can also be obtained by introducing the CRB score function (2.43) into Proposition 2.2. The
existence of efficient estimates for the exponential family is relevant since the normal, Rayleigh
and exponential distributions are members of this family [Kay93b, Pr. 5.14].
Another interpretation of the Cramér-Rao bound is possible [For02] if equation (2.40) is evaluated locally for every value of the true parameter θo. Thus, the Cramér-Rao bound can be obtained by solving the following optimization problem:

min_α̂ E{‖α̂ − g(θo)‖²}  subject to  b(θo) = 0  and  ∂b(θ)/∂θ |θ=θo = 0

where b(θ) = E{α̂} − g(θ) stands for the estimator bias.
Finally, the Cramér-Rao bound can also be derived by expanding the log-likelihood function in a quadratic Taylor series around the true parameter θ = θo (small-error condition), obtaining that

ln fy(y; θ) ≈ ln fy(y; θo) + ∇^H(y; θo)(θ − θo) + (1/2) Tr{ H(y; θo)(θ − θo)(θ − θo)^H }    (2.47)
where ∇(y; θ) and H(y; θ) are the gradient and Hessian of the log-likelihood function. Thus, the gradient of the log-likelihood is linear in the parameter of interest,

∇(y; θ) ≈ ∇(y; θo) + H(y; θo)(θ − θo),

and becomes zero for

θ̂ML ≈ θo − H^-1(y; θo) ∇(y; θo).

Taking now into account the invariance property of the ML estimator, we obtain the following clairvoyant estimator of α = g(θ),

α̂ML = g(θ̂ML) ≈ g(θo) − Dg(θo) H^-1(y; θo) ∇(y; θo),

whose covariance matrix coincides with the CRB (2.45). Although the above estimator does not admit a closed form unless fy(y; θ) belongs to the exponential family (2.46), efficient estimates are approximately supplied by the Newton-Raphson and scoring algorithms in the small-error regime, i.e., limk→∞ θ̂k = θ̂ML ≈ θo (Section 2.5).
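The Newton-Raphson iteration underlying the clairvoyant estimator can be sketched on a simple model. This is our own example, not the thesis code: for M i.i.d. Exp(λ) samples, the log-likelihood gradient and Hessian are grad = M/λ − Σy and H = −M/λ², and the iteration λ ← λ − grad/H converges to the ML estimate 1/mean(y) once started in the small-error region.

```python
import numpy as np

# Newton-Raphson search for the ML estimate of an exponential rate.
rng = np.random.default_rng(5)
lam_true, M = 2.0, 10000
y = rng.exponential(1.0 / lam_true, M)         # i.i.d. Exp(lam_true) samples

lam = 1.0                                      # initial guess
for _ in range(20):
    grad = M / lam - y.sum()                   # d/dlam of log-likelihood
    hess = -M / lam**2                         # second derivative (concave)
    lam -= grad / hess                         # Newton-Raphson step

print(lam, 1.0 / y.mean())                     # both equal the ML estimate
```

Because the log-likelihood is concave in λ here, the iteration converges quadratically to the closed-form ML solution; in general the Hessian can be replaced by its expectation (scoring) to improve robustness far from θo.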
Bhattacharyya Bound (BHB)
The Bhattacharyya bound constitutes an extension of the CRB obtained by considering the higher-order derivatives in the Taylor expansion of fy(y; θ) (2.47). Therefore, it is also a small-error bound, with higher-order derivative constraints on the bias. Indeed, it can be seen as the result of the following optimization problem [For02]:

min_α̂ E{‖α̂ − g(θo)‖²}  subject to  b(θo) = 0  and  ∂^n b^H(θ)/∂θ^n |θ=θo = 0  (n = 1, ..., N)

where b(θ) = E{α̂} − g(θ) stands for the estimator bias and ∂θ^n ∈ R^{P^n} stands for the vectorized n-th power of the differential ∂θ, which can be computed recursively as ∂θ^n = vec(∂θ^{n−1} ∂θ^T) with ∂θ^1 ≜ ∂θ. Notice that the CRB corresponds to N = 1.
To motivate the interest of the Bhattacharyya bound, let us consider that θ̂ = z(y) is an efficient, unbiased estimator of θ and, therefore, the likelihood function is given by (2.46). Let us consider the estimation of the following polynomial in θ of order I,

α = g(θ) = Σ_{i=0}^{I} Gi θ^i,

with θ^i ∈ R^{P^i} the vectorized i-th power of θ. It can be shown that the estimator

α̂(y) = g(θ) + Σvs(θ) Σss^#(θ) s(y; θ)
attains the N-th order Bhattacharyya bound for N ≥ I [Gor91, Prop. 3][Fen59, Th. 2] with

sn(y; θ) = (1/fy(y; θ)) ∂^n fy(y; θ)/∂θ^n    (n = 1, ..., N)

the n-th component of the Bhattacharyya score function s(y; θ) = [s1^T(y; θ), ..., sN^T(y; θ)]^T.
Accordingly, the Bhattacharyya bound becomes

B_BHB(θ) = Σvs(θ) Σss^#(θ) Σsv(θ) ≤ Σvv(θ)

where

Σvs(θ) = [ ∂g(θ)/(∂θ^1)^T, ∂²g(θ)/(∂θ^2)^T, ..., ∂^N g(θ)/(∂θ^N)^T ]
Σss(θ) = E{s(y; θ) s^H(y; θ)}

bearing in mind the results in Appendix 2.F [Abe93].
It can be proved that α̂(y) is unable to attain the N-th Bhattacharyya bound for any N < I and hence the Cramér-Rao bound (N = 1). Moreover, the ML estimator is not efficient even in the asymptotic case [Fen59].
Finally, the Bhattacharyya bound can also be obtained from the Barankin bound when we have at least Q = N × P test points that converge to the true parameter θ following N linearly-dependent trajectories per parameter [Gor91, Sec. 4][For02]. In [For02], the N collinear trajectories corresponding to the p-th parameter are θ + nδp with δp → 0 and n = 1, ..., N. Therefore, we have that

B_BHB = lim_{nδp→0} B_HCRB

for p = 1, ..., P and n = 1, ..., N.
Deterministic Cramér-Rao Bounds in the Presence of Nuisance Parameters
All the above lower bounds are formulated from the likelihood function fy(y; θ). If we deal with a blind estimation problem in which there is a vector of unknown stochastic nuisance parameters x, we have to calculate fy(y; θ) from the conditional p.d.f. fy/x(y/x; θ), as explained in Section 2.3 and indicated next:

fy(y; θ) = Ex{fy/x(y/x; θ)} = ∫ fy/x(y/x; θ) fx(x) dx.
Therefore, the same assumptions about the nuisance parameters leading to the conditional and Gaussian ML estimators in Sections 2.4.2 and 2.4.3 can be applied now to obtain their asymptotic performance in the small-error regime. In the first case, we obtain the so-called
conditional CRB (CCRB) and, in the second case, the Gaussian unconditional CRB (UCRB).
The CCRB and UCRB were deduced in [Sto90a][Sto89] [Ott93] in the context of array signal
processing and adapted to the field of digital synchronization in [Vaz00][Rib01b][Vaz01] and
references therein.
To obtain the (Gaussian) UCRB, the observed vector y is supposed to be normally distributed (2.24). Likewise, the CCRB is obtained assuming that y is distributed according to the conditional p.d.f. fy(y; θ, x̂ML(θ)) (2.19). Therefore, the CCRB and UCRB are not "universal" lower bounds and, in general, they are only meaningful within the scope of the conditional or the unconditional assumptions.
Thus, the CCRB and UCRB can be derived from the CRB formula (2.45) under the corresponding assumption on the nuisance parameters. In the multidimensional case, it is obtained in Appendix 2.G that

B_CCRB(θ) = Dg(θ) Jc^#(θ) Dg^T(θ)    (2.48)

B_UCRB(θ) = Dg(θ) Ju^#(θ) Dg^T(θ)    (2.49)

where

Jc(θ) ≜ 2 Re{ Da^H(θ) (IK ⊗ Rw^-1 PA⊥(θ)) Da(θ) }    (2.50)

Ju(θ) ≜ Dr^H(θ) (R^*(θ) ⊗ R(θ))^-1 Dr(θ)    (2.51)

are the Fisher information matrices for the conditional and unconditional models, respectively, and Da(θ), Dr(θ) are defined as

[Da(θ)]p ≜ vec( ∂A(θ)/∂θp )
[Dr(θ)]p ≜ vec( ∂R(θ)/∂θp ).
The CCRB predicts the asymptotic performance of the CML and GML quadratic estimators
when the SNR goes to infinity. On the other hand, the UCRB supplies the performance of the
GML estimator for Gaussian nuisance parameters or, in general, for infinitely large samples.
These two bounds are generally applied to bound the (small-error) variance of second-order
estimation methods. However, in this dissertation it is shown that, if the nuisance parameters
belong to a polyphase alphabet of constant modulus, this information can be exploited —using
exclusively quadratic processing— to improve the CML and GML estimates. The covariance of
the resulting estimator is shown in Chapter 4 to be the highest lower bound on the performance
of any second-order technique. The resulting bound is deduced in Section 4.2 and has the
following form:

B_BQUE(θ) = Dg(θ) J2^#(θ) Dg^H(θ),
CHAPTER 2. ELEMENTS ON ESTIMATION THEORY
48
where BQUE is the acronym of "Best Quadratic Unbiased Estimator" [Vil01a][Vil05] and

J2(θ) ≜ Dr^H(θ) Q^-1(θ) Dr(θ)    (2.52)

becomes the Fisher information matrix in second-order estimation problems (4.14), with Q(θ) the matrix containing the central fourth-order moments of y (3.10).
Another useful lower bound is the so-called modified CRB (MCRB). This bound was deduced
in the context of digital synchronization by D’Andrea et al. [And94] under the assumption that
all the nuisance parameters are known (see also [Men97][Moe98][Vaz00]). This assumption
corresponds to data-aided estimation problems in which the input signal is known. Thus, the
MCRB allows assessing the performance loss due to the lack of knowledge about the nuisance
parameters in blind estimation problems.
In the multidimensional case, the MCRB is given by

B_MCRB(θ) = Dg(θ) Jm^#(θ) Dg^H(θ) ≤ Σvv(θ)    (2.53)

where

Jm(θ) ≜ −Ex{ Ey/x{ ∂² ln fy/x(y/x; θ)/∂θ∂θ^T } } = 2 Re{ Da^H(θ) (IK ⊗ Rw^-1) Da(θ) }    (2.54)

is deduced in Appendix 2.G.
To conclude this section, let us explain how the lower bounds above are connected in the studied linear model. It can be shown that

B_UCRB(θ) ≥ B_BQUE(θ) ≥ B_CRB(θ) ≥ B_MCRB(θ).

Additionally, if α̂(y) is a second-order unbiased estimator of g(θ), the associated error covariance matrix holds that

Σvv(θ) ≥ B_BQUE(θ) ≥ B_CRB(θ) ≥ B_MCRB(θ),
and the following statements are verified:

1. B_CRB(θ) = B_MCRB(θ) if the nuisance parameters are known [And94]. Alternatively, the MCRB could be attained in high-SNR scenarios if the mean of the nuisance parameters were not zero (i.e., in semiblind estimation problems).

2. B_BQUE(θ) = B_CRB(θ) if and only if R̂ is a sufficient statistic for the estimation problem at hand. This occurs in the case of Gaussian nuisance parameters (Section 2.4.3), or in low-SNR scenarios (Section 2.4.1), whatever the distribution of the nuisance parameters x.
2.6. LOWER BOUNDS IN PARAMETER ESTIMATION
49
3. B_UCRB(θ) = B_BQUE(θ) if the nuisance parameters are Gaussian or the SNR is sufficiently low. Moreover, if the amplitude of x is not constant, it is shown in this thesis that the Gaussian assumption supplies asymptotically (M → ∞) second-order efficient estimates, i.e., B_UCRB(θ) → B_BQUE(θ). This point is studied extensively in Chapter 7.

4. B_CCRB(θ) ≤ B_UCRB(θ), and B_CCRB(θ) = B_UCRB(θ) if the SNR tends to infinity (Appendix 2.D).
2.6.2 Bayesian Bounds based on the Cauchy-Schwarz Inequality
In the Bayesian case, lower bounds on the estimator MSE can also be derived from the Cauchy-Schwarz inequality

Ey,θ{ee^H} ≥ Ey,θ{es^H} (Ey,θ{ss^H})^# Ey,θ{se^H}

in which the expectation also involves the random parameters, and the score function s(y, θ) is zero-mean for any value of y [Wei88b, Eq. 1], i.e.,

Eθ/y{s(y, θ)} = ∫ s(y, θ) fθ/y(θ/y) dθ = 0    (2.55)

and, therefore, Ey,θ{s(y, θ)} = Ey{Eθ/y{s(y, θ)}} = 0. Once again, the bound is attained if and only if the estimation error is proportional to the selected score function, i.e.,

e(y, θ) = Ey,θ{es^H} (Ey,θ{ss^H})^# s(y, θ).
It is known that the conditional mean estimator yields the highest lower bound on the
(Bayesian) MSE [Wei88b, Eq. 9][Kay93b, Sec. 11.4] with
s (y; θ) = e (y; θ) = Eθ/y {g(θ)/y} − g (θ)
the associated score function. However, the conditional mean estimator is often not practical
because it usually requires numerical integration. For this reason, some simpler but weaker lower
bounds have been proposed in [Wei88b] by adopting a different set of score functions. Accordingly, none of these bounds will be attained unless they coincide with the MMSE bound. Among
these bounds, we can find the Bayesian Cramér-Rao [Tre68][Wei88b], Bayesian Bhattacharyya
[Tre68][Wei88b], Bobrovsky-Zakai [Bob76] and Weiss-Weinstein [Wei85][Wei88b]. These bounds
are the Bayesian counterparts of the CRB, Bhattacharyya, Hammersley-Chapman-Robbins and
Barankin-type deterministic bounds, respectively, in which the likelihood function fy (y; θ) is
substituted by the joint p.d.f. fy,θ (y, θ). Notice that Bayesian bounds are implicitly large-error
bounds because the whole range of θ is considered by means of the parameter prior fθ (θ). The
Weiss-Weinstein bound is briefly described in the following section since it is the most general
one.
CHAPTER 2. ELEMENTS ON ESTIMATION THEORY
50
Weiss-Weinstein Bound (WWB)
The Weiss-Weinstein bound can be understood as the Bayesian version of a Barankin-type bound
in which multiple test points are considered. The score function of the WWB is given by
$$s(y,\theta) = \int_{\theta\pm\delta\in\Theta} Q_s(y,\theta,\delta)\, f(\delta)\, d\delta$$
with $Q_s(y,\theta,\delta)$ defined as
$$Q_s(y,\theta,\delta) \triangleq \left[\frac{f_{y,\theta}(y,\theta+\delta)}{f_{y,\theta}(y,\theta)}\right]^{s(\delta)} - \left[\frac{f_{y,\theta}(y,\theta-\delta)}{f_{y,\theta}(y,\theta)}\right]^{1-s(\delta)}$$
and the terms $0 < s(\delta) < 1$ and $f(\delta)$ are selected to produce the tightest lower bound. If we choose
s (δ) = 1, we have exactly the Bayesian replica of the Barankin bound. However, the authors
showed that tighter lower bounds can be derived with s (δ) < 1.
The above score function verifies the regularity condition $E_{\theta/y}\{s(y,\theta)\} = 0$ in (2.55), so that the WWB can be computed as
$$B_{WWB} = \sup_{f(\delta),\, s(\delta)} \Sigma_{es}\,\Sigma_{ss}^{\#}\,\Sigma_{se} \le \Sigma_{ee}$$
where
$$\Sigma_{es} = E_{y,\theta}\left\{e(y,\theta)\, s^H(y,\theta)\right\} = -E_{y,\theta}\left\{g(\theta)\, s^H(y,\theta)\right\} = E_{y,\theta}\left\{\int_{\theta\pm\delta\in\Theta}\left[\frac{f_{y,\theta}(y,\theta+\delta)}{f_{y,\theta}(y,\theta)}\right]^{s(\delta)}\left[g(\theta+\delta)-g(\theta)\right]^H f(\delta)\, d\delta\right\}$$
$$\Sigma_{ss} = E_{y,\theta}\left\{s(y,\theta)\, s^H(y,\theta)\right\}.$$
Thus far, infinitely many test points have been considered, as done in the initial approach to the Barankin bound in (2.41). If a finite number Q of test points is to be considered, we can always use a set of delta measures, $f(\delta) = \sum_{q=1}^{Q} f(\delta_q)\,\delta(\delta-\delta_q)$, to obtain the following score function
$$s(y,\theta) = \sum_{q=1}^{Q} Q_s(y,\theta,\delta_q)\, f(\delta_q),$$
that must be optimized for $\{\delta_q\}_{q=1,\dots,Q}$, $\{f(\delta_q)\}_{q=1,\dots,Q}$ and $\{s(\delta_q)\}_{q=1,\dots,Q}$. In that case, the
Qth-order WWB can be obtained as indicated next:
$$B_{WWB} = \sup_{\{\delta_q\},\{f(\delta_q)\},\{s(\delta_q)\}} \Sigma_{es}\,\Sigma_{ss}^{\#}\,\Sigma_{se} = \sup_{\{\delta_q\},\{f(\delta_q)\},\{s(\delta_q)\}} G\, F^H\left(F\, Q\, F^H\right)^{\#} F\, G^H = \sup_{\{\delta_q\},\{s(\delta_q)\}} G\, Q^{\#}\, G^H$$
where $\Sigma_{es} = G F^H$ and $\Sigma_{ss} = F Q F^H$ are given by
$$\left[G\right]_q \triangleq E_{y,\theta}\left\{\left[\frac{f_{y,\theta}(y,\theta+\delta_q)}{f_{y,\theta}(y,\theta)}\right]^{s(\delta_q)}\left[g(\theta+\delta_q)-g(\theta)\right]\right\}$$
$$F \triangleq \operatorname{diag}\left(f(\delta_1),\dots,f(\delta_Q)\right)$$
$$\left[Q\right]_{p,q} \triangleq E_{y,\theta}\left\{Q_s(y,\theta,\delta_p)\, Q_s(y,\theta,\delta_q)\right\}.$$
A simpler expression is obtained if $g(\theta) = \theta$. In that case, we get the original WWB [Wei85], which is given by
$$B_{WWB} = \sup_{\Delta}\, \Delta\,\widetilde{Q}^{\#}\,\Delta^H \le \Sigma_{ee} \tag{2.56}$$
with $\Delta \triangleq [\delta_1,\dots,\delta_Q]$ and
$$\left[\widetilde{Q}\right]_{p,q} \triangleq \frac{E_{y,\theta}\left\{Q_s(y,\theta,\delta_p)\, Q_s(y,\theta,\delta_q)\right\}}{E_{y,\theta}\left\{L_s(y,\theta,\delta_p)\right\}\, E_{y,\theta}\left\{L_s(y,\theta,\delta_q)\right\}},$$
using the following definition
$$L_s(y,\theta,\delta) \triangleq \left[\frac{f_{y,\theta}(y,\theta+\delta)}{f_{y,\theta}(y,\theta)}\right]^{s(\delta)}.$$
The optimization of $B_{WWB}$ is normally prohibitive, and the authors suggest in [Wei88b, Eq. 39] to work with $s(\delta_q) = 1/2$ because it is usually the optimal choice in the unidimensional case. In that case, it is possible to write the WWB in terms of the distance
$$\mu(s,\delta) \triangleq \ln E_{y,\theta}\left\{L_s(y,\theta,\delta)\right\} = \ln \iint f_{y,\theta}^{\,s}(y,\theta+\delta)\, f_{y,\theta}^{\,1-s}(y,\theta)\, dy\, d\theta$$
used to derive the Chernoff bound on the probability of detection error [Tre68, p. 119]. Thus, the matrix $\widetilde{Q}$ can be represented in terms of the Bhattacharyya distance $\mu(1/2,\delta)$ as follows:
$$\left[\widetilde{Q}\right]_{p,q} = \frac{2\left(e^{\mu(1/2,\,\delta_p-\delta_q)} - e^{\mu(1/2,\,\delta_p+\delta_q)}\right)}{e^{\mu(1/2,\,\delta_p)+\mu(1/2,\,\delta_q)}}.$$
As it happened in the deterministic case, the Bobrovsky-Zakai, Bayesian Cramér-Rao and Bhattacharyya bounds can be deduced from the more general Weiss-Weinstein bound in (2.56). Specifically, the Bobrovsky-Zakai bound is obtained by setting s = 1 and Q = P (i.e., one test point per parameter). The Bayesian CRB is obtained from the Bobrovsky-Zakai bound if the Q = P test points converge to the true parameter along linearly-independent lines. In addition, the Nth-order Bhattacharyya bound is obtained when there are N × P test points converging to the true parameter through P linearly-independent trajectories.
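The WWB above can be illustrated numerically. The following sketch evaluates the first-order (Q = 1) WWB with s(δ) = 1/2 for a toy jointly Gaussian scalar model; the model, its variances and the closed-form Bhattacharyya distance used below are illustrative assumptions, not taken from this thesis. The bound is maximized over a grid of test points and checked against the exact Bayesian MMSE, which it cannot exceed.

```python
import math

# Toy scalar Gaussian problem (an illustrative assumption, not a model from
# this thesis): y = theta + n with n ~ N(0, sn2) and prior theta ~ N(0, st2).
# For this jointly Gaussian case the Bhattacharyya distance has the closed
# form mu(1/2, delta) = -(delta**2 / 8) * (1/sn2 + 1/st2).
sn2, st2 = 1.0, 4.0

def mu_half(delta):
    return -(delta ** 2 / 8.0) * (1.0 / sn2 + 1.0 / st2)

def wwb_single(delta):
    # First-order (Q = 1) WWB with s = 1/2:
    # [Qt]_{1,1} = 2*(1 - exp(mu(2*delta))) / exp(2*mu(delta)), since mu(0) = 0
    qt = 2.0 * (1.0 - math.exp(mu_half(2.0 * delta))) / math.exp(2.0 * mu_half(delta))
    return delta ** 2 / qt

# Maximize the bound over the test point delta on a coarse grid.
wwb = max(wwb_single(0.01 * k) for k in range(1, 1000))

# The bound cannot exceed the exact Bayesian MMSE of this Gaussian problem.
mmse = 1.0 / (1.0 / sn2 + 1.0 / st2)
print(wwb, mmse)
assert 0.0 < wwb <= mmse
```

As the test point shrinks (δ → 0), the single-point bound approaches the Gaussian MMSE, consistent with the bound being asymptotically tight in this case.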
2.6.3 Bayesian Bounds based on Kotelnikov's Inequality
Another important class of Bayesian lower bounds is obtained from Kotelnikov's inequality, proposed for the first time in [Kot59, p. 91] and used afterwards in [Bel74, Eq. 2] and [Cha75] to bound the MSE in case of a single uniformly distributed parameter. Kotelnikov's result is extended in [Bel97, Eq. 11] to admit any distribution of the parameter of interest, resulting in the following inequality:
$$\Pr(|e| \ge \delta) \ge \int_{-\infty}^{\infty} P_e(\theta-\delta,\theta+\delta)\left[f_\theta(\theta-\delta)+f_\theta(\theta+\delta)\right] d\theta \triangleq D(\delta) \tag{2.57}$$
where $e = \hat{\theta} - \theta_o$ is the estimation error for the scalar case, $f_\theta(\theta)$ is the parameter prior, and $P_e(\theta-\delta,\theta+\delta)$ is the minimum error probability associated to the following binary detection problem:
Definition 1 Let us assume that the parameter $\theta_o$ could take only two possible values, $\theta^- \triangleq \theta - \delta$ and $\theta^+ \triangleq \theta + \delta$, with probabilities
$$\Pr\left\{\theta^-\right\} \triangleq \frac{f_\theta(\theta^-)}{f_\theta(\theta^-)+f_\theta(\theta^+)} \quad\text{and}\quad \Pr\left\{\theta^+\right\} \triangleq \frac{f_\theta(\theta^+)}{f_\theta(\theta^-)+f_\theta(\theta^+)},$$
respectively. In that case, the estimation problem becomes a binary detection problem consisting in deciding the most likely hypothesis, $\theta^-$ or $\theta^+$, in view of the observation $y_o$ and the prior probabilities $\Pr\{\theta^-\}$ and $\Pr\{\theta^+\}$.
The solution to this classical problem is supplied by the MAP detector or, equivalently, by the likelihood ratio test [Kay93a]. Then, the parameter is decided as follows:
$$\hat{\theta} = \begin{cases} \theta^- & f_y(y;\theta^-)\Pr\{\theta^-\} \ge f_y(y;\theta^+)\Pr\{\theta^+\} \\ \theta^+ & f_y(y;\theta^-)\Pr\{\theta^-\} < f_y(y;\theta^+)\Pr\{\theta^+\} \end{cases}$$
and, thus,
$$P_e\left(\theta^-,\theta^+\right) = \Pr\left\{\theta^-\right\}\int_{\bar{\theta}}^{\infty} f_y(y;\theta^-)\, dy + \Pr\left\{\theta^+\right\}\int_{-\infty}^{\bar{\theta}} f_y(y;\theta^+)\, dy,$$
with $\bar{\theta}$ the decision threshold of the above test.
If the proposed estimator solves optimally the related detection problem for all the possible values of θ, equation (2.57) holds with equality. Moreover, if the hypotheses are very close (δ → 0), the MAP estimator,
$$\hat{\theta}_{MAP} = \arg\max_{\theta}\, f_y(y;\theta)\, f_\theta(\theta),$$
attains Kotelnikov's bound in (2.57) and, thus, minimizes $\Pr(|e| \ge \delta)$, as explained in [Kay93b, Sec. 11.3].
Ziv-Zakai Bounds (ZZB)
The original work relating the estimation and detection problems was presented by Ziv and Zakai in [Ziv69]. However, they applied Chebyshev's inequality,
$$E_\theta E\left\{|e|^2\right\} \ge \delta^2\,\Pr(|e| \ge \delta),$$
in lieu of Kotelnikov's inequality (2.57), and the resulting bound was looser. The original idea was improved in [Cha75][Bel74][Wei88a][Bel97], where Kotelnikov's inequality is used to derive tight bounds on the (Bayesian) MSE. To do so, it is necessary to use the following relation between $\Pr(|e| \ge \delta)$ and the mean square error [Bel97, Eq. 2]:
$$E_\theta E\left\{|e|^2\right\} = \int_0^{\infty}\Pr(|e| \ge \delta)\,\delta\, d\delta$$
where the Bayesian expectation is made explicit again. In the scalar case, the Ziv-Zakai bound is extended in [Bel97, Eq. 14] as follows:
$$E_\theta E\left\{|e|^2\right\} \ge \int_0^{\infty}\nu\left[D(\delta)\right]\delta\, d\delta \tag{2.58}$$
where $D(\delta)$ is the bound on $\Pr(|e| \ge \delta)$ introduced previously in (2.57), and $\nu[\cdot]$ is the "valley-filling" function introduced by Bellini and Tartara in [Bel74] and defined as
$$\nu\left[f(x)\right] \triangleq \max_{\xi \ge 0} f(x+\xi).$$
If the prior distribution is uniform on a finite interval, the above bound reduces to the Bellini-Tartara bound [Bel74].
Finally, the Bellini-Tartara bound is generalized in [Bel97] to multivariate problems and arbitrary prior functions. In that case, the extended Ziv-Zakai bound is obtained projecting the estimation error $e = \hat{\theta} - \theta_o$ onto a given direction determined by the vector v [Bel97]. For a given v, we have the same expression,
$$E_\theta E\left\{\left|v^H e\right|^2\right\} \ge \int_0^{\infty}\nu\left[D_{\max}(\delta)\right]\delta\, d\delta,$$
where
$$D_{\max}(\delta) = \max_{\Delta\,:\, v^H\Delta = \delta}\int_{-\infty}^{\infty} P_e(\theta-\Delta,\theta+\Delta)\left[f_\theta(\theta-\Delta)+f_\theta(\theta+\Delta)\right] d\theta.$$
In principle, the two hypotheses $\theta^- \triangleq \theta - \Delta$ and $\theta^+ \triangleq \theta + \Delta$ could be placed arbitrarily in $\mathbb{R}^P$, provided that the projection of the estimation error $v^H e$ is equal to δ in case of an erroneous detection or, in other words, ∆ must satisfy $v^H\Delta = \delta$. Then, the tightest lower bound corresponds to the vector ∆ yielding the highest error probability. The reader is referred to the original work [Bel97] for further results, properties and examples. The utilization of the Ziv-Zakai bound (ZZB) in the problem of passive time delay estimation is carried out in detail in [Wei83][Wei84].
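As an illustration, the following sketch evaluates the bound (2.58) for a toy scalar problem with a uniform prior; the model and all numerical values are illustrative assumptions. With equal priors, the detection probability entering D(δ) in (2.57) is the classical Gaussian binary error probability, and the valley-filling function is trivial here because D(δ) is already non-increasing.

```python
import math

# Toy scalar problem (illustrative assumption): y = theta + n, n ~ N(0, 1),
# uniform prior theta ~ U(0, T). With equal priors and hypotheses 2*delta
# apart, the minimum detection error probability is Pe = Q(delta).
T = 8.0

def qfunc(x):
    # Gaussian tail probability Q(x)
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def D(delta):
    # Bound (2.57): both hypotheses theta -/+ delta must lie inside the prior
    # support; the prior-weight integral then equals 2*(T - 2*delta)/T.
    if 2.0 * delta >= T:
        return 0.0
    return qfunc(delta) * 2.0 * (T - 2.0 * delta) / T

# Ziv-Zakai bound (2.58); D(delta) is non-increasing here, so the
# valley-filling function nu[.] leaves it unchanged.
dd = 1e-3
zzb = sum(D(k * dd) * (k * dd) * dd for k in range(1, int(T / dd)))

prior_var = T ** 2 / 12.0
print(zzb, prior_var)
assert 0.0 < zzb < prior_var
```

The bound is necessarily smaller than the prior variance, since the latter is the MSE of the trivial estimator that returns the prior mean.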
[Figure 2.5 shows a classification tree: deterministic lower bounds split into small-error bounds (Cramér-Rao [Kay93b]; Bhattacharyya [Bat46][Fen59][Gor91][For02]; Conditional CRB [Sto89][Rib01][Vaz00]; Unconditional CRB [Sto90a][Vaz00]; Modified CRB [And94][Men97][Moe98]; BQUE [Vil01b][Vil05], Chapter 3) and large-error bounds (Barankin [Bar49][Mar97][Tre68][For02]; Kiefer [Kie62]; Hammersley-Chapman-Robbins [Ham50][Cha51][Gor90]; Abel [Abe93]), while Bayesian lower bounds derive either from the Cauchy-Schwarz inequality (Bayesian CRB [Tre68][Wei88b]; Bayesian Bhattacharyya [Tre68][Wei88b]; Weiss-Weinstein [Wei85][Wei88b]; Bobrovsky-Zakai [Bob76]; Second-Order MMSE [Vil01b][Vil05], Chapter 4) or from Kotelnikov's inequality (Ziv-Zakai [Ziv69]; Bellini-Tartara [Bel74]; Extended Ziv-Zakai [Bel97]).]
Figure 2.5: Classification of the most important lower bounds in the literature. The lower bounds assuming a certain model for the nuisance parameters —or imposing the second-order constraint— are marked in gray.
Appendix 2.A UML for polyphase alphabets
Let us consider that the nuisance parameters belong to a polyphase alphabet of dimension I, so that $x_k \in \{e^{j2\pi i/I}\}$ with $i = 0,\dots,I-1$. In that case, it can be shown that the log-likelihood $\ln f_y(y;\theta)$ is the sum of a finite number of cosh(·) functions, which are computed next:
$$E_{x_k}\left\{\exp\left(2\operatorname{Re}\left\{x_k^*\, a_k^H R_w^{-1} y\right\}\right)\right\} = \frac{1}{I}\sum_{i=0}^{I-1}\exp\left(2\operatorname{Re}\left\{e^{-j2\pi i/I}\, a_k^H R_w^{-1} y\right\}\right) = \frac{2}{I}\sum_{i=0}^{I/2-1}\cosh\left(2\operatorname{Re}\left\{e^{j2\pi i/I}\, a_k^H R_w^{-1} y\right\}\right)$$
$$E_x\left\{\exp\left(2\operatorname{Re}\left\{\sum_{k=1}^{K}\sum_{l>k} x_k^* x_l\, a_k^H R_w^{-1} a_l\right\}\right)\right\} = \prod_{k=1}^{K}\prod_{l>k}\frac{2}{I^2}\sum_{i=0}^{I-1}(I-i)\cosh\left(2\operatorname{Re}\left\{e^{j2\pi i/I}\, a_k^H R_w^{-1} a_l\right\}\right)$$
$$E_x\left\{\exp\left(2\operatorname{Re}\left\{|x_k|^2\, a_k^H R_w^{-1} a_k\right\}\right)\right\} = \frac{2}{I}\sum_{i=0}^{I/2-1}\cosh\left(a_k^H R_w^{-1} a_k\right).$$
Notice that the term $E_x\left\{\exp\left(x^H A^H R_w^{-1} A\, x\right)\right\}$ can be omitted if $A^H R_w^{-1} A$ does not depend on the parameter. This situation is usual in digital communications [Men97, Sec. 5.7.3] because the noise is white (i.e., $R_w = \sigma_w^2 I_M$) and $a_k^H a_l \cong E_s\,\delta(k,l)$, with $E_s$ the energy of the received symbols. In that case, we have that
$$\ln f_y(y;\theta) \propto \sum_{k=1}^{K}\ln\sum_{i=0}^{I/2-1}\cosh\left(2\operatorname{Re}\left\{e^{j2\pi i/I}\, a_k^H(\theta)\, R_w^{-1}\, y\right\}\right).$$
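The finite cosh(·) expansion of the symbol average can be checked numerically. The sketch below verifies, for I = 4 and I = 8 and random values of the inner product $a_k^H R_w^{-1} y$ (abbreviated c; the random values are illustrative), that averaging exp(2 Re{·}) over the polyphase alphabet equals the half-range cosh sum.

```python
import cmath, math, random

# Check: (1/I) * sum_i exp(2 Re{e^{-j2*pi*i/I} c})
#      = (2/I) * sum_{i<I/2} cosh(2 Re{e^{j2*pi*i/I} c})   for even I,
# where c stands for the inner product a_k^H R_w^{-1} y.
random.seed(0)

def psk_average(c, I):
    return sum(cmath.exp(2.0 * (cmath.exp(-2j * math.pi * i / I) * c).real)
               for i in range(I)).real / I

def cosh_sum(c, I):
    return (2.0 / I) * sum(math.cosh(2.0 * (cmath.exp(2j * math.pi * i / I) * c).real)
                           for i in range(I // 2))

for I in (4, 8):
    for _ in range(5):
        c = complex(random.gauss(0, 1), random.gauss(0, 1))
        assert abs(psk_average(c, I) - cosh_sum(c, I)) < 1e-9 * cosh_sum(c, I)
print("cosh identity verified for I = 4, 8")
```

The pairing of alphabet points e^{j2πi/I} and −e^{j2πi/I} is what collapses the I exponentials into I/2 hyperbolic cosines.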
Appendix 2.B Low-SNR UML results
Unbiasedness Condition
It can be shown that the low-SNR UML estimator is unbiased for any positive SNR if $A^H(\theta)\, R_w^{-1}\, A(\theta)$ is independent of θ. If this condition is verified, then the mean value of the log-likelihood gradient is null at $\theta = \theta_o$ for any value of the parameter. The proof is provided next. Let
$$E_y\left\{\left.\frac{\partial}{\partial\theta_p}\ln f_y(y;\theta)\right|_{\theta=\theta_o}\right\} = \operatorname{Tr}\left(A^H(\theta_o)\, R_w^{-1}\, D_p(\theta_o)\, R_w^{-1}\, A(\theta_o)\right) \tag{2.59}$$
be the expected value of the log-likelihood gradient under the low-SNR approximation, with
$$D_p(\theta) \triangleq \frac{\partial}{\partial\theta_p}\left(A(\theta)\, A^H(\theta)\right) = \frac{\partial A(\theta)}{\partial\theta_p}\, A^H(\theta) + A(\theta)\,\frac{\partial A^H(\theta)}{\partial\theta_p}.$$
If we plug $D_p(\theta)$ into (2.59), the argument of the trace can be written as
$$A^H(\theta)\, R_w^{-1}\, A(\theta)\,\frac{\partial}{\partial\theta_p}\left(A^H(\theta)\, R_w^{-1}\, A(\theta)\right)$$
using that $\operatorname{Tr}(AB) = \operatorname{Tr}(BA)$. Therefore, since $A^H(\theta)\, R_w^{-1}\, A(\theta)$ is positive definite, (2.59) vanishes iff $A^H(\theta)\, R_w^{-1}\, A(\theta)$ is independent of θ. This condition implies that the matrix $R_w^{-1}\, D_p(\theta)\, R_w^{-1}$ in (2.59) must lie completely in the subspace orthogonal to $A(\theta)$ for $p = 1,\dots,P$.
Self-Noise Free Condition
If the gradient of the low-SNR UML log-likelihood function is not zero at $\theta = \theta_o$ as the noise variance goes to zero, the estimator exhibits a variance floor due to the randomness of the nuisance parameters. A sufficient condition to have self-noise-free estimates at high SNR is that
$$\lim_{\sigma_w^2\to 0}\left.\frac{\partial}{\partial\theta_p}\ln f_y(y;\theta)\right|_{\theta=\theta_o} = 0,$$
meaning that
$$x^H A^H(\theta)\, N^{-1}\, D_p(\theta)\, N^{-1}\, A(\theta)\, x = 0$$
for any value of θ and x, with $R_w = \sigma_w^2 N$. Notice that this requirement coincides with the unbiasedness condition if $A(\theta)\, x$ effectively spans all the signal subspace.
Appendix 2.C CML results
Unbiasedness
Plugging $B(\theta) \triangleq R_w^{-1/2} A(\theta)$ into $\ln f_y(y;\theta,\hat{x}_{ML}(\theta))$ (2.14), we have that
$$\ln f_y(y;\theta,\hat{x}_{ML}(\theta)) = C_4 + \operatorname{Tr}\left(R_w^{-1/2}\, B(\theta)\left(B^H(\theta)\, B(\theta)\right)^{-1} B^H(\theta)\, R_w^{-1/2}\,\widehat{R}\right) = C_4 + \operatorname{Tr}\left(R_w^{-1/2}\, P_B(\theta)\, R_w^{-1/2}\,\widehat{R}\right)$$
with $P_B(\theta) \triangleq B(\theta)\left(B^H(\theta)\, B(\theta)\right)^{-1} B^H(\theta)$ the orthogonal projector onto the subspace generated by the columns of $B(\theta)$. Computing now the log-likelihood gradient, it is found that
$$\frac{\partial}{\partial\theta_p}\ln f_y(y;\theta,\hat{x}_{ML}(\theta)) = \operatorname{Tr}\left(R_w^{-1/2}\,\frac{\partial P_B(\theta)}{\partial\theta_p}\, R_w^{-1/2}\,\widehat{R}\right)$$
where the derivative of the orthogonal projector is given by [Vib91, Eq. 33]
$$\frac{\partial P_B(\theta)}{\partial\theta_p} = P_B^{\perp}(\theta)\,\frac{\partial B(\theta)}{\partial\theta_p}\, B^{\#}(\theta) + \left(P_B^{\perp}(\theta)\,\frac{\partial B(\theta)}{\partial\theta_p}\, B^{\#}(\theta)\right)^H \tag{2.60}$$
with $P_B^{\perp}(\theta) \triangleq I_M - P_B(\theta)$. Therefore, the expected value of the gradient is
$$E_y\left\{\left.\frac{\partial}{\partial\theta_p}\ln f_y(y;\theta,\hat{x}_{ML}(\theta))\right|_{\theta=\theta_o}\right\} = \operatorname{Tr}\left(R_w^{-1/2}\left.\frac{\partial P_B(\theta)}{\partial\theta_p}\right|_{\theta=\theta_o} R_w^{-1/2}\left(A(\theta_o)\, A^H(\theta_o) + R_w\right)\right)$$
$$= \operatorname{Tr}\left(\left.\frac{\partial P_B(\theta)}{\partial\theta_p}\right|_{\theta=\theta_o} + \left.\frac{\partial P_B(\theta)}{\partial\theta_p}\right|_{\theta=\theta_o} B(\theta_o)\, B^H(\theta_o)\right),$$
that is equal to zero because
$$P_B^{\perp}(\theta)\, B(\theta) = 0 \qquad\qquad B^H(\theta)\, P_B^{\perp}(\theta) = 0.$$
Self-Noise Free
If the gradient of the CML log-likelihood function is not zero at $\theta = \theta_o$ as the noise variance goes to zero, the estimator exhibits a variance floor due to the randomness of the nuisance parameters. A sufficient condition to have self-noise-free estimates at high SNR is that
$$\lim_{\sigma_w^2\to 0}\left.\frac{\partial}{\partial\theta_p}\ln f_y(y;\theta,\hat{x}_{ML}(\theta))\right|_{\theta=\theta_o} = 0,$$
meaning that
$$x^H B^H(\theta_o)\left.\frac{\partial P_B(\theta)}{\partial\theta_p}\right|_{\theta=\theta_o} B(\theta_o)\, x = 0$$
for any value of θ and x. Notice that the last equation is verified for any value of x due to (2.60). Actually, the CML is able to cancel out the self-noise as well as the bias because of the orthogonal projector $P_B^{\perp}(\theta)$ appearing in $\partial P_B(\theta)/\partial\theta_p$ (2.60).
Appendix 2.D GML asymptotic study
Using the matrix inversion lemma [Kay93b, p. 571], we find that $R^{-1}(\theta)$ has the following asymptotic expressions:
$$\lim_{\sigma_w^2\to\infty} R^{-1} = R_w^{-1}\left(I_M - A\, A^H R_w^{-1}\right)$$
$$\lim_{\sigma_w^2\to 0} R^{-1} = R_w^{-1}\left(I_M - A\left(A^H R_w^{-1} A\right)^{-1} A^H R_w^{-1}\right)$$
with the operator lim meaning “asymptotically approximated to” in this appendix.
If we substitute these results into (2.24) and omit constant terms, we obtain the following asymptotic expressions for the GML cost function:
$$\lim_{\sigma_w^2\to\infty}\ln E_x\left\{f_y(y/x;\theta)\right\} \propto \operatorname{Tr}\left(\ln\left(I_M - A\, A^H R_w^{-1}\right)\right) + \operatorname{Tr}\left(R_w^{-1} A\, A^H R_w^{-1}\,\widehat{R}\right) \cong -\operatorname{Tr}\left(A^H R_w^{-1} A\right) + \operatorname{Tr}\left(R_w^{-1} A\, A^H R_w^{-1}\,\widehat{R}\right) \tag{2.61}$$
$$\lim_{\sigma_w^2\to 0}\ln E_x\left\{f_y(y/x;\theta)\right\} \propto -\operatorname{Tr}\left(\ln\left(A\, A^H + R_w\right)\right) + \operatorname{Tr}\left(R_w^{-1} A\left(A^H R_w^{-1} A\right)^{-1} A^H R_w^{-1}\,\widehat{R}\right) \cong \operatorname{Tr}\left(R_w^{-1} A\left(A^H R_w^{-1} A\right)^{-1} A^H R_w^{-1}\,\widehat{R}\right) \tag{2.62}$$
that correspond to the low-SNR UML and CML solutions obtained in (2.17) and (2.19), respectively.
The independent term $b(\theta)$ in (2.61) has been approximated using the Taylor expansion of the logarithm and the commutative property of the trace [Kay93b, p. 571], yielding
$$\lim_{\sigma_w^2\to\infty}\operatorname{Tr}\left(\ln\left(I_M - \sigma_w^{-2} A\, A^H N^{-1}\right)\right) = \operatorname{Tr}\left(\ln(I_M)\right) + \operatorname{Tr}\left(-\sigma_w^{-2} A\, A^H N^{-1}\right) = -\operatorname{Tr}\left(A^H R_w^{-1} A\right).$$
On the other hand, the independent term $b(\theta)$ in (2.62) is neglected at high SNR since it converges to the constant $-\operatorname{Tr}\left(\ln\left(A\, A^H\right)\right)$ whereas the second term is proportional to $\sigma_w^{-2}$.
Appendix 2.E Closed-loop estimation efficiency
Following the indications in [Kay93b, Appendix 7B], if the observation y is split into N statistically independent blocks, the log-likelihood function $\ln f_y(y;\theta)$ is given by
$$\ln f_y(y;\theta) = \sum_{n=1}^{N}\ln f_z(z_n;\theta)$$
and, thus, the corresponding gradient and Hessian are given by
$$\nabla(y;\theta) = \sum_{n=1}^{N}\frac{\partial\ln f_z(z_n;\theta)}{\partial\theta} = \sum_{n=1}^{N}\nabla_z(z_n;\theta) \tag{2.63}$$
$$H(y;\theta) = \sum_{n=1}^{N}\frac{\partial^2\ln f_z(z_n;\theta)}{\partial\theta\,\partial\theta^T} \tag{2.64}$$
respectively. Therefore, the Newton-Raphson and scoring algorithms are updated in the k-th iteration adding the following term:
$$\frac{1}{N}\sum_{n=1}^{N} D_g(\hat{\theta}_k)\, J_z^{-1}(\hat{\theta}_k)\,\nabla_z(z_n;\hat{\theta}_k), \tag{2.65}$$
in which we have taken into account that
$$\sum_{n=1}^{N}\frac{\partial^2\ln f_z(z_n;\theta)}{\partial\theta\,\partial\theta^T} \cong N\, E_z\left\{\frac{\partial^2\ln f_z(z_n;\theta)}{\partial\theta\,\partial\theta^T}\right\} \triangleq -N J_z(\theta)$$
for N sufficiently large [Kay93b, Appendix 7B]. Notice that the last quantity is approximately equal to (minus) the Fisher information matrix:
$$J(\theta) = -E_y\left\{H(y;\theta)\right\} = -\sum_{n=1}^{N} E_z\left\{\frac{\partial^2\ln f_z(z_n;\theta)}{\partial\theta\,\partial\theta^T}\right\} = N J_z(\theta).$$
Then, the averaging in (2.65) can be substituted by an exponential filtering such as
$$\varepsilon_n = (1-\mu)\,\varepsilon_{n-1} - \mu\, D_g(\hat{\theta}_k)\, J_z^{-1}(\hat{\theta}_k)\,\nabla_z(z_n;\hat{\theta}_k), \tag{2.66}$$
with $\varepsilon_0 = 0$. The step-size or forgetting factor µ is adjusted to yield the same noise equivalent bandwidth, which is defined as
$$B_n \triangleq \frac{T}{2\left|H(0)\right|^2}\int_{-1/2T}^{1/2T}\left|H(f)\right|^2 df,$$
where T is the sampling period and H(f) is the frequency response of the adopted filter [Men97, Sec. 3.5.5]. Using this formula, it follows that the noise equivalent bandwidth for the integrator (2.65) and the exponential filter (2.66) is $B_n = 0.5/N$ and $B_n = 0.5\mu/(2-\mu) \cong \mu/4$, respectively,
where the last approximation is verified for $\mu \ll 1$. Using this approximation, the step-size µ is approximately equal to $2/N$ (for $N \gg 1$) and (2.66) can be written as
$$\varepsilon_n = \varepsilon_{n-1} - \mu\, D_g(\hat{\theta}_k)\, J_z^{-1}(\hat{\theta}_k)\,\nabla_z(z_n;\hat{\theta}_k). \tag{2.67}$$
Finally, if (2.67) is integrated into the Newton-Raphson or scoring recursions, and the estimated parameter is updated after processing each block, we obtain the closed-loop estimator
presented in (2.30). Notice that the obtained closed-loop estimator can also iterate the N blocks
several times as the original iterative methods in (2.27) and (2.29).
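The bandwidth matching between the block average (2.65) and the exponential filter (2.66) can be checked numerically. The sketch below computes the noise equivalent bandwidth in the time domain as sum(h[n]^2) / (2 * sum(h[n])^2), a normalized-frequency restatement of the definition via Parseval's identity (the value of N is illustrative), and confirms the two quoted values.

```python
# Numerical check of the bandwidth matching used above. In normalized form,
# the noise equivalent bandwidth of a filter with impulse response h[n] is
#   Bn = sum(h[n]**2) / (2 * sum(h[n])**2)
# (Parseval applied to the frequency-domain definition). N = 200 is a toy value.
N = 200
mu = 2.0 / N                      # step size matched to the averager

# N-point integrator, as in (2.65): h[n] = 1/N, n = 0..N-1
h_avg = [1.0 / N] * N
bn_avg = sum(x * x for x in h_avg) / (2.0 * sum(h_avg) ** 2)

# Exponential filter, as in (2.66): h[n] = mu * (1 - mu)**n (tail truncated)
h_exp = [mu * (1.0 - mu) ** n for n in range(50 * N)]
bn_exp = sum(x * x for x in h_exp) / (2.0 * sum(h_exp) ** 2)

print(bn_avg, bn_exp)
assert abs(bn_avg - 0.5 / N) < 1e-12
assert abs(bn_exp - 0.5 * mu / (2.0 - mu)) < 1e-6
assert abs(bn_exp - bn_avg) / bn_avg < 2.0 * mu   # equal to first order in mu
```

With µ = 2/N the two bandwidths agree up to terms of order µ, which is the approximation invoked in the text.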
Appendix 2.F Computation of Σsv (θ) for the small-error bounds
The computation of $\Sigma_{sv}(\theta) = E\left\{s(y;\theta)\, v^H(y;\theta)\right\}$ for the Bhattacharyya and Cramér-Rao bounds requires computing the following term:
$$E_y\left\{\frac{\partial^n\ln f_y(y;\theta)}{\partial\theta^n}\, v^H(y;\theta)\right\} = \int\frac{\partial^n f_y(y;\theta)}{\partial\theta^n}\, v^H(y;\theta)\, dy. \tag{2.68}$$
The last term can be further manipulated taking into account that the estimator is unbiased, i.e., $E\{v(y;\theta)\} = 0$. Then, if the product rule is applied and the integral and derivative signs are swapped, we obtain that
$$\frac{\partial^n}{\partial\theta^n} E_y\left\{v^H(y;\theta)\right\} = \frac{\partial^n}{\partial\theta^n}\int f_y(y;\theta)\, v^H(y;\theta)\, dy = \int\frac{\partial^n f_y(y;\theta)}{\partial\theta^n}\, v^H(y;\theta)\, dy + \int f_y(y;\theta)\,\frac{\partial^n v^H(y;\theta)}{\partial\theta^n}\, dy = 0.$$
Then, using that $v(y;\theta) \triangleq \hat{\alpha}(y) - g(\theta)$, it follows that
$$\int f_y(y;\theta)\,\frac{\partial^n v^H(y;\theta)}{\partial\theta^n}\, dy = -\int f_y(y;\theta)\,\frac{\partial^n g^H(\theta)}{\partial\theta^n}\, dy = -\frac{\partial^n g^H(\theta)}{\partial\theta^n}$$
must be equal to (2.68) except for the minus sign. Thus, we conclude that
$$E_y\left\{\frac{\partial^n\ln f_y(y;\theta)}{\partial\theta^n}\, v^H(y;\theta)\right\} = \frac{\partial^n g^H(\theta)}{\partial\theta^n}.$$
Appendix 2.G MCRB, CCRB and UCRB derivation
In this appendix, the derivation of the lower bounds introduced in Section 2.6.1 is sketched.
UCRB Derivation
The UCRB involves the computation of the following score function:
$$\left[s_u(y;\theta)\right]_p \triangleq \frac{\partial\ln f_y(y;\theta)}{\partial\theta_p} = -\frac{\partial}{\partial\theta_p}\left(\ln\det\left(R(\theta)\right) + \operatorname{Tr}\left(R^{-1}(\theta)\,\widehat{R}\right)\right) = \operatorname{Tr}\left(R^{-1}(\theta)\,\frac{\partial R(\theta)}{\partial\theta_p}\, R^{-1}(\theta)\left(\widehat{R} - R(\theta)\right)\right),$$
where $f_y(y;\theta)$ is the Gaussian p.d.f. introduced in (2.22) and the following two expressions from [Ott93, Eq. 4.57-58] have been applied:
$$\frac{\partial}{\partial\theta_p}\ln\det\left(R(\theta)\right) = \operatorname{Tr}\left(R^{-1}(\theta)\,\frac{\partial R(\theta)}{\partial\theta_p}\right)$$
$$\frac{\partial}{\partial\theta_p}\operatorname{Tr}\left(R^{-1}(\theta)\,\widehat{R}\right) = -\operatorname{Tr}\left(R^{-1}(\theta)\,\frac{\partial R(\theta)}{\partial\theta_p}\, R^{-1}(\theta)\,\widehat{R}\right).$$
Therefore, the score function $s_u(y;\theta)$ can be written as follows:
$$s_u(y;\theta) = D_r^H(\theta)\left(R^*(\theta)\otimes R(\theta)\right)^{-1}\left(\hat{r} - r(\theta)\right),$$
with the following definitions
$$\left[D_r(\theta)\right]_p \triangleq \operatorname{vec}\left(\partial R(\theta)/\partial\theta_p\right) \qquad \hat{r} \triangleq \operatorname{vec}\left(\widehat{R}\right) \qquad r(\theta) \triangleq \operatorname{vec}\left(R(\theta)\right), \tag{2.69}$$
and using the following relationships:
$$\operatorname{vec}\left(ABC^H\right) = \left(C^*\otimes A\right)\operatorname{vec}(B) \qquad\qquad A^{-1}\otimes B^{-1} = \left(A\otimes B\right)^{-1}.$$
Finally, in the unconditional model, the Fisher information matrix becomes
$$J_u(\theta) \triangleq E_y\left\{s_u(y;\theta)\, s_u^H(y;\theta)\right\} = D_r^H(\theta)\left(R^*(\theta)\otimes R(\theta)\right)^{-1} D_r(\theta),$$
using that the covariance matrix of $\hat{r} - r(\theta)$ is precisely $R^*(\theta)\otimes R(\theta)$ under the Gaussian assumption [Li99, Eq. 20]. In Chapter 4, it will be shown that $J_u$ can be obtained from $J_2$ (2.52) when the nuisance parameters are Gaussian distributed.
CCRB Derivation
The CCRB was originally derived in [Sto89][Sto90a] for DOA estimation. A different derivation
is given next based on the asymptotic performance of the CML estimator and the estimation
bounds theory presented in Section 2.6.1.
In the conditional model, the CML estimator is formulated from the following score function:
$$\left[s_c(y;\theta)\right]_p \triangleq \frac{\partial}{\partial\theta_p}\ln f_y(y;\theta,\hat{x}_{ML}(\theta)) = 2\operatorname{Re}\left\{\operatorname{Tr}\left(R_w^{-1}\, P_A^{\perp}(\theta)\,\frac{\partial A(\theta)}{\partial\theta_p}\, A^{\#}(\theta)\,\widehat{R}\right)\right\} = 2\operatorname{Re}\left\{\operatorname{Tr}\left(R_w^{-1}\, P_A^{\perp}(\theta)\,\frac{\partial A(\theta)}{\partial\theta_p}\, A^{\#}(\theta)\left(\widehat{R} - R(\theta)\right)\right)\right\}$$
that is obtained using the results in Appendix 2.C. Using again (2.69) and
$$\operatorname{vec}\left(ABC^H\right) = \left(C^*\otimes A\right)\operatorname{vec}(B),$$
it follows that
$$s_c(y;\theta) = \frac{\partial}{\partial\theta}\ln f_y(y;\theta,\hat{x}_{ML}(\theta)) = 2\operatorname{Re}\left\{D_a^H(\theta)\left(A^{\#*}(\theta)\otimes R_w^{-1}\, P_A^{\perp}(\theta)\right)\left(\hat{r} - r(\theta)\right)\right\}$$
with $\left[D_a(\theta)\right]_p \triangleq \operatorname{vec}\left(\partial A(\theta)/\partial\theta_p\right)$.
After some tedious simplifications that are omitted for the sake of brevity, in the conditional model, the Fisher information matrix is given by
$$E_y\left\{s_c(y;\theta)\, s_c^H(y;\theta)\right\} = 2\operatorname{Re}\left\{D_a^H(\theta)\left(x x^H\otimes R_w^{-1}\, P_A^{\perp}(\theta)\right) D_a(\theta)\right\} + 2\operatorname{Re}\left\{D_a^H(\theta)\left(\left(A^H(\theta)\, R_w^{-1}\, A(\theta)\right)^{-1}\otimes R_w^{-1}\, P_A^{\perp}(\theta)\right) D_a(\theta)\right\},$$
that is found to depend on the actual vector of nuisance parameters x. It is shown in [Sto90a, Eq. 2.13] that the first term converges to its expected value as the observation size increases and, thus, $x x^H \to I_K$. On the other hand, the second term can be neglected if the SNR or the observation length goes to infinity. Actually, this second term causes the CML degradation at low SNR when the observation is short [Sto90a, Eq. 2.15]. Bearing in mind these arguments, the asymptotic Fisher information matrix appearing in the CCRB expression (2.50) contains only the average of the first term. The resulting expression is known to bound the performance of the CML and GML estimators whatever the SNR or the observation size. However, the adopted CCRB becomes a loose bound for low SNRs in case of finite observations.
MCRB Derivation
A straightforward derivation of the multidimensional MCRB is provided next:
$$\ln f_{y/x}(y/x;\theta) = \text{const} - \left\|y - A(\theta)\, x\right\|_{R_w^{-1}}^2$$
$$\frac{\partial\ln f_{y/x}(y/x;\theta)}{\partial\theta_p} = 2\operatorname{Re}\left\{\left(y - A(\theta)\, x\right)^H R_w^{-1}\,\frac{\partial A(\theta)}{\partial\theta_p}\, x\right\}$$
$$\frac{\partial^2\ln f_{y/x}(y/x;\theta)}{\partial\theta_p\,\partial\theta_q} = 2\operatorname{Re}\left\{\left(y - A(\theta)\, x\right)^H R_w^{-1}\,\frac{\partial^2 A(\theta)}{\partial\theta_p\,\partial\theta_q}\, x - x^H\,\frac{\partial A^H(\theta)}{\partial\theta_q}\, R_w^{-1}\,\frac{\partial A(\theta)}{\partial\theta_p}\, x\right\}$$
$$E_{y/x}\left\{\frac{\partial^2\ln f_{y/x}(y/x;\theta)}{\partial\theta_p\,\partial\theta_q}\right\} = -2\operatorname{Re}\left\{x^H\,\frac{\partial A^H(\theta)}{\partial\theta_q}\, R_w^{-1}\,\frac{\partial A(\theta)}{\partial\theta_p}\, x\right\}$$
and, therefore,
$$\left[J_m\right]_{p,q} = -E_x E_{y/x}\left\{\frac{\partial^2\ln f_{y/x}(y/x;\theta)}{\partial\theta_p\,\partial\theta_q}\right\} = 2\operatorname{Re}\left\{\operatorname{Tr}\left(\frac{\partial A^H(\theta)}{\partial\theta_q}\, R_w^{-1}\,\frac{\partial A(\theta)}{\partial\theta_p}\right)\right\}.$$
Finally, the elements of $J_m$ can be arranged as in equation (2.54) using the following properties:
$$\operatorname{Tr}\left(A^H B\right) = \operatorname{vec}^H(A)\operatorname{vec}(B) \qquad\qquad \operatorname{vec}\left(ABC\right) = \left(C^T\otimes A\right)\operatorname{vec}(B).$$
Chapter 3
Optimal Second-Order Estimation
In this chapter, optimal second-order estimators are formulated considering that the estimator is
provided with some side information about the unknown parameters. This side information can
be exploited to improve the estimator accuracy or imposed by the designer in order to constrain
the estimator mean response. Although the formulation in both cases will be the same, the
classification above becomes crucial from a theoretical viewpoint. In that way, the adopted
framework allows unifying the Bayesian and classical estimation theories.
In the first case, the Bayesian approach is adopted to model the unknown parameters as
random variables of known probability density function fθ (θ). Thus, fθ (θ) provides the available statistical information on the parameters prior to the observation of the data. Bayesian
estimators resort to this prior information when the observation is severely corrupted by the
noise in low SNR scenarios. On the other hand, the prior contribution is scarce if the observation is rather informative. The above side information is supposed to be obtained in a previous
estimation stage providing both the estimate and its accuracy. In that case, Gaussian priors
are usually employed having in mind that the output of a consistent estimator becomes asymptotically Gaussian distributed on account of the Central Limit Theorem. When the parameter
is constrained to a given interval, the folded Gaussian distribution is more appropriate [Rib97].
In particular, the folded Gaussian p.d.f. converges to the uniform distribution when all the
available knowledge is the parameter range.
Bayesian estimation has received a lot of attention in the past decades but it has always raised
a lot of controversy because the parameters are actually deterministic unknowns in a typical
estimation problem (Section 2.1). The Bayesian approach is realistic if the parameters can be
modelled as ergodic realizations of the a priori distribution fθ (θ). In this kind of applications,
adaptive filters or trackers must be designed in order to track the parameter temporal evolution
(Section 2.5.2). If the observation is linear in the parameters and the prior is Gaussian, the
optimal linear tracker is the well-known Kalman filter [And79] [Kay93b]. Unfortunately, most
estimation problems in communications are nonlinear and the suboptimal Extended Kalman
filter (EKF) must be used instead. In Chapter 5, the EKF formulation is generalized to design
blind second-order trackers based on the Bayesian interpretation of the results in this chapter.
If the classical estimation theory is adopted, the side information can be used to constrain
the estimator mean response. In that case, fθ (θ) is a weighting function introduced by the
designer to define new custom-built optimization criteria. Anyway, the formulation in both
cases will be identical and fθ (θ) will be referred to as the prior distribution in spite of dealing
with deterministic parameters. Likewise, $E_\theta\{\cdot\}$ will denote interchangeably the Bayesian expectation with respect to the random vector θ or solely the following averaging:
$$E_\theta\left\{F(\theta)\right\} \triangleq \int F(\theta)\, f_\theta(\theta)\, d\theta, \tag{3.1}$$
if the parameters are deterministic quantities.
Based on the a priori distribution fθ (θ) and the known linear signal model introduced in
Section 2.4, the objective is to find the optimal second-order estimator of $\alpha \in \mathbb{R}^Q$, where
$$\alpha = g(\theta)$$
is an arbitrary transformation of $\theta \in \mathbb{R}^P$. With this aim, the general expression of any second-order estimator of α is presented next:
$$\hat{\alpha} = b + M^H\hat{r} \tag{3.2}$$
where
$$\hat{r} \triangleq \operatorname{vec}\left(\widehat{R}\right) = \operatorname{vec}\left(y\, y^H\right) \tag{3.3}$$
is the column-wise vectorization of the sample covariance matrix $\widehat{R}$ introduced in Section 2.4.1, and b and M are the estimator coefficients corresponding to the independent and the quadratic term, respectively. Notice that the linear term is not considered because the nuisance parameters x are usually zero-mean random variables in the context of NDA estimation (2.4).
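As a minimal illustration of this estimator form, the sketch below builds the vectorized sample covariance for a toy observation and applies the quadratic estimator with the purely illustrative choice b = 0 and M = vec(I), which reduces the estimate to Tr(yy^H) = ||y||², i.e., an energy estimator.

```python
# The choice b = 0, M = vec(I) below is purely illustrative: it reduces the
# quadratic estimator alpha = b + M^H r_hat to Tr(y y^H) = ||y||^2.
def vec(R):
    # column-wise vectorization of an n x n matrix (list of rows)
    n = len(R)
    return [R[i][j] for j in range(n) for i in range(n)]

y = [1 + 2j, 3 - 1j, 0.5j]                     # toy observation
n = len(y)
R_hat = [[y[i] * y[j].conjugate() for j in range(n)] for i in range(n)]
r_hat = vec(R_hat)                             # r_hat = vec(y y^H)

b = 0.0
M = vec([[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)])
alpha_hat = b + sum(m.conjugate() * r for m, r in zip(M, r_hat))

energy = sum(abs(v) ** 2 for v in y)
print(alpha_hat.real, energy)
assert abs(alpha_hat - energy) < 1e-12
```

Any other quadratic statistic of y is obtained the same way by changing M, which is precisely the design freedom exploited in the remainder of the chapter.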
If the transmitted constellation is polarized or some training symbols are transmitted, the linear term $L^H y$ should be included following a semi-blind approach, thus improving the estimator performance at low SNR [Mes02, Ch.3][Gor97][Car97]. Notice too that the vec(·) operator can be applied successively to formulate higher-order estimators in order to improve the estimator performance in high SNR scenarios [Vil01b].
Finally, it is worth noting that a circular constellation is assumed. In that case, the improper covariance matrix $E\{y\, y^T\}$ is equal to zero and, therefore, no information can be drawn from the term $\operatorname{vec}(y\, y^T)$, where $(\cdot)^T$ stands for the transpose [Sch03][Pic96]. In Appendix 3.A, the results in this section are extended to encompass important noncircular constellations holding $E\{y\, y^T\} \ne 0$ (e.g., PAM, BPSK or CPM). Moreover, the design of quadratic carrier phase synchronizers for the noncircular CPM modulation will be addressed in detail in Section 6.2.
Henceforth, the objective is to determine the estimator optimal coefficients M and b under
a given performance criterion. Two criteria will be analyzed next: the first one is the usual
minimum mean squared error (MMSE) criterion that minimizes the aggregated contribution of
variance and bias. The MMSE criterion usually leads to biased estimators, mainly in low SNR
scenarios in which the noise-induced variance is dominant. Unfortunately, in some applications
bias is not tolerated (e.g., navigation applications) and some constraints must be introduced to
compensate this bias. In that case, the proposed alternative is to minimize the estimator MSE
subject to the minimum bias constraint. This chapter presents a convenient framework from
which different estimation strategies can be devised as a trade-off between bias and variance.
With this purpose, the following definitions are introduced in the next section.
3.1 Definitions and Notation
In this section, the mean square error (MSE) and variance figures are computed for the linear signal model introduced in Section 2.4 and for second-order estimation. Thus, the MSE associated to the generic second-order estimator in (3.2) is given by
$$MSE(\theta) = E\left\{\left\|\hat{\alpha}(\theta) - g(\theta)\right\|^2\right\} = E\left\{\left\|b + M^H\hat{r}(\theta) - g(\theta)\right\|^2\right\} \tag{3.4}$$
where the expectation is computed over the noise w and the nuisance parameters x. The estimator MSE can be divided into the bias and variance contributions, so that
$$MSE(\theta) = BIAS^2(\theta) + VAR(\theta)$$
where the squared bias and variance are defined as follows:
$$BIAS^2(\theta) = \left\|\bar{\alpha}(\theta) - g(\theta)\right\|^2 = \left\|b + M^H r(\theta) - g(\theta)\right\|^2 \tag{3.5}$$
$$VAR(\theta) = E\left\{\left\|\hat{\alpha}(\theta) - \bar{\alpha}(\theta)\right\|^2\right\} = E\left\{\left\|M^H\left(\hat{r}(\theta) - r(\theta)\right)\right\|^2\right\} = \operatorname{Tr}\left(M^H Q(\theta)\, M\right) \tag{3.6}$$
with $\bar{\alpha}(\theta)$ the estimator mean value,
$$\bar{\alpha}(\theta) \triangleq E\left\{\hat{\alpha}(\theta)\right\} = b + M^H r(\theta) \tag{3.7}$$
and
$$r(\theta) \triangleq E\left\{\hat{r}(\theta)\right\} = \operatorname{vec}\left(A(\theta)\, A^H(\theta) + R_w\right) \tag{3.8}$$
$$Q(\theta) \triangleq E\left\{\left(\hat{r}(\theta) - r(\theta)\right)\left(\hat{r}(\theta) - r(\theta)\right)^H\right\} \tag{3.9}$$
the mean and the covariance matrix of the vectorized sample covariance matrix $\hat{r}$, respectively. Notice that $r(\theta)$ corresponds to the (vectorized) covariance matrix of y, whereas $Q(\theta)$ gathers
the central fourth-order moments of y. The vectorization is fundamental to derive a closed form
for the matrix Q (θ). In Appendix 3.B, it is found that
$$Q(\theta) = R^*(\theta)\otimes R(\theta) + \mathcal{A}(\theta)\, K\,\mathcal{A}^H(\theta) \tag{3.10}$$
where $\mathcal{A}(\theta) \triangleq A^*(\theta)\otimes A(\theta)$, $R(\theta)$ was introduced in Section 2.4.3 and
$$K \triangleq E_x\left\{\operatorname{vec}\left(x\, x^H\right)\operatorname{vec}^H\left(x\, x^H\right)\right\} - \operatorname{vec}\left(I_K\right)\operatorname{vec}^H\left(I_K\right) - I_{K^2} \tag{3.11}$$
is the matrix containing the fourth-order cumulants (kurtosis) of the nuisance parameters x.
It is worth realizing that $Q(\theta)$ and K are calculated analytically for the linear signal model introduced in Section 2.4, thus avoiding the problematic estimation of fourth-order statistics.
In the case of zero-mean, circular complex nuisance parameters, the matrix K is the following diagonal matrix:
$$K = (\rho - 2)\operatorname{diag}\left(\operatorname{vec}\left(I_K\right)\right) \tag{3.12}$$
where the scalar
$$\rho \triangleq \frac{E\left\{\left|[x]_k\right|^4\right\}}{E^2\left\{\left|[x]_k\right|^2\right\}}$$
is the fourth- to second-order moment ratio (Appendix 3.C). If the nuisance parameters are not circular (e.g., for the CPM modulation), the expectation in (3.11) has to be computed numerically —and offline— from the known p.d.f. of x. Moreover, if the nuisance parameters are discrete, as it happens in digital communications, the computation of K needs only a small number of realizations of $f_x(x)$.
It is well-known that matrix K is zero for normally distributed nuisance parameters for
which ρ = 2. Otherwise, matrix K provides the complete non-Gaussian information about
the nuisance parameters that second-order estimators are able to exploit. In fact, the GML
estimator is sometimes outperformed at high SNR if the second term of (3.10) is considered.
This remark was actually the motivation of this thesis and will be analyzed intensively in the
following chapters.
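The structure of K can be confirmed exactly for a discrete alphabet. The sketch below enumerates all symbol combinations for i.i.d. QPSK nuisance parameters with K = 2 (alphabet and dimensions are illustrative) and checks that (3.11) reduces to the diagonal form (3.12) with ρ = 1.

```python
import itertools

# Exact check of (3.11)-(3.12) for i.i.d. QPSK symbols with K = 2 entries.
alphabet = [1, 1j, -1, -1j]
K = 2
n2 = K * K

def vec_outer(x):
    # vec(x x^H), column-major
    return [x[i] * x[j].conjugate() for j in range(K) for i in range(K)]

# E{vec(xx^H) vec^H(xx^H)} by enumeration over the 4^K equiprobable vectors.
E = [[0j] * n2 for _ in range(n2)]
combos = list(itertools.product(alphabet, repeat=K))
for x in combos:
    v = vec_outer(x)
    for p in range(n2):
        for q in range(n2):
            E[p][q] += v[p] * v[q].conjugate() / len(combos)

vecI = [1.0 if (i % K) == (i // K) else 0.0 for i in range(n2)]
Kmat = [[E[p][q] - vecI[p] * vecI[q] - (1.0 if p == q else 0.0)
         for q in range(n2)] for p in range(n2)]

rho = 1.0   # E|x|^4 / E^2|x|^2 = 1 for any constant-modulus (PSK) alphabet
for p in range(n2):
    for q in range(n2):
        expected = (rho - 2.0) * vecI[p] if p == q else 0.0
        assert abs(Kmat[p][q] - expected) < 1e-12
print("K = (rho - 2) diag(vec(I_K)) verified for QPSK")
```

For Gaussian nuisance parameters the same computation would give ρ = 2 and hence K = 0, which is the statement in the paragraph above.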
Unfortunately, the vector b and matrix M minimizing the bias, variance or MSE figures are generally a function of the unknown vector of parameters θ and, therefore, the resulting estimator is not realizable. Accordingly, the estimator coefficients have to be optimized from a convenient average of these figures of merit over all the possible values of θ. In that sense, the prior $f_\theta(\theta)$ introduced previously is applied to obtain the following averaged MSE, bias and variance:
$$\overline{MSE} \triangleq E_\theta\left\{MSE(\theta)\right\} = \overline{BIAS}^2 + \overline{VAR} = E_\theta E\left\{\left\|b + M^H\hat{r}(\theta) - g(\theta)\right\|^2\right\} \tag{3.13}$$
$$\overline{BIAS}^2 \triangleq E_\theta\left\{BIAS^2(\theta)\right\} = E_\theta\left\{\left\|b + M^H r(\theta) - g(\theta)\right\|^2\right\} \tag{3.14}$$
$$\overline{VAR} \triangleq E_\theta\left\{VAR(\theta)\right\} = \operatorname{Tr}\left(M^H Q\, M\right) \tag{3.15}$$
with Q̄ ≜ E_θ{Q(θ)}.
The estimator variance in (3.15) is independent of b. Therefore, b can be selected to minimize
the bias contribution without degrading the estimator variance. It is found that the optimum
b is given by
b_opt ≜ arg min_b BIAS² = arg min_b MSE = ḡ − M^H r̄   (3.16)

where

ḡ ≜ E_θ{ g(θ) }   (3.17)

r̄ ≜ E_θ E{ r̂(θ) }.   (3.18)
If now bopt is substituted into (3.13) and (3.14), we obtain that
BIAS² = E_θ{ ‖M^H (r(θ) − r̄) − (g(θ) − ḡ)‖² } = σ²_g + Tr( M^H Q̃ M − M^H S − S^H M )   (3.19)

VAR = Tr( M^H Q̄ M )   (3.20)

MSE = BIAS² + VAR = σ²_g + Tr( M^H (Q̄ + Q̃) M − M^H S − S^H M )   (3.21)

with the following definitions

σ²_g ≜ E_θ{ ‖g(θ) − ḡ‖² }   (3.22)

Q̃ ≜ E_θ{ (r(θ) − r̄)(r(θ) − r̄)^H }   (3.23)

S ≜ E_θ{ (r(θ) − r̄)(g(θ) − ḡ)^H }.   (3.24)
The expectation with respect to the prior fθ(θ) poses serious problems when calculating the analytical expressions of ḡ, r̄, Q̄, Q̃ and S. In Appendix 3.D, this problem is solved when the parameter dependence is phasorial. In the following sections, the MMSE estimator and the minimum bias-variance estimator are formulated, and further analyzed, assuming that these vectors and matrices have been computed somehow.
3.2 Second-Order MMSE Estimator
The second-order MMSE estimator is obtained minimizing the overall MSE in equation (3.21).
It follows that the optimum matrix M is given by

M_mse = arg min_M MSE = (Q̄ + Q̃)^{-1} S   (3.25)
where the inversion is guaranteed assuming that the noise covariance matrix Rw is positive definite. Notice that the above expression corresponds to the linear Bayesian MMSE estimator of α based on the sample covariance vector r̂, where Q̄ + Q̃ is the autocorrelation matrix of r̂ and S is the cross-correlation between r̂ and α [Kay93b].
If (3.25) is now plugged into (3.21), the minimum MSE is found to be

MSE_min = σ²_g − Tr( S^H (Q̄ + Q̃)^{-1} S )   (3.26)

where σ²_g is the initial (prior) uncertainty about the parameter α and the second term is the MSE improvement after processing the data vector y. It is easy to show that this second term vanishes as the noise variance is increased.
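To make the algebra of (3.16), (3.25) and (3.26) concrete, the following numpy sketch assembles the MMSE coefficients from prior-averaged moments. The random, toy-sized matrices merely stand in for the Q̄, Q̃, S, ḡ and r̄ obtained from the signal model and the prior (Appendix 3.D); all names and sizes are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
M2, Q = 9, 1       # toy sizes: M^2 stacked covariance entries, Q parameters

# Random surrogates for the prior-averaged moments defined in the text.
X = rng.standard_normal((M2, M2)) + 1j * rng.standard_normal((M2, M2))
Qbar = X @ X.conj().T + np.eye(M2)            # Hermitian positive definite
Y = rng.standard_normal((M2, 4)) + 1j * rng.standard_normal((M2, 4))
Qtilde = Y @ Y.conj().T                       # Hermitian PSD, low rank
S = Qtilde @ rng.standard_normal((M2, Q))     # S lies in the span of Qtilde
gbar = np.zeros(Q)
rbar = rng.standard_normal(M2) + 1j * rng.standard_normal(M2)
sigma2_g = 1.0

# (3.25): MMSE coefficient matrix; (3.16): optimal bias-correction term.
M_mse = np.linalg.solve(Qbar + Qtilde, S)
b_opt = gbar - M_mse.conj().T @ rbar

# (3.26): minimum averaged MSE = prior uncertainty minus the data improvement.
improvement = np.trace(S.conj().T @ np.linalg.solve(Qbar + Qtilde, S)).real
mse_min = sigma2_g - improvement

# The estimate for a given sample covariance vector r_hat would then be
# alpha_hat = b_opt + M_mse.conj().T @ r_hat.
```

Note how the improvement term is a non-negative quadratic form, so the minimum MSE never exceeds the prior uncertainty σ²_g, as stated above.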
3.3 Second-Order Minimum Variance Estimator
The aim of this section is to obtain the minimum variance unbiased (MVU) estimator (Section 2.2). However, in most cases it is not possible to cancel out the bias term unless the covariance vector r(θ) is an affine transformation of α ∈ R^Q, that is, r(θ) = Wg(θ) + v for some W and v. If so, it is straightforward to verify that the estimator bias (3.14) is removed by setting M^H W = I_Q. Unfortunately, this situation is unusual and quadratic estimators are normally degraded by some residual bias. Taking this limitation into account, in this section the minimum variance estimator is deduced subject to those constraints minimizing the estimator bias. Thus, let us first obtain the equation that M must verify to yield minimum bias:
dBIAS² / dM* = Q̃M − S = 0   (3.27)
Generally, the constraints obtained in (3.27) form an underdetermined system of equations because R ≜ rank(Q̃) < M² and S ∈ C^{M²×Q} in (3.24) lies, by definition, in the column span of Q̃ ∈ C^{M²×M²}. Hence, (3.27) is actually imposing RQ design constraints on the matrix M ∈ C^{M²×Q} that, after the diagonalization of Q̃ = VΣV^H, can be formulated as follows:

V^H M = S̄   (3.28)

where S̄ ≜ Σ^{-1} V^H S, Σ ∈ R^{R×R} is the diagonal matrix containing the non-zero eigenvalues of Q̃ and, V ∈ C^{M²×R} are the corresponding eigenvectors.
Therefore, since equation (3.27) is only forcing R constraints, the remaining degrees of freedom in M can be used to optimize the estimator variance. Specifically, the aim is to minimize
the estimator variance subject to the constraints on α (θ) given in (3.27) or (3.28), that is,
M_var = arg min_M VAR = arg min_M Tr( M^H Q̄ M )  subject to  Q̃M = S  or  V^H M = S̄,   (3.29)
which yields the following solution:

M_var = P^H S = P̄^H S̄   (3.30)

with P and P̄ defined as

P ≜ (Q̃ Q̄^{-1} Q̃)^# Q̃ Q̄^{-1}   (3.31)

P̄ ≜ (V^H Q̄^{-1} V)^{-1} V^H Q̄^{-1}.   (3.32)
Thus, after substitutions in (3.2), the minimum variance estimator is given by

α̂ = ḡ + S^H P (r̂ − r̄) = ḡ + S̄^H P̄ (r̂ − r̄)   (3.33)

where P is projecting the sample covariance vector (r̂ − r̄) onto the minimum-bias subspace generated by matrix Q̃ in (3.27) (see Fig. 3.1). Plugging now (3.30) into (3.20), the minimum variance is equal to
VAR_min = Tr( S^H (Q̃ Q̄^{-1} Q̃)^# S ) = Tr( S̄^H (V^H Q̄^{-1} V)^{-1} S̄ )   (3.34)
where the argument inside the trace operator is the covariance matrix of the estimation error:

E_θ E{ (α̂(θ) − α(θ))(α̂(θ) − α(θ))^H } = S^H (Q̃ Q̄^{-1} Q̃)^# S = S̄^H (V^H Q̄^{-1} V)^{-1} S̄.   (3.35)
Finally, plugging (3.30) into (3.19), the residual bias can be expressed in any of these alternative forms:

BIAS²_min = σ²_g − Tr( M^H S ) = σ²_g − Tr( M^H Q̃ M )
          = σ²_g − Tr( S^H P S ) = σ²_g − Tr( S̄^H Σ S̄ )
          = σ²_g − Tr( S^H Q̃^# S ).   (3.36)
The last equation is obtained from M^H Q̃ M using that¹ P Q̃ = Q̃^# Q̃.
¹The following identities are obtained from the diagonalization of Q̃ = VΣV^H. It is found that [Mag98, Ch.2]:

Q̃^# = VΣ^{-1}V^H
Q̃ Q̄^{-1} Q̃ = VΣV^H Q̄^{-1} VΣV^H
(Q̃ Q̄^{-1} Q̃)^# = VΣ^{-1} (V^H Q̄^{-1} V)^{-1} Σ^{-1} V^H

and, thus,

P Q̃ = (Q̃ Q̄^{-1} Q̃)^# Q̃ Q̄^{-1} Q̃ = VV^H
Q̃^# Q̃ = VV^H

taking into account that V^H V = I_R.
Figure 3.1: Geometric interpretation of the second-order estimators deduced in this chapter (minimum variance M = P^H S, minimum bias M = Q̃^# S, minimum MSE M = (Q̄ + Q̃)^{-1} S, and the minimum-norm solution of Q̃M = S) for a uniparametric, hypothetical problem in which M has only two coefficients: m1 and m2.
The matrix P Q̃ = Q̃^# Q̃ = VV^H is actually the orthogonal projector onto the R-dimensional subspace generated by Q̃. The resulting expression is then simplified using the following property of the pseudo-inverse: Q̃^# Q̃ Q̃^# = Q̃^# [Mag98, Eq. 5.2].
Notice that any matrix M solving (3.27) or (3.28) yields the same bias, for example, M = Q̃^# S. Indeed, among all of them, M_var (3.30) is the one yielding minimum variance (Fig. 3.1).
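The eigendecomposition route (3.28)-(3.32) is easy to check numerically. The following hedged sketch uses random surrogate matrices (toy sizes, illustrative names) with S forced into the column span of Q̃ so that the minimum-bias constraint is feasible, and verifies that the resulting M_var is both feasible and no worse, in variance, than the alternative solution Q̃^# S:

```python
import numpy as np

rng = np.random.default_rng(1)
M2, Q, R = 9, 1, 4     # toy sizes: M^2, parameters, rank of Qtilde

# Surrogates: Qbar Hermitian positive definite, Qtilde Hermitian of rank R.
X = rng.standard_normal((M2, M2)) + 1j * rng.standard_normal((M2, M2))
Qbar = X @ X.conj().T + np.eye(M2)
Y = rng.standard_normal((M2, R)) + 1j * rng.standard_normal((M2, R))
Qtilde = Y @ Y.conj().T
S = Qtilde @ rng.standard_normal((M2, Q))     # S in the column span of Qtilde

# Diagonalization Qtilde = V Sigma V^H keeping the R non-zero eigenvalues.
w, U = np.linalg.eigh(Qtilde)
idx = np.argsort(w)[::-1][:R]
Sigma, V = np.diag(w[idx]), U[:, idx]

# (3.28): S_bar = Sigma^{-1} V^H S; (3.32): P_bar; (3.30): M_var = P_bar^H S_bar.
S_bar = np.linalg.solve(Sigma, V.conj().T @ S)
Qbar_inv_V = np.linalg.solve(Qbar, V)
P_bar = np.linalg.solve(V.conj().T @ Qbar_inv_V, Qbar_inv_V.conj().T)
M_var = P_bar.conj().T @ S_bar

# M_var satisfies the minimum-bias constraint (3.27): Qtilde M = S ...
assert np.allclose(Qtilde @ M_var, S)
# ... and has no larger variance than the alternative solution M = Qtilde^# S.
M_pinv = np.linalg.pinv(Qtilde) @ S
var = lambda M: np.trace(M.conj().T @ Qbar @ M).real
assert var(M_var) <= var(M_pinv) + 1e-6
```

The design choice mirrors the text: the constraint fixes only the component of M inside span(V), while the component weighted by Q̄ outside that subspace is chosen to minimize the variance.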
3.4 A Case Study: Frequency Estimation
In this section, the second-order MMSE and minimum variance estimators are applied to estimate the carrier frequency offset in the context of digital synchronization. This problem has been chosen because closed-form expressions exist based on the results in Appendix 3.D.
The signal model for frequency synchronization fits the general linear model in Section 2.4,
in which the transfer matrix A (θ) is given by
[A(ν)]_k = exp(j2πν d_M / Nss) ⊙ [A]_k

where ν and Nss are, respectively, the normalized carrier frequency offset and sampling rate, matrix A generates the actual modulation and, d_M ≜ [0, ..., M − 1]^T. The precise content of matrix A in digital synchronization will be detailed in Section 6.1.2.
In addition, the following uniform prior is assumed for the unknown carrier frequency ν:

f_ν(ν) = ∆⁻¹ for |ν| ≤ ∆/2, and 0 otherwise,
Figure 3.2: Estimator mean response as a function of the normalized frequency error for different values of ∆: closed-loop (∆ → 0), minimum-bias open-loop (∆ = 20%, 50%, 80%, 100%) and minimum-MSE open-loop (∆ = 50%). The simulation parameters are M = 4, Nss = 2, Es/No = 10 dB.
with ∆ ≤ Nss determining the frequency offset range². Notice that ∆ constitutes the sole prior
knowledge about the parameter.
In the following figures, the MMSE and minimum variance estimators are compared in terms
of bias, variance and MSE.
The results in this section were partially presented in the following conferences:
• “Sample Covariance Matrix Based Parameter Estimation for Digital Synchronization”. J. Villares, G. Vázquez. Proceedings of the IEEE Global Communications Conference 2002 (Globecom 2002). November 2002. Taipei (Taiwan).

• “Sample Covariance Matrix Parameter Estimation: Carrier Frequency, A Case Study”. J. Villares, G. Vázquez. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). April 2003. Hong Kong (China).
3.4.1 Bias Analysis
The estimator mean response E{ν̂} is plotted as a function of the parameter value for different values of ∆. Fig. 3.2 shows how the minimum variance solution minimizes the estimator bias
²Sometimes ∆ will be specified as a percentage of Nss.
Figure 3.3: Averaged squared bias BIAS² as a function of the prior range ∆ for the M_mse, M_var and closed-loop estimators, for Es/No = 10 dB, M = 4 and Nss = 2.
within the prior range. The estimator mean response oscillates around the unbiased response, cancelling out the bias at 2 min(LNss, M) − 1 points within the prior interval (−∆/2, ∆/2], with L the effective pulse duration (in symbols). These points are automatically selected in order to minimize the overall estimator bias (3.19). This basic result is proved in Appendix 3.E and states that the residual bias is a function of the following ratio:

∆ / min(LNss, M).

Therefore, if the prior range ∆ is fixed, the estimator bias can be reduced by oversampling the received signal and/or, if possible, reducing the transmission bandwidth, i.e., increasing L. Surprisingly, the bias cannot be reduced by augmenting the observation time in the studied frequency estimation problem for M ≥ LNss (Appendix 3.E).
Regarding Fig. 3.2, one concludes that the bias term increases dramatically if ∆/Nss exceeds 0.5 (50%) for the simulated MSK modulation (L = 2). In the same figure, the mean response of the MMSE estimator is plotted, showing that it is clearly biased. This bias is found to increase as the SNR is reduced because, in that case, the MMSE estimator trades more bias for variance. Finally, the S-curve for the closed-loop estimator deduced in the next chapter is depicted. In that case, the estimator is only required to yield unbiased estimates around the origin (ν = 0). As will be studied in more detail in Chapter 4, the closed-loop solution is obtained considering the asymptotic case in which ∆ → 0.
Figure 3.4: Normalized MSE for the MMSE and the minimum variance frequency estimators deduced in (3.25) and (3.30) for the MSK modulation. The corresponding estimators deduced under the Gaussian assumption are also plotted for comparison. The simulation parameters are M = 8, Nss = 2 and ∆ = 1.6.
Another interesting simulation is presented in Fig. 3.3, in which the squared bias BIAS² (3.19) is plotted as a function of ∆ for the MMSE estimator (M_mse), the minimum variance estimator (M_var) and the closed-loop small-error estimator (∆ → 0) deduced in Chapter 4. The SNR is set to 10 dB and, therefore, the noise-induced variance is very significant. This fact justifies the relaxation of the MMSE estimator with respect to the bias term. Notice that the three estimators are able to cancel out the bias term if the prior range approaches zero (∆ → 0). This simple remark is of paramount importance in the following sections and motivates the need for closed-loop algorithms for second-order blind parameter estimation (Chapter 4).
3.4.2 MSE Performance
In this section, the performance of second-order frequency estimators is evaluated in terms of
their mean square error (3.21). Observing the figures below, the following remarks are relevant:

• A priori knowledge. The performance of the MMSE estimator is upper bounded at low SNR by the a priori mean square error σ²_g (Fig. 3.4). In such a noisy scenario, the MMSE solution becomes biased with the aim of limiting the variance increase caused by the noise-induced variability. As the SNR increases, the observation provides more
Figure 3.5: Normalized MSE for the MMSE and the minimum variance frequency estimators deduced in (3.25) and (3.30) for the 16-QAM modulation. The transmitted pulse is a square-root raised cosine with roll-off 0.75 truncated at ±5T. The corresponding estimators deduced under the Gaussian assumption are also plotted for comparison. The simulation parameters are M = 8, Nss = 2, K = 13 and ∆ = 1.6.
information about the parameter of interest and this information is exploited to reduce
the average MSE.
• Self-Noise. For finite observations (M finite), the studied quadratic estimators manifest a significant variance floor at high SNR due to the so-called self-noise (Fig. 3.4). Remember that self-noise refers to the random fluctuations caused by the unknown nuisance parameters x in blind estimation schemes (see Section 2.4.1). Effectively, the feed-forward estimators presented in this section are unable to cancel out the self-noise for all the possible values of ν. On the other hand, the self-noise free condition is guaranteed in the case of closed-loop (∆ → 0) second-order frequency estimators, as shown in Fig. 3.7. Consequently, the amount of information that can be drawn from the current sample y is very limited in the studied case due to the presence of self-noise. In fact, the level of the high-SNR floor is a function of the observation time M (Fig. 3.8) as well as the prior range ∆ (Fig. 3.7).
• Modulation. If figures 3.4-3.6 are compared, one concludes that the performance of the MMSE estimator is practically insensitive to the actual distribution of the transmitted symbols. However, the incurred minimum bias, BIAS²_min, depends on the transmitted
Figure 3.6: Normalized MSE for the MMSE and the minimum variance frequency estimators deduced in (3.25) and (3.30) for the MPSK modulation. The transmitted pulse is a square-root raised cosine with roll-off 0.75 truncated at ±5T. The corresponding estimators deduced under the Gaussian assumption are also plotted for comparison. The simulation parameters are M = 8, Nss = 2, K = 13 and ∆ = 1.6.
pulse. For the considered Nyquist pulse of roll-off 0.75, the minimum variance solution becomes significantly degraded with respect to the MSK performance for any SNR (Fig. 3.4). Specifically, the bias and self-noise contribution is more significant for the simulated MPSK and 16-QAM modulations.
• Bias vs. variance trade-off. The MMSE solution outperforms the minimum variance solution because it is not forced to minimize the bias. On the contrary, it tolerates some bias if the variance term can be attenuated in return, thus minimizing the overall MSE. This trade-off is more significant in the low SNR regime, but it is also observed at high SNR on account of the self-noise variance. If the self-noise variance is reduced by increasing M, the minimum variance solution converges to the MMSE solution at high SNR (Fig. 3.8).
• Consistency. For large samples (M → ∞, with Nss constant), the estimator variance is completely removed whatever the actual SNR, and the residual MSE is the estimator bias computed in (3.36). Therefore, consistent second-order estimation is not possible unless the bias term BIAS²_min vanishes, as explained in Section 3.3. This asymptotic result applies
Figure 3.7: MSE corresponding to the minimum variance solution for different values of the parameter range ∆ = 0.05, 0.1, 0.2, 0.4, 0.8 and 1.6. The received signal is MSK-modulated and M = 8 samples are processed with Nss = 2.
to both the MMSE and minimum variance solutions (Fig. 3.8). Formally,
lim_{M→∞} MSE = lim_{M→∞} BIAS²_min = σ²_g − lim_{M→∞} Tr( S^H Q̃^# S )
where the last term becomes constant for M ≥ Nss·L. Notice that the MSE curves in Fig. 3.8 would eventually converge to the bias floor shown in Fig. 3.4 if the M-axis were expanded, i.e., lim_{M→∞} BIAS²_min ≈ 10⁻³.
• Gaussian assumption. The Gaussian assumption is checked in Fig. 3.4, showing that it yields a significant loss for medium-to-high SNRs. On the other hand, it converges to the optimal solution as the SNR approaches zero. Regarding Fig. 3.8, the Gaussian assumption also supplies asymptotically (M → ∞) self-noise free estimators, but it suffers a constant penalty for any finite SNR. This loss is less significant in the case of a linear modulation, as shown in Figs. 3.5-3.6.
Figure 3.8: Normalized MSE as a function of the observation length M for the MMSE and the minimum variance frequency estimators deduced in (3.25) and (3.30), together with their Gaussian-assumption counterparts. The modulation is MSK, Es/No = 40 dB, Nss = 2 and ∆ = 1.6.
Figure 3.9: MSE as a function of ∆ for the MSK modulation (M_var, M_mse and closed-loop solutions). The simulation parameters are Es/No = 10 dB, M = 4 and Nss = 2.
3.5 Conclusions
This chapter was devoted to the design of feedforward second-order estimators, adopting the well-known Bayesian approach. The coefficients of the quadratic estimator were selected to minimize the estimator MSE or the estimator variance on average, where this average involves the a priori distribution of the unknown parameters. In the optimization of the estimator coefficients, the actual distribution of the nuisance parameters was considered, avoiding the usual Gaussian assumption.
The applicability of the studied second-order estimators in nonlinear estimation problems is generally limited by the impossibility of cancelling the bias term. Indeed, consistent second-order estimators are mostly unfeasible due to this persistent bias. Moreover, if the observation time is finite, a variance floor appears at high SNR due to the presence of the random nuisance parameters. This floor depends on the actual distribution of the nuisance parameters and can be reduced by exploiting that distribution, especially in the case of CPM signals.
Nonetheless, most of these conclusions depend on the actual parameterization and the assumed prior distribution. In this chapter, the problem of blind frequency synchronization was chosen to illustrate these conclusions by means of analytical and numerical results. In this case study, the minimization of the estimator bias (within the parameter range) is proved to be limited by the effective duration of the transmitted pulse. On the other hand, open-loop second-order frequency estimators exhibited the referred variance floor at high SNR, whereas self-noise free closed-loop frequency estimators exist in the literature even for limited observation times.
Beyond the practical interest of open-loop second-order estimators, the formulation in this
chapter constitutes the basis for the deduction of optimum quadratic closed-loop estimators in
Chapter 4.
Appendix 3.A Second-order estimation in noncircular transmissions
In the main text, optimal second-order estimators have been deduced for complex, circular constellations. The circularity of the nuisance parameters can be assumed for any bandpass modulation if the carrier phase is uniformly distributed and this random term is incorporated into the vector of nuisance parameters x. In that case, the expectation of yy^T becomes zero and does not provide information about the parameter of interest. However, optimal second-order estimators should also exploit the improper correlation matrix yy^T in the case of baseband transmissions or noncircular bandpass modulations, provided that the carrier phase is known or estimated. Precisely, carrier phase estimation is addressed in Section 6.2 using quadratic schemes in the case of MSK-type modulations. Other important noncircular modulations are the CPM format, any staggered modulation (e.g., offset QPSK), any real-valued constellation such as BPSK or ASK, trellis coded modulations (TCM) as well as other coded transmissions [Pro95].
The analysis of noncircular or improper complex random variables has been carried out in [Sch03][Pic96] and references therein. Widely-linear estimators are proposed in [Sch03][Pic95], in which the vector

z ≜ [ y^T  y^H ]^T

is linearly processed. This extended signal model has been applied in the field of communications by some authors, e.g., [Gel00][Tul00][Ger01].
Therefore, all the results in this thesis can be extended by considering the following sample covariance matrix

R̂ ≜ zz^T = [ yy^T  yy^H ; y*y^T  y*y^H ]

to obtain the optimal widely-quadratic estimator. When stacking the sample covariance matrix, it is worth realizing that y*y^T could be omitted from r̂ = vec(R̂) because the term yy^H provides the same information.

To compute the coefficients of the optimal second-order estimator, it is necessary to obtain the covariance of r̂ = vec(R̂) following the guidelines in Appendix 3.B.
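A minimal numpy sketch of the augmented statistics, assuming a toy noncircular (real-valued BPSK-like) signal in circular noise; it only illustrates the block structure of zz^T and the redundancy of the y*y^T block (all names and sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
M = 4
# A noncircular toy signal: real BPSK-like samples plus circular complex noise.
y = rng.choice([-1.0, 1.0], M) + 0.1 * (rng.standard_normal(M)
                                        + 1j * rng.standard_normal(M))

# Augmented vector z = [y; y*] and the block sample matrix z z^T of Appendix 3.A.
z = np.concatenate([y, y.conj()])
R_aug = np.outer(z, z)     # blocks [[y y^T, y y^H], [y* y^T, y* y^H]]

# The upper-right block is the usual sample covariance y y^H, and the
# lower-left block y* y^T is its conjugate, i.e., redundant information.
assert np.allclose(R_aug[:M, M:], np.outer(y, y.conj()))
assert np.allclose(R_aug[M:, :M], np.outer(y, y.conj()).conj())
```

The two assertions confirm the remark above: only the improper block yy^T adds new information beyond yy^H.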
Appendix 3.B Deduction of matrix Q(θ)
The expression of Q(θ) in (3.9) can be written as follows:

Q(θ) = E{ r̂ r̂^H } − r r^H   (3.37)

where

r̂ = vec( (Ax + w)(Ax + w)^H ) = vec( Axx^H A^H + Axw^H + wx^H A^H + ww^H )

and the dependence on θ is omitted for the sake of brevity.
Taking into account that the noise is circular and zero mean, i.e.,

E{w} = 0,  E{ww^T} = 0,  E{w_i w_j* w_k} = E{w_i* w_j* w_k} = 0,

only six terms, out of the sixteen in r̂ r̂^H, survive the expectation in (3.37). These terms can be classified as follows:
• signal × signal: vec(Axx^H A^H) vec^H(Axx^H A^H)

• signal × noise: vec(Axw^H) vec^H(Axw^H) + vec(wx^H A^H) vec^H(wx^H A^H) + vec(Axx^H A^H) vec^H(ww^H) + vec(ww^H) vec^H(Axx^H A^H)

• noise × noise: vec(ww^H) vec^H(ww^H).
Then, using the following three properties [Mag98, Chapter 2]:

vec(ABC^H) = (C* ⊗ A) vec(B)   (3.38)

(A ⊗ B)(C ⊗ D) = AC ⊗ BD   (3.39)

vec(ab^H) vec^H(ab^H) = (b* ⊗ a)(b* ⊗ a)^H = (bb^H)* ⊗ aa^H   (3.40)
and, bearing in mind that E{xx^H} = I_K, one obtains

E{ r̂ r̂^H } = A E{ vec(xx^H) vec^H(xx^H) } A^H
+ R_w* ⊗ AA^H + (AA^H)* ⊗ R_w + vec(AA^H) vec^H(R_w) + vec(R_w) vec^H(AA^H)
+ R_w* ⊗ R_w + vec(R_w) vec^H(R_w)   (3.41)

where A ≜ A* ⊗ A and the following property of Gaussian vectors is used (Appendix 3.C):

E{ vec(ww^H) vec^H(ww^H) } = R_w* ⊗ R_w + vec(R_w) vec^H(R_w).   (3.42)
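The identity (3.42) is easy to validate by Monte Carlo simulation. A hedged sketch with a toy covariance (all sizes and names illustrative), using column-major stacking so that entry i + Mj of vec(ww^H) equals w_i w_j*:

```python
import numpy as np

rng = np.random.default_rng(4)
M, trials = 3, 200_000

# A random Hermitian positive-definite noise covariance Rw (toy size).
B = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
Rw = np.eye(M) + 0.2 * (B @ B.conj().T) / M

# Draw w ~ CN(0, Rw): unit-variance circular samples colored by a Cholesky factor.
C = np.linalg.cholesky(Rw)
x = (rng.standard_normal((trials, M))
     + 1j * rng.standard_normal((trials, M))) / np.sqrt(2)
w = x @ C.T                                    # each row is one realization of w

# Column-major stacking: entry i + M*j of vec(w w^H) equals w_i w_j*.
vecs = (w.conj()[:, :, None] * w[:, None, :]).reshape(trials, -1)
lhs = vecs.T @ vecs.conj() / trials            # sample E{vec(ww^H) vec^H(ww^H)}

vecRw = Rw.ravel(order='F')                    # vec(Rw), column-major
rhs = np.kron(Rw.conj(), Rw) + np.outer(vecRw, vecRw.conj())

assert np.max(np.abs(lhs - rhs)) < 0.2         # identity holds up to MC error
```

The Kronecker block ordering of numpy matches the column-major vec convention used in (3.38)-(3.42), which is why the comparison is entry-wise exact up to Monte Carlo error.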
Therefore, grouping terms in (3.41) and having in mind that R = AAH + Rw (2.23), the
following expression is obtained:
E{ r̂ r̂^H } = A E{ vec(xx^H) vec^H(xx^H) } A^H
− (AA^H)* ⊗ AA^H − vec(AA^H) vec^H(AA^H)
+ R* ⊗ R + vec(R) vec^H(R).

Finally, using once more (3.38) and (3.39) in order to write the negative terms above as a function of A and, plugging this result into (3.37), the expression proposed in (3.10) is obtained:

Q(θ) = R*(θ) ⊗ R(θ) + A(θ) K A^H(θ).
Appendix 3.C Fourth-order moments
In this section the fourth-order moments of a generic zero-mean, circular, possibly non-Gaussian vector v ∈ C^L are deduced. The resulting L⁴ terms are ordered in the following matrix:

Q_v ≜ E{ vec(vv^H) vec^H(vv^H) }   (3.43)
whose elements are given by

[Q_v]_{i+Lj,k+Ll} = E{ v_i v_j* v_k* v_l } = E{v_i v_j*} E{v_k* v_l} + E{v_i v_k*} E{v_j* v_l} + ( E{|v_i|⁴} − 2E²{|v_i|²} ) δ(i, j, k, l)

with i, j, k, l ∈ {0, ..., L − 1} and δ(i, j, k, l) the Kronecker delta of multiple dimensions.
If all these elements are arranged in Q_v, three components are identified:

Q_v = vec(R_v) vec^H(R_v) + R_v* ⊗ R_v + diag( vec(Γ) )

with R_v ≜ E{vv^H} and Γ the diagonal matrix with [Γ]_{i,i} ≜ E{|v_i|⁴} − 2E²{|v_i|²}.
If the elements of v are identically distributed, µ ≜ E{|v_i|²} and ρ ≜ E{|v_i|⁴}/µ² do not depend on i and, thus, the third term can be simplified to obtain that

Q_v = vec(R_v) vec^H(R_v) + R_v* ⊗ R_v + µ²(ρ − 2) diag( vec(I_L) ).   (3.44)
In particular, the fourth-order moments of x in (3.12) are given by (3.44) having in mind that the symbol autocorrelation is E{xx^H} = I_K and, thus, R_v = I_K and µ = 1. On the other hand, if v is a complex Gaussian vector, as the noise vector w in the adopted signal model, the third term in (3.44) can be removed taking into account that ρ = 2 in the Gaussian case, hence proving equation (3.42):

E{ vec(ww^H) vec^H(ww^H) } = R_w* ⊗ R_w + vec(R_w) vec^H(R_w).
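As a sanity check, (3.44) can also be verified by Monte Carlo simulation for i.i.d. QPSK entries, for which R_v = I_L, µ = 1 and ρ = 1. A toy-sized numpy sketch (names illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
L, trials = 3, 200_000

# i.i.d. QPSK entries: constant modulus, so Rv = I_L, mu = 1 and rho = 1.
qpsk = np.array([1, 1j, -1, -1j])
V = qpsk[rng.integers(0, 4, size=(trials, L))]

# Column-major stacking: entry i + L*j of vec(v v^H) equals v_i v_j*.
vecs = (V.conj()[:, :, None] * V[:, None, :]).reshape(trials, -1)
Qv_mc = vecs.T @ vecs.conj() / trials          # sample estimate of Qv (3.43)

# Closed form (3.44) with Rv = I_L, mu = 1, rho = 1.
I = np.eye(L)
vecI = I.ravel()
Qv_th = np.outer(vecI, vecI) + np.kron(I, I) + (1 - 2) * np.diag(vecI)

assert np.max(np.abs(Qv_mc - Qv_th)) < 0.05
```

Repeating the experiment with Gaussian entries (ρ = 2) makes the diagonal correction term vanish, which is exactly the Gaussian case invoked to prove (3.42).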
Appendix 3.D Bayesian average in frequency estimation
Let us assume that the scalar parameter λ is estimated from the following observation:
y = exp(j2πλ d_M) ⊙ Ax + w

where d_M ≜ [0, ..., M − 1]^T and A stands for A(θ)|_{θ=0}. Therefore, the observation covariance matrix is given by

R(λ) = E{yy^H} = E(λ) ⊙ AA^H + R_w

where E(λ) is defined as

[E(λ)]_{i,k} = e^{j2πλ(i−k)}.   (3.45)
Let us consider that the prior is uniform in the interval λ ∈ (−∆/2, ∆/2] with ∆ ≤ 1. In that case, it is possible to obtain closed-form expressions for those matrices appearing in b_opt (3.16), M_mse (3.25) and, M_var (3.30). The resulting expressions are listed next:
ḡ = E_λ{ g(λ) } = E_λ{λ} = 0

σ²_g = E_λ{λ²} = ∆²/12

R̄ = Ē ⊙ AA^H + R_w

Q̄ = R̄* ⊗ R̄ + (E_q − Ē* ⊗ Ē) ⊙ A A^H + E_q ⊙ (A K A^H)

Q̃ = (E_q − vec(Ē) vec^H(Ē)) ⊙ ( vec(AA^H) vec^H(AA^H) )

S = vec( E_s ⊙ AA^H )
with

Ē ≜ E_λ{ E(λ) },  E_q ≜ E_λ{ E*(λ) ⊗ E(λ) },  E_s ≜ E_λ{ E(λ) λ }

whose elements are given next [Vil03a]:

[Ē]_{i,k} = sinc((i − k)∆)

[E_q]_{i+Mj,k+Ml} = sinc((i − j + l − k)∆)

[E_s]_{i,k} = 0 for i = k, and [E_s]_{i,k} = (j / (2π(i − k))) [ sinc((i − k)∆) − cos(π(i − k)∆) ] for i ≠ k

and the sinc(·) operator is defined as sinc(x) ≜ sin(πx)/(πx) with sinc(0) ≜ 1.
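The closed-form entries above are straightforward to verify numerically. A sketch for Ē, comparing the closed form against a dense numerical average over the uniform prior (the same approach applies to E_q and E_s); the grid size and ∆ value are illustrative:

```python
import numpy as np

# Check the closed-form average [E_bar]_{i,k} = sinc((i-k)*Delta) of
# Appendix 3.D for a uniform prior on (-Delta/2, Delta/2].
M, Delta = 4, 0.8
d = np.arange(M)
diff = d[:, None] - d[None, :]                 # (i - k) matrix

lam = np.linspace(-Delta / 2, Delta / 2, 20001)  # dense grid for E_lambda{.}
E_num = np.mean(np.exp(2j * np.pi * lam[None, None, :] * diff[:, :, None]),
                axis=-1)

E_closed = np.sinc(diff * Delta)               # np.sinc(x) = sin(pi x)/(pi x)

assert np.max(np.abs(E_num - E_closed)) < 1e-3
```

Note that numpy's `sinc` is already the normalized sinc of the text, including sinc(0) = 1, so no special-casing of the diagonal is needed.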
Appendix 3.E Bias study in frequency estimation
In this appendix, the minimum bias solution is studied in detail for the frequency estimation problem. The coefficients m ≜ vec(M) minimizing the estimator bias in (3.19) have to satisfy the minimum-bias constraints in (3.27). After some trivial manipulations, this equation can be written as

E_ν{ B(ν) α*(ν) } = E_ν{ B(ν) ν }   (3.46)

where B(ν) ≜ A(ν)A^H(ν) = E(ν/Nss) ⊙ AA^H (Appendix 3.D) and

α*(ν) = (r(ν) − r̄)^H m = Tr( B^H(ν) M ) − C

stands for the estimator mean value (3.7) as a function of the parameter ν and

C ≜ E_ν{ Tr( B^H(ν) M ) }

is a constant term, which is independent of the parameter ν. Notice also that α(ν) is actually real-valued despite the complex conjugation in (3.46), which is kept for the sake of generality.
Regarding the obtained minimum bias equation (3.46), it is straightforward to realize that any unbiased estimator verifies (3.46). Unfortunately, the converse does not usually hold and (3.46) supplies the least squares fitting of α(ν) to the ideal linear response α(ν) = ν within the prior domain (i.e., |ν| < ∆/2).
Furthermore, if some elements of B(ν) are connected by an affine transformation, i.e., [B(ν)]_{i2,j2} = C_a [B(ν)]_{i1,j1} + C_b for some constants C_a and C_b, the system of equations in (3.46) becomes underdetermined, as was equation (3.27). Indeed, this is exactly what happens in the frequency estimation case since the diagonal entries of B(ν) share the same phasor (3.45). Thus, it is possible to reduce (3.46) to 2M − 1 equations corresponding to the diagonals of B(ν). Nonetheless, the uppermost and lowermost diagonals are equal to zero if M > Nss·L, with L the effective transmitted pulse duration (in symbols). Therefore, the minimization of the estimator bias requires fulfilling the following 2K + 1 equations:
E_ν{ α*(ν) e^{j2πνk/Nss} } = E_ν{ ν e^{j2πνk/Nss} },  k ∈ [−K, K]

or, equivalently,

∫_{−R/2}^{R/2} V(f) e^{j2πfk} df = Nss ∫_{−R/2}^{R/2} f e^{j2πfk} df,  k ∈ [−K, K]   (3.47)
where K ≜ min(M, LNss) − 1, f ≜ ν/Nss, R ≜ ∆/Nss is the carrier uncertainty relative to the Nyquist bandwidth and,

V(f) ≜ Tr( B^H(Nss f) M ) = Σ_{k=−K}^{K} Σ_i [M]_{i,i+k} [B*(Nss f)]_{i,i+k} = Σ_{k=−K}^{K} ( Σ_i [M]_{i,i+k} [AA^H]*_{i,i+k} ) e^{−j2πfk}
Figure 3.10: Mean value α(ν) of the frequency estimator corresponding to the minimum bias solution for K = 4 and K = 16, plotted versus f = ν/Nss for R = 1.
is the Fourier transform of the sequence v[k] defined as

v[k] ≜ F⁻¹{V(f)} = Σ_i [M]_{i,i+k} [AA^H]*_{i,i+k} for |k| ≤ K, and 0 otherwise.   (3.48)
Notice that in (3.47) we have taken into account that α*(ν) = V(ν/Nss) − C, where C must be null to guarantee the odd symmetry of the harmonic expansion of f on the right-hand side of (3.47).
Thus, equation (3.47) states that the 2K + 1 central terms of the discrete Fourier series of Nss·f and V(f), filtered in the interval ±R/2, must be identical in order to minimize the estimator bias. Formally, this means that the sequence v[k] must be equal to the inverse discrete Fourier transform of Nss·f, as stated in the next equation:

v[k] = jNss ( δ[k] − (−1)^k ) / (2πk),  |k| ≤ K
Ideally, if K were arbitrarily large, (3.47) would imply the identity of α(ν) and ν within the prior interval |ν| < ∆/2 or, in other words,

V(f)|_{K→∞} = lim_{K→∞} Σ_{k=−K}^{K} v[k] e^{−j2πfk} = Nss·f,  |f| < R/2   (3.49)
whatever the value of R. However, since K is finite and limited by the transmitted pulse duration L, the point at which the above Fourier series can be truncated without noticeable distortion is a function of the ratio R = ∆/Nss; the smaller R, the fewer terms are required for the same distortion of α(ν). In the limit (R → 0), the Taylor expansion of (3.49) around f = 0 ensures that K = 1 is sufficient to satisfy (3.49) exactly, with v[1] = −v[−1] = jNss/(2π). Otherwise, if (3.49) is truncated taking too few elements, α(ν) will suffer from ripple and the Gibbs effect, i.e., overshooting at the discontinuity points ν = ±∆/2, as shown in Fig. 3.10 for the most critical situation in which R = 1.
Finally, notice that the effective duration L is inversely proportional to the effective signal bandwidth. Because the minimum transmission bandwidth in bandpass communications is 1/T Hz (i.e., 0% roll-off), it follows that the main lobe of the signal autocorrelation lasts 2T seconds and, thus, in practice the Fourier series in (3.49) becomes truncated approximately at K = Nss or, in the best case, at a few multiples of Nss.
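The truncated synthesis in (3.49) and the resulting ripple of Fig. 3.10 can be reproduced with a few lines. An illustrative sketch for R = 1 and Nss = 2 (assumed values, not taken from the figure), using the v[k] coefficients given above:

```python
import numpy as np

Nss, R = 2, 1.0
f = np.linspace(-R / 2, R / 2, 1001)

def mean_response(K):
    """Truncated synthesis alpha(f) = sum_{|k|<=K} v[k] e^{-j 2 pi f k}
    with v[k] = j*Nss*(delta[k] - (-1)^k)/(2 pi k) and v[0] = 0."""
    alpha = np.zeros_like(f, dtype=complex)
    for k in range(1, K + 1):
        v_k = 1j * Nss * (-((-1.0) ** k)) / (2 * np.pi * k)
        alpha += v_k * np.exp(-2j * np.pi * f * k)
        alpha += np.conj(v_k) * np.exp(2j * np.pi * f * k)   # v[-k] = -v[k]
    return alpha.real

# Away from the discontinuity at |f| = 1/2, the K = 16 synthesis tracks the
# ideal linear response Nss*f much more closely than K = 4 (cf. Fig. 3.10).
mask = np.abs(f) < 0.4
err4 = np.max(np.abs(mean_response(4) - Nss * f)[mask])
err16 = np.max(np.abs(mean_response(16) - Nss * f)[mask])
```

The residual ripple for small K is the Gibbs phenomenon discussed above; near f = ±1/2 the overshoot does not disappear even for large K.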
Chapter 4
Optimal Second-Order Small-Error Estimation
In the last chapter, second-order estimators were designed by achieving a trade-off between bias and variance. The MMSE and minimum variance estimators were obtained by averaging over all the possible values of the parameter of interest. This approach has a few drawbacks, which are summarized next. First of all, second-order estimators are usually biased even if the observation time is increased indefinitely. This fact precludes the existence of consistent quadratic estimators in the majority of nonlinear estimation problems. Moreover, the randomness of the nuisance parameters generally causes a serious variance floor at high SNR for finite data records and, therefore, self-noise free estimates are only possible asymptotically in the case of infinite data samples.
In this chapter and the next one, the above problems are faced following two different but complementary approaches. In both cases, a closed-loop or feedback scheme is adopted in which the estimator output is fed back in order to re-design the estimator coefficients and estimate once more the parameters of interest. The closed-loop implementation allows successively approaching the true parameter until the estimator attains, after convergence, the so-called small-error regime, in which the estimator operates in the neighborhood of the true solution θo. In contrast, the estimators studied in the previous chapter were based on an open-loop or feedforward architecture in which the parameter was extracted in a "single iterate" from the observed vector.
Based on this closed-loop architecture, two different approaches are considered in this chapter
following the arguments in Section 2.5. On the one hand, the design of iterative methods is
considered in which the observed vector y is repeatedly processed until attaining the small-error
regime. With this aim, the gradient-based algorithms presented in Section 2.5 are implemented.
The contribution of this chapter is the deduction of the optimal second-order gradient, and the
corresponding Hessian, in case of arbitrarily distributed nuisance parameters. Throughout this
chapter, we will assume that the length of the observed vector y is sufficient to exceed the SNR threshold and, thus, to work in the small-error regime. Otherwise, the algorithm might
converge to a spurious solution, usually referred to as outlier (Section 2.3.2).
On the other hand, the design of closed-loop estimators (Section 2.5.1) and trackers (Section
2.5.2) is also addressed in this chapter. As explained in Chapter 2, closed-loop estimators process
the observation vector sequentially. The sequential implementation allows a significant reduction
in terms of complexity and is unavoidable in case of dealing with a continuous transmission
system in which the observation is infinite. It is shown in Section 2.5.1 that the closed-loop
architecture yields efficient estimates if the observation is appropriately fragmented and all the
parameters have been acquired correctly. Another important feature of closed-loop schemes is
their capability of tracking the parameter evolution in time-variant scenarios as explained in
Section 2.5.2.
As it was explained in Section 2.5.1, closed-loop estimators are composed of a discriminator
and a loop filter. The discriminator is actually a small-error estimator dedicated to detecting parameter deviations from the current estimate of θ. On the other hand, the loop filter is
responsible for filtering the noisy estimates from the discriminator and predicting the parameter
evolution in time-varying scenarios. The contribution of this chapter is the deduction of the
optimal second-order discriminator assuming that the closed loop has attained the steady-state
and, thus, it is working in the small-error regime. The actual distribution of the nuisance
parameters is considered in order to cope with the self-noise in an optimal way.
The optimal second-order discriminator is obtained by focusing solely on the steady-state performance and ignoring the acquisition and tracking behaviour altogether. To complement
this approach, the optimal second-order tracker is sought in Chapter 5 based on the Kalman
filter theory. In that case, the discriminator and the loop filter are jointly, adaptively designed
to optimize both the acquisition and steady-state performance.
To summarize this introduction, the small-error regime can be achieved by means of iterative
or closed-loop algorithms. Once the small-error regime is achieved, second-order estimators are known to be unbiased since the estimator mean response E{α̂} is approximately linear in the parameter α, irrespective of the actual parameterization. Besides, in this small-error situation, second-order estimators become efficient for Gaussian nuisance parameters or in low-SNR scenarios. In the following sections, optimal second-order estimators are designed for the small-error regime and, afterwards, the resulting estimators are applied to the same estimation problem dealt with in Section 3.4: blind frequency offset estimation from digitally-modulated
signals. More results can be found in Chapter 6 for the problems of NDA timing synchronization
(Section 6.1), NDA carrier phase synchronization (Section 6.2), time-of-arrival estimation in
multipath channels (Section 6.3), blind channel identification (Section 6.4) and angle-of-arrival
estimation (Section 6.5).
4.1 Small-Error Assumption
In the last chapter, the variability of θ was considered by means of the prior fθ(θ). This chapter deals with the asymptotic case in which this variability is very small (θ ≃ θo). In this small-error regime, the prior fθ(θ) is concentrated around the true parameter θo. Then, the formulation presented in the last chapter can be particularized for a very informative prior fθ(θ) holding that fθ(θ) < ε for any θ ≠ θo, with ε arbitrarily small. Accordingly, the prior can be appropriately modelled as a Dirac delta centered at θ = θo, that is, fθ(θ) = δ(θ − θo).
Assuming that the estimator works in the small-error regime, the expected value of those
complex matrices appearing in Section 3.2 and Section 3.3 can be approximated by means of
their Taylor expansion around θ = θo. Thus, if F(θ) is a generic complex matrix depending on the vector of parameters θ, its mean value in the neighborhood of θ = θo can be approximated as follows:
$$\mathrm{E}_{\theta}\{\mathbf{F}(\boldsymbol{\theta})\} \simeq \mathbf{F}(\boldsymbol{\theta}_o) + \frac{1}{2}\sum_{p,q=1}^{P}\left.\frac{\partial^2 \mathbf{F}(\boldsymbol{\theta})}{\partial\theta_p\,\partial\theta_q}\right|_{\boldsymbol{\theta}=\boldsymbol{\theta}_o}[\mathbf{C}_\theta]_{p,q} \qquad (4.1)$$
where the linear term is omitted taking into account that Eθ{θ} ≃ θo by definition, and Cθ is the a priori covariance matrix of the parameter:

$$\mathbf{C}_\theta \triangleq \mathrm{E}_\theta\left\{(\boldsymbol{\theta}-\boldsymbol{\theta}_o)(\boldsymbol{\theta}-\boldsymbol{\theta}_o)^H\right\}. \qquad (4.2)$$
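As a quick numerical sanity check of the expansion in (4.1), the sketch below compares a Monte Carlo average under a narrow Gaussian prior with the second-order Taylor approximation. The matrix-valued function F(θ) here is an arbitrary stand-in chosen for illustration, not one of the signal-model matrices of the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)
theta_o, sigma = 0.3, 0.05           # true parameter and small prior spread

def F(theta):                         # generic matrix-valued function of theta
    return np.array([[np.cos(theta), np.sin(theta)],
                     [-np.sin(theta), np.cos(theta)]])

# Monte Carlo evaluation of E{F(theta)} under a narrow prior
thetas = rng.normal(theta_o, sigma, 20000)
mc_mean = np.mean([F(t) for t in thetas], axis=0)

# Second-order Taylor approximation (4.1): F(theta_o) + 0.5 * C_theta * F''(theta_o)
h = 1e-4
F2 = (F(theta_o + h) - 2 * F(theta_o) + F(theta_o - h)) / h**2   # numerical Hessian
taylor = F(theta_o) + 0.5 * sigma**2 * F2

print(np.max(np.abs(mc_mean - taylor)))   # small: the expansion is accurate
```

The residual is dominated by the Monte Carlo error; the truncation error of (4.1) is O(σ⁴) and negligible for this prior spread.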
In Appendix 4.A, the vectors and matrices r, g, Q̃, S and Q (Section 3.1) are approximated in the small-error regime using (4.1), obtaining

$$\mathbf{r} \simeq \mathbf{r}(\boldsymbol{\theta}_o) \triangleq \mathbf{r}_o \qquad (4.3)$$
$$\mathbf{g} \simeq \mathbf{g}(\boldsymbol{\theta}_o) \qquad (4.4)$$
$$\tilde{\mathbf{Q}} \simeq \mathbf{D}_r \mathbf{C}_\theta \mathbf{D}_r^H \qquad (4.5)$$
$$\mathbf{S} \simeq \mathbf{D}_r \mathbf{C}_\theta \mathbf{D}_g^H \qquad (4.6)$$
$$\mathbf{Q} \simeq \mathbf{Q}(\boldsymbol{\theta}_o) \triangleq \mathbf{Q}_o \qquad (4.7)$$

where

$$\mathbf{D}_r \triangleq \left.\frac{\partial \mathbf{r}(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}^T}\right|_{\boldsymbol{\theta}=\boldsymbol{\theta}_o} \qquad (4.8)$$
$$\mathbf{D}_g \triangleq \left.\frac{\partial \mathbf{g}(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}^T}\right|_{\boldsymbol{\theta}=\boldsymbol{\theta}_o} \qquad (4.9)$$
Finally, under the small-error assumption, the prior is concentrated at θ = θo so that Cθ (4.2) collapses at this point, becoming proportional to a given matrix C0θ defined as

$$\mathbf{C}_\theta^0 \triangleq \lim_{\Delta\to 0}\frac{1}{\Delta}\,\mathbf{C}_\theta \qquad (4.10)$$

with Δ ≜ ‖θ − θo‖ the radius of the infinitesimal ball in which the prior is defined.
4.2 Second-Order Minimum Variance Estimator
In the small-error regime, the MMSE solution deduced in Section 3.2 makes no sense because it becomes dominated by the prior in such a way that α̂ = g(θo) with Mmse = 0. Thus, the MMSE solution must be constrained in some way to avoid the trivial solution. Once more, the minimum-bias constraint is imposed to guarantee minimum bias around the true solution θo and, then, the second-order minimum variance estimator in Section 3.3 is formulated again for the small-error regime. The important point is that the bias contribution can be totally eliminated in the small-error case (i.e., Δ → 0). Actually, a perfect matching between the estimator mean response

$$\bar{\boldsymbol{\alpha}}(\boldsymbol{\theta}) = \mathbf{g}(\boldsymbol{\theta}_o) + \mathbf{M}^H\left(\mathbf{r}(\boldsymbol{\theta}) - \mathbf{r}_o\right)$$
and the target response g(θ) is possible. The necessary and sufficient condition to have unbiased estimates (BIAS² = 0) is the equality of the derivatives of ᾱ(θ) and g(θ) evaluated at θ = θo (Appendix 4.B):

$$\mathbf{D}_r^H\mathbf{M} = \mathbf{D}_g^H \qquad (4.11)$$
For the time being, the target response g(θ) is supposed to verify the above equality for at least one matrix M. Therefore, solving again the minimization problem in (3.29) under the constraints on b and M obtained in (3.16) and (4.11), the optimal small-error estimator is given by

$$\hat{\boldsymbol{\alpha}} = \mathbf{g}(\boldsymbol{\theta}_o) + \mathbf{D}_g\left(\mathbf{D}_r^H\mathbf{Q}_o^{-1}\mathbf{D}_r\right)^{\#}\mathbf{D}_r^H\mathbf{Q}_o^{-1}\left(\hat{\mathbf{r}} - \mathbf{r}_o\right) \qquad (4.12)$$
where ro and Qo were defined in (4.3) and (4.7) and the Moore-Penrose pseudoinverse is maintained to cover those cases in which Dr is singular. Thus, the estimation error covariance matrix is given by¹

$$\mathbf{B}_{BQUE}(\boldsymbol{\theta}_o) \triangleq \mathrm{E}\left\{(\hat{\boldsymbol{\alpha}}-\mathbf{g}(\boldsymbol{\theta}_o))(\hat{\boldsymbol{\alpha}}-\mathbf{g}(\boldsymbol{\theta}_o))^H\right\} = \mathbf{D}_g\left(\mathbf{D}_r^H\mathbf{Q}_o^{-1}\mathbf{D}_r\right)^{\#}\mathbf{D}_g^H \qquad (4.13)$$

and the overall variance defined in (3.34) is calculated as the trace of B_BQUE(θo), i.e.,

$$VAR_{min} = VAR(\boldsymbol{\theta}_o) = \mathrm{Tr}\left\{\mathbf{B}_{BQUE}(\boldsymbol{\theta}_o)\right\}.$$
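The closed-form solution (4.12)-(4.13) can be exercised numerically. In the sketch below, Dr, Dg and Qo are random synthetic matrices, stand-ins for the actual signal-model quantities; the check confirms that the BQUE weighting matrix satisfies the local unbiasedness condition and produces the error covariance in (4.13).

```python
import numpy as np

rng = np.random.default_rng(1)
M2, P, Q = 16, 3, 2                       # vec'd covariance size, parameters, targets

Dr = rng.standard_normal((M2, P)) + 1j * rng.standard_normal((M2, P))
Dg = rng.standard_normal((Q, P))          # target-response derivatives (real)
A = rng.standard_normal((M2, M2)) + 1j * rng.standard_normal((M2, M2))
Qo = A @ A.conj().T + np.eye(M2)          # Hermitian positive-definite Q_o

J2 = Dr.conj().T @ np.linalg.inv(Qo) @ Dr             # second-order FIM (4.14)
W = Dg @ np.linalg.pinv(J2) @ Dr.conj().T @ np.linalg.inv(Qo)  # BQUE weights in (4.12)

# Local unbiasedness: the small-error mean response has slope Dg, i.e. W @ Dr = Dg
print(np.max(np.abs(W @ Dr - Dg)))

# Error covariance (4.13) and overall variance as its trace
B = Dg @ np.linalg.pinv(J2) @ Dg.conj().T
print(np.real(np.trace(B)))
```

With a full-column-rank Dr, the pseudoinverse reduces to the ordinary inverse and W·Dr = Dg holds exactly, up to rounding.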
Regarding the obtained solution, it is remarkable that the estimator covariance matrix in (4.13) has the same structure as the CRB in Section 2.6.1 where now

$$\mathbf{J}_2 \triangleq \mathbf{D}_r^H\mathbf{Q}_o^{-1}\mathbf{D}_r \qquad (4.14)$$

¹The following estimator was named in [Vil01a] the "Best Quadratic Unbiased Estimator" (BQUE) since it can be understood as a logical extension of the well-known "Best Linear Unbiased Estimator" (BLUE) [Kay93b, Ch. 6] in case of dealing with a quadratic observation, i.e., r̂ = vec(R̂) (3.3).
4.3. SECOND-ORDER IDENTIFIABILITY
93
plays the same role as the Fisher information matrix (FIM) for the family of second-order estimators considered in this dissertation. Therefore, (4.13) can be seen as the particularization of the Cramér-Rao bound to second-order estimation techniques. In Section 2.6.1, the matrix J2 is shown to coincide with the FIM of the problem when the SNR is asymptotically low (Section 2.4.1) and/or the nuisance parameters are Gaussian (Section 2.4.3).
In general, it can be affirmed that

$$\mathrm{E}\left\{(\hat{\boldsymbol{\alpha}}-\mathbf{g}(\boldsymbol{\theta}_o))(\hat{\boldsymbol{\alpha}}-\mathbf{g}(\boldsymbol{\theta}_o))^H\right\} \geq \mathbf{B}_{BQUE}(\boldsymbol{\theta}_o) \geq \mathbf{B}_{CRB}(\boldsymbol{\theta}_o) \qquad \forall\,\boldsymbol{\theta}_o \qquad (4.15)$$

for any unbiased estimator α̂ based on the sample covariance matrix R̂ = yy^H, where B_CRB(θ) is the CRB of α = g(θ) (Section 2.6.1). As stated before, the second inequality in (4.15) becomes an identity if the SNR tends to zero and/or the nuisance parameters are Gaussian random variables.
4.3 Second-Order Identifiability
This section is devoted to the analysis of the minimum-bias constraints introduced in (4.11). Using basic results of linear algebra, the system of equations in (4.11) offers three different possibilities [Mag98, Sec. 2.9], which are enumerated next:
1. Dr ∈ C^{M²×P} is full column rank. In that case, (4.11) is always consistent independently of the content of Dg ∈ R^{Q×P}. Assuming that Dr is a tall matrix (i.e., M² > P), the solution of (4.11) is not unique since (4.11) becomes underdetermined. Actually, the bias minimization is only consuming QP degrees of freedom from M ∈ C^{M²×Q}, whereas the remaining (M² − P)Q degrees of freedom are dedicated to minimizing the estimator variance.

In Appendix 4.B, it is shown that ᾱ ≃ g(θo) + Dg(θ − θo) in the small-error regime with Dg = M^H Dr (4.11). This means that the rank of M^H Dr determines the dimension of the subspace that contains the values of α ∈ R^Q that can be estimated in the small-error regime from the sample covariance matrix without any ambiguity. As the rank of Dr is P, the rank of M^H Dr is equal to min(P, Q) and, thus, α ∈ R^Q is locally identifiable from the sample covariance matrix assuming that Q ≤ P.
2. Dr ∈ C^{M²×P} is singular and Dg^H ∈ span(Dr^H). In that case, (4.11) is consistent if and only if Dg^H ∈ R^{P×Q} lies in the subspace generated by the rows of Dr. Then, if R < P is the column rank of Dr, only QR constraints, out of the total QP constraints in (4.11), can be imposed.
In that case, the rank of M^H Dr is the minimum of R and Q. Therefore, the parameter α ∈ R^Q is locally identifiable from the sample covariance matrix if and only if α belongs to the subspace generated by g(θo) + Dg(θ − θo), where the rank of Dg = M^H Dr is equal to min(R, Q).
3. Dr is singular but Dg^H ∉ span(Dr^H). In that case, (4.11) has no solution and, therefore, there does not exist any unbiased second-order estimator of α = g(θ), even in the small-error regime. In [Sto01], Stoica and Marzetta proved that a finite-variance estimator does not exist if (4.11) is not satisfied. Alternatively, the same conclusion can be drawn following the geometrical interpretation derived in [McW93].

In that case, the designer has to proceed as done in (3.29) to obtain the best approximation of g(θ) holding the minimum-bias constraints in (3.27). Thus, substituting (4.5)-(4.6) into (3.27), the minimum-bias constraints are given by

$$\mathbf{D}_r\mathbf{C}_\theta^0\mathbf{D}_r^H\mathbf{M} = \mathbf{D}_r\mathbf{C}_\theta^0\mathbf{D}_g^H \qquad (4.16)$$

where the a priori covariance matrix C0θ (4.10) is used to carry out the matching proposed in (3.27). Otherwise, if Dr is full-rank, C0θ is not profitable, showing that Bayesian estimators cannot improve deterministic ones when the small-error assumption applies.
Focusing on the second case, there are two circumstances reducing the rank of Dr:

• The parameterization is not appropriate. In the following three situations, the estimation problem is not correctly defined and Dr becomes singular. Example 1: the number of parameters Q is greater than the size of the sample covariance matrix M². Example 2: the Q parameters are not linearly independent and, therefore, the model is "overparameterized". Example 3: the sample covariance matrix, R̂ = yy^H, is insensitive to the phase of y in second-order estimation.²
• The estimator has a finite resolution. The estimator is unable to resolve two parameters of the same nature if they are very similar. For example, this problem arises in multiuser estimation problems such as the problem of angle-of-arrival estimation in array signal processing (Section 6.5). It is worth noting that this situation, contrary to the ambiguities described before, cannot be predicted beforehand, so it is not possible to guarantee (4.11) at all times. Therefore, the constraints in (4.16) should be used instead of those in (4.11), and the general estimator in (3.33) must be adopted using now the small-error matrices in (4.3)-(4.7).
²The signal modulus would also be ambiguous if the noise variance σ²w were not known, as we have assumed throughout the dissertation.
However, from the designer's viewpoint, the use of (4.16) may be problematic because the estimator would automatically reduce the rank of Dg^H = Dr^H M when entering into a singular situation (e.g., if two users cross each other, as studied in Section 6.5), changing the value of Dg. In the next section, this problem is overcome by setting free the value of the cross derivatives in (4.11).
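The consistency condition distinguishing cases 2 and 3 above can be tested numerically via a least-squares residual: (4.11) admits a solution exactly when Dg^H lies in the range of Dr^H. The sketch below uses synthetic matrices, not the thesis' signal model, with a deliberately rank-deficient Dr.

```python
import numpy as np

rng = np.random.default_rng(2)
M2, P = 9, 4

# Rank-deficient Dr: the last column repeats the first (column rank R = 3 < P)
Dr = rng.standard_normal((M2, P)) + 1j * rng.standard_normal((M2, P))
Dr[:, -1] = Dr[:, 0]
DrH = Dr.conj().T

def consistent(DgH, DrH, tol=1e-8):
    # (4.11) has a solution iff the least-squares residual of Dr^H M = Dg^H is zero
    M_ls, *_ = np.linalg.lstsq(DrH, DgH, rcond=None)
    return bool(np.linalg.norm(DrH @ M_ls - DgH) < tol)

# Case 2: Dg^H constructed inside the range of Dr^H -> unbiasedness is feasible
DgH_in = DrH @ (rng.standard_normal((M2, 1)) + 0j)
print(consistent(DgH_in, DrH))      # True

# Case 3: a generic Dg^H generally falls outside that range when Dr is singular
DgH_out = rng.standard_normal((P, 1)) + 0j
print(consistent(DgH_out, DrH))     # False
```

In case 3, the non-zero residual quantifies the unavoidable bias discussed above.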
4.4 Generalized Second-Order Constrained Estimators
Thus far, the estimator is designed to have an unbiased mean response when working under the small-error regime. Let us consider first that α = g(θ) is a vector of Q independent parameters holding that Dg^H is diagonal. In that case, the diagonal entries of Dr^H M are related to the estimator bias in the neighborhood of θ = θo, whereas the cross-terms reflect the coupling between parameters or, in other words, the interparameter interference (IPI). The classical unbiased solution forces Dr^H M = Dg^H (4.11) in order to yield unbiased estimates without IPI. However, strictly speaking, unbiased estimators are only required to constrain the value of the diagonal entries, that is,

$$\mathrm{diag}\left(\mathbf{D}_r^H\mathbf{M}\right) = \mathrm{diag}\left(\mathbf{D}_g^H\right), \qquad (4.17)$$

since the IPI contribution is zero-mean in the small-error regime and, therefore, can only increase the estimator variance. Moreover, in noisy scenarios, the IPI-free condition usually causes noise enhancement whereas, if the cross-terms in (4.11) are kept free, the estimator automatically makes a trade-off among noise, self-noise and IPI in order to minimize the overall variance.
Therefore, in case of independent parameters for which Dg is diagonal, the proposed second-order unbiased estimator is given by

$$\hat{\boldsymbol{\alpha}} = \mathbf{g}(\boldsymbol{\theta}_o) + \mathbf{D}_g\,\mathrm{diag}^{-1}\!\left(\mathbf{J}_2\right)\mathbf{D}_r^H\mathbf{Q}_o^{-1}\left(\hat{\mathbf{r}} - \mathbf{r}_o\right) \qquad (4.18)$$

where J2 ≜ Dr^H Qo^{-1} Dr is the second-order FIM introduced in (4.14), and diag^{-1}(J2) denotes the diagonal matrix formed with the inverted diagonal entries of J2.
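The relaxation from (4.11) to (4.17) can be illustrated numerically. The sketch below uses synthetic Dr, Dg and Qo (not the thesis' signal model) and verifies that both the IPI-free weights of (4.12) and the relaxed weights of (4.18) keep the unit diagonal slope, while the relaxed ones never increase the output variance.

```python
import numpy as np

rng = np.random.default_rng(3)
M2, P = 16, 2                              # Q = P independent parameters

Dr = rng.standard_normal((M2, P)) + 1j * rng.standard_normal((M2, P))
Dg = np.diag(rng.uniform(0.5, 2.0, P))     # diagonal target derivatives
A = rng.standard_normal((M2, M2)) + 1j * rng.standard_normal((M2, M2))
Qo = A @ A.conj().T + np.eye(M2)           # Hermitian positive-definite Q_o

Qinv = np.linalg.inv(Qo)
J2 = Dr.conj().T @ Qinv @ Dr

W_full = Dg @ np.linalg.inv(J2) @ Dr.conj().T @ Qinv                  # IPI-free (4.12)
W_diag = Dg @ np.diag(1 / np.real(np.diag(J2))) @ Dr.conj().T @ Qinv  # relaxed (4.18)

# Both satisfy the relaxed unbiasedness constraint (4.17): unit diagonal slope
print(np.diag(W_full @ Dr).real, np.diag(W_diag @ Dr).real)

# The relaxed estimator trades IPI for variance: Tr{W Qo W^H} never increases,
# since the IPI-free weights are a feasible point of the relaxed problem
var_full = np.real(np.trace(W_full @ Qo @ W_full.conj().T))
var_diag = np.real(np.trace(W_diag @ Qo @ W_diag.conj().T))
print(var_diag <= var_full)
```

Each row of W_diag is the minimum-variance weight vector subject to its single diagonal constraint, which is why the relaxed variance cannot exceed the fully constrained one.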
However, the P parameters in θ may appear coupled in α = g(θ). In that case, the matrix of derivatives Dg^H is not diagonal and the significance of the off-diagonal entries of Dg^H changes radically. Assuming that Dg^H is a full matrix (all the elements different from zero), any unbiased estimator of α is required to fulfill (4.11), leading to the original small-error solution in (4.12). In general, if Dg^H is sparse, only the constraints in (4.11) corresponding to non-zero elements of Dg^H have to be imposed to obtain unbiased estimators of α.
In Section 6.5, the alternative solution obtained in (4.18) is evaluated and compared to the
classical unbiased solution in (4.12) for the problem of tracking the angle-of-arrival of multiple
digitally-modulated sources in the context of array signal processing.
4.5 A Case Study: Frequency Estimation
In this section, the small-error estimator proposed in (4.12) is simulated for the frequency
estimation problem addressed in Section 3.4. Additional results are given in Chapter 6 for
timing estimation and other relevant estimation problems. The results in this section show
that the optimal second-order frequency estimator is unbiased and self-noise free. The Gaussian
assumption is examined when the transmitted signal is digitally modulated showing that it is
generally appropriate in the studied uniparametric problem. In addition, some singular cases are
identified for the CPM modulation in which the Gaussian assumption is not able to cancel out
the self-noise at high SNR. Later, in Section 6.5, the interest of including the digital information
about the symbols is emphasized for the related problem of bearing estimation of multiple
digitally-modulated sources.
Based on the signal model introduced in Section 3.4 for the problem of frequency estimation,
the matrix of derivatives Dr is simply the following column vector:

$$\mathbf{d}_r \triangleq \mathrm{vec}\left(\left.\frac{\partial\mathbf{R}(\nu)}{\partial\nu}\right|_{\nu=\nu_o}\right) = \mathrm{vec}\left(\left.\frac{\partial\mathbf{E}(\nu/N_{ss})}{\partial\nu}\right|_{\nu=\nu_o}\odot\,\mathbf{A}\mathbf{A}^H\right)$$

where νo is the actual value of the parameter, and [E(λ)]i,k = exp(j2πλ(i − k)) are the elements of the Toeplitz matrix introduced in Appendix 3.D. The derivative of E(ν/Nss) is then calculated, obtaining

$$\frac{\partial\,[\mathbf{E}(\nu/N_{ss})]_{i,k}}{\partial\nu} = j2\pi\,\frac{i-k}{N_{ss}}\,[\mathbf{E}(\nu/N_{ss})]_{i,k}.$$
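The Toeplitz matrix E(λ) and its closed-form derivative are easy to verify numerically; the sketch below checks the expression above against a central finite difference (the values of M, Nss and νo are arbitrary illustrative choices).

```python
import numpy as np

Nss, M, nu_o = 2, 8, 0.1

def E(lam, M):
    # [E(lambda)]_{i,k} = exp(j 2 pi lambda (i - k)): a Toeplitz rotation matrix
    idx = np.arange(M)
    return np.exp(1j * 2 * np.pi * lam * (idx[:, None] - idx[None, :]))

def dE_dnu(nu, Nss, M):
    # Closed form: j 2 pi (i - k)/Nss * [E(nu/Nss)]_{i,k}
    idx = np.arange(M)
    D = idx[:, None] - idx[None, :]
    return 1j * 2 * np.pi * D / Nss * E(nu / Nss, M)

# Central finite difference agrees with the closed-form derivative
h = 1e-6
num = (E((nu_o + h) / Nss, M) - E((nu_o - h) / Nss, M)) / (2 * h)
err = np.max(np.abs(num - dE_dnu(nu_o, Nss, M)))
print(err)      # small: only finite-difference truncation error remains
```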
Therefore, the optimal second-order small-error estimator is given by

$$\hat{\nu} = \nu_o + \frac{\mathbf{d}_r^H\mathbf{Q}_o^{-1}\left(\hat{\mathbf{r}}-\mathbf{r}_o\right)}{\mathbf{d}_r^H\mathbf{Q}_o^{-1}\mathbf{d}_r}$$

where ro and Qo were defined in (4.3) and (4.7), and the denominator is responsible for the unitary slope of E{ν̂}.
Alternatively, a classical synchronization loop can be implemented in which the received signal is corrected using the estimated parameter ν̂ (see Fig. 4.1). Thus, the discriminator can be designed assuming that the input parameter is νo = 0 once the small-error regime is attained. Consequently, the optimal second-order discriminator is given by

$$\hat{\nu} = \frac{\mathbf{d}_r^H\mathbf{Q}_o^{-1}\hat{\mathbf{r}}}{\mathbf{d}_r^H\mathbf{Q}_o^{-1}\mathbf{d}_r}$$

where dr and Qo are computed at νo = 0. Notice too that the last expression is simplified using that dr^H Qo^{-1} ro = 0. This condition is fulfilled thanks to the symmetry of matrix A(ν) for the problem at hand.
Figure 4.1: Block diagram of a (first-order) closed-loop frequency synchronizer. The optimal second-order discriminator, which was derived in this section under the small-error condition, is indicated in the figure. The CORDIC block rotates the phase of the received signal according to the estimated frequency offset.
Whatever the selected scheme and the actual value of νo, the variance at the discriminator output is given by

$$VAR = \mathrm{E}\left\{\left|\hat{\nu}-\nu_o\right|^2\right\} = \frac{1}{\mathbf{d}_r^H\mathbf{Q}_o^{-1}\mathbf{d}_r},$$

which constitutes the lower bound for the variance of any quadratic unbiased frequency error detector. If the nuisance parameters were normally distributed, the above expression would correspond to the (Gaussian) UCRB presented in Section 2.6.1.
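The variance identity above can be checked by Monte Carlo in a simplified Gaussian setting: a toy covariance R_o replaces the actual modulated-signal model, the derivative vector is a frequency-like stand-in, and cov{vec(yy^H)} = R_o^T ⊗ R_o holds for a single circular Gaussian snapshot.

```python
import numpy as np

rng = np.random.default_rng(4)
M = 4

# Toy observation covariance R_o and a "frequency-like" derivative direction d_r
A = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
Ro = A @ A.conj().T + np.eye(M)
idx = np.arange(M)
dR = 1j * 2 * np.pi * (idx[:, None] - idx[None, :]) * Ro
dr = dR.reshape(-1, order='F')             # vec of the derivative matrix
ro = Ro.reshape(-1, order='F')

# For a circular Gaussian snapshot y, cov{vec(y y^H)} = Ro^T (Kronecker) Ro
Qo = np.kron(Ro.T, Ro)
w = np.linalg.solve(Qo, dr)
bound = 1 / np.real(dr.conj() @ w)         # 1 / (d_r^H Qo^{-1} d_r)
w = w * bound                              # normalized discriminator weights

L = np.linalg.cholesky(Ro)
errs = np.empty(20000, dtype=complex)
for i in range(errs.size):
    y = L @ (rng.standard_normal(M) + 1j * rng.standard_normal(M)) / np.sqrt(2)
    r_hat = np.outer(y, y.conj()).reshape(-1, order='F')
    errs[i] = w.conj() @ (r_hat - ro)      # discriminator estimation error
var_mc = np.mean(np.abs(errs) ** 2)
print(var_mc, bound)                       # the Monte Carlo variance matches the bound
```

The agreement is exact in expectation here because, for Gaussian data, the true covariance of vec(yy^H) coincides with the Qo used in the weights.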
Notice that the discriminator variance could be reduced by including the usual loop filter (Fig. 4.1). In that case, the steady-state variance of the related closed-loop estimator is computed using the results in Section 6.1.4, following the reasoning in [Men97, Sec. 3.5.5].

The estimator performance is depicted in the following plots and compared with the ML-based estimators. The Gaussian assumption (GML) is shown to be practically optimal whatever the working point. Nonetheless, a minor degradation of about 0.9 dB is observed in Fig. 4.2 for positive Es/N0, in spite of increasing the observation time (Fig. 4.3). On the other hand, the low-SNR UML solution is rapidly limited by the self-noise as the SNR is augmented, manifesting a significant variance floor. This result is a consequence of the modulation intersymbol interference (ISI) and the finite observation time. In case of linear modulations (e.g., MPSK, QAM, etc.), this high-SNR floor disappears. Finally, the CML solution suffers from noise enhancement at low SNR due also to the ISI.
The interest of the optimal small-error solution is more significant when dealing with a partial-response CPM modulation such as the LREC format [Men97, Sec. 4.2]. It can be seen that all the ML-based methods are dominated by the self-noise at high SNR (Figs. 4.4 and 4.5). The CML and GML solutions are not able to cancel out the self-noise when the number of nuisance parameters (K) is greater than the number of samples (M). In that case, the CML estimator cannot remove the self-noise term because there is no noise subspace onto which to project the data. Moreover, as will be studied in Appendix 7.E, the CML and GML solutions
Figure 4.2: Frequency estimation variance under the small-error assumption for the optimal and GML estimators in case of MSK symbols, Nss = 2 and M = 4. The UCRB is not plotted for clarity since it is only slightly lower than the GML performance from Es/No = -5 dB to Es/No = 25 dB.
are not equivalent at high SNR because the columns of A(ν) are linearly dependent. Another simulation is run in which the received signal is oversampled (Nss = 4) to guarantee that M > K (Fig. 4.6). In that case, the GML and CML estimators supply self-noise-free estimates at high SNR, although a significant loss is exhibited for practical SNRs.

On the other hand, the optimal second-order estimator is self-noise free under the small-error assumption, as shown for the 2REC and 3REC modulations in Figs. 4.4 and 4.5. Self-noise is removed by exploiting the fourth-order moments matrix K of the pseudo-symbols. A detailed analysis of the asymptotic behaviour of second-order estimators at high SNR is given in Section 7.3.
Two classical small-error lower bounds for the variance of unbiased estimators are used to
evaluate the performance of second-order techniques in the presence of nuisance parameters
(Section 2.6.1). The (Gaussian) UCRB corresponds to the performance of the GML estimator
in case of Gaussian nuisance parameters (Section 2.4.3). Although it has been extensively
used in the literature as a valid bound in second-order estimation, simulations show how the
UCRB is outperformed by the optimal second-order estimator when the nuisance parameters
are discrete symbols. On the other hand, the MCRB predicts the ultimate performance of
data-aided estimators that could be approached at high SNR by means of higher-order methods
[Vil01b].
Figure 4.3: Frequency estimation variance under the small-error assumption as a function of M for the MSK modulation, Es/No = 40 dB and Nss = 2.
It is worth noting that the performance predicted in the above curves is only realistic for high SNRs and/or a narrowband loop filter (Section 2.5.2). Otherwise, the studied closed-loop estimators are not able to achieve the small-error regime. This abnormal behaviour is not only associated with closed-loop schemes; it also appears in open-loop estimation in the form of outliers or large errors.
Closed-loop schemes are sometimes able to acquire the parameter without external assistance. The necessary condition is that the estimator bias curve E{ν̂ − νo}, the so-called S-curve, uniquely intercepts the abscissa with positive slope at the origin. In Fig. 4.7, the acquisition stage of a first-order tracker with forgetting factor µ = 1/20 is simulated for the 2REC modulation. The Es/N0 is set to 60 dB in order to study the relevance of the self-noise term. Both the GML and the optimal second-order tracker are shown to acquire the parameter correctly at almost the same speed. On the other hand, the GML self-noise variance is apparent in the steady state. The associated S-curves are also depicted in Figs. 4.8-4.10. It can be seen that all of them cross the origin and have unitary slope there.
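The closed-loop acquisition mechanism can be illustrated with a deliberately simple toy: a first-order loop driven by a phase-increment discriminator (not the optimal second-order discriminator of this chapter) acquiring the frequency of a noisy complex exponential. The constants below are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(5)
nu_o, mu, N = 0.12, 0.05, 400                  # true offset, step-size, iterations
snr_lin = 100                                   # roughly 20 dB per sample

n = np.arange(N + 1)
y = np.exp(1j * 2 * np.pi * nu_o * n)
y += (rng.standard_normal(N + 1) + 1j * rng.standard_normal(N + 1)) / np.sqrt(2 * snr_lin)

nu_hat = 0.0                                    # acquisition from a cold start
for k in range(1, N + 1):
    # Phase-increment discriminator after de-rotating with the current estimate;
    # its S-curve crosses the origin with positive (unit) slope, enabling acquisition
    e = np.angle(y[k] * np.conj(y[k - 1]) * np.exp(-1j * 2 * np.pi * nu_hat)) / (2 * np.pi)
    nu_hat += mu * e                            # first-order loop filter
print(nu_hat)                                   # close to nu_o after convergence
```

The loop error decays geometrically with factor (1 − µ), so a few time constants of 1/µ iterations suffice to reach the small-error regime.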
Figure 4.4: Estimators variance as a function of the Es/No for the 2REC modulation and M = 8, Nss = 2. The number of pseudo-symbols is equal to K = 12.
Figure 4.5: Estimators variance as a function of the Es/No for the 3REC modulation and M = 8, Nss = 2. The number of pseudo-symbols is equal to K = 28.
Figure 4.6: Estimators variance as a function of the Es/No for the 2REC modulation and M = 4, Nss = 4. The number of pseudo-symbols is equal to K = 12.
Figure 4.7: Frequency tracker output as a function of time for the 2REC modulation in a high SNR scenario (Es/No = 60 dB). The true frequency offset is equal to νo = 0.4 (GML) and νo = 0.5 (BQUE). Both trackers are initialized at ν̂ = 0 with M = 8, Nss = 2. A first-order closed loop is implemented with µ = 0.02 the selected step-size or forgetting factor.
Figure 4.8: S-curve of the optimum and ML-based discriminators for the MSK modulation with M = 8, Nss = 2, Es/No = 10 dB. The dashed arrow points out the tendency of the GML and BQUE S-curves as the Es/No is augmented from Es/No = 0 (low-SNR UML S-curve) to Es/No = ∞ (CML S-curve).
Figure 4.9: S-curve of the optimum and ML-based discriminators for the 2REC modulation with M = 8, Nss = 2, Es/No = 10 dB.
Figure 4.10: S-curve of the optimum and ML-based discriminators for the 3REC modulation with M = 8, Nss = 2, Es/No = 10 dB.
4.6 Conclusions
The limitations of second-order feedforward methods in nonlinear estimation problems motivated the design of closed-loop estimators for the so-called small-error regime. Generally, second-order estimators are able to yield unbiased and self-noise-free estimates once the small-error regime is attained after the acquisition. This important result is verified, whatever the considered estimation problem, if all the parameters are (locally) identifiable. Focusing only on those identifiable parameters, the prior distribution becomes irrelevant under the small-error assumption. Therefore, it can be stated that Bayesian estimators never outperform deterministic estimators in the small-error regime.
In this chapter, the Best Quadratic Unbiased Estimator (BQUE) is formulated considering the true distribution of the nuisance parameters. The BQUE expression is obtained analytically by expanding the constrained minimum variance solution in Chapter 3 in a Taylor series in the neighbourhood of the true parameter, where the small-error condition is satisfied. The resulting estimator is "the best" in the sense that there does not exist any other unbiased second-order estimator yielding a lower variance. Consequently, the BQUE performance constitutes the tightest lower bound on the variance of any second-order unbiased blind estimator. Besides, it can be interpreted as the particularization of the CRB theory to second-order estimation.
The optimal second-order estimator is proved to depend on the fourth-order cumulants of the
nuisance parameters. In some estimation problems, this fourth-order information becomes important to cope with the self-noise disturbance at high SNR. On the other hand, this information
is omitted when the Gaussian assumption is adopted. In this chapter, the frequency estimation
problem is studied concluding that the Gaussian assumption is practically optimal when we
deal with a linear constellation. However, other simulations have shown that the non-Gaussian
information about the nuisance parameters is needed to remove the self-noise at high SNR if
the number of nuisance parameters exceeds the number of observations and a partial-response
CPM transmission is considered. Some other illustrative examples will be studied in Chapter 6
in which the Gaussian assumption is questioned.
Finally, in the context of multiuser communications, the estimator performance is seriously affected by the so-called multiple access interference (MAI). The original BQUE solution is forced to eliminate the MAI contribution and, for this reason, it suffers from a significant noise enhancement in noisy scenarios. Thus, it is preferable to include the MAI term in the estimator optimization in order to make an optimal trade-off among the three disturbing random terms: thermal noise, self-noise and MAI. The obtained MAI-resistant BQUE estimator is further evaluated in Section 6.5 for the problem of direction-of-arrival estimation in cellular communication systems.
Appendix 4.A Small-error matrices
Let us define S(θ) and Q̃(θ) as the arguments inside the brackets of (3.24) and (3.23):

$$\mathbf{S}(\boldsymbol{\theta}) \triangleq \left(\mathbf{r}(\boldsymbol{\theta})-\mathbf{r}\right)\left(\mathbf{g}(\boldsymbol{\theta})-\mathbf{g}\right)^H$$
$$\tilde{\mathbf{Q}}(\boldsymbol{\theta}) \triangleq \left(\mathbf{r}(\boldsymbol{\theta})-\mathbf{r}\right)\left(\mathbf{r}(\boldsymbol{\theta})-\mathbf{r}\right)^H.$$

Regarding the matrix S(θ), it is easy to show that

$$\mathbf{S}(\boldsymbol{\theta}_o) = \mathbf{0}$$
$$\left.\frac{\partial^2\mathbf{S}(\boldsymbol{\theta})}{\partial\theta_p\,\partial\theta_q}\right|_{\boldsymbol{\theta}=\boldsymbol{\theta}_o} = \left.\frac{\partial\mathbf{r}(\boldsymbol{\theta})}{\partial\theta_p}\right|_{\boldsymbol{\theta}=\boldsymbol{\theta}_o}\!\left(\left.\frac{\partial\mathbf{g}(\boldsymbol{\theta})}{\partial\theta_q}\right|_{\boldsymbol{\theta}=\boldsymbol{\theta}_o}\right)^{\!H} + \left.\frac{\partial\mathbf{r}(\boldsymbol{\theta})}{\partial\theta_q}\right|_{\boldsymbol{\theta}=\boldsymbol{\theta}_o}\!\left(\left.\frac{\partial\mathbf{g}(\boldsymbol{\theta})}{\partial\theta_p}\right|_{\boldsymbol{\theta}=\boldsymbol{\theta}_o}\right)^{\!H} = [\mathbf{D}_r]_p[\mathbf{D}_g]_q^H + [\mathbf{D}_r]_q[\mathbf{D}_g]_p^H,$$

since the pair of terms depending on r(θ) − ro and g(θ) − g vanish at θ = θo.
Then, equation (4.6) is obtained after plugging into (4.1) the following term:

$$\sum_{p,q=1}^{P}\left.\frac{\partial^2\mathbf{S}(\boldsymbol{\theta})}{\partial\theta_p\,\partial\theta_q}\right|_{\boldsymbol{\theta}=\boldsymbol{\theta}_o}[\mathbf{C}_\theta]_{p,q} = \sum_{p,q=1}^{P}[\mathbf{D}_r]_p[\mathbf{D}_g]_q^H[\mathbf{C}_\theta]_{p,q} + \sum_{p,q=1}^{P}[\mathbf{D}_r]_q[\mathbf{D}_g]_p^H[\mathbf{C}_\theta]_{p,q} = \mathbf{D}_r\mathbf{C}_\theta\mathbf{D}_g^H + \mathbf{D}_r\mathbf{C}_\theta^T\mathbf{D}_g^H = 2\,\mathbf{D}_r\,\mathrm{Re}\{\mathbf{C}_\theta\}\,\mathbf{D}_g^H. \qquad (4.19)$$

Proceeding in the same way with the matrix Q̃(θ), it is found that

$$\tilde{\mathbf{Q}}(\boldsymbol{\theta}_o) = \mathbf{0}$$
$$\left.\frac{\partial^2\tilde{\mathbf{Q}}(\boldsymbol{\theta})}{\partial\theta_p\,\partial\theta_q}\right|_{\boldsymbol{\theta}=\boldsymbol{\theta}_o} = [\mathbf{D}_r]_p[\mathbf{D}_r]_q^H + [\mathbf{D}_r]_q[\mathbf{D}_r]_p^H.$$

Then, equation (4.5) is deduced after plugging into (4.1) the following expression:

$$\sum_{p,q=1}^{P}\left.\frac{\partial^2\tilde{\mathbf{Q}}(\boldsymbol{\theta})}{\partial\theta_p\,\partial\theta_q}\right|_{\boldsymbol{\theta}=\boldsymbol{\theta}_o}[\mathbf{C}_\theta]_{p,q} = \sum_{p,q=1}^{P}[\mathbf{D}_r]_p[\mathbf{D}_r]_q^H[\mathbf{C}_\theta]_{p,q} + \sum_{p,q=1}^{P}[\mathbf{D}_r]_q[\mathbf{D}_r]_p^H[\mathbf{C}_\theta]_{p,q} = \mathbf{D}_r\mathbf{C}_\theta\mathbf{D}_r^H + \mathbf{D}_r\mathbf{C}_\theta^T\mathbf{D}_r^H = 2\,\mathbf{D}_r\,\mathrm{Re}\{\mathbf{C}_\theta\}\,\mathbf{D}_r^H. \qquad (4.20)$$
Finally, the real operator in (4.19) and (4.20) can be omitted taking into account that the
vector of parameters is actually real-valued throughout this dissertation.
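The algebraic identity behind (4.19), i.e., the double sum collapsing to 2 Dr Re{Cθ} Dg^H for a real symmetric Cθ, can be checked numerically with synthetic stand-in matrices:

```python
import numpy as np

rng = np.random.default_rng(6)
M2, Q, P = 10, 2, 3

Dr = rng.standard_normal((M2, P)) + 1j * rng.standard_normal((M2, P))
Dg = rng.standard_normal((Q, P))
C = rng.standard_normal((P, P)); C = C @ C.T        # real symmetric prior covariance

# Element-wise double sum as it appears in (4.19)
S_sum = np.zeros((M2, Q), dtype=complex)
for p in range(P):
    for q in range(P):
        S_sum += (np.outer(Dr[:, p], Dg[:, q].conj())
                  + np.outer(Dr[:, q], Dg[:, p].conj())) * C[p, q]

# Closed form: Dr C Dg^H + Dr C^T Dg^H = 2 Dr Re{C_theta} Dg^H
closed = 2 * Dr @ C @ Dg.conj().T
print(np.max(np.abs(S_sum - closed)))               # ~0 up to rounding
```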
CHAPTER 4. OPTIMAL SECOND-ORDER SMALL-ERROR ESTIMATION
106
Appendix 4.B Proof of bias cancellation
If the Taylor expansion of α (θ) and the target response g (θ) are calculated around θ = θo , it
is found that
∂α(θ) α (θ) g (θo ) +
(θ − θo ) = g (θo ) +MH Dr (θ − θo )
∂θT θ=θ o
∂g (θ) (θ − θo ) = g (θo ) +Dg (θ − θo )
g (θ) g (θo ) +
∂θT θ=θ o
(4.21)
with Dr and Dg defined in (4.8) and (4.9), respectively. Therefore, if (4.21) is plugged into
(3.14), it follows that
BIAS 2 = Eθ α (θ) − g (θ)2 = Tr
H MH Dr − Dg Cθ MH Dr − Dg
where Cθ is the prior covariance matrix introduced in (4.2). Therefore, it follows that MH Dr =
Dg is a necessary and sufficient condition to ensure that BIAS 2 = 0 in the small-error regime
if Cθ is a full-rank matrix.
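The closed-form BIAS² expression can be validated by Monte Carlo with synthetic matrices; the weighting matrix Mw below is arbitrary, so the mismatch M^H Dr − Dg is non-zero and the bias term is strictly positive.

```python
import numpy as np

rng = np.random.default_rng(7)
M2, P, Q = 8, 3, 2

Dr = rng.standard_normal((M2, P)) + 1j * rng.standard_normal((M2, P))
Dg = rng.standard_normal((Q, P))
Mw = rng.standard_normal((M2, Q)) + 1j * rng.standard_normal((M2, Q))
G = Mw.conj().T @ Dr - Dg                    # mismatch M^H Dr - Dg in (4.21)

Ct = rng.standard_normal((P, P)); Ct = Ct @ Ct.T / P   # prior covariance C_theta
Lc = np.linalg.cholesky(Ct)

# Monte Carlo of BIAS^2 = E || (M^H Dr - Dg)(theta - theta_o) ||^2
d = Lc @ rng.standard_normal((P, 50000))
bias2_mc = np.mean(np.sum(np.abs(G @ d) ** 2, axis=0))

bias2_cf = np.real(np.trace(G @ Ct @ G.conj().T))      # closed form above
print(bias2_mc, bias2_cf)                              # the two agree
```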
Chapter 5

Quadratic Extended Kalman Filtering
As explained in Section 2.5.2, a tracker is a closed-loop estimator that is able to follow the variations of the parameters of interest. To do so, the tracker is composed of a discriminator and a loop filter (see Fig. 2.4). In that scheme, the discriminator is designed to deliver unbiased estimates that are further integrated by the loop filter according to the known parameter dynamics.
In Chapter 4, the optimal second-order discriminator was formulated by minimizing the
steady-state variance subject to the unbiasedness constraint. In this optimization, it was assumed
that the small-error condition is satisfied in the steady-state. Implicitly, this assumption means
that all the parameters have been initially acquired and the tracker is following accurately
their temporal evolution. However, the tracker optimization was carried out without taking
into account the acquisition and tracking performance. For this reason, the loop filter was not
involved in the design.
Alternatively, the Kalman filter is designed considering globally both the acquisition and
steady-state performance. In the Kalman filter theory, the parameter is modelled as a random
variable of known statistics [And79][Kay93b, Ch. 13]. The Kalman filter, which is linear in
the observed data, is known to be the optimal tracker if the parameters and observations are
Gaussian random variables of known a priori mean and variance. In that case, the optimality
of the Kalman filter means that it provides minimum variance unbiased estimates in the steady-state as well as minimum MSE estimates during the acquisition.
From the results in Chapter 4, the prior distribution about the parameters is useless once the
small-error regime is attained. However, this information is very relevant during the acquisition,
that is, in the large-error regime. The Kalman filter is considered in this thesis because it
performs a gradual transition from the large-error regime in Chapter 3 to the small-error regime
in Chapter 4 as the observation length increases. As stated before, this transition is optimal if
and only if all the random variables are Gaussian distributed.
Unfortunately, the Gaussian condition is quite restrictive because it implies linear models for
the observation as well as for the parameter dynamics. Otherwise, the observation and dynamics
equations have to be linearized in order to derive the so-called Extended Kalman filter (EKF)
[And79][Kay93b, Sec. 13.7]. It can be shown that the EKF is only the best linear tracker in the steady-state, independently of the parameter and observation distributions. This statement is
verified because, whatever the parameterization and dynamical model at hand, the observation
and dynamics equations are always linear in the vector of parameters if these equations are
approximated around the true value of the parameters (small-error assumption). On the other
hand, nothing can be stated about the EKF optimality during the acquisition stage (large-error
regime), which is actually uncertain.
In the context of blind parameter estimation, second-order methods are mandatory because
the observation is zero mean. Thus, the EKF is extended in this chapter to deal with quadratic
observation models. The result is the so-called Quadratic Extended Kalman Filter (QEKF)
that constitutes an alternative derivation of the optimal second-order tracker studied in Chapter 4. The main advantage is that the QEKF automatically adjusts its response during the acquisition phase in order to speed up the tracker convergence without altering the (optimal) steady-state solution. In Chapter 4, on the other hand, the tracker response was specifically designed for the steady-state (small-error regime) and was kept fixed during the whole operation time. Therefore, the QEKF can be seen as a time-variant quadratic tracker that automatically
adjusts the loop bandwidth depending on the current uncertainty on the parameters (Section
2.5.2). Thus, the QEKF bandwidth is progressively decreased during the acquisition time and
is finally “frozen” in the steady-state. Another important feature is that, assuming a successful
acquisition, the QEKF provides a recursive low-cost implementation of the minimum variance
unbiased estimator when the observation time increases indefinitely and the parameters remain
stationary.
The main criticism about the EKF/QEKF tracker is that the acquisition cannot be guaranteed. Indeed, even in the noiseless case, the linearized model assumed in the EKF/QEKF formulation is not correct when the tracker operates out of the small-error regime, e.g., during the acquisition. To overcome this drawback, the Unscented Kalman Filter (UKF) was proposed in [Jul97][Wan00]. The UKF applies the actual nonlinear observation model to correctly propagate the mean as well as the covariance of the Gaussian parameter. The important point is that the convergence of the UKF is guaranteed under some mild conditions [Mer00, Sec. 5]. Implicitly, the UKF still assumes Gaussian parameters. For other statistical distributions, sequential Monte Carlo estimators —also named particle filters— can be applied
[Mer00][Mer01]. The complexity of these methods is usually much greater than that of the well-known EKF. In any case, the UKF and other particle filters were not considered in this thesis because they are actually higher-order techniques in which the observed samples are plugged into nonlinear posterior distributions to approximate the MMSE estimator, i.e., $\widehat{\boldsymbol{\theta}} = E\{\boldsymbol{\theta}/\mathbf{y}\}$. A tutorial article on the UKF and related sequential Monte Carlo methods is provided in [Mer03].
Finally, the QEKF is deduced and evaluated in the context of DOA estimation and tracking.
The Gaussian assumption on the nuisance parameters is tested once more showing the significant
improvement in terms of acquisition time as well as steady-state variance when the received
signals are digitally modulated and this information is correctly exploited.
The results in this section were presented in the 3rd IEEE Sensor Array and Multichannel
Signal Processing Workshop that was held in Barcelona in 2004 [Vil04a]:
• “On the Quadratic Extended Kalman Filter”, J. Villares, G. Vázquez. Proc. of the Third
IEEE Sensor Array and Multichannel Signal Processing Workshop (SAM 2004). July
2004. Sitges, Barcelona (Spain).
5.1
Signal Model
Let us consider a time-variant scenario in which the observed vector at time n is given by
$$\mathbf{y}_n = \mathbf{A}(\boldsymbol{\theta}_n)\,\mathbf{x}_n + \mathbf{w}_n \qquad n = 1, 2, 3, \ldots \qquad (5.1)$$
where the transfer matrix A (θn ) is known except for a vector of P real-valued parameters θn , xn
is the vector of K unknown zero-mean inputs and wn is the vector of Gaussian noise samples.
The covariance matrices of wn and xn are given by

$$E\left\{\mathbf{w}_n\mathbf{w}_{n+k}^H\right\} = \mathbf{R}_w\,\delta(k) \qquad\qquad E\left\{\mathbf{x}_n\mathbf{x}_{n+k}^H\right\} = \mathbf{I}_K\,\delta(k),$$
respectively. Therefore, we are assuming that the noise and the nuisance parameters are uncorrelated in the time domain.
In order to track the parameter evolution in time, the estimator is provided with the following
dynamical model or state equation [And79][Kay93b, Ch.13]:
$$\boldsymbol{\theta}_n = \mathbf{f}(\boldsymbol{\theta}_{n-1}) + \mathbf{u}_n \qquad (5.2)$$

where un is a zero-mean random variable of known covariance matrix $\mathbf{R}_u \triangleq E\{\mathbf{u}_n\mathbf{u}_n^H\}$ modeling the uncertainty about the assumed model. The initial state θ0 is also a random variable of known
mean $\boldsymbol{\mu}_0 \triangleq E\{\boldsymbol{\theta}_0\}$ and covariance matrix $\boldsymbol{\Sigma}_{0/0} \triangleq E\{\boldsymbol{\theta}_0\boldsymbol{\theta}_0^H\}$. These two quantities summarize all the available prior information about the parameter θn.
It is important to note that consistent estimates cannot be obtained using linear schemes because the observation yn is zero-mean. Thus, blind estimation imposes the need for second-order techniques, which are known to be optimal for low-SNR and/or Gaussian data (Section 2.4.1 and Section 2.4.3). Accordingly, the following quadratic measurement equation is considered:
$$\mathbf{r}_n \triangleq \mathrm{vec}\left(\mathbf{y}_n\mathbf{y}_n^H\right) = \mathbf{h}(\boldsymbol{\theta}_n) + \mathbf{v}_n(\boldsymbol{\theta}_n) \qquad (5.3)$$

where

$$\mathbf{h}(\boldsymbol{\theta}_n) \triangleq E\{\mathbf{r}_n\} = \mathrm{vec}\left(\mathbf{A}(\boldsymbol{\theta}_n)\mathbf{A}^H(\boldsymbol{\theta}_n) + \mathbf{R}_w\right)$$
$$\mathbf{v}_n(\boldsymbol{\theta}_n) \triangleq \mathbf{r}_n - E\{\mathbf{r}_n\} = \mathrm{vec}\left(\mathbf{A}(\boldsymbol{\theta}_n)\left(\mathbf{x}_n\mathbf{x}_n^H - \mathbf{I}_K\right)\mathbf{A}^H(\boldsymbol{\theta}_n) + \mathbf{A}(\boldsymbol{\theta}_n)\mathbf{x}_n\mathbf{w}_n^H + \mathbf{w}_n\mathbf{x}_n^H\mathbf{A}^H(\boldsymbol{\theta}_n) + \mathbf{w}_n\mathbf{w}_n^H - \mathbf{R}_w\right)$$
are the signal and noise components of the measurement equation, respectively. Notice that
the observation noise vn (θn ) is zero-mean and it depends on the wanted parameters in the
considered quadratic model.
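The zero-mean property of vn can be checked numerically. The following Monte Carlo sketch (NumPy; the dimensions M = 3, K = 2, the QPSK inputs and the noise level are illustrative choices, not taken from the text) verifies that the sample covariance vector rn indeed has mean h(θn) = vec(A A^H + Rw):

```python
import numpy as np

rng = np.random.default_rng(0)
M, K, N = 3, 2, 200_000                     # illustrative dimensions and trial count

# a fixed (known) transfer matrix A and white noise covariance Rw
A = rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))
sigma2 = 0.1
Rw = sigma2 * np.eye(M)

# QPSK inputs: zero mean, E{x x^H} = I_K; circular Gaussian noise
x = rng.choice(np.array([1, 1j, -1, -1j]), size=(K, N))
w = np.sqrt(sigma2 / 2) * (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N)))

y = A @ x + w                               # observation model (5.1)
R_hat = (y @ y.conj().T) / N                # averaged sample covariance matrix
h = (A @ A.conj().T + Rw).ravel(order="F")  # h(theta) = vec(A A^H + Rw)

err = np.max(np.abs(R_hat.ravel(order="F") - h))
print(err)                                  # O(1/sqrt(N)): v_n is zero mean
```

The residual decays as O(1/√N), which is consistent with vn being a zero-mean disturbance.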
5.2
Background and Notation
Following the classical notation in [And79], an/m will denote the linear MMSE estimate of a
given random vector an based on the quadratic observations r1 , . . . , rm . This means that an/m
is an affine transformation of the sample covariance vectors r1 , . . . , rm in (5.3) or, equivalently,
a quadratic transformation of the input data y1 , . . . , ym (5.1).
It is well-known that the MMSE estimator $E\{\mathbf{a}_n/\mathbf{r}_1,\ldots,\mathbf{r}_m\}$ is linear in r1 , . . . , rm if and only if an and r1 , . . . , rm are jointly Gaussian distributed. However, the Gaussian assumption is not satisfied most of the time. In that case, it is convenient to introduce the following notation

$$\widehat{\mathbf{a}}_{n/m} = E_L\{\mathbf{a}_n/\mathbf{r}_1,\ldots,\mathbf{r}_m\}$$

to refer to the linear MMSE estimator $\widehat{\mathbf{a}}_{n/m}$, bearing in mind that $E_L\{\mathbf{a}_n/\mathbf{r}_1,\ldots,\mathbf{r}_m\} = E\{\mathbf{a}_n/\mathbf{r}_1,\ldots,\mathbf{r}_m\}$ in the Gaussian case [And79, Sec. 5.2].
The Kalman filter can be seen as a sequential implementation of the linear MMSE estimator of θn that, using the notation above, is given by

$$\widehat{\boldsymbol{\theta}}_{n/n} = E_L\{\boldsymbol{\theta}_n/\mathbf{r}_1,\ldots,\mathbf{r}_n\}.$$

From a complexity point of view, the sequential computation of $\widehat{\boldsymbol{\theta}}_{n/n}$ is unavoidable as the number of observations grows (n → ∞).
The Kalman filter recursion is based on two facts:
• The orthogonalization (decorrelation) of the original observations r1 , . . . , rn using the Gram-Schmidt method¹. The transformed observations are the so-called innovations $\widetilde{\mathbf{r}}_1, \ldots, \widetilde{\mathbf{r}}_n$ [And79][Kay93b][Hay91], which are computed as

$$\widetilde{\mathbf{r}}_n = \mathbf{r}_n - \widehat{\mathbf{r}}_{n/n-1} \qquad (5.4)$$

with

$$\widehat{\mathbf{r}}_{n/n-1} = E_L\{\mathbf{r}_n/\mathbf{r}_1,\ldots,\mathbf{r}_{n-1}\}$$

the linear MMSE prediction of rn based on the past observations r1 , . . . , rn−1 . Thus, the innovation $\widetilde{\mathbf{r}}_n$ supplies the new information contained in the observation rn or, in other words, it yields the unpredictable component of rn . It can be shown that the innovation $\widetilde{\mathbf{r}}_n$ is zero-mean and uncorrelated with both $\mathbf{r}_m$ and $\widetilde{\mathbf{r}}_m$ for any m ≠ n. Using this property, it is easy to show that

$$\widehat{\boldsymbol{\theta}}_{n/n} = E_L\{\boldsymbol{\theta}_n/\mathbf{r}_1,\ldots,\mathbf{r}_n\} = E_L\{\boldsymbol{\theta}_n/\widetilde{\mathbf{r}}_1,\ldots,\widetilde{\mathbf{r}}_n\} = \sum_{k=1}^{n} E_L\{\boldsymbol{\theta}_n/\widetilde{\mathbf{r}}_k\}$$
$$= E_L\{\boldsymbol{\theta}_n/\mathbf{r}_1,\ldots,\mathbf{r}_{n-1},\widetilde{\mathbf{r}}_n\} = \widehat{\boldsymbol{\theta}}_{n/n-1} + E_L\{\boldsymbol{\theta}_n/\widetilde{\mathbf{r}}_n\} = \widehat{\boldsymbol{\theta}}_{n/n-1} + E_L\{\widetilde{\boldsymbol{\theta}}_n/\widetilde{\mathbf{r}}_n\}, \qquad (5.5)$$

where

$$\widehat{\boldsymbol{\theta}}_{n/n-1} = E_L\{\boldsymbol{\theta}_n/\mathbf{r}_1,\ldots,\mathbf{r}_{n-1}\} \qquad\qquad \widetilde{\boldsymbol{\theta}}_n \triangleq \boldsymbol{\theta}_n - \widehat{\boldsymbol{\theta}}_{n/n-1} \qquad (5.6)$$

are the linear MMSE prediction of θn —based on the past observations r1 , . . . , rn−1 — and the resulting prediction error, respectively. It can be shown that both $\widehat{\boldsymbol{\theta}}_{n/n-1}$ and $\widetilde{\boldsymbol{\theta}}_n$ are zero mean, and that $\widehat{\boldsymbol{\theta}}_{n/n-1}$ is uncorrelated with the innovation $\widetilde{\mathbf{r}}_n$. In fact, this property has been applied to obtain the final expression in (5.5), considering that $E_L\{\widehat{\boldsymbol{\theta}}_{n/n-1}/\widetilde{\mathbf{r}}_n\} = 0$.
• The existence of a linear state equation (5.2) as well as a linear measurement equation (5.3). When this is possible, $\widehat{\boldsymbol{\theta}}_{n/n}$ (5.5) can be obtained from $\widehat{\boldsymbol{\theta}}_{n-1/n-1}$ (i.e., the previous estimate) and $\widetilde{\mathbf{r}}_n$ (i.e., the new datum), bearing in mind that

$$E_L\{\mathbf{M}\mathbf{a}_n/\mathbf{r}_1,\ldots,\mathbf{r}_n\} = \mathbf{M}\,E_L\{\mathbf{a}_n/\mathbf{r}_1,\ldots,\mathbf{r}_n\}. \qquad (5.7)$$

Unfortunately, the state and measurement equations in (5.2)-(5.3) are generally nonlinear in the parameters of interest. Consequently, these two equations have to be linearized in order to apply the Kalman filter formulation. This matter is addressed in the next section.
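The role of the innovations can be illustrated with a scalar toy example (a hypothetical first-order Gauss-Markov state with a = 0.95, σu² = 0.1, σv² = 1; the values are illustrative, not taken from the text). A linear Kalman filter produces innovations that are empirically uncorrelated in time, whereas the raw observations remain strongly correlated:

```python
import numpy as np

rng = np.random.default_rng(1)
a, su2, sv2, N = 0.95, 0.1, 1.0, 50_000     # illustrative AR(1) dynamics and noise

# simulate the scalar state theta_n = a*theta_{n-1} + u_n and r_n = theta_n + v_n
theta = np.zeros(N)
for n in range(1, N):
    theta[n] = a * theta[n - 1] + np.sqrt(su2) * rng.standard_normal()
r = theta + np.sqrt(sv2) * rng.standard_normal(N)

# scalar Kalman filter; the innovation is r_n minus its one-step prediction, as in (5.4)
est, P = 0.0, 10.0
innov = np.empty(N)
for n in range(N):
    pred, Ppred = a * est, a * a * P + su2   # prediction and its MSE
    innov[n] = r[n] - pred                   # innovation
    gain = Ppred / (Ppred + sv2)             # Kalman gain
    est, P = pred + gain * innov[n], (1.0 - gain) * Ppred

rho_obs = np.corrcoef(r[:-1], r[1:])[0, 1]                 # raw observations: correlated
rho_inn = np.corrcoef(innov[1000:-1], innov[1001:])[0, 1]  # innovations: decorrelated
print(rho_obs, abs(rho_inn))
```

The first correlation coefficient is large (the state is slowly varying) while the second is statistically indistinguishable from zero, as expected from the Gram-Schmidt interpretation above.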
¹ Although xn and wn are uncorrelated in Sec. 5.1, the observations r1 , . . . , rn are correlated because they depend on the random parameters θ1 , . . . , θn , which are correlated in the assumed dynamical model (5.2).
5.3
Linearized Signal Model
In order to have linear state and measurement equations, the original nonlinear equations (5.2)-(5.3) are expanded in a first-order Taylor series at the points $\boldsymbol{\theta}_{n-1} = \widehat{\boldsymbol{\theta}}_{n-1/n-1}$ and $\boldsymbol{\theta}_n = \widehat{\boldsymbol{\theta}}_{n/n-1}$, respectively. These points are selected because $\widehat{\boldsymbol{\theta}}_{n-1/n-1}$ and $\widehat{\boldsymbol{\theta}}_{n/n-1}$ are the linear MMSE estimates of θn−1 and θn before the new datum rn is processed. By definition, $\widehat{\boldsymbol{\theta}}_{n-1/n-1}$ and $\widehat{\boldsymbol{\theta}}_{n/n-1}$ are given by

$$\widehat{\boldsymbol{\theta}}_{n-1/n-1} = E_L\{\boldsymbol{\theta}_{n-1}/\mathbf{r}_1,\ldots,\mathbf{r}_{n-1}\} \qquad\qquad \widehat{\boldsymbol{\theta}}_{n/n-1} = E_L\{\boldsymbol{\theta}_n/\mathbf{r}_1,\ldots,\mathbf{r}_{n-1}\}.$$
Thus, assuming that $\widehat{\boldsymbol{\theta}}_{n-1/n-1}$ and $\widehat{\boldsymbol{\theta}}_{n/n-1}$ were previously computed at time n − 1, the QEKF will be derived from the linearized state and quadratic measurement equations given next:

$$\boldsymbol{\theta}_n \approx \mathbf{f}\big(\widehat{\boldsymbol{\theta}}_{n-1/n-1}\big) + \mathbf{F}\big(\widehat{\boldsymbol{\theta}}_{n-1/n-1}\big)\big(\boldsymbol{\theta}_{n-1} - \widehat{\boldsymbol{\theta}}_{n-1/n-1}\big) + \mathbf{u}_n \qquad (5.8)$$
$$\mathbf{r}_n \approx \mathbf{h}\big(\widehat{\boldsymbol{\theta}}_{n/n-1}\big) + \mathbf{H}_n\big(\widehat{\boldsymbol{\theta}}_{n/n-1}\big)\big(\boldsymbol{\theta}_n - \widehat{\boldsymbol{\theta}}_{n/n-1}\big) + \mathbf{v}_n\big(\widehat{\boldsymbol{\theta}}_{n/n-1}\big) \qquad (5.9)$$

where F(θn−1) and Hn(θn) are the Jacobians of θn and rn, respectively, given by

$$\mathbf{F}(\boldsymbol{\theta}_{n-1}) \triangleq \frac{\partial \boldsymbol{\theta}_n}{\partial \boldsymbol{\theta}_{n-1}^T} = \frac{\partial \mathbf{f}(\boldsymbol{\theta}_{n-1})}{\partial \boldsymbol{\theta}_{n-1}^T} \qquad\qquad \mathbf{H}_n(\boldsymbol{\theta}_n) \triangleq \frac{\partial \mathbf{r}_n}{\partial \boldsymbol{\theta}_n^T} = \frac{\partial \mathbf{h}(\boldsymbol{\theta}_n)}{\partial \boldsymbol{\theta}_n^T} + \frac{\partial \mathbf{v}_n(\boldsymbol{\theta}_n)}{\partial \boldsymbol{\theta}_n^T}.$$

From the linearized state equation (5.8), the prediction $\widehat{\boldsymbol{\theta}}_{n/n-1}$ in (5.9) can be computed as

$$\widehat{\boldsymbol{\theta}}_{n/n-1} = \mathbf{f}\big(\widehat{\boldsymbol{\theta}}_{n-1/n-1}\big), \qquad (5.10)$$

using (5.7) and taking into account that the noise un is zero mean. On the other hand, the Jacobian Hn(θn) is calculated from (5.3), obtaining
Jacobian Hn (θn ) is calculated from (5.3), obtaining
%
H
∂A (θ)
H
H ∂A (θ)
xn xH
A
(θ)
+
A
(θ)
x
x
[Hn (θ)]p = vec
n n
n
∂θp
∂θp
&
H
∂A (θ)
∂A (θ)
+
xn wnH + wn xH
n
∂θp
∂θp
where θp stands for the p-th component of θ. Note that the transfer matrix Hn (θ) appearing
in (5.9) is noisy because it depends on the random terms xn and wn . This particularity is a
consequence of the original quadratic observation model (5.3).
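Since xn and wn are zero mean with E{xn xn^H} = IK, averaging [Hn(θ)]p over the nuisance parameters and the noise leaves only the deterministic term vec(∂A/∂θp A^H + A ∂A^H/∂θp). A Monte Carlo sketch (NumPy; the matrices A and dA standing in for A(θ) and its derivative are arbitrary illustrative choices, and the comparison is done on the un-vectorized M × M matrices for convenience):

```python
import numpy as np

rng = np.random.default_rng(5)
M, K, N = 3, 2, 200_000

A  = rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))
dA = rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))  # stand-in for dA/d(theta_p)

x = rng.choice(np.array([1, 1j, -1, -1j]), size=(K, N))   # QPSK, E{x x^H} = I_K
w = (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2)

# empirical mean of the random Jacobian column [H_n]_p over n (un-vectorized)
Cxx = (x @ x.conj().T) / N        # -> I_K
Cxw = (x @ w.conj().T) / N        # -> 0
Cwx = (w @ x.conj().T) / N        # -> 0
H_mean = dA @ Cxx @ A.conj().T + A @ Cxx @ dA.conj().T + dA @ Cxw + Cwx @ dA.conj().T

# deterministic limit: dA A^H + A dA^H
H_det = dA @ A.conj().T + A @ dA.conj().T
print(np.max(np.abs(H_mean - H_det)))   # O(1/sqrt(N))
```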
5.4
Quadratic Extended Kalman Filter (QEKF)
In this section, the Kalman filter is derived from the quadratic and linearized model introduced
in the last two sections. The resulting tracker is named the Quadratic Extended Kalman Filter (QEKF) because it corresponds to the so-called Extended Kalman Filter (EKF) [And79]
[Kay93b, Sec. 13.7] in case of having quadratic observations (5.3). The QEKF is thus obtained
from (5.5) after solving the second term as indicated now:
$$E_L\big\{\widetilde{\boldsymbol{\theta}}_n/\widetilde{\mathbf{r}}_n\big\} = \mathbf{M}_n^H\,\widetilde{\mathbf{r}}_n \qquad (5.11)$$

where Mn is the so-called Kalman gain matrix,

$$\mathbf{M}_n \triangleq \mathbf{Q}_n^{-1}\mathbf{S}_n \qquad (5.12)$$
$$\mathbf{S}_n \triangleq E\big\{\widetilde{\mathbf{r}}_n\widetilde{\boldsymbol{\theta}}_n^H\big\} \qquad (5.13)$$
$$\mathbf{Q}_n \triangleq E\big\{\widetilde{\mathbf{r}}_n\widetilde{\mathbf{r}}_n^H\big\} \qquad (5.14)$$
and E {·} stands for the expectation with respect to all the random terms inside the brackets,
namely θ0 , . . . , θn and r1 , . . . , rn .
The Kalman gain matrix (5.12) has been derived using the following well-known result [Kay93b, Eq. 12.6]:

$$E_L\{\mathbf{x}/\mathbf{y}\} = E\{\mathbf{x}\} + E\big\{\mathbf{x}\mathbf{y}^H\big\}\,E^{-1}\big\{\mathbf{y}\mathbf{y}^H\big\}\,\big(\mathbf{y} - E\{\mathbf{y}\}\big)$$

particularized for the zero-mean random vectors $\widetilde{\boldsymbol{\theta}}_n$ and $\widetilde{\mathbf{r}}_n$ introduced in (5.6) and (5.4), respectively. This abbreviated deduction of the extended Kalman filter [Kay93b, App. 13.B] is based on the following two important equations:

$$E\{\widetilde{\mathbf{r}}_n\} = E_{\mathbf{r}_1,\ldots,\mathbf{r}_{n-1}}\big\{E\{\widetilde{\mathbf{r}}_n/\mathbf{r}_1,\ldots,\mathbf{r}_{n-1}\}\big\} = \mathbf{0}$$
$$E\{\widetilde{\boldsymbol{\theta}}_n\} = E_{\mathbf{r}_1,\ldots,\mathbf{r}_{n-1}}\big\{E\{\widetilde{\boldsymbol{\theta}}_n/\mathbf{r}_1,\ldots,\mathbf{r}_{n-1}\}\big\} = \mathbf{0}$$

since $E\{\widetilde{\mathbf{r}}_n/\mathbf{r}_1,\ldots,\mathbf{r}_{n-1}\}$ and $E\{\widetilde{\boldsymbol{\theta}}_n/\mathbf{r}_1,\ldots,\mathbf{r}_{n-1}\}$ are strictly zero in view of their definitions in (5.4) and (5.6), respectively.
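The affine structure of EL{x/y} can be verified numerically: computed with sample moments, the formula above coincides with the ordinary least-squares affine fit. A scalar sketch with an arbitrary non-Gaussian pair (illustrative, not taken from the text):

```python
import numpy as np

rng = np.random.default_rng(2)
N = 100_000
# a deliberately non-Gaussian pair (x, y)
y = rng.uniform(-1.0, 1.0, N)
x = 2.0 * y + 0.5 * y**2 + 0.1 * rng.standard_normal(N)

# EL{x/y} = E{x} + Cxy Cyy^{-1} (y - E{y}), evaluated with sample moments
mx, my = x.mean(), y.mean()
cxy = np.mean((x - mx) * (y - my))
cyy = np.var(y)
x_lmmse = mx + (cxy / cyy) * (y - my)

# the least-squares affine fit yields the same estimator
slope, intercept = np.polyfit(y, x, 1)
x_ls = slope * y + intercept

print(np.max(np.abs(x_lmmse - x_ls)))   # numerically zero
```

The two affine estimators agree to machine precision; note that neither equals the true conditional mean E{x/y}, which here contains a quadratic term.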
Therefore, plugging (5.10) and (5.11) into (5.5), we obtain the QEKF recursion:

$$\widehat{\boldsymbol{\theta}}_{n/n} = \mathbf{f}\big(\widehat{\boldsymbol{\theta}}_{n-1/n-1}\big) + \mathbf{S}_n^H\mathbf{Q}_n^{-1}\big(\mathbf{r}_n - \widehat{\mathbf{r}}_{n/n-1}\big) \qquad (5.15)$$

where

$$\widehat{\mathbf{r}}_{n/n-1} = \mathbf{h}\big(\widehat{\boldsymbol{\theta}}_{n/n-1}\big) = \mathbf{h}\big(\mathbf{f}\big(\widehat{\boldsymbol{\theta}}_{n-1/n-1}\big)\big)$$

is obtained from (5.9) and (5.10).
5.4.1
Another QEKF derivation
Thus far, the classical formulation of the QEKF has been sketched, introducing some simplifications.
In this section, a simpler derivation of the QEKF is proposed based on the general formulation
in Section 3.2. In fact, the solution in (5.15) is obtained by considering a generic second-order estimator,

$$\widehat{\boldsymbol{\theta}}_{n/n} = \mathbf{b}_n + \mathbf{M}_n^H\mathbf{r}_n,$$

and solving the following optimization problem:

$$(\mathbf{b}_n, \mathbf{M}_n) = \arg\min_{\mathbf{b},\mathbf{M}}\; E\left\{\left\|\mathbf{b} + \mathbf{M}^H\mathbf{r}_n - \boldsymbol{\theta}_n\right\|^2 / \mathbf{r}_1, \ldots, \mathbf{r}_{n-1}\right\}$$

where bn and Mn are the independent and quadratic components of the second-order MMSE estimator of θn, respectively. In (3.16), it was obtained that the optimal independent term is $\mathbf{b}_n = \widehat{\boldsymbol{\theta}}_{n/n-1} - \mathbf{M}_n^H\,\widehat{\mathbf{r}}_{n/n-1}$. On the other hand, the optimal quadratic term Mn was derived in (3.25), obtaining precisely the Kalman gain matrix in (5.12).
The conditional expectation in the last equation indicates that the random parameters are averaged by means of the prior distribution fθn/r1,...,rn−1 (θn /r1 , ..., rn−1 ), which gathers all the existing knowledge on θn before processing rn . In this way, the QEKF provides a means of updating the prior distribution every time a new datum is incorporated. In case the QEKF converges to the true parameter, the sequence of priors fθn/r1,...,rn−1 (θn /r1 , ..., rn−1 ) becomes progressively more informative until the small-error regime is attained (Chapter 4). Moreover, if the Gaussian assumption applies, the prior updating is optimal, thereby minimizing the acquisition time. This was precisely the motivation for considering the Kalman filter formulation in this thesis: the QEKF provides the transition from the MMSE large-error solution in Chapter 3 to the small-error BQUE solution in Chapter 4.
An evident connection is observed between (5.15) and the expression obtained for the optimal
second-order discriminator in (4.12). However, there are some important differences. First of
all, the so-called Kalman gain matrix Mn appearing in (5.15) includes both the discriminator
and the loop filter of a classical closed-loop implementation. Moreover, Mn is time-varying
and, therefore, the QEKF is able to adjust online the overall tracker response in view of the
instantaneous uncertainty about the parameters.
It can be shown that the QEKF and the closed-loop implementation in Chapter 4 become equivalent in the steady-state if they are arranged to have the same (noise equivalent) loop bandwidth. Formally, it is verified that

$$\lim_{n\to\infty} \mathbf{M}_n = \mathrm{diag}(\boldsymbol{\mu})\,\mathbf{M}$$

where M is the optimal second-order discriminator obtained in (4.12) and the vector of step sizes µ is determined by the state equation noise covariance matrix $\mathbf{R}_u \triangleq E\{\mathbf{u}_n\mathbf{u}_n^H\}$ in (5.2). The proof of this important statement would require solving the Riccati steady-state equation [And79, Ch. 4] and suggests an in-depth study that is still incomplete.
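The gain-freezing behaviour can be observed in the simplest scalar case (a sketch with illustrative variances σu² = 10⁻³ and σv² = 1, not taken from the text): iterating the Riccati recursion, the Kalman gain starts close to one (wide loop bandwidth during acquisition) and settles at a constant steady-state value:

```python
import numpy as np

su2, sv2 = 1e-3, 1.0      # illustrative state/measurement noise variances
P = 1000.0                # large initial uncertainty (acquisition)
gains = []
for _ in range(500):
    Ppred = P + su2                   # prediction MSE (random-walk state)
    Mn = Ppred / (Ppred + sv2)        # scalar Kalman gain
    P = (1.0 - Mn) * Ppred            # MSE (Riccati) update
    gains.append(Mn)

# the gain shrinks from ~1 and freezes at the Riccati fixed point
print(gains[0], gains[-1])
```

The frozen gain plays the role of the constant step size µ of the steady-state closed loop; larger σu² (faster dynamics) leads to a larger frozen gain, i.e., a wider loop bandwidth.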
5.4.2
Kalman gains recursion
The linearized model in Section 5.3 allows obtaining $\mathbf{M}_n = \mathbf{Q}_n^{-1}\mathbf{S}_n$ recursively. In this section, the QEKF deduction is completed by making this recursion explicit.

Let us study first the cross-correlation matrix Sn (5.13). It is easy to prove that

$$\mathbf{S}_n = \mathbf{D}_r\big(\widehat{\boldsymbol{\theta}}_{n/n-1}\big)\,\boldsymbol{\Sigma}_{n/n-1} = \mathbf{D}_r\big(\mathbf{f}\big(\widehat{\boldsymbol{\theta}}_{n-1/n-1}\big)\big)\,\boldsymbol{\Sigma}_{n/n-1}$$

where

$$\boldsymbol{\Sigma}_{n/n-1} \triangleq E\big\{\widetilde{\boldsymbol{\theta}}_n\widetilde{\boldsymbol{\theta}}_n^H/\mathbf{r}_1,\ldots,\mathbf{r}_{n-1}\big\} = \mathbf{F}\big(\widehat{\boldsymbol{\theta}}_{n-1/n-1}\big)\,\boldsymbol{\Sigma}_{n-1/n-1}\,\mathbf{F}^H\big(\widehat{\boldsymbol{\theta}}_{n-1/n-1}\big) + \mathbf{R}_u \qquad (5.16)$$

is the covariance matrix of the prediction error $\widetilde{\boldsymbol{\theta}}_n$ expressed as a function of the estimation MSE matrix² at time n − 1:

$$\boldsymbol{\Sigma}_{n-1/n-1} = E\left\{\big(\boldsymbol{\theta}_{n-1} - \widehat{\boldsymbol{\theta}}_{n-1/n-1}\big)\big(\boldsymbol{\theta}_{n-1} - \widehat{\boldsymbol{\theta}}_{n-1/n-1}\big)^H/\mathbf{r}_1,\ldots,\mathbf{r}_{n-1}\right\}.$$

The linear relationship between $\boldsymbol{\Sigma}_{n/n-1}$ and $\boldsymbol{\Sigma}_{n-1/n-1}$ is a consequence of the linearized state equation (5.8).
On the other hand, $\mathbf{D}_r(\boldsymbol{\theta}) = E\{\mathbf{H}_n(\boldsymbol{\theta})\}$ was introduced in Chapter 4 as the matrix collecting the covariance matrix derivatives, i.e.,

$$[\mathbf{D}_r(\boldsymbol{\theta})]_p = \mathrm{vec}\left(\frac{\partial \mathbf{A}(\boldsymbol{\theta})}{\partial \theta_p}\,\mathbf{A}^H(\boldsymbol{\theta}) + \mathbf{A}(\boldsymbol{\theta})\,\frac{\partial \mathbf{A}^H(\boldsymbol{\theta})}{\partial \theta_p}\right).$$
Likewise, the innovations covariance matrix $\mathbf{Q}_{n/n-1}$ can also be computed from the last estimate $\widehat{\boldsymbol{\theta}}_{n-1/n-1}$ and the associated MSE matrix $\boldsymbol{\Sigma}_{n-1/n-1}$. In the studied quadratic observation model, the deduction of $\mathbf{Q}_{n/n-1}$ is somewhat more involved because $\mathbf{H}_n(\boldsymbol{\theta})$ is random (noisy) and the measurement noise $\mathbf{v}_n\big(\widehat{\boldsymbol{\theta}}_{n/n-1}\big)$ depends on the parameterization. Omitting the dependence on $\widehat{\boldsymbol{\theta}}_{n/n-1} = \mathbf{f}\big(\widehat{\boldsymbol{\theta}}_{n-1/n-1}\big)$ for the sake of clarity, it follows that

$$\mathbf{Q}_{n/n-1} \triangleq E\big\{\widetilde{\mathbf{r}}_n\widetilde{\mathbf{r}}_n^H/\mathbf{r}_1,\ldots,\mathbf{r}_{n-1}\big\} = E\big\{\mathbf{H}_n\,\boldsymbol{\Sigma}_{n/n-1}\,\mathbf{H}_n^H\big\} + E\big\{\mathbf{v}_n\mathbf{v}_n^H\big\}.$$
Regarding now the second term, $E\{\mathbf{v}_n\mathbf{v}_n^H\}$ is the measurement noise covariance (Section 5.1). It is easy to realize that $E\{\mathbf{v}_n\mathbf{v}_n^H\}$ is the fourth-order matrix $\mathbf{Q}(\boldsymbol{\theta})$ introduced in (3.9) for $\boldsymbol{\theta} = \widehat{\boldsymbol{\theta}}_{n/n-1}$. In that chapter, a closed-form expression was deduced for $\mathbf{Q}(\boldsymbol{\theta})$ in equation (3.10), obtaining

$$\mathbf{Q}(\boldsymbol{\theta}) = \mathbf{R}^*(\boldsymbol{\theta}) \otimes \mathbf{R}(\boldsymbol{\theta}) + \mathcal{A}(\boldsymbol{\theta})\,\mathbf{K}\,\mathcal{A}^H(\boldsymbol{\theta}) \qquad (5.17)$$

² Due to the original nonlinear signal model, $\boldsymbol{\Sigma}_{n-1/n-1}$ is not the tracker covariance matrix. However, following the original nomenclature in the Kalman filter theory, $\boldsymbol{\Sigma}_{n-1/n-1}$ will be referred to as the MSE matrix.
where $\mathbf{R}(\boldsymbol{\theta}) = \mathbf{A}(\boldsymbol{\theta})\mathbf{A}^H(\boldsymbol{\theta}) + \mathbf{R}_w$, $\mathcal{A}(\boldsymbol{\theta}) = \mathbf{A}^*(\boldsymbol{\theta}) \otimes \mathbf{A}(\boldsymbol{\theta})$ and K is the so-called kurtosis matrix, which supplies all the non-Gaussian information about the nuisance parameters xn . Once more, K plays a prominent role in this chapter in case of non-Gaussian nuisance parameters.
Regarding now the first term of $\mathbf{Q}_{n/n-1}$, it follows that

$$E\big\{\mathbf{H}_n\,\boldsymbol{\Sigma}_{n/n-1}\,\mathbf{H}_n^H\big\} = \sum_{p,q=1}^{P} \left[\boldsymbol{\Sigma}_{n/n-1}\right]_{p,q} \left.\mathbf{H}_{p,q}(\boldsymbol{\theta})\right|_{\boldsymbol{\theta}=\widehat{\boldsymbol{\theta}}_{n/n-1}}$$

where, after some tedious manipulations,

$$\mathbf{H}_{p,q}(\boldsymbol{\theta}) \triangleq E\left\{[\mathbf{H}_n(\boldsymbol{\theta})]_p\,[\mathbf{H}_n(\boldsymbol{\theta})]_q^H\right\} = \left(\mathbf{A}^*(\boldsymbol{\theta}) \otimes \frac{\partial \mathbf{A}(\boldsymbol{\theta})}{\partial \theta_p} + \frac{\partial \mathbf{A}^*(\boldsymbol{\theta})}{\partial \theta_p} \otimes \mathbf{A}(\boldsymbol{\theta})\right) \overline{\mathbf{K}} \left(\mathbf{A}^*(\boldsymbol{\theta}) \otimes \frac{\partial \mathbf{A}(\boldsymbol{\theta})}{\partial \theta_q} + \frac{\partial \mathbf{A}^*(\boldsymbol{\theta})}{\partial \theta_q} \otimes \mathbf{A}(\boldsymbol{\theta})\right)^H$$
$$+ \mathbf{R}_w^* \otimes \frac{\partial \mathbf{A}(\boldsymbol{\theta})}{\partial \theta_p}\frac{\partial \mathbf{A}^H(\boldsymbol{\theta})}{\partial \theta_q} + \left(\frac{\partial \mathbf{A}(\boldsymbol{\theta})}{\partial \theta_p}\frac{\partial \mathbf{A}^H(\boldsymbol{\theta})}{\partial \theta_q}\right)^* \otimes \mathbf{R}_w \qquad (5.18)$$

with

$$\overline{\mathbf{K}} \triangleq E\left\{\mathrm{vec}\big(\mathbf{x}_n\mathbf{x}_n^H\big)\,\mathrm{vec}^H\big(\mathbf{x}_n\mathbf{x}_n^H\big)\right\} = \mathbf{I}_{K^2} + \mathrm{vec}(\mathbf{I}_K)\,\mathrm{vec}^H(\mathbf{I}_K) + \mathbf{K}$$

being K the nuisance parameters' kurtosis matrix.
Therefore, it is found that the Kalman gains Mn can be computed from the previous estimate $\widehat{\boldsymbol{\theta}}_{n-1/n-1}$ and the associated covariance matrix $\boldsymbol{\Sigma}_{n-1/n-1}$. In order to apply this recursion to $\mathbf{M}_{n+1}$ at the next time instant, it is necessary to evaluate the estimation MSE matrix at time n, which is given by

$$\boldsymbol{\Sigma}_{n/n} \triangleq E\left\{\big(\boldsymbol{\theta}_n - \widehat{\boldsymbol{\theta}}_{n/n}\big)\big(\boldsymbol{\theta}_n - \widehat{\boldsymbol{\theta}}_{n/n}\big)^H/\mathbf{r}_1,\ldots,\mathbf{r}_n\right\} = \boldsymbol{\Sigma}_{n/n-1} - \mathbf{M}_n^H\mathbf{S}_n$$

considering the QEKF solution in (5.15).
5.4.3
QEKF programming
In this section, the most important equations in the QEKF deduction are listed in order to facilitate its implementation in a hardware or software platform. Thus, assuming that $\widehat{\boldsymbol{\theta}}_{n-1/n-1}$ and $\boldsymbol{\Sigma}_{n-1/n-1}$ were computed in the previous iteration, the following operations must be carried out when the new sample yn is received:

1. Prediction:

$$\widehat{\boldsymbol{\theta}}_{n/n-1} = \mathbf{f}\big(\widehat{\boldsymbol{\theta}}_{n-1/n-1}\big)$$
$$\widehat{\mathbf{r}}_{n/n-1} = \mathbf{h}\big(\widehat{\boldsymbol{\theta}}_{n/n-1}\big) = \mathbf{h}\big(\mathbf{f}\big(\widehat{\boldsymbol{\theta}}_{n-1/n-1}\big)\big)$$
$$\boldsymbol{\Sigma}_{n/n-1} = \mathbf{F}\big(\widehat{\boldsymbol{\theta}}_{n-1/n-1}\big)\,\boldsymbol{\Sigma}_{n-1/n-1}\,\mathbf{F}^H\big(\widehat{\boldsymbol{\theta}}_{n-1/n-1}\big) + \mathbf{R}_u$$
2. Kalman gain:

$$\mathbf{M}_n = \mathbf{Q}_n^{-1}\mathbf{S}_n$$
$$\mathbf{S}_n = \left.\mathbf{D}_r(\boldsymbol{\theta})\,\boldsymbol{\Sigma}_{n/n-1}\right|_{\boldsymbol{\theta}=\widehat{\boldsymbol{\theta}}_{n/n-1}}$$
$$\mathbf{Q}_n = \left.\left(\sum_{p,q=1}^{P}\left[\boldsymbol{\Sigma}_{n/n-1}\right]_{p,q}\mathbf{H}_{p,q}(\boldsymbol{\theta}) + \mathbf{R}^*(\boldsymbol{\theta}) \otimes \mathbf{R}(\boldsymbol{\theta}) + \mathcal{A}(\boldsymbol{\theta})\,\mathbf{K}\,\mathcal{A}^H(\boldsymbol{\theta})\right)\right|_{\boldsymbol{\theta}=\widehat{\boldsymbol{\theta}}_{n/n-1}}$$

where Mn is eventually a function of the signal model A(θ) and its derivatives ∂A(θ)/∂θ1 , . . . , ∂A(θ)/∂θP —evaluated at $\widehat{\boldsymbol{\theta}}_{n/n-1}$— as well as the noise covariance Rw and the kurtosis matrix K. The exact expressions of Sn and Qn were deduced in Section 5.4.2.

3. Estimation:

$$\widehat{\boldsymbol{\theta}}_{n/n} = \widehat{\boldsymbol{\theta}}_{n/n-1} + \mathbf{M}_n^H\big(\mathbf{r}_n - \widehat{\mathbf{r}}_{n/n-1}\big)$$

with $\mathbf{r}_n = \mathrm{vec}\big(\mathbf{y}_n\mathbf{y}_n^H\big)$ the vectorized sample covariance matrix.

4. MSE matrix update:

$$\boldsymbol{\Sigma}_{n/n} = \boldsymbol{\Sigma}_{n/n-1} - \mathbf{M}_n^H\mathbf{S}_n = \boldsymbol{\Sigma}_{n/n-1} - \mathbf{S}_n^H\mathbf{Q}_n^{-1}\mathbf{S}_n.$$
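The four steps above can be collected in a single routine. The following NumPy sketch implements one QEKF iteration for a generic model; the callables f, F, h, Dr and Qfun, as well as the toy scalar measurement h(θ) = [θ, θ²]ᵀ used to exercise it, are illustrative assumptions (in this toy the measurement Jacobian is deterministic, so Qn reduces to Dr Σ DrT plus the measurement noise covariance, with no self-noise term):

```python
import numpy as np

def qekf_step(theta_prev, Sigma_prev, r_n, f, F, h, Dr, Qfun, Ru):
    """One QEKF iteration (steps 1-4 above).

    f/F: state map and its Jacobian; h: measurement mean; Dr: derivative
    matrix of h; Qfun(theta, Sigma_pred): innovation covariance Q_n.
    """
    # 1. prediction
    theta_pred = f(theta_prev)
    r_pred = h(theta_pred)
    Fm = F(theta_prev)
    Sigma_pred = Fm @ Sigma_prev @ Fm.conj().T + Ru
    # 2. Kalman gain: M_n = Q_n^{-1} S_n with S_n = Dr Sigma_pred
    Sn = Dr(theta_pred) @ Sigma_pred
    Mn = np.linalg.solve(Qfun(theta_pred, Sigma_pred), Sn)
    # 3. estimation: theta_{n/n} = theta_{n/n-1} + M_n^H (r_n - r_{n/n-1})
    theta_new = theta_pred + (Mn.conj().T @ (r_n - r_pred)).real
    # 4. MSE matrix update
    Sigma_new = Sigma_pred - Mn.conj().T @ Sn
    return theta_new, Sigma_new

# toy static scalar parameter with measurement h(t) = [t, t^2] + noise
f = lambda t: t
F = lambda t: np.eye(1)
h = lambda t: np.array([t[0], t[0] ** 2])
Dr = lambda t: np.array([[1.0], [2.0 * t[0]]])
Qfun = lambda t, S: Dr(t) @ S @ Dr(t).T + 0.01 * np.eye(2)

rng = np.random.default_rng(3)
theta, Sigma = np.array([0.2]), np.array([[10.0]])
for _ in range(300):
    r_n = np.array([0.7, 0.49]) + 0.1 * rng.standard_normal(2)
    theta, Sigma = qekf_step(theta, Sigma, r_n, f, F, h, Dr, Qfun,
                             1e-8 * np.eye(1))
print(theta)   # converges towards the true value 0.7
```

With a large initial Σ the first corrections are aggressive (acquisition) and, as Σ shrinks, the routine behaves like a recursive weighted least-squares estimator (the frozen steady-state loop).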
5.5
Simulations
Let us consider the problem of tracking the direction-of-arrival (DOA) of P mobile terminals transmitting toward a uniform linear array composed of M > P antennas spaced λ/2 apart, with λ the wavelength of the received signals. The received signal is passed through the matched filter and then sampled at one sample per symbol. We will consider independent snapshots, assuming that the actual modulation is ISI-free and the P signals are perfectly synchronized. Assuming for simplicity that all the users are received with the same power, the observed signal verifies the linear signal model in equation (5.1) with

$$\mathbf{A}(\boldsymbol{\theta}_n) \triangleq \exp\big(j\pi\,\mathbf{d}_M\boldsymbol{\theta}_n^T\big) \qquad\qquad \mathbf{d}_M = [0, \ldots, M-1]^T$$

being xn the transmitted symbols of the P users and wn the vector of AWGN samples with $E\{\mathbf{w}_n\mathbf{w}_n^H\} = \sigma_w^2\mathbf{I}_M$. Therefore, the SNR (per user) is given by $\sigma_w^{-2}$, bearing in mind that $E\{\mathbf{x}_n\mathbf{x}_n^H\} = \mathbf{I}_K$ with K = P in this case.
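For a single constant-modulus user, the kurtosis matrix reduces to the scalar K = −1, since vec(xn xn^H) = |xn|² = 1 deterministically. This can be checked with a short sketch (NumPy; M = 4 and the 8-PSK constellation are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(4)
M, P, N = 4, 1, 10_000

d = np.arange(M)                                   # ULA with lambda/2 spacing
def steering(theta):
    """A(theta) = exp(j*pi*d_M*theta^T), one column per DOA."""
    return np.exp(1j * np.pi * np.outer(d, np.atleast_1d(theta)))

# draw MPSK symbols (8-PSK): zero mean, unit modulus, E{x x^H} = I
x = np.exp(2j * np.pi * rng.integers(0, 8, (P, N)) / 8)

# Kbar = E{vec(x x^H) vec^H(x x^H)} and K = Kbar - I - vec(I)vec^H(I)
V = np.einsum("in,jn->ijn", x, x.conj()).reshape(P * P, N, order="F")
Kbar = (V @ V.conj().T).real / N
vecI = np.eye(P).ravel(order="F")
Kmat = Kbar - np.eye(P * P) - np.outer(vecI, vecI)

print(steering(0.3).shape, Kmat)   # (4, 1) and [[-1.]] for constant modulus
```

The strictly negative kurtosis is exactly the fourth-order information that the optimum QEKF exploits and that the Gaussian QEKF (K = 0) discards.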
Several illustrative simulations have been carried out to evaluate the performance of the
QEKF (5.15) when the transmitted signal is digitally modulated. The optimum QEKF is compared with the one based on the Gaussian assumption that is obtained imposing K = 0 into
Figure 5.1: Estimation MSE as a function of time for the optimum and Gaussian QEKF in
the case of a single static MPSK-modulated user with random DOA in the range ±0.4 and
SNR=40dB.
(5.17) and (5.18). This suboptimal QEKF will be referred to as the Gaussian QEKF in the
sequel. The normalized mean square error (MSE) is adopted as the figure of merit, which at time n reads

$$\mathrm{MSE}(n) \triangleq \frac{1}{P}\,E\left\{\big\|\boldsymbol{\theta}_n - \widehat{\boldsymbol{\theta}}_{n/n}\big\|^2\right\}.$$
- Simulation 1: in Fig. 5.1, a single user (P = 1) transmitting from a static DOA is simulated. The transmitted symbols are drawn from a phase shift keying (MPSK) constellation. The base-station array is composed of M = 4 antennas and the SNR per user is set to 40 dB. This very high SNR scenario is studied in order to analyze how the trackers cope with the random nuisance parameters, i.e., the so-called self-noise.

The estimator is initialized at $\widehat{\boldsymbol{\theta}}_{0/0} = 0$ with $\boldsymbol{\Sigma}_{0/0} = 1000$. Then, 1000 realizations are run with θ uniformly distributed within (−0.4, 0.4). The parameter range is limited to this interval because the tracker acquisition margin is limited to ±2/M = ±0.5. In general, the QEKF solution is unique, whatever the initial start-up, if and only if M = P + 1. When M > P + 1 the array directivity is augmented but new sidelobes appear in the array beam pattern, yielding spurious solutions.
Figure 5.1 depicts the estimated MSE(n) for the optimum and Gaussian QEKF. The state
equation noise Ru (5.2) is set up to attain the same steady-state variance in both cases. It
Figure 5.2: Estimation MSE as a function of time for the optimum and Gaussian QEKF in the
case of two static MPSK-modulated users placed at ±0.2, ±0.1 and SNR=40dB.
becomes apparent that the acquisition time is reduced if the QEKF exploits the digital structure
of the received signal by incorporating the kurtosis matrix K. Alternatively, this improvement
could be used to reduce the QEKF steady-state variance if the optimum and Gaussian QEKF
trackers were adjusted to yield the same acquisition time.
- Simulation 2: in this simulation, we have P = 2 users transmitting from θ = [−0.1, 0.1]T or θ = [−0.2, 0.2]T . The array size is M = 4 and the SNR per user is again 40 dB. The QEKF trackers are initialized at $\widehat{\boldsymbol{\theta}}_{0/0} = [-0.5, 0.5]^T$ with $\boldsymbol{\Sigma}_{0/0} = 1000\,\mathbf{I}_P$ and $\mathbf{R}_u = 10^{-3}\,\mathbf{I}_P$. The resulting MSE(n) is plotted in Fig. 5.2 for the optimum and Gaussian QEKF. Once more, the fourth-order information about the discrete symbols is shown to improve the QEKF performance in both the acquisition and steady-state regimes. As shown in Fig. 5.2, the closer the two sources are, the greater this improvement becomes. Further simulations showed that the simulated Gaussian QEKF is unable to acquire the actual DOAs in some cases, e.g., θ = [0.2, 0.4]T , whereas the optimum QEKF eventually converges to the true DOAs.
- Simulation 3: in this simulation, the state equation noise is removed (Ru → 0) in order to evaluate the estimator consistency when n → ∞. First of all, the DOAs are acquired (n < 0) with all the QEKFs adjusted to yield the same steady-state variance (0 < n < 20). From this steady-state situation, Ru is set to zero at n = 20 so that the QEKF (noise equivalent) bandwidth is progressively reduced. From this moment on, the optimum QEKF becomes an order-recursive implementation of the second-order tracker in Section 4.2 as the observation time goes
Figure 5.3: Estimation MSE as a function of time for the optimum and Gaussian QEKF (dashed)
in the case of two static MPSK-modulated sources transmitting from ±0.2 (), ±0.1 () and,
±0.01 (♦) when Ru is set to zero at n=20. SNR=40dB.
to infinity. Likewise, the Gaussian QEKF implements the well-known GML estimator explained in Section 2.4.3. These statements are based on the technical discussion in Section 5.4.1.

In Fig. 5.3, numerical results are provided for two static users at θ = ±0.01, ±0.1 or ±0.2 with M = 4 and SNR=40dB. We observe that the Gaussian assumption suffers a constant penalty as n is augmented. Consequently, the GML estimator is proved to be suboptimal at high SNR when the modulation has constant envelope (e.g., MPSK or CPM [Pro95]), even if the observation is arbitrarily large (n → ∞). This result is further validated by means of the asymptotic study in Section 7.4.5.

Finally, notice that the incurred loss is a function of the users' angular separation. Surprisingly, the variance of the optimal QEKF improves as the users get closer. This abnormal result is a consequence of the secondary lobes of the array response when the number of antennas is small. The same effect will be observed in Section 7.5 when studying the asymptotic performance of the optimal small-error DOA tracker.
- Simulation 4: in order to validate that the simulated QEKFs are tracking the actual DOAs, the users are moved with constant angular speed from −0.8 to 0.8 with a fixed angular separation (0.02). 50 trials are plotted in Fig. 5.4, showing that the Gaussian QEKF fails to track the two users.
Figure 5.4: DOA Tracking of two close MPSK-modulated signals separated 0.02 using the
optimum and Gaussian QEKF. SNR=40dB.
- Simulation 5: the same simulation as in Fig. 5.3 has been carried out for a low signal-to-noise ratio (SNR=10 dB) and a multilevel modulation such as 16-QAM [Pro95]. Figures 5.5 and 5.6 show the optimality of the Gaussian assumption when multilevel constellations or low SNRs are considered, respectively.
5.6
Conclusions
The EKF formulation has been extended to deal with quadratic signal models that appear naturally in blind estimation problems. The resulting Quadratic EKF (QEKF) is found to exploit the
fourth-order cumulants (kurtosis) of the unknown inputs whereas this information is implicitly
omitted when the classical Gaussian assumption is adopted in the design. The QEKF is further
applied to estimate and track the DOA of multiple digitally-modulated sources concluding that
constant amplitude modulations (e.g., MPSK or CPM) yield a significant improvement in terms
of acquisition and/or steady-state variance for moderate-to-high SNRs. In these scenarios, the
Gaussian assumption is found to provide suboptimal DOA estimators or trackers even if the
tracker bandwidth is indefinitely reduced or, in other words, the (effective) observation time is
increased without limit.
Figure 5.5: Estimation MSE as a function of time for the optimum and Gaussian QEKF (dashed)
in the case of two 16-QAM modulated signals received from ±0.2 (), ±0.1 () and ±0.01 (♦)
when Ru is set to zero at time n=20. SNR=40dB.
Figure 5.6: Estimation MSE as a function of time for the optimum and Gaussian QEKF (dashed)
in the case of two MPSK-modulated signals received from ±0.2 () and ±0.05 (♦) when Ru is
set to zero at time n=20 and the SNR is equal to 10dB.
Chapter 6
Case Studies
This chapter explores some illustrative applications in the context of digital communications.
The second-order estimation theory in the preceding chapters is developed for these selected
case studies. In most examples, the focus is on closed-loop second-order schemes assuming
that the small-error approximation is satisfied. The Gaussian ML estimator and the rest of ML-based approximations are numerically compared to the optimal second-order small-error solution
in Chapter 4. Likewise, the related lower bounds in the presence of nuisance parameters are
included for completeness (Section 2.6.1).
In the first section, some contributions in the field of non-data-aided synchronization are
presented. Specifically, Section 6.1 proposes the global optimization of second-order closed-loop
synchronizers and the design of open-loop timing synchronizers in the frequency domain. In
Section 6.2, the problem of second-order carrier phase synchronization is addressed in case of
noncircular transmissions. In this section, the ML estimator is shown to be quadratic at low
SNR for MSK-type modulations. Moreover, second-order self-noise free estimates are achieved
at high SNR exploiting the non-Gaussian structure of the digital modulation. In Section 6.3,
the problem of time-of-arrival estimation in wireless communications is studied. The frequency-selective multipath is shown to increase the number of nuisance parameters and the Gaussian
assumption is shown to apply in this case study.
In Section 6.4, the classical problem of blind channel identification is dealt with. The channel
amplitude is shown to be unidentifiable unless the transmitted symbols belong to a constant-modulus constellation and this information is exploited by the estimator. Finally, the problem
of angle-of-arrival estimation in the context of cellular communications is addressed in Section
6.5. The Gaussian assumption is clearly outperformed for practical SNRs in case of constant-modulus nuisance parameters and closely spaced sources. In this section, the importance of the
multiple access interference (MAI) is emphasized and MAI-resistant second-order DOA trackers
are derived and evaluated.
CHAPTER 6. CASE STUDIES
6.1 Non-Data-Aided Synchronization
The problem of blind frequency estimation was adopted in the core of the dissertation —Chapters
3 and 4— to illustrate the most significant conclusions of this thesis. This choice was based on the
relevance of this problem in many applications and the existence of closed-form expressions for
the feedforward frequency estimator considered in Chapter 3. In this section, some additional
contributions in the field of non-data-aided (NDA) digital synchronization are presented and
simulated.
To introduce the reader to the problem of digital synchronization, a brief review of the
state-of-the-art is provided in Section 6.1.1, in which the most successful timing and frequency
estimators are presented. Afterwards, in Section 6.1.2, the signal model for digital synchronization is reviewed and some important remarks are made on the structure of the transfer matrix
A (θ). Based on this signal model, the performance of the most important NDA (quadratic)
timing estimators —for both linear and CPM modulations— is extensively evaluated via simulation in Section 6.1.5. In this context, a closed-form expression for the optimal second-order
open-loop timing estimator is deduced by processing the received signal in the frequency domain
(Section 6.1.3). Another contribution of this section is the global optimization of closed-loop
estimators, showing that the discriminator should be designed to minimize the variance of the
low-pass noisy terms because the high-pass terms (e.g., the self-noise) are filtered at the loop filter (Section 6.1.4). Finally, all these results are validated via simulation in Section 6.1.5.
6.1.1 Overview
In digital communications, the receiver has to recover some reference parameters in order to
demodulate the received signal. These parameters are mostly the signal timing and, in bandpass coherent communications, the carrier phase and the carrier frequency. The knowledge of
these parameters is necessary to synchronize the demodulator and take reliable decisions on the
transmitted symbols [Men97][Vaz00].
Although the data symbols are a priori unknown, digital modulations exhibit a strict-sense
cyclostationarity that can be exploited to derive sufficient statistics for the estimation of the
aforementioned parameters. Thus, all the methods in the literature for non-data-aided (NDA)
timing and frequency estimation make use of the cyclostationarity property of the received signal
[Rib94].
As trying to exploit the entire statistics would be impractical, two main directions have been
adopted in the development of practical algorithms. The first direction focuses on an explicit
exploitation of the second-order cyclostationarity [Gar86b][Rib94]. As a result, the algorithms
derived become quadratic with respect to the received signal. There are, at least, two motivations
for choosing the second-order statistics. The first one is that it represents a minimum complexity
constraint. The other is that it allows extracting useful insights from the spectral correlation
concept [Gar86b], which is useful for guiding the designer in the derivation of synchronization
algorithms. Although all the above methods start from a solid theoretical foundation, the second-order constraint appears as an ad hoc selection, and the obtained methods are based on heuristic
reasoning. For the preliminary issues on cyclostationarity the reader is referred to [Gar94] and
references therein.
The second direction commonly adopted for the design of synchronization algorithms is
the application of the well-known maximum likelihood principle explained in Section 2.3
[Men97][Vaz00]. While the cyclostationary framework is useful for the derivation of both feedforward and feedback structures, the ML criterion leads primarily to feedback schemes (Section
2.5). With the purpose of deriving NDA methods, the data symbols should be modeled as
random variables following the stochastic approach introduced in Section 2.3. Then, the likelihood function should be obtained by averaging the joint likelihood function using the known
statistical distribution of the symbols. Additionally, the rest of unknown nuisance parameters
can also be averaged out following a Bayesian approach. The resulting NDA ML criterion is
referred to as the unconditional (or stochastic) maximum likelihood estimator in the literature
(Section 2.3).
Because of the difficult computation of the mentioned statistical averages, it is very common
to consider that the signal-to-noise ratio of the received signal is very low (Section 2.4.1). Although this low-SNR assumption is not generally satisfied, it allows the development of reduced
complexity synchronizers because the resulting schemes are usually quadratic in the observation.
A different interpretation of the NDA ML estimation is given in [Vaz00][Vaz01][Rib01a]. The
new approach is based on the compression of the NDA ML function with respect to the vector of
unknown symbols by adopting a linear estimation of these symbols. This approach is valuable
because it unifies the different ML-based NDA solutions, namely the Low-SNR UML (Section
2.4.1), the Conditional ML (Section 2.4.2) and the Gaussian ML (Section 2.4.3).
In the following sections, the most important NDA synchronization techniques are briefly
described and classified. For more information, the reader is referred to the excellent textbooks
and historical reports on digital synchronization [Men97][Mey90][Gar88a][Gar90].
Timing Synchronization
One of the simplest algorithms exploiting the cyclostationarity property for timing estimation
is the well-known Filter and Square Timing Recovery proposed in [Oer88] by M. Oerder and H.
Meyr. This feedforward timing synchronizer is based on an explicit spectral line regeneration
using quadratic processing. The Oerder&Meyr synchronizer was proved in [LS05a][Vaz00] to be
the low-SNR ML timing estimator if the carrier frequency error is uniformly distributed within
the Nyquist bandwidth and the received symbols are uncorrelated. Likewise, the Oerder&Meyr
synchronizer yields also the low-SNR ML solution in the absence of frequency errors as shown
in [LS05a][Vaz00].
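As an illustrative aside (not part of the original derivation), the Oerder&Meyr square timing estimator described above can be sketched in a few lines of Python. The raised-cosine pulse, QPSK data, oversampling factor and sequence length below are arbitrary illustrative choices:

```python
import numpy as np

def rc_pulse(t, beta):
    """Time-domain raised-cosine (Nyquist) pulse with symbol period T = 1."""
    t = np.asarray(t, dtype=float)
    denom = 1.0 - (2.0 * beta * t) ** 2
    safe = np.where(np.abs(denom) < 1e-10, 1.0, denom)
    p = np.sinc(t) * np.cos(np.pi * beta * t) / safe
    lim = (np.pi / 4.0) * np.sinc(1.0 / (2.0 * beta))  # value at t = +-1/(2 beta)
    return np.where(np.abs(denom) < 1e-10, lim, p)

def om_timing(y, nss):
    """Oerder&Meyr feedforward timing estimate, normalized to the symbol period.

    Squaring the nss-times oversampled signal regenerates a spectral line at the
    symbol rate whose phase carries the timing error (nss >= 3 is needed so that
    |y|^2 is not aliased)."""
    m = np.arange(len(y))
    c = np.sum(np.abs(y) ** 2 * np.exp(-2j * np.pi * m / nss))
    return -np.angle(c) / (2.0 * np.pi)

# illustrative check: QPSK symbols through a delayed raised-cosine pulse
rng = np.random.default_rng(0)
K, nss, beta, tau = 2000, 4, 0.3, 0.1
d = rng.choice([1.0, -1.0], K) + 1j * rng.choice([1.0, -1.0], K)
span = 10                                   # pulse truncation, in symbols
p = rc_pulse(np.arange(-span * nss, span * nss + 1) / nss - tau, beta)
up = np.zeros(K * nss, dtype=complex)
up[::nss] = d                               # symbols placed on the sample grid
y = np.convolve(up, p, mode="same")         # noise-free received samples
tau_hat = om_timing(y, nss)                 # close to tau, up to self-noise
```

With no additive noise the residual estimation error is pure self-noise, which decreases as the number of observed symbols grows.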
On the other hand, the application of the ML principle along with the low-SNR approximation and some additional simplifications, has led to well-known closed-loop estimators such as
the NDA Early-Late detector [Men97, Sec. 8.3.1.] and the Gardner’s detector [Gar86a], which
is shown to outperform the NDA Early-Late detector at high SNR.
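The Gardner detector admits a particularly compact sketch at two samples per symbol. The BPSK data and raised-cosine pulse below are illustrative assumptions; the check at the end only exploits the fact that the detector S-curve is an odd function of the timing offset:

```python
import numpy as np

def gardner_ted(y):
    """Gardner timing error detector at nss = 2 samples per symbol (sketch).

    y[2k] are the tentative symbol-rate strobes; the detector correlates the
    mid-symbol sample with the transition between adjacent strobes and returns
    one detected timing error per symbol transition."""
    strobe = y[2::2]      # y(kT + T)
    prev = y[:-2:2]       # y(kT)
    mid = y[1:-1:2]       # y(kT + T/2)
    return np.real(np.conj(mid) * (strobe - prev))

rng = np.random.default_rng(1)
K, nss, beta, span = 2000, 2, 0.5, 10

def rc_pulse(t):
    # raised-cosine Nyquist pulse, T = 1, with the 0/0 points patched
    den = 1.0 - (2.0 * beta * t) ** 2
    safe = np.where(np.abs(den) < 1e-10, 1.0, den)
    out = np.sinc(t) * np.cos(np.pi * beta * t) / safe
    return np.where(np.abs(den) < 1e-10, (np.pi / 4) * np.sinc(1 / (2 * beta)), out)

d = rng.choice([1.0, -1.0], K)              # BPSK symbols

def received(tau):
    p = rc_pulse(np.arange(-span * nss, span * nss + 1) / nss - tau)
    up = np.zeros(K * nss)
    up[::nss] = d
    return np.convolve(up, p, mode="same").astype(complex)

# the mean detector output changes sign with the sign of the timing offset
m_pos = gardner_ted(received(+0.1)).mean()
m_neg = gardner_ted(received(-0.1)).mean()
```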
A common problem of the existing NDA timing error detectors is the presence of the so-called self-noise (or pattern-noise) [Men97]. Self-noise is the timing jitter induced by the random
received symbols. Indeed, this self-noise is a consequence of the adopted low-SNR approximation.
The occurrence of self-noise yields a high SNR floor on the timing estimation variance that might
invalidate these techniques for the medium-to-high SNR range. This problem was addressed in
detail in [And96]. In this paper, the authors proposed to pre-filter the received signal before
detecting the timing error.
Finally, more recent research efforts have been concerned with timing recovery for Continuous
Phase Modulation (CPM) [Men97][Vaz00]. These modulation schemes are attractive for their
high spectral efficiency and constant envelope nature, which allows the use of low-cost, nonlinear
amplifiers. The ML principle along with the low-SNR approximation has been also applied in
this case, leading to timing recovery detectors very similar to those derived for a linear format.
Carrier-frequency synchronization
The structure of the frequency synchronizers highly depends on the magnitude of the maximum
frequency offset as compared with the symbol rate. Early methods for feedback frequency
recovery in the case of high frequency offset include quadricorrelators [Cah77][Gar88a] and dual
filter detectors [Alb89][Gar88a], which have been proved to be equivalent solutions [Moe92]. The
rotational detectors for estimating moderate frequency offsets with no timing uncertainty were
introduced in [Mes79]. Other ad hoc schemes were proposed in [Sar88] and [Chu91] for the same
problem.
The first rigorous treatment of the problem starting from an ML perspective can be found in
[Gar90]. The frequency recovery methods developed under this framework also make use of the
low-SNR approximation. However, the resulting low-SNR ML frequency error detectors become
self-noise free if the timing is known. Self-noise appears only when the estimator does not use
the timing information [Men97, Sec. 3.5.] and ad hoc techniques for eliminating this effect have been proposed in [Alb89] and [And93].
6.1.2 Signal Model
In this section, the signal model used in the context of digital communications is presented in
detail. It is shown that most modulations of interest can be represented by means of the linear,
vectorial model presented in Section 2.4. Let us start formulating the complex envelope of a
generic digital modulation as follows:
y(t) = s(t − τ; d_k) e^{j(ϕ+ωt)} + w(t)    (6.1)
where {d_k} are the information symbols conveyed in the transmitted signal s(t), τ is the timing error within a symbol period (−T/2, +T/2], ϕ and ω are the carrier phase and the carrier pulsation errors, respectively, and w(t) is the complex AWGN term with double-sided power spectral density S_w(f) = 2N_o Watts/Hz.¹
If the received signal is low-pass filtered in the Nyquist bandwidth (−0.5/Ts , 0.5/Ts ], the
equivalent discrete signal model is given by
y(mT_s) = s(mT_s − τ; d_k) e^{j(ϕ+ωmT_s)} + w(mT_s)    (6.2)
where Ts is the sampling period. Under this sampling condition, the discrete noise w(mTs )
remains white.
A first case of interest is that of linear modulations admitting the following representation:
y(mT_s) = Σ_{k=−∞}^{+∞} d_k p(mT_s − kT − τ) e^{j(ϕ+ωmT_s)} + w(mT_s)    (6.3)
where T = N_ss T_s is the symbol period and p(mT_s) are the samples of the pulse p(t), which is supposed to last L symbol intervals. If we take M ≜ N_s N_ss samples to estimate the unknown parameters, the n-th observed vector

y_n ≜ [y(nT), . . . , y(nT + (M − 1)T_s)]^T

is given by
y_n = A(θ) x_n + w_n    (6.4)
where the transfer matrix

A(θ) ≜ [ e^{j(ϕ+nωT)} p((L−1)T − τ)                              ···  e^{j(ϕ+nωT)} p((1−N_s)T − τ)
                        ⋮                                         ⋱                 ⋮
         e^{j(ϕ+(n+(M−1)/N_ss)ωT)} p((L+N_s−1)T − T_s − τ)        ···  e^{j(ϕ+(n+(M−1)/N_ss)ωT)} p(T − T_s − τ) ]

¹The in-phase and quadrature two-sided power spectral density is N_o W/Hz.
Figure 6.1: Structure of the transfer matrix A(θ) in a burst or continuous transmission.
is a function of the normalized vector of parameters

θ ≜ [ϕ, τ/T, ωT/2π]^T,

the vector x_n ≜ [d_{n−L+1}, . . . , d_{n+N_s−1}]^T contains the observed data symbols, and the receiver noise w_n is defined in the same way as the vector y_n. The phase origin can be arbitrarily selected. For instance, a practical choice is the center of the observed interval.
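As an illustrative aside (not part of the original derivation), the following Python sketch builds a timing-only instance of the transfer matrix, with the carrier phasor set to one and a hypothetical triangular pulse of support L = 2 symbols, and verifies that it has N_s N_ss rows and N_s + L − 1 columns as stated above:

```python
import numpy as np

def transfer_matrix(pulse, tau, ns, nss, L):
    """Timing-only transfer matrix of the linear model y_n = A x_n + w_n.

    pulse : callable p(t), symbol period T = 1, support of L symbols.
    Column k collects the samples of the pulse carrying symbol d_k inside the
    observation window of M = ns * nss samples; the carrier phasor of the full
    model is omitted here for brevity."""
    t = np.arange(ns * nss) / nss            # sample instants
    ks = np.arange(-(L - 1), ns)             # partially + fully observed symbols
    return np.stack([pulse(t - k - tau) for k in ks], axis=1)

# hypothetical pulse: triangle spanning L = 2 symbol intervals
tri = lambda t: np.maximum(0.0, 1.0 - np.abs(t))
ns, nss, tau = 6, 2, 0.3
A = transfer_matrix(tri, tau, ns, nss, L=2)
assert A.shape == (ns * nss, ns + 2 - 1)     # M rows, Ns + L - 1 columns

# A @ x reproduces the direct superposition of delayed pulses
rng = np.random.default_rng(0)
x = rng.choice([1.0, -1.0], A.shape[1])
t = np.arange(ns * nss) / nss
direct = sum(xk * tri(t - k - tau) for xk, k in zip(x, range(-1, ns)))
assert np.allclose(A @ x, direct)
```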
In this section, we will focus on the timing and frequency estimation problems assuming
that the signal phase is unknown. In that case, the term e^{jϕ} can be integrated into the nuisance parameter vector x_n and non-coherent (i.e., quadratic) estimation techniques are adopted.
Implicitly, a single channel per carrier (SCPC) system is assumed throughout this thesis, in which a continuous, infinite stream of symbols is received (6.3). In that case, the initial L − 1 and final L − 1 symbols are partially observed and, consequently, the modulation matrix A(θ) has N_s + L − 1 columns and only N_s N_ss rows. Therefore, oversampling (N_ss > 1) is normally necessary to have more samples than unknowns, i.e.,

N_s N_ss > N_s + L − 1.
This condition is usually a requirement to cancel out the disturbance of the modulation and yield
self-noise free estimates of θ. The structure of matrix A(θ) is depicted in Fig. 6.1 (right-hand
side) and its Grammian is a function of τ and ω as indicated next:

[A(θ) A^H(θ)]_{m_1,m_2} = e^{jω(m_1−m_2)T_s} Σ_{k=−∞}^{∞} p(m_1 T_s − kT − τ) p(m_2 T_s − kT − τ).
On the other hand, a burst of K symbols is transmitted in a time-division multiple access
(TDMA) system. In that case, the observation is composed of (K + L)N_ss − 1 non-zero samples
and, thus, oversampling is not strictly necessary if the received burst is integrally processed. In
a TDMA system the matrix A(θ) is Sylvester (see Fig. 6.1) and, if the transmitted pulse is
sampled without aliasing, we have that

[A^H(θ) A(θ)]_{k_1,k_2} = R_pp((k_1 − k_2)T)

where R_pp(Δt) ≜ ∫ p(t) p(t + Δt) dt is the pulse autocorrelation. Therefore, A^H(θ)A(θ) does not depend on θ.
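This Toeplitz structure can be checked numerically. In discrete time, [A^H(θ)A(θ)]_{k1,k2} reduces to the sampled pulse autocorrelation at lag (k1 − k2)N_ss, and sampling a band-limited pulse without aliasing makes these entries (almost) insensitive to τ. The raised-cosine pulse, sizes and tolerances below are illustrative:

```python
import numpy as np

def rc_pulse(t, beta):
    """Raised-cosine Nyquist pulse, T = 1, band-limited to (1 + beta)/2."""
    den = 1.0 - (2.0 * beta * t) ** 2
    safe = np.where(np.abs(den) < 1e-10, 1.0, den)
    out = np.sinc(t) * np.cos(np.pi * beta * t) / safe
    return np.where(np.abs(den) < 1e-10, (np.pi / 4) * np.sinc(1 / (2 * beta)), out)

def sylvester(p, K, nss):
    """Sylvester matrix of a K-symbol TDMA burst: column k holds the sampled
    pulse delayed by k symbols, and the whole burst is observed."""
    A = np.zeros((len(p) + (K - 1) * nss, K))
    for k in range(K):
        A[k * nss:k * nss + len(p), k] = p
    return A

nss, beta, span, K = 2, 0.3, 16, 5
grid = np.arange(-span * nss, span * nss + 1) / nss

A0 = sylvester(rc_pulse(grid, beta), K, nss)        # tau = 0
A1 = sylvester(rc_pulse(grid - 0.3, beta), K, nss)  # tau = 0.3
G0, G1 = A0.T @ A0, A1.T @ A1

# Toeplitz: every column sees the full pulse, so all diagonal entries match
assert abs(G0[0, 0] - G0[2, 2]) < 1e-12
# the Grammian is (nearly) independent of tau, up to pulse truncation
assert np.allclose(G0, G1, atol=5e-3)
```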
Synchronization algorithms for SCPC systems have to cope with the partial observation of the initial and final symbols. Optimal synchronizers weight the observed samples taking into account that the initial and final symbols provide less information about θ than the central ones. The larger the observation time (N_s), the less significant this "edge effect" becomes. This
problem is very relevant, for example, in the carrier phase estimation problem studied in Section 6.2. Asymptotically, the “edge effect” is negligible and the synchronization techniques for
SCPC systems are identical to those derived for TDMA systems. Thus, in the asymptotic case
synchronizers can be designed considering uniquely the central column of A(θ) (Section 7.4.4).
The linear model in equation (6.3) can be extended to encompass more sophisticated scenarios such as multicarrier schemes, multiple access systems, space-time transmissions, or binary CPM modulations on account of Laurent's decomposition [Lau86][Men95]. In all these cases, the
received signal can be expressed as the superposition of J linearly modulated signals as follows:

y = Σ_{j=1}^{J} A_j(θ_j) x_j + w = A(θ_1, . . . , θ_J) x + w

with

A(θ_1, . . . , θ_J) ≜ [A_1(θ_1), . . . , A_J(θ_J)]
x ≜ [x_1^T, . . . , x_J^T]^T
where the index n is omitted for simplicity and θ_j stands for the parameters of the j-th user in case of a multiple access system. The basic difference with respect to (6.3) is that the J signals are usually non-orthogonal and, thus, they interfere with each other when dealing with space-time transmissions [Vil03c], asynchronous CDMA users or binary CPM signals. Moreover, the J pseudo-pulses of the CPM signal suffer from intersymbol interference (ISI) at their matched filter output. All these terms of interference introduce an additional noisy component affecting the estimator performance at high SNR and yielding the so-called self-noise.
6.1.3 Open-Loop Timing Synchronization
In Chapter 3, the formulation of the optimal open-loop second-order estimator was addressed.
In that chapter, the parameters of interest were modeled as random variables with known probability density function fθ (θ). Then, the estimator coefficients were optimized averaging the
estimator bias and variance with respect to the prior fθ (θ). The Bayesian expectation was
solved analytically for the problem of frequency estimation in Section 3.4 and some simulations
were presented to illustrate the theory of feedforward quadratic estimation.
Unfortunately, in most problems, the expectation with respect to fθ (θ) must be solved
numerically as, for example, when addressing the problem of timing synchronization. To overcome
this drawback, in this section we propose to process the received signal in the frequency domain
where the timing error appears as a frequency shift. In that way, the formulation in Section 3.4
can be applied to both the frequency and timing estimation problems.
Let z be the DFT of the observed vector y, which is computed as

z ≜ Fy = FA(θ) x + Fw,
where F stands for the unitary M × M DFT matrix defined as follows:

F ≜ (1/√M) exp(−j (2π/M) d_M d_M^T)

with d_M ≜ [−M/2, . . . , M/2 − 1]^T. Notice that MT_s must be greater than the burst duration
plus twice the maximum delay to prevent the existence of temporal aliasing.
In the frequency domain, the transfer matrix can be written as

B(θ) ≜ FA(θ) = e^{jϕ} E_2(τ) F E_1(ν) A(0)    (6.5)

with

E_1(ν) ≜ diag{exp(j2π (ν/N_ss) d_M)}
E_2(τ) ≜ diag{exp(−j2π (τ N_ss/M) d_M)}

the diagonal matrices accounting for the frequency and timing error, normalized with respect to
the symbol period T . In that way, the observation z exhibits the same phasorial dependence on
the three parameters ϕ, τ and ν. Therefore, the results in Appendix 3.D can be used to obtain
a closed-form expression for the optimal quadratic open-loop timing synchronizer.
Notice that optimal estimators can be obtained from z = Fy since F is a unitary transformation that can always be inverted —if necessary— by the estimator matrix M without having
noise enhancement. Moreover, the transformation does not change the noise statistics if the
original Gaussian noise w is spectrally white.
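The interest of the DFT-domain model is precisely that a time delay becomes the purely diagonal phasor E_2(τ), so the open-loop formulation derived for frequency offsets carries over to timing. This can be verified numerically on a hypothetical band-limited pulse (the pulse shape and sizes below are illustrative assumptions):

```python
import numpy as np

M, nss, tau = 64, 4, 0.22          # window of M samples, delay in symbols
t = np.arange(M) / nss             # sample instants, T = 1

def pulse(t, t0=8.0):
    """Hypothetical band-limited pulse kept well inside the window."""
    return np.sinc(0.6 * (t - t0)) * np.exp(-(((t - t0) / 2.0) ** 2))

# unitary DFT matrix F = (1/sqrt(M)) exp(-j 2 pi d_M d_M^T / M)
d_M = np.arange(-M // 2, M // 2)
F = np.exp(-2j * np.pi * np.outer(d_M, d_M) / M) / np.sqrt(M)

z0 = F @ pulse(t)                  # spectrum of the undelayed signal
z1 = F @ pulse(t - tau)            # spectrum of the delayed signal

# in the frequency domain the delay appears as the diagonal phasor E2(tau)
e2 = np.exp(-2j * np.pi * (tau * nss / M) * d_M)
# holds up to the (negligible) pulse energy leaking past the window edges
assert np.allclose(z1, e2 * z0, atol=1e-5)
```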
To conclude, it is worth realizing that (6.5) only holds if all the received pulses are entirely
observed. Otherwise, those partially observed pulses cannot be interpolated from the vector of
samples y because they do not satisfy the Nyquist criterion. Thus, the above expression can be
applied to design open-loop estimators if the entire burst —including the pulse tails— is captured
and processed in a TDMA system or, alternatively, if the observation time is sufficiently large
to neglect the “edge effect” in SCPC systems (Section 6.1.2).
6.1.4 Closed-Loop Analysis and Optimization
In Chapter 4, the optimum second-order small-error estimator was deduced and then simulated for the frequency estimation problem. The solution therein can be adopted to design the
discriminator of NDA timing and frequency closed-loop synchronizers. In this manner, the discriminator coefficients are selected to minimize the steady-state variance at the discriminator
output. However, this optimization criterion is not taking into account that the discriminator
output is further lowpass filtered by the loop impulse response. For example, an exponential
filtering is carried out in case of a first-order closed-loop. When the discriminator output is temporally uncorrelated, this standard procedure is globally optimal and the estimator variance is
computed as the discriminator variance divided by the effective loop filter memory N ≈ 0.5/Bn
where Bn is the noise equivalent loop bandwidth. This case corresponds to the closed-loop estimator in Section 2.5.1 processing independent blocks zn . However, if the detected errors are
correlated because overlapped blocks of the received signal are processed, the estimator variance
is no longer divided by N and the standard procedure for designing the discriminator is suboptimal. Remember that overlapping is generally required to have efficient closed-loop estimators
(see Proposition 2.1).
In this section, the small-error variance of any quadratic NDA closed-loop estimator is formulated analytically. This expression is then optimized to find the optimal discriminator coefficients. Some numerical results for the timing estimation problem are provided comparing the
aforementioned design criteria. Notice that the formulation is absolutely general and can be applied to other uniparametric and multiparametric second-order estimation problems. Also, the
results in this section are useful in the context of open-loop estimation (Chapter 3) if the parameter estimates are post-filtered. In that case, the Bayesian expectation should be incorporated
into all the following expressions.
The output of any quadratic discriminator of α = g(θ) can be expressed as

e_n ≜ α̂_n − g(θ_o) = M^H (r̂_n − r_o)    (6.6)

where M are the discriminator coefficients under design, r̂_n is the (vectorized) sample covariance matrix for the n-th observed block and r_o is the expected value of r̂_n for any value of n. The sequence e_n is strict-sense stationary with zero mean and covariance M^H Q_0 M where

Q_0 ≜ E{(r̂_n − r_o)(r̂_n − r_o)^H}

is the covariance matrix of the quadratic observation r̂_n (3.10). The meaning of the subindex in Q_0 will be explained next. Let us remind the reader that the discriminator coefficients minimizing the variance of e_n were found in Section 4.2.
Let us consider now that h_n is the loop infinite impulse response. In that case, the estimation errors are given by

ε_n ≜ Σ_{k=0}^{∞} h_k e_{n−k} = Σ_{k=0}^{∞} h_k M^H (r̂_{n−k} − r_o) = M^H Σ_{k=0}^{∞} h_k (r̂_{n−k} − r_o)
that is a strict-sense stationary zero-mean sequence with covariance

E{ε_n ε_n^H} = M^H ( Σ_{m=−∞}^{∞} R_hh[m] Q_m ) M

where R_hh[m] ≜ Σ_{k=m}^{∞} h_k h_{k−m} is the autocorrelation function of the filter h_n and

Q_m ≜ E{(r̂_n − r_o)(r̂_{n−m} − r_o)^H}

stands for the "vectorial autocorrelation function" of the quadratic observation r̂_n evaluated at the m-th lag. Thus, Q_0 stands for Q_m at m = 0.
Notice that Q_m is defined for lags |m| ≤ D where D stands for the number of consecutive statistically-dependent blocks. In that way, the covariance of the estimation error is

E{ε_n ε_n^H} = M^H ( Σ_{m=−D}^{D} R_hh[m] Q_m ) M ≈ E_h M^H ( Σ_{m=−D}^{D} Q_m ) M

where E_h ≜ R_hh[0] = Σ_{k=0}^{∞} h_k² is the filter impulse response energy. In the last approximation, we have taken into account that the bandwidth of h_n is very small and, therefore, R_hh[m] is approximately flat for |m| ≤ D. Finally, notice that E_h = 1/N ≈ 2B_n where N and B_n are the effective loop memory and the noise equivalent loop bandwidth, respectively, assuming that Σ_{k=0}^{∞} h_k = 1 is verified to have unbiased estimates (Section 2.5.2).
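The relation E_h = 1/N ≈ 2B_n, and the fact that white (uncorrelated) detector errors are attenuated by the loop energy E_h, can be checked numerically for a hypothetical first-order loop h_k = γ(1 − γ)^k; all numerical values below are illustrative:

```python
import numpy as np

gamma = 0.02                              # first-order loop gain (illustrative)
k = np.arange(20000)
h = gamma * (1.0 - gamma) ** k            # loop impulse response, sum(h) = 1

Eh = np.sum(h ** 2)                       # loop energy: gamma / (2 - gamma)
N = 1.0 / Eh                              # effective loop memory

# noise equivalent bandwidth from the frequency response:
# 2 Bn |H(0)|^2 = int |H(f)|^2 df  with H(0) = sum(h) = 1, so 2 Bn = Eh
f = (np.arange(20000) - 10000) / 20000.0
H = gamma / (1.0 - (1.0 - gamma) * np.exp(-2j * np.pi * f))
Bn = (np.abs(H) ** 2).mean() / 2.0        # numerical integral over one period

# white detector errors: output variance = Eh * input variance = var / N
rng = np.random.default_rng(0)
e = rng.standard_normal(200_000)
eps = np.empty_like(e)
acc = 0.0
for n, en in enumerate(e):                # eps[n] = (1-g) eps[n-1] + g e[n]
    acc = (1.0 - gamma) * acc + gamma * en
    eps[n] = acc
ratio = eps[5000:].var() / e.var()        # close to Eh = 1/N ~= 2 Bn
```

When the detected errors of consecutive (overlapped) blocks are correlated, the output variance is no longer the input variance divided by N, which is precisely the motivation for the global optimization with Q_opt below.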
Hence, the variance of any quadratic (unbiased) closed-loop estimator is given by

E{ε_n ε_n^H} = M^H Q_opt M

where the fourth-order matrix Q_opt is given by

Q_opt ≜ Σ_{m=−D}^{D} R_hh[m] Q_m ≈ E_h Σ_{m=−D}^{D} Q_m = (1/N) Σ_{m=−D}^{D} Q_m

and, therefore, the optimal solution is the one deduced in Section 4.2 with Q_o = Q_opt instead of Q_o = Q_0. Thus, the optimal and original second-order discriminators are

M_opt = Q_opt^{−1} D_r (D_r^H Q_opt^{−1} D_r)^# D_g^H
M_0 = Q_0^{−1} D_r (D_r^H Q_0^{−1} D_r)^# D_g^H,

respectively.
Figure 6.2: Timing error variance with and without loop optimization for different number of
samples M in case of the QPSK modulation with roll-off 0.1 and Nss = 2. The same curves are
obtained for QAM and MPSK.
If R_ee[m] and S_ee(f) ≜ Σ_m R_ee[m] e^{−j2πfm} stand for the autocorrelation and the power spectrum of the error sequence e_n in (6.6), we can affirm that the optimal discriminator M_opt minimizes S_ee(0) = Σ_m R_ee[m] whereas the original discriminator M_0 minimizes R_ee[0] = ∫_{−1/2}^{1/2} S_ee(f) df. This means that the optimal discriminator should filter out the very low-frequency errors and let the loop filter cancel out the high-frequency errors. This fact becomes relevant at high SNR because the self-noise is actually a highpass disturbance.

Unfortunately, this desirable aim is severely limited by the unbiasedness constraint and minor gains have been observed for practical SNRs, at least for the symbol synchronization problem. In Fig. 6.2 and 6.3, it is shown how the self-noise can be reduced at high SNR in case of MPSK and QAM transmissions with small roll-off pulse shaping. On the other hand, if some bias is accepted, the discriminator could adopt a more highpass response in order to reduce the ultimate variance in low-SNR scenarios following the Bayesian formulation in Chapter 3.
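The spectral interpretation above can be illustrated numerically: two detector error sequences with the same instantaneous variance R_ee[0] but different spectra are passed through a narrow first-order loop; the highpass sequence (S_ee(0) = 0, mimicking self-noise) is almost completely removed, whereas the lowpass one survives. All values below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
w = rng.standard_normal(n + 1)
e_hp = (w[1:] - w[:-1]) / np.sqrt(2.0)   # highpass errors: See(0) = 0
e_lp = (w[1:] + w[:-1]) / np.sqrt(2.0)   # lowpass errors:  See(0) = 2
# both sequences share the same instantaneous variance Ree[0] = 1

gamma = 0.01                              # narrow first-order loop

def loop(e):
    """First-order loop eps[k] = (1-gamma) eps[k-1] + gamma e[k]."""
    out = np.empty_like(e)
    acc = 0.0
    for k, ek in enumerate(e):
        acc = (1.0 - gamma) * acc + gamma * ek
        out[k] = acc
    return out[10_000:]                   # discard the acquisition transient

v_hp = loop(e_hp).var()                   # ~ gamma^2 / 2: nearly removed
v_lp = loop(e_lp).var()                   # ~ 2 Eh: survives the loop
```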
6.1.5 Numerical Results
The carrier estimation problem was adopted in Sections 3.4 and 4.5 to illustrate the theory
of second-order optimal estimation in the field of digital communications. Simulations were
provided comparing the optimal solution with the classical ML-based estimators as well as the
Figure 6.3: Timing error variance with and without loop optimization for Nss equal to 4, 6 and
8 in case of the QPSK modulation with roll-off 0.1 and M=4.
unconditional and modified CRB. To complement these results, some simulations are presented
in this section for the problem of digital clock recovery.
Closed-loop timing synchronization
The optimal second-order timing estimator is compared to the GML and the low-SNR UML.
The NDA Early&Late [Men97, Sec. 8.5.2.], Gardner's [Gar86a] and Oerder&Meyr's [Oer88] synchronizers are also simulated because they are actually the most usual timing synchronizers in practical implementations (Fig. 6.4 and 6.5). Notice that the three algorithms are based on the low-SNR approximation and, therefore, they suffer from self-noise at high SNR. A first-order closed-loop is simulated with the (normalized) noise equivalent loop bandwidth set to
Bn = 5 × 10−3 (i.e., N = 100 symbols). All the Es /N0 values are simulated assuming that the
small-error condition is verified. Finally, the CML estimator is not simulated because, in the
considered scenario, there are more nuisance parameters than observed samples, i.e., M = 4 and
K = 11.
The Gaussian assumption is found to yield optimal timing synchronizers for those linear
modulations, such as QAM and MPSK, for all the simulated SNR (Fig. 6.4 and 6.5). On the
other hand, simulations for the MSK modulation have shown a minor improvement for medium-to-high SNRs [Vil01b], as illustrated in Fig. 6.6. In the same plot, the optimal fourth-order
Figure 6.4: Normalized timing variance for the low-SNR ML and GML estimators as well as the EL (Early&Late), OM (Oerder&Meyr) and Gardner's symbol synchronizers. The simulation parameters are: 16-QAM, roll-off 0.75, Nss=2 (Nss=4 for the Oerder&Meyr), M = 2Nss and Bn = 5 · 10^−3. The shaping pulse and the associated matched filter are truncated at ±5T.
detector designed in [Vil01b] is simulated showing that higher-order methods are only able to
outperform second-order techniques at high SNR.
Open-loop timing synchronization
Some simulations are also presented in Figs. 6.7-6.12 for the open-loop timing synchronizer.
The second-order minimum variance (Mvar) and MMSE (Mmse) estimators proposed in Chapter 3 are compared with the closed-loop estimator formulated in Chapter 4. The timing is estimated
from a burst of K = 4 symbols. Simulations are run for the 16-QAM and MSK modulations. In
the first case the transmitted pulse is a square-root raised cosine with roll-off 0.75 and duration
5T . The sampling rate is twice the symbol rate, i.e., Nss = 2. The normalized timing error is
modeled as a uniform random variable in the interval ±∆/2. Notice that in a TDMA system
the range of ∆ is extended to ±Ns with Ns the burst duration in symbols. The reason is that we
are actually dealing with a semiblind estimation problem since those symbols before and after
the burst are known to be null.
In Fig. 6.7 the normalized MSE is plotted as a function of the timing error for ∆ = 1,
K = 4 and Es/N0 = 10 dB. It can be seen that the MMSE is able to outperform the minimum
variance estimator because it is not forced to yield unbiased estimates within the prior range,
Figure 6.5: Normalized timing variance for the low-SNR ML and GML estimators as well as the EL (Early&Late), OM (Oerder&Meyr) and Gardner's symbol synchronizers. The simulation parameters are: 16-QAM, roll-off 0.25, Nss=2 (Nss=4 for the Oerder&Meyr), M = 2Nss and Bn = 5 · 10^−3. The shaping pulse and the associated matched filter are truncated at ±5T.
i.e., ±∆/2 = ±1/2. On the other hand, the closed-loop estimator is optimized for τ = 0 but its
performance degrades rapidly when the timing error approaches the prior limits at τ = ±∆/2.
The estimators' mean response as well as their squared bias are simulated in Fig. 6.8 and Fig.
6.9, respectively. It is shown that bias is easily cancelled for ∆ = 1 since it is a small fraction of
the burst duration, which is equal to 8 symbols in the simulated scenario.
Some additional conclusions can be drawn from these simulations:
• In noisy scenarios, the loss incurred by open-loop estimators becomes negligible when
compared to the performance of closed-loop estimators (Fig. 6.7). On the other hand,
closed-loop estimators are superior at high SNR as shown in Fig. 6.10.
• The minimum variance and MMSE open-loop estimators converge as the SNR increases for the MSK modulation (Fig. 6.11). On the other hand, self-noise is observed at
high SNR for the 16-QAM constellation (Fig. 6.12). In that case, the MMSE estimator
outperforms the minimum variance solution because it introduces some bias in order to
reduce the self-noise variance.
(Curves in Fig. 6.6: GML, CML and low-SNR ML; (Gaussian) UCRB; BQUE; 4th-order [Vil01]; MCRB.)
Figure 6.6: Normalized timing variance for the ML-based estimators for the MSK modulation with Nss=2, M=4 and Bn = 5 · 10^−3.
• The Gaussian assumption leads to suboptimal open-loop synchronizers at high SNR (Figs. 6.11 and 6.12). Regarding the MSK simulation, the Gaussian assumption prevents obtaining self-noise-free timing estimates (Fig. 6.11).
Some of the results in this section were presented for the first time in the IEEE International
Workshop on Statistical Signal Processing that was held in Singapore in 2001 [Vil01a]. This
work was further elaborated in [Vil02b] and presented in the IEEE Global Communications
Conference that was held in Taipei in 2002:
• "Best Quadratic Unbiased Estimator (BQUE) for Timing and Frequency Synchronization". J. Villares, G. Vázquez. Proceedings of the 11th IEEE International Workshop on Statistical Signal Processing (SSP01). pp. 413-416. Singapore. August 2001.

• "Sample Covariance Matrix Based Parameter Estimation for Digital Synchronization". J. Villares, G. Vázquez. Proceedings of the IEEE Global Communications Conference 2002 (Globecom 2002). November 2002. Taipei (Taiwan).
(Curves in Fig. 6.7: Mmse, MSE=0.029; Mvar, MSE=0.068; closed-loop, MSE=0.043.)
Figure 6.7: Normalized timing MSE for the MMSE, minimum variance and closed-loop second-order estimators for K = 4 and Es/No = 10 dB. The average MSE for the three estimators is included inside round brackets.
Figure 6.8: Estimator mean response for the MMSE, minimum variance and closed-loop second-order estimators for K = 4 and Es/No = 10 dB.
(Curves in Fig. 6.9: closed-loop, BIAS=0.036; Mmse, BIAS=0.012; Mvar, BIAS≈0.)
Figure 6.9: Normalized timing squared bias for the MMSE, minimum variance and closed-loop
second-order estimators for K = 4 and Es/No=10dB. The average BIAS for the three estimators
is included inside round brackets.
(Curves in Fig. 6.10: Mvar, MSE=0.015; Mmse, MSE=0.012; closed-loop, MSE=0.040.)
Figure 6.10: Normalized timing MSE for the MMSE, minimum variance and closed-loop second-order estimators for K = 4 and Es/No = 40 dB. The average MSE for the three estimators is included inside round brackets.
Figure 6.11: Normalized timing MSE for the MMSE (Mmse) and minimum variance (Mvar) second-order estimators for the MSK modulation when K = 4 and ∆ = 1. The suboptimal estimators deduced under the Gaussian assumption are also plotted. [Plot omitted: normalized MSE vs. Es/No (dB), including a σ²τ reference curve.]
Figure 6.12: Normalized timing MSE for the MMSE (Mmse) and minimum variance (Mvar) second-order estimators for the 16-QAM modulation when K = 4 and ∆ = 1. The suboptimal estimators deduced under the Gaussian assumption are also plotted. [Plot omitted: normalized MSE vs. Es/No (dB), including a σ²τ reference curve.]
6.2 Carrier Phase Synchronization of Noncircular Modulations
Coherent demodulation of continuous phase modulations (CPM) requires knowledge of the phase
and frequency of the received carrier. Self-synchronizing techniques are normally preferred
because they avoid the transmission of inefficient training sequences. Moreover, non-data-aided
(NDA) algorithms are more appropriate in noisy scenarios because they do not rely on unreliable
decisions but on the statistical structure of the received waveform [Men97].
In the synchronization field, the Laurent expansion (LE) has been frequently used to derive synchronization techniques for CPM receivers [Men97][Vaz00][Mor00]. The LE is interesting because it allows expressing the non-linear CPM format as the summation of a finite number of pulse amplitude modulated (PAM) signals [Lau86][Men95]. Thus, all the extensive literature on synchronization and parameter estimation for linearly modulated signals can be reused [Rib01b]. On the other hand, the LE allows building scalable schemes considering only the most powerful components of the decomposition [Mor00].
Focusing on the carrier phase estimation problem, Mengali et al. derived in [Men97, Sec.
6.6.2] the ML NDA carrier phase synchronizer under the low SNR assumption for MSK-type
modulations (e.g., MSK, LREC, LRC, GMSK) [Men97]. The obtained solution was shown to
be quadratic in the data. This is actually a unique feature of MSK-type signals because higher
order techniques are required for NDA carrier phase synchronization in case of linear modulations
[Ser01][Moe94] as well as general CPM signals.
Based on the LE, this property can be justified because the pseudo-symbols are not circular
[Pic94] in case of MSK-type modulations and, therefore, the square of the received signal is not
zero-mean and offers information about the parameter of interest [Moe94]. Finally, note that
the N-th power synchronizer studied in [Moe94] can still be applied to MSK-type modulations
although it will be inefficient at low SNR, as stated previously, and will not attain the Cramér-Rao bound (CRB) either when the SNR tends to infinity because CPM modulations suffer from
intersymbol interference (ISI).
From this background, in Section 6.2.2 the low-SNR ML estimator has been reformulated using vectorial notation and the Laurent decomposition. The subsequent analysis of the low-SNR approximation at high SNR in Section 6.2.3 reveals the existence of a significant variance floor due to the so-called self-noise, that is, the variability caused by the modulation itself in NDA schemes. This floor is inappreciable when the observation is sufficiently large but it is determinant for short samples.
This drawback motivated the design of second-order self-noise free schemes minimizing the
aggregated contribution of thermal plus pattern noise for a given SNR. The proposed second-order optimal synchronizer is deduced in Section 6.2.4 and its asymptotic study is presented in
Section 6.2.5, concluding that, with partial-response signals, some data patterns make the carrier phase unidentifiable if self-noise-corrupted estimates are not tolerated. The estimator failure has been related to the singularity of the modulating matrix in partial-response signals. In any case, although the above circumstance might slow down the parameter acquisition in a closed-loop implementation, self-noise free estimates are guaranteed after convergence. To conclude, the above statements are checked via simulation in Section 6.2.6.
The results of this section were presented in the IEEE International Conference on Communications that was held in Paris in 2004 [Vil04b]:
• “Self-Noise Free Second-Order Carrier Phase Synchronization of MSK-Type Signals”, J. Villares, G. Vázquez, Proc. of the IEEE International Conference on Communications (ICC 2004). June 2004. Paris (France).
6.2.1 Signal Model
The Laurent expansion (LE) allows the representation of binary CPM signals as the sum of a few PAM waveforms [Lau86][Men97]. This transformation is adopted in this section in order to formulate carrier phase synchronizers for the nonlinear CPM format. It was shown in Section 6.1.2 that the complex envelope of the sampled CPM signal is given by

\[
\mathbf{y} = e^{j\theta_o}\sum_{j=0}^{J-1}\mathbf{A}_j\mathbf{x}_j + \mathbf{w} = e^{j\theta_o}\mathbf{A}\mathbf{x} + \mathbf{w} \tag{6.7}
\]
where θ_o is the unknown carrier phase that must be estimated, x_j the pseudo-symbols from the j-th component of the Laurent expansion having contribution into the observation y, A_j the associated modulating matrix formed from the j-th pseudo-pulse coefficients and w the vector of AWGN samples. The J components of the LE are stacked in the following manner:

\[
\mathbf{x} = \left[\mathbf{x}_0^T,\ldots,\mathbf{x}_{J-1}^T\right]^T \qquad \mathbf{A} = \left[\mathbf{A}_0,\ldots,\mathbf{A}_{J-1}\right].
\]
In order to simplify the study, the following assumptions are made in what follows: 1) the receiver has perfect timing and frequency-offset synchronization; 2) the CPM modulator has reached the steady state; 3) the focus is on MSK-type signals, for which the modulation index is h = 0.5 and hence the carrier phase shifts are equal to ±π/2; 4) θ_o ∈ (−π/2, π/2] in order to avoid the inherent ambiguity of quadratic methods [Men97]. Additionally, the study is carried out for a continuous transmission system as explained in Section 6.1.2. This point is especially relevant because some of the concluding remarks are a consequence of the continuous mode model.
6.2.2 NDA ML Estimation in Low-SNR Scenarios
In this section, the ML principle is applied to find the optimum estimator of θ_o when the SNR is asymptotically low. As stated in Section 2.3, the ML estimator is the maximizer of the following cost function:

\[
f_y(\mathbf{y};\theta) = C\,E_x\left\{\exp\left(-\sigma_w^{-2}\left\|\mathbf{y}-e^{j\theta}\mathbf{A}\mathbf{x}\right\|^2\right)\right\} \propto E_x\left\{\exp\left(\sigma_w^{-2}\,\chi(\mathbf{y};\mathbf{x},\theta)\right)\right\} \tag{6.8}
\]

where C is an irrelevant constant, E_x{·} the expectation with respect to the pseudo-symbols distribution, σ_w² the variance of the noise samples, and

\[
\chi(\mathbf{y};\mathbf{x},\theta) \triangleq 2\operatorname{Re}\left(e^{-j\theta}\mathbf{x}^H\mathbf{A}^H\mathbf{y}\right) \tag{6.9}
\]

is the term in the exponent of (6.8) that depends on θ.
Unfortunately, the expectation with respect to x normally complicates the calculation of a closed form for f_y(y; θ). To overcome this obstacle, the likelihood function (6.8) is usually evaluated assuming that the SNR tends to zero, that is, σ_w² → ∞. Following this reasoning, Mengali deduced in [Men97, Sec. 6.6.2] the low-SNR ML estimator of θ_o directly from the angular signal model in case of MSK-type signals.
Next, an alternative deduction is provided from the vectorial model in Section 6.2.1. In
contrast to [Men97, Sec. 6.6.2], the obtained ML solution is exact even if the observation is
short. Notice too that [Men97, Sec. 6.6.2] approximates the squared CPM signal (averaged
with respect to the data) by means of the first harmonic of its Fourier series in order to yield a
low-cost implementation based on transversal filtering. It can be shown that this approximation
is only exact for LREC signals in which the frequency pulse is rectangular.
The deduction is initiated by expanding the logarithm of (6.8) in a Taylor series at σ_w^{−2} = 0, having that

\[
\ln f_y(\mathbf{y};\theta) \simeq \tfrac{1}{2}\,\sigma_w^{-4}\,E_x\left\{\chi^2(\mathbf{y};\mathbf{x},\theta)\right\}
\]

except for some irrelevant additive constants (see Section 2.4.1). Then, computing the above expectation, it results that

\[
\ln f_y(\mathbf{y};\theta) \simeq \sigma_w^{-4}\operatorname{Re}\operatorname{Tr}\left(\mathbf{R}^H(\theta)\widehat{\mathbf{R}}\right) \tag{6.10}
\]

where the improper sample covariance matrix,

\[
\widehat{\mathbf{R}} \triangleq \mathbf{y}\mathbf{y}^T, \tag{6.11}
\]

constitutes a sufficient statistic for the estimation of θ_o in the studied low-SNR scenario (Section 2.4.1) and

\[
\mathbf{R}(\theta) \triangleq E\left\{\widehat{\mathbf{R}}\right\} = e^{j2\theta}\,\mathbf{A}\boldsymbol{\Gamma}\mathbf{A}^T
\]
stands for its expected value evaluated at θ, with Γ ≜ E_x{x x^T} the improper covariance matrix of the pseudo-symbols.
Notice that the quadratic form Tr(R^H(θ)R̂) = y^T R^H(θ) y is the minimal sufficient statistic [Kay93b] in the studied low-SNR scenario. It is worth noting that it is possible to estimate the carrier phase from the second-order statistics because the pseudo-symbols x do not hold the circular property [Pic94] (i.e., Γ ≠ 0), in contrast to what happens in case of linear modulations.
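The role of noncircularity can be illustrated numerically. The sketch below (a toy check, not part of the thesis) shows that the improper second moment E{x²} is non-zero for real-valued (noncircular) symbols while it vanishes for circular QPSK symbols:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100_000

# Noncircular symbols (real-valued, BPSK-like): E{x x^T} != 0
x_real = rng.choice([-1.0, 1.0], size=N)

# Circular QPSK symbols: E{x x^T} = 0, only E{x x^H} is informative
x_qpsk = rng.choice(np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2), size=N)

improper_real = np.mean(x_real * x_real)   # close to 1 (noncircular)
improper_qpsk = np.mean(x_qpsk * x_qpsk)   # close to 0 (circular)
print(improper_real, abs(improper_qpsk))
```

Only the noncircular constellation leaves phase information in the squared signal, which is why a quadratic estimator works for MSK-type formats.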
Finally, the log-likelihood gradient is given by

\[
\nabla(\mathbf{y};\theta) \triangleq \frac{\partial}{\partial\theta}\ln f_y(\mathbf{y};\theta) = 2\sigma_w^{-4}\operatorname{Im}\operatorname{Tr}\left(\mathbf{R}^H(\theta)\widehat{\mathbf{R}}\right) \tag{6.12}
\]

which is in quadrature with the likelihood function in (6.10) and vanishes for

\[
\hat{\theta} = \frac{1}{2}\arg\operatorname{Tr}\left(\mathbf{R}^H\widehat{\mathbf{R}}\right) = \frac{1}{2}\arg\left\{\hat{\mathbf{x}}^T\boldsymbol{\Gamma}\hat{\mathbf{x}}\right\} \tag{6.13}
\]

where x̂ ≜ A^H y stands for the detected pseudo-symbols at the matched filter output [Vaz00].
The existence of an analytical solution is exceptional and an iterative algorithm is normally required to seek the maximum of the log-likelihood function (e.g., in timing and carrier frequency synchronization [Vaz00][Men97]). In any case, even if we have a closed-form solution (6.13), gradient-based algorithms allow the design of closed-loop schemes for tracking the parameter of interest in time-varying scenarios (Section 2.5). In that case, the CRB theory (Section 2.3) guarantees that the following recursion

\[
\hat{\theta}_{n+1} = \hat{\theta}_n + \mathcal{I}^{-1}(\hat{\theta}_n)\,\nabla(\mathbf{y};\hat{\theta}_n) \tag{6.14}
\]

attains asymptotically (M → ∞) the CRB after convergence to the true parameter, i.e., θ̂_n ≃ θ_o [Kay93b]. Hence, the asymptotic variance of both the open-loop estimator in (6.13) and its closed-loop implementation in (6.14) is given by

\[
\operatorname{var}(\hat{\theta}) \triangleq E\left\{\left(\hat{\theta}-\theta_o\right)^2\right\} = \mathcal{I}^{-1} \triangleq \mathrm{CRB} \tag{6.15}
\]

where

\[
\mathcal{I} \triangleq -E\left\{\left.\frac{\partial}{\partial\theta}\nabla(\mathbf{y};\theta)\right|_{\theta=\theta_o}\right\} = E\left\{\nabla^2(\mathbf{y};\theta_o)\right\} = 4\sigma_w^{-4}\operatorname{Tr}\left(\mathbf{R}^H\mathbf{R}\right) \tag{6.16}
\]

stands for the Fisher information [Kay93b] at low SNR, which is found to be independent of θ_o. Notice that the second-order derivative computed in (6.16) normalizes the scoring algorithm in (6.14) to yield unbiased estimates in the small-error regime (θ̂_n ≃ θ_o).
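The normalized scoring recursion (6.14) can be sketched in the simplest possible setting. The snippet below is an illustration only: it assumes A = I and real ±1 pseudo-symbols, so that Γ = I, Tr(R^H(θ)R̂) = e^{−j2θ} Σ_k y_k² and the Fisher information reduces to I = 4σ_w^{−4}M; all parameter values are invented:

```python
import numpy as np

rng = np.random.default_rng(2)
M = 5_000
theta_o = 0.4
sigma_w = 0.5

x = rng.choice([-1.0, 1.0], size=M)
w = sigma_w / np.sqrt(2) * (rng.standard_normal(M) + 1j * rng.standard_normal(M))
y = np.exp(1j * theta_o) * x + w

S = np.sum(y ** 2)     # Tr(R^H(0) R_hat); the theta-dependence is the e^{-j2theta} rotation
theta = 0.0            # cold start
for _ in range(50):
    # I^{-1} * gradient: the sigma_w^{-4} factors of (6.12) and (6.16) cancel
    theta += np.imag(np.exp(-2j * theta) * S) / (2 * M)
print(theta)
```

The normalization by the Fisher information makes the loop converge to the zero of the gradient, i.e., to the closed-form solution (6.13), without any hand-tuned step size.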
Remark: equation (6.15) predicts the variance of the open-loop estimator in (6.13) if and only if the asymptotic (or small-error) condition holds true and, thus, (6.13) works in the linear region of the arg{·} function. Otherwise, (6.13) becomes biased and the CRB theory fails. For instance, at low SNR, the CRB is proportional to σ_w⁴ (6.15) whereas the variance of (6.13) is limited to π²/12 bearing in mind that |θ̂| < π/2 (Fig. 6.13). Nonetheless, the small-error assumption always applies at high SNR, even for short samples.
6.2.3 High-SNR Analysis: Self-noise
The main drawback of the low-SNR approximation is that it usually suffers from self-noise at high SNR when the sample is finite [Vaz00]. The reason is that the pseudo-pulses of the Laurent expansion are not ISI-free, that is, A^H A ≠ I_K, even in case of full-response CPM formats such as MSK. Consequently, the variance of (6.13) presents a high-SNR floor. In this section, this floor is characterized and, afterwards, second-order self-noise free phase synchronizers are designed in Section 6.2.4.
First of all, let us compute the asymptotic variance of (6.13) for an arbitrary value of the SNR. This is done evaluating the variance of (6.14) in the steady state (θ̂_n ≃ θ_o) obtaining, after some tedious manipulations (Appendix 6.A), that

\[
\operatorname{var}(\hat{\theta}) = \mathcal{I}^{-2}\,E\left\{\nabla^2(\mathbf{y};\theta_o)\right\} = 2\sigma_w^{-8}\,\mathcal{I}^{-2}\,\mathbf{r}^H\mathbf{Q}\mathbf{r} = \frac{\mathbf{r}^H\mathbf{Q}\mathbf{r}}{8\operatorname{Tr}^2\left(\mathbf{R}^H\mathbf{R}\right)} \tag{6.17}
\]

where r ≜ vec(R) stands for the column-wise stacking of R, and Q is the fourth-order moments matrix given by

\[
\mathbf{Q} = 2\,\bar{\mathbf{R}}\otimes\bar{\mathbf{R}} + \bar{\mathbf{A}}\widetilde{\mathbf{K}}\bar{\mathbf{A}}^T, \tag{6.18}
\]

that extends the formulation in Chapter 3 to noncircular constellations with the following set of definitions:²

\[
\begin{aligned}
\bar{\mathbf{R}} &\triangleq E\left\{\mathbf{y}\mathbf{y}^H\right\} = \mathbf{A}\mathbf{A}^H + \sigma_w^2\,\mathbf{I}_M \\
\bar{\mathbf{A}} &\triangleq \mathbf{A}\otimes\mathbf{A} \\
\widetilde{\mathbf{K}} &\triangleq \widetilde{\boldsymbol{\Gamma}} - 2\mathbf{P} \\
\mathbf{P} &\triangleq \tfrac{1}{2}\left(\mathbf{I}_{K^2}+\mathbf{K}\right) \\
\widetilde{\boldsymbol{\Gamma}} &\triangleq E_x\left\{\operatorname{vec}\left(\mathbf{x}\mathbf{x}^T\right)\operatorname{vec}^H\left(\mathbf{x}\mathbf{x}^T\right)\right\} - E_x\left\{\operatorname{vec}\left(\mathbf{x}\mathbf{x}^T\right)\right\}E_x\left\{\operatorname{vec}^H\left(\mathbf{x}\mathbf{x}^T\right)\right\}
\end{aligned} \tag{6.19}
\]
where K is the commutation matrix that is implicitly defined as the matrix holding that vec(X^T) = K vec(X) for any matrix X [Mag98, Sec. 3.7]. Likewise, P is the orthogonal projector onto the subspace that contains the vectorization of any symmetric matrix, i.e., vec(X) with X = X^T [Mag98, Sec. 3.7]. It can be shown that both r and Γ̃ lie in this subspace. The matrix Γ̃ is specific to the actual CPM format and can be calculated numerically. In case of MSK-type modulations, this task is simplified because [Γ̃]_{i,j} ∈ {0, ±2}.
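For reference, the commutation matrix K and the projector P = ½(I + K) can be built explicitly. The following sketch (illustrative; it assumes the column-major vec(·) convention) verifies the defining property vec(X^T) = K vec(X), the idempotency of P, and that P vec(X) is the vectorization of the symmetric part of X:

```python
import numpy as np

def commutation_matrix(n):
    """K such that vec(X.T) = K @ vec(X) for n x n X (column-major vec)."""
    K = np.zeros((n * n, n * n))
    for i in range(n):
        for j in range(n):
            # vec(X)[i + j*n] = X[i, j]; vec(X.T)[i + j*n] = X[j, i] = vec(X)[j + i*n]
            K[i + j * n, j + i * n] = 1.0
    return K

n = 3
K = commutation_matrix(n)
P = 0.5 * (np.eye(n * n) + K)            # projector onto vectorized symmetric matrices

X = np.arange(9.0).reshape(n, n)
vecX = X.ravel(order="F")                # column-major stacking

ok_commutation = np.allclose(K @ vecX, X.T.ravel(order="F"))
ok_projector = np.allclose(P @ P, P)     # P is idempotent since K @ K = I
ok_symmetric = np.allclose(P @ vecX, (0.5 * (X + X.T)).ravel(order="F"))
print(ok_commutation, ok_projector, ok_symmetric)
```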
Therefore, if (6.17) is evaluated in the noiseless case, one finds that the self-noise variance causing the high-SNR floor is equal to

\[
\lim_{\sigma_w^2\to 0}\operatorname{var}(\hat{\theta}) = \frac{\mathbf{r}^H\bar{\mathbf{A}}\widetilde{\boldsymbol{\Gamma}}\bar{\mathbf{A}}^H\mathbf{r}}{8\operatorname{Tr}^2\left(\mathbf{R}^H\mathbf{R}\right)}. \tag{6.20}
\]
²The reader is warned that some notation is slightly redefined in this section. For example, R(θ) is the improper covariance matrix; the conjugation is omitted in Ā; matrix K̃ is redefined in (6.19); and, finally, Q is the covariance matrix of the new sufficient statistic vec(R̂) (6.11).
As stated before, the estimator would be self-noise free in the absence of ISI (A^H A = I_K) because in that case Ā^H r = vec(Γ) and, thus, the term Γ̃ vec(Γ) in (6.20) becomes equal to zero for most noncircular modulations of interest, e.g., real-valued constellations, MSK-type signals as well as the offset QPSK format. In any case, the estimator is consistent for any SNR since the self-noise variance in (6.20) turns out to be proportional to M⁻¹ for M ≫ 1. For example, the simulations in [Men97, Sec. 6.6.2] show that the variance curvature is practically inappreciable below SNR = 20 dB with M = 100 for the MSK and GMSK modulations.
6.2.4 Second-Order Optimal Estimation
The aim of this section is to deduce optimal second-order synchronization techniques for the
whole SNR range. Assuming the noise variance is known (or accurately estimated), the proposed
estimator will minimize the joint contribution of thermal and pattern noise, leading to the
previous ML solution (6.13) when the SNR is sufficiently low and to self-noise free schemes at
high SNR (Section 6.2.5).
With this purpose, let us introduce the equation of a generic second-order gradient following the structure provided by (6.12) under the low-SNR assumption:

\[
\Delta(\mathbf{y};\theta) \triangleq 2\operatorname{Im}\left\{e^{-j2\theta}\operatorname{Tr}\left(\mathbf{M}^H\widehat{\mathbf{R}}\right)\right\} = 2\operatorname{Im}\left\{e^{-j2\theta}\,\mathbf{m}^H\hat{\mathbf{r}}\right\} \tag{6.21}
\]

where M is the matrix of coefficients that should be optimized, m ≜ vec(M) its vectorization and r̂ ≜ vec(R̂) the vectorization of (6.11).
The value of θ for which (6.21) is null is given by

\[
\hat{\theta} = \frac{1}{2}\arg\operatorname{Tr}\left(\mathbf{M}^H\widehat{\mathbf{R}}\right) = \frac{1}{2}\arg\left\{\mathbf{m}^H\hat{\mathbf{r}}\right\} \tag{6.22}
\]

provided that m^H r̂ ≠ 0. Otherwise, the open-loop algorithm in (6.22) is unable to extract any phase information from this specific r̂. This fact will be studied in detail in Section 6.2.5 because it is only relevant at high SNR. For the moment, (6.22) is assumed to be “well-conditioned”.
Another important remark is that the estimation problem at hand allows obtaining a closed-form solution for the zero of ∆(y; θ). However, notice that the open-loop estimator proposed in (6.22) is not quadratic in the data due to the arg{·} operator. Therefore, the estimation techniques studied in this section should be seen as a nonlinear transformation of the sample covariance matrix R̂ = yy^T, which is only a sufficient statistic under the low-SNR approximation. Thus, the variance of (6.22) in the small-error regime is given by

\[
\operatorname{var}(\hat{\theta}) = J^{-2}\,E\left\{\Delta^2(\mathbf{y};\theta_o)\right\} = 2J^{-2}\,\mathbf{m}^H\mathbf{P}\mathbf{Q}\mathbf{m}, \tag{6.23}
\]
that can be seen as a generalization of (6.17) with J defined as the gradient slope at θ_o, that is:

\[
J \triangleq -E\left\{\left.\frac{\partial}{\partial\theta}\Delta(\mathbf{y};\theta)\right|_{\theta=\theta_o}\right\} = 4\operatorname{Re}\left\{\mathbf{m}^H\mathbf{r}\right\}. \tag{6.24}
\]

Notice that J plays the same role as the Fisher information in (6.14)-(6.16), that is, it normalizes the recursion in (6.14) to yield unbiased estimates around the true parameter θ_o.
The optimal coefficients are obtained minimizing the estimator variance (6.23) subject to the above bias constraint (6.24). This optimization leads to an underdetermined system of equations, and

\[
\mathbf{m} = \frac{J\,\mathbf{Q}^{-1}\mathbf{r}}{4\,\mathbf{r}^H\mathbf{Q}^{-1}\mathbf{r}} \tag{6.25}
\]

is found to be the minimum-norm solution. In any case, all the solutions are found to yield the same variance, which is equal to

\[
\operatorname{var}(\hat{\theta}) = J^{-1} = \frac{1}{8\,\mathbf{r}^H\mathbf{Q}^{-1}\mathbf{r}} \tag{6.26}
\]

plugging (6.25) into (6.23). Eventually, the coefficients of the optimal estimator are given by m = 2Q⁻¹r.

Finally, note that all the above expressions reduce to the ones obtained in Section 6.2.2 under the low-SNR assumption (σ_w² → ∞) taking into account that

\[
\mathbf{Q}^{-1} = \frac{1}{2}\,\sigma_w^{-4}\,\mathbf{I}_{M^2} + o\left(\sigma_w^{-4}\right) \tag{6.27}
\]

where o(σ_w^{−4}) gathers all the terms converging to zero faster than σ_w^{−4}.
Notice that the GML estimator has not been considered in the carrier phase estimation
problem because the Gaussian assumption also implies the circularity of the nuisance parameters.
6.2.5 High-SNR Study: Self-noise
This section is concerned with the high-SNR study of the optimal second-order synchronizer deduced in the last section. Although the analysis is more involved than in Section 6.2.3, closed-form expressions have been obtained concluding that self-noise can be totally removed. Nonetheless, in the case of partial-response schemes (e.g., GMSK) the open-loop implementation (6.22) may fail when m^H r̂ = m^H Ā vec(xx^T) = 0 in the noiseless case. When this happens, the carrier phase is not identifiable from this particular observation r̂. The reason for this abnormal behavior is that it is not always possible to cancel out the imaginary part of the argument (self-noise) while the real part is kept positive (6.22). For example, when the binary data symbols alternate, i.e., {+1, −1, +1, ...}, the 2REC modulation exhibits a constant phase equal to ±π/4 and, thus, r̂ = Ā vec(xx^T) is strictly imaginary in the noiseless case.
A deeper analysis shows that this limitation is a consequence of the singularity of matrix A for partial-response modulations. However, this conclusion needs to be clarified; the singularity of A is due to the partial contribution from the pseudo-symbols outside the observation window in the studied SCPC system (Section 6.1.2). Therefore, only in the asymptotic case (M → ∞) is this “border effect” negligible and matrix A effectively full-rank.
The asymptotic study of (6.26) involves the computation of Q⁻¹ when the noise power tends to zero. Because the noiseless component of Q is singular (6.19), we must resort to the inversion lemma obtaining that

\[
\mathbf{m} = 2\mathbf{Q}^{-1}\mathbf{r} = \boldsymbol{\mathcal{R}}^{-1}\left[\mathbf{I}-\mathbf{V}\left(2\boldsymbol{\Sigma}^{-1}+\mathbf{V}^H\boldsymbol{\mathcal{R}}^{-1}\mathbf{V}\right)^{-1}\mathbf{V}^H\boldsymbol{\mathcal{R}}^{-1}\right]\mathbf{r} \tag{6.28}
\]

where \(\boldsymbol{\mathcal{R}} \triangleq \bar{\mathbf{R}}\otimes\bar{\mathbf{R}}\) and VΣV^H is the “economy-size” diagonalization of Ā K̃ Ā^H (6.19), i.e., Σ only contains the non-zero eigenvalues and V the associated eigenvectors.

Using again the inversion lemma, the high-SNR asymptotic value of R̄⁻¹ can be expanded in terms of σ_w², yielding

\[
\bar{\mathbf{R}}^{-1} = \sigma_w^{-2}\,\mathbf{P}_A^{\perp} + \mathbf{B} - \sigma_w^2\,\mathbf{B}^2 + O\left(\sigma_w^4\right) \tag{6.29}
\]

where O(σ_w⁴) contains all the terms converging to zero as σ_w⁴ or faster, P_A^⊥ ≜ I − AA^# stands for the orthogonal projector onto the orthogonal complement of the span of matrix A, and

\[
\mathbf{B} \triangleq \left(\mathbf{A}\mathbf{A}^H\right)^{\#}
\]

is introduced to compact further equations.
is introduced to compact further equations. Thus, the limit of R−1 is straightforward from
(6.29), using that
−1
R−1 = R
−1
⊗R
.
(6.30)
However, all the terms in (6.30) containing P⊥
A go to zero when multiplied by V or r in
(6.28) since span{V} ⊂ span{A} and r = A vec(Γ) ∈ span{A}. Therefore, considering only the
surviving terms, it is found that
R−1 = B − σ2w B ⊗ B2 + B2 ⊗ B + O σ 2w
(6.31)
for σ2w → 0 where B B ⊗ B = (AAH )# .
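The expansion (6.29) is easy to check numerically: with R̄ = AA^H + σ_w²I_M, the residual of the three-term approximation should decay as O(σ_w⁴), so halving σ_w² should divide it by roughly four. A sketch with a random A (illustrative sizes):

```python
import numpy as np

rng = np.random.default_rng(4)
M, K = 8, 3
A = rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))

B = np.linalg.pinv(A @ A.conj().T)           # B = (A A^H)^#
P_perp = np.eye(M) - A @ np.linalg.pinv(A)   # projector onto the complement of span{A}

def residual(sigma2):
    Rbar = A @ A.conj().T + sigma2 * np.eye(M)
    approx = P_perp / sigma2 + B - sigma2 * (B @ B)   # three terms of (6.29)
    return np.linalg.norm(np.linalg.inv(Rbar) - approx)

e1, e2 = residual(1e-2), residual(0.5e-2)
print(e1 / e2)   # roughly 4, since the residual scales with (sigma_w^2)^2
```

On the noise subspace the approximation is exact; the residual comes entirely from the signal-subspace eigenvalues, where it behaves as σ_w⁴/λ³.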
To complete the deduction, the inversion lemma has to be used once again in order to compute the inner inverse in (6.28) because

\[
\mathbf{T} \triangleq 2\boldsymbol{\Sigma}^{-1} + \mathbf{V}^H\boldsymbol{\mathcal{B}}\mathbf{V}
\]
turns out to be singular again. Precisely because of that, the second term of (6.28) becomes proportional to σ_w^{−2} and prevails at high SNR, avoiding the variance floor. Taking this fact into account, the high-SNR asymptotic expression of m is given by

\[
\mathbf{m} = \sigma_w^{-2}\,\boldsymbol{\mathcal{B}}\mathbf{V}\mathbf{P}_T^{\perp}\mathbf{V}^H\boldsymbol{\mathcal{B}}\,\mathbf{r} + O(1) \tag{6.32}
\]

with

\[
\begin{aligned}
\mathbf{P}_T^{\perp} &\triangleq \mathbf{U}^{-1}\left[\mathbf{I}-\mathbf{V}_T\left(\mathbf{V}_T^H\mathbf{U}^{-1}\mathbf{V}_T\right)^{-1}\mathbf{V}_T^H\mathbf{U}^{-1}\right] \\
\mathbf{U} &\triangleq \mathbf{V}^H\left(\mathbf{B}\otimes\mathbf{B}^2+\mathbf{B}^2\otimes\mathbf{B}\right)\mathbf{V}.
\end{aligned} \tag{6.33}
\]
The above expression is general no matter if A is singular or not. In case A is full-rank, (6.32) can be simplified if the deduction is started again by decomposing the first term of Q as Ā V_K Σ_K V_K^H Ā^H, with Σ_K the diagonal matrix having the non-zero eigenvalues of K̃ and V_K the related eigenvectors. Thus, if V and Σ are redefined as V = Ā V_K and Σ = Σ_K, respectively, (6.32) can be written as follows:

\[
\mathbf{m} = \sigma_w^{-2}\,\bar{\mathbf{A}}^{\#H}\mathbf{V}_K\mathbf{P}_T^{\perp}\mathbf{V}_K^H\bar{\mathbf{A}}^{\#}\,\mathbf{r} + O(1) \tag{6.34}
\]

using that Ā^H \(\boldsymbol{\mathcal{B}}\) = Ā^# and T = 2Σ_K^{−1} + I.
At this point, it is worth understanding that the obtained solution differs from the standard CML estimator (Section 2.4.2) in that (6.34) cannot project the self-noise term onto the orthogonal subspace of A because they are collinear [Vaz00][Rib01b]. Alternatively, in (6.34) the received signal y is passed through a zero-forcing equalizer A^# in order to decorrelate the received pseudo-symbols [Vaz00] and, afterwards, the outer product of the detected pseudo-symbols is projected onto the matrix V_K P_T^⊥ V_K^H, whose span coincides with that of P_Γ̃^⊥, the orthogonal projector onto the subspace generated by Γ̃ (6.19). The key property of P_Γ̃^⊥, inherited by V_K P_T^⊥ V_K^H, is that P_Γ̃^⊥ vec(xx^T) is real-valued for any possible vector x.
Resuming the initial discussion, if A is full-rank, the zero-forcer recovers the vector of pseudo-symbols without error, i.e., x̂ = A^# y = x, and V_K P_T^⊥ V_K^H is able to eliminate the imaginary part of vec(xx^T) that causes the referred self-noise while the real part is preserved, allowing feed-forward estimation (6.22). Otherwise, the real and imaginary parts are coupled and the self-noise cancellation inevitably attenuates the real part too.
To conclude this section, the estimator variance at high SNR is given by

\[
\operatorname{var}(\hat{\theta}) = \frac{\sigma_w^2}{4\operatorname{vec}^H(\boldsymbol{\Gamma})\,\bar{\mathbf{A}}^{\#}\mathbf{V}\mathbf{P}_T^{\perp}\mathbf{V}^H\bar{\mathbf{A}}^{\#H}\operatorname{vec}(\boldsymbol{\Gamma})} + o\left(\sigma_w^2\right) \tag{6.35}
\]

for the general case (6.32) and reduces to

\[
\operatorname{var}(\hat{\theta}) = \frac{\sigma_w^2}{4\operatorname{vec}^H(\boldsymbol{\Gamma})\,\mathbf{V}_K\mathbf{P}_T^{\perp}\mathbf{V}_K^H\operatorname{vec}(\boldsymbol{\Gamma})} + o\left(\sigma_w^2\right) \tag{6.36}
\]

when A is full-rank (6.34). Notice that in both cases the estimators are consistent, i.e., the denominator increases without limit as M is augmented.
Figure 6.13: Carrier phase variance as a function of the SNR for the MSK modulation with Nss = 2. Dotted lines correspond to the high-SNR bounds computed in Sections 6.2.3 and 6.2.5. [Plot omitted: carrier phase variance vs. Es/No (dB) for the low-SNR approximation and the optimal quadratic estimator (M = 2, 4), together with the MCRB for M = 4.]
6.2.6 Numerical Results
This section validates via simulation the theoretical results presented in this case study. The
steady-state variance of the closed-loop solution presented in equation (6.14) is evaluated for both
the low-SNR approximation and the optimal second-order synchronizer deduced in Section 6.2.4
(see Fig. 6.13 and Fig. 6.14). Simulations show that the proposed solution is self-noise free even
if the observation is rather short (M = 2, 4). Furthermore, the high-SNR asymptotic expressions
obtained in (6.20), (6.35) and (6.36) exhibit a perfect match at high SNR. Although it is not
plotted, the feedforward synchronizer derived in (6.22) was tested for the MSK modulation in
Fig. 6.13 confirming that it is always self-noise free.
Surprisingly, the high-SNR estimators deduced in (6.32) and (6.34) are also exact for any SNR. The reason is that, as mentioned before, there is no noise enhancement at low SNR because (6.32) and (6.34) do not include the orthogonal projector P_A^⊥.
Finally, the acquisition performance of the optimal closed-loop synchronizer is evaluated in
Fig. 6.15 in order to validate its operability when A is singular in case of partial-response
schemes (e.g., 3REC). In the same figure, the probability of “failure” has been computed as a
function of the observation length for some partial response schemes. As shown in the plot, the
probability of failure decays exponentially with the observation time and the damping factor
increases if the modulator memory is shortened.
Figure 6.14: Carrier phase variance as a function of the SNR for the 3REC modulation with Nss = 4. [Plot omitted: carrier phase variance vs. Es/No (dB) for the low-SNR approximation and the optimal quadratic estimator (M = 2, 4), together with the MCRB for M = 4.]
Figure 6.15: On the left side, 10 acquisitions for the 3REC modulation with Nss = 2 and M = 2. The SNR was set to 20 dB and the loop step-size fixed to µ = 0.01. On the right-hand side, the probability of failure for 2REC and 3REC with Nss = 2 and σ_w² = 0. [Plots omitted: left, carrier phase estimate (normalized by π) vs. iteration index n; right, failure probability vs. observation length M.]
6.3 TOA Estimation in Multipath Scenarios
In the context of wireless, underwater or optical communications, the transmitted signal is
severely distorted by the channel due to the so-called multipath propagation. In a multipath
scenario, the received signal is the sum of multiple replicas of the transmitted waveform whose
delays, amplitudes and phases are unknown. The resulting dispersive channel can be modeled
as a finite impulse response (FIR) filter of unknown complex-valued coefficients. In digital
communications, the multipath disturbance is mitigated implementing digital equalizers in order
to prevent intersymbol interference (ISI). This topic is addressed in Section 6.4 where blind
channel estimators are designed from the sample covariance matrix.
In this section, we focus on the problem of radiolocation in cellular networks using range
estimates from several base stations. Range information can be obtained estimating the time
of arrival (TOA) —if the network is synchronous— or the time difference of arrival (TDOA) in
case of an asynchronous cellular network. Although the principle is the same as in radar and
navigation applications (e.g., GPS or GALILEO), the mobile radio channel poses some additional
impairments as, for example, time- and frequency-selective fast fading, non line-of-sight (NLOS)
conditions, narrowband signaling in case of second generation terminals, low Es /N0 for the
received signal coming from the non-serving base stations, limited training periods for TOA
estimation (e.g., GSM midamble), etc.
In this context, a lot of effort has been made to design TOA estimators robust to the
multipath degradation. Some of them have been developed for satellite positioning systems
(i.e., GPS, GLONASS and, GALILEO) using direct sequence spread spectrum signaling, e.g.,
[Bra01][Sec00, Sec. 2.2.2] and references therein. All these contributions are intended for single-antenna receivers. Nonetheless, it has been shown that the use of antenna arrays helps to
mitigate multipath and also to cancel interferences [Sec00]. Actually, a multisensor receiver
is able to combine direction of arrival (DOA) and TOA information in order to render more
accurate position estimates.
The application of these techniques to third generation cellular systems such as UMTS is
rather straightforward since they share the same signal format. On the other hand, timing
recovery in narrowband systems (e.g., GSM) becomes more difficult since the time resolution is
inversely proportional to the signal bandwidth and, the self-noise contribution becomes critical
for the working SNRs. Recall that self-noise consists of the intersymbol interference (ISI) at
the timing error detector output. In spread spectrum communications, the self-noise term is
negligible because it is filtered out in the despreading stage.
Some relevant contributions in the context of narrowband communications are [Chi94][Mog03] and [Fis98][Win00][Rib02]. In the first two proposals, the Gaussian assumption is adopted and non-data-aided TOA estimators are deduced. On the other hand, the last
6.3. TOA ESTIMATION IN MULTIPATH SCENARIOS
153
three papers rely on the transmission of known training data. In all the papers, with the exception of [Rib02], the channel coefficients are deterministic unknowns that need to be estimated
in a first step. An alternative approach is adopted in [Rib02] where the multipath is modeled
as a random term of known first- and second-order moments. In this manner, unbiased TOA
estimates are obtained trading some estimation variance following a Bayesian approach.
In this section, the problem of both DA and NDA timing (and also carrier frequency-offset) estimation is studied in a multipath scenario assuming that the channel response is unknown. Optimal second-order unbiased estimators are deduced based on the channel first- and second-order statistics following a similar approach to the one presented in [Rib02]. Some numerical
results are presented for the problem of TOA estimation in a typical wireless outdoor scenario
in the context of the GSM standard and the EMILY European project [Bou02a][Bou02b].
The results in this section were partially presented in the International Zurich Seminar on Broadband Communications that was held in Zurich in 2002 [Vil02a]:
• “Optimal Quadratic Non-Assisted Parameter Estimation for Digital Synchronisation”. J. Villares, G. Vázquez. Proceedings of the International Zurich Seminar on Broadband Communications 2002 (IZS2002). pp. 46.1-46.4. Zurich (Switzerland). February 2002.
6.3.1 Signal Model
Let us consider that the channel impulse response is time-invariant during the observation time (M samples). The channel low-pass equivalent impulse response within the receiver bandwidth W is given by

\[
h(t) = \frac{1}{W}\sum_{k=0}^{L-1} h(k/W)\,\operatorname{sinc}(Wt-k) \tag{6.37}
\]

with L/W the effective duration of the channel [Pro95, Sec. 5-1, Ch. 14]. Hereafter, the bandwidth W is set to 2/T (100% excess bandwidth) in order to accommodate the majority of bandpass modulations.
The channel taps h(k/W ) will be modeled as zero-mean complex Gaussian variables with
their envelope and phase following a Rayleigh and uniform distribution, respectively. The
Rayleigh distribution is adopted hereafter because it corresponds to a worst-case situation.
Anyway, it is possible to assume for the first coefficient of h(k/W ) a Ricean distribution in order
to take into account the line-of-sight (LOS) component [Gre92].
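Such a Rayleigh channel is straightforward to synthesize: each tap is drawn as a zero-mean complex Gaussian whose variance follows the desired power delay profile, which automatically yields a Rayleigh envelope and a uniform phase. A sketch with an invented exponential profile:

```python
import numpy as np

rng = np.random.default_rng(5)
L = 6                                   # number of taps (illustrative)
pdp = np.exp(-np.arange(L) / 2.0)       # exponential power delay profile E{|h_k|^2}
pdp /= pdp.sum()                        # normalize total channel power to 1

n_chan = 50_000
h = np.sqrt(pdp / 2)[:, None] * (rng.standard_normal((L, n_chan))
                                 + 1j * rng.standard_normal((L, n_chan)))

power = np.mean(np.abs(h) ** 2, axis=1)     # sample tap powers reproduce the PDP
total = power.sum()                          # close to 1 after normalization
print(power, total)
```

A Ricean first tap would simply add a deterministic line-of-sight component to `h[0]` before scaling.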
From the above considerations, the complex envelope of the received signal at the sampler output can be written as
\[
y(mT_s) = \sum_{i=-\infty}^{\infty} d_i \sum_{k=0}^{L-1} h(k/W)\,e^{j2\pi\nu m/N_{ss}}\,p\left(mT_s - k/W - iT - \tau T\right) + w(mT_s) \tag{6.38}
\]
where T_s is the sampling period, {d_i} is the sequence of transmitted symbols, ν and τ are the frequency and timing errors normalized with respect to the symbol time T = N_ss T_s, p(t) is the shaping pulse, and w(t) the stationary AWGN term. Following the guidelines in Section 6.1.2, the above formula can be expressed in vectorial form as follows:

\[
\mathbf{y} = \mathbf{A}(\lambda)\,\mathbf{H}\mathbf{d} + \mathbf{w} = \mathbf{A}(\lambda)\,\mathbf{x} + \mathbf{w} \tag{6.39}
\]
delayed versions of the channel impulse response h(k/W ). It is really important to realize that
the inclusion of a random channel h(t) yields directly the same model in (6.3), except that the
proposed estimators will have to cope with the extended, correlated vector of symbols
x Hd.
Therefore, the channel has two negative effects:
1. Firstly, the unknown vector of symbols x is about W T times longer than the vector of
transmitted symbols d. The increment of the nuisance parameters will establish a limit
on the variance of blind estimators when dealing with a time dispersive channel.
2. Secondly, the channel modifies the covariance of the transmitted symbols in the following
way:
Γ E xxH = EH HHH
assuming again uncorrelated symbols.
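The identity Γ = E_H{HH^H} can be confirmed with a quick Monte Carlo check: draw random channels, build the convolution matrix H whose columns are shifted copies of the impulse response, form x = Hd, and compare the sample covariance of x with the average of HH^H. A toy sketch with invented sizes:

```python
import numpy as np

rng = np.random.default_rng(6)
L, Nd, trials = 3, 5, 20_000           # channel taps, symbols, Monte Carlo runs
Nx = L + Nd - 1                         # length of the extended vector x = H d

cov_x = np.zeros((Nx, Nx), dtype=complex)
mean_HHH = np.zeros((Nx, Nx), dtype=complex)
for _ in range(trials):
    h = (rng.standard_normal(L) + 1j * rng.standard_normal(L)) / np.sqrt(2 * L)
    H = np.zeros((Nx, Nd), dtype=complex)   # convolution matrix: shifted copies of h
    for i in range(Nd):
        H[i:i + L, i] = h
    d = rng.choice([-1.0, 1.0], size=Nd)    # unit-power uncorrelated symbols
    x = H @ d
    cov_x += np.outer(x, x.conj()) / trials
    mean_HHH += H @ H.conj().T / trials

err = np.abs(cov_x - mean_HHH).max()
print(err)
```

The extended vector x is both longer than d (Nx = L + Nd − 1) and correlated, which is exactly the pair of penalties listed above.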
Notice that the Gaussian assumption is verified in the case of constant-amplitude modulations
such as MPSK or CPM. On the other hand, the received symbols are not strictly Gaussian
when the transmitted signal is a multilevel modulation such as QAM or APK. However, it can
be shown that, in that case, the Gaussian assumption yields practically optimal second-order
estimators for any SNR.
6.3.2 Optimal Second-Order NDA Estimator
The optimal second-order estimator of λ from the above signal model is deduced in the following
lines. First, the covariance matrix R (λ) and the fourth-order matrix Q (λ) are calculated, having
that

R(λ) = A(λ) Γ A^H(λ) + R_w
Q(λ) = R^∗(λ) ⊗ R(λ) + (A^∗(λ) ⊗ A(λ)) K (A^∗(λ) ⊗ A(λ))^H    (6.40)
where the kurtosis matrix K is diagonal in the presence of multipath because x = Hd is always
circular even if the transmitted symbols d are not. Moreover, K is strictly zero for any constant
amplitude modulation and admits a simple form in case of a linear circular modulation. Taking
into account that

K ≜ E_d{vec(dd^H) vec^H(dd^H)} − vec(I) vec^H(I) − I

is equal to

K = (ρ − 2) diag(vec(I))

in the case of circular constellations (3.12), it follows that the diagonal entries of K are
[K]_{k,k} = 2(ρ − 1) Σ_{i=−∞}^{∞} PDP²(k/W − iT)    (6.41)
where

PDP(t) ≜ { E{|h(t)|²},  0 ≤ t < L/W
         { 0,           otherwise
stands for the channel power delay profile (PDP). Notice that (6.41) vanishes in case of constant
amplitude modulations (ρ = 1). Thus, the Gaussian assumption in a multipath scenario applies
for an important class of modulations whereas it is not verified in case of Gaussian distributed
symbols (ρ = 2).
In some circumstances, the channel taps are uncorrelated and, if W T is an integer number,
Γ is a diagonal matrix with entries

[Γ]_{k,k} = Σ_{i=−∞}^{∞} PDP(k/W − iT).
In this uncorrelated scattering (US) scenario, the channel PDP conveys all the statistical
information about the channel. Then, assuming that the channel PDP is known or accurately
estimated, optimal second-order synchronizers can be built for the studied scenario using the
framework provided in Chapters 3 and 4.
Moreover, in some situations a limited set of parameters is sufficient to describe completely
the PDP function. For example, sometimes the mobile radio channel is correctly modeled by
adopting a (decreasing) exponential PDP [Gre92] such as

PDP(t; σ) = C exp(−t/σ)    (6.42)
where σ is the so-called delay spread and C is a normalization constant forcing Tr {Γ} = K with
K the length of x.
Depending on the channel delay spread, two asymptotic situations can be studied:
1. Flat fading channel (σ → 0):

[Γ]_{k,k} = { 1,  k multiple of W T
            { 0,  otherwise          (6.43)

and hence (6.39) reduces to the ideal channel case with x = d. In that case, the channel
only changes the distribution of the received symbols.
2. Highly frequency-selective channel (σ → ∞):

Γ = (1/W T) I_K    (6.44)

and, therefore, the channel implicitly increases the vector of received symbols x in (6.39)
by a factor of W T, as well as changing their distribution. Notice that this expansion may
require oversampling the received signal in order to guarantee that the matrix A(λ) is tall,
i.e., that it has more rows (received samples) than columns (unknown symbols). Otherwise,
the estimator variance will exhibit a high-SNR floor because the self-noise term cannot be
cancelled. This forces the designer to ensure that N_ss > W T.
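The diagonal entries of Γ and K derived above can be evaluated numerically for the exponential PDP. The sketch below uses illustrative values of W, T, L and the tap grid (not the thesis set-up) and reproduces the two asymptotic regimes:

```python
import numpy as np

def pdp(t, sigma, L_over_W):
    """Exponential power delay profile, zero outside [0, L/W)."""
    t = np.asarray(t, dtype=float)
    out = np.zeros_like(t)
    mask = (t >= 0) & (t < L_over_W)
    out[mask] = np.exp(-t[mask] / max(sigma, 1e-12))
    return out

def gamma_diag(sigma, W=2.0, T=1.0, L=12, rho=1.0):
    """Diagonals of Gamma and K under uncorrelated scattering (WT integer)."""
    ks = np.arange(L)
    i = np.arange(-L, L + 1)                 # enough symbol shifts to cover [0, L/W)
    taus = ks[:, None] / W - i[None, :] * T  # arguments k/W - iT
    g = pdp(taus, sigma, L / W).sum(axis=1)
    g *= L / g.sum()                         # normalization so that Tr{Gamma} = L
    k_diag = 2.0 * (rho - 1.0) * (pdp(taus, sigma, L / W) ** 2).sum(axis=1)
    return g, k_diag

# sigma -> 0: only the taps at multiples of WT carry power (flat fading, x = d)
g_flat, _ = gamma_diag(sigma=1e-6)
# sigma -> inf: power spreads evenly over all taps, Gamma proportional to I
g_sel, _ = gamma_diag(sigma=1e6)
```

With ρ = 1 (constant-modulus symbols) the kurtosis diagonal vanishes, in agreement with (6.41).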
6.3.3 Optimal Second-Order DA Estimator
Thus far, the vector of transmitted symbols d is unknown at the receiver side and, therefore,
NDA estimators are required. Next, the optimal DA estimator is formulated assuming that d
is deterministic (i.e., a training sequence) but the channel H is still unknown. In that case, the
received symbols x = Hd are zero-mean random variables and, hence, second-order methods are
necessary once more. In order to deduce the optimal second-order estimator, the signal model
in (6.39) must be modified in the following way:
y = A (λ) Dh + w
(6.45)
with [h]k = h(k/W ) the k-th tap of the unknown channel and D the matrix stacking the known
transmitted symbols {di } in such a way that Dh = Hd.
At this point, the optimal second-order estimator of λ is straightforward from the above
signal model with h the vector of Gaussian nuisance parameters. It only remains to compute the
covariance matrix R (λ) and the fourth-order matrix Q (λ) for the problem at hand, obtaining
that
R (λ) = A (λ) DΓh DH AH (λ) + Rw
Q (λ) = R∗ (λ) ⊗ R (λ)
where

Γ_h ≜ E{hh^H}

and Q(λ) is computed taking into account that h is normally distributed. In the case of uncorrelated
scattering, Γ_h becomes diagonal with entries [Γ_h]_{k,k} = PDP(k/W).
6.3.4 Numerical Results
The results in this section were applied to devise multipath-resistant TOA estimators in the
context of the EMILY project [Bou02b]. The aim of this project was the integration of positioning
measurements from the GPS and GSM networks. In the second case, the spatial accuracy
is severely degraded due to multipath propagation. In this kind of scenario, the proposed
NDA and DA estimators are robust against the multipath, providing unbiased TOA estimates.
In both cases, the channel PDP and the noise variance are measured off-line. For the sake of
simplicity, an uncorrelated Rayleigh channel having an exponential PDP is considered. Different
delay spreads are simulated and the channel is varied in time according to the Jakes Doppler
spectrum [Gre92], although this information is not exploited by the estimator. Finally, the GMSK
modulation from the GSM standard, as well as the MPSK and MSK modulations, are considered
in the simulations.
To estimate the timing error τ, the received bandpass signal is filtered to a bandwidth W = 2/T
and, afterwards, the I and Q components are generated and sampled taking N_ss = 4 samples per
symbol. A first-order closed loop is implemented to estimate and track the TOA of the user of
interest. The optimal second-order NDA discriminator is considered with M = 8 the number of
input samples. The variance at the discriminator output is computed as a function of the SNR.
Notice that this variance is further reduced by the loop filter. Simulations agree closely with the
theoretical performance obtained in Chapter 4, where it is shown that the minimum variance
for any quadratic unbiased timing detector is given by
VAR(τ) = 1 / ( d_r^H(τ) Q^{−1}(τ) d_r(τ) )    (6.46)

with

d_r(τ) ≜ vec( dR(τ)/dτ ) = vec( A(τ) dA^H(τ)/dτ + dA(τ)/dτ A^H(τ) )

and dA(τ)/dτ the matrix whose columns are 1/W-delayed versions of the shaping-pulse derivative,
i.e., dp(t)/dt. Notice that the estimator variance in (6.46) becomes independent of the
actual value of the parameter τ.
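A minimal sketch of how a quadratic-form variance of the form 1/(d_r^H Q^{−1} d_r) can be evaluated is given below. A Gaussian stand-in pulse replaces the square-root raised cosine so that its derivative is available in closed form, only the Gaussian part R^∗ ⊗ R of the fourth-order matrix is kept, and all dimensions are illustrative assumptions:

```python
import numpy as np

M, I_sym, Nss, sigma2 = 12, 3, 4, 0.1  # samples, symbols, oversampling, noise power
Ts = 1.0 / Nss                          # sampling period with symbol period T = 1
p  = lambda t: np.exp(-np.pi * t ** 2)  # stand-in pulse, not the thesis SRRC
dp = lambda t: -2.0 * np.pi * t * p(t)  # its time derivative

def transfer(tau, pulse):
    """[A]_{m,i} = pulse(m*Ts - i*T - tau)."""
    m = np.arange(M)[:, None] * Ts
    i = np.arange(I_sym)[None, :] * 1.0
    return pulse(m - i - tau)

def timing_var(tau):
    A, dA = transfer(tau, p), -transfer(tau, dp)  # dA/dtau = -p'(mTs - iT - tau)
    R  = A @ A.T + sigma2 * np.eye(M)             # unit-power uncorrelated symbols
    dr = (A @ dA.T + dA @ A.T).reshape(-1, order='F')  # vec(dR/dtau)
    Q  = np.kron(R, R)                            # Gaussian part only (R real here)
    return 1.0 / float(dr @ np.linalg.solve(Q, dr))

v0, v1 = timing_var(0.1), timing_var(0.35)
```

With a finite observation window the value is not exactly τ-independent because of edge effects, but it stays positive and finite for any τ.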
The first simulation in Fig. 6.16 is carried out for the 16-QAM modulation. As commented
before, the number of significant nuisance parameters grows with the delay spread σ. In Fig.
[Plot: TOA estimation variance vs. SNR (dB); curves for σ = 5T, T, T/4, T/10, T/20 and the ideal channel (σ = 0).]
Figure 6.16: TOA estimation variance as a function of the SNR for 16-QAM symbols. The
simulation parameters are Nss = 4 and M = 8. The transmitted pulse is a square-root raised
cosine truncated at ±4T (100% roll-off).
6.16, the estimator is unable to cope with the self-noise enhancement and exhibits a variance
floor at high SNR. This degradation is observed even for very small values of σ (e.g.,
σ = T/10). If σ is slightly increased (σ = T/4), the degradation is also observed at low SNR.
In the limit (σ → ∞), the number of nuisance parameters is multiplied by W T = 2. Notice
that the loss in timing accuracy caused by the channel is severe in the case of QAM-modulated
signals.
On the other hand, the maximum loss with constant-modulus constellations such as MPSK,
MSK and LREC is bounded and occurs when the delay spread approaches the symbol time (Figs.
6.17-6.20). The estimator is found to be self-noise free for the MPSK and MSK modulations
regardless of the channel delay spread. This loss is manifested first at high SNR, as was observed
in the QAM simulations (Fig. 6.16). On the other hand, self-noise can be eliminated by augmenting
the observation time when dealing with the 2REC and 3REC modulations.
Regarding the Gaussian assumption, it is always verified for the QAM modulation (Fig.
6.16). On the other hand, it applies for any constant modulus modulation (e.g., MPSK and
CPM) in the presence of a fading channel (Figs. 6.17-6.20).
[Plot: variance vs. SNR; curves for σ = 0 and σ = ∞, the Gaussian assumption (ideal channel) and the ideal channel.]
Figure 6.17: TOA estimation variance as a function of the SNR for MPSK symbols. The
simulation parameters are Nss = 4 and M = 8. The transmitted pulse is a square-root raised
cosine truncated at ±4T (100% roll-off).
[Plot: variance vs. SNR (dB); curves for σ > T and σ = 0, the Gaussian assumption (ideal channel) and the ideal channel.]
Figure 6.18: TOA estimation variance as a function of the SNR for MSK symbols. The simulation parameters are Nss = 4 and M = 8.
[Plot: variance vs. SNR (dB); curves for σ > T and σ = 0, the Gaussian assumption (ideal channel) and the ideal channel.]
Figure 6.19: TOA estimation variance as a function of the SNR for the 2REC modulation. The
simulation parameters are Nss = 4 and M = 8.
[Plot: variance vs. SNR (dB); curves for σ > T and σ = 0, the Gaussian assumption (ideal channel) and the ideal channel.]
Figure 6.20: TOA estimation variance as a function of the SNR for the 3REC modulation. The
simulation parameters are Nss = 4 and M = 8.
6.4 Blind Channel Identification
In some scenarios, the transmission channel is frequency-selective, causing intersymbol interference
(ISI) at the matched filter output [Pro95]. In most cases the channel response is not known
a priori and the receiver has to identify the channel in order to cope with this ISI. This task
is mandatory if the channel response is time-variant, as happens in wireless communications.
In that case, adaptive techniques have to be developed to track the channel evolution. On
the other hand, in a given access network, the subscribers have different channel responses and,
thus, their equipment must configure itself when first plugged into the network.
In most standards, some training or pilot symbols are transmitted periodically to facilitate
the receiver synchronization and the channel identification. The use of training sequences reduces
the system efficiency, mostly when the channel varies in time. This inconvenience has motivated
for a long time the study of blind channel estimation and equalization techniques. The pioneering
work was authored by Y. Sato [Sat75] and was further developed by Godard [God80], Treichler et
al. [Tre83], Benveniste et al. [Ben84], Picci et al. [Pic87], Shalvi et al. [Sha90], Giannakis et al.
[Gia89], Nikias [Nik92] and Sala [Sal97], among others.
All these methods exploit the higher-order moments of the received signal in the belief that
non-minimum-phase channels were not identifiable from second-order techniques3 . This idea
was refuted in the revolutionary paper by Tong et al. [Ton91], where the authors proved that
the channel response can be identified from the second-order moments if the received signal is
cyclostationary and multiple samples per symbol are taken from the channel output. This new
perspective is founded on the fractionally-spaced equalizer proposed by Ungerboeck in 1976
[Ung76]. In that paper, oversampling was proposed as a means of improving the equalizer
performance in the presence of timing errors. In any case, the main advantage of second-order
methods is that their convergence is faster than that of higher-order methods.
The original paper was further simplified by Moulines et al. in [Mou95] and studied in
[Ton94][Ton95][Tug95] from different points of view. All these channel estimators are subspace
methods based on the eigendecomposition of the sample covariance matrix. A different perspective
was introduced by Giannakis et al. [Gia97] and Zeng et al. [Zen97a], in which the
asymptotic (large-sample) best quadratic unbiased channel estimator is formulated from the cyclic
spectrum or the cyclic correlation, respectively. Additionally, a hybrid method including subspace
constraints is proposed in [Zen97a]. The resulting estimator is shown to encompass most
second-order methods in the literature [Gia97][Liu93][Mou95][Sch94][Ton95]. Some asymptotic
studies are also supplied in [Zen97b].
3 A system is minimum phase if all the zeros of its transfer function are inside the unit circle. This implies that
the inverse system is realizable.
In this chapter, the best quadratic unbiased estimator is deduced for a finite observation.
The proposed estimator exploits the knowledge of the pulse shaping as well as the statistics
of the transmitted discrete symbols. It is found that the optimal solution is able to estimate
the channel amplitude in the case of constant-modulus constellations such as MPSK or CPM. On
the other hand, the amplitude is ambiguous under the Gaussian assumption. This contribution
thus complements the results in [Gia97][Zen97a][Zen97b].
6.4.1 Signal Model
This section is based on the signal model presented in Section 6.3.1. The aim is now to estimate
the channel impulse response h(k/W) in (6.37). The vector of parameters is given by

θ ≜ [Re{h_0}, . . . , Re{h_{L−1}}, Im{h_0}, . . . , Im{h_{L−1}}]^T

with h_k ≜ h(k/W) the k-th tap of the channel.
The received waveform is the superposition of L replicas of the transmitted pulse p(t),

Σ_{k=0}^{L−1} h_k p(t − k/W)    (6.47)

that, if it is sampled every T_s = T/N_ss seconds, yields the following transfer matrix A(θ):

A(θ) = Σ_{k=0}^{L−1} h_k B(k/W)
where

[B(τ)]_{m,i} = p(mT_s − τ − iT),   m = 0, . . . , M − 1,  i = 0, . . . , I − 1,

is the matrix performing the convolution with the delayed shaping pulse p(t − τ) and I is the
number of observed symbols.
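The construction of B(τ) and A(θ) can be sketched as follows. A truncated sinc stands in for the actual shaping pulse and all sizes are illustrative assumptions:

```python
import numpy as np

M, I, Nss, W, L = 16, 4, 4, 2.0, 6  # samples, symbols, oversampling (T = 1), bandwidth, taps
Ts = 1.0 / Nss

def p(t):
    # stand-in shaping pulse (truncated sinc); the thesis uses a square-root raised cosine
    return np.sinc(t) * (np.abs(t) < 4)

def B(tau):
    """[B(tau)]_{m,i} = p(m*Ts - tau - i*T)."""
    m = np.arange(M)[:, None] * Ts
    i = np.arange(I)[None, :] * 1.0
    return p(m - tau - i)

def A_of(h):
    """A(theta) = sum_k h_k B(k/W) for channel taps h_0 .. h_{L-1}."""
    return sum(hk * B(k / W) for k, hk in enumerate(h))

rng = np.random.default_rng(1)
h = (rng.standard_normal(L) + 1j * rng.standard_normal(L)) / np.sqrt(2)
A = A_of(h)
```

Note that A(θ) is linear in the taps h_k, which is what makes the second-order statistics quadratic in θ.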
Notice that the proposed signal model can be applied in spread-spectrum communications
with p(t) the known signature and h_k^∗ the weight associated with the k-th finger in a RAKE
receiver. The optimality of the RAKE receiver is guaranteed if the L fingers are uncorrelated,
which means that p(t) is spectrally white and the channel taps h_k are uncorrelated as well.
In the context of narrowband communications, the identification of the channel impulse response
h(k/W) is required to implement fractional equalizers [Ung76]. On the other hand, if the
estimated channel is later employed to implement the maximum likelihood sequence estimator
(MLSE) [For72][Pro95, Sec. 5-1-4] and detect the sequence of transmitted symbols without
incurring noise enhancement, the objective is to estimate the complex channel response at the
matched filter output sampled at one sample per symbol, that is,

α_n = Σ_{k=0}^{L−1} h_k g(nT − k/W) ≈ Σ_{k=0}^{L−1} h_k sinc(n − k/(W T))

where g(t) ≜ ∫ p(τ) p(t + τ) dτ stands for the shaping pulse at the matched filter output and
the last equality holds if g(t) is an ideal Nyquist pulse without truncation. Although it is not
strictly necessary, hereafter W T is assumed to be an integer for the sake of simplicity. In that
case the vector of real parameters becomes

α ≜ [Re{α_0}, . . . , Re{α_{N−1}}, Im{α_0}, . . . , Im{α_{N−1}}]^T = Gθ,

which is a linear transformation of θ given by the matrix G ≜ I_2 ⊗ T with N ≜ L/(W T) and

[T]_{n,k} ≜ g(nT − k/W) ≈ sinc(n − k/(W T)).
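The mapping α = Gθ with G = I_2 ⊗ T can be assembled directly from the sinc approximation. The sizes below are illustrative assumptions:

```python
import numpy as np

W, T_sym, L = 2.0, 1.0, 6
N = int(L / (W * T_sym))            # N = L/(WT) symbol-spaced channel coefficients

# [T]_{n,k} ~ sinc(n - k/(W T)): symbol-rate response of the 1/W-spaced tap grid
n = np.arange(N)[:, None]
k = np.arange(L)[None, :]
Tm = np.sinc(n - k / (W * T_sym))

G = np.kron(np.eye(2), Tm)          # real and imaginary parts mapped independently

# map a stacked real parameter vector theta = [Re{h}; Im{h}] to alpha = G theta
rng = np.random.default_rng(2)
theta = rng.standard_normal(2 * L)
alpha = G @ theta
```

Here `np.sinc` is the normalized sinc, sin(πx)/(πx), matching the Nyquist-pulse approximation in the text.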
Taking now into account that the estimator is invariant under any linear transformation,
the optimal second-order estimator of α is directly given by α̂ = Gθ̂. Therefore, if an unbiased
estimator of α = Gθ is sought, it has to guarantee that M^H D_r = I_{2N} and, hence, the IPI-free
solution stated in (4.12) must be adopted (see Section 4.4).
Once the signal model is identified, the procedure for deducing the optimal second-order
estimator is systematic and consists of finding the set of constituent matrices in (4.12) for the
problem at hand. Regarding the matrix of derivatives D_r, the first coefficient h_0 is assumed to
be real-valued in order to solve the phase ambiguity of second-order algorithms.
Eventually, the matrix D_r is built by stacking the derivatives of the vectorized covariance matrix
r(θ) = vec(R(θ)) with respect to the real and imaginary parts of the complex coefficients h_k:

∂r(θ)/∂Re{h_k} = vec( B(k/W) A^H(θ) + A(θ) B^H(k/W) )
∂r(θ)/∂Im{h_k} = j vec( B(k/W) A^H(θ) − A(θ) B^H(k/W) )

for k = 0, . . . , L − 1.
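The real-part derivative can be verified against a central finite difference of vec(R(θ)). The sketch below uses a truncated-sinc stand-in pulse and illustrative dimensions; the additive noise term of R drops out of the derivative and is omitted:

```python
import numpy as np

M, I, Nss, W, L = 12, 3, 4, 2.0, 4
Ts = 1.0 / Nss
p = lambda t: np.sinc(t) * (np.abs(t) < 4)   # stand-in pulse

def B(tau):
    m = np.arange(M)[:, None] * Ts
    i = np.arange(I)[None, :] * 1.0
    return p(m - tau - i).astype(complex)

Bk = [B(k / W) for k in range(L)]
A_of = lambda h: sum(hk * Bk[k] for k, hk in enumerate(h))
# vec(R) for unit-power uncorrelated symbols; the noise term does not depend on h
r_of = lambda h: (A_of(h) @ A_of(h).conj().T).reshape(-1, order='F')

rng = np.random.default_rng(3)
h = (rng.standard_normal(L) + 1j * rng.standard_normal(L)) / np.sqrt(2)
A = A_of(h)

k, eps = 1, 1e-6
# analytic derivative with respect to Re{h_k}
d_analytic = (Bk[k] @ A.conj().T + A @ Bk[k].conj().T).reshape(-1, order='F')
# central finite difference
hp, hm = h.copy(), h.copy()
hp[k] += eps
hm[k] -= eps
d_numeric = (r_of(hp) - r_of(hm)) / (2 * eps)
err = np.abs(d_analytic - d_numeric).max()
```

Since R(θ) is quadratic in Re{h_k}, the central difference is exact up to rounding, so `err` is tiny.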
As stated in [Zen97a, Theorem 2], the channel is identifiable if the channel Z-transform
H(z) ≜ Σ_{k=0}^{L−1} h_k z^{−k} can be decomposed into N_ss subchannels having different reciprocal zeros4 .
When this condition does not hold, D_r is singular for this channel realization (see Section 4.3).
Notice that this condition is weaker than the usual identifiability condition [Ton95][Tug95].
6.4.2 Numerical Results
The simulated channel spreads over N = 3 symbol periods. The transmitted pulse p(t) is
a square-root raised cosine of roll-off r truncated to last 8 symbols. The channel taps h_k are

4 z_0 is a reciprocal zero of H(z) if H(z_0) = H(z_0^{−1}) = 0.
[Plot: normalized variance vs. SNR (dB) for the CML, Low-SNR ML, GML and BQUE estimators together with the UCRB and MCRB bounds.]
Figure 6.21: Performance of the second-order ML-based estimators (low-SNR approximation,
Conditional ML and Gaussian ML) and the optimal solution provided in Chapter 4. The simulation
parameters are r = 0.35, µ = 0.02 (Bn = 5 · 10−3 ).
generated as independent zero-mean Gaussian variables of unit variance. The receiver bandwidth
is set to W = 2/T to encompass any roll-off factor. Consequently, the number of taps is
L = NW T = 6. The observation window is set to M = 18 samples and the received signal is
oversampled taking Nss = 3 samples per symbol. Finally, the transmitted symbols are QPSK.
The figure of merit computed in this section is the normalized variance of the estimator α̂ in
the steady state. The expected value with respect to the random channel will be computed in
order to obtain the average performance of the estimator. Thus, the channel estimator variance
is defined in the following way:
VAR ≜ E_θ { E‖α̂(n) − α‖² / ‖α‖² }

where the expectation with respect to θ is approximated by averaging 100 random channels.
The above figure of merit will be plotted as a function of the signal-to-noise ratio. The noise
variance σ_w² will be adjusted at each realization to maintain the SNR, since the received power
depends on the actual channel response.
The ML-based estimators discussed in Chapter 3 and the corresponding bounds are evaluated
and compared with the optimal second-order estimator formulated in Chapter 4. In all cases,
a closed-loop scheme is implemented with its bandwidth adjusted to guarantee the small-error
[Plot: normalized variance vs. SNR (dB); curves for 64-QAM (GML & BQUE), QPSK (GML) and QPSK (BQUE) together with the UCRB and MCRB bounds.]
Figure 6.22: Comparison of the BQUE and GML estimators when the channel amplitude is
estimated too and the transmitted symbols are QPSK, 64-QAM or Gaussian distributed, for a
roll-off factor equal to 1.
condition for the simulated SNR range.
Suboptimal Algorithms comparison: Low-SNR ML, CML, GML
Using the CML method, the channel can be determined only up to a constant complex factor
[Car00]. For comparison, all the methods will assume that the value of the first coefficient is 1.
Fig. 6.21 points out that the low-SNR approximation suffers from a severe high-SNR floor
due to the self-noise contribution. On the other hand, the CML criterion is shown to be of no
use for channel estimation because its variance is extremely high within the range of operative
SNRs. On the contrary, the Gaussian model is shown to be appropriate for building good estimators
of the channel response. Only at high SNR does the exploitation of the discrete distribution of the
symbols improve the estimator accuracy.
The UCRB is also depicted, showing that it is a valid lower bound for the performance of
second-order techniques. Nonetheless, in Fig. 6.21 the UCRB is shown to be somewhat optimistic
in high-SNR scenarios. Finally, the MCRB predicts the theoretical performance that data-aided
schemes would attain compared with second-order blind estimators. Clearly, the insertion of
pilot symbols notably improves the estimator performance for any SNR. Additionally, another
important advantage of DA methods is that they do not exhibit outliers at low SNR because
[Plot: normalized variance vs. SNR (dB); GML curves for r = 0, 1/3 and 1, and BQUE curves for r = 0, 0.35 and 1.]
Figure 6.23: Comparison of the BQUE and GML estimators when the channel is multiplicative,
for different values of the roll-off factor.
the estimator is a linear transformation of the parameters.
Channel amplitude estimation: GML vs BQUE
In Fig. 6.22, the amplitude of the first channel tap is estimated too. The estimator variance
exhibits a severe floor at high SNR due to the self-noise unless the symbols are drawn from a
constant-modulus constellation such as MPSK or CPM. The floor level is inversely proportional
to the observation time, which means that the estimator is consistent for any SNR, and is related
to the amplitude dispersion of the constellation.
In order to clarify these conclusions, in Appendix 6.B the asymptotic variances at high SNR
for the GML and the optimal quadratic estimator are deduced when the transmitted
signal is linearly modulated and the channel is multiplicative (Fig. 6.23). The asymptotic
expressions obtained therein predict exactly the aforementioned floor, showing its dependence
on the constellation fourth-order moment ρ and the number of observed symbols.
Regarding the QPSK simulation (ρ = 1), the BQUE asymptotic variance is inversely proportional
to the SNR whereas the GML performance degrades at high SNR (Fig. 6.22). The
underlying motive is the poor estimation of the channel amplitude, as depicted in Fig. 6.23. In
this figure, the GML suffers a transitory floor because the Gaussian assumption fails gradually
as the SNR is augmented.
6.5 Angle-of-Arrival (AoA) Tracking
The classical approach in array signal processing considers that the sources are deterministic unknowns
(conditional model) or, alternatively, Gaussian random variables (unconditional model)
[Sto90a][Ott93]. As a consequence of the Central Limit Theorem, the Gaussian assumption
provides optimal second-order DOA trackers, independently of the actual distribution of the
sources, if the number of sensors is asymptotically large [Sto89] or the SNR asymptotically low,
as studied in Chapter 7.
However, in the context of mobile communications, the array size is limited and the above
asymptotic condition is unrealistic. In these scenarios, the consideration of the discrete distribution
of the transmitted signals yields a significant improvement in terms of tracking
variance when two or more sources transmit from similar DOAs, even if the SNR is moderate.
Notice that this improvement is not obtained by exploiting the signal cyclostationarity
[Gar88b][Sch89][Xu92][Rib96] because we consider that all the users transmit using the same
modulation and, thus, share the same cyclostationarity. Nonetheless, it would be straightforward
to incorporate this information if the received signal were oversampled, as indicated in
Section 6.1.
From this background, in the next subsection we sketch the formulation of the
optimal second-order DOA tracker when the transmitted signals are digitally modulated. The
performance of the resulting estimator constitutes the lower bound for the variance of any sample
covariance based DOA estimator, including the ML [Sto90a][Ott93] and subspace-based methods
such as Pisarenko's [Pis73], MUSIC [Sch79][Bie80][Sto89], ESPRIT [Roy89], MODE [Sto90b],
weighted subspace fitting (WSF) [Vib91] and other variants (see [Kri96][Ott93] and references
therein). Notice that all these quadratic methods achieve the same asymptotic performance
under appropriate hypotheses on the array manifold when the observation time goes to infinity,
as proved in [Ott92][Car94]. However, in this section they are shown to be inefficient, even in
the asymptotic case, if the symbols are drawn from a constant-modulus alphabet (e.g., MPSK
or CPM).
The results in this section were presented in the IEEE Asilomar Conference on Signals,
Systems and Computers that was held in Pacific Grove (USA) in 2003 [Vil03b]:
• “Second-Order DOA Estimation from Digitally Modulated Signals”, J. Villares, G.
Vázquez, Proc. of the 37th IEEE Asilomar Conference on Signals, Systems and Computers, Pacific Grove (USA), November 2003.
6.5.1 Signal Model
In all the estimation problems addressed in this chapter, the parameter θ remains static during
the observation time. However, in the context of mobile communications, it is important not
only to obtain an accurate estimate of the users' angular positions but also to track their location
as the transmitters move around the base station. Consequently, it is necessary to estimate both
the angle and the angular speed of every source transmitting towards the base station. Higher-order
derivatives of the AoA (acceleration and so on) will be disregarded from the study for simplicity.
Therefore, based on the small-error estimators obtained in (4.12) and (4.18), a closed-loop
scheme (tracker) like the one suggested in Section 2.5.2 will be implemented in order to track the
evolution of the parameter θ_n. To do so, the parameter dynamics (speed, acceleration, etc.) must
be incorporated into the model.
Formally, let us consider the problem of tracking the angle-of-arrival of P narrowband sources
impinging on a uniform linear array composed of M antennas spaced λ/2 apart, with λ the
common signal wavelength. Let us consider that all the transmitters are visible from the base
station array and that they do not experience multipath propagation. Let φ(t) ∈ [−π, π)^P be the
temporal evolution of the P angles-of-arrival in radians and φ̇(t) ≜ ∂φ(t)/∂t the respective
derivatives accounting for the angular speed. Let us assume that the acceleration and higher-order
derivatives are negligible during the observation time, that is, ∂^i φ(t)/∂t^i = 0 for i > 1.
Furthermore, let us assume that the bandwidth of φ(t) does not exceed 1/2T, with T the symbol
period. In that case, φ(nT) satisfies the sampling theorem and the P trajectories can be ideally
reconstructed from their samples φ_n ≜ φ(nT), yielding the following discrete-time dynamical
model or state equation:

φ_{n−k} = φ_n − k φ̇_n    (6.48)
where the angular speed φ̇_n is normalized to the symbol period T. Therefore, the composed
vector of parameters that must be estimated to track the users without having any systematic
pursuit error is

θ_{n+1} ≜ [ φ_{n+1} ; φ̇_{n+1} ] = G θ_n    (6.49)

with

G ≜ [ I_P  I_P ; 0_P  I_P ].
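The state equation above can be sketched as follows; propagating the state with G advances the angles linearly while keeping the angular speeds constant. The numerical values are illustrative:

```python
import numpy as np

P = 2                                  # number of sources
I_P, Z = np.eye(P), np.zeros((P, P))
G = np.block([[I_P, I_P], [Z, I_P]])   # state matrix of (6.49)

phi  = np.array([0.10, -0.05])         # AoA in radians (illustrative)
dphi = np.array([0.01, 0.002])         # angular speed per symbol period (illustrative)
theta = np.concatenate([phi, dphi])

# propagate the state 10 steps: angles advance linearly, speeds stay constant
theta_10 = np.linalg.matrix_power(G, 10) @ theta
```

After n steps G^n = [I, nI; 0, I], so the angles become φ + n·φ̇ while φ̇ is unchanged, which is exactly the constant-speed model (6.48).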
Consequently, the optimal second-order AoA tracker is given by

θ̂_{n+1} = G θ̂_n + diag(µ) G J_r^#(θ̂_n) D_r^H Q^{−1}(θ̂_n) ( r̂ − r(θ̂_n) ),    (6.50)
using the small-error expression obtained in (4.12) with D_g = G. Notice that the above expression
follows the structure of the optimal ML tracker in Section 2.5.2 with h(θ_n) = Gθ_n.
The above solution forces to zero all the cross-derivatives of g(θ), including the IPI terms
associated with the interference from other users (Section 4.4). This interference is referred to as
multiuser or multiple-access interference (MUI or MAI) in the literature. Thus, the second-order
AoA tracker in (6.50) will be referred to as the MUI-free AoA tracker hereafter.
On the other hand, following the reasoning in Section 4.4, it is not strictly necessary to cancel
out the cross-derivatives corresponding to different users because the tracker optimization will
remove the MUI contribution if the SNR is sufficiently high. Likewise, if the SNR is low, the
MUI term will be automatically ignored so as not to enhance the noise contribution. Thus, it is
only necessary to decouple the estimates of φ_n and φ̇_n in order to have unbiased estimates of
θ_{n+1} in (6.49). Otherwise, AoA estimation errors would yield angular speed deviations and
vice versa. To avoid this, we have to constrain these cross-derivatives to zero, as indicated next:
∂[φ̂_n]_p / ∂[φ̇_n]_p = ∂[φ̂̇_n]_p / ∂[φ_n]_p = 0,   p = 1, . . . , P,   evaluated at θ = θ_n    (6.51)
while the rest of the cross-derivatives are left free. We will refer to this solution as the MUI-resistant
AoA tracker in the sequel.
To complete the signal model, the received signal is passed through the matched filter
and then sampled at one sample per symbol in order to collect K snapshots5 . Independent
snapshots are obtained assuming that the actual modulation is ISI-free and, consequently, that
symbol synchronization has been established. On the other hand, it can be shown that carrier
synchronization is not required since the matrix Q(θ) is insensitive to the phase of the P × K
nuisance parameters.
According to the considerations above, the snapshot recorded at time n − k is given by
yn−k = An−k (θn ) xn−k + wn−k
where xn−k is the vector containing the symbols transmitted by the P users at time n − k, wn−k
the white noise samples and the p-th column of An−k (θn ) ,
[A_{n−k}(θ_n)]_p ≜ [ 1, exp( jπ sin([φ_n]_p − k[φ̇_n]_p) ), . . . , exp( jπ(M−1) sin([φ_n]_p − k[φ̇_n]_p) ) ]^T,

5 Notice that the problem dynamics (angle and angular velocity) require processing K ≥ 2 snapshots.
is the steering vector associated with the p-th source at time n − k, with j ≜ √−1. Notice that
A_{n−k}(θ_n) incorporates the known dynamical model (6.48).
In order to reproduce the vectorial model in (2.13), the K snapshots are stacked to build
the following spatio-temporal observation:

y(n) ≜ [ y_n^T, . . . , y_{n−K+1}^T ]^T = A(θ_n) x(n) + w(n)

where x(n) and w(n) are constructed as y(n) and the transfer matrix A(θ_n) is given by

A(θ_n) ≜ [ A_n^T(θ_n), . . . , A_{n−K+1}^T(θ_n) ]^T.
As stated before, once the signal model has been determined, we only have to find the set
of constituent matrices in (4.12) and (4.18). To conclude, the derivatives of the steering vectors
are provided next:

∂[A_{n−k}(θ_n)]_{p,m} / ∂[φ_n]_q = jπm cos([φ_n]_p − k[φ̇_n]_p) exp( jπm sin([φ_n]_p − k[φ̇_n]_p) ) δ(p − q)

∂[A_{n−k}(θ_n)]_{p,m} / ∂[φ̇_n]_q = −jπmk cos([φ_n]_p − k[φ̇_n]_p) exp( jπm sin([φ_n]_p − k[φ̇_n]_p) ) δ(p − q)

for all p, q ∈ {1, . . . , P}, where δ(·) stands for the Kronecker delta.
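The steering-vector derivative with respect to [φ_n]_q can be checked numerically against a central finite difference. The sketch below assumes the antenna index m enters the exponential as in the steering-vector definition; the numerical values are illustrative:

```python
import numpy as np

M, P = 4, 2                               # antennas, sources (illustrative)

def steering(phi, dphi, k):
    """Columns [A_{n-k}]_p = exp(j*pi*m*sin(phi_p - k*dphi_p)), m = 0..M-1."""
    m = np.arange(M)[:, None]
    return np.exp(1j * np.pi * m * np.sin(phi[None, :] - k * dphi[None, :]))

phi  = np.array([0.2, -0.4])
dphi = np.array([0.01, -0.02])
k, p_idx = 1, 0

# analytic derivative of column p with respect to phi_p
m = np.arange(M)
arg = phi[p_idx] - k * dphi[p_idx]
d_analytic = 1j * np.pi * m * np.cos(arg) * np.exp(1j * np.pi * m * np.sin(arg))

# central finite difference on the same column
eps = 1e-7
phi_p, phi_m = phi.copy(), phi.copy()
phi_p[p_idx] += eps
phi_m[p_idx] -= eps
d_numeric = (steering(phi_p, dphi, k)[:, p_idx]
             - steering(phi_m, dphi, k)[:, p_idx]) / (2 * eps)
err = np.abs(d_analytic - d_numeric).max()
```

The Kronecker delta δ(p − q) is implicit here: perturbing [φ_n]_q only changes column q of the transfer matrix.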
6.5.2 Numerical Results
Two independent sources transmitting from the far-field to a uniform linear array composed of
M = 4 antennas are simulated. The received power is assumed to be the same for simplicity.
Both signals are QPSK modulated and two snapshots (K = 2) are recorded at the matched-filter
output.
The figure of merit considered in this section is the estimator normalized steady-state variance,
defined as

VAR(∆φ) ≜ E‖φ̂_n − φ_n‖² / ( P ∆φ² )    (6.52)
with ∆φ ≜ ([φ_n]_2 − [φ_n]_1)/2 half of the separation between the sources. The variance will be
plotted as a function of the SNR per source at the matched-filter output, E_s/N_0 = σ_w^{−2}, with E_s
the received symbol energy and N_0 the noise double-sided spectral density.
Two AoA trackers forcing a different set of constraints on g (θ) will be tested:
[Two panels: AoA (rad) vs. time n for the MUI-free tracker (left) and the MUI-resistant tracker (right).]
Figure 6.24: AoA tracking of two users whose trajectories cross at time instant n=500. The
output of the MUI-free and MUI-resistant trackers is plotted on the left and right hand side,
respectively. Two simulations are run with two different outcomes for the MUI-free tracker:
tracking is lost (solid line) or the two sources are interchanged (dashed line). The signal SNR is
fixed to 10 dB in both cases.
1. MUI-free AoA tracker: M^H D_r = D_g = I_P;

2. MUI-resistant AoA tracker: diag(M^H D_r) = diag(D_g) = diag(I_P), with the cross terms in (6.51) set to zero.
Two different scenarios have been simulated in order to illustrate the benefit of considering
the actual distribution of the sources when they are transmitting from similar angles.
Two users crossing
Figure 6.24 shows that the MUI-free AoA tracker (left plot) loses tracking as the two sources
approach each other due to the noise enhancement observed when the SNR is low (SNR=10dB).
This situation arises because, when the users are transmitting from similar angles, the matrix D_r becomes nearly singular and the estimator variance (4.13) increases abruptly.

On the other hand, the MUI-resistant AoA tracker (right plot) overcomes this critical situation because it does not try to remove the MUI term associated with the cross derivatives of D_r when the noise contribution is dominant (low SNR). Following the explanation in Section 4.4,
[Plot data omitted: normalized variance vs. E_s/N_0 (dB); curves CML, low-SNR ML, GML and BQUE, together with the UCRB and MCRB bounds.]
Figure 6.25: Steady-state variance of the AoA tracker for two sources located at ±5 degrees
from the broadside. The loop-bandwidth is set to Bn = 1.25 · 10−3 and the MUI-free estimator
is simulated.
the MUI-resistant AoA tracker liberates the cross derivatives in Dr while the users are crossing
and matrix Dr is badly conditioned. In this manner, the tracker does not enhance the noise
contribution and is able to remain “locked” during the crossing.
Steady-state variance for two near sources
The steady-state variance of the MUI-free AoA tracker is evaluated as a function of the SNR, considering two still users separated 10° (Fig. 6.25) and 1° (Fig. 6.26). The noise equivalent loop bandwidth B_n (Section 2.5.2) has been selected in order to guarantee the small-error condition for all the simulated SNRs (Chapter 4). For the studied set-up, the noise enhancement caused by the sources' proximity is found to be negligible. This fact makes the two suggested implementations (MUI-resistant and MUI-free) practically equivalent in the simulated scenarios. A minor improvement is appreciated in Fig. 6.26 for low SNR.
Theoretically, the performance of the MUI-free estimator is very limited at low SNR when the two sources are close, as shown in Fig. 6.27, whereas its competitor (MUI-resistant) achieves the single-user performance regardless of the simulated SNR. Thus, Fig. 6.27 illustrates the potential gain that the MUI-resistant alternative offers in terms of steady-state variance when the problem is badly conditioned and the observations are severely corrupted by the noise.
[Plot data omitted: normalized variance vs. E_s/N_0 (dB); curves low-SNR ML, CML, GML, MUI-resistant and BQUE, together with the UCRB and MCRB bounds.]
Figure 6.26: Steady-state variance of the AoA tracker for two sources located at ±0.5 degrees
from the broadside. The loop-bandwidth is set to Bn = 1.65 · 10−4 and the MUI-free estimator
is simulated.
In Fig. 6.25 and Fig. 6.26, the optimal second-order tracker has also been compared with the ML-based trackers formulated in Section 2.4. The first conclusion is that the low-SNR approximation appears to be useless in these critical scenarios for the SNRs of interest. The underlying reason is the so-called self-noise, i.e., the variance floor caused by the nuisance parameters at high SNR (Section 2.4.1). The self-noise is irrelevant when the SNR tends to zero but it becomes dominant as soon as the SNR is increased. Notice that, in the AoA estimation problem at hand, the self-noise is generated by the random symbols (nuisance parameters) of the user of interest as well as of the other interfering users. Therefore, the MUI and self-noise contributions are strongly connected in this case study.
To overcome the low-SNR UML variance floor, the CML tracker was proposed in Section
2.4.2. The CML is able to yield self-noise free estimates but it suffers from noise enhancement
when the SNR is low because it tries to decorrelate the nuisance parameters from the different
users.
Regarding the GML AoA tracker presented in Section 2.4.3, the convergence to the CML
solution for high SNR and to the low-SNR UML solution for low SNR (if the x -axis were
expanded) is observed. Between these two asymptotic extremes, the GML adjusts its coefficients
depending on the actual SNR to minimize the joint contribution of the noise and the self-noise.
Indeed, the GML solution is found to be the best quadratic estimator or tracker based uniquely
[Plot data omitted: normalized variance · Δφ² vs. γ (SNR); MUI-free and MUI-resistant curves for source separations 0.1, 0.2, 0.5 and 1.]
Figure 6.27: Normalized variance as a function of the SNR for the MUI-free and MUI-resistant
AoA trackers when the sources are separated 0.1, 0.2, 0.5 or 1 as indicated for each curve. The
tracker loop bandwidth is set to Bn = 1.25 · 10−3 .
on the second-order moments of the nuisance parameters, i.e., if K = 0 in (3.10). Nonetheless, when comparing the variance of the GML and the BQUE AoA trackers in Figs. 6.25-6.26, it is confirmed that second-order estimation is improved for medium-to-high SNRs if the fourth-order statistical knowledge on the nuisance parameters (K ≠ 0) is exploited. The resulting gain is shown to be greater when the angular separation is reduced, as one observes comparing Fig. 6.25 and Fig. 6.26. Moreover, when the loop bandwidth B_n is small (Fig. 6.26), the BQUE performance is rather close to the one predicted by the MCRB in case of known nuisance parameters and it definitely constitutes the lower bound for the variance of any unbiased estimator based on the sample covariance matrix.
Surprisingly, the GML estimator does not attain the UCRB outside the aforementioned asymptotic cases because the nuisance parameters are actually non-Gaussian (QPSK discrete symbols) whereas the UCRB is based on the Gaussian assumption.
Appendix 6.A Computation of Q for carrier phase estimation
The expected value of ∇²(y; θ_o) (6.12) can be manipulated as follows:

E{∇²(y; θ_o)} = 4σ_w^{-8} E{ Im²( e^{-j2θ_o} r̄^H r̂ ) }
= −σ_w^{-8} E{ ( e^{-j2θ_o} r̄^H r̂ − e^{j2θ_o} r̂^H r̄ )² }
= 2σ_w^{-8} ( r̄^H E{ r̂ r̂^H } r̄ − Re[ e^{-j4θ_o} r̄^H E{ r̂ r̂^T } r̄^* ] )
= 2σ_w^{-8} r̄^H ( E{ r̂ r̂^H } r̄ − e^{-j4θ_o} E{ r̂ r̂^T } r̄^* )

bearing in mind that both Γ and E_x{ vec(xx^T) vec^T(xx^T) } are real quantities for any CPM signal according to Laurent's expansion [Lau86]. In that case, the proper and improper correlation matrices of r̂ can be computed as follows:
E{ r̂ r̂^T } = e^{j4θ_o} A E_x{ vec(xx^T) vec^T(xx^T) } A^T

E{ r̂ r̂^H } = A E_x{ vec(xx^T) vec^H(xx^T) } A^H + (I + K)(R_w ⊗ AA^H) + (I + K)(AA^H ⊗ R_w) + (I + K)(R_w ⊗ R_w)
= A ( E_x{ vec(xx^T) vec^H(xx^T) } − 2P ) A^H + 2P (R ⊗ R)

where P ≜ (I + K)/2 and the following identities have been applied as done in Appendix 3.B:
vec(ABC^T) = (C ⊗ A) vec(B)
(A ⊗ B)(C ⊗ D) = AC ⊗ BD
vec(ab^T) vec^H(ab^T) = (b ⊗ a)(b ⊗ a)^H = bb^H ⊗ aa^H
vec(ba^T) vec^H(ab^T) = K vec(ab^T) vec^H(ab^T) = K (bb^H ⊗ aa^H) = (aa^H ⊗ bb^H) K.
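These vec/Kronecker identities are easy to verify numerically. The sketch below checks them with random matrices, using NumPy's column-major (Fortran-order) reshape as the vec operator; it is an illustrative check, not part of the derivation.

```python
import numpy as np

rng = np.random.default_rng(0)
vec = lambda X: X.reshape(-1, order='F')   # column-major stacking = vec operator

A, B, C = (rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3)) for _ in range(3))

# vec(A B C^T) = (C kron A) vec(B)
lhs = vec(A @ B @ C.T)
rhs = np.kron(C, A) @ vec(B)
print(np.allclose(lhs, rhs))

# mixed-product rule: (A kron B)(C kron D) = AC kron BD
D = rng.standard_normal((3, 3))
print(np.allclose(np.kron(A, B) @ np.kron(C, D), np.kron(A @ C, B @ D)))

# vec(a b^T) vec^H(a b^T) = (b kron a)(b kron a)^H = b b^H kron a a^H
a = rng.standard_normal(3) + 1j * rng.standard_normal(3)
b = rng.standard_normal(3) + 1j * rng.standard_normal(3)
v = vec(np.outer(a, b))
print(np.allclose(np.outer(v, v.conj()),
                  np.kron(np.outer(b, b.conj()), np.outer(a, a.conj()))))
```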
Finally, if Q is defined as

Q ≜ A K A^H + 2 R ⊗ R

with K given in (6.19), then

E{∇²(y; θ_o)} = 2σ_w^{-8} r̄^H P Q r̄ = 2σ_w^{-8} r̄^H Q r̄

using the following properties of the orthogonal projector P:

P A K A^H = A K A^H
P r̄ = r̄.
Appendix 6.B Asymptotic expressions for multiplicative channels
Let y = ax + w be the matched-filter output, with a ∈ (0, +∞) the amplitude we aim to estimate, x the vector of N symbols and w the AWGN of variance σ_w². From this simple model and after some manipulations, the variance of the GML and BQUE estimators is identical and is given by

VAR = B_UCRB + a²(ρ − 2)/(4N)

where B_UCRB denotes the associated UCRB (Section 2.6.1):

B_UCRB = (a² + σ_w²)² / (4Na²).

If we take the limit of VAR when the noise variance tends to zero, we obtain

VAR = a²(ρ − 1)/(4N) + σ_w²/(2N) + o(σ_w²),

which only goes to zero as the noise vanishes if N → ∞ (consistent estimator) or ρ = 1, which is the case of the MPSK modulation. This proves that the signal amplitude can only be perfectly estimated from a finite observation in case of constant-amplitude modulations such as MPSK.

Finally, notice that the performance of the GML and the BQUE is generally different if the estimator operates directly on the received signal, as shown in Fig. 6.23.
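As a quick numerical check of these closed-form expressions, the sketch below evaluates VAR and compares it with its low-noise expansion for a constant-modulus alphabet (ρ = 1, e.g. MPSK) and for 16-QAM (ρ = 1.32, the standard normalized kurtosis of that constellation); the function name and parameter values are illustrative.

```python
import numpy as np

def var_gml_bque(a, sigma2, N, rho):
    # VAR = B_UCRB + a^2*(rho - 2)/(4N), with B_UCRB = (a^2 + sigma2)^2 / (4*N*a^2)
    b_ucrb = (a**2 + sigma2) ** 2 / (4 * N * a**2)
    return b_ucrb + a**2 * (rho - 2) / (4 * N)

a, N = 1.0, 100
for rho, label in [(1.0, "MPSK (constant modulus)"), (1.32, "16-QAM")]:
    v = var_gml_bque(a, 1e-8, N, rho)
    # low-noise expansion: a^2*(rho - 1)/(4N) + sigma2/(2N)
    approx = a**2 * (rho - 1) / (4 * N) + 1e-8 / (2 * N)
    print(label, v, approx)
```

For ρ = 1 the variance collapses to the vanishing term σ_w²/(2N), while for ρ > 1 a floor a²(ρ − 1)/(4N) remains, matching the discussion above.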
Chapter 7
Asymptotic Studies
In this chapter, some asymptotic results are provided for the second-order estimators formulated
in Chapters 3 and 4. In the first sections, their asymptotic performance is evaluated when the
SNR is very low or very high. The low-SNR study concludes that the nuisance parameters
distribution is irrelevant in a noisy scenario. In that case, the Gaussian assumption is shown to
yield efficient estimators. On the other hand, the high-SNR asymptotic study is useful to bound
the losses incurred when the Gaussian assumption is applied in spite of having non-Gaussian
nuisance parameters. The most important conclusion is that the Gaussian assumption leads
to optimal second-order schemes unless the nuisance parameters belong to a constant modulus
constellation, such as MPSK or CPM. Therefore, the Gaussian assumption applies for very
important constellations in digital communications such as QAM or APK.
The theoretical study is accompanied by some simulations for the problem of bearing
estimation in case of digitally-modulated signals (Section 7.5.1). Numerical results are also
provided in Section 7.5.2 for the problem of feedforward second-order frequency estimation
initially addressed in Section 4.5. The same asymptotic study was carried out in Section 6.2 for
the carrier phase estimation problem in case of noncircular transmissions.
In the second part of the chapter, the asymptotic performance of the second-order small-error estimators in Chapter 4 is evaluated when the data record grows to infinity. Asymptotic expressions are deduced for a vast majority of estimation problems in digital communications, such as timing and frequency synchronization, channel impulse response estimation and time-of-arrival estimation, among others. In that case, the large sample asymptotic expressions
become a function of the spectra of the received waveform and its derivatives. In this context,
a simple condition is obtained that identifies whether the Gaussian assumption yields optimal
second-order schemes or not. From this result, the Gaussian assumption is proved to be optimal
for timing and frequency synchronization. Some simulations are supplied in Section 7.5.2 that
validate this last conclusion.
Asymptotic expressions are also obtained for the DOA estimation problem when the spatiotemporal observation grows indefinitely. If the number of antennas increases, it is shown that
the covariance of the estimation error is asymptotically independent of the sources statistical
distribution and, therefore, the Gaussian assumption can be applied to obtain efficient DOA
estimators. On the other hand, the Gaussian assumption is found to yield an important loss
if the number of sensors is small and multiple constant-modulus signals (e.g., MPSK or CPM)
impinge on the array from near directions. These conclusions are validated numerically in
Section 7.5.3.
7.1 Introduction
Let us summarize first the main results obtained in Chapters 3 and 4. As it was shown therein,
any second-order estimator of α = g(θ) is an affine transformation of the sample covariance
matrix, having the following form:
= g + MH (
r − r)
α
where g =Eθ {g(θ)} and r =Eθ E {
r} are the a priori knowledge about the parameter α and the
quadratic observation r = vec(yyH ), respectively.
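In code, this whole family of estimators reduces to one matrix-vector product on the vectorized sample covariance. The sketch below is a generic template with arbitrary placeholder values for M, ḡ and r̄ (the actual matrices come from the designs recalled next); all names and dimensions are hypothetical.

```python
import numpy as np

def second_order_estimate(y, M, g_bar, r_bar):
    # generic second-order estimator: alpha = g_bar + M^H (vec(y y^H) - r_bar)
    r_hat = np.outer(y, y.conj()).reshape(-1, order='F')   # vectorized sample covariance
    return g_bar + M.conj().T @ (r_hat - r_bar)

# toy usage with placeholder quantities
rng = np.random.default_rng(1)
My, P = 4, 1                                   # observation size and number of parameters
y = rng.standard_normal(My) + 1j * rng.standard_normal(My)
M = rng.standard_normal((My**2, P)) + 1j * rng.standard_normal((My**2, P))
g_bar = np.zeros(P)
r_bar = np.zeros(My**2, dtype=complex)
alpha = second_order_estimate(y, M, g_bar, r_bar)
print(alpha.shape)
```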
Based on the linear signal model presented in Section 2.4, matrix M was optimized in Chapters 3 and 4 by adopting different criteria. For the large-error MMSE and minimum variance second-order estimators studied in Chapter 3, matrix M was given by

M_mse ≜ ( Q̄ + Q̃ )^{-1} S
M_var ≜ Q̄^{-1} Q̃ ( Q̃ Q̄^{-1} Q̃ )^# S      (7.1)

where Q̃ was introduced in (3.23) and Q̄ is the Bayesian expectation E_θ{·} of the matrix

Q(θ) = 𝓡(θ) + 𝒜(θ) K 𝒜^H(θ),

with K the fourth-order cumulant matrix in (3.11) and

𝒜(θ) ≜ A^*(θ) ⊗ A(θ)
𝓡(θ) ≜ R^*(θ) ⊗ R(θ)      (7.2)
R(θ) ≜ A(θ) A^H(θ) + R_w.
On the other hand, the optimum second-order small-error estimator was obtained in Chapter 4, having that

M_bque(θ) ≜ Q^{-1}(θ) D_r(θ) ( D_r^H(θ) Q^{-1}(θ) D_r(θ) )^{-1} D_g^H(θ)      (7.3)
(7.3)
where θ stands henceforth for the actual value of the parameter and

[D_r(θ)]_p ≜ vec( ∂R(θ)/∂θ_p ) = vec( (∂A(θ)/∂θ_p) A^H(θ) + A(θ) (∂A^H(θ)/∂θ_p) )      (7.4)

is the derivative of R(θ) with respect to the p-th parameter θ_p ≜ [θ]_p.
The MSE matrices for the above estimators are given by¹

Σ_mse ≜ Σ_g − S^H ( Q̄ + Q̃ )^{-1} S
Σ_var ≜ Σ_g + S^H ( Q̃ Q̄^{-1} Q̃ )^# S − S^H Q̃^# S      (7.5)
B_bque(θ) ≜ D_g(θ) ( D_r^H(θ) Q^{-1}(θ) D_r(θ) )^{-1} D_g^H(θ)

where D_g(θ) ≜ ∂g(θ)/∂θ^T and

Σ_g ≜ E_θ{ (g(θ) − ḡ)(g(θ) − ḡ)^H }

stands for the prior covariance matrix.
Finally, when the above estimators are deduced under the Gaussian assumption (i.e., K = 0 and Q(θ) = 𝓡(θ) ≜ R^*(θ) ⊗ R(θ)), their performance is given by the following MSE matrices, with R̄ ≜ E_θ{𝓡(θ)}:

Σ̃_mse ≜ Σ_g − S^H ( Q̃ + R̄ )^{-1} S + X_mse(K)
Σ̃_var ≜ Σ_g + S^H ( Q̃ R̄^{-1} Q̃ )^# S − S^H Q̃^# S + X_var(K)      (7.6)
B_gml(θ) ≜ B_UCRB(θ) + X_gml(K)
where

B_UCRB(θ) ≜ D_g(θ) ( D_r^H(θ) 𝓡^{-1}(θ) D_r(θ) )^{-1} D_g^H(θ)      (7.7)

is the well-known (Gaussian) unconditional CRB (Section 2.6.1) and X_mse(K), X_var(K) and X_gml(K) are the terms depending on the kurtosis matrix K, which are given by
X_mse(K) ≜ S^H ( Q̃ + R̄ )^{-1} E_θ{ 𝒜(θ) K 𝒜^H(θ) } ( Q̃ + R̄ )^{-1} S

X_var(K) ≜ S^H ( Q̃ R̄^{-1} Q̃ )^# Q̃ R̄^{-1} E_θ{ 𝒜(θ) K 𝒜^H(θ) } R̄^{-1} Q̃ ( Q̃ R̄^{-1} Q̃ )^# S      (7.8)

X_gml(K) ≜ D_g(θ) ( D_r^H(θ) 𝓡^{-1}(θ) D_r(θ) )^{-1} D_r^H(θ) 𝓡^{-1}(θ) 𝒜(θ) K 𝒜^H(θ) 𝓡^{-1}(θ) D_r(θ) ( D_r^H(θ) 𝓡^{-1}(θ) D_r(θ) )^{-1} D_g^H(θ)
It will be shown in the next sections that X_gml(K) is always negligible at very low or high SNR. Nonetheless, the GML estimator might outperform the associated UCRB if the SNR is
¹The "MSE matrix" is defined as E_θ{E{ee^H}}, where e stands for the considered estimation error [Kay93b].
moderate and Xgml (K) is negative. This behaviour has been observed, for example, in the
DOA estimation problem in Section 6.5. On the other hand, in this chapter, the Gaussian
assumption is proved to yield the optimal second-order estimator when the SNR goes to zero or
if the amplitude of the nuisance parameters is not constant and the SNR goes to infinity. Finally,
regarding the large-error MMSE and minimum variance estimators, Xvar (K) and Xmse (K) are
irrelevant at low SNR but they are determinant at high SNR because they are able to reduce
the variance floor.
Before going into detail, let us decompose the noise covariance matrix R_w as σ_w² N in order to make explicit the dependence on the noise variance σ_w². Assuming that the noise is stationary, the diagonal entries of R_w are precisely the noise variance σ_w². Formally, the noise variance is given by

σ_w² ≜ Tr(R_w)/M

and, therefore, N = R_w/σ_w² has unit diagonal entries by definition. Furthermore, in the next sections, it will be useful to consider the following fourth-order matrix:

𝒩 ≜ N^* ⊗ N.
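This decomposition can be sketched as follows; the exponentially correlated stationary noise covariance is purely an illustrative choice.

```python
import numpy as np

def noise_decomposition(Rw):
    # split Rw into sigma2 * N with unit-diagonal N, plus the fourth-order matrix conj(N) kron N
    Mdim = Rw.shape[0]
    sigma2 = np.trace(Rw).real / Mdim
    N = Rw / sigma2
    return sigma2, N, np.kron(N.conj(), N)

# example: exponentially correlated stationary noise with variance 2
Mdim, rho = 3, 0.5
Rw = 2.0 * rho ** np.abs(np.subtract.outer(np.arange(Mdim), np.arange(Mdim)))
s2, N, Nbig = noise_decomposition(Rw)
print(s2, np.allclose(np.diag(N), 1.0))
```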
7.2 Low SNR Study
When the noise variance goes to infinity (σ_w² → ∞), the inverses of R(θ), Q(θ) and 𝓡(θ) ≜ R^*(θ) ⊗ R(θ) take the following asymptotic form:

R^{-1}(θ) = σ_w^{-2} N^{-1} + o(σ_w^{-2})
Q^{-1}(θ), 𝓡^{-1}(θ) = σ_w^{-4} 𝒩^{-1} + o(σ_w^{-4})

assuming that N is full-rank. The Landau symbol o(x) is introduced to consider all those terms that converge to zero faster than x. On the other hand, the rest of the matrices appearing in (7.5) and (7.6) are independent of σ_w². Specifically, the noise variance does not affect the value of A(θ), 𝒜(θ), Q̃, S, D_r(θ), D_g(θ), K and Σ_g.
Therefore, the MSE matrices in (7.5) and (7.6) have the following asymptotic expressions at low SNR:

Σ_mse, Σ̃_mse = Σ_g − σ_w^{-4} S^H 𝒩^{-1} S + o(σ_w^{-4})
Σ_var, Σ̃_var = σ_w^4 S^H ( Q̃ 𝒩^{-1} Q̃ )^# S + o(σ_w^4)      (7.9)
B_UCRB(θ), B_gml(θ), B_bque(θ) = σ_w^4 D_g(θ) ( D_r^H(θ) 𝒩^{-1} D_r(θ) )^{-1} D_g^H(θ) + o(σ_w^4)

taking into account that X_mse(K) in (7.8) is proportional to σ_w^{-8} and that X_var(K) and X_gml(K) are constant. Notice that the fourth-order matrix K does not appear in any of the above
asymptotic expressions. This implies that the actual distribution of the nuisance parameters
becomes irrelevant at low SNR when designing second-order schemes. Moreover, any assumption
about the distribution of the nuisance parameters yields the same MSE expressions in (7.9).
To complete the analysis, the asymptotic expression of the studied second-order estimators is provided next:

M_mse, M̃_mse = σ_w^{-4} 𝒩^{-1} S + o(σ_w^{-4})
M_var, M̃_var = 𝒩^{-1} Q̃ ( Q̃ 𝒩^{-1} Q̃ )^# S + o(1)      (7.10)
M_bque(θ), M_gml(θ) = 𝒩^{-1} D_r(θ) ( D_r^H(θ) 𝒩^{-1} D_r(θ) )^{-1} D_g^H(θ) + o(1)

where M̃_mse and M̃_var correspond to the MMSE and minimum variance estimators obtained under the Gaussian assumption.
In Appendix 7.A, it is shown that Mbque (θ) and Mgml (θ) in (7.10) coincide with the scoring
method that implements the low-SNR ML estimator deduced in Section 2.4.1. Due to the
asymptotic efficiency of the ML estimator (Section 2.3.2), if the GML and BQUE estimators
converge to the ML solution at low SNR, we can state that the GML and BQUE estimators
become asymptotically efficient as the SNR goes to zero. As it was discussed in Section 2.3.2,
the “asymptotic” condition is satisfied whenever the estimator operates in the small-error regime
or, equivalently, the actual SNR exceeds the SNR threshold. Accordingly, in the studied low
SNR scenario (σ2w → ∞), the asymptotic condition requires that the observation length goes to
infinity (M → ∞) in order to attain the small-error regime.
Likewise, because the GML is efficient at low SNR, the associated (Gaussian) UCRB (7.9) becomes the true CRB at low SNR if and only if the observation size goes to infinity (small-error). Notice that both the UCRB and the true CRB are proportional to σ_w^4 at low SNR, as it was reported in [Ste01] for the problem of timing synchronization.
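A small numerical experiment illustrates the low-SNR expansion of R^{-1}(θ): as σ_w² grows, σ_w^{-2} N^{-1} becomes an increasingly accurate approximation of the exact inverse. White noise (N = I_M) and a random transfer matrix are assumed purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
Mdim, K = 4, 2
A = rng.standard_normal((Mdim, K)) + 1j * rng.standard_normal((Mdim, K))
N = np.eye(Mdim)                      # white noise for simplicity (unit diagonal)

errs = []
for sigma2 in (1e2, 1e4, 1e6):
    R = A @ A.conj().T + sigma2 * N   # R = A A^H + sigma2 * N
    approx = np.linalg.inv(N) / sigma2            # sigma_w^{-2} N^{-1}
    exact = np.linalg.inv(R)
    errs.append(np.linalg.norm(exact - approx) / np.linalg.norm(exact))
print(errs)  # relative error shrinks roughly like 1/sigma2
```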
7.3 High SNR Study
In low SNR conditions, the Gaussian assumption has been proved to yield optimal second-order
estimators. However, when the SNR increases, the optimal second-order estimators listed in
(7.1) and (7.3) exploit the fourth-order statistical information about the nuisance parameters
contained in matrix K. When the Gaussian assumption is adopted and this information is
omitted (K = 0), the performance of the studied second-order estimators degrades at high
SNR. In this section, this loss is upper bounded by evaluating the asymptotic performance of
the aforementioned estimation methods when the noise variance goes to zero.
In Appendix 7.B, the asymptotic value of R^{-1}(θ) and 𝓡^{-1}(θ) as the noise variance goes to zero is calculated, obtaining
R^{-1}(θ) = σ_w^{-2} P_A^⊥(θ) + B(θ) − σ_w² B(θ) N B(θ) + O(σ_w^4)      (7.11)

𝓡^{-1}(θ) = σ_w^{-4} ( P_A^{⊥*}(θ) ⊗ P_A^⊥(θ) )
+ σ_w^{-2} ( B^*(θ) ⊗ P_A^⊥(θ) + P_A^{⊥*}(θ) ⊗ B(θ) )
+ B^*(θ) ⊗ B(θ)
− σ_w² ( B^*(θ) ⊗ B(θ)NB(θ) + [B(θ)NB(θ)]^* ⊗ B(θ) )
+ O(σ_w^4)      (7.12)
where the Landau symbol O(x) includes all the terms that converge to zero as x or faster. The asymptotic value of R^{-1}(θ) and 𝓡^{-1}(θ) is given in terms of the following matrices:

A^#(θ) ≜ ( A^H(θ) N^{-1} A(θ) )^{-1} A^H(θ) N^{-1}      (7.13)
P_A^⊥(θ) ≜ N^{-1} ( I_M − A(θ) A^#(θ) )      (7.14)
B(θ) ≜ (A^#(θ))^H A^#(θ)      (7.15)
where A^#(θ) and P_A^⊥(θ) are variations of the Moore-Penrose pseudoinverse and of the projector onto the null subspace of A(θ), respectively. The original definitions are altered to include the whitening matrix N^{-1} in case of correlated noise samples (i.e., N ≠ I_M). Although abusing of notation, the above matrices retain all the properties of the original definitions, that is,

A^#(θ) A(θ) = I_K
A(θ) A^#(θ) A(θ) = A(θ)
A^#(θ) A(θ) A^#(θ) = A^#(θ)
P_A^⊥(θ) A(θ) = 0
A^H(θ) P_A^⊥(θ) = 0.
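These properties can be verified numerically. The sketch below forms the projector as N^{-1}(I_M − A A^#), an ordering chosen here because it is consistent with both P_A^⊥ A = 0 and A^H P_A^⊥ = 0; that ordering is an assumption of this illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
Mdim, K = 5, 2
A = rng.standard_normal((Mdim, K)) + 1j * rng.standard_normal((Mdim, K))
W = rng.standard_normal((Mdim, Mdim))
N = W @ W.T + Mdim * np.eye(Mdim)     # Hermitian positive-definite noise shape

Ninv = np.linalg.inv(N)
# whitened (oblique) pseudoinverse A# = (A^H N^-1 A)^-1 A^H N^-1
Apinv = np.linalg.inv(A.conj().T @ Ninv @ A) @ A.conj().T @ Ninv
# assumed projector ordering: P = N^-1 (I - A A#)
Pperp = Ninv @ (np.eye(Mdim) - A @ Apinv)

print(np.allclose(Apinv @ A, np.eye(K)))       # A# A = I_K
print(np.allclose(Pperp @ A, 0))               # P_A^perp A = 0
print(np.allclose(A.conj().T @ Pperp, 0))      # A^H P_A^perp = 0
```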
On the other hand, the asymptotic value of Q^{-1}(θ) depends on the kurtosis matrix K. The complete study is carried out in Appendix 7.C when K is full-rank and in Appendix 7.D when K is singular. In these appendices, Q^{-1}(θ) is proved to have the following asymptotic expression:

Q^{-1}(θ) = 𝓡^{-1}(θ) + σ_w^{-2} (𝒜^#(θ))^H P_K^⊥(θ) 𝒜^#(θ) + O(1)      (7.16)

where the pseudoinverse of 𝒜(θ) = A^*(θ) ⊗ A(θ) (7.2) is defined as follows

𝒜^#(θ) ≜ ( 𝒜^H(θ) 𝒩^{-1} 𝒜(θ) )^{-1} 𝒜^H(θ) 𝒩^{-1} = A^{#*}(θ) ⊗ A^#(θ)

and P_K^⊥(θ) stands for the projector onto the subspace generated by the eigenvectors of K = V_K Σ_K V_K^H associated to the eigenvalue −1.
The second term in (7.16) is positive semidefinite and it becomes zero if and only if P_K^⊥(θ) = 0, i.e., if all the eigenvalues of K are different from −1. The rank of P_K^⊥(θ) is thus determinant to assess the potential benefit of considering the kurtosis matrix K in the design of second-order estimators. The exact expression of P_K^⊥(θ) is given in Appendix 7.C (K full-rank) and in Appendix 7.D (K singular). In Section 7.3.3, the study of P_K^⊥(θ) is addressed in more detail.
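The projector P_K^⊥ can be built directly from the eigendecomposition of K. For a circular alphabet, K = (ρ − 2) diag(vec(I_K)) (see Section 7.3.3), so eigenvalues equal to −1 appear only for constant-modulus alphabets (ρ = 1). A minimal sketch, with the small dimension K = 2 chosen for illustration:

```python
import numpy as np

def proj_minus_one(Kmat, tol=1e-9):
    # projector onto the span of the eigenvectors of K with eigenvalue -1
    w, V = np.linalg.eigh(Kmat)          # K is Hermitian
    sel = np.abs(w + 1.0) < tol
    return V[:, sel] @ V[:, sel].conj().T

Ksym = 2                                 # number of nuisance parameters
for rho in (1.0, 1.32):                  # QPSK vs. 16-QAM normalized kurtosis
    Kmat = (rho - 2.0) * np.diag(np.eye(Ksym).reshape(-1))
    P = proj_minus_one(Kmat)
    print(rho, np.linalg.matrix_rank(P))
```

For ρ = 1 the projector has rank K (the second term of (7.16) is active); for ρ ≠ 1 it is identically zero.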
It is worth realizing that all the above asymptotic results implicitly assume that the transfer matrix A(θ) is full column rank. This will be the baseline for the asymptotic studies in this section. In addition, some indications are given in Appendix 7.E to carry out the asymptotic study when the rank of A(θ) is lower than the number of columns K.
7.3.1 (Gaussian) Unconditional Cramér-Rao Bound
The (Gaussian) UCRB is widely used to lower bound the performance of second-order estimators.
Thus far, it is proved that the UCRB is a valid second-order lower bound when the SNR goes
to zero or if the nuisance parameters are actually Gaussian. Nonetheless, this is not generally
true. Indeed, the UCRB is shown to be outperformed at high SNR by the optimal second-order
small-error estimator proposed in Chapter 4. Likewise, the GML estimator usually outperforms
the UCRB for intermediate SNRs.
In this section, the high-SNR limit of B_UCRB(θ) when the noise variance goes to zero is derived. It is shown that B_UCRB(θ) becomes proportional to σ_w² at high SNR and, therefore, self-noise free estimates are feasible when the nuisance parameters are Gaussian. Formally, we have that

B_UCRB(θ) = D_g(θ) ( D_r^H(θ) 𝓡^{-1}(θ) D_r(θ) )^{-1} D_g^H(θ)
= σ_w² D_g(θ) B_1^{-1}(θ) D_g^H(θ) + O(σ_w^4)      (7.17)

where B_1(θ) stands for the high-SNR limit of σ_w² D_r^H(θ) 𝓡^{-1}(θ) D_r(θ). The entries of B_1(θ) are determined in Appendix 7.F, obtaining

[B_1(θ)]_{p,q} = 2 Re Tr( (∂A^H(θ)/∂θ_p) P_A^⊥(θ) (∂A(θ)/∂θ_q) ).      (7.18)
Notice that this result requires that ∂A(θ)/∂θ_p does not lie totally on the subspace generated by the columns of A(θ), i.e.,

P_A^⊥(θ) ∂A(θ)/∂θ_p ≠ 0

for all the parameters θ_1, . . . , θ_P. The abnormal situation in which this condition fails takes place if the noise subspace of matrix A(θ) is null (Appendix 7.D) but also in the problem of carrier phase synchronization
addressed in Section 6.2. In both cases, the constant term B^*(θ) ⊗ B(θ) in (7.12) has to be considered in order to evaluate the variance floor at high SNR, having that

lim_{σ_w²→0} B_UCRB(θ) = D_g(θ) B_2^{-1}(θ) D_g^H(θ)      (7.19)

where B_2(θ) stands for the high-SNR limit of D_r^H(θ) 𝓡^{-1}(θ) D_r(θ). The entries of B_2(θ) are determined in Appendix 7.G, obtaining

[B_2(θ)]_{p,q} ≜ 2 Re Tr( (∂A(θ)/∂θ_p) A^#(θ) (∂A(θ)/∂θ_q) A^#(θ) + (∂A(θ)/∂θ_p) (∂A^H(θ)/∂θ_q) B(θ) ).      (7.20)

7.3.2 Gaussian Maximum Likelihood
In most estimation problems, the UCRB takes the form in equation (7.17) and self-noise free estimation is possible with Gaussian nuisance parameters. In that case, the asymptotic performance of the GML estimator is exactly the one computed in (7.17), irrespective of the actual distribution of the nuisance parameters, i.e., even if K ≠ 0. Formally, we have that

B_gml(θ), B_UCRB(θ) = σ_w² D_g(θ) B_1^{-1}(θ) D_g^H(θ) + O(σ_w^4)      (7.21)

with B_1(θ) given in (7.18).

This statement is true because the term X_gml(K) in (7.6) can be neglected since it depends on σ_w^4 whereas B_UCRB(θ) is proportional to σ_w². Notice that X_gml(K) is proportional to σ_w^4 because

𝓡^{-1}(θ) 𝒜(θ) = [B^*(θ) ⊗ B(θ)] 𝒜(θ) + O(σ_w²)

is asymptotically constant, as pointed out in Appendix 7.B.
Finally, if ∂A(θ)/∂θ_p and A(θ) were linearly dependent, the GML performance would exhibit a variance floor at high SNR that would be a function of the kurtosis matrix K. Using equation (7.6), it follows that the GML variance floor would be equal to

lim_{σ_w²→0} B_gml(θ) = D_g(θ) B_2^{-1}(θ) D_g^H(θ)
+ D_g(θ) B_2^{-1}(θ) D_r^H(θ) (𝒜^#(θ))^H K 𝒜^#(θ) D_r(θ) B_2^{-1}(θ) D_g^H(θ)      (7.22)

where B_UCRB(θ) = D_g(θ) B_2^{-1}(θ) D_g^H(θ) is the variance floor in case of Gaussian nuisance parameters (7.19) and the second term corresponds to X_gml(θ) in (7.6).
7.3.3 Best Quadratic Unbiased Estimator

In this section, closed-form expressions are obtained for the ultimate performance of second-order small-error estimators at high SNR. The study in Appendix 7.C and Appendix 7.D comes
to the conclusion that the Gaussian assumption is optimal at high SNR unless some eigenvalues of the kurtosis matrix K are equal to −1. It seems that this condition is related to the constant modulus of the nuisance parameters. This important result suggests classifying the nuisance parameters distribution according to the eigendecomposition of K. With this purpose, let us first obtain the asymptotic expression of B_bque(θ) (7.5) as the noise variance goes to zero, i.e., σ_w² → 0.
Using the asymptotic value of Q^{-1}(θ) in (7.16), it follows that

D_r^H(θ) Q^{-1}(θ) D_r(θ) = D_r^H(θ) 𝓡^{-1}(θ) D_r(θ)
+ σ_w^{-2} D_r^H(θ) (𝒜^#(θ))^H P_K^⊥(θ) 𝒜^#(θ) D_r(θ) + O(1)      (7.23)

where P_K^⊥(θ) ∈ R^{K²×K²} denotes the projector onto the subspace generated by the eigenvectors of K associated to the eigenvalue −1.

Using now the asymptotic expression of D_r^H(θ) 𝓡^{-1}(θ) D_r(θ) in (7.17), it follows that

B_bque(θ) = σ_w² D_g(θ) ( B_1(θ) + D_r^H(θ) (𝒜^#(θ))^H P_K^⊥(θ) 𝒜^#(θ) D_r(θ) )^{-1} D_g^H(θ) + O(σ_w^4)      (7.24)

where the second term inside the inverse is always positive semidefinite and, therefore, we can state at high SNR that

B_bque(θ) ≤ B_gml(θ) = B_UCRB(θ).
The second term of (7.24) is zero and, therefore, the Gaussian assumption applies at high SNR in any of the following situations:

1. Signal parameterization. The Gaussian assumption applies at high SNR if ∂A(θ)/∂θ_p lies totally in the noise subspace of A(θ), i.e.,

∂A(θ)/∂θ_p = P_A^⊥(θ) ∂A(θ)/∂θ_p

or, taking into account the definition of P_A^⊥(θ) in (7.14),

A^H(θ) N^{-1} ∂A(θ)/∂θ_p = 0.

In that case, after some simple manipulations, it can be shown that

[𝒜^#(θ) D_r(θ)]_p = vec( A^#(θ) (∂R(θ)/∂θ_p) (A^#(θ))^H )
= vec( A^#(θ) (∂A(θ)/∂θ_p) + (∂A^H(θ)/∂θ_p) (A^#(θ))^H ) = 0      (7.25)
CHAPTER 7. ASYMPTOTIC STUDIES
186
and, thus, the second term in (7.24) is strictly zero independently of the nuisance parameters distribution. For example, this condition applies in digital synchronization as the
observation length goes to infinity (Section 7.4.4).
By comparing this condition and the one introduced in Section 7.3.1, we can conclude that
the condition (7.25) never applies if the UCRB and GML suffer from self-noise at high
SNR since, in that case, P⊥
A (θ) ∂A (θ) /∂θp = 0.
2. Nuisance parameters distribution. Regardless of the signal parameterization, the Gaussian assumption applies at high SNR if all the eigenvalues of the kurtosis matrix K are different from −1. In that case, P_K^⊥(θ) is strictly zero, and the second term in (7.24) becomes zero.

If the nuisance parameters are drawn from an arbitrary circular complex alphabet, the kurtosis matrix is given by K = (ρ − 2) diag(vec(I_K)) (3.12) and, therefore, the Gaussian assumption always applies except if ρ = 1. It can be shown that this condition (ρ = 1) is solely verified in case of constant-modulus alphabets. Accordingly, in the context of digital communications, the Gaussian assumption applies for any multilevel linear modulation such as QAM or APK. On the other hand, it does not apply in case of any complex MPSK modulation holding ρ = 1.

If the nuisance parameters are not circular, there is no closed-form expression for the eigenvalues of K. However, it is found that the kurtosis matrix of some important constant-modulus noncircular modulations has some eigenvalues equal to −1. Among them, special attention is given in this thesis to the CPM modulation. Other important constant-modulus noncircular modulations are the BPSK and the constant-modulus staggered modulations such as the offset QPSK [Pro95].
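The normalized kurtosis ρ = E{|x|⁴}/E²{|x|²} that governs this classification is easy to evaluate for any finite alphabet; the sketch below does so for QPSK and 16-QAM.

```python
import numpy as np

def kurtosis_rho(symbols):
    # rho = E|x|^4 / (E|x|^2)^2 for an equiprobable alphabet
    p2 = np.mean(np.abs(symbols) ** 2)
    return np.mean(np.abs(symbols) ** 4) / p2 ** 2

qpsk = np.exp(1j * np.pi * (2 * np.arange(4) + 1) / 4)
qam16 = np.array([a + 1j * b for a in (-3, -1, 1, 3) for b in (-3, -1, 1, 3)])

print(kurtosis_rho(qpsk))    # 1.0: constant modulus, Gaussian assumption fails at high SNR
print(kurtosis_rho(qam16))   # 1.32: Gaussian assumption applies
```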
Finally, in those scenarios in which the UCRB (7.19) and the GML (7.22) exhibit a variance floor at high SNR because P_A^⊥(θ) ∂A(θ)/∂θ_p = 0, the Gaussian assumption fails when the nuisance parameters have constant modulus. In that case, the second term in (7.24) allows cancelling the self-noise because

D_r^H(θ) Q^{-1}(θ) D_r(θ) = B_2(θ) + σ_w^{-2} D_r^H(θ) (𝒜^#(θ))^H P_K^⊥(θ) 𝒜^#(θ) D_r(θ) + O(1)

and, therefore, the constant term B_2(θ) (7.20) can be neglected when compared to the second term, which is proportional to σ_w^{-2}. Using this result, the asymptotic variance of the optimal second-order estimator is given by

B_bque(θ) = σ_w² D_g(θ) ( D_r^H(θ) (𝒜^#(θ))^H P_K^⊥(θ) 𝒜^#(θ) D_r(θ) )^{-1} D_g^H(θ) + O(σ_w^4).
This situation arises in the carrier phase estimation problem studied in Section 6.2 as well
as in the scenarios simulated in Section 4.5 in which A (θ) is not full-column rank.
7.3.4 Large Error Estimators
In this section, the asymptotic performance of the Bayesian estimators in (7.1) is analyzed when the SNR goes to infinity. The result of this asymptotic study depends on the influence of the Bayesian expectation E_θ{·} on the following matrices:

R̄ ≜ E_θ{R(θ)} = G + σ_w² N
𝓡̄ ≜ E_θ{𝓡(θ)} = E_θ{R^*(θ) ⊗ R(θ)} = E_θ{𝒜(θ) 𝒜^H(θ)} + σ_w² U + σ_w^4 𝒩      (7.26)
Q̄ ≜ E_θ{Q(θ)} = 𝓡̄ + E_θ{𝒜(θ) K 𝒜^H(θ)} = E_θ{𝒜(θ) (I_{K²} + K) 𝒜^H(θ)} + σ_w² U + σ_w^4 𝒩

with the following definitions:

G ≜ E_θ{A(θ) A^H(θ)}      (7.27)
U ≜ G^* ⊗ N + N^* ⊗ G.      (7.28)
The Bayesian expectation always increases the rank of these matrices. Even if the prior distribution is rather informative, these matrices become rapidly full rank. Therefore, let us consider that G, and hence U, are eventually full rank. In that case, the MSE matrices in (7.5) and (7.6) converge to the following limits at high SNR (Appendix 7.H):

lim_{σ_w²→0} Σ_mse = Σ_g − S^H B_{T1} S
lim_{σ_w²→0} Σ_var = Σ_g + S^H Q̃ B_{T2} Q̃ S − S^H Q̃^# S
lim_{σ_w²→0} Σ̃_mse = Σ_g − S^H B_{T3} S + X_mse(K)      (7.29)
lim_{σ_w²→0} Σ̃_var = Σ_g + S^H Q̃ B_{T4} Q̃ S − S^H Q̃^# S + X_var(K)

where

X_mse(K) = S^H B_{T3} E_θ{𝒜(θ) K 𝒜^H(θ)} B_{T3} S
X_var(K) = S^H Q̃ B_{T4} E_θ{𝒜(θ) K 𝒜^H(θ)} B_{T4} Q̃ S      (7.30)

and B_T is computed as

B_T ≜ U^{-1} V_T ( V_T^H U^{-1} V_T )^{-1} Σ_T^{-1} ( V_T^H U^{-1} V_T )^{-1} V_T^H U^{-1}      (7.31)

with V_T Σ_T V_T^H the "economy-size" diagonalization of the specific matrix T considered in (7.29):

T_1 ≜ E_θ{𝒜(θ) (I_{K²} + K) 𝒜^H(θ)} + Q̃
T_2 ≜ E_θ{𝒜(θ) (I_{K²} + K) 𝒜^H(θ)}
T_3 ≜ E_θ{𝒜(θ) 𝒜^H(θ)} + Q̃      (7.32)
T_4 ≜ E_θ{𝒜(θ) 𝒜^H(θ)}.
Taking a glance at (7.29), one observes that the terms

S^H Q̃ B_{T2} Q̃ S      and      S^H Q̃ B_{T4} Q̃ S + X_var(K)

in Σ_var and Σ̃_var correspond to the self-noise, whereas the term Σ_g − S^H Q̃^# S is the estimator bias at high SNR. The self-noise terms were found to vanish when the observation time was infinite for the problem of blind frequency estimation in Section 3.4. On the other hand, the bias term could not be cancelled by extending the observation time in the problem of blind frequency estimation (Section 3.4).
7.4 Large Sample Study
In this section, the asymptotic performance of the second-order estimators deduced in Chapter
4 is evaluated when the number of observed samples goes to infinity (M → ∞). Notice that
M can be increased by augmenting either the sampling rate (Nss ) or the observation interval
(Ns = M/Nss ).
In the first case, the sampling theorem states that it is enough to take N_ss = 2 samples per symbol for those modulations with an excess bandwidth smaller than 100%. However, when the observation window is too short, the observed spectrum becomes wider due to the well-known smearing and leakage effects [Sto97]. The proposed estimators deal with this problem by applying the best temporal window according to the known signal model and the adopted optimization criterion. Nonetheless, if the vector of nuisance parameters is longer than the number of observed samples, it is not possible to avoid the variance floor at high SNR unless N_ss is increased (see Appendix 7.E). This problem is only relevant when the observation time is really short, as has been considered in this dissertation so far. If the observation time N_s is augmented, the problem of spectral aliasing becomes rapidly negligible and N_ss = 2 becomes sufficient.
The importance of the sampling rate was also evidenced in Section 3.4 for the problem of
carrier frequency-offset synchronization. It was shown therein that the estimator bias can only
be cancelled if Nss goes to infinity. Surprisingly, the bias term cannot be removed by only
increasing Ns . However, this sort of argument is specific to the frequency estimation problem
and should be revised for other estimation problems.
Considering in the sequel that Nss is fixed, asymptotic expressions are given in this section
for the small-error second-order estimators deduced in Chapter 4 as the observation length goes
to infinity (M → ∞) . The study for the large error estimators in Chapter 3 is omitted because
it is less insightful due to the role of the Bayesian expectation (see Section 7.3.4).
7.4. LARGE SAMPLE STUDY
189
In the large sample case, a unified analysis is not feasible because the results depend on the
actual parameterization and on how A (θ) grows when M → ∞. In this section, the problems
of non-data-aided synchronization in Section 6.1, blind time-of-arrival estimation in Section 6.3,
blind channel identification in Section 6.4 and DOA estimation in Section 6.5 are considered.
Before addressing the asymptotic study for the aforementioned estimation problems, the
covariance matrices BU CRB (θ), Bgml (θ) and Bbque (θ) in Section 7.1 are now restated in terms
of the following matrices

B(θ) ≜ A^H(θ) N^{-1} A(θ)
B_p(θ) ≜ (∂A^H(θ)/∂θ_p) N^{-1} A(θ)                                                (7.33)
B_{p,q}(θ) ≜ (∂A^H(θ)/∂θ_p) N^{-1} (∂A(θ)/∂θ_q)

for p, q = 1, . . . , P . These matrices collect all the scalar products between the columns of A(θ)
and ∂A(θ)/∂θ_p, normalized by means of N^{-1}. It is shown in the following subsections that
these K × K matrices entirely determine the performance of second-order estimators. Thus, it
is only necessary to study the asymptotic value of B(θ), B_p(θ) and B_{p,q}(θ) as the number of
observations goes to infinity (M → ∞) or, in other words, as the dimension of the column space
of A(θ) and ∂A(θ)/∂θ_p increases without limit.
For the sake of clarity, we will consider hereafter that g (θ) = θ and, in most cases, the noise
term will be assumed white, i.e., N = IM .
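The Gram matrices of (7.33) can be sketched numerically. The following toy example is my own (a single delay parameter, a Gaussian pulse and white noise); it only illustrates how B(θ), B_p(θ) and B_{p,q}(θ) are assembled:

```python
import numpy as np

# Toy sketch of the Gram matrices in (7.33) (my own example): a
# single-parameter delay model with white noise (N = I_M) and a Gaussian
# pulse, whose derivative is available in closed form.
M, K = 32, 4                       # observed samples and columns of A(theta)
t = np.arange(M)[:, None]          # sample grid (broadcast over the K columns)
centers = 8.0 + 5.0 * np.arange(K) # one shifted pulse per column
theta = 0.3                        # delay parameter (in samples)

A = np.exp(-0.5 * (t - centers - theta) ** 2)    # A(theta)
dA = (t - centers - theta) * A                   # dA/dtheta (Gaussian pulse)

B = A.conj().T @ A                 # B(theta)       = A^H N^-1 A
Bp = dA.conj().T @ A               # B_p(theta)     = (dA/dtheta)^H N^-1 A
Bpq = dA.conj().T @ dA             # B_{p,q}(theta) = (dA/dtheta)^H N^-1 (dA/dtheta)

print(B.shape, np.allclose(B, B.conj().T), np.allclose(Bpq, Bpq.conj().T))
```

Note that B(θ) and B_{p,q}(θ) are Hermitian by construction, whereas B_p(θ) is not in general.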
7.4.1 (Gaussian) Unconditional Cramér-Rao Bound
After some simplifications, the (Gaussian) UCRB in (7.7) can be restated as

[B_UCRB^{-1}(θ)]_{p,q} = [D_r^H(θ) R^{-1}(θ) D_r(θ)]_{p,q}
= 2 Re Tr{ (∂A^H(θ)/∂θ_p) R^{-1}(θ) A(θ) (∂A^H(θ)/∂θ_q) R^{-1}(θ) A(θ)
         + A^H(θ) R^{-1}(θ) A(θ) (∂A^H(θ)/∂θ_p) R^{-1}(θ) (∂A(θ)/∂θ_q) }

using the algebraic properties in (7.54) from Appendix 7.A.
Then, if the inversion lemma is applied to arrange the inverse of R(θ) as

R^{-1}(θ) = σ_w^{-2} N^{-1} − σ_w^{-2} N^{-1} A(θ) (A^H(θ) N^{-1} A(θ) + σ_w^2 I_K)^{-1} A^H(θ) N^{-1},      (7.34)
it follows that the entries of B_UCRB^{-1}(θ) become a function of the following three matrices:

X(θ) ≜ σ_w^2 A^H(θ) R^{-1}(θ) A(θ) = B(θ) − B(θ) (B(θ) + σ_w^2 I_K)^{-1} B(θ)
X_p(θ) ≜ σ_w^2 (∂A^H(θ)/∂θ_p) R^{-1}(θ) A(θ) = B_p(θ) − B_p(θ) (B(θ) + σ_w^2 I_K)^{-1} B(θ)      (7.35)
X_{p,q}(θ) ≜ σ_w^2 (∂A^H(θ)/∂θ_p) R^{-1}(θ) (∂A(θ)/∂θ_q) = B_{p,q}(θ) − B_p(θ) (B(θ) + σ_w^2 I_K)^{-1} B_q^H(θ)

where B(θ), B_p(θ) and B_{p,q}(θ) were introduced in (7.33).
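The equalities in (7.35) follow from the inversion lemma (7.34) and can be verified numerically. A minimal check with random toy matrices of my own choosing (white noise, N = I_M):

```python
import numpy as np

# Numerical check of the identities in (7.35) (random toy matrices, my own
# sizes): with white noise N = I_M and R = A A^H + sigma2 I, the direct
# definition X = sigma2 A^H R^-1 A must match B - B (B + sigma2 I)^-1 B.
rng = np.random.default_rng(0)
M, K, sigma2 = 12, 5, 0.5
A = rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))
dA = rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))

R = A @ A.conj().T + sigma2 * np.eye(M)
B = A.conj().T @ A                  # B(theta)
Bp = dA.conj().T @ A                # B_p(theta)

X_direct = sigma2 * A.conj().T @ np.linalg.solve(R, A)
X_gram = B - B @ np.linalg.solve(B + sigma2 * np.eye(K), B)
Xp_direct = sigma2 * dA.conj().T @ np.linalg.solve(R, A)
Xp_gram = Bp - Bp @ np.linalg.solve(B + sigma2 * np.eye(K), B)

print(np.allclose(X_direct, X_gram), np.allclose(Xp_direct, Xp_gram))
```

The practical benefit is that the K × K Gram matrices replace the M × M inverse R^{-1}(θ), which is what makes the large sample analysis tractable.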
Therefore, plugging (7.34) into the UCRB and using the definitions in (7.35), it follows that

[B_UCRB^{-1}(θ)]_{p,q} = 2σ_w^{-4} Re Tr{ X_p(θ) X_q(θ) + X(θ) X_{p,q}(θ) }.
(7.36)

7.4.2 Gaussian Maximum Likelihood
In this section, the GML covariance matrix B_gml(θ) is restated in terms of B(θ), B_p(θ) and
B_{p,q}(θ). Bearing in mind that D_g(θ) = I_P, it follows that (7.6) can be written as

B_gml(θ) = B_UCRB(θ) + X_gml(K) = B_UCRB(θ) + B_UCRB(θ) Ψ(K) B_UCRB(θ)             (7.37)

where

Ψ(K) ≜ D_r^H(θ) R^{-1}(θ) A(θ) K A^H(θ) R^{-1}(θ) D_r(θ).

Next, we will prove that Ψ(K) is also a function of B(θ), B_p(θ) and B_{p,q}(θ) (7.33). Taking
into account the definitions of A(θ) and R(θ) in (7.2), the associative property of the Kronecker
product yields

A^H(θ) R^{-1}(θ) = (A^H(θ) R^{-1}(θ))^* ⊗ (A^H(θ) R^{-1}(θ)).

Then, using again the matrix properties in (7.54), it can be seen that

[A^H(θ) R^{-1}(θ) D_r(θ)]_p = σ_w^{-4} vec( X(θ) X_p(θ) + X_p^H(θ) X(θ) )

where X(θ) and X_p(θ) were introduced in (7.35).
Therefore, Ψ(K) can be written as

[Ψ(K)]_{p,q} = σ_w^{-8} vec^H(Y_p(θ)) K vec(Y_q(θ))
             = σ_w^{-8} vec^H(Y_p(θ)) V_K Σ_K V_K^H vec(Y_q(θ))                     (7.38)

where

Y_p(θ) ≜ X(θ) X_p(θ) + X_p^H(θ) X(θ)                                               (7.39)
7.4. LARGE SAMPLE STUDY
191
and V_K Σ_K V_K^H is the “economy-size” diagonalization of K.
To conclude this analysis, Ψ(K) can be further simplified when the nuisance parameters are
circular. In that case, the kurtosis matrix K is equal to (ρ − 2) diag(vec(I_K)) (3.12), and the
eigenvalues and eigenvectors of K are given by

Σ_K = (ρ − 2) I_K
[V_K]_k = vec(e_k e_k^H)

where e_k ∈ R^K is defined as

[e_k]_i ≜ 1 if i = k, and 0 if i ≠ k.                                              (7.40)

In [Mag98, Ex.4, p.62], it is shown that V_K has the following interesting properties:

V_K^H vec(A) = diag(A)
[(A ⊗ B) V_K]_k = [A]_k ⊗ [B]_k                                                     (7.41)
V_K^H (A ⊗ B) V_K = A ∘ B

for any pair of matrices A and B of appropriate size, where ∘ denotes the Hadamard
(element-wise) product.
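The first and third properties in (7.41) can be confirmed numerically; V_K simply selects the K diagonal positions of a vectorized K × K matrix. A small check with my own toy size:

```python
import numpy as np

# Numerical check of the V_K properties in (7.41) (my own toy size K = 4):
# the columns of V_K are vec(e_k e_k^H), so V_K just selects the K diagonal
# positions of a vectorized K x K matrix.
rng = np.random.default_rng(1)
K = 4
V = np.zeros((K * K, K))
for k in range(K):
    V[k * K + k, k] = 1.0           # vec(e_k e_k^H) has a single unit entry

A = rng.standard_normal((K, K)) + 1j * rng.standard_normal((K, K))
B = rng.standard_normal((K, K)) + 1j * rng.standard_normal((K, K))

print(np.allclose(V.T @ A.reshape(-1), np.diag(A)))    # V_K^H vec(A) = diag(A)
print(np.allclose(V.T @ np.kron(A, B) @ V, A * B))     # V_K^H (A kron B) V_K = A o B
```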
Taking into account the first property in (7.41), (7.38) becomes

[Ψ(K)]_{p,q} = σ_w^{-8} diag^H(Y_p(θ)) Σ_K diag(Y_q(θ))
             = σ_w^{-8} (ρ − 2) Tr(Y_p(θ) Y_q(θ))                                   (7.42)

using (7.40) and the following identity:

diag^H(A) diag(B) = Tr(A^* B) = Tr(A^H B).

Regarding now the definition of Y_p(θ) (7.39), it follows that diag(Y_p(θ)) is always
real-valued because

diag(Y_p(θ)) = 2 Re{ diag(X_p^H(θ) X(θ)) }.
7.4.3 Best Quadratic Unbiased Estimator
In this section, the same analysis is carried out for the optimal second-order estimator. The aim
is also to formulate B_bque(θ) in terms of B(θ), B_p(θ) and B_{p,q}(θ) (7.33). To begin with,
the inversion lemma is applied to Q^{-1}(θ), obtaining

B_bque^{-1}(θ) = D_r^H(θ) Q^{-1}(θ) D_r(θ) = D_r^H(θ) R^{-1}(θ) D_r(θ) + Γ(K)       (7.43)
where the first term corresponds to B_UCRB^{-1}(θ) in (7.36), and

Γ(K) ≜ −D_r^H(θ) R^{-1}(θ) A(θ) V_K ( V_K^H A^H(θ) R^{-1}(θ) A(θ) V_K + Σ_K^{-1} )^{-1} V_K^H A^H(θ) R^{-1}(θ) D_r(θ)      (7.44)

is the term depending on the kurtosis matrix2. The “economy-size” diagonalization of K is
introduced to encompass those problems in which K is singular (e.g., CPM).
Next, Γ(K) is formulated in terms of B(θ), B_p(θ) and B_{p,q}(θ) (7.33),

[Γ(K)]_{p,q} = −σ_w^{-4} vec^H(Y_p(θ)) V_K ( V_K^H (X^*(θ) ⊗ X(θ)) V_K + σ_w^4 Σ_K^{-1} )^{-1} V_K^H vec(Y_q(θ)),      (7.45)

using the following identities:

[A^H(θ) R^{-1}(θ) D_r(θ)]_p = σ_w^{-4} vec(Y_p(θ))
A^H(θ) R^{-1}(θ) A(θ) = σ_w^{-4} (X^*(θ) ⊗ X(θ)).
Unfortunately, the analysis of (7.45) is really involved except for those circular alphabets
holding K = (ρ − 2) diag(vec(I_K)) (3.12). In that case, from (7.40) and (7.41), we have

V_K^H vec(Y_p(θ)) = diag(Y_p(θ))
V_K^H (X^*(θ) ⊗ X(θ)) V_K = X^*(θ) ∘ X(θ).

Accordingly, if ρ ≠ 2, the non-Gaussian term Γ(K) is given by

[Γ(K)]_{p,q} = −σ_w^{-4} diag^H(Y_p(θ)) ( X^*(θ) ∘ X(θ) + σ_w^4 (ρ − 2)^{-1} I_K )^{-1} diag(Y_q(θ)).      (7.46)
Regarding the last expression, the following important conclusion arises. If the nuisance
parameters are circular (3.12), the Gaussian assumption applies independently of the SNR and
the nuisance parameters distribution if

diag(Y_p(θ)) = 2 Re{ diag(X_p^H(θ) X(θ)) } = 0                                     (7.47)

for p = 1, . . . , P, where X(θ) and X_p(θ) were defined in (7.35). If the last equation holds true,
Ψ(K) and Γ(K) are exactly zero in view of (7.42) and (7.46). This condition will be tested in
the following sections to validate the Gaussian assumption in some relevant estimation problems
in digital communications.
Notice that the last condition is more restrictive than the one presented in (7.25). Actually, it
is straightforward to realize that (7.47) is satisfied if (7.25) holds because, in that case,
B_p^H(θ) = A^H(θ) N^{-1} ∂A(θ)/∂θ_p = 0 and hence X_p(θ) = 0 (7.35).
2
Notice that Γ (K) is actually the second term in (7.23).
7.4.4 Second-Order Estimation in Digital Communications
In this section, simple asymptotic closed-form expressions are obtained for any estimation problem in which multiple replicas of the same waveform (pulse) are periodically received. The
received waveform is parameterized by a finite set of parameters θ and will be referred to as
g(t; θ) in this section. Assuming that a continuous stream of symbols is received, the structure
of A (θ) corresponds to the one represented in Fig. 6.1 (right-hand side) in Section 6.1.2.
This framework encompasses most estimation problems in digital communications, among
others the synchronization problems described in Section 6.1, the problem of blind channel
identification in Section 6.4 and the problem of time-of-arrival estimation studied in Section
6.3. Although the problem of frequency estimation does not fall into this category because the
phase of the received waveform is time-varying, it is proved in Appendix 7.J that quadratic NDA
techniques are only aware of the carrier phase variation within the received pulse duration.
In this section, the asymptotic value of B(θ), B_p(θ) and B_{p,q}(θ) (7.33) is determined for
Ns going to infinity. In that case, the size of these K × K square matrices also increases
proportionally as Ns → ∞ because
K = Ns + L − 1
with L the pulse duration in symbol periods. However, although the size of B (θ), Bp (θ) and
Bp,q (θ) tends to infinity, the central rows and columns of B (θ), Bp (θ) and Bp,q (θ) contain
delayed versions of the following autocorrelation and cross-correlation functions3

R[k] ≜ ∫_{−∞}^{∞} g(t) g^*(t + kT) dt
R_p[k] ≜ ∫_{−∞}^{∞} g(t) g_p^*(t + kT) dt
R_{p,q}[k] ≜ ∫_{−∞}^{∞} g_p(t) g_q^*(t + kT) dt,
where g_p(t; θ) ≜ ∂g(t; θ)/∂θ_p stands for the derivative of g(t; θ) with respect to the p-th parameter
θp . In the sequel, the dependence on θ will be omitted for the sake of brevity.
Henceforth, only the central rows and columns of B (θ), Bp (θ) and Bp,q (θ) will be considered
bearing in mind that the “edge effect” is negligible in the asymptotic case (Ns → ∞) or in case
of TDMA signals (Section 6.1.2). This analysis is inspired by the asymptotic study carried out
in [Rib01b] for the CML timing estimator4. In [Rib01b], it is shown that the multiplication of
these matrices yields another matrix whose central columns and rows are the convolution (∗)
3
For simplicity the noise is assumed uncorrelated, i.e., N = IM . Otherwise, the same expressions are valid for
the whitened waveform η(t) ∗ g (t; θ) where η(mTs ) is the central column of N−1/2 .
4
Likewise, the same reasoning was adopted in [Kay93b, Sec. 7.9] to get asymptotic expressions for the Newton-Raphson and scoring recursions in the context of maximum likelihood estimation.
of the central columns and rows of the original matrices. This allows computing B_UCRB(θ),
B_gml(θ) and B_bque(θ) as follows:

[B_UCRB^{-1}(θ)]_{p,q} = 2σ_w^{-4} Re Tr{ X_p(θ) X_q(θ) + X(θ) X_{p,q}(θ) }
                       = 2N_s σ_w^{-4} Re{ X_p[k] ∗ X_q[k] + X[k] ∗ X_{p,q}[k] }|_{k=0} + o(N_s)

where the central rows and columns of X(θ), X_p(θ) and X_{p,q}(θ) are given by

X[k] ≜ R[k] − R[k] ∗ (R[k] + σ_w^2 δ[k])^{-1} ∗ R[k]
X_p[k] ≜ R_p[k] − R_p[k] ∗ (R[k] + σ_w^2 δ[k])^{-1} ∗ R[k]                          (7.48)
X_{p,q}[k] ≜ R_{p,q}[k] − R_p[k] ∗ (R[k] + σ_w^2 δ[k])^{-1} ∗ R_q^*[−k].

In the above equations, the inverse operator (·)^{-1} stands for the deconvolution, i.e., a^{-1}[k] is
the sequence holding a[k] ∗ a^{-1}[k] = δ[k]. As is the usual practice, this deconvolution is solved
in the frequency domain. Using standard Fourier calculus, it is found that

[B_UCRB^{-1}(θ)]_{p,q} = 2N_s σ_w^{-4} Re ∫_{−0.5}^{0.5} S_{Xp}(f) S_{Xq}(f) + S_X(f) S_{Xp,q}(f) df + o(N_s)

where the Fourier transforms of X[k], X_p[k] and X_{p,q}[k] are given next in terms of the Fourier
transforms of R[k], R_p[k] and R_{p,q}[k]:

S_X(f) ≜ F{X[k]} = S(f) − S²(f)/(S(f) + σ_w^2) = σ_w^2 S(f)/(S(f) + σ_w^2)
S_{Xp}(f) ≜ F{X_p[k]} = S_p(f) − S_p(f) S(f)/(S(f) + σ_w^2) = σ_w^2 S_p(f)/(S(f) + σ_w^2)      (7.49)
S_{Xp,q}(f) ≜ F{X_{p,q}[k]} = S_{p,q}(f) − S_p(f) S_q^*(f)/(S(f) + σ_w^2).
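The common structure of (7.49) is a Wiener-like factor σ_w²/(S(f) + σ_w²) applied to the original spectra. A short sketch with toy spectra of my own choosing (not a pulse from this thesis) illustrates the saturation at high SNR:

```python
import numpy as np

# Sketch of the Wiener-like shaping in (7.49) (my own toy spectra, not a
# pulse from the thesis): the factor sigma2 / (S(f) + sigma2) attenuates
# the lags X[k] wherever the signal spectrum dominates the noise.
f = np.linspace(-0.5, 0.5, 1001)
sigma2 = 0.1
S = np.maximum(0.0, 1.0 - (2.0 * f) ** 2)   # toy real, even energy spectrum
Sp = -8.0 * f                                # toy odd cross-spectrum S_p(f)

SX = sigma2 * S / (S + sigma2)               # F{X[k]}   in (7.49)
SXp = sigma2 * Sp / (S + sigma2)             # F{X_p[k]} in (7.49)

# At high SNR, S_X(f) saturates near sigma2 on the signal band:
print(np.allclose(SX[np.abs(f) < 0.3], sigma2, rtol=0.2))
```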
Focusing uniquely on circular complex alphabets (e.g., MPSK and QAM), the term Ψ(K)
appearing in the GML covariance matrix (7.37) is asymptotically (N_s → ∞) given by

[Ψ(K)]_{p,q} = σ_w^{-8} (ρ − 2) Tr(Y_p(θ) Y_q(θ)) = N_s σ_w^{-8} (ρ − 2) Y_p²[0] + o(N_s)

where

Y_p[k] ≜ X[k] ∗ X_p[k] + X_p^*[−k] ∗ X[k] = 2 Re{ X_p^*[−k] ∗ X[k] }.
On the other hand, the term Γ(K) (7.46) appearing in the BQUE covariance matrix (7.43)
is asymptotically equal to

[Γ(K)]_{p,q} = −σ_w^{-4} diag^H(Y_p(θ)) ( X^*(θ) ∘ X(θ) + σ_w^4 (ρ − 2)^{-1} I_K )^{-1} diag(Y_q(θ))
             = −N_s σ_w^{-4} Y_p²[0] ( Σ_{k=−∞}^{∞} |X[k]|² + σ_w^4 (ρ − 2)^{-1} )^{-1} + o(N_s)
             = −N_s σ_w^{-4} ( ∫_{−0.5}^{0.5} S_{Yp}(f) df )² / ( ∫_{−0.5}^{0.5} |S_X(f)|² df + σ_w^4/(ρ − 2) ) + o(N_s)

with S_{Yp}(f) = F{Y_p[k]} the Fourier transform of Y_p[k].
Regarding now the Gaussian condition in (7.47), the matrices Γ(K) and Ψ(K) are asymptotically
null if

Y_p[0] = 2 Re{ X_p^*[−k] ∗ X[k] }|_{k=0} = 2 Re{ Σ_n X_p^*[n] X[n] }
       = 2 Re ∫_{−0.5}^{0.5} S_{Xp}^*(f) S_X(f) df
       = 2σ_w^4 Re ∫_{−0.5}^{0.5} S_p^*(f) S(f) / |S(f) + σ_w^2|² df

is equal to zero independently of the actual value of ρ and σ_w^2. The last expression has been
formulated in the frequency domain using Parseval's identity and the Fourier transforms of
X[k] and X_p[k] in (7.49).
Notice that the energy spectrum S(f) = F{R[k]} is always real because R[k] has Hermitian
symmetry, i.e., R[k] = R^*[−k]. Besides, S(f) is even if R[k] is real-valued, which implies that
g(t; θ) is also real-valued. Therefore, there are three possible situations leading to Y_p[0] = 0,
and hence validating the Gaussian assumption:

1. S_p(f) is imaginary, i.e., Re{S_p(f)} = 0. From Fourier theory, S_p(f) = F{R_p[k]} is
imaginary if R_p[k] is imaginary, or if R_p[k] is real with odd symmetry.

2. S(f) is an even function whereas S_p(f) is an odd function. The former condition holds
for g(t; θ) real-valued. The latter condition holds if and only if the cross-correlation R_p[k]
is also odd.

3. S(f) is an odd function whereas S_p(f) is an even function. The former condition holds
when the received waveform g(t; θ) is imaginary and even. The latter condition holds if
the cross-correlation R_p[k] is also even.
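Condition 2 can be checked numerically. The spectra below are my own toy choices (a Gaussian-shaped even S(f) and an odd S_p(f), as in a timing-like problem), not pulses from this thesis:

```python
import numpy as np

# Numerical check of condition 2 above (my own toy spectra): with S(f)
# real and even and S_p(f) real and odd, the integrand S_p S / |S+sigma2|^2
# is odd, so Y_p[0] = 0 and the Gaussian assumption holds (timing-like case).
f = np.linspace(-0.5, 0.5, 2001)
sigma2 = 0.2
S = np.exp(-20.0 * f ** 2)          # even energy spectrum (real pulse)
Sp = f * np.exp(-20.0 * f ** 2)     # odd cross-spectrum

integrand = Sp * S / np.abs(S + sigma2) ** 2
Yp0 = 2.0 * sigma2 ** 2 * np.sum(integrand) * (f[1] - f[0])  # Riemann sum
print(abs(Yp0) < 1e-12)
```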
It is found that the last conditions usually apply in frequency and timing5 synchronization.
In frequency synchronization, the cross-correlation Rp [k] is imaginary (condition 1). On the
5
The same conclusion applies to the related problem of time-of-arrival estimation in radiolocation applications.
other hand, in timing synchronization, the cross-correlation Rp [k] is a real-valued odd function
because the transmitted pulse is usually a real symmetric function in digital communications
(condition 2)6 . Nonetheless, the Gaussian assumption generally fails in the problem of blind
channel identification (Section 6.4) because the received waveform g (t; θ) is distorted by the
complex channel impulse response and, hence, the complex cross-correlation Rp [k] does not
exhibit any symmetry. For example, if the channel is multiplicative, the received waveform is a
scaled version of the transmitted pulse and it is easy to show that Rp [k] is proportional to R[k].
7.4.5 Second-Order Estimation in Array Signal Processing
In array signal processing, the spatio-temporal observation can be written as

y = vec( A_s(θ) (A_t X)^T ) + w                                                    (7.50)

where

[A_s(θ)]_p = exp(jπ θ_p d_M)

is the spatial signature of the p-th user impinging on a λ/2-spaced linear array from the direction
θ_p ∈ [−1, 1), with

d_M ≜ d̄_M − (M − 1)/2
d̄_M ≜ [0, . . . , M − 1]^T.

In (7.50), the modulation matrix A_t ∈ R^{Ns×K} contains the shaping pulse p(t), [X]_k are the
received symbols from the k-th user, and w is the spatio-temporal Gaussian noise vector. Notice
that the array is calibrated to have unitary response when the signal comes from the broadside
(θ_p = 0). However, the same results would be obtained adopting any other calibration.
The observation vector y can be arranged in the standard form,

y = A(θ) x + w,

using that vec(A B C^T) = (C ⊗ A) vec(B). Then, we have

A(θ) = A_t ⊗ A_s(θ)
x = vec(X^T).
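The rearrangement of (7.50) into the standard form can be verified numerically. A small check with random toy sizes of my own choosing (column-major vec convention):

```python
import numpy as np

# Numerical check (my own toy sizes) that the model (7.50) rearranges into
# the standard form y = (A_t kron A_s) vec(X^T), via the column-major
# identity vec(A B C^T) = (C kron A) vec(B).
rng = np.random.default_rng(3)
M, Ns, P, K = 4, 6, 2, 3
As = rng.standard_normal((M, P)) + 1j * rng.standard_normal((M, P))  # spatial signatures
At = rng.standard_normal((Ns, K))                                    # temporal modulation
X = rng.standard_normal((K, P)) + 1j * rng.standard_normal((K, P))   # user symbols

y1 = (As @ (At @ X).T).reshape(-1, order="F")       # vec(A_s (A_t X)^T)
y2 = np.kron(At, As) @ X.T.reshape(-1, order="F")   # (A_t kron A_s) vec(X^T)
print(np.allclose(y1, y2))
```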
Based on the general expressions deduced in Section 7.4, the asymptotic value of B (θ),
Bp (θ) and Bp,q (θ) in (7.33) is now obtained for the above spatio-temporal signal model. It is
6
This implies that the same pulse shaping is used in the in-phase and quadrature components.
straightforward to show that

B(θ) = (A_t^H N_t^{-1} A_t) ⊗ (A_s^H(θ) N_s^{-1} A_s(θ))
B_p(θ) = (A_t^H N_t^{-1} A_t) ⊗ ( (∂A_s^H(θ)/∂θ_p) N_s^{-1} A_s(θ) )
B_{p,q}(θ) = (A_t^H N_t^{-1} A_t) ⊗ ( (∂A_s^H(θ)/∂θ_p) N_s^{-1} (∂A_s(θ)/∂θ_q) )

assuming that the temporal and spatial components of the noise are decoupled as N = N_t ⊗ N_s
and using that

[∂A_s(θ)/∂θ_p]_i = jπ d_M ∘ exp(jπ θ_p d_M)   if i = p, and 0 if i ≠ p,

where ∘ stands for the element-wise product.
Moreover, assuming for simplicity that the noise is spatially uncorrelated (i.e., N_s = I_M), we
have that the spatial cross-correlation for the P users is determined by the following matrices:

[B(θ)]_{i,k} ≜ [A_s^H(θ) N_s^{-1} A_s(θ)]_{i,k} = F_M(θ_i − θ_k)
[B_p(θ)]_{i,k} ≜ [ (∂A_s^H(θ)/∂θ_p) N_s^{-1} A_s(θ) ]_{i,k} = dF_M(f)/df |_{f=θ_i−θ_k} · δ(i, p)      (7.51)
[B_{p,q}(θ)]_{i,k} ≜ [ (∂A_s^H(θ)/∂θ_p) N_s^{-1} (∂A_s(θ)/∂θ_q) ]_{i,k} = d²F_M(f)/df² |_{f=θ_i−θ_k} · δ(i, p) δ(k, q)

where δ(i, j) is the Kronecker delta and F_M(f) is the following sinc-like function:

F_M(f) ≜ M                           if f = 0
F_M(f) ≜ sin(πMf/2) / sin(πf/2)      if f ≠ 0.
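The function F_M(f) is the inner product of two centered steering vectors, as a small sketch (my own helper function) confirms:

```python
import numpy as np

# Sketch of the array factor F_M(f) in (7.51) (my own helper): a
# Dirichlet-like kernel that equals the inner product of two centered
# lambda/2-array steering vectors and peaks at M for f = 0.
def F_M(f, M):
    f = np.asarray(f, dtype=float)
    num = np.sin(np.pi * M * f / 2.0)
    den = np.sin(np.pi * f / 2.0)
    safe = np.where(den == 0, 1.0, den)            # avoid 0/0 at f = 0
    return np.where(np.abs(den) < 1e-12, float(M), num / safe)

d = np.arange(8) - 3.5                             # centered element positions
a = np.exp(1j * np.pi * 0.25 * d)                  # steering at theta_i = 0.25
b = np.exp(1j * np.pi * 0.10 * d)                  # steering at theta_k = 0.10
print(float(F_M(0.0, 8)))                          # peak value equals M
print(np.allclose(a.conj() @ b, F_M(0.25 - 0.10, 8)))
```

The main lobe of F_M(f) narrows as 1/M, which is the mechanism behind the cubic aperture gain discussed below.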
Notice that B_p(θ) and B_{p,q}(θ) in (7.51) are derived resorting to the differential property of
the Fourier transform, i.e.,

−j2π F{n v[n]} = dV(f)/df

where V(f) = F{v[n]} = Σ_n v[n] e^{−j2πfn} is the Fourier transform of a given sequence v[n].
In the studied space-time signal model, the observation y can be increased by augmenting
either the number of antennas (M) or the number of snapshots (Ns ), where M and Ns are the
number of rows of As (θ) and At , respectively. The asymptotic performance of the GML and
CML direction-of-arrival estimators has already been studied in [Sto89][Sto90a][Vib95] when
M → ∞ and in [Sto89][Sto90a][Ott92][Car94] when Ns → ∞. In the following two sections,
the aforementioned study is extended to the optimal second-order DOA estimator deduced in
Chapter 4.
Infinite number of antennas

If the number of sensors is increased (M → ∞), the asymptotic MSE matrix for the optimal
and the GML estimators is computed in Appendix 7.K, having that

B_UCRB(θ), B_gml(θ), B_bque(θ) = 6σ_w² / (π² M³ Tr(A_t^H N_t^{-1} A_t)) · I_P + o(M^{-3})
                               = 6 / (π² M³ N_s E_s/N_0) · I_P + o(N_s^{-1} M^{-3})      (7.52)

with σ_w² = N_0 the noise double-sided spectral density and E_s the energy of the received symbols.
In the last expression, we have taken into account that

lim_{N_s→∞} (1/N_s) Tr(A_t^H N_t^{-1} A_t) = E_s.
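The floor in (7.52) exhibits the classical cubic dependence on the aperture. A quick numeric illustration (`var_floor` is my own helper name):

```python
import numpy as np

# Quick numeric illustration of the M -> infinity floor in (7.52) (my own
# helper name): the DOA variance 6 / (pi^2 M^3 Ns Es/N0) follows a cubic
# aperture law, dropping 30 dB per decade of array size.
def var_floor(M, Ns, es_n0):
    return 6.0 / (np.pi ** 2 * M ** 3 * Ns * es_n0)

v10 = var_floor(10, 1, 10.0)
v100 = var_floor(100, 1, 10.0)
print(round(v10 / v100))    # tenfold M divides the floor by 1000
```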
This result was previously obtained in [Sto89] for the conditional model (Section 2.3). Moreover,
it has been proved in [Sto90a, R8-R9] that the conditional and unconditional models yield
efficient estimates when the number of sensors or the SNR goes to infinity. The analysis in
Appendix 7.K focuses on the unconditional model and extends the concise solution provided in
[Sto89]. In this appendix, the asymptotic value of the off-diagonal entries of B_UCRB(θ) as well
as the non-Gaussian terms in B_gml(θ) and B_bque(θ) is calculated, concluding that they are
totally negligible if M → ∞.
Also, notice that (7.52) coincides with the modified CRB (MCRB) for the problem of carrier
frequency-offset estimation [Men97, Eq. 2.4.23][Rif74], which is known to be attained by means
of data-aided (DA) methods. In both cases, the estimator tries to infer the frequency of an
infinitely long sinusoid either in the space domain (DOA) or in the time domain (DA frequency
synchronization). Nonetheless, in array signal processing, the array size is implicitly limited
by the narrowband and far-field assumptions [Vib95].
Infinite number of snapshots
When the number of observed symbols Ns is large, it is shown in [Ott92][Car94] that most
second-order DOA estimators in the literature (based on both the conditional and unconditional
model) are asymptotically robust. This means that the covariance matrix of the estimation error
is independent of the sources statistical distribution provided that Ns → ∞. This statement
implies that the higher-order term Xgml (K) (7.8) is negligible for Ns → ∞ whatever the content
of matrix K. However, it was shown in Section 6.5 that the knowledge of K can be exploited to
improve significantly the estimator accuracy when multiple constant-modulus sources transmit
towards the array from near directions. Actually, this result was already pointed out in [Ott92],
where the authors stated that ‘[...] a Gaussian distribution of the emitter signals represents the
worst case. Any other distribution would typically give better estimates, provided the appropriate
ML method is used.’
In Appendix 7.L, the asymptotic expressions of B_UCRB(θ), B_gml(θ) and B_bque(θ) are
obtained when the number of received symbols N_s tends to infinity whereas the number of
antennas M is kept constant. In that case, it is shown that Γ(K) is exactly zero in the single
user case (Appendix 7.L). On the other hand, if there are multiple users transmitting towards
the array (P > 1), both B_UCRB^{-1}(θ) and Γ(K) are proportional to the number of snapshots
and, therefore, Γ(K) does not disappear as N_s → ∞. In addition, it is found that, at high SNR,
the second term Γ(K) is proportional to σ_w^{-2} if and only if ρ = 1. If so, the contribution
of Γ(K) remains as the SNR is increased. This important result is formalized in the next
equations. If E_s/N_0 and N_s go to infinity, it is proved in Appendix 7.L that
[B_UCRB^{-1}(θ)]_{p,q} , [B_gml^{-1}(θ)]_{p,q} = 2N_s (E_s/N_0) Re Tr{ B_{p,q}(θ) − B_p(θ) B^{-1}(θ) B_q^H(θ) } + o(N_s)

[B_bque^{-1}(θ)]_{p,q} = [B_UCRB^{-1}(θ)]_{p,q} + o(N_s)                          if ρ ≠ 1
[B_bque^{-1}(θ)]_{p,q} = [B_UCRB^{-1}(θ)]_{p,q} + [Γ(K)]_{p,q} + o(N_s)           if ρ = 1

with

[Γ(K)]_{p,q} = 2ξ N_s (E_s/N_0) Tr{ B_p(θ) B^{-1}(θ) B_p^H(θ) B^{-1}(θ) } δ(p, q) + o(N_s)

being the result a function of the cross-correlation of the P users' signatures and their derivatives
(7.51). On the other hand, the constant ξ ≤ 1 is a function of the temporal correlation of the
received signal (7.91) and is unitary in the uncorrelated case (Appendix 7.L).
7.5 Simulations
In this section, the asymptotic studies carried out in this chapter are validated via computer
simulations.
7.5.1 SNR asymptotic results for the BQUE and GML estimators
To evaluate the asymptotic performance of the BQUE and GML small-error estimators when
the SNR goes to zero and infinity, the problem of DOA estimation is adopted (see Section 6.5).
The angle of arrival of two users is estimated with a linear array of four elements (M = 4).
A single snapshot is taken at the matched filter output (Ns = 1). Assuming perfect timing
synchronization and ISI-free received pulses, the estimator MSE becomes inversely proportional
to the number of integrated snapshots. In Fig. 7.1 and Fig. 7.2, the sum of the variance of the
Figure 7.1: Normalized variance for the GML and BQUE DOA estimators in case of having two
16-QAM sources transmitting from ±0.5 degrees. The low- and high-SNR limits computed in
this chapter are plotted as well.
two users, i.e.,

VAR_bque = Tr(B_bque(θ))
VAR_gml = Tr(B_gml(θ)),                                                            (7.53)

is evaluated as a function of the E_s/N_0 per user when the two users are located at ±0.5° from
the broadside. For more details the reader is referred to Section 6.5.
In these figures, it is shown how the asymptotic expressions deduced in this chapter predict
exactly the low and high SNR performance of the studied quadratic small-error estimators. In
Fig. 7.1, the Gaussian assumption is shown to be optimal at low and high SNR whereas minor
losses are observed in the middle of these extremes. It has been checked that the BQUE converges
to the GML performance when the alphabet dimension is augmented (e.g., 64-QAM). On the
other hand, if the constellation has constant modulus (e.g., MPSK or CPM), the Gaussian
assumption is found to yield important losses when the SNR exceeds a given critical value or
threshold determined by the array size (Fig. 7.2). The position of this SNR threshold is actually
independent of the number of processed snapshots.
Additional simulations have been carried out for the CPM modulation, obtaining the same
curves as in Fig. 7.2. Therefore, it seems that the only relevant feature of the nuisance
parameters for DOA estimation is their constant amplitude.
Figure 7.2: Normalized variance for the GML and BQUE DOA estimators in case of having two
MPSK sources transmitting from ±0.5 degrees. The low- and high-SNR limits computed in this
chapter are plotted as well.
Regarding Fig. 7.2, another important remark is that the low- and high-SNR asymptotes can
be combined to lower bound the performance of any second-order technique in case of
constant-modulus alphabets. Finally, notice that the UCRB only predicts the asymptotic performance of
the GML estimator. However, in both figures, the GML estimator outperforms the UCRB for
intermediate SNRs.
7.5.2 SNR asymptotic results for the large-error estimators
In this section, the large-error frequency-offset estimators presented in Section 3.4 are simulated
again. The low- and high-SNR asymptotic expressions deduced in this chapter are validated for
the 16-QAM, MPSK and MSK modulations. In all the simulations, the rank of G = E{A A^H}
is full (Appendix 3.D). A uniform prior with ∆ = 0.4 is considered. Although this prior is rather
informative and the variance floor was not observed in Fig. 3.7 for the MSK modulation, its
existence is evidenced in Fig. 7.3. In this figure, it is also shown how the Gaussian assumption
leads to a higher variance floor at high SNR. Comparing Figs. 7.3, 7.4 and 7.5, one observes
that the floor level depends on the modulation at hand. This statement is true for both the optimal estimator and
the one deduced under the Gaussian assumption, although the latter is not represented in Figs.
7.4 and 7.5 for the sake of clarity.
Figure 7.3: Mean square error for the MMSE and minimum variance frequency-offset estimators
for the MSK modulation (Nss = 2, M = 8, K = 5). The estimators based on the Gaussian
assumption as well as all the low- and high-SNR limits are indicated.
7.5.3 Large sample asymptotic results for the BQUE and GML estimators
In this section, the large sample study in Section 7.4 is validated numerically. In the first
two figures (Fig. 7.6 and 7.7), the normalized variance is computed for the optimal second-order
timing and frequency estimators deduced in Chapter 4 under the small-error condition.
The normalization consists in multiplying the estimator variance by the number of processed
symbols, i.e., Ns = M/Nss . The estimators' variance is simulated for different data lengths and
is compared to the asymptotic variance obtained from the large sample study (Ns → ∞) in
Section 7.4.4. The Gaussian assumption is optimal in all the simulations except in the timing
synchronization problem (Fig. 7.6). In that case, the Gaussian assumption exhibits a higher
variance floor (self-noise) when the noise subspace of matrix A (θ) is null (M ≤ K).
Regarding the DOA estimation problem, the large sample study presented in Section 7.4.5 is
validated via simulation for the same scenario considered in Section 7.5.1. In the first simulations
(Fig. 7.8 and 7.9), the estimator variance (7.53) is evaluated for different values of ρ considering
an array of four antennas and a single snapshot7. The performance associated to a hypothetical
super-Gaussian constellation with ρ = 10 is also depicted in Fig. 7.9, although all the alphabets
7
Remember that the estimator variance is inversely proportional to the number of processed snapshots whatever
the value of ρ. Therefore, all the results and conclusions are still correct if Ns → ∞.
Figure 7.4: Mean square error for the MMSE and minimum variance frequency-offset estimators
for the 16-QAM modulation with roll-off 0.75 (Nss = 2, M = 8, K = 8). The low- and high-SNR
limits computed in this chapter are plotted as well.
of interest in digital communications are sub-Gaussian (ρ < 2).
Regarding Fig. 7.8 and Fig. 7.9, one concludes that the asymptotic expression derived in
(7.52) for M → ∞,

B_UCRB(θ), B_gml(θ), B_bque(θ) = 6 / (π² M³ N_s E_s/N_0) · I_P + o(N_s^{-1} M^{-3}),
is attained for practical SNRs in case of having constant-amplitude nuisance parameters (ρ = 1),
even if the number of antennas is very small (M = 4). Notice that the optimality at high SNR
is verified irrespective of the users' angular separation if one compares Fig. 7.8 and Fig. 7.9.
Nonetheless, minor discrepancies are observed in Fig. 7.8 due to the sinc-like beam pattern when
M is finite (7.51).
On the other hand, if ρ > 1, the estimator performance at high SNR converges to the
(Gaussian) UCRB, which corresponds to ρ = 2. It can be seen that the larger ρ is and the closer
the sources are, the lower the Es/N0 from which the convergence to the UCRB is manifested.
Moreover, the closer the users are, the more significant the loss incurred by the Gaussian
assumption in case of constant-modulus nuisance parameters.
These conclusions are manifested again when the estimator variance is evaluated as a function
of M (Figs. 7.10-7.12). In that case, the UCRB attains the asymptotic limit (7.52) if the number
of antennas goes to infinity (M → ∞). On the other hand, when the nuisance parameters have
Figure 7.5: Mean square error for the MMSE and minimum variance frequency-offset estimators
with MPSK symbols and roll-off 0.75 (Nss = 2, M = 8, K = 8). The low- and high-SNR limits
computed in this chapter are plotted as well.
constant modulus (ρ = 1), the optimal second-order estimator attains (7.52) for any value of
M, except for an intermediate interval in which the estimator converges to the UCRB. It can
be shown that the value of M from which Bbque (θ) departs from (7.52) is inversely proportional
to the angular separation of the users. Specifically, this critical value occurs when Γ(K) (7.46)
attains its maximum value. In case of having two users, the critical value of M corresponds to the
first maximum of (7.90) in Appendix 7.K that, asymptotically, takes place at M = 0.5/ |θ1 − θ 2 |.
For example, using equation (7.90), the referred threshold should take place at M ≈ 20 and
M ≈ 100 in Fig. 7.11 and Fig. 7.12, respectively.
The GML performance coincides with the UCRB unless the angular separation is reduced
(Fig. 7.12). When the number of antennas is less than 20, the GML outperforms the UCRB
for both the MPSK and 16-QAM modulations. Indeed, the UCRB is severely degraded
when the number of antennas is less than 10 whereas the variance of the BQUE and GML
estimators is practically constant for M < 10 (Fig. 7.13).
Figure 7.6: Normalized variance for the optimal second-order timing synchronizer in case of
the MPSK modulation. The transmitted pulse is a square-root raised cosine with roll-off 0.75,
truncated at ±5T. The observation interval (M) is augmented with Nss = 2 constant; the curves
correspond to (M, K) = (4, 6), (8, 8), (10, 9), (20, 14) and M → ∞. The dashed curves
correspond to the estimator based on the Gaussian assumption.
Figure 7.7: Normalized variance for the optimal second-order frequency-offset synchronizer in
case of the MPSK modulation. The transmitted pulse is a square-root raised cosine with roll-off
0.75, truncated at ±5T. The observation interval (M) is augmented with Nss = 2 constant; the
curves correspond to (M, K) = (4, 6), (8, 8), (10, 9), (20, 14) and M → ∞.
Figure 7.8: Normalized variance for the optimal second-order small-error DOA estimator for
different values of ρ (ρ = 1, 1.001, 1.01, 1.2 and ρ = 2, the latter coinciding with the UCRB;
the asymptote for M → ∞ is also plotted). The simulation parameters are Nss = 1, Nyquist
pulse shaping, K = 1, M = 4, two users transmitting from ±5 degrees.
Figure 7.9: Normalized variance for the optimal second-order small-error DOA estimator for
different values of ρ (ρ = 1, 1.001, 1.01, 1.05, 1.2, 2 and 10, with ρ = 2 coinciding with the
UCRB; the asymptote for M → ∞ is also plotted). The simulation parameters are Nss = 1,
Nyquist pulse shaping, K = 1, M = 4, two users transmitting from ±0.5 degrees.
Figure 7.10: Normalized variance for the optimal second-order small-error DOA estimator as
a function of M (BQUE versus UCRB/GML, together with the asymptote for M → ∞). The
simulation parameters are Nss = 1, Nyquist pulse shaping, K = 1, Es/N0 = 60 dB, two MPSK
users transmitting from ±5 degrees.
Figure 7.11: Normalized variance for the optimal second-order small-error DOA estimator as
a function of M (BQUE versus UCRB/GML, together with the asymptote for M → ∞). The
simulation parameters are Nss = 1, Nyquist pulse shaping, K = 1, Es/N0 = 60 dB, two MPSK
users transmitting from ±0.5 degrees.
Figure 7.12: Normalized variance for the optimal second-order small-error DOA estimator as a
function of M (UCRB, GML and BQUE for MPSK and 16-QAM, together with the asymptote
for M → ∞). The simulation parameters are Nss = 1, Nyquist pulse shaping, K = 1,
Es/N0 = 60 dB, two MPSK (or 16-QAM) users transmitting from ±0.1 degrees.
Figure 7.13: Zoom of the previous plot between M = 2 and M = 20 (UCRB, GML and BQUE
for MPSK and 16-QAM, together with the M → ∞ asymptote). The simulation parameters
are Nss = 1, Nyquist pulse shaping, K = 1, Es/N0 = 60 dB, two MPSK (or 16-QAM) users
transmitting from ±0.1 degrees.
7.6 Conclusions
In the previous chapters, the Gaussian assumption was proved to yield significant losses at
high SNR if the nuisance parameters had a constant modulus and the observation interval
was reduced. In this chapter, the Gaussian assumption has been examined again when the
observation interval is increased to infinity. From the Central Limit Theorem, it seems that
the data statistics will be irrelevant in this asymptotic case. This intuition is validated in
some important estimation problems such as digital synchronization and DOA estimation, if
the number of antennas goes to infinity. However, the Gaussian assumption is shown to be
suboptimal in some other scenarios. In particular, the Gaussian assumption fails when the
estimator suffers from self-noise at high SNR. In that case, the fourth-order information about
the nuisance parameters can be exploited to reduce the variance floor, mainly when the nuisance
parameters have constant amplitude.
By considering the fourth-order information of the nuisance parameters, second-order DOA
estimators are able to attain the asymptotic performance associated with an infinite number of
antennas, even if the array is very short. On the other hand, the Gaussian assumption yields an
important loss that is a function of the sources' angular separation. Therefore, in array signal
processing, we have concluded that the Gaussian assumption is only optimal if there is a single
source or the number of antennas goes to infinity.
Finally, in the problem of blind channel identification, some improvement is also expected in
the asymptotic case when the transmitted symbols are drawn from a constant-modulus alphabet
(Section 6.4).
Appendix 7.A Low-SNR ML scoring implementation
It was obtained in Section 2.4.1 that the log-likelihood function in a low-SNR scenario is given
by
\[ \ln f_y(\mathbf{y};\theta) = \operatorname{Tr}\left( R_w^{-1} A(\theta)A^H(\theta) R_w^{-1} \bigl( \widehat{R} - R_w \bigr) \right) + o\bigl(\sigma_w^{-2}\bigr) \]
that, resorting again to the vec(·) operator, can be manipulated as follows:
\begin{align}
\ln f_y(\mathbf{y};\theta) &= \operatorname{vec}^H\!\bigl( R_w^{-1} A(\theta)A^H(\theta) R_w^{-1} \bigr)(\hat{r}-r_w) + o\bigl(\sigma_w^{-2}\bigr) \nonumber\\
&= \operatorname{vec}^H\!\bigl( A(\theta)A^H(\theta) \bigr)\,(R_w^* \otimes R_w)^{-1}(\hat{r}-r_w) + o\bigl(\sigma_w^{-2}\bigr) \nonumber\\
&= \sigma_w^{-4}\operatorname{vec}^H\!\bigl( A(\theta)A^H(\theta) \bigr)\,\mathcal{N}^{-1}(\hat{r}-r_w) + o\bigl(\sigma_w^{-2}\bigr) \nonumber
\end{align}
with \(\mathcal{N} \triangleq N^* \otimes N\), where \(\hat{r}\) and \(r_w\) are the vectorizations of the Hermitian matrices \(\widehat{R}\) and \(R_w\), respectively, and the following relations have been applied:
\begin{align}
\operatorname{vec}^H(A)\operatorname{vec}(B) &= \operatorname{Tr}(A^H B) = \operatorname{Tr}(B A^H) \nonumber\\
\operatorname{vec}\bigl(ABC^H\bigr) &= (C^* \otimes A)\operatorname{vec}(B) \tag{7.54}\\
A^H \otimes B^H &= (A \otimes B)^H \nonumber\\
A^{-1} \otimes B^{-1} &= (A \otimes B)^{-1}. \nonumber
\end{align}
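The four relations in (7.54) are easy to check numerically. The following NumPy sketch (an illustration, not part of the thesis derivation) verifies each identity on random complex matrices, using column-stacking for vec(·):

```python
import numpy as np

rng = np.random.default_rng(0)
cplx = lambda *s: rng.standard_normal(s) + 1j * rng.standard_normal(s)
vec = lambda X: X.reshape(-1, order="F")          # column-stacking vec(.)

A, B = cplx(3, 4), cplx(3, 4)
C, M = cplx(5, 4), cplx(4, 4)
P, Q = cplx(3, 3), cplx(4, 4)                     # square, invertible with prob. 1

# vec^H(A) vec(B) = Tr(A^H B) = Tr(B A^H)
ok_trace = np.allclose(vec(A).conj() @ vec(B), np.trace(A.conj().T @ B)) \
       and np.allclose(vec(A).conj() @ vec(B), np.trace(B @ A.conj().T))

# vec(A M C^H) = (C^* kron A) vec(M)
ok_vec = np.allclose(vec(A @ M @ C.conj().T), np.kron(C.conj(), A) @ vec(M))

# A^H kron B^H = (A kron B)^H   and   A^-1 kron B^-1 = (A kron B)^-1
ok_herm = np.allclose(np.kron(P.conj().T, Q.conj().T), np.kron(P, Q).conj().T)
ok_inv = np.allclose(np.kron(np.linalg.inv(P), np.linalg.inv(Q)),
                     np.linalg.inv(np.kron(P, Q)))
```

All four flags evaluate to True for any matrices of compatible sizes, since the identities are exact.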
The gradient of the asymptotic log-likelihood function is given by
\[ \frac{\partial \ln f_y(\mathbf{y};\theta)}{\partial \theta} = \sigma_w^{-4}\, D_r^H(\theta)\,\mathcal{N}^{-1}(\hat{r}-r_w) + o\bigl(\sigma_w^{-2}\bigr) \]
since \(\partial R(\theta)/\partial\theta = \partial\bigl[A(\theta)A^H(\theta)\bigr]/\partial\theta\). Next, the Fisher information matrix is computed as the expected value of the (negative) Hessian matrix, obtaining the following asymptotic expression:
\begin{align}
-E_y\left\{ \frac{\partial^2 \ln f_y(\mathbf{y};\theta)}{\partial\theta\,\partial\theta^T} \right\}
&= E_y\left\{ \frac{\partial \ln f_y(\mathbf{y};\theta)}{\partial\theta}\, \frac{\partial \ln f_y(\mathbf{y};\theta)}{\partial\theta}^{\!H} \right\} \nonumber\\
&= \sigma_w^{-8}\, D_r^H(\theta)\,\mathcal{N}^{-1} Q(\theta)\,\mathcal{N}^{-1} D_r(\theta) + o\bigl(\sigma_w^{-4}\bigr) \nonumber\\
&= \sigma_w^{-4}\, D_r^H(\theta)\,\mathcal{N}^{-1} D_r(\theta) + o\bigl(\sigma_w^{-4}\bigr), \nonumber
\end{align}
using that, at low SNR, the fourth-order matrix \(Q(\theta)\) is given by
\[ Q(\theta) = E\bigl\{ (\hat{r}-r_w)(\hat{r}-r_w)^H \bigr\} = \sigma_w^4\,\mathcal{N} + o\bigl(\sigma_w^4\bigr). \]
Therefore, the following scoring recursion, which was presented in equation (2.29),
\[ \hat{\alpha}_{k+1} = \hat{\alpha}_k + M^H\bigl(\hat{\theta}_k\bigr)(\hat{r}-r_w) \]
\[ M(\theta) \triangleq \mathcal{N}^{-1} D_r(\theta)\bigl( D_r^H(\theta)\,\mathcal{N}^{-1} D_r(\theta) \bigr)^{-1} D_g(\theta) \]
is known to attain the CRB at low SNR if the small-error condition is verified.
Appendix 7.B High-SNR limit of R^{-1}(θ) and \mathcal{R}^{-1}(θ)
In this appendix, we consider that A(θ) is full column rank. In that case, the asymptotic value
of R^{-1}(θ) can be easily obtained by means of the inversion lemma:
\begin{align}
R^{-1}(\theta) &= \bigl( A(\theta)A^H(\theta) + \sigma_w^2 N \bigr)^{-1} \nonumber\\
&= \sigma_w^{-2} N^{-1}\Bigl[ I_M - A(\theta)\bigl( A^H(\theta)N^{-1}A(\theta) + \sigma_w^2 I_K \bigr)^{-1} A^H(\theta)N^{-1} \Bigr]. \tag{7.55}
\end{align}
At high SNR, the inner inverse can be expanded in a Taylor series around \(\sigma_w^2 = 0\), having that^8
\[ \bigl( A^H(\theta)N^{-1}A(\theta) + \sigma_w^2 I_K \bigr)^{-1} = \bigl( A^H(\theta)N^{-1}A(\theta) \bigr)^{-1} - \sigma_w^2\bigl( A^H(\theta)N^{-1}A(\theta) \bigr)^{-2} + \sigma_w^4\bigl( A^H(\theta)N^{-1}A(\theta) \bigr)^{-3} + O\bigl(\sigma_w^6\bigr). \]
Finally, plugging these three terms into (7.55), the high-SNR limit of R^{-1}(θ) is given by
\[ R^{-1}(\theta) = \sigma_w^{-2} P_A^\perp(\theta) + B(\theta) - \sigma_w^2 B(\theta)\,N\,B(\theta) + O\bigl(\sigma_w^4\bigr) \tag{7.56} \]
where \(P_A^\perp(\theta)\) and \(B(\theta)\) are defined in (7.13)-(7.15), and the following identity has been considered:
\[ \bigl( A^H(\theta)N^{-1}A(\theta) \bigr)^{-1} = A^\#(\theta)\, N\, \bigl(A^\#(\theta)\bigr)^H. \tag{7.57} \]
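The expansion (7.56) can be reproduced numerically. The sketch below is only an illustration: it assumes the forms \(P_A^\perp(\theta) = N^{-1}\bigl(I_M - A(\theta)A^\#(\theta)\bigr)\) and \(B(\theta) = \bigl(A^\#(\theta)\bigr)^H A^\#(\theta)\), which are consistent with (7.57)-(7.58), and checks that the residual of the three-term approximation shrinks as \(O(\sigma_w^4)\):

```python
import numpy as np

rng = np.random.default_rng(1)
Mdim, K = 6, 2
A = rng.standard_normal((Mdim, K)) + 1j * rng.standard_normal((Mdim, K))
G = rng.standard_normal((Mdim, Mdim)) + 1j * rng.standard_normal((Mdim, Mdim))
N = G @ G.conj().T + Mdim * np.eye(Mdim)     # Hermitian positive-definite noise shaping

Ninv = np.linalg.inv(N)
Apinv = np.linalg.solve(A.conj().T @ Ninv @ A, A.conj().T @ Ninv)  # A# (weighted pseudoinverse)
P_perp = Ninv @ (np.eye(Mdim) - A @ Apinv)   # assumed form of P_A^perp
B = Apinv.conj().T @ Apinv                   # assumed form of B(theta)

def expansion_error(s2):
    """Norm of the residual of R^-1 ~ s2^-1 P_perp + B - s2 B N B."""
    Rinv = np.linalg.inv(A @ A.conj().T + s2 * N)
    return np.linalg.norm(Rinv - (P_perp / s2 + B - s2 * (B @ N @ B)))

# The residual shrinks as O(s2^2) when s2 -> 0, i.e. the ratio below is about 100.
ratio = expansion_error(1e-2) / expansion_error(1e-3)
```

With the assumed definitions, \(A^H(\theta)B(\theta)A(\theta) = I_K\) holds exactly, matching (7.58).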
The key property of R^{-1}(θ) is that, asymptotically, it holds that
\begin{align}
A^H(\theta)\,R^{-1}(\theta) &= A^H(\theta)B(\theta) + O\bigl(\sigma_w^2\bigr) = A^\#(\theta) + O\bigl(\sigma_w^2\bigr) \nonumber\\
R^{-1}(\theta)\,A(\theta) &= B(\theta)A(\theta) + O\bigl(\sigma_w^2\bigr) = \bigl(A^\#(\theta)\bigr)^H + O\bigl(\sigma_w^2\bigr) \tag{7.58}\\
A^H(\theta)\,R^{-1}(\theta)\,A(\theta) &= A^H(\theta)B(\theta)A(\theta) + O\bigl(\sigma_w^2\bigr) = I_K + O\bigl(\sigma_w^2\bigr) \nonumber
\end{align}
because, by definition,
\[ A^H(\theta)\,P_A^\perp(\theta) = 0 \qquad P_A^\perp(\theta)\,A(\theta) = 0. \tag{7.59} \]

^8 The following relation has been considered to obtain the terms of the Taylor expansion:
\[ \frac{\partial X^{-1}(\lambda)}{\partial\lambda} = -X^{-1}(\lambda)\,\frac{\partial X(\lambda)}{\partial\lambda}\,X^{-1}(\lambda). \]
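The matrix-derivative identity quoted in the footnote can itself be checked with a central finite difference; the snippet below is a small numerical illustration:

```python
import numpy as np

# Check d(X^-1)/dl = -X^-1 (dX/dl) X^-1 along a straight line X(l) = X0 + l*dX
X0 = np.array([[3.0, 1.0], [1.0, 2.0]])
dX = np.array([[0.5, -0.2], [-0.2, 1.0]])        # direction dX/dl

eps = 1e-6
fd = (np.linalg.inv(X0 + eps * dX) - np.linalg.inv(X0 - eps * dX)) / (2 * eps)
analytic = -np.linalg.inv(X0) @ dX @ np.linalg.inv(X0)
err = np.linalg.norm(fd - analytic)              # O(eps^2) discretization error
```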
To conclude this appendix, the asymptotic value of \(\mathcal{R}^{-1}(\theta)\) is obtained from (7.56) as indicated now:
\begin{align}
\mathcal{R}^{-1}(\theta) &= R^{*-1}(\theta) \otimes R^{-1}(\theta) \nonumber\\
&= \sigma_w^{-4}\, P_A^{\perp*}(\theta) \otimes P_A^\perp(\theta) \nonumber\\
&\quad + \sigma_w^{-2}\bigl[ B^*(\theta) \otimes P_A^\perp(\theta) + P_A^{\perp*}(\theta) \otimes B(\theta) \bigr] \tag{7.60}\\
&\quad + B^*(\theta) \otimes B(\theta) \nonumber\\
&\quad - \sigma_w^2\bigl[ B^*(\theta) \otimes B(\theta)NB(\theta) + \bigl(B(\theta)NB(\theta)\bigr)^* \otimes B(\theta) \bigr] \nonumber\\
&\quad + O\bigl(\sigma_w^4\bigr). \nonumber
\end{align}
The key property of \(\mathcal{R}^{-1}(\theta)\) is that the term proportional to \(\sigma_w^{-4}\) is orthogonal to
\(\operatorname{vec}(A(\theta)X)\), \(\operatorname{vec}(XA^H(\theta))\), \(X \otimes A(\theta)\) and \(A^*(\theta) \otimes X\) for any matrix X, on account of (7.59).
In particular, this is true for the matrix of derivatives \(D_r(\theta)\) in (7.4) and for \(\mathcal{A}(\theta)\) in (7.2). On
the other hand, the first term on \(\sigma_w^{-2}\) is orthogonal to \(\operatorname{vec}(A(\theta)X)\) and \(X \otimes A(\theta)\) whereas the
second one is orthogonal to \(\operatorname{vec}(XA^H(\theta))\) and \(A^*(\theta) \otimes X\), based again on (7.59).
The same properties in (7.58) can be stated for \(\mathcal{R}^{-1}(\theta)\), having that
\begin{align}
\mathcal{A}^H(\theta)\,\mathcal{R}^{-1}(\theta) &= \mathcal{A}^H(\theta)\bigl[ B^*(\theta) \otimes B(\theta) \bigr] + O\bigl(\sigma_w^2\bigr) = \mathcal{A}^\#(\theta) + O\bigl(\sigma_w^2\bigr) \nonumber\\
\mathcal{R}^{-1}(\theta)\,\mathcal{A}(\theta) &= \bigl[ B^*(\theta) \otimes B(\theta) \bigr]\mathcal{A}(\theta) + O\bigl(\sigma_w^2\bigr) = \bigl(\mathcal{A}^\#(\theta)\bigr)^H + O\bigl(\sigma_w^2\bigr) \nonumber\\
\mathcal{A}^H(\theta)\,\mathcal{R}^{-1}(\theta)\,\mathcal{A}(\theta) &= I_{K^2} + O\bigl(\sigma_w^2\bigr) \nonumber
\end{align}
using the following definition of pseudoinverse:
\[ \mathcal{A}^\#(\theta) \triangleq \bigl( \mathcal{A}^H(\theta)\,\mathcal{N}^{-1}\mathcal{A}(\theta) \bigr)^{-1}\mathcal{A}^H(\theta)\,\mathcal{N}^{-1} = A^{\#*}(\theta) \otimes A^\#(\theta). \tag{7.61} \]
All these properties will be used to simplify the high-SNR expressions in Section 7.3.
Appendix 7.C High-SNR limit of Q^{-1}(θ) (K full-rank)
Assuming that K is invertible, the inversion lemma allows expressing \(Q^{-1}(\theta)\) as follows:
\begin{align}
Q^{-1}(\theta) &= \bigl( \mathcal{A}(\theta)\,K\,\mathcal{A}^H(\theta) + \mathcal{R}(\theta) \bigr)^{-1} \nonumber\\
&= \mathcal{R}^{-1}(\theta)\Bigl[ I_{M^2} - \mathcal{A}(\theta)\bigl( K^{-1} + \mathcal{A}^H(\theta)\,\mathcal{R}^{-1}(\theta)\,\mathcal{A}(\theta) \bigr)^{-1}\mathcal{A}^H(\theta)\,\mathcal{R}^{-1}(\theta) \Bigr]. \tag{7.62}
\end{align}
Using (7.61) from Appendix 7.B, it follows that
\[ \mathcal{A}^H(\theta)\,\mathcal{R}^{-1}(\theta)\,\mathcal{A}(\theta) = I_{K^2} + O\bigl(\sigma_w^2\bigr) \]
and, therefore, the asymptotic value of \(Q^{-1}(\theta)\) is straightforward if \(K^{-1} + I_{K^2}\) is invertible. In
that case, (7.62) becomes
\[ Q^{-1}(\theta) = \mathcal{R}^{-1}(\theta) - \bigl(\mathcal{A}^\#(\theta)\bigr)^H\bigl( K^{-1} + I \bigr)^{-1}\mathcal{A}^\#(\theta) + O\bigl(\sigma_w^2\bigr) \]
using that \(\mathcal{A}^H(\theta)\,\mathcal{R}^{-1}(\theta) = \mathcal{A}^\#(\theta)\) (7.61). Notice that the term depending on K is negligible
unless the dominant terms of \(\mathcal{R}^{-1}(\theta)\) in (7.60) are null.
However, \(K^{-1} + I_{K^2}\) becomes singular in case of CPM modulations and, therefore, the inner
inverse in (7.62) is a little more involved. In that case, the terms in (7.60) depending on \(\sigma_w^2\)
must also be considered, obtaining that
\[ \mathcal{A}^H(\theta)\,\mathcal{R}^{-1}(\theta)\,\mathcal{A}(\theta) = I_{K^2} - \sigma_w^2\, U(\theta) + O\bigl(\sigma_w^4\bigr) \]
where \(U(\theta)\) is the following full-rank matrix,
\begin{align}
U(\theta) &\triangleq \mathcal{A}^H(\theta)\bigl[ B^*(\theta) \otimes B(\theta)NB(\theta) + \bigl(B(\theta)NB(\theta)\bigr)^* \otimes B(\theta) \bigr]\mathcal{A}(\theta) \nonumber\\
&= I_K \otimes \bigl( A^H(\theta)N^{-1}A(\theta) \bigr)^{-1} + \bigl( A^H(\theta)N^{-1}A(\theta) \bigr)^{*-1} \otimes I_K, \tag{7.63}
\end{align}
that is simplified applying the mixed-product property of the Kronecker product,
\[ (A \otimes B)(C \otimes D) = AC \otimes BD, \]
and using then the results in (7.57) and (7.58).
Thus, the inverse in (7.62) can be solved computing the "economy-size" diagonalization of
\(K^{-1} + I_{K^2}\) as follows:
\[ K^{-1} + I_{K^2} = V\bigl( \Sigma^{-1} + I \bigr)V^H \]
where \(\Sigma\) is the diagonal matrix containing the eigenvalues of K that are different from −1 and
the columns of V are the associated eigenvectors. Then, the inversion lemma can be applied
once more to obtain
\[ \bigl( K^{-1} + \mathcal{A}^H(\theta)\,\mathcal{R}^{-1}(\theta)\,\mathcal{A}(\theta) \bigr)^{-1} = \bigl( V(\Sigma^{-1}+I)V^H - \sigma_w^2\, U(\theta) \bigr)^{-1} + O(1) = -\sigma_w^{-2}\, P_V^\perp(\theta) + O(1) \]
where \(P_V^\perp(\theta)\), the projector onto the complement of the subspace spanned by V, is defined as
\[ P_V^\perp(\theta) \triangleq U^{-1}(\theta)\Bigl[ I - V\bigl( V^H U^{-1}(\theta)\,V \bigr)^{-1}V^H U^{-1}(\theta) \Bigr]. \tag{7.64} \]
As it was argued for the projector \(P_A^\perp(\theta)\) in (7.14), the conventional definition of the
orthogonal projector \(P_V^\perp = I - VV^H\) is modified to include the weighting matrix \(U^{-1}(\theta)\).
Anyway, \(P_V^\perp(\theta)\) holds that
\[ P_V^\perp(\theta)\,V = 0 \qquad V^H P_V^\perp(\theta) = 0, \]
and, thus, \(P_V^\perp(\theta)\) is the projection matrix onto the subspace generated by the eigenvectors of
K associated to the eigenvalue −1.
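The annihilation properties \(P_V^\perp(\theta)V = 0\) and \(V^H P_V^\perp(\theta) = 0\) follow directly from the weighted definition (7.64). A small numerical sketch (with a random V and a random positive-definite matrix standing in for the weighting \(U(\theta)\)) confirms them:

```python
import numpy as np

rng = np.random.default_rng(2)
n, r = 6, 2
V = rng.standard_normal((n, r)) + 1j * rng.standard_normal((n, r))
G = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
U = G @ G.conj().T + n * np.eye(n)   # stands in for the Hermitian weighting U(theta)

Uinv = np.linalg.inv(U)
# Weighted projector of (7.64): U^-1 [ I - V (V^H U^-1 V)^-1 V^H U^-1 ]
P = Uinv @ (np.eye(n) - V @ np.linalg.solve(V.conj().T @ Uinv @ V,
                                            V.conj().T @ Uinv))

left_zero = np.linalg.norm(V.conj().T @ P)   # V^H P_V^perp = 0
right_zero = np.linalg.norm(P @ V)           # P_V^perp V = 0
```

Both norms vanish (up to rounding) for any invertible weighting, which is the algebraic content of the modified definition.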
Finally, putting together all the above partial results, we obtain that
\[ Q^{-1}(\theta) = \mathcal{R}^{-1}(\theta) + \sigma_w^{-2}\bigl(\mathcal{A}^\#(\theta)\bigr)^H P_V^\perp(\theta)\,\mathcal{A}^\#(\theta) + O(1) \]
using that \(\mathcal{A}^H(\theta)\,\mathcal{R}^{-1}(\theta) = \mathcal{A}^\#(\theta)\) (7.61).
In the following, the orthogonal projector \(P_V^\perp(\theta)\) will be referred to as \(P_K^\perp(\theta)\) in order to
emphasize the dependence on the kurtosis matrix K, i.e.,
\[ P_K^\perp(\theta) \triangleq P_V^\perp(\theta). \]
Appendix 7.D High-SNR limit of Q^{-1}(θ) (K singular)
If K is singular, as it happens when the nuisance parameters are drawn from a circular constellation (3.12), the inversion lemma cannot be applied directly and it is necessary to previously
diagonalize the matrix K, as indicated next:
\begin{align}
Q^{-1}(\theta) &= \bigl( \mathcal{R}(\theta) + \mathcal{A}(\theta)\,V_K\Sigma_K V_K^H\,\mathcal{A}^H(\theta) \bigr)^{-1} \nonumber\\
&= \mathcal{R}^{-1}(\theta)\Bigl[ I_{M^2} - \mathcal{A}(\theta)V_K\bigl( \Sigma_K^{-1} + V_K^H\mathcal{A}^H(\theta)\,\mathcal{R}^{-1}(\theta)\,\mathcal{A}(\theta)V_K \bigr)^{-1}V_K^H\mathcal{A}^H(\theta)\,\mathcal{R}^{-1}(\theta) \Bigr] \nonumber
\end{align}
where \(\Sigma_K\) is the diagonal matrix containing the non-zero eigenvalues of K, and \(V_K\) the associated
eigenvectors. Therefore, the study carried out in Appendix 7.C is still correct if \(K^{-1}\) and \(\mathcal{A}(\theta)\)
are substituted by \(\Sigma_K^{-1}\) and \(\mathcal{A}(\theta)V_K\), respectively. In that case, the inner inverse studied in
detail in Appendix 7.C is given by
\[ \bigl( \Sigma_K^{-1} + V_K^H\mathcal{A}^H(\theta)\,\mathcal{R}^{-1}(\theta)\,\mathcal{A}(\theta)V_K \bigr)^{-1} \tag{7.65} \]
with
\[ V_K^H\mathcal{A}^H(\theta)\,\mathcal{R}^{-1}(\theta)\,\mathcal{A}(\theta)V_K = I - \sigma_w^2\, U(\theta) + O\bigl(\sigma_w^4\bigr) \tag{7.66} \]
and
\[ U(\theta) \triangleq V_K^H\Bigl[ I_K \otimes \bigl( A^H(\theta)N^{-1}A(\theta) \bigr)^{-1} + \bigl( A^H(\theta)N^{-1}A(\theta) \bigr)^{*-1} \otimes I_K \Bigr]V_K. \tag{7.67} \]
According to this result, the same two scenarios of Appendix 7.C can be distinguished:
1. If \(\Sigma_K^{-1} + I\) is invertible, it follows that
\[ Q^{-1}(\theta) = \mathcal{R}^{-1}(\theta) - \bigl(\mathcal{A}^\#(\theta)\bigr)^H V_K\bigl( \Sigma_K^{-1} + I \bigr)^{-1}V_K^H\,\mathcal{A}^\#(\theta) + O\bigl(\sigma_w^2\bigr) \]
and the second term becomes negligible at high SNR.
2. Otherwise, if \(\Sigma_K^{-1} + I\) is singular, the diagonal matrix \(\Sigma_K^{-1} + I\) has to be diagonalized as
\(V\Sigma V^H\), where \(\Sigma\) is the diagonal matrix containing the eigenvalues of K different from −1
and V are the vectors of the canonical basis \(\{e_k\}\) selecting the position of these eigenvalues
in \(\Sigma_K\). Formally, the k-th diagonal entry of \(\Sigma_K\) different from −1 is selected by means of
the vector \(e_k\) defined as
\[ [e_k]_i \triangleq \begin{cases} 1 & i = k \\ 0 & i \neq k. \end{cases} \]
In that case, the term \(-\sigma_w^2 U(\theta)\) in (7.66) must be considered in the computation of (7.65),
yielding
\[ \bigl( \Sigma_K^{-1} + I - \sigma_w^2\, U(\theta) \bigr)^{-1} = \bigl( V\Sigma V^H - \sigma_w^2\, U(\theta) \bigr)^{-1} = -\sigma_w^{-2}\, P_V^\perp(\theta) + O(1) \]
where \(P_V^\perp(\theta)\) is the following orthogonal projector:
\[ P_V^\perp(\theta) \triangleq U^{-1}(\theta)\Bigl[ I - V\bigl( V^H U^{-1}(\theta)\,V \bigr)^{-1}V^H U^{-1}(\theta) \Bigr] \]
with \(U(\theta)\) defined in (7.67).
Finally, we obtain that
\[ Q^{-1}(\theta) = \mathcal{R}^{-1}(\theta) + \sigma_w^{-2}\bigl(\mathcal{A}^\#(\theta)\bigr)^H V_K\, P_V^\perp(\theta)\, V_K^H\,\mathcal{A}^\#(\theta) + O(1) \]
using again that \(\mathcal{A}^H(\theta)\,\mathcal{R}^{-1}(\theta) = \mathcal{A}^\#(\theta)\) (7.61).
The matrix \(V_K P_V^\perp(\theta) V_K^H\) is also a projector onto the subspace generated by the eigenvectors of K associated to the eigenvalue −1. This projection is carried out in two steps.
First, the matrices \(V_K\) and \(V_K^H\) project onto the subspace associated to the eigenvalues of K different from 0. Afterwards, \(P_V^\perp(\theta)\) projects onto the subspace associated
to those eigenvalues different from −1.
In the following, the orthogonal projector \(V_K P_V^\perp(\theta) V_K^H\) will be referred to as \(P_K^\perp(\theta)\) in
order to emphasize the dependence on the kurtosis matrix K, i.e.,
\[ P_K^\perp(\theta) \triangleq V_K\, P_V^\perp(\theta)\, V_K^H. \]
Appendix 7.E High-SNR results with A(θ) singular
Depending on the rank of \(A(\theta) \in \mathbb{C}^{M \times K}\), two singular situations can be distinguished:
1. If the rank of \(A(\theta)\) is equal to M, with M ≤ K, the high-SNR limit of \(R^{-1}(\theta)\) and \(\mathcal{R}^{-1}(\theta)\)
is independent of the noise variance and is simply given by
\begin{align}
R^{-1}(\theta) &= \bigl( A(\theta)A^H(\theta) \bigr)^{-1} + o(1) \nonumber\\
\mathcal{R}^{-1}(\theta) &= \bigl( \mathcal{A}(\theta)\mathcal{A}^H(\theta) \bigr)^{-1} + o(1) \nonumber
\end{align}
whereas the asymptotic value of \(Q^{-1}(\theta)\) is determined by the rank of the \(K^2 \times K^2\) matrix
\(I_{K^2} + K\). If the rank of \(I_{K^2} + K\) is greater than or equal to \(M^2\), the inverse of \(Q(\theta)\) is the
following constant matrix:
\begin{align}
Q^{-1}(\theta) &= \bigl( \mathcal{R}(\theta) + \mathcal{A}(\theta)K\mathcal{A}^H(\theta) \bigr)^{-1} \nonumber\\
&= \bigl( \mathcal{A}(\theta)\mathcal{A}^H(\theta) + \mathcal{A}(\theta)K\mathcal{A}^H(\theta) \bigr)^{-1} + o(1) \nonumber\\
&= \bigl( \mathcal{A}(\theta)(I_{K^2}+K)\mathcal{A}^H(\theta) \bigr)^{-1} + o(1). \nonumber
\end{align}
In that case, all the MSE matrices in (7.5) and (7.6) suffer a serious floor at high SNR. This
situation arises when there are fewer observations than nuisance parameters (i.e., M ≤ K)
and the noise subspace becomes null.
On the other hand, if the rank of \(I_{K^2} + K\) is less than \(M^2\) (assuming that M ≤ K), the
use of the fourth-order information avoids the variance floor at high SNR because, in that
case, \(\mathcal{A}(\theta)(I_{K^2}+K)\mathcal{A}^H(\theta)\) is not invertible and the terms of \(\mathcal{R}^{-1}(\theta)\) (7.12) proportional
to \(\sigma_w^2\) have to be considered, as done in Appendix 7.C and Appendix 7.D. This situation is
only possible if the nuisance parameters have constant modulus as, for example, the MPSK
and CPM constellations. In the MPSK case, the rank of \(I_{K^2}+K\) is exactly \(K^2-K\) because
\[ K = (\rho - 2)\operatorname{diag}\bigl( \operatorname{vec}(I_K) \bigr). \]
In the CPM case, the rank reduction is still more significant.
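For the MPSK case, the stated rank of \(I_{K^2} + K\) can be checked directly from \(K = (\rho-2)\operatorname{diag}(\operatorname{vec}(I_K))\) with ρ = 1; the following sketch is only an illustration of that rank count:

```python
import numpy as np

Ksym = 4                                    # number of nuisance symbols
rho = 1.0                                   # E|s|^4 / (E|s|^2)^2 = 1 for MPSK
vecI = np.eye(Ksym).reshape(-1, order="F")  # vec(I_K)
Kmat = (rho - 2.0) * np.diag(vecI)          # kurtosis matrix, circular alphabet
rank = np.linalg.matrix_rank(np.eye(Ksym**2) + Kmat)
# I + K = I - diag(vec(I_K)): the K diagonal positions picked out by vec(I_K)
# are annihilated, so rank = K^2 - K (here 16 - 4 = 12).
```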
2. If the rank of \(A(\theta)\) is lower than min(M, K), the covariance matrix \(R(\theta)\) must be diagonalized as follows:
\[ R(\theta) = V_A(\theta)\,\Sigma_A(\theta)\,V_A^H(\theta) + \sigma_w^2 N \tag{7.68} \]
where \(\Sigma_A(\theta)\) is the diagonal matrix having the positive eigenvalues of \(A(\theta)A^H(\theta)\) and
\(V_A(\theta)\) are the associated eigenvectors. Therefore, the inverse of \(R(\theta)\) is obtained after
applying the inversion lemma to (7.68), obtaining that
\[ R^{-1}(\theta) = \sigma_w^{-2} N^{-1}\Bigl[ I_M - V_A(\theta)\bigl( V_A^H(\theta)N^{-1}V_A(\theta) + \sigma_w^2\Sigma_A^{-1}(\theta) \bigr)^{-1}V_A^H(\theta)N^{-1} \Bigr]. \]
Then, similar results to those in Appendix 7.B are obtained with these substitutions:
\begin{align}
P_A^\perp(\theta) &\longrightarrow N^{-1}\bigl[ I_M - V_A(\theta)\,V_A^\#(\theta) \bigr] \nonumber\\
B(\theta) &\longrightarrow \bigl(V_A^\#(\theta)\bigr)^H\,\Sigma_A^{-1}(\theta)\,V_A^\#(\theta) \nonumber
\end{align}
with \(V_A^\#(\theta) \triangleq \bigl( V_A^H(\theta)N^{-1}V_A(\theta) \bigr)^{-1}V_A^H(\theta)N^{-1}\). This second situation is observed
when some columns of \(A(\theta)\), if M ≥ K, or some rows of \(A(\theta)\), if M ≤ K, are linearly
dependent. This is actually the case of the partial response CPM signals simulated in this
thesis (e.g., 2REC, 3REC and GMSK).
Appendix 7.F High-SNR UCRB
Using the asymptotic results in (7.12), it follows that
\begin{align}
\sigma_w^2\, D_r^H(\theta)\,\mathcal{R}^{-1}(\theta)\,D_r(\theta) &= D_r^H(\theta)\bigl[ B^*(\theta) \otimes P_A^\perp(\theta) \bigr]D_r(\theta) \nonumber\\
&\quad + D_r^H(\theta)\bigl[ P_A^{\perp*}(\theta) \otimes B(\theta) \bigr]D_r(\theta) + o(1). \nonumber
\end{align}
Then, the entries of this matrix can be simplified as indicated next:
\begin{align}
\bigl[ \sigma_w^2\, D_r^H(\theta)\,\mathcal{R}^{-1}(\theta)\,D_r(\theta) \bigr]_{p,q} &= \operatorname{Tr}\left( \frac{\partial R(\theta)}{\partial\theta_p}^{\!H} P_A^\perp(\theta)\,\frac{\partial R(\theta)}{\partial\theta_q}\,B(\theta) \right. \nonumber\\
&\quad \left. + \frac{\partial R(\theta)}{\partial\theta_p}^{\!H} B(\theta)\,\frac{\partial R(\theta)}{\partial\theta_q}\,P_A^\perp(\theta) \right) + o(1), \tag{7.69}
\end{align}
bearing in mind that \([D_r(\theta)]_p = \operatorname{vec}(\partial R(\theta)/\partial\theta_p)\) and using the properties in (7.54). The
final expression is simplified because all the matrices in (7.69) are Hermitian. Therefore, if
\(\partial R(\theta)/\partial\theta_p\) is decomposed as
\[ \frac{\partial R(\theta)}{\partial\theta_p} = \frac{\partial A(\theta)}{\partial\theta_p}A^H(\theta) + A(\theta)\frac{\partial A^H(\theta)}{\partial\theta_p}, \]
and all the terms including \(P_A^\perp(\theta)A(\theta)\) and \(A^H(\theta)P_A^\perp(\theta)\) are removed using (7.59), it follows
that
\[ \bigl[ \sigma_w^2\, D_r^H(\theta)\,\mathcal{R}^{-1}(\theta)\,D_r(\theta) \bigr]_{p,q} = \operatorname{Tr}\left( \frac{\partial A^H(\theta)}{\partial\theta_p}P_A^\perp(\theta)\frac{\partial A(\theta)}{\partial\theta_q} \right) + \operatorname{Tr}\left( \frac{\partial A^H(\theta)}{\partial\theta_q}P_A^\perp(\theta)\frac{\partial A(\theta)}{\partial\theta_p} \right) + o(1) \]
using that \(A^H(\theta)B(\theta)A(\theta) = I_K\) (7.58). Finally, the matrix \(B_1(\theta)\) in (7.18) is obtained
observing that the last two terms are complex conjugated.
Appendix 7.G High-SNR UCRB variance floor
Using the asymptotic results in (7.12), it follows that
\[ D_r^H(\theta)\,\mathcal{R}^{-1}(\theta)\,D_r(\theta) = D_r^H(\theta)\bigl[ B^*(\theta) \otimes B(\theta) \bigr]D_r(\theta) + o(1). \]
Then, the entries of this matrix can be simplified as indicated next:
\[ \bigl[ D_r^H(\theta)\,\mathcal{R}^{-1}(\theta)\,D_r(\theta) \bigr]_{p,q} = \operatorname{Tr}\left( \frac{\partial R(\theta)}{\partial\theta_p}^{\!H} B(\theta)\,\frac{\partial R(\theta)}{\partial\theta_q}\,B(\theta) \right) + o(1), \]
bearing in mind that \([D_r(\theta)]_p = \operatorname{vec}(\partial R(\theta)/\partial\theta_p)\) and using the properties listed in (7.54). The
final expression is simplified because all the matrices in the last equation are Hermitian. Thus,
if \(\partial R(\theta)/\partial\theta_p\) is decomposed as
\[ \frac{\partial R(\theta)}{\partial\theta_p} = \frac{\partial A(\theta)}{\partial\theta_p}A^H(\theta) + A(\theta)\frac{\partial A^H(\theta)}{\partial\theta_p}, \]
and the relations in (7.58) are applied, it follows that
\begin{align}
\bigl[ D_r^H(\theta)\,\mathcal{R}^{-1}(\theta)\,D_r(\theta) \bigr]_{p,q} &= \operatorname{Tr}\left( \frac{\partial A(\theta)}{\partial\theta_p}A^\#(\theta)\frac{\partial A(\theta)}{\partial\theta_q}A^\#(\theta) \right. \nonumber\\
&\quad + \frac{\partial A(\theta)}{\partial\theta_p}\frac{\partial A^H(\theta)}{\partial\theta_q}\bigl(A^\#(\theta)\bigr)^H A^\#(\theta) \nonumber\\
&\quad + \frac{\partial A^H(\theta)}{\partial\theta_p}\bigl(A^\#(\theta)\bigr)^H\frac{\partial A^H(\theta)}{\partial\theta_q}\bigl(A^\#(\theta)\bigr)^H \nonumber\\
&\quad \left. + \frac{\partial A^H(\theta)}{\partial\theta_p}\bigl(A^\#(\theta)\bigr)^H A^\#(\theta)\frac{\partial A(\theta)}{\partial\theta_q} \right) + o(1) \nonumber
\end{align}
taking into account that \(B(\theta) \triangleq \bigl(A^\#(\theta)\bigr)^H A^\#(\theta)\). Finally, the matrix \(B_2(\theta)\) in (7.20) is obtained
observing that the third and fourth terms are the complex conjugated versions of the first and
second terms.^9

^9 Notice that \(\operatorname{Tr}(AB) = \operatorname{Tr}(BA)\).
Appendix 7.H High-SNR study in feedforward second-order estimation
Looking at the MSE matrices \(\Sigma_{mse}\), \(\Sigma_{var}\), \(\bar\Sigma_{mse}\) and \(\bar\Sigma_{var}\) in (7.5) and (7.6), the high-SNR limit
of \(\mathcal{R}^{-1}\), \(Q^{-1}\), \(\bigl(\mathcal{R}+\widetilde{Q}\bigr)^{-1}\) and \(\bigl(Q+\widetilde{Q}\bigr)^{-1}\) has to be computed. In all these four cases, the
following inversion problem must be solved:
\[ \bigl( T + \sigma_w^2\, U + \sigma_w^4\,\mathcal{N} \bigr)^{-1} \]
where the expression of T depends on the inverse that is being solved (7.32) and
\[ U \triangleq G^* \otimes N + N^* \otimes G \tag{7.70} \]
\[ G \triangleq E_\theta\bigl\{ A(\theta)A^H(\theta) \bigr\}. \tag{7.71} \]
The Bayesian expectation is found to augment the rank of the involved matrices. This effect
is actually negative since it reduces the dimension of the noise subspace. In the limit, if the
constant term T became full-rank, the estimators would exhibit the typical variance floor at
high SNR. However, in the sequel, we will assume that T is always rank deficient.
It is worth noting that the kurtosis matrix K appearing in Q (7.26) reduces the rank of T
and, therefore, the dimension of the noise subspace is increased. This aspect was also addressed
in the first point of Appendix 7.E.
In order to evaluate the above inverse, the "economy-size" diagonalization \(T = V_T\Sigma_T V_T^H\)
is calculated and the auxiliary matrix \(X \triangleq U + \sigma_w^2\,\mathcal{N}\) is introduced. Then, the inversion lemma
is invoked as it was done in Appendix 7.B, obtaining
\[ \bigl( T + \sigma_w^2\, U + \sigma_w^4\,\mathcal{N} \bigr)^{-1} = \bigl( V_T\Sigma_T V_T^H + \sigma_w^2\, X \bigr)^{-1} = \sigma_w^{-2}\, P_T^\perp + B_T + O\bigl(\sigma_w^2\bigr) \tag{7.72} \]
where \(B_T\) is the following matrix:
\[ B_T \triangleq \bigl(V_T^\#\bigr)^H\,\Sigma_T^{-1}\,V_T^\# \tag{7.73} \]
with
\begin{align}
V_T^\# &\triangleq \bigl( V_T^H X^{-1} V_T \bigr)^{-1}V_T^H X^{-1} \nonumber\\
P_T^\perp &\triangleq X^{-1}\bigl[ I_{M^2} - V_T V_T^\# \bigr] \nonumber
\end{align}
the generalization of the pseudoinverse and the projection matrix onto the noise subspace of T,
respectively.
In most problems, the rank of \(G \triangleq E_\theta\{A(\theta)A^H(\theta)\}\) grows rapidly due to the Bayesian
expectation and, eventually, the matrix G becomes full-rank. In that case, taking into account
(7.70), the matrix U is also invertible so that
\[ \lim_{\sigma_w^2 \to 0} X^{-1} = U^{-1}, \]
bearing in mind that \(X \triangleq U + \sigma_w^2\,\mathcal{N}\).
When this happens (i.e., G is full rank), if the first term \(\sigma_w^{-2} P_T^\perp\) survives when (7.72) is
multiplied by \(\widetilde{Q}\) or \(S = QM\) in (7.5)-(7.6), it is possible to have self-noise free estimates. Otherwise, if \(\widetilde{Q} \in \operatorname{span}(T)\), the estimator exhibits the typical variance floor because the surviving
term \(B_T\) in (7.72) is constant at high SNR.
When the Gaussian assumption is adopted, it is shown in Appendix 7.I that
\begin{align}
\widetilde{Q} &= E_\theta\bigl\{ \mathcal{A}(\theta)\operatorname{vec}(I_K)\operatorname{vec}^H(I_K)\mathcal{A}^H(\theta) \bigr\} \nonumber\\
&\quad - E_\theta\{\mathcal{A}(\theta)\}\operatorname{vec}(I_K)\operatorname{vec}^H(I_K)E_\theta^H\{\mathcal{A}(\theta)\} \tag{7.74}
\end{align}
always lies in the subspace generated by
\begin{align}
T_3 &\triangleq E_\theta\bigl\{ \mathcal{A}(\theta)\mathcal{A}^H(\theta) \bigr\} + \widetilde{Q} \nonumber\\
T_4 &\triangleq E_\theta\bigl\{ \mathcal{A}(\theta)\mathcal{A}^H(\theta) \bigr\}, \nonumber
\end{align}
which are the matrices T appearing in the MMSE and minimum variance estimators deduced
under the Gaussian assumption (7.6). This result is independent of the actual parameterization and the nuisance parameters distribution. Consequently, if G is full rank, the Gaussian
assumption always suffers from self-noise at high SNR (7.29), and the level of the variance floor
is determined by \(X_{var}(K)\) and \(X_{mse}(K)\) (7.30).
Regarding the optimal estimators in (7.5), the cumulant matrix K is able to reduce the rank
of
\begin{align}
T_1 &\triangleq E_\theta\bigl\{ \mathcal{A}(\theta)(I_{K^2}+K)\mathcal{A}^H(\theta) \bigr\} + \widetilde{Q} \nonumber\\
T_2 &\triangleq E_\theta\bigl\{ \mathcal{A}(\theta)(I_{K^2}+K)\mathcal{A}^H(\theta) \bigr\} \nonumber
\end{align}
if the nuisance parameters have constant modulus. Unfortunately, this reduction is usually
insufficient to move \(\widetilde{Q}\) out of the span of \(T_1\) or \(T_2\) (7.32). In that case, the optimal large-error
estimators in (7.5) also exhibit a variance floor at high SNR (7.29). The level of this variance
floor depends again on the kurtosis matrix K.
Appendix 7.I High-SNR MSE floor under the Gaussian assumption
In this appendix, it is proved that \(\widetilde{Q}\) does not increase the rank of these two matrices (7.32):
\begin{align}
T_3 &\triangleq E_\theta\bigl\{ \mathcal{A}(\theta)\mathcal{A}^H(\theta) \bigr\} + \widetilde{Q} \nonumber\\
T_4 &\triangleq E_\theta\bigl\{ \mathcal{A}(\theta)\mathcal{A}^H(\theta) \bigr\}. \nonumber
\end{align}
This implies that
\[ \operatorname{rank}(T_3) = \operatorname{rank}\bigl(T_3 + \widetilde{Q}\bigr) \qquad \operatorname{rank}(T_4) = \operatorname{rank}\bigl(T_4 + \widetilde{Q}\bigr). \]
Regarding the first statement, it is found that
\begin{align}
\widetilde{Q} &= E_\theta\bigl\{ \mathcal{A}(\theta)\operatorname{vec}(I_K)\operatorname{vec}^H(I_K)\mathcal{A}^H(\theta) \bigr\} \nonumber\\
&\quad - E_\theta\{\mathcal{A}(\theta)\}\operatorname{vec}(I_K)\operatorname{vec}^H(I_K)E_\theta^H\{\mathcal{A}(\theta)\} \nonumber
\end{align}
is the sum of infinitesimal terms like this one:
\begin{align}
&\alpha_1\mathcal{A}(\theta_1)\operatorname{vec}(I_K)\operatorname{vec}^H(I_K)\mathcal{A}^H(\theta_1) + \alpha_2\mathcal{A}(\theta_2)\operatorname{vec}(I_K)\operatorname{vec}^H(I_K)\mathcal{A}^H(\theta_2) \nonumber\\
&\quad - \alpha_3\mathcal{A}(\theta_1)\operatorname{vec}(I_K)\operatorname{vec}^H(I_K)\mathcal{A}^H(\theta_2) - \alpha_3\mathcal{A}(\theta_2)\operatorname{vec}(I_K)\operatorname{vec}^H(I_K)\mathcal{A}^H(\theta_1), \tag{7.75}
\end{align}
corresponding to two arbitrary values of θ, namely \(\theta_1\) and \(\theta_2\), with
\[ \alpha_1 = f_\theta(\theta_1) > 0 \qquad \alpha_2 = f_\theta(\theta_2) > 0 \qquad \alpha_3 = f_\theta(\theta_1)f_\theta(\theta_2) > 0 \]
the associated probability densities.
It can be shown that (7.75) is contained in the span of the following matrix:
\[ \alpha_1\mathcal{A}(\theta_1)\operatorname{vec}(I_K)\operatorname{vec}^H(I_K)\mathcal{A}^H(\theta_1) + \alpha_2\mathcal{A}(\theta_2)\operatorname{vec}(I_K)\operatorname{vec}^H(I_K)\mathcal{A}^H(\theta_2). \]
Then, if \(T_4 \triangleq E_\theta\{\mathcal{A}(\theta)\mathcal{A}^H(\theta)\}\) is decomposed in the same way, \(T_4\) becomes the sum of
infinitesimal terms such as
\[ \alpha_1\mathcal{A}(\theta_1)\,I_{K^2}\,\mathcal{A}^H(\theta_1) + \alpha_2\mathcal{A}(\theta_2)\,I_{K^2}\,\mathcal{A}^H(\theta_2). \]
Therefore, \(\widetilde{Q}\) must lie in the span of \(T_4\) on account of the following relationship:
\[ A\operatorname{vec}(B)\operatorname{vec}^H(B)\,C \in \operatorname{span}\bigl\{ A(B \otimes B)C \bigr\}, \]
which holds for arbitrary matrices A, B and C.
Finally, if \(\widetilde{Q} \in \operatorname{span}(T_4)\), it belongs necessarily to the span of \(T_3 = T_4 + \widetilde{Q}\).
Appendix 7.J Performance limits in second-order frequency estimation
When the received signal exhibits a frequency offset equal to ν/T, the pulse received at time kT
is given by
\[ g(mT_s - kT;\nu) = g_0(mT_s - kT;\nu)\,e^{j2\pi\nu k} \]
where T and \(T_s\) are the symbol and sample period, respectively, and
\[ g_0(mT_s;\nu) \triangleq p(mT_s)\,e^{j2\pi\nu m/N_{ss}} \]
stands for the pulse p(t) received at t = 0. The derivative of \(g(mT_s-kT;\nu)\) is given by
\[ \frac{\partial g(mT_s-kT;\nu)}{\partial\nu} = \left[ \frac{\partial g_0(mT_s-kT;\nu)}{\partial\nu} + g_0(mT_s-kT;\nu)\,(j2\pi k) \right] e^{j2\pi\nu k}. \]
Let A(ν) and \(A_0(\nu)\) stand for the matrices whose columns are delayed replicas of \(g(mT_s;\nu)\)
and \(g_0(mT_s;\nu)\), respectively. It can be shown that these two matrices (and their derivatives)
are related in the following manner:
\begin{align}
A(\nu) &= A_0(\nu)\operatorname{diag}\bigl( \exp(j2\pi d_K\nu) \bigr) \nonumber\\
\frac{\partial A(\nu)}{\partial\nu} &= \left[ \frac{\partial A_0(\nu)}{\partial\nu} + A_0(\nu)\operatorname{diag}(j2\pi d_K) \right]\operatorname{diag}\bigl( \exp(j2\pi d_K\nu) \bigr) \nonumber
\end{align}
where
\[ d_K = [0, 1, \ldots, K-1]^T \]
is the K-long vector accounting for the intersymbol phase slope. On the other hand, the stationary matrix \(A_0(\nu)\) only accounts for the phase variation during the observation interval.
Thus, in the frequency estimation problem, \(X_1(\theta)\) and \(X_{1,1}(\theta)\) (7.35) have some additional
terms depending on \(d_K\) that are listed next:
\begin{align}
X(\nu) &= \sigma_w^2\, A^H(\nu)R^{-1}(\nu)A(\nu) = \breve{X}(\nu) \odot E^*(\nu) \nonumber\\
X_1(\nu) &= \sigma_w^2\,\frac{\partial A^H(\nu)}{\partial\nu}R^{-1}(\nu)A(\nu) = \bigl[ \breve{X}_1(\nu) - \operatorname{diag}(j2\pi d_K)\,\breve{X}(\nu) \bigr] \odot E^*(\nu) \nonumber\\
X_{1,1}(\nu) &= \sigma_w^2\,\frac{\partial A^H(\nu)}{\partial\nu}R^{-1}(\nu)\frac{\partial A(\nu)}{\partial\nu} = \bigl[ \breve{X}_{1,1}(\nu) - \operatorname{diag}(j2\pi d_K)\,\breve{X}_1^H(\nu) \nonumber\\
&\qquad + \breve{X}_1(\nu)\operatorname{diag}(j2\pi d_K) - \operatorname{diag}(j2\pi d_K)\,\breve{X}(\nu)\operatorname{diag}(j2\pi d_K) \bigr] \odot E^*(\nu) \nonumber
\end{align}
where
\begin{align}
\breve{X}(\nu) &\triangleq \sigma_w^2\, A_0^H(\nu)R^{-1}(\nu)A_0(\nu) \nonumber\\
\breve{X}_1(\nu) &\triangleq \sigma_w^2\,\frac{\partial A_0^H(\nu)}{\partial\nu}R^{-1}(\nu)A_0(\nu) \nonumber\\
\breve{X}_{1,1}(\nu) &\triangleq \sigma_w^2\,\frac{\partial A_0^H(\nu)}{\partial\nu}R^{-1}(\nu)\frac{\partial A_0(\nu)}{\partial\nu} \nonumber
\end{align}
are the functions \(X(\theta)\), \(X_1(\theta)\) and \(X_{1,1}(\theta)\) associated with the stationary matrix \(A_0(\nu)\), \(\odot\)
denotes the element-wise (Hadamard) product, and \([E(\nu)]_{i,k} \triangleq \exp(j2\pi(i-k)\nu)\) was introduced in Appendix 3.D.
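The appearance of the Hadamard factor \(E^*(\nu)\) follows from the fact that sandwiching any matrix between the conjugated and plain phase ramps \(\operatorname{diag}(\exp(j2\pi d_K\nu))\) multiplies its (i, k) entry by \(\exp(-j2\pi(i-k)\nu)\). A short numerical sketch (with a random matrix standing in for the stationary factor) illustrates this:

```python
import numpy as np

rng = np.random.default_rng(3)
K, nu = 4, 0.01
d = np.arange(K)                              # d_K = [0, 1, ..., K-1]^T
D = np.diag(np.exp(1j * 2 * np.pi * d * nu))  # diag(exp(j 2 pi d_K nu))

X0 = rng.standard_normal((K, K)) + 1j * rng.standard_normal((K, K))
E = np.exp(1j * 2 * np.pi * (d[:, None] - d[None, :]) * nu)  # [E]_{i,k}

# D^H X0 D equals the elementwise product X0 * conj(E)
err = np.linalg.norm(D.conj().T @ X0 @ D - X0 * E.conj())
```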
It can be shown that the new terms on \(d_K\), as well as the factor \(E^*(\nu)\), vanish when \(X(\theta)\),
\(X_1(\theta)\) and \(X_{1,1}(\theta)\) are plugged into \(B_{UCRB}(\theta)\) (7.36), Ψ(K) (7.38) and Γ(K) (7.45). The terms
on \(d_K\) are imaginary and they are eliminated when the real part is extracted in \(B_{UCRB}(\theta)\), Ψ(K)
and Γ(K). On the other hand, only the diagonal entries of \(E^*(\nu)\), which are all equal to 1, are
involved in (7.36), (7.38) and (7.44).
The conclusion of this appendix is that, even though the received signal is not stationary in the
frequency estimation problem, the asymptotic study can be addressed considering uniquely the
stationary matrix \(A_0(\nu)\) and its derivatives.
Appendix 7.K Asymptotic study for M → ∞
In this appendix, the asymptotic limit of \(B_{UCRB}(\theta)\), \(B_{gml}(\theta)\) and \(B_{bque}(\theta)\) is derived when the
number of antennas goes to infinity. While the asymptotic study of \(B_{UCRB}(\theta)\) and \(B_{gml}(\theta)\)
was already addressed in the literature, the asymptotic study of the optimal second-order DOA
estimator is carried out in this appendix for the first time. The most important conclusion
is that the term Γ(K) that appears in \(B_{bque}(\theta)\) is always negligible compared to \(B_{UCRB}^{-1}(\theta)\),
independently of the nuisance parameters distribution. Indeed, \(B_{UCRB}^{-1}(\theta)\) is found to grow
as \(M^3\) whereas Γ(K) cannot grow faster than M. Moreover, if the nuisance parameters are
circular, Γ(K) goes to zero as \(M^{-1}\) or faster. An interesting conclusion is that the convergence
order can be increased by one order when the parameters are drawn from a constant-modulus
alphabet. However, this increase is not sufficient in the studied problem because the dominant
term, \(B_{UCRB}^{-1}(\theta)\), goes to infinity faster.
When the number of sensors goes to infinity (M → ∞), the spatial cross-correlation matrices
in (7.51) have the following asymptotic expressions:
\begin{align}
B(\theta) &= A_s^H(\theta)\,N_s^{-1}A_s(\theta) = M I_P + O(1) \nonumber\\
B_p(\theta) &= \frac{\partial A_s^H(\theta)}{\partial\theta_p}N_s^{-1}A_s(\theta) = B_p^\infty(\theta) + O(1) \nonumber\\
B_{p,q}(\theta) &= \frac{\partial A_s^H(\theta)}{\partial\theta_p}N_s^{-1}\frac{\partial A_s(\theta)}{\partial\theta_q} = \begin{cases} B_{p,q}^\infty(\theta) + o(M^2) & p \neq q \\ B_{p,q}^\infty(\theta) + o(M^3) & p = q \end{cases} \nonumber
\end{align}
with
\[ [B(\theta)]_{i,k} \triangleq \begin{cases} M & \theta_i = \theta_k \\[4pt] \dfrac{\sin\bigl(\pi M(\theta_i-\theta_k)/2\bigr)}{\sin\bigl(\pi(\theta_i-\theta_k)/2\bigr)} & \text{otherwise} \end{cases} \tag{7.76} \]
\[ \bigl[B_p^\infty(\theta)\bigr]_{i,k} \triangleq \begin{cases} 0 & i \neq p \ \text{or}\ \theta_p = \theta_k \\[4pt] \pm\dfrac{\pi}{2}\,\dfrac{\cos\bigl(\pi(\theta_p-\theta_k)/2\bigr)}{\sin^2\bigl(\pi(\theta_p-\theta_k)/2\bigr)} & i = p,\ \theta_p-\theta_k = 1/M, 3/M, \ldots \\[4pt] \dfrac{\pi}{2}M\,\dfrac{\cos\bigl(\pi M(\theta_p-\theta_k)/2\bigr)}{\sin\bigl(\pi(\theta_p-\theta_k)/2\bigr)} & \text{otherwise} \end{cases} \tag{7.77} \]
\[ \bigl[B_{p,q}^\infty(\theta)\bigr]_{i,k} \triangleq \begin{cases} 0 & i \neq p \ \text{or}\ k \neq q \\[4pt] \dfrac{\pi^2}{12}M^3 & i = p,\ k = q,\ \theta_p = \theta_q \\[4pt] \pm\dfrac{\pi^2}{2}M\,\dfrac{\cos\bigl(\pi(\theta_p-\theta_q)/2\bigr)}{\sin^2\bigl(\pi(\theta_p-\theta_q)/2\bigr)} & i = p,\ k = q,\ \theta_p-\theta_q = 2/M, 4/M, \ldots \\[4pt] \dfrac{\pi^2}{4}M^2\,\dfrac{\sin\bigl(\pi M(\theta_p-\theta_q)/2\bigr)}{\sin\bigl(\pi(\theta_p-\theta_q)/2\bigr)} & \text{otherwise.} \end{cases} \tag{7.78} \]
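The Dirichlet-kernel entries of B(θ) in (7.76) can be reproduced with a centered uniform linear array. The sketch below (an assumed half-wavelength ULA with symmetric element indexing and \(N_s = I\), used only for illustration) checks the diagonal value M and one off-diagonal entry:

```python
import numpy as np

Mant = 16                                    # number of antennas
theta = np.array([0.10, 0.25])               # electrical angles of two sources
m = np.arange(Mant) - (Mant - 1) / 2         # centered ULA element index
A = np.exp(1j * np.pi * np.outer(m, theta))  # assumed steering matrix, N_s = I

B = A.conj().T @ A                           # A_s^H N_s^-1 A_s
d = theta[0] - theta[1]
pred = np.sin(np.pi * Mant * d / 2) / np.sin(np.pi * d / 2)  # Dirichlet kernel
err = abs(B[0, 1] - pred)
```

With the centered indexing the cross-correlation is purely real, which is why the entries in (7.76) carry no phase factor.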
In order to find the asymptotic value of \(B_{UCRB}(\theta)\), \(B_{gml}(\theta)\) and \(B_{bque}(\theta)\), it is necessary
to obtain the limit of \(X(\theta)\), \(X_p(\theta)\) and \(X_{p,q}(\theta)\) (7.35) as M → ∞. Before doing so, we have to
evaluate the inverse appearing in \(X(\theta)\), \(X_p(\theta)\) and \(X_{p,q}(\theta)\) when the number of antennas goes
to infinity. Taking into account that the diagonal entries of
\[ \mathbf{B}(\theta) \triangleq A_t^H N_t^{-1} A_t \otimes B(\theta) \]
are proportional to M (7.76), it follows that
\[ \bigl( \mathbf{B}(\theta) + \sigma_w^2 I_{KP} \bigr)^{-1} = M^{-1}\Bigl( M^{-1}\mathbf{B}(\theta) + \frac{\sigma_w^2}{M}I_{KP} \Bigr)^{-1} = \mathbf{B}^{-1}(\theta) + o\bigl(M^{-1}\bigr) \tag{7.79} \]
where the last expression is verified for \(\sigma_w^2/M \to 0\). If the resulting inverse is now plugged
into (7.35), \(X(\theta)\) and \(X_p(\theta)\) become zero when \(\sigma_w^2/M\) goes to zero.^{10} Hence, (7.79) should be
expanded in a Taylor series around \(\sigma_w^2/M = 0\) in order to determine its order of convergence,
obtaining
\[ \bigl( \mathbf{B}(\theta) + \sigma_w^2 I_{KP} \bigr)^{-1} = \mathbf{B}^{-1}(\theta) - \sigma_w^2\,\mathbf{B}^{-2}(\theta) + \sigma_w^4\,\mathbf{B}^{-3}(\theta) + o\bigl(M^{-3}\bigr). \]
Plugging now the Taylor series into (7.35), it follows that
\begin{align}
X(\theta) &= \sigma_w^2 I_{KP} - \sigma_w^4\,\mathbf{B}^{-1}(\theta) + O\bigl(M^{-2}\bigr) \nonumber\\
&= \sigma_w^2 I_{KP} - \sigma_w^4\bigl( A_t^H N_t^{-1}A_t \bigr)^{-1} \otimes B^{-1}(\theta) + O\bigl(M^{-2}\bigr) \tag{7.80}\\
X_p(\theta) &= \sigma_w^2\,\mathbf{B}_p(\theta)\,\mathbf{B}^{-1}(\theta) + o(1) = \sigma_w^2\bigl[ I_K \otimes B_p^\infty(\theta) \bigr]\mathbf{B}^{-1}(\theta) + o(1) \tag{7.81}\\
X_{p,q}(\theta) &= \begin{cases} \mathbf{B}_{p,q}(\theta) + o(M^2) = A_t^H N_t^{-1}A_t \otimes B_{p,q}^\infty(\theta) + o(M^2) & p \neq q \\ \mathbf{B}_{p,q}(\theta) + o(M^3) = A_t^H N_t^{-1}A_t \otimes B_{p,q}^\infty(\theta) + o(M^3) & p = q \end{cases} \tag{7.82}
\end{align}
where the inverse of B(θ) has the following asymptotic value:
\[ B^{-1}(\theta) = M^{-1} I_P + O\bigl(M^{-2}\bigr). \tag{7.83} \]
(Gaussian) Unconditional Cramér-Rao Bound
Using the above results, it can be shown that the diagonal entries of \(B_{UCRB}^{-1}(\theta)\) (7.36) have
the following asymptotic value:
\begin{align}
\bigl[ B_{UCRB}^{-1}(\theta) \bigr]_{p,p} &= \bigl[ D_r^H(\theta)\,\mathcal{R}^{-1}(\theta)\,D_r(\theta) \bigr]_{p,p} = 2\sigma_w^{-4}\operatorname{Re}\operatorname{Tr}\bigl( X(\theta)X_{p,p}(\theta) \bigr) + o\bigl(M^3\bigr) \nonumber\\
&= 2\sigma_w^{-2}\operatorname{Re}\operatorname{Tr}\bigl( X_{p,p}(\theta) \bigr) + o\bigl(M^3\bigr) \nonumber\\
&= 2\sigma_w^{-2}\operatorname{Re}\operatorname{Tr}\bigl( A_t^H N_t^{-1}A_t \otimes B_{p,p}^\infty(\theta) \bigr) + o\bigl(M^3\bigr) \nonumber\\
&= \frac{\pi^2\sigma_w^{-2}}{6}M^3\operatorname{Re}\operatorname{Tr}\bigl( A_t^H N_t^{-1}A_t \bigr) + o\bigl(M^3\bigr) \tag{7.84}
\end{align}
whereas the off-diagonal entries converge to a constant when the number of antennas is augmented.

^{10} Notice that this condition will also be satisfied at high SNR.

After some tedious calculations, it can be shown that
\begin{align}
\bigl[ B_{UCRB}^{-1}(\theta) \bigr]_{p,q} &= \bigl[ D_r^H(\theta)\,\mathcal{R}^{-1}(\theta)\,D_r(\theta) \bigr]_{p,q} = 2\sigma_w^{-4}\operatorname{Re}\operatorname{Tr}\bigl( X_p(\theta)X_q(\theta) + X(\theta)X_{p,q}(\theta) \bigr) \nonumber\\
&= 2\operatorname{Tr}\Bigl( I_K \otimes \bigl[ M^{-2}B_p^\infty(\theta)B_q^\infty(\theta) - B^{-1}(\theta)B_{p,q}^\infty(\theta) \bigr] \Bigr) + o(1) \nonumber\\
&= 2K\Bigl( M^{-2}\bigl[B_p^\infty(\theta)\bigr]_{p,q}\bigl[B_q^\infty(\theta)\bigr]_{q,p} - \bigl[B_{p,q}^\infty(\theta)\bigr]_{p,q}\bigl[B^{-1}(\theta)\bigr]_{q,p} \Bigr) + o(1) \nonumber\\
&= 2KM^{-2}\Bigl( \bigl[B_p^\infty(\theta)\bigr]_{p,q}\bigl[B_q^\infty(\theta)\bigr]_{q,p} + \bigl[B_{p,q}^\infty(\theta)\bigr]_{p,q}\bigl[B(\theta)\bigr]_{q,p} \Bigr) + o(1) \nonumber\\
&= -\frac{K\pi^2\cos\bigl(\pi M(\theta_p-\theta_q)\bigr)}{2\sin^2\bigl(\pi(\theta_p-\theta_q)/2\bigr)} + o(1) \nonumber
\end{align}
assuming in the last equation that \(\theta_p - \theta_q\) is not a multiple of 1/(2M) with probability one.^{11}
Notice that the off-diagonal entries of \(B_{UCRB}^{-1}(\theta)\) are constant because the term proportional
to M is zero, since the diagonal entries of \(B_{p,q}^\infty(\theta)\) are null for p ≠ q (7.78). Therefore, in order to
evaluate the trace of \(B^{-1}(\theta)B_{p,q}^\infty(\theta)\), the off-diagonal entries of \(B^{-1}(\theta)\) in (7.83) must be taken
into account. Thus, \(B^{-1}(\theta)\) needs to be expanded in a Taylor series around \(M^{-1} = 0\), obtaining
\begin{align}
B^{-1}(\theta) &= \bigl( M I_P + [B(\theta) - M I_P] \bigr)^{-1} = M^{-1}\bigl( I_P + M^{-1}[B(\theta) - M I_P] \bigr)^{-1} \nonumber\\
&= M^{-1} I_P - M^{-2}[B(\theta) - M I_P] + o\bigl(M^{-1}\bigr) = 2M^{-1} I_P - M^{-2}B(\theta) + o\bigl(M^{-1}\bigr) \tag{7.85}
\end{align}
and, using (7.76), we have
\[ \bigl[ B^{-1}(\theta) \bigr]_{q,p} = -M^{-2}\bigl[ B(\theta) \bigr]_{q,p} + o\bigl(M^{-2}\bigr) = -M^{-2}\,\frac{\sin\bigl(\pi M(\theta_p-\theta_q)/2\bigr)}{\sin\bigl(\pi(\theta_p-\theta_q)/2\bigr)} + o\bigl(M^{-2}\bigr). \]
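The Taylor expansion (7.85) and the resulting off-diagonal approximation of \(B^{-1}(\theta)\) can be verified numerically with the same Dirichlet-kernel matrix; the sketch below (assumed centered half-wavelength ULA, for illustration only) checks both:

```python
import numpy as np

Mant = 64
theta = np.array([0.10, 0.25])
m = np.arange(Mant) - (Mant - 1) / 2
A = np.exp(1j * np.pi * np.outer(m, theta))  # assumed centered-ULA steering matrix
B = (A.conj().T @ A).real                    # 2x2: diagonal M, Dirichlet off-diagonal

Binv = np.linalg.inv(B)
approx = 2 * np.eye(2) / Mant - B / Mant**2  # first-order expansion (7.85)
err = np.linalg.norm(Binv - approx)          # residual is O(M^-3)
offdiag_err = abs(Binv[1, 0] + B[1, 0] / Mant**2)
```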
Finally, the term \(B_p^\infty(\theta)B_q^\infty(\theta)\) is computed taking into account that
\[ M^{-1}\bigl[ B_p^\infty(\theta) \bigr]_{p,q} = \frac{\pi}{2}\,\frac{\cos\bigl(\pi M(\theta_p-\theta_q)/2\bigr)}{\sin\bigl(\pi(\theta_p-\theta_q)/2\bigr)} \]
using (7.77).
Gaussian Maximum Likelihood

In this section, the asymptotic study of \(\mathbf{B}_{gml}(\theta) = \mathbf{B}_{UCRB}(\theta) + \mathbf{X}_{gml}(\mathbf{K})\) when \(M\to\infty\) is addressed, concluding that the second term \(\mathbf{X}_{gml}(\mathbf{K})\) is negligible compared with \(\mathbf{B}_{UCRB}(\theta)\). Therefore, the GML estimator is proved to be robust to the sources' distribution when the number of antennas goes to infinity.
11 Notice that, if \(\theta_p-\theta_q\) were a multiple of \(1/M\), the final expression could be calculated considering these particular cases of \(\mathbf{B}_p(\theta)\) and \(\mathbf{B}_{p,q}(\theta)\) in (7.77)-(7.78). Anyway, the off-diagonal entries are found to become asymptotically constant unless \(\theta_p-\theta_q = 0.5/M, 1.5/M, \ldots\) In that case, the constant term is equal to zero and, thus, the convergence order of \(\left[\mathbf{B}_{UCRB}^{-1}\right]_{p,q}\) becomes \(O(M^{-1})\).
7.K ASYMPTOTIC STUDY FOR M → ∞
To begin with, let us recall the expression of \(\mathbf{X}_{gml}(\mathbf{K})\) in (7.37):

\[
\mathbf{X}_{gml}(\mathbf{K}) = \mathbf{B}_{UCRB}(\theta)\,\boldsymbol\Psi(\mathbf{K})\,\mathbf{B}_{UCRB}(\theta)
\]
\[
\left[\boldsymbol\Psi(\mathbf{K})\right]_{p,q} = \sigma_w^{-4}\,\mathrm{vec}^H\!\left(\mathbf{Y}_p(\theta)\right)\mathbf{K}\,\mathrm{vec}\!\left(\mathbf{Y}_q(\theta)\right).
\]
Then, the asymptotic value of \(\mathbf{Y}_p(\theta)\) is obtained from (7.39), concluding that

\[
\mathbf{Y}_p(\theta) = \mathbf{X}(\theta)\,\mathbf{X}_p(\theta) + \mathbf{X}_p^H(\theta)\,\mathbf{X}(\theta)
= \sigma_w^4\,\mathbf{B}_p(\theta)\,\mathbf{B}^{-1}(\theta) + \sigma_w^4\,\mathbf{B}^{-1}(\theta)\,\mathbf{B}_p^H(\theta) + o(1)
\]
\[
= \sigma_w^4\,\mathbf{I}_K \otimes \left[\mathbf{B}_p^{\infty}(\theta)\,\mathbf{B}^{-1}(\theta) + \mathbf{B}^{-1}(\theta)\left(\mathbf{B}_p^{\infty}(\theta)\right)^H\right] + o(1)
\]
\[
= \sigma_w^4\,\mathbf{I}_K \otimes \left[M^{-1}\mathbf{B}_p^{\infty}(\theta) + M^{-1}\left(\mathbf{B}_p^{\infty}(\theta)\right)^H\right] + o(1)
\tag{7.86}
\]
and, taking into account that \(\mathbf{B}_p^{\infty}(\theta)\) is proportional to M (7.77), the convergence order of \(\mathbf{Y}_p(\theta)\) is O(1), at most. In that case, \(\mathbf{X}_{gml}(\mathbf{K})\) decays as \(O(M^{-6})\) whereas \(\mathbf{B}_{UCRB}(\theta)\) decreases as \(O(M^{-3})\) (7.84) when the number of sensors goes to infinity.

Focusing now on those circular alphabets considered in (7.42), it is found that \(\mathbf{X}_{gml}(\mathbf{K})\) decays as \(O(M^{-8})\) because \(\boldsymbol\Psi(\mathbf{K})\) becomes proportional to \(M^{-2}\), as indicated next:12
\[
\left[\boldsymbol\Psi(\mathbf{K})\right]_{p,q} = \sigma_w^{-4}(\rho-2)\,\mathrm{diag}^H\!\left(\mathbf{Y}_p(\theta)\right)\mathrm{diag}\!\left(\mathbf{Y}_q(\theta)\right)
= \sigma_w^{-4}(\rho-2)\,\mathrm{Tr}\left(\mathbf{Y}_p(\theta)\,\mathbf{Y}_q(\theta)\right)
\]
\[
= 4K(\rho-2)\,\mathrm{Tr}\left(\mathbf{B}_p^{\infty}(\theta)\,\mathbf{B}^{-1}(\theta)\,\mathbf{B}_q^{\infty}(\theta)\,\mathbf{B}^{-1}(\theta)\right) + o\!\left(M^{-2}\right)
\]
\[
= 4K(\rho-2)\sum_{i\neq p,q}\left[\mathbf{B}_p^{\infty}(\theta)\right]_{p,i}\left[\mathbf{B}^{-1}(\theta)\right]_{i,p}\left[\mathbf{B}_q^{\infty}(\theta)\right]_{q,i}\left[\mathbf{B}^{-1}(\theta)\right]_{i,q} + o\!\left(M^{-2}\right)
\]
\[
= \frac{4K(\rho-2)}{M^2}\sum_{i\neq p,q}\left[\mathbf{B}_p^{\infty}(\theta)\right]_{p,i}\left[\mathbf{B}(\theta)\right]_{i,p}\left[\mathbf{B}_q^{\infty}(\theta)\right]_{q,i}\left[\mathbf{B}(\theta)\right]_{i,q} + o\!\left(M^{-2}\right)
\]
\[
= \begin{cases}
\dfrac{K\pi^2(\rho-2)}{4M^2}\left[\dfrac{\sin\left(\pi M(\theta_p-\theta_q)\right)}{\sin^2\left(\pi(\theta_p-\theta_q)/2\right)}\right]^2 + o\!\left(M^{-2}\right) & p \neq q \\[2mm]
0 & p = q
\end{cases}
\tag{7.87}
\]
where the off-diagonal elements of \(\mathbf{B}^{-1}(\theta)\) in (7.85) are considered again because the diagonal of \(\mathbf{B}_p^{\infty}(\theta)\) is zero (7.77). Remember that K is the number of columns of the matrix \(\mathbf{A}_t\) or, in other words, the number of nuisance parameters per user.
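The fourth- to second-order moment ratio ρ driving the (ρ − 2) factors above can be checked numerically: ρ = 2 for circular Gaussian symbols (so these non-Gaussian terms vanish), whereas ρ = 1 for any constant-modulus alphabet such as QPSK. The sketch below estimates ρ from simulated symbols; the sample size is an arbitrary choice.

```python
import numpy as np

# rho = E{|s|^4} / E^2{|s|^2}: 2 for circular Gaussian symbols, 1 for QPSK.
rng = np.random.default_rng(4)
N = 200_000

qpsk = np.exp(1j * (np.pi / 2 * rng.integers(0, 4, N) + np.pi / 4))   # unit modulus
gauss = (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2)

def rho(s):
    return np.mean(np.abs(s) ** 4) / np.mean(np.abs(s) ** 2) ** 2

print(rho(qpsk), rho(gauss))   # 1.0 and ≈ 2.0
```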
Best Quadratic Unbiased Estimator
Thus far, the performance of the GML estimator has been shown to be independent of the nuisance parameters' distribution when the number of sensors goes to infinity. Next, the BQUE estimator is shown to converge asymptotically to the (Gaussian) UCRB when M → ∞. Specifically, if the nuisance parameters have constant modulus, the non-Gaussian term Γ(K) could be proportional to M as the number of antennas is augmented. However, this is not possible if the nuisance parameters are circular; in that case, Γ(K) goes to zero as \(M^{-1}\). On the other hand, if the modulus of the nuisance parameters is not constant, Γ(K) might be constant, but it decays as \(M^{-2}\) if the nuisance parameters are circular.

12 All the matrices are real-valued and the Re{} operator is omitted for simplicity.
To support this conclusion, we begin by recovering the general expression of Γ(K) from (7.45):

\[
\left[\boldsymbol\Gamma(\mathbf{K})\right]_{p,q} = -\sigma_w^{-4}\,\mathrm{vec}^H\!\left(\mathbf{Y}_p(\theta)\right)\mathbf{V}_K\left[\mathbf{V}_K^H\left(\mathbf{X}^*(\theta)\otimes\mathbf{X}(\theta)\right)\mathbf{V}_K + \sigma_w^4\boldsymbol\Sigma_K^{-1}\right]^{-1}\mathbf{V}_K^H\,\mathrm{vec}\!\left(\mathbf{Y}_q(\theta)\right)
\]
where \(\mathbf{K} = \mathbf{V}_K\boldsymbol\Sigma_K\mathbf{V}_K^H\) is the “economy-size” diagonalization of K. Then, bearing in mind that \(\mathbf{Y}_p(\theta)\) (7.86) is constant in the best case, the asymptotic order of Γ(K) is determined by

\[
\left[\mathbf{V}_K^H\left(\mathbf{X}^*(\theta)\otimes\mathbf{X}(\theta)\right)\mathbf{V}_K + \sigma_w^4\boldsymbol\Sigma_K^{-1}\right]^{-1},
\]

which converges to a constant if all the eigenvalues of the kurtosis matrix K are different from −1. This condition on the eigenvalues of K is equivalent to the aforementioned constant-modulus condition. In that case, bearing in mind that \(\mathbf{X}(\theta) = \sigma_w^2\,\mathbf{I}_{KP} + o(1)\), it is straightforward to obtain

\[
\left[\mathbf{V}_K^H\left(\mathbf{X}^*(\theta)\otimes\mathbf{X}(\theta)\right)\mathbf{V}_K + \sigma_w^4\boldsymbol\Sigma_K^{-1}\right]^{-1} = \sigma_w^{-4}\left(\mathbf{I} + \boldsymbol\Sigma_K^{-1}\right)^{-1} + o(1)
\]
and, therefore, Γ(K) converges to a constant as M → ∞.

On the other hand, if some eigenvalues of K are equal to −1, the inverse of \(\mathbf{I} + \boldsymbol\Sigma_K^{-1}\) does not exist and the second component of X(θ) (7.80) must be considered. Thus, it follows that

\[
\mathbf{V}_K^H\left(\mathbf{X}^*(\theta)\otimes\mathbf{X}(\theta)\right)\mathbf{V}_K + \sigma_w^4\boldsymbol\Sigma_K^{-1} = \sigma_w^4\left(\mathbf{I} + \boldsymbol\Sigma_K^{-1}\right) - 2\sigma_w^6\,\mathbf{U}(\theta) + o\!\left(M^{-1}\right)
\]

where the second term,

\[
\mathbf{U}(\theta) \triangleq \mathbf{V}_K^H\,\mathbf{B}^{-1}(\theta)\,\mathbf{V}_K = M^{-1}\,\mathbf{V}_K^H\left[\left(\mathbf{A}_t^H\mathbf{N}_t^{-1}\mathbf{A}_t\right)^{-1}\otimes\mathbf{I}_P\right]\mathbf{V}_K + o\!\left(M^{-1}\right),
\tag{7.88}
\]
is proportional to \(M^{-1}\). At this point, the inversion lemma should be applied to compute the above inverse, as was done in Section 7.3. By doing so, the inversion yields a term proportional to M and, therefore, the non-Gaussian term Γ(K) becomes proportional to M as well.
To illustrate this general conclusion, the previous analysis is particularized to the case of circular nuisance parameters. In that case, the non-Gaussian term Γ(K) is given in (7.46):

\[
\left[\boldsymbol\Gamma(\mathbf{K})\right]_{p,q} = -\sigma_w^{-4}\,\mathrm{diag}^H\!\left(\mathbf{Y}_p(\theta)\right)\left[\mathbf{X}^*(\theta)\,\mathbf{X}(\theta) + \sigma_w^4(\rho-2)^{-1}\mathbf{I}_{KP}\right]^{-1}\mathrm{diag}\!\left(\mathbf{Y}_q(\theta)\right)
\]

where

\[
\mathbf{X}^*(\theta)\,\mathbf{X}(\theta) + \sigma_w^4(\rho-2)^{-1}\mathbf{I}_{KP} = \sigma_w^4\,\frac{\rho-1}{\rho-2}\,\mathbf{I}_{KP} - 2\sigma_w^6\,\mathbf{U}(\theta) + o\!\left(M^{-1}\right)
\tag{7.89}
\]
and U(θ) is the matrix introduced in (7.88), which can be written as

\[
\mathbf{U}(\theta) = M^{-1}\,\mathbf{I}_{KP}\circ\left[\left(\mathbf{A}_t^H\mathbf{N}_t^{-1}\mathbf{A}_t\right)^{-1}\otimes\mathbf{I}_P\right] + O\!\left(M^{-2}\right)
= M^{-1}\,\mathrm{Dg}\!\left[\left(\mathbf{A}_t^H\mathbf{N}_t^{-1}\mathbf{A}_t\right)^{-1}\right]\otimes\mathbf{I}_P + O\!\left(M^{-2}\right)
\]

where Dg[A] is the diagonal matrix built from the diagonal of A.
Therefore, if the fourth- to second-order moment ratio ρ is not unitary, Γ(K) is given by

\[
\left[\boldsymbol\Gamma(\mathbf{K})\right]_{p,q} = \sigma_w^{-8}\,\frac{2-\rho}{\rho-1}\,\mathrm{diag}^H\!\left(\mathbf{Y}_p(\theta)\right)\mathrm{diag}\!\left(\mathbf{Y}_q(\theta)\right)
\]
\[
= \frac{4K}{M^4}\,\frac{2-\rho}{\rho-1}\,\mathrm{Tr}\left(\mathbf{B}_p^{\infty}(\theta)\,\mathbf{B}^{-1}(\theta)\,\mathbf{B}_q^{\infty}(\theta)\,\mathbf{B}^{-1}(\theta)\right) + o\!\left(M^{-2}\right)
\]
\[
= \begin{cases}
\dfrac{K\pi^2}{2}\,\dfrac{2-\rho}{\rho-1}\left[\dfrac{\sin\left(\pi M(\theta_p-\theta_q)\right)}{2M\sin^2\left(\pi(\theta_p-\theta_q)/2\right)}\right]^2 + o\!\left(M^{-2}\right) & p \neq q \\[2mm]
0 & p = q
\end{cases}
\]

repeating the calculations in (7.87). Notice that this term goes to zero as \(O(M^{-2})\) and, therefore, it is absolutely negligible when compared to \(\mathbf{B}_{UCRB}^{-1}(\theta)\) (7.84).
On the other hand, if we deal with a constant-modulus alphabet with ρ = 1, the constant term in (7.89) is zero and the next term must be considered, yielding

\[
\left[\boldsymbol\Gamma(\mathbf{K})\right]_{p,q} = 0.5\,\sigma_w^{-10}\,\mathrm{diag}^H\!\left(\mathbf{Y}_p(\theta)\right)\mathbf{U}^{-1}(\theta)\,\mathrm{diag}\!\left(\mathbf{Y}_q(\theta)\right)
\]
\[
= \frac{2\sigma_w^{-2}\,\xi K E_s}{M^3}\,\mathrm{Tr}\left(\mathbf{B}_p^{\infty}(\theta)\,\mathbf{B}^{-1}(\theta)\,\mathbf{B}_q^{\infty}(\theta)\,\mathbf{B}^{-1}(\theta)\right) + o\!\left(M^{-1}\right)
\]
\[
= \begin{cases}
\sigma_w^{-2}\,\xi K E_s\,\dfrac{\pi^2}{8M}\left[\dfrac{\sin\left(\pi M(\theta_p-\theta_q)\right)}{\sin^2\left(\pi(\theta_p-\theta_q)/2\right)}\right]^2 + o\!\left(M^{-1}\right) & p \neq q \\[2mm]
0 & p = q
\end{cases}
\tag{7.90}
\]
where

\[
E_s \triangleq \frac{1}{K}\,\mathrm{Tr}\left(\mathbf{A}_t^H\mathbf{N}_t^{-1}\mathbf{A}_t\right)
\]

is the energy of the received symbols (for K sufficiently large) and

\[
\xi \triangleq \frac{\mathrm{Tr}\left(\mathrm{Dg}^{-1}\!\left[\left(\mathbf{A}_t^H\mathbf{N}_t^{-1}\mathbf{A}_t\right)^{-1}\right]\right)}{\mathrm{Tr}\left(\mathbf{A}_t^H\mathbf{N}_t^{-1}\mathbf{A}_t\right)} \leq 1
\tag{7.91}
\]

is a coefficient determined by the snapshots' correlation. In particular, ξ is unitary if the snapshots are uncorrelated because, in that case, \(\mathbf{A}_t^H\mathbf{N}_t^{-1}\mathbf{A}_t = E_s\mathbf{I}_K\). Therefore, ξK can be understood as the effective observation time. The index ξ is therefore the only information about the temporal waveform that is retained in the asymptotic performance of the optimal second-order estimator.
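The bound ξ ≤ 1 in (7.91), with equality for uncorrelated snapshots, can be verified numerically. The positive definite matrix below is an arbitrary stand-in for \(\mathbf{A}_t^H\mathbf{N}_t^{-1}\mathbf{A}_t\), not a matrix computed in the thesis.

```python
import numpy as np

# xi = Tr( Dg^{-1}[ R^{-1} ] ) / Tr( R ) <= 1, with R = A_t^H N_t^{-1} A_t,
# and xi = 1 when the snapshots are uncorrelated (R = Es I_K).
rng = np.random.default_rng(1)
K = 6

def xi(R):
    # Dg^{-1}[A]: inverse of the diagonal matrix built from the diagonal of A
    return np.sum(1.0 / np.diag(np.linalg.inv(R))) / np.trace(R)

A = rng.standard_normal((K, K))
R_corr = A @ A.T + K * np.eye(K)   # correlated snapshots: xi < 1
R_unc = 2.0 * np.eye(K)            # uncorrelated, Es = 2: xi = 1

print(xi(R_corr) < 1.0, np.isclose(xi(R_unc), 1.0))   # True True
```

The inequality follows because \([R^{-1}]_{kk} \geq 1/[R]_{kk}\) for any positive definite R (Schur-complement argument), so each diagonal term of the numerator is at most the corresponding term of the denominator.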
Finally, regarding (7.90), we can state that the term Γ(K) decays as \(O(M^{-1})\) and, therefore, it is asymptotically negligible compared with \(\mathbf{B}_{UCRB}^{-1}(\theta)\) (7.84).

Putting together all these partial results, it follows that the estimator performance is independent of the number of interfering users because the off-diagonal terms of \(\mathbf{B}_{UCRB}(\theta)\), \(\mathbf{B}_{gml}(\theta)\) and \(\mathbf{B}_{bque}(\theta)\) are negligible. Furthermore, the non-Gaussian information is negligible when the number of antennas goes to infinity because \(\boldsymbol\Gamma(\mathbf{K}) \ll \mathbf{B}_{UCRB}^{-1}(\theta)\) and \(\mathbf{X}_{gml}(\mathbf{K}) \ll \mathbf{B}_{UCRB}(\theta)\). Regarding the asymptotic value of Γ(K), it can be seen that a positive term proportional to \(\sigma_w^{-2}\) appears when the nuisance parameters have a constant amplitude. However, this term is actually proportional to \(M^{-1}\) and, therefore, Γ(K) is absolutely negligible compared with \(\mathbf{B}_{UCRB}^{-1}(\theta)\) (7.84), whatever the actual SNR.
Appendix 7.L Asymptotic study for Ns → ∞
The asymptotic study considering an arbitrary temporal correlation \(\mathbf{A}_t^H\mathbf{N}_t^{-1}\mathbf{A}_t\) and a finite number of sensors is rather involved because of the inverse appearing in X(θ), X_p(θ) and X_{p,q}(θ) (7.35). To circumvent this obstacle, two directions have been adopted. In the first approach, the asymptotic study (Ns → ∞) is carried out considering that the SNR goes to infinity, without any assumption about \(\mathbf{A}_t^H\mathbf{N}_t^{-1}\mathbf{A}_t\). The objective is to prove that the non-Gaussian term Γ(K) (7.46) remains significant even if the observation time is infinite. An important conclusion is that, asymptotically, the estimator performance is independent of the temporal structure of the received signals, at least at high SNR. Bearing this result in mind, in the second part of this appendix, the same asymptotic study is done assuming that the received snapshots are uncorrelated, i.e., \(\mathbf{A}_t^H\mathbf{N}_t^{-1}\mathbf{A}_t = E_s\mathbf{I}_K\). This scenario is actually the one simulated in Section 6.5, considering that the received symbols are detected without ISI at the matched filter output.
Large sample study for high SNR and arbitrary temporal correlation
To begin with, let us consider that the SNR is very high, i.e., \(\sigma_w^2 \to 0\). In that case, the inverse in X(θ), X_p(θ) and X_{p,q}(θ) (7.35) can be evaluated as we did when the number of antennas was infinite (7.80)-(7.82), obtaining

\[
\left(\mathbf{B}(\theta) + \sigma_w^2\mathbf{I}_{KP}\right)^{-1} = \mathbf{B}^{-1}(\theta) - \sigma_w^2\,\mathbf{B}^{-2}(\theta) + \sigma_w^4\,\mathbf{B}^{-3}(\theta) + o\!\left(\sigma_w^4\right).
\]
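The small-\(\sigma_w^2\) expansion above can be checked numerically; B below is an arbitrary well-conditioned positive definite matrix standing in for B(θ).

```python
import numpy as np

# Check of the high-SNR (Neumann-series) expansion:
#   (B + s I)^{-1} = B^{-1} - s B^{-2} + s^2 B^{-3} + o(s^2),  s = sigma_w^2 -> 0
rng = np.random.default_rng(3)
n = 5
A = rng.standard_normal((n, n))
B = A @ A.T + n * np.eye(n)
Binv = np.linalg.inv(B)

def err(s):
    approx = Binv - s * Binv @ Binv + s**2 * Binv @ Binv @ Binv
    return np.linalg.norm(np.linalg.inv(B + s * np.eye(n)) - approx)

# Neglected term is O(s^3): halving s divides the error by ~8.
print(err(1e-2) / err(5e-3))   # ≈ 8
```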
Then, plugging this result into (7.35), we get

\[
\mathbf{X}(\theta) = \sigma_w^2\,\mathbf{I}_{KP} - \sigma_w^4\,\mathbf{B}^{-1}(\theta) + o\!\left(\sigma_w^4\right)
= \sigma_w^2\,\mathbf{I}_{KP} - \sigma_w^4\left(\mathbf{A}_t^H\mathbf{N}_t^{-1}\mathbf{A}_t\right)^{-1}\otimes\mathbf{B}^{-1}(\theta) + o\!\left(\sigma_w^4\right)
\]
\[
\mathbf{X}_p(\theta) = \sigma_w^2\,\mathbf{B}_p(\theta)\,\mathbf{B}^{-1}(\theta) + o\!\left(\sigma_w^2\right)
= \sigma_w^2\,\mathbf{I}_K\otimes\mathbf{B}_p(\theta)\,\mathbf{B}^{-1}(\theta) + o\!\left(\sigma_w^2\right)
\]
\[
\mathbf{X}_{p,q}(\theta) = \mathbf{B}_{p,q}(\theta) - \mathbf{B}_p(\theta)\,\mathbf{B}^{-1}(\theta)\,\mathbf{B}_q^H(\theta) + o(1)
= \mathbf{A}_t^H\mathbf{N}_t^{-1}\mathbf{A}_t\otimes\left[\mathbf{B}_{p,q}(\theta) - \mathbf{B}_p(\theta)\,\mathbf{B}^{-1}(\theta)\,\mathbf{B}_q^H(\theta)\right] + o(1)
\]

where B(θ), B_p(θ) and B_{p,q}(θ) are the spatial correlation matrices for finite M (7.51).
Based on the above high-SNR expressions, the (Gaussian) UCRB,

\[
\left[\mathbf{B}_{UCRB}^{-1}\right]_{p,q} = 2\sigma_w^{-4}\,\mathrm{Re\,Tr}\left(\mathbf{X}_p(\theta)\,\mathbf{X}_q(\theta) + \mathbf{X}(\theta)\,\mathbf{X}_{p,q}(\theta)\right),
\]

as well as the non-Gaussian terms Ψ(K) and Γ(K) introduced in (7.38) and (7.45),

\[
\left[\boldsymbol\Psi(\mathbf{K})\right]_{p,q} = \sigma_w^{-4}\,\mathrm{vec}^H\!\left(\mathbf{Y}_p(\theta)\right)\mathbf{V}_K\boldsymbol\Sigma_K\mathbf{V}_K^H\,\mathrm{vec}\!\left(\mathbf{Y}_q(\theta)\right)
\]
\[
\left[\boldsymbol\Gamma(\mathbf{K})\right]_{p,q} = -\sigma_w^{-4}\,\mathrm{vec}^H\!\left(\mathbf{Y}_p(\theta)\right)\mathbf{V}_K\left[\mathbf{V}_K^H\left(\mathbf{X}^*(\theta)\otimes\mathbf{X}(\theta)\right)\mathbf{V}_K + \sigma_w^4\boldsymbol\Sigma_K^{-1}\right]^{-1}\mathbf{V}_K^H\,\mathrm{vec}\!\left(\mathbf{Y}_q(\theta)\right),
\]
can be evaluated when the number of received symbols goes to infinity (Ns → ∞). In the last equations, \(\mathbf{V}_K\boldsymbol\Sigma_K\mathbf{V}_K^H\) is the “economy-size” diagonalization of the kurtosis matrix K and the high-SNR limit of \(\mathbf{Y}_p(\theta)\) is given by

\[
\mathbf{Y}_p(\theta) = \mathbf{X}(\theta)\,\mathbf{X}_p(\theta) + \mathbf{X}_p^H(\theta)\,\mathbf{X}(\theta)
= \sigma_w^4\,\mathbf{I}_K\otimes\left[\mathbf{B}_p(\theta)\,\mathbf{B}^{-1}(\theta) + \mathbf{B}^{-1}(\theta)\,\mathbf{B}_p^H(\theta)\right] + o\!\left(\sigma_w^4\right).
\]
At this point, the formulation in Appendix 7.K can be reproduced to obtain asymptotic expressions for Ns → ∞, now with B(θ), B_p(θ) and B_{p,q}(θ) the spatial correlation matrices of the studied finite sensor array (7.51).

In Appendix 7.K, the asymptotic form of Ψ(K) and Γ(K) was derived as a function of K and, afterwards, the obtained expressions were simplified in the case of circular nuisance parameters. Next, assuming again circular nuisance parameters, the limit of \(\mathbf{B}_{UCRB}(\theta)\), \(\mathbf{B}_{gml}(\theta)\) and \(\mathbf{B}_{bque}(\theta)\) is calculated as the number of received symbols Ns goes to infinity.13 Thus, starting from the above high-SNR limits of X(θ), X_p(θ), X_{p,q}(θ) and Y_p(θ), we arrive at
\[
\left[\mathbf{B}_{UCRB}^{-1}(\theta)\right]_{p,q} = 2N_s\,\frac{E_s}{\sigma_w^2}\,\mathrm{Re\,Tr}\left(\mathbf{B}_{p,q}(\theta) - \mathbf{B}_p(\theta)\,\mathbf{B}^{-1}(\theta)\,\mathbf{B}_q^H(\theta)\right) + o\!\left(N_s\right)
\]
\[
\left[\boldsymbol\Psi(\mathbf{K})\right]_{p,q} = \begin{cases}
4N_s(\rho-2)\,\mathrm{Tr}\left(\mathbf{B}_p(\theta)\,\mathbf{B}^{-1}(\theta)\,\mathbf{B}_p(\theta)\,\mathbf{B}^{-1}(\theta)\right) + o\!\left(N_s\right) & p = q \\
0 & p \neq q
\end{cases}
\]
\[
\left[\boldsymbol\Gamma(\mathbf{K})\right]_{p,q} = \begin{cases}
4N_s\,\dfrac{2-\rho}{\rho-1}\,\mathrm{Tr}\left(\mathbf{B}_p(\theta)\,\mathbf{B}^{-1}(\theta)\,\mathbf{B}_p(\theta)\,\mathbf{B}^{-1}(\theta)\right) + o\!\left(N_s\right) & \rho \neq 1,\; p = q \\[2mm]
2\xi N_s\,\dfrac{E_s}{\sigma_w^2}\,\mathrm{Tr}\left(\mathbf{B}_p(\theta)\,\mathbf{B}^{-1}(\theta)\,\mathrm{Dg}^{-1}\!\left[\mathbf{B}^{-1}(\theta)\right]\mathbf{B}_p(\theta)\,\mathbf{B}^{-1}(\theta)\right) + o\!\left(N_s\right) & \rho = 1,\; p = q \\[2mm]
0 & p \neq q
\end{cases}
\]
taking into account that, if the number of received symbols Ns goes to infinity, the central rows and columns of \(\mathbf{A}_t^H\mathbf{N}_t^{-1}\mathbf{A}_t\) are delayed versions of the pulse autocorrelation R[k] (Section 7.4.4) and, therefore,

\[
\lim_{N_s\to\infty}\frac{1}{N_s}\,\mathrm{Tr}\left(\mathbf{A}_t^H\mathbf{N}_t^{-1}\mathbf{A}_t\right) = R[0] = E_s.
\]
Besides, the coefficient ξ (7.91) can be manipulated using the spectral analysis in Section 7.4.4, yielding

\[
\lim_{N_s\to\infty}\xi = \lim_{N_s\to\infty}\frac{\mathrm{Tr}\left(\mathrm{Dg}^{-1}\!\left[\left(\mathbf{A}_t^H\mathbf{N}_t^{-1}\mathbf{A}_t\right)^{-1}\right]\right)}{\mathrm{Tr}\left(\mathbf{A}_t^H\mathbf{N}_t^{-1}\mathbf{A}_t\right)} = \frac{1}{\int_0^1 E_s/S(f)\,df}
\]

13 Notice that, if Ns goes to infinity, the number of observed symbols K = Ns + L − 1 is asymptotically equal to Ns.
with S (f) = F {R[k]}.
Following the same reasoning as in Appendix 7.K, the asymptotic expression of \([\boldsymbol\Gamma(\mathbf{K})]_{p,q}\) for ρ = 1 is obtained from (7.46) by expanding the argument of the inverse as follows:

\[
\mathbf{X}^*(\theta)\,\mathbf{X}(\theta) + \sigma_w^4(\rho-2)^{-1}\mathbf{I}_{KP} = \sigma_w^4\,\frac{\rho-1}{\rho-2}\,\mathbf{I}_{KP} - 2\sigma_w^6\,\mathbf{U}(\theta) + o\!\left(\sigma_w^6\right)
\]

where

\[
\mathbf{U}(\theta) = \mathrm{Dg}\!\left[\left(\mathbf{A}_t^H\mathbf{N}_t^{-1}\mathbf{A}_t\right)^{-1}\right]\otimes\mathrm{Dg}\!\left(\mathbf{B}^{-1}(\theta)\right)
\]

is the surviving term when ρ = 1.
Notice that the asymptotic results in this appendix are equivalent to those obtained in the high-SNR study of Section 7.3.3 if we deal with circular nuisance parameters. The first conclusion is that \(\mathbf{X}_{gml}(\theta) = \mathbf{B}_{UCRB}(\theta)\boldsymbol\Psi(\mathbf{K})\mathbf{B}_{UCRB}(\theta)\) (7.35) is negligible at high SNR because it is proportional to \(\sigma_w^4\) whereas \(\mathbf{B}_{UCRB}(\theta)\) is only proportional to \(\sigma_w^2\). The second conclusion is that the second term Γ(K) has the same dependence on Ns and \(\sigma_w^{-2}\) as \(\mathbf{B}_{UCRB}^{-1}(\theta)\) in the case of a constant-modulus alphabet and, therefore, Γ(K) is not negligible even if Ns → ∞.
However, the last conclusion is only verified in the multiuser case, i.e., P > 1. In the single-user case, B(θ), B_p(θ) and B_{p,q}(θ) are the following scalars:

\[
B(\theta) = M \qquad B_p(\theta) = 0 \qquad B_{p,q}(\theta) = \frac{\pi^2\left(M^2-1\right)M}{12}
\]
and, therefore, the non-Gaussian terms Ψ(K) and Γ(K) are zero for any SNR because B_p(θ) = 0. Thus, in the single-user case, the asymptotic performance of second-order bearing estimators is given by

\[
B_{UCRB}(\theta),\;B_{bque}(\theta),\;B_{gml}(\theta) = \frac{6\,\sigma_w^2\left(M+\sigma_w^2\right)}{\pi^2 N_s E_s\left(M^2-1\right)M^2} + o\!\left(N_s^{-1}\right)
= \frac{6\left(M + (E_s/N_0)^{-1}\right)}{\pi^2 N_s\,(E_s/N_0)\left(M^2-1\right)M^2} + o\!\left(N_s^{-1}\right)
\]

where \(\sigma_w^2 = N_0\) is the double-sided spectral density of the AWGN.
The above result is valid for any value of \(\sigma_w^2\) or M. Moreover, this expression converges to the bound in (7.52) when the number of antennas satisfies

\[
M \gg \max\left((E_s/N_0)^{-1},\,1\right),
\]

which is equivalent to M ≫ 1 in the context of digital communications.
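The single-user expression can be evaluated directly; the parameter values in the sketch below are illustrative only. For M ≫ max((Es/N0)^{-1}, 1) the bound decays as 1/M³, consistent with the M³ growth of the Fisher-like term in (7.84).

```python
import numpy as np

# B(theta) = 6 sigma_w^2 (M + sigma_w^2) / (pi^2 Ns Es (M^2 - 1) M^2), sigma_w^2 = N0.
def bound(M, Ns=100, Es=1.0, N0=0.5):
    s2 = N0
    return 6 * s2 * (M + s2) / (np.pi**2 * Ns * Es * (M**2 - 1) * M**2)

# In the large-M regime the bound scales as 1/M^3: doubling M divides it by ~8.
print(bound(100) / bound(200))   # ≈ 8
```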
Large sample study for uncorrelated snapshots and arbitrary SNR
Next, the performance of the GML and BQUE estimators is evaluated considering an arbitrary SNR and uncorrelated snapshots, i.e., \(\mathbf{A}_t^H\mathbf{N}_t^{-1}\mathbf{A}_t = E_s\mathbf{I}_K\). In this scenario, it is straightforward to show that the (Gaussian) UCRB (7.36) is inversely proportional to the number of snapshots K = Ns, even if Ns is finite. Actually, we have
\[
\left[\mathbf{B}_{UCRB}^{-1}(\theta)\right]_{p,q} = 2N_s E_s\,\sigma_w^{-4}\,\mathrm{Re\,Tr}\left(\overline{\mathbf{X}}_p(\theta)\,\overline{\mathbf{X}}_q(\theta) + \overline{\mathbf{X}}(\theta)\,\overline{\mathbf{X}}_{p,q}(\theta)\right)
\]
\[
= 2N_s E_s\,\sigma_w^{-4}\,\mathrm{Re}\left(\left[\overline{\mathbf{X}}_p(\theta)\right]_{p,q}\left[\overline{\mathbf{X}}_q(\theta)\right]_{q,p} + \left[\overline{\mathbf{X}}(\theta)\right]_{p,q}\left[\overline{\mathbf{X}}_{p,q}(\theta)\right]_{q,p}\right)
\]
where \(\overline{\mathbf{X}}(\theta)\), \(\overline{\mathbf{X}}_p(\theta)\) and \(\overline{\mathbf{X}}_{p,q}(\theta)\) denote the spatial components of X(θ), X_p(θ) and X_{p,q}(θ), that is,

\[
\overline{\mathbf{X}}(\theta) \triangleq \mathbf{B}(\theta) - \mathbf{B}(\theta)\left(\mathbf{B}(\theta)+\sigma_w^2\mathbf{I}_P\right)^{-1}\mathbf{B}(\theta)
\]
\[
\overline{\mathbf{X}}_p(\theta) \triangleq \mathbf{B}_p(\theta) - \mathbf{B}_p(\theta)\left(\mathbf{B}(\theta)+\sigma_w^2\mathbf{I}_P\right)^{-1}\mathbf{B}(\theta)
\]
\[
\overline{\mathbf{X}}_{p,q}(\theta) \triangleq \mathbf{B}_{p,q}(\theta) - \mathbf{B}_p(\theta)\left(\mathbf{B}(\theta)+\sigma_w^2\mathbf{I}_P\right)^{-1}\mathbf{B}_q^H(\theta).
\]
Furthermore, the non-Gaussian terms Ψ(K) (7.42) and Γ(K) (7.46) are also proportional to Ns and, consequently, they do not vanish as more snapshots are processed. Simple manipulations yield the following expressions in the case of circular nuisance parameters:

\[
\left[\boldsymbol\Psi(\mathbf{K})\right]_{p,q} = 4N_s E_s\,\sigma_w^{-4}(\rho-2)\,\mathrm{Tr}\left(\overline{\mathbf{X}}(\theta)\,\overline{\mathbf{X}}_p(\theta)\,\overline{\mathbf{X}}(\theta)\,\overline{\mathbf{X}}_q(\theta)\right)
\]
\[
\left[\boldsymbol\Gamma(\mathbf{K})\right]_{p,q} = -4N_s E_s\,\sigma_w^{-4}\,\mathrm{diag}^H\!\left(\overline{\mathbf{X}}(\theta)\,\overline{\mathbf{X}}_p(\theta)\right)\left[\overline{\mathbf{X}}^*(\theta)\,\overline{\mathbf{X}}(\theta) + \sigma_w^4(\rho-2)^{-1}\mathbf{I}_P\right]^{-1}\mathrm{diag}\!\left(\overline{\mathbf{X}}(\theta)\,\overline{\mathbf{X}}_q(\theta)\right).
\]
Finally, notice that B_p(θ) is still null in the single-user case and, therefore, Ψ(K) and Γ(K) are also zero because \(\overline{\mathbf{X}}_p(\theta) = \mathbf{0}\).
Chapter 8
Conclusions and Topics for Future
Research
In this thesis, optimal blind second-order estimators are deduced considering the true distribution of the nuisance parameters. Quadratic estimators are formulated assuming that the nuisance parameters' distribution is known and certain side information on the unknown parameters is available. Adopting the Bayesian formulation, the referred side information is introduced by means of the parameters' prior distribution. This approach allows unifying the formulation of the open-loop (large-error) estimators in Chapter 3 and the closed-loop (small-error) estimators in Chapter 4. In the former case, the prior knowledge is rather vague, whereas a very informative prior is considered in the latter case.
The first important conclusion is that, in most estimation problems, second-order techniques are severely degraded by the bias term unless the small-error condition is satisfied. As an illustrative example, it is shown in Chapter 3 that it could be difficult to obtain unbiased frequency estimates using second-order open-loop schemes. However, the interest in second-order open-loop estimators is motivated by the problematic convergence of closed-loop schemes in noisy scenarios, in which large-error and small-error estimators are shown to yield approximately the same mean square error.
To avoid the bias limitations, the Best Quadratic Unbiased Estimator (BQUE) is deduced in Chapter 4 under the small-error condition. The covariance matrix associated with the BQUE estimator constitutes the tightest lower bound on the variance of any second-order unbiased estimator. Formally, it is claimed in Chapter 4 that

\[
E\left\{\left|\widehat{\alpha} - g(\theta)\right|^2\right\} \geq B_{BQUE}(\theta) = \mathbf{D}_g(\theta)\left[\mathbf{D}_r^H(\theta)\,\mathbf{Q}^{-1}(\theta)\,\mathbf{D}_r(\theta)\right]^{-1}\mathbf{D}_g^H(\theta)
\]

for any quadratic estimator of α = g(θ). In the above expression, \(\mathbf{D}_g(\theta)\) and \(\mathbf{D}_r(\theta)\) are the Jacobians of g(θ) and vec(R(θ)), respectively, where R(θ) stands for the covariance matrix of
the observed vector y. On the other hand, Q(θ) contains all the central fourth-order moments of y. The matrix Q(θ) can be split into two terms, as pointed out in Chapter 3, obtaining

\[
\mathbf{Q}(\theta) \triangleq \mathbf{R}^*(\theta)\otimes\mathbf{R}(\theta) + \mathbf{A}(\theta)\,\mathbf{K}\,\mathbf{A}^H(\theta)
\tag{8.1}
\]

where the second term accounts for all the non-Gaussian information about the nuisance parameters (K ≠ 0) that is profitable for estimating the vector of parameters θ by means of quadratic processing. The evaluation of the potential benefits gained when considering this second term has been one of the most important issues in this thesis.
In many problems, the Gaussian assumption (i.e., K = 0) is adopted to design second-order schemes when the actual distribution is unknown or becomes an obstacle to obtaining the ML estimator analytically. The most relevant contribution of this thesis is proving that the Gaussian assumption leads, in some scenarios, to suboptimal second-order estimation methods. Conversely, the Gaussian assumption is proved to supply the optimal second-order estimator, independently of the actual parameterization, in all these cases:
• The nuisance parameters are normally distributed.
• The SNR is low.
• All the derivatives of the transfer matrix A(θ) are orthogonal to the columns of A(θ) (Section 7.3). Formally,

\[
\mathbf{P}_A(\theta)\,\frac{\partial\mathbf{A}(\theta)}{\partial\theta_p} = \mathbf{0}
\]

where \(\mathbf{P}_A(\theta)\) is the orthogonal projector onto the subspace generated by the columns of A(θ).1
• The SNR is high and the nuisance parameters are drawn from a multilevel alphabet (e.g.,
the QAM constellation).
Otherwise, when dealing with constant-modulus nuisance parameters (e.g., the MPSK or CPM modulations), some improvement can be expected from exploiting the second term of Q(θ) at medium-to-high SNR. The actual improvement is a function of the observation size and depends on the actual parameterization.
All these conclusions have also been evidenced in Chapter 5, where the design of the optimal second-order tracker is presented. In this chapter, the Kalman filter formulation is adopted to optimize both the acquisition and steady-state performance. In that way, in Chapter 5, the

1 A more general condition is presented in (7.47) in case of circular nuisance parameters.
Gaussian assumption is validated in the acquisition phase, concluding that the acquisition time can be significantly shortened for practical SNRs if the nuisance parameters are drawn from a constant-modulus alphabet. Otherwise, if the nuisance parameters have multiple amplitudes, the Gaussian assumption is also optimal in terms of acquisition performance.
Despite the last statements, in some significant estimation problems, the Gaussian assumption applies asymptotically as the observation interval grows to infinity, as proved in Chapter 7. In that case, the importance of the second term of Q(θ) is relegated to those scenarios in which the observation interval is short. Even so, most conclusions from the asymptotic study in Chapter 7 are problem-dependent. In the following paragraphs, the main conclusions from every estimation problem addressed in this thesis are summarized:
• Synchronization. The Gaussian assumption is asymptotically optimal when the number
of observed symbols is infinite (Section 7.4.4). In a continuous mode transmission, the
asymptotic condition allows neglecting the so-called “edge effect” related to the partial
observation of the border symbols (Section 6.1.2).
On the other hand, if the observation time is limited, the fourth-order information about
the received constellation is crucial to filter out the self-noise at high SNR in case of
a partial response CPM modulation (e.g., LREC or GMSK) but it is negligible in case
of linear modulations (e.g., MPSK or QAM). The importance of this result is beyond
the actual interest of CPM modulations. Actually, it shows that the optimal second-order
estimator is able to take advantage of the statistical dependence of the Laurent’s expansion
pseudo-symbols. In this case, as it happens in coded transmissions, the received symbols
are not statistically independent in spite of being uncorrelated. Thus, the results for the
CPM format could be translated to optimize existing second-order synchronizers in case
of coded communication systems.
Finally, note that the Gaussian assumption always applies in TDMA communication systems whatever the observation length.
• Channel Estimation. If the channel amplitude is not estimated, the Gaussian assumption yields minor losses at high SNR on average, that is, if the estimator performance is averaged over multiple channel realizations (Section 6.4). However, the Gaussian assumption is expected to fail for some particular realizations of the channel impulse response. Indeed, this point is currently being investigated [LS04][LS05a][LS05b].
Another important conclusion is that the Gaussian assumption yields a severe degradation at high SNR when the channel amplitude is estimated too. If the transmitted symbols belong to a multilevel constellation (e.g., QAM), second-order estimators exhibit the typical variance floor at high SNR. On the other hand, if the transmitted symbols have constant modulus (e.g., MPSK or CPM), the BQUE estimator is able to avoid the aforementioned variance floor whereas, if the Gaussian assumption is adopted, the estimator performance degrades at high SNR because the channel amplitude estimate is drastically degraded. In fact, the higher the transmitted pulse bandwidth (roll-off factor), the more important the incurred loss at high SNR.
The asymptotic study in Section 7.4.4 states that the above conclusions are still valid if the
observation time goes to infinity. The loss incurred by the Gaussian assumption becomes
a function of the actual channel impulse response.
• Direction-of-Arrival (DOA). The Gaussian assumption is asymptotically optimal when
the number of antennas is infinite (Section 7.4.5). The Gaussian assumption also applies in
the single user case whatever the array size or the working SNR (Section 7.4.5). Likewise,
the Gaussian assumption is optimal at high SNR if the transmitted symbols are drawn
from a multilevel constellation (e.g., QAM or APK).
On the other hand, if the transmitted symbols belong to a constant-modulus alphabet (e.g., MPSK or CPM), the fourth-order statistics of the transmitted symbols can be exploited to discriminate the DOAs of signals impinging on the array from nearby directions. When the Gaussian assumption is adopted and this fourth-order information is omitted, a significant loss appears at high SNR, which is a function of the number of antennas and the users' angular separation. Furthermore, the incurred loss cannot be reduced even if the number of received symbols goes to infinity (Section 7.4.5).
8.1 Further Research
In this thesis, the ultimate limits of second-order estimation are studied from both the practical and the theoretical points of view. However, some interesting points are still open and should be investigated in the future. In the author's opinion, the most promising topics for further study are outlined in the following paragraphs:
1. Multiuser estimation problems. Second-order methods are able to exploit the constant-modulus property of the random nuisance parameters. This property appears reflected in the eigendecomposition of the kurtosis matrix K. In some estimation problems, this information is crucial to deal with the intersymbol interference as well as the multiple access interference in multiuser applications. Actually, the results of this thesis suggest that the constant-modulus property is mainly relevant in multiuser or MIMO scenarios. In these scenarios, the constant-modulus property could be exploited to discriminate the parameters associated with non-orthogonal interfering users.
2. Asymptotic Gaussian assumption. The Gaussian assumption does not apply for practical SNRs if the nuisance parameters have constant modulus and the observation length is rather short, as in low-cost implementations. However, in some important problems, the Gaussian assumption is rapidly satisfied as the number of observations is augmented because of the Central Limit Theorem. This asymptotic study was addressed in Chapter 7 for different estimation problems. In all the studied problems, if the nuisance parameters have constant modulus, the second term in equation (8.1) generates a favourable term that persists at high SNR. In the problem of DOA estimation, this term becomes negligible if the number of antennas goes to infinity. However, this term survives if the number of antennas is finite, even if the number of snapshots goes to infinity. Therefore, the results in Chapter 7 could be useful to identify those estimation problems in which the non-Gaussian information persists as the number of observations goes to infinity.
3. Noncircular and coded nuisance parameters. If we deal with noncircular nuisance parameters (e.g., CPM signals), the kurtosis matrix K provides additional information regarding the statistical dependence of the nuisance parameters. In that way, it is possible to remove the self-noise, including the multiple access interference, even if the number of parameters exceeds the number of observations. This feature should be thoroughly investigated because it could be exploited to improve second-order estimators in case of coded transmissions. Moreover, other noncircular constellations should be studied in detail as, for example, BPSK, digital PAM and staggered formats such as the offset QPSK modulation.
Besides these three principal research lines, some other topics for future research are listed
next:
1. Large error bounds with nuisance parameters. In Section 2.6, the most important lower bounds in the literature were classified and briefly described. Among all the existing bounds, the Cramér-Rao bound (CRB) is without doubt the most widespread one due to its simplicity. However, the true CRB is still unknown in many estimation problems in the presence of nuisance parameters. To fill this gap, the CRB is derived in some particular scenarios: low SNR, high SNR, Gaussian nuisance parameters (Gaussian UCRB), deterministic and continuous nuisance parameters (CCRB), and known nuisance parameters (MCRB). Moreover, in this thesis we have deduced the CRB under the quadratic constraint.
In the context of digital communications, it would be useful to apply the last assumptions
to the large-error bounds in Section 2.6 in order to characterize the large-error region and
the SNR threshold in the presence of nuisance parameters. Among all the lower bounds
in Section 2.6, the Hammersley-Chapman-Robbins, Weiss-Weinstein and Ziv-Zakai lower
bounds are surely the most promising candidates. Then, the obtained large-error bounds
should be compared to the second-order large-error estimators deduced in Chapter 3. From
this comparison, we could determine whether quadratic estimators are optimal or not at
low SNR in the large-error regime. Also, we could evaluate the performance loss due to
the presence of the random nuisance parameters.
2. Acquisition optimization. The QEKF was proposed in Chapter 5 with the aim of improving the acquisition performance of classical closed-loop schemes. Initially, the QEKF supplies the large-error MMSE solution derived in Chapter 3 considering the initial noninformative prior. Thereafter, the QEKF converges progressively to the small-error solution in Chapter 4 every time a new observation is processed. The new datum is used to update the prior distribution so that the prior becomes progressively more informative.
Unfortunately, the QEKF convergence is not guaranteed unless the observations and parameters are jointly Gaussian distributed. Moreover, even if the QEKF has converged, the acquisition time could have been shortened by optimizing the prior update. Therefore, an important topic for research is to find the optimal prior update for optimizing the acquisition probability and delay. In that sense, the Unscented Kalman Filter proposed in [Jul97][Wan00] should be considered since it is known to guarantee convergence under mild conditions.
3. Low-cost implementation. The optimal second-order estimator is formulated in Chapter 3 and Chapter 4 resorting to the vec(·) transformation. Consequently, it is necessary to compute the inverse of the M² × M² square matrix Q(θ) and, thus, the estimator computational cost increases rapidly when the number of observations M is augmented. In some problems, the matrix \(\mathbf{Q}^{-1}(\theta)\) can be computed offline, before processing the first sample (e.g., in digital synchronization). However, in other relevant problems, such as channel and DOA estimation, the inverse needs to be computed online, every time.
The closed-loop architecture introduced in Section 2.5.1, as well as the QEKF formulation in Chapter 5, allows reducing the number of observations M that are jointly processed each time. Additionally, suboptimal quadratic estimators could be investigated by considering rank-reduction techniques [Sic92][Sch91b] and transversal filtering implementations. In the latter case, the (scalar) parameter θ could be estimated from the sample covariance matrix at time n applying a rank-one matrix \(\mathbf{M} = \mathbf{h}\mathbf{h}^H\). Thus, we have

\[
\widehat\theta_n = b + \mathrm{Tr}\left(\mathbf{M}\,\widehat{\mathbf{R}}_n\right) = b + \mathbf{y}_n^H\,\mathbf{M}\,\mathbf{y}_n = b + \left|\mathbf{h}^H\mathbf{y}_n\right|^2
\]

where \(\mathbf{y}_n\) is the observation at time n and h collects the coefficients of the estimator's time-invariant impulse response. In that case, we aim at optimizing the coefficients of h according to the criteria presented in Chapter 3 and Chapter 4. Evidently, we can consider
multiple transversal filters:

\[
\widehat\theta_n = b + \sum_{m=1}^{R}\left|\mathbf{h}_m^H\mathbf{y}_n\right|^2
\]
where R is the rank of

\[
\mathbf{M} = \sum_{m=1}^{R}\mathbf{h}_m\mathbf{h}_m^H.
\]

Actually, the original estimators in Chapter 3 and Chapter 4 were the sum of M transversal filters because the matrix M was originally full rank, i.e., R = M.
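The two forms of the estimator above can be cross-checked numerically. The filters h_m and the bias b below are random placeholders; in the thesis they would be optimized under the criteria of Chapters 3 and 4.

```python
import numpy as np

# Rank-R transversal-filter estimator:
#   theta_n = b + sum_m |h_m^H y_n|^2 = b + Tr(M R_n),  M = sum_m h_m h_m^H
rng = np.random.default_rng(2)
Mdim, R = 8, 3
h = rng.standard_normal((R, Mdim)) + 1j * rng.standard_normal((R, Mdim))
b = 0.1
y = rng.standard_normal(Mdim) + 1j * rng.standard_normal(Mdim)

theta_filters = b + sum(abs(h[m].conj() @ y) ** 2 for m in range(R))

M = sum(np.outer(h[m], h[m].conj()) for m in range(R))   # rank-R matrix M
R_hat = np.outer(y, y.conj())                            # one-sample covariance estimate
theta_trace = b + np.trace(M @ R_hat).real

print(np.isclose(theta_filters, theta_trace))   # True
```

The equality holds because Tr(M y yᴴ) = yᴴ M y = Σ_m |h_mᴴ y|², i.e. the bank of transversal filters is just a low-rank factorization of the quadratic form.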
Appendix A
Notation
In general, uppercase boldface letters (A) denote matrices, lowercase boldface letters (a) denote (column) vectors and italics (a, A) denote scalars. On some occasions, matrices are also represented with calligraphic fonts (A).
AT , A∗ , AH
Transpose, complex conjugate and transpose conjugate of matrix A, respectively.
A−1 , A#
Inverse and Moore-Penrose pseudoinverse of matrix A, respectively.
A1/2
Positive definite Hermitian square root of matrix A, i.e. A1/2 A1/2 = A.
A (B)
Matrix A is a function of the entries in matrix B.
det (A)
Determinant of matrix A.
Tr (A)
Trace of matrix A.
vec (A)
Column vector formed stacking the columns of matrix A on top of one another.
diag (a) , diag (A)Following the Matlab notation, diag (a) is the N × N diagonal matrix whose
entries are the N elements of the vector a, and diag (A) is the column vector
containing the N diagonal elements of matrix A.
Dg(A)	An N × N diagonal matrix whose entries are the N elements in the diagonal of matrix A, i.e., diag([A]_{1,1}, ..., [A]_{N,N}) or, equivalently, diag(diag(A)).

‖a‖	Euclidean norm of a, i.e. ‖a‖ = √(aᴴa).

‖a‖_W	Weighted norm of a, i.e. ‖a‖_W = √(aᴴWa) (with Hermitian positive definite W).
[A]i,j
The entry of matrix A in the i-th row and the j-th column.
[A]i
The i-th column of matrix A.
[v]i
The i-th element of vector v.
A ⊗ B
Kronecker product between A and B. If A is M × N,

        ⎡ [A]_{1,1} B  · · ·  [A]_{1,N} B ⎤
A ⊗ B = ⎢      ⋮         ⋱        ⋮      ⎥ .
        ⎣ [A]_{M,1} B  · · ·  [A]_{M,N} B ⎦
A ⊙ B
Elementwise (Schur-Hadamard) product between A and B (they must have
the same dimensions).
A ≥ B, A > B
The matrix A − B is positive semidefinite and positive definite, respectively.
I_N, I
The N × N identity matrix and the identity matrix of implicit size.
0_{M×N}, 0_M, 0
An M × N all-zeros matrix, an M-long all-zeros vector, and an all-zeros matrix
or vector of implicit size.
1_{M×N}, 1_M, 1
An M × N all-ones matrix, an M-long all-ones vector, and an all-ones matrix
or vector of implicit size.
d_N
The vector defined as d_N = [0, . . . , N − 1]^T.
e_i
Vector that has unity in its i-th position and zeros elsewhere.
R^{M×N}, C^{M×N}
The set of M × N matrices with real- and complex-valued entries, respectively.
j
Imaginary unit (j = √−1).
Re {A} , Im {A}
The matrices containing the real and imaginary parts of the entries of A,
respectively.
arg {a}
Angle of the complex number a, i.e., arg {a} = arctan (Im(a)/Re(a)).
|a| , sign (a)
Absolute value and sign of a real-valued a.
⌈a⌉
Smallest integer greater than or equal to a.
Â
Estimator or estimate of the matrix A.
f_A (A)
Probability density function of the random matrix A.
E {A}
Expectation of a random matrix A.
E_B {A}
Expectation of a random matrix A with respect to the statistics in B.
arg min_B f (B)
Matrix B minimizing the scalar function f (B).
arg max_B f (B)
Matrix B maximizing the scalar function f (B).
∂A/∂B
If B is M × N, ∂A/∂B is a matrix formed as

        ⎡ ∂A/∂[B]_{1,1}  · · ·  ∂A/∂[B]_{1,N} ⎤
∂A/∂B = ⎢       ⋮           ⋱         ⋮      ⎥ .
        ⎣ ∂A/∂[B]_{M,1}  · · ·  ∂A/∂[B]_{M,N} ⎦

In addition, for a given scalar b, ∂A/∂b is the matrix containing the derivatives
of the entries of A with respect to b. If b is complex, we have
∂/∂b = ∂/∂ Re{b} − j ∂/∂ Im{b}; see [Bra83].
δ (i_1, . . . , i_N)
Multidimensional Kronecker delta defined as δ (i_1, . . . , i_N) = 1 if
i_1 = . . . = i_N and 0 otherwise.
δ (x)
Vectorial Dirac's delta defined as δ (x) = 1 if x = 0 and 0 otherwise.
F {·} , F^{−1} {·}
Direct and inverse Fourier transform for both the analog and discrete cases,
defined as F {x(t)} = ∫_{−∞}^{∞} x(t) e^{−j2πf t} dt and
F {x[n]} = Σ_{n=−∞}^{∞} x[n] e^{−j2πf n}, respectively.
∗
Analog or discrete convolution defined as x(t) ∗ y(t) = ∫_{−∞}^{∞} x(τ) y(t − τ) dτ or
x[n] ∗ y[n] = Σ_{k=−∞}^{∞} x[k] y[n − k], respectively.
x ∈ (A, B]
The scalar x belongs to the interval given by x > A and x ≤ B.
sinc (x)
Function defined as sinc (x) = sin (πx) / (πx) for x ≠ 0 and sinc (0) = 1.
sup
Supremum (lowest upper bound). If the set is finite, it coincides with the
maximum (max).
lim sup
Limit superior (limit of the sequence of suprema).
O (·) , o (·)
Landau symbols for order of convergence.
≜
Symbol used to define a new variable.
∝
It stands for “proportional to” or sometimes “equivalent to”.
ln (·)
Natural logarithm.
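The matrix-calculus conventions above lend themselves to a quick numerical sanity check. The NumPy snippet below (ours, for illustration only) verifies the standard identity vec(AXB) = (B^T ⊗ A) vec(X) under the column-stacking definition of vec, together with the Dg (A) = diag (diag (A)) convention; the identity is classical, see [Gra81]:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((2, 3))
X = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 2))

# vec(.) stacks the columns of its argument, as defined above
vec = lambda M: M.reshape(-1, order="F")

# Standard identity linking vec and the Kronecker product: vec(AXB) = (B^T kron A) vec(X)
lhs = vec(A @ X @ B)
rhs = np.kron(B.T, A) @ vec(X)
assert np.allclose(lhs, rhs)

# Dg(A) = diag(diag(A)): matrix -> vector of diagonal entries -> diagonal matrix
S = np.outer(np.arange(1.0, 4.0), np.arange(1.0, 4.0))
Dg = np.diag(np.diag(S))
assert np.allclose(np.diag(Dg), np.array([1.0, 4.0, 9.0]))
```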
Appendix B
Acronyms
AoA
Angle of Arrival.
APK
Amplitude Phase Keying.
AWG
Additive White Gaussian.
AWGN
Additive White Gaussian Noise.
BLUE
Best Linear Unbiased Estimator.
BPSK
Binary Phase Shift Keying.
BQUE
Best Quadratic Unbiased Estimator.
CDMA
Code Division Multiple Access.
CML
Conditional Maximum Likelihood.
CPM
Continuous Phase Modulation.
CRB
Cramér-Rao Bound.
DA
Data Aided.
DD
Decision Directed.
DOA
Direction of Arrival.
EKF
Extended Kalman Filter.
FIM
Fisher Information Matrix.
FIR
Finite Impulse Response.
GML
Gaussian Maximum Likelihood.
GMSK
Gaussian Minimum Shift Keying.
GSM
Global System for Mobile Communications.
IIR
Infinite Impulse Response.
ISI
Intersymbol Interference.
LOS
Line-of-Sight.
MAI
Multiple Access Interference.
MCRB
Modified Cramér-Rao Bound.
MIMO
Multiple Input Multiple Output.
ML
Maximum Likelihood.
MMSE
Minimum Mean Squared Error.
MPSK
M-ary Phase Shift Keying.
MSE
Mean Squared Error.
MSK
Minimum Shift Keying.
MUI
Multiuser Interference.
MVU
Minimum Variance Unbiased.
NDA
Non Data Aided.
NLOS
Non Line-of-Sight.
OFDM
Orthogonal Frequency Division Multiplexing.
PAM
Pulse Amplitude Modulation.
p.d.f.
Probability Density Function.
QAM
Quadrature Amplitude Modulation.
QEKF
Quadratic Extended Kalman Filter.
QPSK
Quaternary Phase Shift Keying.
RX
Receiver.
SCPC
Single Channel per Carrier.
SNR
Signal to Noise Ratio.
TOA
Time of Arrival.
TX
Transmitter.
UKF
Unscented Kalman Filter.
ULA
Uniform Linear Array.
UMTS
Universal Mobile Telecommunications System.
WLS
Weighted Least Squares.
ZF
Zero Forcing.
Bibliography
[Abe93]
J.S. Abel, “A bound on mean-square-estimate error”, IEEE Trans. on Information
Theory, Vol. 39, no 5, pags. 1675—1680, Sept. 1993.
[Alb73]
J.P.A. Albuquerque, “The Barankin bound: A geometric interpretation”, IEEE Trans.
on Information Theory, Vol. 19, no 4, pags. 559—561, Jul. 1973.
[Alb89]
T. Alberty, V. Hespelt, “A new pattern jitter free frequency error detector”, IEEE
Trans. on Communications, Vol. 37, pags. 159—163, Feb. 1989.
[Ama98] S.I. Amari, “Natural gradient works efficiently in learning”, Neural Computation,
Vol. 10, pags. 251—276, 1998.
[And79]
B.D.O. Anderson, J.B. Moore, Optimal Filtering, Prentice-Hall, New Jersey, 1979.
[And90]
A.N. D’Andrea, U. Mengali, R. Reggiannini, “A digital approach to clock recovery in
generalized minimum shift keying”, IEEE Trans. on Vehicular Technology,
Vol. 39, no 3, pags. 227—234, Aug. 1990.
[And93]
A.N. D’Andrea, U. Mengali, “Design of quadricorrelators for automatic frequency
control systems”, IEEE Trans. on Communications, Vol. 41, pags. 988—997, Jun.
1993.
[And94]
A.N. D’Andrea, U. Mengali, R. Reggiannini, “The modified Cramér-Rao bound and
its application to synchronization problems”, IEEE Transactions on Communications,
Vol. 42, no 2, pags. 1391—1399, Feb.-Apr. 1994.
[And96]
A.N. D’Andrea, M. Luise, “Optimization of symbol timing recovery for QAM data
demodulators”, IEEE Trans. on Communications, Vol. 44, pags. 399—406, Mar 1996.
[Bah74]
L. Bahl, J. Cocke, F. Jelinek, J. Raviv, “Optimal decoding of linear codes for minimizing the symbol error rate”, IEEE Trans. on Information Theory, pags. 284—287,
Mar. 1974.
[Bar49]
E.W. Barankin, “On some analogues of the amount of information and their use
in statistical estimation”, Annals of Mathematical Statistics, Vol. 20, pags. 477—501,
1949.
[Bat46]
A. Bhattacharyya, “On some analogues of the amount of information and their use in
statistical estimation”, Sankhya, Vol. 8, pags. 1—14, 1946.
[Bel74]
S. Bellini, G. Tartara, “Bounds on error in signal parameter estimation”, IEEE Trans.
on Communications, Vol. 22, pags. 340—342, Mar. 1974.
[Bel97]
K. L. Bell, Y. Steinberg, Y. Ephraim, H. L. Van Trees, “Extended Ziv-Zakai lower
bound for vector parameter estimation”, IEEE Trans. on Information Theory, Vol. 43,
no 2, pags. 624—637, Mar. 1997.
[Ben84]
A. Benveniste, M. Goursat, “Blind equalizers”, IEEE Trans. on Communications,
Vol. 32, no 8, pags. 871—883, Aug. 1984.
[Ber93]
C. Berrou, A. Glavieux, P. Thitimajshima, “Near Shannon limit error-correcting coding and decoding: Turbo-codes”, Proc. of the IEEE Int. Conf. on Communications
(ICC), Geneva (Switzerland), 1993.
[Bie80]
G. Bienvenu, L. Kopp, “Adaptivity to background noise spatial coherence for high
resolution passive methods”, Proc. of IEEE Int. Conf. on Acoustics, Speech and
Signal Processing, pags. 307—310, Apr. 1980.
[Bob76]
B.Z. Bobrovsky, M. Zakai, “A lower bound on the estimation error for certain diffusion
processes”, IEEE Trans. on Information Theory, Vol. 22, no 1, pags. 45—52, Jan. 1976.
[Bou02a] N. Bourdeau, S. Di Girolamo, J. Riba, F. Barcelo, M. Burri, M. Gibeaux, F. Sansone,
“System architecture definition report”, Tech. Rep. Deliverable D6, IST-2000-26040
EMILY, Nov. 2002.
[Bou02b] N. Bourdeau, S. Di Girolamo, J. Riba, M. Gibeaux, M. Burri, F. Sansone, “System
performance definition report”, Tech. Rep. Deliverable D7, IST-2000-26040 EMILY,
April 2002.
[Boy04]
S. Boyd, L. Vandenberghe, Convex Optimization, Cambridge University Press, Cambridge, 2004.
[Bra83]
D.H. Brandwood, “A complex gradient operator and its application in adaptive
array theory”, Proceedings of the IEE, Vol. 130, Parts F and H, pags. 11—16, Feb. 1983.
[Bra01]
M.S. Braasch, “Performance comparison of multipath mitigating receiver architectures”, Proc. of the IEEE Aerospace Conference, pags. 3.1309—3.1315, Mar. 2001.
[Cah77]
C.R. Cahn, “Improving frequency acquisition of a Costas loop”, IEEE Trans. on
Communications, Vol. 25, pags. 1453—1459, Feb. 1977.
[Car94]
J.F. Cardoso, E. Moulines, “A robustness property of DOA estimators based on covariance”, IEEE Trans. on Signal Processing, Vol. 42, no 11, pags. 3285—3287, Nov.
1994.
[Car97]
E. de Carvalho, D.T.M. Slock, “Cramer-Rao bounds for semi-blind, blind and training
sequence based channel estimation”, Proc. of the IEEE Signal Processing Workshop
on Signal Processing Advances in Wireless Communications (SPAWC), pags. 129—132,
Paris (France), Apr. 1997.
[Car00]
E. Carvalho, J. Cioffi, D. Slock, “Cramer-Rao bounds for blind multichannel estimation”, Proc. of the IEEE Int. Conf. on Global Communications (GLOBECOM), Nov
2000.
[Cha51]
D.G. Chapman, H. Robbins, “Minimum variance estimation without regularity assumptions”, Annals of Mathematical Statistics, Vol. 22, pags. 581—586, 1951.
[Cha75]
D. Chazan, M. Zakai, J. Ziv, “Improved lower bounds on signal parameter estimation”,
IEEE Trans. on Information Theory, Vol. 21, no 1, pags. 90—93, Mar. 1975.
[Chi94]
P.C. Ching, H.C. So, “Two adaptive algorithms for multipath time delay estimation”,
IEEE Journal of Oceanic Engineering, Jul. 1994.
[Chu91]
J.C.-I. Chuang, N.R. Sollenberger, “Burst coherent demodulation with combined symbol timing, frequency offset estimation, and diversity selection”, IEEE Trans. on Communications, Vol. 39, pags. 1157—1164, Jul. 1991.
[Dem77] A.P. Dempster, N.M. Laird, D.B. Rubin, “Maximum likelihood from incomplete data
via the EM algorithm”, Journal of the Royal Statistical Society (Series B), Vol. 39, pags. 1—38,
Dec. 1977.
[Fed88]
M. Feder, E. Weinstein, “Parameter estimation of superimposed signals using the EM
algorithm”, IEEE Trans. on Signal Processing, Vol. 36, pags. 477—489, April 1988.
[Fen59]
A.V. Fend, “On the attainment of Cramér-Rao and Bhattacharyya bounds for the
variances of an estimate”, Annals of Mathematical Statistics, Vol. 30, pags. 381—388,
1959.
[Fis98]
S. Fischer, H. Grubeck, A. Kangas, H. Koorapaty, E. Larsson, P. Lundqvist, “Time of
arrival estimation of narrowband TDMA signals for mobile positioning”, Proc. of the
IEEE Int. Symp. on Personal, Indoor and Mobile Radio Communications (PIMRC),
pags. 451—455, Sep. 1998.
[For72]
G.D. Forney, Jr., “Maximum-likelihood sequence estimation of digital sequences in the
presence of intersymbol interference”, IEEE Trans. on Information Theory, Vol. 18,
no 5, pags. 363—378, May 1972.
[For02]
P. Forster, P. Larzabal, “On lower bounds for deterministic parameter estimation”,
Proc. of the IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP),
Vol. II, pags. 1137—1140, Orlando (Florida, USA), May 2002.
[Gar86a] F.M. Gardner, “A BPSK/QPSK timing-error detector for sampled receivers”, IEEE
Trans. on Communications, Vol. 34, no 5, pags. 423—429, May 1986.
[Gar86b] W.A. Gardner, “The role of spectral correlation in design and performance analysis
of synchronizers”, IEEE Trans. on Communications, Vol. 34, pags. 1089—1095, Nov.
1986.
[Gar88a] F.M. Gardner, “Demodulator reference recovery techniques suited for digital implementation”, Tech. Rep. Final Report, ESTEC Contract No. 6847/86/NL/DG, European Space Agency, Aug. 1988.
[Gar88b] W.A. Gardner, “Simplification of MUSIC and ESPRIT by exploitation of cyclostationarity”, Proceedings of the IEEE , Vol. 76, no 7, pags. 845—847, Jul. 1988.
[Gar90]
F.M. Gardner, “Frequency detectors for digital demodulators via maximum-likelihood
derivation”, Tech. Rep. Final Report, Part II, ESTEC Contract No. 8022/88/NL/DG,
European Space Agency, Jun. 1990.
[Gar94]
W.A. Gardner, Cyclostationarity in Communications and Signal Processing, IEEE
Press, 1994.
[Gel00]
G. Gelli, L. Paura, A.R.P. Ragozini, “Blind widely linear multiuser detection”, IEEE
Communications Letters, Vol. 4, pags. 187—189, Jun. 2000.
[Ger01]
W.H. Gerstacker, R. Schober, A. Lampe, “Equalization with widely linear filtering”,
Proc. of Int. Symposium on Information Theory, pag. 265, Washington, USA, Jun.
2001.
[Gia89]
G. Giannakis, J. Mendel, “Identification of non-minimum phase systems via higher-order statistics”, IEEE Trans. on Acoustics, Speech and Signal Processing, Vol. 38,
pags. 360—377, Mar. 1989.
[Gia97]
G. Giannakis, S.D. Halford, “Asymptotically optimal blind fractionally spaced channel
estimation and performance analysis”, IEEE Trans. on Signal Processing, Vol. 45,
no 7, pags. 1815—1830, Jul. 1997.
[God80] D.N. Godard, “Self-recovering equalization and carrier tracking in two-dimensional
data communication systems”, IEEE Trans. on Communications, Vol. 28, no 11,
pags. 1867—1875, Nov. 1980.
[Gor90]
J.D. Gorman, A.O. Hero, “Lower bounds for parametric estimation with constraints”,
IEEE Trans. on Information Theory, Vol. 36, no 6, pags. 1285—1301, Nov. 1990.
[Gor91]
J.D. Gorman, A.O. Hero, “On the application of Cramér-Rao type lower bounds
for constrained estimation”, Proc. of the IEEE Int. Conf. on Acoustics, Speech and Signal
Processing (ICASSP), Vol. 2, pags. 1333—1336, Toronto (Canada), April 1991.
[Gor97]
A. Gorokhov, P. Loubaton, “Semi-blind second order identification of convolutive
channels”, Proc. of IEEE Int. Conf. on Acoustics, Speech and Signal Processing,
pags. 3905—3908, Munich (Germany), Apr. 1997.
[Gra81]
A. Graham, Kronecker Products and Matrix Calculus: with Applications, John Wiley
and Sons, New Jersey, 1981.
[Gre92]
D. Greenwood, L. Hanzo, Mobile Radio Communications, chap. Characterization of
Mobile Radio Channels (chapter 2), pags. 93—185, John Wiley and Sons, 1992, Editor:
R. Steele.
[Ham50] J.M. Hammersley, “On estimating restricted parameters”, Journal of the Royal Statistical Society (Series B), Vol. 12, pags. 192—240, 1950.
[Hay91]
S. Haykin, Adaptive Filter Theory, Prentice-Hall International, 1991.
[Jul97]
S.J. Julier, J.K. Uhlmann, “A new extension of the Kalman filter to nonlinear systems”, Proc. of the Aerosense: The 11th Int. Symposium on Aerospace/Defence Sensing, Simulation and Controls, Orlando, Florida, 1997.
[Kai00]
T. Kailath, A.H. Sayed, B. Hassibi, Linear Estimation, Prentice Hall, 2000.
[Kay93a] S.M. Kay, Fundamentals of Statistical Signal Processing: Detection Theory, Prentice
Hall, New Jersey, 1993.
[Kay93b] S.M. Kay, Fundamentals of Statistical Signal Processing: Estimation Theory, Prentice
Hall, New Jersey, 1993.
[Kie52]
J. Kiefer, “On minimum variance estimators”, Annals of Mathematical Statistics,
Vol. 23, pags. 629—632, 1952.
[Kno99]
L. Knockaert, “On the effect of nuisance parameters on the threshold SNR value of
the Barankin bound”, IEEE Trans. on Signal Processing, Vol. 47, no 2, pags. 523—527,
Feb. 1999.
[Kot59]
V.A. Kotelnikov, The Theory of Optimum Noise Immunity, McGraw-Hill, New York,
1959.
[Kri96]
H. Krim, M. Viberg, “Two decades of array signal processing research. The parametric
approach”, IEEE Signal Processing Magazine, pags. 67—95, Jul. 1996.
[Lau86]
P.A. Laurent, “Exact and approximate construction of digital phase modulations by
superposition of amplitude modulated pulses”, IEEE Trans. on Communications,
Vol. 34, pags. 150—160, Feb. 1986.
[Li99]
H. Li, P. Stoica, J. Li, “Computationally efficient maximum likelihood estimation of
structured covariance matrices”, IEEE Trans. on Signal Processing, Vol. 47, no 5,
pags. 1314—1323, May 1999.
[Liu93]
H. Liu, G. Xu, L. Tong, “A deterministic approach to blind identification of multichannel FIR systems”, Proc. of the Int. 27th Asilomar Conf. Signals, Systems and
Computers, Oct. 1993.
[LS04]
J.A. Lopez-Salcedo, G. Vazquez, “Frequency domain iterative pulse shape estimation based on second-order statistics”, Proc. of 5th IEEE Int. Workshop on Signal
Processing Advances in Wireless Communications (SPAWC), Lisbon, Portugal, Jul.
2004.
[LS05a]
J.A. Lopez-Salcedo, G. Vazquez, “Asymptotic equivalence between the unconditional
maximum likelihood and the square-law nonlinearity symbol timing estimation”, IEEE
Trans. on Signal Processing, 2005, to appear.
[LS05b]
J.A. Lopez-Salcedo, G. Vazquez, “Low-SNR subspace-compressed approach to waveform estimation in digital communications”, IEEE Trans. on Signal Processing, May
2005, submitted.
[Lue84]
D. Luenberger, Linear and Nonlinear Programming, Addison Wesley, Massachusetts,
1984.
[Mag98] J.R. Magnus, H. Neudecker, Matrix Differential Calculus with Applications in Statistics and Econometrics, John Wiley and Sons, England, 1998.
[Mar97]
T.L. Marzetta, “Computing the Barankin bound, by solving an unconstrained
quadratic optimization problem”, Proc. of the IEEE Int. Conf. on Acoustics, Speech
and Signal Processing (ICASSP), pags. 3829—3832, Munich (Germany), April 1997.
[McW93] L.T. McWhorter, L.L. Scharf, “Properties of quadratic covariance bounds”, Proc. of
the 27th Asilomar Conference on Signals, Systems and Computers, Oct. 1993.
[Men95] U. Mengali, M. Morelli, “Decomposition of M-ary CPM signals into PAM waveforms”,
IEEE Trans. on Information Theory, Vol. 41, pags. 1265—1275, Sept. 1995.
[Men97] U. Mengali, A. D’Andrea, Synchronization Techniques for Digital Receivers, Plenum
Press, 1997.
[Mer00]
R. van der Merwe, N. de Freitas, A. Doucet, E. Wan, “The unscented particle filter”,
Technical Report CUED/F-INFENG/TR 380, Cambridge University Engineering Department, Aug. 2000.
[Mer01]
R. van der Merwe, N. de Freitas, A. Doucet, E. Wan, “The unscented particle filter”,
Advances in Neural Information Processing Systems, Vol. 13, Nov. 2001.
[Mer03]
R. van der Merwe, E.A. Wan, “Sigma-point Kalman filters for probabilistic inference
in dynamic state-space models”, Proc. of the Workshop on Advances in Machine
Learning, Montreal, Canada, Jun. 2003.
[Mes79]
D.G. Messerschmitt, “Frequency detectors for PLL acquisition in timing and carrier
recovery”, IEEE Trans. on Communications, Vol. 27, pags. 1288—1295, Sep. 1979.
[Mes02]
X. Mestre, Space Processing and Channel Estimation: Performance Analysis and
Asymptotic Results, PhD Thesis, Dept. of Signal Theory and Communications, Technical University of Catalonia (Spain), Nov. 2002.
[Mey90] H. Meyr, G. Ascheid, Synchronization in Digital Communications. Phase-, FrequencyLocked Loops, and Amplitude Control, John Wiley and Sons, 1990.
[Mie00]
B. Mielczarek, Synchronization in Turbo Coded Systems, Licentiate thesis, Dept.
of Signals and Systems, Chalmers University of Technology (Sweden), April 2000,
Chalmers Technical Report 342L.
[Mit97]
T.M. Mitchell, Machine Learning, McGraw-Hill, 1997.
[Moe92] M. Moeneclaey, “Overview of digital algorithms for carrier frequency synchronization”, Proc. of the Int. European Space Agency Conf. on Digital Signal Processing for
Space Communications, pags. 1.1—1.7, Sep. 1992.
[Moe94] M. Moeneclaey, G. de Jonghe, “ML-oriented NDA carrier synchronization for general rotationally symmetric signal constellations”, IEEE Trans. on Communications,
Vol. 42, pags. 2531—2533, Aug. 1994.
[Moe98] M. Moeneclaey, “On the true and modified Cramer-Rao bounds for the estimation of
a scalar parameter in the presence of nuisance parameters”, IEEE Trans. on Communications, Vol. 46, pags. 1536—1544, Nov. 1998.
[Mog03] P.P. Moghaddam, H. So, R.L. Kirlin, “A new time-delay estimation in multipath”,
IEEE Trans. on Signal Processing, May 2003.
[Mor00]
M. Morelli, G.M. Vitetta, “Joint phase and timing recovery for MSK-type signals”,
IEEE Trans. on Communications, Vol. 48, pags. 1997—1999, Dec. 2000.
[Mou95] E. Moulines, P. Duhamel, J. Cardoso, S. Mayrargue, “Subspace methods for blind
identification of multichannel FIR filters”, IEEE Trans. on Signal Processing, Vol. 43,
pags. 516—526, Feb. 1995.
[Nik92]
C. Nikias, “Blind deconvolution using higher-order statistics”, Proc. of the 2nd
Int. Conf. on Higher-Order Statistics (Elsevier), pags. 49—56, 1992.
[Noe03]
N. Noels, C. Herzet, A. Dejonghe, V. Lottici, H. Steendam, M. Moeneclaey, M. Luise,
L. Vandendorpe, “Turbo synchronization: an EM algorithm interpretation”, Proc. of
the IEEE Int. Conf. on Communications (ICC), 2003.
[Oer88]
M. Oerder, H. Meyr, “Digital filter and square timing recovery”, IEEE Trans. on
Communications, Vol. 36, pags. 605—612, May 1988.
[Ott92]
B. Ottersten, M. Viberg, T. Kailath, “Analysis of subspace fitting and ML techniques
for parameter estimation from sensor array data”, IEEE Trans. on Signal Processing,
Vol. 40, pags. 590—600, Mar. 1992.
[Ott93]
B. Ottersten, M. Viberg, P. Stoica, Radar Array Processing, chap. Exact and Large
Sample Maximum Likelihood Techniques for Parameter Estimation and Detection,
Springer-Verlag, 1993.
[Pic87]
G. Picci, G. Prati, “Blind equalization and carrier recovery using a stop-and-go decision directed algorithm”, IEEE Trans. on Communications, Vol. 35, no 9, pags. 877—
887, Sept. 1987.
[Pic94]
B. Picinbono, “On circularity”, IEEE Trans. on Signal Processing, Vol. 42, pags. 3473—
3482, Dec. 1994.
[Pic95]
B. Picinbono, P. Chevalier, “Widely linear estimation with complex data”, IEEE
Trans. on Signal Processing, Vol. 43, pags. 2030—2033, Aug. 1995.
[Pic96]
B. Picinbono, “Second-order complex random vectors and normal distributions”,
IEEE Trans. on Signal Processing, Vol. 44, pags. 2637—2640, Jul. 1996.
[Pis73]
V.F. Pisarenko, “The retrieval of harmonics from a covariance function”, Geophysical
Journal of the Royal Astronomical Society, Vol. 33, pags. 347—366, 1973.
[Pol95]
A. Polydoros, R. Raheli, C.K. Tzou, “Per-Survivor-Processing: A general approach
to MLSE in uncertain environments”, IEEE Trans. on Communications, Vol. 43,
pags. 354—364, Feb./March/April 1995.
[Pro95]
J.G. Proakis, Digital Communications, McGraw-Hill, 1995.
[Rib94]
J. Riba, G. Vazquez, “Bayesian recursive estimation of frequency and timing exploiting
the cyclostationary property”, EURASIP Signal Processing, Vol. 40, pags. 21—37, Oct.
1994.
[Rib96]
J. Riba, J. Goldberg, G. Vazquez, “Signal selective DOA tracking for multiple moving
targets”, Proc. of IEEE Int. Conf. on Acoustics, Speech and Signal Processing, pags.
2559—2562, May 1996.
[Rib97]
J. Riba, Procesado de Señal Bayesiano en Estimación Conjunta de Frecuencia y
Tiempo de Llegada, PhD Thesis, Dept. of Signal Theory and Communications, Technical University of Catalonia (Spain), Feb. 1997.
[Rib01a] J. Riba, “Parameter estimation of binary CPM signals”, Proc. of the IEEE Int. Conf.
on Acoustics, Speech and Signal Processing (ICASSP), Salt Lake City (Utah, USA),
2001.
[Rib01b] J. Riba, J. Sala, G. Vazquez, “Conditional maximum likelihood timing recovery: Estimators and bounds”, IEEE Trans. on Signal Processing, Vol. 49, pags. 835—850, April
2001.
[Rib02]
J. Riba, A. Urruela, “A robust multipath mitigation technique for time-of-arrival
estimation”, Proc. of the IEEE Vehicular Technology Conference (VTC), pags. 2263—
2267, Sep. 2002.
[Rif74]
D.C. Rife, R.R. Boorstyn, “Single-tone parameter estimation from discrete-time observations”, IEEE Trans. on Information Theory, Vol. 20, pags. 378—392, Sep. 1974.
[Rif75]
D.C. Rife, M. Goldstein, R.R. Boorstyn, “A unification of Cramér-Rao type bounds”,
IEEE Trans. on Information Theory, Vol. 21, no 3, pags. 330—332, May 1975.
[Roy89]
R. Roy, T. Kailath, “ESPRIT - estimation of signal parameters via rotational invariance techniques”, IEEE Trans. on Acoustics, Speech and Signal Processing, Vol. 37,
no 7, pags. 984—995, Jul. 1989.
[Sal97]
J. Sala, G. Vazquez, “Statistical reference criteria for adaptive signal processing in
digital communications”, IEEE Trans. on Signal Processing, Vol. 45, no 1, pags. 14—31, Jan. 1997.
[Sar88]
H. Sari, S. Moridi, “New phase and frequency detectors for carrier recovery in PSK
and QAM systems”, IEEE Trans. on Communications, Vol. 36, pags. 1035—1043, Sep.
1988.
[Sat75]
Y. Sato, “A method of self-recovering equalization for multilevel amplitude modulation”, IEEE Trans. on Communications, Vol. 23, no 6, pags. 679—682, Jun.
1975.
[Sch79]
R.O. Schmidt, “Multiple emitter location and signal parameter estimation”, Proc. of
RADC Spectral Estimation Workshop, pags. 243—258, 1979.
[Sch89]
S.V. Schell, R.A. Calabretta, W.A. Gardner, B.G. Agee, “Cyclic MUSIC algorithms
for signal-selective direction finding”, Proc. of the IEEE Int. Conf. on Acoustics,
Speech and Signal Processing (ICASSP), pags. 2278—2281, Glasgow (Scotland), May
1989.
[Sch91a] L.L. Scharf, Statistical Signal Processing: Detection, Estimation, and Time Series Analysis,
Addison Wesley, 1991.
[Sch91b] L.L. Scharf, “The SVD and reduced-rank signal processing”, R.J. Vaccaro (ed.), SVD
and Signal Processing, II: Algorithms, Analysis and Applications, pags. 3—31, Elsevier
Science Publishers B.V. (North-Holland), 1991.
[Sch94]
S.V. Schell, D.L. Smith, S. Roy, “Blind channel identification using subchannel response matching”, Proc. of the Int. 26th Conf. Information Sciences and Systems,
Princeton (NJ), Mar. 1994.
[Sch03]
P.J. Schreier, L.L. Scharf, “Second-order analysis of improper complex random vectors
and processes”, IEEE Trans. on Signal Processing, Vol. 51, no 3, pags. 714—725, Mar.
2003.
[Sec00]
G. Seco, Antenna Arrays for Multipath and Interference Mitigation in GNSS Receivers, PhD Thesis, Dept. of Signal Theory and Communications, Technical University of Catalunya (Spain), Jul. 2000.
[Ser01]
E. Serpedin, P. Ciblat, G.B. Giannakis, P. Loubaton, “Performance analysis of blind
carrier phase estimators for general QAM constellations”, IEEE Trans. on Signal
Processing, Vol. 49, pags. 1816—1823, Aug. 2001.
[Sha90]
O. Shalvi, E. Weinstein, “New criteria for blind deconvolution of nonminimum phase
systems (channels)”, IEEE Trans. on Information Theory, Vol. 36, no 2, pags. 312—
320, Mar. 1990.
[Sic92]
G.L. Sicuranza, “Quadratic filters for signal processing”, Proceedings of the IEEE,
Vol. 80, 1992.
[Söd89]
T. Söderström, P. Stoica, System Identification, Prentice Hall, London, 1989.
[Ste01]
H. Steendam, M. Moeneclaey, “Low-SNR limit of the Cramer-Rao bound for estimating the time delay of a PSK, QAM, or PAM waveform”, IEEE Communications
Letters, Vol. 5, no 1, pags. 31—33, Jan. 2001.
[Sto89]
P. Stoica, A. Nehorai, “MUSIC, Maximum Likelihood, and Cramér-Rao Bound”,
IEEE Trans. on Acoustics, Speech and Signal Processing, Vol. 37, no 5, pags. 720—
741, May 1989.
[Sto90a] P. Stoica, A. Nehorai, “Performance study of conditional and unconditional direction-of-arrival estimation”, IEEE Trans. on Acoustics, Speech and Signal Processing, Vol. 38,
no 10, pags. 1783—1795, Oct. 1990.
[Sto90b] P. Stoica, K. Sharman, “Maximum likelihood methods for direction-of-arrival estimation”, IEEE Trans. on Acoustics, Speech and Signal Processing, Vol. 38, no 7,
pags. 1132—1143, Jul. 1990.
[Sto97]
P. Stoica, R.L. Moses, Introduction to Spectral Analysis, Prentice Hall, 1997.
[Sto01]
P. Stoica, T. Marzetta, “Parameter estimation problems with singular information
matrices”, IEEE Trans. on Signal Processing, Vol. 49, pags. 87—90, Jan. 2001.
[Ton91]
L. Tong, G. Xu, T. Kailath, “A new approach to blind identification and equalization
of multipath channels”, Proc. of the 25th Asilomar Conference, pags. 856—860, Pacific
Grove (CA), Nov. 1991.
[Ton94]
L. Tong, G. Xu, T. Kailath, “Blind channel identification and equalization based
on second-order statistics: A time domain approach”, IEEE Trans. on Information
Theory, Vol. 40, no 2, pags. 340—349, Mar. 1994.
[Ton95]
L. Tong, G. Xu, B. Hassibi, T. Kailath, “Blind channel identification based on second-order statistics: A frequency-domain approach”, IEEE Trans. on Information Theory,
Vol. 41, no 1, pags. 329—334, Jan. 1995.
[Tre68]
H.L. Van Trees, Detection, Estimation and Modulation Theory. Part I , Wiley, New
York, 1968.
[Tre83]
J.R. Treichler, B.G. Agee, “A new approach to multipath correction of constant modulus signals”, IEEE Trans. on Acoustics, Speech and Signal Processing, Vol. 31, no 2,
pags. 459—472, Apr. 1983.
[Tug95]
J.K. Tugnait, “On blind identifiability of multipath channels using fractional sampling
and second-order cyclostationary statistics”, IEEE Trans. on Information Theory,
Vol. 41, no 1, pags. 308—311, Jan. 1995.
[Tul00]
A.M. Tulino, S. Verdu, “Improved linear receivers for BPSK-CDMA subject to fading”, Proc. of Allerton Conf. on Communications, Control and Computation, pags.
11—21, Monticello, IL, Oct. 2000.
[Ung76]
G. Ungerboeck, “Fractional tap-spacing equalizer and consequences for clock recovery
in data modems”, IEEE Trans. on Communications, Vol. 24, no 8, pags. 856—864,
Aug. 1976.
[Vaz00]
G. Vazquez, J. Riba, Signal Processing Advances in Wireless Communications. Trends
in Single and Multi-user Systems, Vol. II, chap. Non-Data-Aided Digital Synchronization, pags. 357—402, Prentice-Hall, 2000, Editors: G.B. Giannakis, Y. Hua, P. Stoica
and L. Tong.
[Vaz01]
G. Vazquez, J. Villares, “Optimal quadratic NDA synchronization”, Proc. of the 7th
Int. European Space Agency Conf. on Digital Signal Processing for Space Communications, Lisbon, Portugal, Sept. 2001.
[Vib91]
M. Viberg, B. Ottersten, T. Kailath, “Detection and estimation in sensor arrays using
weighted subspace fitting”, IEEE Transactions on Signal Processing, Vol. 39, no 11,
pags. 2435—2449, Nov. 1991.
[Vib95]
M. Viberg, A. Nehorai B. Ottersten, “Performance analysis of direction finding with
large arrays and finite data”, IEEE Trans. on Signal Processing, Vol. 43, no 2,
pags. 469—477, Feb. 1995.
[Vil01a]
J. Villares, G. Vazquez, “Best quadratic unbiased estimator (BQUE) for timing and
frequency synchronization”, Proc. of the IEEE Statistical Signal Processing Workshop,
Singapore, Aug. 2001.
[Vil01b]
J. Villares, G. Vazquez, J. Riba, “Fourth order non data aided synchronization”, Proc.
of the IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), May 2001.
[Vil02a]
J. Villares, G. Vazquez, “Optimal quadratic non-assisted parameter estimation for
digital synchronisation”, Proc. of Int. Zurich Seminar on Broadband Communications
(IZS), Zurich (Switzerland), Feb. 2002.
[Vil02b]
J. Villares, G. Vazquez, “Sample covariance matrix based parameter estimation for
digital synchronization”, Proc. of the IEEE Global Communications Conference 2002
(Globecom), Taipei (Taiwan), Nov. 2002.
[Vil03a]
J. Villares, G. Vazquez, “Sample covariance matrix parameter estimation: Carrier
frequency, a case study”, Proc. of the IEEE Int. Conf. on Acoustics, Speech and Signal
Processing (ICASSP), Vol. 6, pags. 725—728, Hong Kong (China), Apr. 2003.
[Vil03b]
J. Villares, G. Vazquez, “Second-order DOA estimation from digitally modulated signals”, Proc. of the Int. 37th Asilomar Conf. Signals, Systems and Computers, Pacific
Grove (USA), Nov. 2003.
[Vil03c]
J. Villares, G. Vazquez, M. Lamarca, “Maximum likelihood blind carrier synchronization in space-time coded OFDM systems”, Proc. of the IEEE Signal Processing
Workshop on Signal Processing Advances in Wireless Communications (SPAWC),
Rome (Italy), Jun. 2003.
[Vil04a]
J. Villares, G. Vazquez, “On the quadratic extended Kalman filter”, Proc. of the Third
IEEE Sensor Array and Multichannel Signal Processing Workshop (SAM), Barcelona
(Spain), Jul. 2004.
[Vil04b]
J. Villares, G. Vazquez, “Self-noise free second-order carrier phase synchronization of
MSK-type signals”, Proc. of the IEEE Int. Conf. on Communications (ICC), Paris
(France), Jun. 2004.
[Vil05]
J. Villares, G. Vazquez, “Second-order parameter estimation”, IEEE Transactions on
Signal Processing, Jul. 2005.
[Wan00] E.A. Wan, R. van der Merwe, “The unscented Kalman filter for nonlinear estimation”,
Proc. of IEEE Symposium 2000 (AS-SPCC), Lake Louise, Alberta, Canada, Oct.
2000.
[Wei83]
A.J. Weiss, E. Weinstein, “Fundamental limitations in passive time delay estimation - Part I: Narrow-band systems”, IEEE Trans. on Acoustics, Speech, and Signal
Processing, Vol. 31, no. 2, pp. 472–486, Apr. 1983.
[Wei84]
E. Weinstein, A.J. Weiss, “Fundamental limitations in passive time delay estimation - Part II: Wide-band systems”, IEEE Trans. on Acoustics, Speech, and Signal
Processing, Vol. 32, no. 5, pp. 1064–1078, Oct. 1984.
[Wei85]
A.J. Weiss, E. Weinstein, “A lower bound on the mean square error in random parameter estimation”, IEEE Trans. on Information Theory, Vol. 31, no. 5, pp. 680–682,
Sept. 1985.
[Wei88a] E. Weinstein, “Relations between Bellini-Tartara, Chazan-Zakai-Ziv, and Wax-Ziv
lower bounds”, IEEE Trans. on Information Theory, Vol. 34, pp. 342–343, Mar.
1988.
[Wei88b] E. Weinstein, A.J. Weiss, “A general class of lower bounds in parameter estimation”,
IEEE Trans. on Information Theory, Vol. 34, pp. 338–342, Mar. 1988.
[Win00]
J. Winter, C. Wengerter, “High resolution estimation of the time of arrival for GSM
location”, Proc. of the IEEE Vehicular Technology Conference (VTC), pp. 1343–1347,
May 2000.
[Xu92]
G. Xu, T. Kailath, “Direction-of-arrival estimation via exploitation of cyclostationarity - a combination of temporal and spatial processing”, IEEE Transactions on Signal
Processing, Vol. 40, no. 7, pp. 1775–1786, Jul. 1992.
[Zei93]
A. Zeira, P.M. Schultheiss, “Realizable lower bounds for time delay estimation”, IEEE
Trans. on Signal Processing, Vol. 41, no. 11, pp. 3102–3113, Nov. 1993.
[Zei94]
A. Zeira, P.M. Schultheiss, “Realizable lower bounds for time delay estimation: Part 2
- threshold phenomena”, IEEE Trans. on Signal Processing, Vol. 42, no. 5, pp. 1001–1007,
May 1994.
[Zen97a] H.H. Zeng, L. Tong, “Blind channel estimation using the second-order statistics: Algorithms”, IEEE Trans. on Signal Processing, Vol. 45, no. 8, pp. 1919–1930, Aug.
1997.
[Zen97b] H.H. Zeng, L. Tong, “Blind channel estimation using the second-order statistics:
Asymptotic performance and limitations”, IEEE Trans. on Signal Processing,
Vol. 45, no. 8, pp. 2060–2071, Aug. 1997.
[Ziv69]
J. Ziv, M. Zakai, “Some lower bounds on signal parameter estimation”, IEEE Trans.
on Information Theory, Vol. 15, no. 3, pp. 386–391, May 1969.