Doctoral Thesis

Sample Covariance Based Parameter Estimation for Digital Communications

Author: Javier Villares Piera
Advisor: Gregori Vázquez Grau

Departament de Teoria del Senyal i Comunicacions
Universitat Politècnica de Catalunya
Barcelona, May 2005

Abstract

This thesis deals with the problem of blind second-order estimation in digital communications. In this field, the transmitted symbols appear as non-Gaussian nuisance parameters degrading the estimator performance. In this context, the Maximum Likelihood (ML) estimator is generally unknown unless the signal-to-noise ratio (SNR) is very low. In this particular case, if the SNR is asymptotically low, the ML solution is quadratic in the received data or, equivalently, linear in the sample covariance matrix. This significant feature is shared by other important ML-based estimators such as, for example, the Gaussian and Conditional ML estimators. Likewise, MUSIC and other related subspace methods are based on the eigendecomposition of the sample covariance matrix. Against this background, the main contribution of this thesis is the deduction and evaluation of the optimal second-order parameter estimator for any SNR and any distribution of the nuisance parameters.

A unified framework is provided for the design of open- and closed-loop second-order estimators. In the first case, the minimum mean square error and minimum variance second-order estimators are deduced considering that the parameters of interest are random variables of known but arbitrary prior distribution. From this Bayesian approach, closed-loop estimators are derived by imposing an asymptotically informative prior. In this small-error scenario, the best quadratic unbiased estimator (BQUE) is obtained without adopting any assumption about the statistics of the nuisance parameters. In addition, the BQUE analysis yields the lower bound on the performance of any blind estimator based on the sample covariance matrix.
Probably, the main result in this thesis is the proof that quadratic estimators are able to exploit the fourth-order statistical information about the nuisance parameters. Specifically, the fourth-order cumulants of the nuisance parameters are shown to provide all the non-Gaussian information that is utilizable for second-order estimation. This fourth-order information becomes relevant in the case of constant-modulus nuisance parameters and medium-to-high SNRs. In this situation, the Gaussian assumption is proved to yield inefficient second-order estimates.

Another original result in this thesis is the deduction of the quadratic extended Kalman filter (QEKF). The QEKF study concludes that second-order trackers can simultaneously improve the acquisition and steady-state performance if the fourth-order statistical information about the nuisance parameters is taken into account. Once again, this improvement is significant in the case of constant-modulus nuisance parameters and medium-to-high SNRs.

Finally, the proposed second-order estimation theory is applied to some classical estimation problems in the field of digital communications such as non-data-aided digital synchronization, the related problem of time-of-arrival estimation in multipath channels, blind channel impulse response identification, and direction-of-arrival estimation in mobile multi-antenna communication systems. In these applications, an intensive asymptotic and numerical analysis is carried out in order to evaluate the ultimate limits of second-order estimation.

Summary

This thesis studies the problem of blind second-order estimation in digital communications. In this field, the transmitted symbols become unwanted parameters (nuisance parameters) with non-Gaussian statistics that degrade the estimator performance. In this context, the maximum likelihood (ML) estimator is normally unknown unless the signal-to-noise ratio (SNR) is sufficiently low.
In this particular case, the ML estimator is a quadratic function of the received data vector or, equivalently, a linear transformation of the sample covariance matrix. This characteristic is shared by other important estimators based on the maximum likelihood principle, such as the Gaussian ML estimator (GML) and the conditional ML estimator (CML). Likewise, the MUSIC estimator and other related subspace methods are based on the diagonalization of the sample covariance matrix. Within this framework, the main contribution of this thesis is the deduction and evaluation of the optimal second-order estimator for any SNR and any distribution of the nuisance parameters.

The design of open-loop and closed-loop quadratic estimators is addressed in a unified manner. Regarding open-loop estimators, the minimum mean square error and minimum variance estimators are derived considering that the parameters of interest are random variables whose statistical distribution is known a priori but is otherwise arbitrary. From this Bayesian formulation, closed-loop estimators can be obtained by assuming that the prior distribution of the parameters is highly informative. In this small-error model, the best quadratic unbiased estimator, named BQUE, is formulated without assuming any particular statistics for the nuisance parameters. In addition, the analysis of the BQUE makes it possible to compute the lower bound that no blind estimator using the sample covariance matrix can improve upon.

Probably, the main result of the thesis is the proof that quadratic estimators are able to use the fourth-order statistical information about the nuisance parameters.
More specifically, it is shown that all the non-Gaussian information in the data that second-order methods are able to exploit is reflected in the fourth-order cumulants of the nuisance parameters. In fact, this fourth-order information becomes relevant if the modulus of the nuisance parameters is constant and the SNR is moderate or high. Under these conditions, it is shown that the Gaussian assumption on the nuisance parameters leads to inefficient quadratic estimators.

Another original result presented in this dissertation is the deduction of the second-order extended Kalman filter, named QEKF. The QEKF study indicates that second-order tracking algorithms (trackers) can simultaneously improve their acquisition and tracking performance if the fourth-order statistical information about the nuisance parameters is taken into account. Once again, this improvement is significant if the nuisance parameters have constant modulus and the SNR is sufficiently high.

Finally, the proposed theory of quadratic estimators is applied to some classical estimation problems in the field of digital communications, such as non-data-aided digital synchronization, time-of-arrival estimation in environments with multipath propagation, blind identification of the channel impulse response and, lastly, angle-of-arrival estimation in mobile communication systems with multiple antennas. For each of these applications, an intensive analysis, both numerical and asymptotic, is carried out of the performance that can be achieved with second-order estimation methods.

To Natalia,

Acknowledgements

Without a doubt, this is the most important chapter of the thesis. Without this chapter, the rest of the thesis would make no sense.
In these five years I have found a "place in the world" in which to pursue the two vocations that give meaning to my life: learning and teaching. This place has undoubtedly been made possible by a group of exceptional people without whom I would not be who I am. From these pages, I want to share this moment of joy and satisfaction with them.

My first thanks go to Gregori: for always standing by me, for believing in me and, above all, for his unconditional friendship. I also want to convey my deepest gratitude to Xell, for helping me from the very first day and offering me her friendship. Another big hug for Francesc, because he has been like my big brother in the department... Remember that we still have a dinner pending to celebrate the theses! The other big brother left home, but I will always be grateful to him for supporting me so much. Thank you, Xavi. More thanks... to Jaume, because talking with him is a pleasure and he has always found a moment to talk with me. To Jose, because he is one of the people I admire most, personally and professionally. I hope with all my heart that we can work together for many years. I would like to say the same to Enric. From here I wish him every happiness inside and outside his marriage... Josep, the capacity theorems tremble when they hear your name... "may you be happy". And many more thanks to all the colleagues in the department with whom I have shared so many pleasant moments. This thesis has been an exciting adventure because I have lived it with you...

The last acknowledgement, although it comes first of all, is for my family. This thesis is yours; I have come this far thanks to your effort and affection. A big kiss for my grandparents, for my parents, for Robert and his two girls, for my uncles, aunts and cousins. Also for all the friends who have never stopped encouraging me. Thank you.

Now and always, to Natalia, for being the beginning and the end of everything I do and feel. TQM.
This work has been partially supported by the European Commission (FEDER) and Spanish/Catalan Government under projects TIC2003-05482, TEC2004-04526 and 2001SGR-00268.

Contents

1 Introduction
  1.1 The nuisance unknowns in parameter estimation
  1.2 The Bayesian approach: the bias-variance dilemma
  1.3 Noncircular nuisance unknowns
  1.4 Self-noise in multivariate problems: interparameter interference
  1.5 Informative priors: estimation on track
  1.6 Limiting asymptotic performance
  1.7 Thesis Outline

2 Elements on Estimation Theory
  2.1 Classical vs. Bayesian Approach
  2.2 MMSE and MVU Estimation
  2.3 Maximum Likelihood Estimation
    2.3.1 Decision Directed ML Estimation
    2.3.2 Asymptotic properties
  2.4 Linear Signal Model
    2.4.1 Low-SNR Unconditional Maximum Likelihood
    2.4.2 Conditional Maximum Likelihood (CML)
    2.4.3 Gaussian Maximum Likelihood (GML)
  2.5 Maximum Likelihood Implementation
    2.5.1 ML-Based Closed-Loop Estimation
    2.5.2 ML-based Tracking
  2.6 Lower Bounds in Parameter Estimation
    2.6.1 Deterministic Bounds based on the Cauchy-Schwarz Inequality
    2.6.2 Bayesian Bounds based on the Cauchy-Schwarz Inequality
    2.6.3 Bayesian Bounds based on the Kotelnikov's Inequality
  2.A UML for polyphase alphabets
  2.B Low-SNR UML results
  2.C CML results
  2.D GML asymptotic study
  2.E Closed-loop estimation efficiency
  2.F Computation of Σsv(θ) for the small-error bounds
  2.G MCRB, CCRB and UCRB derivation

3 Optimal Second-Order Estimation
  3.1 Definitions and Notation
  3.2 Second-Order MMSE Estimator
  3.3 Second-Order Minimum Variance Estimator
  3.4 A Case Study: Frequency Estimation
    3.4.1 Bias Analysis
    3.4.2 MSE Performance
  3.5 Conclusions
  3.A Second-order estimation in noncircular transmissions
  3.B Deduction of matrix Q(θ)
  3.C Fourth-order moments
  3.D Bayesian average in frequency estimation
  3.E Bias study in frequency estimation

4 Optimal Second-Order Small-Error Estimation
  4.1 Small-Error Assumption
  4.2 Second-Order Minimum Variance Estimator
  4.3 Second-Order Identifiability
  4.4 Generalized Second-Order Constrained Estimators
  4.5 A Case Study: Frequency Estimation
  4.6 Conclusions
  4.A Small-error matrices
  4.B Proof of bias cancellation

5 Quadratic Extended Kalman Filtering
  5.1 Signal Model
  5.2 Background and Notation
  5.3 Linearized Signal Model
  5.4 Quadratic Extended Kalman Filter (QEKF)
    5.4.1 Another QEKF derivation
    5.4.2 Kalman gains recursion
    5.4.3 QEKF programming
  5.5 Simulations
  5.6 Conclusions

6 Case Studies
  6.1 Non-Data-Aided Synchronization
    6.1.1 Overview
    6.1.2 Signal Model
    6.1.3 Open-Loop Timing Synchronization
    6.1.4 Closed-Loop Analysis and Optimization
    6.1.5 Numerical Results
  6.2 Carrier Phase Synchronization of Noncircular Modulations
    6.2.1 Signal Model
    6.2.2 NDA ML Estimation in Low-SNR Scenarios
    6.2.3 High-SNR Analysis: Self-noise
    6.2.4 Second-Order Optimal Estimation
    6.2.5 High SNR Study: Self-noise
    6.2.6 Numerical Results
  6.3 TOA Estimation in Multipath Scenarios
    6.3.1 Signal Model
    6.3.2 Optimal Second-Order NDA Estimator
    6.3.3 Optimal Second-Order DA Estimator
    6.3.4 Numerical Results
  6.4 Blind Channel Identification
    6.4.1 Signal Model
    6.4.2 Numerical Results
  6.5 Angle-of-Arrival (AoA) Tracking
    6.5.1 Signal Model
    6.5.2 Numerical Results
  6.A Computation of Q for carrier phase estimation
  6.B Asymptotic expressions for multiplicative channels

7 Asymptotic Studies
  7.1 Introduction
  7.2 Low SNR Study
  7.3 High SNR Study
    7.3.1 (Gaussian) Unconditional Cramér-Rao Bound
    7.3.2 Gaussian Maximum Likelihood
    7.3.3 Best Quadratic Unbiased Estimator
    7.3.4 Large Error Estimators
  7.4 Large Sample Study
    7.4.1 (Gaussian) Unconditional Cramér-Rao Bound
    7.4.2 Gaussian Maximum Likelihood
    7.4.3 Best Quadratic Unbiased Estimator
    7.4.4 Second-Order Estimation in Digital Communications
    7.4.5 Second-Order Estimation in Array Signal Processing
  7.5 Simulations
    7.5.1 SNR asymptotic results for the BQUE and GML estimators
    7.5.2 SNR asymptotic results for the large-error estimators
    7.5.3 Large sample asymptotic results for the BQUE and GML estimators
  7.6 Conclusions
  7.A Low-SNR ML scoring implementation
  7.B High-SNR limit of R⁻¹(θ) and ℛ⁻¹(θ)
  7.C High-SNR limit of Q⁻¹(θ) (K full-rank)
  7.D High-SNR limit of Q⁻¹(θ) (K singular)
  7.E High-SNR results with A(θ) singular
  7.F High-SNR UCRB
  7.G High-SNR UCRB variance floor
  7.H High-SNR study in feedforward second-order estimation
  7.I High-SNR MSE floor under the Gaussian assumption
  7.J Performance limits in second-order frequency estimation
  7.K Asymptotic study for M → ∞
  7.L Asymptotic study for Ns → ∞

8 Conclusions and Topics for Future Research
  8.1 Further Research

A Notation
B Acronyms
Bibliography

Chapter 1

Introduction

This dissertation is the result of almost five years of work in digital communications and signal processing. During this time, multiple estimation problems have been addressed, and the experience gained in these applications is materialized in this document. We thought that an attractive way of introducing this thesis would be to explain its evolution from the original idea to the completion of the dissertation. In the following sections, the series of obstacles encountered along the way are briefly discussed and put in context. In short, this is the history of the thesis...

1.1 The nuisance unknowns in parameter estimation

In the beginning, our research activity was focused on non-data-aided (blind) digital synchronization [Gar88a][Men97][Vaz00].
In this field, the receiver has to estimate some parameters from the received waveform in order to recover the transmitted data symbols. Basically, the receiver has to determine the symbol timing and, in bandpass communications, the carrier phase and frequency. With this aim, in most communication standards, a known training sequence is transmitted to assist the receiver during signal synchronization. Once the training is finished, the synchronizer has to maintain synchronism even though the parameters usually fluctuate due to the time-varying propagation channel and the nonidealities of the terminal equipment. In these conditions, the synchronizer has to cope with the thermal noise and, in addition, with the so-called self-noise (or pattern noise) generated by the unknown random data symbols themselves. In fact, the random data symbols can be regarded as nuisance parameters that complicate the estimation of the parameter of interest. Special attention is given in this thesis to these nuisance parameters and the induced self-noise.

Most non-data-aided techniques in the literature are designed assuming a low SNR [Men97][Vaz00]. This assumption is rather realistic in modern digital communications due to the use of sophisticated error correcting codes [Ber93]. In that case, it is well known that the maximum likelihood (ML) estimator is quadratic in the received data in most estimation problems such as, for instance, timing and frequency synchronization. The most important point is that, whatever the actual SNR, the ML estimator is known to attain the Cramér-Rao bound (CRB) if the observation time is sufficiently large [Kay93b]. Thus, the ML estimator is asymptotically the minimum variance unbiased estimator [Kay93b]. Unfortunately, the ML estimator is generally unknown for medium-to-high SNRs in the presence of nuisance parameters.
Thus, the ML estimator is an unknown function of the observed data $y$ that can be approximated around the true parameter $\theta$ (small-error approximation) by means of the following $N$-th order polynomial:

\[ \hat{\theta}_{ML} \approx \sum_{n=0}^{N} M_n(\theta)\, y_n \]

where $y_n$ is the vector containing all the $n$-th order sample products of $y$, and $M_n(\theta)$ are the associated coefficients. Notice that the so-called small-error approximation can be achieved by means of iterative and closed-loop schemes (Chapter 2).

From the last expression, we conclude that higher-order techniques (i.e., $N > 2$) are generally required to attain the CRB for medium-to-high SNR. For example, a heuristic fourth-order closed-loop timing synchronizer was proposed in [And90][Men97, Sec. 9.4] for the minimum shift keying (MSK) modulation that outperforms, at high SNR, any existing second-order technique. This work was the motivation for deducing the optimal fourth-order estimator given by

\[ \hat{\theta}_4 = M_0(\theta) + M_2(\theta)\, y_2 + M_4(\theta)\, y_4 \]

where $M_0(\theta)$, $M_2(\theta)$ and $M_4(\theta)$ are selected to minimize the estimator variance [Vil01b]. The proposed estimator became quadratic at low SNR ($M_4 = 0$) and exploited the fourth-order component when the SNR was increased ($M_4 \neq 0$). For more details, the reader is referred to the original paper [Vil01b]:

• “Fourth-order Non-Data-Aided Synchronization”. J. Villares, G. Vázquez, J. Riba. Proc. of the IEEE Int. Conf. on Acoustics, Speech and Signal Processing 2001 (ICASSP 2001). pp. 2345-2348. Salt Lake City (USA). May 2001.

Although the focus soon shifted from fourth-order to second-order methods, this contribution was actually the basis of this thesis. In this work, the estimator coefficients were directly optimized for a given observation length and estimator order ($N = 4$). To carry out this optimization, the Kronecker product $\otimes$ and $\mathrm{vec}(\cdot)$ operators were introduced in order to manipulate the $n$-th order observation $y_n$.
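The role of the Kronecker product and vec(·) operators can be illustrated for the second-order term. The following is a minimal sketch, not code from the thesis: the function names are hypothetical, and it assumes that y_2 is built as vec(y y^H) with column-major stacking, which may differ from the ordering convention adopted in later chapters. It shows that an estimator of the form M_0 + M_2 y_2 is quadratic in y or, equivalently, linear in the outer product y y^H.

```python
import numpy as np


def second_order_observation(y):
    """Return y2 = vec(y y^H), the vector of all second-order sample products.

    Assumption: vec() stacks the columns of its argument, so that for the
    rank-one matrix y y^H the result equals conj(y) (Kronecker) y.
    """
    y = np.asarray(y, dtype=complex)
    return np.kron(np.conj(y), y)


def quadratic_estimate(m0, m2, y):
    """Evaluate m0 + m2 @ y2 for a scalar parameter.

    m0 is a scalar offset and m2 a length-n^2 coefficient (row) vector;
    both are placeholders here, not optimized coefficients.
    """
    return m0 + np.asarray(m2) @ second_order_observation(y)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 4
    y = rng.standard_normal(n) + 1j * rng.standard_normal(n)
    m2 = rng.standard_normal(n * n) + 1j * rng.standard_normal(n * n)

    # Same estimator written as a quadratic form y^H M^T y, with M = unvec(m2).
    M = m2.reshape(n, n, order="F")
    linear_in_y2 = quadratic_estimate(0.0, m2, y)
    quadratic_in_y = np.conj(y) @ M.T @ y
    print(np.allclose(linear_in_y2, quadratic_in_y))
```

Averaging y_2 over several observation vectors gives the vectorized sample covariance matrix, which is why estimators of this form are described as linear in the sample covariance.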
Moreover, during the computation of the optimal coefficients $M_2$ and $M_4$, we realized that some fourth-order moments of the MSK modulation were ignored in other well-known second-order ML-based approximations.

After this work, we wondered about the best quadratic non-data-aided estimator or, in other words, about the optimal coefficients $M_0(\theta)$ and $M_2(\theta)$ in

\[ \hat{\theta}_2 = M_0(\theta) + M_2(\theta)\, y_2 \]

for a given estimation problem. At low SNR, the optimal second-order estimator is given by the low-SNR ML approximation, at least for sufficiently large data records. On the other hand, if the SNR increases, the optimal second-order estimator is only known in the case of Gaussian data symbols. However, the Gaussian assumption is clearly unrealistic in digital communications because the transmitted symbols belong to a discrete alphabet. This intuition was confirmed for the MSK modulation in the following paper [Vil01a].

• “Best Quadratic Unbiased Estimator (BQUE) for Timing and Frequency Synchronization”. J. Villares, G. Vázquez. Proc. of the 11th IEEE Int. Workshop on Statistical Signal Processing (SSP01). pp. 413-416. Singapore. August 2001. ISBN 0-7803-7011-2.

In this pioneering paper, the Gaussian assumption was found to yield suboptimal timing estimates at high SNR when the observation time is short. Quadratic estimators were improved by considering the fourth-order cumulants or kurtosis of the MSK constellation. However, this fourth-order information was shown to be irrelevant if the number of observations is increased. Thus, it was shown via Monte Carlo simulations that the Gaussian assumption is asymptotically optimal in the problem of timing synchronization. Additional simulations and remarks were given in the tutorial paper presented in the ESA workshop [Vaz01].

1.2 The Bayesian approach: the bias-variance dilemma

Another contribution in the referred paper was the formulation of optimal second-order open-loop estimators.
Open-loop schemes are very attractive in digital synchronization because they allow reducing the acquisition time of closed-loop synchronizers [Men97][Rib97]. To design open-loop estimators, the small-error approximation is abandoned and the parameter $\theta$ is assumed to take values in a given interval $\Theta$. In this large-error scenario, the $N$-th order expansion of the ML estimator depends on the unknown value of $\theta \in \Theta$ and, consequently, the ML estimator cannot generally be implemented by means of a polynomial in $y$.

To overcome this limitation, the Bayesian formulation was adopted and the parameter $\theta$ was modelled as a random variable of known prior distribution $f_\theta(\theta)$. Then, the prior was applied to optimize the estimator coefficients $\{M_n\}$ “on the average”, that is, considering all the possible values of $\theta \in \Theta$ and the associated probabilities $f_\theta(\theta)$. Actually, the Bayesian approach encompasses both the small- and large-error scenarios, since the small-error approximation can be imposed by considering an extremely informative prior.

It can be shown that, in the small-error regime, minimizing the estimator mean square error (MSE) is equivalent to minimizing the estimator variance. However, in the large-error regime, the minimum MSE estimator usually becomes biased in order to reduce the variance contribution. In fact, the more corrupted the observation $y$ is, the more biased the minimum MSE solution becomes. The reason is that, when the observation is severely degraded by the thermal and self-noise terms, the minimum MSE estimator is not confident about the observation $y$ and resorts to the a priori knowledge on the parameters. In that way, the estimator reduces the variance induced by the random terms (noise and self-noise), although it becomes biased in return unless the value of $\theta$ coincides with the expected value of the prior.

In this early paper, the main problem in second-order open-loop estimation was identified for the first time.
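The bias-variance trade-off described above can be made concrete with a toy scalar model that is not taken from the thesis. The model is an assumption chosen only because it has a closed-form solution: a Gaussian prior $\theta \sim \mathcal{N}(\mu, \sigma_\theta^2)$ and a linear observation $y = \theta + w$ with independent noise $w \sim \mathcal{N}(0, \sigma_w^2)$. The estimator is linear rather than quadratic in $y$, so it illustrates the mechanism and not the second-order estimators studied here.

```latex
% Toy model (assumed for illustration): y = \theta + w
\hat{\theta}_{\mathrm{MMSE}} = \mu + \alpha\,(y - \mu),
\qquad
\alpha = \frac{\sigma_\theta^2}{\sigma_\theta^2 + \sigma_w^2}

% Bias and variance conditioned on the actual value of \theta
E\{\hat{\theta}_{\mathrm{MMSE}} \mid \theta\} - \theta
  = -\frac{\sigma_w^2}{\sigma_\theta^2 + \sigma_w^2}\,(\theta - \mu),
\qquad
\mathrm{var}\{\hat{\theta}_{\mathrm{MMSE}} \mid \theta\} = \alpha^2 \sigma_w^2
```

As $\sigma_w^2$ grows, $\alpha$ tends to zero: the estimate falls back on the prior mean $\mu$, its conditional variance vanishes, and the magnitude of the bias approaches $|\theta - \mu|$. The bias is zero only when $\theta = \mu$, in agreement with the remark about the expected value of the prior.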
In general, unbiased second-order open-loop estimators are not feasible. Even if the estimator variance can usually be removed by extending the observation time, there is always a residual bias that sets a limit on the performance of open-loop estimators.

Despite this conclusion, the design of almost unbiased open-loop second-order estimators was addressed by imposing the unbiasedness constraint at $L$ values of $\theta \in \Theta$. Actually, the $L$ test points were distributed regularly inside the parameter space $\Theta$. The number of unbiased test points was in practice a function of the observation time and the oversampling factor. This formulation was further improved by allowing the estimator to select automatically the best unbiased test points. In that way, the estimator can decide the number and position of the test points in order to minimize the overall estimator bias. This formulation was developed in the following conference paper for the problem of timing and frequency synchronization [Vil02b].

• “Sample Covariance Matrix Based Parameter Estimation for Digital Synchronization”. J. Villares, G. Vázquez. Proc. of the IEEE Global Communications Conference 2002 (Globecom 2002). November 2002. Taipei (Taiwan).

Another important advance in this paper was the closed-form derivation of the kurtosis matrix $K$ for any linear modulation. This matrix contains all the fourth-order statistical information about the transmitted symbols that is relevant for second-order estimation. Actually, matrix $K$ gathers all the statistical information about the digital modulation that is ignored when the Gaussian assumption is adopted. The last two papers [Vil01a][Vil02b] are actually the foundation of Chapter 3 (open-loop estimation) and Chapter 4 (closed-loop estimation).

In the same year, the results obtained in the last two papers [Vil01a][Vil02b] were extended to estimate the timing and frequency parameters in the presence of multipath propagation.
This work was actually motivated by the participation in the EMILY European project [Bou02a][Bou02b], in which advanced radiolocation techniques for wireless outdoor communication systems (e.g., GSM and UMTS) were studied. The results of this research were published in the following paper [Vil02a] and are included in Section 6.3.

• “Optimal Quadratic Non-Assisted Parameter Estimation for Digital Synchronisation”. J. Villares, G. Vázquez. Proc. of the Int. Zurich Seminar on Broadband Communications 2002 (IZS2002). pp. 46.1-46.4. Zurich (Switzerland). February 2002.

The Bayesian formulation adopted to design open-loop estimators requires in most cases that the estimator coefficients be computed numerically. The reason is that, in most estimation problems, the average with respect to the prior $f_\theta(\theta)$ does not admit an analytical solution. Exceptionally, closed-form expressions can be obtained for the frequency estimation problem if the prior is uniform. Thus, the exhaustive evaluation of open-loop second-order frequency estimators was carried out in the following paper [Vil03a].

• “Sample Covariance Matrix Parameter Estimation: Carrier Frequency, A Case Study”. J. Villares, G. Vázquez. Proc. of the IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP). pp. VI-725 - VI-728. Hong Kong (China). April 2003.

In this paper, it was shown that unbiased second-order open-loop estimators can be obtained by increasing the oversampling factor. In practice, unbiased open-loop estimators are feasible if the sampling rate is greater than four times the maximum frequency error (Section 3.4).

1.3 Noncircular nuisance unknowns

Thus far, the second-order framework was only applied to formulate NDA timing and frequency synchronizers. However, the problem of carrier phase synchronization was ignored because higher-order methods are usually required to estimate the signal phase.
However, this is not true in case of noncircular modulations (e.g., PAM, BPSK, staggered formats and CPM). Recall that the transmitted symbols $\{x_i\}$ belong to a noncircular constellation if the expected value of $x_i x_k$ is different from zero for certain values of i and k. The problem of carrier phase synchronization in case of MSK-type modulations was addressed in the following paper [Vil04b] and can be consulted in Section 6.2.

• “Self-Noise Free Second-Order Carrier Phase Synchronization of MSK-Type Signals”, J. Villares, G. Vázquez, Proc. of the IEEE Int. Conf. on Communications (ICC 2004). June 2004. Paris (France).

1.4 Self-noise in multivariate problems: interparameter interference

At high SNR, the dominant disturbance is entirely due to the randomness of the received symbols (i.e., the self-noise). In this high-SNR scenario, the self-noise variance is minimized if the kurtosis of the data symbols is taken into account. Otherwise, if the Gaussian assumption is imposed, the variance of the self-noise term increases. However, the self-noise contribution is normally negligible in digital synchronization and the Gaussian assumption is practically optimal.

In order to test the Gaussian assumption, we decided to study other estimation problems in which the self-noise term was more critical. With this purpose, the uniparametric formulation was generalized to encompass important multivariate estimation problems in the context of digital communications such as direction-of-arrival (DOA) and channel estimation. These problems were selected because the self-noise contribution was expected to degrade significantly the estimator performance at high SNR. Hence, these two problems were valuable candidates for examining the Gaussian assumption. In the DOA estimation problem, the DOA estimator is faced with the self-noise caused by the user of interest and, in addition, by the other interfering users (multiple access interference).
In the channel estimation problem, the received pulse is severely distorted by the unknown channel impulse response. Then, the intersymbol interference is enhanced at high SNR and hence the self-noise variance is amplified.

The formulation of the optimal second-order multiparametric open- and closed-loop estimator will appear in the IEEE Transactions on Signal Processing next July [Vil05]. The theoretical material in this article is presented in Chapter 3 (open-loop or large-error estimation) and Chapter 4 (closed-loop or small-error estimation).

• “Second-Order Parameter Estimation”. J. Villares, G. Vázquez. IEEE Transactions on Signal Processing. July 2005.

As expected, the performance of second-order DOA estimators was severely degraded when the angular separation of the users was reduced because, in that case, the multiple access interference became the dominant impairment. In these singular scenarios, the Gaussian assumption yielded a significant loss for practical SNRs if the transmitted symbols were drawn from a constant-modulus constellation such as MPSK or CPM. The Gaussian assumption loss was a function of the angular separation as well as the number of antennas. All these important results were presented in the following paper [Vil03b] and are included in Section 6.5.

• “Second-Order DOA Estimation from Digitally Modulated Signals”, J. Villares, G. Vázquez, Proc. of the 37th IEEE Asilomar Conf. on Signals, Systems and Computers, Pacific Grove (USA), November 2003.

In this paper, the problem of tracking the DOA of multiple moving digitally-modulated users is considered. In this scenario, the tracking condition can be lost at low SNR when two users approach each other. In this paper, it is shown that this is usually the outcome if the tracker is forced to cancel out the multiple access interference.
On the other hand, if the multiple access interference is incorporated as another random self-noise term in the tracker optimization, the optimal second-order tracker is able to maintain the tracking condition even if the users cross each other [Vil03b].

As explained before, the problem of blind channel estimation was also a promising candidate for testing the Gaussian assumption. Some results are presented in Section 6.4 that confirm the interest of the optimal second-order estimator in the medium-to-high SNR range when the nuisance parameters have constant modulus. In that case, the channel amplitude cannot be estimated under the Gaussian assumption, whereas the optimal solution yields self-noise free estimates even if the channel amplitude is unknown (Section 6.4). This channel estimation problem is currently being investigated in case of noncircular constant-modulus transmissions [LS04][LS05a][LS05b].

1.5 Informative priors: estimation on track

Thus far, all the second-order closed-loop estimators and trackers had been designed and evaluated in the steady-state, that is, assuming that all the parameters were initially captured during the acquisition phase. In fact, once the acquisition is completed, the estimator begins to operate in the small-error regime. The estimator coefficients were optimized precisely under the small-error assumption. However, the acquisition performance had never been included in the estimator optimization.

After this reflection, we were concerned with the optimization of closed-loop second-order estimators considering both the acquisition (large-error) and steady-state (small-error) performance. With this aim, the Kalman filter formulation [And79][Kay93b] was adopted because it is known to supply the optimal transition from the large-error to the small-error regime when the parameters and the observations are jointly Gaussian. Evidently, this assumption fails in most estimation problems in digital communications and the suboptimal extended Kalman filter (EKF) is only optimal in the steady-state [And79][Kay93b]. Despite this, the EKF provides a systematic and automatic procedure for updating the prior distribution $f_\theta(\theta)$ every time a new observation is processed. In that way, it is possible to enhance the acquisition performance without altering the optimal steady-state solution.

The research in this direction yielded the so-called quadratic EKF (QEKF), which extends the classical EKF to deal with quadratic observations. The QEKF formulation was published in the following conference paper [Vil04a] and has been included in Chapter 5.

• “On the Quadratic Extended Kalman Filter”, J. Villares, G. Vázquez. Proc. of the Third IEEE Sensor Array and Multichannel Signal Processing Workshop (SAM 2004). July 2004. Sitges, Barcelona (Spain).

In this paper, the QEKF is designed and simulated for the aforementioned DOA estimation problem. The most important conclusion is that, at high SNR, the Gaussian assumption is also suboptimal during the acquisition phase if the data symbols are drawn from a constant-modulus constellation. Therefore, the acquisition time can be notably reduced if the QEKF takes into account the kurtosis of the data symbols. Besides, the Gaussian assumption loss at high SNR is shown to persist in the steady-state even if the tracker observation time is increased to infinity (Chapter 5).

1.6 Limiting asymptotic performance

The last remark on the QEKF asymptotic performance persuaded us to study in detail the performance limits in second-order estimation. The objective was to determine the asymptotic conditions for the Gaussian assumption to apply. The asymptotic analysis confirmed that the Gaussian assumption was optimal at low SNR but it was suboptimal at high SNR if the nuisance parameters belonged to a constant-modulus alphabet.
Finally, the performance of second-order closed-loop estimators was evaluated when the number of samples went to infinity. The conclusion was that the Gaussian assumption applies in digital synchronization, and in DOA estimation if the number of antennas goes to infinity. On the other hand, the Gaussian assumption fails for the medium-to-high SNR range in the problem of channel estimation and for DOA estimation in case of finite sensor arrays. All these asymptotic results were finally collected and are presented for the first time in Chapter 7.

1.7 Thesis Outline

The structure of the dissertation is presented next. The main contents and contributions are described chapter by chapter.

Chapter 2: Elements on Estimation Theory.

In this chapter, the most important concepts from estimation theory are reviewed. The problem of parameter estimation in the presence of nuisance parameters is introduced and motivated. The maximum likelihood (ML) estimator is presented and the most important ML-based approaches in the literature are described. Special emphasis is put on the Gaussian ML (GML) estimator because it converges to the ML estimator at low SNR and yields the conditional ML (CML) solution at high SNR. The important point is that all these ML-based estimators are quadratic in the observation. The GML estimator is actually the ML estimator in case of Gaussian nuisance parameters. However, the optimal second-order estimator is normally unknown if the nuisance parameters are not Gaussian. This was the motivation for this thesis. The iterative implementation of the aforementioned ML-based estimators is considered and the use of closed-loop schemes is motivated. Finally, a survey on estimation bounds is included for the interested reader.

Chapter 3: Optimal Second-Order Estimation.
In this chapter, the optimal second-order estimator is formulated from the known distribution of both the wanted parameters and the nuisance parameters. The Bayesian formulation and two different optimization criteria are considered. In the first case, the estimator mean square error (MSE) is minimized in the Bayesian sense, that is, averaging the estimator MSE according to the assumed prior distribution. In the second case, the estimator variance is minimized subject to the minimum bias constraint. Again, the variance and bias are averaged by means of the prior distribution. The resulting large-error or open-loop estimators are evaluated for the problem of blind frequency estimation. The minimum MSE solution is shown to make a trade-off between the bias and variance terms. On the other hand, the minimum bias constraint is unable to eliminate the bias contribution completely even if the observation time is increased. Accordingly, the ultimate performance of quadratic open-loop estimators is usually limited by the residual bias.

Chapter 4: Optimal Second-Order Small-Error Estimation.

In this chapter, the design of closed-loop second-order estimators is addressed. Assuming that all the parameters have been previously acquired, closed-loop estimators only have to compensate for small errors. In this context, the optimal second-order small-error estimator is derived from the minimum variance estimator in Chapter 3 by considering an extremely informative prior. The resulting estimator is the best quadratic unbiased estimator (BQUE) and its variance is the (realizable) lower bound on the variance of any sample covariance based parameter estimator. The BQUE is proved to exploit the kurtosis matrix of the nuisance parameters whereas the Gaussian ML estimator ignores this information.
Later, the conditions for second-order identifiability are analyzed and some important remarks are made about the so-called interparameter interference in multiuser scenarios. The frequency estimation problem is chosen once again to illustrate the main results of the chapter. Some simulations are selected to illustrate how the Gaussian assumption fails at high SNR.

Chapter 5: Quadratic Extended Kalman Filtering.

In this chapter, the well-known extended Kalman filter (EKF) is adapted to deal with quadratic observations. The coefficients of the quadratic EKF are calculated from the actual distribution of the nuisance parameters. The optimal tracker is shown to exploit the kurtosis matrix of the nuisance parameters. The Gaussian assumption is evaluated during the acquisition and the steady-state for the problem of DOA estimation. It is shown that the acquisition time and the steady-state variance can be reduced at high SNR if the transmitted symbols are drawn from a constant-modulus alphabet (e.g., MPSK or CPM) and this information is incorporated.

Chapter 6: Case Studies.

In this chapter, the optimal second-order small-error estimator deduced in Chapter 4 is applied to some relevant estimation problems. In the first section, some contributions in the field of non-data-aided synchronization are presented. Specifically, Section 6.1 is devoted to the global optimization of second-order closed-loop synchronizers and the design of open-loop timing synchronizers in the frequency domain. In Section 6.2, the problem of second-order carrier phase synchronization is addressed in case of noncircular transmissions. In this section, the ML estimator is shown to be quadratic at low SNR for MSK-type modulations. Moreover, second-order self-noise free estimates are achieved at high SNR by exploiting the non-Gaussian structure of the digital modulation. In Section 6.3, the problem of time-of-arrival estimation in wireless communications is studied.
The frequency-selective multipath is shown to increase the number of nuisance parameters and the Gaussian assumption is shown to apply in this case study. In Section 6.4, the classical problem of blind channel identification is dealt with. The channel amplitude is shown not to be identifiable unless the transmitted symbols belong to a constant-modulus constellation and this information is exploited by the estimator. Finally, the problem of angle-of-arrival estimation in the context of cellular communications is addressed in Section 6.5. The Gaussian assumption is clearly outperformed for practical SNRs in case of constant-modulus nuisance parameters and closely spaced sources. In this section, the importance of the multiple access interference (MAI) is emphasized and MAI-resistant second-order DOA trackers are derived and evaluated.

Chapter 7: Asymptotic Studies.

In this chapter, analytical expressions are obtained for the asymptotic performance of the second-order estimators presented in Chapter 3 and Chapter 4. Firstly, the low SNR study concludes that the distribution of the nuisance parameters is irrelevant at low SNR and, therefore, the Gaussian assumption is optimal. On the other hand, the high SNR study states that the Gaussian assumption does not apply in case of constant-modulus nuisance parameters. This conclusion is related to the eigendecomposition of the kurtosis matrix of the nuisance parameters. Finally, the large sample study confirms that the Gaussian assumption is optimal in digital synchronization if the observation time goes to infinity. Likewise, the Gaussian assumption applies in DOA estimation if the number of antennas goes to infinity. However, the Gaussian assumption cannot be applied, even if the number of snapshots is infinite, when multiple constant-modulus signals impinge on a finite array.
Regarding the channel estimation problem, the asymptotic study indicates that second-order estimates could be improved by considering the actual distribution of the nuisance parameters.

Chapter 8: Conclusions.

This chapter concludes and summarizes the main results of this thesis. To finish, some topics for further research are proposed.

Chapter 2

Elements on Estimation Theory

Estimation theory deals with the basic problem of inferring some relevant features of a random experiment based on the observation of the experiment outcomes. In some cases, the experiment mechanism is totally unknown to the observer and the use of nonparametric estimation methods is necessary. The term “nonparametric” means that the observed experiment cannot be modelled mathematically. Let us consider, for instance, the classical problem of spectral analysis that consists in computing the power spectral density of the observed signal from a finite sample. The performance of nonparametric methods is usually unsatisfactory when the observation time is limited. This situation is very common because the experiment output is only temporarily available, the experiment is not stationary, or the observer must supply the estimate in a short time.

To design more efficient estimation techniques, it is advisable to first find a convenient mathematical model for the studied experiment. The result of the experiment is thus a function of a finite number of unknown parameters, say θ, and other random terms forming the vector w. The vector w collects all the nuisance terms in the model that vary randomly during the observation time as, for example, the measurement noise. The objective is therefore to find the minimal parameterization that concentrates the uncertainty about the experiment as much as possible.
In those fields dealing with natural phenomena, the parameterization of the problem is definitely the most difficult point and, actually, the ultimate goal of scientists working in physics, sociology and economics, among others. Fortunately, the parameterization of human-made systems is normally accessible. In particular, in communication engineering, the received signal is known except for a finite set of parameters that must be estimated before recovering the transmitted information. Likewise, in radar applications, the received signal is known except for the time of arrival and, possibly, some other nuisance parameters. In the following, we will focus exclusively on parametric estimation methods assuming that we are provided with a convenient parameterization or signal model.

In some of the examples above, it is possible to act on the experiment by introducing an excitation signal. In that case, the random experiment can be seen as an unknown system that is identified by observing how the system reacts to the applied excitation. This alternative perspective is normally adopted in system engineering and, specifically, in the field of automatic control. Unfortunately, in some scenarios, the observer is unaware of the existing input signal and blind system identification is required. For example, in digital communications, the transmitted symbols are usually unknown at the receiver side. This thesis is mainly concerned with blind estimation problems in which the problem parameterization includes the unknown input.

Thus far, the formulation is rather general; the observation $y \in \mathbb{C}^M$ is a given function of the input $x \in \mathbb{C}^K$, the vector of parameters $\theta \in \mathbb{R}^P$ and the random vector $w \in \mathbb{C}^M$ of arbitrary known distribution.
Formally, the general problem representation is considered in the following equation:

$$y = a(x, \theta, w) \tag{2.1}$$

where the function $a(\cdot)$ should be one-to-one with respect to θ and x, that is, it should be possible to recover θ and x from y if the value of w were known. In that case, the estimation problem is not ambiguous. The basic problem is that multiple values of θ, x and w yield the same observation y. Otherwise, it would not be an estimation problem but an inversion problem consisting in finding the inverse of $a(\cdot)$.

Then, the objective is to estimate the value of θ based on the observation of y without knowing the input x and the random vector w. Thus, the entries of x appear as nuisance parameters increasing the uncertainty on the vector of parameters θ. In general, the vector of nuisance parameters would include all the existing parameters which are not of the designer’s interest, including the unknown inputs. For example, the signal amplitude is a nuisance parameter when estimating the time-of-arrival in radar applications. This thesis is mainly concerned with the treatment of these nuisance parameters in the context of digital communications.

An estimator of θ is a given function $z(\cdot)$ of the random observation y, $\hat{\theta} = z(y)$, yielding a random error $e = \hat{\theta} - \theta_o$ with $\theta_o$ the true value of θ. Evidently, the aim is to minimize the magnitude of e. Several criteria are listed in the literature, each minimizing a different cost function $C(e)$ as, for example, the mean square error $\|e\|^2$, or the maximum error $\max\{\|e\|\}$. On the other hand, a vast number of estimators have been formulated by proposing ad hoc functions $z(\cdot)$ whose performance is evaluated afterwards. Some of them are briefly presented in the following sections. For more details, the reader is referred to the excellent textbooks on parametric estimation in the bibliography [Tre68][Sch91a][Kay93b].

2.1 Classical vs. Bayesian Approach

There are two important questions that must be addressed before designing a convenient estimator. The first one is why some terms of the signal model are classified as random variables (w) whereas others are deterministic parameters (θ). The second question is whether the nuisance parameters in x should be modelled as random or deterministic variables.

In the classical estimation theory, the wanted parameters θ are deterministic unknowns that are constant along the observation interval. On the other hand, those unwanted terms varying “chaotically” along the observation interval are usually modelled as random variables (e.g., the measurement noise, the signal amplitude in fast fading scenarios, the received symbols in a digital receiver, etc.).

Regarding the vector x, the nuisance parameters can be classified as deterministic constant unknowns, say $x_c$, or random variable unknowns, say $x_u$. In the random case, we will assume hereafter that the probability density function of $x_u$ is known. However, if this information were not available, the entries of $x_u$ could be considered deterministic unknowns and estimated together with θ.¹

In the classical estimation theory, the likelihood function $f_y(y; x_c, \theta)$ supplies all the statistical information for the joint estimation of $x_c$ and θ. If some nuisance parameters are random, say $x_u$, the conditional likelihood function $f_{y/x_u}(y/x_u; x_c, \theta)$ must be averaged with respect to the prior distribution of $x_u$, as indicated next:

$$f_y(y; x_c, \theta) = E_{x_u}\left\{ f_{y/x_u}(y/x_u; x_c, \theta) \right\} = \int f_{y/x_u}(y/x_u; x_c, \theta)\, f_{x_u}(x_u)\, dx_u. \tag{2.2}$$

On the other hand, modelling the constant nuisance parameters as random variables is rather controversial. For example, the received carrier phase is almost constant when estimating the signal timing in static communication systems. Even if these parameters come from a random experiment and their p.d.f.
is perfectly known, we are only observing a particular realization of $x_c$, which is most probably different from its mean value. Therefore, modelling these nuisance parameters as random variables might yield biased estimates of θ. Evidently, this bias would be cancelled out if several realizations of y were averaged, but only one realization is available.

This controversy is inherent to the Bayesian philosophy [Kay93b, Ch. 10]. In the Bayesian or stochastic approach, all the parameters, including the vector of wanted parameters θ, are modelled as random variables of known a priori distribution or prior. Then, the resulting estimators are designed to be optimal “on the average”, that is, averaging with respect to the prior distributions of θ and x. Actually, all the classical concepts such as bias, variance, MSE, consistency and efficiency must be reinterpreted in the Bayesian sense.

¹ Notice that this is not the only solution. For example, we can assume a non-informative prior for $x_u$ or, alternatively, we can apply Monte Carlo methods to evaluate numerically the unknown distribution of $x_u$ [Mer00][Mer01].

Bayesian estimators are able to outperform classical estimators when they are evaluated “on the average”, mainly when the observation y is severely degraded in noisy scenarios. This is possible because Bayesian estimators are able to exploit the a priori information on the unknown parameters. Anyway, as S. M. Kay states in his book, ‘It is clear that comparing classical and Bayesian estimators is like comparing apples and oranges’ [Kay93b, p. 312].

Bearing in mind the above explanation, let us consider that y, x, and θ are jointly distributed random vectors. In that case, the whole statistical information about the parameters is given by the joint p.d.f.

$$f_{y,x,\theta}(y, x, \theta) = f_{y/x,\theta}(y/x, \theta)\, f_x(x)\, f_\theta(\theta), \tag{2.3}$$

assuming that x and θ are statistically independent. The first conditional p.d.f.
in (2.3) is numerically identical to the conditional likelihood function $f_{y/x_u}(y/x_u; x_c, \theta)$ in (2.2) but it highlights the randomness of $x_c$ and θ in the adopted Bayesian model. The other terms $f_x(x) = f_{x_c}(x_c)\, f_{x_u}(x_u)$ and $f_\theta(\theta)$ are the a priori distributions of x and θ, respectively.

Notice that the classical and Bayesian theories coincide in case of non-informative priors, i.e., when $f_y(y; x_c, \theta)$ is significantly narrower than $f_{x_c}(x_c)$ and $f_\theta(\theta)$ [Kay93b, Sec. 10.8].

In the sequel and for the sake of simplicity, all the nuisance parameters will be modelled as random variables or, in other words, $x = x_u$ and $x_c = \emptyset$. Thus,

$$f_y(y; \theta) = E_x\left\{ f_{y/x,\theta}(y/x, \theta) \right\}$$

will be referred to as the unconditional or stochastic likelihood function in opposition to the joint or conditional likelihood function

$$f_y(y; x, \theta) = f_{y/x,\theta}(y/x, \theta),$$

which is also referred to as the deterministic likelihood function in the literature.

2.2 MMSE and MVU Estimation

The ultimate goal in the classical estimation theory is the minimization of the estimator mean square error (MSE), which is given by

$$MSE(\theta) \triangleq E_y\left\{ \|\hat{\theta} - \theta\|^2 \right\} = E_y\left\{ \|z(y) - \theta\|^2 \right\}$$

where $E_y\{\cdot\}$ involves, implicitly, the expectation over the random vectors w and x. The MSE can be decomposed as

$$MSE(\theta) = BIAS^2(\theta) + VAR(\theta)$$

where the estimator bias and variance are given by

$$BIAS^2(\theta) = \left\| E_y\{\hat{\theta}\} - \theta \right\|^2$$

$$VAR(\theta) = E_y\left\{ \left\| \hat{\theta} - E_y\{\hat{\theta}\} \right\|^2 \right\}.$$

The minimum MSE (MMSE) estimator finds a trade-off between the bias and the variance for every value of θ. Unfortunately, the bias term is usually a function of θ and, consequently, the MMSE estimator is generally not realizable because it depends on $\theta_o$ [Kay93b, Sec. 2.4]. In general, any estimator depending on the bias term will be unrealizable in the classical framework. This limitation suggests focusing exclusively on unbiased estimators, i.e., those satisfying $BIAS^2(\theta) = 0$ for all θ.
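The bias-variance decomposition and the dependence of the classical MMSE solution on the true parameter can be checked numerically. The following Python sketch is an illustration only and is not taken from the thesis: it assumes a hypothetical scalar model $y_i = \theta + w_i$ with real Gaussian noise of variance $\sigma^2$ and the scaled sample mean $\hat{\theta} = g \cdot \mathrm{mean}(y)$, whose MSE is $(g-1)^2\theta^2 + g^2\sigma^2/N$. The function names `scaled_mean_stats` and `optimal_gain` are made up for this example.

```python
import numpy as np

def scaled_mean_stats(theta, sigma, n_obs, gain, n_trials=20000, seed=0):
    """Monte Carlo bias^2, variance and MSE of theta_hat = gain * mean(y)."""
    rng = np.random.default_rng(seed)
    # Toy model (assumption): y_i = theta + w_i with w_i ~ N(0, sigma^2)
    y = theta + sigma * rng.standard_normal((n_trials, n_obs))
    theta_hat = gain * y.mean(axis=1)
    bias2 = (theta_hat.mean() - theta) ** 2
    var = theta_hat.var()                      # population variance (ddof=0)
    mse = np.mean((theta_hat - theta) ** 2)
    return bias2, var, mse

def optimal_gain(theta, sigma, n_obs):
    """Gain minimizing (gain - 1)^2 * theta^2 + gain^2 * sigma^2 / n_obs."""
    return theta**2 / (theta**2 + sigma**2 / n_obs)

if __name__ == "__main__":
    for theta in (0.2, 1.0):
        g = optimal_gain(theta, sigma=1.0, n_obs=10)
        b2, v, mse = scaled_mean_stats(theta, 1.0, 10, g)
        print(f"theta={theta}: gain={g:.3f} bias2={b2:.4f} var={v:.4f} mse={mse:.4f}")
```

In this toy model the MSE-optimal gain changes with θ, so it cannot be computed without knowing the parameter being estimated. With the gain fixed to 1 the estimator is unbiased and its design does not require θ.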
Thus, the estimator MSE coincides with its variance and the resulting estimator is usually referred to as the minimum variance unbiased (MVU) estimator [Kay93b, Ch. 2]. The MVU estimator minimizes the variance subject to the unbiasedness constraint for every θ. The Rao-Blackwell-Lehmann-Scheffe theorem facilitates a procedure for finding the MVU estimator [Kay93b, Ch. 5]. Unfortunately, this method is usually tedious and sometimes fails to produce the MVU estimator. Notice that the existence of the MVU estimator is not guaranteed either. Despite these difficulties, the MVU formulation is widely adopted because the maximum likelihood principle is known to provide approximately the MVU estimator under mild regularity conditions [Kay93b, Ch. 7].

If the classical framework is abandoned in favour of the Bayesian approach, the dependence of $MSE(\theta)$ on the true parameter θ can be removed by averaging with respect to the prior $f_\theta(\theta)$. Therefore, the Bayesian MMSE estimator can be formulated as the minimizer of

$$E_\theta\{MSE(\theta)\} = E_\theta E_y\left\{ \|\hat{\theta} - \theta\|^2 \right\} = \int E_y\left\{ \|\hat{\theta} - \theta\|^2 \right\} f_\theta(\theta)\, d\theta, \tag{2.4}$$

which is known to be the mean of the posterior p.d.f. $f_{\theta/y}(\theta/y)$ [Kay93b, Eq. 10.5], i.e.,

$$\hat{\theta}_{MMSE} = E_{\theta/y}\{\theta/y\} = f_y^{-1}(y) \int \theta\, f_y(y; \theta)\, f_\theta(\theta)\, d\theta \tag{2.5}$$

where Bayes’ rule is applied to write $f_{\theta/y}(\theta/y)$ in terms of the likelihood function and the prior:

$$f_{\theta/y}(\theta/y) = \frac{f_y(y; \theta)\, f_\theta(\theta)}{f_y(y)} = \frac{f_y(y; \theta)\, f_\theta(\theta)}{\int f_y(y; \theta)\, f_\theta(\theta)\, d\theta}.$$

The Bayesian MMSE estimator is known to minimize the MSE “on the average” (2.4). This means that the actual MSE will be high if the actual parameter $\theta_o$ is unlikely, and small if $f_\theta(\theta)$ is distributed around the true parameter $\theta_o$.

2.3 Maximum Likelihood Estimation

Although there are other relevant criteria, the maximum likelihood (ML) principle has become the most popular parametric method for deducing statistically optimal estimators of θ.
In the studied signal model (2.1), the observation is clearly a random variable due to the presence of the random vectors w and x. Actually, we have a single observation $y_o$ of this random variable from which the value of θ must be inferred. The ML estimator is the one choosing the value of θ (and implicitly the value of w and x) that makes $y_o$ the most likely observation. Formally, if $f_y(y; \theta)$ is the probability density function of the random vector y parameterized by θ, the ML estimator is given by

$$\hat{\theta}_{UML} = \arg\max_{\theta} \left\{ f_y(y_o; \theta) \right\}, \tag{2.6}$$

where $y_o$ is the vector of observed data² and

$$f_y(y; \theta) = E_x\left\{ f_{y/x,\theta}(y/x, \theta) \right\} = \int f_y(y; x, \theta)\, f_x(x)\, dx \tag{2.7}$$

is known as the unconditional likelihood function. Likewise, the estimator in (2.6) is known as the unconditional or stochastic maximum likelihood (UML) estimator because the nuisance parameters are modelled as random unknowns (Section 2.1). If the nuisance parameters are really random variables, the UML estimator is actually the true ML estimator of θ.

Alternatively, the nuisance parameters can be modelled as deterministic unknowns, as done for θ. In the context of the ML theory, the deterministic or conditional model is unavoidable when x is a constant unknown or there is no prior information about x (Section 2.1). Moreover, even if the nuisance parameters are actually random, the CML approach is often adopted if the expectation in (2.7) cannot be solved analytically. In that case, however, the CML solution is generally suboptimal because it ignores the prior information about x. Thus, the deterministic or conditional maximum likelihood (CML) estimator is formulated as follows:

$$\hat{\theta}_{CML} = \arg\max_{\theta} \max_{x} f_y(y; x, \theta) = \arg\max_{\theta} \left\{ f_y(y; \hat{x}_{ML}, \theta) \right\} \tag{2.8}$$

where $f_y(y; x, \theta)$ is the joint or conditional likelihood function and

$$\hat{x}_{ML} = \arg\max_{x} \max_{\theta} f_y(y; x, \theta) \tag{2.9}$$

is the ML estimator of x.
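The difference between averaging out and maximizing over the nuisance parameters can be illustrated with a small numerical sketch. The Python code below is a hypothetical toy example, not the signal model of this thesis: it assumes real observations $y_i = \theta x_i + w_i$ with equiprobable $x_i \in \{-1, +1\}$, Gaussian noise of known variance $\sigma^2$ and $\theta \geq 0$. In that case the unconditional log-likelihood reduces, up to terms that do not depend on θ, to $\sum_i \log\cosh(\theta y_i/\sigma^2) - N\theta^2/(2\sigma^2)$, whereas the conditional one becomes $\sum_i |\theta y_i|/\sigma^2 - N\theta^2/(2\sigma^2)$. All function names are made up for this illustration, and the maximization is a plain grid search.

```python
import numpy as np

def uml_loglik(theta, y, sigma):
    """Unconditional log-likelihood: x_i averaged out over {-1, +1}."""
    z = theta * y / sigma**2
    # log cosh(z) evaluated stably as logaddexp(z, -z) - log(2)
    log_cosh = np.logaddexp(z, -z) - np.log(2.0)
    return np.sum(log_cosh) - y.size * theta**2 / (2 * sigma**2)

def cml_loglik(theta, y, sigma):
    """Conditional log-likelihood compressed with x_i = sign(theta * y_i)."""
    return np.sum(np.abs(theta * y)) / sigma**2 - y.size * theta**2 / (2 * sigma**2)

def grid_argmax(loglik, y, sigma, grid):
    """Grid point maximizing the given log-likelihood."""
    values = np.array([loglik(t, y, sigma) for t in grid])
    return grid[np.argmax(values)]

def simulate(theta, sigma, n_obs, seed=0):
    """Draw y_i = theta * x_i + w_i for the assumed toy model."""
    rng = np.random.default_rng(seed)
    x = rng.choice([-1.0, 1.0], size=n_obs)
    return theta * x + sigma * rng.standard_normal(n_obs)

if __name__ == "__main__":
    grid = np.linspace(0.0, 3.0, 3001)
    for sigma in (0.3, 1.0):
        y = simulate(1.0, sigma, 2000)
        print(f"sigma={sigma}: UML={grid_argmax(uml_loglik, y, sigma, grid):.3f} "
              f"CML={grid_argmax(cml_loglik, y, sigma, grid):.3f}")
```

For this toy model the conditional estimate coincides with the mean of $|y_i|$, which tends to overestimate θ when the noise is strong because the implicit hard decisions on $x_i$ become unreliable.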
Comparing the UML and CML solutions in (2.6) and (2.8), we observe that in the unconditional model the nuisance parameters are averaged out using the prior $f_x(x)$ whereas in the conditional model $f_y(y; x, \theta)$ is compressed by means of the ML estimate of x, namely $\hat{x}_{ML}$. Also, it is worth noting that, if the nuisance parameters belong to a discrete alphabet, we are dealing with a detection problem and $\hat{x}_{ML}$ is actually the ML detector. It is found that the estimation of θ is significantly improved by exploiting the discrete³ nature of x. This aspect is crucial when designing estimation techniques for digital communications in which x is the vector of transmitted symbols.

² In the sequel, the random variable y and the observation $y_o$ will be indistinctly named y for the sake of simplicity.

Finally, the following alternative estimator is proposed:

$$\hat{\theta}_{CML2} = \arg\max_{\theta} \max_{x} f_y(y; x, \theta)\, f_x(x) = \arg\max_{\theta} \left\{ f_y(y; \hat{x}_{MAP}, \theta) \right\} \tag{2.10}$$

where

$$\hat{x}_{MAP} = \arg\max_{x} \max_{\theta} f_y(y; x, \theta)\, f_x(x) \tag{2.11}$$

is the Maximum a Posteriori (MAP) detector exploiting the prior distribution of x. Notice that $\hat{\theta}_{CML} = \hat{\theta}_{CML2}$ in case of equally likely nuisance parameters.

2.3.1 Decision Directed ML Estimation

Focusing on those estimation problems dealing with discrete nuisance parameters, the conditional ML estimators in equations (2.8) and (2.10) exploit the hard decisions provided by the ML or MAP detectors of x, respectively. In the context of digital communications, these estimation techniques are referred to as decision directed (DD). Decision-directed estimators are usually implemented by iterating equations (2.8) and (2.9) for the ML detector, or (2.10) and (2.11) for the MAP detector. The main drawback of iterative algorithms is the uncertain convergence to the global maximum of $f_y(y; x, \theta)$.

In some problems, decision directed methods are efficient at high SNR.
For example, in digital communications, DD synchronizers are known to attain the Cramér-Rao bound at high SNR [And94][Moe98]. However, when the noise variance is high, hard decisions are unreliable and it is better to compute soft decisions on the nuisance parameters. In digital communications, the estimation techniques based on soft decisions about the transmitted symbols are usually known as non-data-aided (NDA) [Men97]. Indeed, this interpretation is adopted in [Vaz00][Rib01b] to describe some ML-based NDA synchronizers.

In [Noe03], the Expectation-Maximization (EM) algorithm [Dem77][Fed88] is invoked to prove that UML estimation requires soft decisions from the MAP detector. More specifically, the soft information about the nuisance parameters is introduced by means of the a posteriori probabilities $f_x(a_1/y,\theta), \ldots, f_x(a_I/y,\theta)$, where $\{a_i\}_{i=1,\ldots,I}$ are all the possible values of $x$. The EM algorithm is then applied to obtain an iterative implementation of the UML estimator following the so-called Turbo principle [Mie00][Ber93]. In these schemes, the estimator (2.6) is assisted by the decoder soft decisions and vice versa. The EM foundation ensures the convergence to the UML solution under fairly general conditions. The required soft decisions are provided by the optimal MAP decoder proposed in [Bah74], which supplies at each iteration the a posteriori probability $f_x(x/y,\theta)$ for every possible value of $x$. It is worth noting that the UML estimator is able to exploit the statistical dependence introduced by the encoder whereas the conditional approach in (2.8)-(2.10) is not.

Footnote 3: In order to unify the study of continuous and discrete nuisance parameters, the prior $f_x(x)$ will be used in both cases. To do so, if $x \in \{a_1,\ldots,a_I\}$ with $I$ the alphabet size, $f_x(x)$ will be a finite number of Dirac's deltas, i.e., $f_x(x) = \sum_{i=1}^{I} p(a_i)\,\delta(x - a_i)$ with $p(a_i)$ the probability of $a_i$.
In the conditional model, the estimator is only informed that the codeword $x$ is a redundant vector and, thus, that it belongs to a reduced subset or codebook. In addition, the UML estimator is able to exploit the statistical dependence of the nuisance parameters in order to reduce their uncertainty at low SNR.

Another suitable implementation of the conditional estimators (2.8)-(2.10) is to assign a different estimator $\hat\theta$ to each survivor path in the Viterbi decoder, corresponding to a tentative sequence of symbols $x$. The estimator output is then used to recompute the metric of the associated path. This kind of method is usually referred to in the literature as Per Survivor Processing (PSP) [Pol95]. It can be shown that this approach attains the performance of the CML estimator in (2.8).

2.3.2 Asymptotic properties

The importance of the ML theory is that, under mild conditions, it supplies the minimum variance unbiased (MVU) estimator if the observed vector is sufficiently large. This result is a consequence of the asymptotic efficiency of the ML criterion, which is known to attain the Cramér-Rao lower bound as the number of observations increases (Section 2.6.1). Therefore, the ML theory provides a systematic procedure to formulate the MVU estimator in most estimation problems of interest. In this section, the most relevant properties of the ML estimator are stated [Kay93b, Sec. 7B]. If the observation size goes to infinity ($M \to \infty$), it can be shown that:

Property 1. The ML estimator is asymptotically Gaussian distributed with mean $\theta_o$ and covariance $B_{CRB}(\theta_o)$, where $\theta_o$ is the true parameter and $B_{CRB}(\theta_o)$ is the Cramér-Rao lower bound evaluated at $\theta_o$ (Section 2.6.1). This means that the ML estimator is asymptotically unbiased and efficient or, in other words, the ML estimator leads asymptotically ($M \to \infty$) to the minimum variance unbiased (MVU) estimator with

$$E_y\{\hat\theta_{ML}\} \longrightarrow \theta_o$$

$$E_y\left\{ (\hat\theta_{ML} - \theta_o)(\hat\theta_{ML} - \theta_o)^H \right\} \longrightarrow B_{CRB}(\theta_o).$$

Property 2. The ML estimator is asymptotically consistent, meaning that $\hat\theta_{ML} \to \theta_o$ as the observation size goes to infinity. This property implies that the CRB tends to zero as $M$ is increased, i.e., $B_{CRB}(\theta_o) \to 0$.

[Figure 2.1: This picture illustrates the significance of the term outlier in the context of ML estimation. Two panels plot $f_y(y;\theta)$ versus $\theta$, marking $\theta_o$ and $\hat\theta_{ML}$: the small-error case and the large-error case with an outlier.]

These properties are verified if the regularity condition

$$E_y\left\{ \left. \frac{\partial}{\partial\theta} \ln f_y(y;\theta) \right|_{\theta=\theta_o} \right\} = 0 \tag{2.12}$$

is guaranteed for every $\theta_o$. Fortunately, most problems of interest verify the above regularity condition. The implicit requirement is that the support of $f_y(y;\theta)$ on $y$ does not depend on the parameter $\theta$, so that the integral limits of $E_y\{\cdot\}$ are independent of $\theta$. This condition is needed to have unbiased estimates since (2.12) guarantees that $E_y\{\ln f_y(y;\theta)\}$ has a maximum at the true parameter $\theta_o$ whatever the value of $\theta_o$.

As proved in [Kay93b, Theorem 7.5], the first property on the optimality of the ML estimator is satisfied even for finite observations provided that the signal model is linear in $\theta$ and $x$. However, a large number of estimation problems are nonlinear in the parameter vector $\theta$. In that case, it is very important to determine how many samples ($M$) are required to guarantee the ML asymptotic efficiency (property 1). Fortunately, in most problems of interest this value is not excessive. It is found that the minimum $M$ depends on the signal model at hand as well as on the variance of the noise term $w$, say $\sigma_w^2$. If the value of $\sigma_w^2$ is low and/or $M$ is large, the log-likelihood function $\ln f_y(y;\theta)$ exhibits a parabolic shape (quadratic form) with a unique maximum near the true parameter $\theta_o$. Only in this small-error regime is the ML estimator statistically efficient, holding property 1.

[Figure 2.2: This picture illustrates the existence of an SNR threshold in nonlinear estimation problems. This threshold divides the SNR axis into the small-error regime, in which the CRB is attained, and the large-error regime, in which efficient estimators do not exist. Notice that the threshold position can be moved to the left by increasing the vector of observations. The plot shows the ML variance and the CRB versus SNR.]

On the other hand, if the value of $\sigma_w^2$ is high and/or $M$ is not sufficiently large, the likelihood function $f_y(y;\theta)$ becomes multimodal and large errors are committed when the level of a distant maximum or outlier exceeds the maximum near the true parameter (Fig. 2.1). In this large-error regime, the variance of the ML estimator departs abruptly from the CRB. It is found that the estimator enters the large-error regime if the noise variance $\sigma_w^2$ exceeds a given threshold. This threshold can be raised (i.e., a greater $\sigma_w^2$ is tolerated) if the observation size is increased and, therefore, the large-error region disappears as the observation size goes to infinity. This is the actual meaning of the ML asymptotic efficiency (property 1).

The existence of a low-SNR threshold in nonlinear estimation problems suggests distinguishing between the small-error and large-error scenarios (Fig. 2.2). In the first case, ML estimators are efficient and, hence, they attain the CRB (Section 2.6.1). Thus, the ML principle becomes the systematic way of deducing the MVU estimator in the small-error regime. Moreover, in the small-error case, the ML estimator is also optimal in terms of mean square error (Section 2.2). This conclusion is important because MMSE estimators are generally not realizable since they depend on the unknown parameter $\theta_o$.

On the other hand, efficient estimators do not exist in the large-error case and other lower bounds are needed to take into account the existence of large errors (or outliers) and predict the threshold effect (Section 2.6). In this large-error regime, unbiased estimators are generally not optimal from the MSE point of view and the MMSE solution establishes a trade-off between the variance and bias contributions (Section 2.2). In this context, the Bayesian theory allows deducing realizable estimators minimizing the so-called Bayesian MSE, which is the MSE averaged over all the possible values of the parameter $\theta$ [Kay93b, Sec. 10]. In this thesis, Chapter 3 is devoted to designing optimal second-order large-error estimators, whereas these results are particularized in Chapter 4 to formulate the optimal second-order small-error estimator.

To conclude this brief introduction to the maximum likelihood theory, two additional properties are presented next. These properties are satisfied even if the observation interval is finite.

Property 3. Whenever an efficient estimator exists, it corresponds to the ML estimator. In other words, if the MVU estimator attains the CRB, the ML estimator is also the MVU estimator. Otherwise, if the MVU variance is higher than the CRB, nothing can be stated about the optimality of the ML estimator for finite observations.

Property 4. The ML estimator is invariant in the sense that, if $\hat\theta_{ML}$ stands for the ML estimator of $\theta$, the ML estimator of $\alpha = g(\theta)$ is simply $\hat\alpha_{ML} = g(\hat\theta_{ML})$ for any one-to-one function $g(\cdot)$. Otherwise, if $g(\cdot)$ is not one-to-one, $\hat\alpha_{ML}$ maximizes the modified likelihood of $\alpha$, which is obtained as $\max_{\theta} f_y(y;\theta)$ subject to $g(\theta) = \alpha$ [Kay93b, Th. 7.2].

2.4 Linear Signal Model

The formulation of parameter estimation techniques from the general model introduced in (2.1) is mostly fruitless.
Accordingly, in the following, the focus will be on linear systems corrupted by additive Gaussian noise, for which

$$y = A(\theta)\,x + w \tag{2.13}$$

where $x \in \mathbb{C}^K$ is the system input forming the vector of nuisance parameters, $w \in \mathbb{C}^M$ is the Gaussian noise vector, and $A(\theta) \in \mathbb{C}^{M \times K}$ is the system response parameterized by the vector $\theta \in \mathbb{R}^P$. Despite its simplicity, the adopted linear model is important because it appears in a vast number of engineering applications. In the context of digital communications, this model applies to any linear modulation as well as to continuous phase modulations (CPM) thanks to Laurent's expansion [Lau86][Men97, Sec. 4.2] (Section 6.1.2).

We will assume that the noise vector in (2.13) is zero-mean and that its covariance matrix is known a priori, that is,

$$E\{w\} = 0, \qquad E\{w w^H\} = R_w,$$

with $R_w$ a given full-rank matrix. Furthermore, we will assume that $w$ is a proper or circular random vector, for which $E\{w w^T\} = 0$ [Sch03][Pic96]. The statistical distribution of the noise samples is normal (Gaussian), although the results in the following chapters could be easily extended to admit any other noise distribution. Finally, the noise variance is defined as

$$\sigma_w^2 \triangleq \frac{\mathrm{Tr}(R_w)}{M},$$

which is the variance of the noise samples $[w]_m$ if they are identically distributed. Additionally, we introduce the matrix $N$, which is defined as $N \triangleq \sigma_w^{-2} R_w$.

In the unconditional model, the nuisance parameters are modelled as random variables of known probability density function $f_x(x)$ with zero mean and uncorrelated entries (see footnote 4), meaning that

$$E\{x\} = \int x\, f_x(x)\, dx = 0, \qquad E\{x x^H\} = \int x x^H f_x(x)\, dx = I_K,$$

where $f_x(x)$ would be composed of a finite number of Dirac's deltas in case of discrete nuisance parameters. On the other hand, the nuisance parameters are possibly improper random variables with $E\{x x^T\} \neq 0$ [Sch03][Pic96].
This consideration is especially important in digital communications because some relevant modulations (e.g., BPSK and CPM) are actually improper or noncircular, i.e., $E\{x x^T\} \neq 0$.

Footnote 4: Notice that there is no loss of generality because the correlation of $x$ can always be included into the matrix $A(\theta)$ in (2.13).

In the linear signal model, the conditional or joint likelihood function is given by

$$f_y(y;\theta,x) = \frac{1}{\pi^M \det(R_w)} \exp\left( -\left\| y - A(\theta)x \right\|^2_{R_w^{-1}} \right) = C_1 \exp\left( 2\,\mathrm{Re}\left\{ x^H A^H(\theta) R_w^{-1} y \right\} - x^H A^H(\theta) R_w^{-1} A(\theta)\, x \right) \tag{2.14}$$

with

$$C_1 \triangleq \frac{\exp\left( -y^H R_w^{-1} y \right)}{\pi^M \det(R_w)}$$

an irrelevant factor independent of $\theta$.

On the other hand, the unconditional likelihood function in (2.7) does not admit a general analytical solution, even for the linear model presented in this section. By substituting (2.14) into (2.7), it is found that the unconditional likelihood function is given by (see footnote 5)

$$f_y(y;\theta) = C_1\, E_x\left\{ \exp\left( 2\,\mathrm{Re}\left\{ x^H A^H R_w^{-1} y \right\} - x^H A^H R_w^{-1} A x \right) \right\}. \tag{2.15}$$

Moreover, in case of i.i.d. nuisance parameters, the expectation with respect to $x$ involves the following expressions:

$$E_x\left\{ \exp\left( 2\,\mathrm{Re}\left\{ x^H A^H R_w^{-1} y \right\} \right) \right\} = \prod_{k=1}^{K} E_x\left\{ \exp\left( 2\,\mathrm{Re}\left\{ x_k^* a_k^H R_w^{-1} y \right\} \right) \right\}$$

$$E_x\left\{ \exp\left( x^H A^H R_w^{-1} A x \right) \right\} = E_x\left\{ \exp\left( \sum_{k=1}^{K} \sum_{l=1}^{K} x_k^* x_l\, a_k^H R_w^{-1} a_l \right) \right\} = E_x\left\{ \exp\left( \sum_{k=1}^{K} \sum_{l>k} 2\,\mathrm{Re}\left\{ x_k^* x_l\, a_k^H R_w^{-1} a_l \right\} + \sum_{k=1}^{K} |x_k|^2 a_k^H R_w^{-1} a_k \right) \right\}$$

where $x_k \triangleq [x]_k$ and $a_k \triangleq [A]_k$ are the $k$-th element and column of $x$ and $A$, respectively. The above expectations over the nuisance parameters have only been solved analytically in case of Gaussian nuisance parameters (Section 2.4.3) and polyphase discrete alphabets, as shown in Appendix 2.A. However, a general closed-form solution is not available. In the next subsections, some alternative criteria are proposed to circumvent the computation of the exact unconditional likelihood function (2.15).

2.4.1 Low-SNR Unconditional Maximum Likelihood

The usual way of finding the UML estimator is the evaluation of (2.15) assuming a very low SNR [Vaz00][Men97].
The low-SNR regime constitutes a worst-case situation leading to robust estimators of $\theta$. When the noise variance increases, the exponent of (2.15) becomes very small and, therefore, the exponential can be expanded into the following Taylor series:

$$f_y(y;\theta) \simeq C_2\, E_x\left\{ 1 + \chi(y;\theta,x) + \tfrac{1}{2}\chi^2(y;\theta,x) \right\} \tag{2.16}$$

where $\chi(y;\theta,x) \triangleq 2\,\mathrm{Re}\{x^H A^H R_w^{-1} y\} - x^H A^H R_w^{-1} A x$ is the exponent of (2.15) [Vaz00]. Assuming that the nuisance parameters are circular, zero-mean, unit-power and uncorrelated, the expectation in (2.16) is evaluated, obtaining

$$E_x\{\chi(y;\theta,x)\} = -\mathrm{Tr}\left( A^H R_w^{-1} A \right) = -\sigma_w^{-2}\, \mathrm{Tr}\left( A^H N^{-1} A \right)$$

$$E_x\{\chi^2(y;\theta,x)\} = 2\,\mathrm{Tr}\left( R_w^{-1} A A^H R_w^{-1} \hat R \right) + \zeta(\theta) = 2\sigma_w^{-4}\, \mathrm{Tr}\left( N^{-1} A A^H N^{-1} \hat R \right) + \zeta(\theta)$$

where $\hat R \triangleq y y^H$ is the sample covariance matrix and $\zeta(\theta) \triangleq \sigma_w^{-4} E_x\{(x^H A^H N^{-1} A x)^2\}$ has not been expanded because it is negligible compared to $E_x\{\chi(y;\theta,x)\}$ for $\sigma_w^2 \to \infty$.

Footnote 5: For the sake of clarity, the dependence on $\theta$ is omitted from $A(\theta)$ in the following expressions.

Finally, bearing in mind that $\ln(1+x) \simeq x$ for $x \simeq 0$ and omitting constant terms, the low-SNR log-likelihood function becomes

$$\ln f_y(y;\theta) \propto -\mathrm{Tr}\left( A^H R_w^{-1} A \right) + \mathrm{Tr}\left( R_w^{-1} A A^H R_w^{-1} \hat R \right) = \mathrm{Tr}\left( R_w^{-1} A A^H R_w^{-1} \left( \hat R - R_w \right) \right), \tag{2.17}$$

proving that the sample covariance matrix $\hat R \triangleq y y^H$ is a sufficient statistic for the estimation of $\theta$ in the studied linear model if the SNR goes to zero. More precisely, the log-likelihood function in (2.17) is an affine transformation of the sample covariance matrix with

$$b(\theta) = -\mathrm{Tr}\left( A^H(\theta) R_w^{-1} A(\theta) \right)$$

$$M(\theta) = R_w^{-1} A(\theta) A^H(\theta) R_w^{-1}$$

the independent term and the kernel of $\ln f_y(y;\theta)$, respectively. Notice that this result is independent of the actual distribution of the nuisance parameters $f_x(x)$. Actually, the result is valid for any circular distribution having zero mean and unit variance. Finally, the explicit formula for the UML estimator at low SNR is given by

$$\hat\theta_{lowSNR} = \arg\max_{\theta}\, \mathrm{Tr}\left( N^{-1} A A^H N^{-1} \left( \hat R - R_w \right) \right). \tag{2.18}$$

This result is relevant because it states that, in low-SNR scenarios, second-order techniques are asymptotically efficient for any estimation problem following the linear model in (2.13). Actually, this conclusion was the starting point of this thesis.

Unfortunately, the low-SNR solution has some important drawbacks. In Appendix 2.B, it is shown that the low-SNR approximation usually yields biased estimates for any positive SNR. Moreover, the low-SNR UML estimator might yield a significant variance floor when applied in high-SNR scenarios due to the variance induced by the random nuisance parameters (Appendix 2.B). This variability is usually referred to as self-noise or pattern-noise in digital synchronization [Men97].

Despite these potential problems, the low-SNR approximation is extensively used in the context of digital communications, and ad hoc methods are introduced to mitigate or cancel the self-noise contribution at high SNR. On the other hand, the ML-based estimators proposed in the following sections are suitable candidates to cancel out the bias and self-noise terms at high SNR. However, our main contribution in Chapter 4 is proving that all of them are suboptimal in terms of self-noise cancellation when applied to polyphase alphabets such as MPSK.

To conclude this section, we notice that a term depending on $y y^T$ also appears in $E_x\{\chi^2(y;\theta,x)\}$ when dealing with noncircular nuisance parameters. Therefore, the low-SNR log-likelihood function should be modified in the following way:

$$\ln f_y(y;\theta) \propto -\mathrm{Tr}\left( A^H R_w^{-1} A \right) + \mathrm{Tr}\,\mathrm{Re}\left\{ R_w^{-1} A A^H R_w^{-1} \hat R + R_w^{-T} A^* \Gamma^H A^H R_w^{-1}\, y y^T \right\},$$

with $\Gamma \triangleq E\{x x^T\}$ the improper covariance matrix of $x$. Furthermore, if $x$ is real-valued (e.g., in baseband communications or for the BPSK modulation), it follows that $\Gamma = E\{x x^H\} = I_K$. Notice that this second term is the one exploited in Section 6.2 to estimate the carrier phase because the term on $\hat R$ does not provide information about the signal phase.

2.4.2 Conditional Maximum Likelihood (CML)

In this section, the CML criterion in (2.8) is formulated for the linear signal model in (2.13). In that case, the conditional likelihood function in (2.14) can be compressed with respect to $x$ if the nuisance parameters are continuous variables, i.e., $x \in \mathbb{C}^K$. If the nuisance parameters are discrete (e.g., in digital communications), this compression strategy yields a suboptimal version of the CML estimator formulated in (2.8). This suboptimal CML estimator has been successfully applied to different estimation problems in digital communications such as timing synchronization [Rib01b]. Some degradation is incurred because the estimator does not exploit the fact that $x$ belongs to a finite alphabet. As shown in Section 2.4.1, this information is irrelevant at low SNR but it is crucial when the noise term vanishes at high SNR. Nonetheless, in the following, we will refer to this estimator as the CML estimator regardless of having discrete or continuous nuisance parameters.

Therefore, if there is absolutely no information about $x$, the nuisance parameters must be assumed deterministic, continuous unknowns. Then, the ML estimator of $x$ in (2.9) is obtained in the linear case by solving a classical weighted least squares (WLS) problem leading to

$$\hat x_{ML}(\theta) = \left( A^H(\theta) R_w^{-1} A(\theta) \right)^{-1} A^H(\theta) R_w^{-1} y,$$

assuming that $A(\theta)$ is a tall matrix, i.e., $M > K$ [Sch91a, Sec. 9.12]. After some algebra, the corresponding log-likelihood function is given by

$$\ln f_y(y;\theta,\hat x_{ML}(\theta)) \propto -\left\| y - A(\theta)\,\hat x_{ML}(\theta) \right\|^2_{R_w^{-1}} \propto \mathrm{Tr}\left( R_w^{-1} A \left( A^H R_w^{-1} A \right)^{-1} A^H R_w^{-1} \hat R \right), \tag{2.19}$$

becoming a linear transformation of the sample covariance matrix and, thus, a quadratic function of the observation $y$.
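Both the low-SNR UML criterion (2.17) and the compressed CML criterion (2.19) are traces of a kernel applied to the sample covariance matrix, so they can be evaluated with a few matrix products. The following is a minimal sketch under assumptions made only for illustration: a single complex exponential with unknown normalized frequency plays the role of $A(\theta)$ (so $K = 1$), the noise is white, and the maximum is located by grid search. The function names are hypothetical.

```python
import numpy as np

def lowsnr_uml_cost(A, Rw, R_hat):
    # Tr( Rw^-1 A A^H Rw^-1 (R_hat - Rw) ), as in (2.17).
    B = np.linalg.solve(Rw, A)                 # Rw^-1 A; B^H = A^H Rw^-1 since Rw is Hermitian
    return np.real(np.trace(B @ B.conj().T @ (R_hat - Rw)))

def cml_cost(A, Rw, R_hat):
    # Tr( Rw^-1 A (A^H Rw^-1 A)^-1 A^H Rw^-1 R_hat ), as in (2.19).
    B = np.linalg.solve(Rw, A)                 # Rw^-1 A
    G = A.conj().T @ B                         # A^H Rw^-1 A
    return np.real(np.trace(B @ np.linalg.solve(G, B.conj().T) @ R_hat))

def steering(theta, M):
    # Assumed toy system response: one complex exponential of normalized frequency theta.
    return np.exp(2j * np.pi * theta * np.arange(M)).reshape(M, 1)

rng = np.random.default_rng(0)
M, theta_true, sigma2 = 32, 0.2, 0.1
x = np.exp(1j * np.pi / 4)                     # one unit-power (QPSK-like) nuisance symbol
w = np.sqrt(sigma2 / 2) * (rng.standard_normal(M) + 1j * rng.standard_normal(M))
y = steering(theta_true, M)[:, 0] * x + w

R_hat = np.outer(y, y.conj())                  # sample covariance matrix y y^H
Rw = sigma2 * np.eye(M)

grid = np.linspace(0.0, 0.5, 501)
theta_lowsnr = grid[int(np.argmax([lowsnr_uml_cost(steering(t, M), Rw, R_hat) for t in grid]))]
theta_cml = grid[int(np.argmax([cml_cost(steering(t, M), Rw, R_hat) for t in grid]))]
print(theta_lowsnr, theta_cml)
```

In this particular toy case $A^H N^{-1} A$ is a scalar that does not depend on $\theta$, so both criteria are monotone functions of the same quantity and are expected to select the same grid point; with $K > 1$ and a non-orthogonal $A(\theta)$ they generally differ.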
Finally, the CML estimator of $\theta$ is computed as follows:

$$\hat\theta_{CML} = \arg\max_{\theta} \ln f_y(y;\theta,\hat x_{ML}(\theta)) = \arg\max_{\theta}\, \mathrm{Tr}\left( N^{-1} A \left( A^H N^{-1} A \right)^{-1} A^H N^{-1} \hat R \right) = \arg\max_{\theta}\, \mathrm{Tr}\left( M(\theta)\, \hat R \right), \tag{2.20}$$

with

$$M(\theta) = N^{-1} A(\theta) \left( A^H(\theta) N^{-1} A(\theta) \right)^{-1} A^H(\theta) N^{-1}$$

the associated kernel. The resulting estimator orthogonally projects the whitened observation $N^{-1/2} y$ onto the subspace generated by the columns of $N^{-1/2} A(\theta)$. Clearly, the above solution is related to subspace methods like MUSIC [Sch79][Bie80][Sto89][Sto97]. In fact, the CML estimator in (2.20) is equivalent to a variant of the MUSIC algorithm proposed in [Sto89].

It can be seen that the CML estimator in (2.20) corresponds to the low-SNR UML estimator in (2.18) if $R_w^{-1/2} A(\theta)$ is unitary or, in other words,

$$A^H(\theta)\, N^{-1} A(\theta) \propto I_K. \tag{2.21}$$

If the above equation is not fulfilled, the CML estimator might suffer from noise enhancement at low SNR when the observation length is limited. In that case, the low-SNR UML estimator deduced in Section 2.4.1 outperforms the CML estimator in the low-SNR regime because the former exploits the a priori statistical knowledge about $x$.

The CML solution is shown in Appendix 2.C to satisfy the following regularity condition:

$$E_y\left\{ \left. \frac{\partial}{\partial\theta} \ln f_y(y;\theta,\hat x_{ML}(\theta)) \right|_{\theta=\theta_o} \right\} = 0$$

and, therefore, the CML estimator is always unbiased and self-noise free even for finite observations. Another significant feature of the CML solution is that it is not necessary to know the variance of the noise samples $\sigma_w^2$.

2.4.3 Gaussian Maximum Likelihood (GML)

The Gaussian assumption on the nuisance parameters is generally adopted when the actual distribution is unknown or becomes an obstacle to computing the expectation in (2.15). The Gaussian assumption is known to yield almost optimal second-order estimators on account of the Central Limit Theorem. This subject is addressed throughout this dissertation and the asymptotic efficiency of the Gaussian assumption is studied in Chapter 7.
If the nuisance parameters are Gaussian, the observed vector $y$ is also Gaussian in the studied linear signal model. Thus, we have that

$$f_y(y;\theta) = \frac{\exp\left( -y^H R^{-1}(\theta)\, y \right)}{\pi^M \det(R(\theta))}, \tag{2.22}$$

where $y$ is zero-mean and

$$R(\theta) \triangleq E\{\hat R\} = E\{y y^H\} = A(\theta) A^H(\theta) + R_w \tag{2.23}$$

is the covariance matrix of $y$. Once again, the log-likelihood solution is an affine transformation of the sample covariance matrix that, omitting constant additive terms, is given by

$$\ln f_y(y;\theta) = \ln E_x\{ f_y(y/x;\theta) \} = -\ln\det(R(\theta)) - \mathrm{Tr}\left( R^{-1}(\theta)\, \hat R \right). \tag{2.24}$$

Therefore, bearing in mind that $\ln\det(M) = \mathrm{Tr}\ln(M)$, it is found that

$$b(\theta) = -\ln\det(R(\theta)) = -\mathrm{Tr}\left( \ln R(\theta) \right)$$

$$M(\theta) = -R^{-1}(\theta) = -\left( A(\theta) A^H(\theta) + R_w \right)^{-1}$$

are the independent term and the kernel of the GML likelihood function, respectively. Consequently, the GML estimator is computed as follows:

$$\hat\theta_{GML} = \arg\min_{\theta}\, \mathrm{Tr}\left( \ln R(\theta) + R^{-1}(\theta)\, \hat R \right). \tag{2.25}$$

In Appendix 2.D, we prove that the GML estimator converges to the low-SNR UML solution (2.18) for $\sigma_w^2 \to \infty$ and to the CML solution (2.20) for $\sigma_w^2 \to 0$. Therefore, the GML estimator is asymptotically efficient at low SNR and, evidently, for any SNR if the nuisance parameters are Gaussian. Indeed, any statistical assumption about the nuisance parameters leads to the UML solution (2.18) at low SNR. Consequently, the GML estimator can only be outperformed using quadratic techniques in the medium-to-high SNR interval if the nuisance parameters are non-Gaussian random variables. This subject is addressed thoroughly in subsequent chapters.

2.5 Maximum Likelihood Implementation

Generally, the ML-based estimators presented in the last section do not admit an analytical solution (see footnote 6) and the maximization of the associated log-likelihood function must be carried out using numerical techniques. In that case the log-likelihood function should be sampled. If the sample separation is decided according to the sampling theorem, the ML estimate can be determined by means of ideal interpolation. Otherwise, if the sampling rate violates the Nyquist criterion, a gradient-based algorithm can be applied to find the maximum of $\ln f_y(y;\theta)$. Moreover, if the gradient of $\ln f_y(y;\theta)$ has a single root in the parameter space, a gradient-based algorithm is able to find the maximum of $\ln f_y(y;\theta)$ without any assistance. Nonetheless, in a multimodal problem, the same gradient-based method might converge to a local maximum unless a preliminary search for the global maximum is performed.

Footnote 6: An exception is the estimation of the carrier phase in digital communications (see Section 6.2).

The utilization of a gradient-based or iterative algorithm is generally preferred because it has a lower complexity than the grid search implementation (see footnote 7). The convergence of gradient-based methods is guaranteed if and only if the Hessian matrix is negative definite (and lower bounded) in the closed subset $\Theta = \{\theta \mid f_y(y;\theta) \geq f_y(y;\hat\theta_0)\}$, with $\hat\theta_0$ the initial guess [Boy04, Sec. 8.3].

Among the existing gradient-based methods, the Newton-Raphson algorithm is extensively adopted because its convergence is quadratic, instead of linear, when the recursion approaches the log-likelihood maximum ($\hat\theta_{ML}$) [Boy04, Sec. 8.5]. Other methods are the steepest descent method, conjugate gradient and quasi-Newton methods, among many others (see [Boy04][Lue84] and references therein). The Newton-Raphson iteration is given by

$$\hat\theta_{k+1} = \hat\theta_k - H^{-1}(y;\hat\theta_k)\, \nabla(y;\hat\theta_k) \tag{2.26}$$

where $k$ is the iteration index and

$$\nabla(y;\theta) \triangleq \frac{\partial \ln f_y(y;\theta)}{\partial\theta}, \qquad H(y;\theta) \triangleq \frac{\partial^2 \ln f_y(y;\theta)}{\partial\theta\, \partial\theta^T}$$

are the gradient and the Hessian of the log-likelihood function, respectively. Notice that, in a low-SNR scenario (2.17) and/or if the nuisance parameters are Gaussian (2.24), $\nabla(y;\theta)$ is linear in the sample covariance matrix $\hat R \triangleq y y^H$. In that case, the Newton-Raphson recursion in (2.26) is quadratic in the observation $y$.
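The recursion (2.26) can be sketched in a few lines. The example below is a hedged illustration rather than any estimator from this thesis: the signal model (a noisy exponential decay with unknown rate), the analytical gradient and Hessian, the initial guess, and all names are assumptions chosen so that the log-likelihood is nonlinear in a single parameter.

```python
import numpy as np

def newton_raphson(grad, hess, theta0, n_iter=20, tol=1e-10):
    # Iterate theta_{k+1} = theta_k - H^{-1}(theta_k) * grad(theta_k), as in (2.26).
    theta = np.atleast_1d(np.asarray(theta0, dtype=float))
    for _ in range(n_iter):
        step = np.linalg.solve(hess(theta), grad(theta))
        theta = theta - step
        if np.linalg.norm(step) < tol:
            break
    return theta

# Assumed toy model: y_m = exp(-theta * t_m) + Gaussian noise of known variance.
rng = np.random.default_rng(2)
t = np.linspace(0.0, 3.0, 50)
theta_true, sigma = 1.0, 0.01
y = np.exp(-theta_true * t) + sigma * rng.standard_normal(t.size)

def grad(theta):
    # Derivative of ln f_y = -sum((y - s)^2) / (2 sigma^2) with s = exp(-theta t).
    s = np.exp(-theta[0] * t)
    return np.array([np.sum((y - s) * (-t * s)) / sigma**2])

def hess(theta):
    # Second derivative of the same log-likelihood (a 1x1 Hessian here).
    s = np.exp(-theta[0] * t)
    return np.array([[np.sum(-(t * s) ** 2 + (y - s) * t**2 * s) / sigma**2]])

theta_hat = newton_raphson(grad, hess, theta0=[0.8])
print(theta_hat)
```

The initial guess is deliberately placed where the Hessian is negative, in line with the convergence condition quoted above; starting far from the maximum of a multimodal log-likelihood would require the preliminary search mentioned in the text.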
The quadratic convergence of the Newton-Raphson algorithm is accelerated when approaching $\hat\theta_{ML}$ because $\ln f_y(y;\theta)$ becomes approximately parabolic around the current estimate $\hat\theta_k$, that is,

$$\ln f_y(y;\theta) \simeq \ln f_y(y;\hat\theta_k) + \nabla^T(y;\hat\theta_k)\,(\theta - \hat\theta_k) + \frac{1}{2}\, \mathrm{Tr}\left( H(y;\hat\theta_k)\, (\theta - \hat\theta_k)(\theta - \hat\theta_k)^T \right)$$

and, therefore, (2.26) yields approximately the ML solution for $\|\hat\theta_k - \hat\theta_{ML}\|$ sufficiently small. Notice that $\ln f_y(y;\theta)$ is strictly quadratic in case of linear estimation problems having additive Gaussian noise [Kay93b, Theorem 3.5]. In that case, the ML estimate is obtained after a single iteration of the Newton-Raphson algorithm. Otherwise, the convergence rate is slow if the log-likelihood curvature is large near the maximum $\hat\theta_{ML}$. In that case, however, the estimation accuracy is found to be superior.

Footnote 7: Recall that the parameter $\theta \in \mathbb{R}^P$ is a continuous variable and we are assuming that $f_y(y;\theta)$ is continuously differentiable in $\theta$.

The Newton-Raphson method in (2.26) can be generalized to estimate a given transformation of the parameter [Sto01][Kay93b, Sec. 3.8] as follows:

$$\hat\alpha_{k+1} = \hat\alpha_k - D_g(\hat\theta_k)\, H^{-1}(y;\hat\theta_k)\, \nabla(y;\hat\theta_k) \tag{2.27}$$

where $\alpha = g(\theta)$ is the referred transformation, $D_g(\theta) \triangleq \partial g(\theta)/\partial\theta^T$ is the Jacobian of $g(\theta)$ and $\hat\alpha_{ML} = g(\hat\theta_{ML})$ holds from the invariance property of the ML estimator (Section 2.3.2).

According to the asymptotic properties of the ML estimator (Section 2.3.2), it follows that any iterative method converging to the ML solution is asymptotically ($M \to \infty$) consistent and efficient if the ML regularity condition (2.12) is satisfied. In the asymptotic case, the small-error condition is verified and the ML estimator attains the Cramér-Rao bound (Section 2.6.1), which is given by

$$B_{CRB}(\theta_o) \triangleq D_g(\theta_o)\, J^{-1}(\theta_o)\, D_g^H(\theta_o),$$

where

$$J(\theta) \triangleq -E_y\{ H(y;\theta) \} = E_y\left\{ \nabla(y;\theta)\, \nabla^H(y;\theta) \right\} \tag{2.28}$$

is the Fisher information matrix (FIM) and the expectation is computed with respect to the random observation $y$.
The last equality is a consequence of the regularity condition (2.12) [Kay93b, Appendix 3A].

The asymptotic efficiency is also guaranteed if the Newton-Raphson method (2.27) is substituted by the following scoring method:

$$\hat\alpha_{k+1} = \hat\alpha_k + D_g(\hat\theta_k)\, J^{-1}(\hat\theta_k)\, \nabla(y;\hat\theta_k), \tag{2.29}$$

in which the Hessian matrix is replaced by the negative of its expected value (2.28). The method of scoring is preferred because it improves the convergence to the ML solution for short data records, mainly in multiparametric problems. However, both methods are equivalent if the observation size goes to infinity.

2.5.1 ML-Based Closed-Loop Estimation

Conventionally, ML estimators are developed in batch mode, that is, the $M$ samples of $y$ are recorded first and, afterwards, $\ln f_y(y;\theta)$ is iteratively maximized in order to find the ML estimate $\hat\theta_{ML}$. Unfortunately, the complexity and latency of this batch-mode implementation are excessive when long observations are required to comply with the specifications. To ameliorate this problem, the long observation $y$ is fragmented into smaller blocks $\{z_n\}_{n=1,\ldots,N}$ that are ergodic realizations of the same distribution $f_z(z;\theta)$. The minimum block size is one sample, in which case the estimator would work on a sample-by-sample basis. Identically distributed blocks are feasible if the observation is (cyclo-)stationary.

[Figure 2.3: Sequential processing of the received vector $y$ in the context of digital communications. The observed blocks $\{z_n\}$ last $M = 4$ samples and are taken every $N_{ss} = 2$ samples, where $N_{ss}$ is the number of samples per symbol.]
In Appendix 2.E, it is shown that the following closed-loop estimator,

$$\hat\alpha_{n+1} = \hat\alpha_n + \mu\, D_g(\hat\theta_n)\, J_z^{-1}(\hat\theta_n)\, \nabla_z(z_n;\hat\theta_n), \tag{2.30}$$

is efficient in the small-error regime if the $N$ partial observations $z_n$ are statistically independent, where

$$\nabla_z(z;\theta) \triangleq \frac{\partial \ln f_z(z;\theta)}{\partial\theta}$$

$$J_z(\theta) \triangleq -E_z\left\{ \frac{\partial^2 \ln f_z(z;\theta)}{\partial\theta\, \partial\theta^T} \right\} = E_z\left\{ \nabla_z(z;\theta)\, \nabla_z^H(z;\theta) \right\} = \frac{1}{N} J(\theta)$$

are the gradient and the FIM for the block-size observations $\{z_n\}_{n=1,\ldots,N}$, respectively. The step-size or forgetting factor $\mu$ is selected to achieve the same performance as the off-line recursions in (2.27) and (2.29). If $N$ is sufficiently large, the parameter $\mu$ must be set to approximately $2/N$ (Appendix 2.E).

Although closed-loop estimators have the same form as their off-line versions in (2.27) and (2.29), the closed-loop scheme in (2.30) aims at maximizing the stochastic likelihood function $f_z(z;\theta)$, which has a time-varying shape. Therefore, the gradient $\nabla_z(z;\theta)$ is also a random vector pointing in the direction of the maximum of $f_z(z;\theta)$. Thus, the ML-based closed-loop estimator proposed in (2.30) belongs to the family of stochastic gradient algorithms. Indeed, equation (2.30) is referred to as the natural gradient in the context of neural learning [Ama98].

Although the closed-loop estimator in (2.30) has been deduced assuming $N$ independent blocks, the necessary and sufficient condition for efficiency is more general and is formulated next.

Proposition 2.1 The closed-loop estimator proposed in (2.30) is efficient in the small-error regime if and only if there is at least one block $z_n$ in which each sample $[y]_m$ ($m = 1, \ldots, M$) is jointly processed with all the samples that are statistically dependent on it.

The above proposition implies in most cases the partial overlapping of the observed blocks. This means that the same sample is processed more than once. For example, in digital communications the received signal is cyclostationary if we have $N_{ss} > 1$ samples per symbol.
The data symbols are usually i.i.d. random variables that modulate a known pulse $p(t)$ lasting $L N_{ss}$ samples. In that case, the optimal performance is achieved if the block size is equal to $L N_{ss}$ samples and the block separation is one sample. However, in order to have identically distributed blocks, the block separation is usually set to $N_{ss}$, taking into account the signal cyclostationarity (see Fig. 2.3).

As previously stated, closed-loop estimators yield efficient estimates if the small-error regime is attained in the steady state. However, the initial guess $\hat\theta_0$ is usually far away from the true parameter $\theta_o$ and the algorithm has to converge towards $\theta_o$. The initial convergence constitutes the estimator acquisition and has been studied for a long time [Mey90]. Unfortunately, only approximate results are available on the mean acquisition time, lock-in and lock-out probabilities, etc. [Mey90]. The step-size $\mu$ in equation (2.30) can be adjusted to trade acquisition speed (large $\mu$) against steady-state performance (small $\mu$).

Closed Loop Architecture

The ML-based closed loop proposed in (2.30) has two components (Fig. 2.4): a nonlinear discriminator (or detector) of the estimation error, and a first-order loop filter. The discriminator input-output response is given by

$$e(z_n;\hat\theta) = D_g(\hat\theta)\, J_z^{-1}(\hat\theta)\, \nabla_z(z_n;\hat\theta)$$

where $\hat\theta = \hat\theta_n$ is the current estimate serving as a reference to infer the estimation error $g(\hat\theta_n) - g(\theta_o)$ at time $n$. The mean value of the discriminator output is given by

$$E_z\{ e(z_n;\hat\theta) \} = D_g(\hat\theta)\, J_z^{-1}(\hat\theta)\, E_z\{ \nabla_z(z_n;\hat\theta) \}. \tag{2.31}$$

[Figure 2.4: Block diagram for the ML closed-loop estimator in equation (2.30). The diagram shows the discriminator, fed by $z_n$ and $\hat\theta_n$ and producing $e(z_n;\hat\theta_n)$, followed by the loop filters that deliver $\hat\theta_n$ and $\hat\alpha_n$. The same scheme is applicable to any other closed-loop estimator or tracker if the discriminator and/or the loop filters are conveniently modified.]
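The structure of the closed loop (2.30), a discriminator followed by a first-order update, can be sketched for a deliberately simple case. Everything below is an assumption made for illustration: the parameter is the mean of independent Gaussian blocks (a linear problem, so the discriminator is exact), $g(\theta) = \theta$ so that $D_g = 1$, and the function name is hypothetical.

```python
import numpy as np

def closed_loop_mean(blocks, sigma2, mu, theta0=0.0):
    # First-order closed loop of the form (2.30) for the mean of Gaussian blocks.
    theta = theta0
    history = []
    for z in blocks:
        grad = np.sum(z - theta) / sigma2      # gradient of ln f_z(z; theta) at the current estimate
        fim = z.size / sigma2                  # Fisher information J_z of one block
        theta = theta + mu * grad / fim        # discriminator output scaled by the step-size
        history.append(theta)
    return np.array(history)

rng = np.random.default_rng(1)
theta_true, sigma2 = 2.0, 1.0
N, block_len = 100, 4
# Run for several times N blocks so that the acquisition transient has died out.
blocks = theta_true + np.sqrt(sigma2) * rng.standard_normal((5 * N, block_len))

track = closed_loop_mean(blocks, sigma2, mu=2.0 / N)   # mu ~ 2/N as stated in the text
theta_final = track[-1]
print(theta_final)
```

Increasing `mu` shortens the initial transient in `track` at the cost of a noisier steady state, which is the acquisition versus steady-state trade-off described above.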
It can be shown that the discriminator output is unbiased in the neighbourhood of the equilibrium point $\theta = \theta_o$ because

$$E_z\{e(z_n;\theta_o)\} = 0, \qquad \left.\frac{\partial}{\partial\theta_o^T} E_z\{e(z_n;\theta)\}\right|_{\theta=\theta_o} = D_g(\theta_o),$$

taking into account that

$$E_z\{\nabla_z(z_n;\theta_o)\} = 0$$

$$\left.\frac{\partial}{\partial\theta_o^T} E_z\{\nabla_z(z_n;\theta)\}\right|_{\theta=\theta_o} = -\left.\frac{\partial}{\partial\theta^T} E_z\{\nabla_z(z_n;\theta)\}\right|_{\theta=\theta_o} = -E_z\left\{\left.\frac{\partial^2 \ln f_z(z_n;\theta)}{\partial\theta\,\partial\theta^T}\right|_{\theta=\theta_o}\right\} = J_z(\theta_o)$$

is always verified in the studied linear signal model (Section 2.4). The first equation is the classical regularity condition introduced in (2.12) and the second equation is the Fisher's information matrix $J_z(\theta)$. Precisely, $J_z^{-1}(\theta)$ normalizes the discriminator slope in (2.31) to have unbiased estimates of $\theta - \theta_o$. The Jacobian matrix $D_g(\theta)$ is then used to obtain unbiased estimates of $g(\theta) - g(\theta_o)$, taking into account that $g(\theta)$ can be linearized around $\theta \simeq \theta_o$ using the first-order Taylor expansion $g(\theta) \simeq g(\theta_o) + D_g(\theta_o)(\theta - \theta_o)$.

In some problems of digital communications, the discriminator mean value (2.31) only depends on the estimation error $\theta - \theta_o$ and is named the discriminator S-curve because it looks like an "S" rotated by 90° [Men97][Mey90].

2.5.2 ML-based Tracking

An important feature of the stochastic gradient methods previously presented is the ability to track the evolution of slowly time-varying parameters. Thus, let us consider that $\theta_n$ is a time-varying parameter and $\alpha_n = g(\theta_n)$ a given transformation. The closed loop in (2.30) must be modified to track the parameter evolution and supply unbiased estimates of $\theta_n$ in the steady-state.

A first-order loop filter was used in the last section because the parameter was constant, i.e., $\theta_n = \theta_o$ (Fig. 2.4). However, if $\theta_n$ has a polynomial evolution in time, i.e.,

$$\theta_n = \theta_o + \sum_{r=1}^{R-1} \delta_r n^r,$$

an $R$th-order loop filter is required to track $\theta_n$ without systematic or pursuit errors [Men97][Mey90].
For example, if $\theta_o$ is the carrier phase and we are designing a phase-lock loop (PLL), $\delta_1$ corresponds to the Doppler frequency and $\delta_2$ to the Doppler rate.

An alternative way of taking the parameter dynamics into account is the one adopted in the Kalman filter theory [Kay93b, Ch. 13]. In this framework, a dynamical model (or state equation) is assumed for the parameters of interest,

$$\theta_{n+1} = f(\theta_n),$$

where $\theta_n$ stacks all the parameters involved in the dynamical model, i.e., $\theta = [\theta_o^T, \delta_1^T, \ldots, \delta_{R-1}^T]^T$ for the polynomial model above. Although the parameter dynamics are generally nonlinear, they can be linearized around the current estimate $\hat{\theta}_n$, leading to the following approximation

$$f(\theta_n) \simeq f(\hat{\theta}_n) + D_f(\hat{\theta}_n)(\theta_n - \hat{\theta}_n)$$

where $D_f(\theta) \triangleq \partial f(\theta)/\partial\theta^T$ is the Jacobian of $f(\theta)$. If the parameter dynamics are incorporated into the original closed loop (2.30), we obtain the following higher-order tracker

$$\hat{\alpha}_{n+1} = h(\hat{\theta}_n) + \mathrm{diag}(\mu)\, D_h(\hat{\theta}_n) J_z^{-1}(\hat{\theta}_n) \nabla_z(z_n;\hat{\theta}_n) \tag{2.32}$$

where $h(\theta) \triangleq g(f(\theta))$ and $D_h(\theta) \triangleq \partial h(\theta)/\partial\theta^T = D_g(\theta) D_f(\theta)$ is the Jacobian of the composite function $h(\theta)$ [Gra81, Sec. 4.3]$^{8}$.

$^{8}$ If the dynamical model is specified for $\alpha_n$, i.e., $\alpha_{n+1} = f(\alpha_n)$, the composition must be reversed so that $h(\theta) \triangleq f(g(\theta))$.

The vector of forgetting factors $\mu$ sets the (noise equivalent) loop bandwidth of each parameter in $\alpha_n$. In Appendix 2.E it is shown that $B_n \simeq \mu/4$. The loop bandwidth determines the maximum variability of the parameters that the closed loop is able to track, as well as the closed-loop effective observation time, which is approximately equal to $N \simeq 0.5/B_n$ samples [Men97, Sec. 3.5.6] (see also Appendix 2.E).

A vast number of tracking techniques have been proposed in the fields of automatic control [Kai00][Söd89], signal processing [Kay93b] and communications [Men97][Mey90], e.g., least mean squares (LMS) and recursive least squares (RLS) [Hay91][Kai00], Kalman-Bucy filtering [And79][Hay91], machine learning [Mit97], etc.
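The need for a higher-order loop when the parameter drifts can be illustrated with a small noise-free sketch. The setting is an assumption made for illustration only: the observation is the parameter itself, $z_n = \theta_o + \delta_1 n$, the discriminator is replaced by the raw error $z_n - \hat{\theta}_n$, and the function names and step-sizes (`mu_theta`, `mu_delta`) are hypothetical. The second loop propagates the estimate through the linear dynamical model $(\theta, \delta) \to (\theta + \delta, \delta)$ before applying the correction, in the spirit of (2.32).

```python
def first_order_loop(observations, mu):
    """Constant-parameter loop: correction only, no dynamical model."""
    theta_hat = 0.0
    errors = []
    for z in observations:
        error = z - theta_hat
        errors.append(error)
        theta_hat += mu * error
    return errors


def second_order_loop(observations, mu_theta, mu_delta):
    """Loop with a linear drift model: predict with (theta + delta, delta)."""
    theta_hat, delta_hat = 0.0, 0.0
    errors = []
    for z in observations:
        error = z - theta_hat
        errors.append(error)
        theta_hat = theta_hat + delta_hat + mu_theta * error  # predict + correct
        delta_hat = delta_hat + mu_delta * error              # drift update
    return errors


theta_o, delta_1 = 1.0, 0.05
ramp = [theta_o + delta_1 * n for n in range(2000)]

lag_first = first_order_loop(ramp, mu=0.2)[-1]
lag_second = second_order_loop(ramp, mu_theta=0.2, mu_delta=0.01)[-1]
print(lag_first, lag_second)
```

In this toy setting the first-order loop settles to a constant pursuit error $\delta_1/\mu$, whereas the loop that includes the drift in its state drives the error to zero. With noisy observations both loops would additionally show a variance governed by their step-sizes.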
In fact, filtering, smoothing, prediction, deconvolution, source separation and other applications can be seen as particular cases of parameter estimation or tracking in which the aim is to determine the input data at time $n$, say $\theta_n$, from a vector of noisy observations.

2.6 Lower Bounds in Parameter Estimation

The calculation of an attainable benchmark for the adopted performance criterion is necessary to identify whether a given estimation technique is efficient or not. For example, the ML estimator is known to be optimal in the small-error regime because it attains the Cramér-Rao lower bound. Once the optimal performance is known, suboptimal techniques can be devised trading off performance and complexity. Moreover, lower bounds usually give insight into how the different parameters (e.g., SNR, observation size and others) affect the estimator performance.

In the following sections, some important lower bounds are briefly described. Focusing on the mean squared error (MSE), lower bounds can be classified as Bayesian or deterministic depending on whether the prior statistics of the parameters are exploited or not. On the other hand, lower bounds are also classified into small-error (or local) bounds and large-error (or global) bounds. Furthermore, the lower bounds in the literature are derived from either the Cauchy-Schwarz or the Kotelnikov inequality.

Following the above classification criteria, the most important lower bounds in the literature are described and interconnected in the following subsections. Finally, all these bounds are organized and presented in a concluding table at the end of the section (Fig. 2.5).

NOTE: the material in the following section is not essential to understand the central chapters of the dissertation. Only those lower bounds derived from the CRB in the presence of nuisance parameters will be extensively used throughout the thesis.
Thus, we recommend that the reader skip Section 2.6 on a first reading.

2.6.1 Deterministic Bounds based on the Cauchy-Schwarz Inequality

A large number of deterministic lower bounds on the mean square error (MSE) have been derived from the Cauchy-Schwarz inequality, e.g., [Gor90, Eq. 10][Abe93, Eq. 5][McW93, Eq. 2][Rif75, Eq. 13]. The Cauchy-Schwarz inequality states that

$$E\{ee^H\} \geq E\{es^H\}\left(E\{ss^H\}\right)^{\#} E\{se^H\} \tag{2.33}$$

for two arbitrary random vectors $e$ and $s$.$^{9}$ The Moore-Penrose pseudo-inverse operator was introduced in [Gor90, Eq. 10][Sto01] to cover those cases in which $E\{ss^H\}$ is singular. Notice that the expectation is computed with respect to the random components of $e$ and $s$. Furthermore, equation (2.33) holds with equality if and only if the vectors $e$ and $s$ are connected as follows

$$e = E\{es^H\}\left(E\{ss^H\}\right)^{\#} s. \tag{2.34}$$

$^{9}$ For the scalar case, we have the conventional Cauchy-Schwarz inequality, $E|e|^2 \geq |E\{es\}|^2 / E|s|^2$, as it appears in [Wei88b, Eq. 7].

Indeed, the Cauchy-Schwarz inequality is a consequence of the more general relation

$$\begin{bmatrix} A & B^H \\ B & C \end{bmatrix} \geq 0 \;\Leftrightarrow\; A \geq B^H C^{\#} B \tag{2.35}$$

which is valid if $C$ is non-negative definite [Mag98, Ex. 3, p. 25]. This property is used in [Gor90, Lemma 1] to prove the vectorial Cauchy-Schwarz inequality (2.33). The proof is straightforward if (2.35) is applied to the matrix $E\{zz^H\}$ with $z \triangleq [e^T, s^T]^T$. Also, this matrix inequality is adopted in [McW93, Eq. 2] to analyze the geometry of several "quadratic covariance bounds".

Based on the Cauchy-Schwarz inequality (2.33), lower bounds on the estimation mean square error can be formulated considering that $e = \hat{\alpha} - g(\theta)$ is the estimation error and $s$ an arbitrary score function. In the deterministic case, both $e$ and $s$ are functions of the random observation $y$, which is distributed as $f_y(y;\theta)$.
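The scalar form of the inequality quoted in footnote 9 can be checked numerically with sample moments. The sketch below uses real-valued synthetic data (an assumption made for illustration; the variable names are not from the thesis): an error $e$ that is only partially explained by the score $s$ gives a strict inequality, whereas an error proportional to $s$, as in (2.34), gives equality.

```python
import random


def second_moment(a, b):
    """Sample estimate of E{a b} for two real-valued sequences."""
    return sum(x * y for x, y in zip(a, b)) / len(a)


random.seed(1)
s = [random.gauss(0.0, 1.0) for _ in range(1000)]        # score function
noise = [random.gauss(0.0, 1.0) for _ in range(1000)]    # part unrelated to s
e = [0.7 * si + ni for si, ni in zip(s, noise)]          # generic error

lhs = second_moment(e, e)                                 # E{e^2}
rhs = second_moment(e, s) ** 2 / second_moment(s, s)      # E{es}^2 / E{s^2}

# Error proportional to the score: the bound is met with equality.
e_prop = [0.7 * si for si in s]
lhs_eq = second_moment(e_prop, e_prop)
rhs_eq = second_moment(e_prop, s) ** 2 / second_moment(s, s)

print(lhs, rhs, lhs_eq, rhs_eq)
```

The gap between `lhs` and `rhs` is the power of the error component that is uncorrelated with the score, which is why a better choice of score function yields a tighter bound.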
Various deterministic lower bounds on the MSE have been deduced by selecting different score functions $s$, for instance the following well-known bounds: Cramér-Rao [Kay93b, Chapter 3], Bhattacharyya [Bat46], Barankin [Bar49], Hammersley-Chapman-Robbins [Cha51][Ham50], Abel [Abe93] and Kiefer [Kie52], among others.

Because (2.33) is valid for any score function, the aim is to find the score function leading to the highest lower bound on the estimator MSE and, if possible, the estimator attaining the resulting bound. Conversely, if an estimator satisfies (2.33) with equality for a given score function, the resulting bound is the tightest, attainable lower bound on the MSE. Furthermore, this estimator is the one satisfying (2.34).

In [McW93], it is shown that tight lower bounds are obtained provided that

P1: the score function is zero-mean, i.e.,

$$E\{s(y,\theta)\} = \int s(y,\theta) f_y(y;\theta)\,dy = 0$$

for every value of $\theta$. Thus, we are only concerned with unbiased estimators since the estimation error is proportional to $s(y,\theta)$ (2.34);

P2: the score function is a function of the sufficient statistics of the estimation problem at hand. Recall that $t(y)$ is a sufficient statistic if and only if $f_y(y;\theta)$ depends on the parameter vector $\theta$ only through a function of the sufficient statistic $t(y)$. Consequently, $s(y,\theta)$ can be any one-to-one function of the likelihood function $f_y(y;\theta)$ as, for instance, its gradient $\nabla(y;\theta)$. See the Neyman-Fisher factorization theorem in [Kay93b, Th. 5.3];

P3: the score function must satisfy (2.34). This means that $s(y,\theta)$ must span the estimation error subspace.

The first property is really important because it states that we only have to consider unbiased estimators of the parameter. In fact, it can be shown that the bias term always increases the overall MSE and it is not possible to trade bias for variance in the deterministic case.
To show this result, we have to decompose the estimation error as

$$e(y;\theta) = b(\theta) + v(y;\theta)$$

with $b(\theta) = E\{\hat{\alpha}(y)\} - g(\theta)$ the estimator bias and $v(y;\theta) = \hat{\alpha}(y) - E\{\hat{\alpha}(y)\}$ the deviation with respect to the estimator mean. Consequently, the estimator MSE can be written as

$$\Sigma_{ee}(\theta) = b(\theta) b^H(\theta) + \Sigma_{vv}(\theta)$$

where $\Sigma_{xy} \triangleq E\{xy^H\}$ stands for the cross-correlation matrix$^{10}$. Then, the Cauchy-Schwarz inequality (2.33) can be applied to the covariance matrix $\Sigma_{vv}(\theta)$ in order to obtain the following lower bound on the MSE:

$$\Sigma_{ee}(\theta) \geq b(\theta) b^H(\theta) + \Sigma_{vs}(\theta) \Sigma_{ss}^{\#}(\theta) \Sigma_{sv}(\theta) \tag{2.36}$$

in which the bias function has been set to $b(\theta)$ [Abe93, Eq. 6]. Equation (2.36) is usually referred to as the "covariance inequality" [Gor90][McW93][Abe93, Eq. 6]. Therefore, if the covariance inequality in (2.36) is compared with the original bound,

$$\Sigma_{ee}(\theta) \geq \Sigma_{es}(\theta) \Sigma_{ss}^{\#}(\theta) \Sigma_{se}(\theta) = \Sigma_{vs}(\theta) \Sigma_{ss}^{\#}(\theta) \Sigma_{sv}(\theta),$$

it follows that the bias term $b(\theta) b^H(\theta)$ can never reduce the MSE matrix $\Sigma_{ee}(\theta)$. In the last expression, we take into account that $\Sigma_{es} = \Sigma_{vs}$ because the score function is zero-mean.

$^{10}$ Notice that the transpose conjugate will be considered in the sequel for both real and complex vectors.

The Cauchy-Schwarz inequality can then be used to extend the concept of efficiency to other lower bounds besides the usual Cramér-Rao bound. Thus, $\hat{\alpha}(y)$ is an efficient estimator of $\alpha = g(\theta)$ if and only if it holds that

$$E\{e(y,\theta)\} = 0 \tag{2.37}$$

$$\Sigma_{ee}(\theta) = \Sigma_{vv}(\theta) = \Sigma_{vs}(\theta) \Sigma_{ss}^{\#}(\theta) \Sigma_{sv}(\theta) \tag{2.38}$$

for (at least) one score function $s(y,\theta)$. Additionally, we know from (2.34) that the estimator $\hat{\alpha}(y)$ is efficient if and only if it verifies that

$$\hat{\alpha}(y) = g(\theta) + \Sigma_{vs}(\theta) \Sigma_{ss}^{\#}(\theta) s(y,\theta) \tag{2.39}$$

for any value of $\theta$.

An important question is whether a realizable$^{11}$, unbiased estimator can attain the covariance inequality or not for a given score function. If this estimator were found, the resulting covariance would constitute the highest lower bound.
Therefore, any other score function $s(y,\theta)$ would yield a weaker bound on the MSE, which would not be attainable. Next, a sufficient condition on $s(y,\theta)$ leading to realizable estimators is shown.

$^{11}$ The adjective "realizable" means that $\hat{\alpha}(y)$ does not depend on the vector of unknown parameters $\theta$.

Proposition 2.2 If the zero-mean score function can be factorized as

$$s(y,\theta) \triangleq H(\theta) z(y) - u(\theta)$$

with $z(y)$ a function of the sufficient statistics $t(y)$ and

$$\Sigma_{vs}(\theta) \Sigma_{ss}^{\#}(\theta) H(\theta) = M^H, \qquad \Sigma_{vs}(\theta) \Sigma_{ss}^{\#}(\theta) u(\theta) = g(\theta),$$

the estimator $\hat{\alpha}(y) = M^H z(y)$ is efficient and its covariance matrix is given by

$$\Sigma_{ee}(\theta) = \Sigma_{vv}(\theta) = \Sigma_{vs}(\theta) \Sigma_{ss}^{\#}(\theta) \Sigma_{sv}(\theta) = M^H \Sigma_{zz} M - g(\theta) g^H(\theta)$$

which therefore becomes the highest lower bound on the estimation error covariance.

Unfortunately, most score functions of interest cannot be factorized as in the last proposition for all the values of $\theta$. Consequently, efficient estimators are usually unrealizable in the deterministic framework. In that case, efficient deterministic estimators are only feasible in the small-error regime once the value of $\theta$ has been iteratively learnt using a suitable gradient-based method. Notice that this was the approach adopted in the case of the ML estimator and the associated Cramér-Rao bound. Thus, the following scoring method

$$\hat{\alpha}_{k+1} = g(\hat{\theta}_k) + \Sigma_{vs}(\hat{\theta}_k) \Sigma_{ss}^{\#}(\hat{\theta}_k) s(y;\hat{\theta}_k)$$

is efficient in the small-error regime (i.e., $\hat{\theta}_k \simeq \theta$) for any score function.

Consequently, all the deterministic bounds converge to the Cramér-Rao bound in the small-error regime. However, the Cramér-Rao bound is not attained when the estimator operates in the large-error regime. In that case, tighter bounds can be formulated by using a better score function. Next, the score functions associated with the most important large-error and small-error deterministic bounds are presented.

Barankin Bound (BB)

The Barankin bound was originally formulated in [Bar49] for scalar, real-valued estimation problems.
The Barankin bound is constructed by looking for the estimator minimizing its $s$th-order absolute central moment subject to the unbiasedness constraint over the whole parameter space $\Theta$, i.e.,

$$\hat{\alpha}_{BB} = \arg\min_{\hat{\alpha}} E\left|\hat{\alpha} - g(\theta_o)\right|^s \quad \text{subject to} \quad E\{\hat{\alpha}\} = g(\theta) \tag{2.40}$$

for every $\theta \in \Theta$. Focusing on the estimator variance ($s = 2$), it can only be stated that $\hat{\alpha}_{BB}$ is the minimum variance unbiased estimator in the neighbourhood of $\theta_o$. Furthermore, if the obtained local solution is independent of $\theta_o$, $\hat{\alpha}_{BB}$ turns out to be the global minimum variance unbiased estimator.

The Barankin bound has been extended in [Mar97] to multivariate estimation problems adopting a simpler formulation than the original one. In [Mar97, Eq. 9] the Barankin bound was shown to be a covariance inequality bound (2.38) with

$$s(y;\theta) = \int_{\tilde{\theta}\in\Theta} \frac{f_y(y;\tilde{\theta})}{f_y(y;\theta)} f(\tilde{\theta},\theta)\,d\tilde{\theta} \tag{2.41}$$

the adopted score function, and $f(\tilde{\theta},\theta) \in \mathbb{R}^P$ an arbitrary function that must be selected to supply the tightest covariance lower bound. Notice that tighter lower bounds will be obtained if the mean of the score function is null (property P1), i.e.,

$$E\{s(y;\theta)\} = \int\!\!\int_{\tilde{\theta}\in\Theta} f_y(y;\tilde{\theta}) f(\tilde{\theta},\theta)\,d\tilde{\theta}\,dy = \int_{\tilde{\theta}\in\Theta} f(\tilde{\theta},\theta)\,d\tilde{\theta} = 0.$$

Therefore, the functions $f(\tilde{\theta},\theta)$ leading to the tightest lower bound must be proportional to the difference of two vectors of probability density functions $f_1(\tilde{\theta},\theta)$ and $f_2(\tilde{\theta},\theta)$, i.e.,

$$f(\tilde{\theta},\theta) = \kappa\left[f_1(\tilde{\theta},\theta) - f_2(\tilde{\theta},\theta)\right]$$

with $\kappa$ an arbitrary constant (e.g., $\kappa = 1$) and $\int f_1(\tilde{\theta},\theta)\,d\tilde{\theta} = \int f_2(\tilde{\theta},\theta)\,d\tilde{\theta} = 1$. This relevant property of $f(\tilde{\theta},\theta)$ was taken into account in [Tre68, Pr. 2.4.18] to derive the Barankin bound in a different way. Also, a multidimensional version of the Kiefer bound [Kie52] can be obtained by replacing $f_2(\tilde{\theta},\theta)$ with a multivariate delta measure $\delta(\tilde{\theta} - \theta)$.

Using now the covariance inequality, we have that the Barankin bound for the estimation of $\alpha = g(\theta)$ is given by

$$B_{BB}(\theta) = \sup_{f(\tilde{\theta},\theta)} \Sigma_{vs}(\theta) \Sigma_{ss}^{\#}(\theta) \Sigma_{sv}(\theta) \leq \Sigma_{vv}(\theta)$$

with

$$\Sigma_{vs}(\theta) = \int_{\tilde{\theta}\in\Theta} \left[g(\tilde{\theta}) - g(\theta)\right] f^H(\tilde{\theta},\theta)\,d\tilde{\theta}$$

$$\Sigma_{ss}(\theta) = \int\!\!\int_{\tilde{\theta}_1,\tilde{\theta}_2\in\Theta} f(\tilde{\theta}_1,\theta)\, E\left\{\frac{f_y(y;\tilde{\theta}_1) f_y(y;\tilde{\theta}_2)}{f_y^2(y;\theta)}\right\} f^H(\tilde{\theta}_2,\theta)\,d\tilde{\theta}_1\,d\tilde{\theta}_2. \tag{2.42}$$

Notice that the original bound [Bar49] is somewhat more involved because the integral over $\tilde{\theta}$ is formulated as a Riemann integration, that is,

$$\int_{\tilde{\theta}\in\Theta} \xi(\tilde{\theta}) f(\tilde{\theta},\theta)\,d\tilde{\theta} = \lim_{Q\to\infty} \sum_{q=1}^{Q} \xi(\tilde{\theta}_q) f(\tilde{\theta}_q,\theta)$$

where the so-called test points $\tilde{\theta}_1, \ldots, \tilde{\theta}_Q$ are selected to span the whole parameter domain $\Theta$ and take into account the existence of large errors. In fact, we can understand the original approach as the bound obtained when the continuous function $f(\tilde{\theta},\theta)$ is sampled at the test points $\tilde{\theta}_1, \ldots, \tilde{\theta}_Q$. From sampling theory, the separation of the test points should be adjusted according to the variability of the selected function $f(\tilde{\theta},\theta)$. Specifically, a denser sampling (closer test points) should be applied to those regions where $f(\tilde{\theta},\theta)$ is more abrupt and vice versa. An important consequence of the sampling theorem is that infinite test points are needed if the parameter range is finite, whatever the selected function $f(\tilde{\theta},\theta)$. This comment is related to the fact that unbiased estimators do not exist for all $\theta \in \Theta$ when $\Theta$ is a finite set.$^{12}$

If the number of test points is finite, the Barankin bound is only constrained to be unbiased at the test points $\tilde{\theta}_1, \ldots, \tilde{\theta}_Q$ [For02]. Consequently, the resulting lower bound is not the highest Barankin bound ($Q \to \infty$) but it is generally realizable even if the parameter range is finite. The resulting bound can be improved by also considering the bias derivatives at the test points. This idea has been applied to derive other hybrid lower bounds in [Abe93] or [For02]. Also, the same reasoning was applied in [Vil01a] to design second-order almost-unbiased estimators.

The Barankin bound theory has been applied to determine the SNR threshold in many nonlinear estimation problems as, for example, time delay estimation [Zei93][Zei94] or frequency estimation [Kno99].
A geometric interpretation of the Barankin bound is provided in [Alb73] and references therein.

$^{12}$ If an estimator were unbiased on the boundary of $\Theta$, this would imply that the estimation error must be zero for these values of $\theta$. Unfortunately, this situation is unrealistic and biased estimators are unavoidable along the boundary of $\Theta$.

Hammersley-Chapman-Robbins Bound (HCRB)

The simplest Barankin bound was formulated simultaneously by Chapman and Robbins [Cha51] and Hammersley [Ham50] by considering a single test point per parameter, i.e., $Q = P$. This simplified version is by far the most usual variant of the Barankin bound. The original scalar bound was extended to deal with multidimensional problems by Gorman et al. [Gor90]. In that paper, every test point determines a single component of $f(\tilde{\theta},\theta) \in \mathbb{R}^P$ in the following manner:

$$\left[f(\tilde{\theta},\theta)\right]_p = \frac{\delta(\tilde{\theta} - \tilde{\theta}_p) - \delta(\tilde{\theta} - \theta)}{\|\tilde{\theta}_p - \theta\|},$$

where the $P$ vectors $\delta_p \triangleq \tilde{\theta}_p - \theta$ are linearly independent and span the entire parameter space $\Theta$. It can be shown that this is the optimal choice of $f(\tilde{\theta},\theta)$ in case of having $P$ test points [Wei88b, Eq. 33]. Therefore, the $p$-th element of the score function (2.41) becomes

$$\left[s(y;\theta)\right]_p = \frac{f_y(y;\theta+\delta_p) - f_y(y;\theta)}{\|\delta_p\|\, f_y(y;\theta)} \quad \text{for } p = 1, \ldots, P$$

and the multiparametric Hammersley-Chapman-Robbins bound is given by

$$B_{HCRB}(\theta) = \sup_{\delta_1,\ldots,\delta_P} \Sigma_{vs}(\theta) \Sigma_{ss}^{\#}(\theta) \Sigma_{sv}(\theta) \leq \Sigma_{vv}(\theta)$$

with

$$\left[\Sigma_{vs}(\theta)\right]_p = \frac{g(\theta+\delta_p) - g(\theta)}{\|\delta_p\|}, \qquad \Sigma_{ss}(\theta) = E\left\{s(y;\theta) s^H(y;\theta)\right\}.$$

Cramér-Rao Bound (CRB)

The Cramér-Rao bound can be obtained from the Hammersley-Chapman-Robbins bound when the $P$ test points converge to the true parameter $\theta$ [Gor90][For02]. This means that the CRB is only able to test the small-error region, whereas the Barankin-type bounds are able to test the large-error region as well.
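The convergence of the Hammersley-Chapman-Robbins expression to the CRB as the test point approaches the true parameter can be illustrated on a simple scalar case. The model below is an assumption chosen for illustration and is not from the thesis: $N$ independent real Gaussian samples of unknown mean $\theta$ and known variance $\sigma^2$, with $g(\theta) = \theta$. For this model, a standard Gaussian calculation gives $E\{[f_y(y;\theta+\delta)/f_y(y;\theta)]^2\} = \exp(N\delta^2/\sigma^2)$, so the single-test-point expression evaluates to $\delta^2 / (\exp(N\delta^2/\sigma^2) - 1)$, while the CRB is $\sigma^2/N$.

```python
import math


def hcrb_gaussian_mean(delta, sigma2, n_samples):
    """Single-test-point HCR expression for the mean of n Gaussian samples."""
    x = n_samples * delta ** 2 / sigma2
    # expm1 keeps exp(x) - 1 accurate when the test point offset is tiny
    return delta ** 2 / math.expm1(x)


sigma2, n_samples = 2.0, 10
crb = sigma2 / n_samples

offsets = [1.0, 0.3, 0.1, 0.01, 0.001]   # test point offsets, shrinking to 0
bounds = [hcrb_gaussian_mean(d, sigma2, n_samples) for d in offsets]

for d, b in zip(offsets, bounds):
    print(d, b, crb)
```

In this particular model the expression increases monotonically as the offset shrinks, so the supremum over the test point is reached in the limit and coincides with the CRB. This is consistent with the fact that the sample mean is efficient for every $\theta$ here; in nonlinear problems a distant test point can give a larger value than the CRB.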
The CRB score function is shown to correspond to the projection of the log-likelihood gradient $\nabla_y(y;\theta)$ onto the directions determined by $\delta_1, \ldots, \delta_P$, i.e.,

$$\left[s(y;\theta)\right]_p = \lim_{\|\delta_p\|\to 0} \frac{f_y(y;\theta+\delta_p) - f_y(y;\theta)}{\|\delta_p\|\, f_y(y;\theta)} = \delta_p^H \frac{\partial f_y(y;\theta)/\partial\theta}{f_y(y;\theta)} = \delta_p^H \nabla_y(y;\theta) \tag{2.43}$$

and, thus,

$$s(y;\theta) = W^H \nabla_y(y;\theta) \tag{2.44}$$

with $W \triangleq [\delta_1, \ldots, \delta_P]$ the non-singular square matrix stacking the $P$ linearly-independent directions. Therefore, the CRB is given by

$$B_{CRB} = \lim \sup_{\delta_1,\ldots,\delta_P \to 0} B_{HCRB} = \Sigma_{vs}(\theta) \Sigma_{ss}^{\#}(\theta) \Sigma_{sv}(\theta) = D_g(\theta) J^{\#}(\theta) D_g^H(\theta) \leq \Sigma_{vv}(\theta) \tag{2.45}$$

where $\Sigma_{vs}(\theta)$ and $\Sigma_{ss}(\theta)$ are given by

$$D_g(\theta) \triangleq \frac{\partial g(\theta)}{\partial\theta^T}, \qquad J(\theta) \triangleq E\left\{\nabla_y(y;\theta) \nabla_y^H(y;\theta)\right\} = -E\{H_y(y;\theta)\},$$

respectively (Appendix 2.F). The matrix $W$ becomes irrelevant provided that $W^{-1}$ exists and, thus, we can choose the canonical basis $W = I_P$.

Notice that the CRB only makes sense in estimation problems, i.e., when the parameter is continuous and the first- and second-order derivatives exist for $\theta \in \Theta$. On the other hand, the above large-error bounds could also be applied to detection problems in which the parameters are discrete variables.

In [Fen59, Th. 1], it is shown that the necessary and sufficient condition for a statistic $z(y)$ to attain the CRB is that $f_y(y;\theta)$ belongs to the exponential family below

$$f_y(y;\theta) = \exp\left\{h^T(\theta) z(y) + u(\theta) + v(y)\right\} \tag{2.46}$$

whatever the content of $h(\theta)$, $u(\theta)$ or $v(y)$. From the fourth property of the ML estimator in Section 2.3.2, it follows that $z(y)$ must be the maximum likelihood estimator. This result can also be obtained by introducing the CRB score function (2.43) into Proposition 2.2. The existence of efficient estimates for the exponential family is relevant since the normal, Rayleigh and exponential distributions are members of this family [Kay93b, Pr. 5.14].

Another interpretation of the Cramér-Rao bound is possible [For02] if equation (2.40) is evaluated locally for every value of the true parameter $\theta_o$.
Thus, the Cramér-Rao bound can be obtained by solving the following optimization problem:

$$\min_{\hat{\alpha}} E\left\|\hat{\alpha} - g(\theta_o)\right\|^2 \quad \text{subject to} \quad b(\theta_o) = 0 \ \text{ and } \ \left.\frac{\partial b(\theta)}{\partial\theta}\right|_{\theta=\theta_o} = 0$$

where $b(\theta) = E\{\hat{\alpha}\} - g(\theta)$ stands for the estimator bias.

Finally, the Cramér-Rao bound can also be derived by expanding the log-likelihood function in a quadratic Taylor series around the true parameter $\theta = \theta_o$ (small-error condition), obtaining that

$$\ln f_y(y;\theta) \simeq \ln f_y(y;\theta_o) + \nabla^H(y;\theta_o)(\theta - \theta_o) + \frac{1}{2}\mathrm{Tr}\left\{H(y;\theta_o)(\theta - \theta_o)(\theta - \theta_o)^H\right\} \tag{2.47}$$

where $\nabla(y;\theta)$ and $H(y;\theta)$ are the gradient and Hessian of the log-likelihood function. Thus, the gradient of the log-likelihood is linear in the parameter of interest,

$$\nabla(y;\theta) \simeq \nabla(y;\theta_o) + H(y;\theta_o)(\theta - \theta_o),$$

and becomes zero for

$$\hat{\theta}_{ML} \simeq \theta_o - H^{-1}(y;\theta_o) \nabla(y;\theta_o).$$

Taking now into account the invariance property of the ML estimator, we obtain the following clairvoyant estimator of $\alpha = g(\theta)$,

$$\hat{\alpha}_{ML} = g(\hat{\theta}_{ML}) \simeq g(\theta_o) - D_g(\theta_o) H^{-1}(y;\theta_o) \nabla(y;\theta_o),$$

whose covariance matrix coincides with the CRB (2.45). Although the above estimator does not admit a closed form unless $f_y(y;\theta)$ belongs to the exponential family (2.46), efficient estimates are approximately supplied by the Newton-Raphson and scoring algorithms in the small-error regime, i.e., $\lim_{k\to\infty} \hat{\theta}_k = \hat{\theta}_{ML} \simeq \theta_o$ (Section 2.5).

Bhattacharyya Bound (BHB)

The Bhattacharyya bound constitutes an extension of the CRB when considering the higher-order derivatives in the Taylor expansion of $f_y(y;\theta)$ (2.47). Therefore, it is also a small-error bound with higher-order derivative constraints on the bias.
Indeed, it can be seen as the result of the following optimization problem [For02]:

$$\min_{\hat{\alpha}} E\left\|\hat{\alpha} - g(\theta_o)\right\|^2 \quad \text{subject to} \quad b(\theta_o) = 0 \ \text{ and } \ \left.\frac{\partial^n b^H(\theta)}{\partial\theta^n}\right|_{\theta=\theta_o} = 0 \quad (n = 1, \ldots, N)$$

where $b(\theta) = E\{\hat{\alpha}\} - g(\theta)$ stands for the estimator bias and $\partial\theta^n \in \mathbb{R}^{P^n}$ stands for the vectorized $n$-th power of the differential $\partial\theta$, which can be computed recursively as $\partial\theta^n = \mathrm{vec}\left(\partial\theta^{n-1}\,\partial\theta^T\right)$ with $\partial\theta^1 \triangleq \partial\theta$. Notice that the CRB corresponds to $N = 1$.

To motivate the interest of the Bhattacharyya bound, let us consider that $\hat{\theta} = z(y)$ is an efficient, unbiased estimator of $\theta$ and, therefore, the likelihood function is given by (2.46). Let us consider the estimation of the following polynomial in $\theta$ of order $I$,

$$\alpha = g(\theta) = \sum_{i=0}^{I} G_i \theta^i,$$

with $\theta^i \in \mathbb{R}^{P^i}$ the vectorized $i$-th power of $\theta$. It can be shown that the estimator

$$\hat{\alpha}(y) = g(\theta) + \Sigma_{vs}(\theta) \Sigma_{ss}^{\#}(\theta) s(y;\theta)$$

attains the $N$-th order Bhattacharyya bound for $N \geq I$ [Gor91, Prop. 3][Fen59, Th. 2] with

$$s_n(y;\theta) \triangleq \frac{1}{f_y(y;\theta)} \frac{\partial^n f_y(y;\theta)}{\partial\theta^n} \quad (n = 1, \ldots, N)$$

the $n$-th component of the Bhattacharyya score function $s(y;\theta) = [s_1^T(y;\theta), \ldots, s_N^T(y;\theta)]^T$. Accordingly, the Bhattacharyya bound becomes

$$B_{BHB}(\theta) = \Sigma_{vs}(\theta) \Sigma_{ss}^{\#}(\theta) \Sigma_{sv}(\theta) \leq \Sigma_{vv}(\theta)$$

where

$$\Sigma_{vs}(\theta) = \left[\frac{\partial g(\theta)}{\partial\theta^T}, \frac{\partial^2 g(\theta)}{(\partial\theta^2)^T}, \ldots, \frac{\partial^N g(\theta)}{(\partial\theta^N)^T}\right], \qquad \Sigma_{ss}(\theta) = E\left\{s(y;\theta) s^H(y;\theta)\right\}$$

bearing in mind the results in Appendix 2.F [Abe93].

It can be proved that $\hat{\alpha}(y)$ is unable to attain the $N$-th Bhattacharyya bound for any $N < I$ and hence the Cramér-Rao bound ($N = 1$). Moreover, the ML estimator is not efficient even in the asymptotic case [Fen59].

Finally, the Bhattacharyya bound can also be obtained from the Barankin bound when we have at least $Q = N \times P$ test points that converge to the true parameter $\theta$ following $N$ linearly-dependent trajectories per parameter [Gor91, Sec. 4][For02]. In [For02], the $N$ collinear trajectories corresponding to the $p$-th parameter are $\theta + n\delta_p$ with $\|\delta_p\| \to 0$ and $n = 1, \ldots, N$.
Therefore, we have that

$$B_{BHB} = \lim_{n\|\delta_p\| \to 0} B_{HCRB}$$

for $p = 1, \ldots, P$ and $n = 1, \ldots, N$.

Deterministic Cramér-Rao Bounds in the presence of Nuisance Parameters

All the above lower bounds are formulated from the likelihood function $f_y(y;\theta)$. If we deal with a blind estimation problem in which there is a vector of unknown stochastic nuisance parameters $x$, we have to calculate $f_y(y;\theta)$ from the conditional p.d.f. $f_{y/x}(y/x;\theta)$ as explained in Section 2.3 and indicated next

$$f_y(y;\theta) = E_x\left\{f_{y/x}(y/x;\theta)\right\} = \int f_{y/x}(y/x;\theta) f_x(x)\,dx.$$

Therefore, the same assumptions about the nuisance parameters leading to the conditional and Gaussian ML estimators in Sections 2.4.2 and 2.4.3 can be applied now to obtain their asymptotic performance in the small-error regime. In the first case, we obtain the so-called conditional CRB (CCRB) and, in the second case, the Gaussian unconditional CRB (UCRB). The CCRB and UCRB were deduced in [Sto90a][Sto89][Ott93] in the context of array signal processing and adapted to the field of digital synchronization in [Vaz00][Rib01b][Vaz01] and references therein.

To obtain the (Gaussian) UCRB, the observed vector $y$ is supposed to be normally distributed (2.24). Likewise, the CCRB is obtained assuming that $y$ is distributed according to the conditional p.d.f. $f_y(y;\theta,\hat{x}_{ML}(\theta))$ (2.19). Therefore, the CCRB and UCRB are not "universal" lower bounds and, in general, they are only meaningful within the scope of the conditional or the unconditional assumptions.

Thus, the CCRB and UCRB can be derived from the CRB formula (2.45) under the assumption adopted on the nuisance parameters.
In the multidimensional case, it is obtained in Appendix 2.G that

$$B_{CCRB}(\theta) = D_g(\theta) J_c^{\#}(\theta) D_g^T(\theta) \tag{2.48}$$

$$B_{UCRB}(\theta) = D_g(\theta) J_u^{\#}(\theta) D_g^T(\theta) \tag{2.49}$$

where

$$J_c(\theta) \triangleq 2\,\mathrm{Re}\left\{D_a^H(\theta)\left(I_K \otimes R_w^{-1} P_A^{\perp}(\theta)\right) D_a(\theta)\right\} \tag{2.50}$$

$$J_u(\theta) \triangleq D_r^H(\theta)\left(R^{*}(\theta) \otimes R(\theta)\right)^{-1} D_r(\theta) \tag{2.51}$$

are the Fisher's information matrices for the conditional and unconditional model, respectively, and $D_a(\theta)$, $D_r(\theta)$ are defined as

$$\left[D_a(\theta)\right]_p \triangleq \mathrm{vec}\left(\frac{\partial A(\theta)}{\partial\theta_p}\right), \qquad \left[D_r(\theta)\right]_p \triangleq \mathrm{vec}\left(\frac{\partial R(\theta)}{\partial\theta_p}\right).$$

The CCRB predicts the asymptotic performance of the CML and GML quadratic estimators when the SNR goes to infinity. On the other hand, the UCRB supplies the performance of the GML estimator for Gaussian nuisance parameters or, in general, for infinitely large samples. These two bounds are generally applied to bound the (small-error) variance of second-order estimation methods. However, in this dissertation it is shown that, if the nuisance parameters belong to a polyphase alphabet of constant modulus, this information can be exploited, using exclusively quadratic processing, to improve the CML and GML estimates. The covariance of the resulting estimator is shown in Chapter 4 to be the highest lower bound on the performance of any second-order technique. The resulting bound is deduced in Section 4.2 and has the following form

$$B_{BQUE}(\theta) = D_g(\theta) J_2^{\#}(\theta) D_g^H(\theta),$$

where BQUE is the acronym of "Best Quadratic Unbiased Estimator" [Vil01a][Vil05] and

$$J_2(\theta) \triangleq D_r^H(\theta) Q^{-1}(\theta) D_r(\theta) \tag{2.52}$$

becomes the Fisher's information matrix in second-order estimation problems (4.14), with $Q(\theta)$ the matrix containing the central fourth-order moments of $y$ (3.10).

Another useful lower bound is the so-called modified CRB (MCRB). This bound was deduced in the context of digital synchronization by D'Andrea et al. [And94] under the assumption that all the nuisance parameters are known (see also [Men97][Moe98][Vaz00]).
This assumption corresponds to data-aided estimation problems in which the input signal is known. Thus, the MCRB allows assessing the performance loss due to the lack of knowledge about the nuisance parameters in blind estimation problems. In the multidimensional case, the MCRB is given by

$$B_{MCRB}(\theta) = D_g(\theta) J_m^{\#}(\theta) D_g^H(\theta) \leq \Sigma_{vv}(\theta) \tag{2.53}$$

where

$$J_m(\theta) \triangleq -E_x E_{y/x}\left\{\frac{\partial^2 \ln f_{y/x}(y/x;\theta)}{\partial\theta\,\partial\theta^T}\right\} = 2\,\mathrm{Re}\left\{D_a^H(\theta)\left(I_K \otimes R_w^{-1}\right) D_a(\theta)\right\} \tag{2.54}$$

is deduced in Appendix 2.G.

To conclude this section, let us explain how the lower bounds above are connected in the studied linear model. It can be shown that

$$B_{UCRB}(\theta) \geq B_{BQUE}(\theta) \geq B_{CRB}(\theta) \geq B_{MCRB}(\theta).$$

Additionally, if $\hat{\alpha}(y)$ is a second-order unbiased estimator of $g(\theta)$, the associated error covariance matrix holds that

$$\Sigma_{vv}(\theta) \geq B_{BQUE}(\theta) \geq B_{CRB}(\theta) \geq B_{MCRB}(\theta),$$

and the following statements are verified:

1. $B_{CRB}(\theta) = B_{MCRB}(\theta)$ if the nuisance parameters are known [And94]. Alternatively, the MCRB could be attained in high-SNR scenarios if the mean of the nuisance parameters were not zero (i.e., semiblind estimation problems).

2. $B_{BQUE}(\theta) = B_{CRB}(\theta)$ if and only if $\hat{R}$ is a sufficient statistic for the estimation problem at hand. This occurs in case of Gaussian nuisance parameters (Section 2.4.3), or in low-SNR scenarios (Section 2.4.1) whatever the distribution of the nuisance parameters $x$.

3. $B_{UCRB}(\theta) = B_{BQUE}(\theta)$ if the nuisance parameters are Gaussian or the SNR is sufficiently low. Moreover, if the amplitude of $x$ is not constant, it is shown in this thesis that the Gaussian assumption asymptotically ($M \to \infty$) supplies second-order efficient estimates, i.e., $B_{UCRB}(\theta) \to B_{BQUE}(\theta)$. This point is intensively studied in Chapter 7.

4. $B_{CCRB}(\theta) \leq B_{UCRB}(\theta)$ and $B_{CCRB}(\theta) = B_{UCRB}(\theta)$ if the SNR tends to infinity (Appendix 2.D).
2.6.2 Bayesian Bounds based on the Cauchy-Schwarz Inequality In the Bayesian case, lower bounds on the estimator MSE can also be derived from the CauchySchwarz inequality # Ey,θ eeH ≥ Ey,θ esH Ey,θ ssH Ey,θ seH in which the expectation involves also the random parameters and the score function s (y, θ) is zero-mean for any value of y [Wei88b, Eq. 1], i.e., Eθ/y {s (y, θ)} = s (y, θ) fθ/y (θ/y) dθ = 0 (2.55) and, therefore, Ey,θ {s (y, θ)} = Ey Eθ/y {s (y, θ)} = 0. Once again the bound is attained if and only if the estimation error is proportional to the selected score function, i.e., # e (y, θ) = Ey,θ esH Ey,θ ssH s (y, θ) . It is known that the conditional mean estimator yields the highest lower bound on the (Bayesian) MSE [Wei88b, Eq. 9][Kay93b, Sec. 11.4] with s (y; θ) = e (y; θ) = Eθ/y {g(θ)/y} − g (θ) the associated score function. However, the conditional mean estimator is often not practical because it usually requires numerical integration. For this reason, some simpler but weaker lower bounds have been proposed in [Wei88b] by adopting a different set of score functions. Accordingly, none of these bounds will be attained unless they coincide with the MMSE bound. Among these bounds, we can find the Bayesian Cram´er-Rao [Tre68] [Wei88b], Bayesian Bhattacharyya [Tre68][Wei88b], Bobrovsky-Zakai [Bob76] and Weiss-Weinstein [Wei85][Wei88b]. These bounds are the Bayesian counterparts of the CRB, Bhattacharyya, Hammersley-Chapman-Robbins and Barankin-type deterministic bounds, respectively, in which the likelihood function fy (y; θ) is substituted by the joint p.d.f. fy,θ (y, θ). Notice that Bayesian bounds are implicitly large-error bounds because the whole range of θ is considered by means of the parameter prior fθ (θ). The Weiss-Weinstein bound is briefly described in the following section since it is the most general one. CHAPTER 2. 
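Weiss-Weinstein Bound (WWB)

The Weiss-Weinstein bound can be understood as the Bayesian version of a Barankin-type bound in which multiple test points are considered. The score function of the WWB is given by

$$\mathbf{s}(\mathbf{y},\boldsymbol{\theta}) = \int_{\boldsymbol{\theta}\pm\boldsymbol{\delta}\in\Theta} Q_s(\mathbf{y},\boldsymbol{\theta},\boldsymbol{\delta})\,\mathbf{f}(\boldsymbol{\delta})\, d\boldsymbol{\delta}$$

with $Q_s(\mathbf{y},\boldsymbol{\theta},\boldsymbol{\delta})$ defined as

$$Q_s(\mathbf{y},\boldsymbol{\theta},\boldsymbol{\delta}) \triangleq \left[\frac{f_{\mathbf{y},\boldsymbol{\theta}}(\mathbf{y},\boldsymbol{\theta}+\boldsymbol{\delta})}{f_{\mathbf{y},\boldsymbol{\theta}}(\mathbf{y},\boldsymbol{\theta})}\right]^{s(\boldsymbol{\delta})} - \left[\frac{f_{\mathbf{y},\boldsymbol{\theta}}(\mathbf{y},\boldsymbol{\theta}-\boldsymbol{\delta})}{f_{\mathbf{y},\boldsymbol{\theta}}(\mathbf{y},\boldsymbol{\theta})}\right]^{1-s(\boldsymbol{\delta})}$$

and the terms $0 < s(\boldsymbol{\delta}) < 1$ and $\mathbf{f}(\boldsymbol{\delta})$ selected to produce the tightest lower bound. If we choose $s(\boldsymbol{\delta}) = 1$, we have exactly the Bayesian replica of the Barankin bound. However, the authors showed that tighter lower bounds can be derived with $s(\boldsymbol{\delta}) < 1$. The above score function verifies the regularity condition $E_{\boldsymbol{\theta}/\mathbf{y}}\{\mathbf{s}(\mathbf{y},\boldsymbol{\theta})\} = \mathbf{0}$ in (2.55) so that the WWB can be computed as

$$\mathbf{B}_{WWB} = \sup_{\mathbf{f}(\boldsymbol{\delta}),\,s(\boldsymbol{\delta})} \boldsymbol{\Sigma}_{es}\boldsymbol{\Sigma}_{ss}^{\#}\boldsymbol{\Sigma}_{se} \le \boldsymbol{\Sigma}_{ee}$$

where

$$\boldsymbol{\Sigma}_{es} \triangleq E_{\mathbf{y},\boldsymbol{\theta}}\{\mathbf{e}(\mathbf{y},\boldsymbol{\theta})\,\mathbf{s}^H(\mathbf{y},\boldsymbol{\theta})\} = -E_{\mathbf{y},\boldsymbol{\theta}}\{\mathbf{g}(\boldsymbol{\theta})\,\mathbf{s}^H(\mathbf{y},\boldsymbol{\theta})\} = E_{\mathbf{y},\boldsymbol{\theta}}\left\{\int_{\boldsymbol{\theta}\pm\boldsymbol{\delta}\in\Theta} \left[\frac{f_{\mathbf{y},\boldsymbol{\theta}}(\mathbf{y},\boldsymbol{\theta}+\boldsymbol{\delta})}{f_{\mathbf{y},\boldsymbol{\theta}}(\mathbf{y},\boldsymbol{\theta})}\right]^{s(\boldsymbol{\delta})} \left[\mathbf{g}(\boldsymbol{\theta}+\boldsymbol{\delta}) - \mathbf{g}(\boldsymbol{\theta})\right]\mathbf{f}^H(\boldsymbol{\delta})\, d\boldsymbol{\delta}\right\}$$

$$\boldsymbol{\Sigma}_{ss} \triangleq E_{\mathbf{y},\boldsymbol{\theta}}\{\mathbf{s}(\mathbf{y},\boldsymbol{\theta})\,\mathbf{s}^H(\mathbf{y},\boldsymbol{\theta})\}.$$

Thus far, infinitely many test points have been considered, as done in the initial approach to the Barankin bound in (2.41). If a finite number $Q$ of test points is considered, we can always use a set of delta measures, $\mathbf{f}(\boldsymbol{\delta}) = \sum_{q=1}^{Q}\mathbf{f}(\boldsymbol{\delta}_q)\,\delta(\boldsymbol{\delta}-\boldsymbol{\delta}_q)$, to obtain the following score function

$$\mathbf{s}(\mathbf{y},\boldsymbol{\theta}) = \sum_{q=1}^{Q} Q_s(\mathbf{y},\boldsymbol{\theta},\boldsymbol{\delta}_q)\,\mathbf{f}(\boldsymbol{\delta}_q),$$

that must be optimized for $\{\boldsymbol{\delta}_q\}_{q=1,\ldots,Q}$, $\{\mathbf{f}(\boldsymbol{\delta}_q)\}_{q=1,\ldots,Q}$ and $\{s(\boldsymbol{\delta}_q)\}_{q=1,\ldots,Q}$. In that case, the $Q$th-order WWB can be obtained as indicated next

$$\mathbf{B}_{WWB} = \sup_{\{\boldsymbol{\delta}_q\},\{\mathbf{f}(\boldsymbol{\delta}_q)\},\{s(\boldsymbol{\delta}_q)\}} \boldsymbol{\Sigma}_{es}\boldsymbol{\Sigma}_{ss}^{\#}\boldsymbol{\Sigma}_{se} = \sup_{\{\boldsymbol{\delta}_q\},\{\mathbf{f}(\boldsymbol{\delta}_q)\},\{s(\boldsymbol{\delta}_q)\}} \mathbf{G}\mathbf{F}^H\left(\mathbf{F}\mathbf{Q}\mathbf{F}^H\right)^{\#}\mathbf{F}\mathbf{G}^H = \sup_{\{\boldsymbol{\delta}_q\},\{s(\boldsymbol{\delta}_q)\}} \mathbf{G}\mathbf{Q}^{\#}\mathbf{G}^H$$

where $\boldsymbol{\Sigma}_{es} = \mathbf{G}\mathbf{F}^H$ and $\boldsymbol{\Sigma}_{ss} = \mathbf{F}\mathbf{Q}\mathbf{F}^H$ are given by

$$[\mathbf{G}]_q \triangleq E_{\mathbf{y},\boldsymbol{\theta}}\left\{\left[\frac{f_{\mathbf{y},\boldsymbol{\theta}}(\mathbf{y},\boldsymbol{\theta}+\boldsymbol{\delta}_q)}{f_{\mathbf{y},\boldsymbol{\theta}}(\mathbf{y},\boldsymbol{\theta})}\right]^{s(\boldsymbol{\delta}_q)}\left[\mathbf{g}(\boldsymbol{\theta}+\boldsymbol{\delta}_q) - \mathbf{g}(\boldsymbol{\theta})\right]\right\}$$

$$\mathbf{F} \triangleq [\mathbf{f}(\boldsymbol{\delta}_1),\ldots,\mathbf{f}(\boldsymbol{\delta}_Q)]$$

$$[\mathbf{Q}]_{p,q} \triangleq E_{\mathbf{y},\boldsymbol{\theta}}\{Q_s(\mathbf{y},\boldsymbol{\theta},\boldsymbol{\delta}_p)\,Q_s(\mathbf{y},\boldsymbol{\theta},\boldsymbol{\delta}_q)\}.$$

A simpler expression is obtained if $\mathbf{g}(\boldsymbol{\theta}) = \boldsymbol{\theta}$.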
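In that case, we get the original WWB bound [Wei85], that is given by

$$\mathbf{B}_{WWB} = \sup_{\boldsymbol{\Delta}} \boldsymbol{\Delta}\tilde{\mathbf{Q}}^{\#}\boldsymbol{\Delta}^H \le \boldsymbol{\Sigma}_{ee} \tag{2.56}$$

with $\boldsymbol{\Delta} \triangleq [\boldsymbol{\delta}_1,\ldots,\boldsymbol{\delta}_Q]$ and

$$[\tilde{\mathbf{Q}}]_{p,q} \triangleq \frac{E_{\mathbf{y},\boldsymbol{\theta}}\{Q_s(\mathbf{y},\boldsymbol{\theta},\boldsymbol{\delta}_p)\,Q_s(\mathbf{y},\boldsymbol{\theta},\boldsymbol{\delta}_q)\}}{E_{\mathbf{y},\boldsymbol{\theta}}\{L_s(\mathbf{y},\boldsymbol{\theta},\boldsymbol{\delta}_p)\}\,E_{\mathbf{y},\boldsymbol{\theta}}\{L_s(\mathbf{y},\boldsymbol{\theta},\boldsymbol{\delta}_q)\}},$$

using the following definition

$$L_s(\mathbf{y},\boldsymbol{\theta},\boldsymbol{\delta}) \triangleq \left[\frac{f_{\mathbf{y},\boldsymbol{\theta}}(\mathbf{y},\boldsymbol{\theta}+\boldsymbol{\delta})}{f_{\mathbf{y},\boldsymbol{\theta}}(\mathbf{y},\boldsymbol{\theta})}\right]^{s(\boldsymbol{\delta})}.$$

The optimization of $\mathbf{B}_{WWB}$ is normally prohibitive and the authors suggest in [Wei88b, Eq. 39] to work with $s(\boldsymbol{\delta}_q) = 1/2$ because it is usually the optimal choice in the unidimensional case. In that case, it is possible to write the WWB bound in terms of the distance

$$\mu(s,\boldsymbol{\delta}) \triangleq \ln E_{\mathbf{y},\boldsymbol{\theta}}\{L_s(\mathbf{y},\boldsymbol{\theta},\boldsymbol{\delta})\} = \ln \iint f_{\mathbf{y},\boldsymbol{\theta}}^{\,s}(\mathbf{y},\boldsymbol{\theta}+\boldsymbol{\delta})\, f_{\mathbf{y},\boldsymbol{\theta}}^{\,1-s}(\mathbf{y},\boldsymbol{\theta})\, d\mathbf{y}\, d\boldsymbol{\theta}$$

used to derive the Chernoff bound on the probability of detection error [Tre68, p. 119]. Thus, the matrix $\tilde{\mathbf{Q}}$ can be represented in terms of the Bhattacharyya distance $\mu(1/2,\boldsymbol{\delta})$ as follows

$$[\tilde{\mathbf{Q}}]_{p,q} = 2\,\frac{e^{\mu(1/2,\,\boldsymbol{\delta}_p-\boldsymbol{\delta}_q)} - e^{\mu(1/2,\,\boldsymbol{\delta}_p+\boldsymbol{\delta}_q)}}{e^{\mu(1/2,\,\boldsymbol{\delta}_p)+\mu(1/2,\,\boldsymbol{\delta}_q)}}.$$

As it happened in the deterministic case, the Bobrovsky-Zakai, Bayesian Cramér-Rao and Bhattacharyya bounds can be deduced from the more general Weiss-Weinstein bound in (2.56). Specifically, the Bobrovsky-Zakai bound is obtained by setting $s = 1$ and $Q = P$ (i.e., one test point per parameter). The Bayesian CRB is obtained from the Bobrovsky-Zakai bound if the $Q = P$ test points converge to the true parameter along linearly independent lines. In addition, the $N$th-order Bhattacharyya bound is obtained when there are $N \times P$ test points converging to the true parameter through $P$ linearly independent trajectories.

2.6.3 Bayesian Bounds based on the Kotelnikov's Inequality

Another important class of Bayesian lower bounds is obtained from the Kotelnikov's inequality, proposed for the first time in [Kot59, p. 91] and used afterwards in [Bel74, Eq. 2] and [Cha75] to bound the MSE in case of a single uniformly distributed parameter. The Kotelnikov's result is extended in [Bel97, Eq.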
11] to admit any distribution of the parameter of interest, resulting in the following inequality

$$\Pr(|e| \ge \delta) \ge \int_{-\infty}^{\infty} P_e(\theta-\delta,\theta+\delta)\left[f_\theta(\theta-\delta) + f_\theta(\theta+\delta)\right] d\theta \triangleq D(\delta) \tag{2.57}$$

where $e = \hat{\theta} - \theta_o$ is the estimation error for the scalar case, $f_\theta(\theta)$ is the parameter prior and $P_e(\theta-\delta,\theta+\delta)$ is the minimum error probability associated to the following binary detection problem:

Definition 1 Let us assume that the parameter $\theta_o$ could take only two possible values, $\theta^- \triangleq \theta-\delta$ and $\theta^+ \triangleq \theta+\delta$, with probabilities

$$\Pr(\theta^-) \triangleq \frac{f_\theta(\theta^-)}{f_\theta(\theta^-) + f_\theta(\theta^+)} \quad \text{and} \quad \Pr(\theta^+) \triangleq \frac{f_\theta(\theta^+)}{f_\theta(\theta^-) + f_\theta(\theta^+)},$$

respectively. In that case, the estimation problem becomes a binary detection problem consisting in deciding the most likely hypothesis, $\theta^-$ or $\theta^+$, in view of the observation $\mathbf{y}$ and the prior probabilities $\Pr(\theta^-)$ and $\Pr(\theta^+)$.

The solution to this classical problem is supplied by the MAP detector or, equivalently, by the likelihood ratio test [Kay93a]. Then, the parameter is decided as follows

$$\hat{\theta} = \begin{cases} \theta^- & f_{\mathbf{y}}(\mathbf{y};\theta^-)\Pr(\theta^-) \ge f_{\mathbf{y}}(\mathbf{y};\theta^+)\Pr(\theta^+) \\ \theta^+ & f_{\mathbf{y}}(\mathbf{y};\theta^-)\Pr(\theta^-) < f_{\mathbf{y}}(\mathbf{y};\theta^+)\Pr(\theta^+) \end{cases}$$

and, thus,

$$P_e(\theta^-,\theta^+) = \Pr(\theta^-)\int_{\hat{\theta}=\theta^+} f_{\mathbf{y}}(\mathbf{y};\theta^-)\, d\mathbf{y} + \Pr(\theta^+)\int_{\hat{\theta}=\theta^-} f_{\mathbf{y}}(\mathbf{y};\theta^+)\, d\mathbf{y},$$

where each integral extends over the observations for which the detector decides the wrong hypothesis.

If the proposed estimator solves optimally the related detection problem for all the possible values of $\theta$, equation (2.57) holds with equality. Moreover, if the hypotheses are very close ($\delta \to 0$), the MAP estimator,

$$\hat{\theta}_{MAP} = \arg\max_{\theta} f_{\mathbf{y}}(\mathbf{y};\theta)\, f_\theta(\theta),$$

attains the Kotelnikov's bound in (2.57) and, thus, minimizes $\Pr(|e| \ge \delta)$ as explained in [Kay93b, Sec. 11.3].

Ziv-Zakai Bounds (ZZB)

The original work relating the estimation and detection problems was presented by Ziv and Zakai in [Ziv69]. However, they applied the Chebyshev's inequality,

$$\Pr(|e| \ge \delta) \le \frac{E_\theta E\{|e|^2\}}{\delta^2},$$

in lieu of the Kotelnikov's one (2.57), and the resulting bound was looser. The original idea was improved in [Cha75][Bel74][Wei88a][Bel97] where the Kotelnikov's inequality is used to derive tight bounds on the (Bayesian) MSE.
To do so, it is necessary to use the following relation between $\Pr(|e| \ge \delta)$ and the mean square error [Bel97, Eq. 2]:

$$E_\theta E\{|e|^2\} = 2\int_0^{\infty} \Pr(|e| \ge \delta)\,\delta\, d\delta$$

where the Bayesian expectation is made explicit again. In the scalar case, the Ziv-Zakai bound is extended in [Bel97, Eq. 14] as follows

$$E_\theta E\{|e|^2\} \ge 2\int_0^{\infty} \nu[D(\delta)]\,\delta\, d\delta \tag{2.58}$$

where $D(\delta)$ is the bound on $\Pr(|e| \ge \delta)$ introduced previously in (2.57) and $\nu[\cdot]$ is the "valley-filling" function introduced by Bellini and Tartara in [Bel74] and defined as

$$\nu[f(x)] \triangleq \max_{\xi \ge 0} f(x+\xi).$$

If the prior distribution is uniform on a finite interval, the above bound reduces to the Bellini-Tartara bound [Bel74].

Finally, the Bellini-Tartara bound is generalized in [Bel97] to multivariate problems and arbitrary prior functions. In that case, the extended Ziv-Zakai bound is obtained by projecting the estimation error $\mathbf{e} = \hat{\boldsymbol{\theta}} - \boldsymbol{\theta}_o$ onto a given direction determined by the vector $\mathbf{v}$ [Bel97]. For a given $\mathbf{v}$, we have the same expression,

$$E_{\boldsymbol{\theta}} E\{|\mathbf{v}^H\mathbf{e}|^2\} \ge 2\int_0^{\infty} \nu[D_{\max}(\delta)]\,\delta\, d\delta,$$

where

$$D_{\max}(\delta) = \max_{\boldsymbol{\Delta}:\,\mathbf{v}^H\boldsymbol{\Delta}=\delta} \int_{-\infty}^{\infty} P_e(\boldsymbol{\theta}-\boldsymbol{\Delta},\boldsymbol{\theta}+\boldsymbol{\Delta})\left[f_{\boldsymbol{\theta}}(\boldsymbol{\theta}-\boldsymbol{\Delta}) + f_{\boldsymbol{\theta}}(\boldsymbol{\theta}+\boldsymbol{\Delta})\right] d\boldsymbol{\theta}.$$

In principle, the two hypotheses $\boldsymbol{\theta}^- \triangleq \boldsymbol{\theta}-\boldsymbol{\Delta}$ and $\boldsymbol{\theta}^+ \triangleq \boldsymbol{\theta}+\boldsymbol{\Delta}$ could be placed arbitrarily in $\mathbb{R}^P$ provided that the projection of the estimation error $\mathbf{v}^H\mathbf{e}$ is equal to $\delta$ in case of an erroneous detection or, in other words, $\boldsymbol{\Delta}$ must satisfy $\mathbf{v}^H\boldsymbol{\Delta} = \delta$. Then, the tightest lower bound corresponds to the vector $\boldsymbol{\Delta}$ yielding the highest error probability. The reader is referred to the original work [Bel97] for further results, properties and examples. The utilization of the Ziv-Zakai bound (ZZB) in the problem of passive time delay estimation is carried out in detail in [Wei83][Wei84].
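As an illustration of the valley-filling function used in the Ziv-Zakai bounds above, the sketch below applies it to samples of a function on an increasing grid. The function name `valley_fill` and the sample values are hypothetical and chosen only to make the behavior visible.

```python
import numpy as np


def valley_fill(f):
    """Valley-filling of samples f[i] = f(x_i) on an increasing grid x_i.

    Returns V[i] = max_{j >= i} f[j], the sampled version of
    max_{xi >= 0} f(x + xi), computed as a running maximum taken from the right.
    """
    f = np.asarray(f, dtype=float)
    return np.maximum.accumulate(f[::-1])[::-1]


# Hypothetical samples of a bound D(delta) with a dip at the second point.
D = np.array([0.5, 0.2, 0.3, 0.1, 0.0])
V = valley_fill(D)
print(V)
```

The result is the smallest non-increasing sequence that dominates the input, which is why the operation can only tighten a bound that integrates $D(\delta)$ against a non-negative weight.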
[Figure 2.5: Classification of the most important lower bounds in the literature. The diagram arranges the deterministic and Bayesian lower bounds into small-error bounds (Cauchy-Schwarz inequality) and large-error bounds (Kotelnikov's inequality). The lower bounds assuming a certain model for the nuisance parameters, or imposing the second-order constraint, are marked in gray.]

Appendix 2.A UML for polyphase alphabets

Let us consider that the nuisance parameters belong to a polyphase alphabet of dimension $I$ so that $x_k \in \{e^{j2\pi i/I}\}$ with $i = 0,\ldots,I-1$. In that case, it can be shown that the log-likelihood $\ln f_{\mathbf{y}}(\mathbf{y};\boldsymbol{\theta})$ is the sum of a finite number of $\cosh(\cdot)$ functions, which are computed next

$$E_{\mathbf{x}}\left\{\exp\left(2\,\mathrm{Re}\{x_k^*\,\mathbf{a}_k^H\mathbf{R}_w^{-1}\mathbf{y}\}\right)\right\} = \frac{1}{I}\sum_{i=0}^{I-1}\exp\left(2\,\mathrm{Re}\{e^{-j2\pi i/I}\,\mathbf{a}_k^H\mathbf{R}_w^{-1}\mathbf{y}\}\right) = \frac{2}{I}\sum_{i=0}^{I/2-1}\cosh\left(2\,\mathrm{Re}\{e^{j2\pi i/I}\,\mathbf{a}_k^H\mathbf{R}_w^{-1}\mathbf{y}\}\right)$$

K −1 = Ex exp 2 Re x∗k xl aH k Rw al l>k K l>k I−1 2 j2πi/I H −1 (I − i) cosh 2 Re e a R a k w l I2 i=0 2 −1 −1 = Ex exp 2 Re |xk |2 aH cosh aH k Rw ak k Rw ak . I I/2−1 i=0

Notice that the term $E_{\mathbf{x}}\{\exp(\mathbf{x}^H\mathbf{A}^H\mathbf{R}_w^{-1}\mathbf{A}\mathbf{x})\}$ can be omitted if $\mathbf{A}^H\mathbf{R}_w^{-1}\mathbf{A}$ does not depend on the parameter. This situation is usual in digital communications [Men97, Sec. 5.7.3] because the noise is white (i.e., $\mathbf{R}_w = \sigma_w^2\mathbf{I}_M$) and $\mathbf{a}_k^H\mathbf{a}_l \cong E_s\,\delta(k,l)$ with $E_s$ the energy of the received symbols.
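In that case, we have that

$$\ln f_{\mathbf{y}}(\mathbf{y};\boldsymbol{\theta}) \propto \sum_{k=1}^{K}\ln\sum_{i=0}^{I/2-1}\cosh\left(2\,\mathrm{Re}\{e^{j2\pi i/I}\,\mathbf{a}_k^H(\boldsymbol{\theta})\,\mathbf{R}_w^{-1}\mathbf{y}\}\right).$$

Appendix 2.B Low-SNR UML results

Unbiasedness Condition

It can be shown that the low-SNR UML estimator is unbiased for any positive SNR if $\mathbf{A}^H(\boldsymbol{\theta})\mathbf{R}_w^{-1}\mathbf{A}(\boldsymbol{\theta})$ is independent of $\boldsymbol{\theta}$. If this condition is verified, then the mean value of the log-likelihood gradient is null at $\boldsymbol{\theta} = \boldsymbol{\theta}_o$ for any value of the parameter. The proof is provided next. Let

$$E_{\mathbf{y}}\left\{\frac{\partial}{\partial\theta_p}\ln f_{\mathbf{y}}(\mathbf{y};\boldsymbol{\theta})\right\}\bigg|_{\boldsymbol{\theta}=\boldsymbol{\theta}_o} = \mathrm{Tr}\left(\mathbf{A}^H(\boldsymbol{\theta}_o)\,\mathbf{R}_w^{-1}\mathbf{D}_p(\boldsymbol{\theta}_o)\,\mathbf{R}_w^{-1}\mathbf{A}(\boldsymbol{\theta}_o)\right) \tag{2.59}$$

be the expected value of the log-likelihood gradient under the low-SNR approximation with

$$\mathbf{D}_p(\boldsymbol{\theta}) \triangleq \frac{\partial}{\partial\theta_p}\left(\mathbf{A}(\boldsymbol{\theta})\mathbf{A}^H(\boldsymbol{\theta})\right) = \frac{\partial\mathbf{A}(\boldsymbol{\theta})}{\partial\theta_p}\mathbf{A}^H(\boldsymbol{\theta}) + \mathbf{A}(\boldsymbol{\theta})\frac{\partial\mathbf{A}^H(\boldsymbol{\theta})}{\partial\theta_p}.$$

If we plug $\mathbf{D}_p(\boldsymbol{\theta})$ into (2.59), the argument of the trace can be written as

$$\mathbf{A}^H(\boldsymbol{\theta})\mathbf{R}_w^{-1}\mathbf{A}(\boldsymbol{\theta})\,\frac{\partial}{\partial\theta_p}\left(\mathbf{A}^H(\boldsymbol{\theta})\mathbf{R}_w^{-1}\mathbf{A}(\boldsymbol{\theta})\right)$$

using that $\mathrm{Tr}(\mathbf{A}\mathbf{B}) = \mathrm{Tr}(\mathbf{B}\mathbf{A})$. Therefore, since $\mathbf{A}^H(\boldsymbol{\theta})\mathbf{R}_w^{-1}\mathbf{A}(\boldsymbol{\theta})$ is positive definite, (2.59) vanishes iff $\mathbf{A}^H(\boldsymbol{\theta})\mathbf{R}_w^{-1}\mathbf{A}(\boldsymbol{\theta})$ is independent of $\boldsymbol{\theta}$. This condition implies that $\mathbf{R}_w^{-1}\mathbf{D}_p(\boldsymbol{\theta})\mathbf{R}_w^{-1}$ in (2.59) must lie completely in the orthogonal subspace of $\mathbf{A}(\boldsymbol{\theta})$ for $p = 1,\ldots,P$.

Self-Noise Free Condition

If the gradient of the low-SNR UML log-likelihood function is not zero at $\boldsymbol{\theta} = \boldsymbol{\theta}_o$ as the noise variance goes to zero, the estimator variance exhibits a variance floor due to the randomness of the nuisance parameters. A sufficient condition to have self-noise free estimates at high SNR is that

$$\lim_{\sigma_w^2\to 0}\frac{\partial}{\partial\theta_p}\ln f_{\mathbf{y}}(\mathbf{y};\boldsymbol{\theta})\bigg|_{\boldsymbol{\theta}=\boldsymbol{\theta}_o} = 0$$

meaning that $\mathbf{x}^H\mathbf{A}^H(\boldsymbol{\theta})\,\mathbf{N}^{-1}\mathbf{D}_p(\boldsymbol{\theta})\,\mathbf{N}^{-1}\mathbf{A}(\boldsymbol{\theta})\,\mathbf{x} = 0$ for any value of $\boldsymbol{\theta}$ and $\mathbf{x}$. Notice that this requirement coincides with the unbiasedness condition if $\mathbf{A}(\boldsymbol{\theta})\mathbf{x}$ effectively spans all the signal subspace.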
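The unbiasedness condition of Appendix 2.B can be illustrated with a toy model. The sketch below evaluates the trace in (2.59) for two hypothetical signal matrices with white noise: a phase-rotation model $\mathbf{A}(\theta) = \mathrm{diag}(e^{j\theta n})\mathbf{A}_0$, for which $\mathbf{A}^H\mathbf{R}_w^{-1}\mathbf{A}$ does not depend on $\theta$, and an amplitude model $\mathbf{A}(\theta) = \theta\mathbf{A}_0$, for which it does. Both models, the matrix `A0` and the noise level are assumptions made for this example only.

```python
import numpy as np

rng = np.random.default_rng(1)
M, K = 8, 3
A0 = rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))  # hypothetical pulses
n = np.arange(M)
Rw_inv = np.eye(M) / 0.5  # white noise with sigma_w^2 = 0.5 (assumed)


def mean_gradient(A, dA):
    """Trace in (2.59) with D_p = dA A^H + A dA^H."""
    Dp = dA @ A.conj().T + A @ dA.conj().T
    return np.trace(A.conj().T @ Rw_inv @ Dp @ Rw_inv @ A).real


theta = 0.3

# Phase rotation: A^H R_w^{-1} A is independent of theta, so the mean gradient vanishes.
A_phase = np.exp(1j * theta * n)[:, None] * A0
dA_phase = (1j * n)[:, None] * A_phase
grad_phase = mean_gradient(A_phase, dA_phase)

# Amplitude scaling: A^H R_w^{-1} A grows with theta^2, so the mean gradient does not vanish.
grad_amp = mean_gradient(theta * A0, A0)

print(grad_phase, grad_amp)
```

In the second model the non-zero mean gradient is the signature of a biased low-SNR UML estimate, in line with the condition stated above.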
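Appendix 2.C CML results

Unbiasedness

Plugging $\mathbf{B}(\boldsymbol{\theta}) \triangleq \mathbf{R}_w^{-1/2}\mathbf{A}(\boldsymbol{\theta})$ into $\ln f_{\mathbf{y}}(\mathbf{y};\boldsymbol{\theta},\hat{\mathbf{x}}_{ML}(\boldsymbol{\theta}))$ (2.14), we have that

$$\ln f_{\mathbf{y}}(\mathbf{y};\boldsymbol{\theta},\hat{\mathbf{x}}_{ML}(\boldsymbol{\theta})) = C_4 + \mathrm{Tr}\left(\mathbf{R}_w^{-1/2}\mathbf{B}(\boldsymbol{\theta})\left(\mathbf{B}^H(\boldsymbol{\theta})\mathbf{B}(\boldsymbol{\theta})\right)^{-1}\mathbf{B}^H(\boldsymbol{\theta})\,\mathbf{R}_w^{-1/2}\hat{\mathbf{R}}\right) = C_4 + \mathrm{Tr}\left(\mathbf{R}_w^{-1/2}\mathbf{P}_B(\boldsymbol{\theta})\,\mathbf{R}_w^{-1/2}\hat{\mathbf{R}}\right)$$

with $\mathbf{P}_B(\boldsymbol{\theta}) \triangleq \mathbf{B}(\boldsymbol{\theta})\left(\mathbf{B}^H(\boldsymbol{\theta})\mathbf{B}(\boldsymbol{\theta})\right)^{-1}\mathbf{B}^H(\boldsymbol{\theta})$ the orthogonal projector onto the subspace generated by the columns of $\mathbf{B}(\boldsymbol{\theta})$. Computing now the log-likelihood gradient, it is found that

$$\frac{\partial}{\partial\theta_p}\ln f_{\mathbf{y}}(\mathbf{y};\boldsymbol{\theta},\hat{\mathbf{x}}_{ML}(\boldsymbol{\theta})) = \mathrm{Tr}\left(\mathbf{R}_w^{-1/2}\frac{\partial\mathbf{P}_B(\boldsymbol{\theta})}{\partial\theta_p}\mathbf{R}_w^{-1/2}\hat{\mathbf{R}}\right)$$

where the derivative of the orthogonal projector is given by [Vib91, Eq. 33]

$$\frac{\partial\mathbf{P}_B(\boldsymbol{\theta})}{\partial\theta_p} = \mathbf{P}_B^{\perp}(\boldsymbol{\theta})\frac{\partial\mathbf{B}(\boldsymbol{\theta})}{\partial\theta_p}\mathbf{B}^{\#}(\boldsymbol{\theta}) + \left(\mathbf{P}_B^{\perp}(\boldsymbol{\theta})\frac{\partial\mathbf{B}(\boldsymbol{\theta})}{\partial\theta_p}\mathbf{B}^{\#}(\boldsymbol{\theta})\right)^H \tag{2.60}$$

with $\mathbf{P}_B^{\perp}(\boldsymbol{\theta}) \triangleq \mathbf{I}_M - \mathbf{P}_B(\boldsymbol{\theta})$. Therefore, the expected value of the gradient is

$$E_{\mathbf{y}}\left\{\frac{\partial}{\partial\theta_p}\ln f_{\mathbf{y}}(\mathbf{y};\boldsymbol{\theta},\hat{\mathbf{x}}_{ML}(\boldsymbol{\theta}))\right\}\bigg|_{\boldsymbol{\theta}=\boldsymbol{\theta}_o} = \mathrm{Tr}\left(\mathbf{R}_w^{-1/2}\frac{\partial\mathbf{P}_B(\boldsymbol{\theta})}{\partial\theta_p}\bigg|_{\boldsymbol{\theta}=\boldsymbol{\theta}_o}\mathbf{R}_w^{-1/2}\left[\mathbf{A}(\boldsymbol{\theta}_o)\mathbf{A}^H(\boldsymbol{\theta}_o) + \mathbf{R}_w\right]\right) = \mathrm{Tr}\left(\frac{\partial\mathbf{P}_B(\boldsymbol{\theta})}{\partial\theta_p}\bigg|_{\boldsymbol{\theta}=\boldsymbol{\theta}_o} + \frac{\partial\mathbf{P}_B(\boldsymbol{\theta})}{\partial\theta_p}\bigg|_{\boldsymbol{\theta}=\boldsymbol{\theta}_o}\mathbf{B}(\boldsymbol{\theta}_o)\mathbf{B}^H(\boldsymbol{\theta}_o)\right),$$

that is equal to zero because $\mathbf{P}_B^{\perp}(\boldsymbol{\theta})\mathbf{B}(\boldsymbol{\theta}) = \mathbf{0}$ and $\mathbf{B}^H(\boldsymbol{\theta})\mathbf{P}_B^{\perp}(\boldsymbol{\theta}) = \mathbf{0}$.

Self-Noise Free

If the gradient of the CML log-likelihood function is not zero at $\boldsymbol{\theta} = \boldsymbol{\theta}_o$ as the noise variance goes to zero, the estimator variance exhibits a variance floor due to the randomness of the nuisance parameters. A sufficient condition to have self-noise free estimates at high SNR is that

$$\lim_{\sigma_w^2\to 0}\frac{\partial}{\partial\theta_p}\ln f_{\mathbf{y}}(\mathbf{y};\boldsymbol{\theta},\hat{\mathbf{x}}_{ML}(\boldsymbol{\theta}))\bigg|_{\boldsymbol{\theta}=\boldsymbol{\theta}_o} = 0,$$

meaning that

$$\mathbf{x}^H\mathbf{B}^H(\boldsymbol{\theta}_o)\frac{\partial\mathbf{P}_B(\boldsymbol{\theta})}{\partial\theta_p}\bigg|_{\boldsymbol{\theta}=\boldsymbol{\theta}_o}\mathbf{B}(\boldsymbol{\theta}_o)\,\mathbf{x} = 0$$

for any value of $\boldsymbol{\theta}$ and $\mathbf{x}$. Notice that the last equation is verified for any value of $\mathbf{x}$ due to (2.60). Actually, the CML is able to cancel out the self-noise as well as the bias because of the orthogonal projector $\mathbf{P}_B^{\perp}(\boldsymbol{\theta})$ appearing in $\partial\mathbf{P}_B(\boldsymbol{\theta})/\partial\theta_p$ (2.60).

Appendix 2.D GML asymptotic study

Using the inversion lemma [Kay93b, p.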
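571], we find that $\mathbf{R}^{-1}(\boldsymbol{\theta})$ has the following asymptotic expressions:

$$\lim_{\sigma_w^2\to\infty}\mathbf{R}^{-1} = \mathbf{R}_w^{-1}\left(\mathbf{I}_M - \mathbf{A}\mathbf{A}^H\mathbf{R}_w^{-1}\right)$$

$$\lim_{\sigma_w^2\to 0}\mathbf{R}^{-1} = \mathbf{R}_w^{-1}\left(\mathbf{I}_M - \mathbf{A}\left(\mathbf{A}^H\mathbf{R}_w^{-1}\mathbf{A}\right)^{-1}\mathbf{A}^H\mathbf{R}_w^{-1}\right)$$

with the operator $\lim$ meaning "asymptotically approximated to" in this appendix. If we substitute these results into (2.24) and omit constant terms, we obtain the following asymptotic expressions for the GML cost function:

$$\lim_{\sigma_w^2\to\infty}\ln E_{\mathbf{x}}\{f_{\mathbf{y}}(\mathbf{y}/\mathbf{x};\boldsymbol{\theta})\} \propto \mathrm{Tr}\ln\left(\mathbf{I}_M - \mathbf{A}\mathbf{A}^H\mathbf{R}_w^{-1}\right) + \mathrm{Tr}\left(\mathbf{R}_w^{-1}\mathbf{A}\mathbf{A}^H\mathbf{R}_w^{-1}\hat{\mathbf{R}}\right) \approx -\mathrm{Tr}\left(\mathbf{A}^H\mathbf{R}_w^{-1}\mathbf{A}\right) + \mathrm{Tr}\left(\mathbf{R}_w^{-1}\mathbf{A}\mathbf{A}^H\mathbf{R}_w^{-1}\hat{\mathbf{R}}\right) \tag{2.61}$$

$$\lim_{\sigma_w^2\to 0}\ln E_{\mathbf{x}}\{f_{\mathbf{y}}(\mathbf{y}/\mathbf{x};\boldsymbol{\theta})\} \propto -\mathrm{Tr}\ln\left(\mathbf{A}\mathbf{A}^H + \mathbf{R}_w\right) + \mathrm{Tr}\left(\mathbf{R}_w^{-1}\mathbf{A}\left(\mathbf{A}^H\mathbf{R}_w^{-1}\mathbf{A}\right)^{-1}\mathbf{A}^H\mathbf{R}_w^{-1}\hat{\mathbf{R}}\right) \approx \mathrm{Tr}\left(\mathbf{R}_w^{-1}\mathbf{A}\left(\mathbf{A}^H\mathbf{R}_w^{-1}\mathbf{A}\right)^{-1}\mathbf{A}^H\mathbf{R}_w^{-1}\hat{\mathbf{R}}\right), \tag{2.62}$$

that correspond to the low-SNR UML and CML solutions obtained in (2.17) and (2.19), respectively. The independent term $b(\boldsymbol{\theta})$ in (2.61) has been approximated using the Taylor expansion of the logarithm and the commutative property of the trace [Kay93b, p. 571], yielding

$$\lim_{\sigma_w^2\to\infty}\mathrm{Tr}\ln\left(\mathbf{I}_M - \sigma_w^{-2}\mathbf{A}\mathbf{A}^H\mathbf{N}^{-1}\right) = \mathrm{Tr}\ln(\mathbf{I}_M) + \mathrm{Tr}\left(-\sigma_w^{-2}\mathbf{A}\mathbf{A}^H\mathbf{N}^{-1}\right) = -\mathrm{Tr}\left(\mathbf{A}^H\mathbf{R}_w^{-1}\mathbf{A}\right).$$

On the other hand, the independent term $b(\boldsymbol{\theta})$ in (2.62) is neglected at high SNR since it converges to the constant $-\mathrm{Tr}\ln(\mathbf{A}\mathbf{A}^H)$ whereas the second term is proportional to $\sigma_w^{-2}$.

Appendix 2.E Closed-loop estimation efficiency

Following the indications in [Kay93b, Appendix 7B], if the observation $\mathbf{y}$ is split into $N$ statistically independent blocks, the log-likelihood function $\ln f_{\mathbf{y}}(\mathbf{y};\boldsymbol{\theta})$ is given by

$$\ln f_{\mathbf{y}}(\mathbf{y};\boldsymbol{\theta}) = \sum_{n=1}^{N}\ln f_{\mathbf{z}}(\mathbf{z}_n;\boldsymbol{\theta})$$

and, thus, the corresponding gradient and Hessian are given by

$$\nabla(\mathbf{y};\boldsymbol{\theta}) = \sum_{n=1}^{N}\frac{\partial\ln f_{\mathbf{z}}(\mathbf{z}_n;\boldsymbol{\theta})}{\partial\boldsymbol{\theta}} \triangleq \sum_{n=1}^{N}\nabla_z(\mathbf{z}_n;\boldsymbol{\theta}) \tag{2.63}$$

$$\mathbf{H}(\mathbf{y};\boldsymbol{\theta}) = \sum_{n=1}^{N}\frac{\partial^2\ln f_{\mathbf{z}}(\mathbf{z}_n;\boldsymbol{\theta})}{\partial\boldsymbol{\theta}\,\partial\boldsymbol{\theta}^T} \tag{2.64}$$

respectively.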
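Therefore, the Newton-Raphson and scoring algorithms are updated in the $k$-th iteration adding the following term

$$\frac{1}{N}\sum_{n=1}^{N}\mathbf{D}_g(\hat{\boldsymbol{\theta}}_k)\,\mathbf{J}_z^{-1}(\hat{\boldsymbol{\theta}}_k)\,\nabla_z(\mathbf{z}_n;\hat{\boldsymbol{\theta}}_k), \tag{2.65}$$

in which we have taken into account that

$$\sum_{n=1}^{N}\frac{\partial^2\ln f_{\mathbf{z}}(\mathbf{z}_n;\boldsymbol{\theta})}{\partial\boldsymbol{\theta}\,\partial\boldsymbol{\theta}^T} \simeq N E_{\mathbf{z}}\left\{\frac{\partial^2\ln f_{\mathbf{z}}(\mathbf{z}_n;\boldsymbol{\theta})}{\partial\boldsymbol{\theta}\,\partial\boldsymbol{\theta}^T}\right\} \triangleq -N\mathbf{J}_z(\boldsymbol{\theta}),$$

for $N$ sufficiently large [Kay93b, Appendix 7B]. Notice that, except for the sign, the last expression is the Fisher's information matrix:

$$\mathbf{J}(\boldsymbol{\theta}) = -E_{\mathbf{y}}\{\mathbf{H}(\mathbf{y};\boldsymbol{\theta})\} = -\sum_{n=1}^{N}E_{\mathbf{z}}\left\{\frac{\partial^2\ln f_{\mathbf{z}}(\mathbf{z}_n;\boldsymbol{\theta})}{\partial\boldsymbol{\theta}\,\partial\boldsymbol{\theta}^T}\right\} = N\mathbf{J}_z(\boldsymbol{\theta}).$$

Then, the averaging in (2.65) can be substituted by an exponential filtering such as

$$\boldsymbol{\varepsilon}_n = (1-\mu)\,\boldsymbol{\varepsilon}_{n-1} - \mu\,\mathbf{D}_g(\hat{\boldsymbol{\theta}}_k)\,\mathbf{J}_z^{-1}(\hat{\boldsymbol{\theta}}_k)\,\nabla_z(\mathbf{z}_n;\hat{\boldsymbol{\theta}}_k), \tag{2.66}$$

with $\boldsymbol{\varepsilon}_0 = \mathbf{0}$. The step-size or forgetting factor $\mu$ is adjusted to yield the same noise equivalent bandwidth, which, normalized to the sampling rate $1/T$, is defined as

$$B_n \triangleq \frac{T\int_{-1/2T}^{1/2T}|H(f)|^2\, df}{2\,|H(0)|^2}$$

where $T$ is the sampling period and $H(f)$ is the frequency response of the adopted filter [Men97, Sec. 3.5.5]. Using this formula, it follows that the noise equivalent bandwidth for the integrator (2.65) and the exponential filter (2.66) is $B_n = 0.5/N$ and $B_n = 0.5\mu/(2-\mu) \simeq \mu/4$, respectively, where the last approximation is verified for $\mu \ll 1$. Using this approximation, the step-size $\mu$ is approximately equal to $2/N$ (for $N \gg 1$) and (2.66) can be written as

$$\boldsymbol{\varepsilon}_n = \boldsymbol{\varepsilon}_{n-1} - \mu\,\mathbf{D}_g(\hat{\boldsymbol{\theta}}_k)\,\mathbf{J}_z^{-1}(\hat{\boldsymbol{\theta}}_k)\,\nabla_z(\mathbf{z}_n;\hat{\boldsymbol{\theta}}_k). \tag{2.67}$$

Finally, if (2.67) is integrated into the Newton-Raphson or scoring recursions, and the estimated parameter is updated after processing each block, we obtain the closed-loop estimator presented in (2.30). Notice that the obtained closed-loop estimator can also iterate the $N$ blocks several times, as the original iterative methods in (2.27) and (2.29).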
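The equivalence between the block averaging and the exponential filter can be checked numerically. The sketch below is a minimal illustration that assumes the noise equivalent bandwidth is normalized to the sampling rate and evaluates it from the impulse response through Parseval's relation, $B_n = \sum_n h_n^2 / (2(\sum_n h_n)^2)$; the helper `noise_bandwidth`, the block count and the truncation length are choices made for this example.

```python
import numpy as np


def noise_bandwidth(h):
    """Normalized noise equivalent bandwidth of a filter with impulse response h.

    Uses Parseval's relation, T * int |H(f)|^2 df = sum h^2, and H(0) = sum h.
    """
    h = np.asarray(h, dtype=float)
    return np.sum(h ** 2) / (2.0 * np.sum(h) ** 2)


N = 100            # number of averaged blocks (illustrative)
mu = 2.0 / N       # step-size suggested by the mu ~ 2/N rule

h_avg = np.full(N, 1.0 / N)                    # integrator: average of N blocks
h_exp = mu * (1.0 - mu) ** np.arange(5000)     # exponential filter, truncated tail

Bn_avg = noise_bandwidth(h_avg)                # expected 0.5 / N
Bn_exp = noise_bandwidth(h_exp)                # expected 0.5 * mu / (2 - mu)

print(Bn_avg, Bn_exp)
```

With $\mu = 2/N$ the two bandwidths differ by about one percent for $N = 100$, which is the size of the error introduced by the small-$\mu$ approximation.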
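Appendix 2.F Computation of $\boldsymbol{\Sigma}_{sv}(\boldsymbol{\theta})$ for the small-error bounds

The computation of $\boldsymbol{\Sigma}_{sv}(\boldsymbol{\theta}) = E\{\mathbf{s}(\mathbf{y};\boldsymbol{\theta})\,\mathbf{v}^H(\mathbf{y};\boldsymbol{\theta})\}$ for the Bhattacharyya and Cramér-Rao bounds requires computing the following term

$$E_{\mathbf{y}}\left\{\frac{\partial^n\ln f_{\mathbf{y}}(\mathbf{y};\boldsymbol{\theta})}{\partial\boldsymbol{\theta}^n}\mathbf{v}^H(\mathbf{y};\boldsymbol{\theta})\right\} = \int\frac{\partial^n f_{\mathbf{y}}(\mathbf{y};\boldsymbol{\theta})}{\partial\boldsymbol{\theta}^n}\mathbf{v}^H(\mathbf{y};\boldsymbol{\theta})\, d\mathbf{y}. \tag{2.68}$$

The last term can be further manipulated taking into account that the estimator is unbiased, i.e., $E\{\mathbf{v}(\mathbf{y};\boldsymbol{\theta})\} = \mathbf{0}$. Then, if the chain rule is applied and the integral and derivative signs are swapped, we obtain that

$$\frac{\partial^n}{\partial\boldsymbol{\theta}^n}E_{\mathbf{y}}\{\mathbf{v}^H(\mathbf{y};\boldsymbol{\theta})\} = \frac{\partial^n}{\partial\boldsymbol{\theta}^n}\int f_{\mathbf{y}}(\mathbf{y};\boldsymbol{\theta})\,\mathbf{v}^H(\mathbf{y};\boldsymbol{\theta})\, d\mathbf{y} = \int\frac{\partial^n f_{\mathbf{y}}(\mathbf{y};\boldsymbol{\theta})}{\partial\boldsymbol{\theta}^n}\mathbf{v}^H(\mathbf{y};\boldsymbol{\theta})\, d\mathbf{y} + \int f_{\mathbf{y}}(\mathbf{y};\boldsymbol{\theta})\frac{\partial^n\mathbf{v}^H(\mathbf{y};\boldsymbol{\theta})}{\partial\boldsymbol{\theta}^n}\, d\mathbf{y} = \mathbf{0}.$$

Then, using that $\mathbf{v}(\mathbf{y};\boldsymbol{\theta}) \triangleq \hat{\boldsymbol{\alpha}}(\mathbf{y}) - \mathbf{g}(\boldsymbol{\theta})$, it follows that

$$\int f_{\mathbf{y}}(\mathbf{y};\boldsymbol{\theta})\frac{\partial^n\mathbf{v}^H(\mathbf{y};\boldsymbol{\theta})}{\partial\boldsymbol{\theta}^n}\, d\mathbf{y} = -\int f_{\mathbf{y}}(\mathbf{y};\boldsymbol{\theta})\frac{\partial^n\mathbf{g}^H(\boldsymbol{\theta})}{\partial\boldsymbol{\theta}^n}\, d\mathbf{y} = -\frac{\partial^n\mathbf{g}^H(\boldsymbol{\theta})}{\partial\boldsymbol{\theta}^n}$$

must be equal to (2.68) except for the minus sign. Thus, we conclude that

$$E_{\mathbf{y}}\left\{\frac{\partial^n\ln f_{\mathbf{y}}(\mathbf{y};\boldsymbol{\theta})}{\partial\boldsymbol{\theta}^n}\mathbf{v}^H(\mathbf{y};\boldsymbol{\theta})\right\} = \frac{\partial^n\mathbf{g}^H(\boldsymbol{\theta})}{\partial\boldsymbol{\theta}^n}.$$

Appendix 2.G MCRB, CCRB and UCRB derivation

In this appendix, the derivation of the lower bounds introduced in Section 2.6.1 is sketched.

UCRB Derivation

The UCRB involves the computation of the following score function:

$$[\mathbf{s}_u(\mathbf{y};\boldsymbol{\theta})]_p \triangleq \frac{\partial\ln f_{\mathbf{y}}(\mathbf{y};\boldsymbol{\theta})}{\partial\theta_p} = -\frac{\partial}{\partial\theta_p}\left[\ln\det(\mathbf{R}(\boldsymbol{\theta})) + \mathrm{Tr}\left(\mathbf{R}^{-1}(\boldsymbol{\theta})\hat{\mathbf{R}}\right)\right] = \mathrm{Tr}\left(\mathbf{R}^{-1}(\boldsymbol{\theta})\frac{\partial\mathbf{R}(\boldsymbol{\theta})}{\partial\theta_p}\mathbf{R}^{-1}(\boldsymbol{\theta})\left(\hat{\mathbf{R}} - \mathbf{R}(\boldsymbol{\theta})\right)\right),$$

where $f_{\mathbf{y}}(\mathbf{y};\boldsymbol{\theta})$ is the Gaussian p.d.f. introduced in (2.22) and the following two expressions from [Ott93, Eq. 4.57-58] have been applied:

$$\frac{\partial}{\partial\theta_p}\ln\det(\mathbf{R}(\boldsymbol{\theta})) = \mathrm{Tr}\left(\mathbf{R}^{-1}(\boldsymbol{\theta})\frac{\partial\mathbf{R}(\boldsymbol{\theta})}{\partial\theta_p}\right)$$

$$\frac{\partial}{\partial\theta_p}\mathrm{Tr}\left(\mathbf{R}^{-1}(\boldsymbol{\theta})\hat{\mathbf{R}}\right) = -\mathrm{Tr}\left(\mathbf{R}^{-1}(\boldsymbol{\theta})\frac{\partial\mathbf{R}(\boldsymbol{\theta})}{\partial\theta_p}\mathbf{R}^{-1}(\boldsymbol{\theta})\hat{\mathbf{R}}\right).$$

Therefore, the score function $\mathbf{s}_u(\mathbf{y};\boldsymbol{\theta})$ can be written as follows

$$\mathbf{s}_u(\mathbf{y};\boldsymbol{\theta}) = \mathbf{D}_r^H(\boldsymbol{\theta})\left(\mathbf{R}^*(\boldsymbol{\theta})\otimes\mathbf{R}(\boldsymbol{\theta})\right)^{-1}\left(\hat{\mathbf{r}} - \mathbf{r}(\boldsymbol{\theta})\right),$$

with the following definitions

$$[\mathbf{D}_r(\boldsymbol{\theta})]_p \triangleq \mathrm{vec}\left(\partial\mathbf{R}(\boldsymbol{\theta})/\partial\theta_p\right), \qquad \hat{\mathbf{r}} \triangleq \mathrm{vec}(\hat{\mathbf{R}}), \qquad \mathbf{r}(\boldsymbol{\theta}) \triangleq \mathrm{vec}(\mathbf{R}(\boldsymbol{\theta})), \tag{2.69}$$

and using the following relationships:

$$\mathrm{vec}\left(\mathbf{A}\mathbf{B}\mathbf{C}^H\right) = \left(\mathbf{C}^*\otimes\mathbf{A}\right)\mathrm{vec}(\mathbf{B})$$

$$\mathbf{A}^{-1}\otimes\mathbf{B}^{-1} = \left(\mathbf{A}\otimes\mathbf{B}\right)^{-1}.$$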
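The two Kronecker relationships above, and the resulting vectorized form of the unconditional score, can be checked numerically. The following sketch uses random stand-in matrices (assumptions for illustration only) and the column-wise `vec` convention; `crandn` and `vec` are hypothetical helpers.

```python
import numpy as np

rng = np.random.default_rng(2)
M = 4


def crandn(*shape):
    """Complex Gaussian samples used as illustrative stand-ins."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)


def vec(X):
    """Column-wise vectorization."""
    return X.reshape(-1, order="F")


# vec(A B C^H) = (C^* kron A) vec(B)
A, B, C = crandn(M, M), crandn(M, M), crandn(M, M)
lhs_vec = vec(A @ B @ C.conj().T)
rhs_vec = np.kron(C.conj(), A) @ vec(B)

# A^{-1} kron B^{-1} = (A kron B)^{-1}
lhs_inv = np.kron(np.linalg.inv(A), np.linalg.inv(B))
rhs_inv = np.linalg.inv(np.kron(A, B))

# Score element: trace form versus D_r^H (R^* kron R)^{-1} (r_hat - r).
G = crandn(M, M)
R = G @ G.conj().T + np.eye(M)          # Hermitian positive definite covariance
H = crandn(M, M)
dR = H + H.conj().T                     # Hermitian derivative dR/dtheta_p
Y = crandn(M, 3)
R_hat = Y @ Y.conj().T / 3              # sample covariance from 3 snapshots
R_inv = np.linalg.inv(R)

score_trace = np.trace(R_inv @ dR @ R_inv @ (R_hat - R))
score_vec = vec(dR).conj() @ np.linalg.solve(np.kron(R.conj(), R), vec(R_hat - R))

print(np.allclose(lhs_vec, rhs_vec), np.isclose(score_trace, score_vec))
```

Solving the linear system with `np.linalg.solve` avoids forming the inverse of the $M^2 \times M^2$ Kronecker product explicitly.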
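Finally, in the unconditional model, the Fisher's information matrix becomes

$$\mathbf{J}_u(\boldsymbol{\theta}) \triangleq E_{\mathbf{y}}\{\mathbf{s}_u(\mathbf{y};\boldsymbol{\theta})\,\mathbf{s}_u^H(\mathbf{y};\boldsymbol{\theta})\} = \mathbf{D}_r^H(\boldsymbol{\theta})\left(\mathbf{R}^*(\boldsymbol{\theta})\otimes\mathbf{R}(\boldsymbol{\theta})\right)^{-1}\mathbf{D}_r(\boldsymbol{\theta})$$

using that the covariance matrix of $\hat{\mathbf{r}} - \mathbf{r}(\boldsymbol{\theta})$ is precisely $\mathbf{R}^*(\boldsymbol{\theta})\otimes\mathbf{R}(\boldsymbol{\theta})$ under the Gaussian assumption [Li99, Eq. 20]. In Chapter 4, it will be shown that $\mathbf{J}_u$ can be obtained from $\mathbf{J}_2$ (2.52) when the nuisance parameters are Gaussian distributed.

CCRB Derivation

The CCRB was originally derived in [Sto89][Sto90a] for DOA estimation. A different derivation is given next based on the asymptotic performance of the CML estimator and the estimation bounds theory presented in Section 2.6.1. In the conditional model, the CML estimator is formulated from the following score function:

$$[\mathbf{s}_c(\mathbf{y};\boldsymbol{\theta})]_p \triangleq \frac{\partial}{\partial\theta_p}\ln f_{\mathbf{y}}(\mathbf{y};\boldsymbol{\theta},\hat{\mathbf{x}}_{ML}(\boldsymbol{\theta})) = 2\,\mathrm{Re}\,\mathrm{Tr}\left(\mathbf{R}_w^{-1}\mathbf{P}_A^{\perp}(\boldsymbol{\theta})\frac{\partial\mathbf{A}(\boldsymbol{\theta})}{\partial\theta_p}\mathbf{A}^{\#}(\boldsymbol{\theta})\,\hat{\mathbf{R}}\right) = 2\,\mathrm{Re}\,\mathrm{Tr}\left(\mathbf{R}_w^{-1}\mathbf{P}_A^{\perp}(\boldsymbol{\theta})\frac{\partial\mathbf{A}(\boldsymbol{\theta})}{\partial\theta_p}\mathbf{A}^{\#}(\boldsymbol{\theta})\left(\hat{\mathbf{R}} - \mathbf{R}(\boldsymbol{\theta})\right)\right)$$

that is obtained using the results in Appendix 2.C. Using again (2.69) and $\mathrm{vec}(\mathbf{A}\mathbf{B}\mathbf{C}^H) = (\mathbf{C}^*\otimes\mathbf{A})\,\mathrm{vec}(\mathbf{B})$, it follows that

$$\mathbf{s}_c(\mathbf{y};\boldsymbol{\theta}) = \frac{\partial}{\partial\boldsymbol{\theta}}\ln f_{\mathbf{y}}(\mathbf{y};\boldsymbol{\theta},\hat{\mathbf{x}}_{ML}(\boldsymbol{\theta})) = 2\,\mathrm{Re}\left\{\mathbf{D}_a^H(\boldsymbol{\theta})\left(\mathbf{A}^{\#*}(\boldsymbol{\theta})\otimes\mathbf{R}_w^{-1}\mathbf{P}_A^{\perp}(\boldsymbol{\theta})\right)\left(\hat{\mathbf{r}} - \mathbf{r}(\boldsymbol{\theta})\right)\right\}$$

with $[\mathbf{D}_a(\boldsymbol{\theta})]_p \triangleq \mathrm{vec}(\partial\mathbf{A}(\boldsymbol{\theta})/\partial\theta_p)$. After some tedious simplifications that are omitted for the sake of brevity, in the conditional model, the Fisher's information matrix is given by

$$E_{\mathbf{y}}\{\mathbf{s}_c(\mathbf{y};\boldsymbol{\theta})\,\mathbf{s}_c^H(\mathbf{y};\boldsymbol{\theta})\} = 2\,\mathrm{Re}\left\{\mathbf{D}_a^H(\boldsymbol{\theta})\left(\mathbf{x}\mathbf{x}^H\otimes\mathbf{R}_w^{-1}\mathbf{P}_A^{\perp}(\boldsymbol{\theta})\right)\mathbf{D}_a(\boldsymbol{\theta})\right\} + 2\,\mathrm{Re}\left\{\mathbf{D}_a^H(\boldsymbol{\theta})\left(\left(\mathbf{A}^H(\boldsymbol{\theta})\mathbf{R}_w^{-1}\mathbf{A}(\boldsymbol{\theta})\right)^{-1}\otimes\mathbf{R}_w^{-1}\mathbf{P}_A^{\perp}(\boldsymbol{\theta})\right)\mathbf{D}_a(\boldsymbol{\theta})\right\},$$

that is found to depend on the actual vector of nuisance parameters $\mathbf{x}$. It is shown in [Sto90a, Eq. 2.13] that the first term converges to its expected value as the observation size increases and, thus, $\mathbf{x}\mathbf{x}^H \to \mathbf{I}_K$. On the other hand, the second term can be neglected if the SNR or the observation length goes to infinity. Actually, this second term causes the CML degradation at low SNR when the observation is short [Sto90a, Eq. 2.15].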
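Bearing in mind these arguments, the asymptotic Fisher's information matrix appearing in the CCRB expression (2.50) contains only the average of the first term. The resulting expression is known to bound the performance of the CML and GML estimators whatever the SNR or the observation size. However, the adopted CCRB becomes a loose bound for low SNRs in case of finite observations.

MCRB Derivation

A straightforward derivation of the multidimensional MCRB is provided next:

$$\ln f_{\mathbf{y}/\mathbf{x}}(\mathbf{y}/\mathbf{x};\boldsymbol{\theta}) = \mathrm{const} - \left\|\mathbf{y} - \mathbf{A}(\boldsymbol{\theta})\mathbf{x}\right\|^2_{\mathbf{R}_w^{-1}}$$

$$\frac{\partial\ln f_{\mathbf{y}/\mathbf{x}}(\mathbf{y}/\mathbf{x};\boldsymbol{\theta})}{\partial\theta_p} = 2\,\mathrm{Re}\left\{\left(\mathbf{y} - \mathbf{A}(\boldsymbol{\theta})\mathbf{x}\right)^H\mathbf{R}_w^{-1}\frac{\partial\mathbf{A}(\boldsymbol{\theta})}{\partial\theta_p}\mathbf{x}\right\}$$

$$\frac{\partial^2\ln f_{\mathbf{y}/\mathbf{x}}(\mathbf{y}/\mathbf{x};\boldsymbol{\theta})}{\partial\theta_p\,\partial\theta_q} = 2\,\mathrm{Re}\left\{\left(\mathbf{y} - \mathbf{A}(\boldsymbol{\theta})\mathbf{x}\right)^H\mathbf{R}_w^{-1}\frac{\partial^2\mathbf{A}(\boldsymbol{\theta})}{\partial\theta_p\,\partial\theta_q}\mathbf{x} - \mathbf{x}^H\frac{\partial\mathbf{A}^H(\boldsymbol{\theta})}{\partial\theta_q}\mathbf{R}_w^{-1}\frac{\partial\mathbf{A}(\boldsymbol{\theta})}{\partial\theta_p}\mathbf{x}\right\}$$

$$E_{\mathbf{y}/\mathbf{x}}\left\{\frac{\partial^2\ln f_{\mathbf{y}/\mathbf{x}}(\mathbf{y}/\mathbf{x};\boldsymbol{\theta})}{\partial\theta_p\,\partial\theta_q}\right\} = -2\,\mathrm{Re}\left\{\mathbf{x}^H\frac{\partial\mathbf{A}^H(\boldsymbol{\theta})}{\partial\theta_q}\mathbf{R}_w^{-1}\frac{\partial\mathbf{A}(\boldsymbol{\theta})}{\partial\theta_p}\mathbf{x}\right\}$$

and, therefore,

$$[\mathbf{J}_m]_{p,q} = -E_{\mathbf{x}}E_{\mathbf{y}/\mathbf{x}}\left\{\frac{\partial^2\ln f_{\mathbf{y}/\mathbf{x}}(\mathbf{y}/\mathbf{x};\boldsymbol{\theta})}{\partial\theta_p\,\partial\theta_q}\right\} = 2\,\mathrm{Re}\,\mathrm{Tr}\left(\frac{\partial\mathbf{A}^H(\boldsymbol{\theta})}{\partial\theta_q}\mathbf{R}_w^{-1}\frac{\partial\mathbf{A}(\boldsymbol{\theta})}{\partial\theta_p}\right).$$

Finally, the elements of $\mathbf{J}_m$ can be arranged as in equation (2.54) using the following properties:

$$\mathrm{Tr}\left(\mathbf{A}^H\mathbf{B}\right) = \mathrm{vec}^H(\mathbf{A})\,\mathrm{vec}(\mathbf{B})$$

$$\mathrm{vec}(\mathbf{A}\mathbf{B}\mathbf{C}) = \left(\mathbf{C}^T\otimes\mathbf{A}\right)\mathrm{vec}(\mathbf{B}).$$

Chapter 3

Optimal Second-Order Estimation

In this chapter, optimal second-order estimators are formulated considering that the estimator is provided with some side information about the unknown parameters. This side information can be exploited to improve the estimator accuracy or imposed by the designer in order to constrain the estimator mean response. Although the formulation in both cases will be the same, the classification above becomes crucial from a theoretical viewpoint. In that way, the adopted framework allows unifying the Bayesian and classical estimation theories.

In the first case, the Bayesian approach is adopted to model the unknown parameters as random variables of known probability density function $f_{\boldsymbol{\theta}}(\boldsymbol{\theta})$. Thus, $f_{\boldsymbol{\theta}}(\boldsymbol{\theta})$ provides the available statistical information on the parameters prior to the observation of the data.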
Bayesian estimators resort to this prior information when the observation is severely corrupted by the noise in low-SNR scenarios. On the other hand, the prior contribution is scarce if the observation is rather informative.

The above side information is supposed to be obtained in a previous estimation stage providing both the estimate and its accuracy. In that case, Gaussian priors are usually employed having in mind that the output of a consistent estimator becomes asymptotically Gaussian distributed on account of the Central Limit Theorem. When the parameter is constrained to a given interval, the folded Gaussian distribution is more appropriate [Rib97]. In particular, the folded Gaussian p.d.f. converges to the uniform distribution when all the available knowledge is the parameter range.

Bayesian estimation has received a lot of attention in the past decades but it has always raised controversy because the parameters are actually deterministic unknowns in a typical estimation problem (Section 2.1). The Bayesian approach is realistic if the parameters can be modelled as ergodic realizations of the a priori distribution $f_{\boldsymbol{\theta}}(\boldsymbol{\theta})$. In this kind of application, adaptive filters or trackers must be designed in order to track the parameter temporal evolution (Section 2.5.2). If the observation is linear in the parameters and the prior is Gaussian, the optimal linear tracker is the well-known Kalman filter [And79][Kay93b]. Unfortunately, most estimation problems in communications are nonlinear and the suboptimal Extended Kalman filter (EKF) must be used instead. In Chapter 5, the EKF formulation is generalized to design blind second-order trackers based on the Bayesian interpretation of the results in this chapter.

If the classical estimation theory is adopted, the side information can be used to constrain the estimator mean response.
In that case, $f_{\boldsymbol{\theta}}(\boldsymbol{\theta})$ is a weighting function introduced by the designer to define new custom-built optimization criteria. Anyway, the formulation in both cases will be identical and $f_{\boldsymbol{\theta}}(\boldsymbol{\theta})$ will be referred to as the prior distribution in spite of dealing with deterministic parameters. Likewise, $E_{\boldsymbol{\theta}}\{\cdot\}$ will denote indistinctly the Bayesian expectation with respect to the random vector $\boldsymbol{\theta}$ or solely the following averaging

$$E_{\boldsymbol{\theta}}\{\mathbf{F}(\boldsymbol{\theta})\} \triangleq \int\mathbf{F}(\boldsymbol{\theta})\, f_{\boldsymbol{\theta}}(\boldsymbol{\theta})\, d\boldsymbol{\theta}, \tag{3.1}$$

if the parameters are deterministic quantities.

Based on the a priori distribution $f_{\boldsymbol{\theta}}(\boldsymbol{\theta})$ and the known linear signal model introduced in Section 2.4, the objective is to find the optimal second-order estimator of $\boldsymbol{\alpha} \in \mathbb{R}^Q$ where $\boldsymbol{\alpha} = \mathbf{g}(\boldsymbol{\theta})$ is an arbitrary transformation of $\boldsymbol{\theta} \in \mathbb{R}^P$. With this aim, the general expression of any second-order estimator of $\boldsymbol{\alpha}$ is presented next:

$$\hat{\boldsymbol{\alpha}} = \mathbf{b} + \mathbf{M}^H\hat{\mathbf{r}} \tag{3.2}$$

where

$$\hat{\mathbf{r}} \triangleq \mathrm{vec}(\hat{\mathbf{R}}) = \mathrm{vec}\left(\mathbf{y}\mathbf{y}^H\right) \tag{3.3}$$

is the column-wise vectorization of the sample covariance matrix $\hat{\mathbf{R}}$ introduced in Section 2.4.1 and $\mathbf{b}$ and $\mathbf{M}$ are the estimator coefficients corresponding to the independent and the quadratic term, respectively. Notice that the linear term is not considered because the nuisance parameters $\mathbf{x}$ are usually zero-mean random variables in the context of NDA estimation (2.4). If the transmitted constellation is polarized or some training symbols are transmitted, the linear term $\mathbf{L}^H\mathbf{y}$ should be included following a semi-blind approach, thus improving the estimator performance at low SNR [Mes02, Ch.3][Gor97][Car97]. Notice too that the $\mathrm{vec}(\cdot)$ operator can be applied successively to formulate higher-order estimators in order to improve the estimator performance in high-SNR scenarios [Vil01b].

Finally, it is worth noting that a circular constellation is assumed. In that case, the improper covariance matrix $E\{\mathbf{y}\mathbf{y}^T\}$ is equal to zero and, therefore, no information can be drawn from the term $\mathrm{vec}(\mathbf{y}\mathbf{y}^T)$ where $(\cdot)^T$ stands for the transpose [Sch03][Pic96].
In Appendix 3.A, the results in this section are extended to encompass important noncircular constellations holding $E\{\mathbf{y}\mathbf{y}^T\} \ne \mathbf{0}$ (e.g., PAM, BPSK or CPM). Moreover, the design of quadratic carrier phase synchronizers for the noncircular CPM modulation will be addressed in detail in Section 6.2.

Henceforth, the objective is to determine the estimator optimal coefficients $\mathbf{M}$ and $\mathbf{b}$ under a given performance criterion. Two criteria will be analyzed next. The first one is the usual minimum mean squared error (MMSE) criterion that minimizes the aggregated contribution of variance and bias. The MMSE criterion usually leads to biased estimators, mainly in low-SNR scenarios in which the noise-induced variance is dominant. Unfortunately, in some applications bias is not tolerated (e.g., navigation applications) and some constraints must be introduced to compensate this bias. In that case, the proposed alternative is to minimize the estimator MSE subject to the minimum bias constraint.

This chapter presents a convenient framework from which different estimation strategies can be devised as a trade-off between bias and variance. With this purpose, the following definitions are introduced in the next section.

3.1 Definitions and Notation

In this section, the mean square error (MSE) and variance figures are computed for the linear signal model introduced in Section 2.4 and for second-order estimation. Thus, the MSE associated to the generic second-order estimator in (3.2) is given by

$$MSE(\boldsymbol{\theta}) \triangleq E\left\|\hat{\boldsymbol{\alpha}}(\boldsymbol{\theta}) - \mathbf{g}(\boldsymbol{\theta})\right\|^2 = E\left\|\mathbf{b} + \mathbf{M}^H\hat{\mathbf{r}}(\boldsymbol{\theta}) - \mathbf{g}(\boldsymbol{\theta})\right\|^2 \tag{3.4}$$

where the expectation is computed over the noise $\mathbf{w}$ and the nuisance parameters $\mathbf{x}$.
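The estimator MSE can be divided into the bias and variance contributions so that $MSE(\boldsymbol{\theta}) = BIAS^2(\boldsymbol{\theta}) + VAR(\boldsymbol{\theta})$ where the squared bias and variance are defined as follows:

$$BIAS^2(\boldsymbol{\theta}) \triangleq \left\|\bar{\boldsymbol{\alpha}}(\boldsymbol{\theta}) - \mathbf{g}(\boldsymbol{\theta})\right\|^2 = \left\|\mathbf{b} + \mathbf{M}^H\mathbf{r}(\boldsymbol{\theta}) - \mathbf{g}(\boldsymbol{\theta})\right\|^2 \tag{3.5}$$

$$VAR(\boldsymbol{\theta}) \triangleq E\left\|\hat{\boldsymbol{\alpha}}(\boldsymbol{\theta}) - \bar{\boldsymbol{\alpha}}(\boldsymbol{\theta})\right\|^2 = E\left\|\mathbf{M}^H\left(\hat{\mathbf{r}}(\boldsymbol{\theta}) - \mathbf{r}(\boldsymbol{\theta})\right)\right\|^2 = \mathrm{Tr}\left(\mathbf{M}^H\mathbf{Q}(\boldsymbol{\theta})\mathbf{M}\right) \tag{3.6}$$

with

$$\bar{\boldsymbol{\alpha}}(\boldsymbol{\theta}) \triangleq E\{\hat{\boldsymbol{\alpha}}(\boldsymbol{\theta})\} = \mathbf{b} + \mathbf{M}^H\mathbf{r}(\boldsymbol{\theta}) \tag{3.7}$$

the estimator mean value and

$$\mathbf{r}(\boldsymbol{\theta}) \triangleq E\{\hat{\mathbf{r}}(\boldsymbol{\theta})\} = \mathrm{vec}\left(\mathbf{A}(\boldsymbol{\theta})\mathbf{A}^H(\boldsymbol{\theta}) + \mathbf{R}_w\right) \tag{3.8}$$

$$\mathbf{Q}(\boldsymbol{\theta}) \triangleq E\left\{\left(\hat{\mathbf{r}}(\boldsymbol{\theta}) - \mathbf{r}(\boldsymbol{\theta})\right)\left(\hat{\mathbf{r}}(\boldsymbol{\theta}) - \mathbf{r}(\boldsymbol{\theta})\right)^H\right\} \tag{3.9}$$

the mean and the covariance matrix of the vectorized sample covariance matrix $\hat{\mathbf{r}}$, respectively. Notice that $\mathbf{r}(\boldsymbol{\theta})$ corresponds to the (vectorized) covariance matrix of $\mathbf{y}$ whereas $\mathbf{Q}(\boldsymbol{\theta})$ gathers the central fourth-order moments of $\mathbf{y}$. The vectorization is fundamental to derive a closed form for the matrix $\mathbf{Q}(\boldsymbol{\theta})$. In Appendix 3.B, it is found that

$$\mathbf{Q}(\boldsymbol{\theta}) = \mathbf{R}^*(\boldsymbol{\theta})\otimes\mathbf{R}(\boldsymbol{\theta}) + \boldsymbol{\mathcal{A}}(\boldsymbol{\theta})\,\mathbf{K}\,\boldsymbol{\mathcal{A}}^H(\boldsymbol{\theta}) \tag{3.10}$$

where $\boldsymbol{\mathcal{A}}(\boldsymbol{\theta}) \triangleq \mathbf{A}^*(\boldsymbol{\theta})\otimes\mathbf{A}(\boldsymbol{\theta})$, $\mathbf{R}(\boldsymbol{\theta})$ was introduced in Section 2.4.3 and

$$\mathbf{K} \triangleq E_{\mathbf{x}}\left\{\mathrm{vec}\left(\mathbf{x}\mathbf{x}^H\right)\mathrm{vec}^H\left(\mathbf{x}\mathbf{x}^H\right)\right\} - \mathrm{vec}(\mathbf{I}_K)\,\mathrm{vec}^H(\mathbf{I}_K) - \mathbf{I}_{K^2} \tag{3.11}$$

is the matrix containing the fourth-order cumulants (kurtosis) of the nuisance parameters $\mathbf{x}$. It is worth realizing that $\mathbf{Q}(\boldsymbol{\theta})$ and $\mathbf{K}$ are calculated analytically for the linear signal model introduced in Section 2.4, thus avoiding the problematic estimation of fourth-order statistics. In the case of zero-mean, circular complex nuisance parameters, the matrix $\mathbf{K}$ is the following diagonal matrix:

$$\mathbf{K} = (\rho - 2)\,\mathrm{diag}\left(\mathrm{vec}(\mathbf{I}_K)\right) \tag{3.12}$$

where the scalar

$$\rho \triangleq \frac{E\{|[\mathbf{x}]_k|^4\}}{E^2\{|[\mathbf{x}]_k|^2\}}$$

is the fourth- to second-order moment ratio (Appendix 3.C). If the nuisance parameters are not circular (e.g., for the CPM modulation), the expectation in (3.11) has to be computed numerically (and offline) from the known p.d.f. of $\mathbf{x}$. Moreover, if the nuisance parameters are discrete, as it happens in digital communications, the computation of $\mathbf{K}$ needs only a small number of realizations of $f_{\mathbf{x}}(\mathbf{x})$.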
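The closed form (3.12) can be checked against the definition (3.11) by exhaustive enumeration when the alphabet is small. The sketch below assumes i.i.d., equiprobable, unit-power symbols and uses QPSK and 16-QAM as examples; the function names and the choice of $K = 2$ are illustrative assumptions.

```python
import itertools
import numpy as np


def vec(X):
    """Column-wise vectorization."""
    return X.reshape(-1, order="F")


def kurtosis_matrix(alphabet, K):
    """Matrix K in (3.11) for K i.i.d. equiprobable unit-power symbols (exhaustive average)."""
    alphabet = np.asarray(alphabet, dtype=complex)
    acc = np.zeros((K * K, K * K), dtype=complex)
    count = 0
    for symbols in itertools.product(alphabet, repeat=K):
        x = np.array(symbols)
        v = vec(np.outer(x, x.conj()))      # vec(x x^H)
        acc += np.outer(v, v.conj())
        count += 1
    i = vec(np.eye(K))
    return acc / count - np.outer(i, i) - np.eye(K * K)


def moment_ratio(alphabet):
    """Fourth- to second-order moment ratio rho."""
    a = np.abs(np.asarray(alphabet))
    return np.mean(a ** 4) / np.mean(a ** 2) ** 2


qpsk = np.exp(1j * (np.pi / 4 + np.pi / 2 * np.arange(4)))
levels = np.array([-3.0, -1.0, 1.0, 3.0])
qam16 = np.array([a + 1j * b for a in levels for b in levels]) / np.sqrt(10)

K = 2
for name, alphabet in [("QPSK", qpsk), ("16-QAM", qam16)]:
    rho = moment_ratio(alphabet)
    K_closed = (rho - 2) * np.diag(vec(np.eye(K)))
    print(name, rho, np.allclose(kurtosis_matrix(alphabet, K), K_closed))
```

For the constant-modulus QPSK alphabet the ratio is $\rho = 1$, the furthest below the Gaussian value $\rho = 2$ among the two examples, whereas 16-QAM gives $\rho = 1.32$.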
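It is well-known that matrix $\mathbf{K}$ is zero for normally distributed nuisance parameters, for which $\rho = 2$. Otherwise, matrix $\mathbf{K}$ provides the complete non-Gaussian information about the nuisance parameters that second-order estimators are able to exploit. In fact, the GML estimator is sometimes outperformed at high SNR if the second term of (3.10) is considered. This remark was actually the motivation of this thesis and will be analyzed intensively in the following chapters.

Unfortunately, the vector $\mathbf{b}$ and matrix $\mathbf{M}$ minimizing the bias, variance or MSE figures are generally a function of the unknown vector of parameters $\boldsymbol{\theta}$ and, therefore, the resulting estimator is not realizable. Accordingly, the estimator coefficients have to be optimized from a convenient average of these figures of merit over all the possible values of $\boldsymbol{\theta}$. In that sense, the prior $f_{\boldsymbol{\theta}}(\boldsymbol{\theta})$ introduced previously is applied to obtain the following averaged MSE, bias and variance:

$$MSE \triangleq E_{\boldsymbol{\theta}}\{MSE(\boldsymbol{\theta})\} = BIAS^2 + VAR = E_{\boldsymbol{\theta}}E\left\|\mathbf{b} + \mathbf{M}^H\hat{\mathbf{r}}(\boldsymbol{\theta}) - \mathbf{g}(\boldsymbol{\theta})\right\|^2 \tag{3.13}$$

$$BIAS^2 \triangleq E_{\boldsymbol{\theta}}\{BIAS^2(\boldsymbol{\theta})\} = E_{\boldsymbol{\theta}}\left\|\mathbf{b} + \mathbf{M}^H\mathbf{r}(\boldsymbol{\theta}) - \mathbf{g}(\boldsymbol{\theta})\right\|^2 \tag{3.14}$$

$$VAR \triangleq E_{\boldsymbol{\theta}}\{VAR(\boldsymbol{\theta})\} = \mathrm{Tr}\left(\mathbf{M}^H\mathbf{Q}\mathbf{M}\right) \tag{3.15}$$

with $\mathbf{Q} \triangleq E_{\boldsymbol{\theta}}\{\mathbf{Q}(\boldsymbol{\theta})\}$.

The estimator variance in (3.15) is independent of $\mathbf{b}$. Therefore, $\mathbf{b}$ can be selected to minimize the bias contribution without degrading the estimator variance. It is found that the optimum $\mathbf{b}$ is given by

$$\mathbf{b}_{opt} \triangleq \arg\min_{\mathbf{b}} BIAS^2 = \arg\min_{\mathbf{b}} MSE = \mathbf{g} - \mathbf{M}^H\mathbf{r} \tag{3.16}$$

where

$$\mathbf{g} \triangleq E_{\boldsymbol{\theta}}\{\mathbf{g}(\boldsymbol{\theta})\} \tag{3.17}$$

$$\mathbf{r} \triangleq E_{\boldsymbol{\theta}}E\{\hat{\mathbf{r}}(\boldsymbol{\theta})\}. \tag{3.18}$$

If now $\mathbf{b}_{opt}$ is substituted into (3.13) and (3.14), we obtain that

$$BIAS^2 = E_{\boldsymbol{\theta}}\left\|\mathbf{M}^H\left(\mathbf{r}(\boldsymbol{\theta}) - \mathbf{r}\right) - \left(\mathbf{g}(\boldsymbol{\theta}) - \mathbf{g}\right)\right\|^2 = \sigma_g^2 + \mathrm{Tr}\left(\mathbf{M}^H\tilde{\mathbf{Q}}\mathbf{M} - \mathbf{M}^H\mathbf{S} - \mathbf{S}^H\mathbf{M}\right) \tag{3.19}$$

$$VAR = \mathrm{Tr}\left(\mathbf{M}^H\mathbf{Q}\mathbf{M}\right) \tag{3.20}$$

$$MSE = BIAS^2 + VAR = \sigma_g^2 + \mathrm{Tr}\left(\mathbf{M}^H\left(\mathbf{Q} + \tilde{\mathbf{Q}}\right)\mathbf{M} - \mathbf{M}^H\mathbf{S} - \mathbf{S}^H\mathbf{M}\right) \tag{3.21}$$

with the following definitions

$$\sigma_g^2 \triangleq E_{\boldsymbol{\theta}}\left\|\mathbf{g}(\boldsymbol{\theta}) - \mathbf{g}\right\|^2 \tag{3.22}$$

$$\tilde{\mathbf{Q}} \triangleq E_{\boldsymbol{\theta}}\left\{\left(\mathbf{r}(\boldsymbol{\theta}) - \mathbf{r}\right)\left(\mathbf{r}(\boldsymbol{\theta}) - \mathbf{r}\right)^H\right\} \tag{3.23}$$

$$\mathbf{S} \triangleq E_{\boldsymbol{\theta}}\left\{\left(\mathbf{r}(\boldsymbol{\theta}) - \mathbf{r}\right)\left(\mathbf{g}(\boldsymbol{\theta}) - \mathbf{g}\right)^H\right\}. \tag{3.24}$$

The expectation with respect to the prior $f_{\boldsymbol{\theta}}(\boldsymbol{\theta})$ poses serious problems when calculating the analytical expressions of $\mathbf{g}$, $\mathbf{r}$, $\mathbf{Q}$, $\tilde{\mathbf{Q}}$ and $\mathbf{S}$. In Appendix 3.D, this problem is solved when the parameter dependence is phasorial. In the following sections, the MMSE estimator and the minimum bias-variance estimator are formulated, and further analyzed, assuming that these vectors and matrices have been computed somehow.

3.2 Second-Order MMSE Estimator

The second-order MMSE estimator is obtained minimizing the overall MSE in equation (3.21). It follows that the optimum matrix $\mathbf{M}$ is given by

$$\mathbf{M}_{mse} \triangleq \arg\min_{\mathbf{M}} MSE = \left(\mathbf{Q} + \tilde{\mathbf{Q}}\right)^{-1}\mathbf{S} \tag{3.25}$$

where the inversion is guaranteed assuming that the noise covariance matrix $\mathbf{R}_w$ is positive definite. Notice that the above expression corresponds to the linear Bayesian MMSE estimator of $\boldsymbol{\alpha}$ based on the sample covariance vector $\hat{\mathbf{r}}$, where $\mathbf{Q} + \tilde{\mathbf{Q}}$ is the autocorrelation matrix of $\hat{\mathbf{r}}$ and $\mathbf{S}$ the cross-correlation between $\hat{\mathbf{r}}$ and $\boldsymbol{\alpha}$ [Kay93b].

If (3.25) is now plugged into (3.21), the minimum MSE is found to be

$$MSE_{min} = \sigma_g^2 - \mathrm{Tr}\left(\mathbf{S}^H\left(\mathbf{Q} + \tilde{\mathbf{Q}}\right)^{-1}\mathbf{S}\right) \tag{3.26}$$

where $\sigma_g^2$ is the initial (prior) uncertainty about the parameter $\boldsymbol{\alpha}$ and the second term is the MSE improvement after processing the data vector $\mathbf{y}$. It is easy to show that this second term vanishes as the noise variance is increased.

3.3 Second-Order Minimum Variance Estimator

The aim of this section would be obtaining the minimum variance unbiased (MVU) estimator (Section 2.2). However, in most cases it is not possible to cancel out the bias term unless the covariance vector $\mathbf{r}(\boldsymbol{\theta})$ is an affine transformation of $\boldsymbol{\alpha} \in \mathbb{R}^Q$, that is, $\mathbf{r}(\boldsymbol{\theta}) = \mathbf{W}\mathbf{g}(\boldsymbol{\theta}) + \mathbf{v}$ for some matrix $\mathbf{W}$ and vector $\mathbf{v}$. If so, it is straightforward to verify that the estimator bias (3.14) is removed by setting $\mathbf{M}^H\mathbf{W} = \mathbf{I}_Q$. Unfortunately, this situation is unusual and quadratic estimators are normally degraded by some residual bias. Taking into account this limitation, in this section the minimum variance estimator is deduced subject to those constraints minimizing the estimator bias.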
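Let us first obtain the equation that $\mathbf{M}$ must verify to yield minimum bias, with $\bar{\mathbf{Q}}$, $\tilde{\mathbf{Q}}$ and $\mathbf{S}$ the matrices defined in (3.15), (3.23) and (3.24):

\[ \frac{dBIAS^2}{d\mathbf{M}^*} = \tilde{\mathbf{Q}}\mathbf{M} - \mathbf{S} = \mathbf{0} \tag{3.27} \]

Generally, the constraints obtained in (3.27) form an underdetermined system of equations because $R \triangleq \mathrm{rank}(\tilde{\mathbf{Q}}) < M^2$ and $\mathbf{S} \in \mathbb{C}^{M^2 \times Q}$ in (3.24) lies, by definition, in the column span of $\tilde{\mathbf{Q}} \in \mathbb{C}^{M^2 \times M^2}$. Hence, (3.27) is actually imposing $RQ$ design constraints on the matrix $\mathbf{M} \in \mathbb{C}^{M^2 \times Q}$ that, after the diagonalization of $\tilde{\mathbf{Q}} = \mathbf{V}\boldsymbol{\Sigma}\mathbf{V}^H$, can be formulated as follows:

\[ \mathbf{V}^H \mathbf{M} = \tilde{\mathbf{S}} \tag{3.28} \]

where $\tilde{\mathbf{S}} \triangleq \boldsymbol{\Sigma}^{-1}\mathbf{V}^H\mathbf{S}$, $\boldsymbol{\Sigma} \in \mathbb{R}^{R \times R}$ is the diagonal matrix containing the non-zero eigenvalues of $\tilde{\mathbf{Q}}$ and the columns of $\mathbf{V} \in \mathbb{C}^{M^2 \times R}$ are the corresponding eigenvectors.

Therefore, since equation (3.27) is only forcing $R$ constraints, the remaining degrees of freedom in $\mathbf{M}$ can be used to optimize the estimator variance. Specifically, the aim is to minimize the estimator variance subject to the minimum-bias constraints given in (3.27) or (3.28), that is,

\[ \mathbf{M}_{var} \triangleq \arg\min_{\mathbf{M}} VAR = \arg\min_{\mathbf{M}} \mathrm{Tr}\left( \mathbf{M}^H \bar{\mathbf{Q}} \mathbf{M} \right) \quad \text{subject to } \tilde{\mathbf{Q}}\mathbf{M} = \mathbf{S} \text{ or } \mathbf{V}^H\mathbf{M} = \tilde{\mathbf{S}}, \tag{3.29} \]

which yields the following solution:

\[ \mathbf{M}_{var} = \mathbf{P}^H \mathbf{S} = \tilde{\mathbf{P}}^H \tilde{\mathbf{S}} \tag{3.30} \]

with $\mathbf{P}$ and $\tilde{\mathbf{P}}$ defined as

\[ \mathbf{P} \triangleq \left( \tilde{\mathbf{Q}} \bar{\mathbf{Q}}^{-1} \tilde{\mathbf{Q}} \right)^{\#} \tilde{\mathbf{Q}} \bar{\mathbf{Q}}^{-1} \tag{3.31} \]
\[ \tilde{\mathbf{P}} \triangleq \left( \mathbf{V}^H \bar{\mathbf{Q}}^{-1} \mathbf{V} \right)^{-1} \mathbf{V}^H \bar{\mathbf{Q}}^{-1}. \tag{3.32} \]

Thus, after substitutions in (3.2), the minimum variance estimator is given by

\[ \hat{\boldsymbol{\alpha}} = \bar{\mathbf{g}} + \mathbf{S}^H \mathbf{P} (\hat{\mathbf{r}} - \bar{\mathbf{r}}) = \bar{\mathbf{g}} + \tilde{\mathbf{S}}^H \tilde{\mathbf{P}} (\hat{\mathbf{r}} - \bar{\mathbf{r}}) \tag{3.33} \]

where $\mathbf{P}$ is projecting the sample covariance vector $(\hat{\mathbf{r}} - \bar{\mathbf{r}})$ onto the minimum-bias subspace generated by matrix $\tilde{\mathbf{Q}}$ in (3.27) (see Fig. 3.1).

Plugging now (3.30) into (3.20), the minimum variance is equal to

\[ VAR_{min} = \mathrm{Tr}\left( \mathbf{S}^H \left( \tilde{\mathbf{Q}} \bar{\mathbf{Q}}^{-1} \tilde{\mathbf{Q}} \right)^{\#} \mathbf{S} \right) = \mathrm{Tr}\left( \tilde{\mathbf{S}}^H \left( \mathbf{V}^H \bar{\mathbf{Q}}^{-1} \mathbf{V} \right)^{-1} \tilde{\mathbf{S}} \right) \tag{3.34} \]

where the argument inside the trace operator is the covariance matrix of the estimation error, with $\bar{\boldsymbol{\alpha}}(\boldsymbol{\theta}) \triangleq E\{\hat{\boldsymbol{\alpha}}\}$ the estimator mean value:

\[ E_{\theta} E\left\{ (\hat{\boldsymbol{\alpha}} - \bar{\boldsymbol{\alpha}}(\boldsymbol{\theta})) (\hat{\boldsymbol{\alpha}} - \bar{\boldsymbol{\alpha}}(\boldsymbol{\theta}))^H \right\} = \mathbf{S}^H \left( \tilde{\mathbf{Q}} \bar{\mathbf{Q}}^{-1} \tilde{\mathbf{Q}} \right)^{\#} \mathbf{S} = \tilde{\mathbf{S}}^H \left( \mathbf{V}^H \bar{\mathbf{Q}}^{-1} \mathbf{V} \right)^{-1} \tilde{\mathbf{S}}. \tag{3.35} \]

Finally, plugging (3.30) into (3.19), the residual bias can be expressed in any of these alternative forms:

\[ BIAS^2_{min} = \sigma_g^2 - \mathrm{Tr}\left( \mathbf{M}^H \mathbf{S} \right) = \sigma_g^2 - \mathrm{Tr}\left( \mathbf{M}^H \tilde{\mathbf{Q}} \mathbf{M} \right) = \sigma_g^2 - \mathrm{Tr}\left( \mathbf{S}^H \mathbf{P} \mathbf{S} \right) = \sigma_g^2 - \mathrm{Tr}\left( \tilde{\mathbf{S}}^H \boldsymbol{\Sigma} \tilde{\mathbf{S}} \right) = \sigma_g^2 - \mathrm{Tr}\left( \mathbf{S}^H \tilde{\mathbf{Q}}^{\#} \mathbf{S} \right). \tag{3.36} \]

The last equation is obtained from $\mathbf{M}^H \tilde{\mathbf{Q}} \mathbf{M}$ using that¹ $\mathbf{P}\tilde{\mathbf{Q}} = \tilde{\mathbf{Q}}^{\#}\tilde{\mathbf{Q}}$ is actually the orthogonal projector onto the $R$-dimensional subspace generated by $\tilde{\mathbf{Q}}$. The resulting expression is then simplified using the following property of the pseudo-inverse: $\tilde{\mathbf{Q}}^{\#} \tilde{\mathbf{Q}} \tilde{\mathbf{Q}}^{\#} = \tilde{\mathbf{Q}}^{\#}$ [Mag98, Eq. 5.2].

¹ The identity is obtained from the diagonalization $\tilde{\mathbf{Q}} = \mathbf{V}\boldsymbol{\Sigma}\mathbf{V}^H$. It is found that [Mag98, Ch.2]:

\[ \tilde{\mathbf{Q}}^{\#} = \mathbf{V}\boldsymbol{\Sigma}^{-1}\mathbf{V}^H \]
\[ \tilde{\mathbf{Q}} \bar{\mathbf{Q}}^{-1} \tilde{\mathbf{Q}} = \mathbf{V}\boldsymbol{\Sigma}\mathbf{V}^H \bar{\mathbf{Q}}^{-1} \mathbf{V}\boldsymbol{\Sigma}\mathbf{V}^H \]
\[ \left( \tilde{\mathbf{Q}} \bar{\mathbf{Q}}^{-1} \tilde{\mathbf{Q}} \right)^{\#} = \mathbf{V}\boldsymbol{\Sigma}^{-1} \left( \mathbf{V}^H \bar{\mathbf{Q}}^{-1} \mathbf{V} \right)^{-1} \boldsymbol{\Sigma}^{-1}\mathbf{V}^H \]

and, thus,

\[ \mathbf{P}\tilde{\mathbf{Q}} = \left( \tilde{\mathbf{Q}} \bar{\mathbf{Q}}^{-1} \tilde{\mathbf{Q}} \right)^{\#} \tilde{\mathbf{Q}} \bar{\mathbf{Q}}^{-1} \tilde{\mathbf{Q}} = \mathbf{V}\mathbf{V}^H = \tilde{\mathbf{Q}}^{\#}\tilde{\mathbf{Q}} \]

taking into account that $\mathbf{V}^H\mathbf{V} = \mathbf{I}_R$.

[Figure 3.1: Geometric interpretation of the second-order estimators deduced in this chapter for a uniparametric, hypothetical problem in which $\mathbf{M}$ had only 2 coefficients: $m_1$ and $m_2$. The plot, with axes $m_1$ and $m_2$, shows the constraint $\tilde{\mathbf{Q}}\mathbf{M} = \mathbf{S}$ and the solutions $\mathbf{M} = \mathbf{P}^H\mathbf{S}$ (min VAR), $\mathbf{M} = \tilde{\mathbf{Q}}^{\#}\mathbf{S}$ (min BIAS) and $\mathbf{M} = (\bar{\mathbf{Q}} + \tilde{\mathbf{Q}})^{-1}\mathbf{S}$ (min MSE), together with a "min norm" label.]

Notice that any matrix $\mathbf{M}$ solving (3.27) or (3.28) yields the same bias, for example, $\mathbf{M} = \tilde{\mathbf{Q}}^{\#}\mathbf{S}$. Among all of them, $\mathbf{M}_{var}$ (3.30) is the one yielding minimum variance (Fig. 3.1).

3.4 A Case Study: Frequency Estimation

In this section, the second-order MMSE and minimum variance estimators are applied to estimate the carrier frequency offset in the context of digital synchronization. This problem has been chosen because closed-form expressions exist based on the results in Appendix 3.D.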
The signal model for frequency synchronization fits the general linear model in Section 2.4, in which the transfer matrix $\mathbf{A}(\boldsymbol{\theta})$ is given by

\[ [\mathbf{A}(\nu)]_k = \exp(j2\pi\nu\,\mathbf{d}_M/N_{ss}) \odot [\mathbf{A}]_k \]

where $\nu$ and $N_{ss}$ are, respectively, the normalized carrier frequency offset and the sampling rate, matrix $\mathbf{A}$ generates the actual modulation, $\mathbf{d}_M \triangleq [0, \ldots, M-1]^T$ and $\odot$ denotes the element-wise product. The precise content of matrix $\mathbf{A}$ in digital synchronization will be detailed in Section 6.1.2. In addition, a uniform prior is assumed for the unknown carrier frequency $\nu$:

\[ f_{\nu}(\nu) = \begin{cases} \Delta^{-1} & |\nu| \le \Delta/2 \\ 0 & \text{otherwise} \end{cases} \]

with $\Delta \le N_{ss}$ determining the frequency offset range². Notice that $\Delta$ constitutes the sole prior knowledge about the parameter. In the following figures, the MMSE and minimum variance estimators are compared in terms of bias, variance and MSE.

The results in this section were partially presented in the following conferences:

• "Sample Covariance Matrix Based Parameter Estimation for Digital Synchronization". J. Villares, G. Vázquez. Proceedings of the IEEE Global Communications Conference 2002 (Globecom 2002). November 2002. Taipei (Taiwan).

• "Sample Covariance Matrix Parameter Estimation: Carrier Frequency, A Case Study". J. Villares, G. Vázquez. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). April 2003. Hong Kong (China).

[Figure 3.2: Estimator mean response for different values of $\Delta$. The simulation parameters are $M = 4$, $N_{ss} = 2$, Es/No = 10 dB. Axes: estimator mean value versus normalized frequency error. Curves: closed-loop ($\Delta \to 0$); minimum bias open-loop for $\Delta$ = 20%, 50%, 80% and 100%; minimum MSE open-loop for $\Delta$ = 50%.]

3.4.1 Bias Analysis

The estimator mean response $E\{\hat{\nu}\}$ is plotted as a function of the parameter value for different values of $\Delta$. Fig.
3.2 shows how the minimum variance solution minimizes the estimator bias within the prior range. The estimator mean response oscillates around the unbiased response, cancelling out the bias at $2\min(LN_{ss}, M) - 1$ points within the prior interval $(-\Delta/2, \Delta/2]$, with $L$ the effective pulse duration (in symbols). These points are automatically selected in order to minimize the overall estimator bias (3.19). This basic result is briefly proved in Appendix 3.E and states that the residual bias is a function of the following ratio:

\[ \frac{\Delta}{\min(LN_{ss}, M)} \]

Therefore, if the prior range $\Delta$ is fixed, the estimator bias can be reduced by oversampling the received signal and/or, if possible, reducing the transmission bandwidth, i.e., increasing $L$. Surprisingly, the bias cannot be reduced by lengthening the observation time in the studied frequency estimation problem for $M \ge LN_{ss}$ (Appendix 3.E).

Regarding Fig. 3.2, one concludes that the bias term increases dramatically if $\Delta/N_{ss}$ exceeds 0.5 (50%) for the simulated MSK modulation ($L = 2$). In the same figure, the mean response of the MMSE estimator is plotted, showing that it is clearly biased. This bias is found to increase if the SNR is reduced because, in that case, the MMSE estimator trades more bias for variance. Finally, the S-curve for the closed-loop estimator deduced in the next chapter is depicted. In that case, the estimator is only required to yield unbiased estimates around the origin ($\nu = 0$). As will be studied in more detail in Chapter 4, the closed-loop solution is obtained considering the asymptotic case in which $\Delta \to 0$.

² Sometimes $\Delta$ will be specified as a percentage of $N_{ss}$.

[Figure 3.3: Averaged squared bias as a function of the prior range $\Delta$ for Es/No = 10 dB, $M = 4$ and $N_{ss} = 2$. Axes: $BIAS^2$ (logarithmic) versus $\Delta$. Curves: $\mathbf{M}_{mse}$, closed-loop and $\mathbf{M}_{var}$.]
[Figure 3.4: Normalized MSE for the MMSE and the minimum variance frequency estimators deduced in (3.25) and (3.30) for the MSK modulation. The corresponding estimators deduced under the Gaussian assumption are also plotted for comparison. The simulation parameters are $M = 8$, $N_{ss} = 2$ and $\Delta = 1.6$. Axes: MSE (logarithmic) versus Es/No (dB). Curves: $\mathbf{M}_{var}$, $\mathbf{M}_{mse}$ and Gaussian assumption, with the levels $\sigma_g^2$, self-noise and $BIAS^2_{min}$ marked.]

Another interesting simulation is presented in Fig. 3.3, in which the squared bias $BIAS^2$ (3.19) is plotted as a function of $\Delta$ for the MMSE estimator ($\mathbf{M}_{mse}$), the minimum variance estimator ($\mathbf{M}_{var}$) and the closed-loop small-error estimator ($\Delta \to 0$) deduced in Chapter 4. The SNR is set to 10 dB and, therefore, the noise-induced variance is very significant. This fact justifies the relaxation of the MMSE estimator with respect to the bias term. Notice that the three estimators are able to cancel out the bias term if the prior range approaches zero ($\Delta \to 0$). This simple remark is of paramount importance in the following sections and motivates the need for closed-loop algorithms for second-order blind parameter estimation (Chapter 4).

3.4.2 MSE Performance

In this section, the performance of second-order frequency estimators is evaluated in terms of their mean square error (3.21). Observing the following figures, the next remarks are relevant:

• A priori knowledge. The performance of the MMSE estimator is upper bounded at low SNR by the a priori mean square error $\sigma_g^2$ (Fig. 3.4). In such a noisy scenario, the MMSE solution becomes biased with the aim of limiting the variance increase caused by the noise-induced variability. As the SNR increases, the observation provides more
information about the parameter of interest and this information is exploited to reduce the average MSE.

[Figure 3.5: Normalized MSE for the MMSE and the minimum variance frequency estimators deduced in (3.25) and (3.30) for the 16-QAM modulation. The transmitted pulse is a square-root raised cosine with roll-off 0.75 truncated at $\pm 5T$. The corresponding estimators deduced under the Gaussian assumption are also plotted for comparison. The simulation parameters are $M = 8$, $N_{ss} = 2$, $K = 13$ and $\Delta = 1.6$. Axes: MSE (logarithmic) versus Es/No (dB). Curves: $\mathbf{M}_{var}$, $\mathbf{M}_{mse}$ and Gaussian assumption, with the levels $\sigma_g^2$, self-noise and $BIAS^2_{min}$ marked.]

• Self-Noise. For finite observations ($M$ finite), the studied quadratic estimators manifest a significant variance floor at high SNR due to the so-called self-noise (Fig. 3.4). Recall that self-noise refers to the random fluctuations caused by the unknown nuisance parameters $\mathbf{x}$ in blind estimation schemes (see Section 2.4.1). Effectively, the feed-forward estimators presented in this section are unable to cancel out the self-noise for all the possible values of $\nu$. On the other hand, the self-noise free condition is guaranteed in the case of closed-loop ($\Delta \to 0$) second-order frequency estimators, as shown in Fig. 3.7. Consequently, the amount of information that can be drawn from the current sample $\mathbf{y}$ is very limited in the studied case due to the presence of self-noise. In fact, the level of the high-SNR floor is a function of the observation time $M$ (Fig. 3.8) as well as the prior range $\Delta$ (Fig. 3.7).

• Modulation. If Figs. 3.4-3.6 are compared, one concludes that the performance of the MMSE estimator is practically insensitive to the actual distribution of the transmitted symbols. However, the incurred minimum bias, $BIAS^2_{min}$, depends on the transmitted
pulse. For the considered Nyquist pulse of roll-off 0.75, the minimum variance solution becomes significantly degraded with respect to the MSK performance for any SNR (Fig. 3.4). Specifically, the bias and self-noise contributions are more significant for the simulated MPSK and 16-QAM modulations.

[Figure 3.6: Normalized MSE for the MMSE and the minimum variance frequency estimators deduced in (3.25) and (3.30) for the MPSK modulation. The transmitted pulse is a square-root raised cosine with roll-off 0.75 truncated at $\pm 5T$. The corresponding estimators deduced under the Gaussian assumption are also plotted for comparison. The simulation parameters are $M = 8$, $N_{ss} = 2$, $K = 13$ and $\Delta = 1.6$. Axes: MSE (logarithmic) versus Es/No (dB). Curves: $\mathbf{M}_{var}$, $\mathbf{M}_{mse}$ and Gaussian assumption, with the levels $\sigma_g^2$, self-noise and $BIAS^2_{min}$ marked.]

• Bias vs. variance trade-off. The MMSE solution outperforms the minimum variance solution because it is not forced to minimize the bias. On the contrary, it tolerates some bias if the variance term can be attenuated in return, thus minimizing the overall MSE. This trade-off is more significant in the low SNR regime but it is also observed at high SNR on account of the self-noise variance. If the self-noise variance is reduced by increasing $M$, the minimum variance solution converges to the MMSE solution at high SNR (Fig. 3.8).

• Consistency. For large samples ($M \to \infty$, with $N_{ss}$ constant), the estimator variance is completely removed whatever the actual SNR, and the residual MSE is the estimator bias computed in (3.36). Therefore, consistent second-order estimation is not possible unless the bias term $BIAS^2_{min}$ vanishes as explained in Section 3.3. This asymptotic result applies
to both the MMSE and minimum variance solutions (Fig. 3.8). Formally,

\[ \lim_{M \to \infty} MSE = \lim_{M \to \infty} BIAS^2_{min} = \sigma_g^2 - \lim_{M \to \infty} \mathrm{Tr}\left( \mathbf{S}^H \tilde{\mathbf{Q}}^{\#} \mathbf{S} \right) \]

where the last term becomes constant for $M \ge N_{ss}L$. Notice that the MSE curves in Fig. 3.8 would eventually converge to the bias floor shown in Fig. 3.4 if the $M$-axis were expanded, i.e., $\lim_{M \to \infty} BIAS^2_{min} \approx 10^{-3}$.

[Figure 3.7: MSE corresponding to the minimum variance solution for different values of the parameter range $\Delta$ = 0.05, 0.1, 0.2, 0.4, 0.8 and 1.6. The received signal is MSK-modulated and $M = 8$ samples are processed with $N_{ss} = 2$. Axes: MSE (logarithmic) versus Es/No (dB).]

• Gaussian assumption. The Gaussian assumption is checked in Fig. 3.4, showing that it yields a significant loss for medium-to-high SNRs. On the other hand, it converges to the optimal solution as the SNR approaches zero. Regarding Fig. 3.8, the Gaussian assumption also supplies asymptotically ($M \to \infty$) self-noise free estimators but it suffers a constant penalty for any finite SNR. This loss is less significant in the case of a linear modulation, as shown in Figs. 3.5-3.6.

[Figure 3.8: Normalized MSE for the MMSE and the minimum variance frequency estimators deduced in (3.25) and (3.30). The modulation is MSK, Es/No = 40 dB, $N_{ss} = 2$ and $\Delta = 1.6$. Axes: MSE (logarithmic) versus $M$. Curves: Gaussian, $\mathbf{M}_{var}$ and $\mathbf{M}_{mse}$.]

[Figure 3.9: MSE as a function of $\Delta$ for the MSK modulation. The simulation parameters are Es/No = 10 dB, $M = 4$ and $N_{ss} = 2$. Axes: MSE (logarithmic) versus $\Delta/2$. Curves: $\mathbf{M}_{var}$, closed-loop and $\mathbf{M}_{mse}$.]

3.5 Conclusions

This chapter was devoted to the design of feedforward second-order estimators, adopting the well-known Bayesian approach.
The coefficients of the quadratic estimator were selected to minimize the estimator MSE or the estimator variance on the average, where this average involves the a priori distribution of the unknown parameters. In the optimization of the estimator coefficients, the actual distribution of the nuisance parameters was considered, avoiding the usual Gaussian assumption.

The applicability of the studied second-order estimators in nonlinear estimation problems is generally limited by the impossibility of cancelling the bias term. Indeed, consistent second-order estimators are mostly unfeasible due to the persistent bias term. Moreover, if the observation time is finite, a variance floor appears at high SNR due to the presence of the random nuisance parameters. This floor depends on the actual distribution of the nuisance parameters and can be reduced by exploiting that distribution, especially in the case of CPM signals. Nonetheless, most of these conclusions depend on the actual parameterization and the assumed prior distribution.

In this chapter, the problem of blind frequency synchronization was chosen to illustrate these conclusions by means of analytical and numerical results. In this case study, the minimization of the estimator bias within the parameter range is proved to be limited by the effective duration of the transmitted pulse. On the other hand, open-loop second-order frequency estimators exhibited the referred variance floor at high SNR, whereas self-noise free closed-loop frequency estimators exist in the literature even for limited observation times.

Beyond the practical interest of open-loop second-order estimators, the formulation in this chapter constitutes the basis for the deduction of optimum quadratic closed-loop estimators in Chapter 4.
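Appendix 3.A Second-order estimation in noncircular transmissions

In the main text, optimal second-order estimators have been deduced for complex, circular constellations. The circularity of the nuisance parameters can be assumed for any bandpass modulation if the carrier phase is uniformly distributed and this random term is incorporated into the vector of nuisance parameters $\mathbf{x}$. In that case, the expectation of $\mathbf{y}\mathbf{y}^T$ becomes zero and does not provide information about the parameter of interest. However, optimal second-order estimators should also exploit the improper correlation matrix $\mathbf{y}\mathbf{y}^T$ in case of baseband transmissions or noncircular bandpass modulations, provided that the carrier phase is known or estimated. Precisely, the carrier phase estimation is addressed in Section 6.2 using quadratic schemes in case of MSK-type modulations. Other important noncircular modulations are the CPM format, any staggered modulation (e.g., offset QPSK), any real-valued constellation such as BPSK or ASK, trellis coded modulations (TCM) as well as other coded transmissions [Pro95].

The analysis of noncircular or improper complex random variables has been carried out in [Sch03][Pic96] and references therein. Widely-linear estimators are proposed in [Sch03][Pic95] in which the vector $\mathbf{z} \triangleq [\mathbf{y}^T, \mathbf{y}^H]^T$ is linearly processed. This extended signal model has been applied in the field of communications by some authors, e.g., [Gel00][Tul00][Ger01]. Therefore, all the results in this thesis can be extended by considering the following sample covariance matrix

\[ \hat{\mathbf{R}} = \mathbf{z}\mathbf{z}^T = \begin{bmatrix} \mathbf{y}\mathbf{y}^T & \mathbf{y}\mathbf{y}^H \\ \mathbf{y}^*\mathbf{y}^T & \mathbf{y}^*\mathbf{y}^H \end{bmatrix} \]

to obtain the optimal widely-quadratic estimator. When stacking the sample covariance matrix, it is worth realizing that $\mathbf{y}^*\mathbf{y}^T$ could be omitted from $\hat{\mathbf{r}} = \mathrm{vec}(\hat{\mathbf{R}})$ because the term $\mathbf{y}\mathbf{y}^H$ provides the same information. To compute the coefficients of the optimal second-order estimator, it is necessary to obtain the covariance of $\hat{\mathbf{r}} = \mathrm{vec}(\hat{\mathbf{R}})$ following the guidelines in Appendix 3.B.

Appendix 3.B Deduction of matrix Q(θ)

The expression of $\mathbf{Q}(\boldsymbol{\theta})$ in (3.9) can be written as follows:

\[ \mathbf{Q}(\boldsymbol{\theta}) = E\left\{ \hat{\mathbf{r}}\hat{\mathbf{r}}^H \right\} - \mathbf{r}\mathbf{r}^H \tag{3.37} \]

where

\[ \hat{\mathbf{r}} = \mathrm{vec}\left( (\mathbf{A}\mathbf{x} + \mathbf{w})(\mathbf{A}\mathbf{x} + \mathbf{w})^H \right) = \mathrm{vec}\left( \mathbf{A}\mathbf{x}\mathbf{x}^H\mathbf{A}^H + \mathbf{A}\mathbf{x}\mathbf{w}^H + \mathbf{w}\mathbf{x}^H\mathbf{A}^H + \mathbf{w}\mathbf{w}^H \right) \]

and the dependence on $\boldsymbol{\theta}$ is omitted for the sake of brevity. Taking into account that the noise is circular and zero mean, i.e.,

\[ E\{\mathbf{w}\} = \mathbf{0}, \qquad E\{\mathbf{w}\mathbf{w}^T\} = \mathbf{0}, \qquad E\{w_i w_j^* w_k\} = E\{w_i^* w_j^* w_k\} = 0, \]

only six terms, out of the sixteen in $\hat{\mathbf{r}}\hat{\mathbf{r}}^H$, survive the expectation in (3.37). These terms can be classified as follows:

• signal × signal: $\mathrm{vec}(\mathbf{A}\mathbf{x}\mathbf{x}^H\mathbf{A}^H)\,\mathrm{vec}^H(\mathbf{A}\mathbf{x}\mathbf{x}^H\mathbf{A}^H)$

• signal × noise: $\mathrm{vec}(\mathbf{A}\mathbf{x}\mathbf{w}^H)\,\mathrm{vec}^H(\mathbf{A}\mathbf{x}\mathbf{w}^H) + \mathrm{vec}(\mathbf{w}\mathbf{x}^H\mathbf{A}^H)\,\mathrm{vec}^H(\mathbf{w}\mathbf{x}^H\mathbf{A}^H) + \mathrm{vec}(\mathbf{A}\mathbf{x}\mathbf{x}^H\mathbf{A}^H)\,\mathrm{vec}^H(\mathbf{w}\mathbf{w}^H) + \mathrm{vec}(\mathbf{w}\mathbf{w}^H)\,\mathrm{vec}^H(\mathbf{A}\mathbf{x}\mathbf{x}^H\mathbf{A}^H)$

• noise × noise: $\mathrm{vec}(\mathbf{w}\mathbf{w}^H)\,\mathrm{vec}^H(\mathbf{w}\mathbf{w}^H)$.

Then, using the following three properties [Mag98, Chapter 2]:

\[ \mathrm{vec}\left( \mathbf{A}\mathbf{B}\mathbf{C}^H \right) = (\mathbf{C}^* \otimes \mathbf{A})\,\mathrm{vec}(\mathbf{B}) \tag{3.38} \]
\[ (\mathbf{A} \otimes \mathbf{B})(\mathbf{C} \otimes \mathbf{D}) = \mathbf{A}\mathbf{C} \otimes \mathbf{B}\mathbf{D} \tag{3.39} \]
\[ \mathrm{vec}\left( \mathbf{a}\mathbf{b}^H \right) \mathrm{vec}^H\left( \mathbf{a}\mathbf{b}^H \right) = (\mathbf{b}^* \otimes \mathbf{a})(\mathbf{b}^* \otimes \mathbf{a})^H = \left( \mathbf{b}\mathbf{b}^H \right)^* \otimes \mathbf{a}\mathbf{a}^H \tag{3.40} \]

and bearing in mind that $E\{\mathbf{x}\mathbf{x}^H\} = \mathbf{I}_K$, one obtains

\[ E\left\{ \hat{\mathbf{r}}\hat{\mathbf{r}}^H \right\} = \mathcal{A}\, E\left\{ \mathrm{vec}(\mathbf{x}\mathbf{x}^H)\,\mathrm{vec}^H(\mathbf{x}\mathbf{x}^H) \right\} \mathcal{A}^H + \mathbf{R}_w^* \otimes \mathbf{A}\mathbf{A}^H + \left( \mathbf{A}\mathbf{A}^H \right)^* \otimes \mathbf{R}_w + \mathrm{vec}(\mathbf{A}\mathbf{A}^H)\,\mathrm{vec}^H(\mathbf{R}_w) + \mathrm{vec}(\mathbf{R}_w)\,\mathrm{vec}^H(\mathbf{A}\mathbf{A}^H) + \mathbf{R}_w^* \otimes \mathbf{R}_w + \mathrm{vec}(\mathbf{R}_w)\,\mathrm{vec}^H(\mathbf{R}_w) \tag{3.41} \]

where $\mathcal{A} \triangleq \mathbf{A}^* \otimes \mathbf{A}$ and the following property of Gaussian vectors is used (Appendix 3.C):

\[ E\left\{ \mathrm{vec}(\mathbf{w}\mathbf{w}^H)\,\mathrm{vec}^H(\mathbf{w}\mathbf{w}^H) \right\} = \mathbf{R}_w^* \otimes \mathbf{R}_w + \mathrm{vec}(\mathbf{R}_w)\,\mathrm{vec}^H(\mathbf{R}_w). \tag{3.42} \]

Therefore, grouping terms in (3.41) and having in mind that $\mathbf{R} = \mathbf{A}\mathbf{A}^H + \mathbf{R}_w$ (2.23), the following expression is obtained:

\[ E\left\{ \hat{\mathbf{r}}\hat{\mathbf{r}}^H \right\} = \mathcal{A}\, E\left\{ \mathrm{vec}(\mathbf{x}\mathbf{x}^H)\,\mathrm{vec}^H(\mathbf{x}\mathbf{x}^H) \right\} \mathcal{A}^H - \left( \mathbf{A}\mathbf{A}^H \right)^* \otimes \mathbf{A}\mathbf{A}^H - \mathrm{vec}(\mathbf{A}\mathbf{A}^H)\,\mathrm{vec}^H(\mathbf{A}\mathbf{A}^H) + \mathbf{R}^* \otimes \mathbf{R} + \mathrm{vec}(\mathbf{R})\,\mathrm{vec}^H(\mathbf{R}). \]

Finally, using once more (3.38) and (3.39) in order to write the negative terms above as a function of $\mathcal{A}$ and plugging this result into (3.37), the expression proposed in (3.10) is obtained:

\[ \mathbf{Q}(\boldsymbol{\theta}) = \mathbf{R}^*(\boldsymbol{\theta}) \otimes \mathbf{R}(\boldsymbol{\theta}) + \mathcal{A}(\boldsymbol{\theta})\,\mathbf{K}\,\mathcal{A}^H(\boldsymbol{\theta}). \]

Appendix 3.C Fourth-order moments

In this section the fourth-order moments of a generic zero-mean, circular, possibly non-Gaussian vector $\mathbf{v} \in \mathbb{C}^L$ are deduced.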
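The resulting $L^4$ terms are ordered in the following matrix:

\[ \mathbf{Q}_v \triangleq E\left\{ \mathrm{vec}(\mathbf{v}\mathbf{v}^H)\,\mathrm{vec}^H(\mathbf{v}\mathbf{v}^H) \right\} \tag{3.43} \]

whose elements are given by

\[ [\mathbf{Q}_v]_{i+Lj,\,k+Ll} = E\{v_i v_j^* v_k^* v_l\} = E\{v_i v_j^*\}E\{v_k^* v_l\} + E\{v_i v_k^*\}E\{v_j^* v_l\} + \left( E\{|v_i|^4\} - 2E^2\{|v_i|^2\} \right) \delta(i,j,k,l) \]

with $i, j, k, l \in \{0, \ldots, L-1\}$ and $\delta(i,j,k,l)$ the Kronecker delta of multiple dimensions. If all these elements are arranged in $\mathbf{Q}_v$, three components are identified:

\[ \mathbf{Q}_v = \mathrm{vec}(\mathbf{R}_v)\,\mathrm{vec}^H(\mathbf{R}_v) + \mathbf{R}_v^* \otimes \mathbf{R}_v + \mathrm{diag}(\mathrm{vec}(\boldsymbol{\Gamma})) \]

with $\mathbf{R}_v \triangleq E\{\mathbf{v}\mathbf{v}^H\}$ and $\boldsymbol{\Gamma}$ the diagonal matrix with $[\boldsymbol{\Gamma}]_{i,i} \triangleq E\{|v_i|^4\} - 2E^2\{|v_i|^2\}$. If the elements of $\mathbf{v}$ are identically distributed, $\mu \triangleq E\{|v_i|^2\}$ and $\rho \triangleq E\{|v_i|^4\}/\mu^2$ do not depend on $i$ and, thus, the third term can be simplified to obtain that

\[ \mathbf{Q}_v = \mathrm{vec}(\mathbf{R}_v)\,\mathrm{vec}^H(\mathbf{R}_v) + \mathbf{R}_v^* \otimes \mathbf{R}_v + \mu^2(\rho - 2)\,\mathrm{diag}(\mathrm{vec}(\mathbf{I}_L)). \tag{3.44} \]

In particular, the fourth-order moments of $\mathbf{x}$ in (3.12) are given by (3.44), having in mind that the symbols autocorrelation is $E\{\mathbf{x}\mathbf{x}^H\} = \mathbf{I}_K$ and, thus, we have that $\mathbf{R}_v = \mathbf{I}_K$ and $\mu = 1$. On the other hand, if $\mathbf{v}$ is a complex Gaussian vector, as the noise vector $\mathbf{w}$ in the adopted signal model, the third term in (3.44) can be removed taking into account that $\rho = 2$ in the Gaussian case, hence proving equation (3.42):

\[ E\left\{ \mathrm{vec}(\mathbf{w}\mathbf{w}^H)\,\mathrm{vec}^H(\mathbf{w}\mathbf{w}^H) \right\} = \mathbf{R}_w^* \otimes \mathbf{R}_w + \mathrm{vec}(\mathbf{R}_w)\,\mathrm{vec}^H(\mathbf{R}_w). \]

Appendix 3.D Bayesian average in frequency estimation

Let us assume that the scalar parameter $\lambda$ is estimated from the following observation:

\[ \mathbf{y} = \exp(j2\pi\lambda\,\mathbf{d}_M) \odot \mathbf{A}\mathbf{x} + \mathbf{w} \]

where $\mathbf{d}_M \triangleq [0, \ldots, M-1]^T$, $\odot$ denotes the element-wise product and $\mathbf{A}$ stands for $\mathbf{A}(\boldsymbol{\theta})|_{\theta=0}$. Therefore, the observation covariance matrix is given by

\[ \mathbf{R}(\lambda) = E\{\mathbf{y}\mathbf{y}^H\} = \mathbf{E}(\lambda) \odot \mathbf{A}\mathbf{A}^H + \mathbf{R}_w \]

where $\mathbf{E}(\lambda)$ is defined as

\[ [\mathbf{E}(\lambda)]_{i,k} \triangleq e^{j2\pi\lambda(i-k)}. \tag{3.45} \]

Let us consider that the prior is uniform in the interval $\lambda \in (-\Delta/2, \Delta/2]$ with $\Delta \le 1$. In that case, it is possible to obtain closed-form expressions for those matrices appearing in $\mathbf{b}_{opt}$ (3.16), $\mathbf{M}_{mse}$ (3.25) and $\mathbf{M}_{var}$ (3.30).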
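A closed form of this kind can be sanity-checked numerically. The sketch below is illustrative only and is not taken from the thesis: it averages a single entry of $\mathbf{E}(\lambda)$ in (3.45), namely $e^{j2\pi\lambda n}$ with $n = i - k$, over the uniform prior using a midpoint rule, and compares the result with $\sin(\pi n\Delta)/(\pi n\Delta)$. The function names and the number of integration points are assumptions made for the example.

```python
import cmath
import math


def sinc(x):
    """Normalized sinc, sin(pi x)/(pi x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def prior_average_phasor(n, delta, points=20000):
    """Midpoint-rule estimate of E{exp(j 2 pi lambda n)} for lambda uniform on (-delta/2, delta/2]."""
    step = delta / points
    total = 0j
    for m in range(points):
        lam = -delta / 2 + (m + 0.5) * step   # midpoint of the m-th sub-interval
        total += cmath.exp(2j * math.pi * lam * n)
    return total * step / delta               # divide by delta: uniform density is 1/delta


delta = 0.8
for n in range(4):
    numeric = prior_average_phasor(n, delta)
    print(n, round(numeric.real, 6), round(sinc(n * delta), 6))
```

The two printed columns should agree to within the integration error, and the imaginary part of the numerical average is negligible because the prior is symmetric around zero.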
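The resulting closed-form expressions are listed next:

\[ \bar{g} = E_{\lambda}\{g(\lambda)\} = E_{\lambda}\{\lambda\} = 0 \]
\[ \sigma_g^2 = E_{\lambda}\{\lambda^2\} = \Delta^2/12 \]
\[ \bar{\mathbf{R}} = \bar{\mathbf{E}} \odot \mathbf{A}\mathbf{A}^H + \mathbf{R}_w \]
\[ \bar{\mathbf{Q}} = \bar{\mathbf{R}}^* \otimes \bar{\mathbf{R}} + \left( \bar{\mathbf{E}}_q - \bar{\mathbf{E}}^* \otimes \bar{\mathbf{E}} \right) \odot \mathcal{A}\mathcal{A}^H + \bar{\mathbf{E}}_q \odot \mathcal{A}\mathbf{K}\mathcal{A}^H \]
\[ \tilde{\mathbf{Q}} = \left( \bar{\mathbf{E}}_q - \mathrm{vec}(\bar{\mathbf{E}})\,\mathrm{vec}^H(\bar{\mathbf{E}}) \right) \odot \mathrm{vec}(\mathbf{A}\mathbf{A}^H)\,\mathrm{vec}^H(\mathbf{A}\mathbf{A}^H) \]
\[ \mathbf{s} = \mathrm{vec}\left( \bar{\mathbf{E}}_s \odot \mathbf{A}\mathbf{A}^H \right) \]

with $\mathcal{A} \triangleq \mathbf{A}^* \otimes \mathbf{A}$, $\odot$ the element-wise product and

\[ \bar{\mathbf{E}} \triangleq E_{\lambda}\{\mathbf{E}(\lambda)\}, \qquad \bar{\mathbf{E}}_q \triangleq E_{\lambda}\{\mathbf{E}^*(\lambda) \otimes \mathbf{E}(\lambda)\}, \qquad \bar{\mathbf{E}}_s \triangleq E_{\lambda}\{\mathbf{E}(\lambda)\,\lambda\} \]

whose elements are given next [Vil03a]:

\[ [\bar{\mathbf{E}}]_{i,k} = \mathrm{sinc}((i-k)\Delta) \]
\[ [\bar{\mathbf{E}}_q]_{i+Mj,\,k+Ml} = \mathrm{sinc}((i-j+l-k)\Delta) \]
\[ [\bar{\mathbf{E}}_s]_{i,k} = \begin{cases} 0 & i = k \\ \dfrac{j}{2\pi(i-k)} \left[ \mathrm{sinc}((i-k)\Delta) - \cos(\pi(i-k)\Delta) \right] & i \ne k \end{cases} \]

and the $\mathrm{sinc}(\cdot)$ operator is defined as $\mathrm{sinc}(x) \triangleq \sin(\pi x)/(\pi x)$ with $\mathrm{sinc}(0) = 1$.

Appendix 3.E Bias study in frequency estimation

In this appendix, the minimum bias solution is studied in detail for the frequency estimation problem. The coefficients $\mathbf{m} \triangleq \mathrm{vec}(\mathbf{M})$ minimizing the estimator bias in (3.19) have to satisfy the minimum-bias constraints in (3.27). After some trivial manipulations, this equation can be written as

\[ E_{\nu}\{\mathbf{B}(\nu)\,\bar{\alpha}^*(\nu)\} = E_{\nu}\{\mathbf{B}(\nu)\,\nu\} \tag{3.46} \]

where $\mathbf{B}(\nu) \triangleq \mathbf{A}(\nu)\mathbf{A}^H(\nu) = \mathbf{E}(\nu/N_{ss}) \odot \mathbf{A}\mathbf{A}^H$ (Appendix 3.D) and

\[ \bar{\alpha}^*(\nu) = (\mathbf{r}(\nu) - \bar{\mathbf{r}})^H \mathbf{m} = \mathrm{Tr}\left( \mathbf{B}^H(\nu)\mathbf{M} \right) - C \]

stands for the estimator mean value (3.7) as a function of the parameter $\nu$, and $C \triangleq E_{\nu}\{\mathrm{Tr}(\mathbf{B}^H(\nu)\mathbf{M})\}$ is a constant term, which is independent of the parameter $\nu$. Notice also that $\bar{\alpha}(\nu)$ is actually real-valued despite the complex conjugation in (3.46), which is kept for the sake of generality.

Regarding the obtained minimum bias equation (3.46), it is straightforward to realize that any unbiased estimator verifies (3.46). Unfortunately, the converse is not generally true and (3.46) supplies the least squares fitting of $\bar{\alpha}(\nu)$ to the ideal linear response $\bar{\alpha}(\nu) = \nu$ within the prior domain (i.e., $|\nu| < \Delta/2$). Furthermore, if some elements of $\mathbf{B}(\nu)$ are connected by an affine transformation, i.e., $[\mathbf{B}(\nu)]_{i_2,j_2} = C_a [\mathbf{B}(\nu)]_{i_1,j_1} + C_b$ for some constants $C_a$ and $C_b$, the system of equations in (3.46) becomes underdetermined, as was the case for equation (3.27).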
Indeed, this is exactly what happens in the frequency estimation case since the entries along each diagonal of $\mathbf{B}(\nu)$ share the same phasor (3.45). Thus, it is possible to reduce (3.46) to $2M - 1$ equations corresponding to the diagonals of $\mathbf{B}(\nu)$. Nonetheless, the uppermost and lowermost diagonals are equal to zero if $M > N_{ss}L$, with $L$ the effective transmitted pulse duration (in symbols). Therefore, the minimization of the estimator bias requires fulfilling the following $2K + 1$ equations:

\[ E_{\nu}\left\{ \bar{\alpha}^*(\nu)\, e^{j2\pi\nu k/N_{ss}} \right\} = E_{\nu}\left\{ \nu\, e^{j2\pi\nu k/N_{ss}} \right\}, \qquad k \in [-K, K] \]

or, equivalently,

\[ \int_{-R/2}^{R/2} V(f)\, e^{j2\pi f k}\, df = N_{ss} \int_{-R/2}^{R/2} f\, e^{j2\pi f k}\, df, \qquad k \in [-K, K] \tag{3.47} \]

where $K \triangleq \min(M, LN_{ss}) - 1$, $f \triangleq \nu/N_{ss}$, $R \triangleq \Delta/N_{ss}$ is the carrier uncertainty relative to the Nyquist bandwidth and

\[ V(f) \triangleq \mathrm{Tr}\left( \mathbf{B}^H(N_{ss}f)\,\mathbf{M} \right) = \sum_{k=-K}^{K} \sum_{i} [\mathbf{M}]_{i,i+k} [\mathbf{B}^*(N_{ss}f)]_{i,i+k} = \sum_{k=-K}^{K} \left( \sum_{i} [\mathbf{M}]_{i,i+k} \left[ \mathbf{A}\mathbf{A}^H \right]^*_{i,i+k} \right) e^{-j2\pi f k} \]

is the Fourier transform of the sequence $v[k]$ defined as

\[ v[k] \triangleq F^{-1}\{V(f)\} = \begin{cases} \sum_{i} [\mathbf{M}]_{i,i+k} \left[ \mathbf{A}\mathbf{A}^H \right]^*_{i,i+k} & |k| \le K \\ 0 & \text{otherwise.} \end{cases} \tag{3.48} \]

[Figure 3.10: Mean value of the frequency estimator corresponding to the minimum bias solution for $K = 4$ and $K = 16$. Axes: $\bar{\alpha}(\nu)$ versus $f = \nu/N_{ss}$, with the prior limits $\pm R/2$ marked.]

Notice that in (3.47) we have taken into account that $\bar{\alpha}^*(\nu) = V(\nu/N_{ss}) - C$, where $C$ must be null to guarantee the odd symmetry of the harmonic expansion of $f$ in the right-hand side of (3.47). Thus, equation (3.47) states that the $2K + 1$ central terms of the discrete Fourier series of $N_{ss}f$ and $V(f)$, filtered in the interval $\pm R/2$, must be identical in order to minimize the estimator bias.
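The effect of matching only the $2K + 1$ central harmonics can be illustrated for the critical case $R = 1$ with $N_{ss}$ normalized to 1. The sketch below is a hedged illustration rather than the procedure used in the thesis: it evaluates the truncated Fourier series of the ramp $f$ on $|f| < 1/2$ using the standard sawtooth expansion $f = \sum_{k \ge 1} (-1)^{k+1} \sin(2\pi k f)/(\pi k)$, and measures the deviation from the ideal linear response. The function names and grid sizes are assumptions.

```python
import math


def truncated_ramp(f, K):
    """Partial Fourier sum (harmonics |k| <= K) of the ramp g(f) = f on |f| < 1/2."""
    return sum((-1) ** (k + 1) * math.sin(2 * math.pi * k * f) / (math.pi * k)
               for k in range(1, K + 1))


def max_error(K, f_max, points=200):
    """Largest deviation from the ideal linear response on a grid over [-f_max, f_max]."""
    grid = [-f_max + 2 * f_max * i / (points - 1) for i in range(points)]
    return max(abs(truncated_ramp(f, K) - f) for f in grid)


for K in (4, 16):
    # Interior of the interval versus the region next to the discontinuity at |f| = 1/2.
    print(K, round(max_error(K, 0.25), 4), round(max_error(K, 0.49), 4))
```

Under these assumptions the interior ripple shrinks as more harmonics are kept, whereas the deviation close to $|f| = 1/2$ remains large, which is the behaviour sketched in Fig. 3.10 for $K = 4$ and $K = 16$.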
Formally, (3.47) means that the sequence $v[k]$ must be equal to the inverse discrete Fourier transform of $N_{ss}f$, as stated in the next equation:

\[ v[k] = jN_{ss}\, \frac{\delta[k] - (-1)^k}{2\pi k}, \qquad |k| < K \]

Ideally, if $K$ were arbitrarily long, (3.47) would imply the identity of $\bar{\alpha}(\nu)$ and $\nu$ within the prior interval $|\nu| < \Delta/2$ or, in other words,

\[ V(f)|_{K \to \infty} = \lim_{K \to \infty} \sum_{k=-K}^{K} v[k]\, e^{-j2\pi f k} = N_{ss}f, \qquad |f| < R/2 \tag{3.49} \]

whatever the value of $R$. However, since $K$ is finite and is limited by the transmitted pulse duration $L$, the value at which the above Fourier series can be truncated without noticeable distortion is a function of the ratio $R = \Delta/N_{ss}$; the smaller $R$, the fewer terms are required for the same distortion of $\bar{\alpha}(\nu)$. In the limit ($R \to 0$), the Taylor expansion of (3.49) around $f = 0$ ensures that $K = 1$ is sufficient to hold exactly (3.49) with $v[1] = -v[-1] = jN_{ss}/(2\pi)$. Otherwise, if (3.49) is truncated taking too few elements, $\bar{\alpha}(\nu)$ will suffer from ripple and the Gibbs effect, i.e., the overshooting at the discontinuity points $\nu = \pm\Delta/2$, as shown in Fig. 3.10 for the most critical situation in which $R = 1$.

Finally, notice that the effective duration $L$ is inversely proportional to the effective signal bandwidth. Because the minimum transmission bandwidth in bandpass communications is $1/T$ Hz (i.e., 0% roll-off), it follows that the main lobe of the signal autocorrelation lasts $2T$ seconds and, thus, in practice the Fourier series in (3.49) becomes truncated approximately at $K = N_{ss}$ or, in the best case, at a few multiples of $N_{ss}$.

Chapter 4 Optimal Second-Order Small-Error Estimation

In the last chapter, second-order estimators were designed by achieving a trade-off between bias and variance. The MMSE and minimum variance estimators were obtained by averaging over all the possible values of the parameter of interest. This approach has a few drawbacks that are summarized next.
First of all, second-order estimators are usually biased even if the observation time is increased indefinitely. This fact precludes the existence of consistent quadratic estimators in the majority of nonlinear estimation problems. Moreover, the randomness of the nuisance parameters generally causes a serious variance floor at high SNR for finite data records and, therefore, self-noise free estimates are only possible asymptotically in the case of infinite data samples.

In this chapter and the next one, the above problems are addressed following two different but complementary approaches. In both cases, a closed-loop or feed-back scheme is adopted in which the estimator output is fed back in order to re-design the estimator coefficients and estimate once more the parameters of interest. The closed-loop implementation allows the true parameter to be approached successively until the estimator attains, after convergence, the so-called small-error regime in which the estimator operates in the neighborhood of the true solution $\boldsymbol{\theta}_o$. In contrast, the estimators studied in the previous chapter were based on an open-loop or feedforward architecture in which the parameter was extracted in a "single iterate" from the observed vector.

Based on this closed-loop architecture, two different approaches are considered in this chapter following the arguments in Section 2.5. On the one hand, the design of iterative methods is considered in which the observed vector $\mathbf{y}$ is repeatedly processed until the small-error regime is attained. With this aim, the gradient-based algorithms presented in Section 2.5 are implemented. The contribution of this chapter is the deduction of the optimal second-order gradient, and the corresponding Hessian, in the case of arbitrarily distributed nuisance parameters. Throughout this chapter, we will assume that the length of the observed vector $\mathbf{y}$ is sufficient for exceeding the
SNR threshold and, thus, working in the small-error regime. Otherwise, the algorithm might converge to a spurious solution, usually referred to as an outlier (Section 2.3.2).

On the other hand, the design of closed-loop estimators (Section 2.5.1) and trackers (Section 2.5.2) is also addressed in this chapter. As explained in Chapter 2, closed-loop estimators process the observation vector sequentially. The sequential implementation allows a significant reduction in complexity and is unavoidable when dealing with a continuous transmission system in which the observation is infinite. It is shown in Section 2.5.1 that the closed-loop architecture yields efficient estimates if the observation is appropriately fragmented and all the parameters have been acquired correctly. Another important feature of closed-loop schemes is their capability of tracking the parameter evolution in time-variant scenarios, as explained in Section 2.5.2.

As explained in Section 2.5.1, closed-loop estimators are composed of a discriminator and a loop filter. The discriminator is actually a small-error estimator dedicated to detecting parameter deviations from the current estimate of $\boldsymbol{\theta}$. On the other hand, the loop filter is responsible for filtering the noisy estimates from the discriminator and predicting the parameter evolution in time-varying scenarios. The contribution of this chapter is the deduction of the optimal second-order discriminator assuming that the closed loop has attained the steady state and, thus, is working in the small-error regime. The actual distribution of the nuisance parameters is considered in order to cope with the self-noise in an optimal way.

The optimal second-order discriminator is obtained focusing uniquely on the steady-state performance and ignoring the acquisition and tracking behaviour altogether.
To complement this approach, the optimal second-order tracker is sought in Chapter 5 based on the Kalman filter theory. In that case, the discriminator and the loop filter are jointly and adaptively designed to optimize both the acquisition and steady-state performance.

To summarize this introduction, the small-error regime can be achieved by means of iterative or closed-loop algorithms. Once the small-error regime is achieved, second-order estimators are known to be unbiased since the estimator mean response $E\{\hat{\boldsymbol{\alpha}}\}$ is approximately linear in the parameter $\boldsymbol{\alpha}$, irrespective of the actual parameterization. Besides, in this small-error situation, second-order estimators become efficient for Gaussian nuisance parameters or in low-SNR scenarios.

In the following sections, optimal second-order estimators are designed for the small-error regime and, afterwards, the resulting estimators are applied to the same estimation problem dealt with in Section 3.4: blind frequency offset estimation from digitally-modulated signals. More results can be found in Chapter 6 for the problems of NDA timing synchronization (Section 6.1), NDA carrier phase synchronization (Section 6.2), time-of-arrival estimation in multipath channels (Section 6.3), blind channel identification (Section 6.4) and angle-of-arrival estimation (Section 6.5).

4.1 Small-Error Assumption

In the last chapter, the variability of $\boldsymbol{\theta}$ was considered by means of the prior $f_{\theta}(\boldsymbol{\theta})$. This chapter deals with the asymptotic case in which this variability is very small ($\boldsymbol{\theta} \simeq \boldsymbol{\theta}_o$). In this small-error regime, the prior $f_{\theta}(\boldsymbol{\theta})$ is concentrated around the true parameter $\boldsymbol{\theta}_o$. Then, the formulation presented in the last chapter can be particularized for a very informative prior $f_{\theta}(\boldsymbol{\theta})$ holding that $f_{\theta}(\boldsymbol{\theta}) < \varepsilon$ for any $\boldsymbol{\theta} \ne \boldsymbol{\theta}_o$ with $\varepsilon$ arbitrarily small. Accordingly, the prior can be appropriately modelled as a Dirac delta centered at $\boldsymbol{\theta} = \boldsymbol{\theta}_o$, that is, $f_{\theta}(\boldsymbol{\theta}) = \delta(\boldsymbol{\theta} - \boldsymbol{\theta}_o)$.
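Assuming that the estimator works in the small-error regime, the expected value of those complex matrices appearing in Section 3.2 and Section 3.3 can be approximated by means of their Taylor expansion at $\boldsymbol{\theta} \simeq \boldsymbol{\theta}_o$. Thus, if $\mathbf{F}(\boldsymbol{\theta})$ is a generic complex matrix depending on the vector of parameters $\boldsymbol{\theta}$, its mean value in the neighborhood of $\boldsymbol{\theta} = \boldsymbol{\theta}_o$ can be approximated as follows:

$$E_\theta\{\mathbf{F}(\boldsymbol{\theta})\} \simeq \mathbf{F}(\boldsymbol{\theta}_o) + \frac{1}{2}\sum_{p,q=1}^{P} \left.\frac{\partial^2 \mathbf{F}(\boldsymbol{\theta})}{\partial\theta_p\,\partial\theta_q}\right|_{\boldsymbol{\theta}=\boldsymbol{\theta}_o} [\mathbf{C}_\theta]_{p,q} \tag{4.1}$$

where the linear term is omitted taking into account that $E_\theta\{\boldsymbol{\theta}\} \triangleq \boldsymbol{\theta}_o$ by definition, and $\mathbf{C}_\theta$ is the a priori covariance matrix of the parameter:

$$\mathbf{C}_\theta \triangleq E_\theta\{(\boldsymbol{\theta} - \boldsymbol{\theta}_o)(\boldsymbol{\theta} - \boldsymbol{\theta}_o)^H\}. \tag{4.2}$$

In Appendix 4.A, the vectors and matrices $\mathbf{r}$, $\mathbf{g}$, $\widetilde{\mathbf{Q}}$, $\mathbf{S}$ and $\mathbf{Q}$ (Section 3.1) are approximated in the small-error regime using (4.1), obtaining that

$$\mathbf{r} \simeq \mathbf{r}(\boldsymbol{\theta}_o) \triangleq \mathbf{r}_o \tag{4.3}$$
$$\mathbf{g} \simeq \mathbf{g}(\boldsymbol{\theta}_o) \tag{4.4}$$
$$\widetilde{\mathbf{Q}} \simeq \mathbf{D}_r \mathbf{C}_\theta \mathbf{D}_r^H \tag{4.5}$$
$$\mathbf{S} \simeq \mathbf{D}_r \mathbf{C}_\theta \mathbf{D}_g^H \tag{4.6}$$
$$\mathbf{Q} \simeq \mathbf{Q}(\boldsymbol{\theta}_o) \triangleq \mathbf{Q}_o \tag{4.7}$$

where

$$\mathbf{D}_r \triangleq \left.\frac{\partial \mathbf{r}(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}^T}\right|_{\boldsymbol{\theta}=\boldsymbol{\theta}_o} \tag{4.8}$$
$$\mathbf{D}_g \triangleq \left.\frac{\partial \mathbf{g}(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}^T}\right|_{\boldsymbol{\theta}=\boldsymbol{\theta}_o} \tag{4.9}$$

Finally, under the small-error assumption, the prior is concentrated at $\boldsymbol{\theta} = \boldsymbol{\theta}_o$, so that $\mathbf{C}_\theta$ (4.2) collapses at this point, becoming proportional to a given matrix $\mathbf{C}_\theta^0$ defined as

$$\mathbf{C}_\theta^0 \triangleq \lim_{\Delta \to 0} \frac{1}{\Delta}\,\mathbf{C}_\theta \tag{4.10}$$

with $\Delta \triangleq \|\boldsymbol{\theta} - \boldsymbol{\theta}_o\|$ the radius of the infinitesimal ball in which the prior is defined.

4.2 Second-Order Minimum Variance Estimator

In the small-error regime, the MMSE solution deduced in Section 3.2 makes no sense because it becomes dominated by the prior in such a way that $\widehat{\boldsymbol{\alpha}} = \mathbf{g}(\boldsymbol{\theta}_o)$ with $\mathbf{M}_{mse} = \mathbf{0}$. Thus, the MMSE solution must be constrained in some way to avoid the trivial solution. Once more, the minimum bias constraint is imposed to guarantee minimum bias around the true solution $\boldsymbol{\theta}_o$ and, then, the second-order minimum variance estimator in Section 3.3 is formulated again for the small-error regime. The important point is that the bias contribution can be totally eliminated in the small-error case (i.e., $\Delta \to 0$).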
Actually, a perfect matching between the estimator mean response $\boldsymbol{\alpha}(\boldsymbol{\theta}) = \mathbf{g}(\boldsymbol{\theta}_o) + \mathbf{M}^H(\mathbf{r}(\boldsymbol{\theta}) - \mathbf{r}_o)$ and the target response $\mathbf{g}(\boldsymbol{\theta})$ is possible. The necessary and sufficient condition to have unbiased estimates ($BIAS^2 = 0$) is the equality of the derivatives of $\boldsymbol{\alpha}(\boldsymbol{\theta})$ and $\mathbf{g}(\boldsymbol{\theta})$ evaluated at $\boldsymbol{\theta} = \boldsymbol{\theta}_o$ (Appendix 4.B):

$$\mathbf{D}_r^H \mathbf{M} = \mathbf{D}_g^H \tag{4.11}$$

For the time being, the target response $\mathbf{g}(\boldsymbol{\theta})$ is supposed to verify the above equality for at least one matrix $\mathbf{M}$. Therefore, solving again the minimization problem in (3.29) under the constraints on $\mathbf{b}$ and $\mathbf{M}$ obtained in (3.16) and (4.11), the optimal small-error estimator is given by

$$\widehat{\boldsymbol{\alpha}} = \mathbf{g}(\boldsymbol{\theta}_o) + \mathbf{D}_g \left(\mathbf{D}_r^H \mathbf{Q}_o^{-1} \mathbf{D}_r\right)^{\#} \mathbf{D}_r^H \mathbf{Q}_o^{-1} (\widehat{\mathbf{r}} - \mathbf{r}_o) \tag{4.12}$$

where $\mathbf{r}_o$ and $\mathbf{Q}_o$ were defined in (4.3) and (4.7), and the Moore-Penrose pseudoinverse is maintained to cover those cases in which $\mathbf{D}_r$ is singular. Thus, the estimation error covariance matrix is given by [1]

$$\mathbf{B}_{BQUE}(\boldsymbol{\theta}_o) \triangleq E\{(\widehat{\boldsymbol{\alpha}} - \mathbf{g}(\boldsymbol{\theta}_o))(\widehat{\boldsymbol{\alpha}} - \mathbf{g}(\boldsymbol{\theta}_o))^H\} = \mathbf{D}_g \left(\mathbf{D}_r^H \mathbf{Q}_o^{-1} \mathbf{D}_r\right)^{\#} \mathbf{D}_g^H \tag{4.13}$$

and the overall variance defined in (3.34) is calculated as the trace of $\mathbf{B}_{BQUE}(\boldsymbol{\theta}_o)$, i.e., $VAR_{min} = VAR(\boldsymbol{\theta}_o) = \mathrm{Tr}\{\mathbf{B}_{BQUE}(\boldsymbol{\theta}_o)\}$.

[1] This estimator was named in [Vil01a] the “Best Quadratic Unbiased Estimator” (BQUE) since it can be understood as a logical extension of the well-known “Best Linear Unbiased Estimator” (BLUE) [Kay93b, Ch.6] in case of dealing with a quadratic observation, i.e., $\widehat{\mathbf{r}} = \mathrm{vec}(\widehat{\mathbf{R}})$ (3.3).

Regarding the obtained solution, it is remarkable that the estimator covariance matrix in (4.13) has the same structure as the CRB in Section 2.6.1, where now

$$\mathbf{J}_2 \triangleq \mathbf{D}_r^H \mathbf{Q}_o^{-1} \mathbf{D}_r \tag{4.14}$$

plays the same role as the Fisher information matrix (FIM) for the family of second-order estimators considered in this dissertation. Therefore, (4.13) can be seen as the particularization of the Cramér-Rao bound to second-order estimation techniques.
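The estimator (4.12) and its covariance (4.13) can be sketched numerically as follows. This is only a minimal illustration under stated assumptions: the function name `bque` and all test matrices are hypothetical, $\mathbf{Q}_o$ is assumed Hermitian positive definite, and the matrices are drawn at random rather than obtained from the signal model of Chapter 3. The check at the end uses a noiseless, linearised observation $\widehat{\mathbf{r}} = \mathbf{r}_o + \mathbf{D}_r\boldsymbol{\delta}$, for which (4.12) should return $\mathbf{g}(\boldsymbol{\theta}_o) + \mathbf{D}_g\boldsymbol{\delta}$ whenever $\mathbf{D}_r$ has full column rank.

```python
import numpy as np

def bque(r_hat, r_o, g_o, D_r, D_g, Q_o):
    """Small-error second-order estimator (4.12) and covariance (4.13).

    Assumes Q_o is Hermitian positive definite (hypothetical input here).
    """
    Qi_Dr = np.linalg.solve(Q_o, D_r)        # Q_o^{-1} D_r
    J2 = D_r.conj().T @ Qi_Dr                # second-order FIM (4.14)
    J2_pinv = np.linalg.pinv(J2)             # Moore-Penrose pseudoinverse
    alpha = g_o + D_g @ J2_pinv @ Qi_Dr.conj().T @ (r_hat - r_o)
    B = D_g @ J2_pinv @ D_g.conj().T         # error covariance (4.13)
    return alpha, B

# Toy data (all hypothetical): N plays the role of M^2, P parameters.
rng = np.random.default_rng(0)
N, P = 16, 2
D_r = rng.standard_normal((N, P)) + 1j * rng.standard_normal((N, P))
G = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
Q_o = G @ G.conj().T + N * np.eye(N)         # Hermitian positive definite
D_g = np.eye(P)                              # alpha = theta
r_o = rng.standard_normal(N) + 1j * rng.standard_normal(N)
g_o = np.array([0.1, -0.3])
delta = np.array([1e-3, -2e-3])              # small parameter deviation

r_hat = r_o + D_r @ delta                    # noiseless linearised observation
alpha, B = bque(r_hat, r_o, g_o, D_r, D_g, Q_o)
```

In this toy setting the estimate recovers the deviation exactly, which is the unbiasedness condition (4.11) at work; with noisy data the spread of the estimates would be governed by `B`.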
In Section 2.6.1, the matrix $\mathbf{J}_2$ is shown to coincide with the FIM of the problem when the SNR is asymptotically low (Section 2.4.1) and/or the nuisance parameters are Gaussian (Section 2.4.3). In general, it can be affirmed that

$$E\{(\widehat{\boldsymbol{\alpha}} - \mathbf{g}(\boldsymbol{\theta}_o))(\widehat{\boldsymbol{\alpha}} - \mathbf{g}(\boldsymbol{\theta}_o))^H\} \geq \mathbf{B}_{BQUE}(\boldsymbol{\theta}_o) \geq \mathbf{B}_{CRB}(\boldsymbol{\theta}_o) \quad \forall \boldsymbol{\theta}_o \tag{4.15}$$

for any unbiased estimator $\widehat{\boldsymbol{\alpha}}$ based on the sample covariance matrix $\widehat{\mathbf{R}} = \mathbf{y}\mathbf{y}^H$, where $\mathbf{B}_{CRB}(\boldsymbol{\theta})$ is the CRB of $\boldsymbol{\alpha} = \mathbf{g}(\boldsymbol{\theta})$ (Section 2.6.1). As stated before, the second inequality in (4.15) becomes an equality if the SNR tends to zero and/or the nuisance parameters are Gaussian random variables.

4.3 Second-Order Identifiability

This section is devoted to the analysis of the minimum-bias constraints introduced in (4.11). Using basic results on linear algebra, the solution of the system of equations in (4.11) offers three different possibilities [Mag98, Sec. 2.9], which are enumerated next:

1. $\mathbf{D}_r \in \mathbb{C}^{M^2 \times P}$ is full column rank. In that case, (4.11) is always consistent independently of the content of $\mathbf{D}_g \in \mathbb{R}^{Q \times P}$. Assuming that $\mathbf{D}_r$ is a tall matrix (i.e., $M^2 > P$), the solution of (4.11) is not unique since (4.11) becomes underdetermined. Actually, the bias minimization only consumes $QP$ degrees of freedom from $\mathbf{M} \in \mathbb{C}^{M^2 \times Q}$, whereas the remaining $(M^2 - P)\,Q$ degrees of freedom are dedicated to minimizing the estimator variance. In Appendix 4.B, it is shown that $\boldsymbol{\alpha} \simeq \mathbf{g}(\boldsymbol{\theta}_o) + \mathbf{D}_g(\boldsymbol{\theta} - \boldsymbol{\theta}_o)$ in the small-error regime with $\mathbf{D}_g = \mathbf{M}^H \mathbf{D}_r$ (4.11). This means that the rank of $\mathbf{M}^H \mathbf{D}_r$ determines the dimension of the subspace that contains the values of $\boldsymbol{\alpha} \in \mathbb{R}^Q$ that can be estimated in the small-error regime from the sample covariance matrix without any ambiguity. As the rank of $\mathbf{D}_r$ is $P$, the rank of $\mathbf{M}^H \mathbf{D}_r$ is equal to $\min(P, Q)$ and, thus, $\boldsymbol{\alpha} \in \mathbb{R}^Q$ is locally identifiable from the sample covariance matrix assuming that $Q \leq P$.

2. $\mathbf{D}_r \in \mathbb{C}^{M^2 \times P}$ is singular and $\mathbf{D}_g^H \in \mathrm{span}\{\mathbf{D}_r^H\}$. In that case, (4.11) is consistent if and only if $\mathbf{D}_g^H \in \mathbb{R}^{P \times Q}$ lies in the subspace generated by the rows of $\mathbf{D}_r$. Then, if $R < P$ is the column rank of $\mathbf{D}_r$, only $QR$ constraints, out of the total $QP$ constraints in (4.11), can be imposed. In that case, the rank of $\mathbf{M}^H \mathbf{D}_r$ is the minimum of $R$ and $Q$. Therefore, the parameter $\boldsymbol{\alpha} \in \mathbb{R}^Q$ is locally identifiable from the sample covariance matrix if and only if $\boldsymbol{\alpha}$ belongs to the subspace generated by $\mathbf{g}(\boldsymbol{\theta}_o) + \mathbf{D}_g(\boldsymbol{\theta} - \boldsymbol{\theta}_o)$, where the rank of $\mathbf{D}_g = \mathbf{M}^H \mathbf{D}_r$ is equal to $\min(R, Q)$.

3. $\mathbf{D}_r$ is singular but $\mathbf{D}_g^H \notin \mathrm{span}\{\mathbf{D}_r^H\}$. In that case, (4.11) has no solution and, therefore, there does not exist any unbiased second-order estimator of $\boldsymbol{\alpha} = \mathbf{g}(\boldsymbol{\theta})$, even in the small-error regime. In [Sto01], Stoica and Marzetta proved that a finite variance estimator does not exist if (4.11) is not satisfied. Alternatively, the same conclusion can be drawn following the geometrical interpretation derived in [McW93]. In that case, the designer has to proceed as done in (3.29) to obtain the best approximation of $\mathbf{g}(\boldsymbol{\theta})$ holding the minimum bias constraints in (3.27). Thus, substituting (4.5)-(4.6) into (3.27), the minimum-bias constraints are given by

$$\mathbf{D}_r \mathbf{C}_\theta^0 \mathbf{D}_r^H \mathbf{M} = \mathbf{D}_r \mathbf{C}_\theta^0 \mathbf{D}_g^H \tag{4.16}$$

where the a priori covariance matrix $\mathbf{C}_\theta^0$ (4.10) is used to carry out the matching proposed in (3.27). Otherwise, if $\mathbf{D}_r$ is full-rank, $\mathbf{C}_\theta^0$ is not profitable, showing that Bayesian estimators cannot improve deterministic ones when the small-error assumption applies.

Focusing on the second case, there are two circumstances reducing the rank of $\mathbf{D}_r$:

• The parameterization is not appropriate. In the following three situations, the estimation problem is not correctly defined and $\mathbf{D}_r$ becomes singular. Example 1: the number of parameters $Q$ is greater than the size of the sample covariance matrix $M^2$. Example 2: the $Q$ parameters are not linearly independent and, therefore, the model is “overparameterized”. Example 3: the sample covariance matrix, $\widehat{\mathbf{R}} = \mathbf{y}\mathbf{y}^H$, is insensitive to the phase of $\mathbf{y}$ in second-order estimation. [2]

• The estimator has a finite resolution. The estimator is unable to resolve two parameters of the same nature if they are very similar. This problem arises in multiuser estimation problems such as angle-of-arrival estimation in array signal processing (Section 6.5). It is worth noting that this situation, contrary to the ambiguities described before, cannot be predicted beforehand, so it is not possible to guarantee (4.11) all the time. Therefore, the constraints in (4.16) should be used instead of those in (4.11), and the general estimator in (3.33) must be adopted using now the small-error matrices in (4.3)-(4.7).

[2] The signal modulus would also be ambiguous if the noise variance $\sigma_w^2$ were not known, as we have assumed throughout the dissertation.

However, from the designer viewpoint, the use of (4.16) may be problematic because the estimator would automatically reduce the rank of $\mathbf{D}_g^H = \mathbf{D}_r^H \mathbf{M}$ when entering into a singular situation (e.g., if two users cross each other as studied in Section 6.5), changing the value of $\mathbf{D}_g$. In the next section, this problem is overcome by setting free the value of the cross derivatives of (4.11).

4.4 Generalized Second-Order Constrained Estimators

Thus far, the estimator is designed to have an unbiased mean response when working under the small-error regime. Let us consider first that $\boldsymbol{\alpha} = \mathbf{g}(\boldsymbol{\theta})$ is a vector of $Q$ independent parameters holding that $\mathbf{D}_g^H$ is diagonal. In that case, the diagonal entries of $\mathbf{D}_r^H \mathbf{M}$ are related to the estimator bias in the neighborhood of $\boldsymbol{\theta} = \boldsymbol{\theta}_o$, whereas the cross-terms reflect the coupling between parameters or, in other words, the interparameter interference (IPI). The classical unbiased solution forces $\mathbf{D}_r^H \mathbf{M} = \mathbf{D}_g^H$ (4.11) in order to yield unbiased estimates without IPI.
However, strictly speaking, unbiased estimators are only required to constrain the value of the diagonal entries, that is,

$$\mathrm{diag}\left(\mathbf{D}_r^H \mathbf{M}\right) = \mathrm{diag}\left(\mathbf{D}_g^H\right), \tag{4.17}$$

since the IPI contribution is zero-mean in the small-error regime and, therefore, can only increase the estimator variance. Moreover, in noisy scenarios, the IPI-free condition usually causes noise enhancement whereas, if the cross-terms in (4.11) are kept free, the estimator automatically makes a trade-off among noise, self-noise and IPI in order to minimize the overall variance.

Therefore, in case of independent parameters for which $\mathbf{D}_g$ is diagonal, the proposed second-order unbiased estimator is given by

$$\widehat{\boldsymbol{\alpha}} = \mathbf{g}(\boldsymbol{\theta}_o) + \mathbf{D}_g\, \mathrm{Dg}^{-1}(\mathbf{J}_2)\, \mathbf{D}_r^H \mathbf{Q}_o^{-1} (\widehat{\mathbf{r}} - \mathbf{r}_o) \tag{4.18}$$

where $\mathbf{J}_2 \triangleq \mathbf{D}_r^H \mathbf{Q}_o^{-1} \mathbf{D}_r$ is the second-order FIM introduced in (4.14) and $\mathrm{Dg}(\mathbf{J}_2)$ is the diagonal matrix formed from the diagonal entries of $\mathbf{J}_2$.

However, the $P$ parameters in $\boldsymbol{\theta}$ may appear coupled in $\boldsymbol{\alpha} = \mathbf{g}(\boldsymbol{\theta})$. In that case, the matrix of derivatives $\mathbf{D}_g^H$ is not diagonal and the significance of the off-diagonal entries of $\mathbf{D}_g^H$ changes radically. Assuming that $\mathbf{D}_g^H$ is a full matrix (all the elements different from 0), any unbiased estimator of $\boldsymbol{\alpha}$ is required to fulfill (4.11), leading to the original small-error solution in (4.12). In general, if $\mathbf{D}_g^H$ is sparse, only the constraints in (4.11) corresponding to non-zero elements of $\mathbf{D}_g^H$ have to be imposed to obtain unbiased estimators of $\boldsymbol{\alpha}$.

In Section 6.5, the alternative solution obtained in (4.18) is evaluated and compared to the classical unbiased solution in (4.12) for the problem of tracking the angle-of-arrival of multiple digitally modulated sources in the context of array signal processing.

4.5 A Case Study: Frequency Estimation

In this section, the small-error estimator proposed in (4.12) is simulated for the frequency estimation problem addressed in Section 3.4. Additional results are given in Chapter 6 for timing estimation and other relevant estimation problems.
The results in this section show that the optimal second-order frequency estimator is unbiased and self-noise free. The Gaussian assumption is examined when the transmitted signal is digitally modulated, showing that it is generally appropriate in the studied uniparametric problem. In addition, some singular cases are identified for the CPM modulation in which the Gaussian assumption is not able to cancel out the self-noise at high SNR. Later, in Section 6.5, the interest of including the digital information about the symbols is emphasized for the related problem of bearing estimation of multiple digitally modulated sources.

Based on the signal model introduced in Section 3.4 for the problem of frequency estimation, the matrix of derivatives $\mathbf{D}_r$ is simply the following column vector:

$$\mathbf{d}_r \triangleq \mathrm{vec}\left(\left.\frac{\partial \mathbf{R}(\nu)}{\partial \nu}\right|_{\nu=\nu_o}\right) = \mathrm{vec}\left(\left.\frac{\partial \mathbf{E}(\nu/N_{ss})}{\partial \nu}\right|_{\nu=\nu_o} \odot \mathbf{A}\mathbf{A}^H\right)$$

where $\nu_o$ is the actual value of the parameter, $\odot$ denotes the element-wise (Hadamard) product, and $[\mathbf{E}(\lambda)]_{i,k} = \exp(j2\pi\lambda(i-k))$ are the elements of the Toeplitz matrix introduced in Appendix 3.D. The derivative of $\mathbf{E}(\nu/N_{ss})$ is then calculated, obtaining

$$\frac{\partial [\mathbf{E}(\nu/N_{ss})]_{i,k}}{\partial \nu} = j2\pi\,\frac{i-k}{N_{ss}}\,[\mathbf{E}(\nu/N_{ss})]_{i,k}.$$

Therefore, the optimal second-order small-error estimator is given by

$$\widehat{\nu} = \nu_o + \frac{\mathbf{d}_r^H \mathbf{Q}_o^{-1} (\widehat{\mathbf{r}} - \mathbf{r}_o)}{\mathbf{d}_r^H \mathbf{Q}_o^{-1} \mathbf{d}_r}$$

where $\mathbf{r}_o$ and $\mathbf{Q}_o$ were defined in (4.3) and (4.7), and the denominator is responsible for the unitary slope of $E\{\widehat{\nu}\}$.

Alternatively, a classical synchronization loop can be implemented in which the received signal is corrected using the estimated parameter $\widehat{\nu}$ (see Fig. 4.1). Thus, the discriminator can be designed assuming that the input parameter is $\nu_o = 0$ once the small-error regime is attained. Consequently, the optimal second-order discriminator is given by

$$\widehat{\nu} = \frac{\mathbf{d}_r^H \mathbf{Q}_o^{-1} \widehat{\mathbf{r}}}{\mathbf{d}_r^H \mathbf{Q}_o^{-1} \mathbf{d}_r}$$

where $\mathbf{d}_r$ and $\mathbf{Q}_o$ are computed at $\nu_o = 0$. Notice too that the last expression is simplified using that $\mathbf{d}_r^H \mathbf{Q}_o^{-1} \mathbf{r}_o = 0$. This condition is fulfilled thanks to the symmetry of matrix $\mathbf{A}(\nu)$ for the problem at hand.
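The derivative of $\mathbf{E}(\nu/N_{ss})$ and the shape of the discriminator response can be checked with a short numerical sketch. Several assumptions are made and should be read as such: the covariance model $\mathbf{R}(\nu) = \mathbf{E}(\nu/N_{ss}) \odot \mathbf{A}\mathbf{A}^H + \sigma_w^2\mathbf{I}$ is inferred from the expression of $\mathbf{d}_r$ above, the matrix `A` is random instead of the modulation-dependent matrix of Section 3.4, and an identity weighting stands in for $\mathbf{Q}_o^{-1}$, whose actual value depends on the symbol statistics and is not reproduced here. The resulting discriminator is therefore not the BQUE; it only illustrates the unit slope at the origin.

```python
import numpy as np

def E_matrix(lam, M):
    """Toeplitz matrix with entries exp(j*2*pi*lam*(i-k))."""
    d = np.arange(M)[:, None] - np.arange(M)[None, :]
    return np.exp(2j * np.pi * lam * d)

def dE_dnu(nu, M, Nss):
    """Analytical derivative of E(nu/Nss) with respect to nu."""
    d = np.arange(M)[:, None] - np.arange(M)[None, :]
    return 2j * np.pi * d / Nss * E_matrix(nu / Nss, M)

def vec(X):
    """Column-wise stacking of a matrix."""
    return X.flatten(order="F")

rng = np.random.default_rng(0)
M, K, Nss, sigma2 = 4, 3, 2, 0.1
A = rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))  # hypothetical
G = A @ A.conj().T

def r_of(nu):
    """vec of the assumed covariance model R(nu)."""
    return vec(E_matrix(nu / Nss, M) * G + sigma2 * np.eye(M))

# 1) Central finite-difference check of the analytical derivative.
h, nu0 = 1e-6, 0.3
fd = (E_matrix((nu0 + h) / Nss, M) - E_matrix((nu0 - h) / Nss, M)) / (2 * h)
deriv_ok = np.allclose(fd, dE_dnu(nu0, M, Nss), atol=1e-6)

# 2) Discriminator designed at nu_o = 0 with a placeholder weighting W.
d_r = vec(dE_dnu(0.0, M, Nss) * G)
r_o = r_of(0.0)

def discriminator(r_hat, W=None):
    W = np.eye(d_r.size) if W is None else W   # W stands in for Q_o^{-1}
    return (d_r.conj() @ W @ (r_hat - r_o)) / (d_r.conj() @ W @ d_r)

nu_small = 1e-3
nu_hat = discriminator(r_of(nu_small))         # mean response for a small offset
```

With the identity weighting, the term $\mathbf{d}_r^H \mathbf{r}_o$ vanishes by the antisymmetry of $(i-k)$, which mirrors the simplification mentioned in the text.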
[Figure 4.1: block diagram; input y, CORDIC rotator, DISCRIMINATOR, LOOP FILTER with gain µ, output estimate of ν.]

Figure 4.1: Block diagram of a (first-order) closed-loop frequency synchronizer. The optimal second-order discriminator, which was derived in this section under the small-error condition, is indicated in the figure. The CORDIC block rotates the phase of the received signal according to the estimated frequency offset.

Whatever the selected scheme and the actual value of $\nu_o$, the variance at the discriminator output is given by

$$VAR = E\{|\widehat{\nu} - \nu_o|^2\} = \frac{1}{\mathbf{d}_r^H \mathbf{Q}_o^{-1} \mathbf{d}_r},$$

which constitutes the lower bound for the variance of any quadratic unbiased frequency error detector. If the nuisance parameters were normally distributed, then the above expression would correspond to the (Gaussian) UCRB presented in Section 2.6.1. Notice that the discriminator variance could be reduced by including the usual loop filter (Fig. 4.1). In that case, the steady-state variance of the related closed-loop estimator is computed using the results in Section 6.1.4 following the reasoning in [Men97, Sec. 3.5.5].

The estimator performance is depicted in the following plots and compared with the ML-based estimators. The Gaussian assumption (GML) is shown to be practically optimal whatever the working point. Nonetheless, a minor degradation of about 0.9 dB is observed in Fig. 4.2 for positive Es/N0, which persists even when the observation time is increased (Fig. 4.3). On the other hand, the low-SNR UML solution is rapidly limited by the self-noise as the SNR is augmented, manifesting a significant variance floor. This result is a consequence of the modulation intersymbol interference (ISI) and the finite observation time. In case of linear modulations (e.g., MPSK, QAM, etc.), this high-SNR floor disappears. Finally, the CML solution suffers from noise enhancement at low SNR, also due to the ISI.
The interest of the optimal small-error solution is more significant when dealing with a partial-response CPM modulation such as the LREC format [Men97, Sec. 4.2]. It can be seen that all the ML-based methods are dominated by the self-noise at high SNR (Figs. 4.4 and 4.5). The CML and GML solutions are not able to cancel out the self-noise when the number of nuisance parameters (K) is greater than the number of samples (M). In that case, the CML estimator cannot remove the self-noise term because there is no noise subspace onto which the data can be projected. Moreover, as will be studied in Appendix 7.E, the CML and GML solutions are not equivalent at high SNR because the columns of $\mathbf{A}(\nu)$ are linearly dependent. Another simulation is run in which the received signal is oversampled ($N_{ss} = 4$) to guarantee that M > K (Fig. 4.6). In that case, the GML and CML estimators supply self-noise free estimates at high SNR, although a significant loss is exhibited for practical SNRs.

[Figure 4.2: plot of variance (log scale) versus Es/No (dB); curves: CML, low-SNR UML, GML, MCRB, BQUE.]

Figure 4.2: Frequency estimation variance under the small-error assumption for the optimal and GML estimators in case of MSK symbols, Nss=2 and M=4. The UCRB is not plotted for clarity since it is only slightly lower than the GML performance from Es/No=-5dB to Es/No=25dB.

On the other hand, the optimal second-order estimator is self-noise free under the small-error assumption, as shown for the 2REC and 3REC modulations in Figs. 4.4 and 4.5. Self-noise is removed by exploiting the pseudo-symbols fourth-order moments matrix $\mathbf{K}$. A detailed analysis of the asymptotic behaviour of second-order estimators at high SNR is given in Section 7.3.
Two classical small-error lower bounds for the variance of unbiased estimators are used to evaluate the performance of second-order techniques in the presence of nuisance parameters (Section 2.6.1). The (Gaussian) UCRB corresponds to the performance of the GML estimator in case of Gaussian nuisance parameters (Section 2.4.3). Although it has been extensively used in the literature as a valid bound in second-order estimation, simulations show that the UCRB is outperformed by the optimal second-order estimator when the nuisance parameters are discrete symbols. On the other hand, the MCRB predicts the ultimate performance of data-aided estimators, which could be approached at high SNR by means of higher-order methods [Vil01b].

[Figure 4.3: plot of variance (log scale) versus M; curves: GML (UCRB), BQUE, MCRB.]

Figure 4.3: Frequency estimation variance under the small-error assumption as a function of M for the MSK modulation, Es/No=40dB and Nss=2.

It is worth noting that the performance predicted in the above curves is only realistic for high SNRs and/or a narrowband loop filter (Section 2.5.2). Otherwise, the studied closed-loop estimators are not able to achieve the small-error regime. This abnormal behaviour is not only associated with closed-loop schemes; it also appears in open-loop estimation in the form of outliers or large errors.

Closed-loop schemes are sometimes able to acquire the parameter without external assistance. The necessary condition is that the estimator bias curve $E\{\widehat{\nu} - \nu_o\}$ (the so-called S-curve) uniquely intercepts the abscissa with positive slope at the origin. In Fig. 4.7, the acquisition stage of a first-order tracker with forgetting factor µ = 1/20 is simulated for the 2REC modulation. The Es/N0 is set to 60dB in order to study the relevance of the self-noise term. Both the GML and the optimal second-order tracker are shown to acquire the parameter correctly with almost the same speed.
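The acquisition of a first-order loop can be illustrated with a very small sketch. It is a toy under explicit assumptions, not the simulation behind Fig. 4.7: the discriminator is idealised as noiseless with unit slope (its output equals the current error), the function name `first_order_loop` is hypothetical, and the values of the step-size and of the true offset are chosen only for illustration. Under these assumptions the error decays geometrically as $(1-\mu)^n$.

```python
def first_order_loop(nu_true, mu, n_iter, discriminator):
    """First-order closed loop: nu_hat <- nu_hat + mu * discriminator(error)."""
    nu_hat = 0.0                     # tracker initialised at zero
    track = []
    for _ in range(n_iter):
        err = discriminator(nu_true - nu_hat)   # idealised error signal
        nu_hat += mu * err                      # loop filter (pure integrator)
        track.append(nu_hat)
    return track

# Ideal unit-slope, noiseless discriminator (an assumption of this sketch).
track = first_order_loop(nu_true=0.4, mu=0.05, n_iter=250,
                         discriminator=lambda e: e)
```

A smaller step-size narrows the loop bandwidth, which lengthens the acquisition but would reduce the steady-state variance once discriminator noise is included.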
On the other hand, the GML self-noise variance is apparent in the steady state. The associated S-curves are also depicted in Figs. 4.8-4.10. It can be seen that all of them cross the origin and have unitary slope there.

[Figure 4.4: plot of variance (log scale) versus Es/No (dB); curves: CML, low-SNR UML, GML, UCRB, BQUE, MCRB.]

Figure 4.4: Estimators variance as a function of the Es/No for the 2REC modulation and M=8, Nss=2. The number of pseudo-symbols is equal to K=12.

[Figure 4.5: plot of variance (log scale) versus Es/No (dB); curves: CML, low-SNR UML, GML, MCRB, UCRB, BQUE.]

Figure 4.5: Estimators variance as a function of the Es/No for the 3REC modulation and M=8, Nss=2. The number of pseudo-symbols is equal to K=28.

[Figure 4.6: plot of variance (log scale) versus Es/No (dB); curves: CML, low-SNR UML, GML, UCRB, BQUE, MCRB.]

Figure 4.6: Estimators variance as a function of the Es/No for the 2REC modulation and M=4, Nss=4. The number of pseudo-symbols is equal to K=12.

[Figure 4.7: plot of frequency estimator output versus n (time); curves: BQUE, GML.]

Figure 4.7: Frequency tracker output as a function of time for the 2REC modulation in a high SNR scenario (Es/No=60dB). The true frequency offset is equal to ν_o = 0.4 (GML) and ν_o = 0.5 (BQUE). Both trackers are initialized at ν = 0 with M=8, Nss=2. A first-order closed loop is implemented with µ = 0.02 the selected step-size or forgetting factor.

[Figure 4.8: plot of S-curve versus ν; curves: CML, BQUE, GML, low-SNR UML; a dashed arrow labelled SNR.]

Figure 4.8: S-curve of the optimum and ML-based discriminators for the MSK modulation with M=8, Nss=2, Es/No=10dB.
The dashed arrow points out the tendency of the GML and BQUE S-curves as the Es/No is augmented from Es/No=0 (low-SNR UML S-curve) to Es/No=∞ (CML S-curve).

[Figure 4.9: plot of S-curve versus ν; curves: low-SNR UML, CML, GML, BQUE.]

Figure 4.9: S-curve of the optimum and ML-based discriminators for the 2REC modulation with M=8, Nss=2, Es/No=10dB.

[Figure 4.10: plot of S-curve versus ν; curves: CML, low-SNR UML, GML, BQUE.]

Figure 4.10: S-curve of the optimum and ML-based discriminators for the 3REC modulation with M=8, Nss=2, Es/No=10dB.

4.6 Conclusions

The limitations of second-order feedforward methods in nonlinear estimation problems motivated the design of closed-loop estimators for the so-called small-error regime. Generally, second-order estimators are able to yield unbiased and self-noise free estimates once the small-error regime is attained after the acquisition. This important result is verified, whatever the considered estimation problem, if all the parameters are (locally) identifiable. Focusing only on those identifiable parameters, the prior distribution becomes irrelevant under the small-error assumption. Therefore, it can be stated that Bayesian estimators never outperform deterministic estimators in the small-error regime.

In this chapter, the Best Quadratic Unbiased Estimator (BQUE) is formulated considering the true distribution of the nuisance parameters. The BQUE expression is obtained analytically by expanding the constrained minimum variance solution in Chapter 3 in a Taylor series in the neighbourhood of the true parameter, where the small-error condition is satisfied. The resulting estimator is “the best” in the sense that there does not exist any other unbiased second-order estimator yielding a lower variance.
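Consequently, the BQUE performance constitutes the tightest lower bound on the variance of any second-order unbiased blind estimator. Besides, it can be interpreted as the particularization of the CRB theory to second-order estimation.

The optimal second-order estimator is proved to depend on the fourth-order cumulants of the nuisance parameters. In some estimation problems, this fourth-order information becomes important to cope with the self-noise disturbance at high SNR. On the other hand, this information is omitted when the Gaussian assumption is adopted. In this chapter, the frequency estimation problem is studied, concluding that the Gaussian assumption is practically optimal when we deal with a linear constellation. However, other simulations have shown that the non-Gaussian information about the nuisance parameters is needed to remove the self-noise at high SNR if the number of nuisance parameters exceeds the number of observations and a partial-response CPM transmission is considered. Some other illustrative examples in which the Gaussian assumption is questioned will be studied in Chapter 6.

Finally, in the context of multiuser communications, the estimator performance is seriously affected by the so-called multiple access interference (MAI). The original BQUE solution is forced to eliminate the MAI contribution and, for this reason, it suffers from a significant noise enhancement in noisy scenarios. Thus, it is preferable to include the MAI term in the estimator optimization in order to make an optimal trade-off among the three disturbing random terms: thermal noise, self-noise and MAI. The obtained MAI-resistant BQUE estimator is further evaluated in Section 6.5 for the problem of direction-of-arrival estimation in cellular communication systems.

Appendix 4.A Small-error matrices

Let us define $\mathbf{S}(\boldsymbol{\theta})$ and $\widetilde{\mathbf{Q}}(\boldsymbol{\theta})$ as the arguments inside the brackets of (3.24) and (3.23):

$$\mathbf{S}(\boldsymbol{\theta}) \triangleq (\mathbf{r}(\boldsymbol{\theta}) - \mathbf{r})(\mathbf{g}(\boldsymbol{\theta}) - \mathbf{g})^H$$
$$\widetilde{\mathbf{Q}}(\boldsymbol{\theta}) \triangleq (\mathbf{r}(\boldsymbol{\theta}) - \mathbf{r})(\mathbf{r}(\boldsymbol{\theta}) - \mathbf{r})^H,$$

with $\mathbf{r}$ and $\mathbf{g}$ the prior means approximated in (4.3) and (4.4). Regarding the matrix $\mathbf{S}(\boldsymbol{\theta})$, it is easy to show that

$$\mathbf{S}(\boldsymbol{\theta}_o) = \mathbf{0}$$
$$\left.\frac{\partial^2 \mathbf{S}(\boldsymbol{\theta})}{\partial\theta_p\,\partial\theta_q}\right|_{\boldsymbol{\theta}=\boldsymbol{\theta}_o} = \left.\frac{\partial \mathbf{r}(\boldsymbol{\theta})}{\partial\theta_p}\right|_{\boldsymbol{\theta}=\boldsymbol{\theta}_o} \left(\left.\frac{\partial \mathbf{g}(\boldsymbol{\theta})}{\partial\theta_q}\right|_{\boldsymbol{\theta}=\boldsymbol{\theta}_o}\right)^{H} + \left.\frac{\partial \mathbf{r}(\boldsymbol{\theta})}{\partial\theta_q}\right|_{\boldsymbol{\theta}=\boldsymbol{\theta}_o} \left(\left.\frac{\partial \mathbf{g}(\boldsymbol{\theta})}{\partial\theta_p}\right|_{\boldsymbol{\theta}=\boldsymbol{\theta}_o}\right)^{H} = [\mathbf{D}_r]_p [\mathbf{D}_g]_q^H + [\mathbf{D}_r]_q [\mathbf{D}_g]_p^H,$$

since the pair of terms depending on $\mathbf{r}(\boldsymbol{\theta}) - \mathbf{r}_o$ and $\mathbf{g}(\boldsymbol{\theta}) - \mathbf{g}$ vanish at $\boldsymbol{\theta} = \boldsymbol{\theta}_o$. Then, equation (4.6) is obtained after plugging into (4.1) the following term:

$$\sum_{p,q=1}^{P} \left.\frac{\partial^2 \mathbf{S}(\boldsymbol{\theta})}{\partial\theta_p\,\partial\theta_q}\right|_{\boldsymbol{\theta}=\boldsymbol{\theta}_o} [\mathbf{C}_\theta]_{p,q} = \sum_{p,q=1}^{P} [\mathbf{D}_r]_p [\mathbf{D}_g]_q^H [\mathbf{C}_\theta]_{p,q} + \sum_{p,q=1}^{P} [\mathbf{D}_r]_q [\mathbf{D}_g]_p^H [\mathbf{C}_\theta]_{p,q} = \mathbf{D}_r \mathbf{C}_\theta \mathbf{D}_g^H + \mathbf{D}_r \mathbf{C}_\theta^T \mathbf{D}_g^H = 2\,\mathbf{D}_r \mathrm{Re}\{\mathbf{C}_\theta\} \mathbf{D}_g^H. \tag{4.19}$$

Proceeding in the same way with the matrix $\widetilde{\mathbf{Q}}(\boldsymbol{\theta})$, it is found that

$$\widetilde{\mathbf{Q}}(\boldsymbol{\theta}_o) = \mathbf{0}$$
$$\left.\frac{\partial^2 \widetilde{\mathbf{Q}}(\boldsymbol{\theta})}{\partial\theta_p\,\partial\theta_q}\right|_{\boldsymbol{\theta}=\boldsymbol{\theta}_o} = \left.\frac{\partial \mathbf{r}(\boldsymbol{\theta})}{\partial\theta_p}\right|_{\boldsymbol{\theta}=\boldsymbol{\theta}_o} \left(\left.\frac{\partial \mathbf{r}(\boldsymbol{\theta})}{\partial\theta_q}\right|_{\boldsymbol{\theta}=\boldsymbol{\theta}_o}\right)^{H} + \left.\frac{\partial \mathbf{r}(\boldsymbol{\theta})}{\partial\theta_q}\right|_{\boldsymbol{\theta}=\boldsymbol{\theta}_o} \left(\left.\frac{\partial \mathbf{r}(\boldsymbol{\theta})}{\partial\theta_p}\right|_{\boldsymbol{\theta}=\boldsymbol{\theta}_o}\right)^{H} = [\mathbf{D}_r]_p [\mathbf{D}_r]_q^H + [\mathbf{D}_r]_q [\mathbf{D}_r]_p^H.$$

Then, equation (4.5) is deduced after plugging into (4.1) the following expression:

$$\sum_{p,q=1}^{P} \left.\frac{\partial^2 \widetilde{\mathbf{Q}}(\boldsymbol{\theta})}{\partial\theta_p\,\partial\theta_q}\right|_{\boldsymbol{\theta}=\boldsymbol{\theta}_o} [\mathbf{C}_\theta]_{p,q} = \sum_{p,q=1}^{P} [\mathbf{D}_r]_p [\mathbf{D}_r]_q^H [\mathbf{C}_\theta]_{p,q} + \sum_{p,q=1}^{P} [\mathbf{D}_r]_q [\mathbf{D}_r]_p^H [\mathbf{C}_\theta]_{p,q} = \mathbf{D}_r \mathbf{C}_\theta \mathbf{D}_r^H + \mathbf{D}_r \mathbf{C}_\theta^T \mathbf{D}_r^H = 2\,\mathbf{D}_r \mathrm{Re}\{\mathbf{C}_\theta\} \mathbf{D}_r^H. \tag{4.20}$$

Finally, the real operator in (4.19) and (4.20) can be omitted taking into account that the vector of parameters is actually real-valued throughout this dissertation.

Appendix 4.B Proof of bias cancellation

If the Taylor expansions of $\boldsymbol{\alpha}(\boldsymbol{\theta})$ and the target response $\mathbf{g}(\boldsymbol{\theta})$ are calculated around $\boldsymbol{\theta} = \boldsymbol{\theta}_o$, it is found that

$$\boldsymbol{\alpha}(\boldsymbol{\theta}) \simeq \mathbf{g}(\boldsymbol{\theta}_o) + \left.\frac{\partial \boldsymbol{\alpha}(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}^T}\right|_{\boldsymbol{\theta}=\boldsymbol{\theta}_o} (\boldsymbol{\theta} - \boldsymbol{\theta}_o) = \mathbf{g}(\boldsymbol{\theta}_o) + \mathbf{M}^H \mathbf{D}_r (\boldsymbol{\theta} - \boldsymbol{\theta}_o)$$
$$\mathbf{g}(\boldsymbol{\theta}) \simeq \mathbf{g}(\boldsymbol{\theta}_o) + \left.\frac{\partial \mathbf{g}(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}^T}\right|_{\boldsymbol{\theta}=\boldsymbol{\theta}_o} (\boldsymbol{\theta} - \boldsymbol{\theta}_o) = \mathbf{g}(\boldsymbol{\theta}_o) + \mathbf{D}_g (\boldsymbol{\theta} - \boldsymbol{\theta}_o) \tag{4.21}$$

with $\mathbf{D}_r$ and $\mathbf{D}_g$ defined in (4.8) and (4.9), respectively. Therefore, if (4.21) is plugged into (3.14), it follows that

$$BIAS^2 = E_\theta \|\boldsymbol{\alpha}(\boldsymbol{\theta}) - \mathbf{g}(\boldsymbol{\theta})\|^2 = \mathrm{Tr}\left\{\left(\mathbf{M}^H \mathbf{D}_r - \mathbf{D}_g\right) \mathbf{C}_\theta \left(\mathbf{M}^H \mathbf{D}_r - \mathbf{D}_g\right)^H\right\}$$

where $\mathbf{C}_\theta$ is the prior covariance matrix introduced in (4.2). Consequently, $\mathbf{M}^H \mathbf{D}_r = \mathbf{D}_g$ is a necessary and sufficient condition to ensure that $BIAS^2 = 0$ in the small-error regime if $\mathbf{C}_\theta$ is a full-rank matrix.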
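Both appendix results lend themselves to a quick numerical check. The sketch below is only a sanity test on random matrices (all names and dimensions are hypothetical, and $\mathbf{Q}_o$ is assumed Hermitian positive definite): it first compares the double sum in (4.19) with its closed form for a Hermitian $\mathbf{C}_\theta$, and then evaluates the bias expression of Appendix 4.B for a matrix $\mathbf{M}$ that satisfies $\mathbf{M}^H \mathbf{D}_r = \mathbf{D}_g$ and for a scaled version that does not.

```python
import numpy as np

rng = np.random.default_rng(1)
N, P, Qdim = 9, 3, 2                      # N plays the role of M^2

def crandn(*shape):
    """Complex matrix with i.i.d. Gaussian real and imaginary parts."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

D_r = crandn(N, P)
D_g = rng.standard_normal((Qdim, P))      # real-valued target derivatives
T = crandn(P, P)
C = T @ T.conj().T                        # Hermitian prior covariance

# Appendix 4.A: double sum of (4.19) versus 2 D_r Re{C} D_g^H.
double_sum = sum(
    C[p, q] * (np.outer(D_r[:, p], D_g[:, q].conj())
               + np.outer(D_r[:, q], D_g[:, p].conj()))
    for p in range(P) for q in range(P)
)
closed_form = 2 * D_r @ C.real @ D_g.conj().T

# Appendix 4.B: squared bias for a matrix M fulfilling M^H D_r = D_g.
G = crandn(N, N)
Q_o = G @ G.conj().T + N * np.eye(N)      # Hermitian positive definite
Qi_Dr = np.linalg.solve(Q_o, D_r)
J2 = D_r.conj().T @ Qi_Dr
M = Qi_Dr @ np.linalg.solve(J2, D_g.conj().T)

def bias2(M_test):
    E = M_test.conj().T @ D_r - D_g
    return np.trace(E @ C @ E.conj().T).real

bias2_ok = bias2(M)                       # constraint satisfied
bias2_bad = bias2(0.9 * M)                # constraint violated
```

Here `M` is the minimum variance choice of (4.12); any other matrix satisfying the constraint would also cancel the bias, but with a larger variance.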
Chapter 5 Quadratic Extended Kalman Filtering

As explained in Section 2.5.2, a tracker is a closed-loop estimator that is able to follow the variations of the parameters of interest. To do so, the tracker is composed of a discriminator and a loop filter (see Fig. 2.4). In that scheme, the discriminator is designed to deliver unbiased estimates that are further integrated at the loop filter according to the known parameter dynamics.

In Chapter 4, the optimal second-order discriminator was formulated by minimizing the steady-state variance subject to the unbiasedness constraint. In this optimization, it was assumed that the small-error condition is satisfied in the steady state. Implicitly, this assumption means that all the parameters have been initially acquired and the tracker is accurately following their temporal evolution. However, the tracker optimization was carried out without taking into account the acquisition and tracking performance. For this reason, the loop filter was not involved in the design.

Alternatively, the Kalman filter is designed considering globally both the acquisition and steady-state performance. In Kalman filter theory, the parameter is modelled as a random variable of known statistics [And79][Kay93b, Ch. 13]. The Kalman filter, which is linear in the observed data, is known to be the optimal tracker if the parameters and observations are Gaussian random variables of known a priori mean and variance. In that case, the optimality of the Kalman filter means that it provides minimum variance unbiased estimates in the steady state as well as minimum MSE estimates during the acquisition.

From the results in Chapter 4, the prior distribution of the parameters is useless once the small-error regime is attained. However, this information is very relevant during the acquisition, that is, in the large-error regime. The Kalman filter is considered in this thesis because it performs a gradual transition from the large-error regime in Chapter 3 to the small-error regime in Chapter 4 as the observation length increases. As stated before, this transition is optimal if and only if all the random variables are Gaussian distributed.

Unfortunately, the Gaussian condition is quite restrictive because it implies linear models for the observation as well as for the parameter dynamics. Otherwise, the observation and dynamics equations have to be linearized in order to derive the so-called Extended Kalman filter (EKF) [And79][Kay93b, Sec. 13.7]. It can be shown that the EKF is the best linear tracker only in the steady state, independently of the parameter and observation distribution. This statement is verified because, whatever the parameterization and dynamical model at hand, the observation and dynamics equations are always linear in the vector of parameters if these equations are approximated around the true value of the parameters (small-error assumption). On the other hand, nothing can be stated about the EKF optimality during the acquisition stage (large-error regime), which is actually uncertain.

In the context of blind parameter estimation, second-order methods are mandatory because the observation is zero mean. Thus, the EKF is extended in this chapter to deal with quadratic observation models. The result is the so-called Quadratic Extended Kalman Filter (QEKF), which constitutes an alternative deduction of the optimal second-order tracker studied in Chapter 4. The main advantage is that the QEKF automatically adjusts its response during the acquisition phase in order to speed up the tracker convergence without altering the (optimal) steady-state solution. On the other hand, in Chapter 4, the tracker response was specifically designed for the steady state (small-error regime) and it was not changed during the whole operation time.
Therefore, the QEKF can be seen as a time-variant quadratic tracker that automatically adjusts the loop bandwidth depending on the current uncertainty on the parameters (Section 2.5.2). Thus, the QEKF bandwidth is progressively decreased during the acquisition time and is finally “frozen” in the steady state. Another important feature is that, assuming a successful acquisition, the QEKF provides a recursive low-cost implementation of the minimum variance unbiased estimator when the observation time increases indefinitely and the parameters remain stationary.

The main criticism of the EKF/QEKF tracker is that the acquisition cannot be guaranteed. Effectively, even in the noiseless case, the linearized model assumed in the EKF/QEKF formulation is not correct when the tracker operates out of the small-error regime, e.g., during the acquisition. To overcome this drawback, the Unscented Kalman Filter (UKF) is proposed in [Jul97][Wan00]. The UKF applies the actual nonlinear observation model to correctly propagate the mean as well as the covariance of the Gaussian parameter. The important point is that the convergence of the UKF is guaranteed under some mild conditions [Mer00, Sec. 5]. Implicitly, the UKF is still assuming Gaussian parameters. For other statistical distributions, sequential Monte Carlo estimators (also named particle filters) can be applied [Mer00][Mer01]. The complexity of these methods is usually much greater than that of the well-known EKF. Anyway, the UKF and other particle filters were not considered in this thesis because they are actually higher-order techniques in which the observed samples are plugged into nonlinear posterior distributions to approximate the MMSE estimator, i.e., $\widehat{\boldsymbol{\theta}} = E_\theta E\{\boldsymbol{\theta}/\mathbf{y}\}$. A tutorial article on the UKF and related sequential Monte Carlo methods is provided in [Mer03].

Finally, the QEKF is deduced and evaluated in the context of DOA estimation and tracking.
The Gaussian assumption on the nuisance parameters is tested once more, showing a significant improvement in terms of acquisition time as well as steady-state variance when the received signals are digitally modulated and this information is correctly exploited. The results in this section were presented in the 3rd IEEE Sensor Array and Multichannel Signal Processing Workshop, held in Barcelona in 2004 [Vil04a]:

• “On the Quadratic Extended Kalman Filter”, J. Villares, G. Vázquez. Proc. of the Third IEEE Sensor Array and Multichannel Signal Processing Workshop (SAM 2004). July 2004. Sitges, Barcelona (Spain).

5.1 Signal Model

Let us consider a time-variant scenario in which the observed vector at time $n$ is given by

$$\mathbf{y}_n = \mathbf{A}(\boldsymbol{\theta}_n)\,\mathbf{x}_n + \mathbf{w}_n, \qquad n = 1, 2, 3, \ldots \tag{5.1}$$

where the transfer matrix $\mathbf{A}(\boldsymbol{\theta}_n)$ is known except for a vector of $P$ real-valued parameters $\boldsymbol{\theta}_n$, $\mathbf{x}_n$ is the vector of $K$ unknown zero-mean inputs, and $\mathbf{w}_n$ is the vector of Gaussian noise samples. The covariance matrices of $\mathbf{x}_n$ and $\mathbf{w}_n$ are given by

$$E\{\mathbf{x}_n \mathbf{x}_{n+k}^H\} = \mathbf{I}_K\,\delta(k), \qquad E\{\mathbf{w}_n \mathbf{w}_{n+k}^H\} = \mathbf{R}_w\,\delta(k),$$

respectively. Therefore, we are assuming that the noise and the nuisance parameters are uncorrelated in the time domain.

In order to track the parameter evolution in time, the estimator is provided with the following dynamical model or state equation [And79][Kay93b, Ch. 13]:

$$\boldsymbol{\theta}_n = \mathbf{f}(\boldsymbol{\theta}_{n-1}) + \mathbf{u}_n \tag{5.2}$$

where $\mathbf{u}_n$ is a zero-mean random vector of known covariance matrix $\mathbf{R}_u \triangleq E\{\mathbf{u}_n \mathbf{u}_n^H\}$ modeling the uncertainty about the assumed model. The initial state $\boldsymbol{\theta}_0$ is also a random variable of known mean $\boldsymbol{\mu}_0 \triangleq E\{\boldsymbol{\theta}_0\}$ and covariance matrix $\boldsymbol{\Sigma}_{0/0} \triangleq E\{\boldsymbol{\theta}_0 \boldsymbol{\theta}_0^H\}$. These two quantities summarize all the available prior information about the parameter $\boldsymbol{\theta}_n$.

It is important to note that consistent estimates cannot be obtained using linear schemes because the observation $\mathbf{y}_n$ is zero-mean.
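Thus, blind estimation imposes the need for second-order techniques, which are known to be optimal for low SNR and/or Gaussian data (Section 2.4.1 and Section 2.4.3). Accordingly, the following quadratic measurement equation is considered:

$$\mathbf{r}_n \triangleq \mathrm{vec}\left(\mathbf{y}_n \mathbf{y}_n^H\right) = \mathbf{h}(\boldsymbol{\theta}_n) + \mathbf{v}_n(\boldsymbol{\theta}_n) \tag{5.3}$$

where

$$\mathbf{h}(\boldsymbol{\theta}_n) \triangleq E\{\mathbf{r}_n\} = \mathrm{vec}\left(\mathbf{A}(\boldsymbol{\theta}_n)\mathbf{A}^H(\boldsymbol{\theta}_n) + \mathbf{R}_w\right)$$

$$\mathbf{v}_n(\boldsymbol{\theta}_n) \triangleq \mathbf{r}_n - E\{\mathbf{r}_n\} = \mathrm{vec}\left(\mathbf{A}(\boldsymbol{\theta}_n)\left(\mathbf{x}_n\mathbf{x}_n^H - \mathbf{I}_K\right)\mathbf{A}^H(\boldsymbol{\theta}_n) + \mathbf{A}(\boldsymbol{\theta}_n)\mathbf{x}_n\mathbf{w}_n^H + \mathbf{w}_n\mathbf{x}_n^H\mathbf{A}^H(\boldsymbol{\theta}_n) + \mathbf{w}_n\mathbf{w}_n^H - \mathbf{R}_w\right)$$

are the signal and noise components of the measurement equation, respectively. Notice that the observation noise $\mathbf{v}_n(\boldsymbol{\theta}_n)$ is zero-mean and that, in the considered quadratic model, it depends on the wanted parameters.

5.2 Background and Notation

Following the classical notation in [And79], $\hat{\mathbf{a}}_{n/m}$ will denote the linear MMSE estimate of a given random vector $\mathbf{a}_n$ based on the quadratic observations $\mathbf{r}_1, \ldots, \mathbf{r}_m$. This means that $\hat{\mathbf{a}}_{n/m}$ is an affine transformation of the sample covariance vectors $\mathbf{r}_1, \ldots, \mathbf{r}_m$ in (5.3) or, equivalently, a quadratic transformation of the input data $\mathbf{y}_1, \ldots, \mathbf{y}_m$ in (5.1). It is well known that the MMSE estimator $E\{\mathbf{a}_n/\mathbf{r}_1, \ldots, \mathbf{r}_m\}$ is linear in $\mathbf{r}_1, \ldots, \mathbf{r}_m$ if and only if $\mathbf{a}_n$ and $\mathbf{r}_1, \ldots, \mathbf{r}_m$ are jointly Gaussian distributed. However, the Gaussian assumption is not satisfied most of the time. In that case, it is convenient to introduce the notation

$$\hat{\mathbf{a}}_{n/m} = E_L\{\mathbf{a}_n/\mathbf{r}_1, \ldots, \mathbf{r}_m\}$$

to refer to the linear MMSE estimator $\hat{\mathbf{a}}_{n/m}$, bearing in mind that $E_L\{\mathbf{a}_n/\mathbf{r}_1, \ldots, \mathbf{r}_m\} = E\{\mathbf{a}_n/\mathbf{r}_1, \ldots, \mathbf{r}_m\}$ in the Gaussian case [And79, Sec. 5.2].

The Kalman filter can be seen as a sequential implementation of the linear MMSE estimator of $\boldsymbol{\theta}_n$ that, using the notation above, is given by

$$\hat{\boldsymbol{\theta}}_{n/n} = E_L\{\boldsymbol{\theta}_n/\mathbf{r}_1, \ldots, \mathbf{r}_n\}.$$

From a complexity point of view, the sequential computation of $\hat{\boldsymbol{\theta}}_{n/n}$ is unavoidable as the number of observations grows ($n \to \infty$). The Kalman filter recursion is based on two facts:

• The orthogonalization (decorrelation) of the original observations $\mathbf{r}_1, \ldots, \mathbf{r}_n$ using the Gram-Schmidt method (see footnote 1). The transformed observations are the so-called innovations $\tilde{\mathbf{r}}_1, \ldots, \tilde{\mathbf{r}}_n$ [And79][Kay93b][Hay91], which are computed as

$$\tilde{\mathbf{r}}_n \triangleq \mathbf{r}_n - \hat{\mathbf{r}}_{n/n-1} \tag{5.4}$$

with $\hat{\mathbf{r}}_{n/n-1} = E_L\{\mathbf{r}_n/\mathbf{r}_1, \ldots, \mathbf{r}_{n-1}\}$ the linear MMSE prediction of $\mathbf{r}_n$ based on the past observations $\mathbf{r}_1, \ldots, \mathbf{r}_{n-1}$. Thus, the innovation $\tilde{\mathbf{r}}_n$ supplies the new information contained in the observation $\mathbf{r}_n$ or, in other words, it yields the unpredictable component of $\mathbf{r}_n$. It can be shown that the innovation $\tilde{\mathbf{r}}_n$ is zero-mean and that it is uncorrelated with the past observations $\mathbf{r}_m$ ($m < n$) and with the other innovations $\tilde{\mathbf{r}}_m$ ($m \neq n$). Using this property, it is easy to show that

$$\begin{aligned}
\hat{\boldsymbol{\theta}}_{n/n} &= E_L\{\boldsymbol{\theta}_n/\mathbf{r}_1, \ldots, \mathbf{r}_n\} = E_L\{\boldsymbol{\theta}_n/\tilde{\mathbf{r}}_1, \ldots, \tilde{\mathbf{r}}_n\} = \sum_{k=1}^{n} E_L\{\boldsymbol{\theta}_n/\tilde{\mathbf{r}}_k\} \\
&= E_L\{\boldsymbol{\theta}_n/\mathbf{r}_1, \ldots, \mathbf{r}_{n-1}, \tilde{\mathbf{r}}_n\} = \hat{\boldsymbol{\theta}}_{n/n-1} + E_L\{\boldsymbol{\theta}_n/\tilde{\mathbf{r}}_n\} \\
&= \hat{\boldsymbol{\theta}}_{n/n-1} + E_L\{\tilde{\boldsymbol{\theta}}_n/\tilde{\mathbf{r}}_n\},
\end{aligned} \tag{5.5}$$

where

$$\hat{\boldsymbol{\theta}}_{n/n-1} \triangleq E_L\{\boldsymbol{\theta}_n/\mathbf{r}_1, \ldots, \mathbf{r}_{n-1}\}, \qquad \tilde{\boldsymbol{\theta}}_n \triangleq \boldsymbol{\theta}_n - \hat{\boldsymbol{\theta}}_{n/n-1} \tag{5.6}$$

are the linear MMSE prediction of $\boldsymbol{\theta}_n$, based on the past observations $\mathbf{r}_1, \ldots, \mathbf{r}_{n-1}$, and the resulting prediction error, respectively. It can be shown that the prediction error $\tilde{\boldsymbol{\theta}}_n$ is zero mean and that the prediction $\hat{\boldsymbol{\theta}}_{n/n-1}$ is uncorrelated with the innovation $\tilde{\mathbf{r}}_n$. In fact, this property has been applied to obtain the final expression in (5.5), considering that $E_L\{\hat{\boldsymbol{\theta}}_{n/n-1}/\tilde{\mathbf{r}}_n\} = \mathbf{0}$.

• The existence of a linear state equation (5.2) as well as a linear measurement equation (5.3). When this is possible, $\hat{\boldsymbol{\theta}}_{n/n}$ in (5.5) can be obtained from $\hat{\boldsymbol{\theta}}_{n-1/n-1}$ (i.e., the previous estimate) and $\mathbf{r}_n$ (i.e., the new datum) bearing in mind that

$$E_L\{\mathbf{M}\mathbf{a}_n/\mathbf{r}_1, \ldots, \mathbf{r}_n\} = \mathbf{M}\,E_L\{\mathbf{a}_n/\mathbf{r}_1, \ldots, \mathbf{r}_n\}. \tag{5.7}$$

Unfortunately, the state and measurement equations in (5.2)-(5.3) are generally nonlinear in the parameters of interest. Consequently, these two equations have to be linearized in order to apply the Kalman filter formulation. This matter is addressed in the next section.

Footnote 1: Although $\mathbf{x}_n$ and $\mathbf{w}_n$ are uncorrelated in Sec. 5.1, the observations $\mathbf{r}_1, \ldots, \mathbf{r}_n$ are correlated because they depend on the random parameters $\boldsymbol{\theta}_1, \ldots, \boldsymbol{\theta}_n$, which are correlated in the assumed dynamical model (5.2).

5.3 Linearized Signal Model

In order to have linear state and measurement equations, the original nonlinear equations (5.2)-(5.3) are expanded in a first-order Taylor series at the points $\boldsymbol{\theta}_{n-1} = \hat{\boldsymbol{\theta}}_{n-1/n-1}$ and $\boldsymbol{\theta}_n = \hat{\boldsymbol{\theta}}_{n/n-1}$, respectively. These points are selected because $\hat{\boldsymbol{\theta}}_{n-1/n-1}$ and $\hat{\boldsymbol{\theta}}_{n/n-1}$ are the linear MMSE estimates of $\boldsymbol{\theta}_{n-1}$ and $\boldsymbol{\theta}_n$ before the new datum $\mathbf{r}_n$ is processed. By definition, they are given by

$$\hat{\boldsymbol{\theta}}_{n-1/n-1} = E_L\{\boldsymbol{\theta}_{n-1}/\mathbf{r}_1, \ldots, \mathbf{r}_{n-1}\}, \qquad \hat{\boldsymbol{\theta}}_{n/n-1} = E_L\{\boldsymbol{\theta}_n/\mathbf{r}_1, \ldots, \mathbf{r}_{n-1}\}.$$

Thus, assuming that $\hat{\boldsymbol{\theta}}_{n-1/n-1}$ and $\hat{\boldsymbol{\theta}}_{n/n-1}$ are previously computed at time $n-1$, the QEKF will be derived from the linearized state and quadratic measurement equations given next:

$$\boldsymbol{\theta}_n \approx \mathbf{f}(\hat{\boldsymbol{\theta}}_{n-1/n-1}) + \mathbf{F}(\hat{\boldsymbol{\theta}}_{n-1/n-1})\left(\boldsymbol{\theta}_{n-1} - \hat{\boldsymbol{\theta}}_{n-1/n-1}\right) + \mathbf{u}_n \tag{5.8}$$

$$\mathbf{r}_n \approx \mathbf{h}(\hat{\boldsymbol{\theta}}_{n/n-1}) + \mathbf{H}_n(\hat{\boldsymbol{\theta}}_{n/n-1})\left(\boldsymbol{\theta}_n - \hat{\boldsymbol{\theta}}_{n/n-1}\right) + \mathbf{v}_n(\hat{\boldsymbol{\theta}}_{n/n-1}) \tag{5.9}$$

where $\mathbf{F}(\boldsymbol{\theta}_{n-1})$ and $\mathbf{H}_n(\boldsymbol{\theta}_n)$ are the Jacobians of $\boldsymbol{\theta}_n$ and $\mathbf{r}_n$, respectively, given by

$$\mathbf{F}(\boldsymbol{\theta}_{n-1}) \triangleq \frac{\partial \boldsymbol{\theta}_n}{\partial \boldsymbol{\theta}_{n-1}^T} = \frac{\partial \mathbf{f}(\boldsymbol{\theta}_{n-1})}{\partial \boldsymbol{\theta}_{n-1}^T}, \qquad \mathbf{H}_n(\boldsymbol{\theta}_n) \triangleq \frac{\partial \mathbf{r}_n}{\partial \boldsymbol{\theta}_n^T} = \frac{\partial \mathbf{h}(\boldsymbol{\theta}_n)}{\partial \boldsymbol{\theta}_n^T} + \frac{\partial \mathbf{v}_n(\boldsymbol{\theta}_n)}{\partial \boldsymbol{\theta}_n^T}.$$

From the linearized state equation (5.8), the prediction $\hat{\boldsymbol{\theta}}_{n/n-1}$ in (5.9) can be computed as

$$\hat{\boldsymbol{\theta}}_{n/n-1} = \mathbf{f}(\hat{\boldsymbol{\theta}}_{n-1/n-1}), \tag{5.10}$$

using (5.7) and taking into account that the noise $\mathbf{u}_n$ is zero mean. On the other hand, the Jacobian $\mathbf{H}_n(\boldsymbol{\theta}_n)$ is calculated from (5.3), obtaining

$$[\mathbf{H}_n(\boldsymbol{\theta})]_p = \mathrm{vec}\left(\frac{\partial \mathbf{A}(\boldsymbol{\theta})}{\partial \theta_p}\mathbf{x}_n\mathbf{x}_n^H\mathbf{A}^H(\boldsymbol{\theta}) + \mathbf{A}(\boldsymbol{\theta})\mathbf{x}_n\mathbf{x}_n^H\frac{\partial \mathbf{A}^H(\boldsymbol{\theta})}{\partial \theta_p} + \frac{\partial \mathbf{A}(\boldsymbol{\theta})}{\partial \theta_p}\mathbf{x}_n\mathbf{w}_n^H + \mathbf{w}_n\mathbf{x}_n^H\frac{\partial \mathbf{A}^H(\boldsymbol{\theta})}{\partial \theta_p}\right)$$

where $[\mathbf{H}_n(\boldsymbol{\theta})]_p$ is the $p$-th column of $\mathbf{H}_n(\boldsymbol{\theta})$ and $\theta_p$ stands for the $p$-th component of $\boldsymbol{\theta}$. Note that the transfer matrix $\mathbf{H}_n(\boldsymbol{\theta})$ appearing in (5.9) is noisy because it depends on the random terms $\mathbf{x}_n$ and $\mathbf{w}_n$. This particularity is a consequence of the original quadratic observation model (5.3).

5.4 Quadratic Extended Kalman Filter (QEKF)

In this section, the Kalman filter is derived from the quadratic and linearized model introduced in the last two sections.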
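As a numerical aside on the quadratic measurement model (5.3), the following Python sketch compares the sample mean of the observations with the sample mean of their vectorized outer products. It is only an illustration under stated assumptions, not part of the original derivation: a single user, the uniform linear array response used later in Section 5.5, white noise with covariance `sigma2 * I`, and QPSK symbols. All function names are hypothetical.

```python
import cmath
import math
import random


def steering(theta, m):
    """Single-user ULA response a(theta) = exp(j*pi*d*theta), d = 0..m-1 (assumed model)."""
    return [cmath.exp(1j * math.pi * d * theta) for d in range(m)]


def vec_outer(y):
    """r = vec(y y^H), column-major stacking: entry (i, j) sits at index j*m + i."""
    m = len(y)
    return [y[i] * y[j].conjugate() for j in range(m) for i in range(m)]


def h_model(theta, m, sigma2):
    """h(theta) = vec(a a^H + sigma2 * I): the mean of r_n in (5.3) for white noise."""
    h = vec_outer(steering(theta, m))
    for i in range(m):
        h[i * m + i] += sigma2
    return h


def snapshot(theta, m, sigma2, rng):
    """One observation y_n = a(theta) x_n + w_n with a unit-power QPSK symbol x_n."""
    x = cmath.exp(2j * math.pi * rng.randrange(4) / 4)
    s = math.sqrt(sigma2 / 2)  # per-component std of the circular complex noise
    return [a * x + complex(rng.gauss(0, s), rng.gauss(0, s))
            for a in steering(theta, m)]


def sample_means(theta, m, sigma2, n_snap, seed=0):
    """Return the sample means of y_n and of r_n = vec(y_n y_n^H)."""
    rng = random.Random(seed)
    y_mean = [0j] * m
    r_mean = [0j] * (m * m)
    for _ in range(n_snap):
        y = snapshot(theta, m, sigma2, rng)
        r = vec_outer(y)
        y_mean = [acc + v / n_snap for acc, v in zip(y_mean, y)]
        r_mean = [acc + v / n_snap for acc, v in zip(r_mean, r)]
    return y_mean, r_mean


if __name__ == "__main__":
    theta, m, sigma2 = 0.2, 3, 0.1
    y_mean, r_mean = sample_means(theta, m, sigma2, n_snap=5000)
    h = h_model(theta, m, sigma2)
    print("max |mean(y)|     :", max(abs(v) for v in y_mean))
    print("max |mean(r) - h| :", max(abs(a - b) for a, b in zip(r_mean, h)))
```

Both printed values should be small: the mean of the observations carries no information on the parameter, whereas the mean of the quadratic observation approaches h(θ), which is why the tracker operates on the vectorized sample covariance rather than on the raw samples.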
The resulting tracker is named the Quadratic Extended Kalman Filter (QEKF) because it corresponds to the so-called Extended Kalman Filter (EKF) [And79][Kay93b, Sec. 13.7] in the case of quadratic observations (5.3). The QEKF is thus obtained from (5.5) after solving the second term as indicated now:

$$E_L\{\tilde{\boldsymbol{\theta}}_n/\tilde{\mathbf{r}}_n\} = \mathbf{M}_n^H\,\tilde{\mathbf{r}}_n \tag{5.11}$$

where $\mathbf{M}_n$ is the so-called Kalman gain matrix,

$$\mathbf{M}_n \triangleq \mathbf{Q}_n^{-1}\mathbf{S}_n \tag{5.12}$$

$$\mathbf{S}_n \triangleq E\{\tilde{\mathbf{r}}_n\tilde{\boldsymbol{\theta}}_n^H\} \tag{5.13}$$

$$\mathbf{Q}_n \triangleq E\{\tilde{\mathbf{r}}_n\tilde{\mathbf{r}}_n^H\}, \tag{5.14}$$

and $E\{\cdot\}$ stands for the expectation with respect to all the random terms inside the brackets, namely $\boldsymbol{\theta}_0, \ldots, \boldsymbol{\theta}_n$ and $\mathbf{r}_1, \ldots, \mathbf{r}_n$. The Kalman gain matrix (5.12) has been derived using the following well-known result [Kay93b, Eq. 12.6]:

$$E_L\{\mathbf{x}/\mathbf{y}\} = E\{\mathbf{x}\} + E\{\mathbf{x}\mathbf{y}^H\}\,E^{-1}\{\mathbf{y}\mathbf{y}^H\}\,(\mathbf{y} - E\{\mathbf{y}\})$$

particularized for the zero-mean random vectors $\tilde{\boldsymbol{\theta}}_n$ and $\tilde{\mathbf{r}}_n$ introduced in (5.6) and (5.4), respectively. This abbreviated deduction of the extended Kalman filter [Kay93b, App. 13.B] is based on the following two important equations:

$$E\{\tilde{\mathbf{r}}_n\} = E_{\mathbf{r}_1, \ldots, \mathbf{r}_{n-1}} E\{\tilde{\mathbf{r}}_n/\mathbf{r}_1, \ldots, \mathbf{r}_{n-1}\} = \mathbf{0}$$

$$E\{\tilde{\boldsymbol{\theta}}_n\} = E_{\mathbf{r}_1, \ldots, \mathbf{r}_{n-1}} E\{\tilde{\boldsymbol{\theta}}_n/\mathbf{r}_1, \ldots, \mathbf{r}_{n-1}\} = \mathbf{0}$$

since $E\{\tilde{\mathbf{r}}_n/\mathbf{r}_1, \ldots, \mathbf{r}_{n-1}\}$ and $E\{\tilde{\boldsymbol{\theta}}_n/\mathbf{r}_1, \ldots, \mathbf{r}_{n-1}\}$ are strictly zero in view of their definitions in (5.4) and (5.6), respectively. Therefore, plugging (5.10) and (5.11) into (5.5), we obtain the QEKF recursion:

$$\hat{\boldsymbol{\theta}}_{n/n} = \mathbf{f}(\hat{\boldsymbol{\theta}}_{n-1/n-1}) + \mathbf{S}_n^H\mathbf{Q}_n^{-1}\left(\mathbf{r}_n - \hat{\mathbf{r}}_{n/n-1}\right) \tag{5.15}$$

where

$$\hat{\mathbf{r}}_{n/n-1} = \mathbf{h}(\hat{\boldsymbol{\theta}}_{n/n-1}) = \mathbf{h}\left(\mathbf{f}(\hat{\boldsymbol{\theta}}_{n-1/n-1})\right)$$

is obtained from (5.9) and (5.10).

5.4.1 Another QEKF derivation

Thus far, the classical formulation of the QEKF has been sketched, introducing some simplifications. In this section, a simpler derivation of the QEKF is proposed based on the general formulation in Section 3.2. In fact, the solution in (5.15) is obtained by considering a generic second-order estimator,

$$\hat{\boldsymbol{\theta}}_{n/n} = \mathbf{b}_n + \mathbf{M}_n^H\mathbf{r}_n,$$

and solving the following optimization problem:

$$\mathbf{b}_n, \mathbf{M}_n = \arg\min_{\mathbf{b}, \mathbf{M}} E\left\{\left\|\mathbf{b} + \mathbf{M}^H\mathbf{r}_n - \boldsymbol{\theta}_n\right\|^2 / \mathbf{r}_1, \ldots, \mathbf{r}_{n-1}\right\}$$

where $\mathbf{b}_n$ and $\mathbf{M}_n$ are the independent and quadratic components of the second-order MMSE estimator of $\boldsymbol{\theta}_n$, respectively. In (3.16), it was obtained that the optimal independent term is $\mathbf{b}_n = \hat{\boldsymbol{\theta}}_{n/n-1} - \mathbf{M}_n^H\hat{\mathbf{r}}_{n/n-1}$. On the other hand, the optimal quadratic term $\mathbf{M}_n$ was derived in (3.25), obtaining precisely the Kalman gain matrix in (5.12).

The conditional expectation in the last equation suggests that the random parameters are averaged by means of the prior distribution $f_{\boldsymbol{\theta}_n/\mathbf{r}_1, \ldots, \mathbf{r}_{n-1}}(\boldsymbol{\theta}_n/\mathbf{r}_1, \ldots, \mathbf{r}_{n-1})$, which contains all the existing knowledge on $\boldsymbol{\theta}_n$ before processing $\mathbf{r}_n$. In that way, the QEKF provides a means of updating the prior distribution every time a new datum is incorporated. If the QEKF converges to the true parameter, the sequence of priors $f_{\boldsymbol{\theta}_n/\mathbf{r}_1, \ldots, \mathbf{r}_{n-1}}(\boldsymbol{\theta}_n/\mathbf{r}_1, \ldots, \mathbf{r}_{n-1})$ becomes progressively more informative until the small-error regime is attained (Chapter 4). Moreover, if the Gaussian assumption applies, the prior updating is optimal, thus minimizing the acquisition time. Indeed, this was the motivation for considering the Kalman filter formulation in this thesis: the QEKF provides the transition from the MMSE large-error solution in Chapter 3 to the small-error BQUE solution in Chapter 4.

An evident connection is observed between (5.15) and the expression obtained for the optimal second-order discriminator in (4.12). However, there are some important differences. First of all, the so-called Kalman gain matrix $\mathbf{M}_n$ appearing in (5.15) includes both the discriminator and the loop filter of a classical closed-loop implementation. Moreover, $\mathbf{M}_n$ is time-varying and, therefore, the QEKF is able to adjust online the overall tracker response in view of the instantaneous uncertainty about the parameters.
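It can be shown that the QEKF and the closed-loop implementation in Chapter 4 become equivalent in the steady state if they are arranged to have the same (noise equivalent) loop bandwidth. Formally, it is verified that

$$\lim_{n \to \infty} \mathbf{M}_n = \mathrm{diag}(\boldsymbol{\mu})\,\mathbf{M}$$

where $\mathbf{M}$ is the optimal second-order discriminator obtained in (4.12) and the vector of step sizes $\boldsymbol{\mu}$ is determined by the state equation noise covariance matrix $\mathbf{R}_u \triangleq E\{\mathbf{u}_n\mathbf{u}_n^H\}$ in (5.2). The proof of this important statement would require properly solving the steady-state Riccati equation [And79, Ch. 4], and it suggests an in-depth study that is still incomplete.

5.4.2 Kalman gains recursion

The linearized model in Section 5.3 allows obtaining $\mathbf{M}_n = \mathbf{Q}_n^{-1}\mathbf{S}_n$ recursively. In this section, the QEKF deduction is completed by making this recursion explicit. Let us study first the cross-correlation matrix $\mathbf{S}_n$ (5.13). It is easy to prove that

$$\mathbf{S}_n = \mathbf{D}_r(\hat{\boldsymbol{\theta}}_{n/n-1})\,\boldsymbol{\Sigma}_{n/n-1} = \mathbf{D}_r\left(\mathbf{f}(\hat{\boldsymbol{\theta}}_{n-1/n-1})\right)\boldsymbol{\Sigma}_{n/n-1}$$

where

$$\boldsymbol{\Sigma}_{n/n-1} \triangleq E\{\tilde{\boldsymbol{\theta}}_n\tilde{\boldsymbol{\theta}}_n^H/\mathbf{r}_1, \ldots, \mathbf{r}_{n-1}\} = \mathbf{F}(\hat{\boldsymbol{\theta}}_{n-1/n-1})\,\boldsymbol{\Sigma}_{n-1/n-1}\,\mathbf{F}^H(\hat{\boldsymbol{\theta}}_{n-1/n-1}) + \mathbf{R}_u \tag{5.16}$$

is the covariance matrix of the prediction error $\tilde{\boldsymbol{\theta}}_n$ expressed as a function of the estimation MSE matrix (see footnote 2) at time $n-1$:

$$\boldsymbol{\Sigma}_{n-1/n-1} = E\left\{\left(\boldsymbol{\theta}_{n-1} - \hat{\boldsymbol{\theta}}_{n-1/n-1}\right)\left(\boldsymbol{\theta}_{n-1} - \hat{\boldsymbol{\theta}}_{n-1/n-1}\right)^H / \mathbf{r}_1, \ldots, \mathbf{r}_{n-1}\right\}.$$

The linear relationship between $\boldsymbol{\Sigma}_{n/n-1}$ and $\boldsymbol{\Sigma}_{n-1/n-1}$ is a consequence of the linearized state equation (5.8). On the other hand, $\mathbf{D}_r(\boldsymbol{\theta}) = E\{\mathbf{H}_n(\boldsymbol{\theta})\}$ was introduced in Chapter 4 as the matrix collecting the covariance matrix derivatives, i.e.,

$$[\mathbf{D}_r(\boldsymbol{\theta})]_p = \mathrm{vec}\left(\frac{\partial \mathbf{A}(\boldsymbol{\theta})}{\partial \theta_p}\mathbf{A}^H(\boldsymbol{\theta}) + \mathbf{A}(\boldsymbol{\theta})\frac{\partial \mathbf{A}^H(\boldsymbol{\theta})}{\partial \theta_p}\right).$$

Footnote 2: Due to the original nonlinear signal model, $\boldsymbol{\Sigma}_{n-1/n-1}$ is not the tracker covariance matrix. However, following the original nomenclature in the Kalman filter theory, $\boldsymbol{\Sigma}_{n-1/n-1}$ will be referred to as the MSE matrix.

Likewise, the innovations covariance matrix $\mathbf{Q}_{n/n-1}$ can also be computed from the last estimate $\hat{\boldsymbol{\theta}}_{n-1/n-1}$ and the associated MSE matrix $\boldsymbol{\Sigma}_{n-1/n-1}$. In the studied quadratic observation model, the deduction of $\mathbf{Q}_{n/n-1}$ is slightly more involved because $\mathbf{H}_n(\boldsymbol{\theta})$ is random (noisy) and the measurement noise $\mathbf{v}_n(\hat{\boldsymbol{\theta}}_{n/n-1})$ depends on the parameterization. Omitting the dependence on $\hat{\boldsymbol{\theta}}_{n/n-1} = \mathbf{f}(\hat{\boldsymbol{\theta}}_{n-1/n-1})$ for the sake of clarity, it follows that

$$\mathbf{Q}_{n/n-1} \triangleq E\{\tilde{\mathbf{r}}_n\tilde{\mathbf{r}}_n^H/\mathbf{r}_1, \ldots, \mathbf{r}_{n-1}\} = E\{\mathbf{H}_n\boldsymbol{\Sigma}_{n/n-1}\mathbf{H}_n^H\} + E\{\mathbf{v}_n\mathbf{v}_n^H\}.$$

Regarding now the second term, $E\{\mathbf{v}_n\mathbf{v}_n^H\}$ is the measurement noise covariance (Section 5.1). It is easy to realize that $E\{\mathbf{v}_n\mathbf{v}_n^H\}$ is the fourth-order matrix $\mathbf{Q}(\boldsymbol{\theta})$ introduced in (3.9) for $\boldsymbol{\theta} = \hat{\boldsymbol{\theta}}_{n/n-1}$. In that chapter, a closed-form expression was deduced for $\mathbf{Q}(\boldsymbol{\theta})$ in equation (3.10), obtaining

$$\mathbf{Q}(\boldsymbol{\theta}) = \mathbf{R}^*(\boldsymbol{\theta}) \otimes \mathbf{R}(\boldsymbol{\theta}) + \boldsymbol{\mathcal{A}}(\boldsymbol{\theta})\,\mathbf{K}\,\boldsymbol{\mathcal{A}}^H(\boldsymbol{\theta}) \tag{5.17}$$

where $\mathbf{R}(\boldsymbol{\theta}) = \mathbf{A}(\boldsymbol{\theta})\mathbf{A}^H(\boldsymbol{\theta}) + \mathbf{R}_w$, $\boldsymbol{\mathcal{A}}(\boldsymbol{\theta}) = \mathbf{A}^*(\boldsymbol{\theta}) \otimes \mathbf{A}(\boldsymbol{\theta})$ and $\mathbf{K}$ is the so-called kurtosis matrix, which supplies all the non-Gaussian information about the nuisance parameters $\mathbf{x}_n$. Once more, $\mathbf{K}$ plays a prominent role in this chapter in the case of non-Gaussian nuisance parameters.

Regarding now the first term of $\mathbf{Q}_{n/n-1}$, it follows that

$$E\{\mathbf{H}_n\boldsymbol{\Sigma}_{n/n-1}\mathbf{H}_n^H\} = \left.\sum_{p,q=1}^{P}\left[\boldsymbol{\Sigma}_{n/n-1}\right]_{p,q}\,\boldsymbol{\mathcal{H}}_{p,q}(\boldsymbol{\theta})\right|_{\boldsymbol{\theta} = \hat{\boldsymbol{\theta}}_{n/n-1}}$$

where, after some tedious manipulations and writing $\mathbf{A}_p \triangleq \partial\mathbf{A}(\boldsymbol{\theta})/\partial\theta_p$ for the derivatives that define $[\mathbf{D}_r(\boldsymbol{\theta})]_p$,

$$\begin{aligned}
\boldsymbol{\mathcal{H}}_{p,q}(\boldsymbol{\theta}) &\triangleq E\left\{[\mathbf{H}_n(\boldsymbol{\theta})]_p[\mathbf{H}_n(\boldsymbol{\theta})]_q^H\right\} \\
&= \left(\mathbf{A}^* \otimes \mathbf{A}_p + \mathbf{A}_p^* \otimes \mathbf{A}\right)\boldsymbol{\mathcal{K}}\left(\mathbf{A}^* \otimes \mathbf{A}_q + \mathbf{A}_q^* \otimes \mathbf{A}\right)^H \\
&\quad + \mathbf{R}_w^* \otimes \left(\mathbf{A}_p\mathbf{A}_q^H\right) + \left(\mathbf{A}_p\mathbf{A}_q^H\right)^* \otimes \mathbf{R}_w
\end{aligned} \tag{5.18}$$

with

$$\boldsymbol{\mathcal{K}} \triangleq E\left\{\mathrm{vec}\left(\mathbf{x}_n\mathbf{x}_n^H\right)\mathrm{vec}^H\left(\mathbf{x}_n\mathbf{x}_n^H\right)\right\} = \mathbf{I}_{K^2} + \mathrm{vec}(\mathbf{I}_K)\,\mathrm{vec}^H(\mathbf{I}_K) + \mathbf{K},$$

$\mathbf{K}$ being the nuisance parameters kurtosis matrix and $\mathbf{A} = \mathbf{A}(\boldsymbol{\theta})$.

Therefore, it is found that the Kalman gains $\mathbf{M}_n$ can be computed from the previous estimate $\hat{\boldsymbol{\theta}}_{n-1/n-1}$ and the associated MSE matrix $\boldsymbol{\Sigma}_{n-1/n-1}$. In order to apply this recursion to $\mathbf{M}_{n+1}$ at the next time instant, it is necessary to evaluate the estimation MSE matrix at time $n$, which is given by

$$\boldsymbol{\Sigma}_{n/n} \triangleq E\left\{\left(\boldsymbol{\theta}_n - \hat{\boldsymbol{\theta}}_{n/n}\right)\left(\boldsymbol{\theta}_n - \hat{\boldsymbol{\theta}}_{n/n}\right)^H / \mathbf{r}_1, \ldots, \mathbf{r}_n\right\} = \boldsymbol{\Sigma}_{n/n-1} - \mathbf{M}_n^H\mathbf{S}_n$$

considering the QEKF solution in (5.15).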
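To give a feeling for the size of the kurtosis term, the following Python sketch evaluates its scalar version for a few constellations. It is an illustration under stated assumptions rather than a result taken from the thesis: a single input per snapshot, equiprobable symbols, and a circular constellation normalized to unit power. Under these assumptions the fourth-order moment identity accompanying (5.18) reduces to E|x|^4 = 1 + 1 + k, so that k = E|x|^4 - 2. The function names are hypothetical.

```python
import cmath
import math


def mpsk(order):
    """Unit-modulus M-PSK constellation points."""
    return [cmath.exp(2j * math.pi * k / order) for k in range(order)]


def qam16():
    """Square 16-QAM constellation (unnormalized levels -3, -1, 1, 3)."""
    levels = (-3, -1, 1, 3)
    return [complex(i, q) for i in levels for q in levels]


def scalar_kurtosis(points):
    """k = E|x|^4 / (E|x|^2)^2 - 2 for equiprobable symbols (power-normalized)."""
    p2 = sum(abs(x) ** 2 for x in points) / len(points)
    p4 = sum(abs(x) ** 4 for x in points) / len(points)
    return p4 / p2 ** 2 - 2.0


if __name__ == "__main__":
    print("QPSK   :", scalar_kurtosis(mpsk(4)))
    print("8-PSK  :", scalar_kurtosis(mpsk(8)))
    print("16-QAM :", scalar_kurtosis(qam16()))
```

Constant-modulus constellations give k = -1, the square 16-QAM constellation gives k = -0.68, and a unit-power circular Gaussian input (for which E|x|^4 = 2) would give k = 0, which corresponds to dropping the kurtosis term altogether.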
5.4.3 QEKF programming

In this section, the most important equations in the QEKF deduction are listed in order to facilitate its implementation on a hardware or software platform. Thus, assuming that $\hat{\boldsymbol{\theta}}_{n-1/n-1}$ and $\boldsymbol{\Sigma}_{n-1/n-1}$ were computed in the previous iteration, the following operations must be carried out when the new sample $\mathbf{y}_n$ is received:

1. Prediction:

$$\hat{\boldsymbol{\theta}}_{n/n-1} = \mathbf{f}(\hat{\boldsymbol{\theta}}_{n-1/n-1})$$

$$\hat{\mathbf{r}}_{n/n-1} = \mathbf{h}(\hat{\boldsymbol{\theta}}_{n/n-1}) = \mathbf{h}\left(\mathbf{f}(\hat{\boldsymbol{\theta}}_{n-1/n-1})\right)$$

$$\boldsymbol{\Sigma}_{n/n-1} = \mathbf{F}(\hat{\boldsymbol{\theta}}_{n-1/n-1})\,\boldsymbol{\Sigma}_{n-1/n-1}\,\mathbf{F}^H(\hat{\boldsymbol{\theta}}_{n-1/n-1}) + \mathbf{R}_u$$

2. Kalman gain:

$$\mathbf{M}_n = \mathbf{Q}_n^{-1}\mathbf{S}_n$$

$$\mathbf{S}_n = \left.\mathbf{D}_r(\boldsymbol{\theta})\,\boldsymbol{\Sigma}_{n/n-1}\right|_{\boldsymbol{\theta} = \hat{\boldsymbol{\theta}}_{n/n-1}}$$

$$\mathbf{Q}_n = \left.\left[\sum_{p,q=1}^{P}\left[\boldsymbol{\Sigma}_{n/n-1}\right]_{p,q}\boldsymbol{\mathcal{H}}_{p,q}(\boldsymbol{\theta}) + \mathbf{R}^*(\boldsymbol{\theta}) \otimes \mathbf{R}(\boldsymbol{\theta}) + \boldsymbol{\mathcal{A}}(\boldsymbol{\theta})\,\mathbf{K}\,\boldsymbol{\mathcal{A}}^H(\boldsymbol{\theta})\right]\right|_{\boldsymbol{\theta} = \hat{\boldsymbol{\theta}}_{n/n-1}}$$

where $\mathbf{M}_n$ is ultimately a function of the signal model $\mathbf{A}(\boldsymbol{\theta})$ and its derivatives $\partial\mathbf{A}(\boldsymbol{\theta})/\partial\theta_1, \ldots, \partial\mathbf{A}(\boldsymbol{\theta})/\partial\theta_P$, evaluated at $\hat{\boldsymbol{\theta}}_{n/n-1}$, as well as the noise covariance $\mathbf{R}_w$ and the kurtosis matrix $\mathbf{K}$. The exact expressions of $\mathbf{S}_n$ and $\mathbf{Q}_n$ were deduced in Section 5.4.2.

3. Estimation:

$$\hat{\boldsymbol{\theta}}_{n/n} = \hat{\boldsymbol{\theta}}_{n/n-1} + \mathbf{M}_n^H\left(\mathbf{r}_n - \hat{\mathbf{r}}_{n/n-1}\right)$$

with $\mathbf{r}_n = \mathrm{vec}\left(\mathbf{y}_n\mathbf{y}_n^H\right)$ the sample covariance matrix.

4. MSE matrix update:

$$\boldsymbol{\Sigma}_{n/n} = \boldsymbol{\Sigma}_{n/n-1} - \mathbf{M}_n^H\mathbf{S}_n = \boldsymbol{\Sigma}_{n/n-1} - \mathbf{S}_n^H\mathbf{Q}_n^{-1}\mathbf{S}_n.$$

5.5 Simulations

Let us consider the problem of tracking the direction-of-arrival (DOA) of $P$ mobile terminals transmitting toward a uniform linear array composed of $M > P$ antennas spaced $\lambda/2$ apart, with $\lambda$ the wavelength of the received signals. The received signal is passed through the matched filter and then sampled at one sample per symbol. We will consider independent snapshots, assuming that the actual modulation is ISI-free and the $P$ signals are perfectly synchronized. Assuming for simplicity that all the users are received with the same power, the observed signal verifies the linear signal model in equation (5.1) with

$$\mathbf{A}(\boldsymbol{\theta}_n) \triangleq \exp\left(j\pi\,\mathbf{d}_M\boldsymbol{\theta}_n^T\right), \qquad \mathbf{d}_M \triangleq [0, \ldots, M-1]^T,$$

$\mathbf{x}_n$ being the transmitted symbols for the $P$ users and $\mathbf{w}_n$ the vector of AWGN samples with $E\{\mathbf{w}_n\mathbf{w}_n^H\} = \sigma_w^2\mathbf{I}_M$. Therefore, the SNR (per user) is given by $\sigma_w^{-2}$, bearing in mind that $E\{\mathbf{x}_n\mathbf{x}_n^H\} = \mathbf{I}_K$ with $K = P$ in this case.

Several illustrative simulations have been carried out to evaluate the performance of the QEKF (5.15) when the transmitted signal is digitally modulated. The optimum QEKF is compared with the one based on the Gaussian assumption, which is obtained by imposing $\mathbf{K} = \mathbf{0}$ in (5.17) and (5.18). This suboptimal QEKF will be referred to as the Gaussian QEKF in the sequel. The normalized mean square error (MSE) is adopted as the figure of merit, which at time $n$ reads

$$\mathrm{MSE}(n) \triangleq \frac{1}{P}\,E\left\{\left\|\boldsymbol{\theta}_n - \hat{\boldsymbol{\theta}}_{n/n}\right\|^2\right\}.$$

- Simulation 1: in Fig. 5.1, a single user ($P = 1$) transmitting from a static DOA is simulated. The transmitted symbols are drawn from a phase shift keying (MPSK) constellation. The base station array is composed of $M = 4$ antennas and the SNR per user is set to 40 dB. This very high SNR scenario is studied in order to analyze how the trackers cope with the random nuisance parameters, i.e., the so-called self-noise. The estimator is initialized at $\hat{\theta}_{0/0} = 0$ with $\Sigma_{0/0} = 1000$. Then, 1000 realizations are run with $\theta$ uniformly distributed within $(-0.4, 0.4)$. The parameter range is limited to this interval because the tracker acquisition margin is limited to $\pm 2/M = \pm 0.5$. In general, the QEKF solution is unique, whatever the initial start-up, if and only if $M = P + 1$. When $M > P + 1$ the array directivity is increased, but new sidelobes appear in the array beam pattern, yielding spurious solutions.

[Figure 5.1: Estimation MSE as a function of time for the optimum and Gaussian QEKF in the case of a single static MPSK-modulated user with random DOA in the range ±0.4 and SNR=40dB. Axes: MSE (logarithmic) versus n (time). Curves: Gaussian QEKF, Optimum QEKF.]

Figure 5.1 depicts the estimated MSE($n$) for the optimum and Gaussian QEKF. The state equation noise $\mathbf{R}_u$ (5.2) is set to attain the same steady-state variance in both cases. It becomes apparent that the acquisition time is reduced if the QEKF exploits the digital structure of the received signal by incorporating the kurtosis matrix $\mathbf{K}$. Alternatively, this improvement could be used to reduce the QEKF steady-state variance if the optimum and Gaussian QEKF trackers were adjusted to yield the same acquisition time.

- Simulation 2: in this simulation, we have $P = 2$ users transmitting from $\boldsymbol{\theta} = [-0.1, 0.1]^T$ or $\boldsymbol{\theta} = [-0.2, 0.2]^T$. The array size is $M = 4$ and the SNR per user is again 40 dB. The QEKF trackers are initialized at $\hat{\boldsymbol{\theta}}_{0/0} = [-0.5, 0.5]^T$ with $\boldsymbol{\Sigma}_{0/0} = 1000\,\mathbf{I}_P$ and $\mathbf{R}_u = 10^{-3}\,\mathbf{I}_P$. The resulting MSE($n$) is plotted in Fig. 5.2 for the optimum and Gaussian QEKF. Once more, the fourth-order information about the discrete symbols is shown to improve the QEKF performance in both the acquisition and steady-state regimes. As shown in Fig. 5.2, the closer the two sources are, the higher this improvement is. Further simulations showed that the simulated Gaussian QEKF is unable to acquire the actual DOAs in some cases, e.g., $\boldsymbol{\theta} = [0.2, 0.4]^T$, whereas the optimum QEKF eventually converges to the true DOAs.

[Figure 5.2: Estimation MSE as a function of time for the optimum and Gaussian QEKF in the case of two static MPSK-modulated users placed at ±0.2, ±0.1 and SNR=40dB. Axes: MSE (logarithmic) versus n (time). Curves: Gaussian assumption, θ=[−0.1,0.1]^T, θ=[−0.2,0.2]^T.]

- Simulation 3: in this simulation, the state equation noise is removed ($\mathbf{R}_u \to \mathbf{0}$) in order to evaluate the estimator consistency when $n \to \infty$. First of all, the DOAs are acquired ($n < 0$) with all the QEKFs adjusted to yield the same steady-state variance ($0 < n < 20$). From this steady-state situation, $\mathbf{R}_u$ is set to zero at $n = 20$ so that the QEKF (noise equivalent) bandwidth is progressively reduced. At this moment, the optimum QEKF becomes an order-recursive implementation of the second-order tracker in Section 4.2 as the observation time goes to infinity. Likewise, the Gaussian QEKF implements the well-known GML estimator explained in Section 2.4.3. These statements are based on the technical discussion in Section 5.4.1.

[Figure 5.3: Estimation MSE as a function of time for the optimum and Gaussian QEKF (dashed) in the case of two static MPSK-modulated sources transmitting from ±0.2, ±0.1 and ±0.01 (♦) when Ru is set to zero at n=20. SNR=40dB. Axes: MSE (logarithmic) versus n (time, logarithmic).]

In Fig. 5.3, numerical results are provided for two static users at $\theta = \pm 0.01$, $\pm 0.1$ or $\pm 0.2$ with $M = 4$ and SNR=40dB. We observe that the Gaussian assumption suffers a constant penalty as $n$ increases. Consequently, the GML estimator is shown to be suboptimal at high SNR when the modulation has constant envelope (e.g., MPSK or CPM [Pro95]), even if the observation is arbitrarily large ($n \to \infty$). This result is further validated by means of the asymptotic study in Section 7.4.5. Finally, notice that the incurred loss is a function of the angular separation between users. Surprisingly, the variance of the optimal QEKF improves as the users get closer. This anomalous result is a consequence of the secondary lobes of the array response when the number of antennas is small. The same effect will be observed in Section 7.5 when studying the asymptotic performance of the optimal small-error DOA tracker.

- Simulation 4: in order to validate that the simulated QEKFs are tracking the actual DOAs, the users are moved with constant angular speed from −0.8 to 0.8 with fixed angular separation (0.02). 50 trials are plotted in Fig. 5.4, showing that the Gaussian QEKF fails to track the two users.

[Figure 5.4: DOA tracking of two close MPSK-modulated signals separated 0.02 using the optimum and Gaussian QEKF. SNR=40dB. Axes: DOA versus n (time).]

- Simulation 5: the same simulation as in Fig. 5.3 has been carried out for a low signal-to-noise ratio (SNR=10 dB) and for a multilevel modulation such as 16-QAM [Pro95]. Figures 5.5 and 5.6 manifest the optimality of the Gaussian assumption when multilevel constellations or low SNRs are considered, respectively.

[Figure 5.5: Estimation MSE as a function of time for the optimum and Gaussian QEKF (dashed) in the case of two 16-QAM modulated signals received from ±0.2, ±0.1 and ±0.01 (♦) when Ru is set to zero at time n=20. SNR=40dB. Axes: MSE (logarithmic) versus n (time, logarithmic).]

[Figure 5.6: Estimation MSE as a function of time for the optimum and Gaussian QEKF (dashed) in the case of two MPSK-modulated signals received from ±0.2 and ±0.05 (♦) when Ru is set to zero at time n=20 and the SNR is equal to 10dB. Axes: MSE (logarithmic) versus n (time, logarithmic).]

5.6 Conclusions

The EKF formulation has been extended to deal with quadratic signal models, which appear naturally in blind estimation problems. The resulting Quadratic EKF (QEKF) is found to exploit the fourth-order cumulants (kurtosis) of the unknown inputs, whereas this information is implicitly omitted when the classical Gaussian assumption is adopted in the design.

The QEKF is further applied to estimate and track the DOA of multiple digitally-modulated sources, concluding that constant amplitude modulations (e.g., MPSK or CPM) yield a significant improvement in terms of acquisition and/or steady-state variance for moderate-to-high SNRs. In these scenarios, the Gaussian assumption is found to provide suboptimal DOA estimators or trackers even if the tracker bandwidth is indefinitely reduced or, in other words, the (effective) observation time is increased without limit.

Chapter 6

Case Studies

This chapter explores some illustrative applications in the context of digital communications.
The second-order estimation theory in the preceding chapters is developed for these selected case studies. In most examples, the focus is on closed-loop second-order schemes assuming that the small-error approximation is satisfied. The Gaussian ML estimator and the rest of the ML-based approximations are numerically compared to the optimal second-order small-error solution in Chapter 4. Likewise, the related lower bounds in the presence of nuisance parameters are included for completeness (Section 2.6.1).

In the first section, some contributions in the field of non-data-aided synchronization are presented. Specifically, Section 6.1 proposes the global optimization of second-order closed-loop synchronizers and the design of open-loop timing synchronizers in the frequency domain. In Section 6.2, the problem of second-order carrier phase synchronization is addressed in the case of noncircular transmissions. In this section, the ML estimator is shown to be quadratic at low SNR for MSK-type modulations. Moreover, second-order self-noise free estimates are achieved at high SNR by exploiting the non-Gaussian structure of the digital modulation. In Section 6.3, the problem of time-of-arrival estimation in wireless communications is studied. The frequency-selective multipath is shown to increase the number of nuisance parameters, and the Gaussian assumption is shown to apply in this case study.

In Section 6.4, the classical problem of blind channel identification is dealt with. The channel amplitude is shown not to be identifiable unless the transmitted symbols belong to a constant-modulus constellation and this information is exploited by the estimator. Finally, the problem of angle-of-arrival estimation in the context of cellular communications is addressed in Section 6.5. The Gaussian assumption is clearly outperformed for practical SNRs in the case of constant-modulus nuisance parameters and closely spaced sources.
In this section, the importance of the multiple access interference (MAI) is emphasized and MAI-resistant second-order DOA trackers are derived and evaluated.

6.1 Non-Data-Aided Synchronization

The problem of blind frequency estimation was adopted in the core of the dissertation (Chapters 3 and 4) to illustrate the most significant conclusions of this thesis. This choice was based on the relevance of this problem in many applications and on the existence of closed-form expressions for the feedforward frequency estimator considered in Chapter 3.

In this section, some additional contributions in the field of non-data-aided (NDA) digital synchronization are presented and simulated. To introduce the reader to the problem of digital synchronization, a brief review of the state of the art is provided in Section 6.1.1, in which the most successful timing and frequency estimators are presented. Afterwards, in Section 6.1.2, the signal model for digital synchronization is reviewed and some important remarks are made on the structure of the transfer matrix $\mathbf{A}(\boldsymbol{\theta})$. Based on this signal model, the performance of the most important NDA (quadratic) timing estimators, for both linear and CPM modulations, is extensively evaluated via simulation in Section 6.1.5. In this context, a closed-form expression for the optimal second-order open-loop timing estimator is deduced by processing the received signal in the frequency domain (Section 6.1.3). Another contribution of this section is the global optimization of closed-loop estimators, showing that the discriminator should be designed to minimize the variance of the low-pass noisy terms because the high-pass terms (e.g., the self-noise) are filtered out by the loop filter (Section 6.1.4). Finally, all these results are validated via simulation in Section 6.1.5.

6.1.1 Overview

In digital communications, the receiver has to recover some reference parameters in order to demodulate the received signal.
These parameters are mostly the signal timing and, in bandpass coherent communications, the carrier phase and the carrier frequency. The knowledge of these parameters is necessary to synchronize the demodulator and take reliable decisions on the transmitted symbols [Men97][Vaz00]. Although the data symbols are a priori unknown, digital modulations exhibit a strict-sense cyclostationarity that can be exploited to derive sufficient statistics for the estimation of the aforementioned parameters. Thus, all the methods in the literature for non-data-aided (NDA) timing and frequency estimation make use of the cyclostationarity property of the received signal [Rib94].

As trying to exploit the entire statistics would be impractical, two main directions have been adopted in the development of practical algorithms. The first direction focuses on an explicit exploitation of the second-order cyclostationarity [Gar86b][Rib94]. As a result, the derived algorithms become quadratic with respect to the received signal. There are at least two motivations for choosing the second-order statistics. The first one is that it represents a minimum complexity constraint. The other is that it allows extracting useful insights from the spectral correlation concept [Gar86b], which is useful for guiding the designer in the derivation of synchronization algorithms. Although all the above methods start from a solid theoretical foundation, the second-order constraint appears as an ad hoc selection, and the obtained methods are based on heuristic reasoning. For the preliminary issues on cyclostationarity the reader is referred to [Gar94] and references therein.

The second direction commonly adopted for the design of synchronization algorithms is the application of the well-known maximum likelihood principle explained in Section 2.3 [Men97][Vaz00].
While the cyclostationary framework is useful for the derivation of both feedforward and feedback structures, the ML criterion leads primarily to feedback schemes (Section 2.5). With the purpose of deriving NDA methods, the data symbols should be modeled as random variables following the stochastic approach introduced in Section 2.3. Then, the likelihood function should be obtained by averaging the joint likelihood function using the known statistical distribution of the symbols. Additionally, the rest of the unknown nuisance parameters can also be averaged out following a Bayesian approach. The resulting NDA ML criterion is referred to as the unconditional (or stochastic) maximum likelihood estimator in the literature (Section 2.3). Because of the difficult computation of the mentioned statistical averages, it is very common to consider that the signal-to-noise ratio of the received signal is very low (Section 2.4.1). Although this low-SNR assumption is not generally satisfied, it allows the development of reduced-complexity synchronizers because the resulting schemes are usually quadratic in the observation.

A different interpretation of NDA ML estimation is given in [Vaz00][Vaz01][Rib01a]. The new approach is based on the compression of the NDA ML function with respect to the vector of unknown symbols by adopting a linear estimation of these symbols. This approach is valuable because it unifies the different ML-based NDA solutions, namely the Low-SNR UML (Section 2.4.1), the Conditional ML (Section 2.4.2) and the Gaussian ML (Section 2.4.3).

In the following sections, the most important NDA synchronization techniques are briefly described and classified. For more information, the reader is referred to the excellent textbooks and historical reports on digital synchronization [Men97][Mey90][Gar88a][Gar90].
Timing Synchronization

One of the simplest algorithms exploiting the cyclostationarity property for timing estimation is the well-known Filter and Square Timing Recovery proposed in [Oer88] by M. Oerder and H. Meyr. This feedforward timing synchronizer is based on an explicit spectral line regeneration using quadratic processing. The Oerder&Meyr synchronizer was proved in [LS05a][Vaz00] to be the low-SNR ML timing estimator if the carrier frequency error is uniformly distributed within the Nyquist bandwidth and the received symbols are uncorrelated. Likewise, the Oerder&Meyr synchronizer also yields the low-SNR ML solution in the absence of frequency errors, as shown in [LS05a][Vaz00].

On the other hand, the application of the ML principle along with the low-SNR approximation and some additional simplifications has led to well-known closed-loop estimators such as the NDA Early-Late detector [Men97, Sec. 8.3.1] and Gardner's detector [Gar86a], which is shown to outperform the NDA Early-Late detector at high SNR.

A common problem of the existing NDA timing error detectors is the presence of the so-called self-noise (or pattern noise) [Men97]. Self-noise is the timing jitter induced by the random received symbols. Indeed, this self-noise is a consequence of the adopted low-SNR approximation. The occurrence of self-noise yields a floor on the timing estimation variance at high SNR that might invalidate these techniques for the medium-to-high SNR range. This problem was addressed in detail in [And96], whose authors proposed to pre-filter the received signal before detecting the timing error.

Finally, more recent research efforts have been concerned with timing recovery for Continuous Phase Modulation (CPM) [Men97][Vaz00]. These modulation schemes are attractive for their high spectral efficiency and constant envelope, which allows the use of low-cost nonlinear amplifiers.
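The filter-and-square idea described at the start of this subsection can be illustrated with a minimal Python sketch. It is not taken from [Oer88] or from this thesis: the function name `oerder_meyr_timing`, the synthetic test envelope and the choice of four samples per symbol are assumptions made for the example. The estimate is the phase of the spectral line that the squared envelope exhibits at the symbol rate.

```python
import cmath
import math


def oerder_meyr_timing(samples, nss):
    """Square-law feedforward timing estimate, normalized to the symbol period.

    samples: (pre-filtered) received samples taken at nss samples per symbol.
    The squared magnitude is correlated with a complex exponential at the
    symbol rate, and the phase of the result gives the timing estimate.
    """
    line = sum(abs(x) ** 2 * cmath.exp(-2j * math.pi * k / nss)
               for k, x in enumerate(samples))
    return -cmath.phase(line) / (2.0 * math.pi)


# Hypothetical noise-free test signal (an assumption of this sketch): its
# squared envelope is 1 + cos(2*pi*(t/T - eps)), i.e. it peaks at t = eps*T.
nss, n_symbols, eps = 4, 32, 0.2
samples = [math.sqrt(1.0 + math.cos(2.0 * math.pi * (k / nss - eps)))
           for k in range(nss * n_symbols)]
estimate = oerder_meyr_timing(samples, nss)
```

With a modulated signal the random symbols perturb the spectral line, which is the self-noise mechanism discussed above; the synthetic envelope used here has none, so the delay is recovered exactly.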
The ML principle along with the low-SNR approximation has also been applied to CPM, leading to timing recovery detectors very similar to those derived for linear formats.

Carrier-frequency synchronization

The structure of the frequency synchronizers depends strongly on the magnitude of the maximum frequency offset as compared with the symbol rate. Early methods for feedback frequency recovery in the case of high frequency offset include quadricorrelators [Cah77][Gar88a] and dual filter detectors [Alb89][Gar88a], which have been proved to be equivalent solutions [Moe92]. The rotational detectors for estimating moderate frequency offsets with no timing uncertainty were introduced in [Mes79]. Other ad hoc schemes were proposed in [Sar88] and [Chu91] for the same problem. The first rigorous treatment of the problem from an ML perspective can be found in [Gar90]. The frequency recovery methods developed under this framework also make use of the low-SNR approximation. However, the resulting low-SNR ML frequency error detectors become self-noise free if the timing is known. Self-noise appears only when the estimator does not use the timing information [Men97, Sec. 3.5], and ad hoc techniques for eliminating this effect have been proposed in [Alb89] and [And93].

6.1.2 Signal Model

In this section, the signal model used in the context of digital communications is presented in detail. It is shown that most modulations of interest can be represented by means of the linear, vectorial model presented in Section 2.4.
Let us start by formulating the complex envelope of a generic digital modulation as follows:

$$y(t) = s(t-\tau; d_k)\,e^{j(\varphi+\omega t)} + w(t) \tag{6.1}$$

where $\{d_k\}$ are the information symbols conveyed in the transmitted signal $s(t)$, $\tau$ is the timing error within a symbol period $(-T/2, +T/2]$, $\varphi$ and $\omega$ are the carrier phase and the carrier pulsation errors, respectively, and $w(t)$ is the complex AWGN term with double-sided power spectral density $S_w(f) = 2N_o$ Watts/Hz (the in-phase and quadrature two-sided power spectral density is $N_o$ W/Hz). If the received signal is low-pass filtered in the Nyquist bandwidth $(-0.5/T_s, 0.5/T_s]$, the equivalent discrete signal model is given by

$$y(mT_s) = s(mT_s-\tau; d_k)\,e^{j(\varphi+\omega mT_s)} + w(mT_s) \tag{6.2}$$

where $T_s$ is the sampling period. Under this sampling condition, the discrete noise $w(mT_s)$ remains white.

A first case of interest is that of linear modulations admitting the following representation:

$$y(mT_s) = \sum_{k=-\infty}^{+\infty} d_k\,p(mT_s-kT-\tau)\,e^{j(\varphi+\omega mT_s)} + w(mT_s) \tag{6.3}$$

where $T = N_{ss}T_s$ is the symbol period and $p(mT_s)$ are the samples of the pulse $p(t)$, which is assumed to last $L$ symbol intervals. If we take $M \triangleq N_sN_{ss}$ samples to estimate the unknown parameters, the $n$-th observed vector $\mathbf{y}_n \triangleq [y(nT), \ldots, y(nT+(M-1)T_s)]^T$ is given by

$$\mathbf{y}_n = \mathbf{A}(\boldsymbol{\theta})\,\mathbf{x}_n + \mathbf{w}_n \tag{6.4}$$

where the transfer matrix

$$\mathbf{A}(\boldsymbol{\theta}) \triangleq \begin{bmatrix} e^{j(\varphi+n\omega T)}\,p((L-1)T-\tau) & \cdots & e^{j(\varphi+n\omega T)}\,p((1-N_s)T-\tau) \\ \vdots & \ddots & \vdots \\ e^{j(\varphi+(n+(M-1)/N_{ss})\omega T)}\,p((L+N_s-1)T-T_s-\tau) & \cdots & e^{j(\varphi+(n+(M-1)/N_{ss})\omega T)}\,p(T-T_s-\tau) \end{bmatrix}$$

is a function of the normalized vector of parameters $\boldsymbol{\theta} \triangleq [\varphi, \tau/T, \omega T/2\pi]^T$, the vector $\mathbf{x}_n \triangleq [d_{n-L+1}, \ldots, d_{n+N_s-1}]^T$ contains the observed data symbols, and the receiver noise $\mathbf{w}_n$ is defined in the same way as the vector $\mathbf{y}_n$.

Figure 6.1 (diagram with two panels, Burst Mode (TDMA) annotated with $N_s = L+K-1$ and Continuous Mode (SCPC) annotated with $K = N_s+L-1$; in both panels the matrix has $M = N_sN_{ss}$ rows and $K$ columns, each column holding $LN_{ss}$ pulse samples shifted by $N_{ss}$ rows with respect to the previous one): Structure of the transfer matrix $\mathbf{A}(\boldsymbol{\theta})$ in a burst or continuous transmission.
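As an illustration of how the entries of the transfer matrix in (6.4) are laid out, the following sketch builds it for the continuous-mode case with the symbol period normalized to one. The function names, the triangular pulse and the numerical values are assumptions made for the example only; the thesis does not prescribe any implementation.

```python
import cmath
import math


def triangular_pulse(t, l_sym=2):
    """Hypothetical causal pulse lasting l_sym symbol periods (T = 1)."""
    return max(0.0, 1.0 - abs(2.0 * t / l_sym - 1.0))


def transfer_matrix(pulse, n_s, n_ss, l_sym, phi=0.0, tau=0.0, nu=0.0, n=0):
    """Sketch of A(theta) in (6.4) for theta = [phi, tau, nu], with T = 1.

    Row m corresponds to the sample taken m/n_ss symbol periods into the
    block; column c corresponds to the symbol d_{n-L+1+c}. nu = omega*T/(2*pi)
    is the normalized frequency error.
    """
    rows, cols = n_s * n_ss, n_s + l_sym - 1
    matrix = []
    for m in range(rows):
        t = m / n_ss
        rotation = cmath.exp(1j * (phi + 2.0 * math.pi * nu * (n + t)))
        matrix.append([rotation * pulse(t - (c - (l_sym - 1)) - tau)
                       for c in range(cols)])
    return matrix


# Example dimensions (assumed): Ns = 4 symbols, Nss = 2 samples per symbol, L = 2.
A = transfer_matrix(triangular_pulse, n_s=4, n_ss=2, l_sym=2)
n_rows, n_cols = len(A), len(A[0])
```

In this example the matrix has 8 rows and 5 columns, so there are more samples than unknown symbols, which is only possible because the signal is oversampled.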
The phase origin can be arbitrarily selected. For instance, a practical choice is the center of the observed interval. In this section, we will focus on the timing and frequency estimation problems assuming that the signal phase is unknown. In that case, the term $e^{j\varphi}$ can be absorbed into the nuisance parameter vector $\mathbf{x}_n$ and non-coherent (i.e., quadratic) estimation techniques are adopted.

Implicitly, a single channel per carrier (SCPC) system is assumed throughout this thesis, in which a continuous, infinite stream of symbols is received (6.3). In that case the initial $L-1$ and final $L-1$ symbols are partially observed and, consequently, the modulation matrix $\mathbf{A}(\boldsymbol{\theta})$ has $N_s+L-1$ columns and only $N_sN_{ss}$ rows. Therefore, oversampling ($N_{ss}>1$) is normally necessary to have more samples than unknowns, i.e., $N_sN_{ss} > N_s+L-1$. This condition is usually a requirement to cancel out the disturbance of the modulation and yield self-noise free estimates of $\boldsymbol{\theta}$. The structure of matrix $\mathbf{A}(\boldsymbol{\theta})$ is depicted in Fig. 6.1 (right-hand side) and its Gramian is a function of $\tau$ and $\omega$ as indicated next:

$$\left[\mathbf{A}(\boldsymbol{\theta})\mathbf{A}^H(\boldsymbol{\theta})\right]_{m_1,m_2} = e^{j\omega(m_1-m_2)T_s}\sum_{k=-\infty}^{\infty} p(m_1T_s-kT-\tau)\,p(m_2T_s-kT-\tau).$$

On the other hand, a burst of $K$ symbols is transmitted in a time-division multiple access (TDMA) system. In that case, the observation is composed of $(K+L)N_{ss}-1$ non-zero samples and, thus, oversampling is not strictly necessary if the whole received burst is processed. In a TDMA system the matrix $\mathbf{A}(\boldsymbol{\theta})$ is a Sylvester matrix (see Fig. 6.1) and, if the transmitted pulse is sampled without aliasing, we have that

$$\left[\mathbf{A}^H(\boldsymbol{\theta})\mathbf{A}(\boldsymbol{\theta})\right]_{k_1,k_2} = R_{pp}\left((k_1-k_2)T\right)$$

where $R_{pp}(\Delta t) \triangleq \int p(t)\,p(t+\Delta t)\,dt$ is the pulse autocorrelation. Therefore, $\mathbf{A}^H(\boldsymbol{\theta})\mathbf{A}(\boldsymbol{\theta})$ does not depend on $\boldsymbol{\theta}$.

Synchronization algorithms for SCPC systems have to cope with the partial observation of the initial and final symbols.
Optimal synchronizers weight the observed samples taking into account that the initial and final symbols provide less information about $\boldsymbol{\theta}$ than the central ones. The longer the observation time ($N_s$), the less significant this “edge effect” becomes. This problem is very relevant, for example, in the carrier phase estimation problem studied in Section 6.2. Asymptotically, the “edge effect” is negligible and the synchronization techniques for SCPC systems are identical to those derived for TDMA systems. Thus, in the asymptotic case synchronizers can be designed considering only the central column of $\mathbf{A}(\boldsymbol{\theta})$ (Section 7.4.4).

The linear model in equation (6.3) can be extended to encompass more sophisticated scenarios such as multicarrier schemes, multiple access systems, space-time transmissions, or binary CPM modulations on account of Laurent's decomposition [Lau86][Men95]. In all these cases, the received signal can be expressed as the superposition of $J$ linearly modulated signals as follows:

$$\mathbf{y} = \sum_{j=1}^{J}\mathbf{A}_j(\boldsymbol{\theta}_j)\,\mathbf{x}_j + \mathbf{w} = \mathbf{A}(\boldsymbol{\theta}_1,\ldots,\boldsymbol{\theta}_J)\,\mathbf{x} + \mathbf{w}$$

with

$$\mathbf{A}(\boldsymbol{\theta}_1,\ldots,\boldsymbol{\theta}_J) \triangleq \left[\mathbf{A}_1(\boldsymbol{\theta}_1),\ldots,\mathbf{A}_J(\boldsymbol{\theta}_J)\right], \qquad \mathbf{x} \triangleq \left[\mathbf{x}_1^T,\ldots,\mathbf{x}_J^T\right]^T$$

where the index $n$ is omitted for simplicity and $\boldsymbol{\theta}_j$ stands for the parameters of the $j$-th user in case of a multiple access system. The basic difference with respect to (6.3) is that the $J$ signals are usually non-orthogonal and, thus, they interfere with each other if we deal with space-time transmissions [Vil03c], asynchronous CDMA users or binary CPM signals. Moreover, the $J$ pseudo-pulses of the CPM signal suffer from intersymbol interference (ISI) at their matched filter output. All these interference terms introduce an additional noisy component affecting the estimator performance at high SNR and yielding the so-called self-noise.

6.1.3 Open-Loop Timing Synchronization

In Chapter 3, the formulation of the optimal open-loop second-order estimator was addressed.
In that chapter, the parameters of interest were modeled as random variables with known probability density function $f_{\boldsymbol{\theta}}(\boldsymbol{\theta})$. Then, the estimator coefficients were optimized averaging the estimator bias and variance with respect to the prior $f_{\boldsymbol{\theta}}(\boldsymbol{\theta})$. The Bayesian expectation was solved analytically for the problem of frequency estimation in Section 3.4 and some simulations were presented to illustrate the theory of feedforward quadratic estimation. Unfortunately, in most problems, the expectation with respect to $f_{\boldsymbol{\theta}}(\boldsymbol{\theta})$ must be solved numerically as, for example, when addressing the problem of timing synchronization. To overcome this drawback, in this section we propose to process the received signal in the frequency domain, where the timing error appears as a frequency shift. In that way, the formulation in Section 3.4 can be applied to both the frequency and timing estimation problems.

Let $\mathbf{z}$ be the DFT of the observed vector $\mathbf{y}$, which is computed as $\mathbf{z} \triangleq \mathbf{F}\mathbf{y} = \mathbf{F}\mathbf{A}(\boldsymbol{\theta})\mathbf{x} + \mathbf{F}\mathbf{w}$, where $\mathbf{F}$ stands for the unitary $M \times M$ DFT matrix defined as follows:

$$\mathbf{F} \triangleq \frac{1}{\sqrt{M}}\exp\left(-j\frac{2\pi}{M}\,\mathbf{d}_M\mathbf{d}_M^T\right)$$

with $\mathbf{d}_M \triangleq [-M/2, \ldots, M/2-1]^T$ and the exponential applied element by element. Notice that $MT_s$ must be greater than the burst duration plus twice the maximum delay to prevent temporal aliasing. In the frequency domain, the transfer matrix can be written as

$$\mathbf{B}(\boldsymbol{\theta}) \triangleq \mathbf{F}\mathbf{A}(\boldsymbol{\theta}) = e^{j\varphi}\,\mathbf{E}_2(\tau)\,\mathbf{F}\,\mathbf{E}_1(\nu)\,\mathbf{A}(\mathbf{0}) \tag{6.5}$$

with

$$\mathbf{E}_1(\nu) \triangleq \mathrm{diag}\left(\exp\left(j2\pi\frac{\nu}{N_{ss}}\,\mathbf{d}_M\right)\right), \qquad \mathbf{E}_2(\tau) \triangleq \mathrm{diag}\left(\exp\left(-j2\pi\frac{\tau N_{ss}}{M}\,\mathbf{d}_M\right)\right)$$

the diagonal matrices accounting for the frequency and timing errors, normalized with respect to the symbol period $T$. In that way, the observation $\mathbf{z}$ exhibits the same phasorial dependence on the three parameters $\varphi$, $\tau$ and $\nu$. Therefore, the results in Appendix 3.D can be used to obtain a closed-form expression for the optimal quadratic open-loop timing synchronizer.
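The two properties used above, namely that the centered DFT matrix is unitary and that a delay becomes a linear phase across the frequency bins, can be checked numerically. The sketch below is only an illustration: the helper names, the block length and the test vector are arbitrary assumptions, and the delay is an integer number of samples applied circularly, which is equivalent to a true delay only when there is no temporal aliasing.

```python
import cmath
import math


def unitary_dft_matrix(m):
    """Centered unitary DFT matrix with indices d = [-m/2, ..., m/2 - 1] (m even)."""
    d = [k - m // 2 for k in range(m)]
    scale = 1.0 / math.sqrt(m)
    matrix = [[scale * cmath.exp(-2j * math.pi * dk * dn / m) for dn in d]
              for dk in d]
    return matrix, d


def matvec(matrix, vector):
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


M = 8
F, d = unitary_dft_matrix(M)

# Unitarity: largest deviation of F F^H from the identity matrix.
gram_error = max(
    abs(sum(F[i][k] * F[j][k].conjugate() for k in range(M)) - (1.0 if i == j else 0.0))
    for i in range(M) for j in range(M))

# Arbitrary test vector and a circular delay of `shift` samples.
y = [complex(k % 3, -k) for k in range(M)]
shift = 3
y_delayed = [y[(k - shift) % M] for k in range(M)]

z = matvec(F, y)
z_delayed = matvec(F, y_delayed)
# Same role as E2(tau) in (6.5), with tau * Nss = shift samples.
z_predicted = [cmath.exp(-2j * math.pi * shift * dk / M) * zk
               for dk, zk in zip(d, z)]
phase_error = max(abs(a - b) for a, b in zip(z_delayed, z_predicted))
```

Both `gram_error` and `phase_error` are at the level of floating-point rounding, which is the sense in which the timing error acts on $\mathbf{z}$ in the same way as a frequency error acts on $\mathbf{y}$.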
Notice that optimal estimators can be obtained from $\mathbf{z} = \mathbf{F}\mathbf{y}$ since $\mathbf{F}$ is a unitary transformation that can always be inverted, if necessary, by the estimator matrix $\mathbf{M}$ without noise enhancement. Moreover, the transformation does not change the noise statistics if the original Gaussian noise $\mathbf{w}$ is spectrally white. To conclude, it is worth realizing that (6.5) only holds if all the received pulses are entirely observed. Otherwise, the partially observed pulses cannot be interpolated from the vector of samples $\mathbf{y}$ because they do not satisfy the Nyquist criterion. Thus, the above expression can be applied to design open-loop estimators if the entire burst, including the pulse tails, is captured and processed in a TDMA system or, alternatively, if the observation time is sufficiently long to neglect the “edge effect” in SCPC systems (Section 6.1.2).

6.1.4 Closed-Loop Analysis and Optimization

In Chapter 4, the optimum second-order small-error estimator was deduced and then simulated for the frequency estimation problem. The solution therein can be adopted to design the discriminator of NDA timing and frequency closed-loop synchronizers. In this manner, the discriminator coefficients are selected to minimize the steady-state variance at the discriminator output. However, this optimization criterion does not take into account that the discriminator output is further low-pass filtered by the loop impulse response. For example, an exponential filtering is carried out in case of a first-order closed loop. When the discriminator output is temporally uncorrelated, this standard procedure is globally optimal and the estimator variance is computed as the discriminator variance divided by the effective loop filter memory $N \approx 0.5/B_n$, where $B_n$ is the noise equivalent loop bandwidth. This case corresponds to the closed-loop estimator in Section 2.5.1 processing independent blocks $\mathbf{z}_n$.
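The division of the discriminator variance by the loop memory can be made concrete with a short sketch. It assumes a first-order loop with exponential impulse response $h_k = \mu(1-\mu)^k$, which sums to one; the step size `mu`, the truncation length and the function names are choices made for this example. For a scalar error with autocorrelation $R_{ee}[m]$, the variance after the loop filter is $\sum_m R_{hh}[m]R_{ee}[m]$, where $R_{hh}[m]$ is the autocorrelation of $h_k$, so the white case reduces to the filter energy times the discriminator variance.

```python
def loop_impulse_response(mu, length):
    """Truncated impulse response h_k = mu * (1 - mu)**k of a first-order loop."""
    return [mu * (1.0 - mu) ** k for k in range(length)]


def filter_autocorrelation(h, lag):
    """R_hh[lag] = sum_k h_k * h_{k-lag}."""
    lag = abs(lag)
    return sum(h[k] * h[k - lag] for k in range(lag, len(h)))


def loop_output_variance(h, r_ee):
    """Variance at the loop output for a real, scalar discriminator error.

    r_ee maps each non-negative lag to the error autocorrelation at that lag;
    negative lags are accounted for by symmetry.
    """
    total = 0.0
    for lag, value in r_ee.items():
        weight = 1.0 if lag == 0 else 2.0
        total += weight * filter_autocorrelation(h, lag) * value
    return total


bn = 5e-3                     # normalized noise equivalent loop bandwidth
n_eff = 0.5 / bn              # effective loop memory, N = 100
mu = 2.0 / (n_eff + 1.0)      # chosen so that the filter energy equals 1/N
h = loop_impulse_response(mu, 5000)

var_white = loop_output_variance(h, {0: 1.0})
var_correlated = loop_output_variance(h, {0: 1.0, 1: 0.5})
```

With unit discriminator variance, `var_white` is 1/N = 0.01. For the correlated example the result is close to 0.02, that is, the sum of the error autocorrelation over all lags divided by N, rather than the lag-zero value divided by N. This is the situation that arises when overlapped blocks are processed.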
However, if the detected errors are correlated because overlapped blocks of the received signal are processed, the estimator variance is no longer divided by $N$ and the standard procedure for designing the discriminator is suboptimal. Remember that overlapping is generally required to have efficient closed-loop estimators (see Proposition 2.1).

In this section, the small-error variance of any quadratic NDA closed-loop estimator is formulated analytically. This expression is then optimized to find the optimal discriminator coefficients. Some numerical results for the timing estimation problem are provided comparing the aforementioned design criteria. Notice that the formulation is completely general and can be applied to other uniparametric and multiparametric second-order estimation problems. Also, the results in this section are useful in the context of open-loop estimation (Chapter 3) if the parameter estimates are post-filtered. In that case, the Bayesian expectation should be incorporated into all the following expressions.

The output of any quadratic discriminator of $\boldsymbol{\alpha} = \mathbf{g}(\boldsymbol{\theta})$ can be expressed as

$$\mathbf{e}_n \triangleq \hat{\boldsymbol{\alpha}}_n - \mathbf{g}(\boldsymbol{\theta}_o) = \mathbf{M}^H\left(\hat{\mathbf{r}}_n - \mathbf{r}_o\right) \tag{6.6}$$

where $\mathbf{M}$ are the discriminator coefficients under design, $\hat{\mathbf{r}}_n$ is the (vectorized) sample covariance matrix for the $n$-th observed block and $\mathbf{r}_o$ is the expected value of $\hat{\mathbf{r}}_n$ for any value of $n$. The sequence $\mathbf{e}_n$ is strict-sense stationary with zero mean and covariance $\mathbf{M}^H\mathbf{Q}_0\mathbf{M}$ where

$$\mathbf{Q}_0 \triangleq E\left\{\left(\hat{\mathbf{r}}_n - \mathbf{r}_o\right)\left(\hat{\mathbf{r}}_n - \mathbf{r}_o\right)^H\right\}$$

is the covariance matrix of the quadratic observation $\hat{\mathbf{r}}_n$ (3.10). The meaning of the subindex in $\mathbf{Q}_0$ will be explained next. Let us remind the reader that the discriminator coefficients minimizing the variance of $\mathbf{e}_n$ were found in Section 4.2.

Let us consider now that $h_n$ is the loop infinite impulse response. In that case, the estimation
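errors are given by

$$\boldsymbol{\varepsilon}_n \triangleq \sum_{k=0}^{\infty} h_k\,\mathbf{e}_{n-k} = \sum_{k=0}^{\infty} h_k\,\mathbf{M}^H\left(\hat{\mathbf{r}}_{n-k} - \mathbf{r}_o\right) = \mathbf{M}^H\sum_{k=0}^{\infty} h_k\left(\hat{\mathbf{r}}_{n-k} - \mathbf{r}_o\right)$$

that is a strict-sense stationary zero-mean sequence with covariance

$$E\left\{\boldsymbol{\varepsilon}_n\boldsymbol{\varepsilon}_n^H\right\} = \mathbf{M}^H\left(\sum_{m=-\infty}^{\infty} R_{hh}[m]\,\mathbf{Q}_m\right)\mathbf{M}$$

where $R_{hh}[m] \triangleq \sum_{k=m}^{\infty} h_kh_{k-m}$ is the autocorrelation function of the filter $h_n$ and

$$\mathbf{Q}_m \triangleq E\left\{\left(\hat{\mathbf{r}}_n - \mathbf{r}_o\right)\left(\hat{\mathbf{r}}_{n-m} - \mathbf{r}_o\right)^H\right\}$$

stands for the “vectorial autocorrelation function” of the quadratic observation $\hat{\mathbf{r}}_n$ evaluated at the $m$-th lag. Thus, $\mathbf{Q}_0$ stands for $\mathbf{Q}_m$ at $m = 0$. Notice that $\mathbf{Q}_m$ only needs to be considered for lags $|m| \le D$, where $D$ stands for the number of consecutive statistically dependent blocks. In that way, the covariance of the estimation error is

$$E\left\{\boldsymbol{\varepsilon}_n\boldsymbol{\varepsilon}_n^H\right\} = \mathbf{M}^H\left(\sum_{m=-D}^{D} R_{hh}[m]\,\mathbf{Q}_m\right)\mathbf{M} \approx E_h\,\mathbf{M}^H\left(\sum_{m=-D}^{D}\mathbf{Q}_m\right)\mathbf{M}$$

where $E_h \triangleq R_{hh}[0] = \sum_{k=0}^{\infty} h_k^2$ is the filter impulse response energy. In the last approximation, we have taken into account that the bandwidth of $h_n$ is very small and, therefore, $R_{hh}[m]$ is approximately flat for $|m| \le D$. Finally, notice that $E_h = 1/N \approx 2B_n$, where $N$ and $B_n$ are the effective loop memory and the noise equivalent loop bandwidth, respectively, assuming that $\sum_{k=0}^{\infty} h_k = 1$ holds in order to have unbiased estimates (Section 2.5.2).

From the last equation, the variance of any quadratic (unbiased) closed-loop estimator is given by

$$E\left\{\boldsymbol{\varepsilon}_n\boldsymbol{\varepsilon}_n^H\right\} = \mathbf{M}^H\mathbf{Q}_{opt}\mathbf{M}$$

where the fourth-order matrix $\mathbf{Q}_{opt}$ is given by

$$\mathbf{Q}_{opt} \triangleq \sum_{m=-D}^{D} R_{hh}[m]\,\mathbf{Q}_m \approx E_h\sum_{m=-D}^{D}\mathbf{Q}_m = \frac{1}{N}\sum_{m=-D}^{D}\mathbf{Q}_m$$

and, therefore, the optimal solution is the one deduced in Section 4.2 with $\mathbf{Q}_o = \mathbf{Q}_{opt}$ instead of $\mathbf{Q}_o = \mathbf{Q}_0$. Thus, the optimal and original second-order discriminators are

$$\mathbf{M}_{opt} = \mathbf{Q}_{opt}^{-1}\mathbf{D}_r\left(\mathbf{D}_r^H\mathbf{Q}_{opt}^{-1}\mathbf{D}_r\right)^{\#}\mathbf{D}_g^H$$

$$\mathbf{M}_0 = \mathbf{Q}_0^{-1}\mathbf{D}_r\left(\mathbf{D}_r^H\mathbf{Q}_0^{-1}\mathbf{D}_r\right)^{\#}\mathbf{D}_g^H,$$

respectively.

Figure 6.2 (plot of the normalized timing error variance versus Es/No in dB, from 0 to 60 dB; curves labeled M=4, M=8, M=16 and Optimized Discriminator $\mathbf{M}_{opt}$): Timing error variance with and without loop optimization for different numbers of samples M in case of the QPSK modulation with roll-off 0.1 and Nss = 2.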
The same curves are obtained for QAM and MPSK.

If $R_{ee}[m]$ and $S_{ee}(f) \triangleq \sum_m R_{ee}[m]e^{-j2\pi fm}$ stand for the autocorrelation and the power spectrum of the error sequence $\mathbf{e}_n$ in (6.6), we can affirm that the optimal discriminator $\mathbf{M}_{opt}$ minimizes $S_{ee}(0) = \sum_m R_{ee}[m]$ whereas the original discriminator $\mathbf{M}_0$ minimizes $R_{ee}[0] = \int_{-1/2}^{1/2} S_{ee}(f)\,df$. This means that the optimal discriminator should filter out the very low-frequency errors and let the loop filter cancel out the high-frequency errors. This fact becomes relevant at high SNR because the self-noise is actually a highpass disturbance. Unfortunately, this desirable aim is severely limited by the unbiasedness constraint and only minor gains have been observed for practical SNRs, at least for the symbol synchronization problem. In Fig. 6.2 and 6.3, it is shown how the self-noise can be reduced at high SNR in case of MPSK and QAM transmissions with small roll-off pulse shaping. On the other hand, if some bias is accepted, the discriminator could adopt a more highpass response in order to reduce the ultimate variance in low-SNR scenarios following the Bayesian formulation in Chapter 3.

Figure 6.3 (plot of the normalized timing error variance versus Es/No in dB, from 0 to 60 dB; curves labeled Original Discriminator $\mathbf{M}_0$ and Optimized Discriminator $\mathbf{M}_{opt}$, each for Nss=4,6,8): Timing error variance with and without loop optimization for Nss equal to 4, 6 and 8 in case of the QPSK modulation with roll-off 0.1 and M=4.

6.1.5 Numerical Results

The carrier estimation problem was adopted in Sections 3.4 and 4.5 to illustrate the theory of second-order optimal estimation in the field of digital communications. Simulations were provided comparing the optimal solution with the classical ML-based estimators as well as the unconditional and modified CRB. To complement these results, some simulations are presented in this section for the problem of digital clock recovery.
Closed-loop timing synchronization

The optimal second-order timing estimator is compared to the GML and the low-SNR UML. The NDA Early&Late [Men97, Sec. 8.5.2], Gardner's [Gar86a] and Oerder&Meyr's [Oer88] synchronizers are also simulated because they are the most common timing synchronizers in practical implementations (Fig. 6.4 and 6.5). Notice that these three algorithms are based on the low-SNR approximation and, therefore, they suffer from self-noise at high SNR. A first-order closed loop is simulated with the (normalized) noise equivalent loop bandwidth set to $B_n = 5\times10^{-3}$ (i.e., $N = 100$ symbols). All the $E_s/N_0$ values are simulated assuming that the small-error condition holds. Finally, the CML estimator is not simulated because, in the considered scenario, there are more nuisance parameters than observed samples, i.e., $M = 4$ and $K = 11$.

The Gaussian assumption is found to yield optimal timing synchronizers for linear modulations such as QAM and MPSK at all the simulated SNRs (Fig. 6.4 and 6.5). On the other hand, simulations for the MSK modulation have shown a minor improvement for medium-to-high SNRs [Vil01b], as illustrated in Fig. 6.6. In the same plot, the optimal fourth-order detector designed in [Vil01b] is simulated, showing that higher-order methods are only able to outperform second-order techniques at high SNR.

Figure 6.4 (plot of the normalized timing error variance versus Es/No in dB, from 0 to 40 dB; curves labeled Gardner, OM, EL, low-SNR and BQUE & GML): Normalized timing variance for the low-SNR ML and GML estimators as well as the EL (Early&Late), OM (Oerder&Meyr) and Gardner's symbol synchronizers. The simulation parameters are: 16-QAM, roll-off 0.75, Nss = 2 (Nss = 4 for the Oerder&Meyr), M = 2Nss and $B_n = 5\cdot10^{-3}$. The shaping pulse and the associated matched filter are truncated at ±5T.

Open-loop timing synchronization

Some simulations are also presented in Figs.
6.7-6.12 for the open-loop timing synchronizer. The second-order minimum variance ($\mathbf{M}_{var}$) and MMSE ($\mathbf{M}_{mse}$) estimators proposed in Chapter 3 are compared with the closed-loop estimator formulated in Chapter 4. The timing is estimated from a burst of $K = 4$ symbols. Simulations are run for the 16-QAM and MSK modulations. In the first case the transmitted pulse is a square-root raised cosine with roll-off 0.75 and duration 5T. The sampling rate is twice the symbol rate, i.e., $N_{ss} = 2$. The normalized timing error is modeled as a uniform random variable in the interval $\pm\Delta/2$. Notice that in a TDMA system the range of $\Delta$ is extended to $\pm N_s$, with $N_s$ the burst duration in symbols. The reason is that we are actually dealing with a semiblind estimation problem since the symbols before and after the burst are known to be null.

Figure 6.5 (plot of the normalized timing error variance versus Es/No in dB, from 0 to 40 dB; curves labeled Gardner, OM, EL, Low-SNR UML and BQUE & GML): Normalized timing variance for the low-SNR ML and GML estimators as well as the EL (Early&Late), OM (Oerder&Meyr) and Gardner's symbol synchronizers. The simulation parameters are: 16-QAM, roll-off 0.25, Nss = 2 (Nss = 4 for the Oerder&Meyr), M = 2Nss and $B_n = 5\cdot10^{-3}$. The shaping pulse and the associated matched filter are truncated at ±5T.

In Fig. 6.7 the normalized MSE is plotted as a function of the timing error for $\Delta = 1$, $K = 4$ and $E_s/N_0 = 10$ dB. It can be seen that the MMSE estimator outperforms the minimum variance estimator because it is not forced to yield unbiased estimates within the prior range, i.e., $\pm\Delta/2 = \pm1/2$. On the other hand, the closed-loop estimator is optimized for $\tau = 0$ but its performance degrades rapidly when the timing error approaches the prior limits at $\tau = \pm\Delta/2$. The mean response of the estimators and their squared bias are shown in Fig. 6.8 and Fig. 6.9, respectively.
It is shown that bias is easily cancelled for $\Delta = 1$ since it is a small fraction of the burst duration, which is equal to 8 symbols in the simulated scenario.

Some additional conclusions can be drawn from these simulations:

• In noisy scenarios, the loss incurred by open-loop estimators becomes negligible when compared to the performance of closed-loop estimators (Fig. 6.7). On the other hand, closed-loop estimators are superior at high SNR as shown in Fig. 6.10.

• The minimum variance and MMSE open-loop estimators converge as the SNR increases for the MSK modulation (Fig. 6.11). On the other hand, self-noise is observed at high SNR for the 16-QAM constellation (Fig. 6.12). In that case, the MMSE estimator outperforms the minimum variance solution because it introduces some bias in order to reduce the self-noise variance.

• The Gaussian assumption leads to suboptimal open-loop synchronizers at high SNR (Figs. 6.11 and 6.12). Regarding the MSK simulation, the Gaussian assumption prevents self-noise free timing estimates from being obtained (Fig. 6.11).

Figure 6.6 (plot of the normalized timing error variance versus Es/No in dB, from 0 to 40 dB; curves labeled GML, CML, low-SNR ML (Gaussian), UCRB, BQUE, 4th-Order [Vil01] and MCRB): Normalized timing variance for the ML-based estimators for the MSK modulation with Nss = 2, M = 4 and $B_n = 5\cdot10^{-3}$.

Some of the results in this section were presented for the first time in the IEEE International Workshop on Statistical Signal Processing that was held in Singapore in 2001 [Vil01a]. This work was further elaborated in [Vil02b] and presented in the IEEE Global Communications Conference that was held in Taipei in 2002:

• “Best Quadratic Unbiased Estimator (BQUE) for Timing and Frequency Synchronization”. J. Villares, G. Vázquez. Proceedings of the 11th IEEE International Workshop on Statistical Signal Processing (SSP01). pp. 413-416. Singapore. August 2001.
• “Sample Covariance Matrix Based Parameter Estimation for Digital Synchronization”. J. Villares, G. Vázquez. Proceedings of the IEEE Global Communications Conference 2002 (Globecom 2002). November 2002. Taipei (Taiwan).

Figure 6.7 (plot of the normalized mean squared error versus the normalized timing error, from −2 to 2, with the prior limits ±∆/2 marked; curves labeled Mmse (MSE=0.029), Mvar (MSE=0.068) and closed-loop (MSE=0.043)): Normalized timing MSE for the MMSE, minimum variance and closed-loop second-order estimators for K = 4 and Es/No = 10 dB. The average MSE for the three estimators is included inside round brackets.

Figure 6.8 (plot of the estimator mean response versus the normalized timing error, from 0 to 2, with the prior limit ∆/2 marked; curves labeled Mvar, Mmse and closed-loop): Estimator mean response for the MMSE, minimum variance and closed-loop second-order estimators for K = 4 and Es/No = 10 dB.

Figure 6.9 (plot of the normalized squared bias versus the normalized timing error, from −2 to 2, with the prior limits ±∆/2 marked; curves labeled closed-loop (BIAS=0.036), Mmse (BIAS=0.012) and Mvar (BIAS≈0)): Normalized timing squared bias for the MMSE, minimum variance and closed-loop second-order estimators for K = 4 and Es/No = 10 dB. The average BIAS for the three estimators is included inside round brackets.

Figure 6.10 (plot of the normalized mean squared error versus the normalized timing error, from −2 to 2, with the prior limits ±∆/2 marked; curves labeled Mvar (MSE=0.015), Mmse (MSE=0.012) and closed-loop (MSE=0.040)): Normalized timing MSE for the MMSE, minimum variance and closed-loop second-order estimators for K = 4 and Es/No = 40 dB. The average MSE for the three estimators is included inside round brackets.
Figure 6.11 (plot of the normalized MSE versus Es/No in dB, from 0 to 40 dB; curves labeled Mvar, Mmse and Gaussian Assumption, with the level $\sigma_\tau^2$ marked): Normalized timing MSE for the MMSE ($\mathbf{M}_{mse}$) and minimum variance ($\mathbf{M}_{var}$) second-order estimators for the MSK modulation when K = 4 and ∆ = 1. The suboptimal estimators deduced under the Gaussian assumption are also plotted.

Figure 6.12 (plot of the normalized MSE versus Es/No in dB, from 0 to 40 dB; curves labeled Mvar, Mmse and Gaussian Assumption, with the level $\sigma_\tau^2$ marked): Normalized timing MSE for the MMSE ($\mathbf{M}_{mse}$) and minimum variance ($\mathbf{M}_{var}$) second-order estimators for the 16-QAM modulation when K = 4 and ∆ = 1. The suboptimal estimators deduced under the Gaussian assumption are also plotted.

6.2 Carrier Phase Synchronization of Noncircular Modulations

Coherent demodulation of continuous phase modulations (CPM) requires knowledge of the phase and frequency of the received carrier. Self-synchronizing techniques are normally preferred because they avoid the transmission of inefficient training sequences. Moreover, non-data-aided (NDA) algorithms are more appropriate in noisy scenarios because they do not rely on unreliable decisions but on the statistical structure of the received waveform [Men97].

In the synchronization field, Laurent's expansion (LE) has been frequently used to derive synchronization techniques for CPM receivers [Men97][Vaz00][Mor00]. The LE is interesting because it allows the non-linear CPM format to be expressed as the sum of a finite number of pulse amplitude modulated (PAM) signals [Lau86][Men95]. Thus, all the extensive literature on synchronization and parameter estimation for linearly modulated signals can be reused [Rib01b]. On the other hand, the LE allows scalable schemes to be built by considering only the most powerful components of the decomposition [Mor00].

Focusing on the carrier phase estimation problem, Mengali et al. derived in [Men97, Sec.
6.6.2] the ML NDA carrier phase synchronizer under the low-SNR assumption for MSK-type modulations (e.g., MSK, LREC, LRC, GMSK) [Men97]. The obtained solution was shown to be quadratic in the data. This is actually a unique feature of MSK-type signals because higher-order techniques are required for NDA carrier phase synchronization in case of linear modulations [Ser01][Moe94] as well as general CPM signals. Based on the LE, this property can be justified because the pseudo-symbols are not circular [Pic94] in case of MSK-type modulations and, therefore, the square of the received signal is not zero-mean and offers information about the parameter of interest [Moe94]. Finally, note that the N-th power synchronizer studied in [Moe94] can still be applied to MSK-type modulations although it will be inefficient at low SNR, as stated previously, and will not attain the Cramér-Rao bound (CRB) either when the SNR tends to infinity because CPM signals suffer from intersymbol interference (ISI).

From this background, in Section 6.2.2, the low-SNR ML estimator is reformulated using vectorial notation and Laurent's decomposition. The subsequent analysis of the low-SNR approximation at high SNR in Section 6.2.3 reveals the existence of a significant variance floor due to the so-called self-noise, that is, the variability caused by the modulation itself in NDA schemes. This floor is negligible when the observation is sufficiently long but it is dominant for short observations.

This drawback motivated the design of second-order self-noise free schemes minimizing the aggregated contribution of thermal plus pattern noise for a given SNR. The proposed second-order optimal synchronizer is deduced in Section 6.2.4 and its asymptotic study is presented in Section 6.2.5, concluding that, with partial-response signals, some data patterns make the carrier phase unidentifiable if self-noise corrupted estimates are not tolerated.
The estimator failure has been related to the singularity of the modulating matrix in partial-response signals. In any case, although the above circumstance might slow down the parameter acquisition in a closed-loop implementation, self-noise free estimates are guaranteed after convergence. To conclude, the above statements are checked via simulation in Section 6.2.6.

The results of this section were presented in the IEEE International Conference on Communications that was held in Paris in 2004 [Vil04b]:

• “Self-Noise Free Second-Order Carrier Phase Synchronization of MSK-Type Signals”, J. Villares, G. Vázquez, Proc. of the IEEE International Conference on Communications (ICC 2004). June 2004. Paris (France).

6.2.1 Signal Model

Laurent's expansion (LE) allows the representation of binary CPM signals as the sum of a few PAM waveforms [Lau86][Men97]. This transformation is adopted in this section in order to formulate carrier phase synchronizers for the nonlinear CPM format. It was shown in Section 6.1.2 that the complex envelope of the sampled CPM signal is given by

$$\mathbf{y} = e^{j\theta_o}\sum_{j=0}^{J-1}\mathbf{A}_j\mathbf{x}_j + \mathbf{w} = e^{j\theta_o}\mathbf{A}\mathbf{x} + \mathbf{w} \tag{6.7}$$

where $\theta_o$ is the unknown carrier phase that must be estimated, $\mathbf{x}_j$ are the pseudo-symbols from the $j$-th component of Laurent's expansion that contribute to the observation $\mathbf{y}$, $\mathbf{A}_j$ is the associated modulating matrix formed from the $j$-th pseudo-pulse coefficients, and $\mathbf{w}$ is the vector of AWGN samples. The $J$ components of the LE are stacked in the following manner:

$$\mathbf{x} \triangleq \left[\mathbf{x}_0^T, \ldots, \mathbf{x}_{J-1}^T\right]^T, \qquad \mathbf{A} \triangleq \left[\mathbf{A}_0, \ldots, \mathbf{A}_{J-1}\right].$$

In order to simplify the study, the following assumptions are made: 1) the receiver has perfect timing and frequency-offset synchronization; 2) the CPM modulator has reached the steady state; 3) the focus is on MSK-type signals, for which the modulation index is $h = 0.5$ and hence the carrier phase shifts are equal to $\pm\pi/2$; 4) $\theta_o \in (-\pi/2, \pi/2]$ in order to avoid the inherent ambiguity of quadratic methods [Men97].
Additionally, the study is carried out for a continuous transmission system as explained in Section 6.1.2. This point is especially relevant because some of the concluding remarks are a consequence of the continuous mode model.

6.2.2 NDA ML Estimation in Low-SNR Scenarios

In this section, the ML principle is applied to find the optimum estimator of $\theta_o$ when the SNR is asymptotically low. As stated in Section 2.3, the ML estimator is the maximizer of the following cost function:

$$f_y(\mathbf{y};\theta) = C\,E_x\left\{\exp\left(-\sigma_w^{-2}\left\|\mathbf{y} - e^{j\theta}\mathbf{A}\mathbf{x}\right\|^2\right)\right\} \propto E_x\left\{\exp\left(\sigma_w^{-2}\,\chi(\mathbf{y};\mathbf{x},\theta)\right)\right\} \tag{6.8}$$

where $C$ is an irrelevant constant, $E_x\{\cdot\}$ the expectation with respect to the pseudo-symbols distribution, $\sigma_w^2$ the variance of the noise samples and

$$\chi(\mathbf{y};\mathbf{x},\theta) \triangleq 2\,\mathrm{Re}\left(e^{-j\theta}\mathbf{x}^H\mathbf{A}^H\mathbf{y}\right) \tag{6.9}$$

is the term in the exponent of (6.8) that depends on $\theta$.

Unfortunately, the expectation with respect to $\mathbf{x}$ normally prevents obtaining a closed-form expression for $f_y(\mathbf{y};\theta)$. To overcome this obstacle, the likelihood function (6.8) is usually evaluated assuming that the SNR tends to zero, that is, $\sigma_w^2 \to \infty$. Following this reasoning, Mengali deduced in [Men97, Sec. 6.6.2] the low-SNR ML estimator of $\theta_o$ directly from the angular signal model in case of MSK-type signals. Next, an alternative deduction is provided from the vectorial model in Section 6.2.1. In contrast to [Men97, Sec. 6.6.2], the obtained ML solution is exact even if the observation is short. Notice too that [Men97, Sec. 6.6.2] approximates the squared CPM signal (averaged with respect to the data) by means of the first harmonic of its Fourier series in order to yield a low-cost implementation based on transversal filtering. It can be shown that this approximation is only exact for LREC signals, in which the frequency pulse is rectangular.
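The relation between the exponent of (6.8) and the term in (6.9) can be checked numerically. The following is a minimal sketch, assuming a hypothetical real-valued modulating matrix and an arbitrary observation (the sizes, the random seed and the helper names `chi` and `squared_distance` are illustrative and do not come from the thesis). It verifies that the squared distance equals $\|\mathbf{y}\|^2 + \|\mathbf{A}\mathbf{x}\|^2 - \chi(\mathbf{y};\mathbf{x},\theta)$, so that $\chi$ is the only term coupling the observation with $\theta$.

```python
import numpy as np

def chi(y, A, x, theta):
    """Theta-dependent term of (6.9): 2 Re(e^{-j theta} x^H A^H y)."""
    return 2.0 * np.real(np.exp(-1j * theta) * (x.conj() @ (A.conj().T @ y)))

def squared_distance(y, A, x, theta):
    """Squared Euclidean distance appearing in the exponent of (6.8)."""
    return np.linalg.norm(y - np.exp(1j * theta) * (A @ x)) ** 2

# Illustrative (assumed) dimensions and data, not taken from the thesis.
rng = np.random.default_rng(0)
M, K = 6, 3
A = rng.standard_normal((M, K))                 # hypothetical modulating matrix
x = rng.choice([1, -1, 1j, -1j], size=K)        # toy pseudo-symbols
y = rng.standard_normal(M) + 1j * rng.standard_normal(M)
theta = 0.3

lhs = squared_distance(y, A, x, theta)
rhs = np.linalg.norm(y) ** 2 + np.linalg.norm(A @ x) ** 2 - chi(y, A, x, theta)
```

Here `lhs` and `rhs` agree up to rounding error for any choice of `y`, `A`, `x` and `theta`.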
The deduction starts by expanding the logarithm of (6.8) in a Taylor series at $\sigma_w^{-2} = 0$, having that

$$\ln f_y(\mathbf{y};\theta) \simeq \frac{1}{2}\,\sigma_w^{-4}\,E_x\left\{\chi^2(\mathbf{y};\mathbf{x},\theta)\right\}$$

except for some irrelevant additive constants (see Section 2.4.1). Then, computing the above expectation, it results that

$$\ln f_y(\mathbf{y};\theta) \simeq \sigma_w^{-4}\,\mathrm{Re}\,\mathrm{Tr}\left(\mathbf{R}^H(\theta)\,\hat{\mathbf{R}}\right) \tag{6.10}$$

where the improper sample covariance matrix,

$$\hat{\mathbf{R}} \triangleq \mathbf{y}\mathbf{y}^T, \tag{6.11}$$

constitutes a sufficient statistic for the estimation of $\theta_o$ in the studied low-SNR scenario (Section 2.4.1) and

$$\mathbf{R}(\theta) \triangleq E\{\hat{\mathbf{R}}\} = e^{j2\theta}\mathbf{A}\mathbf{\Gamma}\mathbf{A}^T \triangleq e^{j2\theta}\mathbf{R}$$

stands for its expected value evaluated at $\theta$, with $\mathbf{\Gamma} \triangleq E_x\{\mathbf{x}\mathbf{x}^T\}$ the improper covariance matrix of the pseudo-symbols. Notice that the quadratic form $\mathrm{Tr}(\mathbf{R}^H\hat{\mathbf{R}}) = \mathbf{y}^T\mathbf{R}^*\mathbf{y}$ is the minimal sufficient statistic [Kay93b] in the studied low-SNR scenario. It is worth noting that it is possible to estimate the carrier phase from the second-order statistics because the pseudo-symbols $\mathbf{x}$ are not circular [Pic94] (i.e., $\mathbf{\Gamma} \neq \mathbf{0}$), in contrast to what happens in case of linear modulations.

Finally, the log-likelihood gradient is given by

$$\nabla(\mathbf{y};\theta) \triangleq \frac{\partial}{\partial\theta}\ln f_y(\mathbf{y};\theta) = 2\sigma_w^{-4}\,\mathrm{Im}\,\mathrm{Tr}\left(\mathbf{R}^H(\theta)\,\hat{\mathbf{R}}\right) \tag{6.12}$$

which is in quadrature with the likelihood function in (6.10) and vanishes for

$$\hat\theta = \frac{1}{2}\arg\mathrm{Tr}\left(\mathbf{R}^H\hat{\mathbf{R}}\right) = \frac{1}{2}\arg\left(\hat{\mathbf{x}}^T\mathbf{\Gamma}^*\hat{\mathbf{x}}\right) \tag{6.13}$$

where $\hat{\mathbf{x}} \triangleq \mathbf{A}^H\mathbf{y}$ stands for the detected pseudo-symbols at the matched filter output [Vaz00].

The existence of an analytical solution is exceptional; an iterative algorithm is normally required to search for the maximum of the log-likelihood function (e.g., in timing and carrier frequency synchronization [Vaz00][Men97]). In any case, even though the closed-form solution (6.13) is available, gradient-based algorithms (Section 2.5) allow the design of closed-loop schemes for tracking the parameter of interest in time-varying scenarios.
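The open-loop solution (6.13) can be illustrated with a few lines of code. The following is a minimal sketch under simplifying assumptions that are not part of the thesis: a hypothetical real-valued modulating matrix, real binary symbols (so that $\mathbf{\Gamma} = \mathbf{I}$) instead of the actual MSK pseudo-symbols, and a noiseless observation. With these choices the estimate is exact, so the example does not reproduce the self-noise behaviour of CPM signals; the function name `low_snr_phase_estimate` is illustrative.

```python
import numpy as np

def low_snr_phase_estimate(y, A, Gamma):
    """Open-loop estimate (6.13): 0.5 * arg(x_hat^T Gamma^* x_hat), x_hat = A^H y."""
    x_hat = A.conj().T @ y                      # matched-filter output
    return 0.5 * np.angle(x_hat @ Gamma.conj() @ x_hat)

# Toy setup (assumed for illustration only).
rng = np.random.default_rng(1)
M, K = 8, 4
A = rng.standard_normal((M, K))                 # hypothetical real pulse matrix
x = rng.choice([1.0, -1.0], size=K)             # real symbols => Gamma = E{x x^T} = I
Gamma = np.eye(K)
theta_true = 0.4                                # must lie in (-pi/2, pi/2]
y = np.exp(1j * theta_true) * (A @ x)           # noiseless observation

theta_hat = low_snr_phase_estimate(y, A, Gamma)
```

Note that `x_hat @ Gamma.conj() @ x_hat` is a bilinear form without conjugation of `x_hat`, which is what makes the statistic sensitive to twice the carrier phase.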
For such closed-loop schemes, the CRB theory (Section 2.3) guarantees that the following recursion

$$\hat\theta_{n+1} = \hat\theta_n + I^{-1}(\hat\theta_n)\,\nabla(\mathbf{y};\hat\theta_n) \tag{6.14}$$

asymptotically ($M \to \infty$) attains the CRB after convergence to the true parameter, i.e., $\hat\theta_n \simeq \theta_o$ [Kay93b]. Hence, the asymptotic variance of both the open-loop estimator in (6.13) and its closed-loop implementation in (6.14) is given by

$$\mathrm{var}(\hat\theta) \triangleq E\left\{\left(\hat\theta - \theta_o\right)^2\right\} = I^{-1} \triangleq \mathrm{CRB} \tag{6.15}$$

where

$$I \triangleq -E\left\{\frac{\partial}{\partial\theta}\nabla(\mathbf{y};\theta)\right\}_{\theta=\theta_o} = E\left\{\nabla^2(\mathbf{y};\theta_o)\right\} = 4\sigma_w^{-4}\,\mathrm{Tr}\left(\mathbf{R}^H\mathbf{R}\right) \tag{6.16}$$

stands for the Fisher's information [Kay93b] at low SNR, which is found to be independent of $\theta_o$. Notice that the second-order derivative computed in (6.16) normalizes the scoring algorithm in (6.14) to yield unbiased estimates in the small-error regime ($\hat\theta_n \simeq \theta_o$).

Remark: equation (6.15) predicts the variance of the open-loop estimator in (6.13) if and only if the asymptotic (or small-error) condition holds true and, thus, (6.13) works in the linear region of the $\arg\{\cdot\}$ function. Otherwise, (6.13) becomes biased and the CRB theory fails. For instance, at low SNR, the CRB is proportional to $\sigma_w^4$ (6.15) whereas the variance of (6.13) is limited to $\pi^2/12$, bearing in mind that $|\hat\theta| < \pi/2$ (Fig. 6.13). Nonetheless, the small-error assumption always applies at high SNR, even for short observations.

6.2.3 High-SNR Analysis: Self-noise

The main drawback of the low-SNR approximation is that it usually suffers from self-noise at high SNR when the sample is finite [Vaz00]. The reason is that the pseudo-pulses of the Laurent's expansion are not ISI-free, that is, $\mathbf{A}^H\mathbf{A} \neq \mathbf{I}_K$, even in case of full-response CPM formats such as MSK. Consequently, the variance of (6.13) presents a high-SNR floor. In this section, this floor is characterized and, afterwards, second-order self-noise free phase synchronizers are designed in Section 6.2.4.

First of all, let us compute the asymptotic variance of (6.13) for an arbitrary value of the SNR.
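This is done by evaluating the variance of (6.14) in the steady-state ($\hat\theta_n \simeq \theta_o$), obtaining, after some tedious manipulations (Appendix 6.A), that

$$\mathrm{var}(\hat\theta) = I^{-2}E\left\{\nabla^2(\mathbf{y};\theta_o)\right\} = 2\sigma_w^{-8}I^{-2}\,\mathbf{r}^H\mathbf{Q}\mathbf{r} = \frac{\mathbf{r}^H\mathbf{Q}\mathbf{r}}{8\,\mathrm{Tr}^2\left(\mathbf{R}^H\mathbf{R}\right)} \tag{6.17}$$

where $\mathbf{r} \triangleq \mathrm{vec}(\mathbf{R})$ stands for the column-wise stacking of the improper covariance matrix $\mathbf{R} = \mathbf{A}\mathbf{\Gamma}\mathbf{A}^T$, and $\mathbf{Q}$ is the fourth-order moments matrix given by

$$\mathbf{Q} = 2\,\bar{\mathbf{R}} \otimes \bar{\mathbf{R}} + \mathcal{A}\mathbf{K}\mathcal{A}^T, \tag{6.18}$$

that extends the formulation in Chapter 3 to noncircular constellations with the following set of definitions (see footnote 2):

$$\begin{aligned}
\bar{\mathbf{R}} &\triangleq E\left\{\mathbf{y}\mathbf{y}^H\right\} = \mathbf{A}\mathbf{A}^H + \sigma_w^2\mathbf{I}_M \\
\mathcal{A} &\triangleq \mathbf{A} \otimes \mathbf{A} \\
\mathbf{K} &\triangleq \tilde{\mathbf{\Gamma}} - 2\mathbf{P} \\
\mathbf{P} &\triangleq \tfrac{1}{2}\left(\mathbf{I}_{K^2} + \mathcal{K}\right) \\
\tilde{\mathbf{\Gamma}} &\triangleq E_x\left\{\mathrm{vec}(\mathbf{x}\mathbf{x}^T)\,\mathrm{vec}^H(\mathbf{x}\mathbf{x}^T)\right\} - E_x\left\{\mathrm{vec}(\mathbf{x}\mathbf{x}^T)\,\mathrm{vec}^T(\mathbf{x}\mathbf{x}^T)\right\}
\end{aligned} \tag{6.19}$$

where $\bar{\mathbf{R}}$ denotes the proper covariance matrix of the observation and $\mathcal{K}$ is the commutation matrix, which is implicitly defined as the matrix holding that $\mathrm{vec}(\mathbf{X}^T) = \mathcal{K}\,\mathrm{vec}(\mathbf{X})$ for any matrix $\mathbf{X}$ [Mag98, Sec. 3.7]. Likewise, $\mathbf{P}$ is the orthogonal projector onto the subspace that contains the vectorization of any symmetric matrix, i.e., $\mathrm{vec}(\mathbf{X})$ with $\mathbf{X} = \mathbf{X}^T$ [Mag98, Sec. 3.7]. It can be shown that both $\mathbf{r}$ and $\hat{\mathbf{r}} \triangleq \mathrm{vec}(\hat{\mathbf{R}})$ lie in this subspace. The matrix $\tilde{\mathbf{\Gamma}}$ is specific to the actual CPM format and can be calculated numerically. In case of MSK-type modulations, this task is simplified because $[\tilde{\mathbf{\Gamma}}]_{i,j} \in \{0, \pm 2\}$.

Therefore, if (6.17) is evaluated in the noiseless case, one finds that the self-noise variance causing the high-SNR floor is equal to

$$\lim_{\sigma_w^2 \to 0}\mathrm{var}(\hat\theta) = \frac{\mathbf{r}^H\mathcal{A}\tilde{\mathbf{\Gamma}}\mathcal{A}^H\mathbf{r}}{8\,\mathrm{Tr}^2\left(\mathbf{R}^H\mathbf{R}\right)}. \tag{6.20}$$

Footnote 2: The reader is warned that some notation is slightly redefined in this section. For example, $\mathbf{R}(\theta)$ is the improper covariance matrix; the conjugation is omitted in $\mathcal{A}$; matrix $\mathbf{K}$ is redefined in (6.19); and, finally, $\mathbf{Q}$ is the covariance matrix of the new sufficient statistic $\mathrm{vec}(\hat{\mathbf{R}})$ (6.11).

As stated before, the estimator would be self-noise free in the absence of ISI ($\mathbf{A}^H\mathbf{A} = \mathbf{I}_K$) because in that case $\mathcal{A}^H\mathbf{r} = \mathrm{vec}(\mathbf{\Gamma})$ and, thus, the term $\tilde{\mathbf{\Gamma}}\,\mathrm{vec}(\mathbf{\Gamma})$ in (6.20) becomes equal to zero for most noncircular modulations of interest, e.g., real-valued constellations, MSK-type signals as well as the offset QPSK format.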
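The commutation matrix and the projector onto the vectorization of symmetric matrices, as used in the definitions accompanying the fourth-order matrix $\mathbf{Q}$, can be built explicitly for small dimensions. The following is a minimal sketch; the names `commutation_matrix` and `vec` and the test dimension are illustrative assumptions, and the construction is the textbook one rather than anything specific to the thesis.

```python
import numpy as np

def vec(X):
    """Column-wise stacking of a matrix."""
    return X.reshape(-1, order="F")

def commutation_matrix(n):
    """Matrix Kc such that Kc @ vec(X) == vec(X.T) for any n x n matrix X."""
    Kc = np.zeros((n * n, n * n))
    for i in range(n):
        for j in range(n):
            # vec(X)[i + j*n] = X[i, j] must land at position j + i*n of vec(X.T)
            Kc[j + i * n, i + j * n] = 1.0
    return Kc

n = 3
Kc = commutation_matrix(n)
P = 0.5 * (np.eye(n * n) + Kc)        # projector onto vec of symmetric matrices

rng = np.random.default_rng(2)
X = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
y = rng.standard_normal(n) + 1j * rng.standard_normal(n)
r_hat = np.kron(y, y)                 # equals vec(y y^T), the improper sample statistic
```

Since `np.kron(y, y)` coincides with `vec(np.outer(y, y))`, the vectorized improper sample covariance is left unchanged by `P`, while the vectorization of any antisymmetric matrix is mapped to zero.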
In any case, the estimator is consistent for any SNR since the self-noise variance in (6.20) turns out to be proportional to $M^{-1}$ for $M \gg 1$. For example, the simulations in [Men97, Sec. 6.6.2] show that the variance curvature is hardly noticeable below SNR = 20 dB with $M = 100$ for the MSK and GMSK modulations.

6.2.4 Second-Order Optimal Estimation

The aim of this section is to deduce optimal second-order synchronization techniques for the whole SNR range. Assuming the noise variance is known (or accurately estimated), the proposed estimator will minimize the joint contribution of thermal and pattern noise, leading to the previous ML solution (6.13) when the SNR is sufficiently low and to self-noise free schemes at high SNR (Section 6.2.5).

With this purpose, let us introduce the equation of a generic second-order gradient following the structure provided by (6.12) under the low-SNR assumption:

$$\Delta(\mathbf{y};\theta) \triangleq 2\,\mathrm{Im}\left\{e^{-j2\theta}\,\mathrm{Tr}\left(\mathbf{M}^H\hat{\mathbf{R}}\right)\right\} = 2\,\mathrm{Im}\left\{e^{-j2\theta}\,\mathbf{m}^H\hat{\mathbf{r}}\right\} \tag{6.21}$$

where $\mathbf{M}$ is the matrix of coefficients that should be optimized, $\mathbf{m} \triangleq \mathrm{vec}(\mathbf{M})$ its vectorization and $\hat{\mathbf{r}} \triangleq \mathrm{vec}(\hat{\mathbf{R}})$ the vectorization of (6.11).

The value of $\theta$ for which (6.21) is null is given by

$$\hat\theta = \frac{1}{2}\arg\mathrm{Tr}\left(\mathbf{M}^H\hat{\mathbf{R}}\right) = \frac{1}{2}\arg\left(\mathbf{m}^H\hat{\mathbf{r}}\right) \tag{6.22}$$

provided that $\mathbf{m}^H\hat{\mathbf{r}} \neq 0$. Otherwise, the open-loop algorithm in (6.22) is unable to extract any phase information from this specific $\hat{\mathbf{r}}$. This fact will be studied in detail in Section 6.2.5 because it is only relevant at high SNR. For the moment, (6.22) is assumed to be "well-conditioned".

Another important remark is that the estimation problem at hand allows obtaining a closed-form solution for the zero of $\Delta(\mathbf{y};\theta)$. However, notice that the open-loop estimator proposed in (6.22) is not quadratic in the data due to the $\arg\{\cdot\}$ operator. Therefore, the estimation techniques studied in this section should be seen as a nonlinear transformation of the sample covariance matrix $\hat{\mathbf{R}} = \mathbf{y}\mathbf{y}^T$, which is only a sufficient statistic under the low-SNR approximation.

Thus, the variance of (6.22) in the small-error regime is given by

$$\mathrm{var}(\hat\theta) = J^{-2}E\left\{\Delta^2(\mathbf{y};\theta_o)\right\} = 2J^{-2}\,\mathbf{m}^H\mathbf{P}\mathbf{Q}\mathbf{m}, \tag{6.23}$$

which can be seen as a generalization of (6.17) with $J$ defined as the gradient slope at $\theta_o$, that is:

$$J \triangleq -E\left\{\frac{\partial}{\partial\theta}\Delta(\mathbf{y};\theta)\right\}_{\theta=\theta_o} = 4\,\mathrm{Re}\left\{\mathbf{m}^H\mathbf{r}\right\}. \tag{6.24}$$

Notice that $J$ plays the same role as the Fisher's information in (6.14)-(6.16), that is, it normalizes the recursion in (6.14) to yield unbiased estimates around the true parameter $\theta_o$.

The optimal coefficients are obtained by minimizing the estimator variance (6.23) subject to the above bias constraint (6.24). This optimization leads to an underdetermined system of equations, and

$$\mathbf{m} = \frac{J\,\mathbf{Q}^{-1}\mathbf{r}}{4\,\mathbf{r}^H\mathbf{Q}^{-1}\mathbf{r}} \tag{6.25}$$

is found to be the minimum-norm solution. In any case, all the solutions are found to yield the same variance, which is obtained by plugging (6.25) into (6.23) and is equal to

$$\mathrm{var}(\hat\theta) = J^{-1} = \frac{1}{8\,\mathbf{r}^H\mathbf{Q}^{-1}\mathbf{r}}. \tag{6.26}$$

Since (6.26) implies $J = 8\,\mathbf{r}^H\mathbf{Q}^{-1}\mathbf{r}$, the coefficients of the optimal estimator in (6.25) are eventually given by $\mathbf{m} = 2\mathbf{Q}^{-1}\mathbf{r}$.

Finally, note that all the above expressions reduce to the ones obtained in Section 6.2.2 under the low-SNR assumption ($\sigma_w^2 \to \infty$), taking into account that

$$\mathbf{Q}^{-1} = \frac{1}{2}\,\sigma_w^{-4}\,\mathbf{I}_{M^2} + o\left(\sigma_w^{-4}\right) \tag{6.27}$$

where $o(\sigma_w^{-4})$ gathers all the terms converging to zero faster than $\sigma_w^{-4}$.

Notice that the GML estimator has not been considered in the carrier phase estimation problem because the Gaussian assumption also implies the circularity of the nuisance parameters.

6.2.5 High SNR Study: Self-noise

This section is concerned with the high-SNR study of the optimal second-order synchronizer deduced in the last section. Although the analysis is more involved than in Section 6.2.3, closed-form expressions have been obtained, concluding that self-noise can be totally removed. Nonetheless, in the case of partial-response schemes (e.g., GMSK) the open-loop implementation (6.22) may fail when $\mathbf{m}^H\hat{\mathbf{r}} = \mathbf{m}^H\mathcal{A}\,\mathrm{vec}(\mathbf{x}\mathbf{x}^T) = 0$ in the noiseless case.
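When this happens, the carrier phase is not identifiable from this particular observation $\hat{\mathbf{r}}$. The reason for this abnormal behavior is that it is not always possible to cancel out the imaginary part of the argument (self-noise) while the real part is kept positive (6.22). For example, when the binary data symbols alternate, i.e., $\{+1, -1, +1, \ldots\}$, the 2REC modulation exhibits a constant phase equal to $\pm\pi/4$ and, thus, $\hat{\mathbf{r}} = \mathcal{A}\,\mathrm{vec}(\mathbf{x}\mathbf{x}^T)$ is strictly imaginary in the noiseless case.

A deeper analysis shows that this limitation is a consequence of the singularity of matrix $\mathbf{A}$ for partial-response modulations. However, this conclusion needs to be clarified: the singularity of $\mathbf{A}$ is due to the partial contribution from the pseudo-symbols outside the observation window in the studied SCPC system (Section 6.1.2). Therefore, only in the asymptotic case ($M \to \infty$) is this "border effect" negligible and matrix $\mathbf{A}$ effectively full-rank.

The asymptotic study of (6.26) involves the computation of $\mathbf{Q}^{-1}$ when the noise power tends to zero. Because the noiseless component of $\mathbf{Q}$ is singular (6.19), we must resort to the inversion lemma, obtaining that

$$\mathbf{m} = 2\mathbf{Q}^{-1}\mathbf{r} = \mathcal{R}^{-1}\left[\mathbf{I} - \mathbf{V}\left(2\mathbf{\Sigma}^{-1} + \mathbf{V}^H\mathcal{R}^{-1}\mathbf{V}\right)^{-1}\mathbf{V}^H\mathcal{R}^{-1}\right]\mathbf{r} \tag{6.28}$$

where $\mathcal{R} \triangleq \bar{\mathbf{R}} \otimes \bar{\mathbf{R}}$, with $\bar{\mathbf{R}} = \mathbf{A}\mathbf{A}^H + \sigma_w^2\mathbf{I}_M$ the proper covariance matrix of the observation (6.19), and $\mathbf{V}\mathbf{\Sigma}\mathbf{V}^H$ is the "economy-size" diagonalization of $\mathcal{A}\mathbf{K}\mathcal{A}^H$ (6.19), i.e., $\mathbf{\Sigma}$ only contains the non-zero eigenvalues and $\mathbf{V}$ the associated eigenvectors.

Using again the inversion lemma, the high-SNR asymptotic value of $\bar{\mathbf{R}}^{-1}$ can be expanded in terms of $\sigma_w^2$, yielding

$$\bar{\mathbf{R}}^{-1} = \sigma_w^{-2}\mathbf{P}_A^{\perp} + \mathbf{B} - \sigma_w^2\mathbf{B}^2 + O\left(\sigma_w^4\right) \tag{6.29}$$

where $O(\sigma_w^4)$ contains all the terms converging to zero as $\sigma_w^4$ or faster, $\mathbf{P}_A^{\perp} \triangleq \mathbf{I} - \mathbf{A}\mathbf{A}^{\#}$ stands for the orthogonal projector onto the orthogonal complement of the span of matrix $\mathbf{A}$, and $\mathbf{B} \triangleq (\mathbf{A}\mathbf{A}^H)^{\#}$ is introduced to compact further equations.

Thus, the limit of $\mathcal{R}^{-1}$ is straightforward from (6.29), using that

$$\mathcal{R}^{-1} = \bar{\mathbf{R}}^{-1} \otimes \bar{\mathbf{R}}^{-1}. \tag{6.30}$$

However, all the terms in (6.30) containing $\mathbf{P}_A^{\perp}$ go to zero when multiplied by $\mathbf{V}$ or $\mathbf{r}$ in (6.28), since $\mathrm{span}\{\mathbf{V}\} \subset \mathrm{span}\{\mathcal{A}\}$ and $\mathbf{r} = \mathcal{A}\,\mathrm{vec}(\mathbf{\Gamma}) \in \mathrm{span}\{\mathcal{A}\}$. Therefore, considering only the surviving terms, it is found that

$$\mathcal{R}^{-1} = \mathcal{B} - \sigma_w^2\left(\mathbf{B} \otimes \mathbf{B}^2 + \mathbf{B}^2 \otimes \mathbf{B}\right) + o\left(\sigma_w^2\right) \tag{6.31}$$

for $\sigma_w^2 \to 0$, where $\mathcal{B} \triangleq \mathbf{B} \otimes \mathbf{B} = (\mathcal{A}\mathcal{A}^H)^{\#}$.

To complete the deduction, the inversion lemma has to be used once again in order to compute the inner inverse in (6.28) because

$$\mathbf{T} \triangleq 2\mathbf{\Sigma}^{-1} + \mathbf{V}^H\mathcal{B}\mathbf{V}$$

turns out to be singular again. Precisely because of that, the second term of (6.28) becomes proportional to $\sigma_w^{-2}$ and prevails at high SNR, avoiding the variance floor. Taking this fact into account, the high-SNR asymptotic expression of $\mathbf{m}$ is given by

$$\mathbf{m} = \sigma_w^{-2}\,\mathcal{B}\mathbf{V}\mathbf{P}_T^{\perp}\mathbf{V}^H\mathcal{B}\mathbf{r} + O(1) \tag{6.32}$$

with

$$\begin{aligned}
\mathbf{P}_T^{\perp} &\triangleq \mathbf{U}^{-1}\left[\mathbf{I} - \mathbf{V}_T\left(\mathbf{V}_T^H\mathbf{U}^{-1}\mathbf{V}_T\right)^{-1}\mathbf{V}_T^H\mathbf{U}^{-1}\right] \\
\mathbf{U} &\triangleq \mathbf{V}^H\left(\mathbf{B} \otimes \mathbf{B}^2 + \mathbf{B}^2 \otimes \mathbf{B}\right)\mathbf{V}.
\end{aligned} \tag{6.33}$$

The above expression is general, no matter whether $\mathbf{A}$ is singular or not. In case $\mathbf{A}$ is full-rank, (6.32) can be simplified if the deduction is started again by decomposing the term $\mathcal{A}\mathbf{K}\mathcal{A}^H$ of $\mathbf{Q}$ as $\mathcal{A}\mathbf{V}_K\mathbf{\Sigma}_K\mathbf{V}_K^H\mathcal{A}^H$, with $\mathbf{\Sigma}_K$ the diagonal matrix having the non-zero eigenvalues of $\mathbf{K}$ and $\mathbf{V}_K$ the related eigenvectors. Thus, if $\mathbf{V}$ and $\mathbf{\Sigma}$ are redefined as $\mathbf{V} = \mathcal{A}\mathbf{V}_K$ and $\mathbf{\Sigma} = \mathbf{\Sigma}_K$, respectively, (6.32) can be written as follows:

$$\mathbf{m} = \sigma_w^{-2}\,\mathcal{A}^{\#H}\mathbf{V}_K\mathbf{P}_T^{\perp}\mathbf{V}_K^H\mathcal{A}^{\#}\mathbf{r} + O(1) \tag{6.34}$$

using that $\mathcal{A}^H\mathcal{B} = \mathcal{A}^{\#}$ and $\mathbf{T} = 2\mathbf{\Sigma}_K^{-1} + \mathbf{I}$.

At this point, it is worth understanding that the obtained solution differs from the standard CML estimator (Section 2.4.2) in that (6.34) cannot project the self-noise term onto the orthogonal subspace of $\mathbf{A}$ because they are collinear [Vaz00][Rib01b].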
Alternatively, in (6.34) the received signal $\mathbf{y}$ is passed through a zero-forcing equalizer $\mathbf{A}^{\#}$ in order to decorrelate the received pseudo-symbols [Vaz00] and, afterwards, the outer product of the detected pseudo-symbols is projected onto the matrix $\mathbf{V}_K\mathbf{P}_T^{\perp}\mathbf{V}_K^H$, whose span coincides with that of $\mathbf{P}_{\tilde\Gamma}^{\perp}$, which is the projector onto the orthogonal complement of the subspace generated by $\tilde{\mathbf{\Gamma}}$ (6.19). The key property of $\mathbf{P}_{\tilde\Gamma}^{\perp}$, inherited by $\mathbf{V}_K\mathbf{P}_T^{\perp}\mathbf{V}_K^H$, is that $\mathbf{P}_{\tilde\Gamma}^{\perp}\,\mathrm{vec}(\mathbf{x}\mathbf{x}^T)$ is real-valued for any possible vector $\mathbf{x}$.

Resuming the initial discussion, if $\mathbf{A}$ is full-rank, the zero-forcer recovers the vector of pseudo-symbols without error, i.e., $\hat{\mathbf{x}} = \mathbf{A}^{\#}\mathbf{y} = \mathbf{x}$, and $\mathbf{V}_K\mathbf{P}_T^{\perp}\mathbf{V}_K^H$ is able to eliminate the imaginary part of $\mathrm{vec}(\mathbf{x}\mathbf{x}^T)$ that causes the referred self-noise while the real part is preserved, allowing feed-forward estimation (6.22). Otherwise, the real and imaginary parts are coupled and the self-noise cancellation inevitably attenuates the real part too.

To conclude this section, the estimator variance at high SNR is given by

$$\mathrm{var}(\hat\theta) = \frac{\sigma_w^2}{4\,\mathrm{vec}^H(\mathbf{\Gamma})\,\mathcal{A}^{\#}\mathbf{V}\mathbf{P}_T^{\perp}\mathbf{V}^H\mathcal{A}^{\#H}\,\mathrm{vec}(\mathbf{\Gamma})} + o\left(\sigma_w^2\right) \tag{6.35}$$

for the general case (6.32) and reduces to

$$\mathrm{var}(\hat\theta) = \frac{\sigma_w^2}{4\,\mathrm{vec}^H(\mathbf{\Gamma})\,\mathbf{V}_K\mathbf{P}_T^{\perp}\mathbf{V}_K^H\,\mathrm{vec}(\mathbf{\Gamma})} + o\left(\sigma_w^2\right) \tag{6.36}$$

when $\mathbf{A}$ is full-rank (6.34). Notice that in both cases the estimators are consistent, i.e., the denominator increases without limit as $M$ is augmented.

[Figure 6.13 (plot not reproduced): carrier phase variance versus Es/No (dB). Curves: low-SNR approximation (M = 2, M = 4), optimal quadratic (M = 2, M = 4) and MCRB (M = 4).]

Figure 6.13: Carrier phase variance as a function of the SNR for the MSK modulation with Nss = 2. Dotted lines correspond to the high-SNR bounds computed in Sections 6.2.3 and 6.2.5.

6.2.6 Numerical Results

This section validates via simulation the theoretical results presented in this case study.
The steady-state variance of the closed-loop solution presented in equation (6.14) is evaluated for both the low-SNR approximation and the optimal second-order synchronizer deduced in Section 6.2.4 (see Fig. 6.13 and Fig. 6.14). Simulations show that the proposed solution is self-noise free even if the observation is rather short (M = 2, 4). Furthermore, the high-SNR asymptotic expressions obtained in (6.20), (6.35) and (6.36) exhibit a perfect match at high SNR. Although it is not plotted, the feedforward synchronizer derived in (6.22) was tested for the MSK modulation in Fig. 6.13, confirming that it is always self-noise free.

Surprisingly, the high-SNR estimators deduced in (6.32) and (6.34) are also exact for any SNR. The reason is that, as mentioned before, there is no noise enhancement at low SNR because (6.32) and (6.34) do not include the orthogonal projector $\mathbf{P}_A^{\perp}$.

Finally, the acquisition performance of the optimal closed-loop synchronizer is evaluated in Fig. 6.15 in order to validate its operability when $\mathbf{A}$ is singular in case of partial-response schemes (e.g., 3REC). In the same figure, the probability of "failure" has been computed as a function of the observation length for some partial-response schemes. As shown in the plot, the probability of failure decays exponentially with the observation time and the damping factor increases if the modulator memory is shortened.

[Figure 6.14 (plot not reproduced): carrier phase variance versus Es/No (dB). Curves: low-SNR approximation (M = 2, M = 4), optimal quadratic (M = 2, M = 4) and MCRB (M = 4).]

Figure 6.14: Carrier phase variance as a function of the SNR for the 3REC modulation with Nss = 4.
[Figure 6.15 (plots not reproduced): left panel, carrier phase estimate / π versus iteration n, with the true value θo marked; right panel, failure probability versus M for 2REC and 3REC. Labels in the plot: "3REC", "2REC", "SNR=60 dB".]

Figure 6.15: On the left side, 10 acquisitions for the 3REC modulation with Nss = 2 and M = 2. The SNR was set to 20 dB and the loop step-size fixed to µ = 0.01. On the right-hand side, the probability of failure for 2REC and 3REC with Nss = 2 and $\sigma_w^2 = 0$.

6.3 TOA Estimation in Multipath Scenarios

In the context of wireless, underwater or optical communications, the transmitted signal is severely distorted by the channel due to the so-called multipath propagation. In a multipath scenario, the received signal is the sum of multiple replicas of the transmitted waveform whose delays, amplitudes and phases are unknown. The resulting dispersive channel can be modeled as a finite impulse response (FIR) filter of unknown complex-valued coefficients.

In digital communications, the multipath disturbance is mitigated by implementing digital equalizers in order to prevent intersymbol interference (ISI). This topic is addressed in Section 6.4, where blind channel estimators are designed from the sample covariance matrix. In this section, we focus on the problem of radiolocation in cellular networks using range estimates from several base stations. Range information can be obtained by estimating the time of arrival (TOA), if the network is synchronous, or the time difference of arrival (TDOA) in case of an asynchronous cellular network.
Although the principle is the same as in radar and navigation applications (e.g., GPS or GALILEO), the mobile radio channel poses some additional impairments such as, for example, time- and frequency-selective fast fading, non line-of-sight (NLOS) conditions, narrowband signaling in case of second generation terminals, low Es/N0 for the received signal coming from the non-serving base stations, limited training periods for TOA estimation (e.g., GSM midamble), etc.

In this context, a lot of effort has been made to design TOA estimators robust to the multipath degradation. Some of them have been developed for satellite positioning systems (i.e., GPS, GLONASS and GALILEO) using direct sequence spread spectrum signaling, e.g., [Bra01][Sec00, Sec. 2.2.2] and references therein. All these contributions are intended for single-antenna receivers. Nonetheless, it has been proved that the use of antenna arrays is useful to mitigate multipath and also to cancel interferences [Sec00]. Actually, a multisensor receiver is able to combine direction of arrival (DOA) and TOA information in order to render more accurate position estimates. The application of these techniques to third generation cellular systems such as UMTS is rather straightforward since they share the same signal format.

On the other hand, timing recovery in narrowband systems (e.g., GSM) becomes more difficult since the time resolution is inversely proportional to the signal bandwidth and the self-noise contribution becomes critical for the working SNRs. Recall that self-noise consists of the intersymbol interference (ISI) at the timing error detector output. In spread spectrum communications, the self-noise term is negligible because it is filtered out in the despreading stage.

Some relevant contributions in the context of narrowband communications are [Chi94][Mog03] and [Fis98][Win00][Rib02]. In the first two proposals, the Gaussian assumption is adopted and non-data-aided TOA estimators are deduced. On the other hand, the last
On the other hand, the last 6.3. TOA ESTIMATION IN MULTIPATH SCENARIOS 153 three papers rely on the transmission of known training data. In all the papers, with the exception of [Rib02], the channel coefficients are deterministic unknowns that need to be estimated in a first step. An alternative approach is adopted in [Rib02] where the multipath is modeled as a random term of known first- and second-order moments. In this manner, unbiased TOA estimates are obtained trading some estimation variance following a Bayesian approach. In this section, the problem of both DA and NDA timing —and also carrier frequency-offset— estimation is studied in a multipath scenario assuming that the channel response is unknown. Optimal second-order unbiased estimators are deduced based on the channel first- and secondorder statistics following a similar approach to the one presented in [Rib02]. Some numerical results are presented for the problem of TOA estimation in a typical wireless outdoor scenario in the context of the GSM standard and the EMILY European project [Bou02a][Bou02b]. The results in this section were partially presented in the the International Zurich Seminar on Broadband Communications that was held in Zurich in 2002 [Vil02a]: • “Optimal Quadratic Non-Assisted Parameter Estimation for Digital Synchronisation”. J. Villares, G. V´ azquez. Proceedings of the International Zurich Seminar on Broadband Communications 2002 (IZS2002). pp. 46.1-46.4. Zurich (Switzerland). February 2002. 6.3.1 Signal Model Let us consider that the channel impulse response is time-invariant during the observation time (M samples). The channel low-pass equivalent impulse response within the receiver bandwidth W is given by L−1 1 h(t) = h(k/W ) sinc(W t − k) W (6.37) k=0 with L/W the effective duration of the channel [Pro95, Sec. 5-1, Ch. 14]. Hereafter, the bandwidth W is set to 2/T (100% excess of bandwidth) in order to admit the majority of bandpass modulations. 
The channel taps $h(k/W)$ will be modeled as zero-mean complex Gaussian variables, with their envelope and phase following a Rayleigh and a uniform distribution, respectively. The Rayleigh distribution is adopted hereafter because it corresponds to a worst-case situation. In any case, it is possible to assume a Ricean distribution for the first coefficient of $h(k/W)$ in order to take into account the line-of-sight (LOS) component [Gre92].

From the above considerations, the complex envelope of the received signal at the sampler output can be written as

$$y(mT_s) = \sum_{i=-\infty}^{\infty} d_i \sum_{k=0}^{L-1} h(k/W)\,e^{j2\pi\nu m/N_{ss}}\,p\left(mT_s - k/W - iT - \tau T\right) + w(mT_s) \tag{6.38}$$

where $T_s$ is the sampling period, $\{d_i\}$ is the sequence of transmitted symbols, $\nu$ and $\tau$ are the frequency and timing errors normalized with respect to the symbol time $T = N_{ss}T_s$, $p(t)$ is the shaping pulse, and $w(t)$ the stationary AWGN term.

Following the guidelines in Section 6.1.2, the above formula can be expressed in vectorial form as follows:

$$\mathbf{y} = \mathbf{A}(\lambda)\mathbf{H}\mathbf{d} + \mathbf{w} = \mathbf{A}(\lambda)\mathbf{x} + \mathbf{w} \tag{6.39}$$

where $\lambda$ stands for either the timing or frequency error and the columns of $\mathbf{H}$ are $1/W$-seconds delayed versions of the channel impulse response $h(k/W)$.

It is really important to realize that the inclusion of a random channel $h(t)$ yields directly the same model as in (6.3), except that the proposed estimators will have to cope with the extended, correlated vector of symbols $\mathbf{x} \triangleq \mathbf{H}\mathbf{d}$. Therefore, the channel has two negative effects:

1. Firstly, the unknown vector of symbols $\mathbf{x}$ is about $WT$ times longer than the vector of transmitted symbols $\mathbf{d}$. The increase in the number of nuisance parameters will establish a limit on the variance of blind estimators when dealing with a time dispersive channel.

2. Secondly, the channel modifies the covariance of the transmitted symbols in the following way:

$$\mathbf{\Gamma} \triangleq E\left\{\mathbf{x}\mathbf{x}^H\right\} = E_H\left\{\mathbf{H}\mathbf{H}^H\right\}$$

assuming again uncorrelated symbols.
Notice that the Gaussian assumption is verified in case of constant amplitude modulations such as MPSK or CPM. On the other hand, the received symbols are not strictly Gaussian when the transmitted signal is a multilevel modulation such as QAM or APK. However, it can be shown that in that case the Gaussian assumption yields practically optimal second-order estimators for any SNR.

6.3.2 Optimal Second-Order NDA Estimator

The optimal second-order estimator of $\lambda$ from the above signal model is deduced in the following lines. First, the covariance matrix $\mathbf{R}(\lambda)$ and the fourth-order matrix $\mathbf{Q}(\lambda)$ are calculated, having that

$$\begin{aligned}
\mathbf{R}(\lambda) &= \mathbf{A}(\lambda)\mathbf{\Gamma}\mathbf{A}^H(\lambda) + \mathbf{R}_w \\
\mathbf{Q}(\lambda) &= \mathbf{R}^*(\lambda) \otimes \mathbf{R}(\lambda) + \mathcal{A}(\lambda)\mathbf{K}\mathcal{A}^H(\lambda)
\end{aligned} \tag{6.40}$$

where the kurtosis matrix $\mathbf{K}$ is diagonal in the presence of multipath because $\mathbf{x} = \mathbf{H}\mathbf{d}$ is always circular even if the transmitted symbols $\mathbf{d}$ are not. Moreover, $\mathbf{K}$ is strictly zero for any constant amplitude modulation and admits a simple form in case of a linear circular modulation. Taking into account that the kurtosis matrix of the transmitted symbols,

$$E_d\left\{\mathrm{vec}(\mathbf{d}\mathbf{d}^H)\,\mathrm{vec}^H(\mathbf{d}\mathbf{d}^H)\right\} - \mathrm{vec}(\mathbf{I})\,\mathrm{vec}^H(\mathbf{I}) - \mathbf{I},$$

is equal to $(\rho - 2)\,\mathrm{diag}(\mathrm{vec}(\mathbf{I}))$ in case of circular constellations (3.12), it follows that the diagonal entries of $\mathbf{K}$ are

$$[\mathbf{K}]_{k,k} = 2(\rho - 1)\sum_{i=-\infty}^{\infty} PDP^2(k/W - iT) \tag{6.41}$$

where

$$PDP(t) \triangleq \begin{cases} E\left\{|h(t)|^2\right\} & 0 \le t < L/W \\ 0 & \text{otherwise} \end{cases}$$

stands for the channel power delay profile (PDP). Notice that (6.41) vanishes in case of constant amplitude modulations ($\rho = 1$). Thus, the Gaussian assumption in a multipath scenario applies for an important class of modulations, whereas it is not verified in case of Gaussian distributed symbols ($\rho = 2$).

In some circumstances, the channel taps are uncorrelated and, if $WT$ is an integer number, $\mathbf{\Gamma}$ is a diagonal matrix with entries:

$$[\mathbf{\Gamma}]_{k,k} = \sum_{i=-\infty}^{\infty} PDP(k/W - iT)$$

In this uncorrelated scattering (US) scenario, the channel PDP conveys all the statistical information about the channel.
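The diagonal entries of $\mathbf{\Gamma}$ in the uncorrelated scattering case can be computed directly from the sampled PDP. The following is a minimal sketch, assuming that $WT$ is an integer `N`, that the PDP is given through its `L` samples `pdp_taps[n] = PDP(n/W)`, and that the window covers `K` entries of the vector of received symbols; the function names `gamma_diagonal` and `channel_matrix` and all numerical values are illustrative. The second function builds one realization of a matrix whose columns are delayed copies of the channel taps, so that the diagonal of $\mathbf{H}\mathbf{H}^H$ matches the same sum with the PDP replaced by the squared taps $|h(n/W)|^2$.

```python
import numpy as np

def gamma_diagonal(pdp_taps, N, K):
    """[Gamma]_kk = sum_i PDP(k/W - iT), with T = N/W and PDP zero outside its L taps."""
    pdp_taps = np.asarray(pdp_taps, dtype=float)
    # tap index n = k - i*N runs over all n in [0, L) congruent to k modulo N
    return np.array([pdp_taps[k % N::N].sum() for k in range(K)])

def channel_matrix(h, N, K):
    """K-row matrix with H[k, c] = h[k - i*N], one column per symbol i reaching the window."""
    L = len(h)
    i_min = -((L - 1) // N)        # earliest symbol whose tail still reaches row 0
    i_max = (K - 1) // N           # latest symbol starting inside the window
    symbols = range(i_min, i_max + 1)
    H = np.zeros((K, len(symbols)), dtype=complex)
    for c, i in enumerate(symbols):
        for n, tap in enumerate(h):
            k = n + i * N
            if 0 <= k < K:
                H[k, c] = tap
    return H

# Illustrative values (assumed): W*T = 2, five channel taps, window of eight entries.
rng = np.random.default_rng(3)
N, L, K = 2, 5, 8
h = rng.standard_normal(L) + 1j * rng.standard_normal(L)   # one channel realization
H = channel_matrix(h, N, K)
diag_from_H = np.real(np.diag(H @ H.conj().T))
diag_from_formula = gamma_diagonal(np.abs(h) ** 2, N, K)
flat_fading = gamma_diagonal([1.0], N, 6)                   # single-tap profile
```

For the single-tap profile the result is non-zero only at the multiples of $WT$, which is the flat fading behaviour one would expect from the model.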
Then, assuming that the channel PDP is known or accurately estimated, optimal second-order synchronizers can be built for the studied scenario using the framework provided in Chapters 3 and 4. Moreover, in some situations a limited set of parameters is sufficient to describe the PDP function completely. For example, sometimes the mobile radio channel is correctly modeled by adopting a (decreasing) exponential PDP [Gre92] such as the following:

$$PDP(t;\sigma) = C\exp\left(-\frac{t}{\sigma}\right) \tag{6.42}$$

where $\sigma$ is the so-called delay spread and $C$ is a normalization constant forcing $\mathrm{Tr}\{\mathbf{\Gamma}\} = K$, with $K$ the length of $\mathbf{x}$. Depending on the channel delay spread, two asymptotic situations can be studied:

1. Flat fading channel ($\sigma \to 0$):

$$[\mathbf{\Gamma}]_{k,k} = \begin{cases} 1 & k \text{ multiple of } WT \\ 0 & \text{otherwise} \end{cases} \tag{6.43}$$

and hence (6.39) reduces to the ideal channel case with $\mathbf{x} = \mathbf{d}$. In that case, the channel only changes the distribution of the received symbols.

2. Highly frequency-selective channel ($\sigma \to \infty$):

$$\mathbf{\Gamma} = \frac{1}{WT}\,\mathbf{I}_K \tag{6.44}$$

and, therefore, the channel is implicitly increasing the vector of received symbols $\mathbf{x}$ in (6.39) by a factor of $WT$, as well as changing their distribution. Notice that this expansion may require oversampling the received signal in order to guarantee that the matrix $\mathbf{A}(\lambda)$ is tall, i.e., it has more rows (received samples) than columns (unknown symbols). Otherwise, the estimator variance will exhibit a high-SNR floor because the self-noise term cannot be cancelled. This fact forces the designer to ensure that $N_{ss} > WT$.

6.3.3 Optimal Second-Order DA Estimator

Thus far, the vector of transmitted symbols $\mathbf{d}$ has been assumed unknown at the receiver side and, therefore, NDA estimators are required. Next, the optimal DA estimator is formulated assuming that $\mathbf{d}$ is deterministic (i.e., a training sequence) but the channel $\mathbf{H}$ is still unknown. In that case, the received symbols $\mathbf{x} = \mathbf{H}\mathbf{d}$ are zero-mean random variables and, hence, second-order methods are necessary once more.
In order to deduce the optimal second-order estimator, the signal model in (6.39) must be modified in the following way:

$$\mathbf{y} = \mathbf{A}(\lambda)\mathbf{D}\mathbf{h} + \mathbf{w} \tag{6.45}$$

with $[\mathbf{h}]_k = h(k/W)$ the k-th tap of the unknown channel and $\mathbf{D}$ the matrix stacking the known transmitted symbols $\{d_i\}$ in such a way that $\mathbf{D}\mathbf{h} = \mathbf{H}\mathbf{d}$.

At this point, the optimal second-order estimator of $\lambda$ is straightforward from the above signal model, with $\mathbf{h}$ the vector of Gaussian nuisance parameters. It only remains to compute the covariance matrix $\mathbf{R}(\lambda)$ and the fourth-order matrix $\mathbf{Q}(\lambda)$ for the problem at hand, obtaining that

$$\begin{aligned}
\mathbf{R}(\lambda) &= \mathbf{A}(\lambda)\mathbf{D}\mathbf{\Gamma}_h\mathbf{D}^H\mathbf{A}^H(\lambda) + \mathbf{R}_w \\
\mathbf{Q}(\lambda) &= \mathbf{R}^*(\lambda) \otimes \mathbf{R}(\lambda)
\end{aligned}$$

where $\mathbf{\Gamma}_h \triangleq E\{\mathbf{h}\mathbf{h}^H\}$ and $\mathbf{Q}(\lambda)$ is computed taking into account that $\mathbf{h}$ is normally distributed. In case of uncorrelated scattering, $\mathbf{\Gamma}_h$ becomes diagonal with entries $[\mathbf{\Gamma}_h]_{k,k} = PDP(k/W)$.

6.3.4 Numerical Results

The results in this section were applied to devise multipath-resistant TOA estimators in the context of the EMILY project [Bou02b]. The aim of this project was the integration of positioning measurements from the GPS and GSM networks. In the second case, the spatial accuracy is severely degraded due to the multipath propagation. In this kind of scenario, the proposed NDA and DA estimators are robust against the multipath, providing unbiased TOA estimates. In both cases, the channel PDP and the noise variance are measured off-line.

For the sake of simplicity, an uncorrelated Rayleigh channel having an exponential PDP is considered. Different delay spreads are simulated and the channel is varied in time according to the Jakes' Doppler spectrum [Gre92], although this information is not exploited by the estimator. Finally, the GMSK modulation from the GSM standard as well as the MPSK and MSK modulations are considered in the simulations.
To estimate the timing error $\tau$, the received bandpass signal is filtered to a bandwidth $W = 2/T$ and, afterwards, the I and Q components are generated and sampled taking $N_{ss} = 4$ samples per symbol. A first-order closed loop is implemented to estimate and track the TOA of the user of interest. The optimal second-order NDA discriminator is considered with $M = 8$ the number of input samples. The variance at the discriminator output is computed as a function of the SNR. Notice that this variance is further reduced by the loop filter.

Simulations fit well with the theoretical performance obtained in Chapter 4, where it is shown that the minimum variance for any quadratic unbiased timing detector is given by

$$\mathrm{VAR}(\tau) = \frac{1}{\mathbf{d}_r^H(\tau)\,\mathbf{Q}^{-1}(\tau)\,\mathbf{d}_r(\tau)} \tag{6.46}$$

with

$$\mathbf{d}_r(\tau) \triangleq \mathrm{vec}\left(\frac{d\mathbf{R}(\tau)}{d\tau}\right) = \mathrm{vec}\left(\frac{d\mathbf{A}(\tau)}{d\tau}\mathbf{A}^H(\tau) + \mathbf{A}(\tau)\frac{d\mathbf{A}^H(\tau)}{d\tau}\right)$$

and $d\mathbf{A}(\tau)/d\tau$ the matrix whose columns are $1/W$-delayed versions of the shaping pulse derivative, i.e., $dp(t)/dt$. Notice that the estimator variance in (6.46) becomes independent of the actual value of the parameter $\tau$.

The first simulation in Fig. 6.16 is carried out for the 16-QAM modulation. As commented before, the number of significant nuisance parameters grows with the delay spread $\sigma$. In Fig. 6.16, the estimator is unable to cope with the self-noise enhancement and exhibits a variance floor at high SNR. This degradation is rapidly observed even for very small values of $\sigma$ (e.g., $\sigma = T/10$). If $\sigma$ is slightly augmented ($\sigma = T/4$), this degradation is also observed at low SNR. In the limit ($\sigma \to \infty$), the number of nuisance parameters is multiplied by $WT = 2$.

[Figure 6.16 (plot not reproduced): variance versus SNR (dB). Curves: σ = 5T, σ = T, σ = T/4, σ = T/10, σ = T/20, and ideal channel & σ = 0.]

Figure 6.16: TOA estimation variance as a function of the SNR for 16-QAM symbols. The simulation parameters are Nss = 4 and M = 8. The transmitted pulse is a square-root raised cosine truncated at ±4T (100% roll-off).
Notice that the loss in timing accuracy caused by the channel is extremely important in the case of QAM-modulated signals. On the other hand, the maximum loss with constant-modulus constellations such as MPSK, MSK and LREC is bounded and occurs when the delay spread approaches the symbol time (Figs. 6.17-6.20). This loss is manifested first at high SNR, as was observed in the QAM simulations (Fig. 6.16). The estimator is found to be self-noise free for the MPSK and MSK modulations whatever the channel delay spread. For the 2REC and 3REC modulations, self-noise can be eliminated by increasing the observation time. Regarding the Gaussian assumption, it always holds for the QAM modulation (Fig. 6.16). It also applies to any constant-modulus modulation (e.g., MPSK and CPM) in the presence of a fading channel (Figs. 6.17-6.20).

Figure 6.17: TOA estimation variance as a function of the SNR for MPSK symbols. The simulation parameters are $N_{ss} = 4$ and $M = 8$. The transmitted pulse is a square-root raised cosine truncated at $\pm 4T$ (100% roll-off). Curves: $\sigma = 0$, $\sigma = \infty$, ideal channel, and Gaussian assumption (ideal channel).

Figure 6.18: TOA estimation variance as a function of the SNR (dB) for MSK symbols. The simulation parameters are $N_{ss} = 4$ and $M = 8$. Curves: $\sigma > T$, $\sigma = 0$, ideal channel, and Gaussian assumption (ideal channel).

Figure 6.19: TOA estimation variance as a function of the SNR (dB) for the 2REC modulation. The simulation parameters are $N_{ss} = 4$ and $M = 8$. Curves: $\sigma > T$, $\sigma = 0$, ideal channel, and Gaussian assumption (ideal channel).
Figure 6.20: TOA estimation variance as a function of the SNR (dB) for the 3REC modulation. The simulation parameters are $N_{ss} = 4$ and $M = 8$. Curves: $\sigma > T$, $\sigma = 0$, ideal channel, and Gaussian assumption (ideal channel).

6.4 Blind Channel Identification

In some scenarios, the transmission channel is frequency-selective, causing intersymbol interference (ISI) at the matched filter output [Pro95]. Most of the time the channel response is not known a priori and the receiver has to identify the channel in order to cope with this ISI. This task is mandatory if the channel response is time-variant, as happens in wireless communications. In that case, adaptive techniques have to be developed to track the channel evolution. On the other hand, in a given access network, the subscribers have different channel responses and, thus, their equipment is supposed to configure itself when it is plugged into the network for the first time.

In most standards, some training or pilot symbols are transmitted periodically to facilitate the receiver synchronization and the channel identification. The use of training sequences reduces the system efficiency, especially when the channel varies in time. This inconvenience has long motivated the study of blind channel estimation and equalization techniques. The pioneering work was authored by Y. Sato [Sat75] and was further developed by Godard [God80], Treichler et al. [Tre83], Benveniste et al. [Ben84], Picchi et al. [Pic87], Shalvi et al. [Sha90], Giannakis et al. [Gia89], Nikias [Nik92] and Sala [Sal97], among others. All these methods exploit the higher-order moments of the received signal in the belief that non-minimum phase channels were not identifiable from second-order techniques (see footnote 3). This idea was refuted in the revolutionary paper by Tong et al.
[Ton91] where the authors proved that the channel response can be identified from the second-order moments if the received signal is cyclostationary and multiple samples per symbol are taken from the channel output. This new perspective is rooted in the fractionally-spaced equalizer proposed by Ungerboeck in 1976 [Ung76]. In that paper, oversampling was proposed as a means of improving the equalizer performance in the presence of timing errors. In any case, the main advantage of second-order methods is that they converge faster than higher-order methods. The original paper was further simplified by Moulines et al. in [Mou95] and studied in [Ton94][Ton95][Tug95] from different points of view. All these channel estimators are subspace methods based on the eigendecomposition of the sample covariance matrix.

A different perspective was introduced by Giannakis et al. [Gia97] and Zeng et al. [Zen97a], in which the asymptotic (large sample) best quadratic unbiased channel estimator is formulated from the cyclic spectrum or the cyclic correlation, respectively. Additionally, a hybrid method including subspace constraints is proposed in [Zen97a]. The resulting estimator is shown to encompass most second-order methods in the literature [Gia97][Liu93][Mou95][Sch94][Ton95]. Some asymptotic studies are also supplied in [Zen97b].

Footnote 3: A system is minimum phase if all the zeros of its transfer function are inside the unit circle. This implies that the inverse system is realizable.

In this chapter, the best quadratic unbiased estimator is deduced for a finite observation. The proposed estimator exploits the knowledge of the pulse shaping as well as the statistics of the transmitted discrete symbols. It is found that the optimal solution is able to estimate the channel amplitude in case of constant-modulus constellations such as MPSK or CPM. On the other hand, the amplitude is ambiguous under the Gaussian assumption.
This contribution actually complements the results in [Gia97][Zen97a][Zen97b].

6.4.1 Signal Model

This section is based on the signal model presented in Section 6.3.1. The aim is now to estimate the channel impulse response $h(k/W)$ in (6.37). The vector of parameters is given by

$$ \theta \triangleq [\mathrm{Re}\{h_0\}, \ldots, \mathrm{Re}\{h_{L-1}\}, \mathrm{Im}\{h_0\}, \ldots, \mathrm{Im}\{h_{L-1}\}]^T $$

with $h_k \triangleq h(k/W)$ the $k$-th tap of the channel. The received waveform is the superposition of $L$ replicas of the transmitted pulse $p(t)$,

$$ \sum_{k=0}^{L-1} h_k\, p(t - k/W) \tag{6.47} $$

which, if sampled every $T_s = T/N_{ss}$ seconds, yields the following transfer matrix $A(\theta)$:

$$ A(\theta) = \sum_{k=0}^{L-1} h_k\, B(k/W) $$

where

$$ [B(\tau)]_{m,i} = p(mT_s - \tau - iT), \qquad m = 0, \ldots, M-1, \quad i = 0, \ldots, I-1 $$

is the matrix performing the convolution with the delayed shaping pulse $p(t - \tau)$ and $I$ is the number of observed symbols.

Notice that the proposed signal model can be applied in spread spectrum communications with $p(t)$ the known signature and $h_k^*$ the weight associated with the $k$-th finger in a RAKE receiver. The optimality of the RAKE receiver is guaranteed if the $L$ fingers are uncorrelated, which means that $p(t)$ is spectrally white and the channel taps $h_k$ are uncorrelated as well.

In the context of narrowband communications, the identification of the channel impulse response $h(k/W)$ is required to implement fractional equalizers [Ung76]. On the other hand, if the estimated channel is later employed to implement the maximum likelihood sequence estimator (MLSE) [For72][Pro95, Sec. 5-1-4] and detect the sequence of transmitted symbols without incurring noise enhancement, the objective is to estimate the complex channel response at the matched filter output sampled at one sample per symbol, that is,

$$ \alpha_n = \sum_{k=0}^{L-1} h_k\, g(nT - k/W) \approx \sum_{k=0}^{L-1} h_k\, \mathrm{sinc}\left(n - \frac{k}{WT}\right) $$

where $g(t) \triangleq \int p(\tau)\, p(t + \tau)\, d\tau$ stands for the shaping pulse at the matched filter output and the last approximation holds with equality if $g(t)$ is an ideal Nyquist pulse without truncation.
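The mapping from the fractionally spaced taps $h_k$ to the symbol-rate response $\alpha_n$ can be sketched numerically under the sinc approximation above. The tap values and the helper names `sinc` and `symbol_rate_response` are hypothetical and serve only to illustrate the sum; $WT = 2$ matches the receiver bandwidth $W = 2/T$ used in this chapter.

```python
import math

def sinc(x):
    """Normalized sinc: sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def symbol_rate_response(h, wt, num_symbols):
    """alpha_n ~= sum_k h_k sinc(n - k / (W T)), for n = 0, ..., num_symbols - 1.

    h  : complex taps h_k = h(k/W), spaced 1/W seconds apart
    wt : the product W*T (number of taps per symbol period)
    """
    return [
        sum(hk * sinc(n - k / wt) for k, hk in enumerate(h))
        for n in range(num_symbols)
    ]

# Hypothetical L = 6 taps at spacing T/2 (W T = 2), spanning N = 3 symbols.
h = [1.0, 0.3 - 0.1j, 0.5j, 0.05, -0.25, 0.1j]
alpha = symbol_rate_response(h, wt=2, num_symbols=3)
for n, a in enumerate(alpha):
    print(f"alpha_{n} = {a.real:+.4f} {a.imag:+.4f}j")
```

When only the taps at integer multiples of $T$ are non-zero, the sinc kernel samples vanish at all other symbol instants and $\alpha_n$ reduces to those taps; taps at half-symbol offsets leak into several $\alpha_n$.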
Although it is not strictly necessary, hereafter $WT$ is assumed to be an integer for the sake of simplicity. In that case the vector of real parameters becomes

$$ \alpha \triangleq [\mathrm{Re}\{\alpha_0\}, \ldots, \mathrm{Re}\{\alpha_{N-1}\}, \mathrm{Im}\{\alpha_0\}, \ldots, \mathrm{Im}\{\alpha_{N-1}\}]^T = G\theta, $$

which is a linear transformation of $\theta$ given by the matrix $G \triangleq I_2 \otimes T$ with $N \triangleq L/(WT)$ and

$$ [T]_{n,k} \triangleq g(nT - k/W) \approx \mathrm{sinc}\left(n - \frac{k}{WT}\right). $$

Taking now into account that the estimator is invariant under any linear transformation, the optimal second-order estimator of $\alpha$ is directly given by $\hat\alpha = G\hat\theta$. Therefore, if an unbiased estimator of $\alpha = G\theta$ is sought, it has to guarantee that $M^H D_r = I_{2N}$ and, hence, the IPI-free solution stated in (4.12) must be adopted (see Section 4.4).

Once the signal model is identified, the procedure for deducing the optimal second-order estimator is systematic and consists in finding the set of constituent matrices in (4.12) for the problem at hand. Regarding the matrix of derivatives $D_r$, the first coefficient $h_0$ is assumed to be real valued in order to resolve the phase ambiguity of second-order algorithms. The matrix $D_r$ is then built by stacking the derivatives of the vectorized covariance matrix $r(\theta) = \mathrm{vec}(R(\theta))$ with respect to the real and imaginary parts of the complex coefficients $h_k$:

$$ \partial r(\theta)/\partial\, \mathrm{Re}\{h_k\} = \mathrm{vec}\left(B(k/W)\, A^H(\theta) + A(\theta)\, B^H(k/W)\right) $$

$$ \partial r(\theta)/\partial\, \mathrm{Im}\{h_k\} = j\, \mathrm{vec}\left(B(k/W)\, A^H(\theta) - A(\theta)\, B^H(k/W)\right) $$

for $k = 0, \ldots, L-1$.

As stated in [Zen97a, Theorem 2], the channel is identifiable if the channel Z-transform $H(z) \triangleq \sum_{k=0}^{L-1} h_k z^{-k}$ can be decomposed into $N_{ss}$ subchannels having different reciprocal zeros (see footnote 4). When this condition does not hold, $D_r$ is singular for this channel realization (see Section 4.3). Notice that this condition is weaker than the usual identifiability condition [Ton95][Tug95].

6.4.2 Numerical Results

The simulated channel spreads over $N = 3$ symbol periods. The transmitted pulse $p(t)$ is a square-root raised cosine of roll-off $r$ truncated to last 8 symbols.
The channel taps $h_k$ are generated as independent zero-mean Gaussian variables of unit variance. The receiver bandwidth is set to $W = 2/T$ to encompass any roll-off factor. Consequently, the number of taps is $L = NWT = 6$. The observation window is set to $M = 18$ samples and the received signal is oversampled taking $N_{ss} = 3$ samples per symbol. Finally, the transmitted symbols are QPSK.

Footnote 4: $z_0$ is a reciprocal zero of $H(z)$ if $H(z_0) = H(z_0^{-1}) = 0$.

Figure 6.21: Performance of the second-order ML-based estimators (Low-SNR approximation, Conditional ML and Gaussian ML) and the optimal solution provided in Chapter 4, shown as normalized variance versus SNR (dB). The simulation parameters are $r = 0.35$, $\mu = 0.02$ ($B_n = 5 \cdot 10^{-3}$). Curves: CML, Low-SNR ML, GML, BQUE, UCRB and MCRB.

The figure of merit computed in this section is the normalized variance of the estimator $\hat\alpha$ in the steady state. The expected value with respect to the random channel is computed in order to obtain the average performance of the estimator. Thus, the channel estimator variance is defined in the following way:

$$ VAR \triangleq E_\theta\left\{ \frac{E\left\{\|\hat\alpha(n) - \alpha\|^2\right\}}{\|\alpha\|^2} \right\} $$

where the expectation with respect to $\theta$ is approximated by averaging over 100 random channels. The above figure of merit is plotted as a function of the signal-to-noise ratio. The noise variance $\sigma_w^2$ is adjusted at each realization to maintain the SNR, since the received power depends on the actual channel response.

The ML-based estimators discussed in Chapter 3 and the corresponding bounds are evaluated and compared with the optimal second-order estimator formulated in Chapter 4. In all cases, a closed-loop scheme is implemented with its bandwidth adjusted to guarantee the small-error
condition for the simulated SNR range.

Figure 6.22: Comparison of the BQUE and GML estimators (normalized variance versus SNR in dB) when the channel amplitude is estimated too and the transmitted symbols are QPSK, 64-QAM or Gaussian distributed, for a roll-off factor equal to 1. Curves: UCRB, MCRB, 64-QAM (GML and BQUE), QPSK (GML) and QPSK (BQUE).

Suboptimal Algorithms comparison: Low-SNR ML, CML, GML

Using the CML method, the channel can be determined up to a constant complex factor [Car00]. For comparison, all the methods assume that the value of the first coefficient is 1. Fig. 6.21 points out that the low-SNR approximation suffers from a severe high-SNR floor due to the self-noise contribution. On the other hand, the CML criterion is shown not to be useful for channel estimation because its variance is extremely high within the range of operative SNRs. In contrast, the Gaussian model is shown to be appropriate for building good estimators of the channel response. Only at high SNR is the exploitation of the discrete distribution of the symbols found to improve the estimator accuracy.

The UCRB is also depicted, showing that it is a valid lower bound for the performance of second-order techniques. Nonetheless, in Fig. 6.21 the UCRB is shown to be slightly optimistic in high-SNR scenarios. Finally, the MCRB predicts the theoretical performance that data-aided schemes would attain compared with second-order blind estimators. Clearly, the insertion of pilot symbols notably improves the estimator performance for any SNR. Additionally, another important advantage of DA methods is that they do not exhibit outliers at low SNR because
the estimator is a linear transformation of the parameters.

Figure 6.23: Comparison of the BQUE and GML estimators (normalized variance versus SNR in dB) if the channel is multiplicative, using different values of the roll-off factor. Curves: GML for $r = 0$, $r = 1/3$ and $r = 1$; BQUE for $r = 0, 0.35, 1$.

Channel amplitude estimation: GML vs BQUE

In Fig. 6.22, the amplitude of the first channel tap is estimated too. The estimator variance exhibits a severe floor at high SNR due to the self-noise unless the symbols are drawn from a constant-modulus constellation such as M-PSK or CPM. The floor level is inversely proportional to the observation time, which means that the estimator is consistent for any SNR, and is related to the amplitude dispersion of the constellation.

In order to clarify these conclusions, the high-SNR asymptotic variances of the GML and the optimal quadratic estimator are deduced in Appendix 6.B when the transmitted signal is linearly modulated and the channel is multiplicative (Fig. 6.23). The asymptotic expressions obtained therein predict exactly the aforementioned floor, showing its dependence on the constellation fourth-order moment $\rho$ and the number of observed symbols.

Regarding the QPSK simulation ($\rho = 1$), the BQUE asymptotic variance is inversely proportional to the SNR whereas the GML performance degrades at high SNR (Fig. 6.22). The underlying reason is the poor estimation of the channel amplitude, as depicted in Fig. 6.23. In this figure, the GML suffers a transitory floor because the Gaussian assumption fails gradually as the SNR is increased.

6.5 Angle-of-Arrival (AoA) Tracking

The classical approach in array signal processing considers that the sources are deterministic unknowns (conditional model) or, alternatively, Gaussian random variables (unconditional model) [Sto90a][Ott93].
As a consequence of the Central Limit Theorem, the Gaussian assumption provides optimal second-order DOA trackers regardless of the actual distribution of the sources if the number of sensors is asymptotically large [Sto89] or the SNR is asymptotically low, as studied in Chapter 7. However, in the context of mobile communications, the array size is limited and the above asymptotic condition is unrealistic. In these scenarios, taking into account the discrete distribution of the transmitted signals yields a significant improvement in terms of tracking variance when two or more sources transmit from a similar DOA, even if the SNR is moderate.

Notice that this improvement is not obtained by exploiting the signal cyclostationarity [Gar88b][Sch89][Xu92][Rib96] because we consider that all the users transmit using the same modulation and, thus, share the same cyclostationarity. Nonetheless, it would be straightforward to incorporate this information if the received signal were oversampled as indicated in Section 6.1.

From this background, the next subsection sketches the formulation of the optimal second-order DOA tracker when the transmitted signals are digitally modulated. The performance of the resulting estimator constitutes the lower bound for the variance of any sample covariance based DOA estimator, including the ML [Sto90a][Ott93] and subspace-based methods such as Pisarenko's method [Pis73], MUSIC [Sch79][Bie80][Sto89], ESPRIT [Roy89], MODE [Sto90b], weighted subspace fitting (WSF) [Vib91] and other variants (see [Kri96][Ott93] and references therein). Notice that all these quadratic methods achieve the same asymptotic performance under appropriate hypotheses on the array manifold when the observation time goes to infinity, as proved in [Ott92][Car94]. However, in this section they are shown to be inefficient, even in the asymptotic case, if the symbols are drawn from a constant-modulus alphabet (e.g., MPSK or CPM).
The results in this section were presented at the IEEE Asilomar Conference on Signals, Systems and Computers that was held in Pacific Grove (USA) in 2003 [Vil03b]:

• "Second-Order DOA Estimation from Digitally Modulated Signals", J. Villares, G. Vázquez, Proc. of the 37th IEEE Asilomar Conference on Signals, Systems and Computers, Pacific Grove (USA), November 2003.

6.5.1 Signal Model

In all the estimation problems addressed in this chapter, the parameter $\theta$ remains static during the observation time. However, in the context of mobile communications, it is important not only to obtain an accurate estimate of the users' angular position but also to track their location as the transmitters move around the base station. Consequently, it is necessary to estimate both the angle and the angular speed of every source transmitting towards the base station. Higher-order derivatives of the AoA (acceleration and so on) are disregarded in this study for simplicity.

Therefore, based on the small-error estimators obtained in (4.12) and (4.18), a closed-loop scheme (tracker) such as the one suggested in Section 2.5.2 is implemented in order to track the evolution of the parameter $\theta_n$. To do so, the parameter dynamics (speed, acceleration, etc.) must be incorporated into the model.

Formally, let us consider the problem of tracking the angle-of-arrival of $P$ narrowband sources impinging on a uniform linear array composed of $M$ antennas spaced $\lambda/2$ meters, with $\lambda$ the common signal wavelength. Let us consider that all the transmitters are visible from the base station array and that they do not experience multipath propagation. Let $\phi(t) \in [-\pi, \pi)^P$ be the temporal evolution of the $P$ angles-of-arrival in radians and $\dot\phi(t) \triangleq \partial\phi(t)/\partial t$ the respective derivatives accounting for the angular speed. Let us assume that the acceleration and higher-order derivatives are negligible during the observation time, that is, $\partial^i \phi(t)/\partial t^i = 0$ for $i > 1$.
Furthermore, let us assume that the bandwidth of $\phi(t)$ does not exceed $1/2T$, with $T$ the symbol period. In that case, $\phi(nT)$ satisfies the sampling theorem and the $P$ trajectories can be ideally reconstructed from their samples $\phi_n \triangleq \phi(nT)$, yielding the following discrete-time dynamical model or state equation:

$$ \phi_{n-k} = \phi_n - k\dot\phi_n \tag{6.48} $$

where the angular speed $\dot\phi_n$ is normalized to the symbol period $T$. Therefore, the composed vector of parameters that must be estimated to track the users without any systematic pursuit error is

$$ \theta_{n+1} \triangleq \begin{bmatrix} \phi_{n+1} \\ \dot\phi_{n+1} \end{bmatrix} = G\theta_n \tag{6.49} $$

with

$$ G \triangleq \begin{bmatrix} I_P & I_P \\ 0_P & I_P \end{bmatrix}. $$

Consequently, the optimal second-order AoA tracker is given by

$$ \hat\theta_{n+1} = G\hat\theta_n + \mathrm{diag}(\mu)\, G\, J_2^{\#}(\hat\theta_n)\, D_r^H Q^{-1}(\hat\theta_n) \left(\hat r - r(\hat\theta_n)\right), \tag{6.50} $$

using the small-error expression obtained in (4.12) with $D_g = G$. Notice that the above expression follows the structure of the optimal ML tracker in Section 2.5.2 with $h(\theta_n) = G\theta_n$.

The above solution forces to zero all the cross-derivatives of $g(\theta)$, including the IPI terms associated with the interference from other users (Section 4.4). This interference is referred to as multiuser or multiple access interference (MUI or MAI) in the literature. Thus, the second-order AoA tracker in (6.50) is referred to as the MUI-free AoA tracker hereafter.

On the other hand, following the reasoning in Section 4.4, it is not strictly necessary to cancel out the cross-derivatives corresponding to different users because the tracker optimization will remove the MUI contribution if the SNR is sufficiently high. Likewise, if the SNR is low, the MUI term will be automatically ignored so as not to enhance the noise contribution. Thus, it is only necessary to decouple the estimates of $\phi_n$ and $\dot\phi_n$ in order to have unbiased estimates of $\theta_{n+1}$ in (6.49). Otherwise, AoA estimation errors would yield angular speed deviations and vice versa.
To avoid this, we have to constrain these cross-derivatives to zero, as indicated next:

$$ \left.\frac{\partial [\hat{\dot\phi}_n]_p}{\partial [\phi_n]_p}\right|_{\theta = \theta_n} = \left.\frac{\partial [\hat\phi_n]_p}{\partial [\dot\phi_n]_p}\right|_{\theta = \theta_n} = 0, \qquad p = 1, \ldots, P \tag{6.51} $$

while the rest of the cross-derivatives are left unconstrained. We will refer to this solution as the MUI-resistant AoA tracker in the sequel.

To complete the signal model, the received signal is passed through the matched filter and then sampled at one sample per symbol in order to collect $K$ snapshots (see footnote 5). Independent snapshots are obtained assuming that the actual modulation is ISI-free and that symbol synchronization has been established. On the other hand, it can be shown that carrier synchronization is not required since the matrix $Q(\theta)$ is insensitive to the phase of the $P \times K$ nuisance parameters.

Footnote 5: Notice that the problem dynamics (angle and angular velocity) require processing $K \geq 2$ snapshots.

According to the considerations above, the snapshot recorded at time $n - k$ is given by

$$ y_{n-k} = A_{n-k}(\theta_n)\, x_{n-k} + w_{n-k} $$

where $x_{n-k}$ is the vector containing the symbols transmitted by the $P$ users at time $n - k$, $w_{n-k}$ the white noise samples and the $p$-th column of $A_{n-k}(\theta_n)$,

$$ [A_{n-k}(\theta_n)]_p \triangleq \begin{bmatrix} 1 \\ \exp\left(j\pi \sin\left([\phi_n]_p - k[\dot\phi_n]_p\right)\right) \\ \vdots \\ \exp\left(j\pi (M-1) \sin\left([\phi_n]_p - k[\dot\phi_n]_p\right)\right) \end{bmatrix}, $$

is the steering vector associated with the $p$-th source at time $n - k$, with $j \triangleq \sqrt{-1}$. Notice that $A_{n-k}(\theta_n)$ incorporates the known dynamical model (6.48).

In order to reproduce the vectorial model in (2.13), the $K$ snapshots are stacked to build the following spatio-temporal observation:

$$ y(n) \triangleq \begin{bmatrix} y_n \\ \vdots \\ y_{n-K+1} \end{bmatrix} = A(\theta_n)\, x(n) + w(n) $$

where $x(n)$ and $w(n)$ are constructed as $y(n)$ and the transfer matrix $A(\theta_n)$ is given by

$$ A(\theta_n) \triangleq \begin{bmatrix} A_n(\theta_n) \\ \vdots \\ A_{n-K+1}(\theta_n) \end{bmatrix}. $$

As stated before, once the signal model has been determined, we only have to find the set of constituent matrices in (4.12) and (4.18).
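The signal model above can be sketched in a few lines: the steering vector of a half-wavelength uniform linear array evaluated at the back-propagated angle of (6.48), the row-wise stacking of the $K$ per-snapshot matrices, and the state prediction $\theta_{n+1} = G\theta_n$ of (6.49). This is only an illustrative sketch; the function names, the example angles and the angular speeds are hypothetical, and no estimator or tracker loop is implemented.

```python
import cmath
import math

def steering_vector(phi, phi_dot, k, num_antennas):
    """p-th column of A_{n-k}(theta_n) for a half-wavelength ULA.

    The angle k symbols in the past is phi - k * phi_dot, following (6.48).
    """
    angle = phi - k * phi_dot
    return [cmath.exp(1j * math.pi * m * math.sin(angle)) for m in range(num_antennas)]

def stacked_transfer_matrix(phis, phi_dots, num_antennas, num_snapshots):
    """Stack A_n, ..., A_{n-K+1} row-wise; returns (K*M) rows of P entries each."""
    rows = []
    for k in range(num_snapshots):
        cols = [steering_vector(p, d, k, num_antennas) for p, d in zip(phis, phi_dots)]
        for m in range(num_antennas):
            rows.append([col[m] for col in cols])
    return rows

def predict_state(phis, phi_dots):
    """theta_{n+1} = G theta_n with G = [[I, I], [0, I]]: angles advance by the speed."""
    return [p + d for p, d in zip(phis, phi_dots)], list(phi_dots)

# Two sources at +/-5 degrees, M = 4 antennas, K = 2 snapshots (speeds are made up).
phis = [math.radians(-5.0), math.radians(5.0)]
phi_dots = [1e-4, -1e-4]
A = stacked_transfer_matrix(phis, phi_dots, num_antennas=4, num_snapshots=2)
print(len(A), "rows x", len(A[0]), "columns")
print(predict_state(phis, phi_dots))
```

Every entry of the stacked matrix has unit modulus, and with zero angular speed the $K$ blocks coincide, so the speed is only observable through the differences between consecutive snapshots.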
To conclude, the derivatives of the steering vectors are provided next:

$$ \frac{\partial [A_{n-k}(\theta_n)]_{p,m}}{\partial [\phi_n]_q} = j\pi m \cos\left([\phi_n]_p - k[\dot\phi_n]_p\right) \exp\left(j\pi m \sin\left([\phi_n]_p - k[\dot\phi_n]_p\right)\right) \delta(p - q) $$

$$ \frac{\partial [A_{n-k}(\theta_n)]_{p,m}}{\partial [\dot\phi_n]_q} = -j\pi m k \cos\left([\phi_n]_p - k[\dot\phi_n]_p\right) \exp\left(j\pi m \sin\left([\phi_n]_p - k[\dot\phi_n]_p\right)\right) \delta(p - q) $$

for all $p, q \in \{1, \ldots, P\}$, where $\delta(\cdot)$ stands for the Kronecker delta.

6.5.2 Numerical Results

Two independent sources transmitting from the far field to a uniform linear array composed of $M = 4$ antennas are simulated. The received power is assumed to be the same for both sources for simplicity. Both signals are QPSK modulated and two snapshots ($K = 2$) are recorded at the matched-filter output. The figure of merit considered in this section is the estimator normalized steady-state variance, defined as

$$ VAR(\Delta\phi) \triangleq \frac{E\left\{\|\hat\phi_n - \phi_n\|^2\right\}}{P\, \Delta\phi^2} \tag{6.52} $$

with $\Delta\phi \triangleq ([\phi_n]_2 - [\phi_n]_1)/2$ half of the separation between the sources. The variance is plotted as a function of the SNR per source at the matched-filter output, $E_s/N_0 = \sigma_w^{-2}$, with $E_s$ the received symbol energy and $N_0$ the noise double-sided spectral density.

Two AoA trackers forcing a different set of constraints on $g(\theta)$ are tested:

1. MUI-free AoA tracker: $M^H D_r = D_g = I_P$;

2. MUI-resistant AoA tracker: $\mathrm{diag}(M^H D_r) = \mathrm{diag}(D_g) = \mathrm{diag}(I_P)$ and the cross terms in (6.51) are set to zero.

Figure 6.24: AoA tracking of two users whose trajectories cross at time instant $n = 500$ (AoA in radians versus time index $n$). The output of the MUI-free and MUI-resistant trackers is plotted on the left and right hand side, respectively. Two simulations are run with two different outcomes for the MUI-free tracker: tracking is lost (solid line) or the two sources are interchanged (dashed line). The signal SNR is fixed to 10 dB in both cases.
Two different scenarios have been simulated in order to illustrate the benefit of considering the actual distribution of the sources when they are transmitting from similar angles.

Two users crossing

Figure 6.24 shows that the MUI-free AoA tracker (left plot) loses tracking as the two sources approach each other due to the noise enhancement observed when the SNR is low (SNR = 10 dB). This situation arises because, when the users are transmitting from similar angles, the matrix $D_r$ becomes nearly singular and the estimator variance (4.13) increases suddenly. On the other hand, the MUI-resistant AoA tracker (right plot) overcomes this critical situation because it does not try to remove the MUI term associated with the cross derivatives of $D_r$ when the noise contribution is dominant (low SNR). Following the explanation in Section 4.4, the MUI-resistant AoA tracker releases the cross derivatives in $D_r$ while the users are crossing and the matrix $D_r$ is badly conditioned. In this manner, the tracker does not enhance the noise contribution and is able to remain "locked" during the crossing.

Figure 6.25: Steady-state variance of the AoA tracker (normalized variance versus $E_s/N_0$ in dB) for two sources located at ±5 degrees from the broadside. The loop bandwidth is set to $B_n = 1.25 \cdot 10^{-3}$ and the MUI-free estimator is simulated. Curves: CML, Low-SNR ML, UCRB, GML, MCRB and BQUE.

Steady-state variance for two near sources

The steady-state variance of the MUI-free AoA tracker is evaluated as a function of the SNR, considering two still users separated by 10° (Fig. 6.25) and 1° (Fig. 6.26). The noise equivalent loop bandwidth $B_n$ (Section 2.5.2) has been selected in order to guarantee the small-error condition for all the simulated SNRs (Chapter 4). For the studied set-up, the noise enhancement caused by the proximity of the sources is found to be negligible.
This fact makes the two suggested implementations (MUI-resistant and MUI-free) practically equivalent in the simulated scenarios. A minor improvement is observed in Fig. 6.26 for low SNR. Theoretically, the performance of the MUI-free estimator is very limited at low SNR when the two sources are close, as shown in Fig. 6.27, whereas its competitor (MUI-resistant) achieves the single-user performance whatever the simulated SNR. Thus, Fig. 6.27 illustrates the potential gain that the MUI-resistant alternative offers in terms of steady-state variance when the problem is badly conditioned and the observations are heavily corrupted by noise.

Figure 6.26: Steady-state variance of the AoA tracker (normalized variance versus $E_s/N_0$ in dB) for two sources located at ±0.5 degrees from the broadside. The loop bandwidth is set to $B_n = 1.65 \cdot 10^{-4}$ and the MUI-free estimator is simulated. Curves: Low-SNR ML, CML, UCRB, MUI-resistant, GML, MCRB and BQUE.

In Fig. 6.25 and Fig. 6.26, the optimal second-order tracker has also been compared with the ML-based trackers formulated in Section 2.4. The first conclusion is that the low-SNR approximation appears to be useless in these critical scenarios for the SNRs of interest. The underlying reason is the so-called self-noise, i.e., the variance floor caused by the nuisance parameters at high SNR (Section 2.4.1). The self-noise is irrelevant when the SNR tends to zero but it becomes dominant as soon as the SNR is increased. Notice that, in the AoA estimation problem at hand, the self-noise is generated by the random symbols (nuisance parameters) from the user of interest as well as from the other interfering users. Therefore, the MUI and self-noise contributions are strongly connected in this case study. To overcome the low-SNR UML variance floor, the CML tracker was proposed in Section 2.4.2.
The CML is able to yield self-noise free estimates but it suffers from noise enhancement when the SNR is low because it tries to decorrelate the nuisance parameters from the different users. Regarding the GML AoA tracker presented in Section 2.4.3, the convergence to the CML solution at high SNR and to the low-SNR UML solution at low SNR (if the x-axis were expanded) is observed. Between these two asymptotic extremes, the GML adjusts its coefficients depending on the actual SNR to minimize the joint contribution of the noise and the self-noise. Indeed, the GML solution is found to be the best quadratic estimator or tracker based only on the second-order moments of the nuisance parameters, i.e., if $K = 0$ in (3.10).

Figure 6.27: Normalized variance (multiplied by $\Delta\phi^2$) as a function of the SNR $\gamma$ for the MUI-free and MUI-resistant AoA trackers when the sources are separated 0.1, 0.2, 0.5 or 1 as indicated for each curve. The tracker loop bandwidth is set to $B_n = 1.25 \cdot 10^{-3}$.

Nonetheless, when comparing the variance of the GML and the BQUE AoA trackers in Figs. 6.25-6.26, it is confirmed that second-order estimation is improved for medium-to-high SNRs if the fourth-order statistical knowledge on the nuisance parameters ($K \neq 0$) is exploited. Comparing Fig. 6.25 and Fig. 6.26, the resulting gain is shown to be greater when the angular separation is reduced. Moreover, when the loop bandwidth $B_n$ is small (Fig. 6.26), the BQUE performance is rather close to the one predicted by the MCRB in case of known nuisance parameters, and it constitutes the lower bound for the variance of any unbiased estimator based on the sample covariance matrix.
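Surprisingly, the GML estimator does not attain the UCRB outside the aforementioned asymptotic cases because the nuisance parameters are actually non-Gaussian (QPSK discrete symbols) and the UCRB is based on the Gaussian assumption.

Appendix 6.A Computation of Q for carrier phase estimation

The expected value of $\nabla^2(y; \theta_o)$ (6.12) can be manipulated as follows:

$$ E\{\nabla^2(y; \theta_o)\} = 4\sigma_w^{-8} E\left\{\mathrm{Im}^2\left\{e^{-j2\theta_o} r^H \hat r\right\}\right\} = -\sigma_w^{-8} E\left\{\left(e^{-j2\theta_o} r^H \hat r - e^{j2\theta_o} \hat r^H r\right)^2\right\} $$

$$ = 2\sigma_w^{-8} \left( r^H E\{\hat r \hat r^H\}\, r - \mathrm{Re}\left\{e^{-j4\theta_o} r^H E\{\hat r \hat r^T\}\, r^*\right\} \right) $$

$$ = 2\sigma_w^{-8}\, r^H \left( E\{\hat r \hat r^H\} - e^{-j4\theta_o} E\{\hat r \hat r^T\} \right) r $$

bearing in mind that both $\Gamma$ and $E_x\{\mathrm{vec}(xx^T)\, \mathrm{vec}^T(xx^T)\}$ are real quantities for any CPM signal according to Laurent's expansion [Lau86]. In that case, the proper and improper correlation matrices of $\hat r$ can be computed as follows:

$$ E\{\hat r \hat r^T\} = e^{j4\theta_o} \mathcal{A}\, E_x\{\mathrm{vec}(xx^T)\, \mathrm{vec}^T(xx^T)\}\, \mathcal{A}^T $$

$$ E\{\hat r \hat r^H\} = \mathcal{A}\, E_x\{\mathrm{vec}(xx^T)\, \mathrm{vec}^H(xx^T)\}\, \mathcal{A}^H + (I + K)\left(R_w \otimes AA^H\right) + (I + K)\left(AA^H \otimes R_w\right) + (I + K)\left(R_w \otimes R_w\right) $$

$$ = \mathcal{A}\left( E_x\{\mathrm{vec}(xx^T)\, \mathrm{vec}^H(xx^T)\} - 2P \right)\mathcal{A}^H + 2P\,(R \otimes R) $$

where $P \triangleq \frac{1}{2}(I + K)$ and the following identities have been applied, as done in Appendix 3.B:

$$ \mathrm{vec}(ABC^T) = (C \otimes A)\, \mathrm{vec}(B) $$

$$ (A \otimes B)(C \otimes D) = AC \otimes BD $$

$$ \mathrm{vec}(ab^T)\, \mathrm{vec}^H(ab^T) = (b \otimes a)(b \otimes a)^H = bb^H \otimes aa^H $$

$$ \mathrm{vec}(ba^T)\, \mathrm{vec}^H(ab^T) = K\, \mathrm{vec}(ab^T)\, \mathrm{vec}^H(ab^T) = K\left(bb^H \otimes aa^H\right) = \left(aa^H \otimes bb^H\right) K. $$

Finally, if $Q$ is defined as

$$ Q \triangleq \mathcal{A} K \mathcal{A}^H + 2R \otimes R $$

with $K$ given in (6.19), then

$$ E\{\nabla^2(y; \theta_o)\} = 2\sigma_w^{-8}\, r^H P Q r = 2\sigma_w^{-8}\, r^H Q r $$

using the following properties of the orthogonal projector $P$:

$$ P \mathcal{A} K \mathcal{A}^T = \mathcal{A} P K \mathcal{A}^T = \mathcal{A} K \mathcal{A}^T, \qquad P r = r. $$

Appendix 6.B Asymptotic expressions for multiplicative channels

Let $y = ax + w$ be the matched filter output with $a \in (0, +\infty)$ the amplitude we aim to estimate, $x$ the vector of $N$ symbols and $w$ the additive white Gaussian noise of variance $\sigma_w^2$.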
From this simple model and after some manipulations, the variance of the GML and BQUE estimators is found to be identical and is given by

$$\mathrm{VAR} = B_{UCRB} + \frac{a^2(\rho-2)}{4N}$$

where $B_{UCRB}$ denotes the associated UCRB (Section 2.6.1):

$$B_{UCRB} \triangleq \frac{\left(a^2+\sigma_w^2\right)^2}{4Na^2}.$$

If we take the limit of VAR when the noise variance tends to zero, we obtain

$$\mathrm{VAR} = \frac{a^2(\rho-1)}{4N} + \frac{\sigma_w^2}{2N} + o\left(\sigma_w^2\right),$$

which only goes to zero as the noise vanishes if $N \to \infty$ (consistent estimator) or $\rho = 1$, which is the case of the MPSK modulation. This proves that the signal amplitude can only be perfectly estimated from a finite observation in case of constant amplitude modulations such as MPSK. Finally, notice that the performance of the GML and the BQUE is generally different if the estimator operates directly on the received signal, as shown in Fig. 6.23.

Chapter 7 Asymptotic Studies

In this chapter, some asymptotic results are provided for the second-order estimators formulated in Chapters 3 and 4. In the first sections, their asymptotic performance is evaluated when the SNR is very low or very high. The low-SNR study concludes that the nuisance parameters distribution is irrelevant in a noisy scenario. In that case, the Gaussian assumption is shown to yield efficient estimators. On the other hand, the high-SNR asymptotic study is useful to bound the losses incurred when the Gaussian assumption is applied in spite of having non-Gaussian nuisance parameters. The most important conclusion is that the Gaussian assumption leads to optimal second-order schemes unless the nuisance parameters belong to a constant modulus constellation, such as MPSK or CPM. Therefore, the Gaussian assumption applies for very important constellations in digital communications such as QAM or APK. The theoretical study is accompanied by some simulations for the problem of bearing estimation in case of digitally-modulated signals (Section 7.5.1).
Numerical results are also provided in Section 7.5.2 for the problem of feedforward second-order frequency estimation initially addressed in Section 4.5. The same asymptotic study was carried out in Section 6.2 for the carrier phase estimation problem in case of noncircular transmissions.

In the second part of the chapter, the asymptotic performance of the second-order small-error estimators in Chapter 4 is evaluated when the data record grows to infinity. Asymptotic expressions are deduced for a vast majority of estimation problems in digital communications, such as timing and frequency synchronization, channel impulse response estimation and time-of-arrival estimation, among others. In that case, the large sample asymptotic expressions become a function of the spectra of the received waveform and its derivatives. In this context, a simple condition is obtained that identifies whether the Gaussian assumption yields optimal second-order schemes or not. From this result, the Gaussian assumption is proved to be optimal for timing and frequency synchronization. Some simulations are supplied in Section 7.5.2 that validate this last conclusion.

Asymptotic expressions are also obtained for the DOA estimation problem when the spatio-temporal observation grows indefinitely. If the number of antennas increases, it is shown that the covariance of the estimation error is asymptotically independent of the sources statistical distribution and, therefore, the Gaussian assumption can be applied to obtain efficient DOA estimators. On the other hand, the Gaussian assumption is found to yield an important loss if the number of sensors is small and multiple constant-modulus signals (e.g., MPSK or CPM) impinge on the array from near directions. These conclusions are validated numerically in Section 7.5.3.

7.1 Introduction

Let us summarize first the main results obtained in Chapters 3 and 4.
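As shown therein, any second-order estimator of $\boldsymbol{\alpha} = \mathbf{g}(\boldsymbol{\theta})$ is an affine transformation of the sample covariance matrix, having the following form:

$$\hat{\boldsymbol{\alpha}} = \mathbf{g} + \mathbf{M}^H\left(\hat{\mathbf{r}} - \mathbf{r}\right)$$

where $\mathbf{g} = E_\theta\{\mathbf{g}(\boldsymbol{\theta})\}$ and $\mathbf{r} = E_\theta E\{\hat{\mathbf{r}}\}$ are the a priori knowledge about the parameter $\boldsymbol{\alpha}$ and the quadratic observation $\hat{\mathbf{r}} = \mathrm{vec}(\mathbf{y}\mathbf{y}^H)$, respectively. Based on the linear signal model presented in Section 2.4, matrix $\mathbf{M}$ was optimized in Chapters 3 and 4 by adopting different criteria. For the large-error MMSE and minimum variance second-order estimators studied in Chapter 3, matrix $\mathbf{M}$ was given by

$$\begin{aligned}
\mathbf{M}_{mse} &\triangleq \left(\mathbf{Q} + \tilde{\mathbf{Q}}\right)^{-1}\mathbf{S} \\
\mathbf{M}_{var} &\triangleq \mathbf{Q}^{-1}\tilde{\mathbf{Q}}\left(\tilde{\mathbf{Q}}\mathbf{Q}^{-1}\tilde{\mathbf{Q}}\right)^{\#}\mathbf{S}
\end{aligned} \tag{7.1}$$

where $\tilde{\mathbf{Q}}$ was introduced in (3.23), and $\mathbf{Q}$ is the Bayesian expectation, $E_\theta\{\cdot\}$, of matrix

$$\mathbf{Q}(\boldsymbol{\theta}) = \mathcal{R}(\boldsymbol{\theta}) + \mathcal{A}(\boldsymbol{\theta})\,\mathbf{K}\,\mathcal{A}^H(\boldsymbol{\theta}),$$

with $\mathbf{K}$ the fourth-order cumulant matrix in (3.11) and

$$\begin{aligned}
\mathcal{A}(\boldsymbol{\theta}) &= \mathbf{A}^*(\boldsymbol{\theta})\otimes\mathbf{A}(\boldsymbol{\theta}) \\
\mathcal{R}(\boldsymbol{\theta}) &= \mathbf{R}^*(\boldsymbol{\theta})\otimes\mathbf{R}(\boldsymbol{\theta}) \\
\mathbf{R}(\boldsymbol{\theta}) &= \mathbf{A}(\boldsymbol{\theta})\mathbf{A}^H(\boldsymbol{\theta}) + \mathbf{R}_w.
\end{aligned} \tag{7.2}$$

On the other hand, the optimum second-order small-error estimator was obtained in Chapter 4, having that

$$\mathbf{M}_{bque}(\boldsymbol{\theta}) \triangleq \mathbf{Q}^{-1}(\boldsymbol{\theta})\,\mathbf{D}_r(\boldsymbol{\theta})\left(\mathbf{D}_r^H(\boldsymbol{\theta})\,\mathbf{Q}^{-1}(\boldsymbol{\theta})\,\mathbf{D}_r(\boldsymbol{\theta})\right)^{-1}\mathbf{D}_g^H(\boldsymbol{\theta}) \tag{7.3}$$

where $\boldsymbol{\theta}$ stands henceforth for the actual value of the parameter and

$$[\mathbf{D}_r(\boldsymbol{\theta})]_p \triangleq \mathrm{vec}\left(\frac{\partial\mathbf{R}(\boldsymbol{\theta})}{\partial\theta_p}\right) = \mathrm{vec}\left(\frac{\partial\mathbf{A}(\boldsymbol{\theta})}{\partial\theta_p}\mathbf{A}^H(\boldsymbol{\theta}) + \mathbf{A}(\boldsymbol{\theta})\frac{\partial\mathbf{A}^H(\boldsymbol{\theta})}{\partial\theta_p}\right) \tag{7.4}$$

is the derivative of $\mathbf{R}(\boldsymbol{\theta})$ with respect to the p-th parameter, i.e., $\theta_p \triangleq [\boldsymbol{\theta}]_p$. The MSE matrices for the above estimators are given by¹

$$\begin{aligned}
\boldsymbol{\Sigma}_{mse} &\triangleq \boldsymbol{\Sigma}_g - \mathbf{S}^H\left(\mathbf{Q} + \tilde{\mathbf{Q}}\right)^{-1}\mathbf{S} \\
\boldsymbol{\Sigma}_{var} &\triangleq \boldsymbol{\Sigma}_g + \mathbf{S}^H\left(\tilde{\mathbf{Q}}\mathbf{Q}^{-1}\tilde{\mathbf{Q}}\right)^{\#}\mathbf{S} - \mathbf{S}^H\tilde{\mathbf{Q}}^{\#}\mathbf{S} \\
\mathbf{B}_{bque}(\boldsymbol{\theta}) &\triangleq \mathbf{D}_g(\boldsymbol{\theta})\left(\mathbf{D}_r^H(\boldsymbol{\theta})\,\mathbf{Q}^{-1}(\boldsymbol{\theta})\,\mathbf{D}_r(\boldsymbol{\theta})\right)^{-1}\mathbf{D}_g^H(\boldsymbol{\theta})
\end{aligned} \tag{7.5}$$

where $\mathbf{D}_g(\boldsymbol{\theta}) = \partial\mathbf{g}(\boldsymbol{\theta})/\partial\boldsymbol{\theta}^T$ and $\boldsymbol{\Sigma}_g \triangleq E_\theta\{(\mathbf{g}(\boldsymbol{\theta})-\mathbf{g})(\mathbf{g}(\boldsymbol{\theta})-\mathbf{g})^H\}$ stands for the prior covariance matrix. Finally, when the above estimators are deduced under the Gaussian assumption (i.e., $\mathbf{K} = \mathbf{0}$ and $\mathbf{Q} = \mathcal{R}$), their performance is given by the following MSE matrices, marked with a prime to distinguish them from (7.5):

$$\begin{aligned}
\boldsymbol{\Sigma}'_{mse} &\triangleq \boldsymbol{\Sigma}_g - \mathbf{S}^H\left(\tilde{\mathbf{Q}} + \mathcal{R}\right)^{-1}\mathbf{S} + \mathbf{X}_{mse}(\mathbf{K}) \\
\boldsymbol{\Sigma}'_{var} &\triangleq \boldsymbol{\Sigma}_g + \mathbf{S}^H\left(\tilde{\mathbf{Q}}\mathcal{R}^{-1}\tilde{\mathbf{Q}}\right)^{\#}\mathbf{S} - \mathbf{S}^H\tilde{\mathbf{Q}}^{\#}\mathbf{S} + \mathbf{X}_{var}(\mathbf{K}) \\
\mathbf{B}_{gml}(\boldsymbol{\theta}) &\triangleq \mathbf{B}_{UCRB}(\boldsymbol{\theta}) + \mathbf{X}_{gml}(\mathbf{K})
\end{aligned} \tag{7.6}$$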
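where

$$\mathbf{B}_{UCRB}(\boldsymbol{\theta}) \triangleq \mathbf{D}_g(\boldsymbol{\theta})\left(\mathbf{D}_r^H(\boldsymbol{\theta})\,\mathcal{R}^{-1}(\boldsymbol{\theta})\,\mathbf{D}_r(\boldsymbol{\theta})\right)^{-1}\mathbf{D}_g^H(\boldsymbol{\theta}) \tag{7.7}$$

is the well-known (Gaussian) unconditional CRB (Section 2.6.1), and $\mathbf{X}_{mse}(\mathbf{K})$, $\mathbf{X}_{var}(\mathbf{K})$, $\mathbf{X}_{gml}(\mathbf{K})$ are the terms depending on the kurtosis matrix $\mathbf{K}$, which are given by

$$\begin{aligned}
\mathbf{X}_{mse}(\mathbf{K}) &\triangleq \mathbf{S}^H\left(\tilde{\mathbf{Q}}+\mathcal{R}\right)^{-1}E_\theta\left\{\mathcal{A}(\boldsymbol{\theta})\mathbf{K}\mathcal{A}^H(\boldsymbol{\theta})\right\}\left(\tilde{\mathbf{Q}}+\mathcal{R}\right)^{-1}\mathbf{S} \\
\mathbf{X}_{var}(\mathbf{K}) &\triangleq \mathbf{S}^H\left(\tilde{\mathbf{Q}}\mathcal{R}^{-1}\tilde{\mathbf{Q}}\right)^{\#}\tilde{\mathbf{Q}}\mathcal{R}^{-1}E_\theta\left\{\mathcal{A}(\boldsymbol{\theta})\mathbf{K}\mathcal{A}^H(\boldsymbol{\theta})\right\}\mathcal{R}^{-1}\tilde{\mathbf{Q}}\left(\tilde{\mathbf{Q}}\mathcal{R}^{-1}\tilde{\mathbf{Q}}\right)^{\#}\mathbf{S} \\
\mathbf{X}_{gml}(\mathbf{K}) &\triangleq \mathbf{D}_g(\boldsymbol{\theta})\left(\mathbf{D}_r^H(\boldsymbol{\theta})\mathcal{R}^{-1}(\boldsymbol{\theta})\mathbf{D}_r(\boldsymbol{\theta})\right)^{-1}\mathbf{D}_r^H(\boldsymbol{\theta})\mathcal{R}^{-1}(\boldsymbol{\theta})\mathcal{A}(\boldsymbol{\theta})\mathbf{K}\mathcal{A}^H(\boldsymbol{\theta})\mathcal{R}^{-1}(\boldsymbol{\theta})\mathbf{D}_r(\boldsymbol{\theta})\left(\mathbf{D}_r^H(\boldsymbol{\theta})\mathcal{R}^{-1}(\boldsymbol{\theta})\mathbf{D}_r(\boldsymbol{\theta})\right)^{-1}\mathbf{D}_g^H(\boldsymbol{\theta}).
\end{aligned} \tag{7.8}$$

¹ The "MSE matrix" is defined as $E_\theta E\{\mathbf{e}\mathbf{e}^H\}$ where $\mathbf{e}$ stands for the considered estimation error [Kay93b].

It will be shown in the next sections that $\mathbf{X}_{gml}(\mathbf{K})$ is always negligible for very low or very high SNR. Nonetheless, the GML estimator might outperform the associated UCRB if the SNR is moderate and $\mathbf{X}_{gml}(\mathbf{K})$ is negative. This behaviour has been observed, for example, in the DOA estimation problem in Section 6.5. On the other hand, in this chapter, the Gaussian assumption is proved to yield the optimal second-order estimator when the SNR goes to zero, or if the amplitude of the nuisance parameters is not constant and the SNR goes to infinity. Finally, regarding the large-error MMSE and minimum variance estimators, $\mathbf{X}_{var}(\mathbf{K})$ and $\mathbf{X}_{mse}(\mathbf{K})$ are irrelevant at low SNR but they are determinant at high SNR because they are able to reduce the variance floor.

Before going into detail, let us decompose the noise covariance matrix $\mathbf{R}_w$ as $\sigma_w^2\mathbf{N}$ in order to make explicit the dependence on the noise variance $\sigma_w^2$. Assuming that the noise is stationary, the diagonal entries of $\mathbf{R}_w$ are precisely the noise variance $\sigma_w^2$. Formally, the noise variance is given by

$$\sigma_w^2 \triangleq \mathrm{Tr}(\mathbf{R}_w)/M$$

and, therefore, $\mathbf{N} = \mathbf{R}_w/\sigma_w^2$ has unitary diagonal entries, by definition. Furthermore, in the next sections, it will be useful to consider the following fourth-order matrix:

$$\mathcal{N} \triangleq \mathbf{N}^*\otimes\mathbf{N}.$$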
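The small-error bounds summarized in this section can be evaluated numerically for a toy problem. The following is a minimal sketch, not the thesis' simulation setup: the two-source steering model `steering`, the array size, the finite-difference derivative and the circular kurtosis matrix $\mathbf{K} = (\rho-2)\,\mathrm{diag}(\mathrm{vec}(\mathbf{I}_K))$ of (3.12) are all assumptions chosen for illustration, with a single scalar parameter and $\mathbf{g}(\theta) = \theta$.

```python
import numpy as np

def vec(X):
    """Column-major vectorization, matching vec(.) in the text."""
    return X.reshape(-1, order="F")

def steering(theta, M):
    """Hypothetical M x 2 transfer matrix: two sources separated by theta (radians)."""
    m = np.arange(M)[:, None]
    angles = np.array([[-theta / 2, theta / 2]])
    return np.exp(1j * np.pi * m * np.sin(angles))

def bounds(theta, sigma2, rho, M=4, eps=1e-6):
    """Return (B_ucrb, B_gml, B_bque) for a scalar parameter and white noise."""
    A = steering(theta, M)
    # Central finite difference standing in for dA/dtheta (illustrative only)
    dA = (steering(theta + eps, M) - steering(theta - eps, M)) / (2 * eps)
    n_sym = A.shape[1]
    R = A @ A.conj().T + sigma2 * np.eye(M)            # R = A A^H + R_w
    calA = np.kron(A.conj(), A)                        # A^* (x) A
    calR = np.kron(R.conj(), R)                        # R^* (x) R
    kurt = (rho - 2) * np.diag(vec(np.eye(n_sym)))     # circular alphabet, (3.12)
    Q = calR + calA @ kurt @ calA.conj().T
    d = vec(dA @ A.conj().T + A @ dA.conj().T)[:, None]    # D_r, see (7.4)
    Rinv_d = np.linalg.solve(calR, d)
    ucrb = 1.0 / (d.conj().T @ Rinv_d).real.item()                       # (7.7)
    core = (Rinv_d.conj().T @ calA @ kurt @ calA.conj().T @ Rinv_d).real.item()
    gml = ucrb + ucrb**2 * core                                          # (7.6), (7.8)
    bque = 1.0 / (d.conj().T @ np.linalg.solve(Q, d)).real.item()        # (7.5)
    return ucrb, gml, bque

ucrb, gml, bque = bounds(theta=0.3, sigma2=0.1, rho=1.0)      # constant modulus
ucrb_g, gml_g, bque_g = bounds(theta=0.3, sigma2=0.1, rho=2.0)  # K = 0
```

With $\rho = 1$ the kurtosis matrix is negative semidefinite, so $\mathbf{X}_{gml}(\mathbf{K}) \le 0$ and the sketch returns `bque <= gml <= ucrb`; with $\rho = 2$ the three values coincide because $\mathbf{K} = \mathbf{0}$.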
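7.2 Low SNR Study

When the noise variance goes to infinity ($\sigma_w^2 \to \infty$), the inverses of $\mathbf{R}(\boldsymbol{\theta})$, $\mathbf{Q}(\boldsymbol{\theta})$ and $\mathcal{R}(\boldsymbol{\theta})$ take the following asymptotic form:

$$\begin{aligned}
\mathbf{R}^{-1}(\boldsymbol{\theta}) &= \sigma_w^{-2}\mathbf{N}^{-1} + o\left(\sigma_w^{-2}\right) \\
\mathbf{Q}^{-1}(\boldsymbol{\theta}),\ \mathcal{R}^{-1}(\boldsymbol{\theta}) &= \sigma_w^{-4}\mathcal{N}^{-1} + o\left(\sigma_w^{-4}\right)
\end{aligned}$$

assuming that $\mathbf{N}$ is full-rank. The Landau symbol $o(x)$ is introduced to consider all those terms that converge to zero faster than $x$. On the other hand, the rest of the matrices appearing in (7.5) and (7.6) are independent of $\sigma_w^2$. Specifically, the noise variance does not affect the value of $\mathbf{A}(\boldsymbol{\theta})$, $\mathcal{A}(\boldsymbol{\theta})$, $\tilde{\mathbf{Q}}$, $\mathbf{S}$, $\mathbf{D}_r(\boldsymbol{\theta})$, $\mathbf{D}_g(\boldsymbol{\theta})$, $\mathbf{K}$ and $\boldsymbol{\Sigma}_g$.

Therefore, the MSE matrices in (7.5) and (7.6) have the following asymptotic expressions at low SNR:

$$\begin{aligned}
\boldsymbol{\Sigma}_{mse},\ \boldsymbol{\Sigma}'_{mse} &= \boldsymbol{\Sigma}_g - \sigma_w^{-4}\,\mathbf{S}^H\mathcal{N}^{-1}\mathbf{S} + o\left(\sigma_w^{-4}\right) \\
\boldsymbol{\Sigma}_{var},\ \boldsymbol{\Sigma}'_{var} &= \sigma_w^{4}\,\mathbf{S}^H\left(\tilde{\mathbf{Q}}\mathcal{N}^{-1}\tilde{\mathbf{Q}}\right)^{\#}\mathbf{S} + o\left(\sigma_w^{4}\right) \\
\mathbf{B}_{UCRB}(\boldsymbol{\theta}),\ \mathbf{B}_{gml}(\boldsymbol{\theta}),\ \mathbf{B}_{bque}(\boldsymbol{\theta}) &= \sigma_w^{4}\,\mathbf{D}_g(\boldsymbol{\theta})\left(\mathbf{D}_r^H(\boldsymbol{\theta})\mathcal{N}^{-1}\mathbf{D}_r(\boldsymbol{\theta})\right)^{-1}\mathbf{D}_g^H(\boldsymbol{\theta}) + o\left(\sigma_w^{4}\right)
\end{aligned} \tag{7.9}$$

taking into account that $\mathbf{X}_{mse}(\mathbf{K})$ in (7.8) is proportional to $\sigma_w^{-8}$, and that $\mathbf{X}_{var}(\mathbf{K})$ and $\mathbf{X}_{gml}(\mathbf{K})$ are constant. Here the primed matrices denote the Gaussian-assumption counterparts in (7.6). Notice that the fourth-order matrix $\mathbf{K}$ does not appear in any of the above asymptotic expressions. This implies that the actual distribution of the nuisance parameters becomes irrelevant at low SNR when designing second-order schemes. Moreover, any assumption about the distribution of the nuisance parameters yields the same MSE expressions in (7.9).

To complete the analysis, the asymptotic expression of the studied second-order estimators is provided next:

$$\begin{aligned}
\mathbf{M}_{mse},\ \mathbf{M}'_{mse} &= \sigma_w^{-4}\mathcal{N}^{-1}\mathbf{S} + o\left(\sigma_w^{-4}\right) \\
\mathbf{M}_{var},\ \mathbf{M}'_{var} &= \mathcal{N}^{-1}\tilde{\mathbf{Q}}\left(\tilde{\mathbf{Q}}\mathcal{N}^{-1}\tilde{\mathbf{Q}}\right)^{\#}\mathbf{S} + o(1) \\
\mathbf{M}_{bque}(\boldsymbol{\theta}),\ \mathbf{M}_{gml}(\boldsymbol{\theta}) &= \mathcal{N}^{-1}\mathbf{D}_r(\boldsymbol{\theta})\left(\mathbf{D}_r^H(\boldsymbol{\theta})\mathcal{N}^{-1}\mathbf{D}_r(\boldsymbol{\theta})\right)^{-1}\mathbf{D}_g^H(\boldsymbol{\theta}) + o(1)
\end{aligned} \tag{7.10}$$

where $\mathbf{M}'_{mse}$ and $\mathbf{M}'_{var}$ correspond to the MMSE and minimum variance estimators obtained under the Gaussian assumption, respectively. In Appendix 7.A, it is shown that $\mathbf{M}_{bque}(\boldsymbol{\theta})$ and $\mathbf{M}_{gml}(\boldsymbol{\theta})$ in (7.10) coincide with the scoring method that implements the low-SNR ML estimator deduced in Section 2.4.1.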
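Due to the asymptotic efficiency of the ML estimator (Section 2.3.2), if the GML and BQUE estimators converge to the ML solution at low SNR, we can state that the GML and BQUE estimators become asymptotically efficient as the SNR goes to zero. As discussed in Section 2.3.2, the "asymptotic" condition is satisfied whenever the estimator operates in the small-error regime or, equivalently, the actual SNR exceeds the SNR threshold. Accordingly, in the studied low SNR scenario ($\sigma_w^2 \to \infty$), the asymptotic condition requires that the observation length goes to infinity ($M \to \infty$) in order to attain the small-error regime.

Likewise, because the GML is efficient at low SNR, the associated (Gaussian) UCRB (7.9) becomes the true CRB at low SNR if and only if the observation size goes to infinity (small-error). Notice that both the UCRB and the true CRB are proportional to $\sigma_w^{4}$ at low SNR, in agreement with (7.9), as reported in [Ste01] for the problem of timing synchronization.

7.3 High SNR Study

In low SNR conditions, the Gaussian assumption has been proved to yield optimal second-order estimators. However, when the SNR increases, the optimal second-order estimators listed in (7.1) and (7.3) exploit the fourth-order statistical information about the nuisance parameters contained in matrix $\mathbf{K}$. When the Gaussian assumption is adopted and this information is omitted ($\mathbf{K} = \mathbf{0}$), the performance of the studied second-order estimators degrades at high SNR. In this section, this loss is upper bounded by evaluating the asymptotic performance of the aforementioned estimation methods when the noise variance goes to zero.

In Appendix 7.B, the asymptotic value of $\mathbf{R}^{-1}(\boldsymbol{\theta})$ and $\mathcal{R}^{-1}(\boldsymbol{\theta})$ as the noise variance goes to zero is calculated, obtaining

$$\mathbf{R}^{-1}(\boldsymbol{\theta}) = \sigma_w^{-2}\mathbf{P}_A^\perp(\boldsymbol{\theta}) + \mathbf{B}(\boldsymbol{\theta}) - \sigma_w^{2}\,\mathbf{B}(\boldsymbol{\theta})\mathbf{N}\mathbf{B}(\boldsymbol{\theta}) + O\left(\sigma_w^4\right) \tag{7.11}$$

$$\begin{aligned}
\mathcal{R}^{-1}(\boldsymbol{\theta}) ={}& \sigma_w^{-4}\,\mathbf{P}_A^{\perp *}(\boldsymbol{\theta})\otimes\mathbf{P}_A^{\perp}(\boldsymbol{\theta}) \\
&+ \sigma_w^{-2}\left(\mathbf{B}^*(\boldsymbol{\theta})\otimes\mathbf{P}_A^{\perp}(\boldsymbol{\theta}) + \mathbf{P}_A^{\perp *}(\boldsymbol{\theta})\otimes\mathbf{B}(\boldsymbol{\theta})\right) \\
&+ \mathbf{B}^*(\boldsymbol{\theta})\otimes\mathbf{B}(\boldsymbol{\theta}) \\
&- \sigma_w^{2}\left(\mathbf{B}^*(\boldsymbol{\theta})\otimes\mathbf{B}(\boldsymbol{\theta})\mathbf{N}\mathbf{B}(\boldsymbol{\theta}) + [\mathbf{B}(\boldsymbol{\theta})\mathbf{N}\mathbf{B}(\boldsymbol{\theta})]^*\otimes\mathbf{B}(\boldsymbol{\theta})\right) + O\left(\sigma_w^4\right)
\end{aligned} \tag{7.12}$$

where the Landau symbol $O(x)$ includes all the terms that converge to zero as $x$ or faster. The asymptotic value of $\mathbf{R}^{-1}(\boldsymbol{\theta})$ and $\mathcal{R}^{-1}(\boldsymbol{\theta})$ is given in terms of the following matrices:

$$\mathbf{A}^{\#}(\boldsymbol{\theta}) \triangleq \left(\mathbf{A}^H(\boldsymbol{\theta})\mathbf{N}^{-1}\mathbf{A}(\boldsymbol{\theta})\right)^{-1}\mathbf{A}^H(\boldsymbol{\theta})\mathbf{N}^{-1} \tag{7.13}$$

$$\mathbf{P}_A^\perp(\boldsymbol{\theta}) \triangleq \mathbf{N}^{-1}\left(\mathbf{I}_M - \mathbf{A}(\boldsymbol{\theta})\mathbf{A}^{\#}(\boldsymbol{\theta})\right) \tag{7.14}$$

$$\mathbf{B}(\boldsymbol{\theta}) \triangleq \left[\mathbf{A}^{\#}(\boldsymbol{\theta})\right]^H\mathbf{A}^{\#}(\boldsymbol{\theta}) \tag{7.15}$$

where $\mathbf{A}^{\#}(\boldsymbol{\theta})$ and $\mathbf{P}_A^\perp(\boldsymbol{\theta})$ are variations of the Moore-Penrose pseudoinverse and the projector onto the null subspace of $\mathbf{A}(\boldsymbol{\theta})$, respectively. The original definitions are altered to include the whitening matrix $\mathbf{N}^{-1}$ in case of correlated noise samples (i.e., $\mathbf{N} \neq \mathbf{I}_M$). With a slight abuse of notation, the above matrices retain all the properties of the original definitions, that is,

$$\begin{aligned}
\mathbf{A}^{\#}(\boldsymbol{\theta})\mathbf{A}(\boldsymbol{\theta}) &= \mathbf{I}_K \\
\mathbf{A}(\boldsymbol{\theta})\mathbf{A}^{\#}(\boldsymbol{\theta})\mathbf{A}(\boldsymbol{\theta}) &= \mathbf{A}(\boldsymbol{\theta}) \\
\mathbf{A}^{\#}(\boldsymbol{\theta})\mathbf{A}(\boldsymbol{\theta})\mathbf{A}^{\#}(\boldsymbol{\theta}) &= \mathbf{A}^{\#}(\boldsymbol{\theta}) \\
\mathbf{P}_A^\perp(\boldsymbol{\theta})\mathbf{A}(\boldsymbol{\theta}) &= \mathbf{0} \\
\mathbf{A}^H(\boldsymbol{\theta})\mathbf{P}_A^\perp(\boldsymbol{\theta}) &= \mathbf{0}.
\end{aligned}$$

On the other hand, the asymptotic value of $\mathbf{Q}^{-1}(\boldsymbol{\theta})$ depends on the kurtosis matrix $\mathbf{K}$. The complete study is carried out in Appendix 7.C when $\mathbf{K}$ is full-rank and in Appendix 7.D when $\mathbf{K}$ is singular. In these appendices, $\mathbf{Q}^{-1}(\boldsymbol{\theta})$ is proved to have the following asymptotic expression:

$$\mathbf{Q}^{-1}(\boldsymbol{\theta}) = \mathcal{R}^{-1}(\boldsymbol{\theta}) + \sigma_w^{-2}\left[\mathcal{A}^{\#}(\boldsymbol{\theta})\right]^H\mathbf{P}_K^\perp(\boldsymbol{\theta})\,\mathcal{A}^{\#}(\boldsymbol{\theta}) + O(1) \tag{7.16}$$

where the pseudoinverse of $\mathcal{A}(\boldsymbol{\theta}) = \mathbf{A}^*(\boldsymbol{\theta})\otimes\mathbf{A}(\boldsymbol{\theta})$ (7.2) is defined as follows:

$$\mathcal{A}^{\#}(\boldsymbol{\theta}) \triangleq \left(\mathcal{A}^H(\boldsymbol{\theta})\mathcal{N}^{-1}\mathcal{A}(\boldsymbol{\theta})\right)^{-1}\mathcal{A}^H(\boldsymbol{\theta})\mathcal{N}^{-1} = \left[\mathbf{A}^{\#}(\boldsymbol{\theta})\right]^*\otimes\mathbf{A}^{\#}(\boldsymbol{\theta})$$

and $\mathbf{P}_K^\perp(\boldsymbol{\theta})$ stands for the projector onto the subspace generated by the eigenvectors of $\mathbf{K} = \mathbf{V}_K\boldsymbol{\Sigma}_K\mathbf{V}_K^H$ associated to the eigenvalue $-1$.

The second term in (7.16) is positive semidefinite and it becomes zero if and only if $\mathbf{P}_K^\perp(\boldsymbol{\theta}) = \mathbf{0}$, i.e., if all the eigenvalues of $\mathbf{K}$ are different from $-1$.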
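The high-SNR expansion (7.11) and the properties of the whitened pseudoinverse (7.13)-(7.15) can be checked numerically. The sketch below is only an illustration: the random transfer matrix `A` (with orthonormal columns) and the exponentially correlated noise matrix `N` are assumptions, not a model taken from the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)
M, K = 6, 2

# Hypothetical full-column-rank transfer matrix with orthonormal columns
A = np.linalg.qr(rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K)))[0]

# Hypothetical correlated noise with unit diagonal: N[i, j] = 0.5**|i - j|
idx = np.arange(M)
N = 0.5 ** np.abs(idx[:, None] - idx[None, :])
N_inv = np.linalg.inv(N)

A_pinv = np.linalg.solve(A.conj().T @ N_inv @ A, A.conj().T @ N_inv)  # (7.13)
P_perp = N_inv @ (np.eye(M) - A @ A_pinv)                             # (7.14)
B = A_pinv.conj().T @ A_pinv                                          # (7.15)

def expansion_error(sigma2):
    """Frobenius norm of R^{-1} minus the first three terms of (7.11)."""
    R_inv = np.linalg.inv(A @ A.conj().T + sigma2 * N)
    approx = P_perp / sigma2 + B - sigma2 * (B @ N @ B)
    return np.linalg.norm(R_inv - approx)

err_coarse = expansion_error(1e-2)
err_fine = expansion_error(1e-3)
```

Since the neglected remainder is $O(\sigma_w^4)$, reducing $\sigma_w^2$ by a factor of 10 should reduce the error by a factor of roughly 100, which is what the ratio `err_coarse / err_fine` is expected to show.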
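The rank of $\mathbf{P}_K^\perp(\boldsymbol{\theta})$ therefore determines the potential benefit of considering the kurtosis matrix $\mathbf{K}$ in the design of second-order estimators. The exact expression of $\mathbf{P}_K^\perp(\boldsymbol{\theta})$ is given in Appendix 7.C ($\mathbf{K}$ full-rank) and in Appendix 7.D ($\mathbf{K}$ singular). In Section 7.3.3, the study of $\mathbf{P}_K^\perp(\boldsymbol{\theta})$ will be addressed in more detail.

It is worth noting that all the above asymptotic results implicitly assume that the transfer matrix $\mathbf{A}(\boldsymbol{\theta})$ is full column rank. This will be the baseline for the asymptotic studies in this section. In addition, some indications are given in Appendix 7.E to carry out the asymptotic study when the rank of $\mathbf{A}(\boldsymbol{\theta})$ is lower than the number of columns $K$.

7.3.1 (Gaussian) Unconditional Cramér-Rao Bound

The (Gaussian) UCRB is widely used to lower bound the performance of second-order estimators. Thus far, the UCRB has been proved to be a valid second-order lower bound when the SNR goes to zero or if the nuisance parameters are actually Gaussian. Nonetheless, this is not generally true. Indeed, the UCRB is shown to be outperformed at high SNR by the optimal second-order small-error estimator proposed in Chapter 4. Likewise, the GML estimator usually outperforms the UCRB for intermediate SNRs.

In this section, the high-SNR limit of $\mathbf{B}_{UCRB}(\boldsymbol{\theta})$ when the noise variance goes to zero is derived. It is shown that $\mathbf{B}_{UCRB}(\boldsymbol{\theta})$ becomes proportional to $\sigma_w^2$ at high SNR and, therefore, self-noise free estimates are feasible when the nuisance parameters are Gaussian. Formally, we have that

$$\begin{aligned}
\mathbf{B}_{UCRB}(\boldsymbol{\theta}) &= \mathbf{D}_g(\boldsymbol{\theta})\left(\mathbf{D}_r^H(\boldsymbol{\theta})\mathcal{R}^{-1}(\boldsymbol{\theta})\mathbf{D}_r(\boldsymbol{\theta})\right)^{-1}\mathbf{D}_g^H(\boldsymbol{\theta}) \\
&= \sigma_w^2\,\mathbf{D}_g(\boldsymbol{\theta})\mathbf{B}_1^{-1}(\boldsymbol{\theta})\mathbf{D}_g^H(\boldsymbol{\theta}) + O\left(\sigma_w^4\right)
\end{aligned} \tag{7.17}$$

where $\mathbf{B}_1(\boldsymbol{\theta})$ stands for the high-SNR limit of $\sigma_w^2\,\mathbf{D}_r^H(\boldsymbol{\theta})\mathcal{R}^{-1}(\boldsymbol{\theta})\mathbf{D}_r(\boldsymbol{\theta})$. The entries of $\mathbf{B}_1(\boldsymbol{\theta})$ are determined in Appendix 7.F, obtaining

$$[\mathbf{B}_1(\boldsymbol{\theta})]_{p,q} \triangleq 2\,\mathrm{Re}\,\mathrm{Tr}\left(\frac{\partial\mathbf{A}^H(\boldsymbol{\theta})}{\partial\theta_p}\mathbf{P}_A^\perp(\boldsymbol{\theta})\frac{\partial\mathbf{A}(\boldsymbol{\theta})}{\partial\theta_q}\right). \tag{7.18}$$

Notice that this result requires that $\partial\mathbf{A}(\boldsymbol{\theta})/\partial\theta_p$ does not lie totally in the subspace generated by the columns of $\mathbf{A}(\boldsymbol{\theta})$, i.e.,

$$\mathbf{P}_A^\perp(\boldsymbol{\theta})\frac{\partial\mathbf{A}(\boldsymbol{\theta})}{\partial\theta_p} \neq \mathbf{0}$$

for all the parameters $\theta_1,\ldots,\theta_P$. The abnormal situation in which this requirement fails takes place if the noise subspace of matrix $\mathbf{A}(\boldsymbol{\theta})$ is null (Appendix 7.D) but also in the problem of carrier phase synchronization addressed in Section 6.2. In both cases, the constant term $\mathbf{B}^*(\boldsymbol{\theta})\otimes\mathbf{B}(\boldsymbol{\theta})$ in (7.12) has to be considered in order to evaluate the variance floor at high SNR, having that

$$\lim_{\sigma_w^2\to 0}\mathbf{B}_{UCRB}(\boldsymbol{\theta}) = \mathbf{D}_g(\boldsymbol{\theta})\mathbf{B}_2^{-1}(\boldsymbol{\theta})\mathbf{D}_g^H(\boldsymbol{\theta}) \tag{7.19}$$

where $\mathbf{B}_2(\boldsymbol{\theta})$ stands for the high-SNR limit of $\mathbf{D}_r^H(\boldsymbol{\theta})\mathcal{R}^{-1}(\boldsymbol{\theta})\mathbf{D}_r(\boldsymbol{\theta})$. The entries of $\mathbf{B}_2(\boldsymbol{\theta})$ are determined in Appendix 7.G, obtaining

$$[\mathbf{B}_2(\boldsymbol{\theta})]_{p,q} \triangleq 2\,\mathrm{Re}\,\mathrm{Tr}\left(\frac{\partial\mathbf{A}(\boldsymbol{\theta})}{\partial\theta_p}\mathbf{A}^{\#}(\boldsymbol{\theta})\frac{\partial\mathbf{A}(\boldsymbol{\theta})}{\partial\theta_q}\mathbf{A}^{\#}(\boldsymbol{\theta}) + \frac{\partial\mathbf{A}(\boldsymbol{\theta})}{\partial\theta_p}\frac{\partial\mathbf{A}^H(\boldsymbol{\theta})}{\partial\theta_q}\mathbf{B}(\boldsymbol{\theta})\right). \tag{7.20}$$

7.3.2 Gaussian Maximum Likelihood

In most estimation problems, the UCRB takes the form in equation (7.17) and self-noise free estimation is possible with Gaussian nuisance parameters. In that case, the asymptotic performance of the GML estimator is exactly the one computed in (7.17) irrespective of the actual distribution of the nuisance parameters, i.e., even if $\mathbf{K} \neq \mathbf{0}$. Formally, we have that

$$\mathbf{B}_{gml}(\boldsymbol{\theta}),\ \mathbf{B}_{UCRB}(\boldsymbol{\theta}) = \sigma_w^2\,\mathbf{D}_g(\boldsymbol{\theta})\mathbf{B}_1^{-1}(\boldsymbol{\theta})\mathbf{D}_g^H(\boldsymbol{\theta}) + O\left(\sigma_w^4\right) \tag{7.21}$$

with $\mathbf{B}_1(\boldsymbol{\theta})$ given in (7.18). This statement is true because the term $\mathbf{X}_{gml}(\mathbf{K})$ in (7.6) can be neglected since it depends on $\sigma_w^4$ whereas $\mathbf{B}_{UCRB}(\boldsymbol{\theta})$ is proportional to $\sigma_w^2$. Notice that $\mathbf{X}_{gml}(\mathbf{K})$ is proportional to $\sigma_w^4$ because

$$\mathcal{R}^{-1}(\boldsymbol{\theta})\mathcal{A}(\boldsymbol{\theta}) = \left[\mathbf{B}^*(\boldsymbol{\theta})\otimes\mathbf{B}(\boldsymbol{\theta})\right]\mathcal{A}(\boldsymbol{\theta}) + O\left(\sigma_w^2\right)$$

is asymptotically constant, as pointed out in Appendix 7.B.

Finally, if $\partial\mathbf{A}(\boldsymbol{\theta})/\partial\theta_p$ and $\mathbf{A}(\boldsymbol{\theta})$ were linearly dependent, the GML performance would exhibit a variance floor at high SNR that would be a function of the kurtosis matrix $\mathbf{K}$.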
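Using equation (7.6), it follows that the GML variance floor would be equal to

$$\lim_{\sigma_w^2\to 0}\mathbf{B}_{gml}(\boldsymbol{\theta}) = \mathbf{D}_g(\boldsymbol{\theta})\mathbf{B}_2^{-1}(\boldsymbol{\theta})\mathbf{D}_g^H(\boldsymbol{\theta}) + \mathbf{D}_g(\boldsymbol{\theta})\mathbf{B}_2^{-1}(\boldsymbol{\theta})\mathbf{D}_r^H(\boldsymbol{\theta})\left[\mathcal{A}^{\#}(\boldsymbol{\theta})\right]^H\mathbf{K}\,\mathcal{A}^{\#}(\boldsymbol{\theta})\mathbf{D}_r(\boldsymbol{\theta})\mathbf{B}_2^{-1}(\boldsymbol{\theta})\mathbf{D}_g^H(\boldsymbol{\theta}) \tag{7.22}$$

where $\mathbf{B}_{UCRB}(\boldsymbol{\theta}) = \mathbf{D}_g(\boldsymbol{\theta})\mathbf{B}_2^{-1}(\boldsymbol{\theta})\mathbf{D}_g^H(\boldsymbol{\theta})$ is the variance floor in case of Gaussian nuisance parameters (7.19) and the second term corresponds to $\mathbf{X}_{gml}(\mathbf{K})$ in (7.6).

7.3.3 Best Quadratic Unbiased Estimator

In this section, closed-form expressions are obtained for the ultimate performance of second-order small-error estimators at high SNR. The study in Appendix 7.C and Appendix 7.D comes to the conclusion that the Gaussian assumption is optimal at high SNR unless some eigenvalues of the kurtosis matrix $\mathbf{K}$ are equal to $-1$. It seems that this condition is related to the constant modulus of the nuisance parameters. This important result suggests classifying the nuisance parameters distribution according to the eigendecomposition of $\mathbf{K}$.

With this purpose, let us first obtain the asymptotic expression of $\mathbf{B}_{bque}(\boldsymbol{\theta})$ (7.5) as the noise variance goes to zero, i.e., $\sigma_w^2 \to 0$. Using the asymptotic value of $\mathbf{Q}^{-1}(\boldsymbol{\theta})$ in (7.16), it follows that

$$\mathbf{D}_r^H(\boldsymbol{\theta})\mathbf{Q}^{-1}(\boldsymbol{\theta})\mathbf{D}_r(\boldsymbol{\theta}) = \mathbf{D}_r^H(\boldsymbol{\theta})\mathcal{R}^{-1}(\boldsymbol{\theta})\mathbf{D}_r(\boldsymbol{\theta}) + \sigma_w^{-2}\,\mathbf{D}_r^H(\boldsymbol{\theta})\left[\mathcal{A}^{\#}(\boldsymbol{\theta})\right]^H\mathbf{P}_K^\perp(\boldsymbol{\theta})\,\mathcal{A}^{\#}(\boldsymbol{\theta})\mathbf{D}_r(\boldsymbol{\theta}) + O(1) \tag{7.23}$$

where $\mathbf{P}_K^\perp(\boldsymbol{\theta}) \in \mathbb{R}^{K^2\times K^2}$ denotes the projector onto the subspace generated by the eigenvectors of $\mathbf{K}$ associated to the eigenvalue $-1$.

Using now the asymptotic expression of $\mathbf{D}_r^H(\boldsymbol{\theta})\mathcal{R}^{-1}(\boldsymbol{\theta})\mathbf{D}_r(\boldsymbol{\theta})$ in (7.17), it follows that

$$\mathbf{B}_{bque}(\boldsymbol{\theta}) = \sigma_w^2\,\mathbf{D}_g(\boldsymbol{\theta})\left(\mathbf{B}_1(\boldsymbol{\theta}) + \mathbf{D}_r^H(\boldsymbol{\theta})\left[\mathcal{A}^{\#}(\boldsymbol{\theta})\right]^H\mathbf{P}_K^\perp(\boldsymbol{\theta})\,\mathcal{A}^{\#}(\boldsymbol{\theta})\mathbf{D}_r(\boldsymbol{\theta})\right)^{-1}\mathbf{D}_g^H(\boldsymbol{\theta}) + O\left(\sigma_w^4\right) \tag{7.24}$$

where the second term inside the inverse is always positive semidefinite and, therefore, we can state at high SNR that

$$\mathbf{B}_{bque}(\boldsymbol{\theta}) \le \mathbf{B}_{gml}(\boldsymbol{\theta}) = \mathbf{B}_{UCRB}(\boldsymbol{\theta}).$$

The second term of (7.24) is zero and, therefore, the Gaussian assumption applies at high SNR in any of the following situations:

1. Signal parameterization.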
The Gaussian assumption applies at high SNR if $\partial\mathbf{A}(\boldsymbol{\theta})/\partial\theta_p$ lies totally in the noise subspace of $\mathbf{A}(\boldsymbol{\theta})$, i.e.,

$$\frac{\partial\mathbf{A}(\boldsymbol{\theta})}{\partial\theta_p} = \mathbf{P}_A^\perp(\boldsymbol{\theta})\frac{\partial\mathbf{A}(\boldsymbol{\theta})}{\partial\theta_p}$$

or, taking into account the definition of $\mathbf{P}_A^\perp(\boldsymbol{\theta})$ in (7.14),

$$\mathbf{A}^H(\boldsymbol{\theta})\mathbf{N}^{-1}\frac{\partial\mathbf{A}(\boldsymbol{\theta})}{\partial\theta_p} = \mathbf{0}. \tag{7.25}$$

In that case, after some simple manipulations, it can be shown that

$$\begin{aligned}
\left[\mathcal{A}^{\#}(\boldsymbol{\theta})\mathbf{D}_r(\boldsymbol{\theta})\right]_p &= \mathrm{vec}\left(\mathbf{A}^{\#}(\boldsymbol{\theta})\frac{\partial\mathbf{R}(\boldsymbol{\theta})}{\partial\theta_p}\left[\mathbf{A}^{\#}(\boldsymbol{\theta})\right]^H\right) \\
&= \mathrm{vec}\left(\mathbf{A}^{\#}(\boldsymbol{\theta})\frac{\partial\mathbf{A}(\boldsymbol{\theta})}{\partial\theta_p} + \frac{\partial\mathbf{A}^H(\boldsymbol{\theta})}{\partial\theta_p}\left[\mathbf{A}^{\#}(\boldsymbol{\theta})\right]^H\right) = \mathbf{0}
\end{aligned}$$

and, thus, the second term in (7.24) is strictly zero independently of the nuisance parameters distribution. For example, this condition applies in digital synchronization as the observation length goes to infinity (Section 7.4.4). By comparing this condition and the one introduced in Section 7.3.1, we can conclude that the condition (7.25) never applies if the UCRB and GML suffer from self-noise at high SNR since, in that case, $\mathbf{P}_A^\perp(\boldsymbol{\theta})\,\partial\mathbf{A}(\boldsymbol{\theta})/\partial\theta_p = \mathbf{0}$.

2. Nuisance parameters distribution. Regardless of the signal parameterization, the Gaussian assumption applies at high SNR if all the eigenvalues of the kurtosis matrix $\mathbf{K}$ are different from $-1$. In that case, $\mathbf{P}_K^\perp(\boldsymbol{\theta})$ is strictly zero, and the second term in (7.24) becomes zero.

If the nuisance parameters are drawn from an arbitrary circular complex alphabet, the kurtosis matrix is given by $\mathbf{K} = (\rho-2)\,\mathrm{diag}(\mathrm{vec}(\mathbf{I}_K))$ (3.12) and, therefore, the Gaussian assumption always applies except if $\rho = 1$. It can be shown that this condition ($\rho = 1$) is solely verified in case of constant modulus alphabets. Accordingly, in the context of digital communications, the Gaussian assumption applies for any multilevel linear modulation such as QAM or APK. On the other hand, it does not apply in case of any complex MPSK modulation, for which $\rho = 1$.

If the nuisance parameters are not circular, there is no closed-form expression for the eigenvalues of $\mathbf{K}$. However, it is found that the kurtosis matrix of some important constant-modulus noncircular modulations has some eigenvalues equal to $-1$.
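Among them, special attention is given in this thesis to the CPM modulation. Other important constant-modulus noncircular modulations are BPSK and constant-modulus staggered modulations such as offset QPSK [Pro95].

Finally, in those scenarios in which the UCRB (7.19) and the GML (7.22) exhibit a variance floor at high SNR because $\mathbf{P}_A^\perp(\boldsymbol{\theta})\,\partial\mathbf{A}(\boldsymbol{\theta})/\partial\theta_p = \mathbf{0}$, the Gaussian assumption fails when the nuisance parameters have constant modulus. In that case, the second term in (7.24) allows cancelling the self-noise because

$$\mathbf{D}_r^H(\boldsymbol{\theta})\mathbf{Q}^{-1}(\boldsymbol{\theta})\mathbf{D}_r(\boldsymbol{\theta}) = \mathbf{B}_2(\boldsymbol{\theta}) + \sigma_w^{-2}\,\mathbf{D}_r^H(\boldsymbol{\theta})\left[\mathcal{A}^{\#}(\boldsymbol{\theta})\right]^H\mathbf{P}_K^\perp(\boldsymbol{\theta})\,\mathcal{A}^{\#}(\boldsymbol{\theta})\mathbf{D}_r(\boldsymbol{\theta}) + O(1)$$

and, therefore, the constant term $\mathbf{B}_2(\boldsymbol{\theta})$ (7.20) can be neglected when compared to the second term, which is proportional to $\sigma_w^{-2}$. Using this result, the asymptotic variance of the optimal second-order estimator is given by

$$\mathbf{B}_{bque}(\boldsymbol{\theta}) = \sigma_w^2\,\mathbf{D}_g(\boldsymbol{\theta})\left(\mathbf{D}_r^H(\boldsymbol{\theta})\left[\mathcal{A}^{\#}(\boldsymbol{\theta})\right]^H\mathbf{P}_K^\perp(\boldsymbol{\theta})\,\mathcal{A}^{\#}(\boldsymbol{\theta})\mathbf{D}_r(\boldsymbol{\theta})\right)^{-1}\mathbf{D}_g^H(\boldsymbol{\theta}) + O\left(\sigma_w^4\right).$$

This situation arises in the carrier phase estimation problem studied in Section 6.2 as well as in the scenarios simulated in Section 4.5 in which $\mathbf{A}(\boldsymbol{\theta})$ is not full-column rank.

7.3.4 Large Error Estimators

In this section, the asymptotic performance of the Bayesian estimators in (7.1) is analyzed when the SNR goes to infinity. The result of this asymptotic study depends on the influence of the Bayesian expectation $E_\theta\{\cdot\}$ on the following matrices:

$$\begin{aligned}
\mathbf{R} &\triangleq E_\theta\{\mathbf{R}(\boldsymbol{\theta})\} = \mathbf{G} + \sigma_w^2\mathbf{N} \\
\mathcal{R} &\triangleq E_\theta\{\mathcal{R}(\boldsymbol{\theta})\} = E_\theta\{\mathbf{R}^*(\boldsymbol{\theta})\otimes\mathbf{R}(\boldsymbol{\theta})\} = E_\theta\left\{\mathcal{A}(\boldsymbol{\theta})\mathcal{A}^H(\boldsymbol{\theta})\right\} + \sigma_w^2\,\mathcal{U} + \sigma_w^4\,\mathcal{N} \\
\mathbf{Q} &\triangleq E_\theta\{\mathbf{Q}(\boldsymbol{\theta})\} = \mathcal{R} + E_\theta\left\{\mathcal{A}(\boldsymbol{\theta})\mathbf{K}\mathcal{A}^H(\boldsymbol{\theta})\right\} = E_\theta\left\{\mathcal{A}(\boldsymbol{\theta})\left(\mathbf{I}_{K^2}+\mathbf{K}\right)\mathcal{A}^H(\boldsymbol{\theta})\right\} + \sigma_w^2\,\mathcal{U} + \sigma_w^4\,\mathcal{N}
\end{aligned} \tag{7.26}$$

with the following definitions:

$$\mathbf{G} \triangleq E_\theta\left\{\mathbf{A}(\boldsymbol{\theta})\mathbf{A}^H(\boldsymbol{\theta})\right\} \tag{7.27}$$

$$\mathcal{U} \triangleq \mathbf{G}^*\otimes\mathbf{N} + \mathbf{N}^*\otimes\mathbf{G}. \tag{7.28}$$

The Bayesian expectation always increases the rank of these matrices. Even if the prior distribution is rather informative, these matrices rapidly become full rank. Therefore, let us consider that $\mathbf{G}$, and hence $\mathcal{U}$, are eventually full rank.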
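In that case, the MSE matrices in (7.5) and (7.6) converge to the following limits at high SNR (Appendix 7.H), where primes mark the Gaussian-assumption counterparts in (7.6):

$$\begin{aligned}
\lim_{\sigma_w^2\to 0}\boldsymbol{\Sigma}_{mse} &= \boldsymbol{\Sigma}_g - \mathbf{S}^H\mathbf{B}_{T_1}\mathbf{S} \\
\lim_{\sigma_w^2\to 0}\boldsymbol{\Sigma}_{var} &= \boldsymbol{\Sigma}_g + \mathbf{S}^H\left(\tilde{\mathbf{Q}}\mathbf{B}_{T_2}\tilde{\mathbf{Q}}\right)^{\#}\mathbf{S} - \mathbf{S}^H\tilde{\mathbf{Q}}^{\#}\mathbf{S} \\
\lim_{\sigma_w^2\to 0}\boldsymbol{\Sigma}'_{mse} &= \boldsymbol{\Sigma}_g - \mathbf{S}^H\mathbf{B}_{T_3}\mathbf{S} + \mathbf{X}_{mse}(\mathbf{K}) \\
\lim_{\sigma_w^2\to 0}\boldsymbol{\Sigma}'_{var} &= \boldsymbol{\Sigma}_g + \mathbf{S}^H\left(\tilde{\mathbf{Q}}\mathbf{B}_{T_4}\tilde{\mathbf{Q}}\right)^{\#}\mathbf{S} - \mathbf{S}^H\tilde{\mathbf{Q}}^{\#}\mathbf{S} + \mathbf{X}_{var}(\mathbf{K})
\end{aligned} \tag{7.29}$$

where

$$\begin{aligned}
\mathbf{X}_{mse}(\mathbf{K}) &= \mathbf{S}^H\mathbf{B}_{T_3}E_\theta\left\{\mathcal{A}(\boldsymbol{\theta})\mathbf{K}\mathcal{A}^H(\boldsymbol{\theta})\right\}\mathbf{B}_{T_3}\mathbf{S} \\
\mathbf{X}_{var}(\mathbf{K}) &= \mathbf{S}^H\left(\tilde{\mathbf{Q}}\mathbf{B}_{T_4}\tilde{\mathbf{Q}}\right)^{\#}\tilde{\mathbf{Q}}\mathbf{B}_{T_4}E_\theta\left\{\mathcal{A}(\boldsymbol{\theta})\mathbf{K}\mathcal{A}^H(\boldsymbol{\theta})\right\}\mathbf{B}_{T_4}\tilde{\mathbf{Q}}\left(\tilde{\mathbf{Q}}\mathbf{B}_{T_4}\tilde{\mathbf{Q}}\right)^{\#}\mathbf{S}
\end{aligned} \tag{7.30}$$

and $\mathbf{B}_T$ is computed as

$$\mathbf{B}_T \triangleq \mathcal{U}^{-1}\mathbf{V}_T\left(\mathbf{V}_T^H\mathcal{U}^{-1}\mathbf{V}_T\right)^{-1}\boldsymbol{\Sigma}_T^{-1}\left(\mathbf{V}_T^H\mathcal{U}^{-1}\mathbf{V}_T\right)^{-1}\mathbf{V}_T^H\mathcal{U}^{-1} \tag{7.31}$$

with $\mathbf{V}_T\boldsymbol{\Sigma}_T\mathbf{V}_T^H$ the "economy-size" diagonalization of the specific matrix $\mathbf{T}$ considered in (7.29):

$$\begin{aligned}
\mathbf{T}_1 &\triangleq E_\theta\left\{\mathcal{A}(\boldsymbol{\theta})\left(\mathbf{I}_{K^2}+\mathbf{K}\right)\mathcal{A}^H(\boldsymbol{\theta})\right\} + \tilde{\mathbf{Q}} \\
\mathbf{T}_2 &\triangleq E_\theta\left\{\mathcal{A}(\boldsymbol{\theta})\left(\mathbf{I}_{K^2}+\mathbf{K}\right)\mathcal{A}^H(\boldsymbol{\theta})\right\} \\
\mathbf{T}_3 &\triangleq E_\theta\left\{\mathcal{A}(\boldsymbol{\theta})\mathcal{A}^H(\boldsymbol{\theta})\right\} + \tilde{\mathbf{Q}} \\
\mathbf{T}_4 &\triangleq E_\theta\left\{\mathcal{A}(\boldsymbol{\theta})\mathcal{A}^H(\boldsymbol{\theta})\right\}.
\end{aligned} \tag{7.32}$$

Taking a glance at (7.29), one observes that the terms

$$\mathbf{S}^H\left(\tilde{\mathbf{Q}}\mathbf{B}_{T_2}\tilde{\mathbf{Q}}\right)^{\#}\mathbf{S} \qquad\text{and}\qquad \mathbf{S}^H\left(\tilde{\mathbf{Q}}\mathbf{B}_{T_4}\tilde{\mathbf{Q}}\right)^{\#}\mathbf{S} + \mathbf{X}_{var}(\mathbf{K})$$

in $\boldsymbol{\Sigma}_{var}$ and $\boldsymbol{\Sigma}'_{var}$ correspond to the self-noise, whereas the term $\boldsymbol{\Sigma}_g - \mathbf{S}^H\tilde{\mathbf{Q}}^{\#}\mathbf{S}$ is the estimator bias at high SNR. The self-noise terms were found to vanish when the observation time was infinite for the problem of blind frequency estimation in Section 3.4. On the other hand, the bias term could not be cancelled by extending the observation time in the problem of blind frequency estimation (Section 3.4).

7.4 Large Sample Study

In this section, the asymptotic performance of the second-order estimators deduced in Chapter 4 is evaluated when the number of observed samples goes to infinity ($M \to \infty$). Notice that $M$ can be increased by augmenting either the sampling rate ($N_{ss}$) or the observation interval ($N_s = M/N_{ss}$). In the first case, the sampling theorem states that it is enough to take $N_{ss} = 2$ samples per symbol for those modulations with an excess bandwidth smaller than 100%. However, when the observation window is too short, the observed spectrum becomes wider due to the well-known smearing and leakage effects [Sto97].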
The proposed estimators deal with this problem by applying the best temporal window according to the known signal model and the adopted optimization criterion. Nonetheless, if the vector of nuisance parameters is longer than the number of observed samples, it is not possible to avoid the variance floor at high SNR unless $N_{ss}$ is increased (see Appendix 7.E). This problem is only relevant when the observation time is really short, as has been considered in this dissertation so far. If the observation time $N_s$ is augmented, the problem of spectral aliasing rapidly becomes negligible and $N_{ss} = 2$ becomes sufficient.

The importance of the sampling rate was also evidenced in Section 3.4 for the problem of carrier frequency-offset synchronization. It was shown therein that the estimator bias can only be cancelled if $N_{ss}$ goes to infinity. Surprisingly, the bias term cannot be removed by only increasing $N_s$. However, this sort of argument is specific to the frequency estimation problem and should be revised for other estimation problems.

Considering in the sequel that $N_{ss}$ is fixed, asymptotic expressions are given in this section for the small-error second-order estimators deduced in Chapter 4 as the observation length goes to infinity ($M \to \infty$). The study for the large-error estimators in Chapter 3 is omitted because it is less insightful due to the role of the Bayesian expectation (see Section 7.3.4).

In the large sample case, a unified analysis is not feasible because the results depend on the actual parameterization and on how $\mathbf{A}(\boldsymbol{\theta})$ grows when $M \to \infty$. In this section, the problems of non-data-aided synchronization in Section 6.1, blind time-of-arrival estimation in Section 6.3, blind channel identification in Section 6.4 and DOA estimation in Section 6.5 are considered.
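Before addressing the asymptotic study for the aforementioned estimation problems, the covariance matrices $\mathbf{B}_{UCRB}(\boldsymbol{\theta})$, $\mathbf{B}_{gml}(\boldsymbol{\theta})$ and $\mathbf{B}_{bque}(\boldsymbol{\theta})$ in Section 7.1 are now restated in terms of the following matrices:

$$\begin{aligned}
\mathbf{B}(\boldsymbol{\theta}) &\triangleq \mathbf{A}^H(\boldsymbol{\theta})\mathbf{N}^{-1}\mathbf{A}(\boldsymbol{\theta}) \\
\mathbf{B}_p(\boldsymbol{\theta}) &\triangleq \frac{\partial\mathbf{A}^H(\boldsymbol{\theta})}{\partial\theta_p}\mathbf{N}^{-1}\mathbf{A}(\boldsymbol{\theta}) \\
\mathbf{B}_{p,q}(\boldsymbol{\theta}) &\triangleq \frac{\partial\mathbf{A}^H(\boldsymbol{\theta})}{\partial\theta_p}\mathbf{N}^{-1}\frac{\partial\mathbf{A}(\boldsymbol{\theta})}{\partial\theta_q}
\end{aligned} \tag{7.33}$$

for $p, q = 1,\ldots,P$. Note that $\mathbf{B}(\boldsymbol{\theta})$ is redefined here with respect to (7.15). These matrices collect all the scalar products between two columns of $\mathbf{A}(\boldsymbol{\theta})$ and $\partial\mathbf{A}(\boldsymbol{\theta})/\partial\theta_p$, normalized by means of $\mathbf{N}^{-1}$. It is shown in the following subsections that these $K\times K$ matrices entirely determine the performance of second-order estimators. Thus, it is only necessary to study the asymptotic value of $\mathbf{B}(\boldsymbol{\theta})$, $\mathbf{B}_p(\boldsymbol{\theta})$ and $\mathbf{B}_{p,q}(\boldsymbol{\theta})$ as the number of observations goes to infinity ($M \to \infty$) or, in other words, as the dimension of the column space of $\mathbf{A}(\boldsymbol{\theta})$ and $\partial\mathbf{A}(\boldsymbol{\theta})/\partial\theta_p$ increases without limit. For the sake of clarity, we will consider hereafter that $\mathbf{g}(\boldsymbol{\theta}) = \boldsymbol{\theta}$ and, in most cases, the noise term will be assumed white, i.e., $\mathbf{N} = \mathbf{I}_M$.

7.4.1 (Gaussian) Unconditional Cramér-Rao Bound

After some simplifications, the (Gaussian) UCRB in (7.7) can be restated as

$$\begin{aligned}
\left[\mathbf{B}_{UCRB}^{-1}(\boldsymbol{\theta})\right]_{p,q} &= \left[\mathbf{D}_r^H(\boldsymbol{\theta})\mathcal{R}^{-1}(\boldsymbol{\theta})\mathbf{D}_r(\boldsymbol{\theta})\right]_{p,q} \\
&= 2\,\mathrm{Re}\,\mathrm{Tr}\left(\frac{\partial\mathbf{A}^H(\boldsymbol{\theta})}{\partial\theta_p}\mathbf{R}^{-1}(\boldsymbol{\theta})\mathbf{A}(\boldsymbol{\theta})\frac{\partial\mathbf{A}^H(\boldsymbol{\theta})}{\partial\theta_q}\mathbf{R}^{-1}(\boldsymbol{\theta})\mathbf{A}(\boldsymbol{\theta}) + \mathbf{A}^H(\boldsymbol{\theta})\mathbf{R}^{-1}(\boldsymbol{\theta})\mathbf{A}(\boldsymbol{\theta})\frac{\partial\mathbf{A}^H(\boldsymbol{\theta})}{\partial\theta_p}\mathbf{R}^{-1}(\boldsymbol{\theta})\frac{\partial\mathbf{A}(\boldsymbol{\theta})}{\partial\theta_q}\right)
\end{aligned}$$

using the algebraic properties in (7.54) from Appendix 7.A. Then, if the inversion lemma is applied to arrange the inverse of $\mathbf{R}(\boldsymbol{\theta})$ as

$$\mathbf{R}^{-1}(\boldsymbol{\theta}) = \sigma_w^{-2}\mathbf{N}^{-1} - \sigma_w^{-2}\mathbf{N}^{-1}\mathbf{A}(\boldsymbol{\theta})\left(\mathbf{A}^H(\boldsymbol{\theta})\mathbf{N}^{-1}\mathbf{A}(\boldsymbol{\theta}) + \sigma_w^2\mathbf{I}_K\right)^{-1}\mathbf{A}^H(\boldsymbol{\theta})\mathbf{N}^{-1}, \tag{7.34}$$

it follows that the entries of $\mathbf{B}_{UCRB}^{-1}(\boldsymbol{\theta})$ become a function of the following three matrices:

$$\begin{aligned}
\mathbf{X}(\boldsymbol{\theta}) &\triangleq \sigma_w^2\,\mathbf{A}^H(\boldsymbol{\theta})\mathbf{R}^{-1}(\boldsymbol{\theta})\mathbf{A}(\boldsymbol{\theta}) = \mathbf{B}(\boldsymbol{\theta}) - \mathbf{B}(\boldsymbol{\theta})\left(\mathbf{B}(\boldsymbol{\theta}) + \sigma_w^2\mathbf{I}_K\right)^{-1}\mathbf{B}(\boldsymbol{\theta}) \\
\mathbf{X}_p(\boldsymbol{\theta}) &\triangleq \sigma_w^2\,\frac{\partial\mathbf{A}^H(\boldsymbol{\theta})}{\partial\theta_p}\mathbf{R}^{-1}(\boldsymbol{\theta})\mathbf{A}(\boldsymbol{\theta}) = \mathbf{B}_p(\boldsymbol{\theta}) - \mathbf{B}_p(\boldsymbol{\theta})\left(\mathbf{B}(\boldsymbol{\theta}) + \sigma_w^2\mathbf{I}_K\right)^{-1}\mathbf{B}(\boldsymbol{\theta}) \\
\mathbf{X}_{p,q}(\boldsymbol{\theta}) &\triangleq \sigma_w^2\,\frac{\partial\mathbf{A}^H(\boldsymbol{\theta})}{\partial\theta_p}\mathbf{R}^{-1}(\boldsymbol{\theta})\frac{\partial\mathbf{A}(\boldsymbol{\theta})}{\partial\theta_q} = \mathbf{B}_{p,q}(\boldsymbol{\theta}) - \mathbf{B}_p(\boldsymbol{\theta})\left(\mathbf{B}(\boldsymbol{\theta}) + \sigma_w^2\mathbf{I}_K\right)^{-1}\mathbf{B}_q^H(\boldsymbol{\theta})
\end{aligned} \tag{7.35}$$

where $\mathbf{B}(\boldsymbol{\theta})$, $\mathbf{B}_p(\boldsymbol{\theta})$ and $\mathbf{B}_{p,q}(\boldsymbol{\theta})$ were introduced in (7.33).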
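The identities in (7.35), and the trace expression for the inverse UCRB that they simplify, can be verified numerically. The following sketch uses arbitrary random matrices as stand-ins: `A`, `dA` (playing the role of $\partial\mathbf{A}/\partial\theta_p$ for a single parameter, so $p = q$) and the correlated noise matrix `N` are assumptions made only for this check.

```python
import numpy as np

rng = np.random.default_rng(1)
M, K, sigma2 = 8, 3, 0.5

# Hypothetical transfer matrix and its derivative w.r.t. one parameter
A = rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))
dA = rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))

# Hypothetical correlated noise with unit diagonal
idx = np.arange(M)
N = 0.5 ** np.abs(idx[:, None] - idx[None, :])
N_inv = np.linalg.inv(N)

# K x K matrices of (7.33)
B = A.conj().T @ N_inv @ A
Bp = dA.conj().T @ N_inv @ A
Bpp = dA.conj().T @ N_inv @ dA

# Right-hand sides of (7.35)
T = np.linalg.inv(B + sigma2 * np.eye(K))
X = B - B @ T @ B
Xp = Bp - Bp @ T @ B
Xpp = Bpp - Bp @ T @ Bp.conj().T

# Direct evaluation with the M x M covariance matrix
R_inv = np.linalg.inv(A @ A.conj().T + sigma2 * N)
dR = dA @ A.conj().T + A @ dA.conj().T
fisher_direct = np.trace(R_inv @ dR @ R_inv @ dR).real
fisher_from_X = 2 / sigma2**2 * np.trace(Xp @ Xp + X @ Xpp).real
```

The practical point of the restatement is that only $K \times K$ matrices are inverted on the right-hand sides, whereas the direct evaluation inverts the $M \times M$ matrix $\mathbf{R}(\boldsymbol{\theta})$; the two routes agree to rounding error in this sketch.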
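Therefore, plugging (7.35) into (7.34), it follows that

$$\left[\mathbf{B}_{UCRB}^{-1}(\boldsymbol{\theta})\right]_{p,q} = 2\sigma_w^{-4}\,\mathrm{Re}\,\mathrm{Tr}\left(\mathbf{X}_p(\boldsymbol{\theta})\mathbf{X}_q(\boldsymbol{\theta}) + \mathbf{X}(\boldsymbol{\theta})\mathbf{X}_{p,q}(\boldsymbol{\theta})\right). \tag{7.36}$$

7.4.2 Gaussian Maximum Likelihood

In this section, the GML covariance matrix $\mathbf{B}_{gml}(\boldsymbol{\theta})$ is restated in terms of $\mathbf{B}(\boldsymbol{\theta})$, $\mathbf{B}_p(\boldsymbol{\theta})$ and $\mathbf{B}_{p,q}(\boldsymbol{\theta})$. Bearing in mind that $\mathbf{D}_g(\boldsymbol{\theta}) = \mathbf{I}_P$, it follows that (7.6) can be written as

$$\mathbf{B}_{gml}(\boldsymbol{\theta}) = \mathbf{B}_{UCRB}(\boldsymbol{\theta}) + \mathbf{X}_{gml}(\mathbf{K}) = \mathbf{B}_{UCRB}(\boldsymbol{\theta}) + \mathbf{B}_{UCRB}(\boldsymbol{\theta})\,\boldsymbol{\Psi}(\mathbf{K})\,\mathbf{B}_{UCRB}(\boldsymbol{\theta}) \tag{7.37}$$

where

$$\boldsymbol{\Psi}(\mathbf{K}) \triangleq \mathbf{D}_r^H(\boldsymbol{\theta})\mathcal{R}^{-1}(\boldsymbol{\theta})\mathcal{A}(\boldsymbol{\theta})\,\mathbf{K}\,\mathcal{A}^H(\boldsymbol{\theta})\mathcal{R}^{-1}(\boldsymbol{\theta})\mathbf{D}_r(\boldsymbol{\theta}).$$

Next, we will prove that $\boldsymbol{\Psi}(\mathbf{K})$ is also a function of $\mathbf{B}(\boldsymbol{\theta})$, $\mathbf{B}_p(\boldsymbol{\theta})$ and $\mathbf{B}_{p,q}(\boldsymbol{\theta})$ (7.33). Taking into account the definitions of $\mathcal{A}(\boldsymbol{\theta})$ and $\mathcal{R}(\boldsymbol{\theta})$ in (7.2), the mixed-product property of the Kronecker product yields

$$\mathcal{A}^H(\boldsymbol{\theta})\mathcal{R}^{-1}(\boldsymbol{\theta}) = \left[\mathbf{A}^H(\boldsymbol{\theta})\mathbf{R}^{-1}(\boldsymbol{\theta})\right]^*\otimes\left[\mathbf{A}^H(\boldsymbol{\theta})\mathbf{R}^{-1}(\boldsymbol{\theta})\right].$$

Then, using again the matrix properties in (7.54), it can be seen that

$$\left[\mathcal{A}^H(\boldsymbol{\theta})\mathcal{R}^{-1}(\boldsymbol{\theta})\mathbf{D}_r(\boldsymbol{\theta})\right]_p = \sigma_w^{-4}\,\mathrm{vec}\left(\mathbf{X}(\boldsymbol{\theta})\mathbf{X}_p(\boldsymbol{\theta}) + \mathbf{X}_p^H(\boldsymbol{\theta})\mathbf{X}(\boldsymbol{\theta})\right)$$

where $\mathbf{X}(\boldsymbol{\theta})$ and $\mathbf{X}_p(\boldsymbol{\theta})$ were introduced in (7.35). Therefore, $\boldsymbol{\Psi}(\mathbf{K})$ can be written as

$$\begin{aligned}
[\boldsymbol{\Psi}(\mathbf{K})]_{p,q} &= \sigma_w^{-4}\,\mathrm{vec}^H(\mathbf{Y}_p(\boldsymbol{\theta}))\,\mathbf{K}\,\mathrm{vec}(\mathbf{Y}_q(\boldsymbol{\theta})) \\
&= \sigma_w^{-4}\,\mathrm{vec}^H(\mathbf{Y}_p(\boldsymbol{\theta}))\,\mathbf{V}_K\boldsymbol{\Sigma}_K\mathbf{V}_K^H\,\mathrm{vec}(\mathbf{Y}_q(\boldsymbol{\theta}))
\end{aligned} \tag{7.38}$$

where

$$\mathbf{Y}_p(\boldsymbol{\theta}) \triangleq \mathbf{X}(\boldsymbol{\theta})\mathbf{X}_p(\boldsymbol{\theta}) + \mathbf{X}_p^H(\boldsymbol{\theta})\mathbf{X}(\boldsymbol{\theta}) \tag{7.39}$$

and $\mathbf{V}_K\boldsymbol{\Sigma}_K\mathbf{V}_K^H$ is the "economy-size" diagonalization of $\mathbf{K}$.

To conclude this analysis, $\boldsymbol{\Psi}(\mathbf{K})$ can be further simplified when the nuisance parameters are circular. In that case, the kurtosis matrix $\mathbf{K}$ is equal to $(\rho-2)\,\mathrm{diag}(\mathrm{vec}(\mathbf{I}_K))$ (3.12), and the eigenvalues and eigenvectors of $\mathbf{K}$ are given by

$$\boldsymbol{\Sigma}_K = (\rho-2)\,\mathbf{I}_K, \qquad [\mathbf{V}_K]_k = \mathrm{vec}\left(\mathbf{e}_k\mathbf{e}_k^H\right) \tag{7.40}$$

where $\mathbf{e}_k \in \mathbb{R}^K$ is defined as

$$[\mathbf{e}_k]_i \triangleq \begin{cases} 1 & i = k \\ 0 & i \neq k. \end{cases}$$

In [Mag98, Ex.4, p.62], it is shown that $\mathbf{V}_K$ has the following interesting properties:

$$\begin{aligned}
\mathbf{V}_K^H\,\mathrm{vec}(\mathbf{A}) &= \mathrm{diag}(\mathbf{A}) \\
\left[(\mathbf{A}\otimes\mathbf{B})\mathbf{V}_K\right]_k &= [\mathbf{A}]_k\otimes[\mathbf{B}]_k \\
\mathbf{V}_K^H(\mathbf{A}\otimes\mathbf{B})\mathbf{V}_K &= \mathbf{A}\odot\mathbf{B}
\end{aligned} \tag{7.41}$$

for any pair of matrices $\mathbf{A}$ and $\mathbf{B}$ of appropriate size, with $\odot$ denoting the elementwise (Hadamard) product.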
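The three properties in (7.41), together with the circular kurtosis matrix of (3.12), are easy to check numerically. In the sketch below, the helper names `selection_matrix`, `rho` and `circular_kurtosis` are hypothetical, and $\rho$ is assumed to be the ratio between the fourth-order moment and the squared second-order moment of the symbols, which is consistent with $\rho = 1$ for constant modulus alphabets and $\mathbf{K} = \mathbf{0}$ for $\rho = 2$.

```python
import numpy as np

def vec(X):
    """Column-major vectorization, matching vec(.) in the text."""
    return X.reshape(-1, order="F")

def selection_matrix(K):
    """V_K, whose k-th column is vec(e_k e_k^H) as in (7.40)."""
    I = np.eye(K)
    return np.stack([vec(np.outer(I[:, k], I[:, k])) for k in range(K)], axis=1)

def rho(constellation):
    """Assumed definition: E|x|^4 / (E|x|^2)^2 for equiprobable symbols."""
    power = np.abs(np.asarray(constellation)) ** 2
    return np.mean(power**2) / np.mean(power) ** 2

def circular_kurtosis(rho_value, K):
    """K = (rho - 2) diag(vec(I_K)), the circular-alphabet form (3.12)."""
    return (rho_value - 2) * np.diag(vec(np.eye(K)))

K = 3
V = selection_matrix(K)
rng = np.random.default_rng(2)
A = rng.standard_normal((K, K)) + 1j * rng.standard_normal((K, K))
B = rng.standard_normal((K, K)) + 1j * rng.standard_normal((K, K))

qpsk = np.exp(1j * (np.pi / 4 + np.pi / 2 * np.arange(4)))
levels = np.array([-3, -1, 1, 3])
qam16 = (levels[:, None] + 1j * levels[None, :]).ravel()

eig_qpsk = np.linalg.eigvalsh(circular_kurtosis(rho(qpsk), K))
eig_qam16 = np.linalg.eigvalsh(circular_kurtosis(rho(qam16), K))
```

For QPSK the nonzero eigenvalues of the kurtosis matrix are exactly $-1$, which is the case singled out in Section 7.3.3, whereas for 16-QAM they equal $\rho - 2 = -0.68$ and the projector $\mathbf{P}_K^\perp$ vanishes.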
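Taking into account the first property in (7.41), (7.38) becomes

$$[\boldsymbol{\Psi}(\mathbf{K})]_{p,q} = \sigma_w^{-4}\,\mathrm{diag}^H(\mathbf{Y}_p(\boldsymbol{\theta}))\,\boldsymbol{\Sigma}_K\,\mathrm{diag}(\mathbf{Y}_q(\boldsymbol{\theta})) = \sigma_w^{-4}(\rho-2)\,\mathrm{Tr}\left(\mathbf{Y}_p(\boldsymbol{\theta})\odot\mathbf{Y}_q(\boldsymbol{\theta})\right) \tag{7.42}$$

using (7.40) and the following identity, in which $\odot$ denotes the elementwise (Hadamard) product:

$$\mathrm{diag}^H(\mathbf{A})\,\mathrm{diag}(\mathbf{B}) = \mathrm{Tr}\left(\mathbf{A}^*\odot\mathbf{B}\right) = \mathrm{Tr}\left(\mathbf{A}^H\odot\mathbf{B}\right).$$

Regarding now the definition of $\mathbf{Y}_p(\boldsymbol{\theta})$ (7.39), it follows that $\mathrm{diag}(\mathbf{Y}_p(\boldsymbol{\theta}))$ is always real-valued because

$$\mathrm{diag}(\mathbf{Y}_p(\boldsymbol{\theta})) = 2\,\mathrm{Re}\,\mathrm{diag}\left(\mathbf{X}_p^H(\boldsymbol{\theta})\mathbf{X}(\boldsymbol{\theta})\right).$$

7.4.3 Best Quadratic Unbiased Estimator

In this section, the same analysis is carried out for the optimal second-order estimator. The aim is also to formulate $\mathbf{B}_{bque}(\boldsymbol{\theta})$ in terms of $\mathbf{B}(\boldsymbol{\theta})$, $\mathbf{B}_p(\boldsymbol{\theta})$ and $\mathbf{B}_{p,q}(\boldsymbol{\theta})$ (7.33). To begin with, the inversion lemma is applied to $\mathbf{Q}^{-1}(\boldsymbol{\theta})$, obtaining

$$\mathbf{B}_{bque}^{-1}(\boldsymbol{\theta}) = \mathbf{D}_r^H(\boldsymbol{\theta})\mathbf{Q}^{-1}(\boldsymbol{\theta})\mathbf{D}_r(\boldsymbol{\theta}) = \mathbf{D}_r^H(\boldsymbol{\theta})\mathcal{R}^{-1}(\boldsymbol{\theta})\mathbf{D}_r(\boldsymbol{\theta}) + \boldsymbol{\Gamma}(\mathbf{K}) \tag{7.43}$$

where the first term corresponds to $\mathbf{B}_{UCRB}^{-1}(\boldsymbol{\theta})$ in (7.36), and

$$\boldsymbol{\Gamma}(\mathbf{K}) \triangleq -\mathbf{D}_r^H(\boldsymbol{\theta})\mathcal{R}^{-1}(\boldsymbol{\theta})\mathcal{A}(\boldsymbol{\theta})\mathbf{V}_K\left(\mathbf{V}_K^H\mathcal{A}^H(\boldsymbol{\theta})\mathcal{R}^{-1}(\boldsymbol{\theta})\mathcal{A}(\boldsymbol{\theta})\mathbf{V}_K + \boldsymbol{\Sigma}_K^{-1}\right)^{-1}\mathbf{V}_K^H\mathcal{A}^H(\boldsymbol{\theta})\mathcal{R}^{-1}(\boldsymbol{\theta})\mathbf{D}_r(\boldsymbol{\theta}) \tag{7.44}$$

is the term depending on the kurtosis matrix². The "economy-size" diagonalization of $\mathbf{K}$ is introduced to encompass those problems in which $\mathbf{K}$ is singular (e.g., CPM).

Next, $\boldsymbol{\Gamma}(\mathbf{K})$ is formulated in terms of $\mathbf{B}(\boldsymbol{\theta})$, $\mathbf{B}_p(\boldsymbol{\theta})$ and $\mathbf{B}_{p,q}(\boldsymbol{\theta})$ (7.33),

$$[\boldsymbol{\Gamma}(\mathbf{K})]_{p,q} = -\sigma_w^{-4}\,\mathrm{vec}^H(\mathbf{Y}_p(\boldsymbol{\theta}))\,\mathbf{V}_K\left(\mathbf{V}_K^H\left[\mathbf{X}^*(\boldsymbol{\theta})\otimes\mathbf{X}(\boldsymbol{\theta})\right]\mathbf{V}_K + \sigma_w^4\boldsymbol{\Sigma}_K^{-1}\right)^{-1}\mathbf{V}_K^H\,\mathrm{vec}(\mathbf{Y}_q(\boldsymbol{\theta})), \tag{7.45}$$

using the following identities:

$$\begin{aligned}
\left[\mathcal{A}^H(\boldsymbol{\theta})\mathcal{R}^{-1}(\boldsymbol{\theta})\mathbf{D}_r(\boldsymbol{\theta})\right]_p &= \sigma_w^{-4}\,\mathrm{vec}(\mathbf{Y}_p(\boldsymbol{\theta})) \\
\mathcal{A}^H(\boldsymbol{\theta})\mathcal{R}^{-1}(\boldsymbol{\theta})\mathcal{A}(\boldsymbol{\theta}) &= \sigma_w^{-4}\left[\mathbf{X}^*(\boldsymbol{\theta})\otimes\mathbf{X}(\boldsymbol{\theta})\right].
\end{aligned}$$

Unfortunately, the analysis of (7.45) is really involved except for those circular alphabets holding $\mathbf{K} = (\rho-2)\,\mathrm{diag}(\mathrm{vec}(\mathbf{I}_K))$ (3.12). In that case, from (7.40) and (7.41), we have

$$\begin{aligned}
\mathbf{V}_K^H\,\mathrm{vec}(\mathbf{Y}_p(\boldsymbol{\theta})) &= \mathrm{diag}(\mathbf{Y}_p(\boldsymbol{\theta})) \\
\mathbf{V}_K^H\left[\mathbf{X}^*(\boldsymbol{\theta})\otimes\mathbf{X}(\boldsymbol{\theta})\right]\mathbf{V}_K &= \mathbf{X}^*(\boldsymbol{\theta})\odot\mathbf{X}(\boldsymbol{\theta}).
\end{aligned}$$

Accordingly, if $\rho \neq 2$, the non-Gaussian term $\boldsymbol{\Gamma}(\mathbf{K})$ is given by

$$[\boldsymbol{\Gamma}(\mathbf{K})]_{p,q} = -\sigma_w^{-4}\,\mathrm{diag}^H(\mathbf{Y}_p(\boldsymbol{\theta}))\left(\mathbf{X}^*(\boldsymbol{\theta})\odot\mathbf{X}(\boldsymbol{\theta}) + \sigma_w^4(\rho-2)^{-1}\mathbf{I}_K\right)^{-1}\mathrm{diag}(\mathbf{Y}_q(\boldsymbol{\theta})). \tag{7.46}$$

Regarding the last expression, the following important conclusion arises.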
If the nuisance parameters are circular (3.12), the Gaussian assumption applies independently of the SNR and the nuisance parameters distribution if

$$\mathrm{diag}(\mathbf{Y}_p(\boldsymbol{\theta})) = 2\,\mathrm{Re}\,\mathrm{diag}\left(\mathbf{X}_p^H(\boldsymbol{\theta})\mathbf{X}(\boldsymbol{\theta})\right) = \mathbf{0} \tag{7.47}$$

for $p = 1,\ldots,P$, where $\mathbf{X}(\boldsymbol{\theta})$ and $\mathbf{X}_p(\boldsymbol{\theta})$ were defined in (7.35). If the last equation holds true, $\boldsymbol{\Psi}(\mathbf{K})$ and $\boldsymbol{\Gamma}(\mathbf{K})$ are exactly zero in view of (7.42) and (7.46). This condition will be tested in the following sections to validate the Gaussian assumption in some relevant estimation problems in digital communications.

Notice that the last condition is more restrictive than the one presented in (7.25). Actually, it is straightforward to realize that (7.47) is satisfied if (7.25) holds because, in that case, $\mathbf{B}_p^H(\boldsymbol{\theta}) = \mathbf{A}^H(\boldsymbol{\theta})\mathbf{N}^{-1}\,\partial\mathbf{A}(\boldsymbol{\theta})/\partial\theta_p = \mathbf{0}$ and hence $\mathbf{X}_p(\boldsymbol{\theta}) = \mathbf{0}$ (7.35).

² Notice that $\boldsymbol{\Gamma}(\mathbf{K})$ is actually the second term in (7.23).

7.4.4 Second-Order Estimation in Digital Communications

In this section, simple asymptotic closed-form expressions are obtained for any estimation problem in which multiple replicas of the same waveform (pulse) are periodically received. The received waveform is parameterized by a finite set of parameters $\boldsymbol{\theta}$ and will be referred to as $g(t;\boldsymbol{\theta})$ in this section. Assuming that a continuous stream of symbols is received, the structure of $\mathbf{A}(\boldsymbol{\theta})$ corresponds to the one represented in Fig. 6.1 (right-hand side) in Section 6.1.2.

This framework encompasses most estimation problems in digital communications, among others the synchronization problems described in Section 6.1, the problem of blind channel identification in Section 6.4 and the problem of time-of-arrival estimation studied in Section 6.3. Although the problem of frequency estimation does not fall into this category because the phase of the received waveform is time-varying, it is proved in Appendix 7.J that quadratic NDA techniques are only aware of the carrier phase variation within the received pulse duration.
In this section, the asymptotic value of $\mathbf{B}(\theta)$, $\mathbf{B}_p(\theta)$ and $\mathbf{B}_{p,q}(\theta)$ (7.33) is determined for $N_s$ going to infinity. In that case, the size of these $K \times K$ square matrices also increases proportionally as $N_s \to \infty$ because $K = N_s + L - 1$ with $L$ the pulse duration in symbol periods. However, although the size of $\mathbf{B}(\theta)$, $\mathbf{B}_p(\theta)$ and $\mathbf{B}_{p,q}(\theta)$ tends to infinity, the central rows and columns of these matrices contain delayed versions of the following autocorrelation and cross-correlation functions³

$$R[k] \triangleq \int_{-\infty}^{\infty} g(t)\,g^*(t+kT)\,dt$$
$$R_p[k] \triangleq \int_{-\infty}^{\infty} g(t)\,g_p^*(t+kT)\,dt$$
$$R_{p,q}[k] \triangleq \int_{-\infty}^{\infty} g_p(t)\,g_q^*(t+kT)\,dt,$$

where $g_p(t;\theta) \triangleq \partial g(t;\theta)/\partial\theta_p$ stands for the derivative of $g(t;\theta)$ with respect to the $p$-th parameter $\theta_p$. In the sequel, the dependence on $\theta$ will be omitted for the sake of brevity.

Henceforth, only the central rows and columns of $\mathbf{B}(\theta)$, $\mathbf{B}_p(\theta)$ and $\mathbf{B}_{p,q}(\theta)$ will be considered, bearing in mind that the "edge effect" is negligible in the asymptotic case ($N_s \to \infty$) or in case of TDMA signals (Section 6.1.2). This analysis is inspired by the asymptotic study carried out in [Rib01b] for the CML timing estimator.⁴

³ For simplicity the noise is assumed uncorrelated, i.e., $\mathbf{N} = \mathbf{I}_M$. Otherwise, the same expressions are valid for the whitened waveform $\eta(t) * g(t;\theta)$ where $\eta(mT_s)$ is the central column of $\mathbf{N}^{-1/2}$.

⁴ Likewise, the same reasoning was adopted in [Kay93b, Sec. 7.9] to get asymptotic expressions for the Newton-Raphson and scoring recursions in the context of maximum likelihood estimation.

In [Rib01b], it is shown that the multiplication of these matrices yields another matrix whose central columns and rows are the convolution ($*$) of the central columns and rows of the original matrices.
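The convolution property quoted from [Rib01b] can be illustrated with a small numerical sketch. The two short sequences below are hypothetical values chosen only for illustration, and the indexing convention (entry $(i,j)$ of each matrix equal to the sequence at lag $i-j$) is an assumption of this sketch rather than a convention taken from the text.

```python
def toeplitz(seq, n):
    """n x n Toeplitz matrix whose (i, j) entry is seq[i - j] (zero outside the support)."""
    return [[seq.get(i - j, 0.0) for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Plain matrix product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def convolve(a, b):
    """Linear convolution of two finite sequences stored as {lag: value} dicts."""
    out = {}
    for ka, va in a.items():
        for kb, vb in b.items():
            out[ka + kb] = out.get(ka + kb, 0.0) + va * vb
    return out


# Hypothetical correlation sequences (illustrative values only).
r = {-1: 0.5 - 0.2j, 0: 1.0, 1: 0.5 + 0.2j}
r_p = {-1: -0.3j, 0: 0.1, 1: 0.3j}

n = 9
product = matmul(toeplitz(r, n), toeplitz(r_p, n))
conv = convolve(r, r_p)

# Central row of the product, read as a function of the lag i - j.
centre = n // 2
central_row = {lag: product[centre][centre - lag] for lag in range(-2, 3)}
max_error = max(abs(central_row[lag] - conv.get(lag, 0.0)) for lag in central_row)

# Corner entry: the sum over the inner index is truncated, so it differs from conv[0].
edge_error = abs(product[0][0] - conv[0])
```

In this sketch `max_error` is at rounding level, whereas `edge_error` is not, which is the "edge effect" that becomes negligible when the matrices are large compared with the support of the sequences.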
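This allows computing $\mathbf{B}_{UCRB}(\theta)$, $\mathbf{B}_{gml}(\theta)$ and $\mathbf{B}_{bque}(\theta)$ as follows:

$$\left[\mathbf{B}_{UCRB}^{-1}(\theta)\right]_{p,q} = 2\sigma_w^{-4}\,\mathrm{Re}\,\mathrm{Tr}\left(\mathbf{X}_p(\theta)\mathbf{X}_q(\theta) + \mathbf{X}(\theta)\mathbf{X}_{p,q}(\theta)\right) = 2N_s\sigma_w^{-4}\,\mathrm{Re}\left.\left(X_p[k]*X_q[k] + X[k]*X_{p,q}[k]\right)\right|_{k=0} + o(N_s)$$

where the central rows and columns of $\mathbf{X}(\theta)$, $\mathbf{X}_p(\theta)$ and $\mathbf{X}_{p,q}(\theta)$ are given by

$$\begin{aligned}
X[k] &\triangleq R[k] - R[k] * \left(R[k] + \sigma_w^2\delta[k]\right)^{-1} * R[k] \\
X_p[k] &\triangleq R_p[k] - R_p[k] * \left(R[k] + \sigma_w^2\delta[k]\right)^{-1} * R[k] \\
X_{p,q}[k] &\triangleq R_{p,q}[k] - R_p[k] * \left(R[k] + \sigma_w^2\delta[k]\right)^{-1} * R_q^*[-k].
\end{aligned} \tag{7.48}$$

In the above equations, the inverse operator $(\cdot)^{-1}$ stands for the deconvolution, i.e., $a^{-1}[k]$ is the sequence holding $a[k] * a^{-1}[k] = \delta[k]$. As is usual practice, this deconvolution is solved in the frequency domain. Using standard Fourier calculus, it is found that

$$\left[\mathbf{B}_{UCRB}^{-1}(\theta)\right]_{p,q} = 2N_s\sigma_w^{-4}\,\mathrm{Re}\int_{-0.5}^{0.5}\left(S_{X_p}(f)\,S_{X_q}(f) + S_X(f)\,S_{X_{p,q}}(f)\right)df + o(N_s)$$

where the Fourier transforms of $X[k]$, $X_p[k]$ and $X_{p,q}[k]$ are given next in terms of the Fourier transforms of $R[k]$, $R_p[k]$ and $R_{p,q}[k]$:

$$\begin{aligned}
S_X(f) &\triangleq \mathcal{F}\{X[k]\} = S(f) - \frac{S^2(f)}{S(f)+\sigma_w^2} = \frac{\sigma_w^2\,S(f)}{S(f)+\sigma_w^2} \\
S_{X_p}(f) &\triangleq \mathcal{F}\{X_p[k]\} = S_p(f) - \frac{S_p(f)\,S(f)}{S(f)+\sigma_w^2} = \frac{\sigma_w^2\,S_p(f)}{S(f)+\sigma_w^2} \\
S_{X_{p,q}}(f) &\triangleq \mathcal{F}\{X_{p,q}[k]\} = S_{p,q}(f) - \frac{S_p(f)\,S_q^*(f)}{S(f)+\sigma_w^2}.
\end{aligned} \tag{7.49}$$

Focusing uniquely on circular complex alphabets (e.g., MPSK and QAM), the term $\boldsymbol{\Psi}(\mathbf{K})$ appearing in the GML covariance matrix (7.37) is asymptotically ($N_s \to \infty$) given by

$$[\boldsymbol{\Psi}(\mathbf{K})]_{p,q} = \sigma_w^{-4}(\rho-2)\,\mathrm{Tr}\left(\mathbf{Y}_p(\theta)\odot\mathbf{Y}_q(\theta)\right) = N_s\sigma_w^{-4}(\rho-2)\,Y_p^2[0] + o(N_s)$$

with $\odot$ the element-wise product and

$$Y_p[k] \triangleq X[k]*X_p[k] + X_p^*[-k]*X[k] = 2\,\mathrm{Re}\left\{X_p^*[-k]*X[k]\right\}.$$

On the other hand, the term $\boldsymbol{\Gamma}(\mathbf{K})$ (7.46) appearing in the BQUE covariance matrix (7.43)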
is asymptotically equal to

$$\begin{aligned}
[\boldsymbol{\Gamma}(\mathbf{K})]_{p,q} &= -\sigma_w^{-4}\,\mathrm{diag}^H(\mathbf{Y}_p(\theta))\left(\mathbf{X}^*(\theta)\odot\mathbf{X}(\theta) + \sigma_w^4(\rho-2)^{-1}\mathbf{I}_K\right)^{-1}\mathrm{diag}(\mathbf{Y}_q(\theta)) \\
&= -N_s\sigma_w^{-4}\,Y_p^2[0]\sum_{k=-\infty}^{\infty}\left(|X[k]|^2 + \sigma_w^4(\rho-2)^{-1}\delta[k]\right)^{-1} + o(N_s) \\
&= -\sigma_w^{-4}N_s\,\frac{Y_p^2[0]}{\left.\left[S_X^*(f)*S_X(-f) + \sigma_w^4/(\rho-2)\right]\right|_{f=0}} + o(N_s) \\
&= -\sigma_w^{-4}N_s\,\frac{\left(\int_{-0.5}^{0.5}S_{Y_p}(f)\,df\right)^2}{\int_{-0.5}^{0.5}|S_X(f)|^2\,df + \sigma_w^4/(\rho-2)} + o(N_s)
\end{aligned}$$

with $S_{Y_p}(f) = \mathcal{F}\{Y_p[k]\}$ the Fourier transform of $Y_p[k]$ and $\odot$ the element-wise product. Regarding now the Gaussian condition in (7.47), the matrices $\boldsymbol{\Gamma}(\mathbf{K})$ and $\boldsymbol{\Psi}(\mathbf{K})$ are asymptotically null if

$$Y_p[0] = 2\,\mathrm{Re}\left.\left\{X_p^*[-k]*X[k]\right\}\right|_{k=0} = 2\,\mathrm{Re}\sum_n X_p^*[n]\,X[n] = 2\,\mathrm{Re}\int_{-0.5}^{0.5}S_{X_p}^*(f)\,S_X(f)\,df = 2\sigma_w^4\,\mathrm{Re}\int_{-0.5}^{0.5}\frac{S_p^*(f)\,S(f)}{\left|S(f)+\sigma_w^2\right|^2}\,df$$

is equal to zero independently of the actual value of $\rho$ and $\sigma_w^2$. The last expression has been formulated in the frequency domain using Parseval's identity and the Fourier transforms of $X[k]$ and $X_p[k]$ in (7.49). Notice that the energy spectrum $S(f) = \mathcal{F}\{R[k]\}$ is always real because $R[k]$ has Hermitian symmetry, i.e., $R[k] = R^*[-k]$. Besides, $S(f)$ is even if $R[k]$ is real-valued, which is the case when $g(t;\theta)$ is real-valued. Therefore, there are three possible situations leading to $Y_p[0] = 0$, and hence validating the Gaussian assumption:

1. $S_p(f)$ is imaginary, i.e., $\mathrm{Re}\{S_p(f)\} = 0$. From the Fourier theory, $S_p(f) = \mathcal{F}\{R_p[k]\}$ is imaginary if $R_p[k]$ is imaginary or if $R_p[k]$ is real but has odd symmetry.

2. $S(f)$ is an even function whereas $S_p(f)$ is an odd function. The former condition holds for $g(t;\theta)$ real-valued. The latter condition holds if and only if the cross-correlation $R_p[k]$ is also odd.

3. $S(f)$ is an odd function whereas $S_p(f)$ is an even function. The former condition holds when the received waveform $g(t;\theta)$ is imaginary and even. The latter condition holds if the cross-correlation $R_p[k]$ is also even.

It is found that the last conditions usually apply in frequency and timing⁵ synchronization.
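The frequency-domain test $Y_p[0] = 0$ can be tried numerically with a rough sketch such as the following. The correlation sequences are hypothetical short sequences (not taken from any pulse used in this thesis), the integral is approximated with a midpoint rule, and the function names are illustrative only. A real, odd $R_p[k]$ gives an imaginary $S_p(f)$ and therefore a null value, whereas an $R_p[k]$ proportional to $R[k]$ does not.

```python
import cmath
import math


def dtft(seq, f):
    """Discrete-time Fourier transform of a finite {lag: value} sequence at frequency f."""
    return sum(v * cmath.exp(-2j * math.pi * f * k) for k, v in seq.items())


def y_p0(r, r_p, sigma2, n_grid=2048):
    """Midpoint-rule estimate of 2*sigma_w^4 * Re int S_p^*(f) S(f) / |S(f) + sigma_w^2|^2 df."""
    acc = 0.0
    for i in range(n_grid):
        f = -0.5 + (i + 0.5) / n_grid
        s = dtft(r, f)
        s_p = dtft(r_p, f)
        acc += (s_p.conjugate() * s / abs(s + sigma2) ** 2).real
    return 2.0 * sigma2 ** 2 * acc / n_grid


# Hypothetical autocorrelation: S(f) = 1 + cos(2*pi*f) >= 0, real and even.
r = {-1: 0.5, 0: 1.0, 1: 0.5}

# Real and odd cross-correlation, so S_p(f) is purely imaginary.
r_p_odd = {-1: -0.3, 1: 0.3}

# Cross-correlation proportional to R[k], so S_p(f) is real and of constant sign.
r_p_scaled = {k: 0.7 * v for k, v in r.items()}

odd_case = y_p0(r, r_p_odd, sigma2=0.1)
scaled_case = y_p0(r, r_p_scaled, sigma2=0.1)
```

Under these assumptions `odd_case` vanishes up to rounding while `scaled_case` stays strictly positive, so only the first configuration would validate the Gaussian assumption.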
In frequency synchronization, the cross-correlation $R_p[k]$ is imaginary (condition 1). On the other hand, in timing synchronization, the cross-correlation $R_p[k]$ is a real-valued odd function because the transmitted pulse is usually a real symmetric function in digital communications (condition 2).⁶ Nonetheless, the Gaussian assumption generally fails in the problem of blind channel identification (Section 6.4) because the received waveform $g(t;\theta)$ is distorted by the complex channel impulse response and, hence, the complex cross-correlation $R_p[k]$ does not exhibit any symmetry. For example, if the channel is multiplicative, the received waveform is a scaled version of the transmitted pulse and it is easy to show that $R_p[k]$ is proportional to $R[k]$.

⁵ The same conclusion applies to the related problem of time-of-arrival estimation in radiolocation applications.

7.4.5 Second-Order Estimation in Array Signal Processing

In array signal processing, the spatio-temporal observation can be written as

$$\mathbf{y} = \mathrm{vec}\left(\mathbf{A}_s(\theta)\,(\mathbf{A}_t\mathbf{X})^T\right) + \mathbf{w} \tag{7.50}$$

where $[\mathbf{A}_s(\theta)]_p \triangleq \exp\left(j\pi\theta_p\,\mathbf{d}_M\right)$ is the spatial signature of the $p$-th user impinging on a $\lambda/2$-spaced linear array from the direction $\theta_p \in [-1,1)$ with

$$\mathbf{d}_M \triangleq \tilde{\mathbf{d}}_M - \frac{M-1}{2}, \qquad \tilde{\mathbf{d}}_M = [0,\ldots,M-1]^T.$$

In (7.50), the modulation matrix $\mathbf{A}_t \in \mathbb{R}^{N_s\times K}$ contains the shaping pulse $p(t)$, $[\mathbf{X}]_k$ are the received symbols from the $k$-th user and $\mathbf{w}$ is the spatio-temporal Gaussian noise vector. Notice that the array is calibrated to have unitary response when the signal comes from the broadside ($\theta_p = 0$). However, the same results would be obtained adopting any other calibration.

The observation vector $\mathbf{y}$ can be arranged in the standard form, $\mathbf{y} = \mathbf{A}(\theta)\,\mathbf{x} + \mathbf{w}$, using that $\mathrm{vec}(\mathbf{A}\mathbf{B}\mathbf{C}^T) = (\mathbf{C}\otimes\mathbf{A})\,\mathrm{vec}(\mathbf{B})$, in agreement with (7.54). Then, we have

$$\mathbf{A}(\theta) = \mathbf{A}_t\otimes\mathbf{A}_s(\theta), \qquad \mathbf{x} = \mathrm{vec}\left(\mathbf{X}^T\right).$$

Based on the general expressions deduced in Section 7.4, the asymptotic value of $\mathbf{B}(\theta)$, $\mathbf{B}_p(\theta)$ and $\mathbf{B}_{p,q}(\theta)$ in (7.33) is now obtained for the above spatio-temporal signal model.
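The rearrangement of (7.50) into the standard linear form can be checked numerically. The sketch below assumes column-major vectorization and uses small hypothetical matrices (3 sensors, 2 users, 4 time samples); it only verifies that $\mathrm{vec}(\mathbf{A}_s(\mathbf{A}_t\mathbf{X})^T)$ and $(\mathbf{A}_t\otimes\mathbf{A}_s)\,\mathrm{vec}(\mathbf{X}^T)$ coincide, and none of the helper names come from the text.

```python
def matmul(a, b):
    """Plain matrix product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    """Matrix transpose (no conjugation)."""
    return [list(col) for col in zip(*a)]


def vec(a):
    """Stack the columns of a matrix into a single list (column-major vec)."""
    return [x for col in zip(*a) for x in col]


def kron(a, b):
    """Kronecker product a (x) b of two matrices stored as lists of rows."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]


# Hypothetical sizes: M = 3 sensors, P = 2 users, K = 2 symbols, Ns = 4 samples.
a_s = [[1 + 1j, 2 - 1j], [0.5j, 1.0], [-1.0, 0.3 + 0.2j]]        # M x P
a_t = [[1.0, 0.0], [0.5, 1.0], [0.0, 0.5], [0.2, -0.4]]          # Ns x K
x = [[1 + 0j, -1j], [1j, -1 + 0j]]                               # K x P

# Left-hand side: vec(A_s (A_t X)^T).
lhs = vec(matmul(a_s, transpose(matmul(a_t, x))))

# Right-hand side: (A_t kron A_s) vec(X^T).
x_vec = vec(transpose(x))
rhs = [sum(c * v for c, v in zip(row, x_vec)) for row in kron(a_t, a_s)]

max_error = max(abs(l - r) for l, r in zip(lhs, rhs))
```

With these illustrative values the two vectors have $M N_s = 12$ entries and agree up to rounding.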
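It is straightforward to show that

$$\begin{aligned}
\mathbf{B}(\theta) &= \mathbf{A}_t^H\mathbf{N}_t^{-1}\mathbf{A}_t \otimes \mathbf{A}_s^H(\theta)\,\mathbf{N}_s^{-1}\mathbf{A}_s(\theta) \\
\mathbf{B}_p(\theta) &= \mathbf{A}_t^H\mathbf{N}_t^{-1}\mathbf{A}_t \otimes \frac{\partial\mathbf{A}_s^H(\theta)}{\partial\theta_p}\,\mathbf{N}_s^{-1}\mathbf{A}_s(\theta) \\
\mathbf{B}_{p,q}(\theta) &= \mathbf{A}_t^H\mathbf{N}_t^{-1}\mathbf{A}_t \otimes \frac{\partial\mathbf{A}_s^H(\theta)}{\partial\theta_p}\,\mathbf{N}_s^{-1}\frac{\partial\mathbf{A}_s(\theta)}{\partial\theta_q}
\end{aligned}$$

assuming that the temporal and spatial components of the noise are decoupled as $\mathbf{N} = \mathbf{N}_t\otimes\mathbf{N}_s$ and using that

$$\left[\frac{\partial\mathbf{A}_s(\theta)}{\partial\theta_p}\right]_i = \begin{cases} j\pi\,\mathbf{d}_M\odot\exp\left(j\pi\theta_p\mathbf{d}_M\right) & i = p \\ \mathbf{0} & i \neq p \end{cases}$$

with $\odot$ the element-wise product.

⁶ This implies that the same pulse shaping is used in the in-phase and quadrature components.

Moreover, assuming for simplicity that the noise is spatially uncorrelated (i.e., $\mathbf{N}_s = \mathbf{I}_M$), we have that the spatial cross-correlation for the $P$ users is determined by the following matrices:

$$\begin{aligned}
[\mathbf{B}(\theta)]_{i,k} &\triangleq \left[\mathbf{A}_s^H(\theta)\,\mathbf{N}_s^{-1}\mathbf{A}_s(\theta)\right]_{i,k} = F_M(\theta_i-\theta_k) \\
[\mathbf{B}_p(\theta)]_{i,k} &\triangleq \left[\frac{\partial\mathbf{A}_s^H(\theta)}{\partial\theta_p}\,\mathbf{N}_s^{-1}\mathbf{A}_s(\theta)\right]_{i,k} = \delta(i,p)\left.\frac{dF_M(f)}{df}\right|_{f=\theta_i-\theta_k} \\
[\mathbf{B}_{p,q}(\theta)]_{i,k} &\triangleq \left[\frac{\partial\mathbf{A}_s^H(\theta)}{\partial\theta_p}\,\mathbf{N}_s^{-1}\frac{\partial\mathbf{A}_s(\theta)}{\partial\theta_q}\right]_{i,k} = \delta(i,p)\,\delta(k,q)\left.\frac{d^2F_M(f)}{df^2}\right|_{f=\theta_i-\theta_k}
\end{aligned} \tag{7.51}$$

where $\delta(i,j)$ is the Kronecker delta and $F_M(f)$ is the following sinc function,

$$F_M(f) \triangleq \begin{cases} M & f = 0 \\ \dfrac{\sin(\pi Mf/2)}{\sin(\pi f/2)} & f \neq 0. \end{cases}$$

Notice that $\mathbf{B}_p(\theta)$ and $\mathbf{B}_{p,q}(\theta)$ in (7.51) are derived resorting to the differential property of the Fourier transform, i.e.,

$$-j2\pi\,\mathcal{F}\{n\,v[n]\} = \frac{dV(f)}{df}$$

where $V(f) = \mathcal{F}\{v[n]\} = \sum_n v[n]\,e^{-j2\pi fn}$ is the Fourier transform of a given sequence $v[n]$.

In the studied space-time signal model, the observation $\mathbf{y}$ can be increased by augmenting either the number of antennas ($M$) or the number of snapshots ($N_s$), where $M$ and $N_s$ are the number of rows of $\mathbf{A}_s(\theta)$ and $\mathbf{A}_t$, respectively. The asymptotic performance of the GML and CML direction-of-arrival estimators has already been studied in [Sto89][Sto90a][Vib95] when $M \to \infty$ and in [Sto89][Sto90a][Ott92][Car94] when $N_s \to \infty$. In the following two sections, the aforementioned study is extended to the optimal second-order DOA estimator deduced in Chapter 4.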
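As a quick sanity check of the closed form $F_M(f)$ in (7.51), the following sketch compares it with the inner product of two spatial signatures built on a centred, $\lambda/2$-spaced array with spatially white noise. The directions and the array size are arbitrary illustrative values, and the function names are hypothetical.

```python
import cmath
import math


def steering(theta, m):
    """Spatial signature exp(j*pi*theta*d) with d centred on the array, d = [0..m-1] - (m-1)/2."""
    return [cmath.exp(1j * math.pi * theta * (i - (m - 1) / 2)) for i in range(m)]


def f_m(f, m):
    """Closed form F_M(f): M at f = 0 and sin(pi*M*f/2)/sin(pi*f/2) otherwise."""
    if f == 0:
        return float(m)
    return math.sin(math.pi * m * f / 2) / math.sin(math.pi * f / 2)


def inner(a, b):
    """Hermitian inner product a^H b."""
    return sum(x.conjugate() * y for x, y in zip(a, b))


m = 4
theta_1, theta_2 = 0.3, -0.2  # hypothetical directions in [-1, 1)
a1, a2 = steering(theta_1, m), steering(theta_2, m)

cross = inner(a1, a2)
error_cross = abs(cross - f_m(theta_1 - theta_2, m))
error_auto = abs(inner(a1, a1) - f_m(0.0, m))
```

Because the sensor positions are centred, the cross term is real in this sketch, which is why $F_M(f)$ is a real and even function of the angular separation.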
Infinite number of antennas

If the number of sensors is increased ($M \to \infty$), the asymptotic MSE matrix for the optimal and the GML estimators is computed in Appendix 7.K, having that

$$\mathbf{B}_{UCRB}(\theta),\ \mathbf{B}_{gml}(\theta),\ \mathbf{B}_{bque}(\theta) = \frac{6\,\sigma_w^2}{\pi^2M^3\,\mathrm{Tr}\left(\mathbf{A}_t^H\mathbf{N}_t^{-1}\mathbf{A}_t\right)}\,\mathbf{I}_P + o\left(M^{-3}\right) = \frac{6}{\pi^2M^3N_sE_s/N_0}\,\mathbf{I}_P + o\left(N_s^{-1}M^{-3}\right) \tag{7.52}$$

with $\sigma_w^2 = N_0$ the noise double-sided spectral density and $E_s$ the energy of the received symbols. In the last expression, we have taken into account that

$$\lim_{N_s\to\infty}\frac{1}{N_s}\,\mathrm{Tr}\left(\mathbf{A}_t^H\mathbf{N}_t^{-1}\mathbf{A}_t\right) = E_s.$$

This result was previously obtained in [Sto89] for the conditional model (Section 2.3). Moreover, it has been proved in [Sto90a, R8-R9] that the conditional and unconditional model yield efficient estimates when the number of sensors or the SNR goes to infinity. The analysis in Appendix 7.K focuses on the unconditional model and extends the concise solution provided in [Sto89]. In this appendix, the asymptotic value of the off-diagonal entries of $\mathbf{B}_{UCRB}(\theta)$ as well as the non-Gaussian terms in $\mathbf{B}_{gml}(\theta)$ and $\mathbf{B}_{bque}(\theta)$ is calculated, concluding that they are totally negligible if $M \to \infty$.

Also, notice that (7.52) coincides with the modified CRB (MCRB) for the problem of carrier frequency-offset estimation [Men97, Eq. 2.4.23][Rif74], which is known to be attained by means of data-aided (DA) methods. In both cases, the estimator tries to infer the frequency of an infinitely long sinusoid either in the space domain (DOA) or in the time domain (DA frequency synchronization). Nonetheless, in array signal processing, the array size is implicitly limited by the narrowband and far-field assumptions [Vib95].

Infinite number of snapshots

When the number of observed symbols $N_s$ is large, it is shown in [Ott92][Car94] that most second-order DOA estimators in the literature (based on both the conditional and unconditional model) are asymptotically robust.
This means that the covariance matrix of the estimation error is independent of the sources statistical distribution provided that $N_s \to \infty$. This statement implies that the higher-order term $\mathbf{X}_{gml}(\mathbf{K})$ (7.8) is negligible for $N_s \to \infty$ whatever the content of matrix $\mathbf{K}$. However, it was shown in Section 6.5 that the knowledge of $\mathbf{K}$ can be exploited to improve significantly the estimator accuracy when multiple constant-modulus sources transmit towards the array from near directions. Actually, this result was already pointed out in [Ott92] where the authors stated that '[...] a Gaussian distribution of the emitter signals represents the worst case. Any other distribution would typically give better estimates, provided the appropriate ML method is used.'

In Appendix 7.L, the asymptotic expressions of $\mathbf{B}_{UCRB}(\theta)$, $\mathbf{B}_{gml}(\theta)$ and $\mathbf{B}_{bque}(\theta)$ are obtained when the number of received symbols $N_s$ tends to infinity whereas the number of antennas $M$ is kept constant. In that case, it is shown that $\boldsymbol{\Gamma}(\mathbf{K})$ is exactly zero in the single user case (Appendix 7.L). On the other hand, if there are multiple users transmitting towards the array ($P > 1$), both $\mathbf{B}_{UCRB}^{-1}(\theta)$ and $\boldsymbol{\Gamma}(\mathbf{K})$ are proportional to the number of snapshots and, therefore, $\boldsymbol{\Gamma}(\mathbf{K})$ does not disappear as $N_s \to \infty$. In addition, it is found that, at high SNR, the second term $\boldsymbol{\Gamma}(\mathbf{K})$ is proportional to $\sigma_w^{-2}$ if and only if $\rho = 1$. If so, the contribution of $\boldsymbol{\Gamma}(\mathbf{K})$ remains as the SNR is increased. This important result is formalized in the next equations.

If $E_s/N_0$ and $N_s$ go to infinity, it is proved in Appendix 7.L that

$$\left[\mathbf{B}_{UCRB}^{-1}(\theta)\right]_{p,q},\ \left[\mathbf{B}_{gml}^{-1}(\theta)\right]_{p,q} = 2N_s\frac{E_s}{N_0}\,\mathrm{Re}\,\mathrm{Tr}\left(\mathbf{B}_{p,q}(\theta) - \mathbf{B}_p(\theta)\,\mathbf{B}^{-1}(\theta)\,\mathbf{B}_q^H(\theta)\right) + o(N_s)$$

$$\left[\mathbf{B}_{bque}^{-1}(\theta)\right]_{p,q} = \begin{cases} \left[\mathbf{B}_{UCRB}^{-1}(\theta)\right]_{p,q} + o(N_s) & \rho \neq 1 \\ \left[\mathbf{B}_{UCRB}^{-1}(\theta)\right]_{p,q} + [\boldsymbol{\Gamma}(\mathbf{K})]_{p,q} + o(N_s) & \rho = 1 \end{cases}$$

with

$$[\boldsymbol{\Gamma}(\mathbf{K})]_{p,q} = 2\xi N_s\frac{E_s}{N_0}\,\mathrm{Tr}\left(\mathbf{B}_p(\theta)\,\mathbf{B}^{-1}(\theta)\,\mathrm{Dg}^{-1}\left[\mathbf{B}^{-1}(\theta)\right]\mathbf{B}_p(\theta)\,\mathbf{B}^{-1}(\theta)\right)\delta(p,q) + o(N_s),$$

where the result is a function of the cross-correlation of the $P$ users signatures and their derivatives (7.51).
On the other hand, the constant $\xi \leq 1$ is a function of the temporal correlation of the received signal (7.91) and is equal to one in the uncorrelated case (Appendix 7.L).

7.5 Simulations

In this section, the asymptotic studies carried out in this chapter are validated via computer simulations.

7.5.1 SNR asymptotic results for the BQUE and GML estimators

To evaluate the asymptotic performance of the BQUE and GML small-error estimators when the SNR goes to zero and infinity, the problem of DOA estimation is adopted (see Section 6.5). The angle of arrival of two users is estimated with a linear array of four elements ($M = 4$). A single snapshot is taken at the matched filter output ($N_s = 1$). Assuming perfect timing synchronization and ISI-free received pulses, the estimator MSE becomes inversely proportional to the number of integrated snapshots. In Fig. 7.1 and Fig. 7.2, the sum of the variance of the two users, i.e.,

$$\mathrm{VAR}_{bque} = \mathrm{Tr}\left(\mathbf{B}_{bque}(\theta)\right), \qquad \mathrm{VAR}_{gml} = \mathrm{Tr}\left(\mathbf{B}_{gml}(\theta)\right), \tag{7.53}$$

is evaluated as a function of the $E_s/N_0$ per user when the two users are located at ±0.5° from the broadside. For more details the reader is referred to Section 6.5.

[Plot: normalized variance versus Es/N0 (dB); curves: UCRB, BQUE, GML, low-SNR asymptote, high-SNR asymptote.]
Figure 7.1: Normalized variance for the GML and BQUE DOA estimators in case of having two 16-QAM sources transmitting from ±0.5 degrees. The low- and high-SNR limits computed in this chapter are plotted as well.

In these figures, it is shown how the asymptotic expressions deduced in this chapter predict exactly the low and high SNR performance of the studied quadratic small-error estimators. In Fig. 7.1, the Gaussian assumption is shown to be optimal at low and high SNR whereas minor losses are observed in the middle of these extremes.
It has been checked that the BQUE converges to the GML performance when the alphabet dimension is augmented (e.g., 64-QAM). On the other hand, if the constellation has constant modulus (e.g., MPSK or CPM), the Gaussian assumption is found to yield important losses when the SNR exceeds a given critical value or threshold determined by the array size (Fig. 7.2). The position of this SNR threshold is actually independent of the number of processed snapshots. Additional simulations have been carried out for the CPM modulation obtaining the same curves as in Fig. 7.2. Therefore, it seems that the only relevant feature of the nuisance parameters for DOA estimation is their constant amplitude.

[Plot: normalized variance versus Es/N0 (dB); curves: UCRB, GML, BQUE, low-SNR asymptote, high-SNR asymptotes.]
Figure 7.2: Normalized variance for the GML and BQUE DOA estimators in case of having two MPSK sources transmitting from ±0.5 degrees. The low- and high-SNR limits computed in this chapter are plotted as well.

Regarding Fig. 7.2, another important remark is that the low- and high-SNR asymptotes can be combined to lower bound the performance of any second-order technique in case of constant-modulus alphabets. Finally, notice that the UCRB only predicts the asymptotic performance of the GML estimator. However, in both figures, the GML estimator outperforms the UCRB for intermediate SNRs.

7.5.2 SNR asymptotic results for the large-error estimators

In this section, the large-error frequency-offset estimators presented in Section 3.4 are simulated again. The low- and high-SNR asymptotic expressions deduced in this chapter are validated for the 16-QAM, MPSK and MSK modulations. In all the simulations, the rank of $\mathbf{G} = E\{\mathbf{A}\mathbf{A}^H\}$ is full (Appendix 3.D). A uniform prior with $\Delta = 0.4$ is considered. Although this prior is rather informative and the variance floor was not observed in Fig.
3.7 for the MSK modulation, its existence is evidenced in Fig. 7.3. In this figure, it is also shown how the Gaussian assumption leads to a higher variance floor at high SNR. Comparing Figs. 7.3, 7.4 and 7.5, the floor level depends on the modulation at hand. This statement is true for both the optimal estimator and the one deduced under the Gaussian assumption, although the latter is not represented in Figs. 7.4 and 7.5 for the sake of clarity.

[Plot: mean square error (MSE) versus Es/N0 (dB); curves: Mvar, Mmse, Optimum, Gaussian assumption, low-SNR asymptotes, high-SNR asymptotes.]
Figure 7.3: Mean square error for the MMSE and minimum variance frequency-offset estimators for the MSK modulation ($N_{ss} = 2$, $M = 8$, $K = 5$). The estimators based on the Gaussian assumption as well as all the low- and high-SNR limits are indicated.

7.5.3 Large sample asymptotic results for the BQUE and GML estimators

In this section, the large sample study in Section 7.4 is validated numerically. In the first two figures (Fig. 7.6 and 7.7), the normalized variance is computed for the optimal second-order timing and frequency estimators deduced in Chapter 4 under the small-error condition. The normalization consists in multiplying the estimator variance by the number of processed symbols, i.e., $N_s = M/N_{ss}$. The estimator variance is simulated for different data lengths and is compared to the asymptotic variance obtained from the large sample study ($N_s \to \infty$) in Section 7.4.4. The Gaussian assumption is optimal in all the simulations except in the timing synchronization problem (Fig. 7.6). In that case, the Gaussian assumption exhibits a higher variance floor (self-noise) when the noise subspace of matrix $\mathbf{A}(\theta)$ is null ($M \leq K$).

Regarding the DOA estimation problem, the large sample study presented in Section 7.4.5 is validated via simulation for the same scenario considered in Section 7.5.1.
In the first simulations (Fig. 7.8 and 7.9), the estimator variance (7.53) is evaluated for different values of $\rho$ considering an array of four antennas and a single snapshot.⁷ The performance associated to a hypothetical super-Gaussian constellation with $\rho = 10$ is also depicted in Fig. 7.9, although all the alphabets of interest in digital communications are sub-Gaussian ($\rho < 2$).

⁷ Remember that the estimator variance is inversely proportional to the number of processed snapshots whatever the value of $\rho$. Therefore, all the results and conclusions are still correct if $N_s \to \infty$.

[Plot: mean square error (MSE) versus Es/N0 (dB); curves: Mvar, Mmse, low-SNR asymptotes, high-SNR asymptotes.]
Figure 7.4: Mean square error for the MMSE and minimum variance frequency-offset estimators for the 16-QAM modulation with roll-off 0.75 ($N_{ss} = 2$, $M = 8$, $K = 8$). The low- and high-SNR limits computed in this chapter are plotted as well.

Regarding Fig. 7.8 and Fig. 7.9, one concludes that the asymptotic expression derived in (7.52) for $M \to \infty$,

$$\mathbf{B}_{UCRB}(\theta),\ \mathbf{B}_{gml}(\theta),\ \mathbf{B}_{bque}(\theta) = \frac{6}{\pi^2M^3N_sE_s/N_0}\,\mathbf{I}_P + o\left(N_s^{-1}M^{-3}\right),$$

is attained for practical SNRs in case of having constant-amplitude nuisance parameters ($\rho = 1$) even if the number of antennas is very small ($M = 4$). Notice that the optimality at high SNR is verified irrespective of the users angular separation if one compares Fig. 7.8 and Fig. 7.9. Nonetheless, minor discrepancies are observed in Fig. 7.8 due to the sinc-like beam pattern when $M$ is finite (7.51). On the other hand, if $\rho > 1$, the estimator performance at high SNR converges to the (Gaussian) UCRB, which corresponds to $\rho = 2$. It can be seen that the larger $\rho$ is and the closer the sources are, the lower is the $E_s/N_0$ from which the convergence to the UCRB is manifested.
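For orientation, the leading term of (7.52) is simple enough to evaluate directly. The helper below is only an illustrative sketch: its name and the choice of passing $E_s/N_0$ in dB are assumptions of this example, and it ignores the $o(N_s^{-1}M^{-3})$ remainder.

```python
import math


def doa_asymptotic_variance(m, n_s, es_n0_db):
    """Leading term of (7.52) per user: 6 / (pi^2 * M^3 * Ns * Es/N0), with Es/N0 given in dB."""
    es_n0 = 10.0 ** (es_n0_db / 10.0)
    return 6.0 / (math.pi ** 2 * m ** 3 * n_s * es_n0)


# Four antennas and a single snapshot, as in the scenario of Section 7.5.1.
var_m4 = doa_asymptotic_variance(4, 1, 60.0)

# Doubling the array size divides the asymptotic variance by 2^3 = 8.
var_m8 = doa_asymptotic_variance(8, 1, 60.0)
ratio = var_m4 / var_m8
```

The cubic dependence on $M$ is what makes the asymptote fall so quickly when the variance is plotted as a function of the array size.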
Moreover, the closer the users are, the more significant is the loss incurred by the Gaussian assumption in case of constant-modulus nuisance parameters.

These conclusions are manifested again when the estimator variance is evaluated as a function of $M$ (Figs. 7.10-7.12). In that case, the UCRB attains the asymptotic limit (7.52) if the number of antennas goes to infinity ($M \to \infty$). On the other hand, when the nuisance parameters have constant modulus ($\rho = 1$), the optimal second-order estimator attains (7.52) for any value of $M$, except for an intermediate interval in which the estimator converges to the UCRB. It can be shown that the value of $M$ from which $\mathbf{B}_{bque}(\theta)$ departs from (7.52) is inversely proportional to the angular separation of the users. Specifically, this critical value occurs when $\boldsymbol{\Gamma}(\mathbf{K})$ (7.46) attains its maximum value. In case of having two users, the critical value of $M$ corresponds to the first maximum of (7.90) in Appendix 7.K that, asymptotically, takes place at $M = 0.5/|\theta_1 - \theta_2|$. For example, using equation (7.90), the referred threshold should take place at $M \approx 20$ and $M \approx 100$ in Fig. 7.11 and Fig. 7.12, respectively.

[Plot: mean square error (MSE) versus Es/N0 (dB); curves: Mvar, Mmse, low-SNR asymptotes, high-SNR asymptotes.]
Figure 7.5: Mean square error for the MMSE and minimum variance frequency-offset estimators with MPSK symbols and roll-off 0.75 ($N_{ss} = 2$, $M = 8$, $K = 8$). The low- and high-SNR limits computed in this chapter are plotted as well.

The GML performance coincides with the UCRB unless the angular separation is reduced (Fig. 7.12). When the number of antennas is less than 20, the GML outperforms the UCRB bound for both the MPSK and 16-QAM modulations. Indeed, the UCRB is severely degraded when the number of antennas is less than 10 whereas the variance of the BQUE and GML estimators is practically constant for $M < 10$ (Fig. 7.13).
[Plot: normalized timing variance versus Es/N0 (dB); curves for M=4, K=6; M=8, K=8; M=10, K=9; M=20, K=14; and M→∞.]
Figure 7.6: Normalized variance for the optimal second-order timing synchronizer in case of the MPSK modulation. The transmitted pulse is a square-root raised cosine with roll-off 0.75, truncated at ±5T. The observation interval (M) is augmented with $N_{ss} = 2$ constant. The dashed curves correspond to the estimator based on the Gaussian assumption.

[Plot: normalized variance versus Es/N0 (dB); curves for M=4, K=6; M=8, K=8; M=10, K=9; M=20, K=14; and M→∞.]
Figure 7.7: Normalized variance for the optimal second-order frequency-offset synchronizer in case of the MPSK modulation. The transmitted pulse is a square-root raised cosine with roll-off 0.75, truncated at ±5T. The observation interval (M) is augmented with $N_{ss} = 2$ constant.

[Plot: normalized variance versus Es/N0 (dB); curves for ρ=1, ρ=1.001, ρ=1.01, ρ=1.2 and ρ=2 (UCRB), plus the asymptote for M→∞.]
Figure 7.8: Normalized variance for the optimal second-order small-error DOA estimator for different values of ρ. The simulation parameters are $N_{ss} = 1$, Nyquist pulse shaping, $K = 1$, $M = 4$, two users transmitting from ±5 degrees.

[Plot: normalized variance versus Es/N0 (dB); curves for ρ=1, ρ=1.001, ρ=1.01, ρ=1.05, ρ=1.2, ρ=2 (UCRB) and ρ=10, plus the asymptote for M→∞.]
Figure 7.9: Normalized variance for the optimal second-order small-error DOA estimator for different values of ρ. The simulation parameters are $N_{ss} = 1$, Nyquist pulse shaping, $K = 1$, $M = 4$, two users transmitting from ±0.5 degrees.

[Plot: normalized variance versus M; curves: UCRB and GML (coincident), BQUE, asymptote for M→∞.]
Figure 7.10: Normalized variance for the optimal second-order small-error DOA estimator as a function of M.
The simulation parameters are $N_{ss} = 1$, Nyquist pulse shaping, $K = 1$, $E_s/N_0 = 60$ dB, two MPSK users transmitting from ±5 degrees.

[Plot: normalized variance versus M; curves: UCRB and GML (coincident), BQUE, asymptote for M→∞.]
Figure 7.11: Normalized variance for the optimal second-order small-error DOA estimator as a function of M. The simulation parameters are $N_{ss} = 1$, Nyquist pulse shaping, $K = 1$, $E_s/N_0 = 60$ dB, two MPSK users transmitting from ±0.5 degrees.

[Plot: normalized variance versus M; curves: UCRB, BQUE (16-QAM), GML (MPSK), BQUE (MPSK), asymptote for M→∞.]
Figure 7.12: Normalized variance for the optimal second-order small-error DOA estimator as a function of M. The simulation parameters are $N_{ss} = 1$, Nyquist pulse shaping, $K = 1$, $E_s/N_0 = 60$ dB, two MPSK (or 16-QAM) users transmitting from ±0.1 degrees.

[Plot: normalized variance versus M (2 to 20); curves: UCRB, BQUE and GML (16-QAM), GML (MPSK), BQUE (MPSK), asymptote for M→∞.]
Figure 7.13: Zoom of the previous plot between M = 2 and M = 20. The simulation parameters are $N_{ss} = 1$, Nyquist pulse shaping, $K = 1$, $E_s/N_0 = 60$ dB, two MPSK (or 16-QAM) users transmitting from ±0.1 degrees.

7.6 Conclusions

In the previous chapters, the Gaussian assumption was proved to yield significant losses at high SNR if the nuisance parameters had a constant modulus and the observation interval was reduced. In this chapter, the Gaussian assumption has been examined again when the observation interval is increased to infinity. From the Central Limit Theorem, it seems that the data statistics will be irrelevant in this asymptotic case. This intuition is validated in some important estimation problems such as digital synchronization and DOA estimation, if the number of antennas goes to infinity. However, the Gaussian assumption is shown to be suboptimal in some other scenarios.
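In particular, the Gaussian assumption fails when the estimator suffers from self-noise at high SNR. In that case, the fourth-order information about the nuisance parameters can be exploited to reduce the variance floor, mainly when the nuisance parameters have constant amplitude.

By considering the fourth-order information of the nuisance parameters, second-order DOA estimators are able to attain the asymptotic performance associated to an infinite number of antennas, even if the array is very short. On the other hand, the Gaussian assumption yields an important loss that is a function of the sources angular separation. Therefore, in array signal processing, we have concluded that the Gaussian assumption is only optimal if there is a single source or the number of antennas goes to infinity.

Finally, in the problem of blind channel identification, some improvement is also expected in the asymptotic case when the transmitted symbols are drawn from a constant-modulus alphabet (Section 6.4).

Appendix 7.A Low-SNR ML scoring implementation

It was obtained in Section 2.4.1 that the log-likelihood function in a low SNR scenario is given by

$$\ln f_y(\mathbf{y};\theta) = \mathrm{Tr}\left(\mathbf{R}_w^{-1}\mathbf{A}(\theta)\mathbf{A}^H(\theta)\mathbf{R}_w^{-1}\left(\hat{\mathbf{R}} - \mathbf{R}_w\right)\right) + o\left(\sigma_w^{-2}\right)$$

that, resorting again to the $\mathrm{vec}(\cdot)$ operator, can be manipulated as follows

$$\begin{aligned}
\ln f_y(\mathbf{y};\theta) &= \mathrm{vec}^H\left(\mathbf{R}_w^{-1}\mathbf{A}(\theta)\mathbf{A}^H(\theta)\mathbf{R}_w^{-1}\right)(\hat{\mathbf{r}} - \mathbf{r}_w) + o\left(\sigma_w^{-2}\right) \\
&= \mathrm{vec}^H\left(\mathbf{A}(\theta)\mathbf{A}^H(\theta)\right)\left(\mathbf{R}_w^*\otimes\mathbf{R}_w\right)^{-1}(\hat{\mathbf{r}} - \mathbf{r}_w) + o\left(\sigma_w^{-2}\right) \\
&= \sigma_w^{-4}\,\mathrm{vec}^H\left(\mathbf{A}(\theta)\mathbf{A}^H(\theta)\right)\mathcal{N}^{-1}(\hat{\mathbf{r}} - \mathbf{r}_w) + o\left(\sigma_w^{-2}\right)
\end{aligned}$$

where $\hat{\mathbf{r}}$ and $\mathbf{r}_w$ are the vectorization of the Hermitian matrices $\hat{\mathbf{R}}$ and $\mathbf{R}_w$, respectively, and the following relations have been applied

$$\begin{aligned}
\mathrm{vec}^H(\mathbf{A})\,\mathrm{vec}(\mathbf{B}) &= \mathrm{Tr}(\mathbf{A}^H\mathbf{B}) = \mathrm{Tr}(\mathbf{B}\mathbf{A}^H) \\
\mathrm{vec}(\mathbf{A}\mathbf{B}\mathbf{C}^H) &= (\mathbf{C}^*\otimes\mathbf{A})\,\mathrm{vec}(\mathbf{B}) \\
\mathbf{A}^H\otimes\mathbf{B}^H &= (\mathbf{A}\otimes\mathbf{B})^H \\
\mathbf{A}^{-1}\otimes\mathbf{B}^{-1} &= (\mathbf{A}\otimes\mathbf{B})^{-1}.
\end{aligned} \tag{7.54}$$

The gradient of the asymptotic log-likelihood function is given by

$$\frac{\partial\ln f_y(\mathbf{y};\theta)}{\partial\theta} = \sigma_w^{-4}\,\mathbf{D}_r^H(\theta)\,\mathcal{N}^{-1}(\hat{\mathbf{r}} - \mathbf{r}_w) + o\left(\sigma_w^{-2}\right)$$

since $\partial\mathbf{R}(\theta)/\partial\theta = \partial\left(\mathbf{A}(\theta)\mathbf{A}^H(\theta)\right)/\partial\theta$.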
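Next, the Fisher information matrix is computed as (minus) the expected value of the Hessian matrix, obtaining the following asymptotic expression:

$$\begin{aligned}
-E_y\left\{\frac{\partial^2\ln f_y(\mathbf{y};\theta)}{\partial\theta\,\partial\theta^T}\right\} &= E_y\left\{\frac{\partial\ln f_y(\mathbf{y};\theta)}{\partial\theta}\left(\frac{\partial\ln f_y(\mathbf{y};\theta)}{\partial\theta}\right)^H\right\} \\
&= \sigma_w^{-8}\,\mathbf{D}_r^H(\theta)\,\mathcal{N}^{-1}\mathbf{Q}(\theta)\,\mathcal{N}^{-1}\mathbf{D}_r(\theta) + o\left(\sigma_w^{-4}\right) \\
&= \sigma_w^{-4}\,\mathbf{D}_r^H(\theta)\,\mathcal{N}^{-1}\mathbf{D}_r(\theta) + o\left(\sigma_w^{-4}\right)
\end{aligned}$$

using that, at low SNR, the fourth-order matrix $\mathbf{Q}(\theta)$ is given by

$$\mathbf{Q}(\theta) = E\left\{(\hat{\mathbf{r}} - \mathbf{r}_w)(\hat{\mathbf{r}} - \mathbf{r}_w)^H\right\} = \sigma_w^4\,\mathcal{N} + o\left(\sigma_w^4\right).$$

Therefore, the following scoring recursion, which was presented in equation (2.29),

$$\hat{\boldsymbol{\alpha}}_{k+1} = \hat{\boldsymbol{\alpha}}_k + \mathbf{M}^H(\hat{\theta}_k)\,(\hat{\mathbf{r}} - \mathbf{r}_w)$$
$$\mathbf{M}(\theta) \triangleq \mathcal{N}^{-1}\mathbf{D}_r(\theta)\left(\mathbf{D}_r^H(\theta)\,\mathcal{N}^{-1}\mathbf{D}_r(\theta)\right)^{-1}\mathbf{D}_g^H(\theta)$$

is known to attain the CRB at low SNR if the small-error condition is verified.

Appendix 7.B High-SNR limit of $\mathbf{R}^{-1}(\theta)$ and $\mathcal{R}^{-1}(\theta)$

In this appendix, we consider that $\mathbf{A}(\theta)$ is full column rank. In that case, the asymptotic value of $\mathbf{R}^{-1}(\theta)$ can be easily obtained by means of the inversion lemma:

$$\mathbf{R}^{-1}(\theta) = \left(\mathbf{A}(\theta)\mathbf{A}^H(\theta) + \sigma_w^2\mathbf{N}\right)^{-1} = \sigma_w^{-2}\,\mathbf{N}^{-1}\left[\mathbf{I}_M - \mathbf{A}(\theta)\left(\mathbf{A}^H(\theta)\mathbf{N}^{-1}\mathbf{A}(\theta) + \sigma_w^2\mathbf{I}_K\right)^{-1}\mathbf{A}^H(\theta)\mathbf{N}^{-1}\right]. \tag{7.55}$$

At high SNR, the inner inverse can be expanded in a Taylor series around $\sigma_w^2 = 0$, having that⁸

$$\left(\mathbf{A}^H(\theta)\mathbf{N}^{-1}\mathbf{A}(\theta) + \sigma_w^2\mathbf{I}_K\right)^{-1} = \left(\mathbf{A}^H(\theta)\mathbf{N}^{-1}\mathbf{A}(\theta)\right)^{-1} - \sigma_w^2\left(\mathbf{A}^H(\theta)\mathbf{N}^{-1}\mathbf{A}(\theta)\right)^{-2} + \sigma_w^4\left(\mathbf{A}^H(\theta)\mathbf{N}^{-1}\mathbf{A}(\theta)\right)^{-3} + O\left(\sigma_w^6\right).$$

Finally, plugging these three terms into (7.55), the high-SNR limit of $\mathbf{R}^{-1}(\theta)$ is given by

$$\mathbf{R}^{-1}(\theta) = \sigma_w^{-2}\,\mathbf{P}_A^{\perp}(\theta) + \mathbf{B}(\theta) - \sigma_w^2\,\mathbf{B}(\theta)\mathbf{N}\mathbf{B}(\theta) + O\left(\sigma_w^4\right) \tag{7.56}$$

where $\mathbf{P}_A^{\perp}(\theta)$ and $\mathbf{B}(\theta)$ are defined in (7.13)-(7.15), and the following identity has been considered:

$$\left(\mathbf{A}^H(\theta)\mathbf{N}^{-1}\mathbf{A}(\theta)\right)^{-1} = \mathbf{A}^{\#}(\theta)\,\mathbf{N}\,\mathbf{A}^{\#H}(\theta). \tag{7.57}$$

The key property of $\mathbf{R}^{-1}(\theta)$ is that, asymptotically, it holds that

$$\begin{aligned}
\mathbf{A}^H(\theta)\,\mathbf{R}^{-1}(\theta) &= \mathbf{A}^H(\theta)\,\mathbf{B}(\theta) + O\left(\sigma_w^2\right) = \mathbf{A}^{\#}(\theta) + O\left(\sigma_w^2\right) \\
\mathbf{R}^{-1}(\theta)\,\mathbf{A}(\theta) &= \mathbf{B}(\theta)\,\mathbf{A}(\theta) + O\left(\sigma_w^2\right) = \mathbf{A}^{\#H}(\theta) + O\left(\sigma_w^2\right) \\
\mathbf{A}^H(\theta)\,\mathbf{R}^{-1}(\theta)\,\mathbf{A}(\theta) &= \mathbf{A}^H(\theta)\,\mathbf{B}(\theta)\,\mathbf{A}(\theta) + O\left(\sigma_w^2\right) = \mathbf{I}_K + O\left(\sigma_w^2\right)
\end{aligned} \tag{7.58}$$

because, by definition,

$$\mathbf{A}^H(\theta)\,\mathbf{P}_A^{\perp}(\theta) = \mathbf{0}, \qquad \mathbf{P}_A^{\perp}(\theta)\,\mathbf{A}(\theta) = \mathbf{0}. \tag{7.59}$$

⁸ The following relation has been considered to obtain the terms of the Taylor expansion: $\dfrac{\partial\mathbf{X}^{-1}(\lambda)}{\partial\lambda} = -\mathbf{X}^{-1}(\lambda)\,\dfrac{\partial\mathbf{X}(\lambda)}{\partial\lambda}\,\mathbf{X}^{-1}(\lambda)$.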
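To conclude this appendix, the asymptotic value of $\mathcal{R}^{-1}(\theta)$ is obtained from (7.56) as indicated now:

$$\begin{aligned}
\mathcal{R}^{-1}(\theta) &= \mathbf{R}^{*-1}(\theta)\otimes\mathbf{R}^{-1}(\theta) \\
&= \sigma_w^{-4}\,\mathbf{P}_A^{\perp *}(\theta)\otimes\mathbf{P}_A^{\perp}(\theta) \\
&\quad + \sigma_w^{-2}\left(\mathbf{B}^*(\theta)\otimes\mathbf{P}_A^{\perp}(\theta) + \mathbf{P}_A^{\perp *}(\theta)\otimes\mathbf{B}(\theta)\right) \\
&\quad + \mathbf{B}^*(\theta)\otimes\mathbf{B}(\theta) \\
&\quad - \sigma_w^2\left(\mathbf{B}^*(\theta)\otimes\mathbf{B}(\theta)\mathbf{N}\mathbf{B}(\theta) + \left(\mathbf{B}(\theta)\mathbf{N}\mathbf{B}(\theta)\right)^*\otimes\mathbf{B}(\theta)\right) + O\left(\sigma_w^4\right).
\end{aligned} \tag{7.60}$$

The key property of $\mathcal{R}^{-1}(\theta)$ is that the term proportional to $\sigma_w^{-4}$ is orthogonal to $\mathrm{vec}(\mathbf{A}(\theta)\mathbf{X})$, $\mathrm{vec}(\mathbf{X}\mathbf{A}^H(\theta))$, $\mathbf{X}\otimes\mathbf{A}(\theta)$ and $\mathbf{A}^*(\theta)\otimes\mathbf{X}$ for any matrix $\mathbf{X}$ on account of (7.59). In particular, this is true for the matrix of derivatives $\mathbf{D}_r(\theta)$ in (7.4) and for $\mathcal{A}(\theta)$ in (7.2). On the other hand, the first term on $\sigma_w^{-2}$ is orthogonal to $\mathrm{vec}(\mathbf{A}(\theta)\mathbf{X})$ and $\mathbf{X}\otimes\mathbf{A}(\theta)$ whereas the second one is orthogonal to $\mathrm{vec}(\mathbf{X}\mathbf{A}^H(\theta))$ and $\mathbf{A}^*(\theta)\otimes\mathbf{X}$, based again on (7.59).

The same properties in (7.58) can be stated for $\mathcal{R}^{-1}(\theta)$, having that

$$\begin{aligned}
\mathcal{A}^H(\theta)\,\mathcal{R}^{-1}(\theta) &= \mathcal{A}^H(\theta)\left[\mathbf{B}^*(\theta)\otimes\mathbf{B}(\theta)\right] + O\left(\sigma_w^2\right) = \mathcal{A}^{\#}(\theta) + O\left(\sigma_w^2\right) \\
\mathcal{R}^{-1}(\theta)\,\mathcal{A}(\theta) &= \left[\mathbf{B}^*(\theta)\otimes\mathbf{B}(\theta)\right]\mathcal{A}(\theta) + O\left(\sigma_w^2\right) = \mathcal{A}^{\#H}(\theta) + O\left(\sigma_w^2\right) \\
\mathcal{A}^H(\theta)\,\mathcal{R}^{-1}(\theta)\,\mathcal{A}(\theta) &= \mathbf{I}_{K^2} + O\left(\sigma_w^2\right)
\end{aligned} \tag{7.61}$$

using the following definition of pseudoinverse:

$$\mathcal{A}^{\#}(\theta) \triangleq \left(\mathcal{A}^H(\theta)\,\mathcal{N}^{-1}\mathcal{A}(\theta)\right)^{-1}\mathcal{A}^H(\theta)\,\mathcal{N}^{-1} = \mathbf{A}^{\# *}(\theta)\otimes\mathbf{A}^{\#}(\theta).$$

All these properties will be used to simplify the high-SNR expressions in Section 7.3.

Appendix 7.C High-SNR limit of $\mathbf{Q}^{-1}(\theta)$ ($\mathbf{K}$ full-rank)

Assuming that $\mathbf{K}$ is invertible, the inversion lemma allows expressing $\mathbf{Q}^{-1}(\theta)$ as follows

$$\mathbf{Q}^{-1}(\theta) = \left(\mathcal{A}(\theta)\,\mathbf{K}\,\mathcal{A}^H(\theta) + \mathcal{R}(\theta)\right)^{-1} = \mathcal{R}^{-1}(\theta)\left[\mathbf{I}_{M^2} - \mathcal{A}(\theta)\left(\mathbf{K}^{-1} + \mathcal{A}^H(\theta)\,\mathcal{R}^{-1}(\theta)\,\mathcal{A}(\theta)\right)^{-1}\mathcal{A}^H(\theta)\,\mathcal{R}^{-1}(\theta)\right]. \tag{7.62}$$

Using (7.61) from Appendix 7.B, it follows that $\mathcal{A}^H(\theta)\,\mathcal{R}^{-1}(\theta)\,\mathcal{A}(\theta) = \mathbf{I}_{K^2} + O(\sigma_w^2)$ and, therefore, the asymptotic value of $\mathbf{Q}^{-1}(\theta)$ is straightforward if $\mathbf{K}^{-1} + \mathbf{I}_{K^2}$ is invertible. In that case, (7.62) becomes

$$\mathbf{Q}^{-1}(\theta) = \mathcal{R}^{-1}(\theta) - \mathcal{A}^{\#H}(\theta)\left(\mathbf{K}^{-1} + \mathbf{I}\right)^{-1}\mathcal{A}^{\#}(\theta) + O\left(\sigma_w^2\right)$$

using that $\mathcal{A}^H(\theta)\,\mathcal{R}^{-1}(\theta) = \mathcal{A}^{\#}(\theta)$ (7.61). Notice that the term depending on $\mathbf{K}$ is negligible unless the dominant terms of $\mathcal{R}^{-1}(\theta)$ in (7.60) are null.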
However, $K^{-1} + I_{K^2}$ becomes singular in case of CPM modulations and, therefore, the inner inverse in (7.62) is a little more involved. In that case, the terms in (7.60) depending on $\sigma_w^2$ must also be considered, obtaining that

$$\mathcal{A}^H(\theta)\mathcal{R}^{-1}(\theta)\mathcal{A}(\theta) = I_{K^2} - \sigma_w^2 U(\theta) + O(\sigma_w^4)$$

where $U(\theta)$ is the following full-rank matrix,

$$U(\theta) \triangleq \mathcal{A}^H(\theta)\left[B^*(\theta)\otimes B(\theta)NB(\theta) + \left[B(\theta)NB(\theta)\right]^*\otimes B(\theta)\right]\mathcal{A}(\theta) = I_K\otimes\left(A^H(\theta)N^{-1}A(\theta)\right)^{-1} + \left(A^H(\theta)N^{-1}A(\theta)\right)^{*-1}\otimes I_K, \tag{7.63}$$

which is simplified by applying the mixed-product property of the Kronecker product, $(A\otimes B)(C\otimes D) = AC\otimes BD$, and then using the results in (7.57) and (7.58). Thus, the inverse in (7.62) can be solved by computing the "economy-size" diagonalization of $K^{-1} + I_{K^2}$ as follows:

$$K^{-1} + I_{K^2} = V\left(\Sigma^{-1} + I\right)V^H$$

where $\Sigma$ is the diagonal matrix containing the eigenvalues of $K$ that are different from $-1$ and the columns of $V$ are the associated eigenvectors. Then, the inversion lemma can be applied once more to obtain

$$\left(K^{-1} + \mathcal{A}^H(\theta)\mathcal{R}^{-1}(\theta)\mathcal{A}(\theta)\right)^{-1} = \left(V\left(\Sigma^{-1} + I\right)V^H - \sigma_w^2 U(\theta)\right)^{-1} + O(1) = -\sigma_w^{-2} P_V^\perp(\theta) + O(1)$$

where the projector $P_V^\perp(\theta)$ onto the orthogonal complement of the subspace spanned by $V$ is defined as

$$P_V^\perp(\theta) \triangleq U^{-1}(\theta)\left[I - V\left(V^H U^{-1}(\theta)V\right)^{-1}V^H U^{-1}(\theta)\right]. \tag{7.64}$$

As it was argued for the projector $P_A^\perp(\theta)$ in (7.14), the conventional definition of the orthogonal projector $P_V^\perp = I - VV^H$ is modified to include the weighting matrix $U^{-1}(\theta)$. In any case, $P_V^\perp(\theta)$ satisfies

$$P_V^\perp(\theta)V = 0, \qquad V^H P_V^\perp(\theta) = 0,$$

and, thus, $P_V^\perp(\theta)$ is the projection matrix onto the subspace generated by the eigenvectors of $K$ associated to the eigenvalue $-1$.

Finally, putting together all the above partial results, we obtain that

$$Q^{-1}(\theta) = \mathcal{R}^{-1}(\theta) + \sigma_w^{-2}\left[\mathcal{A}^\#(\theta)\right]^H P_V^\perp(\theta)\,\mathcal{A}^\#(\theta) + O(1)$$

using that $\mathcal{A}^H(\theta)\mathcal{R}^{-1}(\theta) = \mathcal{A}^\#(\theta) + O(\sigma_w^2)$ (7.61).

In the following, the orthogonal projector $P_V^\perp(\theta)$ will be referred to as $P_K^\perp(\theta)$ in order to emphasize the dependence on the kurtosis matrix $K$, i.e.,

$$P_K^\perp(\theta) \triangleq P_V^\perp(\theta).$$
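The limit that produces the weighted projector (7.64) can be illustrated with a small numerical sketch. Everything below is an assumption made for illustration: real 2x2 matrices, a symmetric positive-definite `u` standing in for $U(\theta)$, a single column `v` standing in for $V$, and a scalar `d` standing in for $\Sigma^{-1} + I$. If the expansion holds, the inverse plus $\sigma_w^{-2}P_V^\perp$ should remain bounded as $\sigma_w^2 \to 0$.

```python
def inv2(m):
    """Inverse of a 2x2 matrix stored as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def weighted_projector(u, v):
    """U^{-1} [I - V (V^T U^{-1} V)^{-1} V^T U^{-1}] for symmetric 2x2 u."""
    w = inv2(u)
    wv = [w[i][0] * v[0] + w[i][1] * v[1] for i in range(2)]   # U^{-1} V
    s = v[0] * wv[0] + v[1] * wv[1]                            # V^T U^{-1} V
    # Symmetry of u is used here: V^T U^{-1} equals (U^{-1} V)^T.
    return [[w[i][j] - wv[i] * wv[j] / s for j in range(2)]
            for i in range(2)]


def residual(u, v, d, sigma2):
    """(d V V^T - sigma2 U)^{-1} + P_V^perp / sigma2, expected to be O(1)."""
    m = [[d * v[i] * v[j] - sigma2 * u[i][j] for j in range(2)]
         for i in range(2)]
    m_inv = inv2(m)
    p = weighted_projector(u, v)
    return [[m_inv[i][j] + p[i][j] / sigma2 for j in range(2)]
            for i in range(2)]
```

For instance, with `u = [[2.0, 0.5], [0.5, 1.0]]`, `v = [1.0, 0.0]` and `d = 0.5`, the individual inverse entries grow like $1/\sigma_w^2$, while `residual` stays of order one.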
Appendix 7.D High-SNR limit of $Q^{-1}(\theta)$ ($K$ singular)

If $K$ is singular, as it happens when the nuisance parameters are drawn from a circular constellation (3.12), the inversion lemma cannot be applied directly and it is necessary to diagonalize the matrix $K$ first, as indicated next:

$$Q^{-1}(\theta) = \left(\mathcal{R}(\theta) + \mathcal{A}(\theta)V_K\Sigma_K V_K^H\mathcal{A}^H(\theta)\right)^{-1}$$
$$= \mathcal{R}^{-1}(\theta)\left[I_{M^2} - \mathcal{A}(\theta)V_K\left(\Sigma_K^{-1} + V_K^H\mathcal{A}^H(\theta)\mathcal{R}^{-1}(\theta)\mathcal{A}(\theta)V_K\right)^{-1}V_K^H\mathcal{A}^H(\theta)\mathcal{R}^{-1}(\theta)\right]$$

where $\Sigma_K$ is the diagonal matrix containing the non-zero eigenvalues of $K$, and $V_K$ the associated eigenvectors. Therefore, the study carried out in Appendix 7.C is still correct if $K^{-1}$ and $\mathcal{A}(\theta)$ are substituted by $\Sigma_K^{-1}$ and $\mathcal{A}(\theta)V_K$, respectively. In that case, the inner inverse studied in detail in Appendix 7.C is given by

$$\left(\Sigma_K^{-1} + V_K^H\mathcal{A}^H(\theta)\mathcal{R}^{-1}(\theta)\mathcal{A}(\theta)V_K\right)^{-1} \tag{7.65}$$

with

$$V_K^H\mathcal{A}^H(\theta)\mathcal{R}^{-1}(\theta)\mathcal{A}(\theta)V_K = I - \sigma_w^2 U(\theta) + O(\sigma_w^4) \tag{7.66}$$

and

$$U(\theta) \triangleq V_K^H\left[I_K\otimes\left(A^H(\theta)N^{-1}A(\theta)\right)^{-1} + \left(A^H(\theta)N^{-1}A(\theta)\right)^{*-1}\otimes I_K\right]V_K. \tag{7.67}$$

According to this result, the same two scenarios of Appendix 7.C can be distinguished:

1. If $\Sigma_K^{-1} + I$ is invertible, it follows that
$$Q^{-1}(\theta) = \mathcal{R}^{-1}(\theta) - \left[\mathcal{A}^\#(\theta)\right]^H V_K\left(\Sigma_K^{-1} + I\right)^{-1}V_K^H\mathcal{A}^\#(\theta) + O(\sigma_w^2)$$
and the second term becomes negligible at high SNR.

2. Otherwise, if $\Sigma_K^{-1} + I$ is singular, the diagonal matrix $\Sigma_K^{-1} + I$ has to be diagonalized as $V\Sigma V^H$ where $\Sigma$ is the diagonal matrix containing the eigenvalues of $K$ different from $-1$ and $V$ are the vectors of the canonical basis $\{e_k\}$ selecting the position of these eigenvalues in $\Sigma_K$. Formally, the $k$-th diagonal entry of $\Sigma_K$ different from $-1$ is selected by means of the vector $e_k$ defined as
$$[e_k]_i \triangleq \begin{cases} 1 & i = k \\ 0 & i \neq k. \end{cases}$$
In that case, the term $-\sigma_w^2 U(\theta)$ in (7.66) must be considered in the computation of (7.65), yielding
$$\left(\Sigma_K^{-1} + I - \sigma_w^2 U(\theta)\right)^{-1} = \left(V\Sigma V^H - \sigma_w^2 U(\theta)\right)^{-1} = -\sigma_w^{-2}P_V^\perp(\theta) + O(1)$$
where $P_V^\perp(\theta)$ is the following orthogonal projector:
$$P_V^\perp(\theta) \triangleq U^{-1}(\theta)\left[I - V\left(V^H U^{-1}(\theta)V\right)^{-1}V^H U^{-1}(\theta)\right]$$
with $U(\theta)$ defined in (7.67). Finally, we obtain that
$$Q^{-1}(\theta) = \mathcal{R}^{-1}(\theta) + \sigma_w^{-2}\left[\mathcal{A}^\#(\theta)\right]^H V_K P_V^\perp(\theta)V_K^H\mathcal{A}^\#(\theta) + O(1)$$
using again that $\mathcal{A}^H(\theta)\mathcal{R}^{-1}(\theta) = \mathcal{A}^\#(\theta) + O(\sigma_w^2)$ (7.61).

The matrix $V_K P_V^\perp(\theta)V_K^H$ is also a projector onto the subspace generated by the eigenvectors of $K$ associated to the eigenvalue $-1$. This projection is carried out in two steps. First, the matrices $V_K$ and $V_K^H$ project onto the subspace associated to the eigenvalues of $K$ different from 0. Afterwards, $P_V^\perp(\theta)$ projects onto the orthogonal complement of the subspace associated to those eigenvalues different from $-1$.

In the following, the orthogonal projector $V_K P_V^\perp(\theta)V_K^H$ will be referred to as $P_K^\perp(\theta)$ in order to emphasize the dependence on the kurtosis matrix $K$, i.e.,

$$P_K^\perp(\theta) \triangleq V_K P_V^\perp(\theta)V_K^H.$$

Appendix 7.E High-SNR results with $A(\theta)$ singular

Depending on the rank of $A(\theta)\in\mathbb{C}^{M\times K}$, two singular situations can be distinguished:

1. If the rank of $A(\theta)$ is equal to $M$ with $M\le K$, the high-SNR limit of $R^{-1}(\theta)$ and $\mathcal{R}^{-1}(\theta)$ is independent of the noise variance and is simply given by
$$R^{-1}(\theta) = \left(A(\theta)A^H(\theta)\right)^{-1} + o(1)$$
$$\mathcal{R}^{-1}(\theta) = \left(\mathcal{A}(\theta)\mathcal{A}^H(\theta)\right)^{-1} + o(1)$$
whereas the asymptotic value of $Q^{-1}(\theta)$ is determined by the rank of the $K^2\times K^2$ matrix $I_{K^2} + K$. If the rank of $I_{K^2} + K$ is greater than or equal to $M^2$, the inverse of $Q(\theta)$ is the following constant matrix:
$$Q^{-1}(\theta) = \left(\mathcal{R}(\theta) + \mathcal{A}(\theta)K\mathcal{A}^H(\theta)\right)^{-1} = \left(\mathcal{A}(\theta)\mathcal{A}^H(\theta) + \mathcal{A}(\theta)K\mathcal{A}^H(\theta)\right)^{-1} + o(1) = \left(\mathcal{A}(\theta)\left(I_{K^2} + K\right)\mathcal{A}^H(\theta)\right)^{-1} + o(1).$$
In that case, all the MSE matrices in (7.5) and (7.6) suffer a serious floor at high SNR. This situation arises when there are fewer observations than nuisance parameters (i.e., $M\le K$) and the noise subspace becomes null.
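On the other hand, if the rank of $I_{K^2} + K$ is less than $M^2$ (assuming that $M\le K$), the use of the fourth-order information avoids the variance floor at high SNR because in that case $\mathcal{A}(\theta)\left(I_{K^2} + K\right)\mathcal{A}^H(\theta)$ is not invertible and the terms of $\mathcal{R}^{-1}(\theta)$ (7.12) proportional to $\sigma_w^2$ have to be considered, as done in Appendix 7.C and Appendix 7.D. This situation is only possible if the nuisance parameters have constant modulus as, for example, the MPSK and CPM constellations. In the MPSK case, the rank of $I_{K^2} + K$ is exactly $K^2 - K$ because $K = (\rho - 2)\,\mathrm{diag}\left(\mathrm{vec}(I_K)\right)$. In the CPM case, the rank reduction is still more significant.

2. If the rank of $A(\theta)$ is lower than $\min(M, K)$, the covariance matrix $R(\theta)$ must be diagonalized as follows
$$R(\theta) = V_A(\theta)\Sigma_A(\theta)V_A^H(\theta) + \sigma_w^2 N \tag{7.68}$$
where $\Sigma_A(\theta)$ is the diagonal matrix having the positive eigenvalues of $A(\theta)A^H(\theta)$ and $V_A(\theta)$ are the associated eigenvectors. Therefore, the inverse of $R(\theta)$ is obtained after applying the inversion lemma to (7.68), obtaining that
$$R^{-1}(\theta) = \sigma_w^{-2}N^{-1}\left[I_M - V_A(\theta)\left(V_A^H(\theta)N^{-1}V_A(\theta) + \sigma_w^2\Sigma_A^{-1}(\theta)\right)^{-1}V_A^H(\theta)N^{-1}\right].$$
Then, similar results to those in Appendix 7.B are obtained with these substitutions:
$$P_A^\perp(\theta) \longrightarrow N^{-1}\left[I_M - V_A(\theta)V_A^\#(\theta)\right]$$
$$B(\theta) \longrightarrow \left[V_A^\#(\theta)\right]^H\Sigma_A^{-1}(\theta)V_A^\#(\theta)$$
with $V_A^\#(\theta) \triangleq \left(V_A^H(\theta)N^{-1}V_A(\theta)\right)^{-1}V_A^H(\theta)N^{-1}$. This second situation is observed when some columns of $A(\theta)$, if $M\ge K$, or some rows of $A(\theta)$, if $M\le K$, are linearly dependent. This is actually the case of the partial response CPM signals simulated in this thesis (e.g., 2REC, 3REC and GMSK).

Appendix 7.F High-SNR UCRB

Using the asymptotic results in (7.12), it follows that

$$\sigma_w^2\, D_r^H(\theta)\mathcal{R}^{-1}(\theta)D_r(\theta) = D_r^H(\theta)\left[B^*(\theta)\otimes P_A^\perp(\theta)\right]D_r(\theta) + D_r^H(\theta)\left[P_A^{\perp *}(\theta)\otimes B(\theta)\right]D_r(\theta) + o(1).$$

Then, the entries of this matrix can be simplified as indicated next:

$$\sigma_w^2\left[D_r^H(\theta)\mathcal{R}^{-1}(\theta)D_r(\theta)\right]_{p,q} = \mathrm{Tr}\left(\frac{\partial R(\theta)}{\partial\theta_p}P_A^\perp(\theta)\frac{\partial R(\theta)}{\partial\theta_q}B(\theta) + \frac{\partial R(\theta)}{\partial\theta_p}B(\theta)\frac{\partial R(\theta)}{\partial\theta_q}P_A^\perp(\theta)\right) + o(1), \tag{7.69}$$

bearing in mind that $[D_r(\theta)]_p = \mathrm{vec}\left(\partial R(\theta)/\partial\theta_p\right)$ and using the properties in (7.54). The final expression is simplified because all the matrices in (7.69) are Hermitian. Therefore, if $\partial R(\theta)/\partial\theta_p$ is decomposed as

$$\frac{\partial R(\theta)}{\partial\theta_p} = \frac{\partial A(\theta)}{\partial\theta_p}A^H(\theta) + A(\theta)\frac{\partial A^H(\theta)}{\partial\theta_p},$$

and all the terms including $P_A^\perp(\theta)A(\theta)$ and $A^H(\theta)P_A^\perp(\theta)$ are removed using (7.59), it follows that

$$\sigma_w^2\left[D_r^H(\theta)\mathcal{R}^{-1}(\theta)D_r(\theta)\right]_{p,q} = \mathrm{Tr}\left(\frac{\partial A^H(\theta)}{\partial\theta_p}P_A^\perp(\theta)\frac{\partial A(\theta)}{\partial\theta_q}\right) + \mathrm{Tr}\left(\frac{\partial A^H(\theta)}{\partial\theta_q}P_A^\perp(\theta)\frac{\partial A(\theta)}{\partial\theta_p}\right) + o(1)$$

using that $A^H(\theta)B(\theta)A(\theta) = I_K$ (7.58). Finally, the matrix $B_1(\theta)$ in (7.18) is obtained observing that the last two terms are complex conjugates of each other.

Appendix 7.G High-SNR UCRB variance floor

Using the asymptotic results in (7.12), it follows that

$$D_r^H(\theta)\mathcal{R}^{-1}(\theta)D_r(\theta) = D_r^H(\theta)\left[B^*(\theta)\otimes B(\theta)\right]D_r(\theta) + o(1).$$

Then, the entries of this matrix can be simplified as indicated next:

$$\left[D_r^H(\theta)\mathcal{R}^{-1}(\theta)D_r(\theta)\right]_{p,q} = \mathrm{Tr}\left(\frac{\partial R(\theta)}{\partial\theta_p}B(\theta)\frac{\partial R(\theta)}{\partial\theta_q}B(\theta)\right) + o(1),$$

bearing in mind that $[D_r(\theta)]_p = \mathrm{vec}\left(\partial R(\theta)/\partial\theta_p\right)$ and using the properties listed in (7.54). The final expression is simplified because all the matrices in the last equation are Hermitian. Thus, if $\partial R(\theta)/\partial\theta_p$ is decomposed as

$$\frac{\partial R(\theta)}{\partial\theta_p} = \frac{\partial A(\theta)}{\partial\theta_p}A^H(\theta) + A(\theta)\frac{\partial A^H(\theta)}{\partial\theta_p},$$

and the relations in (7.58) are applied, it follows that

$$\left[D_r^H(\theta)\mathcal{R}^{-1}(\theta)D_r(\theta)\right]_{p,q} = \mathrm{Tr}\left(\frac{\partial A(\theta)}{\partial\theta_p}A^\#(\theta)\frac{\partial A(\theta)}{\partial\theta_q}A^\#(\theta)\right) + \mathrm{Tr}\left(\frac{\partial A(\theta)}{\partial\theta_p}\frac{\partial A^H(\theta)}{\partial\theta_q}\left[A^\#(\theta)\right]^H A^\#(\theta)\right)$$
$$\quad + \mathrm{Tr}\left(\frac{\partial A^H(\theta)}{\partial\theta_p}\left[A^\#(\theta)\right]^H\frac{\partial A^H(\theta)}{\partial\theta_q}\left[A^\#(\theta)\right]^H\right) + \mathrm{Tr}\left(\frac{\partial A^H(\theta)}{\partial\theta_p}\left[A^\#(\theta)\right]^H A^\#(\theta)\frac{\partial A(\theta)}{\partial\theta_q}\right) + o(1)$$

taking into account that $B(\theta) \triangleq \left[A^\#(\theta)\right]^H A^\#(\theta)$. Finally, the matrix $B_2(\theta)$ in (7.20) is obtained observing that the third and fourth terms are the complex conjugated versions of the first and second terms$^{9}$.

$^{9}$ Notice that $\mathrm{Tr}(AB) = \mathrm{Tr}(BA)$.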
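The step from Kronecker quadratic forms to traces used in Appendices 7.F and 7.G rests on the identity $\mathrm{vec}^H(X)\left(B^*\otimes P\right)\mathrm{vec}(Y) = \mathrm{Tr}(XPYB)$ for Hermitian $X$ and $B$, presumably among the properties listed in (7.54). The sketch below checks that identity numerically on small matrices. The helper names and the column-stacking convention for `vec` are assumptions of this illustration.

```python
def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def conj(a):
    """Entrywise complex conjugate."""
    return [[complex(x).conjugate() for x in row] for row in a]


def kron(a, b):
    """Kronecker product: entry (i*n+k, j*n+l) equals a[i][j] * b[k][l]."""
    n = len(a)
    return [[a[i][j] * b[k][l] for j in range(n) for l in range(n)]
            for i in range(n) for k in range(n)]


def vec(a):
    """Column-stacking vectorization."""
    n = len(a)
    return [a[row][col] for col in range(n) for row in range(n)]


def quadratic_form(x, b, p, y):
    """vec(x)^H (conj(b) kron p) vec(y)."""
    big = kron(conj(b), p)
    vx, vy = vec(x), vec(y)
    return sum(complex(vx[r]).conjugate() * big[r][c] * vy[c]
               for r in range(len(vx)) for c in range(len(vy)))


def trace_form(x, b, p, y):
    """Tr(x p y b), which equals the quadratic form when x and b are Hermitian."""
    prod = matmul(matmul(matmul(x, p), y), b)
    return sum(prod[i][i] for i in range(len(prod)))
```

With Hermitian test matrices the two functions should agree to machine precision. For non-Hermitian `x` or `b` the general form is $\mathrm{Tr}(X^H P Y B^H)$ instead.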
Appendix 7.H High-SNR study in feedforward second-order estimation

Looking at the MSE matrices $\Sigma_{mse}$ and $\Sigma_{var}$ and their Gaussian-assumption counterparts in (7.5) and (7.6), the high-SNR limit of $\mathcal{R}^{-1}$, $Q^{-1}$, $(\mathcal{R} + \widetilde{Q})^{-1}$ and $(Q + \widetilde{Q})^{-1}$ has to be computed. In all these four cases, the following inversion problem must be solved:

$$\left(T + \sigma_w^2 U + \sigma_w^4\,\mathcal{N}\right)^{-1}$$

where the expression of $T$ depends on the inverse that is being solved (7.32) and

$$U \triangleq \left[G^*\otimes N + N^*\otimes G\right] \tag{7.70}$$
$$G \triangleq E_\theta\left\{A(\theta)A^H(\theta)\right\}. \tag{7.71}$$

The Bayesian expectation is found to augment the rank of the involved matrices. This effect is actually negative since it reduces the dimension of the noise subspace. In the limit, if the constant term $T$ became full-rank, the estimators would exhibit the typical variance floor at high SNR. However, in the sequel, we will assume that $T$ is always rank deficient. It is worth noting that the kurtosis matrix $K$ appearing in $Q$ (7.26) reduces the rank of $T$ and, therefore, the dimension of the noise subspace is increased. This aspect was also addressed in the first point of Appendix 7.E.

In order to evaluate the above inverse, the "economy-size" diagonalization of $T = V_T\Sigma_T V_T^H$ is calculated and the auxiliary matrix $X \triangleq U + \sigma_w^2\,\mathcal{N}$ is introduced. Then, the inversion lemma is invoked as it was done in Appendix 7.B, obtaining

$$\left(T + \sigma_w^2 U + \sigma_w^4\,\mathcal{N}\right)^{-1} = \left(V_T\Sigma_T V_T^H + \sigma_w^2 X\right)^{-1} = \sigma_w^{-2}P_T^\perp + B_T + O(\sigma_w^2) \tag{7.72}$$

where $B_T$ is the following matrix:

$$B_T \triangleq \left[V_T^\#\right]^H\Sigma_T^{-1}V_T^\# \tag{7.73}$$

with

$$V_T^\# \triangleq \left(V_T^H X^{-1}V_T\right)^{-1}V_T^H X^{-1}$$
$$P_T^\perp \triangleq X^{-1}\left[I_{M^2} - V_T V_T^\#\right]$$

the generalization of the pseudoinverse and the projection matrix onto the noise subspace of $T$, respectively.

In most problems, the rank of $G \triangleq E_\theta\left\{A(\theta)A^H(\theta)\right\}$ grows rapidly due to the Bayesian expectation and, eventually, matrix $G$ becomes full-rank. In that case, taking into account (7.70), matrix $U$ is also invertible so that

$$\lim_{\sigma_w^2\to 0}X^{-1} = U^{-1},$$

bearing in mind that $X \triangleq U + \sigma_w^2\,\mathcal{N}$.
When this happens (i.e., $G$ is full rank), if the first term $\sigma_w^{-2}P_T^\perp$ survives when (7.72) is multiplied by $\widetilde{Q}$ or $S = \widetilde{Q}M$ in (7.5)-(7.6), it is possible to have self-noise free estimates. Otherwise, if $\widetilde{Q}\in\mathrm{span}(T)$, the estimator exhibits the typical variance floor because the surviving term $B_T$ in (7.72) is constant at high SNR.

When the Gaussian assumption is adopted, it is shown in Appendix 7.I that

$$\widetilde{Q} = E_\theta\left\{\mathcal{A}(\theta)\,\mathrm{vec}(I_K)\,\mathrm{vec}^H(I_K)\,\mathcal{A}^H(\theta)\right\} - E_\theta\left\{\mathcal{A}(\theta)\right\}\mathrm{vec}(I_K)\,\mathrm{vec}^H(I_K)\,E_\theta^H\left\{\mathcal{A}(\theta)\right\} \tag{7.74}$$

always lies in the subspace generated by

$$T_3 \triangleq E_\theta\left\{\mathcal{A}(\theta)\mathcal{A}^H(\theta)\right\} + \widetilde{Q}$$
$$T_4 \triangleq E_\theta\left\{\mathcal{A}(\theta)\mathcal{A}^H(\theta)\right\},$$

which are the matrices $T$ appearing in the MMSE and minimum variance estimators deduced under the Gaussian assumption (7.6). This result is independent of the actual parameterization and the nuisance parameters distribution. Consequently, if $G$ is full rank, the Gaussian assumption always suffers from self-noise at high SNR (7.29), and the level of the variance floor is determined by $X_{var}(K)$ and $X_{mse}(K)$ (7.30).

Regarding the optimal estimators in (7.5), the cumulant matrix $K$ is able to reduce the rank of

$$T_1 \triangleq E_\theta\left\{\mathcal{A}(\theta)\left(I_{K^2} + K\right)\mathcal{A}^H(\theta)\right\} + \widetilde{Q}$$
$$T_2 \triangleq E_\theta\left\{\mathcal{A}(\theta)\left(I_{K^2} + K\right)\mathcal{A}^H(\theta)\right\}$$

if the nuisance parameters have constant modulus. Unfortunately, this reduction is usually insufficient to move $\widetilde{Q}$ out of the span of $T_1$ or $T_2$ (7.32). In that case, the optimal large-error estimators in (7.5) also exhibit a variance floor at high SNR (7.29). The level of this variance floor depends again on the kurtosis matrix $K$.

Appendix 7.I High-SNR MSE floor under the Gaussian assumption

In this appendix, it is proved that $\widetilde{Q}$ does not increase the rank of these two matrices (7.32):

$$T_4 \triangleq E_\theta\left\{\mathcal{A}(\theta)\mathcal{A}^H(\theta)\right\}$$
$$T_3 \triangleq E_\theta\left\{\mathcal{A}(\theta)\mathcal{A}^H(\theta)\right\} + \widetilde{Q}.$$

This implies that

$$\mathrm{rank}(T_4) = \mathrm{rank}\left(T_4 + \widetilde{Q}\right)$$
$$\mathrm{rank}(T_3) = \mathrm{rank}\left(T_3 + \widetilde{Q}\right).$$

Regarding the first statement, it is found that

$$\widetilde{Q} = E_\theta\left\{\mathcal{A}(\theta)\,\mathrm{vec}(I_K)\,\mathrm{vec}^H(I_K)\,\mathcal{A}^H(\theta)\right\} - E_\theta\left\{\mathcal{A}(\theta)\right\}\mathrm{vec}(I_K)\,\mathrm{vec}^H(I_K)\,E_\theta^H\left\{\mathcal{A}(\theta)\right\}$$

is the sum of infinitesimal terms like this:

$$\alpha_1\mathcal{A}(\theta_1)\mathrm{vec}(I_K)\mathrm{vec}^H(I_K)\mathcal{A}^H(\theta_1) + \alpha_2\mathcal{A}(\theta_2)\mathrm{vec}(I_K)\mathrm{vec}^H(I_K)\mathcal{A}^H(\theta_2)$$
$$\quad - \alpha_3\mathcal{A}(\theta_1)\mathrm{vec}(I_K)\mathrm{vec}^H(I_K)\mathcal{A}^H(\theta_2) - \alpha_3\mathcal{A}(\theta_2)\mathrm{vec}(I_K)\mathrm{vec}^H(I_K)\mathcal{A}^H(\theta_1), \tag{7.75}$$

corresponding to two arbitrary values of $\theta$, namely $\theta_1$ and $\theta_2$, with

$$\alpha_1 = f_\theta(\theta_1) > 0, \qquad \alpha_2 = f_\theta(\theta_2) > 0, \qquad \alpha_3 = f_\theta(\theta_1)f_\theta(\theta_2) > 0$$

the associated probability densities. It can be shown that (7.75) is contained in the span of the following matrix:

$$\alpha_1\mathcal{A}(\theta_1)\mathrm{vec}(I_K)\mathrm{vec}^H(I_K)\mathcal{A}^H(\theta_1) + \alpha_2\mathcal{A}(\theta_2)\mathrm{vec}(I_K)\mathrm{vec}^H(I_K)\mathcal{A}^H(\theta_2).$$

Then, if $T_4 \triangleq E_\theta\left\{\mathcal{A}(\theta)\mathcal{A}^H(\theta)\right\}$ is decomposed in the same way, $T_4$ becomes the sum of infinitesimal terms such as

$$\alpha_1\mathcal{A}(\theta_1)I_{K^2}\mathcal{A}^H(\theta_1) + \alpha_2\mathcal{A}(\theta_2)I_{K^2}\mathcal{A}^H(\theta_2).$$

Therefore, $\widetilde{Q}$ must lie in the span of $T_4$ on account of the following relationship:

$$A\,\mathrm{vec}(B)\,\mathrm{vec}^H(B)\,C \in \mathrm{span}\left\{A\,(B\otimes B)\,C\right\}$$

which holds for arbitrary matrices $A$, $B$ and $C$. Finally, if $\widetilde{Q}\in\mathrm{span}(T_4)$, it belongs necessarily to the span of $T_3 = T_4 + \widetilde{Q}$.

Appendix 7.J Performance limits in second-order frequency estimation

When the received signal exhibits a frequency offset equal to $\nu/T$, the pulse received at time $kT$ is given by

$$g(mT_s - kT;\nu) = g_0(mT_s - kT;\nu)\,e^{j2\pi\nu k}$$

where $T$ and $T_s$ are the symbol and sample period, respectively, and

$$g_0(mT_s;\nu) \triangleq p(mT_s)\,e^{j2\pi\nu m/N_{ss}}$$

stands for the pulse $p(t)$ received at $t = 0$. The derivative of $g(mT_s - kT;\nu)$ is given by

$$\frac{\partial g(mT_s - kT;\nu)}{\partial\nu} = \left[\frac{\partial g_0(mT_s - kT;\nu)}{\partial\nu} + g_0(mT_s - kT;\nu)\,(j2\pi k)\right]e^{j2\pi\nu k}.$$

Let $A(\nu)$ and $A_0(\nu)$ stand for the matrices whose columns are delayed replicas of $g(mT_s;\nu)$ and $g_0(mT_s;\nu)$, respectively.
It can be shown that these two matrices (and their derivatives) are related in the following manner:

$$A(\nu) = A_0(\nu)\,\mathrm{diag}\left(\exp(j2\pi d_K\nu)\right)$$
$$\frac{\partial A(\nu)}{\partial\nu} = \left[\frac{\partial A_0(\nu)}{\partial\nu} + A_0(\nu)\,\mathrm{diag}(j2\pi d_K)\right]\mathrm{diag}\left(\exp(j2\pi d_K\nu)\right)$$

where $d_K \triangleq [0, 1, \ldots, K-1]^T$ is the $K$-long vector accounting for the intersymbol phase slope. On the other hand, the stationary matrix $A_0(\nu)$ only accounts for the phase variation during the observation interval. Thus, in the frequency estimation problem, $X(\theta)$, $X_1(\theta)$ and $X_{1,1}(\theta)$ (7.35) have some additional terms depending on $d_K$ that are listed next:

$$X(\nu) = \sigma_w^2 A^H(\nu)R^{-1}(\nu)A(\nu) = X_0(\nu)\odot E^*(\nu)$$
$$X_1(\nu) = \sigma_w^2\frac{\partial A^H(\nu)}{\partial\nu}R^{-1}(\nu)A(\nu) = \left[X_{0,1}(\nu) - \sigma_w^2\,\mathrm{diag}(j2\pi d_K\nu)\,X_0(\nu)\right]\odot E^*(\nu)$$
$$X_{1,1}(\nu) = \sigma_w^2\frac{\partial A^H(\nu)}{\partial\nu}R^{-1}(\nu)\frac{\partial A(\nu)}{\partial\nu} = \left[X_{0,1,1}(\nu) - \sigma_w^2\,\mathrm{diag}(j2\pi d_K\nu)\,X_{0,1}^H(\nu) + X_{0,1}(\nu)\,\mathrm{diag}(j2\pi d_K\nu) - \mathrm{diag}(j2\pi d_K\nu)\,X_0(\nu)\,\mathrm{diag}(j2\pi d_K\nu)\right]\odot E^*(\nu)$$

where $\odot$ denotes the elementwise product,

$$X_0(\nu) \triangleq \sigma_w^{-2}A_0^H(\nu)R^{-1}(\nu)A_0(\nu)$$
$$X_{0,1}(\nu) \triangleq \sigma_w^{-2}\frac{\partial A_0^H(\nu)}{\partial\nu}R^{-1}(\nu)A_0(\nu)$$
$$X_{0,1,1}(\nu) \triangleq \sigma_w^{-2}\frac{\partial A_0^H(\nu)}{\partial\nu}R^{-1}(\nu)\frac{\partial A_0(\nu)}{\partial\nu}$$

are the functions $X(\theta)$, $X_1(\theta)$ and $X_{1,1}(\theta)$ associated to the stationary matrix $A_0(\nu)$, and

$$[E(\nu)]_{i,k} \triangleq \exp\left(j2\pi(i-k)\nu\right)$$

was introduced in Appendix 3.D.

It can be shown that the new terms on $d_K$ as well as the factor $E^*(\nu)$ vanish when $X(\theta)$, $X_1(\theta)$ and $X_{1,1}(\theta)$ are plugged into $B_{UCRB}(\theta)$ (7.36), $\Psi(K)$ (7.38) and $\Gamma(K)$ (7.45). The terms on $d_K$ are imaginary and they are eliminated when the real part is extracted in $B_{UCRB}(\theta)$, $\Psi(K)$ and $\Gamma(K)$. On the other hand, only the diagonal entries of $E^*(\nu)$, which are all equal to 1, are involved in (7.36), (7.38) and (7.44).

The conclusion of this appendix is that, although the received signal is not stationary in the frequency estimation problem, the asymptotic study can be addressed considering only the stationary matrix $A_0(\nu)$ and its derivatives.
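The role of $E^*(\nu)$ in this appendix follows from a simple fact: pre- and post-multiplying a matrix by the diagonal phase ramp $\mathrm{diag}(\exp(j2\pi d_K\nu))$ is the same as multiplying it elementwise by $E^*(\nu)$, whose diagonal is all ones. The sketch below checks this on an arbitrary test matrix. The function names and the test values are hypothetical and serve only to illustrate the argument.

```python
import cmath


def rotate(x, nu):
    """D^H X D with D = diag(exp(j*2*pi*k*nu)), k = 0..K-1."""
    size = len(x)
    d = [cmath.exp(2j * cmath.pi * k * nu) for k in range(size)]
    return [[d[i].conjugate() * x[i][k] * d[k] for k in range(size)]
            for i in range(size)]


def hadamard_with_e_conj(x, nu):
    """Elementwise product of X with conj(E), [E]_{i,k} = exp(j*2*pi*(i-k)*nu)."""
    size = len(x)
    return [[x[i][k] * cmath.exp(2j * cmath.pi * (i - k) * nu).conjugate()
             for k in range(size)] for i in range(size)]


def trace(x):
    """Sum of the diagonal entries."""
    return sum(x[i][i] for i in range(len(x)))
```

Since the diagonal entries of $E^*(\nu)$ equal one, `trace(rotate(x, nu))` coincides with `trace(x)`, which is why only the stationary part survives in trace-based expressions.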
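Appendix 7.K Asymptotic study for $M\to\infty$

In this appendix, the asymptotic limit of $B_{UCRB}(\theta)$, $B_{gml}(\theta)$ and $B_{bque}(\theta)$ is derived when the number of antennas goes to infinity. While the asymptotic study of $B_{UCRB}(\theta)$ and $B_{gml}(\theta)$ was already addressed in the literature, the asymptotic study of the optimal second-order DOA estimator is carried out in this appendix for the first time.

The most important conclusion is that the term $\Gamma(K)$ that appears in $B_{bque}(\theta)$ is always negligible in front of $B_{UCRB}^{-1}(\theta)$ independently of the nuisance parameters distribution. Indeed, $B_{UCRB}^{-1}(\theta)$ is found to grow as $M^3$ whereas $\Gamma(K)$ cannot grow faster than $M$. Moreover, if the nuisance parameters are circular, $\Gamma(K)$ goes to zero as $M^{-1}$ or faster. An interesting conclusion is that the convergence order can be increased by one order when the parameters are drawn from a constant-modulus alphabet. However, this increase is not sufficient in the studied problem because the dominant term, $B_{UCRB}^{-1}(\theta)$, goes to infinity faster.

When the number of sensors goes to infinity ($M\to\infty$), the spatial cross-correlation matrices in (7.51) have the following asymptotic expressions:

$$B(\theta) = A_s^H(\theta)N_s^{-1}A_s(\theta) = M I_P + O(1)$$
$$B_p(\theta) = \frac{\partial A_s^H(\theta)}{\partial\theta_p}N_s^{-1}A_s(\theta) = B_p^\infty(\theta) + O(1)$$
$$B_{p,q}(\theta) = \frac{\partial A_s^H(\theta)}{\partial\theta_p}N_s^{-1}\frac{\partial A_s(\theta)}{\partial\theta_q} = \begin{cases} B_{p,q}^\infty(\theta) + o(M^2) & p\neq q \\ B_{p,q}^\infty(\theta) + o(M^3) & p = q \end{cases}$$

with

$$[B(\theta)]_{i,k} = \begin{cases} M & \theta_i = \theta_k \\ \dfrac{\sin\left(\pi M(\theta_i-\theta_k)/2\right)}{\sin\left(\pi(\theta_i-\theta_k)/2\right)} & \text{otherwise} \end{cases} \tag{7.76}$$

$$\left[B_p^\infty(\theta)\right]_{i,k} \triangleq \begin{cases} 0 & i\neq p \text{ or } \theta_p = \theta_k \\ \pm\dfrac{\pi}{2}\dfrac{\cos\left(\pi(\theta_p-\theta_k)/2\right)}{\sin^2\left(\pi(\theta_p-\theta_k)/2\right)} & i = p,\ \theta_p-\theta_k = 1/M, 3/M, \ldots \\ \dfrac{\pi}{2}M\dfrac{\cos\left(\pi M(\theta_p-\theta_k)/2\right)}{\sin\left(\pi(\theta_p-\theta_k)/2\right)} & \text{otherwise} \end{cases} \tag{7.77}$$

$$\left[B_{p,q}^\infty(\theta)\right]_{i,k} \triangleq \begin{cases} 0 & i\neq p \text{ or } k\neq q \\ \dfrac{\pi^2}{12}M^3 & i = p,\ k = q,\ \theta_p = \theta_q \\ \pm\dfrac{\pi^2}{2}M\dfrac{\cos\left(\pi(\theta_p-\theta_q)/2\right)}{\sin^2\left(\pi(\theta_p-\theta_q)/2\right)} & i = p,\ k = q,\ \theta_p-\theta_q = 2/M, 4/M, \ldots \\ \dfrac{\pi^2}{4}M^2\dfrac{\sin\left(\pi M(\theta_p-\theta_q)/2\right)}{\sin\left(\pi(\theta_p-\theta_q)/2\right)} & \text{otherwise.} \end{cases} \tag{7.78}$$

In order to find the asymptotic value of $B_{UCRB}(\theta)$, $B_{gml}(\theta)$ and $B_{bque}(\theta)$, it is necessary to obtain the limit of $X(\theta)$, $X_p(\theta)$ and $X_{p,q}(\theta)$ (7.35) as $M\to\infty$. Before doing so, we have to evaluate the inverse appearing in $X(\theta)$, $X_p(\theta)$ and $X_{p,q}(\theta)$ when the number of antennas goes to infinity. Taking into account that the diagonal entries of

$$\mathcal{B}(\theta) = A_t^H N_t^{-1}A_t\otimes B(\theta)$$

are proportional to $M$ (7.76), it follows that

$$\left(\mathcal{B}(\theta) + \sigma_w^2 I_{KP}\right)^{-1} = M^{-1}\left(M^{-1}\mathcal{B}(\theta) + \frac{\sigma_w^2}{M}I_{KP}\right)^{-1} = \mathcal{B}^{-1}(\theta) + o(M^{-1}) \tag{7.79}$$

where the last expression is verified for $\sigma_w^2/M\to 0$. If the resulting inverse is now plugged into (7.35), $X(\theta)$ and $X_p(\theta)$ become zero when $\sigma_w^2/M$ goes to zero$^{10}$. Hence, (7.79) should be expanded in a Taylor series around $\sigma_w^2/M = 0$ in order to determine its order of convergence, obtaining

$$\left(\mathcal{B}(\theta) + \sigma_w^2 I_{KP}\right)^{-1} = \mathcal{B}^{-1}(\theta) - \sigma_w^2\mathcal{B}^{-2}(\theta) + \sigma_w^4\mathcal{B}^{-3}(\theta) + o(M^{-3}).$$

$^{10}$ Notice that this condition will be also satisfied at high SNR.

Plugging now the Taylor series into (7.35), it follows that

$$X(\theta) = \sigma_w^2 I_{KP} - \sigma_w^4\mathcal{B}^{-1}(\theta) + O(M^{-2}) = \sigma_w^2 I_{KP} - \sigma_w^4\left(A_t^H N_t^{-1}A_t\right)^{-1}\otimes B^{-1}(\theta) + O(M^{-2}) \tag{7.80}$$

$$X_p(\theta) = \sigma_w^2\mathcal{B}_p(\theta)\mathcal{B}^{-1}(\theta) + o(1) = \sigma_w^2 I_K\otimes B_p^\infty(\theta)B^{-1}(\theta) + o(1) \tag{7.81}$$

$$X_{p,q}(\theta) = \begin{cases} \mathcal{B}_{p,q}(\theta) + o(M^2) = A_t^H N_t^{-1}A_t\otimes B_{p,q}^\infty(\theta) + o(M^2) & p\neq q \\ \mathcal{B}_{p,q}(\theta) + o(M^3) = A_t^H N_t^{-1}A_t\otimes B_{p,q}^\infty(\theta) + o(M^3) & p = q \end{cases} \tag{7.82}$$

where the inverse of $B(\theta)$ has the following asymptotic value:

$$B^{-1}(\theta) = M^{-1}I_P + O(M^{-2}). \tag{7.83}$$

(Gaussian) Unconditional Cramér-Rao Bound

Using the above results, it can be shown that the diagonal entries of $B_{UCRB}^{-1}(\theta)$ (7.36) have the following asymptotic value

$$\left[B_{UCRB}^{-1}(\theta)\right]_{p,p} = \left[D_r^H(\theta)\mathcal{R}^{-1}(\theta)D_r(\theta)\right]_{p,p} = 2\sigma_w^{-4}\,\mathrm{Re}\,\mathrm{Tr}\left(X(\theta)X_{p,p}(\theta)\right) + o(M^3)$$
$$= 2\sigma_w^{-2}\,\mathrm{Re}\,\mathrm{Tr}\left(X_{p,p}(\theta)\right) + o(M^3)$$
$$= 2\sigma_w^{-2}\,\mathrm{Re}\,\mathrm{Tr}\left(A_t^H N_t^{-1}A_t\otimes B_{p,p}^\infty(\theta)\right) + o(M^3)$$
$$= \frac{\pi^2\sigma_w^{-2}}{6}M^3\,\mathrm{Re}\,\mathrm{Tr}\left(A_t^H N_t^{-1}A_t\right) + o(M^3) \tag{7.84}$$

whereas the off-diagonal entries converge to a constant when the number of antennas is augmented. After some tedious calculations, it can be shown that

$$\left[B_{UCRB}^{-1}(\theta)\right]_{p,q} = \left[D_r^H(\theta)\mathcal{R}^{-1}(\theta)D_r(\theta)\right]_{p,q} = 2\sigma_w^{-4}\,\mathrm{Re}\,\mathrm{Tr}\left(X_p(\theta)X_q(\theta) + X(\theta)X_{p,q}(\theta)\right)$$
$$= 2\,\mathrm{Tr}\left(I_K\otimes\left[M^{-2}B_p^\infty(\theta)B_q^\infty(\theta) - B^{-1}(\theta)B_{p,q}^\infty(\theta)\right]\right) + o(1)$$
$$= 2K\left(M^{-2}\left[B_p^\infty(\theta)\right]_{p,q}\left[B_q^\infty(\theta)\right]_{q,p} - \left[B_{p,q}^\infty(\theta)\right]_{p,q}\left[B^{-1}(\theta)\right]_{q,p}\right) + o(1)$$
$$= 2KM^{-2}\left(\left[B_p^\infty(\theta)\right]_{p,q}\left[B_q^\infty(\theta)\right]_{q,p} + \left[B_{p,q}^\infty(\theta)\right]_{p,q}\left[B(\theta)\right]_{q,p}\right) + o(1)$$
$$= -\frac{K\pi^2}{2}\frac{\cos\left(\pi M(\theta_p-\theta_q)\right)}{\sin^2\left(\pi(\theta_p-\theta_q)/2\right)} + o(1)$$

assuming in the last equation that $\theta_p-\theta_q$ is not a multiple of $1/2M$ with probability one.$^{11}$

Notice that the off-diagonal entries of $B_{UCRB}^{-1}(\theta)$ are constant because the term proportional to $M$ is zero, since the diagonal entries of $B_{p,q}^\infty(\theta)$ are null for $p\neq q$ (7.78). Therefore, in order to evaluate the trace of $B^{-1}(\theta)B_{p,q}^\infty(\theta)$, the off-diagonal entries of $B^{-1}(\theta)$ in (7.83) must be taken into account. Thus, $B^{-1}(\theta)$ needs to be expanded in a Taylor series around $M^{-1} = 0$, obtaining

$$B^{-1}(\theta) = M^{-1}\left(I_P + M^{-1}\left[B(\theta) - MI_P\right]\right)^{-1} = M^{-1}I_P - M^{-2}\left[B(\theta) - MI_P\right] + o(M^{-1}) = 2M^{-1}I_P - M^{-2}B(\theta) + o(M^{-1}) \tag{7.85}$$

and, using (7.76), we have

$$\left[B^{-1}(\theta)\right]_{q,p} = -M^{-2}\left[B(\theta)\right]_{q,p} + o(M^{-2}) = -M^{-2}\frac{\sin\left(\pi M(\theta_p-\theta_q)/2\right)}{\sin\left(\pi(\theta_p-\theta_q)/2\right)} + o(M^{-2}).$$

Finally, the term $B_p^\infty(\theta)B_q^\infty(\theta)$ is computed taking into account that

$$M^{-1}\left[B_p^\infty(\theta)\right]_{p,q} = \frac{\pi}{2}\frac{\cos\left(\pi M(\theta_p-\theta_q)/2\right)}{\sin\left(\pi(\theta_p-\theta_q)/2\right)}$$

using (7.77).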
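The closed forms in (7.76) and the $\pi^2M^3/12$ growth in (7.78) can be reproduced with a short numerical sketch. It assumes, as a hypothesis not stated in this appendix, a uniform linear array with half-wavelength spacing, a phase reference at the array centre, white spatial noise ($N_s = I$) and steering entries $\exp(j\pi(m-(M-1)/2)\theta)$. Under that model the inner product of two steering vectors is the ratio of sines in (7.76).

```python
import cmath
import math


def steering(theta, m_sensors):
    """Centred steering vector of a hypothetical half-wavelength ULA."""
    centre = (m_sensors - 1) / 2.0
    return [cmath.exp(1j * math.pi * (m - centre) * theta)
            for m in range(m_sensors)]


def inner(a, b):
    """a^H b for two vectors stored as lists."""
    return sum(x.conjugate() * y for x, y in zip(a, b))


def kernel(delta, m_sensors):
    """Entry of B(theta) in (7.76) as a function of the angle difference."""
    if delta == 0:
        return float(m_sensors)
    return (math.sin(math.pi * m_sensors * delta / 2.0)
            / math.sin(math.pi * delta / 2.0))


def derivative_energy(m_sensors):
    """Squared norm of d(steering)/d(theta), equal to sum of (pi*(m - centre))^2."""
    centre = (m_sensors - 1) / 2.0
    return sum((math.pi * (m - centre)) ** 2 for m in range(m_sensors))
```

Under the same assumptions, `derivative_energy(M)` equals $\pi^2(M^2-1)M/12$, which behaves as $\pi^2M^3/12$ for large $M$ and matches the single-user value of $B_{p,q}(\theta)$ quoted in Appendix 7.L.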
Gaussian Maximum Likelihood

In this section, the asymptotic study of $B_{gml}(\theta) = B_{UCRB}(\theta) + X_{gml}(K)$ when $M\to\infty$ is addressed, concluding that the second term $X_{gml}(K)$ is negligible in front of $B_{UCRB}(\theta)$. Therefore, the GML estimator is proved to be robust to the sources' distribution when the number of antennas goes to infinity.

$^{11}$ Notice that, if $\theta_p-\theta_q$ were a multiple of $1/M$, the final expression could be calculated considering these particular cases of $B_p(\theta)$ and $B_{p,q}(\theta)$ in (7.77)-(7.78). In any case, the off-diagonal entries are found to become asymptotically constant unless $\theta_p-\theta_q = 0.5/M, 1.5/M, \ldots$ In that case, the constant term is equal to zero and, thus, the convergence order of $\left[B_{UCRB}^{-1}\right]_{p,q}$ becomes $O(M^{-1})$.

To begin with, let us recall the expression of $X_{gml}(K)$ in (7.37):

$$X_{gml}(K) = B_{UCRB}(\theta)\,\Psi(K)\,B_{UCRB}(\theta)$$
$$[\Psi(K)]_{p,q} = \sigma_w^{-4}\,\mathrm{vec}^H\left(Y_p(\theta)\right)K\,\mathrm{vec}\left(Y_q(\theta)\right).$$

Then, the asymptotic value of $Y_p(\theta)$ is obtained from (7.39), concluding that

$$Y_p(\theta) = X(\theta)X_p(\theta) + X_p^H(\theta)X(\theta) = \sigma_w^4\mathcal{B}_p(\theta)\mathcal{B}^{-1}(\theta) + \sigma_w^4\mathcal{B}^{-1}(\theta)\mathcal{B}_p^H(\theta) + o(1)$$
$$= \sigma_w^4 I_K\otimes\left[B_p^\infty(\theta)B^{-1}(\theta) + B^{-1}(\theta)\left[B_p^\infty(\theta)\right]^H\right] + o(1)$$
$$= \sigma_w^4 I_K\otimes\left[M^{-1}B_p^\infty(\theta) + M^{-1}\left[B_p^\infty(\theta)\right]^H\right] + o(1) \tag{7.86}$$

and, taking into account that $B_p^\infty(\theta)$ is proportional to $M$ (7.77), the convergence order of $Y_p(\theta)$ is $O(1)$, at most. In that case, $X_{gml}(K)$ decays as $O(M^{-6})$ whereas $B_{UCRB}(\theta)$ decreases as $O(M^{-3})$ (7.84) when the number of sensors goes to infinity.
Focusing now on those circular alphabets considered in (7.42), it is found that $X_{gml}(K)$ decays as $O(M^{-8})$ because $\Psi(K)$ becomes proportional to $M^{-2}$, as indicated next:$^{12}$

$$[\Psi(K)]_{p,q} = \sigma_w^{-4}(\rho-2)\,\mathrm{diag}^H\left(Y_p(\theta)\right)\mathrm{diag}\left(Y_q(\theta)\right) = \sigma_w^{-4}(\rho-2)\,\mathrm{Tr}\left(Y_p(\theta)Y_q(\theta)\right)$$
$$= 4K(\rho-2)\,\mathrm{Tr}\left(B_p^\infty(\theta)B^{-1}(\theta)B_q^\infty(\theta)B^{-1}(\theta)\right) + o(M^{-2})$$
$$= 4K(\rho-2)\sum_{i\neq p,q}\left[B_p^\infty(\theta)\right]_{p,i}\left[B^{-1}(\theta)\right]_{i,p}\left[B_q^\infty(\theta)\right]_{q,i}\left[B^{-1}(\theta)\right]_{i,q} + o(M^{-2})$$
$$= \frac{4K(\rho-2)}{M^4}\sum_{i\neq p,q}\left[B_p^\infty(\theta)\right]_{p,i}\left[B(\theta)\right]_{i,p}\left[B_q^\infty(\theta)\right]_{q,i}\left[B(\theta)\right]_{i,q} + o(M^{-2})$$
$$= \begin{cases} \dfrac{K\pi^2(\rho-2)}{4M^2}\displaystyle\sum_{i\neq p}\left[\dfrac{\sin\left(\pi M(\theta_p-\theta_i)\right)}{\sin^2\left(\pi(\theta_p-\theta_i)/2\right)}\right]^2 + o(M^{-2}) & p = q \\ 0 & p\neq q \end{cases} \tag{7.87}$$

where the off-diagonal elements of $B^{-1}(\theta)$ in (7.85) are considered again because the diagonal of $B_p^\infty(\theta)$ is zero (7.77). Remember that $K$ is the number of columns of matrix $A_t$ or, in other words, the number of nuisance parameters per user.

$^{12}$ All the matrices are real-valued and the $\mathrm{Re}\{\cdot\}$ operator is omitted for simplicity.

Best Quadratic Unbiased Estimator

Thus far, the performance of the GML estimator is shown to be independent of the nuisance parameters distribution when the number of sensors goes to infinity. Next, the BQUE estimator is shown to converge asymptotically to the (Gaussian) UCRB when $M\to\infty$. Specifically, if the nuisance parameters have constant modulus, the non-Gaussian term $\Gamma(K)$ could be proportional to $M$ as the number of antennas is augmented. However, this is not possible if the nuisance parameters are circular. In that case, $\Gamma(K)$ goes to zero as $M^{-1}$. On the other hand, if the modulus of the nuisance parameters is not constant, $\Gamma(K)$ might be constant but it decays as $M^{-2}$ if the nuisance parameters are circular.

To support this conclusion, we begin by recovering the general expression of $\Gamma(K)$ from (7.45):

$$[\Gamma(K)]_{p,q} \triangleq -\sigma_w^{-4}\,\mathrm{vec}^H\left(Y_p(\theta)\right)V_K\left(V_K^H\left[X^*(\theta)\otimes X(\theta)\right]V_K + \sigma_w^4\Sigma_K^{-1}\right)^{-1}V_K^H\,\mathrm{vec}\left(Y_q(\theta)\right)$$

where $K = V_K\Sigma_K V_K^H$ is the "economy-size" diagonalization of $K$. Then, bearing in mind that $Y_p(\theta)$ (7.86) is constant in the best case, the asymptotic order of $\Gamma(K)$ is determined by

$$\left(V_K^H\left[X^*(\theta)\otimes X(\theta)\right]V_K + \sigma_w^4\Sigma_K^{-1}\right)^{-1},$$

which converges to a constant if all the eigenvalues of the kurtosis matrix $K$ are different from $-1$. This condition on the eigenvalues of $K$ fails precisely under the aforementioned constant-modulus condition. If it holds, bearing in mind that $X(\theta) = \sigma_w^2 I_{KP} + o(1)$ (7.80), it is straightforward to obtain

$$\left(V_K^H\left[X^*(\theta)\otimes X(\theta)\right]V_K + \sigma_w^4\Sigma_K^{-1}\right)^{-1} = \sigma_w^{-4}\left(I + \Sigma_K^{-1}\right)^{-1} + o(1)$$

and, therefore, $\Gamma(K)$ converges to a constant as $M\to\infty$. On the other hand, if some eigenvalues of $K$ are equal to $-1$, the inverse of $I + \Sigma_K^{-1}$ does not exist and the second component of $X(\theta)$ (7.80) must be considered. Thus, it follows that

$$\left(V_K^H\left[X^*(\theta)\otimes X(\theta)\right]V_K + \sigma_w^4\Sigma_K^{-1}\right)^{-1} = \left(\sigma_w^4\left(I + \Sigma_K^{-1}\right) - 2\sigma_w^6 U(\theta) + o(M^{-1})\right)^{-1}$$

where the second term,

$$U(\theta) \triangleq V_K^H\mathcal{B}^{-1}(\theta)V_K = M^{-1}V_K^H\left[\left(A_t^H N_t^{-1}A_t\right)^{-1}\otimes I_P\right]V_K + o(M^{-1}), \tag{7.88}$$

is proportional to $M^{-1}$. At this point, the inversion lemma should be applied to compute the above inverse, as it was done in Section 7.3. By doing so, the inversion would yield a term proportional to $M$ and, therefore, the non-Gaussian term $\Gamma(K)$ will become proportional to $M$ as well.

To illustrate this general conclusion, the previous analysis is particularized to the case of circular nuisance parameters. In that case, the non-Gaussian term $\Gamma(K)$ is given in (7.46):

$$[\Gamma(K)]_{p,q} \triangleq -\sigma_w^{-4}\,\mathrm{diag}^H\left(Y_p(\theta)\right)\left(X^*(\theta)\odot X(\theta) + \sigma_w^4(\rho-2)^{-1}I_{KP}\right)^{-1}\mathrm{diag}\left(Y_q(\theta)\right)$$

where

$$X^*(\theta)\odot X(\theta) + \sigma_w^4(\rho-2)^{-1}I_{KP} = \sigma_w^4\frac{\rho-1}{\rho-2}I_{KP} - 2\sigma_w^6 U(\theta) + o(M^{-1}) \tag{7.89}$$

and $U(\theta)$ is the matrix introduced in (7.88), which can be written as

$$U(\theta) = M^{-1}I_{KP}\odot\left[\left(A_t^H N_t^{-1}A_t\right)^{-1}\otimes I_P\right] + O(M^{-2}) = M^{-1}\,\mathrm{Dg}\left[\left(A_t^H N_t^{-1}A_t\right)^{-1}\right]\otimes I_P + O(M^{-2})$$

with $\mathrm{Dg}[A]$ the diagonal matrix built from the diagonal of $A$.
Therefore, if the fourth- to second-order ratio $\rho$ is not unitary, $\Gamma(K)$ is given by

$$[\Gamma(K)]_{p,q} = \sigma_w^{-8}\frac{2-\rho}{\rho-1}\,\mathrm{diag}^H\left(Y_p(\theta)\right)\mathrm{diag}\left(Y_q(\theta)\right)$$
$$= 4K\frac{2-\rho}{\rho-1}\,\mathrm{Tr}\left(B_p^\infty(\theta)B^{-1}(\theta)B_q^\infty(\theta)B^{-1}(\theta)\right) + o(M^{-2})$$
$$= \begin{cases} \dfrac{K\pi^2}{4M^2}\dfrac{2-\rho}{\rho-1}\displaystyle\sum_{i\neq p}\left[\dfrac{\sin\left(\pi M(\theta_p-\theta_i)\right)}{\sin^2\left(\pi(\theta_p-\theta_i)/2\right)}\right]^2 + o(M^{-2}) & p = q \\ 0 & p\neq q, \end{cases}$$

repeating the calculations in (7.87). Notice that this term goes to zero as $O(M^{-2})$ and, therefore, it is absolutely negligible when compared to $B_{UCRB}^{-1}(\theta)$ (7.84).

On the other hand, if we deal with a constant modulus alphabet with $\rho = 1$, the constant term in (7.89) is zero and the next term must be considered, yielding

$$[\Gamma(K)]_{p,q} = 0.5\,\sigma_w^{-10}\,\mathrm{diag}^H\left(Y_p(\theta)\right)U^{-1}(\theta)\,\mathrm{diag}\left(Y_q(\theta)\right)$$
$$= 2\sigma_w^{-2}\xi K E_s M\,\mathrm{Tr}\left(B_p^\infty(\theta)B^{-1}(\theta)B_q^\infty(\theta)B^{-1}(\theta)\right) + o(M^{-1})$$
$$= \begin{cases} \dfrac{\sigma_w^{-2}\xi K E_s\pi^2}{8M}\displaystyle\sum_{i\neq p}\left[\dfrac{\sin\left(\pi M(\theta_p-\theta_i)\right)}{\sin^2\left(\pi(\theta_p-\theta_i)/2\right)}\right]^2 + o(M^{-1}) & p = q \\ 0 & p\neq q \end{cases} \tag{7.90}$$

where

$$E_s \triangleq \frac{1}{K}\mathrm{Tr}\left(A_t^H N_t^{-1}A_t\right)$$

is the energy of the received symbols (for $K$ sufficiently large) and

$$\xi \triangleq \frac{\mathrm{Tr}\left(\mathrm{Dg}^{-1}\left[\left(A_t^H N_t^{-1}A_t\right)^{-1}\right]\right)}{\mathrm{Tr}\left(A_t^H N_t^{-1}A_t\right)} \le 1 \tag{7.91}$$

is a coefficient determined by the snapshots correlation. In particular, $\xi$ is unitary if the snapshots are uncorrelated because, in that case, $A_t^H N_t^{-1}A_t = E_s I_K$. Therefore, $\xi K$ can be understood as the effective observation time. The index $\xi$ is thus the only information about the temporal waveform that is retained in the asymptotic performance of the optimal second-order estimator.

Finally, regarding (7.90), we can state that the term $\Gamma(K)$ decays as $O(M^{-1})$ and, therefore, it is asymptotically negligible in front of $B_{UCRB}^{-1}(\theta)$ (7.84).

Putting together all these partial results, it follows that the estimator performance is independent of the number of interfering users because the off-diagonal terms of $B_{UCRB}(\theta)$, $B_{gml}(\theta)$ and $B_{bque}(\theta)$ are negligible. Furthermore, the non-Gaussian information is negligible when the number of antennas goes to infinity because $\Gamma(K)\ll B_{UCRB}^{-1}(\theta)$ and $X_{gml}(K)\ll B_{UCRB}(\theta)$.
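As a small worked illustration of the coefficient $\xi$ in (7.91), consider a hypothetical case with only $K = 2$ snapshots whose normalized correlation is $r$, with $|r| < 1$. This toy case is an assumption for illustration and is not taken from the thesis simulations:

```latex
\begin{align*}
A_t^H N_t^{-1} A_t &= E_s \begin{bmatrix} 1 & r \\ r^* & 1 \end{bmatrix},
&
\left(A_t^H N_t^{-1} A_t\right)^{-1} &= \frac{1}{E_s\,(1-|r|^2)} \begin{bmatrix} 1 & -r \\ -r^* & 1 \end{bmatrix}, \\[4pt]
\mathrm{Dg}^{-1}\!\left[\left(A_t^H N_t^{-1} A_t\right)^{-1}\right] &= E_s\,(1-|r|^2)\, I_2,
&
\xi &= \frac{2E_s\,(1-|r|^2)}{2E_s} = 1-|r|^2 \le 1 .
\end{align*}
```

In this toy case $\xi$ equals one only for uncorrelated snapshots ($r = 0$) and shrinks as the correlation grows, which is consistent with reading $\xi K$ as an effective observation time.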
Regarding the asymptotic value of $\Gamma(K)$, it can be seen how a positive term proportional to $\sigma_w^{-2}$ appears when the nuisance parameters have a constant amplitude. However, this term is actually proportional to $M^{-1}$ and, therefore, $\Gamma(K)$ is absolutely negligible in front of $B_{UCRB}^{-1}(\theta)$ (7.84) whatever the actual SNR.

Appendix 7.L Asymptotic study for $N_s\to\infty$

The asymptotic study considering an arbitrary temporal correlation $A_t^H N_t^{-1}A_t$ and a finite number of sensors is rather involved because of the inverse appearing in $X(\theta)$, $X_p(\theta)$ and $X_{p,q}(\theta)$ (7.35). To circumvent this obstacle, two directions have been adopted. In the first approach, the asymptotic study ($N_s\to\infty$) is carried out considering that the SNR goes to infinity without any assumption about $A_t^H N_t^{-1}A_t$. The objective is to prove that the non-Gaussian term $\Gamma(K)$ (7.46) remains significant even if the observation time is infinite.

An important conclusion is that, asymptotically, the estimator performance is independent of the temporal structure of the received signals, at least at high SNR. Bearing this result in mind, in the second part of this appendix, the same asymptotic study is done assuming that the received snapshots are uncorrelated, i.e., $A_t^H N_t^{-1}A_t = E_s I_K$. This scenario is actually the one simulated in Section 6.5 considering that the received symbols are detected without ISI at the matched filter output.

Large sample study for high SNR and arbitrary temporal correlation

To begin with, let us consider that the SNR is very high, i.e., $\sigma_w^2\to 0$. In that case, the inverse in $X(\theta)$, $X_p(\theta)$ and $X_{p,q}(\theta)$ (7.35) can be evaluated as we did when the number of antennas was infinite (7.80)-(7.82), obtaining

$$\left(\mathcal{B}(\theta) + \sigma_w^2 I_{KP}\right)^{-1} = \mathcal{B}^{-1}(\theta) - \sigma_w^2\mathcal{B}^{-2}(\theta) + \sigma_w^4\mathcal{B}^{-3}(\theta) + o(\sigma_w^4).$$

Then, plugging this result into (7.35), we get

$$X(\theta) = \sigma_w^2 I_{KP} - \sigma_w^4\mathcal{B}^{-1}(\theta) + o(\sigma_w^4) = \sigma_w^2 I_{KP} - \sigma_w^4\left(A_t^H N_t^{-1}A_t\right)^{-1}\otimes B^{-1}(\theta) + o(\sigma_w^4)$$
$$X_p(\theta) = \sigma_w^2\mathcal{B}_p(\theta)\mathcal{B}^{-1}(\theta) + o(\sigma_w^2) = \sigma_w^2 I_K\otimes B_p(\theta)B^{-1}(\theta) + o(\sigma_w^2)$$
$$X_{p,q}(\theta) = \mathcal{B}_{p,q}(\theta) - \mathcal{B}_p(\theta)\mathcal{B}^{-1}(\theta)\mathcal{B}_q^H(\theta) + o(1) = A_t^H N_t^{-1}A_t\otimes\left[B_{p,q}(\theta) - B_p(\theta)B^{-1}(\theta)B_q^H(\theta)\right] + o(1)$$

where $B(\theta)$, $B_p(\theta)$ and $B_{p,q}(\theta)$ are the spatial correlation matrices for $M$ finite (7.51).

Based on the above high-SNR expressions, the (Gaussian) UCRB,

$$\left[B_{UCRB}^{-1}\right]_{p,q} = 2\sigma_w^{-4}\,\mathrm{Re}\,\mathrm{Tr}\left(X_p(\theta)X_q(\theta) + X(\theta)X_{p,q}(\theta)\right),$$

as well as the non-Gaussian terms $\Psi(K)$ and $\Gamma(K)$ introduced in (7.38) and (7.45),

$$[\Psi(K)]_{p,q} = \sigma_w^{-4}\,\mathrm{vec}^H\left(Y_p(\theta)\right)V_K\Sigma_K V_K^H\,\mathrm{vec}\left(Y_q(\theta)\right)$$
$$[\Gamma(K)]_{p,q} = -\sigma_w^{-4}\,\mathrm{vec}^H\left(Y_p(\theta)\right)V_K\left(V_K^H\left[X^*(\theta)\otimes X(\theta)\right]V_K + \sigma_w^4\Sigma_K^{-1}\right)^{-1}V_K^H\,\mathrm{vec}\left(Y_q(\theta)\right),$$

can be evaluated when the number of received symbols goes to infinity ($N_s\to\infty$). In the last equations, $V_K\Sigma_K V_K^H$ is the "economy-size" diagonalization of the kurtosis matrix $K$ and the high-SNR limit of $Y_p(\theta)$ is given by

$$Y_p(\theta) = X(\theta)X_p(\theta) + X_p^H(\theta)X(\theta) = \sigma_w^4 I_K\otimes\left[B_p(\theta)B^{-1}(\theta) + B^{-1}(\theta)B_p^H(\theta)\right] + o(\sigma_w^4).$$

At this point, the formulation in Appendix 7.K can be reproduced to produce asymptotic expressions for $N_s\to\infty$. Thus, in this appendix, asymptotic expressions similar to those in Appendix 7.K are deduced with $B(\theta)$, $B_p(\theta)$ and $B_{p,q}(\theta)$ being the spatial correlation matrices of the studied finite sensor array (7.51). In Appendix 7.K, the asymptotic form of $\Psi(K)$ and $\Gamma(K)$ is derived as a function of $K$ and, afterwards, the obtained expressions are simplified in case of having circular nuisance parameters.

Next, assuming again circular nuisance parameters, the limit of $B_{UCRB}(\theta)$, $B_{gml}(\theta)$ and $B_{bque}(\theta)$ is calculated as the number of received symbols $N_s$ goes to infinity$^{13}$.
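Thus, starting from the above high-SNR limits of $\mathcal{X}(\boldsymbol{\theta})$, $\mathcal{X}_p(\boldsymbol{\theta})$, $\mathcal{X}_{p,q}(\boldsymbol{\theta})$ and $\mathcal{Y}_p(\boldsymbol{\theta})$, we arrive at

$$\left[\mathbf{B}_{UCRB}^{-1}(\boldsymbol{\theta})\right]_{p,q} = 2N_s \frac{E_s}{\sigma_w^2} \mathrm{Re}\, \mathrm{Tr}\left(\mathbf{B}_{p,q}(\boldsymbol{\theta}) - \mathbf{B}_p(\boldsymbol{\theta}) \mathbf{B}^{-1}(\boldsymbol{\theta}) \mathbf{B}_q^H(\boldsymbol{\theta})\right) + o(N_s)$$

$$[\boldsymbol{\Psi}(\mathbf{K})]_{p,q} = \begin{cases} 4N_s(\rho - 2)\, \mathrm{Tr}\left(\mathbf{B}_p(\boldsymbol{\theta}) \mathbf{B}^{-1}(\boldsymbol{\theta}) \mathbf{B}_p(\boldsymbol{\theta}) \mathbf{B}^{-1}(\boldsymbol{\theta})\right) + o(N_s) & p = q \\ 0 & p \neq q \end{cases}$$

$$[\boldsymbol{\Gamma}(\mathbf{K})]_{p,q} = \begin{cases} 4N_s \dfrac{2-\rho}{\rho-1}\, \mathrm{Tr}\left(\mathbf{B}_p(\boldsymbol{\theta}) \mathbf{B}^{-1}(\boldsymbol{\theta}) \mathbf{B}_p(\boldsymbol{\theta}) \mathbf{B}^{-1}(\boldsymbol{\theta})\right) + o(N_s) & \rho \neq 1,\ p = q \\ 2\xi N_s \dfrac{E_s}{\sigma_w^2}\, \mathrm{Tr}\left(\mathbf{B}_p(\boldsymbol{\theta}) \mathbf{B}^{-1}(\boldsymbol{\theta})\, \mathrm{Dg}^{-1}\left(\mathbf{B}^{-1}(\boldsymbol{\theta})\right) \mathbf{B}_p(\boldsymbol{\theta}) \mathbf{B}^{-1}(\boldsymbol{\theta})\right) + o(N_s) & \rho = 1,\ p = q \\ 0 & p \neq q \end{cases}$$

taking into account that, if the number of received symbols $N_s$ goes to infinity, the central rows and columns of $\mathbf{A}_t^H \mathbf{N}_t^{-1} \mathbf{A}_t$ are delayed versions of the pulse autocorrelation $R[k]$ (Section 7.4.4) and, therefore,

$$\lim_{N_s \to \infty} \frac{1}{N_s} \mathrm{Tr}\left(\mathbf{A}_t^H \mathbf{N}_t^{-1} \mathbf{A}_t\right) = R[0] = E_s.$$

Besides, the coefficient $\xi$ (7.91) can be manipulated using the spectral analysis in Section 7.4.4, yielding

$$\lim_{N_s \to \infty} \xi = \lim_{N_s \to \infty} \frac{\mathrm{Tr}\left(\mathrm{Dg}^{-1}\left(\left(\mathbf{A}_t^H \mathbf{N}_t^{-1} \mathbf{A}_t\right)^{-1}\right)\right)}{\mathrm{Tr}\left(\mathbf{A}_t^H \mathbf{N}_t^{-1} \mathbf{A}_t\right)} = \frac{1}{\int_0^1 E_s / S(f)\, df}$$

with $S(f) = \mathcal{F}\{R[k]\}$.

Footnote 13. Notice that, if $N_s$ goes to infinity, the number of observed symbols $K = N_s + L - 1$ is asymptotically equal to $N_s$.

Following the same reasoning as in Appendix 7.K, the asymptotic expression of $[\boldsymbol{\Gamma}(\mathbf{K})]_{p,q}$ for $\rho = 1$ is obtained from (7.46) by expanding the argument of the inverse as follows:

$$\mathcal{X}^*(\boldsymbol{\theta}) \odot \mathcal{X}(\boldsymbol{\theta}) + \sigma_w^4 (\rho - 2)^{-1} \mathbf{I}_{KP} = \sigma_w^4 \frac{\rho - 1}{\rho - 2} \mathbf{I}_{KP} - 2\sigma_w^6\, \mathbf{U}(\boldsymbol{\theta}) + o\left(\sigma_w^6\right)$$

where

$$\mathbf{U}(\boldsymbol{\theta}) = \mathrm{Dg}\left(\left(\mathbf{A}_t^H \mathbf{N}_t^{-1} \mathbf{A}_t\right)^{-1}\right) \otimes \mathrm{Dg}\left(\mathbf{B}^{-1}(\boldsymbol{\theta})\right)$$

is the surviving term when $\rho = 1$. Notice that the asymptotic results in this appendix are equivalent to those obtained in the high-SNR study of Section 7.3.3 if we deal with circular nuisance parameters. The first conclusion is that $\mathbf{X}_{gml}(\boldsymbol{\theta}) = \mathbf{B}_{UCRB}(\boldsymbol{\theta}) \boldsymbol{\Psi}(\mathbf{K}) \mathbf{B}_{UCRB}(\boldsymbol{\theta})$ (7.35) is negligible at high SNR because it is proportional to $\sigma_w^4$, whereas $\mathbf{B}_{UCRB}(\boldsymbol{\theta})$ is only proportional to $\sigma_w^2$. The second conclusion is that the second term $\boldsymbol{\Gamma}(\mathbf{K})$ has the same dependence on $N_s$ and $\sigma_w^{-2}$ as $\mathbf{B}_{UCRB}^{-1}(\boldsymbol{\theta})$ in the case of a constant-modulus alphabet and, therefore, $\boldsymbol{\Gamma}(\mathbf{K})$ is not negligible even if $N_s \to \infty$.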
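As a hedged numerical sanity check of the limit of $\xi$ stated above, the sketch below takes $\xi$ to be the trace ratio appearing in that limit and assumes a purely illustrative exponential autocorrelation $R[k] = E_s a^{|k|}$ (not a pulse taken from this thesis), with $\mathbf{A}_t^H \mathbf{N}_t^{-1} \mathbf{A}_t$ modelled as the $N_s \times N_s$ Toeplitz matrix built from $R[k]$. The function names and the small Gauss-Jordan helper are hypothetical. For this choice, $S(f) = E_s (1 - a^2) / (1 - 2a\cos 2\pi f + a^2)$, so the limit evaluates to $(1 - a^2)/(1 + a^2)$.

```python
import math


def invert(A):
    """Gauss-Jordan inverse (partial pivoting) of a small real matrix."""
    n = len(A)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0.0:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def xi_finite(Ns, Es, a):
    """Tr(Dg^{-1}(T^{-1})) / Tr(T) for the assumed T[i][j] = Es * a**|i-j|."""
    T = [[Es * a ** abs(i - j) for j in range(Ns)] for i in range(Ns)]
    Tinv = invert(T)
    num = sum(1.0 / Tinv[i][i] for i in range(Ns))  # trace of Dg^{-1}(T^{-1})
    den = sum(T[i][i] for i in range(Ns))           # trace of T, i.e. Ns * Es
    return num / den


def xi_limit(Es, a, grid=4096):
    """1 / integral_0^1 Es / S(f) df, evaluated with the midpoint rule."""
    def S(f):
        # Spectrum (DTFT) of the assumed autocorrelation Es * a**|k|.
        return Es * (1 - a * a) / (1 - 2 * a * math.cos(2 * math.pi * f) + a * a)
    integral = sum(Es / S((k + 0.5) / grid) for k in range(grid)) / grid
    return 1.0 / integral
```

Under these assumptions, `xi_finite(Ns, Es, a)` approaches `xi_limit(Es, a)` as `Ns` grows; the residual gap comes from the first and last diagonal entries of the inverse, which is the edge effect that vanishes asymptotically.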
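However, this last conclusion (that $\boldsymbol{\Gamma}(\mathbf{K})$ does not vanish as $N_s \to \infty$) is only verified in the multiuser case, i.e., $P > 1$. In the single user case, $\mathbf{B}(\boldsymbol{\theta})$, $\mathbf{B}_p(\boldsymbol{\theta})$ and $\mathbf{B}_{p,q}(\boldsymbol{\theta})$ are the following scalars:

$$\mathbf{B}(\boldsymbol{\theta}) = M, \qquad \mathbf{B}_p(\boldsymbol{\theta}) = 0, \qquad \mathbf{B}_{p,q}(\boldsymbol{\theta}) = \frac{\pi^2 \left(M^2 - 1\right) M}{12}$$

and, therefore, the non-Gaussian terms $\boldsymbol{\Psi}(\mathbf{K})$ and $\boldsymbol{\Gamma}(\mathbf{K})$ are zero for any SNR because $\mathbf{B}_p(\boldsymbol{\theta}) = 0$. Thus, in the single user case, the asymptotic performance of second-order bearing estimators is given by

$$\mathbf{B}_{UCRB}(\boldsymbol{\theta}),\ \mathbf{B}_{bque}(\boldsymbol{\theta}),\ \mathbf{B}_{gml}(\boldsymbol{\theta}) = \frac{6}{\pi^2} \frac{\sigma_w^2}{N_s E_s} \frac{M + \sigma_w^2}{\left(M^2 - 1\right) M^2} + o\left(N_s^{-1}\right) = \frac{6}{\pi^2} \frac{1}{N_s E_s/N_0} \frac{M + \left(E_s/N_0\right)^{-1}}{\left(M^2 - 1\right) M^2} + o\left(N_s^{-1}\right)$$

where $\sigma_w^2 = N_0$ is the double-sided spectral density of the AWG noise. The above result is valid for any value of $\sigma_w^2$ or $M$. Moreover, this expression converges to the bound in (7.52) when the number of antennas satisfies $M \gg \max\left\{\left(E_s/N_0\right)^{-1}, 1\right\}$, which is equivalent to $M \gg 1$ in the context of digital communications.

Large sample study for uncorrelated snapshots and arbitrary SNR

Next, the performance of the GML and BQUE estimators is evaluated considering an arbitrary SNR and uncorrelated snapshots, i.e., $\mathbf{A}_t^H \mathbf{N}_t^{-1} \mathbf{A}_t = E_s \mathbf{I}_K$. In this scenario, it is straightforward to show that the (Gaussian) UCRB (7.36) is inversely proportional to the number of snapshots $K = N_s$, even if $N_s$ is finite. Actually, we have

$$\left[\mathbf{B}_{UCRB}^{-1}(\boldsymbol{\theta})\right]_{p,q} = 2N_s E_s \sigma_w^{-4}\, \mathrm{Re}\, \mathrm{Tr}\left(\mathbf{X}_p(\boldsymbol{\theta}) \mathbf{X}_q(\boldsymbol{\theta}) + \mathbf{X}(\boldsymbol{\theta}) \mathbf{X}_{p,q}(\boldsymbol{\theta})\right) = 2N_s E_s \sigma_w^{-4}\, \mathrm{Re}\left\{\left[\mathbf{X}_p(\boldsymbol{\theta})\right]_{p,q} \left[\mathbf{X}_q(\boldsymbol{\theta})\right]_{q,p} + \left[\mathbf{X}(\boldsymbol{\theta})\right]_{q,p} \left[\mathbf{X}_{p,q}(\boldsymbol{\theta})\right]_{p,q}\right\}$$

where $\mathbf{X}(\boldsymbol{\theta})$, $\mathbf{X}_p(\boldsymbol{\theta})$ and $\mathbf{X}_{p,q}(\boldsymbol{\theta})$ are the spatial components of $\mathcal{X}(\boldsymbol{\theta})$, $\mathcal{X}_p(\boldsymbol{\theta})$ and $\mathcal{X}_{p,q}(\boldsymbol{\theta})$, that is,

$$\mathbf{X}(\boldsymbol{\theta}) \triangleq \mathbf{B}(\boldsymbol{\theta}) - \mathbf{B}(\boldsymbol{\theta}) \left(\mathbf{B}(\boldsymbol{\theta}) + \sigma_w^2 \mathbf{I}_P\right)^{-1} \mathbf{B}(\boldsymbol{\theta})$$

$$\mathbf{X}_p(\boldsymbol{\theta}) \triangleq \mathbf{B}_p(\boldsymbol{\theta}) - \mathbf{B}_p(\boldsymbol{\theta}) \left(\mathbf{B}(\boldsymbol{\theta}) + \sigma_w^2 \mathbf{I}_P\right)^{-1} \mathbf{B}(\boldsymbol{\theta})$$

$$\mathbf{X}_{p,q}(\boldsymbol{\theta}) \triangleq \mathbf{B}_{p,q}(\boldsymbol{\theta}) - \mathbf{B}_p(\boldsymbol{\theta}) \left(\mathbf{B}(\boldsymbol{\theta}) + \sigma_w^2 \mathbf{I}_P\right)^{-1} \mathbf{B}_q^H(\boldsymbol{\theta}).$$

Furthermore, the non-Gaussian terms $\boldsymbol{\Psi}(\mathbf{K})$ (7.42) and $\boldsymbol{\Gamma}(\mathbf{K})$ (7.46) are also proportional to $N_s$ and, consequently, they do not vanish as more snapshots are processed.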
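The single-user closed form can be cross-checked against the spatial components defined above. The following is a minimal sketch, assuming uncorrelated snapshots, real scalar quantities ($P = 1$) and the scalar values $\mathbf{B} = M$, $\mathbf{B}_p = 0$, $\mathbf{B}_{p,q} = \pi^2 (M^2 - 1) M / 12$ quoted in the text; the function names are hypothetical and the $o(N_s^{-1})$ remainder is ignored.

```python
import math


def spatial_components(B, Bp, Bpq, sigma2):
    """Scalar (P = 1) versions of X, X_p and X_{p,q}."""
    inv = 1.0 / (B + sigma2)       # (B + sigma_w^2 I_P)^{-1} for P = 1
    X = B - B * inv * B
    Xp = Bp - Bp * inv * B
    Xpq = Bpq - Bp * inv * Bp      # B_q^H equals B_p in the real scalar case
    return X, Xp, Xpq


def ucrb_from_components(M, Ns, Es, sigma2):
    """Invert 2 Ns Es sigma_w^{-4} (X_p X_q + X X_{p,q}) for a single user."""
    B = float(M)
    Bp = 0.0
    Bpq = math.pi ** 2 * (M ** 2 - 1) * M / 12.0
    X, Xp, Xpq = spatial_components(B, Bp, Bpq, sigma2)
    inverse_bound = 2.0 * Ns * Es / sigma2 ** 2 * (Xp * Xp + X * Xpq)
    return 1.0 / inverse_bound


def ucrb_closed_form(M, Ns, Es, sigma2):
    """Closed-form single-user bound, without the o(1/Ns) remainder."""
    return (6.0 * sigma2 * (M + sigma2)
            / (math.pi ** 2 * Ns * Es * (M ** 2 - 1) * M ** 2))
```

Because $\mathbf{X}_p = 0$ here, only the product $\mathbf{X}\,\mathbf{X}_{p,q}$ contributes, and the two functions agree to floating-point precision for any $M > 1$ and any positive $N_s$, $E_s$ and $\sigma_w^2$.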
Simple manipulations yield the following expressions in the case of circular nuisance parameters:

$$[\boldsymbol{\Psi}(\mathbf{K})]_{p,q} = 4N_s E_s \sigma_w^{-4} (\rho - 2)\, \mathrm{Tr}\left(\mathbf{X}(\boldsymbol{\theta}) \mathbf{X}_p(\boldsymbol{\theta}) \mathbf{X}(\boldsymbol{\theta}) \mathbf{X}_p(\boldsymbol{\theta})\right)$$

$$[\boldsymbol{\Gamma}(\mathbf{K})]_{p,q} = -4N_s E_s \sigma_w^{-4}\, \mathrm{diag}^H\left(\mathbf{X}(\boldsymbol{\theta}) \mathbf{X}_p(\boldsymbol{\theta})\right) \left(\mathbf{X}^*(\boldsymbol{\theta}) \odot \mathbf{X}(\boldsymbol{\theta}) + \sigma_w^4 (\rho - 2)^{-1} \mathbf{I}_P\right)^{-1} \mathrm{diag}\left(\mathbf{X}(\boldsymbol{\theta}) \mathbf{X}_q(\boldsymbol{\theta})\right).$$

Finally, notice that $\mathbf{B}_p(\boldsymbol{\theta})$ is still null in the single user case and, therefore, $\boldsymbol{\Psi}(\mathbf{K})$ and $\boldsymbol{\Gamma}(\mathbf{K})$ are also zero because $\mathbf{X}_p(\boldsymbol{\theta}) = 0$.

Chapter 8
Conclusions and Topics for Future Research

In this thesis, optimal blind second-order estimators are deduced considering the true distribution of the nuisance parameters. Quadratic estimators are formulated assuming that the distribution of the nuisance parameters is known and that certain side information on the unknown parameters is available. Adopting the Bayesian formulation, this side information is introduced by means of the prior distribution of the parameters. This approach unifies the formulation of the open-loop (large-error) estimators in Chapter 3 and the closed-loop (small-error) estimators in Chapter 4. In the former case, the prior knowledge is rather vague, whereas a very informative prior is considered in the latter case.

The first important conclusion is that, in most estimation problems, second-order techniques are severely degraded by the bias term unless the small-error condition is satisfied. As an illustrative example, it is shown in Chapter 3 that it can be difficult to obtain unbiased frequency estimates using second-order open-loop schemes. However, the interest of second-order open-loop estimators is motivated by the problematic convergence of closed-loop schemes in noisy scenarios, in which large-error and small-error estimators are shown to yield approximately the same mean square error.

To avoid the bias limitations, the Best Quadratic Unbiased Estimator (BQUE) is deduced in Chapter 4 under the small-error condition.
The covariance matrix associated with the BQUE estimator constitutes the tightest lower bound on the variance of any second-order unbiased estimator. Formally, it is claimed in Chapter 4 that

$$E\left\{\left(\hat{\alpha} - g(\boldsymbol{\theta})\right)^2\right\} \geq \mathbf{B}_{BQUE}(\boldsymbol{\theta}) = \mathbf{D}_g(\boldsymbol{\theta}) \left(\mathbf{D}_r^H(\boldsymbol{\theta}) \mathbf{Q}^{-1}(\boldsymbol{\theta}) \mathbf{D}_r(\boldsymbol{\theta})\right)^{-1} \mathbf{D}_g^H(\boldsymbol{\theta})$$

for any quadratic estimator of $\alpha = g(\boldsymbol{\theta})$. In the above expression, $\mathbf{D}_g(\boldsymbol{\theta})$ and $\mathbf{D}_r(\boldsymbol{\theta})$ are the Jacobians of $g(\boldsymbol{\theta})$ and $\mathrm{vec}(\mathbf{R}(\boldsymbol{\theta}))$, respectively, where $\mathbf{R}(\boldsymbol{\theta})$ stands for the covariance matrix of the observed vector $\mathbf{y}$. On the other hand, $\mathbf{Q}(\boldsymbol{\theta})$ contains all the central fourth-order moments of $\mathbf{y}$. The matrix $\mathbf{Q}(\boldsymbol{\theta})$ can be split into two terms, as pointed out in Chapter 3, obtaining

$$\mathbf{Q}(\boldsymbol{\theta}) = \mathbf{R}^*(\boldsymbol{\theta}) \otimes \mathbf{R}(\boldsymbol{\theta}) + \mathcal{A}(\boldsymbol{\theta})\, \mathbf{K}\, \mathcal{A}^H(\boldsymbol{\theta}) \qquad (8.1)$$

where the second term accounts for all the non-Gaussian information about the nuisance parameters ($\mathbf{K} \neq \mathbf{0}$) that can be exploited to estimate the vector of parameters $\boldsymbol{\theta}$ by means of quadratic processing. Evaluating the potential benefit of considering this second term has become one of the most important issues in this thesis.

In many problems, the Gaussian assumption (i.e., $\mathbf{K} = \mathbf{0}$) is adopted to design second-order schemes when the actual distribution is unknown or when it becomes an obstacle to obtaining the ML estimator analytically. The most relevant contribution of this thesis is proving that the Gaussian assumption leads, in some scenarios, to suboptimal second-order estimation methods. Conversely, the Gaussian assumption is proved to supply the optimal second-order estimator, independently of the actual parameterization, in all these cases:

• The nuisance parameters are normally distributed.
• The SNR is low.
• All the derivatives of the transfer matrix $\mathbf{A}(\boldsymbol{\theta})$ are orthogonal to the columns of $\mathbf{A}(\boldsymbol{\theta})$ (Section 7.3). Formally, $\mathbf{P}_A(\boldsymbol{\theta})\, \partial \mathbf{A}(\boldsymbol{\theta}) / \partial \theta_p = \mathbf{0}$, where $\mathbf{P}_A(\boldsymbol{\theta})$ is the orthogonal projector onto the subspace generated by the columns of $\mathbf{A}(\boldsymbol{\theta})$ (see footnote 1).
• The SNR is high and the nuisance parameters are drawn from a multilevel alphabet (e.g., the QAM constellation).

Footnote 1. A more general condition is presented in (7.47) for circular nuisance parameters.

Otherwise, when dealing with constant-modulus nuisance parameters (e.g., the MPSK or CPM modulations), some improvement can be expected from exploiting the second term of $\mathbf{Q}(\boldsymbol{\theta})$ for medium-to-high SNR. The actual improvement is a function of the observation size and depends on the actual parameterization.

All these conclusions have also been evidenced in Chapter 5, where the design of the optimal second-order tracker is presented. In this chapter, the Kalman filter formulation is adopted to optimize both the acquisition and steady-state performance. In that way, in Chapter 5, the Gaussian assumption is examined in the acquisition phase, concluding that the acquisition time can be significantly shortened for practical SNRs if the nuisance parameters are drawn from a constant-modulus alphabet. Otherwise, if the nuisance parameters have multiple amplitudes, the Gaussian assumption is also optimal in terms of acquisition performance.

Despite the last statements, in some significant estimation problems, the Gaussian assumption applies asymptotically as the observation interval grows to infinity, as proved in Chapter 7. In that case, the importance of the second term of $\mathbf{Q}(\boldsymbol{\theta})$ is relegated to those scenarios in which the observation interval is short. Nevertheless, most conclusions from the asymptotic study in Chapter 7 are problem-dependent. In the following paragraphs, the main conclusions from every estimation problem addressed in this thesis are summarized:

• Synchronization. The Gaussian assumption is asymptotically optimal when the number of observed symbols is infinite (Section 7.4.4).
In a continuous mode transmission, the asymptotic condition allows neglecting the so-called "edge effect" related to the partial observation of the border symbols (Section 6.1.2). On the other hand, if the observation time is limited, the fourth-order information about the received constellation is crucial to filter out the self-noise at high SNR in the case of a partial response CPM modulation (e.g., LREC or GMSK), but it is negligible in the case of linear modulations (e.g., MPSK or QAM). The importance of this result goes beyond the actual interest of CPM modulations. Actually, it shows that the optimal second-order estimator is able to take advantage of the statistical dependence of the pseudo-symbols in Laurent's expansion. In this case, as happens in coded transmissions, the received symbols are not statistically independent in spite of being uncorrelated. Thus, the results for the CPM format could be translated to optimize existing second-order synchronizers in coded communication systems. Finally, note that the Gaussian assumption always applies in TDMA communication systems whatever the observation length.

• Channel Estimation. If the channel amplitude is not estimated, the Gaussian assumption yields minor losses at high SNR on average, that is, if the estimator performance is averaged over multiple channel realizations (Section 6.4). However, the Gaussian assumption is expected to fail for some particular realizations of the channel impulse response. Indeed, this point is currently being investigated [LS04][LS05a][LS05b]. Another important conclusion is that the Gaussian assumption yields a severe degradation at high SNR when the channel amplitude is estimated too. If the transmitted symbols belong to a multilevel constellation (e.g., QAM), second-order estimators exhibit the typical variance floor at high SNR. On the other hand, if the transmitted symbols have constant modulus (e.g., MPSK or CPM), the BQUE estimator is able to avoid the aforementioned variance floor whereas, if the Gaussian assumption is adopted, the estimator performance degrades at high SNR because the channel amplitude estimate is drastically degraded. In fact, the larger the transmitted pulse bandwidth (roll-off factor), the larger the loss incurred at high SNR. The asymptotic study in Section 7.4.4 states that the above conclusions are still valid if the observation time goes to infinity. The loss incurred by the Gaussian assumption becomes a function of the actual channel impulse response.

• Direction-of-Arrival (DOA). The Gaussian assumption is asymptotically optimal when the number of antennas is infinite (Section 7.4.5). The Gaussian assumption also applies in the single user case whatever the array size or the working SNR (Section 7.4.5). Likewise, the Gaussian assumption is optimal at high SNR if the transmitted symbols are drawn from a multilevel constellation (e.g., QAM or APK). On the other hand, if the transmitted symbols belong to a constant-modulus alphabet (e.g., MPSK or CPM), the fourth-order statistics of the transmitted symbols can be exploited to discriminate the DOA of those signals impinging on the array from nearby directions. When the Gaussian assumption is adopted and this fourth-order information is omitted, a significant loss appears at high SNR that is a function of the number of antennas and the users' angular separation. Furthermore, the incurred loss cannot be reduced even if the number of received symbols goes to infinity (Section 7.4.5).

8.1 Further Research

In this thesis, the ultimate limits of second-order estimation are studied from both the practical and the theoretical point of view. However, some interesting points are still open and should be investigated in the future.
In the author's opinion, the most promising topics for further study are outlined in the following paragraphs:

1. Multiuser estimation problems. Second-order methods are able to exploit the constant-modulus property of the random nuisance parameters. This property is reflected in the eigendecomposition of the kurtosis matrix $\mathbf{K}$. In some estimation problems, this information is crucial to deal with the intersymbol interference as well as the multiple access interference in multiuser applications. Actually, the results of this thesis suggest that the constant-modulus property is mainly relevant in multiuser or MIMO scenarios. In these scenarios, the constant-modulus property could be exploited to discriminate the parameters associated with non-orthogonal interfering users.

2. Asymptotic Gaussian assumption. The Gaussian assumption does not apply for practical SNRs if the nuisance parameters have constant modulus and the observation length is rather short, as in low-cost implementations. However, in some important problems, the Gaussian assumption is rapidly satisfied as the number of observations is increased because of the Central Limit Theorem. This asymptotic study was addressed in Chapter 7 for different estimation problems. In all the studied problems, if the nuisance parameters have constant modulus, the second term in equation (8.1) generates a favourable term that persists at high SNR. In the problem of DOA estimation, this term becomes negligible if the number of antennas goes to infinity. However, this term survives if the number of antennas is finite, even if the number of snapshots goes to infinity. Therefore, the results in Chapter 7 could be useful to identify those estimation problems in which the non-Gaussian information persists as the number of observations goes to infinity.

3. Noncircular and coded nuisance parameters.
If we deal with noncircular nuisance parameters (e.g., CPM signals), the kurtosis matrix $\mathbf{K}$ provides additional information regarding the statistical dependence of the nuisance parameters. In that way, it is possible to remove the self-noise (including the multiple access interference) even if the number of parameters exceeds the number of observations. This feature should be thoroughly investigated because it could be exploited to improve second-order estimators in the case of coded transmissions. Moreover, other noncircular constellations should be studied in detail, for example BPSK, digital PAM and staggered formats such as the offset QPSK modulation.

Besides these three principal research lines, some other topics for future research are listed next:

1. Large error bounds with nuisance parameters. In Section 2.6, the most important lower bounds in the literature were classified and briefly described. Among all the existing bounds, the Cramér-Rao bound (CRB) is without doubt the most widespread one due to its simplicity. However, the true CRB is still unknown in many estimation problems in the presence of nuisance parameters. To fill this gap, the CRB is derived in some particular scenarios: low SNR, high SNR, Gaussian nuisance parameters (Gaussian UCRB), deterministic and continuous nuisance parameters (CCRB), and known nuisance parameters (MCRB). Moreover, in this thesis we have deduced the CRB under the quadratic constraint. In the context of digital communications, it would be useful to apply the last assumptions to the large-error bounds in Section 2.6 in order to characterize the large-error region and the SNR threshold in the presence of nuisance parameters. Among all the lower bounds in Section 2.6, the Hammersley-Chapman-Robbins, Weiss-Weinstein and Ziv-Zakai lower bounds are surely the most promising candidates. Then, the obtained large-error bounds should be compared to the second-order large-error estimators deduced in Chapter 3. From this comparison, we could determine whether or not quadratic estimators are optimal at low SNR in the large-error regime. Also, we could evaluate the performance loss due to the presence of the random nuisance parameters.

2. Acquisition optimization. The QEKF was proposed in Chapter 5 with the aim of improving the acquisition performance of classical closed-loop schemes. Initially, the QEKF supplies the large-error MMSE solution derived in Chapter 3 considering the initial noninformative prior. Thereafter, the QEKF converges progressively to the small-error solution in Chapter 4 every time a new observation is processed. The new datum is used to update the prior distribution so that the prior becomes increasingly informative. Unfortunately, the QEKF convergence is not guaranteed unless the observations and parameters are jointly Gaussian distributed. Moreover, even when the QEKF converges, the acquisition time could be shortened by optimizing the prior update. Therefore, an important topic for research is to find the optimal prior update for optimizing the acquisition probability and delay. In that sense, the Unscented Kalman Filter proposed in [Jul97][Wan00] should be considered since it is known to guarantee the convergence under mild conditions.

3. Low-cost implementation. The optimal second-order estimator is formulated in Chapter 3 and Chapter 4 resorting to the vec(·) transformation. Consequently, it is necessary to compute the inverse of the $M^2 \times M^2$ square matrix $\mathbf{Q}(\boldsymbol{\theta})$ and, thus, the estimator computational cost increases rapidly as the number of observations $M$ is increased. In some problems, matrix $\mathbf{Q}^{-1}(\boldsymbol{\theta})$ can be computed offline before processing the first sample (e.g., in digital synchronization).
However, in other relevant problems such as channel and DOA estimation, the inverse needs to be computed every time (online). The closed-loop architecture introduced in Section 2.5.1 as well as the QEKF formulation in Chapter 5 allow reducing the number of observations $M$ that are jointly processed each time. Additionally, suboptimal quadratic estimators could be investigated by considering rank-reduction techniques [Sic92][Sch91b] and transversal filtering implementations. In the latter case, the (scalar) parameter $\theta$ could be estimated from the sample covariance matrix at time $n$ by applying a rank-one matrix $\mathbf{M} = \mathbf{h}\mathbf{h}^H$. Thus, we have

$$\hat{\theta}_n = b + \mathrm{Tr}\left(\mathbf{M} \hat{\mathbf{R}}_n\right) = b + \mathbf{y}_n^H \mathbf{M} \mathbf{y}_n = b + \left|\mathbf{h}^H \mathbf{y}_n\right|^2$$

where $\mathbf{y}_n$ is the observation at time $n$ and $\mathbf{h}$ collects the coefficients of the estimator time-invariant impulse response. In that case, we aim at optimizing the coefficients of $\mathbf{h}$ according to the criteria presented in Chapter 3 and Chapter 4. Evidently, we can consider multiple transversal filters:

$$\hat{\theta}_n = b + \sum_{m=1}^{R} \left|\mathbf{h}_m^H \mathbf{y}_n\right|^2$$

where $R$ is the rank of

$$\mathbf{M} = \sum_{m=1}^{R} \mathbf{h}_m \mathbf{h}_m^H.$$

Actually, the original estimators in Chapter 3 and Chapter 4 were the sum of $M$ transversal filters because matrix $\mathbf{M}$ was originally full rank, i.e., $R = M$.

Appendix A
Notation

In general, uppercase boldface letters ($\mathbf{A}$) denote matrices, lowercase boldface letters ($\mathbf{a}$) denote (column) vectors and italics ($a$, $A$) denote scalars. On some occasions, matrices are also represented with calligraphic fonts ($\mathcal{A}$).

$\mathbf{A}^T$, $\mathbf{A}^*$, $\mathbf{A}^H$   Transpose, complex conjugate and transpose conjugate of matrix $\mathbf{A}$, respectively.
$\mathbf{A}^{-1}$, $\mathbf{A}^{\#}$   Inverse and Moore-Penrose pseudoinverse of matrix $\mathbf{A}$, respectively.
$\mathbf{A}^{1/2}$   Positive definite Hermitian square root of matrix $\mathbf{A}$, i.e., $\mathbf{A}^{1/2} \mathbf{A}^{1/2} = \mathbf{A}$.
$\mathbf{A}(\mathbf{B})$   Matrix $\mathbf{A}$ is a function of the entries in matrix $\mathbf{B}$.
$\det(\mathbf{A})$   Determinant of matrix $\mathbf{A}$.
$\mathrm{Tr}(\mathbf{A})$   Trace of matrix $\mathbf{A}$.
$\mathrm{vec}(\mathbf{A})$   Column vector formed by stacking the columns of matrix $\mathbf{A}$ on top of one another.
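$\mathrm{diag}(\mathbf{a})$, $\mathrm{diag}(\mathbf{A})$   Following the Matlab notation, $\mathrm{diag}(\mathbf{a})$ is the $N \times N$ diagonal matrix whose entries are the $N$ elements of the vector $\mathbf{a}$, and $\mathrm{diag}(\mathbf{A})$ is the column vector containing the $N$ diagonal elements of matrix $\mathbf{A}$.
$\mathrm{Dg}(\mathbf{A})$   An $N \times N$ diagonal matrix whose entries are the $N$ elements in the diagonal of matrix $\mathbf{A}$, i.e., $\mathrm{diag}\left([\mathbf{A}]_{1,1}, \ldots, [\mathbf{A}]_{N,N}\right)$ or, equivalently, $\mathrm{diag}(\mathrm{diag}(\mathbf{A}))$.
$\|\mathbf{a}\|$   Euclidean norm of $\mathbf{a}$, i.e., $\|\mathbf{a}\| = \sqrt{\mathbf{a}^H \mathbf{a}}$.
$\|\mathbf{a}\|_{\mathbf{W}}$   Weighted norm of $\mathbf{a}$, i.e., $\|\mathbf{a}\|_{\mathbf{W}} = \sqrt{\mathbf{a}^H \mathbf{W} \mathbf{a}}$ (with Hermitian positive definite $\mathbf{W}$).
$[\mathbf{A}]_{i,j}$   The entry of matrix $\mathbf{A}$ in the $i$-th row and the $j$-th column.
$[\mathbf{A}]_i$   The $i$-th column of matrix $\mathbf{A}$.
$[\mathbf{v}]_i$   The $i$-th element of vector $\mathbf{v}$.
$\mathbf{A} \otimes \mathbf{B}$   Kronecker product between $\mathbf{A}$ and $\mathbf{B}$. If $\mathbf{A}$ is $M \times N$,
$$\mathbf{A} \otimes \mathbf{B} = \begin{bmatrix} [\mathbf{A}]_{1,1} \mathbf{B} & \cdots & [\mathbf{A}]_{1,N} \mathbf{B} \\ \vdots & \ddots & \vdots \\ [\mathbf{A}]_{M,1} \mathbf{B} & \cdots & [\mathbf{A}]_{M,N} \mathbf{B} \end{bmatrix}.$$
$\mathbf{A} \odot \mathbf{B}$   Elementwise (Schur-Hadamard) product between $\mathbf{A}$ and $\mathbf{B}$ (they must have the same dimensions).
$\mathbf{A} \geq \mathbf{B}$, $\mathbf{A} > \mathbf{B}$   The matrix $\mathbf{A} - \mathbf{B}$ is positive semidefinite and positive definite, respectively.
$\mathbf{I}_N$, $\mathbf{I}$   The $N \times N$ identity matrix and the identity matrix of implicit size.
$\mathbf{0}_{M \times N}$, $\mathbf{0}_M$, $\mathbf{0}$   An $M \times N$ all-zeros matrix, an $M$-long all-zeros vector, and an all-zeros matrix or vector of implicit size.
$\mathbf{1}_{M \times N}$, $\mathbf{1}_M$, $\mathbf{1}$   An $M \times N$ all-ones matrix, an $M$-long all-ones vector, and an all-ones matrix or vector of implicit size.
$\mathbf{d}_N$   The vector defined as $\mathbf{d}_N = [0, \ldots, N - 1]^T$.
$\mathbf{e}_i$   Vector that has unity in its $i$-th position and zeros elsewhere.
$\mathbb{R}^{M \times N}$, $\mathbb{C}^{M \times N}$   The set of $M \times N$ matrices with real and complex valued entries, respectively.
$j$   Imaginary unit ($j = \sqrt{-1}$).
$\mathrm{Re}\{\mathbf{A}\}$, $\mathrm{Im}\{\mathbf{A}\}$   The matrices containing the real and imaginary parts of the entries of $\mathbf{A}$, respectively.
$\arg\{a\}$   Angle of the complex number $a$, i.e., $\arg\{a\} = \arctan\left(\mathrm{Im}(a) / \mathrm{Re}(a)\right)$.
$|a|$, $\mathrm{sign}(a)$   Absolute value and sign of a real valued $a$.
$\lceil a \rceil$   Smallest integer bigger than or equal to $a$.
$\hat{\mathbf{A}}$   Estimator or estimate of the matrix $\mathbf{A}$.
$f_{\mathbf{A}}(\mathbf{A})$   Probability density function of the random matrix $\mathbf{A}$.
$E\{\mathbf{A}\}$   Expectation of a random matrix $\mathbf{A}$.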
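$E_{\mathbf{B}}\{\mathbf{A}\}$   Expectation of a random matrix $\mathbf{A}$ with respect to the statistics in $\mathbf{B}$.
$\arg\min_{\mathbf{B}} f(\mathbf{B})$   Matrix $\mathbf{B}$ minimizing the scalar function $f(\mathbf{B})$.
$\arg\max_{\mathbf{B}} f(\mathbf{B})$   Matrix $\mathbf{B}$ maximizing the scalar function $f(\mathbf{B})$.
$\partial \mathbf{A} / \partial \mathbf{B}$   If $\mathbf{B}$ is $M \times N$, $\partial \mathbf{A} / \partial \mathbf{B}$ is a matrix formed as
$$\frac{\partial \mathbf{A}}{\partial \mathbf{B}} = \begin{bmatrix} \dfrac{\partial \mathbf{A}}{\partial [\mathbf{B}]_{1,1}} & \cdots & \dfrac{\partial \mathbf{A}}{\partial [\mathbf{B}]_{1,N}} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial \mathbf{A}}{\partial [\mathbf{B}]_{M,1}} & \cdots & \dfrac{\partial \mathbf{A}}{\partial [\mathbf{B}]_{M,N}} \end{bmatrix}.$$
In addition, for a given scalar $b$, $\partial \mathbf{A} / \partial b$ is the matrix containing the derivatives of the entries of $\mathbf{A}$ with respect to $b$. If $b$ is complex, $\partial / \partial b$ is defined in terms of $\partial / \partial\, \mathrm{Re}\{b\}$ and $j\, \partial / \partial\, \mathrm{Im}\{b\}$; see [Bra83].
$\delta(i_1, \ldots, i_N)$   Multidimensional Kronecker delta defined as $\delta(i_1, \ldots, i_N) = 1$ if $i_1 = \ldots = i_N$, and $0$ otherwise.
$\delta(\mathbf{x})$   Vectorial Dirac's delta defined as $\delta(\mathbf{x}) = 1$ if $\mathbf{x} = \mathbf{0}$, and $0$ otherwise.
$\mathcal{F}\{\cdot\}$, $\mathcal{F}^{-1}\{\cdot\}$   Direct and inverse Fourier transform for both the analog and discrete cases, defined as $\mathcal{F}\{x(t)\} = \int_{-\infty}^{\infty} x(t) e^{-j2\pi f t}\, dt$ and $\mathcal{F}\{x[n]\} = \sum_{n=-\infty}^{\infty} x[n] e^{-j2\pi f n}$, respectively.
$*$   Analog or discrete convolution defined as $x(t) * y(t) = \int_{-\infty}^{\infty} x(\tau) y(t - \tau)\, d\tau$ or $x[n] * y[n] = \sum_{k=-\infty}^{\infty} x[k] y[n - k]$, respectively.
$x \in (A, B]$   The scalar $x$ belongs to the interval given by $x > A$ and $x \leq B$.
$\mathrm{sinc}(x)$   Function defined as $\mathrm{sinc}(x) \triangleq \sin(\pi x) / (\pi x)$ for $x \neq 0$, and $\mathrm{sinc}(x) \triangleq 1$ for $x = 0$.
$\sup$   Supremum (least upper bound). If the set is finite, it coincides with the maximum (max).
$\limsup$   Limit superior (limit of the sequence of suprema).
$O(\cdot)$, $o(\cdot)$   Landau symbols for order of convergence.
$\triangleq$   Symbol used to define a new variable.
$\propto$   It stands for "proportional to" or sometimes "equivalent to".
$\ln(\cdot)$   Natural logarithm.

Appendix B
Acronyms

AoA   Angle of Arrival.
APK   Amplitude Phase Keying.
AWG   Additive White Gaussian.
AWGN   Additive White Gaussian Noise.
BLUE   Best Linear Unbiased Estimator.
BPSK   Binary Phase Shift Keying.
BQUE   Best Quadratic Unbiased Estimator.
CDMA   Code Division Multiple Access.
CML   Conditional Maximum Likelihood.
CPM   Continuous Phase Modulation.
CRB   Cramér-Rao Bound.
DA   Data Aided.
DD   Decision Directed.
DOA   Direction of Arrival.
EKF   Extended Kalman Filter.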
FIM   Fisher Information Matrix.
FIR   Finite Impulse Response.
GML   Gaussian Maximum Likelihood.
GMSK   Gaussian Minimum Shift Keying.
GSM   Global System for Mobile Communications.
IIR   Infinite Impulse Response.
ISI   Intersymbol Interference.
LOS   Line-of-Sight.
MAI   Multiple Access Interference.
MCRB   Modified Cramér-Rao Bound.
MIMO   Multiple Input Multiple Output.
ML   Maximum Likelihood.
MMSE   Minimum Mean Squared Error.
MPSK   M-ary Phase Shift Keying.
MSE   Mean Squared Error.
MSK   Minimum Shift Keying.
MUI   Multiuser Interference.
MVU   Minimum Variance Unbiased.
NDA   Non Data Aided.
NLOS   Non Line-of-Sight.
OFDM   Orthogonal Frequency Division Multiplexing.
PAM   Pulse Amplitude Modulation.
p.d.f.   Probability Density Function.
QAM   Quadrature Amplitude Modulation.
QEKF   Quadratic Extended Kalman Filter.
QPSK   Quaternary Phase Shift Keying.
RX   Receiver.
SCPC   Single Channel per Carrier.
SNR   Signal to Noise Ratio.
TOA   Time of Arrival.
TX   Transmitter.
UKF   Unscented Kalman Filter.
ULA   Uniform Linear Array.
UMTS   Universal Mobile Telecommunications System.
WLS   Weighted Least Squares.
ZF   Zero Forcing.

Bibliography

[Abe93] J.S. Abel, “A bound on mean-square-estimate error”, IEEE Trans. on Information Theory, Vol. 39, no. 5, pp. 1675-1680, Sept. 1993.
[Alb73] J.P.A. Albuquerque, “The Barankin bound: A geometric interpretation”, IEEE Trans. on Information Theory, Vol. 19, no. 4, pp. 559-561, Jul. 1973.
[Alb89] T. Alberty, V. Hespelt, “A new pattern jitter free frequency error detector”, IEEE Trans. on Communications, Vol. 37, pp. 159-163, Feb. 1989.
[Ama98] S.I. Amari, “Natural gradient works efficiently in learning”, Neural Computation, Vol. 10, pp. 251-276, 1998.
[And79] B.D.O. Anderson, J.B. Moore, Optimal Filtering, Prentice-Hall, New Jersey, 1979.
[And90] A.N. D'Andrea, U. Mengali, R. Reggiannini, “A digital approach to clock recovery in generalized minimum shift keying”, IEEE Trans. on Vehicular Technology, Vol. 39, no. 3, pp. 227-234, Aug. 1990.
[And93] A.N. D'Andrea, U. Mengali, “Design of quadricorrelators for automatic frequency control systems”, IEEE Trans. on Communications, Vol. 41, pp. 988-997, Jun. 1993.
[And94] A.N. D'Andrea, U. Mengali, R. Reggiannini, “The modified Cramér-Rao bound and its application to synchronization problems”, IEEE Trans. on Communications, Vol. 42, no. 2, pp. 1391-1399, Feb.-Apr. 1994.
[And96] A.N. D'Andrea, M. Luise, “Optimization of symbol timing recovery for QAM data demodulators”, IEEE Trans. on Communications, Vol. 44, pp. 399-406, Mar. 1996.
[Bah74] L. Bahl, J. Cocke, F. Jelinek, J. Raviv, “Optimal decoding of linear codes for minimizing the symbol error rate”, IEEE Trans. on Information Theory, pp. 284-287, Mar. 1974.
[Bar49] E.W. Barankin, “On some analogues of the amount of information and their use in statistical estimation”, Annals of Mathematical Statistics, Vol. 20, pp. 477-501, 1949.
[Bat46] A. Bhattacharyya, “On some analogues of the amount of information and their use in statistical estimation”, Sankhya, Vol. 8, pp. 1-14, 1946.
[Bel74] S. Bellini, G. Tartara, “Bounds on error in signal parameter estimation”, IEEE Trans. on Communications, Vol. 22, pp. 340-342, Mar. 1974.
[Bel97] K.L. Bell, Y. Steinberg, Y. Ephraim, H.L. Van Trees, “Extended Ziv-Zakai lower bound for vector parameter estimation”, IEEE Trans. on Information Theory, Vol. 43, no. 2, pp. 624-637, Mar. 1997.
[Ben84] A. Benveniste, M. Goursat, “Blind equalizers”, IEEE Trans. on Communications, Vol. 32, no. 8, pp. 871-883, Aug. 1984.
[Ber93] C. Berrou, P. Glavieux, P. Thitimajshima, “Near Shannon limit error-correcting coding and decoding: Turbo-codes”, Proc. of the IEEE Int. Conf. on Communications (ICC), Geneva (Switzerland), 1993.
[Bie80] G. Bienvenu, L. Kopp, “Adaptivity to background noise spatial coherence for high resolution passive methods”, Proc. of IEEE Int. Conf. on Acoustics, Speech and Signal Processing, pp. 307-310, Apr. 1980.
[Bob76] B.Z. Bobrovsky, M. Zakai, “A lower bound on the estimation error for certain diffusion processes”, IEEE Trans. on Information Theory, Vol. 22, no. 1, pp. 45-52, Jan. 1976.
[Bou02a] N. Bourdeau, S. Di Girolamo, J. Riba, F. Barcelo, M. Burri, M. Gibeaux, F. Sansone, “System architecture definition report”, Tech. Rep. Deliverable D6, IST-2000-26040 EMILY, Nov. 2002.
[Bou02b] N. Bourdeau, S. Di Girolamo, J. Riba, M. Gibeaux, M. Burri, F. Sansone, “System performance definition report”, Tech. Rep. Deliverable D7, IST-2000-26040 EMILY, April 2002.
[Boy04] S. Boyd, L. Vandenberghe, Convex Optimization, Cambridge University Press, Cambridge, 2004.
[Bra83] B.A. D.H. Brandwood, “A complex gradient operator and its application in adaptive array theory”, Proceedings of the IEE, Vol. 130, F and H, pp. 11-16, Feb. 1983.
[Bra01] M.S. Braasch, “Performance comparison of multipath mitigating receiver architectures”, Proc. of the IEEE Aerospace Conference, pp. 3.1309-3.1315, Mar. 2001.
[Cah77] C.R. Cahn, “Improving frequency acquisition of a Costas loop”, IEEE Trans. on Communications, Vol. 25, pp. 1453-1459, Feb. 1977.
[Car94] J.F. Cardoso, E. Moulines, “A robustness property of DOA estimators based on covariance”, IEEE Trans. on Signal Processing, Vol. 42, no. 11, pp. 3285-3287, Nov. 1994.
[Car97] E. de Carvalho, D.T.M. Slock, “Cramer-Rao bounds for semi-blind, blind and training sequence based channel estimation”, Proc. of the IEEE Signal Processing Workshop on Signal Processing Advances in Wireless Communications (SPAWC), pp. 129-132, Paris (France), Apr. 1997.
[Car00] E. Carvalho, J. Cioffi, D. Slock, “Cramer-Rao bounds for blind multichannel estimation”, Proc. of the IEEE Int. Conf. on Global Communications (GLOBECOM), Nov. 2000.
[Cha51] D.G. Chapman, H. Robbins, “Minimum variance estimation without regularity assumptions”, Annals of Mathematical Statistics, Vol. 22, pp. 581-586, 1951.
[Cha75] D. Chazan, M. Zakai, J. Ziv, “Improved lower bounds on signal parameter estimation”, IEEE Trans. on Information Theory, Vol. 21, no. 1, pp. 90-93, Mar. 1975.
[Chi94] P.C. Ching, H.C. So, “Two adaptive algorithms for multipath time delay estimation”, IEEE Journal of Oceanic Engineering, Jul. 1994.
[Chu91] J.C.-I. Chuang, N.R. Sollenberger, “Burst coherent demodulation with combined symbol timing, frequency offset estimation, and diversity selection”, IEEE Trans. on Communications, Vol. 39, pp. 1157-1164, Jul. 1991.
[Dem77] A.P. Dempster, N.M. Laird, D.B. Rubin, “Maximum likelihood from incomplete data via the EM algorithm”, Annals of the Royal Statistic Society, Vol. 39, pp. 1-38, Dec. 1977.
[Fed88] M. Feder, E. Weinstein, “Parameter estimation of superimposed signals using the EM algorithm”, IEEE Trans. on Signal Processing, Vol. 36, pp. 477-489, April 1988.
[Fen59] A.V. Fend, “On the attainment of Cramér-Rao and Bhattacharyya bounds for the variances of an estimate”, Annals of Mathematical Statistics, Vol. 30, pp. 381-388, 1959.
[Fis98] S. Fischer, H. Grubeck, A. Kangas, H. Koorapaty, E. Larsson, P. Lundqvist, “Time of arrival estimation of narrowband TDMA signals for mobile positioning”, Proc. of the IEEE Int. Symp. on Personal, Indoor and Mobile Radio Communications (PIMRC), pp. 451-455, Sep. 1998.
[For72] G.D. Forney Jr., “Maximum-likelihood sequence estimation of digital sequences in the presence of intersymbol interference”, IEEE Trans. on Information Theory, Vol. 18, no. 5, pp. 363-378, May 1972.
[For02] P. Forster, P. Larzabal, “On lower bounds for deterministic parameter estimation”, Proc. of the IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), Vol. II, pp. 1137-1140, Orlando (Florida, USA), May 2002.
[Gar86a] F.M. Gardner, “A BPSK/QPSK timing-error detector for sampled receivers”, IEEE Trans. on Communications, Vol. 5, pp. 423-685, May 1986.
[Gar86b] W.A. Gardner, “The role of spectral correlation in design and performance analysis of synchronizers”, IEEE Trans. on Communications, Vol. 34, pp. 1089-1095, Nov. 1986.
[Gar88a] F.M. Gardner, “Demodulator reference recovery techniques suited for digital implementation”, Tech. Rep. Final Report, ESTEC Contract No. 6847/86/NL/DG, European Space Agency, Aug. 1988.
[Gar88b] W.A. Gardner, “Simplification of MUSIC and ESPRIT by exploitation of cyclostationarity”, Proceedings of the IEEE, Vol. 76, no. 7, pp. 845-847, Jul. 1988.
[Gar90] F.M. Gardner, “Frequency detectors for digital demodulators via maximum-likelihood derivation”, Tech. Rep. Final Report, Part II, ESTEC Contract No. 8022/88/NL/DG, European Space Agency, Jun. 1990.
[Gar94] W.A. Gardner, Cyclostationarity in Communications and Signal Processing, IEEE Press, 1994.
[Gel00] G. Gelli, L. Paura, A.R.P. Ragozini, “Blind widely linear multiuser detection”, IEEE Communications Letters, Vol. 4, pp. 187-189, Jun. 2000.
[Ger01] W.H. Gerstacker, R. Schober, A. Lampe, “Equalization with widely linear filtering”, Proc. of Int. Symposium on Information Theory, p. 265, Washington, USA, Jun. 2001.
[Gia89] G. Giannakis, J. Mendel, “Identification of non-minimum phase systems via higher-order statistics”, IEEE Trans. on Acoustics, Speech and Signal Processing, Vol. 38, pp. 360-377, Mar. 1989.
[Gia97] G. Giannakis, S.D. Halford, “Asymptotically optimal blind fractionally spaced channel estimation and performance analysis”, IEEE Trans. on Signal Processing, Vol. 45, no. 7, pp. 1815-1830, Jul. 1997.
[God80] D.N. Godard, “Self-recovering equalization and carrier tracking in two-dimensional data communication systems”, IEEE Trans. on Communications, Vol. 28, no. 11, pp. 1867-1875, Nov. 1980.
[Gor90] J.D. Gorman, A.O. Hero, “Lower bounds for parametric estimation with constraints”, IEEE Trans. on Information Theory, Vol. 26, no. 6, pp. 1285-1301, Nov. 1990.
[Gor91] J.D. Gorman, A.O. Hero, “On the application of Cramér-Rao type lower bounds for constrained estimation”, Proc. of the IEEE Int. Conf. on Acoustics and Signal Processing (ICASSP), Vol. 2, pp. 1333-1336, Toronto (Canada), April 1991.
[Gor97] A. Gorokhov, P. Loubaton, “Semi-blind second order identification of convolutive channels”, Proc. of IEEE Int. Conf. on Acoustics, Speech and Signal Processing, pp. 3905-3908, Munich (Germany), Apr. 1997.
[Gra81] A. Graham, Kronecker Products and Matrix Calculus: with Applications, John Wiley and Sons, New Jersey, 1981.
[Gre92] D. Greenwood, L. Hanzo, Mobile Radio Communications, chap. Characterization of Mobile Radio Channels (chapter 2), pp. 93-185, John Wiley and Sons, 1992, Editor: R. Steele.
[Ham50] J.M. Hammersley, “On estimating restricted parameters”, Annals of the Royal Statistics Society (Series B), Vol. 12, pp. 192-240, 1950.
[Hay91] S. Haykin, Adaptive Filter Theory, Prentice-Hall International, 1991.
[Jul97] S.J. Julier, J.K. Uhlmann, “A new extension of the Kalman filter to nonlinear systems”, Proc. of the Aerosense: The 11th Int. Symposium on Aerospace/Defence Sensing, Simulation and Controls, Orlando, Florida, 1997.
[Kai00] T. Kailath, A.H. Sayed, B. Hassibi, Linear Estimation, Prentice Hall, 2000.
[Kay93a] S.M. Kay, Fundamentals of Statistical Signal Processing: Detection Theory, Prentice Hall, New Jersey, 1993.
[Kay93b] S.M. Kay, Fundamentals of Statistical Signal Processing: Estimation Theory, Prentice Hall, New Jersey, 1993.
[Kie52] J. Kiefer, “On minimum variance estimators”, Annals in Mathematical Statistics, Vol. 23, pp. 629-632, 1952.
[Kno99] L. Knockaert, “On the effect of nuisance parameters on the threshold SNR value of the Barankin bound”, IEEE Trans. on Signal Processing, Vol. 47, no. 2, pp. 523-527, Feb. 1999.
[Kot59] V.A. Kotelnikov, The Theory of Optimum Noise Immunity, McGraw-Hill, New York, 1959.
[Kri96] H. Krim, M. Viberg, “Two decades of array signal processing research.
The parametric approach”, IEEE Signal Processing Magazine, pags. 67—95, Jul. 1996. [Lau86] P.A. Laurent, “Exact and approximate construction of digital phase modulations by superposition of amplitude modulated pulses”, IEEE Trans. on Communications, Vol. 34, pags. 150—160, Feb. 1986. [Li99] H. Li, P. Stoica, J. Li, “Computationally efficient maximum likelihood estimation of structured covariance matrices”, IEEE Trans. on Signal Processing, Vol. 47, no 5, pags. 1314—1323, May 1999. [Liu93] H. Liu, G. Xu, L. Tong, “A deterministic approach to blind identification of multichannel FIR systems”, Proc. of the Int. 27th Asilomar Conf. Signals, Systems and Computers, Oct. 1993. [LS04] J.A. Lopez-Salcedo, G. Vazquez, “Frequency domain iterative pulse shape estimation based on second-order statistics”, Proc. of 5th IEEE Int. Workshop on Signal Processing Advances in Wireless Communications (SPAWC), Lisbon, Portugal, Jul. 2004. [LS05a] J.A. Lopez-Salcedo, G. Vazquez, “Asymptotic equivalence between the unconditional maximum likelihood and the square-law nonlinearity symbol timing estimation”, IEEE Trans. on Signal Processing, 2005, to appear. [LS05b] J.A. Lopez-Salcedo, G. Vazquez, “Low-SNR subspace-compressed approach to waveform estimation in digital communications”, IEEE Trans. on Signal Processing, May 2005, submitted. [Lue84] D. Luenberger, Linear and Nonlinear Programming, Addison Wesley, Massachusetts, 1984. [Mag98] J.R. Magnus, H. Neudecker, Matrix Differential Calculus with Applications in Statistics and Econometrics, John Wiley and Sons, England, 1998. [Mar97] T.L. Marzetta, “Computing the Barankin bound, by solving an unconstrained quadratic optimization problem”, Proc. of the IEEE Int. Conf. on Accoustics, Speech and Signal Processing (ICASSP), pags. 3829—3832, Munich (Germany), April 1997. [McW93] L.T. McWhorter, L.L. Scharf, “Properties of quadratic covariance bounds”, Proc. of the 27th Asilomar Conference on Signals, Systems and Computers, Oct. 1993. [Men95] U. 
Mengali, M. Morelli, “Decomposition of M-ary CPM signals into PAM waveforms”, IEEE Trans. on Information Theory, Vol. 41, pags. 1265—1275, Sept. 1995. BIBLIOGRAPHY 259 [Men97] U. Mengali, A. D’Andrea, Synchronization Techniques for Digital Receivers, Plenum Press, 1997. [Mer00] R. van der Merwe, N. de Freitas, A. Doucet, E. Wan, “The unscented particle filter”, Technical Report CUED/F-INFENG/TR 380, Cambridge University Engineering Department, Aug. 2000. [Mer01] R. van der Merwe, N. de Freitas, A. Doucet, E. Wan, “The unscented particle filter”, Advances in Neural Information Processing Systems, Vol. 13, Nov. 2001. [Mer03] R. van der. Merwe, E.A.. Wan, “Sigma-point Kalman filters for probabilistic inference in dynamic state-space models”, Proc. of the Workshop on Advances in Machine Learning, Montreal, Canada, Jun. 2003. [Mes79] D.G. Messerschmitt, “Frequency detectors for PLL acquisition in timing and carrier recovery”, IEEE Trans. on Communications, Vol. 27, pags. 1288—1295, Sep. 1979. [Mes02] X. Mestre, Space Processing and Channel Estimation: Performance Analysis and Asymptotic Results, PhD Thesis, Dept. of Signal Theory and Communications, Technical University of Catalonia (Spain), Nov. 2002. [Mey90] H. Meyr, G. Ascheid, Synchronization in Digital Communications. Phase-, FrequencyLocked Loops, and Amplitude Control, John Wiley and Sons, 1990. [Mie00] B. Mielczarek, Synchronization in Turbo Coded Systems, Licentiate thesis, Dept. of Signals and Systems, Chalmers University of Technology (Sweden), April 2000, chalmers technical report 342L. [Mit97] T.M. Mitchell, Machine Learning, McGraw-Hill, 1997. [Moe92] M. Moeneclaey, “Overview of digital algorithms for carrier frequency synchronization”, Proc. of the Int. European Space Agency Conf. on Digital Signal Processing for Space Communications, pags. 1.1—1.7, Sep. 1992. [Moe94] M. Moeneclaey, G. 
de Jonghe, “ML-oriented NDA carrier synchronization for general rotationally symmetric signal constellations”, IEEE Trans. on Communications, Vol. 42, pags. 2531—2533, Aug. 1994. [Moe98] M. Moeneclaey, “On the true and modified Cramer-Rao bounds for the estimation of a scalar parameter in the presence of nuisance parameters”, IEEE Trans. on Communications, Vol. 46, pags. 1536—1544, Nov. 1998. [Mog03] P.P Moghaddam, H. So, R.L. Kirlin, “A new time-delay estimation in multipath”, IEEE Trans. on Signal Processing, May 2003. BIBLIOGRAPHY 260 [Mor00] M. Morelli, G.M. Vitetta, “Joint phase and timing recovery for MSK-type signals”, IEEE Trans. on Communications, Vol. 48, pags. 1997—1999, Dec. 2000. [Mou95] E. Moulines, P. Duhamel, J. Cardoso, S. Mayrargue, “Subspace methods for blind identification of multichannel FIR filters”, IEEE Trans. on Signal Processing, Vol. 43, pags. 516—526, Feb. 1995. [Nik92] C. Nikias, “Blind deconvolution using higher-order statistics”, Proc. of the Int. 2nd Int. Conf. of Higher-Order Statistics (Elsevier), pags. 49—56, 1992. [Noe03] N. Noels, C. Herzet, A. Dejonghe, V. Lottici, H. Steendam, M. Moeneclaey, M Luise, L. Vandendorpe, “Turbo synchronization: an EM algorithm interpretation”, Proc. of the IEEE Int. Conf. on Communications (ICC), 2003. [Oer88] M. Oerder, H. Meyr, “Digital filter and square timing recovery”, IEEE Trans. on Communications, Vol. 36, pags. 605—612, May 1988. [Ott92] B. Ottersten, M. Viberg, T. Kailath, “Analysis of subspace fitting and ML techniques for parameter estimation from sensor array data”, IEEE Trans. on Signal Processing, Vol. 40, pags. 590—600, Mar. 1992. [Ott93] B. Ottersten, M. Viberg, P. Stoica, Radar Array Processing, chap. Exact and Large Sample Maximum Likelihood Techniques for Parameter Estimation and Detection, Springer-Verlag, 1993. [Pic87] G. Picci, G. Prati, “Blind equalization and carrier recovery using a stop-and-go decision directed algorithm”, IEEE Trans. on Communications, Vol. 
35, no 9, pags. 877— 887, Sept. 1987. [Pic94] B. Picinbono, “On circularity”, IEEE Trans. on Signal Processing, Vol. 42, pags. 3473— 3482, Dec. 1994. [Pic95] B. Picinbono, P. Chevalier, “Widely linear estimation with complex data”, IEEE Trans. on Signal Processing, Vol. 43, pags. 2030—2033, Aug. 1995. [Pic96] B. Picinbono, “Second-order complex random vectors and normal distributions”, IEEE Trans. on Signal Processing, Vol. 44, pags. 2637—2640, Jul. 1996. [Pis73] V.F. Pisarenko, “The retrieval of harmonics from a covariance function”, Journal of the Royal Astronomy Society, Vol. 33, pags. 347—366, 1973. [Pol95] A. Polydoros, R. Raheli, C.K. Tzou, “Per-Survivor-Processing: A general approach to MLSE in uncertain enviroments”, IEEE Trans. on Communications, Vol. 43, pags. 354—364, Feb./March/April 1995. BIBLIOGRAPHY 261 [Pro95] J.G. Proakis, Digital Communications, McGraw-Hill, 1995. [Rib94] J. Riba, G. Vazquez, “Bayesian recursive estimation of frequency and timing exploiting the cyclostationary property”, EURASIP Signal Processing, Vol. 40, pags. 21—37, Oct. 1994. [Rib96] J. Riba, J. Goldberg, G. Vazquez, “Signal selective DOA tracking for multiple moving targets”, Proc. of IEEE Int. Conf. on Accoustics, Speech and Signal Processing, pags. 2559—2562, May 1996. [Rib97] J. Riba, Procesado de Senal Bayesiano en Estimaci´ on Conjunta de Frequencia y Tiempo de Llegada, PhD Thesis, Dept. of Signal Theory and Communications, Technical University of Catalonia (Spain), Feb. 1997. [Rib01a] J. Riba, “Parameter estimation of binary CPM signals”, Proc. of the IEEE Int. Conf. on Accoustics, Speech and Signal Processing (ICASSP), Salt Lake City (Utah, USA), 2001. [Rib01b] J. Riba, J. Sala, G. Vazquez, “Conditional maximum likelihood timing recovery: Estimators and bounds”, IEEE Trans. on Signal Processing, Vol. 49, pags. 835—850, April 2001. [Rib02] J. Riba, A. Urruela, “A robust multipath mitigation technique for time-of-arrival estimation”, Proc. 
of the IEEE Vehicular Technology Conference (VTC), pags. 2263— 2267, Sep. 2002. [Rif74] D.C. Rife, R.R. Boorstyn, “Single-tone parameter estimation from discrete-time observations”, IEEE Trans. on Information Theory, Vol. 20, pags. 378—392, Sep. 1974. [Rif75] D.C. Rife, M. Goldstein, R.R. Boorstyn, “A unification of Cram´er-Rao type bounds”, IEEE Trans. on Information Theory, Vol. 21, no 3, pags. 330—332, May 1975. [Roy89] R. Roy, T. Kailath, “ESPRIT - estimation of signal parameters via rotational invariance techniques”, IEEE Trans. on Accoustics, Speech and Signal Processing, Vol. 37, no 7, pags. 984—995, Jul. 1989. [Sal97] J. Sala, G. Vazquez, “Statistical reference criteria for adaptive signal processing in digital communications”, Proc. of the IEEE , Vol. 45, no 1, pags. 14—31, Jan. 1997. [Sar88] H. Sari, S. Moridi, “New phase and frequency detectors for carrier recovery in PSK and QAM systems”, IEEE Trans. on Communications, Vol. 36, pags. 1035—1043, Sep. 1988. BIBLIOGRAPHY 262 [Sat75] Y. Sato, “A method of self-recovering equalization for multilevel amplitudemodulation”, IEEE Trans. on Communications, Vol. 23, no 6, pags. 679—682, Jun. 1975. [Sch79] R.O. Schmidt, “Multiple emitter location and signal parameter estimation”, Proc. of RADC Spectral Estimation Workshop, pags. 243—258, 1979. [Sch89] S.V. Schell, R.A. Calabretta, W.A. Gardner, B.G. Agee, “Cyclic MUSIC algorithms for signal-selective direction finding”, Proc. of the IEEE Int. Conf. on Accoustics, Speech and Signal Processing (ICASSP), pags. 2278—2281, Glasgow (Scotland), May 1989. [Sch91a] L.L. Scharf, Statistical Signal Processing. Detection, Estimation, and Time Analysis, Addison Wesley, 1991. [Sch91b] L.L. Scharf, “The SVD and reduced-rank signal processing”, R.J. Vaccaro (ed.), SVD and Signal Processing, II: Algorithms, Analysis and Applications, pags. 3—31, Elsevier Science Publishers B.V. (North-Holland), 1991. [Sch94] S.V. Schell, D.L. Smith, S. 
Roy, “Blind channel identification using subchannel response matching”, Proc. of the Int. 26th Conf. Information Sciences and Systems, Princeton (NJ), Mar. 1994. [Sch03] P.J. Schreier, L.L. Scharf, “Second-order analysis of improper complex random vectors and processes”, IEEE Trans. on Signal Processing, Vol. 51, no 3, pags. 714—725, Mar. 2003. [Sec00] G. Seco, Antenna Arrays for Multipath and Interference Mitigation in GNSS Receivers, PhD Thesis, Dept. of Signal Theory and Communications, Technical University of Catalunya (Spain), Jul. 2000. [Ser01] E. Serpedin, P. Ciblat, G.B. Giannakis, P. Loubaton, “Performance analysis of blind carrier phase estimators for general QAM constellations”, IEEE Trans. on Signal Processing, Vol. 49, pags. 1816—1823, Aug. 2001. [Sha90] O. Shalvi, E. Weinstein, “New criteria for blind deconvolution of nonminimum phase systems (channels)”, IEEE Trans. on Information Theory, Vol. 36, no 2, pags. 312— 320, Mar. 1990. [Sic92] G.L. Sicuranza, “Quadratics filters for signal processing”, Proceedings of the IEEE , Vol. 80, 1992. [S¨ od89] T. S¨oderstr¨om, P. Stoica, System Identification, Prentice Hall, London, 1989. BIBLIOGRAPHY [Ste01] 263 H. Steendam, M. Moeneclaey, “Low-SNR limit of the Cramer-Rao bound for estimating the time delay of a PSK, QAM, or PAM waveform”, IEEE Communications Letters, Vol. 5, no 1, pags. 31—33, Jan. 2001. [Sto89] P. Stoica, A. Nehorai, “MUSIC, Maximum Likelihood, and Cram´er-Rao Bound”, IEEE Trans. on Acoustics, Speech and Signal Processing, Vol. 37, no 5, pags. 720— 741, May 1989. [Sto90a] P. Stoica, A. Nehorai, “Performance study of conditional and unconditional directionof-arrival estimation”, IEEE Trans. on Accoustics and Signal Processing, Vol. 38, no 10, pags. 1783—1795, Oct. 1990. [Sto90b] P. Stoica, K. Sharman, “Maximum likelihood methods for direction-of-arrival estimation”, IEEE Trans. on Accoustics, Speech and Signal Processing, Vol. 38, no 7, pags. 1132—1143, Jul. 1990. [Sto97] P. Stoica, R.L. 
Moses, Introduction to Spectral Analysis, Prentice Hall, 1997. [Sto01] P. Stoica, T. Marzetta, “Parameter estimation problems with singular information matrices”, IEEE Trans. on Signal Processing, Vol. 49, pags. 87—90, Jan. 2001. [Ton91] L. Tong, G. Xu, T. Kailath, “A new approach to blind identification and equalization of multipath channels”, Proc. of the 25th Asilomar Conference, pags. 856—860, Pacific Grove (CA), Nov. 1991. [Ton94] L. Tong, G. xu, T. Kailath, “Blind channel identification and equalization based on second-order statistics: A time domain approach”, IEEE Trans. on Information Theory, Vol. 40, no 2, pags. 340—349, Mar. 1994. [Ton95] L. Tong, G. xu, B. Hassibi, T. Kailath, “Blind channel identification based on secondorder statistics: A frequency-domain approach”, IEEE Trans. on Information Theory, Vol. 41, no 1, pags. 329—334, Jan. 1995. [Tre68] H.L. Van Trees, Detection, Estimation and Modulation Theory. Part I , Wiley, New York, 1968. [Tre83] J.R. Treichler, B.G. Agee, “A new approach to multipath correction of constant modulus signals”, IEEE Trans. on Accoustics, Speech and Signal Processing, Vol. 31, no 2, pags. 459—472, Apr. 1983. [Tug95] J.K. Tugnait, “On blind identifiability of multipath channels using fractional sampling and second-order cyclostationary statistics”, IEEE Trans. on Information Theory, Vol. 41, no 1, pags. 308—311, Jan. 1995. BIBLIOGRAPHY 264 [Tul00] A.M. Tulino, S. Verdu, “Improved linear receivers for BPSK-CDMA subject to fading”, Proc. of Allerton Conf. on Communications, Control and Computation, pags. 11—21, Monticello, IL, Oct. 2000. [Ung76] G. Ungerboeck, “Fractional tap-spacing equalizer and consequences for clock recovery in data modems”, IEEE Trans. on Communications, Vol. 24, no 8, pags. 856—864, Aug. 1976. [Vaz00] G. Vazquez, J. Riba, Signal Processing Advances in Wireless Communications. Trends in Single and Multi-user Systems, Vol. II, chap. Non-Data-Aided Digital Synchronization, pags. 
357—402, Prentice-Hall, 2000, Editors: G.B. Giannakis, Y. Hua, P. Stoica and L. Tong. [Vaz01] G. Vazquez, J. Villares, “Optimal quadratic NDA synchronization”, Proc. of the 7th Int. European Spacial Agency Conf. on Digital Signal Processing for Space Communications, Lisbon, Portugal, Sept. 2001. [Vib91] M. Viberg, B. Ottersten, T. Kailath, “Detection and estimation in sensor arrays using weighted subspace fitting”, IEEE Transactions on Signal Processing, Vol. 39, no 11, pags. 2435—2449, Nov. 1991. [Vib95] M. Viberg, A. Nehorai B. Ottersten, “Performance analysis of direction finding with large arrays and finite data”, IEEE Trans. on Signal Processing, Vol. 43, no 2, pags. 469—477, Feb. 1995. [Vil01a] J. Villares, G. Vazquez, “Best quadratic unbiased estimator (BQUE) for timing and frequency synchronization”, Proc. of the IEEE Statistical Signal Processing Workshop, Singapore, Aug. 2001. [Vil01b] J. Villares, G. Vazquez, J. Riba, “Fourth order non data aided synchronization”, Proc. of the IEEE Int. Conf. on Accoustics and Signal Processing (ICASSP), May 2001. [Vil02a] J. Villares, G. Vazquez, “Optimal quadratic non-assisted parameter estimation for digital synchronisation”, Proc. of Int. Zurich Seminar on Broadband Communications (IZS), Zurich (Switzerland), Feb. 2002. [Vil02b] J. Villares, G. Vazquez, “Sample covariance matrix based parameter estimation for digital synchronization”, Proc. of the IEEE Global Communications Conference 2002 (Globecom), Taipei (Taiwan), Nov. 2002. [Vil03a] J. Villares, G. Vazquez, “Sample covariance matrix parameter estimation: Carrier frequency, a case study”, Proc. of the IEEE Int. Conf. on Accoustics and Signal Processing (ICASSP), pags. Vol.6, 725—728, Hong Kong (China), Apr. 2003. BIBLIOGRAPHY [Vil03b] 265 J. Villares, G. Vazquez, “Second-order DOA estimation from digitally modulated signals”, Proc. of the Int. 37th Asilomar Conf. Signals, Systems and Computers, Pacific Grove (USA), Nov. 2003. [Vil03c] J. Villares, G. Vazquez, M. 
Lamarca, “Maximum likelihood blind carrier synchronization in space-time coded OFDM systems”, Proc. of the IEEE Signal Processing Workshop on Signal Processing Advances in Wireless Communications (SPAWC), Rome (Italy), Jun. 2003. [Vil04a] J. Villares, G. Vazquez, “On the quadratic extended Kalman filter”, Proc. of the Third IEEE Sensor Array and Multichannel Signal Processing Workshop (SAM), Barcelona (Spain), Jul. 2004. [Vil04b] J. Villares, G. Vazquez, “Self-noise free second-order carrier phase synchronization of MSK-type signals”, Proc. of the IEEE Int. Conf. on Communications (ICC), Paris (France), Jun. 2004. [Vil05] J. Villares, G. Vazquez, “Second-order parameter estimation”, IEEE Transactions on Signal Processing, Jul. 2005. [Wan00] E.A. Wan, R. van der Merwe, “The unscented kalman filter for nonlinear estimation”, Proc. of IEEE Symposium 2000 (AS-SPCC), Lake Louise, Alberta, Canada, Oct. 2000. [Wei83] A.J. Weiss, E. Weinstein, “Fundamentals limitations in passive time delay estimation - part I: Narrow-band systems”, IEEE Trans. on Accoustics, Speech, and Signal Processing, Vol. 31, no 2, pags. 472—486, Apr. 1983. [Wei84] E. Weinstein, A.J. Weiss, “Fundamentals limitations in passive time delay estimation - part II: Wide-band systems”, IEEE Trans. on Accoustics, Speech, and Signal Processing, Vol. 32, no 5, pags. 1064—1078, Oct. 1984. [Wei85] A.J. Weiss, E. Weinstein, “A lower bound on the mean square error in random parameter estimation”, IEEE Trans. on Information Theory, Vol. 31, no 5, pags. 680—682, Sept. 1985. [Wei88a] E. Weinstein, “Relations between Bellini-Tartara, Chazan-Zakai-Ziv, and Waz-Ziv lower bounds”, IEEE Trans. on Information Theory, Vol. 34, pags. 342—343, Mar. 1988. [Wei88b] E. Weinstein, A.J. Weiss, “A general class of lower bounds in parameter estimation”, IEEE Trans. on Information Theory, Vol. 34, pags. 338—342, Mar. 1988. BIBLIOGRAPHY 266 [Win00] J. Winter, C. 
Wengerter, “High resolution estimation of the time of arrival for GSM location”, Proc. of the IEEE Vehicular Technology Conference (VTC), pags. 1343— 1347, May 2000. [Xu92] G. Xu, T. Kailath, “Direction-of-arrival estimation via exploitation of cyclostationarity - a combination of temporal and spatial processing”, IEEE Transactions on Signal Processing, Vol. 7, no 40, pags. 1775—1786, Jul. 1992. [Zei93] A. Zeira, P.M. Schultheiss, “Realizable lower bounds for time delay estimation”, IEEE Trans. on Signal Processing, Vol. 41, no 11, pags. 3102—3113, Nov. 1993. [Zei94] A. Zeira, P.M. Schultheiss, “Realizable lower bounds for time delay estimation: Part 2 - threshold phenomena”, IEEE Trans. on Signal Processing, Vol. 42, no 5, pags. 1001— 1007, May 1994. [Zen97a] H.H. Zeng, L. Tong, “Blind channel estimation using the second-order statistics: Algorithms”, IEEE Trans. on Signal Processing, Vol. 48, no 8, pags. 1919—1930, Aug. 1997. [Zen97b] H.H. Zeng, L. Tong, “Blind channel estimation using the second-order statistics: Asymptotic performance and limitantions”, IEEE Trans. on Signal Processing, Vol. 48, no 8, pags. 2060—2071, Aug. 1997. [Ziv69] J. Ziv, M. Zakai, “Some lower bounds on signal parameter estimation”, IEEE Trans. on Information Theory, Vol. 15, no 3, pags. 386—391, May 1969.