Diss. ETH No. 10980 Sample-Rate Conversion: Algorithms and VLSI Implementation A dissertation submitted to the SWISS FEDERAL INSTITUTE OF TECHNOLOGY ZURICH for the degree of Doctor of technical sciences presented by FRITZ MARKUS ROTHACHER Dipl. El.-Ing. born 24. 8. 1965 citizen of Blumenstein BE accepted on the recommendation of Prof. Dr. W. Fichtner, examiner Prof. Dr. W. Guggenb¨uhl, co-examiner 1995 Acknowledgements I would like to thank my adviser, Prof. W. Fichtner, for his confidence in me and my work and for establishing a generous working environment. I enjoyed working at the Integrated Systems Laboratory over the last four years. I am grateful to my associate advisor, Prof. W. Guggenb¨uhl, for reading and commenting on my thesis. I want to thank Gunnar Lehtinen, Andreas Steiner, Dominique M¨uller, and Christian Siegrist for their effort in implementing the sample-rate converter during their semester and diploma theses. I would also like to thank the staff of the Integrated Systems Lab: All secretaries for handling the administrative work, Hanspeter Mathys and Hansj¨org Gisler for maintaining the logistics, the system administrators Christoph Wicki and Adam Feigin who fixed all computer-related problems, and Hubert Kaeslin and Andreas Wieland of the Microelectronics Design Center for their support concerning VLSI design methodology and tools. My special thanks go to Norbert Felber, who contributed many valuable ideas and support in the hardware lab. He found always time and was interested to discuss digital signal processing and related topics, and offered competent advice. I owe most of my knowledge of real-world measurement techniques to him. Tom Heynemann, Robert Rogenmoser, Hubert Kaeslin, and Nobert Felber improved the quality of this text by their constructive proofreading. This work would not have been possible without the support and encouragement of many people outside the ETH. In particular I would like to thank my parents for their tolerance and support. i Contents Acknowledgements i Abstract ix Zusammenfassung xi 1 Introduction 1 ::::::::: Concept : : : : : : : : : : Related Work : : : : : : : Structure of the Thesis : : 1.1 Motivation 1.2 1.3 1.4 : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 2 Principle in Time and Frequency Domain :::::::::: Interpolation : : : : : : : : : : : : : : : : : : : Decimation : : : : : : : : : : : : : : : : : : : Sample-rate Conversion for Fixed Ratios : : Sample-rate Conversion for Rational M/L : : 2.1 Test Signals and Sampling 2.2 2.3 2.4 2.5 1 2 4 5 7 : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 8 10 11 12 15 iii ::::::::::::::::::: 2.5.2 Downmode : : : : : : : : : : : : : : : : : 2.5.3 Continuous Mode : : : : : : : : : : : : : Sample-rate Conversion for Any Arbitrary Ratio : 2.5.1 Upmode 2.6 : : : : : : : : : : : : : : : : 3 High-Order Digital Filters : 3.1.1 Transition Band : : : : : : 3.1.2 Stopband : : : : : : : : : Realization Considerations : : : 3.2 3.3 3.4 3.5 16 18 19 23 : : : : FIR Filter and Hold Operation : : : : : 3.2.1 : 3.2.2 FIR Filter and Lagrange Interpolation : 3.2.3 Alternatives : : : : : : : : : : : : : : : : Required Resources : : : : : : : : : : : : : : : Synthesis of High-Order FIR Filters : : : : : : : 3.4.1 Specifications : : : : : : : : : : : : : : : 3.4.2 Design Technique : : : : : : : : : : : : 3.4.3 Direct Synthesis : : : : : : : : : : : : : 3.4.4 Prototype Method : : : : : : : : : : : : 3.4.5 Relaxed Specifications : : : : : : : : : Quantization of Filter Coefficients : : : : : : : : 3.5.1 Quantization of FIR Coefficients : : : : 3.5.2 Quantization of Interpolated Coefficients : 3.1 Required Filter Characteristics 15 : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 23 24 24 26 26 27 34 36 38 38 39 40 40 43 45 45 47 4 Frequency Tracking 51 ::::::::::: Principle : : : : : : : : : : : : : : Frequency Counter : : : : : : : : Digital Phase Locked Loop : : : 4.4.1 Phase Detector : : : : : : 4.4.2 Loop Filter : : : : : : : : Non-uniform Resampling : : : : Performance Comparison : : : : 4.6.1 Frequency Counter : : : 4.6.2 Digital PLL : : : : : : : : 4.1 Requirements 4.2 4.3 4.4 4.5 4.6 : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 5 Implementation ::::: 5.1.1 Frequency Counter : 5.1.2 Digital PLL : : : : : : Datapath for FIR Filter : : : : 5.2 5.3 5.4 52 53 55 56 57 58 62 62 65 67 : : : : 5.2.1 Review of Operations : 5.2.2 Finite Word-Length in Datapath : DSP Realization Considerations : : : : ASIC Realization : : : : : : : : : : : : : 5.4.1 Implementation : : : : : : : : : : 5.4.2 Discussion : : : : : : : : : : : : 5.1 Frequency Tracking 51 : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 68 68 69 71 71 73 75 77 77 79 6 Measurement Principle & Setup 83 ::::::::::::::::::: Analyzer : : : : : : : : : : : : : : : : : : : : 6.2.1 Frequency Matching for the FFT : : 6.2.2 Windowing for the FFT : : : : : : : 6.2.3 Complex Windows : : : : : : : : : : Measurement Procedure : : : : : : : : : : 6.1 Generator 6.2 6.3 : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 7 Results : 7.1.2 Digital PLL : : : : : : System Measurements : : : Discussion : : : : : : : : : : 7.1.1 Frequency Counter 7.3 85 86 86 89 89 93 7.1 Frequency Tracking Measurements 7.2 83 : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 94 95 95 98 103 8 Conclusions 107 A List of Symbols and Integrals 111 B Error Caused by Hold Effect 113 ::::::::::::::::::::::::: White Noise : : : : : : : : : : : : : : : : : : : : : : : : : Pink Noise : : : : : : : : : : : : : : : : : : : : : : : : : B.1 Sine Wave 113 B.2 114 B.3 C Error Caused by Linear Interpolation C.1 Sine Wave ::::::::::::::::::::::::: 114 117 117 C.2 White Noise C.3 Pink Noise ::::::::::::::::::::::::: ::::::::::::::::::::::::: 118 118 D Performance Measurements 121 List of Figures 146 List of Tables 147 Curriculum Vitae 149 Bibliography 154 Abstract Many modern signal processing tasks are performed in the digital domain. Various sample rates are used depending on the required signal quality and the available bandwidth. Sample-rate conversion is therefore inevitable to interface systems with different sample rates. In this thesis the concept of digital sample-rate conversion by integer ratios is extended to arbitrary ratios and the trade-offs of various design parameters for audio applications are discussed. Conceptually, the source samples are interpolated to a heavily oversampled intermediate signal using a high-order digital filter. To allow a conversion between arbitrary sample rates this intermediate signal is transformed into a pseudo-analog representation by holding the value in-between the interpolated samples. The resulting continuous-time signal is then resampled at the sink sample rate. For an infinite interpolation factor, ideal filters, and jitter-free sampling clocks digital sample-rate conversion is lossless. In this thesis the error contributions of the various non-idealities are quantified. The synthesis of high-order (> 20000) narrow-band lowpass filters and their quantization properties are discussed. A digital PLL is presented, which allows to measure precisely the sink sampling moment relative to the source samples, while attenuating jitter of the sampling clocks. The effects of the non-uniform sampling due to clock jitter are treated. An ASIC for fully digital sample-rate conversion has been developed, fabricated, and characterized using an all-digital measurement system. A new (complex) window technique for the FFT is introduced, which minimizes the window side effects when doing spectrum analysis of multirate discrete-time signals. Based on the results of this thesis, the design parameters for a converter with full 20-bit quality are proposed. ix Zusammenfassung Viele moderne Signalverarbeitungsaufgaben werden im digitalen Bereich gel¨ost. Abh¨angig von der geforderten Signalqualit¨at und der verf¨ugbaren Bandbreite werden unterschiedliche Abtastraten verwendet. Abtastratenwandlung ist daher unumg¨anglich, um Systeme mit unterschiedlichen Abtastraten zu verbinden. In dieser Doktorarbeit werden die Konzepte f¨ur die digitale Abtastratenwandlung von ganzzahligen zu beliebigen Verh¨altnissen erweitert. In Abh¨angigkeit von verschiedenen Parametern wird die Qualit¨at und der Aufwand bez¨uglich Audioanwendungen untersucht. Vom Prinzip her werden die Eingangswerte zu einem stark u¨ berabgetasteten Zwischensignal interpoliert unter Verwendung eines digitalen Filters hoher Ordnung. Um eine Wandlung zwischen beliebigen Abtastraten zu erm¨oglichen, wird dieses Zwischensignal in eine pseudo-analoge Darstellung gewandelt. Dazu wird der Wert des Zwischensignals zwischen zwei Abtastwerten gehalten. Das resultierende zeitkontinuierliche Signal wird danach mit dem Ausgangstakt neu abgetastet. F¨ur einen unendlich grossen Interpolationsfaktor, ideale Filter und jitterfreie Taktsignale ist die digitale Abtastratenwandlung verlustlos. In dieser Doktorarbeit werden die Fehleranteile von verschiedenen Nichtidealit¨aten abgesch¨atzt. Die Synthese von schmalbandigen Tiefpassfiltern hoher Ordnung (> 20000) und deren Eigenschaften f¨ur begrenzte Wortl¨angen werden behandelt. Ein digitaler PLL wird vorgestellt, welcher die Abtastzeitpunkte am Ausgang relativ zum Eingangstakt sehr genau misst und gleichzeitig Jitter der Taktsignale d¨ampft. Die durch Taktjitter verursachten Auswirkungen von nicht a¨ quidistanten Abtastzeitpunkten werden ebenso besprochen. Ein ASIC f¨ur ausschliesslich digitale Abtastratenwandlung wurde entwickelt, hergestellt und mit einem vollst¨andig digitalen Messystem ausgemessen. F¨ur die Spektralanalyse von zeitdiskreten Signalen mit mehreren Abtastraten xi wird eine neue (komplexe) Fenstertechnik f¨ur die FFT verwendet, welche die Nebeneffekte des Fensters auf ein Minimum reduziert. Ausgehend von den Resultaten dieser Doktorarbeit werden die Parameter f¨ur einen Wandler mit voller 20-bit Qualit¨at vorgeschlagen. 1 Introduction 1.1 Motivation Modern signal processing problems are often solved in the digital domain due to the availability of powerful VLSI circuits which allow to perform complex operations in real-time, without the well-known shortcomings of analog implementations. The source signal is transformed into the digital domain by an A/D converter. All data processing, e.g. filtering, shaping, mixing etc., is done in the digital domain and only the final result is converted back to analog. To overcome the degradation caused by successive A/D-D/A conversion, all processing blocks must have digital interfaces. Depending on the available bandwidth of the channel, the required quality, and the data rate of the interfaces a wide variety of sample rates are used: e.g. 48 kHz for professional digital audio systems, 44.1 kHz for consumer digital audio (CD), 32 kHz for digital satellite radio (DSR), or 44.056 kHz for video compatibility (VCR). The incorporation of all these systems is however trouble-free, if a sample-rate converter is used at each interface. Sample-rate conversion is also used in the growing field of digital audio compression systems. Downsampling followed by data compression allows transmission of a high-quality audio signal over a 64 kbit/s ISDN telephone line [dGL94]. 1 From source to sink, signals are repeatedly converted for editing, storage, and transmission. To retain the signal quality of the original recording – even after several cascaded converters – the conversion must be virtually transparent. 1.2 Concept Many digital signal processing applications necessitate a conversion of the sample rate at the interface to match it to the following system. Conceptually, sample-rate conversion by non-rational ratios is accomplished in the digital domain by interpolating the incoming samples to a high enough sampling frequency (usually in the GHz range) and subsequently decimating this oversampled data to the target sample rate by choosing the appropriate samples. Practically the problem will be partitioned into a frequency tracking unit, which determines precisely the sink phase relative to the source samples, and into a high-order digital interpolation/decimation filter. This filter limits the signal either to half the source or to half the sink sample rate to avoid aliasing. Therefore two different modes of operation result depending on whether the source or the sink sample rate is larger. Research activities of our laboratory demonstrated a way to overcome the need for two separate modes for up- and downsampling: The cutoff frequency of the interpolation/decimation filter is dynamically adjusted – according to the ratio of sink and source sample rate – by using the frequency scaling property of the Fourier transform. We call this capability for a smooth transition between up- and downsampling CONTINUOUS MODE [RW90]. Based on the CONTINUOUS MODE we proposed and realized a VLSI architecture of a samplerate converter for arbitrary ratios [LS91]. We will refer to this implementation as SARCO (SAmple-Rate COnverter). For the interpolation/decimation filter, the precise ratio of the source and the sink sample rates must be measured. If both sample rates are synchronous i.e. derived from the same master clock and if this master clock is available, both rates can be measured by a counter running with the master clock. If the master clock is not available, it can be recovered by means of a phase-locked loop (PLL), which additionally suppresses clock jitter. A buffer (FIFO) is then used for the data exchange. However, non-synchronous interfaces require a new solution. In general three cases can be distinguished: A. Synchronous Interfaces B. Plesiochronous Interfaces C. Asynchronous Interfaces Synchronous interfaces are the easiest to handle (as shown above), but real-world problems are often plesiochronous or asynchronous. In the plesiochronous case two systems may have the same nominal sample rates, but due to tolerances or temperature drift of the two independent oscillators, the two frequencies are not phase-locked, but differ by a small, varying amount e.g. in the range of 100 ppm. Asynchronous interfaces apply for examples where sources of different independent sample rates are to be mixed or equipment with arbitrary sample rates has to be connected. The frequency measurement can be realized accurately by several methods: 1. Measurement performed with a very high-frequency clock (GHz) 2. Averaging measurements performed with a medium-frequency clock 3. Determination of the precise sample phase by a (digital) phase-locked loop For dynamically changing sample rates an optimal trade-off between precise tracking and noise suppression must be found. High-frequency measurement allows to follow sample-rate changes immediately. Unfortunately jitter of the sampling frequencies is not suppressed in this case, but modulated onto the audio signal. The lowpass filters used for systems of type 2 and 3 remove high frequency noise at the cost of slower tracking. In this thesis the emphasis is put on a unified approach to the implementation of sample-rate conversion of audio signals with arbitrarily changing ratios. The effects of non-ideal filtering and non-uniform sampling are analyzed and their effect on the signal quality is discussed. Additionally, considerations on VLSI implementation are presented. This field of research involves a large number of digital signal processing topics, including high-order filter synthesis, digital phase-locked loops (DPLL), real-time filtering, time-varying and adaptive filters, polyphase structures, finite word-length effects, spectrum analysis, and VLSI architecture. 1.3 Related Work The simplest solution for sample-rate conversion in the digital domain is to repeat or omit a sample from time to time to match the sample rates. This method is especially applicable for plesiochronous sample rates, but yields only low quality results. In professional applications the problem of samplerate conversion by arbitrary ratios is often reduced to the synchronous case by equipping all external sources, such as CD players or DAT machines, with a synchronization input. The speed of the external source is then adapted according to the system master clock. The basic operations for synchronous sample-rate conversion (type A) from a signal processing point of view have been covered in [CR81]. [LK81, LPW82] extended this work to non-rational ratios,but distinguished two modes depending on whether the sink sample rate is larger or smaller than the source rate. Plesiochronous interfaces (type B) cannot be realized with this distinction without smooth switching between modes. A first hardware implementation has been realized following the above principles, which required some 400 integrated circuits per audio channel [Lag82]. In 1983 a theoretical exposition of many aspects of multirate digital signal processing appeared in [CR83]. It treats the advantages and disadvantages of FIR and IIR filters for interpolation and decimation by rational ratios in detail and gives many design examples. These realizations all have the disadvantage that they are not well suited for the CONTINUOUS MODE, but only for fixed sample-rate ratios. [CR83] also covered the realization of high-order filters by cascading several low-order stages, which are easier to realize. [Ram84] discussed how to reduce the storage requirements for the filter coefficients by using various interpolation methods. Most publications on sample-rate conversion do not cover in depth the need for proper frequency tracking although this is crucial for high quality audio applications. [LPW82] suggested the use of a moving average filter for frequency tracking. [Sti92] used a digital PLL running at a multiple of either the source or the sink sample rate. Frequency measurement with a GHz counter has not jet been realized due to the lack of jitter suppression and technology limitations. In addition to the prototype hardware implementation mentioned above, several sample-rate converters have been realized using general purpose digital signal processors. [PHR91] implemented a sample-rate converter on a DSP56001. The interpolation/decimation filter has been realized by multiplying a SINC function by the Blackman-Harris window function, as suggested by [Ram82]. [CDPS91] realized a sample-rate converter between 44.1 kHz and 48.0 kHz using B-splines. A two stage FIR interpolation each by a factor 2 is followed by a 6th order B-spline interpolation. However, these implementations are limited to fixed sample-rate ratios. The first commercial VLSI implementation of a sample-rate converter using the CONTINUOUS MODE has appeared recently [AK93]. A second commercial realization of a sample-rate converter by [JTG+ 94] follows an alternative approach, which will be discussed briefly. 1.4 Structure of the Thesis This thesis describes theory, experience, and results that have been gained during the development and test of an application specific integrated circuit (ASIC) for sample-rate conversion of audio signals with arbitrarily changing ratios. In particular the error contribution of various non-idealities are discussed and the parameters for a true 20-bit converter are derived. In Chapter 2 an overview of the basic principle for sample-rate conversion in the digital domain is given. The well-known concepts of interpolation and decimation by integer factors are applied to sample-rate conversion by rational ratios. The CONTINUOUS MODE already published by the author in [RW90], which extends the sample-rate conversion to arbitrary sample-rate ratios, is described in detail. Chapter 3 gives insight into the strategies for the design and synthesis of very high-order (> 20 000) narrow-band lowpass filters. Several one- and multistage implementations are compared in performance, implementation cost, and required computation power. An interpolation method which reduces the storage requirements for the filter coefficients is presented. Additionally, the effects of filter coefficient quantization on the transfer function is discussed. Chapter 4 compares several alternatives for the realization of the frequency tracking unit. The performance and cost of high-frequency measurement, averaged medium-frequency measurement, and a digital PLL are compared for synchronous, plesiochronous and asynchronous sampling frequencies. In addition the effects of non-uniform (jittered) sampling are discussed. Chapter 5 evaluates the system complexity and compares ASIC and DSP implementations according to hardware requirements and costs. In Chapter 6 the measurement principles for multirate digital audio systems and the hardware setup used for the measurements in this thesis are presented. A new window technique for the FFT is introduced, which allows to minimize the side effects of the window when doing spectrum analysis of multirate discrete-time signals. Chapter 7 compares the different concepts, architectures and implementations by measurements. 2 Principle in Time and Frequency Domain The obvious method to transfer an audio signal between unassociated sample rates is digital-to-analog (D/A) conversion followed by analog-to-digital (A/D) conversion (Fig. 2.1). D/A DA-ADconv.epsi f source 103 21 mm f source Reconstruction Filter A/D f sink Anti-aliasing Filter f sink Figure 2.1: D/A conversion followed by A/D conversion This method works for any ratio of sink sample rate fsink to source sample rate fsource , but is tainted with several limitations. First of all, to retain the large signal-to-noise ratio (S/N) of a digital audio signal expensive high-quality D/A and A/D converters have to be used. Secondly, the output signal of the D/A converter must be reconstructed by an analog filter of high precision with cutoff frequency fsource =2. Additionally, to fulfill the Nyquist theorem, the input signal of the A/D converter must be band-limited to fsink =2 by an anti-aliasing filter. If the sink sample rate fsink is larger than the source sample rate fsource 7 – we will refer to this case as UPMODE– the second filter can be omitted. In the opposite case (DOWNMODE) the first filter is not necessary. Therefore both lowpass filters can be merged into one with cutoff frequency fcutoff = fsource if f sink > fsource UPMODE 2 fsink otherwise DOWNMODE (2.1) 2 This reconstruction/anti-aliasing filter, which must be realized in the analog domain, needs a flat passband, a narrow transition band and a highly attenuated stopband. The required specifications for 16 bit quality make the problem hardly solvable. A third disadvantage of the D/A-A/D approach is – besides the need for high-quality D/A and A/D converters and the expensive analog filter – that any jitter on the sampling clocks will translate into signal distortion and thus needs to be suppressed by additional circuitry. We can avoid many of the above problems by performing the sample-rate conversion process in the digital domain. The A/D and D/A converters are omitted and the analog filters are replaced by digital ones. This concept will be explained in detail in the following sections. 2.1 Test Signals and Sampling The spectrum of the example signal that is used for illustrative purposes throughout this thesis is shown in Figure 2.2. X(f) Signal.epsi 49 31 mm 1 2 ⋅f sample f sample f Figure 2.2: Spectrum of example signal It is composed of three parts: a low-frequency sine wave, a high-frequency sine wave and a third component, which declines linearly with the frequency. This third component shall represent the spectrum of a typical audio signal, while the first two show the response for low- and high-frequency components. For noise calculations three sine waves at 1; 10; 20 kHz, white noise, and pink noise are used as test signals. White noise is a random signal, which has an equal amount of energy per Hertz of bandwidth. Pink noise however is random noise delivering an equal energy per octave [Ben88]. This distribution matches more precisely the sensitivity of the ear. Figure 2.3 shows the spectral power density of pink noise and white noise respectively. S(f) [dB/Hz] -20.8 Pink.epsi 101 33 mm -30.8 -40.8 -46.0 -50.8 20 200 2 000 20 000 f [Hz] Figure 2.3: Pink noise S (f ) = 61f , white noise S (f ) = ∆1f The well-known sampling theorem states that a continuous-time signal x(t), with a spectrum X(f) that is band-limited to fsample =2, can be uniquely and error-free reconstructed from its samples x[n]. The spectrum of these samples is composed of the original spectrum X(f), which repeats at all multiples of the sampling frequency (Fig. 2.4). If the analog signal x(t) contains components at frequencies higher than fsample =2 they will distort the sampled signal in the form of spectral fold-over (aliasing) and can not be recovered anymore. baseband Sampling.epsi 99 18 mm f sample 2⋅f sample 3⋅f sample 4⋅f sample 5⋅f sample Figure 2.4: Spectrum of a sampled signal with sample rate fsample A second parameter – besides the sampling frequency – which characterizes the sampled data is the word-length used to represent the data samples. The effects of finite word-length will be neglected in the following sections, but will be discussed in Sections 3.5 and 5.2.2. 2.2 Interpolation In this section the increase of the sample rate by an integer factor L is described. In the following we refer to this process as interpolation. f sample L⋅f sample Interpolation.epsi 99 37 mm HL L⋅f sample f sample Figure 2.5: Interpolation by a factor L If we increase the sample rate of a signal we can preserve the full signal content according to the sampling theorem. After the interpolation the signal spectrum repeats only at multiples of the new sample rate L fsample (Fig. 2.5). The interpolation is accomplished by inserting L-1 zeros between successive samples (zero-padding) and using an (ideal) lowpass filter with ( HL (f ) = L 0 f for 0 < f < sample 2 f for sample < f < Lfsample 2 2 (2.2) The multiplication of a signal with spectrum X (f ) with a transfer function H (f ) in frequency domain Y (f ) = H (f ) X (f ) corresponds to a convolution of the signal in the time domain y (t) = h(t) ? x(t). For the zero-padded discrete-time signal x[.] we get +1 X k y[m] = hL [m , k] x L ; k=,1 for k = L; 2L; 3L; : : : (2.3) If we substitute k by (m div L) L , n L we get y[m] = = +1 X n=,1 +1 X n=,1 hL m , (m div L) L + n L x [m div L , n] (2.4) hL [n L + m mod L] x [m div L , n] (2.5) where m div L is the integer portion of the division m=L , and m mod L denotes the remainder. For each output value y [m] we therefore multiply and accumulate n samples of the impulse response hL [:] by the corresponding input samples x[:]. Note that the samples of the impulse response hL[:] are equidistant with spacing L. 2.3 Decimation To decrease the sample rate by an integer factor M (decimation) we must first band-limit the signal to fsample =(2 M ) (Fig. 2.6) by the lowpass filter HM (Eq. 2.6) to comply with the sampling theorem and keep only every Mth sample. As a result, we loose all signal content above half the target sampling frequency fs =M . HM 1/M⋅f sample Decimation.epsi 104 38 mm 1/M⋅f sample Figure 2.6: Decimation by a factor M f sample f sample ( HM (f ) = 1 0 f for 0 < f < sample 2M f f sample for 2M < f < sample 2 (2.6) To get the decimated signal we start from an initial phase '0 , which can be chosen arbitrarily, keep every Mth sample and skip all other samples (Eq.2.7). There exist therefore M different sets of samples (depending on the initial phase '0 ), which all represent the same signal. y[m] = 2.4 +1 X n=,1 hM [m M + '0 , n] x [n] ; for '0 = 0; : : : ; M ,1 (2.7) Sample-rate Conversion for Fixed Ratios In the preceding section we have considered interpolation and decimation as two separate operations. To perform sample-rate conversion by a fixed ratio M=L these two operations have to be cascaded (Fig. 2.7). Note that the interpolation must precede the decimation to keep the maximum bandwidth of the source signal. L HM HL M Conversion.epsi 103 38 mm f source L⋅f source L⋅f source f sink L/M⋅f source = f sink Figure 2.7: Sample-rate conversion by a fixed ratio M=L The two lowpass filters can be merged into a single one with a cutoff frequency, which is half of the minimum of source and sink sample rate (Eq. 2.8). HLM (f ) = L 0 for 0 < f < min( fsource ; fsink 2 2 ) otherwise (2.8) The interpolation of the source signal by the factor L is succeeded by the decimation of factor M . Therefore, instead of calculating the interpolated signal at the rate L fsource and keeping only every Mth value, we can operate the interpolation/decimation filter at the rate L=M fsource , as indicated by the dotted line in Figure 2.7, but we must choose the correct set of filter coefficients for each sink sample. If we substitute m in Equation 2.5 by 'sink [m] = m M + '0 (Eq. 2.7) we obtain the formula for sample-rate conversion by fixed integer ratios y[m] = +1 X j =,1 hLM j L + 'sink [m] mod L x 'sink [m] div L , j (2.9) with 'sink [m] = m M + '0 ; mod Modulo division; div Integer division This is the all-digital equivalent to the D/A-A/D solution (Fig. 2.1, Eq. 2.1), if we use a infinitely large interpolation factor L. The theoretical concept for sample-rate conversion by fixed integer ratios has already been treated in [CR83]. But for an actual realization of Equations 2.8 and 2.9 the following problems are to be solved: 1. Equation 2.8 specifies a lowpass filter with a constant passband gain and a transition band of width zero. The impulse response of such a rectangular filter with cutoff frequency fcutoff and a sample rate L fsample is h[n] = f sin[2n Lcutoff fsource ] f 2n Lcutoff fsource = SINC [n 2 fcutoff L fsource ] (2.10) and extends from n = ,1 to n = +1. An actual realization can only be a high-order lowpass filter of finite length due to the causality principle. The synthesis problems for such high-order (lowpass) filters are discussed in Chapter 3. In the following we assume that the filter is of order Q L + 1 with a sufficiently large Q. 2. For a lossless sample-rate conversion we assumed an infinite stopband attenuation of the lowpass. A filter of finite length can only reach a finite stopband attenuation, however. Furthermore the quantization of the filter coefficients h[:] additionally alters the filter transfer function. In Chapter 3 we will evaluate the achievable stopband attenuation of the lowpass h[:] and introduce the concept of quasi-floating-point (QFP) notation for the filter coefficients. The effects of finite word-length computation in the datapath are discussed in Section 5.2. 3. The cutoff frequency of the lowpass (Eq. 2.8) is not constant for variable ratios M=L, but is defined by fcutoff = fsource if f sink > fsource UPMODE 2 fsink otherwise DOWNMODE (2.11) 2 [Pel82] distinguished two modes of operation depending on whether the sink sample rate is larger or smaller than the source rate. The same lowpass filter either runs with the source or the sink sample rate and thereby only one set of filter coefficients is needed. The disadvantage of this approach is that switching from one mode to the other requires muting of the output signal. The CONTINUOUS MODE to be presented in Section 2.5.3 overcomes this limitation. 4. For fixed sample rates M and L can be calculated in a straightforward manner, e.g. for fsource = 48 kHz and fsink = 44:1 kHz we get M = 160 and L = 147. On the other hand, if the two clock sources are not synchronous their ratio can be non-rational and may even vary over time. The sample-rate converter must be able to accept any ratio of sink and source sample rate. Furthermore it must suppress short-time jitter of the sampling clocks. The different frequency tracking principles for arbitrary ratios M=L are described in Chapter 4, whereas in Section 2.6 we will estimate the required interpolation factor L for professional audio quality. 2.5 Sample-rate Conversion for Rational M/L In Section 2.4 we treated the sample-rate conversion for fixed ratios and stated that the cutoff frequency of the interpolation/decimation lowpass must be either fsource =2 (if fsource < fsink ) or fsink =2 (Eq. 2.11). In this section we describe how the frequency scaling property of the Fourier transform can be used to adjust the cutoff frequency of the lowpass filter dynamically to allow a smooth transition between both modes. Adaptive Filter Rational.epsi 103 26 mm L f source M f sink L⋅f source Figure 2.8: Sample-rate conversion by a rational ratio M=L Note that since M=L is rational all operations can be performed on a discrete time-grid. The time unit is defined by the source sample rate as tunit = L f1 (2.12) source Using this time unit, the source sample period is L tunit and the sink sample period is accordingly M tunit . 2.5.1 Upmode If the sink sample rate is larger than the source sample rate we can preserve the full bandwidth of the source signal. If we assume filter h to be of order Q L + 1, using Equation 2.9, we get y[m] = Q + 2 X j =, Q2 2 z }| sink [ ] h j L+' 6 4 ' m] ∆ [ m 3 2 { 7 5 x ' mod L z 6 4 with 'sink [m] = m M + ptr[m] }| sink [ ] m '0 3 { div L ,j 5 (2.13) 7 Figure 2.9 gives a graphical representation of Equation 2.13. Because all mathematical operations in Equation 2.13 are integer operations, we will stay on the time-grid defined in Equation 2.12. ptr[m] source samples L ∆ϕ[m] ϕ0 sink samples ϕ sink M Upmode.epsi 104 52 mm [m] h [n] ∆ϕ[m] -L+∆ϕ[m] L+∆ϕ[m] -2⋅L+∆ϕ[m] Figure 2.9: UPMODE, fsink > fsource, L > M Starting from a measured initial phase '0 (on the grid) the phase of the sink samples 'sink [m] is incremented by M (time units) for every value y [m]. Each value of 'sink [m] determines both the phase of the sink sample relative to the last source sample ∆'[m] as well as its position ptr [m]. Depending on the phase difference ∆'[m] one particular subset of the samples of the impulse response h is taken. The samples of the impulse response have a spacing of L. Since the lowpass filter h is of order Q L + 1, L different subsets of Q + 1 values exist. 2.5.2 Downmode If the sink sample rate is smaller than the source sample rate we are not able to preserve the full frequency range of the source signal, but must limit it to fsink =2 by the lowpass filter h0 (Eq. 2.11). Q + 2 X j =, Q2 2 ' m] ∆ [ 3 2 z ptr[m] 3 }| { }| { z 6 7 6 0 h 4j L + 'sink [m] mod L5 x 4'sink [m] div L ,j 75 (2.14) In order to use the same lowpass filter coefficients as in the UPMODE (fcutoff = fsource =2) we use the frequency scaling property of the Fourier transform to alter the cutoff frequency from fsource =2 to fsink =2. fsink i h fsource fsource h0[i] = fsink L h L i = M M (2.15) If Equations 2.14 and 2.15 are combined we get for y [m] in the DOWNMODE 2 Q M L2 X 3 ' m] ∆ [ 2 3 ptr[m] z }| { }| { z L 6L 7 6 7 (j L + 'sink [m] mod L)5 x 4'sink [m] div L ,j 5 h 4 M j=, M Q M + L 2 with 'sink [m] = m M + '0 L ptr[m] source samples sink samples (2.16) ∆ϕ[m] ϕ0 M ϕ sink [m] Downmode.epsi 108 69 mm h’[n] h [n] ∆ϕD[m] -L2/M+∆ϕD[m] L2/M+∆ϕD[m] -2⋅L2/M+∆ϕD[m] Figure 2.10: DOWNMODE, fsink < fsource, L < M Figure 2.10 illustrates Equation 2.16. The phase 'sink [m] for each sink sample y [m] is calculated in the same way as in the UPMODE, but the filter coefficients must be scaled by L=M . The effective phase difference ∆'D [m] is therefore L=M ('sink [m] mod L) and the samples of the impulse response have a spacing of L2 =M . Since L=M determines only the cutoff frequency of the decimation filter we can alter it slightly so that L2 =M is an integer and stays on the time-grid defined in Equation 2.12. P The number of multiplication and accumulations ( x[:] h[:]) increases from Q + 1 in the UPMODE to Q M=L + 1 in the DOWNMODE. 2.5.3 Continuous Mode The CONTINUOUS MODE combines both UPMODE and DOWNMODE. Since the cutoff frequency of the lowpass filter is adapted continually, there is a smooth transition between both modes. y[m] = % QD + 2 X j =, QD h j L % + ∆'D [m] x ptr[m] , j (2.17) 2 with 'sink [m] ptr[m] % ∆'D [m] QD UPMODE DOWNMODE if L > M if L < M i.e. fsink > fsource i.e. fsink < fsource m M + '0 'sink [m] div L 1 'sink [m] mod L Q L M L M ('sink [m] mod L) M Q L Recall that to stay on the time-grid, % must be altered slightly so that L2 =M is an integer. Additionally, QD must be rounded up to the next even integer and the initial phase ∆'D [0] = L=M ' must be rounded to the next integer. 2.6 Sample-rate Conversion for Any Arbitrary Ratio Up to now we have assumed that the ratio of source to sink sampling frequency is a rational number and can consequently be written as M=L. If we choose L large enough we can approximate any rational number to a sufficient precision. However there are practical implementation limits for L. In this section we will estimate the minimum required order of L for 18- and 20-bit quality. x(t) y(t) HoldTime.epsi 107 35 mm â x[n] T=1/(L⋅f source) Figure 2.11: Continuous-time representation between samples xˆ [n] y(t) by holding value in- If the source and the sink sample rates have an arbitrary (or even varying) ratio, we need to change from the discrete time-grid (Eq. 2.12) to a continuoustime representation. This can accomplished by interpolating the source signal with a large, but fixed interpolation factor L and holding the value in-between the interpolated samples (Fig. 2.11). This continuous-time signal is then resampled at the sink sample rate. Note that instead of holding the sample value between successive samples, linear or higher order interpolation between samples could be used as well. However, since the available time-resolution is limited, these operations must be followed by a hold operation to get a continuous-time signal. We will discuss this approach further in Chapter 3. Figure 2.12 shows a block diagram of the required operations using the hold operation. The source signal x[n] is interpolated to xˆ [n] and then converted to the continuous-time signal y (t) by the hold operation. y (t) can then be resampled at any required moment. The hold operation of the value of the interpolated source signal during the Adaptive Filter L Continuous Time Discrete Time Arbitrary.epsi 105 30 mm Hold â x[n] x[n] f source Resample y(t) y[m] f sink L⋅f source Figure 2.12: Sample-rate conversion by an arbitrary ratio M=L time T = 1=(L fsource) can be expressed in the time domain as a convolution of the interpolated samples with a rectangle of width T (Eq. 2.18). y(t) = xˆ [n] ? RECT ( 1 L fsource ) (2.18) This convolution in the time domain corresponds to a multiplication with a SINC function in the frequency domain (Eq. 2.19). Y (f ) = Xˆ [f ] HHold (f ) = Xˆ [f ] SINC ( f ) L fsource (2.19) After interpolation by a factor L the spectrum of the source signal repeats at all multiples of L fsource . Figure 2.13 shows both the spectrum of the interpolated source signal and the transform of the hold operation. The resulting spectrum is the multiplication of both. H Hold Hold.epsi 103 30 mm L⋅f source = 1⋅f i 2⋅L⋅f source = 2⋅f i 3⋅L⋅f source = 3⋅f i Figure 2.13: Interpolation by L and hold operation The baseband signal up to fsource =2 is left (almost) unchanged by the hold operation, while all remaining signal components at multiples of fi = L fsource are attenuated by the SINC . They will be folded back into the baseband after the decimation. If fsource and fsink are uncorrelated we can assume that the signal amplitudes of different folding products that are folded back to the same location in the baseband do neither add nor totally cancel each other. Therefore their energy can be summed up. The total error power due to the hold operation for a full-scale signal is listed in Table 2.1 for five different signals. The derivation of these values can be found in Appendix B. Note that these errors correspond to full-scale signals, and scale down proportionally for smaller signal levels. Sine @ 1 kHz Sine @ 10 kHz Sine @ 20 kHz White Noise Pink Noise Error in [dB] for L= 2k 216 218 ,28:5 , k 6:02 ,124:8 ,136:9 ,8:5 , k 6:02 ,104:8 ,116:9 ,2:4 , k 6:02 ,98:7 ,110:8 ,8:6 , k 6:02 ,104:9 ,117:0 ,13:2 , k 6:02 ,109:5 ,121:6 Table 2.1: Error caused by hold operation (fsource = 220 ,148:9 ,128:9 ,122:8 ,129:0 ,133:6 48 kHz) The largest portion of the error is contributed by the components at the first zero crossing of the SINC function (Fig. 2.13, f = fi ). If the audio signal is a sine wave with frequency fsig we get two contributions around fi at f = L fsource fsig with an amplitude of fsig =(L fsource ) (Eq. B.2). These two components contain about 60 % of the total error energy due to the hold effect. I.e. for a 1 kHz (20 kHz) signal sampled at fsource = 48 kHz and an interpolation factor of L = 216 the two error peaks appear ,5:2 dB below the value given in Table 2.1 at ,130 dB (,104 dB). In this section we have calculated the error introduced due to an arbitrary, non-rational ratio of source and sink sample rate depending on the interpolation factor and various source signals. Thereby we presumed the sample-rate ratio and thus the resampling moment to be known precisely. We will discuss this assumption further in Chapter 4, after the presentation of the frequency tracking unit. For performance and cost estimations in the following chapters we will use two different interpolation factors: 1. For L = 216 18-bit quality can be reached for a pink noise audio signal, but for full-scale signals close to fsource =2 only 16-bit accuracy is possible. 2. For L = 220 the sum of all error components is below ,122 dB for all five signals shown in Table 2.1. 20-bit quality can therefore be achieved over the full audio bandwidth. 3 High-Order Digital Filters In Chapter 2 we assumed that the interpolation/decimation lowpass has a constant passband gain, a transition band of width zero, and an infinite stopband suppression (Eq. 2.8). In this chapter we will estimate the minimally required filter specifications for professional digital audio applications and discuss various realization methods. Furthermore the quantization properties of high-order narrow-band lowpass filters are discussed. As shown in Section 2.5.3, the CONTINUOUS MODE uses the same filter coefficients for the UPMODE and the DOWNMODE. In the following sections we will therefore design the filter and estimate the performance for the UPMODE. We will discuss the implications on the DOWNMODE in Section 3.3. 3.1 Required Filter Characteristics The interpolation/decimation filter is specified by a passband from 0 to fp with a ripple smaller than p and a stopband that starts at fs with an attenuation of at least s (Fig. 3.1). 23 δp FiniteFilter.epsi 103 32 mm δs fp fs f i = L⋅f source f source Figure 3.1: Filter with finite stopband attenuation 3.1.1 Transition Band The transition band of the filter must be narrow, because signal components just below fsource =2 must pass unaltered, whereas those just above must be sufficiently suppressed. For high-quality audio fp = 15=32 fsource and fs = 17=32 fsource is chosen [Pel82]. Note that the transition band is symmetric in respect to fsource =2 and will therefore not alias into the baseband (below fp = 15=32 fsource). Table 3.1 gives the borders of the transition band for the three most common sampling frequencies. With 48 kHz for example, the passband extends up to 22.5 kHz. fsource fp fs 48.0 kHz 44.1 kHz 32.0 kHz 22.5 kHz 20.7 kHz 15.0 kHz 25.5 kHz 23.4 kHz 17.0 kHz Table 3.1: Transition bandwidth for common sampling frequencies 3.1.2 Stopband In Section 2.3 we stated that the interpolated signal is to be band-limited to half the target sample rate before the decimation takes place. This requirement is only approximated by a filter with a finite stopband attenuation s (Fig. 3.1). The remaining stopband energy (from fsource =2 upwards) of the higher order images is therefore attenuated by s and is then folded back into the baseband. The alias error of the residual signals at multiples of fi = L fsource caused by the hold operation has already been discussed in Section 2.6 (Fig. 2.13, FilterHold.epsi 103 27 mm δs 1⋅f i 2⋅f i 3⋅f i Figure 3.2: Weighted stopband error Table 2.1). This contribution is not affected by the finite stopband attenuation and is therefore left blank in Figure 3.2. As mentioned above, an additional error contribution results from the nonzero stopband which is weighted by the SINC -function of the hold operation (Fig. 3.2). For a large interpolation factor L the signal energy is approximately equally distributed over the stopband since the signal spectrum is repeated many times. If we further assume that the filter has a constant stopband attenuation s the total stopband energy is obtained by Estop = 2 1Z X j =0 j fi, fsource ( +1) 2 j fi+ fsource 2 s) f ( 2 source !2 sin( ffi ) df ffi (3.1) Assuming the same error energy density in the (narrow) passband as in the stopband right through, Equation 3.1 can be simplified (using Eq. A.2) to Estop = Z 1 (s)2 !2 sin( ffi ) df fsource ffi 2 (s )2 fi 2 = L (s ) fsource 2 2 0 (3.2) If the error introduced by the finite stopband attenuation should be smaller than ,110 dB and L = 216 (hold error at ,110 dB for pink noise; Table 2.1) then s must be less than ,160 dB (Eq. 3.2). For a stopband attenuation of ,130 dB and L = 220 , s must even be below ,190 dB. 3.2 Realization Considerations In this section we discuss several alternatives for the realization of the interpolation/decimation filter specified in the previous section. In Section 3.2.1 the order and computational cost of a single stage FIR filter is estimated. This filter is followed by the hold operation and the decimation. In Section 3.2.2 the hold operation is replaced by linear interpolation, allowing a lower order filter. In Section 3.2.3 alternative implementations are discussed briefly. 3.2.1 FIR Filter and Hold Operation The most direct implementation is the use of a single stage FIR filter followed by the hold operation as depicted in Figure 3.3. What is the order of a single stage FIR filter, which fulfills the specifications given in Section 3.1? Discrete Time Single Stage SingleStage.epsi Hold FIR 97 22 mm L Continuous Time Resample f sink f source Figure 3.3: Single stage FIR filter followed by hold operation [Rab73] derived an empirical formula to estimate the order of a Chebyshev lowpass filter for specified p ; s ; fp ; fs . Table 3.2 gives the approximate order N of the lowpass filter according to this formula for various values of Estop, L, and p. The parameters s , fp , fs are calculated according to Equation 3.3. q 1 17 1 ; f = s = Estop=L; fp = 15 s 32 L 32 L (3.3) Assuming a decent passband ripple of 0:01 dB, a filter of order 6 600 000 is required. If the passband ripple is reduced to 0:001 dB, the filter order increases only slightly to 7:5 106 . A filter of order 8:2 106 results, if the stopband error is reduced to ,130 dB. If the interpolation factor is increased to lower the influence of the hold operation the filter order increases rapidly. Estop L [dB] ,110 ,110 ,130 ,130 ,130 216 216 216 218 220 p s [dB] [dB] 0:01 0:001 0:001 0:001 0:001 ,158 ,158 ,178 ,184 ,190 N Q 6:6 106 7:5 106 8:2 106 33:9 106 138:9 106 102 115 126 129 132 Table 3.2: Estimated filter order Recall that to compute one sink sample only a subset of Q coefficients of the impulse response of the filter is taken (Q = N=L, Sec. 2.5.1). The computational cost (which is proportional to Q) is consequently about the same independent of the interpolation factor (last three rows of Table 3.2). However, the number of filter coefficients that must be stored increases proportionally to the interpolation factor. For L = 220 over 108 filter coefficients must be stored. Additionally, direct synthesis of a lowpass filter of order 108 is clearly beyond practical feasibility. 3.2.2 FIR Filter and Lagrange Interpolation 3.2.2.1 Principle In Section 2.6 we used the hold operation to convert the highly interpolated source signal to a continuous-time signal, which was then resampled at the sink sample rate. Instead of holding the value between two consecutive samples of the interpolated signal, we now use first order Lagrange interpolation (from now on simply called linear interpolation) between the two samples to calculate the sink value (Fig. 3.4). The frequency response of a linear interpolator has a SINC 2 characteristic. The noise suppression around the zero crossings of a SINC 2 is much higher compared to the SINC function of the hold operation. Therefore a substantially smaller interpolation factor is sufficient. The error energy around the zero crossings for a given interpolation factor L1 = 2k is shown in Table 3.3 and derived in Appendix C. If the error caused by the linear interpolation (Table 3.3) is compared to the error caused by the hold operation (Table 2.1) it can be seen that L1 = 27 corresponds to L = 216 and L1 = 29 corresponds to Discrete Time Linear L1 FIR Lagrange1.epsiInterFilter 91 21 mm polation Continuous Time Resample f sink f source Figure 3.4: FIR filter followed by linear interpolation L = 220 . The interpolation factor can therefore be reduced drastically, if the hold operation is replaced by linear interpolation. Sine @ 1 kHz Sine @ 10 kHz Sine @ 20 kHz White Noise Pink Noise Error in [dB] for L1 = 2k 27 28 ,63:9 , k 12:04 ,148:2 ,160:2 ,23:9 , k 12:04 ,108:2 ,120:2 ,11:9 , k 12:04 ,96:2 ,108:2 ,18:7 , k 12:04 ,103:0 ,115:0 ,25:7 , k 12:04 ,110:0 ,122:0 Table 3.3: Error caused by linear interpolation (fsource 29 ,172:3 ,132:3 ,120:3 ,127:1 ,134:1 = 48 kHz) So far in this section we assumed that the linear interpolation is calculated with an infinite resolution on the time axis. Therefore we got a continuous-time signal after the interpolator (Fig. 3.4). Linear Interpolation L2 Lagrange2.epsi 97 46 mm L1 f source FIR Filter Interpolation/ Decimation Filter Discrete Time Hold Continuous Time Resample f sink Figure 3.5: FIR filter followed by discrete-time linear interpolation In a practical realization we only have a finite time resolution of tunit . tunit = L f1 source with L = L1 L2 , where L2 is the number of resolved samples by the linear interpolator between two FIR-interpolated samples. The block diagram of Figure 3.4 is therefore completed with additional function blocks resulting in Figure 3.5. We actually have a cascade of three filters (Fig. 3.6): the FIR filter HFIR , the linear interpolator HLin with the first zero crossing at L1 fsource , and the hold operation HHold with the zero at L1 L2 fsource . If the following section we discuss the error contributions of the three filters. H FIR H Hold H Lin FilterCascade.epsi 108 26 mm L 1⋅ L 2⋅ f source L 1⋅ f source Figure 3.6: Transfer characteristic of FIR filter, linear interpolator, and hold operation Figure 3.7 shows the time domain representation of Figure 3.6. The source samples are interpolated by the FIR filter resulting in the bold samples with spacing 1=(L1 fsource ) in Figure 3.7. Linear interpolation leads to L2 intermediate samples with spacing 1=(L1 L2 fsource ) = 1=(L fsource ). Finally, the hold operation converts the discrete-time signal to a continuoustime representation. x(t) y(t) LinTime.epsi 89 29 mm T=1/(L1⋅f source) â x[n] T=1/(L⋅f source) Figure 3.7: FIR filter and linear interpolation followed by hold operation 3.2.2.2 Performance Estimation The error introduced through the three cascaded filters consists of three components. In the first place the stopband attenuation of the FIR filter is weighted by the SINC 2 transfer function of the linear interpolator, secondly the remaining components of the higher order FIR passband attenuated by the SINC 2 as well, and thirdly the residuals of the hold operation. The sum of the weighted stopband attenuation is (using Eq. A.3) Estop1 2f (s1) 2 source Z 1 f ) sin( L1 fsource f L fsource 0 !4 df = 23 L1 (s1)2 (3.4) 1 If we assume a white noise signal, we get for the contributions around the zero crossings of the linear interpolator (Eq. C.7) and the zero-order hold (Eq. B.7) 4 2 Estop2 7200 (L )4 + 72 (L L )2 1 1 2 (3.5) The total error for a FIR filter with a finite stopband attenuation s1 is therefore for the three different audio signals 4 4 2 f fsig sig 2 L1 (s1) + 45 L f + 3 L1 L2 fsource 1 source 4 2 2 2 EWhite 3 L1 (s1) + 7200 (L )4 + 72 (L1 L2 )2 1 ESine 23 EPink 23 L1 (s1)2 + L f 1 source 10 960 4 + 10 472 L1 L2 fsource 2 (3.6) (3.7) 2 (3.8) Table 3.4 gives the contribution of the three summands for a white noise signal (Eq. 3.7) at fsource = 48 kHz and sample values of L1 , L2 , and s1 . Note that the second summand is reduced by 7 dB and the third by 4.6 dB, if pink noise is used instead of white noise. L1 L2 s1 L [dB] 27 28 29 29 28 27 28 29 210 211 ,137 ,137 ,137 ,137 ,152 216 216 216 218 220 Weighted Linear Zero , order stopband interpolation hold Eq: 3:4 Table 3:3 Table 2:1 [dB] [dB] [dB] ,118 ,103 ,105 ,115 ,115 ,105 ,112 ,127 ,105 ,115 ,127 ,115 ,127 ,117 ,129 Table 3.4: Error contributions from interpolation and finite stopband attenuation (white noise signal) For the first three rows of Table 3.4 the same interpolation factor L = L 1 L2 = 216 is used, but the order of the interpolation filter increases and likewise the number of coefficients that have to be stored. For a sine wave test signal the second and the third error summand of Eq. 3.6 depend on the audio signal frequency. Figure 3.8 shows plots using the parameters in last four rows of Table 3.4. All three summands of Equation 3.6 are plotted with dashed lines and their sum with a solid line. For the parameters of two plots in the upper half of Figure 3.8 18-bit performance for full-scale signals is only reached up to about 5 kHz due to the hold operation (L = 16). The lower two plots represent parameters for 18and 20-bit performance over the full range of input signals. So far we only discussed the integral over the error contributions. If we apply a sine wave test signal, the energy will not be equally distributed and we will get distinct error components in the decimated sink signal. In the following the size of these ‘peaks’ is calculated. For a sine wave close to fsource =2 the largest contribution to the stopband error is caused by the insufficient attenuation of the linear interpolator and the zero-order hold at the first zero crossing (Fig. 3.6). The minimal attenuation of the linear interpolator at the first zero crossing L1 = 256, L2 = 256, δ s1 = −137 dB L1 = 512, L2 = 128, δ s1 = −137 dB −100 [dB] [dB] −100 −110 −120 −130 −120 100 1000 [Hz] L1 = 256, L2 =1024, δ s1 −130 10000 1000 10000 100 1000 [Hz] 10000 −100 [dB] [dB] 100 ErrorSine.eps [Hz] 108 76 mm = −137 dB L1 = 512, L2 =2048, δ s1 = −152 dB −100 −110 −120 −130 −110 −110 −120 100 1000 [Hz] −130 10000 Figure 3.8: Error contributions vs signal frequency for sinusoidal audio signals for f = L1 fsource fsource =2 is (Eq. C.2) fsource L f2 1 source !4 = 1 16 (L1)4 (3.9) and for the zero-order hold we get (Eq. B.2) fsource L L 2 f 1 2 source !2 = 1 4 (L1 L2 )2 (3.10) For L1 = L2 = 256 the two above equations result in ,108 dB and ,102 dB respectively. This means that for a source signal close to fsource =2 the sink signal contains two distinct error components at ,102 dB caused by the hold operation and two error components at ,108 dB caused by the linear interpolation. The alias frequency of these error components depends on the exact source and sink sample rates and the signal frequency. Therefore not only the accumulated stopband error is a matter of concern, but also its distribution. 3.2.2.3 Implementation Figure 3.9 shows a simplified block diagram of the principle outlined in the previous section. The two interpolators (by L1 and L2 ) are merged into one and the FIR filter and the linear interpolator are combined in one block. Continuous Time Discrete Time Linear FIR & Inter- Lagrange3.epsi Filter polation 97 22 mm L Hold Resample f sink f source Figure 3.9: FIR filter with linear interpolator In the time domain the multiplications of the source signal by the filter transfer functions can be expressed as successive convolutions. We will discuss further the two implementations that are indicated with brackets in Equation 3.11. x ? hFIR ) ? hLin = x ? (hFIR ? hLin ) ( (3.11) In the left hand case the source signal is interpolated by the FIR filter. Additional intermediate values are calculated by linear interpolation of the oversampled signal. In the right hand case the coefficients of the FIR filter are linearly interpolated and the resulting filter is then applied to the source signal. Both cases result in the same transfer function, since the convolution is associative. The above paragraph describes only the conceptual model. As in the case of the single stage FIR filter not all values of the interpolated source signal must be calculated since the interpolation is followed by the decimation. In the left hand case of Equation 3.11 two successive output samples of the FIR filter are computed per sink sample. The sink sample is then calculated by linear interpolation between the two. In the right hand case only one output sample of the FIR filter is calculated, but each filter coefficient requires an interpolation. This second implementation resembles the single stage FIR case. But since the linear interpolation is performed in real-time less filter coefficients have to be stored. Instead of the transfer characteristic of a true single stage filter in Figure 3.1, we get the one in Figure 3.10. FilterLin.epsi 109 26 mm δ s1 L 1⋅ L 2⋅ f source L 1⋅ f source Figure 3.10: FIR with interpolated impulse response In Section 3.2.1 (Table 3.2) we estimated the order of a single stage FIR interpolation/decimation filter and stated that such a filter can not be directly synthesized. Table 3.5 gives the order N of the FIR filter and the number of taps Q that must be calculated for the FIR filter if a linear interpolator is used. Estop L1 [dB] ,118 ,115 ,112 ,115 ,127 27 28 29 28 29 p s [dB] [dB] 0:001 0:001 0:001 0:001 0:001 ,137 ,137 ,137 ,137 ,152 N Q 13:3 103 26:6 103 53:1 103 104 104 104 26:6 103 57:3 103 104 112 Table 3.5: Estimated filter order The values of L1 and Estop are taken from Table 3.4. Note that the number of taps is the same for the first three cases, but the number of stored coefficients doubles. The parameters for 20-bit performance increase the number of taps only from 104 to 112. 3.2.3 Alternatives 3.2.3.1 High-Order Coefficient Interpolation For a single stage FIR implementation with an interpolation factor L = 220 a filter of order 108 is required. If the coefficients are interpolated using first order Lagrange interpolation the filter order can be reduced to 57 103 as discussed in the previous sections. The filter order could be further reduced by Interpolation L L1 L2 Estop s1 [dB] [dB] Zero order 1st order 220 220 220 29 20 211 2nd order 3rd order 220 220 26 24 214 216 ,127 ,127 ,127 ,127 ,187 ,152 ,144 ,138 N Q 137 225 217 131 57 288 112 6 886 108 1671 104 Table 3.6: Coefficient interpolation and filter order using higher order Lagrange interpolation. Table 3.6 shows the resulting filter order. Note that the order Q of the actually computed filter stays about the same, but the reduced storage requirements are traded for a more complicated coefficient interpolation. 3.2.3.2 Separate Interpolation and Decimation Filter [JTG+ 94] presented an implementation of a sample-rate converter which performs interpolation by only L = 64 followed by a proprietary hold operation called ‘controlled validation’ (Fig. 3.11). Before the decimation the error introduced by the hold operation is attenuated by a second lowpass filter. ’Controlled Validation’ 64 f source Filter Philips.epsi Hold 105 28 mm 1×,2×,3× 64⋅f source Filter 128⋅f sink 128 f sink Figure 3.11: Separate interpolation and decimation filters This special hold operation repeats each interpolated source sample 1-3 times in such a way that only high-frequency error components are produced, which can be attenuated by the decimation filter. If source and sink sample rate are the same, each interpolated sample is repeated twice on average. Since the interpolation filter runs with fsource and the decimation filter runs with fsink the sink samples are automatically band-limited to half the minimum of source and sink sample rate by-passing the need to adapt the cutoff frequency of the filter as in the CONTINUOUS MODE. 3.3 Required Resources Before we discuss the synthesis problem for high-order digital filters we recapitulate the required resources for the approaches presented in the previous section. We use the parameters L = 216 and L1 = 28, which results in 18-bit performance for full-scale signals up to 5kHz and assume a (stereo) audio signal. Single Stage FIR Filter If we assume a filter order of N = 7:5 106 , an interpolation factor of L = 216 , and use the hold operation, the number of multiplications for each stereo sink sample is 2 Q = 2 N=L = 230. For a sink sample rate of 48 kHz we get a cycle time of 90 ns for the multiplication (and accumulation). This approach has the drawback that all N = 7:5 106 filter coefficients must be stored in a way that they can be accessed in real-time. Interpolation of FIR Impulse Response If we use linear interpolation on the impulse response with L1 = 28, L2 = 28 (L = 216 ), and a filter order of N = 26 103 the number of multiplications for the filter is 2 Q = 2 N=L1 = 208. We need one additional multiplication for each interpolation of the filter coefficients. This results in 3 N=L1 = 312 multiplications and L1 = 28 coefficients to store. FIR Filter followed by Interpolation If we interpolate the samples after the FIR filter, two FIR calculations are needed for each (stereo) channel. One additional multiplication per channel is used for the linear interpolation. Therefore the number of multiplications is 2 2 Q + 2 = 4 N=L1 + 2 = 418. The number of stored coefficients is the same as in the previous case. Discussion In Section 3.2 we considered different concepts for the realization of the interpolation/decimation filter. We have seen that a trade-off between the computational cost and the number of stored coefficients must be made. The noise estimations in this chapter have been done for full-scale signals. However, a full-scale sine wave at 20 kHz is not a realistic audio signal. Therefore we could reduce the requirements for higher frequencies. Additionally, the noise level scales proportionally to the signal level. As a result we have less distortion for lower level signals. However, these measures reduce the required filter order only by a small fraction. So far in this chapter we have only treated the UPMODE. In the DOWNMODE (i.e. if the sink sample rate is smaller than the source sample rate) the width of the FIR filter is reduced and therefore the error contributions from the higher order passbands are smaller. The contribution from the finite stopband attenuation of the FIR filter is about the same. On the other hand the signal energy is also reduced due to the smaller passband. In the following chapters we will treat further only the concept of interpolation of the FIR impulse response. The synthesis of the FIR filter of order N = 26 103 is covered in Section 3.4. 3.4 Synthesis of High-Order FIR Filters In this section we discuss methods which allow to synthesize the high-order interpolation/decimation filter. For the examples, the following parameter values are used: L1 = 256; N 26 103; s1 = ,137 dB A filter with these parameters is suited for a sample-rate converter with 18-bit performance up to 5kHz (Fig. 3.8). 3.4.1 Specifications Let us review the specifications for the interpolation/decimation filter for the parameter values given above. δp FilterSpecs.epsi 108 32 mm δ s1 fp fs 0.5 [f/f sample] Figure 3.12: Filter specifications The filter is used for an interpolation by a factor L1 = 256. Its (ideal) cutoff frequency (with a normalized sample rate of 1.0) is therefore 0:5=L1 = 0:00195. The passband and the stopband borders are given in Table 3.7 for a transition bandwidth of fsource =16. The maximum allowed passband ripple for high-quality digital audio is a disputed subject. We demand a (conservative) value below 0:001 dB. An average attenuation of at least 137 dB must be reached by the filter. Since not only the average stopband attenuation is a matter of concern, but also its distribution we request an minimal attenuation of 137 over the full stopband. A Chebyshev filter which meets these specifications is of order 26 103 . How could the order of the interpolation/decimation filter be reduced? The required order of a Chebyshev lowpass filter meeting certain specifications for Lower Edge Upper Edge Deviation [dB] Passband 0.0000 0.0018 0.001 Stopband 0.0021 0.5000 ,137 Table 3.7: Filter Requirements p and s is about inversely proportional to its transition bandwidth. If the source signal does not extend over the full bandwidth, or if some level of aliasing can be tolerated for high frequencies, the filter order and thus the computational complexity can be reduced drastically by allowing a slightly wider transition band. An alternative approach is to split up the stopband into multiple parts with different attenuations. The filter order is determined to a large extent by the required stopband attenuation of the part immediately following the transition band. The following stopband sections can reach much higher attenuations. We will analyze the above possibility to reduce the filter order in Section 3.4.5. 3.4.2 Design Technique A wide variety of design techniques for FIR digital lowpass filters have been developed [RG75, Ant93]. We used the technique of equiripple design based on Chebyshev approximation methods. The filters designed by this method are optimal in the sense that they fulfill the minimax criterion, i.e. the peak approximation error in the frequency domain is minimized. The resulting filter has an equiripple transfer characteristic in the passband and in the stopband. To solve the Chebyshev approximation problem the well-known computer program presented in [MPR73] is used. It is based on the multiple-exchange Remez algorithm. The program was originally written to synthesize (loworder) filters up to N = 128. We have extended the program for the synthesis of high-order filters. However, the computational cost of the program is approximately proportional to the square of the filter length. Many elaborate methods have been proposed to reduce the computation time of the above mentioned program [AW85, Ant93]. Since our main concern is not runtime, but the convergence of the algorithm, we optimized the program to allow synthesis of lowpass filters up to order N = 30 000 (Sec. 3.4.3). Addi- tionally we used the prototype method proposed in [AW85] to estimate the performance of the high-order filter with a low-order approximation (Sec. 3.4.4). 3.4.3 Direct Synthesis Numerical problems in the original Remez exchange program [Rab73] disallow computation of high-order filters. We have improved the program for better numerical properties. The original program often computes expressions of type 1 cos() , cos( + ) (3.12) For values of 1 the evaluation of this expressions is inaccurate and may even lead to an arithmetic exception (division by zero) due to finite precision arithmetic. We have replaced the above expressions by 1 2 sin( + 2 ) sin( 2 ) (3.13) which has much better numerical properties for small values of . Additionally, range checking and scaling of intermediate results have been added to prevent overflow of accumulator variables during calculation of high-order filters. Using these modifications, direct synthesis of filter with more than 20 000 filter taps is feasible. Table 3.8 lists the runtimes on a SUN SPARCstation 10/41 for sample specifications. 3.4.4 Prototype Method The prototype method for narrow-band lowpass filters starts out from the fact that the computational cost is roughly proportional to the square of the filter length. A low-order prototype filter is synthesized and is then scaled to the high-order filter. This approach allows to evaluate rapidly the performance of different filter specifications at low computational cost. The frequency scaling property of the Fourier transform is used to derive the specifications of the prototype. Consider the impulse response h(t) and its Fourier transform H (f ). Scaling the impulse response by a factor yields a new Fourier pair t h ( ) $ H ( f ) 1 The edges of the pass- and the stopband are scaled to fp and fs . Due to the wider transition band the filter order is reduced by a factor . Figure 3.14 shows an example for = 4. The above scaling transformation operates precisely in the continuous-time domain and with good approximation in the discrete-time domain for narrow-band high-order lowpass filters. Filter Order 419 1679 6719 13439 26879 64 16 4 2 1 p [dB] 0:001 0:001 0:001 0:001 0:001 s [dB] ,138:8 ,139:0 ,139:0 ,139:0 ,139:0 Runtime [min] 0.37 7.73 143 564 3656 Table 3.8: Runtime and deviation depending on 4 Runtime [min] 10 2 10 Runtime.eps 106 40 mm 0 10 −2 10 −4 10 −3 10 −2 10 −1 10 0 10 [1/ λ ] 2 Figure 3.13: Graphical representation of Table 3.8 Table 3.8 lists the runtime on a SUN SPARCstation 10/41 for a target filter of order 26 879 using the specifications shown in Table 3.7 for different values of . The runtime is about inversely proportional to 2 – except for = 1, where the program converges only slowly. The resulting stopband attenuation decreases only for large values of . Low−order filter: Time domain Low−order filter: Freq. domain 0 0.2 −50 0.15 0.1 −100 0.05 −150 0 −0.05 100 200 300 [Samples] 400 ZeroPad.eps 103 86 mm High−order filter: Time domain 100 0.06 0.04 −50 0.02 −100 0 −150 1000 [Samples] 1500 400 Zero−padded filter: Freq. domain 0 500 200 300 [Samples] 500 1000 [Samples] 1500 Figure 3.14: Scaling of prototype filter by = 4 using zero padding To avoid the computation-intensive direct synthesis of the high-order filter, the Fourier transform can be used to scale the low-order prototype filter to a higher order filter. Figure 3.14 shows the required operations. In a first step the impulse response of the prototype filter is transformed into the frequency domain using the discrete Fourier transform (upper left to right). In a second step zeros are inserted at the Nyquist sample rate in the frequency domain to increase the filter order by the factor (right top to bottom). This zero-padded transfer function is then converted back into the time domain using the inverse Fourier transform (lower right to left). Due to high stopband attenuation of the filter at the Nyquist sample rate, the discontinuity introduced by the zero padding is very small. Figure 3.15 shows a comparison of the transfer function of the scaled prototype ( = 16) with the directly synthesized filter. Low−Order Filter and Zero−Padding 0 −50 −100 −150 2 10 3 10 4 10 ZeroPad2.eps [Hz] 102 72 mm Direct Synthesis 5 6 10 10 0 −50 −100 −150 2 10 3 10 4 10 [Hz] 5 Figure 3.15: Comparison of zero-padded prototype ( thesis 3.4.5 6 10 10 = 16) and direct syn- Relaxed Specifications The specifications for the interpolation/decimation filter need always to be a trade-off between the computational cost for real-time operation and the quality of the filter. I.e. if the transition band is widened, then the order of the filter can be reduced substantially, but aliasing occurs for high frequency signal components. The prototype method discussed in the previous section allows to evaluate the cost and performance of filters with different specifications at low computational cost. To reduce the filter order the specifications given in Figure 3.12 are altered to allow an intermediate stopband with a lower attenuation (Fig. 3.16). For a sample rate of 48 kHz the transition band extends from 22.5 kHz to 25.5 kHz for the original specifications (Table 3.1). Figure 3.17 shows an example of a resulting filter for reduced aliasing requirements. The stopband attenuation and the passband ripple are the same as for the original specifications, but the filter order N is reduced from 26 879 to 21 759. Thereby the δp FilterSpecs2.epsi 112 34 mm δ s1 fp fs 0.5 [f/f sample] Figure 3.16: Relaxed specifications with intermediate stopband order Q of the actually computed filter is reduced by 19 % from 105 to 85. The aliasing components in the baseband that result from the relaxed specifications are attenuated by 78 dB (Fig. 3.17) and are located above 21 kHz (fsample = 48 kHz), 14 kHz (fsample = 32 kHz) respectively. The error contribution by the higher order transition bands can be neglected since they are attenuated by at least 90 dB (L1 = L2 = 256) by the hold operation and the linear interpolation (Eq. 3.9, 3.10). The degradation caused by the relaxed specifications is therefore only marginal. 0 −20 −40 [dB] −60 Aliasing.eps 79 63 mm −80 −100 −120 −140 −160 0 0.5 1 1.5 2 2.5 [Hz] 3 3.5 4 Figure 3.17: Filter with relaxed specifications (fsample 4.5 4 x 10 = 48 kHz) 3.5 Quantization of Filter Coefficients In Section 3.4 a narrow-band FIR lowpass filter of order 26 879 with a minimum stopband attenuation of 139dB has been synthesized (average attenuation 142 dB). As described in Section 3.2.2 the (virtual) interpolation/decimation filter of order 106 is obtained by linear interpolation of the filter coefficients (impulse response) of the 26 879 filter. So far the quantization of the filter coefficients has not been treated. For the implementation the ‘basic coefficients’ are quantized twice: Once for reducing the storage requirements and to match the word-length of the multiplier, and a second time after the linear interpolation of those stored values. The following two sections cover the effects of each quantization step. 3.5.1 Quantization of FIR Coefficients Changing the coefficients of a filter alters its transfer function. In particular, quantizing the coefficients of a lowpass filter reduces the stopband attenuation. Figure 3.18 shows the transfer characteristic of the 26 879 filter with coefficients quantized to 20 bits: the average stopband attenuation is reduced from 142 dB to 129 dB. The equiripple characteristic in the stopband is replaced by stopband noise. 0 −20 −40 −60 −80 Filter26879bit20.eps 74 58 mm [dB] −100 −120 −140 −160 −180 0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5 [f/f sample] Figure 3.18: FIR filter, N = 26 879, 20 bit coefficients To increase the dynamic range and therefore to reduce the quantization effects, a floating-point format can be used for the coefficients. The impulse response h of the FIR filter has approximately a SINC ,shape i.e. a j1=xj asymptotic behavior from the central to the outer coefficients (Fig 3.19). The P evaluation of one sink sample is a scalar product of the form h x. If the summation is started with the outermost coefficients a special block floating point format can be used, which allows the exponent to increase only. We call this easy to implement format quasi floating point (QFP). It can be realized in hardware by shifting the accumulator to the right whenever the exponent changes. No storage of the exponent is necessary, since a simple controller can do the job. 0.8 0.6 0.4 Filter26879Impulse.eps 108 42 mm 0.2 0 −0.2 0.5 1 1.5 2 2.5 4 x 10 Figure 3.19: Impulse response of FIR filter using fixed-point coefficients The stopband attenuation of the 26 879 order FIR filter for a given precision of the coefficients is given in Table 3.9. Note that to reach a stopband attenuation of 137 dB (Table 3.4) 22-bit coefficients are required in the fixed-point format, but only 19 bits if the QFP format is used (Tab. 3.9). Coefficient [bit] 18 19 20 22 24 Stopband attenuation Fixed point QFP 117 dB 134 dB 123 dB 139 dB 129 dB 141 dB 139 dB 142 dB 142 dB 142 dB Table 3.9: Stopband attenuation depending on coefficient precision The quantization of the FIR coefficients can therefore be modelled by adjusting the stopband attenuation s1 of the FIR filter according to the spectrum of the quantized coefficients. Or in other words, the FIR coefficient word-length must be chosen such that the stopband requirements given in Section 3.2.2 are fulfilled for the quantized coefficients. 3.5.2 Quantization of Interpolated Coefficients The (already quantized) FIR coefficients are linearly interpolated to yield the interpolation/decimation filter of order 106 (Sec. 3.2.2). In Figure 3.10 the transfer characteristic of the resulting filter has been sketched: The FIR filter shape is repeated L2 times and attenuated by the SINC 2 transfer function representing the linear interpolator in the frequency domain. Figure 3.20 shows the spectrum of the FIR filter for L2 = 64 using 22-bit (non-QFP) FIR coefficients with ideal interpolation. 0 −20 −40 −60 [dB] −80 Filter26879bit22-64.eps 108 85 mm −100 −120 −140 −160 −180 0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5 [f/f sample] Figure 3.20: Interpolated FIR filter, L2 = 64 The quantization error of the ‘basic FIR coefficients’ (previous section) showed approximately random behavior, resulting in a white noise spectrum. The quantization of the linear interpolated (quantized) FIR coefficients results in a strongly correlated error function, which can assume only discrete values. This results in a non-white spectrum of the quantization error. Detail of quantization noise in time domain 0.5 0 −0.5 0 50 100 150 200 250 300 350 400 Filter26879bit22-64bit22-Noise.eps 104 83 mm Spectrum of quantization noise, average level = −159 dB 450 500 0.45 0.5 −120 −130 −140 −150 −160 0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 Figure 3.21: Quantization noise of interpolated FIR filter, L2 = 64, 22 bit The upper graph in Figure 3.21 shows the quantization error of the first 500 coefficients. The corresponding spectrum of this function is shown in the lower half. Note that the same 22-bit coefficients have been used as for Figure 3.20, but the interpolated coefficients have been quantized to 22 bit as well. The average error level is at ,159 dB with some peaks up to ,126 dB. Since all error peaks caused by the quantization of the interpolated coefficients (Fig. 3.21) are dominated by the highest peaks of the (non-quantized) interpolation operation (Fig. 3.20) the quantization error contributes only by its average level. Table 3.10 shows the average and the peak quantization error level for 19 and 22-bit quantization and sample values of L2 . Note that the average quantization error level is lowered by 17 dB using the QFP principle – the peak value even by 30 dB. L2 4 16 64 256 Quantization noise level [dB] 19-bit 22-bit fixed-point QFP fixed-point QFP peak avg. peak avg. peak avg. peak avg. ,94 ,130 ,123 ,147 ,111 ,148 ,141 ,165 ,102 ,136 ,129 ,153 ,117 ,153 ,148 ,171 ,106 ,142 ,138 ,159 ,126 ,159 ,157 ,177 ,107 ,148 ,148 ,165 ,128 ,166 ,166 ,183 Table 3.10: Quantization noise caused by second quantization Figures 3.22 and 3.23 show the resulting filter transfer characteristic with both the ‘basic FIR coefficients’ and the interpolated coefficients quantized to 22 bit using the fixed-point and the QFP format. 0 −20 −40 −60 [dB] −80 Filter26879bit22-64bit22.eps 97 76 mm −100 −120 −140 −160 −180 0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5 [f/f sample] Figure 3.22: FIR filter, L2 = 64, quantized twice to 22 bit 0 −20 −40 −60 [dB] −80 Filter26879bit22QFP-64bit22QFP.eps 97 76 mm −100 −120 −140 −160 −180 0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.4 0.45 0.5 [f/f sample] Figure 3.23: FIR filter, L2 = 64, quantized twice to 22 bit QFP 4 Frequency Tracking In Section 2.6 we have presented the concept for sample-rate conversion by arbitrary ratios. The source signal is interpolated by a constant factor L, transformed into a continuous-time representation, and is then resampled at the new (sink) sample rate. In this chapter we discuss methods to determine precisely the sink sample moment in the digital domain. 4.1 Requirements In order to allow the computation of a sink value of the sample-rate converter, its precise phase (i.e. the sink sampling time relative to the source samples) must be known. Inaccuracy in measuring this phase results in a wrong sink sampling moment and thus in a non-uniform resampling of the sink signal. In Section 4.5 the effects of the non-uniform sampling on the signal spectrum will be covered in detail. It will be shown that the sample phase error is modulated onto the audio signal as shown in Figure 4.1. A precision of at least L bits is required for the sink phase. The (virtual) sample rate after the interpolation and thus the time resolution for the sampling moment of the sink samples is L fsource (Eq. 2.12). For example, a time resolution of approximately 320 ps is needed for L = 216 and a sample rate of 48 kHz. 51 uniform non-uniform Samples Modulation.epsi 85 39 mm Spectrum of phase error Spectrum of resampled sine Figure 4.1: Effects of non-uniform sampling on the signal spectrum 4.2 Principle A straight forward solution to determine the phase of the sink samples is to use a high-speed counter which starts at each source sample moment and measures the position of the sink samples relative to the source samples (Fig. 4.2). Besides the need for a counting frequency in the GHz range and a very stable clock source, this direct approach exhibits the shortcoming that jitter of either of the sampling clocks translates directly into phase errors of the sink samples. Since jitter-free sampling clocks are hardly feasible, methods which determine the precise average phase are asked for. source samples sink samples L Phase.epsi ∆ϕ[m+1] ∆ϕ[m] 95 32 mm M[m] Figure 4.2: Phase relations Figure 4.2 gives a graphical representation of the phase relations as defined in Chapter 2 (Eq. 2.17). Obviously there is no need to measure each sink sample moment directly, instead the subsequent sink sample phases are computed from the previous ones by adding the precisely measured ratio of source and sink sample rate (Eq. 4.1). This indirect solution makes it possible to suppress jitter of the sampling clocks by evaluating a long term average M=L of the ratio fsource =fsink , which in general varies only slowly in time. Note that in Equation 4.1 the interpolation factor L is constant, but the decimation factor M [m] may vary slowly. The average decimation factor M must be equal to L times the ratio of source and sink sample rate. Since L is chosen to be a power of 2 the modulo operation in Equation 4.1 can be implemented easily. ∆'[m + 1] = (∆'[m] + M [m]) mod L; with M L = fsource fsink (4.1) If the sink phase ∆'[:] is computed according to Equation 4.1, the calculated phase may drift away from the actual phase due to the finite word-length of M [m] and the filtering used for the jitter reduction. Therefore measures must be taken to avoid this drift. In the following section a frequency counter solution is presented. An alternative approach to measure the average frequency ratio is a digital PLL (Sec. 4.4). Section 4.5 treats the effects of non-uniform sampling due to variations of M [m]. A comparison of the performance of the frequency counter and the digital PLL is given in Section 4.6. 4.3 Frequency Counter In order to reach the desired overall precision, the ratio of source and sink sample rate has to be determined with a precision of at least L bits. This measurement can be accomplished by a frequency counter using two different methods: 1. The source sample rate is multiplied by a phase locked loop (PLL) to 2L,k fsource and is then used to measure the sink sample rate (Fig. 4.3). If a 2L,k -fold multiple of the source sampling clock is already available, the PLL can be omitted. An additional precision of k bits is obtained by averaging 2k consecutive measurements. This averaging operation additionally serves as lowpass filter to suppress short-time jitter. 2. The additional PLL used in the above method can be avoided by using a fast counter, which runs independently of the source and sink sampling f source PLL 2 L-k ⋅f sourceFreq2.epsi 32 mm Counter 70 f sink Averaging (2 k ) Error tracking f source / f sink Figure 4.3: Frequency counter with PLL clock. The basic idea is to measure and average both periods 1=fsource and 1=fsink and divide the resulting values by each other (Fig. 4.4). f ref ≈ 2 L-k ⋅f sample f sink Averaging (2 k ) Error tracking Freq.epsi 77 32 mm Divider High-Speed Counter f source f source / f sink Averaging (2 k ) Error tracking Figure 4.4: Frequency counter with high-speed reference clock Both methods demand a trade-off between the measurement frequency, the memory requirements of the moving average filter and the jitter reduction. Direct measurements require a GHz counter frequency with a low-jitter clock source and must be followed by a lowpass filter to eliminate jitter caused by the sampling clocks. Low-frequency measurements need a high-order averaging filter to achieve the required accuracy and therefore allow to track sample-rate changes only slowly. A VLSI implementation with a 50 MHz reference counter and two moving average filters of length 128 has been realized at our laboratory [MS93]. In order to avoid drifting of the group delay, care has been taken that the sink phase calculated from the averaged values of source and sink sample rates (Eq. 4.1) does not drift away from the actual value. We implemented the additional error-tracking circuitry suggested by [Pel82]. This VLSI implementation fulfilled our expectations for rational ratios of source and sink sample rates. If both rates differ only by a small amount (plesiochronous case) this solution is not accurate enough. Consequently an alternative approach to resolve this problem was studied (Sec 4.4). The cause of the inaccuracy of the above frequency counter implementation and performance simulations are given in Sections 4.5 and 4.6. 4.4 Digital Phase Locked Loop An alternative approach to measure the average frequency ratio is a digital phase locked loop (DPLL). A DPLL is a digital circuit that controls an output signal in a way that its phase matches the phase of an input signal [dD89, Bes93]. PLLs (especially analog PLLs with analog or digital phase detectors) are widely used e.g. in telecommunication applications for modulation and demodulation of signals, clock recovery, or as frequency synthesizers. In the case of the frequency tracking unit, the purpose of the DPLL is to control the ratio of source and sink sample rate (M [m]=L) in a way that the calculated sink phase matches the phase of the (de-jittered) sink sampling clock. The lowpass characteristic of the PLL is used to suppress short-time jitter of the sampling clocks. We use an approach similar to the servo controlled loop proposed in [Ada93], but we use an adaptive loop filter and avoid the resynchronization of the sampling clocks. f source Counter ⋅L Pll.epsi ∆ϕ 100 32 mm ϕ source Phase Detector f sink Loop Filter VCO Integrator ϕ sink Figure 4.5: Digital PLL (DPLL) The all-digital PLL consists of the same three basic building blocks as a conventional PLL (Fig. 4.5): 1. Phase detector: The phase detector generates a signal that is proportional to the difference of source phase 'source and sink phase 'sink . 2. Loop filter: The loop filter is a lowpass filter that determines the dynamic behavior of the PLL. It attenuates short-time jitter of the phase detector output and minimizes the average deviation. Again, a trade-off between fast tracking and noise suppression must be made. 3. VCO: A VCO in the strict sense of the term is a voltage-controlled oscillator. The input value corresponds to the (filtered) phase difference. The output frequency of the oscillator is altered proportionally to the input. The VCO can therefore be looked at as an integrator in the digital domain, which is realized as an accumulator. 4.4.1 Phase Detector The purpose of the phase detector is to measure the phase difference between the source and the sink samples. At the first glance this phase detection can be realized by a simple subtraction of source and sink phase. However, this simple solution is not applicable, because the two phases are not updated at the same time. The source phase 'source is incremented by L at the rate fsource . The sink phase 'sink on the other hand is incremented by a variable amount M [m] at the rate fsink . For precise operation of the phase detector both phase values must be fetched exactly at the sink sample moment. It is therefore undesirable to introduce additional jitter by synchronizing both sampling clocks to a common master clock. f sink f source Gray Counter Gray.epsi 102 17 mm Register Gray Decoder ⋅L ϕ source Figure 4.6: Synchronization with Gray counter An elegant solution to avoid the resynchronization is shown in Figure 4.6. The counter running at the source sample rate fsource is realized as Gray counter. Since the Gray code is a unit-distance code, i.e. only one bit changes for subsequent counter values, consistent readout of the counter is guaranteed at any moment. 4.4.2 Loop Filter The output of the phase detector is not constant, but fluctuates considerably. This phase noise origins from several sources. Two sources of the phase noise are the jitter of the source and sink sampling clocks. A third source of phase noise is caused by the fact that the source and the sink phase are incremented only by discrete amounts at different rates: 'source is incremented by L at the rate of fsource , whereas 'sink is incremented by M [m] at sink sample rate fsink . The loop filter following the phase detector must suppress the phase noise caused by all these sources. The design of the loop filter is a trade-off between the noise bandwidth of the filter and the dynamic behavior (fast tracking). We use a first order lowpass filter with lead-lag compensation to stabilize the full loop as proposed by [Sti92]. Gain [dB] 0 −50 −100 −1 10 0 10 1 10 Frequency [Hz] 2 10 3 10 Openloop.eps 108 73 mm Phase [Deg] −120 −150 −180 −1 10 0 10 1 10 Frequency [Hz] 2 10 3 10 Figure 4.7: Open-loop transfer characteristic of the DPLL The cutoff frequency of the lowpass filter can be adapted in four steps. Figure 4.7 shows the four open-loop transfer characteristics of the loop filter. The zero and pole of the lead-lag compensation are 1:5 decades apart. The resulting phase margin is 69 . The variable bandwidth of the loop filter allows to adapt the dynamic behavior of the DPLL. With a wide filter a fast pull-in and an immediate tracking for varying sample rates is accomplished. If the sample rates are stable over time the bandwidth of the loop is reduced resulting in an improved noise suppression. The integrator following the lead-lag section avoids discontinuities of the output phase when switching the bandwidth of the loop filter. In-lock Detector f sink ϕsource f source − K1, K2, K3 2K ∆ϕ + − + −+ ± :2 K3 Pll2.epsi ACCU 106 47 mm 2K 2 K1 M :2 K2 + Loop Filter + f sink ϕsink Figure 4.8: Adaptive digital PLL Figure 4.8 shows the loop filter and the surrounding blocks. Note that all multiplications can be realized as shift operations. The output of the phase detector is monitored to determine whether the PLL is locked. The bandwidth of the loop filter can then be reduced (parameter K 1; K 2; K 3), which results in a higher jitter rejection. The amount of jitter attenuation of the sampling clocks can be derived from the closed-loop transfer characteristic of the DPLL shown in Figure 4.12. E.g. sinusoidal jitter on a sampling clock at 50 Hz (400 Hz) is attenuated by 70 dB (105 dB) for the narrowest loop filter. 4.5 Non-uniform Resampling In the previous chapters we have treated the error introduced through the finite interpolation factor and the limited stopband attenuation of the interpolation/decimation filter. So far we have assumed ideal resampling of the interpolated signal. In this section we go a step further and estimate the error caused by the non-ideal frequency tracking. A comprehensive treatment of non-uniform sampling can be found e.g. in [Mar93]. [Ada93] studied the effects of clock jitter on the performance of various D/A converters. P M [m] mod L is restricted to discrete The calculated phase 'sink [m] = values due to the finite word-length of M . In the following the difference between the actual phase 'ideal [m] and the phase computed by hardware 'sink [m] is called 'q [m]. Note that 'q [m] does not only include the quantization of M , but also all other by-products of the non-ideal frequency tracking. Consider an ideal sine wave with amplitude A and frequency fsig , which is resampled at the (computed) sink sample moment. The resulting sink samples are y[m] = A sin ' [m] + 'q [m] ideal 2fsig L fsource (4.2) [Har90] approximated the spectrum of y [m] for a sinusoidal shape of 'q [m]. He interpreted Equation 4.2 as an FM modulation with the (ideal) sine wave as carrier and a modulation index of 2fsig =(L fsource ). The resulting spectrum is a sum of Bessel functions. The Bessel functions can be approximated for small modulation indices by simple functions [Fet90]. [Har90] supported his findings with simulations and measurements. We use a generalized approach, which leads to the same result for a sinusoidal shape of 'q , but is valid for any shape of 'q . Equation 4.2 can be modified for small modulation indices by using the trigonometric equations sin( + ) = sin cos + sin cos sin + cos with 1 (4.3) (4.4) Applying Equation 4.4 to Equation 4.2 yields y[m] = A sin 2fsig 'ideal [m] L fsource 2fsig 2fsig 'ideal [m] A 'q [m] + L fsource cos L fsource (4.5) 2 The error introduced by the above approximation (Eq. 4.4) is of order / ('q =L)2 and can therefore be neglected for large interpolation factors. The resulting signal consists of two orthogonal parts: the original sine wave and the phase deviation 'q multiplied by an attenuated cosine function. This multiplication in the time domain corresponds to a convolution in frequency domain. The spectrum of 'q is shifted by the cosine function to the same frequency as the original sine wave, but is attenuated by the factor 2fsig =(L fsource ) relative to the signal amplitude. Note that not the deviation of the decimation factor M [m] is modulated onto the signal, but its sum – the phase deviation 'q [m]. The attenuation is proportional to both the signal frequency and signal amplitude. Therefore the jitter attenuation is the same for a full-scale 1 kHz signal or a ,20 dB 10 kHz signal. Figure 4.9 shows sample traces for an uniform distribution of the quantization error of M [m] between ,1 and +1 LSB. The second trace shows the phase deviation 'q [m] of the sink sample moment, which is the integral of the first trace. The third trace shows the second summand – the error term – of Equation 4.5 for a sampling frequency of 48 kHz. Note that the error spectrum has a 1=f characteristic, since it is the integral over a signal with uniform distribution. The phase deviation is modulated onto the signal according to Equation 4.5. The lowest trace in Figure 4.9 shows the resulting spectrum for a sine wave at 997 Hz. Frequency Deviation [LSB_16] 2 0 −2 50 100 150 200 250 300 [Samples] 350 400 450 500 350 400 450 500 Phase Deviation [Deg] 0.05 0 −0.05 50 100 150 200 250 300 [Samples] Jitter.eps 109 124 mm [dB] Attenuated Spectrum of Phase Deviation −100 −120 −140 2 3 10 4 10 10 [Hz] [dB] Phase Deviation Modulated on 997 Hz Sine −100 −120 −140 2 3 10 10 4 10 [Hz] Figure 4.9: Simulation of distortion caused by uniform jitter of M 4.6 Performance Comparison In the previous section we have discussed the influence of non-ideal sink phase computation on the signal quality. In this section the performance of the frequency counter method and the DPLL are discussed. Measurements will be given in Section 7.1. To compare both solutions we distinguish three cases: synchronous, plesiochronous, and asynchronous sampling clocks. In the synchronous case source and sink sample rates are subharmonics of a common master clock. On the other hand, if source and sink sample rate are asynchronous no such common master clock exists. The two clocks are plesiochronous if they have the same nominal frequency, but are derived from different oscillators. In this case the two sampling clocks are then not phase-locked, and their frequencies may differ slightly due to tolerances of the oscillators. Plesiochronous conditions are often found at system interfaces. 4.6.1 Frequency Counter In this section the performance of the frequency counter (running at 50 MHz) for synchronous, plesiochronous, and asynchronous sample rates is discussed. The moving average filter which follows the counter (Fig. 4.3) averages the last 2k counter readings. The transfer characteristic of the filter is a SINC with the first zero crossing at fsample =2k and a decay of 20 dB/dec. Figure 4.10 shows the transfer characteristic for k = 7 and fsample = 48 kHz. Note that for these parameters only frequencies above 150 Hz are suppressed. 0 [dB] −20 Movingaverage.eps 108 36 mm −40 −60 1 10 2 3 10 10 4 10 [Hz] Figure 4.10: Spectrum of moving average filter Figure 4.11 shows traces of the distortion caused by the non-ideal frequency tracking of the frequency counter for synchronous, plesiochronous and asynchronous operation. Three subplots are given (corresponding to Figure 4.9 in the previous section): The deviation of M [m] in LSBs, the deviation of the phase 'q [m] and the error spectrum of the phase deviation according to Eq. 4.5. In the synchronous case the filter is not needed at all since the counter output is constant and no phase distortion results from the frequency tracking. In the asynchronous case the error energy is roughly equally distributed over the frequency band and the moving average filter reduces it by a factor 2k . All distortion products are below ,125 dB for the example given in Figure 4.11. In the plesiochronous case the error energy is concentrated at low frequencies and can not be suppressed by the moving average filter. In the example in Figure 4.11 the frequency difference between the quartz that is used to generate the source sampling clock and the quartz of the reference counter is 100 Hz. This difference frequency is not suppressed by the moving average filter, but can clearly be seen in the plots of the phase deviation for the plesiochronous case in Figure 4.11. Distortion products at multiples of 100 Hz are produced for the given example. The frequency counter approach is therefore well suited for the synchronous case. Both sample rates can exactly be measured by a counter running with the common master clock. If the counter frequency is asynchronous to the sampling clock, the counter readings alternate between two values which differ by one LSB of the counter. These counter readings can be averaged by a moving average filter to increase the precision of the frequency measurement. In the plesiochronous case the counter reading alternate also only between two values as in the asynchronous case, but these variations are at low frequencies which are not suppressed by the filter. Therefore distortion products with the difference frequency of the source and sink sample rate result. The performance can be improved by using a higher order moving average filter, which reduces the cutoff frequency, or by a different type of filter. For a cutoff frequency of 0.4 Hz (as with the digital PLL) a moving average filter of length 216 is required. A high-order moving average filter has the drawback that it tracks slower and needs more storage resources. A different type of filter with a low enough cutoff frequency must be followed by a controller to Synchronous Frequency Deviation [LSB_16] 2 0 −2 200 400 600 800 1000 [Samples] 1200 1400 1600 1800 2000 1400 1600 1800 2000 Phase Deviation [Deg] 1 sync.eps 92 47 mm 0 −1 200 400 600 800 1000 [Samples] 1200 [dB] Attenuated Spectrum of Phase Deviation −100 −120 −140 2 3 10 4 10 10 [Hz] Asynchronous Frequency Deviation [LSB_16] 2 0 −2 50 100 150 200 250 [Samples] x 10 [Deg] 300 350 400 450 500 350 400 450 500 Phase Deviation −3 6 4 2 0 −2 −4 async.eps 92 47 mm 50 100 150 200 250 [Samples] 300 [dB] Attenuated Spectrum of Phase Deviation −100 −120 −140 2 3 10 4 10 10 [Hz] Plesiochronous Frequency Deviation [LSB_16] 2 0 −2 200 400 600 800 1000 [Samples] 1200 1400 1600 1800 2000 1400 1600 1800 2000 Phase Deviation [Deg] 0.1 0 92 −0.1 200 400 600 800 ples.eps 47 mm 1000 [Samples] 1200 [dB] Attenuated Spectrum of Phase Deviation −100 −120 −140 2 3 10 10 4 10 [Hz] Figure 4.11: Simulation of frequency counter followed by moving average filter (fsample = 48 kHz, fsig = 997 Hz, k = 7) ensure that the phase calculated from the frequency ratio does not drift away from the effective phase (see page 53). These limitations can be overcome by the DPLL solution. 4.6.2 Digital PLL An architecture of the digital PLL has already been presented in Figure 4.8. The Gray counter used as input to the DPLL shows a similar output behavior as the frequency counter described in the previous section. The counter is incremented at the rate fsource , but the counter value is fetched with fsink . If source and sink sample rates are plesiochronous the output of the phase detector contains mostly low frequency components. For proper operation of the PLL this low frequency distortion must be suppressed by the loop filter. Gain [dB] 0 −50 −100 −1 10 0 10 90 0 Phase [Deg] 1 10 Frequency [Hz] Closedloop.eps 2 10 3 10 61 mm −90 −180 −1 10 0 10 1 10 Frequency [Hz] 2 10 3 10 Figure 4.12: Closed-loop transfer characteristic of the DPLL Figure 4.12 shows the closed-loop transfer characteristic of the DPLL for the four bandwidth of the loop filter (0.4, 1.9, 29, 480 Hz). If source and sink sample rates are stable the cutoff frequency can be stepwise reduced. This improves the performance in the plesiochronous case. For varying sample rates a wider filter allows fast tracking. Performance simulations similar to Figure 4.11 for M 216 showed that all distortion products are below ,125 dB for the DPLL solution. 5 Implementation In this chapter we will discuss alternatives for the implementation of a samplerate converter following the concepts presented in the previous chapters. Two different realization methods are analyzed: 1. Application specific integrated circuit (ASIC) We have implemented a prototype ASIC for 18-bit audio sample-rate conversion with arbitrary sampling frequency ratios [LS91, RF94]. We will refer to this implementation as SARCO. Alternatives, improvements and extensions to this architecture and realization are considered in this chapter. Performance measurements are given in Chapter 7. 2. Digital signal processor (DSP) Modern DSPs have powerful instruction sets to implement digital signal processing tasks. The choice of a programmable solution allows to tailor the implementation for each application, but is not cost effective when large volumes are required. We will estimate the necessary resources for integer DSPs with 24 24 bit multiplier as the members of the DSP5600x family [Mot88]. The dynamic range of a 24 bit datapath with 56 bit accumulation is well suited for high-quality audio applications. Figure 5.1 shows the basic building blocks of the sample-rate converter. The frequency tracking unit (Chap. 4) determines the ratio of source to sink 67 Frequency tracking Source clock Sink clock Frequency ratio Source samples BlockDiagramm.epsi High order digital 58 54 mm interpolation / decimation filter Sink samples Filter coefficients Figure 5.1: Block diagram sample rate while the interpolation/decimation filter does the actual conversion (Chap. 2). The third block represents the storage of the filter coefficients, which are interpolated from a stored subset as described in Chapter 3. 5.1 Frequency Tracking The implementation of the two methods for frequency tracking proposed in Chapter 4 – frequency counter and digital PLL – will be discussed here. 5.1.1 Frequency Counter A frequency counter which runs at the chip clock (50 MHz) according to Figure 4.4 with a moving average filter of length 128 has been implemented [MS93] for the prototype ASIC. The frequency tracking unit is connected to the filter datapath by a serial interface which is accessible from external pins. This concept allowed to analyze the output of the frequency tracking unit and to evaluate alternative solutions. Measurements showed a precision of 14bit for the worst case, where source and sink sample rate are plesiochronous. Inaccuracies in the measurement of the sample-rate ratio produce phase noise of the sink sampling moment, which is modulated onto the audio signal according to Equation 4.5 (Fig. 4.9). Since the error is proportional to the signal amplitude A and the signal frequency fsig , the distortion has the largest influence for signal frequencies close to fsample =2 with an amplitude near full-scale. The frequency counter of SARCO occupies an area 13:6 mm2 . 30% of this area is used for the RAMs of the moving average filter. 5.1.2 Digital PLL A digital PLL solution has been developed for frequency tracking in order to reach an accuracy of 20 bit for the ratio of source to sink sample rate. The basic architecture of the digital PLL has already been given in Figure 4.8. Since the loop filter is a recursive (IIR)-filter with poles close to the unit circle, large word-lengths are needed for the filter datapath to achieve the required 20-bit precision for M . Because of the feedback involved, the circuit must be carefully designed to guarantee stable operation and to avoid limit cycles due to quantization effects. Extensive simulations have been performed using a finite word-length model of the PLL. A minimal word-length of 44 bits is required for the accumulator. After the initialization during pull-in of the PLL the internal registers have not reached their stable values, which may lead to overflow in the accumulator. Therefore a limiter has been added after the phase detector, which saturates the output of the phase detector to avoid overflow. In-lock Detector f sink ϕsource f source 28 − Saturate 25 ∆ϕ’ 218 K1, K2, K3 + − + −+ ± PLL.epsi 105 44 mm 218 2 K1 :2 K3 ACCU + M 21 44 + :2 K2 ϕsink Figure 5.2: Implementation of digital PLL f sink ptr ∆ϕ 28 The in-lock detector adapts the bandwidth of the loop filter using the parameters K 1, K 2, and K 3 according to the output of the phase detector. If the output of the phase detector is below a certain limit during some time period the bandwidth is gradually reduced. If a large phase difference is detected the bandwidth of the loop filter is increased at once to allow fast tracking of sample-rate changes and to guarantee that the PLL stays locked. The PLL in Figure 5.2 has been realized in a Field-Programmable Gate Array (FPGA) having the same serial interface as SARCO. Unfortunately the interface of SARCO limits the frequency ratio to a precision of only 16 bits. Therefore an additional circuit has been added, which generates a 16 bit approximation of the 20 bit ratio M by controlling the ‘duty cycle’ of the least significant bit. This FPGA realization of the PLL is however well suited for a sample-rate converter with an (audio) precision of up to 20 bit. The datapath of the PLL implemented in a Xilinx XC4010 FPGA runs at 2 MHz, since only 7 additions/subtractions must be calculated for each sink sample. The frequency ratio M is transmitted over the serial interface running at 20 MHz. The XC4010 contains 400 configurable logic blocks (CLBs), which is claimed to be equivalent to 10 000 gate equivalents (GE). Each CLB provides two independent function generators of four inputs and two flip-flops. Considerable effort was required to fit the PLL into one FPGA, since all CLBs are occupied: 91 % of the CLBs are used for logic functions, 9 % as additional routing resources. To compare the complexity of the DPLL to the frequency counter solution, the die size for an ASIC realization of the PLL has been estimated for the same target technology as the frequency counter. The DPLL requires only 6:3 mm2 , which is about half the area of the frequency counter. The accuracy has been improved from 14 to 20 bit. Table 5.1 summarizes the technical data of both approaches. 2 Freq. counter DPLL [mm ] 13:6 6:3 ASIC [GE] 9 100 + RAM 5 900 FPGA [CLB] [GE] 400 10 000 Table 5.1: Implementation alternatives for frequency tracking 5.2 5.2.1 Datapath for FIR Filter Review of Operations In Chapter 2 we have explained the principle of sample-rate conversion for arbitrary ratios: It can be realized by a FIR filter of order Q with time-varying coefficients (Eq. 2.17). An equidistant subset of the Q L filter coefficients of the high-order interpolation/decimation filter is chosen for each output sample. The selection of the subset depends on the difference between source and sink sample phase ∆'[m]. Equation 2.17 is rewritten below for time-varying M [m] and %[m]. y[m] = %[m] + QD m [ 2 X ] j =, QD m [ h j L %[m] + ∆'D [m] x ptr[m] , j (5.1) ] 2 'sink [m] ptr[m] ∆'[m] %[m] QD [m] ∆'D [m] = = = 'sink [m , 1] + M [m , 1]; 'sink [0] = '0 'sink [m] div L 'sink [m] mod L = = = 1 UPMODE L M [m] DOWNMODE M [m] Q L %[m] ∆'[m] (5.2) (5.3) (5.4) (5.5) (5.6) (5.7) To clarify the required operations, a step-by-step description of the algorithm is given below. 1. Determine the new sink sample phase 'sink [m] according to the current sample-rate ratio M [m , 1] fetched from the frequency tracking unit (Eq. 5.2). Split the sink phase 'sink [m] into a lower part ∆'[m] (Eq. 5.4) which determines the sink phase relative to the source samples, and into an upper part ptr [m] (Eq. 5.3) which points to the source sample. Note that the values of 'sink [m], ptr [m], and ∆'[m] are directly available, if the frequency tracking is realized with a DPLL according to Figure 5.2. 2. If the sink sample rate is smaller than the source sample rate (DOWNMODE) the number Q of coefficients and the relative phase ∆'[m] are multiplied by M [m]=L and %[m] respectively (Eq. 5.6, 5.7) yielding QD [m] and ∆'D [m]. Note that since M [m] > L in DOWNMODE, % is smaller than 1. P 3. The ‘inner loop’ of the algorithm is of the form h[hadr] x[xadr]. The address of the source samples xadr [j ] = ptr [m] , j is incremented by one for each calculation. The spacing of the filter coefficients h is equidistant with spacing L (UPMODE) or %[m] L (DOWNMODE). Since only a fraction of the filter coefficients h are stored, the actual coefficients are to be interpolated in real-time. The coefficient address hadr has to be split into two parts: hadrHigh = hadr div L2 is used to access the stored FIR coefficients hFIR ; hadrLow = hadr mod L2 is used for the interpolation. The difference ∆h between successive coefficients needed for the interpolation can either be stored or calculated in real-time – both methods need two memory accesses. Multiplication Memory access h = hadrLow ∆h + h1 al = h xl + al ar = h xr + ar Address update xl = xLeft[xadr] xr = xRight [xadr] h1 = hFIR [hadrHigh ] hadr + = % L ∆h = hFIR [hadrHigh ] xadr ++ For each iteration of the inner loop a minimum of three multiplications, four memory accesses and two address updates must be calculated. 4. In DOWNMODE the accumulators al, ar must be multiplied by % due to the scaling of the filter cutoff frequency (Eq 2.15). Note that the number of repetitions of the inner loop is proportional to %. In UPMODE it is constant. For the extreme case fsink = fsource =2 (DOWNMODE) the number of loop repetitions doubles compared to the UPMODE. Since y [m] is calculated at the sink sample rate (which is fsource =2 in this case) the overall speed requirements for the inner loop in DOWNMODE are the same as for fsink = fsource. If the computation effort outside the inner loop is neglected, the time for one repetition of the inner loop must be below UPMODE DOWNMODE 1=(Q fsink ) 1=(Q fsource ) (5.8) 5.2.2 Finite Word-Length in Datapath In this section the implications of the finite word-lengths in the datapath on the signal quality are discussed. Figure 5.3 shows the basic structure for the filter calculation. ρ[m] h1, ∆h x[n] bx Circular Buffer xadr bx bh Datapath.epsi 95 59 mm bm M[m] hadr xl, xr Address Generator Scale Multiply Accumulate Unit ba by y[m] Figure 5.3: Datapath: word-lengths for FIR filter The source samples x[n] are stored in a circular buffer. The size of this buffer is mainly determined by the filter order QD . The allowed short-time variations of the sample rates requires (few) additional words. A larger buffer size is more tolerant to sample-rate variations, but increases the group delay of the filter. The address generator calculates the coefficient address sample address xadr according to Equation 5.1. hadr and the The product of the source samples xl, xr of word-length bx and the filter coefficients h1, ∆h of word-length bh is quantized to bm bits and accumulated. The word-length of the accumulator ba must be large enough to avoid overflow. If QFP coefficients are used (Sec. 3.5.1) the accumulator is renormalized by shifting when the exponent changes: block ‘Scale’ in Figure 5.3. As a last step the accumulator value is multiplied by %. For this last multiplication the same multiplier can be used as before, but the accumulator output must be quantized to bh bits. Note that this last multiplication also allows to alter the overall gain bx SARCO [RF94] AD1890 [Dev93] DSP5600X [Mot88] [bit] 18 20 24 bh [bit] 22 22 24 bm [bit] 22 25 48 ba [bit] 25 27 56 by [bit] 18 24 24 Quantization noise [dB] ,123 ,127 ,140 Table 5.2: Datapath width in bits ( = QFP) if desired. Table 5.2 shows sample word-length for three implementations. The influence of the finite word-length filter coefficients on the filter transfer function has already been shown in Section 3.5. The finite word-lengths in the datapath are the reason for several sources of quantization noise: 1. Quantization to bm bits after the multiplication in the inner loop 2. Quantization caused by the renormalization for QFP coefficients 3. Quantization to bh , 1 and bm , 1 bits before and after the multiplication with % 4. Output quantization to by bits The quantization noise – neglecting output quantization – is therefore for fixed-point coefficients qbh ,1 + qbm ,1 + Q qbm for QFP coefficients qbh ,1 + qbm ,1 +2qbm + 4qbm ,1 + 4qbm ,2 + 8qbm ,3 + : : : +qbm + qbm ,1 + qbm ,2 + qbm ,3 + : : : For the QFP coefficients we thereby assumed that two coefficients have exponent 0, four have exponent 1, four have exponent 2, etc. The resulting quantization noise for Q = 104 is given in the last column in Table 5.2. The error contribution of the datapath can therefore be neglected for the given examples for audio word-length up to 20 bit. 5.3 DSP Realization Considerations The operations reviewed in the previous section seem to be well suited for a DSP implementation, since a fast multiply-add unit and a large memory bandwidth are required. DSP solutions have been reported i.e. for the following implementations: [CDPS91] realized a sample-rate converter for fixed frequency ratios on a DSP56001 using B-splines. Two cascaded FIR interpolators each by a factor 2 are followed by a 6th order B-spline interpolation. 300 instruction cycles are needed per sink sample allowing real-time on a DSP56001 @ 33 MHz. [PHR91] implemented a 16-bit sample-rate converter for rational frequency ratios on a DSP56001. The interpolation/decimation filter has been synthesized by multiplying a SINC function with the Blackman-Harris window, as suggested by [Ram82]. The interpolation/decimation filter is of order N = 10239 using an interpolation factor L of 160. Therefore the number of repetitions Q of the inner loop is 63. Each repetition of the inner loop requires only two instruction cycles, since no coefficients are interpolated. Therefore only 126 instruction cycles are to be executed in the inner loop for each sink sample. Due to the real-time coefficient interpolation and the frequency tracking an implementation of our algorithm is more complicated than the examples shown above. The frequency tracking is not well suited for a DSP implementation due to the synchronization problems, but can be realized as FPGA (Sec. 5.1.2). Some problems of the filter implementation on a DSP56001 are pointed out below. The DSP56001 has two 256 24 bit on-chip data memories, a 512 24 bit program memory and many powerful addressing modes, but only one access to external memory per instruction cycle is possible. Due to the 24-bit data bus and 16-bit address bus the external memory space is limited to 64k 24 bit. The address ALUs, which allow auto-increment and modulo addressing, are limited to a 16-bit word-length. The data ALU has two 56-bit accumulators and allows single cycle multiply-accumulate from the X and Y register. The source samples x[:] and the program are stored in the on-chip (X-)RAM. A maximum of 64k filter coefficients can be stored in an external RAM, which must be fast enough for one cycle accesses. Since only two accumulators are available in the DSP56001, the ‘inner loop’ is split into two loops, which are executed one after the other: in the first loop the filter coefficients h are precomputed and stored in the internal (Y-)RAM; in the second loop the actual P filter function h x (al + = h xl, ar + = h xr) is calculated. In the first loop the coefficient address is updated: hadr + = % L, hadrHigh = hadr div L2 , hadrLow = hadr mod L2 , and the coefficients h1, ∆h are accessed from external memory. The actual filter coefficient is then interpolated h = hadrLow ∆h + h1 and stored in the internal memory. Figure 5.4 shows two realization alternatives: on the left hand side the DSP is used for all to above computations; on the right hand side the FPGA calculates the coefficient address update and addresses the RAM. This second alternative requires a larger FPGA, but needs less (DSP) instruction cycles. The second loop – an FIR filter with constant coefficients – is easy to implement in two MAC instruction cycles, since both operands are stored in the internal RAM. Frequency tracking XC4010 ρ[m], M[m] x[n] Filter Frequency hadr High tracking coefficients XC4013 RAM 64k×24 Filter coefficients RAM 64k×24 hadr High High order digital interpolation / decimation filter DSP56001 ρ[m], M[m] h1, ∆h h1, ∆h BlockDiagramm-DSPnew.epsi hadr Low 105 37 mm y[m] x[n] High order digital interpolation / decimation filter DSP56001 y[m] Figure 5.4: Block diagram using DSP and FPGA The author estimates that 5 - 7 instruction cycles are required for an actual implementation of the inner loop. A DSP5600x running at 40 (66) MHz allows to compute 400 (660) MAC/s at a rate of 50 kHz. If 6 instruction cycles are used for the inner loop, and if we assume that 90% of the computation time is spent in the inner loop, 60 (100) loop repetitions can be realized. A DSP/FPGA implementation of a sample-rate converter for arbitrary ratios seems therefore possible, but rather expensive if needed in large quantities. 5.4 5.4.1 ASIC Realization Implementation A VLSI implementation of a sample-rate converter (SARCO) has been realized in several semester and diploma theses [RW90, LS91, MS93]. This ASIC is realized in a standard cell and macrocell technique using a 1:2m double metal CMOS technology. It contains 230 000 transistors, including 14 kbit of static RAM and an 18 22 bit pipelined multiplier. The total die area is 6.4 mm 9.6 mm. Figures 5.5 and 5.6 show the floorplan and a photomicrograph of the chip. The architecture of a 16-bit sample-rate converter for arbitrary ratios and the high-order filter synthesis program have been developed in a first diploma thesis [RW90]. A sophisticated scheduling schema for the pipelined multiplieraccumulator has been elaborated, which allows to keep the pipeline fully loaded during the filter computation. Based on the results of this work, the specifications for a 18-bit sample-rate converter have been established. To verify the performance of the algorithm for the chosen parameters a software model of the sample-rate converter has been written and simulated [LS91]. In a second step this model has been partitioned into several functional units according to the planned chip architecture. An additional model has then been written for each functional block, which matched exactly the intended implementation e.g. concerning word-length of the datapath. This bit-level model allowed to simulate the quantization properties of the whole system. However, the integer model executed much faster than the bit-level model. The bit-level model has been used to generate stimuli and expected responses for the simulation of the VLSI circuit during development. This methodology allowed to verify each functional block separately as soon as it was implemented. Additional simulations have been performed using several functional blocks. Some key parameters of the actual implementation of SARCO are given below. A more detailed description of the filter implementation can be found in [LS91]. In [MS93] the design of the frequency counter is documented in detail. Frequency Tracking Arithmetic Unit FIR Address Generator Circular Buffer Left Registers Floorplan.epsi 108 70 mm Accu Frequency Measurement Unit Circular Buffer Right Pipelined Multiplier Moving Average RAM Circular Buffer Figure 5.5: Floorplan of SARCO Sarco-scan50dpi.eps 108 72 mm Figure 5.6: Photomicrograph of SARCO The (stereo) input samples are stored in two 256 18 bit circular buffers, for the left and right channel each. Special address generation units compute the required addresses for these sample buffers and the address of the filter coefficients. The heart of the digital filter architecture is a microprogrammed pipelined multiply-accumulate unit. A cycle time of 50 ns has been reached using three stage pipelining of the multiplier. The multiply-accumulate unit is used both for the coefficient interpolation and for the FIR filter calculation. The FIR filter coefficients of this prototype design are stored off-chip in the quasi floating point format (QFP): 22 bit mantissa, 4 bit exponent. The 25 bit wide accumulator is dynamically rescaled according to the exponent of the filter coefficient. However only subsequent shifts by either 1, 2 or 3 bits are implemented. A (total) interpolation factor L of 65536 is used (L1 = 256, L2 = 256). Therefore the required number of filter taps (Q) is 105 for the full and 85 for the relaxed specifications (Chapter 3). Each filter tap needs three multiplications: one for the coefficient interpolation and two for the FIR filter calculation of the stereo channels. Besides the inner loop, 30 additional cycles per output sample are required for initialization and postprocessing. To compute one sink sample, a total of 3 85 + 30 = 285 cycles (at 20 MHz) is needed. Therefore the maximum sample rate which can be processed is 70 kHz. SARCO is targeted for 18-bit performance. Therefore the input word-length is 18 bits and the output is rounded to 18 bits. Figure 3.8 showed that for the chosen parameters (L1 = 256, L2 = 256, s1 = ,137 dB) 18-bit performance for full-scale sine waves is only reached up to a signal frequency of 5 kHz. Improvements are suggested in the following section. 5.4.2 Discussion During the development and measurements of the sample-rate converter SARCO a deeper understanding of all side effects involved has been gained. The effects of the various parameters on the resulting signal quality have been discussed in detail in the previous chapters. The following improvements to the above implementation make full 20-bit performance possible. The frequency tracking unit realized on the ASIC reaches only a worst case accuracy of 14 bit. The digital phase-locked loop solution (realized as FPGA) replacing this unit allows to improve the resolution of the frequency tracking 0 −20 −40 −60 [dB] −80 Filter57855bit22QFP-256bit22QFP.eps 106 83 mm −100 −120 −140 −160 −180 0 0.05 0.1 0.15 0.2 0.25 [f/f sample] Figure 5.7: Interpolated FIR filter (N = 0.3 0.35 57855), L2 0.4 = 0.45 0.5 256, 22 bit QFP to 20 bit. As an additional benefit the required chip area used by this unit is reduced from 13.6 mm2 to 6.3 mm2 . The filter coefficients of SARCO are stored in an external memory. To allow one-cycle access, SRAMs buffered by a Lithium battery are used. To reduce the number of pins and the external components of the 20-bit version, the coefficients can be stored on-chip. The parameter Q, which defines the order of the interpolation/decimation filter of SARCO, is programmable for Q 85 (N 21 759). For full 20-bit performance a filter order of at least 57 855 is required. Since the impulse response is symmetric, and if the QFP notation is used a ROM size of 28 928 22 bit is sufficient. For the 1:2m technology of SARCO an additional die area of 26 mm2 is necessary. The number of filter coefficients can be substantially reduced by using higher order coefficient interpolation as outlined in Section 3.2.3.1. A second multiplier is then needed to allow coefficient interpolation in real-time. Figure 5.7 shows the spectrum of the interpolation/decimation filter of order 57 855 using an interpolation factor L2 = 256 and 22 bit QFP quantization. The average quantization noise level is at ,186 dB with a peak value of ,165 dB. The 22 bit QFP format is therefore well suited for a filter of order 57 855. The circular buffer for the source samples and the multiplier must be extended from 18 to 20 bit. This leads to an area increase of approximately 1 mm2 . The above enhancements from the (nearly) 18-bit implementation of SARCO to a full 20-bit implementation increases the chip area from 61.4 mm2 to 81 mm2 (+32%). To estimate the die area of the above 20-bit architecture in a 0.8 m technology the main blocks have been retargeted to this technology. Based on these results the author estimates that a die area of approximately 50 mm2 is sufficient. 6 Measurement Principle & Setup In this chapter the setup of an all-digital measurement system, which has been built for the characterization of the sample-rate converter is described. In addition to the digital measurements informal hearing tests have been performed using a CD-player as source. At a first glance it seems to be an easy task to measure the performance of a digital system with digital input and output using DSP-based equipment. However many specific particularities must be taken into account for accurate, consistent and relevant measurement of high-quality digital audio equipment. The measurement system is composed of generator and analyzer hardware, and a workstation (Fig. 6.1). The generator applies repeatedly in real-time a sequence of 24-bit audio samples to the device under test (DUT). The analyzer acquires the response samples, which are then processed, displayed and analyzed on a workstation using MATLAB [Mat88] – a high-performance numeric computation and visualization software. 6.1 Generator The source sample rate fsource is generated by dividing the frequency fgen of a quartz oscillator by U . This concept allows for a very stable low-jitter 83 f source Generator RAM 256k x 24 bit Digitally Generated (Sine) - Waves f sink DUT Source Samples Ssetup.epsi///Chap6Sink 103 52 mm Samples Workstation Analyzer RAM 256k x 24 bit Spectrum Analysis with MATLAB Figure 6.1: Setup of all-digital measurement system sampling clock source. With a 50 MHz quartz and sampling frequencies in the range of 48 kHz discrete frequencies with a resolution of 46 Hz can be generated. A large memory (256 k 24 bit) holds the source sample values and is read out at the source sample rate. Any sequence of source values of length up to the size of the memory can be loaded into the RAM and is then applied periodically to the DUT. This approach allows the use of dedicated test signals, like pseudo-random noise, artificially distorted sine waves, multiple frequency signals, as well as music signals of a length up to 6 s. The ratio of the test signal frequency to sampling frequency has a significant effect on the spectral content of the signal [HB86]. If the sampling frequency is a multiple of the test signal frequency then the quantization error components are located at the harmonics of the test signal. This is often undesirable since the true harmonic distortion should be separated from the quantization noise. If adequate dither is added prior to the quantization, the above effect is reduced, since dither randomizes the quantization process and results in a white-noise characteristic of the quantization noise [VL84, LWV92]. The same effect can be reached if the test signal frequency and the sampling frequency are chosen to be of non-integer (preferably close to irrational) ratio and the analysis is performed over several periods of the test signal. Using this method, the signal degradation caused by the added dither can be avoided. If the generator RAM contains LOOP values and NP periods of the test signal, the following equations hold: fsource = U1 fgen NP NP f fsig = LOOP source = LOOP U fgen (6.1) (6.2) NP should be chosen to be a (large) prime number to ensure that NP and LOOP have no common factors for reasonable choices of fsig and fsource . For fsource 48 kHz and fsig 997 Hz, i.e. the following parameters result (fgen = 50 MHz): U = 1 042; NP = 5 087; LOOP = 244 832 The resulting frequencies are: fsource = 47:985 kHz and fsig = 997:002 Hz. Using this method, only discrete but closely spaced test signal frequencies can be generated (the next larger signal frequency would be 997.006 Hz in the above example). It can be checked easily that the signal frequency fsig is not related to the other frequencies by a small integer ratio – neither to fsource nor to fgen . 6.2 Analyzer The analyzer is built in a fashion similar to the generator (Fig. 6.1). The analyzer quartz frequency fanl is divided by V to generate the sink sampling clock fsink . The sink sample rate can alternatively be derived from the generator quartz instead of the separate analyzer quartz to allow synchronous measurements. The samples are acquired at the sink sample rate with 24-bit precision and written into the analyzer RAM. The spectrum of the acquired samples is calculated using the FFT and displayed graphically on the workstation using MATLAB. The power spectrum of an N -point FFT consists of N discrete frequency values – so-called bins. Therefore the frequency resolution is limited to fsink =N . When calculating the FFT, it is assumed that the sequence of N points is repeated periodically. The discontinuities at the beginning and at the end of the sequence, which may result from this repetition, distort the spectrum. The two well-known methods to cope with this problem are given in the next two sections. 6.2.1 Frequency Matching for the FFT If the test signal frequency is chosen to lay on a bin, i.e. fsig = n Nfsink (6.3) the data sequence contains exactly n periods of the test signal and no discontinuity occurs at the boundary of the measurement interval. This technique is often used in A/D- and D/A-converter testing [Der87], but is not well suited for sample-rate conversion characterization, since it must be possible to choose the test signal frequency independently of the sink sample rate to comply with the requirements given in the previous section. In addition this method is only feasible if Equation 6.3 can be met exactly, i.e. if the exact sink sampling frequency is available. A small mismatch leads to spectral leakage discussed below. 6.2.2 Windowing for the FFT The second method to avoid the discontinuity at the border of the data sequence is to multiply the data by a window function w[i]. [Har78] compared many different windows and concluded that the Kaiser-Bessel and the BlackmanHarris window perform best. We use the 4-term Blackman-Harris window (BH4) and the Kaiser-Bessel window with = 6 (KB6), due to their high sidelobe attenuation. In Figure 6.2 the spectrum of the BH4-window is plotted. The highest sidelobes are at ,92 dB. The ,6-dB bandwidth of the window is 2.72 bins. Therefore signal frequencies which are at least 3 bins apart can be distinguished [Har78]. Note that if the signal frequency falls exactly on a bin of the FFT, the BH4 window is sampled at it’s (equidistant) zero-crossings and the effect of the window disappears. On the other hand, if the signal frequency falls in the middle of two bins, the BH4 window is sampled on the upper envelope in Figure 6.2. The parameter of the Kaiser-Bessel window allows a trade-off between mainlobe width and sidelobe attenuation. The Kaiser-Bessel window for = 6 is plotted in Figure 6.3. The ,6-dB bandwidth of the KB6 window is 3.31 bins, and the highest sidelobes are at ,145 dB. The window coefficients are Bessel functions and therefore the zero-crossings are not equidistant. 0 −20 −40 [dB] −60 BH4win.eps 89 69 mm −80 −100 −120 −140 −160 −80 −60 −40 −20 0 [bin] 20 40 60 80 Figure 6.2: Spectrum of 4-term Blackman-Harris window (BH4) 0 −20 −40 −60 [dB] −80 KB6win.eps 89 69 mm −100 −120 −140 −160 −180 −80 −60 −40 −20 0 [bin] 20 40 60 80 Figure 6.3: Spectrum of Kaiser-Bessel window with = 6 (KB6) What is the required accuracy of the measurement system for high-quality digital audio characterization? For a full-scale sine wave quantized with k bits the quantization noise floor is ,6:02 k , 1:76 dB below the signal level. Therefore the window sidelobes of the BH4 window at ,92 dB would be only a minor error for 16-bit measurements. However, this estimate is misleading, since the quantization noise is modified by the window function and distributed over N=2 bins of the FFT resulting in a lower noise level. Several measurements are usually averaged to obtain a uniform distribution of the noise over all bins. cg Two characteristic values of a specific window w[i] are its coherent gain and its equivalent noise bandwidth nbw [Har78]. cg = N 1 N X N X w[i]; nbw = N cg2 w2 [i] i=1 i=1 1 (6.4) The coherent gain cg is the average value of the window. To preserve the energy of the signal, the amplitude of the spectrum is divided by cg . The window function amplifies the noise level by its noise-bandwidth nbw. In a N -point FFT the average noise floor for k-bit quantization appears therefore at , 10 log N , 6:02 k , 1:76 dB 2 nbw (6.5) Table 6.1 gives examples values of Eq. 6.5 for the BH4 window (nbw = 2:004). The listed values must be increased by 1.0 dB for the KB6 window, due to its larger noise bandwidth (nbw = 2:492). N k =16 bit k =18 bit k =20 bit 1024 ,122:2 dB ,134:2 dB ,146:2 dB 16384 ,134:2 dB ,146:2 dB ,158:2 dB 65536 ,140:2 dB ,152:2 dB ,164:2 dB Table 6.1: Average noise floor level of a sine wave with k -bit quantization using a N -point FFT with BH4 window 6.2.3 Complex Windows In Sections 6.2.1 and 6.2.2 we have presented two methods to allow accurate spectrum analysis of high-quality audio signals. Both methods exhibit certain limitations: The frequency matching method requires the precise sampling frequencies to be available and the performance of the windowing method depends strongly on the chosen window. The BH4 window has its sidelobes as high as ,92 dB and the KB6 window needs coefficients which are expensive to calculate and allows only a limited frequency resolution, due to its wider mainlobe. To overcome the inaccuracies caused by the sidelobes of the BH4 window, the signal frequency must be chosen to fall exactly on a bin as mentioned above. To avoid this restriction we make use of the frequency transformation property of the Fourier transform. The signal frequency is determined by a first FFT and is then shifted slightly onto a bin (Eq. 6.6) without affecting the signal energy. s0 = s e,jΩt (6.6) We have called this combination of the complex correction and the window ‘complex BH4 window’ (BH4C) [RF94]. It is now also used in the newest version of the ‘Audio Precision One’ software. This method is only applicable for windows which have equidistant zero-crossings that can be matched on a bin (i.e. not to KB6 window). If the frequency measurement is not accurate enough remnants of the window will still be visible. Figure 6.4 shows the Hanning, the BH4, and the KB6 window with three different offsets. The BH4C window is a good compromise between the mainlobe width and the inaccuracies introduced by imprecise frequency measurement. The KB6 window is well suited for multi-tone measurements, where it is not possible (and not desirable) to match all frequencies on a bin. 6.3 Measurement Procedure This section concludes this chapter with the step-by-step description of the procedure used for the measurements in this thesis. Offset = 0.004 bin Offset = 0.500 bin 0 0 −20 −20 −20 −40 −40 −40 −60 −60 −60 −80 Wincompare.eps 89 72 mm −80 [dB] [dB] [dB] Offset = 0.000 bin 0 −80 −100 −100 −100 −120 −120 −120 −140 −140 −140 −160 −10 0 [bin] 10 −160 −10 0 [bin] 10 −160 −10 0 [bin] 10 Figure 6.4: Influence of (bin-)offset on window performance: BH4 ( Hanning (, ,), KB6 (, , ,) ), 1. The source and the sink sample rates are chosen. Since they are generated by dividing the frequency of the generator and the analyzer quartz respectively, parameters U , V are set. If a particular precise sample rate – such as 44.100 kHz – must be generated, an appropriate quartz with a multiple of this rate must be used. 2. The source samples are loaded from the computer into the generator RAM. A signal frequency according to Equation 6.2 should be chosen. If this is not possible dither must be added to the source signal. 3. N sink samples are acquired into the analyzer RAM and transferred to the computer. A first FFT is used to determine the signal frequency. The signal is then multiplied by the complex Blackman-Harris window and a second FFT is calculated in double precision using MATLAB. To average several (Avg ) measurements the signal acquisition is repeated Avg times. The acquired N sink samples are transformed into the frequency domain by the FFT and are then averaged. 4. The current standard for measurement of digital audio equipment [Cab91] states that the notch filter to suppress the signal frequency for ‘total harmonic distortion and noise’ (THD+N) measurements should have an ‘electrical Q’ between 1 and 5. We realized the notch filter in frequency domain, by setting the appropriate bins to zero. Two THD+N values are given in each plot: one for Q = 5 and one for a notch filter with a constant width of 16 bins (i.e. Q = 21 for fsig = 1 kHz, fsink = 48 kHz, N = 16k). According to the standard only noise components up to 20 kHz are taken into account. Therefore it is possible to get THD+N values which are slightly below the minimum for k-bit quantization. For further reference Figure 6.5 shows the spectrum of an (ideal) sine wave quantized with 18 bit. A 16k-FFT has been calculated using the BH4C window. The signal level is 0 dB, and the total harmonic distortion and noise is the same for the narrow and the standard (Q = 5) notch filter. 16 sequences have been averaged (Avg = 16). The slow rise of the noise floor towards low frequencies is a residual of the BH4C compensation. 16384 FFT, BH4C, 997Hz, 0dB, THD+N = −110.5dB; −110.5dB, Avg = 16 0 −20 −40 [dB] −60 Sine18BH4.eps 102 72 mm −80 −100 −120 −140 −160 2 3 10 10 [Hz] Figure 6.5: Sine wave quantized with 18 bit 4 10 7 Results In this chapter measurements of our VLSI implementation (SARCO) of the sample-rate converter are presented to verify the theoretical concepts developed in the previous chapters. Measurements of a commercial implementation are added for comparison purposes. Three major sources of signal degradation due to sample-rate conversion have been identified in this thesis: Non-uniform sampling due to finite precision frequency tracking Non-ideal reconstruction/anti-aliasing lowpass filter (including the hold operation, the coefficient interpolation and quantization, and the finite stopband attenuation) Quantization noise in the datapath The phase deviation of the computed sink sample moment caused by the finite precision frequency tracking and jitter of the sampling clocks results in a non-uniform resampling of the signal. The phase deviation is modulated onto the audio signal and leads to a wider signal peak. The level of the modulated error signal is proportional to both the signal frequency fsig and the signal amplitude A. Note however that jitter on the sampling clocks is attenuated by 93 the digital PLL and therefore the signal quality may even be improved by the sample-rate conversion. The zero-order hold operation and the coefficient interpolation lead to spurious signal peaks near the zero-crossings of the interpolation function spectrum ( SINC for hold operation, SINC 2 for linear interpolation). These peaks are folded back into the baseband by the resampling process. Note that the resulting filter transfer function is still strictly linear phase, since the filter coefficients are exactly symmetric even after interpolation and quantization. These error components caused by the interpolation and hold operation can easily be identified in the spectrum, since they appear always in pairs with (nearly) identical amplitude, but with a frequency difference of twice the signal frequency fsig . The amplitude of these peaks scales proportionally to the signal amplitude A. The error peaks caused by the hold operation are proportional to the signal frequency fsig , whereas the peaks caused by the coefficient interpolation scale by (fsig )2 . In Section 7.1 measurements of the distortion caused by the non-ideal frequency tracking are shown and discussed. In Section 7.2 the performance of the entire sample-rate converter is measured and error components caused by the non-ideal filtering are pointed out. Measurements of the error caused by the finite datapath width are not shown, since this quantization is masked by the 18 bit output quantization of SARCO. 7.1 Frequency Tracking Measurements The phase of the sink samples 'sink [m] is calculated starting from an initial phase '0 . It is incremented for each sink sample by the (measured) ratio of source and sink sample rate M [m] (Eq. 5.2, Fig. 4.2). Jitter of the source and sink sampling clock and the finite arithmetic precision of the frequency tracking unit lead to a (small) inaccuracy of the calculated sink sample moment. For the characterization in this section the output of the frequency measurement unit is acquired with the analyzer (Sec. 6.2). This measured data is used to sample an ideal, non-quantized sine wave at the corresponding sink sample moments. The spectrum of this non-uniformly sampled sine wave is then calculated and displayed. The following measurements base on source and sink sample rates derived from different quartz oscillators as described in Chapter 6. The sample-rate ratio M [m] is tracked by either the frequency counter integrated on SARCO (Sec. 5.1.1) or the digital PLL realized as FPGA (Sec. 5.1.2). 16 384 values of M [m] are acquired for different source and sink sample-rate pairs using the digital analysis system. Plots of the deviation caused by the non-ideal frequency tracking are shown on Pages 96 and 97. Each plot is subdivided into three traces. The top trace shows the deviation of M [m] from the average value in (16-bit) LSBs. The sink phase is computed as integral of M [m]. The phase deviation 'q [m] is shown in the middle plot. The bottom trace shows the (simulated) spectrum of an full-scale sine wave at 997 Hz, which is sampled non-uniformly with the measured phase deviation shown in the middle trace. The measurements on Page 96 correspond to the asynchronous sample rates fsource = 44:1 kHz and fsink = 48:0 kHz, whereas on Page 97 measurements for fsource = 48:0 kHz and fsink = 47:9 kHz are shown. This second case corresponds to plesiochronous sample rates with a difference frequency of 100 Hz. Note that the error distribution of frequency tracking measurements depends on the precise quartz frequencies of generator and analyzer, since their difference frequency may also appear in the measurements. 7.1.1 Frequency Counter Figures 7.1 and 7.4 show measurements of the performance of the frequency counter. The measured frequency ratio M [m] varies by up to 2 LSBs (16 bit) even though both sample rates are constant. The resulting phase noise contains both low-frequency and high-frequency components up to ,80 dB for a 1 kHz full-scale signal. The frequency and amplitude of the error components depend on the precise frequency of the source and sink sampling clocks and on the clock frequency of the counter. If the counting frequency is unrelated to the sampling frequency the amplitude of the error peaks in Figures 7.1 and 7.4 is reduced. 7.1.2 Digital PLL In Figures 7.3 and 7.6 performance measurements of the (20 bit) digital PLL are shown. Due to a limitation in the SARCO implementation the frequency ratio M [m] can only be used with a precision of 16 bits. For comparison Figures 7.2 and 7.5 show the performance of the PLL with M [m] reduced to 16 bit. Frequency Deviation [LSB_16] 2 0 −2 50 100 150 200 250 [Samples] 300 350 400 450 500 Phase Deviation [Deg] 0.5 PROTO-40-44-48.eps 95 48 mm 0 −0.5 2000 4000 6000 8000 [Samples] 10000 12000 14000 16000 Phase Deviation Modulated on 997 Hz Sine [dB] −100 −150 2 3 10 4 10 10 Figure 7.1: Frequency counter 16 bit, 44.1 kHz ! 48 kHz [Hz] Frequency Deviation [LSB_16] 2 0 −2 50 100 150 200 [Deg] 5 x 10 300 350 400 450 500 350 400 450 500 SARCO-44-48.eps 95 48 mm 0 −5 250 [Samples] Phase Deviation −3 50 100 150 200 250 [Samples] 300 Phase Deviation Modulated on 997 Hz Sine [dB] −100 −150 2 3 10 4 10 10 Figure 7.2: Digital PLL 16 bit, 44.1 kHz ! 48 kHz [Hz] Frequency Deviation [LSB_16] 0.1 0 −0.1 2000 4000 6000 8000 [Samples] 10000 12000 14000 16000 12000 14000 16000 Phase Deviation [Deg] 0 SARCO20-44-48.eps 95 48 mm −1 −2 2000 4000 6000 8000 [Samples] 10000 Phase Deviation Modulated on 997 Hz Sine [dB] −100 −150 2 3 10 10 4 10 Figure 7.3: Digital PLL 20 bit, 44.1 kHz ! 48 kHz [Hz] Frequency Deviation [LSB_16] 2 0 −2 50 100 150 200 250 [Samples] 300 350 400 450 500 Phase Deviation [Deg] 0.5 PROTO-40-48-47.eps 95 48 mm 0 −0.5 2000 4000 6000 8000 [Samples] 10000 12000 14000 16000 Phase Deviation Modulated on 997 Hz Sine [dB] −100 −150 2 3 10 4 10 10 Figure 7.4: Frequency counter 16 bit, 48 kHz ! 47.9 kHz [Hz] Frequency Deviation [LSB_16] 2 0 −2 50 100 150 200 250 [Samples] 300 350 400 450 500 Phase Deviation [Deg] 0.5 SARCO-48-47.eps 95 48 mm 0 −0.5 2000 4000 6000 8000 [Samples] 10000 12000 14000 16000 Phase Deviation Modulated on 997 Hz Sine [dB] −100 −150 2 3 10 4 10 10 Figure 7.5: Digital PLL 16 bit, 48 kHz ! 47.9 kHz [Hz] Frequency Deviation [LSB_16] 0.1 0 −0.1 2000 4000 6000 8000 [Samples] 10000 12000 14000 16000 12000 14000 16000 Phase Deviation [Deg] 0.5 SARCO20-48-47.eps 95 48 mm 0 −0.5 2000 4000 6000 8000 [Samples] 10000 Phase Deviation Modulated on 997 Hz Sine [dB] −100 −150 2 3 10 10 4 10 Figure 7.6: Digital PLL 20 bit, 48 kHz ! 47.9 kHz [Hz] The output of the PLL frequency tracking unit varies only by one LSB. Due to the lowpass characteristic of the PLL these variations are restricted to low frequencies. Figure 7.3 shows a measurement for fsource = 44:1 kHz and fsink = 48:0 kHz. During the measurement interval of 16k values the 20-bit LSB ( = 0:06 16-bit LSB) switches only once. Because the frequency ratio is in general not a power of 2 it cannot be exactly represented by a unique binary number, but it is approximated by an appropriate duty cycle of the LSB. Since the LSB of the frequency ratio changes only at low frequencies, its integral – the phase deviation – may reach large values as shown in the middle trace of Figure 7.3. This phase deviation leads to a widening of the signal spectrum below ,120 dB as shown in the bottom trace. Figure 7.2 shows the measurements corresponding to Figure 7.3, but with M [m] reduced to 16 bit. The duty cycle of the (16-bit) LSB is 15 : 1 as shown in the upper trace. The resulting phase deviation has a sawtooth shape with a frequency of approximately 48 kHz=15 = 3:2 kHz, which leads to spectral lines at i.e. (0:997 3:2) kHz for a sine wave at 997 Hz, as shown in the bottom trace of Figure 7.2. Note that the spectral line at (0:997 , 3:2) kHz is aliased to 2:2 kHz. In the plesiochronous case shown in Figures 7.5 and 7.6 the difference frequency of the sample rates (100 Hz) is not totally suppressed by the lowpass of the PLL, but only attenuated. This leads to sidebands at multiples of 100 Hz. These sidebands cause only a minor degradation of the audio quality, since they are all below ,110 dB and will additionally be masked by the full-scale 997 Hz signal. The performance of the digital PLL discussed in this section could even be improved by carefully dithering the computed sink phase 'sink to distribute the quantization error over the full spectrum. For many implementations the inherent dither caused by the jitter of the sampling clocks may be sufficient. 7.2 System Measurements In this section measurements of the performance of SARCO using the digital PLL (FPGA) for frequency measurement are shown. Additionally, measurements of a commercial realization (AD1890 [Dev93]) and of SARCO using the frequency counter are given in appendix D for comparison purposes. All the following measurements have been carried out using an undithered 18 bit input signal and follow the concepts outlined in Chapter 6. The source and the sink sample rate have been derived from separate quartz oscillators with the a nominal frequency of 50 MHz. 16 384 sink samples are acquired, multiplied by the complex 4-term Blackman-Harris (BH4C), and Fourier transformed. The spectra of 16 consecutive measurements are averaged (Avg = 16). The ‘notch filter’ for the THD+N calculations is realized in frequency domain either as a 16 bin wide brick-wall filter (THD+N = ,107:4 dB in Fig. 7.7) or by using a filter with an ‘electrical Q’ of 5 (THD+N = ,107:7 dB). 16384 FFT, BH4C, 996Hz, 0dB, THD+N = -107.4dB; -107.7dB, Avg = 16 0 -20 -40 -60 Sarco-44k-48k-1k.eps 102 72 mm -80 -100 -120 -140 -160 2 3 10 10 sin_44k_997_0dB_18_, ser_48k_16k_async, SARCO Figure 7.7: SARCO + FPGA: 0 dB 997 Hz, 44.1 4 10 ! 48.0 kHz Figure 7.7 shows the resulting spectrum for sample-rate conversion from 44.1 kHz to 48.0 kHz. The THD+N almost reaches the ideal value of a requantized 18-bit signal! The THD+N value is 3 dB above the ideal value (Fig 6.5, THD+N = ,110:5 dB) due to the 18-bit output quantization of SARCO. The error caused by the coefficient interpolation is below the noise floor for a signal frequency of 1 kHz (compare to Figure 3.8, L1 = L2 = 256, Eq. 3.6), but the hold operation causes error components up to ,128 dB (Eq. 3.10). For the measurement in Figure 7.7 they do not appear in the audio band, but if 16384 FFT, BH4C, 996Hz, 0dB, THD+N = -106.7dB; -107.2dB, Avg = 16 0 -20 -40 -60 Sarco-48k-44k-1k.eps 102 72 mm -80 -100 -120 -140 -160 2 3 10 10 sin_48k_997_0dB_18_, ser_44k_16k_async, SARCO 4 10 Figure 7.8: SARCO + FPGA: 0 dB 997 Hz, 48.0 ! 44.1 kHz 16384 FFT, BH4C, 15000Hz, 0dB, THD+N = -94.0dB; -101.1dB, Avg = 16 0 -20 -40 -60 Sarco-48k-44k-15k.eps 102 72 mm -80 -100 -120 -140 -160 2 3 10 10 sin_48k_15000_0dB_18_, ser_44k_16k_async, SARCO 4 10 Figure 7.9: SARCO + FPGA: 0 dB 15 kHz, 48.0 ! 44.1 kHz 16384 FFT, BH4C, 997Hz, 0dB, THD+N = -103.5dB; -107.8dB, Avg = 16 0 -20 -40 -60 Sarco-48k-47k-1k.eps 102 72 mm -80 -100 -120 -140 -160 2 3 10 10 sin_48k_997_0dB_18_, ser_47k_16k_sync, SARCO 4 10 Figure 7.10: SARCO + FPGA: 0 dB 997 Hz, 48.0 ! 47.9 kHz 16384 FFT, BH4C, 997Hz, -20dB, THD+N = -106.1dB; -106.5dB, Avg = 16 0 -20 -40 -60 Sarco-32k-31k-1k.eps 102 72 mm -80 -100 -120 -140 -160 2 3 10 10 sin_32k_997_-20dB_18_, par_31k_16k_async, SARCO 4 10 Figure 7.11: SARCO + FPGA: ,20 dB 997 Hz, 32.0 ! 31.96 kHz source and sink sample rates are exchanged they appear at frequencies around 20 kHz (Fig. 7.8). Since the error components caused by the hold operation are aliased from their original frequency k L fsource fsig into the baseband, their frequency depends sensitively on the exact values of fsource and fsink . The amplitude of the error caused by the hold operation scales linearly with the signal frequency. If the signal frequency is increased from 1 kHz to 15 kHz the error components rise by 24 dB (= 20 log 15=1). Figure 7.9 shows the resulting spectrum for a 15 kHz signal converted from 48 kHz to 44.1 kHz. For a signal frequency of 15 kHz the error components caused by the coefficient interpolation become also visible with an amplitude below ,113 dB (Eq. 3.9). Note that both error sources could be drastically reduced by increasing the interpolation factors L1 and L2 . Figure 7.10 shows the artifacts caused by the frequency tracking in the plesiochronous case. Sidebands with a modulation frequency of 100 Hz appear 113 dB below the signal peak. The phase noise caused by the non-ideal frequency tracking scales proportionally to the signal amplitude. Figure 7.11 shows the spectrum of a ,20 dB signal converted from 32 kHz to 31.96 kHz resulting in a reduced (absolute) amplitude of the modulated phase error. For plesiochronous sample rates derived from crystal oscillators small artifacts may result from the finite precision frequency tracking. However short-time jitter of the sampling clocks is suppressed by the lowpass characteristic of the PLL. In Figure 7.12 the closed-loop transfer function and thus the jitter attenuation is replotted from Figure 4.12. Clock jitter above 20 Hz is attenuated by 50 dB; jitter above 300 Hz even by over 100 dB. Jitter Gain [dB] 0 −50 Jitterreject.eps 108 36 mm −100 0 10 1 2 10 10 3 10 Frequency [Hz] Figure 7.12: Jitter rejection on sampling clocks To measure the jitter rejection capabilities of the sample-rate converter the source sampling clock fsource is generated by a signal generator which allows to modulate the output frequency by an external signal (Fig. 7.13). Source Samples Sample-rate converter JitterMeasure.epsi 74 34 mm D/A f source (jitter) f sink Signal Generator Figure 7.13: Jitter measurement set-up The source samples and the (jittered) sampling clock are fed to a D/Aconverter and the spectrum is displayed using a HP35670A dynamic signal analyzer. Figure 7.14 shows the spectrum of a 997 Hz signal with sinusoidal 50 Hz jitter modulated on the source sampling clock. In Figure 7.15 the spectrum of the sink signal is shown. The 50 Hz jitter is attenuated by 70 dB as expected from Figure 7.12. For the measurements in Figures 7.16 and 7.17 triangular 400 Hz jitter at ,40 dB was modulated on the source sampling clock. Since 400 Hz jitter is attenuated by more than 100 dB it is suppressed below the 18-bit noise floor in Figure 7.17. Additional jitter measurements are given in Appendix D. 7.3 Discussion In this chapter we have presented measurements of the frequency tracking unit and the entire sample-rate converter and identified the source of the different error components. In spite of all these non-idealities, the overall conversion quality is very high. A total harmonic distortion and noise (THD+N) value below ,106 dB (,94 dB) has been reached for 1 kHz (15 kHz) full-scale signals. For signals with jittered sampling clocks the quality is even improved! If we take into account the decreasing sensitivity of the human ear towards higher frequencies and the lower signal amplitude of realistic audio signals the performance decrease for audio frequencies above 10 kHz is of minor importance. Informal hearing test revealed no audible artifacts. As pointed out in Chapter 3 (Fig. 3.8), the conversion quality can be Source-jitter-50Hz-20dB.epsi 110 62 mm Figure 7.14: Analog source signal with ,20 dB 50 Hz sinusoidal jitter of the source sampling clock (expanded scale due to limited resolution of HP35670A Dynamic Signal Analyzer) 16384 FFT, BH4C, 997Hz, 0dB, THD+N = −87.4dB; −107.0dB, Avg = 16 0 −20 −40 [dB] −60 SARCO-jitter-50Hz-20dB.eps 102 72 mm −80 −100 −120 −140 −160 2 3 10 10 sin_44k_997_0dB_18_, ser_48k_16k_async, SARCO 4 10 Figure 7.15: SARCO + FPGA: 0 dB 997 Hz, 44.1 ! 48.0 kHz with 50 Hz sinusoidal jitter of the source sampling clock ,20 dB Source-jitter-400Hz-40dB.epsi 110 62 mm Figure 7.16: Analog source signal with ,40 dB 400 Hz triangular jitter of the source sampling clock (expanded scale due to limited resolution of HP35670A Dynamic Signal Analyzer) 16384 FFT, BH4C, 997Hz, 0dB, THD+N = −107.0dB; −107.5dB, Avg = 16 0 −20 −40 [dB] −60 SARCO-jitter-400Hz-40dB.eps 102 72 mm −80 −100 −120 −140 −160 2 3 10 10 sin_44k_997_0dB_18_, ser_48k_16k_async, SARCO 4 10 Figure 7.17: SARCO + FPGA: 0 dB 997 Hz, 44.1 ! 48.0 kHz with 400 Hz triangular jitter of the source sampling clock ,40 dB enhanced to 20 bit by increasing the interpolation factors L1 and L2 to 512 and 2048 respectively. −90 −95 −100 [dB] −105 Error2.eps 108 73 mm −110 −115 −120 −125 −130 100 1000 10000 [Hz] Figure 7.18: Total error contributions of (unquantized) interpolation/decimation filter for sinusoidal audio signals: SARCO ( ), SARCO18 (, ,), SARCO20 (, , ,) In Figure 7.18 the error contributions caused by the hold operation, the coefficient interpolation and the finite stopband attenuation are (re)plotted for three different sets of L1 and L2 : SARCO (L1 = L2 = 256); SARCO18 using the same basic filter, but with higher coefficient interpolation (L1 = 256, L2 = 1024); SARCO20 redesigned for full 20-bit performance. Note that the plots in Figure 7.18 do not include the degradation caused by the quantization of the filter coefficients and in the finite word-length datapath. 8 Conclusions In this thesis the concepts for fully digital sample-rate conversion of stereo audio signals between arbitrary sample rates have been covered. It has been shown that a converter with 20-bit audio quality over the full audio band is feasible. Conceptually, the process of converting the digital source signal into the analog domain by a D/A converter and resampling it at the sink sample rate by a A/D converter is imitated in the digital domain by a high-order interpolation/decimation process. The nearly ideal reconstruction filter, which transmits the baseband unaltered but suppresses totally the higher order images, can not be realized efficiently directly, but is approximated by a three-stage digital filter: high-order FIR filter, interpolator, and zero-order hold. A trade-off between the target quality, the memory requirements and the computational cost must be found for each implementation. It has been shown that digital lowpass filters of order up to 30 000 can be directly synthesized using a filter design program that has been optimized for (very) high-order filters. A 18-bit prototype ASIC for sample-rate conversion of audio signals by arbitrary ratios has been developed and fabricated in a standard cell and macrocell technique using a 1:2m double metal CMOS technology. It contains 230 000 transistors, including 14 kbit of static RAM and an 18 22 bit pipelined mul107 tiplier. To improve the conversion quality for plesiochronous sample rates a digital PLL has been implemented in an FPGA and interfaced to the ASIC. Short-time jitter of the sampling clocks is suppressed by the loop filter of the PLL. The adaptive bandwidth of the loop filter allows for vary-speed operation, where fast tracking is required. The ASIC is therefore well suited to synchronize an input with jittered sampling clock to the master clock of a digital audio system. In this work we concentrated on the effects caused by digital signal processing and identified the sources of the different error components, but did not evaluate the results concerning psycho-acoustic phenomena. However, informal hearing tests have been carried out. Non of the involved persons could distinguish the sound of original CD signal from the audio signal converted by the ASIC. An all-digital measurement system has been built for the characterization of the sample-rate converter in the digital domain. A generator applies in realtime a sequence of 24-bit audio samples to the circuit and the analyzer acquires 256k sink values, which are then Fourier transformed. Two windows have been found suitable for spectrum analysis of multirate signals: the complex 4-term Blackman-Harris window (BH4C) and the Kaiser-Bessel window for = 6 (KB6). Using these windows, accurate spectrum analysis of highquality signals is possible without constraining the allowed signal frequencies or sample rates. In this thesis the contribution of various non-idealities to the error in audio sample-rate conversion has been evaluated and listed. This is not a hint on the low quality of digital sample-rate conversion, but allows to customize sample-rate converters for each application according to the required quality and allowed cost. The principles of digital sample-rate conversion can be extended towards other applications as i.e. the conversion of rectangular pixels of a CCD-camera to square shape for pattern recognition applications. Appendix A List of Symbols and Integrals : : Continuous-time signal Discrete-time signal () [] Sample rate Source sample rate Sink sample rate Signal frequency Cutoff frequency Generator frequency Analyzer frequency fsample fsource fsink fsig fcutoff fgen fanl Impulse response Transfer characteristic Interpolation filter Decimation filter Adaptive filter h(:) H (:) HL HM HLM Passband ripple Stopband ripple Passband edge Stopband edge p s fp fs Modulo division Integer division Convolution ? Source samples Sink samples x[n] y[m] Interpolation factor Decimation factor Time grid L M tunit Initial phase Sink phase Phase difference '0 'sink [m] ∆'[m] SINC (x) sin(x)=(x) RECT (x) mod div 1 if jxj < 0:5 0 otherwise 111 The following definite integrals are used [GR94]: Z 1 sin(a x) a x dx 0 2 Z 1 sin(a x) dx ax 0 4 Z 1 sin(a x) dx a x 0 6 Z 1 sin(a x) dx ax 0 8 Z 1 sin(a x) dx ax 0 a [a > 0] = 1 2 = 1 2 = 1 3 = 11 40 = 151 630 a [a > 0] a [a > 0] a [a > 0] a [a > 0] (A.1) (A.2) (A.3) (A.4) (A.5) The following approximations are used [BK85]: x + O(x3) [x 1] sin(n x) x + O (x3) [x 1] cos(x) 1 + O (x2) [x 1] sin x (A.6) (A.7) (A.8) The following summation formulas are used [BK85]: 1 X 1 2 x=1 x 1 X 1 4 x=1 x 1 X 1 6 x=1 x 1 X 1 8 x=1 x = = = = 2 6 4 90 6 945 8 9450 (A.9) (A.10) (A.11) (A.12) Appendix B Error Caused by Hold Effect B.1 Sine Wave Ehold = for fsig 1 X j fi fsig ) !2 fi j fi fsig fi sin( j =1 (B.1) fi we get ffsigi Ehold 2 j j =1 1 X !2 fsig =2 fi 2 1 X 1 2 j =1 j (B.2) and finally, if we substitute fi by L fsource we get a total of 2 2 f sig Ehold 3 L f source (B.3) 113 B.2 White Noise Ehold = for fsig Ehold ffi (B.4) fi we get 1Z X ∆ f + 2 j =1 , f ∆ 2 = with ∆f !2 sin( ffi ) df ∆f 1Z 1 X j fi+ 2 ∆f ∆f j =1 j fi, 2 = ffi 1 j ∆f 1 (fi)2 ∆f !2 df = ∆f (f )2 1 i 2 ∆f 3 6 34 2 = 1 X j= ∆f Z ∆ 2 ∆ 2 (B.5) 2 fi 72 (B.6) fsource and fi = L fsource we finally get 2 Ehold 72 L2 B.3 f + 2 2 f df f j , 1 1 (B.7) Pink Noise Ehold 2 for fsig Ehold 2 1Z X j =1 j fi+20 000 1 6f j fi+20 !2 sin( ffi ) df ffi (B.8) fi we get 1 Z 20 000 X j =1 20 ffi 1 6f j !2 df = 6 (f )2 2 i 1 X 2 j j =1 1 Z 20 000 20 f df (B.9) 6 (f )2 2 i with fi = 2 1 (20 000)2 = 6 2 20 000 6 fi 2 (B.10) L fsource we finally get Ehold L f source 10 472 2 (B.11) Appendix C Error Caused by Linear Interpolation C.1 Sine Wave Ehold = for fsig 1 X j fi fsig ) !4 fi j fi fsig fi sin( j =1 (C.1) fi we get ffsigi Ehold 2 j j =1 1 X !4 fsig =2 fi 4 1 X 1 4 j =1 j (C.2) and finally, if we substitute fi by L fsource we get a total of 4 4 f sig Ehold 45 L f source (C.3) 117 C.2 White Noise 1Z X Ehold = for fsig Ehold j =1 !4 sin( ffi ) df ffi (C.4) fi we get 1 ∆f 1Z X = ∆ f + 2 j =1 , = with ∆f j fi+ ∆2f 1 j fi, ∆2f ∆f f ∆ 2 ∆f ffi j !4 df = ∆f (f )4 1 i 1 X Z ∆ f + 2 4 f df f j , 1 1 4 j= ∆ 2 (C.5) 4 4 ∆f = f 5 16 7200 i 4 ∆f 5 1 (fi)4 90 (C.6) fsource and fi = L fsource we finally get 4 Ehold 7200 L4 C.3 (C.7) Pink Noise Ehold 2 for fsig Ehold 2 1Z X j =1 j fi+20 000 1 6f j fi+20 !4 sin( ffi ) df ffi (C.8) fi we get 1 Z 20 000 X j =1 20 ffi 1 6f j !4 df = 6 (f )4 2 i 1 X j 1 j= 1 4 Z 20 000 20 f 3 df (C.9) 6 (f )4 2 i with fi = 4 1 (20 000)4 = 90 4 1 1080 20 000 fi 4 (C.10) L fsource we finally get Ehold L f source 10 960 4 (C.11) Appendix D Performance Measurements 121 16384 FFT, BH4C, 996Hz, 0dB, THD+N = -105.8dB; -106.1dB, Avg = 16 0 -20 -40 -60 AD1890-44k-48k-1k.eps 102 72 mm -80 -100 -120 -140 -160 2 3 10 10 sin_44k_997_0dB_18_, ser_48k_16k_async, AD1890 4 10 Figure D.1: AD1890: 0 dB 997 Hz, 44.1 ! 48.0 kHz 16384 FFT, BH4C, 996Hz, 0dB, THD+N = -95.3dB; -103.9dB, Avg = 16 0 -20 -40 -60 PROTO-44k-48k-1k.eps 102 72 mm -80 -100 -120 -140 -160 2 3 10 10 sin_44k_997_0dB_18_, par_48k_16k_async, PROTO 4 10 Figure D.2: SARCO: 0 dB 997 Hz, 44.1 ! 48.0 kHz 16384 FFT, BH4C, 996Hz, 0dB, THD+N = -107.4dB; -107.7dB, Avg = 16 0 -20 -40 -60 Sarco-44k-48k-1k.eps 102 72 mm -80 -100 -120 -140 -160 2 3 10 10 sin_44k_997_0dB_18_, ser_48k_16k_async, SARCO 4 10 Figure D.3: SARCO + FPGA: 0 dB 997 Hz, 44.1 ! 48.0 kHz 16384 FFT, BH4C, 996Hz, 0dB, THD+N = -105.8dB; -106.0dB, Avg = 16 0 -20 -40 -60 AD1890-48k-44k-1k.eps 102 72 mm -80 -100 -120 -140 -160 2 3 10 10 sin_48k_997_0dB_18_, ser_44k_16k_async, AD1890 4 10 Figure D.4: AD1890: 0 dB 997 Hz, 48.0 ! 44.1 kHz 16384 FFT, BH4C, 996Hz, 0dB, THD+N = -95.0dB; -104.3dB, Avg = 16 0 -20 -40 -60 PROTO-48k-44k-1k.eps 102 72 mm -80 -100 -120 -140 -160 2 3 10 10 sin_48k_997_0dB_18_, par_44k_16k_async, PROTO 4 10 Figure D.5: SARCO: 0 dB 997 Hz, 48.0 ! 44.1 kHz 16384 FFT, BH4C, 996Hz, 0dB, THD+N = -106.7dB; -107.2dB, Avg = 16 0 -20 -40 -60 Sarco-48k-44k-1k.eps 102 72 mm -80 -100 -120 -140 -160 2 3 10 10 sin_48k_997_0dB_18_, ser_44k_16k_async, SARCO 4 10 Figure D.6: SARCO + FPGA: 0 dB 997 Hz, 48.0 ! 44.1 kHz 16384 FFT, BH4C, 15000Hz, 0dB, THD+N = -95.1dB; -102.4dB, Avg = 16 0 -20 -40 -60 AD1890-48k-44k-15k.eps 102 72 mm -80 -100 -120 -140 -160 2 3 10 10 sin_48k_15000_0dB_18_, ser_44k_16k_async, AD1890 4 10 Figure D.7: AD1890: 0 dB 15 kHz, 48.0 ! 44.1 kHz 16384 FFT, BH4C, 15000Hz, 0dB, THD+N = -71.2dB; -94.8dB, Avg = 16 0 -20 -40 -60 PROTO-48k-44k-15k.eps 102 72 mm -80 -100 -120 -140 -160 2 3 10 10 sin_48k_15000_0dB_18_, par_44k_16k_async, PROTO 4 10 Figure D.8: SARCO: 0 dB 15 kHz, 48.0 ! 44.1 kHz 16384 FFT, BH4C, 15000Hz, 0dB, THD+N = -94.0dB; -101.1dB, Avg = 16 0 -20 -40 -60 Sarco-48k-44k-15k.eps 102 72 mm -80 -100 -120 -140 -160 2 3 10 10 sin_48k_15000_0dB_18_, ser_44k_16k_async, SARCO 4 10 Figure D.9: SARCO + FPGA: 0 dB 15 kHz, 48.0 ! 44.1 kHz 16384 FFT, BH4C, 997Hz, 0dB, THD+N = -104.7dB; -105.2dB, Avg = 16 0 -20 -40 -60 AD1890-48k-47k-1k.eps 102 72 mm -80 -100 -120 -140 -160 2 3 10 10 sin_48k_997_0dB_18_, ser_47k_16k_async, AD1890 4 10 Figure D.10: AD1890: 0 dB 997 Hz, 48.0 ! 47.9 kHz 16384 FFT, BH4C, 997Hz, 0dB, THD+N = -81.9dB; -103.6dB, Avg = 16 0 -20 -40 -60 PROTO-48k-47k-1k.eps 102 72 mm -80 -100 -120 -140 -160 2 3 10 10 sin_48k_997_0dB_18_, par_47k_16k_async, PROTO 4 10 Figure D.11: SARCO: 0 dB 997 Hz, 48.0 ! 47.9 kHz 16384 FFT, BH4C, 997Hz, 0dB, THD+N = -103.5dB; -107.8dB, Avg = 16 0 -20 -40 -60 Sarco-48k-47k-1k.eps 102 72 mm -80 -100 -120 -140 -160 2 3 10 10 sin_48k_997_0dB_18_, ser_47k_16k_sync, SARCO 4 10 Figure D.12: SARCO + FPGA: 0 dB 997 Hz, 48.0 ! 47.9 kHz 16384 FFT, BH4C, 997Hz, -20dB, THD+N = -109.3dB; -109.5dB, Avg = 16 0 -20 -40 -60 AD1890-32k-31k-1k.eps 102 72 mm -80 -100 -120 -140 -160 2 3 10 10 sin_32k_997_-20dB_18_, ser_31k_16k_async, AD1890 Figure D.13: AD1890: 4 10 ,20 dB 997 Hz, 32.0 ! 31.96 kHz 16384 FFT, BH4C, 997Hz, -20dB, THD+N = -89.2dB; -106.4dB, Avg = 16 0 -20 -40 -60 PROTO-32k-31k-1k.eps 102 72 mm -80 -100 -120 -140 -160 2 3 10 10 sin_32k_997_-20dB_18_, par_31k_16k_async, PROTO Figure D.14: SARCO: 4 10 ,20 dB 997 Hz, 32.0 ! 31.96 kHz 16384 FFT, BH4C, 997Hz, -20dB, THD+N = -106.1dB; -106.5dB, Avg = 16 0 -20 -40 -60 Sarco-32k-31k-1k.eps 102 72 mm -80 -100 -120 -140 -160 2 3 10 10 sin_32k_997_-20dB_18_, par_31k_16k_async, SARCO 4 10 Figure D.15: SARCO + FPGA: ,20 dB 997 Hz, 32.0 ! 31.96 kHz 16384 FFT, KB6 , 12000Hz, −6dB, THD+N = −6.0dB; −102.4dB, Avg = 16 0 −20 −40 [dB] −60 AD1890-twin.eps 88 73 mm −80 −100 −120 −140 −160 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 sin_44k_11000_0dB_18__twin, ser_48k_16k_async, AD1890 Figure D.16: AD1890: 2 4 x 10 ,6 dB 11 + 12 kHz, 44.1 ! 48.0 kHz 16384 FFT, KB6 , 11000Hz, −6dB, THD+N = −6.0dB; −92.3dB, Avg = 16 0 −20 −40 [dB] −60 PROTO-twin.eps 88 73 mm −80 −100 −120 −140 −160 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 sin_44k_11000_0dB_18__twin, ser_48k_16k_async, PROTO Figure D.17: SARCO: 2 4 x 10 ,6 dB 11 + 12 kHz, 44.1 ! 48.0 kHz 16384 FFT, KB6 , 11000Hz, −6dB, THD+N = −6.0dB; −106.7dB, Avg = 16 0 −20 −40 [dB] −60 SARCO-twin.eps 88 73 mm −80 −100 −120 −140 −160 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 sin_44k_11000_0dB_18__twin, ser_48k_16k_async, SARCO 2 4 x 10 Figure D.18: SARCO + FPGA: ,6 dB 11 + 12 kHz, 44.1 ! 48.0 kHz 16384 FFT, BH4C, 997Hz, 0dB, THD+N = −56.4dB; −66.9dB, Avg = 16 0 −20 −40 [dB] −60 AD1890-jitter.eps 102 72 mm −80 −100 −120 −140 −160 2 3 10 10 sin_44k_997_0dB_18_, ser_48k_16k_async, AD1890 4 10 Figure D.19: AD1890: 0 dB 997 Hz, 44.1 ! 48.0 kHz with wideband jitter of the source sampling clock 16384 FFT, BH4C, 997Hz, 0dB, THD+N = −37.5dB; −39.7dB, Avg = 16 0 −20 −40 [dB] −60 PROTO-jitter.eps 102 72 mm −80 −100 −120 −140 −160 2 3 10 10 sin_44k_997_0dB_18_, ser_48k_16k_async, PROTO 4 10 Figure D.20: SARCO: 0 dB 997 Hz, 44.1 ! 48.0 kHz with wideband jitter of the source sampling clock Source-jitter.epsi 110 63 mm Figure D.21: Analog source signal with wideband jitter of the source sampling clock (expanded scale due to limited resolution of HP35670A Dynamic Signal Analyzer) 16384 FFT, BH4C, 997Hz, 0dB, THD+N = −103.3dB; −106.8dB, Avg = 16 0 −20 −40 [dB] −60 SARCO-jitter.eps 102 72 mm −80 −100 −120 −140 −160 2 3 10 10 sin_44k_997_0dB_18_, ser_48k_16k_async, SARCO 4 10 Figure D.22: SARCO + FPGA: 0 dB 997 Hz, 44.1 ! 48.0 kHz with wideband jitter of the source sampling clock 16384 FFT, BH4C, 997Hz, 0dB, THD+N = −34.1dB; −81.8dB, Avg = 16 0 −20 −40 [dB] −60 AD1890-jitter-50Hz-20dB.eps 102 72 mm −80 −100 −120 −140 −160 2 3 10 10 sin_44k_997_0dB_18_, ser_48k_16k_async, AD1890 4 10 Figure D.23: AD1890: 0 dB 997 Hz, 44.1 ! 48.0 kHz with sinusoidal jitter of the source sampling clock ,20 dB 50 Hz 16384 FFT, BH4C, 997Hz, 0dB, THD+N = −15.8dB; −25.7dB, Avg = 16 0 −20 −40 [dB] −60 PROTO-jitter-50Hz-20dB.eps 102 72 mm −80 −100 −120 −140 −160 2 3 10 10 sin_44k_997_0dB_18_, ser_48k_16k_async, PROTO 4 10 Figure D.24: SARCO: 0 dB 997 Hz, 44.1 ! 48.0 kHz with ,20 dB 50 Hz sinusoidal jitter of the source sampling clock Source-jitter-50Hz-20dB.epsi 110 62 mm Figure D.25: Analog source signal with ,20 dB 50 Hz sinusoidal jitter of the source sampling clock (expanded scale due to limited resolution of HP35670A Dynamic Signal Analyzer) 16384 FFT, BH4C, 997Hz, 0dB, THD+N = −87.4dB; −107.0dB, Avg = 16 0 −20 −40 [dB] −60 SARCO-jitter-50Hz-20dB.eps 102 72 mm −80 −100 −120 −140 −160 2 3 10 10 sin_44k_997_0dB_18_, ser_48k_16k_async, SARCO 4 10 Figure D.26: SARCO + FPGA: 0 dB 997 Hz, 44.1 ! 48.0 kHz with 50 Hz sinusoidal jitter of the source sampling clock ,20 dB 16384 FFT, BH4C, 997Hz, 0dB, THD+N = −75.9dB; −75.9dB, Avg = 16 0 −20 −40 [dB] −60 AD1890-jitter-400Hz-40dB.eps 102 72 mm −80 −100 −120 −140 −160 2 3 10 10 sin_44k_997_0dB_18_, ser_48k_16k_async, AD1890 4 10 Figure D.27: AD1890: 0 dB 997 Hz, 44.1 ! 48.0 kHz with triangular jitter of the source sampling clock ,40 dB 400 Hz 16384 FFT, BH4C, 997Hz, 0dB, THD+N = −49.9dB; −50.4dB, Avg = 16 0 −20 −40 [dB] −60 PROTO-jitter-400Hz-40dB.eps 102 72 mm −80 −100 −120 −140 −160 2 3 10 10 sin_44k_997_0dB_18_, ser_48k_16k_async, PROTO 4 10 Figure D.28: SARCO: 0 dB 997 Hz, 44.1 ! 48.0 kHz with ,40 dB 400 Hz triangular jitter of the source sampling clock Source-jitter-400Hz-40dB.epsi 110 62 mm Figure D.29: Analog source signal with ,40 dB 400 Hz triangular jitter of the source sampling clock (expanded scale due to limited resolution of HP35670A Dynamic Signal Analyzer) 16384 FFT, BH4C, 997Hz, 0dB, THD+N = −107.0dB; −107.5dB, Avg = 16 0 −20 −40 [dB] −60 SARCO-jitter-400Hz-40dB.eps 102 72 mm −80 −100 −120 −140 −160 2 3 10 10 sin_44k_997_0dB_18_, ser_48k_16k_async, SARCO 4 10 Figure D.30: SARCO + FPGA: 0 dB 997 Hz, 44.1 ! 48.0 kHz with 400 Hz triangular jitter of the source sampling clock ,40 dB List of Figures ::::::: 2.2 Spectrum of example signal : : : : : : : : : : : : : : : 2.3 Pink noise S (f ) = 61f , white noise S (f ) = ∆1f : : : : : : 2.4 Spectrum of a sampled signal with sample rate fsample 2.5 Interpolation by a factor L : : : : : : : : : : : : : : : : : 2.6 Decimation by a factor M : : : : : : : : : : : : : : : : : 2.7 Sample-rate conversion by a fixed ratio M=L : : : : : : 2.8 Sample-rate conversion by a rational ratio M=L : : : : 2.9 UPMODE, fsink > fsource, L > M : : : : : : : : : : : : : 2.10 DOWNMODE, fsink < fsource, L < M : : : : : : : : : : : 2.11 Continuous-time representation y (t) by holding value inbetween samples xˆ [n] : : : : : : : : : : : : : : : : : : : 2.12 Sample-rate conversion by an arbitrary ratio M=L : : : 2.13 Interpolation by L and hold operation : : : : : : : : : : 2.1 D/A conversion followed by A/D conversion :::::::::: Weighted stopband error : : : : : : : : : : : : : : : : : Single stage FIR filter followed by hold operation : : : : 7 8 9 9 10 11 12 15 16 17 19 20 20 3.1 Filter with finite stopband attenuation 24 3.2 25 3.3 26 141 ::::::::: FIR filter followed by discrete-time linear interpolation : 3.4 FIR filter followed by linear interpolation 28 3.5 28 3.6 Transfer characteristic of FIR filter, linear interpolator, and hold operation : : : : : : : : : : : : : : : : : : : : : 29 3.7 FIR filter and linear interpolation followed by hold operation : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 29 3.8 Error contributions vs signal frequency for sinusoidal audio signals : : : : : : : : : : : : : : : : : : : : : : : : : 32 ::::::::::::: 3.10 FIR with interpolated impulse response : : : : : : : : : 3.11 Separate interpolation and decimation filters : : : : : : 3.12 Filter specifications : : : : : : : : : : : : : : : : : : : : 3.13 Graphical representation of Table 3.8 : : : : : : : : : : 3.14 Scaling of prototype filter by = 4 using zero padding : 3.15 Comparison of zero-padded prototype ( = 16) and direct synthesis : : : : : : : : : : : : : : : : : : : : : : : 3.16 Relaxed specifications with intermediate stopband : : : 3.17 Filter with relaxed specifications (fsample = 48 kHz) : : : 3.18 FIR filter, N = 26 879, 20 bit coefficients : : : : : : : : : 3.9 FIR filter with linear interpolator 33 34 35 38 41 42 43 44 44 45 3.19 Impulse response of FIR filter using fixed-point coefficients 46 3.20 Interpolated FIR filter, L2 :::::::::::::: 3.21 Quantization noise of interpolated FIR filter, L2 = 64, 22 bit : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 3.22 FIR filter, L2 = 64, quantized twice to 22 bit : : : : : : : 3.23 FIR filter, L2 = 64, quantized twice to 22 bit QFP : : : : = 64 : 47 48 50 50 4.1 Effects of non-uniform sampling on the signal spectrum :::::::::::::::::::::: 4.3 Frequency counter with PLL : : : : : : : : : : : : : : : 4.4 Frequency counter with high-speed reference clock : : 4.5 Digital PLL (DPLL) : : : : : : : : : : : : : : : : : : : : : 4.6 Synchronization with Gray counter : : : : : : : : : : : : 4.7 Open-loop transfer characteristic of the DPLL : : : : : 4.8 Adaptive digital PLL : : : : : : : : : : : : : : : : : : : : 4.9 Simulation of distortion caused by uniform jitter of M : 4.10 Spectrum of moving average filter : : : : : : : : : : : : 4.2 Phase relations 52 52 54 54 55 56 57 58 61 62 4.11 Simulation of frequency counter followed by moving average filter (fsample = 48 kHz, fsig = 997 Hz, k = 7) : : : 64 ::::: 65 4.12 Closed-loop transfer characteristic of the DPLL ::::::::::::::::::::::: Implementation of digital PLL : : : : : : : : : : : : : : : Datapath: word-lengths for FIR filter : : : : : : : : : : : Block diagram using DSP and FPGA : : : : : : : : : : : Floorplan of SARCO : : : : : : : : : : : : : : : : : : : : : Photomicrograph of SARCO : : : : : : : : : : : : : : : : Interpolated FIR filter (N = 57855), L2 = 256, 22 bit QFP 5.1 Block diagram 68 5.2 69 5.3 5.4 5.5 5.6 5.7 ::::::::: Spectrum of 4-term Blackman-Harris window (BH4) : : Spectrum of Kaiser-Bessel window with = 6 (KB6) : : 73 76 78 78 80 6.1 Setup of all-digital measurement system 84 6.2 87 6.3 87 :::: ::::::::::::: 6.4 Influence of (bin-)offset on window performance 90 6.5 Sine wave quantized with 18 bit 91 7.1 Frequency counter 16 bit, 44.1 kHz ! 48 kHz : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 7.2 Digital PLL 16 bit, 44.1 kHz ! 48 kHz : : : : : : 7.3 Digital PLL 20 bit, 44.1 kHz ! 48 kHz : : : : : : 7.4 Frequency counter 16 bit, 48 kHz ! 47.9 kHz 7.5 Digital PLL 16 bit, 48 kHz ! 47.9 kHz : : : : : : : 7.6 Digital PLL 20 bit, 48 kHz ! 47.9 kHz : : : : : : 7.7 SARCO + FPGA: 0 dB 997 Hz, 44.1 ! 48.0 kHz 7.8 SARCO + FPGA: 0 dB 997 Hz, 48.0 ! 44.1 kHz : : 7.9 SARCO + FPGA: 0 dB 15 kHz, 48.0 ! 44.1 kHz : 7.10 SARCO + FPGA: 0 dB 997 Hz, 48.0 ! 47.9 kHz 7.11 SARCO + FPGA: ,20 dB 997 Hz, 32.0 ! 31.96 kHz : 7.12 Jitter rejection on sampling clocks : : : : : : : : : : 7.13 Jitter measurement set-up : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 96 96 96 97 97 97 99 100 100 101 101 102 103 7.14 Analog source signal with ,20 dB 50 Hz sinusoidal jitter of the source sampling clock : : : : : : : : : : : : : : : 104 7.15 SARCO + FPGA: 0 dB 997 Hz, 44.1 ! 48.0 kHz with ,20 dB 50 Hz sinusoidal jitter of the source sampling clock : : : 104 7.16 Analog source signal with ,40 dB 400 Hz triangular jitter of the source sampling clock : : : : : : : : : : : : : : : 105 7.17 SARCO + FPGA: 0 dB 997 Hz, 44.1 ! 48.0 kHz with ,40 dB 400 Hz triangular jitter of the source sampling clock : : : 105 7.18 Total error contributions of (unquantized) interpolation/decimation filter for sinusoidal audio signals : : : : : : : : 106 D.1 AD1890: 0 dB 997 Hz, 44.1 ! 48.0 kHz ::::: D.2 SARCO: 0 dB 997 Hz, 44.1 ! 48.0 kHz : : : : : : D.3 SARCO + FPGA: 0 dB 997 Hz, 44.1 ! 48.0 kHz : D.4 AD1890: 0 dB 997 Hz, 48.0 ! 44.1 kHz : : : : : D.5 SARCO: 0 dB 997 Hz, 48.0 ! 44.1 kHz : : : : : : D.6 SARCO + FPGA: 0 dB 997 Hz, 48.0 ! 44.1 kHz : D.7 AD1890: 0 dB 15 kHz, 48.0 ! 44.1 kHz : : : : : D.8 SARCO: 0 dB 15 kHz, 48.0 ! 44.1 kHz : : : : : : D.9 SARCO + FPGA: 0 dB 15 kHz, 48.0 ! 44.1 kHz : D.10 AD1890: 0 dB 997 Hz, 48.0 ! 47.9 kHz : : : : : D.11 SARCO: 0 dB 997 Hz, 48.0 ! 47.9 kHz : : : : : : D.12 SARCO + FPGA: 0 dB 997 Hz, 48.0 ! 47.9 kHz : D.13 AD1890: ,20 dB 997 Hz, 32.0 ! 31.96 kHz : : D.14 SARCO: ,20 dB 997 Hz, 32.0 ! 31.96 kHz : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : D.15 SARCO + FPGA: ,20 dB 997 Hz, 32.0 ! 31.96 kHz : D.16 AD1890: ,6 dB 11 + 12 kHz, 44.1 ! 48.0 kHz : : : : D.17 SARCO: ,6 dB 11 + 12 kHz, 44.1 ! 48.0 kHz : : : : : D.18 SARCO + FPGA: ,6 dB 11 + 12 kHz, 44.1 ! 48.0 kHz : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 122 122 123 124 124 125 126 126 127 128 128 129 130 130 131 132 132 133 D.19 AD1890: 0 dB 997 Hz, 44.1 ! 48.0 kHz with wideband jitter of the source sampling clock : : : : : : : : : : : : : 134 D.20 SARCO: 0 dB 997 Hz, 44.1 ! 48.0 kHz with wideband jitter of the source sampling clock : : : : : : : : : : : : : : : 134 D.21 Analog source signal with wideband jitter of the source sampling clock : : : : : : : : : : : : : : : : : : : : : : : 135 D.22 SARCO + FPGA: 0 dB 997 Hz, 44.1 ! 48.0 kHz with wideband jitter of the source sampling clock : : : : : : : : : 135 D.23 AD1890: 0 dB 997 Hz, 44.1 ! 48.0 kHz with ,20 dB 50 Hz sinusoidal jitter of the source sampling clock : : : : : : : 136 D.24 SARCO: 0 dB 997 Hz, 44.1 ! 48.0 kHz with ,20 dB 50 Hz sinusoidal jitter of the source sampling clock : : : : : : : 136 D.25 Analog source signal with ,20 dB 50 Hz sinusoidal jitter of the source sampling clock : : : : : : : : : : : : : : : 137 D.26 SARCO + FPGA: 0 dB 997 Hz, 44.1 ! 48.0 kHz with ,20 dB 50 Hz sinusoidal jitter of the source sampling clock : : : 137 D.27 AD1890: 0 dB 997 Hz, 44.1 ! 48.0 kHz with ,40 dB 400 Hz triangular jitter of the source sampling clock : : : : : : : 138 D.28 SARCO: 0 dB 997 Hz, 44.1 ! 48.0 kHz with ,40 dB 400 Hz triangular jitter of the source sampling clock : : : : : : : 138 D.29 Analog source signal with ,40 dB 400 Hz triangular jitter of the source sampling clock : : : : : : : : : : : : : : : 139 D.30 SARCO + FPGA: 0 dB 997 Hz, 44.1 ! 48.0 kHz with ,40 dB 400 Hz triangular jitter of the source sampling clock : : : 139 List of Tables 2.1 Error caused by hold operation (fsource ::: 21 3.1 Transition bandwidth for common sampling frequencies 24 = 48 kHz) 3.2 Estimated filter order 27 3.3 ::::::::::::::::::: Error caused by linear interpolation (fsource = 48 kHz) : 28 3.4 Error contributions from interpolation and finite stopband attenuation (white noise signal) : : : : : : : : : : : : : : 31 :::::::::: Coefficient interpolation and filter order : Filter Requirements : : : : : : : : : : : Runtime and deviation depending on : 3.5 Estimated filter order 3.6 3.7 3.8 : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 34 35 39 41 3.9 Stopband attenuation depending on coefficient precision 46 ::: 49 3.10 Quantization noise caused by second quantization 5.1 Implementation alternatives for frequency tracking 70 5.2 Datapath width in bits ::: ::::::::::::::::::: 74 6.1 Average noise floor level of a sine wave with k -bit quantization using a N -point FFT with BH4 window : : : : : 88 147 Curriculum Vitae I was born in Thalwil, Switzerland, on August 24, 1965. After finishing high school at the Kantonsschule R¨amib¨uhl Z¨urich (Matura Typ C) in 1984, I enrolled in Electrical Engineering at the Swiss Federal Institute of Technology ETH Z¨urich. I received a M.Sc. in Electrical Engineering (Dipl. El.-Ing. ETH) in 1990 and joined then the Integrated Systems Laboratory of the ETH, where I worked as a research and teaching assistant in the ASIC design and test group. Besides the work on sample-rate conversion presented in this thesis, my second main project was the evaluation of template matching algorithms and the realization of an ASIC for real-time 2D correlation of gray-scale images for an industrial process automation system. 149 Bibliography [Ada93] Robert Adams. Jitter analysis of asynchronous sample-rate conversion. In Proceedings of the 95th Convention of the AES, Preprint # 3712. Audio Engineering Society, 1993. [AK93] Robert Adams and Tom Kwan. Theory and VLSI architectures for asynchronous sample-rate converters. Journal of the Audio Eng. Soc. (AES), 41(7/8):539 – 555, July/August 1993. [Ant93] A. Antoniou. Digital Filters: Analysis, Design, and Applications. McGraw-Hill Book Company, second edition, 1993. [AW85] John W. Adams and Alan N. Willson. On the fast design of highorder FIR digital filters. IEEE Transactions on Circuits and Systems, 32(9):958 – 960, September 1985. [Ben88] K. Blair Benson. Audio engineering handbook. McGraw-Hill Book Company, first edition, 1988. [Bes93] Roland E. Best. Phase-locked loops. McGraw-Hill Book Company, second edition, 1993. [BK85] I.N. Bronstein and K.A.Semendjajew. Taschenbuch der Mathematik. Verlag Harri Deutsch, 22nd edition, 1985. [Cab91] Richard Cabot. AES standard method for digital audio engineering – measurement of digital audio equipment. Journal of the Audio Eng. Soc. (AES), 39(12):962 – 975, December 1991. [CDPS91] S. Cucchi, F. Desinan, G. Parladori, and G. Sicuranza. DSP implementation of arbitrary sampling frequency conversion for high quality sound applications. In Proc. of the Int. Conf. on Acoustics, Speech and Signal Processing, pages 3609 – 3612, Vol. 5, Toronto, 1991. 151 [CR81] Ronald E. Crochiere and Lawrence R. Rabiner. Interpolation and decimation of digital signals - a tutorial review. In Proceedings of the IEEE, pages 300–331, 1981. [CR83] Ronald E. Crochiere and Lawrence R. Rabiner. Multirate Digital Signal Processing. Prentice Hall, Englewood Cliffs, New Jersey 07632, 1983. [dD89] Richard C. den Dulk. An approach to systematic phase-lock loop design. PhD thesis, TU Delft, 1989. [Der87] F. Deravi. Design issues in FFT test systems for high performance converters. Colloquium on ’Advanced A/D Conversion Techniques’, 15. April 1987. [Dev93] Analog Devices. Stereo asynchronous sample rate converters AD1890/AD1891. Datasheet, 1993. [dGL94] R. de Gaudenzi and M. Luise. Audio and video digital radio broadcasting systems and techniques. Elsevier science B.V., Amsterdam, 1994. [Fet90] A. Fettweis. Elemente nachrichtentechnischer Systeme. B. G. Teubner, Stuttgart, 1990. [GR94] I.S. Gradshteyn and I.M. Ryzhik. Table of integrals, series, and products. Academic Press, 1994. [Har78] Francis J. Harris. On the use of windows for harmonic analysis with the discrete fourier transform. Proceedings of the IEEE, 66(1):51 – 83, January 1978. [Har90] Steven Harris. The effects of sampling clock jitter on nyquist sampling analog-to-digital converters, and on oversampling delta-sigma ADCs. Journal of the Audio Eng. Soc. (AES), 38(7/8):537 – 542, July 1990. [HB86] Joel M. Halbert and R. Allan Belcher. Selection of test signals for DSP-based testing of digital audio systems. Journal of the Audio Eng. Soc. (AES), 34(7/8):546 – 555, July 1986. [JTG+ 94] J. Janssen, D. Therssen, J. Van Ginderdeuren, L. Van Paepegem, P. Van Lierop, Z.L. Wu, and H. Verhoeven. A new principle/IC for audio sampling rate conversion. In Proceedings of the 96th Convention of the AES, Preprint # 3807. Audio Engineering Society, 1994. [Lag82] Roger Lagadec. Digital sampling frequency conversion. In Digital Audio, Collected Papers from the AES Premiere Conference, New York, pages 90 – 96. Audio Engineering Society, 1982. [LK81] Roger Lagadec and Henry O. Kunz. An universal, digital sampling frequency converter for digital audio. In Proc. of the Int. Conf. on Acoustics, Speech and Signal Processing, pages 595 – 598, Vol. 2, Atlanta, 1981. [LPW82] Roger Lagadec, Daniele Pelloni, and Daniel Weiss. A 2-channel, 16-bit digital sampling frequency converter for professional digital audio. In Proc. of the Int. Conf. on Acoustics, Speech and Signal Processing, pages 93 – 96, Vol. 1, Paris, 1982. [LS91] Gunnar Lehtinen and Andreas Steiner. Digital audio sampling converter. Master’s thesis, Integrated Systems Lab, Swiss Federal Institute of Technology (ETH Z¨urich), 1991. [LWV92] Stanley P. Lipshitz, Robert A. Wannamaker, and John Vanderkooy. Quantization and dither: A theoretical survey. Journal of the Audio Eng. Soc. (AES), 40(5):355 – 375, May 1992. [Mar93] Robert J. Marks II. Advanced Topics in Shannon Sampling and Interpolation Theory. Springer-Verlag New York, 1993. [Mat88] Matlab. Matlab user’s manual. MathWorks, Inc., Natick, MA, 1988. [Mot88] Motorola. DSP56000 digital signal processor user’s manual, 1988. [MPR73] J.H. McClellan, T.W. Parks, and L.R. Rabiner. A computer program for designing optimum FIR linear phase digital filters. IEEE Trans. Audio Electroacoust., AU-21(6):506 – 526, December 1973. [MS93] D. M¨uller and Ch. Siegrist. Frequency measurement for audio sampling rate conversion. Semesterarbeit, 1993. Integrated Systems Lab, Swiss Federal Institute of Technology (ETH Z¨urich). [Pel82] Daniele Pelloni. Der einstufige Abtastratenumsetzer. Unpublished, 1982. [PHR91] Sangil Park, Garth Hillman, and Roman Robles. A novel structure for real-time digital sample-rate converters with finite precision error analysis. In Proc. of the Int. Conf. on Acoustics, Speech and Signal Processing, pages 3613 – 3616, Vol. 5, Toronto, 1991. [Rab73] L.R. Rabiner. Approximate design relationships for lowpass FIR digital filters. IEEE Trans. Audio Electroacoust., AU-21:456 – 460, October 1973. [Ram82] Tor A. Ramstad. Sample-rate conversion by arbitrary ratios. In Proc. of the Int. Conf. on Acoustics, Speech and Signal Processing, pages 101 – 104, Vol. 1, Paris, 1982. [Ram84] Tor A. Ramstad. Digital methods for conversion between arbitary sampling frequencies. IEEE Transactions on Acoustics, Speech, and Signal Processing, 32(3):577 – 591, June 1984. [RF94] F. Rothacher and N. Felber. VLSI implementation of a fully digital asynchronous audio sample-rate converter. In Proceedings of the 96th Convention of the AES, Preprint # 3832. Audio Engineering Society, 1994. [RG75] Lawrence R. Rabiner and Bernard Gold. Theory and Application of Digital Signal Processing. Prentice Hall, Englewood Cliffs, New Jersey 07632, 1975. [RW90] Fritz Rothacher and Andreas Wieland. DIDI: Audio samplingfrequency converter for arbitrary changing ratios. Master’s thesis, Integrated Systems Lab, Swiss Federal Institute of Technology (ETH Z¨urich), 1990. [Sti92] Eduard F. Stikvoort. Some Subjects in Digital Audio, Noise Shaping, Sampling-Rate Conversion, Dynamic Range Compression and Testing. PhD thesis, TU Eindhoven, 1992. [VL84] John Vanderkooy and Stanley P. Lipshitz. Resolution below the least significant bit in digital systems with dither. Journal of the Audio Eng. Soc. (AES), 32(3):106 – 113, March 1984. Corrections in JAES 32(11):889.
© Copyright 2025