Block-Processing Approach to Fractional Sample Rate Conversion with Adjustable Timing

Alexandra Groth, Heinz G. Göckler
Digital Signal Processing Group, Ruhr-Universität Bochum
Universitätsstr. 150, D-44800 Bochum, Germany
[email protected]

ABSTRACT

In the past, various FIR block-processing structures (e.g. for fractional sample rate conversion) have been published. However, the timing interval of all proposed structures is constrained to integer multiples of $T_s = M T_i$ ($T_i$: timing interval of the input signal). Increasing $T_s$ by a factor $p \in \mathbb{N}$ concurrently raises the number of multipliers by the same factor $p$. In this paper, two novel structures are proposed that allow more freedom in choosing the timing interval. As a result, a refined trade-off between timing interval (system clock) and hardware expenditure (chip area) is possible.

INTRODUCTION

The reduction of power consumption in digital systems is of increasing interest, especially in satellite communications. One common method is the power supply voltage scaling technique [1]. To this end, the system has to be clocked at a multiple of the prescribed sampling interval. Hence, parallel processing is of prime importance for this purpose as well as for overcoming technological speed constraints.

[Figure: input $X(z_i)$ at rate $f_i$, L-fold upsampler, filter $H(z)$ operating at $L f_i = M f_o$, M-fold downsampler, output $Y(z_o)$ at rate $f_o$.]
Fig. 1. Fractional sample rate converter (FSRC).

As a general example, the derivation of efficient block-processing structures for fractional sample rate converters (FSRC) is revisited. The system theoretic approach is a cascade connection of an L-fold upsampler, an FIR filter $H(z)$ of length $N$ and an M-fold downsampler (Fig. 1). As a consequence, all filter operations have to be performed at the highest rate $L f_i = M f_o$.

[Figure: 1-to-pM blocking (input delay chain with pM-fold downsamplers), pL × pM MIMO LTI system, pL-to-1 unblocking (pL-fold upsamplers with output delay chain).]
Fig. 2. Block-processing structure operating with a timing interval $pT_s = pMT_i$.
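The system theoretic cascade of Fig. 1 can be sketched in a few lines of Python. This is only an illustrative sketch, not one of the structures proposed in the paper: the function names `fsrc_reference` and `fsrc_direct` are hypothetical, no additional delay is assumed (r = 0), and the output is truncated to the length of the upsampled input. The second function evaluates only the retained output samples and skips the inserted zeros, which shows how many of the high-rate operations in the cascade are superfluous.

```python
def fsrc_reference(x, h, L, M):
    """Cascade of Fig. 1: L-fold upsampler, FIR filter h, M-fold downsampler."""
    # L-fold upsampling: insert L-1 zeros after every input sample.
    up = []
    for sample in x:
        up.append(float(sample))
        up.extend([0.0] * (L - 1))
    # FIR filtering at the high rate L*f_i = M*f_o (truncated to len(up)).
    filtered = []
    for t in range(len(up)):
        acc = 0.0
        for tap, coeff in enumerate(h):
            if t - tap >= 0:
                acc += coeff * up[t - tap]
        filtered.append(acc)
    # M-fold downsampling: keep every M-th high-rate sample.
    return filtered[::M]


def fsrc_direct(x, h, L, M):
    """Same result, computing y[k] = sum_n x[n] * h[k*M - n*L] directly."""
    n_out = (len(x) * L + M - 1) // M  # number of retained output samples
    y = []
    for k in range(n_out):
        acc = 0.0
        for n, sample in enumerate(x):
            idx = k * M - n * L  # coefficient index hit by input sample n
            if 0 <= idx < len(h):
                acc += sample * h[idx]
        y.append(acc)
    return y


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5, 6]
    h = list(range(1, 13))  # arbitrary integer test coefficients, N = 12
    print(fsrc_reference(x, h, L=4, M=3))
    print(fsrc_direct(x, h, L=4, M=3))
```

Both functions return the same sequence; the direct form merely avoids multiplications by inserted zeros and the computation of discarded samples, which is the starting point for the efficient structures discussed in the paper.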
Recently, various block-processing approaches to FSRC have been published [2-7]. As a result, efficient structures (Fig. 2) are known in which filtering is performed entirely at a fixed sub-Nyquist rate $f_s/p = f_i/(pM)$. Hence, the timing interval is increased by a factor $pLM$, while the number of multipliers rises to $pN$ (Fig. 3).

[Figure: number of multipliers (N, 2N, 3N, 4N) versus timing interval (multiples of $T_i$, $T_o$ and $T_s$) for the system theoretic approach, the current block-processing structure, new structure 1 and new structure 2.]
Fig. 3. Hardware expenditure (ideal result, L=4 and M=3).

Since these admissible timing intervals are often a very poor approximation of the optimal timing interval dictated by the semiconductor fabrication process, the aim of this contribution is to deduce structures that allow more flexibility in the choice of timing. As a consequence, the number of required multipliers and adders can be reduced in compliance with the relationship between timing interval and hardware expenditure (Fig. 3).

[Figure: time-varying coefficient sets $[h_0\ h_{L-1}\ \dots\ h_1]$, $[h_L\ h_{2L-1}\ \dots\ h_{L+1}]$, ..., $[h_{N-L}\ h_{N-1}\ \dots\ h_{N-L+1}]$, input delays $z_i^{-1}$, M/L rate changers, digital hold and sample.]
Fig. 4. Fractional sample rate converter operating at $f_o$ ($M>1$, $N/L \in \mathbb{N}$) [8].

A previous proposal for an FSRC operating at the sample rate $f_o$ (Fig. 4) has been presented in [8] (a transposed version operating at $f_i$ is given in [9]). Since this structure is not optimal with respect to write access to the digital sample and hold, and since no versions with timing interval $\alpha T_o$, $\alpha \in \mathbb{N}\setminus\{1\}$, exist, efficient structures with timing intervals $\alpha T_o$ and $\alpha T_i$ are deduced in Section 3. Beforehand, efficient interpolators and decimators are reviewed (Section 2).

EFFICIENT TIME-VARYING REALIZATION OF INTERPOLATORS AND DECIMATORS

Starting from the system theoretic approach, we derive efficient structures for an interpolator and a decimator, both operating at the highest occurring sampling rate.

Interpolator

First let us consider the system theoretic approach to integer-factor interpolation (Fig. 5a).
Due to the L-fold upsampling, only every L-th filter input sample gives rise to a non-zero multiplication. A structure avoiding these superfluous multiplications, deduced from a polyphase interpolator [10], is depicted in Fig. 5b. By sharing (time-multiplexing) each multiplier between L coefficients, this structure requires $\lceil N/L \rceil$ hardware multipliers and $\lceil N/L \rceil - 1$ adders, with all hardware being operated at the output sampling rate $f_o$ except for the $\lceil N/L \rceil - 1$ delays. As a consequence, this interpolator represents a time-varying filter with the reduced time interval $T_o$ for memory access.

[Figure: a.) L-fold upsampler followed by $G(z_o)$; b.) input delay chain $z_i^{-1}$ with multipliers time-multiplexed over the coefficient sets $g_0, g_1, \dots, g_{L-1}$; $g_L, g_{L+1}, \dots, g_{2L-1}$; ...; up to $g_{N-1}$, operating at $f_o$.]
Fig. 5. a.) Interpolator (system theoretic approach) b.) Efficient realization operating at $f_o$

Starting from a transposed direct form realization of the filter $G(z_o)$, we obtain Fig. 5c. In contrast to the previous realization, this structure is inefficient due to the $(\lceil N/L \rceil - 1)L$ delays operating at the high output sample frequency $f_o$.

[Figure: transposed structure with time-multiplexed coefficients $g_0, \dots, g_{N-1}$ and delays $z_o^{-L}$ operating at $f_o$.]
Fig. 5. c.) Inefficient realization operating at $f_o$

Decimator

The same considerations can be applied to integer-factor decimators. Hence, the final structure (Fig. 6b) still operates at the input frequency $f_i$ while avoiding any superfluous operation. Note that the result is again a time-varying filter, which requires only $\lceil N/M \rceil$ hardware multipliers and $\lceil N/M \rceil$ adders, but $\lceil N/M \rceil M - M + 1$ delays. The structure derived from the transposed direct form realization of $H(z_i)$ is depicted in Fig. 6c. Although it requires $2\lceil N/M \rceil - 1$ adders, it is less expensive because only $2\lceil N/M \rceil - 1$ delays are needed.

[Figure: a.) $H(z_i)$ followed by an M-fold downsampler; b.) delay chain $z_i^{-M}$ with multipliers time-multiplexed over the coefficient sets $h_0, \dots, h_{M-1}$; $h_M, \dots, h_{2M-1}$; ...; up to $h_{N-1}$, followed by an accumulator reset to "0" at $f_o$.]
Fig. 6. a.) Decimator (system theoretic approach) b.)
Realization operating at $f_i$

[Figure: time-multiplexed coefficients $h_0, \dots, h_{N-1}$ operating at $f_i$, accumulators reset to "0" at $f_o$, and delays $z_o^{-1}$ operating at $f_o$.]
Fig. 6. c.) Realization operating at $f_i$ with fewer delays

EFFICIENT FSRC WITH ADJUSTABLE TIMING

The aim of this section is to derive an efficient FSRC structure operating with the timing interval $T_d = \alpha T_o$ (structure 1) or $T_d = \alpha T_i$ (structure 2), as desired. To this end, a time-domain polyphase decomposition is applied both to the input and to the output signal of Fig. 1. As a result, we obtain a structure still comprising superfluous operations. Eventually, those arithmetic operations are eliminated by substituting either the interpolators or the decimators according to Fig. 5 and Fig. 6, respectively. Without any loss of generality, we assume that L and M are coprime.

As an example, structure 1 is derived subsequently. The output signal $y(kT_o)$ of the system depicted in Fig. 1 is related to the input signal $x(nT_i)$ by [10]

$$y(kT_o) = \sum_{n=-\infty}^{\left\lfloor k\frac{M}{L} - \frac{r}{L} \right\rfloor} x(nT_i)\, h(kMT - nLT - rT) \tag{1}$$

with $T = T_i/L = T_o/M$ and $r$ being the yet unknown additional delay necessary for causality. The final block-processing structure has to comprise an input decimator delay-chain system, which decomposes the input signal $x(nT_i)$ into polyphase components (with $F$ being the common divisor of $\alpha$ and $L$)

$$x(nT_i) = \sum_{m=0}^{\frac{\alpha M}{F}-1} \begin{cases} x\!\left(j\frac{\alpha M}{F}T_i - mT_i\right), & \text{for } j\frac{\alpha M}{F} - m = n \\ 0, & \text{otherwise} \end{cases} \tag{2}$$

and reduces the sampling rate by deleting the zeros, and an output interpolator delay-chain circuit, which interleaves the polyphase components of the output signal $y(kT_o)$ according to

$$y(kT_o) = \sum_{l=0}^{\alpha-1} \begin{cases} y(i\alpha T_o + lT_o), & \text{for } i\alpha + l = k \\ 0, & \text{otherwise.} \end{cases}$$

Therefore, the substitutions

$$k = i\alpha + l \ \text{ with } i \in \mathbb{Z},\ l \in \{0, \dots, \alpha-1\} \quad \text{and} \quad n = j\frac{\alpha M}{F} - m \ \text{ with } j \in \mathbb{Z},\ m \in \left\{0, \dots, \frac{\alpha M}{F}-1\right\} \tag{3}$$

are introduced into (1). As a result, we obtain the polyphase component of the output signal

$$y[i\alpha T_o + lT_o] = \sum_{m=0}^{\frac{\alpha M}{F}-1} \sum_{j=-\infty}^{\lfloor \Psi \rfloor} x\!\left[j\frac{\alpha M}{F}T_i - mT_i\right] h\!\left[\left(i - j\frac{L}{F}\right)\alpha MT + lMT + mLT - rT\right] \tag{4}$$

with

$$\Psi = \frac{Fi}{L} + \frac{Fl}{\alpha L} + \frac{Fm}{\alpha M} - \frac{Fr}{\alpha ML}.$$

The substitution

$$\beta = \left\lfloor \frac{\beta}{\alpha M} \right\rfloor \alpha M + (\beta)_{\alpha M} \quad \text{with } \beta = lM + mL - r \text{ and } (\beta)_{\alpha M} = \beta \text{ modulo } (\alpha M)$$

leads to

$$y[i\alpha T_o + lT_o] = \sum_{m=0}^{\frac{\alpha M}{F}-1} \sum_{j=-\infty}^{\lfloor \Psi \rfloor} x\!\left[j\frac{\alpha M}{F}T_i - mT_i\right] h\!\left[\left(i - j\frac{L}{F} + \left\lfloor \frac{lM + mL - r}{\alpha M} \right\rfloor\right)\alpha MT + (lM + mL - r)_{\alpha M}\,T\right]. \tag{5}$$

By defining the polyphase components of type 1 [10]

$$y_l^{(p1,\alpha)}[iT_d] = y[i\alpha T_o + lT_o] \tag{6}$$

$$h_\mu^{(p1,\alpha M)}[\nu T_d] = h[\nu\alpha MT + \mu T] \tag{7}$$

($T_d = \alpha T_o = \alpha \frac{M}{L} T_i$) and of type 3

$$x_m^{(p3,\frac{\alpha M}{F})}\!\left[j\frac{L}{F}T_d\right] = x\!\left[j\frac{\alpha M}{F}T_i - mT_i\right], \tag{8}$$

Eq. (5) can compactly be rewritten as

$$y_l^{(p1,\alpha)}[iT_d] = \sum_{m=0}^{\frac{\alpha M}{F}-1} \sum_{j=-\infty}^{\lfloor \Psi \rfloor} x_m^{(p3,\frac{\alpha M}{F})}\!\left[j\frac{L}{F}T_d\right] h_{(lM+mL-r)_{\alpha M}}^{(p1,\alpha M)}\!\left[iT_d - j\frac{L}{F}T_d + \left\lfloor \frac{lM + mL - r}{\alpha M} \right\rfloor T_d\right]. \tag{9}$$

To guarantee causality, the following requirements have to be met:

1. The original overall system has to be causal [11], i.e.

$$h(kMT - nLT - rT)\Big|_{k\frac{M}{L} < n + \frac{r}{L}} = 0. \tag{10}$$

2. All subsystems have to be causal, i.e.

$$h_{(lM+mL-r)_{\alpha M}}^{(p1,\alpha M)}\!\left[iT_d - j\frac{L}{F}T_d + \left\lfloor \frac{lM + mL - r}{\alpha M} \right\rfloor T_d\right]\Bigg|_{i < j\frac{L}{F}} = 0 \quad \forall\, i, j. \tag{11}$$

Due to block processing, the sampling instants (at which the output signal is calculated) are $i\alpha T_o$. Therefore, only input samples up to $j\frac{\alpha M}{F}T_i \le i\alpha T_o$ (with $j\frac{\alpha M}{F}T_i$ being the sampling instants of the input delay-chain decimator circuit) can be considered when calculating the output signal. As a consequence, the impulse response has to be zero for all times $j\frac{\alpha M}{F}T_i > i\alpha T_o$ (or $j\frac{L}{F}T_d > iT_d$), i.e. for $j\frac{L}{F} > i$. Provided that the original FSRC is causal (condition 1), condition 2 can only be satisfied if, for (9),

$$\left(i - j\frac{L}{F} + \left\lfloor \frac{lM + mL - r}{\alpha M} \right\rfloor\right)\Bigg|_{i < j\frac{L}{F}} < 0 \quad \forall\, m, l \tag{12}$$

holds, or, respectively,

$$\left\lfloor \frac{lM + mL - r}{\alpha M} \right\rfloor \le 0 \quad \forall\, m, l. \tag{13}$$

Hence, this leads to

$$r \in \left\{\frac{\alpha ML}{F} - L - M + 1, \dots, \infty\right\}. \tag{14}$$

In order to obtain the smallest additional delay, we choose $r = r_{min} = \frac{\alpha ML}{F} - L - M + 1$. Thus, the block-processing structure can be described in the time domain by

$$y(kT_o) = \sum_{l=0}^{\alpha-1} \begin{cases} y_l^{(p1,\alpha)}(iT_d), & \text{for } i\alpha + l = k \\ 0, & \text{otherwise} \end{cases} \tag{15}$$

with ($\ast$ stands for convolution)

$$y_l^{(p1,\alpha)}[iT_d] = \sum_{m=0}^{\frac{\alpha M}{F}-1} x_m^{(p3,\frac{\alpha M}{F})}\!\left[i\frac{L}{F}T_d\right] \ast h_{(lM+mL-\frac{\alpha ML}{F}+L+M-1)_{\alpha M}}^{(p1,\alpha M)}\!\left[iT_d + \left\lfloor \frac{lM + mL - \frac{\alpha ML}{F} + L + M - 1}{\alpha M} \right\rfloor T_d\right].$$

As can be seen from $x_m^{(p3,\frac{\alpha M}{F})}[i\frac{L}{F}T_d]$, the input signal $x(nT_i)$ is subjected to an $(\alpha M/F)$-fold polyphase decomposition and is then upsampled by $L/F$. After the filtering with

$$z_d^{\left\lfloor \frac{lM + mL - \frac{\alpha ML}{F} + L + M - 1}{\alpha M} \right\rfloor}\, H_{(lM+mL-\frac{\alpha ML}{F}+L+M-1)_{\alpha M}}^{(p1,\alpha M)}(z_d)$$

the resulting $\alpha$ polyphase components $y_l^{(p1,\alpha)}[iT_d]$ are interleaved. The resulting structure is depicted in Fig. 7. In order to eliminate the superfluous operations caused by the $(L/F)$-fold upsampling, each upsampler with the subsequent FIR filter (interpolator) has to be realized according to Fig. 5.

Similarly, a structure operating at $T_d = \alpha T_i$ can be derived. To this end, an $\alpha$-fold time-domain polyphase decomposition of the input signal and an $(\alpha L/F)$-fold polyphase decomposition of the output signal are introduced into (1), where $r = \frac{\alpha ML}{F} - M - L + 1$ ($F$ being the common factor of $\alpha$ and $M$). As a result, we obtain Fig. 9. Eventually, the superfluous operations are eliminated by realizing each $(M/F)$-fold decimator according to Fig. 6.

The amount of required hardware (multipliers, adders and delays) is depicted in Fig. 8 for N=360, L=4 and M=3. In most cases, structure 2 needs an unacceptably large number of adders and delays. As a consequence, structure 1 should be preferred. In addition, it can be seen that for large timing intervals the gap between the ideal and the real number of multipliers becomes larger if $\alpha M$ ($\alpha L$, respectively) does not divide $N$. On the other hand, the number of delays decreases even in these cases.

[Figure: input delay chain $z_i^{-1}$ with $\alpha M$-fold downsamplers (branches $m = 0, \dots, \alpha M-1$), L-fold upsamplers followed by the polyphase filters $z_d^{\lfloor \cdot \rfloor} H_{(\cdot)_{\alpha M}}(z_d)$ (interpolators), $\alpha$-fold upsamplers and output delay chain $z_o^{-1}$ (branches $l = 0, \dots, \alpha-1$).]
Fig. 7.
FSRC with a timing interval $T_d = \alpha T_o$ (redundant arithmetic operations still remaining, $\alpha$ and $L$ coprime).

[Figure: three panels showing the number of adders, the number of delay elements and the number of multipliers versus the timing interval.]
Fig. 8. Hardware expenditure (example: N=360, L=4 and M=3).

CONCLUSION

A novel systematic and rigorous derivation of block-processing structures operating with a timing interval of $\alpha T_i$ or $\alpha T_o$ ($\alpha \in \mathbb{N}$) has been given. As a consequence, we obtain efficient structures that allow more freedom in the choice of the optimal timing interval and, hence, allow a trade-off between system clock and chip area.

[Figure: input delay chain $z_i^{-1}$ with $\alpha$-fold downsamplers (branches $m = 0, \dots, \alpha-1$), polyphase filters $z_d^{\lfloor \cdot \rfloor} H_{(\cdot)_{\alpha L}}(z_d)$ followed by M-fold downsamplers (decimators), $\alpha L$-fold upsamplers and output delay chain $z_o^{-1}$ (branches $l = 0, \dots, \alpha L-1$).]
Fig. 9. FSRC with a timing interval of $T_d = \alpha T_i$ (redundant arithmetic operations still remaining, $\alpha$ and $M$ coprime).

REFERENCES

[1] A.P. Chandrakasan and R.W. Brodersen, “Minimizing Power Consumption in Digital CMOS Circuits,” Proc. of IEEE, vol. 83, no. 4, pp. 498-523, April 1995.
[2] C.C. Hsiao, “Filter Matrix for Rational Sampling Rate Conversions,” IEEE Int. Conf. Acoustics, Speech, Signal Processing ICASSP '87, Dallas, pp. 2173-2176, 1987.
[3] P.P. Vaidyanathan, “Multirate Digital Filters, Filter Banks, Polyphase Networks, and Applications: A Tutorial,” Proc. of IEEE, vol. 78, no. 1, pp. 56-93, January 1990.
[4] W.H. Yim, F.P. Coakley, and B.G. Evans, “Extended Polyphase Structures for Multirate DSP,” IEE Proc.-F, vol. 139, no. 4, pp. 273-277, August 1992.
[5] H.G. Göckler, G. Evangelista, and A. Groth, “Minimal Polyphase Implementation of Fractional Sample Rate Conversion,” Signal Processing, vol. 81, no. 4, pp. 673-691, April 2001.
[6] A. Groth and H.G. Göckler, “Efficient Minimum Group Delay Block Processing Approach to Fractional Sample Rate Conversion,” ISCAS '01, Sydney, Australia, vol. II, pp. 189-192, May 2001.
[7] A. Groth and H.G. Göckler, “Polyphase Implementation of Unrestricted Fractional Sample Rate Conversion,” Internal Report, http://www.nt.ruhr-uni-bochum.de/lehrstuhl/mitarbeiter/alex.html, 2000.
[8] R. Crochiere and L. Rabiner, “Interpolation and Decimation of Digital Signals: A Tutorial Review,” Proc. IEEE, vol. 69, pp. 300-331, March 1981.
[9] J. Webb, “Transposed FIR Filter Structure with Time-Varying Coefficients for Digital Data Resampling,” IEEE Trans. on Signal Processing, vol. 48, pp. 2594-2600, September 2000.
[10] P.P. Vaidyanathan, Multirate Systems and Filter Banks, Prentice Hall Signal Processing Series, 1993.
[11] H.W. Schüßler, Digitale Signalverarbeitung 1, Analyse diskreter Signale und Systeme, Springer Verlag, Berlin, 1994.