IMPLEMENTATION OF SYNCHRONOUS SAMPLE RATE CONVERTER USING MODULAR AUDIO PROCESSING SYSTEM

Piotr Nykiel, Bartosz Bielawski
Warsaw University of Technology
Institute of Radioelectronics
ul. Nowowiejska 15/19, PL-00-665 Warsaw POLAND
[email protected], [email protected]
Abstract: The paper describes a simple yet powerful audio processing library for personal computers and demonstrates how it can be used to create a synchronous sample rate converter. The first part presents the Modular Audio Processing System (MAPS). Its two distinguishing features are integer-only calculations, which assure high processing quality, and a modular design, which enables rapid system development. The second part presents possible uses of MAPS and a sample implementation of a synchronous sample rate converter that uses the library and the Qt Toolkit for GUI creation. The software details and the general system architecture are presented.
1 Introduction
Operations performed while mastering audio material may be divided into two categories: interactive and batch actions. Interactive actions must be supervised by an operator who monitors and adjusts parameters in real time. Sound perception is subjective, so no computer can take over this human task.
On the other hand, some operations do not require immediate attention; these are called batch jobs. In most cases an operator chooses what should be done, sets all parameters and starts processing. Everything else is done automatically; the operator only has to watch for errors and wait for the task to end. The Modular Audio Processing System (MAPS for short) was designed as a robust library for batch processing.
Diagrams and graphs are a convenient way of showing principles of operation, as they decompose complicated systems into simple blocks. The same can be done for almost any audio processing system. The design goal of MAPS was to provide those basic blocks and to let the user create complex systems simply by linking them.
MAPS runs on almost any modern PC with Microsoft Windows XP/Vista or Linux installed. The library is written in plain C++, which makes it portable and robust.
C++ was chosen for two main reasons: it is compiled to native machine code and it is object-oriented. The first feature ensures top performance [1]. Not so long ago only dedicated digital signal processors were capable of filtering a signal in real time. Nowadays Intel x86 CPUs with an integrated x87 FPU are cheap and powerful, and despite being general-purpose CISC processors they can handle DSP as well.
The object-oriented approach allows programmers to use general interfaces and encapsulation. The library provides a set of blocks that share a common interface and hide the details of operation behind it. The user is presented with only a few methods that set parameters, allow block interconnection and do the processing.
MAPS is unusual among audio software because it uses fixed-point arithmetic for signal processing. Floating-point numbers, which are commonly used in other audio programs, have some serious drawbacks. These are described in the next section.
2 Fixed-point vs. floating-point
Integers are stored in a computer's memory in two's complement code, which is a slight modification of natural binary code. An n-bit number stored in two's complement code has the value:
$$ x = -d_{n-1} \cdot 2^{n-1} + \sum_{i=0}^{n-2} d_i \cdot 2^i $$
where $d_i$ is the i-th digit of the number and the right-most digit has an index of 0.
Fixed-point numbers are stored the same way. The only difference is the placement of the radix point: not behind the last digit but somewhere inside the word. Consider an n-bit number with an m-bit fractional part. Its value is:
$$ x = \frac{-d_{n-1} \cdot 2^{n-1} + \sum_{i=0}^{n-2} d_i \cdot 2^i}{2^m} $$
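The fixed-point formula above can be illustrated with a short conversion sketch. It is not part of MAPS; the function names `fixed_to_double` and `double_to_fixed` are hypothetical, and a 32-bit word is assumed.

```cpp
#include <cmath>
#include <cstdint>

// Value of a 32-bit two's complement word with `frac_bits` fractional bits,
// i.e. the integer value divided by 2^m (hypothetical helper, not a MAPS call).
double fixed_to_double(std::int32_t raw, unsigned frac_bits) {
    return std::ldexp(static_cast<double>(raw), -static_cast<int>(frac_bits));
}

// Nearest fixed-point word for x. The caller must keep x inside the
// representable range; no saturation is performed in this sketch.
std::int32_t double_to_fixed(double x, unsigned frac_bits) {
    return static_cast<std::int32_t>(
        std::llround(std::ldexp(x, static_cast<int>(frac_bits))));
}
```

With 31 fractional bits the word 0x40000000 represents 0.5; with 24 fractional bits (8 integer bits) the word 0x01000000 represents 1.0.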
Floating-point numbers are stored using an exponential formula:
$$ x = (-1)^S \cdot F \cdot 2^{E-B} \tag{1} $$
where S is the sign of the number, F is the fractional part (the mantissa, written M in the equations that follow), E is the exponent and B is a constant bias that enables negative exponents. S, F and E are stored as nonnegative integers. The lengths and positions of these fields in the IEEE 754 single-precision format are shown in Table 1 [2].
Field      Length (bits)   Position (most significant bit, counted from 1)
Sign       1               32
Exponent   8               31
Mantissa   23              23
Table 1: Structure of an IEEE 754 single-precision floating-point number
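The field layout of Table 1 can be checked with a few lines of C++. This is only an illustrative sketch: the `ieee754_fields` struct and `split_float` function are hypothetical names, and the code assumes that `float` is a 32-bit IEEE 754 single-precision number, which holds on common PC platforms.

```cpp
#include <cstdint>
#include <cstring>

// The three fields of a single-precision number, as raw unsigned integers.
struct ieee754_fields {
    std::uint32_t sign;      // 1 bit
    std::uint32_t exponent;  // 8 bits, biased by B = 127
    std::uint32_t mantissa;  // 23 bits, fractional part without the hidden 1
};

// Split a float into its fields (assumes float is IEEE 754 binary32).
ieee754_fields split_float(float value) {
    static_assert(sizeof(float) == sizeof(std::uint32_t),
                  "this sketch assumes a 32-bit float");
    std::uint32_t bits = 0;
    std::memcpy(&bits, &value, sizeof bits);  // well-defined type punning
    return { bits >> 31, (bits >> 23) & 0xFFu, bits & 0x7FFFFFu };
}
```

For example, 1.0f splits into sign 0, exponent 127 and mantissa 0, while 1.5f has the top mantissa bit set (0x400000).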
To decide which representation is better suited to audio processing, further analysis is needed. The two most common operations in digital signal processing are addition and multiplication, which are often combined into the Multiply And Accumulate (MAC) instruction of digital signal processors.
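As a concrete illustration of the fixed-point MAC discussed in this section, the sketch below accumulates products in a 64-bit word and rounds only once, when the result is brought back to 32 bits. It is not taken from MAPS: the name `mac_q31` and the choice of 31 fractional bits are assumptions made for the example, and the caller is assumed to keep the running sum within 64 bits.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Dot product of two Q31 vectors (31 fractional bits each).
// Products are exact 64-bit values; rounding happens once at the end.
std::int32_t mac_q31(const std::vector<std::int32_t>& x,
                     const std::vector<std::int32_t>& h) {
    const std::size_t n = std::min(x.size(), h.size());
    std::int64_t acc = 0;  // accumulator twice as wide as the operands
    for (std::size_t i = 0; i < n; ++i)
        acc += static_cast<std::int64_t>(x[i]) * h[i];  // Q62 product
    acc += std::int64_t{1} << 30;  // add half an output LSB
    acc >>= 31;                    // floor: arithmetic shift back to Q31
    // Saturate instead of wrapping if the result does not fit in 32 bits.
    acc = std::clamp<std::int64_t>(acc,
                                   std::numeric_limits<std::int32_t>::min(),
                                   std::numeric_limits<std::int32_t>::max());
    return static_cast<std::int32_t>(acc);
}
```

The right shift of a negative value is an arithmetic shift on mainstream compilers and is guaranteed to be one from C++20 onwards.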
The case of fixed-point numbers is simple. Addition is exact as long as the operation does not cause overflow. It is common to use a result word longer than the operand words, thus preventing overflow. The result of multiplying two n-bit numbers is 2n bits long and has to be stored in a word of this length. To restore the initial length the result has to be shortened; in most cases rounding is used:
$$ \tilde{x} = \left\lfloor x + \tfrac{1}{2}\,\mathrm{LSB} \right\rfloor $$
It is possible to minimize the rounding error while calculating a convolution. The solution is similar to the one used in digital signal processors: a temporary variable with at least twice as many bits as the input operands. This way rounding happens only once per output sample.
Operations on floating-point numbers are more complicated. Addition can be explained using the symbols from equation 1, with M denoting the mantissa (assume x > y):
$$ x + y = (M_1 + M_2 \cdot 2^{E_2 - E_1}) \cdot 2^{E_1} $$
The fractional parts may be added (M1 + M2) only after being brought to the same exponent, the bigger of the two (E1). This is equivalent to shifting the smaller mantissa right and may lead to a loss of precision whenever the exponents do not match.
Multiplication of two floating-point numbers can be described by the following equation, in which S1 and S2 stand for the sign factors ±1:
$$ x \cdot y = S_1 \cdot S_2 \cdot M_1 \cdot M_2 \cdot 2^{E_1 + E_2} $$
Note that M1 · M2 is a fixed-point multiplication, so rounding is involved. It should also be mentioned that the exponential format used to store floating-point numbers makes the quantization step non-uniform. As a result the quantization error is correlated with the input signal and causes modulation interference.
Floating-point numbers seem to be the best choice for scientific calculations, where a wide range of magnitudes is used. Audio processing is not such a case: the dynamic range of the signal is well known. Fixed-point arithmetic lets designers control signal processing with single-bit accuracy, and that is why it was chosen for MAPS.
3 MAPS architecture
The MAPS library was designed to be modular, yet easy to use. All entities used in signal processing are divided into two categories: buffers and filters.
Buffers are blocks used for storing data, which can be an audio signal as well as filter coefficients. Buffers are partially aware of their contents: besides the samples, they carry the sample rate, number of channels and bit depth.
Internally data is stored as 32-bit integers, but all system blocks support using the most significant bits (up to 8) as an integer part. Buffers automatically manage memory for their contents and can load and save data from/to text files.
A filter is not really a block, but rather an interface for specific ones. The general interface depicted in Figure 1 allows software to use the same functions to perform common tasks on all objects derived from filter.
Figure 1: Common filter interface
It was assumed that each block creates and controls only its output buffers, while input data is acquired through pointers to other buffers. This lets a buffer be the signal source for any number of blocks. The most important functions are:
set_{input|coeffs|dither}_buffer(): sets the corresponding buffer and performs error checking. The input version configures the block to work with the given buffer. Some blocks do not implement all of these functions (e.g. delayline does not support setting a coeffs buffer).
get_{output|random}_buffer(): returns a pointer to the corresponding buffer. If the parameter is invalid or the block does not support this feature, NULL is returned. Not all of these functions are implemented in all blocks.
{set|get}_out_bits(): sets or gets the number of significant bits in the output buffer(s).
flush(): flushes all delay lines and buffers in the block and sets internal parameters to default values.
process(): processes data; the kind of processing depends heavily on the type of the block.
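To make the shape of such an interface concrete, here is a minimal sketch of an abstract base class built around the functions listed above. It is not the MAPS source: the method names follow the list, but all signatures, return types, the `buffer` struct and the `passthrough` block are assumptions made purely for illustration.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a MAPS buffer: samples plus basic metadata.
struct buffer {
    std::vector<std::int32_t> samples;
    unsigned samplerate = 0;
    unsigned channels = 1;
    unsigned bits = 32;
};

// Hypothetical common interface; every signature here is an assumption.
class filter {
public:
    virtual ~filter() = default;
    // Setters return false when the buffer is invalid or unsupported.
    virtual bool set_input_buffer(const buffer* in) {
        input_ = in;
        return in != nullptr;
    }
    virtual bool set_coeffs_buffer(const buffer*) { return false; }
    virtual bool set_dither_buffer(const buffer*) { return false; }
    // Getters return nullptr for unsupported or invalid requests.
    virtual buffer* get_output_buffer(unsigned index = 0) {
        return index == 0 ? &output_ : nullptr;
    }
    virtual buffer* get_random_buffer() { return nullptr; }
    void set_out_bits(unsigned bits) { output_.bits = bits; }
    unsigned get_out_bits() const { return output_.bits; }
    virtual void flush() { output_.samples.clear(); }
    virtual void process() = 0;

protected:
    const buffer* input_ = nullptr;  // owned by another block
    buffer output_;                  // owned and managed by this block
};

// Trivial concrete block that copies its input to its output.
class passthrough : public filter {
public:
    void process() override {
        if (input_ == nullptr) return;
        output_.samples = input_->samples;
        output_.samplerate = input_->samplerate;
        output_.channels = input_->channels;
    }
};
```

The sketch keeps the ownership rule described above: a block owns only its output buffer and reads its input through a pointer, so one buffer can feed any number of blocks.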
The MAPS library currently supports 9 types of blocks; their names and functions are presented in Figure 2.
Figure 2: Division of objects by their function
Those nine blocks can be combined to create many different signal processing lines. Figure 3 presents a sample configuration of a stereo-to-mono converter. A short description of all blocks follows.
FIR filter (fir_filter) is a block that convolves the input signal x[n] with coefficients h[n]. This block has one input buffer and one output buffer, supports signal dithering and generates pseudorandom data for the dither block.
PolyFIR filter (polyfir_filter) supersedes fir_filter by enabling polyphase filtering. Coefficients for all subfilters are provided by the coefficient buffer. The user must also set the L (interpolation) and M (decimation) factors using set_filter_ratio(). Note that polyfir_filter may upsample as well as downsample the signal.
Delayline (delayline) delays input signals by N samples. N must be set using set_delay(). Delayline supports dithering, but cannot be a source of random data for the dither generator.
Multiplexer (mux) interleaves two or more buffers into one buffer. It is commonly used just before the output file block to prepare data for writing to a file. The number of inputs is set in the constructor (mux()).
Demultiplexer (demux) separates one buffer into two or more buffers. It is commonly used to split the channels of the input file buffer. The number of outputs is set in the constructor (demux()).
Adder (adder) mixes down at least two buffers into one buffer. Before addition the samples are weighted; channel weights are set using set_input_amp(). Adder supports dithering and can be a source of signal for the dither generator.
Input file (infile) is used to read data from audio files into infile's output buffer. This block uses no input buffer. There are three important infile functions: open(), close() and eof(). This block wraps functions provided by the libsndfile library [3].
Output file (outfile) is used to write data from a buffer into an audio file. The file format may be specified. There are three important functions: open(), close() and set_format(). outfile autodetects the output sample rate and bit depth. This block wraps functions provided by the libsndfile library [3].
Dither generator (dither) is a block designed to provide a dither signal for other blocks. Its presence is optional.
Figure 3: Sample stereo-to-mono converter
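The signal path of the stereo-to-mono converter (demultiplex, weight, add) can be sketched in a few lines of stand-alone C++. The code does not use MAPS: `stereo_to_mono` is a hypothetical function, the samples are assumed to be interleaved words with 31 fractional bits, and the default weights of 0x3FFFFFFF (about 1/2) are chosen only as an example. Word-length reduction and dithering are left out.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Mix interleaved stereo (L, R, L, R, ...) down to mono.
// Weights are Q31 values; each output sample is rounded once.
std::vector<std::int32_t> stereo_to_mono(
    const std::vector<std::int32_t>& interleaved,
    std::int32_t amp_left = 0x3FFFFFFF,
    std::int32_t amp_right = 0x3FFFFFFF) {
    std::vector<std::int32_t> mono;
    mono.reserve(interleaved.size() / 2);
    for (std::size_t i = 0; i + 1 < interleaved.size(); i += 2) {
        // "demux": even indices are the left channel, odd the right one
        std::int64_t acc =
            static_cast<std::int64_t>(interleaved[i]) * amp_left +
            static_cast<std::int64_t>(interleaved[i + 1]) * amp_right;
        acc += std::int64_t{1} << 30;  // half an output LSB
        acc >>= 31;                    // back to 31 fractional bits
        acc = std::clamp<std::int64_t>(
            acc, std::numeric_limits<std::int32_t>::min(),
            std::numeric_limits<std::int32_t>::max());
        mono.push_back(static_cast<std::int32_t>(acc));
    }
    return mono;
}
```

Because the weights are slightly below 1/2, two full-scale channels cannot overflow the output; the clamp only matters when larger weights are passed in.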
Providing only blocks for building a system configured at compile time is not a flexible solution. Building a processing line from those components is easier than writing it from scratch, but it still requires some programming skills. That is why MAPS provides a class for managing buffers and filters.
The filter_chain class can be described as an intelligent container with dynamically created content. Besides storing objects, it can load a signal processing line from a text configuration file. The configuration file syntax is a subset of the syntax supported by the libconfig library [4].
Each configuration file consists of several sections. One of them is the "config" section, which describes parameters of the whole processing line. The other sections describe blocks or buffers, one section each. The properties of a section depend on its type; only two things are common: the name and the type specifier.
Upon loading such a file, the filter_chain object parses the input, checks for errors (e.g. missing buffers) and creates the dynamic structure described in the file. The class also provides methods for easy access to the default input and output files and wraps the calls to the blocks' process() functions in its own process() method. A simple program using MAPS is presented in Listing 1. Note that error checking has been omitted.
Listing 1: Sample program using MAPS library
    #include <cstdio>
    #include <iostream>
    #include "filter_chain.h"
    using namespace std;

    filter_chain f;

    int main(int argc, char* argv[])
    {
        // check number of arguments
        if (argc != 4) {
            printf("\nUsage: maps filter.fltr infile.wav outfile.wav\n");
            return 0;
        }
        // load processing line
        f.load_filter(argv[1]);
        // get pointers, open files, reconnect
        infile* inf = f.get_default_infile();
        outfile* outf = f.get_default_outfile();
        inf->open(argv[2]);
        f.reconnect();
        outf->open(argv[3]);
        // check if connected ok
        if (!f.check_all())
            return 1;
        // process data while not end of file
        do
            f.process();
        while (!inf->eof());
        printf("Done :)\n");
        return 0;
    }
The configuration file implementing the processing line shown in Figure 3, a stereo-to-mono converter, is given in Listing 2.
Listing 2: Configuration file implementing stereo-to-mono converter
    config : { // config for whole processing line
        description = "stereo to mono converter";
        default_in = "in";
        default_out = "out";
        in_channels = 2;
    };
    in : {
        type = "infile";
    };
    dmx : {
        type = "demux";
        input = "in";
        outputs = 2;
    };
    adder : {
        type = "adder";
        inputs = ["dmx:0", "dmx:1"];
        amps = [0x3FFFFFFF, 0x3FFFFFFF]; // 1/2
        bits = 16;
    };
    out : {
        type = "outfile";
        input = "adder";
    };
4 Implementation of SSRC
The examples shown above cover only a small part of MAPS's capabilities, as it was created mainly for synchronous sample rate conversion. This process can be done in several ways. The simplest case is a single-step decimator or interpolator, which can be used only if the conversion ratio is an integer. If the sample rate ratio is rational, a cascade of an interpolator and a decimator can be used. This method is not optimal, as the second stage must process audio data at a rate increased L times. The best solution is to use the polyfir_filter class, which can handle any rational (including integer) ratio using the polyphase filter approach.
Polyphase filtering is the result of optimizing the standard interpolator-decimator cascade [5]. Combining those two sections leads to two low-pass filters placed between the up-sampler and the down-sampler, one of which is redundant. While upsampling leads to processing many zero-valued samples, downsampling discards some of the samples just calculated. With a little effort a structure that omits multiplications by zeros and skips unneeded output samples can be developed. This kind of structure is implemented in MAPS's polyfir_filter. A simple case of a polyphase interpolator is presented in Figure 4.
Figure 4: Simple polyphase interpolator
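The idea of skipping zero-valued inputs and unneeded outputs can be shown with a small sketch. It is not the polyfir_filter implementation: `polyphase_resample` and `naive_resample` are hypothetical names, floating-point samples are used only to keep the example short (MAPS itself works on integers), and the prototype filter h is assumed to be designed for the rate L times the input rate. The naive version performs the full cascade (zero stuffing, filtering, decimation), while the polyphase version computes the same output directly.

```cpp
#include <cstddef>
#include <vector>

// Reference cascade: insert L-1 zeros, filter with h, keep every M-th sample.
std::vector<double> naive_resample(const std::vector<double>& x,
                                   const std::vector<double>& h,
                                   std::size_t L, std::size_t M) {
    std::vector<double> y;
    if (L == 0 || M == 0) return y;
    std::vector<double> up(x.size() * L, 0.0);
    for (std::size_t n = 0; n < x.size(); ++n) up[n * L] = x[n];
    for (std::size_t t = 0; t < up.size(); t += M) {
        double acc = 0.0;
        for (std::size_t i = 0; i < h.size() && i <= t; ++i)
            acc += h[i] * up[t - i];
        y.push_back(acc);
    }
    return y;
}

// Polyphase form: only the taps that meet non-zero inputs are evaluated,
// and only the output samples that survive decimation are computed.
std::vector<double> polyphase_resample(const std::vector<double>& x,
                                       const std::vector<double>& h,
                                       std::size_t L, std::size_t M) {
    std::vector<double> y;
    if (L == 0 || M == 0) return y;
    const std::size_t total = x.size() * L;  // length of the virtual upsampled signal
    for (std::size_t t = 0; t < total; t += M) {
        const std::size_t n = t / L;      // newest input sample involved
        const std::size_t phase = t % L;  // selects the subfilter h[phase + k*L]
        double acc = 0.0;
        for (std::size_t i = phase, k = 0; i < h.size() && k <= n; i += L, ++k)
            acc += h[i] * x[n - k];
        y.push_back(acc);
    }
    return y;
}
```

For each output sample the polyphase loop touches roughly 1/L of the taps, and it runs only once every M upsampled positions, which is where the saving over the cascade comes from.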
MAPS can also be used to implement another common technique: cascading filters in order to decrease computational complexity. This is achieved by relaxing the transition band requirements for a single stage, which reduces the total filter length. For such use only the configuration file has to be modified accordingly. A sample configuration file describing a 44.1 kHz to 96 kHz converter built from two cascaded polyphase filters is presented in Listing 3.
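The arithmetic behind such a two-stage split can be checked with a short sketch. The helper names `conversion_ratio` and `apply_stage` are hypothetical and not part of MAPS; the code only reduces the overall ratio with the greatest common divisor and verifies that the stages 2/1 and 160/147 together give 320/147.

```cpp
#include <cstdint>
#include <numeric>  // std::gcd (C++17)

// Interpolation (up) and decimation (down) factors of one stage.
struct ratio {
    std::uint64_t up;
    std::uint64_t down;
};

// Reduced conversion ratio fs_out / fs_in; {0, 0} if both rates are zero.
ratio conversion_ratio(std::uint64_t fs_in, std::uint64_t fs_out) {
    const std::uint64_t g = std::gcd(fs_in, fs_out);
    if (g == 0) return {0, 0};
    return {fs_out / g, fs_in / g};
}

// Sample rate after one stage, or 0 if the result is not an integer rate.
std::uint64_t apply_stage(std::uint64_t fs, ratio r) {
    if (r.down == 0 || (fs * r.up) % r.down != 0) return 0;
    return fs * r.up / r.down;
}
```

For 44100 Hz to 96000 Hz the reduced ratio is 320/147. Applying 2/1 gives 88200 Hz, and applying 160/147 to that gives 96000 Hz, so each stage needs far fewer subfilters than a single 320/147 stage would.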
Listing 3: Two-stage 44.1 kHz to 96 kHz stereo sample rate converter
    # written by: B. Bielawski, P. Nykiel
    config : {
        description = "44k to 96k stereo filter!";
        in_samplerate = 44100; in_channels = 2;
        default_in = "in"; default_out = "out"; };
    in : { type = "infile"; };
    dmx : { type = "demux"; input = "in"; outputs = 2; };
    x2_b : { // coeffs for stage 1
        type = "buffer";
        filename = "coeffs/44-88.dat"; };
    pf_1l : { // fs x 2, input demux:0
        type = "polyfir"; input = "dmx:0";
        coeffs = "x2_b"; filters = 2; increment = 1; };
    pf_1r : { // fs x 2, input demux:1
        type = "polyfir"; input = "dmx:1";
        coeffs = "x2_b"; filters = 2; increment = 1; };
    pf_b : { // coeffs for stage 2
        type = "buffer";
        filename = "coeffs/88-96.dat"; };
    pf_2l : { // fs x 160 / 147
        type = "polyfir"; input = "pf_1l";
        coeffs = "pf_b"; filters = 160;
        increment = 147; };
    pf_2r : { // fs x 160 / 147
        type = "polyfir"; input = "pf_1r";
        coeffs = "pf_b"; filters = 160;
        increment = 147; };
    mx : { type = "mux"; inputs = ["pf_2l", "pf_2r"]; };
    out : { type = "outfile"; input = "mx"; };
    rt : { // coeffs for dither gen.
        type = "buffer"; filename = "coeffs/rottab3.txt"; };
    dl : { // dither generator left
        type = "dither"; input = "pf_l";
        outputs = ["dl_l", "x2_l", "pf_l"];
        coeffs = "rt"; };
    dr : { // dither generator right
        type = "dither"; input = "pf_r";
        outputs = ["dl_r", "x2_r", "pf_r"];
        coeffs = "rt"; };
5 iSRC: MAPS frontend
The MAPS library has been created as a flexible engine. Its modular design let the authors build a simple graphical user interface without dealing with the internals of the library. The GUI uses MAPS, but as long as the API is stable, the two can be developed separately [6].
iSRC is written in C++ and uses the Qt Toolkit [7]. Qt is a framework that delivers a cross-platform abstraction layer for many common programming tasks such as GUI creation, configuration storage and thread management.
The main aim in developing iSRC was to create an easy interface that hides the many options offered by MAPS. The GUI is as simple as possible; it was designed to be clear and easy to understand even for a beginner. The end user is presented with only a limited set of choices regarding frequencies, bit depth and output filename.
The application has only one window, which consists of five regions. They are marked in Figure 5.
Figure 5: iSRC – frontend to the MAPS library
The functions of the regions are as follows:
Input file selection is used to browse and select files to be converted. A context menu lets the user add all files in the current directory or go directly to another directory using the standard system dialog.
Selected files panel lists the files selected by the user, with full paths included.
Conversion options is the most complicated part. The "Conversion filter" drop-down box is used to select the input and output frequency and the number of channels. The "Output bits" spin box lets the user choose the bit depth of the resulting file. The "-6dB" checkbox enables and disables the overflow countermeasure, which attenuates the input signal. If "overwrite files" is not selected, the program will not overwrite existing files. "Output dir" is the place where converted files are stored. "Name format" is the field that defines the output filename. Tokens are supported (e.g. "%F" for output frequency and "%n" for file number). The slider at the bottom selects the program's priority, trading conversion speed against system responsiveness.
Control buttons start, stop or pause the conversion. Note that "stop" aborts the current process.
Progress and other information region is used to inform the user about the state of the conversion. All types of messages appear here.
The usage is simple and intuitive. The user first selects the files to convert, then chooses the conversion settings. Pressing "start" begins the conversion, which can be stopped temporarily by pressing "pause" or aborted by pressing "stop".
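The token substitution performed by the "Name format" field could work along the following lines. This is a guess at the behaviour, not iSRC code: `expand_name_format` is a hypothetical function, only the two tokens named in the text are handled, and printing the frequency in hertz is an assumption.

```cpp
#include <cstddef>
#include <string>

// Expand "%F" (output frequency, assumed to be in Hz) and "%n" (file number).
// Any other character, including an unknown token, is copied unchanged.
std::string expand_name_format(const std::string& format,
                               unsigned out_freq_hz,
                               unsigned file_number) {
    std::string out;
    for (std::size_t i = 0; i < format.size(); ++i) {
        if (format[i] == '%' && i + 1 < format.size()) {
            const char token = format[i + 1];
            if (token == 'F') {
                out += std::to_string(out_freq_hz);
                ++i;  // skip the token letter
                continue;
            }
            if (token == 'n') {
                out += std::to_string(file_number);
                ++i;
                continue;
            }
        }
        out += format[i];
    }
    return out;
}
```

With the format "track%n_%F.wav", an output frequency of 96000 and file number 3, the sketch produces "track3_96000.wav".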
6 Conclusion
The MAPS library emerged from the need for a precise and fast synchronous sample rate converter. The flexibility of the design allowed the authors to use it in other DSP tasks. Custom processing lines can be created using simple configuration files.
Further development of both MAPS and iSRC is planned for the near future. The schedule includes adding plugin support, more blocks (e.g. compandor, noise gate, tone generator) and a new GUI.
Creating a new GUI for MAPS is an additional task and will not be the top priority, although some design assumptions have been made. The planned features are:
• support for processing line configuration files (*.fltr),
• interactive creation of processing lines using a graphical representation of blocks and links,
• real-time alteration of block parameters,
• support for various signal sources.
It is hoped that MAPS and iSRC will soon become a well-known brand and will be used by thousands of audio engineers worldwide.
References
[1] B. Stroustrup, The C++ Programming Language,
Addison-Wesley Pub Co; 3rd edition, 2000.
[2] IEEE, IEEE Standard for Floating-Point Arithmetic (IEEE 754), 2008.
[3] Erik de Castro Lopo, libsndfile Manual, Homepage:
http://www.mega-nerd.com/libsndfile/
[4] Mark A. Linder, libconfig Manual,
Homepage: http://www.hyperrealm.com/libconfig/
[5] Roland E. Crochiere and Lawrence R. Rabiner, Interpolation and Decimation of Digital Signals — A
Tutorial Review, Proceedings of the IEEE vol 69,
1981.
[6] B. Bielawski, Synchronous Sample Rate Converter, M.Sc. thesis, not yet published.
[7] Nokia, Qt 4.5 Manual,
Homepage: http://doc.qtsoftware.com/4.5
This document has been typeset with LaTeX.