Department of Signal Processing and Acoustics

Jussi Rämö

Equalization Techniques for Headphone Listening

Aalto University
DOCTORAL DISSERTATIONS
Aalto University publication series
DOCTORAL DISSERTATIONS 147/2014
Equalization Techniques for
Headphone Listening
Jussi Rämö
A doctoral dissertation completed for the degree of Doctor of
Science (Technology) to be defended, with the permission of the
Aalto University School of Electrical Engineering, at a public
examination held at the lecture hall S1 of the school on 31 October
2014 at 12.
Aalto University
School of Electrical Engineering
Department of Signal Processing and Acoustics
Supervising professor
Prof. Vesa Välimäki
Thesis advisor
Prof. Vesa Välimäki
Preliminary examiners
Dr. Aki Härmä, Philips Research, The Netherlands
Dr. Joshua D. Reiss, Queen Mary University of London, UK
Opponent
Dr. Sean Olive, Harman International, USA
Aalto University publication series
DOCTORAL DISSERTATIONS 147/2014
© Jussi Rämö
ISBN 978-952-60-5878-8
ISBN 978-952-60-5879-5 (pdf)
ISSN-L 1799-4934
ISSN 1799-4934 (printed)
ISSN 1799-4942 (pdf)
http://urn.fi/URN:ISBN:978-952-60-5879-5
Unigrafia Oy
Helsinki 2014
Finland
Abstract
Aalto University, P.O. Box 11000, FI-00076 Aalto www.aalto.fi
Author
Jussi Rämö
Name of the doctoral dissertation
Equalization Techniques for Headphone Listening
Publisher School of Electrical Engineering
Unit Department of Signal Processing and Acoustics
Series Aalto University publication series DOCTORAL DISSERTATIONS 147/2014
Field of research Acoustics and Audio Signal Processing
Manuscript submitted 13 February 2014
Date of the defence 31 October 2014
Permission to publish granted (date) 6 August 2014
Language English
Article dissertation (summary + original articles)
Abstract
The popularity of headphones has increased rapidly along with digital music and mobile
phones. The environment in which headphones are used has also changed quite dramatically
from silent to noisy, since people are increasingly using their headphones while commuting and
traveling. Ambient noise degrades the perceived quality of the music and compels people to listen to music at higher volume levels.
This dissertation explores headphone listening in the presence of ambient sounds. The
ambient sounds can be either noise or informative sounds, such as speech. The first portion of
this work addresses the first case, where the ambient sounds are undesirable noise that
deteriorates the headphone listening experience. The second portion tackles the latter case, in
which the ambient sounds are actually informative sounds that the user wants to hear while
wearing headphones, such as in an augmented reality system. Regardless of the nature of the
ambient sounds, the listening experience can be enhanced with the help of equalization.
This work presents a virtual listening test environment for evaluating headphones in the
presence of ambient noise. The simulation of headphones is implemented using digital filters,
which enables arbitrary music and noise test signals in the listening test. The disturbing effect
of ambient noise is examined with the help of a simulator utilizing an auditory masking model
to simulate the timbre changes in music. Another study utilizes the same principles and
introduces an adaptive equalizer for mitigation of the masking phenomenon. This psychoacoustic audio processing system was shown to retain reasonably low sound pressure levels
while boosting the music, which is highly important from the viewpoint of hearing protection.
Furthermore, two novel hear-through systems are proposed, the first of which is a digital
augmented reality headset substituting and improving a previous analog system. The second
system is intended to be worn during loud concerts as a user-controllable hearing protector and
mixer. The main problem in both of the systems is the risk of a comb filtering effect, which can
deteriorate the sound quality. However, it is shown that the comb filtering effect is not
detrimental due to the passive isolation of an in-ear headset.
Finally, an optimization algorithm for a high-order graphic equalizer is developed, which optimizes the order of adjacent band filters to reduce ripple around the filter transition bands. Furthermore, a novel high-precision graphic equalizer based on parallel second-order sections is introduced. The novel equalization techniques are intended for use not only in headphone applications, but also in a wide range of other audio signal processing applications that require highly selective equalizers.
Keywords Acoustic signal processing, audio systems, augmented reality, digital filters,
psychoacoustics
ISBN (printed) 978-952-60-5878-8
ISBN (pdf) 978-952-60-5879-5
ISSN-L 1799-4934
ISSN (printed) 1799-4934
ISSN (pdf) 1799-4942
Location of publisher Helsinki
Pages 151
Location of printing Helsinki
Year 2014
urn http://urn.fi/URN:ISBN:978-952-60-5879-5
Abstract (Tiivistelmä)
Aalto University, P.O. Box 11000, FI-00076 Aalto www.aalto.fi
Author
Jussi Rämö
Name of the doctoral dissertation
Ekvalisointitekniikoita kuulokekuunteluun (Equalization Techniques for Headphone Listening)
Publisher School of Electrical Engineering
Unit Department of Signal Processing and Acoustics
Series Aalto University publication series DOCTORAL DISSERTATIONS 147/2014
Field of research Acoustics and Audio Signal Processing
Manuscript submitted 13 February 2014
Permission to publish granted (date) 6 August 2014
Date of the defence 31 October 2014
Language English
Article dissertation (summary + original articles)
Abstract
The spread of digital music and mobile phones has increased the popularity of headphones. At the same time, the typical headphone listening environment has changed from quiet to noisy, since people use headphones while on the move. Background noise affects the perceived sound quality and also forces people to listen to music at higher volume levels.
This dissertation studies headphone listening in situations where sounds originating from the environment are also heard. Ambient sounds can be either noise or informative sounds, such as speech. The first part of this work focuses on the case in which the ambient sounds are noise that disturbs headphone listening. In the second part, the ambient sounds are informative sounds that the user wants to hear even while wearing headphones, as in an augmented reality system, for example. In both cases, the headphone listening experience can be improved with an equalizer.
This work presents a virtual listening test environment for evaluating headphones in noise. The headphones are simulated with digital filters, which enable the use of arbitrary test signals. The masking caused by background noise is studied with a simulator that uses an auditory masking model to simulate how music is perceived in noise. Using the same principles, a psychoacoustic adaptive equalizer was also implemented, which adapts to the music being played and to the ambient noise. The work showed that the equalizer keeps the sound pressure level of the music moderate, since it boosts only the frequency regions that need amplification. This is very important for the prevention of hearing damage.
The work also presents two digital hear-through applications, the first of which is an improved version of an analog system. The second application was developed for concerts where hearing protection is needed; it enables user-specific hearing protection and equalization of the music heard through the headset. A potential problem in both applications is the comb-filtering effect, which can degrade the sound quality of the system. However, the work showed that the comb-filtering effect can be controlled thanks to the good passive attenuation of in-ear headphones.
In addition, this work develops an optimization algorithm for a high-order graphic equalizer, which reduces ripple in the equalizer response at the transition bands. A new high-precision graphic equalizer based on parallel second-order filters was also developed. In addition to headphone applications, the new equalization techniques can be used in many other audio signal processing applications.
Keywords Acoustic signal processing, audio systems, digital filters, augmented reality, psychoacoustics
ISBN (printed) 978-952-60-5878-8
ISBN (pdf) 978-952-60-5879-5
ISSN-L 1799-4934
ISSN (printed) 1799-4934
ISSN (pdf) 1799-4942
Location of publisher Helsinki
Pages 151
Location of printing Helsinki
Year 2014
urn http://urn.fi/URN:ISBN:978-952-60-5879-5
Preface
The work presented in this dissertation was carried out at the Department of Signal Processing and Acoustics at Aalto University School of
Electrical Engineering in Espoo, Finland, between September 2009 and October 2014. During this period, I have been lucky to have had
the chance to be involved in various projects with Nokia Gear and Nokia
Research Center. Without these great projects the writing of this thesis
would not have been possible.
I want to express my deepest gratitude to my supervisor Prof. Vesa
Välimäki for all the guidance and support during the writing of this dissertation. Your guidance has been inspiring and invaluable. I am also extremely grateful to my first supervisor Prof. Matti Karjalainen who originally believed in me and gave me the chance to work in the acoustics lab
back in 2008.
I wish to thank my co-authors Dr. Miikka Tikander, Mr. Mikko Alanko,
and Prof. Balázs Bank, for all your contributions. We managed to do
pretty cool stuff together. I would also like to thank all the people from
Nokia, with whom I have had the chance to work during the projects.
I also want to thank Luis Costa for proof-reading my work and constantly
helping me to improve my writing skills.
I would like to thank my pre-examiners Dr. Joshua D. Reiss and Dr. Aki
Härmä, whose comments were valuable in further improving this dissertation. Thank you also to Dr. Sean Olive for agreeing to act as my opponent.
As many have said before, the working environment in the acoustics lab
is absolutely top-notch and I have had the pleasure to work with fantastic colleagues in the lab over the years. I would like to thank my
co-workers: Sami, Julian, Heidi-Maria, Rafael, Stefano, Hannu, Jussi,
Antti, Henkka, Pasi, Ole, Rémi, Jari, Jyri, Heikki T., Fabian, Atakan,
Ville, Magge, Mikkis, Javier, Tapani, Olli, Marko, Ilkka, Tuomo, Teemu,
Okko, Henna, Symeon, Archontis, Julia, Sofoklis, and all the other former
and present people at the acoustics lab I have met during these years.
Of course, I also need to thank all my friends outside the university who
have helped me to maintain the balance between work and free time.
I am deeply grateful to my parents Arto Rämö and Heli Salmi for always believing in me, supporting me, and showing pride in the things I do.
I cannot thank you enough for that. Your support has been and still is
extremely important to me. Finally, I would like to say special thanks to
my girlfriend Minna for all the support and love you have shown me over
the years.
Espoo, September 18, 2014,
Jussi Rämö
Contents

Preface
Contents
List of Publications
Author's Contribution
List of Symbols
List of Abbreviations
1. Introduction
2. Headset Acoustics and Measurements
   2.1 Headphone Types
   2.2 Headset Acoustics
   2.3 Ear Canal Simulators
   2.4 Frequency Response Measurements
   2.5 Headphone Equalization
   2.6 Ambient Noise Isolation
   2.7 Occlusion Effect
3. Augmented Reality Audio and Hear-Through Systems
   3.1 Active Hearing Protection
   3.2 Augmented Reality Audio
       3.2.1 ARA Applications
   3.3 Comb-Filtering Effect
4. Auditory Masking
   4.1 Critical Bands
   4.2 Estimation of Masking Threshold
   4.3 Partial Masking
5. Digital Equalizers
   5.1 Equalizers
   5.2 Parametric Equalizer Design
   5.3 High-Order Graphic Equalizer Design
   5.4 Parallel Equalizer Design
   5.5 Comparison of the Graphic Equalizers
6. Summary of Publications and Main Results
7. Conclusions
Bibliography
Errata
Publications
List of Publications
This thesis consists of an overview and of the following publications, which are referred to in the text by their Roman numerals.
I J. Rämö and V. Välimäki. Signal Processing Framework for Virtual
Headphone Listening Tests in a Noisy Environment. In Proc. AES 132nd
Convention, 6 pages, Budapest, Hungary, April 2012.
II J. Rämö, V. Välimäki, M. Alanko, and M. Tikander. Perceptual Frequency Response Simulator for Music in Noisy Environments. In Proc.
AES 45th Int. Conf., 10 pages, Helsinki, Finland, March 2012.
III J. Rämö, V. Välimäki, and M. Tikander. Perceptual Headphone Equalization for Mitigation of Ambient Noise. In Proc. Int. Conf. Acoustics,
Speech, and Signal Processing, pp. 724–728, Vancouver, Canada, May
2013.
IV J. Rämö and V. Välimäki. Digital Augmented Reality Audio Headset.
Journal of Electrical and Computer Engineering, Vol. 2012, Article ID
457374, 13 pages, September 2012.
V J. Rämö, V. Välimäki, and M. Tikander. Live Sound Equalization and
Attenuation with a Headset. In Proc. AES 51st Int. Conf., 8 pages,
Helsinki, Finland, August 2013.
VI J. Rämö and V. Välimäki. Optimizing a High-Order Graphic Equalizer
for Audio Processing. IEEE Signal Processing Letters, Vol. 21, No. 3,
pp. 301–305, March 2014.
VII J. Rämö, V. Välimäki, and B. Bank. High-Precision Parallel Graphic
Equalizer. IEEE/ACM Transactions on Audio, Speech and Language
Processing, Vol. 22, No. 12, pp. 1894–1904, December 2014.
Author’s Contribution
Publication I: “Signal Processing Framework for Virtual Headphone
Listening Tests in a Noisy Environment”
The author collaborated in the development of the filter design algorithms.
He made the measurements and implemented the whole system by himself. He also wrote and presented the entire work.
Publication II: “Perceptual Frequency Response Simulator for Music
in Noisy Environments”
The author conducted the measurements, listening tests, and implementation of the perceptual frequency response simulator in collaboration
with the co-authors. Furthermore, the author wrote the article and presented it at the conference.
Publication III: “Perceptual Headphone Equalization for Mitigation of
Ambient Noise”
The author collaborated in the algorithm development. He implemented
the signal processing algorithms for the perceptual headphone equalizer.
Furthermore, the author wrote and presented the entire paper.
Publication IV: “Digital Augmented Reality Audio Headset”
The author collaborated in the signal processing algorithm development.
He implemented and wrote the work in its entirety, except for the abstract and conclusions.
Publication V: “Live Sound Equalization and Attenuation with a
Headset”
The author designed the system in collaboration with the co-authors, based
on the original idea of the second author. The author programmed the
Matlab simulator as well as the real-time demonstrator. Furthermore,
the author wrote the article and presented it at the conference.
Publication VI: “Optimizing a High-Order Graphic Equalizer for
Audio Processing”
The author discovered, designed, and implemented the optimization algorithm and was the main author of the paper.
Publication VII: “High-Precision Parallel Graphic Equalizer”
The author was responsible for coordinating and editing the work based
on the idea of the co-authors. The author wrote the abstract and Sections
I, IV, and V. He also collaborated in the algorithm design and implemented
most of the new Matlab codes and examples as well as provided all the
figures and tables included in the paper.
List of Symbols

A(z)            transfer function of an allpass filter
a               frequency parameter of the Regalia-Mitra filter
a_c             frequency parameter of the Regalia-Mitra filter for cut cases
a_{k,1}, a_{k,2}  denominator coefficients of the parallel equalizer
b               allpass coefficient of the Regalia-Mitra filter
b_{k,0}, b_{k,1}  numerator coefficients of the parallel equalizer
B               two-slope spreading function
c               speed of sound
f               frequency in Hertz
f_{r,open}      first quarter-wavelength resonance of an open ear canal
f_{r,closed}    first half-wavelength resonance of a closed ear canal
g_d             gain of direct sound
g_eq            gain of equalized sound
h(t)            impulse response of a system
H(z)            transfer function of a system
K               gain of the Regalia-Mitra filter
L               delay in samples
L_M             sound pressure level of a masker
l               length of the ear canal
l_eff           effective length of the ear canal
P               number of filter sections
r               average radius of the ear canal
U               masking energy offset
x(t)            input signal (sine sweep)
y(t)            output signal (recorded sine sweep)
ν               frequency in Bark units
Ω               normalized filter bandwidth of the Regalia-Mitra filter
ω0              normalized center frequency of the Regalia-Mitra filter
List of Abbreviations

ANC     active noise control
ARA     augmented reality audio
CPU     central processing unit
DRP     drum reference point
FIR     finite impulse response
FFT     fast Fourier transform
GPS     Global Positioning System
GPU     graphics processing unit
HRTF    head-related transfer function
IEC     International Electrotechnical Commission
IFFT    inverse fast Fourier transform
IIR     infinite impulse response
ISO     International Organization for Standardization
ITU-T   International Telecommunication Union, Telecommunication Standardization Sector
LPC     linear prediction coefficients
NIHL    noise-induced hearing loss
PGE     parallel graphic equalizer
SPL     sound pressure level
1. Introduction
Nowadays many people listen to music through headphones mostly in
noisy environments due to the widespread use of portable music players,
mobile phones, and smartphones. Gartner Inc. reported that worldwide
mobile phone sales to end users totaled 1.75 billion units in 2012, while
the fourth quarter of 2012 set a record in smartphone sales, totaling over 207 million units, more than a 38 percent increase from the same quarter of 2011 [1]. However, already in the third quarter of 2013, smartphone sales reached over 250 million units, and global mobile phone sales in 2013 were forecast at 1.81 billion units [2]. Almost all mobile phones, including low-priced budget phones, are capable of playing music and typically come with a stereo headset, which can also be
used as a hands-free device (see Figure 1.1). Thus, basically everybody
who owns a mobile phone is a potential headphone user.

Figure 1.1. Different mobile headsets, where (a) shows a smartphone and an in-ear headset, (b) shows a button-type headset, and (c) shows an on-the-ear headset.
Due to the mobile nature of today’s headphone usage, the listening environments have also changed quite dramatically from quiet homes and
offices to noisy streets and public transportation vehicles. The increased
ambient noise of the mobile listening environments introduces a problem,
since the noise masks the music by reducing its perceived loudness or by
masking parts of the music completely. This often leads to increased noise
exposure levels, since people try to compensate for the masking of the music by turning up the volume, which can increase the risk of hearing impairment. First, people should use suitable headphones that isolate ambient sounds well; second, the 'unmasking' of the music signal should be done in a way that increases the level of the music as little as possible.
Additionally, now that people increasingly use their headphones, it is
necessary to have a hear-through function in the headphones so that communication is easy and enjoyable even without removing the headphones.
This requires built-in microphones and an equalizer to compensate for the
changes that the headphones introduce to our outer ear acoustics. The
ideal hear-through function, which is also the basis in augmented reality
audio (ARA) [3], is to have completely transparent headphones that do
not alter the timbre of the incoming sounds at all.
This work considers how we can devise and assess improved signal processing approaches that enhance the experience of listening over headphones, including how to implement a psychoacoustically motivated equalizer for headphones, especially for noisy environments, and how to integrate headphone usage further into our everyday lives. Thus, the
main objectives of this dissertation are to
1. provide means to demonstrate the sound quality of different headphones
especially in noisy environments,
2. enhance the mobile headphone listening experience using digital signal
processing,
3. improve the precision of graphic equalizers, and
4. develop novel digital methods for hear-through systems.
Objective 1 provides essential results that help us to understand and experience how different types of headphones behave in a noisy environment. After understanding the principles of objective 1, it is possible to devise novel signal processing methods, as defined in objective 2, which enhance the headphone listening experience, e.g., by adaptively equalizing the music according to the ambient noise. In order to equalize the music to mitigate the masking caused by ambient noise, it is necessary to satisfy objective 3 and to design an equalizer with frequency selectivity similar to that of the human ear. Finally, to integrate headphones more into our everyday lives, the headphones should have a hear-through function that allows us to wear the headphones during conversations and to use them with augmented reality audio applications.
The contents of this dissertation consist of this overview part accompanied by seven peer-reviewed publications, of which three have been published in international journals and four in international conference proceedings.
These publications tackle the research questions aiming to satisfy the objectives. Publications I, II, and III tackle the first objective by exploring headphone listening in noisy environments. Publication I introduces
a test environment for virtual headphone listening tests, which also enables the evaluation of headphones in the presence of virtual ambient
noise, whereas Publication II presents a perceptual frequency response
simulator, which simulates the effect of the auditory masking caused by
the ambient noise.
Objective 2 is addressed especially in Publication III, which introduces
an intelligent adaptive equalizer that is based on psychoacoustic models.
However, the groundwork for Publication III is done in Publications I, II,
and VI. Objective 3 is covered in Publications VI and VII, which present
an optimizing algorithm for a high-order graphic equalizer and a novel
high-precision graphic equalizer implemented with a parallel filter structure, respectively. The final objective is addressed in Publications IV and
V, where digital hear-through systems are developed and evaluated.
The overview part of this dissertation is organized as follows. Section 2
discusses headset acoustics and their measurement techniques, including
ear canal simulators as well as magnitude frequency response and ambient noise isolation measurements. Section 3 introduces hear-through
systems along with the concept and applications of augmented reality audio. Section 4 examines the auditory masking phenomenon, including the
concepts of masking threshold and partial masking. Section 5 introduces
digital equalizers, giving also three examples of digital equalizer filters
used during this work. Section 6 summarizes the publications and their
main results, and Section 7 concludes this overview part.
2. Headset Acoustics and Measurements
This section introduces basic headset acoustics and their measurement
techniques. The focus is on in-ear headphones, since they are increasingly used in mobile music listening and in hands-free devices. Furthermore, in-ear headphones provide the tightest fit of all headphones, which
leads to excellent ambient noise isolation, but at the same time introduces
problems, such as unnatural ear canal resonances and the occlusion effect,
due to the tightly blocked ear canal. The emphasis on the measurement
techniques is on the frequency response and ambient noise isolation measurements, since they greatly affect the perceived sound quality in noisy
environments.
Section 2.1 provides a brief introduction to different headphone types
and Section 2.2 introduces the basics of headphone acoustics, providing
background information for Publications I through V. Section 2.3 describes the history and current status of ear canal simulators that are
used in headphone measurements, related to Publications I–V. Sections
2.4 and 2.6 recapitulate frequency response and ambient noise isolation
measurement procedures. Section 2.7 discusses the occlusion effect, which
can easily degrade the user experience of an in-ear headset (Publications
IV and V).
2.1 Headphone Types
There are many different types of headphones, which all have their pros
and cons in divergent listening environments. Furthermore, different applications set distinct requirements for headphones, such as size, ambient
noise isolation, sound quality, and even cosmetic properties. ITU Telecommunication Standardization Sector (ITU-T) categorizes headphones into
four groups: circum-aural, supra-aural, intra-concha, and in-ear/insert headphones [4].

Figure 2.1. Categorization of headphones according to ITU-T Recommendation P.57, where (a) shows a circum-aural, (b) a supra-aural, (c) an intra-concha, and (d) an in-ear/insert headphone. Adapted from [4].

Figure 2.1 shows the ITU-T categorization of the headphones, where (a) shows the circum-aural headphones, which are placed
completely around the ear against the head; (b) shows the supra-aural
headphones, which are placed on top of the pinna (also called on-the-ear
headphones); (c) shows the intra-concha headphones (button-type), which
are put loosely in the concha cavity; and (d) shows the in-ear headphones,
which are inserted tightly into the ear canal (similarly to earplugs).
When one or more microphones are wired to the headphones, they become a headset. Commonly, a mono microphone is installed in the wire
of stereo headphones. Often, the casing of the in-wire microphone is also
used as a remote control for the music player. When two microphones are
mounted into headphones in a way that they are placed near the ear canal
entrances of the user, the binaural cues of the user can be preserved well
[5].
2.2 Headset Acoustics
Normally, when a person is listening to surrounding sounds with open
ears, the incident sound waves are modified by the environment as well
as by the listener’s torso, head, and outer ear. However, when the same
surrounding sounds are captured with a conventional microphone and
later reproduced using headphones, all these natural modifications are
lost. Thus, headphone listening is quite different from the natural openear hearing that we are used to. Headphones also produce almost perfect
channel separation between the left and right channels, which results in
a situation where sound is perceived to come from inside the head.
Headphones can alter the acoustics of the open ear canal by blocking
it. This happens especially when using in-ear headphones, which tightly
block the entrance of the ear canal. An open ear canal is basically a tube,
which is open at one end and closed at the other by the ear drum. Thus,
the open ear canal acts as a quarter-wavelength resonator, which has its first resonance at the frequency

f_{r,\mathrm{open}} = \frac{c}{4 l_{\mathrm{eff}}},  (2.1)
where c is the speed of sound and leff is the effective length of the ear
canal. The effective length of the ear canal is slightly longer than the
physical length due to the attached-mass effect [6]. The effective length
of the ear canal (as an open-ended tube) can be estimated as
l_{\mathrm{eff}} = l + \frac{8r}{3\pi},  (2.2)
where r is the average radius and l is the physical length of the ear canal
[6, 7].
An average ear canal has a length of approximately 27 mm and a diameter of about 7 mm [8]. When these measures are used in (2.1) and (2.2), the
effective length leff is 30 mm and the first quarter-wavelength resonance
occurs at the frequency of 2890 Hz, when the speed of sound is 346 m/s
(corresponding to 25◦ C). Typically, the first resonance of the human ear
is located around 3 kHz with the maximum boost of approximately 20 dB
[9].
Figure 2.2 shows the effect of an open ear canal, where the contributions of the pinna, head, and upper torso are also included. The curve
represents the difference between the magnitude response of a Genelec
8030A active monitor measured first 2 meters away in free-field, and then
measured again from inside a human ear canal located 2 meters away.
The microphone was placed 1 cm deep into the ear canal for the second
measurement. As can be seen in Figure 2.2, the first quarter-wavelength
resonance of this individual appears at 3.0 kHz.

Figure 2.2. Effect of the torso, outer ear, and ear canal on the magnitude of the frequency response measured approximately 1 cm deep from the ear canal of a human subject. The quarter-wavelength resonance, created by the open ear canal, appears at 3.0 kHz.
When the ear canal is blocked by an in-ear headphone, the ear canal
acts as a half-wavelength resonator. The first half-wavelength resonance
occurs at

f_{r,\mathrm{closed}} = \frac{c}{2l}.  (2.3)
The insertion of the in-ear headphone also shortens the length of the ear
canal (by approximately 5 mm) and increases the temperature of the air in
the ear to be about 35 °C [10]. The speed of sound at 35 °C is approximately 352 m/s. Using these measures in Equation (2.3), the first half-wavelength resonance of a closed ear canal occurs at 7180 Hz. Thus, the insertion of the headphone creates a new unnatural resonance as well as cancels the natural resonance people are used to hearing.
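These resonance estimates are easy to verify numerically. The following Python sketch evaluates Equations (2.1)–(2.3) with the average dimensions given above; the 24.5 mm closed-canal length is an illustrative assumption chosen so that the result matches the 7180 Hz quoted in the text.

```python
import math

# Average ear canal dimensions from the text: length 27 mm, diameter 7 mm [8]
l, r = 0.027, 0.0035                 # physical length and radius (m)

l_eff = l + 8 * r / (3 * math.pi)    # Eq. (2.2): effective length, ~30 mm
f_open = 346.0 / (4 * l_eff)         # Eq. (2.1), c = 346 m/s at 25 °C

# Closed canal: shortened by the earpiece and warmed to ~35 °C (c ~ 352 m/s).
# A length of 24.5 mm is assumed here; it reproduces the 7180 Hz cited above.
f_closed = 352.0 / (2 * 0.0245)      # Eq. (2.3)

print(f"l_eff = {1e3 * l_eff:.1f} mm")               # ~30.0 mm
print(f"open-canal resonance:   {f_open:.0f} Hz")    # ~2890 Hz
print(f"closed-canal resonance: {f_closed:.0f} Hz")  # ~7180 Hz
```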
2.3 Ear Canal Simulators
Headphones are typically measured using an ear canal simulator or a
dummy head. The idea of both of the measurement devices is to mimic
the actual average anatomy of the human ear canal, i.e., the measurement result should contain the ear canal resonances and the attenuation
at low frequencies due to the eardrum impedance [11, 10]. The measurement device should also provide a natural fitting of the headphone to the
measurement device.
The sound pressure level (SPL), and thus also the magnitude response,
created by a headphone in a cavity depends directly on the impedance
of the cavity. The impedance of the cavity depends on the volume of the
cavity and on the characteristics of anything connected to it [11].
In the case of an in-ear headphone inserted into an ear canal, there
is a volume between the headphone tip and the eardrum (about 0.5 cm³ [11]; it depends on the in-ear headphone and its insertion depth) as well as an acoustic compliance due to the middle ear cavity and the eardrum, which corresponds to a second volume (approximately 0.8 cm³ [11]). At low frequencies, the combined volume of 1.3 cm³ determines the impedance, and thus the SPL as well. When the
frequency increases, the impedance changes from stiffness-controlled to
mass-controlled as the mass of the eardrum becomes more significant
(around 1.3 kHz) [10]. After that, the impedance corresponds only to that
of the volume between the headphone tip and the eardrum (in this example 0.5 cm³), since the mass of the eardrum blocks the impedance of the
middle ear [11, 10]. At higher frequencies, above around 7 kHz, the ear
canal resonances affect the impedance the most [10].
One of the earliest endeavours to build a dummy head, which included a
detailed structure of the pinnae as well as a correct representation of the
ear canal up to the eardrum location, was conducted by Alvar Wilska in
1938 [12]. Wilska was actually a physician, and he had access to a morgue,
where he was able to make a mold of the head of a male corpse. He then
made a gypsum replica of the head with the ear regions formed from gelatine. He also built his own microphones (with a membrane diameter of
13 mm) and even installed the microphones at the eardrum location at an
angle, which corresponds to the human anatomy [12].
The first headphone measurement devices were couplers that had a
cylindrical volume and a calibrated microphone [11, 10], which simulated
the acoustical load of an average adult ear. The couplers were mainly
used to measure hearing aids and audiometry headphones. However, the
simple cylindrical volume did not produce satisfactory results, which led
to the development of more ear-like couplers [11, 13].
The pioneer of the modern ear simulator is Joseph Zwislocki, who designed an occluded ear canal simulator, which has a cylindrical cavity
corresponding to the volume of the ear canal, a microphone positioned
at the ear drum location called the drum reference point (DRP) [4], and
a mechanical acoustic network simulating the acoustic impedance of the
ear drum [14].
The acoustic network consists of several small cavities (typically from
two to five), which are all connected to the main cylindrical cavity via a
thin tube [11, 10]. The purpose of these thin tubes and cavities is to simulate the effective total volume of the ear, which changes with frequency, as
explained above. As the frequency rises, the impedance of the thin tubes
that connect the small cavities to the main cavity increases, which causes
them to close off, and thus the effective volume of the simulator gradually
falls from 1.3 cm³ to 0.6 cm³, similarly as in the real ear [11].
Sachs and Burkhard [15, 16], Brüel [10], Voss and Allen [17], and Saltykov and Gebert [18] have evaluated the performance of the Zwislocki ear
canal simulator and the previous 2 cm³ headphone couplers (often called 2cc couplers). Their results showed that even though there is significant variability in the magnitude of the impedance across human ear canals, the Zwislocki ear simulators produce reasonably accurate results when compared to real-ear measurements, whereas the 2 cm³ calibration couplers give worse results. Furthermore, Sachs and Burkhard
[15] suggested that better results could be achieved for the 2cc couplers,
if electronic correction filtering is applied.
Today, the IEC 60318-4 Ed 1.0b:2010 [19] and ANSI S3.25–2009 [20]
standards define the occluded ear simulators that are used to measure
headphones. The occluded ear simulator can be used to measure in-ear
headphones and hearing aids as such, and it can be extended to simulate
the complete ear canal and the outer ear [19, 21, 22]. IEC 60318-4 (2010)
defines the frequency range of the occluded ear canal simulator to be from
100 Hz to 10 kHz, whereas the previous standard IEC 60711, published in
1981, defined an upper frequency limit of 8 kHz. Outside this range, the simulator can be used as an acoustic coupler at frequencies down to 20 Hz and up to 16 kHz. However, the occluded
ear simulator does not simulate the human ear accurately in the extended
frequency range.
2.4 Frequency Response Measurements
The frequency response measurement is a good way to study the performance of headphones. The magnitude frequency response shows the
headphones’ ability to reproduce different audible frequencies, which determines the perceived timbral quality of the headphones. A widely used
technique to measure frequency responses is the swept-sine technique, which has been proposed by Berkhout et al. [23], Griesinger [24, 25], Farina
[26, 27], as well as Müller and Massarani [28]. The swept-sine technique
as well as other measurement techniques are compared by Stan et al. [29].
The basic idea in sweep measurements is to apply a known signal x(t),
which has a sinusoidal shape with exponentially increasing frequency,
through the headphones, which are fitted in an ear canal simulator or
a dummy head, and record the response of the system y(t) at the DRP.
The impulse response of the system h(t) is calculated from the x(t) and
y(t) with the fast Fourier transform (FFT) deconvolution

h(t) = \mathrm{IFFT}\left[\frac{\mathrm{FFT}(y(t))}{\mathrm{FFT}(x(t))}\right],  (2.4)

where IFFT is the inverse fast Fourier transform.
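A minimal Python sketch of this measurement principle is given below, assuming an exponential sine sweep as in [26]; the sweep parameters, the toy test system, and the small regularization constant added to the denominator are illustrative choices, not values from the publications.

```python
import numpy as np

def impulse_response(x, y):
    """Equation (2.4): estimate h(t) from the sweep x(t) and the recording y(t)."""
    n = len(x) + len(y) - 1                  # zero-pad to avoid circular wrap-around
    X = np.fft.rfft(x, n)
    Y = np.fft.rfft(y, n)
    return np.fft.irfft(Y / (X + 1e-12), n)  # naive regularization of near-zero bins

# Exponential sine sweep, e.g., 20 Hz to 20 kHz in 5 s at a 48-kHz sample rate
fs, T, f1, f2 = 48000, 5.0, 20.0, 20000.0
t = np.arange(int(T * fs)) / fs
R = np.log(f2 / f1)
x = np.sin(2 * np.pi * f1 * T / R * (np.exp(t * R / T) - 1.0))

# y would be recorded at the DRP of an ear simulator; here, a toy system instead:
y = 0.8 * x + 0.1 * np.concatenate((np.zeros(48), x[:-48]))
h = impulse_response(x, y)                   # peaks of 0.8 at lag 0, 0.1 at lag 48
```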
Figure 2.3. Magnitude frequency responses of different headphones.

Figure 2.3 shows four magnitude frequency responses of different headphones measured using a dummy head and the sine-sweep technique. The
responses are normalized to have the same magnitude value at 1 kHz. As
can be seen in the figure, the in-ear headphone has the strongest low-frequency response.
This is caused by the pressure chamber principle [30, 31], which affects
low and middle frequencies, when sound is produced in small, airtight
enclosures. The force of the headphone driver pumps the air inside the
ear canal cavity producing a sound pressure proportional to the driver
excursion. Moreover, when the wavelength is large compared to the dimensions of the ear canal (up to about 2 kHz [30]), the sound pressure is
distributed uniformly in the volume, which enables the production of high
SPLs at low frequencies. Thus, the pressure inside the closed ear canal is
in phase with the volume displacement of the transducer membrane and
the amplitude is proportional to that volume displacement [30].
The sine-sweep measurement technique was an essential tool in Publication I, but it was also utilized in Publications II–V and VII. Publication
I introduced a signal processing framework for virtual headphone listening tests in simulated noisy environments. Briolle and Voinier proposed
the idea of simulating headphones by using digital filters in 1992 [32].
Hirvonen et al. [33] presented a virtual headphone listening test methodology, which was implemented using dummy-head recordings. Hiekkanen
et al. [34] have proposed a method to virtually evaluate and compare loudspeakers using calibrated headphones. Furthermore, a recent publication
by Olive et al. [35] tackles the virtual headphone listening task by using infinite impulse response (IIR) filters constructed based on headphone
measurements to simulate the headphones.
Similarly as in [35], the main idea in Publication I is to simulate different headphones with digital filters, which allows the use of arbitrary music, speech, or noise as the source material. Thus, a wide range of impulse
responses of different headphones were measured using the sine-sweep
technique and a standardized head-and-torso simulator. The measured
impulse responses were used as finite impulse response (FIR) filters to
simulate the measured headphones. Furthermore, the ambient noise isolation capabilities of the headphones were also measured and simulated
using FIR filters (see Section 2.6).
2.5 Headphone Equalization
In the simulator of Publication I, all the modeled headphones are listened
to through one pair of calibrated headphones. The pair of headphones
used with the simulator must be compensated so that they introduce as
little timbre coloration to the system as possible. This is done by flattening
or whitening the headphone response.
A typical method to implement the compensation of the headphones
is to use an inverse IIR filter constructed from the magnitude response
measurement of the headphone [33, 35]. Olive et al. [35] used a slightly
smoothed minimum-phase IIR correction filter, which did not apply filtering above 10 kHz. Lorho [36] fitted a smooth curve to the headphone’s
magnitude response to approximate the mean response of his measurements. The fitted curve had a flat low-frequency shape, the most accurate match in the range of 2–4 kHz (around where the first quarter-wavelength resonance occurs), and a looser fit at high frequencies. Finally, he designed a linear-phase inverse filter from the smoothed approximation curve.
There can be problems with the inverse filter when the measured magnitude response has deep notches, because they will convert into high
peaks in the inversion. This was the case in [34] and in Publication I of
this work. Hiekkanen et al. [34] first smoothed the measured magnitude
response (with a moving 1/48-octave Hann window) and then inverted
the smoothed response. They then calculated a reference level based on
the average level of the inverted response (between 40–4000 Hz). After
that, they compressed all the peaks (above 4 kHz) exceeding the reference
level. Finally, they created a minimum-phase impulse response from the inverted, peak-compressed magnitude response.
In Publication I, the measured magnitude response of the headphones was first modeled with linear prediction coefficients (LPC). The LPC technique results in a minimum-phase all-pole filter, which can easily be inverted without any stability issues. A peak reduction (or, more precisely, a notch reduction) technique was also used; it was implemented by 'filling' the notch in the measured magnitude response in the frequency domain, after which the LPC technique was applied to the corrected magnitude response in which the notch was compensated.
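As a rough illustration of this approach, the sketch below fits an all-pole model to a measured impulse response with the autocorrelation method and uses the resulting A(z) as an inherently stable FIR whitening (inverse) filter. The model order and the toy input are assumptions, and the notch-filling step of Publication I is omitted.

```python
import numpy as np
from scipy.linalg import solve_toeplitz
from scipy.signal import lfilter

def lpc(x, order):
    """All-pole model A(z) of x via the autocorrelation (Levinson-type) method."""
    r = np.correlate(x, x, mode="full")[len(x) - 1 : len(x) + order]
    a = solve_toeplitz((r[:-1], r[:-1]), -r[1:])  # Toeplitz normal equations
    return np.concatenate(([1.0], a))             # A(z) = 1 + a1 z^-1 + ...

# h: measured headphone impulse response (a toy stand-in is used here)
rng = np.random.default_rng(0)
h = lfilter([1.0], [1.0, -0.9], rng.standard_normal(2048))

a = lpc(h, order=32)               # 1/A(z) is a minimum-phase all-pole model of h
flattened = lfilter(a, [1.0], h)   # filtering with A(z) whitens (inverts) the model
```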
2.6 Ambient Noise Isolation
The isolation capabilities of headphones are determined by the physical
structure of the headphones as well as by the coupling of the headphones
to the user’s ears. Circum-aural and supra-aural headphones are typically
coupled tightly against the head and the pinna, respectively. Furthermore, these headphones can have open or closed backs, where the open
back refers to earphone shells with leakage routes intentionally built into
them. Intra-concha headphones usually sit loosely in the concha cavity,
thus providing poor ambient noise isolation for the user. However, in-ear
headphones are used similarly as earplugs, and they are designed to have
good isolation properties.
Figure 2.4 shows the measured isolation curves of four different types of
headphones. In Figure 2.4, zero dB means that there is no attenuation at
all and negative values indicate increasing attenuation. The black curve
illustrates the isolation of open-back circum-aural headphones, the green
curve represents the isolation of closed-back supra-aural headphones, the
red curve shows the isolation of intra-concha headphones, and finally the
purple curve shows the isolation of in-ear headphones. As can be seen,
the in-ear headphones clearly have the best ambient noise isolation capability of these headphones, while the open-back circum-aural and the
intra-concha headphones have the worst sound isolation, as can be expected.
Figure 2.4. Measured isolation curves of different headphone types.

The isolation properties of headphones are further explored in Publications I–V, where the first three publications estimate and simulate the amount of ambient noise in the ear canal in typical music listening and communication applications, and the latter two consider how the leakage
of ambient noise affects the hear-through applications (see Section 3).
The isolation capabilities of headphones can be improved with active
noise control (ANC) [37, 38, 39, 40, 41]. The idea in ANC is to actively
reduce the ambient noise at the ear canal by creating an anti-noise signal,
which has the same sound pressure in magnitude but is opposite in phase
to the noise signal. The utilization of the ANC in headsets dates back
to the 1950s, where Simshauser and Hawley [42] conducted pure-tone
experiments with an ANC headset.
The two basic forms of ANC are feedforward [40] and feedback systems [41]. The main difference in these systems is the position of the
microphones that are used to capture ambient noise. Adaptive feedforward ANC uses an external reference microphone outside the headphone
to capture the noise and a microphone inside the headphone to monitor
the error signal. Feedback ANC has only the internal error microphone
that is used to capture the noise and to monitor the error signal. The
feedforward and feedback structures can also be combined to create more
efficient noise attenuation [39, 43].
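To make the adaptation idea concrete, here is a deliberately simplified feedforward sketch in Python: a plain LMS filter adapts an FIR anti-noise path from a reference microphone signal. This is an illustration, not the algorithms of the cited works; a practical headset ANC must also model the secondary path from the loudspeaker to the error microphone (e.g., with the filtered-x LMS algorithm), which this toy example ignores, and all parameters are illustrative.

```python
import numpy as np

def lms_anc(ref, noise_at_ear, taps=64, mu=0.01):
    """Toy feedforward ANC: adapt FIR weights w so that w * ref cancels
    the noise observed at the error microphone inside the earpiece."""
    w = np.zeros(taps)
    buf = np.zeros(taps)
    residual = np.zeros(len(ref))
    for n in range(len(ref)):
        buf = np.roll(buf, 1)
        buf[0] = ref[n]                            # reference-mic delay line
        residual[n] = noise_at_ear[n] - w @ buf    # error after anti-noise
        w += mu * residual[n] * buf                # LMS weight update
    return residual, w

# Toy demo: the "ear" hears a delayed, attenuated copy of the reference noise
rng = np.random.default_rng(1)
ref = rng.standard_normal(20000)
noise_at_ear = 0.5 * np.concatenate((np.zeros(10), ref[:-10]))
residual, w = lms_anc(ref, noise_at_ear)   # residual power decays as w adapts
```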
Figure 2.5 shows the effect of a feedback ANC, measured from a commercial closed-back circum-aural ANC headset. The feedback ANC works
best at low frequencies, while it has a negligible effect at high frequencies.
Furthermore, the feedback ANC typically amplifies the noise at its transition band, around 1 kHz [39], as shown in Figure 2.5. The transition
band is the frequency band between the low frequencies, where the ANC
operates efficiently and the high frequencies, where the effect of the ANC is negligible.

Figure 2.5. Effect of active noise control.
2.7 Occlusion Effect
The occlusion effect refers to a situation where one's ears are blocked by a headset or a hearing aid and one's own voice sounds loud and hollow. This
effect is due to the bone conducted sounds that are conveyed to the ear
canals [44]. The occlusion effect is experienced as annoying and unnatural, since the natural perception of one’s own voice is essential to normal
communication [45]. With music listening this is rarely a problem, since
the user rarely talks while listening to music. However, with ARA applications and hearing aids, the occlusion effect can be a real nuisance.
Normally, when the ear canal is not blocked, the bone-conducted sounds
can escape through the ear canal opening into the environment outside
the ear and no extra sound pressure is generated in the ear canal [30, 46],
as illustrated in Figure 2.6(a). This is the normal situation that people
are used to hear. However, when the entrance of the ear canal is blocked,
the perceived sound of one’s own voice is changed [45]. This is because
the bone-conducted sounds that are now shut in the ear canal obey the
pressure chamber principle, as discussed in Section 2.4. This amplifies
the perception of the bone-conducted sounds when compared to the open
ear case [30, 11, 47, 48, 46], as illustrated in Figure 2.6(b). In addition
to the bone-conducted sounds, the air-conducted sounds are attenuated
when the ear canal is occluded, which further accentuates the distorted perception of one's own voice [45].
Figure 2.6. Occlusion effect. Diagram (a) illustrates the open ear, where the bone-conducted sound radiates through the ear canal opening, and (b) illustrates the occluded ear, where the bone-conducted sound is amplified inside the ear canal. Adapted from [30].
The occlusion effect can be passively reduced by introducing vents in
the headset casing. However, the vents decrease the ambient noise isolation of the headset, which is usually undesirable. Furthermore, the vents
increase the risk of acoustic feedback when the headset microphone is
placed near the earpiece. The occlusion effect can also be reduced actively, typically using a feedback cancellation method similar to feedback ANC. That is, an internal microphone inside the ear canal is connected to the signal chain with a feedback loop [49]. According to Mejia et al. [49], active occlusion reduction can provide 15 dB of occlusion reduction around 300 Hz, which results in a more natural perception
of one’s own voice and therefore increases the comfort of using a headset.
Furthermore, active occlusion reduction does not deteriorate the ambient
noise isolation capability of a headset, since no additional vents or leakage
paths are needed.
3. Augmented Reality Audio and Hear-Through Systems
This section presents the concept of ARA as well as touches on the subject
of active hearing protection. According to the World Health Organization [50], more than 360 million people have disabling hearing loss. The number is so large that current hearing aid production meets less than 10% of the global need.
It is extremely important to protect one’s hearing, and not only from loud
environmental noises, but also from loud music listening, which can cause
hearing loss as well. Furthermore, it is widely accepted that temporary
hearing impairments can lead to accumulated cellular damage, which can
cause permanent hearing loss [51]. This section provides background information for Publications III, IV and V.
3.1 Active Hearing Protection
Noise-induced hearing loss (NIHL) occurs when we are exposed to very
loud sounds (over 100 dB) or sounds that are loud (over 85 dB) for a long
period of time. NIHL can be roughly divided into two categories, namely
work-related and self-imposed NIHL. The first can be controlled with regulations, which, e.g., require employees to wear hearing protection in loud working environments. However, laws or regulations cannot prevent
self-imposed NIHL, such as that caused by listening to loud music or live
concerts.
People’s listening habits when using headphones have been studied, e.g.,
in [52, 53], where the highest user-set music levels reached over 100 dB
in the presence of background noise [52]. Studies conducted during live
music concerts [54, 55] indicate that the A-weighted exposure levels can
exceed 100 dB during a concert. Furthermore, Clark et al. [54] discovered that five of the six subjects had significant hearing threshold shifts
(< 50 dB) mainly around the 4-kHz region after a rock concert. The hearing of all subjects returned to normal 16 hours after the concert.
A major problem with passive hearing protection is that it impairs speech
intelligibility as well as blocks informative sounds, such as alarms or instructions. With active hearing protectors, it is possible to implement
a hear-through function, which lets low-level ambient sounds, such as
speech, through the hearing protector by capturing the sounds with an
included microphone and by reproducing them with a loudspeaker that
is implemented in the hearing protector. When the level of the ambient
sounds increases, the gain of the reproduced ambient sounds is decreased,
and eventually when the gain goes to zero, the harmfully loud sounds are
passively attenuated by the hearing protector [56].
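The level-dependent behavior just described can be sketched as a simple gain rule in Python; the SPL breakpoints below are illustrative assumptions, not values from [56].

```python
import numpy as np

def hear_through_gain(ambient_spl, safe_spl=75.0, limit_spl=100.0):
    """Gain for the reproduced ambient sound: unity below a safe level,
    ramping down to zero at the limit, above which only the passive
    attenuation of the hearing protector remains."""
    g = (limit_spl - ambient_spl) / (limit_spl - safe_spl)
    return float(np.clip(g, 0.0, 1.0))

print(hear_through_gain(60.0))   # 1.0 -> quiet speech is passed through
print(hear_through_gain(110.0))  # 0.0 -> harmfully loud sound is passively attenuated
```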
Publications III and V tackle the NIHL problem by offering signal processing applications which reduce the noise exposure caused by music.
Publication III introduces an intelligent, perceptually motivated equalizer for mobile headphone listening that adapts to different noisy environments. The idea is to boost only those frequency regions that actually
need amplification, and in this way minimize the volume increase caused
by the boosting. Publication V presents a hear-through application that
can be used during a loud concert to limit the noise exposure caused by the
live music. The hear-through function is user controllable, which allows
the user to do some mixing of their own during the concert.
Furthermore, the idea to enhance the target signal in the presence of
noise is also used in mobile communication applications [57, 58, 59] as
well as in automotive audio [60, 61]. In mobile communication applications, the target signal is typically speech, and this technique is called
near-end listening enhancement.
3.2 Augmented Reality Audio
Augmented reality is defined as a real-time combination of real and virtual worlds [3]. In ARA, a person’s natural sound environment is extended
with virtual sounds by mixing the two together so that they are heard simultaneously.
The mobile usage of ARA requires a headset with binaural microphones
[62, 63, 64]. A prerequisite of an ARA headset is that it should be able to
relay the natural sound environment captured by the microphones to the
ear canal unaltered, and with minimal latency. This copy of the natural
sound environment that has propagated through the ARA system is called a pseudoacoustic environment [3].

Figure 3.1. Block diagram of the ARA system.
However, the acoustics of the ARA headset differs from normal open ear
acoustics, and hence, the ARA headset requires equalization in order to
sound natural. The equalization is realized with an ARA mixer [63]. The
sound quality of the pseudoacoustic environment must be good so that
the users can continuously wear the ARA headset without fatigue. The
usability issues of the ARA headset have been studied by Tikander [65].
Because of the low latency requirements, earlier ARA systems were
implemented using analog components [63] in order to avoid the comb-filtering effect (see Section 3.3) caused by the delayed pseudoacoustic sound.
Publication IV introduces a novel digital version of the ARA system, as
well as analyses how the delay caused by the digital signal processing
affects the perceived signal.
3.2.1 ARA Applications
Figure 3.1 shows the system diagram for ARA applications. Virtual sounds
can be processed, e.g., with head-related transfer functions (HRTF) to
place the virtual sounds in the three-dimensional space around the user
[66, 67]. The ARA mixer and equalizer is used for the equalization of the
pseudoacoustic representation as well as for routing and mixing of all the
signals involved in the system. Furthermore, the binaural microphone
signals can be sent to distant users for communication purposes.
The ARA technology enables the implementation of different innovative applications, including applications in communication and information services, such as teleconferencing and binaural telephony with good
telepresence [68, 69], since the remote user can hear the same pseudoacoustic environment as the near-end user does.
There are also many possible application scenarios when the position
and orientation of the user is known, including an eyes-free navigation
system [70, 71], the virtual tourist guide [65], and location-based games
The positioning of the user can be done using the Global Positioning System (GPS), whereas determining the user's orientation requires
head tracking [73].
The ARA system can also be used to increase situational awareness [56].
Furthermore, it is possible to estimate different acoustic parameters from
the surrounding environment, such as room reflections [74] and reverberation time [75].
The ARA system could also be used as an assistive listening device.
In Publication V, the hear-through headset is used to attenuate ambient sounds while the pseudoacoustic representation is made adjustable
for the user. Enhancing ambient sounds would also be possible with current technology. The digital ARA technology presented in Publication IV
would enable an adjustable platform for a user-controllable ARA system
that could be individually tuned to the user’s needs.
3.3 Comb-Filtering Effect
In ARA, the pseudoacoustic representation undergoes some amount of delay due to the equalization filters. However, the largest delays are typically caused by the analog-to-digital and digital-to-analog converters operating before and after the actual equalization, respectively. The delays of a laptop and of an external FireWire audio interface measured in Publication IV, each including both the analog-to-digital and digital-to-analog converter delays, were 8.6 and 21 ms, respectively. Furthermore, the delay introduced by the converters of the DSP evaluation board used in Publication IV was approximately 0.95 ms.
In addition to the pseudoacoustic sound, the ambient sounds leak through and around the ARA headset. The leaked sound and the delayed pseudoacoustic sound are summed at the ear drum of the user, and because of the delay, it can easily result in a comb-filtering effect [76, 77]. The comb-filtering effect occurring in a hear-through device can be simulated with an FIR comb filter:

$$H(z) = g_d + g_{eq} z^{-L}, \qquad (3.1)$$

where $g_d$ is the gain for the direct leaked sound, $g_{eq}$ is the gain for the equalized pseudoacoustic sound, and $L$ is the delay in samples.
Figure 3.2. Magnitude frequency response of the FIR comb filter defined in Equation (3.1), when the delay is 2 ms. In (a), $g_d = g_{eq} = 0.5$, and in (b), $g_d = 0.1$ and $g_{eq} = 0.9$.
The worst-case scenario occurs when both of the signals have the same amount of energy, i.e., $g_d = g_{eq}$. Figure 3.2(a) shows the comb-filtering effect when the gains are set to $g_d = g_{eq} = 0.5$ and the delay is 2 ms. The value of 0.5 was chosen because when the sum of the gains is one, the tops of the peaks lie at 0 dB.
Figure 3.2(b) shows the magnitude response of an FIR comb filter described by Equation (3.1) with the same 2-ms delay as in Figure 3.2(a). However, now the direct sound is attenuated by 20 dB, and thus the values of $g_d$ and $g_{eq}$ are 0.1 and 0.9, respectively. As can be seen, the dips are at the same frequencies as in Figure 3.2(a), but their depth is reduced dramatically. In fact, with the 20-dB attenuation, the depth of the dips is only 2 dB.
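As an illustration of Equation (3.1), the following Python sketch evaluates the comb-filter magnitude response for the two gain settings of Figure 3.2; the sampling rate and the frequency grid are arbitrary choices made for this example.

```python
import numpy as np

def comb_magnitude_db(g_d, g_eq, delay_ms, freqs, fs=48000):
    """Magnitude response of H(z) = g_d + g_eq * z^-L, Eq. (3.1)."""
    L = round(fs * delay_ms / 1000.0)      # delay in samples
    w = 2 * np.pi * freqs / fs             # normalized angular frequency
    H = g_d + g_eq * np.exp(-1j * w * L)
    return 20 * np.log10(np.abs(H))

freqs = np.logspace(np.log10(20), np.log10(20e3), 2000)
worst = comb_magnitude_db(0.5, 0.5, 2.0, freqs)    # Fig. 3.2(a): deep dips
milder = comb_magnitude_db(0.1, 0.9, 2.0, freqs)   # Fig. 3.2(b): ~2-dB dips
print(worst.min(), milder.max() - milder.min())
```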
The audibility of the comb-filtering effect has been previously studied, e.g., in [78, 79, 76]. The amount of attenuation needed between the direct and delayed sound for the comb-filtering coloration to be inaudible depends on the delay and the type of audio that is listened to. The results of Schuller and Härmä [79] showed that the level difference between the direct and delayed sound should be around 26–30 dB in order to be inaudible. The result was averaged over different audio test material.
Brunner et al. [76] conducted a listening test in order to determine the audibility of the comb-filtering effect for speech, piano, and snare drum test
signals. Their results indicated that when the delay is 0.5–3 ms, which
corresponds to a typical latency of an ARA headset system, the average
level difference needed in order for the comb-filtering effect to be inaudible was approximately 12 dB for the speech and piano, and 17 dB for the
snare drum sample. The highest required attenuations (based on single subjects) for the piano, speech, and snare drum samples were 22, 23, and 27 dB, respectively.
This kind of isolation cannot be accomplished with all headphones, but
with in-ear headphones it is possible, since they can provide more than
20-dB attenuation above 1 kHz, as shown in Figure 2.4. Furthermore, the
idea in the ARA system is to wear the headset continuously and not to
compare the pseudoacoustic representation of the sound directly with the
real-world sounds by removing the headset. This allows the user of the
headset to get accustomed to minor timbral distortions.
4. Auditory Masking
Psychoacoustics aims to understand and characterize human auditory perception, especially the time-frequency processing of the inner ear [80]. Auditory masking is a common psychoacoustic phenomenon that occurs all the time in our hearing system. Auditory masking refers to a situation where one sound (a masker) affects the perceived loudness of another sound (a maskee). The maskee can either be rendered completely inaudible by the masker or merely reduced in loudness. The latter case is referred to as partial masking.
Auditory masking has been studied for a long time. Fletcher and Munson [81] investigated the relation between the loudness of complex sounds and masking as early as 1937. Despite the long history of research in the area, the auditory system, including auditory masking and hearing models, is still an active research topic [82, 83].
Knowledge of auditory masking has also been utilized in audio coding for 35 years, since Schroeder et al. [84] studied the optimization of speech codecs by exploiting the masking properties of human hearing. The idea in audio coding is to identify irrelevant information by analyzing the audio signal and then to remove this information from the signal according to psychoacoustical principles, including the analysis of the absolute threshold of hearing, the masking threshold, and the spreading of the masking across the critical bands [85, 86, 87].
This section provides the psychoacoustic context for Publications II and III.
4.1 Critical Bands
The critical band is an important concept when describing the time-frequency resolution of the human auditory system. Basically, it describes how the basilar membrane interprets different frequencies. The mechanical properties of the basilar membrane change as a function of its position. At the beginning, the membrane is quite narrow and stiff, while at the end it is wider and less stiff [88]. Every position on the basilar membrane reacts differently to different frequencies. In fact, the basilar membrane conducts a frequency-to-position transformation of sound [89, 90, 91].

Figure 4.1. Bark band edges on a logarithmic scale.
The critical bandwidth is based on the frequency-specific positions of the basilar membrane. The hearing system analyzes wide-band sounds such that the sounds within one critical band are analyzed as a whole. In terms of auditory masking, this means that the level of masking within a critical band is constant [92].
The Bark scale was proposed by Zwicker in 1961 [93] to represent the critical bands such that one Bark corresponds to one critical band. The critical bands behave linearly up to 500 Hz, and after that the behavior is mainly logarithmic. Figure 4.1 plots the band edges of the Bark scale, as introduced in [93]. Furthermore, the conversion from frequency in Hertz to the Bark scale can be approximately calculated from

$$\nu = 13 \arctan\left(\frac{0.76 f}{\mathrm{kHz}}\right) + 3.5 \arctan\left[\left(\frac{f}{7.5\,\mathrm{kHz}}\right)^{2}\right], \qquad (4.1)$$

where $f$ is the frequency in Hertz and $\nu$ is the mapped frequency in Bark units [94].
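Equation (4.1) is straightforward to implement; a minimal Python version:

```python
import numpy as np

def hz_to_bark(f_hz):
    """Hz-to-Bark conversion of Eq. (4.1) [94]."""
    f_khz = np.asarray(f_hz, dtype=float) / 1000.0
    return 13.0 * np.arctan(0.76 * f_khz) + 3.5 * np.arctan((f_khz / 7.5) ** 2)

print(hz_to_bark([100, 500, 1000, 4000]))  # approx. [1.0, 4.7, 8.5, 17.3] Bark
```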
4.2 Estimation of Masking Threshold
In order to estimate the inaudible critical bands of the maskee in the presence of a masker, a masking threshold is calculated. According to Zwicker
and Fastl [94], the masking threshold refers to the SPL needed for a maskee to be just audible in the presence of a masker. The calculation of
masking threshold used in this work (Publications II and III) is based on
an audio signal coding technique presented by Johnston [85]. There are more recent and advanced auditory models available as well, e.g., [95, 96].

Figure 4.2. Masking threshold estimation: the spreading function (B) and the masking-energy offset (U) are applied to the SPL of the masker (LM) on the critical-band (Bark) scale.
Nevertheless, the technique by Johnston was chosen, since the applications in Publications II and III required only an estimate of the masking threshold and not more complex auditory models or attributes. The Johnston method provides a straightforward way to calculate the masking threshold, and it is sufficiently low in complexity to be implemented in real time.
The masking threshold is calculated in steps [85], including the following: calculating the energy of the masker signal, mapping the frequency
scale onto the Bark scale and calculating the average energy for each critical band, applying a spreading function to each Bark band, summing the
individual spread masking curves into one overall spread masking threshold, estimating the tonality of the masker and calculating the offset for the
overall spread masking threshold, and finally, accounting for the absolute
threshold of hearing. All the equations needed to calculate the masking
threshold are found in Publications II and III.
Figure 4.2 illustrates the masking threshold estimation of one Bark
band, where LM is the energy of the Bark band, B is the spreading function, and U is the masking-energy offset. The masking-energy offset is
determined based on the tonality of the masker, which can be estimated using the spectral flatness measure, defined as the ratio of the geometric mean to the arithmetic mean of the power spectrum [85]. A
ratio close to one indicates that the spectrum is flat and the signal has
decorrelated noisy content, whereas a ratio close to zero implies that the
spectrum has narrowband tonal components [80]. The masking energy
offset is required since it is known that a noise-like masker masks more
effectively than a tonal masker [95].
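To make the procedure concrete, the sketch below combines these steps in Python. It assumes that the masker energies have already been averaged per Bark band; the triangular spreading slopes, the maximum-based combination of the spread curves (a simplification of the power-domain summation in [85]), and the tonality-to-offset mapping are commonly cited approximations rather than the exact expressions of Publications II and III.

```python
import numpy as np

def spectral_flatness(power_spectrum):
    """SFM: ratio of the geometric mean to the arithmetic mean [85]."""
    p = np.maximum(np.asarray(power_spectrum, dtype=float), 1e-12)
    return np.exp(np.mean(np.log(p))) / np.mean(p)

def masking_threshold_db(bark_energy_db, sfm, quiet_threshold_db):
    """Johnston-style masking threshold per Bark band (sketch)."""
    n = len(bark_energy_db)
    # Spread each band's energy over its neighbors; +25 dB/Bark toward
    # lower bands and -10 dB/Bark toward higher bands is a common
    # triangular approximation of the spreading function.
    spread = np.full(n, -np.inf)
    for k in range(n):                  # band receiving the masking
        for j in range(n):              # band contributing the masking
            dz = k - j
            slope = 25.0 * dz if dz < 0 else -10.0 * dz
            # Taking the maximum is a simplification of the power-domain
            # summation of the individual spread curves used in [85].
            spread[k] = max(spread[k], bark_energy_db[j] + slope)
    # Tonality-dependent masking-energy offset: a tonal masker (SFM -> 0)
    # masks less effectively than a noise-like masker (SFM -> 1).
    sfm_db = 10.0 * np.log10(max(sfm, 1e-12))
    alpha = min(sfm_db / -60.0, 1.0)    # tonality coefficient
    bands = np.arange(1, n + 1)
    offset = alpha * (14.5 + bands) + (1.0 - alpha) * 5.5   # dB per band
    threshold = spread - offset
    # The threshold cannot fall below the absolute threshold of hearing.
    return np.maximum(threshold, quiet_threshold_db)
```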
Figure 4.3. Partial masking effect, where two tones (unfilled markers) are in the presence of a masker.
4.3 Partial Masking
The maskee is not necessarily masked completely in the presence of the masker. When the maskee is just above the masking threshold, it is audible but perceived to be reduced in loudness. In such a case, the maskee is partially masked. Thus, the masker not only shifts the absolute threshold of hearing to the masked threshold, but it also produces a masked loudness curve [94]. The effect of partial masking is strongest near the masking threshold. When the level of the maskee increases, the effect of partial masking decreases at a slightly faster rate. As the level of the maskee increases to a sufficiently high level, the effect of partial masking ends, and the maskee is perceived at the same level in the presence of the masker as in quiet [97].
Figure 4.3 shows a situation where two tones with the same SPL are in the presence of a masker, one of which is at a slightly higher frequency (B) than the masker and the other at a much lower frequency (A). The thick
stem with the filled marker shows the SPL of the masker, whereas the
two stems with unfilled round markers (A and B) indicate the SPL of the
two tones. The triangle marker is the perceived SPL of the high-frequency
tone. As can be seen in Figure 4.3, the high-frequency tone (B) is above
the masking threshold but still close to it. Thus, the tone is audible but its
perceived SPL is reduced, as indicated by the triangle marker. However,
the low-frequency tone (A) is nowhere near the masking threshold, and
thus, it is not affected by the masker at all. The low-frequency tone is
perceived at the same level as it would be perceived without the masker.
5. Digital Equalizers
This section presents the basics of digital equalizers, which are widely used in this work, as well as a parametric and a graphic equalizer design, since they are used in Publications II–VII. Furthermore, the concept of a parallel equalizer is introduced, since Publication VII is based on this concept.
5.1 Equalizers
Equalizers are often used to correct the performance of audio systems. In
fact, equalization can be said to be one of the most common and important audio processing methods. The expression ‘equalize’ originates from
its early application, which was to flatten out a frequency response. However, nowadays, the objective is not necessarily to flatten out the frequency
response, but to alter it to meet desired requirements.
A certain set of filters is needed in the implementation of an equalizer.
These equalizer filters can be tuned to adjust the equalizer to suit the
given application. Typical equalizer filters are shelving filters and peak
or notch filters. Shelving filters are used to weight low or high frequencies. High-frequency and low-frequency shelving filters boost or attenuate frequencies above or below the selected cut-off frequency, respectively.
Shelving filters are usually found on the treble and bass controls of home
and car audio systems. Shelving filters are useful when adjusting the
overall tonal properties. However, more precise filters are often required.
A peak or notch filter can be targeted to a specific frequency band at any
desired frequency.
The first time that an equalizer was used to improve the quality of reproduced sound was in the 1930s, when motion pictures with recorded sound tracks emerged. Sound recording systems as well as the reproduction systems installed in theaters were poor and thus needed equalization [98, 99]. The first stand-alone, fully digital equalizer was introduced in 1987 by Yamaha, namely the DEQ7, which included features such as 30 preset equalizer programs and a parametric equalizer [99, 100].

Figure 5.1. Front panel of a graphic equalizer.
The two most common types of equalizers are the parametric [101, 102, 103, 104, 105] and graphic [106, 107, 108, 109, 110] equalizers. With a parametric equalizer, the user can control the center frequency, bandwidth, and gain of the equalizer filters, whereas with a graphic equalizer the only user-controllable parameter is the gain. The gains are controlled using sliders whose positions trace the approximate magnitude response of the equalizer, as shown in Figure 5.1.
Nowadays, equalizers are widely used in a variety of audio signal processing applications, such as enhancing a loudspeaker response [111, 112, 113] as well as the interaction between a loudspeaker and a room [114, 115, 116, 117, 118], equalization of headphones in order to provide a natural music listening experience [36, 119, 120, 121, 122] or a natural hear-through experience [123, 124, 125, 126], and enhancement of recorded music [127, 128].
Furthermore, an equalizer can be considered to be a filter bank, since it has an array of bandpass filters that are used to control different sub-bands of the input signal. However, more advanced filter banks, such as quadrature mirror filter banks, are often implemented as an analysis-synthesis system using multirate signal processing [129]. In the analysis bank, the signal is split into different bands, and the sub-bands are decimated by an appropriate factor. Then, the sub-band signals can be processed independently, for example, by coding or compressing the sub-bands differently. After that, the signal is reconstructed by interpolating the sub-band signals [129]. Although multirate filter banks offer a wider range of signal processing possibilities, the advantage of an equalizer is that it does not require the splitting and reconstruction of the audio signal.
5.2 Parametric Equalizer Design
Regalia and Mitra [101] introduced a digital parametric equalizer filter,
which is based on an allpass filter structure. The Regalia–Mitra filter
enables gain adjustments at specified frequencies, while it leaves the rest
of the spectrum unaffected. Furthermore, the equalizer parameters can
be individually tuned.
The transfer function of the filter can be expressed as

$$H(z) = \frac{1}{2}\left[1 + A(z)\right] + \frac{K}{2}\left[1 - A(z)\right], \qquad (5.1)$$

where $K$ specifies the gain of the filter and $A(z)$ is an allpass filter [101].
When $A(z)$ is a first-order allpass, the filter is a shelving filter. When $A(z)$ is a second-order allpass, the filter is a peak or notch filter. The second-order allpass filter commonly used in (5.1) has the transfer function

$$A(z) = \frac{a + b(1+a)z^{-1} + z^{-2}}{1 + b(1+a)z^{-1} + a z^{-2}}, \qquad (5.2)$$

where

$$a = \frac{1 - \tan(\Omega/2)}{1 + \tan(\Omega/2)}, \qquad (5.3)$$

$$b = -\cos(\omega_0), \qquad (5.4)$$
$\Omega$ denotes the normalized filter bandwidth, and $\omega_0$ the normalized center frequency. When $K > 1$, the filter is a peak filter, and when $K < 1$, it is a notch filter.
The peaks and notches are not symmetric in the original Regalia–Mitra filter structure, i.e., the bandwidth of the notches is smaller than that of the peaks [101]. Figure 5.2(a) shows the asymmetry between the peak and notch filters. However, Zölzer and Boltze [130] extended the second-order Regalia–Mitra structure in order to obtain symmetric peaks and notches. The only change to the original structure concerns the frequency parameter $a$, which becomes
$$a_c = \frac{K - \tan(\Omega/2)}{K + \tan(\Omega/2)}, \quad \text{when } K < 1. \qquad (5.5)$$
Figure 5.2(b) shows the magnitude response of the peak and notch filters,
when ac is used.
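To make Equations (5.1)–(5.5) concrete, the following Python sketch computes the resulting second-order coefficients. $H(z)$ is expanded as $[(1+K)D(z) + (1-K)N(z)]/(2D(z))$, where $N(z)$ and $D(z)$ are the numerator and denominator of $A(z)$; the sampling rate is an arbitrary example value.

```python
import numpy as np
from scipy.signal import freqz

def regalia_mitra_coeffs(K, f0, bw, fs=48000):
    """Second-order peak/notch filter coefficients from Eqs. (5.1)-(5.5).

    For cuts (K < 1) the corrected parameter a_c of Eq. (5.5) is used,
    giving the symmetric behavior of Zölzer and Boltze [130].
    """
    w0 = 2 * np.pi * f0 / fs                # normalized center frequency
    t = np.tan(np.pi * bw / fs)             # tan(Omega / 2)
    a = (1 - t) / (1 + t) if K >= 1 else (K - t) / (K + t)
    b1 = -np.cos(w0)                        # Eq. (5.4)
    den = np.array([1.0, b1 * (1 + a), a])  # allpass denominator D(z)
    num = den[::-1]                         # allpass numerator N(z) (mirror)
    # Eq. (5.1) expanded: H(z) = [(1 + K) D(z) + (1 - K) N(z)] / (2 D(z))
    b = 0.5 * ((1 + K) * den + (1 - K) * num)
    return b, den

# Sanity check: a 6-dB peak at 1 kHz should reach |H| = K at f0.
b, a = regalia_mitra_coeffs(K=10 ** (6 / 20), f0=1000, bw=200)
_, h = freqz(b, a, worN=[2 * np.pi * 1000 / 48000])
print(20 * np.log10(abs(h[0])))             # approx. 6.0
```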
Figure 5.2. Effect of $a_c$, where (a) is the response of the original Regalia–Mitra design and (b) is that of the improved design, which uses the $a_c$ parameter for notch filters.
A slightly different approach to achieve the symmetry between the peaks
and notches is presented by Fontana and Karjalainen [131]. Their implementation enables switching between peak and notch filter structures
such that both structures share a common allpass section.
5.3 High-Order Graphic Equalizer Design
Holters and Zölzer [108] presented a high-order graphic equalizer design
with minimum-phase behavior. The equalizer has almost independent
gain control at each band due to the high filter order. The design is based
on a high-order digital parametric equalizer [132], which was later extended by Holters and Zölzer [133]. The design utilizes recursive filters,
where fourth-order filter sections are cascaded in order to achieve higher
filter orders.
Figure 5.3 shows the recursive structure of the graphic equalizer, where
P is the number of fourth-order sections that are cascaded. The dashed
lines show the lower and upper limits of the ninth Bark band, whose center frequency is 1 kHz and bandwidth is 160 Hz. As can be seen, the filter
becomes more and more accurate as the number of fourth-order sections
is increased. A more detailed description of the filter design is given, e.g.,
in Publication VI.
Publications II and III use a Bark-band graphic equalizer implemented
using this high-order filter design [108]. The graphic equalizer in Publication II is used to simulate the perception of music in a noisy environment,
i.e., the graphic equalizer is used to suppress those Bark bands that are
either completely or partially masked by ambient noise. The masking information is obtained from psychoacoustical models, as described in Section 4.

Figure 5.3. High-order graphic equalizer, where the gain is set to −20 dB. The dashed lines show the lower and upper limits of the ninth Bark band.

Based on the estimate of the masking threshold and the partial
masking effect, the gains of the high-order graphic equalizer are adjusted
to suppress the music, so that if the energy of the music in a certain Bark band is below the masking threshold, it is attenuated by 50 dB. When the
music is just partially masked, it is attenuated according to the partial
masking model.
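A minimal sketch of this control logic, assuming per-band energies in decibels are already available; the linear ramp used for the partially masked region is only a stand-in for the actual partial masking model:

```python
def simulator_band_gains_db(music_db, threshold_db, partial_range_db=10.0):
    """Per-Bark-band equalizer gains (sketch of the control logic)."""
    gains = []
    for m, t in zip(music_db, threshold_db):
        if m < t:
            gains.append(-50.0)     # completely masked: suppress by 50 dB
        elif m < t + partial_range_db:
            # Partially masked: a linear ramp stands in here for the
            # partial masking model derived in Publication II.
            gains.append(-50.0 * (t + partial_range_db - m) / partial_range_db)
        else:
            gains.append(0.0)       # clearly audible: leave the band untouched
    return gains
```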
The lessons learned from Publication II were then exploited in Publication III, where the design goal was to implement an adaptive perceptual equalizer for headphones that estimates the energies of the music and the ambient noise inside the user's ear canal based on the measured characteristics of the headphones.
Furthermore, Publication VI introduces a novel filter optimization algorithm for the high-order graphic equalizer. Especially in the Bark-band
implementation of the graphic equalizer, the errors in the magnitude response around the transition bands of the filters increase at high and low
frequencies (due to the bilinear transform at high frequencies [108] and
logarithmically non-uniform bandwidths at low frequencies). The errors
occur due to the non-optimal overlapping of the adjacent filter slopes.
The proposed optimization algorithm in Publication VI minimizes these
errors by iteratively optimizing the orders of adjacent filters. The optimization affects the shape of the filter slope in the transition band, which
makes the search for the optimum filter order possible by comparing it to
the slope of the adjacent filter.
Figure 5.4 shows the idea behind the optimization scheme, where in (a) the black curve is a target band filter (center frequency 350 Hz, bandwidth 100 Hz, gain −20 dB, and the number of fourth-order sections P = 2) and the colored curves are filter candidates of different orders for the adjacent band (center frequency 250 Hz and bandwidth 100 Hz) with different numbers of fourth-order sections, P = 2–5.

Figure 5.4. Example of the interaction occurring around a transition band between two adjacent band filters, where (a) shows the magnitude response of a target band filter (black curve, P = 2) and a set of magnitude responses of filter candidates on the adjacent lower band (colored curves, P = 2, 3, 4, 5), and (b) shows the absolute errors around the transition band for the total filter responses (the target is a flat curve at −20 dB).

As can be seen in Figure 5.4(a),
the shape of the filter slope changes when the filter order is increased. Figure 5.4(b) shows the absolute errors around the transition band (300 Hz) when the filters of different orders (colored curves) are summed with the target filter (black curve). The target curve lies at −20 dB. As can be seen, the smallest peak error (approx. 1.5 dB, purple curve) occurs when the lower band filter has three fourth-order sections (P = 3). The algorithm then chooses the number of fourth-order sections that results in the smallest peak error, moves to the next band, and starts optimizing the order of that band filter by comparing it to the already optimized filter.
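The following Python sketch demonstrates the search itself. RBJ-style peaking biquads stand in for the actual fourth-order sections of [108], and the frequency grid and sampling rate are arbitrary example values; the chosen section count therefore need not match the result in Figure 5.4(b).

```python
import numpy as np
from scipy.signal import freqz

FS = 48000  # example sampling rate

def band_response_db(fc, bw, gain_db, P, freqs):
    """dB response of a band filter built as a cascade of P equal sections.

    RBJ-style peaking biquads are simple stand-ins for the fourth-order
    sections of the actual design [108]; each section carries gain_db / P
    so that the cascade reaches the full gain.
    """
    A = 10.0 ** (gain_db / (40.0 * P))      # per-section amplitude
    w0 = 2 * np.pi * fc / FS
    alpha = np.sin(w0) * bw / (2.0 * fc)    # Q = fc / bw
    b = [1 + alpha * A, -2 * np.cos(w0), 1 - alpha * A]
    a = [1 + alpha / A, -2 * np.cos(w0), 1 - alpha / A]
    _, h = freqz(b, a, worN=2 * np.pi * freqs / FS)
    return P * 20.0 * np.log10(np.abs(h))   # identical cascaded sections add in dB

def best_section_count(fixed_db, fc, bw, gain_db, freqs, candidates=(2, 3, 4, 5)):
    """Pick the section count P minimizing the peak error of the summed
    response against the flat gain_db target, as illustrated in Figure 5.4."""
    errors = [np.max(np.abs(fixed_db
                            + band_response_db(fc, bw, gain_db, P, freqs)
                            - gain_db))
              for P in candidates]
    return candidates[int(np.argmin(errors))]

# Transition band between the 250-Hz and 350-Hz bands of Figure 5.4
freqs = np.linspace(240, 360, 200)
target_db = band_response_db(350, 100, -20.0, 2, freqs)  # fixed target filter
print(best_section_count(target_db, 250, 100, -20.0, freqs))
```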
5.4 Parallel Equalizer Design
The fixed-pole design of parallel equalizer filters was first introduced by
Bank [134]. Since then, it has been utilized in different applications,
such as magnitude response smoothing [135, 136], equalization of a loudspeaker-room response [135, 136], and modeling of musical instruments
[134, 137, 138].
The design is based on parallel second-order IIR filters whose poles are
set to predetermined positions according to the desired frequency resolution, which is directly affected by the pole frequency differences, i.e., the
distance between adjacent poles. This enables a straightforward design
of a logarithmic frequency resolution by placing the poles logarithmically,
which closely corresponds to that of human hearing, and hence, it is often used in audio applications [139, 140].

Figure 5.5. Structure of the parallel equalizer.

The accuracy of the equalizer improves as the number of poles is increased. Furthermore, the
accuracy increment can be targeted to a specified frequency area by placing more poles in that area. The poles can be set manually, as is done in
Publication VII, or automatically, as shown, e.g., in [135, 118].
Other ways to achieve logarithmic frequency resolution include using
warped filters [141] and more generalized Kautz filters [142, 143], where
the frequency resolution can be set arbitrarily through the choice of the
filter poles. Furthermore, Alku and Bäckström [144] introduced a linear
prediction method which gives more accuracy at low frequencies. The
method is based on conventional linear predictors of successive orders
with parallel structures, which consist of symmetric linear predictors and
lowpass and highpass pre-filters.
Figure 5.5 shows the block diagram of the parallel filter structure, which consists of $P$ parallel second-order sections ($H_1(z), H_2(z), \ldots, H_P(z)$) and a parallel direct path with gain $d_0$. Thus, the transfer function of the parallel filter is

$$H(z) = d_0 + \sum_{k=1}^{P} \frac{b_{k,0} + b_{k,1} z^{-1}}{1 + a_{k,1} z^{-1} + a_{k,2} z^{-2}}, \qquad (5.6)$$
where $a_{k,1}$ and $a_{k,2}$ are the denominator coefficients of the $k$th second-order section, which are derived from the predetermined pole positions, and $b_{k,0}$ and $b_{k,1}$ are the numerator coefficients of the $k$th section, which are optimized based on the target response.
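A sketch of this fixed-pole, frequency-domain least-squares design in Python, after Bank [134, 140]; the pole radius and the design grid here are assumptions made for the example, and the flat target stands in for a real minimum-phase equalizer target.

```python
import numpy as np

def parallel_filter_design(f_poles, target_h, freqs, fs):
    """Fixed-pole parallel second-order filter design via frequency-domain
    least squares (sketch)."""
    zi = np.exp(-1j * 2 * np.pi * freqs / fs)   # z^{-1} on the design grid
    radius = 0.98                               # assumed pole radius
    poles = radius * np.exp(1j * 2 * np.pi * np.asarray(f_poles) / fs)
    cols = [np.ones_like(zi)]                   # direct-path gain d0
    denoms = []
    for p in poles:
        a1, a2 = -2.0 * p.real, abs(p) ** 2     # conjugate pole pair
        den = 1.0 + a1 * zi + a2 * zi ** 2      # 1 + a1 z^-1 + a2 z^-2
        cols += [1.0 / den, zi / den]           # columns for b_{k,0}, b_{k,1}
        denoms.append((a1, a2))
    M = np.column_stack(cols)
    # Stack real and imaginary parts so that the solved parameters are real.
    A = np.vstack([M.real, M.imag])
    y = np.concatenate([target_h.real, target_h.imag])
    params, *_ = np.linalg.lstsq(A, y, rcond=None)
    return params, denoms    # params = [d0, b_{1,0}, b_{1,1}, b_{2,0}, ...]

# Example: ten logarithmically placed pole frequencies, flat 0-dB target
fs = 48000
freqs = np.logspace(np.log10(20), np.log10(20e3), 500)
f_poles = np.logspace(np.log10(40), np.log10(16e3), 10)
target = np.ones_like(freqs, dtype=complex)     # flat magnitude, zero phase
params, denoms = parallel_filter_design(f_poles, target, freqs, fs)
```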
In Publication VII of this work, the filter design is done in the frequency
domain [140, 145], since the parallel filter structure is used to implement
a graphic equalizer, where the gain sliders define the target magnitude
response. The phase of the target magnitude response is then set to be
minimum phase. Furthermore, it is often useful to give different weights
to different frequencies depending on the desired gain, when the magnitude error is minimized on a decibel scale, as discussed in Publication
VII. Once the denominator coefficients and the target response have been
determined, the numerator parameters can be solved using least-squares
error minimization, as shown in Section III-B of Publication VII.
This design is similar to the linear-in-parameter design of Kautz equalizers [142, 143]. In fact, the parallel filter structure produces effectively
the same results as the Kautz filter but requires 33 % fewer operations,
and has a fully parallel structure [146]. The benefit of the fully parallel
structure is the possibility to use a graphics processing unit (GPU) instead
of a central processing unit (CPU) to do the filtering [147]. GPUs have a
large number of parallel processors, which can also be used to perform
audio signal processing [148, 149].
5.5 Comparison of the Graphic Equalizers
Figure 5.6 shows the magnitude responses of both of the proposed graphic equalizers when they have been configured for Bark bands (25 bands). Figure 5.6(a) shows the magnitude response of the optimized high-order graphic equalizer, and Figure 5.6(b) shows the magnitude response of the parallel graphic equalizer (PGE). These two Bark-band graphic equalizers have the same command slider settings (round markers), which were chosen to have flat regions as well as large gain differences, both of which are known to be difficult challenges for a graphic equalizer.
The orders of the equalizers differ quite radically. Starting from the first Bark band of the optimized high-order design shown in Figure 5.6(a), the numbers of fourth-order sections are six, four, and three; bands 4 to 23 have two fourth-order sections each; and the last two bands have three and five sections, respectively, yielding a total order of 244. The parallel graphic equalizer shown in Figure 5.6(b) has 50 second-order sections, resulting in a total order of 100.
Figure 5.6. Comparison of two Bark-band graphic equalizers (25 bands). The figures show the magnitude responses of (a) the optimized high-order equalizer design and (b) the parallel graphic equalizer design with the same command slider positions. The round markers indicate the command positions.

Furthermore, the PGE design avoids the problem that typically occurs in graphic equalizers, where the interaction between the adjacent band filters causes errors in the magnitude response, as shown in the case of the high-order graphic equalizer in Figure 5.4(b). This problem is avoided
because the parallel second-order sections are jointly optimized, as discussed in Publication VII. Comparing the PGE to the high-order graphic equalizer, the main difference between the two is that the joint optimization in the PGE enables smaller filter orders than the high-order graphic equalizer requires.
One downside of the PGE is that every time a command slider position
is altered by the user, the optimization has to be recalculated, whereas in
the optimized high-order equalizer, the optimization is done only once, offline, and when the user changes the command setting of a band, only the
fourth-order sections of that band must be recalculated. However, saving
preset equalizer command settings for the PGE is highly efficient, since
the filter optimization can be precomputed and easily stored in memory.
Both of these graphic equalizer designs can be widely used in audio signal processing applications. The high-order graphic equalizer results in
more brick-wall-like filters than the smoother PGE due to the high filter
order, as can be seen in Figure 5.6, especially around 400–1000 Hz and 2–
3 kHz. This feature makes the high-order graphic equalizer suitable, e.g.,
for the perceptual equalization applications presented in Publications II
and III.
6. Summary of Publications and Main Results
This section summarizes the main results presented in the publications.
Publication I: "Signal Processing Framework for Virtual Headphone
Listening Tests in a Noisy Environment"
Publication I describes a signal processing framework that enables virtual listening of different headphones both in quiet and in virtual noisy
environments. Ambient noise isolation of headphones has become an essential design feature, because headphones are increasingly used in noisy
environments when commuting. Thus, it is important to evaluate the performance of the headphones in an authentic listening environment, since
the headphones together with the ambient noise define the actual listening experience. The proposed framework provides just this by emulating
the characteristics of the headphones, both the frequency response of the
driver and the ambient sound isolation of the headphone, based on actual
measurements. Furthermore, the proposed framework makes otherwise
impractical blind headphone comparisons possible.
Section 3 presents the calibration method of the reference headphones
used during the listening, since the frequency response of the reference
headphones must be compensated. This is done by using linear prediction filter coefficients to invert the measured frequency response of the
reference headphones with the help of a peak reduction technique, also
described in Section 2.5 of this overview part. Section 4 shows how the
proposed framework is implemented using Matlab with Playrec and how
the measured impulse responses are used as FIR filters to simulate the
characteristics of the headphones.
Publication II: "Perceptual Frequency Response Simulator for Music
in Noisy Environments"
Publication II further tackles mobile headphone listening and the difficulties caused by ambient noise. Publication II expands on the fact that the
listening environment drastically affects the headphone listening experience, as discussed in Publication I. A real-time perceptual simulator for
music listening in noisy environments is presented. The simulator uses
an auditory masking model to simulate how the ambient noise alters the
perceived timbre of music. The auditory masking model is described in
Section 3, which includes a masking threshold calculation (Sec. 3.1) and
a partial masking model (Sec. 3.2) derived from a listening test with complex test sounds resembling musical sounds. Section 4 describes the implementation of the simulator, whose output is a music signal from which
all components masked by the ambient noise as well as the noise itself are
suppressed. This is done with a high-order Bark-band graphic equalizer,
which is controlled by the psychoacoustic masking estimations. Furthermore, Section 5 presents the simulator evaluation results obtained from
a listening test.
Publication III: "Perceptual Headphone Equalization for Mitigation of
Ambient Noise"
Publication III uses the ideas and techniques developed in Publication
II to enhance the listening experience in mobile headphone listening by
equalizing the music. This is achieved by using the same psychoacoustic models that were used in Publication II to analyze ambient noise and
music while simultaneously considering the characteristics of the headphones. This enables the estimation of the ambient noise and music level
inside the ear canal. The ideal goal is to estimate the level to which the music should be raised in order to have the same perceived tonal balance in a noisy mobile environment as in a quiet one. The same high-order Bark-band graphic equalizer as used in Publication II was utilized.
Section 3 together with Figures 1 and 2 describe the implementation of
the proposed perceptual headphone equalization. One of the main advantages of the proposed equalizer is that it retains a reasonable SPL after
equalization, since the music is boosted only at those frequencies where
needed. Thus, the system preserves the hearing of the user when compared to the typical case where the user just raises the volume of the
music. The Matlab/Playrec implementation is also presented in Section 3.
Publication IV: "Digital Augmented Reality Audio Headset"
Publication IV introduces a novel digital design of an ARA headset. The
ARA system has been commonly believed to require the delay in the processing chain to be as small as possible, since otherwise the perceived
sound can be colored by the comb-filtering effect. This is the reason why
earlier ARA systems have been implemented with analog components.
However, the in-ear headphones used in this work provide a high isolation of the outside ambient sounds, which is beneficial with regard to the
comb-filtering effect. Section 2 describes the physics behind the ARA technology as well as introduces possible applications to be used on the ARA
platform. Sections 3 and 4 discuss the delays introduced by digital filters
and passive mechanisms. Section 5 introduces the ARA equalizer filter
design, which is implemented using a DSP evaluation board, as discussed
in Section 6. Additionally, Section 6 presents how the measurements of
the properties of the DSP board are made as well as a detailed analysis
of the comb-filtering effect. Moreover, a listening test was conducted to
subjectively evaluate the disturbance of a comb-filtering effect with music
and speech signals. The digital implementation brings several benefits
when compared to the old analog realization, such as the ease of design
and programmability of the equalizer filters.
Publication V: "Live Sound Equalization and Attenuation with a
Headset"
Based on the successful implementation of the digital augmented reality headset presented in Publication IV, it was possible to implement a novel idea where the user can control the surrounding soundscape during a live concert. Publication V presents a system, called LiveEQ, which
captures ambient sounds around the user, provides a user-controllable
equalization, and plays back the equalized ambient sounds to the user
with headphones. An important feature of the LiveEQ system is that it
also attenuates the perceived live music, since it exploits the high attenuation capability of in-ear headphones to provide hearing protection.
The system was implemented using Matlab with Playrec as well as with
a DSP evaluation board. The Matlab implementation serves as a simulator whereas the DSP board implementation can be used in real time.
Section 4.1 further examines the comb-filtering effect occurring in a hearthrough system that uses attenuative headphones.
Publication VI: "Optimizing a High-Order Graphic Equalizer for
Audio Processing"
Publication VI presents a new optimization algorithm for the high-order
graphic equalizer used in Publications II and III. The main idea is to
optimize the filter orders to improve the accuracy and efficiency of the
equalizer. Every practical filter has a transition band, and in the case of a graphic equalizer, each transition band interacts with its neighboring filters, creating errors in the magnitude response. The proposed iterative algorithm minimizes these errors by optimizing the filter orders,
which affects the shape of the transition band. The optimum shape of
the transition band depends on the shape of the adjacent filter. The order
optimization is executed only once offline, and after that, during the operation, only the gains of the graphic equalizer are changed according to
the user’s command settings. The order optimization was shown to reduce
the peak errors around the transition bands while the equalizer still had
a lower filter order than non-optimized equalizers with larger errors.
Publication VII: "High-Precision Graphic Parallel Equalizer"
Publication VII introduces an accurate graphic equalizer whose design
differs from that of the high-order equalizer used in Publication VI. The
novel graphic equalizer is based on the parallel filter structure introduced
in Section 5.4 of this overview. A typical problem with graphic equalizers is the interaction between the adjacent filters, which can result in substantial errors, especially with large boost or cut command settings.
The advantage of the parallel design is that the filters are jointly optimized, and thus, the typical errors due to interaction of the filters are
avoided. This enables the implementation of a highly accurate graphic
equalizer. It was discovered that when there are twice as many pole frequencies compared to the number of the equalizer bands, the maximum
error of the magnitude response of the proposed ±12-dB parallel graphic
equalizer is less than 1 dB across the audible frequencies.
7. Conclusions
This work explores equalization techniques for mobile headphone usage,
including equalization of typical headphones as well as equalization of a
hear-through headset. The mobile use of headphones has introduced new
problems in music listening by degrading the perceived sound quality of
the music. This is due to the ambient noise that leaks through the headphones into the ear canal. The ambient noise masks parts of the music
and thus changes its perceived timbre. This is the reason why a good isolation capability is nowadays an important feature in mobile headphones.
The degrading effect of ambient noise can be mitigated by using an intelligent equalization that enhances the music in those frequency regions
where the auditory masking is pronounced.
The ample isolation capability can sometimes be too much, especially
in social situations. An ARA headset is a device that aims to produce
a perfect hear-through experience with the possibility to include virtual
sounds. The ambient sounds are captured with binaural microphones and
reproduced with the headphone drivers. This requires equalization to
compensate for the alterations that the headset introduces to the acoustics
of the outer ear.
This overview of the dissertation presented background information that
supports the included publications. The discussed topics include headphone acoustics and their measurements, an introduction to hear-through
systems, as well as augmented reality audio and its applications. We also
discussed digital equalizers and the matter of auditory masking, including the concepts of masking threshold and partial masking.
Publication I presents a signal processing framework for virtual headphone listening tests, which includes the simulation of headphones’ ambient noise isolation characteristics. Publication II provides a simulator
tool that helps one understand the auditory masking phenomenon in mobile headphone listening. Publication III is a sequel to Publication II,
where the discovered results were utilized in an implementation of a perceptual equalizer for mobile headphone listening. Publications IV and
V concentrate on hear-through applications. Publication IV presents the
digital ARA headset, whereas Publication V presents a user-controllable
hear-through application for equalizing live music and limiting the sound
exposure during live concerts.
The parts related to Publications II–V that could be expanded in the future are the usability and validation tests. Both of the systems introduced in Publications II and III would probably be more accurate should they be fine-tuned based on listening tests conducted in laboratory conditions as well as in real-life everyday situations. However, both of the proposed systems operate well as they are, and the perceptual equalizer introduced in Publication III provides an enhanced listening experience in noisy environments, while at the same time sustaining a restrained SPL.
Moreover, the main issue in hear-through applications presented in Publications IV and V is the comb-filtering effect. However, it should also be
tested how disturbing the comb-filtering effect actually is, when the user
wears the hear-through system over long, continuous periods of time in
different environments.
Publication VI presents an optimization algorithm for a high-order graphic equalizer that iteratively optimizes the order of adjacent band filters.
Publication VII presents a novel high-precision graphic equalizer, which
is implemented using parallel second-order filters. The optimized high-order graphic equalizer enables brickwall-like filter responses, as demonstrated in Figure 5.3, which gives good accuracy for each band filter as well as for the whole equalizer, since the overlapping of the adjacent band filters is minimized. However, the filter orders must be quite large in
order to create the brickwall filters. The PGE in Publication VII, on the
other hand, avoids the filter overlapping issue by jointly optimizing the
second-order sections. The resulting total order of the PGE can be significantly lower than that of the high-order equalizer; however, the PGE
cannot create as steep band filter responses as the high-order equalizer,
but results in a smoother overall magnitude response. Furthermore, the
parallel structure of the graphic equalizer enables implementations using
a GPU, which can often outperform a CPU in such a parallelizable task.
Future research could address the following aspects and tasks that could not be included in this work due to time and space constraints. First, a mobile implementation of the proposed equalization
techniques and headphone applications would be important in order to allow real-life testing of the systems. The problem is the limited computing
power of smartphones as well as the shortage of input channels (typically
only one channel). However, the computing power of smartphones is constantly increasing, and modern smartphones can even be equipped with
a GPU, which can enable the utilization of parallel algorithms. Even then, low power consumption, and thus low-complexity algorithms, is a key requirement for smartphone applications.
Another future study could include the development of algorithms reducing the comb-filtering effect in hear-through devices, which should be
possible, since the equalized sound can be processed independently of the leaked sound. Furthermore, different hardware configurations should
also be tested, since low-delay digital signal processors are widely available and they could reduce the disturbance of the comb-filtering effect.
Of course, the ideal goal would be to implement all the proposed methods, and more, into a single pair of headphones. For example, it could include a user-controllable frequency response (arbitrary or based on existing measured responses), which could then be enhanced with an intelligent adaptive equalizer that keeps the perceived frequency response constant no matter what kind of noisy environment the user is in. Furthermore, the hear-through function should be fully adjustable from transparent hear-through to amplification and all the way to the most effective ANC-driven isolation. The adjustment of the hear-through can be automated based on the ambient noise and the content the user is listening to. Furthermore, it is possible to utilize other data provided by the user's smartphone, such as location, network connections (Wi-Fi and Bluetooth), time of day, and incoming phone calls, to control the properties of the headphones.
It would also be useful in the future to measure a vast set of headphones,
build a database, as well as implement and publish a listening test framework on the Internet that would be available to all who are interested in
headphones. Based on the informal feedback from the users of the headphone simulator presented in Publication I, this appears to be a highly
desired application for consumers.
In the future, headphones could and probably will be used in assistive
listening applications, where the idea is to improve the hearing of people with normal hearing in different situations. I believe that assistive
listening applications will also create a need for novel signal processing
methods and increase the usage of headphones in the future.
Bibliography
[1] Gartner, Inc., “Gartner says worldwide mobile phone sales declined
1.7 percent in 2012,” http://www.gartner.com/newsroom/id/2335616, Feb.
2013.
[2] ——, “Gartner says smartphone sales accounted for 55 percent of overall mobile phone sales in third quarter of 2013,”
http://www.gartner.com/newsroom/id/2623415, Nov. 2013.
[3] A. Härmä, J. Jakka, M. Tikander, M. Karjalainen, T. Lokki, J. Hiipakka,
and G. Lorho, “Augmented reality audio for mobile and wearable appliances,” J. Audio Eng. Soc., vol. 52, pp. 618–639, Jun. 2004.
[4] ITU-T, “Recommendation P.57, Series P: Telephone transmission quality,
objective measuring apparatus - artificial ears,” International Telecommunication Union, Telecommunication Standardization Sector, Rec. ITU-T
P.57, 2008/1996.
[5] D. Hammershøi, “Fundamental aspects of the binaural recording and synthesis techniques,” in Proc. AES 100th Convention, Copenhagen, Denmark, May 1996.
[6] R. Teranishi and E. A. G. Shaw, “External-ear acoustic models with simple
geometry,” J. Acoust. Soc. Am., vol. 44, no. 1, pp. 257–263, 1968.
[7] D. Raichel, The Science and Applications of Acoustics. New York: Springer-Verlag, 2000.
[8] L. L. Beranek, Acoustics. New York: Acoustical Society of America through the American Institute of Physics, Inc., 1996.
[9] F. M. Wiener and D. A. Ross, “The pressure distribution in the auditory
canal in a progressive sound field,” J. Acoust. Soc. Am., vol. 18, no. 2, pp.
401–408, Oct. 1946.
[10] P. Brüel, “Impedance of real and artificial ears,” Brüel & Kjær, Application
Note, 1976.
[11] H. Dillon, Hearing Aids. Sydney, Australia: Boomerang Press, 2001.
[12] A. Wilska, “Studies on directional hearing,” Ph.D. dissertation, English
translation, Aalto University School of Science and Technology, Dept. of
Signal Processing and Acoustics, 2010. Ph.D. dissertation originally published in German as Untersuchungen über das Richtungshören, University
of Helsinki, 1938.
[13] M. D. Burkhard, “Measuring the constants of ear simulators,” J. Audio
Eng. Soc., vol. 25, no. 12, pp. 1008–1015, Dec. 1977.
[14] J. J. Zwislocki, “An ear-like coupler for calibration of earphones,” J. Acoust.
Soc. Am., vol. 50, no. 1A, pp. 110–110, 1971.
[15] R. M. Sachs and M. D. Burkhard, “Earphone pressure response in ears and
couplers,” in Proc. 83rd Meeting of the Acoustical Society of America, Jun.
1972.
[16] ——, “Zwislocki coupler evaluation with insert earphones,” Industrial Research Products, Inc. for Knowles Electronics, Inc., Tech. Rep., Nov. 1972.
[17] S. E. Voss and J. B. Allen, “Measurement of acoustic impedance and reflectance in the human ear canal,” J. Acoust. Soc. Am., vol. 95, no. 1, pp.
372–384, Jan. 1994.
[18] O. Saltykov and A. Gebert, “Potential errors of real-ear-to-coupler-difference method applied for a prediction of hearing aid performance in
an individual ear,” in Proc. AES 47th International Conference, Chicago,
USA, Jun. 2012.
[19] IEC, “Electroacoustics—Simulator of human head and ear—Part 4:
Occluded-ear simulator for the measurement of earphones coupled to the
ear by means of ear inserts,” International Electrotechnical Comission, IEC
60318-4 Ed. 1.0 b:2010, Jan. 2010.
[20] ANSI, “American national standard for an occluded ear simulator,” ANSI
S3.25-2009, 2009.
[21] S. Jønsson, B. Liu, A. Schuhmacher, and L. Nielsen, “Simulation of the
IEC 60711 occluded ear simulator,” in Proc. AES 116th Convention, Berlin,
Germany, May 2004.
[22] K. Inanaga, T. Hara, G. Rasmussen, and Y. Riko, “Research on a measuring
method of headphones and earphones using HATS,” in Proc. AES 125th
Convention, San Francisco, CA, Oct. 2008.
[23] A. J. Berkhout, D. de Vries, and M. M. Boone, “A new method to acquire
impulse responses in concert halls,” J. Acoust. Soc. Am., vol. 68, no. 1, pp.
179–183, Jul. 1980.
[24] D. Griesinger, “Impulse response measurements using all-pass deconvolution,” in Proc. AES 11th International Conference, Portland, Oregon, May
1992, pp. 306–321.
[25] ——, “Beyond MLS—Occupied hall measurements with FFT techniques,”
in Proc. AES 101st Convention, Los Angeles, California, Nov. 1996.
[26] A. Farina, “Simultaneous measurement of impulse response and distortion with a swept-sine technique,” in Proc. AES 108th Convention, Paris,
France, Feb. 2000.
[27] ——, “Advancements in impulse response measurements by sine sweeps,”
in Proc. AES 122nd Convention, Vienna, Austria, May 2007.
[28] S. Müller and P. Massarani, “Transfer-function measurement with
sweeps,” J. Audio Eng. Soc., vol. 49, no. 6, pp. 443–471, Jun. 2001.
[29] G.-B. Stan, J.-J. Embrechts, and D. Archambeau, “Comparison of different
impulse response measurement techniques,” J. Audio Eng. Soc., vol. 50,
no. 4, pp. 249–262, Apr. 2002.
[30] C. A. Poldy, “Headphones,” in Loudspeaker and Headphone Handbook,
3rd ed., J. Borwick, Ed. Focal Press, New York, 1994, pp. 585–692.
[31] S. P. Gido, R. B. Schulein, and S. D. Ambrose, “Sound reproduction within a
closed ear canal: Acoustical and physiological effects,” in Proc. AES 130th
Convention, London, UK, May 2011.
[32] F. Briolle and T. Voinier, “Transfer function and subjective quality of headphones: Part 2, subjective quality evaluations,” in AES 11th International
Conference, Portland, OR, May 1992, pp. 254–259.
[33] T. Hirvonen, M. Vaalgamaa, J. Backman, and M. Karjalainen, “Listening
test methodology for headphone evaluation,” in Proc. AES 114th Convention, Amsterdam, The Netherlands, Mar. 2003.
[34] T. Hiekkanen, A. Mäkivirta, and M. Karjalainen, “Virtualized listening
tests for loudspeakers,” J. Audio Eng. Soc., vol. 57, no. 4, pp. 237–251, Apr.
2009.
[35] S. E. Olive, T. Welti, and E. McMullin, “A virtual headphone listening test
methodology,” in Proc. AES 51st International Conference, Helsinki, Finland, Aug. 2013.
[36] G. Lorho, “Subjective evaluation of headphone target frequency responses,” in Proc. AES 126th Convention, Munich, Germany, May 2009.
[37] P. A. Nelson and S. J. Elliott, Active Control of Sound. London, UK: Academic Press Limited, 1995.
[38] S. M. Kuo and D. R. Morgan, “Active noise control: A tutorial review,”
Proceedings of the IEEE, vol. 87, no. 6, pp. 943–973, Jun. 1999.
[39] B. Rafaely, “Active noise reducing headset—An overview,” in Proc. International Congress and Exhibition on Noise Control Engineering, The Hague,
The Netherlands, Aug. 2001.
[40] S. Priese, C. Bruhnken, D. Voss, J. Peissig, and E. Reithmeier, “Adaptive feedforward control for active noise cancellation in-ear headphones,”
in Proc. Meeting on Acoustics, Acoustical Society of America, vol. 18, Jan.
2013.
[41] C. Bruhnken, S. Priese, H. Foudhaili, J. Peissig, and E. Reithmeier, “Design
of feedback controller for active noise control with in-ear headphones,” in
Proc. Meeting on Acoustics, Acoustical Society of America, Jan. 2013.
[42] E. D. Simshauser and M. E. Hawley, “The noise-cancelling headset—An
active ear defender,” J. Acoust. Soc. Am., vol. 27, no. 1, pp. 207–207, 1955.
[43] T. Schumacher, H. Krüger, M. Jeub, P. Vary, and C. Beaugeant, “Active
noise control in headsets: A new approach for broadband feedback ANC,”
in Proc. International Conference on Acoustics, Speech and Signal Processing (ICASSP), Prague, Czech Republic, May 2011, pp. 417–420.
[44] G. V. Békésy, “The structure of the middle ear and the hearing of one’s own
voice by bone conduction,” J. Acoust. Soc. Am., vol. 21, no. 3, pp. 217–232,
May 1948.
[45] C. Pörschmann, “One’s own voice in auditory virtual environments,” Acta
Acustica united with Acustica, vol. 87, pp. 378–388, 2001.
[46] J. Curran, “A forgotten technique for resolving the occlusion effect,”
Starkey Audiology Series reprinted from Innovations, vol. 2, no. 2, 2012.
[47] F. Kuk, D. Keenan, and C.-C. Lau, “Vent configurations on subjective and
objective occlusion effect,” J. Am. Acad. Audiol., vol. 16, no. 9, pp. 747–762,
2005.
[48] J. Kiessling, “Open fittings: Something new or old hat?” in Hearing
Care for Adults First International Conference, (Chapter 17, pp. 217-226),
Staefa, Switzerland, 2006.
[49] J. Mejia, H. Dillon, and M. Fisher, “Active cancellation of occlusion: An
electronic vent for hearing aids and hearing protectors,” J. Acoust. Soc.
Am., vol. 124, no. 1, pp. 235–240, Jul. 2008.
[50] WHO, “World Health Organization: Millions have hearing loss that can be improved or prevented,” Feb. 2013. [Online]. Available: www.who.int/mediacentre/news/notes/2013/hearing_loss_20130227/en/
[51] N. Petrescu, “Loud music listening,” McGill Journal of Medicine, vol. 11,
no. 2, pp. 169–176, 2008.
[52] E. Airo, J. Pekkarinen, and P. Olkinuora, “Listening to music with earphones: An assessment of noise exposure,” Acta Acustica united with Acustica, vol. 82, no. 6, pp. 885–894, Nov./Dec. 1996.
[53] S. Oksanen, M. Hiipakka, and V. Sivonen, “Estimating individual sound
pressure levels at the eardrum on music playback over insert headphones,”
in Proc. AES 47th International Conference, Chicago, IL, Jun. 2012.
[54] W. W. Clark and B. A. Bohne, “Temporary threshold shifts from attendance
at a rock concert,” J. Acoust. Soc. Am. Suppl. 1, vol. 79, pp. 48–49, 1986.
[55] R. Ordoñez, D. Hammershøi, C. Borg, and J. Voetmann, “A pilot study of
changes in otoacoustic emissions after exposure to live music,” in Proc.
AES 47th International Conference, Chicago, IL, Jun. 2012.
[56] M. C. Killion, T. Monroe, and V. Drambarean, “Better protection from
blasts without sacrificing situational awareness,” International Journal of
Audiology, vol. 50, no. S1, pp. 38–45, Mar. 2011.
[57] B. Sauert and P. Vary, “Recursive closed-form optimization of spectral
audio power allocation for near end listening enhancement,” in ITG-Fachtagung Sprachkommunikation, Bochum, Germany, Oct. 2010.
[58] ——, “Near end listening enhancement considering thermal limit of mobile phone loudspeakers,” in Proc. 22. Konferenz Elektronische Sprachsignalverarbeitung, Aachen, Germany, Sept. 2011.
[59] C. H. Taal, J. Jensen, and A. Leijon, “On optimal linear filtering of speech
for near-end listening enhancement,” IEEE Signal Processing Letters,
vol. 20, no. 3, pp. 225–228, Mar. 2013.
[60] T. E. Miller and J. Barish, “Optimizing sound for listening in the presence
of road noise,” in Proc. AES 95th Convention, New York, NY, Oct. 1993.
[61] M. Christoph, “Noise dependent equalization control,” in Proc. AES 48th
International Conference, Munich, Germany, Sept. 2012.
[62] M. Tikander, M. Karjalainen, and V. Riikonen, “An augmented reality audio headset,” in Proc. 11th Int. Conference on Digital Audio Effects (DAFx-08), Espoo, Finland, Sept. 2008.
[63] V. Riikonen, M. Tikander, and M. Karjalainen, “An augmented reality audio mixer and equalizer,” in Proc. AES 124th Convention, Amsterdam, The
Netherlands, May 2008.
[64] D. Brungart, A. Kordik, C. Eades, and B. Simpson, “The effect of microphone placement on localization accuracy with electronic pass-through
earplugs,” in Proc. IEEE Workshop on Applications of Signal Processing
to Audio and Acoustics (WASPAA), New Paltz, NY, Oct. 2003.
[65] M. Tikander, “Usability issues in listening to natural sounds with an augmented reality audio headset,” J. Audio Eng. Soc., vol. 57, no. 6, pp. 430–
441, Jun. 2009.
[66] H. Møller, M. F. Sørensen, D. Hammershøi, and C. Jensen, “Head-related
transfer functions of human subjects,” J. Audio Eng. Soc., vol. 43, no. 5, pp.
300–321, May 1995.
[67] J. G. Bolaños and V. Pulkki, “HRIR database with measured actual source
direction data,” in Proc. AES 133rd Convention, San Francisco, USA, Oct.
2012.
[68] A. Härmä, J. Jakka, M. Tikander, M. Karjalainen, T. Lokki, H. Nironen,
and S. Vesa, “Techniques and applications of wearable augmented reality audio,” in Proc. AES 114th Convention, Amsterdam, The Netherlands,
Mar. 2003.
[69] T. Lokki, H. Nironen, S. Vesa, L. Savioja, A. Härmä, and M. Karjalainen,
“Application scenarios of wearable and mobile augmented reality audio,”
in Proc. AES 116th Convention, Berlin, Germany, May 2004.
[70] C. Dicke, J. Sodnik, and M. Billinghurst, “Spatial auditory interfaces compared to visual interfaces for mobile use in a driving task,” in Proc. Ninth
International Conference on Enterprise Information Systems (ICEIS), Funchal, Madeira, Portugal, Jun. 2007, pp. 282–285.
[71] B. F. G. Katz, S. Kammoun, G. Parseihian, O. Gutierrez, A. Brilhault,
M. Auvray, P. Truillet, M. Denis, S. Thorpe, and C. Jouffrais, “NAVIG: augmented reality guidance system for the visually impaired,” Virtual Reality,
vol. 16, pp. 253–269, 2012.
[72] M. Peltola, T. Lokki, and L. Savioja, “Augmented reality audio for location-based games,” in Proc. AES 35th International Conference, London, UK,
Feb. 2009.
[73] M. Karjalainen, M. Tikander, and A. Härmä, “Head-tracking and subject
positioning using binaural headset microphones and common modulation
anchor sources,” in Proc. International Conference on Acoustics, Speech and
Signal Processing (ICASSP), Montreal, Canada, May 2004.
[74] S. Vesa and T. Lokki, “Detection of room reflections from a binaural room
impulse response,” in Proc. 9th Int. Conference on Digital Audio Effects
(DAFx-06), Montreal, Canada, Sept. 2006, pp. 215–220.
[75] S. Vesa and A. Härmä, “Automatic estimation of reverberation time from
binaural signals,” in Proc. IEEE International Conference on Acoustics,
Speech, and Signal Processing (ICASSP), Philadelphia, PA, Mar. 2005,
pp. 281–284.
[76] S. Brunner, H.-J. Maempel, and S. Weinzierl, “On the audibility of comb-filter distortions,” in Proc. AES 122nd Convention, Vienna, Austria, May
2007.
[77] A. Clifford and J. D. Reiss, “Using delay estimation to reduce comb filtering
of arbitrary musical sources,” J. Audio Eng. Soc., vol. 61, no. 11, pp. 917–
927, Nov. 2013.
[78] W. Ahnert, “Comb-filter distortions and their perception in sound reinforcement systems,” in Proc. AES 84th Convention, Paris, France, Mar. 1988.
[79] G. Schuller and A. Härmä, “Low delay audio compression using predictive
coding,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Orlando, FL,
May 2002, pp. 1853–1856.
[80] A. Spanias, T. Painter, and V. Atti, Audio Signal Processing and Coding.
Hoboken, New Jersey: John Wiley & Sons, Inc., 2007.
[81] H. Fletcher and W. A. Munson, “Relation between loudness and masking,”
J. Acoust. Soc. Am., vol. 9, no. 1, pp. 1–10, Jul. 1937.
[82] R. F. Lyon, A. G. Katsiamis, and E. M. Drakakis, “History and future of
auditory filter models,” in Proc. IEEE International Symposium on Circuits
and Systems (ISCAS), Paris, France, Jun. 2010, pp. 3809–3812.
[83] J. Blauert, Ed., The Technology of Binaural Listening. New York: Springer-Verlag, 2013.
[84] M. R. Schroeder, B. S. Atal, and J. L. Hall, “Optimizing digital speech
coders by exploiting masking properties of the human ear,” J. Acoust. Soc.
Am., vol. 66, no. 6, pp. 1647–1652, Dec. 1979.
[85] J. D. Johnston, “Transform coding of audio signals using perceptual noise
criteria,” IEEE Journal on Selected Areas in Communications, vol. 6, no. 2,
pp. 314–323, Feb. 1988.
[86] K. Brandenburg and D. Seitzer, “OCF: Coding high quality audio with data
rates of 64 kbit/sec,” in Proc. AES 85th Convention, Los Angeles, CA, Nov.
1988.
[87] K. Brandenburg and M. Bosi, “Overview of MPEG audio: Current and
future standards for low-bit-rate audio coding,” J. Audio Eng. Soc., vol. 45,
no. 1/2, pp. 4–21, Feb. 1997.
[88] J. O. Pickles, An Introduction to the Physiology of Hearing. London, UK: Academic Press Inc., 1982.
[89] D. D. Greenwood, “Critical bandwidth and the frequency coordinates of the
basilar membrane,” J. Acoust. Soc. Am., vol. 33, no. 10, pp. 1344–1356, Oct.
1961.
[90] ——, “A cochlear frequency-position function for several species–29 years
later,” J. Acoust. Soc. Am., vol. 87, no. 6, pp. 2592–2605, Jun. 1990.
[91] G. V. Békésy, Experiments in Hearing, E. G. Wever, Ed. New York: McGraw-Hill Book Company, Inc., 1960.
[92] M. Bosi and R. E. Goldberg, Introduction to Digital Audio Coding and
Standards. Boston, MA: Kluwer Academic, 2003.
[93] E. Zwicker, “Subdivision of the audible frequency range into critical bands
(Frequenzgruppen),” J. Acoust. Soc. Am., vol. 33, no. 2, p. 248, Feb. 1961.
[94] E. Zwicker and H. Fastl, Psychoacoustics: Facts and Models. New York: Springer-Verlag, 1990.
[95] S. van de Par, A. Kohlrausch, G. Charestan, and R. Heusdens, “A new
psychoacoustical masking model for audio coding applications,” in Proc.
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 2002, pp. 1805–1808.
[96] M. L. Jepsen, S. D. Ewert, and T. Dau, “A computational model of human
auditory signal processing and perception,” J. Acoust. Soc. Am., vol. 124,
no. 1, pp. 422–438, Jul. 2008.
[97] H. Gockel and B. C. J. Moore, “Asymmetry of masking between complex
tones and noise: Partial loudness,” J. Acoust. Soc. Am., vol. 114, no. 1, pp.
349–360, Jul. 2003.
[98] W. C. Miller, “Basis of motion picture sound,” in Motion Picture Sound
Engineering, 6th ed. New York: D. Van Nostrand Company, Inc., 1938,
ch. 1, pp. 1–10.
[99] D. A. Bohn, “Operator adjustable equalizers: An overview,” in Proc. AES
6th International Conference, May 1988, pp. 369–381.
[100] Yamaha, Digital Equalizer DEQ7, Operating Manual. [Online]. Available:
http://www2.yamaha.co.jp/manual/pdf/pa/english/signal/DEQ7E.pdf
[101] P. Regalia and S. Mitra, “Tunable digital frequency response equalization
filters,” IEEE Transactions on Acoustics, Speech, and Signal Processing,
vol. 35, no. 1, pp. 118–120, Jan. 1987.
[102] T. van Waterschoot and M. Moonen, “A pole-zero placement technique for
designing second-order IIR parametric equalizer filters,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 15, no. 8, pp. 2561–
2565, Nov. 2007.
[103] H. Behrends, A. von dem Knesebeck, W. Bradinal, P. Neumann, and
U. Zölzer, “Automatic equalization using parametric IIR filters,” J. Audio
Eng. Soc., vol. 59, pp. 102–109, Mar. 2011.
[104] J. D. Reiss, “Design of audio parametric equalizer filters directly in the
digital domain,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 19, no. 6, pp. 1843–1848, Aug. 2011.
[105] S. Särkkä and A. Huovilainen, “Accurate discretization of analog audio filters with application to parametric equalizer design,” IEEE Transactions
on Audio, Speech, and Language Processing, vol. 19, no. 8, pp. 2486–2493,
Nov. 2011.
[106] D. A. Bohn, “Constant-Q graphic equalizers,” J. Audio Eng. Soc., vol. 34,
no. 9, pp. 611–626, Sept. 1986.
[107] S. Azizi, “A new concept of interference compensation for parametric and graphic equalizer banks,” in Proc. AES 112th Convention,
Munich, Germany, May 2002.
[108] M. Holters and U. Zölzer, “Graphic equalizer design using higher-order
recursive filters,” in Proc. Int. Conf. Digital Audio Effects (DAFx-06), Sept.
2006, pp. 37–40.
[109] S. Tassart, “Graphical equalization using interpolated filter banks,” J. Audio Eng. Soc., vol. 61, no. 5, pp. 263–279, May 2013.
[110] Z. Chen, G. S. Geng, F. L. Yin, and J. Hao, “A pre-distortion based design method for digital audio graphic equalizer,” Digital Signal Processing,
vol. 25, pp. 269–302, Feb. 2014.
[111] M. Karjalainen, E. Piirilä, A. Järvinen, and J. Huopaniemi, “Comparison
of loudspeaker equalization methods based on DSP techniques,” J. Audio
Eng. Soc., vol. 47, no. 1-2, pp. 15–31, Jan./Feb. 1999.
[112] A. Marques and D. Freitas, “Infinite impulse response (IIR) inverse filter
design for the equalization of non-minimum phase loudspeaker systems,”
in Proc. IEEE Workshop on Applications of Signal Processing to Audio and
Acoustics, New Paltz, NY, Oct. 2005, pp. 170–173.
[113] G. Ramos and J. J. López, “Filter design method for loudspeaker equalization based on IIR parametric filters,” J. Audio Eng. Soc., vol. 54, no. 12, pp.
1162–1178, Dec. 2006.
[114] J. N. Mourjopoulos, “Digital equalization of room acoustics,” J. Audio Eng.
Soc., vol. 42, no. 11, pp. 884–900, Nov. 1994.
[115] M. Karjalainen, P. A. A. Esquef, P. Antsalo, A. Mäkivirta, and V. Välimäki,
“Frequency-zooming ARMA modeling of resonant and reverberant systems,” J. Audio Eng. Soc., vol. 50, no. 12, pp. 1012–1029, Dec. 2002.
[116] A. Mäkivirta, P. Antsalo, M. Karjalainen, and V. Välimäki, “Modal equalization of loudspeaker-room responses at low frequencies,” J. Audio Eng.
Soc., vol. 51, no. 5, pp. 324–343, May 2003.
[117] A. Carini, S. Cecchi, F. Piazza, I. Omiciuolo, and G. L. Sicuranza, “Multiple
position room response equalization in frequency domain,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 20, no. 1, pp. 122–
135, Jan. 2012.
[118] B. Bank, “Loudspeaker and room equalization using parallel filters: Comparison of pole positioning strategies,” in Proc. AES 51st International
Conference, Helsinki, Finland, Aug. 2013.
[119] A. Lindau and F. Brinkmann, “Perceptual evaluation of headphone compensation in binaural synthesis based on non-individual recordings,” J.
Audio Eng. Soc., vol. 60, no. 1/2, pp. 54–62, Jan./Feb. 2012.
[120] S. E. Olive and T. Welti, “The relationship between perception and measurement of headphone sound quality,” in Proc. AES 133rd Convention,
San Francisco, CA, Oct. 2012.
[121] S. E. Olive, T. Welti, and E. McMullin, “Listener preference for different
headphone target response curves,” in Proc. AES 134th Convention, Rome,
Italy, May 2013.
[122] ——, “Listener preferences for in-room loudspeaker and headphone target
responses,” in Proc. AES 135th Convention, New York, NY, Oct. 2013.
[123] M. Tikander, “Usability issues in listening to natural sounds with an augmented reality audio headset,” J. Audio Eng. Soc., vol. 57, no. 6, pp. 430–
441, Jun. 2009.
[124] D. Hammershøi and P. F. Hoffmann, “Control of earphone produced binaural signals,” in Proc. Forum Acusticum, Aalborg, Denmark, Jun. 2011, pp.
2235–2239.
[125] P. F. Hoffmann, F. Christensen, and D. Hammershøi, “Quantitative assessment of spatial sound distortion by the semi-ideal recording point of hear-through device,” in Proc. Meetings on Acoustics, vol. 19, Montreal, Canada,
Jun. 2013.
[126] ——, “Insert earphone calibration for hear-through options,” in Proc. AES
51st International Conference, Helsinki, Finland, Aug. 2013.
[127] E. Perez Gonzalez and J. Reiss, “Automatic equalization of multi-channel
audio using cross-adaptive methods,” in Proc. AES 127th Convention, New
York, NY, Oct. 2009.
[128] Z. Ma, J. D. Reiss, and D. A. A. Black, “Implementation of an intelligent
equalization tool using Yule-Walker for music mixing and mastering,” in
Proc. AES 134th Convention, Rome, Italy, May 2013.
[129] P. P. Vaidyanathan, Multirate Systems and Filter Banks. Englewood Cliffs, New Jersey: Prentice-Hall, Inc., 1993.
[130] U. Zölzer and T. Boltze, “Parametric digital filter structures,” in Proc. AES 99th
Convention, New York, NY, Oct. 1995.
[131] F. Fontana and M. Karjalainen, “A digital bandpass/bandstop complementary equalization filter with independent tuning characteristics,” IEEE
Signal Processing Letters, vol. 10, no. 4, pp. 119–122, Apr. 2003.
[132] S. J. Orfanidis, “High-order digital parametric equalizer design,” J. Audio
Eng. Soc., vol. 53, no. 11, pp. 1026–1046, Nov. 2005.
[133] M. Holters and U. Zölzer, “Parametric recursive high-order shelving filters,” in Proc. AES 120th Convention, Paris, France, May 2006.
[134] B. Bank, “Direct design of parallel second-order filters for instrument body modeling,” in Proc. International Computer Music Conference, Copenhagen, Denmark, Aug. 2007, pp. 458–465, URL:
http://www.acoustics.hut.fi/go/icmc07-parfilt.
[135] B. Bank and G. Ramos, “Improved pole positioning for parallel filters based
on spectral smoothing and multi-band warping,” IEEE Signal Processing
Letters, vol. 18, no. 5, pp. 299–302, Mar. 2011.
[136] B. Bank, “Audio equalization with fixed-pole parallel filters: An efficient
alternative to complex smoothing,” J. Audio Eng. Soc., vol. 61, no. 1/2, pp.
39–49, Jan. 2013.
[137] B. Bank and M. Karjalainen, “Passive admittance synthesis for sound synthesis applications,” in Proc. Acoustics’08 Paris International Conference,
Paris, France, Jun. 2008.
[138] ——, “Passive admittance matrix modeling for guitar synthesis,” in Proc.
13th International Conference on Digital Audio Effects (DAFx-10), Graz,
Austria, Sept. 2010, pp. 3–7.
[139] A. Härmä and T. Paatero, “Discrete representation of signals on a logarithmic frequency scale,” in Proc. IEEE Workshop on the Applications of Signal
Processing to Audio and Acoustics, New Paltz, NY, Oct. 2001, pp. 39–42.
[140] B. Bank, “Logarithmic frequency scale parallel filter design with complex and magnitude-only specifications,” IEEE Signal Processing Letters,
vol. 18, no. 2, pp. 138–141, Feb. 2011.
[141] A. Härmä, M. Karjalainen, L. Savioja, V. Välimäki, U. K. Laine, and
J. Huopaniemi, “Frequency-warped signal processing for audio applications,” J. Audio Eng. Soc., vol. 48, no. 11, pp. 1011–1031, Nov. 2000.
[142] T. Paatero and M. Karjalainen, “Kautz filters and generalized frequency
resolution: Theory and audio applications,” J. Audio Eng. Soc., vol. 51, no.
1–2, pp. 27–44, Jan./Feb. 2003.
[143] M. Karjalainen and T. Paatero, “Equalization of loudspeaker and room responses using Kautz filters: Direct least squares design,” EURASIP Journal on Advances in Signal Processing, vol. 2007, 13 pages, Jul. 2007.
[144] P. Alku and T. Bäckström, “Linear predictive method for improved spectral modeling of lower frequencies of speech with small prediction errors,”
IEEE Trans. Speech and Audio Processing, vol. 12, no. 2, pp. 93–99, Mar.
2004.
[145] B. Bank, “Magnitude-priority filter design for audio applications,” in Proc.
AES 132nd Convention, Budapest, Hungary, Apr. 2012.
[146] ——, “Perceptually motivated audio equalization using fixed-pole parallel
second-order filters,” IEEE Signal Processing Letters, vol. 15, pp. 477–480,
May 2008, URL: http://www.acoustics.hut.fi/go/spl08-parfilt.
[147] J. A. Belloch, B. Bank, L. Savioja, A. Gonzalez, and V. Välimäki, “Multichannel IIR filtering of audio signals using a GPU,” in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Florence,
Italy, May 2014, pp. 6692–6696.
[148] L. Savioja, V. Välimäki, and J. O. Smith, “Audio signal processing using
graphics processing units,” J. Audio Eng. Soc., vol. 59, no. 1/2, pp. 3–19,
Jan./Feb. 2011.
[149] J. A. Belloch, M. Ferrer, A. Gonzalez, F. J. Martinez-Zaldivar, and A. M.
Vidal, “Headphone-based virtual spatialization of sound with a GPU accelerator,” J. Audio Eng. Soc., vol. 61, no. 7/8, pp. 546–561, Jul./Aug. 2013.