2014 IEEE International Symposium on Multimedia

Indexing and Retrieving Continuations in Musical Time Series Data Using Relational Databases

Aleksey Charapko and Ching-Hua Chuan
School of Computing, University of North Florida
Jacksonville, FL, USA
[email protected], [email protected]
Abstract—This paper proposes and tests a model that provides quick search and retrieval of continuations for time series, particularly musical data, using relational databases. The model extends an existing interactive music-generation system by focusing on large input sequences. Experiments using textual and musical data showed satisfactory performance for the model.

Keywords—continuations; information retrieval; database; data sequence; music generation

I. INTRODUCTION

In this paper, we present a compact and efficient model that indexes time series data and retrieves continuations for a query using relational databases. Multimedia data, such as music recordings, are time series. In existing information retrieval and recommendation systems, the processing of such data is often song-based [1]. Each song is indexed by its meta-data, such as artist name and song title, or by overall acoustic features summarized over the entire duration of the song [2, 3]. Details such as the order of note events are eliminated. This approach is sufficient for searching for songs given an artist name as input and for recommending songs based on similarity between songs. However, tasks such as modeling compositional styles [4] and automatic music generation [5] require note-level indexing and retrieval. Consider a system that supports interactive music playing, i.e., a system that mimics a jazz musician engaging in back-and-forth improvisation with another musician. It is essential for such a system to be able to quickly retrieve all suitable note sequences, the continuations, as the response needed to carry on the music stream.

Several computer systems have been proposed for interactive music playing and creation. For example, Continuator [6] captures a musical sequence played by a musician and responds by playing another musical sequence that imitates the style of the original. In [7], Assayag and Dubnov used variable Markov models and a factor oracle to generate improvisation in real time. However, these systems focus only on the incoming data stream over a particular duration of time, and the data stream is stored in memory for processing. Therefore, these algorithms and data structures cannot be easily scaled up when the amount of multimedia data exceeds the size of memory. As the volume of information continues to grow at an astonishing rate in the digital era, it becomes imperative to study and improve the scalability of existing algorithms and to propose new approaches that solve old problems with the challenges of exponential data growth in mind.

A number of limiting factors should be considered when studying how an algorithm can scale up to large datasets. The most prominent is efficiency: the time it takes for the algorithm to search and retrieve results. Other factors, such as memory limitations and the inability to distribute an algorithm across many computation and storage nodes, can affect scalability as well.

In this paper, we propose and test a model that performs search and indexing similarly to Continuator [6], but operates on a relational database in order to overcome the memory limitations that the original model may experience with large inputs. Although NoSQL databases may be more suitable for storing time series data, we focused on relational databases in this study because of their popularity and wide adoption. The model provides quick search using hash values in the relational database, and reuses indexing structures to reduce storage space. We evaluated the model's performance by comparing it with the file-scan algorithm using textual data from Wikipedia. We also applied the model to music stylistic analysis by identifying continuations for musical data in MIDI and audio formats consisting of more than a thousand excerpts of Bach's and Mozart's compositions.

978-0-7695-5437-2/14 $31.00 © 2014 IEEE    DOI 10.1109/ISM.2014.14
II. CONTINUATION INDEXING AND RETRIEVAL
A. Problem Definition
The goal of the system is to provide fast retrievals of all
continuations for a query based on the sequences stored in
the database. Figure 1 shows a simplified scenario for the
proposed system. A composition example is stored in the
database as the top melody shown in Figure 1. Suppose that
the music sequences stored in the database represent high-quality compositions, which can be used as examples to inspire new music creations. Also suppose that the user created a short melodic segment, shown as the query in Figure 1, and was seeking inspiration to continue and finish his or her melody. Based on the example stored in the
database, the system can suggest five different continuations
for the user to consider. This data-driven approach avoids
relying heavily on domain knowledge, i.e., rules or theories for music aesthetics, for music generation.

TABLE I. ALGORITHMS FOR MODEL BUILDING: APPEND-TO-MODEL
Algorithm APPEND-TO-MODEL (appending a sequence to a model)
Input: Sequence S;
Output: Generated prefix tree model: Si, Path, CompoundHashDigest, and continuations (nodes with SubTreeRoot);
Method:
1.  for i = |S| to 1
2.      P = {S1, …, Si-1}
3.      N = empty                       // initialize variables
4.      CompoundHashDigest = empty
5.      Path = empty
6.      SubTreeRoot = empty
7.      if REUSE-BRANCH(P, Si) == false then        // cannot reuse entire branch
8.          for j = |P| to 1
9.              if REUSE-NODE(P, N, Pj, Si) == false then   // cannot reuse any nodes
10.                 if N == empty then
11.                     create new node N
12.                     SubTreeRoot = N
13.                 else
14.                     create new node N' with parent N, Path and CompoundHashDigest accumulated so far
15.                     N = N'
16.                     prepend hash digest of N and N' to CompoundHashDigest
17.                     append N to Path
18.                 end if
19.                 add continuation for Si to N with sub-tree root SubTreeRoot
20.             end if
21.         end for
22.     end if
23. end for
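The compound hash digest accumulated in APPEND-TO-MODEL is simply a chain of per-edge digests. A minimal Python sketch of that chaining (the digest length and format are illustrative assumptions; the paper only specifies that MD5 is used):

```python
import hashlib

def edge_digest(a, b, length=4):
    """Short digest for a single tree edge; stands in for "h1", "h2", ..."""
    return hashlib.md5(f"{a}->{b}".encode()).hexdigest()[:length]

def path_hash(path):
    """Compound hash for a root-to-node path: the concatenation of the
    digests of its edges. For the path C -> B -> A this is
    edge_digest(C, B) + edge_digest(B, A), the analogue of "h1h2"."""
    return "".join(edge_digest(a, b) for a, b in zip(path, path[1:]))

# A node's compound hash ends with the hash of any path suffix, which is
# what lets a query be matched without traversing from the tree root:
assert path_hash(["C", "B", "A"]).endswith(path_hash(["B", "A"]))
```

This suffix property is what the look-up in Section II.C exploits.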
Figure 1. An example of all continuations for a query based on a
musical sequence.
Therefore, when building the database, the system must
index all possible continuations for any potential queries on
every example in the database. For example, suppose the
model has the sequence {A B C D} stored in the database.
When the model receives a query {A B C}, it needs to
produce {D} as the continuation. Similarly, the continuation
{C D} should be generated for the query {A B}. In other
words, the model is required to build certain structures so
that all possible continuations ({C D}, and {D}) for all
possible queries ({A B}, and {A B C}), can be retrieved
quickly for the input sequence {A B C D}, assuming the
minimum length of the query is two.
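The required behavior for this example can be stated directly in code. A minimal Python sketch that enumerates the (query, continuation) pairs the model must support for one stored sequence (illustrative only, not the indexing scheme itself):

```python
def all_continuations(sequence, min_query_len=2):
    """For each prefix of `sequence` that is at least `min_query_len`
    long (and shorter than the whole sequence), map the prefix, used as
    a query, to its continuation, the remaining suffix."""
    return {tuple(sequence[:i]): tuple(sequence[i:])
            for i in range(min_query_len, len(sequence))}

pairs = all_continuations(["A", "B", "C", "D"])
# query {A B}   -> continuation {C D}
# query {A B C} -> continuation {D}
```

The challenge addressed in the rest of this section is producing these mappings without materializing every pair explicitly.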
B. Building the Model and Indexing
1) The Basic Model
The model for searching the sequence and retrieving
possible continuations is based on Continuator [6].
Continuator stores a tree structure in memory to index the
input sequence and search for the possible continuation to
the query. As the input size increases, it might not be
possible to store all data in the computer memory. Therefore,
disk storage is essential to scale up the model.
Our model extends Continuator in several aspects. Moving the model to a database requires several significant changes in order to avoid performance degradation. For example, search in Continuator relies on tree traversal, which can be inefficient for large trees. In order to store the tree structure in a relational database, we developed algorithms that reuse existing sub-trees and create compound hash digests for quick retrievals without having to search from the root of the tree.
The process of building the model begins with
segmenting the original input sequence. A unique identifier
that corresponds to the position of the segment in the input
sequence is stored with each segment. The sequence of
segments is parsed right to left to build the prefix trees of the
model. A reduction function can be applied to each data
segment before it is placed on the tree to allow the model to
perform search on inexact queries or look up content similar
to the query. Every node needs to maintain a list of continuations, i.e., segment identifiers from the input that continue the data represented by the node. Thus, the model
is able to retrieve the continuations from the original input
sequence. We also store the path information with every
node on the tree in order to identify the ancestors of the node when we perform a search. The path data stored by the model comes in two distinct pieces: the identifiers of the ancestor tree nodes, and a path hash computed from the data of the input sequences.

For instance, consider the input sequence {A B C D}; the proposed model builds trees that contain all possible prefixes for the input. We start by looking at node D and examine its prefix: node C is added as the root of the tree and node B is attached as a child of node C. For every created edge, we compute a path hash value by producing a small hash digest of the two nodes that make up the vertices of the edge. We then append the digest to the path hash of the previous edge on the path to the current node. The built tree for the input sequence is illustrated in Figure 2 (a). In Figure 2, hash digests are labeled as "h" followed by a number to designate different digests. For example, the edge C-B is labeled as "h1" and the edge B-A as "h2". As a result, the hash digest for the entire path from node C to node A is "h1h2". At this point, nodes A, B and C all have a continuation index of 4, which corresponds to the segment D from the input.

We continue parsing the input sequence without the last element, i.e., we parse the sequence {A B C}. In [6], a new tree is created with a root of B and node A being a child of B, as shown in Figure 2 (b). Unlike the model in [6], we compress the tree structures by reusing common sub-trees to reduce memory usage. Instead of creating a new tree, we append the continuation lists produced during this iteration to the existing nodes. However, because of the reuse of the structure, it is important not to mix the continuation lists from different iterations. For example, the continuations recorded during the parse of {A B C D} should not be mixed with those for {A B C}. Similarly, we parse the sequence {A B} and add the appropriate continuations to the corresponding node in order to build the complete tree with all prefixes for the input, as shown in Figure 2 (c).

The algorithms for building the basic model are presented in Tables I, II, and III.

TABLE II. ALGORITHMS FOR MODEL BUILDING: REUSE-NODE

Algorithm REUSE-NODE (reusing a node already in the model if possible)
Input: Sequence P, previous node N, current sequence item Csi, continuation sequence item C;
Output: Whether the node has been reused, a node N, sub-tree root node SubTreeRoot, updated CompoundHashDigest, updated Path;
Method:
1.  if N == empty then
2.      SubTreeRoot = any node representing item Csi
3.      if SubTreeRoot != empty then
4.          add continuation for C to SubTreeRoot
5.          N = SubTreeRoot
6.      else
7.          return false
8.      end if
9.  else                                // this allows branches for P to be reused
10.     N' = any non-root node representing item Csi whose parent is N, if it exists
11.     if N' != empty then
12.         prepend hash digest of N and N' to CompoundHashDigest
13.         append N to Path
14.         add continuation for C to N' with sub-tree root SubTreeRoot
15.         N = N'
16.     else
17.         return false
18.     end if
19. end if
20. return true, N, SubTreeRoot, CompoundHashDigest, Path

TABLE III. ALGORITHMS FOR MODEL BUILDING: REUSE-BRANCH

Algorithm REUSE-BRANCH (reusing a tree branch already in the model)
Input: Sequence P, continuation sequence item C;
Output: Whether a branch of the existing tree has been reused;
Method:
1. RBP = array of nodes representing the branch for sequence P
2. if RBP != empty then                 // branch is reused
3.     for each node N in RBP           // add continuations to reuse the sub-tree
4.         add continuation for C to N with sub-tree root RBP0
5.     end for
6.     return true
7. else
8.     return false
9. end if

Figure 2. (a) Tree after parsing {A, B, C, D}; (b) tree after parsing {A, B, C}; (c) completed single tree that captures the patterns and continuations found in {A, B, C, D}. The number in a circle is the unique identifier for a node in the tree. The continuation list next to a node is of the format Node ID: list of continuation indexes.

2) Long Input Sequences
In the basic model, the length of the input sequence determines the depth of the tree. As a result, search slows down due to the deep tree structure built for long sequences. In this paper, we utilize a sliding window to limit the depth of the trees and increase branching. We partition the input sequence using half-overlapped windows, and the subsequence in each window is then processed accordingly. Suppose we have a longer input sequence {A B C D A D B C}, which is processed using a window of size four. We first parse the window {A B C D} from the input and produce the model as shown in Figure 3 (a). As we progress, we apply the same mechanism as for the previous window, except that we merge the newly produced trees with the ones already in the database. For example, consider the second window, consisting of {C D A D}. We begin by putting the prefix of D into the model, and perform a search to find whether the prefix (node A) is already in the database. If so, we create a tree and merge it with the existing one. The process is repeated until all the prefixes of {C D A D} are processed, as shown in Figure 3 (b). Note that node D has hash digest "h3" instead of "h1h2h3". This is because edge A-D was the first edge created while processing the input window {C D A D}. Similarly, the model is updated as in Figure 3 (c) after the last window {A D B C} is processed.

Figure 3. (a) The model after processing the first window; (b) the model after processing the second window; (c) the completed model.
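The half-overlapped windowing described above can be sketched as follows (how to treat a tail shorter than half a window is an assumption, since the paper's example divides evenly):

```python
def half_overlapped_windows(sequence, window_size):
    """Partition `sequence` into windows of `window_size` items that
    advance by half a window at a time, so consecutive windows share
    half their contents."""
    step = window_size // 2
    windows = []
    for start in range(0, len(sequence), step):
        window = sequence[start:start + window_size]
        if len(window) <= step and windows:
            break  # this tail is already covered by the previous window
        windows.append(window)
    return windows

# The example input {A B C D A D B C} with a window of size four yields
# the three windows processed in Figure 3:
print(half_overlapped_windows(list("ABCDADBC"), 4))
# [['A', 'B', 'C', 'D'], ['C', 'D', 'A', 'D'], ['A', 'D', 'B', 'C']]
```

The half-window step is also why a query is only guaranteed to find all matches when it is no longer than half the window: that is the largest span certain to fall entirely inside some window.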
C. Searching and Retrieving Continuations
1) Search and Retrieval
Once the model is built, it can be used to search for
patterns and to retrieve continuations of found patterns. The
approach in [6] finds the right tree and traverses down the
tree that matches the query sequence. However, such an approach is too costly for models stored in a relational
database, because every transition from a parent to a child
node requires a search in the entire database of the model.
Instead, we use the path hash values of our model to quickly
locate possible matches. As the query arrives, we compute
the path hash using MD5 from a query sequence, and then
perform a database look-up for the path hash and last query
segment. The results of this search include all possible
matches to our query and we sieve through the matches to
eliminate the incorrect ones produced by hash collisions.
Continuations are then retrieved for all of the valid results.
Suppose we receive a query {A D}. We first compute the
path hash for the query. Edge A-D produces the hash digest
of “h6” based on the example shown in Figure 2 (c). We then
look up any nodes in the model that have path hash ending in
“h6”. The list returned by the hash look-up might contain
invalid nodes that do not produce a continuation for the
query. Thus, we need to verify every returned node by
checking whether the list of identifiers to the ancestor nodes
forms a valid traversal path for a given query. In this
example, we only retrieve node A with the unique identifier
of 6 for the query. During the validation process, we also
obtain the unique identifier of the root of the sub-tree (the
unique identifier 5 for node D) in which the matched query
was found. The final step is to look up the continuations for
node A given that the sub-tree root for the continuation is the
sub-tree root's identifier we have retrieved earlier. The
continuation list for node A with unique identifier of 6 and
sub-tree root 5 consists of only one element: segment
number 7, indicating {B C} in the original sequence {A B C
D A D B C}. Therefore, the sub-sequence {B C} is retrieved
as the continuation for the query {A D}. The search
algorithm is presented in Table IV.
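In essence, the retrieval reduces to a hash match followed by path validation. A self-contained Python sketch of this idea (the node records, the symmetric per-edge digest, and the toy index below are hypothetical simplifications, not the paper's schema):

```python
import hashlib

def digest(a, b):
    """Symmetric per-edge MD5 digest (a simplification so that the edge
    A-D hashes identically from either direction)."""
    return hashlib.md5("|".join(sorted((a, b))).encode()).hexdigest()[:4]

def query_hash(query):
    """Compound hash of the edges of a query, e.g. {A D} -> digest(A, D)."""
    return "".join(digest(a, b) for a, b in zip(query, query[1:]))

def search(index, query):
    """Find candidates whose compound hash ends with the query hash, then
    validate each candidate's stored path to discard hash collisions."""
    qh = query_hash(query)
    continuations = []
    for node in index:
        if node["compound_hash"].endswith(qh):
            # the tail of the root-to-node path must spell the query in
            # reverse, since prefixes are indexed right to left
            if node["path_items"][-len(query):] == list(reversed(query)):
                continuations.extend(node["continuations"])
    return continuations

# Toy index mimicking the {A D} example: node A under sub-tree root D,
# whose continuation index 7 denotes {B C} in {A B C D A D B C}
index = [
    {"path_items": ["D", "A"],
     "compound_hash": digest("D", "A"),
     "continuations": [7]},
    {"path_items": ["B", "A"],          # does not match the query {A D}
     "compound_hash": digest("B", "A"),
     "continuations": [99]},
]
print(search(index, ["A", "D"]))  # [7]
```

The validation step is what makes the scheme correct despite shortened digests: a colliding hash suffix is rejected as soon as the stored path fails to spell the query.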
TABLE IV. ALGORITHMS FOR MODEL BUILDING: SEARCH

Algorithm SEARCH (performs the look-up for query Q in the model)
Input: A query sequence Q;
Output: A list of continuations;
Method:
1. Continuations = {∅}
2. Prospects = list of nodes with CompoundHashDigest the same as the one produced from sequence Q
3. for each node N in Prospects
4.     if N.path is a list of nodes representing the items in Q then
5.         add continuations of N with sub-tree root N.path0 to Continuations
6.     end if
7. end for
8. return Continuations

2) Limitation
The model has a few limitations in search. The first limitation is related to query size. Because of the sliding-window processing, the maximum query size for which the model is guaranteed to find all matches is half the window size. For instance, in our example with a window size of 4, only queries of length 2 will return the full set of continuations, while queries of size 3 will return some but not the entire set of results. In addition, no query of length equal to or greater than the window size will return any results. Another search limitation is the possibility of duplicate results due to the overlapped input windows, but such duplicates can be easily removed during the validation phase of query processing.

III. EXPERIMENTS AND RESULTS

A. Performance Evaluation Using Text Inputs
To evaluate the performance of the model, we compared it with the file-scan approach. Our model was implemented in Java with a local MySQL instance as the relational database. In file-scan search, the contents of the file are read from disk and scanned for matching patterns. To focus on the performance of the algorithms, all database caching was turned off. Similarly, we disallowed caching for file-scan between queries.

We used textual data from Wikipedia to compare the performance of the proposed model and file-scan. Two test files of different sizes were used: the smaller text, consisting of 13.8 thousand segments, is a single article, while the larger file, with 47.8 thousand segments, is a combination of articles on different topics.

We first examined model performance by varying the size of the search query. We generated a set of 100 queries of the same length, guaranteed to have at least one match in the input sequence. Then we ran the search for these queries on the model and file-scan while recording the time for each look-up. We repeated the process 10 times and computed the average time it took the model and file-scan to process 100 queries on the input data.

As shown in Figure 4, the proposed model outperformed the file-scan approach for longer queries, but file-scan was faster for shorter queries of length two. For the proposed model, the significant difference in search time between queries of size two and three arises because a much longer list of matches was retrieved for queries of size two than of size three. Since the model's initial data look-up is carried out based on the hash value calculated from the query, a longer list of potential results means that the model needs to validate more possible matches for hash collisions. Each hash validation is a separate database operation which, given a large number of such operations, considerably slows down the model. In addition, the retrieval of each continuation list results in an additional look-up in the relational database that negatively impacts performance. For longer queries, where the continuation list is smaller, the model outperformed file-scan, with the difference being more substantial for the larger input size.

Figure 4. Average search time for varying sizes of query strings.

We also studied the effect of the result set size, i.e., the number of matches, on the performance of the model. We produced sets of fixed-length queries grouped by the number of matches each query produces. The queries were used to test the model and file-scan, with running times recorded and averaged over 10 repetitions for each result set size.

Figure 5 shows the average search time for the proposed model and file-scan using queries with varying numbers of matches. It can be observed that the model performed better than file-scan for queries with a small number of matches. With the growth of the result set, the model gradually became slower than the file-scan approach. However, as shown in the previous experiment, the model outperformed file-scan more significantly for the larger input size.

Figure 5. Average search time versus number of results with query size of two.

B. Retrieving Continuations for Symbolic Music Data
In this section, we tested the model using symbolic music data. In symbolic music, specifically the MIDI format, a composition consists of one or more tracks, each of which often represents an instrument. A track consists of a sequence of note events, and each note event is described as a tuple of attributes including pitch, onset time, duration, and velocity (loudness). Figure 6 presents a melodic sequence with note events represented as MIDI pitch number, onset time (in beats), duration (in beats), and velocity. For simplicity, we focused only on MIDI pitch number in this study.

Figure 6. A melodic sequence consists of note events represented as tuples of pitch, onset time, duration, and velocity.

We collected MIDI files of compositions by Bach and Mozart. The MIDI files, with 120 tracks in total, include symphonies, concertos, sonatas, preludes, fugues and other forms, resulting in 138,010 note events. For the experiments, we produced a set of 1,000 randomly selected, fixed-length queries from the tracks of each composer's work. For each query, we retrieved continuations from both the Bach and Mozart data sets and examined the number of compositions identified as matches for continuations for each composer.

Figure 7 shows the average number of matches from each composer's work given varying lengths of queries in terms of the number of notes. Figure 7 (a) shows the retrieved number of matches using queries generated from Bach's work, while Figure 7 (b) presents the queries generated from Mozart's. The italicized numeric values indicate the number of matches from Mozart's work, while the non-italic values show the matches from Bach's. As shown in Figure 7, the decrease in the number of matches as the query length increases indicates that longer note sequences are less likely to appear in multiple compositions of the two composers. However, it can be observed that queries from Mozart's work tend to retrieve more continuations from both composers than queries from Bach's.

Figure 7. Number of compositions matched for continuations for queries generated from different composers' work.
C. Retrieving Continuations for Audio
In this section, we focused on audio musical sequences.
We collected 1,427 15-second excerpts of audio recordings
from Bach (555 excerpts) and Mozart (872 excerpts)
compositions. Compositions used in the experiment also
include various musical forms with instruments such as
recorder, violin, flute, piano, oboe, guitar, and choir.
Unlike textual data and MIDI which can be easily parsed
as a sequence of words or pitch numbers, we needed a
reduction function to convert audio clips into sequences of
symbols that are capable of describing the musical content in
the audio. We used the FACEG algorithm [8] for the
reduction function. The FACEG algorithm tracks music
tonal center, or key, in real time. As shown in Figure 8, the
input audio recording is first pre-processed to reduce noise
and eliminate silences, and then analyzed frame-by-frame
using fast Fourier transform to retrieve frequency
information. The information in the frequency spectrum
provides pitch information about local note events, and this
information is mapped to the 3D Spiral Array model [9] to
label the audio frame using the closest key from the 24 predefined key points.
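The final labeling step is a nearest-point classification: each frame's mapped position is assigned the label of the closest of the 24 key points. A generic Python sketch (the coordinates below are made up for illustration; the actual Spiral Array geometry is defined in [9]):

```python
import math

def closest_label(point, reference_points):
    """Return the label of the reference point nearest to `point`
    (Euclidean distance in the model's 3D space)."""
    return min(reference_points,
               key=lambda label: math.dist(point, reference_points[label]))

# Hypothetical coordinates for three of the 24 key points:
keys = {
    "C major": (0.0, 0.0, 0.0),
    "G major": (1.0, 0.0, 0.5),
    "A minor": (0.2, 0.9, 0.1),
}
print(closest_label((0.9, 0.1, 0.4), keys))  # G major
```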
Figure 8. The system architecture for FACEG [8].
We modified the FACEG algorithm to create two
reduction functions for this study: the key reduction function
labels each audio segment using the closest key generated by
FACEG while the pitch-chord-key reduction function labels
the segment using the closest pitch, chord, or key. The pitch-chord-key reduction function provides a finer description of the musical content, as it divides the space in the Spiral Array
into 60 partitions while the key reduction function only
consists of 24 partitions. Each audio excerpt (with sampling
rate of 44.1kHz) was divided into non-overlapped segments
of roughly 370 milliseconds, and each segment was then
summarized as the closest pitch, chord, or key using the
reduction function. After applying the reduction function, we
created the tree structures and indexes for the 1,427 audio
excerpts using the approach described in section II. Similar
to the experiment using MIDI data, we produced a set of
1000 randomly selected, fixed length queries from the
excerpts of each composer.
Figure 9 shows the average number of matches from
each composer’s work given varying length of queries using
key reduction function. It can be observed that the model
was able to identify only a few compositions from the data set
for longer queries. Shorter queries with length less than 1.5
seconds (4 segments) have many matches in both
composers’ sets, but shorter queries generated from Mozart’s
retrieved more matches in Mozart’s compositions.
Figure 10 shows the results using the pitch-chord-key
reduction function. Comparing with Figure 9, the number of
matches reduced significantly when the pitch-chord-key
reduction function was used, especially for shorter queries.
The model was able to identify a single composition for
queries longer than 1.5 seconds. This implies that the pitch-chord-key reduction function can be used to uniquely
identify compositions from a mixed data set of Bach and
Mozart’s work when the recordings are summarized in
sufficiently long segments.
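For concreteness, the segmentation parameters above imply roughly 16,317 samples per segment and about 40 labels per 15-second excerpt; a small Python sketch (the remainder-dropping convention is an assumption):

```python
SAMPLE_RATE = 44_100        # Hz, as stated in Section III.C
SEGMENT_SECONDS = 0.370     # roughly 370 ms per non-overlapped segment
EXCERPT_SECONDS = 15.0

samples_per_segment = round(SAMPLE_RATE * SEGMENT_SECONDS)      # 16317
segments_per_excerpt = int(EXCERPT_SECONDS // SEGMENT_SECONDS)  # 40

def segment(signal, seg_len=samples_per_segment):
    """Split a raw signal into non-overlapped, full-length segments;
    each segment is then summarized by the reduction function into a
    single pitch, chord, or key label."""
    return [signal[i:i + seg_len]
            for i in range(0, len(signal) - seg_len + 1, seg_len)]
```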
Figure 9. Number of compositions matched for continuations for queries generated from different composers' work using the key reduction function.

Figure 10. Number of compositions matched for continuations for queries generated from different composers' work using the pitch-chord-key reduction function.

IV. CONCLUSIONS AND FUTURE WORK

We proposed and tested a model that can be used to search and retrieve possible continuations for various types of data. Unlike the original Continuator model, our version resides in a database instead of main machine memory in order to mitigate memory size limitations. The proposed algorithm extends the original model by reusing sub-trees and compound hash digests for quick retrievals. Experiments using textual data showed that the model outperformed the sequential file-scan algorithm on large input data sets. We also observed that the model was able to identify the musical sequence of a particular composition when an appropriate reduction function was used.

However, the performance measurements indicated that our model, driven by a relational database, was hindered by the need for frequent database interactions. Identifying ways to conduct database look-ups less frequently is essential to improving the model's speed. Ultimately, we plan to migrate the model to an HBase environment running under Hadoop. Keeping the number of database interactions small becomes even more important when dealing with distributed databases because of the added overhead of network latency.

REFERENCES
[1] M. I. Mandel and D. Ellis, "Song-level features and support vector machines for music classification," in Proc. of the 6th International Conference on Music Information Retrieval, 2005.
[2] M. Levy and M. Sandler, "Music information retrieval using social tags and audio," IEEE Trans. on Multimedia, vol. 11, no. 3, pp. 383-395, 2009.
[3] P. Lops, M. de Gemmis, and G. Semeraro, "Content-based recommender systems: State of the art and trends," in Recommender Systems Handbook, Springer, New York, 2011.
[4] C.-H. Chuan and E. Chew, "Generating and evaluating musical accompaniments that emulate style," Computer Music Journal, vol. 35, no. 4, MIT Press, 2011.
[5] A. Eigenfeldt and P. Pasquier, "Realtime generative music system using autonomous melody, harmony and rhythm agents," in Proc. of the 12th Generative Art Conference, 2009.
[6] F. Pachet, "The Continuator: Musical interaction with style," Journal of New Music Research, vol. 32, no. 3, pp. 333-341, 2003.
[7] G. Assayag and S. Dubnov, "Using factor oracles for machine improvisation," Soft Computing: A Fusion of Foundations, Methodologies and Applications, vol. 8, no. 9, pp. 604-610, 2004.
[8] C.-H. Chuan and E. Chew, "Audio key finding: Considerations in system design, and case studies on Chopin's 24 preludes," EURASIP Journal on Applied Signal Processing, doi:10.1155/2007/56561.
[9] E. Chew, Mathematical and Computational Modeling of Tonality: Theory and Applications, Springer, New York, 2014.