Indexing and Retrieving Continuations in Musical Time Series Data Using Relational Databases

Aleksey Charapko and Ching-Hua Chuan
School of Computing, University of North Florida, Jacksonville, FL, USA
[email protected], [email protected]

2014 IEEE International Symposium on Multimedia. 978-0-7695-5437-2/14 $31.00 © 2014 IEEE. DOI 10.1109/ISM.2014.14

Abstract—This paper proposes and tests a model that provides quick search and retrieval of continuations for time series, particularly musical data, using relational databases. The model extends an existing interactive music-generation system by focusing on large input sequences. Experiments using textual and musical data showed satisfactory performance for the model.

Keywords—continuations; information retrieval; database; data sequence; music generation

I. INTRODUCTION

In this paper, we present a compact and efficient model that indexes time series data and retrieves continuations for a query using relational databases. Multimedia data, such as music recordings, are time series. In existing information retrieval and recommendation systems, the processing of such data is often song-based [1]. Each song is indexed by its metadata, such as artist name and song title, or by overall acoustic features summarized over the entire duration of the song [2, 3]. Details such as the order of note events are eliminated. This approach is sufficient for searching for songs given an artist name as input and for recommending songs based on inter-song similarity. However, tasks such as modeling compositional styles [4] and automatic music generation [5] require note-level indexing and retrieval.

Consider a system that supports interactive music playing, i.e., a system that mimics a jazz musician engaging in back-and-forth improvisation with a partner. It is essential for such a system to be able to quickly retrieve all suitable note sequences, the continuations, as responses that carry on the music stream. Several computer systems have been proposed for interactive music playing and creation. For example, Continuator [6] captures the musical sequence played by a musician and responds by playing another musical sequence that imitates the style of the original. In [7], Assayag and Dubnov used variable Markov models and the factor oracle to generate improvisation in real time. However, these systems only focus on the incoming data stream over a particular duration of time, and the data stream is stored in memory for processing. Therefore, these algorithms and data structures cannot be easily scaled up when the amount of multimedia data exceeds the size of memory. As the volume of information continues to grow at an astonishing rate in the digital era, it becomes imperative to study and improve the scalability of existing algorithms and to propose new approaches that solve old problems with the exponential growth of data in mind.

A number of limiting factors should be considered when studying how an algorithm scales to large datasets. The most prominent is efficiency: the time it takes for the algorithm to search for and retrieve the result. Other factors, such as memory limitations and the inability to distribute an algorithm across many computation and storage nodes, can affect scalability as well. In this paper, we propose and test a model that performs search and indexing similarly to Continuator [6], but operates on a relational database in order to overcome the memory limitations that the original model may experience with large inputs. Although NoSQL databases may be more suitable for storing time series data, we focus on relational databases in this study because of their popularity and wide adoption. The model provides quick search using hash values in the relational database, and reuses indexing structures to reduce storage space. We evaluated the model by comparing it with a file-scan algorithm on textual data from Wikipedia. We also applied the model to music stylistic analysis by identifying continuations for musical data in MIDI and audio formats, consisting of more than a thousand excerpts of Bach's and Mozart's compositions.

II. CONTINUATION INDEXING AND RETRIEVAL

A. Problem Definition

The goal of the system is to provide fast retrieval of all continuations for a query based on the sequences stored in the database. Figure 1 shows a simplified scenario for the proposed system. A composition example is stored in the database as the top melody shown in Figure 1. Suppose that the music sequences stored in the database represent high-quality compositions, which can be used as examples to inspire new music creations. Also suppose that the user created the short melodic segment shown as the query in Figure 1 and was seeking inspiration to continue and finish his or her melody. Based on the example stored in the database, the system can suggest five different continuations for the user to consider. This data-driven approach avoids relying heavily on domain knowledge, i.e., rules or theories of music aesthetics, for music generation.

Figure 1. An example of all continuations for a query based on a musical sequence.

Therefore, when building the database, the system must index all possible continuations for any potential query on every example in the database. For example, suppose the model has the sequence {A B C D} stored in the database. When the model receives the query {A B C}, it needs to produce {D} as the continuation. Similarly, the continuation {C D} should be generated for the query {A B}. In other words, the model is required to build structures such that all possible continuations ({C D} and {D}) for all possible queries ({A B} and {A B C}) can be retrieved quickly for the input sequence {A B C D}, assuming the minimum query length is two.

B. Building the Model and Indexing

1) The Basic Model

The model for searching the sequence and retrieving possible continuations is based on Continuator [6]. Continuator stores a tree structure in memory to index the input sequence and search for the possible continuations to a query. As the input size increases, it might not be possible to store all data in the computer's memory. Therefore, disk storage is essential to scale up the model. Our model expands Continuator in several aspects. Moving the model to a database requires significant changes in order to avoid performance degradation. For example, the search in Continuator relies on tree traversal, which can be inefficient for large trees.

TABLE I. ALGORITHMS FOR MODEL BUILDING: APPEND-TO-MODEL

Algorithm APPEND-TO-MODEL (appending a sequence to a model)
Input: Sequence S;
Output: Generated prefix tree model: Si, Path, CompoundHashDigest, and continuations (nodes with SubTreeRoot);
Method:
1. for i = |S| to 1
2.   P = {S1, ..., Si-1}
3.   N = empty // initialize variables
4.   CompoundHashDigest = empty
5.   Path = empty
6.   SubTreeRoot = empty
7.   if REUSE-BRANCH(P, Si) == false then // cannot reuse entire branch
8.     for j = |P| to 1
9.       if REUSE-NODE(P, N, Pj, Si) == false then // cannot reuse any nodes
10.        if N == empty then
11.          create new node N
12.          SubTreeRoot = N
13.        else
14.          create new node N' with parent N, Path and CompoundHashDigest accumulated so far
15.          N = N'
16.          prepend hash digest of N and N' to CompoundHashDigest
17.          append N to Path
18.        end if
19.        add continuation for Si to N with subtree root SubTreeRoot
20.      end if
21.    end for
22.  end if
23. end for
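To make the basic scheme concrete, the reversed-prefix parse of {A B C D} can be sketched as a small in-memory model. This is a minimal illustration of the Continuator-style tree described above, without the sub-tree compression introduced later; all names are invented for the sketch and do not come from the paper's Java implementation.

```python
class Node:
    def __init__(self, symbol):
        self.symbol = symbol
        self.children = {}       # symbol -> child Node
        self.continuations = []  # indexes into the input sequence

def build_model(seq):
    """Index every prefix of seq: for each prefix, walk its symbols right to
    left and record the index of the element that continues that prefix."""
    roots = {}                                # symbol -> root Node
    for end in range(len(seq) - 1, 0, -1):    # parse right to left
        prefix, cont_index = seq[:end], end   # seq[end] continues seq[:end]
        node = None
        for sym in reversed(prefix):          # walk the prefix right to left
            table = roots if node is None else node.children
            node = table.setdefault(sym, Node(sym))
            node.continuations.append(cont_index)
    return roots

def continuations(roots, query):
    """Return indexes of the elements that followed `query` in the input."""
    node = roots.get(query[-1])
    for sym in reversed(query[:-1]):
        if node is None:
            return []
        node = node.children.get(sym)
    return [] if node is None else node.continuations

model = build_model(list("ABCD"))
print(continuations(model, list("ABC")))  # -> [3], i.e., element "D"
print(continuations(model, list("AB")))   # -> [2], i.e., element "C"
```

A continuation index points back into the original sequence, so the suffix starting at that index ({C D} for the query {A B}) can be replayed from the stored input, as in the {A B C D} example above.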
In order to store the tree structure in a relational database, we developed algorithms to reuse existing sub-trees and to create compound hash digests for quick retrieval without having to search from the root of the tree.

The process of building the model begins with segmenting the original input sequence. A unique identifier that corresponds to the position of the segment in the input sequence is stored with each segment. The sequence of segments is parsed right to left to build the prefix trees of the model. A reduction function can be applied to each data segment before it is placed on the tree, allowing the model to perform search on inexact queries or to look up content similar to the query. Every node maintains a list of continuations, i.e., identifiers of the segments in the input that continue the data represented by the node. Thus, the model is able to retrieve continuations from the original input sequence. We also store path information with every node on the tree in order to identify the node's ancestors when we perform the search. The path data stored by the model comes in two distinct pieces: the identifiers of the ancestor tree nodes and a path hash computed from the data of the input sequences.

For instance, consider the input sequence {A B C D}; the proposed model builds trees that contain all possible prefixes of the input. We start by looking at node D and examine its prefix: node C is added as the root of the tree and node B is attached as a child of node C. For every created edge, we compute a path hash value by producing a small hash digest of the two nodes that make up the vertices of the edge. We then append the digest to the path hash of the previous edge on the path to the current node. The tree built for the input sequence is illustrated in Figure 2 (a). In Figure 2, hash digests are labeled as "h" followed by a number to designate different digests. For example, the edge C-B is labeled "h1" and the edge B-A "h2". As a result, the hash digest for the entire path from node C to node A is "h1h2". At this point, nodes A, B and C all have a continuation index of 4, which corresponds to segment D of the input. We continue parsing the input sequence without the last element, i.e., we parse the sequence {A B C}. In [6], a new tree is created with root B and node A as a child of B, as shown in Figure 2 (b). Unlike the model in [6], we compress the tree structures by reusing common sub-trees to reduce memory usage. Instead of creating a new tree, we append the continuation lists produced during this iteration to the existing nodes. However, because of this reuse of structure, it is important not to mix continuation lists from different iterations. For example, the continuations recorded during the parse of {A B C D} should not be mixed with those for {A B C}. Similarly, we parse the sequence {A B} and add the appropriate continuations to the corresponding node in order to build the complete tree with all prefixes of the input, as shown in Figure 2 (c). The algorithms for building the basic model are presented in Tables I, II, and III.

Figure 2. (a) Tree after parsing {A, B, C, D}; (b) tree after parsing {A, B, C}; (c) completed single tree that captures the patterns and continuations found in {A, B, C, D}. The number in a circle is the unique identifier of a node in the tree. The continuation list next to a node has the format Node ID: list of continuation indexes.

TABLE II. ALGORITHMS FOR MODEL BUILDING: REUSE-NODE

Algorithm REUSE-NODE (reusing a node already in the model if possible)
Input: Sequence P, previous node N, current sequence item Csi, continuation sequence item C;
Output: Whether the node has been reused, a node N, sub-tree root node SubTreeRoot, updated CompoundHashDigest, updated Path;
Method:
1. if N == empty then
2.   SubTreeRoot = any node representing item Csi
3.   if SubTreeRoot != empty then
4.     add continuation for C to SubTreeRoot
5.     N = SubTreeRoot
6.   else
7.     return false
8.   end if
9. else // this allows branches for P to be reused
10.  N' = any non-root node representing item Csi whose parent is N, if it exists
11.  if N' != empty then
12.    prepend hash digest of N and N' to CompoundHashDigest
13.    append N to Path
14.    add continuation for C to N' with subtree root SubTreeRoot
15.    N = N'
16.  else
17.    return false
18.  end if
19. end if
20. return true, N, SubTreeRoot, CompoundHashDigest, Path

TABLE III. ALGORITHMS FOR MODEL BUILDING: REUSE-BRANCH

Algorithm REUSE-BRANCH (reusing a tree branch already in the model)
Input: Sequence P, continuation sequence item C;
Output: Whether a branch of an existing tree has been reused;
Method:
1. RBP = array of nodes representing the branch for sequence P
2. if RBP != empty then // branch is reused
3.   for each node N in RBP // add continuations to reuse the sub-tree
4.     add continuation for C to N with subtree root RBP0
5.   end for
6.   return true
7. else
8.   return false
9. end if

2) Long Input Sequences

In the basic model, the length of the input sequence determines the depth of the tree. As a result, search slows down due to the deep tree structure built for long sequences. In this paper, we utilize a sliding window to limit the depth of the trees and increase branching. We partition the input sequence using half-overlapped windows, and the subsequence in each window is then processed accordingly. Suppose we have a longer input sequence {A B C D A D B C}, which is processed using a window of size four. We first parse the window {A B C D} from the input and produce the model shown in Figure 3 (a). As we progress, we apply the same mechanism as for the previous window, except that we merge the newly produced tree with the ones already in the database. For example, consider the second window, consisting of {C D A D}. We begin by putting the prefix of D into the model, and perform a search to find whether the prefix (node A) is already in the database. If so, we create a tree and merge it with the existing one. The process is repeated until all the prefixes of {C D A D} are processed, as shown in Figure 3 (b). Note that node D has hash digest "h3" instead of "h1h2h3". This is because edge A-D was the first edge created while processing the input window {C D A D}. Similarly, the model is updated as in Figure 3 (c) after the last window {A D B C} is processed.

Figure 3. (a) The model after processing the first window; (b) the model after processing the second window; (c) the completed model.

C. Searching and Retrieving Continuations

1) Search and Retrieval

Once the model is built, it can be used to search for patterns and to retrieve continuations of found patterns. The approach in [6] finds the right tree and traverses down the tree to match the query sequence. However, such an approach is too costly for models stored in a relational database, because every transition from a parent to a child node requires a search over the entire database of the model. Instead, we use the path hash values of our model to quickly locate possible matches. As a query arrives, we compute its path hash using MD5, and then perform a database look-up on the path hash and the last query segment. The results of this search include all possible matches to the query, and we sieve through the matches to eliminate incorrect ones produced by hash collisions. Continuations are then retrieved for all of the valid results. Suppose we receive the query {A D}. We first compute the path hash for the query. Edge A-D produces the hash digest "h6" in the example shown in Figure 3 (c). We then look up any nodes in the model whose path hash ends in "h6".
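This hash-based look-up can be sketched against a relational table. The schema, node identifiers, and digests below are illustrative assumptions (an SQLite stand-in for demonstration, not the paper's actual MySQL schema); the sketch indexes the reversed prefix C-B-A of {A B C D} and resolves a query with one indexed look-up instead of a root-to-leaf traversal.

```python
import hashlib
import sqlite3

def edge_digest(parent, child):
    # small digest of the two vertices of an edge ("h1", "h2", ... in the text)
    return hashlib.md5(f"{parent}->{child}".encode()).hexdigest()[:8]

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE node (
                  id INTEGER PRIMARY KEY,
                  symbol TEXT,      -- segment the node represents
                  path_hash TEXT,   -- concatenated edge digests down to this node
                  path TEXT)        -- ancestor node ids, kept for validation""")
db.execute("CREATE INDEX idx_hash ON node(path_hash, symbol)")

# index the reversed prefix C -> B -> A built from the input {A B C D}
h1 = edge_digest("C", "B")
h2 = edge_digest("B", "A")
db.executemany("INSERT INTO node(symbol, path_hash, path) VALUES (?, ?, ?)",
               [("C", "", ""),           # root, id 1
                ("B", h1, "1"),          # id 2, reached via edge C-B
                ("A", h1 + h2, "1,2")])  # id 3, reached via edges C-B, B-A

# query {B A}: hash the query's edge, then one indexed look-up on the
# (path-hash suffix, last segment) pair
q_hash = edge_digest("B", "A")
rows = db.execute(
    "SELECT id, path FROM node WHERE path_hash LIKE '%' || ? AND symbol = ?",
    (q_hash, "A")).fetchall()
print(rows)  # -> [(3, '1,2')]
```

Each returned row still carries its ancestor-id path, which is then checked against the query to discard hash collisions, as described next.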
The list returned by the hash look-up might contain invalid nodes that do not produce a continuation for the query. Thus, we need to verify every returned node by checking whether its list of ancestor-node identifiers forms a valid traversal path for the given query. In this example, we retrieve only node A, with the unique identifier 6, for the query. During the validation process, we also obtain the unique identifier of the root of the sub-tree (identifier 5, for node D) in which the matched query was found. The final step is to look up the continuations for node A whose sub-tree root is the identifier we retrieved earlier. The continuation list for node A with unique identifier 6 and sub-tree root 5 consists of only one element: segment number 7, indicating {B C} in the original sequence {A B C D A D B C}. Therefore, the sub-sequence {B C} is retrieved as the continuation for the query {A D}. The search algorithm is presented in Table IV.

TABLE IV. ALGORITHMS FOR MODEL BUILDING: SEARCH

Algorithm SEARCH (performs the look-up for query Q in the model)
Input: A query sequence Q;
Output: A list of continuations;
Method:
1. Continuations = {∅}
2. Prospects = list of nodes with CompoundHashDigest same as the one produced from sequence Q
3. for each node N in Prospects
4.   if N.path is a list of nodes representing items in Q then
5.     add continuations of N and sub-tree root N.path0 to Continuations
6.   end if
7. end for
8. return Continuations

2) Limitations

The model has a few limitations in search. The first is related to query size. Because of the sliding-window processing, the maximum query size for which the model is guaranteed to find all matches is half the window size. For instance, in our example with a window size of 4, only queries of length 2 will return the full set of continuations, while queries of size 3 will return some but not the entire set of results. In addition, no query of length equal to or greater than the window size will return any results. Another limitation is the possibility of duplicate results due to the overlapped input windows, but such duplicates can easily be removed during the validation phase of query processing.

III. EXPERIMENTS AND RESULTS

A. Performance Evaluation Using Text Inputs

To evaluate the performance of the model, we compared it with a file-scan approach. Our model was implemented in Java with a local MySQL relational database. In file-scan search, the contents of the file are read from disk and scanned for matching patterns. To focus on the performance of the algorithms, all database caching was turned off. Similarly, we disallowed caching for file-scan between queries. We used textual data from Wikipedia to compare the performance of the proposed model and file-scan. Two test files of different sizes were used: the smaller text, consisting of 13.8 thousand segments, is a single article, while the larger file, of 47.8 thousand segments, is a combination of articles on different topics.

We first examined model performance by varying the size of the search query. We generated a set of 100 queries of the same length, guaranteed to have at least one match in the input sequence. We then ran the search for these queries on the model and on file-scan while recording the time for each look-up. We repeated the process 10 times and computed the average time it took the model and file-scan to process 100 queries on the input data. As shown in Figure 4, the proposed model outperformed the file-scan approach for longer queries, but file-scan was faster for small queries of length two. For the proposed model, the significant difference in search time between queries of size two and three arises because a much longer list of matches is retrieved for queries of size two. Since the model's initial data look-up is carried out on the hash value calculated from the query, a longer list of potential results means that the model must validate more possible matches for hash collisions. Each hash validation is a separate database operation which, given a large number of such operations, considerably slows down the model. In addition, the retrieval of each continuation list results in an additional look-up in the relational database, which negatively impacts performance. For longer queries, where the continuation list is smaller, the model outperformed file-scan, with the difference being more substantial for the larger input size.

Figure 4. Average search time for varying sizes of query strings.

We also studied the effect of the result set size, i.e., the number of matches, on the performance of the model. We produced sets of queries of fixed length that each yield the same number of matches. The queries were used to test the model and file-scan, with running times recorded and averaged over 10 repetitions for each result set size. Figure 5 shows the average search time for the proposed model and file-scan using queries with varying numbers of matches. It can be observed that the model performed better than file-scan for queries with a small number of matches. As the result set grows, the model gradually becomes slower than the file-scan approach. However, as shown in the previous experiment, the model outperformed file-scan more significantly for the larger input size.

Figure 5. Average search time versus number of results, with query size of two.

B. Retrieving Continuations for Symbolic Music Data

In this section, we test the model using symbolic music data. In symbolic music, specifically the MIDI format, a composition consists of one or more tracks, each of which often represents an instrument. A track consists of a sequence of note events, and each note event is described as a tuple of attributes including pitch, onset time, duration, and velocity (loudness). Figure 6 presents a melodic sequence with note events represented as MIDI pitch number, onset time (in beats), duration (in beats), and velocity. For simplicity, we focus only on the MIDI pitch number in this study.

Figure 6. A melodic sequence consists of note events represented as tuples of pitch, onset time, duration, and velocity.

We collected MIDI files of compositions by Bach and Mozart. The MIDI files, with 120 tracks, include symphonies, concertos, sonatas, preludes, fugues and other forms, resulting in 138,010 note events. For the experiments, we produced a set of 1,000 randomly selected, fixed-length queries from the tracks of each composer's work. For each query, we retrieved continuations from both the Bach and Mozart data sets and examined the number of compositions identified as matches for continuations for each composer. Figure 7 shows the average number of matches from each composer's work for queries of varying length, measured in number of notes. Figure 7 (a) shows the number of matches retrieved using queries generated from Bach's work, while Figure 7 (b) presents the queries generated from Mozart's. The italicized numeric values indicate the number of matches from Mozart's work, while the non-italic values show the matches from Bach's. As shown in Figure 7, the decrease in the number of matches as the query length increases indicates that longer note sequences are less likely to appear in multiple compositions of the two composers. However, it can be observed that queries from Mozart's work tend to retrieve more continuations from both composers than queries from Bach's.

Figure 7. Number of compositions matched for continuations for queries generated from different composers' work.

C. Retrieving Continuations for Audio

In this section, we focus on audio musical sequences.
We collected 1,427 15-second excerpts of audio recordings of Bach (555 excerpts) and Mozart (872 excerpts) compositions. The compositions used in the experiment again cover various musical forms, with instruments such as recorder, violin, flute, piano, oboe, guitar, and choir. Unlike textual data and MIDI, which can easily be parsed as sequences of words or pitch numbers, audio clips require a reduction function to convert them into sequences of symbols capable of describing the musical content of the audio. We used the FACEG algorithm [8] as the reduction function. The FACEG algorithm tracks the music tonal center, or key, in real time. As shown in Figure 8, the input audio recording is first pre-processed to reduce noise and eliminate silences, and then analyzed frame by frame using the fast Fourier transform to retrieve frequency information. The frequency spectrum provides pitch information about local note events, and this information is mapped to the 3D Spiral Array model [9] to label the audio frame with the closest key among the 24 predefined key points.

Figure 8. The system architecture of FACEG [8].

We modified the FACEG algorithm to create two reduction functions for this study: the key reduction function labels each audio segment using the closest key generated by FACEG, while the pitch-chord-key reduction function labels the segment using the closest pitch, chord, or key. The pitch-chord-key reduction function provides a finer description of the musical content, as it divides the space of the Spiral Array into 60 partitions, while the key reduction function uses only 24 partitions. Each audio excerpt (with a sampling rate of 44.1 kHz) was divided into non-overlapping segments of roughly 370 milliseconds, and each segment was then summarized as the closest pitch, chord, or key using the reduction function.
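As a rough illustration of this segmentation-plus-reduction step, the sketch below splits a recording into non-overlapping ~370 ms windows and maps each window to one of 24 key labels. The reduction shown is a trivial stand-in (its statistic and all names are invented for illustration), not the actual FACEG key-tracking algorithm.

```python
SAMPLE_RATE = 44_100
SEG_LEN = int(0.370 * SAMPLE_RATE)  # ~370 ms per segment, as in the paper

# 24 key labels (12 tonics x major/minor), standing in for the 24 key points
KEYS = [f"{tonic}{mode}" for mode in ("maj", "min")
        for tonic in "C C# D D# E F F# G G# A A# B".split()]

def segments(samples):
    """Yield successive non-overlapping segments of the recording."""
    for start in range(0, len(samples) - SEG_LEN + 1, SEG_LEN):
        yield samples[start:start + SEG_LEN]

def toy_reduction(segment):
    # Stand-in for FACEG: pick a label from a cheap statistic of the segment.
    # A real system would use FFT frames mapped onto the Spiral Array.
    return KEYS[int(sum(abs(s) for s in segment)) % len(KEYS)]

def reduce_excerpt(samples, reduction=toy_reduction):
    """Turn raw samples into the symbol sequence that gets indexed."""
    return [reduction(seg) for seg in segments(samples)]

# a 15-second excerpt yields 40 symbols (15 s / 0.370 s per segment)
fake_audio = [0.0] * (15 * SAMPLE_RATE)
symbols = reduce_excerpt(fake_audio)
print(len(symbols))  # -> 40
```

The resulting symbol sequence is what the prefix-tree model indexes; swapping in the pitch-chord-key variant would simply enlarge the label set from 24 to 60 symbols.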
After applying the reduction function, we created the tree structures and indexes for the 1,427 audio excerpts using the approach described in Section II. As in the experiment using MIDI data, we produced a set of 1,000 randomly selected, fixed-length queries from the excerpts of each composer. Figure 9 shows the average number of matches from each composer's work for queries of varying length using the key reduction function. It can be observed that the model identified only a few compositions from the data set for longer queries. Shorter queries, of length less than 1.5 seconds (4 segments), have many matches in both composers' sets, but shorter queries generated from Mozart's work retrieved more matches in Mozart's compositions. Figure 10 shows the results using the pitch-chord-key reduction function. Compared with Figure 9, the number of matches is reduced significantly when the pitch-chord-key reduction function is used, especially for shorter queries. The model identified a single composition for queries longer than 1.5 seconds. This implies that the pitch-chord-key reduction function can be used to uniquely identify compositions in a mixed data set of Bach's and Mozart's work when the recordings are summarized in sufficiently long segments.

Figure 9. Number of compositions matched for continuations for queries generated from different composers' work using the key reduction function.

Figure 10. Number of compositions matched for continuations for queries generated from different composers' work using the pitch-chord-key reduction function.

IV. CONCLUSIONS AND FUTURE WORK

We proposed and tested a model that can be used to search for and retrieve possible continuations for various types of data. Unlike the original Continuator model, our version resides in a database instead of main memory in order to mitigate memory size limitations. The proposed algorithm expands the original model by reusing sub-trees and compound hash digests for quick retrieval. Experiments using textual data showed that the model outperformed a sequential file-scan algorithm on a large input data set. We also observed that the model was able to identify the musical sequence of a particular composition when an appropriate reduction function was used. However, the performance measurements indicated that our relational-database-driven model was hindered by the need for frequent database interactions. Identifying a way to conduct database look-ups less frequently is essential to improving the model's speed. Ultimately, we plan to migrate the model to an HBase environment running under Hadoop. A small number of database interactions becomes even more important when dealing with distributed databases because of the added overhead of network latency.

REFERENCES

[1] M. I. Mandel and D. Ellis, "Song-level features and support vector machines for music classification," Proceedings of the 6th International Conference on Music Information Retrieval, 2005.
[2] M. Levy and M. Sandler, "Music information retrieval using social tags and audio," IEEE Transactions on Multimedia, vol. 11, no. 3, pp. 383-395, 2009.
[3] P. Lops, M. de Gemmis, and G. Semeraro, "Content-based recommender systems: State of the art and trends," Recommender Systems Handbook, Springer, New York, 2011.
[4] C.-H. Chuan and E. Chew, "Generating and evaluating musical accompaniments that emulate style," Computer Music Journal, vol. 35, no. 4, MIT Press, 2011.
[5] A. Eigenfeldt and P. Pasquier, "Realtime generative music system using autonomous melody, harmony and rhythm agents," Proceedings of the 12th Generative Art Conference, 2009.
[6] F. Pachet, "The Continuator: Musical interaction with style," Journal of New Music Research, vol. 32, no. 3, pp. 333-341, 2003.
[7] G. Assayag and S. Dubnov, "Using factor oracles for machine improvisation," Soft Computing: A Fusion of Foundations, Methodologies and Applications, vol. 8, no. 9, pp. 604-610, 2004.
[8] C.-H. Chuan and E. Chew, "Audio key finding: Considerations in system design and case studies on Chopin's 24 preludes," EURASIP Journal on Applied Signal Processing, doi:10.1155/2007/56561.
[9] E. Chew, Mathematical and Computational Modeling of Tonality: Theory and Applications, Springer, New York, 2014.