Conference Paper

LoopMaker: Automatic Creation of Music Loops from Pre-recorded Music

Authors: Shi and Mysore

Abstract

Music loops are seamlessly repeatable segments of music that can be used for music composition as well as backing tracks for media such as videos, webpages, and games. They are regularly used both by professional musicians and by novices with very little experience in audio editing and music composition. The process of creating music loops can be challenging and tedious, particularly for novices. We present LoopMaker, an interactive system that assists users in creating and exploring music loops from pre-recorded music. Our system can be used in a semi-automatic mode in which it refines a user's rough selection of a loop. It can also be used in a fully automatic mode in which it creates a number of loops from a given piece of music and interactively allows the user to explore these loops. Our user study suggests that our system makes the loop creation process significantly faster, easier, and more enjoyable than manual creation for both novices and experts. It also suggests that the quality of these loops is comparable to that of loops created manually by experts.
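The full text is not available on this page, but the core notion above, a segment whose end flows seamlessly back into its start, can be illustrated with a small, hypothetical Python sketch. It scores candidate loops by how similar the music just after the loop's end is to the music at its start; the function, its parameters, and the use of librosa are assumptions for illustration, not the authors' algorithm:

    import librosa
    import numpy as np

    def loop_seam_scores(path, n_beats=8):
        # Load audio, track beats, and average chroma over each beat interval.
        y, sr = librosa.load(path)
        _, beats = librosa.beat.beat_track(y=y, sr=sr)
        sync = librosa.util.sync(librosa.feature.chroma_cqt(y=y, sr=sr), beats)
        scores = []
        for start in range(sync.shape[1] - n_beats):
            end = start + n_beats
            a, b = sync[:, end], sync[:, start]
            # Cosine similarity between the beat just after the loop and the
            # loop's first beat: high values suggest an inconspicuous wrap-around.
            sim = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))
            scores.append((start, sim))
        return sorted(scores, key=lambda s: -s[1])  # most seamless candidates first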


... For loop extraction, Shi and Mysore [24] proposed an interface with automatic and semi-automatic modes for the producer to find the most suitable segment in a piece of music to excerpt and use as a loop, by cropping directly. These algorithms estimate similarity with handcrafted features: harmony, timbre, and energy [24,30]. However, the segments are excerpted without any attempt to isolate one part of a potentially multi-part piece of music. ...
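As a hedged illustration of the handcrafted feature families this excerpt names (harmony, timbre, and energy), one could compute per-beat descriptors as below; the exact configurations used in [24] and [30] are not given here, so the feature choices are assumptions:

    import librosa
    import numpy as np

    def handcrafted_beat_features(y, sr, beats):
        # Harmony: chroma; timbre: MFCCs; energy: RMS -- averaged per beat segment.
        chroma = librosa.util.sync(librosa.feature.chroma_cqt(y=y, sr=sr), beats)
        mfcc = librosa.util.sync(librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13), beats)
        rms = librosa.util.sync(librosa.feature.rms(y=y), beats)
        return np.vstack([chroma, mfcc, rms])  # one feature column per beat segment

Similarity between two segments can then be estimated by comparing their columns, e.g. with cosine similarity.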
... Data from 116 Taiwanese participants were collected, and all of them were included in the analysis. The participants self-reported the gender they identified with (36 female, 80 male) and their age (19 to 33). 50% said they listened to loop-based music more than 5 days a week, 54% had classical musical training, and 42% had experience composing loop-based music. ...
Preprint
Full-text available
Music producers who use loops may have access to thousands in loop libraries, but finding ones that are compatible is a time-consuming process; we hope to reduce this burden with automation. State-of-the-art systems for estimating compatibility, such as AutoMashUpper, are mostly rule-based and could be improved on with machine learning. To train a model, we need a large set of loops with ground truth compatibility values. No such dataset exists, so we extract loops from existing music to obtain positive examples of compatible loops, and propose and compare various strategies for choosing negative examples. For reproducibility, we curate data from the Free Music Archive. Using this data, we investigate two types of model architectures for estimating the compatibility of loops: one based on a Siamese network, and the other a pure convolutional neural network (CNN). We conducted a user study in which participants rated the quality of the combinations suggested by each model, and found the CNN to outperform the Siamese network. Both model-based approaches outperformed the rule-based one. We have open-sourced the code for building the models and the dataset.
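A minimal sketch of the Siamese idea described in this abstract, assuming PyTorch and mel-spectrogram inputs; the layer sizes and the distance-based score are illustrative, not the paper's architecture:

    import torch
    import torch.nn as nn

    class SiameseCompatibility(nn.Module):
        def __init__(self, emb_dim=128):
            super().__init__()
            # One shared convolutional encoder embeds each loop's mel-spectrogram.
            self.encoder = nn.Sequential(
                nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(32, emb_dim),
            )

        def forward(self, mel_a, mel_b):
            # Inputs: (batch, 1, n_mels, time) spectrograms of the two loops.
            ea, eb = self.encoder(mel_a), self.encoder(mel_b)
            # Higher score = more compatible; trained on positive/negative pairs.
            return -torch.norm(ea - eb, dim=1)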
... For loop extraction, previous works have attempted to detect autocorrelation peaks (relying on the assumption that it is sufficient to detect the starting point "A" of phrases) [5,6], but we apply the knowledge of overall loop structures, obtained from the audio domain, to the MIDI domain. ...
... Some works have imposed structural constraints on music generation models [15,16], or directly detect segments that repeat in a time series [17,18]. In the audio domain, there have been attempts to extract loops explicitly by capturing repeated phrases [5,6]. They extract harmonic features such as chroma vectors or mel-frequency cepstra and detect autocorrelation peaks to determine the starting point of loops. ...
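The autocorrelation-peak idea these excerpts refer to can be sketched as follows; this is a simplified stand-in for the methods of [5,6], with the peak threshold chosen arbitrarily:

    import numpy as np
    from scipy.signal import find_peaks

    def chroma_autocorrelation_peaks(chroma):
        # Correlate the chroma sequence with lagged copies of itself; a strong
        # peak at lag L suggests material that repeats every L frames.
        x = chroma - chroma.mean(axis=1, keepdims=True)
        n = x.shape[1]
        ac = np.array([(x[:, :n - lag] * x[:, lag:]).sum() for lag in range(1, n // 2)])
        ac /= np.abs(ac).max() + 1e-9
        peaks, _ = find_peaks(ac, height=0.5)
        return peaks + 1  # candidate loop lengths, in frames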
Preprint
Full-text available
Since most music has repetitive structures, from motifs to phrases, repeating musical ideas can be a basic operation for music composition. The basic blocks we focus on are conceptualized as loops, which are essential ingredients of music. Furthermore, meaningful note patterns can be formed in a finite space, so it is sufficient to represent them with combinations of discrete symbols, as done in other domains. In this work, we propose symbolic music loop generation via learning discrete representations. We first extract loops from MIDI datasets using a loop detector and then learn an autoregressive model trained on the discrete latent codes of the extracted loops. We show that our model outperforms well-known music generative models in terms of both fidelity and diversity, evaluating on random space. Our code and supplementary materials are available at https://github.com/sjhan91/Loop_VQVAE_Official.
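For reference, the discrete-representation step this abstract mentions can be reduced to a toy vector-quantisation layer like the one below. This is a sketch of the general VQ-VAE mechanism, not the authors' model; see their linked repository for the real implementation:

    import torch
    import torch.nn as nn

    class VectorQuantizer(nn.Module):
        def __init__(self, n_codes=512, dim=64):
            super().__init__()
            self.codebook = nn.Embedding(n_codes, dim)  # the discrete symbols

        def forward(self, z):  # z: (batch, time, dim) continuous latents
            # Squared distance from every latent to every codebook entry.
            d = (z.unsqueeze(2) - self.codebook.weight).pow(2).sum(-1)
            idx = d.argmin(dim=-1)   # nearest code per time step
            zq = self.codebook(idx)  # quantised latents
            # Straight-through estimator so gradients still reach the encoder;
            # `idx` is the discrete code sequence an autoregressive model can learn.
            return z + (zq - z).detach(), idx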
... We introduce the design of waveform+region navigation, a standard interaction paradigm used in professional music production tools [74, 75] to facilitate music instructional video navigation (D2). While similar designs have been leveraged in HCI literature for music-related interfaces [57,58], none of them were used to explicitly address the challenges of navigating music instructional videos. ...
Preprint
Full-text available
Learning musical instruments using online instructional videos has become increasingly prevalent. However, pre-recorded videos lack the instantaneous feedback and personal tailoring that human tutors provide. In addition, existing video navigation is not optimized for instrument learning, encumbering the learning experience. Guided by our formative interviews with guitar players and prior literature, we designed Soloist, a mixed-initiative learning framework that automatically generates customizable curriculums from off-the-shelf guitar video lessons. Soloist takes raw videos as input and leverages deep-learning-based audio processing to extract musical information. This back-end processing is used to provide an interactive visualization to support effective video navigation and real-time feedback on the user's performance, creating a guided learning experience. We demonstrate the capabilities and specific use-cases of Soloist within the domain of learning electric guitar solos using instructional YouTube videos. A remote user study, conducted to gather feedback from guitar players, shows encouraging results, as the users unanimously preferred learning with Soloist over unconverted instructional videos.
... Therefore, existing MIR algorithms need to be tested and (possibly) adapted to work successfully in this scenario. Furthermore, new MIR tasks are emerging with the study of music loops, including loop retrieval [12], loop detection [18], loop discovery [19] and extraction [27], loop recommendation [5], exploration of large loop databases [29], and automatic loop generation [26]. ...
Preprint
Full-text available
Music loops are essential ingredients in electronic music production, and there is a high demand for pre-recorded loops in a variety of styles. Several commercial and community databases have been created to meet this demand, but most are not suitable for research due to their strict licensing. We present the Freesound Loop Dataset (FSLD), a new large-scale dataset of music loops annotated by experts. The loops originate from Freesound, a community database of audio recordings released under Creative Commons licenses, so the audio in our dataset may be redistributed. The annotations include instrument, tempo, meter, key and genre tags. We describe the methodology used to assemble and annotate the data, and report on the distribution of tags in the data and inter-annotator agreement. We also present to the community an online loop annotator tool that we developed. To illustrate the usefulness of FSLD, we present short case studies on using it to estimate tempo and key, generate music tracks, and evaluate a loop separation algorithm. We anticipate that the community will find yet more uses for the data, in applications from automatic loop characterisation to algorithmic composition.
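One property that such tempo case studies can exploit is that a loop usually spans a whole number of bars, so its duration alone constrains the plausible tempi. The helper below is a hypothetical illustration of this reasoning, not FSLD code; the bar counts and BPM range are assumptions:

    import librosa

    def loop_tempo_candidates(path, beats_per_bar=4):
        # A loop lasting `duration` seconds and spanning `bars` whole bars
        # implies a tempo of 60 * beats_per_bar * bars / duration BPM.
        y, sr = librosa.load(path)
        duration = len(y) / sr
        candidates = []
        for bars in (1, 2, 4, 8, 16):
            bpm = 60.0 * beats_per_bar * bars / duration
            if 60 <= bpm <= 200:  # keep only musically plausible tempi
                candidates.append(round(bpm, 1))
        return candidates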
Article
Full-text available
Dynamic time warping (DTW) is a well-known technique to find an optimal alignment between two given (time-dependent) sequences under certain restrictions. Intuitively, the sequences are warped in a non-linear fashion to match each other. Originally, DTW has been used to compare different speech patterns in automatic speech recognition. In fields such as data mining and information retrieval, DTW has been successfully applied to automatically cope with time deformations and different speeds associated with time-dependent data. In this chapter, we introduce and discuss the main ideas of classical DTW (Section 4.1) and summarize several modifications concerning local as well as global parameters (Section 4.2). To speed up classical DTW, we describe in Section 4.3 a general multiscale DTW approach. In Section 4.4, we show how DTW can be employed in identifying all subsequences within a long data stream that are similar to a given query sequence (Section 4.4). A discussion of related alignment techniques and references to the literature can be found in Section 4.5.
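The classical recurrence summarised in this chapter fits in a few lines; a textbook implementation (without the speed-ups of the multiscale approach) looks like this:

    import numpy as np

    def dtw_cost(x, y, dist=lambda a, b: abs(a - b)):
        # Accumulated-cost matrix with the standard step pattern:
        # D[i, j] = dist(x[i], y[j]) + min(D[i-1, j], D[i, j-1], D[i-1, j-1]).
        n, m = len(x), len(y)
        D = np.full((n + 1, m + 1), np.inf)
        D[0, 0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                D[i, j] = dist(x[i - 1], y[j - 1]) + min(
                    D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
        return D[n, m]  # optimal alignment cost

    # dtw_cost([1, 2, 3, 4], [1, 1, 2, 3, 4]) == 0.0: the warping absorbs the
    # repeated first element, illustrating invariance to local tempo changes.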
Conference Paper
Full-text available
Current digital painting tools are primarily targeted at professionals and are often overwhelmingly complex for novices. At the same time, simpler tools may not engage the user creatively, or are limited to plain styles that lack visual sophistication. There are many people who are not art professionals, yet would like to partake in digital creative expression. Challenges and rewards for novices differ greatly from those for professionals. In this paper, we leverage existing work on Creativity and Creativity Support Tools (CST) to formulate design goals specifically for digital art creation tools for novices. We implemented these goals within a digital painting system called Painting with Bob. We evaluate the efficacy of the design and our prototype with a user study, and we find that users are highly satisfied with the user experience, as well as with the paintings created with our system.
Conference Paper
Full-text available
Traditional audio editing tools do not facilitate the task of separating a single mixture recording (e.g. pop song) into its respective sources (e.g. drums, vocal, etc.). Such ability, however, would be very useful for a wide variety of audio applications such as music remixing, audio denoising, and audio-based forensics. To address this issue, we present ISSE – an interactive source separation editor. ISSE is a new open-source, freely available, and cross-platform audio editing tool that enables a user to perform source separation by painting on time-frequency visualizations of sound, resulting in an interactive machine learning system. The system brings to life our previously proposed interaction paradigm and separation algorithm that learns from user feedback to perform separation. For evaluation, we conducted user studies and compared results between inexperienced and expert users. For a variety of real-world tasks, we found that inexperienced users can achieve good separation quality with minimal instruction and expert users can achieve state-of-the-art separation quality.
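At its simplest, the painting interaction amounts to weighting time-frequency bins. The sketch below applies a user-supplied 0-to-1 mask to a spectrogram and resynthesises the result; ISSE additionally learns a separation model from the strokes, which is omitted here, and the librosa/soundfile plumbing is an assumption:

    import librosa
    import soundfile as sf

    def apply_painted_mask(path, mask, out_path):
        # `mask` is a 0..1 array on the same time-frequency grid as the STFT,
        # e.g. derived from the user's paint strokes.
        y, sr = librosa.load(path)
        S = librosa.stft(y)
        assert mask.shape == S.shape, "mask must match the STFT grid"
        y_sep = librosa.istft(S * mask, length=len(y))  # keep the painted source
        sf.write(out_path, y_sep, sr)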
Article
Full-text available
In this paper we present a system, AutoMashUpper, for making multi-song music mashups. Central to our system is a measure of "mashability" calculated between phrase sections of an input song and songs in a music collection. We define mashability in terms of harmonic and rhythmic similarity and a measure of spectral balance. The principal novelty in our approach centres on determining how elements of songs can be made to fit together using key transposition and tempo modification, rather than based on their unaltered properties. In this way, the properties of two songs used to model their mashability can be altered with respect to transformations performed to maximize their perceptual compatibility. AutoMashUpper has a user interface that allows users to control the parameterization of the mashability estimation. It allows users to define ranges for key shifts and tempo, as well as to add, change, or remove elements from the created mashups. We evaluate AutoMashUpper by its ability to reliably segment music signals into phrase sections, and also via a listening test examining the relationship between estimated mashability and user enjoyment.
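The transformation-aware scoring is the key idea: rather than comparing two excerpts as-is, try all twelve key transpositions and keep the best match. Below is a sketch of just the harmonic term, with rhythm and spectral balance omitted; the function is illustrative, not AutoMashUpper's code:

    import numpy as np

    def harmonic_mashability(chroma_a, chroma_b):
        # chroma_a, chroma_b: (12, n_frames) chromagrams of two phrase sections.
        a = chroma_a.flatten()
        a /= np.linalg.norm(a) + 1e-9
        best_score, best_shift = -1.0, 0
        for shift in range(12):  # rolling chroma by one bin = one semitone shift
            b = np.roll(chroma_b, shift, axis=0).flatten()
            score = float(a @ (b / (np.linalg.norm(b) + 1e-9)))
            if score > best_score:
                best_score, best_shift = score, shift
        return best_score, best_shift  # similarity and the transposition achieving it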
Conference Paper
Full-text available
We present the beginnings of a Cognitive Theory of Creativity Support aimed specifically at understanding novices and their needs. Our theory identifies unique difficulties novices face and reasons that may keep them from engaging in creative endeavors, such as fear of failure, time commitment, and lack of skill. To test our theory, we use it to analyze existing creativity support tools from multiple domains. We also describe the design and initial implementation of a creativity support tool based on our theory. The creativity support tool, called StorySketch, is designed to empower storytellers without graphical skills to engage in visual storytelling.
Conference Paper
Full-text available
Audio stories are an engaging form of communication that combines speech and music into compelling narratives. Existing audio editing tools force story producers to manipulate speech and music tracks via tedious, low-level waveform editing. In contrast, we present a set of tools that analyze the audio content of the speech and music and thereby allow producers to work at a much higher level. Our tools address several challenges in creating audio stories, including (1) navigating and editing speech, (2) selecting appropriate music for the score, and (3) editing the music to complement the speech. Key features include a transcript-based speech editing tool that automatically propagates edits in the transcript text to the corresponding speech track; a music browser that supports searching based on emotion, tempo, key, or timbral similarity to other songs; and music retargeting tools that make it easy to combine sections of music with the speech. We have used our tools to create audio stories from a variety of raw speech sources, including scripted narratives, interviews, and political speeches. Informal feedback from first-time users suggests that our tools are easy to learn and greatly facilitate the process of editing raw footage into a final story.
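The transcript-based editing idea reduces to a simple mapping: each word carries start and end times (e.g. from a forced aligner), so deleting words from the transcript deletes the matching sample ranges. A hypothetical sketch, not the authors' implementation:

    import numpy as np

    def delete_words(audio, sr, words, to_remove):
        # `words`: list of (word, start_sec, end_sec) tuples from an aligner.
        keep = np.ones(len(audio), dtype=bool)
        for word, start, end in words:
            if word in to_remove:
                keep[int(start * sr):int(end * sr)] = False  # cut this word's audio
        return audio[keep]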
Conference Paper
Full-text available
This paper describes the design policy and specifications of the RWC Music Database, a music database (DB) that is available to researchers for common use and research purposes. Various commonly available DBs have been built in other research fields and have made a significant contribution to the research in those fields. The field of musical information processing, however, has lacked a commonly available music DB. We therefore built the RWC Music Database which contains four original DBs: the Popular Music Database (100 pieces), Royalty-Free Music Database (15 pieces), Classical Music Database (50 pieces), and Jazz Music Database (50 pieces). Each consists of originally-recorded music compact discs, standard MIDI files, and text files of lyrics. These DBs are now available in Japan at a cost equal to only duplication, shipping, and handling charges (virtually for free), and we plan to make them available outside Japan. We hope that our DB will encourage further advances in musical information processing research.
Article
Full-text available
We present a new technique for audio signal comparison based on tonal subsequence alignment and its application to detecting cover versions (i.e., different performances of the same underlying musical piece). Cover song identification is a task whose popularity has increased in the music information retrieval (MIR) community in recent years, as it provides a direct and objective way to evaluate music similarity algorithms. This paper first presents a series of experiments carried out with two state-of-the-art methods for cover song identification. We have studied several of their components (such as chroma resolution and similarity, transposition, beat tracking, and dynamic time warping constraints) in order to discover which characteristics are desirable for a competitive cover song identifier. After analyzing many cross-validated results, the importance of these characteristics is discussed, and the best-performing ones are applied to the newly proposed method. Multiple evaluations of the proposed method confirm a large increase in identification accuracy over alternative state-of-the-art approaches.
Article
Full-text available
We describe a method that aligns polyphonic audio recordings of music to symbolic score information in standard MIDI files without the difficult process of polyphonic transcription. By using this method, we can search through a MIDI database to find the MIDI file corresponding to a polyphonic audio recording.
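The transcription-free trick is to map the MIDI side into the same feature space as the audio, e.g. pitch-class activations over time, and then align the two sequences (for instance with DTW, as in the chapter above). A sketch assuming the pretty_midi library; the frame rate and binning are illustrative:

    import numpy as np
    import pretty_midi

    def midi_chroma(path, fps=10):
        # Bin active notes by pitch class over time; the result is directly
        # comparable to an audio chromagram, with no transcription needed.
        pm = pretty_midi.PrettyMIDI(path)
        chroma = np.zeros((12, int(pm.get_end_time() * fps) + 1))
        for inst in pm.instruments:
            if inst.is_drum:
                continue  # drums carry no pitch-class information
            for note in inst.notes:
                chroma[note.pitch % 12,
                       int(note.start * fps):int(note.end * fps) + 1] += 1
        return chroma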
Article
Full-text available
Subjective similarity between musical pieces and artists is an elusive concept, but one that must be pursued in support of applications to provide automatic organization of large music collections. In this paper, we examine both acoustic and subjective approaches for calculating similarity between artists, comparing their performance on a common database of 400 popular artists. Specifically, we evaluate acoustic techniques based on Mel-frequency cepstral coefficients and an intermediate 'anchor space' of genre classification, and subjective techniques which use data from The All Music Guide, from a survey, from playlists and personal collections, and from web-text mining.
Conference Paper
In this paper, we present MusicMixer, a computer-aided DJ system that helps DJs, specifically with song mixing. MusicMixer continuously mixes and plays songs using an automatic music mixing method that employs audio similarity calculations. By calculating similarities between song sections that can be naturally mixed, MusicMixer enables seamless song transitions. Though song mixing is the most fundamental and important factor in DJ performance, it is difficult for untrained people to seamlessly connect songs. MusicMixer realizes automatic song mixing using an audio signal processing approach; therefore, users can perform DJ mixing simply by selecting a song from a list of songs suggested by the system, enabling effective DJ song mixing and lowering entry barriers for the inexperienced. We also propose personalization for song suggestions using a preference memorization function of MusicMixer.
Conference Paper
We present PortraitSketch, an interactive drawing system that helps novices create pleasing, recognizable face sketches without requiring prior artistic training. As the user traces over a source portrait photograph, PortraitSketch automatically adjusts the geometry and stroke parameters (thickness, opacity, etc.) to improve the aesthetic quality of the sketch. We present algorithms for adjusting both outlines and shading strokes based on important features of the underlying source image. In contrast to automatic stylization systems, PortraitSketch is designed to encourage a sense of ownership and accomplishment in the user. To this end, all adjustments are performed in real-time, and the user ends up directly drawing all strokes on the canvas. The findings from our user study suggest that users prefer drawing with some automatic assistance, thereby producing better drawings, and that assistance does not decrease the perceived level of involvement in the creative process.
Article
Beat tracking – i.e. deriving from a music audio signal a sequence of beat instants that might correspond to when a human listener would tap his foot – involves satisfying two constraints. On the one hand, the selected instants should generally correspond to moments in the audio where a beat is indicated, for instance by the onset of a note played by one of the instruments. On the other hand, the set of beats should reflect a locally-constant inter-beat-interval, since it is this regular spacing between beat times that defines musical rhythm. These dual constraints map neatly onto the two constraints optimized in dynamic programming, the local match, and the transition cost. We describe a beat tracking system which first estimates a global tempo, uses this tempo to construct a transition cost function, then uses dynamic programming to find the best-scoring set of beat times that reflect the tempo as well as corresponding to moments of high ‘onset strength’ in a function derived from the audio. This very simple and computationally efficient procedure is shown to perform well on the MIREX-06 beat tracking training data, achieving an average beat accuracy of just under 60% on the development data. We also examine the impact of the assumption of a fixed target tempo, and show that the system is typically able to track tempo changes in a range of ±10% of the target tempo.
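A compact sketch of the dynamic program described here; the onset envelope and frame-level beat period (e.g. from librosa.onset.onset_strength and the estimated global tempo) are assumed inputs, and the search window and penalty constants are illustrative:

    import numpy as np

    def dp_beat_track(onset_env, period):
        # score[t] = onset strength at t plus the best predecessor's score,
        # penalised for inter-beat intervals deviating from `period` (frames).
        n = len(onset_env)
        score = onset_env.astype(float).copy()
        prev = np.full(n, -1)
        for t in range(period // 2, n):
            window = np.arange(max(0, t - 2 * period), t - period // 2)
            if window.size == 0:
                continue
            penalty = -(np.log((t - window) / period) ** 2)  # 0 at the ideal interval
            k = int(np.argmax(score[window] + penalty))
            score[t] += score[window][k] + penalty[k]
            prev[t] = window[k]
        beats = [int(np.argmax(score))]          # best-scoring final beat...
        while prev[beats[-1]] >= 0:
            beats.append(int(prev[beats[-1]]))   # ...then backtrack through prev
        return beats[::-1]                       # beat frame indices in order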
Article
This article is in five sections, each of which deals with a problem basic to the theory of musical meter. The first sets up a conceptual framework in which meter, group structure, and accent are related to one another. The second explores the practical and aesthetic functions of metric perception. In the next section, grouping and accent, as the determinants of meter, are thoroughly investigated. A penultimate section discusses several operations of metric distortion, and the final section considers the bases for, and significance of, deeper levels of meter. In a general sense, the article attempts to make sense out of the tangle of conflicting views that constitutes the recent literature on this subject. Each of its sections departs from a survey of prevalent views and proceeds to mediate among these by integrating them in a general framework in which they are more or less easily accommodated without mutual contradiction.
Article
Music loops, seamlessly repeating segments of audio, are an important ingredient for remixes and mash-ups. By recombining loops taken from complete tracks or from loop libraries not only professional DJs but even musical laypersons can enjoy the experience of music creation. One key aspect is to identify what sounds good together. To facilitate this selection process, we present a system for exploring collections of music loops through a graphical user interface that allows playful interaction with the content. The system first extracts loop segments from a selection of music tracks. The loops are then visualized as graphical objects in a GUI. Depending on their needs, the users can switch between various criteria for the visualization of the objects which is based on a set of manually or algorithmically provided features. Interaction with the objects triggers playback and simple effects on the loops.
Conference Paper
Music loops are seamlessly repeating segments of audio. The automatic extraction of music loops from digital audio is useful for finding reusable material for new compositions. In this paper, an effective approach to music loop discovery and extraction is proposed. In our approach, we use a pitch class distribution feature to represent the tonal characteristics of the signal. Based on tonal self-similarity, we propose a pattern recognition technique to discover loop segments. For extracting single loop instances from the audio signal, we propose a timbre-based similarity criterion at the beat level to allocate optimal cutting points. Our method was evaluated in a listening test with 200 extracted loop segments. The qualitative evaluation shows that for 85% of the test data, our method succeeded in extracting music loops of agreeable quality directly from audio signals.
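The tonal self-similarity this approach builds on can be sketched in a few lines; this is a generic illustration, not the paper's exact pattern-recognition step:

    import numpy as np

    def tonal_self_similarity(beat_chroma):
        # beat_chroma: (12, n_beats) pitch-class distributions, one per beat.
        c = beat_chroma / (np.linalg.norm(beat_chroma, axis=0, keepdims=True) + 1e-9)
        # (n_beats, n_beats) cosine-similarity matrix; repeated loops show up
        # as bright stripes parallel to the main diagonal.
        return c.T @ c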