The requirements of audio processing for motion pictures present several special problems that make digital processing of audio both very desirable and relatively difficult. The difficulties can be summarized as follows: (1) Large amounts of numerical computation are required, on the order of 2 million integer multiply-adds per second per channel of audio, for some number of channels. (2) The exact processing involved changes in real time but must not interrupt the flow of audio data. (3) A large amount of input/output capacity is necessary, simultaneous with numerical calculation and changes to the running program, on the order of 1.6 million bits per second per channel of audio. To this end, the digital audio group at Lucasfilm is building a number of audio signal processors whose architecture reflects the special problems of audio.
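A quick back-of-the-envelope check shows how per-channel figures of this order arise; the word size and sampling rate below are illustrative assumptions, not values stated in the abstract:

```python
# Rough check of the per-channel figures quoted above.
# Assumed parameters (illustrative; the abstract does not state them):
BITS_PER_SAMPLE = 16      # assumed sample word size
SAMPLE_RATE = 50_000      # assumed samples/sec per channel
STREAMS = 2               # one input stream plus one output stream

io_bits_per_sec = BITS_PER_SAMPLE * SAMPLE_RATE * STREAMS
print(io_bits_per_sec)    # -> 1600000 bits/sec, i.e. the 1.6 Mbit/s figure

# At an assumed ~40 multiply-adds per output sample, the arithmetic load is:
ops_per_sec = 40 * SAMPLE_RATE
print(ops_per_sec)        # -> 2000000 multiply-adds/sec, i.e. the 2 M figure
```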
A central problem in the production of music by computer is how to generate sounds with similar but different timbres; that is, how to synthesize a "family" of instruments that are perceived as distinguishable but similar in timbre. This paper describes a method for synthesizing a family of string-like instruments that proceeds as follows: First, a few actual violin notes are analyzed using linear prediction. Second, a bilinear transformation is applied to the linear prediction model on synthesis, using a recently described algorithm. This yields a computationally efficient way to generate a virtual string orchestra, with violin-, viola-, violoncello-, and bass-like timbres. A realization of a 4.7-minute piece composed by P. Lansky will be played.
Subjective similarity between musical pieces and artists is an elusive concept, but one that must be pursued in support of applications to provide automatic organization of large music collections. In this paper, we examine both acoustic and subjective approaches for calculating similarity between artists, comparing their performance on a common database of 400 popular artists. Specifically, we evaluate acoustic techniques based on Mel-frequency cepstral coefficients and an intermediate `anchor space' of genre classification, and subjective techniques which use data from The All Music Guide, from a survey, from playlists and personal collections, and from web-text mining.
Advances in processor technology will make it possible to use general-purpose personal computers as real-time signal processors. This will enable highly-integrated "all-software" systems for music processing. To this end, the performance of a present generation superscalar processor running synthesis software is measured and analyzed. A real-time reimplementation of Fugue, now called Nyquist, takes advantage of the superscalar synthesis approach, integrating symbolic and signal processing. Performance of Nyquist is compared to Csound.
ly, there is a mathematical function that maps beat numbers to time and an inverse function that maps time to beats. (Some special mathematical care must be taken to allow for music that momentarily stops, instantaneously skips ahead, or is allowed to jump backwards, but we will ignore these details.) One practical representation of the beat-to-time function is the set of times of individual beats. For example, a performer can tap beats in real time and a computer can record the time of each tap. This produces a finitely sampled representation of the continuous mapping from beats to time, which can be interpolated to obtain intermediate values. MIDI Clocks, which are transmitted at 24 clocks per quarter note, are an example of this approach. The Mockingbird score editor [Maxwell 84] interface also supported this idea. The composer would play a score, producing a piano-roll notation. Then, downbeats were indicated by clicking with a mouse at the proper locations in the piano roll score....
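The tapped-beat representation lends itself to a very small sketch. The following Python fragment (the tap times are a hypothetical example) interpolates a recorded list of beat times in both directions, giving the beat-to-time map and its inverse:

```python
# Sketch of the finitely sampled beat-to-time map described above.
# tap_times[i] is the recorded clock time of beat i (hypothetical values).
from bisect import bisect_right

tap_times = [0.0, 0.52, 1.01, 1.55, 2.03]  # seconds, one entry per tapped beat

def beat_to_time(beat):
    """Linearly interpolate between recorded taps (monotone time assumed)."""
    i = min(int(beat), len(tap_times) - 2)
    frac = beat - i
    return tap_times[i] + frac * (tap_times[i + 1] - tap_times[i])

def time_to_beat(t):
    """Inverse map: locate the surrounding taps and interpolate."""
    i = max(0, min(bisect_right(tap_times, t) - 1, len(tap_times) - 2))
    span = tap_times[i + 1] - tap_times[i]
    return i + (t - tap_times[i]) / span

print(beat_to_time(2.5))   # midpoint of beats 2 and 3, about 1.28 sec
print(time_to_beat(1.28))  # about 2.5 beats
```

A real implementation must also handle the degenerate cases set aside in the text (stopped or backward-jumping music), where the mapping is no longer strictly monotone.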
The availability of large music collections calls for ways to efficiently access and explore them. We present a new approach which combines descriptors derived from audio analysis with meta-information to create different views of a collection. Such views can have a focus on timbre, rhythm, artist, style or other aspects of music. For each view the pieces of music are organized on a map in such a way that similar pieces are located close to each other. The maps are visualized using an Islands of Music metaphor where islands represent groups of similar pieces. The maps are linked to each other using a new technique to align self-organizing maps. The user is able to browse the collection and explore different aspects by gradually changing focus from one view to another. We demonstrate our approach on a small collection using a meta-information-based view and two views generated from audio analysis, namely, beat periodicity as an aspect of rhythm and spectral information as an aspect of timbre.
Canon is both a notation for musical scores and a programming language. Canon offers a combination of declarative style and a powerful abstraction capability which allows a very high-level notation for sequences of musical events and structures. Transformations are operators that can adjust common parameters such as loudness or duration. Transformations can be nested and time-varying, and their use avoids the problem of having large numbers of explicit parameters. Behavioral abstraction, the concept of making behavior an arbitrary function of the environment, is supported by Canon and extends the usefulness of transformations. A non-real-time implementation of Canon is based on Lisp and produces scores that control MIDI synthesizers. Introduction Canon is a computer language designed to help composers create note-level control information for hardware synthesizers or synthesis software. Canon was motivated by my need for a simple yet powerful language for teaching second-semester stud...
This article introduces a few of the many ways that sound data can be stored in computer files, and describes several of the file formats that are in common use for this purpose. This text is an expanded and edited version of a "frequently asked questions" (FAQ) document that is updated regularly by one of the authors (van Rossum). Extensive references are given here to printed and network-accessible machine-readable documentation and source code resources. The FAQ document is regularly posted to the USENET electronic news groups alt.binaries.sounds and comp.dsp for maximal coverage of people interested in digital audio, and to comp.answers, for easy reference. It is available by anonymous Internet file transfer.
A Child's Garden of Sound File Formats. DRAFT: To appear in Computer Music Journal 19:1 (Spring 1995)
This paper describes how MAX has been extended on the NeXT to do signal as well as control computations. Since MAX as a control environment has already been described elsewhere, here we will offer only an outline of its control aspect as background for the description of its signal processing extensions.
[Fig. 2 residue: MODE package labels, including SampledSounds, sound UI tools, Mixes, Mix editors and browsers, MIDIVoices, SampleVoices, and Kernel]
Pope: IDP, CMJ 16(3), Fall 1992
MODE: The Musical Object Development Environment
The MODE software system consists of Smalltalk-80 classes that address five areas: the representation of musical parameters, sampled sounds, events, and event lists; the description of middle-level musical structures; real-time MIDI, sound I/O, and DSP scheduling; a user interface framework and components for building signal, event, and structure processing applications; and several built-in end-user applications. We will address each of these areas in the sections below. Figure 2 is a schematic presentation of the relationships between the class categories and hierarchies that make up the MODE environment. The items in Fig. 2 are the MODE packages, each of which consists of a collection or hierarchy of Smalltalk-80 classes. The items that are displayed in courier font are ...
In this paper we describe our efforts towards the development of live performance computer-based musical instrumentation. Our design criteria include initial ease of use coupled with a long term potential for virtuosity, minimal and low variance latency, and clear and simple strategies for programming the relationship between gesture and musical result. We present custom controllers and unique adaptations of standard gestural interfaces, a programmable connectivity processor, a communications protocol called Open Sound Control (OSC), and a variety of metaphors for musical control. We further describe applications of our technology to a variety of real musical performances and directions for future research.
nd to its importance to the musicmaking process. It exists if only because musicians think it exists; that's enough if it changes the way they play music. To make live music with the hypothetical Computer Music Workstation of the first paragraph we must make it work in real time: connect one or more gestural input devices to it, compute each sample only slightly in advance of the time it is needed for conversion by the DACs, and make the sample computation dependent in some musically useful way on what has come in recently from the live input. The closer the rapport between the live input and what comes out of the speaker, the longer the audience will stay awake. This rapport is the crux of live computer music. A part of the quest for a better Computer Music Workstation is to make it easy to establish real-time musical control over the computer. I am against trying to set the computer up as a musical performer. Large software systems which try to instill "musical intelligence" i
time and frequency domains. The user provides a script in a language that is a superset of C. The EIN system then compiles it and provides a wrapper that calls the script for each time sample from t = 0 up to the specified number of samples, nsamps; computes the next output sample y (for the mono case), or the variables left and right (for the stereo case); and writes the output samples to an output sound file, formatted for the sampling rate of sr samples/sec. The int variables t and nsamps, and the float variables y, left, right, and sr, are reserved by EIN and should not be used for other purposes. The mono/stereo option and the values of sr and nsamps are selected with radio buttons on the interface. The remaining reserved names
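A wrapper of the kind described can be imitated in a few lines. This Python sketch (EIN itself compiles a C-superset script; the sine-wave body here is a stand-in for user code) shows the reserved names t, nsamps, y, and sr in their described roles for the mono case:

```python
# Minimal imitation of the EIN wrapper loop described above (mono case).
# In the real system, sr and nsamps would be chosen with the interface's
# radio buttons; the loop body is a stand-in for the user's script.
import math

sr = 44100.0          # sampling rate, samples/sec (reserved float)
nsamps = 1000         # number of samples to compute (reserved int)
samples = []

for t in range(nsamps):   # reserved int t: current sample index
    # --- user script body: compute the next output sample y ---
    y = math.sin(2.0 * math.pi * 440.0 * t / sr)   # a 440 Hz test tone
    # --- end of user script body ---
    samples.append(y)     # the wrapper writes y to the output sound file

print(len(samples))       # -> 1000
```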
The Theremin was one of the first electronic musical instruments, yet it provides a degree of expressive real-time control that remains lacking in most modern electronic music interfaces. Underlying the deceptively simple capacitance measurement used by it and its descendants are a number of surprisingly interesting current transport mechanisms that can be used to inexpensively, unobtrusively, robustly, and remotely detect the position of people and objects. We review the relevant physics, describe appropriate measurement instrumentation, and discuss applications that began with capturing virtuosic performance gesture on traditional stringed instruments and evolved into the design of new musical interfaces.
We aim at capturing the know-how of expert patch programmers to build more productive human interfaces for commercial synthesizers. We propose a framework to represent this superficial layer of knowledge that can be used to help musicians program patches for commercial synthesizers. Our key idea is to classify sounds according to the transformations experts can apply to them. We propose a dual representation of sounds combining object-oriented programming with classification mechanisms. We illustrate our framework with a prototype system that helps program Korg 05R/W synthesizers. Key Words: Computer-Aided Synthesis, Description Logics, Object-Oriented Programming, Smalltalk
From Computer Aided Synthesis to Computer Aided Synthesizer Programming
Our framework stems from the following remark. In an intensive care unit, a typical nurse knows perfectly well how to use an infusion pump. However, the nurse may be an "expert" in infusion pump manipulation without necessarily having any theoretical ...
Nyquist is a functional language for sound synthesis with an efficient implementation. It is shown how various language features lead to a rather elaborate representation for signals, consisting of a sharable linked list of sample blocks terminated by a suspended computation. The representation supports infinite sounds, allows sound computations to be instantiated dynamically, and dynamically optimizes the sound computation.
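The representation described can be modeled compactly. The sketch below, in Python, is an illustrative model of a shareable linked list of sample blocks terminated by a suspended computation; it is not Nyquist's actual implementation:

```python
# Illustrative model of the signal representation described above: a
# shareable linked list of sample blocks whose tail is a suspension
# (a thunk forced only when more samples are demanded).

BLOCK_SIZE = 4  # tiny for demonstration; real systems use larger blocks

class SoundNode:
    """One block of samples plus a lazily computed link to the next node."""
    def __init__(self, suspension):
        self.block = None              # samples, filled on demand
        self.next = None               # next SoundNode, created on demand
        self._suspension = suspension  # thunk -> (samples, next suspension)

    def force(self):
        """Run the suspension once; later readers share the same block."""
        if self.block is None:
            self.block, next_susp = self._suspension()
            self.next = SoundNode(next_susp) if next_susp else None
        return self.block

def ramp_from(start):
    """Suspension generating an unbounded ramp signal, block by block."""
    def susp():
        samples = [float(start + i) for i in range(BLOCK_SIZE)]
        return samples, ramp_from(start + BLOCK_SIZE)
    return susp

head = SoundNode(ramp_from(0))   # an "infinite" sound: nothing computed yet
first = head.force()             # first block computed only on demand
second = head.next.force()
print(first, second)             # [0.0, 1.0, 2.0, 3.0] [4.0, 5.0, 6.0, 7.0]
```

Because nodes cache their block after the first force, two readers of the same list share the computed samples, which is the sharing property the abstract emphasizes.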
ion The design technique of abstraction can be used to increase the amount of reuse within a set of classes by describing abstract classes that embody some of their shared state and/or behavior. One uses abstraction to construct a hierarchy relating several concrete classes to common abstract classes for better reuse of state and behavior. This emphasizes sharing of both state and behavior; abstract classes often implement algorithms in terms of methods that are overridden by concrete subclasses. The result of abstraction is a more horizontal hierarchy or a new protocol family. A typical abstraction example is building a model of graphical objects for an object-oriented drawing package. The first pass includes several concrete classes.
[Figure residue: a class diagram showing classes such as Controller, StorageManager, View, and EventListEditorView, with superclass names, instance variables (e.g., displayList, layoutManager, "manage read/write of list to/from files"), and links representing 'has-a' relationships]
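The technique reads naturally in code. This hypothetical Python example (class names are invented for illustration, not taken from the text's drawing-package figure) shows an abstract class implementing an algorithm in terms of a method overridden by concrete subclasses:

```python
# Illustration of the abstraction technique described above: the abstract
# class implements draw() in terms of describe(), which each concrete
# subclass overrides. Class names here are hypothetical.
from abc import ABC, abstractmethod

class GraphicalObject(ABC):
    """Abstract class embodying behavior shared by all graphical objects."""
    def draw(self):
        # Shared algorithm, expressed in terms of an overridable method.
        return f"draw {self.describe()}"

    @abstractmethod
    def describe(self):
        """Concrete subclasses supply their own description."""

class Circle(GraphicalObject):
    def describe(self):
        return "circle"

class Rectangle(GraphicalObject):
    def describe(self):
        return "rectangle"

shapes = [Circle(), Rectangle()]
print([s.draw() for s in shapes])   # -> ['draw circle', 'draw rectangle']
```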
This article will discuss the technology of SWSS and then present and compare these three systems. It is divided into three parts; the first introduces SWSS in terms of progressive examples. Part two compares the three systems using the same two instrument/score examples written in each of them. The final section presents informal benchmark tests of the systems run on two different hardware platforms---a Sun Microsystems SPARCstation-2 IPX and a NeXT Computer Inc. TurboCube machine---and subjective comments on various features of the languages and programming environments of state-of-the-art SWSS software. This author's connection with this topic is that of extensive experience with several different SWSS systems over the last 15 years, starting with MUS10 and including all three compared here: Csound (in the form of Music-11 initially) at the CMRS studio in Salzburg (Pope 1982); cmusic in the CARL environment at PCS/Cadmus computers in Munich (Pope 1986); and more recently a combination of cmix, Csound, and various vocoder software packages with user interfaces written in Smalltalk-80 at CCRMA, the Center for Computer Research in Music and Acoustics at Stanford University (Pope 1992).
Pope: MT XV---Three Systems for SWSS. CMJ 17(2), Summer 1993. 1993.01.16
Licht-Bilder is the final piece of Karlheinz Stockhausen's 29-hour opera cycle Licht, on which he worked for 27 years. The work was complemented by a video production, with both an elaborate auditorium sound system and a fully redundant set of stage equipment. Triangular sails were designed for the stage, onto which the videos were projected. The piece offers a kind of run-through of all seven days of the opera, and the dense interconnection of its diverse materials is central to Stockhausen's thinking. Pitch, rhythm, and duration were not altered at all, while tempi and dynamics were given interpretative leeway. In the acousmatic music, the construction and the sounding of the sound elements stand in a dialectical relationship.