Genetic Programming:
Automatic Generation of Sound Synthesis Techniques
Research conducted at the MIT Media Lab. Master of Science in Media Arts & Sciences
Digital sound synthesizers, ubiquitous today in sound cards, software, and dedicated hardware, use algorithms (Sound Synthesis Techniques, or SSTs) capable of generating sounds similar to those of acoustic instruments, as well as entirely novel sounds. Designing SSTs is a very hard problem: it is usually assumed to require human ingenuity to devise an algorithm that synthesizes a sound with given characteristics, and many of the SSTs in common use are the fruit of experimentation and long refinement. An SST is determined by its “functional form” and its “internal parameters”. SST design is usually done by selecting a fixed functional form from a handful of commonly used SSTs and applying a parameter estimation technique to find the set of internal parameters that best emulates the target sound.
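As a point of reference, here is a minimal sketch of that conventional approach, assuming simple FM synthesis as the fixed functional form; the spectral-error objective, the optimizer, and all names are illustrative assumptions, not the method of the thesis:

```python
# Minimal sketch: parameter estimation over a *fixed* functional form.
# The functional form here is simple FM synthesis:
#   y(t) = A * sin(2*pi*fc*t + I * sin(2*pi*fm*t))
# Only the internal parameters (A, fc, fm, I) are fitted; the form itself
# never changes. Objective and names are illustrative assumptions.
import numpy as np
from scipy.optimize import minimize

SR = 16000                  # sample rate (Hz)
N = 4096                    # analysis length (samples)
t = np.arange(N) / SR

def fm_synth(params):
    """Render one frame of FM synthesis from its internal parameters."""
    amp, fc, fm, index = params
    return amp * np.sin(2 * np.pi * fc * t + index * np.sin(2 * np.pi * fm * t))

def spectral_error(params, target):
    """Distance between magnitude spectra of candidate and target sounds."""
    cand = np.abs(np.fft.rfft(fm_synth(params)))
    targ = np.abs(np.fft.rfft(target))
    return np.sum((cand - targ) ** 2)

# Hypothetical target: a sound we pretend came from an acoustic source.
target = fm_synth([0.8, 440.0, 110.0, 2.0])

# Fit the internal parameters from a rough initial guess.
guess = np.array([1.0, 400.0, 100.0, 1.0])
result = minimize(spectral_error, guess, args=(target,), method="Nelder-Mead")
print("estimated parameters:", result.x)
```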
A new approach for automating the design of SSTs is proposed. It uses a set of examples of the desired behavior of the SST, given in the form of “inputs + target sound”. The approach is capable of suggesting novel functional forms, together with their internal parameters, that closely follow the given examples.
The design of an SST is stated as a search problem in SST space (the space spanned by all valid functional forms and internal parameters, bounded to keep the search practical). The search is carried out with evolutionary methods, specifically Genetic Programming (GP). A custom language for representing and manipulating SSTs as topology graphs and expression trees is proposed, along with the mapping rules between the two representations. Fitness functions that use analytical and perceptual distance metrics between the target and the produced sounds are discussed.
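To make the representation and fitness ideas concrete, here is a minimal sketch with illustrative names: an SST encoded as an expression tree of unit generators, rendered recursively, and scored by a simple analytical (log-spectral) distance. The tree encoding, operator set, and metric are assumptions for illustration; the thesis's custom language and perceptual metrics are richer, and a full GP system would also define crossover and mutation over these trees:

```python
# Minimal sketch of two ingredients named above, with illustrative names:
# (1) an SST expressed as an expression tree of unit generators, and
# (2) an analytical fitness metric (log-magnitude spectral distance).
import numpy as np

SR = 16000
t = np.arange(4096) / SR

def render(node):
    """Recursively evaluate an expression tree into a sound buffer."""
    op, args = node[0], node[1:]
    if op == "const":
        return np.full_like(t, args[0])
    if op == "osc":                      # sine oscillator driven by a subtree
        return np.sin(2 * np.pi * np.cumsum(render(args[0])) / SR)
    if op == "add":
        return render(args[0]) + render(args[1])
    if op == "mul":
        return render(args[0]) * render(args[1])
    raise ValueError(f"unknown op {op}")

def fitness(candidate_tree, target):
    """Analytical distance between candidate and target spectra (lower is fitter)."""
    cand = np.abs(np.fft.rfft(render(candidate_tree))) + 1e-9
    targ = np.abs(np.fft.rfft(target)) + 1e-9
    return np.mean((np.log(cand) - np.log(targ)) ** 2)

# An FM-like individual: a carrier frequency modulated by another oscillator.
individual = ("osc", ("add", ("const", 440.0),
                             ("mul", ("const", 100.0),
                                     ("osc", ("const", 110.0)))))
target = render(("osc", ("const", 440.0)))   # toy target: plain 440 Hz sine
print("fitness:", fitness(individual, target))
```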
The AGeSS system (Automatic Generation of Sound Synthesizers), developed at the Media Lab, is outlined, and some SSTs and their evolution are shown.
Audio Watermarking:
Digital Watermarking of Audio Signals Using a Psychoacoustic Auditory Model and Spread Spectrum Theory
Research conducted at the University of Miami. Master of Science in Music Engineering
A new algorithm for embedding a digital watermark into an audio signal is proposed. It uses spread spectrum theory to generate a watermark that resists removal attempts, and a psychoacoustic auditory model to shape and embed the watermark into the audio signal while preserving the signal’s perceptual quality. Recovery is performed without knowledge of the original audio signal. A software system is implemented and tested for perceptual transparency and data-recovery performance.
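A minimal sketch of the spread-spectrum half of the scheme, with illustrative names: a key-seeded pseudo-noise sequence carries one data bit and is recovered blindly by correlation, without the original signal. The fixed embedding strength alpha stands in for the psychoacoustic shaping the thesis performs, and all function names are assumptions:

```python
# Minimal sketch: spread-spectrum embedding and blind detection of one bit.
import numpy as np

SR, N = 16000, 16000

def pn_sequence(key, n):
    """Key-seeded +/-1 pseudo-noise sequence (the spread-spectrum carrier)."""
    rng = np.random.default_rng(key)
    return rng.choice([-1.0, 1.0], size=n)

def embed(audio, key, bit, alpha=0.005):
    """Add the PN watermark, sign-modulated by one data bit.
    alpha is a fixed strength here; the thesis instead shapes the watermark
    with a psychoacoustic auditory model so it stays inaudible."""
    pn = pn_sequence(key, len(audio))
    return audio + alpha * (1.0 if bit else -1.0) * pn

def detect(watermarked, key):
    """Blind recovery: correlate with the key's PN sequence; no original needed."""
    pn = pn_sequence(key, len(watermarked))
    return int(np.dot(watermarked, pn) > 0)

audio = 0.3 * np.sin(2 * np.pi * 440 * np.arange(N) / SR)   # toy host signal
marked = embed(audio, key=42, bit=1)
print("recovered bit:", detect(marked, key=42))
```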
Other Grad Research
These projects were done at the MIT Media Lab between 1999 and 2001.
Fall 1999: Cognitive Artifacts and Architectures (MAS 654), Discrete Time Signal Processing (6.341), Audio Processing by People and Machines (MAS 641J)
Spring 2000: The Nature of Mathematical Modeling (MAS 864), The Society of Mind (MAS 731J), Writing for Computer Performance (MAS 642J)
Fall 2000: Pattern Recognition and Analysis (MAS 622J), Acoustics (6.312), Preparation for the Master’s Thesis (MAS 940)