The Fifth Annual Interactive Music Conference

Group Report: The Multichannel Audio Working Group

Participants (A.K.A. "The Story of O"):
Jack Buser; Dolby Laboratories
Keith Charley; Creative Labs, Inc.
Trudy Culbreth-Brassell; Microsoft
Todd Hager; Dolby Laboratories
Jonathan Hoffberg; Dolby Laboratories
Jean-Marc Jot; Creative Advanced Technology Center
Phil Lehwalder; Intel
Scott McNeese; Philips Semiconductors
Adam Philp; Sensaura Ltd.
Jim Rippie
David Roach; SigmaTel, Inc.
Larry The O; LucasArts
Keith Weiner; DiamondWare, Ltd.

Facilitator: Linda Law; Fat Labs, Inc.


The goal of this group was to outline a "write-once, deliver anywhere," platform independent, format-agnostic approach to 3D interactive multichannel audio delivery from authoring through to the final consumer experience. We began our investigation with a survey of the delivery scenario, and worked back to the authoring process.

The primary intent of our discussion was to identify issues, not investigate them in detail. The issues cited in this report all must be explored and solved before a complete multichannel signal chain can be realized. It is our hope that those solving individual problems will do so with awareness of the holistic context we describe here, and we believe that if that awareness is maintained it will result in the highest integration and efficiency through the entire multichannel signal chain.

This report touches on and incorporates subjects that have been part of previous Project Bar-B-Q groups, including the multiformat audio work group of '98 and the interactive audio "big picture" work group of '99.

This report also reflects on the Interactive Audio Special Interest Group's (IASIG) Multi-Format Audio Working Group, which has made substantial progress toward a report entitled "Recommended Practices For Handling Multi-Format Audio," which awaits ratification by the IASIG (at which time it will become publicly available). The observations and recommendations in this report will be forwarded to the IASIG in the hopes that the Multi-Format Audio Working Group will explore these concerns and proposed enhancements, incorporating them into a "Recommended Practices" version 2 document.

This Bar-B-Q group's recommendations revolve around three main proposals:

  1. That the IASIG advocate audio calibration features to audio device manufacturers; such features would provide a system profile of the user's audio environment against which audio can be delivered more effectively.
  2. That IASIG discuss and outline a metadata layer for delivering positional audio rendering instructions to audio devices.
  3. That the IASIG give thought to the advancement of a unified platform for authoring linear and interactive multichannel audio, which incorporates the notions of metadata generation to maximize the efficiency of the authoring process.

Note: another Project Bar-B-Q 2000 working group focused on Interactive Audio authoring issues in detail, including multichannel authoring and delivery issues. Readers looking for supplemental information are strongly encouraged to seek out this report at http://www.projectbarbq.com/reports/bbq00/bbq00r7.htm.

Delivery World: Stating the Problem

We began with a major concern: How does a developer account for different users' listening environments?

Currently, audio playback chains take on many configurations: stereo speakers, stereo headphones, 5.1-channel speaker environments, 6.1- and 7.1-channel environments, and so on.

Furthermore, audio streams can be encoded for more efficient transmission using Dolby Digital, Dolby Pro Logic, or otherwise processed to achieve a greater sense of spatiality using techniques such as Head-Related Transfer Function (HRTF) or stereo field expansion algorithms.

Finally, customizable platforms, such as PC and Macintosh, can have wide variance in available resources and CPU power, which results in a range of capabilities for reproducing multichannel sound. (This is in contrast to console platforms, which have the same resources from unit to unit.)

The combination of these delivery variables, together with aspects of an end user's listening environment (reflective surfaces, bass absorption, etc.) and the lack of sufficient connectors on the back of devices supposedly equipped for multichannel audio delivery, poses wide-ranging challenges to audio providers seeking to deliver the optimum listening experience to the most users.

Delivering audio properly formatted for many configurations is beyond impractical for audio providers, even if we ignore issues specific to each listener's environment. Audio providers need a platform independent solution that will realize the audio designer's goal on as wide an array of systems as possible, without requiring custom authoring for any particular configuration.

Defining the Listening Environment

A couple of terms are useful in quantifying and qualifying the listener's environment, so that the delivery mechanism may more accurately deliver an audio experience.

  1. System Profiling - How to inform the delivery system of the user's listening environment, speaker setup, and platform configuration so that it can render the experience as effectively as possible.
  2. Automated Calibration - How to create an intelligent calibration application that employs a microphone feedback mechanism to set speaker levels and bass management settings.

Once the end user's system is profiled and calibrated, applications playing audio to that environment can adjust their audio content to the system to provide predictable results. With a calibrated playback chain whose attributes are known by the delivery mechanism, any given end user will be that much more likely to receive an optimized performance.
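As an illustration of what such a profile might contain, the sketch below captures the playback-chain attributes discussed above as a simple queryable record. The field names and structure are hypothetical, invented here for clarity rather than drawn from any existing specification.

```python
from dataclasses import dataclass, field

@dataclass
class SpeakerInfo:
    """One speaker's position relative to the listener (listener-centric axes)."""
    label: str      # e.g. "FL", "FR", "C", "LFE", "SL", "SR"
    position: tuple # (x, y, z)

@dataclass
class SystemProfile:
    """Hypothetical profile of the end user's playback environment."""
    speakers: list                 # list of SpeakerInfo; may be empty for headphones
    headphones: bool = False
    hrtf_available: bool = False   # artificial positional processing present?
    cpu_budget_pct: float = 10.0   # share of CPU the audio path may consume
    level_trims_db: dict = field(default_factory=dict)  # filled in by calibration

def channel_count(profile):
    """Number of discrete output channels the profiled system can reproduce."""
    return 2 if profile.headphones else len(profile.speakers)
```

An application playing to this environment would query the profile (channel count, HRTF availability, level trims) before deciding how to render its content.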

Of course, variances between listening environments occur for many reasons, including

  1. Frequency response of the equipment chain.
  2. Number, placement, and orientation of speakers, or whether headphones are in use.
  3. Amount of reflectivity (or sound absorption) from surfaces across the frequency range.
  4. Whether artificial positional processing is available.

Aspects of a self-calibration system could include:

  1. Listener-defined setups in conjunction with an intelligent control panel application.
  2. System self-calibration, with the addition of a coincident pair of microphones at the primary listener position.
  3. Automated or semi-automated bass management.
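A minimal sketch of the level-matching step in such a calibration: given per-speaker SPL readings from a microphone at the listening position, compute the gain trim (in dB) that brings every speaker down to the level of the quietest one. The measurement values are illustrative, not recommended targets.

```python
def level_trims(measured_spl_db):
    """Compute per-speaker gain trims (dB) so all speakers match the quietest.

    measured_spl_db: mapping of speaker label -> SPL measured at the
    listening position while each speaker plays the same calibration noise.
    """
    reference = min(measured_spl_db.values())  # match everything to the quietest speaker
    return {spk: round(reference - spl, 1) for spk, spl in measured_spl_db.items()}

# Example with hypothetical readings: "SL" is quietest, so it gets a 0 dB
# trim and the louder speakers are cut to match it.
trims = level_trims({"FL": 78.0, "FR": 76.5, "C": 80.0, "SL": 74.0, "SR": 75.0})
```

The resulting trims would be stored in the system profile and applied by the playback chain before any content-specific processing.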

As an existing real world example, many games already perform such a profiling on installation and select (either automatically or through user choice) an installation appropriate for the resources found. This is a good model for multichannel profiling.
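In the same spirit as that install-time check, a multichannel profiling step might select a playback configuration from detected resources. The thresholds and configuration names below are arbitrary placeholders for illustration.

```python
def select_install_config(detected):
    """Pick an installation profile appropriate to the resources found.

    `detected` is a dict of measured system attributes; the thresholds
    used here are placeholders, not recommended values.
    """
    if detected.get("output_channels", 2) >= 6 and detected.get("cpu_mhz", 0) >= 500:
        return "5.1 discrete"
    if detected.get("hrtf_available"):
        return "stereo with HRTF positioning"
    return "plain stereo"
```

As with existing game installers, the result could be applied automatically or offered to the user as a default choice.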

The group recognized that system profiling itself is only part of the long-term solution for a "write-once, deliver anywhere" approach to multichannel audio; the solution should also provide a sensible way to handle variances between many different types of systems. (More detail on how these systems can vary, and the problems that poses, appears below.)

These variances can be accommodated by having each type of system adaptively determine how best to resolve the audio the author has designed. To that end, the group proposes the introduction of a new data protocol that will accompany an interactive audio stream for the purpose of assisting the playback system in determining how best to reproduce the associated audio.

The group defined this set of instructions as a metadata layer.

The Metadata Layer

The group worked to identify ways that audio experiences could be authored and delivered to provide the end users with maximum benefit from their varied playback environments.

It was agreed that a metadata layer is required to supply information to the playback system regarding channel routing, and location, orientation, and other attributes of accompanying audio data. The format of this metadata layer could be based upon a 3D audio API, such as Interactive 3D Audio Level 2 (I3DL2). That API defines interactive simulations of spatial acoustics and real time audio environment modeling and is intended to deliver highly realistic experiences.

The authoring tools should generate the metadata layer, which will accompany any audio stream produced in the tool and will instruct the playback system how to play the associated audio.

Aspects of this metadata layer and the systems that use it include:

  1. Well-defined interface for access on the authoring and consumer sides
  2. Within a stream, description of the number of channels and the vector position of each
  3. Ability to map any input format to any output format
  4. Capacity to handle the special case: stream pass-through
  5. Capacity to route interactive audio to the LFE
  6. Ability to supersede default channel designations of multichannel formats
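To make the idea concrete, a stream's metadata might look something like the following. The field names and layout are invented for illustration only; they are not a proposed standard.

```python
# Hypothetical per-stream metadata, shown as a Python dict for clarity.
stream_metadata = {
    "channels": 6,                    # number of channels in the stream
    "layout": [                       # vector position of each channel
        {"name": "FL",  "vector": (-1.0,  1.0, 0.0)},
        {"name": "FR",  "vector": ( 1.0,  1.0, 0.0)},
        {"name": "C",   "vector": ( 0.0,  1.0, 0.0)},
        {"name": "LFE", "vector": None},   # non-directional channel
        {"name": "SL",  "vector": (-1.0, -1.0, 0.0)},
        {"name": "SR",  "vector": ( 1.0, -1.0, 0.0)},
    ],
    "pass_through": False,            # special case: deliver the stream untouched
    "lfe_routing": True,              # interactive audio may target the LFE
    "channel_overrides": {},          # supersede a format's default designations
}

def needs_downmix(metadata, output_channels):
    """Playback-side decision: does this stream outstrip the system's abilities?"""
    return not metadata["pass_through"] and metadata["channels"] > output_channels
```

A stereo-only system inspecting this metadata would conclude that the 5.1 stream must be downmixed, while a 5.1-capable system could map each channel to its designated position.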

The metadata layer should also contain parameters that are sensitive to the following playback chain limitations:

  1. CPU
  2. DSP Capability
  3. Interfaces
  4. Mass storage
  5. Decoders
  6. Hardware
  7. Software
  8. Physical connectors
  9. System bus
  10. Memory/buffering
  11. Output port

The group also recognized that variances between systems will inevitably lead to cases where some systems will be incapable of delivering the audio designer's intended experience as expressed through the metadata layer. Some form of graceful degradation must be available to accommodate situations where the platform is not capable of reproducing full, discrete multichannel data (based on a profile of the user's circumstances).

For example, a system capable only of stereo output should have a way of resolving audio content such as streamed 5.1 format audio that outstrips its abilities, through a recommended process of downmixing or by other means, such as alternate compressed versions. The group sees a revision to the IASIG's "Recommended Practices For Handling Multi-Format Audio" report as ideally suited to providing these recommendations.
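As a sketch of what a recommended downmix could look like, the conventional fold-down scales the center and surround channels by about 0.707 (a 3 dB cut) before summing them into the front pair, and commonly discards the LFE. This is one illustrative convention, not the group's prescribed formula.

```python
import math

ATT = 1.0 / math.sqrt(2.0)  # -3 dB, the conventional fold-down attenuation

def downmix_51_to_stereo(fl, fr, c, lfe, sl, sr):
    """Fold one 5.1 sample frame down to a stereo pair.

    Center and surround channels are attenuated by 3 dB before mixing;
    the LFE channel is discarded, as is common practice in fold-downs.
    """
    left = fl + ATT * c + ATT * sl
    right = fr + ATT * c + ATT * sr
    return left, right

# A center-only signal splits equally between left and right,
# each side receiving the center attenuated by 3 dB.
l, r = downmix_51_to_stereo(0.0, 0.0, 1.0, 0.0, 0.0, 0.0)
```

A real implementation would also guard against clipping when the summed channels exceed full scale, for example by normalizing the output.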

Authoring Tools and Methods

Platform independent development is the grail of the interactive multichannel authoring tool quest. It was proposed that, ideally, there should be one authoring platform for both linear and interactive content. The environment should have the following attributes:

  1. A platform-independent authoring environment
  2. Fully scriptable multichannel effects, including reverb and occlusion.
  3. 3D, high-resolution gestural control with response curves (for the high end).
  4. An inexpensive tactile controller for defining spatial placement (for the low end).
  5. Auditioning capability.
  6. Orchestration capability.
  7. Capacity to audition within the context of the target platform (in game).
  8. Implementation of depth-axis positioning for 3D sound.
  9. Scriptable audio behavior, both interactive and linear.
  10. Support for seamless looping and crossfading of compressed streams.
  11. Open list of audio source formats, including MIDI.
  12. Allows author to specify positions relative to the listener, the listener's head, or the speakers.
  13. Authoring tools should support the generation of a metadata layer that controls the manner of playback on a given profiled system, as described in the previous section.


The discussions that produced this document were quite broad in scope, but there are some key ideas that surfaced that could form the basis for significant advances in multichannel and positional audio delivery on PCs and game platforms:

  1. A metadata layer for delivering 3D interactive multichannel audio.
  2. A standardized method for self-calibration of end user multichannel playback systems.
  3. A single, integrated environment for the interactive and linear production of audio soundtracks for games.

It is our hope that these ideas will continue to be explored and refined to the point where useful recommendations can be made to equipment and tools manufacturers for the development of technology which addresses this fundamental problem: How to deliver a 3D interactive multichannel audio experience in a complex, multiplatform world.

Action Items

The working group will flesh out and distribute guidelines.

The working group will seek participation of representatives from other potential multichannel platforms, and more content providers for the working group.

Jim Rippie will forward the group's recommendations to the IASIG, with the aim of co-opting the Multi-Format Audio Working Group, provided Michael Land and that group's executive advisor assent.

Individual members will evangelize the recommendations of the BBQ workgroup and the IASIG group to their respective constituencies.
