The Fifth Annual Interactive Music Conference
PROJECT BAR-B-Q 2000

Group Report: General Interactive Audio

Participants (A.K.A. "Foghorn"):
  Clint Bajakian; C.B. Studios
  Peter Drescher; Twittering Machine
  Duane Ford; Staccato Systems, Inc.
  Chris Grigg; Beatnik
  Jennifer Hruska; Sonic Implants Network
  Mike Kent; Roland Corporation
  Ron Kuper; Cakewalk
  Mike Overlin; Yamaha Corporation
  Rob Rampley; Line 6

Facilitator: Aaron Higgins; MixMeister Technology
 

 

Contents

  1. Introduction
  2. General Interactive Audio (GIA): Proposal for a New Platform Standard
  3. The Content Developer's Plight: "Call of the Foghorn"
  4. The Ultimate Development Tool
  5. Towards a Command Language
  6. Practical Steps: Proposal for MIDI Media Cueing (MMQ)
  7. Conclusion

1. Introduction

Project BBQ's ongoing mandate is to influence the development of music on computers over the next five years. As of late 2000, all signs suggest that our next major context change is going to be that the field we've come to know as 'music on computers' is about to expand to include all types of media-capable devices - not just high-performance computers, game machines, and internet appliances, but also small, mobile, wirelessly networked devices - and everything in between. These products are likely to be specified by engineers and market researchers lacking computer music experience, and the number of different products is likely to be large. Very large.

There is a clear and present danger that years of IA-SIG and MMA standards work will be overlooked in the development of these devices, and that a new Babel of incompatible, device-dependent music and sound media formats will arise. If this happens, content will suffer - and by extension, so will everyone who has to listen to the damn things. And so will everyone who has to create audio content for them - history has shown that the more formats there are, the more the tools suck.

While the situation is serious - indeed, urgent - it is far from hopeless. We do have viable trade organizations, and we can use them as megaphones for a message targeted squarely at device and tool manufacturers.

Firstly, the Foghorn group recommends that the IA-SIG undertake the creation of a document detailing our industry's recommended system specifications for interactive data-driven music and sound playback, relying on open standards and appropriate for all platforms: General Interactive Audio (GIA). Because some of these new devices will be more or less powerful than current desktop/laptop machines, the specification should include guidelines for how the system and the media should scale up or down to match the device's capabilities.

Secondly, Foghorn has considered the problem of Authoring Tools for interactive music and sound. We have articulated the tool problems that conspire to limit the quality of audio for games and other interactive media. Taking these problem statements as inspiration, we've also begun to characterize the abstract requirements of a 'Command Language' that might connect the programmer's coding world with the audio artist's media world during the development process. This language contains elements of the Cueing concept (as explored by BBQ 1999's "Q" Group and others), and would be a basis for the 'Integrator' tool that The Fat Man and others have sought since the dawn of interactive audio.

While the particulars of such a language will require further definition, we have also gone a step further and developed a rough proposal for extending the MIDI standards to accommodate this 'Command Language' in the form of a new set of MIDI messages. Expressing the language in MIDI form is an intensely practical step, as it opens the door to direct support for interactivity authoring in the pro MIDI+Audio sequencer products that content creators already use in their daily work. Imagine the improvement in content quality that would result if we could just run a MIDI cable from the game box to the sound artist's computer, so that all sound and music elements could be fine-tuned before the delivery to the programmer, in context with the actual gameplay and visuals under development.

Interactive audio has always been characterized by too many incompatible playback platforms and too few good tools. By speaking with a unified and authoritative voice now, our industry can provide a warning signal that steers device and tool designers in the right direction, so that they do not die a horrible scraping death on the rocks of their own ignorance.

2. General Interactive Audio (GIA): Proposal for a New Platform Standard

Introduction

The members of the Project BBQ Conference of October 2000 recommend that the IA-SIG and MMA undertake the development of a new specification, white paper, recommended practice, or other similar document. The purpose of this specification is to define the necessary or recommended components of a well-designed audio system for any device or system that is expected to provide interactive audio applications.

This specification should define the components necessary to create compelling digital audio environments for computers, video game systems, Internet appliances, cell phones, wireless devices, handheld organizers and other audio producing systems. Typical software applications that would use the features of these devices include games, Internet audio, multimedia software, music production systems, video and DVD systems.

The following paper is an example outline prepared by members of the Project BBQ 2000 Foghorn Working Group. It is not intended to define any specific requirements or recommendations. It is offered as an example outline of the type of topics and required features that the IA-SIG and MMA should consider for inclusion in this document. It is not an exhaustive list of necessary items.

The GIA specification document should be written for an audience of decision-makers and management not necessarily familiar with electronic and computer music concepts, but should include enough concrete detail to guide implementors. It should primarily rely on references to preexisting specification documents (e.g. all MIDI and IA-SIG RPs and CAs) for particulars.

After preparing such a new specification, the IA-SIG and MMA should distribute and promote it and its recommendations to hardware device manufacturers, audio semiconductor vendors, API designers, development tool suppliers, and content providers.

GIA Specification Outline

Tools with interfaces designed to suit Content Creators

  • Music tools for interactive composers
  • Audio studio tools for sound designers
  • Programming tools for programmers

Audition while Authoring ("What You Hear Is What You Get")

  • Authoring/production directly in target sound system.
  • Comprehensive Real-time Connectivity via cable to target sound system.
  • Emulation of target sound system in authoring system (Authoring Tools as close to rendering engine as possible)

Linear & Nonlinear Audio Authoring Tools

  • Linear Authoring for playback of simple Music/Audio.
  • Nonlinear for playback of Interactive Music/Audio.

Creation and Translation to and from all common formats

  • MIDI, SMF, XMF, SD2, Wave, AIFF, DLS, Etc.
  • Support Established Audio Authoring Tools

Cue List Oriented

  • System based on Cue List with Common Event types

MIDI Media Cue List

  • Control all Cue Events from common protocol
  • Control Layer shared between concurrently running tools
  • (Needs Defined MIDI Control/Command language from MMA)
  • (Needs defined set of audio terms: D. Javelosa's work)

Content Format Support

  • MIDI Song Files (SMF)
  • XMF
  • MP3
  • Wave Audio
  • Etc.

Number of Channels/Voices

  • For Synthesis Controlled by MIDI
  • For Streaming Audio (Sound effects, Environmental, Etc.)

Synthesis Voices

  • Wavetable Support (DLS2, GM, etc.)
  • Layering Multiple Waveforms
  • Filter: 2 Pole LPF, 2 Pole BPF, HPF
  • MIDI Control: Number of Virtual MIDI Cables
  • Etc.

Number of Channels with Interactive 3D Positioning

  • Number of channels with 3D support
  • Number of Channels with 3D Environmental Reverb
  • Etc.

Number of Sub Groups (Application Output)

  • Stereo Mix
  • 4 Channels Output
  • 5.1 Channel Output
  • 6 Channel Output
  • Etc.

Number of Main Outputs (Physical Output)

  • To Stereo Speakers
  • To 4 Speakers
  • To 5.1 Speakers
  • Etc.

Handling/Assigning Outputs

  • 4 Outputs to 2 Speakers
  • 5.1 Output to 4 Speakers

Output Formats

  • Analog Audio
  • Digital Audio
  • USB
  • 1394
  • Etc.

Decoding/Decompression

  • The system must have the processing ability to handle compressed, encoded, or encrypted content (decompression, decoding, decryption, etc.)

Effects Processing

  • Number of FX Sends per Channel
  • Global FX (Reverb)
  • Other Effects (Distortion, Enhancer, Chorus, Etc.)
  • Programmable DSP?
  • FX Sends or Insert FX
  • Channel EQ
  • Master EQ

Control of Playback Platform

  • API for Channel/Voice Priority
  • API for 3D Placement
  • API for Mixing Control

Quality of Service

  • Sample Rate, Bit Depth (Resolution)
  • Absolute Output Level, Volume Response Curve
  • Headroom
  • Uninterrupted Service Interval
  • Low Latency/Jitter
  • Sample Rate Conversion
  • Signal to Noise Ratio
  • Master Clock Source

Guidelines for Downward Scalability

  • Absolute Minimum Feature Set
  • Prioritizing Audio Channels
  • Prioritizing Synthesis Voices
  • Scaling Effects

3. The Content Developer's Plight: "Call of the Foghorn"

The developer of sound and music content for interactive media has typically had no direct control over the playback behavior of the content on the target platform, thereby limiting both product quality and development productivity.

Being denied the ability to directly affect the playback performance of audio content in the end target application, sound and music developers have been relegated to the difficult task of merely estimating how various audio elements will combine in the end product. They must therefore employ the practice of educated approximation over that of intuitive precision. Too often, necessary adjustments have to be made "off-line", that is, in some audio or music development environment divorced from the target, often taking days or weeks, dragging down the production process and compromising quality.

What is clearly needed is the technology to allow the audio/music professional to develop and adjust audio content in direct integration with the target or "host" application.

It seems the best way to accomplish this would be to set up an interface between the audio development platform and the interactive media platform itself, with two-way communication between the host platform running the entertainment application and the audio development environment. In one scenario, information flows from the host to the developer: cues identical to those the host application calls internally are echoed over a virtual or actual cable to the development application, which in turn manipulates the various audio content, resulting in a real-time performance of the audio track. This requires the audio content to reside on the development platform. In the other direction, the development platform could transmit calls to the host platform in order to control the real-time playback of content in the host. This requires the content to reside on the host platform. Any problems detected with any aspect of the playback of the audio track, such as problems with balance or tone, can then be fixed immediately in the development application. Files can be destructively adjusted, or alterations made to the script that will control the content in the target (host) application. Adjustments can then be immediately tested for verification and tweaking if necessary.

Each target platform has a different audio architecture, in both software and hardware, so different approaches will need to be adopted on the development side to address each one. In general, though, the concept outlined here applies to all interactive platforms.

Scenario 1: Audio Asset Playback in Development Application Controlled Remotely by Host Application

The host application uses a common language, such as MIDI messages, to control audio content. In addition to controlling whatever audio has been integrated internally, these calls are routed externally to a development application that is effectively slaved to the host - an application that organizes the audio content and plays it back analogously to how the host would ultimately play it back once it is integrated into the host application.

There are two types of slaved application seen as important: [1] audio or music authoring tools for discrete elements that will be members of the eventual full soundtrack, and [2] a full soundtrack mixing and mastering tool with playlists for all audio elements (dialog, music and effects), providing control over the playback quality of the various content - either allowing destructive editing ("Open in Waveform Editor" as a menu command, for example), or saving software DSP adjustment instructions in a script. These parameters would have to be compatible with the host DSP capabilities. Ideally, the tool could perform some relatively simple adjustments destructively, such as gain, compression and EQ. Better still, it could support the many excellent software DSP plug-ins available, and be able to run batch processes on designated subsets of files.

A delivery to the project would consist of all audio content plus a script that is automatically exported as text from the development application. The language of the script would contain calls identical to the calls in the host application. The script would contain instructions as to how the host application must manipulate the elements interactively, as well as DSP parameters either on a per-file basis or on a subgroup basis (all "Footsteps" files, or all "Music" files, for example). This approach would vastly simplify the programming side of development, as simple function calls like "Do Cue 87!" could be implemented in the host and then interpreted and executed according to the delivered script designed on the audio development side.
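
As a purely illustrative sketch (the class, function, and field names below are hypothetical, not part of any proposal), the host-side interface might reduce to something this simple, with the delivered script supplying the details of what each cue actually does:

#include <map>
#include <string>
#include <vector>

// Hypothetical sketch: the game code only issues cue numbers; the delivered
// script (exported as text from the audio development tool) tells the audio
// engine which files each cue uses and how to process them.
struct CueInstruction {                  // one entry parsed from the delivered script
    std::vector<std::string> files;      // audio/MIDI files this cue plays
    float gainDb = 0.0f;                 // per-cue DSP parameters agreed with the host
    float lowpassHz = 20000.0f;
};

class AudioEngine {
public:
    // Load the text script delivered alongside the audio content.
    void loadScript(const std::map<int, CueInstruction>& parsedScript) { script_ = parsedScript; }

    // The only call the game programmer needs: "Do Cue 87!"
    void doCue(int cueId) {
        auto it = script_.find(cueId);
        if (it == script_.end()) return;                 // unknown cue: ignore
        for (const auto& file : it->second.files)
            playFile(file, it->second.gainDb, it->second.lowpassHz);
    }

private:
    void playFile(const std::string& path, float gainDb, float lowpassHz) {
        // Platform-specific playback and DSP would go here.
    }
    std::map<int, CueInstruction> script_;
};

In this sketch the programmer's entire integration burden is a single doCue() call per game event; everything else is data authored and tuned on the audio development side.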

Notably, when all audio content is loaded (or referred to on disk) in the development application, and all function calls sent by the host are being received by the development application, the result should be an accurate playback of the entire soundtrack on the development platform. While an audio production system is probably capable of much more mixing and processing power than the host, as long as the development system were simply scaled to emulate the capabilities of the host audio system, the playback should be virtually identical. The exceptions to this are: [1] the DAC would be different, and [2] there would be greater latency than when the elements are eventually played back on the host system itself. Latency is not seen as a problem, as the intent is not to dub the playback on the development system and deliver it as "synchronized audio", but rather to simply experience all the content being played in concert as it eventually will be on the host, and to be able to make the necessary changes directly in a controlled and integrated environment.

Scenario 2: Audio Asset Playback in Host Application Controlled Remotely by Development Application

Not as attractive to the current writer as the previous scenario, this would still be way ahead of what is generally the case today. Once audio content has been integrated into the host application, its playback properties can be controlled remotely from an audio development tool. One drawback is that audio content could not be destructively altered - any destructive changes to audio content would still have to be made in an environment detached from the target application. The technological model here would be that incoming messages from the development platform override current settings in the host, and audio playback is controlled via whatever DSP capabilities native to the platform can be controlled. The changes would then exist in the script on the development side, which would be exported and delivered to the project, overwriting the previous version.

The Point

Both scenarios presented are merely intended as a "what if" - a couple of thumbnail sketches of the kind of development environment that could allow the audio/music developer the kind of control that has so conspicuously been lacking in the interactive audio development community. The "production vision" sketched above is not intended as a specific proposal or to serve as a technical specification. As to who might develop the kinds of tools discussed, one thought would be the current manufacturers of integrated MIDI/Audio authoring applications. Or perhaps the tools described above could be built to integrate into existing MIDI and/or audio development tools.

One thing is readily agreed upon: the tool set for developing audio for interactive media is vastly underdeveloped in comparison with the overwhelming popularity and financial success of the interactive products themselves. When audio developers eventually attain the level of control enjoyed in so many other, typically linear, production models, interactive applications will begin to achieve much greater levels of quality.

4. The Ultimate Development Tool

Today's audio production tools are falling short of the needs of interactive audio developers. In order for sound design and composition to flourish across a broad spectrum of playback platforms, the tools must focus on the real needs of interactive composers.

What is required of the ideal interactive composition tool?

The tool must be united with the full playback experience

The composer has instantaneous and constant access to visuals and scripts. For example, if a game composer needs to change a particular cue in the gameplay, they would point the game's rendering engine to the cue point and the sounds/music would immediately come up in the authoring tool. The author would then make changes and immediately hear their effect, all within the context of the gameplay. Doing all this requires connectivity in many directions. The audio tool must be connected to the graphical tool. The audio tool must be connected to the sound engine. All of these connections must be bi-directional.

The tool must be attuned to the composer's workflow

The composer is free to work in the linear style that comes naturally. This means that all of the slicing and dicing that's usually associated with creating interactive content happens after the "real song" is finished. So instead of being required to write 4-bars here, 4-bars there, the author can write entire pieces of music, and let the technology help to turn linear music into "segments" for interactive use.

The tool must be simple and intuitive

Existing paradigms are leveraged so the composer isn't forced to learn new terminology and processes. Composers are musicians first and foremost, so musical ideas need to be portrayed and manipulated in musical terms. This requirement actually kills two birds with one stone. Building tools that are strictly focused on "game audio" development is not economically feasible for a tools vendor. However, if the interactive features are presented in a way that makes the tool more useful to conventional musicians, then tools vendors are much more likely to build new interactive features.

The tool must be integrated with all playback platforms

Music can be written once and played anywhere. The tool needs to plug seamlessly into all delivery platforms, be it a MIDI synth over a cable, a DLS software synth, a software synth "plug-in", or an external game console. The way we reach this level of integration is via the standardized MIDI-based protocols proposed elsewhere in this document.

Integration with all platforms gets us closer to the "what you hear is what you get" goal originally put forth in General MIDI. Via integration, an audio developer can ask questions such as, "How would this sound on a cell phone?", or, "How can we make the PS2 version sound as good as the Xbox version?"

5. Towards a Command Language

Goal: In order to leverage the existing paradigms, we're going to have to connect them together. Finding a way to connect the linear content creation world to the nonlinear, interactive rendering world means allowing game composers to work in their favorite and familiar tools, while still producing media that meets the game engine's requirements.

Solution: A Command Language for nonlinear, event- and conditional- driven, interactive audio. This language must be able to communicate between different machines, and between different programs running on the same machine. It must be compatible with existing music and sound authoring tools, and should be easy for custom applications (such as game code) to use.

Method: Add Interactivity Support to MIDI.

Why MIDI?:

  • MIDI is the most suitable carrier for this command language.
  • MIDI is widely accepted, understood, and commonly implemented by the paradigms we wish to leverage and connect.
  • MIDI has a proven process, via the MMA and IA-SIG organizations, for being adapted to the evolving needs of our industry.

Section 6 of this document is a draft of a letter to the MIDI Manufacturers Association, calling for the creation of a MIDI Media Cueing (MMQ) Working Group to develop a practical implementation of the Command Language concepts outlined below.

Command Language Requirements

The nonlinear nature of MMQ, along with multi-transport cable, wireless, and API connectivity needs, results in several unique sets of requirements. The following list will hopefully illustrate the nature of these requirements. This list is not meant to be complete or comprehensive; rather, it is intended as a starting point for an MMQ working group discussion.

Nonlinear language solution

  • A language to express nonlinear audio/MIDI/Media compositions and soundscapes.

Interactive communication solution

  • A language with the ability to interact with the rendering process. An emphasis is placed on the ability for authoring/composition tools to communicate directly with the engine(s) that render the composition.

Connection

  • The language provides a communication mechanism between the audio creation/emulation tools and the graphics creation/emulation tools.
  • The language provides a connection mechanism between the audio creation tools and the runtime rendering engine(s).

Transportable Language

  • The language must be adaptable to streaming protocols, via cable or wireless transports, as well as supporting a destination API adaptation.
  • MIDI can potentially leverage the work of mLAN and the MMA Transport Layer Working Group in this area.

Branching, conditionally and interactively

  • Essential to the non-linearity of the language, provisions must be in place to support event-driven and conditional branch, jump, and looping methods.

Cue driven playback process

  • A Cue (or other suitable term) will be used to define event groupings and conditionally applied parameters.
  • A Cue, along with its parameter groups and control definitions, will refer to "segments" of playback data (sequences, media streams, etc.), allowing for multi-format stream synchronization.
  • A Cue's reference to streams of playback media will contain address information (start and end, etc.), fundamental to nonlinear access of the playback data.

Meta-Level for Orchestrating Sequences of Cues

  • To complete a fully nonlinear, looping and branching language, cues may be organized into groups (Orchestrations).
  • These Orchestrations can be used to trigger a sequence of response cues, with control parameters that affect cue looping, parallel triggering, and/or cue selection within the group.
  • Example 1: An Orchestration may be set up to select a particular cue for playback based on a trigger message received from a game engine during a particular scene.
  • Example 2: An Orchestration may be set up to trigger a sequence of cues, played end to end, with countdown looping of each cue (e.g. Cue #1 plays 5 times, Cue #3 plays 2 times, Cue #2 plays 'n' times, where 'n' is the value of a Controller). A minimal data-model sketch follows this list.
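
A minimal sketch of how such an Orchestration might be represented as data (all type and field names are hypothetical; the actual model would be defined by a working group):

#include <string>
#include <vector>

// Hypothetical data model for an Orchestration.
struct OrchestrationStep {
    std::string cueId;     // which Cue to trigger
    int repeatCount;       // countdown looping; a negative value could mean
                           // "take the repeat count from a MIDI Controller"
};

struct Orchestration {
    std::vector<OrchestrationStep> steps;   // response cues, played end to end
};

// Example 2 above, expressed as data:
const Orchestration example2 = {
    {{"Cue1", 5}, {"Cue3", 2}, {"Cue2", -1}}   // -1: 'n' comes from a Controller
};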

Bi-directional control over compositional elements

  • The language must support control methods that give the creation/authoring environment and rendering engine the ability to interact bi-directionally.

Control Parameter Evolution

  • The ability to specify the value evolution of a parameter. For example: a command specifying that volume will increase by 30 dB, logarithmically, over the next 5 measures. (A minimal computational sketch follows this list.)
  • Allows for relative parameter adjustments, essential in a nonlinear environment where the previous state of a parameter cannot be determined by the authoring system due to looping and branching factors.
  • Evolution control may be specified in terms of a delta or an absolute destination, along with scaling and curve modifiers (log, linear, dB, etc.)
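
As a sketch of what evolution control might mean computationally (purely illustrative; the actual curve and timing definitions would be up to a working group), a renderer could evaluate a relative, curved volume ramp like this:

#include <cmath>

// Hypothetical evaluation of an evolution command such as
// "increase volume by +30 dB, logarithmically, over the next 5 measures".
// 'progress' runs from 0.0 (command received) to 1.0 (end of the 5 measures).
float evolveDb(float startDb, float deltaDb, float progress)
{
    // Log-shaped curve: fast at first, flattening toward the target; maps 0..1 to 0..1.
    float curved = std::log10(1.0f + 9.0f * progress);
    return startDb + deltaDb * curved;
}

// Because the delta is relative ("+30 dB from wherever you are now"), the
// authoring system never needs to know the absolute state left behind by
// earlier looping or branching - the point of the second bullet above.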

Combinatorial Parameterization (META Parameters)

  • The ability to build groups of control parameters that, as a group, are subject to all of the parameter adjustment, modification, and evolution control available to a single parameter.

High-Level Control Media Handling / Loading

  • The language must support the ability to express data orientation that can be adapted to streaming content delivery, load prioritization, and indirection.

Quality of Service Callbacks / Hardware Compatibility Queries

  • A query mechanism, with a set of defined responses, allows a playback system to identify the rendering capabilities available to it. This gives the playback system an opportunity to modify and/or select playback/control data appropriately for a given set of capabilities.
  • The query/response map might be organized by capability category and function, many of which are outlined in Section 2 of this document (GIA). (A minimal sketch follows this list.)
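
One possible shape for the query/response mechanism, sketched here as ordinary data structures rather than as actual MIDI messages (which a working group would define; all names and category strings are hypothetical):

#include <map>
#include <string>

// Hypothetical capability report, keyed by the categories outlined in
// Section 2 (GIA), e.g. "synth.voices", "audio.channels.3d", "fx.reverb".
using CapabilityReport = std::map<std::string, int>;

inline CapabilityReport queryRenderer()
{
    // Stub: a real implementation would issue the (yet-to-be-defined)
    // MIDI query and parse the response from the rendering engine.
    return { {"synth.voices", 32}, {"audio.channels.3d", 8}, {"fx.reverb", 1} };
}

// The playback/authoring system then selects or scales content to fit.
inline bool supports3D(const CapabilityReport& caps)
{
    auto it = caps.find("audio.channels.3d");
    return it != caps.end() && it->second > 0;
}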

MIDI Channels

  • The current implementation of MIDI Channelization must be evaluated for use with this language.
  • Again, the work of mLAN and the MMA Transport Layer Working Group offers solutions in this area that can be leveraged.

Define common terms for future work

  • In order to convey descriptions and concepts, a common set of terms, along with descriptions, is necessary. We recommend collaboration with a number of other working groups that have expressed a desire to agree on common terms and metaphors that describe the components involved in interactive composition.

6. Practical Steps:
Proposal for MIDI Media Cueing (MMQ)

The Foghorn group recommends that the MMA undertake the development and promotion of a Recommended Practice (RP) for MIDI Media Cueing (MMQ). This RP would specify a mechanism for controlling a player's handling and playback of MIDI and digital audio media by sending it MIDI messages. This would have the same essential control function as traditional function call-based APIs, but would make it possible to drive those functions with MIDI. When stored in a Standard MIDI File (SMF) with intervening delta-times, these messages could be used to create 'playlists' of MIDI files and audio clips.

This scheme constitutes a cross-platform standard for a cueing mechanism which is data-driven, transmittable, and responsive to real-time events. (See also: 1999 BBQ Report, "Q" Group.) Such a mechanism would catalyze a speedy solution to several long-standing problems in interactive music and sound development - namely the need for adequate tools for editing the interactive response of media, and for auditioning music and sound in context while the game or application is running. It would also capitalize on the software plug-in interfaces of modern MIDI+Audio sequencers.

MMQ functions may potentially be useful in all interactive media application areas including games, multimedia, web sonification, device user interfaces, theater, and DJ performance.

There are two aspects to the scheme: a data structure called a Cue, and a set of new MIDI messages for commands to handle and play Cues. The MIDI messages would control the setup and playback of the music and sound media stored in the Cues.

Scenario

Currently, interactive software titles ('games' for short) generally produce interactive audio experiences by driving a software sound system in real time via its native API. (The software sound system in turn drives an audio rendering layer which may be software or hardware.) Music and sound media is created by audio artists, then delivered to the application programmer who 'integrates' it into the software build, and triggers specific MIDI and digital audio files using the API. This process imposes a separation in time and space between the creation of the audio media and the complete experience of the game (sound, picture, and interactivity). This separation makes it very difficult for the audio artist to fine-tune the content, compromising quality.

In MIDI Media Cueing, the game's communication with its sound system changes from direct API calls to real-time MIDI messages. This means adding a 'shell' software layer between the game and the sound system, using MMQ messages as the communication medium.
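
A minimal sketch of such a shell, assuming a hypothetical MMQ message sender and illustrative command codes (none of which are defined by any existing standard):

#include <cstdint>
#include <string>

// Hypothetical MMQ transport: during development this routes messages out a
// MIDI cable to the audio artist's machine; in the shipping product it hands
// them to the game's own sound system.
struct MmqSender {
    void send(uint8_t command, const std::string& cueId) {
        // Write the message to a MIDI output port, or deliver it in-process.
    }
};

// The 'shell' layer: the game keeps calling familiar functions, but the
// communication medium underneath is now MMQ messages.
class MmqShell {
public:
    explicit MmqShell(MmqSender& out) : out_(out) {}
    void preloadCue(const std::string& id) { out_.send(kPreload, id); }
    void playCue(const std::string& id)    { out_.send(kPlay, id); }
    void stopCue(const std::string& id)    { out_.send(kStop, id); }
    void unloadCue(const std::string& id)  { out_.send(kUnload, id); }
private:
    static constexpr uint8_t kPreload = 0x01, kPlay = 0x02, kStop = 0x03, kUnload = 0x04;  // illustrative codes only
    MmqSender& out_;
};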

In the final distributed version of the game, the sound system performs all the same functions as it would if MMQ were not there. MMQ is just the control conduit for the same jobs the sound system would normally be doing anyway.

[Diagram: In Final Product]

While the game is in development, however, these real-time MIDI messages can be redirected away from the game sound system, and instead sent to the audio artist's computer (or an MMQ tool running on the same machine as the game).

[Diagram: During Game Development]

This is where the advantage of MMQ lies, because the MMQ tool will produce exactly (or essentially) the same audio performance that the game sound system would. This happens because both the MMQ tool and the game sound system would implement the MMQ MIDI Messages according to the same standard. The difference is that the MMQ tool allows the music or sound artist to make rapid changes in the audio content to improve the total audio/video/interactivity experience, using their normal editing tools, and without having to wait for the content to be delivered to the programmer for 'integration.' This allows the content to be fine-tuned to a high degree before delivery, which experience indicates is likely to result in great improvements to the title's overall product quality.

When the audio artist is ready to make a content delivery, the MMQ tool would export a file of Cues. For the programmer, 'integration' would be simpler, perhaps meaning just replacing the previous Cue file with the new one.

Standardization of MMQ MIDI messages may motivate manufacturers of existing commercial MIDI + audio sequencers to add MMQ functions to their products. In other words, the MMQ tool is the tool the audio artist is already using (or can easily learn to use). Keeping the MMQ feature set small would increase the likelihood of implementation in commercial products.

While it could be argued that using MMQ between the game code and the game sound system in the final product introduces an unnecessary inefficiency, in practice this overhead can be made very small, and clever use of conditional macros may eliminate it entirely. In any event, the potential for improvement in the quality of final product should outweigh such concerns.

The Cue Data Structure

A Cue would be defined as a data structure containing an ID, a chunk of playable MIDI or audio media, and perhaps a block of setup information:

Cue:

ID - Number or Name
Media Clip - SMF or Digital Audio chunk
Setup Info - Controller response definitions, etc.

The ID may be a name, or perhaps a number (the Working Group can decide this). There should probably be a MediaType field as well, to indicate whether the Media Clip is SMF or Digital Audio.
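
In rough C++ terms, the structure described above might look something like this (the MediaType field and the member names are the group's suggestions, not a defined format):

#include <cstdint>
#include <string>
#include <vector>

// Sketch of the proposed Cue data structure.
enum class MediaType : uint8_t { SMF, DigitalAudio };   // the suggested extra field

struct Cue {
    std::string          id;         // number or name - the Working Group can decide
    MediaType            type;       // whether mediaClip holds SMF or digital audio data
    std::vector<uint8_t> mediaClip;  // the playable MIDI or audio chunk
    std::vector<uint8_t> setupInfo;  // controller response definitions, etc.
};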

Collections of Cues could be stored together in files in a standard way. XMF may be an appropriate container format for Cue files, as it could allow the Media Clip to reference external files (for sharing & modularity).

MMQ MIDI Messages

The MMQ MIDI Messages would be newly-defined MIDI messages that manipulate Cues. Messages would refer to a specific Cue by using the ID stored in the Cue data structure.

At a minimum, the MMQ MIDI Messages should include commands to load, play, stop, and unload specific Cues. The following list illustrates this essential idea:

Command Messages:

Preload Cue <ID>
Play Cue <ID>
Stop Cue <ID>
Unload Cue <ID>

Example: If a digital audio clip called Footstep3 exists, then sending the MIDI message Play Cue <Footstep3> would cause the receiver to play that clip.
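
Sketched as receiver-side logic (the actual MIDI encoding of these commands is deliberately left open; the command names simply mirror the list above):

#include <string>
#include <unordered_map>

// Hypothetical receiver-side dispatch of the four essential MMQ commands.
enum class MmqCommand { Preload, Play, Stop, Unload };

class CuePlayer {
public:
    void handle(MmqCommand cmd, const std::string& cueId) {
        switch (cmd) {
            case MmqCommand::Preload: loaded_[cueId] = true;  break;  // fetch the Cue's media into memory
            case MmqCommand::Play:    if (loaded_[cueId]) startPlayback(cueId); break;
            case MmqCommand::Stop:    stopPlayback(cueId);    break;
            case MmqCommand::Unload:  loaded_.erase(cueId);   break;  // release the media
        }
    }
private:
    void startPlayback(const std::string& cueId) { /* render the Cue's media clip */ }
    void stopPlayback(const std::string& cueId)  { /* halt any playing instance */ }
    std::unordered_map<std::string, bool> loaded_;
};

// Receiving "Play Cue <Footstep3>" would arrive here as
// player.handle(MmqCommand::Play, "Footstep3");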

It may be useful for MMQ messages to include a ContextReference parameter, so that when multiple copies of a single cue are playing, a command can be addressed to the desired instance.

It is likely that the need for many other commands will be identified (see Section 5, Towards a Command Language). For example, Pause and Resume will also be useful, and there will probably need to be commands for selecting Cue files.

In addition to high-level commands, MMQ could support continuously changing control parameters for a Cue as it plays: volume; mix controls such as subgroups, mute, and solo; spatial position; variation and intensity controls; and so forth. MIDI Continuous Controller messages may be the most appropriate mechanism for this class of control. If Cues are able to respond to unique logical controllers, or other nonstandard parameters, such controller response definitions could be represented in data and stored with the Cue.
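
As a sketch only (the controller assignment here is invented for illustration), a playing Cue's volume might track MIDI Continuous Controller 7, with the Cue's Setup Info free to redefine such mappings:

#include <algorithm>
#include <cstdint>

// Hypothetical mapping of a MIDI Continuous Controller to a playing Cue's volume.
class PlayingCue {
public:
    void onControlChange(uint8_t controller, uint8_t value) {
        if (controller == 7)   // CC 7 (Channel Volume), used here purely as an example
            gain_ = std::clamp(value / 127.0f, 0.0f, 1.0f);
    }
    float gain() const { return gain_; }
private:
    float gain_ = 1.0f;
};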

There will likely be a need for looping, conditional branching, and conditional waiting within SMF tracks. There may also be a need for a 'meta' level for orchestrating sequences of Cues. We would recommend that such SMF flow-of-control work also be undertaken in coordination with the MMQ work.

7. Conclusion

Developing content for interactive audio applications is currently an unwieldy and complicated process. Frequently, composers and sound designers must submit their files to the programmers and wait days or weeks until the next build to judge the appropriateness of their work. Producing music, sound effects and voice-overs for games, Internet and wireless applications can be difficult, costly and frustrating because of the inability to audition audio tracks in the interactive environment.

The Foghorn Group sees a multi-pronged solution to these problems:

  • A standard platform specification, which we are calling General Interactive Audio, should be created to define baseline playback features and capabilities.
  • A Command Language should be created to define the data and parameters required for interactive audio development. This language should be used by game programmers to request audio element playback during gameplay, and would facilitate interactive auditioning by audio content producers, which would dramatically improve the quality of content.
  • Communication between the program and the audio engineer can be improved by leveraging a system already familiar to most musicians working in the digital age: use MIDI as the communications protocol to transmit the Command Language. The MMA should develop a standard for MIDI Media Cueing (MMQ), including a standard Cue File format.
  • We'll need tools for creating, editing, auditioning, and exporting this Cue File format. Since musicians and sound designers already create tracks using a variety of editing and sequencing software, interactive audio functionality should be added to the existing tools, using GIA as the player and MIDI Media Cueing as the command language.
  • Game software development systems will have to be modified somewhat to use GIA/MMQ, in order to take advantage of its on-the-fly auditioning capabilities.

Audio content producers are currently stumbling around in a fog, trying to create compelling soundtracks for interactive environments while avoiding the jagged rocks and dangerous shoals of incompatible platforms and impossible schedules. We recommend that this report be considered by the IA-SIG and the MMA as a call to action, symbolized by the sound of a foghorn:

BWAAAAAAAAAA - MUAAAAAHHHHH!!!
