The Fifth Annual Interactive
PROJECT BAR-B-Q 2000
Group Report: General Interactive Audio
Participants (A.K.A. "Foghorn"):
Clint Bajakian; C.B. Studios
Peter Drescher; Twittering Machine
Duane Ford; Staccato Systems, Inc.
Chris Grigg; Beatnik
Jennifer Hruska; Sonic Implants Network
Mike Kent; Roland Corporation
Ron Kuper; Cakewalk
Mike Overlin; Yamaha Corporation
Rob Rampley; Line 6
Facilitator: Aaron Higgins; MixMeister Technology
Project BBQ's ongoing mandate is to influence the development of music on computers over the next five years. As of late 2000, all signs suggest that our next major context change is going to be that the field we've come to know as 'music on computers' is about to expand to include all types of media-capable devices - not just high-performance computers, game machines, and internet appliances, but also small, mobile, wirelessly networked devices - and everything in between. These products are likely to be specified by engineers and market researchers lacking computer music experience, and the number of different products is likely to be large. Very large.
There is a clear and present danger that years of IA-SIG and MMA standards work will be overlooked in the development of these devices, and that a new Babel of incompatible, device-dependent music and sound media formats will arise. If this happens, content will suffer - and by extension, so will everyone who has to listen to the damn things. And so will everyone who has to create audio content for them - history has shown that the more formats there are, the more the tools suck.
While the situation is serious - indeed, urgent - it is far from hopeless. We do have viable trade organizations, and we can use them as megaphones for a message targeted squarely at device and tool manufacturers.
Firstly, the Foghorn group recommends that the IA-SIG undertake the creation of a document detailing our industry's recommended system specifications for interactive data-driven music and sound playback, relying on open standards and appropriate for all platforms: General Interactive Audio (GIA). Because some of these new devices will be less or more powerful than current desktop/laptop machines, the specification should include guidelines for how the system and the media should scale up or down to match the device's capabilities.
Secondly, Foghorn has considered the problem of Authoring Tools for interactive music and sound. We have articulated the tool problems that conspire to limit the quality of audio for games and other interactive media. Taking these problem statements as inspiration, we've also begun to characterize the abstract requirements of a 'Command Language' that might connect the programmer's coding world with the audio artist's media world during the development process. This language contains elements of the Cueing concept (as explored by BBQ 1999's "Q" Group and others), and would be a basis for the 'Integrator' tool that The Fat Man and others have sought since the dawn of interactive audio.
While the particulars of such a language will require further definition, we have also gone a step further and developed a rough proposal for extending the MIDI standards to accommodate this 'Command Language' in the form of a new set of MIDI messages. Expressing the language in MIDI form is an intensely practical step, as it opens the door to direct support for interactivity authoring in the pro MIDI+Audio sequencer products that content creators already use in their daily work. Imagine the improvement in content quality that would result if we could just run a MIDI cable from the game box to the sound artist's computer, so that all sound and music elements could be fine-tuned before the delivery to the programmer, in context with the actual gameplay and visuals under development.
Interactive audio has always been characterized by too many incompatible playback platforms and too few good tools. By speaking with a unified and authoritative voice now, our industry can provide a warning signal to steer device and tool designers in the right direction, and not die a horrible scraping death on the rocks of their own ignorance.
The members of the Project BBQ Conference of October 2000 recommend that the IA-SIG and MMA undertake the development of a new specification, white paper, recommended practice, or other similar document. The purpose of this specification is to define the necessary or recommended components of a well-designed audio system for any device or system that is expected to provide interactive audio applications.
This specification should define the components necessary to create compelling digital audio environments for computers, video game systems, Internet appliances, cell phones, wireless devices, handheld organizers and other audio producing systems. Typical software applications that would use the features of these devices include games, Internet audio, multimedia software, music production systems, video and DVD systems.
The following paper is an example outline prepared by members of the Project BBQ 2000 Foghorn Working Group. It is not intended to define any specific requirements or recommendations. It is offered as an example outline of the type of topics and required features that the IA-SIG and MMA should consider for inclusion in this document. It is not an exhaustive list of necessary items.
The GIA specification document should be written for an audience of decision-makers and management not necessarily familiar with electronic and computer music concepts, but should include enough concrete detail to guide implementors. It should primarily rely on references to preexisting specification documents (e.g. all MIDI and IA-SIG RPs and CAs) for particulars.
After preparing such a new specification, the IA-SIG and MMA should distribute and promote this specification and its recommendations to hardware device and audio semiconductor manufacturers, API designers, development tool suppliers and content providers.
GIA Specification Outline
Tools with interface designed to suit Content Creators
Audition while Authoring ("What You Hear Is What You Get")
Linear & Nonlinear Audio Authoring Tools
Creation and Translation to and from all common formats
Cue List Oriented
MIDI Media Cue List
Content Format Support
Number of Channels/Voices
Number of Channels with Interactive 3D Positioning
Number of Sub Groups (Application Output)
Number of Main Outputs (Physical Output)
Control of Playback Platform
Quality of Service
Guidelines for Downward Scalability
The developer of sound and music content for interactive media has typically had no direct control over the playback behavior of the content on the target platform, which limits both product quality and development productivity.
Being denied the ability to directly affect the playback performance of audio content in the end target application, sound and music developers have been relegated to the difficult task of merely estimating how various audio elements will combine in the end product. They must therefore employ the practice of educated approximation over that of intuitive precision. Too often necessary adjustments need to be made "off-line", that is, in some audio or music development environment divorced from the target, often taking days or weeks, dragging down the production process and compromising quality.
What is clearly needed is the technology to allow the audio/music professional to develop and adjust audio content in direct integration with the target or "host" application.
It seems the best way to accomplish this would be to set up an interface between the audio development platform and the interactive media platform itself. There should be two-way communication between the host platform running the entertainment application and the audio development environment. In one scenario, information flows from the host to the developer: cues identical to those the host application program calls internally are echoed over a virtual or actual cable to the development application, which in turn manipulates all the various audio content, resulting in a real-time performance of the audio track. This requires that the audio content reside on the development platform. In the other direction, the development platform could transmit calls to the host platform in order to control the real-time playback of content in the host. This requires that the content reside on the host platform. Any problems detected with any aspect of the playback of the audio track, such as problems with balance or tone, can then be immediately fixed in the development application. Files can be destructively adjusted, or alterations can be made to the script that will control the content in the target, or host, application. Adjustments can then be immediately tested for verification and tweaking if necessary.
Each target platform has a different audio architecture, in both software and hardware, so it makes sense that a different approach must be adopted on the development side to address each one. In general, though, the concept outlined herein applies to all interactive platforms.
Scenario 1: Audio Asset Playback in Development Application Controlled Remotely by Host Application
The host application uses a common language, such as MIDI messages, to control audio content. In addition to controlling whatever audio has been integrated internally, these calls are routed externally to a development application that is effectively slaved to the host. That application organizes the audio content and plays it back analogously to how the host would ultimately play it back once it is integrated into the host application.
Two types of slaved application are seen as important: (1) audio or music authoring tools for the discrete elements that will be members of the eventual full soundtrack; and (2) a full soundtrack mixing and mastering tool with playlists for all audio elements (dialog, music and effects). The latter would provide control over the playback quality of the various content, either by allowing destructive editing ("Open in Waveform Editor" as a menu command, for example) or through software DSP adjustment instructions that would be saved in a script. These parameters would have to be compatible with the host DSP capabilities. Ideally, the tool could perform some relatively simple adjustments destructively, such as Gain, Compression and EQ. Better still, it could support the many excellent software DSP plug-ins available, and be able to run batch processes on designated subsets of files.
A delivery to the project would consist of all audio content plus a script that is automatically exported as text from the development application. The language of the script would contain calls identical to the calls in the host application. The script would contain instructions as to how the host application must manipulate the elements interactively, as well as DSP parameters either on a discrete file basis or on a subgroup basis (all "Footsteps" files, or all "Music" files, for example). This approach would vastly simplify the programming side of development: simple function calls like "Do Cue 87!" could be implemented in the host and would in turn be interpreted and executed by the enclosed script designed on the audio development side.
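As a rough illustration of such a delivery, the sketch below parses a small exported text script and resolves a host-side call like "Do Cue 87" into the playback instructions authored on the audio side. The script syntax, the field names (file, gain_db, group), the file names, and the function names are all hypothetical; this report does not define a script format.

```python
from dataclasses import dataclass

# Hypothetical exported script: one cue per line, "cue <number> key=value ..."
SCRIPT_TEXT = """
cue 87 file=door_slam.wav gain_db=-3.0 group=Effects
cue 88 file=theme_loop.mid gain_db=0.0 group=Music
"""

@dataclass
class CueAction:
    file: str        # media file the cue plays
    gain_db: float   # per-file DSP parameter chosen by the audio developer
    group: str       # subgroup the file belongs to

def parse_script(text):
    """Turn the exported text script into a {cue_number: CueAction} table."""
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        parts = line.split()
        if parts[0] != "cue":
            raise ValueError(f"unrecognized script line: {line!r}")
        number = int(parts[1])
        fields = dict(p.split("=", 1) for p in parts[2:])
        table[number] = CueAction(
            fields["file"], float(fields["gain_db"]), fields["group"]
        )
    return table

def do_cue(table, number):
    """What the host's simple 'Do Cue N' call would resolve to."""
    return table[number]

table = parse_script(SCRIPT_TEXT)
action = do_cue(table, 87)
print(action.file, action.gain_db, action.group)
```

The point of the design is that the host only ever needs the cue number; everything else lives in the script and can be replaced without touching host code.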
Notably, when all audio content is loaded (or referred to on disk) in the development application, and all function calls sent by the host are being received by the development application, the result should be an accurate playback of the entire soundtrack on the development platform. An audio production system other than the host's is probably capable of much more mixing and processing power, but as long as the development system is scaled to emulate the capabilities of the host audio system, the playback should be virtually identical. There are two exceptions: (1) the DAC would be different, and (2) there would be greater latency than when the elements are eventually played back on the host system itself. Latency is not seen as a problem, as the intent is not to dub the playback on the development system and deliver it as "synchronized audio", but rather to simply experience all the content being played in concert as it will eventually be on the host, and to be able to make the necessary changes directly in a controlled and integrated environment.
Scenario 2: Audio Asset Playback in Host Application Controlled Remotely by Development Application
Not as attractive to the current writer as the previous scenario, this would still be way ahead of what is generally the case today. Once audio content has been integrated into the host application, the playback properties can be controlled remotely from an audio development tool. One less attractive quality of this is that audio content wouldn't be able to be destructively altered - any destructive changes to audio content would still have to be made in an environment detached from the target application. The technological model here would be that the incoming messages from the development platform would override current settings in the host, and audio playback would be controlled via whatever DSP capabilities native to the platform could be controlled. The changes then would exist in the script on the development side which would be exported and delivered to the project, overwriting the previous version.
Both scenarios presented are merely intended as a "what if", a couple thumbnail sketches of the kind of development environment that could allow the audio/music developer the kind of control that has so conspicuously been lacking in the interactive audio development community. The "production vision" sketched above is not intended as a specific proposal or to serve as a technical specification. As to who might develop the kinds of tools discussed, one thought would be the current manufacturers of integrated MIDI / Audio authoring applications. Or perhaps the tools described above could be built to integrate into existing MIDI and/or audio development tools.
One thing readily agreed upon, the tool set for developing audio for interactive media is vastly underdeveloped in comparison with the overwhelming popularity and financial success of the interactive products themselves. When audio developers do eventually attain the level of control enjoyed in so many other, typically linear, production models, interactive applications will begin to achieve much greater levels of quality.
Today's breed of audio production tools are falling short of the needs of interactive audio developers. In order for sound design and composition to flourish across a broad spectrum of playback platforms, the tools must focus on the real needs of interactive composers.
What is required of the ideal interactive composition tool?
The tool must be united with the full playback experience
The composer has instantaneous and constant access to visuals and scripts. For example, if a game composer needs to change a particular cue in the gameplay, they would point the game's rendering engine to the cue point and the sounds/music would then immediately come up in the authoring tool. The author would then make changes and immediately hear their effect, all while in the context of the gameplay. Doing all this requires connectivity in many directions. The audio tool must be connected to the graphical tool. The audio tool must be connected to the sound engine. All of these connections must be bi-directional.
The tool must be attuned to the composer's workflow
The composer is free to work in the linear style that comes naturally. This means that all of the slicing and dicing that's usually associated with creating interactive content happens after the "real song" is finished. So instead of being required to write 4-bars here, 4-bars there, the author can write entire pieces of music, and let the technology help to turn linear music into "segments" for interactive use.
The tool must be simple and intuitive
Existing paradigms are leveraged so the composer isn't forced to learn new terminology and processes. Composers are musicians first and foremost, so musical ideas need to be portrayed and manipulated in musical terms. This requirement actually kills 2 birds with one stone. Building tools that are strictly focused on "game audio" development is not economically feasible for a tools vendor. However, if the interactive features are presented in a way that makes the tool more useful to conventional musicians, then tools vendors are much more likely to build new interactive features.
The tool must be integrated with all playback platforms
Music can be written once and played anywhere. The tool needs to plug seamlessly into all delivery platforms, be it a MIDI synth over a cable, a DLS software synth, a software synth "plug-in", or an external game console. The way we reach this level of integration is via the standardized MIDI-based protocols proposed elsewhere in the document.
Integration with all platforms gets us closer to the "what you hear is what you get" goal, originally put forth in General MIDI. Via integration an audio developer can ask questions such as, "How would this sound on a cell phone?", or, "How can we make the PS2 version sound as good as the XBox version?"
Goal: In order to leverage the existing paradigms, we're going to have to connect them together. Finding a way to connect the linear content creation world to the nonlinear, interactive rendering world means allowing game composers to work in their favorite and familiar tools, while still producing media that meets the game engine's requirements.
Solution: A Command Language for nonlinear, event- and condition-driven, interactive audio. This language must be able to communicate between different machines, and between different programs running on the same machine. It must be compatible with existing music and sound authoring tools, and should be easy for custom applications (such as game code) to use.
Method: Add Interactivity Support to MIDI.
Section 6 of this document is a draft of a letter to the MIDI Manufacturers Association, calling for the creation of a MIDI Media Cueing Working Group (MMQ) to develop a practical implementation of the Command Language concepts outlined below.
Command Language Requirements
The nonlinear nature of MMQ, along with multi-transport cable, wireless, and API connectivity needs, results in several unique sets of requirements. The following list will hopefully illustrate the nature of these requirements. This list is not meant to be complete or comprehensive; rather, it is intended as a starting point for an MMQ working group discussion.
Nonlinear language solution
Interactive communication solution
Branching, conditionally and interactively
Cue driven playback process
Meta-Level for Orchestrating Sequences of Cues
Bi-directional control over compositional elements
Control Parameter Evolution
Combinatorial Parameterization (META Parameters)
High-Level Control Media Handling / Loading
Quality of Service Callbacks / Hardware Compatibility Queries
Define common terms for future work
The Foghorn group recommends that the MMA undertake the development and promotion of a Recommended Practice (RP) for MIDI Media Cueing (MMQ). This RP would specify a mechanism for controlling a player's handling and playback of MIDI and digital audio media by sending it MIDI messages. This would have the same essential control function as traditional function call-based APIs, but would make it possible to drive those functions with MIDI. When stored in a Standard MIDI File (SMF) with intervening delta-times, these messages could be used to create 'playlists' of MIDI files and audio clips.
This scheme constitutes a cross-platform standard for a cueing mechanism which is data-driven, transmittable, and responsive to real-time events. (See also: 1999 BBQ Report, "Q" Group) Such a mechanism would catalyze a speedy solution to several long-standing problems in interactive music and sound development - namely the need for adequate tools for editing the interactive response of media, and for auditioning music and sound in context while the game or application is running. It would also capitalize on the software plug-in interfaces of modern MIDI+Audio sequencers.
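To make the 'playlist' idea concrete, here is a minimal sketch of how cue commands might be laid out as SMF track events separated by delta-times. The variable-length delta-time encoding and the F0 / length / data / F7 framing of a System Exclusive event are standard SMF; the command body itself is an assumption made for illustration: 0x7D is the manufacturer ID reserved for non-commercial use, and the "Play Cue" opcode and one-byte numeric Cue ID are invented here. The result is track event data only, not a complete file with header chunks.

```python
def encode_vlq(value):
    """Encode a non-negative integer as an SMF variable-length quantity."""
    if value < 0:
        raise ValueError("delta-time must be non-negative")
    out = [value & 0x7F]            # last byte: high bit clear
    value >>= 7
    while value:
        out.append((value & 0x7F) | 0x80)  # earlier bytes: high bit set
        value >>= 7
    return bytes(reversed(out))

PLAY_CUE = 0x02  # hypothetical opcode, not defined by any MIDI specification

def play_cue_body(cue_id):
    """Hypothetical command body: manufacturer ID, opcode, numeric Cue ID."""
    if not 0 <= cue_id <= 0x7F:
        raise ValueError("SysEx data bytes must fit in 7 bits")
    return bytes([0x7D, PLAY_CUE, cue_id])

def smf_sysex_event(delta_ticks, body):
    """One SMF track event: delta-time, F0, length, then body ending in F7."""
    data = body + b"\xF7"
    return encode_vlq(delta_ticks) + b"\xF0" + encode_vlq(len(data)) + data

# Play cue 1 immediately, then cue 2 after 960 ticks.
playlist = [(0, play_cue_body(1)), (960, play_cue_body(2))]
track_data = b"".join(smf_sysex_event(d, b) for d, b in playlist)
print(track_data.hex(" "))
```

Because the timing lives in ordinary delta-times, any sequencer that can edit SMF events could in principle reorder or retime such a playlist.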
MMQ functions may potentially be useful in all interactive media application areas including games, multimedia, web sonification, device user interfaces, theater, and DJ performance.
There are two aspects to the scheme: a data structure called a Cue, and a set of new MIDI messages for commands to handle and play Cues. The MIDI messages would control the setup and playback of the music and sound media stored in the Cues.
Currently, interactive software titles ('games' for short) generally produce interactive audio experiences by driving a software sound system in real time via its native API. (The software sound system in turn drives an audio rendering layer which may be software or hardware.) Music and sound media is created by audio artists, then delivered to the application programmer who 'integrates' it into the software build, and triggers specific MIDI and digital audio files using the API. This process imposes a separation in time and space between the creation of the audio media and the complete experience of the game (sound, picture, and interactivity). This separation makes it very difficult for the audio artist to fine-tune the content, compromising quality.
In MIDI Media Cueing, the game's communication with its sound system changes from direct API calls to real-time MIDI messages. This means adding a 'shell' software layer between the game and the sound system, using MMQ messages as the communication medium.
In the final distributed version of the game, the sound system performs all the same functions as it would if MMQ were not there. MMQ is just the control conduit for the same jobs it would normally be doing anyway.
While the game is in development, however, these real-time MIDI messages can be redirected away from the game sound system, and instead sent to the audio artist's computer (or an MMQ tool running on the same machine as the game).
This is where the advantage of MMQ lies, because the MMQ tool will produce exactly (or essentially) the same audio performance that the game sound system would. This happens because both the MMQ tool and the game sound system would implement the MMQ MIDI Messages according to the same standard. The difference is that the MMQ tool allows the music or sound artist to make rapid changes in the audio content to improve the total audio/video/interactivity experience, using their normal editing tools, and without having to wait for the content to be delivered to the programmer for 'integration.' This allows the content to be fine-tuned to a high degree before delivery, which experience indicates is likely to result in great improvements to the title's overall product quality.
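The redirection described above can be sketched as a thin shell object that the game talks to instead of the sound system. Everything named here (MMQShell, Recorder, the tuple used as a message) is hypothetical and stands in for whatever a working group would actually define.

```python
class Recorder:
    """Stand-in for any MMQ receiver; it just records what it is sent."""
    def __init__(self, name):
        self.name = name
        self.received = []

    def receive(self, message):
        self.received.append(message)

class MMQShell:
    """Hypothetical shell layer between the game and its sound system."""
    def __init__(self, sound_system, tool=None):
        self.sound_system = sound_system
        self.tool = tool
        self.redirect_to_tool = False  # set True during development

    def send(self, message):
        # Same message either way; only the destination changes.
        if self.redirect_to_tool and self.tool is not None:
            self.tool.receive(message)
        else:
            self.sound_system.receive(message)

game_sound = Recorder("game sound system")
artist_tool = Recorder("MMQ tool")
shell = MMQShell(game_sound, artist_tool)

shell.send(("play", "Footstep3"))   # shipped behavior
shell.redirect_to_tool = True
shell.send(("play", "Footstep3"))   # development behavior
```

The game code is identical in both cases, which is what lets the tool reproduce the performance the game sound system would give.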
When the audio artist is ready to make a content delivery, the MMQ tool would export a file of Cues. For the programmer, 'integration' would be simpler, perhaps meaning just replacing the previous Cue file with the new one.
Standardization of MMQ MIDI messages may motivate manufacturers of existing commercial MIDI + audio sequencers to add MMQ functions to their products. In other words, the MMQ tool is the tool the audio artist is already using (or can easily learn to use). Keeping the MMQ feature set small would increase the likelihood of implementation in commercial products.
While it could be argued that using MMQ between the game code and the game sound system in the final product introduces an unnecessary inefficiency, in practice this overhead can be made very small, and clever use of conditional macros may eliminate it entirely. In any event, the potential for improvement in the quality of final product should outweigh such concerns.
The Cue Data Structure
A Cue would be defined as a data structure containing an ID, a chunk of playable MIDI or audio media, and perhaps a block of setup information.
The ID may be a name, or perhaps a number (the Working Group can decide this). There should probably be a MediaType field as well, to indicate whether the Media Clip is SMF or Digital Audio.
Collections of Cues could be stored together in files in a standard way. XMF may be an appropriate container format for Cue files, as it could allow the Media Clip to reference external files (for sharing & modularity).
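One possible shape for this structure is sketched below. The field names, the choice of a string ID, and the dictionary used as a "Cue file" are assumptions for illustration only; the report leaves these decisions to the Working Group, and no container format is modeled here.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional

class MediaType(Enum):
    SMF = "smf"
    DIGITAL_AUDIO = "digital_audio"

@dataclass
class Cue:
    cue_id: str                         # could equally be a number
    media_type: MediaType               # SMF or Digital Audio
    media_clip: bytes = b""             # embedded media, if any
    external_ref: Optional[str] = None  # or a reference to an external file
    setup: dict = field(default_factory=dict)  # optional setup block

def build_cue_file(cues):
    """Collect Cues into a table keyed by ID, rejecting duplicate IDs."""
    table = {}
    for cue in cues:
        if cue.cue_id in table:
            raise ValueError(f"duplicate Cue ID: {cue.cue_id}")
        table[cue.cue_id] = cue
    return table

cue_file = build_cue_file([
    Cue("Footstep3", MediaType.DIGITAL_AUDIO, external_ref="footstep3.wav"),
    Cue("Theme", MediaType.SMF, setup={"tempo_scale": 1.0}),
])
```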
MMQ MIDI Messages
The MMQ MIDI Messages would be newly-defined MIDI messages that manipulate Cues. Messages would refer to a specific Cue by using the ID stored in the Cue data structure.
At a minimum, the MMQ MIDI Messages should include commands to load, play, stop, and unload specific Cues. The following list illustrates this essential idea:
Load Cue <ID>
Play Cue <ID>
Stop Cue <ID>
Unload Cue <ID>
Example: If a digital audio clip called Footstep3 exists, then sending the MIDI message Play Cue <Footstep3> would cause the receiver to play that clip.
It may be useful for MMQ messages to include a ContextReference parameter, so that when multiple copies of a single cue are playing, a command can be addressed to the desired instance.
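A receiver for these commands might behave roughly as follows. This is a sketch under stated assumptions: the class and method names are invented, and having the receiver hand out the ContextReference when a Cue starts playing is only one of several ways the reference could be assigned.

```python
class CuePlayer:
    """Hypothetical receiver for load / play / stop / unload commands."""
    def __init__(self, cue_file):
        self.cue_file = cue_file   # {cue_id: media}
        self.loaded = set()
        self.playing = {}          # context reference -> cue_id
        self._next_context = 1

    def load(self, cue_id):
        if cue_id not in self.cue_file:
            raise KeyError(f"no such Cue: {cue_id}")
        self.loaded.add(cue_id)

    def play(self, cue_id):
        if cue_id not in self.loaded:
            raise RuntimeError(f"Cue not loaded: {cue_id}")
        context = self._next_context   # identifies this playing instance
        self._next_context += 1
        self.playing[context] = cue_id
        return context

    def stop(self, context):
        self.playing.pop(context, None)

    def unload(self, cue_id):
        if cue_id in self.playing.values():
            raise RuntimeError(f"Cue still playing: {cue_id}")
        self.loaded.discard(cue_id)

player = CuePlayer({"Footstep3": b"placeholder audio"})
player.load("Footstep3")
first = player.play("Footstep3")
second = player.play("Footstep3")   # a second, overlapping instance
player.stop(first)                  # stops only the first instance
remaining = dict(player.playing)
```

Without the context reference, a Stop command naming only Footstep3 could not say which of the two overlapping instances it meant.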
It is likely that the need for many other commands will be identified (see Section 5, Towards a Command Language). For example, Pause and Resume will also be useful, and there will probably need to be commands for selecting Cue files.
In addition to high-level commands, MMQ could support continuously changing control parameters for a Cue as it plays, such as volume, mix controls (subgroups, mute, and solo), spatial position, variation and intensity controls, and so forth. MIDI Continuous Controller messages may be the most appropriate mechanism for this class of control. If Cues are able to respond to unique logical controllers, or other nonstandard parameters, such controller response definitions could be represented in data and stored with the Cue.
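The sketch below shows one way a controller response definition stored with a Cue might be applied to incoming Control Change messages. The status byte range (0xB0 to 0xBF), the 7-bit data bytes, and controllers 7 (channel volume) and 10 (pan) are standard MIDI. The mapping table, the parameter names, and the use of controller 20 (one the MIDI specification leaves undefined) for "intensity" are assumptions made for illustration.

```python
# Hypothetical controller response definition stored with a Cue.
CONTROLLER_MAP = {7: "volume", 10: "pan", 20: "intensity"}

def apply_control_change(params, message, controller_map=CONTROLLER_MAP):
    """Update a Cue's parameter table from one 3-byte Control Change."""
    status, controller, value = message
    if status & 0xF0 != 0xB0:
        raise ValueError("not a Control Change message")
    if controller > 0x7F or value > 0x7F:
        raise ValueError("data bytes must be 7-bit")
    name = controller_map.get(controller)
    if name is not None:             # unmapped controllers are ignored
        params[name] = value / 127   # normalize 0..127 to 0.0..1.0
    return params

params = {}
apply_control_change(params, bytes([0xB0, 7, 127]))    # volume to full
apply_control_change(params, bytes([0xB0, 20, 0]))     # intensity to zero
apply_control_change(params, bytes([0xB0, 64, 127]))   # not in the map
```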
There will likely be a need for looping, conditional branching, and conditional waiting within SMF tracks. There may also be a need for a 'meta' level for orchestrating sequences of Cues. We would recommend that such SMF flow-of-control work also be undertaken in coordination with the MMQ work.
Developing content for interactive audio applications is currently an unwieldy and complicated process. Frequently, composers and sound designers must submit their files to the programmers and wait days or weeks until the next build to judge the appropriateness of their work. Producing music, sound effects and voice-overs for games, Internet and wireless applications can be difficult, costly and frustrating because of the inability to audition audio tracks in the interactive environment.
The Foghorn Group sees a multi-pronged solution to these problems:
Audio content producers are currently stumbling around in a fog, trying to create compelling soundtracks for interactive environments while avoiding the jagged rocks and dangerous shoals of incompatible platforms and impossible schedules. We recommend that this report be considered by the IA-SIG and the MMA as a call to action, symbolized by the sound of a foghorn:
BWAAAAAAAAAA - MUAAAAAHHHHH!!!
Copyright 2000-2014, Fat Labs, Inc., ALL RIGHTS RESERVED