During preparation of this group's report, Chris Grigg began writing a
separate narrative combining a description of the group experience with
various insights and historical references to other interactive audio
systems and design theories. The members of the group have decided to
submit Chris' narrative (as shaped by their feedback) in lieu of a more
conventional report. Since the piece was originally intended to be published
elsewhere, it is © 2001 by Chris Grigg, all rights reserved, and
published by Project BBQ with the author's permission.
I had a dream.
No, not that kind of dream - I started this Rogue Group because of an
actual dream. On the second night of BBQ 2001, after a day full of talk
of XMFs and interactive audio and Integrators and so forth, I dreamed
a System diagram. When I woke up I knew what I had to do. I knew I'd feel
I hadn't made the most of my BBQ 2001 if I didn't at least try to take
advantage of the rare and precious opportunity to gather together some
of the world-class loonies, er, I mean, interactive audio visionaries
in attendance, and pound out at least a basic design pointing the way
to an XMF-based, nonproprietary interactive audio framework in the venerable
vein of the 1999 Big Picture group, the 2000 Q group, LucasArts iMuse ™,
and George's Integrator, tossing in some of the best bits of my own designs
going back to pre-PC computer games.
In recent years most of the attention in the game sound arena has been
focused on 3D audio, the latest sound cards, 5.1 surround, and other rendering
features. While the systems we have in mind would certainly use that stuff,
technically it doesn't have very much to do with them. In plain terms,
the goal of these systems is to make sure that, in the audio artists'
view, the right sound plays at the right time. I realize that sounds so
basic it's silly, but the harsh truth is that in the ordinary way of doing
interactive audio, where the sound artists depend on programmers outside
(or occasionally inside) the audio department to make sure that the audio
files are used in the intended way, the audio artist's vision (or whatever
the aural equivalent of 'vision' would be) is rarely achieved. It just
doesn't happen because it's too labor- and communication-intensive, and
sound work has always had a funny way of getting pushed down the priority
list over the course of a project.
In my view, this is a - no, make that THE - fundamental problem of our
field and has never (yet) been adequately solved. For many years I have
devoted a small piece of my brain and time to contributing to a nonproprietary
solution. So at Saturday breakfast, at the urging of several BBQ brothers
and both BBQ sisters, I stood up and announced my schism. Nobody threw
food (or worse), the right people's eyes lit up, and a 10:00 rendezvous
was set for the overlook.
The day turned out even better than I'd hoped. By the time we were done
we not only had a PowerPoint presentation, we also had GUI tool mockups,
a timeline to reduce our design to implementable data structures, commitments
for an IA-SIG working group to standardize the work, and a promise of
a review by the LucasArts team. Now it looks like there could be a session
at the Game Developers Conference, and that this work will help drive
George's book/tool project. Not too bad for six hours' work. We have now
moved considerably farther "Toward Interactive XMF" than expected.
Why so much success? Because, it seems, the time for this particular
form of this particular idea has finally come. Game development has matured
to the point where more developers realize that platform-specific interactive
audio tools are not as cost-effective as cross-platform tools. Some developers
may feel that any competitive advantage from proprietary interactive audio
systems may be in danger of being overshadowed by their high cost and
functionality limitations (amortization base too small to justify additional
development). Also, the importance of human factors such as production
process and the working interfaces between the audio and programming departments
is now more widely recognized than before. It remains to be seen whether
there is a business to be had developing this system (George thinks there
is, I'm not so sure), but to developers even an open/shared development
model looks more promising than having every developer continue to go
it alone, or depend on single-platform tools.
Another reason for the success is the maturity of the design we have
in mind. Forget for a moment the XMF part. Each member of the group brought
a unique (not to mention esteemed) body of experience to the overlook,
both in terms of breadth of game and audio styles and breadth of interactive
audio systems used (or designed), yet independently we had each come to
essentially the same view of what the design concepts for a cue-oriented
interactive audio system should be. There's ample wiggle room when it
comes down to details and implementation, but the value of our group's
work product is in the expression of those core concepts, with a practical
standards-based path to implementation.
As I outlined in my talk Friday morning, previous BBQ groups have done
work articulating parts of this vision. It is perhaps best appreciated
by contrast with the usual working method for interactive audio, which
unfortunately in most cases can be stated simply as "I talk to the programmer,
we decide what sounds are needed, I make a file for each sound, I give
him the files, he puts them in the game but often doesn't use them the
right way." This, as the kids like to say, sucks.
We'd rather work according to these principles:
- Defend against misuse of delivered audio
- Use MIDI & audio together
- Simplify tool development
- Standard formats lead to better tools
- Avoid least-common-denominator functionality limits
The following tactics have been identified to help put the audio artist in control:
- Use data-driven runtime audio systems, not code-driven
- Abstract the interface between coding and audio teams
- Have the game request audio services by symbol, not filename
(Which relates to Brian Schmidt's immortal comment "Anyone who still
thinks there's a 1:1 relationship between a sound and a WAV file just
doesn't get it.")
- Put audio guy in control of the data (i.e. by providing an editor)
Going further, some niceties we might like to see in an ideal world might include:
- Simplify & formalize content management & hand-off procedures
- Attach audio artist notes to individual sounds
- Generate reports & databases easily from sound, & vice-versa
- Facilitate collaboration among multiple audio artists
Of course this is all high-level goal stuff; there are additional objectives
like controlling sound parameters while a sound is playing, making it
easy to do commonly needed things like randomization and spatial placement,
synchronizing new pieces of music to already-playing pieces of music,
providing callbacks so events can be triggered when a sound gets to a
certain point, etc. These will all pop up below, once we get into implementation,
but they also point up a couple issues.
It's likely we could identify a basic set of core functionality for this
system that would satisfy 80% of its users 80% of the time. If such a
baseline system were built, it would be a great boon to the industry,
and a great gift to the audio artist community. But there's also a whole
world of further improvements that some few folks would like to have;
excluding these would keep some amazing soundtracks from ever happening,
which would (as the kids say) suck. So do we wait to figure out all the
cool stuff before we start building?
George has a flower. Or rather, an analogy involving a flower, which
inspired our group logo and goes something like this: All plants need
roots, a stem and some leaves. You can draw a dotted line just below the
flower, and cut, and you'll still have a perfectly viable plant. The analogy
is: If you want to get anywhere with this thing, figure out what's basic
vs. what's attractive but nonessential, and go with basics because you'll
never ever get to the bottom of what would be cool to add.
FLOWER V. 2.0:
We thought this over and said: "Naaaaaah." We're sensitive guys and we
like flowers, or in other words the purpose of the system is to empower
audio artists to make the coolest things possible. But George is really
right about not waiting to figure out all the possibilities up front,
so, taking an inspiration from the X in XMF, we decided to make our system
eXtensible. In other words, we adopted as a new working design principle
the idea that, in any area of functionality where someone might conceivably
want to do more than the basics, we should provide a mechanism for extending
the standard functionality.
XMF, it turns out, helps achieve all that. It's designed for bundling
media files together, and allows content to be organized as files &
folders like a computer file system. It also lets you attach metadata
to files and folders in a simple and consistent way: you can attach
name tags or composer comments, and simple programs can generate reports
based on the contents. Or slightly more elaborate programs could create
XMF files based on entries made in a database. And it's extensible, so
we can easily add new media formats, data formats, and metadata fields.
Because XMF already handles so much of the organization and housekeeping,
it helps us focus our design concepts into a concrete design, and allows
us to focus just on the parts of the problem that are left. But it's important
to understand that the current XMF standards don't include anything like
the high-level interactivity functionality we want to see. XMF technology
per se is just about providing the vessel that the data rides in. All
of that new functionality needs to be invented and represented in data
structures before it can be carried inside XMF.
So one useful way to look at the work of our Rogue Group is as an attempt
to answer the question Larry the O posed: "What do XMF and BBQ-style interactive
audio mean for each other?" Of course, that begs another question: "What
exactly does 'BBQ-style interactive audio' mean?"
BLIND PEOPLE, MEET ELEPHANT
I believe all of us in the rogue group have the same basic thing in mind,
but as with the story of the blind gents and the elephant, each of us
conceives it from a different standpoint. To George, for years it's looked
like a tool the audio artist can use to unambiguously demonstrate the
audio media and the way it should be used. Lately it's begun to look like
a standard for a data format that would be carried in an XMF file together
with the audio media; the audio artist would create that chunk of data
with a tool as per George's vision, and then the game would use that data.
As one who's worked on both sides of the artist/programmer fence, my view
is that we can't reduce the tool or the data format to practice until
we make some design decisions about the program that's going to read that
data at runtime. Like a razor and its blades, you need both parts to get
the job done, and each part has to be designed from the start to work
with the other.
THE (DREAMY) SYSTEM DIAGRAM
This is where the dream comes in. Dreams have a funny way of crystallizing
things we already know into a new, usually symbolic, form. In this case
the symbols were literally symbols: blocks in a diagram. (Hey, if you
think that's weird, I once had a girlfriend whose dad had this recurring
dream of huge colored numbers raining out of the sky, the air thick with
them like a downpour.) The dream showed me how all the parts would fit
together, in such a way that the media and interactivity rules could be
carried in an XMF file, with complete portability (meaning that the same
XMF file could be used on practically any platform, and would behave the same way on each).
I think this is the model for the thing we need to build. If we can accept
this despite our differing points of view of the elephant, we'll have
a basis for moving forward with designing everything else we need for
real-world implementations. The rest of this document is just filling
in the details of this diagram.
If you (or your publisher) develop for more than one platform, sooner
or later you start to care about platform independence. If you intersect
with lots of platforms, you start caring a lot about platform independence.
The key to the system's platform independence is the combination of a
platform-independent Soundtrack Manager (SM) with a platform-specific
Adapter Layer (AL). The SM would have to be written by somebody, and that's
where the bulk of the work will be. It would be written in transportable
C or C++ (maybe ported to Java at some point), and talk to each platform's
AL in exactly the same way, using a few simple commands like Load File,
Allocate Memory, Play File, Set File Volume, Locate File To, and so on.
For each platform, an implementation of the AL would turn those commands
into whatever native API calls are required. It helps portability that
only very basic native MIDI and audio file playback abilities are needed
(play from/to, pause/resume, set volume, set pan/spatial location, set
callback), which is possible because all the advanced features are handled
at the SM level.
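The Adapter Layer command set above could be sketched as an abstract C++ interface; every name here is an illustrative assumption, not part of any published API. A do-nothing port that only logs commands shows how small a platform implementation could be (and is handy for exercising the Soundtrack Manager on a machine with no sound hardware):

```cpp
#include <string>
#include <vector>

// Hypothetical sketch of the platform-specific Adapter Layer (AL).
// The platform-independent Soundtrack Manager calls only these few
// simple commands; each platform supplies a subclass that maps them
// onto its native MIDI/audio APIs.
class AdapterLayer {
public:
    virtual ~AdapterLayer() {}
    virtual bool loadFile(const std::string& path, int fileId) = 0;
    virtual void playFile(int fileId) = 0;
    virtual void pauseFile(int fileId) = 0;
    virtual void resumeFile(int fileId) = 0;
    virtual void setFileVolume(int fileId, float gain) = 0;  // 0.0 .. 1.0
    virtual void setFilePan(int fileId, float pan) = 0;      // -1.0 .. +1.0
    virtual void locateFileTo(int fileId, double seconds) = 0;
};

// A "null" port that just records every command it receives.
class NullAdapter : public AdapterLayer {
public:
    std::vector<std::string> log;
    bool loadFile(const std::string& path, int) override {
        log.push_back("load " + path);
        return true;
    }
    void playFile(int) override             { log.push_back("play"); }
    void pauseFile(int) override            { log.push_back("pause"); }
    void resumeFile(int) override           { log.push_back("resume"); }
    void setFileVolume(int, float) override { log.push_back("vol"); }
    void setFilePan(int, float) override    { log.push_back("pan"); }
    void locateFileTo(int, double) override { log.push_back("locate"); }
};
```

Because all the advanced behavior lives above this interface, porting the whole system to a new platform means filling in just these few methods.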
Because it's the platform-independent SM and not the platform-specific
playback APIs that reads the Interactivity Rules, the same set of content
(media files and interactivity rules) responds to a given set of stimuli
in exactly the same way on all platforms where the system runs.
Now that we have platform independence nailed, we can start considering
the creative features and behaviors the audio artist and the game programmer
will be working with.
THE CUE, or
STIMULUS AND RESPONSE
Whether you're a programmer or a sound artist, if you're going to be
a user of this system the first thing to do is to get your head around
the idea of a Cue, which has a couple parts. Basically a Cue is an event
that the game (or application or performer or whatever) signals to the
Soundtrack Manager, and to which the Soundtrack Manager responds with
a corresponding, predefined action designed by the sound artist.
We call it a Cue because it resonates with preexisting terminology in
two related fields:
- In film, each discrete unit of scored music is referred to as a Cue
- In theater, stagecraft events driven by the action, and 'called'
(to the stage crew, via headset intercom) by the stage manager during
the performance, are referred to as Cues
So the term Cue combines both sides of Stimulus and Response: the game
signals a Cue to the Soundtrack Manager (that'd be the stimulus part),
and the Soundtrack Manager responds by doing the right thing.
This, by the way, is not a completely new idea in game sound. Early 8-bit
computer and console games typically wrapped up all the music and sound
effects for a given level into 'modules' that started with tables of 16-bit
offsets to the start of each tune or sound effect, allowing the game programmer
to request each one by number. The sound artist and the programmer had
previously agreed on the number for each needed sound, so those numbers
served as Cue signals. ("Play cue number 15.")
A more modern view of cue IDs would be to use ASCII text strings (maybe
qualified with namespaces) because they're descriptive, readable by both
humans and machines, and easily attached to objects (e.g. stored in a
game's geometry database). So from here on out we'll assume every cue
is identified by an ASCII nametag. To request a Cue from the Sound Manager,
the game would call a function, supplying the nametag of the desired Cue.
("Play the cue named 'FootstepGrassSoftLeft'.")
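In code, the game-facing half of this could be as small as a single call. The class and method names below are illustrative guesses at what such an API might look like, not an established interface:

```cpp
#include <map>
#include <string>

// Sketch of name-tag cue dispatch. The Cue struct is a stand-in for
// the full media-pool/action/transition record.
struct Cue {
    int id;  // placeholder for the real cue data
};

class SoundtrackManager {
public:
    // Loading an XMF bundle would populate this table.
    void registerCue(const std::string& tag, const Cue& cue) {
        cues_[tag] = cue;
    }
    // The game's side: signal a cue by its ASCII name tag.
    bool playCue(const std::string& tag) {
        std::map<std::string, Cue>::const_iterator it = cues_.find(tag);
        if (it == cues_.end())
            return false;  // unknown tag: no sound, no crash
        // ...here the SM would run the cue's Action and Transition...
        return true;
    }
private:
    std::map<std::string, Cue> cues_;
};
```

From the programmer's point of view the whole soundtrack collapses to calls like `sm.playCue("FootstepGrassSoftLeft")`; everything behind the tag belongs to the sound artist.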
This is a fundamentally different way for the game to address the audio
system, and it changes the way in which the sound department and the game
designers interact during game design and production. Before, the two
sides got together and agreed on what sounds were needed, how the sound
files should be prepared and used, and then the sound artists created
and delivered the files - but it was up to the programmers to do the (in
many cases manual C++ coding) work of seeing that the sound files actually
got used in the agreed upon ways.
With a cue-based Soundtrack Manager system, the game designers and sound
artists get together and decide on what sounds are needed, what Cue signals
are needed (remember, a single Cue tag may be able to access any number
of sound files), and agree on a nametag for each of those signals; and
then the sound artists create the sound files and the cue responses,
and deliver them all in an XMF bundle. The programmers just have to plug
the XMF files in for the right sounds to happen at the right time, without
doing any additional coding.
Because the sound artists in this scenario have interactive tools for
testing out how the sounds and interactivity rules respond to the cue
signals, they can be more confident that the game will sound right than
they could be when the programmers had to interpret every individual artistic
intention into concrete code. And the engineering burden from each new
delivery shifts from heavy-duty programming to just verifying that the
sound files and cue responses have been authored in a way that won't overburden
the game engine memory-wise or CPU-wise (resource hogging).
We foresee way less antacid consumption all around.
So. The next questions are: What exactly are these Interactivity Rules
(IR), how do they relate to media files like AIFF and MIDI files, how
are they represented in data, and how do the IRs and media get bundled
into XMF files? The answer is: Cue Sheets.
A Cue Sheet is a collection of Cue data chunks, where each Cue data chunk
contains both the media files and the interactivity rules for a given
Cue signal that the Soundtrack Manager can receive from the game. Here
again we borrow from preexisting terminology. In film sound post-production,
a Cue Sheet is a timeline showing every piece of sound that will be mixed
together into the soundtrack, laid out onto a channels vs. time grid.
So a Cue Sheet is the thing that tells you what media is there, and when
each piece gets played. Unlike film where the 'when' is unambiguously
tied to elapsed time within the film reel (at time 12:03:08, hear "bonk"),
in our interactive case the 'when' is usually determined by the gameplay
events that happen at runtime (whenever player head hits tree, hear "bonk").
But you see the basic similarity. For any given scene or subsection, the
Cue Sheet collects all the info on all the Cues in the soundtrack.
Once we invent a few data structures, a Cue Sheet can be
represented with an XMF file.
In a Cue Editor that a sound artist might use, a Cue Sheet might look
something like this:
Each row in a Cue Sheet represents one Cue. That is, the media files
and interactivity rules that get used when the game sends the corresponding
Cue signal to the Soundtrack Manager. Each Cue has several elements (or,
if you're more of an object-oriented-programming kind of person, 'data members'):
- The signal and the Cue are linked by name tag, so each Cue needs
a Name field. This is just ASCII text.
- To make life easier for the sound artists, each Cue should include
a Comments field (not shown in the illustration), which could be used
for anything the sound artist wants. A simple program could skim this
info into a database or report generator. If there's a need to save
space in the final product, this field can be automatically stripped
out of the delivered XMF file.
- We've said a Cue states the media files it uses, so each Cue needs
a pool (AKA a list) of media files. This could be done in many ways,
but since we're using XMF, this should be stored as a list of references
to media files carried in XMF files (see the XMF spec for details).
- Experience shows that mixing a game is usually hard to do, so in
our vision we include a Fader (AKA volume control - actually, it's
more like a Trim than a Fader, but 'fader' sounds cooler) for every
Cue, to make it easier for the sound artist to balance Cues against
each other. Don't worry about extra DSP overhead - this just becomes
a coefficient in volume scalar calculations for the media files in
this Cue, so the Sound Manager only sends one final setVolume command
to the native playback APIs. (For the same reason, we put a fader
on each Media File in the pool too.)
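The fader arithmetic described above is literally just multiplication; a minimal sketch (function and parameter names assumed):

```cpp
// The Cue fader and the per-media-file fader are only coefficients:
// they get multiplied into one final scalar, so the native playback
// API still receives a single setVolume call per file -- no extra DSP.
float finalVolume(float cueFader, float fileFader) {
    return cueFader * fileFader;  // both in the range 0.0 .. 1.0
}
```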
We've also been doing a lot of hand waving about 'Interactivity Rules',
which at this stage we need to get more specific about:
- What actually happens when the Cue signal is received? If there
are multiple media files for a cue, which one(s) get played? Can the
Cue do different things depending on what else is happening at the
time? We combine all such logical, decision-oriented stuff into an
Action, which is so much fun - and so involved - that we won't talk
about it just yet; see below. (You'll have noticed that what we call
an 'Action' has a lot of similarities to what other systems call scripts.)
- Another important part of the interactive response is the manner
in which new sound elements (media files) are introduced into the
soundtrack. For example, if you just start one music file while another
one is already playing, you'll get cacophony because both will be
playing at the same time, and the rhythms won't align. (Usually I
mean 'cacophony' in a good way, but in this case I mean it in a bad
way.) It would be much better to synchronize the new music to the
old music, and switch over at a musically graceful point which, depending
on the style, could be a phrase, bar, or beat. In many situations
the game design will call for a gradual change of scene (this can
be true for either music or sound effects), in which case a crossfade
of given duration would be better than a hard cut. We can generalize
these different modes of introduction for new elements as Transitions,
and Transitions should be part of a good Cue.
Remember George's Flower? I think the only two areas of the Cue where
we need to provide eXtensibility mechanisms are the Action and the Transition.
The other parts (Name tag, Comment, Media Pool, and Fader) seem to be
dead simple, and don't need extension.
So a Cue includes:
- Name tag
- Media Pool
- Action (with extensibility mechanism)
- Transition (with extensibility mechanism)
These are the items, collected into Cue Sheets, which need to be represented
in a data structure in order to be carried in the XMF file. This list
is complicated enough that it might take more than one screen to edit a Cue
in a real-world Cue Editor. For example, the Media Pool editor might look
something like this (fader column not shown in illustration):
The Action and Transition are more complicated than the Media Pool, and
will probably require separate editor screens.
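One way the Cue list above could land in a data structure is sketched below. All field names are assumptions for illustration; the real layout would be settled by the IA-SIG working group:

```cpp
#include <string>
#include <vector>

// Illustrative layout for one Cue and for a Cue Sheet.
struct MediaRef {
    std::string xmfPath;  // reference to a media file in the XMF bundle
    float       fader;    // per-file volume trim, 0.0 .. 1.0
};

struct Cue {
    std::string           name;       // ASCII tag the game signals
    std::string           comments;   // artist notes; strippable on delivery
    std::vector<MediaRef> mediaPool;  // candidate files for this cue
    float                 fader;      // whole-cue volume trim
    // Action and Transition both need extensibility mechanisms, so
    // they are treated here as opaque payloads for the standard to define.
    std::string           actionScript;
    std::string           transitionSpec;
};

typedef std::vector<Cue> CueSheet;  // a Cue Sheet is a collection of Cues
```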
THE ACTION, BABY - WHAT IT IS!
We said there was a lot more to Actions; here are the grisly details.
First off, let's go contemplate that flower again. For many Cues, in
fact, probably most Cues, an Action needs to do no more than choose and
play the right file from the Media Pool. There are only a handful of common
basic rules for choosing which file to play. We can call these Playback
Modes, and George offers up the following four (colorfully named!) ones:
- Gunshot - Each time the cue is called, a different media file
from the pool is played once, all the way through, then stops.
- Sampler - Each time the cue is called, a different media file
from the pool is played all the way through and looped, like a sampler
imitating a flute.
- Clatter - Plays each one of the files in the pool once through,
in order. When all files have been played once, audio stops. This would
be handy to create a random-sounding "everything falls out of Fibber
McGee's Closet" sound.
- Jukebox - Each time the cue is called, a different file plays
through once and stops. After all files in the pool have been played
once, the pool is reshuffled (if "random" is selected), and the pool
is played again, ad infinitum.
- And I would add: Singleton - When there's only one file in
the Media Pool, there's no decision to make - you just play that file.
There may be a few more good basic Playback Modes as well; we'll let
the IA-SIG working group ferret that out. These basic modes are the stem
of the flower, and they should be made totally simple to access and use.
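As a sketch of how some of these modes might pick from the Media Pool, here is a cycling selector: with shuffle off it walks the pool in order (one pass of which is Clatter-like), and with shuffle on it behaves like Jukebox with "random" selected. The class is an assumption, not part of any spec:

```cpp
#include <algorithm>
#include <cstddef>
#include <random>
#include <vector>

// Picks which Media Pool entry plays on each cue signal.
class PoolSelector {
public:
    PoolSelector(int poolSize, bool shuffle)
        : shuffle_(shuffle), next_(0), rng_(std::random_device{}()) {
        for (int i = 0; i < poolSize; ++i) order_.push_back(i);
        if (shuffle_) std::shuffle(order_.begin(), order_.end(), rng_);
    }
    // Called once per cue signal; returns the pool index to play.
    int nextFile() {
        if (next_ == order_.size()) {  // pool exhausted: start over,
            if (shuffle_)              // reshuffling if requested
                std::shuffle(order_.begin(), order_.end(), rng_);
            next_ = 0;
        }
        return order_[next_++];
    }
private:
    std::vector<int> order_;
    bool shuffle_;
    std::size_t next_;
    std::mt19937 rng_;
};
```

A mode like Clatter would simply stop calling nextFile() after one full pass through the pool.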
But there will also be cases where a crafty (or wiseacre) sound artist
wants more control than that. That's the blossom on the flower, and it
kind of calls for a simple scripting language. I think we can accept that
any sound artist using such a language would have to be a bit of a propellerhead.
How do we reconcile stem and blossom? In this case, we cheat. We can
make the Action system script-driven under the hood, and put two faces
on it in the GUI editor: Basic level, where you just pick one of the
preset Playback Modes (which under the hood is just a script, but a built-in,
standard one), and Advanced level, where you get access to a full script
editor and can get your hands dirty (virtually speaking).
And now, a brief side-trip. Before we talk about the innards of the scripting
system, we need to introduce the two-part system that lets a Cue do 2-way
communication with the game (or application, or whatever's talking to
the Soundtrack Manager) while the Cue is running, to wit: Mailboxes and Callbacks.
CONTROL DURING THE LIFE OF A CUE:
THE MAILBOX METAPHOR
There are sounds, and then there are sounds. Some only need do one thing,
but others are called upon to change depending on what's happening in
the game at the time. The classic example is the vehicle engine, where
the RPM, load, and other simulation parameters are constantly careening
around unpredictably in response to game action. A less chaotic (but still
classic) example might be a piece of music where every 8 measures the
next phrase to be played is selected, based loosely on how much activity
is going on in the gameplay. In both cases, information passes from the
game to the soundtrack, influencing playback while it's still in progress.
Not allowing this kind of thing would be just plain lame. This means our
Soundtrack Manager needs to provide some sort of conduit between the game
and the Cues that are running at any given time.
The Rogue Group discussed a few ways of handling this kind of communication,
and agreed that the mailbox metaphor that quite a few game sound systems
have used is hard to beat for simplicity and generality. Imagine a post
office, with a row of numbered mailboxes. Incoming mail is addressed to
a particular box according to box number. To receive mail a customer just
has to look in the right numbered box. In our sound system, the Sound
Manager maintains a numbered array of mailboxes, each holding one (probably
numeric) value, and the game can set the contents of any mailbox to any
value at any time.
This simple mechanism can be used for several kinds of communication.
A mailbox can be used like a knob, so that whenever the game changes the
value, some sound parameter is changed proportionally. Or a mailbox can
be used as a signal that a condition or event has happened - for example
a cue may watch to see whether mailbox 33 contains 0 or 1, because the
game programmer and the sound artist agreed that would signal whether
the hero's radioactive eye sockets are visible (because you have a musical
line that goes extra-cool with the animation effect).
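The mailbox array itself is about as simple as data structures get; a minimal sketch, with sizes and value types assumed:

```cpp
#include <cstddef>
#include <vector>

// Numbered mailboxes: the game (or another cue) can set any box to
// any value at any time; running cues read or watch the boxes.
class Mailboxes {
public:
    explicit Mailboxes(std::size_t count) : boxes_(count, 0) {}
    void set(std::size_t box, int value) { boxes_[box] = value; }  // game side
    int  get(std::size_t box) const      { return boxes_[box]; }   // cue side
private:
    std::vector<int> boxes_;
};
```

In the eye-socket example above, the game would call `set(33, 1)` when the effect becomes visible, and the watching cue would poll `get(33)`.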
PUBLIC AND PRIVATE MAILBOXES
Although many uses of mailboxes are closely tied to individual running
cues (like a dynamic volume control), some uses involve communication
to all cues at once (like a Dim or Mute function), or communication from
one running cue to another. That's why I'd like to see a new refinement
of the mailbox system: one set of 'Global' mailboxes, plus a separate
set of mailboxes for each running cue. So instead of setting mailboxes
solely by mailbox number, we would do it by cue and number, or else 'global' and number.
DEFAULT SOUND PARAMETER CONNECTIONS
One of the jobs we leave to the IA-SIG Working Group is to determine
what sound parameters should by default be controlled by mailboxes, and
what mailbox number to use for each. Volume, stereo pan (for stereo sounds),
and spatial location (for spatialized sounds) are likely candidates, as
would be effects sends. There should also be an eXtensible way to drive
other, new, and different parameters from mailboxes, including platform-specific
rendering features. These sound parameters would for the most part be
attached to the per-cue mailbox set, not the global mailboxes, although
it might be useful to control a few global sound parameters in a similar
way from the global mailboxes (global volume, for instance). In both cases,
a goodly number of the mailboxes should be left unassigned, to avoid cramping anyone's style.
MARKERS & STEPS TO SET MAILBOXES:
"DEAR GAME" AND "NOTES TO MYSELF"
Mailboxes can also be used to pass information from the soundtrack back
to the game, for the same kinds of reasons, and can support the same types
of communication (continuous control, events). The sound artist should
be provided with Mailbox-setting commands via Markers, which can be embedded
in MIDI and audio media, and via Steps in the Action scripting language.
Of course you can also use these commands to talk to other Cues, not just
the game; imagine several pieces of music running in parallel at the same
time, with tracks being dynamically muted and un-muted based on a mailbox
value controlled by the 'master' song.
SYNCING ACTION TO SOUND:
So far we've been talking about making sound things happen in response
to instructions from the game or host application, but a good sound system
also needs to furnish some way to let the soundtrack drive game things
when necessary. For example, you might want to trigger an explosion animation
in sync with a cymbal crash at the end of a piece of music. Since the
timing of the cymbal crash is determined by the music playback, not the
game code, there has to be some sort of music-event-driven signaling mechanism.
In general these are called Callbacks because the game leaves the Soundtrack
Manager a function to call when the desired event happens, and the Soundtrack
Manager uses that to "call back" to the game at the right time. (Think
of it as the SM making a phone call to let the game know that an interesting
sound-based event has happened and the terminology makes more sense.)
There's probably no limit to what an interesting sound-based event might
be, but two intensely practical occasions where it would be useful to
trigger a callback would be 1) when playback of a piece of media reaches
a particular point, and 2) when a particular step in an Action is reached.
CALLBACKS FROM MEDIA MARKERS
One thing audio files and MIDI files have in common is that they are
both (at least in their simplest forms) linear media: playback begins
at the beginning, runs through the file in linear order, and ends at the
end of the file. Another thing they have in common is Markers, which are
non-sound events that are pegged to particular times during playback,
with space for a scrap of text you can fill in. You can place markers
in any good MIDI or audio file editor. (In MIDI files a marker is a "Meta
Event" appearing in the MIDI event stream at the appropriate time, and
in audio files a marker is a special data chunk stored outside the audio sample data.)
We can set a rule that in our sound system, any marker with text in the
form "Callback:YourTagHere" will cause the Sound Manager to call the game's
callback function that's tagged 'YourTagHere'. If the game hasn't set
up such a callback, then nothing happens.
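The "Callback:YourTagHere" rule amounts to a prefix check plus a table lookup. Here is a sketch, with std::function standing in for whatever callback type the real API would define:

```cpp
#include <functional>
#include <map>
#include <string>

// Dispatches game callbacks from marker text. Any marker of the form
// "Callback:SomeTag" fires the callback registered under "SomeTag";
// markers with other text, or with unregistered tags, are ignored.
class CallbackTable {
public:
    void setCallback(const std::string& tag, std::function<void()> fn) {
        table_[tag] = fn;
    }
    // The Soundtrack Manager calls this for every marker it plays past.
    void onMarker(const std::string& markerText) {
        const std::string prefix = "Callback:";
        if (markerText.compare(0, prefix.size(), prefix) != 0)
            return;  // not a callback marker
        std::map<std::string, std::function<void()> >::iterator it =
            table_.find(markerText.substr(prefix.size()));
        if (it != table_.end())
            it->second();  // "call back" the game
    }
private:
    std::map<std::string, std::function<void()> > table_;
};
```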
I bet that was too propellerheaded to understand, so here's an example.
Joe Programmer has a function called DoCrashAnimation() and wants the
Sound Manager to call it when a certain piece of music reaches its crescendo.
Jane Soundgrrl chats with Joe, and they agree the tag for this callback
should be "TimeToCrash". Jane goes back to her MIDI sequencer in the basement,
opens the song in question, and inserts a new marker with the text "Callback:TimeToCrash"
at the appropriate place, then redelivers the media (in an XMF file) to
Joe. Meanwhile Joe's been adding code to the game to tell the Soundtrack
Manager that his function DoCrashAnimation() should be called when the
callback event tagged "TimeToCrash" occurs [which would look something
like this: theSoundtrackManager.setCallback( "TimeToCrash", &DoCrashAnimation
) ]. Joe rebuilds the game, incorporating the new XMF file, and the next
time that piece of music plays, the animation happens when the song hits
its crescendo.
CALLBACKS FROM ACTION STEPS
It would also be useful (not to mention easy) to provide a Step in the
Action scripting language to trigger a callback. For example, with a combination
of mailbox-setting markers and an Action script, you could watch for unusual
musical states (like when multiple loops of different length all restart
at the same time), and send the game a callback to say it's time to move
on to a new setup.
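As a side note on that example: the moment when "multiple loops of different length all restart at the same time" falls at the least common multiple of the loop lengths, so an Action script (or the tool that helps build one) can predict it. A sketch, with lengths measured in beats:

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// How long until several loops of different lengths all restart together?
// The answer is the least common multiple (LCM) of their lengths.
long beatsUntilAllLoopsRestart(const std::vector<long>& loopLengths) {
    long result = 1;
    for (long len : loopLengths)
        result = std::lcm(result, len);  // C++17 <numeric>
    return result;
}
```

For example, a 4-beat loop and a 6-beat loop line up again every 12 beats.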
E-Z-EDITOR AND LANGUAGE STEPS
The intrepid few who venture into creating their own Action
scripts will need a basic language with enough control statements and
media manipulation statements to get the job done. Most non-programmers
get discouraged by the whole Syntax Error thing, so to make it all seem
just incredibly artist-friendly, we'll call the statements 'Steps' and
provide a button-driven, error-proof editor looking something like this:
To add a step, you just put the cursor at the desired line
and click the 'Add Step' button for the desired step. To delete a step,
select it and click the Delete Step button. To set a parameter for a step,
click on it and enter a new value. Or, where it makes sense, pick a mailbox
number and the script will take the parameter from whatever's in that
mailbox at the time the script runs. You can see where this is leading,
right? It means any part of any script can be driven by any mailbox -
and since the game or any piece of sound media can drive any mailbox,
well... things could get pretty interesting.
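That "take the parameter from a mailbox" indirection is worth pinning down, since it's what makes any part of any script drivable by the game or by the media. A sketch, with type and field names as assumptions: a step parameter is either a literal value or a mailbox number, and it's resolved at the moment the script runs.

```cpp
#include <array>
#include <cassert>
#include <variant>

// A step parameter is either a literal value or "whatever is in mailbox N
// when the script runs." Illustrative types, not a proposed format.
struct FromMailbox { int box; };
using StepParam = std::variant<int, FromMailbox>;

int resolveParam(const StepParam& p, const std::array<int, 16>& mailboxes) {
    if (auto* literal = std::get_if<int>(&p)) return *literal;
    return mailboxes[std::get<FromMailbox>(p).box];  // read the mailbox now
}
```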
SCRIPTING LANGUAGE STEPS
Designing a scripting language can bring on practically religious differences
of opinion, so we'll leave the bulk of this potentially controversial
task to the IA-SIG Working Group, and offer the following short list of
steps (statements) as a starting point. Note that the steps treat all
types of media files (MIDI and audio) in the same way, not as special
cases:
- Play From / To
- Play Random File Of...
- Set Mailbox X to Y
- Execute Callback 'CallbackTagName'
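To make the starter list concrete, here is one way the steps might be represented as data and dispatched. The struct layout and field names are assumptions for illustration only; a real interpreter would hand the media steps off to the playback engine.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Minimal representation of the four starter steps. MIDI and audio files
// are treated identically: the file name is opaque to the step.
struct Step {
    enum Kind { PlayFromTo, PlayRandomFileOf, SetMailbox, ExecuteCallback } kind;
    std::string text;   // file name, or callback tag
    int a = 0, b = 0;   // from/to times, or mailbox number and value
};

// Tiny interpreter for the two steps that need no media engine.
void runStep(const Step& s, std::vector<int>& mailboxes,
             std::vector<std::string>& firedCallbacks) {
    switch (s.kind) {
        case Step::SetMailbox:      mailboxes[s.a] = s.b; break;
        case Step::ExecuteCallback: firedCallbacks.push_back(s.text); break;
        default: /* media steps go to the playback engine */ break;
    }
}
```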
EXTENDING THE SCRIPTING LANGUAGE
What happens when some genius absolutely needs a new step we haven't
thought of? Flower Time. The data format for the scripting steps needs
to include an extensibility mechanism so that in the future, when we'll
be smarter than we are now, anyone can add new stuff without breaking
the old stuff. (Hint to implementors: Look at XMF's ResourceFormatID non-collision
mechanism.)
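One common shape for that kind of extensibility (offered here as an assumption in the spirit of the ResourceFormatID idea, not as part of any spec) is to tag every step with an ID so that an older interpreter can skip steps it doesn't recognize instead of failing:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Each step carries an ID plus an opaque payload. An interpreter keeps
// the steps it understands and silently skips the rest, so new step
// types never break old players.
struct RawStep { uint32_t id; std::vector<uint8_t> payload; };

std::vector<RawStep> knownSteps(const std::vector<RawStep>& all,
                                uint32_t maxKnownId) {
    std::vector<RawStep> out;
    for (const auto& s : all)
        if (s.id <= maxKnownId) out.push_back(s);  // unknown ID: skip, don't fail
    return out;
}
```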
Notice how much better speech synthesizers have gotten in the last 10
years? The early ones played steady-state phonemes one after another,
end-to-end, and they sounded like bad sci-fi robots. Then someone realized
that in actual speech, most of the time the vocal tract isn't standing
still, it's changing shape as it moves from one phoneme to the next -
so for natural-sounding speech, what you really need to model is the transitions,
not just the steady states.
Mixing sound for film or games is similar, in that if you simply hard-cut
from one sound to another - for example, when the scene or location changes
- the transition usually feels wrong: too sudden, jarring, and unnatural.
The alternative is to transition between settings more smoothly, which
from a mixing standpoint means having the tracks for both scenes playing,
and then cross-fading from one scene's elements to the next one's over
an appropriate interval of time. This opens the door to a whole world
of much more natural, more cinematic, or more musical effects, and by
doing so reinforces the production's immersive effect. By contrast, a
jarring transition frequently detracts from the player's suspension of
disbelief.
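A cross-fade like the one described above is usually computed with an equal-power gain curve, so the combined loudness stays roughly constant through the transition. A minimal sketch, where t runs from 0 (all old scene) to 1 (all new scene):

```cpp
#include <cassert>
#include <cmath>

struct FadeGains { double oldGain, newGain; };

// Equal-power cross-fade: gains trace a quarter circle, so
// oldGain^2 + newGain^2 == 1 at every point in the fade.
FadeGains equalPowerCrossfade(double t) {
    const double kHalfPi = std::acos(0.0);  // pi / 2
    return { std::cos(t * kHalfPi), std::sin(t * kHalfPi) };
}
```

A plain linear fade (gains summing to 1) tends to dip in loudness at the midpoint, which is why the equal-power curve is the usual choice for scene transitions.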
But there's more to transitions than just crossfading. We could define
'transition' as anything relating to the way a newly introduced sound
relates to the other sounds running at the time. In some cases, the entrance
of a new cue should kill off other cues (e.g. the classic whistle fall
vs. bomb explosion). In some cases calling a cue when the same cue is
already running should stop playback of the previous sound (flush toilet);
other times you'll want the old sound to continue, so that it overlaps
the new sound (banging on several crash cymbals). So sometimes a Transition
includes killing other cues, or media files within the same cue.
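Those entrance rules could themselves be expressed as data attached to each cue. The enum and function names below are assumptions, sketching the three cases just described (whistle fall, flush toilet, crash cymbals):

```cpp
#include <cassert>

// What does a new cue's entrance do to sounds already playing?
enum class EntrancePolicy {
    KillOtherCues,    // whistle fall vs. bomb: new cue silences the others
    RestartSameCue,   // flush toilet: retriggering stops the previous instance
    OverlapSameCue    // crash cymbals: instances of the same cue may stack
};

// Should an already-playing instance of playingCue be stopped when
// newCue enters under the given policy?
bool shouldStop(int playingCue, int newCue, EntrancePolicy policy) {
    switch (policy) {
        case EntrancePolicy::KillOtherCues:  return playingCue != newCue;
        case EntrancePolicy::RestartSameCue: return playingCue == newCue;
        case EntrancePolicy::OverlapSameCue: return false;
    }
    return false;
}
```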
MUSICAL TRANSITIONS AND MARKER ALIGNMENT
While smooth transitions can be very important in creating convincing
sound effects that continue across scene changes, smooth musical transitions
probably offer the best payoff, in terms of a perceived increase in production
values (see for example Peter McConnell's soundtrack for LucasArts' "Grim
Fandango"). From a technical perspective, however, musical transitions
bring the additional requirement that if the transition is going to sound
OK, the new piece of music and the old piece of music must be both a)
playing at the same speed, and b) synchronized to the same musical point
(or at least a compatible point) in the song. This can be tricky. Perhaps
the simplest way to achieve this is to place markers in both music files,
for example exactly on the 1st beat of each music phrase, and let the
Sound Manager use the markers' times to figure out when to start the newer
piece. Depending on the musical style, finer-grained markers (8th notes,
16th notes, etc.) could be used, making faster musical transitions possible.
Marker alignment may also be useful for rhythmic or looped sound effects.
For example, transitions among different angles or intensity levels on
a continuous piledriver or motor would have to be synchronized to the
rhythm to sound right.
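The marker-alignment computation itself is simple once the markers are in hand. A sketch, assuming both pieces share tempo and marker grid: given the outgoing piece's marker times (say, the downbeat of every phrase, in seconds) and the current playback position, start the incoming piece at the first marker at or after "now."

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Pick the next marker time at or after the current playback position
// as the moment to start the incoming piece of music.
double nextAlignedStartTime(const std::vector<double>& sortedMarkerTimes,
                            double now) {
    auto it = std::lower_bound(sortedMarkerTimes.begin(),
                               sortedMarkerTimes.end(), now);
    if (it == sortedMarkerTimes.end())
        return now;  // no marker left: fall back to an immediate transition
    return *it;
}
```

Finer-grained marker grids (8th or 16th notes) simply populate `sortedMarkerTimes` more densely, which is exactly what makes the faster transitions mentioned above possible.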
PROPOSED BASIC TRANSITION FUNCTIONALITY
Again the IA-SIG WG can decide what functionality is necessary, but here's
our initial stab at the essential creative controls for a good Transition:
Likely the above controls come nowhere near exhausting the possibilities
for useful transition rules, so here again the system should be designed
with an eye to extensibility. I confess this seems to me to be an area
where it may be difficult to foresee all the possibilities, so designing
an extensibility mechanism for transitions might be a little tougher than
in some of the other areas.
So. What does all the above add up to?
- To the audio artist, all the above appears as markers placed into
the audio and MIDI media using the existing audio and MIDI editors,
and interactivity rules created with a new Cue Editor application. The
Cue Editor would include auditioning tools, for example a way to load
multiple Cue Sheets and a separate button to call every contained Cue.
Final files would be delivered in XMF format, exported by the Cue Editor.
- To the audio integration programmer, all the above appears as the
audio artist delivering media in XMF format, and way less custom programming
to do.
- To the game engine architect, all the above appears as a better-behaved,
drop-in way to achieve systematized soundtrack control. World objects
can be easily tied to sound behaviors that the audio artist can adjust
without programmer involvement.
- To a programmer maintaining a proprietary in-house interactive audio
system, all the above may appear as an interesting alternative approach,
or it may appear as some useful ideas to incorporate into the next version.
At a minimum, bundling content with XMF file technology might seem worth
investigating.
- To the game design team, all the above appears as a way to let the
audio artist get the creative job done that avoids the usual resource
bottleneck of getting precious and expensive programmer support time.
- To the craft of interactive audio, all the above appears to have promise
as a basis for a standard generalized working methodology that would
foster tool development, facilitate collaboration and cross-platform
content creation, and make better interactive soundtracks much, much
easier to create.
This document has laid out a pretty Big Picture, but only in sketch form.
Many details remain to be filled in, and there will be a great deal of
implementation work to be done. The Rogue Group members would like to
offer our encouragement and our support to those who decide to continue
this work, including but not limited to the IA-SIG Working Group. We all
hope to participate directly in whatever comes next, but if for any reason
that's not possible then you can be sure we'll be there in spirit.