Bob in a Box
Automated Interactive Mixing
Traditional methods of producing and delivering audio experiences have
never managed to overcome a universal problem: the listener never hears
audio in an “ideal” context, and each listener’s situation
presents different challenges (sometimes slight, sometimes great) for
the audio professional. Audio in electronic, interactive games presents
unusually complicated cases to manage, since the content of the audio
itself changes depending on what actions the listener takes in the game.
What’s needed is a new approach, where high quality audio is mixed
“on the fly,” specific to the user’s changing context.
Fortunately, the audio industry, and game audio developers in particular,
have years of experience and techniques to apply to the problem. Specific
kinds of new technical innovations could help us provide even more consistently
memorable and stunning experiences to our listeners. In an attempt to
improve the state of the art, we provide some preliminary conclusions
and propose specific educational resource efforts to increase our collective
knowledge, share our hard-won experience, and prompt some technical innovations
to propel the industry forward.
Many individuals representing all sorts of perspectives (technology development,
game development, game publisher, academic, journalist, game enthusiast,
professional audio developer and professional musician) joined together
at the 2006 Project BarBQ conference to debate common problems and explore
common solutions to one of the deep unsolved dilemmas of contemporary
audio development: to enable a complete and artful integration of the
“listener’s context” into all aspects of audio in the
gaming environment. In different areas, we all strive to provide listeners
with excellence, both artistically and technically, and we all realize
the ways in which our efforts fall short of our imagined potential.
If only, we thought, we could cram a little homunculus clone of famed
audio mixing engineer Bob Clearmountain into each and every computer system,
game console, and home music amplifier, we’d have it: audio excellence
for each and every listener every time, as good as their equipment can
sound, Bob in the Box! Though in reality, we figured we’d have to
settle for something a little less.
Every bit of produced audio heard by every set of ears is the product
of compromise: the engineer must make one assembled result sound as good
as possible over a range of playback equipment and playback environments.
This is a difficult problem and a primary objective for any audio professional,
whether conveying the profound experience of a world-class orchestral or
jazz ensemble performance, or the energized drive of a loud, buzz-guitar
pop band.
Electronic games (for PCs, game consoles, handheld devices, etc.)
add a complication beyond that of standard audio production: the game
player drives the nature of the audio experience through continued
input and interactive decision-making. The output signals of game audio
may never be exactly the same twice. So how do you mix audio for that:
the unpredictable audio experience?
The goal of improving audio-for-games represents our biggest challenge,
our toughest set of problems, and our biggest opportunity. We set out
to define the problem, find some answers, summarize the areas of experience
where we felt able to produce results, and explore the unfamiliar ground
where our experience provided no ready-made solutions.
Mapping the Problem
“How do we introduce a high level of mixing aesthetics to
interactive audio and games (at a level that compares with the best
musical and cinematic examples)?”
Some game audio developers have an uncanny ability to make games sound
good, despite severe constraints on time and equipment. Even so, most
electronic game players have experienced at least some of the hallmark
problems of poor audio. A few unfortunates have experienced them all,
many times. Some examples:
- Inappropriate or accidental “dead” audio zones –
game areas with no sound
- Pileups – unintentional sound clashes
- Static focus – microphone location or auditory POV never changes
- Lack of psychological perspective – character’s emotional
state doesn’t affect the mix (e.g., during intense moments, all
sound except player’s breathing could drop out)
- Slavish realism (consistency that becomes predictable and boring)
- Not enough variation
- Too “in your face” (all foreground and loud, little 3D
perspective, poorly controlled dynamics)
- Distortion, Truncation
- Lack of masking control
- Dialogue is unintelligible
Poor audio results stem from many root causes, including:
- Lack of clear language for communicating about game audio (between
software developers, artists, directors, composers and sound designers)
- Lack of time
- Lack of budget
- Lack of run-time resources dedicated to audio (RAM, CPU, real time)
- Rendering differences between playback systems
- Lack of consistency between localized assets
- Lack of communal knowledge – developers constantly reinvent the wheel
- Lack of automation and high-level control tools
Throwing more money at the problem could clearly solve some of it, but
only some; we’d still be left with an unsolved problem, and a larger
financial crisis. We believe that the industry can ameliorate much of
this problem by changing our production processes.
Surveying Our Assets
Part of our solution is already in hand: we occasionally triumph over
our imposed limitations and create some stunningly great audio for our
listeners. The group set out to identify things that human mixers already
do to produce pleasing mixes (often outside the context of game audio),
and the data, parameters, and engines that we would need to accomplish
this within a game. We realized that:
- We would not be able to complete the list during the conference proceedings
- an ongoing effort would be required
- The list would not guarantee successful mixes
- To a large extent, tools already exist (XACT, FMOD, ISACT, WWISE,
CRI, Miles, SCREAM, DARE, PUNCH, KICK, SNARL, etc.), but we still have
a long way to go to “ideal”
- Designing a master tool that would handle all cases is not efficient
or desirable; a plug-in architecture would be preferable.
- Education and training would be needed to reach our objective.
Coin of Technology and Processes
To reach our goal of “audio always delivered as good as it can
sound,” we will need new audio development processes (pre-production,
production, and post-production) and new tools. The group agreed on the
need to generate a list of parameters for controlling essential aspects
of the audio experience, plus the technical feature sets in our tools
and playback systems that would be required to deliver this experience.
We recommend working backwards from the aesthetic experience to determine
those parameters, provide the necessary background work, and raise the bar:
- We will develop clear language for communicating about game audio.
- We will share techniques with developers, through a site containing
featured articles and a public forum.
- We need to raise awareness and find examples of how great game audio can sound.
- We envision a more powerful tool set, and will continue exploring
its requirements, which will include parametric control for items such as:
- Numeric scaling – the number of objects being heard (one cricket,
two crickets, thousands, etc.)
- Psychologically-oriented mixing
- Contextual mixing
- Consistent preproduction and rendering tools (authoring tools
use the same parameters, arguments, data values and DSP/rendering
architecture as the eventual runtime engine).
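To make the numeric-scaling idea concrete, the cricket example above could be handled by crossfading a single-source sample against a dense “crowd” loop as the object count grows, rather than spending one voice per object. The sketch below is a hypothetical illustration; the class name, threshold parameter, and logarithmic blend curve are all invented for this example, not features of any shipping tool.

```python
import math

class NumericScalingLayer:
    """Hypothetical sketch: blend a single-source sample against a
    dense 'crowd' loop as the number of sounding objects grows,
    instead of playing N individual voices."""

    def __init__(self, crowd_threshold=8):
        # At or above this count, the dense loop fully replaces single voices.
        self.crowd_threshold = crowd_threshold

    def gains(self, count):
        """Return (single_gain, crowd_gain) for `count` objects."""
        if count <= 0:
            return 0.0, 0.0
        # Blend factor rises logarithmically: two crickets sound a
        # little denser; thousands collapse into one ambience loop.
        blend = min(1.0, math.log1p(count) / math.log1p(self.crowd_threshold))
        # Equal-power crossfade keeps perceived loudness roughly steady.
        return math.cos(blend * math.pi / 2), math.sin(blend * math.pi / 2)

layer = NumericScalingLayer(crowd_threshold=8)
single, crowd = layer.gains(1)      # one cricket: mostly the single sample
single2, crowd2 = layer.gains(1000) # thousands: the crowd loop dominates
```

The equal-power curve is one design choice among many; a real tool would expose the blend curve itself as an authorable parameter.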
Defining and exploring parameters to control psychologically oriented
mixing may be one of the bigger challenges in designing these tools. The
most effective mix may not be the most realistic one, strictly speaking.
For example, consider any number of movies where the protagonist walks
down a busy city street, and the audience hears primarily the interior
monologue of the character’s thoughts, not the traffic sounds.
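That interior-monologue effect can be modeled as a mix state driven by a psychological parameter rather than by distance. The following is a minimal sketch under invented assumptions: the function name, the `focus` parameter, and the category names are hypothetical, and it presumes the engine exposes per-category gain controls.

```python
def psychological_mix(base_gains, focus):
    """Hypothetical sketch: duck environmental categories as the
    character's inward 'focus' rises (0.0 = fully outward attention,
    1.0 = fully inward), while inner-state sounds stay at full level.

    base_gains maps category name -> linear gain from the normal mix.
    """
    # Categories tied to the character's inner state are exempt from ducking.
    inner = {"monologue", "breathing"}
    mixed = {}
    for category, gain in base_gains.items():
        if category in inner:
            mixed[category] = gain
        else:
            # Environmental sound fades toward near-silence, never to zero,
            # so the world doesn't audibly "switch off".
            mixed[category] = gain * (1.0 - 0.9 * focus)
    return mixed

# Busy street, outward attention: traffic dominates.
street = {"traffic": 0.8, "crowd": 0.6, "monologue": 0.7}
normal = psychological_mix(street, focus=0.0)
inward = psychological_mix(street, focus=1.0)  # traffic ducked to ~10%
```

In practice `focus` would itself be computed from game state (danger, scripted beats, player health), which is exactly the kind of parameter these tools would need to expose.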
Future Perfect: Imagining
While we all agreed there are many excellent audio tools available now,
and there has been significant progress over the past several decades,
we each found it easy to envision a new, more profoundly capable tool
chain that a competent professional could use to ensure a great audio experience.
The tool chain involves several key procedural stages and technical aspects:
- the ability to define discrete parameters “around” individual
categories of playback sounds (music, dialogue, effects, spontaneously
generated audio, etc.).
- a mechanism to collate and communicate those parameters as sets of metadata
- a game engine smart enough to collect real-time player and game-environment state variables
- an audio playback engine smart enough to apply the metadata to those
game state variables and thus create a full mix representing the game's current state
- a mastering stage smart enough to deliver polished mixes for the listener's playback system
Figure 1 illustrates the basic process for collecting and creating component
sounds for the final mix, assembling them as a unified set ready for mixing,
and handing them to a smart playback mixing engine.
Figure 1. Basic procedural components for delivering the final mix
Figure 2 illustrates the technical relationship of the assets, their metadata,
and the playback engine.
Figure 2: High-Level Smart Audio Mix Engine
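The relationship between assets, metadata, and playback engine described above can be sketched as a small rule engine: authoring time attaches metadata rules to each sound category, the game engine publishes state variables at run time, and the playback engine evaluates the rules each frame to produce a mix snapshot. Everything below (the class, the rule format, the state-variable names) is an assumption for illustration, not an existing engine's API.

```python
class SmartMixEngine:
    """Hypothetical sketch of a smart audio mix engine: metadata rules
    map game-state variables onto per-category mix parameters."""

    def __init__(self):
        # category -> list of (state_variable, curve) rules from authoring.
        self.rules = {}

    def add_rule(self, category, variable, curve):
        """Authoring time: attach a metadata rule to a sound category."""
        self.rules.setdefault(category, []).append((variable, curve))

    def mix(self, game_state):
        """Run time: evaluate all metadata rules against the current
        game state, returning a per-category gain snapshot for this frame."""
        snapshot = {}
        for category, rules in self.rules.items():
            gain = 1.0
            for variable, curve in rules:
                gain *= curve(game_state.get(variable, 0.0))
            snapshot[category] = gain
        return snapshot

engine = SmartMixEngine()
# Authoring-time metadata: dialogue ducks music; danger boosts effects.
engine.add_rule("music", "dialogue_active", lambda d: 0.4 if d else 1.0)
engine.add_rule("effects", "danger", lambda danger: 1.0 + 0.5 * danger)

# Run time: the game engine publishes state, the mixer applies metadata.
frame = engine.mix({"dialogue_active": 1, "danger": 0.6})
```

Because each rule is just a category, a variable, and a curve, the same metadata could travel with the assets from the authoring tool to the runtime engine, which is the consistency the preproduction/rendering bullet above calls for.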
Group members considered this report to be the first step in a long process
toward the human and technical development required to support an “intelligent
mixer.” Members accepted some specific tasks to drive the effort
(they know who they are!):
- Set up a wiki and/or forum, perhaps on the IASIG site, to share techniques
that will advance the art of interactive mixing
- NOTE: See the section below, “Enter the WikiBlog™:
'The Art of Interactive Mixing Journal'”
- Contributors will write one technical or artistic article for the
site, addressing, for example, one of the following topics:
- Analysis of existing games for examples of successful mixing
- Reviewer Guidelines
- Defining Example Scenarios
- Identification of features that exist in current audio tools
- Define and explore “Dynamic Ambience Tracking”
- Field recording
- Write an Annoying Audio blog about mix annoyances and their solutions
- Sound Source occlusion and obstruction
- Specification of “Dynamic Music Systems”
- Define and explore “Contextual Audio”
- Specification of new systems/new tool architectures
- Communicate by e-mail reflector, with conference calls when necessary
- Start an IASIG working group on interactive mixing and present a progress
report at GDC
- Present our findings to Project Horseshoe (a conference for game designers)
- Investigate SMU’s interactive audio program
Enter the WikiBlog™:
“The Art of Interactive Mixing Journal”
As part of the first and primary action item, the group discussed and
took steps to create an ongoing project: an educational resource for audio
professionals (and professionals-to-be) that can grow, evolve and feed
necessary input to technical innovators for improving audio creation and
delivery tools. The Interactive Audio Special Interest Group (IASIG) provides
a natural forum for hosting and developing this resource. An abstract follows.
Art of Interactive Mixing Journal
This is a strawman for the Art of Interactive Mixing online journal. The
concept was conceived at the Project BBQ 2006 think-tank.
The Journal will
focus on documenting techniques, wisdom, experiences and tools for
the advancement of the art of interactive audio mixing. It is meant
as a neutral ground for the exchange of ideas between professionals
in the field.
The Journal will
be structured as an infrequent blog, à la memepool.com, with
a new entry every 2-4 weeks. Each entry will consist of a full-length
article (2000 words) and an associated discussion between registered
participants. Ultimately the collection of articles may be turned
into a book.
The Journal strives
for simplicity and focus. It will consist of the following sections.
- Front Page.
The Front Page will contain a list of recent articles in chronological
order. Each entry will consist of author, abstract, date and a pointer
to the associated discussion.
- Contributors.
The Contributors section will contain a short biography of each contributor
to the Journal.
- Glossary.
The Glossary section will consist of a number of pages, each focused
on a particular technique or term of art, e.g. cue.
- Archive. The
Archive section will contain a chronological list of all published
articles.
- About. The
About section will provide a brief description of the Journal.
The Journal will
be edited and moderated by an Editor (Pierre-Anthony Lemieux, Dolby;
Associate Editor: Peter Otto, UCSD). The Journal will be the main
work item of a newly-created Art of Interactive Mixing group within
the IASIG, chaired by the Editor. Conference calls may be scheduled
to address specific topics and business related to the Journal.
- December 2006.
Journal launches with its first article.
- March 2007.
3 articles published. The Journal broadly announced at the Game
Developers Conference 2007.
The Journal will
be open to all, but editorial control will remain with IASIG members.
Specifically, anyone may post comments to any article and anyone may
submit articles for publication to the Editor or Associate Editors.
However, glossary entries and other editorial content may be added
only by IASIG membership. To improve ease-of-use and focus, the Journal
will have its own dedicated domain, e.g. interactivemixing.com.
- Sound source occlusion
and obstruction (Fabien Noel, Ubisoft)
- New system
architectures (Peter Otto, UCSD)
- Field Recording
(Guy Whitmore, Microsoft Game Studios)
- Game Audio
Reviewer Guidelines (Matt Tullis, Dolby)
- Dynamic Ambience
Tracking (Tracy Bush, NCSoft)
- Contextual
Audio (Scott Snyder, Dancin’ Mouse Production)
- Mix annoyances
and solutions (Peter Drescher, Danger)
Deep Thoughts on Interactive Mixing
Group member Guy Whitmore contributed some well-considered thoughts from
his vantage point as an industry veteran and hands-on director of audio
for a large game publishing company.
Welcome to the Art of Interactive Mixing
Welcome to The Art of Interactive Mixing. At this year’s
Bar-B-Q Interactive Audio Think Tank, a group of audio industry
professionals racked their brains on the topic of the current state
of mixing sound for games. The very idea of a ‘mix’ for
a game is something that currently gets very short shrift not only
from our industry, but also from the very sound designers that create
and implement audio for games. We’re often happy if we get great
sounding assets working at a good relative balance; but that’s
where it all too often stops.
How do we, as audio directors, designers, and producers, advance
the art of game audio mixing? …and what does that even entail?
What is the possible range of expression we can give games through
our design and mixes?
Body (Integration and Mix)
I posit that we’re only scratching the surface of the potential here.
Game mixes tend to be overtly utilitarian, i.e. if it’s closer,
it’s louder. But that’s a very limited paradigm, and I
would like to challenge us to increase our palette of expression,
and create mixes that not only rival film mixes in quality, but do
things that linear media are incapable of. To that end we have set
up this site as a central repository of ideas, techniques, opinions,
white papers, post mortems, as well as a meeting place for discussion.
While at Bar-B-Q, we agreed that one person or small group could
not come up with the UBER-solution and present that to the world on
a golden platter. No single one of us has the full range of experience
necessary to be so audacious. Solutions will be emergent as we all
create the best mixes possible for the games we work on. Successful
techniques, ideas, and tools will float to the top organically. As
we experiment with mixing concepts on the various games we create,
we learn what works and what doesn’t. I also believe that over
time, it won’t be one uber-tool or uber-technique that wins
out. Hopefully, a broad collection of approaches and tools will be
known and available, and we can choose the most appropriate to accomplish
the artistic goal of any particular project.
The intention of this site is to accelerate the growth and development
of game audio mixes. Audio folks are often sequestered at work on
our individual projects and our sphere of influence is limited to
the company we work for. Therefore we’re all solving the same
problems in isolation. Furthermore, we are failing to educate students
and audio designers new to our industry, and that means wasting time
and money to get new personnel trained. Sharing ideas and techniques
can only help the still nascent field of game audio.
No excuses. (Noise Reduction)
For the first time in our industry, there is a small but good collection
of tools available to us (most, if not all will be discussed on this
site over time). With this current crop of tools, much is already
possible, and there’s much to be explored without adding a single
new feature. That said, we are and will be the greatest influence
on these and future tools. Therefore, let this site also be a place
where sound designers and tool makers can meet.
Lack of time, money, and resources, are real problems, but let’s
not beat this drum to excess. We often come off as the gripey, whiney
sound people, and that doesn’t further our cause. To advance
our art (or just get that must-have feature), become a positive evangelist
and educator to your team, company and game audience. Don’t
simply ask for new features or resources; convince your team through
demos and mock ups. In other words, don’t ask them - show them!
That’s the only way I’ve ever made any headway.
So I encourage everyone to approach this site with an attitude of
openness and sharing. We often feel we should keep our best ideas
to ourselves, but I suggest that this is counter-productive, not only
to game audio as a whole, but to your company and even to your personal
career. Over my 12 years in this industry, nothing has helped my career
more than openly sharing my ideas and techniques. Of course there’s
a place for proprietary technology and many companies can and should
develop tech that gives them an ‘advantage’. But even
here, the big concepts can be expressed and shared. In the end, techniques
and tools only take you so far; the key ingredient is a creative mind
and a killer set of ears, and those you cannot give away even if you try.
Contribute. Participate. Enjoy!
- Guy Whitmore, Microsoft Games
A blog entry from group member Peter Drescher imagining the possibilities
of better tools:
A report from a prior Project BBQ exploring a “produce once, playback
anywhere” model for delivering positional audio (whether to headphones,
stereo speakers, or surround sound):