Computer audio is far too hard to use, no matter what the user's
level - consumer, prosumer, or pro. We set out to identify a framework
that would bridge the digital divide between artists (or music consumers)
and geeks. We noted that artists need a more consistent and intuitive
workflow in order to be more productive. Although implementing basic usability
principles would be a good start, we theorized that there was another
reason audio user interfaces (UIs) are so bad: There is no standard
for mapping controls to functions in interactive audio.
We started by identifying some basic usability guidelines, then focused
on usability guidelines specific to audio applications. Next, we outlined
some requirements for the "user interface interface," with special
emphasis on game audio production and delivery. Finally, we identified
some existing technologies that could potentially form a framework for
developing the necessary mapping and control protocol.
II. Issue: Contradictions in UI Needs
On one hand, users want simplicity:
- Always the same mapping of controls
- Essential information only
- Usable by idiots
- Less is more
On the other hand, they want flexibility:
- Fully configurable
- Ability to remap all controls
- Accessible to blind, deaf, and others
- Remote control and collaboration
- Usable live
- More is better
UI and Tool
A. Create a language for describing the "user interface interface"
- Allows for multiple implementations of user interfaces.
- Provides a Geek/Freak interface, i.e., one that's both powerful and
simple to use.
B. Create a scalable control infrastructure
III. Basic Usability
- All functionality should be accessible in the fewest possible steps.
- Common functions should require only one step.
- Provide user-configurable hotkeys for everything. (Every command should
be accessible through any input device - mouse, keyboard, etc.)
- Don't re-invent things that do work. (Use accepted conventions where they exist.)
- Provide multilevel undo/redo. Branching (the ability to undo the thing
you did five things ago but keep the rest) might be useful.
- Controls should relate to the available input device. The computer should
detect the input device and configure itself appropriately, and the user
should be able to specify a preferred input device.
- Controls should return physical feedback where appropriate.
- Controls should support both absolute and relative changes.
- Provide rollover help (tooltips) for every control.
- In general, each control must support both coarse and fine adjustments,
perhaps through a modifier key.
- The appearance of a control should suggest its function and operation.
(Appearance means presentation, not necessarily visual appearance.)
- The control's label should accurately reflect its function.
- Controls must respond rapidly to input.
- Organize commands rationally.
- Devices should support a remote UI.
- Programs should have a Top Ten menu, compiled dynamically from the
user's favorite actions. (Contrast this to Microsoft's approach of removing
rarely used items from menus.)
- Don't make the user do housekeeping.
- Make sure that modal interfaces truly correspond to the user's situation.
- All reports to the user fall into five basic classes and should be identified
as such. Furthermore, these communications should be clear and honest:
- 1. Primary data window
- 2. Progress report
- 3. Program error
- 4. Unsupported request ("I can't handle that")
- 5. More information needed
- Error messages must be descriptive and offer solutions (e.g., connecting
to Google or an online support forum to get more information).
- Programs should offer useful warnings to help prevent possible problems
in the future.
- Don't be restricted by a track metaphor.
- Always be recording.
- Manage the assets without putting a burden on the user (e.g., the
user shouldn't have to worry about what format an audio file is; the
program should just play it).
- All functions should be available without having to stop playback.
- Standardize normal volume level. (Tame the forest of volume sliders.)
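Several of the control guidelines above - supporting both absolute and relative changes, and both coarse and fine adjustments via a modifier key - can be illustrated with a small sketch. Everything here (the class, its methods, the step sizes) is a hypothetical illustration, not part of any existing API.

```python
class Control:
    """A generic UI control supporting absolute and relative changes,
    with coarse and fine adjustment steps (a guideline sketch, not a spec)."""

    def __init__(self, value=0.0, lo=0.0, hi=1.0, coarse=0.1, fine=0.01):
        self.value, self.lo, self.hi = value, lo, hi
        self.coarse, self.fine = coarse, fine

    def set_absolute(self, v):
        # Absolute change: jump directly to a target value, clamped to range.
        self.value = max(self.lo, min(self.hi, v))

    def nudge(self, steps, fine=False):
        # Relative change: move by N steps; a modifier key selects fine steps.
        step = self.fine if fine else self.coarse
        self.set_absolute(self.value + steps * step)

vol = Control(value=0.5)
vol.nudge(+2)             # coarse: two big steps up
vol.nudge(-3, fine=True)  # fine: three small steps down
vol.set_absolute(2.0)     # out-of-range request is clamped to 1.0
```

The same object can sit behind a mouse drag (relative), a typed value (absolute), or a hardware encoder, which is one way to keep the mapping of controls consistent across input devices.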
- There is no easy way to make interactive music.
- By being cumbersome, programs sap your inspiration and restrict creative expression.
- How do designers establish an intuitive and efficient mapping between
controls and actions?
- Designers rarely observe users, so they don't know what users want.
- Historical inertia stifles innovation.
- Tools are too general.
- Definition of a "user": There are three categories of interface
for any product:
- 1. Consumer
- 2. Prosumer
- 3. Professional
- There is a lack of scalable user interfaces.
VI. Our Task
- Use an intermediary language for UI (e.g., XML).
- Create a dictionary of common controls and what they do.
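To make the task concrete, here is one way such an intermediary description might look: an XML document that names each control, its range, and the function it maps to, which any front end (GUI, hardware surface, screen reader) could then render. The element and attribute names are invented for illustration; no such schema exists yet.

```python
import xml.etree.ElementTree as ET

# A hypothetical "user interface interface" document: an abstract
# description of controls and the functions they map to. All element
# and attribute names here are invented for illustration.
UII_DOC = """
<uii device="ReverbUnit">
  <control id="mix" type="continuous" min="0" max="100" unit="percent"
           maps-to="wet-dry-balance"/>
  <control id="bypass" type="toggle" maps-to="effect-enable"/>
</uii>
"""

root = ET.fromstring(UII_DOC)
for ctl in root.findall("control"):
    # The maps-to attribute is the "dictionary of common controls" idea:
    # a shared vocabulary of functions that controls can be bound to.
    print(ctl.get("id"), "->", ctl.get("maps-to"))
```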
VII. Freaks and Geeks Must Get Along: The Giant Globule
One of the conclusions of the group was that we need a standard language
for describing the "user interface interface" (UII). This idea
really resonated with the "Freaks and Geeks" component of the
working group. We saw it as a good answer to two related questions we
were struggling with: How do artists communicate artistic information
to programmers, and how do artists communicate artistic information to other artists?
The information that artists need to transfer falls into two general
categories. First, state information: How was your studio configured when
you produced that cool sound? A historical example of this type of data
is taking a Polaroid of your analog effects rack. Many current digital
audio devices provide a way of saving their own state. We would like to
see a standard language for saving the state of all devices, including
the connections between them.
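A saved studio "snapshot" of this kind would need to capture both each device's own settings and the connections between devices. The sketch below shows one possible shape for such a snapshot; the field names and the choice of JSON are assumptions made purely for illustration.

```python
import json

# A hypothetical studio snapshot: each device saves its own state, and
# the connections between devices are captured explicitly - the digital
# equivalent of taking a Polaroid of the effects rack.
snapshot = {
    "devices": {
        "synth1": {"kind": "synthesizer", "patch": "warm-pad", "volume": 0.8},
        "delay1": {"kind": "effect", "time_ms": 375, "feedback": 0.4},
        "out":    {"kind": "output", "channels": 2},
    },
    "connections": [
        ["synth1", "delay1"],  # patch the synth output into the delay...
        ["delay1", "out"],     # ...and the delay into the main output
    ],
}

saved = json.dumps(snapshot, indent=2)  # portable text form of the state
restored = json.loads(saved)            # another studio can rebuild it
```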
The second category of information is audio data and performance information.
This is the fundamental problem for game-audio designers. They must communicate
to the programmer the raw audio data in the form of wave files. But they
also need a method of communicating the performance: the rules of behavior
governing playback of the data, such as variations, DSP effects to apply,
and conditional decisions. Current methods often put too much control
over the performance in the hands of the programmer.
Our group envisions that both types of communication can be accomplished
with a single format and protocol. From the 10,000-foot view, studio
setup and game playback are the same thing; only four basic objects
are involved: Command Events, Sound Emitters, Sound Processors, and
Output Devices. In the studio space these are objects such as keyboards
and control boards, synthesizers and wavetables, DSP plug-ins and effects
boxes, and speakers and wave-file writers. In the game space examples
are player triggers and game states, wavetables, DSP hardware or software,
and the speaker drivers. In either case, the chain of devices essentially
becomes an instrument for the artist. We believe that there can be a standard
language capable of controlling both types of digital instruments.
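The four-object chain described above can be sketched as a minimal pipeline. The class names mirror the report's terms; every implementation detail is invented for illustration.

```python
# A Command Event drives a Sound Emitter, whose output passes through a
# Sound Processor to an Output Device - the same chain whether the event
# is a keyboard key in the studio or a player trigger in a game.
class SoundEmitter:          # e.g., a synth or wavetable
    def emit(self, note):
        return f"tone({note})"

class SoundProcessor:        # e.g., a DSP plug-in or effects box
    def process(self, signal):
        return f"reverb({signal})"

class OutputDevice:          # e.g., speakers or a wave-file writer
    def __init__(self):
        self.rendered = []
    def render(self, signal):
        self.rendered.append(signal)

def command_event(note, emitter, processor, output):
    # The command event just pushes a note down the chain; swapping any
    # stage for different hardware leaves the rest of the chain untouched.
    output.render(processor.process(emitter.emit(note)))

out = OutputDevice()
command_event("C4", SoundEmitter(), SoundProcessor(), out)
```

The point of the sketch is that the chain, not any single box, is the artist's instrument: a standard language would describe how these four kinds of objects are wired together.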
The obvious question is, "Even if we can control both types of information
the same way, why do we want to?" What are the advantages of establishing
a standard language of audio control that can be spoken by pro studio
equipment and game engines? Here are some of the reasons we got excited:
- A standard language would allow real-time control of in-game parameters,
using standard studio input devices. Today, the artist's creation process
is often: create -> save -> build into game -> play game and
listen -> return to studio -> fix -> repeat. A standard interface
would simplify the process to: create -> perform real-time transfer
to running game -> play and listen and fix on the spot.
- A standard language would allow the future extension of game (and
other) hardware that is considered fixed. For example, if the language
describes 3D spatialization, then future hardware should be able to
use that information to extend the original system's 3D capabilities
(9.1 speakers, anyone?), even with programs written before the hardware
existed. Another possibility: The standard would allow the production
of add-on playback hardware that takes the CPU processing burden off
the original host. Or hardware that allows better sound reproduction
than the audio drivers on your Gameboy or cell phone.
- A standard language can be a tool for real-time collaboration. It
would allow hooking up two studios over a network. It would allow one
artist to send equalizer settings to another, even if they have different
brands of EQs. Or it would allow end users to jam together over Xbox
Live, or any network.
- A standard language would allow creation of alternate input devices,
such as a USB guitar that hooks into your game console. And it would
allow alternate output devices, replacing a console's limited wavetable
with a Gigasampler-size one, for example.
So what would such a language look like? More study is needed. But the
group identified several existing protocols that may already provide some
of the capabilities we envision. In order to meet the needs described
above, a few things are clear:
- The language must be able to describe device state information. It
would be more flexible if it described state in terms of the effect,
rather than the input. For instance, "Make it 10 percent louder"
instead of "Set its volume to 11."
- It would have to describe chains of devices: "Patch the output
of this synth into that delay," for example.
- It would need a method of describing device equivalency - some way
of saying, "This sampler can perform the functions of that game
console's built-in wavetable."
- It needs the ability to communicate performance rules: "When
trigger is pulled, play 'gunshot'; if ground is not grass, play 'shellBounce.'"
- It would need to be processed in real time by digital audio devices
and computer software.
- It needs to be capable of being recorded into a digital file format
(the Giant Magic Globule) that can be passed from one artist to another.
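The performance-rule requirement can be made concrete with the report's own example - "when trigger is pulled, play 'gunshot'; if ground is not grass, play 'shellBounce'" - expressed as data plus a tiny interpreter. The rule format here is invented for illustration; a real standard would need a far richer vocabulary of conditions.

```python
# Performance rules as data: each rule names a triggering event, an
# optional game-state condition, and the sound to play.
RULES = [
    {"on": "trigger_pulled", "play": "gunshot"},
    {"on": "trigger_pulled", "unless": ("ground", "grass"), "play": "shellBounce"},
]

def handle(event, state):
    """Return the sounds the rules say to play for this event and state."""
    sounds = []
    for rule in RULES:
        if rule["on"] != event:
            continue
        cond = rule.get("unless")
        if cond and state.get(cond[0]) == cond[1]:
            continue  # the "unless" condition matched, so skip this sound
        sounds.append(rule["play"])
    return sounds

print(handle("trigger_pulled", {"ground": "dirt"}))   # gunshot and shellBounce
print(handle("trigger_pulled", {"ground": "grass"}))  # gunshot only
```

Because the rules are plain data, they could travel in the same recorded file (the Giant Magic Globule) as the audio and state information, rather than being hard-coded by the programmer.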
The next steps are to evaluate the existing protocols to see how they
meet these needs, and to further refine these ideas to produce a better
outline of what this standard audio language would entail.
VIII. Existing Technologies
A working group in the Audio Engineering Society Standards Committee created
the AES-31 specification. AES-31 allows different workstations to share
digital audio media and edit decision lists. It is a simple file standard
and is FAT32-compatible.
The Media-Accelerated Global Information Carrier is a specification developed
by Gibson Labs (www.gibsonmagic.com).
It is an open, royalty-free protocol that utilizes Ethernet to interconnect
multimedia devices. It provides a transport for both content and control.
Universal Plug and Play is a standard based on an underlying architecture
contributed by Microsoft. Additional functionality is defined in Device
and Service descriptions by the Universal Plug and Play Forum (www.upnp.org).
UPnP is a control and discovery protocol that operates on top of IP. UPnP
enables automatic discovery using DHCP or AutoIP. Devices communicate
their control information via an XML document describing the available
services, including their actions and states.
Rendezvous is an open standard developed by Apple. It is a zero-configuration
discovery protocol that allows devices to automatically discover each
other on a network and provide a list of available services. Rendezvous
is based on IP and other open standards.
The HAVi organization was founded by eight companies including Sony, Philips,
Matsushita, and Toshiba. HAVi is an automatic discovery-and-control protocol
that uses Java and IEEE 1394 as its underlying network. When devices connect
via HAVi, they broadcast their presence on the network and provide a
Java-based application and control interface to interested devices.
IX. Action Items
- Evaluate existing standards (including ones mentioned in Part VIII)
to see if they meet the identified needs of audio usability and report
back to the Project Bar-B-Q mailing list.
- Post usability guidelines on Musability.com; solicit input from the community.
- Call a general meeting at NAMM 2003 to discuss audio usability.