The Seventh Annual Interactive Music Conference

Group Report: User Interface Design Issues for Audio Creation Tools

Participants: A.K.A. "The Giant Magic Globule"
  Shri Arora; Gibson Labs
  David Battino; Batmosphere
  Charlie Boswell; AMD
  Chris Crawford
  Atom Ellis; Analog Devices
  Jason Flaks; Dolby Laboratories
  Tim Keenan; ZForm
  Jason Kridner; Texas Instruments
  Adam Shelly; Analog Devices
  Scott Snyder; Infogrames
Facilitator: Aaron Higgins; MixMeister Technology

I. Introduction
Computer audio is far too hard to use, no matter what the user's level - consumer, prosumer, or pro. We set out to identify a framework that would bridge the digital divide between artists (or music consumers) and geeks. We noted that artists need a more consistent and intuitive workflow in order to be more productive. Although implementing basic usability principles would be a good start, we theorized that there was another reason audio user interfaces (UIs) are so bad: There is no standard for mapping controls to functions in interactive audio.

We started by identifying some basic usability guidelines, then focused on usability guidelines specific to audio applications. Next, we outlined some requirements for the "user interface interface," with special emphasis on game audio production and delivery. Finally, we identified some existing technologies that could potentially form a framework for developing the necessary mapping and control protocol.

II. Issue: Contradictions in UI Needs

A. Simplicity

  • Consistent
  • Always the same mapping of controls
  • Essential information only
  • Usable by idiots
  • Less is more

B. Flexibility

  • Fully configurable
  • Ability to remap all controls
  • Accessible to blind, deaf, and others
  • Remote control and collaboration
  • Usable live
  • More is better

Answer: Decouple UI and Tool

A. Create a language for describing the "user interface interface" (UII)

  • Allows for multiple implementations of user interfaces.
  • Provides a Geek/Freak interface, i.e., one that's both powerful and simple to use.

B. Create a scalable control infrastructure

III. Basic Usability Guidelines

  • All functionality should be accessible in the fewest possible steps.
  • Common functions should require only one step.
  • Provide user-configurable hotkeys for everything. (Every command should be accessible through any input device - mouse, keyboard, etc.)
  • Don't re-invent things that do work. (Use accepted conventions when helpful.)
  • Provide multilevel undo/redo. Branching (the ability to undo the thing you did five things ago but keep the rest) might be useful.
  • Controls should relate to the available input device. The computer should detect the input device and configure itself appropriately; the user should be able to specify a preferred input device.
  • Controls should return physical feedback where appropriate.
  • Controls should support both absolute and relative changes.
  • Provide rollover help (tooltips) for every control.
  • In general, each control must support both coarse and fine adjustments, perhaps through a modifier key.
  • The appearance of a control should suggest its function and operation. (Appearance means presentation, not necessarily visual appearance.)
  • The control's label should accurately reflect its function.
  • Controls must respond rapidly to input.
  • Organize commands rationally.
  • Devices should support a remote UI.
  • Programs should have a Top Ten menu, compiled dynamically from the user's favorite actions. (Contrast this to Microsoft's approach of removing rarely used items from menus.)
  • Don't make the user do housekeeping.
  • Make sure that modal interfaces truly correspond to the user's situation and desires.
  • All reports to the user are of five basic classes, and should be indicated as such. Furthermore, these communications should be clear and honest:
    • 1. Primary data window
    • 2. Progress report
    • 3. Program error
    • 4. Unsupported request ("I can't handle that")
    • 5. More information needed
  • Error messages must be descriptive and offer solutions (e.g., connecting to Google or an online support forum to get more information).
  • Programs should offer useful warnings to help prevent possible problems in the future.
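The branching undo/redo guideline above (undo the thing you did five things ago but keep the rest) amounts to storing history as a tree of states rather than a linear stack. The sketch below is a hypothetical minimal model; the class and method names are our own, not from any existing tool:

```python
class UndoNode:
    """One saved state in a branching undo history."""
    def __init__(self, state, parent=None):
        self.state = state
        self.parent = parent
        self.children = []


class UndoTree:
    """Undo history as a tree: undoing and then making a new edit starts a
    new branch, so the abandoned edits stay reachable instead of being
    discarded (as they would be with a linear undo stack)."""
    def __init__(self, initial_state):
        self.root = UndoNode(initial_state)
        self.current = self.root

    def commit(self, state):
        """Record a new edit as a child of the current state."""
        node = UndoNode(state, parent=self.current)
        self.current.children.append(node)
        self.current = node

    def undo(self):
        """Step back to the parent state, if any."""
        if self.current.parent is not None:
            self.current = self.current.parent
        return self.current.state

    def redo(self, branch=-1):
        """Step forward; by default follow the most recently created branch."""
        if self.current.children:
            self.current = self.current.children[branch]
        return self.current.state
```

Undoing and then committing a new edit leaves the old branch intact, so the user can later redo down either path.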

IV. Audio-Specific Usability

  • Don't be restricted by a track metaphor.
  • Always be recording.
  • Manage the assets without putting a burden on the user (e.g., the user shouldn't have to worry about what format an audio file is; the program should just play it).
  • All functions should be available without having to stop playback.
  • Standardize normal volume level. (Tame the forest of volume sliders.)

V. Challenges

  • There is no easy way to make interactive music.
  • By being cumbersome, programs sap your inspiration and restrict creative flow.
  • How do designers establish an intuitive and efficient mapping between controls and actions?
  • Designers rarely observe users, so they don't know what users want.
  • Historical inertia stifles innovation.
  • Tools are too general.
  • Definition of a "user": There are three categories of interface for any product:
    • 1. Consumer
    • 2. Prosumer
    • 3. Professional
  • There is a lack of scalable user interfaces.

VI. Our Task

  • Use an intermediary language for the UI (e.g., XML).
  • Create a dictionary of common controls and what they do.
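As a sketch of what such a dictionary of common controls might look like in XML, each entry could map an abstract control to its type, range, and units, which a UI generator could then render however it likes. The element and attribute names below are invented for illustration; no such schema exists yet:

```python
import xml.etree.ElementTree as ET

# Hypothetical UII control-dictionary entries; names are illustrative only.
CONTROL_DICTIONARY = """
<controls>
  <control id="volume" type="continuous" min="0.0" max="1.0" unit="linear-gain">
    <description>Overall output level of a device or track</description>
  </control>
  <control id="pan" type="continuous" min="-1.0" max="1.0" unit="position">
    <description>Stereo position, left (-1) to right (+1)</description>
  </control>
  <control id="mute" type="toggle">
    <description>Silences the device without changing its volume setting</description>
  </control>
</controls>
"""


def load_controls(xml_text):
    """Parse the dictionary into {id: attributes} for a UI generator."""
    root = ET.fromstring(xml_text)
    return {c.get("id"): dict(c.attrib) for c in root.findall("control")}
```

Because the dictionary describes controls abstractly, a consumer UI could render `volume` as a single large knob while a pro UI exposes it as a fader with fine adjustment, both driving the same underlying function.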

VII. Freaks and Geeks Must Get Along: The Giant Globule
One of the conclusions of the group was that we need a standard language for describing the "user interface interface" (UII). This idea really resonated with the "Freaks and Geeks" component of the working group. We saw it as a good answer to two related questions we were struggling with: How do artists communicate artistic information to programmers, and how do artists communicate artistic information to other artists?

The information that artists need to transfer falls into two general categories. First, state information: How was your studio configured when you produced that cool sound? A historical example of this type of data is taking a Polaroid of your analog effects rack. Many current digital audio devices provide a way of saving their own state. We would like to see a standard language for saving the state of all devices, including the connections between them.

The second category of information is audio data and performance information. This is the fundamental problem for game-audio designers. They must communicate to the programmer the raw audio data in the form of wave files. But they also need a method of communicating the performance: the rules of behavior governing playback of the data, such as variations, DSP effects to apply, and conditional decisions. Current methods often put too much control over the performance in the hands of the programmer.

Our group envisions that both types of communication can be accomplished with a single format and protocol. From the 10,000-foot view, both studio setup and game playback are the same thing: There are only four basic objects involved: Command Events, Sound Emitters, Sound Processors, and Output Devices. In the studio space these are objects such as keyboards and control boards, synthesizers and wavetables, DSP plug-ins and effects boxes, and speakers and wave-file writers. In the game space examples are player triggers and game states, wavetables, DSP hardware or software, and the speaker drivers. In either case, the chain of devices essentially becomes an instrument for the artist. We believe that there can be a standard language capable of controlling both types of digital instruments.
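The four basic objects above can be modeled as a simple processing chain, with a command event driving an emitter through processors to an output. This is a hypothetical sketch to show the shared shape of studio and game setups; the class names are ours, not an existing API:

```python
# Hypothetical model of the four basic objects; names are illustrative.
class SoundEmitter:
    """Produces audio in response to a command, e.g. a synth or wavetable."""
    def __init__(self, name):
        self.name = name

    def emit(self, command):
        return f"{self.name}:{command}"


class SoundProcessor:
    """Transforms audio, e.g. a DSP plug-in or an effects box."""
    def __init__(self, effect):
        self.effect = effect

    def process(self, signal):
        return f"{self.effect}({signal})"


class OutputDevice:
    """Renders audio, e.g. speakers or a wave-file writer."""
    def render(self, signal):
        return f"render[{signal}]"


class Chain:
    """A command event drives emitter -> processors -> output. The same
    shape describes a studio rig (keyboard -> synth -> effects -> speakers)
    or a game engine (trigger -> wavetable -> DSP -> speaker drivers)."""
    def __init__(self, emitter, processors, output):
        self.emitter = emitter
        self.processors = processors
        self.output = output

    def fire(self, command):
        signal = self.emitter.emit(command)
        for p in self.processors:
            signal = p.process(signal)
        return self.output.render(signal)
```

Whether the command event is a MIDI note from a keyboard or a trigger pulled in a game, the chain of devices behaves as one instrument.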

The obvious question is, "Even if we can control both types of information the same way, why do we want to?" What are the advantages of establishing a standard language of audio control that can be spoken by pro studio equipment and game engines? Here are some of the possibilities we got excited about.

  • A standard language would allow real-time control of in-game parameters, using standard studio input devices. Today, the artist's creation process is often: create -> save -> build into game -> play game and listen -> return to studio -> fix -> repeat. A standard interface would simplify the process to: create -> perform real-time transfer to running game -> play and listen and fix on the spot.
  • A standard language would allow the future extension of game (and other) hardware that is considered fixed. For example, if the language describes 3D spatialization, then future hardware should be able to use that information to extend the original system's 3D capabilities (9.1 speakers, anyone?), even with programs written before the hardware existed. Another possibility: The standard would allow the production of add-on playback hardware that takes the CPU processing burden off the original host. Or hardware that allows better sound reproduction than the audio drivers on your Game Boy or cell phone.
  • A standard language can be a tool for real-time collaboration. It would allow hooking up two studios over a network. It would allow one artist to send equalizer settings to another, even if they have different brands of EQs. Or it would allow end users to jam together over Xbox Live, or any network.
  • A standard language would allow creation of alternate input devices, such as a USB guitar that hooks into your game console. And it would allow alternate output devices, replacing a console's limited wavetable with a Gigasampler-size one, for example.

So what would such a language look like? More study is needed. But the group identified several existing protocols that may already provide some of the capabilities we envision. In order to meet the needs described above, a few things are clear:

  • The language must be able to describe device state information. It would be more flexible if it described state in terms of the effect, rather than the input. For instance, "Make it 10 percent louder" instead of, "Set its volume to 11."
  • It would have to describe chains of devices: "Patch the output of this synth into that delay," for example.
  • It would need a method of describing device equivalency - some way of saying, "This sampler can perform the functions of that game console's built-in wavetable."
  • It needs the ability to communicate performance rules: "When trigger is pulled, play 'gunshot'; if ground is not grass, play 'shellBounce.'"
  • It would need to be processed in real time by digital audio devices and computer software.
  • It needs to be capable of being recorded into a digital file format (the Giant Magic Globule) that can be passed from one artist to another.

The next steps are to evaluate the existing protocols to see how they meet these needs, and to further refine these ideas to produce a better outline of what this standard audio language would entail.

VIII. Existing Technologies
A working group in the Audio Engineering Society Standards Committee created the AES-31 specification. AES-31 allows different workstations to share digital audio media and edit decision lists. It is a simple file standard and is FAT32-compatible.

The Media-Accelerated Global Information Carrier is a specification developed by Gibson Labs (www.gibsonmagic.com). It is an open, royalty-free protocol that utilizes Ethernet to interconnect multimedia devices. It provides a transport for both content and control.

Universal Plug and Play is a standard based on an underlying architecture contributed by Microsoft. Additional functionality is defined in Device and Service descriptions by the Universal Plug and Play Forum (www.upnp.org). UPnP is a control and discovery protocol that operates on top of IP. UPnP enables automatic discovery using DHCP or AutoIP. Devices communicate their control information via an XML document that lists available services, including their actions and states.

Rendezvous is an open standard developed by Apple. It is a zero-configuration discovery protocol that allows devices to automatically discover each other on a network and list the services they provide. Rendezvous is based on IP and other open standards.

The HAVi organization was founded by eight companies including Sony, Philips, Matsushita, and Toshiba. HAVi is an automatic discover-and-control protocol that uses Java and IEEE 1394 as its underlying network. When devices connect via HAVi, they broadcast their appearance on the network and provide a Java-based control application and interface to interested devices.

IX. Action Items

  • Evaluate existing standards (including ones mentioned in Part VIII) to see if they meet the identified needs of audio usability and report back to the Project Bar-B-Q mailing list.
  • Post usability guidelines on Musability.com; solicit input from the real world.
  • Call a general meeting at NAMM 2003 to discuss audio usability.
