The Tenth Annual Interactive Music Conference

Group Report: Improving Computer Audio and Music Production Systems User Interfaces

Participants: A.K.A "The Al Gore Rhythm Method"

Pat Azzarello, Microsoft
David Battino, Batmosphere
Athan Bilias, Yamaha
Todd Hager, Dolby Laboratories
Ron Kuper, Cakewalk
Kevin MacManus, Yamaha R&D
Philip Merrill, Digital Media Project
Tom White, MMA
Facilitator: Aaron Higgins, MixMeister

A brief statement of the problem(s) on which the group worked

Many user interfaces in computer audio and music production systems impede creativity by slowing the ability to capture inspiration, leading to customer dissatisfaction. For example, these interfaces are:

  • Overly technical, intimidating users who do not have experience in traditional recording methods.
  • Inflexible, frustrating users whose workflow is different from the rigid one designed in the product.
  • Difficult to navigate, making it hard to balance screen space and locate necessary functions.
  • Difficult to configure, making it hard to access features and add or remove system components.

These problems promote customer disaffection, increase frustration, lower productivity, destroy inspiration, and reduce customer loyalty as users move to other products (i.e., don’t upgrade).

A brief statement of the group’s solutions to those problems

  • As part of product planning, identify personas and potential workflows for targeted users.
  • Applications should deliver an adaptive UI environment based on the user’s needs and background. This can be done in various ways such as:
    • Querying users when the application is first run to identify their degree and type of experience.
    • Querying users as they start a new project to determine the likely workflow.
    • Previewing the environment and guiding users through the workflow, allowing them to refine the environment as they go.
    • Deemphasizing elements of the workspace that are seldom used or not applicable (at that time).
  • Explore other metaphors for visualizing and interacting with audio data, such as using frequency displays in conjunction with waveform displays.
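
As a rough illustration of the de-emphasis idea in the list above, an application could track how often each workspace element is used and dim the ones that fall below some share of total use. The function name, the element names, and the 5% threshold are all assumptions made for this sketch; the group did not specify a mechanism.

```python
def emphasis_levels(usage_counts, all_elements, threshold=0.05):
    """Return {element: "normal" | "dimmed"} from per-element usage counts.

    usage_counts: dict of element name -> number of times the user invoked it.
    threshold: hypothetical minimum share of total use needed to stay prominent.
    """
    total = sum(usage_counts.values())
    if total == 0:
        # No usage data yet, so nothing is de-emphasized.
        return {element: "normal" for element in all_elements}
    levels = {}
    for element in all_elements:
        share = usage_counts.get(element, 0) / total
        levels[element] = "normal" if share >= threshold else "dimmed"
    return levels


if __name__ == "__main__":
    counts = {"record": 40, "play": 50, "eq": 9, "score_editor": 1}
    elements = ["record", "play", "eq", "score_editor", "surround_panner"]
    for name, level in emphasis_levels(counts, elements).items():
        print(name, level)
```

Dimming rather than hiding keeps seldom-used functions discoverable, which matches the "at that time" qualifier in the list.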

The action item list

  • Flesh out the personas and workflows: Pat Azzarello and Philip Merrill [completed]
  • Survey alternative interfaces, such as ones used in children’s software: David Battino [ongoing]
  • Survey interfaces used by DAWs and plug-ins: Ron Kuper [ongoing]
  • Explain how Yamaha is integrating software and hardware through Studio Connections: Kevin MacManus [ongoing]
  • Solicit input from top producers and engineers and publish results: Philip Merrill/David Battino [ongoing; see also The Art of Digital Music]
  • Contribute and refine additional personas: Todd Hager [ongoing]

Expanded problem statement

  • Most GUIs replicate the tools used for music and sound manipulation instead of representing the processes.
  • People approach song-making many different ways, and we need to help them regardless of how they joined the game. The current model is either keyboardist- or recording engineer-focused, and often that isn’t the customer’s background.
  • The current model doesn’t look like it’s making music. Songs don’t look like songs; there is no obvious visual (static) depiction of which way time flows.
  • The most complicated issue for most users is managing the connections between hardware and software (as well as between multiple software packages).
  • Hardware and software aren’t automatically and seamlessly integrated.
  • Reviewers slam innovative interfaces for being nonstandard.
  • None of today’s software products support all the different ways that people want to create music or audio.
  • Because of the “democratization of recording,” musicians have to be their own engineers. The technical left-brained stuff isn’t separated from the creative right-brained stuff in the UI.
  • The UI is not as compartmentalized as it should be, so users can’t focus on the task at hand.
  • Eye candy sells products, but too much candy is bad for you. (Often what looks good isn’t necessarily the most intuitive or usable UI.)

Expanded solution description

The scope of the solution encompasses:

  1. Modular workspaces that change depending on context or user experience and background.
  2. “Adaptive workflow” — a way of following the individual work habits and personality of the user, step by step.
  3. Defined needs for hardware integration.
  4. Exploring alternatives to waveform displays and timeline.

1. Modular Workspaces

The group’s discussion about developing better user interfaces for audio software began with the simple question, “Why do so many music programs look like spreadsheets?” The goal of music software is to facilitate and record creativity, not perform calculations, so if a user wants a multitrack project to look like something else, he should be able to do that. (Similarly, if a user does want to use popular spreadsheet programs to manage tracks, why not enable that as well?) We also noted the challenge of managing 70-plus tracks stacked like a downtown office building.

DAW as Spreadsheet

Why do so many music programs look like spreadsheets? Doesn’t that unnecessarily “box in” the user? To demonstrate the resemblance, we simulated a digital audio workstation in Microsoft Excel.

On the other extreme, making user interfaces look like physical objects (mixing consoles, effects processors, analog synthesizers, etc.) helps a segment of the population but intimidates the rest. We shouldn’t throw out those paradigms if they still work (e.g., transport controls), but we must balance intuitive appearance with usability. For example, it’s intuitive that moving a fader up will increase the amount of something, but faders consume a lot of screen space and, unless grouped, can be addressed only one at a time with a mouse. Moreover, there are many times musicians will want to make sweeping, gestural changes to the music rather than surgical ones.

Configuring the Modular Workspace

We feel that a successful user interface for audio software should match each identifiable user segment with a workflow assembled from project work-modules, like virtual Lego pieces. The appearance and use of the display should support the needs of people who purchased the product specifically for a certain project.

We considered various means for customizing the UI. For example, the program could present a split screen, and the user would select instruments on the left side, building a personal avatar graphic on the right. The user could confirm that this graphic pictured his personal situation (segment). For configuring gear, the user (or program) could fill an onscreen rack incrementally, or the program could configure a workflow, with the user confirming that the details pictured matched his project needs. If standardized, this data could be managed and shared via XML.
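
To make the XML idea concrete, here is a minimal sketch of how a user's segment, instruments, and onscreen rack might be written to XML and read back. The element and attribute names (`userProfile`, `segment`, `instrument`, `device`, `slot`) are invented for illustration; no standard schema is implied.

```python
import xml.etree.ElementTree as ET


def profile_to_xml(segment, instruments, rack):
    """Serialize a hypothetical user profile to an XML string."""
    root = ET.Element("userProfile", {"segment": segment})
    instruments_el = ET.SubElement(root, "instruments")
    for name in instruments:
        ET.SubElement(instruments_el, "instrument").text = name
    rack_el = ET.SubElement(root, "rack")
    # Rack slots are numbered from 1 in the order the user filled them.
    for slot, device in enumerate(rack, start=1):
        ET.SubElement(rack_el, "device", {"slot": str(slot)}).text = device
    return ET.tostring(root, encoding="unicode")


def profile_from_xml(xml_text):
    """Parse the XML produced by profile_to_xml back into a dict."""
    root = ET.fromstring(xml_text)
    devices = sorted(root.findall("./rack/device"),
                     key=lambda el: int(el.get("slot")))
    return {
        "segment": root.get("segment"),
        "instruments": [el.text for el in root.findall("./instruments/instrument")],
        "rack": [el.text for el in devices],
    }


if __name__ == "__main__":
    xml_text = profile_to_xml("songwriter", ["acoustic guitar"], ["compressor", "reverb"])
    print(xml_text)
    print(profile_from_xml(xml_text))
```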

To construct an appropriate production environment, the DAW would survey the user at the first startup and customize itself based on the response. Questions might include:

  • What instrument do you play?
  • Do you have a DJ background?

In other words, who are you, and what do you want to do?

A person who reads music, but has no traditional recording experience, would likely be a good candidate to either:

  1. Enter their music in notation and have it rendered with a software instrument
  2. Record their instrument, utilizing tempo and bar lines, etc.

A DJ with little or no musical training would be unlikely to use the same methodology.

Here is a simplified survey example:

Survey Flowchart

User Setup Questionnaire

We recommend that all complex music software include some version of this questionnaire and make use of the information to tailor the UI. Different apps would have different surveys; for example, Logic Pro would be able to make some assumptions about the user’s skill level. The layout could resemble an avatar builder: as you answer the survey, your avatar fills in and you can see how it matches up. Potential questions:

  • Do you play an instrument? [Yes/No]
  • Can you read music? [Yes/A little/No]
  • Have you ever recorded anything? [Yes, on a computer; Yes, on a tape deck; Yes, in a recording studio; No]
  • What kinds of gear will you be using? [Keyboards; Guitars; Effects; Mixing Surface; Turntable; Microphone]
  • Are you a professional or an amateur? (Skilled or unskilled? Complicated or simple? Familiar with traditional production paradigm or not? Clued or clueless?)
  • How old are you? [Child; Young Adult; Adult; Mature] (This could be qualified by humorous questions such as, What band did Paul McCartney play in? [Wings; Beatles; Michael Jackson; Who is Paul McCartney?])
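
The questionnaire above could feed a simple rule table that picks workspace modules. The sketch below is one possible reading: the module names and the rules are hypothetical, chosen to mirror the examples in this report (a music reader gets notation entry, a DJ does not), and a real product would need rules derived from its own personas.

```python
from dataclasses import dataclass, field


@dataclass
class SurveyAnswers:
    plays_instrument: bool
    reads_music: str            # "yes", "a little", or "no"
    recording_experience: str   # "computer", "tape", "studio", or "none"
    gear: set = field(default_factory=set)


def suggest_workspace(answers):
    """Return an ordered list of (hypothetical) workspace modules."""
    modules = ["transport"]  # transport controls are assumed useful to everyone
    if answers.reads_music == "yes":
        modules.append("notation_view")
    if answers.plays_instrument:
        modules.append("audio_record_with_bar_lines")
    if "turntable" in answers.gear:
        modules.append("playlist_and_loops")
    # Assumption: only computer or studio experience warrants the full mixer.
    if answers.recording_experience in ("computer", "studio"):
        modules.append("full_mixer")
    else:
        modules.append("simple_level_controls")
    return modules


if __name__ == "__main__":
    music_reader = SurveyAnswers(True, "yes", "none", {"keyboards"})
    dj = SurveyAnswers(False, "no", "none", {"turntable"})
    print(suggest_workspace(music_reader))
    print(suggest_workspace(dj))
```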

User Personas: from Granny to Grammy

During and after the Project Bar-B-Q conference, we developed personas characterizing a variety of users. The simplest extreme is Grandma seated at a piano wanting to record herself playing and singing a song for her grandchildren. With a tape recorder, it would be easy for her to press Record, and then perform the music and mail the cassette tape to her grandkids. It should be just that easy using today’s digital tools.

One member mentioned meeting a 50-something woman who had purchased an expensive and powerful laptop and DAW software, hoping to record her song ideas. Even with great equipment costing more than $3,000, she was unable to do what grandma had been able to do so easily back in the era of tape. It is unwise to abandon dissatisfied customers, leaving them in the position of believing money they spent was a disappointing waste. As mentioned above, that is a problem our approach would remedy.

Personas 1: Simple Recording/Performing Scenarios

Name: Grandma
Summary: Wants to record herself playing piano and singing “Happy Birthday” to her granddaughter.
Description: The simple analog example of recording voice and piano live to tape requires mentally substituting a device that does the same thing digitally. Here is a potential production scenario:

Simple Recording Scenario

Name: Choir Director
Summary: Wants to record church choir and distribute recording to congregation.
Description: This is a directly practical example including placing mics for straight-to-device recording, although many worship institutions have dedicated sound systems with central sound boards.

Name: Songwriter
Summary: Plays acoustic guitar, wants drum and bass accompaniment.
Description: Many a guitar-playing, songwriting singer has needed rudimentary backing for both practice and performance; the same backing can serve as accompaniment in multitrack demos and listen-back sketches.

Name: Cover-Band Guy
Summary: Wants to learn the organ solo in “Smokin’” by Boston.
Description: Like lessons in a language lab, many a cover-band guy needs to listen to a solo over and over again, ideally slowing down the tempo without changing the pitch. Recording oneself and listening back to judge accuracy and style are helpful.

Name: MIDI Gear Junkie
Summary: Wants to play some basic tracks (bones) to jam with.
Description: From a compositional point of view, being able to rapidly combine ideas as they are generated is ideal for creating the “bones” of a piece, ready to be arranged and through-composed later. This allows personal jamming, and because MIDI gear offers a variety of sounds, can easily result in the accumulation of keyboards and rack-mount modules.

Personas 2: Amateur/Consumer Recording Scenarios

Another source of Personas is music and recording newbies — certainly a significant slice of the consumer market. Although many customers will be unsophisticated about things like music notation or how to play an instrument, others will enjoy more advanced knowledge and experience and want UI tools to support their skill level.

Name: Britney Wannabe
Summary: Mall booth karaoke
Description: Many teens go to the mall to have glamour photos or portrait photos taken, so it’s easy to imagine an audio version as karaoke-to-CD, or even DVD with some video. A teen singer would want song selection, key adjustment for vocal range, vocal enhancement, and possibly visible music notation tracking the lyrics.

Name: Teenage Bedroom Hobbyist
Summary: Kid with a guitar writing a song for his girlfriend.
Description: Unlike a songwriter who has worked to assemble a personal configuration of gear, a casual Romeo might delight in lying on his bed singing a one-time romantic ballad to someone the song is about...and then hope she likes it.

Name: College Kid
Summary: Selects a group of tracks to play together to create a mix CD.
Description: Popularized by High Fidelity, the mix tape or CD takes playlist-making to a high level in the art of self-expression, especially when the playlist is specifically created for a known individual.

Name: Weekend Warrior
Summary: Plays electric guitar, owns a 4-track, wants to make his band demo.
Description: What Grandma could have done easily with a tape recorder, the Weekend Warrior could once have done easily with a Portastudio or similar 4-track multitrack tape recorder that allows bouncing and simple mixing. Basically, Weekend Warrior thinks his band sounds great and just wants to be able to burn CDs and e-mail MP3s of it.

Name: Forty-something Guy with Disposable Income
Summary: Used to play in a band back in the day, now jams with his buddies; wants to impress friends and kids
Description: This is distinct from Weekend Warrior because Forty-somethings like this are very well-known and desirable customers of gear manufacturers, since they can afford really good equipment and buy more regularly. This emphasis on quality extends to the finished product they want if they are multitrack recording. It may still be friends, family, and CD Baby, but the finished quality should be enough to impress the friends of a teenage daughter — the project output should be mall-worthy.

Name: Postal Service
Summary: Two guys collaborating by mailing CDs back and forth.
Description: This is actually the name of a real band. According to AllMusic.com, Dntel’s Jimmy Tamborello worked with Death Cab for Cutie’s Ben Gibbard by snail mail “with Tamborello sending electronic pieces and Gibbard adding guitars, vocals and lyrics.” The team was forced to promote the United States Postal Service in order to keep their name.

Name: Elementary School Kid
Summary: Edutainment, creativity, experiencing interactive music.
Description: This is where the first reference to Fisher-Price came up, later extended to the idea that advanced users might want to be able to create their own toys to work with. Both in the educational music market as well as just for general fun with music-making, all sorts of equipment exists to encourage interactivity with music. There are video games now driven by banging on a drum.

Name: Dance or Gym Teacher
Summary: Wants interactive control over a playlist.
Description: An exercise instructor wants a dynamic playlist of tracks to energize a class. S/he might want to be able to select a track easily with trick modes. A dance instructor might also want to use loops, since learning a routine often involves repeating a certain section of choreography.

Personas 3: Pro/Prosumer Recording Scenarios

Name: Interapp Collaborator
Summary: A project studio user running Cubase takes raw tracks into a pro studio to overdub and mix.
Description: At a certain advanced level, it becomes essential to be able to maintain the integrity of a file that represents the master output of a project. Depending on circumstances, such as interoperability between two particular products, this is generally either very difficult or very easy.

Name: Power-User Mix Engineer
Summary: Wants efficient editing and comping of recorded tracks using QWERTY keyboard primarily.
Description: Speed is of the essence for a certain kind of high-powered mix engineer. These professionals are highly practiced with their music production configuration and rely on numerous time-saving techniques, including keyboard shortcuts for the software’s commands.

Name: Small Project Studio
Summary: A semi-pro who records demos and small bands for hire.
Description: Multipurpose multitracking studios can be reasonably small and take on a wide range of projects for bands, singers, etc. These end up being a low-end pro version of the Grandma persona. Surround sound is probably a must now.

Name: Soundtrack Composer
Summary: Composer wants to sync to video.
Description: The composer receives a video from the director and wants to be able to play along with the video and record his instrument. Later, he will adjust tempo, bar lines, etc., render it in notation, copy parts, and distribute to an orchestra for final recording.

In some cases the Soundtrack Composer will use a cue sheet or a DVD/VCR to write the music (using pad and paper), set up a project in the DAW that reflects this handwritten roadmap, and then record parts along with picture, mix them, and deliver the result to the director.

Name: Video Post
Summary: Wants to produce sound effects, foley, musical cues, ADR, dimensional effects, surround sound, and run a networked, multi-user studio.
Description: Like the Soundtrack Composer, the Video Post persona will receive a video (with or without audio). Video post houses generally do not create music (i.e., record individual instruments or parts), but rather drop audio events into a timeline that is synced with the picture. These events may be sound effects, sound design, or music cues created by the Soundtrack Composer.

Name: The Game Composer
Summary: Wants to create video game audio and music.
Description: See 2005 Project Bar-B-Q report New Approaches for Developing Interactive Audio Production Systems.

Name: Frank Filipetti recording James Taylor on Martha’s Vineyard
Summary: High-end location recording.
Description: Frank has a lot of resources at his disposal. He’s looking for a high quality system in limited space. In the analog days, Frank would have to bring a large tape deck with him, along with all of his other recording equipment (compressors, gates, microphones, mixing console). In this case, Frank is likely to replace the tape deck with a computer, though perhaps not the other gear. He wants it to be portable, unobtrusive, reliable, and very likely quiet.

Unlike many of the previous personas, Frank requires the ability to record many tracks simultaneously.

Name: Jimmy Jam Comping on the Plane
Summary: Takes Janet Jackson tracks on his notebook and assembles them on the plane.
Description: Jimmy has already recorded tracks in his studio. He gets on an airplane and auditions versions of individual tracks, identifying the “best” performance, and adding them to his “final” project. Since he isn’t recording anything from the analog domain on the flight, he only needs to play back the audio, and in small spurts. Most of his time is going to be spent listening to a limited number of playback channels, though he will likely want to validate his edits against the complete track (many playback channels).

Name: Grammy-Winning Producer
Summary: Wants to record eight people or an orchestra in a room.
Description: Usually utilizing a recording studio, the producer insulates himself from the actual recording process. His requirements are that the performance be captured faithfully and without compromise. He doesn’t care about technology as much as he cares about the musicianship, and gets very upset when the engineer asks the performer to sing again because there was an artifact introduced through the recording chain (though he generally has patience during the same session with the rap artists’ chains clinking in the background). He may require one audio capture channel, multiple channels, and even video sync.

Name: The Personalizer
Summary: Wants to make distribution-related tweaks such as callouts and station/customer customization.
Description: A maker of children’s videos wants to personalize audio so that its customers can buy the songs with their child’s name included. (“Are you sleeping, brother Peter.”) The Personalizer generally remixes the original content (or reassembles it from subgroups/stems), inserting only the content that is absolutely necessary (name, station call letters, etc.).

Name: Mastering Engineer
Summary: Takes mixed tracks and produces a production-quality master.
Description: The mastering engineer’s key requirement is high quality, and his main goal is to stabilize the audio and solve problems. In the end, whether he plays back the mastered recording in real time to an analog deck or renders it within the DAW, he wants what he hears as he edits to be what he hears when he plays it back. He also has invested a huge amount of capital in processing (plug-ins and outboard), which often allow much greater flexibility than usually utilized during a mix.

Name: A&R Guy
Summary: Wants to supervise the workflow to get demos in the proper format for submission.
Description: The A&R guy is closely related to the DJ in workflow. He doesn’t add instruments, effects, or vocals. He doesn’t mix from the constituent parts. He assembles the various songs in order, sometimes modifying them at a high level (fadeouts, maybe cutting out sections), and adjusts their volume. Eventually he’ll burn a CD with the material and wants to have track indices, etc.

Personas 4: DJ-Related Recording Scenarios

Name: Open-Mic Performer
Summary: Wants backing tracks.
Description: A singer transfers existing songs from CD and removes the vocal part with special processing. She then adjusts the key to match her vocal range. She burns the result onto a CD so that she can take it to her gig.

Name: Remixer
Summary: Starts with an existing song and reworks it for the dancefloor or simply an alternate sound.
Description: Rips the song from CD and processes it with EQ. He rearranges the original track using copy and paste. He adds a few drum loops to give it a more pounding beat. He also records a few keyboard lines using an external synthesizer to improve the flow of the track.

Name: Masherupper
Summary: Takes two or more songs and combines them, maybe adding loops and other new parts.
Description: Rearranges the original tracks using copy, paste, time-stretching, and pitch-shifting. He may add a few drum loops to give it a new rhythmic backbone. He may also add a few keyboard lines using an external synthesizer to improve the flow of the track.

Name: Acid Looper
Summary: Assembles songs from sound libraries.
Description: The looper relies heavily on sound libraries and could be considered a collage artist. He spends much of his time searching the libraries and previewing material for possible inclusion. Much of his work is trial, error, and refinement. When he finishes a track he renders it to an audio file and posts it on the Internet.

Personas 5: Notation-Related Recording Scenarios

Name: The Orchestrator
Summary: Wants to work in notation and either print it out or render it.
Description: For this user, accurate, legible display and printing are likely the most important features.

Name: The Transcriber
Summary: Wants to compose a piece and get it down in sheet music and/or tablature.
Description: This user is more concerned with speed and ease-of-use than orchestral filigree. Lyric handling may be important, as may intelligent transposition.

Personas 6: Education-Related Recording/Performance Scenarios

Name: Music Educator
Summary: Prepares lessons, uses sequencer to teach orchestration, composition, and perhaps synthesis or pop music production.
Description: Multi-user support may be useful.

Name: Ivory Tower Academic
Summary: Wants to explore new frontiers of sound and computer-aided composition.
Description: Thinks shrink-wrapped software is too mainstream and limited; wants to use Csound, Max, or Reaktor to build music atomically.

Personas 7: Other Recording Scenarios

Name: Disabled User
Summary: Visually impaired, diminished fine motor control.
Description: Needs support within the DAW for alternative controllers (flexible mapping), or uses operating system accessibility tools, such as a zoned QWERTY keyboard that requires him to hit only an area of the keyboard rather than an individual key.

Name: Auto-accompaniment Keyboard User
Summary: Wants to export their work into software to tweak.
Description: Starts a song by playing along with the keyboard’s backing tracks, then edits the complete performance in a computer, adjusting such aspects as phrasing, dynamics, notes, and tempo.

Name: Al Gore Rhythm Method
Summary: Uses tools such as KARMA for getting new ideas (on one extreme) or for rapid soundtrack development.
Description: Similar to the Auto-accompaniment Keyboard User, but relies on the software to make compositional suggestions, not just harmonic enhancements.

2. Adaptive Workflow

The concept of adaptive workflow is to define the steps (and the transitions between them) for creating music on a DAW, and then to guide the user through those steps interactively.

Again, an initial survey could help set up the project. For example:

  • Will you be recording new material (audio or MIDI) or using existing material?
  • Will you use the software in a live performance?
  • Will you be recording original music or someone else’s?

At times, the adaptive workflow may shift to a tutorial, or even a creativity-stimulating Oblique Strategy. A newcomer might prefer easy visual UIs, whereas an advanced user should be able to customize whatever is easiest for their project. The software could build a template file for a specific user and project, containing the user/project details. A beginner receiving such a file by e-mail would be able to complete projects more easily.
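
One way to picture "steps and the transitions between them" is a small table of allowed transitions that the software walks through, using a survey answer to choose the first branch. The step names and the transition table below are hypothetical and are not taken from the group's workflow diagram.

```python
# Hypothetical workflow: each step maps to the steps that may follow it.
WORKFLOW = {
    "setup": ["record", "import"],
    "record": ["edit"],
    "import": ["edit"],
    "edit": ["mix", "record"],
    "mix": ["export"],
    "export": [],
}


def next_steps(current, uses_existing_material):
    """List the allowed next steps, with the suggested one first."""
    options = WORKFLOW[current]
    if current == "setup":
        # The survey answer decides whether to suggest importing or recording.
        preferred = "import" if uses_existing_material else "record"
        return [preferred] + [step for step in options if step != preferred]
    return list(options)


def guided_path(uses_existing_material, start="setup"):
    """Follow the first suggestion at every step until the workflow ends."""
    path = [start]
    while WORKFLOW[path[-1]]:
        path.append(next_steps(path[-1], uses_existing_material)[0])
    return path


if __name__ == "__main__":
    print(guided_path(uses_existing_material=False))
    print(guided_path(uses_existing_material=True))
```

Because the table lists every allowed transition, an advanced user could still jump from editing back to recording, while a newcomer simply follows the first suggestion.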

Workflow Diagram

As the software guides the user through the steps, it should balance right-brain tasks and left-brain tasks.

3. Hardware Integration

With modern music production, tasks bounce between software and hardware constantly. To make that hybrid system work well, the group developed these recommendations:

  • Hardware discovery should be automatic.
  • Configuration of newly discovered hardware should happen automatically in applications.
  • Configurations should be storable, recallable, and transportable.
  • Hardware functionality needs to be scalable, e.g., it should be easy to add a second control surface to a setup.
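
The "storable, recallable, and transportable" and "scalable" recommendations above can be sketched with a plain JSON document. The field names (`version`, `devices`, `kind`, `name`, `index`) are assumptions made for this example and do not describe any existing integration format.

```python
import json


def add_device(devices, kind, name):
    """Return a new device list with one more device of the given kind.

    The index counts devices of the same kind, so a second control
    surface simply becomes index 2.
    """
    index = sum(1 for device in devices if device["kind"] == kind) + 1
    return devices + [{"kind": kind, "name": name, "index": index}]


def save_setup(devices):
    """Store a setup as a JSON string that can be written to disk or e-mailed."""
    return json.dumps({"version": 1, "devices": devices}, sort_keys=True, indent=2)


def load_setup(text):
    """Recall a setup saved by save_setup."""
    data = json.loads(text)
    if data.get("version") != 1:
        raise ValueError("unsupported setup version")
    return data["devices"]


if __name__ == "__main__":
    setup = add_device([], "control_surface", "Surface A")
    setup = add_device(setup, "control_surface", "Surface B")
    saved = save_setup(setup)
    print(saved)
    print(load_setup(saved) == setup)
```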

Some integration solutions have begun to emerge, including

4. Explore Alternatives to Timeline Displays

In addition to discussing ways to escape the “tyranny of the spreadsheet,” we considered ways to render musical or control data. Here is a partial list.

  • Musical arranging “blocks”
  • Tracks
  • Piano roll
  • Notation view
  • Waveform
  • Frequency
  • Envelopes
  • Text (event list, lyrics)
  • Section (SMPTE cue sheet)
  • Arrangement / Playlist
  • Media Pool

Rationale for Current (Wave) Paradigm

  • Gives a visual distinction between MIDI and audio track
  • Makes it relatively easy to see the location of parts
  • Users can “see” dynamic information (including issues like clipping)

Problems with Current (Wave) Display Paradigm

  • Conveys amplitude information only
  • Loudness is logarithmic but waveforms are linear
  • Waves are drawn filled in, but that seems to be a waste of space
  • Meaningful locations in time aren’t always obvious: bar lines, phrasing, when a specific note begins, where the verse or chorus begins, etc.
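
The mismatch between linear waveforms and logarithmic loudness can be shown with the standard decibel formula, 20·log10 of the amplitude relative to full scale. In the sketch below the function name and the -96 dB display floor are assumptions; the formula itself is the usual definition for amplitude ratios.

```python
import math


def amplitude_to_dbfs(sample, floor_db=-96.0):
    """Convert a linear sample value (full scale = 1.0) to dB relative to full scale.

    Silence has no finite dB value, so it is clamped to floor_db for display.
    """
    magnitude = abs(sample)
    if magnitude == 0:
        return floor_db
    return max(floor_db, 20.0 * math.log10(magnitude))


if __name__ == "__main__":
    for value in (1.0, 0.5, 0.1, 0.01, 0.0):
        print(value, round(amplitude_to_dbfs(value), 1))
```

A sample at half of full scale is only about 6 dB down, and one at a tenth is 20 dB down, so a linear waveform display devotes most of its height to the top few decibels and squeezes quiet material into a thin line.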

Other Ways to Represent Sound

  • Frequency domain: 2-D FFT, 3-D FFT
  • Wavelet transform
  • Musical octaves
  • Metadata, automatically determine “sounds like”
  • Visualizations
  • Cycling ’74 Radial (circular rather than linear display)
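
For the frequency-domain option in the list above, the sketch below computes a magnitude spectrum for one block of samples. It uses a naive discrete Fourier transform written out in plain Python so the arithmetic is visible; it is not an FFT, and a real product would use an optimized FFT library. The function name and the test signal are assumptions.

```python
import cmath
import math


def magnitude_spectrum(samples):
    """Return DFT magnitudes (scaled by 1/N) for bins 0 through N/2.

    Naive O(N^2) transform, suitable only for short illustrative blocks.
    """
    n = len(samples)
    spectrum = []
    for k in range(n // 2 + 1):
        total = sum(
            samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        spectrum.append(abs(total) / n)
    return spectrum


if __name__ == "__main__":
    # A sine wave that completes exactly 4 cycles in a 32-sample block.
    block = [math.sin(2 * math.pi * 4 * t / 32) for t in range(32)]
    spectrum = magnitude_spectrum(block)
    peak_bin = max(range(len(spectrum)), key=lambda k: spectrum[k])
    print("peak bin:", peak_bin)
```

Stacking such spectra for successive blocks gives the 2-D (time versus frequency) display mentioned in the list.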

Way Out of the Box

  • MadPlayer (“Death Star bombing run” interface)
  • AudioPad (A gestural interface developed at the MIT Media Lab)
  • StikAx (a remixing toy)
  • Sony Block Jam (building blocks that make or shape music)
  • Rendered 3-D graphics for workspace
  • Virtual reality
  • Using transparency for overlays
  • 3-D rendered visualization (à la Creative Lava)
  • Force feedback
  • Animated avatars
  • Stage/mix plot (use icons for each track, move instruments into position)
  • Rhetorical question: Is it possible to build a DAW that has no menus at all, where everything is direct interaction?
  • The Minority Report UI
  • Audio feedback, e.g., for scrubbing
  • Using a spreadsheet as a DAW

Differentiators for Faders and Widgets in Common Use Today

  • Fader
    • Taper
    • Fine/coarse control modifiers
    • Numeric feedback
    • Photorealism
    • Orientation
    • Size on screen
    • Grouped manipulation
  • Knob
    • –inf to 1
    • Rotary encoder, like a data wheel
    • Incremental
  • Mapping to other products’ layouts
  • Button (on/off)
  • Button (radio)
  • Peak/VU Meter
  • Alphanumeric display
  • Numeric data entry
  • Musical note data entry
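
Two of the fader differentiators listed above, taper and fine/coarse modifiers, can be illustrated with a short sketch. The linear-in-decibels taper, the -60 dB to +6 dB range, and the 10x fine-control ratio are assumptions chosen for the example; products differ on all three.

```python
def fader_to_db(position, min_db=-60.0, max_db=6.0):
    """Map a fader position in [0, 1] to a level in dB (hypothetical taper).

    Position 0 is treated as fully off (minus infinity dB); every other
    position is spread evenly, in decibels, between min_db and max_db.
    """
    if not 0.0 <= position <= 1.0:
        raise ValueError("position must be between 0 and 1")
    if position == 0.0:
        return float("-inf")
    return min_db + (max_db - min_db) * position


def db_to_gain(db):
    """Convert a dB level to the linear gain applied to the audio."""
    if db == float("-inf"):
        return 0.0
    return 10.0 ** (db / 20.0)


def move_fader(position, delta, fine=False):
    """Apply a mouse or key movement; the fine modifier scales it down 10x."""
    step = delta * (0.1 if fine else 1.0)
    return min(1.0, max(0.0, position + step))


if __name__ == "__main__":
    for pos in (0.0, 0.25, 0.5, 1.0):
        level = fader_to_db(pos)
        print(pos, level, db_to_gain(level))
```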

Items from the brainstorming lists that the group thought were worth reporting

  • Personalizing the user interface: Peter Drescher suggested that software and objects such as cell phones could have a customizable musical “personality.”

Other reference material
