The Eleventh Annual Interactive Music Conference

Group Report: Improving the PC Sound Alert Experience

Participants: A.K.A "BeepBeepBeep"

Gary E. Johnson, SMSC Austin
Len Layton, C-Media Electronics
Keith Weiner, DiamondWare
Benjamin Masse, Double V3
Peter Lupini, 3dB Research Ltd.
Henry Trenton, SigmaTel

Facilitator: Van Webster, Webster Communications

Problem Statement:

Ever since the simplest “beep” sound made by the very first personal computers, audio alerts have been at the core of every major PC operating system. Much to the disappointment of computer audio folks, however, PC users are increasingly turning off these sounds. To audio people, this is a clear symptom of a deeper issue. Why has computer audio alerting become so irritating and counter-productive? Why bother with audio as an attention-getter anyway? Can audio alerting actually enhance our computing experience? With the continuing emergence of the PC as a central device both at home and in the office, our group of audio experts figured that now was a good time to try and answer these questions.

The use of audio on PCs is currently quite primitive. User control over the audio environment, for example, is very limited – basically just volume control. There were once “Sound Schemes,” which have fallen into disuse, mainly because application vendors did not embrace the paradigm that Microsoft envisaged in the early 1990s. In many ways we have gone backward in audio use, even as the capabilities of PCs have dramatically increased.

People tend to turn off audio alerts because they interrupt their workflow. We assert that the concept of “Flow” is important because it is the desired state for most people – when someone is in flow, they feel very productive. Consider the following discussion of flow:

Over and over again, as people describe how it feels when they thoroughly enjoy themselves, they mention eight distinct dimensions of experience. These same aspects are reported by Hindu yogis and Japanese teenagers who race motorcycles, by American surgeons and basketball players, by Australian sailors and Navajo shepherds, by champion figure skaters and by chess masters. These are the characteristic dimensions of the flow experience:

1. Clear goals: an objective is distinctly defined; immediate feedback: one knows instantly how well one is doing.

2. The opportunities for acting decisively are relatively high, and they are matched by one's perceived ability to act. In other words, personal skills are well suited to given challenges.

3. Action and awareness merge; one-pointedness of mind.

4. Concentration on the task at hand; irrelevant stimuli disappear from consciousness; worries and concerns are temporarily suspended.

5. A sense of potential control.

6. Loss of self-consciousness, transcendence of ego boundaries, a sense of growth and of being part of some greater entity.

7. Altered sense of time, which usually seems to pass faster.

8. Experience becomes autotelic: If several of the previous conditions are present, what one does becomes autotelic, or worth doing for its own sake.

--MIHALY CSIKSZENTMIHALYI, The Evolving Self, pp. 178-179

We want to enable more people to enter the Flow state, possibly even during the working day, but people are constantly annoyed and distracted by incoming interruptions. People get so annoyed by audio alerts and other sounds coming from their PCs that many turn the audio off – and some never turn it back on again.

In 2000, Gloria Mark was hired as a professor at the University of California at Irvine. Until then, she was working as a researcher, living a life of comparative peace. She would spend her days in her lab, enjoying the sense of serene focus that comes from immersing yourself for hours at a time in a single project. But when her faculty job began, that all ended. Mark would arrive at her desk in the morning, full of energy and ready to tackle her to-do list - only to suffer an endless stream of interruptions. No sooner had she started one task than a colleague would e-mail her with an urgent request; when she went to work on that, the phone would ring. At the end of the day, she had been so constantly distracted that she would have accomplished only a fraction of what she set out to do. "Madness," she thought. "I'm trying to do 30 things at once."

--CLIVE THOMPSON, The New York Times, October 16, 2005, “Meet the Life Hackers”


Our group would like to overturn the bad reputation that currently plagues PC audio alerting by arguing that sound is in fact the best way to alert PC users to a wide variety of events requiring their attention. After some discussion, we concluded that the problem isn’t with sound itself, but rather with the primitive way that sounds have been used.

In Defense of Sound
In order to establish the framework on which an effective and “flow-friendly” sound alert paradigm can be built, it is important to highlight the many advantages of sound as an alert mechanism:

  1. The available sound palette is both enormous and varied.
    There is a vast assortment of sound types that can be used to alert a computer user. Sounds can vary along many dimensions, such as volume, timbre, pitch, consonance/dissonance, and percussiveness. Sounds can range from totally artificial (electronically synthesized) to completely natural (animal sounds, weather, human speech). The category of musical sounds alone illustrates the vastness of the sound palette. As an added bonus, most of these dimensions can be traversed smoothly. This is obvious for properties such as pitch and volume, but even artificial sounds can be smoothly “morphed” into natural ones.
  2. Sound can be placed in space.
    The fact that we can perceive sound in three-dimensional space offers yet another mechanism for varying and crafting sounds. We can make use of distance, spatial position, trajectory and speed. For example, humans tend to have a personal space that can be exploited – by bringing sounds right into someone’s “comfort zone”, we can grab the user’s attention. By the same token, by keeping a sound at the periphery of the user’s comfort zone, we can lessen the impact of the interruption.
  3. Sound can profoundly affect our attention.
    Sound can affect our attention in a wide variety of ways, ranging from relaxing (flowing water) to jarring (thunder, the growl of a panther, or the rattle of a snake). Speech is especially important to humans. For example, many people who are asleep will stop snoring if their partner simply speaks their name. Speech is so fundamental to human development that humans have evolved a range of mechanisms for extracting information from the spoken word. For example, most of us can determine someone’s emotional state from the sound of their voice (technically called “prosody”, which covers all the information in speech other than the words themselves, such as timbre, pitch, and tempo).
  4. Sound and vision can be processed independently.
    Our brains can process sound independently of visual input, because our sound and vision processing systems have parallel channels to our consciousness. This means that we can be engaged in an activity such as typing a report, and without interrupting our flow, be made aware of some external event via our auditory system. And because of the available palette of sounds described above, the level of evoked awareness can be precisely tuned.
  5. We can process many streams of sound simultaneously.
    Human hearing is unique among the senses in that it allows both low-level and high-level attentive monitoring of the environment. That is, you can be intensely engaged in a conversation with someone while remaining aware of things like weather conditions or musical sounds. In fact, many of us have seen an ambassador at the United Nations continue speaking to the assembly while an aide whispers important information in his ear.
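To make the “comfort zone” idea in point 2 more concrete, here is a minimal Python sketch that maps an alert’s urgency to a virtual distance and then to a playback gain. It assumes a simple free-field inverse-distance law (roughly -6 dB per doubling of distance); the reference distance, the minimum-distance clamp, the near/far radii and all function names are hypothetical choices for illustration. Real 3D audio engines also apply HRTF filtering, air absorption and reverberation, none of which is modeled here.

```python
import math

# Hypothetical parameters, chosen only for illustration.
REFERENCE_DISTANCE_M = 1.0   # distance at which the alert plays at unity gain
MIN_DISTANCE_M = 0.25        # clamp so the gain never exceeds 1.0 / 0.25 = 4.0


def distance_gain(distance_m: float) -> float:
    """Linear gain under a simple inverse-distance law (ref / distance)."""
    d = max(distance_m, MIN_DISTANCE_M)
    return REFERENCE_DISTANCE_M / d


def gain_db(distance_m: float) -> float:
    """The same gain expressed in decibels (20 * log10 of the linear gain)."""
    return 20.0 * math.log10(distance_gain(distance_m))


def placement_for_urgency(urgency: float, near_m: float = 0.5, far_m: float = 4.0) -> float:
    """Map urgency in [0, 1] to a virtual distance.

    Urgent alerts move inside an assumed personal-space radius (near_m);
    low-priority ones stay at the periphery (far_m). Interpolating
    geometrically makes equal urgency steps produce equal dB steps.
    """
    u = min(max(urgency, 0.0), 1.0)
    return far_m * (near_m / far_m) ** u


if __name__ == "__main__":
    for u in (0.0, 0.5, 1.0):
        d = placement_for_urgency(u)
        print(f"urgency {u}: {d:.2f} m, {gain_db(d):+.1f} dB")
```

The geometric interpolation is a design choice: because loudness perception is roughly logarithmic, spacing distances evenly on a log scale gives a smoother perceived approach than spacing them linearly.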

Using Sound to Alert Users
By considering the many capabilities of sound, we can create a much more compelling sound experience on the PC for getting the attention of users. The following examples illustrate some more advanced ways of using sound:

  1. Create beautiful sounds! Get rid of badly quantized 8-bit sounds that do nothing but irritate the user.
  2. Take advantage of the sophisticated 3D processing capability of the human brain. Here are a few examples:

    a. Use 3D processing to make the sound corresponding to a popup dialog box appear to come from the same location on the screen as the dialog box. This is especially important as screens become larger and multiple screens come into common use, making it very easy to miss an important visual alert.

    b. Use 3D processing to move a sound closer as its importance increases. For example, an approaching thunderstorm could be used to warn an engineer of that dreaded meeting with marketing.

  3. Take advantage of our ability to process multiple channels of sound. For example, if a user is on a Skype call and an important alert is required, the user may miss a visual alert because he or she may not be looking at the screen. Using a whispered voice coming from the side to provide the alert may be a good alternative.
  4. Vary sounds over different dimensions in order to move smoothly into the user’s consciousness, thereby reducing the chance of jarring the user out of flow. For example, create a meeting soundtrack for each type of meeting. The computer starts the music playing gently in the background, then slowly increases the volume and proximity as the meeting start time approaches.
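Example 2a can be approximated even without full 3D rendering. The Python sketch below assumes a plain two-speaker setup spanning the whole desktop (possibly several monitors) and uses a constant-power pan law, so the alert keeps the same loudness wherever the dialog appears. The function names and the pixel-coordinate convention are hypothetical; a real implementation would more likely hand the position to an HRTF-based 3D audio engine.

```python
import math
from typing import List, Sequence, Tuple


def pan_from_screen_x(dialog_center_x: float, desktop_width: float) -> float:
    """Map a dialog's horizontal centre (pixels, 0 = left edge of the whole
    desktop, which may span several monitors) to a pan value in [-1, 1]."""
    if desktop_width <= 0:
        raise ValueError("desktop_width must be positive")
    x = min(max(dialog_center_x / desktop_width, 0.0), 1.0)
    return 2.0 * x - 1.0


def equal_power_gains(pan: float) -> Tuple[float, float]:
    """Constant-power pan law: left**2 + right**2 == 1 at every position,
    so the alert keeps the same loudness as it moves across the screen."""
    p = min(max(pan, -1.0), 1.0)
    angle = (p + 1.0) * math.pi / 4.0   # 0 (hard left) .. pi/2 (hard right)
    return math.cos(angle), math.sin(angle)


def pan_stereo(samples: Sequence[float], pan: float) -> List[Tuple[float, float]]:
    """Apply the pan gains to mono samples, returning (left, right) pairs."""
    left_gain, right_gain = equal_power_gains(pan)
    return [(s * left_gain, s * right_gain) for s in samples]


if __name__ == "__main__":
    # A dialog near the right edge of a hypothetical 3840-pixel dual-monitor desktop.
    pan = pan_from_screen_x(3500, 3840)
    print(round(pan, 3), equal_power_gains(pan))
```

A constant-power law is used rather than a linear crossfade because a linear crossfade dips by about 3 dB at the centre, which would make a dialog in the middle of the screen sound quieter than one at the edge.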

Going Further: Flow Friendly Computing
In order to take the next step in friendly computing, we propose a new PC software application that can sense the user’s state of flow and modulate the audio they hear accordingly. The software would wait for pauses in the user’s workflow and use a level of interruption appropriate to the situation and the user’s flow. It would also understand privacy: when the user requests a high privacy level, only a very few interruptions should reach them.

For example, consider a system that detects that I am working on my Bar-BQ PowerPoint presentation and am deeply in a flow state. However, a deadline is approaching and I need to go to the meeting room in 5 minutes. The system has held all my calls and stopped instant messages from reaching me. With 3 minutes to go, it starts gently playing the 1812 Overture, which rises gradually in volume until a crescendo of cannon fire informs me that I need to leave.
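The escalating reminder in this scenario amounts to a simple gain schedule. The Python sketch below assumes the reminder stays silent until a fixed lead time (3 minutes, as in the example) and then rises linearly in decibels from a barely audible level to full level at the deadline. The class name, field names and the -40 dB starting level are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class EscalationPlan:
    """Hypothetical parameters for a gently escalating reminder."""
    lead_time_s: float = 180.0   # start playing 3 minutes before the deadline
    start_db: float = -40.0      # barely audible background level
    end_db: float = 0.0          # full level ("cannon fire") at the deadline

    def __post_init__(self) -> None:
        if self.lead_time_s <= 0:
            raise ValueError("lead_time_s must be positive")


def reminder_gain(seconds_to_deadline: float, plan: EscalationPlan) -> float:
    """Return a linear gain for the reminder soundtrack.

    Silent before the lead time (the user is left in flow); after that the
    level rises linearly in decibels, which sounds more even than a
    linear-amplitude ramp, and stays at full level once the deadline passes.
    """
    if seconds_to_deadline > plan.lead_time_s:
        return 0.0
    remaining = max(seconds_to_deadline, 0.0)
    progress = 1.0 - remaining / plan.lead_time_s   # 0 at start, 1 at deadline
    level_db = plan.start_db + progress * (plan.end_db - plan.start_db)
    return 10.0 ** (level_db / 20.0)


if __name__ == "__main__":
    plan = EscalationPlan()
    for t in (300, 180, 120, 60, 0):
        print(t, round(reminder_gain(t, plan), 4))
```

The same schedule could drive virtual distance instead of (or as well as) volume, so that the music seems to approach the user as the meeting nears.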

When designing new technology products, interaction researcher Bill Buxton advocates leveraging the existing skills that people have, because each of us can only maintain a finite number of skills at any one time (skills take time to learn and maintain, and our available time is finite). For example, I shouldn’t need to learn how to forward my phone to voicemail – the act of closing my door should indicate that I do not wish to be disturbed.

In terms of audio, human beings already have many built-in ‘skills’ that make us aware of events and information in the world around us just from their sound. We would like the audio presented to the user to be empathetic: the system should take the user’s state of mind into consideration when it decides what audio to play. How can we deduce the user’s state of mind? We are not talking about artificial intelligence (right now), nor some kind of ‘digital psychic’, but initially a set of simple heuristic rules that can be applied to a richer set of information about the user, their history, and their patterns of behavior.

What information do humans generate that can be used to derive their state of mind? We are proposing a general-purpose flow detector that recognizes whether a user is in “flow” and, if so, avoids disturbing them unless necessary. When an interruption is necessary, the system should alert the user with audio, and do so politely. The following table lists some ideas for detecting the user’s state, with possible sensors for each:


Sensor ideas

1. Body/limb position and movement (shifting in your seat, or not): contact mics in furniture, sonar or radar, personal sensors such as fabric tension detectors.

2. Gaze, head orientation: video cameras, headphones with head tracking.

3. Typing/mouse activity: key logger; vigorous typing may indicate flow (may have to be application-specific).

4. Volume adjustments, button presses: audio drivers, other OS components.

5. General physiological state: heart rate, respiration (CO2), skin resistance.

6. Equipment state (phone, doors, furniture): door switches, phone system status messages, pressure sensors in chairs.

7. Environmental noise

In a sense, we are proposing a kind of ‘executive producer’ for your PC audio. It mutes and unmutes applications, changes voices to whispers, senses your state, and generally tries to keep you in flow unless you need to move on to another task or meeting.
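The decisions of such an ‘executive producer’ can be expressed as a small rule table. The Python sketch below is one hypothetical policy combining alert priority, flow state, workflow pauses, an ongoing call, and the requested privacy level; the enum names, the 0 to 3 priority scale and the thresholds are illustrative assumptions, not a worked-out design.

```python
from enum import Enum


class Privacy(Enum):
    LOW = 0
    NORMAL = 1
    HIGH = 2


class Action(Enum):
    PLAY_NORMALLY = "play normally"
    WHISPER = "whisper from the side"
    HOLD = "hold for later"


# Minimum priority (0 = trivial .. 3 = urgent) that may interrupt the user
# at each privacy level; values are hypothetical.
INTERRUPT_THRESHOLD = {Privacy.LOW: 1, Privacy.NORMAL: 2, Privacy.HIGH: 3}


def decide(priority: int, in_flow: bool, at_pause: bool,
           on_call: bool, privacy: Privacy) -> Action:
    """Choose how (or whether) to present an audio alert right now."""
    if priority < INTERRUPT_THRESHOLD[privacy]:
        # Below the privacy bar: wait for a natural pause, and never
        # interrupt at all when the user has asked for high privacy.
        if privacy is Privacy.HIGH or not at_pause:
            return Action.HOLD
    if on_call or (in_flow and not at_pause):
        # Deliver on a quiet, separate channel the user can follow
        # without losing the conversation or the task at hand.
        return Action.WHISPER
    return Action.PLAY_NORMALLY


if __name__ == "__main__":
    print(decide(0, True, False, False, Privacy.NORMAL))  # trivial, mid-task
    print(decide(3, True, False, False, Privacy.HIGH))    # urgent, mid-task
    print(decide(3, False, True, True, Privacy.HIGH))     # urgent, on a call
```

Held alerts would be re-evaluated later, for example when the flow detector reports a pause or the user lowers the privacy level.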

What else is being done in this field?

An excellent starting point in the field is W. Wayt Gibbs, “Considerate Computing,” Scientific American, January 2005. This article mentions several key research groups, including:

The Web site http://interruptions.net/ also has a large bibliography of papers and a list of researchers working on interruptions in human-computer interaction.

While much progress has been made in this nascent field recently, there appears to be a good opportunity to make the research and commercial communities aware of the potential benefits of using audio to present information in ways that other media cannot.

What existing Windows/MacOS infrastructure can be leveraged?

To build a ‘considerate audio’ system in which the user’s state could be modeled as described here, existing components such as the following can be leveraged:

  1. Standardized mouse and keyboard interfaces – there is no standard OS-wide “keyboard activity” API, but most presence-sensing systems rely on tracking or logging mouse movements and keystrokes to sense whether a person is sitting at the PC.
  2. Standardized camera interfaces – USB video class devices and the Windows and Mac OS camera APIs allow video data to be accessed easily.
  3. Nascent ‘standards’ for sensor networks and security system integration with PCs. Microsoft and companies like Zensys (Z-Wave), Intellon and others are working on ways to bring sensor data into PCs for security, energy management and home automation. These could be leveraged for a considerate audio system.

Presence Sensing

There does not appear to be any suitable presence-sensing infrastructure in the common operating systems themselves; however, several applications such as Skype and Windows Messenger do attempt to sense the ‘presence’ of the user at the PC. These applications can be interrogated through documented APIs.

Busyness Sensing

Again, the operating systems themselves do not offer anything at such a high level, but the infrastructure above and the sensor hardware now becoming available could be integrated to derive levels of busyness, although research and effort are still required to robustly recognize the user’s state.
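As a rough illustration of how presence and busyness might be derived from the infrastructure listed above, the following Python sketch fuses an idle-time reading with a few boolean sensor signals using fixed weights. The `Signals` fields, the five-minute idle timeout, the weights and the level names are all assumptions made for illustration; as noted, robustly recognizing the user’s state still requires research.

```python
from dataclasses import dataclass


@dataclass
class Signals:
    """Snapshot of hypothetical inputs gathered from the sources listed above."""
    seconds_since_input: float   # from keyboard/mouse tracking
    typing_flow: bool            # e.g. the output of a typing-rate heuristic
    on_call: bool                # from a presence-aware app or phone system
    door_closed: bool            # from a door switch on a sensor network
    chair_occupied: bool         # from a pressure sensor in the chair


AWAY_AFTER_S = 300.0  # assumed idle timeout for "not at the PC"

# Illustrative weights; a real system would need per-user calibration.
WEIGHTS = {"typing_flow": 2, "on_call": 3, "door_closed": 2}


def busyness(s: Signals) -> str:
    """Fuse signals into 'away', 'available', 'busy' or 'do not disturb'."""
    present = s.seconds_since_input < AWAY_AFTER_S or s.chair_occupied
    if not present:
        return "away"
    score = sum(w for name, w in WEIGHTS.items() if getattr(s, name))
    if score >= 4:
        return "do not disturb"
    if score >= 2:
        return "busy"
    return "available"


if __name__ == "__main__":
    print(busyness(Signals(10, True, True, False, True)))
```

A weighted sum keeps the rules transparent and easy to adjust by hand, in keeping with the heuristic, non-AI approach proposed earlier in this report.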
