The Ninth Annual Interactive Music Conference

Group Report: Mobile Phone Audio - Lessons Learned from Games and the Web

Participants (A.K.A. "The Color of Suck"):
Peter Drescher; Danger
Jim Reekes; at large
Tom Zudock; Sigmatel
Chris Grigg; Beatnik
Bob Starr; QSound
Pete Clare; Sensaura/Creative
Martin Puryear; Microsoft
Jim Rippie; independent

Facilitator: Linda Law; Fat Labs

Problem Statement

Producing audio for mobile devices today is like doing game audio in the 80's and Web audio in the 90's. The similarities are striking - severe bandwidth constraints, cross-platform incompatibilities, arcane technical limitations, a plethora of file formats. What have we learned from these past experiences that might help the mobile audio industry in the future?

The Mobile Phone Audio group discussed questions such as: "If I knew then what I know now, what might I have done differently?" "What recommendations might we have for the mobile audio industry on how to make content providers' lives easier and more profitable, based on similar experiences developing game/web audio systems?" "How can we help mobile audio producers avoid some of the pitfalls and problems game/web audio producers have faced in similar situations?"

Basically the issues fell into two categories: For games and on the Web, A) What did we do that worked? and B) What did we do that didn't work?

These discussions involved issues of concern to the following stakeholders:
1. Content creators (sound designers, game composers, record companies, ringtone providers)
2. End users (cell phone and mobile internet device owners, ringtone consumers)
3. Service providers (cell phone carriers such as T-Mobile and AT&T)
4. Handset manufacturers (Nokia, Samsung, Motorola, et al)
5. Component manufacturers (chip designers, speaker makers)
6. Mobile device operating systems vendors (Symbian, Palm, etc)

The Mobile Phone Audio group limited the scope of our discussions to devices based on cellular phone technology, in part because they are good examples of resource-constrained devices. We identified technical trends from the history of the game and Web industries (with a particular emphasis on audio issues) and looked for lessons that could be learned. We then correlated the lessons to similar issues facing developing mobile phone technologies. Many topics were discussed, falling into five main categories:

1. Format Incompatibilities
The dizzying array of formats currently used for ringtones and mobile game audio is massively inefficient. We want a standard audio format for ringtones and for interactive game audio on mobile devices; however, there must be a clear understanding of the problems that the standards are intended to solve.

2. Scalability Constraints
Current technical bottlenecks are guaranteed to widen. Constraints on processing power, bandwidth, battery life, memory, polyphony, fidelity and sample rates will ease as mobile technology progresses, as they did for games and the Web.

3. Licensing Issues
Record companies face piracy issues similar to a "Napster for cell phones" scenario. Currently, these issues remain unsolved (meaning, we don't have any good answers, and neither does anybody else).

4. Hardware & OS Proprietary Chaos
Hardware manufacturers and OS developers will cling to their own proprietary systems as long as possible, in order to protect their investments, despite the fact that this may not be the best survival strategy in the long run. Hardware companies will continue to add features to their closed platforms, and OS vendors will tend to limit innovations in their low level code to maintain compatibility.

5. Content Lessons
The user appetite for ringtone and mobile game content is insatiable, and content trends tend to be unpredictable - people want whatever people want. However, production and sales can be limited by a lack of tools for producing and distributing content.


1. Format Incompatibilities
A standard audio format for ringtones, and for interactive audio for cell phone games, would improve production and sales. Standards provide a number of benefits:

  • They simplify the content market.
  • They make it easier to repurpose content for various markets and promotions.
  • They provide a broader selection of content, and more freedom of choice for the consumer.
  • They allow pooled R&D investment for tools, components and other infrastructure items.

However, standards must provide for controlled extensibility, to allow for future developments as resource bottlenecks expand. At the same time, there must be a clear understanding of what problems the standards are solving.

What worked: Standard linear audio file formats (such as WAV and MP3) have made game soundtracks and Web audio a vastly more compelling experience for the consumer, while simultaneously improving production techniques and the availability of tools. In general, the standard linear formats have completely superseded MIDI, MOD and other resource-constrained audio solutions.

What didn't work: Although many solutions have been developed, there is still no generally accepted and implemented standard for non-linear interactive audio.

2. Scalability Concerns
Given what has happened in the games and Web industries over the past 20 years, there will be a natural progression towards higher resolution sound (including stereo and 3D audio) in the mobile audio space as well. Hardware manufacturers will continue to add features that they believe will be profitable, and OS developers will continue to add functionality as processing power allows. This trend will happen even faster in the mobile space than it did for games and on the Web, due to consumer demand and the millions of cell phones on the market. The only guarantee is that things will get better over time, so the best strategy is to plan for scalability, expandability and extensibility whenever you can!

What didn't work: While General MIDI was an excellent standard for producing large amounts of music using small files, its inability to contain custom audio samples limited its musicality; basically, General MIDI didn't scale up very well.

3. Licensing Issues
No one has yet solved the licensing issues in all their glorious ramifications. However, the game/Web experience shows that pretty much any DRM/copyright scheme can (and will) be defeated; therefore, content providers should concentrate on revenue, not control. If you make agreements with distributors, you'll make money; if you resist, you'll create criminals (and lose money).

What worked: Apple's iTunes software, combined with iPod hardware, provides an excellent example of how agreements made between content providers and Internet developers can produce secure and profitable systems.

What didn't work: The record companies vs. peer-to-peer file sharing systems.

4. Hardware & OS Proprietary Chaos
As with games and the Web, content is king and will drive the market. However, hardware manufacturers tend to cling to their proprietary systems, both as a marketing strategy (you must use our platform and no one else's) and to protect their development investments. Simultaneously, OS vendors tend to limit innovation at the lowest level of the code to maintain compatibility. This may provide a better customer experience and enhance stability in the short run, but can deter new feature development while accelerating obsolescence. [insert "how to trap a monkey" analogy here ... then duck!]

One major difference between the games/Web industry and cell phones is that the telephone communications part of the platform is quite mature, having predated the Internet by at least 50 years. It's the new mobile Internet features now being built into cell phones that will see explosive development. Changes will happen in the mobile space even faster than they ever did in games or on the Web, due to insatiable consumer demand and the chaotic nature of the market.

What worked: In the early days of PC audio, Yamaha went to content providers to solicit them to write specifically for the Yamaha chipset. This resulted in Yamaha becoming a de facto standard for PC audio; a similar strategy may work for phone manufacturers.

What didn't work: Microsoft did not allow hardware acceleration of 3D audio in DirectX, limiting its use in games. Also, Sega didn't give developers access to hardware, locking them into a specific mode of programming and hindering innovation. These kinds of constraints and controls can ultimately doom proprietary systems to extinction.

Special case - wrong&right: Hardware data sheets for consoles such as Xbox and PlayStation are proprietary; this may prevent some 3rd party innovations, but can also reduce the number of bugs!

5. Content Lessons
Controlled content availability always gives way to open content, due to the widely varied preferences of consumers and the unpredictability of demand. User appetites for audio content are insatiable, as demonstrated by music file sharing systems on the Web. Content trends for ringtones will change rapidly, even faster than on the Web.

Given that a significant segment of the market for content is young males, ringtones containing sexual references will sell rapidly. Since games currently have a rating system, ringtone ratings may have to be developed soon (along the lines of "explicit" stickers on CDs). As cell phones become even more ubiquitous and markets develop for teenagers and children, there may be a demand for content filters and parental controls, as have been developed for Web sites.

Currently, ringtone and mobile game audio is severely limited by bandwidth and other technical constraints. Nonetheless, high quality audio can be produced in constrained resource situations, as shown in the early days of PC games and on websites designed to be accessed at 56k modem speeds. All of the skills developed by sound designers and composers for games and the Web can be applied to content development in the mobile space: MIDI files, small samples with short loops, multiple use of compressed samples varied by pitch and filters, etc. Mobile audio producers would be well advised to use sound designers experienced with game/Web audio over the last 20 years (or stated another way: hire old game audio guys!)

Mobile audio content quality and quantity will improve as tools for creating it become available. We should expect middleware to emerge for mobile audio (as the Miles Sound Driver did for games and Macromedia's Flash did for the Web). There will also be in-house development of tools for proprietary systems (such as Xact for Xbox and Scream for PlayStation).

What worked: Cheese Racer, a game developed for the Hiptop platform, has a completely interactive soundtrack: MIDI + sample data background music that changes as the user progresses through the levels, with highly compressed samples reused at different pitches to increase variation. The entire audio budget for the game is 100k.
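The trick of reusing one small sample at several pitches can be illustrated with a short sketch. This is not the Cheese Racer or Hiptop implementation; the function name repitch and the plain-list sample buffer are assumptions made for illustration, and the resampling uses simple linear interpolation.

```python
def repitch(samples, semitones):
    """Resample a mono buffer so it plays `semitones` higher or lower.

    Hypothetical helper: one stored sample yields many pitch variants,
    so only a single copy has to fit in the audio budget.
    """
    ratio = 2.0 ** (semitones / 12.0)      # playback-rate ratio for the shift
    out_len = int(len(samples) / ratio)    # higher pitch gives a shorter result
    out = []
    for i in range(out_len):
        pos = i * ratio                    # fractional read position in source
        i0 = int(pos)
        i1 = min(i0 + 1, len(samples) - 1) # clamp at the final source sample
        frac = pos - i0
        # Linear interpolation between the two neighbouring source samples.
        out.append(samples[i0] * (1.0 - frac) + samples[i1] * frac)
    return out


source = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
octave_up = repitch(source, 12)     # half as long, twice the pitch
octave_down = repitch(source, -12)  # twice as long, half the pitch
```

A real engine would more likely do this in fixed-point arithmetic on the playback side rather than building new buffers, but the memory saving is the same: the variants cost CPU time, not storage.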

Action Items

This report will be submitted to the Mobile Audio Working Group (MAWG), sponsored by the IASIG, to be discussed and possibly included in the group's report. This will be accomplished by working group chair, Peter Drescher.

Other Good Ideas!

1. Audiocon (audio emoticon):
During instant message (IM) conversations, it is commonplace to include a variety of "emoticons" such as ;-) wink or :-D laugh. Some systems parse the sequence of punctuation marks and replace them with "smiley" graphics. Similar processing could be done to play built-in and/or user-assignable sounds to accompany the smileys.
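A minimal sketch of the parsing step, assuming the IM client hands us the message text and can play a sound by name. The table BUILT_IN_SOUNDS, the sound names and the function find_audiocons are all hypothetical; no real messaging or phone API is implied.

```python
import re

# Hypothetical built-in table: emoticon text -> name of the sound to play.
BUILT_IN_SOUNDS = {
    ";-)": "wink",
    ":-D": "laugh",
}


def find_audiocons(message, user_sounds=None):
    """Return the sounds to play, in the order the emoticons appear.

    `user_sounds` lets the user override or extend the built-in table.
    """
    sounds = dict(BUILT_IN_SOUNDS)
    sounds.update(user_sounds or {})
    # Try longer emoticons first so a short one never shadows a longer one.
    pattern = re.compile(
        "|".join(re.escape(e) for e in sorted(sounds, key=len, reverse=True))
    )
    return [sounds[m.group(0)] for m in pattern.finditer(message)]
```

The same scan that swaps in the smiley graphics could call this and queue the returned sounds, so an audiocon costs little beyond a table lookup.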

2. Voice transformations:
As DSP algorithms become more sophisticated due to increased CPU power, voice audio on the phone could be transformed so that users could "spoof" other people's voices or create audio "avatars". For example, a user could choose to sound like Darth Vader or change the gender of their speech. This might be appealing for teenagers, practical jokers and extortionists.

3. Playing audio on your phone:
It would be useful to be able to feed sounds coming from the phone's audio subsystem directly into the phone's radio during a conversation. This would allow the user to play a ringtone for the person they are talking to, or interject a movie quote into a conversation for effect.

4. Broadcast audio:
Users will most likely eventually demand the ability to listen to radio broadcasts on their phones, along the lines of Webcasts on the Internet. Given that cell phones already contain radio modules for connecting to cell phone networks and the Internet, phone manufacturers might do well to plan for this eventuality.

5. 3D audio for conference calls:
As 3D audio systems are developed for phones with stereo speakers, it might be useful to separate out multiple voices in a conference call and shift them left, right and center in a (virtually larger than the phone itself) stereo field. This would facilitate keeping track of who was speaking at any one time. Since all voices currently come into the phone on a single multiplex line, this would be difficult to implement; it might be accomplished using FFT to distinguish different people's voice characteristics, or possibly using VoIP systems.
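The placement step, at least, is simple once the voices are available as separate streams (which, as noted, is the hard part). The sketch below assumes each voice already arrives as its own equal-length mono buffer, with at least one voice present, and uses ordinary constant-power stereo panning rather than true 3D processing. All function names are hypothetical.

```python
import math


def pan_positions(num_voices):
    """Spread voices evenly from hard left (-1.0) to hard right (+1.0)."""
    if num_voices == 1:
        return [0.0]
    return [-1.0 + 2.0 * i / (num_voices - 1) for i in range(num_voices)]


def pan_gains(pan):
    """Constant-power (left, right) gains for a pan position in [-1, 1]."""
    angle = (pan + 1.0) * math.pi / 4.0  # 0 = hard left, pi/2 = hard right
    return math.cos(angle), math.sin(angle)


def mix_stereo(voices):
    """Mix separated mono voice buffers into (left, right) buffers."""
    n = len(voices[0])
    left, right = [0.0] * n, [0.0] * n
    for voice, pan in zip(voices, pan_positions(len(voices))):
        gain_left, gain_right = pan_gains(pan)
        for i, sample in enumerate(voice):
            left[i] += sample * gain_left
            right[i] += sample * gain_right
    return left, right
```

With three callers this puts one voice hard left, one in the center and one hard right. Widening the field beyond the physical speakers, as suggested above, would need additional processing that this sketch does not attempt.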

6. Split architectures are problematic:
While some phones are being designed with DSP chips built-in, this can sometimes cause more problems than it's worth, due to system overhead and bandwidth limitations on the bridge that moves data from the MCU to the DSP. In the long run, it may be more efficient to simply do all DSP type processing on the main CPU, as processing power increases.

7. Peer to peer social networking via Bluetooth (with audio alerts):
Many phones are now including Bluetooth technology for communication with devices in close proximity. This technique could be used for social networking in the following manner:
a) The user sets his social profile to things he is interested in (juggling, sports fan, unusual sexual practices, whatever).
b) When the user's phone detects another device in close proximity via Bluetooth, the phone might play a specific ringtone to alert the user that someone sharing his interests is nearby.
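Steps a) and b) can be sketched as a simple profile comparison. The code assumes the two phones have already exchanged their interest lists over Bluetooth; discovery and transport are outside the sketch, and the function name, ringtone names and fallback value are hypothetical.

```python
def pick_alert(my_interests, nearby_interests, ringtones, fallback="default_alert"):
    """Choose the ringtone to play when a nearby device shares an interest.

    Returns None when nothing is shared, so the phone stays silent.
    """
    shared = sorted(set(my_interests) & set(nearby_interests))
    if not shared:
        return None
    # Prefer a ringtone the user assigned to one of the shared interests.
    for interest in shared:
        if interest in ringtones:
            return ringtones[interest]
    return fallback


my_profile = {"juggling", "sports fan"}
my_ringtones = {"sports fan": "stadium_horn"}
alert = pick_alert(my_profile, {"sports fan", "chess"}, my_ringtones)
```

Comparing on the phone, rather than broadcasting the full profile, would also let the user keep the more private interests private, though that is a design choice the group did not discuss.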

Other Reference Material

Edited transcript of Peter Drescher's "Ringtones Are Really Annoying" BBQ presentation.
