The Twelfth Annual Interactive Music Conference
PROJECT BAR-B-Q 2007

Group Report: Fixing Broken PC Audio

   
Participants: A.K.A "Mr. Miagi’s Little Trees"

Steve Rowe, Microsoft
Peter Otto, UCSD
Mary Cunningham, IDT
Scott Arnold, Akustica
Keith Weiner, DiamondWare
Devon Bergman, Dolby
Scott McNeese, D&M
David Roach, Optimal Sound
Gary E. Johnson

Facilitator: Linda Law, The Fat Man
 

Problem Statement

Identify the major problems facing PC audio today and in the near future.

The bulk of the audio problems being seen on the PC today are being driven by new trends.  These are:

  • the PC becoming a primary communication device
  • the PC becoming a central media device
  • the upward trend of laptop/handheld market share. 

This group identified approximately 40 problems facing the PC today and prioritized them by urgency and importance.  We then approached the topmost items on the list by further:

  • defining the problem space
  • identifying potential solutions
  • identifying current attempts at solutions within our collective knowledge

Here are the Top 5 problems that we identified.

  1. Problems involving audio through HDMI and/or Display Port
  2. Problems involving Content robustness rules & DRM Inside the platform. 
  3. Bluetooth Connected Audio Devices
  4. Managing Multiple “codecs”
  5. Speech Processing (capture, fidelity, level)

Below are the detailed problem descriptions for the Top 5 contenders:

1. Problems involving audio through HDMI and/or Display Port

Problem Description:

  • HDMI appears as a separate codec on the motherboard (MB), with separate endpoints.
  • User plugs in HDMI with no audio playing. The OS has a different endpoint, such as the internal speakers, set as the default, so HDMI audio is unheard. The user needs to find the default endpoint selector, which is 3 clicks away.
  • Some audio streams require processing but do not have access to the relevant APO (audio processing object), e.g. you want signal processing ($$Dolby$$) on your HDMI audio stream.
  • Problem is magnified when multiple codec vendors want to add proprietary value.
  • HDCP handshake is often unavailable and otherwise unreliable
  • Audio only plays when video is playing
  • Display Port still being defined

Potential Solutions:

  • One software stack (HDA driver) to control all audio endpoints and create a positive usage model. Endpoint selection is driven by the application or by plug events, rather than relying on the user knowing how to define endpoints.
  • More intuitive UI for multiple endpoint control (see below)
  • Known good configurations are stored and recalled
  • Defining audio device roles
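
As a rough illustration of how plug-event-driven selection, device roles, and stored known-good configurations could fit together, here is a minimal sketch. All names (`Endpoint`, `EndpointRouter`, the role strings, the preference lists) are hypothetical and chosen for this example only; they do not correspond to any existing driver or OS API.

```python
from dataclasses import dataclass, field


@dataclass
class Endpoint:
    name: str
    roles: set                 # roles this endpoint can serve, e.g. {"media"}
    connected: bool = False


@dataclass
class EndpointRouter:
    endpoints: dict = field(default_factory=dict)
    preferences: dict = field(default_factory=dict)   # role -> names, most preferred first
    known_good: dict = field(default_factory=dict)    # connected set -> {role: name}

    def add(self, endpoint):
        self.endpoints[endpoint.name] = endpoint

    def _connected(self):
        return frozenset(n for n, e in self.endpoints.items() if e.connected)

    def default_for(self, role):
        # A stored known-good configuration wins over the preference order.
        saved = self.known_good.get(self._connected(), {})
        if role in saved:
            return saved[role]
        for name in self.preferences.get(role, []):
            endpoint = self.endpoints.get(name)
            if endpoint and endpoint.connected and role in endpoint.roles:
                return name
        return None

    def on_plug(self, name, connected):
        # Plug event: update state and recompute the default for every role.
        self.endpoints[name].connected = connected
        roles = sorted({r for e in self.endpoints.values() for r in e.roles})
        return {role: self.default_for(role) for role in roles}

    def remember(self, mapping):
        # Store the user's choice for the current set of connected endpoints.
        self.known_good[self._connected()] = dict(mapping)


router = EndpointRouter()
router.add(Endpoint("internal speakers", {"media", "communications"}, connected=True))
router.add(Endpoint("HDMI", {"media"}))
router.preferences = {
    "media": ["HDMI", "internal speakers"],
    "communications": ["internal speakers"],
}
after_plug = router.on_plug("HDMI", True)     # media moves to HDMI automatically
after_unplug = router.on_plug("HDMI", False)  # media falls back to the speakers
```

In this sketch the user never opens an endpoint selector: plugging in HDMI re-routes the media role, while the communications role stays on the internal speakers. A call to `remember` records an override for one particular set of connected devices, so it can be recalled the next time that set appears.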

Known Current Solution Attempts:

  • Individual codec vendors are providing custom solutions
  • Vista audio structure provides partial solution
  • HDCP will become more robust with time and iterations
  • Display Port expected to track HDMI spec and benefit from its solutions

2. Problems involving Content robustness rules & DRM Inside the platform

Problem Description:

  • Each company shipping a product has to evaluate whether it complies with predefined (and vague) content robustness rules.  Failure to comply has high financial penalties.
  • Hollywood will sue people that don’t comply. This scares off decoder vendors.
  • Hardware components may be called on to provide encryption for all internal audio paths.  There is no chip-level encryption today on the HD Audio bus.  This will increase the cost and complexity of the chips. 
  • Acoustic echo cancellation will be hampered because they cannot easily access output streams.
  • Watermarking detection is processor intensive.  
  • This is mostly being driven by video.  Music is moving toward unrestricted content.

Potential Solutions:

  • $$$$Lawyers$$$$. 
  • Shakespeare had it right.
  • Encrypt all signals on the motherboard (solves 1&2, exacerbates 3&4)
  • Pressure entertainment industry to back down.
  • Decode inside device/codec.

Known Current Solution Attempts:

  • Interest in watermarking for copy protection is fading
  • Software-only attempts were made in Vista but viewed as inadequate.
  • Repeated industry efforts to encrypt the HD Audio bus have so far been unsuccessful for economic reasons.

3. Bluetooth Connected Audio Devices

Problem Description:

  • Bluetooth devices need to be exposed by the operating system the same way other audio devices are.
  • There are issues with pairing and stream redirection.  Needs to be more plug & play.
  • Potentially hard to connect multiple Bluetooth devices to a PC
  • No good way to manage multiple Bluetooth profiles

Potential Solutions:

  • Microsoft driver
  • Microsoft to adopt something like the crossbar solution
  • Evangelize in Bluetooth SIG.

Known Current Solution Attempts:

  • Bluetooth radio vendors have their own drivers
  • Codec vendors provide custom solutions, such as hijacking the audio stream and redirecting it to a specific endpoint
  • See Crossbar work from previous BBQ (2005)

4. Managing Multiple “codecs”

Problem Description:

  • New potential for multiple different PC audio devices on a single system:
    • Multichannel rear panel codec
    • 2-channel front panel codecs
    • HDMI audio
    • Digital power amp
    • Bluetooth
    • USB audio
  • Lack of common functionality for all silicon
  • DRM/robustness issues if header on motherboard
  • ODM loses single point of support
  • PCs have more than one audio device now. It is difficult to select devices and control things like volume.
  • No common clocks across devices (hampers AEC)
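
To see why the lack of a common clock hampers AEC, consider two devices that both nominally run at the same sample rate but whose crystals are slightly off. The sketch below estimates how quickly their sample streams slide apart. The 48 kHz rate and the ±50 ppm tolerances are illustrative assumptions, not measurements of any particular device.

```python
def drift_samples(nominal_hz, ppm_a, ppm_b, seconds):
    """Samples by which device A runs ahead of device B after `seconds`."""
    rate_a = nominal_hz * (1 + ppm_a / 1e6)   # actual rate of device A
    rate_b = nominal_hz * (1 + ppm_b / 1e6)   # actual rate of device B
    return (rate_a - rate_b) * seconds


def seconds_until_drift(nominal_hz, ppm_a, ppm_b, max_samples):
    """Time until the two streams are `max_samples` apart."""
    per_second = abs(drift_samples(nominal_hz, ppm_a, ppm_b, 1.0))
    return float("inf") if per_second == 0 else max_samples / per_second


# Assumed example: USB mic at +50 ppm, HDMI output at -50 ppm, both "48 kHz".
drift_after_minute = drift_samples(48000, 50, -50, 60)      # about 288 samples
time_to_one_ms = seconds_until_drift(48000, 50, -50, 48)    # about 10 seconds
```

Under these assumed tolerances the two streams move about 6 ms apart every minute, so an echo canceller that models a fixed delay between speaker and microphone has to keep re-adapting unless the streams are resampled or share a clock.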

Potential Solutions:

  • Better UI in the OS
  • Audio Device roles in the OS
  • OS could provide support for a single clock
  • See crossbar report

Known Current Solution Attempts:

  • Silicon vendors can write drivers for other devices to provide single point of UI
  • Applications like DiamondWare, Communicator, and Skype allow for device enumeration
  • Custom control panel applets and event engines
  • OEMs providing hard coded configurations
  • Companies like Optimal Sound can provide alternative approaches

5. Speech processing (capture, fidelity, level)

Problem Description:

  • In historical terms, capture is only recently relevant. 
  • Analog microphone fidelity is highly variable, and both analog and digital mics are often severely affected by industrial acoustic design.
  • Level controls, when present, are counterintuitive. Line input controls are not useful and should be removed.
  • Analog mic jacks are a big problem, as they are intended to connect to unknown mics (impedance and current).
  • Level settings are confusing, and users don’t know what to do. Most auto-level wizards don’t work well.
  • Laptop microphones are very noisy due to electrical layout

Potential Solutions:

  • Avoid analog mic jacks
  • Use built-in mics that are preset for the usage case
  • Remove line level input slider
  • “Skype” method of level setting – auto setting of levels.  Plays it back for you to evaluate.
  • An all-digital audio system using digital microphones and a digital power amp (and no analog audio I/O) will resolve most if not all of these issues.
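
The automatic level-setting idea can be sketched as a simple measurement step: capture a short test phrase, measure its level, and suggest a gain change that moves it toward a target. This is only a minimal illustration and not Skype's actual method; the -20 dBFS target and the gain limits are assumptions chosen for the example.

```python
import math


def rms_dbfs(samples):
    """RMS level of float samples in [-1.0, 1.0], in dB relative to full scale."""
    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    return float("-inf") if mean_square == 0 else 10 * math.log10(mean_square)


def suggest_gain_db(samples, target_dbfs=-20.0, min_gain_db=-12.0, max_gain_db=30.0):
    """Gain change (dB) that would bring the captured level to the target."""
    level = rms_dbfs(samples)
    if level == float("-inf"):
        return None                      # silence: nothing to calibrate against
    # Clamp so a very quiet or very hot mic cannot request an extreme gain.
    return max(min_gain_db, min(max_gain_db, target_dbfs - level))


quiet_capture = [0.01] * 100             # stand-in for a very quiet mic (-40 dBFS)
quiet_gain = suggest_gain_db(quiet_capture)
```

After applying the suggested gain, the wizard would play the recording back so the user can judge the result, as described in the bullet above.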

Known Current Solution Attempts:

  • Microsoft to start testing Vista analog capture fidelity requirements.
  • USB Microphones (typically higher cost).
  • Array Mics
  • Digital Mics (Akustica)
  • Suppliers are helping OEMs to test their systems
  • Auto gain stage calibration (wizards).  These currently don’t work well.

Other items related to speech processing are:

  • #2 Multiple codecs on motherboard
  • #14 Cross-device AEC
  • #16 Crosstalk
  • #27 DRM
  • #28 latency

In order to provide an experience that “just works”, VOIP software must cancel any sound that travels from the user’s speakers back into the microphone.  This includes both the echo of the remote parties on the call and locally generated audio (e.g. music).

AEC can be challenging if the mic is attached to one codec (e.g. USB) and speakers are on a different one (e.g. HDMI).

DRM is another obstacle because to cancel the echo, software must have the unencrypted audio stream that is going to the speakers as well as the unencrypted audio stream coming from the microphone.

Without AEC, VOIP will be headphones only (and unless #16 is addressed, there can still be echo problems).
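
To illustrate why the canceller needs the clear speaker-bound stream, here is a minimal sketch of a normalized LMS (NLMS) adaptive filter, one common textbook approach to echo cancellation. It is not the algorithm of any particular VOIP product, and the simulated echo path (a 3-sample delay at 0.6 gain) and the filter parameters are assumptions for the example.

```python
import random


def nlms_cancel(reference, mic, taps=8, step=0.5, eps=1e-6):
    """Subtract an adaptive estimate of the echo of `reference` from `mic`."""
    weights = [0.0] * taps
    history = [0.0] * taps               # recent reference samples, newest first
    out = []
    for x, d in zip(reference, mic):
        history = [x] + history[:-1]
        estimate = sum(w * h for w, h in zip(weights, history))
        error = d - estimate             # what remains after echo removal
        power = sum(h * h for h in history) + eps
        weights = [w + step * error * h / power for w, h in zip(weights, history)]
        out.append(error)
    return out


def energy(samples):
    return sum(s * s for s in samples)


rng = random.Random(0)
speaker = [rng.uniform(-1.0, 1.0) for _ in range(4000)]
# Simulated echo path: the mic hears the speaker 3 samples late at 0.6 gain.
echo = [0.6 * speaker[n - 3] if n >= 3 else 0.0 for n in range(len(speaker))]

residual = nlms_cancel(speaker, echo)            # reference stream available
blind = nlms_cancel([0.0] * len(echo), echo)     # reference stream withheld
```

With the reference available, the residual echo shrinks toward zero once the filter has adapted. With the reference withheld (as when the output stream is only available encrypted), the filter has nothing to adapt against and the echo passes through unchanged.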

This concludes the topics which merited an in-depth drill down.


Here are other problems that we identified and defined, but didn’t have time to dig into more deeply.

Power
Overall power usage of the audio subsystem must decrease.
Things to consider:

  • Overall power – fidelity
  • Sleep states in the PC?
    • How does this affect getting a Skype call?
  • How does process shrink affect power rails on the system
  • What about audio processing?
    • More processing means more power.
  • Power budgets dictate the number of DSPs that can run, which could be a hit on EFX, etc.

Latency
Problem Definition:
Latency is the delay between when a program plays a sound and the time when it actually exits the speakers (or arrives at the ears).  This is not generally a problem for streamed audio but becomes an issue in interactive environments like VOIP, gaming, and music creation.  It is also an issue in video because long latency can cause lip-sync issues.
In a PC, each step in the processing chain takes time and thus adds delay.  The effective latency of the audio is the cumulative delay added by each step in processing.  Sources of delay include:

  • buffering (e.g. to send over USB)
  • algorithm (e.g. Dolby digital codec)
  • hardware
  • network stack
  • audio pipeline in the OS
  • application
  • AEC
  • Internet

For voice communications, end-to-end latency should be < 150 ms, and should continue to target lower latencies such as the 5 millisecond latency of POTS.
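
Because the effective latency is the sum of the per-stage delays, a simple budget makes it easy to see which stage to attack first. The sketch below adds up a set of stage delays and compares the total to the 150 ms target. The per-stage numbers are hypothetical placeholders, not measurements.

```python
def latency_report(stages_ms, budget_ms=150.0):
    """Return (total, amount over budget, largest contributor) for a stage map."""
    total = sum(stages_ms.values())
    worst = max(stages_ms, key=stages_ms.get)
    return total, total - budget_ms, worst


# Hypothetical per-stage delays in milliseconds, for illustration only.
stages_ms = {
    "application": 10.0,
    "audio pipeline in the OS": 20.0,
    "buffering (USB)": 10.0,
    "algorithm (codec)": 20.0,
    "AEC": 10.0,
    "hardware": 2.0,
    "network stack": 5.0,
    "Internet": 80.0,
}

total_ms, over_ms, worst_stage = latency_report(stages_ms)
```

With these placeholder values the chain totals 157 ms, which is 7 ms over the target, and the Internet leg is the largest single contributor.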

Potential Solutions:
Because delay is added at each step in the processing chain, there is no single solution.  Lowering the latency of the system requires that each step above be examined and optimized.  Some solutions include:

  • Less buffering in the pipeline.  Run the processing pipeline on smaller buffers which are processed more often.  This has a direct tradeoff with glitch resiliency and power usage.
  • Lower-cost algorithms.  If less-complex algorithms are used to process or decode the audio, less delay is introduced into the pipeline.
  • Move algorithms to dedicated hardware.  Processing can potentially be done more quickly in specialized hardware.
  • Pipeline optimizations.  The processing pipeline in the application or the operating system may be making more copies of the data than necessary.  Another possibility is that two serial operations can be done simultaneously.  Each read/write cycle from memory is expensive, so doing more processing per read will help lower the overall time taken.
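
The buffering tradeoff in the first bullet can be made concrete with a little arithmetic: the delay added by one buffer is its length in frames divided by the sample rate, while the number of times per second the pipeline must wake up to process it is the inverse. The buffer sizes and the 48 kHz rate below are example values only.

```python
def buffer_latency_ms(frames, sample_rate_hz):
    """Delay contributed by one buffer of `frames` samples per channel."""
    return 1000.0 * frames / sample_rate_hz


def wakeups_per_second(frames, sample_rate_hz):
    """How often the pipeline must run to keep up with buffers of this size."""
    return sample_rate_hz / frames


large_ms = buffer_latency_ms(480, 48000)      # 10.0 ms per buffer
large_rate = wakeups_per_second(480, 48000)   # 100.0 wakeups per second
small_ms = buffer_latency_ms(128, 48000)      # about 2.67 ms per buffer
small_rate = wakeups_per_second(128, 48000)   # 375.0 wakeups per second
```

Shrinking the buffer from 480 to 128 frames cuts its delay by almost a factor of four, but the pipeline then has to run nearly four times as often, which costs power and leaves less slack before a late wakeup becomes an audible glitch.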

File Formats

  • PCs and devices (iPods, Zunes, etc.) can’t play well together

Poor acoustic design (and audio)

  • ODMs lack expertise to properly design
    • mechanical interfaces for speakers/mics
    • electrical circuits for good SNR and AEC performance
    • speaker and mic placement
  • ODMs lack the expertise and equipment to properly evaluate audio quality.
  • ODMs lack strong incentive to improve.

Cheap components/low quality parts

  • Low margins – cheap speaker components
  • ODMs/OEMs may want high quality audio
    • Don’t understand
    • Need education

Audio controls (volume, endpoint, EQ, etc.; UI)

  • OS/application software for audio, and to some extent hardware, is often poorly designed and counterintuitive.  Users would experience more satisfaction if UIs were more expressive, legible, useful.
  • Some issues include volume, EQ, play list, sound file position controls, etc. Other issues may include metadata display, play list organization, metering, help menus, vocabulary, etc.
  • No more toggle buttons: use separate codes for mute (01h) and not mute (00h)
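
One way to read the last bullet is that a toggle depends on state the sender cannot see, whereas discrete codes say exactly what is wanted. The sketch below contrasts the two when the same "mute" request is delivered twice (for example, a retry). The class names are hypothetical, and treating 01h as mute and 00h as not mute follows the bullet above.

```python
MUTE_OFF = 0x00   # "not mute" code from the bullet above
MUTE_ON = 0x01    # "mute" code from the bullet above


class ToggleControl:
    """Each message flips the state, whatever the sender intended."""

    def __init__(self):
        self.muted = False

    def handle(self):
        self.muted = not self.muted


class DiscreteControl:
    """Each message states the desired result, so repeats are harmless."""

    def __init__(self):
        self.muted = False

    def handle(self, code):
        if code not in (MUTE_OFF, MUTE_ON):
            raise ValueError(f"unknown mute code: {code:#04x}")
        self.muted = code == MUTE_ON


toggle, discrete = ToggleControl(), DiscreteControl()
for _ in range(2):            # the same "mute" request delivered twice
    toggle.handle()
    discrete.handle(MUTE_ON)
```

After the duplicated request the toggle control ends up unmuted again, while the discrete control is muted as intended.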

Here is our sorted list of potential problems to discuss, along with the results of a show of hands for urgency and importance for each item on the list.
 

Problem                                                                     Urgent?  Important?

1.  DisplayPort/HDMI usage models                                           8        7
2.  DRM inside platform.  Potential for watermarking/no capture             7        8
3.  Interoperability of connected devices
    a.  Bluetooth                                                           7        8
4.  Multiple codecs on motherboards                                         6        7
5.  Voice processing
    a.  Capture (for voice especially)                                      6        7
    b.  Microphone fidelity                                                 6        7
    c.  Microphone output level (tomb of the unknown microphone)            6        7
6.  Multistreaming usage models                                             5        7
7.  Audio controls (volume, endpoint, eq, etc.) – UI                        4        7
8.  Cheap components (speakers)                                             5        6
9.  Latency                                                                 4        7
10. Content robustness rules                                                5        6
11. File Formats (one codec to rule them all)                               5        6
12. Poor acoustic design (cable/trace layout, noisy components near mics)   3        7
13. Power usage                                                             3        7
14. Move to mobile causes lower performance parts                           3        7
15. How to communicate multichannel to the speakers                         2        6
16. Education of OEMs (sound engineers back in the business?)               3        4
17. Software integration issue (think Mac)                                  0        7
18. Need asynchronous AEC                                                   2        4
19. Array mic software integration                                          2        4
20. Cross-device AEC                                                        0        5
21. Where does ADC/DAC take place?                                          0        5
22. Crosstalk (input to output)                                             0        3
23. Trends toward small geometry chip designs                               0        3
24. Amplifiers                                                              1        2
25. Windows mixer API                                                       0        3
26. 3.5mm jacks – too small, too cheap                                      0        2
27. Speaker metadata (how many? Where?)                                     0        2
28. No good paradigm for multiple audio devices (same as 3)
29. Broken 3D multichannel audio (what exactly is broken?) – skip
30. Commoditization (same as 10)
31. Multiroom streaming (sync?) – skip
32. Calibration - same as 30
33. Acoustical environment (room) - same as 30
34. Interoperability of connected devices
    a.  Digital Audio Receivers – skip
    b.  Portable Audio Players (iPod, Zune, etc.) – skip
    c.  Screen (hdmi) – skip
    d.  CE devices – skip
