The Eighteenth Annual Interactive Audio Conference
PROJECT BAR-B-Q 2013
BBQ Group Report:
HD Audio Capture in Consumer Devices
   
Participants: A.K.A. "Dark Side of Devon"

Phil Brown, Dolby Labs

Devon Worrell, Intel
Scott McNeese, Cirrus Logic
Diby Nandy, Knowles
Leng Ooi, Google
Ted Kao, DTS
Michael Jessup, Dolby
Mikko Suvanto, Akustica
Facilitator: Phil Brown, Dolby Labs  
 

Problem Statement

Consumer “smart” devices are used to capture audio for multimedia, voice and speech. Such user-generated media makes up a growing fraction of the media consumed through video and audio sharing services and social apps. Ubiquitous audio capture is being driven by easily available multi-purpose devices, such as tablets and smartphones, at very accessible price points. Such devices tend to be limited by the platform, which may be designed primarily for one purpose (usually not audio) and then re-purposed for others, with audio included only as a minimum-requirement checklist item. The evolution of devices toward smaller, thinner and lighter form factors also constrains acoustic design. As a result, audio captured by current mobile devices has low quality and fidelity.

The group identified use cases that highlight the deficiencies in audio capture and provide opportunities for high-quality consumer audio:

  • Use phone/tablet as camcorder – long range capture
    • record concert, kids playing, activities, lectures, conference
  • Capture people talking
    • Voice communication – Skype, FaceTime
      • Transmit and receive clear speech
    • Speech recognition
    • Simultaneous communication + speech recognition
      • Distinguish and manage communication speech and command & control words
    • Biometric analysis
      • Voice Recognition
      • Stress, emotion detection
  • Acoustic scene analysis
    • Activity detection during low power standby
    • Sound track acoustic analysis to determine context of the content
    • Use mic to monitor and optimize playback performance
  • Directional/focused capture
    • Full-band audio capture (e.g. a concert) at 30 ft without interference from sources at 5 ft; record based on proximity
  • Wireless/Remote capture
    • Lavalier mic on a speaker broadcast over a local network
  • Control directional capture automatically
    • While changing camera or when the device is rotated, e.g. during Skype/FaceTime capture
  • Capture audio on wearable devices
    • Command and control on wrist in any position
    • Context based audio capture for smart eyewear for multi-media, command and control, communication
  • Multimedia capture
    • Capability to provide mono, stereo, surround, spatial depending on playback mode
  • Capture and stream real-time or store it for later playback
  • Capturing ultrasonic data
    • Impact of location of mic, port geometry

Key Problems
The problems may be defined by limitations that arise in old and new use cases for audio capture enabled on “smart” mobile devices.

  • Dynamic range limitations in the transducers
    • The noise floor of the microphones limits the low end.
    • Acoustic handling capabilities limit the high end.
  • Use of multiple microphones on a device:
    • Unable to select a subset of microphones e.g. horizontal pairs of microphones based on orientation.
    • Unable to use more than 2 microphones simultaneously
    • Different types of microphones are used on a single device, but their locations, and which microphone(s) to use for a specific application and orientation, are unknown
  • Devices are not capable of fully determining the desire of the content creator, even in limited contexts. It is challenging to determine what to capture e.g. environment, individuals, wideband, narrowband, speech, voice, etc.
    • Sensors, like accelerometers and gyroscopes, which may provide context are not being exploited for controlling audio capture.
    • Power management: Sensors are not on same power domain and may not all be accessible in the same power state.
    • Components like microphones and codecs usually come from different vendors and have different performance characteristics.
  • Processing solutions/algorithms
    • Algorithms come from multiple vendors and they don't interoperate.
    • Most noise reduction produces monaural output where spatial audio is preferred.
    • The OS is an impediment to high quality audio capture.
  • Audio quality is compromised due to BOM cost of devices and software
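
The dynamic range limitation listed first above can be made concrete with a small calculation. The following is a minimal sketch, assuming the common datasheet convention that microphone SNR is quoted against a 94 dB SPL (1 Pa) reference tone and that the high end is given as an acoustic overload point (AOP). The function names and the example values are hypothetical and do not come from the group's discussion.

```python
REFERENCE_SPL_DB = 94.0  # 1 Pa, the usual reference level for quoting microphone SNR


def noise_floor_db_spl(snr_db: float) -> float:
    """Equivalent input noise in dB SPL, from an SNR quoted against 94 dB SPL."""
    return REFERENCE_SPL_DB - snr_db


def dynamic_range_db(snr_db: float, aop_db_spl: float) -> float:
    """Usable range between the noise floor and the acoustic overload point."""
    return aop_db_spl - noise_floor_db_spl(snr_db)


# Hypothetical datasheet values, for illustration only.
mic = {"snr_db": 65.0, "aop_db_spl": 120.0}
print(noise_floor_db_spl(mic["snr_db"]))                   # 29.0
print(dynamic_range_db(mic["snr_db"], mic["aop_db_spl"]))  # 91.0
```

Under these assumptions, raising the SNR lowers the noise floor and raising the AOP extends the high end; either change widens the usable range by the same number of decibels.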

Proposed Solutions

The group determined that solutions need to be defined in terms of the full platform design. The diagram below defines the interdependencies between the different components. The following are necessary to enable such capabilities:

  • More microphones
  • Better microphones with improved SNR, dynamic range, resonance, sealing & isolation
  • Glue-only microphones to improve fidelity and to lower cost
  • Single package microphone arrays
  • Better speakers, for better echo cancellation, playback and recorded content
  • Better algorithms that work well with microphones and codec –  robust to microphone placements, distance and quality
  • Improve dynamic range through microphone control of amplifiers
  • Microphone characterization / parameters available to algorithm developers and in real-time to system
  • Real-time availability of sensor data to improve ambient contextual awareness, e.g. orientation, geo-location, focal distance of lens, distance of the object, face recognition, time stamp, format, position
  • Pluggable compute architecture to extend processing capability
  • Ensure that needed sensors are ON when microphones are used
  • Real time algorithm change based on sensor data
  • Standardize info reporting so codec, microphone, algorithm developers can acquire info for device customization, updates, etc.
    • sensor, components, block diagram, what the app/algorithm developers need for development
  • Smart processing (AGC/ALC, spatializer) is available
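
Several of the items above (microphone parameters available to the system, real-time sensor data, and algorithm changes driven by that data) can be combined in a small illustration: choosing a horizontal microphone pair from the current device orientation. This is only a sketch under stated assumptions. The `Mic` record, the coordinate convention, the `horizon_deg` parameter and the microphone positions are all hypothetical; no existing platform API is implied.

```python
import math
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Mic:
    """Hypothetical per-microphone descriptor reported by the platform."""
    name: str
    x_mm: float  # position along the device's short edge
    y_mm: float  # position along the device's long edge


def horizontal_pair(mics, horizon_deg):
    """Return the two mics with the widest spacing along the world-horizontal axis.

    horizon_deg is assumed to be the direction of the world horizontal,
    expressed in the device frame (0 = portrait, 90 = landscape).
    """
    hx = math.cos(math.radians(horizon_deg))
    hy = math.sin(math.radians(horizon_deg))

    def along(mic):
        # Projection of the mic position onto the horizontal direction.
        return mic.x_mm * hx + mic.y_mm * hy

    first, second = max(
        combinations(mics, 2),
        key=lambda pair: abs(along(pair[0]) - along(pair[1])),
    )
    # Order the pair by position along the horizontal axis.
    if along(first) > along(second):
        first, second = second, first
    return first, second


# Hypothetical microphone layout on a 70 mm x 140 mm handset.
MICS = [Mic("bottom", 20, 0), Mic("top", 35, 140), Mic("side", 70, 100)]

print([m.name for m in horizontal_pair(MICS, 0)])   # ['bottom', 'side']
print([m.name for m in horizontal_pair(MICS, 90)])  # ['bottom', 'top']
```

In a real system the orientation would come from the accelerometer or gyroscope, and the capture path would re-run the selection whenever the device is rotated. That is only possible if microphone positions are reported in a standardized way, which is the point of the info-reporting item above.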

Block diagram

[Figure: High Fidelity Capture block diagram]

References

2011 Definition of Audio Quality and Happiness
Explores audio quality in terms of experience and presents 6 metrics that attempt to revitalize the definition of ‘quality audio’ by focusing on consumer experiences.
2008 Smart Ambient Sound Sensor
Proposes the creation of a new form of acoustic monitoring for the PC space that can be used to improve user experience with minimal user interaction.
2006 A Consumer-friendly Quantifiable Metric for Audio Systems
A proposal for a consumer-friendly quantifiable metric for audio systems that can help provide a great listening experience for the user, as well as generate market growth through increased awareness of the value of quality components.

The action item list


 

#  | Who’s Responsible | Due Date   | Description
1  | Diby              | 11/21/2013 | Complete report for publication
2  | Devon             | Ongoing    | Make recommendations to OEMs on designs
3  | Diby, Mikko       | Ongoing    | Microphone to improve design
4  | Leng              | Ongoing    | OS: Microsoft, Apple, Google to provide methodology to provide sensor data
5  | Phil, Mike, Ted   | Ongoing    | Algorithm developers to update algorithm
