The Twenty-second Annual Interactive Audio Conference
PROJECT BAR-B-Q 2017
BBQ Group Report: Mode and Nodes Enabling Consumer Use of Heterogeneous Wireless Speaker Devices
   
Participants (A.K.A. "Party Mode"):
Omri Eisenbach, CEVA
Mike Determan, Analog Devices
Leslie Ann Jones, Skywalker Sound
David Roach, Magic Leap
Jeroen Langendam, I to M
Julie Stultz, Fairchild / On Semi
Randy Stephens, Cirrus Logic
Jack Joseph Puig, Waves
Ajay Kanji, Tempo Semi
Whit Hutson, Synaptics
Gerard Andrews, Cadence
Markus Altman, Dolby
   
Facilitator: Jim Rippie  
 

Wireless Audio Speakers

Introduction
Over the past several years the market for wireless audio devices has grown dramatically. A wireless audio speaker offers audio playback supported by connectivity, expandability, flexibility, and consumer convenience. Initially the market evolved from boom boxes and speaker docks to a variety of wireless smart speakers, e.g. Sonos, Bose SoundLink, Harman/JBL, Denon, Sony.

Since 2015, the introduction of interactive smart speakers, e.g. Amazon Echo and Google Home, together with the announcement of the Apple HomePod for December 2017, has propelled the market. According to Global Market Insights, the interactive smart speaker market alone will surpass $13 billion in 2024, with shipments of over 100 million units.

Market Drivers:
The market drivers for wireless audio speakers consist of:

  • Increasing demand for Internet of Things (IoT) devices, mainly related to smart homes.
  • Increased consumer convenience.
  • Improvement of functionalities and audio quality of legacy systems beyond playing music.
  • Connecting multiple wireless speakers in multi-room audio systems (aka Party Mode).

Brief problem statement

There is currently no defined standard for integrating one or more wireless nodes, or for feeding an input back to affect the outputs. This means end users can’t realize maximum value from a single wireless speaker device or from a collection of different ones.

A brief statement of the group’s solution

Define a standard with a common wireless transport method in which input, not limited to speech, can inform the output in some meaningful way. This enables an end user to integrate multiple products without having to rely on a proprietary solution. It also enables new capabilities that can improve the user experience.

Expanded problem statement

Deficiencies in the system

  • Performance metrics are not great right now, e.g. THD+N. The performance metrics are poor because of what we’re measuring.
  • Bluetooth and WiFi audio have very different bandwidths. Should we do one or the other?
    • Should we focus on these existing standards or new standards? 
    • All audio transports will have these problems.
    • On battery power, we’ll have more issues to drag us down
    • Do any of the competing standards give us what we need?
    • Bluetooth is widely adopted – without wireless transport
    • Some standards like HD audio are widely embraced by the whole industry with smaller points of differentiation
  • Walled garden
    • Driven by large companies – adding custom specs to the end of existing standards
    • Similar to an Apple store, etc
  • What are the needs by consumers?  Are they different for everyone?
    • What are the issues with the current industry? E.g. Apple Match will take over and replace music unintentionally.
  • Most devices now have microphones and speakers
    • Even stand-alone speakers have mics for calibration
  • We need to harvest the power
  • Are standards inadequate?
  • What’s currently available in the WiFi space?  Proprietary only?
  • There’s a list from last year’s BBQ on wireless transports
  • Agree on the Transport
  • What are the issues with the existing transports?
  • Are we talking transducers or just speakers?
  • OUT OF SCOPE - Power / Battery only has inherent issues. We won’t focus here.
  • Sound pressure is a battery issue?  Is the battery truly an issue?
  • What needs improving in the wireless audio system?
    • Moving from stereo to multi-speaker can impact performance. 
    • Better Codecs built into the standard.  Open up existing “closed” format?  Or new format?  Is the issue with the standard? How can we have the highest quality with existing standards.  Perhaps we need a Bluetooth standard specifically designed for wireless audio?  (This is in progress, but there are other standards)
    • Physical limitations.  SNR.  How can we fill the physical near-field space with the best possible quality audio. 
    • Audio quality of small speakers.
    • Interference.  Interruptions in streaming
    • Battery lifetime: power consumption is too high. Improve the battery technology or just reduce the power consumed (e.g. optimize dead time).
    • Audio latency – must be able to be augmented and controlled.  Synchronize LFE, must be accurate. 
      • (Content originates in the device.  Both digital light and sound. Same performance wired or wireless.) 
      • Inability to control sync. Is there a good way to align the signals of different frequencies? (e.g. speakers and subwoofers.)
      • Additionally – video sync, gaming, TV/soundbar, and phone call latency. Does the source matter?
      • How much latency is too much? (e.g. 20 ms? 50 ms?)
      • Can we bind latency to physical human senses?
      • Codecs play a big role in latency issues. 
      • We need to address the latency at the system level
    • Lack of high frequency content due to the size of the speaker.  Speakers are designed to compensate for the delivery
    • Interoperability
      • speaker from one company won’t work with the speaker from another
      • System to system – need to work within your vehicle and within your home.
    • Bluetooth range could be improved
    • High quality profile for audio from Bluetooth ATA – multi-room and multi-speaker
    • Mic placement in the system. Capture (mics) and render (speakers) transducers don’t coexist well in a given device, let alone in distributed systems.
    • Dropouts, lost packets, dodgy buffering, and interference.
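One way to tie the latency questions in the list above to a physical sense is to convert a delay into the distance sound travels in that time. The sketch below is only an illustration: it assumes a speed of sound of about 343 m/s (dry air near 20 °C), and the function names are made up for this example rather than taken from any standard.

```python
# Assumption: speed of sound in dry air at roughly 20 degrees C.
SPEED_OF_SOUND_M_PER_S = 343.0


def latency_to_distance_m(latency_ms: float) -> float:
    """Return the distance sound travels during the given latency."""
    return SPEED_OF_SOUND_M_PER_S * latency_ms / 1000.0


def distance_to_latency_ms(distance_m: float) -> float:
    """Return the acoustic delay caused by the given path length."""
    return distance_m / SPEED_OF_SOUND_M_PER_S * 1000.0


if __name__ == "__main__":
    # The two candidate budgets mentioned in the notes.
    for ms in (20.0, 50.0):
        metres = latency_to_distance_m(ms)
        print(f"{ms:.0f} ms of latency ~ {metres:.2f} m of acoustic path")
```

Under this assumption, 20 ms corresponds to roughly 6.9 m of extra acoustic path and 50 ms to roughly 17 m, which gives a rough physical feel for the budgets being debated.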

Deficiencies of Vehicle

  • Voice capture is difficult in a noisy environment
  • Good playback is difficult in a noisy environment, especially for music with a wide dynamic range (e.g. classical). All use compression.
  • Bluetooth pairing’s a bitch
    • Dropping connection
    • Tapping in the code repeatedly
    • Only one device can connect at a time. We believe multi-device connection should be included in the new spec.
    • Bluetooth has different standards and users aren’t aware of the interoperability limitations.
  • Voice identification and isolation of each passenger in a vehicle
  • Microphone pickup.  There are no audio zones for passengers other than the driver. 
    • It’s difficult to hear the different passengers at different locations in the vehicle. 
  • As the car fills up with people, the experience of each user should not be compromised or interrupted.  
    • e.g. a couple of MEMS mic modules in the LED compartment overhead.
    • Add a pressure sensor in each seat to know when passengers are present. 
  • Easier problem to solve because we’re a closed system and not free space. 
  • Unique to vehicles:  Noise canceling, but this could extrapolate to home systems as well. 

Deficiencies of Home Systems

  • Speaker location / position can be anywhere.  Lots of freedom of speaker placement.  We have the freedom to make them sound like shit.
    • We recommend making sure each wireless speaker has a sensor and location detection for calibration.  
    • Speakers could also include that movement sensor so the speaker knows when it is being moved.  It should calibrate automatically
    • Multiple speakers need to have controllable latencies. This could be done through a master clock in the system, with each speaker self-reporting its location. Location can be measured inexpensively with an ultrasonic ping, i.e. crude sonar.
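The master-clock and ping idea in the last bullet can be sketched as follows. This is a minimal illustration under stated assumptions, not a proposed implementation: all nodes share one clock, the ping is received at the listening position, sound travels at about 343 m/s, and every name in the code is hypothetical.

```python
from typing import Dict

# Assumption: speed of sound in dry air at roughly 20 degrees C.
SPEED_OF_SOUND_M_PER_S = 343.0


def distance_from_ping_m(emit_time_s: float, arrival_time_s: float) -> float:
    """Estimate distance from a one-way ping timed on the shared master clock."""
    time_of_flight_s = arrival_time_s - emit_time_s
    if time_of_flight_s < 0:
        raise ValueError("arrival time precedes emit time; clocks not aligned?")
    return time_of_flight_s * SPEED_OF_SOUND_M_PER_S


def alignment_delays_ms(distances_m: Dict[str, float]) -> Dict[str, float]:
    """Return the extra delay per speaker so all sound arrives together.

    The farthest speaker gets zero added delay; nearer speakers wait
    for the difference in acoustic travel time.
    """
    farthest_m = max(distances_m.values())
    return {
        name: (farthest_m - distance_m) / SPEED_OF_SOUND_M_PER_S * 1000.0
        for name, distance_m in distances_m.items()
    }


if __name__ == "__main__":
    distances = {
        "left": distance_from_ping_m(0.0, 0.010),
        "right": distance_from_ping_m(0.0, 0.005),
    }
    print(alignment_delays_ms(distances))
```

A real system would also need to account for clock drift, reflections, and processing delay in each node, none of which this sketch models.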

Additional Notes

  • A new Bluetooth standard for audio is already in the works: 48 kHz, 24-bit.
  • Current Bluetooth audio uses SBC, the low-complexity subband codec (a.k.a. the “Shitty Bluetooth Codec”). Previously designed for voice only. Also a sub-group for hearing aids.
  • Manufacturers’ decision to follow the new standard will come down to cost.
  • Bluetooth may not be a solution for AR
  • (Refer to last year’s report for other standards.  Last year’s report was focused on smartphones.)
  • Why Dropouts?
    • Is this under the certification?
  •  Pass audio streams back and forth between devices of different types
  • Should we narrow to standards?  Or physical device?  Or output only?  Entertainment speakers? Can we narrow to wall powered?

Wireless Speaker Features

  • Cater to the preoccupied. It needs to be the path of least resistance: minimal steps, user friendly.
  • Dynamic calibration / adaptation – track people through the house
  • Room coordinates
  • Collecting environment data “real time”
    • Temperature
  • WiFi and cloud connectivity
  • Voice controlled
  • Interop with screen (embedded or separate)
  • Radar, map the room.  This needs to be dynamic.
    • More people in the room
    • AC comes on
    • Needs to be “trainable”. 
  • Acoustical drivers, current and voltage sensing for correction
  • Limited to Bluetooth sending back audio data stream.
  • In transport – we need communication between multiple devices.
  • Change design dynamically? 
  • Constellation – microphone / speaker combination.  Mics process the sound in the room then feedback through the speakers. 
  • Some DSPs can operate in a tight loop to fit a physical model better.
  • Cost? Start expensive and make sure it works right first, then reduce the cost. 
  • Convenience – this will drive the market
  • Very poor interoperability - If the standards were better, manufacturers wouldn’t have to create proprietary solutions. 
    • Some companies (OEMs) will license chipsets and protocols for ODMs to create devices that are compliant with their systems (e.g. Apple CarPlay, Android Auto, etc.)
    • The drawback is that the licensing is expensive and the legal terms are complicated (“Apple’s your Daddy”)
    • The benefit is that without the guidance by the OEM, there would be chaos at the ODM.
    • Is there a way to make speaker systems modular and easily upgradeable?
    • Some companies will force manufacturers (Android Auto) to be compliant with certain specs. 
  • Low cost.  Ability to decrease costs. 
  • Categories  (all include input back)
    • Wireless, but powered
    • Wireless (no power cable)
    • Hybrid system (Some wired speakers, some wireless, eg. vehicles)
  • Who’s controlling the experience? Users control the experience at home, so why is the car any different?
    • Should there be a standard for audio and speaker systems in vehicles?
      • Reference design houses are going after the hardware (i.e. QC, Intel)
      • Software standards won’t make sense as the cars are designed 7 years out so standards will be outdated by the time the car hits the market.  Updates would be painful
      • Every two years some modules are replaced. 
    • Can we create a standard for automotive and home for hardware?  Can we test this in an objective way? (i.e. specific, measurable parameters)
    • Companies would need to see an advantage to go to a common spec. This would have to be profitable for the manufacturers in some way. Apple, Google, and the car manufacturers don’t want to be a “service” provider only for infotainment. They want to own the whole system and provide the experience.
    • We need to influence the Googles and Apples. They are controlling the house and vehicle experience. Can we influence the Fords and GMs instead?
      • Apple and Google are NOT doing a good job defining the speaker systems.
      • The front end works well, but the backend needs improvement and standards.
      • Testing standard and hardware protocol standard.  We need a standard to define how to implement mics and speakers in a car.  Right now it sucks because the Apple and Google standards are insufficient.  We have trended away from OnStar.
      • The reason Apple and Google haven’t done this is because they’re more focused on the “bigger” IoT system
      • The brands won’t succeed if the experience doesn’t improve.
    • What does the consumer want in the wireless speaker system?  The consumer wants the familiar, “fruity” experience in their car?
      • Consumers may purchase cars based on technology and experience.  (If you can’t get your texts in car, you may not buy it.)
      • Consumers want Switzerland – brand experience through performance measurement. 
      • Consumers have a minimum bar that they will demand.  What is the minimum bar?  Can we define this?  (Right now the emperor has no clothes).  With no established bar, we have an opportunity to define it.
      • We need to connect the input, output, and sensor hubs
      • We need a system that will perform and a spec that enforces it through a certification
    • What is the hub?  Cell phone, speaker, or other?
    • We need a standard to profile the systems and objectively quantify the performance.  We don’t need a single profile, we can have different classification levels based on price / cost and performance targets.
    • Perhaps we have classifications for ingredients, not specifically how they all fit together
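The idea of several classification levels tied to objective, measurable targets could look something like the sketch below. Every tier name and threshold here is a placeholder invented for illustration; the group did not define any values, and a real certification would have to.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    """Measured (or required) performance figures for one device."""
    thd_n_percent: float  # lower is better
    max_spl_db: float     # higher is better
    latency_ms: float     # lower is better


# HYPOTHETICAL tiers and limits, ordered from most to least demanding.
TIERS = [
    ("premium", Measurement(thd_n_percent=0.1, max_spl_db=100.0, latency_ms=20.0)),
    ("standard", Measurement(thd_n_percent=1.0, max_spl_db=90.0, latency_ms=50.0)),
    ("basic", Measurement(thd_n_percent=5.0, max_spl_db=80.0, latency_ms=100.0)),
]


def classify(measured: Measurement) -> str:
    """Return the most demanding tier whose limits the device meets."""
    for name, limit in TIERS:
        if (
            measured.thd_n_percent <= limit.thd_n_percent
            and measured.max_spl_db >= limit.max_spl_db
            and measured.latency_ms <= limit.latency_ms
        ):
            return name
    return "unclassified"


if __name__ == "__main__":
    print(classify(Measurement(thd_n_percent=0.5, max_spl_db=95.0, latency_ms=30.0)))
```

Keeping the tiers as data rather than code would let price and performance targets be revised without touching the classification logic.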

We can’t address today:

  • Effective sound field coverage for capture and render
  • Extensibility for protocols

Expanded solution description

Defining the System:  Intelligent speaker connectivity

  • Do we already have what we need? Yes, it’s happening at a small local level in a small closed loop. This cannot be processed in the cloud because of the low-latency requirement.
  • We’re defining the interaction between speakers, not necessarily the performance of a single speaker.
  • The microphone itself isn’t a wireless node.  The actual node can be selected by the designer. 
  • Capabilities are reported at the node level
  • System integrator designs the device which defines the capabilities that are reported
  • The node must advertise its capabilities. 
  • We MUST have inputs back into the system. We need to have access to this information; it must be broadcast somewhere.
  • We may have a higher level system that can sync with any device (phone, IoT system, etc)
  • We need to have a minimal standard for this audio quality. 
  • All smart amps have an ADC feedback loop (including microphones) 

Game changer summary

  • Input MUST inform the output in some meaningful way and not just limited to speech. 
  • Context aware speakers.   
  • Integrate an input channel for EVERY speaker that influences the playback.  There must be input and not just a focus on the output playback only. 
  • Speaker networks need to self report. 
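As one concrete reading of "input informs the output", a speaker could raise its playback gain as the ambient level picked up by its microphone rises. The sketch below is an assumption-laden illustration: samples are floats with full scale at 1.0, and the thresholds and maximum boost are arbitrary placeholder values.

```python
import math
from typing import Sequence


def rms_dbfs(samples: Sequence[float]) -> float:
    """Return the RMS level of the samples in dB relative to full scale (1.0)."""
    if not samples:
        raise ValueError("need at least one sample")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0.0:
        return float("-inf")  # digital silence
    # 10*log10(mean square) equals 20*log10(RMS).
    return 10.0 * math.log10(mean_square)


def playback_gain_db(
    ambient_dbfs: float,
    quiet_dbfs: float = -60.0,   # placeholder: below this, no boost
    loud_dbfs: float = -20.0,    # placeholder: above this, full boost
    max_boost_db: float = 12.0,  # placeholder ceiling
) -> float:
    """Map ambient level to a playback boost with a clamped linear ramp."""
    if ambient_dbfs <= quiet_dbfs:
        return 0.0
    if ambient_dbfs >= loud_dbfs:
        return max_boost_db
    fraction = (ambient_dbfs - quiet_dbfs) / (loud_dbfs - quiet_dbfs)
    return fraction * max_boost_db


if __name__ == "__main__":
    mic_block = [0.01, -0.01, 0.01, -0.01]
    level = rms_dbfs(mic_block)
    print(f"ambient {level:.1f} dBFS -> boost {playback_gain_db(level):.1f} dB")
```

A practical version would have to separate the speaker's own playback from the ambient signal and smooth the gain over time; neither is attempted here.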

Framework

  • An unwired physical space (room, outside, etc) beyond the default compensation with an undefined, multiple, asynchronous number of channels including playback and listening to the environment. 
  • We will focus on vehicles as a closed system. We believe the vehicle challenges are easier to solve, and the solutions can be a starting place for larger, open spaces.
  • Bandwidth would be asynchronous.   
  • “Unwired” is audio transmission.  Improving quality and fidelity within the space. 
  • What do we have today and what needs improving? We have to start with what’s on the market now; many products already have tuning apps. Self sensing device. Receive only. Needs to gather data. Improve the delivery and quality.
  • Is Bluetooth adequate? No. There are still issues with the transport protocol and quality.
  • Ear buds
    • Batteries need charging
    • Step on them and break them
    • Lose them
    • Noise cancelling requires ultra low latency
  • Excluded from our framework
    • Earbuds, headphones – perhaps our speaker focus can extrapolate to earbuds
    • Active noise canceling

What to standardize

  • We don’t suggest standardizing anything that offers opportunity for competitive advantage.  Focus on the boring areas that waste time when multiple companies have to implement the boring things.
  • Two areas to standardize
    • Transport
      • (Refer to last year’s table and report)
    • Device Capability and Configuration -
      • Mechanism for publishing device capabilities that affect management in a larger and/or multi device system
        • Playback / Output
          • Channel configurations
          • System identity and capabilities
          • Types of bandwidth
            • Ultrasonic
            • LFE capability
          • SPL – Weighted average at half a meter
          • Latency
        • Sense / Input
          • Microphone
            • Number of mics
            • Relative position
            • Preprocessed or not
            • Certification (AVS, Google, etc)
            • Bandwidth – which modes
            • Sensitivity
            • Voice or self activated
            • Frequency
            • Latency
          • Other sensors
            • Temperature
            • Location (absolute, relative)
            • Orientation
            • Device Movement
            • 3D Sensing (Room measurement, people tracking)
            • Light sensing
            • Output load / impedance sensing
      • Codec compatibility – an inherent requirement
      • Testing standard for performance
        • THD+N, SPL, FR,
        • Optional certifications
        • Reliability of the speaker
      • Optional features – security,
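The "mechanism for publishing device capabilities" outlined above could be prototyped as a small, serializable descriptor that each node advertises. The sketch below covers only a subset of the listed fields, and the field names, types, and JSON encoding are all assumptions made for illustration; the report does not define a schema or wire format.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class PlaybackCapabilities:
    channel_configuration: str   # e.g. "mono" or "stereo" (hypothetical values)
    ultrasonic: bool
    lfe: bool
    spl_db_at_half_meter: float
    latency_ms: float


@dataclass
class MicrophoneCapabilities:
    count: int
    preprocessed: bool
    certifications: List[str] = field(default_factory=list)
    latency_ms: Optional[float] = None


@dataclass
class NodeCapabilities:
    node_id: str
    playback: Optional[PlaybackCapabilities] = None
    microphones: Optional[MicrophoneCapabilities] = None
    other_sensors: List[str] = field(default_factory=list)
    codecs: List[str] = field(default_factory=list)

    def to_json(self) -> str:
        """Serialize the descriptor for advertisement over some transport."""
        return json.dumps(asdict(self), sort_keys=True)


def from_json(text: str) -> NodeCapabilities:
    """Rebuild a descriptor from an advertised JSON string."""
    raw = json.loads(text)
    playback = raw.get("playback")
    mics = raw.get("microphones")
    return NodeCapabilities(
        node_id=raw["node_id"],
        playback=PlaybackCapabilities(**playback) if playback else None,
        microphones=MicrophoneCapabilities(**mics) if mics else None,
        other_sensors=list(raw.get("other_sensors", [])),
        codecs=list(raw.get("codecs", [])),
    )


if __name__ == "__main__":
    node = NodeCapabilities(
        node_id="kitchen-speaker",
        playback=PlaybackCapabilities("mono", False, False, 85.0, 30.0),
        microphones=MicrophoneCapabilities(count=2, preprocessed=True),
        other_sensors=["temperature", "orientation"],
        codecs=["SBC"],
    )
    print(node.to_json())
```

Optional sections let an output-only or input-only node advertise just what it has, which fits the note that capabilities are reported at the node level.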

Items from the brainstorming lists that the group thought were worth reporting

Use case

  • Overall case:  Content delivered via the LAN to device then device to speakers.
  • “Dumb use case” = Single speaker system playback audio quality; multiple speaker system playback  (Speaker system = single enclosure with one or more drivers).  No positional information.  No input back into the system.
  • Listeners fatigue when listening to poor quality music.
  • Smartphone with earbuds?
  • Hearing the monster jumping out on Grandma
  • On the phone with grandma
  • Music adjusting volume when cooking
  • Controlling a system with speech
  • Someone who wants to network their home and use the speakers for multiple purposes.  IoT security
    • All mics can be enabled for recording.
    • Speakers can detect intruders and network with the security systems.
    • The speakers can let you know when someone’s in the home.
    • The user should be able to turn these alerts on and off.
  • Situations where you don’t want to touch the phone, speaker, device, etc
  • Moving from car to home and room to room.  Audio follows
  • We don’t want to have to do a prescribed dance.

Other reference material:

Last year’s report: https://www.projectbarbq.com/reports/bbq16/bbq16r7.htm

  • From last year’s report: 

Requirements                          WiFi    5G      BT      BLE
Low Power                             no      ?       maybe   yes
Reliable audio communication          ?       ?       no      ?
Controllable latency                  no      ?       no      ?
Geometry/Location reporting           no      no      no      no
Low cost                              no      ?       yes     yes
Non-proprietary                       yes     yes     yes     yes
Discoverability                       yes     yes     yes     yes
Self configuring network              yes     ?       no      no
Nearfield inter-communication         yes     yes     yes     yes
Hubless multi peer connectivity       yes     ?       yes     yes
Multi-bidirectional audio channels    yes     yes     ?       ?
Scalable audio channel count          yes     yes     no      ?
Versatile audio formats               yes     yes     yes     ?
Command & Control                     yes     yes     yes     yes
Highly accurate synchronization       yes     ?       no      ?
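The requirements table from last year's report can also be read as data. The sketch below transcribes the ratings (column order WiFi, 5G, BT, BLE) and filters for transports rated "yes" on a chosen set of requirements. The function name and the strict "yes only" rule are assumptions for illustration; "?" and "maybe" are deliberately treated as not meeting a requirement.

```python
from typing import Dict, List, Sequence, Tuple

TRANSPORTS: Tuple[str, ...] = ("WiFi", "5G", "BT", "BLE")

# Ratings transcribed from the table, one tuple per requirement,
# in the same order as TRANSPORTS.
REQUIREMENTS: Dict[str, Tuple[str, str, str, str]] = {
    "Low power": ("no", "?", "maybe", "yes"),
    "Reliable audio communication": ("?", "?", "no", "?"),
    "Controllable latency": ("no", "?", "no", "?"),
    "Geometry/Location reporting": ("no", "no", "no", "no"),
    "Low cost": ("no", "?", "yes", "yes"),
    "Non-proprietary": ("yes", "yes", "yes", "yes"),
    "Discoverability": ("yes", "yes", "yes", "yes"),
    "Self configuring network": ("yes", "?", "no", "no"),
    "Nearfield inter-communication": ("yes", "yes", "yes", "yes"),
    "Hubless multi peer connectivity": ("yes", "?", "yes", "yes"),
    "Multi-bidirectional audio channels": ("yes", "yes", "?", "?"),
    "Scalable audio channel count": ("yes", "yes", "no", "?"),
    "Versatile audio formats": ("yes", "yes", "yes", "?"),
    "Command & Control": ("yes", "yes", "yes", "yes"),
    "Highly accurate synchronization": ("yes", "?", "no", "?"),
}


def transports_meeting(required: Sequence[str]) -> List[str]:
    """Return the transports rated "yes" for every listed requirement."""
    return [
        transport
        for index, transport in enumerate(TRANSPORTS)
        if all(REQUIREMENTS[name][index] == "yes" for name in required)
    ]


if __name__ == "__main__":
    print(transports_meeting(["Low power", "Low cost"]))
    print(transports_meeting(["Controllable latency"]))
```

Read this way, the table shows that no listed transport is rated "yes" for controllable latency or for geometry/location reporting, which lines up with the gaps this year's group focused on.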

