The Eighteenth Annual Interactive Audio Conference
PROJECT BAR-B-Q 2013
Group Report: Ubiquitous Networked Audio
Participants: A.K.A. "The Kissingers"
Ethan Schwartz, Conexant
Tomer Elbaz, Waves
Jack Joseph Puig, Waves
Sergio Liberman, Freescale
Chris Grigg, MMA
Desheet Mehta, Beats By Dre
Doug Gabel, Intel
Peter Frith, Linn
Alex Kovacs, Cirrus
Med Dyer, Harman
Facilitator: Doug Peeler, Dell
Currently there are multiple, incompatible methods for connecting audio devices over a network.
Below we identify shortcomings in each of the existing systems. Because of these shortcomings, none of them can be recommended as a preferred standard going forward.
We recommend that a new standard be developed (or that one or more of the existing standards be significantly revised) to incorporate the additional functionality detailed below, enabling the nascent, rich marketplace that is waiting for interoperable, connected home audio input and output devices.
We recommend that a standardized solution be created to enable all networked audio devices to interoperate. Our recommendation requires support for both audio input and output, reflecting our belief that Skype-style communication is becoming as important in the home as playback-only applications. To accomplish this we propose to:
Expanded Problem Statement
As of BBQ 2013, connectivity of audio devices, whether portable or static, is becoming the norm over wireless or networked connections; the RCA jack and 3.5mm headphone connector are finally, if slowly, being consigned to the junk drawer.
However, the (perhaps inevitable) development of multiple connection methods and standards has led to an environment where the leading commercially available connectivity solutions do not interoperate, and none of the commonly available systems satisfies all application needs.
This leads to the unsatisfactory situation where a user might have up to four sets of compromised loudspeakers in the living room: one each for the TV, hi-fi, dock, and speakerphone, instead of one set of networked, high-quality speakers that any of these four 'applications' can use.
Furthermore, these audio systems only consider playback. Users are beginning to regard IP telephony, and with it full-audio-bandwidth communication, as the norm; future networked audio resources must therefore support audio capture as well as playback.
Expanded Solution Discussion
In Practice Today
Apple devices will 'AirPlay' only to Apple-compliant renderers, and then only at 16-bit/44.1 kHz, so the user cannot experience studio-quality 24-bit/96 kHz playback.
Android device users in the same household must have their own solutions; they cannot use the AirPlay devices of their friends or family. Some Android devices are beginning to support higher-resolution audio, but when such a user wants to watch a YouTube video, he finds he cannot stream the audio to his networked high-quality speakers while the video plays on his device or streams to his monitor/TV display.
And none of the solutions has considered the emerging desire to capture audio in the home to support IP telephony. Who wants to hold a handset to their ear while conducting a family telepresence call to Grandma in Australia using the big-screen TV display and the high-quality networked playback speakers?
What is missing are standards to make all the hardware solutions talk nicely to each other. The closest thing we have to a standard appears to be the UPnP discovery solution combined with the DLNA (Digital Living Network Alliance) set of recommendations.
UPnP defines the three primary components of today's audio distribution devices and DLNA describes how they should interact. The UPnP specification describes an audio system comprising a connected Control device, Content, and a playback (Render) device. This system can conveniently be expressed as a 'triangle' of connections with the three functions described above at its corners:
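The corner devices of the triangle find each other via UPnP's SSDP multicast discovery. As a minimal sketch (Python; the helper name and defaults are illustrative, not from any spec), this is the M-SEARCH request a Control point would multicast to locate render devices:

```python
# Sketch: composing an SSDP M-SEARCH request, the discovery step a UPnP
# Control point uses to find MediaRenderer devices on the local network.
SSDP_ADDR = "239.255.255.250"   # UPnP multicast group
SSDP_PORT = 1900

def make_msearch(search_target: str, mx: int = 2) -> str:
    """Build the SSDP discovery request for a given device/service type."""
    return (
        "M-SEARCH * HTTP/1.1\r\n"
        f"HOST: {SSDP_ADDR}:{SSDP_PORT}\r\n"
        'MAN: "ssdp:discover"\r\n'
        f"MX: {mx}\r\n"                      # max seconds devices may delay replies
        f"ST: {search_target}\r\n"
        "\r\n"
    )

msg = make_msearch("urn:schemas-upnp-org:device:MediaRenderer:1")
```

In practice this string is sent over a UDP socket to the multicast group, and each responding device's reply carries the URL of its description document.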
A capture device is any source of audio, primarily a microphone and its associated digitizer. Data from a capture device becomes Content, available to multiple applications potentially at the same time.
A render device is any means of playing Content audio to the end user, primarily a DAC, amplifier and speaker(s). There may be one to many render devices on a particular network. The Control may logically group the render devices by function or proximity.
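The Control's logical grouping of render devices by function or proximity can be pictured as a simple room-keyed index (an illustrative sketch; the class and function names are invented for this example):

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Renderer:
    name: str
    room: str   # proximity tag the Control uses for grouping

def group_by_room(renderers):
    """Logically group render devices by location, as a Control might."""
    groups = defaultdict(list)
    for r in renderers:
        groups[r.room].append(r.name)
    return dict(groups)

groups = group_by_room([
    Renderer("TV soundbar", "living room"),
    Renderer("Hi-fi left", "living room"),
    Renderer("Kitchen speaker", "kitchen"),
])
```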
The Current Network State
Examples of "self-contained" and currently non-interoperable triangles are:
Clearly there are redundant Render/Capture devices in the same room. Render/Capture devices from one system triangle do not currently communicate with other triangles. It may make more sense to play back content on another system triangle, but currently there is no easy method to route the stream. The end result is multiple duplicated devices, each of compromised quality.
Recommendations for Solutions
We would therefore like to influence the owners of the solution specifications (i.e. DLNA) and the hardware manufacturers. In the case of DLNA we would like to motivate creation of a DLNA 2.0 specification with extensions to address the problems described above.
Recommendations for Next Generation Networked Audio
1.0 Support audio capture
a. The majority of today's CE devices can capture audio. These devices are on the network, but have no mechanism to stream the captured content to rendering devices.
b. In the near future we anticipate the emergence of applications such as telepresence, voice control, and VoIP telephony which require multiple audio capture devices. These might be built into existing devices or added as standalone network devices.
c. We anticipate microphones all around us will stream onto a network and be available simultaneously to multiple applications.
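One way to picture a networked microphone serving several applications at once is a simple fan-out, where every subscribed application receives its own copy of each captured buffer (an illustrative sketch, not an existing API):

```python
class CaptureSource:
    """A microphone-like source that fans captured buffers out to
    every subscribed application simultaneously."""
    def __init__(self):
        self.subscribers = []

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def on_buffer(self, samples):
        for cb in self.subscribers:   # each app sees the same capture data
            cb(samples)

mic = CaptureSource()
telephony, recorder = [], []          # two apps consuming the same mic
mic.subscribe(telephony.extend)
mic.subscribe(recorder.extend)
mic.on_buffer([0.1, -0.2, 0.3])
```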
2.0 Support for both ‘push’ and ‘pull’ streaming
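'Push' and 'pull' differ in which side drives the transfer: in push mode the source sends buffers to the renderer as they are produced, while in pull mode the renderer fetches data from the content server at its own pace. A toy sketch of the two control flows (class names are illustrative):

```python
class Content:
    def __init__(self, frames):
        self.frames = list(frames)
    def read(self, n):                 # pull: the renderer asks for data
        out, self.frames = self.frames[:n], self.frames[n:]
        return out

class Renderer:
    def __init__(self):
        self.played = []
    def receive(self, frames):         # push: the source delivers data
        self.played.extend(frames)

# Push: the source drives the transfer.
push_renderer = Renderer()
push_renderer.receive(["f1", "f2"])

# Pull: the renderer drives the transfer from the content server.
pull_renderer = Renderer()
src = Content(["f1", "f2", "f3"])
while chunk := src.read(2):
    pull_renderer.receive(chunk)
```

Pull suits buffered playback where the renderer controls its own timing; push suits live sources such as capture streams, where data must flow as it is produced.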
3.0 Support for ‘split destination’ streaming
a. Currently multimedia streams can only be sent from a source to a single renderer, for example a TV. In the future it is desirable to be able to send the video component to a display screen while simultaneously sending the audio part of the program to a separate audio rendering device (speaker).
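Split-destination streaming amounts to demultiplexing a program into its elementary streams and routing each to a different renderer. A minimal sketch (the function and stream names are invented for illustration; lists stand in for network destinations):

```python
def split_route(program, routes):
    """Route each elementary stream of a program to its own renderer.
    `program` maps stream type -> payload; `routes` maps stream type ->
    a destination standing in for a networked render device."""
    for kind, payload in program.items():
        routes[kind].append(payload)

display, speakers = [], []
split_route(
    {"video": "h264-frames", "audio": "aac-frames"},
    {"video": display, "audio": speakers},   # video to the TV, audio to the hi-fi
)
```

A real implementation would also need the synchronization described in item 6.0, so that the separately routed audio and video stay in lip-sync.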
4.0 Local tracklist storage
a. The current solutions require the content device to provide a single track, which gets rendered. When the track finishes, the controller must instruct playback of the next track. If the controller has gone to sleep (flat batteries, for example), playback stops.
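If the renderer instead holds the whole tracklist locally, it can advance on its own with no controller round-trip. A minimal sketch of that behavior (illustrative class, not part of any current spec):

```python
class QueueingRenderer:
    """Renderer that stores the tracklist locally so playback can
    continue even when the controller is asleep."""
    def __init__(self):
        self.queue = []
        self.played = []

    def load_tracklist(self, tracks):
        """Controller hands over the whole list once, up front."""
        self.queue = list(tracks)

    def on_track_finished(self):
        """Advance autonomously; no controller needed."""
        if self.queue:
            self.played.append(self.queue.pop(0))

r = QueueingRenderer()
r.load_tracklist(["track1", "track2"])
r.on_track_finished()
r.on_track_finished()
```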
5.0 Volume control support
a. A future UPnP/DLNA specification needs to track and allow networked volume control.
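UPnP AV already defines a RenderingControl service whose SetVolume action is invoked as a SOAP call over HTTP. The sketch below builds such an envelope (composition only; sending it is an ordinary HTTP POST to the renderer's control URL):

```python
def set_volume_soap(instance_id: int, volume: int, channel: str = "Master") -> str:
    """Build the SOAP body for a UPnP RenderingControl:1 SetVolume action."""
    service = "urn:schemas-upnp-org:service:RenderingControl:1"
    return (
        '<?xml version="1.0"?>\n'
        '<s:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/"\n'
        '            s:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">\n'
        "  <s:Body>\n"
        f'    <u:SetVolume xmlns:u="{service}">\n'
        f"      <InstanceID>{instance_id}</InstanceID>\n"
        f"      <Channel>{channel}</Channel>\n"
        f"      <DesiredVolume>{volume}</DesiredVolume>\n"
        "    </u:SetVolume>\n"
        "  </s:Body>\n"
        "</s:Envelope>\n"
    )

body = set_volume_soap(0, 42)
```

What the existing spec lacks is not this per-device action but the tracking of volume state across a whole group of renderers, which is what the recommendation above asks for.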
6.0 Multi channel/room synchronized playback
a. Future DLNA spec should support synchronized playback when multiple rendering devices are available.
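A common approach to multi-room synchronization is to distribute a start timestamp on a shared network clock and have each renderer delay locally until that instant, compensating for its own pipeline latency. The arithmetic can be sketched as (illustrative function; times in seconds on the shared clock):

```python
def local_wait(start_time_shared: float, now_shared: float,
               device_latency: float) -> float:
    """How long this renderer should wait before feeding its DAC so that
    sound emerges at `start_time_shared` on the shared network clock.
    Higher-latency devices start feeding earlier, so all rooms align."""
    return max(0.0, (start_time_shared - now_shared) - device_latency)

# Two renderers, same target start time, different pipeline latencies:
w1 = local_wait(start_time_shared=100.0, now_shared=99.0, device_latency=0.25)
w2 = local_wait(start_time_shared=100.0, now_shared=99.0, device_latency=0.10)
```

The hard part in practice is the shared clock itself (e.g. via NTP- or PTP-style time distribution), which is exactly the kind of mechanism a future spec would need to mandate.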
7.0 Non audio device support (or at least tolerance!)
a. It is anticipated that networked non-audio devices such as security and lighting controls, and many more, will connect to home networks in the future. The new specification must support (or tolerate) their discovery mechanism, and potentially support the streaming specifications of these non-audio devices.
Appendix: General notes captured during brainstorming activity