| The Twentieth Annual Interactive Audio Conference PROJECT BAR-B-Q 2015 |
Group Report:
Audio Of Things: Audio Features
and Security |
| Participants: A.K.A. "Audio Of Things" | |
David Berol, Knowles |
Konstantin Merkher, CEVA, Inc. |
| Avi Keren, DSP Group | Andy Lambrecht, Analog Devices |
| Mikko Suvanto, Akustica | Whit Hutson, Conexant |
| Trausti Thormundsson, Conexant | Jack Joseph Puig, Waves |
| Facilitator: Linda Law, Project Bar-B-Q | |
Problem Statement

The Internet of Things (IoT) has been a catch-all for many new technologies and products that are becoming increasingly connected. Many of these technologies address voice-controlled (and, more generally, audio- or sound-controlled) interaction with home and personal systems for home automation or other lifestyle- or safety-enhancing purposes. There are already a number of devices in our lives that are capable of always listening, and this will only increase as more everyday items become part of the IoT. As IoT devices proliferate, it is expected that voice coverage within the home will increase. This raises concerns regarding safety and security. The audio aspects of the IoT, the Audio of Things (AoT), need careful consideration for security, privacy, interoperability and collaboration/cooperation.

Adding to security concerns is the increasing reliance on cloud processing for voice and audio. There are well-known benefits to performing significant portions of audio processing in the cloud. The business models of Google, Apple, and Amazon steer them to perform much of this processing in the cloud. Each of these companies has introduced one or more devices that have the potential to become the centerpiece of a home automation solution. It is believed that there may be opportunities for alternative solutions that partition the audio problem such that sensitive information can be contained within the home.

Opportunity Statement

The IoT is rapidly evolving and there are numerous opportunities available:
AoT Background

The IoT is an expansive concept that encompasses the interconnection of physical objects in industrial, commercial, and residential environments for the purpose of collecting and exchanging information. A key aspect of the residential IoT is the constant sensing of the environment, such that data collected from numerous sensors can be processed to determine the specific situation and perform desired actions automatically as a result. Sound sensors (microphones) are key elements in the residential IoT and are capable of “always listening”. The ability to be always listening raises privacy concerns among many. Questions that were raised and addressed include:
Looking 5-10 years into the future, processor and memory technologies will improve predictably and, when coupled with new and improved sensors embedded in everyday objects, will enable highly intelligent homes. Ten years from now, when one walks into the house, it is expected that the arrival will be anticipated and/or recognized and the environment will be automatically set to known preferences. Patterns and preferences will be learned for each resident and actions performed automatically; learned compromises must take place when multiple residents with differing preferences are present.

AoT Issues, Elements and Solutions

The AoT workgroup looked at a number of general questions and issues as well as the elements of a residential AoT system.

Do sensors within the home generate a complete audio stream that can be mined in the cloud? Or, is a decision made locally and sent point-to-point?
What happens when access to the cloud is limited or nonexistent?
Security around Audio Scene Monitoring
There are two key areas of concern: Command and Control & Information Privacy.
Elements of AoT Audio Subsystem

The AoT workgroup discussed the elements of an AoT audio subsystem and what improvements are needed over the next 5-10 years to achieve the vision of a highly intelligent home.

Microphone, Source Isolation

Large improvements are needed in far field source selection. Far field source selection is a much different problem than speaking directly into the microphone of a smartphone. Multiple microphones coupled with algorithms such as beamforming can provide the necessary improvement. Although smartphones aren’t currently optimized for far field use, the audio picked up by a smartphone can be processed together with audio picked up by other microphones in a room to isolate the desired audio source. Improvements in fidelity (lower noise, better THD+N) will also contribute.

Keyword Detector

A number of improvements are recommended for the keyword detector. The keyword detector is a key function that is always listening for select words and phrases (“OK Google”, “Alexa”, etc.) in order to switch to a more active mode in which action can be taken. Improvements and recommendations for the keyword detector include:
Speaker System

AoT is not just about voice/audio input. The highly intelligent home will need to interact with residents. Recommended improvements and capabilities include:
Speech Synthesis

Improvements in speech synthesis are needed so that it sounds natural and not robotic.

Interoperable Communication and Connectivity

Improved standards are needed to allow for interoperable connection and communication of various sensor types. Communication standards need to allow for cooperation and collaboration among sensors (sensor fusion). For example, in a room with a large number of microphones, a certain subset of microphones might be in the best position to isolate a source. Furthermore, additional sensor types might be used to determine the stress level of a resident in order to provide additional context.

Converters (Analog-to-Digital converters and Digital-to-Analog converters)

Lower price and lower power are needed for use in ultra-low-power objects.

Input Sources

Input sources include residents, visitors, sounds within the house, and sound generating devices such as TVs. These sources are what they are. Knowledge of the audio generated by sources such as the TV can be used to remove that audio source in the process of source selection. Similarly, knowledge of audio generated by sources such as TVs and computer games can be used in sound scene analysis to distinguish between gaming activity and actual in-home situations that could require assistance. Metadata added to TV audio signals could be used to communicate with the AoT system.

Noise/Distractor Elimination; Source Isolation

See the “Microphone, Source Isolation” element for specific improvements and recommendations for far field noise suppression/source selection. A privacy and trust issue needs to be addressed: residents and visitors need to be comfortable that the AoT system does not address the cocktail party effect in such a way that all conversations are monitored and possibly recorded. The system does, however, need the capability of detecting owners’ commands in a cocktail party environment.
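As an illustration of the Input Sources point that known TV audio can be removed during source selection, the following is a minimal sketch, not a method described by the workgroup. It assumes the AoT system has a sample-aligned copy of the TV signal and uses a normalized least-mean-squares (NLMS) adaptive filter, a standard technique swapped in here for illustration. The function names, tap count, step size, and the simulated room are all hypothetical.

```python
import math
import random


def nlms_cancel(mic, reference, taps=8, mu=0.2, eps=1e-8):
    """Subtract an adaptively filtered copy of `reference` from `mic`.

    Assumption: `reference` is the known TV audio, sample-aligned with
    the microphone signal. Returns the residual (mic minus the estimated
    TV leakage), one sample per input sample.
    """
    weights = [0.0] * taps   # current estimate of the TV-to-mic path
    history = [0.0] * taps   # most recent reference samples, newest first
    residual = []
    for mic_sample, ref_sample in zip(mic, reference):
        history = [ref_sample] + history[:-1]
        estimate = sum(w * x for w, x in zip(weights, history))
        error = mic_sample - estimate
        # Normalizing by the reference energy keeps the step size stable.
        energy = sum(x * x for x in history) + eps
        step = mu * error / energy
        weights = [w + step * x for w, x in zip(weights, history)]
        residual.append(error)
    return residual


def simulate_room(n=4000, seed=0):
    """Hypothetical scene: white-noise 'TV' audio reaching the microphone
    through a short two-tap path, plus a quiet 200 Hz 'voice' tone."""
    rng = random.Random(seed)
    tv = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    voice = [0.1 * math.sin(2 * math.pi * 200 * i / 16000) for i in range(n)]
    mic = []
    for i in range(n):
        leak = 0.6 * (tv[i - 2] if i >= 2 else 0.0)
        leak += 0.3 * (tv[i - 3] if i >= 3 else 0.0)
        mic.append(voice[i] + leak)
    return tv, voice, mic
```

Calling `nlms_cancel(mic, tv)` on the simulated signals returns a residual that, once the filter has adapted, should track the voice tone far more closely than the raw microphone signal does. A real room would need many more taps, and the reference would have to be delivered to the AoT system with known timing, which is one reason the interoperability standards discussed above matter.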
Speaker identification/authentication/verification

Improved price/performance is expected from existing products providing this capability.

Sensors, sensor fusion

New sensor types are expected to emerge and current sensor types are expected to evolve in performance and capability. The price/performance ratio is expected to improve over time. The ability to combine data from disparate sensor types (sensor fusion) to improve scene analysis and context awareness will improve as residential AI improves.

Digital Signal Processing/Processor (DSP)

Ultra-low-power DSPs will allow more objects to become audio enabled. The price/performance ratio is expected to improve along typical timelines. These improvements will allow for better local command recognition. In some cases, analog signal processing may be used in combination with DSP to optimize a product’s performance.

Application processor

Application processor performance is expected to improve along a predictable trajectory. A more open API is recommended to facilitate the addition of external capabilities. An open question was raised: “How does vertical integration impact the evolution of AoT?” Companies such as Apple, and to a lesser degree Samsung, are becoming increasingly vertically integrated, developing their own processors, OS, and end products internally. How does this impact, positively or negatively, the evolution of AoT?

Storage/memory

Memory density is expected to improve along predictable trajectories. Some architectural improvements were identified that can facilitate some aspects of AoT. When external memory devices are needed, cost and/or performance can be greatly impacted. A choice often needs to be made between slower, pin-efficient memory devices and faster, high pin-count memory devices. Higher pin counts typically translate to higher power. A fast, narrow interface is recommended to provide improved support for embedded real-time audio processing.
AI/decision maker

Great improvements in Artificial Intelligence (AI) are expected over the next several years. An AI that learns quickly and easily is needed. Distribution of the AI across multiple devices in the home may be necessary. Collaboration between devices that are part of the home and devices that are part of the person (e.g., smartphones) may be needed to eliminate conflicting or even counteractive actions. The system should take into account that multiple objects/devices contribute to decisions. The system needs to be adaptive so that it learns the habits and needs of individual residents.

Appliance Control interfaces

IoT is expected to evolve such that control interfaces with common appliances will become standardized.

System security

Given the ever-growing threat of malicious attacks, coupled with the sensitive private information and activity that is present in virtually every home, system security carries extreme importance. There is tremendous opportunity here for the creation of technology and products that can isolate and protect the home. Key aspects of IoT security that need to be addressed are:
Power source, power backup

IoT devices are largely expected to be powered by something other than line power. Objects that are plugged in today (e.g., a toaster) will continue to be line powered, but objects that are unpowered today are likely to be powered by batteries or alternate energy sources such as solar or energy harvesting. Batteries are expected to improve along predictable paths. Issues with battery level or energy sources need to result in resident-friendly alerts.

Access to cloud

As more functionality relies on connectivity to the cloud, more importance is placed on reliable access to the cloud. Improvements to continuity of access are needed. Access bandwidth is expected to improve over time. A backup access method (e.g., the mobile phone network) would be attractive to some owners. It is recommended that the system retain some degree of capability when access to the cloud is not available. This requires some amount of local capability that does not rely on the cloud. Additionally, manual override is recommended as a failsafe.

Cloud Based Processing, Trust and Security

Given the sensitivity of the voice/audio data that can be collected and then processed and stored in the cloud, there is great opportunity to create trust in how this data will be handled. Existing businesses can create differentiation in the areas of trust and security. If existing businesses don’t adequately address this, it is expected that there could be significant opportunity for businesses that create capability based on a transparency and trust model. Recommendations here include:
It is possible that sensitivity to sending personal information to the cloud depends on age/generation, with younger generations less worried about, or even apathetic to, privacy issues.

Access to mobile network for security alerts – likely commercial choice

Use of the mobile phone network for security alerts and backup access could be a differentiating capability. The presence of this capability is likely to be a commercial choice of the system provider.

User Interface (UI)

UI is always an important aspect of technology products. Specific recommendations with respect to UI include:
Interoperability protocol/OS

Siri and Alexa need to be friends. A homeowner should not be put in the position of deciding up front that a home should be an Apple home, a Google home, or an Amazon home. There is significant opportunity for a business to create a system level framework that is OS independent and will allow devices from different OEMs to cooperate and even collaborate. Standards are likely necessary to achieve this as a number of different companies are currently vying to be the home’s center of intelligence.

Price

It is expected that the price of AoT components will follow typical consumer pricing trends by declining significantly over time, allowing AoT to become more affordable to the mass market.

Additional Project BBQ Items Worth Mentioning

The workgroup considers the following report from previous years’ Project BBQ worth reviewing section 4 |
|
Copyright 2000-2015, Fat Labs, Inc., ALL RIGHTS RESERVED |