The Twenty-second Annual Interactive Audio Conference
BBQ Group Report:
Alexa, Siri, Cortana or: How I Learned to Stop Worrying and Love the Cloud
Participants: A.K.A. "Always Listening, Always"
Michael Ricci, Knowles
Bobby Littrell, Vesper
Dafydd Roche, Dialog Semi
Phil Brown, Dolby
Chris Morrison, Dialog Semi
Rajeev Morajkar, Analog Devices
Dan Bogard, Synaptics
David Berol, Amazon
David Dully, Dolby
Neil Hinnant, Microsoft
Jack Joseph Puig, Waves
Facilitator: David Battino

Brief statement of the problem(s) on which the group worked

There are now 10 billion devices that are always on and able, for example, to discern music playing or to characterize background sound before a keyword is spoken. What challenges do these systems bring in terms of ethics, security, and privacy, such as the expectation of a consistent privacy experience across multiple ecosystems? Can we do cool stuff and still keep our privacy intact?

A brief statement of the group’s solutions to those problems

We created a diagram to explain the flow of data from a hardware and software perspective in today’s systems to identify the trust boundary for each respective node. Of note for each node is the privacy, security and ownership. Privacy is defined as users having a clear understanding of where and how their data is being used. Security is defined as how safe the data is from unknown use. Ownership is defined as who can use, control access to, and store/remove data.

Software Model:

  1. Link between Mic and Firmware
    1. Security concerns: Physically connecting to the mic output (low probability) and/or Firmware hack (medium probability)
    2. Recommendations: Hardware switch and/or encryption method along with some type of indication mechanism to user.
    3. Owner: HW system vendor.
  2. Link between Firmware and Software driver
    1. Security concerns: TBD
    2. Recommendations: TBD
    3. Owner: Microphone Aggregator and SW Stack
  3. Link between SW driver and the OS
    1. Security concerns: Who can talk to the SW driver.
    2. Recommendations: TBD
    3. Owner: OS Vendor. An OS with multiple consumers (apps) should have a device-access trust model.
  4. Link between OS and Application layer
    1. Security concerns: single-app environment: no concerns; in a multi-app environment, the brokering of access to the data (microphone input)
    2. Recommendations: TBD
  5. Link between OS and OS platform data collection
    1. Security concerns: Excessive data collection (typically not screened; everything is captured)
    2. Recommendations: Best practices as applied to financial data: encryption, appropriate tagging, strict policy
  6. Link between Application Layer / Device and Router
    1. Security concerns: No practical concerns, but it is the application’s responsibility to maintain data encryption as needed.
    2. Recommendations: Clearly articulate what the security namespace captures
    3. Owner: Application developer
  7. Application and/or OS cloud
    1. Security concerns: What data is being collected?
    2. Recommendations: Opt-in for feature (like voice print improvement)
    3. Owner: Cloud owner
  8. Link between Application Cloud and 3rd Party, e.g., Bank, Retail, Utility Cloud
    1. Security concerns: Wrong cloud gets access to data outside of its need to know.
    2. Recommendations: Clearly define roles and responsibility for App Cloud and 3P Cloud
    3. Owner: Both Cloud services
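The link model above can be encoded as a simple audit structure. This is a sketch only; the `TrustLink` type, the link names, and the `open_questions` helper are all illustrative, and only four of the eight links are shown. The "TBD" entries from the report are preserved so they can be flagged programmatically.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class TrustLink:
    """One node-to-node link in the mic-to-cloud data path."""
    name: str
    owner: str
    concerns: List[str]
    recommendations: List[str]

# A partial encoding of the links above; "TBD" entries are kept as-is.
PIPELINE = [
    TrustLink("mic -> firmware", "HW system vendor",
              ["physical tap on mic output (low probability)",
               "firmware hack (medium probability)"],
              ["hardware switch", "encryption", "user indication"]),
    TrustLink("firmware -> SW driver", "mic aggregator / SW stack",
              ["TBD"], ["TBD"]),
    TrustLink("SW driver -> OS", "OS vendor",
              ["who can talk to the SW driver"], ["TBD"]),
    TrustLink("app cloud -> 3rd-party cloud", "both cloud services",
              ["wrong cloud gets data outside its need to know"],
              ["clearly defined roles and responsibilities"]),
]

def open_questions(pipeline):
    """List the links whose concerns or mitigations are still TBD."""
    return [link.name for link in pipeline
            if "TBD" in link.concerns or "TBD" in link.recommendations]
```

Running `open_questions(PIPELINE)` surfaces the two mid-pipeline links whose trust story is still undefined.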

Expanded problem statement

Original problem statement to reference: There are now 10 billion devices that are always on and able, for example, to discern music playing or to characterize background sound before a keyword is spoken. What challenges do these systems bring in terms of ethics, security, and privacy, such as the expectation of a consistent privacy experience across multiple ecosystems? Can we do cool stuff and still keep our privacy intact?
The next evolution of “always-on” systems, taking into account privacy, clarity of terms of use, the competitive landscape, and use cases. Does always listening mean always analyzed and/or always stored? Stored for how long? Who owns the data, and who can access it? Complexity vs. user experience in balancing all the permissions.
Expanded solution description

Privacy Expectations:

  • Articulate Usage States (LED state or icon is strongly desired)
    • Current state / Normal Mode
      • In a home, anyone can use it; no parental control (free use); all app use approved; purchase with PIN.
    • “Family” Mode (Multi-User / Speaker ID)
      • Tiers of users; purchase capability per account; automatic categorization of searches
    • “Friends” Mode:
      • Continue playing a song or movie; take a note for another person’s account.
    • “Incognito” Mode
      • A mode where search history is not saved; the wake keyword still functions.
    • “Mute” Mode:
      • Microphone off; some implementations use a hard button (more robust).
    • “Privacy” Mode: may need to include camera as well.
      • Any and all capture functions are off. This is desired to work across ecosystems, with the same keyword everywhere for clarity to users.
    • “Off”
      • You know, Off.
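The usage states above can be viewed as a capability matrix: each mode either permits or denies microphone capture, history storage, and camera capture. The enum names and the specific true/false assignments below are a hypothetical reading of the modes described, not a vendor specification.

```python
from enum import Enum

class UsageState(Enum):
    NORMAL = "normal"
    FAMILY = "family"
    FRIENDS = "friends"
    INCOGNITO = "incognito"
    MUTE = "mute"
    PRIVACY = "privacy"
    OFF = "off"

# Hypothetical capability table: which capture/storage functions each
# usage state permits (assumed mapping for illustration only).
CAPABILITIES = {
    UsageState.NORMAL:    {"mic": True,  "history": True,  "camera": True},
    UsageState.FAMILY:    {"mic": True,  "history": True,  "camera": True},
    UsageState.FRIENDS:   {"mic": True,  "history": False, "camera": True},
    UsageState.INCOGNITO: {"mic": True,  "history": False, "camera": True},
    UsageState.MUTE:      {"mic": False, "history": False, "camera": True},
    UsageState.PRIVACY:   {"mic": False, "history": False, "camera": False},
    UsageState.OFF:       {"mic": False, "history": False, "camera": False},
}

def allows(state, capability):
    """True if the given usage state permits the named capability."""
    return CAPABILITIES[state][capability]
```

Making the matrix explicit forces each mode's promise to the user to be stated precisely, which is exactly the "clear understanding" that the group's privacy definition calls for.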

Need to address what the minimum privileges are in the broader report.

Shared keywords / Privacy mode:
This is addressing the idea of an industry-wide phrase or keyword to enable a given feature. For example, “Engage Cone of Silence” or “Privacy” to turn off all devices across voice assistants (Google, Amazon, Apple, Microsoft, etc.).
Issues: Buy-in from all parties. Personality of the given assistant. Cost of additional keywords in performance or battery life.
Additional infrastructure: All devices may not hear you, but if one does, it pokes all the other devices in your accounts (across ecosystems).
Advantages: Trust for users that they are in a safe space for what they want to do. Convenience of not having to engage each device directly (push all of the mute buttons).
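The "one device pokes all the others" infrastructure could be sketched as a shared per-account registry. Everything here is hypothetical: `Device`, `AccountRegistry`, and `broadcast_privacy` are invented names, and a real cross-ecosystem version would require an agreed protocol between vendors, which is exactly the buy-in issue noted above.

```python
class Device:
    """A voice-assistant device from any ecosystem (hypothetical model)."""
    def __init__(self, name, ecosystem):
        self.name = name
        self.ecosystem = ecosystem
        self.mic_on = True

    def on_privacy_broadcast(self):
        # Local equivalent of pressing the device's mute button.
        self.mic_on = False

class AccountRegistry:
    """Assumed cross-ecosystem registry of one user's devices."""
    def __init__(self):
        self.devices = []

    def register(self, device):
        self.devices.append(device)

    def broadcast_privacy(self, heard_by):
        """The device that heard the keyword pokes all the others."""
        muted = []
        for device in self.devices:
            device.on_privacy_broadcast()
            muted.append(device.name)
        return muted
```

With an Echo and a HomePod registered, one "Engage Cone of Silence" heard by either device would mute both, giving the user the single safe-space guarantee described above.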


  • Privacy vs. Security
    • What is the implementation of background apps in the future?
  • Each Link SWOT
    • SW Side
      • Who’s responsible?
    • HW Side
      • Who’s responsible?
  • Standardization of trust model
    • Policies among cloud providers         
      • Articulate the terms and conditions for data storage and persistence
        • One person asks a question on another person’s account
    • Minimum Privileges
  • Recommendations
    • Privilege level change needs two-factor authentication
  • What cool stuff can we do?
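The recommendation that a privilege-level change require two-factor authentication could look like the sketch below. The function names, the account dictionary shape, and the out-of-band delivery step are all assumptions; the constant-time comparison via `hmac.compare_digest` is the one concrete best practice being illustrated.

```python
import hmac
import secrets

def issue_second_factor():
    """Generate a one-time code to deliver out of band (e.g., a companion app)."""
    return f"{secrets.randbelow(10**6):06d}"

def change_privilege(account, new_level, supplied_code, issued_code):
    """Apply a privilege-level change only after the second factor checks out."""
    # Constant-time comparison avoids leaking the code via timing.
    if not hmac.compare_digest(supplied_code, issued_code):
        raise PermissionError("second factor failed; privilege unchanged")
    account["privilege"] = new_level
    return account
```

A failed second factor leaves the account untouched, so a voice command alone can never escalate from, say, a child tier to a purchasing tier.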

Items from the brainstorming lists that the group thought were worth reporting

  • What cool things can we do with always-on?
  • Camouflage the speaker so they can’t be listened to
  • Measuring accuracy of always-on devices
  • Can we preserve audio for posterity’s sake?
  • Ethical and Philosophical issues
  • Terms of Service & Clickthrough
  • Convenience vs. Privacy
  • Each product has different rules/terms etc.
  • Regulation and Legislation
  • Multiple devices listening; do they interact? Communicate?
  • What is captured? Is it preserved? What privacy rights does it have?
  • What info is shared between devices?
  • Can my wife recall a conversation to prove she’s right?
  • Is there individual speaker ID with different permissions?
  • Can we switch to higher quality modes at the right time?
  • What is protected by different countries or states?
  • What other devices can we add mics to? How do you individually address each device?
  • What is being stored and where? What is local, what is in the cloud?
  • Who is responsible for right to privacy? Chip vendor? System integrator? Software provider?
  • How do we integrate authentication on always-on devices?
  • Privacy in/out of house — how do we handle these? Different considerations?
  • Shotspotter type crowdsourcing
  • Acoustic event detection, do we want to use these devices to monitor?
  • Google, Amazon, Apple are clear as to what they are doing with the audio
  • Facebook (Terms of Service) TOS says it listens all the time (and pushes ads)
  • How does the user know what the hell is going on? No one reads TOS and there is not always a light on saying that the device is listening. How do we handle opt-in/out?
  • When you use the service, the data is used for training Deep Neural Networks (DNNs) etc. (anonymously?)
  • “There needs to be a way to delete all my data” implies that my data is annotated and not anonymized
  • Take snippets of speech and re-synthesize people saying things that they didn’t actually say
  • Is our memory of recorded events distorted or romanticized from what is actually recorded?
  • Buffered recording — can we have enough buffer to save/preserve things that just happened? “In the moment” type recording.
  • Is relevant advertising useful? Friends talking about bands might be useful. But browsing history is already used for targeting ads, so is voice-triggered advertising more or less annoying?
  • How do we balance convenience vs. privacy? Scrubbing email to put flight info on a calendar is useful, but other things seem invasive
  • How has culture changed to allow mics to be always on? Identifying individual speakers is further invasive. If we talk about going to dinner in two weeks, do I get a calendar entry predicting this?
  • Level of trust — a real assistant might provide a completely trustworthy conduit but an electronic one is not? A real assistant would be discreet and confidential — for instance, if I buy Xmas gifts for my family, how do I keep my kids from finding this out?
  • Do we need deeper understanding from machine learning? Able to interpret or use common sense to get to the core of the topic or request?
  • Alexa has a “personality” that is a choice by Amazon; she has favorite teams, colors, etc. — this personality might not be appropriate for another country
  • Can the AI personalize herself based on my favorites — sports, politics, etc.
  • Communication between virtual assistants: “Alexa, unlock the privacy rights skill” — can we ask the AI to set the rules for sharing data?
  • What if I tell the AI to not tell something? Or tell a lie? Or ask something that might indicate that I’m dangerous or criminal?
  • Can there be a natural language way to have a private conversation? “Alexa, go private”
  • How do we do it across all the different mics/platforms?
  • Is there a market for a private assistant that might not be used for marketing purposes? Would that be a subscription?
  • Opt-out options — see Google analytics opt-out add-on
  • Trade features or money for privacy
  • Opt-in/out in a selective way — dates, times, content, etc.
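One brainstorm item above, buffered "in the moment" recording, maps naturally onto a fixed-size ring buffer: audio older than the chosen window is discarded automatically, and nothing is preserved unless the user explicitly asks. The class below is a sketch under assumed names; real frame rates and storage policy would differ per device.

```python
from collections import deque

class MomentBuffer:
    """Rolling buffer holding only the most recent audio frames (sketch)."""
    def __init__(self, seconds, frames_per_second=100):
        # deque with maxlen drops the oldest frame on every push past capacity.
        self.frames = deque(maxlen=seconds * frames_per_second)

    def push(self, frame):
        # Frames older than the window fall off automatically, so nothing
        # is retained beyond what the user agreed to buffer.
        self.frames.append(frame)

    def preserve(self):
        """Snapshot what just happened, only on an explicit user request."""
        return list(self.frames)
```

The privacy property lives in the data structure itself: the device physically cannot replay anything older than the window, which is easier to explain in a terms of service than an open-ended retention policy.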

Address direction of the future groups: 2018
Multiple groups have addressed the future of voice assistants over the past several years. In addressing topics in this space going forward, we advise future groups to avoid going too broad (a consequence of clumping) and to address the direction of a specific topic in the field.
Other reference material

Previous BBQ reports
The Future of Voice Interfaces: https://www.projectbarbq.com/reports/bbq17/bbq17r3.htm
Audio Of Things: Audio Features and Security for Smart Homes/Internet of Things: http://www.projectbarbq.com/reports/bbq15/bbq15r4.htm
Audio opportunities in the Internet of Things: http://www.projectbarbq.com/reports/bbq14/bbq14r7.htm
Using Sensor Data to Improve the User Experience of Audio Applications: http://www.projectbarbq.com/reports/bbq13/bbq13r6.htm
Form Factors and Connectivity for Wearable Audio Devices: https://www.projectbarbq.com/reports/bbq12/bbq12r5.htm

Privacy links
https://images.apple.com/business/docs/iOS_Security_Guide.pdf (pg 49)
