The Fourteenth Annual Interactive Audio Conference

Group Report: Dick and Jane Tracy plug into the Matrix


Tomer Elbaz, Waves

Gene Radzik, Dolby Laboratories
Chris Grigg, MMA
Amir Ben Kiki, Waves
Rory O’Neill, Club Penguin (Disney Online Studios – Canada)
Jim Rippie, Invisible Industries
Facilitator: David Roach, Optimal Sound

The group stepped back and looked at the bigger picture of where voice is used. Recognizing that voice, text, video and interactivity are converging on multiple platforms, we concluded that improvements to VOIP alone are limited in scope. We flipped the VOIP topic on its head and began looking for a breakthrough rather than an incremental improvement to stand-alone VOIP applications.

“Social networks will drive audio convergence in multi-sensory devices”

  • Facebook for family and friends - contact lists, multi-media (video, gaming, texting, photos, events)
  • LinkedIn for professionals and work (contact lists, introductions, recommendations)
  • Google – Infrastructure know-how, cloud applications, Google Wave

This vision implies a merging of multiple media that have previously been treated as separate entities, both technically and in the user's mind. Convergence soon leads to a blurring of the boundaries separating the media on both fronts. In the long run the blurring turns to a fading, as the technology becomes increasingly unobtrusive, taken for granted, and beyond simple to use. User awareness of the entire infrastructure (network and terminal alike) recedes into the background, and finally all that remains is the telepresence experience itself. And that's a good thing. Except, of course, when it isn't: for example, privacy issues can arise when you forget the mics and cameras are still on.

But it'll take some time and considerable engineering, build-out, and industrial design to get there.

“Ubiquitous communication with minimum user fatigue”

VoIP will span multiple devices and platforms: desktop, laptop, netbook, smartphone, iTV, DVR, game console, game portables, and likely more.

We identified Drivers, Road-bumps, Blockades, Opportunities, and Recommendations for paving this new communication super highway.


Drivers:

Economic Bubble – VOIP is a disruptor (“Free” is better than paying Ma Bell for long distance). Free and convenient win over quality in the near term. Users are willing to pay for added value (e.g., Ooma's privacy features, Skype's SkypeOut service)
Convergence – A single smart device eliminates the need to carry multiple devices to communicate: notebook, netbook, phone, video camera, microphones, audio recorders
Telepresence – Video as well as audio, including user-generated content (UGC) (e.g., YouTube, Podcasting, Media Mail)
Unified Communication – Enterprise voice, video, text, documents and media sharing
Anonymity – The desire to sound and look different (gender neutrality, power positions)
Centralized Contacts – Centralized friends list (updated by user, shared with contacts)
Fatigue – More time on the phone (absence of travel).  How long can you stand to listen to distorted multi-speaker mono madness?  
Viral – Send your media 1:N (one to many)
Mash-Up – Customized interests, repackage multiple media types and data
Democratization of self-contained tools-to-distribution – (e.g., iPhone: 1) video capture, 2) edit, 3) publish to web)
Entertainment – It’s music, it’s video, it’s gaming
Rolodex Nightmare Cured – Disconnected contact lists converge.  Write, erase, update, REPEAT (now managed by users connected over networks)
Ad aggregation – Keywords/metadata of conversations or context to drive advertising

Road-bumps:

  • Dropouts and other audio issues
  • Network Latency
  • System Latency
  • Consistency of processing
  • Inability to Access Metadata (lack of standards or conformity to standards)
  • Microphones – Voice should drive microphone development / selection in devices (poor mic'ing is a source of fatigue)
  • Maintain gender, inflections, transients of the language/culture
  • Talk-over – Two simultaneous speakers, caused either by network latency or by a lack of comfort noise
  • Legal considerations for recording rights
  • Privacy  (fear of personal data being exposed - contacts, financial data, etc.)
  • Advertising overload (Freemium ad model potential pros and cons)


Blockades:

Punitive cost of bandwidth (telco industry restricting VOIP access), waiting for “Net Neutrality”
Low bar of expectations – MP3s are acceptable, dropped calls are expected
            Enterprise requirements for quality trickle down to the mass market
Ubiquitous bandwidth required – Fiber to the home/curb helps address this (coming soon)
Bluetooth audio support problematic – Bluetooth profile incompatibilities between devices
Cloud failure – Need fast fail-back systems when an application depends on the network


Opportunities:

People like to customize and accessorize (from kids through adults)
            Ringtones on your phone, avatars, Facebook preferences
“Cloud services” – fee-based processing

  • QoS guarantee
  • Recording / Back-up – must be aware of international law
  • Automated speech-to-text for indexing, timestamping, and searching stored audio
  • Manual Transcription
  • Language translation
  • Contact Management – Cultural/Holiday/Personal contextual reminders
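
The speech-to-text indexing item above can be illustrated with a minimal sketch. It assumes a speech recognizer has already produced timestamped transcript segments; the function names `build_index` and `search` and the sample segments are hypothetical, not part of any service the group discussed.

```python
import re
from collections import defaultdict

def build_index(segments):
    """Map each word to the start times (seconds) of segments containing it.

    `segments` is a list of (start_seconds, text) pairs, assumed to come
    from an automated speech-to-text pass over stored audio.
    """
    index = defaultdict(list)
    for start_seconds, text in segments:
        # Use a set so a word repeated within one segment is indexed once.
        for word in set(re.findall(r"[a-z0-9']+", text.lower())):
            index[word].append(start_seconds)
    return index

def search(index, word):
    """Return the sorted start times at which `word` was spoken."""
    return sorted(index.get(word.lower(), []))

# Hypothetical transcript of a recorded call.
segments = [
    (0.0, "Welcome to the call"),
    (4.5, "Latency is the main issue"),
    (9.2, "The call quality improved"),
]
index = build_index(segments)
print(search(index, "call"))  # [0.0, 9.2]
```

A real service would add stemming, speaker labels, and persistent storage; the sketch only shows how timestamps make stored audio searchable.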

Ad-free Premium service

Environmental responsibility – Alternative energy for data farms to enable the cloud

People will likely use auditory themes – Audible Emoticons, Voice morphing

Connect devices – Use available wireless protocols to bring experiences together

Recommendations:

Data Point:
PSTN round-trip latency: 5 ms (human perception threshold: 20 ms)


Audio components:

  • Minimum sampling rate: 16 kHz fs (over current 8 kHz POTS standard)
  • Positional Audio – Spatially positioned perspective
  • Optimize processing efficiency and minimize redundancy – (proper audio signal chain – e.g. – single echo cancellation function)
  • "Evaluated blocks individually"
  • Comfort noise – call connection verified,  talk over less likely
  • Echo and Noise reduction, where needed
  • Watermark – evidence of tampering (pilot tone, timecode)
  • In communications, audio takes precedence (audio discontinuity is unacceptable; video is of secondary importance)
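
Two of the audio items above lend themselves to a short sketch: what the move from 8 kHz to 16 kHz sampling buys in audio bandwidth, and how comfort noise might fill silent frames. The 16-bit linear PCM assumption, the threshold and noise levels, and all function names are illustrative assumptions, not values from the group's discussion.

```python
import random

def nyquist_bandwidth_hz(sample_rate_hz):
    """Highest audio frequency representable at a given sampling rate."""
    return sample_rate_hz / 2

def pcm_bitrate_kbps(sample_rate_hz, bits_per_sample=16, channels=1):
    """Uncompressed bit rate, assuming linear PCM (an assumption here)."""
    return sample_rate_hz * bits_per_sample * channels / 1000

def add_comfort_noise(frames, silence_threshold=0.01, noise_level=0.002, seed=None):
    """Replace near-silent frames with low-level noise.

    `frames` is a list of lists of float samples in [-1.0, 1.0]. Frames whose
    peak is below `silence_threshold` are replaced by uniform noise so the
    listener can tell the call is still connected. Other frames pass through.
    """
    rng = random.Random(seed)
    out = []
    for frame in frames:
        peak = max((abs(s) for s in frame), default=0.0)
        if peak < silence_threshold:
            out.append([rng.uniform(-noise_level, noise_level) for _ in frame])
        else:
            out.append(list(frame))
    return out

for fs in (8000, 16000):
    print(fs, nyquist_bandwidth_hz(fs), pcm_bitrate_kbps(fs))

frames = [[0.0, 0.0, 0.0, 0.0], [0.5, -0.4, 0.3, 0.2]]
filled = add_comfort_noise(frames, seed=1)
```

Doubling the sampling rate doubles the representable bandwidth from 4 kHz to 8 kHz, at twice the uncompressed bit rate. Production systems typically shape comfort noise to match the background of the call rather than using flat uniform noise.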

Social components:

  • Sonic emoticons (provide contextual reference for emoticons)
  • Waiting to speak queue
  • Timestamp indexing
  • Markers
  • Status – Away from Keyboard, Reviewing message, Privacy/Busy, Talking with x,y,z
  • Contextual Profiling – Use location / conditions
  • Advanced dashboard – Latency calc, Connection meter, Content Meters, BS meter, Rate the speaker, Record, Cloud interface
  • Dynamic Opt-out functionality (Push-to-Talk)
  • Force feedback – Bluetooth “mood/physiological state monitor”
  • Gesture component  (capture and use) driven by user innovation
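
As one illustration of the social components, the "waiting to speak" queue could be sketched as a simple first-come, first-served floor-control structure. The class and method names are hypothetical, and a real conferencing client would also need to handle disconnects and moderator overrides.

```python
from collections import deque

class SpeakerQueue:
    """First-come, first-served queue of participants waiting to speak."""

    def __init__(self):
        self._waiting = deque()
        self.current = None  # participant who currently holds the floor

    def request_floor(self, name):
        """Give `name` the floor if it is free, otherwise queue them once."""
        if name == self.current or name in self._waiting:
            return
        if self.current is None:
            self.current = name
        else:
            self._waiting.append(name)

    def release_floor(self):
        """Pass the floor to the next waiting participant, if any."""
        self.current = self._waiting.popleft() if self._waiting else None
        return self.current

    def waiting(self):
        """Return the names still waiting, in speaking order."""
        return list(self._waiting)

queue = SpeakerQueue()
for person in ("Dick", "Jane", "Tracy", "Jane"):
    queue.request_floor(person)
print(queue.current, queue.waiting())  # Dick ['Jane', 'Tracy']
```

Showing the queue to every participant is one way to reduce talk-over, since each speaker can see when their turn is coming.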

The action item list

Action item | Who’s Responsible | Due Date
Complete report for publication | |
Connect with network professionals – Invite to Project BarBQ | |
Connect with social networking entities – Invite to Project BarBQ | David Roach |
Connect with unified communications entities – Invite to Project BarBQ | |

 Other references:

Overarching thoughts/suggestions that the Giant Brain contributed to VOIP (BBQ 2009):
“Ubiquitous Audio”
“Predictive Processing”
“Audio Products that work with decreasing purchasing power”
“As soon as it is practical, it will be done”
“Let user/use drive innovations”
“Extension of thought – Facebook had an impact on the 2008 elections. There is potential for greater impact”
“We are realizing democratization of powerful media tools”
