Project Bar-B-Q, 2009, report, section 4

The Fourteenth Annual Interactive Audio Conference PROJECT BAR-B-Q 2009

	Group Report: Dick and Jane Tracy plug into the Matrix

Participants:

Tomer Elbaz, Waves

Gene Radzik, Dolby Laboratories

Chris Grigg, MMA

Amir Ben Kiki, Waves

Rory O’Neill, Club Penguin (Disney Online Studios – Canada)

Jim Rippie, Invisible Industries

Facilitator: David Roach, Optimal Sound

Background:
The group stepped back and looked at the bigger picture of where voice is used. Because voice, text, video, and interactivity are converging across multiple platforms, improvements to VoIP alone are limited in scope. We flipped the VoIP topic on its head and looked for a breakthrough rather than an incremental improvement to stand-alone VoIP applications.

Vision:
“Social networks will drive audio convergence in multi-sensory devices”

Facebook for family and friends - contact lists, multi-media (video, gaming, texting, photos, events)
LinkedIn for professionals and work (contact lists, introductions, recommendations)
Google – Infrastructure know-how, cloud applications, Google Wave

This vision implies a merging of multiple media that have previously been treated as separate entities, both technically and in the user's mind. Convergence soon blurs the boundaries separating the media. In the long run the blurring turns to a fading, as the technology becomes increasingly unobtrusive, taken for granted, and beyond simple to use. User awareness of the entire infrastructure, network and terminal alike, recedes into the background until all that remains is the telepresence experience itself. And that's a good thing. Except, of course, when it isn't: for example, privacy issues can arise when you forget the mics and cameras are still on.

But it'll take some time and considerable engineering, build-out, and industrial design to get there.

Goal:
“Ubiquitous communication with minimum user fatigue”

Devices/Platforms:
VoIP will span multiple devices/platforms: Desktop, Laptop, Netbook, SmartPhone, iTV, DVR, game console, game portables, and likely more

Work:
We identified Drivers, Road-Bumps, Blockers, Opportunities, and Recommendations for paving this new communication superhighway.

Drivers

Economic Bubble – VoIP is a disruptor (“free” is better than paying Ma Bell for long distance). Free and convenient win over quality in the near term. Users are willing to pay for value-adds (e.g., Ooma's privacy features, Skype's SkypeOut service)

Convergence – A single smart device eliminates the need to carry multiple devices to communicate: notebook, netbook, phone, video camera, microphones, audio recorders

Telepresence – Video as well as audio, including user-generated content (UGC) (e.g., YouTube, podcasting, media mail)

Unified Communication – Enterprise voice, video, text, documents and media share

Anonymity – The desire to sound and look different (gender neutrality, power positions)

Centralized Contacts – Centralized friends list (updated by user, shared with contacts)

Fatigue – More time on the phone (absence of travel). How long can you stand to listen to distorted multi-speaker mono madness?

Viral – Send your media 1:N (one to many)

Mash-Up – Customized interests, repackage multiple media types and data

Democratization of self-contained tools-to-distribution – (e.g., on an iPhone: 1) video capture, 2) edit, 3) publish to web)

Entertainment – It’s music, it’s video, it’s gaming

Rolodex Nightmare Cured – Disconnected contact lists converge. Write, erase, update, REPEAT (now managed by users connected over networks)

Ad aggregation – Keywords/metadata of conversations or context to drive advertisement

Road-Bumps

Drop outs and other audio issues
Network Latency
System Latency
Consistency of processing
Inability to Access Metadata (lack of standards or conformity to standards)
Microphones – Voice should drive microphone development / selection in devices (poor mic'ing a source of fatigue)
Maintain gender, inflections, transients of the language/culture
Talk-over – 2 simultaneous speakers either because of network latency or lack of comfort noise
Legal consideration for recording rights
Privacy (fear of personal data being exposed - contacts, financial data, etc.)
Advertising overload (Freemium ad model potential pros and cons)

Blockers:

Punitive cost of bandwidth – Telco industry restricting VoIP access; waiting for “Net Neutrality”

Low bar of expectations – Mp3s are acceptable, dropped calls are expected
Enterprise requirements for quality trickle down to the mass market

Ubiquitous bandwidth required – Fiber to the home/curb helps address this (coming soon)

Bluetooth audio support problematic – Bluetooth profile incompatibilities between devices

Cloud failure – Need fast fail-back systems when an application depends on the network

Opportunities:

People like to customize and accessorize (from kids thru adults)
Ringtones on your phone, Avatars, Facebook preferences

“Cloud services” – fee based processing

QoS guarantee
Recording / Back-up - aware of international law
Automated speech-to-text for indexing, timestamping, and searching stored audio
Manual Transcription
Translation for language
Contact Management – Cultural/Holiday/Personal contextual reminders
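As a rough illustration of the speech-to-text indexing idea in the list above, the following minimal sketch builds a word index over timestamped transcript segments. The segment data and the `build_index` helper are hypothetical, and a real service would sit on top of an actual recognizer; this only shows how timestamps could make stored audio searchable.

```python
from collections import defaultdict

def build_index(segments):
    """Map each lowercase word to the start times (in seconds) of the
    transcript segments that contain it."""
    index = defaultdict(list)
    for start_s, text in segments:
        # set() so a word repeated within one segment is indexed once
        for word in set(text.lower().split()):
            index[word].append(start_s)
    return index

# Hypothetical speech-to-text output: (start time in seconds, recognized text)
segments = [
    (0.0, "welcome to the weekly call"),
    (12.5, "the latency budget needs review"),
    (47.0, "review the action items before the next call"),
]

index = build_index(segments)
# Look up where "review" was spoken so playback can jump to those points
review_hits = index.get("review", [])
print(review_hits)
```

A lookup returns the start times of matching segments, which a player could use as jump points into the recording.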

Ad-free Premium service

Environmental responsibility – Alt energy for Data Farms to enable the cloud

People will likely use auditory themes – Audible Emoticons, Voice morphing

Connect devices – Use available wireless protocols to bring experiences together

Data Point:
PSTN round trip: 5 ms (20 ms – human perception)
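To put the data point in context, here is a minimal latency-budget sketch. The two reference figures come from the data point above. The per-stage one-way latencies are purely hypothetical placeholders, and the round trip is assumed to be symmetric (twice the one-way total).

```python
PSTN_ROUND_TRIP_MS = 5.0        # from the data point above
PERCEPTION_THRESHOLD_MS = 20.0  # human-perception figure from the data point

def round_trip_ms(one_way_stages_ms):
    """Round-trip latency, assuming both directions have the same stages."""
    return 2 * sum(one_way_stages_ms.values())

# Hypothetical one-way stage latencies in milliseconds (illustrative only)
stages = {
    "capture_buffer": 10.0,
    "codec": 5.0,
    "network": 30.0,
    "jitter_buffer": 20.0,
    "playback_buffer": 10.0,
}

total = round_trip_ms(stages)
over_threshold = total - PERCEPTION_THRESHOLD_MS
print(f"round trip: {total} ms, {over_threshold} ms over the perception figure")
print(f"{total / PSTN_ROUND_TRIP_MS:.0f}x the PSTN round trip")
```

With these placeholder numbers, both network latency and system latency (buffers, codec) contribute to the total, which is why the report lists them as separate road-bumps.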

Recommendations:

Audio components:

Minimum sampling rate: 16 kHz fs (over current 8 kHz POTS standard)
Positional Audio – Spatially positioned perspective
Optimize processing efficiency and minimize redundancy – (proper audio signal chain – e.g. – single echo cancellation function)
"Evaluated blocks individually"
Comfort noise – call connection verified, talk over less likely
Echo and Noise reduction, where needed
Watermark – evidence of tampering (pilot tone, timecode)
In communications, audio takes precedence (audio discontinuity is unacceptable; video is of secondary importance)
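The sampling-rate recommendation can be made concrete with the Nyquist limit: the highest representable audio frequency is half the sampling rate. The sketch below compares 8 kHz and 16 kHz. The bitrate figure assumes uncompressed 16-bit mono linear PCM, which is an assumption for illustration only (telephone networks typically use companded or compressed codecs).

```python
def nyquist_bandwidth_hz(sample_rate_hz):
    """Highest frequency representable at the given sampling rate."""
    return sample_rate_hz / 2

def pcm_bitrate_kbps(sample_rate_hz, bits_per_sample=16, channels=1):
    """Raw linear PCM bitrate; 16-bit mono is an illustrative assumption."""
    return sample_rate_hz * bits_per_sample * channels / 1000

for fs in (8000, 16000):
    print(f"fs = {fs} Hz: audio bandwidth up to {nyquist_bandwidth_hz(fs)} Hz, "
          f"raw PCM {pcm_bitrate_kbps(fs)} kbit/s")
```

Doubling the sampling rate doubles the available audio bandwidth from 4 kHz to 8 kHz, at the cost of doubling the raw data rate before any codec is applied.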

Social components:

Sonic emoticons (provide contextual reference for emoticons)
Waiting to speak queue
Timestamp indexing
Markers
Status – Away from Keyboard, Reviewing message, Privacy/Busy, Talking with x,y,z
Contextual Profiling – Use location / conditions
Advanced dashboard – Latency calc, Connection meter, Content Meters, BS meter, Rate the speaker, Record, Cloud interface
Dynamic Opt-out functionality (Push-to-Talk)
Force feedback – Bluetooth “mood/physiological state monitor”
Gesture component (capture and use) driven by user innovation
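One way the "waiting to speak queue" and "Status" items above might fit together is sketched below. The `SpeakQueue` class, its method names, and the participant names are all hypothetical; the group did not specify a design, so this is only a first-come, first-served illustration.

```python
from collections import deque

class SpeakQueue:
    """First-come, first-served floor control for a multi-party call."""

    def __init__(self):
        self._waiting = deque()
        self.current = None  # participant who currently has the floor

    def request(self, name):
        """Ask for the floor; duplicate requests are ignored."""
        if name != self.current and name not in self._waiting:
            self._waiting.append(name)
        if self.current is None:
            self.current = self._waiting.popleft()

    def release(self):
        """Current speaker yields; the next waiting participant takes over."""
        self.current = self._waiting.popleft() if self._waiting else None
        return self.current

    def status(self):
        """Snapshot suitable for a 'Talking with x, y, z' style status display."""
        return {"talking": self.current, "waiting": list(self._waiting)}

queue = SpeakQueue()
for person in ("alice", "bob", "carol", "bob"):  # bob asks twice
    queue.request(person)
print(queue.status())
print(queue.release())
```

A queue like this would mainly help with the talk-over road-bump, since participants can see who is next instead of speaking over one another.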

The action item list

#	Who’s Responsible	Due Date	Description
1	Gene	11/25/2009	Complete report for publication
2			Connect with network professionals – Invite to Project Bar-B-Q
3			Connect with social networking entities – Invite to Project Bar-B-Q
4	David Roach	01/04/2010	Connect with unified communications entities – Invite to Project Bar-B-Q

Other references:

Overarching thoughts/suggestions that the Giant Brain contributed to VoIP (BBQ 2009).

“Ubiquitous Audio”
“Predictive Processing”
“Audio Products that work with decreasing purchasing power”
“As soon as it is practical, it will be done”
“Let user/use drive innovations”
“Extension of thought – Facebook had impact on 2008 elections. There is potential for greater impact”
“We are realizing democratization of powerful media tools”
