2014 Workgroup Topic Proposals

Multifactor Interfaces Between Human and Machine?

Humans communicate with each other not just with pure speech but with speech augmented by a variety of cues, including audio, facial, postural and gestural ones.  Do any of these cues also add value in human-to-machine communication?  Could they serve as another form of contextual awareness, making ASR more accurate and machine responses more meaningful?

Some suggest these cues are too individual in nature to augment human-to-machine communication.  Others point out that experts can identify and interpret these audio and visual cues by observing anyone, which suggests computer intelligence could easily be programmed to interpret them too.

Let’s identify specific applications where a multifactor interface (speech plus other visual and audio cues) would add value in the new world where your voice becomes the primary interface to consumer products.

Audio Opportunities in Emerging Wearable and IoT

What wearable and IoT applications featuring audio will make a splash and stick?  What applications will just splat?  What sound, voice, speech and audio technologies will make a difference in these emerging applications?  What advances in sound technologies are needed in the next five years to create compelling and sustainable applications in this space?  Let’s brainstorm potential applications with real consumer value propositions, debate their merit, prioritize them and define a sound technology roadmap to support them.

Let’s play lean startup

We’ve got two days to validate a business model for a new company that we have created out of the ether of beer, sweat and BBQ.

We will identify our target customers, understand what they’re bitching about, and propose what we’re going to do about it & blow their socks off. We’ll run experiments on other groups (our target customers) to validate the hypotheses of our model.  We’ll iterate, and pivot and all that good stuff. And when our model is complete, we will be ready to put a dent in the universe and make a ton of cash.  Yeehaw!

With some sincerity, this topic is really about tech and business, not just tech.  I’ve heard as many grumblings about business and politics at BBQ as I have about tech problems.  Like Love and Marriage, you can’t have one without the other.  So, what if we set out to solve some business problems?

The Chorus of Bats: “Do we REALLY want to support High-Resolution Audio?”

The Problem:

Regardless of where you stand on the “Is High-Resolution Audio Worth It” debate, the marketing departments have already opened the barn door and the cows are out.  In the corporate world’s never-ending quest for brand differentiation, market relevance, and lavish CEO compensation packages, “High Resolution Audio” is already being sold as the “Next Big Thing” in audio.   As the owners of audio for the next five years, do we:

1:  Ignore it:  “That’s Snake Oil and we don’t want any part of it!”

2:  Sell it:  “We’ve got the BEST Snake Oil, and we’re gonna milk this to stay employed for a few more years!”

3:  Build it:  “We love it, you really want it, and we’re gonna do it right, even if that means ultrasonic  tweeters in your headphone cans!”

I see that the 2011 “Galileo” group might be real big on topic 1, where we discuss whether or not High-Resolution Audio really does bring audio happiness.  I think we’ll need some Screaming Monkeys when we get to topic 2, because we need to know if the new Monkey Bus can support High-Res.  Finally, for #3, I think we need some support from the Doppler Chickens.  As engineers, let’s do it right.  I know mics can go ultrasonic, but what about speakers?  How do we engineer speakers, microspeakers, and headphone receivers that can cleanly go up to 40 kHz?  What about low-power portable audio amplifiers with no intermodulation distortion up there?  What about design considerations for headphones (both circumaural and in-ear)?  What kinds of transducers are we looking at?  How about test systems and standards?  I don’t think the type 3.3 ears go that high.  Where do we go from here?
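To put a number on the intermodulation worry: when two ultrasonic tones pass through a nonlinear amplifier or transducer, their difference tone can land squarely in the audible band.  Here is a back-of-the-envelope sketch in Python.  The function name, the 20 kHz audibility limit and the example tone frequencies are all assumptions chosen for illustration, not measurements of any real device, and the sketch only lists where low-order products fall, not how loud they are.

```python
def intermod_products(f1_hz, f2_hz, audible_max_hz=20_000):
    """Return the second- and third-order intermodulation products of two
    tones that fall inside the (assumed) audible band 0 < f <= audible_max_hz."""
    candidates = {
        "f2 - f1": abs(f2_hz - f1_hz),        # second-order difference tone
        "f1 + f2": f1_hz + f2_hz,             # second-order sum tone
        "2*f1 - f2": abs(2 * f1_hz - f2_hz),  # third-order product
        "2*f2 - f1": abs(2 * f2_hz - f1_hz),  # third-order product
    }
    return {name: f for name, f in candidates.items() if 0 < f <= audible_max_hz}


# Hypothetical example: two ultrasonic tones at 30 kHz and 33 kHz.
audible = intermod_products(30_000, 33_000)
print(audible)
```

Neither input tone is audible in this example, yet the difference tone sits at 3 kHz, which is why "no intermodulation distortion up there" matters even to listeners who cannot hear the ultrasonic content itself.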

The Dream Dugout: New Best Practices for Dream-Team-Building

When it’s time to build the dream product, or series of products, yer gonna want yer best possible noises.

So you bring in the Dream Team, naturally, but how best to set them up?  What tricks have Time and Experience taught us about the environment, the structure, the attitude–and how can we anticipate changes to those lessons over the next few years?

How do these things change in the future, with vastly improved collaboration tools, teleconferencing, telepresence, lifecasting, and with new tools blurring lines between “integration,” “music” and “sound design”?  Who punches the clock at the factory, and who commutes to his garage studio in bunny slippers?   How big are the teams?  How do we account for slippery job titles?  How frequent are physical/virtual meetings?  What flow of control and command will work best?  What about interactions Audio has with the deeper technical teams and loftier Vision-Holders?  Collaborations with outside contributors?  What about it?  HUH?  WHAT ABOUT IT?!?!?!?

 

Making Binaural Work: Bringing back “Handsome’s” suggestion from 2013

Looks like there’s lots of interest in Binaural and Headphones this year.  Hmmmm.

I don’t recall who is handsome, let alone who “Handsome” is (Howard Brown?) but I like this topic, and it appears to have slipped through the cracks of 2013’s Giant Brain.  Can we take another swing at it?  

 

–George

Make Binaural work

We often talk about ‘immersive audio’, where one feels like they are in the middle of a game, orchestra or movie. The use of spatial audio (HRTFs, room models, BRIRs, etc.) to render these immersive scenes is usually the ‘go-to’ idea. Some of the problems with synthetic spatial audio, as well as binaural field recordings, are:

1) The visual cues are missing or wrong.
2) Head motion is not taken into account.
3) HRTFs are generic and not individualized.
4) The listener’s environment is not taken into account.

That last point is particularly important. If you have a binaural recording made in a small room, but you listen to it in a large room, it will sound terribly colored. In fact, if the room you are listening in is not taken into account, any synthetic or binaural recording will have coloration.

Another big issue is that, if the visual cue is missing, the listener tends to localize the sound behind them (or at least somewhere outside of their field of vision).

So what can be done to mitigate these issues? Is this something that we can engineer (i.e., build me some new, celebrity endorsed headphones), or is it a matter of getting the signal processing just right (can you say ‘head tracker’, hallelujah!), or are there limitations at the cognitive level that need to be addressed?
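For anyone less familiar with the rendering step mentioned above, the core of HRTF-based spatial audio is just a convolution of the source signal with one impulse response per ear.  The sketch below shows that step in plain Python.  The impulse responses are made-up toy values (a delay and an attenuation for the far ear), not measured HRIRs, and the function names are hypothetical.  It also illustrates the problems listed above: nothing here knows about the listener's head motion, own ears, or room.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution of two sample lists."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def render_binaural(mono, hrir_left, hrir_right):
    """Filter one mono source through a left/right impulse-response pair."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)


# Toy impulse responses (assumed, not measured): a source on the listener's
# left reaches the right ear two samples later and at half the level.
hrir_left = [1.0, 0.0, 0.0]
hrir_right = [0.0, 0.0, 0.5]

left, right = render_binaural([1.0, 0.5], hrir_left, hrir_right)
print(left)
print(right)
```

A real renderer would swap in measured (ideally individualized) responses, update them as the head moves, and add a model of the listening room, which is exactly where the four problems above come in.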

Reinventing audio for headphones

According to a 2013 survey by Motorola, more people watch TV and movies on tablets than on television sets/home theatres. More people than ever consume most of their sound over headphones, rather than speakers.

  • How do we adjust our creative practices for headphone listening?
  • What improvements to headphone sound do we need?
  • How can we prevent people from damaging their ears when so much sound is consumed loudly on headphones?
  • What other questions do we need to explore and address related to  headphone listening?
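As a starting point for the hearing-damage question, here is a small Python sketch of how allowable listening time shrinks with level.  It uses the NIOSH recommended exposure limit for occupational noise (85 dBA for 8 hours, halving for every 3 dB increase).  Applying that occupational limit to headphone listening is an assumption made for illustration, as is the function name; real headphone levels would first have to be converted to an equivalent free-field dBA figure.

```python
def allowed_hours(level_dba, criterion_dba=85.0, exchange_rate_db=3.0):
    """Daily exposure time allowed at a given level, using an 8-hour
    criterion level and a fixed exchange rate (NIOSH defaults assumed)."""
    return 8.0 / 2.0 ** ((level_dba - criterion_dba) / exchange_rate_db)


# Hypothetical listening levels, in dBA.
for level in (85, 88, 94, 100):
    print(level, "dBA ->", allowed_hours(level), "hours")
```

The steepness is the point: under these assumptions, a listener at 100 dBA uses up a full day's allowance in a quarter of an hour.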

There are elements that could tie in with the binaural tracking workgroup proposal.

The sound of one hand clapping

More and more gesture-based computer controllers are being developed (Leap Motion, Myo, etc.). In the movies and on TV (hello, Star Trek fans), these always have sounds, but so far the real devices are usually released without any, leaving each application that uses the device to implement its own sound effects.

Should gestural sounds have some form of standardization, in the way that keyboard sounds and mouse clicks have?  What are some “universal” gestures that might need a standard set of interface sounds?

Recording studio and software design for game dialogue recording

If you were to build a recording studio for games dialogue recording, what would it be like? Regular recording studios and existing recording software are the round hole that the square peg, nay, the multi-dimensional spaghetti peg, of games gets hammered through. What needs to change to make the games specialist studio technically on the nail and creatively inspiring?

Anyone up for brainstorming the ultimate studio design and recording software design wish list? Is the market for games production big enough for the likes of Avid, Steinberg, Adobe, Sony, etc. to take a commercial interest in game-specific tools/features?

Headtracking for Binaural Audio

Topics of discussion: The effective use of head-tracking methods in binaural and augmented reality applications. How much center image stability is gained by using a head-tracking device? Can a realistic 2D and 3D experience happen without head tracking? Will augmented reality devices make head tracking mandatory? What is the minimum degree of accuracy that is attainable, and what is actually required?
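To make the accuracy question concrete, here is a minimal Python sketch of what a head tracker contributes to a binaural renderer: it turns a source direction fixed in the room into a direction relative to the listener's head.  The function names and the sign convention (angles in degrees, positive to the listener's right) are assumptions for illustration, and only yaw is handled.

```python
def wrap_degrees(angle):
    """Wrap an angle in degrees to the range [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def head_relative_azimuth(source_azimuth_deg, head_yaw_deg):
    """Azimuth of a room-fixed source as seen from the rotated head."""
    return wrap_degrees(source_azimuth_deg - head_yaw_deg)


# A center image straight ahead in the room (0 degrees): when the listener
# turns 30 degrees to the right, it should be rendered 30 degrees to the left.
print(head_relative_azimuth(0.0, 30.0))
```

In this simple model any error in the measured yaw shows up directly as the same angular error in the rendered image, so a tracker that is off by a few degrees moves the center image by the same few degrees.  Without any tracking, the image simply turns with the head.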