Humans communicate with each other not just with pure speech but with speech augmented by a variety of cues, including tone of voice, facial expression, posture, and gesture. Do any of these cues also add value in human-to-machine communication? Could they serve as another form of contextual awareness, making ASR more accurate and machine responses more meaningful?
Some suggest these cues are too individual in nature to augment human-to-machine communication. Others point out that human experts can identify and interpret these audio and visual cues in anyone they observe, which suggests computer intelligence could be programmed to interpret them as well.
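One common way to combine speech with a secondary cue is "late fusion": each model scores the competing hypotheses independently, and the scores are blended. The sketch below is purely illustrative; the hypothesis strings, probabilities, and weighting are assumptions, not a real system.

```python
# Hypothetical late-fusion sketch: blend per-hypothesis confidence scores
# from an audio (ASR) model and a visual-cue (e.g. lip-reading) model.
import math

def fuse_scores(audio_probs, visual_probs, audio_weight=0.7):
    """Weighted log-linear fusion of two per-hypothesis probability dicts.

    Returns the hypothesis with the highest fused log-score.
    """
    fused = {}
    for hyp in audio_probs:
        log_audio = math.log(audio_probs[hyp])
        # Small floor avoids log(0) if the visual model never saw this hypothesis
        log_visual = math.log(visual_probs.get(hyp, 1e-9))
        fused[hyp] = audio_weight * log_audio + (1 - audio_weight) * log_visual
    return max(fused, key=fused.get)

# Audio alone slightly prefers the wrong transcription; the visual cue
# tips the fused decision back toward the intended phrase.
audio = {"recognize speech": 0.48, "wreck a nice beach": 0.52}
visual = {"recognize speech": 0.90, "wreck a nice beach": 0.10}
print(fuse_scores(audio, visual))  # prints "recognize speech"
```

The weighting here is a fixed assumption; a practical system would adapt it, e.g. trusting the visual stream more in noisy rooms.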
Let’s identify specific applications where a multifactor interface (speech plus other visual and audio cues) would add value in the new world where your voice becomes the primary interface to consumer products.
What wearable and IoT applications featuring audio will make a splash and stick? What applications will just splat? What voice, speech, and audio technologies will make a difference in these emerging applications? What advances in sound technology are needed in the next five years to create compelling and sustainable applications in this space? Let's brainstorm potential applications with real consumer value propositions, debate their merit, prioritize them, and define a sound-technology roadmap to support them.
Dealing with thousands of dialogue lines has inspired ingenuity, creativity, tantrums and nervous breakdowns. The software used is myriad and methods swing from lunacy to genius. Get down and dirty therapy; share stories and find Holy Grails.
Almost every studio and every audio engineer still has a different approach and different 'standards' when it comes to recording and mastering dialogue. Can we thrash it out and write a best-practice Bible on one side of a sheet of paper?
We’ve got two days to validate a business model for a new company that we have created out of the ether of beer, sweat and BBQ.
We will identify our target customers, understand what they’re bitching about, and propose what we’re going to do about it & blow their socks off. We’ll run experiments on other groups (our target customers) to validate the hypotheses of our model. We’ll iterate, and pivot and all that good stuff. And when our model is complete, we will be ready to put a dent in the universe and make a ton of cash. Yeehaw!
With some sincerity, this topic is really about tech and business, not just tech. I’ve heard as many grumblings about business and politics at BBQ as I have about tech problems. Like Love and Marriage, you can’t have one without the other. So, what if we set out to solve some business problems?
If 3D printing is the next big thing in technology, what opportunities does it afford when it comes to audio?
Regardless of where you stand on the “Is High-Resolution Audio Worth It” debate, the marketing departments have already opened the barn door and the cows are out. In the corporate world’s never-ending quest for brand differentiation, market relevance, and lavish CEO compensation packages, “High Resolution Audio” is already being sold as the “Next Big Thing” in audio. As the owners of audio for the next five years, do we:
1: Ignore it: “That’s Snake Oil and we don’t want any part of it!”
2: Sell it: “We’ve got the BEST Snake Oil, and we’re gonna milk this to stay employed for a few more years!”
3: Build it: “We love it, you really want it, and we’re gonna do it right, even if that means ultrasonic tweeters in your headphone cans!”
I see that the 2011 "Galileo" group might be real big on topic 1, where we discuss whether or not High-Resolution Audio really does bring audio happiness. I think we'll need some Screaming Monkeys when we get to topic 2, because we need to know if the new Monkey Bus can support High-Res. Finally, for #3, I think we need support from the Doppler Chickens. As engineers, let's do it right. I know mics can go ultrasonic, but what about speakers? How do we engineer speakers, microspeakers, and headphone receivers that can cleanly reach 40 kHz? What about low-power portable audio amplifiers with no intermodulation distortion up there? What about headphone (both circumaural and in-ear) design considerations? What kinds of transducers are we looking at? How about test systems and standards? I don't think the Type 3.3 ears go that high. Where do we go from here?
When it’s time to build the dream product, or series of products, yer gonna want yer best possible noises.
So you bring in the Dream Team, naturally, but how best to set them up? What tricks have Time and Experience taught us about the environment, the structure, and the attitude, and how can we anticipate changes to those lessons over the next few years?
How do these things change in the future, with vastly improved collaboration tools, teleconferencing, telepresence, lifecasting, and new tools blurring the lines between "integration," "music," and "sound design"? Who punches the clock at the factory, and who commutes to his garage studio in bunny slippers? How big are the teams? How do we account for slippery job titles? How frequent are physical/virtual meetings? What flow of control and command will work best? What about the interactions Audio has with the deeper technical teams and loftier Vision-Holders? Collaborations with outside contributors? What about it? HUH? WHAT ABOUT IT?!?!?!?
According to a 2013 survey by Motorola, more people watch TV and movies on tablets than on television sets/home theatres. More people than ever consume most of their sound over headphones, rather than speakers.
- How do we adjust our creative practices for headphone listening?
- What improvements to headphone sound do we need?
- How can we prevent people from damaging their ears when so much sound is consumed loudly on headphones?
- What other questions do we need to explore and address related to headphone listening?
There are elements that could tie in with the binaural tracking workgroup proposal.
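The hearing-safety question above can be made concrete with a standard exposure rule of thumb: NIOSH-style criteria allow about 8 hours per day at 85 dB(A), halving the safe duration for every 3 dB increase. The sketch below applies that formula; the specific playback levels are illustrative examples, not measurements of any real headphone.

```python
def safe_listening_hours(level_db, ref_db=85.0, ref_hours=8.0, exchange_db=3.0):
    """Recommended maximum daily exposure under a NIOSH-style criterion:
    8 hours at 85 dB(A), halved for every 3 dB above that."""
    return ref_hours / (2 ** ((level_db - ref_db) / exchange_db))

print(safe_listening_hours(85))   # 8.0 hours
print(safe_listening_hours(94))   # 1.0 hour
print(safe_listening_hours(100))  # 0.25 hours (15 minutes)
```

A player or OS could use a calculation like this to warn listeners as their daily dose accumulates, rather than simply capping volume.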
More and more gesture-based computer controllers are being developed (Leap Motion, Myo, etc.). In movies and TV (hello, Star Trek fans), these always have sounds, but so far these devices are usually released silent, leaving each application that uses the device to implement its own sound effects.
Should gestural sounds have some form of standardization, in the way that keyboard sounds and mouse clicks have? What are some “universal” gestures that might need a standard set of interface sounds?
The New York Times reported that Google Fiber “is so fast, it’s hard to know what to do with it.” After downloading 612 kitten photos in one second, researchers wondered what to do next with the gigabit connection.
Could the killer app involve audio? As gigabit speeds roll out across America, what are the audio opportunities?
If you were to build a recording studio for games dialogue recording what would it be like? Regular recording studios and existing recording software are the round hole that the square peg, nay, the multi-dimensional spaghetti peg, of games gets hammered through. What needs to change to make the games specialist studio technically on the nail and creatively inspiring?
Anyone up for brainstorming the ultimate studio design and recording software wish list? Is the market for games production big enough for the likes of Avid, Steinberg, Adobe, Sony, etc. to take commercial interest in game-specific tools/features?
Topics of discussion: the effective use of head-tracking methods in binaural and augmented reality applications. How much does a head-tracking device improve center-image stability? Can a realistic 2D and 3D experience happen without head tracking? Will augmented reality devices make head tracking mandatory? What degree of accuracy is attainable, and how much is actually required?
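The center-image stability question comes down to one core operation: subtracting the listener's head yaw from each source's world-frame azimuth before binaural rendering, so the image stays fixed in the room as the head turns. The sketch below is a minimal illustration of that compensation (yaw only, degrees); it is an assumption about the simplest case, not a full 3D rotation or any particular renderer's API.

```python
def render_azimuth(source_az_deg, head_yaw_deg):
    """Azimuth to feed the binaural renderer: world-frame source azimuth
    minus head yaw, wrapped to the range (-180, 180] degrees."""
    az = (source_az_deg - head_yaw_deg) % 360.0
    if az > 180.0:
        az -= 360.0
    return az

# Listener turns 30 degrees to the right; a center (0 degree) source must now
# be rendered 30 degrees to the left so it appears fixed in the room.
print(render_azimuth(0.0, 30.0))    # -30.0
print(render_azimuth(170.0, -20.0)) # -170.0 (wraps behind the head)
```

Without this compensation the whole sound stage turns with the head, which is exactly the instability the workgroup is asking about; tracker latency and accuracy then determine how convincingly the compensated image holds still.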