Project Bar-B-Q 2018 report section 8

The Twenty-third Annual Interactive Audio Conference PROJECT BAR-B-Q 2018


Group Report:
Taking the “virtual” out of virtual audio


Participants: A.K.A. "Is it Enigma or is it Memorex?"
Andie Ray, Dolby Laboratories
Phill Williams, Netflix
David Roach, Magic Leap
Mike Minnick, Magic Leap
Scott McNeese, Surfaceink
Martin Puryear, Google
Brett Patterson, Firelight Technologies Pty Ltd

Problem Statement

“Everything that you perceive with your ears is coloring every other perception you have and every conscious thought that you have. Sound gets in so fast that it modifies all other input and sets the stage for it.”
- Seth Horowitz

Most audio is hyper-real. Audio for games, TV, and movies goes beyond ‘realistic’ sound to suspend your disbelief and immerse you in an experience. Even meticulously recorded music is mixed into a hyper-real presentation. Today, audio that attempts to be truly ‘real’ is limited to certain musical recordings and a handful of other content. AR experiences are currently a small but growing subset.

Distractions are far more difficult to deal with in AR, since both virtual video and virtual audio objects must be modeled accurately. It’s much more difficult to suspend disbelief when we are living in the real world.

As AR/MR rises in popularity, it is now more important than ever to be able to convey real audio. Hyper-real sounds pull you out of an augmented environment. But it is not sufficient to simply make binaural recordings of environments: the sound is interactive, and aspects of it have to be procedurally generated.

So what is needed, and what are the obstacles?

Obstacles

- Fidelity
- Virtual objects must match the real world acoustics
- The content must be balanced with the real world to be believable
- The content must not sound arbitrarily generated (e.g., footsteps should vary and never repeat)

Out-of-scope topics for this paper

- Spatialization
- Real world objects interacting with virtual occlusions
- Reproduction of content coming out of virtual speakers
- Speech synthesis
- Synthesis techniques

The uncanny valley of audio

Uncanny Valley: the phenomenon whereby a computer-generated figure or humanoid robot bearing a near-identical resemblance to a human being arouses a sense of unease or revulsion in the person viewing it.

Figure: The Uncanny Valley

To date, there is only a small amount of literature on the topic of an “aural” uncanny valley. This will be an area of great interest in the future, because for an audio presentation to be real, the uncanny valley must be avoided. While it is beyond the scope of this document to attempt to fully define the aural uncanny valley, here are some references.

From “The audio Uncanny Valley: Sound, fear and the horror game”:

- Certain amplitude envelopes applied to sound affect perceptions of urgency.
- Frequency might have an effect on the unpleasantness of sound and this might lead to negative affect.
- Familiar or iconic sounds can be defamiliarized and this can lead to perceptions of uncanniness.
- Uncertainty about the location of a sound source, its cause or its meaning in the virtual world increases the fear emotion.
- An aural resolution that is lower than a high quality, human-like visual resolution might lead to the uncanny.
- An exaggerated articulation of the mouth whilst speaking might lead to the uncanny.
- A lack of synchronization between lips and voice for photo-realistic virtual characters leads to a perception of the uncanny. In particular, sound that precedes associated video can be very unsettling.
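
The last observation above, that sound preceding its associated video is especially unsettling, suggests treating audio lead and audio lag asymmetrically when checking synchronization. The following is a minimal Python sketch of such a check, not a method taken from the cited work. The `AvEvent` type, the function names, and both tolerance values are hypothetical placeholders chosen for illustration; usable thresholds would have to come from listening tests.

```python
from dataclasses import dataclass

# Hypothetical tolerances in milliseconds. These are placeholder values for
# illustration only; the lead limit is deliberately tighter than the lag limit
# because the text singles out audio that precedes video as most unsettling.
MAX_AUDIO_LEAD_MS = 20.0
MAX_AUDIO_LAG_MS = 60.0


@dataclass
class AvEvent:
    """One audiovisual event with the presentation time of each modality."""
    name: str
    audio_time_ms: float
    video_time_ms: float


def sync_offset_ms(event: AvEvent) -> float:
    """Return audio time minus video time.

    A negative value means the audio precedes the video.
    A positive value means the audio arrives after the video.
    """
    return event.audio_time_ms - event.video_time_ms


def classify_sync(event: AvEvent,
                  max_lead_ms: float = MAX_AUDIO_LEAD_MS,
                  max_lag_ms: float = MAX_AUDIO_LAG_MS) -> str:
    """Label an event as 'audio_leads', 'audio_lags', or 'in_sync'."""
    offset = sync_offset_ms(event)
    if offset < -max_lead_ms:
        return "audio_leads"
    if offset > max_lag_ms:
        return "audio_lags"
    return "in_sync"
```

For example, `classify_sync(AvEvent("door_slam", 100.0, 130.0))` labels the event `audio_leads`, because the audio arrives 30 ms before the picture, while the same 30 ms offset in the other direction would still count as `in_sync` under these placeholder limits.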
Recommendations and observations

Fidelity

- Industrial design constraints
  - Speaker/transducer location: closer to the ear is better, but not on or blocking the ear
  - Speaker/transducer size: larger is usually better
  - Transducer quality: low distortion
  - Number of transducers: minimum of 2 speakers required for spatial audio
- Frequency response at ear
  - Speaker/transducer placement: closer to the ear is better, but not on or blocking the ear
- HRTF accuracy
  - Object-to-ear HRTF: personalized is best
  - Speaker-to-ear HRTF: needs to be considered for near-ear speakers
- Computing resources: limited on battery-powered devices
  - M/AR is not a tethered experience

Virtual objects must match the real world acoustics

- Need a means to measure and match the room’s reverb
- Virtual audio objects must be occluded in the same way that real objects are
- Ideally, real-world audio would also be occluded by virtual visual objects

The content must be balanced with the real world to be believable

- Eliminate visual or physical distractions that may detract from the experience (e.g., hiding speakers, using near-ear speakers instead of headphones)
- Ensure that content is not being masked by real world sounds

The content must not sound arbitrarily generated (e.g., footsteps should vary and never repeat)

- How can we test this?
- This brings us closer to the original Turing test: is a believable AI required to achieve this?

What is good enough to be believable?

- We need to better understand and quantify the threshold for what sounds real vs fake
- It often depends on the listener’s intention: “I want to be excited” (hyper-real) vs “I want it to be real”
- We need to support both “They are here” and “We are there” use cases
- Is it a background or foreground experience?
- The believability is affected by the experience, expectations, and age of the listener
- How realistic are any synthetic “organic” sounds?
- Audio cues extend the perceived field of view and the graphics depth of field
- If the related video is not realistic, then audio may be less believable

Realness Subjective Tests

Below are two examples of subjective test methods that may be useful to rate the realness of an M/AR audio system. They are presented as introductory ideas; other test methodologies may be equal to or better than these.

Example of an attentive test of audio realness

In front of the listener is an acoustically-transparent screen (or the subject is blindfolded). Behind the screen is an acoustic sound source. Also present is a playback system. During the test, either the acoustic source or the playback system plays a sound. If the alternation is random, the listener may be asked to identify the sound source each time. If the alternation is not random, the listener may be asked to identify the real source after a set of iterations.

Questions

- What would the speaker-produced audio need to do?
  - Would it need to make mistakes/variances as any acoustic source would?
  - Would it need to express movement?
  - Would it need to make ‘human’ or ‘natural’ like sounds in addition to the desired sound?
- Can one distinguish synthetic content from acoustically generated content?

Thoughts

It is clear that if only pre-recorded or synthesized sounds are used, there would need to be a corpus of them to convey the variance expected of a natural sound. Perhaps one practical way to accomplish the test would be to procedurally generate the sounds.

Parameters for a passive test of audio realness

A measure of audio realness is the ability of the audio to contribute to an immersive experience without inadvertently breaking the suspension of disbelief. A test of this attribute would necessarily be passive, as the audio must blend into the experience. This can help us quantify the threshold of believability.
- Create an immersive experience containing sounds/events to be tested
  - One experience might contain background sounds, intended to be experienced inattentively
  - Another might contain foreground sounds, intended to draw attention
  - There could be multiple versions of the experience with parameters changed (reverberance, sound design, AV sync, etc.)
- The subject enters the experience
- Interview the subject after the test about its realism, or ask the listener to score it
- Alternatively, measure the subject’s attentiveness to sound in a non-subjective manner

References & Additional Reading

- The Bandwidth of Human Perception and its Implications for Pro Audio, by Thomas Lund (AES Library)
- The McGurk Effect
- Virtual haircut
- Carnegie Mellon: The challenges of testing in a non deterministic world
- Will Virtual Reality Get Lost in the Uncanny Valley Of Sound?
- Gamasutra: Virtual Reality in the Uncanny Aural Valley
- Examples of ‘uncanny’ sounds generated by ML
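
As a note on the passive test outlined earlier: if several versions of the experience are run with one parameter changed and each listener gives a realism score afterwards, the scores can be grouped by version and compared. The sketch below is one simple way to do that in Python, offered as an illustration rather than as the group's method. The function names, the version labels, and the idea of a numeric score are all assumptions; no scoring scale is specified in the report.

```python
from collections import defaultdict
from statistics import mean


def summarize_realism_scores(responses):
    """Group listener scores by experience version.

    `responses` is an iterable of (version_label, score) pairs, one pair
    per listener session. Returns a dict that maps each version label to
    a (number_of_scores, mean_score) tuple.
    """
    scores_by_version = defaultdict(list)
    for version, score in responses:
        scores_by_version[version].append(score)
    return {
        version: (len(scores), mean(scores))
        for version, scores in scores_by_version.items()
    }


def rank_versions(summary):
    """Return version labels ordered from highest to lowest mean score."""
    return sorted(summary, key=lambda version: summary[version][1], reverse=True)


if __name__ == "__main__":
    # Made-up scores for two hypothetical versions of the same experience.
    example_responses = [
        ("matched_reverb", 4),
        ("matched_reverb", 5),
        ("dry", 2),
        ("dry", 3),
    ]
    summary = summarize_realism_scores(example_responses)
    for version in rank_versions(summary):
        count, average = summary[version]
        print(f"{version}: n={count}, mean={average}")
```

A mean per version is only a starting point; with real data, the spread of scores and the number of listeners would also need to be considered before concluding that one version is more believable than another.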


Copyright 2000-2018, Fat Labs, Inc., ALL RIGHTS RESERVED
www.projectbarbq.com