According to a 2013 survey by Motorola, more people now watch TV and movies on tablets than on television sets or home-theatre systems. More people than ever consume most of their sound over headphones rather than speakers.
- How do we adjust our creative practices for headphone listening?
- What improvements to headphone sound do we need?
- How can we prevent people from damaging their ears when so much sound is consumed loudly on headphones?
- What other questions do we need to explore and address related to headphone listening?
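On the hearing-damage question, one concrete reference point is the NIOSH occupational noise criterion: 85 dB(A) over an 8-hour day, with a 3 dB exchange rate (every 3 dB of extra level halves the permissible time). A minimal sketch of that arithmetic, with illustrative function names, could be a starting point for a "listening dose" meter on a phone:

```python
# Sketch: NIOSH-style daily noise-dose estimate (85 dB(A) criterion,
# 3 dB exchange rate). Function names are illustrative, not from any API.

def allowed_hours(level_db, criterion_db=85.0, exchange_db=3.0, base_hours=8.0):
    """Permissible daily listening time (hours) at a given A-weighted level."""
    return base_hours / 2 ** ((level_db - criterion_db) / exchange_db)

def dose_percent(level_db, hours):
    """Percentage of the daily noise dose consumed by `hours` at `level_db`."""
    return 100.0 * hours / allowed_hours(level_db)
```

For example, 91 dB(A) halves the allowance twice, so two hours of listening already consumes the full daily dose.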
There are elements that could tie in with the binaural tracking workgroup proposal.
More and more gesture-based computer controllers are being developed (Leap Motion, Myo, etc.). In movies and TV (hello, Star Trek fans), these interfaces always make sounds, but in reality the devices are usually released silent, leaving each application that uses them to implement its own sound effects.
Should gestural sounds have some form of standardization, in the way that keyboard sounds and mouse clicks have? What are some “universal” gestures that might need a standard set of interface sounds?
The New York Times reported that Google Fiber “is so fast, it’s hard to know what to do with it.” After downloading 612 kitten photos in one second, researchers wondered what to do next with the gigabit connection.
Could the killer app involve audio? As gigabit speeds roll out across America, what are the audio opportunities?
If you were to build a recording studio for games dialogue recording, what would it be like? Regular recording studios and existing recording software are the round hole that the square peg, nay, the multi-dimensional spaghetti peg, of games gets hammered through. What needs to change to make a games-specialist studio technically on the nail and creatively inspiring?
Anyone up for brainstorming the ultimate studio design and recording-software wish list? Is the market for games production big enough for the likes of Avid, Steinberg, Adobe, Sony, etc. to take commercial interest in game-specific tools and features?
Topics of discussion: the effective use of head-tracking methods in binaural and augmented-reality applications.
- How much center-image stability is gained by using a head-tracking device?
- Can a realistic 2D and 3D experience happen without head-tracking methods?
- Will augmented-reality devices make head tracking mandatory?
- What degree of accuracy is attainable, and what is the minimum required?
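As a minimal illustration of why tracking stabilizes the center image: a source fixed in the room must be re-rendered at a head-relative azimuth every time the head yaw changes, otherwise the whole scene rotates with the head. A toy sketch, where the simple constant-power panner and all names are illustrative assumptions (a real renderer would select HRTFs instead):

```python
import math

# Sketch: head-relative re-rendering of a room-fixed source.
# Angles in degrees; positive azimuth = to the listener's right.

def relative_azimuth(source_az, head_yaw):
    """Head-relative azimuth of a room-fixed source, wrapped to (-180, 180]."""
    az = (source_az - head_yaw) % 360.0
    return az - 360.0 if az > 180.0 else az

def pan_gains(rel_az, width=90.0):
    """Constant-power stereo gains for a source within +/- width degrees."""
    x = max(-1.0, min(1.0, rel_az / width))  # -1 = hard left, +1 = hard right
    theta = (x + 1.0) * math.pi / 4.0        # 0 .. pi/2
    return math.cos(theta), math.sin(theta)  # (left gain, right gain)
```

Turning the head 30 degrees toward a source at 30 degrees brings it to 0 degrees relative, i.e. dead center, which is exactly the stability an untracked renderer cannot provide.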
Many portable devices are already loaded with sensors, and the Internet of Things will provide even more sensor data. The amount of available information will be enormous, and we should figure out how to use it to the advantage of audio. The question is: how can a combination of audio components (microphones, speakers, earpieces) and sensors (accelerometers, gyroscopes, pressure sensors/altimeters, thermometers, humidity sensors, etc.) be much more than the sum of the individual components? What sort of sensor data could be used to improve audio? Could data from the microphone (or even the speaker or earpiece?) be used to complement the sensors? What sort of new applications can we come up with when sensor data is available?
Coming up with the higher level ideas could be enough but if there’s time, here are some technical questions that could also be discussed:
– What are the requirements for the interface? (There’s the i-word again…)
– What applications are latency critical? What are the latency requirements?
– Data bandwidth requirements?
– Should sensors and audio components use the same interface (bus)?
– Would it be beneficial to have a direct connection between sensors and audio components?
– Would audio/sensor hubs be the right way to go?
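One tiny, concrete example of a sensor helping audio: use the accelerometer's gravity vector to detect that a tablet has been rotated 180 degrees and remap the stereo channels so "left" stays on the listener's left. The function names and thresholds below are illustrative assumptions, not any platform's API:

```python
# Sketch: accelerometer-assisted stereo routing. Inputs are the
# gravity components along the device's x and y axes, in units of g.
# Threshold of 0.5 g is an illustrative assumption.

def is_upside_down(accel_x, accel_y):
    """True if gravity along the device's y axis indicates a 180-degree flip."""
    return accel_y > 0.5 and abs(accel_x) < 0.5

def route_stereo(left, right, accel_x, accel_y):
    """Swap the stereo channels when the device is held upside down."""
    if is_upside_down(accel_x, accel_y):
        return right, left
    return left, right
```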
We often talk about ‘immersive audio’, where one feels like they are in the middle of a game, orchestra or movie. The use of spatial audio (HRTFs, room models, BRIRs, etc.) to render these immersive scenes is usually the ‘go-to’ idea. Some of the problems with synthetic spatial audio, as well as binaural field recordings, are:
1) The visual cues are missing or wrong.
2) Head motion is not taken into account.
3) HRTFs are generic and not individualized.
4) The listener’s environment is not taken into account.
That last point is particularly important. If you have a binaural recording made in a small room, but you listen to it in a large room, it will sound terribly colored. In fact, if the room you are listening in is not taken into account, any synthetic or binaural recording will have coloration.
Another big issue is that, if the visual cue is missing, the listener tends to localize the sound behind them (or at least somewhere outside of their field of vision).
So what can be done to mitigate these issues? Is this something that we can engineer (i.e., build me some new, celebrity-endorsed headphones), or is it a matter of getting the signal processing just right (can you say ‘head tracker’, hallelujah!), or are there limitations at the cognitive level that need to be addressed?
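For reference, the core rendering step that points 1–4 above attach to is just convolution of a mono source with a left/right head-related impulse response (HRIR) pair; per point 3, substituting a generic pair for the listener's own is one source of error. A toy pure-Python sketch with illustrative names (real HRIRs are measured, hundreds of taps long):

```python
# Sketch: the basic binaural rendering step - FIR convolution of a
# mono signal with an HRIR pair for one static source direction.

def convolve(signal, impulse_response):
    """Direct-form FIR convolution (length: len(signal) + len(ir) - 1)."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out

def render_binaural(mono, hrir_left, hrir_right):
    """Return the (left, right) ear signals for one static source."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)
```

Points 2 and 4 are exactly what this static picture leaves out: the HRIR pair never changes with head motion, and no term models the playback room.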
It saddens me to see two people sharing a single set of earbuds, one lonely channel per listener. What if there were an app that let one person broadcast music to the mobile devices of nearby friends? Add voice and text chat, plus some ways to monetize the service through DSP plug-ins, hardware add-ons, and referral fees for promoting music.
What are the requirements, and what types of businesses could result?
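As one requirement sketch: any such broadcast service needs a frame format that lets late-joining listeners resynchronize. The toy packet layout below is a hypothetical design for illustration only, not a real protocol; every field choice is an assumption:

```python
import struct

# Sketch: a toy datagram format for broadcasting audio frames to
# nearby listeners. Header: stream id, sequence number, sample rate
# (stored divided by 100 to fit 16 bits), then raw 16-bit PCM payload.

HEADER = struct.Struct("!IIH")  # stream_id, seq, sample_rate // 100

def pack_frame(stream_id, seq, sample_rate, pcm_bytes):
    """Serialize one audio frame into a datagram."""
    return HEADER.pack(stream_id, seq, sample_rate // 100) + pcm_bytes

def unpack_frame(datagram):
    """Parse a datagram back into (stream_id, seq, sample_rate, payload)."""
    stream_id, seq, sr = HEADER.unpack_from(datagram)
    return stream_id, seq, sr * 100, datagram[HEADER.size:]
```

The sequence number is what makes loss and reordering visible to each receiver, which in turn drives the real design questions: buffering depth, and hence latency between the broadcaster and the listeners.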
Failing that, how about some really immersive headphones?
Is there value in connecting 16 microphones in a system?
– Is there a simple way of connecting these 16 microphones to the system?
– What requirements would be needed to fulfill the use case?
Always-on voice recognition is becoming popular. It is a tough use case: the system must accurately recognize a voice amongst a crowd of people and react to the right voice. This could require adding more microphones. How can those microphones be connected easily to the main chipset, and what are the requirements for this?
Another consideration is standby current (while always on).
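The classic reason for adding those extra microphones is spatial filtering: a delay-and-sum beamformer steers a linear array toward one talker and attenuates the crowd. A toy sketch, with whole-sample delays for simplicity (a real implementation would use fractional-delay filters; names and geometry are illustrative):

```python
import math

# Sketch: delay-and-sum beamforming for a uniform linear mic array.

SPEED_OF_SOUND = 343.0  # m/s

def steering_delays(num_mics, spacing_m, angle_deg, sample_rate):
    """Per-mic compensation delays (samples) that time-align the array
    for a plane wave arriving from angle_deg (0 = broadside)."""
    arrivals = [m * spacing_m * math.sin(math.radians(angle_deg))
                / SPEED_OF_SOUND * sample_rate for m in range(num_mics)]
    latest = max(arrivals)
    return [round(latest - a) for a in arrivals]

def delay_and_sum(channels, delays):
    """Delay each channel by its compensation delay and average."""
    n = len(channels[0])
    out = [0.0] * n
    for ch, d in zip(channels, delays):
        for i in range(n):
            if 0 <= i - d < n:
                out[i] += ch[i - d]
    return [x / len(channels) for x in out]
```

Signals from the steered direction add coherently; everything else averages down, which is what lets the recognizer lock onto "the right voice." This is also where the bus question bites: 16 synchronized channels at 16 kHz/16-bit is about 4 Mbit/s of raw data that must arrive sample-aligned.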
There is a movement underway to move audio processing off the host processor and onto dedicated hardware intended for audio processing. However, a lot of information is coming to light showing that offloading doesn’t provide many benefits for the most common use cases; in fact, the only scenario where it may be useful is listening to music over analog headphones for hours. Let’s flesh out what is truly useful, and which use cases justify the extra engineering, expense, and segmentation required for hardware offloading on the most popular computing platforms.
With the Tesla Model S earning Car of the Year nods from several leading auto publications, the company’s subsequent increase in sales, and government incentives pushing alternative fuels, the impact that the silence of electric vehicles has on pedestrians is being considered by a number of car manufacturers. For example, Audi has a sound-generation system on the outside of the electric version of its R8. See this video: http://www.youtube.com/watch?v=Yungwc92gFo
Should these kinds of sounds, artificially generated on the outside of electric (or other silent) vehicles, be made to sound like a traditional gasoline engine and exhaust, or should there be some sort of iconic, industry-wide sound developed, similar to the chirping sound currently used to assist the sight-impaired at crosswalks? Should sound generation only happen when proximity to a collision is detected via peer-to-peer communication with other connected devices, or should it be in an “always on” state? This decision could have a profound impact on how humans perceive silent or near-silent approaching vehicles in the future.
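To make the "always on vs. proximity-triggered" trade-off concrete, here is one hypothetical gating policy: the warning sound is active at low speed (where tire and wind noise are minimal) and boosted when a pedestrian is detected nearby. Every threshold and name below is an assumption for illustration, not any manufacturer's behavior:

```python
# Sketch: hypothetical gain policy for an external EV warning sound.
# Above ~30 km/h, road noise is assumed to make the vehicle audible.

def warning_gain(speed_kmh, pedestrian_distance_m=None):
    """Return a 0..1 gain for the external warning sound."""
    if speed_kmh >= 30.0:
        return 0.0                       # road noise already audible
    gain = 1.0 - speed_kmh / 30.0        # louder as the car slows
    if pedestrian_distance_m is not None and pedestrian_distance_m < 10.0:
        gain = min(1.0, gain + 0.5)      # boost near detected pedestrians
    return gain
```

A policy like this is a middle ground between the two options in the question: always on at parking-lot speeds, proximity-boosted otherwise.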
Work on the assumption that you have a personal car (not a train or other public transport) that is completely safe and travels point to point smoothly. What does it look and sound like on the inside?
Steering wheel? Chairs point inward / rotating? Speaker placement change? Microphone arrays? Kinect-like IR?