Many portable devices are already loaded with sensors, and the Internet of Things will provide even more sensor data. The amount of available information will be enormous, and we should figure out how to use it to the advantage of audio. The question is: how can a combination of audio components (microphones, speakers, earpieces) and sensors (accelerometers, gyroscopes, pressure sensors/altimeters, thermometers, humidity sensors, etc.) be much more than the sum of the individual components? What sort of sensor data could be used to improve audio? Could data from the microphone (or even the speaker or earpiece?) be used to complement the sensors? What sort of new applications can we come up with when we have sensor data available?
Coming up with the higher-level ideas could be enough, but if there’s time, here are some technical questions that could also be discussed:
– What are the requirements for the interface? (There’s the i-word again…)
– What applications are latency critical? What are the latency requirements?
– Data bandwidth requirements?
– Should sensors and audio components use the same interface (bus)?
– Would it be beneficial to have a direct connection between sensors and audio components?
– Would audio/sensor hubs be the right way to go?
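To make the bandwidth and latency questions above concrete, here is a back-of-envelope sketch. All device counts, sample rates, and bit widths are illustrative assumptions, not requirements from any particular interface standard:

```python
# Back-of-envelope bandwidth and latency figures for a shared audio/sensor bus.
# The rates and widths below are invented illustrative values.

def stream_bandwidth_bps(sample_rate_hz, bits_per_sample, channels):
    """Raw payload bandwidth of one PCM-style data stream, in bits/s."""
    return sample_rate_hz * bits_per_sample * channels

# A stereo microphone pair dominates the bus:
audio_bps = stream_bandwidth_bps(48_000, 24, 2)          # 2,304,000 b/s

# A 6-axis IMU (accelerometer + gyroscope) is comparatively tiny:
imu_bps = stream_bandwidth_bps(1_000, 16, 6)             # 96,000 b/s

# For latency, buffering usually dominates wire time:
frames_per_buffer = 96
buffer_latency_ms = frames_per_buffer / 48_000 * 1_000   # 2.0 ms per buffer

print(f"audio: {audio_bps:,} b/s, IMU: {imu_bps:,} b/s, "
      f"buffer latency: {buffer_latency_ms} ms")
```

The asymmetry is the point: sensor data is orders of magnitude lighter than audio, so sharing a bus is cheap in bandwidth terms, and the real constraints are latency and timestamp alignment.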
We often talk about ‘immersive audio’, where one feels like they are in the middle of a game, orchestra or movie. The use of spatial audio (HRTFs, room models, BRIRs, etc.) to render these immersive scenes is usually the ‘go-to’ idea. Some of the problems with synthetic spatial audio, as well as binaural field recordings, are:
1) The visual cues are missing or wrong.
2) Head motion is not taken into account.
3) HRTFs are generic and not individualized.
4) The listener’s environment is not taken into account.
That last point is particularly important. If you have a binaural recording made in a small room, but you listen to it in a large room, it will sound terribly colored. In fact, if the room you are listening in is not taken into account, any synthetic or binaural recording will have coloration.
Another big issue is that, if the visual cue is missing, the listener tends to localize the sound behind them (or at least somewhere outside of their field of vision).
So what can be done to mitigate these issues? Is this something that we can engineer (i.e., build me some new, celebrity-endorsed headphones), or is it a matter of getting the signal processing just right (can you say ‘head tracker’, hallelujah!), or are there limitations at the cognitive level that need to be addressed?
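To show what the head-tracker idea buys us, here is a toy sketch: as the head turns, the rendered source direction counter-rotates so the source stays fixed in the room, which is known to help resolve front/back confusion. The constant-power pan is a crude stand-in for real HRTF filtering, and all names here are illustrative:

```python
import math

def world_locked_azimuth(source_az_deg, head_yaw_deg):
    """Azimuth of the source relative to the head: as the head turns,
    the rendered direction counter-rotates so the source stays put
    in the room. A real renderer would then select/interpolate HRTFs."""
    return (source_az_deg - head_yaw_deg + 180.0) % 360.0 - 180.0

def crude_pan_gains(az_deg):
    """Constant-power stereo pan (a stand-in for HRTF filtering).
    Maps az in [-90, +90] degrees to (left, right) gains."""
    theta = math.radians((az_deg + 90.0) / 2.0)
    return math.cos(theta), math.sin(theta)

# Source fixed at 30 deg to the right; listener turns head 30 deg right:
az = world_locked_azimuth(30.0, 30.0)   # source is now dead ahead
left, right = crude_pan_gains(az)       # equal gains, centered image
```

Without the head-yaw term the image would ride along with the head, which is exactly the cue mismatch that pushes localization behind the listener.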
It saddens me to see two people sharing a single set of earbuds, one lonely channel per listener. What if there were an app that let one person broadcast music to the mobile devices of nearby friends? Add voice and text chat, plus some ways to monetize the service through DSP plug-ins, hardware add-ons, and referral fees for promoting music.
What are the requirements, and what types of businesses could result?
Failing that, how about some really immersive headphones?
Is there value in connecting 16 microphones in a system?
– Is there a simple way of connecting these 16 microphones to the system?
– What requirements would need to be met to fulfill the use case?
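As a rough sizing of the 16-microphone question, here is the bit clock a generic TDM-style framing would need. The slot layout is an illustration, not any particular interface specification:

```python
# Rough sizing of a 16-microphone capture bus using generic TDM framing.
# Slot widths and sample rates below are assumed illustrative values.

def tdm_bit_clock_hz(sample_rate_hz, slot_bits, n_slots):
    """Bit clock needed to carry n_slots time-division slots per frame."""
    return sample_rate_hz * slot_bits * n_slots

# 16 mics at 48 kHz with 32-bit slots (24-bit samples padded to 32):
bclk = tdm_bit_clock_hz(48_000, 32, 16)   # 24,576,000 Hz bit clock
payload_bps = 48_000 * 24 * 16            # 18,432,000 b/s actually used
```

A ~25 MHz bit clock is still modest for a modern SoC, so raw bandwidth is not the obstacle; pin count, clock/data routing to 16 physical mic positions, and synchronized sampling across all channels are the harder requirements.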
Always-on voice recognition is becoming popular. It is a tough use case: the system must accurately recognize a voice amongst a crowd of people and react to the right voice. This could require adding more microphones. How can those microphones be connected easily to the main chipset, and what are the requirements for this?
Other considerations include standby current (while being always on).
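The standby-current point is usually addressed by staging the processing: a cheap detector runs continuously and wakes the expensive recognizer only when there is something to hear. A toy sketch of such an energy gate, with arbitrary assumed thresholds and frame sizes:

```python
# Toy illustration of staged always-on listening: a cheap energy gate
# decides when to power up the heavy voice-recognition path.
# Threshold and frame length are arbitrary assumptions.

def frame_energy(samples):
    """Mean-square energy of one audio frame."""
    return sum(s * s for s in samples) / len(samples)

def should_wake(samples, threshold=0.01):
    """True if the expensive recognizer should be powered up."""
    return frame_energy(samples) >= threshold

silence = [0.001] * 160        # ~10 ms of near-silence at 16 kHz
speechy = [0.3, -0.3] * 80     # crude loud frame of the same length

assert not should_wake(silence)
assert should_wake(speechy)
```

In hardware this gate often lives in the microphone or codec itself, which is one argument for putting some intelligence on the audio bus rather than in the host.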
There is a movement underway to move audio processing off the host processor and onto dedicated hardware intended for audio processing. However, there is a lot of information coming to light showing that offloading doesn’t provide many benefits for the most common use cases; in fact, the only scenario in which it may be useful is listening to music for hours on analog headphones. Let’s flesh out what is truly useful, and which use cases justify the extra engineering, expense, and segmentation required for hardware offloading on the most popular computing platforms.
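The long-playback case can be framed with a simple battery-life calculation. All power figures below are invented placeholders; real numbers vary widely by SoC, codec, and workload:

```python
# Back-of-envelope battery-life comparison for the offload question.
# All power figures are made-up placeholders for illustration only.

def playback_hours(battery_mwh, decode_mw, overhead_mw):
    """Hours of playback given decode power plus always-on system overhead."""
    return battery_mwh / (decode_mw + overhead_mw)

battery = 10_000.0   # assume a 10 Wh battery, expressed in mWh

host_hours = playback_hours(battery, 200.0, 50.0)   # host CPU kept awake
dsp_hours = playback_hours(battery, 20.0, 50.0)     # dedicated low-power DSP
```

Under these assumed numbers the DSP path more than triples playback time, but the win shrinks quickly once the host must stay awake anyway (screen on, network active), which is the crux of the “is offload worth it?” debate.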
With the Tesla Model S earning Car of the Year nods from several leading auto publications, and the company’s subsequent increase in sales and government incentives to push alternative fuels, the impact that the silence of electric vehicles has on pedestrians is being considered by a number of car manufacturers. For example, Audi has fitted a sound generation system to the outside of its electric version of the R8. See this video: http://www.youtube.com/watch?v=Yungwc92gFo
Should these kinds of sounds, artificially generated on the outside of electric (or other silent) vehicles, be made to sound like a traditional gasoline engine and exhaust, or should there be some sort of iconic, industry-wide sound developed, similar to the chirping sound currently used to assist the sight-impaired at crosswalks? Should sound generation only be done when proximity to a collision is detected via peer-to-peer communication with other connected devices, or should it be in an “always on” state? This decision could have a profound impact on how humans perceive silent or near-silent approaching vehicles in the future.
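As a toy illustration of the design space, here is a minimal additive-synthesis sketch of a speed-dependent warning tone. The speed-to-pitch mapping and harmonic weights are made-up values, not anything from the Audi system or any regulation:

```python
import math

# Toy additive synthesis of a speed-dependent exterior warning tone.
# The speed-to-pitch mapping and harmonic rolloff are invented values.

def warning_tone(speed_kmh, duration_s=0.1, sample_rate=16_000):
    """Synthesize a tone whose fundamental rises with vehicle speed,
    with a few decaying harmonics for an engine-like timbre."""
    f0 = 80.0 + 4.0 * speed_kmh                   # assumed pitch mapping
    n = int(duration_s * sample_rate)
    harmonics = [(1, 1.0), (2, 0.5), (3, 0.25)]   # (multiple, amplitude)
    return [
        sum(a * math.sin(2 * math.pi * k * f0 * t / sample_rate)
            for k, a in harmonics)
        for t in range(n)
    ]

tone = warning_tone(30.0)   # 200 Hz fundamental at 30 km/h
```

Even this trivial sketch raises the central question: the parameters (pitch range, timbre, speed mapping) are arbitrary choices, which is exactly why an industry-wide standard sound would need to specify them.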
Working on the assumption that you have a personal car (not a train or public transport) and that it is completely safe and travels point to point smoothly: what does it look and sound like on the inside?
Steering wheel? Chairs point inward / rotating? Speaker placement change? Microphone arrays? Kinect-like IR?