With tangible excitement in the air around VR, Snap Inc. effectively conquering the smartphone camera with its IPO, and Facebook and Twitter doubling down on live-stream video, it’d be easy to think that video is still the coolest kid at school. But scratch the surface a bit and you’ll hear that audio, not video, is the new black. Yes, podcasts are still white-hot, but investors, startups and enterprises alike are making bets on audio technologies that go beyond content.
ARM in Your Ear
Multiple companies are taking advantage of the ongoing miniaturization of high-power ARM chips to bring ‘hearables’ to market, including Sound Hawk, Doppler Labs (with its Here One) and Apple (with AirPods). Hearables allow consumers not only to augment their hearing but also to control other devices, such as smartphones. Think talking to Siri through your AirPods, or making calls from your Dick Tracy-esque Samsung Gear S smartwatch, which doesn’t need a Samsung phone to make calls.
This is notable because it marks a hard shift from traditional earbuds, which used single-task, barebones hardware (just enough for playback). The devices above all pack the equivalent of a general-purpose computer into a tiny form factor, with the memory and processing to handle a wide variety of tasks. Prepare to see Amazon launch an AirPods competitor with Alexa pre-loaded.
Voice Assistants: Who Said What?
We won’t spend much time on voice assistants such as Alexa, Siri, and Google Home, as the topic du jour has been covered ad nauseam. However, one 800-pound-gorilla-in-the-room problem remains to be solved, and it will demand considerable innovation in software: multiple-person detection.
In short: automatically detecting who is talking to the voice assistant and who is talking to someone else in the room, so that the assistant doesn’t act on speech that wasn’t meant for it.
Right now, the most advanced voice assistant audio tech can slice up an audio stream, dividing words into syllables. A neural network then takes these syllables and ‘guesses’ which words you just said.
What’s missing is metadata attached to each syllable: who said it? That’s why you can start a sentence, your spouse can finish it, and Alexa won’t know the difference. Try it.
This phenomenon can be a bit of fun, but it is extremely annoying when you’d like to talk to Alexa while the rest of the room is engaged in conversation.
“Kids, be quiet so I can ask Alexa about the weather!”
Every voice assistant developer is familiar with this problem, and for voice assistants to cross the chasm into the mainstream, it will have to be solved. Look for this as a differentiating feature once Amazon, Google, Apple or some other party figures out the magic software to make it happen.
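To make the “who said what?” gap concrete, here is a minimal Python sketch of the missing step. It assumes a hypothetical upstream model has already attached a voice embedding (a short vector characterizing the speaker’s voice) to each recognized segment. Each segment is compared against enrolled voices using cosine similarity, a common measure in speaker verification, and only the words from the person addressing the assistant are kept. The Segment type, the enrolled profiles and the 0.8 threshold are illustrative assumptions, not any vendor’s API, and producing reliable embeddings in a noisy room is exactly the hard part this sketch leaves out.

```python
import math
from dataclasses import dataclass


@dataclass
class Segment:
    text: str               # recognized syllable or word
    embedding: list[float]  # hypothetical voice embedding from an upstream model


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def who_said_what(segments, enrolled, threshold=0.8):
    """Tag each segment with the best-matching enrolled speaker, or None if no one matches."""
    tagged = []
    for seg in segments:
        best_name, best_score = None, threshold
        for name, reference in enrolled.items():
            score = cosine(seg.embedding, reference)
            if score >= best_score:
                best_name, best_score = name, score
        tagged.append((seg.text, best_name))
    return tagged


def command_from(tagged, speaker):
    """Keep only the words spoken by the person who addressed the assistant."""
    return " ".join(text for text, who in tagged if who == speaker)
```

With enrolled = {"alice": [1.0, 0.0], "bob": [0.0, 1.0]}, “what’s” and “the” near Alice’s vector and “weather” near Bob’s, command_from(tagged, "alice") returns "what's the": Bob’s interjection is dropped instead of being spliced into her request.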
Spatializing Virtual Reality
When it comes to audio in VR/AR/MR, the fundamental issue is how best to spatialize audio. When you see that virtual chirping bird fly by, the sound of the chirps must match the bird’s position in time and space; if the visual and the aural don’t match up exactly, there is no magical immersive VR experience.
While almost 100 years old, the Head-Related Transfer Function (“HRTF”) is still the workhorse of audio spatialization. Most recently, Dirac Labs published a custom HRTF that promises to adapt to an individual’s actual head, accounting for variations in head and ear shape, to deliver that immersive experience to anyone. The challenge, though, is that HRTF processing uses a lot of CPU. At the risk of tooting our own horn, we take a different tack with a non-HRTF spatializer, which offers the same or better audio quality while using considerably less CPU. Less CPU utilization means less power consumed and more CPU left for other VR-related tasks, a critical advantage for resource-hungry VR on mobile.
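This is not Superpowered’s spatializer, whose internals aren’t covered here. As a generic illustration of why a non-HRTF approach can be cheap, here is a minimal Python sketch of a classic alternative built from two textbook cues: a constant-power pan for the level difference between the ears, and Woodworth’s spherical-head formula for the time difference. The head radius, the speed of sound and the restriction to frontal azimuths are simplifying assumptions. Each output sample costs one multiply per ear, whereas a direct time-domain convolution with an N-tap HRTF costs about N multiply-adds per ear per sample; the trade-off is that this sketch ignores the elevation and front/back cues an HRTF captures.

```python
import math

HEAD_RADIUS_M = 0.0875   # assumed average adult head radius, in meters
SPEED_OF_SOUND = 343.0   # meters per second, roughly room-temperature air


def spatialize(mono, azimuth_deg, sample_rate=48000):
    """Place a mono signal at azimuth_deg (-90 = hard left, 0 = front, +90 = hard right).

    Returns (left, right) sample lists with the same length as mono.
    """
    az = math.radians(max(-90.0, min(90.0, azimuth_deg)))

    # Level cue: constant-power pan, so gain_l**2 + gain_r**2 == 1 at every angle.
    theta = (az / math.pi + 0.5) * (math.pi / 2)  # maps -90..+90 degrees to 0..pi/2
    gain_l, gain_r = math.cos(theta), math.sin(theta)

    # Timing cue: Woodworth's spherical-head estimate of the interaural time difference.
    itd_s = HEAD_RADIUS_M / SPEED_OF_SOUND * (abs(az) + math.sin(abs(az)))
    delay = round(itd_s * sample_rate)  # whole-sample delay applied to the far ear

    left = [s * gain_l for s in mono]
    right = [s * gain_r for s in mono]
    pad = [0.0] * delay
    if az > 0:    # source on the right, so the sound reaches the left ear later
        left = (pad + left)[:len(mono)]
    elif az < 0:  # source on the left, so the right ear is the far ear
        right = (pad + right)[:len(mono)]
    return left, right
```

For example, spatialize(signal, 45.0) at 48 kHz scales the left channel to about 0.38 of the input and delays it by 18 samples (about 0.38 ms), while the right channel passes at about 0.92 with no delay.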
Digital USB Audio
As mobile devices become platforms for VR, digital audio on these devices is also changing rapidly. The iPhone 7, Moto Z, HTC Bolt and HTC U no longer offer the traditional headphone jack.
Prepare for the 3.5mm headphone jack to be replaced by the USB Type-C connector on ALL smartphones in the near future. This means the entire audio chain on a mobile device will be digital; the signal will become analog only outside the device.
The advantage is that consumers will have more flexibility, better control and more choice over audio quality, because the DAC (digital-to-analog converter) moves out of the phone and into accessories such as Sony’s MDR-1ADAC hi-res audio headphones.
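To make “the entire audio chain will be digital” concrete, here is a minimal Python sketch of what a phone hands to a USB headset instead of an analog voltage: interleaved, signed 16-bit little-endian PCM samples, a format USB audio devices commonly accept. The 48 kHz, 16-bit stereo format and the function names are assumptions for illustration; hi-res accessories may use 24-bit samples and higher rates, and a real phone goes through the OS audio stack rather than packing bytes by hand.

```python
import struct


def to_pcm16_stereo(left, right):
    """Interleave two float channels (nominal range -1.0..1.0) into 16-bit little-endian PCM."""
    out = bytearray()
    for l_sample, r_sample in zip(left, right):
        for s in (l_sample, r_sample):
            s = max(-1.0, min(1.0, s))                   # clip instead of wrapping around
            out += struct.pack("<h", round(s * 32767))   # signed 16-bit, little-endian
    return bytes(out)


def bytes_per_second(sample_rate=48000, channels=2, bytes_per_sample=2):
    """Raw PCM data rate the phone must deliver to the external DAC."""
    return sample_rate * channels * bytes_per_sample
```

At the default settings, bytes_per_second() gives 48,000 × 2 × 2 = 192,000 bytes per second. The DAC in the headset, not the phone, turns those numbers back into sound, which is why the choice of accessory now determines the final audio quality.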
Innovation in audio is hardly over and done with. Audio is more than just an adjunct to video. In fact, with recent mobile processors bringing multimedia into our hands and homes, audio is experiencing a renaissance that will have a tremendous impact on the way we live our lives. Long live audio!
Patrick Vlaskovits is the CEO and founder of Superpowered SDK and a two-time New York Times best-selling author.
Gabor Szanto is the founder and CTO of Superpowered.