


For those who have never heard of Ambisonics, it's a surround-sound format based on taking the concept of differential stereo and extending it to three dimensions. Differential stereo is of course where instead of storing/transmitting each channel independently (as on audio cassettes or uncompressed digital audio formats like CDs) you store/transmit a sum of the two channels (L+R) and a difference (L-R) (as on vinyl records or analog FM broadcasts). Monophonic devices can just reproduce the sum channel and ignore the difference channel. Stereophonic devices can recover the two stereo channels through simple signal processing: (L+R) + (L-R) = 2L, (L+R) - (L-R) = 2R.
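
To make the sum/difference trick concrete, here is a tiny sketch in Python (sample values and names are just illustrative):

    def ms_encode(left, right):
        # L+R is what a mono device plays; L-R is simply ignored by it
        mid = [l + r for l, r in zip(left, right)]
        side = [l - r for l, r in zip(left, right)]
        return mid, side

    def ms_decode(mid, side):
        # (L+R) + (L-R) = 2L and (L+R) - (L-R) = 2R, so halve the results
        left = [(m + s) / 2 for m, s in zip(mid, side)]
        right = [(m - s) / 2 for m, s in zip(mid, side)]
        return left, right

    left, right = [0.5, -0.25], [0.125, 0.25]
    assert ms_decode(*ms_encode(left, right)) == (left, right)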

Ambisonics essentially asks: what if we also stored a front-back and a top-bottom signal? It turns out these four signals (known as B-format: W, X, Y, Z) are enough to represent a full-sphere sound field, at least to first order; higher orders add channels for finer spatial resolution. Additionally, unlike traditional 5.1/7.1-style formats, which assume a fixed speaker layout, an Ambisonics signal can be adapted with digital signal processing to any number of speakers in any positions, or rendered through an HRTF to produce virtualized 3D audio for headphones.
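
For the curious, first-order panning really is just four gains per source direction. A minimal sketch, using the classic FuMa convention for clarity (IIRC Opus's Ambisonics mapping uses ACN channel ordering and SN3D normalization instead):

    import math

    def encode_bformat(sample, azimuth, elevation):
        # Pan one mono sample to a direction (angles in radians)
        w = sample / math.sqrt(2)                             # omnidirectional
        x = sample * math.cos(azimuth) * math.cos(elevation)  # front-back
        y = sample * math.sin(azimuth) * math.cos(elevation)  # left-right
        z = sample * math.sin(elevation)                      # up-down
        return w, x, y, z

    def decode_to_speaker(w, x, y, z, azimuth, elevation):
        # Naive "sampling" decoder: project the scene onto a speaker direction
        return w + (x * math.cos(azimuth) * math.cos(elevation)
                    + y * math.sin(azimuth) * math.cos(elevation)
                    + z * math.sin(elevation))

Real decoders are smarter than this projection, but the point stands: the speaker layout only enters at decode time, not when the scene is recorded or mixed.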

Ambisonics had the misfortune of being invented in the 1970s, before digital signal processing made practical use of it cheap and easy to implement, so inferior (but cheaper) formats like Dolby Pro Logic ended up winning in both the consumer and professional spaces. From an open-source perspective, though, this makes Ambisonics compelling: all of the important patents related to it should be well past their expiration by now.


Do we know how Apple's Spatial Audio works? And what is its relationship to Atmos?


Dolby Atmos allows you to specify multiple channels together with spatial metadata. The Ambisonics support in Opus is quite similar: the spec allows the encoder to specify an arbitrary matrix used for mixing the audio. If you read Apple's blog post, it seems they use the Dolby Atmos format rather than a homegrown solution.
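
For reference, that matrix support is (I believe) RFC 8486's channel mapping family 3: the header carries a demixing matrix, and the decoder multiplies the decoded streams by it to recover the Ambisonics channels; family 2 is the no-matrix case. A sketch of just the matrixing step, with plain lists standing in for decoded audio:

    def apply_demixing_matrix(matrix, streams):
        # output[i][t] = sum over j of matrix[i][j] * streams[j][t]
        n_samples = len(streams[0])
        return [[sum(row[j] * streams[j][t] for j in range(len(streams)))
                 for t in range(n_samples)]
                for row in matrix]

    # With an identity matrix this degenerates to family 2's direct mapping.
    streams = [[1.0, 2.0], [3.0, 4.0]]
    assert apply_demixing_matrix([[1, 0], [0, 1]], streams) == streams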


Doing some reading just now, it seems Atmos is what's called an OBA (object-based audio) format. It's fundamentally lower-level than Ambisonics, which falls into the SBA (scene-based audio) category, so it should be possible to mix it down to Ambisonics. In fact I suspect the Atmos decoder does something like that on its way to creating signals for each speaker. OBA formats generally require more bandwidth than SBA formats. The benefit is that the end user can customize the mix (e.g. mute certain sources). In the case where there are many mutually-exclusive objects (e.g. dialog in different languages), the bandwidth advantage of SBA diminishes...
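
A hedged sketch of what such a mixdown could look like: each object (a mono signal plus a direction) gets panned into first-order B-format and summed into one fixed-channel scene. The object representation here is invented for illustration; real Atmos metadata is much richer.

    import math

    def pan(sample, az, el):
        # First-order B-format gains, as in the earlier sketch
        return (sample / math.sqrt(2),
                sample * math.cos(az) * math.cos(el),
                sample * math.sin(az) * math.cos(el),
                sample * math.sin(el))

    def objects_to_bformat(objects, n_samples):
        # objects: list of (samples, azimuth, elevation) tuples
        scene = [[0.0] * n_samples for _ in range(4)]  # W, X, Y, Z
        for samples, az, el in objects:
            for t, s in enumerate(samples):
                for c, v in enumerate(pan(s, az, el)):
                    scene[c][t] += v
        return scene

Note the scene stays four channels no matter how many objects you add, which is exactly the SBA bandwidth advantage described above.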


Maybe there is no first-class support for object-based audio in the main Opus encoder, but as I said above, you can encode arbitrary matrices, so you can encode arbitrary positions for channels. That's enough for stationary objects. Moving objects (I'm not sure whether Atmos supports them) can't be encoded in a single Opus stream, but one can chain multiple Opus streams one after another. I don't think you need high time resolution for moving objects; something like one stream per 100ms, i.e. ten per second, should be enough. So in theory Opus can get very close to object-based audio, even though the encoder right now only does Ambisonics.
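
To illustrate the piecewise idea, a sketch that approximates a moving source by re-deriving the panning gains once per 100ms block. The block size and the linear azimuth sweep are assumptions, and it's horizontal-only for brevity:

    import math

    BLOCK = 4800  # 100 ms at 48 kHz

    def encode_moving_source(samples, az_start, az_end):
        # One fixed direction per block, stepped along a linear azimuth sweep
        n_blocks = (len(samples) + BLOCK - 1) // BLOCK
        w, x, y = [], [], []
        for b in range(n_blocks):
            az = az_start + (az_end - az_start) * b / max(n_blocks - 1, 1)
            for s in samples[b * BLOCK:(b + 1) * BLOCK]:
                w.append(s / math.sqrt(2))
                x.append(s * math.cos(az))
                y.append(s * math.sin(az))
        return w, x, y

A real encoder would presumably crossfade between blocks to avoid zipper noise at the direction switches.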



