If the only thing you do with your music is listen, then yes, 24/192 delivers questionable value compared to the more popular 16/44.1 or 16/48 formats.
However, all the musicians I know use these high-res formats internally. The reason is that when you apply audio effects, especially complex VST ones, discretization artifacts noticeably degrade the result.
Maybe the musicians who distribute their music in 24/192 expect it to be mixed and otherwise processed.
I do not believe it. 24 bits is definitely needed for processing (better yet, use floating-point).
Not 192 kHz; no friggin' way.
Repeated processing through multiple blocks at a given sample rate does not produce cumulative discretization problems in the time domain; it's just not how the math works.
Both your inputs (ADC) and outputs (DAC) are fixed-point. Why would you want to use a floating point in between? Technically, a 64-bit floating-point format would be enough for precision, but that would inflate both bandwidth and CPU requirements for no benefit. 32-bit floating point ain’t enough. Many people in the industry already use 32-bit integers for these samples.
> Not 192 kHz; no friggin' way.
I think you’re underestimating the complexity of modern musician-targeted VST effects. Take a look: https://www.youtube.com/watch?v=-AGGl5R1vtY
I’m not an expert (I’ve never programmed that kind of software), but I’m positive such effects are overwhelmingly more complex than just multiply+add these sample values. Therefore, extra temporal resolution helps.
BTW, professionals have been using 24-bit/192 kHz audio interfaces for decades already. E.g. the ESI Juli@ was released in 2004, and it was a very affordable device back then.
I habitually edit audio using 32-bit float, not 16-bit integer.
> Why would you want to use a floating point in between?
Because 32-bit float has enough mantissa bits to represent all 24-bit integer fixed-point values exactly, so it is at least as good.
Because 32-bit float is friendly to vectorization/SIMD, whereas 24-bit integer is not.
Because with 32-bit integers, you still have to worry about overflow if you start stacking like 65536 voices on top of each other, whereas 32-bit float will behave more gracefully.
Because 32-bit floating-point audio editing is only double the storage/memory requirements compared to 16-bit integer, but it buys you the ultimate peace of mind against silly numerical precision problems.
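The first point above is easy to check numerically. A quick sketch (using NumPy purely for illustration):

```python
import numpy as np

# float32 has a 24-bit significand (1 implicit + 23 stored bits), so every
# signed 24-bit integer sample converts to float32 and back without loss.
extremes = np.array([-(2**23), 2**23 - 1], dtype=np.int32)
rng = np.random.default_rng(0)
samples = np.concatenate([extremes,
                          rng.integers(-(2**23), 2**23, 100_000, dtype=np.int32)])
assert np.array_equal(samples.astype(np.float32).astype(np.int32), samples)

# A 26-bit value, by contrast, may not survive the round trip:
print(int(np.float32(2**25 + 1)))  # rounds to 2**25, losing the +1
```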
float has scale-independent error because it is logarithmic/exponential.
If you quiet the amplitude by some decibels, that is just decrementing the exponent field in the float; the mantissa stays 24 bits wide.
If you quiet the amplitude of integer samples, they lose resolution (bits per sample).
If you divide a float by two and then multiply by two, you recover the original value without loss, because only the exponent was decremented and then incremented again.
(Of course, I mean: in the absence of underflow. But underflow is far away. If the sample value of 1 is represented as 1.0, you have tons of room in either direction.)
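A tiny sketch of that exponent argument (Python, just for illustration):

```python
# Halving a (finite, non-denormal) float only decrements the exponent; the
# significand is untouched, so halve-then-double is exactly lossless.
x = 0.30000001192092896  # an arbitrary float-representable sample value
assert (x / 2) * 2 == x

# A 16-bit integer sample loses its bottom bit instead: -6 dB then +6 dB
# does not restore an odd value.
s = 32767
assert (s >> 1) << 1 == 32766  # the LSB is gone for good
```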
> Both your inputs (ADC) and outputs (DAC) are fixed-point. Why would you want to use a floating point in between?
Fixed point arithmetic is non-trivial and not well supported by CPU instruction sets.
(Hint: you can't just use integer add/multiply.)
> I think you’re underestimating the complexity of modern musician-targeted VST effects. I’ve never programmed that kind of software. But I’m positive such effects are overwhelmingly more complex than just multiply+add these sample values. Therefore, extra temporal resolution helps.
Indeed, many audio effects require upsampling to work well with common inputs, e.g. highly non-linear effects like distortion/saturation or analog filter models.
However, they usually perform the upsampling and downsampling internally (commonly 2x, 4x, or 8x).
While upsampling/downsampling is expensive (especially if you are using multiple plugins of this type), it's not clear whether running at a higher sample rate across the board is worth it just to save those steps.
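The aliasing point can be sketched numerically. Below, a crude hard-clip (my stand-in for a saturation effect) is applied to a 15 kHz tone once directly at 48 kHz and once at a simulated 4x oversampled rate with a lowpass before decimation; the clipped tone's 3rd harmonic at 45 kHz folds down to an audible 3 kHz in the naive version:

```python
import numpy as np

fs, f0, n = 48_000, 15_000, 4_800          # base rate, test tone, 0.1 s
clip = lambda x: np.clip(x, -0.5, 0.5)     # crude saturation stand-in

# 1) Clip directly at 48 kHz: the 3rd harmonic (45 kHz) aliases to 3 kHz.
naive = clip(np.sin(2 * np.pi * f0 * np.arange(n) / fs))

# 2) Simulate 4x oversampling: clip at 192 kHz, lowpass, decimate by 4.
hi = clip(np.sin(2 * np.pi * f0 * np.arange(4 * n) / (4 * fs)))
taps = np.sinc(np.arange(-128, 129) * 2 * 20_000 / (4 * fs)) * np.hamming(257)
taps /= taps.sum()                          # windowed-sinc lowpass at ~20 kHz
oversampled = np.convolve(hi, taps, mode='same')[::4]

def level_at(x, f):                         # spectrum magnitude near f
    spec = np.abs(np.fft.rfft(x * np.hanning(len(x))))
    return spec[round(f * len(x) / fs)]

alias_naive = level_at(naive, 3_000)
alias_os = level_at(oversampled, 3_000)
print(alias_naive, alias_os)                # the naive alias is far stronger
```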
But it's not resolution, right? It's extra frequencies outside the audible range. Is there any natural process that would make those affect the audible components, if I were listening to the music live instead of a recording?
Only if the two are mixed in a nonlinear way (e.g. "heterodyned").
If a sonic and ultrasonic frequency are combined together, but a low pass filter doesn't pass the ultrasonic one, the ultrasonic one doesn't exist on the other end.
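A minimal numeric sketch of that "only if nonlinear" claim: two purely ultrasonic tones pass through a square-law nonlinearity (a stand-in for amplifier or transducer distortion) and produce an audible 1 kHz difference tone that a linear mix does not contain:

```python
import numpy as np

fs = 192_000
t = np.arange(fs) / fs                       # 1 second, so FFT bin k = k Hz
mix = np.sin(2 * np.pi * 30_000 * t) + np.sin(2 * np.pi * 31_000 * t)

linear = np.abs(np.fft.rfft(mix)) / len(t)
distorted = np.abs(np.fft.rfft(mix + 0.1 * mix**2)) / len(t)

# The square term turns 30 and 31 kHz into sum/difference products; the
# difference, |31 - 30| = 1 kHz, lands squarely in the audible band.
print(linear[1_000], distorted[1_000])
```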
> Why would you want to use a floating point in between?
Because if you don't you accumulate small errors at each processing step due to rounding. Remember that it is very common for an input to pass through multiple digital filters, EQs, some compressors, a few plugins, then to be grouped and have more plugins applied to the group. You can end up running the sample through hundreds of equations before final output. Small errors at the beginning can be magnified.
Pretty much all pro-level mix engines use 32-bit floating point for all samples internally. This gives you enough precision that there isn't a practical limit on the number of processing steps before accumulated error becomes a problem. By "all samples" I mean: the input comes from a 24-bit ADC and gets converted to 32-bit FP; from that point on, all plugins and processes use 32-bit FP; the final output busses convert back to 24-bit and dither to feed the DAC (for higher-end gear the DAC may handle this in hardware).
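A sketch of why that float pipeline matters, using an exaggerated but familiar gain-staging pattern (attenuate a sample by 1 dB 120 times, then bring it back up; a toy example, not any particular mix engine):

```python
import numpy as np

g = 10 ** (-1 / 20)                 # -1 dB per step
x = 0.3                             # sample value relative to full scale

xf = np.float32(x)                  # float32 pipeline
xi = round(x * 32767)               # 16-bit fixed-point pipeline

for _ in range(120):                # 120 dB down...
    xf = np.float32(xf * np.float32(g))
    xi = round(xi * g)
for _ in range(120):                # ...and 120 dB back up
    xf = np.float32(xf / np.float32(g))
    xi = round(xi / g)

# 120 dB exceeds 16-bit dynamic range: the integer sample collapses to a few
# LSBs on the way down and never recovers; float32 comes back almost exactly.
print(abs(float(xf) - x), abs(xi / 32767 - x))
```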
As for 192 kHz I've never seen or heard a difference. Even 96 kHz seems like overkill. A lot of albums have been recorded at 48 kHz without any problems. As the video explains there is no "missed" audible information if you're sampling at 48 kHz. I know that seems counter-intuitive but the math (and experiments) bear this out.
An inaccurate but intuitive way to think about it: your ear can't register a sound at a given frequency unless it gets enough of the wave, which has a certain length in the time domain (by definition). If an impulse is shorter than that, then it has a different frequency, again by definition. 1/16th of a 1 kHz wave doesn't actually happen. Even if it did, a speaker is a physical moving object and can't respond fast enough to reproduce it (speakers can't reproduce square waves either, for the same reason: they end up smoothing them out somewhat). Even if it could, the air can't transmit 1/16th of a wave; the effect would be a lower-amplitude wave of a different frequency. And your eardrum can't transmit such an impulse either (nor can it transmit a true square wave).
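That intuition can be illustrated numerically: zero-pad one full cycle of a 1 kHz tone and a 1/16th-cycle fragment of it (3 samples at 48 kHz), and compare where their spectral energy actually sits:

```python
import numpy as np

fs = 48_000
N = fs                                   # 1 s analysis window, so bin k = k Hz
cycle = np.zeros(N)
cycle[:48] = np.sin(2 * np.pi * 1_000 * np.arange(48) / fs)  # one full cycle
frag = np.zeros(N)
frag[:3] = cycle[:3]                     # "1/16th of a 1 kHz wave"

peak_cycle = int(np.argmax(np.abs(np.fft.rfft(cycle))))
peak_frag = int(np.argmax(np.abs(np.fft.rfft(frag))))
print(peak_cycle, peak_frag)
# The full cycle peaks near 1 kHz; the fragment is just a click whose energy
# peaks elsewhere entirely (at DC here, spread broadly across the spectrum).
```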
I've done a lot of live audio mixing and a little bit of studio work, including helping a band cut a vinyl album. Fun fact: almost all vinyl is made from CD masters and has been for years. The vinyl acetate (and master) are cut by squashing the crap out of the CD master and applying a lot of EQ to shape the signal (partly to keep the needle from cutting the groove walls too thin), then letting the physical medium itself roll off the highs.
The only case where getting a 24-bit/192 kHz recording might be worthwhile is if it's a pre-mastering version. Then it won't be over-compressed and over-EQ'd, but that applies just as well to any master. (For the vinyl we cut, I compressed the MP3 version myself from the 24-bit/48 kHz masters, so it had the best dynamic range of anything: better than the CD and far better than the vinyl.)
Unless you are altering the time or pitch, which, these days, you are more often than not.
But no, musicians aren't releasing things at ultra-resolutions because they expect others to reuse their work. The ones that are, are providing multitracks.
> Repeated processing through multiple blocks at a given sample rate does not produce cumulative discretization problems in the time domain; it's just not how the math works.
That isn't entirely true. E.g., it's common for an audio DSP to use fixed-point 24-bit coefficients for an FIR filter. If you're trying to implement a filter at a low frequency, there can be significant error due to coefficient precision, and that error is reduced by increasing the sampling rate.
It can be useful to run your signal processing chain at a higher rate because many digital effects are not properly bandlimited internally (and it would be pretty CPU hungry to do so).
But that doesn't mean you need to store data that you'll process later at 192 kHz, though it might be easier to do so.
What if I simply want to slow the signal down by a factor of two (without pitch correction)? Then 44.1 kHz is obviously not enough. Maybe 192 kHz is way overkill, but I would argue that that's the point: you never want the format to become a bottleneck for any effect.
Lots of my plugin users are resorting to massive oversampling, even when it's not appropriate, simply because in their experience so many of the plugins they use are substantially better when oversampled. 192K is an extremely low rate for oversampling.
I wonder how you measure the quality difference for higher rates? 384K, 512K and beyond? I hear from audiophiles that there is a very distinct difference, but there is absolutely no basis for it in science.
Not so: this oversampling is mostly about generating distortion products without aliasing, so in the context I mean, the difference is obvious. But it's a tradeoff: purer behavior under distortion versus a dead quality that comes from extreme over-processing. I've never seen an audiophile talk about 384K+ sample rates unless they mean DSD, and with that, it's about getting the known unstable HF behavior up and out of the audible range.
Oversampling in studio recording is mostly about eliminating aliasing in software that's producing distortion, and it's only relevant in that context: I don't think it's nearly so relevant on, say, an EQ.
That could simply be that some of these plugins are written to work correctly only at 192 kHz; i.e., it's a matter of matching the format they expect.
Notice that there are a number of species whose hearing range extends well past 20 kHz. Even at 192 kHz you're still chopping off the upper end of what dolphins and porpoises can hear and produce.
So please convince Apple and friends that you need 200+kHz to truly capture the "warmth" of Neil Young's live albums. Then we'll be able to crowdsource recording all the animals and eventually synthesize the sounds to communicate with them all.
Maybe then we can synthesize a compelling, "We're really sorry, we promise we're going to fix all the things now," for the dolphins and convince them not to leave. :)
For DAW use (this has been mentioned before in this thread), 192 kHz has two beneficial effects: lower latency for realtime playing, and it "forces" some synths and effects to oversample, thus reducing aliasing.
All this comes at a high computational and storage cost though.
I personally use 44.1 kHz/24-bit settings for DAW use.
TFA specifically addresses this and agrees 100% with you on it. Develop in 24/192, but ship in 16/48. Good point about supporting better downstream remixing!
I feel like there is some parallel to open source and GPL type licenses: you can never know which of your customers may wish to remix or further work on your materials, so you should ship the "source" material.
To truly support downstream remixing, musicians would need to distribute the original, separate tracks. Occasionally an independent musician will do this, but it's not at all common AFAIK.
For software, you have the source code: you compile it with a standards-compliant compiler and more or less reproduce the result.
Music is different. If you have the original multi-track composition, you can’t reproduce the result unless you also have the original DAW software (of the specific version), and all the original VST plugins (again, of the specific version each).
All that software is non-free, and typically it’s quite expensive (esp. VSTi). Some software/plugins are only available on Windows or Mac. There’s little to no compatibility even across different versions of the same software.
Musicians don’t support or maintain their music. Therefore, DAW software vendors don’t care about open formats, interoperability, or standardization.
All true. One independent musician that I like, Kevin Reeves, made the tracks from his first album available for free download for a while. But he published them as WAV files. So, of course, one could only create one's own mixes, not easily reproduce his. But if he had just posted the files saved by his DAW (or his mix engineer's DAW), that would have been useless to just about everyone. Aside: Though the album was completed in mid-2006, it was recorded with DAW software that was obsolete even then, specifically an old version of Pro Tools for Mac OS 9, because both Kevin and his mix engineer (for that album) are blind, and Pro Tools for OS X was inaccessible to blind users at the time.
BTW, I'm merely a dilettante when it comes to recording and especially mixing.