
Why 24-bit/192kHz music downloads make no sense

(by Monty and the Xiph.Org community)

Articles last month revealed that musician Neil Young and Apple's Steve Jobs discussed offering digital music downloads of 'uncompromised studio quality'. Much of the press and user commentary was particularly enthusiastic about the prospect of uncompressed 24 bit 192kHz downloads. 24/192 featured prominently in my own conversations with Mr. Young's group several months ago.

Unfortunately, there is no point to distributing music in 24-bit/192kHz format. Its playback fidelity is slightly inferior to 16/44.1 or 16/48, and it takes up 6 times the space.
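The space figure follows from raw PCM data rates; a quick sketch of the arithmetic (stereo assumed, which is my own assumption here):

```python
# Raw PCM data rate: bit depth x sample rate x channels, in bytes per second.
def pcm_bytes_per_second(bits: int, rate: int, channels: int = 2) -> float:
    return bits * rate * channels / 8

hires = pcm_bytes_per_second(24, 192000)   # 24/192 stereo
cd    = pcm_bytes_per_second(16, 44100)    # 16/44.1 stereo (CD)

print(f"24/192:  {hires:,.0f} bytes/s")    # 1,152,000 bytes/s
print(f"16/44.1: {cd:,.0f} bytes/s")       # 176,400 bytes/s
print(f"ratio:   {hires / cd:.2f}x")       # about the '6 times' figure
```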

If you just said 'Whaa?', you may want to read the whole article.

It's fairly long... but hearing, perception and fidelity are complicated topics. Shysters and charlatans exploit that nuance (and misunderstanding) to bilk unsuspecting consumers of their money, all the while convincing them they're paying for 'quality'.

Anyway, happy reading and comments welcome!

I, for one, agree with the audiophiles that compression is the problem with modern music distribution. Unfortunately, unlike most of them, I'm referring to dynamic range compression. Do you have a solution for us there? :-)
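For readers unfamiliar with the distinction: dynamic range compression reduces the level difference between quiet and loud passages, unlike the data compression of lossy codecs. A minimal static compressor curve, purely as an illustration (threshold and ratio values are arbitrary examples):

```python
def compress_db(level_db: float, threshold_db: float = -20.0, ratio: float = 4.0) -> float:
    """Static downward compression: levels above the threshold are
    scaled toward it by `ratio`; levels below pass through unchanged."""
    if level_db <= threshold_db:
        return level_db
    return threshold_db + (level_db - threshold_db) / ratio

# A loud -6 dBFS peak gets pulled down; a quiet -40 dBFS passage is untouched.
print(compress_db(-6.0))   # -> -16.5
print(compress_db(-40.0))  # -> -40.0
```

Applied aggressively across an entire master, this is what flattens the "loudness war" releases the commenter is complaining about.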


Good Headphones?

Wow, I really enjoyed reading this article. I have nothing to do with music (besides listening to it occasionally) and it's good to know that 16/44.1 will be sufficient and that I don't have to be anxious about it anymore.

I switched my entire music library over to FLAC some time ago, and I was even planning to spend a fair amount of money on good headphones. Unfortunately I quickly discovered that I know nothing about headphones at all, and reading dozens of online reviews only made things more confusing. Even my friends weren't able to rationalize their buying decisions. As a matter of fact, I still don't have good headphones. :)

Now it just happened that I've discovered your article by accident and I was really glad to read something more scientific about audio in general. Can you recommend some in-depth articles about headphones and how to tell apart the good ones?

If digital is better . . .

Thanks for a thoughtful analysis. I have no reason to question anything you've written here. It still leaves me with the question: If digital is better than the analog stereo setup of 30 years ago, why did I (and, we now learn, Steve Jobs) always find my CDs tiring, and prefer my vinyl records? Identical receiver and speakers; just CD player vs. turntable. It can't be confirmation bias. I wanted and expected CD to be better than vinyl. Thanks.


Re: If digital is better . . .

Objectively, by any fidelity measure, digital far surpasses what vinyl is capable of. It's entirely possible you prefer either the more veiled, distorted sound of vinyl (many people like tube amps for the same reason), or the physical interaction with it (something I kinda miss myself), or both.

Or perhaps you're objecting to how badly overcompressed modern pop music is, which has nothing to do with digital; it's just a modern trend. BTW, vinyl reissues are remastered, so it's likely they've been compressed like modern releases as well.

I came from vinyl and tape, and for the first few years of CDs I bought into the whole 'there's no way this could be as good as vinyl' line that all my hi-fidelity buddies repeated. I eventually wanted a release that wasn't going to come out on vinyl, and I had a real job, so I figured I'd finally get a CD player, but I wouldn't _like_ it.

I felt like a complete fool! I could not believe how much better the CD was in every way. It was deeper, blacker, crisper, no noise, no pops... wow. I never looked back. I bought my first computer not long after for the express purpose of using it to record...

That's my experience. It's too bad if modern mastering trends are ruining it for others :-(

Good read. Looks like a reference to footnote 5 is missing in the text

And a word missing?
...or even by a good lossy encoder *used* incorrectly.

Excellent catches, both. Fixed, and thanks.

I disagree with your point of view

It was very nice to find such a well documented article! For many years I've been trying to persuade my friends not to fall for the temptations of charlatans and 'high end' companies' marketing.

I have dedicated my life to audio (30+ years), and I have been an audio engineer for 22 years. Unfortunately, when you become a professional you get more critical and less enthusiastic...

What I can tell you guys is that there is a huge difference between 16/44.1 and 24/192. 16/44.1 just doesn't sound right. When you mix a project (I usually work at 24-bit/96kHz) the elements have sonic depth, say a voice versus its reverberation; finally you get a mixdown, which is your "Master", but as soon as you convert it to 16/44.1 your work goes in the trash; you lose much of the program you had. The voice gets 'in your face' and you lose a lot of the reverb; things are no longer in the space they were.

Going from 16 to 20 bits is like going from vinyl to CD. Remember that every additional bit represents twice the information, so going from 16 to 18 you get four times more depth. Please don't do less. I agree that you don't need 24 bits, but 16 is not enough.

Regarding the sampling frequency, 192kHz does sound softer; it's much more natural, but unfortunately it takes more resources. 96kHz is a good trade-off.


Re: I disagree with your point of view

The issue I take with this statement is that it's made constantly... and if the change is so obvious, it would be easily observable in a controlled blind test. Yet in every controlled test, no one can tell the difference.

Please elaborate on this section

Great writeup!
It is also worth mentioning that increasing the bit depth of the audio representation from 16 to 24 bits does not increase the perceptible resolution or 'fineness' of the audio. It only increases the dynamic range, the range between the softest possible and the loudest possible sound, by lowering the noise floor.
This is an extremely common and very reasonable-sounding misconception. I think your page could definitely benefit from elaborating more on why this is wrong.

BTW, I recently added a little writeup on the hydrogenaudio wiki about TOS 8. It's just a starting point and unlikely to convince the unconvinced, but perhaps it may be useful to somebody.

Re: Please elaborate on this section

Noted, on my tweak list.

A link with some online listening tests

I got this in email; a few others will likely find it interesting:

I've not yet looked at everything there, but the pieces I've played with looked quite good.

Great blog

Just want to say thanks for the effort you put into this. It must have taken some time to get it all laid out. This blog has really helped my audio angst attacks - I'm a trained electrical engineer and an audiophile, and even I have a hard time occasionally cutting through the buffer-bloat on this topic.

Anyway, this article will be read for years to come I'm sure.

Now, however, I shall go listen to some wonderful music without a thought of how it was engineered!

Summing up in one sentence why I'd never want to be a porn star.

In your article you say that hardly anybody understands basic signal theory or the sampling theorem, and that in practice the analog signal "can be reconstructed losslessly" from the information the samples contain. Since from your article it appears that you are one of the very few who understand the sampling theorem, I have a big question for you. What are the differences between the signals most widely encountered in practice (e.g., musical ones) and signals that fully correspond to the sampling theorem? Please list all the differences and then substantiate why we should ignore them.

Re: sampling theorem

You're right that 'lossless' happens only in ideal circumstances.

No part of the process is going to be ideal, so you see small deviations at every step. All of them will be measurable (and predictable), but it's unusual to find one that's audible unless you're dealing with a flat out bug. Flat out bugs do happen.

The most common truly digital 'bugs' are bad digital antialiasing filters (almost always in software) and linearity errors in the hardware DAC. However, it's still much more common to hit analog shortcomings (an output stage that can't go full range without distorting badly, etc.).

How sampling in practice differs from the ideal:

1) the sampling theorem assumes the sampling period is infinitely small. In practice, it's not. It's close enough, though, that you will have trouble measuring the effect on the bench. You have no hope of ever hearing the effect unless you're designing a bad DAC on purpose just to hear it.

2) quantization assumes perfect linearity. Again, in practice, it's not perfectly linear. This was a problem in the bad old days, not so much now with oversampling (and cascading). The audible effect is primarily harmonic and intermodulation distortion. Here's an example file for you:

That's two tones, one at 30kHz and one at 33kHz. They're completely inaudible, and the file should sound 100% silent. If it does not, you're probably hearing intermodulation distortion from a linearity problem in your DAC or analog amplifier (or maybe even from your transducers). Regardless of the source, if you hear it, that's what nonlinearity sounds like.
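A file like the one described can be generated in a few lines. This is my own sketch, not the original file (the filename and the -12 dBFS per-tone level are my choices); it assumes a playback chain that accepts 96kHz WAV:

```python
import math
import struct
import wave

RATE = 96000            # sample rate; Nyquist must exceed 33kHz
SECONDS = 2
AMP = 0.25 * 32767      # about -12 dBFS per tone, leaving headroom for the sum

with wave.open("ultrasonic_imd_test.wav", "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)   # 16-bit samples
    w.setframerate(RATE)
    for n in range(RATE * SECONDS):
        t = n / RATE
        # Two ultrasonic tones; their sum stays within 16-bit range.
        s = AMP * (math.sin(2 * math.pi * 30000 * t) +
                   math.sin(2 * math.pi * 33000 * t))
        w.writeframes(struct.pack("<h", int(round(s))))
```

If this plays as anything but silence, what you hear is most likely the 3kHz difference tone (33kHz minus 30kHz) that nonlinearity somewhere in the chain has folded down into the audible band.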

Oversampling has killed this problem in the DAC, but you might still hit problems in the analog stages.

The other reason you might hear something is bad resampling in your computer's sound drivers. You can't blame that on the DAC.

3) clock jitter adds noise. It was established in the 80s that it can be audible. The problem's been thoroughly addressed since then with better clocks.

4) the antialiasing filters: These were once the weak link before oversampling. A bad filter will roll off too early, too late, not fast enough (causing aliasing) or ripple in the passband's frequency response. Oversampling has effectively killed this problem. The only time you'll hear it is in those singing greeting cards.
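The aliasing mentioned in point 4 is easy to demonstrate numerically: a tone above Nyquist, sampled without an antialiasing filter, produces exactly the same samples (up to sign) as its folded-down image, so the two are indistinguishable after sampling. A small sketch of that identity:

```python
import math

FS = 48000  # sample rate; Nyquist is 24kHz

def sampled(freq, n):
    """Value of a sine at `freq` Hz at sample index n, sampled at FS."""
    return math.sin(2 * math.pi * freq * n / FS)

# 30kHz is 6kHz above Nyquist, so it folds down to 48 - 30 = 18kHz.
# Sample by sample, the 30kHz tone equals the inverted 18kHz tone.
for n in range(100):
    assert abs(sampled(30000, n) + sampled(18000, n)) < 1e-9
print("30kHz sampled at 48kHz is indistinguishable from (inverted) 18kHz")
```

This is why the filter has to remove everything above Nyquist before sampling; once the fold has happened, no amount of processing can tell the alias from a real in-band tone.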


The dynamic range of 16 bits

I am very pleased you have written this article. I understood, and agreed with most of it, but I did not understand a couple of parts.

I had most problems understanding the section "The dynamic range of 16 bits"

It says "16 bit audio is commonly said to have a dynamic range of 96dB (each bit doubles the range and a doubling is about 6dB so, 6dB*16=96dB). This is incorrect."

It then offers an encoding of a 1KHz tone, at a level of -105dB, using 16bit/48kHz in a wav file. It provides a spectral analysis plot to show that 16 bits can encode such a signal.

While that is sufficient to show something can be encoded, I don't think that is sufficient to prove that 16 bits is sufficient to encode *all* sounds with a level below -96dB. Nor does it prove that such encoding would be perceived in the same way as a higher-bit encoding of the same signal.

I can understand that frequencies which have simple integer relationships with the sampling frequency can be encoded and have spectral plots which show lower energy than -96dB.

For example, let's assume the encoding uses a signed integer, with integer 0 as no signal, and the sample rate is 48kHz.

We could construct a data stream which has +1 and -1 at appropriate sample-rate distances to produce any single tone of 1kHz, 2kHz, 4kHz, 500Hz, 250Hz, etc.

We could then remove alternate pairs of +1 and -1, setting the value to 0, and that has reduced the energy content of the signal. So I assume it must be below -96dB. We could remove two samples from three, and so on, reducing the energy in the signal. I apologise if I have already made an error, but this seemed 'intuitive'.
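That intuition can be checked numerically. A sketch, assuming a 16-bit full scale of 32768: a stream of samples alternating between +1 and -1 sits at about -90dBFS, and zeroing half of them removes half the energy, lowering the level a further 3dB.

```python
import math

FULL_SCALE = 32768  # 16-bit signed full scale

def rms_dbfs(samples):
    """RMS level of integer samples, relative to full scale, in dB."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms / FULL_SCALE)

N = 4800
alternating = [1 if n % 2 == 0 else -1 for n in range(N)]
print(f"+/-1 everywhere: {rms_dbfs(alternating):.1f} dBFS")  # about -90.3

# Zero out alternate pairs: half the energy remains, i.e. 3dB lower.
sparse = [s if (n // 2) % 2 == 0 else 0 for n, s in enumerate(alternating)]
print(f"half zeroed:     {rms_dbfs(sparse):.1f} dBFS")       # about -93.3
```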

Looking at it a slightly different way, is this 1kHz signal just an artefact of taking some large number of samples? Are there other frequencies which can be carefully encoded to give an audible tone at selected frequencies, but in general, some frequencies can not be encoded down to -105dB? Using that approach of encoding signals at specific sub-frequencies of the sample frequency may be misleading me, but it seems like some combinations of frequencies will be handled less well than others. After all, it seems the signal must be encoded in a tiny number of sample values (or the signal will have an average power above -96dB), and I assume there is a finite duration after which the ear+brain no longer hears 'sound' and it degenerates into noise.

Summary: the existence of that wav file that encodes a specific frequency at a level below -96dB doesn't seem to be enough to prove that 16bits can encode *all* sounds with a level below -96dB, only some of them.

My second concern: the ability to encode that signal with that low energy, for example using the technique I mention, doesn't demonstrate that a human would perceive the signal as the original sound. Does a human perceive this as a 1kHz tone identical to a 1kHz signal encoded with 24 bits?

Further, how does that sample prove all audible sounds can be encoded in 16 bits with a level down to 120dB? It may be obvious to you, since you think about this a lot, but I need a bit more help and information.

A small point. The article says "Handled correctly, the dynamic range of 16 bit audio reaches 120dB in practice [10], more than twenty times deeper than the 96dB claim."

The difference between 120dB and 96dB is 24dB. Earlier the article says "a doubling is about 6dB", so I'd expect the difference to be a factor of about sixteen, not "more than twenty". On my calculator, the difference between 120dB and 96dB is a factor of 15.85.
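That arithmetic can be verified directly. Decibels convert to an amplitude ratio as 10^(dB/20) and to a power ratio as 10^(dB/10), so which factor you get depends on which ratio "times deeper" is taken to mean:

```python
def amplitude_ratio(db):
    """A dB difference expressed as an amplitude ratio."""
    return 10 ** (db / 20)

def power_ratio(db):
    """The same dB difference expressed as a power ratio."""
    return 10 ** (db / 10)

diff = 120 - 96  # the 24dB gap discussed above
print(f"amplitude ratio: {amplitude_ratio(diff):.2f}x")  # -> 15.85x
print(f"power ratio:     {power_ratio(diff):.1f}x")      # -> 251.2x
```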

The spectrum shows a wide range of frequencies as well as the 1kHz peak, for example there is a peak about 20dB lower than the 1kHz tone at just above 8kHz. Is this showing that it is impossible to encode a low-signal level without other frequency content? If that is the case, can all the other signal content be digitally removed without effecting the 1kHz signal? If it can't be removed, without a-priori knowledge of what the actual signal should contain, then might that be a problem? If 16 bits is the *only* format available for storing and transporting sound, yet it introduces artefacts that can not be removed, then it is not the encoding we need for sound which may be further processed.

Re: The dynamic range of 16 bits

Hello (I took the liberty of deleting a mostly duplicate earlier comment; I hope that's OK).

I've substantially rewritten the 'dynamic range of 16 bits' section, as I wasn't happy with it and it confused many people. Hopefully it's much easier to understand now, and I think it answers your questions directly (you weren't the only person to have them).

>On my calculator, the difference between 120dB and 96dB is a factor of 15.85.

Yup, you spotted an error. Fixed.

> The spectrum shows a wide range of frequencies as well as the 1kHz peak, for
> example there is a peak about 20dB lower than the 1kHz tone at just above
> 8kHz. Is this showing that it is impossible to encode a low-signal level
> without other frequency content?

That's noise from the shaped dither. And yes, dither of some variety is needed; it doesn't just allow the very low level signal to be encoded, it is also the mechanism by which quantization is rendered distortion-free.

It's not leaked energy or an artifact, it is merely injected uncorrelated noise.
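Dither's role is easy to demonstrate numerically: a -105dBFS sine peaks at roughly 0.18 of one 16-bit step, so plain rounding erases it completely, while TPDF dither added before rounding preserves it (buried in noise). A minimal sketch, with my own parameter choices:

```python
import math
import random

FULL_SCALE = 32768
AMP = FULL_SCALE * 10 ** (-105 / 20)   # about 0.18 of one 16-bit step
N = 48000
signal = [AMP * math.sin(2 * math.pi * 1000 * n / 48000) for n in range(N)]

# Plain rounding to integer sample values: the signal vanishes entirely.
undithered = [round(s) for s in signal]

# TPDF dither (difference of two uniform variables) added before rounding.
random.seed(0)
dithered = [round(s + random.random() - random.random()) for s in signal]

print(all(q == 0 for q in undithered))            # True: signal erased
corr = sum(q * s for q, s in zip(dithered, signal))
print(corr > 0)                                   # True: tone still present
```

The dithered stream correlates positively with the original sine, i.e. the tone survives quantization; the undithered stream is digital silence.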

> If that is the case, can all the other
> signal content be digitally removed without affecting the 1kHz signal? If it
> can't be removed, without a-priori knowledge of what the actual signal
> should contain, then might that be a problem? If 16 bits is the *only*
> format available for storing and transporting sound, yet it introduces
> artefacts that can not be removed, then it is not the encoding we need for
> sound which may be further processed.

The short answer is: correct. Processing breaks the dither (rendering it partly or completely useless) but the added noise from the dither is still there.

16 bit depth, dithered or no, is still going to be deeper than any analog format that preceded it. But if you have the choice between processing 24 bit audio and 16, you should choose the 24.

(Deleted comment)

Re: Your right, but you're wrong

Comment removed at request of poster


You mentioned squishyball. Do you have any intention of making a release? I'd like to see a Debian package, and one of the guidelines is that any uploaded package should stem from an actual release. There's the occasional -svn package, but I've never seen one without some justification (most often a non-svn package that's hopelessly outdated).

Also, is there a (more-or-less) canonical set of samples to test against?

Re: squishyball

A .tar.xz of the sources would already increase the comfort level, because I wouldn't have to install git or click 50 times till I have the source code ready to compile.

lol, I was definitely saying "whaaa?" thanks for linking the whole article :-D

thank YOU

Ah, the spambots finally found this one....

Looks like the spambots are out in full force again. It's been ~ 6 months, so I think it's not a bad time to close comments to keep the spam out.
Thanks to everyone who wrote, and there's still email :-)
