I, for one, agree with the audiophiles that compression is the problem with modern music distribution. Unfortunately, unlike most of them, I'm referring to dynamic range compression. Do you have a solution for us there? :-)
Edited at 2012-03-05 08:28 pm (UTC)
2012-03-06 12:08 am (UTC)
Wow, I really enjoyed reading this article. I have nothing to do with music (besides listening to it occasionally) and it's good to know that 16/44.1 will be sufficient and that I don't have to be anxious about it anymore.
I switched my entire music library over to FLAC some time ago, and I was even planning to spend a fair amount of money on good headphones. Unfortunately I quickly discovered that I know nothing about headphones at all, and reading dozens of online reviews was even more disturbing. Even my friends weren't able to rationalize their buying decisions. As a matter of fact, I still don't have good headphones. :)
Now it just happened that I've discovered your article by accident and I was really glad to read something more scientific about audio in general. Can you recommend some in-depth articles about headphones and how to tell apart the good ones?
The Loudness War may finally be receding; unfortunately, it's something the industry as a whole needs to let go of. :-|
2012-03-05 11:39 pm (UTC)
If digital is better . . .
Thanks for a thoughtful analysis. I have no reason to question anything you've written here. It still leaves me with the question: If digital is better than the analog stereo setup of 30 years ago, why did I (and, we now learn, Steve Jobs) always find my CDs tiring, and prefer my vinyl records? Identical receiver and speakers; just CD player vs. turntable. It can't be confirmation bias. I wanted and expected CD to be better than vinyl. Thanks.
2012-03-06 12:25 am (UTC)
Re: If digital is better . . .
Objectively, by any fidelity measure, digital far surpasses what vinyl is capable of. It's entirely possible you prefer either the more veiled distorted sound of vinyl (many people like tube amps for the same reason), or the physical interaction with it (something I kinda miss myself), or both.
Or... Perhaps you're objecting to how badly overcompressed modern pop music is, which has nothing to do with digital; it's just a modern trend. BTW, reissues of vinyl are remastered, so it's likely they've been compressed just like modern releases.
I came from vinyl and tape and for the first few years of CDs bought into the whole 'there's no way this could be as good as vinyl' that all my hi-fidelity buddies repeated. I eventually wanted a release that wasn't going to come out on vinyl, and I had a real job, so I figured I'd get a CD player finally, but I wouldn't _like_ it.
I felt like a complete fool! I could not believe how much better the CD was in every way. It was deeper, blacker, crisper, no noise, no pops... wow. I never looked back. I bought my first computer not long after for the express purpose of using it to record...
That's my experience. It's too bad if modern mastering trends are ruining it for others :-(
Good read. Looks like a reference to footnote 5 is missing in the text
And a word missing?
...or even by a good lossy encoder *used* incorrectly.
Excellent catches both. Fixed and Thanks.
2012-03-06 04:12 pm (UTC)
I disagree with your point of view
It was very nice to find such a well documented article! I've been trying for many years to persuade my friends not to fall for the marketing of charlatans and 'high end' companies.
I have dedicated my life to Audio (+30 years), and I have been an Audio Engineer for 22 years. Unfortunately when you become professional you get more critical and less enthusiastic...
What I can tell you guys is that there is a huge difference between 16/44.1 and 24/192. 16/44.1 just doesn't sound right. When you mix a project (usually I work with 24/96 kHz) you have a sonic depth of the elements, let's say a voice vs. reverberation; finally you get a mix down which is your "Master" but as soon as you convert it to 16/44.1 your work goes to the trash, you lose much of the program you had. The voice will get 'into your face' and you will lose a lot of the reverb you had, you don't get things in the space they were.
Going from 16 to 20 bit is like going from vinyl to CD. Remember that every bit more represents twice the information, so going from 16 to 18 you'll get 4 times more depth. Please don't do less. I agree that you don't need 24 bit but 16 is not enough.
As regards the sampling frequency, 192 kHz does sound softer; it's much more natural, but unfortunately it takes more resources. 96 kHz is a good trade-off.
2012-03-07 03:56 am (UTC)
Re: I disagree with your point of view
The issue I take with this statement is that it's made constantly... and if the change is so obvious, it would be easily observable in a controlled blind test. Yet in every controlled test, no one can tell the difference.
It is also worth mentioning that increasing the bit depth of the audio representation from 16 to 24 bits does not increase the perceptible resolution or 'fineness' of the audio. It only increases the dynamic range, the range between the softest possible and the loudest possible sound, by lowering the noise floor.
This is an extremely common and very reasonable-sounding misconception. I think your page could definitely benefit from elaborating more on why this is wrong.
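As a rough illustration of the quoted point that extra bits buy dynamic range rather than 'fineness', here is a minimal Python sketch of the usual step-size arithmetic. The function name is made up for this example, and the figure is the simple undithered ratio only; it ignores dither and noise shaping.

```python
import math

def dynamic_range_db(bits: int) -> float:
    """Ratio between full scale and one quantization step, in dB."""
    return 20 * math.log10(2 ** bits)

# Each extra bit doubles the number of steps, which adds about 6 dB.
for bits in (16, 18, 20, 24):
    print(f"{bits} bits: {dynamic_range_db(bits):.1f} dB")
```

Going from 16 to 24 bits changes only this one number (by roughly 48 dB); nothing in the arithmetic describes a change in how 'finely' an audible signal is represented.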
BTW I recently added a little writeup on the hydrogenaudio wiki about TOS 8 (http://wiki.hydrogenaudio.org/index.php?title=TOS_8) - just a starting point and unlikely to convince the unconvinced but perhaps it may be useful to somebody.
2012-03-08 06:30 pm (UTC)
Re: Please elaborate on this section
Noted, on my tweak list.
2012-03-06 11:29 pm (UTC)
A link with some online listening tests
2012-03-07 11:07 am (UTC)
Just want to say thanks for the effort you put into this. It must have taken some time to get it all laid out. This blog has really helped my audio angst attacks - I'm a trained electrical engineer and an audiophile, and even I have a hard time occasionally cutting through the buffer-bloat on this topic.
Anyway, this article will be read for years to come I'm sure.
Now, however, I shall go listen to some wonderful music without a thought of how it was engineered!
2012-03-08 06:08 pm (UTC)
Re: Great blog
Summing up in one sentence why I'd never want to be a porn star.
In your article you say that hardly anybody understands basic signal theory or the sampling theorem, and that an analog signal in practice "can be reconstructed losslessly" from the information the samples contain. Since it appears from your article that you are one of the very few who understand the sampling theorem, I have a big question for you. How do the signals most widely encountered in practice (e.g., musical ones) differ from signals that fully correspond to the sampling theorem? Please list all the differences and then substantiate why we should ignore them.
2012-03-08 06:26 pm (UTC)
Re: sampling theorem
You're right that 'lossless' happens only in ideal circumstances.
No part of the process is going to be ideal, so you see small deviations at every step. All of them will be measurable (and predictable), but it's unusual to find one that's audible unless you're dealing with a flat out bug. Flat out bugs do happen.
The most common truly digital 'bugs' are bad digital antialiasing filters (almost always in software; have a look at http://src.infinitewave.ca/) and linearity errors in the hardware DAC. However, it's still much more common to hit analog shortcomings (an output stage that can't go full range without distorting badly, etc).
How sampling in practice differs from the ideal:
1) the sampling theorem assumes the sampling period is infinitely small. In practice, it's not. It's close enough, though, that you will have trouble measuring the effect on the bench. You have no hope of ever hearing the effect unless you're designing a bad DAC on purpose just to hear it.
2) quantization assumes perfect linearity. Again, in practice, it's not perfectly linear. This was a problem in the bad old days, not so much now with oversampling (and cascading). The audible effect is primarily harmonic and intermodulation distortion. Here's an example file for you: http://people.xiph.org/~xiphmont/demo/30_and_33.wav
That's two tones, one at 30kHz and one at 33kHz. They're completely inaudible, and the file should sound 100% silent. If it does not, you're probably hearing intermodulation distortion from a linearity problem in your DAC or analog amplifier (or maybe even from your transducers). Regardless of the source, if you hear it, that's what nonlinearity sounds like.
Oversampling has killed this problem in the DAC, but you might still hit problems in the analog stages.
The other reason you might hear something is bad resampling in your computer's sound drivers. You can't blame that on the DAC.
3) clock jitter adds noise. It was established in the 80s that it can be audible. The problem's been thoroughly addressed since then with better clocks.
4) the antialiasing filters: These were once the weak link before oversampling. A bad filter will roll off too early, too late, or not fast enough (causing aliasing), or will ripple in the passband's frequency response. Oversampling has effectively killed this problem. The only time you'll hear it is in those singing greeting cards.
Edited at 2012-03-08 06:29 pm (UTC)
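To make the 30kHz + 33kHz test in point 2 a bit more concrete, here is a minimal Python sketch of why a nonlinear stage turns two inaudible tones into an audible one. Everything in it is an assumption for illustration: the 96 kHz rate, the tone levels, and especially the small second-order term, which is a hypothetical stand-in for a misbehaving DAC or amplifier, not a model of any real device.

```python
import math

SAMPLE_RATE = 96_000   # Hz; assumed, high enough to represent 30 and 33 kHz
N = 9_600              # 0.1 s, so 3, 30 and 33 kHz all land on exact DFT bins

def two_tone(n: int) -> float:
    """Sum of a 30 kHz and a 33 kHz sine, each at amplitude 0.5."""
    t = n / SAMPLE_RATE
    return (0.5 * math.sin(2 * math.pi * 30_000 * t)
            + 0.5 * math.sin(2 * math.pi * 33_000 * t))

def amplitude_at(signal, freq_hz):
    """Amplitude of one frequency component, via a single-bin DFT."""
    w = 2 * math.pi * freq_hz / SAMPLE_RATE
    re = sum(x * math.cos(w * n) for n, x in enumerate(signal))
    im = sum(x * math.sin(w * n) for n, x in enumerate(signal))
    return 2 * math.hypot(re, im) / len(signal)

clean = [two_tone(n) for n in range(N)]
# Hypothetical nonlinearity: a small squared term added to the signal.
distorted = [x + 0.1 * x * x for x in clean]

clean_3k = amplitude_at(clean, 3_000)
distorted_3k = amplitude_at(distorted, 3_000)
print(f"3 kHz component, linear path:    {clean_3k:.6f}")
print(f"3 kHz component, nonlinear path: {distorted_3k:.6f}")
```

The linear path has essentially nothing at 3 kHz, while the squared term creates a difference tone at 33 - 30 = 3 kHz, squarely in the audible band. That difference tone is the kind of thing you would be hearing if the demo file is not silent on your system.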
2012-03-11 03:16 pm (UTC)
The dynamic range of 16 bits
I am very pleased you have written this article. I understood, and agreed with most of it, but I did not understand a couple of parts.
I had most problems understanding the section "The dynamic range of 16 bits"
It says "16 bit audio is commonly said to have a dynamic range of 96dB (each bit doubles the range and a doubling is about 6dB so, 6dB*16=96dB). This is incorrect."
It then offers an encoding of a 1kHz tone, at a level of -105dB, using 16bit/48kHz in a wav file. It provides a spectral analysis plot to show that 16 bits can encode such a signal.
While that is sufficient to show something can be encoded, I don't think that is sufficient to prove that 16 bits is sufficient to encode *all* sounds with a level below -96dB. Nor does it prove that such encoding would be perceived in the same way as a higher-bit encoding of the same signal.
I can understand that frequencies which have simple integer relationships with the sampling frequency can be encoded and have spectral plots which show lower energy than -96dB.
For example, let's assume the encoding uses a signed integer, with integer 0 as no signal, and the sample rate is 48kHz.
We could construct a data stream which has +1 and -1 at appropriate sample-rate distances to produce any single tone of 1kHz, 2kHz, 4kHz, 500Hz, 250Hz, etc.
We could then remove alternate pairs of +1 and -1, setting the value to 0, and that has reduced the energy content of the signal. So I assume it must be below -96dB. We could remove two samples from three, and so on, reducing the energy in the signal. I apologise if I have already made an error, but this seemed 'intuitive'.
Looking at it a slightly different way, is this 1kHz signal just an artefact of taking some large number of samples? Are there other frequencies which can be carefully encoded to give an audible tone at selected frequencies, but in general, some frequencies can not be encoded down to -105dB? Using that approach of encoding signals at specific sub-frequencies of the sample frequency may be misleading me, but it seems like some combinations of frequencies will be handled less well than others. After all, it seems the signal must be encoded in a tiny number of sample values (or the signal will have an average power above -96dB), and I assume there is a finite duration after which the ear+brain no longer hears 'sound' and it degenerates into noise.
Summary: the existence of that wav file that encodes a specific frequency at a level below -96dB doesn't seem to be enough to prove that 16bits can encode *all* sounds with a level below -96dB, only some of them.
My second concern is the ability to encode that signal with that low energy, for example using the technique I mention, doesn't demonstrate that a human would perceive the signal as the original sound. Does a human perceive this as a 1kHz tone which is identical to a 1kHz signal encoded with 24 bits?
Further, how is that sample proving all audible sounds can be encoded in 16 bits with a level down to 120dB? It may be obvious to you, you think about this a lot, but I need a bit more help and information.
A small point. The article says "Handled correctly, the dynamic range of 16 bit audio reaches 120dB in practice, more than twenty times deeper than the 96dB claim."
The difference between 120dB and 96dB is 24dB. Earlier the article says "a doubling is about 6dB", so I'd expect the difference to be about sixteen, not "more than twenty". On my calculator, the difference between 120dB and 96dB is a factor of 15.85.
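For anyone who wants to check that arithmetic, a minimal Python sketch (the function name is just for illustration), using the amplitude convention in which a doubling is about 6dB:

```python
def db_to_amplitude_ratio(db: float) -> float:
    """Convert a level difference in dB to an amplitude ratio."""
    return 10 ** (db / 20)

difference_db = 120 - 96
print(f"{difference_db} dB is a factor of {db_to_amplitude_ratio(difference_db):.2f}")
print(f"6 dB is a factor of {db_to_amplitude_ratio(6):.3f}")
```

Four doublings of about 6dB each give roughly sixteen, which matches the calculator figure.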
The spectrum shows a wide range of frequencies as well as the 1kHz peak, for example there is a peak about 20dB lower than the 1kHz tone at just above 8kHz. Is this showing that it is impossible to encode a low-signal level without other frequency content? If that is the case, can all the other signal content be digitally removed without affecting the 1kHz signal? If it can't be removed, without a-priori knowledge of what the actual signal should contain, then might that be a problem? If 16 bits is the *only* format available for storing and transporting sound, yet it introduces artefacts that can not be removed, then it is not the encoding we need for sound which may be further processed.
2012-03-14 09:38 am (UTC)
Re: The dynamic range of 16 bits
Hello (I took the liberty of deleting a mostly duplicate earlier comment; I hope that's OK).
I've substantially rewritten the 'dynamic range of 16 bits' section, as I wasn't happy with it and it confused many people. Hopefully it's much easier to understand now, and I think it answers your questions directly (you weren't the only person to have them).
>On my calculator, the difference between 120dB and 96dB is a factor of 15.85.
Yup, you spotted an error. Fixed.
> The spectrum shows a wide range of frequencies as well as the 1kHz peak, for
> example there is a peak about 20dB lower than the 1kHz tone at just above
> 8kHz. Is this showing that it is impossible to encode a low-signal level
> without other frequency content?
That's noise from the shaped dither. And yes, dither of some variety is needed; it doesn't just allow the very low level signal to be encoded, dither is also the mechanism by which quantization is rendered distortion-free.
It's not leaked energy or an artifact, it is merely injected uncorrelated noise.
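A minimal Python sketch of that mechanism, under stated assumptions: the 0.3 LSB test level, the seed, and the helper names are all made up for illustration, and it uses plain (unshaped) triangular dither rather than the shaped dither in the article's sample.

```python
import math
import random

SAMPLE_RATE = 48_000
FREQ_HZ = 1_000
AMPLITUDE_LSB = 0.3       # assumed test level: well under one quantization step
N = SAMPLE_RATE           # one second of samples

rng = random.Random(0)    # fixed seed so the run is repeatable

def quantize(x: float) -> int:
    """Round to the nearest integer step (1 step = 1 LSB)."""
    return math.floor(x + 0.5)

def tpdf_dither() -> float:
    """Triangular-PDF dither, 2 LSB peak to peak: the sum of two uniform draws."""
    return rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)

def tone_amplitude(samples) -> float:
    """Estimate the 1 kHz amplitude by correlating against a reference sine."""
    w = 2 * math.pi * FREQ_HZ / SAMPLE_RATE
    acc = sum(s * math.sin(w * n) for n, s in enumerate(samples))
    return 2 * acc / len(samples)

tone = [AMPLITUDE_LSB * math.sin(2 * math.pi * FREQ_HZ * n / SAMPLE_RATE)
        for n in range(N)]
plain = [quantize(x) for x in tone]                     # no dither
dithered = [quantize(x + tpdf_dither()) for x in tone]  # dither added first

print("undithered: any nonzero sample?", any(plain))
print(f"dithered: recovered 1 kHz amplitude = {tone_amplitude(dithered):.3f} LSB")
```

Without dither the sub-LSB tone rounds away to pure silence. With dither the samples are noisy, but the tone is still in there at very nearly its original level, sitting under uncorrelated noise.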
> If that is the case, can all the other
> signal content be digitally removed without affecting the 1kHz signal? If it
> can't be removed, without a-priori knowledge of what the actual signal
> should contain, then might that be a problem? If 16 bits is the *only*
> format available for storing and transporting sound, yet it introduces
> artefacts that can not be removed, then it is not the encoding we need for
> sound which may be further processed.
The short answer is: correct. Processing breaks the dither (rendering it partly or completely useless) but the added noise from the dither is still there.
16 bit depth, dithered or no, is still going to be deeper than any analog format that preceded it. But if you have the choice between processing 24 bit audio and 16, you should choose the 24.
2012-04-04 10:09 am (UTC)
You're right, but you're wrong
I agree with your article and conclusions and I enjoyed reading it.
However, people including myself will still make the move to 24/192 music files. Why? For the same reasons people buy car A, which is inferior to car B, even though car B is cheaper. <-- TL;DR: you can stop reading here :P.
I've done double blind tests, on both my own and better audio setups. The best test I've done was the one which proved me wrong; my own hand picked songs for which I've done the ripping and coding myself to make sure it was done right. I could not differentiate ogg > 160kbps and mp3 higher than 192kbps. For the latter, I couldn't have differentiated between 160kbps if it wasn't for one particular song for which the sound stage was screwed. For certain songs even 128kbps was enough for both ogg and mp3.
Yet, I eagerly await the move to lossless 24/96 or even 192khz. But why? It doesn't make sense. Buying car A over car B doesn't make sense either.
In fact, virtually none of my choices make sense. And neither do yours. Surprised? Virtually none of our choices as humans are rational. They are, for the very most part, emotional, even if you think they are not. Your car, your audio stuff, your microwave. Your wife. Virtually nothing you choose is based on pure rationale. It's all biased, prejudiced, emotional, because-it-makes-me-feel-good choices.
So why, if I admit you are right, and I've done the tests to prove myself wrong, would I still choose otherwise?
Maybe because there is no comfort in that knowledge. I listen to my perfectly encoded MP3s, 192kbps VBR, ripped perfectly to the bit and fed optically to my 24-bit DAC. 364 days a year there's nothing wrong, and on that one remaining day I'll jump up and say: 'did you hear that?!?'. And I'm off searching through my CDs to listen to the original (damned if I can find it though). There's nothing wrong here except my brain. I reason with myself and argue with the folks who heard it too. I'll point to your article and we'll spend the next 14 hours comparing and listening, switching audio gear and turning speakers upside down. Not that it matters; it changes nothing, and I'll move to 24/96 and I swear I'll feel better. And to the folks that come over I'll say, 'Dude it's 24 bit yo, it's even better than the original' :).
I don't care that your car was tested better than mine. All the tests prove it, I still don't care and I still like mine better and I'll gladly pay more for it too.
It doesn't make sense, but for all the choices we make, what does?
One day we will move to 64 million bit 4096THz audio. Not even aliens will hear the difference. But who cares and what does it matter? Why do you care? What are you fighting here? Cost of production? Size? Principle? I honestly wonder.
My internet bandwidth has increased almost 1000-fold (!!) over the past 20-odd years (dual ISDN at 128kbps in the early nineties to 120Mbit today). Hard disk (and storage in general) capacity is increasing 10 times every 5 years.
People are having the same discussion on DVD quality vs 720p vs 1080i vs 1080p vs 4K. Discussion is good and so is progress. But why are we questioning progress if it doesn't make things worse?
I believe these things are a result of technical progress. Moore's law. Things get faster and better and audio bitrate and screen resolutions go up as a result. Why? Because it's possible. Why should I question this?
When I look around my house, nearly everything is capable of more than what I'm using it for. Even something as simple as my stove is capable of 300 degrees Celsius. It will burn everything in it. I never use it over 220 degrees. My car does 250 kph while I can only do 120 max. My microwave has 1200 watts while I only need 600 or even less, and my TV has full HD capabilities while I can't even differentiate 720p from a good DVD at 4 meters distance.
Why then do you worry about 24 bits while everything in and around your house is capable of so much more than what you're using it for and you never worry about it? Or do you?
2012-04-19 07:59 pm (UTC)
You mentioned squishyball. Do you have any intention to make a release? I'd like to see a Debian package, and one of the guidelines is that any uploaded package should stem from an actual release. There's the occasional -svn package, but I've never seen one without some justification (most often a non-svn package that's hopelessly outdated).
Also, is there a (more-or-less) canonical set of samples to test against?
2012-08-29 05:14 pm (UTC)
a .tar.xz of the sources would already increase the comfort level because i wouldn't have to install git or click 50 times till i have the sourcecode ready to compile.
lol, I was definitely saying "whaaa?" thanks for linking the whole article :-D
2012-09-06 02:29 pm (UTC)
Ah, the spambots finally found this one....
Looks like the spambots are out in full force again. It's been ~ 6 months, so I think it's not a bad time to close comments to keep the spam out.
Thanks to everyone who wrote, and there's still email :-)