
Could 8-bit be enough for carrying all the information in a given piece of music?

Frank Dernie

Master Contributor
Forum Donor
Joined
Mar 24, 2016
Messages
6,452
Likes
15,797
Location
Oxfordshire
First I should apologise for the slightly provocative title.
I know human hearing has a dynamic range of ~120dB, but the only device I can think of that needs that much would be an environmental noise recorder where you didn't want to miss any sound and didn't want to use automatic gain control.
Those of us who have recorded music on any type of recorder will be familiar with its level control. If the sound is loud we turn it down a bit, and vice versa.
Back in tape days it was a real skill to get the level right. Too high gave distortion on peaks, but too low made the tape noise too audible. A bit of tape overload can be euphonic (a tape-overload emulator plug-in is a popular limiter among recordists now) and is certainly better than audible hiss during the quiet parts of the music, so we did tend to over-record a touch, something that is catastrophic with digital.
With a 16-bit digital recorder it is very easy to set levels since, IME, even with peaks at -6dB the quiet bits of the music will never be accompanied by audible hiss.
What started me on this line of thought was an experience at the Scalford enthusiasts show, put on by the HiFi Wigwam, about 5 or so years ago.
I showed up with a few bits of music on a USB stick, one of which was a 24/96 recording of Eric Whitacre's Water Night.
@Pluto of this parish was there with his active Harbeth Monitor 40s and his laptop. I had intended to ask him to produce a 16/44.1 downsample of one of my files to compare with the 24/96 original, but he suggested a different comparison, one that might be more surprising for listeners and that he could do in real time on his PC.
This was to play back the file as 8-bit with noise shaping. We, the assembled audience of enthusiasts and many die hard "analogue is better" fans, then got to compare 24/96, 8/96 with noise shaping and 8/96 without.
Without noise shaping, the 8/96 had obvious hiss in the quiet bits and between tracks, but I think it fair to say nobody in the room could hear any difference between the 8-bit noise-shaped version and the original 24/96. I was surprised, and several of the audience were angry, refusing to believe they had been contentedly listening to 8-bit and claiming trickery.
Anyway, @sergeauckland was there too, so maybe he remembers it as well.
 

ElNino

Addicted to Fun and Learning
Joined
Sep 26, 2019
Messages
557
Likes
724
Believe it or not, there is virtually no academic research on the threshold of audibility of different bit depths. (With sample rates there is some research but limited consensus; with bit depths there is almost nothing out there.) Reiss did a review of all the published research along both axes in 2016; you might find it interesting: http://www.aes.org/tmpFiles/elib/20191025/18296.pdf

It is plausible that 8 bit, properly dithered, would be enough (or close to enough) to be audibly transparent for most people. We just don't have a lot of good published data either way.
 

pozz

Слава Україні
Forum Donor
Editor
Joined
May 21, 2019
Messages
4,036
Likes
6,827
@j_j previously did a demonstration, mentioned in this talk, in which a certain encoding scheme produced only 13dB SNR. Apparently it was hard to tell the difference between that and the original.
 

GrimSurfer

Major Contributor
Joined
May 25, 2019
Messages
1,238
Likes
1,484
First, the title is a little provocative because of the phrase "all the information".

Second, the focus on the musical signal ignores the issue of noise. A dynamic range of 48 dB would bring noise well into the audible band, particularly in the middle frequencies. Some of this might be masked by the room or the signal, so it may not be immediately apparent. But it is noise nonetheless and would contribute to fatigue and irritation over time.

The last point is something that doesn't get factored into ABX tests. The sound clips are short and frequently switched. It can, however, be picked up in other ways: prolonged use at home, for instance, which isn't affected as much by bias because there is no "B" sample against which to compare "A".
 

amirm

Founder/Admin
Staff Member
CFO (Chief Fun Officer)
Joined
Feb 13, 2016
Messages
44,563
Likes
238,997
Location
Seattle Area
If you sample high enough, you can get down to 1 bit, as with DSD. 96 kHz helps a lot with noise shaping since it allows the dither noise to be parked up there with plenty of space. And a higher sample rate reduces the needed bit depth somewhat as well.
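To make the 1-bit point concrete, here is a minimal first-order sigma-delta sketch in Python (NumPy assumed; the rate and tone are made-up illustration values, and real DSD modulators are much higher order):

```python
import numpy as np

# Illustrative parameters only: a DSD64-style rate, 1 kHz tone at -6 dBFS
fs = 64 * 44100
n = fs // 10                          # 100 ms of signal
t = np.arange(n) / fs
x = 0.5 * np.sin(2 * np.pi * 1000 * t)

# First-order sigma-delta: integrate input minus fed-back output,
# then quantize the integrator state to a single bit (+1 or -1).
acc, prev = 0.0, 0.0
y = np.empty(n)
for i in range(n):
    acc += x[i] - prev
    y[i] = 1.0 if acc >= 0.0 else -1.0
    prev = y[i]

# A crude decimation (moving-average) filter recovers the tone, because
# the quantization noise has been pushed far above the audio band.
recovered = np.convolve(y, np.ones(64) / 64, mode="same")
```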
 

GrimSurfer

Major Contributor
Joined
May 25, 2019
Messages
1,238
Likes
1,484
If you sample high enough, you can get down to 1 bit, as with DSD. 96 kHz helps a lot with noise shaping since it allows the dither noise to be parked up there with plenty of space. And a higher sample rate reduces the needed bit depth somewhat as well.

Sure, but then things turn into a to-may-to, to-mah-to discussion (16/44 vs 8/96). In an era where bandwidth, memory, and processing aren't limiting factors, it probably doesn't matter.
 

ayane

Active Member
Forum Donor
Joined
Dec 15, 2018
Messages
183
Likes
686
Location
NorCal
I've tested this on myself anecdotally, and I was not surprised that I can tell the difference between 44/16 and 44/8, with or without noise shaping. I've got fairly young ears and very clean-measuring gear, so I might be an outlier.

I want to go back and see how low I can drop the bit depth before it's audibly indistinguishable from the original. I wouldn't be surprised if 9 or 10 bits is transparent to me. I also wonder what the lowest sample rate would be that makes noise-shaped 8-bit transparent for me.
 

MZKM

Major Contributor
Forum Donor
Joined
Dec 1, 2018
Messages
4,250
Likes
11,551
Location
Land O’ Lakes, FL
Most people listen to music with peaks of 95dB. 16-bit music is usually dithered, giving >100dB of dynamic range, so 0dB to 95dB is perfectly played back.

For movies, reference-level peaks are 105dB. Most rooms have a noise floor of roughly 35dB-50dB (lower in the treble region), so that is ~70dB of SNR. Most studies show that with music, THD of roughly 1% is the lowest we can hear (close to 100% THD in the bass), meaning a SINAD of ~40dB should be good enough (maybe not perfect, but good enough).

8-bit is ~49dB; let's see if you can tell the 8-bit from the 16-bit in this Neil Young song.

So yes, 16-bit, which is usually dithered and so has >100dB of dynamic range, should be good enough.
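For reference, those figures fall out of the standard quantization-SNR formula for a full-scale sine with uniform quantization (the >100dB claim for dithered 16-bit additionally reflects perceptually weighted, noise-shaped dither, which buys more than the raw 98dB):

```latex
\mathrm{SNR} \approx 6.02\,N + 1.76\ \text{dB}
\qquad\Longrightarrow\qquad
N = 8:\ \approx 49.9\ \text{dB}, \qquad
N = 16:\ \approx 98.1\ \text{dB}
```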
 

DonH56

Master Contributor
Technical Expert
Forum Donor
Joined
Mar 15, 2016
Messages
7,868
Likes
16,622
Location
Monument, CO
Oversampling and noise shaping get a little complicated... See e.g. https://www.audiosciencereview.com/...igma-delta-digital-audio-converters-dac.1928/ and there are some other good articles about it in the technical section.
  • Noise shaping can be performed using oversampling or (rarely, IME) paralleled converters with special techniques that provide equivalent oversampling (Hadamard sequences and input multipliers are one way I researched once upon a time).
  • Oversampling with nothing else (no noise shaping or special filtering) provides about 1/2 bit of reduction for each doubling of the sampling rate for a given bandwidth. This is simply because the noise is spread over a greater bandwidth, so if you double the sampling rate but keep the same output bandwidth, the in-band noise is reduced by about 3 dB.
  • Increasing the order of the delta-sigma (or whatever) modulator yields greater SNR improvement, to the tune of about 1 additional bit per order (no noise shaping ~0.5 bit, 1st order ~1.5 bits, 2nd order ~2.5 bits, etc., per doubling of the sampling rate, again for a fixed bandwidth); see the quick calculation after this list.
  • This is for quantization noise, so distortion is not really affected (the input or output still needs N-bit linearity for N-bit resolution), and dither (noise decorrelation) is generally affected just like any other input signal (the added noise over the Nyquist bandwidth will be shaped above the output bandwidth).
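A quick Python sketch of those rules of thumb; the (2L+1)·3dB-per-octave figure is the standard textbook approximation for an order-L shaper, so this is illustrative rather than a statement about any particular converter:

```python
import math

def bits_per_octave(order: int) -> float:
    """Effective bits gained per doubling of the sample rate for an
    order-L noise shaper; order=0 is plain oversampling."""
    return (2 * order + 1) * 10 * math.log10(2) / 6.02

for L in range(3):
    print(f"order {L}: ~{bits_per_octave(L):.1f} bits per octave")
# order 0: ~0.5, order 1: ~1.5, order 2: ~2.5 (matching the bullets above)
```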
I am not sure how higher sampling rates lead to lower bit depth except when noise shaping is used -- @amirm, is that what you meant, or something else?

I made up some files years ago to try a few different scenarios. I can well believe 8 bits would be fine for some music and inadequate for others, probably readily discerned with test tones, and likely due more to the quantization noise floor than to distortion.

HTH - Don
 

ayane

Active Member
Forum Donor
Joined
Dec 15, 2018
Messages
183
Likes
686
Location
NorCal
Interesting. Can anybody explain how noise shaping works? I could never figure that out…
Noise shaping works by pushing the quantization error into parts of the spectrum that are out of the way of the signal.

For example, quantizing a 16-bit signal to 8 bits introduces quantization noise that is highly correlated with the signal. This can be avoided by randomizing the least significant bit. Choosing that bit to be 1 or 0 with 50/50 probability results in white noise, which has a flat spectrum with equal power at all frequencies. That white noise can then be reshaped so that more of it lands in the high frequencies, where our hearing is weak, which increases the signal-to-noise ratio where our hearing is most sensitive. A simple way of doing this is to bias the randomization of the LSB from sample to sample so that the resulting error spectrum takes the shape we want.

This is a high-level conceptual explanation; understanding it properly takes a good grasp of the math. The Wikipedia article on noise shaping is a good place to start, as is Monty's "Digital Show and Tell".
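A minimal sketch of the dither step described above, requantizing 16-bit samples to 8-bit steps with TPDF dither (Python with NumPy; the function name and constants are hypothetical, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(0)

def to_8bit_tpdf(x16: np.ndarray) -> np.ndarray:
    """Requantize 16-bit integer samples to 8-bit steps using TPDF
    dither: the difference of two uniform variables spans +/-1 LSB with
    a triangular PDF, decorrelating the error from the signal."""
    step = 256                        # one 8-bit LSB in 16-bit counts
    d = (rng.random(x16.shape) - rng.random(x16.shape)) * step
    q = np.round((x16 + d) / step) * step
    return np.clip(q, -32768, 32767).astype(np.int16)  # guard overflow
```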
 

RayDunzl

Grand Contributor
Central Scrutinizer
Joined
Mar 9, 2016
Messages
13,245
Likes
17,144
Location
Riverview FL
1 "bit" is enough to create reproduce an easily recognizable tune.

At -30dBfs, no dither:

Hear at SoundCloud...

---

Amazingly enough, they didn't like the full-length version in one-bit one bit.

 

ayane

Active Member
Forum Donor
Joined
Dec 15, 2018
Messages
183
Likes
686
Location
NorCal
1 "bit" is enough to create reproduce an easily recognizable tune.

At -30dBfs, no dither:

Hear at SoundCloud...
What a brilliant example. This is almost exactly how PDM technology works, DSD for example. Technically, PDM is just 1-bit "noise-shaped" PCM =)
 

RayDunzl

Grand Contributor
Central Scrutinizer
Joined
Mar 9, 2016
Messages
13,245
Likes
17,144
Location
Riverview FL
This is almost exactly how PDM technology works

I wouldn't go quite so far as to say that.

https://en.wikipedia.org/wiki/Pulse-density_modulation

This recording is nothing more than zero crossings; if a sample is exactly zero, it will catch that too, as 0.

I'll call it ZCM - Zero Crossing Modulation

Take the original 16-bit tune, amplify it (allowing clipping) by 100dB or more, and any original sample above zero becomes +full scale, anything below zero becomes -full scale.

Attenuate that by 30dB for a reasonable playback level, and serve piping hot.
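That recipe is literally a sign function. A minimal NumPy sketch (hypothetical function name, illustrative only):

```python
import numpy as np

def zcm(x: np.ndarray, playback_db: float = -30.0) -> np.ndarray:
    """'Zero Crossing Modulation': keep only the sign of each sample
    (an exact zero stays 0), then scale to a reasonable playback level."""
    return np.sign(x) * 10 ** (playback_db / 20)
```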
 

Fluffy

Addicted to Fun and Learning
Joined
Sep 14, 2019
Messages
856
Likes
1,424
Most people listen to music with peaks of 95dB. 16-bit music is usually dithered, giving >100dB of dynamic range, so 0dB to 95dB is perfectly played back.

For movies, reference-level peaks are 105dB. Most rooms have a noise floor of roughly 35dB-50dB (lower in the treble region), so that is ~70dB of SNR. Most studies show that with music, THD of roughly 1% is the lowest we can hear (close to 100% THD in the bass), meaning a SINAD of ~40dB should be good enough (maybe not perfect, but good enough).

8-bit is ~49dB; let's see if you can tell the 8-bit from the 16-bit in this Neil Young song.

So yes, 16-bit, which is usually dithered and so has >100dB of dynamic range, should be good enough.
[attachment: 8bit.png]

I think I passed :)
And I can also say exactly how. First I captured the two samples of music using Audacity and synchronized them so I could compare them directly. On first listen it was difficult to tell which was which. But looking at the spectrogram revealed an audible hint:
[attachment: dacay.png]

As you can see, in the 8-bit version there is a slower decay of the high-frequency sounds. They are masked pretty heavily by the cymbals during most of the song, but in the final second the fade-out clearly reveals which one is the 8-bit version. So in the blind test I used that fade at the end to easily determine which one was the 8-bit version. After several passes, I trained my ear to pick up on the added sound in the high part of the spectrum, which can be heard during the decay and is definitely not associated with the cymbals. Once I "tuned" my attention to just that part of the spectrum, I could pick up the added noise on top of the cymbals, and so I was able to determine which one was the 8-bit version after a few seconds of the song playing (before it reached the fade-out). The one where the cymbals were dirtier is the 8-bit, and the one where the cymbals are cleaner is the 16-bit.

But that took some training and was very dependent on the specific music. I don't claim I could hear the difference in other songs, or even in other parts of this song.
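For anyone wanting to replicate that check, a minimal sketch (Python with NumPy/SciPy assumed; the function name and 10 kHz band edge are hypothetical) that tracks high-frequency energy over time so the decays of the two versions can be compared:

```python
import numpy as np
from scipy.signal import spectrogram

def hf_decay_db(x: np.ndarray, fs: int, f_lo: float = 10_000.0) -> np.ndarray:
    """Sum spectrogram energy above f_lo in each time slice (in dB),
    to compare how quickly high-frequency content decays."""
    f, t, S = spectrogram(x, fs=fs, nperseg=2048)
    band = S[f >= f_lo].sum(axis=0)
    return 10 * np.log10(band + 1e-12)
```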
 

Fluffy

Addicted to Fun and Learning
Joined
Sep 14, 2019
Messages
856
Likes
1,424
Noise shaping works by pushing the quantization error into parts of the spectrum that are out of the way of the signal.

For example, quantizing a 16-bit signal to 8 bits introduces quantization noise that is highly correlated with the signal. This can be avoided by randomizing the least significant bit. Choosing that bit to be 1 or 0 with 50/50 probability results in white noise, which has a flat spectrum with equal power at all frequencies. That white noise can then be reshaped so that more of it lands in the high frequencies, where our hearing is weak, which increases the signal-to-noise ratio where our hearing is most sensitive. A simple way of doing this is to bias the randomization of the LSB from sample to sample so that the resulting error spectrum takes the shape we want.

This is a high-level conceptual explanation; understanding it properly takes a good grasp of the math. The Wikipedia article on noise shaping is a good place to start, as is Monty's "Digital Show and Tell".
That's a good enough explanation for me to understand the general idea. Thanks!
 

MZKM

Major Contributor
Forum Donor
Joined
Dec 1, 2018
Messages
4,250
Likes
11,551
Location
Land O’ Lakes, FL
[attachment: 8bit.png]
I think I passed :)
And I can also say exactly how. First I captured the two samples of music using Audacity and synchronized them so I could compare them directly. On first listen it was difficult to tell which was which. But looking at the spectrogram revealed an audible hint:
[attachment: dacay.png]

As you can see, in the 8-bit version there is a slower decay of the high-frequency sounds. They are masked pretty heavily by the cymbals during most of the song, but in the final second the fade-out clearly reveals which one is the 8-bit version. So in the blind test I used that fade at the end to easily determine which one was the 8-bit version. After several passes, I trained my ear to pick up on the added sound in the high part of the spectrum, which can be heard during the decay and is definitely not associated with the cymbals. Once I "tuned" my attention to just that part of the spectrum, I could pick up the added noise on top of the cymbals, and so I was able to determine which one was the 8-bit version after a few seconds of the song playing (before it reached the fade-out). The one where the cymbals were dirtier is the 8-bit, and the one where the cymbals are cleaner is the 16-bit.

But that took some training and was very dependent on the specific music. I don't claim I could hear the difference in other songs, or even in other parts of this song.

So, that was with a noise floor ~50dB down; now imagine 16-bit, which is ~100dB down. With audible differences becoming drastically harder to detect as bit depth increases, the answer to the OP's question should be clear.

Now, one case Amir has made in other threads is that "music" thresholds vary as the music varies, so we should instead aim for absolute thresholds, which is ~116dB. I agree that this is good form, but it is indeed overkill.
 

j_j

Major Contributor
Audio Luminary
Technical Expert
Joined
Oct 10, 2017
Messages
2,278
Likes
4,780
Location
My kitchen or my listening room.
Interesting. Can anybody explain how noise shaping works? I could never figure that out…

Hmm. Yeah, I can, but it's not very intuitive. Basically, in the "encode" part of the system you examine the quantization error at the encoder, filter it, and use that to adjust the next quantization. The quantization noise then acquires a shape that you can control.
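A minimal sketch of that error-feedback loop, assuming Python/NumPy and the simplest possible filter (a one-sample delay, i.e. first order; real designs use longer filters to control the exact shape):

```python
import numpy as np

def error_feedback_quantize(x: np.ndarray, step: float) -> np.ndarray:
    """First-order error feedback: subtract the previous sample's
    quantization error before quantizing, which gives the error
    spectrum a high-pass (1 - z^-1) shape."""
    y = np.empty_like(x)
    e = 0.0                           # error committed at the last step
    for i in range(len(x)):
        v = x[i] - e                  # adjust input by the filtered error
        y[i] = step * np.round(v / step)
        e = y[i] - v                  # error committed at this step
    return y
```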
 