
Could 8-bit be enough for carrying all the information in a given piece of music?

Frank Dernie

Master Contributor
Forum Donor
Joined
Mar 24, 2016
Messages
6,452
Likes
15,797
Location
Oxfordshire
First I should apologise for the slightly provocative title.
I know human hearing has a dynamic range of ~120dB, but the only device I can think of that needs that much would be an environmental noise recorder where you didn't want to miss any sound and didn't want to use automatic gain control.
Those of us who have recorded music on any type of recorder will be familiar with its level control. If the sound is loud we turn it down a bit, and vice versa.
Back in tape days it was a real skill to get the level right. Too high gave distortion on peaks, but too low made the tape noise too audible. A bit of tape overload can be euphonic (a tape-overload emulator plug-in is a popular limiter among recordists now) and is certainly better than audible hiss during the quiet parts of the music, so we did tend to over-record a touch, something that is catastrophic with digital.
With a 16-bit digital recorder it is very easy to set levels since, IME, even with peaks at -6dB the quiet bits of the music will never be accompanied by audible hiss.
What started me on this line of thought was an experience at the Scalford enthusiasts show, put on by the HiFi Wigwam, about 5 or so years ago.
I showed up with a few bits of music on a USB stick, one of which was a 24/96 recording of Eric Whitacre's Water Night.
@Pluto of this parish was there with his active Harbeth Monitor 40s and his laptop. I had intended to ask him to produce a 16/44.1 downsample of one of my files to compare with the 24/96 original, but he suggested a different comparison, one that might be more surprising for listeners and that he could do in real time on his PC.
This was to play back the file as 8-bit with noise shaping. We, the assembled audience of enthusiasts and many die hard "analogue is better" fans, then got to compare 24/96, 8/96 with noise shaping and 8/96 without.
Without noise shaping, the 8/96 had obvious hiss in the quiet bits and between tracks, but I think it fair to say nobody in the room could hear any difference between the 8-bit noise-shaped version and the original 24/96. I was surprised, and several of the audience were angry, refusing to believe they had been contentedly listening to 8-bit and claiming trickery.
Anyway, @sergeauckland was there too, so maybe he remembers it as well.
 

ElNino

Addicted to Fun and Learning
Joined
Sep 26, 2019
Messages
557
Likes
724
Believe it or not, there is virtually no academic research on the threshold of audibility of different bit depths. (With sample rates there is some research but limited consensus; with bit depths there is almost nothing out there.) Reiss did a review of all the published research along both axes in 2016; you might find it interesting: http://www.aes.org/tmpFiles/elib/20191025/18296.pdf

It is plausible that 8 bit, properly dithered, would be enough (or close to enough) to be audibly transparent for most people. We just don't have a lot of good published data either way.
 

pozz

Слава Україні
Forum Donor
Editor
Joined
May 21, 2019
Messages
4,036
Likes
6,827
@j_j previously did a demonstration, mentioned in this talk, in which a certain encoding scheme produced only 13dB SNR. Apparently it was hard to tell the difference between that and the original.
 

GrimSurfer

Major Contributor
Joined
May 25, 2019
Messages
1,238
Likes
1,484
First, the title is a little provocative because of the phrase "all the information".

Second, the focus on the musical signal ignores the issue of noise. A dynamic range of 48 dB would bring noise well into the audible band, particularly in the middle frequencies. Some of this might be masked by the room or the signal, so it may not be immediately apparent. But it is noise nonetheless and would contribute to fatigue and irritation over time.

The last point is something that doesn't get factored into ABX tests. The sound clips are short and frequently switched. It can, however, be picked up in other ways: prolonged use at home, for instance, which isn't affected as much by bias because there is no "B" sample against which to compare "A".
 

amirm

Founder/Admin
Staff Member
CFO (Chief Fun Officer)
Joined
Feb 13, 2016
Messages
44,563
Likes
238,997
Location
Seattle Area
If you sample high enough, you can get down to 1 bit, as with DSD. 96 kHz helps a lot with noise shaping since it allows the dither noise to be parked up there with plenty of space. And a higher sample rate reduces the needed bit depth somewhat as well.
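To make the 1-bit point concrete, here is a minimal first-order sigma-delta sketch in Python (NumPy assumed; the rate and tone are made-up illustration values, and real DSD modulators are much higher order):

```python
import numpy as np

# Illustrative parameters only: a DSD64-style rate, 1 kHz tone at -6 dBFS
fs = 64 * 44100
n = fs // 10                          # 100 ms of signal
t = np.arange(n) / fs
x = 0.5 * np.sin(2 * np.pi * 1000 * t)

# First-order sigma-delta: integrate input minus fed-back output,
# then quantize the integrator state to a single bit (+1 or -1).
acc, prev = 0.0, 0.0
y = np.empty(n)
for i in range(n):
    acc += x[i] - prev
    y[i] = 1.0 if acc >= 0.0 else -1.0
    prev = y[i]

# A crude decimation (moving-average) filter recovers the tone, because
# the quantization noise has been pushed far above the audio band.
recovered = np.convolve(y, np.ones(64) / 64, mode="same")
```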
 

GrimSurfer

Major Contributor
Joined
May 25, 2019
Messages
1,238
Likes
1,484
If you sample high enough, you can get down to 1 bit, as with DSD. 96 kHz helps a lot with noise shaping since it allows the dither noise to be parked up there with plenty of space. And a higher sample rate reduces the needed bit depth somewhat as well.

Sure, but then things turn into a to-may-to, to-mah-to discussion (16/44 vs 8/96). In an era where bandwidth, memory, and processing aren't limiting factors, it probably doesn't matter.
 

ayane

Active Member
Forum Donor
Joined
Dec 15, 2018
Messages
183
Likes
686
Location
NorCal
I've tested this on myself anecdotally, and I was not surprised that I can tell the difference between 44/16 and 44/8, with or without noise shaping. I've got fairly young ears and very clean-measuring gear, so I might be an outlier.

I want to go back and see how low I can drop the bit depth before it's audibly indistinguishable from the original. I wouldn't be surprised if 9 or 10 bits is transparent to me. I also wonder what the lowest sample rate would be that makes noise-shaped 8-bit transparent for me.
 

MZKM

Major Contributor
Forum Donor
Joined
Dec 1, 2018
Messages
4,250
Likes
11,551
Location
Land O’ Lakes, FL
Most people listen to music with peaks of 95dB. 16-bit music is usually dithered, giving >100dB of dynamic range, so 0dB to 95dB is perfectly played back.

For movies, reference-level peaks are 105dB. Most rooms have a noise floor of roughly 35dB-50dB (lower in the treble region), so that is ~70dB of SNR. Most studies show that with music, THD of roughly 1% is the lowest we can hear (close to 100% THD in the bass), meaning a SINAD of ~40dB should be good enough (maybe not perfect, but good enough).

8-bit is ~49dB; let's see if you can tell the 8-bit from the 16-bit in this Neil Young song.

So yes, 16-bit, which is usually dithered and so has >100dB of dynamic range, should be good enough.
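For reference, those figures fall out of the standard quantization-SNR formula for a full-scale sine with uniform quantization (the >100dB claim for dithered 16-bit additionally reflects perceptually weighted, noise-shaped dither, which buys more than the raw 98dB):

```latex
\mathrm{SNR} \approx 6.02\,N + 1.76\ \text{dB}
\qquad\Longrightarrow\qquad
N = 8:\ \approx 49.9\ \text{dB}, \qquad
N = 16:\ \approx 98.1\ \text{dB}
```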
 

DonH56

Master Contributor
Technical Expert
Forum Donor
Joined
Mar 15, 2016
Messages
7,868
Likes
16,622
Location
Monument, CO
Oversampling and noise shaping get a little complicated... See e.g. https://www.audiosciencereview.com/...igma-delta-digital-audio-converters-dac.1928/ and there are some other good articles about it in the technical section.
  • Noise shaping can be performed using oversampling or (rarely, IME) paralleled converters with special techniques that provide equivalent oversampling (Hadamard sequences and input multipliers are one way I researched once upon a time).
  • Oversampling with nothing else (no noise shaping or special filtering) provides about 1/2 bit of reduction for each doubling of the sampling rate for a given bandwidth. This is simply because the noise is spread over a greater bandwidth, so if you double the sampling rate but keep the same output bandwidth, the in-band noise is reduced by about 3 dB.
  • Increasing the order of the delta-sigma (or whatever) modulator yields greater SNR improvement, to the tune of about 1 additional bit per order (no noise shaping ~0.5 bit, 1st order ~1.5 bits, 2nd order ~2.5 bits, etc., per doubling of the sampling rate, again for a fixed bandwidth); see the quick calculation after this list.
  • This is for quantization noise, so distortion is not really affected (the input or output still needs N-bit linearity for N-bit resolution), and dither (noise decorrelation) is generally affected just like any other input signal (the added noise over the Nyquist bandwidth will be shaped above the output bandwidth).
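A quick Python sketch of those rules of thumb; the (2L+1)·3dB-per-octave figure is the standard textbook approximation for an order-L shaper, so this is illustrative rather than a statement about any particular converter:

```python
import math

def bits_per_octave(order: int) -> float:
    """Effective bits gained per doubling of the sample rate for an
    order-L noise shaper; order=0 is plain oversampling."""
    return (2 * order + 1) * 10 * math.log10(2) / 6.02

for L in range(3):
    print(f"order {L}: ~{bits_per_octave(L):.1f} bits per octave")
# order 0: ~0.5, order 1: ~1.5, order 2: ~2.5 (matching the bullets above)
```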
I am not sure how higher sampling rates lead to lower bit depth except when noise shaping is used -- @amirm, is that what you meant, or something else?

I made up some files years ago to try a few different scenarios. I can well believe 8 bits would be fine for some music and inadequate for others, probably readily discerned with test tones, and likely due more to the quantization noise floor than to distortion.

HTH - Don
 

ayane

Active Member
Forum Donor
Joined
Dec 15, 2018
Messages
183
Likes
686
Location
NorCal
Interesting. Can anybody explain how noise shaping works? I could never figure that out…
Noise shaping works by pushing the quantization error into parts of the spectrum that are out of the way of the signal.

For example, quantizing a 16-bit signal to 8 bits introduces quantization noise that is highly correlated with the signal. This can be avoided by randomizing the least significant bit. Choosing that bit to be 1 or 0 with 50/50 probability results in white noise, which has a flat spectrum with equal power at all frequencies. That white noise can then be reshaped so that more of it lands in the high frequencies, where our hearing is weak, which increases the signal-to-noise ratio where our hearing is most sensitive. A simple way of doing this is to bias the randomization of the LSB from sample to sample so that the resulting error spectrum takes the shape we want.

This is a high-level conceptual explanation; understanding it properly takes a good grasp of the math. The Wikipedia article on noise shaping is a good place to start, as is Monty's "Digital Show and Tell".
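A minimal sketch of the dither step described above, requantizing 16-bit samples to 8-bit steps with TPDF dither (Python with NumPy; the function name and constants are hypothetical, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(0)

def to_8bit_tpdf(x16: np.ndarray) -> np.ndarray:
    """Requantize 16-bit integer samples to 8-bit steps using TPDF
    dither: the difference of two uniform variables spans +/-1 LSB with
    a triangular PDF, decorrelating the error from the signal."""
    step = 256                        # one 8-bit LSB in 16-bit counts
    d = (rng.random(x16.shape) - rng.random(x16.shape)) * step
    q = np.round((x16 + d) / step) * step
    return np.clip(q, -32768, 32767).astype(np.int16)  # guard overflow
```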
 

RayDunzl

Grand Contributor
Central Scrutinizer
Joined
Mar 9, 2016
Messages
13,245
Likes
17,144
Location
Riverview FL
1 "bit" is enough to create reproduce an easily recognizable tune.

At -30dBfs, no dither:

Hear at SoundCloud...

---

Amazingly enough, they didn't like the full-length version in one-bit one bit.

 

ayane

Active Member
Forum Donor
Joined
Dec 15, 2018
Messages
183
Likes
686
Location
NorCal
1 "bit" is enough to create reproduce an easily recognizable tune.

At -30dBfs, no dither:

Hear at SoundCloud...
What a brilliant example. This is almost exactly how PDM technology works, DSD for example. Technically, PDM is just 1-bit "noise-shaped" PCM =)
 

RayDunzl

Grand Contributor
Central Scrutinizer
Joined
Mar 9, 2016
Messages
13,245
Likes
17,144
Location
Riverview FL
This is almost exactly how PDM technology works

I wouldn't go quite so far as to say that.

https://en.wikipedia.org/wiki/Pulse-density_modulation

This recording is nothing more than zero crossings; if a sample is exactly zero, it will catch that too, as 0.

I'll call it ZCM - Zero Crossing Modulation

Take the original 16-bit tune, amplify it (allowing clipping) by 100dB or more, and any original sample above zero becomes +full scale, anything below zero becomes -full scale.

Attenuate that by 30dB for a reasonable playback level, and serve piping hot.
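That recipe is literally a sign function. A minimal NumPy sketch (hypothetical function name, illustrative only):

```python
import numpy as np

def zcm(x: np.ndarray, playback_db: float = -30.0) -> np.ndarray:
    """'Zero Crossing Modulation': keep only the sign of each sample
    (an exact zero stays 0), then scale to a reasonable playback level."""
    return np.sign(x) * 10 ** (playback_db / 20)
```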
 

Fluffy

Addicted to Fun and Learning
Joined
Sep 14, 2019
Messages
856
Likes
1,424
Most people listen to music with peaks of 95dB. 16-bit music is usually dithered, giving >100dB of dynamic range, so 0dB to 95dB is perfectly played back.

For movies, reference-level peaks are 105dB. Most rooms have a noise floor of roughly 35dB-50dB (lower in the treble region), so that is ~70dB of SNR. Most studies show that with music, THD of roughly 1% is the lowest we can hear (close to 100% THD in the bass), meaning a SINAD of ~40dB should be good enough (maybe not perfect, but good enough).

8-bit is ~49dB; let's see if you can tell the 8-bit from the 16-bit in this Neil Young song.

So yes, 16-bit, which is usually dithered and so has >100dB of dynamic range, should be good enough.
[attachment: 8bit.png]

I think I passed :)
And I can also say exactly how. First I captured the two samples of music using Audacity and synchronized them so I could compare them directly. On first listen it was difficult to tell which was which. But looking at the spectrogram revealed an audible hint:
[attachment: dacay.png]

As you can see, in the 8-bit version there is a slower decay of the high-frequency sounds. They are masked pretty heavily by the cymbals during most of the song, but in the final second the fade-out clearly reveals which one is the 8-bit version. So in the blind test I used that fade at the end to easily determine which one was the 8-bit version. After several passes, I trained my ear to pick up on the added sound in the high part of the spectrum, which can be heard during the decay and is definitely not associated with the cymbals. Once I "tuned" my attention to just that part of the spectrum, I could pick up the added noise on top of the cymbals, and so I was able to determine which one was the 8-bit version after a few seconds of the song playing (before it reached the fade-out). The one where the cymbals were dirtier is the 8-bit, and the one where the cymbals are cleaner is the 16-bit.

But that took some training and was very dependent on the specific music. I don't claim I could hear the difference in other songs, or even in other parts of this song.
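For anyone wanting to replicate that check, a minimal sketch (Python with NumPy/SciPy assumed; the function name and 10 kHz band edge are hypothetical) that tracks high-frequency energy over time so the decays of the two versions can be compared:

```python
import numpy as np
from scipy.signal import spectrogram

def hf_decay_db(x: np.ndarray, fs: int, f_lo: float = 10_000.0) -> np.ndarray:
    """Sum spectrogram energy above f_lo in each time slice (in dB),
    to compare how quickly high-frequency content decays."""
    f, t, S = spectrogram(x, fs=fs, nperseg=2048)
    band = S[f >= f_lo].sum(axis=0)
    return 10 * np.log10(band + 1e-12)
```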
 

Fluffy

Addicted to Fun and Learning
Joined
Sep 14, 2019
Messages
856
Likes
1,424
Noise shaping works by pushing the quantization error into parts of the spectrum that are out of the way of the signal.

For example, quantizing a 16-bit signal to 8 bits introduces quantization noise that is highly correlated with the signal. This can be avoided by randomizing the least significant bit. Choosing that bit to be 1 or 0 with 50/50 probability results in white noise, which has a flat spectrum with equal power at all frequencies. That white noise can then be reshaped so that more of it lands in the high frequencies, where our hearing is weak, which increases the signal-to-noise ratio where our hearing is most sensitive. A simple way of doing this is to bias the randomization of the LSB from sample to sample so that the resulting error spectrum takes the shape we want.

This is a high-level conceptual explanation; understanding it properly takes a good grasp of the math. The Wikipedia article on noise shaping is a good place to start, as is Monty's "Digital Show and Tell".
That's a good enough explanation for me to understand the general idea. Thanks!
 

MZKM

Major Contributor
Forum Donor
Joined
Dec 1, 2018
Messages
4,250
Likes
11,551
Location
Land O’ Lakes, FL
[attachment: 8bit.png]
I think I passed :)
And I can also say exactly how. First I captured the two samples of music using Audacity and synchronized them so I could compare them directly. On first listen it was difficult to tell which was which. But looking at the spectrogram revealed an audible hint:
[attachment: dacay.png]

As you can see, in the 8-bit version there is a slower decay of the high-frequency sounds. They are masked pretty heavily by the cymbals during most of the song, but in the final second the fade-out clearly reveals which one is the 8-bit version. So in the blind test I used that fade at the end to easily determine which one was the 8-bit version. After several passes, I trained my ear to pick up on the added sound in the high part of the spectrum, which can be heard during the decay and is definitely not associated with the cymbals. Once I "tuned" my attention to just that part of the spectrum, I could pick up the added noise on top of the cymbals, and so I was able to determine which one was the 8-bit version after a few seconds of the song playing (before it reached the fade-out). The one where the cymbals were dirtier is the 8-bit, and the one where the cymbals are cleaner is the 16-bit.

But that took some training and was very dependent on the specific music. I don't claim I could hear the difference in other songs, or even in other parts of this song.

So, that was with a noise floor ~50dB down; now imagine 16-bit, which is ~100dB down. With audible differences becoming drastically harder to detect as bit depth increases, the answer to the OP's question should be clear.

Now, one case Amir has made in other threads is that "music" thresholds vary as the music varies, so we should instead aim for absolute thresholds, which is ~116dB. I agree that this is good form, but it is indeed overkill.
 

j_j

Major Contributor
Audio Luminary
Technical Expert
Joined
Oct 10, 2017
Messages
2,278
Likes
4,780
Location
My kitchen or my listening room.
Interesting. Can anybody explain how noise shaping works? I could never figure that out…

Hmm. Yeah, I can, but it's not very intuitive. Basically, in the "encode" part of the system you examine the quantization error at the encoder, filter it, and use that to adjust the next quantization. The quantization noise then acquires a shape that you can control.
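A minimal sketch of that error-feedback loop, assuming Python/NumPy and the simplest possible filter (a one-sample delay, i.e. first order; real designs use longer filters to control the exact shape):

```python
import numpy as np

def error_feedback_quantize(x: np.ndarray, step: float) -> np.ndarray:
    """First-order error feedback: subtract the previous sample's
    quantization error before quantizing, which gives the error
    spectrum a high-pass (1 - z^-1) shape."""
    y = np.empty_like(x)
    e = 0.0                           # error committed at the last step
    for i in range(len(x)):
        v = x[i] - e                  # adjust input by the filtered error
        y[i] = step * np.round(v / step)
        e = y[i] - v                  # error committed at this step
    return y
```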
 