
MRC01

Major Contributor
Joined
Feb 5, 2019
Messages
3,422
Likes
4,029
Location
Pacific Northwest
...That's pretty much what the result looks like. ...
Wow. Then DSD has many different ways (bit encoding sequences) to encode the same signal. Because many of the permutations mean the same thing (decode to the same analog wave), it has wasted bits.
That makes DSD an inefficient way to encode the signal. This makes data rate comparisons misleading: DSD at the same data rate as PCM (total # of bits / sec) is lower-resolution audio.

PCM only wastes 1 bit per sample, the LSB which is randomized. That's only 1/16 or 1/24 of the bits. And it's not really wasted.
 

JustAnandaDourEyedDude

Addicted to Fun and Learning
Joined
Apr 29, 2020
Messages
518
Likes
819
Location
USA
If DSD encodes 0 and 1 whenever the amplitude is below or above zero, how does it encode the amplitude of the signal? For example when we change the volume of a waveform, that doesn't change any of its zero crossing points. More generally, I can devise many different signals that all have the same zero crossing points. And if DSD works like that, it seems it would sample them all the same.
What you describe above is what I referred to as "1-bit PCM" in my previous post (#138) in this thread. Quite right about that, it has a large quantization error, and will not distinguish between signals that have the same zero crossings (and largest amplitudes). DSD is a lot better than that because of the noise spread followed by noise shaping.

I suspect that DSD must work differently. Instead of simply being 0 or 1 when the signal is below or above zero, somehow, the ratio of 1s to 0s over a particular interval indicates the average amplitude over that interval. But this would mean that sometimes it encodes a 0 when the signal is above zero, and a 1 when it's below zero. Put differently, the probability of encoding a 1 or 0 would be proportional to the amplitude of the signal being encoded at that point. I don't know, just trying to make sense of it.

Again, you are right about that. Rather than the ratio of 1s to 0s, think of it as the average value of the 1s (assigned a value of +1) and 0s (assigned a value of -1) over a particular interval. Say your audio signal has a bandwidth fb, and for PCM you sample it at (better than) the Nyquist rate of 2 x fb. For 1-bit (or multi-bit) DSM, you oversample the audio signal at a frequency fs, where the oversampling ratio is defined as OSR = fs/(2 fb). The output of the 1-bit DSM will be a bitstream of 1s and 0s. If you pick the interval for averaging to be 1/(2 fb), corresponding to the Nyquist sampling rate or the lowest lossless PCM sampling rate, you will have OSR number of 1s and 0s to average (with the number of 1s ranging from zero to OSR). The increase in ENOB (effective number of bits) due to oversampling alone will be 1/2 lg(OSR), where lg is the base-2 logarithm; in other terms, the increase in SNR (relative to that of 1-bit PCM sampled at the Nyquist rate) due to oversampling will be 10 log10(OSR) dB.
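As a quick numeric check of the OSR, ENOB and SNR relations in this paragraph, here is a minimal Python sketch. The DSD64 figures (fs = 2.8224 MHz, fb = 22.05 kHz) are assumed for illustration, the SNR gain is expressed in decibels as 10 log10(OSR), and the numbers cover plain oversampling only, before any noise shaping.

```python
import math

def oversampling_gain(fs_hz, fb_hz):
    """Return (OSR, ENOB gain in bits, SNR gain in dB) for plain oversampling."""
    osr = fs_hz / (2.0 * fb_hz)            # OSR = fs / (2 fb)
    enob_gain_bits = 0.5 * math.log2(osr)  # half a bit per doubling of OSR
    snr_gain_db = 10.0 * math.log10(osr)   # about 3 dB per doubling of OSR
    return osr, enob_gain_bits, snr_gain_db

# Assumed example values: DSD64 sample rate and a 22.05 kHz audio bandwidth.
osr, bits, db = oversampling_gain(2_822_400, 22_050)
print(f"OSR = {osr:.0f}, ENOB gain = {bits:.2f} bits, SNR gain = {db:.2f} dB")
```

With these assumed inputs the OSR is 64, which gives 3 extra bits from oversampling alone; the rest of DSD's in-band resolution has to come from noise shaping.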

A zero amplitude stretch of time would be represented in the DSD bitstream by alternating 1s and 0s, which would value-average as zero. For a fixed DSM sampling frequency fs, if you increase the time interval dt over which you average the 1s and 0s, you improve the ENOB because you now have more samples per averaging interval, but you lose time resolution because you now have fewer, broader time intervals over which you are averaging. Similarly, decreasing the averaging time interval improves time resolution, but decreases the number of samples and thus the ENOB. The only way to improve both time and amplitude resolution is to further increase OSR, i.e. to increase the oversampling frequency fs; thus DSD-1024 is upon us. Of course, the microphone SNR and time resolution (frequency bandwidth limit) are the real limiting factors; increases in bit depth or time resolution beyond this in the digital domain are pretty much moot except to handle DSP losses in bit depth. DSD and DSM are a form of PDM, or pulse density encoding. The concept of PDM is similar to probability density functions or spectral density functions (amplitude per Hertz vs frequency in Hz) in the continuous variable case, though the twist in the case of DSM is that most of the resolution of the orthogonal axis (amplitude) is folded into the time axis.
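To make the difference between a plain sign quantizer and a delta-sigma bitstream concrete, here is a small Python sketch. It uses a textbook first-order modulator, which is an assumption for illustration only (real DSD encoders use higher-order loops), and it represents the bits as +1/-1 values as described above.

```python
import math

def sign_quantize(x):
    """Plain 1-bit quantizer: +1 when the sample is >= 0, otherwise -1."""
    return [1 if s >= 0 else -1 for s in x]

def delta_sigma_1st(x):
    """First-order delta-sigma modulator (illustrative, not an actual DSD encoder)."""
    bits = []
    integrator = 0.0
    feedback = 0.0
    for s in x:
        integrator += s - feedback  # accumulate the input-minus-output error
        feedback = 1.0 if integrator >= 0 else -1.0
        bits.append(int(feedback))
    return bits

def block_means(bits, block):
    """Average the +1/-1 values over consecutive non-overlapping blocks."""
    return [sum(bits[i:i + block]) / block
            for i in range(0, len(bits) - block + 1, block)]

# Silence comes out as an alternating pattern that averages to zero.
silence = delta_sigma_1st([0.0] * 8)

# The same sine at two volumes: identical zero crossings, different amplitude.
n, period = 4096, 1024
quiet = [0.2 * math.sin(2 * math.pi * t / period) for t in range(n)]
loud = [0.8 * math.sin(2 * math.pi * t / period) for t in range(n)]

same_signs = sign_quantize(quiet) == sign_quantize(loud)
peak_quiet = max(block_means(delta_sigma_1st(quiet), 64))
peak_loud = max(block_means(delta_sigma_1st(loud), 64))
print(silence, same_signs, peak_quiet, peak_loud)
```

The sign quantizer produces the same bits for both volumes, while the block averages of the delta-sigma stream follow the amplitude. The block length of 64 is the averaging interval; a longer block gives finer amplitude steps but coarser time resolution.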

The advantage of DSM is that it overcomes the difficulty (linearity problems) and hardware cost of large-bit-depth comparators for multi-bit PCM ADC by using high time-resolution of sampling made possible by precise high-frequency bistable oscillators (femtosecond clocks) at reasonable prices. Similarly, on the DAC side, it avoids linearity problems and high cost of large-bit-depth R-2R architecture, thus the technical and cost advantage of the DSM based flagship DACs of today. Plus, DSM with oversampling gives a large ultrasonic frequency range into which noise can be swept out of the audio band.

Analog-to-Digital Conversion (Springer, 3/e, 2017, 548pp) by Dr. Marcel Pelgrom (was team leader for high-speed ADC at Philips laboratories among other roles, worked for NXP Semiconductor, teaches at Delft University, University of Twente, 2017 recipient of IEEE field award Gustav R. Kirchhoff, holder of 35 US patents).

Quote from p. 269 "... Sigma-delta modulation and noise shaping are forms of pulse density modulation. ..."
Quote from p. 454 "... sigma-delta modulation is a form of pulse-density coding. ..."
 

bennetng

Major Contributor
Joined
Nov 15, 2017
Messages
1,634
Likes
1,692
What you describe above is what I referred to as "1-bit PCM" in my previous post (#138) in this thread. Quite right about that, it has a large quantization error, and will not distinguish between signals that have the same zero crossings (and largest amplitudes). DSD is a lot better than that because of the noise spread followed by noise shaping.
https://www.audiosciencereview.com/...easuring-distortion.10282/page-12#post-334093
If I understand your description correctly the audio files attached above should be able to illustrate it. There are two files in the attachment and the dithered one uses 0.5 bit TPDF dither with very weak noise shaping to make sure all samples are only encoded with two amplitude values.
 

JustAnandaDourEyedDude

Addicted to Fun and Learning
Joined
Apr 29, 2020
Messages
518
Likes
819
Location
USA
https://www.audiosciencereview.com/...easuring-distortion.10282/page-12#post-334093
If I understand your description correctly the audio files attached above should be able to illustrate it. There are two files in the attachment and the dithered one uses 0.5 bit TPDF dither with very weak noise shaping to make sure all samples are only encoded with two amplitude values.

Yes, those two audio files constitute an excellent example of the difference, thanks for linking to your previous uploads. The file "dither_off" corresponds to what I would call 4-bit PCM, and sounds pretty bad (though 1-bit PCM would sound a lot worse!), and the minimally dithered audio file sounds much more comprehensible. I think the minimal dithering likely spreads the quantization noise uniformly across the bandwidth.
 

mansr

Major Contributor
Joined
Oct 5, 2018
Messages
4,685
Likes
10,700
Location
Hampshire
What you describe above is what I referred to as "1-bit PCM" in my previous post (#138) in this thread. Quite right about that, it has a large quantization error, and will not distinguish between signals that have the same zero crossings (and largest amplitudes). DSD is a lot better than that because of the noise spread followed by noise shaping.
It's all 1-bit PCM, just with or without noise shaping.
 

tmtomh

Major Contributor
Forum Donor
Joined
Aug 14, 2018
Messages
2,635
Likes
7,486
This discussion prompted me to go into discogs, where I've catalogued my music collection, and see how many SACDs I own.

I found that I own 35, but 29 of them have a unique (and to my ears better) mastering not available on any regular CD pressing. And 4 of the remaining 6 have the same base mastering as a CD version, but for whatever reason the CD version is pushed to clipping while the SACD mastering/layer has the full dynamics.

So without realizing it until now, I've only ever bought two SACDs because of the SACD format - the rest were all for the mastering.

Upon reflection it makes sense, as I believe mastering trumps format every time (at least when it comes to digital formats). But still, I was surprised by this particular fact of my music collecting.
 

mansr

Major Contributor
Joined
Oct 5, 2018
Messages
4,685
Likes
10,700
Location
Hampshire
This discussion prompted me to go into discogs, where I've catalogued my music collection, and see how many SACDs I own.

I found that I own 35, but 29 of them have a unique (and to my ears better) mastering not available on any regular CD pressing. And 4 of the remaining 6 have the same base mastering as a CD version, but for whatever reason the CD version is pushed to clipping while the SACD mastering/layer has the full dynamics.

So without realizing it until now, I've only ever bought two SACDs because of the SACD format - the rest were all for the mastering.

Upon reflection it makes sense, as I believe mastering trumps format every time (at least when it comes to digital formats). But still, I was surprised by this particular fact of my music collecting.
If something you want is only available as DSD, that's of course no reason not to buy it. It's not quite that terrible a format. What you shouldn't do is pay extra for DSD when the same music is available cheaper on another format. The most egregious pricing is probably that of Cookie Marenco who asks $15 for CD quality ($40 for a physical disc) and $50 for DSD256.
 

JustAnandaDourEyedDude

Addicted to Fun and Learning
Joined
Apr 29, 2020
Messages
518
Likes
819
Location
USA
It's all 1-bit PCM, just with or without noise shaping.

True enough, they are both binary bitstreams that can both be interpreted as 1-bit PCM. However, the 1-bit PCM without DSM and noise reshaping seemingly only retains one bit depth of amplitude information, while the DSD version apparently retains the equivalent of 18 bits depth of amplitude information (the ENOB) in the audio passband, which should allow it to sound a lot better than the version without the DSM and noise reshaping.

This thread made me curious about how DSD differed from 1-bit and multi-bit PCM without noise shaping in mathematical or algorithmic terms. Now I understand enough to satisfy that idle curiosity.

I have no interest in actually buying or streaming any DSD, I rejected SACD from the start. I only bought a few hybrid SACDs when it seemed that a (cheaper) pure CD version of the album was unavailable. The primary factor for me is that the patents for Redbook CD have long expired, whereas SACD/DSD continues to be locked down with patents and DRM. Then as to sound quality, higher bitrate PCM is as capable of accuracy as DSD. And anyway, my hearing is probably shot over 10kHz, I could not even reliably distinguish 128kbps from 16/44.1 on some tracks in the 2015 NPR online test. I certainly do not need or want anything more than 16/44.1 PCM with FLAC for DRM-free size-efficient files or bitstreams for any downloads or streaming I purchase.
 

JustAnandaDourEyedDude

Addicted to Fun and Learning
Joined
Apr 29, 2020
Messages
518
Likes
819
Location
USA
Wow. Then DSD has many different ways (bit encoding sequences) to encode the same signal. Because many of the permutations mean the same thing (decode to the same analog wave), it has wasted bits.
That makes DSD an inefficient way to encode the signal. This makes data rate comparisons misleading: DSD at the same data rate as PCM (total # of bits / sec) is lower-resolution audio.

PCM only wastes 1 bit per sample, the LSB which is randomized. That's only 1/16 or 1/24 of the bits. And it's not really wasted.

I am unclear on this point, so I remain cautious about dismissing DSD's efficiency of sample usage as being as poor as it appears. On the surface, it seems so if you consider any particular time interval (say, of width 1/(2 fb) where fb is the audio frequency bandwidth at an interval position A) over which to average the values of the 1s and 0s. You are right in this case that many permutations would give you the same average, all that matters are the total number of 1s and the total number of 0s. So on the surface it seems that DSD is wasteful in the amount of oversampling needed to represent a desired ENOB of amplitude information.
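The surface-level view in this paragraph can be counted directly. The sketch below assumes a block of 64 one-bit samples (OSR = 64, chosen for illustration) and treats only the number of 1s as meaningful, which is exactly the simplification being questioned here.

```python
import math

def block_multiplicity(block_len):
    """For each possible count of 1s in a block, how many bit patterns share it."""
    return [math.comb(block_len, ones) for ones in range(block_len + 1)]

counts = block_multiplicity(64)
total_patterns = sum(counts)      # every 64-bit pattern is counted exactly once
distinct_averages = len(counts)   # 0..64 ones, so 65 possible block averages
bits_of_information = math.log2(distinct_averages)

print(distinct_averages, total_patterns == 2 ** 64, round(bits_of_information, 2))
```

If only the block average mattered, 64 transmitted bits would carry about 6 bits of amplitude information, and the most ambiguous case would be the half-and-half block. Whether the ordering inside the block carries additional information is a separate question that this count does not address.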

However, we may consider the averaging window of width say 1/(2 fb) to move to later times one sample width (1/fs) at a time until it reaches an interval position B which is contiguous with A but non-overlapping. The signed signal value is a well defined quantity (by the averaging) not only for intervals A and B, but also for all the intervening overlapping intervals, consecutive ones of which differ at most in the difference in value of two samples which are not in their overlap.

The high-order noise shaping algorithms probably uniquely determine the pattern of distribution of the 1s and 0s, and thus determine the variation of the signed signal value between intervals A and B inclusive. The action of those noise-shaping algorithms possibly amounts to multiple smoothness criteria in transitioning from its value on the interval A to that on B. I am thinking of an analogy with high-order spline curves as a continuous variable example. So it is possible that the permutations you refer to are not arbitrary but are effectively statistically constrained to meet some smoothness criteria for the signal, which may amount to higher frequency/time resolution of the DSD signal (than if you just consider the Nyquist sampling intervals) though with a decreasing number of effective bits as you examine increasing frequency. I do not know, but I hesitate to pass judgement on the wastefulness of DSD until someone with the math skills analyzes this.
 

mansr

Major Contributor
Joined
Oct 5, 2018
Messages
4,685
Likes
10,700
Location
Hampshire
True enough, they are both binary bitstreams that can both be interpreted as 1-bit PCM. However, the 1-bit PCM without DSM and noise reshaping seemingly only retains one bit depth of amplitude information, while the DSD version apparently retains the equivalent of 18 bits depth of amplitude information (the ENOB) in the audio passband, which should allow it to sound a lot better than the version without the DSM and noise reshaping.
Yes, noise shaping improves the dynamic range in part of the spectrum by making it worse elsewhere. That is not unique to DSD.

This thread made me curious about how DSD differed from PCM in mathematical or algorithmic terms.
It doesn't, not one bit.
 

mansr

Major Contributor
Joined
Oct 5, 2018
Messages
4,685
Likes
10,700
Location
Hampshire
I do not know, but I hesitate to pass judgement on the wastefulness of DSD until someone with the math skills analyzes this.
It's simple. On a spectrum chart, mark the area deemed important. Typically this is a rectangle covering frequencies below ~20 kHz and levels above -100 dBFS or so; quantisation noise is not allowed within this area. The minimum sample rate required is twice the highest frequency in the important area. The lowest level determines the maximum bit depth that might be needed. The trivial solution is to use a sample rate a little higher than the bare minimum (for the filter transition band), whatever bit depth gives the desired dynamic range, and flat TPDF dither. Alternatively, a lower bit depth and higher sample rate can be used along with noise shaping to keep from encroaching on the audio area.

As for efficiency, the trivial choice is unbeatable since it encodes exactly the information we want and nothing (well, only a little) more. This can be matched by a high-rate, low-resolution encoding only if the noise shaping is 100% effective, i.e. capable of shifting some amount of noise to another part of the spectrum with exactly the same area. No known algorithm can do this. It follows that noise-shaped encodings are always somewhat wasteful.
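A rough sketch of the trivial solution in numbers follows. It rests on assumptions that are not from this post: about 6.02 dB of dynamic range per bit as a rule of thumb, a hypothetical 10% sample-rate margin for the filter transition band, and no allowance for the noise added by dither or for how an FFT plot displays the noise floor.

```python
import math

DB_PER_BIT = 6.02  # rule-of-thumb dynamic range per bit (assumption)

def trivial_pcm(f_max_hz, floor_dbfs, margin=1.1):
    """Sample rate, bit depth and bit rate for the 'trivial' PCM rectangle.

    margin is a hypothetical allowance for the filter transition band.
    """
    sample_rate_hz = 2.0 * f_max_hz * margin
    bit_depth = math.ceil(abs(floor_dbfs) / DB_PER_BIT)
    return sample_rate_hz, bit_depth, sample_rate_hz * bit_depth

# The rectangle described above: below ~20 kHz, above about -100 dBFS.
rate, depth, bits_per_second = trivial_pcm(20_000, -100)
print(f"{rate:.0f} Hz, {depth} bits, {bits_per_second:.0f} bit/s per channel")
```

Under these assumptions the rectangle fits in well under 1 Mbit/s per channel, which is the baseline any noise-shaped, high-rate encoding would have to match to be equally efficient.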
 

AnalogSteph

Major Contributor
Joined
Nov 6, 2018
Messages
3,334
Likes
3,278
Location
.de
From what I've seen, DSD64 (2.8224 MHz 1 bit) effectively performs roughly like 18-bit, 88-96 kHz PCM... 16 bit 88-96 kHz with shaped dither perhaps. (Unlike 44.1 where you have to squeeze in dithered noise energy just below 20 kHz, there is plenty of room once you go hi-res!) Its data rate is almost twice that. Not sure what would happen if you were to try to compress both.
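The "almost twice" figure can be checked with simple arithmetic. This sketch compares per-channel bit rates, taking 18-bit PCM at 88.2 kHz and 96 kHz as the two ends of the range mentioned above.

```python
def bitrate(sample_rate_hz, bits_per_sample):
    """Uncompressed bit rate per channel, in bits per second."""
    return sample_rate_hz * bits_per_sample

dsd64 = bitrate(2_822_400, 1)
pcm_18_88 = bitrate(88_200, 18)
pcm_18_96 = bitrate(96_000, 18)

ratio_88 = dsd64 / pcm_18_88
ratio_96 = dsd64 / pcm_18_96
print(f"DSD64 vs 18/88.2: {ratio_88:.2f}x, DSD64 vs 18/96: {ratio_96:.2f}x")
```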

The one (at least theoretical) problem with a 1-bit stream is that it cannot be dithered enough for 100% transparency (see Lipshitz/Vanderkooy SACD paper). If you take a sine wave and encode it at ever lower amplitudes, it can always be retrieved from properly dithered multibit PCM no matter how low you go, it just takes ever more filtering. In 1-bit, it'll eventually be gone. I'm pretty sure this is in no way audibly relevant, as nobody does the equivalent of retrieving deep space probe signals with DSD. (The noise floor is at about -110 dBFS for DSD64, so I would expect something well below that.)
 

JustAnandaDourEyedDude

Addicted to Fun and Learning
Joined
Apr 29, 2020
Messages
518
Likes
819
Location
USA
It's simple. On a spectrum chart, mark the area deemed important. Typically this is a rectangle covering frequencies below ~20 kHz and levels above -100 dBFS or so; quantisation noise is not allowed within this area. The minimum sample rate required is twice the highest frequency in the important area. The lowest level determines the maximum bit depth that might be needed. The trivial solution is to use a sample rate a little higher than the bare minimum (for the filter transition band), whatever bit depth gives the desired dynamic range, and flat TPDF dither. Alternatively, a lower bit depth and higher sample rate can be used along with noise shaping to keep from encroaching on the audio area.

As for efficiency, the trivial choice is unbeatable since it encodes exactly the information we want and nothing (well, only a little) more. This can be matched by a high-rate, low-resolution encoding only if the noise shaping is 100% effective, i.e. capable of shifting some amount of noise to another part of the spectrum with exactly the same area. No known algorithm can do this. It follows that noise-shaped encodings are always somewhat wasteful.

Agreed. Thanks for the neat logic. I did know that the trivial PCM solution is the most efficient solution. DSD will be somewhat wasteful. I was merely hesitating over whether one could jump to the conclusion that MRC01 seemed to have in mind, that for each OSR number of 1-bit values, the ordering of the values is immaterial, and that it is only the proportion of 1s and 0s within that interval that mattered. As I wondered, would the actual ordering of 1-bit values by the noise shaping algorithm add some resolution at higher frequencies, though not with the ENOB it achieves at lower frequencies? Such resolution would likely be above 20 kHz and thus not be of practical interest, but still of mathematical interest. I guess AnalogSteph probably just answered my question.
 

mansr

Major Contributor
Joined
Oct 5, 2018
Messages
4,685
Likes
10,700
Location
Hampshire
Agreed. Thanks for the neat logic. I did know that the trivial PCM solution is the most efficient solution. DSD will be somewhat wasteful. I was merely hesitating over whether one could jump to the conclusion that MRC01 seemed to have in mind, that for each OSR number of 1-bit values, the ordering of the values is immaterial, and that it is only the proportion of 1s and 0s within that interval that mattered. As I wondered, would the actual ordering of 1-bit values by the noise shaping algorithm add some resolution at higher frequencies, though not with the ENOB it achieves at lower frequencies? Such resolution would likely be above 20 kHz and thus not be of practical interest, but still of mathematical interest. I guess AnalogSteph probably just answered my question.
The precise pattern of bit values _is_ the noise shaping. Since it's not possible to completely fill a section of the spectrum with noise, there is indeed some dynamic range available at higher frequencies. In practice, that's not of much use, though, since the spectral content of music drops off as the frequency rises while the noise level increases. It doesn't take long for the noise to completely swamp whatever signal remains.
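The rising noise level can be seen in a toy experiment. The sketch below is an assumption-laden illustration, not a model of a real DSD encoder: it uses a first-order modulator, a 2048-sample test sine, and a slow direct DFT so that it needs only the standard library. It compares the average noise power per frequency bin at the bottom of the spectrum with that near half the sample rate.

```python
import cmath
import math

def delta_sigma_1st(x):
    """First-order delta-sigma modulator with +1/-1 output (illustrative only)."""
    out = []
    integrator = 0.0
    feedback = 0.0
    for s in x:
        integrator += s - feedback
        feedback = 1.0 if integrator >= 0 else -1.0
        out.append(feedback)
    return out

def mean_bin_power(signal, first_bin, last_bin):
    """Mean DFT power per bin over [first_bin, last_bin], using a direct DFT."""
    n = len(signal)
    total = 0.0
    for k in range(first_bin, last_bin + 1):
        step = -2j * math.pi * k / n
        coeff = sum(s * cmath.exp(step * t) for t, s in enumerate(signal))
        total += abs(coeff) ** 2
    return total / (last_bin - first_bin + 1)

n = 2048
x = [0.5 * math.sin(2 * math.pi * 5 * t / n) for t in range(n)]
bits = delta_sigma_1st(x)
noise = [b - s for b, s in zip(bits, x)]  # what the bitstream adds to the input

low = mean_bin_power(noise, 1, 16)        # lowest 1/64 of the range up to fs/2
high = mean_bin_power(noise, 768, 1023)   # upper quarter, just below fs/2
print(f"mean noise power per bin: low band {low:.1f}, high band {high:.1f}")
```

The low band stays comparatively clean while the noise piles up towards half the sample rate. Higher-order modulators push this contrast much further, at the cost of an even steeper rise above the audio band.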
 

JustAnandaDourEyedDude

Addicted to Fun and Learning
Joined
Apr 29, 2020
Messages
518
Likes
819
Location
USA
The precise pattern of bit values _is_ the noise shaping. Since it's not possible to completely fill a section of the spectrum with noise, there is indeed some dynamic range available at higher frequencies. In practice, that's not of much use, though, since the spectral content of music drops off as the frequency rises while the noise level increases. It doesn't take long for the noise to completely swamp whatever signal remains.

Thanks for confirming and explaining in an easy-to-visualize manner. I think I have seen it in plots on ASR or on your website. Now I have a feel for what DSD is and how it does what it does. Next up, if you come up with good explanations for the layperson of either QFT or the Atiyah-Singer Index Theorem and Nicolae Teleman's extension of it to manifolds with graded algebras, I will be interested to read them :)
 

mansr

Major Contributor
Joined
Oct 5, 2018
Messages
4,685
Likes
10,700
Location
Hampshire
The precise pattern of bit values _is_ the noise shaping.
To elaborate on that, if you were to split the sample stream into blocks of 64 (assuming that is the oversampling factor) and collect all the 1 bits at the start of each such block, you'd get basically a PWM signal quantised (in time) with 6-bit precision. DSD can, if one is thus inclined, be seen as oversampled PWM quantised to two widths, zero and 100%, with noise shaping. Not that this interpretation is particularly useful. PCM is much more efficient since it gives each bit position within a block a different meaning, thus allowing the various combinations to express a wider range of values.
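The regrouping described here is easy to sketch. The block length of 64 follows the assumption in the post; the input bit pattern is made up for illustration and uses 1/0 values rather than +1/-1.

```python
def pulse_widths(bits, block=64):
    """Number of 1 bits in each consecutive block (the PWM pulse width)."""
    return [sum(bits[i:i + block])
            for i in range(0, len(bits) - block + 1, block)]

def to_pwm_blocks(bits, block=64):
    """Collect the 1 bits at the start of each block, zeros after them."""
    out = []
    for ones in pulse_widths(bits, block):
        out.extend([1] * ones + [0] * (block - ones))
    return out

# Made-up example: one block at half density, one block at three-quarter density.
bits = [1, 0] * 32 + [1, 1, 1, 0] * 16
pwm = to_pwm_blocks(bits)

print(pulse_widths(bits), pulse_widths(pwm))
```

Each block's average is unchanged by the regrouping, but the pulse width can only take the values 0 to 64. A 6-bit PCM word covers nearly the same range of values using 6 bits instead of 64, because each bit position carries a different weight.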
 

Tom C

Major Contributor
Joined
Jun 16, 2019
Messages
1,501
Likes
1,370
Location
Wisconsin, USA
Confuse audiophiles and make them pay more, apparently.
I thought it was because, at the time it was introduced, the patents on red book CD were running out, and Sony was hoping to launch a new proprietary system that would continue to generate licensing royalties.
Thank you for your clear and direct explanations, by the way. Makes for interesting reading.
 

mansr

Major Contributor
Joined
Oct 5, 2018
Messages
4,685
Likes
10,700
Location
Hampshire
I thought it was because, at the time it was introduced, the patents on red book CD were running out, and Sony was hoping to launch a new proprietary system that would continue to generate licensing royalties.
None of that necessitates DSD.
 