Does DSD recording benefit Japanese traditional instruments?

GXAlan · Apr 6, 2023

Is DSD 11.2 MHz "required" for recording taiko drums and Japanese stringed instruments?

Ikuko Kawai's new album "Hibiki -HIBIKI-": exploring a recording approach that combines Japanese and Western instruments - Phile-web

www.phileweb.com

Original article in Japanese

"The powerful and deep ultra-low sound of Japanese taiko drums and the penetrating treble of Noukan and Hichiriki are important listening highlights that audiophiles should pay attention to. Ezaki talks about the difficulty of recording and playback: 'Japanese taiko drums contain deep ultra-low sounds that can only be recorded at DSD 11.2MHz, so please challenge yourself to reproduce them accurately. The sound pressure was too high, and if I recorded at a normal level, it would all be over, so I put in a minus 20dB pad to suppress the over-level.

Incidentally, the sound pressure of the Noukan (Noh flute) is almost unheard of in Western instruments, and is comparable to the bang of a large drum. Its sharp and powerful treble is also likely to be one of the most difficult challenges for audio systems.'"

Is there any truth to this?
No idea. We weren't there at the time of recording, but we can run some indirect experiments.

We do know that the theoretical quantization noise of SACD is horrible beyond 20 kHz, but the dynamic range in the bass region is actually pretty good. So DSD isn't worse than PCM in this regard.

source: attached document

EXTON Studio TOKYO
The album was mixed at EXTON Studio TOKYO. Opened in 2012, the studio was "thoroughly particular about the power supply environment, and in addition to installing a pole transformer dedicated to power sharing, it is a studio that pursues sound quality from the power supply side, such as the burial of ground rods, high-quality sound quality wire rods, and the specifications of parts."

The studio uses the Pyramix DAW, which can keep everything in DSD 11.2 MHz except at crossfades, where DXD is brought into the chain. So at least the studio involved could benefit from DSD 11.2 MHz recordings.

At least from their 2012 equipment list, they've got Accuphase electronics and speakers from B&W, KEF, and even the Sony SS-AR1!

But... it seems that this album is only available on physical Hybrid SACD...
There's no 11.2 MHz DSD digital album to purchase. Admittedly, the recording engineer didn't say that you needed to *listen* to this at DSD 11.2 MHz -- just that it needed to be recorded at DSD 11.2 MHz.

This creates the scenario for our test. The recording engineer took the trouble to work with DSD 11.2 which is more difficult than PCM. Is there anything that carries to the finished disc?

Does DSD 2.8 MHz offer anything above 16-bit / 44.1 kHz for this album?

I chose Track #11, a re-arrangement of Mussorgsky's Pictures at an Exhibition for traditional Japanese instruments, which has some nice drum activity.

SACD layer

CD layer

Between the two layers, the measured dynamic range differs by 0.4 points on the LEFT channel but by only 0.1 points on the RIGHT.

Test parameters
@pkane's DeltaWave v2.0.8
DSD converted to "DXD" 352.8 kHz, with 50 kHz cut frequency and a transition BW of 5000 Hz. Auto Upsample turned on.
DSD and CD layers extracted digitally.
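
For reference, the rate arithmetic behind these settings can be sketched as below. This is only an illustrative check, not part of the DeltaWave output; it assumes the usual convention that DSD rates are multiples of 44.1 kHz (DSD64 for the SACD layer, DSD256 for "DSD 11.2 MHz") and takes the sample counts from the logs that follow.

```python
# Sample-rate arithmetic for the formats discussed in this post.
CD_RATE_HZ = 44_100

def dsd_rate_hz(multiple: int) -> int:
    """1-bit DSD rate, expressed as a multiple of the CD rate."""
    return multiple * CD_RATE_HZ

dsd64 = dsd_rate_hz(64)     # SACD layer, "DSD 2.8 MHz"
dsd256 = dsd_rate_hz(256)   # recording format, "DSD 11.2 MHz"
dxd = 8 * CD_RATE_HZ        # DXD PCM rate used for the DSD conversion

print(dsd64, dsd256, dxd)   # 2822400 11289600 352800

# Track lengths implied by the sample counts in the DeltaWave logs.
ref_seconds = 49_188_864 / dxd         # converted DSD layer
cmp_seconds = 6_149_304 / CD_RATE_HZ   # CD layer
print(round(ref_seconds, 2), round(cmp_seconds, 2))
```

Both layers come out at roughly 139.4 seconds, so the two extractions cover the same track to within a few hundredths of a second.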

DeltaWave v2.0.8, 2023-04-06T09:16:49.8591744-07:00
Reference: 11 - _Bonus Track_ Ikuko Kawai_ Excerpt from Pictures at an Exhibition.dsf[L] 49188864 samples 352800Hz 24bits, mono, MD5=00
Comparison: 11. Track11.wav[L] 6149304 samples 44100Hz 16bits, stereo, MD5=00
Settings:
Gain:True, Remove DC:True
Non-linear Gain EQ:False Non-linear Phase EQ: False
EQ FFT Size:65536, EQ Frequency Cut: 0Hz - 0Hz, EQ Threshold: -500dB
Correct Non-linearity: False
Correct Drift:True, Precision:30, Subsample Align:True
Non-Linear drift Correction:False
Upsample:True, Window:Kaiser
Spectrum Window:Kaiser, Spectrum Size:32768
Spectrogram Window:Hann, Spectrogram Size:4096, Spectrogram Steps:2048
Filter Type:FIR, window:Kaiser, taps:262144, minimum phase=False
Dither:False bits=0
Trim Silence:False
Enable Simple Waveform Measurement: False

Resampled Comparison to 352800Hz
Discarding Reference: Start=0s, End=150s
Discarding Comparison: Start=0s, End=150s

Initial peak values Reference: -0.002dB Comparison: -0.04dB
Initial RMS values Reference: -18.768dB Comparison: -18.767dB

Null Depth=14.973dB
X-Correlation offset: 1551 samples
Drift computation quality, #1: Excellent (0.09μs)

Trimmed 0 samples ( 0.00ms) front, 0 samples ( 0.00ms end)

Final peak values Reference: -0.002dB Comparison: -0.042dB
Final RMS values Reference: -18.768dB Comparison: -18.769dB

Gain= 0.0027dB (1.0003x) DC=0 Phase offset=4.396549ms (1551.103 samples)
Difference (rms) = -59.9dB [-83.82dBA]
Correlated Null Depth=61.4dB [79.66dBA]
Clock drift: 0 ppm

Files are NOT a bit-perfect match (match=0.61%) at 16 bits
Files are NOT a bit-perfect match (match=0%) at 24 bits
Files match @ 49.9977% when reduced to 9.42 bits

---- Phase difference (full bandwidth): 0.499160200574147°
0-10kHz: 0.22°
0-20kHz: 0.39°
0-24kHz: 0.41°
Timing error (rms jitter): 1.9μs
PK Metric (step=400ms, overlap=50%):
RMS=-87.4dBr
Median=-87.8
Max=-76.6

99%: -82.2
75%: -86.56
50%: -87.81
25%: -89.4
1%: -96.47

gn=0.99969121573486, dc=-2.98660917164619E-09, dr=0, of=1551.102620714

DONE!

Signature: 1074f525b750623839bbee49b51d5194

RMS of the difference of spectra: -286.583287535641dB
DF Metric (step=400ms, overlap=0%):
Median=-42.2dB
Max=-9.8dB Min=-49.7dB

1% > -49.23dB
10% > -47.82dB
25% > -45.69dB
50% > -42.19dB
75% > -39.13dB
90% > -34.15dB
99% > -14.6dB

Linearity 23.3bits @ 0.5dB error

DeltaWave v2.0.8, 2023-04-06T09:45:01.1545896-07:00
Reference: 11 - _Bonus Track_ Ikuko Kawai_ Excerpt from Pictures at an Exhibition.dsf[R] 49188864 samples 352800Hz 24bits, mono, MD5=00
Comparison: 11. Track11.wav[R] 6149304 samples 44100Hz 16bits, stereo, MD5=00
Settings:
Gain:True, Remove DC:True
Non-linear Gain EQ:False Non-linear Phase EQ: False
EQ FFT Size:65536, EQ Frequency Cut: 0Hz - 0Hz, EQ Threshold: -500dB
Correct Non-linearity: False
Correct Drift:True, Precision:30, Subsample Align:True
Non-Linear drift Correction:False
Upsample:True, Window:Kaiser
Spectrum Window:Kaiser, Spectrum Size:32768
Spectrogram Window:Hann, Spectrogram Size:4096, Spectrogram Steps:2048
Filter Type:FIR, window:Kaiser, taps:262144, minimum phase=False
Dither:False bits=0
Trim Silence:False
Enable Simple Waveform Measurement: False

Resampled Comparison to 352800Hz
Discarding Reference: Start=0s, End=150s
Discarding Comparison: Start=0s, End=150s

Initial peak values Reference: 0.008dB Comparison: -0.006dB
Initial RMS values Reference: -19.098dB Comparison: -19.096dB

Null Depth=13.05dB
X-Correlation offset: 1551 samples
Drift computation quality, #1: Excellent (0.09μs)

Trimmed 0 samples ( 0.00ms) front, 0 samples ( 0.00ms end)

Final peak values Reference: 0.008dB Comparison: -0.008dB
Final RMS values Reference: -19.098dB Comparison: -19.098dB

Gain= 0.0019dB (1.0002x) DC=0 Phase offset=4.396333ms (1551.026 samples)
Difference (rms) = -59.95dB [-83.98dBA]
Correlated Null Depth=61.68dB [77.53dBA]
Clock drift: 0 ppm

Files are NOT a bit-perfect match (match=0.62%) at 16 bits
Files are NOT a bit-perfect match (match=0%) at 24 bits
Files match @ 50.002% when reduced to 9.42 bits

---- Phase difference (full bandwidth): 0.455286117579634°
0-10kHz: 0.22°
0-20kHz: 0.36°
0-24kHz: 0.38°
Timing error (rms jitter): 431ns
PK Metric (step=400ms, overlap=50%):
RMS=-88.2dBr
Median=-88.2
Max=-82.8

99%: -84.42
75%: -87.11
50%: -88.2
25%: -89.81
1%: -95.9

gn=0.999783269216895, dc=-1.33039044914503E-06, dr=0, of=1551.0264145405

DONE!

Signature: 5bf9e4568681ac4a5d47df182c2df300

RMS of the difference of spectra: -288.232023821114dB
DF Metric (step=400ms, overlap=0%):
Median=-41.7dB
Max=-11.6dB Min=-50dB

1% > -49.3dB
10% > -48.12dB
25% > -44.75dB
50% > -41.71dB
75% > -37.61dB
90% > -33.28dB
99% > -16.3dB

Linearity 23.2bits @ 0.5dB error

Accuracy of the match:

Delta Waveform (left channel)

You definitely see "spikes" in regions of the music. These aren't drum hits but the stringed instruments. The same is seen on the right channel delta waveform.

The spectrogram does show some changes dipping down as low as 700 Hz, but the changes are very subtle. The biggest changes are above 6 kHz. The biggest transient is also associated with changes in the high frequencies. (Note that the X-axis is slightly shifted when comparing the above and below images.)

The "global" spectra look similar, which makes sense, since it's only the transients that might show differences.

The PK Metric is very low, though (suggesting no detectable difference).

LEFT / RIGHT

Discussion
There is no way to test whether DSD 11.2 MHz is required for recording, as the recording engineer claims, but it does seem that the DSD layer offers something slightly different. We have slightly higher dynamic range by the measurements, and the waveforms show the biggest differences when the stringed instruments are really being used.

Based upon this test, though, delivery of the finished piece in both CD and SACD format is excellent. The PK Metric suggests that, on an RMS basis, you're going to find both layers similar. In the null waveform, the transients do seem to stand ~12 dB above the noise of the difference signal, but it's pretty subtle. The question, though, is whether that noise is all ultrasonic while the transient signal is actually audible.
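
To make the idea of a null (difference) level concrete, here is a minimal sketch on synthetic data. It is not DeltaWave's algorithm, which also aligns, drift-corrects, and gain-matches the files first; the 1 kHz tone, its -19 dBFS level, and the size of the added error are all made-up values chosen only to land near the figures in the logs.

```python
import math
import random

def rms_db(samples):
    """RMS level in dB relative to full scale (1.0)."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10 * math.log10(mean_square)

random.seed(0)
fs = 44_100
# Hypothetical reference: a 1 kHz tone at -19 dBFS RMS, one second long.
amp = 10 ** (-19 / 20) * math.sqrt(2)
reference = [amp * math.sin(2 * math.pi * 1000 * n / fs) for n in range(fs)]
# Hypothetical comparison: the same tone plus a small uncorrelated error.
comparison = [s + random.gauss(0, 1e-3) for s in reference]

# The "null" is simply the sample-by-sample difference of the aligned signals.
difference = [c - r for c, r in zip(comparison, reference)]
print(f"reference RMS : {rms_db(reference):.1f} dBFS")
print(f"difference RMS: {rms_db(difference):.1f} dBFS")
```

With an error of about 0.1% of full scale, the difference sits near -60 dBFS, which is the same order as the "Difference (rms)" lines in the logs. A broadband figure like this says nothing about where in the spectrum the difference lies, which is why the weighted values and the spectrogram matter.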

ABX Testing: Yes, it's definitely audible. It wasn't the drums but the stringed instruments.
Sennheiser HD820 and Sony TA-ZH1ES at -31.5 dB volume. Estimated in-ear volume for the peaks is 65-70 dB, based upon putting a microphone to the open headphone driver. I was easily able to detect the higher noise on the CD layer as opposed to the DSD layer at the beginning of the tracks. This was absent once the music started.

Looking at the volume matching of the DSD vs CD waveforms, you can see that they're pretty reasonable:

RIGHT channel
0.008 vs -0.006 dB peak
-19.098 vs -19.096 dB rms

LEFT channel
-0.002 vs -0.040 dB peak
-18.768dB vs -18.767dB rms
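
As a rough illustration of how small these mismatches are, the sketch below takes the four pairs of values above (DSD first, CD second) and expresses each gap in dB and as a percentage of amplitude. The helper name `mismatch` is just for this example.

```python
# Peak and RMS levels in dB, as (DSD layer, CD layer), copied from the lists above.
levels_db = {
    "RIGHT": {"peak": (0.008, -0.006), "rms": (-19.098, -19.096)},
    "LEFT": {"peak": (-0.002, -0.040), "rms": (-18.768, -18.767)},
}

def mismatch(dsd_db, cd_db):
    """Return the level gap in dB and the equivalent linear amplitude ratio."""
    delta_db = abs(dsd_db - cd_db)
    ratio = 10 ** (delta_db / 20)
    return delta_db, ratio

for channel, values in levels_db.items():
    for kind, (dsd_db, cd_db) in values.items():
        delta_db, ratio = mismatch(dsd_db, cd_db)
        print(f"{channel:5s} {kind:4s}: {delta_db:.3f} dB "
              f"({(ratio - 1) * 100:.2f}% amplitude)")
```

The largest gap is the left-channel peak, at under 0.04 dB; the RMS levels agree to within a few thousandths of a dB.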

I did my ABX testing on the region between 1:27 and 1:40, since that's where we saw the biggest waveform differences. Here the difference was that the CD layer actually had more HF energy and felt like it was "tickling" the ear more with the plucks of the stringed instrument. It was an ASMR-like effect. The SACD didn't generate that result. I didn't appreciate any difference in the bass between the two.

This makes sense numerically from the global spectra, which show higher HF energy above 17 kHz on the CD track, although I cannot hear 17 kHz test tones that well. Again, note that I'm listening at -31.5 dB on the TA-ZH1ES.

foo_abx 2.1 report
foobar2000 v1.6.12
2023-04-06 10:33:15

File A: 11 - _Bonus Track_ Ikuko Kawai_ Excerpt from Pictures at an Exhibition.dsf
SHA1: 8b0235a944d775a2eba760a8e090d62ab2e20917
File B: 11. Track11.wav
SHA1: 06d1a7a29a3b284378440143d76e0e74c0c2d0c1

Output:
ASIO : Sony Headphone Amplifier Driver
Crossfading: NO

10:33:15 : Test started.
10:34:21 : 01/01
10:36:29 : 02/02
10:36:41 : 03/03
10:36:50 : 04/04
10:36:58 : 05/05
10:37:07 : 06/06
10:37:24 : 07/07
10:37:32 : 08/08
10:37:41 : 09/09
10:37:47 : 10/10
10:37:54 : 11/11
10:38:06 : 12/12
10:38:13 : 13/13
10:38:28 : 14/14
10:38:35 : 15/15
10:38:42 : 16/16
10:38:42 : Test finished.

----------
Total: 16/16
p-value: 0 (0%)

-- signature --
1dd3718348e0b371bb665fa84488db04713b6695
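
The "p-value: 0 (0%)" line is presumably a rounded figure. A short sketch of the underlying binomial calculation, assuming each trial is an independent 50/50 guess under the null hypothesis (the function name is only for this example):

```python
from math import comb

def abx_p_value(correct: int, trials: int) -> float:
    """One-sided probability of scoring at least `correct` by pure guessing."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

p = abx_p_value(16, 16)
print(f"{p:.7f}")  # 0.0000153, i.e. 1 in 65536
```

So a 16/16 score is very unlikely to be luck, though it only shows that the two files were distinguishable on this playback chain, not why.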

Final thoughts
Musically, this is an interesting album. It's a blend between traditional Japanese and Western instruments with uniquely Japanese interpretations of even classics like Holst's Jupiter. It's not a frenetic or dynamic album -- it's more pensive despite the choice of typically energetic compositions.

While we cannot "prove" that there is a need for DSD 11.2 MHz, there was a difference between the DSD and CD layers even though they were very well matched in RMS and peak level at baseline. I was able to ABX the two layers, but it's unclear how much of this is down to mastering choices (though the waveforms look virtually identical). Nor can I explain what the "tickling the ear drum/ASMR effect" actually means when I can't hear above 17 kHz on a sine wave test tone.

I can imagine some listeners would prefer the CD layer for the "tickling the ear drum/ASMR" effect, but I personally preferred the SACD layer based upon the musical content and its slower pace. I did my ABX testing with the Sennheiser HD820 and Sony TA-ZH1ES. There could be an interaction between the recording and the hardware that generates IMD.

@pkane I wonder why the PK Metric is so low, yet I could detect a difference. Happy to send you a copy of the digital files for your evaluation.
@amirm For your consideration for a frontpage pin.

Elitzur–Vaidman · Apr 6, 2023

I came in expecting some nonsensical audiophile hypothesis, and I couldn't be more pleased to be wrong.

pkane · Apr 6, 2023

GXAlan said:
@pkane I wonder why the PK Metric is so low, yet I could detect a difference. Happy to send you a copy of the digital files for your evaluation.
@amirm For your consideration for a frontpage pin.

Here's one possible reason why you might detect differences: intersample overs and their handling using the two types of recording. Your waveforms reach nearly 0dBFS in places, although intersample overs can occur with as low as -3dBFS or even lower. While the individual samples in the recording may be below 0dBFS at the recorded rate, their interpolated, oversampled values during playback may exceed 0dBFS resulting in clipping or some sort of limiter kicking in.
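
A textbook worst case illustrates the effect; it is not taken from the album. The sketch below builds a tone at a quarter of the sample rate with a 45 degree phase offset, normalises the samples to exactly 0 dBFS, and then estimates the reconstructed peak with a Hann-windowed sinc interpolator. The interpolator is a simple stand-in for illustration, not what any particular DAC or player does.

```python
import math

def sinc(x: float) -> float:
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def interpolate(samples, t: float, half_width: int = 32) -> float:
    """Hann-windowed sinc interpolation of `samples` at fractional index t."""
    centre = math.floor(t)
    total = 0.0
    for n in range(centre - half_width + 1, centre + half_width + 1):
        if 0 <= n < len(samples):
            d = t - n
            window = 0.5 * (1 + math.cos(math.pi * d / half_width))
            total += samples[n] * sinc(d) * window
    return total

# Tone at fs/4 with a 45 degree phase offset: every sample lands at +/-0.7071
# of the true amplitude, so the samples never show the real peak.
raw = [math.sin(2 * math.pi * n / 4 + math.pi / 4) for n in range(256)]
scale = 1.0 / max(abs(s) for s in raw)      # normalise sample peak to 0 dBFS
samples = [s * scale for s in raw]

sample_peak = max(abs(s) for s in samples)
# Reconstruct at 8x the original rate, away from the edges of the buffer.
true_peak = max(abs(interpolate(samples, k / 8)) for k in range(8 * 64, 8 * 192))

print(f"sample peak: {20 * math.log10(sample_peak):+.2f} dBFS")
print(f"true peak  : {20 * math.log10(true_peak):+.2f} dBFS")
```

The samples never exceed 0 dBFS, yet the reconstructed waveform peaks about 3 dB higher. Whether that clips depends on how much headroom the playback chain leaves for oversampling, which is why the same material could behave differently on the PCM and DSD paths.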

GXAlan · Apr 6, 2023

pkane said:
Here's one possible reason why you might detect differences: intersample overs and their handling using the two types of recording. Your waveforms reach nearly 0dBFS in places, although intersample overs can occur with as low as -3dBFS or even lower. While the individual samples in the recording may be below 0dBFS at the recorded rate, their interpolated, oversampled values during playback may exceed 0dBFS resulting in clipping or some sort of limiter kicking in.

That's a great thought! The region I was looking at does cross the -3 dBFS point that you mention. The peaks close to 0 dBFS come much later, toward the end.

I'll send you a PM in case you're curious enough to look at the source files.

kemmler3D · Apr 6, 2023

What jumped out at me was the mention of a -20 dB pad. There's nothing about PCM that can't handle even a 0.2 Hz waveform, if your mic can capture it.

However, depending on the recorder, maybe noise performance in low frequencies is better with DSD than PCM? (Maybe if they had a 16-bit PCM recorder, I dunno, doesn't make a ton of sense either way)...

Interesting to see that the layers were this different. It's more than you'd guess, but I am likewise not sure where the differences actually come from.

GXAlan · Apr 6, 2023

kemmler3D said:
Interesting to see that the layers were this different. It's more than you'd guess, but I am likewise not sure where the differences actually come from.

I am admittedly testing with headphones, not full-range speakers, so it’s not the right environment for bass, but at least on the bass, I really don’t see anything in the numbers. I think the drum claim is audiophile hand-waving, since a -20 dB pad also means that you don’t have to worry about those peaks in the same way.

It is interesting how the high frequencies changed. My understanding is that there are YouTube videos that generate ASMR effects, so it shouldn’t be the high frequencies alone.

That’s sort of a new area for science: what audio characteristics can trigger or not trigger ASMR. It could simply be a treble boost at 17 kHz.

Also makes me wonder if things like the B&W D4 leverage that effect since they are famously non-flat.

krabapple · Apr 6, 2023

Without full insight into their signal chain(s) this is really an exercise in speculation. A priori there is no reason they 'should' sound different if all that was done was to properly convert a well-recorded DSD file to CD rate PCM.

Blumlein 88 · Apr 6, 2023

I don't do much with DSD. At one time many devices put out slightly different levels playing DSD vs PCM. I don't know if your gear does this or not. Did you check for that?

krabapple · Apr 6, 2023

Back in the day -- I have no idea what hardware SACD players do today -- there could be as much as 6 dB of level difference between the DSD layer and PCM layer of a hybrid disc (with the DSD being lower). The players 'compensated' for this by attenuating PCM layer playback. Otherwise, thanks to the psychoacoustic effect of level difference, customers might think the CD layer sounds better -- unacceptable!

In other cases, notoriously Dark Side of the Moon, dynamic range compression was clearly added to the PCM layer but not the DSD layer. (It always seemed to me that, ironically, this would work against the 'safeguard' put into place in players above.)

DVDdoug · Apr 6, 2023

It's unlikely that the format is the cause of any differences.

GXAlan said:
Sennheiser HD820 and Sony TA-ZH1ES at -31.5 dB volume. Estimated in-ear volume for the peaks are 65-70 dB based upon putting a microphone to the open headphone driver. I was easily able to detect the higher noise on the CD layer as opposed to the DSD layer in the beginning of the tracks. This was absent once the music started.

Quantization noise at 16-bits is around -93 or -96dB. And when there is no digital signal, the digital noise goes away completely (-infinity dB). (The noise can be higher with dither.) At 70dB SPL, that puts the quantization noise at around -30dB SPL. Way below the threshold of hearing and probably quieter than any place on earth!!!
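
The arithmetic behind those figures can be sketched as follows. It uses two common textbook conventions for 16-bit PCM (the full-scale-to-one-step ratio and the full-scale-sine SNR over undithered quantization noise) and assumes, for illustration only, that digital full scale corresponds to the 70 dB SPL peak quoted above.

```python
import math

def pcm_range_db(bits: int) -> float:
    """Ratio of full scale to one quantization step: 20*log10(2**bits)."""
    return 20 * math.log10(2 ** bits)

def sine_sqnr_db(bits: int) -> float:
    """Textbook SNR of a full-scale sine over undithered quantization noise."""
    return 6.02 * bits + 1.76

peak_spl_db = 70.0  # assumed SPL at digital full scale, from the quoted estimate
for label, range_db in (("step ratio", pcm_range_db(16)),
                        ("sine SQNR", sine_sqnr_db(16))):
    noise_spl = peak_spl_db - range_db
    print(f"{label}: {range_db:.1f} dB -> noise near {noise_spl:.1f} dB SPL")
```

Either convention puts the 16-bit noise floor well below 0 dB SPL at this listening level, in line with the rough -30 dB SPL estimate.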

Audible noise is coming from somewhere else. Usually it's something in the analog electronics. If it's the same player, I don't have any idea why the CD would be worse.

DSD noise is "more complicated" and mostly in the ultrasonic range and overall it should be lower than 16-bit PCM.

Sokel · Apr 6, 2023

DVDdoug said:
It's unlikely that the format is the cause of any differences.

Quantization noise at 16-bits is around -93 or -96dB. And when there is no digital signal, the digital noise goes away completely (-infinity dB). (The noise can be higher with dither.) At 70dB SPL, that puts the quantization noise at around -30dB SPL. Way below the threshold of hearing and probably quieter than any place on earth!!!

Audible noise is coming from somewhere else. Usually it's something in the analog electronics. If it's the same player, I don't have any idea why the CD would be worse.

DSD noise is "more complicated" and mostly in the ultrasonic range and overall it should be lower than 16-bit PCM.

Yep, equipment behaves differently. At least that's what I see with something as simple as the Khadas Tone, all else being equal:

PCM vs DSD

Did a little comparison and I was wondering if any friend with better gear here has tested this. Results you're about to see is probably narrowed to the specific DAC but it's interesting to see how it handles things differently. I will not try to decipher the results other that DSD seems much...

www.audiosciencereview.com

Cleaner spectrum-higher noise for DSD (distortion is about the same but much better progression for DSD and no grass)

GXAlan · Apr 6, 2023

DVDdoug said:
Audible noise is coming from somewhere else. Usually it's something in the analog electronics. If it's the same player, I don't have any idea why the CD would be worse.

DSD noise is "more complicated" and mostly in the ultrasonic range and overall it should be lower than 16-bit PCM.

I agree. It must be in the mastering. It might not be noise but some Japanese instrument that sounds like noise.

I also got 16/16 correct in ABX, but it seemed unfair to use that alone to point to the difference.

Listening to the portion of the music I described above, it may very well be intersample overs if that happens at -3 dBFS since it’s the third note that had the effect. But whatever it is, I can definitely imagine people preferring the ASMR effect from the CD.

Sokel · Apr 6, 2023

That's me:

(don't ask me to interpret, I only know it's different)

dc655321 · Apr 6, 2023

GXAlan said:
It might not be noise but some Japanese instrument that sounds like noise

Ah, the legendary Noyzifukka?!

I'll show myself out now...

GXAlan · Apr 6, 2023

@dualazmak
I have relied on the auto-translate feature to read the PhileWeb article on this. Is there anything else you can add as a native Japanese speaker? It was a lot pricier to import this SACD to the USA -- I imagine you can just walk into HMV Japan and pick one up.

dualazmak · Apr 7, 2023

Hello @GXAlan and friends,

Looks interesting.
I am now away from my home/office on business travel until April 18. On my return home, I will order the SACD at HMV Japan.

I have my "secret" method of extracting the intact DSD layer into DSF (DSD64, 2.8 MHz, 1-bit) files using a "specific version model" of the Sony PlayStation. Of course, I can also rip the CD layer very accurately using dBpoweramp CD Ripper.

Comparative subjective listening on my latest DSP multichannel multi-driver multi-amplifier audio system (please refer here and here), together with objective spectrum analysis and comparison using Adobe Audition and MusicScope, would be really interesting.

Keith_W · Apr 7, 2023

Interesting thread and conclusions. I have a question though - many professional studios use DSD for recording. Merging has an entire line of DSD capable processors. What is the advantage of a DSD workflow for professional studios? And if it's good for the pros, why isn't it good for us?

dc655321 · Apr 7, 2023

Keith_W said:
Interesting thread and conclusions. I have a question though - many professional studios use DSD for recording. Merging has an entire line of DSD capable processors. What is the advantage of a DSD workflow for professional studios? And if it's good for the pros, why isn't it good for us?

I’d need to see evidence to support the idea that “many” studios use DSD.

Otherwise, it’s a format that consumes a lot of bandwidth (unnecessarily) and is difficult to digitally manipulate. Kinda useless, imo.
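To put rough numbers on the bandwidth point, here is a small sketch comparing raw, uncompressed stereo bit rates. It assumes the usual convention that DSD rates are multiples of 44.1 kHz at one bit per sample (DSD64 at about 2.8 MHz, DSD256 at about 11.2 MHz), and the helper names are made up for illustration.

```python
CD_SAMPLE_RATE_HZ = 44_100

def pcm_bitrate(sample_rate_hz: int, bits: int, channels: int = 2) -> int:
    """Raw PCM bit rate in bits per second."""
    return sample_rate_hz * bits * channels

def dsd_bitrate(multiple: int, channels: int = 2) -> int:
    """Raw DSD bit rate in bits per second: 1 bit per sample at multiple x 44.1 kHz."""
    return CD_SAMPLE_RATE_HZ * multiple * channels

formats = {
    "CD 16/44.1": pcm_bitrate(44_100, 16),
    "PCM 24/96": pcm_bitrate(96_000, 24),
    "DXD 24/352.8": pcm_bitrate(352_800, 24),
    "DSD64 (2.8 MHz)": dsd_bitrate(64),
    "DSD256 (11.2 MHz)": dsd_bitrate(256),
}

cd = formats["CD 16/44.1"]
for name, rate in formats.items():
    print(f"{name:<20} {rate / 1e6:7.3f} Mbit/s  ({rate / cd:.1f}x CD)")
```

On these figures, stereo DSD at 11.2 MHz needs about 22.6 Mbit/s, sixteen times the CD rate, before any processing overhead is considered.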

voodooless · Apr 7, 2023

Keith_W said:
What is the advantage of a DSD workflow for professional studios?

There is none. You can’t process DSD. For any operation other than a simple cut you must convert to multibit first. So any software for mastering DSD will be a lot more resource intensive. Why bother?
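A toy sketch of why that is: a DSD stream carries one bit per sample, which can be read as +1 or -1. The example below uses made-up eight-sample streams (not real audio) and hypothetical helper names to show that a plain splice stays 1-bit, while summing two streams or applying a gain produces values a single bit cannot hold. The result then has to be carried as multibit data and re-modulated to get back to DSD.

```python
# Made-up 1-bit streams, each sample read as +1 or -1.
stream_a = [1, -1, 1, 1, -1, -1, 1, -1]
stream_b = [1, 1, -1, 1, -1, 1, 1, -1]

ONE_BIT_LEVELS = {-1, 1}

def is_one_bit(stream) -> bool:
    """True if every sample is still representable with a single bit."""
    return all(sample in ONE_BIT_LEVELS for sample in stream)

spliced = stream_a[:4] + stream_b[4:]               # simple cut/splice
mixed = [a + b for a, b in zip(stream_a, stream_b)]  # mix two tracks
attenuated = [0.5 * a for a in stream_a]             # apply a -6 dB gain

print("spliced   ", is_one_bit(spliced))
print("mixed     ", is_one_bit(mixed), sorted(set(mixed)))
print("attenuated", is_one_bit(attenuated), sorted(set(attenuated)))
```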

Keith_W · Apr 7, 2023

dc655321 said:
I’d need to see evidence to support the idea that “many” studios use DSD.

Otherwise, it’s a format that consumes a lot of bandwidth (unnecessarily) and is difficult to digitally manipulate. Kinda useless, imo.

If you are looking for an academic paper that shows how many of them use DSD, I am not aware of one. However, the fact that Merging are still in business selling a lot of Anubis, Hapi, and Horus interfaces and have not gone under suggests that there is market demand for DSD. Merging is known for two core technologies: their heavy promotion of DSD, and Ravenna.
