
DAC ABX Test Phase 1: Does a SOTA DAC sound the same as a budget DAC if proper controls are put in place? Spoiler: Probably yes. :)

Grooved
@dominikz

I made the mistake of posting this in @Echoes' other ABX thread, but it was you who used the Cosmos ADC, so here is what I wrote:

One thing I forgot to ask you earlier about your previous test: can you run DeltaWave to compare each of your previous recordings against the original file?
Unless I'm mistaken (or saw it in another thread), you compared the two recordings against each other, but not each one against the original file.
I ask because, with the Cosmos ADC, you certainly had no clock sync between your DAC and the ADC.

Try these settings (and also with "clock drift" enabled, to check whether it changes anything in this case):
DW basic settings.PNG


and
DW advanced settings.PNG
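To illustrate the clock-sync point: with no shared clock, the DAC/ADC offset grows slowly over the length of the capture. Below is a toy numpy sketch (not DeltaWave's actual algorithm; all names and numbers are invented for illustration) that estimates drift by comparing the best-alignment lag at the start and at the end of a recording:

```python
import numpy as np

def best_lag(a, b, max_lag=256):
    """Lag (in samples) at which b best aligns with a, via FFT cross-correlation."""
    n = 1 << (2 * len(a) - 1).bit_length()          # zero-pad to avoid wrap-around
    corr = np.fft.irfft(np.fft.rfft(a, n) * np.conj(np.fft.rfft(b, n)), n)
    lags = np.concatenate([np.arange(max_lag + 1), np.arange(-max_lag, 0)])
    vals = np.concatenate([corr[:max_lag + 1], corr[-max_lag:]])
    return int(lags[np.argmax(vals)])

def estimate_drift_ppm(ref, rec, window=1 << 16):
    """Compare the alignment lag at the start vs the end of the capture; with
    unsynchronized clocks the lag grows roughly linearly over the recording."""
    d_start = -best_lag(ref[:window], rec[:window])
    d_end = -best_lag(ref[-window:], rec[-window:])
    return (d_end - d_start) / (len(ref) - window) * 1e6   # parts per million

# Synthetic check: 10 s of noise "recorded" with ~100 ppm of clock-rate mismatch
fs = 44100
ref = np.random.default_rng(0).standard_normal(10 * fs)
idx = np.arange(len(ref))
rec = np.interp(idx * (1 - 100e-6), idx, ref)      # resample with 100 ppm drift
drift_ppm = estimate_drift_ppm(ref, rec)
print(f"estimated drift: {drift_ppm:.0f} ppm (simulated: 100 ppm)")
```

The estimate lands close to the simulated 100 ppm; on real captures DeltaWave does this far more carefully (sub-sample alignment, non-linear drift), which is why its "clock drift" setting matters here.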
 
OP
dominikz
Grooved said:
One thing I forgot to ask you sooner regarding your previous test: can you do a test with DeltaWave to compare each one of your previous recordings with the original file? [...]
Sure, the results are below.

In addition, you may be interested in the related comments (including clock drift and offset) by @KSTR in posts #59 and #60.
If you wish to do additional analysis, the files are linked in post #1 and the original source master file in post #76.

Note that the recordings were made with an older E1DA Cosmos ADC firmware, which inverted polarity relative to the original digital master file (mentioned at the end of post #76). There is also phase wrapping due to the ADC's linear-phase filter. However, the same effects apply to both recordings, so they aren't relevant in our ABX context.
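The "common chain cancels" argument can be illustrated with a toy numpy sketch. The ADC chain here is an invented stand-in (polarity inversion plus an arbitrary linear-phase FIR, not the actual Cosmos firmware behaviour): each simulated recording nulls poorly against the master, yet the two recordings null deeply against each other.

```python
import numpy as np

rng = np.random.default_rng(1)
master = rng.standard_normal(1 << 15)

# Hypothetical common ADC chain: polarity inversion plus a linear-phase FIR
# (invented for illustration, not the real Cosmos firmware)
fir = np.hanning(15)
fir /= fir.sum()

def record(x, noise_floor_db=-80.0):
    y = -np.convolve(x, fir, mode="same")           # inversion + filtering
    return y + rng.standard_normal(len(x)) * 10 ** (noise_floor_db / 20)

rec_a = record(master)   # stand-in for one DAC capture
rec_b = record(master)   # stand-in for the other DAC capture

def null_db(x, y):
    """RMS of the difference relative to the RMS of x; more negative = deeper null."""
    return 20 * np.log10(np.sqrt(np.mean((x - y) ** 2) / np.mean(x ** 2)))

print(f"recording vs master:    {null_db(master, rec_a):6.1f} dB")   # shallow
print(f"recording vs recording: {null_db(rec_a, rec_b):6.1f} dB")    # deep
```

This mirrors the logs above: the master-vs-recording nulls are shallow, while the recording-vs-recording null is much deeper, because the shared inversion and filtering drop out of the comparison.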

DeltaWave config A comparisons
First the comparisons done with the following DeltaWave config (based on the first one in your post, I'll call it config A):
1646569369291.png


Original 24bit digital master vs the same file played by Topping E50 and recorded by E1DA Cosmos ADC (DeltaWave config A)
DeltaWave v2.0.2, 2022-03-06T13:34:36.7503750+01:00
Reference: ABX - Original 24-bit master - Farewell to Arms.flac[?] 2254189 samples 44100Hz 24bits, stereo, MD5=00
Comparison: ABX - Topping E50 - Farewell to Arms.flac[?] 2254189 samples 44100Hz 24bits, stereo, MD5=00
Settings:
Gain:True, Remove DC:False
Non-linear Gain EQ:False Non-linear Phase EQ: False
EQ FFT Size:2097152, EQ Frequency Cut: 0Hz - 0Hz, EQ Threshold: -160dB
Correct Non-linearity: False
Correct Drift:False, Precision:30, Subsample Align:True
Non-Linear drift Correction:False
Upsample:False, Window:Hann
Spectrum Window:Kaiser, Spectrum Size:65536
Spectrogram Window:Hann, Spectrogram Size:4096, Spectrogram Steps:2048
Filter Type:FIR, window:Hann, taps:65536, minimum phase=False
Dither:False bits=0
Trim Silence:True
Enable Simple Waveform Measurement: False

Discarding Reference: Start=0s, End=0s
Discarding Comparison: Start=0s, End=0s

Initial peak values Reference: -1dB Comparison: -0,837dB
Initial RMS values Reference: -14,371dB Comparison: -14,385dB

Null Depth=5,725dB
Trimming 5125 samples at start and 5094 samples at the end that are below -90,31dB level

Phase inverted
X-Correlation offset: -14 samples
Trimming 1 samples at start and 0 samples at the end that are below -90,31dB level


Trimmed 109 samples ( 2,471655ms) front, 95604 samples ( 2167,891156ms end)


Final peak values Reference: -1dB Comparison: -0,898dB
Final RMS values Reference: -14,411dB Comparison: -14,411dB

Gain= -0,0138dB (0,9984x) DC=0 Phase offset=-0,276387ms (-12,189 samples)
Difference (rms) = -16,61dB [-17,21dBA]
Correlated Null Depth=30,26dB [29,56dBA]
Clock drift: 0 ppm


Files are NOT a bit-perfect match (match=0,02%) at 16 bits
Files are NOT a bit-perfect match (match=0%) at 24 bits
Files match @ 50,0706% when reduced to 3,13 bits


Timing error (rms jitter): 20,3μs
PK Metric (step=400ms, overlap=50%):
RMS=-63,6dBFS
Median=-70,4
Max=-40,1

99%: -59,23
75%: -64,13
50%: -70,4
25%: -77,75
1%: -85,55

gn=1,00159312225962, dc=0, dr=0, of=-12,188686869

DONE!

Signature: 44c38859d7c4758589d5d9f379ddc73a

RMS of the difference of spectra: -69,678709095787dB
1646570230235.png


Original 24bit digital master vs the same file played by FiiO D03K and recorded by E1DA Cosmos ADC (DeltaWave config A)
DeltaWave v2.0.2, 2022-03-06T13:37:21.8786374+01:00
Reference: ABX - Original 24-bit master - Farewell to Arms.flac[?] 2254189 samples 44100Hz 24bits, stereo, MD5=00
Comparison: ABX - FiiO D03K - Farewell to Arms.flac[?] 2254189 samples 44100Hz 24bits, stereo, MD5=00
Settings:
Gain:True, Remove DC:False
Non-linear Gain EQ:False Non-linear Phase EQ: False
EQ FFT Size:2097152, EQ Frequency Cut: 0Hz - 0Hz, EQ Threshold: -160dB
Correct Non-linearity: False
Correct Drift:False, Precision:30, Subsample Align:True
Non-Linear drift Correction:False
Upsample:False, Window:Hann
Spectrum Window:Kaiser, Spectrum Size:65536
Spectrogram Window:Hann, Spectrogram Size:4096, Spectrogram Steps:2048
Filter Type:FIR, window:Hann, taps:65536, minimum phase=False
Dither:False bits=0
Trim Silence:True
Enable Simple Waveform Measurement: False

Discarding Reference: Start=0s, End=0s
Discarding Comparison: Start=0s, End=0s

Initial peak values Reference: -1dB Comparison: -0,434dB
Initial RMS values Reference: -14,371dB Comparison: -14,429dB

Null Depth=5,241dB
Trimming 5125 samples at start and 5099 samples at the end that are below -90,31dB level

Phase inverted
X-Correlation offset: -21 samples
Trimming 5 samples at start and 0 samples at the end that are below -90,31dB level


Trimmed 737 samples ( 16,712018ms) front, 96896 samples ( 2197,188209ms end)


Final peak values Reference: -1dB Comparison: -0,302dB
Final RMS values Reference: -14,415dB Comparison: -14,414dB

Gain= -0,0584dB (0,9933x) DC=0 Phase offset=-0,570238ms (-25,147 samples)
Difference (rms) = -15,86dB [-16,93dBA]
Correlated Null Depth=28,44dB [20,76dBA]
Clock drift: 0 ppm


Files are NOT a bit-perfect match (match=0,01%) at 16 bits
Files are NOT a bit-perfect match (match=0%) at 24 bits
Files match @ 49,9996% when reduced to 2,88 bits


Timing error (rms jitter): 15,1μs
PK Metric (step=400ms, overlap=50%):
RMS=-55,7dBFS
Median=-57,2
Max=-43,8

99%: -48,55
75%: -54,73
50%: -57,17
25%: -60,35
1%: -71,72

gn=1,00674577227056, dc=0, dr=0, of=-25,147483648

DONE!

Signature: a01ee4b318e212869fed16b1a4768abf

RMS of the difference of spectra: -69,3105373059918dB
1646570299516.png


Comparison of recordings from Topping E50 vs FiiO D03K into the same E1DA Cosmos ADC (DeltaWave config A)
DeltaWave v2.0.2, 2022-03-06T13:38:34.4498781+01:00
Reference: ABX - Topping E50 - Farewell to Arms.flac[?] 2254189 samples 44100Hz 24bits, stereo, MD5=00
Comparison: ABX - FiiO D03K - Farewell to Arms.flac[?] 2254189 samples 44100Hz 24bits, stereo, MD5=00
Settings:
Gain:True, Remove DC:False
Non-linear Gain EQ:False Non-linear Phase EQ: False
EQ FFT Size:2097152, EQ Frequency Cut: 0Hz - 0Hz, EQ Threshold: -160dB
Correct Non-linearity: False
Correct Drift:False, Precision:30, Subsample Align:True
Non-Linear drift Correction:False
Upsample:False, Window:Hann
Spectrum Window:Kaiser, Spectrum Size:65536
Spectrogram Window:Hann, Spectrogram Size:4096, Spectrogram Steps:2048
Filter Type:FIR, window:Hann, taps:65536, minimum phase=False
Dither:False bits=0
Trim Silence:True
Enable Simple Waveform Measurement: False

Discarding Reference: Start=0s, End=0s
Discarding Comparison: Start=0s, End=0s

Initial peak values Reference: -0,837dB Comparison: -0,434dB
Initial RMS values Reference: -14,385dB Comparison: -14,429dB

Null Depth=37,742dB
Trimming 5124 samples at start and 5099 samples at the end that are below -90,31dB level

X-Correlation offset: -11 samples
Trimming 0 samples at start and 0 samples at the end that are below -90,31dB level


Trimmed 8768 samples ( 198,820862ms) front, 29648 samples ( 672,290249ms end)


Final peak values Reference: -0,837dB Comparison: -0,325dB
Final RMS values Reference: -14,347dB Comparison: -14,377dB

Gain= -0,0135dB (0,9984x) DC=0 Phase offset=-0,241108ms (-10,633 samples)
Difference (rms) = -35,89dB [-46,79dBA]
Correlated Null Depth=42,96dB [49,72dBA]
Clock drift: 0 ppm


Files are NOT a bit-perfect match (match=0,06%) at 16 bits
Files are NOT a bit-perfect match (match=0%) at 24 bits
Files match @ 50,0013% when reduced to 5,94 bits


Timing error (rms jitter): 2,8μs
PK Metric (step=400ms, overlap=50%):
RMS=-56,1dBFS
Median=-56,8
Max=-46,9

99%: -48,47
75%: -55,34
50%: -56,76
25%: -58,71
1%: -66,22

gn=1,00155491522598, dc=0, dr=0, of=-10,632884702

DONE!

Signature: be567466eb523a80f0ef9462dab288b9

RMS of the difference of spectra: -87,2415878081697dB
1646570345723.png


DeltaWave config B comparisons
Now the comparisons done with the following DeltaWave config (based on the second one in your post, I'll call it config B):
1646569673151.png


Original 24bit digital master vs the same file played by Topping E50 and recorded by E1DA Cosmos ADC (DeltaWave config B)
Note
This configuration seems to cause clipping (I assume due to level/phase EQ matching), so results may not be relevant:
1646570546990.png

DeltaWave v2.0.2, 2022-03-06T13:39:55.3681467+01:00
Reference: ABX - Original 24-bit master - Farewell to Arms.flac[?] 2254189 samples 44100Hz 24bits, stereo, MD5=00
Comparison: ABX - Topping E50 - Farewell to Arms.flac[?] 2254189 samples 44100Hz 24bits, stereo, MD5=00
Settings:
Gain:True, Remove DC:False
Non-linear Gain EQ:True Non-linear Phase EQ: True
EQ FFT Size:2097152, EQ Frequency Cut: 0Hz - 0Hz, EQ Threshold: -160dB
Correct Non-linearity: False
Correct Drift:False, Precision:30, Subsample Align:True
Non-Linear drift Correction:True
Upsample:False, Window:Hann
Spectrum Window:Kaiser, Spectrum Size:65536
Spectrogram Window:Hann, Spectrogram Size:4096, Spectrogram Steps:2048
Filter Type:FIR, window:Hann, taps:65536, minimum phase=False
Dither:False bits=0
Trim Silence:True
Enable Simple Waveform Measurement: False

Discarding Reference: Start=0s, End=0s
Discarding Comparison: Start=0s, End=0s

Initial peak values Reference: -1dB Comparison: -0,837dB
Initial RMS values Reference: -14,371dB Comparison: -14,385dB

Null Depth=5,725dB
Trimming 5125 samples at start and 5094 samples at the end that are below -90,31dB level

Phase inverted
X-Correlation offset: -14 samples
Trimming 1 samples at start and 0 samples at the end that are below -90,31dB level


Trimmed 14609 samples ( 331,269841ms) front, 66198 samples ( 1501,088435ms end)


Final peak values Reference: -1dB Comparison: 1,452dB
Final RMS values Reference: -14,359dB Comparison: -13,423dB

Gain= 0,0895dB (1,0104x) DC=0 Phase offset=-0,276387ms (-12,189 samples)
Difference (rms) = -19,58dB [-19,38dBA]
Correlated Null Depth=24,87dB [20,15dBA]
Clock drift: 0 ppm


Files are NOT a bit-perfect match (match=0,01%) at 16 bits
Files are NOT a bit-perfect match (match=0%) at 24 bits
Files match @ 49,9997% when reduced to 3,02 bits


---- Variable Group Delay. Frequency matched from 0Hz to 22,1kHz:
1kHz = 11,9ms (4296,99°)
2kHz = 15,1ms (10861,06°)
4kHz = 15ms (21632,12°)
8kHz = 9,7ms (27840,13°)
16kHz = 1,7ms (9734,16°)
Timing error (rms jitter): 13,5μs
PK Metric (step=400ms, overlap=50%):
RMS=-11,6dBFS
Median=-12,3
Max=-5,9

99%: -6,18
75%: -10,73
50%: -12,35
25%: -17,34
1%: -24,58

gn=0,989743833360899, dc=0, dr=0, of=-12,188686869

DONE!

Signature: c9236bc116d5acb4c3c117fe8e6894e0

RMS of the difference of spectra: -72,6823297673763dB
1646570712909.png


Original 24bit digital master vs the same file played by FiiO D03K and recorded by E1DA Cosmos ADC (DeltaWave config B)
Note
This configuration seems to cause clipping (I assume due to level/phase EQ matching), so results may not be relevant:
1646570818420.png

DeltaWave v2.0.2, 2022-03-06T13:45:25.8775546+01:00
Reference: ABX - Original 24-bit master - Farewell to Arms.flac[?] 2254189 samples 44100Hz 24bits, stereo, MD5=00
Comparison: ABX - FiiO D03K - Farewell to Arms.flac[?] 2254189 samples 44100Hz 24bits, stereo, MD5=00
Settings:
Gain:True, Remove DC:False
Non-linear Gain EQ:True Non-linear Phase EQ: True
EQ FFT Size:2097152, EQ Frequency Cut: 0Hz - 0Hz, EQ Threshold: -160dB
Correct Non-linearity: False
Correct Drift:False, Precision:30, Subsample Align:True
Non-Linear drift Correction:True
Upsample:False, Window:Hann
Spectrum Window:Kaiser, Spectrum Size:65536
Spectrogram Window:Hann, Spectrogram Size:4096, Spectrogram Steps:2048
Filter Type:FIR, window:Hann, taps:65536, minimum phase=False
Dither:False bits=0
Trim Silence:True
Enable Simple Waveform Measurement: False

Discarding Reference: Start=0s, End=0s
Discarding Comparison: Start=0s, End=0s

Initial peak values Reference: -1dB Comparison: -0,434dB
Initial RMS values Reference: -14,371dB Comparison: -14,429dB

Null Depth=5,241dB
Trimming 5125 samples at start and 5099 samples at the end that are below -90,31dB level

Phase inverted
X-Correlation offset: -21 samples
Trimming 5 samples at start and 0 samples at the end that are below -90,31dB level


Trimmed 15488 samples ( 351,201814ms) front, 75388 samples ( 1709,478458ms end)


Final peak values Reference: -1dB Comparison: 1,367dB
Final RMS values Reference: -14,355dB Comparison: -13,71dB

Gain= 0,211dB (1,0246x) DC=0 Phase offset=-0,570238ms (-25,147 samples)
Difference (rms) = -20,99dB [-21,22dBA]
Correlated Null Depth=30,49dB [21,51dBA]
Clock drift: 0 ppm


Files are NOT a bit-perfect match (match=0,01%) at 16 bits
Files are NOT a bit-perfect match (match=0%) at 24 bits
Files match @ 50,0084% when reduced to 3,23 bits


---- Variable Group Delay. Frequency matched from 0Hz to 22,1kHz:
1kHz = 6,3ms (2262,77°)
2kHz = 5,7ms (4088,90°)
4kHz = 14,8ms (21262,08°)
8kHz = 11,3ms (32454,41°)
16kHz = 1,8ms (10476,69°)
Timing error (rms jitter): 26,7μs
PK Metric (step=400ms, overlap=50%):
RMS=-18,4dBFS
Median=-20,3
Max=-12,2

99%: -12,78
75%: -16,1
50%: -20,33
25%: -24,32
1%: -29,66

gn=0,9759995710513, dc=0, dr=0, of=-25,147483648

DONE!

Signature: 9e4553a34435094b10d7402523af3a7e

RMS of the difference of spectra: -72,9269593417217dB
1646570857544.png


Comparison of recordings from Topping E50 vs FiiO D03K into the same E1DA Cosmos ADC (DeltaWave config B)
DeltaWave v2.0.2, 2022-03-06T13:47:49.2522559+01:00
Reference: ABX - Topping E50 - Farewell to Arms.flac[?] 2254189 samples 44100Hz 24bits, stereo, MD5=00
Comparison: ABX - FiiO D03K - Farewell to Arms.flac[?] 2254189 samples 44100Hz 24bits, stereo, MD5=00
Settings:
Gain:True, Remove DC:False
Non-linear Gain EQ:True Non-linear Phase EQ: True
EQ FFT Size:2097152, EQ Frequency Cut: 0Hz - 0Hz, EQ Threshold: -160dB
Correct Non-linearity: False
Correct Drift:False, Precision:30, Subsample Align:True
Non-Linear drift Correction:True
Upsample:False, Window:Hann
Spectrum Window:Kaiser, Spectrum Size:65536
Spectrogram Window:Hann, Spectrogram Size:4096, Spectrogram Steps:2048
Filter Type:FIR, window:Hann, taps:65536, minimum phase=False
Dither:False bits=0
Trim Silence:True
Enable Simple Waveform Measurement: False

Discarding Reference: Start=0s, End=0s
Discarding Comparison: Start=0s, End=0s

Initial peak values Reference: -0,837dB Comparison: -0,434dB
Initial RMS values Reference: -14,385dB Comparison: -14,429dB

Null Depth=37,742dB
Trimming 5124 samples at start and 5099 samples at the end that are below -90,31dB level

X-Correlation offset: -11 samples
Trimming 0 samples at start and 0 samples at the end that are below -90,31dB level


Trimmed 21514 samples ( 487,845805ms) front, 49914 samples ( 1131,836735ms end)


Final peak values Reference: -0,837dB Comparison: -0,879dB
Final RMS values Reference: -14,34dB Comparison: -14,339dB

Gain= -0,0452dB (0,9948x) DC=0 Phase offset=-0,241108ms (-10,633 samples)
Difference (rms) = -49,6dB [-49,98dBA]
Correlated Null Depth=58,21dB [50,59dBA]
Clock drift: 0 ppm


Files are NOT a bit-perfect match (match=0,22%) at 16 bits
Files are NOT a bit-perfect match (match=0%) at 24 bits
Files match @ 50,0148% when reduced to 7,83 bits


---- Variable Group Delay. Frequency matched from 0Hz to 21,1kHz:
1kHz = 3,5μs (1,25°)
2kHz = 866,9ns (0,62°)
4kHz = 320,1ns (0,46°)
8kHz = 22,6ns (0,06°)
16kHz = 547,9ns (3,16°)
Timing error (rms jitter): 1,1μs
PK Metric (step=400ms, overlap=50%):
RMS=-56,4dBFS
Median=-59,6
Max=-46,5

99%: -48,18
75%: -55,64
50%: -59,57
25%: -62,2
1%: -68,35

gn=1,0052157954628, dc=0, dr=0, of=-10,632884702

DONE!

Signature: e26091f87744101b3731632de57c08b8

RMS of the difference of spectra: -96,796616364537dB
1646570973272.png
 
OP
dominikz
Time for another update of result overview so far:

These are the results of participants that took the online test via abxtests.com - we had a total of 93 completed attempts.
Note that here I'm saying 'attempts' instead of 'participants' - this is because a few participants reported they took the test more than once.
Correct   p-value P(X>=x)   Attempts with this score
   1         99,998%                 0
   2         99,974%                 0
   3         99,791%                 0
   4         98,936%                 3
   5         96,159%                 7
   6         89,494%                14
   7         77,275%                17
   8         59,819%                15
   9         40,181%                18
  10         22,725%                 8
  11         10,506%                 3
  12          3,841%                 3
  13          1,064%                 4
  14          0,209%                 0
  15          0,026%                 0
  16          0,002%                 1
Note: p-value P(X>=x) has been calculated with this online calculator (n=16, p=0.5, q=0.5, K=<number of correct trials>) and cross-checked here.
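For the curious, the tabulated p-values can be reproduced exactly with a few lines of Python (a plain binomial tail, no online calculator needed):

```python
from math import comb

def p_at_least(x, n=16, p=0.5):
    """One-sided p-value P(X >= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(x, n + 1))

for x in (4, 12, 16):
    print(f"{x:2d} correct: {p_at_least(x):.3%}")
```

This reproduces the 98,936%, 3,841% and 0,002% entries above (Python prints decimal points rather than commas).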

Prettier distribution graph:
1646597982465.png


As we see, out of the total 93 attempts, eight beat the lax <5% p-value criterion; of those eight, four were borderline for the stricter <1% criterion, and only one was well below it, scoring all 16 of 16 trials correct.
Note that the second 16/16 result is not included in the overview, as explained in post #141.

In addition to the above, two participants reported also taking the test in the foobar2000 ABX comparator: one got 40 of 64 trials correct, for a p-value of 2,997% (beating the <5% criterion but not the stricter <1% criterion); the other reported they couldn't hear a clear difference and gave up.

Here's a(nother) replay of closing words from my original overview post :p
In the end, I do hope this was an interesting exercise to those included. Hopefully one that also illustrates the importance of precise level matching and blind listening when doing comparisons of audio equipment.
 

Talisman
Using low-cost JBL earbuds connected directly to the tablet's headphone jack, I cannot tell the two files apart. I will try again with my main listening system.
 

AdamG247 (Moderator)
Utilizzando auricolari jbl a basso costo collegati direttamente all'uscita jack del tablet per me non è possibile riconoscere i due file, riproverò con il mio sistema di ascolto principale
Please use English.

Translation: “Using low-cost jbl earbuds connected directly to the tablet jack output for me it is not possible to recognize the two files, I will try again with my main listening system”
 

Talisman
Please use English. [...]
I apologize; I was convinced I had converted the text with Google Translate, but I got confused and posted the Italian version.
 

AdamG247 (Moderator)
I apologize, I was convinced that I converted the text with google translate, I got confused and posted the Italian text
It happens so don’t beat yourself up about it. ;)
 

Cote Dazur
So basically instead of getting an e50 dac I should just get this tiny box to feed the Topping PA5 into LS50 meta's.
Exactly: a DAC is a DAC; if neither has a defect, they will both sound equally good. This site is supposed to be objective, yet very few elected to take the test, preferring to hide behind pseudo-technicality.
Too many here have replaced pure subjectivity with measurement subjectivity, and I'm not sure which is worse. If a measurement is inaudible past a certain point, what are we trying to achieve by putting a scale on it? 120 dB SINAD! So what!
 
OP
dominikz
Time for another update of the result overview - I believe this will be my last one, as the results are by now fairly consistent.
Since last time we have had a lot of new test attempts (around 100 in the last week alone), so I suspect the link to this test was shared in another community, giving it a new lease on life :)

Anyway, these are the results of participants that took the online test via abxtests.com - we had a total of 225 completed attempts.
Note that here I'm saying 'attempts' instead of 'participants' - this is because a few participants reported they took the test more than once.
Correct   p-value P(X>=x)   Attempts with this score
   0        100,000%                 0
   1         99,998%                 0
   2         99,974%                 0
   3         99,791%                 1
   4         98,936%                 4
   5         96,159%                17
   6         89,494%                29
   7         77,275%                42
   8         59,819%                41
   9         40,181%                46
  10         22,725%                20
  11         10,506%                 9
  12          3,841%                 6
  13          1,064%                 6
  14          0,209%                 0
  15          0,026%                 1
  16          0,002%                 2
Note: p-value P(X>=x) has been calculated with this online calculator (n=16, p=0.5, q=0.5, K=<number of correct trials>) and cross-checked here.

Prettier distribution graph:
1652647526550.png


As we see, out of the total 225 attempts, 15 beat the lax <5% p-value criterion; of those 15, 6 were borderline for the stricter <1% criterion, and only 3 were well below it, scoring 15 or all 16 of 16 trials correct.
Note that the third 16/16 result is not included in the overview, as explained in post #141.

To put the above in percentages:
  • 6,7% of total test attempts beat the lax <5% p-value criterion
  • 4% of total test attempts are borderline for or better than the stricter 1% criterion
  • 1,3% of total test attempts clearly beat the stricter 1% criterion
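One way to sanity-check these percentages: under pure guessing, "beating the <5% criterion" means scoring 12 or more out of 16 (the lowest score whose p-value is under 5%), so we can estimate how many chance passes 225 attempts should produce. A short sketch:

```python
from math import comb

def p_at_least(x, n=16):
    """One-sided binomial tail P(X >= x) under pure guessing (p = 0.5)."""
    return sum(comb(n, k) for k in range(x, n + 1)) / 2 ** n

attempts = 225
p_pass = p_at_least(12)      # 12/16 is the lowest score with p-value < 5%
print(f"P(>= 12 correct by chance) = {p_pass:.3%}")
print(f"expected chance passes in {attempts} attempts: {attempts * p_pass:.1f}")
```

Around 8 to 9 passes are expected from luck alone, so the 15 observed passes are somewhat above, but not wildly beyond, what guessing predicts for the bulk of the distribution.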
In addition to the above, two participants reported also taking the test in the foobar2000 ABX comparator: one got 40 of 64 trials correct, for a p-value of 2,997% (beating the <5% criterion but not the stricter <1% criterion); the other reported they couldn't hear a clear difference and gave up.

Here's a(nother) replay of closing words from my original overview post :p
In the end, I do hope this was an interesting exercise to those included. Hopefully one that also illustrates the importance of precise level matching and blind listening when doing comparisons of audio equipment.
 

pkane
dominikz said:
Time for another update of result overview so far - I believe this will be my last one, as the results are by now relatively consistent. [...]

Thank you for doing this! You’ve got a lot more takers than most Internet blind tests, and the results are really interesting, though not very surprising.
 

SIY (Technical Expert)
dominikz said:
Time for another update of result overview so far - I believe this will be my last one, as the results are by now relatively consistent. [...]
A normal distribution. Huh. Who woulda guessed that would be the result? :cool:
 

usern
I could not hear a difference with the HD660S, D30 Pro, and JDS Atom amp. This web tool is very difficult to use for comparison, because you forget what the exact track position sounds like by the time it loops back around.

It would be interesting to repeatedly compare very short fragments of the song/demo track. If there is a very quiet part, you could perhaps turn up the software pre-gain so the noise floor becomes audible, and compare which file has more noise.
 
OP
dominikz
I could not hear difference with HD660S, D30pro and JDS Atom Amp. Very difficult to use this web tool comparison because you will forget what the exact track position sounds like when it loops back around. Would be interesting to repeatedly compare very short fragments of the song / demo track.
Thanks for taking the test!
The feature you describe has been discussed in this thread a few times before.
The foobar2000 ABX comparator plugin already implements this functionality; you can get the source files from the links in post #1 to use with it.
 

xaviescacs
Yeah.. a Gaussian centered at 8 correct, which is the sampling distribution you get with p=q=0.5. So one can say that people's results are essentially the same as random guessing, meaning people don't have a clue. :D

However, those three points on the right are interesting because, besides proving the difference is audible, they don't belong, statistically, to the main population; their p is not around 0.5 but closer to 1.

The question is: what makes this population different? Training? Knowledge of how to set up the test? Perhaps investigating that could be the next step in finding out what makes one pass the test.
 

SIY (Technical Expert)
xaviescacs said:
However, those three points are interesting because, besides proving the difference is audible, they don't belong, statistically, to the main population [...]
Or there’s just not enough points to make it perfectly smooth and symmetrical.

This seems to hit nearly every point of the Langmuir criteria.
 

xaviescacs
Langmuir criteria
Didn't know that one :D
Or there’s just not enough points to make it perfectly smooth and symmetrical.
224 points is a lot of points! The CLT kicks in at around 30. The probability that a normal distribution centered around 8 produces those three points is vanishingly small. Try simulating a Gaussian with 224 points, using the sample mean and SD, and call me when you get this histogram, with two scores of 16 and none at 14 :p

If we perform a statistical test, the p-value for those points having the same mean as the rest of the population is ~0.0005. So statistically we can say they come from a different population, unless we adopt a ridiculously low confidence level for the test.

So I stand by it: there are two populations here, one that knows how to differentiate the two sound samples and one that doesn't. :)
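The simulation suggested above is easy to run. A short sketch (using a binomial rather than a fitted Gaussian, and an arbitrary number of replications) asks how often 224 pure guessers would produce even one perfect 16/16 score:

```python
import numpy as np
from math import comb

rng = np.random.default_rng(2)

# 10,000 replications of "224 guessers, each doing 16 fair-coin trials"
scores = rng.binomial(16, 0.5, size=(10_000, 224))
frac_with_16 = (scores == 16).any(axis=1).mean()
print(f"replications with at least one 16/16: {frac_with_16:.2%}")

# Exact probability, for comparison
p16 = 1 / 2 ** 16                                  # P(16/16) for one guesser
p_any = 1 - (1 - p16) ** 224
print(f"exact P(at least one 16/16 in 224 attempts): {p_any:.2%}")
```

Only about 0.3% of replications contain even one perfect score; getting two, as observed, is rarer still, which supports the two-population reading.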
 

Dmitriy
I did the test again, and again the result is 11 correct. I think that if you rest between repetitions you can just about hear the difference; if you listen to the trials in a row, your hearing blurs. The difference can be heard in the chorus, but it is so slight that the result is at placebo level.
 

xaviescacs
If true, that should shift the mean. Does it?
Not sure what you mean...

There are a lot of points in the bell shape. The overall mean is 8.080357, the mean of the bell-shaped population, so to speak, is 7.977376, and the mean of the "informed" population is 15.66667. So the difference between 8.080357 and 7.977376, when running the t-test for whether the "informed" population belongs to the bell-shaped one, is negligible.
 
OP
dominikz
I did not mean to write another one of these, but test results keep coming in relentlessly. o_O More of the same, though! :)

Anyway, these are the results of participants that took the online test via abxtests.com - we had a total of 350 completed attempts.
Note that here I'm saying 'attempts' instead of 'participants' - this is because a few participants reported they took the test more than once.
Correct   p-value P(X>=x)   Attempts with this score
   0        100,000%                 0
   1         99,998%                 0
   2         99,974%                 0
   3         99,791%                 1
   4         98,936%                 9
   5         96,159%                23
   6         89,494%                45
   7         77,275%                65
   8         59,819%                65
   9         40,181%                71
  10         22,725%                33
  11         10,506%                18
  12          3,841%                 9
  13          1,064%                 6
  14          0,209%                 0
  15          0,026%                 1
  16          0,002%                 3

Note: p-value P(X>=x) has been calculated with this online calculator (n=16, p=0.5, q=0.5, K=<number of correct trials>) and cross-checked here.

Pretty distribution graph:
1665350386961.png

As we see, out of the total 350 attempts, 19 beat the lax <5% p-value criterion; of those 19, 10 were borderline for the stricter <1% criterion, and only 4 were well below it, scoring 15 or all 16 of 16 trials correct.
Note that one of the 16/16 results is not included in the overview, as explained in post #141.

To put the above in percentages:
  • 5,4% of total test attempts beat the lax <5% p-value criterion
  • 2,9% of total test attempts are borderline for or better than the stricter 1% criterion
  • 1,1% of total test attempts clearly beat the stricter 1% criterion
In addition to the above, two participants reported also taking the test in the foobar2000 ABX comparator: one got 40 of 64 trials correct, for a p-value of 2,997% (beating the <5% criterion but not the stricter <1% criterion); the other reported they couldn't hear a clear difference and gave up.

Here's a(nother) replay of closing words from my original overview post :p
In the end, I do hope this was an interesting exercise to those included. Hopefully one that also illustrates the importance of precise level matching and blind listening when doing comparisons of audio equipment.
 