
DAC ABX Test Phase 1: Does a SOTA DAC sound the same as a budget DAC if proper controls are put in place? Spoiler: Probably yes. :)

Grooved
@dominikz

I made the mistake of posting this in @Echoes' other ABX thread, but it was you who used the Cosmos ADC, so here is what I wrote:

One thing I forgot to ask you earlier about your previous test: can you run DeltaWave to compare each of your previous recordings against the original file?
Unless I'm mistaken (or saw it in another thread), you compared the two recordings against each other, but not each one against the original file.
I ask because, with the Cosmos ADC, you certainly had no clock sync between your DAC and the ADC.

Try these settings (and also with "clock drift" enabled, to check whether it changes anything in this case):
DW basic settings.PNG


and
DW advanced settings.PNG
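To illustrate the clock-sync point: with no shared clock, the DAC/ADC offset grows slowly over the length of the capture. Below is a toy numpy sketch (not DeltaWave's actual algorithm; all names and numbers are invented for illustration) that estimates drift by comparing the best-alignment lag at the start and at the end of a recording:

```python
import numpy as np

def best_lag(a, b, max_lag=256):
    """Lag (in samples) at which b best aligns with a, via FFT cross-correlation."""
    n = 1 << (2 * len(a) - 1).bit_length()          # zero-pad to avoid wrap-around
    corr = np.fft.irfft(np.fft.rfft(a, n) * np.conj(np.fft.rfft(b, n)), n)
    lags = np.concatenate([np.arange(max_lag + 1), np.arange(-max_lag, 0)])
    vals = np.concatenate([corr[:max_lag + 1], corr[-max_lag:]])
    return int(lags[np.argmax(vals)])

def estimate_drift_ppm(ref, rec, window=1 << 16):
    """Compare the alignment lag at the start vs the end of the capture; with
    unsynchronized clocks the lag grows roughly linearly over the recording."""
    d_start = -best_lag(ref[:window], rec[:window])
    d_end = -best_lag(ref[-window:], rec[-window:])
    return (d_end - d_start) / (len(ref) - window) * 1e6   # parts per million

# Synthetic check: 10 s of noise "recorded" with ~100 ppm of clock-rate mismatch
fs = 44100
ref = np.random.default_rng(0).standard_normal(10 * fs)
idx = np.arange(len(ref))
rec = np.interp(idx * (1 - 100e-6), idx, ref)      # resample with 100 ppm drift
drift_ppm = estimate_drift_ppm(ref, rec)
print(f"estimated drift: {drift_ppm:.0f} ppm (simulated: 100 ppm)")
```

The estimate lands close to the simulated 100 ppm; on real captures DeltaWave does this far more carefully (sub-sample alignment, non-linear drift), which is why its "clock drift" setting matters here.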
 
OP
dominikz
Grooved said:
One thing I forgot to ask you sooner regarding your previous test: can you do a test with DeltaWave to compare each one of your previous recordings with the original file? [...]
Sure, the results are below.

In addition, you may be interested in the related comments (including clock drift and offset) by @KSTR in posts #59 and #60.
If you wish to do additional analysis, the files are linked in post #1 and the original source master file in post #76.

Note that the recordings were made with an older E1DA Cosmos ADC firmware, which inverted polarity relative to the original digital master file (mentioned at the end of post #76). There is also phase wrapping due to the ADC's linear-phase filter. However, the same effects apply to both recordings, so they aren't relevant in our ABX context.
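The "common chain cancels" argument can be illustrated with a toy numpy sketch. The ADC chain here is an invented stand-in (polarity inversion plus an arbitrary linear-phase FIR, not the actual Cosmos firmware behaviour): each simulated recording nulls poorly against the master, yet the two recordings null deeply against each other.

```python
import numpy as np

rng = np.random.default_rng(1)
master = rng.standard_normal(1 << 15)

# Hypothetical common ADC chain: polarity inversion plus a linear-phase FIR
# (invented for illustration, not the real Cosmos firmware)
fir = np.hanning(15)
fir /= fir.sum()

def record(x, noise_floor_db=-80.0):
    y = -np.convolve(x, fir, mode="same")           # inversion + filtering
    return y + rng.standard_normal(len(x)) * 10 ** (noise_floor_db / 20)

rec_a = record(master)   # stand-in for one DAC capture
rec_b = record(master)   # stand-in for the other DAC capture

def null_db(x, y):
    """RMS of the difference relative to the RMS of x; more negative = deeper null."""
    return 20 * np.log10(np.sqrt(np.mean((x - y) ** 2) / np.mean(x ** 2)))

print(f"recording vs master:    {null_db(master, rec_a):6.1f} dB")   # shallow
print(f"recording vs recording: {null_db(rec_a, rec_b):6.1f} dB")    # deep
```

This mirrors the logs above: the master-vs-recording nulls are shallow, while the recording-vs-recording null is much deeper, because the shared inversion and filtering drop out of the comparison.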

DeltaWave config A comparisons
First the comparisons done with the following DeltaWave config (based on the first one in your post, I'll call it config A):
1646569369291.png


Original 24bit digital master vs the same file played by Topping E50 and recorded by E1DA Cosmos ADC (DeltaWave config A)
DeltaWave v2.0.2, 2022-03-06T13:34:36.7503750+01:00
Reference: ABX - Original 24-bit master - Farewell to Arms.flac[?] 2254189 samples 44100Hz 24bits, stereo, MD5=00
Comparison: ABX - Topping E50 - Farewell to Arms.flac[?] 2254189 samples 44100Hz 24bits, stereo, MD5=00
Settings:
Gain:True, Remove DC:False
Non-linear Gain EQ:False Non-linear Phase EQ: False
EQ FFT Size:2097152, EQ Frequency Cut: 0Hz - 0Hz, EQ Threshold: -160dB
Correct Non-linearity: False
Correct Drift:False, Precision:30, Subsample Align:True
Non-Linear drift Correction:False
Upsample:False, Window:Hann
Spectrum Window:Kaiser, Spectrum Size:65536
Spectrogram Window:Hann, Spectrogram Size:4096, Spectrogram Steps:2048
Filter Type:FIR, window:Hann, taps:65536, minimum phase=False
Dither:False bits=0
Trim Silence:True
Enable Simple Waveform Measurement: False

Discarding Reference: Start=0s, End=0s
Discarding Comparison: Start=0s, End=0s

Initial peak values Reference: -1dB Comparison: -0,837dB
Initial RMS values Reference: -14,371dB Comparison: -14,385dB

Null Depth=5,725dB
Trimming 5125 samples at start and 5094 samples at the end that are below -90,31dB level

Phase inverted
X-Correlation offset: -14 samples
Trimming 1 samples at start and 0 samples at the end that are below -90,31dB level


Trimmed 109 samples ( 2,471655ms) front, 95604 samples ( 2167,891156ms end)


Final peak values Reference: -1dB Comparison: -0,898dB
Final RMS values Reference: -14,411dB Comparison: -14,411dB

Gain= -0,0138dB (0,9984x) DC=0 Phase offset=-0,276387ms (-12,189 samples)
Difference (rms) = -16,61dB [-17,21dBA]
Correlated Null Depth=30,26dB [29,56dBA]
Clock drift: 0 ppm


Files are NOT a bit-perfect match (match=0,02%) at 16 bits
Files are NOT a bit-perfect match (match=0%) at 24 bits
Files match @ 50,0706% when reduced to 3,13 bits


Timing error (rms jitter): 20,3μs
PK Metric (step=400ms, overlap=50%):
RMS=-63,6dBFS
Median=-70,4
Max=-40,1

99%: -59,23
75%: -64,13
50%: -70,4
25%: -77,75
1%: -85,55

gn=1,00159312225962, dc=0, dr=0, of=-12,188686869

DONE!

Signature: 44c38859d7c4758589d5d9f379ddc73a

RMS of the difference of spectra: -69,678709095787dB
1646570230235.png


Original 24bit digital master vs the same file played by FiiO D03K and recorded by E1DA Cosmos ADC (DeltaWave config A)
DeltaWave v2.0.2, 2022-03-06T13:37:21.8786374+01:00
Reference: ABX - Original 24-bit master - Farewell to Arms.flac[?] 2254189 samples 44100Hz 24bits, stereo, MD5=00
Comparison: ABX - FiiO D03K - Farewell to Arms.flac[?] 2254189 samples 44100Hz 24bits, stereo, MD5=00
Settings:
Gain:True, Remove DC:False
Non-linear Gain EQ:False Non-linear Phase EQ: False
EQ FFT Size:2097152, EQ Frequency Cut: 0Hz - 0Hz, EQ Threshold: -160dB
Correct Non-linearity: False
Correct Drift:False, Precision:30, Subsample Align:True
Non-Linear drift Correction:False
Upsample:False, Window:Hann
Spectrum Window:Kaiser, Spectrum Size:65536
Spectrogram Window:Hann, Spectrogram Size:4096, Spectrogram Steps:2048
Filter Type:FIR, window:Hann, taps:65536, minimum phase=False
Dither:False bits=0
Trim Silence:True
Enable Simple Waveform Measurement: False

Discarding Reference: Start=0s, End=0s
Discarding Comparison: Start=0s, End=0s

Initial peak values Reference: -1dB Comparison: -0,434dB
Initial RMS values Reference: -14,371dB Comparison: -14,429dB

Null Depth=5,241dB
Trimming 5125 samples at start and 5099 samples at the end that are below -90,31dB level

Phase inverted
X-Correlation offset: -21 samples
Trimming 5 samples at start and 0 samples at the end that are below -90,31dB level


Trimmed 737 samples ( 16,712018ms) front, 96896 samples ( 2197,188209ms end)


Final peak values Reference: -1dB Comparison: -0,302dB
Final RMS values Reference: -14,415dB Comparison: -14,414dB

Gain= -0,0584dB (0,9933x) DC=0 Phase offset=-0,570238ms (-25,147 samples)
Difference (rms) = -15,86dB [-16,93dBA]
Correlated Null Depth=28,44dB [20,76dBA]
Clock drift: 0 ppm


Files are NOT a bit-perfect match (match=0,01%) at 16 bits
Files are NOT a bit-perfect match (match=0%) at 24 bits
Files match @ 49,9996% when reduced to 2,88 bits


Timing error (rms jitter): 15,1μs
PK Metric (step=400ms, overlap=50%):
RMS=-55,7dBFS
Median=-57,2
Max=-43,8

99%: -48,55
75%: -54,73
50%: -57,17
25%: -60,35
1%: -71,72

gn=1,00674577227056, dc=0, dr=0, of=-25,147483648

DONE!

Signature: a01ee4b318e212869fed16b1a4768abf

RMS of the difference of spectra: -69,3105373059918dB
1646570299516.png


Comparison of recordings from Topping E50 vs FiiO D03K into the same E1DA Cosmos ADC (DeltaWave config A)
DeltaWave v2.0.2, 2022-03-06T13:38:34.4498781+01:00
Reference: ABX - Topping E50 - Farewell to Arms.flac[?] 2254189 samples 44100Hz 24bits, stereo, MD5=00
Comparison: ABX - FiiO D03K - Farewell to Arms.flac[?] 2254189 samples 44100Hz 24bits, stereo, MD5=00
Settings:
Gain:True, Remove DC:False
Non-linear Gain EQ:False Non-linear Phase EQ: False
EQ FFT Size:2097152, EQ Frequency Cut: 0Hz - 0Hz, EQ Threshold: -160dB
Correct Non-linearity: False
Correct Drift:False, Precision:30, Subsample Align:True
Non-Linear drift Correction:False
Upsample:False, Window:Hann
Spectrum Window:Kaiser, Spectrum Size:65536
Spectrogram Window:Hann, Spectrogram Size:4096, Spectrogram Steps:2048
Filter Type:FIR, window:Hann, taps:65536, minimum phase=False
Dither:False bits=0
Trim Silence:True
Enable Simple Waveform Measurement: False

Discarding Reference: Start=0s, End=0s
Discarding Comparison: Start=0s, End=0s

Initial peak values Reference: -0,837dB Comparison: -0,434dB
Initial RMS values Reference: -14,385dB Comparison: -14,429dB

Null Depth=37,742dB
Trimming 5124 samples at start and 5099 samples at the end that are below -90,31dB level

X-Correlation offset: -11 samples
Trimming 0 samples at start and 0 samples at the end that are below -90,31dB level


Trimmed 8768 samples ( 198,820862ms) front, 29648 samples ( 672,290249ms end)


Final peak values Reference: -0,837dB Comparison: -0,325dB
Final RMS values Reference: -14,347dB Comparison: -14,377dB

Gain= -0,0135dB (0,9984x) DC=0 Phase offset=-0,241108ms (-10,633 samples)
Difference (rms) = -35,89dB [-46,79dBA]
Correlated Null Depth=42,96dB [49,72dBA]
Clock drift: 0 ppm


Files are NOT a bit-perfect match (match=0,06%) at 16 bits
Files are NOT a bit-perfect match (match=0%) at 24 bits
Files match @ 50,0013% when reduced to 5,94 bits


Timing error (rms jitter): 2,8μs
PK Metric (step=400ms, overlap=50%):
RMS=-56,1dBFS
Median=-56,8
Max=-46,9

99%: -48,47
75%: -55,34
50%: -56,76
25%: -58,71
1%: -66,22

gn=1,00155491522598, dc=0, dr=0, of=-10,632884702

DONE!

Signature: be567466eb523a80f0ef9462dab288b9

RMS of the difference of spectra: -87,2415878081697dB
1646570345723.png


DeltaWave config B comparisons
Now the comparisons done with the following DeltaWave config (based on the second one in your post, I'll call it config B):
1646569673151.png


Original 24bit digital master vs the same file played by Topping E50 and recorded by E1DA Cosmos ADC (DeltaWave config B)
Note
This configuration seems to cause clipping (I assume due to level/phase EQ matching), so results may not be relevant:
1646570546990.png

DeltaWave v2.0.2, 2022-03-06T13:39:55.3681467+01:00
Reference: ABX - Original 24-bit master - Farewell to Arms.flac[?] 2254189 samples 44100Hz 24bits, stereo, MD5=00
Comparison: ABX - Topping E50 - Farewell to Arms.flac[?] 2254189 samples 44100Hz 24bits, stereo, MD5=00
Settings:
Gain:True, Remove DC:False
Non-linear Gain EQ:True Non-linear Phase EQ: True
EQ FFT Size:2097152, EQ Frequency Cut: 0Hz - 0Hz, EQ Threshold: -160dB
Correct Non-linearity: False
Correct Drift:False, Precision:30, Subsample Align:True
Non-Linear drift Correction:True
Upsample:False, Window:Hann
Spectrum Window:Kaiser, Spectrum Size:65536
Spectrogram Window:Hann, Spectrogram Size:4096, Spectrogram Steps:2048
Filter Type:FIR, window:Hann, taps:65536, minimum phase=False
Dither:False bits=0
Trim Silence:True
Enable Simple Waveform Measurement: False

Discarding Reference: Start=0s, End=0s
Discarding Comparison: Start=0s, End=0s

Initial peak values Reference: -1dB Comparison: -0,837dB
Initial RMS values Reference: -14,371dB Comparison: -14,385dB

Null Depth=5,725dB
Trimming 5125 samples at start and 5094 samples at the end that are below -90,31dB level

Phase inverted
X-Correlation offset: -14 samples
Trimming 1 samples at start and 0 samples at the end that are below -90,31dB level


Trimmed 14609 samples ( 331,269841ms) front, 66198 samples ( 1501,088435ms end)


Final peak values Reference: -1dB Comparison: 1,452dB
Final RMS values Reference: -14,359dB Comparison: -13,423dB

Gain= 0,0895dB (1,0104x) DC=0 Phase offset=-0,276387ms (-12,189 samples)
Difference (rms) = -19,58dB [-19,38dBA]
Correlated Null Depth=24,87dB [20,15dBA]
Clock drift: 0 ppm


Files are NOT a bit-perfect match (match=0,01%) at 16 bits
Files are NOT a bit-perfect match (match=0%) at 24 bits
Files match @ 49,9997% when reduced to 3,02 bits


---- Variable Group Delay. Frequency matched from 0Hz to 22,1kHz:
1kHz = 11,9ms (4296,99°)
2kHz = 15,1ms (10861,06°)
4kHz = 15ms (21632,12°)
8kHz = 9,7ms (27840,13°)
16kHz = 1,7ms (9734,16°)
Timing error (rms jitter): 13,5μs
PK Metric (step=400ms, overlap=50%):
RMS=-11,6dBFS
Median=-12,3
Max=-5,9

99%: -6,18
75%: -10,73
50%: -12,35
25%: -17,34
1%: -24,58

gn=0,989743833360899, dc=0, dr=0, of=-12,188686869

DONE!

Signature: c9236bc116d5acb4c3c117fe8e6894e0

RMS of the difference of spectra: -72,6823297673763dB
1646570712909.png


Original 24bit digital master vs the same file played by FiiO D03K and recorded by E1DA Cosmos ADC (DeltaWave config B)
Note
This configuration seems to cause clipping (I assume due to level/phase EQ matching), so results may not be relevant:
1646570818420.png

DeltaWave v2.0.2, 2022-03-06T13:45:25.8775546+01:00
Reference: ABX - Original 24-bit master - Farewell to Arms.flac[?] 2254189 samples 44100Hz 24bits, stereo, MD5=00
Comparison: ABX - FiiO D03K - Farewell to Arms.flac[?] 2254189 samples 44100Hz 24bits, stereo, MD5=00
Settings:
Gain:True, Remove DC:False
Non-linear Gain EQ:True Non-linear Phase EQ: True
EQ FFT Size:2097152, EQ Frequency Cut: 0Hz - 0Hz, EQ Threshold: -160dB
Correct Non-linearity: False
Correct Drift:False, Precision:30, Subsample Align:True
Non-Linear drift Correction:True
Upsample:False, Window:Hann
Spectrum Window:Kaiser, Spectrum Size:65536
Spectrogram Window:Hann, Spectrogram Size:4096, Spectrogram Steps:2048
Filter Type:FIR, window:Hann, taps:65536, minimum phase=False
Dither:False bits=0
Trim Silence:True
Enable Simple Waveform Measurement: False

Discarding Reference: Start=0s, End=0s
Discarding Comparison: Start=0s, End=0s

Initial peak values Reference: -1dB Comparison: -0,434dB
Initial RMS values Reference: -14,371dB Comparison: -14,429dB

Null Depth=5,241dB
Trimming 5125 samples at start and 5099 samples at the end that are below -90,31dB level

Phase inverted
X-Correlation offset: -21 samples
Trimming 5 samples at start and 0 samples at the end that are below -90,31dB level


Trimmed 15488 samples ( 351,201814ms) front, 75388 samples ( 1709,478458ms end)


Final peak values Reference: -1dB Comparison: 1,367dB
Final RMS values Reference: -14,355dB Comparison: -13,71dB

Gain= 0,211dB (1,0246x) DC=0 Phase offset=-0,570238ms (-25,147 samples)
Difference (rms) = -20,99dB [-21,22dBA]
Correlated Null Depth=30,49dB [21,51dBA]
Clock drift: 0 ppm


Files are NOT a bit-perfect match (match=0,01%) at 16 bits
Files are NOT a bit-perfect match (match=0%) at 24 bits
Files match @ 50,0084% when reduced to 3,23 bits


---- Variable Group Delay. Frequency matched from 0Hz to 22,1kHz:
1kHz = 6,3ms (2262,77°)
2kHz = 5,7ms (4088,90°)
4kHz = 14,8ms (21262,08°)
8kHz = 11,3ms (32454,41°)
16kHz = 1,8ms (10476,69°)
Timing error (rms jitter): 26,7μs
PK Metric (step=400ms, overlap=50%):
RMS=-18,4dBFS
Median=-20,3
Max=-12,2

99%: -12,78
75%: -16,1
50%: -20,33
25%: -24,32
1%: -29,66

gn=0,9759995710513, dc=0, dr=0, of=-25,147483648

DONE!

Signature: 9e4553a34435094b10d7402523af3a7e

RMS of the difference of spectra: -72,9269593417217dB
1646570857544.png


Comparison of recordings from Topping E50 vs FiiO D03K into the same E1DA Cosmos ADC (DeltaWave config B)
DeltaWave v2.0.2, 2022-03-06T13:47:49.2522559+01:00
Reference: ABX - Topping E50 - Farewell to Arms.flac[?] 2254189 samples 44100Hz 24bits, stereo, MD5=00
Comparison: ABX - FiiO D03K - Farewell to Arms.flac[?] 2254189 samples 44100Hz 24bits, stereo, MD5=00
Settings:
Gain:True, Remove DC:False
Non-linear Gain EQ:True Non-linear Phase EQ: True
EQ FFT Size:2097152, EQ Frequency Cut: 0Hz - 0Hz, EQ Threshold: -160dB
Correct Non-linearity: False
Correct Drift:False, Precision:30, Subsample Align:True
Non-Linear drift Correction:True
Upsample:False, Window:Hann
Spectrum Window:Kaiser, Spectrum Size:65536
Spectrogram Window:Hann, Spectrogram Size:4096, Spectrogram Steps:2048
Filter Type:FIR, window:Hann, taps:65536, minimum phase=False
Dither:False bits=0
Trim Silence:True
Enable Simple Waveform Measurement: False

Discarding Reference: Start=0s, End=0s
Discarding Comparison: Start=0s, End=0s

Initial peak values Reference: -0,837dB Comparison: -0,434dB
Initial RMS values Reference: -14,385dB Comparison: -14,429dB

Null Depth=37,742dB
Trimming 5124 samples at start and 5099 samples at the end that are below -90,31dB level

X-Correlation offset: -11 samples
Trimming 0 samples at start and 0 samples at the end that are below -90,31dB level


Trimmed 21514 samples ( 487,845805ms) front, 49914 samples ( 1131,836735ms end)


Final peak values Reference: -0,837dB Comparison: -0,879dB
Final RMS values Reference: -14,34dB Comparison: -14,339dB

Gain= -0,0452dB (0,9948x) DC=0 Phase offset=-0,241108ms (-10,633 samples)
Difference (rms) = -49,6dB [-49,98dBA]
Correlated Null Depth=58,21dB [50,59dBA]
Clock drift: 0 ppm


Files are NOT a bit-perfect match (match=0,22%) at 16 bits
Files are NOT a bit-perfect match (match=0%) at 24 bits
Files match @ 50,0148% when reduced to 7,83 bits


---- Variable Group Delay. Frequency matched from 0Hz to 21,1kHz:
1kHz = 3,5μs (1,25°)
2kHz = 866,9ns (0,62°)
4kHz = 320,1ns (0,46°)
8kHz = 22,6ns (0,06°)
16kHz = 547,9ns (3,16°)
Timing error (rms jitter): 1,1μs
PK Metric (step=400ms, overlap=50%):
RMS=-56,4dBFS
Median=-59,6
Max=-46,5

99%: -48,18
75%: -55,64
50%: -59,57
25%: -62,2
1%: -68,35

gn=1,0052157954628, dc=0, dr=0, of=-10,632884702

DONE!

Signature: e26091f87744101b3731632de57c08b8

RMS of the difference of spectra: -96,796616364537dB
1646570973272.png
 
OP
dominikz
Time for another update of result overview so far:

These are the results of participants that took the online test via abxtests.com - we had a total of 93 completed attempts.
Note that here I'm saying 'attempts' instead of 'participants' - this is because a few participants reported they took the test more than once.
Correct   p-value P(X>=x)   Attempts with this score
   1         99,998%                 0
   2         99,974%                 0
   3         99,791%                 0
   4         98,936%                 3
   5         96,159%                 7
   6         89,494%                14
   7         77,275%                17
   8         59,819%                15
   9         40,181%                18
  10         22,725%                 8
  11         10,506%                 3
  12          3,841%                 3
  13          1,064%                 4
  14          0,209%                 0
  15          0,026%                 0
  16          0,002%                 1
Note: p-value P(X>=x) has been calculated with this online calculator (n=16, p=0.5, q=0.5, K=<number of correct trials>) and cross-checked here.
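For the curious, the tabulated p-values can be reproduced exactly with a few lines of Python (a plain binomial tail, no online calculator needed):

```python
from math import comb

def p_at_least(x, n=16, p=0.5):
    """One-sided p-value P(X >= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(x, n + 1))

for x in (4, 12, 16):
    print(f"{x:2d} correct: {p_at_least(x):.3%}")
```

This reproduces the 98,936%, 3,841% and 0,002% entries above (Python prints decimal points rather than commas).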

Prettier distribution graph:
1646597982465.png


As we see, out of the total 93 attempts, eight beat the lax <5% p-value criterion; of those eight, four were borderline for the stricter <1% criterion, and only one was well below it, scoring all 16 of 16 trials correct.
Note that the second 16/16 result is not included in the overview, as explained in post #141.

In addition to the above, two participants reported also taking the test in the foobar2000 ABX comparator: one got 40 of 64 trials correct, for a p-value of 2,997% (beating the <5% criterion but not the stricter <1% criterion); the other reported they couldn't hear a clear difference and gave up.

Here's a(nother) replay of closing words from my original overview post :p
In the end, I do hope this was an interesting exercise to those included. Hopefully one that also illustrates the importance of precise level matching and blind listening when doing comparisons of audio equipment.
 

Talisman
Using low-cost JBL earbuds connected directly to the tablet's headphone jack, I cannot tell the two files apart. I will try again with my main listening system.
 

AdamG247 (Moderator)
Utilizzando auricolari jbl a basso costo collegati direttamente all'uscita jack del tablet per me non è possibile riconoscere i due file, riproverò con il mio sistema di ascolto principale
Please use English.

Translation: “Using low-cost jbl earbuds connected directly to the tablet jack output for me it is not possible to recognize the two files, I will try again with my main listening system”
 

Talisman
Please use English. [...]
I apologize; I was convinced I had converted the text with Google Translate, but I got confused and posted the Italian version.
 

AdamG247 (Moderator)
I apologize, I was convinced that I converted the text with google translate, I got confused and posted the Italian text
It happens so don’t beat yourself up about it. ;)
 

Cote Dazur
So basically instead of getting an e50 dac I should just get this tiny box to feed the Topping PA5 into LS50 meta's.
Exactly: a DAC is a DAC; if neither has a defect, they will both sound equally good. This site is supposed to be objective, yet very few elected to take the test, preferring to hide behind pseudo-technicality.
Too many here have replaced pure subjectivity with measurement subjectivity, and I'm not sure which is worse. If a measurement is inaudible past a certain point, what are we trying to achieve by putting a scale on it? 120 dB SINAD! So what!
 
OP
dominikz
Time for another update of the result overview - I believe this will be my last one, as the results are by now fairly consistent.
Since last time we have had a lot of new test attempts (around 100 in the last week alone), so I suspect the link to this test was shared in another community, giving it a new lease on life :)

Anyway, these are the results of participants that took the online test via abxtests.com - we had a total of 225 completed attempts.
Note that here I'm saying 'attempts' instead of 'participants' - this is because a few participants reported they took the test more than once.
Correct   p-value P(X>=x)   Attempts with this score
   0        100,000%                 0
   1         99,998%                 0
   2         99,974%                 0
   3         99,791%                 1
   4         98,936%                 4
   5         96,159%                17
   6         89,494%                29
   7         77,275%                42
   8         59,819%                41
   9         40,181%                46
  10         22,725%                20
  11         10,506%                 9
  12          3,841%                 6
  13          1,064%                 6
  14          0,209%                 0
  15          0,026%                 1
  16          0,002%                 2
Note: p-value P(X>=x) has been calculated with this online calculator (n=16, p=0.5, q=0.5, K=<number of correct trials>) and cross-checked here.

Prettier distribution graph:
1652647526550.png


As we see, out of the total 225 attempts, 15 beat the lax <5% p-value criterion; of those 15, 6 were borderline for the stricter <1% criterion, and only 3 were well below it, scoring 15 or all 16 of 16 trials correct.
Note that the third 16/16 result is not included in the overview, as explained in post #141.

To put the above in percentages:
  • 6,7% of total test attempts beat the lax <5% p-value criterion
  • 4% of total test attempts are borderline for or better than the stricter 1% criterion
  • 1,3% of total test attempts clearly beat the stricter 1% criterion
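One way to sanity-check these percentages: under pure guessing, "beating the <5% criterion" means scoring 12 or more out of 16 (the lowest score whose p-value is under 5%), so we can estimate how many chance passes 225 attempts should produce. A short sketch:

```python
from math import comb

def p_at_least(x, n=16):
    """One-sided binomial tail P(X >= x) under pure guessing (p = 0.5)."""
    return sum(comb(n, k) for k in range(x, n + 1)) / 2 ** n

attempts = 225
p_pass = p_at_least(12)      # 12/16 is the lowest score with p-value < 5%
print(f"P(>= 12 correct by chance) = {p_pass:.3%}")
print(f"expected chance passes in {attempts} attempts: {attempts * p_pass:.1f}")
```

Around 8 to 9 passes are expected from luck alone, so the 15 observed passes are somewhat above, but not wildly beyond, what guessing predicts for the bulk of the distribution.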
In addition to the above, two participants reported also taking the test in the foobar2000 ABX comparator: one got 40 of 64 trials correct, for a p-value of 2,997% (beating the <5% criterion but not the stricter <1% criterion); the other reported they couldn't hear a clear difference and gave up.

Here's a(nother) replay of closing words from my original overview post :p
In the end, I do hope this was an interesting exercise to those included. Hopefully one that also illustrates the importance of precise level matching and blind listening when doing comparisons of audio equipment.
 

pkane
dominikz said:
Time for another update of result overview so far - I believe this will be my last one, as the results are by now relatively consistent. [...]

Thank you for doing this! You’ve got a lot more takers than most Internet blind tests, and the results are really interesting, though not very surprising.
 

SIY (Technical Expert)
dominikz said:
Time for another update of result overview so far - I believe this will be my last one, as the results are by now relatively consistent. [...]
A normal distribution. Huh. Who woulda guessed that would be the result? :cool:
 

usern
I could not hear a difference with the HD660S, D30 Pro, and JDS Atom amp. This web tool is very difficult to use for comparison, because you forget what the exact track position sounds like by the time it loops back around.

It would be interesting to repeatedly compare very short fragments of the song/demo track. If there is a very quiet part, you could perhaps turn up the software pre-gain so the noise floor becomes audible, and compare which file has more noise.
 
OP
dominikz
I could not hear difference with HD660S, D30pro and JDS Atom Amp. Very difficult to use this web tool comparison because you will forget what the exact track position sounds like when it loops back around. Would be interesting to repeatedly compare very short fragments of the song / demo track.
Thanks for taking the test!
The feature you describe has been discussed in this thread a few times before.
The foobar2000 ABX comparator plugin already implements this functionality; you can get the source files from the links in post #1 to use with it.
 

xaviescacs
Yeah.. a Gaussian centered at 8 correct, which is the sampling distribution you get with p=q=0.5. So one can say that people's results are essentially the same as random guessing, meaning people don't have a clue. :D

However, those three points on the right are interesting because, besides proving the difference is audible, they don't belong, statistically, to the main population; their p is not around 0.5 but closer to 1.

The question is: what makes this population different? Training? Knowledge of how to set up the test? Perhaps investigating that could be the next step in finding out what makes one pass the test.
 

SIY (Technical Expert)
xaviescacs said:
However, those three points are interesting because, besides proving the difference is audible, they don't belong, statistically, to the main population [...]
Or there’s just not enough points to make it perfectly smooth and symmetrical.

This seems to hit nearly every point of the Langmuir criteria.
 

xaviescacs
Langmuir criteria
Didn't know that one :D
Or there’s just not enough points to make it perfectly smooth and symmetrical.
224 points is a lot of points! The CLT kicks in at around 30. The probability that a normal distribution centered around 8 produces those three points is vanishingly small. Try simulating a Gaussian with 224 points, using the sample mean and SD, and call me when you get this histogram, with two scores of 16 and none at 14 :p

If we perform a statistical test, the p-value for those points having the same mean as the rest of the population is ~0.0005. So statistically we can say they come from a different population, unless we adopt a ridiculously low confidence level for the test.

So I stand by it: there are two populations here, one that knows how to differentiate the two sound samples and one that doesn't. :)
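The simulation suggested above is easy to run. A short sketch (using a binomial rather than a fitted Gaussian, and an arbitrary number of replications) asks how often 224 pure guessers would produce even one perfect 16/16 score:

```python
import numpy as np
from math import comb

rng = np.random.default_rng(2)

# 10,000 replications of "224 guessers, each doing 16 fair-coin trials"
scores = rng.binomial(16, 0.5, size=(10_000, 224))
frac_with_16 = (scores == 16).any(axis=1).mean()
print(f"replications with at least one 16/16: {frac_with_16:.2%}")

# Exact probability, for comparison
p16 = 1 / 2 ** 16                                  # P(16/16) for one guesser
p_any = 1 - (1 - p16) ** 224
print(f"exact P(at least one 16/16 in 224 attempts): {p_any:.2%}")
```

Only about 0.3% of replications contain even one perfect score; getting two, as observed, is rarer still, which supports the two-population reading.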
 

Dmitriy
I did the test again, and again the result is 11 correct. I think that if you rest between repetitions you can just about hear the difference; if you listen to the trials in a row, your hearing blurs. The difference can be heard in the chorus, but it is so slight that the result is at placebo level.
 

xaviescacs
If true, that should shift the mean. Does it?
Not sure what you mean...

There are a lot of points in the bell shape. The overall mean is 8.080357, the mean of the bell-shaped population, so to speak, is 7.977376, and the mean of the "informed" population is 15.66667. So the difference between 8.080357 and 7.977376, when running the t-test for whether the "informed" population belongs to the bell-shaped one, is negligible.
 
OP
dominikz
I did not mean to write another one of these, but test results keep coming in relentlessly. o_O More of the same, though! :)

Anyway, these are the results of participants that took the online test via abxtests.com - we had a total of 350 completed attempts.
Note that here I'm saying 'attempts' instead of 'participants' - this is because a few participants reported they took the test more than once.
Correct   p-value P(X>=x)   Attempts with this score
   0        100,000%                 0
   1         99,998%                 0
   2         99,974%                 0
   3         99,791%                 1
   4         98,936%                 9
   5         96,159%                23
   6         89,494%                45
   7         77,275%                65
   8         59,819%                65
   9         40,181%                71
  10         22,725%                33
  11         10,506%                18
  12          3,841%                 9
  13          1,064%                 6
  14          0,209%                 0
  15          0,026%                 1
  16          0,002%                 3

Note: p-value P(X>=x) has been calculated with this online calculator (n=16, p=0.5, q=0.5, K=<number of correct trials>) and cross-checked here.

Pretty distribution graph:
1665350386961.png

As we see, out of the total 350 attempts, 19 beat the lax <5% p-value criterion; of those 19, 10 were borderline for the stricter <1% criterion, and only 4 were well below it, scoring 15 or all 16 of 16 trials correct.
Note that one of the 16/16 results is not included in the overview, as explained in post #141.

To put the above in percentages:
  • 5,4% of total test attempts beat the lax <5% p-value criterion
  • 2,9% of total test attempts are borderline for or better than the stricter 1% criterion
  • 1,1% of total test attempts clearly beat the stricter 1% criterion
In addition to the above, two participants reported also taking the test in the foobar2000 ABX comparator: one got 40 of 64 trials correct, for a p-value of 2,997% (beating the <5% criterion but not the stricter <1% criterion); the other reported they couldn't hear a clear difference and gave up.

Here's a(nother) replay of closing words from my original overview post :p
In the end, I do hope this was an interesting exercise to those included. Hopefully one that also illustrates the importance of precise level matching and blind listening when doing comparisons of audio equipment.
 