
DAC ABX Test Phase 1: Does a SOTA DAC sound the same as a budget DAC if proper controls are put in place? Spoiler: Probably yes. :)

dominikz

This thread is meant to once and for all settle the age-old question on whether different non-broken DACs have a "sound" or if all DACs sound the same! :cool:
...just kidding - those debates will probably never end. :p

Let me start by providing the link to the ABX test.
Note: There are 16 trials in the test.

[EDIT 2024-09-17] Latest overview of participant test results can be found in post #196.

[EDIT 2022-01-17]
Note that since test files are recorded in 44,1kHz sample rate it is recommended to set your audio output device sample rate to 44,1kHz as well to avoid resampling.
[EDIT 2021-12-29] For those wanting to use foobar2000 (in WASAPI exclusive mode) with ABX comparator plugin (16 trials suggested), here are direct links to the audio test files:
1. Topping E50 ABX sample file
2. FiiO Taishan D03K ABX sample file
If you do a test please report your results (copy/paste of ABX comparator result output) by posting in this thread or via a private message. Thanks! :)

Details of the test are provided below.

Introduction

Anyway, since I recently got a Topping E50 DAC and E1DA Cosmos ADC (both SOTA converters) I thought it might be interesting to prepare a controlled blind test between a SOTA DAC and one that is a relatively low-performing unit by today's standards (but not so much that one would call it "broken"). The idea is to see if there are any audible differences in such an extreme comparison.
Many people might expect that this would be an easy test (given the price and spec difference), so let's start with that before trying anything else :)

Title states "Phase 1" because I plan to do a more difficult follow-up (E50 vs Babyface DAC) if results of this test show that several people can reliably tell the difference in this 'simple' test.

DACs under test

Meet the contenders:
[Image: the two DACs under test]


Test equipment, SW and test track

The test track is the original 44,1kHz/24bit digital master wav file of the song "Farewell to Arms" (link to full song, available on various streaming services).
The song was mainly selected because I have the distribution rights and master files for it (shameless self-promotion alert! :D).

The ADC used to record both DACs was the E1DA Cosmos ADC (full AP measurements available here) which, similarly to Topping E50, achieves measurement-equipment-grade conversion performance (SINAD ~120dB).

Head'n'HiFi Objective2 (O2) headphone amp in unity (1x) gain setting was used as an impedance buffer when recording the D03K - more on this below.

With both E50 and D03K the optical input was used to feed the DAC, and the source was the RME Babyface Silver Edition (1st gen) soundcard.

PreSonus Studio One 5 Professional was used to record the files. The project was configured to use the same 44,1kHz sample rate as the source track, and 32bit bit-depth to avoid any chance of loss of data.

ASIO4All driver was used so that the RME soundcard and E1DA Cosmos ADC USB devices can be used together as a single device for measurements and recording. 44,1kHz sample rate was used in all cases, to be consistent with the original recording and to avoid any resampling.

The online ABX test is constructed with the amazing abxtests.com web-based tool by @jaakkopasanen (see related thread). Thanks to @jaakkopasanen for building and providing this resource to the community!

Test file recording and preparation

In principle the concept is that of a simple loopback test - i.e. the output of both DACs was (separately) recorded by the same ADC, and resulting files level matched to better than 0,1dB accuracy.

Tone generator plugin (set to generate a -1dBFS 1kHz tone) was added to the track before the source file and used to calibrate the levels of both DACs when recording, and later to fine-tune the levels of the resulting recordings.

The recording chain was:

1) RME Babyface optical out -> Topping E50 optical in then balanced L/R out -> E1DA Cosmos ADC (set to 4,5V sensitivity) balanced L/R in

With this setup a -1dBFS sine output results in a -2,1dBFS input (1,1dB loss in level).

Here's a short snip of the performance in this setup (used to reference with independent measurements and to make sure that the recording setup is correct):
Topping E50 - left channel THD+N vs frequency (at -1 dBFS).jpg

Topping E50 - right channel THD+N vs frequency (at -1 dBFS).jpg

We see THD+N is very low (at -115dB vs fundamental at 1kHz) and pretty close to maximum performance. Around 2dB is lost due to selected measurement input level, and additional ~3dB should be lost due to both ADC and DAC having similar SNRs and the noise summing in loopback.
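The ~3dB figure is what you get when two uncorrelated residuals of similar power are summed. Here is a minimal Python sketch of that arithmetic, assuming both converters sit at roughly 120dB SINAD and that their noise and distortion add as uncorrelated power (a simplification; the function name is just illustrative):

```python
import math

def combined_sinad_db(*sinad_db):
    """Combine SINAD figures of cascaded stages, assuming their noise and
    distortion residuals are uncorrelated and therefore add as power."""
    residual_power = sum(10 ** (-s / 10) for s in sinad_db)
    return -10 * math.log10(residual_power)

# Assumption: DAC and ADC are both at about 120 dB SINAD at full level.
loopback = combined_sinad_db(120.0, 120.0)   # about 117 dB
# Roughly 2 dB is given up because the ADC input is not driven to full scale.
estimate = loopback - 2.0                    # about 115 dB
print(f"{loopback:.1f} dB, {estimate:.1f} dB")
```

Under these assumptions the estimate lands close to the ~115dB seen in the plots above.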

2) RME Babyface optical out -> FiiO Taishan D03K optical in, then unbalanced RCA L/R out -> Head'n'Hifi O2 in 1x/unity gain (impedance buffer) -> E1DA Cosmos ADC (set to 1,7V sensitivity) unbalanced L/R input

With this setup a -1dBFS sine output results in a -3dBFS input (2dB loss in level).

Note: The E1DA Cosmos ADC has no input buffer and at 1,7V sensitivity has a very low input impedance (around 450 Ohms in unbalanced mode). If connected directly to the D03K output, this low input impedance loads down the DAC's output, decreasing the level and increasing the distortion significantly. With the O2 as a buffer (1x gain with the volume pot at maximum results in unity gain between input and output), its high input impedance and low output impedance allow for optimum impedance matching between the two devices. Since the performance of the O2 is much better than that of the D03K, there is no loss in signal transfer quality due to this.
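To illustrate the level part of the loading effect, here is a rough Python sketch of the voltage divider formed by a source's output impedance and the load's input impedance. Only the 450 Ohm ADC input impedance comes from the note above; the D03K output impedance and both buffer impedances are hypothetical placeholders, and the sketch says nothing about the added distortion:

```python
import math

def loading_loss_db(source_z_ohm, load_z_ohm):
    """Level change (dB) from the voltage divider formed by a source's
    output impedance and the load's input impedance."""
    return 20 * math.log10(load_z_ohm / (source_z_ohm + load_z_ohm))

ADC_INPUT_Z = 450.0        # from the post: Cosmos ADC at 1,7V, unbalanced
DAC_OUTPUT_Z = 500.0       # HYPOTHETICAL: D03K output impedance is not given
BUFFER_INPUT_Z = 10_000.0  # HYPOTHETICAL buffer input impedance
BUFFER_OUTPUT_Z = 1.0      # HYPOTHETICAL buffer output impedance

# DAC driving the ADC directly.
direct = loading_loss_db(DAC_OUTPUT_Z, ADC_INPUT_Z)
# DAC driving the buffer, buffer driving the ADC.
buffered = (loading_loss_db(DAC_OUTPUT_Z, BUFFER_INPUT_Z)
            + loading_loss_db(BUFFER_OUTPUT_Z, ADC_INPUT_Z))
print(f"direct: {direct:.2f} dB, buffered: {buffered:.2f} dB")
```

With any plausible numbers the direct connection loses several dB while the buffered one loses only a fraction of a dB, which is the reason for putting a buffer in between.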

Here's a short snip of the performance in this setup (used to reference with independent measurements and to make sure that the recording setup is correct):
FiiO D03K - left channel THD+N vs frequency (at -1 dBFS).jpg

FiiO D03K - right channel THD+N vs frequency (at -1 dBFS).jpg

Since D03K is not a very high performing unit (much worse performing than the rest of the measurement chain), this is fully in line with previous measurements, as well as the manufacturer's specification.

From the above diagrams we see that the total loopback recording chain 1kHz SINAD is:
  • ~115 dB with Topping E50
  • ~90 dB with FiiO Taishan D03K

Lastly let's compare the measured performance of the two DACs on FR and THD:
Topping E50 vs FiiO D03K - frequency response -1 dBFS.jpg

E50 FR is perfectly flat, while D03K has a ~0,3dB loss at 20Hz, and a ~0,7dB peak at 17,5kHz.
Topping E50 vs FiiO D03K - THD at -1 dBFS.jpg

THD of E50 is more uniform across frequencies and >28dB better than that of D03K.

As we saw before, there is an overall ~0,9dB level difference in the raw recorded files, due to different analogue output levels of the two DACs and fixed sensitivity settings of the ADC.
To further fine-match the volume, Presonus Mixtool plugin (allowing gain adjustments to 0,01 dB precision) was used to tune the 1kHz tone level to maximum precision. Resulting files are therefore matched to significantly better than required 0,1dB (at 1kHz at least, note slight FR differences above).
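The level-matching step can be sketched in Python as follows. This is only an illustration of the arithmetic, not the actual Studio One workflow: the tones are synthesized stand-ins for the recorded 1kHz calibration tone, and applying the gain to the quieter (D03K) file is an assumption, since the post does not say which file was adjusted.

```python
import math

def tone_level_dbfs(samples):
    """Level of a sine tone in dBFS, where a full-scale sine reads 0 dBFS."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms * math.sqrt(2))

def make_tone(level_dbfs, freq_hz=1000.0, rate_hz=44100, seconds=1.0):
    """Synthetic stand-in for the recorded calibration tone."""
    peak = 10 ** (level_dbfs / 20)
    n = int(rate_hz * seconds)
    return [peak * math.sin(2 * math.pi * freq_hz * i / rate_hz) for i in range(n)]

e50_tone = make_tone(-2.1)    # level recorded from the E50 chain
d03k_tone = make_tone(-3.0)   # level recorded from the D03K chain

# Gain needed to bring the D03K recording up to the E50 recording.
gain_db = tone_level_dbfs(e50_tone) - tone_level_dbfs(d03k_tone)
matched = [s * 10 ** (gain_db / 20) for s in d03k_tone]
residual_db = tone_level_dbfs(matched) - tone_level_dbfs(e50_tone)
print(f"gain: {gain_db:.2f} dB, residual mismatch: {residual_db:.4f} dB")
```

The computed gain is the ~0,9dB difference mentioned above. With real recordings the residual depends on how precisely the tone level can be measured.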

Discussion and conclusion

As you can see, care was taken to achieve maximum performance from both DACs when preparing the test and to achieve very good level matching in all stages. This, combined with the blinded ABX test approach, ensures a controlled subjective comparison of audible differences between the two DACs.

Note however that an ABX test by itself will not tell us which DAC sounds 'better' (preference) - it will just tell us if there's any audible difference between them at all. This is still an important first step before it makes sense to investigate preference at all.
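For anyone wondering how to read a score out of 16 trials, here is a small Python sketch of the usual one-sided binomial calculation: the chance of getting at least that many answers right by pure guessing. The function name is illustrative, and any significance threshold you apply (0,05 is a common convention) is your own choice rather than something built into the test.

```python
from math import comb

def abx_p_value(correct, trials=16):
    """Probability of getting at least `correct` right out of `trials`
    by pure guessing (one-sided binomial tail with p = 0.5)."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

for score in (8, 12, 13, 16):
    print(f"{score}/16 -> p = {abx_p_value(score):.4f}")
```

For example, 12 correct out of 16 corresponds to a p-value of about 0,038, so a score at or above that would be hard to get by guessing alone.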

Though I consider this to be a difficult test for most people, I expect that those with well-preserved high frequency hearing (>10kHz) might still be able to hear a difference between the two DACs due to differences in their frequency response. Note that FiiO Taishan D03K has a slight sub-LF loss and a small top-octave peak that should result in a slightly brighter sound than the Topping E50, which could then be used to identify the "X".

I will wait for a week or so before posting preliminary result summary (unless a lot of people complete the test sooner and we have some significant results quickly). Anyone completing the test is of course free to post their results and impressions here at any time :)

If no listeners are able to reliably differentiate these two DACs I will not proceed to prepare the more difficult 'phase 2' test.
However, if there are several listeners that can differentiate between these two DACs reliably, I will prepare a similar comparison between the RME Babyface 1st gen DAC (flat FR, SINAD >100dB), representing average but solid DAC performance, and Topping E50 (flat FR, SINAD ~120dB) representing SOTA DAC performance.

Lastly, let me link again to the ABX test.
Note: There are 16 trials in the test.

[EDIT 2022-01-17] Note that since test files are recorded in 44,1kHz sample rate it is recommended to set your audio output device sample rate to 44,1kHz as well to avoid resampling.
[EDIT 2021-12-29] For those wanting to use foobar2000 (in WASAPI exclusive mode) with ABX comparator plugin (16 trials suggested), here are direct links to the audio test files:
1. Topping E50 ABX sample file
2. FiiO Taishan D03K ABX sample file
If you do a test please report your results (copy/paste of ABX comparator result output) by posting in this thread or via a private message. Thanks! :)

Enjoy! :D

[EDIT 2022-01-17] You can find an initial overview of results in post #69 (up to 01.01.2022.) and the latest one in post #130 (up to 17.01.2022.).
[EDIT 2022-02-18] Updated overview of results can be found in post #139.
[EDIT 2022-03-06] Updated overview of results can be found in post #163.
[EDIT 2022-05-15] Updated overview of results can be found in post #169.
[EDIT 2022-10-09] Updated overview of results can be found in post #180.
[EDIT 2024-09-17] Updated overview of results can be found in post #196.
 
So far we have 3 participants that completed the test.
Come on people, do it for science! Or to prove science wrong!
Either way works :p
That might be due to timing. Your test has coincided with holiday season. Many people are traveling or hosting relatives and probably haven't had a good time to sit and do such a test yet. Be patient.

OTOH, when I've posted actual files for people to listen to and choose without knowing, the participation levels have always been abysmal.
 
Thanks!
TBH I can also imagine that these kinds of tests might be demotivating, and that some perhaps start but give up before finishing.
Still, would be interesting to read some impressions!
 
The first couple tests I thought I could detect a difference. By the fourth test I kept going back and forth and eventually gave up. It could be my old ears and modest equipment, or not.
 
May I know if X is same file for all 16 trials?
 
Please post the audio files so people can do a proper ABX test using Foobar's ABX Comparator.
 
It's a blind listening test. Why provide graphs? THD and IMD are completely insufficient metrics for describing the perceptual effects of nonlinear distortion. (1)

The problem with ABX tests is volume matching. You can't do that by ear. You need a calibrated Class 1 SPL meter, say a Larson Davis 831C with a 378A04 microphone (noise floor: 5 dB(A)). These things are very expensive. At a wild guess, more than $5000, and that is a low estimate.
The meter needs to be regularly calibrated by the manufacturer or someone who is licensed to do that.

I quote amirm: "Without level matching, listening test results are unreliable" (2)

Ya ba dibba dibba dibba dibba dibba dibba dum. Oh, and it needs to be very quiet in your listening room when you're doing the test (it's 35 dB here right now and it's the middle of the night. The perks of living in a city). I like to see an environmental noise level of <= 25 dB. I have one closed headphone. I only use that thing for recording sessions sometimes. It's an Audio Technica ATH-M50X. Sound isolation isn't that great. Sound quality is not that good either without EQ, but I am used to that and I know what to expect and correct accordingly. I also have several active monitors to verify the mix. I also listen in my car and on a number of Bluetooth speakers. If it sounds good on everything, it will probably be a good mix. I hope.

(1) Perception & Thresholds of Nonlinear Distortion using Complex Signals

https://hifisonix.com/wordpress/wp-content/uploads/2017/11/Perceptual-Levels-of-distortion.pdf

(2) https://www.audiosciencereview.com/forum/index.php?threads/understanding-audio-measurements.2351/
 
The files are already level matched.
 
The first couple tests I thought I could detect a difference. By the fourth test I kept going back and forth and eventually gave up. It could be my old ears and modest equipment, or not.
Please don't be discouraged! IMHO this will indeed be a difficult test for most people - even if it may seem simple from the outset.

Given that there is significant measurable frequency response variation in the audible range between the two DACs I expect that even most mediocre-performing equipment should be able to reveal it in direct comparison. Audibility is perhaps debatable - but I'm optimistic, and anyway this test should help determine it :)
Of course, one would ideally want to listen on an equipment chain with a flat FR and at least better-than-16-bit-resolution (from distortion and noise perspective). But I'd say even this is probably not mandatory since linear distortions (FR variations) between the two files should be similarly measurable (and therefore hopefully detectable) even on a non-flat reproduction chain.

May I know if X is same file for all 16 trials?
Just a disclaimer: my understanding of how the abxtests.com tool operates comes only from the description and demos on its title page (link). For any details or explanations I'd suggest contacting the developer instead (who is also a member of this forum - @jaakkopasanen).

As far as I saw, the abxtests.com tool works as one would expect of any ABX test - "A" and "B" are always fixed between trials (e.g. "A" points to file#1 and "B" points to file#2), while "X" is randomized for each of the 16 trials and in a specific trial can be the same as either "A" or "B". This of course also means that theoretically "X" could point to the same file for all 16 trials (e.g. always "A" or always "B") - it is just not statistically very likely to happen. Also, this simple ABX demo helps illustrate the concept.
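To put a number on "not statistically very likely", here is a minimal sketch. It assumes each trial independently assigns "X" to "A" or "B" with equal probability; that is an assumption about the tool, not something its description confirms, and the function name is made up for illustration.

```python
from fractions import Fraction


def prob_x_constant(trials: int) -> Fraction:
    """Probability that X points to the same file in every trial.

    Assumes each trial is an independent fair coin flip between A and B
    (an assumption, not a documented property of abxtests.com).
    """
    # Only two of the 2**trials equally likely sequences are constant:
    # all-A and all-B.
    return Fraction(2, 2 ** trials)


p = prob_x_constant(16)
print(p, float(p))
```

Under that assumption the chance of "X" being the same file in all 16 trials is 1 in 32768.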

Tool description states only this:
"The options presented to the user are shuffled automatically with Fisher-Yates algorithm which ensures that all possible orders are equally probable."
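For readers unfamiliar with the algorithm named in that quote, here is a generic Fisher-Yates sketch. It is not the tool's actual source code, and the function name and the `rng` parameter are illustrative only.

```python
import random


def fisher_yates_shuffle(items, rng=random):
    """Return a shuffled copy of items using the Fisher-Yates algorithm."""
    result = list(items)
    # Walk from the last index down to 1, swapping each position with a
    # uniformly chosen index at or below it.
    for i in range(len(result) - 1, 0, -1):
        j = rng.randrange(i + 1)  # 0 <= j <= i
        result[i], result[j] = result[j], result[i]
    return result


options = ["file#1", "file#2"]
print(fisher_yates_shuffle(options))
```

Because every position is swapped with a uniformly chosen earlier-or-equal index, each possible ordering comes out with the same probability, provided the random source is uniform.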

When configuring the test there is no way to influence the randomization algorithm or to fix the "X" (at least none that I found).

Please post the audio files so people can do a proper ABX test using Foobar's ABX Comparator.
If there is a need I can of course post the files directly, but could you please clarify why you feel comparison via abxtests.com tool is not 'proper'?

From what I've seen and tested the tool works really well and IMHO simplifies the process significantly since it's browser-based. Perhaps the only part missing is the ability to loop just a specific section of the test track (rather than looping the whole thing).
On the other hand, another thing I like more with abxtests.com vs foobar2000 ABX plugin is that the transition when switching between A/B/X is gapless and click-free.

If you're concerned about loss of audio quality due to streaming from the browser, I actually tested for this as well. As long as you avoid resampling in Windows (i.e. have your output audio device configured to use 44,1kHz sample rate that the files are encoded with) there seems to be virtually no loss of audio quality at all - i.e. almost the same as playing the file locally from my PC.
Avoiding Windows audio resampling is anyway good practice, IMHO, and would apply to foobar2000 as well.

Streaming quality of abxtests.com was tested by uploading 32bit float and 16bit PCM 1kHz test files (both without dither), playing them from the abxtests.com tool and recording the streamed audio output with the E1DA Cosmos ADC (in stereo mode):
abxtests.com stream quality - 32bit (float) vs 16bit (PCM) wav file.jpg

As you can see, the 32-bit file performance is limited by the loopback performance of the DAC/ADC chain and gives the same figures with both online streaming and local file playback (THD for both is around 127dB).
The 16-bit PCM file playback resolution is at the theoretical limit of the 16-bit encoding (96/97dB). Interestingly, there is an increase in THD (~103dB when streaming vs ~122dB with local file playback); however, THD+N is still noise-limited and therefore not impacted.
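As a quick sanity check on the "theoretical limit" figure, the sketch below evaluates two commonly quoted textbook formulas for an ideal N-bit quantizer. The function names are illustrative, and neither formula models the actual measurement chain used here.

```python
import math


def dynamic_range_db(bits: int) -> float:
    """Ratio of full scale to one quantization step, in dB."""
    return 20 * math.log10(2 ** bits)


def sine_sqnr_db(bits: int) -> float:
    """Textbook signal-to-quantization-noise ratio for a full-scale sine
    with ideal, undithered quantization (6.02*N + 1.76 dB)."""
    return 6.02 * bits + 1.76


print(round(dynamic_range_db(16), 2))  # 96.33
print(round(sine_sqnr_db(16), 2))      # 98.08
```

The first figure lines up with the 96/97dB quoted above; the second is the usual upper bound for a full-scale sine.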

IMHO this is all very good performance and shouldn't make the test any less valid - but please let me know if I'm missing something.

I can share the 1kHz test link if other people are interested to validate these results.

It's a blind listening test. Why provide graphs? The THD and IMD metrics are completely insufficient for describing the perceptual effects of nonlinear distortion. (1)
The reasons for posting the graphs are twofold:
  • Main reason was to validate the test setup and show that it doesn't sacrifice much of the raw DAC performance. You can compare them to the full suite of measurements available from independent sources and see that they are close.
  • Second reason was to make the test a little easier by giving participants clues on what they might listen for - e.g. the difference in frequency response above 10kHz.
EDIT: Note that I make no argument for audibility (or inaudibility) of non-linear distortions.
The problem with ABX tests is volume matching. You can't do that by ear. You need a calibrated Class 1 SPL meter. Let's say a Larson Davis 831C with a 378A04 microphone. Noise floor: 5 dB(A). These things are very expensive. If I had to make a wild guess: > $5000. This is a low estimate.
The meter needs to be regularly calibrated by the manufacturer or someone who is licensed to do that.

I quote amirm: "Without level matching, listening test results are unreliable" (2)

Ya ba dibba dibba dibba dibba dibba dibba dum. Oh, and it needs to be very quiet in your listening room when you're doing the test (it's 35 dB here right now and it's the middle of the night. The perks of living in a city). I like to see an environmental noise level of <= 25 dB. I have one closed headphone. I only use that thing for recording sessions sometimes. It's an Audio-Technica ATH-M50x. Sound isolation isn't that great. Sound quality is not that good either without EQ, but I am used to that and I know what to expect and correct accordingly. I also have several active monitors to verify the mix. I also listen in my car and on a number of Bluetooth speakers. If it sounds good on everything, it will probably be a good mix. I hope.

(1) Perception & Thresholds of Nonlinear Distortion using Complex Signals

https://hifisonix.com/wordpress/wp-content/uploads/2017/11/Perceptual-Levels-of-distortion.pdf

(2) https://www.audiosciencereview.com/forum/index.php?threads/understanding-audio-measurements.2351/
You are right, and I fully agree with both you and @amirm that listening tests are only valid if the sounds being compared are level-matched.
However, if you read the original post you will see that I have indeed taken care to ensure very close level matching between the test files - the whole process is described there.

Please also note that SPL meters are only used to level-match chains that contain transducers (i.e. loudspeakers, headphones...), and that electrical devices such as DACs should be level-matched by measuring and aligning their electrical outputs. This can be done with much greater precision than can be achieved when level-matching transducers with SPL meters, regardless of SPL meter precision. This is due to the simple fact that electrical devices vary less in their FR and are free of acoustic interference (such as room reflections and residual acoustic noise).

For this test the total average level-matching was done with better than 0,1dB precision. This is actually what makes this test so difficult :)
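To give a feel for what a 0,1dB tolerance means, here is a minimal sketch that compares the RMS level of two signals in dB. It uses synthetic 1kHz tones rather than the actual test files, it is not the procedure used for this test, and all names in it are illustrative.

```python
import math


def rms(samples):
    """Root-mean-square value of a sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def level_difference_db(a, b):
    """Level of signal a relative to signal b, in dB."""
    return 20 * math.log10(rms(a) / rms(b))


# Synthetic example: one second of a 1 kHz tone at 44.1 kHz, and a copy
# that is 1% lower in amplitude (both are assumptions for illustration).
fs = 44100
tone = [math.sin(2 * math.pi * 1000 * i / fs) for i in range(fs)]
quieter = [0.99 * s for s in tone]

diff = level_difference_db(tone, quieter)
gain = 10 ** (diff / 20)  # linear factor that would bring 'quieter' up to 'tone'
print(f"{diff:.3f} dB, gain {gain:.4f}")
```

In this example a 1% amplitude difference works out to a little under 0,09dB, so matching to better than 0,1dB means the two signals differ by roughly 1% in amplitude or less.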

Here's track loudness data analysis done with a few different methods:

PreSonus Studio One 5 Professional - 1kHz calibration results
(0,03dB peak-level channel mismatch):
Topping E50 - 1kHz calibration tone peak level:
1640699895884.png

FiiO Taishan D03K - 1kHz calibration tone peak level:
1640699784920.png


PreSonus Studio One 5 Professional - Project View Loudness analysis of test files
(EBU R128 INT and LRA loudness identical to 0,1dB precision; maximum of 0,7dB difference in true-peak values [likely due to differences in frequency response between the two DACs])
1640700104165.png


foobar2000 ReplayGain analysis of the test tracks
(0.01dB total average level difference, ~0.043 peak level difference)
1640699263027.png


No download --> no test, for me as well.
I can understand that. Would you perhaps be willing to explain why?
E.g. if you see some flaw in the methodology perhaps there are things I can improve in future attempts - I'd be very grateful for any help!
I'll also be happy to post the source files, but perhaps it makes sense to first allow some more time for people to complete the online version.
So far we have 6 people who finished the test, and one who reported starting the test but not finishing.

Thanks to all who completed the test so far and/or expressed interest in it!
 
Is it just me, or does this ABX tool not work at all? Tried two different browsers on iOS, I hear no sound at all.
 
Is it just me, or does this ABX tool not work at all? Tried two different browsers on iOS, I hear no sound at all.
Worked for me on a Mac with Chrome browser, outputting to ADI2 Pro. I did find the inability to select the portion of the track I want to compare to be a hindrance. Oh, and when you click on A or B, it just selects that track but doesn't start playing it until you click one more time.
 
I can understand that. Would you perhaps be willing to explain why?
I think the reason is that you can select a certain part of the downloaded files and repeat it/switch it in Foobar ABX. This is the most effective way to detect small differences in sound.
 
Is it just me, or does this ABX tool not work at all? Tried two different browsers on iOS, I hear no sound at all.
Sorry to hear that! :( For me it works on both a Windows PC (with Chrome and Edge browsers) as well as a Samsung Android phone (with Chrome and Firefox browsers).

Worked for me on a Mac with Chrome browser, outputting to ADI2 Pro. I did find the inability to select the portion of the track I want to compare to be a hindrance. Oh, and when you click on A or B, it just selects that track but doesn't start playing it until you click one more time.
Interesting - on my devices one click on any option starts playback immediately while the second click on the same selection stops playback. Clicking on a different option while playing another continues playback without any pause or click.
 
I think the reason is that you can select a certain part of the downloaded files and repeat it/switch it in Foobar ABX. This is the most effective way to detect small differences in sound.
Agreed, I miss that part of the functionality as well (I even noted it in post #11).
@jaakkopasanen Any chance of implementing it in a future version?
 
Honestly, I didn't like the choice of music, so much so that I couldn't make it past the first few trials before having to stop.
The level matching was excellent, but I dreaded having to hear the clip one more time, so I never finished the test.
 
From what I've seen and tested the tool works really well and IMHO simplifies the process significantly since it's browser-based.
On Linux, Google and Mozilla have chosen to support only the PulseAudio interface, so using a browser pretty much means you're stuck with the risk of resampling that you're suggesting we avoid in Windows.
 
The 16-bit PCM file playback resolution is at the theoretical limit of the 16-bit encoding (96/97dB). Interestingly, there is an increase in THD (~103dB when streaming vs ~122dB with local file playback); however, THD+N is still noise-limited and therefore not impacted.
Even though this THD increase with 16bit files doesn't impact the tests (both files in the original test are 24bit), I was still curious, so I did a few more tests to try and figure out when this happens.
Long story short, this only happens with 16bit files. 24bit and 32bit files are not affected at all, even when dithered down to the same 16bit resolution, as long as the files are saved at a higher bit depth:

abxtests.com - Impact of file bit-depth on THD.jpg

With local file playback in foobar2000 and the same output device we get basically the same performance for both 24bit files, but not for 16bit:
Local playback - Impact of file bit-depth on THD.jpg

Again, since the files in the test from post #1 are both 24bit FLAC files, there should actually be no loss there at all.

@jaakkopasanen Any ideas where the increase in THD for 16bit file playback might come from? Thanks!

Honestly, I didn't like the choice of music, so much so that I couldn't make it past the first few trials before having to stop.
The level matching was excellent, but I dreaded having to hear the clip one more time, so I never finished the test.
:) That's OK - thanks anyway for trying!

On Linux, Google and Mozilla have chosen to support only the PulseAudio interface, so using a browser pretty much means you're stuck with the risk of resampling that you're suggesting we avoid in Windows.
I see :confused: - thanks for the insight!
 