
Wiim Ultra vs Wiim Ultra + External Dac (Blind Test Results)

Is this substantiated by their relative measurements?
Keith
Of course not. The measurements show the DACS are audibly identical. They sound the same.


5,4,3,,,
 
Of course not. The measurements show the DACS are audibly identical. They sound the same.


5,4,3,,,

It is true that technical measurements (distortion, noise, and frequency response) indicate minimal or insignificant differences, so the Topping and the WiiM Ultra should perform very similarly and sound audibly identical.

Guess I didn't make the point clear enough.

A: In an ABX test, the attendee got 43 correct out of 60. How likely is it that there is really a difference between A and B?

ChatGPT:
In an ABX test, the goal is to determine if the attendee can reliably distinguish between two stimuli, A and B. To assess how likely it is that the attendee's correct guesses are due to a real difference between A and B (rather than just chance), we can perform a binomial test or calculate the p-value for the result.

In this case:

  • The number of trials n=60
  • The number of correct answers k=43
  • Under the null hypothesis (assuming no difference between A and B), the probability of a correct answer in each trial is p=0.5, since it would be just a random guess.
The p-value will help us understand how likely it is that a person could have gotten 43 or more correct answers purely by guessing.
Let me calculate this for you.
The p-value is approximately 0.00053. This indicates that the probability of achieving 43 or more correct answers purely by chance (under the assumption that there's no difference between A and B) is very low—about 0.05%.
Since this p-value is well below the common significance threshold of 0.05, it suggests that it is very likely there is a real difference between A and B in the test, and the attendee is not simply guessing.
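
For anyone who wants to check the figure without relying on ChatGPT, here is a minimal sketch of the exact one-sided binomial calculation described above. The function name `abx_p_value` is made up for illustration; only the Python standard library is used.

```python
from math import comb

def abx_p_value(correct: int, trials: int) -> float:
    """One-sided exact binomial p-value for an ABX run.

    Returns P(X >= correct) when every trial is a fair coin flip (p = 0.5),
    i.e. the chance of doing at least this well by pure guessing.
    """
    # Count the ways to get `correct` or more hits out of `trials`.
    tail = sum(comb(trials, k) for k in range(correct, trials + 1))
    # Every one of the 2**trials guess sequences is equally likely.
    return tail / 2 ** trials

p = abx_p_value(43, 60)
print(f"p = {p:.5f}")
```

This should land at roughly 0.0005, in line with the value quoted above. Note that it only tells you how unlikely the score is under pure guessing. It says nothing about what caused the score.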

So it's not that 72% isn't representative enough. You need to deny the whole experiment.

However, according to this post the probability that the listeners in the test would have obtained 43 correct answers out of 60 by pure chance was 0.053%.

Undeniably, this suggests that it is more likely than not that the participants genuinely noticed a difference in sound quality, rather than the results being due to random guessing.
 
Undeniably, this suggests that it is more likely than not that the participants genuinely noticed a difference in sound quality, rather than the results being due to random guessing.
That is true only in the absence of confounders.
 
It is true that technical measurements (distortion, noise, and frequency response) indicate minimal or insignificant differences, so the Topping and the WiiM Ultra should perform very similarly and sound audibly identical.



However, according to this post the probability that the listeners in the test would have obtained 43 correct answers out of 60 by pure chance was 0.053%.

Undeniably, this suggests that it is more likely than not that the participants genuinely noticed a difference in sound quality, rather than the results being due to random guessing.
They're talking in general, statistics, not about any particular test.
 
Undeniably, this suggests that it is more likely than not that the participants genuinely noticed a difference in sound quality, rather than the results being due to random guessing.

Or that the controls were insufficient.
 
That is true only in the absence of confounders.
Or that the controls were insufficient.

If the existence of confounders in the test is dogmatically assumed, the possibility of any meaningful discussion is undermined.

Especially after it was explained that it was a blind test, in which it is very reasonable to assume the subjects didn't know, and couldn't have had any clue, which DAC they were hearing.
 
Better bass and slightly clearer are symptomatic of one level being slightly louder than another.
Keith
 
Better bass and slightly clearer are symptomatic of one level being slightly louder than another.
Keith

It is true that, based on the psychoacoustic principle, volume affects sound perception. When a system is louder, bass and treble tend to stand out more, giving the impression of better quality or clarity. However, in our test, none of the participants mentioned that one system sounded louder than the other, nor did they base their responses on volume differences.

That said, clearly there is the possibility that, despite balancing volume levels with an SPL meter, small variations in volume could still have influenced the perception of sound quality.
 
If the existence of confounders in the test is dogmatically assumed, the possibility of any meaningful discussion is undermined.
Not dogmatically assumed. But if the controls are insufficient (and I think we have determined that to be the case) then confounders are possible.

If you combine that with the vanishingly unlikely chance of being able to hear defects in the sound down at the levels both the DACS (internal and external) achieve, then the assumption is going to be that confounders influencing the result are vastly more likely than being able to hear those defects.

Confounders can include insufficient level matching, tells in the switching process (including the behaviour of the people doing the switching), and distortion or noise added by the differing system circuit with the external DAC in or out of the chain. There are probably other possibilities I've not thought of.

Getting the controls spot on is actually difficult. It is simply more likely that there are failures there than that you have a group of people with superhuman hearing able to hear distortion and noise down at -115dB. Even more so when you are listening via speakers whose distortion is probably 1000 times or more larger than that of the DACs.
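
To put those two numbers side by side, here is a rough sketch of the arithmetic. It assumes the -115dB figure and the "1000 times" factor are both amplitude (voltage) ratios, which is my assumption and not something stated in the post. The helper names are hypothetical.

```python
import math

def amplitude_ratio_to_db(ratio: float) -> float:
    """Convert an amplitude (voltage) ratio to decibels."""
    return 20 * math.log10(ratio)

def db_to_percent(db: float) -> float:
    """Convert a level in dB relative to the signal into a percentage of it."""
    return 100 * 10 ** (db / 20)

dac_db = -115.0                                      # DAC distortion/noise level from the post
speaker_db = dac_db + amplitude_ratio_to_db(1000)    # 1000x larger in amplitude terms

print(f"DAC:     {dac_db:.0f} dB = {db_to_percent(dac_db):.6f} %")
print(f"Speaker: {speaker_db:.0f} dB = {db_to_percent(speaker_db):.3f} %")
```

Under that assumption a factor of 1000 is 60dB, so the speaker sits around -55dB, a fraction of a percent, while the DAC's contribution is a few ten-thousandths of a percent.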
 
If the existence of confounders in the test is dogmatically assumed, the possibility of any meaningful discussion is undermined.

That's been the point all along.

Tighten up the controls, and let's try it again. This wasn't a test with enough rigor to be taken seriously.

That doesn't mean it wasn't fun, but it wasn't evidence of anything.
 
It is true that, based on the psychoacoustic principle, volume affects sound perception. When a system is louder, bass and treble tend to stand out more, giving the impression of better quality or clarity. However, in our test, none of the participants mentioned that one system sounded louder than the other, nor did they base their responses on volume differences.

That said, clearly there is the possibility that, despite balancing volume levels with an SPL meter, small variations in volume could still have influenced the perception of sound quality.
Something to consider if you ever do it again: use a millivolt meter at the units' outputs.
Keith
 
However, in our test, none of the participants mentioned that one system sounded louder than the other, nor did they base their responses on volume differences.
That is the point - a small level difference (say from as low as 0.2dB to <1dB) is NOT perceived as a volume difference. It will be perceived only as a quality difference.
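
As a rough illustration of how small those differences are in voltage terms, and why a voltmeter at the outputs is the sensible tool for this, here is a short sketch using the standard 20*log10 relationship for voltage levels. The function names and the example readings (2.05 V and 2.00 V) are hypothetical, not measurements from this test.

```python
import math

def db_to_voltage_ratio(db: float) -> float:
    """Voltage ratio corresponding to a level difference in dB."""
    return 10 ** (db / 20)

def level_difference_db(v1: float, v2: float) -> float:
    """Level difference in dB between two measured output voltages."""
    return 20 * math.log10(v1 / v2)

# How big is each level difference as a percentage change in voltage?
for db in (0.1, 0.2, 0.5, 1.0):
    ratio = db_to_voltage_ratio(db)
    print(f"{db:.1f} dB -> {100 * (ratio - 1):.1f} % voltage difference")

# Hypothetical meter readings from the two DAC outputs on the same test tone.
print(f"{level_difference_db(2.05, 2.00):.2f} dB")
```

A 0.2dB mismatch is only about a 2% difference in output voltage.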
 
From my point of view (I was the only one there), it’s not impossible that the listeners perceived differences, especially considering subjective perception and individual sensitivity to sound.

Without denying that confounding factors may have influenced the results, completely dismissing the participants' ability to detect differences could overlook subtle nuances that technical measurements don't fully capture.

Listening tests often reveal small subjective differences in sound that may not appear in purely technical measurements.

If technical measurements were the only factor considered, many DACs and speakers would sound exactly the same, and that’s clearly not the general opinion.

People often notice subtle tonal variations, dynamics, or soundstage that can differentiate equipment, even when objective measurements indicate near-identical performance.

Therefore, I believe subjective listening experiences should also be considered when considering audio equipment, alongside the technical data.
 
No. Speakers, yes, because their measurements differ even between fine-measuring examples, but between DACs, no.
If there were more bass, that would be evident in the component's measurements.
Keith
 
Thanks to everyone for the feedback. I have decided to redo the same test, purely for instructional and recreational purposes, and to electrically balance the output levels of the DACs to compare results. Any further suggestions are welcome.
 
This is *massively* more likely to explain it
Without denying that confounding factors may have influenced the results,

Than this;

completely dismissing the participants' ability to detect differences could overlook subtle nuances that technical measurements don't fully capture.
 
Without denying that confounding factors may have influenced the results, completely dismissing the participants' ability to detect differences could overlook subtle nuances that technical measurements don't fully capture.

How well do you understand the measurements?

Listening tests often reveal small subjective differences in sound that may not appear in purely technical measurements.

That has never been demonstrated with proper controls. Ever.

People often notice subtle tonal variations, dynamics, or soundstage that can differentiate equipment, even when objective measurements indicate near-identical performance.

Yes, when they don't use proper controls.
If technical measurements were the only factor considered, many DACs and speakers would sound exactly the same, and that’s clearly not the general opinion.

DACs are electronics that can be fully characterized with measurements.

Speakers are motors, and not what we are talking about here.

Therefore, I believe subjective listening experiences should also be considered when considering audio equipment, alongside the technical data.

Yes, subjective evaluation with appropriate controls, otherwise you are more likely to end up with less than useful results, to yourself or anyone else.

You say you are new to the audiophile world, yet don't seem willing to listen to what may actually help you move beyond just believing nice stories.
 
Thanks to everyone for the feedback. I have decided to redo the same test, purely for instructional and recreational purposes, and to electrically balance the output levels of the DACs to compare results. Any further suggestions are welcome.

Make it double blind, and make it AB/X.
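
One possible way to take the switcher's choices out of the picture is to generate the X assignments in advance at random. This is just a sketch: the function name `abx_schedule` and the seed value are made up for illustration, and it does not address level matching or hiding the person doing the switching.

```python
import random

def abx_schedule(trials: int, seed=None) -> list:
    """Return a random list of 'A'/'B' assignments for X, one per trial."""
    rng = random.Random(seed)  # a seed makes the schedule reproducible for later checking
    return [rng.choice("AB") for _ in range(trials)]

schedule = abx_schedule(60, seed=2024)  # seed value is arbitrary
print("".join(schedule))
```

Ideally the schedule is kept away from both the listeners and anyone they can see or hear during the run, and answers are only scored against it afterwards.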
 