
DAC and amp combos did not give the same cues when running online blind tests. Why? And which cue is the desirable one?

Pdxwayne (OP):
While running online blind timing tests yesterday, I heard different cues with different combinations of DACs and amps. I wonder why.

I thought that as long as I am using a transparent DAC and amp, I could mix and match them and any combo would sound the same. But that was not the case here. Is that expected?

More details here:

If this is not expected, can any expert explain why I hear different cues?

Also, per the test page at https://www.audiocheck.net/blindtests_timing_2w.php?time=5, the test uses clips with a bass drum and a hi-hat, so it can stand in for a short clip of real music.
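
For anyone who wants to experiment offline, here is a minimal sketch in Python of a comparable stimulus. It assumes only the structure described above - a low drum-like thump plus a short hi-hat-like noise burst, either simultaneous or offset by a chosen delay; the envelope shapes and file names are illustrative, not the site's actual files.

    # Synthesize a "synced" and a "delayed" drum/hi-hat clip (illustrative only).
    import numpy as np
    from scipy.io import wavfile

    FS = 44100  # sample rate, Hz

    def thump(dur=0.25, f0=60.0):
        # Decaying low-frequency sine as a crude bass-drum stand-in.
        t = np.arange(int(FS * dur)) / FS
        return np.sin(2 * np.pi * f0 * t) * np.exp(-t / 0.05)

    def hat(dur=0.05):
        # Short decaying noise burst as a crude hi-hat stand-in.
        t = np.arange(int(FS * dur)) / FS
        rng = np.random.default_rng(0)
        return rng.uniform(-1.0, 1.0, t.size) * np.exp(-t / 0.01)

    def clip(delay_ms=0.0, total=0.5):
        # Mix the thump at t=0 and the hat at t=delay_ms into one clip.
        out = np.zeros(int(FS * total))
        d = thump()
        out[:d.size] += d
        h = hat()
        start = int(FS * delay_ms / 1000.0)
        out[start:start + h.size] += 0.5 * h
        out = 0.8 * out / np.max(np.abs(out))
        return (out * 32767).astype(np.int16)

    wavfile.write("synced.wav", FS, clip(0.0))   # both hits together
    wavfile.write("delayed.wav", FS, clip(5.0))  # hi-hat 5 ms late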

Given that the clip resembles real music, which kind of cue is the more desirable outcome, considering how it would likely affect enjoyment of regular music playback?

Do we want a clear double-click cue? A slight tonality-change cue? A strong tonality-change cue?

Thanks!
 
Pdxwayne (OP):
Some thoughts....

In my timing-test thread, many members mentioned hearing a tonality change when running the 1 ms test.

For example:

Pkane heard a tonality change when using his Apple laptop's internal speakers to run the 1 ms test.

Amir heard a tonality change when using a cheap $30 USB conference headset connected to a Windows machine to run the 2 ms test.

I heard a very obvious tonality change when using my smartphone's internal speakers to run the 1 ms test.

However, using Amir's recommended headphones (K371 and HE400SE) from the same smartphone's headphone jack, with all audio-enhancement features turned off, I do not hear a tonality change at 1 ms. Instead, I hear one extra click and can pass the 1 ms test using that cue. The K371's click is a little more subtle than the HE400SE's.

I wonder what could cause such a difference in cues. What would cause a tonality change, and what could produce the extra subtle click?

Should I assume that hearing a tonality change means something in the chain is not resolving enough - for example, a cheap headset or a laptop/phone's built-in speakers?

Should I assume that if I hear the extra click, the chain is more resolving?

Thanks!
 
Pdxwayne (OP):
I did a few more tests and decided to track my results in this thread.

Here is a screenshot of the results so far:

[Attachment: dac_amp_headphones_results_1.PNG]
 

dominikz:
I'll post this here as I fear the other thread went a bit off-topic. The quotes below are from that other thread.
Like I mentioned before, level matching does not make sense because I had to use various volume levels to find a cue for each combo.
Even so, level-matching still makes sense if you want to try and determine the source of the differences you observed in your initial test.
You would in any case also need to test without knowing which stack you are listening to at any point (until you finish all tests), to minimize confirmation bias (as I also suggested in post #3,379 of the other thread).
Ideally you would also do a few runs to see if your results are consistent and repeatable.

If it were me, I'd probably try at least a few things:
  1. Test all stacks at exactly the same comfortable listening level (the level would need to be precisely measured). Listen without knowing which stack is playing until finishing all attempts with all different stacks.
  2. Select another listening level, level-match all stacks, and repeat the blind test from 1 (though the stacks should not be tested in the same sequence as in the first test - the order should be unknown to you in each trial). Possibly repeat for a few different levels.
  3. Test each stack while freely modifying the level during each test to whatever feels best for test performance (but again without knowing which stack is connected in each attempt, until finishing all attempts with all different stacks). Note that the level adjustment would need to be done in a way that doesn't give away which stack is playing (e.g. in the same software or with the same passive attenuator for each stack).
  4. Repeat 1-3 a few times and match your notes to the tested stacks to see if the results are consistent.
Doing the above steps still doesn't necessarily avoid all sources of error, but it would at least remove listening level and basic visual/confirmation bias from the equation.
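
To make the bookkeeping in steps 1-4 concrete, here is a minimal Python sketch of how a helper could pre-generate a randomized, blinded presentation order and only reveal the key after all notes are taken. The stack labels and trial count are illustrative assumptions, not part of the post above.

    # Single-blind trial bookkeeping: the helper runs this and keeps the
    # printed key hidden until the listener has recorded all impressions.
    import random

    STACKS = ["stack A", "stack B"]   # illustrative labels
    N_TRIALS = 10                     # illustrative trial count

    # Equal counts of each stack, in a shuffled (randomized) order.
    order = STACKS * (N_TRIALS // len(STACKS))
    random.shuffle(order)

    notes = []
    for i, stack in enumerate(order, 1):
        # The helper silently connects `stack`; the listener only types a note.
        notes.append(input(f"Trial {i} - describe the cue you heard: "))

    # Reveal the key only now, paired with the notes for consistency checks.
    for stack, note in zip(order, notes):
        print(f"{stack}: {note}")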

Comparing transparent DACs based on measurements alone no longer seems sufficient in the conversation about audibility.
IMHO it is still too early to claim the differences you are observing are clear evidence of audible differences between otherwise transparent DACs. There's a lot of chance for error in informal listening tests, and results can easily be swayed by less-than-obvious issues in the setup or methodology.

Please note that this doesn't mean that your results are not genuine - just that it is probably too early to interpret them with any certainty, and especially too early to make claims about audio system performance based on these results.
TBH I'm not sure anyone could start explaining these differences without first going through some trouble making sure they are repeatable under strict and controlled conditions.

Of course, it is clear this kind of testing would be very time consuming and tedious - you should only do it if you're really interested in how all this works! But without doing very strict tests people will almost certainly (and IMO rightfully) challenge the results as well as any claims/conclusions drawn from them. You can view it as a sort of coarse peer-review process :)
 
Pdxwayne (OP):
(dominikz's post above quoted in full)
I really appreciate you spending time to provide suggestions.

Do you really think the differences between certain chains are so small that I would need to do "blind" tests of the online "blind" tests to prove it?

The main point is to show that different chains can give different results. Is showing an obvious difference between the headphones used not enough for you?

Like you said, the results should be reproducible by others. Many have already duplicated some of the findings, especially the observed tonality change. Some have already observed that certain chains do better than others.

So, would you please do a peer review and try to replicate my findings?

Thanks again!

; )
 
dominikz:
I really appreciate you spending time to provide suggestions.
I'm glad - I'm genuinely hoping some of the information is useful! :)
Do you really think the differences between certain chains are so small that I would need to do blind tests of the online "blind" tests to prove it?
I'm not making claims on the 'size' of the difference, I'm just saying that testing blind would remove one obvious and well known biasing factor from your results. Testing level-matched would remove another, as well as remove one uncontrolled variable from the test.
Both of these would help make your results a bit more convincing to others who have not performed the test.

The main point is to show that different chains can give different results. Is showing an obvious difference between the headphones used not enough for you?
The issue is whether we can safely make this conclusion based on the performed tests. Different test performance when using different headphones is to me less surprising, as the frequency responses of various headphone models will typically be very different; and IMHO it is not difficult to believe that some FR deviations would expose cues in these kinds of tests more than other FR deviations.

DACs and HP amps, on the other hand, perform objectively much more similarly to one another, which makes the claimed difference in test performance IMO unexpected. This is why caution is needed to make sure both methodology and results are valid first.

Like you said, the results should be reproducible by others. Many have already duplicated some of the findings, especially the observed tonality change.
I'll admit I haven't read all related posts, but what I believe I did read is that some people claimed the test was easier to pass for them using systems with poor objective performance (e.g. phone and laptop speakers) - i.e. systems that have obvious FR deviations and possibly other very audible deviations. This again doesn't seem that surprising to me.
However I don't believe I've seen other people claiming (and substantiating) that different 'transparent' measuring electronics perform differently in this test (perhaps I missed it, though). This is in my view a very different claim, and such a finding would IMO indeed be surprising, if it could be proven repeatable under controlled conditions.

So, would you please do a peer review and try to replicate my findings?
I don't mean to offend by this, but personally I'm not interested in pursuing this line of investigation.
Still, just for fun, let's suppose that I do a variant of your test and can't detect differences - what does that tell us? Is it that my critical-listening performance is not good enough, or that there is really no difference? How do you prove a negative? :) Also, if the two of us don't use the same methodology, how can we be sure our results are even comparable?

As I said before - if you're happy with your findings, that is perfectly OK. :) Just note that people might challenge them as they are currently presented, and even more so when general conclusions/claims are drawn from them. Even if you adhered to a very strict protocol and implemented every control, presented results would and should still be questioned and carefully confirmed - especially unexpected ones. It is the only way to make sure the results are valid. Please don't take this to heart - we all have a chance to learn from the process. :)

Lastly I have to say that I do admire the amount of effort you put into doing listening tests - all of that must have taken days to do! And it is IMHO indeed an interesting idea to use this type of timing test with different devices.
 
Pdxwayne (OP):
(dominikz's post above quoted in full)
Again, appreciate your thoughtful response.

Indeed, it is hard to get others to duplicate my results. But 30% of respondents in my timing-test thread could sense 1 ms, so I have hope that someone who can sense 1 ms will provide more peer review.
: )

Thanks again and have a great weekend.
 
Pdxwayne (OP):
(dominikz's post above quoted in full)
I thought more about which combos would be best for blind, voltage-matched tests, if you are going to be my helper.
; )

The most surprising difference I found is between the Gustard combo and the Topping combo when I used the HE400SE headphones.

With the Topping E30 and L30, the 5 ms test was VERY easy due to a noticeable double-hit cue.

With the Gustard X16 and H16, I tried multiple volume levels and could not sense the double-click cue. That is kind of concerning to me, as this combo is $900 vs. $300 for the Topping.

These would be very good candidates for a blind, voltage-matched test when we eventually have a chance to sit down and do it.

: P
 

raif71:
One click vs. double clicks - could that be an indication of sound separation/resolution?
 
Pdxwayne (OP):
One click vs. double clicks - could that be an indication of sound separation/resolution?
I am curious too. Likely the HE400SE's highs are sharper, making it easier to hear the hi-hat hit without it being masked by the drum hit - hence the double-hit cues with multiple combos.

But then, why the difference between the Topping and Gustard combos?

Anyway, how many different kinds of cues/tells have you found so far when performing the tests?

Thanks!
 

raif71:
(Pdxwayne's post above quoted in full)
I myself have not done the tests, but I'm inclined to believe you, as I'm on the fence when it comes to DACs sounding different. I have all the gear you used except the H16, the headphones, and the miniDSP.
 
Pdxwayne (OP):
(raif71's reply above quoted in full)
It would be great if you could run some tests when you have free time. Thanks!
 

dominikz:
(Pdxwayne's post above quoted in full)
Not sure what level of involvement you expect from my side, but I'll be happy to help with some best-effort suggestions if I can. I have to say, though, that there are surely much more experienced and competent members on this forum if you really want to prepare a rigorous test - i.e. I'm probably not the best person to help design a protocol that avoids all potential sources of error, if that's what you're after :)

The two stacks you propose seem like good candidates, agreed. Both should be solidly matched technically and can be considered transparent in objective performance based on their ASR reviews. I did a quick check of technical parameters from the reviews and specs:
  • Gustard X16/H16 stack:
    • DAC:
      • Output level at 0 dBFS (ASR): 3.3 Vrms RCA (4.15 Vrms XLR)
      • Output impedance (spec): 100 ohms RCA (300 ohms XLR)
    • HP amp:
      • Input sensitivity (spec): 2 Vrms RCA (6.2 Vrms XLR) -> if 2 Vrms really is the max input level on RCA, clipping might happen with a full-scale RCA signal from the X16! Might be safer to use XLR.
      • Input impedance: 4700 ohms RCA (10000 ohms XLR) -> much higher than the matching DAC's output impedance, so OK!
      • Output impedance (spec): 0.1 ohms unbalanced (0.2 ohms balanced) -> very good - much lower than even low-impedance headphones!
      • Output power at 33 ohms unbalanced (ASR): 810 mW -> should provide enough current for low-impedance headphones.
  • Topping E30/L30 stack:
    • DAC:
      • Output level at 0 dBFS (ASR): 2.15 Vrms RCA
      • Output impedance: I couldn't find it
    • HP amp:
      • Input sensitivity (spec): 3 Vrms RCA (G=H) -> the same as or higher than the matching DAC's output level, so should be OK!
      • Input impedance (ASR): 2500 ohms RCA -> probably high enough for the L30 since they are matched devices.
      • Output impedance (spec): <0.1 ohms unbalanced -> very good - much lower than even low-impedance headphones!
      • Output power at 33 ohms (ASR): 1040 mW -> should provide enough current for low-impedance headphones.
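
As a quick arithmetic cross-check of the figures above, the two numbers worth computing are the margin by which the X16's RCA output can exceed the H16's rated input sensitivity, and the raw level offset between the two DACs. The values below are simply copied from the list, so they inherit its preproduction caveat:

    # Voltage ratios from the listed figures, expressed in dB (values copied
    # from the list above; treat them as assumptions, not fresh measurements).
    import math

    def db(v1, v2):
        # Voltage ratio in dB.
        return 20 * math.log10(v1 / v2)

    x16_rca = 3.3    # X16 output at 0 dBFS, Vrms (ASR figure above)
    h16_sens = 2.0   # H16 rated RCA input sensitivity, Vrms (spec above)
    e30_rca = 2.15   # E30 output at 0 dBFS, Vrms (ASR figure above)

    # Positive margin = the DAC can overdrive the amp input (clipping risk).
    print(f"X16 over H16 RCA sensitivity: {db(x16_rca, h16_sens):+.1f} dB")  # +4.3 dB
    # Raw output-level offset that level matching would have to absorb.
    print(f"X16 vs E30 output level:      {db(x16_rca, e30_rca):+.1f} dB")   # +3.7 dB
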
Regarding methodology, some ideas:
  1. If you have an ADC on hand, it would be good to use it to do some REW test sweeps at the selected volume level for both stacks - this would ensure level-matching and help check whether the audio chain introduces any obvious artefacts for any reason.
  2. I'd suggest trying to download the source audio files from this online test and using the foobar2000 ABX comparator in WASAPI exclusive mode for your tests. That might help reduce the chance of any misbehaviour on the SW side.
  3. You will also need a willing helper to facilitate a blind test - it will only be single-blind, but still better than sighted! :)
 
Pdxwayne (OP):
(dominikz's post above quoted in full)
Thanks for checking the setups!

Regarding the X16, Amir got a preproduction sample, so its output levels are higher than normal. Production samples' outputs are limited to 2 V max on RCA and 4 V max on XLR. The manufacturer mentioned this somewhere in the long review thread.

As you can see, both pairs are supposed to be transparent.

Good idea about downloading the test file.

Regarding the blind-test file, I think I just need to download the delayed file. I can start with 5 ms.

Once both chains are level-matched using a test tone and a multimeter, the test can start.
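
For that level-matching step, one common approach is to play a low-frequency sine (where typical multimeters still read AC volts reasonably accurately) at each amp's headphone output and trim the volume until both read the same; the residual mismatch follows directly from the two readings. A minimal sketch with hypothetical example readings:

    # Residual level mismatch from two multimeter readings taken at the two
    # headphone outputs with the same test tone. The readings below are
    # hypothetical; ~0.1 dB is a commonly cited matching target.
    import math

    v_chain_a = 0.502  # Vrms at amp A output (hypothetical reading)
    v_chain_b = 0.497  # Vrms at amp B output (hypothetical reading)

    mismatch_db = 20 * math.log10(v_chain_a / v_chain_b)
    print(f"Mismatch: {mismatch_db:+.2f} dB")  # +0.09 dB here - close enough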

Then I just need a helper to flip a coin to decide which chain to use, select the corresponding DAC in my laptop's audio settings, connect the headphones to the matching amp, and play the test file for me in a player.

I will just sit facing away from the setup and listen. I will then decide which chain is which based on what I hear.

Sound reasonable?

Regarding needing a competent helper... well, I suspect I won't get one anytime soon.
I have already done double-blind listening tests twice over the last year-plus. My family members were involved, and they are kind of fed up with all the time and hassle involved. I won't bother my daughter until June, as she is working hard preparing for her senior-year IB exams and all the related work. My wife... let's just say she will be mad at me if I do listening tests a third time, but maybe I can get her to do a 20-round test sometime in the near future, if I bribe her enough.
; )

BTW, I did mean that I want you to be the helper who runs the tests for me, while I just sit and listen.
: P
 
dominikz:
Thanks for checking the setups!
No worries!
Regarding the X16, Amir got a preproduction sample, so its output levels are higher than normal. Production samples' outputs are limited to 2 V max on RCA and 4 V max on XLR. The manufacturer mentioned this somewhere in the long review thread.
That's great then!
As you can see, both pairs are supposed to be transparent.
Agreed, both seem very capable.
Good idea about downloading the test file.

Regarding the blind-test file, I think I just need to download the delayed file. I can start with 5 ms.
Actually, my suggestion is to still download both the delayed and synced files. The idea would be to do an equivalent of the online test, just with a bit more control, and see if your impressions from post #3 remain unchanged under this more rigorous blind test.
Once both chains are level-matched using a test tone and a multimeter, the test can start.

Then I just need a helper to flip a coin to decide which chain to use, select the corresponding DAC in my laptop's audio settings, connect the headphones to the matching amp, and play the test file for me in a player.
Two comments:
  1. After level matching I'd still suggest measuring the output of each stack with an ADC, to have some reference of performance and to verify that neither stack shows any unexpected performance issues in your system.
  2. You should never know which stack you are listening to, and you shouldn't know if the next one is the same as the previous or not. You would ideally do several runs, making notes every time. Only after finishing all runs should the order in which the stacks were presented be revealed to you. Then you would match with your notes and check if your impressions are consistent and match those from post #3.
I will just sit facing away from the setup and listen. I will then decide which chain is which based on what I hear.
IMHO you should not try to identify the stack currently playing; instead for each test run you should mark your impressions in the same way as you did in post #3. Identifying the stacks blind is in my view a slightly different test.

(Pdxwayne's comments above about family helpers, quoted in full)
; )
Peace in the family should come before any listening tests, I agree :D

BTW, I did mean that I want you to be the helper who runs the tests for me, while I just sit and listen.
: P
Thanks, but running the test remotely would probably be difficult, and honestly I'm not willing to spare the time, as I'm personally not that interested in this line of investigation - hope you take no offence at this!
 
Pdxwayne (OP):
(dominikz's replies above quoted in full)
Haha, no problem that you can't help with running the tests. Just poking fun at you.

Even if I could get family members to help and get passing results, some ASR members will still question my results. One member already said he doesn't trust anything done in my home.

Maybe I could get @amirm to host a listening challenge with money involved. That would motivate people to get involved. Something like the $10k challenge.

; )
 
dominikz:
Even if I could get family members to help and get passing results, some ASR members will still question my results. One member already said he doesn't trust anything done in my home.
Of course people would question them - you should be aware that it is very easy to accidentally introduce an error that impacts results in these kinds of tests.

Personally I'd advise against doing so many time-consuming tests just to try and prove something to people online. If instead you do it because you want to learn more and share/verify your findings - that's great! :)
 

sq225917:
Excuse me if I have this wrong.

To be valid, apart from being double-blinded, you'd have to run each stack multiple times out of sequence - not attempt it ten times on one stack and then move on to the next; you'd have to randomise the stack order and run it multiple times.

As it stands, it appears to offer no statistical validity.
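
To put a number on "statistical validity": in a two-alternative forced-choice test, the probability of scoring k or more correct out of n randomized trials by pure guessing follows the binomial distribution, and that tail probability is what a result needs to keep small (conventionally below 0.05). A minimal sketch:

    # Chance probability of k-or-more correct in n forced-choice trials,
    # with guessing probability 0.5 per trial.
    from math import comb

    def p_by_chance(k, n, p=0.5):
        # P(X >= k) for X ~ Binomial(n, p).
        return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

    # Example: 9 correct out of 10 randomized trials.
    print(f"p = {p_by_chance(9, 10):.4f}")  # ~0.0107, below the usual 0.05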
 
Pdxwayne (OP):
(dominikz's reply above quoted in full)
Yup. I wasn't trying to prove anything; I was just curious how well people would do in timing tests.

Then people started to tell me that the 1 ms test was very easy due to a tonality change. That started the whole process of checking all the different combinations - because I was curious, not because I was trying to prove anything.

Then, after I collected all the info, it occurred to me to ask the members here why I sensed different cues - because if a certain cue is better, I might use a certain combination as my primary listening chain.

: )

Thanks for all the great advice!
 

dominikz:
To be valid, apart from being double-blinded, you'd have to run each stack multiple times out of sequence - not attempt it ten times on one stack and then move on to the next; you'd have to randomise the stack order and run it multiple times.
Indeed - I see it the same way. IMHO it would be really important to randomize the stack order. I tried to address this in post #15 but perhaps my wording was a bit ambiguous:
You should never know which stack you are listening to, and you shouldn't know if the next one is the same as the previous or not. You would ideally do several runs, making notes every time. Only after finishing all runs should the order in which the stacks were presented be revealed to you. Then you would match with your notes and check if your impressions are consistent and match those from post #3.
 