
What kind of evidence is sufficient?

amirm

Founder/Admin
Staff Member
CFO (Chief Fun Officer)
Joined
Feb 13, 2016
Messages
44,657
Likes
240,901
Location
Seattle Area
Reading between the lines, it is definitely a truism that for some people, no amount of evidence is enough. Take my measurements. I use an objective measurement device and merely move one set of cables from one audio product to another and press "measure" again. Yet folks come back doubting the comparison saying the results must be cooked, etc.

Likewise, I have been challenged by people saying no one can tell the difference between 320 kbps MP3 and original. I pull out random tracks (mostly ones under discussion), run ABX and show clearly I passed them. Yet people don't believe, demand witnesses, saying I must have edited the results, cooked the process, etc.

So the simple answer is that for some from each camp, nothing is enough.

For people outside of those two bounds, there is some hope. In the case of objectivists and say, comparing two DACs, I am OK if we walk before we run. To wit, on another forum someone said DAC A is better than another. I told him he likely did not match levels but that if he did, and still arrived at the same conclusion, I would give him $50 for his troubles. He claimed he had matched levels (even though he did not say so originally) and I paid up!

The simple rule of what is enough is to not present results to us when the very foundations of what we believe are violated. Don't give us sighted tests. Don't give us unmatched results. Don't tell us stories which clearly sound false. Don't tell us how many years you have been an audiophile.

Run a controlled test where all variables other than fidelity of sound are the same. Or as much as you can.
 

Rod

Addicted to Fun and Learning
Joined
Mar 24, 2018
Messages
744
Likes
332
I want to see a DAC (while playing music) put into the particle accelerator at CERN, and see the atoms of the music explode into their individual parts and be analyzed.
 

Grave

Senior Member
Joined
Jun 30, 2018
Messages
382
Likes
204
Likewise, I have been challenged by people saying no one can tell the difference between 320 kbps MP3 and original. I pull out random tracks (mostly ones under discussion), run ABX and show clearly I passed them. Yet people don't believe, demand witnesses, saying I must have edited the results, cooked the process, etc.

I find this particularly amusing when it happens to me.
 
OP

Jakob1863

Addicted to Fun and Learning
Joined
Jul 21, 2016
Messages
573
Likes
155
Location
Germany
@DuxServit,
I believe the general scientific methods should apply to electronic devices (i.e. audio gear). It should have the usual scientific properties:

(a) Assumptions and methods for analysis must be stated and explained upfront.

(b) The process of the analysis must be clear and transparent.

(c) The published results must be repeatable by others using the identical process.

IMO a reasonable approach, although (c) depends on the objectives: replication is usually considered sufficient, while repeatability becomes a concern if more wide-ranging conclusions are to be drawn.

But basically, you'd consider the evidence from a positive experimental result sufficient if the experiment is designed using proper scientific methodology (i.e. is valid, reliable and objective) and replication was possible, right?
You would not demand additional oversight, other levels of significance or whatever?

@sergeauckland,

<snip>

I could think of many more examples, but so as not to labour the point, in answer to your post, there's no one type of evidence, it entirely depends on circumstances. Certainly, the usual forum posting resulting from a few mates, or one's wife listening from the other room, are not sufficient evidence, or even evidence at all, merely anecdote.

S.

Heeh, "it depends" is my line..... :)
Overall, it seems that you were more or less on the same side as DuxServit (demanding sound scientific rigor while following the Fisherian approach that experimenters shouldn't always do the same thing, but must adapt to the specific questions/hypotheses the experiment should address), so again, you wouldn't demand additional oversight or anything in that direction?

@Dismayed,

@Jakob1863

No, clinical trials are not perfect. There are, by necessity, limitations on sample sizes due to cost, so rarer side effects may not be identified. But just imagine if they were run by subjectivists with no controls whatsoever! And only a bonehead would set a prior probability at zero.

Besides limitations on sample size, these trials often seem to suffer from (more or less) intentional wrongdoing like "p-hacking", and from unintentional errors caused by misunderstanding p-values, CIs and statistics in general, neglecting statistical power considerations, and so on.
Trials run by mere subjectivists could be even worse, although it's more of a myth that all subjectivists use no controls at all.
Edit: just as an example (not meant to be offensive), if someone uses the phrase "impossible nonsense people claim to hear all the time", I'd say that is essentially a prior probability of zero.

@tr1ple6 ,

<snip>
I don't have enough info on the specific example you cited. A link to the original post would be nice and would give more context.

Understood; unfortunately I have to search for it....
 
Jakob1863
@Jakob1863, I think your question would be easier to answer if it were reframed in terms of a particular claim. Otherwise any answer (like the question) will tend to be a little nebulous / so general as to be almost meaningless.

Good hint, but I wanted to be more specific after an initial round of gathering information.
Nevertheless, I should have been clearer that the starting point was already a controlled listening test result (including the double-blind condition).
 
Jakob1863
@Pio2001,

<snip>
The test must take into account the possible number of listeners. If 5 listeners are doing the same ABX test together, the fact that one out of five gets enough right answers to get his p value below 0.05 is no longer significant. Basically the target p value can be divided by the number of listeners. Having one listener out of five scoring p < 0.01 is about equally probable as having one listener alone scoring p < 0.05.

So, you're demanding usage of the Bonferroni correction to control the family-wise error rate.
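
The arithmetic behind this exchange can be checked with a short sketch. It assumes the listeners guess independently and are all judged against the same threshold; the function names are illustrative and not taken from any ABX tool.

```python
def family_wise_error_rate(alpha: float, listeners: int) -> float:
    """Chance that at least one of `listeners` independent guessers
    reaches p < alpha purely by luck."""
    return 1.0 - (1.0 - alpha) ** listeners


def bonferroni_threshold(alpha: float, listeners: int) -> float:
    """Per-listener threshold that keeps the family-wise rate at or below alpha."""
    return alpha / listeners


if __name__ == "__main__":
    # Five listeners, each tested at the uncorrected 0.05 level.
    print(f"{family_wise_error_rate(0.05, 5):.3f}")
    # Five listeners, each tested at the corrected level 0.05 / 5 = 0.01.
    print(f"{family_wise_error_rate(bonferroni_threshold(0.05, 5), 5):.3f}")
```

Under these assumptions, five listeners tested at 0.05 give roughly a 23% chance of at least one lucky pass, while tightening the threshold to 0.01 brings that back to about 4.9%, close to the single-listener 0.05 figure in the quoted post.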

<snip>
And last, everyone has his own background, which means that what is significant for someone is not necessarily significant for someone else. I, for example, would not consider a successful ABX between interconnects as a proof that interconnects have an effect on the sound for any value of p above 0.0001.

Combined for all results including any replication or as level of significance in each isolated experiment?

For the record, I have already seen an ABX test succeeding with p < 0.002 while absolutely nothing was tested. The operator was just playing with the software, randomly hitting the "X is A" and "X is B" buttons! The more you see ABX tests, the less likely you are to be convinced by an isolated success.

Sure, "random guessing" is "random guessing" and in the long run every possible result will most likely occur sometime, or will occur surely if the long run approaches infinity. :)
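
As a rough illustration of how such a p value arises, here is a minimal sketch of the usual one-sided binomial calculation for an ABX session, assuming every trial is an independent 50/50 guess. The function names are made up for this example.

```python
from math import comb


def abx_p_value(correct: int, trials: int) -> float:
    """One-sided probability of scoring at least `correct` out of `trials`
    when every answer is a 50/50 guess."""
    if not 0 <= correct <= trials:
        raise ValueError("correct must be between 0 and trials")
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials


def expected_sessions_until_fluke(p_value: float) -> float:
    """Average number of pure-guessing sessions needed before one of them
    scores at least this well (mean of a geometric distribution)."""
    if not 0 < p_value <= 1:
        raise ValueError("p_value must be in (0, 1]")
    return 1.0 / p_value


if __name__ == "__main__":
    print(f"{abx_p_value(12, 16):.4f}")          # 12 of 16 correct
    print(expected_sessions_until_fluke(0.002))  # sessions of pure guessing
```

On these assumptions, a pure guesser reaches p < 0.002 no more often than about once in 500 sessions on average, which is rare for one person but not across everyone who plays with the software.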

...but remember the importance of the context: I may admit nonetheless that it is a proof that this interconnect did change the sound in this particular setup. I have already heard a preamplifier that had been so much tweaked that it would produce audible noises according to the impedance of the source ...and according to the interconnects used.

If I understand it correctly, your significance criterion varies depending on prior beliefs, correct?
I have to admit that in this context the example above with audible noise puzzles me a bit....

As a reminder, we have to strictly separate the terms "ABX test" and "software tool called ABX".
Doing an ABX test means exactly listening to A, then listening to B, then listening to X, and then giving the answer to the trial question.
Although not used in the original ABX method (invented in ?1947?), repeated listening to A, B and X before doing the final ABX sequence would IMO still qualify as an ABX test, although the implications for the statistical analysis can be quite interesting if such variations from the original protocols are implemented.

Doing something different, like listening only to A and X, means just using the software tool called ABX but no longer doing an ABX test.
The above example of listening only to A and X would be equivalent to doing an incomplete same/different test.
 

andreasmaaan

Master Contributor
Forum Donor
Joined
Jun 19, 2018
Messages
6,652
Likes
9,406
Good hint, but I wanted to be more specific after an initial round of gathering information.
Nevertheless I should have been more clear that the basis was already a controlled listening test result (including the double blind condition).

Ok yeh fair enough :) I maintain that the question of what is sufficient evidence may vary depending on the claim being made. But you have started an interesting discussion.

Anyway, I think there have already been some very good general answers and some very good responses from you, so I'll step aside to let the thread continue to develop...
 

Sal1950

Grand Contributor
The Chicago Crusher
Forum Donor
Joined
Mar 1, 2016
Messages
14,195
Likes
16,918
Location
Central Fl
The simple rule of what is enough is to not present results to us when the very foundations of what we believe are violated. Don't give us sighted tests. Don't give us unmatched results. Don't tell us stories which clearly sound false. Don't tell us how many years you have been an audiophile.

Run a controlled test where all variables other than fidelity of sound are the same. Or as much as you can.

A perfectly defined set of rules, minus one point: to be definitive, the results should also be repeatable.
If an A-B test resulted in 95% correct answers, it is obvious that result should be repeatable.
If it was not repeatable, we could probably assume the presenter was not honest in his presentation in some way.
 

andreasmaaan

Here’s an anecdote that doesn’t go anywhere near answering the OP, but may be relevant. I design speakers so I’m interested in the audibility of the type of group delay caused by crossover filters. I’d surveyed the main studies (Blauert etc) and was fairly confident that typical filter slopes could not cause audible group delay, certainly not in speakers, and probably not through headphones either.

I then came across an online program that allows you to blind ABX test customisable digital filters that add degrees of group delay corresponding to Butterworth and LR crossover slopes at any frequency chosen, using WAV source material also of your choosing. The name of the little program escapes me now, but maybe others have also come across it? It was the better type of ABX tester in that you could flick back and forth between A, B and X whenever and as many times as you liked, and I could also of course adjust the master volume on my headphone amp.

Anyway, I did the tests on headphones (my girlfriend’s in this case - AKG somethings, nice but not exceptional phones) and was surprised to find that after just one early error, I was able to correctly differentiate a simulated 4th order LR crossover at 1200 Hz over 20 consecutive times.

This should be unlikely to be possible according to much published literature. But after a bit of practice, the difference became so clear to me that I didn’t bother continuing. In fact, I quickly worked out a technique that involved switching between A or B and X in the middle of a held vocal note that always seemed to work for me.

What can I draw from this? Perhaps I’m much more sensitive than average to group delay in this frequency range? Perhaps, in processing the WAV files, the software introduced some other form of audible distortion that I was able to detect? Perhaps something else...

Anyway, I need to revisit this thing, it’s one of the things I’ve been meaning to get around to doing. I use this example now simply to illustrate that even with a lot of controls in place, it can be very difficult to say with confidence that the outcome of even what appears to be a controlled, single, self-administered DBT can reliably prove a claim.
 

SIY

Grand Contributor
Technical Expert
Joined
Apr 6, 2018
Messages
10,511
Likes
25,340
Location
Alfred, NY
What would be great is if you could write up exactly what you did, how the experiment was set up, source material, software, software settings, etc, etc, etc, so the experiment can be examined and (if no significant flaws are found) replicated.
 

amirm

Anyway, I did the tests on headphones (my girlfriend’s in this case - AKG somethings, nice but not exceptional phones) and was surprised to find that after just one early error, I was able to correctly differentiate a simulated 4th order LR crossover at 1200 Hz over 20 consecutive times.

Phase shifts are audible with headphones. It is in rooms with reflections where audibility becomes very difficult. Here is Dr. Toole from his book:

Many investigators over many years have attempted to determine whether phase shift mattered to sound quality (e.g., Greenfield and Hawksford, 1990; Hansen and Madsen, 1974a, 1974b; Lipshitz et al., 1982; Van Keulen, 1991). In every case, it has been shown that if it is audible, it is a subtle effect, most easily heard through headphones or in an anechoic chamber, using carefully chosen or contrived signals. There is quite general agreement that with music reproduced through loudspeakers in normally reflective rooms, phase shift is substantially or completely inaudible. When it has been audible as a difference, when it is switched in and out, it is not clear that listeners had a preference.

Amir
 

andreasmaaan

Phase shifts are audible with headphones. It is in rooms with reflections where audibility becomes very difficult. Here is Dr. Toole from his book:

<snip>

That’s of course true, but from memory the threshold of audibility found using headphones in those studies was generally higher (greater than 1 or 2 ms in the upper midrange) than the threshold I seemed to manage very easily when experimenting on my own (I’m on my phone atm so can’t calculate the number, but 360 degrees of phase shift at 1200 Hz would definitely be less than 1 ms).

I have to admit though, I don’t have my laptop in front of me and am a bit fuzzy on a lot of the details from these studies.

I’ll send whatever further details I can find/remember when I’m back home on the laptop...
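
The back-of-envelope figure in this post can be checked with a small sketch. It only converts a phase shift at one frequency into the equivalent time shift of a sine wave (a phase delay); it is not a model of the group delay of any particular crossover, and the function name is invented for illustration.

```python
def phase_to_time_ms(phase_degrees: float, frequency_hz: float) -> float:
    """Time shift, in milliseconds, equal to `phase_degrees` of a sine
    wave at `frequency_hz`."""
    if frequency_hz <= 0:
        raise ValueError("frequency_hz must be positive")
    period_ms = 1000.0 / frequency_hz  # one full cycle, in milliseconds
    return period_ms * phase_degrees / 360.0


if __name__ == "__main__":
    # One full cycle (360 degrees) at 1200 Hz.
    print(f"{phase_to_time_ms(360.0, 1200.0):.3f} ms")
```

One full cycle at 1200 Hz lasts about 0.83 ms, which is consistent with the "less than 1 ms" estimate above.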
 

andreasmaaan

What would be great is if you could write up exactly what you did, how the experiment was set up, source material, software, software settings, etc, etc, etc, so the experiment can be examined and (if no significant flaws are found) replicated.

I will do this as soon as I get a chance, sure :)
 

andreasmaaan

One further perhaps interesting detail: in every case, I preferred the sound without added group delay.
 

Pio2001

Senior Member
Joined
May 15, 2018
Messages
317
Likes
507
Location
Neuville-sur-Saône, France
So, you´re demanding usage of the Bonferroni correction to control the family wise error rate.

I didn't know it was called like that :cool:

Combined for all results including any replication or as level of significance in each isolated experiment?

In each isolated experiment (one day, one place), but including all present listeners.

If i understand it correctly, your significance criterion varies in dependence of prior beliefs, correct?

Yes, although I'd say "knowledge" rather than "belief".

It's nothing else than the "extraordinary claims require extraordinary evidence" principle. If someone tells me he's annoyed by his tweeter playing 0.5 dB louder than his woofer, I'd believe it with p < 0.05. But if he says that he's annoyed by the harsh sound of his silver speaker cables over the much better sounding copper ones, p = 0.01 would not be enough to convince me. That would merely make me accept to have a look at the measurements.

I have to admit that in this context the example above with audible noise puzzles me a bit....

The noises were soft hissing with clicks. It would change with the variable volume control of the CD player output. And it was different with different interconnects. We knew that the preamplifier's circuitry had been completely tweaked by a mad audiophile. Obviously, it was converting radio-frequencies into audible artifacts, and interconnects of different length and shielding were acting as different antennas for them.

Doing an ABX test means exactly listening to A, then listening to B then listening to X and then giving the answer to the trial question.

In this case I'm not using them, nor advising them. With a forced answer test with limited time, the bad excuse of blind listening tests being stressful becomes a good excuse! :eek:
 
Jakob1863
<snip>
I don't have enough info on the specific example you cited. A link to the original post would be nice and would give more context.

It was this line that drew my attention:
And I eagerly await the posting of results from supervised bias controlled blind listening tests proving the audible value of any such products in any reasonably well designed system.
(bold added by me)
 

vert

Active Member
Forum Donor
Joined
May 30, 2018
Messages
285
Likes
258
Location
Switzerland
Likewise, I have been challenged by people saying no one can tell the difference between 320 kbps MP3 and original. I pull out random tracks (mostly ones under discussion), run ABX and show clearly I passed them. Yet people don't believe, demand witnesses, saying I must have edited the results, cooked the process, etc.

Interesting! So you think there is a difference? I was similarly challenged by someone on head-fi some time ago. He was an audio expert or liked to think of himself as one, and a great admirer of Apple's iPhone. Other than the equivalent quality, he kept mentioning gain of space as a side benefit of 320 kbps (to which I replied storage space wasn't an issue to an Android user). So I got challenged to take some kind of scientific (pseudo-scientific?) procedure of his own choosing to satisfy his claim. Instead I offered to rip a 320 kbps track and compare it to the same track in FLAC and try an honest comparison. It was a song by Brazil's Caetano Veloso, a world-class performer always surrounded by top musicians. I chose to focus on the very sophisticated percussion part in that song. The loss of detail was very obvious. In brief, the FLAC version made it possible to hear the percussionist play actual melodies; on the 320 kbps version there were just thuds with no sense of melody. I reported my "findings", suggesting he perform the same test, but he never bothered to even reply. Not scientific enough I guess. I remember him mentioning level matching, I have no idea if my levels were matched, but I doubt it would have made a difference in this case: when a detail is not there, it is not there.

That claim seems to surface quite often. It appeared recently on a YouTube channel I subscribe to; the author was justifying the fact that his new recording would only be available in 320 kbps, and the purpose of the episode was to prove there was no difference (if you thought differently you were an "audiophool").
 

SIY

I remember him mentioning level matching, I have no idea if my levels were matched, but I doubt it would have made a difference in this case: when a detail is not there, it is not there.

Level matching is critical.
 

sergeauckland

Major Contributor
Forum Donor
Joined
Mar 16, 2016
Messages
3,460
Likes
9,158
Location
Suffolk UK
Level matching is critical.
Yes it is, however, if a piece of audio is converted from WAV (CD) to MP3 to FLAC, the audio levels don't get changed, so they are inherently level matched. Unlike analogue, where any medium conversion can affect amplitude, digital conversion won't unless deliberately invoked.

Edit:- However, one problem with starting from a commercial CD is that many if not most commercial CDs have clipping; even if the CD levels don't reach 0 dBFS, there's still a lot of 'flat-topping' at lower levels. An MP3 encoder doesn't do well with flat-topped audio as it was never designed to deal with that, so I'm not surprised if an MP3 of whatever bit rate was audibly different if the source was a commercial CD, unless it was one known not to have any flat-topping or limiting. Ideally it will be a live recording done with no EQ or compression.

S.
 

SIY

Yes it is, however, if a piece of audio is converted from WAV (CD) to MP3 to FLAC, the audio levels don't get changed, so they are inherently level matched. Unlike analogue, where any medium conversion can affect amplitude, digital conversion won't unless deliberately invoked.

I haven't used every piece of conversion software out there, so this may be universally true, but I don't know. Nonetheless, given the criticality and repeatedly demonstrated audibility of small level changes, I'd still spend a minute or two verifying it before beginning a test.
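
A quick check along these lines could look like the sketch below. It assumes both files have already been decoded to lists of float samples covering the same passage, and it uses a 0.1 dB tolerance as an assumed default rather than a figure from this thread; all names are illustrative.

```python
import math
from typing import Sequence


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square amplitude of a block of samples."""
    if not samples:
        raise ValueError("samples must not be empty")
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def level_difference_db(a: Sequence[float], b: Sequence[float]) -> float:
    """RMS level of `a` relative to `b` in dB (positive means `a` is louder)."""
    rms_a, rms_b = rms(a), rms(b)
    if rms_a == 0.0 or rms_b == 0.0:
        raise ValueError("cannot compare against digital silence")
    return 20.0 * math.log10(rms_a / rms_b)


def is_level_matched(a: Sequence[float], b: Sequence[float],
                     tolerance_db: float = 0.1) -> bool:
    """True when the two signals differ by no more than `tolerance_db`.
    The 0.1 dB default is an assumption made for this sketch."""
    return abs(level_difference_db(a, b)) <= tolerance_db


if __name__ == "__main__":
    tone = [math.sin(2 * math.pi * k / 48) for k in range(480)]
    quieter = [0.5 * s for s in tone]  # half the amplitude
    print(f"{level_difference_db(quieter, tone):.2f} dB")
    print(is_level_matched(quieter, tone))
```

Comparing overall RMS is only a coarse check, since a lossy encode also changes the waveform itself, but it is enough to catch a converter that silently applies gain.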
 