
What kind of evidence is sufficient?

Jakob1863 (Addicted to Fun and Learning) · Joined Jul 21, 2016 · Messages 573 · Likes 155 · Location Germany
People often complain that evidence for the questionable, so-called "audiophile" effects is missing - mainly because their personal belief differs - but statements about when evidence would be considered sufficient are rare or non-existent.

Laplace already wrote at the beginning of the 19th century about the human tendency to develop strong opinions even while knowing little about a subject.
I read in one of Gigerenzer's publications the idea (although I don't know if he invented it) that we all mainly act as a special kind of Bayesians, which basically means we establish prior beliefs/probabilities - see in this context the Laplace assertion - that might be changed by "something" into posterior beliefs/probabilities.

Clearly, if someone sets the prior probability at zero (which seems to happen quite often among "non golden ears"), no evidence will ever be able to change that, because of Bayes' formula.
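To make the zero-prior point concrete, here is a minimal numeric sketch (the likelihood values are invented purely for illustration): any non-zero prior can be moved by strong evidence, but a prior of exactly zero stays at zero no matter what the data say.

```python
def posterior(prior, p_data_given_h, p_data_given_not_h):
    """Bayes' theorem for a single hypothesis H versus not-H."""
    evidence = p_data_given_h * prior + p_data_given_not_h * (1 - prior)
    return p_data_given_h * prior / evidence

# Hypothetical likelihoods: the observed listening-test result is assumed to be
# 50x more probable if the audible difference is real than if it is not.
p_obs_if_real, p_obs_if_not = 0.5, 0.01

print(posterior(0.10, p_obs_if_real, p_obs_if_not))  # ~0.85: evidence shifts even a sceptical prior
print(posterior(0.0,  p_obs_if_real, p_obs_if_not))  # 0.0: a zero prior can never be moved
```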

But besides that mostly unreasonable approach, what evidence would you consider sufficient?
In another thread I read that a member "demands" that controlled listening tests must be supervised, but who should do that, and does it really have to be?
 

Wombat (Master Contributor) · Joined Nov 5, 2017 · Messages 6,722 · Likes 6,463 · Location Australia
People often complain that evidence for the questionable, so-called "audiophile" effects is missing - mainly because their personal belief differs - but statements about when evidence would be considered sufficient are rare or non-existent.

Laplace already wrote at the beginning of the 19th century about the human tendency to develop strong opinions even while knowing little about a subject.
I read in one of Gigerenzer's publications the idea (although I don't know if he invented it) that we all mainly act as a special kind of Bayesians, which basically means we establish prior beliefs/probabilities - see in this context the Laplace assertion - that might be changed by "something" into posterior beliefs/probabilities.

Clearly, if someone sets the prior probability at zero (which seems to happen quite often among "non golden ears"), no evidence will ever be able to change that, because of Bayes' formula.

But besides that mostly unreasonable approach, what evidence would you consider sufficient?
In another thread I read that a member "demands" that controlled listening tests must be supervised, but who should do that, and does it really have to be?


Make a claim? Put up or shut up seems reasonable to me. :rolleyes:
 

Dismayed (Senior Member) · Joined Jan 2, 2018 · Messages 392 · Likes 415 · Location Boston, MA
Audiophile writers have done us a great service - by staying out of real science. Imagine if they were in charge of clinical trials in the pharmaceutical industry!
 

tr1ple6 (Active Member) · Joined Mar 19, 2016 · Messages 253 · Likes 275
The burden of proof is on the person making the claim. The evidence provided should be sufficient to prove whatever claim is made. "The weight of evidence for an extraordinary claim must be proportioned to its strangeness." - Pierre-Simon Laplace. Here is an interesting article on Bayes' Theorem.

Of course anyone can just make wild claims and provide no evidence to back it up. Then Hitchens' Razor applies: "That which can be asserted without evidence can be dismissed without evidence"
 
Jakob1863 (OP, Addicted to Fun and Learning) · Joined Jul 21, 2016 · Messages 573 · Likes 155 · Location Germany
@Wombat,

Make a claim? Put up or shut up seems reasonable to me. :rolleyes:

As I asked about "what kind of evidence is sufficient", methinks "put up or shut up..." doesn't help, because we still don't know what (exactly, or at least roughly) should be "put up".

@Dismayed,

Audiophile writers have done us a great service - by staying out of real science. Imagine if they were in charge of clinical trials in the pharmaceutical industry!

People examining the quality of those "clinical trials in the pharmaceutical industry" (or other trials involving human subjects) regularly bemoan the high percentage of scientific studies whose findings are probably wrong.

@tr1ple6,

The burden of proof is on the person making the claim. The evidence provided should be sufficient to prove whatever claim is made. "The weight of evidence for an extraordinary claim must be proportioned to its strangeness." - Pierre-Simon Laplace. Here is an interesting article on Bayes' Theorem.

Of course anyone can just make wild claims and provide no evidence to back it up. Then Hitchens' Razor applies: "That which can be asserted without evidence can be dismissed without evidence"

Of course, but still: what kind of evidence would be considered sufficient to fulfil the burden of proof with respect to the usual claims of audibility?

The example I mentioned did not only demand controlled "blind" listening tests but additionally that they should be supervised...
 

DuxServit (Senior Member, Forum Donor) · Joined Apr 21, 2018 · Messages 428 · Likes 508
But besides that mostly unreasonable approach, what evidence would you consider sufficient?

I believe the general scientific method should apply to electronic devices (i.e. audio gear). The evidence should have the usual scientific properties:

(a) Assumptions and methods for analysis must be stated and explained upfront.

(b) The process of the analysis must be clear and transparent.

(c) The published results must be repeatable by others using the identical process.
 

sergeauckland (Major Contributor, Forum Donor) · Joined Mar 16, 2016 · Messages 3,456 · Likes 9,145 · Location Suffolk UK
People often complain that evidence for the questionable, so-called "audiophile" effects is missing - mainly because their personal belief differs - but statements about when evidence would be considered sufficient are rare or non-existent.

----snipped for brevity----
But besides that mostly unreasonable approach, what evidence would you consider sufficient?
In another thread I read that a member "demands" that controlled listening tests must be supervised, but who should do that, and does it really have to be?

For me, it depends on what it is I'm trying to prove/discover/understand.

For example, if I've just repaired an amplifier that had a blown output transistor, sufficient proof that the amplifier had been repaired properly would be that all the voltages on the PCBs were correct, the output power was to spec, and the distortion and noise were to spec. I would not need to carry out a full performance analysis or set up blind listening tests to be sure: if the amplifier was working OK before the breakdown, it should be OK after the repair.

If I was trying to decide what MP3 data rate is sufficient for my needs, then some blind ABBA testing of different data rates and with different types of music will suffice. I would not need a large pool of candidates taking the test, as I'm only interested in my own criterion. However, if I was being told that an MP3 at X bps was horrible, and no MP3 was any good at any data rate, as I've seen on some forums, then I would want to see the results of a much wider set of blind tests, not just with one listener, to show at what point (if at all) an MP3 becomes transparent. These would have to be done with academic rigour, not just a few mates having a drink and listening to a few tunes.

I could think of many more examples, but so as not to labour the point: in answer to your post, there's no one type of evidence; it entirely depends on circumstances. Certainly, the usual forum postings resulting from a few mates, or one's wife listening from the other room, are not sufficient evidence, or even evidence at all, merely anecdote.

S.
 

tr1ple6 (Active Member) · Joined Mar 19, 2016 · Messages 253 · Likes 275
----snipped for brevity----
Of course, but still: what kind of evidence would be considered sufficient to fulfil the burden of proof with respect to the usual claims of audibility?

The example I mentioned did not only demand controlled "blind" listening tests but additionally that they should be supervised...
It all depends on the nature of the claim. If someone claims that they like a specific product, then I may accept their claim at face value. If they claim that they hear a subjective difference between product A and product B, I will probably probe them for more information about how they tested to come to such a conclusion. Not all claims are created equal, so the evidence needs to be proportionate to the claim being made.

I don't have enough info on the specific example you cited. A link to the original post would be nice and would give more context.
 

Grave (Senior Member) · Joined Jun 30, 2018 · Messages 382 · Likes 204
Well, blind tests are certainly necessary because of all the impossible nonsense people claim to hear, which never ends.

"The published results must be repeatable by others using the identical process."

I am not so sure about this, since I passed blind tests between high-bit-rate lossy and lossless years ago, even though many people claimed that lossy should be transparent even at low bit rates. I have not used lossy since.
 

sergeauckland (Major Contributor, Forum Donor) · Joined Mar 16, 2016 · Messages 3,456 · Likes 9,145 · Location Suffolk UK
It all depends on the nature of the claim. If someone claims that they like a specific product, then I may accept their claim at face value. If they claim that they hear a subjective difference between product A and product B, I will probably probe them for more information about how they tested to come to such a conclusion. Not all claims are created equal, so the evidence needs to be proportionate to the claim being made.

I don't have enough info on the specific example you cited. A link to the original post would be nice and would give more context.
Exactly. What one likes has nothing to do with what's good, better, or best. It's a subjective opinion which includes looks, feel, ergonomics and, yes, sound, but that can come a long way down. Saying they hear a difference can be tested objectively, and is mere hearsay without a proper set of tests carried out with some rigour.

S.
 

SIY (Grand Contributor, Technical Expert) · Joined Apr 6, 2018 · Messages 10,480 · Likes 25,224 · Location Alfred, NY
I am not so sure about this, since I passed blind tests between high-bit-rate lossy and lossless years ago, even though many people claimed that lossy should be transparent even at low bit rates. I have not used lossy since.

Who would that be?
 

Dismayed (Senior Member) · Joined Jan 2, 2018 · Messages 392 · Likes 415 · Location Boston, MA
@Jakob1863

No, clinical trials are not perfect. There are, by necessity, limitations on sample sizes due to cost, so rarer side effects may not be identified. But just imagine if they were run by subjectivists with no controls whatsoever! And only a bonehead would set a prior probability at zero.
 

Pio2001 (Senior Member) · Joined May 15, 2018 · Messages 317 · Likes 507 · Location Neuville-sur-Saône, France
But besides that mostly unreasonable approach, what evidence would you consider sufficient?

For me, an ABX test is required (or any other setting, like AXY, ABC/HR, etc.).

The test must be double blind, which means that the listener can't guess what he or she is listening to by any means other than the sound alone. This includes any attitude, hesitation, or movement by the operator. That's why it is often said that for the test to be double blind, the operator must be in another room. For me this is a bit too much. It is enough if the operator stands behind the listener and takes care not to express surprise or boredom when the listener makes a mistake.

The test must be randomized. That means that the X samples must not be the choice of the operator, but the result of a random decision, with a die or a coin flip, for example. Otherwise, the p-values will be underestimated.
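A minimal sketch of the kind of randomization meant here (my own illustration, not taken from any particular ABX software; the function name is hypothetical): each trial's X is assigned by an independent fair coin flip, so neither the operator nor the listener can influence or predict the sequence.

```python
import secrets

def draw_x_sequence(n_trials):
    """Assign X to A or B for each trial by an independent fair coin flip."""
    return ["A" if secrets.randbelow(2) == 0 else "B" for _ in range(n_trials)]

print(draw_x_sequence(16))  # e.g. ['B', 'A', 'A', 'B', ...]
```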

The test must clearly distinguish what is training and what is the real test. If someone fails, says "this one doesn't count", and starts again 20 times in a row, the probability that at least one run will succeed with p < 0.05 (that is, one out of 20) is high. If the listener then stops there and claims "this was the right one", he can prove anything.

The test must take into account the number of listeners. If 5 listeners are doing the same ABX test together, the fact that one out of five gets enough right answers to bring his p-value below 0.05 is no longer significant. Basically, the target p-value can be divided by the number of listeners. Having one listener out of five score p < 0.01 is about as probable as having one listener alone score p < 0.05.
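To put rough numbers on the last two paragraphs, here is a short sketch of my own (not Pio2001's calculation): the exact one-sided binomial p-value for k correct answers out of n forced-choice trials, the chance that 20 restarted runs produce at least one "p < 0.05" by guessing alone, and the comparison of a single listener at p < 0.05 with the best of five listeners at p < 0.01.

```python
from math import comb

def abx_p_value(k, n):
    """One-sided exact binomial p-value: probability of at least k correct out of n under pure guessing."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2**n

print(abx_p_value(12, 16))       # ~0.038: 12 of 16 correct just clears the usual 0.05 threshold
print(1 - 0.95**20)              # ~0.64: chance that 20 restarted runs yield at least one "p < 0.05" by luck alone
print(1 - 0.95**1, 1 - 0.99**5)  # 0.05 vs ~0.049: one listener at 0.05 is roughly as likely as the best of five at 0.01
```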

The context must also be taken into account. The Spanish association Matrix-hifi has performed tens of ABX tests. The presence among them of one success with p < 0.05 is expected even if the listeners are doing nothing but guessing. This is not significant in this context, even if there was only one listener in that test.
But the listener can consider the result significant if this was his first and only ABX test.
It can also become significant if other information is given, for example if this is their only ABX test between speakers.

And last, everyone has his own background, which means that what is significant for one person is not necessarily significant for someone else. I, for example, would not consider a successful ABX between interconnects as proof that interconnects have an effect on the sound for any value of p above 0.0001.
For the record, I have already seen an ABX test succeed with p < 0.002 while absolutely nothing was being tested. The operator was just playing with the software, randomly hitting the "X is A" and "X is B" buttons! The more ABX tests you see, the less likely you are to be convinced by an isolated success.
...but remember the importance of context: I may nonetheless admit that it is proof that this interconnect did change the sound in this particular setup. I have already heard a preamplifier that had been so heavily tweaked that it would produce audible noises depending on the impedance of the source ...and on the interconnects used.

That's why, when an audiophile claim is made about the sound of cables or DACs, I first look at the measurements. Measurements are easier to make than double-blind tests, and if they can tell us the answer, then there is no need to go further.
ABX tests are useful when measurements show nothing, but listening shows a clear difference.
 

Pio2001 (Senior Member) · Joined May 15, 2018 · Messages 317 · Likes 507 · Location Neuville-sur-Saône, France
By the way, these were the restrictions. On the other hand, I am OK with allowing great freedom for the listener on the following points:

Choice of musical samples: the listener can listen to anything that he sees fit.
Duration: any duration is correct as long as the test is double blind.
Repetition: the listener is not forced to give an answer after one listening. If he needs time, if he needs to listen again, no problem.
Answers monitoring: no problem with giving the right answers right after each trial, as long as this information is not used to change the total number of trials or the significance threshold (see the sketch after this list).
Other changes: no problem with switching to another musical sample and performing other training sessions between two trials, as long as the first trials are not dismissed.
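To illustrate the answers-monitoring point, here is a small Monte Carlo sketch of my own (assuming a 16-trial ABX and a 0.05 threshold): a pure guesser who is allowed to stop as soon as the running p-value dips below 0.05 "succeeds" noticeably more often than the nominal 5%, while the fixed-length test holds its level.

```python
import random
from math import comb

def p_value(k, n):
    """Exact one-sided binomial p-value for k correct out of n under pure guessing."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2**n

def guessing_run(max_trials=16, alpha=0.05, early_stop=True):
    """Simulate a guessing listener; optionally stop as soon as the running p-value crosses alpha."""
    correct = 0
    for n in range(1, max_trials + 1):
        correct += random.random() < 0.5
        if early_stop and p_value(correct, n) < alpha:
            return True
    return p_value(correct, max_trials) < alpha

random.seed(1)
runs = 20_000
print(sum(guessing_run(early_stop=True) for _ in range(runs)) / runs)   # noticeably above 0.05: peeking inflates false positives
print(sum(guessing_run(early_stop=False) for _ in range(runs)) / runs)  # ~0.04: the fixed-length test stays at its nominal level
```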
 

SIY (Grand Contributor, Technical Expert) · Joined Apr 6, 2018 · Messages 10,480 · Likes 25,224 · Location Alfred, NY
Pio, why ABX as opposed to other double blind test formats, depending on the variable to be tested?
 

Pio2001 (Senior Member) · Joined May 15, 2018 · Messages 317 · Likes 507 · Location Neuville-sur-Saône, France
You mean why ABX and not AB vs AA, for example?
The ABX setup is not: play A, then B, then X, then answer. Rather, you have access to A, B, X and even Y as many times as you want, in any order you want.

Our audio comparisons are different from usual scientific experiments in two ways: first, the same listener performs all the trials one after another, while in scientific research the most common configuration is one subject / one trial.

Then, we are not looking at an average discrimination threshold over a given population. We are looking for the smallest possible detectable change by the most sensitive subject under the most favorable conditions. The ABX setup (actually ABXY in Foobar's ABX module) offers many possibilities to the listener.
The possibility to listen to A and B at any time allows the subject to perform additional comparisons that can help him. For example, in an XX vs XY test, the subject doesn't know which sample contains the audible clue that he is used to. In ABX, the A and B samples are known beforehand.

The subject is free to listen to A-X-B-X-A-X-B-X and give one answer.
Or he can choose the order A-B-X-Y (if the ABX module offers Y too).
Or just X, which is useful when the difference is obvious and you want to quickly reach very high scores, like 16/16 or even 50/50.

The idea is to give the maximum freedom to the listener, so that differences that might have been overlooked because they were too small to be recognized several times in a row can be successfully recognized using the playback order and duration that suit the listener.
 

SIY (Grand Contributor, Technical Expert) · Joined Apr 6, 2018 · Messages 10,480 · Likes 25,224 · Location Alfred, NY
You mean why ABX and not AB vs AA, for example?

Or triangle. Or sorting. Or paired forced choice. Or... well, there are quite a few different formats of controlled sensory test available, depending on the question to be answered. A procrustean approach of "only one type of format is universally valid" is unnecessarily limiting.
 

Pio2001 (Senior Member) · Joined May 15, 2018 · Messages 317 · Likes 507 · Location Neuville-sur-Saône, France
Any other format would count as proof for me, too.
I'm used to ABX because it is the most widely available as freeware, and because it seems to be the easiest one.
 

andreasmaaan (Master Contributor, Forum Donor) · Joined Jun 19, 2018 · Messages 6,652 · Likes 9,403
@Jakob1863, I think your question would be easier to answer if it were reframed in terms of a particular claim. Otherwise any answer (like the question) will tend to be a little nebulous, or so general as to be almost meaningless.
 