We have had at least one thread on the topic. Read what is said on page 2, especially by J_J in post #27. People speak of a properly done double blind test; post #27 summarizes what is needed. I think very few of us have ever done any such thing.
OK, my fingers are tired. :) I hope others start to contribute and we have a starting point.... I will be super disappointed if this work does not conclude and members don't contribute to it. OP is doing us a great favor by creating this doc so that we can reference in the future than...
So if you aren't doing a proper test, are the results worthless or still worthwhile, and how careful should you be about what you claim? A score of 63 out of 100 means there is only about a 1% chance the result is random rather than real. You need 16 out of 20 to reach the same confidence in a shorter test. The audible difference likely has to be larger to be sure with only 20 trials than with more trials. That means there are some edge cases where you get a null result with 20 trials when the truth is not a null (type 2 errors).
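As a sanity check on those numbers, here is a quick sketch using only Python's standard library, assuming a one-tailed binomial test against chance (p = 0.5). The 70% "true detection rate" in the last part is a hypothetical listener, just to illustrate the type 2 error point:

```python
from math import comb

def binom_tail(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p): the chance of scoring
    k or more correct out of n trials at success rate p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Chance of scoring this well by pure guessing (p = 0.5):
print(f"63/100 by chance: p = {binom_tail(63, 100):.4f}")  # ~0.006, under 1%
print(f"16/20  by chance: p = {binom_tail(16, 20):.4f}")   # ~0.006 as well

# Type 2 error side: a hypothetical listener who truly hears the
# difference 70% of the time still misses the 16/20 bar on most runs,
# but almost always clears 63/100 with the longer test:
print(f"power at n=20:  {binom_tail(16, 20, 0.7):.2f}")    # ~0.24
print(f"power at n=100: {binom_tail(63, 100, 0.7):.2f}")   # ~0.95
```

So with only 20 trials, a real but modest ability fails to show up roughly three times out of four, which is exactly the edge-case null result described above.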
For myself, I find doing 10 of your usual ABX trials relatively doable without terrible tedium. If I do two of those runs and score 20 of 20, I feel sure I can hear something. But even that is not a "properly done ABX test" by research criteria.
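For scale, the odds of getting a perfect 20-of-20 across two such runs by coin-flipping alone work out to well under one in a million:

```python
# Probability of guessing all 20 trials correctly by chance alone (p = 0.5 each):
p_lucky = 0.5 ** 20
print(f"20/20 by luck: {p_lucky:.2e}")  # about 9.5e-07
```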
PS: I too would like for Thorsten Loesch to give us one example of what he considers a good way to do it. He answered in one post (I think in another thread) in general terms, but people would have a better idea if we had a real example, or one he can imagine meeting his standards. He seems uninterested in providing that.
PPS: This is what Mr. Loesch replied earlier about what is needed.
Why is it so hard to get my position right? I criticise Audio ABX as practiced on a number of grounds, all straightforward and all related to easy-to-understand flaws, including methodology and use of statistics. The result of these flaws is that the Audio ABX test is very heavily weighted towards...
1) Make sure the test is BLIND. That is, the test subjects should not have any awareness of what is being tested, so they cannot have any bias on the subject. This is specific to audio (though it is also useful in other contexts where strong emotions are attached to views on the subject of the investigation), as we have had five decades or so of extreme polarisation.
2) Make sure the test minimises test-induced stress; this involves protocol, environs and general interactions with listeners. They are not enemies to be defeated, but resources to be employed in the search for knowledge. Make the listeners comfortable and relaxed; make them feel they are giving a real contribution, no matter what your personal view on the matter is. If necessary, employ someone to be nice if you cannot be.
3) Use a form of preference / performance ranking; it not only gives more information, but humans are also much more consistent in their preferences than in their ability to correctly identify a specific item. Collect as much data as possible. Use questionnaires that assess the emotional / mental state of the subject as well. Test for reliable preference and reliable alteration of mood/emotional state as proxies for the presence of a potential difference, rather than attempting to test the difference directly.
4) Use whatever statistics you like, but be clear to everyone, your listeners as well as the audience of the test, about what the limitations and implications are UPFRONT.
Is that straightforward and detailed enough?
Thor