
Statistics of ABX Testing

Blumlein 88

Grand Contributor
Forum Donor
Joined
Feb 23, 2016
Messages
20,769
Likes
37,633
I never worry about volume - because it makes zero difference when listening for defects: a mechanic doesn't move closer and further away from a car that's making a funny noise to "adjust the volume" - it's a yes/no decision as to whether there's a problem.

My gripe about so much of this testing is the use of the word "preference". To me, the sound is either 'right' or 'wrong', it's very clear cut - dud system A, dud system B, two unpleasantnesses - I would "prefer" to switch both off, and go outside ...

So you're a digital listener then. Everything is GOOD or BAD, with no in-between. I guess you bypass all volume controls on all gear. The item is either on or off. Volume makes zero difference; there is either volume or no volume. I guess Fletcher-Munson doesn't apply to you. You can evaluate the relative frequency balance of a recording whether it is played back at 30 dB or 130 dB.

Your mechanic story is pretty funny too. I once worked on 24-cylinder methane-powered engines. There were exhaust pipes, but in the building next to the engines it was around 115-120 dB. If you simply listened to that even briefly you could hear nothing useful. Put on some ear muffs, knock that noise down 25 dB, and you could hear details. You could hear a wrist pin making noise, or valves that needed adjusting, or even that the timing was a bit off. So even your goofy mechanic example holds no water.
 

John Kenny

Addicted to Fun and Learning
Joined
Mar 25, 2016
Messages
568
Likes
18
We don't. So we rely on tests that qualify categories of products. Our job is to indicate how unlikely it is for some product to make a difference/improvement. How far we go and how much one relies on that is up to the person. Let's get the data on the table as there are a lot more blind tests than we even know about.
So can you give (put on the table) examples of "tests that qualify categories of products" & how we might use it to make a judgement about a specific device we might be considering?
 

fas42

Major Contributor
Joined
Mar 21, 2016
Messages
2,818
Likes
191
Location
Australia
So you're a digital listener then. Everything is GOOD or BAD, with no in-between. I guess you bypass all volume controls on all gear. The item is either on or off.
Strangely enough, I have no volume control at the moment. More correctly, I do, but I run it at the end of the pot track, because I've changed the gain setting circuitry so that this means the pot is effectively out of the picture. Why? Because it's a crap, cheap Alps pot - at some stage I shall experiment with decent parts, to see if I can get "transparency" there.

Your mechanic story is pretty funny too. I once worked on 24-cylinder methane-powered engines. There were exhaust pipes, but in the building next to the engines it was around 115-120 dB. If you simply listened to that even briefly you could hear nothing useful. Put on some ear muffs, knock that noise down 25 dB, and you could hear details. You could hear a wrist pin making noise, or valves that needed adjusting, or even that the timing was a bit off. So even your goofy mechanic example holds no water.
Strangely enough, conventional cars don't have noise abatement issues - so we're not talking silly examples here. The point is that you listen for something being wrong, that's the idea - if I hear nothing wrong, or I'm not in a fussy, investigative mood, then it's 'right'; but if I hear problems in just normal listening, or when I deliberately stress the system by putting on a very complex, treble-infused track at high volume, then it's 'wrong'.
 

fas42

Major Contributor
Joined
Mar 21, 2016
Messages
2,818
Likes
191
Location
Australia
Sorry fas but this couldn't be more wrong, unless of course you are referring to the most gross and obvious of defects.

Your hearing sensitivity changes with volume.

https://en.m.wikipedia.org/wiki/Equal-loudness_contour
To me, those defects are "obvious", because I very deliberately focus on listening for them. So I can hear them over a very wide volume range - and once I've zoomed in and noticed something then my consciousness has no problem "staying" with that artifact. What I will do is adjust the volume to see if the issue varies with level - usually a sign that a power supply is not optimum.
 

March Audio

Master Contributor
Audio Company
Joined
Mar 1, 2016
Messages
6,378
Likes
9,321
Location
Albany Western Australia
To me, those defects are "obvious", because I very deliberately focus on listening for them. So I can hear them over a very wide volume range - and once I've zoomed in and noticed something then my consciousness has no problem "staying" with that artifact. What I will do is adjust the volume to see if the issue varies with level - usually a sign that a power supply is not optimum.

Please provide evidence of that, beyond your subjective opinion. It is very likely to be the changes in hearing with level as described above.
 

fas42

Major Contributor
Joined
Mar 21, 2016
Messages
2,818
Likes
191
Location
Australia
Probably the clearest instance of that was the battleship Perreaux amp I started this journey with - it had a very noticeable issue in that the level of distortion of the treble was highly dependent on the output level - a pretty normal behaviour for many older era amps, of course. There was a clear point in the acoustic level where the cymbal splash started to go dead in the sound, the harmonics just fell off the cliff - whether I was close to the speakers or far away made no difference. And, no, it was not the speakers - it was highly dependent on the material, a piece that had a massive treble transient after some softer material was fine - the power supply had enough charge storage to handle this type of demand.

Down the track I did major surgery to upgrade the energy storage of this amp, and then those problems went away. With current, decent amps those types of issues are far less prevalent.
 

Blumlein 88

Grand Contributor
Forum Donor
Joined
Feb 23, 2016
Messages
20,769
Likes
37,633
Strangely enough, I have no volume control at the moment. More correctly, I do, but I run it at the end of the pot track, because I've changed the gain setting circuitry so that this means the pot is effectively out of the picture. Why? Because it's a crap, cheap Alps pot - at some stage I shall experiment with decent parts, to see if I can get "transparency" there.


Strangely enough, conventional cars don't have noise abatement issues - so we're not talking silly examples here. The point is that you listen for something being wrong, that's the idea - if I hear nothing wrong, or I'm not in a fussy, investigative mood, then it's 'right'; but if I hear problems in just normal listening, or when I deliberately stress the system by putting on a very complex, treble-infused track at high volume, then it's 'wrong'.
So now you might use a complex track at high volume to find a problem after saying volume did not matter.
 

fas42

Major Contributor
Joined
Mar 21, 2016
Messages
2,818
Likes
191
Location
Australia
So now you might use a complex track at high volume to find a problem after saying volume did not matter.
Because I'm using "volume" to tickle out problems, stressing the system to encourage it to misbehave - issues arise for various reasons, some because the power supplies are not sufficient, others because the components are not robust enough against interference - and the "tactics" are different.

The principle is to ascertain whether the system will behave itself under all conditions, rather than just rely on a series of standard, relatively static tests.
 

Blumlein 88

Grand Contributor
Forum Donor
Joined
Feb 23, 2016
Messages
20,769
Likes
37,633
Because I'm using "volume" to tickle out problems, stressing the system to encourage it to misbehave - issues arise for various reasons, some because the power supplies are not sufficient, others because the components are not robust enough against interference - and the "tactics" are different.

The principle is to ascertain whether the system will behave itself under all conditions, rather than just rely on a series of standard, relatively static tests.

Which contradicts your previous statements about volume matching not being important. If you compare two devices or parts, the fact that they might act differently at various volume levels is one reason to match. The other is that you hear differently at different volumes, which necessarily affects your ability to discern, whether you understand that or not.
 

fas42

Major Contributor
Joined
Mar 21, 2016
Messages
2,818
Likes
191
Location
Australia
You misunderstand. I'm using volume purely to elicit bad behaviour from a single system, not to compare things. I have no interest in comparing, at the moment - I just want to eradicate all flaws that cause audible artifacts in one particular system.

If I had two separate systems that differed and both had no audible issues, then I would be interested in comparing the sorts of things you have in mind - but I haven't reached that point of having two setups on the ground to try that; something for further down the track.

I'm certainly aware that the raw quality of a component of a system can shine through, even though the rig may still have issues - at times I hear a 'superior' quality in another person's set of equipment which I don't usually get, but that doesn't bother me ... my "shtick" is to get the system in front of me to work to the best of its inherent ability.
 

Blumlein 88

Grand Contributor
Forum Donor
Joined
Feb 23, 2016
Messages
20,769
Likes
37,633
If you elicit bad behavior and then eradicate it, you are comparing two different conditions, though at different times. That only works for very large problems. There are much better, more consistent, and finer levels at which to work.
 

Jakob1863

Addicted to Fun and Learning
Joined
Jul 21, 2016
Messages
573
Likes
155
Location
Germany
Statistics of ABX Testing
By Amir Majidimehr


In the table below, I have computed the answer for 10, 20, 40, 80 and 160 trials:

ABX-Statistics.png

There seems to be a problem with the binom.inv function (or maybe with its description, which is ambiguous between "at most" and "less than"): the number it returns apparently has to be read as "more than this many", so instead of 8 correct answers 9 are needed (SL = 0.05), 15 instead of 14, and so on.
For example, with 10 trials the cumulative probability is P(X<8) = 0.9453 and therefore P(X>=8) = 0.0547, so slightly above the 0.05 line.
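The figures above can be checked directly from the binomial distribution. The following is a minimal sketch, assuming pure guessing (p = 0.5) and a one-sided test at SL = 0.05; the function names `tail_prob` and `min_correct` are made up for illustration and are not part of any spreadsheet or library.

```python
from math import comb

def tail_prob(n, k, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p): chance of k or more correct by guessing."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def min_correct(n, alpha=0.05, p=0.5):
    """Smallest k such that P(X >= k) <= alpha (hypothetical helper)."""
    for k in range(n + 1):
        if tail_prob(n, k, p) <= alpha:
            return k
    return None  # not reachable for 0 < alpha < 1 with enough trials

# Trial counts taken from the table discussed in the thread.
for n in (10, 20, 40, 80, 160):
    k = min_correct(n)
    print(f"n={n:3d}  need >= {k} correct  P(X>={k}) = {tail_prob(n, k):.4f}")
```

For 10 trials, `tail_prob(10, 8)` evaluates to 56/1024 = 0.0547 and `tail_prob(10, 9)` to 11/1024 = 0.0107, which is why 9 rather than 8 correct answers are needed to stay below 0.05.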
 

Phelonious Ponk

Addicted to Fun and Learning
Joined
Feb 26, 2016
Messages
859
Likes
216
If you elicit bad behavior and then eradicate it, you are comparing two different conditions, though at different times. That only works for very large problems. There are much better, more consistent, and finer levels at which to work.

You're wasting your time. You're not really even talking to Frank, you're talking to the voices in his head. It couldn't be a more futile endeavor.

Tim
 

fas42

Major Contributor
Joined
Mar 21, 2016
Messages
2,818
Likes
191
Location
Australia
Tim, you ol' rascal you, can't keep you down, can we now? In Tim's world, there are things that you can hear ... and everything else is nonsense, right? Dearie, me ...

As someone who can't even hear that listening at 720p on YouTube makes a difference, you have zero credibility in terms of being able to distinguish audible variations in sound, I'm afraid.
 
amirm

Founder/Admin
Staff Member
CFO (Chief Fun Officer)
Joined
Feb 13, 2016
Messages
44,671
Likes
241,052
Location
Seattle Area
There seems to be a problem with the binom.inv function (or maybe with its description, which is ambiguous between "at most" and "less than"): the number it returns apparently has to be read as "more than this many", so instead of 8 correct answers 9 are needed (SL = 0.05), 15 instead of 14, and so on.
For example, with 10 trials the cumulative probability is P(X<8) = 0.9453 and therefore P(X>=8) = 0.0547, so slightly above the 0.05 line.
Welcome to the forum. And yes there is rounding error there but as I said there is no magic in 95% that stops being so at 94.5%.
 

Jakob1863

Addicted to Fun and Learning
Joined
Jul 21, 2016
Messages
573
Likes
155
Location
Germany
Welcome to the forum. And yes there is rounding error there but as I said there is no magic in 95% that stops being so at 94.5%.

Thank you very much for the welcome.
It seems to be a systematic error, as the function always returns (in the ~20 cases I've tried) a number that is one count too low, so the description of the function is misleading. We are looking for the number of successes with a cumulative probability of >= 0.95 (at least 0.95), while binom.inv apparently delivers the number of successes with a cumulative probability of <= 0.95 (at most 0.95).
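A possible way to see where the off-by-one comes from is sketched below. It assumes the documented spreadsheet rule that BINOM.INV returns the smallest value whose cumulative binomial probability is greater than or equal to the criterion; `binom_inv` here is only a Python emulation of that rule, and `required_correct` is a hypothetical helper. If m is the smallest count with P(X <= m) >= 0.95, then m + 1 is the smallest count with P(X >= m + 1) <= 0.05, so the number of correct answers needed is always one more than the returned value.

```python
from math import comb

def binom_cdf(n, m, p=0.5):
    """P(X <= m) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(m + 1))

def binom_inv(n, p, criterion):
    """Emulation (an assumption, not the spreadsheet's code) of the rule:
    smallest m with P(X <= m) >= criterion."""
    for m in range(n + 1):
        if binom_cdf(n, m, p) >= criterion:
            return m
    return n  # guard against floating-point shortfall at m = n

def required_correct(n, alpha=0.05, p=0.5):
    """Smallest k with P(X >= k) <= alpha, i.e. one more than binom_inv."""
    return binom_inv(n, p, 1 - alpha) + 1

for n in (10, 20):
    print(n, binom_inv(n, 0.5, 0.95), required_correct(n))
# prints:
# 10 8 9
# 20 14 15
```

Read this way, the function is consistent with its description; the value it returns is the largest score still compatible with guessing at the chosen level, not the score needed to pass.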

Nevertheless, you are absolutely right: there is no magic in the usual criteria, so it is often better to report the p-values and let the reader decide whether 5.x% constitutes an unbearable risk while 4.x% does not (for example).
 