
Master Thread: Are measurements Everything or Nothing?

Frankly, I think the problem is that there is more than one discussion taking place and people are somewhat talking past each other.

The first discussion concerns the sound waves produced by a component. In this case, I think that most of us are in agreement: electronics that measure well end up producing the same sound waves. Components that cannot be discriminated between in ABX testing produce either the same sound waves, or differences that are beyond the limits of human perception when only the sound stimulus is taken into account. There are components that reproduce the signal stored within the recording with more fidelity; those are components that "measure well" by objective standards.

The second discussion concerns the experience of listening to music.
In our daily lives, we listen to music with full knowledge of what's reproducing it. We have bias from the marketing we were subjected to for the components we are listening to. We have bias from preconceptions about how we feel something should be designed and built, that is, bias for the technical choices made by the manufacturer. We have bias from spending money on a component. We have bias based on how a component looks. We have bias from how the inside looks. We have bias from the name of the brand.

So my answer to your question is that ABX is well suited to answering the question of fidelity of musical reproduction in the context of the auditory stimulus only. ABX is probably ill-suited to predicting the emotional response of a random person listening to music with a set of components.

A person perceiving that a piece of music is being reproduced with fidelity, especially in a sighted context, and a set of components actually emitting sound waves that faithfully reproduce what is on the recording, are not the same thing. Depending on the set of components, the specific recording, the listener, and the listening conditions, they sometimes coincide, but not always.
thank you. I don't disagree with most of what you're saying. It's just that it doesn't address the points I made.

When testing a speaker, if blind testing involves other kinds of biases, why should it be preferred over sighted listening?
 
It could possibly be done.

You'd need a single speaker in place of each musician. The speakers would need to have close to perfect constant directivity and have dispersion patterns mimicking those of the individual instruments. They would also need to have SPL capabilities matching the instruments without reaching audible compression, and of course need to be able to cover the same part of the audible spectrum + have no audible resonances or notable deviations from a flat frequency response.

Each of the musicians would have to be recorded individually in an anechoic chamber with a microphone array capable of capturing the entire useful part of their dispersion pattern. You'd need to give the musicians in-ear monitors and possibly some sort of visual aid to give them what they need to play as an orchestra, and you'd naturally also have to make sure that all of the speakers play their part with correct timing.

If you go through all of that trouble (and possibly more), I bet it would give you a pretty mind-blowing illusion.

And yes, we all know that a traditional stereo setup doesn't sound like the real thing when compared side by side, no matter what you do. But that has nothing to do with the importance of measurements.
The directivity issue is overthinking it. Do you lose track of a violin and have to find or recognize it again when the musician turns 30 degrees and the directivity has changed? Do all musicians with their various instruments present some consistent directivity that aids you in listening to them play? No to both.

I've close-miked people in a damped, but far from anechoic, room, with each on a separate track that fed a separate loudspeaker, one per musician. You can create a highly lifelike result like that. You can take it to different rooms, and in each you would think the group is playing there where you are. You can move the speakers or switch feeds and it is as if the musicians had changed positions, but it still seems real.

The demo has been done a few times publicly. I remember in one case it was an outdoor concert with an amphitheater shell. The musicians had been recorded on a remote hillside to get something close to anechoic conditions. Then each stood with an instrument next to a speaker. At some points they would mimic playing as the speakers were going and then stop while the music kept going, and the audience was unaware of when it happened. In those cases I guess you had the sighted influence of seeing the guys, but most would not have believed that would work.

The factors are: for each musician you have a real sound source, no phantom images. The sound source is real and co-located with where you think it is. Whatever your room does, it is doing the same as if a musician were in that location. So directivity differences, the fact the microphone picks up from one spot, etc., are not enough to disrupt the illusion.

If you mike too far away, or the original location is too reflective, you begin to erode that live, "that person is here" illusion. It is a "they are here" illusion and not a "you are somewhere else" illusion. This effect is easier to pull off than most imagine. It also means it is less of an indicator of fidelity in the recording than most think it is.
 
thank you. I don't disagree with most of what you're saying. It's just that it doesn't address the points I made.

When testing a speaker, if blind testing involves other kinds of biases, why should it be preferred over sighted listening?
It depends on what you are testing for and how you approach the hobby.

For example, if you design and test for performance, it should be blind, no question.

If you test for personal enjoyment, it depends on you personally. If you know you can get past looks, you might go for the speaker with the better measurements even if, in the first sighted comparison, the better-looking speaker sounded better. If you know looks play a very important role in how you enjoy things, you might go for the other speaker instead.

Like it or not, placebo works on the human brain even if you know about it. So you might want to "flatter" that part of your brain if it leads to better enjoyment. Or you might not, maybe for ideological reasons, maybe because you're not sensitive to specific placebos; that is also a thing that happens.
 
thank you. I don't disagree with most of what you're saying. It's just that it doesn't address the points I made.

When testing a speaker, if blind testing involves other kinds of biases, why should it be preferred over sighted listening?
How many biases are involved with blind speaker testing? I don't know of any for sure, but let us say there are a couple. Well, with sighted listening you'll have those same biases PLUS speaker appearance, size, design type, reputation, and knowledge of price to muddy the waters. And we know from research these sources of bias are very, very strong. Strong enough to completely flip listeners' ratings.
 
How many biases are involved with blind speaker testing? I don't know of any for sure, but let us say there are a couple. Well, with sighted listening you'll have those same biases PLUS speaker appearance, size, design type, reputation, and knowledge of price to muddy the waters. And we know from research these sources of bias are very, very strong. Strong enough to completely flip listeners' ratings.
Yes, a mountain was just compared to a molehill, and declared a tie.
I think this person needs to study the actual literature, instead of playing a game of fallacies and false equivalencies.
 
So this is the hypothesis: ABX tests are not a reliable way to test one’s ability to hear differences in music reproduction, because the situation of the test itself involves great psychological bias.

you actually can't.
I proposed a hypothesis grounded in reasoning, with premises and deductions. I gave the reasoning for everyone to see and contradict. To disprove it, one has to either contradict the premises or show that the deduction is a fallacy.
You just gave a number from your personal intuition that conveniently suits your beliefs.
Very different attitudes.
I never said I fully believe in this hypothesis. It may be wrong. But at least, it respects some basic rationality principles. You just affirmed something.
I think the problem with your hypothesis is that when you blind test people and there is a real audible difference, even a very small difference, between A and B, they tend to ace the test.
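To put a rough number on "acing the test": under the null hypothesis that the listener is simply guessing, the count of correct ABX trials follows a binomial distribution with p = 0.5. Here is a minimal sketch of that chance calculation (my own illustration, not from any post in this thread; the trial counts are just examples):

```python
# Minimal sketch (illustrative, not from the thread): probability of getting
# k or more ABX trials correct out of n by pure guessing (binomial, p = 0.5).
from math import comb

def p_at_least_k_by_chance(n_trials: int, k_correct: int, p_guess: float = 0.5) -> float:
    """Chance probability of k_correct or more right answers out of n_trials."""
    return sum(comb(n_trials, k) * p_guess**k * (1 - p_guess)**(n_trials - k)
               for k in range(k_correct, n_trials + 1))

print(p_at_least_k_by_chance(10, 10))  # ~0.001: 10/10 is very unlikely by luck alone
print(p_at_least_k_by_chance(16, 12))  # ~0.038: 12/16, a commonly used pass threshold
```

So a listener who reliably hears even a small real difference will clear that chance bar easily, which is what "acing the test" means here.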
 
Like it or not, placebo works on the human brain even if you know about it. So you might want to "flatter" that part of your brain if it leads to better enjoyment. Or you might not, maybe for ideological reasons, maybe because you're not sensitive to specific placebos; that is also a thing that happens.
But are these effects stable? That's a critical question. I think we've all had the experience of being impressed by something, but then fatigued or unimpressed over longer periods. When your cool-looking gizmo is a well-known part of your living room, does it affect your listening the same way?

I say they are not stable. I can't prove it, of course. But that's why controlling for non-auditory biases makes sense to me, even as a matter of preference. Your sighted impression is simply the auditory impression plus an unknown and probably unstable variable.
 
P3: considering that, for many subjects who participate in audio blind testing, the psychological stakes involved in passing the test are high (whether it’s the pressure to perform or the fear of failing, of feeling shame, of having to reconsider all their past beliefs…), this might skew the experience.
For those who suffer with performance anxiety we could administer a mild "sedagive"
 
Familiarity and experience would also reduce anxiety about the test. Blind testing is used in the food, perfume, wine and other industries. It works, very well. But somehow it doesn't work with music? If you hold such an unusual premise, you need something to make you think it is so.
 
Second conclusion/first question: does that not entail that ABX tests are not suited to assess the ability of individuals to distinguish/assess/listen critically?

So this is the hypothesis: ABX tests are not a reliable way to test one’s ability to hear differences in music reproduction, because the situation of the test itself involves great psychological bias.

How (un)likely does that sound to you?
Don't fully understand your question. But here is my take.

You play the same piece of music using 2 (may be) different pieces of electronics. (The "may be" is there because that particular ABX test can be a negative control.) So when the piece of music is the same, my question is why would there be a different and consistent emotional/psychological impact that is caused by A but not B or vice versa?

I think much of your hypotheses are addressed by positive and negative experimental controls. If one is serious, the ABX test should not be run in isolation. It should be a series of tests with different A's & B's. If the A & B are well known to be audibly different (positive control) and the testee couldn't identify them, that tells us something about the testee. If the A & B are the same (negative control) and the testee can tell them apart consistently, that tells us there is something wrong with how the test is conducted.
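As a concrete illustration of that series-with-controls idea, here is a minimal sketch (my own, not the poster's actual protocol; device names such as "amp_A" are hypothetical). A session interleaves the pair under test with a known-audible positive control and an identical-pair negative control, and each condition is then scored separately against the chance baseline:

```python
# Sketch of a listening-test session mixing the pair under test with
# positive and negative controls. All names are hypothetical examples.
import random

def build_session(trials_per_condition: int = 10, seed: int = 0) -> list[dict]:
    """Return a shuffled trial list covering the test pair plus both controls."""
    conditions = [
        {"name": "pair_under_test",  "a": "amp_A", "b": "amp_B"},    # the comparison of interest
        {"name": "positive_control", "a": "flat",  "b": "eq_+3dB"},  # known audible difference
        {"name": "negative_control", "a": "amp_A", "b": "amp_A"},    # A and B are identical
    ]
    rng = random.Random(seed)
    trials = []
    for cond in conditions:
        for _ in range(trials_per_condition):
            # Hidden assignment: is X actually A on this trial?
            trials.append({**cond, "x_is_a": rng.choice([True, False])})
    rng.shuffle(trials)
    return trials

# Failing the positive control says something about the listener or setup;
# "passing" the negative control says something is wrong with the test itself.
```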
 
1) how I listen to music is directly affected by how it moves me and how it moves me is directly determined by how I feel when I listen to it.

The test, though, that I figured we are talking about has to do with detecting sonic differences, not with whether one is being moved by music or not.

I mentioned in my first reply that if one is suggesting that the music itself, due to the type of emotional investment you suggest it demands, creates a variable, not only is that variable controlled for by keeping it constant, but separate tests, such as the one I suggested, can test for the claim about that variable. So as far as I can tell, the point you were bringing up was already addressed in my reply.

There is psychological/epistemological work being done on "standpoint theory": namely, that your perception can be more acute when you're personally engaged in a situation. You can see things better if they matter more, to put it bluntly. Listening in a "clinical" mode might alter that type of perspective.


OK, I still think there seems to be a bit of slippage in terms of what is being tested for: engagement, or the audibility of sonic differences.

But as I understand it, here you are saying that engagement level with the music can affect the ability to discriminate between sonic differences. And that if the ABX conditions subdue that engagement, that may lower the discrimination abilities of the participant.

I guess I’d say that, since you’d want to be careful about begging the question here, you would want to supply evidence for that claim.
I’m actually not familiar with standpoint theory. But I think some evidence would have to be presented not only for that theory, but also that the phenomenon it uncovers is likely to be a problem variable in the specific type of testing we are discussing. And what type of audible threshold might we be talking about in terms of this “lack of engagement” lowering discrimination? My wife has plenty of engagement with Taylor Swift’s music. I have essentially none. But I’m pretty sure I could use a Taylor Swift track to discriminate between audible differences in an ABX test.

2) that seems to avoid the problem of the "ABX setting" in general. Have you ever played a track you love very much for someone else? Have you not felt that you "heard" it differently? Are you not ready to believe that your psychological state differs vastly whether you're trying to assess a speaker blindly or letting yourself go into your emotions while playing a record for yourself?

I’m not totally clear whether your argument is laser-targeted on the ABX format, or whether it applies to any proper blind listening test.

But since you were asking about my experience, here’s some experience:

I had a couple CD players and an outboard DAC in the 90s. They seemed to sound slightly different to me. I would play plenty of my favourite tracks on those CD players depending on which I had in my system, which I’m very engaged in.
I decided to do a proper blind test between them (volume level matched at the speaker terminal, random switching with a helper, etc.)
It turned out the same audible properties I was identifying in normal sighted listening were there to be heard blind as well, and so I could easily tell them apart. These were subtle (obvious to me, but still subtle) differences.

This suggests the “stress” of the blind testing did not interfere with hearing the type of subtle sonic differences I perceived during my informal listening.

How does that fit with your argument?
 
How many biases are involved with blind speaker testing? I don't know of any for sure, but let us say there are a couple. Well, with sighted listening you'll have those same biases PLUS speaker appearance, size, design type, reputation, and knowledge of price to muddy the waters. And we know from research these sources of bias are very, very strong. Strong enough to completely flip listeners' ratings.
Sure! I agree! It is possible to conclude there is no satisfying way to test. It's not necessary that there is one :)
 
Yes, a mountain was just compared to a molehill, and declared a tie.
I think this person needs to study the actual literature, instead of playing a game of fallacies and false equivalencies.
ahaha, the smugness.

You know, when talking about something, there are always at least two kinds of expertise at stake: expertise on the topic, and expertise in constructing a rational proposition that is self-aware of its own limits and of the probabilities that ground it.

If you have just the former, you're pretty much certain to talk foolishly.
If you have just the latter, you can avoid most dangers.
 
Don't fully understand your question. But here is my take.

You play the same piece of music using 2 (may be) different pieces of electronics. (The "may be" is there because that particular ABX test can be a negative control.) So when the piece of music is the same, my question is why would there be a different and consistent emotional/psychological impact that is caused by A but not B or vice versa?
Yeah, like this:
Many of the thought-experiments being posted here as novel were already explored in practice, often a very long time ago.
I think much of your hypotheses are addressed by positive and negative experimental controls. If one is serious, the ABX test should not be run in isolation. It should be a series of tests with different A's & B's. If the A & B are well known to be audibly different (positive control) and the testee couldn't identify them, that tells us something about the testee. If the A & B are the same (negative control) and the testee can tell them apart consistently, that tells us there is something wrong with how the test is conducted.
This is a great point. It's not like just one aspect of the truth table has been tested here.
 
Familiarity and experience would also reduce anxiety about the test. Blind testing is used in the food, perfume, wine and other industries. It works, very well. But somehow it doesn't work with music? If you hold such an unusual premise, you need something to make you think it is so.
As I said, it is also very much the consensus in psychology that it does not work in many other cases. And I have offered reasons to believe music could fall into the category where psychologists agree a more complex protocol is needed.
Generally, studies that involve emotions need a more sophisticated approach than studies involving mere sensations. That's just how the field works.
 
Sure! I agree! It is possible to conclude there is no satisfying way to test. It's not necessary that there is one :)
For your statement to hold, "satisfying" must not include accuracy. Blind testing is well vetted to be reliable in many fields, including audio. So your clumsy attempt at cleverness isn't clever. Sorry.
 
Don't fully understand your question. But here is my take.

You play the same piece of music using 2 (may be) different pieces of electronics. (The "may be" is there because that particular ABX test can be a negative control.) So when the piece of music is the same, my question is why would there be a different and consistent emotional/psychological impact that is caused by A but not B or vice versa?

I think much of your hypotheses are addressed by positive and negative experimental controls. If one is serious, the ABX test should not be run in isolation. It should be a series of tests with different A's & B's. If the A & B are well known to be audibly different (positive control) and the testee couldn't identify them, that tells us something about the testee. If the A & B are the same (negative control) and the testee can tell them apart consistently, that tells us there is something wrong with how the test is conducted.
I'm sorry that does not address the core of my (maybe extremely weak and stupid) hypothesis.
If this hypothesis is so lame, it should be easy to attack the reasoning at its core, I feel.
 
As I said, it is also very much the consensus in psychology that it does not work in many other cases. And I have offered reasons to believe music could fall into the category where psychologists agree a more complex protocol is needed.
Generally, studies that involve emotions need a more sophisticated approach than studies involving mere sensations. That's just how the field works.
The blind tests are about sensations. The sensations must differ if the gear makes a difference. If you are studying emotions after the sensations, then the sensations aren't part of the subject. The audio gear has no part to play in that in terms of performance.

Can you give some examples of the more complex protocol these psychologists agree upon?
 
Seems to me (700 pages in, I've not read them all) that:
Sometimes there are audible differences, as in speaker comparisons.
In that case it's reasonable that people can hear a difference, and the challenge is to communicate what they hear.
Some people are better at hearing those differences, some are better trained to explain them.
When I worked in the traditional wine trade (regular blind tastings) it was easiest to communicate with the group I regularly tasted with, and harder with other people. Explaining something to 'interested amateurs', or trying to understand what they meant, was tricky!
There's a challenge to use clear and objective language, and avoid flowery fanciful descriptive nonsense.
We can work on that.

(That background, followed by working on clinical trial reporting for big Pharma, helped me adjust to ASR fast ... but I was still fooled for years regardless. I really should have known!)

The real issue is where there is no difference, or the difference is inaudible. That includes speakers reproducing the higher frequencies but is obvious for cables, networking, streamers, DACs and Amps operating in their comfort zones.

The challenge in the second case is to help others to accept that there is no difference. It's a big ask - of course they hear a difference, they really do. Undoing the learning of a lifetime is difficult.

Patience, repetition and a lot of resilience might be the best hope with that.

Two different issues; real differences that are hard to communicate, and false differences created by our biases.
 
Since I’m fascinated by live versus reproduced sound, the experiment you describe is fascinating. As I’ve mentioned, I’ve done my own modest live versus reproduced tests (including fooling some people into thinking a live instrument was being played in a room down the hall).

The factors are: for each musician you have a real sound source, no phantom images. The sound source is real and co-located with where you think it is.

This is one reason why, when I am at a loudspeaker demonstration or just demoing speakers myself, I like to compare the reproduction to real sounds as well.

This is merely for the sake of apprehending some general sonic characteristics of live sound versus reproduced sound, the difference in Gestalt as it were.

For instance, I was listening to some expensive loudspeakers at my friend's place and we were playing a vividly recorded male vocal track. It’s just the type of track you’d often hear in demos at shows and shops, meant to say “isn’t that amazing, it’s like they are right there singing!”

I had my friend go stand behind, in between the loudspeakers, in the spot where I perceived the image of that male vocalist. Then I had him speak while the vocal track was turned on and off. It really laid bare the difference between the reproduced voice and the real voice sound source in between the speakers. First of all, it revealed that the phantom centre image did not have the solidity, density and focus, the obvious “air is being moved right from that spot” sensation you get when a real person is speaking in that spot. I take this to reveal an inherent weakness of stereo reproduction: the sort of see-through phasiness of the phantom images it produces.

It also revealed other differences, such as the denser and more obviously human and organic quality of the real voice versus the more mechanical, artificial aspects of the reproduced voice.

An excellent loudspeaker may be able to close the gap a bit more if we are talking about mono reproduction, e.g. a voice played through a single loudspeaker. Most of my live versus reproduced tests were done using stereo.
 