
Why is audio objectivism so frequently focused on all the wrong things?

Status
Not open for further replies.

Mikey

Member
Joined
Oct 27, 2019
Messages
33
Likes
48
This is going to seem a little trolly, because it is a basically existential criticism of this site. But I'm actually sympathetic to the goals of audio objectivism, and this post arises from my frustration with how it's so often carried out. In particular, one of the most perplexing things to me about the world of audio objectivism, as embodied on this site especially, is that it doesn't seem to even be trying to achieve its stated goals.

Because the problem at hand is that sighted listening impressions are -- famously, notoriously -- unreliable. It's provably impossible to remove expectation bias and to prevent non-auditory cues from strongly coloring audible impressions. Combine that with the weakness of long-term auditory memory, and it's clear that most subjective impressions/reviews aren't worth the photons they're displayed with. We need better!

And so we know how to do better: You do blind comparisons, and try to find what product is preferred when people don't know what it is. This is how Floyd Toole and Sean Olive compared speakers at the NRCC and Harman; this is how Wine Spectator compares and scores wines. (As an aside, how embarrassing is it that wine magazines have vastly better methodological rigor than audiophile magazines!)

If you want to take it a step further from that, you can try to identify patterns based on what you find in the blind comparisons (as Toole and Olive did with the "spinorama" measurements), and then do further experiments to see how strongly correlated those measurements are with blind preference -- keeping in mind that it's the blind preference that's the ultimate arbiter, and the measurements that are the hypothesis being tested.

But that's not what this site's flavor of objectivism does. This site doesn't do any blind listening comparisons at all. It appears to just take as a given, for no obvious reason, that the set of measurements Amir performs are the set of measurements that would correlate to blind listening preference. This seems like a strange assumption, particularly because those in the subjectivist camp already know about these measurements, and believe that they are not correlated with audible performance at the levels generally measured here, that there are other factors in play that matter.

What's worse is, even if you believe that this set of measurements is what's important, it's not clear that any of the measurements (except for the most broken units) rise to any level of significance whatsoever. Take the "SINAD" tests for DACs, for instance. Amir is savage about the Schiit Modi Multibit, because it measures worse than other products. Due to a second harmonic at -80dB, and higher order harmonics and other noise at -100dB, its SINAD measurement gets a "red" rating, as obviously inferior to other, better-measuring products.

But why? The AES has done actual listening tests, and they've experimentally established limits to the audibility of THD, and they're much, much higher than we're dealing with here, particularly for 2nd order harmonic distortions. If your hypothesis is that measurements like SINAD tell the full story of listening quality, then based on what we know about perceptibility of distortion, your conclusion has to be that any product with measurements like the Modi Multibit is audibly perfect, and all this green/yellow/red stuff is marketing fluff that conveys no useful information.
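For concreteness, those component levels combine into a single SINAD figure by power-summing everything that isn't the fundamental. A minimal sketch (the -80/-100 dB levels are the hypothetical ones from the example above):

```python
import math

def sinad_db(component_levels_db):
    """Combine distortion/noise components, each expressed in dB
    relative to the fundamental, into one SINAD figure (the ratio
    of the signal to the power sum of everything else)."""
    total_power = sum(10 ** (level / 10) for level in component_levels_db)
    return -10 * math.log10(total_power)

# A 2nd harmonic at -80 dB plus three higher-order products near -100 dB:
# the -80 dB term dominates, so SINAD lands just under 80 dB.
print(round(sinad_db([-80, -100, -100, -100]), 1))  # → 79.9
```

This is why a single dominant harmonic at -80 dB effectively pins the SINAD score near 80 dB no matter how clean everything else is.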

So it seems to me that there are two possible scenarios in play:

1. There are audible differences between (non-broken) DACs and amps, such that you can tell the difference between a Modi Multibit and a Topping DX70 by listening. If that's the case, these differences aren't found in the numbers this site is collecting and publishing, so wouldn't it make more sense to go back to blind testing to collect more data about where listener preferences genuinely lie, and then try to identify a new hypothesis about which numbers do matter for that?

2. There are no audible differences between (non-broken) DACs and amps, such that you could buy basically anything above a certain quality level, and it'd be audibly indistinguishable from anything else. In that case, these measurements should just be pass/fail, with no reason to give any more detailed breakdown (as there's no benefit to over-engineering inaudible "improvements"), and recommendations should be based on price, build quality, and ergonomics. But rather than assuming this, it seems like you'd want to prove it first, by actually doing those blind preference tests, like Toole did with speakers.

Either way, it's hard for me to see the value in taking well-understood measurements, and then grading equipment based on how well it performs at solidly inaudible levels on those metrics. That has the appearance of science, but not the substance. It's not even rewarding good engineering, because good engineers don't gold-plate irrelevant metrics, they focus on what actually matters.

You've probably heard the old saw about the man looking for his keys in the parking lot. A good samaritan comes over to help him, and after some fruitless minutes, asks whether the man is sure that he lost the keys here. "Oh no," the man says, "I lost them in the bushes, but I'm looking here because that's where the light is."

Audio objectivists have spent too long looking where the light is; it's time to start looking in the bushes.
 

BDWoody

Chief Cat Herder
Moderator
Forum Donor
Joined
Jan 9, 2019
Messages
7,109
Likes
23,719
Location
Mid-Atlantic, USA. (Maryland)
The biggest gripe I have is that they can't answer why bad-measuring gear can sound fantastic compared to better gear. There are people who love the Grado SR225e over the HD600, a headphone with 10% distortion in the bass and 3% in the treble.

You not liking the answers, doesn't mean the questions haven't been addressed...

Hmmmm...first two posts in a trolly thread from two first-time posters...

If you actually want to try to answer your questions, maybe do more reading, or actually ask them and not join to gripe about how you poor poor people can't get answers that satisfy your biases?

Unless of course it is just simple trolling...
 

Julf

Major Contributor
Forum Donor
Joined
Mar 1, 2016
Messages
3,053
Likes
4,071
Location
Amsterdam, The Netherlands
The biggest gripe I have is that they can't answer why bad-measuring gear can sound fantastic compared to better gear. There are people who love the Grado SR225e over the HD600, a headphone with 10% distortion in the bass and 3% in the treble.

Sure we can. It is well known that added harmonic distortion makes music sound "fuller" and "more punchy", and those levels of distortion also imply a degree of compression (with a similar effect). This can easily be verified by simulation/emulation with tools like a tube-saturation or tape-saturation plugin.
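As a rough illustration of that mechanism (a hypothetical waveshaper sketch, not modeled on any particular plugin), an asymmetric soft clipper both adds even-order harmonics and gently compresses peaks:

```python
import math

def tube_ish(x, drive=2.0):
    """Crude stand-in for a tube-saturation stage: the tanh gives a
    soft knee (compressing peaks), and the DC offset before the tanh
    makes the transfer curve asymmetric, which generates even-order
    (2nd, 4th, ...) harmonics."""
    return (math.tanh(drive * x + 0.2) - math.tanh(0.2)) / drive

# One cycle of a pure sine in, a distorted (harmonic-rich) cycle out.
sine = [math.sin(2 * math.pi * n / 64) for n in range(64)]
shaped = [tube_ish(s) for s in sine]
print(round(max(shaped), 3), round(min(shaped), 3))  # compressed, asymmetric peaks
```

The output peaks are both smaller than the input's (compression) and unequal in magnitude (asymmetry, hence even harmonics), which is the "fuller, punchier" recipe described above.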
 

Blumlein 88

Grand Contributor
Forum Donor
Joined
Feb 23, 2016
Messages
20,909
Likes
37,973
I suppose 1st off your link on THD is for woofers only. You can't extrapolate that to the whole audible bandwidth.

I also am not surprised by the results.

We have a very good idea of how low in level things need to be to prevent any possibility of non-linearity being audible. Let us just call it -120 dB.

Then we have a gray area where specifics matter. I've personally suggested that -70 dB THD and an SNR of 100 dB are enough to be inaudible, with some margin of safety already included. In between are some pretty wide gray areas.
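For readers more used to spec-sheet percentages, the dB figures above convert like this (a trivial sketch):

```python
import math

def db_to_percent(db):
    """Convert a level in dB relative to the fundamental into the
    percentage figure spec sheets usually quote (amplitude ratio)."""
    return 100 * 10 ** (db / 20)

print(round(db_to_percent(-70), 4))   # -70 dB THD → 0.0316 %
print(round(db_to_percent(-120), 4))  # -120 dB    → 0.0001 %
```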

Now I certainly think a DAC that is very expensive, claims superior sonics, and measures poorly (even if at inaudible levels) represents poor engineering along with poor value. Why pay $10K for a device that might barely squeak by without causing audible problems, when an excellent $500 device performs well enough to remove any question of audibility? I can see paying more for features, convenience, appearance, interface qualities, etc. But not 20 times as much while getting much worse measured performance into the bargain, even if the audible performance is identical. There is no good reason for it.

The reason for the site, I think, is as a counterpoint to sites that describe all these qualities between units from subjective evaluation alone. The measurements, by contrast, are hard-to-come-by information, with only a few other sources. So much high-end audio is based upon lies; this site attempts to give real data on the claims. A further evolution in the future may be to move toward what is needed for audible transparency. While a pass/fail list might be fine, I also like all the detailed measurements, especially of expensive gear that doesn't perform particularly well.
 

Cosmik

Major Contributor
Joined
Apr 24, 2016
Messages
3,075
Likes
2,181
Location
UK
It's simple: each item of the audio chain has a pre-defined function to fulfil based on engineering specifications. It's not like wine whose 'function' is to 'be' the thing that the human perceives. An audio system has no sound of its own (or it shouldn't) so there's no point listening to 'it'. The only question should be how transparent it is, and that can be ascertained with measurements that address the pre-defined function of the device.
 

audimus

Senior Member
Joined
Jul 4, 2019
Messages
458
Likes
462
I don’t see the OP as trolling. It is a justifiable position and something I share as well.

The best analogy I can provide is a site that measures (with the best precision and statistical validity possible) the efficacy of different statin brands at lowering cholesterol, but with a dubious link between those levels and preventing heart attacks. The measurements here have the same dubious link to audibility.

At the extremes, it does make sense and there are some benefits. Obviously anything that lowers cholesterol couldn't hurt, but at what price (i.e., side effects)? Obviously, brands that don't actually lower cholesterol while claiming to do so should be exposed, regardless of whether the lowering results in fewer heart attacks or not.

On the other hand, the encouraged misinterpretations are dangerous. Given the uncertain link to lowering heart-attack rates:

1. Does it matter if one drug lowers the levels by 40% while another does so by 50%?
2. Does it make sense to bash a multi-purpose drug that, amongst other benefits (controlling blood sugar and inflammation), lowers cholesterol by only 20% while the best statins do so by 50%? Shouldn't this multi-purpose drug be held to the same level of cholesterol efficacy as the best of the statins? Never mind whether the additional 30% gives any benefit at all.

And yet, that is exactly what the measurements and relative comparisons here are doing.
 

Martin

Major Contributor
Forum Donor
Joined
Mar 23, 2018
Messages
1,916
Likes
5,625
Location
Cape Coral, FL
The problem is that blind listening tests are still subjective; they are dependent upon personal opinion and not on measurable and verifiable fact.

Anything objective sticks to the facts, but anything subjective has feelings. Objective and subjective are opposites.

Objective: DAC A measures better than DAC B.

Subjective: DAC A sounds better than DAC B.

Martin
 
OP
M

Mikey

Member
Joined
Oct 27, 2019
Messages
33
Likes
48
Then we have a gray area where specifics matter. I've personally suggested that -70 dB THD and an SNR of 100 dB are enough to be inaudible, with some margin of safety already included. In between are some pretty wide gray areas.

Now I certainly think a DAC that is very expensive, claims superior sonics, and measures poorly (even if at inaudible levels) represents poor engineering along with poor value. Why pay $10K for a device that might barely squeak by without causing audible problems, when an excellent $500 device performs well enough to remove any question of audibility?

As I say, I think there are two scenarios here:

1. The sound is genuinely indistinguishable, because the measurements capture everything we need to know. In this case, aesthetics/price/etc are the only things you need to look at. Measurements beyond the audibility threshold are irrelevant.

2. The sound actually has distinguishing characteristics, because the measurements that have been performed don't capture everything we need to know (like how early amplifier distortion measurements failed to capture TIM, as this IEEE paper from 1977 explains, and so "great measuring" amps actually sounded empirically terrible).

But neither of these two scenarios is consonant with grading gear based on inaudible measurements. So why do that?

The reason for the site, I think, is as a counterpoint to sites that describe all these qualities between units from subjective evaluation alone. The measurements, by contrast, are hard-to-come-by information, with only a few other sources. So much high-end audio is based upon lies; this site attempts to give real data on the claims. A further evolution in the future may be to move toward what is needed for audible transparency. While a pass/fail list might be fine, I also like all the detailed measurements, especially of expensive gear that doesn't perform particularly well.

I think the "doesn't perform particularly well" thing is a psychological trap. It's easy to look at "better" numbers as being better, but in fact if they're already past the limit of audibility, then there's no meaningful difference between them. Distortion that's lower than other inaudible distortion is, according to what psychoacoustic science tells us, effectively the same. Both pieces of gear perform just as well, in the case where they both measure audibly perfect. It's an irrational part of our brain that's telling us that the lower number is "more inaudible" or otherwise better, right?

I'm not going to argue that it's not interesting to see the numbers -- I like numbers! I like graphs! But you have to be careful not to draw the wrong conclusions from things, and if everything is measuring inaudibly perfect, you shouldn't draw conclusions about "well engineered" or not, since it's all perfect.
 

MediumRare

Major Contributor
Forum Donor
Joined
Sep 17, 2019
Messages
1,959
Likes
2,289
Location
Chicago
The AES has done actual listening tests, and they've experimentally established limits to the audibility of THD, and they're much, much higher than we're dealing with here, particularly for 2nd order harmonic distortions.
I'm open to your argument, but your link does not support your statement. My reading is that it only very narrowly discusses low-bass THD, and starts from a chosen THD "maximum". Please provide a link that demonstrates clinical audibility of THD.
 
OP
M

Mikey

Member
Joined
Oct 27, 2019
Messages
33
Likes
48
The problem is that blind listening tests are still subjective; they are dependent upon personal opinion and not on measurable and verifiable fact.

So this is where I'll encourage you to read Floyd Toole's research, because what they found when they did blind tests with speakers is that, in fact, there wasn't a "subjective taste" aspect to it. When you blinded the device under test, so that nobody could vote for what they "knew" they liked, everyone actually ended up liking the same thing: speakers that measured flat in an anechoic chamber and had even dispersion patterns.

This is actually a sort of surprising outcome; given how many different kinds of speakers are out there, and how there are different tribes that adhere to different design philosophies (planars! "time and phase coherent" Dunlavy or Thiel designs! ribbons! electrostats! horns!), you might have expected individuals to have strongly divergent personal preferences. And given how low-end consumer speakers are set up to give a midbass boost and treble sizzle to impress on the sales floor, you might have expected people's preferences to go toward models that did that. But in fact: people almost universally had the same preferences, and those preferences were toward anechoic neutral speakers.

That's a major discovery in its own right (and one that gets under-appreciated, because it matches up to what a lot of people's intuition was beforehand, that oh yeah, of course flat is better).

Is the same thing true of DACs and amps? Or are things different there, with people actually falling into one of several preference camps? The answer apparently is "nobody knows," because the subjectivists don't care to do the blind tests, and too many of the objectivists have already convinced themselves without any evidence that they know which measurements, and which values on those measurements, should be preferred, so there's no point checking.
 

GrimSurfer

Major Contributor
Joined
May 25, 2019
Messages
1,238
Likes
1,484
The issue is clear to me.

@amirm, you have to start blindfolding your Audio Precision analyzer.

@Mikey: I think you're reading too much into blind testing. The purpose of blind testing is to reduce visual bias. The results are not objective. They are still based on "preference", which is a subjective quality. Increasing the sample size of the blind audience only reduces the chance that the results are skewed by individual preference.

When Toole talks about things like the Harman curve, he's highlighting the findings of blind tests. I am not aware of how he filters out the influence of past preferential experiences and how these impact current listener choices. Perhaps the reason people don't like a flat frequency curve is that we're not generally exposed to such things. Or it could be that our species, under its natural outdoor conditions, is wired to prefer greater attenuation of HF sound, as would happen in free space with no reflections, amid natural surfaces that absorb such sounds (leaves, grass, etc.). Regardless, Toole's findings with respect to the curve reflect human preference, and are not wholly objective because of the nature of the test subjects.
 
Last edited:

Julf

Major Contributor
Forum Donor
Joined
Mar 1, 2016
Messages
3,053
Likes
4,071
Location
Amsterdam, The Netherlands
The problem is that blind listening tests are still subjective; they are dependent upon personal opinion and not on measurable and verifiable fact.

Yes and no. A test that asks "does A or B sound better to you" is subjective, but "is X the same as A, or the same as B" is objective when you have enough tests to provide statistical significance.
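The "enough tests" part is an ordinary binomial significance calculation. A minimal sketch (the 12-of-16 session is a made-up example):

```python
from math import comb  # Python 3.8+

def p_value(correct, trials):
    """One-sided binomial p-value: the probability of scoring at least
    `correct` out of `trials` ABX presentations by pure guessing
    (chance of 1/2 per trial)."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

# 12 right out of 16 would happen by guessing less than 4% of the time,
# so at the usual p < 0.05 threshold we'd conclude X was identified.
print(round(p_value(12, 16), 3))  # → 0.038
```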
 

BDWoody

Chief Cat Herder
Moderator
Forum Donor
Joined
Jan 9, 2019
Messages
7,109
Likes
23,719
Location
Mid-Atlantic, USA. (Maryland)
I don’t see the OP as trolling. It is a justifiable position and something I share as well.

I was more talking about the second poster, who has since deleted his post and disappeared.

To the OP: As the new norm of great and ever-greater numbers settles in, taking time to actually figure out what might mean what is certainly worth thoughtful discussion. If I came across as dismissive, my apologies.

It's been like whack-a-mole lately with some less-than-thoughtful dissension.
 

Blumlein 88

Grand Contributor
Forum Donor
Joined
Feb 23, 2016
Messages
20,909
Likes
37,973
As I say, I think there are two scenarios here:

1. The sound is genuinely indistinguishable, because the measurements capture everything we need to know. In this case, aesthetics/price/etc are the only things you need to look at. Measurements beyond the audibility threshold are irrelevant.

2. The sound actually has distinguishing characteristics, because the measurements that have been performed don't capture everything we need to know (like how early amplifier distortion measurements failed to capture TIM, as this IEEE paper from 1977 explains, and so "great measuring" amps actually sounded empirically terrible).

But neither of these two scenarios is consonant with grading gear based on inaudible measurements. So why do that?



I think the "doesn't perform particularly well" thing is a psychological trap. It's easy to look at "better" numbers as being better, but in fact if they're already past the limit of audibility, then there's no meaningful difference between them. Distortion that's lower than other inaudible distortion is, according to what psychoacoustic science tells us, effectively the same. Both pieces of gear perform just as well, in the case where they both measure audibly perfect. It's an irrational part of our brain that's telling us that the lower number is "more inaudible" or otherwise better, right?

I'm not going to argue that it's not interesting to see the numbers -- I like numbers! I like graphs! But you have to be careful not to draw the wrong conclusions from things, and if everything is measuring inaudibly perfect, you shouldn't draw conclusions about "well engineered" or not, since it's all perfect.
I would discount #2 above. That might seem closed-minded, but if someone can come up with listening tests indicating otherwise, then we have something to work with.

The other part is again something of a gray area. I can put together a string of gear that might just be inaudible on its own, but adds up to being audible paired with other gear. So giving a DAC a pass just barely over a limit isn't a good idea without knowing how it will be used. Pass/fail is usually a very poor way to grade anything. It necessarily filters out useful information.

DACs are a particularly odd thing in that there is no reason not to have performance well beyond any chance of audibility at very low price levels. Give me a good reason to accept a DAC that doesn't manage 16-bit performance, even if it is inaudible, other than it being very, very, very cheap. And with a $9 Apple dongle managing the feat, we are talking really, really, really cheap. Since it is so easily attainable, I'd say DACs should reach 16-bit performance or not be recommended. I'd probably be okay with that as a pass/fail, even as much as I dislike pass/fail evaluations.
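The "16-bit performance" bar works out to roughly 98 dB of ideal dynamic range, via the standard 6.02·N + 1.76 dB rule for a full-scale sine:

```python
import math

def ideal_dynamic_range_db(bits):
    """Ideal SNR of an N-bit quantizer driven by a full-scale sine:
    20*log10(2**N) + 1.76, i.e. roughly 6.02*N + 1.76 dB."""
    return 20 * math.log10(2 ** bits) + 1.76

print(round(ideal_dynamic_range_db(16), 1))  # → 98.1
print(round(ideal_dynamic_range_db(24), 1))  # → 146.3
```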

Amps and speakers are a different matter. There is no good speaker testing available on ASR at this time. Amps, we might have a useful discussion on where to place the bar. Even then, if brand A costs half of brand B while having superior performance and identical features, why would you choose brand B?
 

DonH56

Master Contributor
Technical Expert
Forum Donor
Joined
Mar 15, 2016
Messages
7,937
Likes
16,773
Location
Monument, CO
This is going to seem a little trolly, because it is a basically existential criticism of this site.

I don't really "do" existential and others are handling the counter-argument.

I do object to this:

Audio objectivists have spent too long looking where the light is; it's time to start looking in the bushes.

Anyone who has been in this business for any length of time, or for that matter any sort of engineering, knows that the opposite is more likely to be true. Looking under the bushes, digging in the dirt to see the roots, and making sure everything is perfect in the dark and not just the light of day is what designers do. The "forest vs. trees" analogy might be more apt.

Preferences are easily swayed, some call it "marketing", and the best of engineering often cannot compete. Still, I cannot convince myself to give less than my best on any project. And, having worked around many other engineers for several decades, IME I am not at all alone.
 

solderdude

Grand Contributor
Joined
Jul 21, 2018
Messages
16,150
Likes
36,845
Location
The Neitherlands
The only advice I can give Mikey is:

Do your own blind tests, and do them properly.
The last part is the hard part... doing it properly.
Only rarely do people go through the trouble.
AND
Do many of the audibility tests and make your own tests if needed.
Learn to understand all measurement types and what each type offers in terms of specific parts of total performance.
Understand that measurements in the electrical and acoustical plane differ immensely.

That will teach you far more than reading all the research done by others.
I did just that (for over 25 years) and for that reason can easily say what matters to me and what does not.

Don't just believe what is said on both sides... investigate and educate yourself... then form an opinion.
 

GrimSurfer

Major Contributor
Joined
May 25, 2019
Messages
1,238
Likes
1,484
For the most part, consumer audio isn't failing customers on the margins. It's failing to demonstrate mastery of the basics. What @Blumlein 88 and @DonH56 highlight in their posts is much more productive than looking for things that might not be there.
 
OP
M

Mikey

Member
Joined
Oct 27, 2019
Messages
33
Likes
48
@Mikey: I think you're reading too much into blind testing. The purpose of blind testing is to reduce visual bias. The results are not objective. They are still based on "preference", which is a subjective quality. Increasing the sample size of the blind audience only reduces the chance that the results are skewed by individual preference.

This is my fundamental disagreement. I think that the blind test is the thing that matters, because the ultimate goal of an audio playback chain isn't to make an analyzer happy, it's to make humans happy. The analyzer is only a proxy for what humans like, and it only has value if its measurements can be correlated in a meaningful way against human preferences.

If you're not doing that human-perception check on it, you can't actually know whether you're measuring the right thing. Look at the example of TIM in the early solid-state days, as I linked above, for instance: By using gallons of negative feedback, manufacturers were able to create amps that measured amazingly, but sounded bad; later on, they discovered that a thing they hadn't been measuring was correlated to that heavy use of negative feedback, and that's what was causing it to sound bad.

If you didn't take seriously the human-perception check there, you'd just go forward, making things that "measured great" via measurements that didn't capture all the relevant factors, but that were actually pretty bad. And if you were to castigate "subjectivists" for not liking your "great measuring" stuff, you'd be the one who ultimately had egg on your face, not them.
 

Julf

Major Contributor
Forum Donor
Joined
Mar 1, 2016
Messages
3,053
Likes
4,071
Location
Amsterdam, The Netherlands
This is my fundamental disagreement. I think that the blind test is the thing that matters, because the ultimate goal of an audio playback chain isn't to make an analyzer happy, it's to make humans happy. The analyzer is only a proxy for what humans like, and it only has value if its measurements can be correlated in a meaningful way against human preferences.

Agreed.

And if you were to castigate "subjectivists" for not liking your "great measuring" stuff, you'd be the one who ultimately had egg on your face, not them.

Yes, if that was what you castigated them for. Most of us here have no problem with their subjective preferences, only with their justifications for those preferences, and with their rejection of any attempt to verify them objectively.
 

FrantzM

Major Contributor
Forum Donor
Joined
Mar 12, 2016
Messages
4,392
Likes
7,915
Hi

Watching football... thus I will have to come back to this interesting thread. I'll say this in the meantime:

There is a degree of education attached to any hobby...

Many prefer the cheap, sugary Midnight Express wine to whatever "great" wine you can think of... Some people prefer McDonald's to the most sophisticated burger in Kansas City or whatever joint connoisseurs claim to be the best... it's just that: taste... preferences...
What is wrong with audio (but not limited to audio) is the way it describes its scale of preference with this false equation:

More expensive = "More" Better

Apologies to William Shakespeare for butchering his mother tongue :)

High-end audio also perpetrates things that are utter lies and easily debunked: stupidities like audiophile Ethernet switches, audiophile Ethernet cables, and audiophile USB cables. Once this adjective ("audiophile") is used in front of a product, you should expect prices that can have no bearing on performance... abominations such as $10,000 Ethernet or USB cables. That is plain wrong.
Note that wine goes down the same path... No wine aficionado will accept that, blind, he may well find a $10 wine more to his taste than a $1000 one...
 
Last edited: