
How we could finally pin down flowery audiophile subjective descriptions

OP
kemmler3D

Addicted to Fun and Learning
Joined
Aug 25, 2022
Messages
597
Likes
1,165
Location
San Francisco
But I am. I addressed it in a recent post:

You can't standardize subjectivity, so this is a pointless exercise.


The idea is to measure how often people (in general, large groups of people, not individuals) use subjective terms in relation to specific objective measurements, so that when someone can't or won't refer to measurements (which is very often, as you know), we could still make a probabilistic / educated guess as to what they actually heard.

What's the problem with that?
 

MattHooper

Major Contributor
Joined
Jan 27, 2019
Messages
4,434
Likes
7,099
You can't standardize subjectivity, so this is a pointless exercise.

Dictionaries would like a word with you...

Look up: "Bitter" "Sweet" "sour" "savory"...

(Oh, I'm sure the dictionary holds plenty more standard terms that arose from how things cause different subjective perceptions.)
 

Jim Taylor

Major Contributor
Joined
Oct 22, 2020
Messages
1,561
Likes
3,367
What's the problem with that?

The problem is twofold:

1) There are outliers from the statistical center. In a case such as this, I would expect a great number of outliers. Is the person to whom you refer an outlier, a statistical exception? You have no way of knowing. Unfortunately, neither does that person.

2) I believe it is not good to indulge in a "guess" ..... even one driven by probabilities. Guesses are like trying to play ping-pong in the dark; you might get a hit, and then again you might get a wild miss. Much better to play a game in the light of day.

For centuries, accurate translation has been valued highly. People know that guesses as to meaning can lead to serious mistakes. I'm not just talking about translation between different languages, but also within a language. How many times have you heard someone say, "Whadya mean by that?" or "I don't get where you're coming from." or, more plainly, "I don't understand what the heck you're trying to say!"
Probabilities and guesses are NOT conducive to accuracy.

As for someone who can't or won't refer to measurements: If they can't, then I admit that the problem gets more difficult. However, if they can but won't, then that is a problem of an entirely different nature, and statistical probability won't help.

Jim
 
OP
kemmler3D

Addicted to Fun and Learning
Joined
Aug 25, 2022
Messages
597
Likes
1,165
Location
San Francisco
Well, look at it like this. Currently we have lots of people out there listening to speakers and delivering nothing but subjective reviews, including a good handful of "professionals". Year in, year out, we get reviews of potentially interesting speakers with no objective data. It feels like a big waste.

I think most of the people in this thread would agree that these subjective reviews are worthless or nearly so. However, sometimes the only reviews available for a given piece of equipment are subjective.

So, we do this: We take everything a certain reviewer has ever written about speakers, and correlate the language they use to objective measurements where available.

This would yield a model where, if said reviewer uses the term (say) "strident", we could (for example) find that it corresponds to a 92% probability of excess energy in the 7-9 kHz range.
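
As a rough illustration of how that estimate could be computed (a minimal sketch - the toy dataset, the term, and the measurement-derived flag are all hypothetical):

```python
# Toy sketch: estimate P(measured trait | reviewer used a term).
# Each row pairs one review's text with a boolean derived from
# measurements, e.g. "more than 3 dB excess energy in 7-9 kHz".
reviews = [
    ("strident and fatiguing on brass", True),
    ("smooth, relaxed treble", False),
    ("a touch strident at high volume", True),
    ("neutral and effortless", False),
]

def p_trait_given_term(term, rows):
    """Relative frequency of the measured trait among reviews using `term`."""
    hits = [flag for text, flag in rows if term in text.lower()]
    return sum(hits) / len(hits) if hits else None

print(p_trait_given_term("strident", reviews))  # -> 1.0 on this toy data
```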

Such a tool could also be used to evaluate the credibility of subjective reviewers. If the ML model finds that their use of terms is very consistent across speakers (for example, they always use the words "buttery", "smooth", or "round" to describe a bass boost of +2 dB or more in the 55-80 Hz region), we can put more stock in those reviews, since we will know that their use of certain words consistently refers to real, specific variations in performance.
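
A minimal sketch of such a consistency score, assuming we have already reduced each review to "used the word or not" and each measurement to "shows the trait or not" (all data below is invented):

```python
import math

def term_consistency(term_used, trait_present):
    """Matthews correlation between a reviewer's use of a term and a measured
    trait across their corpus: +1 means the word always tracks the trait,
    0 means it is no better than chance."""
    pairs = list(zip(term_used, trait_present))
    tp = sum(1 for u, t in pairs if u and t)
    tn = sum(1 for u, t in pairs if not u and not t)
    fp = sum(1 for u, t in pairs if u and not t)
    fn = sum(1 for u, t in pairs if not u and t)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Did "buttery" track a +2 dB 55-80 Hz boost across six reviewed speakers?
print(term_consistency([1, 1, 0, 0, 1, 0], [1, 1, 0, 0, 1, 0]))  # -> 1.0
```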

Conversely, it could be used to discredit reviewers who just make s*** up for their reviews. If their use of language correlates poorly or not at all to objective measurements, we can show (objectively) that their reviews are meaningless.

To me, this would be a little bit useful and very interesting. I'm not really proposing more than that, a mostly-just-interesting study.

It seems that people in this thread are arguing such a tool or study would be worse than nothing, which is odd to me, but I'm still considering the possibility I haven't made the concept clear.

You could also use this technique to find trends in how people in general use these words, but maybe that's less interesting or useful. I think it would work for any corpus where there is enough use of flowery language. That could be "people in general" or for long-standing reviewers, it could be one person.
 

Jim Taylor

Major Contributor
Joined
Oct 22, 2020
Messages
1,561
Likes
3,367
kemmler3D said:
Well, look at it like this. Currently we have lots of people out there listening to speakers and delivering nothing but subjective reviews... [full post quoted above]

I understand what you're saying, and I agree that your goal is commendable. I simply doubt that you could get any reasonably useful percentage of correlation (much less something in the 90-ish category). That's why I brought up the concept of the outliers, and expressed doubt about the percentage of outliers in relation to the statistical core.

Not only that, but this whole idea seems to be a crutch that would, for many, many people, destroy any interest in or reliance on measurements. A subjectivist regimen of this kind would seem to refute the idea that logical, scientific work has superior value, because it would place subjectivism on an equal level of accuracy with scientific measurements. And that is just downright wrong.

Why should we advocate using a crutch instead of teaching people to walk on their own two feet?

Jim
 
OP
kemmler3D

Addicted to Fun and Learning
Joined
Aug 25, 2022
Messages
597
Likes
1,165
Location
San Francisco
I simply doubt that you could get any reasonably useful percentage of correlation (much less something in the 90-ish category)
If so, then so much the better: we would have data showing that subjective reviews are unreliable, and HOW unreliable they are.

this whole idea seems to be a crutch that would, for many, many people, destroy any interest in or reliance on measurements.
I don't know, it doesn't work at all without the objective measurements, so you could just as well argue that this could be the final subjugation of subjectivity to objectivity! Just think - people would skip reading the review and just wait for the "ML scoring model summary" of the review to come out on ASR. Then if the scores look good, they petition Amir to measure the device for confirmation.

In retaliation, the subjective reviewers would either start publishing measurements to stay in the game, or double down and switch up their vocabulary to throw off the model, which would backfire and kill off their remaining traffic / readership.

At that point, everyone is either publishing measurements, or has become irrelevant once and for all.

Since this site is pretty objectivist-supremacist, I think this idea should be very popular, actually! ;)
 

Galliardist

Addicted to Fun and Learning
Joined
Jun 26, 2021
Messages
741
Likes
892
Location
Sydney, NSW, Australia
Yes, I would limit this to descriptions of speakers, to at least avoid measuring known-zero differences...
But are they zero differences?
I can think of a few things that can cause a difference to be described, and I'll list some scenarios:

1) A difference is audible, but not the difference the reviewer believes they are describing
2) A difference is not audible, but described - the reviewer "hears" and describes the effect of non-audible changes as an audible one
3) A difference is audible, but the reviewer describes it incorrectly, maybe because a non-audible feature is registered as the reason by an uncertain brain
Conversely, it could be used to discredit reviewers who just make s*** up for their reviews. If their use of language correlates poorly or not at all to objective measurements, we can show (objectively) that their reviews are meaningless.
How many reviewers do that though? Their brain processes the input from the ears, along with whatever other inputs there are and whatever experience is deemed relevant - across several centres in the brain working on different aspects of the music, by current accounts - and creates the experience that is described as "hearing" particular aspects of the "sound".

Then again, is the review necessarily meaningless even then? We keep concentrating on the soundwaves. But maybe the bigger speaker, or the one with the traditional look, or the famous brand name, is seen as better, and it comes out as a difference in the sound once the brain's finished its processing.

The thing is, this subjective model has been put to the test, albeit informally, by hundreds of thousands of readers down the years. Sure, for some readers it has failed, and these days those people may turn up here. But there must be a lot of cases where the reviewer's opinion has more or less matched that of a lot of readers that have auditioned and bought the same product.

So the subjective model is doing something. Maybe it would be more scientific to understand what it is doing, than to condemn it for not matching up with objective measurements (if indeed it doesn't, for whatever boundaries we put around what the sound waves are doing).
 

Jim Taylor

Major Contributor
Joined
Oct 22, 2020
Messages
1,561
Likes
3,367
In retaliation, the subjective reviewers would either start publishing measurements to stay in the game, or double down and switch up their vocabulary to throw off the model, which would backfire and kill off their remaining traffic / readership.

I most assuredly hope that you are correct, and everything would pan out that way. That would be wonderful! However, it's not prudent to underestimate the power of the Dark Side. :D

To be clear ..... I'm not suggesting that you shouldn't go ahead and try your analysis .... not at all. I just have my doubts. :)

Jim
 
OP
kemmler3D

Addicted to Fun and Learning
Joined
Aug 25, 2022
Messages
597
Likes
1,165
Location
San Francisco
How many reviewers do that though?
Quite a few if you ask the typical ASR poster.

Maybe 'make stuff up' isn't quite right, but the people who are constantly hearing major improvements from using different cables or ethernet switches are definitely just reporting on their subjective experience of the placebo effect, not actual audible changes in their system's output. For some product categories, there is simply no doubt to give the benefit of.

Expectation bias has a very well-known and understood effect on hearing. Fancy cables, ethernet switches, and random blocks of wood have no measurable effect on audio in general. And for this idea, if it's not measurable, it's not usable as input, even if we wanted to use it.
But maybe the bigger speaker, or the one with the traditional look, or the famous brand name, is seen as better, and it comes out as a difference in the sound once the brain's finished its processing.
Certainly, and actually I would not argue that subjective reviews that are technically about nothing more than the look of the equipment are worthless. They enable people to enjoy music (even if it's on a questionable basis) and they keep hi-fi manufacturers in business. "But they're WRONG!!" is a common feeling around here, but at the end of the day, homeopathic audio solutions aren't just fraud, they're effective placebos. If you hear an improvement that isn't actually there, well, you still hear it, and that's what counts.

However, I'm more interested in whether there is any objective insight to be gained from a subjective review.
So the subjective model is doing something. Maybe it would be more scientific to understand what it is doing,
Indeed, that's the real goal of this proposed idea: to brute-force a link between measurable performance and subjective description.
condemn it for not matching up with objective measurements (if indeed it doesn't, for whatever boundaries we put around what the sound waves are doing).
If there are any objective correlations between the actual measured sound and what people say, then we can be comforted in the fact that there is real value in these reviews.

If the terms used in reviews turn out to be indistinguishable from random, we have two possibilities left:

1) The reviews are hogwash and should be ignored, unless you want someone to talk you into buying gear for no reason whatsoever

2) The reviewers are actually hearing something that is not measured. We could infer this by finding that two or more subjective reviewers' terminology matches closely but doesn't correlate with measurements. So either they are collaborating (not unlikely) or they're actually hearing something we don't measure (much less likely) - see the sketch below.

Both outcomes would be valuable info, I think!
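
For what it's worth, that decision logic is easy to prototype. A toy sketch with made-up thresholds (real work would need calibrated statistics and far more data):

```python
def agreement(xs, ys):
    """Fraction of speakers on which two boolean sequences agree."""
    return sum(x == y for x, y in zip(xs, ys)) / len(xs)

def classify(term_a, term_b, measured):
    """term_a / term_b: per-speaker flags, did reviewer A / B use the word.
    measured: per-speaker flag from measurements (e.g. 7-9 kHz excess)."""
    if max(agreement(term_a, measured), agreement(term_b, measured)) > 0.8:
        return "terms track the measurements"
    if agreement(term_a, term_b) > 0.8:
        return "reviewers agree with each other, but not with measurements"
    return "indistinguishable from random"

print(classify([1, 1, 0, 0], [1, 0, 0, 1], [0, 0, 1, 1]))
# -> "indistinguishable from random"
```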
 

Galliardist

Addicted to Fun and Learning
Joined
Jun 26, 2021
Messages
741
Likes
892
Location
Sydney, NSW, Australia
kemmler3D said:
If there are any objective correlations between the actual measured sound and what people say... [full post quoted above]
There's a third option at least. The reviewers are reacting to some other feature of their experience of the system. For example, the name on the boxes.

And a fourth. A reviewer may be confused about the experience, decide that something is a bit different in some way, and rationalise it. Yes, that is "making stuff up", but what editor will let a reviewer come to the conclusion "haven't the faintest"? So a conscious conclusion, based on what might be expected, fits the bill.
 
OP
kemmler3D

Addicted to Fun and Learning
Joined
Aug 25, 2022
Messages
597
Likes
1,165
Location
San Francisco
Galliardist said:
There's a third option at least... And a fourth... [quoted in full above]

If there is no correlation between the gear itself and the language used (I guess brand could be used as an input variable, why not), then it's hogwash. I would definitely expect a correlation between brand and subjective impressions, though; you're right.

Post-hoc rationalization of random placebo-type impressions won't show up in a statistical analysis, though, so I don't think there is a real option 4.
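
If brand were folded in as an input variable, a regression with both a measurement feature and a brand dummy would show which one the language actually tracks. A sketch with invented toy data (scikit-learn assumed to be available; the feature names are illustrative):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Columns: [excess_7_9_kHz, brand_is_prestige]; target: used "strident"?
X = np.array([[1, 0], [1, 1], [0, 0], [0, 1], [1, 0], [0, 1]])
y = np.array([1, 1, 0, 0, 1, 0])

model = LogisticRegression().fit(X, y)
# A large first coefficient means the word tracks the measurement;
# a large second one means it tracks the badge on the box.
print(model.coef_)
```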
 

soundcheck

Member
Joined
Apr 28, 2020
Messages
11
Likes
10
Many of us throw up our hands when equipment is described as "fast", "slow", "crisp", "warm", etc. It seems impossible to relate these terms to measurable characteristics.

You nailed it.

First off...
People need to accept that there are characteristics in sound that are not properly described by the commonly used measurements. And "people" includes Amir!

Audio Precision openly stated that their measurement gear won't cover the full audible "signature" (spectrum). It would require more and different measurements to explain what we hear. Their AP devices are best used for quality assurance in production processes. That's what I recall was said.

Note:
I had discussions with manufacturers in the past. We agreed that measurable improvements - all in the "inaudible" arena - which usually require numerous product design enhancements, can also have an indirect impact on the sound signature of a product. All kinds of parts get changed, software gets changed, and so forth. What's really the root cause of an audible change in the final device usually remains unknown. If the device sells well for that very reason - a perceived better sound - who cares what's causing it!

Now. Fact is.

There are audible differences, even if standard measurements suggest to some people that there are none.
It's been proven a million times, and not just by the "audiophile golden ears" out there.
Manufacturers, professional reviewers, audio professionals, more or less experienced users... they can all hear it.
Everybody can - even via poorly recorded YouTube videos you can tell fuse A from fuse B, cable A from cable B, and so on.

Now. For all the "scientists" over here.

Empirical evidence is fully accepted in science. So. Yes. Listening experiences can prove a certain matter. That's not the issue.

The problem starts with people trying to explain it, or being forced to explain it. That usually turns out to be very "objective", and the terminology used gets adventurous.

Anyhow. As a "real" scientist you'd listen to all these people.
You'd keep looking into the subject until you see it yourself. Simply ignoring the facts is the worst thing a scientist can do.

To quote sciencealert.com:

The issue is that when it comes to facts, people think more like lawyers than scientists, which means they 'cherry pick' the facts and studies that back up what they already believe to be true.

So if someone doesn't think humans are causing climate change, they will ignore the hundreds of studies that support that conclusion, but latch onto the one study they can find that casts doubt on this view. This is also known as confirmation bias, a type of cognitive bias.

Yep. Cognitive bias. That's what we find a lot in the audio realm, over here and elsewhere.
And I'd extend the above ScienceAlert quote "...they already believe to be true..." with "...and/or they want others to believe to be true..."

Now. It has even been proven numerous times by measurements that there are differences beyond the standard measurements.
One approach was to compare a real music sample with its loop-back recording - not just a test tone!
It was done by DAC and ADC looping. No ears involved! The tool used was AudioDiffmaker.
You can find several gear test results over at gearspace.com

Each of these tests showed differences for a different device on the recorded sample file. And here we are talking about the audible part - differences of several dB!

The key parameter over at gearspace is named "correlated null depth". It shows how close the recording gets to the source.
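
For the curious, the core of such a null test is simple to sketch in numpy: time-align, level-match, subtract, and report the residual in dB. This toy version ignores clock drift and the other corrections the real tool performs:

```python
import numpy as np

def null_depth_db(source, recording):
    """Align, level-match, subtract; residual relative to source, in dB."""
    # Crude alignment via cross-correlation.
    lag = np.argmax(np.correlate(recording, source, "full")) - (len(source) - 1)
    rec = np.roll(recording, -lag)[: len(source)]
    gain = np.dot(rec, source) / np.dot(rec, rec)  # least-squares level match
    residual = source - gain * rec
    return 10 * np.log10(np.mean(source**2) / np.mean(residual**2))

rng = np.random.default_rng(0)
src = rng.standard_normal(4000)                 # stand-in for a music sample
rec = 0.5 * np.roll(src, 100) + 1e-4 * rng.standard_normal(4000)
print(round(null_depth_db(src, rec), 1))        # around 74 dB on this toy data
```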

This approach also works on transport optimizations. We used it over at the Slim Devices forum more than 10 years ago to show that my Squeezebox Touch Toolbox (Linux optimizations) was impacting the sound signature - and it clearly did, by the way!

AudioDiffmaker is (was?) just one approach. To me it simply shows that if you start looking for evidence, you'll find it.

Amir could take that tool and run some tests with it. It might turn out to be a very useful step ahead. It might turn out to be useless. Who knows.

The better we know what we're talking about, the better we can describe it.
Until then the language around it remains flowery. Which is not that bad - many people understand what "crisp" is when they hear it.
You simply need to learn a new language. ;)

Enjoy.
 

Axo1989

Addicted to Fun and Learning
Joined
Jan 9, 2022
Messages
999
Likes
838
"7. Wovon man nicht sprechen kann, darüber muss man schweigen."

Literal.

I'm not following: are you saying you can't see the difference between "one cannot speak" and "whereof one cannot speak"?

We should defer to measurements when performance assessment is the goal. A lot can be done outside of that to speak better, to be more informative, but the object of description is phantasmagorical. It asks for precision and study. It resists casualness.

I agree performance assessment is one goal of a review. But we also want to know "what does it sound like" and other things.

Are you using a translation app or is English your second language? No offence intended obviously but your syntax is unusual. When you say the object of the description is phantasmagorical do you mean that the reviewer's objective is to produce a dreamlike/unreal/fantastic/illusory text? Or are you referring to description generally? Or do you mean something else?
 

Galliardist

Addicted to Fun and Learning
Joined
Jun 26, 2021
Messages
741
Likes
892
Location
Sydney, NSW, Australia
soundcheck said:
There are audible differences, even if standard measurements suggest to some people that there are none... [quoted in full above]
The problem is this: if you eliminate differences by level matching between devices where no audible difference is predicted by measurements, and test blind, the result is that no audible difference is heard. It's somewhat "by definition", but it's strong evidence that we don't have to allow for anything else.

If you can produce a repeatable, properly done DBT where that hypothesis breaks down and a difference is heard, bring it here and we can talk. With YouTube videos and discussions of sighted subjective tests with no rigour or control, we have no proper evidence that the sound waves in the room have changed.
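
And on the scoring side, "properly done" is not exotic: for an ABX run, an exact binomial test tells you whether the listener beat guessing. A minimal sketch (scipy.stats.binomtest would do the same job):

```python
from math import comb

def p_value(correct, trials):
    """One-sided P(X >= correct) under pure guessing (p = 0.5 per trial)."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2**trials

# 12 of 16 correct: p ~ 0.038, so guessing alone is an unlikely explanation.
print(round(p_value(12, 16), 3))
```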

If it's been proven - properly proven - a "million times", you and any other subjectivist should be able to bring that proper proof here. I haven't seen it, and nor have the genuine experts here. The evidence always seems to be anecdotal, though, doesn't it?
 

solderdude

Grand Contributor
Joined
Jul 21, 2018
Messages
13,007
Likes
29,326
Location
The Neitherlands
I made an attempt to map some of the most-used words to a technical explanation here.
Of course, I could add some other flowery ones as well, but I would have to "understand" what is meant by the words.
 

Jim Taylor

Major Contributor
Joined
Oct 22, 2020
Messages
1,561
Likes
3,367
soundcheck said:
Now. Fact is. There are audible differences, even if standard measurements suggest to some people that there are none... [quoted in full above]

As has been suggested over and over in this forum, use a double-blind test. So-called "facts" can disappear.

Simply ignoring the facts is the worst thing a scientist can do.

No. The worst thing a logical person can do is give credence to fashionable foolishness that has been disproven.


Jim
 

Jim Shaw

Addicted to Fun and Learning
Forum Donor
Joined
Mar 16, 2021
Messages
557
Likes
1,006
Location
Northeastern Ohio, USA, in the woods
Curious: How do you think words get into dictionaries in the first place?

And does some level of imprecision render words useless?
"smooth" "sharp" "dull" "sweet" "bitter" and on and on?

Would you plead ignorance if anyone used such terms... or any of countless such examples... because those words are not measurements, or don't come with measurements, and unquantified language represents such a subjective morass that it's just useless?

How many measurements do you see in dictionary definitions?
Lexicographers use resources to standardize the meaning of words, so others can use those words with some level of consistent meaning.

A subjective morass is by most standards useless, though it may be entertaining. As an example, I love limericks. But they have little place in describing music.

There once was a troll from Audiometry
Who never could figure music symmetry.
He spoke with disdain
Of all who explain
With the slightest amount of civility.


It would be absurd to use a language that doesn't allow some imprecision. (You may be trying too hard to find something wrong with definitions.)

Measurements are comparisons of information with relatively stable standards. Dictionaries (every entry) are full of such measurements. Have you not noticed this?
;)
 