
Does what we hear correspond to what we measure?

Which one do you prefer?

  • N° 1: 5 votes (31.3%)
  • N° 2: 11 votes (68.8%)
  • Total voters: 16
Status
Not open for further replies.
It's extremely difficult to have a serious conversation at this point.

The OP says he wants to learn, but isn't taking the constructive feedback from other members seriously. That causes the feedback to harden into criticism, and as it is repeated over and over it gets interpreted as personal. It's unfortunate. It plays out all the time here.

A few points have been clearly laid out:
  • The need for blind testing.
  • The need for controls, including evaluation files that are clean of artifacts unrelated to the device being tested, and level matched, to name just a few of the issues.
  • The vast difference between a test for difference and a test for preference.
  • The need to understand sample statistics, which is a field unto itself and the source of many misinterpretations of studies.

I think it would help if the OP acknowledged these points and took the feedback seriously. It doesn't seem that will happen, though; there have been way too many flippant responses.
 
You are vague. ;)
I meant to say: which proves that it's all a question of interpretation. (It's not easy to make good translations.)
 
Sorry, but you're judging me wrongly. I readily admitted that I presented and handled things too carelessly.
I've acknowledged the comments, even if you take the opposite view. You only have to reread the thread to see that.
 
I am perhaps just not understanding some of your posts. Apologies for that.
I have read the thread enough.
 
As I said, perhaps the translation from French to English is sometimes not exactly what I really want to express. Apologies for that.
 
The title is:
Does what we hear correspond to what we measure?

As you can see, there is a "?" at the end, so it is a question and not a statement.
Betteridge's law of headlines is an adage that states: "Any headline that ends in a question mark can be answered by the word no."

Based on Betteridge's law, the title you intended to use was: "What we hear does not correspond to what we measure".

 
While the null hypothesis of "no preference" cannot be rejected (p=0.291), I do think there are some statistically significant conclusions that I found very interesting. Since the first sample was louder, you would expect it to be chosen more often (all else being equal). If you put that likelihood at 75%, the p-value suddenly drops to a strongly statistically significant 0.006. This provides meaningful evidence that the quality difference between the samples (e.g., the excessive compression in sample 1) matters as much as, or more than, the volume difference. This certainly encourages me to look harder at the dynamic range database before purchasing.
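The reasoning above amounts to an exact binomial test on the poll counts. A minimal, standard-library-only sketch of that calculation follows; the vote counts (5 vs. 11 of 16) come from the poll, while the 75% "louder sample wins" prior is the post's assumption. The quoted p-values (0.291, 0.006) may come from a somewhat different test formulation, so this shows the approach rather than reproducing those exact numbers.

```python
# Exact binomial test sketch for the poll result (assumed counts: 16 voters,
# 11 for sample 2, 5 for the louder sample 1; the 75% prior is hypothetical).
from math import comb

def binom_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

n_voters = 16
votes_sample2 = 11                      # quieter, less-compressed sample
votes_sample1 = n_voters - votes_sample2

# Null 1: no preference (p = 0.5). One-sided: how often would sample 2
# get at least 11 of 16 votes by chance alone?
p_no_pref = binom_tail(votes_sample2, n_voters, 0.5)

# Null 2: the louder sample 1 wins 75% of the time. One-sided: how often
# would it then get at most 5 of 16 votes?
p_louder = 1 - binom_tail(votes_sample1 + 1, n_voters, 0.75)

print(f"P(>=11 of 16 | p=0.50) = {p_no_pref:.3f}")
print(f"P(<=5  of 16 | p=0.75) = {p_louder:.5f}")
```

The first tail is not significant at any conventional level, while the second is, which is the qualitative pattern the post describes.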
 
Random-sampling statistics don’t apply to this study: it was sighted, so there was nothing random about it.
 
I agree that there are methodological problems, so the results should be viewed skeptically, but I don't think they're worthless. If anything, I think the discussion about sample 1 being louder would have created confirmation bias in its favor.
 
I don't know how to interpret the results of a small, poorly constructed, and unserious preference study. It was sighted, so there's no way it is valid. Please read the first chapter of Toole's book; it discusses the overarching problem of sighted studies. That alone disqualifies it from being useful, and makes it potentially misdirecting. Can we answer why people appeared to prefer the louder track? Was it the compression? Was the preference even real, or a bogus result of the botched study? I can't tell.

If you grabbed a dozen people off the street and poured them glasses of Coke and Pepsi, you are likely to find what seems to be a preference. If done blind in controlled settings the difference in preference becomes very small. If you don't control for temperature, you will skew the results (which was actually done in some Coke v Pepsi trials). Do I find value in these types of studies? No I do not, they really do lead us to think there are large differences when there are none, or even inverting the actual preference results.

And the differences in the files appear to be due to mishandling by the investigator, not to a Bluetooth transceiver. What are we even preferring? With or without hi-hat? Compression? Loudness? Channel balance?

Also, there was at least one troll; no idea how he voted, but not likely in a way that allows random statistics.

No, not worthwhile; every aspect of the test was botched. In fact, preference studies are massive undertakings. Even a test for difference would need to be done far more rigorously.

The results are worthless.
 

This also means that the answer can be "yes" ;)
 
Your rigorous approach does you credit, but this little test did not claim to be a "major study." It was primarily a starting point for a discussion about a Bluetooth receiver. I will not go back over what has been said, often with accuracy and technical rigor. I have taken the comments into account to plan a set of measurements, this time made more rigorously, even though I know there will be points that will be discussed.

I did not vote because, knowing the files, my answer would be biased.

But I will say that these recordings, listened to separately, are perfectly listenable; you would not say to yourself "that's horrible."
One point should be noted, though: voluntarily making a comparison necessarily implies a choice, even if that choice is a "non-choice."
 
It always is "No". That is the singular and whole point of Betteridge's Law.

An editor wants people to buy their newspaper and they know that publishing a story about a famous actor on the front page will sell more copies. But they have no evidence of anything. So they publish. "Did famous actor take drugs at party?". It gives them plausible deniability.
 
Studies of preference in music reproduction are 'major studies', not 'little tests'. And your test was in no way a test of a BT device since the files were botched. Yet we still have people coming to your thread finding bogus significance. TBH you do need a rigorous approach.

The fact that the recordings are listenable has nothing to do with the fact that the files were messed up for purposes of a study of preference.
You clearly misunderstand what I meant by 'botched'. Same as you misunderstanding what 'level' meant on the first page of this thread.

Also, music is my hobby and I enjoy playing and listening to music, writing endlessly about it not so much. I think I have spent enough time here writing to you. It would be one thing if I got the impression you were actually learning which would be rewarding, but going in circles with you isn't any fun at all.
 
From Wikipedia: "Any headline that ends in a question mark can be answered by the word no." This is different from "Any headline that ends in a question mark must be answered by the word no." "Can be" leaves open the possibility of answering yes.
 
I honestly don't really know where you want to go with this, or even what you would like me to admit as an unforgivable fault, but I just think you are making this little test into something far too important. Certainly a difference in vision linked to the ocean that separates our two continents.
 
I think that above all, you should take the listening test and you will get your answer.
The total number of voters is still 16, so the sample size may be too small.
What kind of answer would we get if the sample size reached, say, 100?
Or how would the effort and time of 100 members be useful to the Science of How We Hear subforum?
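To put a rough number on the sample-size question: if (hypothetically) the same roughly 69% split held with 100 voters, the no-preference null would be firmly rejected. A stdlib-only sketch, using the poll's actual 11-of-16 count and an assumed scaled-up 69-of-100:

```python
# Effect of sample size on the exact binomial tail at the same proportion.
# The 69-of-100 figure is a hypothetical scaling of the poll's ~69% split.
from math import comb

def binom_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Current poll: 11 of 16 (~69%) prefer sample 2 -- not significant.
p_small = binom_tail(11, 16, 0.5)

# Hypothetical: the same proportion with 100 voters (69 of 100).
p_large = binom_tail(69, 100, 0.5)

print(f"n=16:  P(>=11 | p=0.5) = {p_small:.3f}")    # about 0.105
print(f"n=100: P(>=69 | p=0.5) = {p_large:.6f}")    # well below 0.001
```

In other words, the proportion alone says little at n=16; whether 100 voters' time would be well spent on a sighted test is, of course, the separate methodological question raised above.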
 