Master Thread: Are measurements Everything or Nothing?

Newman

Major Contributor
Joined
Jan 6, 2017
Messages
3,598
Likes
4,469
I'll defend my comment. As far as I understand it, AMR was selling audiophile fuses when Thorsten Loesch was listed as being in charge of design, and I believe my claim that he was a "fuse seller" is correct.
While not wanting to defend TL’s statements, I would still ask that we refrain from blatant name-calling as a put-down. I saw someone describe Floyd Toole as a “loudspeaker salesman” as if that was an argument against his credibility, and I don’t care by what logic one might defend it, I would prefer that we don’t go there.

cheers
 

Newman

Major Contributor
Joined
Jan 6, 2017
Messages
3,598
Likes
4,469
My concern is more with the interpretation of results. Not on an individual level, but on the collective statistics. My guess would be that the distribution of scores will be different with and without intermediate results, because people that know how to play the system will stick out more.

So the question then is: how much more valid is that metric if a subset of people know how to use this “cheat”? Yes, you say: you improved detectability and it shows. But those are people that know how to exploit it. Most people don’t know, though.
Yes there’s that, and there is also the question of whether the testee respects the intention of the test.

For example 16 vs 24 bit music, the intention of the test is to help the average punter to know whether 24 bit music sounds detectably different (ideally better, but that would be step 2) from 16 bit music, as music. So if Amir scores 10/10 by finding a silent bit with no music, putting it on repeat, and cranking the volume to a level that would damage both him and the equipment if the music came back on, then listening for differences in the noise floor and ‘nailing it’ (which is exactly what he says he did to nail that test), to me, he hasn’t respected the intention of the test.

I mean, if that was the intention of the test, then the files should be silent tracks and we can all compare noise floors.
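For a rough sense of scale on the noise floors in question, here is a minimal sketch, assuming an ideal undithered quantizer and a full-scale sine wave. The function name `ideal_snr_db` is illustrative, and real recordings (dither, analog noise, the DAC itself) will not reach these textbook figures.

```python
import math

def ideal_snr_db(bits: int) -> float:
    """Theoretical SNR of an ideal N-bit quantizer for a full-scale sine (no dither)."""
    # 20*log10(2**bits) is the ratio of full scale to one quantization step;
    # 10*log10(1.5) accounts for sine RMS versus uniform quantization noise.
    return 20 * math.log10(2 ** bits) + 10 * math.log10(1.5)

for bits in (16, 24):
    print(f"{bits}-bit: {ideal_snr_db(bits):.1f} dB")
print(f"gap: {ideal_snr_db(24) - ideal_snr_db(16):.1f} dB")
```

Under those assumptions the 16-bit floor sits roughly 98 dB below full scale and the 24-bit floor roughly 146 dB below, which is why the difference only has a chance of showing up when the music is near silent and the gain is turned far up.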

So, even though Amir makes brief mention of this in his 60-minute video, and briefly admits he would never pass the test if forced to listen to the music itself at normal listening levels, it gets buried in the headline announcement that “24 bit music sounds detectably different to 16 bit music”.

So that particular ABX test, passed that way, does the community no favours at all.

cheers
 

SIY

Grand Contributor
Technical Expert
Joined
Apr 6, 2018
Messages
10,640
Likes
25,582
Location
Alfred, NY
why? real world matters. I'd like to have an anechoic chamber to listen in, too
If you need to peek to hear something, you can't hear it. It's not complicated.
 
OP
amirm

Founder/Admin
Staff Member
CFO (Chief Fun Officer)
Joined
Feb 13, 2016
Messages
44,854
Likes
243,525
Location
Seattle Area
For example 16 vs 24 bit music, the intention of the test is to help the average punter to know whether 24 bit music sounds detectably different (ideally better, but that would be step 2) from 16 bit music, as music. So if Amir scores 10/10 by finding a silent bit with no music, putting it on repeat, and cranking the volume to a level that would damage both him and the equipment if the music came back on, then listening for differences in the noise floor and ‘nailing it’ (which is exactly what he says he did to nail that test), to me, he hasn’t respected the intention of the test.
"Intent" of the test was to prove the difference between the two formats is inaudible. No attempt was made to find music that would accentuate the differences between 16 and 24 bits. The fact that I still managed to find a difference -- which was NOT in the silent part -- means that there could be other test cases that are even more revealing.

When testing lossy compression, we don't just grab some audiophile music and test with that. Instead, we pay attention to where the algorithm is weak and find content to test with that. This is why these standardized tracks are called "Codec Killers." We, as developers and inventors of lossy compression, wanted the worst case test tracks to optimize the algorithms. Once there, we had a much better shot at having a solution that worked well across the largest swath of music and listeners.

Also keep in mind that I am not the best there is as far as listeners go. So what I did is not representative of what someone with better hearing than mine could do.

Summarizing, the test was corrupt at the start. You want to worry about something, that is what you want to worry about.
 
OP
amirm

Founder/Admin
Staff Member
CFO (Chief Fun Officer)
Joined
Feb 13, 2016
Messages
44,854
Likes
243,525
Location
Seattle Area
So that particular ABX test, passed that way, does the community no favours at all.
It does actually. The community should have its chest up guaranteeing that no one can hear the difference between 16 and 24 bits in any condition. Saves from embarrassment when I am able to pass such controlled tests.
 

birdog1960

Senior Member
Joined
Oct 18, 2022
Messages
309
Likes
329
Location
Virginia
If you need to peek to hear something, you can't hear it. It's not complicated.
so measurements are worthless to 60+ yo?
Good to know. but I don't agree. Can still sort the grain from the whey in a meaningful manner to different listeners.
 

Blumlein 88

Grand Contributor
Forum Donor
Joined
Feb 23, 2016
Messages
20,997
Likes
38,182
to quote you: "No". seems pretty concrete, non?
This NO was in regard to controls. Age has no bearing on whether or not you need controls in testing you do. Do you understand what SIY means by controls? If one is evaluating high frequency response age probably matters, but varies with individuals. To test that particular thing or any other thing, you must have a controlled listening procedure to learn anything. The need for controls is fully independent of listener age.
 

birdog1960

Senior Member
Joined
Oct 18, 2022
Messages
309
Likes
329
Location
Virginia
This NO was in regard to controls. Age has no bearing on whether or not you need controls in testing you do. Do you understand what SIY means by controls? If one is evaluating high frequency response age probably matters, but varies with individuals. To test that particular thing or any other thing, you must have a controlled listening procedure to learn anything. The need for controls is fully independent of listener age.
sure, "controls" should include all ages, races etc. with a breakdown. Same as in drug trials. And those trials are particularly important to the subset that are reading or using them to make a decision. Not pretending that audio science is equivalent to medicine, but c'mon.
 
GXAlan

Major Contributor
Forum Donor
Joined
Jan 15, 2020
Messages
3,984
Likes
6,144
It doesn’t matter what the pills are for. If you can find out after a few rounds which is which, the trial will fail. So thanks for tanking your own argument ;)
I am a medical professional. It does depend whether or not the outcome is repeated measures or independent measures.

That’s medicine. Let’s talk audio.

Imagine you have two nearly identically tuned pianos. The tuning of one note is different. You are ABXing those pianos using real musical content.

If you did something like a pkmetric, the results may be similar. If you only compare a few notes, the two may be similar. If you did a null test with a true complex piano piece, there would probably be enough variations from recording to recording that it would be hard to interpret.

However if in your 10 rounds of ABX, and you know when you were right vs. wrong, you may learn to identify and specifically listen for that aberrant note.

The results didn’t change. In the first scenario, before knowing your results, you would be more likely to claim the two pianos sound the same when in fact they sound and measure differently if you measure them appropriately. With knowledge of being right or wrong, you haven’t skewed the results.
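To put numbers on a 10-round ABX run, here is a minimal sketch of the usual one-sided binomial calculation. It assumes each trial is an independent 50/50 guess and that the number of trials was fixed in advance; the function name `abx_p_value` is illustrative, and the calculation says nothing about whether feedback between rounds is legitimate, only how unlikely a given score is by chance.

```python
from math import comb

def abx_p_value(correct: int, trials: int) -> float:
    """Chance of getting at least `correct` right out of `trials` by pure guessing."""
    # Sum the binomial tail for a fair coin: C(n, k) / 2**n for k >= correct.
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

for score in (8, 9, 10):
    print(f"{score}/10: p = {abx_p_value(score, 10):.4f}")
```

By this reckoning 10/10 happens by luck about once in 1024 attempts and 9/10 or better about 11 times in 1024, while 8/10 or better (56 in 1024) is already above the common 5% cut-off.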
 

birdog1960

Senior Member
Joined
Oct 18, 2022
Messages
309
Likes
329
Location
Virginia
I am a medical professional. It does depend whether or not the outcome is repeated measures or independent measures.

That’s medicine. Let’s talk audio.

Imagine you have two nearly identically tuned pianos. The tuning of one note is different. You are ABXing those pianos using real musical content.

If you did something like a pkmetric, the results may be similar. If you only compare a few notes, the two may be similar. If you did a null test with a true complex piano piece, there would probably be enough variations from recording to recording that it would be hard to interpret.

However if in your 10 rounds of ABX, and you know when you were right vs. wrong, you may learn to identify and specifically listen for that aberrant note.

The results didn’t change. In the first scenario, before knowing your results, you would be more likely to claim the two pianos sound the same when in fact they sound and measure differently if you measure them appropriately. With knowledge of being right or wrong, you haven’t skewed the results.
but Hamilton depression scores suck. Lousy gold standard. now if we measure antibodies or hemoglobin A1C's...
...
 

Blumlein 88

Grand Contributor
Forum Donor
Joined
Feb 23, 2016
Messages
20,997
Likes
38,182
sure, "controls" should include all ages, races etc. with a breakdown. Same as in drug trials. And those trials are particularly important to the subset that are reading or using them to make a decision. Not pretending that audio science is equivalent to medicine, but c'mon.
SIY is not referring to a control group. He is referring to level matching, identical listening conditions and proper blinding during the test. What he refers to as controlled listening comparisons. There certainly is a case for control groups as well, but in this case that is not the control SIY has in mind. (If I'm wrong he can correct me of course).

So do we need a properly randomized control group for all audio tests? Depends upon your aims. If your aim is to test whether a group of 50+ year old audiophiles can hear cable differences as they claim they can, then no. Get a group who think they are hearing this and test them under controlled listening conditions and see if their claim holds up. If your aim is to show whether it is humanly possible to hear 48 kHz vs 96 kHz sample rates, then yes, a properly organized control group may be in order. Yet in both cases you use controlled listening conditions.
 

Thorsten Loesch

Senior Member
Joined
Dec 20, 2022
Messages
460
Likes
533
Location
Germany, now South East Asia (not China or SAR's)
I was stuck on the idea of refuting the central point, when as far as I have been able to make out, there hasn’t been one.

Oh yes, there is.

My point is that Science follows some basic ideas and precepts that make it what it is today. To quote Mr. Feynman from his Caltech commencement address, now commonly referred to as "Cargo Cult Science":

"During the Middle Ages there were all kinds of crazy ideas, such as that a piece of rhinoceros horn would increase potency. Then a method was discovered for separating the ideas—which was to try one to see if it worked, and if it didn’t work, to eliminate it. This method became organized, of course, into science. And it developed very well, so that we are now in the scientific age."

"In the South Seas there is a Cargo Cult of people. During the war they saw airplanes land with lots of good materials, and they want the same thing to happen now. So they’ve arranged to make things like runways, to put fires along the sides of the runways, to make a wooden hut for a man to sit in, with two wooden pieces on his head like headphones and bars of bamboo sticking out like antennas—he’s the controller—and they wait for the airplanes to land. They’re doing everything right. The form is perfect. It looks exactly the way it looked before. But it doesn’t work. No airplanes land."

"So I call these things Cargo Cult Science, because they follow all the apparent precepts and forms of scientific investigation, but they’re missing something essential, because the planes don’t land."

"Now it behooves me, of course, to tell you what they’re missing. But it would be just about as difficult to explain to the South Sea Islanders how they have to arrange things so that they get some wealth in their system. It is not something simple like telling them how to improve the shapes of the earphones."

"But there is one feature I notice that is generally missing in Cargo Cult Science. It’s a kind of scientific integrity, a principle of scientific thought that corresponds to a kind of utter honesty—a kind of leaning over backwards. For example, if you’re doing an experiment, you should report everything that you think might make it invalid—not only what you think is right about it: other causes that could possibly explain your results; and things you thought of that you’ve eliminated by some other experiment, and how they worked—to make sure the other fellow can tell they have been eliminated."

"Details that could throw doubt on your interpretation must be given, if you know them. You must do the best you can—if you know anything at all wrong, or possibly wrong—to explain it. If you make a theory, for example, and advertise it, or put it out, then you must also put down all the facts that disagree with it, as well as those that agree with it. There is also a more subtle problem. When you have put a lot of ideas together to make an elaborate theory, you want to make sure, when explaining what it fits, that those things it fits are not just the things that gave you the idea for the theory; but that the finished theory makes something else come out right, in addition."

"In summary, the idea is to try to give all of the information to help others to judge the value of your contribution; not just the information that leads to judgment in one particular direction or another."

"We’ve learned from experience that the truth will out. Other experimenters will repeat your experiment and find out whether you were wrong or right. Nature’s phenomena will agree or they’ll disagree with your theory. And, although you may gain some temporary fame and excitement, you will not gain a good reputation as a scientist if you haven’t tried to be very careful in this kind of work. And it’s this type of integrity, this kind of care not to fool yourself, that is missing to a large extent in much of the research in Cargo Cult Science."

(Speaking of science involving psychology)

"A great deal of their difficulty is, of course, the difficulty of the subject and the inapplicability of the scientific method to the subject. Nevertheless, it should be remarked that this is not the only difficulty. That’s why the planes don’t land—but they don’t land."

"The first principle is that you must not fool yourself—and you are the easiest person to fool. So you have to be very careful about that. After you’ve not fooled yourself, it’s easy not to fool other scientists. You just have to be honest in a conventional way after that."

"I would like to add something that’s not essential to the science, but something I kind of believe, which is that you should not fool the layman when you’re talking as a scientist."

"I’m not trying to tell you what to do about cheating on your wife, or fooling your girlfriend, or something like that, when you’re not trying to be a scientist, but just trying to be an ordinary human being. We’ll leave those problems up to you and your rabbi."

"I’m talking about a specific, extra type of integrity that is not lying, but bending over backwards to show how you’re maybe wrong, that you ought to do when acting as a scientist. And this is our responsibility as scientists, certainly to other scientists, and I think to laymen."

"Other kinds of errors are more characteristic of poor science. When I was at Cornell, I often talked to the people in the psychology department. One of the students told me she wanted to do an experiment that went something like this—I don’t remember it in detail, but it had been found by others that under certain circumstances, X, rats did something, A. She was curious as to whether, if she changed the circumstances to Y, they would still do A. So her proposal was to do the experiment under circumstances Y and see if they still did A."

"I explained to her that it was necessary first to repeat in her laboratory the experiment of the other person—to do it under condition X to see if she could also get result A—and then change to Y and see if A changed. Then she would know that the real difference was the thing she thought she had under control. She was very delighted with this new idea, and went to her professor. And his reply was, no, you cannot do that, because the experiment has already been done and you would be wasting time. This was in about 1935 or so, and it seems to have been the general policy then to not try to repeat psychological experiments, but only to change the conditions and see what happens."

"All experiments in psychology are not of this type, however. For example, there have been many experiments running rats through all kinds of mazes, and so on—with little clear result. But in 1937 a man named Young did a very interesting one. He had a long corridor with doors all along one side where the rats came in, and doors along the other side where the food was. He wanted to see if he could train the rats to go in at the third door down from wherever he started them off. No. The rats went immediately to the door where the food had been the time before."

"The question was, how did the rats know, because the corridor was so beautifully built and so uniform, that this was the same door as before? Obviously there was something about the door that was different from the other doors. So he painted the doors very carefully, arranging the textures on the faces of the doors exactly the same. Still the rats could tell. Then he thought maybe the rats were smelling the food, so he used chemicals to change the smell after each run. Still the rats could tell. Then he realized the rats might be able to tell by seeing the lights and the arrangement in the laboratory like any commonsense person. So he covered the corridor, and, still the rats could tell."

"He finally found that they could tell by the way the floor sounded when they ran over it. And he could only fix that by putting his corridor in sand. So he covered one after another of all possible clues and finally was able to fool the rats so that they had to learn to go in the third door. If he relaxed any of his conditions, the rats could tell."

"Now, from a scientific standpoint, that is an A‑Number‑1 experiment. That is the experiment that makes rat‑running experiments sensible, because it uncovers the clues that the rat is really using—not what you think it’s using. And that is the experiment that tells exactly what conditions you have to use in order to be careful and control everything in an experiment with rat‑running."

"I looked into the subsequent history of this research. The subsequent experiment, and the one after that, never referred to Mr. Young. They never used any of his criteria of putting the corridor on sand, or being very careful. They just went right on running rats in the same old way, and paid no attention to the great discoveries of Mr. Young, and his papers are not referred to, because he didn’t discover anything about the rats. In fact, he discovered all the things you have to do to discover something about rats. But not paying attention to experiments like that is a characteristic of Cargo Cult Science."

My point is that while this place is called AUDIO SCIENCE REVIEW, I notice Cargo Cult Science tendencies.

As science is self-correcting, or at least should be, I call out these traces of Cargo Cult Science in the hope that my criticism will be reviewed seriously and that the Audio Science practiced here will correct itself.

Between the typical hardcore objectivists of Mr. Clarke's ilk and typical hardcore subjectivists like most Stereophile writers there is sadly precious little science.

I am delighted to see a venue that tries to be scientific about audio. It is greatly needed. And I want to contribute to improving the science. I would have hoped that is obvious.

In my considered view the objectivist and subjectivist streams in audio both have value and contributions to make. Science is when the Demon Thesis meets the Archangel Antithesis and they are combined into Synthesis.

It is the Synthesis I find interesting and which I hope may eventually help to reliably produce good sounding audio gear using appropriate technology, made to suit the purpose, and fairly priced as well.

IT MAY SEEM that I have no core point, because I do not want to fall with the door into the living room and proclaim "ASR is Cargo Cult Science". There are plenty of venues where, if I did this and posted my criticism, I would be lauded as a valuable contributor, when in fact all I would contribute to would be the continued lack of science.

Doing so of course would be great if my interest was commercial or self-promotion. Going in as St George who slays the ASR dragon and then presenting my latest, greatest-sounding gizmo for sale would tick all the boxes, EXCEPT helping to improve the science of audio.

So I hope my criticism is not taken as dismissal, but as encouragement that something worthwhile is being done that can nevertheless be improved greatly still.

Thor
 

birdog1960

Senior Member
Joined
Oct 18, 2022
Messages
309
Likes
329
Location
Virginia
SIY is not referring to a control group. He is referring to level matching, identical listening conditions and proper blinding during the test. What he refers to as controlled listening comparisons. There certainly is a case for control groups as well, but in this case that is not the control SIY has in mind. (If I'm wrong he can correct me of course).

So do we need a properly randomized control group for all audio tests? Depends upon your aims. If your aim is to test whether a group of 50+ year old audiophiles can hear cable differences as they claim they can, then no. Get a group who think they are hearing this and test them under controlled listening conditions and see if their claim holds up. If your aim is to show whether it is humanly possible to hear 48 kHz vs 96 kHz sample rates, then yes, a properly organized control group may be in order. Yet in both cases you use controlled listening conditions.
I get the distinction but it is important for readers of measurements. Personally, I'm more interested in whether an "upgrade" is worth it or not to my ears. I doubt I'm alone.
 
Newman

Major Contributor
Joined
Jan 6, 2017
Messages
3,598
Likes
4,469
The fact that I still managed to find a difference -- which was NOT in the silent part --
Oh my goodness. Do you not realise that your words above give the very clear impression — misleading impression — that you found the difference in the passages where music is playing normally. Which you did not.

You are being way too-smart-by-omission with your above words. If it wasn’t silent, it was near-silent. So you could hear the noise over the music. Entirely my point. Very disappointed that you chose the above wording.

Anyone reading this: start here and listen for 60 seconds.

"listening to the music...it's hopeless"
"look for where the music fades to zero", "to a level exhausting what 16 bits can do"
"then I can turn up the volume..."
"...when the music faded to zero, one had higher noise floor than the other...boom! I knew what to listen for, and managed to pass the test."
 

GXAlan

Major Contributor
Forum Donor
Joined
Jan 15, 2020
Messages
3,984
Likes
6,144
So I hope my criticism is not taken as dismissal, but as encouragement that something worthwhile is being done that can nevertheless be improved greatly still.

It was a very long post but the concept of Cargo Cult Science is a good one to think about, both strengths and flaws.
 