
AES meta analysis on audibility of hi-res

ceedee

Active Member
Joined
Jun 8, 2016
Messages
105
Likes
32
Location
DFW, TX
It looks like this thread has been dead for a while, but I accidentally posted about the meta analysis on the WAV vs. FLAC thread. I'll quote my post:
It truly is stunning to see subjectivist audiophiles suddenly siding with DBT and trumpeting that things have finally been "proven." Such hypocrisy.

And as for the meta analysis, unless I'm misreading it, all we know is that some difference was heard in a small percentage of trials. It would be hypocritical of me to dismiss the results just because they aren't what I expect, but the fact that we're even still having this debate shows that hi-res is nowhere near as audible as audiophiles claim. The great benefits of hi-res nearly always disappear when listeners are subjected to blind testing. That means that there's still a lot of mass delusion going on...probably because the brains of most audiophiles subconsciously scheme to make sure they never fail to hear a difference.

More testing is obviously needed to pin down exactly what caused those positive results. I don't see how we can eliminate the possibility that it's some unintended artifact (IM distortion, a "tell," etc.). The author claims in the press release that the results demonstrate hi-res provides a "small but important" increase in sound quality. Unless someone can explain how the data support that conclusion, it seems like pure conjecture on his part. Did that bias influence the way he weighted and analyzed the various studies? I guess we won't know for sure until someone else attempts more testing.

After all these years of failed ABX tests, I doubt we're going to suddenly start seeing anything different. Although, now that so many audiophiles have embraced controlled testing...

Here is Fitzcaraldo215's reply:
I think it is important to keep a level head and focus on the science and the issues being discussed in a particular published paper. This thread is about an obviously and grossly dubious paper from a science standpoint, which allegedly demonstrates a perceptible difference between FLAC and WAV. There is no comparison between that "pseudoscience" paper and the Reiss paper on the perception of hi rez vs. RBCD, a different topic and a different paper. To believe they are somehow similar is naive and totally misses the scientific substance of the two papers.

Also, how or on what basis "subjectivist" and "objectivist" audiophiles react to each paper sheds little light on the papers themselves, their methods, or their conclusions. We know that considerable bias exists in audiophile listening reactions, and it also exists in their reading of and reactions to published papers and studies. Hence, some subjectivists, who normally demonize bias-controlled, double-blind studies, find themselves cheering for the conclusions of the Reiss paper, or vice versa.

Your own biases are quite obvious, by the way. The Reiss paper does not at all say that a difference between hi rez and RBCD was heard "in a small number of trials". But, that is a different subject. My suggestion is that we discuss that paper in more detail in the appropriate thread.

One thing I wanted to ask is, since I'm not a statistics expert, just how often was the difference between hi-res and standard-res detected in the tests? It seemed to me that it was a small number of times (though statistically significant), but I could be wrong.

Since it seems that so many are clinging to this paper as proof that hi-res really does make a difference (and others have seemingly already dismissed it as flawed), I think it would be valuable to understand what the data actually support. Even though we disagree about the audible benefits of hi-res, I agree with Fitzcaraldo that this paper merits more detailed discussion here.
 

krabapple

Major Contributor
Forum Donor
Joined
Apr 15, 2016
Messages
3,194
Likes
3,760
You, and everyone, should carefully read the paper. It's a free download. The answers are in there.

The *larger* answer is, it depends on how you slice and dice the tests -- which tests are worthy to include, and which results are worthy to assess. Which is what meta-analysis does. The author makes what he argues are reasonable slice/dice choices, and when he does that, there's a marginal but statistically significant 'accuracy' result above 50% for detecting a difference. When he slices/dices further, considering only experiments where 'trained' subjects were used, it goes up to roughly 60%.

In Table 2c, he gives the number of trials and successes in aggregate for all the experiments that passed muster:
Correct: 6,736
Total trials: 12,645
Percent correct: 53.27%
Probability: 1.006E-13
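For anyone who wants to check those aggregate numbers, here is a minimal Python sketch. SciPy is an assumption on my part (it is not mentioned in the paper); the counts are the ones quoted from Table 2c above.

```python
from scipy.stats import binomtest

correct = 6736   # aggregate correct responses quoted above
trials = 12645   # aggregate total trials quoted above

# One-sided test against pure guessing (p = 0.5): P(this many or more correct)
result = binomtest(correct, trials, p=0.5, alternative='greater')
print(f"percent correct: {100 * correct / trials:.2f}%")   # ~53.27%
print(f"one-sided p-value: {result.pvalue:.3e}")            # on the order of 1e-13
```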


In the end much depends on whether you accept his arguments for inclusion and grouping of various experiments and conditions.

No single experiment (and indeed, not the meta-analysis either) shows an effect that would support the sorts of 'even my wife hears it', 'night and day', 'a veil was lifted' claims from A/B sighted listening that audiophiles routinely make, and that the industry counts on to 'sell' hi rez.
 
Last edited:

RayDunzl

Grand Contributor
Central Scrutinizer
Joined
Mar 9, 2016
Messages
13,250
Likes
17,186
Location
Riverview FL
What would the percentage work out to be, for the same number of responses, for the results to be considered within the statistical range of "just guessing" when using the same confidence level as above?

Excuse my wording, as I live in a non-statistically-driven parallel universe. Everything has a 50/50 chance to me, will/won't, can/can't, did/didn't, etc.
 

krabapple

Major Contributor
Forum Donor
Joined
Apr 15, 2016
Messages
3,194
Likes
3,760
What would the percentage work out to be, for the same number of responses, for the results to be considered within the statistical range of "just guessing" when using the same confidence level as above?

Excuse my wording, as I live in a non-statistically-driven parallel universe. Everything has a 50/50 chance to me, will/won't, can/can't, did/didn't, etc.

I think what you're asking is: what percent correct (number of correct answers / 12,645 trials * 100) has a probability that just reaches 0.05? (Lower than that is typically considered 'significantly' better than chance; larger than that, not. Though one can argue that for a subtle effect the significance threshold should be more like 0.01.)

My answer is, I don't know. The online binomial probability calculators don't let me use such large numbers. I'm also not sure how Reiss calculated it, since there are a few different flavors of that (e.g., P of *exactly* X out of Y; P of *X or more* out of Y; P of *X or fewer* out of Y -- typically p is 'X or more out of Y'). It's probably in his article's Appendix.
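For what it's worth, here is a short sketch of those three "flavors" side by side, using the aggregate counts from the paper. This assumes Python with SciPy, which handles sample sizes this large without trouble (unlike many online calculators).

```python
from scipy.stats import binom

k, n = 6736, 12645   # aggregate correct / total trials from Table 2c

p_exactly  = binom.pmf(k, n, 0.5)      # P of *exactly* X out of Y
p_or_more  = binom.sf(k - 1, n, 0.5)   # P of *X or more* out of Y (the usual one-sided p)
p_or_fewer = binom.cdf(k, n, 0.5)      # P of *X or fewer* out of Y

print(p_exactly, p_or_more, p_or_fewer)
```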
 
Last edited:

Fitzcaraldo215

Major Contributor
Joined
Mar 4, 2016
Messages
1,440
Likes
634
It looks like this thread has been dead for a while, but I accidentally posted about the meta analysis on the WAV vs. FLAC thread. I'll quote my post:


Here is Fitzcaraldo215's reply:


One thing I wanted to ask is, since I'm not a statistics expert, just how often was the difference between hi-res and standard-res detected in the tests? It seemed to me that it was a small number of times (though statistically significant), but I could be wrong.

Since it seems that so many are clinging to this paper as proof that hi-res really does make a difference (and others have seemingly already dismissed it as flawed), I think it would be valuable to understand what the data actually support. Even though we disagree about the audible benefits of hi-res, I agree with Fitzcaraldo that this paper merits more detailed discussion here.
Krabapple's post above is a fair summary, though incomplete, like anybody's summary would be.

Your view that the difference was detected "a small number of times" is incorrect. It was detected slightly more often than not in the aggregate of all test studies included in the paper. However, that slight difference was deemed statistically significant based on the aggregate data from those tests using standard statistical methods.

An excellent summary graph is presented in Fig. 2 on page 370. This lists the separate test studies included in Reiss' analysis and shows several things. For each study, the mean proportion of correct discriminations is listed and graphed as a small square. There is also a horizontal line through each square giving the standard deviation of results around the mean, indicating how much spread there was among the answers in each study. Obviously, a mean of 50% right is equivalent to chance, meaning no difference. All test studies analyzed were double-blind, by the way.

The studies in Fig. 2 are grouped into two categories: one where no training was given to the test subjects and another where training was provided prior to the test. The no-training group got an aggregate mean of 51% right, with individual study means ranging from 47.5% to 56.3%. The trained group got 62% right on average, with individual study means ranging from 56.9% to 74.7%. So, clearly, advance training in taking the tests improved the subjects' discrimination accuracy. I think that is significant for all audio perception testing on human subjects.

Reiss also comments on the bias of a majority of those test studies toward a Type II error, a bias toward not hearing a difference where there was one. Therefore, there is a likelihood that the test results understate the true ability to discriminate. This strikes me as plausible because comparative test protocols like ABX inevitably have this issue, unless the test stimuli have a "significant" difference. ABX and similar protocols are not as perfect as some make them out to be.

Unless someone very sophisticated comes along and demolishes Reiss, we have a well-thought-out, very comprehensive answer for now, with some useful guidance for other test studies that may be undertaken in the future. Potshots at the paper coming from naysayer audiophiles who lack the appropriate statistical and academic skills are ludicrous hot air. Yes, there is no gee-whiz difference that everyone heard all the time. But that is not a reason to criticize the Reiss paper. A difference was heard more often than not, by a small but statistically significant margin.

I am OK with you or others not having heard a difference in your own listening to hi rez. But, as with all things in audio, people hear things differently, and some might not be as careful as others in how they attempt to listen for audible differences, eliminate their biases, etc. I keep stressing the need to listen to natively recorded hi rez vs. the RBCD version from the same hi rez master. Music mastered in hi rez from analog or RBCD sources is biased toward revealing no perceptible difference.

Also, if you have not already done so, please see the following paper by Amir:

http://audiosciencereview.com/forum/index.php?threads/high-resolution-audio-does-it-matter.11/
 

krabapple

Major Contributor
Forum Donor
Joined
Apr 15, 2016
Messages
3,194
Likes
3,760
krabapple NOT Fitzcaraldo215 said:
But I look forward, by all means, to independent researchers following the new(ish) Reiss recommendations for doing a proper comparison to reveal and, one expects, bulk up this currently 'small' effect. (Punters at home need not apply.)

BTW, by new(ish) I am giving props to folks like JJ (James Johnston) who have *always* recommended listener training, and positive controls, for scientific audio DBTs. I wonder what his take on this work would be.
 
Last edited:

Blumlein 88

Grand Contributor
Forum Donor
Joined
Feb 23, 2016
Messages
20,747
Likes
37,568
I think what you're asking is: what percent correct (number of correct answers / 12,645 trials * 100) has a probability that just reaches 0.05? (Lower than that is typically considered 'significantly' better than chance; larger than that, not. Though one can argue that for a subtle effect the significance threshold should be more like 0.01.)

My answer is, I don't know. The online binomial probability calculators don't let me use such large numbers. I'm also not sure how Reiss calculated it, since there are a few different flavors of that (e.g., P of *exactly* X out of Y; P of *X or more* out of Y; P of *X or fewer* out of Y -- typically p is 'X or more out of Y'). It's probably in his article's Appendix.
6435/12645, or 50.9%, gets you to a 95% chance this isn't random.
 

krabapple

Major Contributor
Forum Donor
Joined
Apr 15, 2016
Messages
3,194
Likes
3,760
It looks like this thread has been dead for a while, but I accidentally posted about the meta analysis on the WAV vs. FLAC thread. I'll quote my post:


Here is Fitzcaraldo215's reply:


One thing I wanted to ask is, since I'm not a statistics expert, just how often was the difference between hi-res and standard-res detected in the tests? It seemed to me that it was a small number of times (though statistically significant), but I could be wrong.

Since it seems that so many are clinging to this paper as proof that hi-res really does make a difference (and others have seemingly already dismissed it as flawed), I think it would be valuable to understand what the data actually support. Even though we disagree about the audible benefits of hi-res, I agree with Fitzcaraldo that this paper merits more detailed discussion here.
6435/12645, or 50.9%, gets you to a 95% chance this isn't random.


Getting half the trials right (6322/12645 = 50%) has a cumulative p = 0.5 (of course)

To just cross the 0.05 significance threshold for cumulative p, one needs 6419/12645 correct (50.8%), for a p = 0.043

If my brain was working at better than half speed today I'd have realized this when I answered Ray's question before.

Your number and Reiss's go further past the 0.05 significance threshold, obviously. Yours has a p = 0.02. Reiss's is far lower (1.006e-13, or 0.0000000000001006). Reiss's obviously goes past the 0.01 threshold too. If you agree with his analysis choices, his result is highly significant (i.e., highly unlikely to be due to guessing), though the 'effect' is rather small.
 
Last edited:

Blumlein 88

Grand Contributor
Forum Donor
Joined
Feb 23, 2016
Messages
20,747
Likes
37,568
Getting half the trials right (6322/12645 = 50%) has a cumulative p = 0.5 (of course)

To just cross the 0.05 significance threshold for cumulative p, one needs 6419/12645 correct (50.8%), for a p = 0.043

If my brain was working at better than half speed today I'd have realized this when I answered Ray's question before.

Your number and Reiss's go further past the 0.05 significance threshold, obviously. Yours has a p = 0.02. Reiss's is far lower (1.006e-13, or 0.0000000000001006). Reiss's obviously goes past the 0.01 threshold too. If you agree with his analysis choices, his result is highly significant (i.e., highly unlikely to be due to guessing), though the 'effect' is rather small.

Well, I used the simple formula, which is usually okay for larger sample sizes. I am no pro on statistical math, however.
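For anyone following along, here is a rough sketch of that "simple formula" (the normal approximation to the binomial) next to the exact calculation. SciPy and Python are my assumptions, not anything specified in the thread; the 6,435 figure is the one quoted above.

```python
from math import sqrt
from scipy.stats import binom, norm

n, k = 12645, 6435               # total trials, number correct (from the post above)
mean, sd = n / 2, sqrt(n) / 2    # mean and standard deviation under pure guessing (p = 0.5)

z = (k - mean) / sd                  # roughly 2 standard deviations above chance
p_normal = norm.sf(z)                # one-sided p from the normal approximation
p_exact  = binom.sf(k - 1, n, 0.5)   # exact one-sided binomial p

print(f"z = {z:.2f}, normal approx p = {p_normal:.3f}, exact p = {p_exact:.3f}")
```

Both come out at roughly 0.02, consistent with the figure quoted earlier in the thread, which is why the simple formula and the exact calculation agree so closely at this sample size.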
 
Last edited:

Fitzcaraldo215

Major Contributor
Joined
Mar 4, 2016
Messages
1,440
Likes
634
BTW, by new(ish) I am giving props to folks like JJ (James Johnston) who have *always* recommended listener training, and positive controls, for scientific audio DBTs. I wonder what his take on this work would be.

That's a new one. You invent something and make it look like you are quoting me. A good way to foster a constructive dialog.

I would appreciate it if you would remove your fictitious quote which you attributed to me.
 

krabapple

Major Contributor
Forum Donor
Joined
Apr 15, 2016
Messages
3,194
Likes
3,760
That's a new one. You invent something and make it look like you are quoting me. A good way to foster a constructive dialog.

I would appreciate it if you would remove your fictitious quote which you attributed to me.

So, you think I purposely attributed to you a quote of mine, that I was obviously expanding on? For some nefarious reason rather than as a simple editing error?

You're entertaining in so many ways. :rolleyes:

Anyway, I'll fix it, thanks for noticing.
 

amirm

Founder/Admin
Staff Member
CFO (Chief Fun Officer)
Joined
Feb 13, 2016
Messages
44,654
Likes
240,798
Location
Seattle Area
I think what you're asking is: what percent correct (number of correct answers / 12,645 trials * 100) has a probability that just reaches 0.05? (Lower than that is typically considered 'significantly' better than chance; larger than that, not. Though one can argue that for a subtle effect the significance threshold should be more like 0.01.)
The answer given that many trials is 51% (really 50.7315%). The high number of trials means that even the slightest percentage over 50% yields a statistically significant result. In other words, in a large number of random trials, the number "must" converge to 50%. If it doesn't, it likely is not due to chance/randomness.
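To illustrate that point, a small Python sketch (again assuming SciPy) showing how the "just significant" percentage shrinks toward 50% as the number of trials grows. The trial counts below are arbitrary examples for illustration, not figures from the paper.

```python
from scipy.stats import binom

for n in (100, 1000, 10000, 100000):
    # smallest number correct whose one-sided p (this many or more, under guessing) is <= 0.05
    k = next(k for k in range(n // 2, n + 1) if binom.sf(k - 1, n, 0.5) <= 0.05)
    print(f"n = {n:>6}: {k} correct (~{100 * k / n:.2f}%) is the smallest significant result")
```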
 

krabapple

Major Contributor
Forum Donor
Joined
Apr 15, 2016
Messages
3,194
Likes
3,760
The answer given that many trials is 51% (really 50.7315%). The high number of trials means that even the slightest percentage over 50% yields a statistically significant result. In other words, in a large number of random trials, the number "must" converge to 50%. If it doesn't, it likely is not due to chance/randomness.

50.8% by my reckoning. I simply incremented the number correct starting from ~6410, until p dropped below 0.045. That turned out to be 6419 (=50.7631%).
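A sketch of that search in Python, for anyone who wants to reproduce it. SciPy is my assumption; the ~6,410 starting point and the 0.045 cutoff are the ones described above.

```python
from scipy.stats import binom

n = 12645
k = 6410
# Increment the number correct until the one-sided cumulative p
# (k or more out of n, under pure guessing) drops below the 0.045 cutoff.
while binom.sf(k - 1, n, 0.5) >= 0.045:
    k += 1

print(k, f"{100 * k / n:.4f}%")   # lands on 6419, ~50.76%, matching the figure above
```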
 

RayDunzl

Grand Contributor
Central Scrutinizer
Joined
Mar 9, 2016
Messages
13,250
Likes
17,186
Location
Riverview FL
Uh, we only have one meta-analysis. How many of those do we need to be confident of its result?

Where's the answer, Lebowski? Where's the answer?


Oh, nevermind, I think I found the answer.
 

Cosmik

Major Contributor
Joined
Apr 24, 2016
Messages
3,075
Likes
2,180
Location
UK
I have my own views on the spectacle of garbage being fed into statistical formulae and laundered into figures with 12 decimal places.

Shouldn't we be suggesting why a difference might be heard? This could point us to the musical samples most likely to provide an audible difference.

From the existing experiments maybe we could home in on the ones that apparently 'found something' and drill down into the 'killer samples' that enabled a remarkable 0.3529% of listeners to correctly hear a "difference" a stunning 50.6817% of the time, then set a computer on finding similar samples in order to ramp up the strike rate in future experiments.

At the same time, it is necessary to ensure that any "difference" heard is not just due to increased IMD from high frequencies making its way down into mortals' hearing ranges, or other interactions with real world transducers.
 
Last edited:

ceedee

Active Member
Joined
Jun 8, 2016
Messages
105
Likes
32
Location
DFW, TX
This strikes me as plausible because comparative test protocols like ABX inevitably have this issue, unless the test stimuli have a "significant" difference. ABX and similar protocols are not as perfect as some make them out to be.
This strikes me not as an intrinsic issue with ABX, but more as a matter of the difficulty humans have hearing very small differences. If we remove the controls, we will still have a very hard time actually differentiating … it's just that we will *think* it's a lot easier. ;)

If you agree with his analysis choices, his result is highly significant (i.e., unlikely to be due to guessing), though the 'effect' is rather small.
Thanks for the explanations, everyone.

So if one were to make slightly different choices in the analysis, could a different conclusion be reached?

At the same time, it is necessary to ensure that any "difference" heard is not just due to increased IMD from high frequencies making its way down into mortals' hearing ranges, or other interactions with real world transducers.
Yep.
 

krabapple

Major Contributor
Forum Donor
Joined
Apr 15, 2016
Messages
3,194
Likes
3,760
I have my own views on the spectacle of garbage being fed into statistical formulae and laundered into figures with 12 decimal places.

Shouldn't we be suggesting why a difference might be heard? This could point us to the musical samples most likely to provide an audible difference.

From the existing experiments maybe we could home in on the ones that apparently 'found something' and drill down into the 'killer samples' that enabled a remarkable 0.3529% of listeners to correctly hear a "difference" a stunning 50.6817% of the time, then set a computer on finding similar samples in order to ramp up the strike rate in future experiments.

At the same time, it is necessary to ensure that any "difference" heard is not just due to increased IMD from high frequencies making its way down into mortals' hearing ranges, or other interactions with real world transducers.


Yup. The honest conclusion from Reiss (2016), if his data are taken at face value, is 'something probably was heard, under some conditions, by some people, but we don't know what it was'.

Reiss makes some recommendations for further work. But his press release rhetoric of hi rez providing a 'small but important advantage in quality' makes me think maybe he's not the one who should do it.
 
Last edited:

amirm

Founder/Admin
Staff Member
CFO (Chief Fun Officer)
Joined
Feb 13, 2016
Messages
44,654
Likes
240,798
Location
Seattle Area
There's a discussion of this at various points in the paper; that's a good place to start. No doubt there will be further discussion of alternative data sets outside of the paper too.
Not sure I agree with that person when he says, "The bit and sample rate conversion is not controlled for. There are pretty huge variations in the performance of sample rate converters, as evidenced by these measurements [1] and without having characterized the performance of one, the paper is pretty much only testing the performance of the sample rate converter itself."

If resampling is audible that way, that is reason enough to stay with high-res. And at any rate, it is not as if, when you buy CD-rate music, you know how it was resampled.
 