Can You Trust Your Ears? By Tom Nousaine

j_j · Oct 30, 2017

Amir, I'm simply telling you how both of the tests work. You can get exactly the line you show from an ABX test FROM AN ABC/hr test. You get exactly that information. It does force a choice. You certainly ALSO get more information. I don't know why you think I'm supporting ABX testing for anything more than "can you make out a difference, any difference". That's all it's for. In an ABC/hr test, you are REQUIRED to identify the hidden reference. That provides you with precisely the same data as an ABX test. PRECISELY the same information. Yes, you also get MORE data. Nobody is disputing that.
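A minimal sketch of that shared yes/no data, assuming each trial is an independent guess with a 50% chance of being right when no difference is audible. The session below (12 correct out of 16) and the helper name `abx_p_value` are hypothetical; the point is only that correct hidden-reference identifications from ABC/hr can be scored exactly like ABX trials.

```python
from math import comb

def abx_p_value(correct: int, trials: int) -> float:
    """One-sided binomial p-value: chance of at least `correct` hits by pure guessing."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

# Hypothetical ABC/hr session: True where the listener identified the hidden reference.
identified = [True] * 12 + [False] * 4

p = abx_p_value(sum(identified), len(identified))
print(f"{sum(identified)}/{len(identified)} correct, p = {p:.4f}")
```

The impairment grades that ABC/hr collects on top of this are the "more data"; they are not used in this calculation.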

BS1116, i.e. ABC/hr, also provides an estimation of how far the non-reference signal is from the original. Certainly that information is useful, but, please, please don't call it MOS. Which brings us to the test you show in the bottom half of the previous post:

In the second example just above, you're showing MUSHRA results, not ABC/hr. Yes, it's a standard. I'm not the only person to express dislike for it. It's a rather questionably designed test, with results that are not guaranteed to be pairwise transitive, and lots of other things, but is used for expediency. The fact it does use an "MOS" sort of system is one of the reasons that it's a bit shaky. As I mentioned to someone else a few pages back, it also forces the subject to rank a multidimensional sensation along one axis. That's pretty much the heart of why MOS testing is a problem. Basically, if you pick the right (or wrong, as the case may be) 3 stimuli, you may find A preferred over B in a pairwise test, B over C, and C over A. There is a long history of this kind of testing in MOS-like regimes. This despite the fact that A, B and C may be preferred in that order in the MUSHRA test. There is a large confounding problem in judging preference.
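A toy illustration of how such a cycle can arise, assuming (purely for the sake of the example) that each stimulus is judged on three perceptual dimensions and that a pairwise comparison goes to whichever stimulus is better on more of them. The ratings and the majority rule are hypothetical, not taken from any real test.

```python
from itertools import combinations

# Hypothetical ratings on three perceptual dimensions (higher is better).
stimuli = {"A": (3, 1, 2), "B": (2, 3, 1), "C": (1, 2, 3)}

def pairwise_winner(x: str, y: str) -> str:
    """Return the stimulus that is better on the majority of dimensions."""
    wins_x = sum(a > b for a, b in zip(stimuli[x], stimuli[y]))
    wins_y = sum(b > a for a, b in zip(stimuli[x], stimuli[y]))
    return x if wins_x > wins_y else y

# Pairwise "preference tests" over every pair of stimuli.
results = {(x, y): pairwise_winner(x, y) for x, y in combinations(stimuli, 2)}

# Collapsing each stimulus to one number, as a single-axis scale must.
single_axis = {name: sum(dims) / len(dims) for name, dims in stimuli.items()}

for pair, winner in results.items():
    print(pair, "->", winner)
print(single_axis)
```

Here A beats B, B beats C, and C beats A, while the single-axis scores come out identical. Any one-number scale has to order or tie the three stimuli, so it cannot represent the cycle.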

In the two examples above neither is BS1116, which is certainly a credible method. BS1116 is a distance method, that uses impairment phrasing. There are also scales much like the ITU scale that are purely distance, but experience shows that most trained subjects perform as well on the distance method in the ITU method, even with the impairment phrasing. Apparently cognitively speaking impairment is more reliable than preference. In any case, yes, most codec testing is ABC/hr. Yes, it provides a scale that looks like a preference scale, but it is important to understand that it is at its heart a distance, rather than preference scale.

This is not new material, really.

Cosmik · Oct 30, 2017

If we have to listen to a lossy codec to see if (or how well) it works, does that mean we don't understand how it works?

svart-hvitt · Oct 30, 2017

Cosmik said:
If we have to listen to a lossy codec to see if (or how well) it works, does that mean we don't understand how it works?

«Works» vs «works on humans». Not the same.

Cosmik · Oct 30, 2017

Arnold Krueger said:
In general false. One of the goals of programming styles is to program the solution in a modular fashion and partition the modules so that fixes to different problems as much as possible don't step on each other.

The general audiophile version of this myth is "Everything Matters". The short answer is finding out what does matter and what doesn't matter is a basic part of being a successful practitioner. Of course in a placebo-driven world like audiophilia, so-called "proofs" that everything matters rule.

Well I'm a DSP programmer among other things (not that that should matter to the argument at all), and I find it hard to conceive of a 'patch' you can put over a tricky training example that
(a) won't degrade your quality/compression performance
(b) won't come back to bite you on some other example you've not yet encountered.

Clearly, one approach would be to effectively 'turn down' the compression if you think you might have one of those "tricky castanets" in the buffer. It might even work, but you've reduced your overall compression performance, and you're either relying on having covered every example of a similar sound that might cause you the problem (which makes it likely to 'trip' the detector more often than necessary) or you're tailoring your 'solution' specifically to your training data which makes the whole thing very fragile.

What you are playing with is a potential 'information space' that is as big as the number of permutations of a buffer of 16 bit numbers, and thinking that you can possibly provide enough training examples, and listen to them exhaustively enough, to allow you to "code" a solution for every tricky sound. As I was harping on about before, the problem and its solution have got very little to do with the practice of A/B/X, etc. or the rigorousness of testing procedures.
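To put a rough number on that information space, here is a back-of-envelope sketch. The buffer length of 1024 samples is an assumption chosen only for illustration; the count is simply 2 raised to the total number of bits in the buffer.

```python
import math

BITS_PER_SAMPLE = 16
BUFFER_SAMPLES = 1024  # hypothetical buffer length, not taken from any codec

# Every distinct bit pattern is a distinct buffer, so the count is 2 ** total_bits.
total_bits = BITS_PER_SAMPLE * BUFFER_SAMPLES

# Number of decimal digits in 2 ** total_bits, computed without building the integer.
decimal_digits = math.floor(total_bits * math.log10(2)) + 1

print(f"{total_bits} bits per buffer, about 10^{decimal_digits - 1} distinct buffers")
```

Even for one mono buffer of this size the count runs to thousands of decimal digits, which is why exhaustive listening is out of the question.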

For sure you can tailor a solution for the tiny pocket within the information space for which you can provide examples, but it won't work the first time you bring in a new sound you've never heard before. A realistic solution must be based on a generalised method based on some understanding of:
(a) the nature of music and sound: it occupies a fraction of the information space. Identify that space and, on average, store/stream only that space.
(b) perceptual factors that allow a further shrinking of the necessary information space.

I think that a lot of people assume that (b) is all that is happening, but FLAC typically yields an approximate 2:1 reduction that is lossless! If lossy encoding only increases this up to 5:1 or even 10:1 while remaining transparent, it seems to me that the perceptual aspect (masking, equal-loudness contours, etc.) is a refinement rather than the major component of lossy compression. Further hand-tailoring and listening to tricky artefacts (as we have been discussing) can only account for a tiny fraction of the encoder's performance - and has the potential to spoil it just as much as improving it! Using listening tests to gain a better understanding of human hearing and then feeding that into codec development would be more useful than using them to tweak existing codecs. But we may already be at the limits of what is possible...
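A rough worked version of that arithmetic, assuming CD audio (44.1 kHz, 16 bit, stereo) as the starting point and taking the 2:1, 5:1 and 10:1 ratios above at face value. The `lossless_share` helper is hypothetical and rests on the simplifying assumption that a lossy coder removes the same redundancy a lossless one would, which is the framing of the argument here rather than an established fact.

```python
CD_BITRATE = 44_100 * 16 * 2  # bits per second for 16-bit stereo at 44.1 kHz

def bitrate_kbps(ratio: float) -> float:
    """Bitrate in kbit/s after compressing CD audio by the given ratio."""
    return CD_BITRATE / ratio / 1000

def lossless_share(lossy_ratio: float, lossless_ratio: float = 2.0) -> float:
    """Fraction of all removed bits that a lossless stage alone would remove."""
    removed_total = 1 - 1 / lossy_ratio
    removed_lossless = 1 - 1 / lossless_ratio
    return removed_lossless / removed_total

for ratio in (2, 5, 10):
    print(f"{ratio}:1 -> {bitrate_kbps(ratio):.1f} kbit/s, "
          f"lossless share of savings {lossless_share(ratio):.0%}")
```

On these assumptions, more than half of the bits saved at 10:1 would already be saved at 2:1.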

Cosmik · Oct 30, 2017

svart-hvitt said:
«Works» vs «works on humans». Not the same.

FLAC gives a lossless data compression. We know how it works, and don't need to listen to it.

Rationally, trimming off everything above 20kHz also works, and we shouldn't need to listen to music to realise that it does, because much simpler tests in the past, and examination of the physiology of the ear have established it is so.

Should lossy compression not be understood in similar ways? In fact, I think it is, and listening tests are a very minor tweak of its performance (that could go negative as easily as positive).

Jakob1863 · Oct 30, 2017

ABC/HR is a forced choice protocol by definition (as is ABX too) but it puts no (obvious) requirement on the listener to identify the hidden reference, which is imo an important point.

Although, according to the ITU-R, the panel is told that a hidden reference is included, the listeners are actually doing pairwise comparisons to the open reference and rate "B" and "C" separately.

Arnold Krueger · Oct 30, 2017

amirm said:
You have no experience with how codecs are developed Arny. Only JJ and I do. JJ said he used ABX at AT&T together with other tools such as ABC/HR. I said that we never did including the time that JJ joined my team.

The fallacy above is the implied assertion that Amir knows my complete life's work in full detail and that one has to do hands-on work in order to be aware of the processes that are used.

I might add that I am highly prone to believe that MS never used ABX because I quickly and easily discovered one of the most egregious artifacts that I ever heard in a coder when I applied ABX to a MS encoder for my www.pcabx.com web site. I believe that an example can be found in the Wayback Machine archive for that site, which is online to this day. It was one of the first I posted and was so obvious that no training was deemed necessary. As soon as you compared the original musical sample to one that was MS-encoded, it jumped right out and bit you in the ear!

I also participated in a number of shoot outs conducted by others of lossy codecs and none used ABX. I also quoted a number of papers from AES, none of which showed ABX usage in that regard.

The fallacy here is that ABX might be thought by anybody who is qualified to rate codecs to be the best available or even a useful tool for that particular purpose. Again, there has to be the ability to determine that detecting audible errors is not the same as an overall ranking.

ABX was developed to provide reliable results in comparisons where there was a reasonable possibility that one or both alternatives create no, none, nada audible impairments. It was in no way ever designed or ever formally presented as a means for rating alternatives that are already known to have audible impairments.

One of Schopenhauer's 38 strategies for winning arguments by false means is to blow the opponent's argument well beyond its intended applicability...

http://www.mnei.nl/schopenhauer/38-stratagems.htm

This is the first one!

"Carry your opponent's proposition beyond its natural limits; exaggerate it. The more general your opponent's statement becomes, the more objections you can find against it. The more restricted and narrow his or her propositions remain, the easier they are to defend by him or her."

Cosmik · Oct 30, 2017

Arnold Krueger said:
One of Schopenhauer's 38 strategies for winning arguments by false means is to blow the opponent's argument well beyond its intended applicability...

Aha! I'll look out for the remaining 37 in your posts from now on

Yes... the use of the word "opponent" which I seem to recall a few pages back...

Arnold Krueger · Oct 30, 2017

Cosmik said:
Aha! I'll look out for the remaining 37 in your posts from now on

Thanks for the confidence! ;-)

The first person to make this joke was John Atkinson circa Y2K on rec.audio.opinion, so you are in good company if you call that good company. ;-)

At the time I'd never heard of Schopenhauer, so he ended up creating the monster he feared!

Google for LOGICAL FALLACIES for more such material.

Seriously, I check my posts constantly to make sure I avoid wasting time and effort with such crap.

Arnold Krueger · Oct 30, 2017

I can tell from this thread that listening tests are regarded by many as the Swiss Army knife of measurements.

Speaking just for myself, no. Summarizing the mainstream audio art as I perceive it based on a lot of experience and education, no.

People become listening test enthusiasts, because they think they can resolve "multidimensional" mysteries such as why people seem to like the sound of valve amplifiers and vinyl over the perfection of digital and solid state.

That may be what they think, but if you study the art of audio evaluation, reliance on just informal listening tests correlates with an unprecedented rise in audio products that have only perceived differences based on naive evaluations.

Or they can prove "their opponents" ('subjective' audiophiles) wrong - a very strong motivation for the scientific listening test 'community'.

No. Please study history. For at least a decade informal listening tests were accepted in some professional journals including the JAES. Then skepticism rose because the list of leading zeroes transcended believability. Well known effects were relabelled as newly discovered forms of distortion. However, other professional journals had restored order to that chaos by formalizing listening tests. I'm referring to the landmark paper published in the early 50s in the JASA by some Bell Labs researchers. However, their test was for a different purpose and ran into sensitivity problems when applied to the evaluation of audio product sound quality. A few judicious modifications and voilà: ABX as most of us know it.

Lossy compression is one area where listening tests are needed to check it works - which is why it is so often raised as an example of the usefulness of listening tests - but even so, the test can only fine tune a system that was designed on paper based on pretty straightforward logic (not saying I'd have thought of it, though!). A workable lossy compression system could be designed without any listening tests at all.

The uselessly vague word in the paragraph above is *workable*. Is your workable my workable? How about the majority of music lovers or listeners to supposedly high quality media?

The early MP3 and other encoders were fairly easy to detect, almost instantly. I don't think that much new relevant basic science has been discovered since then.

At the end of the day, it is odd that lossy compression seems to be such an obsession for high-performance audio professionals. It is as though they are stuck in the year 2000.

That's because we have progressed from such drek to such efficiency and near perfection but are still not quite yet actually perfect. The carrot keeps moving, but it seems close enough to encourage additional efforts.

amirm · Oct 30, 2017

Arnold Krueger said:
The fallacy above is the implied assertion that Amir knows my complete life's work in full detail and that one has to do hands-on work in order to be aware of the processes that are used.

Let me tell you what I do know for certain after spending a decade going back and forth with you online. That you have no common sense. You are in a forum that champions many of the things that you believe in, yet you are fighting every member in sight as if they are all your enemy.

Second and importantly, you are bringing no new information. No insight. No learning. Only a political dogma and spiteful attitude that gives the entire objectivity in audio a bad name.

I didn't create this forum for what you bring Arny. This is the forum mission statement on every page:

"WANTED: Happy members who like to discuss audio and other topics related to our interest. Desire to learn and share knowledge of science required as is 20 years of participation in forums (not all true). Come here to have fun, be ready to be teased and not take online life too seriously. "

You are the farthest person from a happy member and desire to learn and share knowledge.

I am going to watch your next few posts. If they continue in the same tone and information-free nature, we will be saying goodbye to you.

RayDunzl · Oct 30, 2017

For my daily dose of invective I prefer reading those gathered by Nicolas Slonimsky.

Lexicon of Musical Invective: Critical Assaults on Composers Since Beethoven’s Time
by Nicolas Slonimsky

Paperback: 336 pages
Publisher: W.W. Norton & Company; (August 28, 2000)
ISBN: 039332009X

Book Description

“A supermarket tabloid of classical music criticism.”―from the foreword by Peter Schickele.

A snakeful of critical venom aimed at the composers and the classics of nineteenth- and twentieth-century music. Who wrote advanced cat music? What commonplace theme is very much like Yankee Doodle? Which composer is a scoundrel and a giftless bastard? What opera would His Satanic Majesty turn out? Whose name suggests fierce whiskers stained with vodka? And finally, what third movement begins with a dog howling at midnight, then imitates the regurgitations of the less-refined or lower-middle-class type of water-closet cistern, and ends with the cello reproducing the screech of an ungreased wheelbarrow? For the answers to these and other questions, readers need only consult the “Invecticon” at the back of this inspired book and then turn to the full passage, in all its vituperation.

Among the eminent reviewers are George Bernard Shaw, Virgil Thomson, Hans von Bulow, Friedrich Nietzsche, Eduard Hanslick, Olin Downes, Deems Taylor, Paul Rosenfeld, and Oscar Wilde.

Itself a classic, this collection of nasty barbs about composers and their works, culled mostly from contemporaneous newspapers and magazines, makes for hilarious reading and belongs on the shelf of everyone who loves–or hates–classical music. With a foreword by Peter Schickele (“P.D.Q. Bach”).

Samples...

j_j · Oct 30, 2017

Cosmik said:
If we have to listen to a lossy codec to see if (or how well) it works, does that mean we don't understand how it works?

It means that the proper way to test a system that depends on perception is to use perception.

j_j · Oct 31, 2017

Arnold Krueger said:
The early MP3 and other encoders were fairly easy to detect, almost instantly. I don't think that much new relevant basic science has been discovered since then.

***kaff*** What? I'll go along with the first sentence.

Wayne · Oct 31, 2017

Cosmik said:
Rationally, trimming off everything above 20kHz also works, and we shouldn't need to listen to music to realise that it does, because much simpler tests in the past, and examination of the physiology of the ear have established it is so.

@Cosmik -- Are you saying that if everything above 20kHz is trimmed, a listener could not tell if the music was trimmed or not?

j_j · Oct 31, 2017

Wayne said:
@Cosmik -- Are you saying that if everything above 20kHz is trimmed, a listener could not tell if the music was trimmed or not?

Depends. Does the source have content over 20kHz? Does the playback system have the ability to play it? Is the playback system linear above 20kHz? Is the level low, or enormous above 20kHz? How old are you? What is your noise exposure history? Are you standing next to a 2 year old with unimpaired, child hearing?

Cosmik · Oct 31, 2017

Wayne said:
@Cosmik -- Are you saying that if everything above 20kHz is trimmed, a listener could not tell if the music was trimmed or not?

Yes.

(But as j_j says, their system may sound different i.e. better as a result)

Wayne · Oct 31, 2017

Wayne said:
@Cosmik -- Are you saying that if everything above 20kHz is trimmed, a listener could not tell if the music was trimmed or not?

Cosmik said:
Yes.

(But as j_j says, their system may sound different i.e. better as a result)

@Cosmik You have lost me. (Your answer of "yes" seems contradictory.)

Are you saying that:

A. if everything above 20kHz is trimmed, a listener could not tell if the music was trimmed or not? (I was thinking if an AB/ABX comparison was done before trimming and after trimming - would the difference be detectable -- I was not clear on this point. It seems to me it would be, but I was trying to understand your position)

or B. if everything above 20kHz is trimmed, a listener would not know (in the absence of quantitative instrumentation) if the music was trimmed or not? (or I suppose it would be the same if the original recording had only recorded 20kHz and below.)

Thanks -

Cosmik · Oct 31, 2017

Wayne said:
@Cosmik You have lost me. (Your answer of "yes" seems contradictory.)

Are you saying that:

A. if everything above 20kHz is trimmed, a listener could not tell if the music was trimmed or not? (I was thinking if an AB/ABX comparison was done before trimming and after trimming - would the difference be detectable -- I was not clear on this point. It seems to me it would be, but I was trying to understand your position)

or B. if everything above 20kHz is trimmed, a listener would not know (in the absence of quantitative instrumentation) if the music was trimmed or not? (or I suppose it would be the same if the original recording had only recorded 20kHz and below.)

Thanks -

Hi.

I claim that we cannot hear above 20 kHz, so we will not notice if everything above 20 kHz is trimmed off or not.

We will notice, however, if our system suffers from intermodulation distortion from ultrasonic components, producing unwanted artefacts that come down into the audible range. By removing the ultrasonic components from the music we remove the unwanted artefacts - which may be registered as an audible difference from our system.
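A small numerical sketch of that mechanism, under stated assumptions: the playback chain is modelled as a linear path plus a small second-order term (the coefficient 0.1 is hypothetical), the music is stood in for by a 1 kHz tone plus ultrasonic tones at 24 kHz and 27 kHz, and "trimming" is idealised by simply leaving the ultrasonic tones out.

```python
import math

FS = 192_000  # sample rate in Hz, chosen so all distortion products stay below Nyquist
N = 19_200    # 0.1 s of signal; every tone falls on an exact analysis bin

def tones(freqs):
    """Sum of unit-amplitude sine tones at the given frequencies (Hz)."""
    return [sum(math.sin(2 * math.pi * f * n / FS) for f in freqs) for n in range(N)]

def weakly_nonlinear(x, k=0.1):
    """Hypothetical playback chain: linear path plus a small second-order term."""
    return [s + k * s * s for s in x]

def amplitude_at(signal, freq):
    """Amplitude of the component at `freq`, via a single-bin DFT."""
    w = 2 * math.pi * freq / FS
    re = sum(s * math.cos(w * n) for n, s in enumerate(signal))
    im = sum(s * math.sin(w * n) for n, s in enumerate(signal))
    return 2 * math.hypot(re, im) / len(signal)

full = weakly_nonlinear(tones([1_000, 24_000, 27_000]))  # audible tone plus ultrasonics
trimmed = weakly_nonlinear(tones([1_000]))               # ultrasonics removed first

imd_full = amplitude_at(full, 3_000)        # 27 kHz - 24 kHz difference tone
imd_trimmed = amplitude_at(trimmed, 3_000)
print(f"3 kHz product: {imd_full:.3f} untrimmed, {imd_trimmed:.3f} trimmed")
```

In this model the two inaudible tones produce a difference tone at 3 kHz, well inside the audible band, and it disappears once the ultrasonic content is removed before the nonlinear stage.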

In other words, I am happy with CD sample rate.

Wayne · Oct 31, 2017

@Cosmik: Thanks. I got it.
