Amir, I'm simply telling you how both tests work. The line you show from an ABX test can be obtained, exactly, FROM AN ABC/hr test. You get exactly that information, and ABC/hr does force a choice. You certainly ALSO get more information. I don't know why you think I'm supporting ABX testing for anything more than "can you make out a difference, any difference?" That's all it's for. In an ABC/hr test, you are REQUIRED to identify the hidden reference. That gives you precisely the same data as an ABX test. PRECISELY the same information. Yes, you also get MORE data. Nobody is disputing that.
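To make the "same data as ABX" point concrete, here is a minimal sketch, assuming a hypothetical per-trial record of the two hidden-stimulus grades and which slot held the reference. The higher-graded stimulus is taken as the listener's pick for the hidden reference, which collapses each ABC/hr trial to a right/wrong outcome that can be scored exactly like ABX, with a one-sided binomial test against guessing. The names `AbcHrTrial`, `identified_reference` and `binomial_p_value`, and the rule that a tie counts as a miss, are illustrative assumptions, not part of any standard.

```python
from dataclasses import dataclass
from math import comb


@dataclass
class AbcHrTrial:
    # Hypothetical record of one ABC/hr trial: the grades given to the two
    # hidden stimuli, and whether slot A actually held the reference.
    grade_a: float
    grade_b: float
    reference_is_a: bool


def identified_reference(trial: AbcHrTrial) -> bool:
    """ABX-style outcome: did the listener pick out the hidden reference?

    The stimulus graded higher is treated as the listener's choice of
    reference. A tie is counted as a miss (an assumption, not a rule).
    """
    if trial.grade_a == trial.grade_b:
        return False
    picked_a = trial.grade_a > trial.grade_b
    return picked_a == trial.reference_is_a


def binomial_p_value(correct: int, n: int) -> float:
    """One-sided chance of >= `correct` hits in `n` trials by guessing (p = 0.5)."""
    return sum(comb(n, k) for k in range(correct, n + 1)) / 2 ** n


# Made-up example trials, for illustration only.
trials = [
    AbcHrTrial(5.0, 4.2, True),
    AbcHrTrial(3.8, 5.0, False),
    AbcHrTrial(5.0, 4.5, True),
    AbcHrTrial(4.9, 5.0, True),  # reference graded lower: counted as a miss
]
hits = sum(identified_reference(t) for t in trials)
print(hits, len(trials), binomial_p_value(hits, len(trials)))
```

The grades themselves are the "MORE data"; throwing them away and keeping only the hit/miss column leaves exactly what an ABX run would have produced.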
BS1116, i.e. ABC/hr, also provides an estimate of how far the non-reference signal is from the original. Certainly that information is useful, but, please, please don't call it MOS. Which brings us to the test you show in the bottom half of the previous post:
In the second example just above, you're showing MUSHRA results, not ABC/hr. Yes, it's a standard. I'm not the only person to express dislike for it. It's a rather questionably designed test, with results that are not guaranteed to be pairwise transitive, among other problems, but it is used for expediency. The fact that it uses an "MOS" sort of system is one of the reasons it's a bit shaky. As I mentioned to someone else a few pages back, it also forces the subject to rank a multidimensional sensation along one axis. That's pretty much the heart of why MOS testing is a problem. Basically, if you pick the right (or wrong, as the case may be) 3 stimuli, you may find A preferred over B in a pairwise test, B over C, and C over A. There is a long history of this happening in MOS-like regimes. This can happen even when the MUSHRA test ranks A, B and C in that order. There is a large confounding problem in judging preference.
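Here is a small sketch of why a one-axis score cannot capture the A over B, B over C, C over A situation. It takes hypothetical pairwise forced-choice outcomes and tries every possible single ranking, counting how many of the pairwise results each ranking contradicts. For a cycle, every ranking contradicts at least one pairwise result; for a transitive set, some ranking contradicts none. The data and the `violations` helper are made up for illustration.

```python
from itertools import permutations

# Hypothetical pairwise forced-choice outcomes, as (winner, loser).
pairwise = [("A", "B"), ("B", "C"), ("C", "A")]


def violations(order, prefs):
    """Count pairwise preferences that the single ranking `order` contradicts."""
    rank = {item: i for i, item in enumerate(order)}  # lower index = preferred
    return sum(1 for winner, loser in prefs if rank[winner] > rank[loser])


items = sorted({x for pair in pairwise for x in pair})
# Best achievable fit of any one-dimensional ordering to the pairwise data.
best = min(violations(order, pairwise) for order in permutations(items))
print(best)
```

Any scale that places A, B and C on one line, MUSHRA scores included, is one of these orderings, so it must disagree with at least one of the cyclic pairwise judgments.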
Neither of the two examples above is BS1116, which is certainly a credible method. BS1116 is a distance method that uses impairment phrasing. There are also scales much like the ITU scale that are purely distance scales, but experience shows that most trained subjects judge distance just as well with the ITU method, even with its impairment phrasing. Apparently, cognitively speaking, impairment is a more reliable judgment than preference. In any case, yes, most codec testing is ABC/hr. Yes, it provides a scale that looks like a preference scale, but it is important to understand that at heart it is a distance scale rather than a preference scale.
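One way to see the "distance at heart" point: in BS1116-style analysis the number usually reported is a difference grade, the grade given to the coded item minus the grade given to the hidden reference, so it measures how far the item sits from the original rather than how much it is liked. A minimal sketch follows, assuming the five-grade impairment scale anchors of ITU-R BS.1116 (5.0 "Imperceptible" down to 1.0 "Very annoying"). The helper names `diff_grade` and `nearest_anchor` are hypothetical, and this leaves out listener screening and statistics entirely.

```python
# Five-grade impairment scale anchors (continuous grading from 1.0 to 5.0).
IMPAIRMENT_ANCHORS = {
    5.0: "Imperceptible",
    4.0: "Perceptible, but not annoying",
    3.0: "Slightly annoying",
    2.0: "Annoying",
    1.0: "Very annoying",
}


def diff_grade(grade_codec: float, grade_hidden_ref: float) -> float:
    """Difference grade: coded-item grade minus hidden-reference grade.

    0 means no difference was heard; more negative means farther from
    the original.
    """
    for g in (grade_codec, grade_hidden_ref):
        if not 1.0 <= g <= 5.0:
            raise ValueError(f"grade {g} outside the 1.0-5.0 scale")
    return round(grade_codec - grade_hidden_ref, 2)


def nearest_anchor(grade: float) -> str:
    """Verbal label of the closest anchor on the impairment scale."""
    return IMPAIRMENT_ANCHORS[min(IMPAIRMENT_ANCHORS, key=lambda a: abs(a - grade))]


sdg = diff_grade(3.6, 5.0)
print(sdg, nearest_anchor(3.6))
```

The verbal anchors read like preference words, but the quantity actually analyzed is the gap to the hidden reference.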
This is not new material, really.