digicidal
Major Contributor
Following the rabbit-hole links in that post (on Drop) and subsequently on SBAF, there is an obvious problem with Amir's measurements... it's just that no one actually took issue with this aspect. Before any butts are hurt, let me say I'm not claiming the measurements were incorrect - and certainly not that Amir doesn't know what he's doing. He, and many others here, know a hell of a lot more than I do - that's for sure.
With subjective reviews there is a level of inherent confirmation bias that is selection-dependent - i.e., once 20 reviewers have said a particular device sounds exceptional (or horrid), anyone expressing the complete opposite opinion risks exactly the kind of online lynching seen in the linked posts. Sure, there might be a few mavericks, but the assumption is that if everyone else hears something you don't... you must be wrong. And since there is usually no shortage of subjective reviews, this bias presents itself rapidly to anyone searching for information on a product they're considering - often before the product even ships in volume.
With objective, measurement-driven reviews, the opposite problem exists. In many (if not most) cases, the manufacturer doesn't even provide a full set of AP-measured charts. Even if they do, was the sample device the same as the one reviewed on ASR? What about the rest of the chain? Couple that with a dizzying array of products, a variety of testing methodologies, and the poor QC seen even in high-end brands... and you are often left with results in total isolation. With nothing to compare them to, can these results be considered much more than anecdotal?
The truly optimal situation would be 10 reviewers like Amir, all with similar test configurations, all being sent multiple samples of a given product (one from the manufacturer, plus a couple pulled from actual retail stock by consumers). In that fantasy world, there would be a fairly simple way of catching the edge case where a single bad unit of an otherwise stellar product was tested. It would also be trivial to determine whether a particular test (or reviewer) failed in some aspect of configuration or execution - if a single reviewer consistently produced results outside the mean, that would stand out immediately.
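Purely to illustrate that fantasy scenario, here's a minimal sketch (Python, with invented reviewer names, invented SINAD numbers, and a 2-sigma rule I picked arbitrarily) of how pooled results could flag a reviewer or unit sitting outside everyone else's:

```python
# Hypothetical sketch of the "ten reviewers" cross-check described above.
# All numbers, names, and the 2-sigma threshold are invented for illustration;
# a real comparison would need matched test conditions (level, load, etc.).
from statistics import mean, stdev

# reviewer -> SINAD (dB) measured on each sample unit of the same product
results = {
    "reviewer_a": [113.2, 112.9, 113.1],
    "reviewer_b": [112.8, 113.0, 113.3],
    "reviewer_c": [106.4, 106.1, 106.7],  # consistently low: bad rig or bad unit?
}

for reviewer, vals in results.items():
    # Compare this reviewer's average against everyone else's pooled results
    others = [v for name, xs in results.items() if name != reviewer for v in xs]
    z = (mean(vals) - mean(others)) / stdev(others)
    flag = "  <-- outside 2 SD of the others; check setup or sample" if abs(z) > 2 else ""
    print(f"{reviewer}: mean {mean(vals):.1f} dB, z = {z:+.1f}{flag}")
```

The real point, of course, is that with one reviewer and one sample there is nothing to pool against.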
Unfortunately, objective reviews are very hard to come by - and are consistently limited to one or two samples of a product. So although reproducible results can be achieved... without the ability to confirm that the unit you, as a consumer, purchased measures the same, you are left with many of the same doubts about the review and/or its source.
When those doubts lead to fanaticism - positive or negative - problems can and will arise. I would still much rather have even partially flawed measurements to include in a purchasing decision than nothing but subjective platitudes about how "sublime and compelling" some audio device is to someone. Just because I allow that it's possible (though unlikely) for something to measure well but not sound good doesn't mean the measurements are the problem. If anything, it likely just means that not enough measurements were taken, or that the sample size was too small for a meaningful comparison.
While everyone is quick to yell that someone's subjective opinion is meaningless without full documentation of their ABX methodology... in other cases, a metric without confirmation - on a sample size of one - isn't necessarily viewed with the same skepticism. For some, at least.