Most electrical gear is transparent these days. That doesn't mean one shouldn't buy good engineering. Loudspeakers still have a lot of distortion; someone needs to work on that. Any analog source (records or tape) is a crap shoot, depending on the time of day and your luck. The "music" has fallen apart by 10% distortion; the tone, by 2%.
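To make the percentages above concrete, here is a minimal sketch of what a figure like "10% distortion" usually means: the RMS level of the harmonics relative to the fundamental (THD). The signal, bin numbers, and the assumption that the 10% is a single third-harmonic component are all illustrative, not taken from the post.

```python
import math, cmath

def thd(samples, f0_bin, nharm=5):
    """Total harmonic distortion: RMS of harmonics 2..nharm over the fundamental,
    measured with a naive DFT at exact bin frequencies."""
    N = len(samples)
    def amp(k):
        # DFT magnitude at bin k, scaled so a unit sine reads 1.0
        return abs(sum(s * cmath.exp(-2j * math.pi * k * n / N)
                       for n, s in enumerate(samples))) * 2 / N
    fund = amp(f0_bin)
    harms = [amp(f0_bin * h) for h in range(2, nharm + 1)]
    return math.sqrt(sum(a * a for a in harms)) / fund

# Illustrative signal: a sine plus a 10% third harmonic -> THD of about 10%
N, k = 1024, 8
sig = [math.sin(2 * math.pi * k * n / N)
       + 0.10 * math.sin(2 * math.pi * 3 * k * n / N) for n in range(N)]
print(round(thd(sig, k), 3))  # ≈ 0.1
```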
See this thread for a more detailed listening test by Klippel.
Any results from that test are null and void... You need to do a proper ABX test, passing at least 13 of 16 trials, in bit-perfect WASAPI exclusive audio mode using e.g. Foobar2000's ABX comparator component to get reliable results.
Reliability means consistency. There is no reason these sorts of Internet tests cannot provide consistency. As long as one doesn't obtain wildly variant results each time one takes the test, it can be used as a valuable self-screening tool. Obviously it's not going to match the validity that a more rigorously controlled test would offer.
To say any results obtained are 'null and void' is to overstate the case quite a bit, given what these 'tests' set out to do.
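The "no wildly variant results" criterion above can be sketched as a simple spread check on repeated sessions. This is a toy illustration: the tolerance and the example thresholds are my own assumptions, not figures from the thread.

```python
def wildly_variant(thresholds_pct, tolerance=2.0):
    """True if distortion thresholds from repeated test sessions spread
    more than `tolerance` percentage points (illustrative cutoff)."""
    return max(thresholds_pct) - min(thresholds_pct) > tolerance

print(wildly_variant([1.0, 1.5, 1.2]))  # False: consistent enough to be useful
print(wildly_variant([0.5, 4.0, 1.0]))  # True: too scattered to trust
```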
I would have liked to have a similar test with acoustic instruments.

There are two definitions of reliability - the technical scientific definition, and the commonly used definition that I was using (from the Oxford English Dictionary): "The degree to which the result of a measurement, calculation, or specification can be depended on to be accurate."
That's the beauty (and frustration) of the English language for you - many words have several different meanings, some even contradictory to each other. The bottom line is that these tests are poor ways of determining your distortion audibility threshold, which can lead to mistruths being proliferated all over the internet. If you're going to do something, do it properly.
I can understand the sentiment, but in my opinion it's a mistake to ignore something of utility because it could be done better. To me the question isn't "is this test as accurate as possible?"; the question is "is this test useful?" - and within its constraints, yes, it is useful.
No data is better than wrong data