
Is Audio Science Review going about it all wrong? Or partly wrong? Or all right?

I want to start a crusade that everyone interested in high quality audio reproduction should try a level-matched double blind listening test sometime in their lives. For me it was very enlightening and really helped my thinking about audio electronics. And made me a more relaxed and happier listener.
This is a very good proposal!
I set up these kinds of tests for myself as a young adult using an amplifier switchbox I made. I took my switchbox to a couple of friendly audio salons that had very good speakers that were set up well. I compared a 70 watt JVC receiver and an 80 watt Sansui integrated against the TOL high-end preamps and amps they had at the salons. I couldn't hear any differences (as long as we kept volumes below clipping). The salespeople thought they could, but they didn't do any better than random guessing and gave up pretty quickly. To me the lack of difference was definite and, well, liberating. I had heard very clear differences in sighted tests.

tl;dr I heard all sorts of differences in electronics until I did fair blind comparisons. After that, it was clear that speakers, acoustics, and source material were a better place to focus.
I can tell a similar story. Sometime around the end of the last millennium (after I finished my DIY preamp) I did some opamp rolling on my DAC, an Arcam Black Box 3. I replaced the opamps in the analog output stage with OPA2134s (the same ones I used in the preamp) and added 100nF film caps in parallel to all electrolytic caps (that was the hype back then). When I was finished I did a non-blind test comparing the modified Black Box 3 with an unmodified one (I had a second unit in our second system). Oh boy, what an improvement this brought - more dynamic, better sound stage, veils lifted - you name it.

The interesting part happened the day after, when I asked my wife to perform a blind test. I scored 8 of 10 right - so it may be statistically significant. But - it was extremely hard to hear any difference at all! This told me once and for all that I cannot rely on what I'm hearing, and the direct consequence was that I did not modify the second Black Box since the very marginal improvement was not worth the time and money invested.
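
For anyone who wants to check what "8 of 10" means statistically, here is a minimal sketch. It assumes each trial is an independent 50/50 guess when no difference is audible, and it uses a one-sided binomial tail; the function name is made up for this example.

```python
from math import comb

def p_value_at_least(correct: int, trials: int, p_guess: float = 0.5) -> float:
    """Probability of getting `correct` or more right out of `trials`
    by pure guessing (one-sided binomial tail)."""
    return sum(
        comb(trials, k) * p_guess**k * (1 - p_guess) ** (trials - k)
        for k in range(correct, trials + 1)
    )

# Assumption: 10 independent trials, chance level 0.5 per trial.
p = p_value_at_least(8, 10)
print(f"P(8 or more of 10 by guessing) = {p:.4f}")
```

Under those assumptions the result lands just above the conventional 0.05 threshold, so "may be significant" is a fair description; 9 of 10 would clear it comfortably, and more trials would settle the question either way.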

The typical audiophile trusts his ears and usually argues that the human ear can hear things which science can't measure, thereby ignoring the fact that science has indeed shown that the human ear is fallible and the engineers can measure differences where humans fail in blind tests but not the other way around. There is a ton of scientific results showing the ability and limits of the human hearing sense (combination of ear and brain processing) but it is disregarded by audiophiles because they cannot accept that their sense may tell a lie. Maybe because one had to accept that a very costly investment in high end equipment might have been stupid.
 
I just want the music to sound good, maybe even lifelike, though that's a stretch to achieve without really big bucks.

This is another interesting question. After many years in the hobby, I decided that what I really wanted was a playback system that just very accurately reproduced what was on the recording; that if I could get accuracy (and the neutrality that requires), then the recording is responsible for the "lifelike" part.

And I have found that the more neutral my headphones and correction equalization became, the more forgiving I am of bad recordings. I tell myself I'm just hearing things "the way the artist intended" and it's easier to accept recording flaws, because artists. Also, I can listen longer, louder, with less fatigue, to more diverse recordings.
 
That's not what I was getting at, I was saying that once you accept that the difference between some components may well exist, and you may even be able to pass a DBT, it's not the fact that the difference exists that counts most, it's how big the audible difference is. When you recalibrate to accepting the audible difference as very small, then how you perceive them sighted can be effectively discarded.
Sorry I have lost you here.
1. You started off being able to perceive an audible difference between components in a sighted experiment.
2. You performed a DBT and were unable to detect the difference, or detected very little difference, which identified the threshold of hearing (i.e. the "audible difference").
3. Now you go back to step #1 and apply what you learned in step #2 ("recalibrate") to convince yourself that the differences you hear in a sighted setup are not significant enough.
Am I getting it right?

I am going to conduct a test at work in a few days. Without going into much detail, it will be a simple study where a group of people is exposed to some audio artifacts. I will ask them to provide an "annoyance score" as well as to count the number of artifacts they noticed. The "annoyance score" should give me a subjective opinion of how well a given system performs for each participant, whereas the number of artifacts counted should provide a more "objective" measure. What I am afraid will happen is that the number of artifacts noticed by participants will affect their subjective score (annoyance score). I would like to run this experiment once using the same group of people, but I do not know how to get objective and subjective data at the same time without one influencing the other.
 
The typical audiophile trusts his ears and usually argues that the human ear can hear things which science can't measure, thereby ignoring the fact that science has indeed shown that the human ear is fallible and the engineers can measure differences where humans fail in blind tests but not the other way around. There is a ton of scientific results showing the ability and limits of the human hearing sense (combination of ear and brain processing) but it is disregarded by audiophiles because they cannot accept that their sense may tell a lie. Maybe because one had to accept that a very costly investment in high end equipment might have been stupid.
I think you oversimplify things. Take a look at this video (fast forward to ~22minutes into it). Do you want to guess what measuring equipment ESS is using?
 
Sorry I have lost you here.
1. You started off being able to perceive an audible difference between components in a sighted experiment.
2. You performed a DBT and were unable to detect the difference, or detected very little difference, which identified the threshold of hearing (i.e. the "audible difference").
3. Now you go back to step #1 and apply what you learned in step #2 ("recalibrate") to convince yourself that the differences you hear in a sighted setup are not significant enough.
Am I getting it right?

In a fair DBT experience, when you flip the switch from one to the other, and hear no difference, then try again, and again, and again, or maybe even sort of hear a subtle difference but aren't sure, it's an amazing experience. For me it was far more convincing than any differences I ever heard in a sighted comparison. The "absolute similarity" was way more audible than the "differences" I heard. YMMV, of course. I think if there's a lot riding on confirmation of audible differences, it will be a bad and frustrating thing to try.
 
Funny you mention that. A number of times when testing headphone amplifier I have found very large differences in soundstage. Then I realized in *every case* the channels were swapped! Remarkable how the brain interprets soundstage differently just because the channels are reversed.

Outside of that, I have listened and tested dozens and dozens of headphone amplifiers. Not once have I found a difference in soundstage. Yet every subjective review claims differences in soundstage between headphone amplifiers.
This is no wonder. Soundstage in the playback chain is a question of inter-channel crosstalk. As far as I know, a channel separation of more than 35 or 40 dB is totally sufficient for a full stereo experience. These are numbers that only a very badly designed amplifier would fail to reach.

And this is good, since there is no phono cartridge available with better crosstalk figures. Folks who listen to vinyl need not care much about crosstalk, as any competently designed amplifier is way better than required.
All of this aside, soundstage is in the music. For sound reproduction, we can modify that with different levels in each channel or delay. Both of these are very easy to measure.

BTW, there is a simple way to increase the sound stage. Take a part of the signal of each channel, high-pass filter it, reverse its polarity and add it to the other channel. Used for example by the Behringer Ultrafex pro.
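
The recipe above can be sketched in a few lines. This is only an illustration of the idea, not how any particular product implements it: the first-order high-pass filter, the 300 Hz cutoff and the 0.3 mix amount are all assumptions picked for the example, and the function names are hypothetical.

```python
import math

def one_pole_highpass(x, cutoff_hz, sample_rate):
    """First-order (RC-style) high-pass filter over a list of samples."""
    if not x:
        return []
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    alpha = rc / (rc + dt)
    y = [float(x[0])]
    for n in range(1, len(x)):
        y.append(alpha * (y[-1] + x[n] - x[n - 1]))
    return y

def widen(left, right, amount=0.3, cutoff_hz=300.0, sample_rate=48000):
    """Add a high-passed, polarity-inverted share of each channel
    to the opposite channel. `amount` and `cutoff_hz` are assumed values."""
    hp_left = one_pole_highpass(left, cutoff_hz, sample_rate)
    hp_right = one_pole_highpass(right, cutoff_hz, sample_rate)
    # Subtracting is the same as adding the polarity-reversed signal.
    out_left = [l - amount * r for l, r in zip(left, hp_right)]
    out_right = [r - amount * l for r, l in zip(right, hp_left)]
    return out_left, out_right

# Example: a 1 kHz tone panned hard left, silence on the right.
sr = 48000
tone = [math.sin(2 * math.pi * 1000 * n / sr) for n in range(4800)]
wide_left, wide_right = widen(tone, [0.0] * len(tone), sample_rate=sr)
```

With a hard-left source the right channel ends up carrying an inverted copy of the tone, which is what pushes the image outward. Both the level change and the added cross-feed are easy to measure, in line with the point above.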
 
I am going to conduct a test at work in a few days. Without going into much detail, it will be a simple study where a group of people is exposed to some audio artifacts. I will ask them to provide an "annoyance score" as well as to count the number of artifacts they noticed. The "annoyance score" should give me a subjective opinion of how well a given system performs for each participant, whereas the number of artifacts counted should provide a more "objective" measure. What I am afraid will happen is that the number of artifacts noticed by participants will affect their subjective score (annoyance score). I would like to run this experiment once using the same group of people, but I do not know how to get objective and subjective data at the same time without one influencing the other.

I'm sure your controls will be immaculate.
 
Blind testing (BT) at matched levels is essential. It should be mandatory for anyone giving any kind of advice to others. I have two pairs of speakers almost side by side in a "friendly" room. It is not totally optimal because one pair is more separated than the other (1. but properly oriented to the same sweet spot; 2. alternating the speakers is a no-go for aesthetic reasons). There are audible differences between the speakers, totally understandable as they are quite different in design. I have (had?) a marked preference for one pair for some types of music and for the other pair for other types.

Now, even if they are my speakers in my room and I have spent a ton of time in front of them, when I matched the levels (with a umik at the sweet spot) and asked my partner to play random stuff from my library while I kept my eyes closed, freely switching from one system to the other (basically transferring Roon zone from one system to the other) the differences I heard and expected to hear collapsed. Don't get me wrong, the differences did not disappear totally, it is just that they suddenly didn't seem that significant.

On the other hand, when I select the zone myself, I now feel that expectation bias kicks in at full speed. Expecting one system to be more resolving and expecting the other system to have more punch definitely magnifies the differences.

I could have sworn I could tell the systems apart with 90%+ accuracy, but it seems that it is more like 65% (we keep testing...). Closing my eyes was eye-opening.

If I only had one system, I may have had the impression, thanks to some lyrical enthusiastic review, that I was missing something essential. That would have been an illusion. And before anyone asks, I did not end up with two systems because of a doomed pursuit but simply because one of the pair of speakers was a bit, hmmm, temperamental... :mad::mad::mad:

And, finally for the sake of completeness, I have to say that one of the pairs needed to be matched with an amplifier that had more grunt at low impedance before it could perform (and in that case, the difference was night and day).
 
I would like to see an experiment that was not blind. For instance, a group of audio "experts" gathers at Amir's place and compares his ML amps to a $300 receiver from Best Buy. Volume matched, same source, nowhere near clipping (confirmed with AP measurements for each amp), along with controls for everything else you believe would hide the difference. Then someone distracts the experts for a short time and the system is reconfigured so that when the ML amps are selected the $300 receiver is playing, with nothing giving them a clue as to what happened. In theory all of the bias should have been taken away in this sighted comparison. I wonder what the outcome of that experiment would be.
Such an outcome most likely will show my ML amps are superior. The ML amplifiers have tons of power. It would be trivial to push a $300 amplifier into distortion well before my ML amps would.

The right experiment would be to equalize performance of the amplifiers, not rely on prices.
 
I think you oversimplify things. Take a look at this video (fast forward to ~22minutes into it). Do you want to guess what measuring equipment ESS is using?
AP? I haven't listened to the whole video (thanks for the link, very interesting), but there are a few things to mention:
  1. In the blind tests the "audiophile guy" says "this sounds better" when comparing a normal DAC with a sigma delta DAC. Are we sure that the better sounding DAC really is the more transparent one - especially when listening to old recordings? The difference between those DACs is easy to measure, and a transparent DAC may reveal faults in the recording that a non-transparent DAC may not reveal, so the latter may appear to be the better sounding one.
    Well, according to this presentation the audiophile guy was correct.
  2. That humans hear below the noise floor is not a new finding. I think the engineers just thought that their numbers were good enough at that time. With averaging and high FFT sizes you can probe very deep into the noise floor. However FFT is not the only way to measure audio equipment.
  3. Go to 28:00 where the ability of the human hearing is described. It shows that science also advances. This is actually how science works: tests are done (here: blind listening tests) and then the engineers try to find the cause for differences. They were able to do so and they also found a way to measure that part of the performance (noise dependency on DAC state) which correlates with the test results.
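
On point 2, the reason a large FFT lets you look deep into the noise is the usual processing-gain relation: for white noise, the per-bin noise level sits roughly 10*log10(N/2) dB below the total noise level for an N-point FFT. A small sketch of that arithmetic, ignoring window correction factors (the function name and the list of sizes are just for illustration):

```python
import math

def fft_processing_gain_db(fft_size: int) -> float:
    """Approximate dB by which the displayed per-bin noise floor sits
    below the total white-noise level (window corrections ignored)."""
    return 10.0 * math.log10(fft_size / 2)

# Assumed example sizes; any power of two works the same way.
for n in (1024, 32768, 1048576):
    print(f"{n:>8}-point FFT: about {fft_processing_gain_db(n):.1f} dB")
```

Each doubling of the FFT length pushes the displayed floor down by about 3 dB, which is why a tone well below the broadband noise level can still stand out in a long FFT.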
 
Such an outcome most likely will show my ML amps are superior. The ML amplifiers have tons of power. It would be trivial to push a $300 amplifier into distortion well before my ML amps would.

The right experiment would be to equalize performance of the amplifiers, not rely on prices.
Understood, that is why I recommended measuring the amplifiers with AP to identify a "not to exceed" range. I get what you are saying though; you can replace the $300 receiver with a Parasound JC-1 or similar amps. The goal is not to use speakers that require welding equipment (i.e. Krell).
 
Understood, that is why I recommended measuring the amplifiers with AP to identify a "not to exceed" range. I get what you are saying though; you can replace the $300 receiver with a Parasound JC-1 or similar amps. The goal is not to use speakers that require welding equipment (i.e. Krell).
My speakers and space I have to fill require that much power.

If you were to not use it, then yes, we could orchestrate content and situation where such a difference would be inaudible.
 
AP? I haven't listened to the whole video (thanks for the link, very interesting), but there are a few things to mention:
  1. In the blind tests the "audiophile guy" says "this sounds better" when comparing a normal DAC with a sigma delta DAC. Are we sure that the better sounding DAC really is the more transparent one - especially when listening to old recordings? The difference between those DACs is easy to measure, and a transparent DAC may reveal faults in the recording that a non-transparent DAC may not reveal, so the latter may appear to be the better sounding one.
    Well, according to this presentation the audiophile guy was correct.
  2. That humans hear below the noise floor is not a new finding. I think the engineers just thought that their numbers were good enough at that time. With averaging and high FFT sizes you can probe very deep into the noise floor. However FFT is not the only way to measure audio equipment.
  3. Go to 28:00 where the ability of the human hearing is described. It shows that science also advances. This is actually how science works: tests are done (here: blind listening tests) and then the engineers try to find the cause for differences. They were able to do so and they also found a way to measure that part of the performance (noise dependency on DAC state) which correlates with the test results.
I have a lot of questions myself as well. I wish there was more information about this study. How do brains react to high frequencies above the "audio" range? At one point I worked for a company that used transducers driven with 40kHz square waves (they were used to identify speaker placement with respect to listener position). During calibration of those transmitters, testers reported headaches after about 30 minutes of interacting with them. Obviously none of them could hear the sound coming out of them...
 
I have a lot of questions myself as well. I wish there was more information about this study. How do brains react to high frequencies above the "audio" range?
I only know of one such study and it was shown that their testing was invalid in that intermodulation distortion in the speaker created audible band content.

There is actually a wiki on it: https://en.wikipedia.org/wiki/Hypersonic_effect
 
The goal is not to use speakers that require welding equipment (i.e. Krell).

Oh.

I thought the goal was something else.

My bad.

Carry on!
 
I only know of one such study and it was shown that their testing was invalid in that intermodulation distortion in the speaker created audible band content.

There is actually a wiki on it: https://en.wikipedia.org/wiki/Hypersonic_effect
There is a report by Klein & Hummel about this study. The experiment was done as follows:
  1. using one speaker to reproduce a test signal (within 20 kHz range) plus a pulsed ultrasonic signal (31.5 kHz)
  2. using two speakers (one for test signal only, one for ultrasonic signal only)
In the first experiment the listeners could hear a difference when the ultrasonic signal was present; in the second experiment they could not. Measurements showed that in the first experiment a distortion product at 3.5 kHz was emitted by the speaker, which is in the audible range.
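
To show how an ultrasonic tone can end up producing something audible inside a single nonlinear driver, here is a small sketch that lists low-order intermodulation products of two tones. The 31.5 kHz value is from the description above; the 14 kHz audio tone is purely a hypothetical stand-in chosen for illustration and is not taken from the report, and the function name is made up.

```python
def audible_intermod_products(f1_hz, f2_hz, max_order=3, audible_limit_hz=20000.0):
    """List frequencies |m*f1 + n*f2| with both tones involved and
    2 <= |m| + |n| <= max_order that fall inside the audible band."""
    products = set()
    for m in range(-max_order, max_order + 1):
        for n in range(-max_order, max_order + 1):
            order = abs(m) + abs(n)
            if m == 0 or n == 0 or order < 2 or order > max_order:
                continue  # skip the tones themselves and pure harmonics
            f = abs(m * f1_hz + n * f2_hz)
            if 0 < f <= audible_limit_hz:
                products.add(float(f))
    return sorted(products)

# Hypothetical audio tone (14 kHz, assumed) plus the 31.5 kHz ultrasonic signal.
print(audible_intermod_products(14000.0, 31500.0))
```

With these assumed inputs the list includes a product at 3.5 kHz (31.5 kHz minus twice 14 kHz). When the two signals go to separate speakers, as in the second experiment, there is no shared nonlinearity to mix them, which is consistent with the difference disappearing.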

Attached is a PDF of this report (in German).
 

Attachments

  • hoeren_ueber_20kHz.pdf
    187.6 KB · Views: 378
I'm reminded of all the information about viewing distance from a hidef TV. Like https://www.rtings.com/tv/reviews/by-size/size-to-distance-relationship
The eye can only resolve so much, so the story goes.
But I drew a single-pixel-thick black line on a white background on my old 46" 1080P Samsung. I could see the line easily from 30 feet away, but a 46" 1080P screen is supposedly only worth it from about 9 feet away. It wasn't clearly resolved, but with my crap eyes, even with glasses, little is. My point is not that the viewing distances are completely off base, but that the description of perception and resolution is incomplete. I definitely think measurements can work. I'm just not convinced that we have the complete set of measurements going on here.

That said, I'm totally interested in doing an AB test. Anybody in the SF Bay Area, preferably East Bay, wanna help me do it? We can stack my Carver a-500x against your amp or my SMSL, or my $15 DAC against your $300 DAC or $3000 DAC or whatever. I've even got a nice DVOM we can use to measure voltage at the speaker terminals. Must be someone with good speakers, though. Preferably someone available weekday mornings.

BTW, how should I measure voltage? One lead on each speaker wire/terminal? How many volts is typical? Worth measuring amperage too? What do we do when one channel is louder than the other? This is complicated.
 

The proper test would have been a pair of lines adjacent to each other, separated by one pixel of white. Could you see (resolve) those as separate from 30 feet away? I think somewhere closer they would have merged together on you. Also, the length of the line is obviously more than one pixel, which makes a single line easier to detect.
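
A quick back-of-the-envelope check of the angles involved, as a sketch only. It assumes a 16:9 panel with 1920 horizontal pixels and uses the commonly quoted figure of about 1 arcminute for the finest detail normal vision can resolve; the function name is made up for this example.

```python
import math

def pixel_arcminutes(diagonal_in, horizontal_pixels, distance_ft, aspect=(16, 9)):
    """Angle subtended by one pixel width, in arcminutes."""
    w, h = aspect
    width_in = diagonal_in * w / math.hypot(w, h)   # screen width from diagonal
    pixel_in = width_in / horizontal_pixels          # width of one pixel
    angle_rad = 2 * math.atan(pixel_in / (2 * distance_ft * 12))
    return math.degrees(angle_rad) * 60

for distance_ft in (9, 30):
    arcmin = pixel_arcminutes(46, 1920, distance_ft)
    print(f"{distance_ft} ft: one pixel subtends {arcmin:.2f} arcmin")
```

Under these assumptions a pixel subtends roughly 0.7 arcminutes at 9 feet and roughly 0.2 arcminutes at 30 feet. Spotting a long, high-contrast line that thin is a detection task; telling two such lines apart is a resolution task, and that is the one the viewing-distance charts are about.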

The measurements of voltage are simple. Pick a few pieces of music, or at least one, and set a comfortable listening level.
Then, without touching the volume, play a test tone and measure the voltage across the speaker terminals. When you swap components, use the same tone to get back to the same voltage.

If one channel is louder, you need to adjust the other component to have the same loudness separately for each channel. How you do this will depend upon the gear in your system. Might be in the playback software with a balance control.
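
To turn two voltage readings into a level difference, the standard conversion is 20*log10(V_a / V_b). A minimal sketch follows; the 0.1 dB tolerance is an assumed target often mentioned for level matching, not something measured here, and the function names and example voltages are made up.

```python
import math

def level_difference_db(v_a: float, v_b: float) -> float:
    """Level of reading A relative to reading B, in dB (same test tone, same load)."""
    return 20.0 * math.log10(v_a / v_b)

def is_level_matched(v_a: float, v_b: float, tolerance_db: float = 0.1) -> bool:
    """True if the two readings are within `tolerance_db` (assumed default)."""
    return abs(level_difference_db(v_a, v_b)) <= tolerance_db

# Hypothetical readings in volts, taken per channel with the same tone.
for name, v_a, v_b in (("left", 2.83, 2.80), ("right", 2.83, 2.70)):
    diff = level_difference_db(v_a, v_b)
    print(f"{name}: {diff:+.2f} dB, matched: {is_level_matched(v_a, v_b)}")
```

Run the check separately for each channel, then adjust the louder or quieter component until both channels land inside the tolerance.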

I also might add the second rule of listening comparisons. If there is a real difference when level matched, it is almost always a frequency response difference. Not always, but the great majority of the time. The saying "hifi is 85% frequency response" comes from this idea. Some of the gear in your original post almost surely interacted with the speaker to cause a FR difference that likely was audible.
 