New Research on Audibility of Distortion in Headphones

This can be improved by shaping the level of the tones vs. frequency as described in IEC-268-1. The shape is basically 1/f (pink noise) with a high-pass somewhere below 60Hz and also a low-pass in the upper treble range, if I recall right.
Yes, that's what Prof. Anselm Goertz in his Sound & Recording studio monitor tests used and I also adopted for my personal measurements.
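
For anyone who wants to play with the 1/f shaping mentioned above, here is a minimal sketch. It relies on the fact that pink noise carries equal power per octave, so a multitone with a constant number of equal-amplitude tones per octave already has a pink-like envelope. The band limits (60 Hz to 15.36 kHz, standing in for the high-pass and low-pass) and the three tones per octave are assumptions for illustration only; this is not the IEC weighting curve or anyone's actual test signal.

```python
import math

def log_spaced_tones(f_lo, f_hi, per_octave):
    """Tone frequencies with a constant number of tones per octave."""
    count = int(math.log2(f_hi / f_lo) * per_octave + 1e-9) + 1
    return [f_lo * 2 ** (k / per_octave) for k in range(count)]

def octave_power(freqs, amps, f_start):
    """Sum of tone powers (amp^2 / 2) in the octave [f_start, 2 * f_start)."""
    return sum(a * a / 2 for f, a in zip(freqs, amps)
               if f_start <= f < 2 * f_start)

# Assumed band limits: 60 Hz .. 15.36 kHz (8 octaves), 3 tones per octave.
freqs = log_spaced_tones(60.0, 15360.0, per_octave=3)
amps = [1.0] * len(freqs)  # equal amplitude per log-spaced tone

# Every octave holds the same power, which is the defining pink property.
print(octave_power(freqs, amps, 60.0), octave_power(freqs, amps, 960.0))
```

With linearly spaced tones the same shape would instead need amplitudes falling as 1/sqrt(f).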
 
EAC started to run multitone tests some time ago, below examples (voltages are different because output was spl-calibrated)
And what do you learn from that? How do you perform a perceptual analysis on it? Here are my HD measurements:

[Image]

Notice how explainable the results are. You know the harmonic order. You know the region of unhappiness from 200 Hz to 1 kHz.

What did you possibly learn from the IMD that is useful above the measurements above???
 
A sweep will never show you the Intermodulation (IMD) or Doppler distortion that occurs when a 40 Hz excursion modulates a 2 kHz tone.
And how do you see either in a Multitone??? Doppler distortion being FM, would produce wide spectrum of Bessel spikes. Unless the modulation is massive, it will easily get lost in the intermodulation created by non-linearities -- the same ones HD shows. As to IMD itself, you no longer have the large excitation in base and small signal as high frequency. So that too is lost. Net, net, if your aim is that data, you have the wrong tool for the job when it comes to Multitone.
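
As a rough check on the size of those Bessel sidebands, here is a small sketch. It assumes pure Doppler (phase) modulation with index beta = 2*pi*f_high*x_peak/c, and the 5 mm peak excursion is a made-up illustrative value, not a measurement of any speaker in this thread.

```python
import math

def bessel_j(n, x, terms=30):
    """Bessel function of the first kind J_n(x) via its power series."""
    return sum(
        (-1) ** m / (math.factorial(m) * math.factorial(m + n))
        * (x / 2) ** (2 * m + n)
        for m in range(terms)
    )

def doppler_sideband_db(f_high, x_peak_m, order=1, c=343.0):
    """Level of the n-th FM sideband relative to the carrier, in dB.

    beta is the peak cone displacement expressed as a phase shift at f_high.
    The sidebands sit at f_high +/- order * f_low; f_low itself only sets
    their spacing, not their level, for a given displacement.
    """
    beta = 2 * math.pi * f_high * x_peak_m / c
    return 20 * math.log10(abs(bessel_j(order, beta) / bessel_j(0, beta)))

# Assumed example: 2 kHz tone riding on 5 mm peak excursion.
print(round(doppler_sideband_db(2000.0, 0.005, order=1), 1))
print(round(doppler_sideband_db(2000.0, 0.005, order=2), 1))
```

Under these assumptions the first sideband pair comes out on the order of 20 dB below the 2 kHz tone, and the higher orders fall off quickly. Whether that is visible in a given multitone plot depends on what other products land nearby.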

You argued that multitone is a 'less severe' test because energy is spread out. While the per-tone SPL is lower, the stress on the motor system is much higher. In a THD sweep, the voice coil is mostly centered except at the lowest frequencies. In a multitone test, the low-frequency components keep the voice coil pushed toward the limits of the magnetic gap Bl(x) and suspension Kms(x) while the high-frequency tones are being reproduced. This 'stress test' reveals nonlinearities in the motor and inductance that a sweep - which allows the coil to return to center for every high-frequency measurement - simply misses.
Sorry, no way that is true. I can push the driver to maximum with a single tone at 20 Hz. It is just a matter of input voltage to the driver. No test that spreads that energy across the spectrum will do the same. I can't believe you are arguing this.

You are also ignoring the artifacts of Multitone testing. You get spectral leakage from each band which is overlaid on top of the distortion products.

Ultimately Multitone is a soup of artifacts and multitude of distortions, which works counter to how our hearing works. As I noted, its main application is "one and done." A person can see a quick view of multiple things at once. And so can a factory worker testing a speaker.

Harmonic distortion, on the other hand, easily lends itself to perceptual analysis because that is how our hearing works (banks of filters). Just look at how revealing the F6B distortion results were compared to the mess that Erin had produced. Mine easily points to problem areas that make sense. His is some stepped and smoothed graph with little meaning. It also has some inversion in low frequencies which makes no sense.
 
You mention that you can push a driver to maximum at 20 Hz with a single tone. That is true, but that test only tells you how the speaker handles 20 Hz. It tells you nothing about how that 20 Hz displacement affects the rest of the music. In a THD sweep, when the analyzer is at 2 kHz, the voice coil is sitting at x=0 (rest). In a multitone test, the low-frequency content keeps the coil at xmax while it tries to play 2 kHz. This captures the Modulation of the Force Factor Bl and Inductance Le. When the coil is at the edge of the gap, the motor's grip on the high frequencies changes. A chirp/THD sweep is physically incapable of measuring this inter-frequency interference because the low and high frequencies never meet.

About the spectral leakage you mentioned and the 'messy' graphs: modern multitone testing (like Klippel’s) uses non-integer related frequencies and synchronous sampling to ensure that every single distortion product lands in a bin where there is no stimulus. It isn't a soup but a precise map. When you see the grass or noise floor rise in a multitone test, that isn't leakage but integrated distortion power. While you prefer the 'clean' lines of a chirp, those lines are clean because they are ignoring the intermodulation noise that actually exists when a speaker is under the stress of a complex signal.
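
The synchronous-sampling part of that argument can be illustrated with a toy example. This is only a sketch with an assumed 64-sample record and a plain DFT, not Klippel's or anyone else's implementation: a tone that completes a whole number of cycles in the record lands in exactly one bin, while a tone that does not spreads into its neighbours.

```python
import cmath
import math

def dft_magnitudes(samples):
    """Single-sided DFT magnitudes (scaled by 1/N) with no window applied."""
    n = len(samples)
    return [
        abs(sum(s * cmath.exp(-2j * math.pi * k * i / n)
                for i, s in enumerate(samples))) / n
        for k in range(n // 2 + 1)
    ]

def tone(cycles, n):
    """Sine with `cycles` periods inside an n-sample record."""
    return [math.sin(2 * math.pi * cycles * i / n) for i in range(n)]

def worst_off_bin(mags, bin_index):
    """Largest magnitude found anywhere except the stimulus bin."""
    return max(m for k, m in enumerate(mags) if k != bin_index)

N = 64  # assumed record length, for illustration only
coherent = dft_magnitudes(tone(5, N))   # exactly 5 cycles: bin-centred
leaky = dft_magnitudes(tone(5.5, N))    # 5.5 cycles: falls between bins

print(worst_off_bin(coherent, 5))  # numerically zero
print(worst_off_bin(leaky, 5))     # a large fraction of the tone itself
```

Whether a particular published multitone plot was captured this way is a separate question that the sketch cannot answer.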

You asked how we see Doppler in multitone. While it’s true that FM produces Bessel spikes, in a loudspeaker, these appear as sidebands around the higher-frequency tones. In a THD sweep, there is no second tone to be modulated, so Doppler distortion is zero. In a multitone test, Doppler is excited and measured. If you believe Doppler and IMD are lost in the multitone, I’d suggest looking at IEC 60268-21, the international standard for loudspeaker measurements. It explicitly recommends multitone because it is the only way to measure the total distortion dmt that includes the IMD and FM components that a chirp ignores.

You are right that Fielder’s harmonic analysis is easier to map to filter banks. But a speaker doesn't just produce harmonics; it produces intermodulation noise that raises the floor in the masking valleys. If a designer uses only your sweep results, they might fix a resonance at 1.5 kHz (the sharp spikes you mentioned), but they will never see that their motor design is causing the midrange to smear every time a bass drum hits. One measurement finds the broken parts; the other finds the weak parts. We need both; THD alone is not sufficient for audibility.
 
In a multitone test, the low-frequency content keeps the coil at xmax while it tries to play 2 kHz.
Say what? The 20 Hz is not DC. It is pushing the driver through its excursion. And it is doing so at well below the level that a single tone does. This is just math.

And no, it is not "playing 2 kHz." It is seeing a combined voltage that is the sum of all of those tones.
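
The "just math" part can be sketched as follows, assuming N equal-amplitude tones and comparing against a single tone of either the same total RMS or the same worst-case peak. The tone count of 32 is a hypothetical value, not the count used in any measurement shown here.

```python
import math

def per_tone_level_db(n_tones, equal="rms"):
    """Level of each tone relative to a single tone, in dB.

    equal="rms":  the multitone has the same total RMS as the single tone.
    equal="peak": the multitone has the same peak as the single tone in the
                  worst case where all tone peaks line up.
    """
    if equal == "rms":
        return -10 * math.log10(n_tones)
    if equal == "peak":
        return -20 * math.log10(n_tones)
    raise ValueError("equal must be 'rms' or 'peak'")

n = 32  # hypothetical number of tones
print(round(per_tone_level_db(n, "rms"), 1))   # about -15 dB per tone
print(round(per_tone_level_db(n, "peak"), 1))  # about -30 dB per tone
```

Real multitone signals use chosen or randomized phases, so the per-tone level for a given peak sits somewhere between these two cases.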
In a multitone test, Doppler is excited and measured.
Show me. F6B's multitone was shown. Where do you see the doppler effect and its level???
 
What did you possibly learn from the IMD that is useful above the measurements above???
Maybe this?

IMD.+HD.png

EDIT: I amended the diagram
 
And what do you learn from that? How do you perform a perceptual analysis on it? Here are my HD measurements:

[Image]

Notice how explainable the results are. You know the harmonic order. You know the region of unhappiness from 200 Hz to 1 kHz.

What did you possibly learn from the IMD that is useful above the measurements above???
Have you ever tried to investigate more around these kinds of distortion patterns? High distortion in bass (expected, naturally), decreasing, but then increasing again.
 
Have you ever tried to investigate more around these kinds of distortion patterns? High distortion in bass (expected, naturally), decreasing, but then increasing again.
You can usually see that in the near-field response with the woofer output not filtered enough:

[Image]


Note the comment about "rather slow" meaning the filter slope is not high enough to truncate the messy output of the woofer. You can also see the port/cabinet output being quite distorted and messy. Indeed, I showed that separately:

[Image]
 
You can usually see that in the near-field response with the woofer output not filtered enough:

[Image]


Note the comment about "rather slow" meaning the filter slope is not high enough to truncate the messy output of the woofer.
But that is quite high f. While there may be higher distortion below pass-band due to break-up, these usually occur 1-3 kHz. Which can be dealt with notch filters, as e.g. noted in a Purifi paper. I note that the 200-500 Hz distortion peaks may have multiple causes other than driver itself, port problems, cabinet damping problems, wrong material/dimensions of cabinets. Some conventional 2-way speakers just don't have these "peaks", others seem to follow a pattern with the high distortion in bass, decreasing and then increasing again.

 
Maybe this?

View attachment 528626
EDIT: I amended the diagram
Commenting on the left side, that part is completely bogus as it is showing less distortion at 10 volts than it does at lower voltages!

[Image]


Notice where the green line is going. It is off by some 20 dB even if the distortion did not change! One advantage of CHIRP measurements is that they hugely increase signal to noise ratio. Such is not the case with multitone, hence the reason this area may be corrupted.

As to that slight hump, who knows what it is about. Careful, controlled experiments need to be conducted to clearly tease out what it is. For now, it looks just like the hump in my HD graph, but shifted up a bit:

[Image]


Look at the 96 dB response in mine.

But again, look at the clarity of my graphs. We clearly see the expected sharp rise in bass distortion below 100 Hz. And again, below 50 Hz. This is confirmed by audible distortion in that region. The smoothed and coarse multitone graph shows nothing of a diagnostic nature.
 
But again, look at the clarity of my graphs.
Solid explanations for the necessity of IMD measurements have been given. There is even a compelling example from Purifi including audio samples. Practical implementations of the measurement have been described, that are successfully used in product development and by other professional product reviewers.

As is the case with every measurement, IMD testing must be done right to give meaningful results. Erin's graphs are measured with a pink-noise-shaped multitone signal. As previously described, there are better options to cause large excursion in the bass, which is required to drive the speaker into the nonlinear BL range. Such measurements answer the question of how much bass a speaker can play while sounding clean in the midrange. That is important information.

Nevertheless, Erin's measurements do show a clear difference between a two-way and a three-way speaker. While the former is approaching 10% IMD at 800Hz, the latter is at 1%. 10% IMD is audible in my experience. And the pink-noise test-signal shape is not the worst possible signal for IMD.

As long as only single tone distortion is measured (a sweep with only one frequency at a time), we will not see what a two-way speaker does to the midrange while producing high THD in the bass. You are uncomfortable with the THD rise in bass, but IMD is actually much worse for audio quality. In most cases bass will drive the mid-woofer into nonlinear BL range and voices will become modulated by bass which causes excessive roughness that sounds really bad.

Today two- and three-way speakers are judged equally on ASR with respect to low-end distortion. It is ignored that distortion in bass becomes particularly relevant when the midrange is compromised. Woofer distortion consisting mainly of 2nd/3rd harmonics can even be seen as a matter of taste because it sounds like more bass. IMD in the midrange will be disliked by everybody. Distortion measurements should be able to reveal the main advantage of three-way loudspeakers. THD is not capable of doing that.

1777512199548.png
1777512219650.png
 
Solid explanations for the necessity of IMD measurements have been given. There is even a compelling example from Purifi including audio samples. Practical implementations of the measurement have been described, that are successfully used in product development and by other professional product reviewers.
Careful. The discussion here is about audible distortion. I have yet to see IMD tests dominating impairments in audible form with speakers. The kind of non-linearity we see is easily identified with CHIRP log sweeps. And this data lends itself to psychoacoustic analysis as I have explained multiple times. I also quoted peer-reviewed research that demonstrated IMD is not a concern in low bass frequencies.

But let's say we ignore all that. As I explained, a 2-tone IMD test is useful for driver engineering/testing. But not so in the context of a reviewer like me. Such tests need to have their tones adapted to the product. Hunting around for which tones to use is not practical (I have tried) and at any rate, results can't be compared to another speaker that needs different set of tones.
 
As is the case with every measurement, IMD testing must be done right to give meaningful results. Erin's graphs look like flat spectrum test signal to me.
He doesn't do IMD testing in the way you mean, i.e. two tones. He is running a multitone test such as you see in my electronics measurements, but in a different format. As I have explained, such a test throws everything in there, plus some measurement artifacts, making it impossible to tease out what is going on. The measurement is way too complex to apply any kind of psychoacoustic analysis to it. This is why I don't run it.
 
As long as only single tone distortion is measured (a sweep with only one frequency at a time), we will not see what a two-way speaker does to the midrange while producing high THD in the bass. You are uncomfortable with the THD rise in bass, but IMD is actually much worse for audio quality. In most cases bass will drive the mid-woofer into nonlinear BL range and voices will become modulated by bass which causes excessive roughness that sounds really bad.
Sorry, but no. The whole basis of figuring audibility of low frequency distortion is the fact that the non-linearity is so strong, that it creates high order harmonics that land where our hearing is most sensitive: 3 to 5 kHz. High pass filter a speaker and you often hear increased clarity in those ranges.

IMD in higher frequencies will only have an effect if you don't have these bass harmonics. Even then, the music levels are going to be far lower than bass, causing IMD components to be even lower than they already are. Artificial test tones at full amplitude do not represent the actual music use case. I can easily make a case for a speaker being driven to maximum capability in bass. You can't do that with a straight face for a 1 kHz upper tone in IMD.

See the results of the research just posted where IMD had no correlation with listening results.
 
Very interesting talks.
The first one was from Steve Temme of Listen Inc, he/they developed the SoundCheck software (that Sean Olive used for his testing).

He first went over the psychoacoustics of hearing as a starting point: noting that the ear is non-linear and that our hearing changes; noting that at low levels (i.e. very quiet room — i.e. anechoic chamber) the low frequency is dominated by the blood flowing through our ears/head and the high frequency reaches the limits of what the cochlear bones can mechanically do. And at higher SPLs human hearing frequency range flattens out, the '0 dB' being between 1 and 5 kHz in only very quiet places.

His first section was on how the ear will mask distortion or tones. His example of having two tones together at similar levels: e.g. 500Hz, 600Hz, with 500Hz having a lower level. Then if you put a 450Hz tone at a much higher level than the other tones, that the human ear will mask the 500Hz tone and you'll only hear the 600Hz tone. (Don't remember the exact numbers but, the masking did make differences in the perceived distortion and even volume.)
And since THD is usually measuring the 2nd or 3rd harmonic, with, say, a 1kHz tone the listener will effectively mask out the 2nd or 3rd harmonic; but since the human ear is more sensitive at higher frequencies, it will perceive the higher-frequency HD. And the psychoacoustics mean that the person will perceive the volume as louder. This is why, when you have a low-distortion source, you want to turn it up: you perceive it as quieter because you hear less distortion.

Next he talked about how standard THD is referenced to the fundamental frequency, and that when the HD is normalized (to the other harmonics) it shows lower distortion at the lower frequencies and higher (measured) distortion at the higher frequencies (this was also seen in Olive's testing). He had some great graphs to show the effect. He also showed loudspeaker measurements that illustrated it and played sound samples. One speaker had more low-frequency HD, right at the edge of human hearing (avoiding the 'masking effect'), and less high-frequency distortion, and it sounded OK. The other had high-frequency HD at levels above the 'masking effect', so you could hear more buzzing distortion, even though the low-frequency HD was higher in the first speaker.

So, he noted they needed to try and find measurements that did show differences (i.e. could measure the distortion), and that the human ear could detect, that accounted for psychoacoustics of differences in SPLs and the masking effect. That's in the last slide "Conclusions" that was posted above.

He also talked about noticing distortion when turning on ANC (active noise canceling) on a headphone, and at first thinking that the ANC was adding distortion to the headphones. They did some null-like tests to show the differences with some headphones they tested (recording music played with and without ANC). One interesting thing was that the ANC circuits would improve some distortion of the headphones, but the DSP (ANC processing) would also impart some artifacts. Also interesting: because of the active cancelling, it would many times need to boost the whole signal so it wasn't all canceled out. He found that POLQA (from the telecoms industry) did the best at measuring, or at least detecting, that the sound quality (when playing music) changed with ANC on. So it wasn't entirely distortion that was being introduced. POLQA is also more tuned to packet-based signals, so he was surprised that it worked, but PEAQ didn't seem to detect the time-domain distortion that was seen in the testing (i.e. the ANC DSP takes some time and has issues with transients, so it has to 'catch up' to the signals it's playing).

Sean Olive presented a talk on some testing he did of perception of distortion in headphones. He used the LDC-X as a 'lowest distortion' headphone for the tests. He then picked some headphones and recorded (with a B&K Type 5128-B) music played through each of them at levels from 80 to 110 dB, normalized the recordings to -16 LUFS, and played them back at ~83 dBA. This is the 'RTINGS study'. Listeners were tested with an ABX test (in software), one sample being the recording itself and the other being the version 'recorded through the headphones'. The test started at the highest dB playback and stepped down the dB for each correct answer. For incorrect answers it would step back up 6 dB and continue the test. He'll be presenting the full paper at the next AES in Copenhagen.
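
The stepping rule as I understood it can be sketched like this. The 6 dB step up and the 80 to 110 dB range are from the talk as described above; the 3 dB step down and the clamping at both ends are my assumptions, since the step-down size was not stated.

```python
def run_staircase(responses, start_db=110.0, floor_db=80.0,
                  step_down_db=3.0, step_up_db=6.0):
    """Return the level presented on each trial of an adaptive ABX run.

    responses: iterable of booleans, True for a correct ABX answer.
    step_down_db is a hypothetical value; step_up_db follows the talk.
    """
    level = start_db
    track = [level]
    for correct in responses:
        if correct:
            # Correct answer: make the task harder by lowering the level.
            level = max(floor_db, level - step_down_db)
        else:
            # Wrong answer: step back up, never above the starting level.
            level = min(start_db, level + step_up_db)
        track.append(level)
    return track

print(run_staircase([True, True, False, True]))
```

The level where such a track settles gives an estimate of the threshold at which the headphone's distortion stops being detectable.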

He noted that when first testing the ambient noise of the recordings was enough for the testers to perceive differences, so they had to go somewhere with better controlled sound.

He then correlated the results with various measurements: Non-Coherent Distortion (NCD), IMD normalized and non-normalized, etc., and found that NCD was better correlated with the perceived distortion. It was all very interesting, including how the two studies compared to each other. He did see odd effects from the Crinacle Red IEMs and other things; check the RTINGS thread for more. He also mentioned a paper, "Perception & Thresholds of Nonlinear Distortion using Complex Signals" (warning PDF! from Google Scholar), from Aalborg University by Eric Mario de Santis & Simon Henin, which did some interesting testing of other distortion types (e.g. square wave, etc.), and he would like to incorporate some of their findings into testing going forward. (I asked why he used an ABX test, and he noted that it gave a reference or control using the recording itself, so if a listener couldn't hear a difference they could throw out those data.)

An interesting mix of industry people: I talked with someone from Google who worked on the Pixel platform, and there were others (looking at name-tags) from Apple, Logitech, etc. But I think there were some others just there to hear the talk, like me. I should have asked if others from ASR would be there to meet up. I also talked with one of the other Listen employees, mostly about music and trying not to spend too much money on records on Discogs or audio equipment on Reverb. Someone in the audience asked 'are there really distortion measurements by reviewers?' and Sean noted that RTINGS, ASR (here), and others are doing the measurements now so "you don't have to buy bad headphones, the data is out there."

I even talked with Steve afterwards a bit, about how having music in one's life is always good. And with Sean, as we walked out to our cars, about how the human ears can lie to you and you need something to check yourself; you can't just "trust your ears".
 
View attachment 528792
The session is being recorded, but I am not sure if/when it will be posted somewhere.
View attachment 528794
What's interesting in the slides is that it mentions 104 to 112 dBA.
The distortion measurements Amir does, however, are in dBZ.
At 200Hz with music this can easily create a difference of 10dB on the meter.
This means that at 112dBA the actual signal on the driver might even be 120 to 122 dB (in the distortion plots).
This coincides with real world listening to music at loud levels where distortion sets in with headphones.

This is also why I am of the opinion that a headphone amp should be able to provide enough 'power' to drive a headphone to 120dB, to ensure that on rare occasions (high DR music at impressive level) it is easy to reach 120dB SPL.

This basically means that when looking at Amir's distortion plots at 114dB you might actually be listening at 104dBA when listening to music.
Still, this is 4 times as loud as 'normal' listening levels.

And this is even worse for bass notes (say around 60Hz) where the difference is 25dB or so.
Meaning that when looking at distortion measurements and relating those to real world music listening at 80dB average the actual SPL for bass notes can be 105dB in Amir's plots.
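
To put numbers on the dBA vs. dBZ gap, here is a small sketch of the standard A-weighting curve (the analytic form used in IEC 61672). It evaluates single frequencies only, which is an assumption: with real music the meter difference depends on the whole spectrum.

```python
import math

def a_weighting_db(f):
    """A-weighting in dB at frequency f in Hz (0 dB at 1 kHz)."""
    f2 = f * f
    ra = (12194.0 ** 2 * f2 * f2) / (
        (f2 + 20.6 ** 2)
        * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
        * (f2 + 12194.0 ** 2)
    )
    return 20 * math.log10(ra) + 2.0

# Difference between an unweighted (dBZ) and an A-weighted reading
# for a pure tone at each frequency.
for f in (60, 200, 1000):
    print(f, "Hz:", round(-a_weighting_db(f), 1), "dB")
```

For pure tones this gives roughly 11 dB at 200 Hz and roughly 27 dB at 60 Hz, in line with the "10dB" and "25dB or so" figures above.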

This is important to know when looking at distortion plots when thinking of real world issues.
When listening to music at loud levels (a dBA measurement would show around 100-104dB):

For 114dB measurements look at plots from 20Hz to 200Hz.
For 104 dB measurements look at plots from 200Hz to 4kHz.
The 94dB distortion plots are relevant from 4kHz to 20kHz.

When listening to music at studio reference levels, which is comfortably loud (a dBA measurement would show around 80dB), dBZ measurements could be around 90dB. In this case the 94dB plots are already relevant, and for low bass even the 104dB measurements. Note this is at 80dBA.

Look at 94dB measurements in the plots from 20Hz to 200Hz.
For frequencies > 200Hz the distortion will be lower than the 94dB measurements.

This is why it is important to measure 114dB (for frequencies below 200Hz) and pointless to look at higher frequencies in the 114dB plot.
The 104 dB plot is more telling for mids and the 94dB plot for the higher frequencies.
When listening to music that is.

When one only wants to listen at sensible levels the 90dB/94dB plots are the only ones that are relevant.

To put this in perspective: most headphones at 96dB measure much, much lower in distortion than most speakers at 96dB, which show distortion in the bass well over 10%.
 
you can't just "trust your ears".
You can't trust them as an analyzer/measurement/'sound quality assessment device' but fortunately you can trust them when it comes to music enjoyment. :)
 
Careful. The discussion here is about audible distortion. I have yet to see IMD tests dominating impairments in audible form with speakers. The kind of non-linearity we see is easily identified with CHIRP log sweeps. And this data lends itself to psychoacoustic analysis as I have explained multiple times. I also quoted peer-reviewed research that demonstrated IMD is not a concern in low bass frequencies.
Of course this whole discussion is about audible distortion. I'm not interested in solving inaudible problems.

Please note that I edited/corrected my post while you were answering. Erin uses a pink-noise-shaped multi-tone stimulus, which is already good but not the most revealing test. In the two-way vs. three-way measurement example from him, that I added, relevant differences in IMD are visible in the midrange.

An important difference in THD vs. IMD plots is that the former show harmonics at the test signal frequency, while they are actually located at multiples of the test frequency. For IMD, distortion products are shown at their actual frequency. The important part is that distortion products in the midrange can still be generated by bass tones.

While a 40Hz tone (f1) will barely show harmonics at 200Hz as k5 is usually relatively low, it will show modulation products with e. g. 260Hz (f2) located at 180/220Hz and 300/340Hz. Because auditory masking mainly works upwards in frequency, especially the modulation products below f2 will be very audible as they are also sufficiently remote from f1 and thus not masked.
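
The frequencies in that example can be reproduced with a few lines. This is only a sketch of where the products land (f2 ± n·f1); it says nothing about their levels, which depend on the driver.

```python
def imd_products(f1, f2, max_order=2):
    """Return (n, f2 - n*f1, f2 + n*f1) for n = 1..max_order."""
    return [(n, f2 - n * f1, f2 + n * f1) for n in range(1, max_order + 1)]

def harmonics(f1, count=5):
    """Return the first `count` multiples of f1, starting at 2*f1."""
    return [k * f1 for k in range(2, count + 1)]

for n, lower, upper in imd_products(40, 260):
    print(f"f2 +/- {n}*f1: {lower} Hz and {upper} Hz")
print("harmonics of f1:", harmonics(40))
```

This prints 220/300 Hz for the first-order pair and 180/340 Hz for the second-order pair, with the fifth harmonic (k5) of 40 Hz at 200 Hz sitting between 180 and 220 Hz.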
But let's say we ignore all that. As I explained, a 2-tone IMD test is useful for driver engineering/testing. But not so in the context of a reviewer like me. Such tests need to have their tones adapted to the product. Hunting around for which tones to use is not practical (I have tried) and at any rate, results can't be compared to another speaker that needs different set of tones.
I proposed a lower test frequency f1 at the lower cut-off frequency of the speaker (e. g. -3/6dB frequency or -10dB frequency x 2). This will show if the speaker designer made a sensible choice when choosing the lower cut-off frequency for the design. It makes no sense to test a large floorstander at 80Hz or a small bookshelf speaker at 20Hz. The simple rule proposed above will avoid such issues.

If the designer of a relatively small speaker decided for 30-40Hz lower cut-off, the results will strongly depend on the frequency distribution over speaker drivers. In such cases, the speaker should better be a three-way, otherwise midrange IMD may be high (still depends on BL linearity). Stepping up f1 in level will show IMD rising and give an idea how much bass can be played while enjoying good sound.

Now there are corner cases, for example a low-frequency limiter that avoids increase of bass output at some point. In this case the result still shows that not more bass than provided with the limiter is possible. I would always add IMD measurements with pink-noise-shaped stimulus for a second perspective.

BTW. for 2-tone IMD tests, psychoacoustic weighting with masking thresholds of f1/f2 is also possible. This would provide an even better perspective on the severity of the measured distortion products.

Finally, the whole "IMD in the bass is not important" conclusion results from wrong testing methods. Unless bass and midrange frequencies play at the same time in a fullrange speaker, results will not show audible problems.
 