
New Research on Audibility of Distortion in Headphones

Czerwinski et al. (AES, 2001) proved that a system with low THD can still have "enormous amounts" of intermodulation products. They found that difference-frequency products (e.g., f2 - f1) are the most audible because they are far from the original tones and thus not masked.

It is easy to poke holes in THD. I didn't ask about that. I asked where there is correlation between IMD and audible distortion. The paper we are discussing actually mentions there is no correlation.

Going by this reference anyway, I am familiar with his more recent J. AES paper, in which the authors discuss the multitone test that folks ask for:
Graphing, Interpretation, and Comparison of Results of Loudspeaker Nonlinear-Distortion Measurements

"As has been mentioned, the multitone stimulus, whose objective parameters, such as the probability density function, have similarity with a musical signal, seems to be a good candidate for a better testing signal. However, there is an important aspect of using multitone stimuli that should be considered here to be objective. Currently the interpretation of multitone test results does not have a well substantiated psychoacoustical support. So far we cannot derive precise judgments about the sound quality of a loudspeaker (that has been tested by a multitone stimulus) from the response to this signal."


He goes on to say this about dual-tone IMD, which I asked about:

"Since the dynamic reaction of a complex nonlinear system such as a loudspeaker cannot be extrapolated from its reaction to simple testing signals, such as a sweeping tone, the thresholds expressed in terms of the loudspeaker reaction to these signals (THD, harmonics, and two-tone intermodulation distortion) may not be valid."

In other words, two-tone IMD gets us not much closer than THD does.

Fielder's analysis based on harmonic distortion is the only psychoacoustically valid method for such assessments.
 


They are using a B&K 5128 to record the different headphones and playing the recordings back over Audeze LCD-X headphones. However, we know that in-ear headphone measurements on the 5128 show inconsistent artifacts, such as rocking-mode resonances caused by the physical interaction between the in-ear headphones and the measurement device, which cannot be explained by the difference between the mean human ear-canal impedance of 32 subjects and an IEC 60318-4 coupler.

This research provides further evidence that the B&K 5128 system is not superior to the IEC 60318-4 and extended standards such as Amir's GRAS 45CA with an RA040x coupler.

"The nature of IEMs, with their ear canal resonances that are specific to the test fixture and insertion depth, may also have been a contributing factor, assuming post-equalization cannot compensate for this fully. Non-coherent distortion captures more than just headphone distortion: it includes recording and environment-induced nonlinearities, which we cannot prove are intrinsic to the headphones."

I think ultimately this is a matter of testing for edge cases. If the preference for distortion is tested without edge-case material that would incite buzzing/rattling, then sure, it makes sense that listeners would have no preference for high vs. low distortion.
Unfortunately, it's a flawed test. "Fast Car" and "Spanish Harlem" only cause minor levels of IMD, which is mainly triggered by high levels of low bass. And on "Fast Car" the guitar sounds distorted anyway. Unless one only listens to this kind of music, the choice of test songs alone invalidates the conclusion.

A good test track for IMD is "Astronaut in the Ocean" by Masked Wolf. The combination of bass and voice pushes poor HP speakers to their limits. Recordings of this track made at various levels and played back at a lower listening level will reveal distortion easily.

Measuring IMD at 70 + 600 Hz also makes no sense, for the same reasons. The lower frequency should rather be at or below 40 Hz, since lower frequencies result in higher IMD. At least the IMD measurements and the test-song selection are consistent with regard to bass extension.
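
To see where the products of a two-tone test actually land, here is a minimal sketch (the helper name `im_products` is hypothetical) listing the second- and third-order mixed products. The order cut-off of 3 is an assumption; real drivers also produce higher orders. The low-order sidebands sit at f2 ± f1 and f2 ± 2·f1, so the bass tone sets how far they sit from the midrange tone. The sketch says nothing about their level, which in a real driver depends on excursion and therefore on how low f1 is.

```python
def im_products(f1, f2, max_order=3):
    """Frequencies of mixed products m*f1 + n*f2 with |m| + n <= max_order.

    Only products involving both tones are listed (m != 0, n >= 1), so pure
    harmonics of either tone are excluded; n is kept positive so each
    product appears once.
    """
    products = {}
    for n in range(1, max_order + 1):
        for m in range(-max_order, max_order + 1):
            if m != 0 and abs(m) + n <= max_order:
                products[(m, n)] = abs(m * f1 + n * f2)
    return products


# The 70 + 600 Hz pair discussed above versus a 40 Hz bass tone with the same 600 Hz tone
for f1, f2 in [(70, 600), (40, 600)]:
    print(f"f1={f1} Hz, f2={f2} Hz:")
    for (m, n), f in sorted(im_products(f1, f2).items(), key=lambda kv: kv[1]):
        print(f"  {m:+d}*f1 {n:+d}*f2 -> {f} Hz")
```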
 
And then ..... there was auditory masking....


and THD + IM measurements (at least for electronics)

can also be done on speakers
(picture from Klippel https://www.klippel.de/products/rd-system/modules/mton-multi-tone-measurement.html)
And from Amir:
That's the kind of test I would like to see with loudspeakers, although the multitone spectrum requires some shaping to make it more representative of music.
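
A spectrally shaped multitone of the kind suggested here could be generated roughly as follows. This is a minimal sketch, not Klippel's or anyone's actual stimulus: the function name, the log spacing, the random phases and especially the -3 dB/octave tilt (a pink-like slope used as a stand-in for an average music spectrum) are all assumptions.

```python
import numpy as np


def shaped_multitone(fs=48000, duration=1.0, n_tones=30, f_lo=20.0, f_hi=20000.0,
                     slope_db_per_octave=-3.0, seed=0):
    """Log-spaced multitone with a spectral tilt and random phases.

    The tilt is an assumed stand-in for a music-like spectrum. Tones are
    snapped to FFT bins of the full-length signal so the signal is periodic
    in the analysis window; duplicates after snapping are dropped.
    """
    n = int(fs * duration)
    freqs = np.geomspace(f_lo, f_hi, n_tones)
    bins = np.unique(np.round(freqs * n / fs).astype(int))
    freqs = bins * fs / n
    gains_db = slope_db_per_octave * np.log2(freqs / freqs[0])  # tilt relative to lowest tone
    amps = 10 ** (gains_db / 20)
    rng = np.random.default_rng(seed)
    phases = rng.uniform(0, 2 * np.pi, len(freqs))  # random phases keep the crest factor moderate
    t = np.arange(n) / fs
    x = (amps[:, None] * np.cos(2 * np.pi * freqs[:, None] * t + phases[:, None])).sum(axis=0)
    return x / np.max(np.abs(x)), freqs  # normalised to a peak of 1


x, f = shaped_multitone()
crest_db = 20 * np.log10(np.max(np.abs(x)) / np.sqrt(np.mean(x**2)))
print(len(f), "tones, crest factor", round(crest_db, 1), "dB")
```

The random phases are a design choice: with all tones in phase the peak would be far higher for the same RMS, which would limit the usable test level.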
 
Both sound distorted. The nature of the distortion becomes clear by listening, not by looking at those two IMD charts. In any case, that is not the kind of IMD chart you are asking about. The choice of the two tones depends heavily on the speaker: for one driver you could pick such a pair, but not universally for any and all speakers. I know; I have tried.
Both may sound distorted, but one is still very close to the reference while the other is completely unacceptable. THD cannot show this, while IMD does.
 
My job is to move the industry toward measuring and eliminating barriers to full transparency
We should all be very thankful for that. This site, which exists only because of your relentless pursuit of improving our audio experience, is a gem.
Many here have lost track because, at the end of the day, no one needs a "perfect", distortion-free device of any kind to enjoy music at home; also, most products we have access to are already more or less foolproof for the task and have been for decades.
Knowing, thanks to your work, that almost any DAC available at any price is not the issue is just a blessing, as one can instead concentrate on something else if not happy with what one hears. Knowing which DAC is closest to SOTA is also interesting, but for different reasons; it is almost a different hobby.
Transducers (headphones and speakers) are the last frontier where "full transparency" still needs some work, even if today's devices perform totally adequately for the task of listening to music.
Thank you again for sharing your years of experience and irreplaceable expertise, shining some light on this very important part of our lives: listening to music.
 
This begs the question: from which audio measurement can we derive precise judgments about the sound quality of a loudspeaker? I'm not aware of such a measurement. The best measurements correlate with perceived sound quality but do not provide a precise judgment. There are far too many variables involved to claim any level of precision.

I fully agree that we need distortion measurements that show a scientifically proven correlation with human perception of sound quality. I also think these should have existed long ago, but unfortunately they don't. So, to this day, everyone in need of a usable distortion metric must come up with a working solution for their specific requirements. If we wait for science or a measurement-equipment manufacturer to establish a standard, we might have to wait a long time.

For me this means that, besides a bunch of other metrics, my full-range drivers are measured with 2-tone burst signals and analyzed for IMD. Specifically, I use f1 = 20 Hz to 1 kHz (stepped frequency) and f2 = 8.5 x f1, with f1 at a 20 dB higher level than f2. Levels are stepped over e.g. 20 dB in 1 dB increments. The reason is that I want to know how much bass the speaker can generate before exhibiting excessive IMD in the midrange. The results are further condensed into a max-SPL-versus-frequency graph for a given IMD limit (e.g. 10%). Effectively, maximum SPL in the bass is therefore defined by midrange IMD.
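
The stepped two-tone procedure described above can be sketched in a few lines. This is not the author's MATLAB code; it is a minimal Python sketch under stated assumptions: the "speaker" is a hypothetical memoryless polynomial (`toy_speaker`, with made-up coefficients) standing in for Bl/Kms nonlinearity, levels are relative to an arbitrary full scale rather than calibrated SPL, and IMD is taken as the RMS sum of the f2 ± f1 and f2 ± 2·f1 sidebands relative to the f2 component (one common definition; others exist).

```python
import numpy as np

FS = 48000
N = 2 * FS  # 2 s -> 0.5 Hz bins, so f2 = 8.5 * f1 is bin-centred for integer f1


def toy_speaker(x, c2=0.05, c3=0.1):
    """Hypothetical memoryless nonlinearity standing in for Bl(x)/Kms(x) effects."""
    return x + c2 * x**2 + c3 * x**3


def amp_at(spectrum, f):
    """Peak amplitude of a bin-centred cosine at frequency f (rectangular window)."""
    return 2 * np.abs(spectrum[int(round(f * N / FS))]) / N


def imd_percent(f1, level_db, system=toy_speaker):
    f2 = 8.5 * f1
    a1 = 10 ** (level_db / 20)  # bass tone, relative to an arbitrary full scale
    a2 = a1 / 10                # midrange tone 20 dB below the bass tone
    t = np.arange(N) / FS
    y = system(a1 * np.cos(2 * np.pi * f1 * t) + a2 * np.cos(2 * np.pi * f2 * t))
    spec = np.fft.rfft(y)
    sidebands = [amp_at(spec, f2 + k * f1) for k in (-2, -1, 1, 2)]
    return 100 * np.sqrt(np.sum(np.square(sidebands))) / amp_at(spec, f2)


def max_level_for_limit(f1, levels_db, limit_pct=10.0):
    """Highest stepped level whose IMD stays at or below the limit (None if none do)."""
    ok = [L for L in levels_db if imd_percent(f1, L) <= limit_pct]
    return max(ok) if ok else None


levels = np.arange(-20, 1, 1.0)  # 21 steps over 20 dB in 1 dB increments
for f1 in (20, 40, 100):
    print(f"f1={f1} Hz: max level for 10% IMD = {max_level_for_limit(f1, levels)} dB")
```

Because the toy system has no memory, every f1 gives the same limit here. With a real driver, excursion (and therefore IMD) grows toward low frequencies, which is what turns the per-f1 results into a meaningful max-SPL-versus-frequency curve.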

Speakers with lower IMD/higher maximum SPL in this test sound cleaner when playing bass + midrange than speakers with higher IMD. No blind/ABX testing is required, as my employer is only interested in obvious improvements that normal consumers will appreciate (e.g. 6 dB more bass while the midrange stays reasonably clean). This measurement generally works well in practical product development because it tracks meaningful improvements that correlate with what I and others hear.

While I also measure THD and other noises that the speakers may generate (the latter including psychoacoustic weighting to evaluate audibility), IMD has become the most important distortion metric for sound quality. It is basically an indirect measurement of Bl linearity: speakers with low IMD have more linear Bl curves (which are also measured during speaker development).

If we want to assess the maximum output of a band-limited speaker (e.g. a subwoofer), HD measurements may be sufficient, and some psychoacoustic weighting would probably be a good idea to put distortion into a perceptually better-correlated perspective. Using masking curves as the reference level for harmonics seems feasible, although there are various masking curves and the best one needs to be evaluated first. Some noise measurements for port chuffing etc. would also be nice.

For full-range speakers (single or multi-way, headphone or speaker box), maximum output can only be derived from measurements that include IMD (HD should obviously still be measured). Excluding IMD leads to wrong conclusions, as HD in the bass is not a sufficient metric for the sound quality and maximum SPL of a full-range loudspeaker. Failing to understand this leads to wrong comparisons of HD data between, e.g., 2- and 3-way speakers. 2-way designs are much more restricted in HD because high levels of HD in the bass indicate a risk of high IMD in the midrange. Whether the speaker really shows high IMD can only be measured. In 3-way speakers, IMD in the mid-woofer will not be caused by low bass, so HD limits are much more relaxed, independent of the distortion mechanism (Kms/Bl).

The advantage of 2-tone IMD measurements is that we clearly know which distortion components were generated by which test frequencies. Multitone IMD measurements lack this clear assignment, which limits the insights that can be gained from such tests. If the focus is on a certain frequency range (e.g. bass), only 2-tone measurements allow deep insights. Nevertheless, I would also like to see more multitone measurements of loudspeakers, preferably with spectrally shaped tone levels that resemble average music spectra.
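
The assignment problem can be made concrete with pure bookkeeping. A minimal sketch, counting only second-order sum and difference products and ignoring harmonics and higher orders (a simplification): with two tones every product traces back to exactly one pair, while an equally spaced four-tone set already has several product frequencies fed by more than one pair. The equally spaced set is deliberately worst-case; practical multitone stimuli are usually designed to reduce such overlaps. The function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def second_order_sources(tones):
    """Map every 2nd-order product frequency (fb - fa and fa + fb) to the tone pairs producing it."""
    sources = defaultdict(list)
    for fa, fb in combinations(sorted(tones), 2):
        sources[fb - fa].append((fa, fb))
        sources[fa + fb].append((fa, fb))
    return dict(sources)


def ambiguous(tones):
    """Product frequencies that more than one tone pair lands on."""
    return {f: pairs for f, pairs in second_order_sources(tones).items() if len(pairs) > 1}


print(ambiguous([40, 340]))                     # {}
print(sorted(ambiguous([100, 200, 300, 400])))  # [100, 200, 300, 500]
```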

Klippel provides 2-tone and multitone IMD measurements, and their systems can remove reflections from distortion measurements. The technology is there; it's merely a question of competent application to get the right data and interpret it correctly. Personally, I use MATLAB scripts for all sorts of acoustic measurements, which is more flexible and much lower in cost.

This is not meant as a request for you, Amirm, to do any additional measurements. Klippel software modules are costly, the measurements may be time-consuming, and I am by no means entitled to ask you for anything. My only proposal would be to keep the complications of HD in mind when judging speakers. HD measurements simply don't support a clear judgment of sound quality. And measurement plots can evoke strong visual bias, especially in engineers.
 
You make a fair point regarding the lack of a standardized, "one-number" psychoacoustic rating system for multitone distortion (MD). However, there is a critical distinction between the difficulty of grading a result and the audibility of the artifacts themselves.
While the Voishvillo/Czerwinski paper admits we lack a "well-substantiated psychoacoustical support" to derive a precise sound quality score, that is an admission of a metric limitation, not a claim that the distortion is inaudible. In fact, the physical reality of the multitone stimulus is that it uncovers intermodulation (IMD) and frequency modulation (Doppler) products that fall into the "spectral valleys" of a signal.

Here is why relying solely on Fielder’s harmonic-based analysis is problematic for loudspeakers:

Fielder’s models rely heavily on the ear's masking curves. Harmonic distortion (THD) is, by definition, harmonically related to the fundamental, meaning it is most likely to be hidden by the primary signal. Multitone testing produces non-harmonic products that appear in frequency regions where there is zero masking protection. Even if we don't have a formula to "rank" the annoyance yet, these artifacts are physically present in the gaps where Fielder’s model would predict silence.
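
Whether a given product lands "where there is zero masking protection" can be checked roughly with critical bands. A minimal sketch, assuming that sharing a critical band with a tone is a usable proxy for being masked; this ignores level and the upward spread of masking into neighbouring bands, so it is only a coarse screen. The band edges are Zwicker's classic critical-band table; the function names are hypothetical. A subwoofer-style pair (the case Fielder's quote refers to) keeps its sum and difference products in the first band, while two closely spaced high tones put their difference product far from both.

```python
from bisect import bisect_right

# Upper edges (Hz) of Zwicker's critical bands; everything above 15.5 kHz is lumped into one band.
BARK_UPPER_EDGES = [100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270, 1480, 1720,
                    2000, 2320, 2700, 3150, 3700, 4400, 5300, 6400, 7700, 9500, 12000, 15500]


def critical_band(f):
    """Index of the critical band containing frequency f (0 = below 100 Hz)."""
    return bisect_right(BARK_UPPER_EDGES, f)


def outside_masker_bands(products, tones):
    """Products that share a critical band with none of the tones (coarse masking proxy)."""
    tone_bands = {critical_band(f) for f in tones}
    return [p for p in products if critical_band(p) not in tone_bands]


# Subwoofer-style pair: difference (20 Hz) and sum (80 Hz) stay in the maskers' band
print(outside_masker_bands([20, 80], [30, 50]))      # []
# Closely spaced high tones: the 1 kHz difference product lands far from both maskers
print(outside_masker_bands([1000], [19000, 20000]))  # [1000]
```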

Loudspeakers are unique compared to the digital systems Fielder studied. A woofer’s excursion modulates the magnetic field and inductance for higher frequencies. This creates a "muddy" noise floor that is clearly audible as a loss of clarity, even if it doesn't show up as a spike on a THD graph.

In short, THD is "psychoacoustically valid" only within the narrow scope of harmonic masking. If a speaker sounds "congested" or "smeared" during complex passages despite having low THD, MD is the tool that shows you why—even if we are still perfecting the "scorecard" for it. Utilizing Fielder as a shield against IMD testing ignores the very non-harmonic artifacts that human hearing is most sensitive to.
 
"Average sound levels in a symphony orchestra pit also typically exceed 85 dB(A) during rehearsals and performances [117,118,119,120,121]. While orchestral musicians do not play together for 40 h per week year-round, about 50% exceed the weekly NIOSH REL during solitary practice alone [122]. Nevertheless, one early study of Swedish orchestra musicians by Karlsson et al. [123] reported no evidence of NIHL and concluded that “sound exposure criteria for industrial noise are not valid when discussing such sounds as are produced by acoustic instruments in a symphonic environment”. "

There is no question that some exposure to loud music is bad, really bad. Night clubs and rock concerts can be deafening, again due to treble frequencies, not deep bass. And duration. I put hearing protection on when I go to such places due to sheer discomfort. I do not feel the same at all, playing high dynamic range music with deep and strong bass. Use proper judgement and listen to your body and you will be fine in such usage in moderation.

Net, net, your categorical statement is wrong and fear-mongering. Any research that cites noise standards is likewise faulty, as your reference admits in multiple places.
And yet, "violinist ear" is real. And it's not only violinists. The study you referenced is behind a paywall, but the synopsis contradicts what my audiologist told me, and the experience of professional musician friends (violinists, woodwinds, brass, etc.) who have impaired hearing. Professional musicians do commonly get impaired hearing due to exposure to music, even unamplified. This is well known among audiologists, for example this article on an ENT audiologist site.

This hearing impairment can be from playing in groups - playing piccolo in the band, when the trumpets kick in right behind me I measure 120 dB SPL at my head. I can't stand it and have musician earplugs. Even playing solo, my piccolo produces a measured 108 dB SPL at my head. The flute is almost as loud. I can play the alto flute without earplugs but I wear them for flute and piccolo, even playing solo. It's downright painful otherwise. It's not uncommon for musicians to practice several hours daily and that extended exposure lowers the damaging thresholds.
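
The NIOSH REL cited in the quoted paper makes the effect of such levels easy to quantify: 85 dBA for 8 hours, with the allowed time halved for every 3 dB above that. A minimal sketch follows; note that the levels in the post are unweighted dB SPL, so treating them as dBA is an assumption, and the function names are hypothetical.

```python
def allowed_hours(level_dba, criterion=85.0, exchange_rate=3.0):
    """NIOSH REL: 8 h at 85 dBA, halved for every 3 dB above that."""
    return 8.0 / 2 ** ((level_dba - criterion) / exchange_rate)


def daily_dose_percent(exposures):
    """Noise dose as % of the REL for a list of (level_dBA, hours) exposures."""
    return 100.0 * sum(hours / allowed_hours(level) for level, hours in exposures)


for level in (85, 100, 108, 120):
    print(f"{level} dBA: {allowed_hours(level) * 60:.1f} min allowed per day")

print(daily_dose_percent([(100, 2.0)]))  # two hours of practice at 100 dBA
```

Under these assumptions, 108 dB at the head uses up the whole daily allowance in under two and a half minutes, which illustrates why extended daily practice matters.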

As to this paper, it is a meta-analysis and in a completely different domain. Yes, hearing loss is an occupational hazard for musicians, as Dr. Toole is fond of saying. But we are not musicians. We are not sitting next to a drum playing above levels that most home systems can produce, day in and day out. ...
Ah, I just saw this response. Your position is more nuanced than the earlier comment suggests.
 
Fielder’s models rely heavily on the ear's masking curves. Harmonic distortion (THD) is, by definition, harmonically related to the fundamental, meaning it is most likely to be hidden by the primary signal. Multitone testing produces non-harmonic products that appear in frequency regions where there is zero masking protection. Even if we don't have a formula to "rank" the annoyance yet, these artifacts are physically present in the gaps where Fielder’s model would predict silence.
That is the opposite of what is going on. Let me explain some background.

Multitone was created not as a new way to discover the audibility of distortion but to speed up testing of products on time-sensitive manufacturing lines. Traditional sweeps are slow and require multiple runs for different measurements. A multitone test, on the other hand, is very quick and generates not only distortion results but also frequency response.

But the above comes at a cost. To get the same SPL as a single tone, you have to spread the energy across all the tones used in the multitone. For this reason, distortion seen in multitone is lower than what you see in a THD analysis. You can see this, for example, in amplifier measurements where I show both the distortion and multitone results.
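
The energy-spreading argument is simple arithmetic. A minimal sketch, assuming N equal-amplitude tones: matching total RMS (same SPL) puts each tone 10·log10(N) dB below the single sine, while matching the worst-case peak (all tones in phase) costs 20·log10(N) dB. With random phases the peak-matched figure typically falls between the two. The function name is hypothetical.

```python
import math


def per_tone_level_db(n_tones, same="rms"):
    """Level of each equal-amplitude tone relative to a single sine.

    same="rms":  total power matched (same SPL), each tone gets 1/N of the power.
    same="peak": worst-case peak matched (all tones in phase), each tone gets 1/N of the amplitude.
    """
    if same == "rms":
        return -10 * math.log10(n_tones)
    if same == "peak":
        return -20 * math.log10(n_tones)
    raise ValueError("same must be 'rms' or 'peak'")


for n in (1, 2, 10, 20, 31):
    print(n, round(per_tone_level_db(n), 2), round(per_tone_level_db(n, "peak"), 2))
```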

Further, the intermodulation products are many and hence, once again, divide the distortion into smaller components. This means that if you apply psychoacoustic theory to them, they will be deemed inaudible. From Fielder's paper:


"The previous section has concerned itself only with the analysis of harmonic distortion and has ignored intermodulation (IM) effects. This simplification is acceptable because IM produces products that are less or equally audible compared with the harmonics of sine waves. This is true because difference IM products for subwoofers lie in the first critical band with the masking signals and have their presence masked. The sum products are no more audible than the harmonics of a single sine wave with equivalent level and average frequency because they are less than or equal in frequency to the appropriate harmonic and are at a lower level since IM divides the amplitude between components."
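
The amplitude-division argument in the quote can be checked numerically. A minimal sketch, assuming a pure second-order toy nonlinearity y = x + C2·x² (the coefficient is made up): a single 1 kHz tone of amplitude A gives a second harmonic of C2·A²/2, while two tones of amplitude b give sum and difference products of C2·b² each. At matched peak (b = A/2) each IM product is 6 dB below the single-tone harmonic; at matched RMS (b = A/√2) it comes out equal. Whether IM is "less or equally audible" thus depends on what "equivalent level" is taken to mean.

```python
import numpy as np

FS = 48000
N = FS  # 1 s of signal -> 1 Hz bins, so integer test frequencies are bin-centred
t = np.arange(N) / FS
C2 = 0.1  # hypothetical second-order coefficient of the toy nonlinearity


def amplitude(y, f):
    """Peak amplitude of the component at integer frequency f (bin index == Hz here)."""
    return 2 * abs(np.fft.rfft(y)[f]) / N


def through_toy(x):
    return x + C2 * x**2


A = 1.0
single = A * np.cos(2 * np.pi * 1000 * t)
hd2 = amplitude(through_toy(single), 2000)  # theory: C2 * A**2 / 2


def two_tone_im(b):
    """Difference (300 Hz) and sum (2300 Hz) products for two tones of amplitude b."""
    x = b * (np.cos(2 * np.pi * 1000 * t) + np.cos(2 * np.pi * 1300 * t))
    y = through_toy(x)
    return amplitude(y, 300), amplitude(y, 2300)  # theory: C2 * b**2 each


im_same_peak = two_tone_im(A / 2)          # two tones sharing the single tone's peak
im_same_rms = two_tone_im(A / np.sqrt(2))  # two tones sharing the single tone's power
print(f"HD2 of single tone: {hd2:.4f}")
print(f"IM at same peak:    {im_same_peak[0]:.4f}")
print(f"IM at same RMS:     {im_same_rms[0]:.4f}")
```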

Multitone testing was made obsolete by the advent of chirp signals, which are the modern way of testing transducers and rooms, and what I and others use for headphone and speaker testing. A chirp is a continuous signal from which many other components just "fall out", making it ideal for fast tests while also yielding granular distortion products. Look at the clarity it provided in the last test I ran:


There is no need to perform any analysis. Something is seriously wrong here, and the source is known. The designer could have used this measurement to eliminate or reduce the impact. We have no need for any other measurement.

If we wanted to, we could look at the harmonic order and assess audibility. Our hearing has radically different masking/thresholds for each frequency band. That makes comparing distortion against audibility vastly simpler and feasible. No such thing exists in multitone, where all the distortion products are mixed together.

To emphasize: whatever nonlinearity causes IMD also causes HD. A totally linear product has neither.
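
For a simple case this is easy to verify. A minimal sketch, assuming a hypothetical memoryless cubic nonlinearity y = x + c3·x³ (real transducers have memory and frequency-dependent behaviour, so this illustrates the principle rather than proving it for every system): the same coefficient c3 produces both the third harmonic of a single tone (c3·A³/4) and the 2·f1 - f2 product of a tone pair (3·c3·a1²·a2/4), and setting it to zero removes both.

```python
import numpy as np

FS = 48000
N = FS  # 1 Hz bins; all test frequencies are integers, so bin index == frequency in Hz
t = np.arange(N) / FS


def amplitude(y, f):
    return 2 * abs(np.fft.rfft(y)[f]) / N


def hd3_and_im3(c3):
    """HD3 of a 1 kHz tone and IM3 (2*f1 - f2) of a 1 kHz + 1.1 kHz pair for y = x + c3*x**3."""
    def system(x):  # hypothetical memoryless cubic nonlinearity
        return x + c3 * x**3

    one = np.cos(2 * np.pi * 1000 * t)
    two = 0.5 * (np.cos(2 * np.pi * 1000 * t) + np.cos(2 * np.pi * 1100 * t))
    return amplitude(system(one), 3000), amplitude(system(two), 900)


for c3 in (0.01, 0.0):
    hd3, im3 = hd3_and_im3(c3)
    print(f"c3={c3}: HD3={hd3:.2e}, IM3={im3:.2e}")
```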

Loudspeakers are unique compared to the digital systems Fielder studied.
Fielder's specialty is psychoacoustics of audio signals. Nothing there is specific to "digital" signals. He has written some of the most authoritative papers on this front and covers both speakers and headphones. This is his bio:

"Louis Fielder received a BS. degree in electrical engineering from the California Institute of Technology in 1974 and an MS. degree in acoustics from the University of California in Los Angeles in 1976.

During the period between 1976– 1978 he worked on electronic component design for custom sound‑reinforcement systems at Paul Veneklasen and Associates. From 1978 – 1984 he was involved in digital-audio and magnetic recording research at the Ampex Corporation. At that time he became interested in applying psychoacoustics to the design and analysis of digital-audio conversion systems. From 1984–2018 he has worked at Dolby Laboratories on the application of psychoacoustics to the development of audio systems and on the development of a number of bit-rate reduction audio coders for music distribution, transmission, and storage applications, i.e. AC-1, AC-2, AC-3, Enhanced AC-3, AAC, and Dolby E. Additionally, he has investigated perceptually derived limits for the performance for digital‑audio conversion, low-frequency loudspeaker systems, distortion limits for loudspeakers/headphones, loudspeaker-room equalization, and headphone virtualization. He managed the Sound Technology Research Department at Dolby Laboratories in San Francisco from 2005 – 2009.

Louis Fielder is a life fellow of the AES, a recipient of the AES Silver Medal, a senior life member of the IEEE, a life member of the SMPTE, and an emeritus member of the Acoustical Society of America. He was on the AES Board of Governors during 1990–1992, President during 1994 – 1995, and Treasurer 2005 – 2009."


If a speaker sounds "congested" or "smeared" during complex passages despite having low THD, MD is the tool that shows you why—even if we are still perfecting the "scorecard" for it.
And where is the reference or research for this? Remember, the corpus of music out there is wildly different from a multitone with equal-loudness tones. The "grass" you see is not what you will see with a real music spectrum. So don't assume that you see the same with music and that it is hence "congested."
 
Not remotely so. Bookshelf speakers heavily distort with music content in 20 to 30 Hz even at modest playback levels.
I'm a huge appreciator of your hard work and the graphs you make available to us for free. I have questions about the volume levels at which you test. Most of us with speakers listen at around 70-75 dB in-room. My in-room LAeq for daytime listening with two speakers is usually not above 75 dB. A graph of distortion at 76 dB would be more accurate for my personal daily use than one at 96 dB, which is only reached for very brief transients.
 
A graph for distortion at 76 dB would be more accurate for my own uses than one at 96 dB, which is only reached for very brief transients.
I have gone down to 81 dBSPL now. If I keep going lower, the room noise interferes with measurements. 81 dBSPL @1 meter is pretty quiet.
 
Do you test single tones as well, looking at the full spectrum of e.g. 310 Hz?
 
That is the opposite of what is going on. Let me explain some background.

Multitone was created not as a new way to discover audibility of distortion but to speed testing of products in time sensitive manufacturing line. Traditional sweeps are slow and require multiple ones for different measurements. Multitone test on the hand, is very quick and generates not only distortion results but also frequency response.

But the above comes at a cost. To get the same SPL as a single tone, you would have to spread the energy across the number of tones you use for multitone. For this reason, distortion seen in multitone is lower than what you see in a THD analysis. You can see that in for example, amplifier measurements where I show the distortion and multitone results.

Further, the intermodulation products are numerous, which once again divides the distortion into smaller components. This means that if you apply psychoacoustic theory to them, they will be deemed inaudible. From Fielder's paper:


"The previous section has concerned itself only with the analysis of harmonic distortion and has ignored intermodulation (IM) effects. This simplification is acceptable because IM produces products that are less or equally audible compared with the harmonics of sine waves. This is true because difference IM products for subwoofers lie in the first critical band with the masking signals and have their presence masked. The sum products are no more audible than the harmonics of a single sine wave with equivalent level and average frequency because they are less than or equal in frequency to the appropriate harmonic and are at a lower level since IM divides the amplitude between components."

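To see how quickly the intermodulation products multiply, here is a minimal sketch that lists only the second-order sum and difference frequencies (f_i + f_j and |f_i - f_j|) for a set of tones. Harmonics, third- and higher-order products, and amplitudes are deliberately left out, and the tone frequencies are arbitrary examples.

```python
from itertools import combinations

def second_order_im(tones_hz):
    """Distinct second-order IM frequencies (sum and difference) for each pair i < j.

    Harmonics (2 * f_i) are excluded on purpose; coincident products are merged.
    """
    products = set()
    for fa, fb in combinations(tones_hz, 2):
        products.add(fa + fb)
        products.add(abs(fa - fb))
    return sorted(products)

print(second_order_im([50, 1000]))        # two tones
print(second_order_im([100, 170, 310]))   # three tones
```

With N tones at non-coincident frequencies there are up to N(N-1) second-order products alone, so a 30-tone signal can produce up to 870 of them, each carrying a small share of the distortion energy. Octave-spaced tones such as 100, 200, 400, 800 Hz produce coincidences (some products land on the tones themselves), which is one reason multitone signals usually avoid simple frequency ratios.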
Multitone testing was made obsolete by the advent of CHIRP signals, which are the modern way of testing transducers and rooms, and what I and others use for headphone and speaker testing. A chirp is a continuous signal from which many other components just "fall out," making it ideal for fast testing while still resolving the distortion products in fine detail. Look at the clarity it provided in the last test I ran:


There is no need to perform any further analysis. Something is seriously wrong here, and the source is known. The designer could have used this measurement to eliminate or reduce the problem. We have no need for any other measurement.
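Why distortion components "fall out" of a chirp can be sketched for one common kind, the exponential (logarithmic) sine sweep described by Farina; that this is the exact chirp used in the measurements here is an assumption. After deconvolution, the impulse response of the k-th harmonic appears ahead of the linear response by a fixed time, L·ln(k) with L = T/ln(f2/f1), so each harmonic order can be windowed out separately.

```python
import math

def harmonic_advance_s(k: int, sweep_s: float, f_start_hz: float, f_stop_hz: float) -> float:
    """Time by which the k-th harmonic's impulse response precedes the linear one
    after deconvolving an exponential sine sweep: L * ln(k), with L = T / ln(f2/f1)."""
    rate_const = sweep_s / math.log(f_stop_hz / f_start_hz)
    return rate_const * math.log(k)

# Hypothetical sweep parameters: 1 s, 20 Hz to 20 kHz.
for k in range(2, 6):
    print(f"H{k}: {1000 * harmonic_advance_s(k, 1.0, 20.0, 20000.0):.1f} ms earlier")
```

Because the per-order responses are separated in time, each can be gated and its level read against frequency, which is one reason a single sweep can yield per-harmonic distortion curves.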

If we wanted to, we could look at the harmonic order and assess audibility. Our hearing has radically different masking thresholds in each frequency band, and knowing the harmonic order makes comparing distortion against audibility both possible and vastly simpler. That is not possible with multitone, where all the distortion products are mixed together.

To emphasize: whatever nonlinearity causes IMD also causes HD. A totally linear product has neither.
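For a memoryless (static) nonlinearity, this link can be checked numerically. The sketch below assumes a hypothetical polynomial y = x + a2·x² with a2 = 0.01 and two equal tones (amplitude 0.5) at 100 Hz and 1300 Hz, and reads product amplitudes with a single-bin DFT. The same coefficient a2 produces the second harmonics (a2·A²/2) and the sum and difference products (a2·A1·A2), and both vanish when a2 = 0. This static case does not capture position- or time-dependent behavior.

```python
import cmath
import math

def tone_amplitude(signal, freq_hz, fs_hz):
    """Peak amplitude at one frequency via a single-bin DFT.

    Assumes freq_hz completes an integer number of cycles in the record,
    so there is no spectral leakage between the components checked here.
    """
    acc = sum(x * cmath.exp(-2j * math.pi * freq_hz * i / fs_hz)
              for i, x in enumerate(signal))
    return 2.0 * abs(acc) / len(signal)

def static_nonlinearity(x, a2=0.01):
    """Hypothetical memoryless polynomial: y = x + a2 * x**2."""
    return x + a2 * x * x

fs = 8000                       # sample rate, Hz
f1, f2 = 100, 1300              # two equal tones, amplitude 0.5 each
x = [0.5 * math.cos(2 * math.pi * f1 * n / fs) + 0.5 * math.cos(2 * math.pi * f2 * n / fs)
     for n in range(fs)]        # 1 s record: every product below completes whole cycles
y = [static_nonlinearity(v) for v in x]

for label, f in [("2*f1 (HD2)", 2 * f1), ("2*f2 (HD2)", 2 * f2),
                 ("f2-f1 (IMD)", f2 - f1), ("f2+f1 (IMD)", f2 + f1)]:
    print(f"{label:12s} {f:5d} Hz: {tone_amplitude(y, f, fs):.6f}")
```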


Fielder's specialty is the psychoacoustics of audio signals. Nothing there is specific to "digital" signals. He has written some of the most authoritative papers on this front, covering both speakers and headphones. This is his bio:

"Louis Fielder received a BS. degree in electrical engineering from the California Institute of Technology in 1974 and an MS. degree in acoustics from the University of California in Los Angeles in 1976.

During the period between 1976–1978 he worked on electronic component design for custom sound-reinforcement systems at Paul Veneklasen and Associates. From 1978–1984 he was involved in digital-audio and magnetic recording research at the Ampex Corporation. At that time he became interested in applying psychoacoustics to the design and analysis of digital-audio conversion systems. From 1984–2018 he has worked at Dolby Laboratories on the application of psychoacoustics to the development of audio systems and on the development of a number of bit-rate reduction audio coders for music distribution, transmission, and storage applications, i.e. AC-1, AC-2, AC-3, Enhanced AC-3, AAC, and Dolby E. Additionally, he has investigated perceptually derived limits for the performance for digital-audio conversion, low-frequency loudspeaker systems, distortion limits for loudspeakers/headphones, loudspeaker-room equalization, and headphone virtualization. He managed the Sound Technology Research Department at Dolby Laboratories in San Francisco from 2005–2009.

Louis Fielder is a life fellow of the AES, a recipient of the AES Silver Medal, a senior life member of the IEEE, a life member of the SMPTE, and an emeritus member of the Acoustical Society of America. He was on the AES Board of Governors during 1990–1992, President during 1994–1995, and Treasurer 2005–2009."



You’ve raised some good points regarding the history of multitone as a speed-efficiency tool and the energy-summation challenges. However, there are two critical areas where the "THD/Chirp is enough" argument breaks down when applied specifically to loudspeakers:

You mentioned that "whatever nonlinearity causes IMD also causes HD." While mathematically true in a static system, it is physically misleading in a transducer. In a loudspeaker, the most audible distortions are position-dependent.

The Bl(x) problem: if you run a high-frequency chirp, the voice coil stays near the center of the gap. If you run a low-frequency chirp, it moves toward the ends of its travel. But only a multitone (or two-tone) test reveals what happens when a low-frequency note pushes the coil to the edge of the gap while it is simultaneously trying to reproduce a high-frequency note. This creates intermodulation distortion (IMD) that is not present in a single-tone sweep or chirp. You cannot 'extrapolate' the sound of a modulated 1 kHz tone from a static 1 kHz THD measurement. This is why loudspeakers sound 'congested': the high frequencies are being 'shaken' by the low frequencies.
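The Bl(x) argument can be illustrated with a deliberately crude toy model, which is an assumption and not a transducer simulation: take a hypothetical symmetric force-factor curve Bl(x) = Bl0·(1 - c·x²), let a bass tone drive the coil sinusoidally to a peak excursion X, and assume the high-frequency output simply scales with the instantaneous Bl. The high-frequency tone is then amplitude-modulated at twice the bass frequency, and the modulation index gives the level of each sideband relative to that tone.

```python
import math

def bl_modulation(c_per_mm2: float, peak_excursion_mm: float):
    """Toy AM model: Bl(x)/Bl0 = 1 - c*x^2 with x = X*cos(w*t).

    Expanding cos^2 gives a mean gain of 1 - c*X^2/2 and a ripple of c*X^2/2
    at twice the bass frequency. Returns (modulation index, level of each
    sideband at f_hf +/- 2*f_lf in dB relative to the high-frequency tone).
    """
    if c_per_mm2 * peak_excursion_mm ** 2 >= 1.0:
        raise ValueError("toy Bl curve reaches zero within the excursion")
    ripple = c_per_mm2 * peak_excursion_mm ** 2 / 2.0
    mean = 1.0 - ripple
    m = ripple / mean
    return m, 20.0 * math.log10(m / 2.0)

# Hypothetical numbers: Bl drops 40 % at 2 mm (c = 0.1 per mm^2).
m, sideband_db = bl_modulation(0.1, 2.0)
print(f"modulation index {m:.3f}, sidebands at f_hf +/- 2*f_lf: {sideband_db:.1f} dB")
```

In this toy, a high-frequency tone played alone sees a constant Bl0 and no modulation at all; the sidebands exist only when the bass tone is present at the same time.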

Fielder’s work on subwoofers correctly notes that IMD products often fall within the first critical band and are masked. However, this logic fails in full-range systems. In a 2-way or 3-way speaker, IMD products (specifically f2 - f1) often land far outside the critical band of the stimulus. While it's true that multitone 'divides' the amplitude between components, it also creates a dense 'distortion floor' across the entire spectrum. Even if individual spikes are lower, the total integrated distortion power in the 'valleys' between musical notes can be significantly higher than the noise floor, leading to a loss of low-level detail (transparency) that a THD chirp simply cannot visualize.
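The "distortion floor" point rests on power summation: many products that are individually small add up. A minimal sketch, assuming the products sit at distinct frequencies so their powers add; the product count and levels are hypothetical, not taken from any measurement.

```python
import math

def total_level_db(levels_db):
    """Power sum of incoherent components given in dB (same reference):
    10*log10(sum of 10^(L/10))."""
    return 10.0 * math.log10(sum(10.0 ** (level / 10.0) for level in levels_db))

# Hypothetical: 100 IM products, each 60 dB below the signal.
print(round(total_level_db([-60.0] * 100), 2))
```

A hundred products at -60 dB integrate to -40 dB. Whether such a floor is audible depends on where the products land relative to masking, which is the point under dispute; the arithmetic only shows that the integrated level can be far higher than any single product.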

You asked for a reference on why multitone explains 'congestion' where THD fails. I point you to Klippel’s 'Assessment of Voice Coil Peak Displacement' (2003) and his later work on Loudness of Distortion. Klippel demonstrates that THD measurements significantly underestimate the audibility of distortion during complex signals because THD doesn't account for the FM (Doppler) and AM (Gain Compression) effects that occur only when multiple frequencies are present.

From those papers, for example:
"Clearly, the measurement of harmonic distortion is not sufficient for assessing all important aspects of the large signal performance as emphasized by Voishvillo [5]. Nonlinearities inherent in transducers such as force factor Bl(x), inductance Le(x) and Doppler produce significant modulation distortion."
Source: https://www.klippel.de/fileadmin/kl...t_of_Voice_coil_peak_displacement_XMAX_02.pdf

"Unfortunately, harmonic distortion measurement gives not a comprehensive picture of the large signal performance of loudspeaker systems. At least a second tone is required to generate intermodulation products which occur at difference and sum frequencies in all possible combinations of the excitation frequencies. Increasing the number of fundamental components in multi-tone stimulus will generate more and more intermodulation components spreading over the complete audio band. Contrary to the THD response in Fig.9 the nonlinear force factor Bl(x) and the inductance L(x) THD generates significant intermodulation distortion at higher frequencies as illustrated in Fig. 10. Thus, harmonic distortion measurements using a single test tone are not sufficient for assessing loudspeakers comprehensively and predicting the large signal performance for complex stimuli like music."
Source: https://www.klippel.de/fileadmin/klippel/Files/Know_How/Literature/Papers/Loudspeaker regular signal distortion caused by design_part 1_Klippel_Werner.pdf
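The Doppler (FM) mechanism mentioned in these quotes can be estimated with a first-order textbook approximation, used here as an assumption rather than as Klippel's model: a cone moving sinusoidally with peak excursion X phase-modulates a high-frequency tone f_hf radiated from it, with a peak phase deviation β = 2π·f_hf·X/c, where c is the speed of sound. For small β, each first-order sideband sits about β/2 below the tone.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed: air at roughly 20 degrees C

def doppler_phase_index(f_hf_hz: float, peak_excursion_m: float,
                        c: float = SPEED_OF_SOUND_M_S) -> float:
    """Peak phase deviation (rad) of a tone radiated from a cone moving +/- X."""
    return 2.0 * math.pi * f_hf_hz * peak_excursion_m / c

def first_sideband_db(beta: float) -> float:
    """Small-beta approximation J1(beta)/J0(beta) ~ beta/2, in dB re the tone."""
    return 20.0 * math.log10(beta / 2.0)

# Hypothetical: 2 kHz tone on a woofer moving +/- 5 mm at a bass frequency.
beta = doppler_phase_index(2000.0, 0.005)
print(f"beta = {beta:.3f} rad, sidebands at f_hf +/- f_lf ~ {first_sideband_db(beta):.1f} dB")
```

In this approximation the index grows with the high-frequency tone and with excursion, not with the bass frequency itself, and it only exists when both tones are present at once.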
 
Regarding dense signals, it's very interesting to see the outcome of an FSAF measurement with periodic/non-periodic pink or white noise as test signals.
The results are usually way worse than with 2-tone or multitone signals, no matter the speaker, level, or setup.
 
You cannot 'extrapolate' the sound of a modulated 1 kHz tone from a static 1 kHz THD measurement.
But you can if you have the distortion and phase of both signals you used in the IMD test. Remember, in speaker testing, we are not just testing at 1 kHz. We have a full sweep at all frequencies.
 
In a 2-way or 3-way speaker, IMD products (specifically f2 - f1) often land far outside the critical band of the stimulus.
OK, I am now confused about what you are asking. A two-tone test is different from multitone. I thought you were advocating multitone measurements.

I have done a ton of testing with dual tones and speakers. It simply is not practical to turn this into a universal measurement for all speakers. The problem is that you don't know which two tones to use. Every speaker is different, with different drivers covering different ranges. In one speaker, 40 Hz and 2 kHz may be reproduced by one driver; in another, by two drivers. One speaker can produce 40 Hz at full amplitude, but another may not.

Multitone gets around that but then you have the issues I mentioned.

Note also that a dual-tone test has a similar problem to multitone: it excites the system with two tones, spreading the energy between them. So in that sense, it too is a less severe test of the system than a single tone.

You asked for a reference on why multitone explains 'congestion' where THD fails. I point you to Klippel’s 'Assessment of Voice Coil Peak Displacement' (2003) and his later work on Loudness of Distortion.
That paper is about testing drivers, not complete loudspeakers. In engineering tests of a driver, you know its capabilities and hence can guess at a proper set of tones. Not so for complete speakers of varying configurations. A lot of Klippel's products are for this kind of R&D work, hence the papers and tests they have developed. As I explained above, this is hard if not impossible to do for speaker reviews.

As to this quote:

Thus, harmonic distortion measurements using a single test tone are not sufficient for assessing loudspeakers comprehensively and predicting the large signal performance for complex stimuli like music.
We don't test at a single tone. We test at ALL frequencies at once! That is the beauty of the CHIRP signal. This creates a rich set of information which, at a glance, tells you how well the speaker is performing:



There is a wealth of information here showing many impairments at different frequencies. As buyers, we are not so concerned with diagnosing them; that is the job of the designer and the company. Even so, we can look at things like those sharp spikes and realize they are resonances. This is far more informative than the dual-tone example in the paper:


Here, you only know what happens with those two tones and nothing else. You would have to have a priori knowledge of the system/drivers to know that those were the right tones to use.
 
Remember, in speaker testing, we are not just testing at 1 kHz. We have a full sweep at all frequencies.
But not simultaneously, which exercises a very limited and different distortion mechanism, with different results.
 
You are arguing from the perspective of review utility (which measurement is most readable and universal), while I am arguing from the perspective of physics (which measurement captures the actual behavior of a speaker playing music).

A sweep will never show you the intermodulation (IMD) or Doppler distortion that occurs when a 40 Hz excursion modulates a 2 kHz tone. You cannot see that modulation in a THD sweep because, during a sweep, the 40 Hz and 2 kHz tones never exist at the same time. This is the 'congestion' I’m referring to: it is a byproduct of interaction, and a single-tone sweep (no matter how fast) cannot trigger it.

You argued that multitone is a 'less severe' test because energy is spread out. While the per-tone SPL is lower, the stress on the motor system is much higher. In a THD sweep, the voice coil is mostly centered except at the lowest frequencies. In a multitone test, the low-frequency components keep the voice coil pushed toward the limits of the magnetic gap (Bl(x)) and the suspension (Kms(x)) while the high-frequency tones are being reproduced. This 'stress test' reveals nonlinearities in the motor and inductance that a sweep, which allows the coil to return to center for every high-frequency measurement, simply misses.

You noted that two-tone testing is impractical because you don't know which tones to pick. That is exactly why multitone was developed. It doesn't require a priori knowledge of the crossover; it fills the entire bandwidth. And while Fielder is the authority on masking, his models assume the distortion products are 'small.' In a real-world loudspeaker, however, the IMD products from Bl(x) and Doppler can be significant. More importantly, they land in the masking valleys. If you have a peak at 50 Hz and a peak at 1 kHz, the resulting IMD at 950 Hz and 1050 Hz is not 'masked' by the 50 Hz tone; it is far outside that critical band.
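The critical-band argument can be checked with the Zwicker and Terhardt approximation of the Bark (critical-band rate) scale, chosen here for illustration; other critical-band models differ in detail. Frequencies roughly one Bark or more apart fall in different critical bands.

```python
import math

def bark(f_hz: float) -> float:
    """Zwicker & Terhardt (1980) approximation of critical-band rate in Bark."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)

f_bass, f_mid = 50.0, 1000.0   # the example tones from the paragraph above
for label, f in [("bass tone", f_bass), ("mid tone", f_mid),
                 ("f_mid - f_bass", f_mid - f_bass), ("f_mid + f_bass", f_mid + f_bass)]:
    print(f"{label:15s} {f:7.0f} Hz -> {bark(f):5.2f} Bark")
```

The 950 Hz and 1050 Hz products sit more than seven Bark above the 50 Hz tone; how close they sit to the 1 kHz tone can be read off the same output.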
 
IMD in headphones has been discussed here earlier.

For loudspeakers, IMD can be a real problem, even though audibility thresholds for it are not defined. EAC started running multitone tests some time ago; examples are below (the voltages differ because the output was SPL-calibrated).

 
You noted that two-tone testing is impractical because you don't know which tones to pick. That is exactly why Multitone was developed. It doesn't require a priori knowledge of the crossover; it fills the entire bandwidth.

By filling the entire bandwidth, multitone test signals reduce the amplitude at each single frequency, as amirm pointed out above. As a result, excursion in the bass is relatively low. One problem is that the levels of all tones are usually the same, which does not resemble the energy distribution in music. This can be improved by shaping the level of the tones vs. frequency as described in IEC 268-1. The shape is basically 1/f (pink noise) with a high-pass somewhere below 60 Hz and a low-pass in the upper treble range, if I recall correctly.
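The level shaping can be sketched as a weight applied to each tone. This assumes linearly spaced tones (for example, on FFT bins), where a pink, 1/f power spectrum means per-tone amplitude proportional to 1/√f. The first-order 60 Hz high-pass and 12 kHz low-pass corners are hypothetical placeholders, not the IEC 268-1 filter, whose exact response should be taken from the standard itself.

```python
import math

def tone_weight(f_hz: float, hp_hz: float = 60.0, lp_hz: float = 12000.0) -> float:
    """Relative amplitude for a linearly spaced tone at f_hz (hypothetical shaping).

    pink:      1/sqrt(f) amplitude, so power per Hz falls as 1/f
    high-pass: first-order magnitude (f/fc)/sqrt(1+(f/fc)^2)
    low-pass:  first-order magnitude 1/sqrt(1+(f/fc)^2)
    """
    pink = 1.0 / math.sqrt(f_hz)
    hp = (f_hz / hp_hz) / math.sqrt(1.0 + (f_hz / hp_hz) ** 2)
    lp = 1.0 / math.sqrt(1.0 + (f_hz / lp_hz) ** 2)
    return pink * hp * lp

def shaped_levels_db(freqs_hz, **corners):
    """Per-tone levels in dB relative to the loudest tone."""
    weights = [tone_weight(f, **corners) for f in freqs_hz]
    peak = max(weights)
    return [20.0 * math.log10(w / peak) for w in weights]

freqs = [20.0 * k for k in range(1, 1001)]  # hypothetical 20 Hz spacing up to 20 kHz
levels = shaped_levels_db(freqs)
for f in (20.0, 60.0, 200.0, 1000.0, 10000.0):
    print(f"{f:7.0f} Hz: {levels[freqs.index(f)]:6.1f} dB")
```

If the tones are log-spaced instead (a constant number per octave), equal amplitudes already give equal power per octave, so the 1/√f factor would be dropped and only the band-limiting corners applied.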

While such shaping helps, it is still not the same as a single bass note played at a high level along with much smaller signal components in the midrange and treble. That's why I prefer 2-tone testing. The issue of different speakers can be solved easily by using a fixed bass tone (f1) at the lower cutoff frequency (-3/-6 dB) and a second tone (f2) at a much lower level (e.g., -20 dB) swept over the frequency range above f1. This allows a large signal at f1 while evaluating IMD over the f2 frequencies, which cover all drivers of a multi-way speaker.
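The proposed two-tone procedure can be sketched as a test plan. Only the fixed f1 at the lower cutoff and f2 at about -20 dB come from the description above; the 40 Hz cutoff, third-octave f2 steps, the 250 Hz to 20 kHz range, and the restriction to second- and third-order products near f2 are assumptions for illustration.

```python
def f2_steps(f_start_hz: float, f_stop_hz: float, per_octave: int = 3):
    """Log-spaced f2 frequencies from f_start to f_stop (inclusive, within rounding)."""
    steps, k = [], 0
    while True:
        f = f_start_hz * 2.0 ** (k / per_octave)
        if f > f_stop_hz * (1.0 + 1e-9):
            return steps
        steps.append(f)
        k += 1

def im_bins(f1_hz: float, f2_hz: float):
    """Second- and third-order IM frequencies around f2 driven by the strong bass tone f1."""
    return {"f2-f1": f2_hz - f1_hz, "f2+f1": f2_hz + f1_hz,
            "f2-2f1": f2_hz - 2 * f1_hz, "f2+2f1": f2_hz + 2 * f1_hz}

F1_HZ = 40.0                             # hypothetical lower cutoff (-3/-6 dB) of the speaker
F2_LEVEL_DB = -20.0                      # second tone relative to f1
F2_AMPLITUDE = 10 ** (F2_LEVEL_DB / 20)  # 0.1 x the f1 amplitude

print(f"f2 amplitude = {F2_AMPLITUDE:.2f} x f1 amplitude")
for f2 in f2_steps(250.0, 20000.0):
    bins = im_bins(F1_HZ, f2)
    print(f"f2 = {f2:7.1f} Hz -> analyze " + ", ".join(f"{k}={v:.0f}" for k, v in bins.items()))
```

In practice f2 should avoid exact multiples of f1, so that the f2 ± n·f1 products do not coincide with the harmonics of the strong bass tone.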

An alternative would be a similar f1 frequency as above combined with a multitone spectrum in the midrange and treble, again at a much lower level than f1. This avoids sweeping a single second tone in frequency. Instead, f1 could be swept or stepped in level while analyzing IMD within the upper multitone spectrum. IMD will be dominated by f1.
 