
Measuring headphones (and others) with music as stimulus

Mr. Haelscheir

I would like to ask where we stand on the "state of the art" of transducer measurements.

tl;dr: What methods could we explore for measuring the dynamic performance of transducers with actual music as opposed to steady or swept tones? Am I missing literature finding such analysis to be ineffectual?

Room EQ Wizard already presents an excellent suite for my own uses in assessing magnitude and phase response, impulse and step response (whether or not this can actually be correlated with the subjective perception of transients and their attack, sharpness, or decay), CSD (residuals probably only audible with extreme, isolated transients or "illegal signals" where a transducer's nonlinearities are rendered audible) (see https://www.head-fi.org/threads/hifiman-he1000-se.886228/page-320 for this suite in action), and an excellent Real-Time Analyzer for multitone distortion measurement as I had reported in https://audiosciencereview.com/foru...n-susvara-headphone-review.50705/post-1888972 (post #1,183; it particularly showed that a headphone with the same harmonic distortion performance past a frequency could still have worse multitone distortion performance within that same range; I still need to repeat the test with the multitone truncated to exclude the higher-distortion bass of the "lesser" headphone).

The limitation of that classic battery of measurements is its reliance on steady-state signals or averaging over a complex signal like a pink spectrum multitone. For the goal of assessing a transducer's accuracy or its ability to only produce residuals below audible thresholds, one might argue even spectrally dense multitone measurements to be insufficient for capturing the dynamic or transient performance of transducers with real music. As for impulse responses, a member here (I don't have the link on me) had managed to demonstrate the agreement between an impulse response calculated from a sine sweep (frequency domain measurement) and a time domain recording, so there should be no concern for the capacity of sine sweeps to capture such isolated transient performance. https://www.superbestaudiofriends.org/index.php?threads/burst-response-hd800-sr-207-hd650.3688/ had been a somewhat interesting read, whatever you may think of SBAF, but is probably already captured by the step response and CSD.
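To illustrate the point about sweeps and impulse responses agreeing, here is a minimal NumPy sketch. It is not the demonstration that member posted. The sample rate, the linear 0 Hz to fs/2 sweep, and the "device" (a short damped resonance named `h_true`) are all assumptions chosen for illustration, and the agreement shown only holds for a linear, time-invariant system with no noise.

```python
import numpy as np

fs = 48_000                      # sample rate in Hz (assumed)
t = np.arange(fs) / fs           # one second of samples
# Linear sine sweep from 0 Hz to fs/2 over one second: phase = pi * f_end * t^2
sweep = np.sin(np.pi * (fs / 2) * t**2)

# Hypothetical "device": a short damped resonance standing in for a transducer
n = np.arange(64)
h_true = np.exp(-n / 8.0) * np.cos(2 * np.pi * 0.1 * n)

# Time-domain measurement: feed a unit impulse and record the output
impulse = np.zeros(256)
impulse[0] = 1.0
h_direct = np.convolve(impulse, h_true)[:64]

# Sweep measurement: record the sweep through the device, then deconvolve
recorded = np.convolve(sweep, h_true)
nfft = 1 << (len(recorded) - 1).bit_length()   # zero-pad so nothing wraps around
X = np.fft.rfft(sweep, nfft)
Y = np.fft.rfft(recorded, nfft)
eps = 1e-12 * np.max(np.abs(X) ** 2)           # tiny regularisation for weak bins
h_sweep = np.fft.irfft(Y * np.conj(X) / (np.abs(X) ** 2 + eps), nfft)[:64]

max_diff = np.max(np.abs(h_sweep - h_direct))
print(f"largest difference between the two impulse responses: {max_diff:.2e}")
```

With a real transducer the two would differ by noise and by however much of the nonlinear distortion ends up in each measurement, so this only shows why the two methods are expected to agree in the linear case.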

Edit: More clearly distinguished "frequency response" from "transfer function".

The questions I wish to answer with these dynamic measurements include:
  1. Do transducers or the electronics that drive them change in frequency domain magnitude response or transfer function (relation of output level to input level) in the middle of playing certain audio signals?
  2. Do some transducers or electronics actually reject certain low-level signals so as to "lose the details"? I define "detail" as all the raw, objective information contained within the recording (effectively samples), and "resolving capability" as the ability for the output of the playback system to possess a linear correlate for every sample within the reference recording.
  3. In some transducers or electronics, can the presence of juxtaposed higher-level signals cause lower-level signals to be rejected (as opposed to simply being masked by intermodulation distortion and other), or in other words have their transfer function or levels altered?
One tool I have come across is presented in https://www.diyaudio.com/community/threads/loudspeaker-distortions-measurement-in-matlab.388032/, once mentioned here in https://audiosciencereview.com/foru...with-thd-0-00002-100w-20khz.22982/post-774584 (post #18). The technical details are unfortunately still quite above my head, or it would take quite a fair bit of time for me to figure out how to or if I can use it for what I am looking for, which I describe below.

What I want a dynamic audio analyzer to do:
  1. The user takes a reference track, for example, the first minute of the second movement of Boulez' Mahler Symphony No. 5 recording (alternatively, some idealized synthetic if not AI-generated dynamic test signal, which is open to discussion), captured as a .WAV file, then records using microphones (be it a measurement microphone, a dummy head, or in-ear microphones) in a sufficiently low-noise environment the output of the transducer.
  2. The FFTs of the sample-aligned reference and transducer .WAV files are calculated and plotted with respect to time (this need not be done in real time, so that resolution can be maximized). As I understand it, the FFT window length (assuming a rectangular window has sufficient dynamic range for the transducers in question) presents a tradeoff between frequency and temporal resolution: longer FFT windows give greater frequency resolution at lower frequencies (at least when looking at a logarithmic scale; I suppose the main lobe width per tone would effectively be equal when using a linear scale?) at the cost of reduced temporal resolution for the changes in all frequencies, hurting the analysis of transient behaviour. I happened to come across the term "wavelet transform", which might help for tuning the resolution for the desired view. An alternative idea I am curious about is whether it is possible to reduce the FFT length with increasing frequency toward the theoretical fastest if not "actual" rates at which individual tones of a given frequency are changing, if such a concept even exists insofar as a time domain signal obviously doesn't have an "instantaneous FFT" without consideration of window length (I could be mistaken regarding this).
  3. After being able to obtain the spectral content of the reference and transducer signals at a given time, the software correlates all the tones common between the reference and transducer signals and from this estimates the instantaneous frequency response of the transducer for each sample (or chosen window). Except where residuals overlap with the tones in the reference signal, this analysis would reveal whether the playback system's transfer function (frequency response and possibly things like "DAC or amp tonality" which we are skeptical of) is changing with respect to time for the given practical music stimulus. A 3D visual could be made plotting the deviation of this frequency response from the steady-state sine sweep measurement, pointing out areas of dynamic compression or extension.
  4. From this, the actual residuals, be it distortion in the form of tones, reflections, or tones actually failing to decay fast enough, could be extracted and plotted with respect to time for analysis of audibility. Knowledge of the steady-state frequency response could perhaps be used to discern residuals as opposed to frequency response changes within tones already present in the reference recording. This analysis could also be used to discern where tones in the reference signal are "missing" or attenuated and hence not as well "resolved". The software could depict these residuals and perhaps visualize where tones resolved from the reference signal (the audibility of those themselves can be analyzed) are being masked by the adjacent residuals. As such, we would have an objective analysis of a system's "resolution" and "accuracy" independent of tonality (linear distortions).
  5. This analysis could also be applied to the direct measurement of electronics so as to do more than the PK Metric, at least in being able to separate linear distortions from nonlinear distortions.
  6. Maybe FSAF does exactly this, but it is still way above my head for me to discern that.
I would appreciate feedback regarding the feasibility and utility of this measurement approach.
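Steps 2 and 3 of the list above can be roughed out in a few lines of NumPy. This is only a sketch under stated assumptions, not FSAF and not an existing tool: the function names `stft` and `frame_gain_db`, the Hann window, the 1024-sample frame, and the -60 dB usability floor are all made up for illustration. The "music" is white noise and the "device" is a pure gain that drops by 6 dB halfway through, to mimic compression. A per-frame ratio like this is only meaningful when the device's impulse response is much shorter than the frame.

```python
import numpy as np

def stft(x, n_fft=1024, hop=512):
    """Hann-windowed short-time FFT; rows are frames, columns are bins."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def frame_gain_db(reference, recording, n_fft=1024, hop=512, floor_db=-60.0):
    """Per-frame, per-bin |recording / reference| in dB.

    Bins where the reference has too little energy are left as NaN,
    since the ratio there would mostly be noise or residuals.
    """
    X = stft(reference, n_fft, hop)
    Y = stft(recording, n_fft, hop)
    power = np.abs(X) ** 2
    usable = power > power.max() * 10 ** (floor_db / 10)
    H = np.full(X.shape, np.nan, dtype=complex)
    H[usable] = Y[usable] / X[usable]
    return 20 * np.log10(np.abs(H))

fs = 48_000
rng = np.random.default_rng(0)
reference = rng.standard_normal(fs)                  # stand-in for a music track
gain = np.where(np.arange(fs) < fs // 2, 0.5, 0.25)  # simulated level change
recording = reference * gain                         # stand-in for the mic capture

gains = frame_gain_db(reference, recording)
per_frame = np.nanmedian(gains, axis=1)              # one gain figure per frame
print(f"bin spacing {fs / 1024:.1f} Hz, frame length {1024 / fs * 1e3:.1f} ms")
print(f"first frame {per_frame[0]:.2f} dB, last frame {per_frame[-1]:.2f} dB")
```

Doubling `n_fft` halves the bin spacing and doubles the frame length, which is the tradeoff described in step 2. Subtracting a steady-state sweep measurement from `gains` would give the deviation plot described in step 3.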
 
  1. Do transducers or the electronics that drive them change in transfer function in the middle of playing certain audio signals?
My opinion:

Electronics .. no. It just follows the input signal (+ adds some distortion). This can be shown with multitone.
Transducers may change transfer function at larger excursions which causes sidebands.

  1. Do some transducers or electronics actually reject certain low-level signals so as to "lose the details"? I define "detail" as all the raw, objective information contained within the recording (effectively samples), and "resolving capability" as the ability for the output of the playback system to possess a linear correlate for every sample within the reference recording.
Electronics ... no.
Transducers might be masking small signals.
Unfortunately the only way to check is to compare to the applied electrical signal. And here comes the problem. The mic and the fixture it sits in will change the acoustic signal that is picked up. This is device dependent (fixture and headphone), so smart analysis using complex program material is not an option either.
The brain can do this better but ... that ratchet short term memory and positioning is in the way.

  1. In some transducers or electronics, can the presence of juxtaposed higher-level signals cause lower-level signals to be rejected (as opposed to simply being masked by intermodulation distortion and other), or in other words have their transfer function or levels altered?
In electronics ... no, poorer electronics, of course, will add IM when they also add harmonics. The audibility of that is recording dependent (as well as listener training)
In acoustics there is no rejection but resonances, non linear behavior of the driver and dips (due to nulling for instance) can lower or mask certain signals.
Here too the brain is important.

There is no substitute for a trained listener and measurements.... the latter has the problem of being limited in several ways and by circumstances.
The listener needs to know what to listen for, there need to be standard recordings (which needs consensus), and there needs to be a fixed vocabulary that is agreed upon and recognized by peers. There need to be fixed conditions and, in the case of IE, an agreed insertion depth; for OE (on and over), position variances need to be accounted for (same for measurements).
 
Do transducers or the electronics that drive them change in transfer function in the middle of playing certain audio signals?
Depending on semantics here, this is basically just any nonlinear behavior like THD or TIM, so yes, sometimes.

Do some transducers or electronics actually reject certain low-level signals so as to "lose the details"?
AFAIK not sure how you'd define "reject" exactly, but you can get masking due to noise or distortion, so kind of, I guess.

To make this more intuitive, consider the fact that amplifiers / DACs etc will consistently pass along noise, which is basically an inherent signal at the very lowest levels. You actually generally have to use special gear (known as a gate in the studio world) to reject low-level signals in practice.

In the sense of a transducer taking a certain voltage to "wake up", supposedly it happens in rare cases with certain types of speaker surround that don't behave linearly, but I'm not aware of this happening very often.

  1. In some transducers or electronics, can the presence of juxtaposed higher-level signals cause lower-level signals to be rejected (as opposed to simply being masked by intermodulation distortion and other), or in other words have their transfer function or levels altered?
In electronics, I don't think so, other than via noise / distortion.

In transducers, it seems like that might be the case intuitively, but consider that the acceleration of the cone corresponds directly to the strength of the magnetic field in the voice coil, which in turn corresponds directly to the voltage... there's nowhere for those forces to go, even small ripples in those forces, unless the cone moves. So, aside from distortion (which definitely can and does obscure details) I am not aware of a way for this to happen.


All of that said, you can do measurements with music using software like Deltawave, which is made by a member here and free. I do think that in-room speaker measurements with music is probably too hard to be worthwhile but I think it could be possible in principle.
 
For electronics and cables yes, for acoustic stuff nulling is not possible.
Of course one can compare a recording with the original but nothing conclusive will come out for a ton of reasons.
Nulling is a great tool for electronics but that's where the usability of this method ends.
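For readers who have not seen a null test, here is a rough NumPy sketch of the idea for the electrical case. It is not DeltaWave's algorithm. The function name `null_test`, the cross-correlation alignment, and the single least-squares gain are assumptions chosen for illustration, and the "capture" is simulated as a delayed, attenuated copy of the reference plus a little noise. It ignores clock drift and any frequency response error, which is exactly why this breaks down for acoustic captures.

```python
import numpy as np

def null_test(reference, recording):
    """Align, gain-match and subtract; returns (lag, gain, null depth in dB)."""
    n = len(reference) + len(recording) - 1
    nfft = 1 << (n - 1).bit_length()
    # Cross-correlation via FFT; the peak marks the delay of the recording
    xcorr = np.fft.irfft(np.fft.rfft(recording, nfft)
                         * np.conj(np.fft.rfft(reference, nfft)), nfft)
    lag = int(np.argmax(np.abs(xcorr)))
    if lag > nfft // 2:
        lag -= nfft                      # negative lag: recording starts early
    if lag >= 0:
        rec, ref = recording[lag:], reference
    else:
        rec, ref = recording, reference[-lag:]
    m = min(len(rec), len(ref))
    rec, ref = rec[:m], ref[:m]
    gain = np.dot(rec, ref) / np.dot(ref, ref)   # least-squares level match
    residual = rec - gain * ref
    depth_db = 10 * np.log10(np.sum(residual ** 2) / np.sum(rec ** 2))
    return lag, gain, depth_db

rng = np.random.default_rng(1)
reference = rng.standard_normal(20_000)          # stand-in for a music file
recording = np.concatenate([np.zeros(37), 0.8 * reference])
recording = recording + 1e-3 * rng.standard_normal(len(recording))

lag, gain, depth_db = null_test(reference, recording)
print(f"lag {lag} samples, gain {gain:.3f}, null depth {depth_db:.1f} dB")
```

With a microphone capture, the headphone's and fixture's frequency response would have to be removed before the subtraction means anything, and whatever is left still mixes linear errors with nonlinear ones.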
 
For question #2, "reject" here is asking if there are cases where transducers exhibit something like a coefficient of static friction such that they won't even move for certain frequencies until a certain level is surpassed. As for the idea of the existence of a phenomenon whereby below a certain signal level, the frequency response or transfer function levels decrease or attenuate at a marked rate, I guess that is indeed precisely the definition of a nonlinear distortion and is bound to show up as spurious tones, though I am interested in the case where it is the fundamental frequency that comes to play markedly lower in level than nominal.

For question #3, it is on whether the current motion or inertia of a transducer for a given larger amplitude signal or tones could cause the driver to assume a nonlinearity that more greatly attenuates smaller signals. I suppose I should have better separated the frequency domain term of "magnitude response" (linear distortions or dynamic changing of tonality for certain stimuli) from "transfer function" as in the relation of output level to input level, which I suppose can itself vary with frequency. Thus, question #3 asks about the existence of phenomena where the transfer function for certain demanding signals comes to have an increasingly increasing or concave shape where higher level signals are needed to achieve a given output level. This kind of nonlinearity would be differentiated from one where the transfer function instead becomes convex and in effect amplifies the lower level signals; the concave case to me would evince a transducer's "losing" information from the recording while the convex case would evince the transducer's "amplifying" the information. I don't know if the latter has to do with what I had read about some finding tubes capable of imparting an increased sense of detail, whether or not the transfer function really is amplifying lower level signals with higher gain but at the cost of introducing more spurious frequency content. It might also be interesting whether concave or convex transfer functions contribute to subjective impressions of less or more "detailed" or "resolving" headphones irrespective of magnitude response and harmonic distortion measurements. My idealized dynamic audio analyzer would show where "details" are being amplified or attenuated.

I don't know if it would be valid or useful to measure the frequency domain magnitude response for a range of input levels and plot these, and if so, whether the slice of output magnitudes versus input levels for a given frequency would effectively be showing you that frequency's transfer function; I suppose the alternative to taking many sine sweeps at different levels would be to take multiple sinusoid amplitude sweeps at different frequencies depending on which you are prioritizing the resolution of. Given this, it should still be feasible for the overall shape and hence tonality of the frequency domain magnitude response to remain unchanged for all playback levels provided that the transfer function nonlinearity is the same for every frequency, it simply being vertically translated.
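The amplitude-sweep idea can be tried on a toy model. The sketch below assumes a purely hypothetical "stiction" device, a dead zone that ignores anything below a threshold, and measures the gain of the fundamental at several drive levels. The names `dead_zone` and `fundamental_amplitude`, the threshold, and the levels are invented for illustration; no real driver is claimed to behave this way.

```python
import numpy as np

fs = 48_000
freq = 1_000.0                            # test tone, whole cycles in one second
t = np.arange(fs) / fs

def dead_zone(x, threshold=5e-4):
    """Hypothetical device: output stays at zero until |x| exceeds threshold."""
    return np.sign(x) * np.maximum(np.abs(x) - threshold, 0.0)

def fundamental_amplitude(signal):
    """Amplitude of the component at `freq`, via a single-bin DFT."""
    return 2 * np.abs(np.mean(signal * np.exp(-2j * np.pi * freq * t)))

levels_db = [-60, -40, -20, 0]            # drive levels relative to full scale
gains_db = []
for level in levels_db:
    amplitude = 10 ** (level / 20)
    out = dead_zone(amplitude * np.sin(2 * np.pi * freq * t))
    gains_db.append(20 * np.log10(fundamental_amplitude(out) / amplitude))

for level, g in zip(levels_db, gains_db):
    print(f"drive {level:4d} dBFS -> fundamental gain {g:6.2f} dB")
```

Plotting `gains_db` against `levels_db` is the per-frequency slice described above: a flat line means the fundamental tracks the input, and a line that falls toward low levels is the "fundamental plays lower than nominal" case. Repeating the loop for other values of `freq` would show whether the shape is the same at every frequency.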

@solderdude Indeed, direct nulling of acoustic measurements doesn't really work, but the linked FSAF methodology purportedly performs analyses by some sophisticated means to extract acoustic residuals.
 
Isn't this just a convoluted way of asking the same old "can we measure everything we hear" question?
  1. Do transducers or the electronics that drive them change in frequency domain magnitude response or transfer function (relation of output level to input level) in the middle of playing certain audio signals?
  2. Do some transducers or electronics actually reject certain low-level signals so as to "lose the details"? I define "detail" as all the raw, objective information contained within the recording (effectively samples), and "resolving capability" as the ability for the output of the playback system to possess a linear correlate for every sample within the reference recording.
  3. In some transducers or electronics, can the presence of juxtaposed higher-level signals cause lower-level signals to be rejected (as opposed to simply being masked by intermodulation distortion and other), or in other words have their transfer function or levels altered?
All deviations from a linear transfer function are called non-linearity. Including deviations from superposition principle. So then what you are asking seems to be whether there can be other mechanisms of distortion that we have not thought about. If that is the case, then what really matters is what we can measure, so the more pertinent question would be whether these mechanisms, if they exist, would produce a type of non-linearity that is currently unknown or not captured in typical distortion measurements. I think the answer to that last question would be no.
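A superposition check is easy to write down. The sketch below is only an illustration under assumptions: the function name `superposition_error_db`, the three-tap FIR "linear device" and the tanh "nonlinear device" are made up, and the two test tones (a loud 1 kHz tone and a quiet 7 kHz tone) are arbitrary.

```python
import numpy as np

def superposition_error_db(system, a, b):
    """Energy of system(a + b) - (system(a) + system(b)), relative to system(a + b)."""
    combined = system(a + b)
    error = combined - (system(a) + system(b))
    # The 1e-30 keeps the logarithm finite when the error is exactly zero
    return 10 * np.log10(np.sum(error ** 2) / np.sum(combined ** 2) + 1e-30)

fs = 48_000
t = np.arange(fs) / fs
loud = 0.8 * np.sin(2 * np.pi * 1_000 * t)     # high-level signal
quiet = 0.01 * np.sin(2 * np.pi * 7_000 * t)   # low-level "detail"

def linear_device(x):
    # Any fixed filter obeys superposition, whatever its frequency response
    return np.convolve(x, [0.5, 0.3, -0.1])[:len(x)]

def nonlinear_device(x):
    # Soft saturation: the gain seen by the quiet tone depends on the loud one
    return np.tanh(x)

linear_db = superposition_error_db(linear_device, loud, quiet)
nonlinear_db = superposition_error_db(nonlinear_device, loud, quiet)
print(f"linear device:    {linear_db:.0f} dB")
print(f"nonlinear device: {nonlinear_db:.0f} dB")
```

The linear device's error sits at the numerical floor, while the tanh device leaves a residual well above it. That residual is the same thing a multitone or intermodulation measurement picks up, which is the point being made above: a level-dependent loss of small signals is a nonlinearity and shows up in distortion measurements.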
 
Well there are some issues with speaker drivers at least. Not sure how relevant they are, but the result is somewhere between not-zero and swamped by larger factors.

There exist ways to measure many of these. Except at the R&D level of driver design not sure how much this is used.

In another thread recently was discussion of how small signal vs large signal responses affected listening at moderate to high volume vs low volume. One aspect mentioned involved how ESL panels are effectively free to move with high linearity with the motion primarily damped by the resistance of the air itself. Transitioning from very low levels to moderate levels that is not true of cones in a box. I don't know if that reaches a level to matter much in practice. Certainly I suspect it is rather down in the weeds. Spectral balance (frequency response), directivity, and lack of resonance likely overwhelm everything else until these get down to very low levels. Even after that point simple room modes probably cover up remaining differences.
 
It may just be a matter of interpreting a full measurement suite, so more than just FR at 1 or 2 levels and other aspects at various levels and using more than one measurement fixture/method.
Then perception has to be included (another can of worms) while interpreting those measurements when it comes to audibility of all said measurements.
It would at least have to start with measurements.... what ARE good measurements... Just saying there is a standard and measurements should be done acc. to that standard and declaring it holy is with 100% certainty the wrong way to go about this.
 
if there are cases where transducers exhibit something like a coefficient of static friction such that they won't even move for certain frequencies until a certain level is surpassed.
I've heard that there are occasionally such speaker drivers, but I don't know of any actual examples. It came up in another thread recently. My understanding is this is pretty rare. I think this would also happen depending on total excursion, it wouldn't be frequency-dependent if the driver were already moving.
whether the current motion or inertia of a transducer for a given larger amplitude signal or tones could cause the driver to assume a nonlinearity that more greatly attenuates smaller signals.
The only way I know of that this (nonlinearity based on signal input other than normal distortion modalities) actually happens is hysteresis in the driver magnets, but if you read what Bruno wrote about this in the Purifi whitepaper, the character of this distortion is more like noise or extra unwanted transients, not suppressing small movements. But I guess if there were a lot of hysteresis-induced distortion, it would tend to mask quieter sounds like any other type of distortion.

what I had read about some finding tubes capable of imparting an increased sense of detail,
This is not hard to do with simple harmonic distortion, which tubes are known to impart. IIRC there are some tubes that also have a varying frequency response of distortion (i.e. more HF distortion depending on level) so that also points in that direction. It's maybe worth mentioning that you can use a dynamic EQ plugin or just a plain EQ plugin to get similar output.
 
Quoting Doug Self:
"Sinewaves are steady-state signals that represent too easy a test for amplifiers, compared with the complexities of music."
This is presumably meant to imply that sinewaves are in some way particularly easy for an amplifier to deal with, the implication being that anyone using a THD analyser must be hopelessly naive. Since sines and cosines have an unending series of non-zero differentials, "steady" hardly comes into it. I know of no evidence that sinewaves of randomly varying amplitude (for example) would provide a more searching test of amplifier competence.​
I believe this outlook is the result of anthropomorphic thinking about amplifiers; treating them as though they think about what they amplify. Twenty sinewaves of different frequencies may be conceptually complex to us, and the output of a symphony orchestra much more so, but to an amplifier both composite signals resolve to a single instantaneous voltage that must be increased in amplitude and presented at low impedance. The rate of change of this voltage has a maximum set by the frequency response and amplitude capability of the channel and is not generally greater for more complex signals; you do not get higher slew rate with bigger orchestras. You must remember that an amplifier has no perspective on the signal arriving at its input, but literally takes it as it comes.
 
This is of course true. I gave up long ago trying to get this point across. As multi tone and other such things can easily be done that seems like a better way to convince someone.
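The slew rate remark in the quote can be checked numerically. The sketch below is an illustration under assumptions, not anything from Self's text: the 20-tone, log-spaced, random-phase multitone, the 192 kHz evaluation rate and the helper name `peak_and_max_slope` are all made up. Both signals are scaled to the same peak level, and the slope is computed analytically from the tone parameters.

```python
import numpy as np

fs = 192_000                                   # dense time grid for evaluation
t = np.arange(int(0.1 * fs)) / fs              # 100 ms, covers two cycles of 20 Hz

def peak_and_max_slope(freqs, phases):
    """Return (peak level, largest |dx/dt| after scaling the signal to unit peak)."""
    w = 2 * np.pi * np.asarray(freqs, dtype=float)[:, None]
    ph = np.asarray(phases, dtype=float)[:, None]
    x = np.sin(w * t + ph).sum(axis=0)
    dx = (w * np.cos(w * t + ph)).sum(axis=0)  # exact derivative of each tone
    peak = np.max(np.abs(x))
    return peak, np.max(np.abs(dx)) / peak

rng = np.random.default_rng(2)
tone_freqs = np.geomspace(20, 20_000, 20)      # 20 tones from 20 Hz to 20 kHz
tone_phases = rng.uniform(0, 2 * np.pi, 20)

_, sine_slope = peak_and_max_slope([20_000], [0.0])
_, multitone_slope = peak_and_max_slope(tone_freqs, tone_phases)
print(f"20 kHz sine at unit peak:    {sine_slope:.0f} per second")
print(f"20-tone signal at unit peak: {multitone_slope:.0f} per second")
```

For the same peak level and the same top frequency, the "complex" signal asks for a lower maximum slope than the plain 20 kHz sine, in line with the quoted argument that complexity by itself does not raise the slew rate demand.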
 
The latest REW beta version can do FSAF measurements, providing TD+N for music and with an option to listen to the distortion residual.

Thank you SO MUCH for this. That's literally the n°1 feature I was dreaming to see in REW !
 