
Importance of impulse response

KSTR has mentioned the 6.5 cycle tone burst. So here is a simulation of the time-domain response of a peaking PEQ filter (3 dB gain, Q 20, with a sliding center frequency from 4 to 9) to the CTA2010/2034 6.5-cycle, Hann-windowed tone burst (burst duration normalized to 1 time unit, i.e. a tone frequency of 6.5 cycles per unit time).
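For anyone who wants to reproduce something like this, here is a rough Python sketch under assumed parameters (48 kHz sample rate, the "time unit" mapped to 1 ms so the tone is 6.5 kHz, and a single PEQ center at 8 kHz standing in for one position of the sliding sweep); the biquad coefficients follow the well-known RBJ audio-EQ cookbook, which is not necessarily what the original simulation used:

```python
import math

def peaking_biquad(fs, f0, gain_db, q):
    """Peaking-EQ biquad coefficients per the RBJ audio-EQ cookbook (a0 normalized to 1)."""
    a = 10 ** (gain_db / 40)
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    b0, b1, b2 = 1 + alpha * a, -2 * math.cos(w0), 1 - alpha * a
    a0, a1, a2 = 1 + alpha / a, -2 * math.cos(w0), 1 - alpha / a
    return b0 / a0, b1 / a0, b2 / a0, a1 / a0, a2 / a0

def biquad_filter(coeffs, x):
    """Apply the biquad with a direct-form I difference equation."""
    b0, b1, b2, a1, a2 = coeffs
    y, x1, x2, y1, y2 = [], 0.0, 0.0, 0.0, 0.0
    for xn in x:
        yn = b0 * xn + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x1, x2, y1, y2 = xn, x1, yn, y1
        y.append(yn)
    return y

fs = 48000
n_burst = fs // 1000                 # 1 ms "time unit" -> 48 samples
f_tone = 6500.0                      # 6.5 cycles per time unit
burst = [math.sin(2 * math.pi * f_tone * n / fs)
         * 0.5 * (1 - math.cos(2 * math.pi * n / n_burst))  # Hann window
         for n in range(n_burst)]
burst += [0.0] * (4 * n_burst)       # trailing silence so any ringing is visible

# one slice of the "sliding center frequency" sweep: PEQ centered at "8"
y = biquad_filter(peaking_biquad(fs, 8000.0, 3.0, 20.0), burst)
```

Plotting y against the input burst shows the stretched ringing tail: the filter keeps oscillating after the input has ended, exactly the kind of time-domain behavior that is easy to misread by eye.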



Time domain response is often not simple to interpret. Don't be misled into making incorrect conclusions.
Nice, thx.

Here's an interesting test of how bursts of 1, 2, 4, 8, 16, 32, 64, and 128 periods sound, using 300 Hz, 1000 Hz, and 3000 Hz tones.
https://ccrma.stanford.edu/~malcolm/correlograms/index.html?29 Pitch Salience And Tone Duration.html
(link recently posted by member @fluid on another forum.)

Wish there was a 20 Hz example too :).
 
Thank you.

Because if a speaker did as shown with an "in bandwidth" signal, it would not be possible to recreate any musical waveform.

Sure - output won't be the same as input, because you've got discontinuities (meaning infinite bandwidth) at the start and stop of the input. The result would be a rounding at the start, possibly an overshoot, and/or a rounding at the end - perhaps with a small oscillation (assuming the main part of the waveform is within the bandwidth of the transducer).
A transducer is a minimum phase system. That means the movement of the cone basically tracks the input signal as long as the input signal is within the bandwidth capability of the transducer. There will be lag and/or lead (phase shift) and possibly ringing at transients (e.g. if trying to track a square wave - but that is out of bandwidth), but you won't get additional high-magnitude vibrations - or at least, when you do (if driven beyond its limits), that is audible distortion.


EDITED TO ADD - plus, looking at that daft input/output waveform again: the output continues to get bigger even after the input has stopped. How on earth does that happen? Where is the energy coming from to increase the magnitude of the speaker movement?

Utter nonsense.
Thank you. I won't argue with you on the "daft" signal graphic, but it has one possibly useful attribute even if it doesn't describe waveform reality: it was a salesman's effort to depict the physical experience of hearing an unrealistically soft attack and lingering decay in a way that would seem relevant to a naive consumer who actually hears such speaker behavior. I get the harmonic distortion part, but am having trouble with the "soft/slow" vs "hard/fast" part and the ringing part.

What I am getting to is that, rather than weighing down this thread with the basics, just for me, do you know a good resource on understanding the subject for someone starting at close to ground zero? My goal is to understand why, whenever someone proposes that it would be helpful for ASR (or someone) to measure the response of speakers and headphones that subjectively are "crisp", "fast" or "resolving", as distinguished from those that are "soft", "slow" or "congested", the ensuing discussions often seem to go 'round in circles. What are the actual phenomena that account for these attributes, and how can they be measured or explained, if at all?

Again, thanks for taking the time to carefully reply! Let me know any good intro resource if one comes to mind.
 
Also keep in mind that an IR is only valid for one point in space.
 
That is true - but if there is little energy at 20Hz in the recording I can understand if people are not that concerned about it - in many cases low-level 20Hz content would in any case be inaudible due to equal-loudness contours / Fletcher–Munson curves and perceptual masking.
This is usually right, but since we're talking about the timbre of a transient and not two or more distinct sounds, I am less sure of how masking will play out.
Maybe this will also be interesting to you - I prepared a few more graphs illustrating what happens with an ideal impulse (actually not strictly ideal - this one has 96kHz bandwidth) when you apply high-pass filters at 10Hz, 30Hz and 50Hz, shown in both frequency and time domains:



As you can see - not a lot of difference in the time-domain shape (waveform) of the pulse - even at this highly zoomed-in scale - but in the frequency domain the differences are very obvious!
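If anyone wants to check this numerically rather than from the graphs, here is a small self-contained Python sketch - my own toy version, using a simple first-order high-pass rather than whatever filter slopes were used in the graphs above:

```python
import math, cmath

fs = 48000
N = 4096

def one_pole_highpass(fc, x):
    """First-order high-pass: y[n] = a * (y[n-1] + x[n] - x[n-1])."""
    a = math.exp(-2 * math.pi * fc / fs)
    y, x1, y1 = [], 0.0, 0.0
    for xn in x:
        yn = a * (y1 + xn - x1)
        x1, y1 = xn, yn
        y.append(yn)
    return y

def dtft_mag(x, f):
    """Magnitude of the discrete-time Fourier transform of x at frequency f (Hz)."""
    return abs(sum(xn * cmath.exp(-2j * math.pi * f * n / fs)
                   for n, xn in enumerate(x)))

impulse = [1.0] + [0.0] * (N - 1)
for fc in (10, 30, 50):
    y = one_pole_highpass(fc, impulse)
    # time-domain pulse shape is nearly untouched; low-frequency content is gone
    print(fc, y[0], dtft_mag(y, 0), dtft_mag(y, 5000))
```

Even at the 50 Hz cutoff the first sample stays within about 1% of 1.0 (the waveform looks unchanged), while the DC magnitude is essentially zero - the same point the graphs make.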
Yes, in this case the difference would be audible as a change in tone/timbre, the time domain stuff is definitely inaudible.
So while it is true that "transients" in the recordings look a lot like impulses, that doesn't mean that they contain very low frequencies!
This is actually not correct, see below.
It is also a nice reminder why it is often said that the impulse response is not a good format for presenting data if you're interested in the low-frequency performance of a device! :)
I would agree with that, except that they do tend to reveal ringing/resonances pretty starkly.

Anyway, I took a cymbal strike from Freesound.org and loaded it up into Audacity for the spectrogram - look at the attack and notice how there's actually pretty significant energy all the way down:

There's nothing special about this recording either, it's just a cymbal, which you'd expect would have only HF content. However, the attack (to the extent it resembles an impulse - which it does pretty well, as recorded sounds go) theoretically (and apparently actually) contains all frequencies.

It seems to me that you need a sub to properly reproduce cymbals... have I gone off track somewhere?

e: FWIW just tried to see what the audible difference would be on headphones filtering out <80 Hz ... honestly not much, but it's the theory that counts?

Screen Shot 2022-11-10 at 1.08.10 PM.png
 
"fall delay" happens because nothing can start or stop moving instantaneously. "rise delay" looks pretty nonsensical. I guess the closest thing is group delay but even then...?

Yes - that's why the goal is infinitely stiff, zero mass, and completely acoustically inert.

In the real world, all things have compromises. Beryllium is toxic to mine and refine, and insanely expensive despite its exceptional behavior, and the cheaper alternatives like aluminum are both heavier and have worse breakup mode behavior.

But really, most of what makes for "fast" or "slow" bass is the cabinet behavior. How the port is tuned (if present), how well-braced and nonresonant the cabinet is, the internal volume, how much absorbent material there is, etc etc.

For example: ATC intentionally overdamps their ports - they trade off maximal LF extension for shorter decay time as they preferred that outcome in that compromise. Others like Genelec prefer maximal LF extension over shorter decay time. They're both entirely valid approaches, it ultimately boils down to which works better for your tastes.
thanks. I kind of understand the cabinet resonance part. Is there a separate story at the speaker level? I mean, are there waveform physical response differences between transducers operating within their bandwidth that make a lot of difference in perceived resolution/speed/etc.?
 
The problem with the graph above is that it gives an example with descriptions that do not reflect the physical reality and are therefore misleading.

E.g. the first picture doesn't show a "recorded deep bass sound" but an impulse which when transformed into the frequency domain via Fourier transform would have infinite spectrum width (so a lot of high frequencies too!).

Remember that any time-domain function can be transformed into an equivalent frequency (magnitude + phase) response and vice-versa. I.e. the impulse and frequency responses of the same system show the same data - just a different view of it.
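A minimal numeric illustration of that equivalence - a naive DFT in plain Python, purely for demonstration (any real use would call an FFT library):

```python
import cmath

def dft(x):
    """Naive discrete Fourier transform: time domain -> frequency domain."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    """Inverse DFT: frequency domain -> time domain."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

# a single-sample impulse contains every frequency at equal magnitude
impulse = [1.0] + [0.0] * 63
mags = [abs(c) for c in dft(impulse)]   # flat: every bin has magnitude 1

# and no information is lost going there and back - same data, two views
roundtrip = idft(dft(impulse))
```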

This can be difficult to explain in just a few words and without math - but there are e.g. some nice visual explanations on ASR and elsewhere.

In addition, it appears humans are in general far more sensitive to small frequency response magnitude differences than many phase or time domain differences. This means that in controlled blind tests people often wouldn't be able to reliably differentiate between sounds with the same frequency magnitude spectrums but different time domain function shapes (within reason, of course).

Many times the feeling of "speed" relates simply to the lack of bass resonance peaks, or less bass, or more treble, or some combination of these.

Hope this helps!

EDIT: Let me provide some references, perhaps it will be helpful to some!
  • Link to an AudioXpress article on audibility of phase with comments from Dr. Floyd Toole, Dr. Wolfgang Klippel, Andrew Jones and James Croft
  • Link to ASR post #1 - Illustrations of how high-passing or low-passing an ideal pulse impacts the signal in the frequency and time domains. Also an illustration of what the step and impulse responses look like when limited to the human hearing range (20Hz-20kHz) in an idealized case.
  • Link to ASR post #2 - Illustration of how typical loudspeaker crossover impacts the impulse and step response, comparison between a real loudspeaker response and a similar idealized simulated response.
  • Link to ASR post #3 - A very nice illustration showing what a sum of 3 sine waves of different frequencies looks like in the time and frequency domains (Fourier transform)
Thank you so much! This is exactly what I need.
 
Also keep in mind that an IR is only valid for one point in space.
Yes, every point in space will have a unique IR.

And what makes for a truly great speaker imo, is one that minimizes the IR variations over as large a listening area as possible.
 
Also keep in mind that an IR is only valid for one point in space.

Ah… no.
IR measurements describe the speaker.

Once you have the IR of the speaker, then you could basically compute the measured IR in any location in a room and the delayed echoes.
But one does not usually look at the IR of a speaker out in the (many milliseconds) range… we are looking at wall/ceiling/floor reflections out there.

I suppose one can argue that the IR varies vertically in a normal speaker, or horizontally in an MTM set up… Is that what you mean?
 
Yes, every point in space will have a unique IR.

And what makes for a truly great speaker imo, is one that minimizes the IR variations over as large a listening area as possible.
Otherwise known as good directivity behavior. If only we had a graph showing that.
 
I suppose one can argue that the IR varies vertically in a normal speaker, or horizontally in an MTM set up… Is that what you mean?
Anything other than a true point source will produce at least slightly different impulse responses at different points in space. Even without room effects.
It's just the simple geometry of a speaker having multiple drivers/sources, and how their triangulated distances to the points in space vary.
Coaxial or single driver full range should have the most consistent IR over space, but even these radiate different frequencies from different areas of their drivers.
(And not to mention diffraction, yada.)

Any variation in frequency response between on-axis and off-axis traces, gives an accompanying change in IR.
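The geometry is easy to put numbers on. A hedged sketch with made-up values (a hypothetical two-way with 15 cm centre-to-centre driver spacing, a 2 kHz crossover, and a listening point 2 m away):

```python
import math

C = 343.0  # speed of sound, m/s

def path_diff_m(driver_spacing, distance, angle_deg):
    """Extra path length from the lower driver vs the upper one, for a listening
    point at `distance` metres, `angle_deg` above the midpoint between drivers."""
    a = math.radians(angle_deg)
    # listening point coordinates relative to the midpoint between the drivers
    px, py = distance * math.cos(a), distance * math.sin(a)
    d_top = math.hypot(px, py - driver_spacing / 2)
    d_bot = math.hypot(px, py + driver_spacing / 2)
    return d_bot - d_top

spacing, dist, f_xo = 0.15, 2.0, 2000.0  # hypothetical two-way, 2 kHz crossover
for ang in (0, 15, 30):
    dd = path_diff_m(spacing, dist, ang)
    phase = 360.0 * dd * f_xo / C  # inter-driver phase shift at the crossover
    print(ang, round(dd * 1000, 2), round(phase, 1))
```

On axis the path difference is zero, but at 30° vertical the ~75 mm extra path works out to an inter-driver phase shift on the order of 150-160° at the crossover - close to a null, which is exactly why the IR (and FR) changes off-axis.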
 
This is usually right, but since we're talking about the timbre of a transient and not two or more distinct sounds, I am less sure of how masking will play out.
The extent to which masking plays a role will definitely depend on the specific case - I just thought it was worth mentioning in this context.

This is actually not correct, see below.
My statement was meant to say that by just eyeballing the waveform of the pulse in the time domain we usually can't reliably tell whether or not it contains significant low frequency data. Notice that I never said that pulses in general can't/don't contain low bass since obviously some do - e.g. the 'ideal' impulse in my own examples.

I would agree with that, except that they do tend to reveal ringing/resonances pretty starkly.
Just note that such strong resonances show up in frequency magnitude responses as well - as large peaks. See the very nice illustration by @NTK in post #99.

Anyway, I took a cymbal strike from Freesound.org and loaded it up into Audacity for the spectrogram - look at the attack and notice how there's actually pretty significant energy all the way down:

There's nothing special about this recording either, it's just a cymbal, which you'd expect would have only HF content. However, the attack (to the extent it resembles an impulse - which it does pretty well, as recorded sounds go) theoretically (and apparently actually) contains all frequencies.

It seems to me that you need a sub to properly reproduce cymbals... have I gone off track somewhere?

e: FWIW just tried to see what the audible difference would be on headphones filtering out <80 Hz ... honestly not much, but it's the theory that counts?

Screen Shot 2022-11-10 at 1.08.10 PM.png

Let me start by saying that I appreciate that you provided an example and reference :)
I don't think you are off track in principle - IMO the question is only how much energy is in that part of the spectrum in the complete recording as that will in a large part contribute to how audibly significant it is. In some recordings it is definitely relevant, but many recordings are quite lacking in sub-bass energy so I can somewhat understand people who state that e.g. accurate reproduction of the sub 40Hz octave is less important for music - even if I don't share their sentiment :D
 
Otherwise known as good directivity behavior. If only we had a graph showing that.
Agreed.
Although there are some pretty good directivity plots to be found imo.
 
Ah… no.
IR measurements describe the speaker.

Once you have the IR of the speaker, then you could basically compute the measured IR in any location in a room and the delayed echoes.
But one does not usually look at the IR of a speaker out in the (many milliseconds) range… we are looking at wall/ceiling/floor reflections out there.

I suppose one can argue that the IR varies vertically in a normal speaker, or horizontally in an MTM set up… Is that what you mean?

To tack on to this, IRs are often recorded in spaces because you can use them as a convolution filter to simulate the reverberation as an effect on other audio. Take an IR in a church and now you have "church reverb" for your DAW. I wrote an article on the technique for Electronic Musician back in 2005 I think it was.

For that, you want a truly full range, flat, omnidirectional speaker, or you just shoot a starter pistol or pop a balloon. Turns out the cops don't love the starter pistol approach, though.
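For the curious, the convolution trick itself is tiny - here is a toy sketch with a made-up three-echo "room" IR (any real convolution-reverb implementation would use FFT-based convolution for speed):

```python
def convolve(dry, ir):
    """Direct convolution: each IR tap adds a scaled, delayed copy of the dry signal."""
    out = [0.0] * (len(dry) + len(ir) - 1)
    for i, d in enumerate(dry):
        for j, h in enumerate(ir):
            out[i + j] += d * h
    return out

# toy "church" IR: direct sound plus two decaying echoes (made-up numbers)
ir = [1.0, 0.0, 0.0, 0.5, 0.0, 0.0, 0.25]
dry = [1.0, -1.0]  # a short dry click
wet = convolve(dry, ir)
```

The wet signal is the dry click repeated at each echo's delay and level - which is why an IR captured in a church gives you "church reverb" when convolved with any other recording.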
 
Ah… no.
IR measurements describe the speaker.

Once you have the IR of the speaker, then you could basically compute the measured IR in any location in a room and the delayed echoes.
But one does not usually look at the IR of a speaker out in the (many milliseconds) range… we are looking at wall/ceiling/floor reflections out there.

I suppose one can argue that the IR varies vertically in a normal speaker, or horizontally in an MTM set up… Is that what you mean?
As a non-speaker-designer with limited experience measuring them recently, I am a bit curious about this. IME, recognizing this is not my day job and it has been a while since I did this, if I measure the impulse response at different points around the speaker, I get different results, as the physical path length from the various drivers varies and the crossover does not completely compensate. I generally got worse IR as I moved further off-axis (left-right or above-below). I think that is your last line? Is that relevant as a listener? I thought so but am not sure of your argument.
 
The extent to which masking plays a role will definitely depend on the specific case - I just thought it was worth mentioning in this context.


My statement was meant to say that by just eyeballing the waveform of the pulse in the time domain we usually can't reliably tell whether or not it contains significant low frequency data. Notice that I never said that pulses in general can't/don't contain low bass since obviously some do - e.g. the 'ideal' impulse in my own examples.


Just note that such strong resonances show up in frequency magnitude responses as well - as large peaks. See the very nice illustration by @NTK in post #99.



Let me start by saying that I appreciate that you provided an example and reference :)
I don't think you are off track in principle - IMO the question is only how much energy is in that part of the spectrum in the complete recording as that will in a large part contribute to how audibly significant it is. In some recordings it is definitely relevant, but many recordings are quite lacking in sub-bass energy so I can somewhat understand people who state that e.g. accurate reproduction of the sub 40Hz octave is less important for music - even if I don't share their sentiment :D
I guess overall we're in agreement however, that in principle you always need a subwoofer to reproduce all musical content. Whether it's totally audible or not, will vary and it's a lot more questionable than being able to hear the bottom notes of a pipe organ.
 
As a non-speaker-designer with limited experience measuring them recently, I am a bit curious about this. IME, recognizing this is not my day job and it has been a while since I did this, if I measure the impulse response at different points around the speaker, I get different results, as the physical path length from the various drivers varies and the crossover does not completely compensate. I generally got worse IR as I moved further off-axis (left-right or above-below). I think that is your last line? Is that relevant as a listener? I thought so but am not sure of your argument.
The frequency response of the speaker varies at different positions in space due to a myriad of reasons such as driver breakup, driver self-interference (beaming) and of course the frequency response of all the different radiating elements. The variable path length between drivers at different points in space makes them play slightly in or out of phase compared to the reference axis - at some angles/frequencies, you will even have nulls where radiation between drivers is cancelled out.

All of this will be captured by the impulse response, which, as many have tried to emphasize, contains very little information that is not much better shown in a FR graph, and then ideally collated in a directivity sonogram or spinorama.

If you want to scrutinize the 'time' behavior of a speaker the IR would not be my choice of diagram anyway. A step response or ETC is clearer to me; just keep in mind the looming influence of the room.

For some reason, we normally look at the frequency response of a speaker, and the ETC of a room. In theory we could look at the FR of a room, but I'm not sure what conventions exist to communicate that.
 
The frequency response of the speaker varies at different positions in space due to a myriad of reasons such as driver breakup, driver self-interference (beaming) and of course the frequency response of all the different radiating elements. The variable path length between drivers at different points in space makes them play slightly in or out of phase compared to the reference axis - at some angles/frequencies, you will even have nulls where radiation between drivers is cancelled out.

All of this will be captured by the impulse response, which, as many have tried to emphasize, contains very little information that is not much better shown in a FR graph, and then ideally collated in a directivity sonogram or spinorama.

If you want to scrutinize the 'time' behavior of a speaker the IR would not be my choice of diagram anyway. A step response or ETC is clearer to me; just keep in mind the looming influence of the room.

For some reason, we normally look at the frequency response of a speaker, and the ETC of a room. In theory we could look at the FR of a room, but I'm not sure what conventions exist to communicate that.
Thanks, I got that... What confused me was this:
Once you have the IR of the speaker, then you could basically compute the measured IR in any location in a room and the delayed echoes.
But one does not usually look at the IR of a speaker out in the (many milliseconds) range… we are looking at wall/ceiling/floor reflections out there.
I think it is semantics tripping me up. If you measure the IR at a point directly in front of the speaker, it does not follow to me that it predicts the response at a different point in the room, precisely because of all the things you mentioned. At any rate I feel my lack of understanding is a side issue and I'll try to refrain from interrupting again.
 
Once you have the IR of the speaker, then you could basically compute the measured IR in any location in a room and the delayed echoes.

Not quite, you also need the IR of the room at the location in question.

An IR tells you what frequency and phase you get from the equipment being used to generate it, at one location, including everything.

The "IR of a speaker" is usually assumed to be the anechoic response, either nearfield or inside 1m.

Once you add reflected sound, you are talking about a completely new IR that involves both the speaker AND the room.

Often the IRs for speaker measurements are "time gated" in that any reflected sound arrives after the time gate - like 6ms or so.

If not time gated, you are getting the IR of everything present during the recording, from the speaker to the back wall, and back to the mic.
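A toy illustration of time gating, with an invented IR (direct sound plus a single "floor bounce" at 8 ms, 48 kHz sample rate, all numbers made up); gating at 6 ms keeps the speaker's direct sound and discards the reflection:

```python
import math

fs = 48000

def gate(ir, gate_ms):
    """Zero everything after the gate time, keeping only the direct sound."""
    n_gate = int(fs * gate_ms / 1000)
    return ir[:n_gate] + [0.0] * (len(ir) - n_gate)

# toy measured IR: a decaying 3 kHz "speaker" response plus a floor bounce at 8 ms
n = int(fs * 0.02)
speaker = [math.exp(-t / 48) * math.cos(2 * math.pi * 3000 * t / fs)
           for t in range(n)]
bounce_at = int(fs * 0.008)
room_ir = [s + (0.4 * speaker[t - bounce_at] if t >= bounce_at else 0.0)
           for t, s in enumerate(speaker)]

gated = gate(room_ir, 6.0)  # 6 ms gate: reflection at 8 ms is removed
```

The trade-off, of course, is that a 6 ms gate also throws away genuine low-frequency information, which is one reason gated measurements lose resolution in the bass.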
 
As a non-speaker-designer with limited experience measuring them recently, I am a bit curious about this. IME, recognizing this is not my day job and it has been a while since I did this, if I measure the impulse response at different points around the speaker, I get different results as the physical path length from the various driver various and the crossover does not completely compensate. I generally got worse IR as I moved further off-axis (left-right or above-below). I think that is your last line? Is that relevant as a listener? I thought so but am not sure your argument.

Well @DonH56 - I was assuming that maybe it was @617's argument?
And most people find that toe in changes the sound.

I was not sure if the argument of @617 was that the room reflections confound the IR.


All of this will be captured by the impulse response, which, as many have tried to emphasize, contains very little information that is not much better shown in a FR graph, and then ideally collated in a directivity sonogram or spinorama.

If one speaker has an IR that differs from another's, with both having the same FR, then that will make for a pair that will not sound like a stereo pair.
Imagine one tweeter going negative on the leading edge of the IR… and the other going positive.
They will sum to zero.



If you want to scrutinize the 'time' behavior of a speaker the IR would not be my choice of diagram anyway. A step response or ETC is clearer to me; just keep in mind the looming influence of the room.
...

Both the step response and the IR are conveying a similar thing.
The directivity and FR give us the spatial and frequency components to understand the speaker in an amplitude sense.

IR and step function give us time domain and phase,
And GD gives more of a delay-versus-frequency behaviour, so it is on the order of the IR and step function in that it is a time-domain presentation, but also with respect to frequency... The waterfall is similar to GD in showing time-domain decay versus frequency, and is mostly reserved for spotting cabinet resonances.

If I am going to look at spinorama, I might as well look at step response and IR too.

Then there is also Dirac Live, which is generally IR and GD (phase EQ) correction.


Thanks, I got that... What confused me was this:

I think it is semantics tripping me up. If you measure the IR at a point directly in front of the speaker, it does not follow to me that it predicts the response at a different point in the room, precisely because of all the things you mentioned. At any rate I feel my lack of understanding is a side issue and I'll try to refrain from interrupting again.

I think that the interruptions generally help to clarify what we are trying to convey.
 
Not quite, you also need the IR of the room at the location in question.

An IR tells you what frequency and phase you get from the equipment being used to generate it, at one location, including everything.

The "IR of a speaker" is usually assumed to be the anechoic response, either nearfield or inside 1m.

Once you add reflected sound, you are talking about a completely new IR that involves both the speaker AND the room.


Often the IRs for speaker measurements are "time gated" in that any reflected sound arrives after the time gate - like 6ms or so.

If not time gated, you are getting the IR of everything present during the recording, from the speaker to the back wall, and back to the mic.
That was not me, I was editing a bad quote when you grabbed that, I think. The bolded parts are what I think I was misunderstanding.

There is a similar problem with any radiator so I have "related" experience though antenna design is not my day job either... I have in the primordial past toyed with the idea of using DSP and a multitude of drivers to provide dispersion control much like an active antenna array but other than a few samples for play never built anything substantial. Now there are commercial products that do it.
 