
Step Response: Does It Really Matter?

pma

Major Contributor
Joined
Feb 23, 2019
Messages
4,616
Likes
10,803
Location
Prague
This has nothing to do with the time resolution of PCM or subsample delays. It may describe (if the detail is there) how to calculate acoustic impedance.

I am not arguing about anything; it was my first post in this thread. (OK, you edited the post.) I was just interested in whether you knew the author, since you mentioned the acoustical impedance measurement. The world is not limited to PCM subsample delays.

https://www.aes.org/e-lib/browse.cfm?elib=6008
 

j_j

Major Contributor
Audio Luminary
Technical Expert
Joined
Oct 10, 2017
Messages
2,282
Likes
4,792
Location
My kitchen or my listening room.
I am not arguing about anything; it was my first post in this thread. (OK, you edited the post.) I was just interested in whether you knew the author, since you mentioned the acoustical impedance measurement. The world is not limited to PCM subsample delays.


Sorry, I confused posters there for a minute.

I do not know that work (well, I can't read the actual paper), but I do know what I suspect are very similar results described in English (sorry, I'm your typical monolingual American).
 

pma

Major Contributor
Joined
Feb 23, 2019
Messages
4,616
Likes
10,803
Location
Prague
Sorry, I confused posters there for a minute.

I do not know that work (well, I can't read the actual paper), but I do know what I suspect are very similar results described in English (sorry, I'm your typical monolingual American).

That's fine :). Now back to interchannel subsample delays, to stay on-topic. Yes, the resolution is great and "almost" unlimited. I just made a quick measurement with one channel delayed against the other by 20 metres of link cable (in fact the propagation delay plus the R(out)*C(cable) time constant, about 1 µs in total). I did similar experiments years ago, so I am not surprised; from the analog view of the signals there is not much difference. (Soundcard output, simultaneous sampling, Fs = 48 kHz, one channel recorded back to the soundcard through a 1 m loop, the second through 20 m. The recorded file was captured by scope from the soundcard analog output; the blue channel corresponds to 1 m, the red to 20 m.)

[Image: 20mcable_delay.png - scope capture of the 1 m (blue) vs 20 m (red) channels]
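A numerical aside (this is not pma's scope-based method, just an illustrative sketch with assumed names and sizes): once the signal is sampled, a delay far below one sample period falls straight out of the slope of the cross-spectrum phase. The 48 kHz rate and 1 µs delay mirror the experiment above.

```python
import numpy as np

def delay_by_phase_slope(a, b, fs):
    """Estimate the delay of b relative to a, in seconds, from the
    slope of the cross-spectrum phase (assumes b is simply a delayed
    copy of a, with the delay well below the record length)."""
    S = np.fft.rfft(a) * np.conj(np.fft.rfft(b))
    f = np.fft.rfftfreq(len(a), d=1.0 / fs)
    phase = np.unwrap(np.angle(S))
    f, phase = f[1:-1], phase[1:-1]          # skip the DC and Nyquist bins
    # least-squares line through the origin: phase ~= 2*pi*f*delay
    return np.sum(f * phase) / (2 * np.pi * np.sum(f * f))

fs = 48_000
rng = np.random.default_rng(1)
a = rng.standard_normal(4096)
freqs = np.fft.rfftfreq(4096, d=1.0 / fs)
# simulate a 1 us interchannel delay (about 1/21 of a sample at 48 kHz)
b = np.fft.irfft(np.fft.rfft(a) * np.exp(-2j * np.pi * freqs * 1e-6))
print(delay_by_phase_slope(a, b, fs))        # very close to 1e-06
```

The estimate lands within a tiny fraction of the true 1 µs: interchannel time resolution of sampled audio is set by bandwidth and SNR, not by the sample period.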
 

KSTR

Major Contributor
Joined
Sep 6, 2018
Messages
2,799
Likes
6,263
Location
Berlin, Germany
@pkane's DeltaWave does a good job with subsample shifts. Shifting a file first by 0.333 samples and then a second time by 0.667 gives a 1.000-sample total shift. After rotating left by one sample, the difference against the original file is essentially an infinite null, except for some Nyquist (fs/2) ripple, with peaks every 512 samples (the FFT block size used for shifting, it seems).
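For anyone wanting to reproduce this without DeltaWave (whose internals are not shown here), a plain FFT phase-ramp shifter behaves the same way; names and sizes below are illustrative. It even reproduces the fs/2 ripple: a real signal has no independent phase at the Nyquist bin, so a fractional phase ramp necessarily corrupts that one bin.

```python
import numpy as np

def fft_shift(x, frac):
    """Circularly shift x by `frac` samples (fractional allowed)
    using a linear phase ramp in the frequency domain."""
    f = np.fft.rfftfreq(len(x))                    # 0 .. 0.5 cycles/sample
    return np.fft.irfft(np.fft.rfft(x) * np.exp(-2j * np.pi * f * frac),
                        n=len(x))

rng = np.random.default_rng(0)
x = rng.standard_normal(4096)
y = fft_shift(fft_shift(x, 0.333), 0.667)          # 0.333 + 0.667 = 1.000
resid = np.roll(y, -1) - x                         # rotate left, difference
# the null is extremely deep; what survives flips sign every sample,
# i.e. it sits exactly at fs/2
print(np.max(np.abs(resid)))
```

The surviving residual comes entirely from the mangled Nyquist bin, which is one plausible source of the ripple observed above.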
 

j_j

Major Contributor
Audio Luminary
Technical Expert
Joined
Oct 10, 2017
Messages
2,282
Likes
4,792
Location
My kitchen or my listening room.
@pkane's DeltaWave does a good job with subsample shifts. Shifting a file first by 0.333 samples and then a second time by 0.667 gives a 1.000-sample total shift. After rotating left by one sample, the difference against the original file is essentially an infinite null, except for some Nyquist (fs/2) ripple, with peaks every 512 samples (the FFT block size used for shifting, it seems).

There are a lot of ways to do subsample or variable-length resampling. DC and Nyquist present issues for real signals when using FFT methods. Proakis' method of oversampling and filtering in the oversampled domain is pretty much bulletproof, and gives you approximately an SNR of n^4, where n is the oversampling rate. 64 is almost bulletproof, and 128 pretty much exceeds any sort of real signal. At that rate, you can simply use linear interpolation between the two nearest samples at 128 times the sampling rate. Of course, only calculate the samples you actually need.
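A sketch of the oversample-and-interpolate approach, using scipy's polyphase resampler as a stand-in for the filtering step (the function name and the 128x default are illustrative). The n^4 figure corresponds to linear-interpolation error amplitude falling as 1/n^2, hence error power as 1/n^4.

```python
import numpy as np
from scipy.signal import resample_poly

def fractional_delay(x, d, ratio=128):
    """Delay x by d samples (fractional) the oversampling way:
    polyphase-filtered upsampling by `ratio`, then linear interpolation
    between the two nearest high-rate samples, evaluated only at the
    output instants actually needed."""
    hi = resample_poly(x, ratio, 1)           # filtered oversampling
    pos = (np.arange(len(x)) - d) * ratio     # high-rate read positions
    i = np.floor(pos).astype(int)
    frac = pos - i
    y = np.zeros_like(x)
    ok = (i >= 0) & (i + 1 < len(hi))         # ignore the edges
    y[ok] = (1 - frac[ok]) * hi[i[ok]] + frac[ok] * hi[i[ok] + 1]
    return y

n = np.arange(4096)
x = np.sin(2 * np.pi * 0.05 * n)              # tone at 0.05 * fs
y = fractional_delay(x, 0.4)                  # delay by 0.4 samples
# away from the edges this tracks the analytically delayed tone closely
```

resample_poly also handles the gain and group-delay compensation of the anti-imaging filter, which a hand-rolled chain must do itself.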
 

KSTR

Major Contributor
Joined
Sep 6, 2018
Messages
2,799
Likes
6,263
Location
Berlin, Germany
So, in a practical example using Adobe Audition, one would upsample to 64x without a filter (as that would impose an FR change), then calculate the required shift count and implement the remaining fractional shift (in upsampled sample periods) by linear interpolation, then downsample again without filtering?
I just tried it (without the interpolation, just shifting the 64x data by 7 samples) but got odd results: the output is polarity-inverted and reduced in level by some 11 dB.
 

KSTR

Major Contributor
Joined
Sep 6, 2018
Messages
2,799
Likes
6,263
Location
Berlin, Germany
^ Probably not, but when filtering is applied (at the upsampling and/or the downsampling stage), I get the expected HF drop near fs/2 from the filter, which is not what we want.
At upsampling, Audition applies a filter to remove the images above fs/2 even with no pre/post filter enabled, which seems correct.
I might try SoX and see if I get better results.
 

j_j

Major Contributor
Audio Luminary
Technical Expert
Joined
Oct 10, 2017
Messages
2,282
Likes
4,792
Location
My kitchen or my listening room.
So, in a practical example using Adobe Audition, one would upsample to 64x without a filter (as that would impose an FR change), then calculate the required shift count and implement the remaining fractional shift (in upsampled sample periods) by linear interpolation, then downsample again without filtering?
I just tried it (without the interpolation, just shifting the 64x data by 7 samples) but got odd results: the output is polarity-inverted and reduced in level by some 11 dB.

You must always filter. Always. See Proakis' signal processing paper. Of course, if your upsampler filtered, you're set.

Also remember to deal properly with gain when upsampling and downsampling. But there should be no absolute polarity inversion. If you used a mid-frequency sine wave, the amplitude should still be preserved; I'm not sure offhand what's gone wrong there. The process works like a champ, that part I do know. Any "inversion" would be due strictly to the time delay putting you at a phase where the signal is negative, rather than it "inverting" per se. For partial samples of delay this should not be possible; consider the period of the sine wave at fs/2.
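The gain bookkeeping is easy to demonstrate: zero-insertion upsampling spreads the original energy over L times as many samples, so the mean power drops by a factor of L (and the post-filter amplitude by L) until something in the chain scales it back. A toy check, with L = 4 as an arbitrary choice; library resamplers such as scipy's resample_poly apply this scaling for you:

```python
import numpy as np

L = 4
x = np.sin(2 * np.pi * 0.01 * np.arange(256))
up = np.zeros(len(x) * L)
up[::L] = x                  # zero-insertion: every Lth sample is x, rest 0
# mean power of the zero-stuffed signal is 1/L of the original, so the
# ratio below comes out as exactly L; the anti-imaging filter must be
# scaled by L (or applied with gain L) to restore the original amplitude
print(np.mean(x ** 2) / np.mean(up ** 2))    # -> 4.0
```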
 

KSTR

Major Contributor
Joined
Sep 6, 2018
Messages
2,799
Likes
6,263
Location
Berlin, Germany
It turns out that Audition cannot handle extreme factors like 64x, which were not foreseen by the developers, who in general did an excellent job. CoolEdit/Audition was for a long time the only editor that got 99% of things right, and it still doesn't have much competition these days.

The polarity-inversion bug pops up somewhere above 16x (I didn't check at which exact integer multiple of fs it happens). The level reduction only happens when no pre-filtering is applied in the downsampling. As the upsampled data is already filtered, I had assumed that downsampling without filtering would be simple decimation, but for some reason this breaks...

Sidenote closed, now back to topic.
 

ctrl

Major Contributor
Forum Donor
Joined
Jan 24, 2020
Messages
1,633
Likes
6,241
Location
.de, DE, DEU
I think the step response is useful: how long it takes to reach 0 and what % of the maximum it later reaches show whether a speaker is precise and can produce good stereo width or not.
Sorry, but this is not correct either!
Let's consider two ideal wideband drivers (minimum phase, no excess phase) with the following step responses:
A) [step response of driver A] B) [step response of driver B]
For driver A, the voice coil comes to rest after about 2-3 ms.
For driver B, the damped oscillation has still not settled after 4 ms.

The differences are caused by the different frequency ranges of drivers A and B:
A) [frequency response of driver A] B) [frequency response of driver B]

Only if the frequency ranges of the drivers were completely identical could one draw conclusions about their decay behaviour from the step response. This is unlikely to be the case in reality when comparing loudspeakers.

To make a statement about the decay behaviour of a loudspeaker (to get a "value if a speaker is precise"), the measured impulse response is converted into the cumulative spectral decay (with a time-based or period-based axis).
This makes it much easier to judge the decay behaviour of loudspeakers - provided that reflections are suppressed by gating.
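A minimal sketch of that conversion, with assumed analysis choices (slice spacing, FFT size, and a one-sided Hann taper of the tail); real CSD tools differ in these details:

```python
import numpy as np

def csd(ir, fs, n_slices=30, step=1e-4, nfft=4096):
    """Cumulative spectral decay: magnitude spectra of the impulse
    response with its start progressively advanced, so each slice
    shows what is still ringing after that much time."""
    slices = []
    for k in range(n_slices):
        start = int(round(k * step * fs))
        seg = ir[start:start + nfft].copy()
        seg *= np.hanning(2 * len(seg))[len(seg):]   # taper the far end
        slices.append(20 * np.log10(np.abs(np.fft.rfft(seg, nfft)) + 1e-12))
    return np.array(slices)                          # (n_slices, nfft//2 + 1)

# synthetic driver: a resonance at 2 kHz decaying with a 1 ms time constant
fs = 48_000
t = np.arange(8192) / fs
ir = np.exp(-t / 1e-3) * np.sin(2 * np.pi * 2000 * t)
waterfall = csd(ir, fs)    # the resonance ridge sinks from slice to slice
```

Plotting the rows of `waterfall` against frequency gives the familiar waterfall plot; with a period-based rather than time-based slice axis, low-frequency decay is judged over oscillation periods instead of milliseconds, as mentioned above.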


The same applies to evaluating multi-way loudspeakers on the basis of the step response.
Let's consider two ideal two-way loudspeakers with identical filter order, but one with a crossover frequency of 5 kHz (C) and one of 1.5 kHz (D):
C) [step response of speaker C] D) [step response of speaker D]
I don't know what the sources for your conclusions were, but if someone wants to tell you in the future that speaker C is more precise and can reproduce transients better because, for example, its woofer "reaches the maximum" faster in the step response, you know that this is nonsense.
Due to the lower crossover frequency of speaker D, the oscillation periods of its woofer ("starting" at 1.5 kHz) are longer, hence the slower rise and the "later" maximum.


...produce good stereo width or not
On the subject of stereo phantom image, width, elevation, spaciousness, envelopment, ... you will find much useful information in Floyd Toole's book "Sound Reproduction" ... and no, you cannot infer the stereo image width established by loudspeakers from the step response.
 

pkane

Master Contributor
Forum Donor
Joined
Aug 18, 2017
Messages
5,726
Likes
10,424
Location
North-East
There are a lot of ways to do subsample or variable-length resampling. DC and Nyquist present issues for real signals when using FFT methods. Proakis' method of oversampling and filtering in the oversampled domain is pretty much bulletproof, and gives you approximately an SNR of n^4, where n is the oversampling rate. 64 is almost bulletproof, and 128 pretty much exceeds any sort of real signal. At that rate, you can simply use linear interpolation between the two nearest samples at 128 times the sampling rate. Of course, only calculate the samples you actually need.

DeltaWave uses the FFT method.
 

FeddyLost

Addicted to Fun and Learning
Joined
May 24, 2020
Messages
752
Likes
543
I would not like to be a prophet of phase linearity, but I'd say that for decent active monitors or any high-level digital speakers it is currently a must. It can be done at almost no additional cost, so why not?
For passive consumer speakers it is a great quality, but only after all other properties are as good as they get. The classic approach is too "expensive" for the mass market and not very valuable in the typical use case.
For cost-no-object high-end speakers it might be beneficial in direct comparison, but again only after all other qualities. It will not overcome bad FR or high distortion.
 

thewas

Master Contributor
Forum Donor
Joined
Jan 15, 2020
Messages
6,904
Likes
16,937
I would not like to be a prophet of phase linearity, but I'd say that for decent active monitors or any high-level digital speakers it is currently a must. It can be done at almost no additional cost, so why not?
Linear phase including the bass means high latency, which is often not acceptable for monitoring or even for enjoying combined video and audio.
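To put a number on that latency: a linear-phase FIR delays everything by half its length, and shaping phase down into the bass needs a long filter. The sizing rule below (about two cycles of the lowest controlled frequency) is an illustrative assumption, not a standard:

```python
fs = 48_000
# a linear-phase FIR has a flat group delay of (n_taps - 1) / 2 samples
n_taps = 2 * fs // 20 + 1                # span ~2 cycles of 20 Hz: 4801 taps
latency_ms = (n_taps - 1) / 2 / fs * 1e3
print(latency_ms)                        # ~50 ms of pure delay
```

Fifty milliseconds is far beyond lip-sync tolerance and hopeless for live monitoring, which is the cost referred to above; restricting linearization to, say, 300 Hz and up shrinks the filter, and hence the delay, by over an order of magnitude.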
 

ernestcarl

Major Contributor
Joined
Sep 4, 2019
Messages
3,113
Likes
2,330
Location
Canada
Linear phase including the bass means high latency, which is often not acceptable for monitoring or even for enjoying combined video and audio.

The time cost of linearizing the bass is not worth it in many applications, but "300-ish" or 500 Hz to 20 kHz can be quite acceptable (2.5 - 20 ms). Offsetting the tweeter back is another way to reduce the time delay -- for example, the Presonus Sceptre S8 and Meyer Sound Amie horn designs have their tweeters positioned much further back than in many other monitors -- I don't know how much delay that practically saves, but it surely saves some extra DSP processing -- maybe.

I've been experimenting with different convolution settings to try to reduce this latency... and I found multichannel convolution significantly more processor-intensive as you add more channels -- where phase "correction" is arguably more important/useful for system "optimization", because the multiple speakers being summed may have different phase curves/profiles.
 

j_j

Major Contributor
Audio Luminary
Technical Expert
Joined
Oct 10, 2017
Messages
2,282
Likes
4,792
Location
My kitchen or my listening room.
Alternatively you can simply compensate each driver, and the crossover as well, digitally, using DSP and it all works like a charm.
 

FeddyLost

Addicted to Fun and Learning
Joined
May 24, 2020
Messages
752
Likes
543
Linear phase including the bass means high latency, which is often not acceptable for monitoring or even for enjoying combined video and audio.
A low-latency mode like in the Kii for live monitoring (exact phase is not so important there), lip-sync for the consumer AV case.
I mean it might be one of the available presets, for example.
Or limit phase linearisation to, e.g., the midbass, where pinpoint localisation is at least troublesome for our ear-brain and a phase error will not really matter.

The time cost of linearizing the bass is not worth it in many applications
In a critical application like a mastering studio it will be done as a complete system with subwoofers and delay lines.
When we talk about a single active speaker or even a soundbar, we should not forget the cost-to-benefit analysis.
A typical compact 2-way monitor compromises the bass so much (oh, and the room!) that linear phase in the bass is no more than lipstick on a pig...
 

thewas

Master Contributor
Forum Donor
Joined
Jan 15, 2020
Messages
6,904
Likes
16,937
A low-latency mode like in the Kii for live monitoring (exact phase is not so important there), lip-sync for the consumer AV case.
I mean it might be one of the available presets, for example.
Yes, as a mode it surely can be done; my reply above explained why it has a cost/disadvantage when fully applied.
Or limit phase linearisation to, e.g., the midbass, where pinpoint localisation is at least troublesome for our ear-brain and a phase error will not really matter.
On the other hand, unfortunately, the biggest audible differences for non-pathological constructions are in the bass region, where group delay can exceed known psychoacoustic limits.
 

thewas

Master Contributor
Forum Donor
Joined
Jan 15, 2020
Messages
6,904
Likes
16,937
Alternatively you can simply compensate each driver, and the crossover as well, digitally, using DSP and it all works like a charm.
Drivers are still better placed geometrically so that their sound-generation centres have no significant distance differences, since correcting the differences with a DSP delay works only for one angle and compromises the generated radiation pattern and the reflected sound.
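The one-angle limitation is simple far-field geometry. A sketch with an assumed 30 mm depth offset between the drivers' acoustic centres:

```python
import numpy as np

c = 343.0                    # speed of sound, m/s
offset = 0.03                # tweeter 30 mm ahead of the woofer (assumed)
dsp_delay = offset / c       # the delay that aligns the drivers on-axis
for angle in (0, 15, 30, 45):
    # in the far field the effective depth difference shrinks with the
    # cosine of the off-axis angle, so a fixed delay over-corrects there
    residual_us = (dsp_delay - offset * np.cos(np.radians(angle)) / c) * 1e6
    print(f"{angle:2d} deg: residual {residual_us:5.1f} us")
```

On this model the alignment that is exact on-axis is about 12 µs off at 30 degrees, roughly 8 degrees of phase at a 2 kHz crossover, and it grows from there, which is why the physical placement matters.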
 