Does Phase Distortion/Shift Matter in Audio? (no*)

If the crossover frequency is not 1 kHz, then the Phase vs. Frequency curve will have the same shape, except that it will shift left or right so that the inflection point is at the new crossover frequency.
Yes, I'm aware. Like I said before, each data point in this plot was calculated with both the all pass and ERB centered at the frequency plotted. In other words, at 300Hz, both the all pass and ERB are centered at 300Hz. At 1kHz, both are centered at 1kHz. At 3kHz, both are centered at 3kHz. (And so on...)

As you say, the bandwidth in octaves changes with frequency (lower frequencies being wider), which is why the phase rotation over the ERB decreases with increasing frequency.

Here's another plot which shows the (instantaneous) phase rotation per ERB for some individual crossover frequencies:
[Attached plot: lr4_phase_per_erb_discrete.png]

You can see that the peaks approximately match the curve shown in the other plot.
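
For anyone who wants to reproduce the gist of that plot, here is a minimal sketch of the calculation as I understand it. It assumes the summed LR4 response is equivalent to a 2nd-order all-pass with Butterworth Q, uses the Glasberg & Moore ERB formula, and takes the phase change across one ERB centred on the crossover frequency. Function names and the evaluation grid are my own, not necessarily what was used for the plot.

```python
# Sketch only: assumes the LR4 sum behaves as a 2nd-order all-pass (Butterworth
# Q = 1/sqrt(2)) and uses the Glasberg & Moore ERB formula.
import numpy as np
from scipy.signal import freqs

def erb_hz(fc):
    """Equivalent rectangular bandwidth (Hz) at centre frequency fc (Glasberg & Moore)."""
    return 24.7 * (4.37 * fc / 1000.0 + 1.0)

def lr4_allpass_phase_deg(f_hz, fc, q=1 / np.sqrt(2)):
    """Unwrapped phase (degrees) of the all-pass equivalent to an LR4 crossover at fc."""
    w0 = 2 * np.pi * fc
    b = [1.0, -w0 / q, w0 ** 2]   # numerator:   s^2 - (w0/Q)s + w0^2
    a = [1.0,  w0 / q, w0 ** 2]   # denominator: s^2 + (w0/Q)s + w0^2
    _, h = freqs(b, a, worN=2 * np.pi * np.asarray(f_hz, dtype=float))
    return np.degrees(np.unwrap(np.angle(h)))

def phase_rotation_per_erb(fc):
    """Phase change (degrees) across one ERB centred on fc, for a crossover also at fc."""
    bw = erb_hz(fc)
    f = np.linspace(fc - bw / 2, fc + bw / 2, 64)   # dense grid for robust unwrapping
    ph = lr4_allpass_phase_deg(f, fc)
    return abs(ph[-1] - ph[0])

for fc in (300, 1000, 3000):
    print(f"{fc:5d}Hz: {phase_rotation_per_erb(fc):5.1f} degrees per ERB")
```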
 
I thought what he was saying in response to the question was that phase shift is an issue if it happens within the human hearing range, and that is why devices have higher bandwidths. In devices with a wide bandwidth, the frequency roll-off and subsequent phase issues happen above human hearing, so the device can produce a linear response within human hearing with no audible phase shifting.

Jumping off from that, marketing takes this to an unnatural extreme. My experience is that downsampling filters have gotten good enough at the consumer level that sample rates above 48kHz aren't needed to produce incredible fidelity. Combine that with good performances, quality instruments and quality microphones, and you have a recipe for an amazing end result when the production process is done (assuming no one makes any big production mistakes).

I used to have an M-Audio Firewire 18/14 that had terrible sound at 44.1kHz, but when I switched to 88.2kHz, it sounded excellent. I would then downsample recordings afterward with software, and there was no difference (subjectively, that I could tell) between 44.1 and 88.2kHz. I ultimately found 48kHz to be what sounded great without needing to fiddle with downsampling as a separate process until the very end of production. I figured there was a downsample filter issue in the device itself, since 44.1kHz should sound just as good as higher sampling rates, as evidenced by Compact Discs, which are still considered lossless audio today and are 44.1kHz. I complained about this to M-Audio, to no avail at the time.
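
For what it's worth, that "downsample afterward with software" step is trivially scriptable these days. A minimal sketch, assuming scipy's polyphase resampler and the soundfile library (the file names are made up):

```python
# Minimal sketch of downsampling an 88.2kHz capture to 44.1kHz offline.
# The use of soundfile and these file names are illustrative assumptions.
import soundfile as sf
from scipy.signal import resample_poly

audio, fs = sf.read("take_88k2.wav")          # hypothetical 88.2kHz recording
assert fs == 88200
# 88.2kHz -> 44.1kHz is an exact 2:1 ratio, so one polyphase stage does the job.
downsampled = resample_poly(audio, up=1, down=2, axis=0)
sf.write("take_44k1.wav", downsampled, 44100)
```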

Incidentally, some software instruments like EZ Keys v1 have strange resonances and a grainy sound at 44.1kHz which disappear at higher sampling rates. But I still think this is a software design problem. 44.1kHz is really all that's needed to produce hi-fi audio. Above that is fairy dust, but unfortunately poor design forces us to use higher sampling rates to make sure we get good fidelity and compensate for design flaws in our gear. Marketing is leveraging that to convince people to use ridiculously high sampling rates, but that's a useless endeavour that will only force you to buy more hard drive space.
 
and it's pretty much gone by 2kHz
By definition phase should be undetectable for wavelengths shorter than the distance between our ears. If we assume that to be about 15cm, biological cutoff frequency is 2286Hz.
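
For reference, the arithmetic behind that 2286Hz figure, assuming a speed of sound of roughly 343m/s:

```python
c = 343.0     # assumed speed of sound, m/s
d = 0.15      # assumed inter-ear distance, m
print(c / d)  # ~2286.7Hz: the frequency whose wavelength equals d
```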
 
By definition phase should be undetectable for wavelengths shorter than the distance between our ears. If we assume that to be about 15cm, biological cutoff frequency is 2286Hz.

Head size has nothing to do with this, and envelope shape takes over from phase above 2kHz. The mechanism is the firing rate of neurons, which fire in phase at least up to 500Hz, to some extent up to 1kHz, and to a small extent up to about 4kHz, although the effect is almost gone by 2kHz. Above 2kHz, the neurons fire on the leading edge of the signal envelope in a given ERB.

And the "by definition" is entirely wrong for many reasons, as well, things like HRTF, etc, also confound your idea.
 
The mechanism is firing rate of neurons
Thanks for the information. Is this what is meant by phase locking of auditory nerve fibers?
 
Thanks for the information. Is this what is meant by phase locking of auditory nerve fibers?

Mostly, as with anything, "it's more complicated than that". :) The firing of the outer hair cells to decrease the gain on the cochlea can't help but intervene too.
 
Head size has nothing to do with this, and envelope shape takes over from phase above 2kHz. The mechanism is the firing rate of neurons, which fire in phase at least up to 500Hz, to some extent up to 1kHz, and to a small extent up to about 4kHz, although the effect is almost gone by 2kHz. Above 2kHz, the neurons fire on the leading edge of the signal envelope in a given ERB.

And the "by definition" is entirely wrong for many reasons as well; things like HRTFs, etc., also confound your idea.

Thank you JJ. I don't mean to argue with you, but I am now a bit confused by Wikipedia's entry on ITD which states that the ITD is dependent on the width of the head.

[Attached: Wikipedia diagram of interaural time difference]


Perhaps it would be good to clarify if you are talking about INTRA-aural or INTER-aural phase differences?
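
For what it's worth, here is a hedged sketch of the head-width dependence that Wikipedia figure describes, using the classic Woodworth spherical-head approximation. The head radius is an assumed typical value, and real heads are not spheres.

```python
# Woodworth spherical-head approximation: ITD ~ (r/c) * (theta + sin(theta)),
# with r the (assumed) head radius, c the speed of sound, theta the source azimuth.
import numpy as np

def woodworth_itd(theta_deg, r=0.0875, c=343.0):
    """Approximate interaural time difference in seconds for a spherical head."""
    theta = np.radians(theta_deg)
    return (r / c) * (theta + np.sin(theta))

for az in (0, 30, 60, 90):
    print(f"azimuth {az:2d} deg: ITD ~ {woodworth_itd(az) * 1e6:4.0f} microseconds")
```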
 
Thank you JJ. I don't mean to argue with you, but I am now a bit confused by Wikipedia's entry on ITD which states that the ITD is dependent on the width of the head.

[Attached: Wikipedia diagram of interaural time difference]

Perhaps it would be good to clarify if you are talking about INTRA-aural or INTER-aural phase differences?


ITD does scale exactly with the shape and size of the head, yes. But that is a different question from detecting the phase of a signal, whether in monaural or binaural signals.

Up to about 500Hz, phase of bass over 40Hz or so is a remarkably powerful "directional" cue.

Above 2kHz, envelope is as good a cue.

You may, if you've tried, have noticed that clicks and sudden signals at high frequencies can be localized, but a continuous sine wave at 10kHz offers just about zero cues; only HRTFs help, and poorly so. There are actually signals designed to be precisely localized.

Something to realize is that with that kind of signal, there is some sensation detectable down to between 5 and 10 microseconds, in terms of a subtle shift of the source. The ear cannot phase lock at all at that level, but it CAN lock on to the attack of a signal envelope.
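
To put those microseconds in perspective, a quick sketch (the 96kHz rate and the impulse "click" are my own choices): even at 96kHz, one sample is about 10.4 microseconds, so a one-sample interaural lead on a click already sits near that threshold.

```python
# Perspective on 5-10 microseconds: one sample at 96kHz is ~10.4 us, so a
# one-sample lead on a click already sits near the reported threshold.
import numpy as np

fs = 96000
click = np.zeros(512)
click[10] = 1.0                      # a unit impulse as a crude "click"
left = click
right = np.roll(click, 1)            # right channel lags by exactly one sample = 1/fs
print(f"one sample at {fs}Hz = {1e6 / fs:.1f} microseconds")
```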

I think I said "it's complicated" for a reason.

Now, even without considering localization, the coherence of the signal at the two ears is also detectable. Exactly what the threshold is isn't yet perfectly clear, but I can tell you it's much, much better than MPEG Surround can provide. :)
 
Do you mean 40 degrees?
I mean that between about 40Hz and about 500/800Hz (depending on how you define "strong"), interaural phase is a very important part of directional cues. Among other things, that shows that phase locking happens at those frequencies. Above the upper limit, the locking becomes less and less effective due to the ~1ms best neural recharge rate. Below the lower limit, the edge of the waveform is too small to overcome the noise floor of the cochlea.
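
For anyone following along, a trivial bit of arithmetic behind that ~1ms recharge limit: the waveform period at these frequencies.

```python
# Why phase locking fades above roughly 1kHz if the best neural recharge interval
# is about 1ms: once the period drops below that, a single neuron can no longer
# fire on every cycle.
for f in (40, 500, 800, 1000, 2000, 4000):
    print(f"{f:5d}Hz: period = {1e3 / f:6.2f}ms")
```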
 