
A Broad Discussion of Speakers with Major Audio Luminaries

Two microphones? Any music with more than one sound-producing source (e.g. a musician singing and playing the guitar) is not a point source, so it is impossible for the microphones to be equidistant from all sound sources, and there goes your phase coherency.
Sorry, but your answer doesn't answer my question. When microphones pick up sound, that sound has its own phase and time coherence, plus spatial information, which has its own phase and time relationships. Speakers should maintain and preserve this micro and macro information during playback.
 
Speakers should maintain and preserve this micro and macro information during playback.
Why? What's special about the particular phase information picked up by the microphones? As already explained, these are not the actual phase relationships present in the room during the recording (and in any case, not the phase relationships two ears on a head would have experienced). Even if they were, the phase has already been altered numerous times in the mixing and reproduction chains before it reaches your speakers.

Finally, yet again, there's no good evidence that this is audible in real playback conditions (not test signals in a test setup), and that even if it were that one phase relationship vs another is preferable. Why spend a not inconsiderable amount of effort worrying about it? There are way bigger fish to fry.
 
The sound of an instrument is by definition in phase; two microphones will pick it up, and the speakers will reproduce it.

That is technically not possible. Any microphone with gradient-transducer characteristics shifts the phase anyway, and even pressure transducers will not translate sound pressure proportionally into output level, while picking up more of the room interaction.

We should also define for which instruments this is relevant. Most group delay distortion or phase shift with the potential to be audible (i.e. exceeding the audibility thresholds) happens between the lower bass and the upper harmonics of a more or less transient event. This leaves plucked double bass, timpani, kick drums and some bigger toms, and that is basically it (organ, bassoon, bass trombone and tuba are too slow in attack). These all rely on pretty hefty masses resonating, and the room adds an additional impression of 'slower' low frequencies. There is no way you can capture this mixture 'in phase' from a few meters away with an omni mic.

The sounds for which this is really relevant are either miked in close proximity (like rock kick drums), come from electromagnetic pickups (which also have phase shift, like a bass guitar), or are artificial. There are pretty interesting samples which give the feeling of a very tight, coherent impulse plus a quickly decaying low bass. Very good for testing subwoofers or bass characteristics in a room. And no, these don't resemble test signals; they're more like a corrected kick drum.
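To put rough numbers on the group-delay side of this, here is a minimal sketch (assuming numpy and scipy; the 40 Hz fourth-order high-pass is a hypothetical stand-in for a ported speaker's bass roll-off, not any particular design):

```python
import numpy as np
from scipy.signal import butter, group_delay

fs = 48000
# 4th-order Butterworth high-pass at 40 Hz: a rough, hypothetical model
# of a ported speaker's low-frequency roll-off
b, a = butter(4, 40, btype='highpass', fs=fs)

freqs_hz = np.logspace(np.log10(20), np.log10(1000), 200)
w, gd = group_delay((b, a), w=freqs_hz, fs=fs)
gd_ms = gd / fs * 1000  # group_delay returns samples; convert to ms

for f, g in zip(w[::40], gd_ms[::40]):
    print(f"{f:7.1f} Hz : {g:6.2f} ms group delay")
```

The delay climbs steeply toward the tuning frequency, which is why the audibility question concentrates in the bass of exactly the transient sources listed above.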
 
Direct sound that is phase and time aligned reaches the ears as such.
The issue/question of how much reflected sound gets integrated with direct is too situation/room specific to make universal pronouncements about how much phase gets corrupted, or about what we can and can't hear with respect to time and phase alignment.

I know it doesn't take an anechoic chamber to help separate direct from reflected sound, to get a handle on the value of phase and time alignment. Constant-directivity speakers with a defined narrow pattern help, especially in a larger room.
Outdoors really helps/works, and ime, can make you think twice about the "value and appeal" of reflected sound... (per widely accepted/quoted research).

btw, fun short vid about the sound of a balloon pop.... outdoors, different rooms, and an anechoic chamber at the end.
Outdoors is clearly a long way from sound-sucking anechoic.


This produced a cool sound effect. :)

 
Direct sound that is phase and time aligned reaches the ears as such.
Direct sound reaches the ears directly, surprise surprise, but your words "that is phase and time aligned" are presumptive.

Let's say a small ensemble is performing outdoors. Who is receiving sound "that is phase and time aligned", the listener who is twice as close to the singer as to the pianist, or the one who is twice as close to the pianist?

I think the point being made is that for recorded music playback, the condition "phase and time aligned" is probably not present in the recording environment and not meaningful in the playback environment: it doesn't even hold for the 'direct sound' reaching the ear from the speakers, due to issues in the recording process.

The relevant point I hope you were making in response to Amir is the one made by Dr Toole (the researcher with whom you disagree more than any other), namely, humans have the cognitive ability to separate direct sound from summed-with-reflections sound, and will perceive the sound as erroneous if the direct sound from the speakers is not well-balanced, even if the summed sound is not badly balanced. Therefore, getting the direct (anechoic) speaker response to sound well-balanced is important.

The issue/question of how much reflected sound gets integrated with direct is too situation/room specific to make universal pronouncements about how much phase gets corrupted, or about what we can and can't hear with respect to time and phase alignment.
Yes, it is infinitely variable due to room specifics, but some generalities can still be made, such as the one in my paragraph above, and also that, in general, a home listening room is a 'hot mess' of phase and time misalignment; although the right room treatment is beneficial, only an anechoic chamber will be effective in reducing it to near the levels in the recording itself.

I know it doesn't take an anechoic chamber to help separate direct from reflected sound, ...
Yes, very much so
...to get a handle on the value of phase and time alignment.
Not so much that bit, as discussed above.
Constant directivity speakers with a defined narrow pattern help, especially in a larger room.
Gee, now you are into generalities? Good to see! Except that the generalities should be consistent with quality evidence, not just a principle that seems logical to you. And in this case, the quality evidence coming from the research does not specifically favour narrow directivity patterns, at least not for the home playback environment.

Outdoors really helps/works, and ime, can make you think twice about the "value and appeal" of reflected sound... (per widely accepted/quoted research).
As Dr Toole has mentioned, humans very strongly prefer the sound of a classical music performance in a hall with good acoustics to the same performance outdoors.
 
Did you mean to say "Reflections from milliseconds later in time are mixing with it, albeit at lower amplitude"?
No, I mean in both directions. As I said, music is not an instant in time that has nothing before or after it.
I can see how this would be true for early ipsilateral (same-side) reflections because they arrive at that side's eardrum conflated with the direct sound.

But for early contralateral reflections, which arrive at the opposite ear from the first-arrival sound, wouldn't the ear's binaural processing ability allow it to distinguish the reflections from the direct sound?
Reflections cause comb filtering, but its resolution is too fine for the auditory filters to discriminate. By the time the central cortex receives the signal, it has already been "cleaned up" by the auditory sensors.
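A quick way to see why: the notches from a single reflection are spaced 1/delay apart in frequency, and at higher frequencies many of them fall inside a single auditory filter. A minimal sketch (assuming numpy is unnecessary here; the 5 ms reflection delay is a hypothetical value, roughly 1.7 m of extra path):

```python
def erb_hz(f_hz):
    # Glasberg & Moore (1990) equivalent rectangular bandwidth
    return 24.7 * (4.37 * f_hz / 1000 + 1)

delay_ms = 5.0                       # hypothetical reflection delay
notch_spacing = 1000.0 / delay_ms    # comb notches every 1/tau = 200 Hz

for f in (500, 1000, 2000, 4000, 8000):
    print(f"{f:5d} Hz : ERB = {erb_hz(f):6.1f} Hz, "
          f"notches per ERB = {erb_hz(f) / notch_spacing:4.1f}")
```

Above a few kHz there are several notches inside each auditory filter, so the filter output tracks the average level rather than the individual dips.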
 
I think he was talking about music, rather than test signals. So yes, you can get reflections from the previous transient contaminating the current transient. I think he worded it a bit clumsily, hence the confusion.

Thx Keith. I still must be missing something, because both interpretations I've gathered are very amiss imo.
My first, that reflections can arrive before the direct sound, is clearly physically impossible.
And the one you present, that it might be hangover reflections from previous transients (lol), seems a stretch at best.

A speaker is a piece of engineering imo......just a tool that I want to use to turn whatever audio signal into acoustic energy.
I want that acoustic energy to exactly match the electrical signal given to the speaker.
Whatever the signal is...phase corrupted, phase perfect, whatever any of those nebulous terms even mean.....it doesn't matter one iota imo.
A truly technically excellent speaker will match the signal completely....mag and phase, impulse et al.

The only arguments against such technical excellence that I see are the usual ones: we can't hear phase, small rooms mask everything anyway, the source material is screwed up to begin with....yada yada and more yada.

Well, I say who knows...maybe the reason for all that 'yada why bother' is that we never had the ability to make such phase and time aligned, technically excellent speakers before. Multi-way active DSP has truly changed what speakers are capable of.
I say, why not get on board with the idea that we can at least make one component in the whole "why bother/can't hear" circle-of-confusion quagmire....at least make the speakers....NOT be part of the problem.

Technical excellence means both mag and phase matter. The degree to which speaker phase proves to be audible or inaudible remains to be seen.
Anyone saying it's already proven, is operating under a very subjective state of mind, imnsho.
 
Direct sound doesn't happen in an infinitesimally small time period. Reflections from milliseconds before in time are mixing with it, albeit at lower amplitude. Only in an anechoic chamber is what you say true. This is the very basis of the impact of the room on sound. You are imagining something that doesn't exist in real rooms. Specialized test signals attempt to lower the possibility of this happening, and hence raise the probability of it being audible.
If ^all that^ is true, then it's hard to see how Klippel can even have the cojones to advertise that they are able to gate out the sounds.

Are they obviously telling lies?

:cool:
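For what it's worth, time-gating an impulse response is a standard quasi-anechoic trick (as I understand it, the Klippel NFS actually uses a near-field scan with field separation rather than a simple gate, but a gate illustrates the principle). A minimal sketch, assuming numpy, with a hypothetical 4 ms reflection:

```python
import numpy as np

fs = 48000
n = 4096
ir = np.zeros(n)
ir[0] = 1.0                          # idealized direct sound
ir[int(0.004 * fs)] = 0.4            # hypothetical reflection, 4 ms later

ir_gated = ir.copy()
ir_gated[int(0.003 * fs):] = 0.0     # gate 3 ms after the direct arrival

f = np.fft.rfftfreq(n, 1 / fs)
H_room  = 20 * np.log10(np.abs(np.fft.rfft(ir)) + 1e-12)
H_gated = 20 * np.log10(np.abs(np.fft.rfft(ir_gated)) + 1e-12)

for i in (21, 32, 43, 53):           # bins near 246, 375, 504, 621 Hz
    print(f"{f[i]:6.0f} Hz : with reflection {H_room[i]:+5.1f} dB, "
          f"gated {H_gated[i]:+5.1f} dB")
```

The ungated response alternates between roughly +3 dB peaks and -4 dB notches; the gated response is flat because the reflection never enters the analysis window. The cost is that frequency resolution below roughly 1/(gate length), here about 333 Hz, is lost, which is why gated measurements can't resolve bass.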
 
I think we need @j_j to jump in with a response about phase inaudibility.

The way I see it, INTRAchannel phase distortion is audible in a free field, and with live instruments. Griesinger has written about the proximity effect, where instruments lose their focus / sense of immediacy beyond a certain distance, once phase becomes incoherent after being distorted by reflections. As anybody who has attended a live performance of acoustic instruments will tell you, the change in the experience when sitting further away is real, although I am less confident that loss of phase coherence is the only explanation.

...

Given that we have some heavyweights in this discussion, perhaps someone is aware of a study that answers the third question. For a given listening room, and given linear phase recordings and linear phase loudspeakers, does room masking sufficiently corrupt phase to the extent that it becomes inaudible?

You're gonna wince, but the answer is "sometimes", sorry. Some phase response issues, like crossovers that swap phase at the crossover point, can have audible effects even in the 500-2000 Hz range (where many crossovers are), as well as in the 5 kHz range (another common point), but only on some signals (percussion for instance, harpsichords, some others). If the direct signal is intense enough compared to the reverberant signal at the ear, yeah, you can hear that with the right kind of musical signal (you can surely hear it with synthetic phase shifting). The question is how much of the 1/3-ERB-by-1/3-ERB signal in a given band is actually heavily allpassed by the reverberation, AND whether the effective difference in time delay/inversion <which creates something similar, kind of> across frequency is masked by reverberation.

I realize this is not a very helpful answer without a lot of annoying measurement.
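To illustrate the crossover behavior being described: a textbook 4th-order Linkwitz-Riley crossover sums flat in magnitude while rotating a full 360 degrees of phase through the crossover region. A minimal sketch, assuming numpy and scipy, with a hypothetical 2 kHz crossover (this is the generic textbook alignment, not any particular speaker):

```python
import numpy as np
from scipy.signal import butter, freqs

fc = 2000.0                        # hypothetical crossover frequency
w0 = 2 * np.pi * fc

# Linkwitz-Riley 4th order = two cascaded 2nd-order Butterworth sections
bl, al = butter(2, w0, btype='low',  analog=True)
bh, ah = butter(2, w0, btype='high', analog=True)
b_lp, a_lp = np.convolve(bl, bl), np.convolve(al, al)
b_hp, a_hp = np.convolve(bh, bh), np.convolve(ah, ah)

w = 2 * np.pi * np.logspace(np.log10(200), np.log10(20000), 7)
_, h_lp = freqs(b_lp, a_lp, worN=w)
_, h_hp = freqs(b_hp, a_hp, worN=w)
h_sum = h_lp + h_hp                # acoustic sum of the two drivers

phase = np.unwrap(np.angle(h_sum)) * 180 / np.pi
for f, m, p in zip(w / (2 * np.pi), np.abs(h_sum), phase):
    print(f"{f:8.0f} Hz : {20 * np.log10(m):+5.2f} dB, {p:7.1f} deg")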
 
It isn't. By definition your room is not an anechoic chamber so it will corrupt the phase and repeatedly so.

Except the direct (early) and reverberant (delayed) signals do not necessarily have the same cochlear detection strategy. Don't also forget that co-articulation (intRAaural) is an actual thing (meaning time arrival of an impulsive signal across FREQUENCY), and that rapid phase shift inside of one ERB is for certain audible.
 
Findings of researcher David Griesinger seem to support an argument for time coherence north of 500 Hz or so, and for freedom from early reflections. He studies the acoustics and psychoacoustics of concert venues and lecture halls, rather than home audio as such, but the principles he describes are arguably applicable to home audio too.

If I understand correctly, when the overtones all arrive at the same instant (rather than smeared over a short time interval), their energy combines into an instantaneous energy peak which increases the dynamic range (peak-to-average ratio) and therefore the effective signal-to-noise ratio. In the context of concert venues and lecture halls, sufficiently early reflections can degrade the phase response and therefore the time-domain relationship of these overtones. So whether the phase response is degraded by the loudspeakers, or by early reflections, or both, this is arguably detrimental to the effective signal-to-noise ratio and therefore detrimental to clarity and/or dynamic contrast.

Quoting David Griesinger: "The information content of speech is (almost) entirely in frequencies above 500Hz. For all people, even children, this means that information – at least the identity of vowels – is encoded in amplitude modulations of harmonics of a lower frequency tone. This method of encoding is universally used by almost any creature that wishes to communicate, from insects to elephants. Why? Because harmonics have a unique property – they combine to make sharp peaks in the acoustic pressure at the frequency of the fundamental frequency that created them. It is the presence – or the absence – of these peaks that enables the perception of ‘near’ and ‘far’. The sharp peaks also facilitate separating the signals from noise, and with the appropriate neural network the peaks from one sound source can be separated from another.

"The peaks only exist when the incoming signal consists of a tone with a definite pitch and lots of upper harmonics. Furthermore, the peaks only exist when there are two or more harmonics at the same time within one critical band." [emphasis mine; also, note that critical bands are approximately 1/3 octave wide.]

So the implications for home audio seem to be that a) the direct sound should be time-coherent north of about 500 Hz in order for the overtones to arrive simultaneously, resulting in these peaks; and b) early reflections which would be conflated with the direct sound north of 500 Hz are undesirable.

So, we might well ask, how early is "too early" for the onset of reflections?

David Griesinger again: "Transients are not corrupted by reflections if the room is large enough - and 10ms of reflections free time is enough."

Now at first glance it looks like Griesinger's advocacy of a 10-milliseconds reflections-free time interval after the arrival of the direct sound conflicts with @Floyd Toole's observation that the in-room reflections actually improve intelligibility by providing the listener with multiple "looks" at complex sounds. And such may be the case! (I no longer have my copy of The Book, and am awaiting the 4th Edition.) But perhaps this increased intelligibility comes from those reflections arriving AFTER the 10-milliseconds interval Griesinger mentions. Hopefully someone who knows the answer will post and shed light on this.

(Elsewhere Griesinger mentions 700 Hz and 1000 Hz as being the lower end of the frequency region that matters most to the ears, giving me the impression that the octave between 500 Hz and 1 kHz is a somewhat fuzzy transition region.)
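Griesinger's "sharp peaks" claim is easy to sanity-check numerically: harmonics that arrive phase-aligned produce a much higher crest factor than the same harmonics with scrambled phase, at identical RMS level. A minimal sketch assuming numpy (the 200 Hz fundamental and harmonic count are arbitrary illustrative choices):

```python
import numpy as np

fs = 48000
t = np.arange(fs // 10) / fs            # 100 ms of signal
f0 = 200.0                              # arbitrary fundamental
ks = range(2, 21)                       # harmonics from 400 Hz to 4 kHz

# Overtones arriving time-aligned: they pile up into sharp pressure peaks
aligned = sum(np.cos(2 * np.pi * k * f0 * t) for k in ks)

# Same overtones, same levels, phases scrambled (as after allpass smearing)
rng = np.random.default_rng(0)
smeared = sum(np.cos(2 * np.pi * k * f0 * t + rng.uniform(0, 2 * np.pi))
              for k in ks)

def crest_db(x):
    # peak-to-RMS ratio in dB
    return 20 * np.log10(np.max(np.abs(x)) / np.sqrt(np.mean(x ** 2)))

print(f"time-aligned harmonics: crest factor {crest_db(aligned):5.1f} dB")
print(f"phase-scrambled:        crest factor {crest_db(smeared):5.1f} dB")
```

The RMS level is identical in both cases; only the peak structure that Griesinger says the ear uses for 'near/far' judgments changes.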
 
No, I mean in both directions. As I said, music is not an instant in time that has nothing before or after it.

Thanks for correcting my misunderstanding.

Okay, yes, reflections (particularly the earliest ones) can degrade clarity by partially masking first-arrival sounds.

Reflections cause comb filtering, but its resolution is too fine for the auditory filters to discriminate. By the time the central cortex receives the signal, it has already been "cleaned up" by the auditory sensors.

I'm under the impression that comb filtering is not nearly as perceptually significant as a frequency response measurement implies. My understanding is that the ear/brain system is often able to distinguish between the first-arrival sound and the comb-filter-causing reflection(s), which is something microphones don't do. I assume this is the auditory processing you're referring to.

My understanding is that as the wavelengths become longer, the amount of time in between the direct sound and the arrival of the reflections needed for the ear/brain system to do this processing increases, until we get down into the bass region where the ear can no longer distinguish them. Thus in the bass region (in home-audio-sized listening rooms), the steady-state in-room response curve - which includes multiple comb-filter-like effects - is predictive of perception.
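A useful (if approximate) marker for where that transition happens is the Schroeder frequency; below it, room modes are sparse and the steady-state curve dominates what we hear. A minimal sketch, assuming a hypothetical domestic room of about 50 m³ and an RT60 of 0.4 s:

```python
import math

def schroeder_hz(rt60_s, volume_m3):
    # Schroeder frequency: above this, modal density is high enough that
    # the room response behaves statistically rather than modally
    return 2000 * math.sqrt(rt60_s / volume_m3)

# hypothetical ~5 x 4 x 2.5 m listening room, RT60 ~ 0.4 s
print(f"transition ~ {schroeder_hz(0.4, 50.0):.0f} Hz")   # about 180 Hz
```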
 
Except the direct (early) and reverberant (delayed) signals do not necessarily have the same cochlear detection strategy. Don't also forget that co-articulation (intRAaural) is an actual thing (meaning time arrival of an impulsive signal across FREQUENCY), and that rapid phase shift inside of one ERB is for certain audible.
And also for impulse sounds, there are “no pre-ringing reflections bouncing around the room” like there are for more steady-state music.
i.e. the drums and harpsichords you mentioned.

It is not exactly like what I was picturing when @amirm described it, with the sounds rattling around like a tornado scene out of “The Wizard of Oz”, with Toto, the witch, etc. in a total swirl.
 
The question is how much of the 1/3-ERB-by-1/3-ERB signal in a given band is actually heavily allpassed by the reverberation, AND whether the effective difference in time delay/inversion <which creates something similar, kind of> across frequency is masked by reverberation.

Except the direct (early) and reverberant (delayed) signals do not necessarily have the same cochlear detection strategy. Don't also forget that co-articulation (intRAaural) is an actual thing (meaning time arrival of an impulsive signal across FREQUENCY), and that rapid phase shift inside of one ERB is for certain audible.

Does "ERB" stand for "Equivalent Rectangular Bandwidth"?

If so, is this related to Gammatone filters?

What would constitute a "1/3 ERB by 1/3 ERB signal"?

And, in the context of crossover filter slope, what would be an example of "rapid phase shift inside of one ERB"?
 
I'm under the impression that comb filtering is not nearly as perceptually significant as a frequency response measurement implies.

This is pure speculation, but I think that comb filtering that is the result of mixing "in the air" is far less audible than comb filtering that is "mixed electrically". Try downloading a flanger VST and dialling in some comb filtering. It sounds metallic and unpleasant. Yet comb filtering "in the air" does not have this metallic quality. I wonder if it is because we move our heads in a sound field, which reduces the effect of comb filtering by averaging it out. Dr. Toole does mention that comb filtering is not as audible as the measurements suggest, but (unless I missed it) his book offers no explanation of why.
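The head-movement hypothesis is easy to model crudely: shift the reflection delay by the few-centimeter path changes a listener's movement causes, and average the resulting comb responses on a power basis. A minimal sketch assuming numpy (the 3 ms delay and 0.7 reflection amplitude are hypothetical values; the analysis band is the treble, where the effect is strongest):

```python
import numpy as np

f = np.linspace(2000, 8000, 2000)    # treble analysis band, Hz
base_tau = 0.003                     # hypothetical 3 ms reflection delay
a = 0.7                              # hypothetical reflection amplitude

def comb_db(f, tau):
    # magnitude of direct sound plus one reflection of amplitude a
    return 20 * np.log10(np.abs(1 + a * np.exp(-2j * np.pi * f * tau)))

static = comb_db(f, base_tau)

# head movement of a few cm changes the path length, hence the delay
taus = base_tau + np.linspace(-1e-4, 1e-4, 21)   # +/- ~3.4 cm of path
moving = 10 * np.log10(np.mean(
    [10 ** (comb_db(f, tau) / 10) for tau in taus], axis=0))

print(f"static listener:     deepest notch {static.min():6.1f} dB")
print(f"moving-head average: deepest dip   {moving.min():6.1f} dB")
```

The notches land at different frequencies for each small head position and largely average away, which fits the observation that airborne comb filtering lacks the flanger's metallic character.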
 
Does "ERB" stand for "Equivalent Rectangular Bandwidth"?

If so, is this related to Gammatone filters?

What would constitute a "1/3 ERB by 1/3 ERB signal"?

And, in the context of crossover filter slope, what would be an example of "rapid phase shift inside of one ERB"?

15 degrees inside of one 1/3 ERB, or 45 in an ERB. Gammatone filters are an OK sort of estimation of ERBs. I'd think 1/3-ERB envelope correlation with the ones above and below it ought to be simple enough, yes?
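One way to put numbers on that criterion (my interpretation, not necessarily j_j's intent): 45 degrees of phase rotation inside one ERB corresponds to a local group-delay deviation of (45/360)/ERB seconds. A minimal sketch using the Glasberg & Moore ERB formula:

```python
def erb_hz(f):
    # Glasberg & Moore (1990) ERB approximation
    return 24.7 * (4.37 * f / 1000 + 1)

# Reading "45 degrees per ERB" as a group-delay deviation:
# tau = (45/360) / ERB(f)
for f in (250, 500, 1000, 2000, 4000, 8000):
    erb = erb_hz(f)
    tau_ms = (45 / 360) / erb * 1000
    print(f"{f:5d} Hz : ERB {erb:6.1f} Hz -> 45 deg/ERB ~ {tau_ms:5.2f} ms")
```

By this reading the tolerance is roughly a millisecond at 1 kHz and tightens toward the treble, which lands in the same ballpark as classic midrange group-delay audibility thresholds.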
 
This is pure speculation, but I think that comb filtering that is the result of mixing "in the air" is far less audible than comb filtering that is "mixed electrically". Try downloading a flanger VST and dialling in some comb filtering. It sounds metallic and unpleasant. Yet comb filtering "in the air" does not have this metallic quality.
To start with, surfaces mostly diffuse to some extent, so the zeros caused by a delayed signal are less of an impulse, even if they are a delayed reflection. Second, walls have varying reflections in most cases (if you try a narrow concrete hallway, you'll (*&(& well hear that unpleasant metallic sound,d,d,d,d,,d but remember what delay we're talking about, too: 1 millisecond per foot, give or take, is the acoustic delay. Flangers can go a lot longer than that.)

The longer and more coherent the delay, the more "flanged" it is. If you put your ear close to a hard wall and listen to something from down the hall, you'll hear it, of course, too.

Short specular reflections are bad in this regard, as well, but MOST rooms don't provide truly specular reflections. In large spaces, as well, air movement, surface texture, etc., all enter into this.

Having built a box that was 15x21x11 that started out with hard walls, I can tell you it's flanged to ****. Of course, after putting in 8" spacers and 4" of medium-density mineral wool with a fireproof scrim surface, it's MUCH better. (That's 4 walls and the ceiling.)
 
wow this is an awesome discussion and i have learned a lot from reading it.

my only question is - is Klippel the ultimate tool for utter accuracy when fighting about final measurements? doesn't their website state their accuracy is within +/- 1.5 dB? isn't that pushing us into much ado about nothing territory with many corner cases?
 
wow this is an awesome discussion and i have learned a lot from reading it.

my only question is - is Klippel the ultimate tool for utter accuracy when fighting about final measurements? doesn't their website state their accuracy is within +/- 1.5 dB? isn't that pushing us into much ado about nothing territory?

Measuring what? Frequency response? Radiation pattern? Distortion? Relative (discounting delay) phase of drivers?
 