
Time Alignment of Speaker Drivers

Yes, it should be obvious that all these aspects need to be designed to work together. Line-of-sight distance means nothing.

A classic example is this:

[Image: attachment 502710]

Recessed mid-woofer (tweeter mounted to the front of the baffle, mid-woofer to the back) and inverted tweeter polarity, to work in unison with the crossover.
The drivers are not time-aligned, so the tweeter is always too early and there will be time smear around the XO frequency; the step response will not be as compact as possible. Of course, this is a very old design from a time when designers had limited knowledge and tools.

See also my post here, and the post above from @dualazmak.
 
Then why are the tweeters recessed on Genelecs and some other speakers? They must have had a reason for doing so.
They are recessed because of the waveguide. They handle the time alignment via DSP, which is a vastly superior way of doing it.
Are the tweeters really recessed in Genelecs? In the 3-way coaxial models they seem to be pretty well aligned with the midrange plane, and the waveguide of the 2-way models is not very deep either. In either case, it does not really matter, as they are all active and mostly DSP-controlled, so they can easily be time-aligned with a delay.
The initial versions of the Genelec "Eggs" with waveguides were analog, and the recess created by the waveguide helped time alignment a great deal, making it possible to implement a truly correct analog Linkwitz-Riley XO. The same goes for the Neumanns.
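An idealized numeric check of the Linkwitz-Riley point: an analog LR lowpass/highpass pair sums to an allpass (flat on-axis magnitude) when both branches arrive together, and any remaining offset between the drivers shows up as a dip around the crossover frequency. This is a sketch under stated assumptions (ideal analog filters, drivers treated as flat point sources, a hypothetical 2.3 kHz crossover and hypothetical delays), not Genelec's or Neumann's implementation. The same function covers LR2, LR4 and LR48 (`lr_order=8`); for LR2 it inverts the tweeter, echoing the inverted tweeter polarity mentioned at the start of the thread.

```python
import cmath
import math


def butterworth_denominator(n: int, s: complex) -> complex:
    """Normalized Butterworth polynomial B_n(s) (cutoff 1 rad/s), evaluated from its poles."""
    result = 1 + 0j
    for k in range(1, n + 1):
        pole = cmath.exp(1j * math.pi * (2 * k + n - 1) / (2 * n))
        result *= s - pole
    return result


def lr_sum_db(f_hz: float, f_xo_hz: float, lr_order: int, tweeter_delay_s: float = 0.0) -> float:
    """On-axis level (dB) of an ideal analog Linkwitz-Riley lowpass + highpass pair.

    lr_order is the total order: 2 (12 dB/oct), 4 (24 dB/oct), 8 (48 dB/oct), ...
    Each branch is a Butterworth filter of order n = lr_order // 2 applied twice.
    tweeter_delay_s delays only the highpass branch (positive = tweeter late).
    """
    if lr_order < 2 or lr_order % 2:
        raise ValueError("Linkwitz-Riley order must be an even number >= 2")
    n = lr_order // 2
    s = 1j * f_hz / f_xo_hz  # frequency normalized to the crossover point
    b = butterworth_denominator(n, s)
    lowpass = 1 / b**2
    highpass = s ** (2 * n) / b**2
    # For odd n (e.g. LR2) the branches only sum flat with the tweeter inverted.
    polarity = 1 if n % 2 == 0 else -1
    delay = cmath.exp(-2j * math.pi * f_hz * tweeter_delay_s)
    return 20 * math.log10(abs(lowpass + polarity * highpass * delay))


if __name__ == "__main__":
    f_xo = 2300.0  # hypothetical crossover frequency in Hz
    for order in (4, 8):
        for delay_us in (0, 50, 100, 200):
            worst = min(lr_sum_db(f, f_xo, order, delay_us * 1e-6) for f in range(200, 20001, 10))
            print(f"LR{order}, tweeter {delay_us:3d} us late: deepest on-axis dip {worst:6.2f} dB")
```

With the tweeter 100 µs late at a 2.3 kHz LR4 crossover, for example, the two branches meet about 83° apart and the on-axis sum drops by roughly 2.5 dB at the crossover frequency.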
 
An interesting example in this thread: they seem to have accidentally maximized lobing and the directivity-index step with this particular combination of mid-woofer and tweeter:

[Image: attachment 502760]

You can see the naturally steep increase in directivity index between 1 kHz and 2 kHz, which is to be expected from a stiff 6.5" driver run too high in frequency, followed by proper summation of the two drivers at the crossover point on axis (an indication of perfect time alignment). The lobing suckout in the transition band (crossover point around 2.3 kHz) comes early and is hefty, as visible in the +20° and +30° vertical amplitude responses. This creates a pronounced step down in directivity index and in vertical reflection tonality for the lowest octave in which the tweeter works on its own (2.7 kHz to 4.4 kHz), without enough support from the waveguide (which is presumably too small for such wavelengths).

What does this have to do with time alignment? It is a pretty good example of how optimizing time alignment and amplitude summation at one particular point (on axis) leads to lobing and cancellation at other angles.
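A minimal far-field sketch of that mechanism: two equal-level sources that sum perfectly in phase on axis at the crossover frequency acquire a path difference of d·sin(θ) off axis, and cancel completely where that difference reaches half a wavelength. The 0.15 m center-to-center spacing, the 2.3 kHz crossover and the 343 m/s speed of sound are assumptions; real drivers have their own directivity and the crossover filters' phase is ignored, so this shows the mechanism only, not the measured speaker's response.

```python
import cmath
import math

SPEED_OF_SOUND_M_S = 343.0  # assumption: air at about 20 °C


def two_source_level_db(angle_deg: float, f_hz: float, spacing_m: float) -> float:
    """Far-field level of two equal-level point sources, spaced vertically by spacing_m
    and summing perfectly in phase on axis, relative to the on-axis sum (0 dB)."""
    path_difference_m = spacing_m * math.sin(math.radians(angle_deg))
    phase_rad = 2 * math.pi * f_hz * path_difference_m / SPEED_OF_SOUND_M_S
    magnitude = abs(1 + cmath.exp(-1j * phase_rad)) / 2
    return 20 * math.log10(max(magnitude, 1e-12))  # clamp so an exact null doesn't hit log(0)


def first_null_deg(f_hz: float, spacing_m: float):
    """Vertical angle at which the path difference reaches half a wavelength, or None."""
    half_wavelength_m = SPEED_OF_SOUND_M_S / f_hz / 2
    if half_wavelength_m > spacing_m:
        return None  # spacing too small for a full cancellation at this frequency
    return math.degrees(math.asin(half_wavelength_m / spacing_m))


if __name__ == "__main__":
    f_xo, spacing = 2300.0, 0.15  # hypothetical crossover frequency (Hz) and spacing (m)
    for angle in (0, 10, 20, 30, 40):
        print(f"{angle:2d} deg: {two_source_level_db(angle, f_xo, spacing):7.1f} dB")
    print("first null at", first_null_deg(f_xo, spacing), "deg")
```

Halving the spacing pushes the first null to a wider angle (or removes it below the frequency where half a wavelength exceeds the spacing), which is one reason coincident drivers come up later in the discussion.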
Are these nearfield studio monitors? If so, then that horrible off-axis dip is probably not too bad.

However, I am surprised that not many more loudspeaker designers are using coaxial/coincident drivers. If you add FIR arbitrary amplitude and phase EQ, then you really can get wide-band flat frequency response and uniform dispersion with little to no waveform distortion.
 
I wonder. A blind test. In a normally furnished room, with everything that means in terms of reflections and so on, when would you detect a difference in time alignment?

By that I mean, for example, a test with a two-way speaker where you could physically move the tweeter back (or forward). I know you probably couldn't generalize the result, because it depends on the specific room (and speaker), but still. Would you be able to blindly detect whether the tweeter had been moved by a centimeter (0.4 inch)? Or by a decimeter (4 inches)?

Edit:
Alternatively, if possible, add a time delay to the signal feeding the tweeter or the bass woofer. In practice, it would be more or less the same type of blind test.
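To give the question some scale: a physical offset translates into a time shift of offset / c, and into a phase shift at the crossover frequency of 360° × f_xo × delay. The 343 m/s speed of sound, the 48 kHz sample rate and the 2 kHz crossover frequency below are assumptions; whether the resulting shift is audible is exactly what the proposed blind test would have to establish.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumption: air at about 20 °C


def offset_effect(offset_m: float, f_xo_hz: float, sample_rate_hz: int = 48_000) -> dict:
    """Time shift (microseconds), equivalent DSP delay (samples) and phase shift
    at the crossover frequency (degrees) caused by moving one driver by offset_m."""
    delay_s = offset_m / SPEED_OF_SOUND_M_S
    return {
        "delay_us": delay_s * 1e6,
        "delay_samples": delay_s * sample_rate_hz,
        "phase_at_xo_deg": 360.0 * f_xo_hz * delay_s,
    }


if __name__ == "__main__":
    f_xo = 2000.0  # hypothetical crossover frequency in Hz
    for offset_cm in (1, 10):
        r = offset_effect(offset_cm / 100, f_xo)
        print(f"{offset_cm:2d} cm: {r['delay_us']:6.1f} us, "
              f"{r['delay_samples']:5.2f} samples @ 48 kHz, "
              f"{r['phase_at_xo_deg']:6.1f} deg at {f_xo:.0f} Hz")
```

With these assumptions, 1 cm comes out at roughly 29 µs (about 21° at 2 kHz) and 10 cm at roughly 292 µs (about 210°). Note that the sample counts are fractional, so a DSP would need fractional-delay support to reproduce small offsets exactly.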
 
Yes, I do believe that incorrect time alignment is audible, BUT it depends on the signal and the crossover region. My doubts were settled recently while doing pulse measurements, where a single pulse (sent every 1/2 second) is fed to the tweeter and the midrange driver simultaneously. I was watching both pulse arrival times on my oscilloscope and could hear the speaker at the same time. When the pulses were misaligned by around a millisecond, the click sounded significantly different (in amplitude and timbre) from when the pulses arrived at exactly the same time: the composite pulse was louder and its timbre different, very different and thus very obvious. Beyond about a millisecond, two separate pulses are heard.
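For anyone wanting to repeat the click test, here is a minimal sketch that writes a two-channel WAV file of single-sample clicks every 0.5 s, with the second channel delayed. It assumes an active setup where channel A can be routed to one driver's feed and channel B to the other's, with their normal crossover filtering (and tweeter protection) still in place; the routing, file names and defaults are hypothetical, and it is wise to start at a low playback level.

```python
import struct
import wave


def pulse_pair(delay_ms: float, fs: int = 48_000, period_s: float = 0.5,
               n_pulses: int = 4, amplitude: float = 0.5):
    """Two channels of single-sample clicks repeated every period_s seconds.
    Channel B is delayed by delay_ms relative to channel A (rounded to whole samples)."""
    period = int(round(period_s * fs))
    delay = int(round(delay_ms * 1e-3 * fs))
    if delay < 0 or delay >= period:
        raise ValueError("delay must be between 0 and one pulse period")
    total = period * n_pulses
    ch_a = [0.0] * total
    ch_b = [0.0] * total
    for p in range(n_pulses):
        ch_a[p * period] = amplitude
        ch_b[p * period + delay] = amplitude
    return ch_a, ch_b


def write_stereo_wav(path: str, ch_a, ch_b, fs: int = 48_000) -> None:
    """Write two float channels in [-1, 1] as a 16-bit stereo WAV file."""
    frames = bytearray()
    for a, b in zip(ch_a, ch_b):
        frames += struct.pack("<hh", int(a * 32767), int(b * 32767))
    with wave.open(path, "wb") as wav:
        wav.setnchannels(2)
        wav.setsampwidth(2)  # bytes per sample: 16-bit PCM
        wav.setframerate(fs)
        wav.writeframes(bytes(frames))
```

For example, `write_stereo_wav("clicks_1ms.wav", *pulse_pair(1.0))` produces a file for the roughly 1 ms case described above, and `pulse_pair(0.0)` gives the aligned reference.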

Well, we don't listen to pulses, but some instruments have a spectrum that covers the crossover range, so when these spectral components are combined acoustically with a delay, the combined acoustic waveform will be distorted, which may or may not be audible. One might argue that the real problem is having a reference to the original recorded sound, but it is no trouble to make short recordings of single instruments (a flute note, a piano note, a drum beat, etc.) and do an A/B comparison between a system with poor time alignment and/or group delay and one that is fully compensated.
 
The question is: if you want to "time align" the drivers, then where those drivers are acoustically located is a pertinent question. The time alignment concept makes sense only if you can approximate the driver (e.g. the woofer) as a point. And where that point is, is not obvious. It is almost certainly frequency dependent, and we may not even know how to define it.

We actually do know how to define it when we lump the frequency-dependent part into group delay.
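Group delay is the frequency-dependent delay τ(ω) = −dφ/dω. As a sketch, the snippet below estimates it numerically for an ideal analog LR4 lowpass (an assumed example filter, not any particular speaker); for this filter it works out to 2√2/ω_xo, about 225 µs for a 2 kHz crossover, both at low frequencies and at the crossover point. A measured driver would add its own frequency-dependent delay on top of the filter's.

```python
import cmath
import math


def lr4_lowpass(f_hz: float, f_xo_hz: float) -> complex:
    """Ideal analog 4th-order Linkwitz-Riley lowpass, 1 / (s^2 + sqrt(2)*s + 1)^2,
    with s normalized to the crossover frequency."""
    s = 1j * f_hz / f_xo_hz
    return 1 / (s * s + math.sqrt(2) * s + 1) ** 2


def group_delay_s(h, f_hz: float, df_hz: float = 0.01) -> float:
    """Group delay tau(f) = -d(phase)/d(omega) by a central difference.
    Taking the phase of the ratio H(f+df)/H(f-df) avoids 2*pi wrapping issues."""
    dphi = cmath.phase(h(f_hz + df_hz) / h(f_hz - df_hz))
    return -dphi / (2 * math.pi * 2 * df_hz)


if __name__ == "__main__":
    f_xo = 2000.0  # hypothetical crossover frequency in Hz

    def lowpass(f: float) -> complex:
        return lr4_lowpass(f, f_xo)

    for f in (100.0, 500.0, 1000.0, 2000.0, 4000.0):
        print(f"{f:6.0f} Hz: {group_delay_s(lowpass, f) * 1e6:7.1f} us")
```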
 
Moving the tweeter back or forward (or delaying one driver's signal) basically tilts the polar response, almost like tilting the speaker itself.
IME, with pink noise, even small changes of this kind are audible. And with a strong enough misalignment, such as one additional wavelength at the XO frequency, one can start to hear time-domain ill effects, effects on the sound stage, etc., even when the effective on-axis magnitude FR and the polars are almost the same as with the correct time alignment.
 
Are these near field studio monitors? If yes then that horrible off-axis dip is probably not too bad.

These are passive models, so I would assume they are meant for hi-fi.

From practical experience with many studios, I would say a vertical off-axis dip like that is particularly disadvantageous for typical nearfield use. While the risk of colored indirect sound might be reduced here, you have two major issues. First, the mixing engineer moving his head vertically, or inviting people to listen while standing behind the main listening chair, which is a pretty common situation in studios. Second, the console reflection, which brings a very early second wavefront that is pretty dominant and, in this case, colored.

I am surprised that not many more loudspeaker designers are using co-axial/coincident drivers.

The market for properly engineered OEM coaxial drivers is pretty limited, and the risk of them bringing in unsolvable problems, because the tweeter sits inside the midrange's voice-coil former and uses the midrange cone as a treble waveguide, is surprisingly high.

If you add FIR arbitrary amplitude and phase EQ, then you really can get wide-band flat frequency response and uniform dispersion with little to no waveform distortion.

Unfortunately that is not the case. There are many potential causes of resonance, interference and lobing issues when the tweeter is positioned inside the midrange voice-coil former, particularly on-axis suckout cancellation (visible as a sharp on-axis dip somewhere in the 7 kHz to 11 kHz band). On the other hand, the few coaxial drivers optimized to avoid these problems often bring in directivity characteristics that many designers consider very unwanted: a directivity index that increases towards higher frequencies, and underrepresented brilliance/treble off axis.

All the aforementioned issues are textbook examples of what cannot be countered with EQ.

I do believe that incorrect time alignment is audible

If audibility thresholds are exceeded, I would expect audible phenomena as well. We should be a bit cautious here, as there is a difference between time/phase distortion (like group delay) itself being audible, and phase/time shifts between different sources (like several drivers) becoming audible through differences in frequency response, directivity or the like. In the latter case, it is the amplitude differences resulting from the phase issues that are audible, not the phase issues themselves. The same is true for intraaural differences.
 
On the KEF UniQ design, I have not found many issues at all in my on- and off-axis measurements, but I have seen a dip exactly on axis at a frequency I don't recall offhand; it goes away 2-3 degrees off axis. I thought this was a measurement artifact or diffraction, but I went back a number of times and re-made measurements at different on-axis distances and can still see the dip. I have yet to gain a good theoretical understanding of what might be going on. What do you think? I am in the far field of both drivers.
 
Regarding wide dispersion with wide-band flat frequency response, and your comment that "Unfortunately that is not the case": I am actually achieving what I claimed with FIR EQ on the KEF UniQ coincident driver, making each MF/tweeter driver amplitude-flat with constant acoustic phase and using LR48 acoustic targets with precise time alignment via DSP. You can see some of my preliminary measurements in my posts. I currently have a pause in my measurement campaign, but I will publish the results when I have them. So far, the results I am getting with this driver/FIR EQ/time delay combination are better than expected. By the way, this UniQ design does NOT use the MF cone as a waveguide; instead it uses a custom-designed phase plug (commercial name: Tangerine waveguide) to provide wide dispersion, and the cone (according to KEF) plays little part in that. That was not the case with early, pre-Tangerine UniQ designs. Have you looked at the UniQ from a practical or theoretical point of view?
 
On the KEF UniQ design, I have not found many issues at all in my on and off-axis measurements, but I have seen a dip exactly on axis at a frequency I don’t immediately recall off-hand, but it goes away with 2-3 degrees off-axis.

It depends a bit on the generation, as they have been improving this type of driver for decades already. The more recent ones are pretty advanced designs when it comes to avoiding the typical cancellation or lobing issues found in older coaxials (older Seas units and many Tannoys are not ideal in this regard). This comes at a price, though, which in many cases shows up as a pretty inconsistent or steeply increasing directivity index.

If I had to choose the best compromise from a technical perspective, I would say the TAD R1 coax and the smaller Genelec coaxials (found in the 8341 and 8331 models) are class-leading. The TADs deliver the most consistent directivity over several broad bands at the price of some minor lobing dips; the Genelecs perform better in terms of avoiding cancellation, but their directivity is a bit more inconsistent.
 
Based on my own experience, I believe we should carefully assess, subjectively, the pros (or cons) of time alignment and dispersion, especially of high-frequency sound (ca. 6 kHz to 22 kHz), in our own home acoustic environments, where we always have more or less complex reflections, standing waves, etc. caused by furniture, walls, ceiling, floor, and so on.

Just for example:
After I established 0.1 msec (and 1 msec) precision time alignment among my subwoofers, woofers, midranges, tweeters and super-tweeters, I implemented wide-3D reflective dispersion of the narrow-directivity metal-horn super-tweeter sound (ca. 8 kHz to 22 kHz) using a random-surface, heavy crystal-glass material, as shared in the posts listed below.
A new series of audio experiments on reflective wide-3D dispersion of super-tweeter sound using random-surface hard-heavy material:
Part-1_ Background, experimental settings, initial preliminary listening tests: #912
Part-2_ Comparison of catalogue specifications of metal horn super-tweeter (ST) FOSTEX T925A and YAMAHA Beryllium dome tweeter (TW) JA-0513; start of intensive listening sessions with wide-3D reflective dispersion of ST sound: #921
Part-3_ Listening evaluation of sound stage (sound image) using excellent-recording-quality lute duet tracks: #926
Part-3.1_ Listening evaluation of sound stage (sound image) using excellent-recording-quality jazz trio album: #927
Part-4_ Provisional conclusion to use Case-2 reverse reflective dispersion setting in default daily music listening: #929
At least in my home listening environment, such wide-3D dispersion of super-tweeter sound effectively contributes to a better and more stable "3D stereo image" (hence better "disappearance of the SPs") and to an enlarged "sweet listening sphere" around my listening position. To make my subjective assessments/observations easier to understand, I prepared illustrations (cartoons) of the 3D sound perspectives in my posts #926 and #927; the two illustrations are attached below.
Ref. #926; Listening evaluation of sound stage (sound image) using excellent-recording-quality lute duet tracks:
[Image: WS00007267.JPG]


Ref. #927; Listening evaluation of sound stage (sound image) using excellent-recording-quality jazz trio album:
[Image: WS00007268.JPG]
 
I used to think that time alignment was critical. After I carefully time aligned my speakers with DSP and sat down to listen, I was astonished at the increased clarity. I kid you not, it really sounded fantastic. But there was something at the back of my mind that was nagging me ... having spent hours carefully examining measurements and nudging the system to perfection, was expectation bias clouding my subjective judgement? After all, I know the measurement looks perfect. Surely a perfect measurement means perfection in sound as well? The most powerful form of expectation bias comes when you know what the measurement looks like. The second most powerful is when you have spent thousands of dollars on something.

So I went around looking for papers, and I found a study by Liski, Makivirta, et al. Using special test signals and headphones, they found that in the range between 500 Hz and 4 kHz (the most critical audio band for time alignment), the audibility threshold for group delay distortion was 0.56 ms. That is huge! Sound can travel 19.2 cm (7.6") in that time.

Well, that was rather deflating. Looks like my lying ears were fooling me again. I haven't yet conducted an experiment where I deliberately mess up the time alignment to confirm to myself that it really is that inaudible.
 
They are recessed because of the waveguide. They handle the time alignment via DSP, which is a vastly superior way of doing it.

When I saw a speaker like that 15 years ago, my gut instinct suggested "aligning the arrival of top and bottom end signals".

I guess even gut instinct can be wrong... sometimes!
 
So I went around looking for papers, and I found a study by Liski, Makivirta, et al. Using special test signals and headphones, they found that in the range between 500 Hz and 4 kHz (the most critical audio band for time alignment), the audibility threshold for group delay distortion was 0.56 ms. That is huge! Sound can travel 19.2 cm (7.6") in that time.
Essentially, I agree with your specific point above, even though I did careful "wave-shape matching time-alignment" measurement and tuning with 0.1 msec precision between woofer and midrange at my XO frequency of 500 Hz, as shared in these two diagrams in my post #504 on my project thread, where I changed the group delay of the midrange from 0 msec to 0.9 msec in 0.1 msec steps.

I first recorded the sound of MD(SQ) only and WO only at 500 Hz by stimulating each with a single 500 Hz sine wave.
[Image: WS00005962.JPG]

As shown in the diagram above, even with single-wave stimulation, MD(SQ) and WO give two aftershocks and one aftershock, respectively, which is reasonable and understandable. My main interest is that, due to the very large difference in inertial mass (mass of the moving parts) between the two drivers, the troughs and peaks of the WO sound lag those of SQ by 0.3 ms to 0.7 ms. Therefore, if I want complete/perfect time alignment between WO and MD(SQ), I need to delay the MD(SQ) sound relative to WO with 0.1 ms precision/accuracy, which I can control with the group delay controller in the DSP software EKIO. I actually did this in 0.1 ms steps of relative MD(SQ) delay from zero to 0.9 ms, as shown here:
[Image: WS003309.JPG]


Although I carefully selected the "0.3 msec" delay setting as giving the best wave-shape matching, the differences in WO+SQ(MD) wave shapes for delays of 0.2 msec to 0.8 msec were (and are) very subtle, as you can see in the diagram above.
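The wave-shape matching above was judged by eye. As an alternative technique (normalized cross-correlation, not the EKIO tool or the author's procedure), one could score each candidate midrange delay by how well the delayed midrange response matches the woofer response, over the same 0 to 0.9 ms range in 0.1 ms steps. The one-cycle 500 Hz bursts and the 0.3 ms woofer lag are hypothetical; real recordings include the aftershocks and room effects described above. The 100 kHz sample rate is chosen so 0.1 ms is exactly 10 samples; at 48 kHz a 0.1 ms step is 4.8 samples and would need fractional-delay interpolation.

```python
import math


def sine_burst(fs: int, f_hz: float, start_s: float, n_cycles: int = 1, length_s: float = 0.01):
    """Idealized 'recorded' response: n_cycles of a sine starting at start_s, zeros elsewhere."""
    n = int(round(length_s * fs))
    start = int(round(start_s * fs))
    cycle_len = int(round(n_cycles * fs / f_hz))
    out = [0.0] * n
    for i in range(start, min(start + cycle_len, n)):
        out[i] = math.sin(2 * math.pi * f_hz * (i - start) / fs)
    return out


def delay_samples(x, n: int):
    """Delay x by n whole samples, zero-filled, keeping the same length."""
    return [0.0] * n + list(x[: len(x) - n])


def normalized_correlation(a, b) -> float:
    """Zero-lag correlation of a and b, scaled to [-1, 1]."""
    num = sum(p * q for p, q in zip(a, b))
    den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
    return num / den if den else 0.0


def scan_mid_delays(mid, woofer, fs: int, step_ms: float = 0.1, max_ms: float = 0.9):
    """Score each candidate midrange delay (0 to max_ms in step_ms steps) by how well
    the delayed midrange waveform matches the woofer waveform."""
    results = []
    for i in range(int(round(max_ms / step_ms)) + 1):
        d_ms = round(i * step_ms, 6)
        shifted = delay_samples(mid, int(round(d_ms * 1e-3 * fs)))
        results.append((d_ms, normalized_correlation(shifted, woofer)))
    return results


if __name__ == "__main__":
    fs = 100_000  # chosen so that 0.1 ms is exactly 10 samples
    mid = sine_burst(fs, 500.0, start_s=0.001)
    woofer = sine_burst(fs, 500.0, start_s=0.0013)  # hypothetical 0.3 ms woofer lag
    scores = scan_mid_delays(mid, woofer, fs)
    for d_ms, score in scores:
        print(f"MD delay {d_ms:.1f} ms: correlation {score:.4f}")
    print("best:", max(scores, key=lambda item: item[1])[0], "ms")
```

Keeping the search window well under one 2 ms period at 500 Hz avoids picking a delay that is off by a whole cycle, since a tonal signal correlates well with itself shifted by one period.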

In careful subjective listening tests using well-QC'ed pink noise and several reference music tracks, I could not audibly distinguish between the 0.3 msec and 0.7 msec MD-delayed sounds of WO+MD(SQ).

Consequently, based on the objective wave-shape-matching measurements and subjective listening assessments above, 0.5 msec precision would be "enough" for time alignment between woofer and midrange (as for midrange/tweeter and tweeter/super-tweeter, fortunately I found no delay between them).

On the other hand, as we can all agree, 1 msec precision would be more than enough for time alignment between subwoofer and woofer, considering the acoustic wavelength of low-frequency sound around 30 Hz to 80 Hz.
 
I certainly agree with you about the multi-path environment in which we listen. If one is serious about hi-fi sound, then the room needs treatment. My RT60 is now around 350 milliseconds and the clarity index 6 dB, mainly because we sit close to the speakers and the room is quite big. I have absorbers on the ceiling and walls, but no bass traps. I am not a big fan of DSP room EQ because I don't believe it's practical. I've been involved in this hobby since building a Linkwitz active loudspeaker in 1977, but I am still amazed at how the audio community argues about topics such as 100 vs 110 dB DAC S/N, vanishingly small distortion and the like, but fails to address the dominant source of impairment to the sound, which comes from the room! It's almost as if the most important aspect gets the least attention. Thanks for your reply.
 
The Liski, Makivirta et al. study tested group delay aberrations created by allpass filters, which is quite different from what you get from two sources playing (almost) the same signal with a time offset.
The allpass does not change the envelope and duration of a wavelet (shaped sine burst), whereas a time offset does: it stretches the summed wavelet and, in extreme cases, creates two distinct events.
The allpass shows "time smear" when comparing wavelets of different frequencies; the time offset smears the wavelet itself. With narrow-band tonal transients, like claves, castanets and many other mostly percussive instruments, this time smear can become audible.
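To illustrate the time-offset half of that argument numerically: summing two identical Hann-windowed 2 kHz wavelets with a time offset lengthens the combined event, and with a large enough offset it splits into two. The 2 kHz tone, four cycles, equal levels and the 10%-of-peak duration measure are assumptions chosen for illustration; the sketch does not model the allpass case or the crossover filters themselves.

```python
import math


def hann_burst(fs: int, f_hz: float, n_cycles: int, start: int, total: int):
    """Hann-windowed sine burst (a 'wavelet') starting at sample `start`."""
    length = int(round(n_cycles * fs / f_hz))
    out = [0.0] * total
    for i in range(length):
        if start + i >= total:
            break
        window = 0.5 - 0.5 * math.cos(2 * math.pi * i / (length - 1))
        out[start + i] = window * math.sin(2 * math.pi * f_hz * i / fs)
    return out


def two_driver_sum(offset_ms: float, fs: int = 48_000, f_hz: float = 2000.0,
                   n_cycles: int = 4, total: int = 960):
    """Sum of two identical wavelets, the second offset by offset_ms (equal levels,
    as for two drivers at the crossover frequency; crossover filter phase ignored)."""
    offset = int(round(offset_ms * 1e-3 * fs))
    a = hann_burst(fs, f_hz, n_cycles, 0, total)
    b = hann_burst(fs, f_hz, n_cycles, offset, total)
    return [x + y for x, y in zip(a, b)]


def duration_above_ms(x, fs: int = 48_000, rel_threshold: float = 0.1) -> float:
    """Time between the first and last samples whose magnitude exceeds rel_threshold * peak."""
    peak = max(abs(v) for v in x)
    above = [i for i, v in enumerate(x) if abs(v) >= rel_threshold * peak]
    return (above[-1] - above[0]) / fs * 1e3


if __name__ == "__main__":
    for offset_ms in (0.0, 0.25, 1.0, 3.0):
        duration = duration_above_ms(two_driver_sum(offset_ms))
        print(f"offset {offset_ms:4.2f} ms -> wavelet lasts {duration:.2f} ms")
```

The 0.25 ms case is half a period at 2 kHz, so the overlapping middle of the two wavelets largely cancels and mostly the edges remain, a reminder that an offset changes the shape of the event, not just its timing.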
 
Correct driver crossover phase correction (hence time alignment) lifts the phantom stage up if your tweeter is above your midrange driver. That's an easily detectable audible improvement if you can switch between calibrations quickly.
 
Surely time alignment is really only about avoiding distortion of the part of the music waveform that spans the crossover region? When the drivers are misaligned, the spectral components arrive at different times, even though they all arrived together before they entered the loudspeaker. I guess what I am saying is that the parameter of interest is the waveform distortion.
 