
Group delay produced by stereo

jsilvela

Senior Member
Joined
Dec 1, 2022
Messages
441
Likes
401
Location
Spain
I’m trying to find an explanation for my preference for center speaker vs. phantom center, and for mono over stereo for Zoom calls, videos of talks, podcasts.
I'm about to get mathy, so ... well, anyway.

I recently put a single monitor connected to a DAC on my desktop, which doesn’t have enough room for two speakers. I had never liked Zoom calls and talks/podcasts over the hifi system in my home office. The voices sounded “fuzzy”. Also I don’t like the “voice of god” effect you get from phantom center’s wide imaging, for movies.
With mono computer audio, or a center speaker for movies, voices seem more "in focus".
Of course this is my experience/preference, but I have no doubt I could tell mono from phantom center in a blind test.

I’ve been perusing ASR threads, Dr. Toole’s book, and other sources, and I don’t find conclusive explanations for the differences between phantom center and real center / mono, which I can hear so clearly.

In Dr Toole’s book there is a description, in chapter 7, of the problem with phantom center. The calculations show a 0.27 ms inter-aural delay from Left-Right speakers that are placed 30 degrees from center. The delay induces a destructive interference dip at around 2kHz.
But Toole explains that in normally reflective rooms, the dip is not as noticeable anyway.

I wonder about group delay though… given the cross-talk interference for phase, that should induce some cross-talk smearing for the group delay, and make it potentially bigger than the constant 0.27ms one could expect.

Now, in Toole’s book he mentions a 0.27ms inter-aural delay, and 6dB attenuation of the "second" signal. By the symmetry of the setup, 0.27ms is also the delay between the L and R speaker signals arriving at a single ear.

Doing some math now:

The phase shift created by a time delay is 2π*f*delay where f is the frequency in hertz, and the delay is measured in seconds.
This phase is measured in radians. For convenience, let’s convert to cycles.

phase(cycles) = f*delay

We don’t care about full cycles, so really,
phase(cycles) = fractional-part( f*delay )

This is a “saw” graph. In our case, delay = 0.27ms. To find the frequency at which we get a full 1-cycle delay: f = 1/0.27ms ≈ 3704 Hz. Twice the frequency of the “stereo dip” (the half-cycle delay lands at ≈1852 Hz, i.e. the ~2 kHz dip). Makes sense. The “group delay” of the second signal alone, i.e. the derivative of its phase with respect to frequency, would clearly be the constant 0.27ms.
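As a quick sanity check, here is a minimal sketch of that saw function and the two frequencies it singles out (my own code, using only the constants above):

```python
# Minimal sketch (my own, using the constants above): the fractional-phase
# "saw", and the two frequencies it singles out.
import math

DELAY_S = 0.27e-3  # inter-aural delay from Toole's +/-30 degree geometry

def phase_cycles(freq_hz: float, delay_s: float = DELAY_S) -> float:
    """Fractional part of f * delay: phase of the delayed copy, in cycles."""
    full = freq_hz * delay_s
    return full - math.floor(full)

f_one_cycle = 1.0 / DELAY_S    # delay spans exactly one cycle here
f_dip = 1.0 / (2.0 * DELAY_S)  # half-cycle delay -> the "stereo dip"

print(round(f_one_cycle))  # 3704 Hz
print(round(f_dip))        # 1852 Hz, i.e. the ~2 kHz dip
```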

Now I ask: given that the second signal is delayed by a certain phase, and is 6dB attenuated, what is the combined effect?

6dB attenuated means 1/4 the power, and 1/2 the amplitude.
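A two-line check of those ratios (the exact values are slightly over 1/2 and 1/4, since 6 dB is not exactly a factor of two in amplitude):

```python
# Quick check of the -6 dB figure: amplitude roughly halves, power quarters.
amp_ratio = 10 ** (-6 / 20)    # ~0.501: about half the amplitude
power_ratio = 10 ** (-6 / 10)  # ~0.251: about a quarter of the power
print(round(amp_ratio, 3), round(power_ratio, 3))  # prints 0.501 0.251
```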

So, we’re dealing with a second sinusoid arriving `phase` cycles later, with amplitude 1/2. To get fewer fractions, let’s say the first signal has amplitude 2, the second signal amplitude 1 and delay is `phase` cycles.

We need to add the two phasors. Using vector notation as (x, y) pairs:

Speaker 1: (2, 0)
Speaker 2: (cos(2π*phase), sin(2π*phase)) ... the 2π is there because phase is in cycles…

To get the phase, I add the vectors, and do `tan⁻¹` of the “y” over “x”.
So tan⁻¹(sin(2π x) / (2 + cos(2π x))) / (2π)
with the 2π 's there to get both "input” and "output" measured in cycles.
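The phasor sum above can be sketched like this (my own code; it uses atan2 instead of a bare tan⁻¹(y/x), though here the x-component 2 + cos is always positive, so the two agree):

```python
# A sketch of the phasor sum: direct signal of amplitude 2 at phase 0,
# delayed copy of amplitude 1 at `x` cycles.
import math

def combined_phase_cycles(x: float) -> float:
    """Phase, in cycles, of (2, 0) + (cos 2*pi*x, sin 2*pi*x)."""
    theta = 2 * math.pi * x
    return math.atan2(math.sin(theta), 2 + math.cos(theta)) / (2 * math.pi)

# At x = 0 and x = 0.5 the two phasors are collinear, so the combined
# phase is 0 (at x = 0.5 the amplitude also dips to 2 - 1 = 1):
print(abs(combined_phase_cycles(0.0)) < 1e-9)  # True
print(abs(combined_phase_cycles(0.5)) < 1e-9)  # True
```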

Plotting this in geogebra (https://www.geogebra.org/m/SFacefc2)

[Screenshot: GeoGebra plot of the combined phase, in cycles, against f*delay]


Which, eyeballing it, seems to veer between a mild positive slope and a 45-degree negative slope.
Asking geogebra (politely) to plot the derivative

[Screenshot: GeoGebra plot of the derivative of the combined phase, red dashed curve]



I could not manage to alter the axes in this derivative plot, but we only care about the fragment between 0 and 1 cycle. The rest is repetition.
The red dashed line is the derivative, and as eyeballed, it moves between -1 and about 0.25. (Worked out exactly: with θ = 2π·phase, the slope of tan⁻¹(sin θ / (2 + cos θ)) / (2π) with respect to phase is (1 + 2 cos θ) / (5 + 4 cos θ), which runs from -1 at θ = π to 1/3 at θ = 0.)

So, we have up to 4/3 ≈ 133% "slope difference" peak to peak.

Applying the chain rule:

group delay = [value of red curve] * 0.27ms

So, up to (4/3) * 0.27ms ≈ 0.36ms peak-to-peak shift in group delay.
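To double-check the eyeballed slopes numerically, here is a sketch (my own code, assuming the same 2-to-1 phasor model as above):

```python
# Numerical check of the slope of the combined-phase curve. Slopes are in
# "cycles per cycle of input delay", so they multiply the delay directly.
import math

DELAY_S = 0.27e-3  # the inter-signal delay from above

def combined_phase_cycles(x: float) -> float:
    """Phase (cycles) of the sum of phasors: 2 at 0, plus 1 at x cycles."""
    theta = 2 * math.pi * x
    return math.atan2(math.sin(theta), 2 + math.cos(theta)) / (2 * math.pi)

def slope(x: float, h: float = 1e-6) -> float:
    """Central-difference derivative of the combined phase."""
    return (combined_phase_cycles(x + h) - combined_phase_cycles(x - h)) / (2 * h)

slopes = [slope(i / 10000) for i in range(1, 10000)]
swing_ms = (max(slopes) - min(slopes)) * DELAY_S * 1e3

print(min(slopes), max(slopes))  # extremes come out near -1 and 1/3
print(round(swing_ms, 2))        # peak-to-peak group-delay swing, ~0.36 ms
```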

This is below the Blauert et al. threshold for detection of group delay...

But I've been wondering if phase and group delay really are more important than they are given credit for.

David Griesinger (whom I discovered thanks to an embedded talk) will often comment that to get "realism" in vocals we need true phase-linear speakers.

And in this talk Matthew Poes comments on Griesinger and speculates that there must be something to his claims, and that phase really is more important than previously thought.

What I'm wondering is if the systematic phase/delay smearing from stereo cross-talk effects could be the explanation for the "out-of focus" sensation of stereo speakers fed identical voice signals?
Also wondering if that could contribute to the problems with intelligibility and hearing fatigue often reported at ASR for movie dialog...

EDIT: bunch of edits in the above to clarify what I meant :/
 
Last edited:

IAtaman

Major Contributor
Forum Donor
Joined
Mar 29, 2021
Messages
2,404
Likes
4,154
Also wondering if that could contribute to the lack of intelligibility and hearing fatigue often reported at ASR...
You mean lack of intelligibility of dialogue in movies and such?
 

fineMen

Major Contributor
Joined
Oct 31, 2021
Messages
1,504
Likes
680
What I'm wondering is if the systematic phase/delay smearing from stereo cross-talk effects could be the explanation for the "out-of focus" sensation of stereo speakers fed identical voice signals?
Another chink that may allow the group delay to shine through? Nope, it's quite easy. Your hearing is far too intelligent to be tricked by stereo when listening to such a simple situation. Stereo is ruined by turning the head by a few degrees, even if everything else is kept ideal. Stereo is an effect that needs spectacle, and it has only little to do with music, or any other sort of understanding.

You prove my point.
 
OP
jsilvela

jsilvela

Senior Member
Joined
Dec 1, 2022
Messages
441
Likes
401
Location
Spain
Another chink that may allow the group delay to shine through? Nope, it's quite easy. Your hearing is far too intelligent to be tricked by stereo when listening to such a simple situation. Stereo is ruined by turning the head by a few degrees, even if everything else is kept ideal. Stereo is an effect that needs spectacle, and it has only little to do with music, or any other sort of understanding.

You prove my point.
I don't understand what you mean.
What point of yours am I proving?
Stereo is an effect that needs spectacle
???
 

fineMen

Major Contributor
Joined
Oct 31, 2021
Messages
1,504
Likes
680
I don't understand what you mean.
What point of yours am I proving?

???
Sorry for being unclear. As opposed to many, I'm not so much into perfect stereo imaging. I doubt that this ol' idea of two (and more so with even more) speakers for imaging is a sound one.

I'm pretty much sure that your hearing is healthy. So I expect that it will detect the flaws of the stereo easily, especially with a plain virtually centered sound source. E.g. by just moving the head a little, or rotating it by a few degrees. You know, it tries to spot the location more and more precisely, unconsciously. That's exactly what one must not do with stereo.

Stereo may shine with spectacular effects, but calm it down and it is gone.

I cannot follow your speculations regarding the group delay.
 
OP
jsilvela

jsilvela

Senior Member
Joined
Dec 1, 2022
Messages
441
Likes
401
Location
Spain
Sorry for being unclear. As opposed to many, I'm not so much into perfect stereo imaging. I doubt that this ol' idea of two (and more so with even more) speakers for imaging is a sound one.

I'm pretty much sure that your hearing is healthy. So I expect that it will detect the flaws of the stereo easily, especially with a plain virtually centered sound source. E.g. by just moving the head a little, or rotating it by a few degrees. You know, it tries to spot the location more and more precisely, unconsciously. That's exactly what one must not do with stereo.

Stereo may shine with spectacular effects, but calm it down and it is gone.

I cannot follow your speculations regarding the group delay.
Ah, ok, I get your comments now. Thanks.
I'm also not so big on "imaging". Dr Toole comments that often, one has "mono left, mono right, mono center", and this is what I experience often.
 

fineMen

Major Contributor
Joined
Oct 31, 2021
Messages
1,504
Likes
680
Ah, ok, I get your comments now. Thanks.
I'm also not so big on "imaging". Dr Toole comments that often, one has "mono left, mono right, mono center", and this is what I experience often.
Now, after a day, or even weeks after my attempts to address this topic, I wonder why the audience (pun intended) doesn't care. I may be tempted to conclude that hifi doesn't have anything in common with music. To own a stereo is just an arbitrary person's way to show off consumerist exclusivity. 'Science', as often quoted, could be just another publicly funded level of affirmation. Sounds bad (pun intended)? I feel so ...
 
Last edited:

asrlat

Member
Joined
Feb 25, 2023
Messages
7
Likes
8
This amplifies my feelings of hopeless resignation I get when I think of how music is recorded with mono or stereo mics and then mixed and mastered for stereo speakers and we're supposed to hear it "as it was intended." What was even intended? And we're stuck between "mono isn't enough" and "no amount of speakers is enough." Well, maybe a continuous sheet of loudspeaker material and a heap of processing could do it. Any phantom point between L and R would be a real point source.

Then we have headphones. They don't have this crosstalk problem but music is made for speakers. Do we need personalised HRTF processing or what? No, there's the phantom center again.

So yes, I also think music production and sound reproduction are disconnected. It could also be my lack of understanding. In general.
 

sejarzo

Addicted to Fun and Learning
Forum Donor
Joined
Feb 27, 2018
Messages
977
Likes
1,077
So yes, I also think music production and sound reproduction are disconnected. It could also be my lack of understanding. In general.

No, you are correct. We don't listen with our ears a few inches away from instruments, nor can we hear an electronic signal that's never acoustic until it comes out of a speaker or headphone (as in the case of electronic music). At best, the overwhelming majority of recordings are analogous to paintings rather than photographs.
 

kemmler3D

Major Contributor
Forum Donor
Joined
Aug 25, 2022
Messages
3,352
Likes
6,862
Location
San Francisco
What was even intended?
Whatever the mixing engineer heard in their studio, give or take. This isn't necessarily a path to realism so much as a path to a pleasant abstraction of what the musicians might have actually sounded like in the studio. As sejarzo said, it's more like a painting or collage than a photograph.

I think if a stereo mix was to be compared to visual art, it would be a series of close-ups of the instruments pasted in a line left-to-right, collage-style, which may or may not give an impression of a real scene. It wouldn't usually be a photograph of the actual band. And even if it was, the photograph on your wall wouldn't fool you into thinking the band was really standing right there.

Then there is the camp that says we should strive to reproduce truly realistic (as in "live, unamplified music") sound using two speakers, recorded with 2-4 mono mics, which seems maybe (or maybe not) a more defensible ethos, but also hard to square with the technology, as noted in various ways ITT.
 

sejarzo

Addicted to Fun and Learning
Forum Donor
Joined
Feb 27, 2018
Messages
977
Likes
1,077
It was said back in the 70s that the biggest challenge for sound engineers for rock shows in big arenas was to make the performance sound as good as the LPs.

It's all about perspective, eh?

(Especially given how lousy the PA systems were in that era.)
 
Last edited:

rationaltime

Member
Joined
Jan 30, 2023
Messages
68
Likes
55
This is an interesting topic. I am not prepared at first reading
to accept the assumptions.

I would like to present the situation from my own point of view.

Apparently we wish to reproduce the experience of hearing live sound.
Let's start at acoustic performance with sound reinforcement.

A typical live acoustic group might have musicians spread across
about 12 feet from one end to the other. They stand about 5 feet
from the front of the stage, a little more if sitting. Maybe the
listener is sitting in the 4th row, about 20 feet from the stage.
On the centerline that puts the end musicians about +/- 14 degrees.
The originator can do the math for us, but it means when pointing
the head at stage center the unreinforced sound from an end musician
arrives at the ear on that side about 100 us before the other ear.
That is still discernible for some frequencies, I think.
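That arithmetic can be sketched as follows (a rough check with my own assumed numbers: a simple sin-based ITD model and ~0.175 m effective ear spacing, neither of which is from the post):

```python
# Rough check of the stage geometry: end musician 6 ft off-center,
# roughly 25 ft away (20 ft to the stage plus ~5 ft of setback).
import math

FT = 0.3048           # feet to meters
C = 343.0             # speed of sound, m/s
EAR_SPACING = 0.175   # assumed effective inter-aural distance, m

angle_deg = math.degrees(math.atan2(6 * FT, 25 * FT))
itd_us = EAR_SPACING * math.sin(math.radians(angle_deg)) / C * 1e6

print(round(angle_deg, 1))  # ~13.5 degrees, matching the "+/- 14" above
print(round(itd_us))        # on the order of 100 microseconds
```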

Now add sound reinforcement. There might be a monitor speaker on
stage in front of each musician. While the monitor is pointed at
the musician, the lows diffract around and all the sound bounces
off the stage setting. Sound from the monitors would come from
nearly the same direction as the acoustic sound. Except, in their
monitors the musicians usually want some of everyone and more me.
So, the same sound with a different mix of equalization and
loudness comes from every monitor. The audience hears the monitors
especially in the front rows.

Next add in the house speakers. Mount those pretty high to avoid
blasting the listeners close to the stage. There will be some
delay, depending on the listener's distance from the stage. From
the 4th row call the delay 2 ms. There is something called the
Haas effect. The idea is when you hear multiple copies of a sound
the first arriving wave determines the direction you perceive the
sound. This means if you hear something from a musician you
perceive the house sound as coming from the first direction.
This works up to some delay like 30 ms as I recall. Beyond that
you hear an echo.

There is some tradition for bluegrass musicians to prefer the
use of a single large diaphragm condenser mic, and have all the
musicians sing or play into it. That is mono. Outside of that
we use a mic or pickup for each sound source. Each sound source
is routed to a separate channel. Let's emphasize that. Each
sound source is tracked in mono. Cross feed is not regarded
as desirable. To give an example, you don't want to set mic
levels and equalization for a violin then have the violinist
hold the instrument aside and start singing into the mic.
So far we are discussing live performance, but still.

Now each of those separate channels may be equalized, may be
modified with an effect, and have a level set. Then all the
channels are added together and routed to the house sound.
We want everyone in the audience, independent of location,
to have nearly the same audio experience. So, the house mix
is mono. That is the live sound experience.

My guess is without visual cues it would be difficult for most
listeners to determine the position of the musicians on stage.

All right, that is my story. It is enough for now.
 
OP
jsilvela

jsilvela

Senior Member
Joined
Dec 1, 2022
Messages
441
Likes
401
Location
Spain
These points are all very good, about how stereo and even multi-channel cannot really give you back "the original" and are at best approximations.
What with reverb, Haas effect, group delay and phase effects, comb filtering etc. it is all but impossible.

But I think we get better "evaluation" not from music but from plain human voice or from sounds we hear daily. Things we really know.

E.g. I was recently watching a show in my mother's system, which is 2 Bowers&Wilkins towers.
There was a scene where someone was reading a newspaper, then crumpled it.
Whoa, that sound was loud and harsh, and sounded nothing like a newspaper being crumpled.
I thought the gaffers had completely overdone it.
I played the same scene in my system. And the newspaper sounded like a newspaper.

Then the motivation for my research into group delay: my colleagues' voices in Zoom, played on my stereo system, sounded nothing like people talking. It was not the "dullness at 2kHz" as per Toole's book, but the subjective feeling that the dialog was slurred, out of focus. And there's the voice-of-god effect from the phantom image as opposed to a single radiator.

I have the impression, or the hunch really, that I would be happier with a system that gave me realistic effects for a human voice or a newspaper being crumpled, even if it was at the expense of not having a flat (or downward sloping as is the fashion) frequency response.
 
OP
jsilvela

jsilvela

Senior Member
Joined
Dec 1, 2022
Messages
441
Likes
401
Location
Spain
Now, after a day, or even weeks after my attempts to address this topic, I wonder why the audience (pun intended) doesn't care. I may be tempted to conclude that hifi doesn't have anything in common with music. To own a stereo is just an arbitrary person's way to show off consumerist exclusivity. 'Science', as often quoted, could be just another publicly funded level of affirmation. Sounds bad (pun intended)? I feel so ...
hmm, here I differ. I get what you say. No matter how good Amir's objective rating, no matter the room treatment etc., we can never match the original.
But, do we really need to match the original faithfully? I think incomplete matching can already be very satisfying.
 
OP
jsilvela

jsilvela

Senior Member
Joined
Dec 1, 2022
Messages
441
Likes
401
Location
Spain
BTW I have another pet theory for my preference of single speaker vs. stereo speaker for dialog:
In our rooms, which we can "listen through", we know how a human voice + room reverb sounds.
With stereo speakers playing the same signal, we have way more reverb than we expect for a person talking.
 

Da cynics

Member
Joined
Jun 21, 2020
Messages
82
Likes
39
Very interesting. Could be developed a bit more.
By the way, here is a speaker that was a little hit in the country where I live:

The speakers, marketed for TV with the catchphrase "Clear words without raising the volume", were basically designed to be used in a single unit...
 

kemmler3D

Major Contributor
Forum Donor
Joined
Aug 25, 2022
Messages
3,352
Likes
6,862
Location
San Francisco
my colleagues' voices in Zoom, played on my stereo system, sounded nothing like people talking.
we have way more reverb than we expect for a person talking.
I think this is on the right track. The human voice naturally comes from a single point source. It's a sound our brains will also be highly attuned to. It's reasonable to suspect that any deviation from that physical arrangement could sound (or "feel") different than an actual human speaking / singing.
 

asrlat

Member
Joined
Feb 25, 2023
Messages
7
Likes
8
BTW I have another pet theory for my preference of single speaker vs. stereo speaker for dialog:
In our rooms, which we can "listen through", we know how a human voice + room reverb sounds.
With stereo speakers playing the same signal, we have way more reverb than we expect for a person talking.
Yes, from stereo speakers you get 4 signals into each ear: one direct, one through and around the head, and two potentially very messy ones from the room. No wonder it's not clear.
 

fineMen

Major Contributor
Joined
Oct 31, 2021
Messages
1,504
Likes
680
Yes, from stereo speakers you get 4 signals into each ear: one direct, one through and around the head, and two potentially very messy ones from the room. No wonder it's not clear.
Of course the reverberation from a so-called 'phantom source' is not right. That's why a 'dry' recording is hard to find. With the emphasis on a fully fledged stereo rendition, aka 'imaging' with perfectly sharp localization of those phantom sources, the business had to mix in tons of artificial, nowadays pretty cheap, extra reverberation.

Anecdotally, back in the day when reverberation was not yet available via digital processing, voices were, literally, recorded in airplane hangars ...

Two-channel stereo is just ill-defined, in nearly every aspect. Let's come back to ping-pong stereo :cool:

My idea of narrow side speakers combined with a wide center derived from the two channels + delay may come to the rescue. But for the time being it's my business alone, sadly.
 