
Sonic impact of downmixing stereo recordings to mono

It does: "the brain" does not average, as you put it; it is a matter of different signal-processing chains. As soon as a phantom source is identified, the coloration caused by direction-dependent ear signals is spontaneously ignored. If the phantom source collapses, the coloration is perceived. So much for the direct observation. The modelling of the effect in the second part of Theile's piece is speculative, sure, but not without reason.

I would go even further and say that the identification of phantom sources hinders the detection of coloration, whether it originates in the HRTF or in the speakers.

Anyway, as you say, the phantom image is colorful; does it turn grey once downmixed?

We should first of all acknowledge that evaluating a speaker is a very particular mode of listening. I'm rarely in that mode, don't know about you :rolleyes:
My experience comes from listening in mono and in stereo, using pink noise, music, sweeps and single frequencies in the 500 Hz to 5 kHz range while building crossovers. With stereo setups I have almost always found weaker energy around 1-2 kHz than around 3-4 kHz, even though the measurements looked fine. With music, some voices may just sound a bit unpleasant. I have ended up with slightly more energy at 2 kHz than at 3-4 kHz. That way it sounds better. But again, that's me.

 
The Ascilab actually thanks to ...
Great speakers, great concept, virtually all done right. Congrats!

It comes with implications, though: the constant directivity in its lowest octave relies partly on vertical interference and lobing. This usually results in a slightly different tonality for vertical and horizontal dispersion, influencing the tonality of discrete reflections ...
All talk about directivity is void as long as the room isn't examined with the same scrutiny.

When it comes to evaluating a speaker model by listening to only one of the pair, the chain of premises and conclusions is a bit longer than people think. There are so many uncertainties, caveats and all.

What is the target? In particular, would it evaluate fitness for stereo? Presumably not. I think it was meant to identify the technical parameters of a 'perfect' speaker. Once that is done, all the panel testing in mono isn't needed anymore; today the parameters can simply be measured instead.

To everyone's surprise, the result was: flat, homogeneous. What a shock! (We desperately needed a standard, because 'surround' asked for it: the marketing of certifications.)

Listening to a single speaker is not comparable to listening in stereo, simple as that. Hence a downmix is not primarily a technical problem.
 
This vertical lobing is surprisingly low for a non-coincident design

I agree, the listening window is not affected. Nevertheless, we have a band of almost an octave (0.7-1.4 kHz) that shows a very high vertical directivity index, while the horizontal one is very low (both compared to the neighboring octave, 1.4-2.8 kHz). It is impossible to predict the influence on early reflections and the resulting diffuse field. Someone with trained ears, experienced in judging reverb, has to give it a listening session.

I think that also depends on the listening distance and on proximity to vertical reflections, for example from a desktop.

Yes, it definitely depends on the listening distance. In a nearfield or desktop situation, localization angle and tonality are indeed likely to be affected; under mid-field conditions it can affect localization stability, if our ears can determine the differing positions of 'real sources' through early reflections, in contradiction to the phantom sources. Again, someone experienced has to try it.
 
No speaker is perfect, but in most cases one should never design a speaker around how the music was recorded.

It is safe to say that Dr. Toole's (and related) research has affected almost every speaker mentioned in this thread so far. Along with Amir's efforts, this has led speaker manufacturers to start providing more useful specifications than in the past. Is there room for improvement? Sure. But at this point, the next big opportunity for advancement is in how music is recorded and produced. If better music production standards existed, this thread likely would not.
 
Don't know if this will add anything or just add to the confusion (there seems to be no shortage of it).

But if an audio recording is mixed and encoded in stereo, it has mono (mid) information and sides information.
During mixing, when you pan something out to L or R, you simultaneously lower its mono (mid) level and raise its sides level. That's very basic, but that's how it works, and it's how pseudo-surround decoding ends up with hard-panned elements and reverbs appearing in the surround channels. Good recordings balance mono (mid) and stereo (sides) information; most professionally mixed music does, at least.
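A minimal sketch of the panning claim, assuming a constant-power (sine/cosine) pan law and the averaging convention M=(L+R)/2, S=(L-R)/2; the pan law and the helper names are illustrative choices, not something stated in the thread. Moving a source from center to hard left lowers its mid gain from about 0.707 to 0.5 and raises its side gain from 0 to 0.5.

```python
import math

def constant_power_pan(position):
    """Return (left_gain, right_gain) for a mono source.

    position: -1.0 = hard left, 0.0 = center, +1.0 = hard right.
    Assumes a sine/cosine (constant-power, -3 dB center) pan law.
    """
    theta = (position + 1.0) * math.pi / 4.0  # maps [-1, 1] onto [0, pi/2]
    return math.cos(theta), math.sin(theta)

def mid_side_gains(left, right):
    """Mid and side gains with the averaging convention M=(L+R)/2, S=(L-R)/2."""
    return (left + right) / 2.0, (left - right) / 2.0

for pos in (-1.0, -0.5, 0.0):
    l, r = constant_power_pan(pos)
    m, s = mid_side_gains(l, r)
    print(f"pan {pos:+.1f}: L={l:.3f} R={r:.3f} -> M={m:.3f} S={s:.3f}")
```

Other pan laws (e.g. -6 dB center) change the numbers, but the direction of the effect is the same: panning away from center trades mid level for side level.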

The sides information contains reverb tails, some hard-panned elements and room tone. If there is a lot of side level in a mix, it will feel very wide and may have a certain tonality that is lost in mono. But a good stereo mix should retain its tonality in mono.

As soon as you are off axis, in another room or across the room, or have only one earbud in with mono mode turned on, you're listening in mono.
 
But a good stereo mix should retain its tonality in mono.

In a good time-intensity trading setup, which requires more or less constant directivity over roughly 700 Hz to 7 kHz and possibly some floor and ceiling bounce control, the following can be revealing with such a recording:

1. In the close field, you can quickly switch to the right channel only and listen to the overall tonality of the recording, then turn on the left channel: NOTHING perceptibly changes, except that the image shifts to the center. Then you switch off the right channel, and again nothing changes except that the image shifts to the left.

2. With both channels on, you slowly slide from the left to the right loudspeaker and perceive no tonality or loudness changes, only an interactive image shift that stays summed along your center axis. That's for the close field.

In the far field, switching off either channel results in a 6 dB SPL drop, and as far as perceived loudness of the mono information goes, all of a sudden something is profoundly missing, as your side information suddenly collapses and fuses together in the image. There you can only assess the tonality of each channel separately, and it is recording dependent: the side information dictates whether the image collapse also results in a change of tonality. There's also potential for bias and a high degree of uncertainty with recordings of sufficient complexity. Confusing indeed :)
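The 6 dB figure is what coherent (in-phase) summation predicts. Where the two channels' contributions add in power rather than in pressure (uncorrelated content, or a diffuse far field), the drop would be closer to 3 dB. A small sketch of the two cases; `level_drop_db` is a hypothetical helper:

```python
import math

def level_drop_db(coherent: bool) -> float:
    """dB drop when one of two equal-level sources is switched off.

    coherent=True: in-phase pressure summation (amplitudes add).
    coherent=False: uncorrelated summation (powers add).
    """
    if coherent:
        return 20 * math.log10(2.0)
    return 10 * math.log10(2.0)

print(round(level_drop_db(True), 2))   # 6.02
print(round(level_drop_db(False), 2))  # 3.01
```

Real rooms sit somewhere between the two, depending on frequency and on how correlated the signals are at the listening position.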
 
But If an audio recording is mixed and encoded in stereo, it has mono information and sides information.
In a good time intensity trading setup, which requires more or less constant directivity in a range of ...
Once it is stereo, it isn't mono, right? The question was whether some kind of mono can be derived from stereo, but you just state it as a given. Sounds circular to me, or tautological, if you will. Then directivity again, the universal adhesive that makes every argument stick.

The technique of mono downmixing is fundamental to the "directivity" paradigm, not the other way round. Here's how it goes.

Which measurable parameters of a loudspeaker are essential for good sound reproduction? To determine these, loudspeakers should be listened to and evaluated using industrially produced, commonly used stereo recordings. A preference ranking is established, and a set of parameters is derived from it. This constitutes the modeling of “good.”

However, it turned out that loudspeakers are easier to distinguish when they operate as single units, i.e., in mono. Hence the need for a downmix from stereo to mono. Think about that, let it sink in, and never forget it.

Interjection: Directivity is a parameter of “good,” meaning it is a consequence of the evaluation after downmixing from stereo to mono, not a prerequisite.

The question, then, is whether the sound balance resulting from a downmix from stereo to mono is suitable for ranking loudspeakers for the actual purpose of stereo listening. I would say: absolutely not**, unless there are strong arguments supporting the necessary equivalence. To my knowledge, proponents of mono evaluation have not provided a chain of conclusions targeting that topic.

I give the whole approach the benefit of the doubt. Ultimately, the model mentioned above confirms general standards that had long been implicitly established (after a painstaking period of trial and error): namely, a nicely linear, smooth on-axis response, with a tilt in the diffuse field that results from even, non-abrupt omnidirectional radiation.

To lend some weight to this method, which would be rather untenable by strict standards, one could simply have assigned a limited number of sound engineers to produce mono recordings of everyday sounds, musical instruments, and speech. Well then, that would have cost money and required a defined program. Apparently, this effort was spared—the result of the standardization was simply plausible: a standard can be whatever already exists.

Now you adhere closely to the standard, emphasizing 'directivity' a tiny bit obsessively, but that's engineering on standards, less science. ASR => AER, audio engineering review? But I acknowledge that science is, in American English, often taken as a synonym for engineering.

** Illustrative caveat: a signal with inverted phase (opposite polarity) in the two stereo channels would be clearly audible in stereo, but vanishes in a summing mono downmix; many microphone techniques, M/S and what have you, produce such components.
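The footnote's caveat can be checked numerically. A minimal sketch, assuming a 48 kHz sample rate and two hypothetical test tones: one component identical in both channels, one with opposite polarity in L and R. The summing downmix (L+R)/2 keeps only the common component.

```python
import math

FS = 48_000  # sample rate in Hz (assumed for illustration)

def tone(freq_hz, n, amp=1.0):
    """n samples of a sine tone at freq_hz."""
    return [amp * math.sin(2 * math.pi * freq_hz * i / FS) for i in range(n)]

n = 480
common = tone(440.0, n, 0.5)     # same polarity in both channels
opposite = tone(3000.0, n, 0.5)  # opposite polarity in L and R

left = [c + o for c, o in zip(common, opposite)]
right = [c - o for c, o in zip(common, opposite)]

# Summing mono downmix: the opposite-polarity part cancels (up to float rounding)
mono = [(l + r) / 2 for l, r in zip(left, right)]

residual = max(abs(m - c) for m, c in zip(mono, common))
print(f"largest deviation of mono from the common part: {residual:.1e}")
```

In stereo the 3 kHz component is plainly present in each channel; in the mono sum it is gone.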
 
My understanding of this originally came from mixing music. One technique lets me manipulate the mid (mono) channel and the sides (stereo) channel separately. It's called Mid/Side processing.

An example: increasing the sides gain or reducing the mid gain increases stereo width and reduces transient impact. Increasing the mid gain reduces stereo width and increases transient impact.
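The gain moves described above can be sketched as plain M/S arithmetic. This assumes M=(L+R)/2, S=(L-R)/2 and decoding L=M+S, R=M-S; `adjust_width` is a hypothetical helper, and the perceptual effects (width, transient impact) are the poster's description, which the code does not model. What it does show is that side gain leaves the mono sum untouched.

```python
def adjust_width(left, right, side_gain=1.0, mid_gain=1.0):
    """Mid/side width control on two equal-length lists of samples.

    side_gain > 1 widens, side_gain < 1 narrows (0 collapses to mono).
    """
    out_l, out_r = [], []
    for l, r in zip(left, right):
        m = (l + r) / 2 * mid_gain   # mid (common) part
        s = (l - r) / 2 * side_gain  # side (difference) part
        out_l.append(m + s)          # decode back to left/right
        out_r.append(m - s)
    return out_l, out_r

left = [0.9, 0.2, -0.4]
right = [0.1, 0.6, -0.4]
wide_l, wide_r = adjust_width(left, right, side_gain=1.5)
# L+R of the widened signal equals L+R of the original; only mid_gain changes it.
```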

Another example: adding a high-shelf boost at 10 kHz in the sides channel can add excitement to a mix without changing its mono compatibility or affecting every element of the mix.
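A crude sketch of a side-only treble boost, assuming a first-order shelf built as S + (g - 1)·HP(S) from a one-pole high-pass. This is a swapped-in, simplified filter, not the shelf any particular plug-in uses, and `side_high_shelf` is a hypothetical name. Whatever filter is applied to S, the decoded L+R still equals 2M, so a summing mono downmix is unchanged.

```python
import math

def one_pole_highpass(x, cutoff_hz, fs):
    """First-order (RC-style) high-pass filter."""
    rc = 1.0 / (2 * math.pi * cutoff_hz)
    dt = 1.0 / fs
    alpha = rc / (rc + dt)
    y, prev_x, prev_y = [], 0.0, 0.0
    for sample in x:
        prev_y = alpha * (prev_y + sample - prev_x)
        prev_x = sample
        y.append(prev_y)
    return y

def side_high_shelf(left, right, gain_db=3.0, corner_hz=10_000.0, fs=48_000):
    """Boost treble in the side signal only, then decode back to L/R."""
    g = 10 ** (gain_db / 20)  # linear gain reached well above corner_hz
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    hp = one_pole_highpass(side, corner_hz, fs)
    boosted = [s + (g - 1) * h for s, h in zip(side, hp)]  # shelf on S only
    return ([m + s for m, s in zip(mid, boosted)],
            [m - s for m, s in zip(mid, boosted)])
```

Because the mid path is never touched, the stereo image changes while the mono sum does not.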

And mono testing of a single speaker is absolutely necessary to reveal design flaws. Dr. Toole has an excellent post about this. A speaker that measures well in mono will excel in stereo.
 
Once it is stereo, it isn't mono, right? The question was whether some kind of mono can be derived from stereo, but you just state it as a given. Sounds circular to me, or tautological, if you will. Then directivity again, the universal adhesive that makes every argument stick.
It's a given. The mono channel exists in a stereo recording. Here's how it is decoded:

Mid (M): sums the Left (L) and Right (R) channels (\(M=L+R\)), capturing everything common to both, i.e. the center.

Side (S): subtracts the Right channel from the Left (\(S=L-R\)), capturing only the differences, which is the stereo information.
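A minimal sketch of that encoding and its inverse, using the unnormalized convention above on single samples (the same arithmetic applies sample by sample to a whole buffer). Many tools scale M and S by 1/2 or 1/√2 to keep levels in range; that scaling is a convention, not part of the definition. The function names are hypothetical.

```python
def ms_encode(left, right):
    """Mid = L + R, Side = L - R (unnormalized convention)."""
    return left + right, left - right

def ms_decode(mid, side):
    """Invert the encoding: L = (M + S) / 2, R = (M - S) / 2."""
    return (mid + side) / 2, (mid - side) / 2

m, s = ms_encode(0.8, 0.2)
l, r = ms_decode(m, s)  # recovers 0.8 and 0.2 up to float rounding
```

A mono downmix keeps M and discards S, which is exactly why anything living only in S disappears.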

The technique of mono downmixing is fundamental to the "directivity" paradigm, not the other way round. Here's how it goes.

Which measurable parameters of a loudspeaker are essential for good sound reproduction? To determine these, loudspeakers should be listened to and evaluated using industrially produced, commonly used stereo recordings. A preference ranking is established, and a set of parameters is derived from it. This constitutes the modeling of “good.”
I hope you mean that this idea is backwards and incompatible with good audio science? Speakers should be tested using a tone sweep and a good measurement microphone to measure distortion, frequency response, dispersion, etc. Using a single speaker is the most reliable way to do this.
The question, then, is whether the sound balance resulting from a downmix from stereo to mono is suitable for ranking loudspeakers for the actual purpose of stereo listening. I would say: absolutely not**, unless there are strong arguments supporting the necessary equivalence. To my knowledge, proponents of mono evaluation have not provided a chain of conclusions targeting that topic.
Mono measurements of a single speaker are the most reliable way to find design flaws. A speaker that measures excellent in mono will sound excellent in a stereo pair (assuming manufacturing is consistent).

Downmixing to run listening tests that rank speakers is a flawed approach, as is running stereo listening tests to rank speakers. Any listening test is subjective. Speakers must be measured using appropriate tools like the ones used on ASR. This is the only way to reliably judge speaker performance. Everything else is preference, but at least if you have good data on your side, your preferences will be within the realm of high quality and not based on groupthink or cognitive biases.
 
In the far field, switching off either channel results in a 6 dB SPL drop, and as far as perceived loudness of the mono information goes, all of a sudden something is profoundly missing, as your side information suddenly collapses and fuses together in the image. There you can only assess the tonality of each channel separately, and it is recording dependent: the side information dictates whether the image collapse also results in a change of tonality. There's also potential for bias and a high degree of uncertainty with recordings of sufficient complexity. Confusing indeed :)
Side channels can contain a lot of energy in the upper mids, where hearing is most sensitive, especially in genres like rock, where distorted guitars and cymbals take up a lot of upper-mid presence.

Level-matching the mono and stereo outputs should help decide whether a song sounds good in both mono and stereo. Some songs just don't sound as good in mono, though, even on well-made speakers. But that's a production issue, not a speaker issue.
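A minimal level-matching sketch, assuming plain RMS as the level measure and the pooled RMS of both channels as the stereo reference; a perceptual loudness measure such as ITU-R BS.1770 would be a better match for listening comparisons. `mono_match_gain` is a hypothetical helper.

```python
import math

def rms(x):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))

def mono_match_gain(left, right):
    """Linear gain for the (L+R)/2 downmix so its RMS matches the stereo RMS.

    Stereo reference: RMS of both channels pooled together (an assumption).
    Returns math.inf if the downmix is silent (fully opposite-polarity input).
    """
    mono = [(l + r) / 2 for l, r in zip(left, right)]
    stereo_ref = math.sqrt((sum(v * v for v in left) + sum(v * v for v in right))
                           / (len(left) + len(right)))
    mono_level = rms(mono)
    return math.inf if mono_level == 0 else stereo_ref / mono_level
```

Identical channels need no correction (gain 1), while content that lives in one channel only needs about +3 dB (gain √2) after the downmix, which is one reason unmatched A/B switching can mislead.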
 
On the presence of a mono signal in a stereo recording:

It's a given. The mono channel exists in a stereo recording. Here's how it is decoded:
Mid (M): sums the Left (L) and Right (R) channels (\(M=L+R\)), capturing everything common to both, i.e. the center.
Side (S): subtracts the Right channel from the Left (\(S=L-R\)), capturing only the differences, which is the stereo information.

I don't know where you are coming from. Vinyl records were cut this way, technically, while bass content was just mono. Blumlein defined these terms, M/S, when presenting his invention of the stereo groove. But it has no (nil) correlation to how humans hear; it is just a definition (!!) of the mechanical encoding. He did as you do: mono compatibility is a given (as a selling point). The same is observed in other fields. Terminology gets overstretched beyond meaning, taking the word for reality. (Some recognize the fallacy, some don't, giving rise to social unrest; hence I can't give you an example.)

My previous counter-example still holds (time delay IDT). Here's another one. In stereo, the speakers are listened to in the (in)famous 60° triangle. In mono, presumably, the speakers are positioned straight ahead? If so, the head-related transfer function (HRTF) is really quite different. (That has nothing to do with the speaker's directivity.)
 
Here's another one. In stereo, the speakers are listened to in the (in)famous 60° triangle. In mono, presumably, the speakers are positioned straight ahead?
And it is not only the HRTF. With stereo you have crosstalk from each ear's speaker, so to speak, to the other ear: comb-filter effects that are absent for a real mono source at 0° incidence, but present at 30° in stereo, even for a center (M/S-style mono) signal.
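A rough estimate of where that crosstalk comb filter bites, assuming a rigid spherical head of radius 8.75 cm, Woodworth's far-field ITD formula, a speed of sound of 343 m/s, and equal-amplitude direct and crosstalk arrivals (no head shadowing). Under these assumptions the first cancellation for speakers at ±30° lands a little below 2 kHz, close to the 1-2 kHz region where weaker stereo energy was reported earlier in the thread. A real head attenuates the crosstalk path, so the actual dip is shallower than this idealized model suggests. The helper names are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, roughly room temperature (assumed)
HEAD_RADIUS = 0.0875    # m, common spherical-head approximation (assumed)

def woodworth_itd(azimuth_deg, radius=HEAD_RADIUS, c=SPEED_OF_SOUND):
    """Interaural time difference in seconds for a far-field source (Woodworth)."""
    theta = math.radians(azimuth_deg)
    return radius * (theta + math.sin(theta)) / c

def crosstalk_notches(azimuth_deg=30.0, count=3):
    """Frequencies (Hz) where the direct and crosstalk arrivals cancel at one ear.

    With equal amplitudes the ear signal is x(t) + x(t - tau), which has
    nulls at odd multiples of 1 / (2 * tau).
    """
    tau = woodworth_itd(azimuth_deg)
    return [(2 * k + 1) / (2 * tau) for k in range(count)]

print([round(f) for f in crosstalk_notches()])
```

For a single speaker straight ahead (0°) both ears receive one arrival only, so this comb does not exist.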

Ultimately, I think this proves that your model doesn't hold. What do you think?
 
This is where I am coming from:


In a recording, the mono channel can be extracted separately from the stereo (sides) channel. If there is a tonal imbalance in a piece of music when going from stereo to mono, it's a production problem first and foremost. That loss of spectral information happens first at the processing level, because of what happens at the digital, electrical or component level (depending on whether it's a digital or an analogue switch) when flipping from stereo to mono. THEN the effect is reproduced by the speaker, and then heard.
 
My previous counter-example still holds (time delay IDT). Here's another one. In stereo, the speakers are listened to in the (in)famous 60° triangle. In mono, presumably, the speakers are positioned straight ahead? If so, the head-related transfer function (HRTF) is really quite different. (That has nothing to do with the speaker's directivity.)

In mono, you would still use an equilateral listening triangle. This is what creates the phantom centre psychoacoustic illusion.

Our auditory system localizes sound using:
- Interaural Level Differences (ILD): differences in level between the ears
- Interaural Time Differences (ITD): tiny timing differences (microseconds) between when a sound arrives at the left vs. right ear

When both left and right speakers emit the same signal at the same level and time, the brain infers that the source is straight ahead—because the timing and intensity cues match what would happen if there were a center speaker in front of you. That’s the illusion of a phantom center.
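The same cue logic is often summarized by the stereophonic tangent law, which predicts the phantom image angle from the two channel gains. A sketch assuming the standard ±30° layout, a forward-facing listener in the sweet spot, and level-only panning; the law is usually quoted as a low-frequency approximation. Equal gains put the image at 0°, i.e. the phantom center. `phantom_azimuth_deg` is a hypothetical helper.

```python
import math

def phantom_azimuth_deg(gain_left, gain_right, speaker_half_angle_deg=30.0):
    """Predicted phantom image azimuth (degrees) from the tangent law.

    tan(phi) / tan(phi0) = (gL - gR) / (gL + gR); positive angles point
    toward the left speaker.
    """
    phi0 = math.radians(speaker_half_angle_deg)
    ratio = (gain_left - gain_right) / (gain_left + gain_right)
    return math.degrees(math.atan(ratio * math.tan(phi0)))

print(phantom_azimuth_deg(1.0, 1.0))  # 0.0: same signal, same level -> center
print(phantom_azimuth_deg(1.0, 0.5))  # image pulled toward the left speaker
```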

To summarize:
- The loss of spectral information when downmixing from stereo to mono has a lot to do with how the audio was mixed and produced in the first place. On a well-designed speaker, all you should hear is what was mixed and produced, without additional colouration. In other words, a good speaker won't remove any spectral information beyond what is already inherent in the recording when going from stereo to mono.
- In a dual-mono setup, where the same signal at the same amplitude comes out of two speakers, the listener will perceive all sound as coming from between the speakers, assuming they are sitting in the sweet spot.

A song I know of that loses quite a bit of upper midrange when downmixed to mono is "Dig" by Incubus.
 
In mono, you would still use an equilateral listening triangle. This is what creates the phantom centre psychoacoustic illusion.
Yes, that's a very good idea! For reference:

A single speaker is required, they say. The reasoning is that the spacious stereo effect would distract. But then, why not downmix to mono but keep the stereo triangle, with only a center signal?
 
A single speaker is required, they say
#sigh# it feels to me as if you are consistently missing the point. This is ONLY about detecting speaker flaws, NOT about enjoying music.

The research has determined that speaker faults are easiest to hear when only one speaker is playing. Some source material (dense "pop") is better than others for triggering these faults. The more speakers of identical type are being used, the harder it is to spot the inherent flaws in the speaker design.
 
#sigh# it feels to me as if you are consistently missing the point. This is ONLY about detecting speaker flaws, NOT about enjoying music.

The research has determined that speaker faults are easiest to hear when only one speaker is playing. Some source material (dense "pop") is better than others for triggering these faults. The more speakers of identical type are being used, the harder it is to spot the inherent flaws in the speaker design.
Hope your 'sigh' gave you some relief ;-) At times it's hard to follow so many topics. See my posts #111 and #112 to better track the context. I'm not dismissive of the "mono" approach, but critical. The comment you responded to was meant to be constructive: using two speakers in a triangular arrangement but playing mono from a (virtual) center might circumvent a few caveats of a real "single" speaker, while apparently playing into the hands of those who say the stereo excitement is too distracting. At least it would provide a test of that hypothesis. Well, done?
 
The comment you responded to was meant to be constructive, in that using two speakers in a triangular arrangement but playing mono from a (virtual) center might circumvent a few caveats of a real "single" speaker.
Why invent increased complexity? We have evidence of a simple, repeatable test.
 
Yes, that's a very good idea! For reference:

A single speaker is required, they say. The reasoning is that the spacious stereo effect would distract. But then, why not downmix to mono but keep the stereo triangle, with only a center signal
There is no centre "signal" in a two speaker set up. It's a phantom image created by your ears and brain when two speakers play mono content.

Two separated speakers playing the same mono signal are also not “one speaker, louder”. They form a 2-element acoustic array, and arrays have interference (frequency-dependent summation and cancellation). A Klippel-style measurement of a single loudspeaker tries to estimate the loudspeaker's own transfer function and directivity. Adding a second source changes the system being measured.

If you are measuring two speakers at the same time, you are no longer measuring a loudspeaker. You are measuring a two-element acoustic array at one or more spatial points. Sonarworks does that. It's not useless. It's just not what ASR is about and it doesn't give the kind of information about the design of a speaker that can help us know if the speaker is designed well or not.

... And mono just means the electrical signal is the same. But the acoustic sum depends on phase at the microphone, which depends on distance. Mono only guarantees that the speakers are fed exactly the same signal; what happens after that (what the speakers reproduce) is a matter of design and then acoustics.
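A minimal sketch of that point: two point sources fed the same signal, summed at a microphone. It assumes hypothetical positions in metres (speakers at (-1, 2) and (1, 2)), unit amplitude per source, 343 m/s, and ignores 1/r spreading, directivity and room reflections. At the equidistant point the sum is twice one source (+6 dB) at every frequency; move the mic 30 cm sideways and the path difference produces nulls, starting at c / (2 × path difference).

```python
import cmath
import math

C = 343.0  # speed of sound in m/s (assumed)

def summed_magnitude(freq_hz, mic, spk_a=(-1.0, 2.0), spk_b=(1.0, 2.0)):
    """|pressure| at mic from two unit-amplitude sources fed the same signal."""
    k = 2 * math.pi * freq_hz / C  # wavenumber
    d_a = math.dist(mic, spk_a)
    d_b = math.dist(mic, spk_b)
    return abs(cmath.exp(-1j * k * d_a) + cmath.exp(-1j * k * d_b))

def first_null_hz(mic, spk_a=(-1.0, 2.0), spk_b=(1.0, 2.0)):
    """Lowest frequency at which the two arrivals are half a period apart."""
    delta = abs(math.dist(mic, spk_a) - math.dist(mic, spk_b))
    return math.inf if delta == 0 else C / (2 * delta)

center, off_center = (0.0, 0.0), (0.3, 0.0)
f_null = first_null_hz(off_center)
print(f"first null 30 cm off center: {f_null:.0f} Hz")
```

The electrical signal is identical in both cases; only the geometry changed, which is why this measures an array at a point rather than a loudspeaker.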
 
If you are measuring two speakers at the same time, you are no longer measuring a loudspeaker.
Basically, this restates the problem. The thesis is that the subjective evaluation of loudspeakers should be conducted using a single loudspeaker. Under this assumption, discrimination based on subjective criteria is expected to be more reliable and clearer than in stereo. For this purpose, standard hi-fi recordings that were originally mixed in stereo are used.

This leads to the question of how stereo can be converted to mono without affecting the subjective judgment — that is, without distorting the evaluation criteria when shifting from the relevant application (stereo) to a different level (mono). Since the approach is inherently subjective, the explanatory power of theoretical reasoning is limited.

After extensive discussion of technical details (M/S stereo), we began to question the role of the HRTF. This concerns listening with frontal radiation (mono) versus a 60-degree loudspeaker triangle (stereo arrangement playing mono) with a center phantom source. In both cases, spatial information is deliberately excluded, as it is considered a source of distraction in stereo listening. However, in the 60-degree arrangement, the complex implications of stereo reproduction are still preserved.
 