
Sonic impact of downmixing stereo recordings to mono

I actually worked in studios, and there it's always checked in mono, as most of the playback speakers will be mono. If a mix sounds different in tonality in mono than in stereo, the mix was not good; that was the philosophy then (and it should still be). It's a practice I always used, also in my own low-budget productions at home, or even when mixing dubplates (special one-off remixes for DJs), which I did a lot. Switching from stereo to mono by simple summing is a very good test to see whether your mix is balanced right.
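A minimal sketch of that summing check (my own illustration, assuming NumPy; a real console just flips a mono switch). The trim value and function names are mine:

```python
import numpy as np

def mono_sum(stereo, trim_db=-6.0):
    """Fold a stereo buffer down to mono: (L + R) with a trim to avoid clipping."""
    gain = 10.0 ** (trim_db / 20.0)
    return (stereo[:, 0] + stereo[:, 1]) * gain

def rms_db(x):
    """RMS level in dB; a large drop after summing points to phase cancellation."""
    return 20.0 * np.log10(np.sqrt(np.mean(np.square(x))) + 1e-12)

# A hard out-of-phase element: clearly audible in stereo, gone in the mono sum.
t = np.arange(48000) / 48000.0
left = np.sin(2 * np.pi * 440.0 * t)
stereo = np.stack([left, -left], axis=1)  # right channel polarity-inverted

mono = mono_sum(stereo)
print(rms_db(stereo[:, 0]), rms_db(mono))  # left is ~-3 dB, the mono sum collapses
```

Comparing the levels (and the tonality) before and after the fold-down is exactly the balance check described above.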

I also heard things in the summing that are not right in mono, but to me that is because the music was not mixed right, not because the summing was done wrong.
 
Occasionally I include what I regard as the first well-executed stereo recording in a selection of listening-test tracks, and people are usually amazed by its high standard. Give it a try in stereo; as a side effect, it creates a good mood:

View attachment 455565
Listening to this on Truthear Gates, I agree that this is a fine stereo presentation (nicely dynamic, too). I will try it on some speaker-based systems as well, including summed stereo and one-speaker stereo.

Off topic, but are orchestras ever recorded with the same microphone technique nowadays? I can only imagine how good it could sound with modern hardware.
 
I actually worked in studios, and there it's always checked in mono, as most of the playback speakers will be mono. If a mix sounds different in tonality in mono than in stereo, the mix was not good; that was the philosophy then (and it should still be). It's a practice I always used, also in my own low-budget productions at home, or even when mixing dubplates (special one-off remixes for DJs), which I did a lot. Switching from stereo to mono by simple summing is a very good test to see whether your mix is balanced right.

I also heard things in the summing that are not right in mono, but to me that is because the music was not mixed right, not because the summing was done wrong.
Interesting. So you’re saying that poorly balanced stereo mixing results in poor mono translation/summing? Could you describe an example of a specific component in a mix that this studio would say is poorly balanced?
Is there a generalised type of stereo presentation, e.g. wide, narrow, diffuse, dry, spacious, that typically translates well to mono?
 
In general, a good stereo mix should sum into mono without tonal changes, and where that goes wrong can depend on a lot of factors that need to be tracked down once you've found the error. It's hard to explain in a forum post how you debug that. But tonal change when summing to mono is a red light for a bad mix; that is what I learned from older, experienced mixers. Several mastering engineers told me the same, and they do check, very often on a single NS10 or Auratone-type speaker, as faults of this kind are mostly audible in the midrange. That is also the only reason why you often see a single one in mastering studios (mainly Avantone/Auratone speakers today): to check mono summing.
 
When you think about how many people listen on mono-summed systems like small portable speakers, it makes perfect sense that the mixing/mastering stages would require checks of mono tonality. If we make the fair assumption that a large proportion of published content has been produced to work well in mono, then we can assume that summed-mono listening tests using one speaker will be perfectly valid for judging tonality.

Some people may be overthinking things for the sake of some fringe cases and without evidence we can only really go on Toole’s data.
 
... signal from the mono microphone would still hit the listener's head from +-30deg in a stereo setup; that is what it is mixed and EQ'ed for. If you do a downmix to mono of the same recording, it will usually be presented at a central position, hence a completely different post-HRTF frequency response. A mono speaker correcting this difference would be perceived as 'more natural regarding timbre', while the same speaker in a stereo setup will sound colored.
Exactly, and again all experts just ignore what they don't understand. I see the problem, but don't have a solution. I can offer a solution for another problem, here it is ... .

What @Arindal says is that the problem of downmixing stereo to mono is not to be solved on a technical (signal) level. It cannot be understood without taking the individual Head-Related Transfer Function (HRTF) into account. But that again is easy, see


especially section 2/2. A downmix is impossible if it is understood as targeting human perception. Stereo and mono (aka one speaker only) are different animals altogether.

One may suggest that listening to stereo needs training: some learning to actively ignore (LoL) distractions inherent to the basic principle. This ignoring may interfere with a thorough evaluation of the speaker, especially with regard to resonances and undulations in amplitude response over frequency; the comb filter established by the HRTF and the ear-to-ear crosstalk in stereo fight each other.
 
This has been widely discussed in the pro audio community. There is more or less common agreement that with recordings based on two or more microphones capturing the events which will become exactly central phantom sources (such as main stereo mic arrangements in a concert hall), it is a more or less negligible problem (I will not go into details of center elevation with such recordings). Narrow-band cancellation effects caused by the HRTF and interaurally identical crosstalk are, as mentioned in David's video, the main root of the problem, but by far not the only one related to HRTF and angles for central phantom sources.

If you look at the average 30deg az. eardrum FR relative to 0deg ahead, you notice an additional peak around 5K. This is another band which mixing engineers will address when panning lead vocals as a central phantom source originating from a mono microphone signal, and which is rather unrelated to keeping the exact central position in a nearfield environment.

In any case, a dry mono signal from a microphone very close to a singer will call for EQ to sound more or less natural. No engineer will mix down this signal unchanged, and most will use stereo speakers positioned at +-30deg in a more or less untreated room to judge tonality. There is still a great variety of EQ curves to be expected, but the reference point is the +-30deg of the stereo triangle.

If you now take one channel of any EQ-corrected recording and listen to it at 0deg in mono, well, you will end up applying a correction curve which is not meant for this angle, and you will hear significant colorations. If a speaker sounds 'more natural' under these conditions, you can be sure it is not neutral in terms of tonal balance, but is doing something to reverse the 'wrongly applied' EQ correction.

One might get the idea that placing a mono speaker at -30deg left would solve this problem. Unfortunately it does not. On one hand you have the crosstalk to the right ear, which in particular attenuates the aforementioned 5K band. On the other hand, at -30deg the loudspeaker itself can be localized by our brain pretty precisely as a real source coming in from 30deg left, so the brain expects a different tonality compared to the central phantom source. And tonal balance is, to a certain degree, judged based on pattern recognition.

The only conclusion in my understanding: judging the tonal balance of speakers in mono can only be done with designated material mixed in mono. Using stereo material of any kind is prone to misjudgments, particularly if these speakers will be used in a stereo arrangement later.



If on-axis tonality were solely a matter of preference and individual choice, wouldn't it mean the end of any measurement-based verdicts on loudspeakers, research on preference, controlled listening tests, and optimizing speakers as well as rooms? Your statement sounds as if you are promoting what Amir has called 'the Wild West', where any reproduction curve, as absurd as we can imagine, would be accepted if only someone declared it to meet their taste.

In practice, the HRTF-related tonality issues with phantom source vs. real source, testing in mono vs. stereo, and listening at 0deg or 30deg are much more pronounced than +-1.5dB. The difference at the eardrum within this listening window can be as high as +-11dB relative to the other variant in narrow frequency bands.

Interestingly, it is seemingly not a specific 'correction curve' which introduces errors and misjudgments related to phantom vs. real source tonality. If I recall correctly, Dr. Toole has confirmed in the parallel thread that such a thing as 'disappearing mono localization' exists as a quality of certain speakers, preferred in a mono test. I can confirm that, and my hypothesis would be that it has to do with the indirect soundfield and the directivity of the speaker.
As mentioned, I would like to see published double-blind studies of the subject. This discussion is not new at ASR and it ends with the same conclusion. Lack of evidence either way.
 
What needs to be consistent when evaluating in mono while the recording is stereo is the use of the same channel.

If, for example, the song used is Space Oddity (the stereo version; there's also a mono one, as I found out), there will be vocals or guitars in one channel but not in the other (we're talking hard-panned at some points).

Just for sanity.
 
I'm sorry to be blunt, but this is incorrect or at least out of context.

Yes, you are right about how the difference signal (L-R) is added to allow STEREO receivers to decode and recreate L and R from the summed L+R mono feed.

But, roll back to before stereo FM. It was mono, obviously - and the mono feed was L+R summed. People could only hear music in mono from summed L+R. When stereo was introduced, the approach adopted had to be backwards compatible, hence the pilot tone and suppressed carrier. This meant that people listening on mono FM receivers still only heard the L+R mono feed after stereo was introduced. Also, anyone on the fringes of a transmitter will often only pick up the L+R mono feed even if they have a stereo decoder. In practice this is how many (perhaps the majority if you consider kitchen radios) still experienced the music, despite stereo being available.
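The backwards-compatible multiplex described above can be sketched numerically. This is a simplified model: the injection levels used here (0.45 and 0.1) and the omission of pre-emphasis are my simplifications, not broadcast-exact values:

```python
import numpy as np

FS = 192_000  # sample rate high enough to represent the 38 kHz subcarrier

def fm_stereo_baseband(left, right):
    """Simplified FM stereo multiplex: L+R at baseband, a 19 kHz pilot tone,
    and L-R on a 38 kHz double-sideband suppressed carrier.
    A mono receiver simply low-passes to ~15 kHz and hears only the L+R sum;
    a stereo receiver doubles the pilot to regenerate the 38 kHz carrier."""
    t = np.arange(len(left)) / FS
    msig = 0.45 * (left + right)                   # mono-compatible sum
    ssig = 0.45 * (left - right)                   # stereo difference
    pilot = 0.1 * np.sin(2 * np.pi * 19_000 * t)   # pilot tone
    carrier = np.sin(2 * np.pi * 38_000 * t)       # suppressed carrier (DSB-SC)
    return msig + pilot + ssig * carrier
```

With identical channels the difference term vanishes, leaving only the sum and the pilot, which is why mono receivers were (and are) unaffected by the stereo extension.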

To make this acceptable, broadcast producers making their own content went to great pains to ensure "mono compatibility". Many radio mixing panels have a L/R meter and a M/S meter. If the gap between the M and S meters was too small (or the S was higher than the M) the mix would be corrected so mono listeners were not affected. Every professional in the industry is trained to do this.
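The M/S comparison described above can be sketched as follows (a toy version: real desks use ballistic meters rather than a single RMS over the whole file):

```python
import numpy as np

def ms_levels_db(left, right):
    """Mid/Side levels as read off a broadcast desk's M/S meter pair.
    M = (L + R) / 2 is what a mono listener receives; S = (L - R) / 2 is what
    they lose. If S rivals or exceeds M, the mix is not mono-compatible."""
    mid = 0.5 * (left + right)
    side = 0.5 * (left - right)
    def rms_db(x):
        return 20.0 * np.log10(np.sqrt(np.mean(np.square(x))) + 1e-12)
    return rms_db(mid), rms_db(side)

# A centered source: M dominates and S is negligible, so the mix is mono-safe.
t = np.arange(4800) / 48000.0
center = np.sin(2 * np.pi * 1000.0 * t)
m_db, s_db = ms_levels_db(center, center)
```

The "gap between the M and S meters" in the post above is simply `m_db - s_db`; the correction step keeps that gap comfortably positive.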

In summary, if you are on the fringes of an FM transmitter or you have mono FM receiver, all you hear is mono L+R. This is how millions of people have consumed music for decades. If there was a major issue it would have been discovered in the 1960s.
Yes :) My father was a sound engineer at SR (Swedish public service radio) and later at SVT (public service TV), and then had his own company. He later became a video editor.
He told me how they always checked for mono compatibility, as most radio was consumed through small mono radios. And for TV, the old sets only had a mono speaker.

Well-produced content sounds reasonable at any channel count.

There might be small inconsistencies, of course, but if you manage to close the circle of confusion, so that both producers and listeners have similar speakers, this will be consistently corrected for by the producers.

I think the errors in speakers are large enough. Once we have iterated the circle of confusion another couple of rounds, maybe it will come into play? When listeners and producers are closer?

My amateur layman thought is that this does not invalidate mono comparisons as a powerful tool, and this issue should not be used as FUD to discredit that tool.

But it's there, of course.

And is this a case where perfect is the enemy of good enough?

Edit: sorry, I forgot a NOT; English is not my first language.
 
it makes perfect sense that mixing/mastering stages would require checks for mono tonality

How do you imagine such a check being done practically? I mean, if the summed tonality turns out to be imbalanced on a mono speaker, should the mixing engineer be forced to bypass the EQ curve so that it sounds kinked in stereo? That does not make sense to me.

When listening to mono on some portable speaker or all-in-one Bluetooth speaker, I do not think many people are thinking in terms of hi-fi tonality or studio accuracy. They accept a certain degree of coloration. And if not, manufacturers of such speakers can apply directivity-related tricks, which they will do anyway to make the sound more attractive with the typical music people play on them.

One may suggest that listening to stereo needs training: some learning to actively ignore (LoL) distractions inherent to the basic principle.

I don't think it is necessary to actively ignore the typical properties of stereo. If you are aware of the system's limitations and do not expect a perfectly realistic reproduction in terms of tonality and ambience at the same time, you can listen to the vast majority of recordings of your preferred genres without feeling annoyed.

Again, I am not saying that there is a right and a wrong way, or that two speakers in a room forming a stereo setup must precisely follow certain standards. I am just recommending doing your own listening tests with a wide variety of different recordings in a setup which is as close as possible to the one that will be used for enjoyment later. If you include recordings which are a bit off the average in terms of tonality, dynamics, bass, borderline harshness, and imaging in all directions, you hopefully get a good impression of how to bring your own system closer to the ones that typical mastering engineers are using. If you can listen to 80%+ of the tracks of a random Spotify radio stream of your preferred genres thinking the sound gives you enjoyment, I would declare the goal met.

What needs to be consistent when evaluating in mono while the recording is stereo is the use of the same channel.

What do you do with orchestras following different philosophies of placing the instruments? Take the 2nd violins, for example, with American placement vs. the German/Austrian one: they would either dominate the left channel or disappear from it. Does it mean all recordings following the American placement should be banned from testing?

Another question: if mono testing is as reliable as Amir and others claim, shouldn't it lead to exactly the same verdict regardless of whether you use the left channel, the right channel, or the mixdown sum for comparison? If I understood Amir correctly, he said the results had been identical, so he gave up on downmixing.

Well-produced content sounds reasonable at any channel count.

That is not the case with time-of-arrival (delay-based) stereophony recordings. They are definitely not mono compatible, particularly the ones which to a high degree rely on a widely spaced A/B main mic arrangement creating a wide soundstage solely based on interchannel delay.
 
One simple set-up for a first experiment would be:

- use a single high-quality speaker with even dispersion
- if needed, EQ it to within +/- 0.2 dB from 1-8 kHz on-axis
- use a comparative EQ of about -3 dB at 1.8 kHz and +2 dB at 3.5 kHz (see the Shirley et al. curve in a reflective room); this EQ assumes the opposite EQ has been applied in the mix for center-placed phantom sources
- with several music tracks mixed for stereo using known center-placed phantom sources, use one channel; listeners score for preference, double-blind

Not perfect, of course, because dispersion also changes, but it can give some indication.
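That comparative EQ step could be prototyped with two peaking filters in the RBJ Audio EQ Cookbook form (the Q of 1.0 is my assumption; the experiment description does not prescribe one):

```python
import numpy as np

def peaking_biquad(fs, f0, gain_db, q=1.0):
    """Peaking-EQ biquad coefficients after the RBJ Audio EQ Cookbook."""
    a_lin = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * np.pi * f0 / fs
    alpha = np.sin(w0) / (2.0 * q)
    b = np.array([1 + alpha * a_lin, -2 * np.cos(w0), 1 - alpha * a_lin])
    a = np.array([1 + alpha / a_lin, -2 * np.cos(w0), 1 - alpha / a_lin])
    return b / a[0], a / a[0]

def iir_filter(b, a, x):
    """Direct-form I IIR filter; a slow, dependency-free stand-in for scipy's lfilter."""
    x = np.asarray(x, dtype=float)
    y = np.zeros_like(x)
    for n in range(len(x)):
        y[n] = b[0] * x[n]
        if n >= 1:
            y[n] += b[1] * x[n - 1] - a[1] * y[n - 1]
        if n >= 2:
            y[n] += b[2] * x[n - 2] - a[2] * y[n - 2]
    return y

def comparative_eq(x, fs=48_000):
    """The proposed comparison curve: -3 dB at 1.8 kHz and +2 dB at 3.5 kHz."""
    for f0, g in [(1800.0, -3.0), (3500.0, 2.0)]:
        b, a = peaking_biquad(fs, f0, g)
        x = iir_filter(b, a, x)
    return x
```

A peaking biquad of this form hits its nominal gain exactly at the center frequency, so the -3 dB and +2 dB targets are met at 1.8 kHz and 3.5 kHz by construction.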
 
@Arindal if summed stereo does sometimes change the tonal characteristics of the recording, so what? Tonal balance varies so wildly from recording to recording yet we are still able to use a variety of recordings to form an opinion about a speaker.
 
What do you do with orchestras following different philosophies of placing the instruments? Take the 2nd violins, for example, with American placement vs. the German/Austrian one: they would either dominate the left channel or disappear from it. Does it mean all recordings following the American placement should be banned from testing?

Another question: if mono testing is as reliable as Amir and others claim, shouldn't it lead to exactly the same verdict regardless of whether you use the left channel, the right channel, or the mixdown sum for comparison? If I understood Amir correctly, he said the results had been identical, so he gave up on downmixing.
Reference tracks for evaluation are not the same as listening for pleasure; they are tools, and they are defined and limited.
It's the same kind of consistency: one uses music one knows well.

It's boring (it should be), but if we see them as just tools...
 
I actually worked in studios, and there it's always checked in mono, as most of the playback speakers will be mono. If a mix sounds different in tonality in mono than in stereo, the mix was not good; that was the philosophy then (and it should still be). It's a practice I always used, also in my own low-budget productions at home, or even when mixing dubplates (special one-off remixes for DJs), which I did a lot. Switching from stereo to mono by simple summing is a very good test to see whether your mix is balanced right.

I also heard things in the summing that are not right in mono, but to me that is because the music was not mixed right, not because the summing was done wrong.

Yes, but what you describe is the usual mono-compatibility test, mostly done to make sure there are not too many conflicting elements in the extreme stereo fields causing weird masking and nulls. If you use the mono test for finding tonality problems, you would risk introducing tonal issues when played in stereo. A stereo mix is, of course, foremost meant to be played in stereo. :)
 
As mentioned, I would like to see published double-blind studies of the subject. This discussion is not new at ASR and it ends with the same conclusion. Lack of evidence either way.
Why not do it yourself? What is your own, personal impression, once you have chosen a proper setup? We should not overly emphasize big science. It always starts with humble little experiments after a thorough thinking process, of course. So, what are you actually after in this quest?
I don't think it is necessary to actively ignore typical properties of stereo.
The ear-to-ear crosstalk is expected to be an obstacle. (Needless to say, avoiding it is neither practical, nor would it help with already published recordings ;-) May I add a link to quite an entertaining discussion on stereo effectiveness again:

https://hauptmikrofon.de/theile/1980-2_Diss._Theile_englisch.pdf

A must-read for sure, if not already a triviality after 40+ years of its existence :cool:

It supports my argument for why mono (single-speaker) evaluation might be a good thing. But anyway, if stereo is the use case, why bother with pedantry? Is it about fun or a longing for glory?
 
My hypothesis is that mono listening (both single speakers and quasi-mono signals in stereo) favors speakers with an imbalanced directivity index, particularly those pumping too much energy in the 1-2K band into the room while attenuating the 2-5K and higher bands. The relative level of these two bands is vital for the perception of a frontal or direct soundfield vs. a diffuse field or sound coming from the rear half of the horizontal plane.
If I understand correctly, your hypothesis would predict that the highest-rated speakers in Harman's listening tests (mono) should show lower DI at 1-2 kHz ("pumping too much energy...into the room") and higher DI at 2-5+ kHz ("attenuating"), because directivity index is the listening window minus sound power, and I interpret "favors" as meaning that Harman's trained listeners would express preference for such. Your statements also remind me of the Blauert bands and the so-called BBC or Gundry dip (https://www.audiosciencereview.com/...gundry-dip-in-loudspeakers.23180/#post-773206).
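As a sketch of that spinorama convention: the DI here is listening window minus sound power, per CTA-2034, while the single-arc weighting and all numbers below are my illustrative simplifications, not real speaker data:

```python
import numpy as np

def sound_power_db(spl_db_by_angle, angles_deg):
    """Energy-average SPL over a 0-180 deg polar arc, weighting each angle by
    sin(theta), i.e. the area of its latitude band on a sphere. A crude
    stand-in for the full 70-point weighted average defined in CTA-2034."""
    theta = np.radians(np.asarray(angles_deg, dtype=float))
    w = np.maximum(np.sin(theta), 1e-9)  # the poles carry (almost) zero area
    p2 = 10.0 ** (np.asarray(spl_db_by_angle, dtype=float) / 10.0)
    return 10.0 * np.log10(np.sum(w * p2) / np.sum(w))

def directivity_index_db(listening_window_db, spl_db_by_angle, angles_deg):
    """DI = listening window minus sound power (dB). A lower DI in a band means
    relatively more energy is radiated into the room in that band."""
    return listening_window_db - sound_power_db(spl_db_by_angle, angles_deg)

angles = np.linspace(0.0, 180.0, 19)
omni = np.full(19, 80.0)  # an ideal omnidirectional source
di_omni = directivity_index_db(80.0, omni, angles)  # 0 dB by construction
```

Under this convention, the hypothesized "winner" of a mono test would show a dip in the DI curve around 1-2 kHz relative to the bands above it.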

These graphs from Toole's second edition of Sound Reproduction suggest relatively smoothly increasing DI up to a few kHz and then relatively flat or decreasing up to around 8 kHz for the Revel Salon 2 and JBL Array 1400, two of the four "kings of the hill" at that time from Harman's preference testing (I believe that the other two are here: https://seanolive.blogspot.com/2008/12/part-3-relationship-between-loudspeaker.html). Here is another set of Salon 2 measurements: https://www.spinorama.org/speakers/Revel Ultima2 Salon2/Revel/index_harman-v2-2017.html. An alternate and significantly more limited view of this might be Stereophile's lateral response curves for the Salon 2 (https://www.stereophile.com/content/revel-ultima-salon2-loudspeaker-measurements) and Array 1400 (https://www.stereophile.com/content/jbl-synthesis-1400-array-bg-loudspeaker-measurements). I don't perceive any definite evidence favoring your hypothesis over Toole's "relatively constant, or at least smoothly changing directivity."

Young-Ho
 
Yes, but what you describe is the usual mono-compatibility test, mostly done to make sure there are not too many conflicting elements in the extreme stereo fields causing weird masking and nulls. If you use the mono test for finding tonality problems, you would risk introducing tonal issues when played in stereo. A stereo mix is, of course, foremost meant to be played in stereo. :)
This is not the case today (and never was). Most stereo mixes will be played in mono. Most clubs run in mono; most BT speakers, boomboxes and phones also. And that is what most people listen to. Certainly today a full-blown stereo system is not that common anymore, and even before, there were way more mono transistor radios out there than high-quality stereo systems. The music industry knows that and adapts to it. Audiophiles are always nitpicking about imaging and such, but most people don't care, and the music is mostly made for the mainstream, who do not care. That is the reality, now and in the past. Mono is almost more important (business-wise) than stereo; it is not an afterthought. So your record/CD/digital track should sound the same both ways from the same source, a stereo file/medium.

Remember that this place hosts the nerds, the freaks who care about the little details. 99% of the world's population that listens to music doesn't give a f**k as long as they can hear the songs they like and sing along.
 
if summed stereo does sometimes change the tonal characteristics of the recording, so what?

If such a signal were used in a mono listening test of speakers in order to judge their tonal balance, it would lead to misjudgments. Speakers which compensate for this phenomenon inherent to stereo recordings in one way or another would be more likely to win such a test, while speakers which deliver the best tonal balance in stereo tests would be more likely to be perceived as colored or imbalanced.

I would not find such a test to be useful, but seemingly it is happening.

Tonal balance varies so wildly from recording to recording yet we are still able to use a variety of recordings to form an opinion about a speaker.

That is the point. You cannot foresee the variations which different mixing engineers have applied to their microphone signals across millions of tracks. I think it is a good idea to stay within the range of a standard studio setup and follow its standards as much as is practically possible in a living room. You dramatically increase the chances that a majority of the recordings you might be listening to in the future will sound satisfying and not make you want to jump up and correct them with DSP.

Recording techniques, styles and taste of the mixing engineers do vary, but the number of recordings which are completely off in terms of tonality, is astonishingly low.

If you instead choose a speaker which sounds somehow more balanced with a downmixed stereo recording, compensating for the inherent tonal imbalances of this method, you are much more likely to have a higher number of recordings that are far off average and sound annoyingly colored or boring, deliver dull ambience, reduce perceived proximity, or show any other of the side effects of such reversed compensation.

Reference tracks for evaluation are not the same as listening for pleasure; they are tools, and they are defined and limited.
It's the same kind of consistency: one uses music one knows well.

Having done such evaluation professionally, I am pretty well aware of that. As mentioned, I have auditioned several hundred thousand tracks for whether they are meaningful for listening tests, and have compiled playlists and sampler albums containing roughly 2,000 to 3,000 tracks which I know well enough to choose the one I would consider meaningful in a specific listening environment for the given task.

your hypothesis would predict that the highest rated speakers in Harman's listening tests (mono) should show lower DI at 1-2 kHz ("pumping too much energy...into the room") and higher DI at 2-5+ kHz ("attenuating") ... and I interpret "favors" as meaning that Harman's trained listeners would express preference for such.

I cannot predict what a specific group of listeners would express preference for, but yes, that was pretty accurately the result of almost every listening test I have done myself or taken part in. For most rooms, a constant directivity in that region (1-5K, if not deviating overly from the band one octave lower) is sufficient. If the indirect sound in the room might become dominant, some people who share the same idea even propagate a higher directivity from 0.5-2K compared to a lower one from 2-5K, roughly.

Your statements also remind me of Blauert bands and the so-called BBC or Gundry dip

Blauert's preferred bands were one starting point of this theory, the other being the equivalent loudness for direct vs. diffuse sound as described by Zwicker/Fastl in one of their standard books on psychoacoustics.

The main idea is: if a sound event like a reverb pattern shows an emphasis in the 0.8-2K band, our brain tends to localize it as coming from the rear/sides. If the 2-5K bands dominate over 0.8-2K, we get the perception of it coming from frontal angles. For a mono listening setup, the former is presumably less annoying, as it softens the annoying directness of a mono real source and spreads the angles at which reverb is subjectively arriving, making the mono speaker 'disappear behind the screen', as Dr. Toole has put it. The latter is in favor of a stereo setup in an overly lively room, as it directs the additional reverb to the frontal listening window, 'hiding it' behind the direct sound and making it more likely to blend with the uncolored reverb from the recording.

The latter is also the reason why some loudspeaker manufacturers do this intentionally.

These graphs from Toole's second edition of Sound Reproduction suggest relatively smoothly increasing DI up to a few kHz and then relatively flat or decreasing up to around 8 kHz for the Revel Salon 2 and JBL Array 1400, two of the four "kings of the hill"

The question would be which constant-directivity or decreasing-DI (in the 1-5K bands) loudspeaker without major flaws they have tested as a competitor, and whether this has been done while judging tonality and imaging in stereo as well.

I am not aware of such a test, as loudspeakers specifically advertised as achieving CD over a broad band only appeared after 2010. Around this time I had some eye-opening listening events and tests, some controlled and some not, which made me really sensitive to loudspeakers with an increasing directivity index.

I don't perceive any definite evidence favoring your hypothesis over Toole's "relatively constant, or at least smoothly changing directivity".

As mentioned, this is a relatively young concept, as most of the technology needed to achieve it without major tradeoffs did not exist before the era of DSP-controlled active speakers.

I encourage everyone to do a comparison between a true constant-directivity speaker and one with 'smoothly increasing directivity' in the 0.5-8K band. I do not see a hard threshold between the two, and there are some speakers with a slight increase in DI which one could EQ to satisfaction. I would say a constant plateau between 0.8-5K is the single most important thing, with the neighboring bands not making a step up in DI. But the moment the index increases over several octave-broad bands, particularly if the 3-5K band already has a higher DI, I get pretty cautious. I have had too many moments of disappointment, as the dull, rear-heavy reverb also deteriorates other aspects of sound quality in my understanding.
 
This is not the case today (and never was). Most stereo mixes will be played in mono. Most clubs run in mono; most BT speakers, boomboxes and phones also. And that is what most people listen to. Certainly today a full-blown stereo system is not that common anymore, and even before, there were way more mono transistor radios out there than high-quality stereo systems. The music industry knows that and adapts to it. Audiophiles are always nitpicking about imaging and such, but most people don't care, and the music is mostly made for the mainstream, who do not care. That is the reality, now and in the past. Mono is almost more important (business-wise) than stereo; it is not an afterthought. So your record/CD/digital track should sound the same both ways from the same source, a stereo file/medium.

Remember that this place hosts the nerds, the freaks who care about the little details. 99% of the world's population that listens to music doesn't give a f**k as long as they can hear the songs they like and sing along.

Yes, but the bolded part is an impossible goal to reach when it comes to the particular problem of phantom-panned sources sounding "dull" without EQ adjustments, which will, with the same EQ adjustments, unavoidably lead to an exaggerated sound for those sources when listened to on a single speaker in mono. I don't expect there are many mixing engineers who care enough about the mono mix to make compromises for the sound when played in stereo.
 