
Sonic impact of downmixing stereo recordings to mono

Arindal

Consequently, recordings have an enormous influence on one's perceptions of soundstage and imaging.

There is no doubt about that. But I do not see any evidence that the loudspeakers, and the way they interact with the listening room, are of no significance for soundstage and imaging. How the ´sonic panorama´ created by the recording engineer translates to phantom localization, image of the concert hall, depth-of-field, reverb tonality and the like depends to a significant degree on the loudspeakers.

There are vast differences between models in how they create a soundstage, and the directivity index of the speaker over frequency will greatly influence how well the reverb added by the listening room blends with the reverb on the recording, or fails to.

The direct sound is a dominant factor in evaluating sound quality. Clearly it is timbral neutrality that is being looked for by listeners

While I agree the direct sound is the dominant factor for perceived tonal balance, there is a practical problem with testing this on a single mono speaker: where do you source a proper mono recording from to reliably judge timbral neutrality?

This is more difficult than it sounds. There are not many dedicated mono mixes containing music of natural timbre and a healthy reverb ratio sufficient for judging speakers. If you, on the other hand, rely on stereo recordings, you run into a bunch of problems with tonality which might introduce errors and misjudgments of neutrality into your test:

Downmixing stereo recordings to mono will not only cause an imbalance between the instruments spread over the stereo base, it will also result in a kinked tonality, particularly for instruments and voices panned close to the center. As we can assume the recording was mixed and mastered on a stereo setup, the tonality of voices close to the center is recorded or equalized to hit the listener´s head at plus and minus 30deg while creating a phantom localization at 0deg. If you listen to the downmix at 0deg in a mono setup, a loudspeaker whose own kinked tonality corrects for this mismatch will be perceived as ´natural´, while a speaker that sounds completely neutral in a stereo setup will expose the full tonality error.

Ditching one channel of the stereo recording will not help either, as the tonality of a single real source will also deviate from how it was mixed and intended to sound.

Those are just a few thoughts on why I think a preference test of loudspeakers in mono will in many cases lead to misjudgment if the speakers are intended to be used in stereo later. This is independent of the fact that for loudspeaker development, for identifying audible flaws (like resonances, distortion etc.) or for any sort of discrimination test, mono will be superior and more sensitive.
 
@Arindal Do you have an example of a commercial stereo recording that when listened to as mono has this kinked timbre tonality to the vocals you claim?

While I acknowledge the mono mix may change some of the tonality, I have never noticed any major issue with vocals on a mono mix.
 
Maybe I'm missing something, but why does it even matter if the mono downmix has artifacts or not? As long as the same content is played through the speakers under test, it's still a like-for-like comparison. The point is to compare the speakers, and the music is a test signal. It seems to me that it only matters that it's consistent.
 
Maybe I'm missing something, but why does it even matter if the mono downmix has artifacts or not? As long as the same content is played through the speakers under test, it's still a like-for-like comparison. The point is to compare the speakers, and the music is a test signal. It seems to me that it only matters that it's consistent.

Agree, as long as the content is not extremely manipulated in some very artificial way. @Arindal's post alleges that stereo content could result in a mono mixdown that could favor a speaker with tonality issues. Am skeptical but keeping an open mind.
 
Maybe I'm missing something, but why does it even matter if the mono downmix has artifacts or not? As long as the same content is played through the speakers under test, it's still a like-for-like comparison. The point is to compare the speakers, which one can do with any test signal as long as it's consistent.
Thank you. As stated, Pink Noise is incredibly revealing, and avoids the supposed issue of mono artifacts. I am waiting for specific examples of the downmix artifacts somehow (in a Rube Goldberg kind of way) to benefit speakers with resonances and their own artifacts (two wrongs make a right?!?)

Also, we measure the 3D sound field of a speaker. Superposition of sound from a second speaker in a room is standard physics, standard math, not unknown.
 
Maybe I'm missing something, but why does it even matter if the mono downmix has artifacts or not? As long as the same content is played through the speakers under test, it's still a like-for-like comparison. The point is to compare the speakers, and the music is a test signal. It seems to me that it only matters that it's consistent.

I can name one possible problem, which has been talked about earlier in this thread.

Most music mixes are made in a "dry" listening environment using nearfield monitors, where the mixing engineer may have added a 3 dB to 6 dB boost in the 2 kHz frequency range to the usually phantom-centered vocal track to tackle the "dullness" caused by the comb-filtering effect (a phantom-sound problem). If a mix like that is then summed to mono, that boost should sound exaggerated when played by a single flat-measuring speaker positioned right in front of the listener. So why would a flat-measuring loudspeaker be preferred over another loudspeaker that may have a dip in that frequency range, if the criterion is to listen for the most neutral sound?
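
To put rough numbers on the comb-filtering effect mentioned above, here is a minimal sketch. It assumes a very simplified model that is not from this thread: each ear hears the near speaker directly plus the far speaker's identical signal, delayed by about 0.25 ms and attenuated by head shadowing. The function name, the delay and the gain are all illustrative assumptions, and real heads and rooms fill in the notch considerably.

```python
import cmath
import math


def phantom_center_response_db(freq_hz, delay_s=0.25e-3, far_gain=0.7):
    """Level at one ear for two identical speaker signals, relative to the
    near speaker alone.

    Simplified model (assumption): direct path plus one delayed, attenuated
    crosstalk path. HRTF detail and room reflections are ignored.
    """
    crosstalk = far_gain * cmath.exp(-2j * math.pi * freq_hz * delay_s)
    return 20 * math.log10(abs(1 + crosstalk))


# The delayed path arrives 180 degrees out of phase at this frequency.
first_notch_hz = 1 / (2 * 0.25e-3)

for f in (250, 500, 1000, 2000, 4000):
    print(f"{f:>5} Hz: {phantom_center_response_db(f):+.1f} dB")
```

With the assumed 0.25 ms delay the first notch lands near 2 kHz, which is the region the boost is said to compensate. A single speaker at 0deg has no such crosstalk path, so in this model the same boost would go uncompensated in mono.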
 
If a mix like that is then summed to mono, that boost should sound exaggerated when played by a single flat-measuring speaker positioned right in front of the listener. So why would a flat-measuring loudspeaker be preferred over another loudspeaker that may have a dip in that frequency range, if the criterion is to listen for the most neutral sound?
I don't think the criterion was to "listen for the most neutral sound". Preference was what was being tested, and a neutral response was what was found to be preferred, if I understand correctly. If there was a very limited amount of content used for testing I could see a weird quirk like that sneaking in. Given that there were numerous test tracks, I believe, it seems unlikely.
 
Do you have an example of a commercial stereo recording that when listened to as mono has this kinked tonality to the vocals you claim?

Try this one:

Egmont_Haselböck.jpg


Compare the three voices and how each of them blends differently with the enveloping reverb. I would not say more.

While I acknowledge the mono mix may change some of the tonality, I have never noticed any major issue with vocals on a mono mix.

Have you ever activated and deactivated phantom center while listening to a surround mix solely with dry mono speech mixed to the center channel? I think the difference is pretty obvious.

why does it even matter if the mono downmix has artifacts or not?

In this case the issue is not downmix artifacts but a different tonality being preferred by the mixing engineer depending on the studio monitoring setup. If a track originating from a close-mic´ed mono recording is intensity-panned to the center of the stereo base, the dry signal from the mono microphone still hits the listener´s head from +-30deg in a stereo setup - that is what it is mixed and EQ´ed for. If you do a downmix to mono of the same recording, it will usually be presented at a central position, hence with a completely different post-HRTF frequency response. A mono speaker correcting this difference would be perceived as ´more natural regarding timbre´, while the same speaker in a stereo setup will sound colored.
 
Try this one:

View attachment 455244

Compare the three voices and how each of them blends differently with the enveloping reverb. I would not say more.

I think you need to do so for a valid test.

Are you suggesting listening in stereo and comparing to listening to one speaker of the pair in mono? If so, how do you switch between them quickly enough for a valid comparison? Are you compensating for the level difference?
 
For anyone worrying about summing to mono sounding wrong, consider FM broadcast. The L+R are summed to mono. The difference signal is then modulated onto the suppressed carrier. Millions of mono receivers have been successfully playing summed mono to their owners. Statistically this probably means most people have only ever experienced the L+R version of popular music.
 
Are you suggesting listening in stereo and comparing to listening to one speaker of the pair in mono? If so, how do you switch between them quickly enough for a valid comparison? Are you compensating for the level difference?

I am a strong advocate of doing that.

I run a LCR setup just for music. Each speaker can run summed mono. Any two pairs, or all three can run summed mono.
Any two pairs can run in stereo, normal L&R, L&C, C&R.
All three in LCR can run different LCR matrices, à la Gerzon etc.
Level matching for all those combos is via broadband pink (each LCR is identical, main speaker on top of sub)

It teaches me how varied recordings are. It teaches me I never really know what will sound best. It teaches me how the L, C, and R in mono all sound a little different based on room interactions.
Yes, there is a center gravity that heads towards the center of the circle of confusion I guess. And I would say that center of gravity is well done stereo.
But still...it's amazing how some recordings come more alive under a particular choice of setup, usually a LCR matrix.. Sometimes, albeit rarely, single speaker mono rules a track.

The how? A good open-architecture prosound processor that lets you design whatever you want into presets.
I use Q-sys.... there are a number of other good alternatives.
This custom remote, which runs on a PC laptop or any iOS device, allows instant silent switching between anything discussed above, and a good deal more. Think of MitchCo's Hang Loose Convolver, which also lets you incorporate nearly anything into presets or the remote, including I/O routing & selections.

1748989588149.png
 
stereo content could result in a mono mixdown that could favor a speaker with tonality issues.
Possible in theory if the music has certain things mixed way out of phase, they can cancel in the mono mix. This means more holes in the spectrum of music than you had before the mixdown.

This isn't super common, and last time I mixed anything it was best practice to maintain "mono compatibility" to an extent.
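
The cancellation described above can be illustrated with a small sketch. The signals are made up for this example (a 440 Hz "vocal" that is identical in both channels and a 3 kHz "widener" component with opposite polarity in left and right), and the helper names are hypothetical. It only shows the arithmetic of an equal-weight L+R sum, not any particular mix.

```python
import math


def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def mono_sum(left, right):
    """Equal-weight mono downmix: (L + R) / 2."""
    return [(l + r) / 2 for l, r in zip(left, right)]


RATE = 48_000
N = 4800  # 0.1 s of audio

# Center-panned content: the same signal in both channels.
vocal = [math.sin(2 * math.pi * 440 * n / RATE) for n in range(N)]
# Out-of-phase content: added to the left, subtracted from the right.
widener = [0.5 * math.sin(2 * math.pi * 3000 * n / RATE) for n in range(N)]

left = [v + w for v, w in zip(vocal, widener)]
right = [v - w for v, w in zip(vocal, widener)]
mono = mono_sum(left, right)

# What remains in the mono sum besides the center-panned vocal.
residual = [m - v for m, v in zip(mono, vocal)]
print(f"widener level in each stereo channel: {rms(widener):.3f}")
print(f"widener level left in the mono sum:   {rms(residual):.6f}")
```

In this toy case the polarity-flipped component disappears completely from the mono sum, which is the "hole in the spectrum" case. Real mixes usually sit somewhere between full correlation and full cancellation.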

That said, pretty much all music has this problem. Few songs have what you'd call a continuous spectrum with sufficient energy at every frequency.

This is why low-key jazz is often used at audio shows. A sparse spectrum hides speaker flaws. If e.g. there is a nasty resonance at 700 Hz, better make sure to play music that doesn't excite that frequency, which means keeping as close to solo pure tones as you can.

All this is to say that music is IMO okay at best as a diagnostic tool.

Pink noise doesn't give you much of a subjective impression, but it's definitely the best way to identify flaws by ear; even metal isn't as good.
 
I am a strong advocate of doing that.

I run a LCR setup just for music. Each speaker can run summed mono. Any two pairs, or all three can run summed mono.
Any two pairs can run in stereo, normal L&R, L&C, C&R.
All three in LCR can run different LCR matrices, à la Gerzon etc.
Level matching for all those combos is via broadband pink (each LCR is identical, main speaker on top of sub)

It teaches me how varied recordings are. It teaches me I never really know what will sound best. It teaches me how the L, C, and R in mono all sound a little different based on room interactions.
Yes, there is a center gravity that heads towards the center of the circle of confusion I guess. And I would say that center of gravity is well done stereo.
But still...it's amazing how some recordings come more alive under a particular choice of setup, usually a LCR matrix.. Sometimes, albeit rarely, single speaker mono rules a track.

The how? A good open-architecture prosound processor that lets you design whatever you want into presets.
I use Q-sys.... there are a number of other good alternatives.
This custom remote, which runs on a PC laptop or any iOS device, allows instant silent switching between anything discussed above, and a good deal more. Think of MitchCo's Hang Loose Convolver, which also lets you incorporate nearly anything into presets or the remote, including I/O routing & selections.

View attachment 455277

Thanks, I did not consider it impossible to do the routing with the right equipment, but it is not something I (or many others) have handy. So, if you can readily switch between the stereo pair and the center, can you try with the proposed vocal tracks?

To be clear, am interested in the vocal tonality. Do not doubt that stereo can yield some added pleasant sound presentation.
 
For anyone worrying about summing to mono sounding wrong, consider FM broadcast. The L+R are summed to mono. The difference signal is then modulated onto the suppressed carrier. Millions of mono receivers have been successfully playing summed mono to their owners. Statistically this probably means most people have only ever experienced the L+R version of popular music.
FM radio doesn’t just take a regular stereo track and sum it to mono. It’s actually designed for mono compatibility from the start.

In FM stereo broadcasting, the mono signal is the L+R mix, and the stereo info (L−R) is added on a separate layer with a pilot tone to help receivers decode it. So when you hear mono on an FM radio, it's not the same as simply summing a finished stereo mix.

If you just take any stereo track and sum it to mono, you can run into problems like phase issues, weird tonal changes, or things dropping out, especially if the mix wasn’t made with mono in mind.

FM works well because it's built for it but that doesn’t mean all stereo music will sound right summed to mono after the fact.
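
For reference, the sum/difference scheme described above can be sketched at baseband. This is a simplified illustration only: it leaves out the pilot tone, the subcarrier modulation and pre-emphasis, and the function names are made up for this example. In this model the mono receiver simply gets the L+R sum of whatever stereo program is fed to the encoder.

```python
def fm_matrix_encode(left, right):
    """Build the sum (mono-compatible) and difference signals, scaled by 1/2."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def stereo_decode(mid, side):
    """Recover left and right from the sum and difference signals."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


# Arbitrary sample values chosen for illustration.
left = [0.5, -0.25, 0.75, 0.0]
right = [0.25, 0.5, -0.75, 0.0]

mid, side = fm_matrix_encode(left, right)
mono_receiver_output = mid  # a mono set ignores the difference signal
decoded_left, decoded_right = stereo_decode(mid, side)

print("mono receiver hears:", mono_receiver_output)
print("stereo receiver recovers:", decoded_left, decoded_right)
```

Note the third sample, where left and right are equal and opposite: the mono output is zero there, so any out-of-phase content in the source cancels in the mono channel of this model just as it would in a plain downmix.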
 
Possible in theory if the music has certain things mixed way out of phase, they can cancel in the mono mix. This means more holes in the spectrum of music than you had before the mixdown.

This isn't super common, and last time I mixed anything it was best practice to maintain "mono compatibility" to an extent.

That said, pretty much all music has this problem. Few songs have what you'd call a continuous spectrum with sufficient energy at every frequency.

Definitely possible.

Sometimes when I read certain discussions, I start wondering what kind of music people actually listen to. Then I think -maybe it’s just me who has a really extreme taste in music, going from classical to death metal and trance.

I often come across tracks with panned or phase manipulated stereo imaging effects, also on vocals, so it’s pretty clear to me that a lot of music would sound very different from the intended sound if you just summed it to mono.

The effect of summing to mono really depends on the genre and how the music is mixed. And I think it's fair to acknowledge that someone’s personal taste and listening habits determine how much they are aware of the potential issues.
 
FM radio doesn’t just take a regular stereo track and sum it to mono. It’s actually designed for mono compatibility from the start.
Never heard of that. What do they do that is special, other than mono=L+R?
 
If you just take any stereo track and sum it to mono, you can run into problems like phase issues, weird tonal changes, or things dropping out, especially if the mix wasn’t made with mono in mind.

100% correct. Phase issues and comb filtering effects are maybe the easiest thing to detect when doing a mixdown. In the old days stereo recordings were checked on a correlation vectorscope for such incompatibilities. Tonal changes are the least identifiable, and they are pretty likely to induce errors if you do tonality judgements in a listening test with the resulting signal.
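
As a rough software stand-in for the correlation check mentioned above, here is a minimal sketch of a phase correlation figure. It is an assumption-laden simplification: it computes one normalized zero-lag correlation over the whole buffer, whereas hardware meters average over short time windows, and the function name and test tones are made up for this example.

```python
import math


def phase_correlation(left, right):
    """Normalized zero-lag correlation between two channels, in [-1, +1]."""
    cross = sum(l * r for l, r in zip(left, right))
    energy = math.sqrt(sum(l * l for l in left) * sum(r * r for r in right))
    return cross / energy if energy else 0.0


RATE = 48_000
N = 480  # exactly ten cycles of a 1 kHz tone

tone = [math.sin(2 * math.pi * 1000 * n / RATE) for n in range(N)]
inverted = [-s for s in tone]
quadrature = [math.cos(2 * math.pi * 1000 * n / RATE) for n in range(N)]

print(f"identical channels: {phase_correlation(tone, tone):+.3f}")
print(f"polarity flipped:   {phase_correlation(tone, inverted):+.3f}")
print(f"90 degree offset:   {phase_correlation(tone, quadrature):+.3f}")
```

Values near +1 indicate material that sums to mono without loss, while values toward -1 warn of cancellation. A figure like this flags phase problems only; it says nothing about the subtler tonal changes discussed here.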

It teaches me how the L, C, and R in mono all sound a little different based on room interactions.

It is not only room interactions. It is also your HRTF influencing the tonality of sound coming in from different angles. This reveals one of the main flaws of judging tonality in a mono test for speakers that will be used in stereo later: the deviation between real source angle and phantom angle.

Are you suggesting listening in stereo and comparing to listening to one speaker of the pair in mono?

That would be one method, yes. You can start with a mono downmix and try it on different speakers to figure out how each of them delivers the differences between the three voices and how they were embedded in the mix, keeping in mind whether this test would allow a reliable judgment of which speaker delivers the original imaging best.

Listening to the stereo version afterwards might give you an idea of what really happened to these three voices and how it should sound. You need neither A/B testing nor level calibration for that.

As I have said before, the prime motivation for encouraging the audio industry - professional and consumer - to adopt a consistent target for loudspeaker performance is to improve the likelihood that I and other consumers have a reasonable chance of hearing the art as you so carefully created it.

I agree to the general goal.

But seemingly loudspeaker manufacturers have a very very different approach to which aspects are important when it comes to ´coming close to the studio sound´. We can all agree that linear on-axis response and absence of annoying resonances, distortion and alike is a reasonable goal and shared by most. But it is in my understanding also the easiest one to achieve and I do not really need anyone to do a listening test for that. I can design my own response curve thanks to DSP in a minute, and resonance/distortion issues I identify thanks to measurements prior to buying a speaker. Some 40 years ago this was not common, so there was a good reason for having discrimination listening tests in mono to identify such issues. Today you can walk into a shop and buy a $100 portable bluetooth speaker to meet these minimum demands.

Which leaves other aspects of reproduction quality which you cannot easily adjust or identify. Spatial and localization qualities are definitely a great part of the question of how to come closer to the studio sound and the intention of the mixing engineers. They set high-quality audio reproduction apart from just listening to a quasi-mono brick, and are taken more seriously by pro audio and audiophiles alike.

Still I see no hint of how these qualities can be judged in a mono test, or by solely looking at measurements. The majority of loudspeaker manufacturers do not seem to care much about optimizing these properties based on what would be ideal from an acoustical perspective or in the judgment of those who have created the recordings. Some are even moving in the (in my opinion) wrong direction by delivering increasingly imbalanced off-axis response built into their speakers.

When judging loudspeakers and their interaction with a room, I like to rely on recordings which I have witnessed in the making. Ideally I have listened to the whole concert in the auditorium, the whole general rehearsal in the broadcast control room, and the final album mix, and compare these with what I hear in a studio or home. If you are lost in judging what sounds right, I recommend this method. I have selected 30 or 40 albums from my personal catalogue which I regularly use (particularly those where I know the concert hall very well). Other people might have other methods, but IMHO you have to start somewhere when trying to bring home listening conditions closer to studio ones.

It is also a good method for evading the ´circle of confusion´, as you have a real reference point in your sonic memory and witnessed the process of how it translated to the final album.

As a side effect, the percentage of random recordings that give you the feeling ´Ah, I understand how it was originally meant by the recording engineer, and it gives enjoyment´ increases enormously. 80 or 90% of a random Spotify, Apple Music or Roon stream is possible.

Some 20 years ago, I never thought that was possible, as the ratio was rather opposite and most speakers outside an optimized studio were sounding just not right in terms of imaging and ambience. The theoretical knowledge and research from pro audio side was on the table, but that did not translate to products for home use. A series of events (like listening to the new high end speakers at AIR studios mastering suite in London or the MDR control room for Gewandhaus concert hall in Leipzig around that time) made me rethink this, interestingly all driven by people who have equally deep understanding for high end audio, pro audio and recording of acoustic music.
 
@Arindal's post alleges that stereo content could result in a mono mixdown that could favor a speaker with tonality issues.

Imagine the resulting tonality of two loudspeaker signals coming in at -30deg and +30deg horizontally towards your head and which frequency response the eardrums will be presented with, compared to a single signal coming in from 0deg frontal. The differences are quite obvious.

Interestingly this will not necessarily favor a speaker with the exact reversed tonality issues on axis. We have to keep in mind that the stereo signal contains a reverb pattern which was originally mixed for being spanned all over the stereo base. If you downmix this to mono, pretty unexpected things can happen with ambience and perception of proximity. According to my experience a speaker which would partly ´correct´ for these mono- and downmix related flaws by kinking the additional reverb field in the listening room, is most likely to win the mono competition. This one will, on the other hand, be distorting the imaging of a stereo reproduction even further.

My hypothesis is that mono listening (both single speakers and quasi-mono signals in stereo) favors speakers with imbalanced directivity index, particularly those pumping too much of energy in the 1-2K band into the room while attenuating the 2-5K and higher bands. The relative level of these two bands is vital for the perception of frontal or direct soundfield vs. diffuse field or sound coming from the rear half of the horizontal plane.

While those speakers sound somehow more ´dissolving´, ´distant´ and ´natural´ with mono sources (as Floyd Toole has confirmed), they according to my vast experience show a very compromised reproduction of natural reverb, proximity, depth-of-field and how phantom sources integrate into their enveloping reverb.
 
But Amir clearly listens to songs when he does his evaluation.
How do you make the mono signal, Amir?
I initially created mono from stereo tracks in Audition. But I quickly realized this is not necessary. Many of my test tracks have common elements in both channels. So I just connect one channel. It is remarkable how good the experience is with a properly engineered speaker when listening to just one channel!
 
Imagine the resulting tonality of two loudspeaker signals coming in at -30deg and +30deg horizontally towards your head and which frequency response the eardrums will be presented with, compared to a single signal coming in from 0deg frontal. The differences are quite obvious.

As mentioned in my reply to @gnarly, do not have a setup that would readily allow this test. However, I do often design a speaker with a mono signal and later listen to it in stereo. With my music, I have not heard this imagined tonality change. This is why I asked for a recording that might.

Interestingly this will not necessarily favor a speaker with the exact reversed tonality issues on axis. We have to keep in mind that the stereo signal contains a reverb pattern which was originally mixed for being spanned all over the stereo base. If you downmix this to mono, pretty unexpected things can happen with ambience and perception of proximity. According to my experience a speaker which would partly ´correct´ for these mono- and downmix related flaws by kinking the additional reverb field in the listening room, is most likely to win the mono competition. This one will, on the other hand, be distorting the imaging of a stereo reproduction even further.

Am not familiar with your so-called kinking but agree that summing could be problematic. Have just never heard a major issue with vocal tonality in my experience.

My hypothesis is that mono listening (both single speakers and quasi-mono signals in stereo) favors speakers with imbalanced directivity index, particularly those pumping too much of energy in the 1-2K band into the room while attenuating the 2-5K and higher bands. The relative level of these two bands is vital for the perception of frontal or direct soundfield vs. diffuse field or sound coming from the rear half of the horizontal plane.

While this is an interesting hypothesis, you have not shown any documented, repeatable experiment to prove it. Have you even tried waveform analysis of a mono mixdown of your Beethoven vocals? If your hypothesis is untested, then it is just speculation.

While those speakers sound somehow more ´dissolving´, ´distant´ and ´natural´ with mono sources (as Floyd Toole has confirmed), they according to my vast experience show a very compromised reproduction of natural reverb, proximity, depth-of-field and how phantom sources integrate into their enveloping reverb.

This may be true, but I was focusing on vocal tonality to keep the proposed experiment simple for an individual to test. You have not produced anything but claims. On the other hand, Dr. Toole has published research, and it shows the value of evaluating a single speaker for comparing tonality. I can respect your experience but am not ready to simply dismiss my own and Dr. Toole’s supporting published research in favor of an unproven hypothesis.
 