Sonic impact of downmixing stereo recordings to mono

Then I think: maybe it’s just me who has a really extreme taste in music, going from classical to death metal and trance.

If you are planning to do meaningful listening tests involving speakers and rooms, that is actually a minimum span of differently recorded music. Regarding classical I would say variations within this genre need to be as wide as with all others combined, from Renaissance era sacred music to 20th century percussion-heavy/jazzy recordings.

Does not necessarily need to be congruent with one's own taste. I have several main genres of music I would never listen to privately, but found them to be pretty useful for loudspeaker testing. Test tracks furthermore have to be chosen depending on results of previous tracks, and you have to know your test tracks and which frequency band they challenge and which flaws they might reveal. I have something in the region of 2,000 to 3,000 tracks I am regularly using for listening tests.

The only category you can completely ignore is 'pseudo-acoustic' close-miked recordings, such as singer/songwriter plus acoustic guitar or soft female vocal jazz. They neither reveal anything nor help with judging what sounds correct in terms of tonal balance, because there is no meaningful reverb pattern and the tonality is an artificial, if not random, product of the microphones at play. IMHO you can ignore reviews in which Sara K, Tracy Chapman, Dire Straits, James Taylor, Eva Cassidy or Rebecca Pidgeon are mentioned.

So if I’m listening to a system that has conjured up an extremely vivid vocal image between the speakers, I like to compare that to a real voice. So I get the person I’m with to stand where the recorded voice seems to occupy in the space between the speakers, and I have the person either sing a bit or even just talk, and I can compare that with the sound of the recorded voice.

Unfortunately this is not a reliable method of judging tonality and imaging. The tonality varies drastically because of the real source vs. phantom source difference, and the interplay of the reverb pattern on the recording, the speaker directivity and the reverb from the real source also creates a different perception. What seems to sound similar in this case is most likely a reproduction flaw, such as a speaker directivity designed to resemble a human voice's dispersion pattern.

I initially created mono from stereo tracks in Audition, but quickly realized this is not necessary. Many of my test tracks have common elements in both channels.

Common elements do not constitute a meaningful balance between the instruments, their position and the perception of resulting reverb spanning over the stereo base. If you cut off all of this, you are most likely to get listening test results which can neither be translated to real mono listening nor to stereo.

It is remarkable how good the experience is with a properly engineered speaker when listening to just one channel!

Next time you read a review judging a 3D video projector, and the reviewer tells you how he refused to wear 3D glasses and covered one of his eyes during the test, you have a pretty good idea about how reliable this review is.
 
FM radio doesn’t just take a regular stereo track and sum it to mono. It’s actually designed for mono compatibility from the start.

In FM stereo broadcasting, the mono signal is the L+R mix, and the stereo info (L−R) is added on a separate layer with a pilot tone to help receivers decode it. So when you hear mono on an FM radio, it's not the same as simply summing a finished stereo mix.

If you just take any stereo track and sum it to mono, you can run into problems like phase issues, weird tonal changes, or things dropping out, especially if the mix wasn’t made with mono in mind.
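The "things dropping out" case can be sketched with two test tones. This is only a minimal illustration under assumed conditions: a 440 Hz tone stands in for a center-panned vocal, a 1 kHz tone stands in for an element mixed with inverted polarity between the channels, and the 0.5 * (L + R) scaling is one common downmix convention rather than anything specified in this thread. All function names are made up for the example.

```python
import math

RATE = 48000  # assumed sample rate for the illustration


def sine(freq_hz, n, rate=RATE):
    """Return n samples of a unit-amplitude sine tone."""
    return [math.sin(2 * math.pi * freq_hz * i / rate) for i in range(n)]


def rms(x):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def mono_sum(left, right):
    """Simple electrical downmix: 0.5 * (L + R)."""
    return [0.5 * (l + r) for l, r in zip(left, right)]


n = 4800
vocal = sine(440, n)           # identical in both channels (center-panned)
pad_left = sine(1000, n)       # same element, polarity-inverted in the right channel
pad_right = [-v for v in pad_left]

left = [a + b for a, b in zip(vocal, pad_left)]
right = [a + b for a, b in zip(vocal, pad_right)]
mono = mono_sum(left, right)

# What is left of the mono sum once the vocal is subtracted:
residual = [m - v for m, v in zip(mono, vocal)]
```

In this constructed case the vocal survives the downmix unchanged while the out-of-polarity element cancels completely. Real mixes sit somewhere between these extremes, which is why the result depends on how the mix was made.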

FM works well because it's built for it but that doesn’t mean all stereo music will sound right summed to mono after the fact.
I'm sorry to be blunt, but this is incorrect or at least out of context.

Yes, you are right about how the difference signal (L-R) is added to allow STEREO receivers to decode and recreate L and R from the summed L+R mono feed.

But, roll back to before stereo FM. It was mono, obviously - and the mono feed was L+R summed. People could only hear music in mono from summed L+R. When stereo was introduced, the approach adopted had to be backwards compatible, hence the pilot tone and suppressed carrier. This meant that people listening on mono FM receivers still only heard the L+R mono feed after stereo was introduced. Also, anyone on the fringes of a transmitter will often only pick up the L+R mono feed even if they have a stereo decoder. In practice this is how many (perhaps the majority if you consider kitchen radios) still experienced the music, despite stereo being available.

To make this acceptable, broadcast producers making their own content went to great pains to ensure "mono compatibility". Many radio mixing panels have a L/R meter and a M/S meter. If the gap between the M and S meters was too small (or the S was higher than the M) the mix would be corrected so mono listeners were not affected. Every professional in the industry is trained to do this.
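The M/S comparison described above can be sketched roughly as follows. This is not how any particular broadcast desk implements its metering: the function names are hypothetical, and the 3 dB minimum gap is an assumed placeholder, not a figure from this thread or from a standard.

```python
import math


def rms_db(x):
    """RMS level in dB (relative to full scale 1.0); -inf for silence."""
    r = math.sqrt(sum(v * v for v in x) / len(x))
    return 20 * math.log10(r) if r > 0 else float("-inf")


def mid_side_levels(left, right):
    """Return (M, S) levels in dB, with M = (L+R)/2 and S = (L-R)/2."""
    mid = [0.5 * (l + r) for l, r in zip(left, right)]
    side = [0.5 * (l - r) for l, r in zip(left, right)]
    return rms_db(mid), rms_db(side)


def mono_compatible(left, right, min_gap_db=3.0):
    """True if M exceeds S by at least min_gap_db (assumed threshold)."""
    m_db, s_db = mid_side_levels(left, right)
    return (m_db - s_db) >= min_gap_db
```

A center-panned signal gives a silent S channel and passes, a polarity-inverted pair gives a silent M channel and fails, and a hard-panned signal shows equal M and S levels, so it fails any positive gap requirement.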

In summary, if you are on the fringes of an FM transmitter or you have a mono FM receiver, all you hear is mono L+R. This is how millions of people have consumed music for decades. If there was a major issue it would have been discovered in the 1960s.
 
Since the original tests we have repeated stereo vs. mono tests a few times to satisfy ourselves and critics, with the same result. The most recent elaborate examination was done in 2008 with the result shown in post no. 54 in this thread.

If I understand your post correctly, this test refers solely to identifying audible resonances in one or two of the candidates which makes it rather a discrimination test. The result looks absolutely plausible to me, but it does not say anything about overall preference in the sense of perceiving a loudspeaker performance as ´natural´, ´realistic´ or however you want to call it.

To verify whether preference in mono listening tests, in terms of imaging, transparency and natural tonality, always translates to preference in stereo, I would expect a test to employ no candidates showing audible flaws like resonances, to offer anechoic equalization between the scenarios, and to use mainly recordings containing natural reverb.

Long story short, they were all found to have hearing loss, which as you no doubt know, is an occupational hazard in the pro audio world.

No disagreement from my side. But you are referring to an era when comparably high monitoring SPL was common, and some recording engineers presumably even believed in some kind of 'the louder, the better I can judge dynamics and details' philosophy.

This has changed drastically in my view over the last 30 years. Particularly when working with recording engineers specializing in classical music or working for public broadcast institutions such as culture radio, the average SPL in control rooms and editing suites is surprisingly low. Many studio engineers specifically demand speakers that allow full judgement of details and bass at 85dB SPL max. You can gather a bunch of them aged 50+ and most of them hear 14kHz without showing signs of insensitivity.

Seemingly, this has been addressed in later tests (which I could overview back to roughly 1999) with lots of students of recording engineering being invited and lots of younger pro audio people taking part (preferably not FOH engineers and sound reinforcement people).

This is probably because you are hearing mono signals in the hard panned L & R sounds. This is the reason why the same mono preferred loudspeakers are preferred in stereo and multichannel - the resonances are able to be heard during occasional "mono" intervals.

I was referring to tests solely using monaural signals, so no hard intensity-panned signals on the flanks.

As mentioned, I was meaning controlled listening tests asking for a verdict on localization stability, proximity, reverb envelopment, ambience and overall natural imaging, not discriminating or distinguishing resonances.

So the result was quite the opposite from yours: What was preferred in mono and with monaural signals in stereo (or very dry, center-panned stereo signals), was leading to an unnatural imaging in stereo, the added reverb detached from the phantom sources, making it impossible to judge proximity. This test was repeated with students of recording engineering who were asked to choose the correct direct/reverb ratio in the concert hall with musicians playing, according to how they perceived reverb and proximity in the control room. Interestingly, on the speakers preferred with mono signals (which were common waveguide-2-way concepts having increasing d.i. towards higher frequencies), they added a ridiculous amount of overshoot ambience, with great variation and uncertainty. With constant directivity speakers being generally preferred in stereo, the ratio was not varying and seemingly balanced in all cases.

I wonder how you would judge all these things in a mono test. BTW the speakers were identical in terms of anechoic response, driver choice, geometry, enclosure.

Whatever the soundstage and imaging, it is good to begin with loudspeakers capable of reproducing accurate timbre. Now they can be recognized in measurements.

Agreed. But are you aware of any controlled test being made and published comparing solely speakers which are recognized as accurate in terms of timbre, free from audible flaws like resonances and outright perfect in standard measurements?

This is where our life experiences differ. Mine is more than personal,

Mine too. As mentioned, I am referring to controlled tests being done by an independent research institute involving a significant number of students (mainly of acoustic engineering, recording engineering) and recording engineers. As this was touching sensitive procurement processes of big broadcast institutions, results are not publicly available, but all of this has greatly influenced technical requirements for studio monitors, and lots of pro audio loudspeaker manufacturers have eventually chosen to adopt these ideas, particularly the concept of constant directivity.

Funnily, my personal experience realizing that I always prefer speakers which are linear both on- and off-axis, came much later than being involved in first controlled listening tests.
 
I do often design a speaker with a mono signal and later listen to it in stereo. With my music, I have not heard this imagined tonality change.

If you use a designated monaural mix, I would not expect this tonality change to be dramatic. All you hear is the difference between +-30deg sound coming in compared to 0deg horizontally. This tonality change should be audible, in some cases in the form of a phantom center elevation.

Have you even tried waveform analysis of a mono mixdown of your Beethoven vocals?

What do you expect to see in the waveform?

The spectral analysis between the two narrators is seemingly very different as their voices differ. My guess would be that it is difficult to differentiate what is voice and what is microphone/mix-related. But you can clearly hear the difference. I did not find another recording making it that obvious.

I can respect your experience but not ready to simply dismiss my own and Dr. Toole’s supporting published research in favor of an unproven hypothesis.

Pretty fair, that is why I encourage everyone to do their own experiments or repeat experiments which I have done in the past.

I might want to add that research published by Dr. Toole seemingly does not address any of the questions a lot of recording engineers (and studio acousticians and monitor manufacturers) find relevant. There is not really much research published in this field although a lot of controlled tests have been done over the course of the last 25 years.
 
Thanks, did not consider it impossible to do the routing with the right equipment, but is not something I (or many others) have handy. So, if you can readily switch between the stereo pair and the center, can you try with the proposed vocal tracks?

To be clear, am interested in the vocal tonality. Do not doubt that stereo can yield some added pleasant sound presentation.
Thanks for the clarification in terms of your interest. I needed to re-read the last few pages of the thread to see what was going on.

Ok, using Arindal's recommendation in #289....
Compared to normal stereo, stereo summed to mono on a single speaker gives a clear upward shift in vocal tonality.
The reverb in stereo sums with, and lowers, the vocal tonality. The reverb in mono pretty much disappears, leaving the vocal brighter.
 
In summary, if you are on the fringes of an FM transmitter or you have a mono FM receiver, all you hear is mono L+R. This is how millions of people have consumed music for decades. If there was a major issue it would have been discovered in the 1960s.

I think the fact that a "major issue" wasn’t reported doesn’t mean one didn’t exist, just that expectations, technology, and standards were different. People didn’t complain because they didn’t have stereo to compare with, and mono mixes were often produced separately to ensure compatibility.
 
Compared to normal stereo, stereo summed to mono on a single speaker gives a clear upward shift in vocal tonality.

Thanks, interesting! Could you specify the frequency band that has mainly caused this difference in tonality pls? It is interesting because a downmixed stereo signal coming from 0deg in mono is usually expected to show slight subjective attenuation in the 0.8-2K band (if I recall it correctly, due to more pronounced cheek shading) and a pretty noticeable one in a narrow band slightly above 5K (less diffracted waves reaching the ear canal due to their direction being 30deg further away from the angle of maximum level). As the former band signals 'sound from behind' to our brain (with increasing level) and the latter 'slightly elevated sound', it should also affect the subjective perception of proximity and elevation.

The reverb in mono pretty much disappears, leaving the vocal brighter.

Is this the same with all three voices on your speakers? That was the main point why I have chosen this recording.

By chance stumbled across another very interesting recording to demonstrate different phantom center vs. real source localization. Will need to make a check later and post a link if anyone is interested.
 
Is this the same with all three voices on your speakers? That was the main point why I have chosen this recording.

By chance stumbled across another very interesting recording to demonstrate different phantom center vs. real source localization. Will need to make a check later and post a link if anyone is interested.

My apologies..... I never found the three voices track. Should have said so. I was reporting on a single male voice. Could you pls link or point to the right track?

Thanks, interesting! Could you specify the frequency band that has mainly caused this difference in tonality pls?
Sure, I'll pull a couple of RTAs at the listening position when I have the right track. And hope they are not too room corrupted to make sense of :)
 
Thanks for giving this a try. As it is only a specific recording intended to demonstrate the intended problem, would be interested if you tried some music that you know well.

If you are willing to spend more time, going to move the discussion into its own thread. :)

I've tried so many well known tracks for mono to stereo comparisons. Both for evaluating DIY speakers, and because I have a keen interest in LCR matrices that I can build into a processor. (Particularly the Gerzon energy preserving type).

Usually, lead vocals are quite the same in regular L-R stereo, summed mono to any single L, C or R, or dual L-R mono.
I've found C mono is the best speaker to compare to L-R stereo because both of those present a solid centered image.

If there's one part of a track I can usually count on to stay close to the same as I run through all the various speaker playback alternatives, it's been lead vocals.
I've always assumed it's because lead vocals are usually so well centered in stereo, that they have to be present rather equally in both L&R sides.
But what do I know about either mixing, or mastering....nothing !

My guess is, it takes vocals with reverb or other effects, like Arindal talks about, to break up that consistency I've observed between mono and stereo.
And I'm not a fan of reverb, so maybe that's why my experience has been what it is.

I'm definitely willing to spend more time in another thread, if others are too.
 
Could you pls link or point to the right track?

Sorry. It is track no. 2 (monologue and song). There are no three voices at a time but on disc 1 you find the version with soprano + German narrator, on disc 2 soprano + English narrator (yes, it is indeed John Malkovich´s voice). Most obvious reverb issues can be spotted on tracks no. 10 on both discs, the Finale.

If I recall it correctly, this project was meant to record a bunch of famous compositions by Beethoven in the concert hall of the original premiere (or in this case a theatre similar to it), capturing as much of the original ambience and reverb as possible on CD with only minimum use of spot microphones.
 
Thanks for the clarification in terms of your interest. I needed to re-read the last few pages of the thread to see what was going on.

Ok, using Arindal's recommendation in #289....
Compared to normal stereo, stereo summed to mono on a single speaker gives a clear upward shift in vocal tonality.
The reverb in stereo sums with, and lowers, the vocal tonality. The reverb in mono pretty much disappears, leaving the vocal brighter.
@amirm in your experience listening to one half of the stereo mix through a single speaker, do you experience such a shift?
My basic applied logic says that your approach might actually be preferable to summed mono because the dry versus reverb content ratio remains the same for many recordings. And that selecting recordings that don’t have too much hard panned instruments or voices would make for good material for single speaker tests.
 
your approach might actually be preferable to summed mono because the dry versus reverb content ratio remains the same for many recordings.

How reverb from the recording is perceived, is highly dependent on other factors than the simple level of reverb. One factor is how reverb is spread over different angles, how it is blending with the reverb added in the listening room, how it is masked by direct sound, not to forget its tonality and reflectogram over time and to which degree our brain is capable of linking the direct sound event and its footprint in the reverb to follow.

Stereo reverb which spans the whole stereo base will lead to unpredictable effects in mono, no matter whether you do a downmix or ditch a channel. Member @gnarly has correctly described the situation in which reverb in mono is 'disappearing behind the direct sound', which is typical for artificial or decorrelated reverb.
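One part of this can be put into numbers with an idealized sketch. The assumptions are strong and mine, not taken from any post above: the direct sound is identical in both channels, the reverb is fully decorrelated between the channels, and white noise stands in for both. Under those assumptions a 0.5 * (L + R) downmix keeps the direct sound at full level while the reverb adds in power only, so the direct-to-reverb ratio rises by about 3 dB compared with listening to a single channel. It says nothing about how the reverb is perceived in actual stereo playback.

```python
import math
import random


def rms(x):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def db(ratio):
    """Convert an amplitude ratio to decibels."""
    return 20 * math.log10(ratio)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
n = 20000

# Noise as a stand-in for a dry, center-panned voice (same in L and R).
direct = [rng.gauss(0.0, 1.0) for _ in range(n)]
# Independent noise per channel as a stand-in for decorrelated reverb.
reverb_left = [0.5 * rng.gauss(0.0, 1.0) for _ in range(n)]
reverb_right = [0.5 * rng.gauss(0.0, 1.0) for _ in range(n)]

# Direct-to-reverb ratio when only the left channel is played.
drr_single_db = db(rms(direct) / rms(reverb_left))

# In the downmix the direct part is 0.5 * (d + d) = d, the reverb is averaged.
reverb_mono = [0.5 * (a + b) for a, b in zip(reverb_left, reverb_right)]
drr_mono_db = db(rms(direct) / rms(reverb_mono))

drr_shift_db = drr_mono_db - drr_single_db  # close to 3 dB in this idealized case
```

With partially correlated reverb, as in many real recordings, the shift would be smaller and frequency dependent, which fits the point that the outcome is hard to predict from the recording alone.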

selecting recordings that don’t have too much hard panned instruments or voices would make for good material for single speaker tests.

It is simply not foreseeable what the outcome would be. Stereo recordings vastly differ in the way they contain reverb and meaningful spatial qualities. I would say the only proper way of doing a mono listening test is using material which was mixed and mastered in mono. If you use it the way Floyd Toole has described, it is actually pretty useful, e.g. for identifying resonance bands or horn colorations. The easiest way is to use spoken word material for that, which is available in mono.

At some point wouldn't it be easier to use a suite of synthetic tests?

It depends on your individual method of conducting such tests. I do not have a standard 'program' but decide depending on the room, the speakers and the previous sessions which tracks to try next. For example, for identifying room modes, bass reflex problems, booming, bloated bass, impulse response or compression, I would go through a few hundred tracks which I had selected specifically for identifying bass problems. As I am pretty familiar with all these 2,000+ tracks, I can skip pretty quickly through the playlists aimed at testing bass, needing just something like 5 seconds per track to understand if I have nailed the problem. Synthetic tests would not help to judge here, as existing recordings vary very much.

The streaming queue looks horrible afterwards, and for people witnessing this it might be very, very annoying.
 
My scientific contribution to the thread:

[Attached image: 1749067940174.jpeg]
 
I once played with recording using in-ear microphones with mono pink noise using a single speaker (0°) and the same with two speakers, stereo setup. Made a mono file of both and uploaded at ASR for comparison. There is no doubt there is a change in "timbre" of the pink noise.
 
It is simply not foreseeable what the outcome would be. Stereo recordings vastly differ in the way they contain reverb and meaningful spatial qualities. I would say the only proper way of doing a mono listening test is using material which was mixed and mastered in mono. If you use it the way Floyd Toole has described, it is actually pretty useful, e.g. for identifying resonance bands or horn colorations. The easiest way is to use spoken word material for that, which is available in mono.
Thanks for the considered responses.
 
Common elements do not constitute a meaningful balance between the instruments, their position and the perception of resulting reverb spanning over the stereo base. If you cut off all of this, you are most likely to get listening test results which can neither be translated to real mono listening nor to stereo.
Seems like the plot is lost and you are speaking from lack of experience. I suggest you remedy the latter before arguing more. Turn off one speaker and listen.

Until then, my collection of test tracks is from what sounds excellent on both stereo headphone and speaker listening. Out of that large set, I have narrowed down the list to a handful of critical tracks for testing various aspects of speaker performance. Here is the key thing: on any well designed speaker, all 200 to 300 tracks sound wonderful in mono! I can't think of a single scenario where it is the aim of a mix/mastering engineer to produce poor tonality in one speaker so as to then compensate with the other. I repeat: these tracks are hugely enjoyable listened to with one speaker. You need to try it and then come back and complain.

Then you have to understand what makes a good test track:

1. It has to test for tonality. For this, I mostly use female vocals that bring this to the forefront and per above, sound very nice on stereo system as well as single speakers. If the track you picked doesn't do this, then go find another. There are tens of millions of pieces of music out there in subscription services.

2. The spectrum needs to stay more or less constant, so that as you switch between speakers, or in my case switch between EQ settings, the content doesn't change. This is why some use pink noise. I don't because I don't like to listen to noise.

3. I have test tracks for specific things such as sub-bass performance. My chosen track has this spectrum in both channels so I don't need at all to play in stereo.

These are the critical things to test for in a speaker: what is the tonality and can it have full range and play loud enough. They are trivially tested in mono, far easier than stereo.

The stuff you talk about is positioning and room dependent. Conveying that to others is useless as no one is going to be able to replicate your setting. Further, you don't know those elements for when the music was produced. So you have no idea if what you are determining is right anyway. A dipole speaker surely generates spatial effects that are not real and were never heard in the studio. Combine two such speakers and you are just dealing with fantasies as opposed to mimicking what someone may have set up.

Bottom line is simple: you need to go and perform single speaker testing. Get some experience under your belt as those of us who have done it. The few who have done it on ASR, have become instant believers. Don't just throw lay intuition at us and arguments we already know. Physics says as you approach speed of light, time essentially stops for you and distances become incredibly small. There is nothing intuitive about this but it is a proven fact from Einstein. Please stop arguing from what you think happens and spend time learning the "new" science. We do mono testing because it works far better and is far cheaper. It is insanity to say just because you think this and that with nary any experience under your belt that we should not do this.
 
Next time you read a review judging a 3D video projector, and the reviewer tells you how he refused to wear 3D glasses and covered one of his eyes during the test, you have a pretty good idea about how reliable this review is.
And how much of that 3D effect is invented in the projector vs content? It is the latter, right? The 3D glasses are just shutters synchronized to the left and right frames. Critically, with one minor exception, such a projector would be calibrated/reviewed in 2-D, destroying your claim that you need to test the transducer the way it is used. The exception is the level of brightness you get due to light loss in 3-D mode.

Now for your turn, you need to go and visit a broadcast engineer and watch them throw the Luma (black and white) switch on the broadcast monitor to turn off all color information. Our eyes are much more sensitive to luminance errors and we want to spot those and not allow color to add fog to that analysis. Luminance is given higher bandwidth for this reason. Here is a quick random spec from this Sony broadcast monitor: https://pro.sony/ue_US/products/bro...04046ImprovedReflectionPerformance-bvm-hx3110

"* Mono, blue Only, RGB cut off"

Funny that they call it "Mono" given the discussion we have (they mean monochrome). Note that there are even other modes such as Blue-only (used for calibration), etc. Surely no one wants to watch programming in any of these modes but they are critical in analysis of the performance of the display without other distractions.

As I have said, testing speakers is hard enough already. We want to do everything we can to allow the reviewer to get to reliable information. And single speaker is by far the best tool we have. So please don't spit upwind. This is not work that you can dismiss with analogies when you are speaking with people who have had a leg in that domain as much as audio.
 
I once played with recording using in-ear microphones with mono pink noise using a single speaker (0°) and the same with two speakers, stereo setup. Made a mono file of both and uploaded at ASR for comparison. There is no doubt there is a change in "timbre" of the pink noise.
Phase cancellations will cause changes. Note also that such cancellations are different in electronic domain (what you did) vs acoustic (in room with speaker playing).

My suggestion is to NOT convert anything to mono. Create a single channel pink noise or just play one channel of the stereo version.
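For anyone who wants to generate such a single channel test signal rather than downmix one, here is a rough sketch. It uses a Voss-McCartney-style approximation of pink noise, which is my choice for the example and not something specified in this thread; the function names, the 16 rows and the fixed seed are all assumptions. The noise goes into the left channel only and the right channel stays silent, so no electrical summing is involved.

```python
import random


def pink_noise(n, rows=16, seed=0):
    """Approximate pink noise by summing white-noise rows that are
    refreshed at octave-spaced rates (Voss-McCartney style)."""
    rng = random.Random(seed)
    values = [rng.uniform(-1.0, 1.0) for _ in range(rows)]
    total = sum(values)
    out = []
    for i in range(1, n + 1):
        # The lowest set bit of the counter picks the row to refresh,
        # so row k gets a new random value every 2**(k + 1) samples.
        row = (i & -i).bit_length() - 1
        if row < rows:
            total -= values[row]
            values[row] = rng.uniform(-1.0, 1.0)
            total += values[row]
        out.append(total / rows)  # scale so samples stay within [-1, 1]
    return out


def left_only(samples):
    """Pair each sample with silence in the right channel."""
    return [(s, 0.0) for s in samples]


frames = left_only(pink_noise(16384))
```

The resulting (left, right) pairs could be written to a stereo file with whatever audio tooling is at hand. The spectrum is only approximately 1/f, so it is worth checking with an analyzer before relying on it.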
 