
Sonic impact of downmixing stereo recordings to mono

To add to that: generalizing all music for the purpose of this thread is not helpful. You have to choose reference material carefully. Not all stereo recordings are produced equally. Some will downmix to mono very well, others won't. This is why Amir has tracks that he chooses, and Dr. Toole posted some examples as well.

You can measure mono compatibility in stereo recordings with a DAW like Reaper or Audacity, and something like iZotope Insight or even the built-in tools in Reaper. That gives you a ballpark figure for how well a recording will sum to mono.

Or just use known high-quality recordings that work well in mono. And again, as has been shown both mathematically and in practice, when a single loudspeaker measures well in mono it is going to sound excellent in a stereo pair or a multichannel setup.
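For anyone who wants to try the mono-compatibility check mentioned above without a metering plugin, here is a minimal sketch of the two numbers such meters typically report: the zero-lag correlation between the channels, and how much level is lost when the channels are summed. The function names and the synthetic 1 kHz test tones are my own illustration, not taken from any of the tools named above; a real check would load samples from an actual recording.

```python
import math

def correlation(left, right):
    """Zero-lag normalized correlation between channels (-1 to +1)."""
    num = sum(l * r for l, r in zip(left, right))
    den = math.sqrt(sum(l * l for l in left) * sum(r * r for r in right))
    return num / den if den else 0.0

def mono_sum_change_db(left, right):
    """Level of the (L+R)/2 downmix relative to the mean channel power, in dB."""
    mono_power = sum(((l + r) / 2) ** 2 for l, r in zip(left, right))
    stereo_power = (sum(l * l for l in left) + sum(r * r for r in right)) / 2
    if mono_power <= 0:
        return float("-inf")  # complete cancellation
    return 10 * math.log10(mono_power / stereo_power)

# Hypothetical test signals: 100 full cycles of a 1 kHz tone at 48 kHz.
FS, F0, N = 48000, 1000, 4800
tone = [math.sin(2 * math.pi * F0 * n / FS) for n in range(N)]
quadrature = [math.cos(2 * math.pi * F0 * n / FS) for n in range(N)]
inverted = [-s for s in tone]

if __name__ == "__main__":
    for name, right in (("identical", tone), ("90 degrees", quadrature), ("inverted", inverted)):
        print(name, round(correlation(tone, right), 3), mono_sum_change_db(tone, right))
```

A correlation near +1 means the recording sums cleanly, values around 0 lose roughly 3 dB, and strongly negative values mean content will partly or wholly cancel in the downmix.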
 
To me at least, what @Floyd Toole answered to this question concerning mono listening tests here was more than satisfactory: ...
I'm not happy with how the term "resonance" is used. It isn't solely a physical resonance in its original meaning. But this term is fundamental to the explanation, methinks. I'll just leave it at that, so as not to put words in anyone's mouth.

But since you understand the explanation, you can surely help me.
 
I'm not happy with how the term "resonance" is used
Excellent, we're getting somewhere!

I would not want to put words in Dr Toole's mouth, but reading his work gives me a sense of what he means. His YouTube videos, of course, put his own words in his own mouth!

What I gather is that many speaker non-linear behaviours (assuming they are not being overdriven) are discerned as resonances. Some resonances come from the cabinet, but many come from drivers not operating perfectly at all frequencies and from driver interactions (with each other and the cabinet). Speakers with fewer or lesser resonances are preferred in blind tests.

A useful aspect is that resonances are often discernable on-axis as well as off-axis and are largely unrelated to level.

Why are you uncomfortable with this word?
 
There is still a fundamental misunderstanding of how and why monophonic listening has been employed in our research. It was never used in subjective evaluations of an "absolute" kind, judging the merit of the sound as reproduced by one loudspeaker in a small listening room, and never sighted. We did our best to avoid using downmixes with problems, but even if they had problems they probably would not affect the results because the test employing the mono listening was a multiple-loudspeaker (3 or 4 at a time), loudness equalized randomized comparison with program in 30 s loops. Whether the signal was music, pink noise or impulses the results were the same - loudspeakers without audible resonances were preferred. The music was merely a test signal to energize resonances so they could be heard, and some music was much better than others. Pink noise was the most revealing, but it was unfamiliar to most listeners. It is all in my papers, books and YouTube lectures. The most preferred loudspeakers were the most technically accurate ones.
 
I haven't read through the entire thread, so apologies if no one has mentioned it before.

One issue with a mono sum is that in the analog world, summing L+R causes material in the center to increase relative to the other material. This is because power is proportional to the square of the voltage, so taking a voltage sum causes a 3 dB increase vs. the power sum that occurs when you send the signal to two speakers.

This has been known for the entirety of stereo recording, and back in the day when AM radio ruled, engineers would create and distribute mono versions of songs on special "DJ records", often with the mono version on one side and the stereo version on the other.

Of course, there are far worse things that can happen when summing to mono such as phase anomalies. And since your typical mono listening device is not particularly high-end, a bit of boost to the foreground material is not really a bad thing.
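The 3 dB figure above can be checked with a few lines of arithmetic. This is a sketch under the same simplification the post makes: identical (center-panned) signals add as voltages in an electrical sum, while the outputs of two loudspeakers are assumed to add as powers in the room. The function name and the per-channel gain parameter are hypothetical, added only to show that the result does not depend on the pan law.

```python
import math

def center_buildup_db(center_gain=1.0):
    """Extra level a center-panned source gains in an L+R voltage sum,
    compared with the power sum of two loudspeakers.

    center_gain is the (assumed) per-channel gain the pan law gives
    a center-panned source; it cancels out of the result.
    """
    voltage_sum_power = (center_gain + center_gain) ** 2    # add, then square
    acoustic_sum_power = center_gain ** 2 + center_gain ** 2  # square, then add
    return 10 * math.log10(voltage_sum_power / acoustic_sum_power)

if __name__ == "__main__":
    for gain in (1.0, 0.707, 0.5):
        print(f"pan-law gain {gain}: buildup {center_buildup_db(gain):.2f} dB")
```

A hard-panned source exists in only one channel, so it comes out at the same level either way; only the content common to both channels gets the extra boost.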
 
But as Dr Toole put it, music wasn't ever used in an absolute way (in their testing) to determine the quality of a speaker. It was more a way to contextualize resonance in a speaker so that people could understand what they were hearing.

But this is what drives home everything: "The most preferred loudspeakers were the most technically accurate ones" ... Think about that. Technical accuracy in speaker design translates into a better listening experience. Every time, whether mono, stereo or multi-channel.

A well designed speaker shows you what well produced music is supposed to sound like.
 
But this is what drives home everything: "The most preferred loudspeakers were the most technically accurate ones" ... Think about that.
Tongue in cheek, one might argue that the "technically better" speaker was used to evaluate and optimize the recording in the studio to begin with. So later, when playing the record and evaluating the recording, the "technically better" speaker would be optimal for this purpose. That renders the "technically better" speaker the technically better speaker. :facepalm: This circular reasoning makes the standard, which is fair because standards are self-referential in their logic to begin with. It breaks the "circle of confusion". If the studio uses speaker type A, use speaker type A for playback as well; that easy. Done. But I once asked whether there is an equivalence, say type A is equivalent to type B, and by which parameters that equivalence is stated. As you might guess, the question wasn't picked up. It's science: open questions remain.

What I gather is that many speaker non-linear behaviours (assuming they are not being overdriven) are discerned as resonances. ...
That's the "audio" resonances which apparently are not real physical resonances. Physical resonances don't resonate with the audio use of that term. I won't dare to educate anyone who carries Maxwell's Equations in their alias:) I could analyse the above statement word by word to show by what degree it is typical audio talk.

But briefly: resonance is not bad. Resonance is necessary to transform energy from electrical to mechanical, especially when talking about speakers. Thing is, how the resonance(s) are designed, or controlled as the engineer might put it. It might take a minute or two to think about it (sic!).

Every acoustic signal is resonant by nature, or rather its description is. Remember: resonance can be aperiodic, or composed of many resonances; at least the physical description goes that far. When using a speaker, you feed resonance into another (necessarily) resonant device so as to replicate the original resonance as an acoustic event.

It is not the case that resonance lives for itself, waiting to gain enough energy, statistically, to become objectionable. That's simply not how it works.

It is more about perception by a human mind (not: brain) that motivates test listeners to check a box on the evaluation sheet. It's conscious decision making that depends on parameters other than the mathematical overlay of resonant processes. Good/bad is based on subjective, basically unknown preferences. We had better leave it at that - stop blaming resonance. There is simply no laid-out path from math to preference.

Addendum: think of the previously aired idea of not using a single speaker for evaluation (necessarily mono), but a pair in a stereo arrangement fed with a mono signal. What you would then get is funny interference, aka the comb filter effect. In what way is comb filtering different from resonance, or "resonance"? On reflection, might I conclude that single-speaker assessment rules out the extra comb filtering that is a permanent companion in stereo? Here 'extra' means in addition to the head-related transfer function.

You see, on topic, mono isn't just mono. Me as a scientist would have investigated further to clarify my foundational assumptions.
 

You are confusing psychoacoustics with speaker design. And you are describing what the subjective reviewers do - they evaluate speakers based on other speakers they have listened to with no scientific discovery informing their decisions, using music alone as their source of truth for what makes the speaker sound good or not. It's effective snake oil selling, and it's completely backwards.

But in the case of the work of audio luminaries like Dr. Toole and others, speakers that measured well (scientific measuring, discussed ad nauseam around here) were preferred in blind listening tests (psychoacoustics) over speakers that didn't measure well (scientific measuring).

And further, this relates back to why using only one speaker for measuring is necessary. You can't use two - because then you're measuring an acoustic array, not a speaker.
 
You said, "Tongue in cheek, one might argue that the "technically better" speaker was used to evaluate and optimize the recording in the studio to begin with. So later, when playing the record and evaluating the recording, the "technically better" speaker would be optimal for this purpose." I only wish this were true. The abundant measured evidence displayed in my books indicates that recording studio monitor loudspeakers have historically been as variable as their consumer versions. This is the origin of the circle of confusion. Slide show 6 in the companion website to the 4th edition of my book shows many examples of where things stand now - we are much better off than not many years ago. Timbrally neutral loudspeakers are appearing everywhere at all prices.

All voices and musical instrument sounds originate as mechanical and/or acoustical resonances combined with various associated aerodynamic and mechanical sounds. They are the identifying timbres of voices and musical instruments. One objective of the recording industry is to capture, store and reproduce those resonances and sounds without modification. However, in popular music those sounds are modified in countless ways, or synthesized to create unique effects - it is art. The objective in those situations is to deliver this contrived art to listeners as it was created.

A reproducing device, electronic or electroacoustical, with a flat frequency response is the desired timbrally "neutral" device. Such a frequency response can be considered a "resonance" with a Q of zero. Deviations from flat are interpretable as evidence of resonances or acoustical interference. Acoustical interference, as it abundantly occurs in listening in reflective spaces, is position dependent, so humans have apparently evolved to not consider it to be a degradation of the sound source. A concert hall is a staggeringly complex comb filter. Carrying on a conversation across a table involves massive measurable interference which we readily adapt to. The acoustical interference in stereo phantom images is a non-trivial phenomenon when measured, and it is audible when tested, but is routinely adapted to by many listeners who think that stereo is in some way a "pure" record/reproduction format. A corrupted phantom centre is often preferred to a real centre loudspeaker - that is adaptation at work.
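To put rough numbers on the acoustical interference described above, the simplest possible model is two equal-amplitude copies of one signal arriving a short time apart. The sketch below uses that idealized model only; the 0.25 ms delay is an assumed, illustrative value for the extra path from the far loudspeaker to one ear, and the model ignores head shadowing and room reflections, both of which make real notches shallower. Function names are hypothetical.

```python
import cmath
import math

def comb_gain_db(freq_hz, delay_s):
    """Gain of a signal summed with an equal-level copy delayed by delay_s."""
    h = 1 + cmath.exp(-2j * math.pi * freq_hz * delay_s)
    magnitude = abs(h)
    return 20 * math.log10(magnitude) if magnitude > 0 else float("-inf")

def notch_frequencies(delay_s, count=3):
    """First few cancellation frequencies: odd multiples of 1 / (2 * delay)."""
    return [(2 * k + 1) / (2 * delay_s) for k in range(count)]

ASSUMED_DELAY_S = 0.00025  # illustrative value, not a measurement

if __name__ == "__main__":
    print("notches (Hz):", [round(f) for f in notch_frequencies(ASSUMED_DELAY_S)])
    for f in (100, 1000, 2000, 4000):
        print(f, "Hz:", round(comb_gain_db(f, ASSUMED_DELAY_S), 1), "dB")
```

The notch positions move as soon as the delay changes, which is the position dependence the paragraph above contrasts with loudspeaker resonances.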

Resonances added by playback devices are positionally and directionally durable phenomena. In loudspeakers they are revealed in the anechoic on- and off-axis frequency responses, radiating outward in many directions, and audible throughout a listening room. Superimposed on these are small-room resonances that are perceived differently for every location in the room. Both degrade the timbre of the original recordings. We are most sensitive to broadband - very low-Q - deviations from flat. The subjective detection/evaluation mechanism seems to be better correlated to the bump in the frequency response than to the time-domain extensions. As explained in Section 1.4 of the 4th edition, the likelihood of audibly detecting resonances is greatly increased when extraneous variables are controlled or removed from the listening evaluation.

Decades of double-blind listening tests indicate that when these resonances are attenuated the subjective sound quality ratings increase. In loudspeakers, these are physical resonances in the wave motion in diaphragms, in the mechanical suspension systems and structural elements of transducers. Good transducer design has eliminated most of these resonances, and because they are minimum-phase phenomena those remaining are amenable to attenuation by matched parametric equalization. Enclosures exhibit both acoustical and mechanical resonances, and competent engineering easily controls them. Listening room resonances are more challenging, as discussed in detail in Chapter 14 of the 4th edition.
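The point about matched parametric equalization can be illustrated numerically. In the sketch below a resonance is modeled, as an assumption, by a peaking biquad using the widely published Audio EQ Cookbook formulas, and the "matched" equalizer is the same filter with the opposite gain. The 1 kHz, Q = 5, +6 dB values are hypothetical; in practice they would be estimated from anechoic measurements.

```python
import cmath
import math

def peaking_biquad(f0_hz, q, gain_db, fs_hz):
    """Peaking-filter coefficients (b, a) per the Audio EQ Cookbook."""
    amp = 10 ** (gain_db / 40)
    w0 = 2 * math.pi * f0_hz / fs_hz
    alpha = math.sin(w0) / (2 * q)
    b = (1 + alpha * amp, -2 * math.cos(w0), 1 - alpha * amp)
    a = (1 + alpha / amp, -2 * math.cos(w0), 1 - alpha / amp)
    return b, a

def response(coeffs, freq_hz, fs_hz):
    """Complex frequency response of a biquad at freq_hz."""
    b, a = coeffs
    z1 = cmath.exp(-2j * math.pi * freq_hz / fs_hz)
    z2 = z1 * z1
    return (b[0] + b[1] * z1 + b[2] * z2) / (a[0] + a[1] * z1 + a[2] * z2)

FS = 48000.0
resonance = peaking_biquad(1000.0, 5.0, +6.0, FS)   # hypothetical driver resonance
correction = peaking_biquad(1000.0, 5.0, -6.0, FS)  # matched parametric EQ

if __name__ == "__main__":
    for f in (500.0, 900.0, 1000.0, 1100.0, 2000.0):
        before = 20 * math.log10(abs(response(resonance, f, FS)))
        total = response(resonance, f, FS) * response(correction, f, FS)
        print(f"{f:6.0f} Hz: resonance {before:+.2f} dB, after EQ {20 * math.log10(abs(total)):+.2f} dB")
```

Because both filters are minimum phase, the product of the two responses is unity in phase as well as magnitude, so in this idealized model the correction removes the ringing along with the bump. It only works this cleanly when the EQ's frequency, Q and gain actually match the resonance.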

So, when you say "But briefly, resonance is not bad", you are both right and wrong. They are essential to the character of the sounds we want to hear, but flaws when they appear in sound reproducing devices and acoustical spaces.

I would welcome elaboration on your closing comment: "You see, on topic, mono isn't just mono. Me as a scientist would have investigated further to clarify my foundational assumptions." Section 5.8 in the 4th edition explains why stereo and multichannel reproduction reduces awareness of resonances in loudspeakers - it is more than just channel count.
 
You are confusing psychoacoustics with speaker design. And you are describing what the subjective reviewers do - they ...
So, you say that I'm confused, like the other guys?

But in the case of the work of Audio Luminaries like Dr. Toole and others, ...
I'm not a believer. And I'm proud of it a little.

.. only one speaker for measuring is necessary. You can't use two - because then you're measuring an acoustic array, not a speaker.
I never talked about measuring. I was asking whether the cause of the reduced discriminating power of listening with the comb filter effect present (stereo arrangement, mono signal, phantom center) was the ruffled frequency response, and conversely, whether "resonance" is just another audio word for frequency response aberrations. You derailed it.

I would welcome elaboration on your closing comment: "...
I understand that we are, by and large, in complete agreement, even on details left unspoken. Elaboration: With an assumed budget, I would have examined whether the (missing) accuracy in the evaluation of loudspeakers might not, after all, be related to comb-filter effects. This relates to the experiment described, using mono playback over a pair of loudspeakers arranged in a stereo triangle. That was all—provided we can agree that resonances are, in essence, subjectively noticeable primarily through amplitude effects. :) Thanks for the standard anyway.
 
So, you say that I'm confused, like the other guys?
It seems like you are conflating terms and misunderstanding some core ideas.
I'm not a believer. And I'm proud of it a little.
There it is. Being proud won't get you anywhere.
I never talked about measuring. I was asking whether the cause of the reduced discriminating power of listening with the comb filter effect present (stereo arrangement, mono signal, phantom center) was the ruffled frequency response, and conversely, whether "resonance" is just another audio word for frequency response aberrations. You derailed it.
Speaker resonance is mechanical and physical; it is both measurable and audible.

Comb filtering ... Klippel measurements are designed to eliminate comb filtering.
 
This relates to the experiment described, using mono playback over a pair of loudspeakers arranged in a stereo triangle.
This has been done, as described in AES papers and my books. When the signals delivered to the two loudspeakers are identical, resonances are well detected, when there is added large venue spaciousness, the resonances are more difficult to detect. Multichannel audio delivers even more persuasive recorded spaciousness and detection is even worse. It is not related to acoustic crosstalk comb filter effects. You need to read up on the science.
 
You need to read up on the science.
Dr Toole, I have a qualifying degree in natural science. This, though, is a hobby. I will never ask you again; sorry for the inconvenience.
 
When the signals delivered to the two loudspeakers are identical, resonances are well detected, when there is added large venue spaciousness, the resonances are more difficult to detect.
I read this in the book, but IIRC identical signals with two speakers (in stereo triangle) were only tested with technical signals. Then resonances were readily audible.

Has this (audibility of resonances) also been tested with music? That is, with identical signals (aka music in mono) through two (or more) speakers?

It is not clear to me, what
"when there is added large venue spaciousness, the resonances are more difficult to detect."
exactly means.

Does this relate to the case of stereo music (as described in the 4th edition)?
Or does this relate to mono music over two speakers, too?
I know that "large venue spaciousness" is mainly transported via stereo, but a little bit (temporal) will be present even in mono.
 
Has this (audibility of resonances) also been tested with music? That is, with identical signals (aka music in mono) through two (or more) speakers?
Yes, and resonances normally correlate with a "bump-up" in the frequency response which is more apparent than any ringing/resonating because even bad speakers rarely resonate THAT badly. Speakers don't ring like a bell or a plucked guitar string, etc.

Back to the stereo/mono topic - I remember reading that Amir simply uses one stereo channel (without downmixing to mono), and he said most recordings sound OK that way, including the selected music he uses for his listening tests. I was surprised that he wasn't downmixing, but he knows more than I do! And the purpose is to evaluate the speaker, not the music. And when reviewing speakers he's not listening for pleasure.
 
You raise a number of good points, and the following is an overly long response that came out as a stream of consciousness - I may have gone on a bit . . . :) Some of it I wish I had written into the book, and may yet put into a website update.

Obviously we enjoy listening in stereo and multichannel because they deliver perceptions of direction and spatial envelopment that mono does not. They are more realistic if that is the goal, and more "engaging" for more abstract art. Increasing the number of channels delivers a more persuasive sense of being in a large space. Envelopment is a binaural perceptual phenomenon - the sounds at the two ears are different, and if the differences are of the right kind we experience "envelopment"; we are in the space with the musicians. The recorded "spaces", whether they are captured in a concert hall or auditorium or added by electronic simulation, are always "large" spaces. The perception of the large space is superimposed on that of the small listening room: the classic "room within a room" problem. Fortunately, the large room perception is usually the dominant perception. This dominance appears to be what makes the loudspeaker resonance cues in the small room's direct and reflected sounds less noticeable - but they are revealed during intervals of hard-panned, monophonic sounds.

All of the highly enjoyable large space impressions are lost in mono, whether reproduced by a single loudspeaker or by a pair of loudspeakers driven by the same signals. The music is conveyed, as is the time-domain reverberation, but no binaurally perceived senses of direction and space other than that of the listening room. There is no large space envelopment.

We want timbrally neutral loudspeakers and it turns out that they are much more easily recognized in mono listening, which is why it was done. BUT it was done in double-blind, equal-loudness, multiple-loudspeaker randomized comparisons. In those tests the listening room and the program were constant factors during the comparisons, and the loudspeakers being compared were moved to the same location when active (positional substitution). These were laboratory experiments, not simple exercises that can be done in living rooms. They were very costly to set up and run and other than those done at the NRCC and Harman, I know of no other large scale experiments that have been done and published. However, because they were done we now have very persuasive correlations between what is measured and what is heard. Interpreting spinorama data is a more reliable way of identifying timbrally neutral, technically accurate, loudspeakers than by listening using methods available to audio enthusiasts.

When listening to loudspeakers in mono or stereo without equal-loudness comparisons one is hearing the combined effects of the loudspeaker, the room and the recording. Everything is a variable and consequently opinions also vary. The evidence of resonances is there, but it is seriously masked by uncontrolled variables. If the experiment is sighted, all bets are off.

Pink noise, a technical signal, was by far the most revealing sound for detecting resonances. We used music because it was more familiar and friendly for listeners, and it was the "real world" sound through which we would hear resonances if they were present. However, music exists in countless versions of instrumentation, processing and spatial content, so some recordings were more useful than others. Looking at the technical features of the most revealing selections of music, it was concluded that the following were desirable:
  • wide uniform spectrum (low bass content is very important)
  • a dense spectrum, meaning several voices and/or instruments. Solo voices and instruments were not very useful.
  • reverberation which added "repetitions".
Pink noise had all of these properties, and it told us that resonances were there, but music told us something about the probability of the resonances being audible or annoying in everyday listening. It turned out that high-Q resonances were easily heard in pink noise, and seen in measurements, but less obvious in music because they need sustained input at a very specific frequency in order for them to be energized. Whether they were heard or not was in reality a probability, not an absolute, because of the structure of the program. Fortunately they are easily eliminated from loudspeakers.
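A rough way to see why high-Q resonances need sustained input: the amplitude envelope of a simple second-order resonance builds up and decays with a time constant of about Q / (pi * f0). The sketch below applies that textbook relation to a hypothetical 1 kHz resonance driven exactly at its center frequency by a 20 ms tone; the exponential build-up is an approximation, and the frequency, duration and function names are illustrative assumptions rather than values from the listening tests.

```python
import math

def time_constant_s(q, f0_hz):
    """Envelope time constant of a second-order resonance: Q / (pi * f0)."""
    return q / (math.pi * f0_hz)

def ring_up_fraction(q, f0_hz, duration_s):
    """Approximate fraction of steady-state amplitude reached after duration_s
    of steady drive at the resonance frequency, starting from rest."""
    return 1.0 - math.exp(-duration_s / time_constant_s(q, f0_hz))

if __name__ == "__main__":
    for q in (1, 10, 50):
        fraction = ring_up_fraction(q, f0_hz=1000.0, duration_s=0.020)
        print(f"Q={q:>2}: about {fraction:.2f} of full level after a 20 ms tone")
```

Under these assumptions the low-Q resonance is fully excited almost at once, while the Q = 50 one is still well short of its steady-state level when the note ends, and any note slightly off its center frequency excites it even less. Continuous broadband pink noise always has energy at the right frequency for long enough.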

It is interesting that now, technical measurements have replaced pink noise as the best method of discovering whether resonances are present or not, but we still feel the necessity to listen to music as the final test. The reality is that the most technically perfect loudspeaker will not always sound good. Recordings are not standardized. Neither are listening rooms and we now know that bass accounts for about 30% of our overall impressions of sound quality, and that is dominated by small room resonances.

All this is a real "downer" for audio enthusiasts who have absolute faith in what they are hearing at the moment, and who have used these opinions to choose loudspeakers. Some may have been good choices, but most have simply kept a lot of mediocre manufacturers alive. With spinorama data from Amir, Erin and others, the playing field has been leveled. Technical accuracy now has a visible target. There are differences, but it is clear that those differences are trending smaller, and are less associated with narrow-band resonances and more with easily equalized or adapted to broadband features. Frequency-dependent directional effects remain something to watch out for, but even that can be seen to have improved.

Examining spinorama data one can see recognizable differences in the engineering capabilities of different brands, and there remain a few that still believe in selling loudspeakers with "personality". It's a free world. Some of the "prestige" high-end brands of the past are now seen to have been in that category. They may be selling at attractive prices in the second-hand market but they are really bad deals. Timbrally neutral loudspeakers can be found at affordable prices these days. Small ones may need subwoofers, and larger ones will be needed for high sound levels, but the choices are there and expanding.

Knowledgeable audio people no longer expect large, or even any, audible differences among electronic components and wires. Subjectivists who don't do measurements or proper listening evaluations still write essays or do YouTube shows about differences, and manage to sleep at night, but that is not reality. We are approaching a similar state with respect to loudspeakers. In our double-blind evaluations at the NRCC and Harman, it was not uncommon to end up with statistical ties - loudspeakers that were very slightly different, but in the end, comparably good. The differences heard in those very revealing tests were not likely to be recognized in recreational listening in stereo or multichannel in typical domestic or home theater circumstances. The entertainment would be of a very high standard. Tone controls and adjustable EQ are still useful for fussy listeners in dealing with variations in recordings.

The remaining technical "weak link" in audio is the listening room and its inescapable low frequency resonances. Interactions between loudspeaker directivity and early reflections exist and can be heard to influence "soundstage" perceptions, but these perceptions are also influenced by recordings, so there can be no single solution. The low-frequency problem is always there, unchanging, and different for every listening location. Chapter 14 discusses solutions, but not everyone bothers to address this issue. Adaptation helps, but is not a solution.

But, as I ended one of my slide shows, the real "weak link" in audio is human nature. We want to believe that what we think we hear at the moment is an absolute truth. Opinions matter more than facts.
 