As mentioned on a previous thread, I have been doing a lot of research and thinking about the basic paradigms of audio as they relate to the goal of more realistic, natural reproduction - of music, in particular. A lot of this thinking was inspired by reading the latest version of Floyd Toole's book: https://tinyurl.com/y8am5le7 , which is pretty much mandatory reading for any science-oriented audiophile, IMO. Further insights came from this book: https://tinyurl.com/y8gg8kyz , which is a publication of the AES and therefore more technical, but still enlightening even for the non-mathematically inclined. I apologize if a lot of this post is too pedantic, familiar, or obvious, but to have a productive discussion on this it seems important to be on the same page to start.
Here are some basic observations from Toole that I am treating as core assumptions. I suspect that some people will find these assumptions controversial, but they are backed by a lot of hard empirical science at this point.
1. The basic signal path from microphone through recording, storage, and playback all the way to the loudspeaker is now sufficiently perfected that it has minimal to no impact on the quality of reproduction.
2. Loudspeaker design, as it relates to perceptual quality, is now a quite well-understood discipline. Speakers exhibiting flat on-axis response and even directivity will be highly rated for naturalness by a broad cross section of listeners across a broad range of physical conditions and program material.
In other words, we've now reached a high plateau of performance in all the basic elements of sound recording and reproduction, from mics to digital recording to electronics to speakers. And yet: nobody who goes to concerts on a frequent basis is ever fooled into believing that their stereo system recreates the experience of a musical event in a concert hall or other performance space. The two books I reference above describe the problem in gory detail, but you can boil it down to one word: stereo.
On the face of it, stereo is an attractively simple concept for recording and reproducing music. We have two ears, stereo has two channels. Record those two channels accurately, deliver them to your ears, voilà. In practice, this simplicity is undone by a number of problems, but the two most glaring defects are both related to the same issue: we hear each channel with both ears.
Stereo defect number 1 is the most obvious and familiar - stereo does not recreate the full sense of space, immersion, envelopment we hear in an actual performance space. Why is this? To start with, our sense of being in a performance space comes from reflected sounds, sounds that bounce off the walls and ceiling of the space (the floor level is usually covered by absorbent people). As it turns out, the most important of these reflections from the perspective of envelopment are reflections that are "decorrelated" - dissimilar between the two ears. Those reflections will largely be lateral (side) reflections. So to recreate a sense of envelopment, we should start by capturing and recreating these low-correlation sounds at the ears. You see the problem: in conventional stereo, both ears hear both speakers, and the necessary decorrelation is greatly reduced. Recording engineers can do things to partly compensate for this, through microphone arrangements or signal processing that increase the amount of decorrelated sound, but the ear/brain combination is never wholly convinced.
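To put a rough number on how much crosstalk hurts decorrelation, here is a minimal sketch under some loud simplifying assumptions of mine (none of this is from the books): the left and right channels carry completely uncorrelated signals of equal level, each ear hears its near speaker at gain 1 and the far speaker at a single frequency-independent gain g, and the extra travel time to the far ear is ignored. Under those assumptions the correlation between the two ear signals works out to 2g / (1 + g^2). The function name and the example gains are hypothetical.

```python
def ear_correlation(crosstalk_gain):
    """Correlation between the two ear signals in a toy crosstalk model.

    Assumed model (a simplification, not a measurement):
      left_ear  = L + g * R
      right_ear = R + g * L
    where L and R are uncorrelated, equal-power channel signals and g is
    the far-speaker gain at each ear. Delay and frequency-dependent
    head shadowing are deliberately left out.
    """
    g = crosstalk_gain
    # Covariance of the two ear signals is 2g; each has variance 1 + g^2.
    return 2.0 * g / (1.0 + g * g)


if __name__ == "__main__":
    for g in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(f"crosstalk gain {g:.2f} -> ear correlation {ear_correlation(g):.2f}")
```

With no crosstalk the ears receive fully decorrelated signals, and the correlation climbs quickly as the far speaker leaks in. A real head is more complicated (the leakage is delayed and varies with frequency), so treat this only as an illustration of the trend.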
Stereo defect number 2 relates to imaging and timbral accuracy. Think about the center image of a vocalist in a pop recording. The center image is created by routing the same signal to each speaker. Once again, though, each ear hears each speaker. And with speakers placed to the right and left of the listener, each ear hears the signal from the nearer speaker first and then the same signal from the farther speaker slightly later. The result of combining a signal with a delayed version of itself is a frequency response artifact called comb filtering. In the case of the center image example above, the comb filter will create a strong frequency response dip around 2 kHz, and some other artifacts as well - see section 7.1.1 of the Toole book. If you have experienced the difference between a system with a physical center channel and a "phantom" center using just right and left speakers, the difference is obvious and "phantom" is not a bad description. And of course, the phantom imaging of stereo loses fidelity when the listener moves or rotates their head.
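For anyone who wants to see where the 2 kHz figure comes from, here is a rough sketch of the comb filter arithmetic. Summing a signal with a copy delayed by t seconds cancels at odd multiples of 1 / (2t). The 0.25 ms delay and the 0.7 far-speaker gain below are illustrative assumptions of mine, picked so the first notch lands near the 2 kHz dip mentioned above; they are not figures from Toole, and the function names are hypothetical.

```python
import math


def comb_notch_frequencies(delay_s, max_hz=20000.0):
    """Frequencies (Hz) where x(t) + x(t - delay) cancels.

    Notches fall at odd multiples of 1 / (2 * delay).
    """
    notches = []
    k = 0
    while True:
        f = (2 * k + 1) / (2.0 * delay_s)
        if f > max_hz:
            break
        notches.append(f)
        k += 1
    return notches


def comb_magnitude_db(freq_hz, delay_s, far_gain=1.0):
    """Level in dB of a direct signal plus a delayed copy at gain far_gain.

    Evaluates |1 + g * exp(-j * 2 * pi * f * delay)| relative to the
    direct signal alone (0 dB).
    """
    phase = 2.0 * math.pi * freq_hz * delay_s
    real = 1.0 + far_gain * math.cos(phase)
    imag = -far_gain * math.sin(phase)
    magnitude = math.hypot(real, imag)
    return 20.0 * math.log10(magnitude) if magnitude > 0 else float("-inf")


if __name__ == "__main__":
    delay = 0.00025  # assumed 0.25 ms extra path to the far ear
    print("notches (Hz):", [round(f) for f in comb_notch_frequencies(delay)])
    for f in (500, 1000, 2000, 4000):
        level = comb_magnitude_db(f, delay, far_gain=0.7)  # assumed gain
        print(f"{f} Hz: {level:+.1f} dB")
```

Because the head shadows the far speaker, the delayed copy is weaker than the direct one, so the notch is a deep dip instead of a complete null. That is why the sketch takes a gain below 1 for the far speaker.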
The two most promising paradigms for fixing/replacing the stereo problem are multichannel and binaural. Given how long this post is already, I'll break each of those into separate subsequent posts.
Scott