Perceptual Effects of Room Reflections
By Amir Majidimehr
[Note: This article was originally published in the Widescreen Review Magazine, titled "it is not simple!"]
When Gary Reber asked me to write an article for the 20th anniversary of Widescreen Review magazine, I was stumped at first. What should I write about that relates to such an important occasion? For the answer I thought back to what I knew 20 years ago and what I know now: how complex audio/video technology can be. Our systems are deceptively simple, especially with the advent of digital technology. Hit the power button, queue up your music or video, press play and enjoy. Add the oft-repeated notion that “bits are bits” in digital audio and video, and you have what seems to be a very straightforward system to use and understand.
It is hard to argue against that point of view at a high level. The system really does work that simply. But peel back the layers of the onion and a far more complex picture emerges, one that is anything but intuitive at times. In my article on
room dynamic range I showed how the simple computation of a room's noise floor requires understanding the rather complex science of psychoacoustics (how we hear). Without that understanding you could easily miscompute the dynamic range by tens of decibels! Not a small error. So I thought I would continue the theme by covering more topics related to how we hear sound in a room, specifically how we perceive sound above the so-called “transition frequency.” I covered that concept in my earlier article on
optimization of bass frequencies. Here is a useful graph from that article:
As explained in that article, and as is evident in the graph above, below the transition frequency we have a straightforward problem: the frequency response is (wildly) modified by the room. Techniques for dealing with that were covered there. This article focuses on what happens above the transition frequency, where the speaker more or less controls the overall frequency response. But that is not the full story. Sound also reflects from wall surfaces and arrives at our ears in addition to the direct sound. Perceptually this creates a very complex situation, one that is usually greatly oversimplified. The first step in understanding it is analyzing the effect through a kind of distortion called "comb filtering."
Comb Filtering
Take the simplest situation of a sound hitting just one wall, reflecting and then arriving at the same point as the direct sound. In doing so, it gets delayed and its level decreases some. Here is a simulation of it that I have created by taking a signal, delaying and reducing its amplitude and then combining it with itself as would happen with a reflection:
Not a pretty sight. Our flat response now has notches in it, which is where the name comes from: they resemble the teeth of a comb.
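For those who want to experiment, the simulation above can be sketched in a few lines of code. This is a minimal model, not the actual tool I used; the 10 millisecond delay and 0.7 reflection gain are assumptions for illustration:

```python
import numpy as np

# Minimal model of a single reflection: the direct sound plus a delayed,
# attenuated copy of itself. Delay (10 ms) and gain (0.7) are assumed values.
def comb_magnitude_db(freqs_hz, delay_s=0.010, gain=0.7):
    """Magnitude of 1 + g*exp(-j*2*pi*f*tau), in dB."""
    h = 1.0 + gain * np.exp(-2j * np.pi * freqs_hz * delay_s)
    return 20 * np.log10(np.abs(h))

freqs = np.arange(20, 20001, 1.0)
response = comb_magnitude_db(freqs)

# Notches fall where the reflection arrives out of phase with the direct
# sound, at f = (2k+1)/(2*tau): 50 Hz, 150 Hz, ..., spaced 1/tau = 100 Hz.
first_notch = freqs[np.argmin(response[:100])]
print(f"first notch at {first_notch:.0f} Hz")
```

Plotting `response` against `freqs` reproduces the regular notch pattern; with a 0.7 gain the notches dip roughly 10 dB below the level of the direct sound.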
The natural reaction is to attempt to absorb the reflection so as to restore the direct sound of the speaker and hence the flat response. Indeed, if you search for room acoustics online, nine out of ten articles or forum posts will tell you exactly that. Pictures like the above are put in front of you and the decision of what to do becomes obvious. It can be good to follow the crowd on some occasions, but this is not one of them! The reason is that neither the measurement nor how we perceive the distortion is properly represented by what I have explained so far.
Let’s start by examining how our hearing system works, courtesy of Fletcher, who in 1940 performed a series of listening tests leading to the discovery that our hearing system becomes less and less selective in frequency discrimination as frequencies go up. Fletcher’s model was later refined by Moore, resulting in what is called the Equivalent Rectangular Bandwidth or ERB. Plotting that with source frequency (coming out of our speakers) on the horizontal axis and the bandwidth of our auditory filters on the vertical axis, we get the graph to the right. As we see there, the resolution of our ears is inversely proportional to the frequency we are trying to hear: the higher the auditory filter tuning frequency, the lower the resolution of the ear at that frequency.
As an example, at a 300 Hz source frequency (horizontal axis), the ear’s discrimination bandwidth is a narrow 60 Hz (vertical axis). At the other extreme, for a source frequency of 10 kHz, the bandwidth climbs all the way up to 1.1 kHz. This readily shows that we are far less sensitive to frequency variations at 10 kHz than we are at 300 Hz.
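If you want to check these numbers yourself, a widely used closed-form approximation of the ERB curve is the Glasberg and Moore (1990) formula. I am assuming it is the formula behind the graph; it reproduces both figures quoted above:

```python
# Glasberg & Moore (1990) approximation of the ERB of the auditory filter.
def erb_hz(f_hz):
    """Equivalent Rectangular Bandwidth in Hz at center frequency f_hz."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)

print(round(erb_hz(300)))    # ~57 Hz, close to the ~60 Hz read off the graph
print(round(erb_hz(10000)))  # ~1104 Hz, i.e. the 1.1 kHz quoted above
```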
Now let’s apply that learning to our comb filtering example. There, I assumed the reflection path caused a 10 millisecond delay. If we invert the delay, we arrive at the spacing of the comb filter notches, which in this case is 100 Hz. If we take 100 Hz on the vertical axis and find the source frequency at which our auditory filter becomes that wide, we land at 700 Hz (where the green line is). Therefore, for source frequencies above 700 Hz our hearing system lacks sufficient resolution to properly hear the notches in this example.
Considering that the lower end of interest is the transition frequency region of 200 to 400 Hz, we have barely climbed to 700 Hz before the audibility of those notches becomes questionable. Yet the display of distortion in our original simulation happily extends past 10,000 Hz, portraying a far more serious issue than exists in reality.
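The 700 Hz crossover point can also be computed directly by inverting the Glasberg and Moore ERB approximation (assumed here to be the formula behind the graph) and asking at what source frequency the auditory filter becomes 100 Hz wide:

```python
# Solve 24.7 * (4.37*f/1000 + 1) = bandwidth for f, the source frequency at
# which the auditory filter is exactly as wide as the comb notch spacing.
def freq_where_erb_equals(bandwidth_hz):
    return (bandwidth_hz / 24.7 - 1.0) * 1000.0 / 4.37

notch_spacing_hz = 1.0 / 0.010  # 10 ms reflection delay -> 100 Hz spacing
print(round(freq_where_erb_equals(notch_spacing_hz)))  # ~698 Hz, i.e. ~700 Hz
```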
Do We Measure What We Hear?
We clearly have a problem of measuring more precisely than our hearing system resolves. In the example I purposely selected the highest resolution the tool can provide, just 0.3 Hz. That is far narrower than our ear’s selectivity. You probably could not tell the resolution was that high because it is hidden in the FFT size parameter. Let’s simulate what happens if we reduce the measurement resolution to 44 Hz:
Now it does not look remotely as alarming. Granted, we should be applying variable resolution to the measurement per the ERB graph, so this version probably errs too far in the other direction. Still, it is representative of how the tool can show far more detail than what we are interested in, i.e., the distortion we are likely to hear.
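The relationship between FFT size and measurement resolution mentioned above is simple division: bin spacing equals the sample rate divided by the FFT length. The 44.1 kHz sample rate and these particular FFT sizes are my assumptions, not necessarily what the tool used:

```python
# Frequency resolution of an FFT-based analyzer: sample_rate / fft_size.
def fft_resolution_hz(sample_rate_hz, fft_size):
    return sample_rate_hz / fft_size

print(fft_resolution_hz(44100, 131072))  # ~0.34 Hz, near the 0.3 Hz example
print(fft_resolution_hz(44100, 1024))    # ~43 Hz, near the 44 Hz example
```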
Just to emphasize the point, let’s look at an actual room measurement at high resolution:
Just like our simulation, we get a disturbing display showing a massive amount of comb filtering. Now let’s apply 1/3 octave smoothing to it:
As with the reduction in resolution in the simulation, a more appropriate picture appears. It shows that we have a high frequency roll-off, which is likely far more audible than the jungle of notches in the previous graph. In some ways this over-analysis is a symptom of modern computing power. In the old days, when this was a very slow process, everyone ran 1/3 octave analysis. Today computers are very fast, so we opt for much higher resolution and with it arrive at misleading data. It is the old problem of not seeing the forest for the trees.
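Fractional-octave smoothing itself is conceptually simple: replace each point with the average of its neighbors within a fixed span of octaves. The sketch below is deliberately crude (real analyzers smooth in power and may weight the window), but it shows how the comb ripple averages away:

```python
import numpy as np

def third_octave_smooth(freqs_hz, levels_db):
    # Average all points within +/- 1/6 octave of each frequency,
    # i.e. a window 1/3 octave wide in total.
    half_width = 2.0 ** (1.0 / 6.0)
    smoothed = np.empty_like(levels_db)
    for i, f in enumerate(freqs_hz):
        window = (freqs_hz >= f / half_width) & (freqs_hz <= f * half_width)
        smoothed[i] = levels_db[window].mean()
    return smoothed

# A comb-filtered response (assumed 10 ms delay, 0.7 gain) before and after.
freqs = np.linspace(100, 10000, 2000)
ripple = 20 * np.log10(np.abs(1 + 0.7 * np.exp(-2j * np.pi * freqs * 0.010)))
smooth = third_octave_smooth(freqs, ripple)
print(np.ptp(ripple) > np.ptp(smooth))  # the smoothed curve spans fewer dB
```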
As mentioned, the ideal measurement system would apply variable filtering as the frequencies climb. The only system I have seen that does this is the JBL Synthesis ARCOS automatic equalization system. Its frequency response display is adaptively filtered, allowing the troughs and variations to be shown at the proper resolution of our hearing system. This not only makes it easier for the user to analyze the room's performance, but it also helps the automatic correction system arrive at the right corrective equalization settings, avoiding the chase after problems that are not audible. Put another way, you want to please your ears, not the meter, or in this instance the graph!
There is more yet. At the risk of stating the obvious, we have two ears, not one. Yet standard practice for room acoustic measurement calls for a single microphone. This is a problem, because the two ears do not hear the same sound as frequencies climb. Think of a center speaker directly in front of you sending a reflection to the left wall. The left ear is closer to that wall than the right ear by a few inches. In the time domain that translates to about 0.4 msec of extra delay for the right ear. Recall that the comb filter notch spacing is inversely proportional to the time delay. If you change that delay, the frequencies of the notches change with it. Since we are talking about 5 to 10 milliseconds for the typical reflection in a typical home listening space, 0.4 milliseconds is a significant change in the frequency structure of the comb filter.
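To put numbers on this, here is a sketch of where the notches land for each ear. The 10.0 ms and 10.4 ms delays are assumed values chosen to match the example above:

```python
import numpy as np

# For a single reflection of delay tau, comb notches fall at
# f = (2k+1) / (2*tau). Delays of 10.0 ms and 10.4 ms are assumptions.
def notch_freqs(delay_s, count):
    k = np.arange(count)
    return (2 * k + 1) / (2 * delay_s)

near_ear = notch_freqs(0.0100, 30)
far_ear = notch_freqs(0.0104, 30)

# The first notches nearly coincide, but by the 21st notch the two ears
# disagree by about 79 Hz, most of a full notch spacing.
print(near_ear[0], far_ear[0])    # 50.0 Hz vs ~48.1 Hz
print(near_ear[20], far_ear[20])  # 2050.0 Hz vs ~1971.2 Hz
```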
Additionally, your head becomes an acoustic filter as frequencies climb and their wavelengths become smaller than your head. The fancy term for this is Head Related Transfer Function or HRTF. It is the science behind, for example, how we simulate surround sound over headphones.
The filtering due to HRTF is direction and frequency dependent. On the right is an example measurement, performed in a scenario similar to ours, as published by Macpherson in his AES paper. Notice how the ear that is not blocked by the head picks up a stronger signal (the curve on top), and how the effect grows as the frequencies climb and the head becomes a more effective acoustic filter for the other ear. Note also that even for a single ear, what we measure is not what we hear. Measurement microphones are calibrated to provide a near flat frequency response, yet our ears do not at all follow such a curve, as is evident from the above graph.
Since the comb filter is the result of the direct sound combining with the reflected one, if a good chunk of the reflection's spectrum is filtered out for the ear in the shadow of the head, the comb filter becomes less extreme. Importantly, the waveform will be different from that at the other ear, which is not subjected to this filtering. The single-microphone measurement errs significantly here: it shows us only a poor facsimile of one ear and ignores the HRTF effect on the other altogether.
Putting the above two concepts together, what the brain receives is two different waveforms, one from each ear. If this were your eyes, you would be seeing double vision and double tint! Fortunately that is not how we perceive it. Our everyday life is full of reflections in the enclosed spaces we live in. It is not surprising, then, that our brain has adapted not only to avoid being confused, but to put this situation to good use. Research indicates that the brain invokes a “central summation” which roughly combines the two signals, as opposed to hearing each one independently. The net result is that much of the comb filtering washes out, leaving us with the sum total of what the two ears hear. And that summing increases the total sound energy, helping with such things as intelligibility of speech.
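A very crude way to model that central summation is to power-average the magnitudes at the two ears. The delays (10.0 ms and 10.4 ms) and the 0.7 reflection gain below are assumptions, not measured values; the point is only that a notch present at one ear gets partly filled in by the other:

```python
import numpy as np

# Toy model: each ear sees its own comb filter; "central summation" is
# approximated here as a power average of the two magnitudes.
def ear_mag(freqs, delay_s, gain=0.7):
    return np.abs(1 + gain * np.exp(-2j * np.pi * freqs * delay_s))

freqs = np.arange(200.0, 10000.0, 1.0)
left = ear_mag(freqs, 0.0100)
right = ear_mag(freqs, 0.0104)
binaural = np.sqrt((left**2 + right**2) / 2)  # crude power average

i = 50  # 250 Hz: a deep notch for the left ear, but not for the right
print(round(20 * np.log10(left[i]), 1))      # about -10.5 dB for one ear alone
print(round(20 * np.log10(binaural[i]), 1))  # several dB shallower after summing
```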
So far I have talked as if there were only a single reflection in the room. That is never going to happen in a real room, which almost always has other wall surfaces, each of which also reflects the sound. Since each reflection travels a different distance and arrives at a different level, the result is countless overlaid comb filters. They all mix together and statistically become far less pronounced than the clean notches in our simulation. This helps yet again to reduce the detectability of comb filtering.
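This, too, is easy to demonstrate in a sketch. With one reflection the notch pattern repeats exactly every 1/delay; mix in a few reflections at assumed, unrelated delays and levels (made-up values below) and the clean periodic structure is gone:

```python
import numpy as np

freqs = np.arange(200.0, 10000.0, 2.5)  # 2.5 Hz grid, so 100 Hz = 40 bins

def response_db(delays_s, gains):
    # Direct sound plus a set of delayed, attenuated copies.
    h = np.ones_like(freqs, dtype=complex)
    for tau, g in zip(delays_s, gains):
        h += g * np.exp(-2j * np.pi * freqs * tau)
    return 20 * np.log10(np.abs(h))

one = response_db([0.010], [0.7])                             # single reflection
many = response_db([0.004, 0.0071, 0.0113], [0.3, 0.3, 0.3])  # assumed mix

# The single reflection repeats exactly every 1/tau = 100 Hz (40 bins);
# the mix of reflections has no such clean periodic notch pattern.
print(np.allclose(one[40:], one[:-40]))    # True
print(np.allclose(many[40:], many[:-40]))  # False
```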
Listening Tests
This is all fine in theory but how does it work in practice? There is good news and bad news. The bad news is that little of it will likely make sense to you! The good news is that for a change, distortion will become our friend and not the enemy.
First let’s dispense with a myth. In home listening spaces, a reflection is not an echo. Yes, the sound bounces and then arrives at our ear. But due to a phenomenon known as the “Haas” effect, the reflection “fuses” with the direct sound and is heard as a “single event.” For an echo to occur you typically need reflection paths longer than 30 feet, which is not going to be the case in even large listening spaces at home. So don't assume there is an issue here based on what you may hear in much larger public spaces, where distinct echoes can indeed be a problem. That is not the case here.
So if we don’t hear reflections as echoes, how do we hear them? The answer depends on the direction of the reflection. Let’s focus for now on the so-called first reflection points on the side walls, which “common wisdom” says should be eliminated. Experiments conducted by Dr. Toole and Olive show that such reflections, when perceptible, serve to widen the apparent source of the sound (i.e., it no longer seems to come out of a small speaker). It turns out this is a preferred outcome, one that human listeners in controlled settings rate as a good thing! There simply is more realism to a sound image that extends past the speakers and better mimics our everyday experiences in reflection-rich environments. You are probably still scratching your head wondering how what I just said can be true, so let me quote some sections of Dr. Toole’s AES paper and book in this regard:
“Most reflections arrive from directions different from the direct sound, and perceptions vary considerably. Two ears and a brain have advantages over a microphone and an analyzer. The fact that the perceived spectrum is the result of a central (brain) summation of the slightly different spectra at the two ears attenuates the potential coloration from lateral reflections significantly [34]. If there are many reflections, from many directions, the coloration may disappear altogether [35], a conclusion to which we can all attest through our experiences listening in the elaborate comb filters called concert halls. Blauert summarizes: “Clearly, then, the auditory system possesses the ability, in binaural hearing, to disregard certain linear distortions of the ear input signals in forming the timbre of the auditory event.”
“It was in this room [Dr. Toole’s Reference IEC room at National Research Council] that experience was gained in understanding the role of first reflections from the side walls. The drapes were on tracks, permitting them to easily be brought forward toward the listening area so listeners could compare impressions with natural and attenuated lateral reflections (see Figures 4.10a and 8.8). In stereo listening, the effect would be considered by most as being subtle, but to the extent that there was a preference in terms of sound and imaging quality, the votes favored having the side walls left in a reflective state. In mono listening, the voting definitely favored having the side walls reflective."
"See the discussions in Chapter 8, and Figures 8.1 and 8.2, which show that attenuating first reflections seriously compromises the diffusivity of the sound field and the sense of ASW/image broadening. One of the problems with both music and movies is that sounds that in real life occupy substantial space—multiple musicians or crowds of people, for example—end up being delivered through a single loudspeaker—a tiny, highly localizable source. The precision of the localization is the problem. Most of what we hear in movies and television is monophonic, delivered by the center channel, so a certain amount of locally added room sound may be beneficial; this is definitely a case where a personal opinion is permitted."
Expanding on the last sentence, there is indeed a subset of people who are sensitive to comb filtering and hence strive to eliminate it. A prime example is recording engineers. Since they are able to generate comb filtering electrically, they have learned what it sounds like and hence have a well above average ability to hear it. And at any rate, the process of mixing and creating music requires being able to detect small changes to the parameters of that work. Both of these factors explain their preference for absorbing reflections. Such characteristics are not, for the most part, shared by the general public or the audiophile community, whose aim is enjoyment of music. So be careful in following “what the PROs do” when it comes to room acoustics. You are unlikely to fit in that group.
As further evidence, Clark in 1983 set out to create and test four different scenarios that involved comb filters:
1. Using two speakers playing a mono signal. The second speaker’s sound combines with the first creating comb filtering.
2. A reflector held vertically to the right of the listener, between him and the speaker. While the path there was shorter than a typical wall reflection, it nevertheless creates comb filtering just the same.
3. Same as #2 but the reflector held horizontally.
4. Creating the comb filter electronically by delaying the signal and combining it with itself. This is the same thing I did in my earlier simulation.
The important factor is that Clark made sure that the amplitude of the reflection (simulated or otherwise) was kept constant in all four scenarios. On the surface one would expect the effect to be similar because the reflection levels were the same. Yet the results were anything but!
In scenario #1, the addition of a second speaker was considered to have a “moderate and pleasing effect.” This, despite the fact that comb filtering was generated as a result of the second speaker. Clearly the listeners liked the effect more than they were bothered by any frequency response variations.
Scenario #2 was reported as having a “very small effect.” What looked awful on a frequency response measurement was barely noticed. Turning the reflector horizontal did make it a bit more noticeable (for this reason you should absorb floor reflections with thick carpeting and padding). But still, in the grand scheme of things, it did not have the same magnitude of effect as scenario #1.
The most surprising was scenario #4, where the outcome was a “greatly degrading effect.” Let me repeat: the same distortion, created electronically and sent out of the speaker, was a very negative thing. The reason is that when comb filtering is created that way, we get neither the benefit of image widening nor the psychoacoustic factors that reduce its severity. This is how Clark concludes the paper:
“Two speaker mono was considered superior to the one speaker, one path mono. A reflection from a vertical surface was barely audible but a horizontal reflector was more audible. An electronic delay comb filter was highly audible and annoying.”
He goes on to emphasize how misleading measurements in both time and frequency domains can be in determining the audible effects:
- A comb filter response can be preferred over flat [frequency response].
- More lateral sound than a single speaker provides can be preferred.
- Response notches are almost inaudible if the notches are filled in by reflections within 10 ms.
- Response notches are annoying if not filled in by reflections.
- Vertical (wall) reflection notches were subtly audible.
- Horizontal (desk) reflection notches are more audible.
- Time responses can look the same and sound different.
- Frequency responses can look the same and sound different.
- A single test mic hinders getting directional information relevant to audibility.
Considering how long ago this test was done and how simple it is, isn't it remarkable that we still cling to conclusions completely counter to this research?
Conclusion
So what seemed like an open-and-shut case of eliminating wall reflections due to anomalies in the room's frequency response becomes much more complex when one considers how we hear sounds in our home listening spaces. It shatters the “gut feelings” one might have about the problem and its solution. I don't know about you, but I am fascinated by all of this. It is not every day that we get to like some distortion and save money by not trying to eliminate it! So complexity and a deep understanding of the science do have their virtues.
References
"Sound Reproduction: The Acoustics and Psychoacoustics of Loudspeakers and Rooms,"Dr. Floyd Toole, 2008 [book]
“The Detection of Reflections in Typical Rooms, ”Olive, Sean E., Toole, Floyd E., AES Convention: 85 (November 1988)
“A Computer Model of Binaural Localization for Stereo-Imaging Measurement, ”Macpherson, Ewan A., AES Convention: 87 (October 1989)
“Measuring Audible Effects of Time Delays in Listening Rooms, ”Clark, David, AES Convention: 74 (October 1983)