
Loudness perception and critical bands

roba

New Member
Joined
Jul 5, 2023
Messages
1
Likes
2
Hi friends, I've been doing some reading about critical bands, which as I understand it describe the bandwidth around a particular sound within which a second sound will be masked. This has me wondering about models of loudness that take critical bands into account. To me, the existence of critical bands implies that the loudest sound within a given band will essentially dominate the perceived loudness of that band, and that a second sound within the same band would not increase perceived loudness very much overall, despite adding more energy to the signal. With a little experimentation, I can roughly prove this to myself: a pure 1 kHz sine tone peaking at 0 dB sounds about as loud as a surprisingly wide band of white noise around 1 kHz whose components also peak at 0 dB, even though the noise has far more total energy.
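For anyone curious how wide that band actually is in numbers: a common estimate of auditory filter width is the Glasberg & Moore ERB formula. A minimal sketch (the formula is standard; the function name is mine):

```python
def erb_hz(f_hz):
    """Equivalent rectangular bandwidth (Glasberg & Moore, 1990)
    of the auditory filter centered at f_hz, in Hz."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)

print(round(erb_hz(1000.0), 1))  # about 132.6 Hz wide at 1 kHz
print(round(erb_hz(100.0), 1))   # only about 35.5 Hz wide at 100 Hz
```

Noise roughly within one ERB of the tone tends to fuse with it in loudness; only energy beyond that starts adding loudness appreciably, which lines up with the "surprisingly wide band" observation.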

Another way I recently saw this come up is when comparing the spectrums of a lead guitar with a rhythm guitar, shown below. I perceive both of these tracks to be about the same loudness across the spectrum, but the visuals paint a different picture. The lead is missing large scoops of low end around 200Hz and below 100Hz that are present in the rhythm, and yet they sound the same to me in this band. It seems to me like the large fundamental in the lead guitar at 120Hz-ish is largely compensating for the lack of other frequencies around it.
[image.png: spectra of the lead and rhythm guitar tracks]

I'm curious if there are any well known concepts or research that model loudness to account for this? It seems like the equal loudness contours do not take this effect into account. I'm imagining some kind of study where, instead of asking listeners to judge how they perceive sound A specifically when sound B is moved into A's critical band, the study asks "How much louder do you find A+B than A alone (if at all) when A and B are within each other's critical bands?" Ultimately, it would be really cool to generate a "perceptual" spectrogram of a sound, where the two guitars above end up having much more similar spectrums in the low end.
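One crude way to sketch that "perceptual spectrogram" idea is to pool spectral power into critical bands on the Bark scale before plotting, so that a lone strong fundamental and a cluster of weaker partials in the same band produce similar per-band energy. This uses the standard Zwicker/Traunmüller Bark approximation; the pooling scheme itself is my simplification, not an established loudness model:

```python
import math

def bark(f_hz):
    """Critical-band rate in Bark (Zwicker's approximation)."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)

def band_powers(spectrum):
    """Pool linear power into integer Bark bands.
    spectrum: iterable of (freq_hz, linear_power) pairs."""
    bands = {}
    for f, p in spectrum:
        z = int(bark(f))
        bands[z] = bands.get(z, 0.0) + p
    return bands

# Illustrative only: a lone loud ~120 Hz fundamental vs. several weaker partials nearby.
lead   = band_powers([(120.0, 1.0)])
rhythm = band_powers([(110.0, 0.3), (130.0, 0.35), (150.0, 0.35)])
```

Both toy spectra land entirely in Bark band 1 with the same pooled power, which mirrors the "they sound the same to me in this band" observation even though the bin-level pictures differ.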
 

tmuikku

Senior Member
Joined
May 27, 2022
Messages
302
Likes
338
Hi, I suppose there are lots of studies on this. For example, the mp3 file format takes advantage of it, discarding information that is masked.

In a typical home stereo setup, the subject of masking by peaks is almost always present because of room modes; no matter how good or bad a system is, the modal peaks seem to be the most severe issue. There can easily be +10 dB peaks somewhere in the low frequencies, and they are bad in a way that when you dip them out with EQ, the sound seems to get better well above the peak; everything seems to get better. I don't know everything that's involved in perceiving modes, but masking is one part of it.

If you measure your setup with REW, you could have a strong +10 dB peak at the listening position at 40 Hz, for example. Apply psychoacoustic smoothing and the peak widens and drops in amplitude. Try to EQ the peak based on the psychoacoustically smoothed graph and almost nothing happens: measure again and about the same peak is still there in the smoothed result, because the room mode dropped only a few dB and is still around +10 dB, and the sound didn't really change either. All you did was reduce bass in general, mostly around the peak, while the peak still dominates the perception of bass.

Now, instead, don't smooth the graph and make a strong dip, like -15 dB with a Q of 10 or so, to really address the mode, and voilà.
I think smoothing is fine for higher frequencies or for contouring the sound, but for fixing issues it's better to see the peaks, as those really seem to be very audible and ruin a lot. Instead of smoothing, one could play with windowing as well.
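For concreteness, here is roughly what that narrow corrective filter looks like as a biquad. This uses the well-known RBJ Audio EQ Cookbook peaking-EQ formulas; the specific numbers (fs = 48 kHz, f0 = 40 Hz, -15 dB, Q = 10) are just the example values from the post:

```python
import cmath
import math

def peaking_gain_db(f_hz, fs=48000.0, f0=40.0, gain_db=-15.0, q=10.0):
    """Magnitude response in dB of an RBJ cookbook peaking EQ at f_hz."""
    a = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    b = [1.0 + alpha * a, -2.0 * math.cos(w0), 1.0 - alpha * a]   # numerator
    d = [1.0 + alpha / a, -2.0 * math.cos(w0), 1.0 - alpha / a]   # denominator
    z = cmath.exp(-1j * 2.0 * math.pi * f_hz / fs)                # z^-1 on the unit circle
    h = (b[0] + b[1] * z + b[2] * z * z) / (d[0] + d[1] * z + d[2] * z * z)
    return 20.0 * math.log10(abs(h))

print(round(peaking_gain_db(40.0), 1))  # -15.0 dB, exactly on the mode
print(peaking_gain_db(80.0))            # small residual one octave up, well under 1 dB
```

A Q of 10 at 40 Hz means a bandwidth of only about 4 Hz, so the dip barely touches anything outside the mode, which is the point of skipping the smoothing when placing it.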
 

dasdoing

Major Contributor
Joined
May 20, 2020
Messages
4,301
Likes
2,770
Location
Salvador-Bahia-Brasil
tmuikku said: (post quoted in full above)

I EQ to the psy-smoothed response 5 times in a row, and then to the var-smoothed response 3 times in a row
 

tmuikku

Senior Member
Joined
May 27, 2022
Messages
302
Likes
338
Do you mean you EQ with psychoacoustic smoothing enough times that it flattens out? Is it that you always adjust the filters in between? Why not EQ the unsmoothed response and get it smooth in one go? How is that working out?

As EQing the room sound to be better at one position makes it worse somewhere else, I think smoothing gives a feeling of safety against over-EQing, but I'm not sure if that's a good or bad thing; it depends on what you are doing. If you want to fix a peak, you should fix the peak without smoothing, and be sure what you are fixing and where it gets better and where worse. Worst case, EQing an issue from a smoothed response would not fix the issue anywhere, and would make things worse somewhere.

I also use psychoacoustic smoothing when tailoring toward a target response, but first I'd check out the issues, i.e. the peaks.

As a disclaimer, I've only played with my own setup in my own room and am not a pro at this, just observing what happens as I play around.
 

DRMLFL

Member
Joined
Dec 14, 2023
Messages
32
Likes
30
Location
Germany
1 kHz - where the letter P is located - is really the scale, balance, and the middle point of our hearing. The letter P is the most plosive of all plosives (pun intended).
So, think of air movement when saying P, and think of phonetics and aspirated voiceless bilabial plosive sounds.
I found this on Wiki and translated it with DeepL for a better understanding of what I want to describe:
The voiceless bilabial plosive (a voiceless closure sound formed with both lips) is a consonant that is articulated by the air being trapped behind the closed lips and suddenly escaping while the vocal folds are at rest.
Here is a video, and listen carefully to the very first P sound:

Communication can be so diverse, and still the easiest and fastest is spoken communication. This is (I guess) why humans have the ability to talk to each other, this is what separates us from other mammals. We clearly can describe anything in the world with words and we even hear ourselves talking when we think. Words can hurt but they also can heal.

The following picture, the so-called speech banana, shows the connection between frequencies and letters.
[speech-banana1.picture.jpg: the "speech banana" chart]


So, taking also the Bark scale and/or the mel scale into consideration, I think this is why 1 kHz is the "heart" of our hearing. Our hearing leads us to our sixth sense, which is to perceive emotions with your heart, your gut feeling.

Stay focused!
 

Blumlein 88

Grand Contributor
Forum Donor
Joined
Feb 23, 2016
Messages
20,771
Likes
37,636
One thing that makes this tough is that although our hearing does something like breaking the audible range into 32 or so critical bands, those bands aren't static. They slide around depending upon the sound being heard; the filter is centered on the highest level within each band. The spacing more or less conforms to ERBs (equivalent rectangular bandwidths). J_J would be the forum member who could be most informative about this. REW and other software have ERB smoothing and psychoacoustic smoothing to sort of help with this.
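For anyone curious what ERB smoothing amounts to mechanically: average the power spectrum with a window whose width tracks the ERB at each frequency. A toy version (rectangular window, O(n²) loop; real implementations differ in window shape and efficiency):

```python
def erb_hz(f_hz):
    """Glasberg & Moore ERB of the auditory filter at f_hz, in Hz."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)

def erb_smooth(freqs, powers):
    """Smooth a power spectrum with a rectangular window one ERB wide."""
    out = []
    for fc in freqs:
        half = erb_hz(fc) / 2.0
        window = [p for f, p in zip(freqs, powers) if abs(f - fc) <= half]
        out.append(sum(window) / len(window))
    return out

# A flat spectrum with one narrow 20 dB spike at 1 kHz:
freqs = [10.0 * i for i in range(1, 501)]                 # 10 Hz ... 5 kHz
powers = [100.0 if f == 1000.0 else 1.0 for f in freqs]
smoothed = erb_smooth(freqs, powers)
```

After smoothing, the 100x spike at 1 kHz averages down to roughly 8.6 (13 bins fall inside the ~133 Hz ERB window, so (12 + 100) / 13). That is also the trap tmuikku describes above: a narrow peak looks much tamer on a smoothed graph than it sounds.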
 

Curvature

Major Contributor
Joined
May 20, 2022
Messages
1,116
Likes
1,410
roba said:
Another way I recently saw this come up is when comparing the spectrums of a lead guitar with a rhythm guitar, shown below. I perceive both of these tracks to be about the same loudness across the spectrum, but the visuals paint a different picture. The lead is missing large scoops of low end around 200Hz and below 100Hz that are present in the rhythm, and yet they sound the same to me in this band. It seems to me like the large fundamental in the lead guitar at 120Hz-ish is largely compensating for the lack of other frequencies around it.
I think your interpretation is correct.
roba said:
I'm curious if there are any well known concepts or research that model loudness to account for this? It seems like the equal loudness contours do not take this effect into account.
I don't fully understand your question. What sources did you use to learn about CBs? Because it seems like you already answered it above. The classical examples discuss the amount of energy in a CB as a function of the bandwidth and level of a sound, and level and bandwidth can be traded to create sensations of equal loudness.

Equal loudness contours are a function of the individual tuning of each CB, which corresponds to a physical location on the basilar membrane.
roba said:
I'm imagining some kind of study where, instead of asking listeners to judge how they perceive sound A specifically when sound B is moved into A's critical band, the study asks "How much louder do you find A+B than A alone (if at all) when A and B are within each other's critical bands?" Ultimately, it would be really cool to generate a "perceptual" spectrogram of a sound, where the two guitars above end up having much more similar spectrums in the low end.
That's confusing wording. Sounds, other than pure tones, never fall into just one CB.

It's been a while since I've looked at the studies, but many have been done asking that question.

Zwicker & Fastl's book Psychoacoustics: Facts & Models is a good reference. That and Moore's Introduction to the Psychology of Hearing.

I think the problem with perceptual spectrograms is that loudness is tied to many factors, and it isn't clear how they all interact in the auditory cortex. Besides spectrum, duration and envelope are very important determinants, as are direction and binaural integration.

One thing to note is that var or psychoacoustic smoothing approximates the way the ear interprets the relative loudness of one frequency vs. another, but the ear actually processes sound through those 20-40 CB filters (there are a few different models out there), whose tuning depends on level and ear health. A good model would show you what's happening in each one; in my mind, anyway. I don't know what the best visual representation of this would be. I think something based on the psychoacoustic encoders used for compression would do a good job, since they are based on masking models and do interesting, well-researched slicing of the audible spectrum based on CBs.
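To make the within-band vs. across-band distinction concrete, here is a deliberately oversimplified loudness sketch: energies add inside a critical band, while loudness (a compressive power of band energy, with a Stevens-like exponent around 0.3) adds across bands. The numbers and structure are illustrative only, not any published model:

```python
def total_loudness(band_energies, exponent=0.3):
    """Toy model: total loudness = sum over bands of (band energy)^exponent."""
    return sum(e ** exponent for e in band_energies)

# Two equal-energy tones landing in the SAME critical band: energies add first.
same_band = total_loudness([1.0 + 1.0])      # 2**0.3, ~1.23: only a bit louder than one tone
# The same two tones in SEPARATE critical bands: loudnesses add directly.
separate_bands = total_loudness([1.0, 1.0])  # 2.0: roughly twice as loud
```

This is the effect in the original question: a 3 dB energy increase inside one band barely moves perceived loudness, while spreading the same energy across bands nearly doubles it.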
 