
Perception of Sound Loudness/Dynamic Range

amirm

Founder/Admin
Staff Member
CFO (Chief Fun Officer)
Joined
Feb 13, 2016
Messages
44,722
Likes
241,583
Location
Seattle Area
This is a topic that I expect some of the members in the forum who are in the medical field to also know. If so, feel free to contribute :).

The outer ear is designed to collect and magnify the mid-range frequencies. This is the reason the equal-loudness graphs of Fletcher-Munson show much higher sensitivity in that region, which naturally enables one to hear other humans better:

[Image: equal-loudness contours (Fletcher-Munson type curves)]


The effect is like cupping your hand around your ear to hear better. The "horn" created this way is best tuned to frequencies of roughly 1 to 5 kHz.
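To put a rough number on that midrange emphasis, here is a small Python sketch (my illustration, not part of the post) that evaluates the standard IEC 61672 A-weighting formula. A-weighting is loosely derived from equal-loudness contours, so the frequencies it attenuates least are roughly the region where the ear is most sensitive:

```python
import math

def a_weighting_db(f: float) -> float:
    """A-weighting gain in dB at frequency f (Hz), per the IEC 61672 formula.

    A-weighting is a rough inverse of an equal-loudness contour, so the
    frequencies it attenuates least are the ones the ear hears most easily.
    """
    f2 = f * f
    ra = (12194.0 ** 2 * f2 ** 2) / (
        (f2 + 20.6 ** 2)
        * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
        * (f2 + 12194.0 ** 2)
    )
    return 20.0 * math.log10(ra) + 2.0  # +2.0 dB normalizes the curve to 0 dB at 1 kHz

for f in (31.5, 125, 500, 1000, 2500, 4000, 8000, 16000):
    print(f"{f:>7.1f} Hz  {a_weighting_db(f):+6.1f} dB")
```

Running it shows deep attenuation in the bass, a broad peak between roughly 1 and 4 kHz, and a gentle roll-off above that.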

At this point we have air pressure variations inside the ear canal. That gets us to the middle ear, which starts with the eardrum. This is an impedance-matching system: it translates the high-pressure but small movement of air molecules into the larger movement of the fluids in the inner ear that ultimately lets us detect sound.

There are two sets of hair cells. The outer hair cells (OHC) are not hairs in the classical sense in that they are connected at both ends. They play a critical role. Without them, the maximum dynamic range we could detect would be about 60 dB. This is computed from the thermal and other noise that sets the lowest level sound we can hear, and the maximum movement that is possible. Yet listening tests show that we can hear a dynamic range of 120 dB. How do we go from 60 to 120? The outer hair cells enable that by creating dynamic compression (as in volume compression). It is part of a clever positive mechanical feedback loop. The hairs actually lengthen/shorten and stiffen/loosen their structure to enable this dynamic range control:

[Animation: organ of Corti with hair cell motion]
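A quick way to sanity-check the 60 vs 120 dB numbers, and to see how a compressive stage can bridge them, is the toy Python sketch below. The dB figures are just the standard 20*log10 arithmetic; the knee and ratio in the compressor are made-up illustrative values, not a model of actual outer-hair-cell mechanics:

```python
import math

def db_range(p_max: float, p_min: float) -> float:
    """Dynamic range in dB for a ratio of sound pressures."""
    return 20.0 * math.log10(p_max / p_min)

# A 1,000:1 pressure ratio is 60 dB; a 1,000,000:1 ratio is 120 dB.
print(db_range(1_000, 1))      # 60.0
print(db_range(1_000_000, 1))  # 120.0

def compressive_response(input_db: float, knee_db: float = 40.0, ratio: float = 3.0) -> float:
    """Toy static compressor: linear below the knee, compressed above it.

    Only an illustration of how a compressive stage squeezes a 120 dB input
    range into a much smaller output range; the knee and ratio are arbitrary.
    """
    if input_db <= knee_db:
        return input_db
    return knee_db + (input_db - knee_db) / ratio

for level in (0, 20, 40, 60, 80, 100, 120):
    print(f"in {level:3d} dB -> out {compressive_response(level):5.1f} dB")
```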


The ultimate goal is to change the sensitivity of the inner hair cells that drive the nerves feeding the brain. Pretty cool system if you ask me! :) BTW, I am trying to simplify the concepts here and provide the general consensus in this area as I know it. Research into modeling the ear is ongoing and it remains challenging to do fully.

At this point we are done with the ear and the next chapter is the brain. This thing is even more clever. The first thing that happens is that all the data generated by the nervous system is captured "bit for bit," if you will, in a short-term storage memory. Little to no analysis of what is being heard is done at this stage. It is a simple dump of all the data.

This short-term store is called "echoic memory" because it allows the brain to revisit what has been captured. Ever ask someone to repeat something, only to remember what they said before they do? That is your brain going back to echoic memory, re-analyzing what was captured and saying, "aha! now I get it." Echoic memory can hold about 4 to 5 seconds. Despite the short duration in our terms, that is an incredible amount of data if you think about what is being recorded.

The cognitive part of the brain then kicks in. It analyzes and chooses what it wants to store for the medium and long term. Think of driving to work and someone cuts you off. You will remember that car, its color, etc. very well. But you will likely not remember any of the other cars you passed. We saw those other cars, but the brain assigned little to no value to keeping that information. The same thing happens with sound. We choose to hear parts of the music. We likely miss significant details which we may then pick up in subsequent listens.

The cognitive process plays an important role in acoustics. The Haas effect, which says we don't hear faded versions of a sound that arrive a few milliseconds later, e.g. a wall reflection, is the result of this analysis by the brain. If the two signals are similar in nature, the brain says, "this looks like a lousy copy of the louder sound, so I am going to ignore it." Increase the time gap a lot and the brain decides the other event is different, such as a true echo in a large space. This is why room reflections are not what they seem to be on paper and in measurement systems. Those lack the filtering that goes on in the brain.
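If you want to experiment with this yourself, here is a small NumPy sketch that mixes a delayed, attenuated copy of a test burst back into itself. The 10 ms / 80 ms delays and the -6 dB level are arbitrary illustrative numbers, chosen only so the short delay falls inside the typical precedence (Haas) fusion window and the long one falls well outside it:

```python
import numpy as np

fs = 48_000                      # sample rate in Hz
t = np.arange(0, 0.5, 1 / fs)    # half a second of signal
direct = np.sin(2 * np.pi * 1000 * t) * np.exp(-t * 10)   # decaying 1 kHz burst

def add_reflection(x: np.ndarray, delay_ms: float, gain_db: float) -> np.ndarray:
    """Mix a delayed, attenuated copy of x back into x (a single wall reflection)."""
    delay_samples = int(round(delay_ms * 1e-3 * fs))
    gain = 10 ** (gain_db / 20)
    y = x.copy()
    y[delay_samples:] += gain * x[:len(x) - delay_samples]
    return y

# Within roughly the first tens of milliseconds the brain fuses the copy with
# the direct sound (precedence/Haas effect); push the delay out far enough and
# the same copy is heard as a separate echo.
fused = add_reflection(direct, delay_ms=10, gain_db=-6)
echoed = add_reflection(direct, delay_ms=80, gain_db=-6)
```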

How long echoic memory lasts before decaying is under debate. I have seen estimates from a few seconds to 10 or even 20 seconds. This, and the short duration of echoic memory, has a very important impact on AB switching audio tests. If I play A and then play B 1 minute later, the brain simply does not have proper recall of the other clip. If there are huge differences between them, the brain may have stored those in long-term memory. But if the differences are very small, there is simply no recall of such detail, as the echoic memory has long been wiped clean.

It is for the above reason that you hear the requirement for very fast switching between audio clips in AB testing. Speaking personally, I cannot pick out small differences at all if the gap is 1 second or more. I usually isolate a single item such as a single guitar pick, put it in a loop, and then keep switching back and forth between two versions of it. That way, echoic memory has a complete record of both sounds and it enables me to analyze them at whatever detail is necessary. By looping I am constantly refilling my memory with both versions of the stimulus.
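For anyone who wants to try the looping approach, below is a minimal Python sketch of an ABX trial loop. The `play` callable and the clips are placeholders for whatever playback path you actually use (e.g. a sound device library), so treat it as a skeleton rather than a finished tool:

```python
import random

def run_abx_trials(clip_a, clip_b, play, n_trials: int = 16) -> int:
    """Minimal ABX loop: on each trial X is secretly A or B; the listener may
    replay A, B and X as often as they like (keeping echoic memory topped up),
    then guesses which one X was. Returns the number of correct answers.

    `play` is whatever playback callable your environment provides; it is
    deliberately left abstract here.
    """
    correct = 0
    for trial in range(1, n_trials + 1):
        x_is_a = random.choice([True, False])
        x = clip_a if x_is_a else clip_b
        while True:
            choice = input(f"Trial {trial} - play [a/b/x] or answer [A/B]: ").strip()
            if choice == "a":
                play(clip_a)
            elif choice == "b":
                play(clip_b)
            elif choice == "x":
                play(x)
            elif choice in ("A", "B"):
                correct += (choice == "A") == x_is_a
                break
    return correct

# With 16 trials, getting 12 or more right has under ~4% probability by chance.
```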

Now imagine a test where you listen to one set of audio cables for a few minutes, stop, then put in another set of audio cables. By then there is no way we have perfect recall of the first system. That detailed memory is long gone. A much-filtered version of it of course exists, in the form of what song it was, what was played, etc. But nothing down to how that one 100-millisecond transient sounded. In that sense, any double-blind AB test that involves such long switching times is unreliable and favors the person not finding a difference, as they get frustrated that they can't recall the other clip accurately enough. Again, if the differences are very large, then we can rely on mid- and long-term memory to detect them, but here I am assuming that is not the case with the cables in this example.

The above is readily apparent to anyone who has taken these blind tests. Do your own ABX test with a compressed and uncompressed file and keep increasing the switching gap. No doubt the job gets harder and harder.

So what about the theory that "long term listening" is a better comparison? I don't know :). It is possible that we commit to long-term memory nuances that we recall when we perform such tests. It is also possible that we simply hear different things in each pass. When we hear System A, the brain picks a different set of info from echoic memory than in the case of System B. That delightful note we hear in the second system may have always been there in System A, but we just did not focus on it.

Anyway, this is what I know of the topic :). I have on purpose skipped over a large chapter on how we actually detect sound, focusing instead on its capture and recognition. Appreciate comments, debates, info, etc.
 

Blumlein 88

Grand Contributor
Forum Donor
Joined
Feb 23, 2016
Messages
20,793
Likes
37,702
Glad to see this post here. Quite straightforward about what we know of how hearing works, and it leaves room for the areas not yet known: how that data gets parsed further up the line in the brain.
 

John Kenny

Addicted to Fun and Learning
Joined
Mar 25, 2016
Messages
568
Likes
18
Amir, there are some additional factors to be considered & some oversimplifications to be corrected in what you say.

Yes, A/B testing's strength is in the use of echoic memory, which is useful for detecting frequency/amplitude differences between audio, but this isn't as useful for other factors which require higher levels of auditory processing.

Most of what our auditory system does is make sense of the vibrations of the eardrum, i.e. construct from these vibrations a scenario that fits with our experience of how the world works. This whole field is called auditory scene analysis - it's analogous to visual scene analysis - we aren't cameras or microphones - we are much more like a distributed network of parallel computers operating in real time, analysing & creating a "best guess" visual/audio scene from elements within the signals. This "best guess" is continually being evaluated & changed if necessary based on the new information that is continually being transmitted to the auditory processing system. So auditory perception is, of necessity, a tenuous conclusion, a flexible decision about what the signals signify. We live with this uncertainty - it's a fundamental characteristic of our perceptions. It's one of the reasons we second-guess our choice in A/B testing - it's also why I say we need a specific "tell" when doing A/B testing, as for most of us the natural uncertainty of our perception overwhelms us & makes for null results.

These elements are still being uncovered, but we know that the ear breaks the signal up into critical frequency bands using a bank of auditory filters whose widths are described by ERBs (Equivalent Rectangular Bandwidths); we know that an early stage in auditory processing uses a bank of modulation filters tuned to specific frequencies (mainly to do with speech analysis but probably also used more generally); and we kinda know that a statistical analysis of certain types of sounds is the preferred method for analysing & deciding how realistic such sounds are.
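For concreteness, the widely used Glasberg & Moore (1990) fit for the ERB scale is easy to evaluate; here is a quick Python sketch (an illustration added here, not from the post) that prints the auditory filter bandwidth and the approximate ERB number at a few frequencies:

```python
import math

def erb_bandwidth_hz(f_hz: float) -> float:
    """Equivalent Rectangular Bandwidth of the auditory filter centred at f_hz,
    using the Glasberg & Moore (1990) fit: ERB = 24.7 * (4.37*f/1000 + 1)."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)

def erb_number(f_hz: float) -> float:
    """ERB-rate scale: roughly how many auditory filters lie below f_hz."""
    return 21.4 * math.log10(4.37 * f_hz / 1000.0 + 1.0)

for f in (100, 500, 1000, 4000, 10000):
    print(f"{f:>6} Hz  ERB = {erb_bandwidth_hz(f):7.1f} Hz  ERB number = {erb_number(f):4.1f}")
```

The takeaway is that the filters get much wider with frequency, which is part of why the ear's resolution of spectral detail is not uniform across the band.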

So you see - a lot of the features in the sound stream are outside of echoic memory - they are only created by the brain's processing of a longer section of the audio stream, not an instant A or B moment in that stream. Things like the quality of the soundstage, or the realism of the playback illusion, require sampling a longer piece of audio than instant A/B techniques allow.

We also have to be careful about Fletcher-Munson curves & treating them as the bible. For instance, I believe I gave you this info before - the sensitivity curve for noise defined in ITU-R 468 is very different from the Fletcher-Munson graph, showing a max ~12 dB difference @ 6 kHz but differences elsewhere in the frequency range also:
[Image: ITU-R 468 noise weighting curve]

And we have to be careful about thresholds - the accepted threshold for differentiating the amplitude of two signals is considered to be about 1 dB, yet we are told that two playback devices have to be matched to within 0.1 dB or the louder one will be perceived as "better quality". So here we have an example of a perceived difference when listening to music which would not be predicted as audible based on the accepted JNDs for loudness.
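That 0.1 dB matching requirement is easy to check in practice. Here is a small NumPy sketch (illustrative only, using broadband RMS on synthetic noise) that measures the level difference between two clips and rescales one to match the other before a comparison:

```python
import numpy as np

def rms_db(x: np.ndarray) -> float:
    """RMS level of a signal in dB (relative units, not calibrated SPL)."""
    return 20.0 * np.log10(np.sqrt(np.mean(np.square(x))))

def level_matched(a: np.ndarray, b: np.ndarray, tol_db: float = 0.1) -> bool:
    """True if the broadband RMS levels differ by less than tol_db."""
    return abs(rms_db(a) - rms_db(b)) < tol_db

# Example with synthetic noise: scale one clip to match the other.
rng = np.random.default_rng(0)
a = rng.standard_normal(48_000) * 0.10
b = rng.standard_normal(48_000) * 0.11                     # ~0.8 dB hotter
b_matched = b * 10 ** ((rms_db(a) - rms_db(b)) / 20)        # gain-match to a
print(level_matched(a, b), level_matched(a, b_matched))     # False True
```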
 

NorthSky

Major Contributor
Joined
Feb 28, 2016
Messages
4,998
Likes
946
Location
Canada West Coast/Vancouver Island/Victoria area
this book is interesting, not directly related to audio but..
[Attachment: book cover image]

My browser has an 'incognito' mode, for not memorizing the history, not accumulating data, not giving amazon the products to advertise and strangle me with.

Low-volume listening levels: some audio components and speakers are better than others @ it. To me it is a great attribute.
The lower the distortion @ low listening levels, the more natural the dynamic range @ those levels.
And if the designs were built with that in mind, there is a very good chance that increasing the master volume level would provide even more dynamic range.

* I'm only talking about the "proper" music recordings here, not all that pizzazz from the majority of bad music and bad sound recordings...the type that makes you violent, without life, but aggressive and irritating, or plain 'boringness' sameness level. You know what I'm talking about...you have some of those in your own music library, just like me.

When I look @ measurements of amplifiers, I check @ low volumes, 3 watts and below (down to 0.03 watts). The lower the distortion, the more attracted I am to those amps.
The volume control, analog or digital...it's the most serious "tone" control of them all in the universe, and the ruler of all.

Classical music is generally the one with the largest dynamics. ...Widest range. ...One minute you listen @ a certain volume level, the next second the volume control is the fastest control you reach for! The reaction is so fast that it shakes you up inside! There is no faster way to instinctively react and reach for the volume control than when listening to orchestral classical music.
* With some movies too. But movies are different; between soft-spoken dialog (two lovers) and a sudden hurricane with all mayhem, including the music score with winds, timpani drums and horn sections.

Death to the loudness war! Vive le Roi! ...King. :)
 

Vincent Kars

Addicted to Fun and Learning
Technical Expert
Joined
Mar 1, 2016
Messages
796
Likes
1,593
Classical music is generally the one with the largest dynamics. ...Widest range. ...One minute you listen @ a certain volume level, the next second the volume control is the fastest control you reach for! The reaction is so fast that it shakes you up inside! There is no faster way to instinctively react and reach for the volume control than when listening to orchestral classical music.
Yep

When I am seated at the Concertgebouw in Amsterdam I don’t have this problem.
The moment you play a symphony over your speakers the contrast between soft and loud is as you describe it
Somehow a symphony orchestra doesn’t fit into a living room.
 

fas42

Major Contributor
Joined
Mar 21, 2016
Messages
2,818
Likes
191
Location
Australia
Yep

When I am seated at the Concertgebouw in Amsterdam I don’t have this problem.
The moment you play a symphony over your speakers the contrast between soft and loud is as you describe it
Somehow a symphony orchestra doesn’t fit into a living room.
This is veeeerry important ;);) ... if the behaviour in your living room is like this, then to me it says that there is a problem with the playback system. Problem, what problem??! Answer: the system can't do 'intensity' well, limitations of the setup are kicking in.

It's one of the first things I do if I visit an unknown system - effectively test whether it's capable of producing intensity in the sound, competently. If it can't, then I scale back my expectations, and listen to it differently - it can't go over 60mph, therefore there's no point in pushing the accelerator hard to the floor ...
 

pos

Addicted to Fun and Learning
Forum Donor
Joined
Feb 13, 2018
Messages
574
Likes
720
We also have to be careful about Fletcher-Munson curves & treating them as the bible. For instance, I believe I gave you this info before - the sensitivity curve for noise defined in ITU-R 468 is very different from the Fletcher-Munson graph, showing a max ~12 dB difference @ 6 kHz but differences elsewhere in the frequency range also:
[Image: ITU-R 468 noise weighting curve]

@amirm, do you have access to ITU-R 468 weighting in your AP?
It would be very interesting for DNR and S/N measurements.
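In the meantime, a generic way to approximate a weighted noise measurement in software is sketched below. The weighting curve is passed in as tabulated frequency/gain points; the actual ITU-R BS.468 values would have to be taken from the Recommendation itself, and note that 468 also specifies a quasi-peak detector, which this simple RMS version ignores, so treat it as a first approximation only:

```python
import numpy as np

def weighted_noise_rms_db(x: np.ndarray, fs: float,
                          curve_hz: np.ndarray, curve_db: np.ndarray) -> float:
    """Apply a weighting curve, given as tabulated (frequency, gain in dB)
    points, to a noise recording in the frequency domain and return the
    weighted RMS level in dB (relative units)."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), 1.0 / fs)
    gains = 10 ** (np.interp(freqs, curve_hz, curve_db) / 20.0)   # interpolate the curve
    weighted = np.fft.irfft(spectrum * gains, n=len(x))           # back to time domain
    return 20.0 * np.log10(np.sqrt(np.mean(weighted ** 2)))
```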
 

jss

Member
Joined
Oct 30, 2021
Messages
7
Likes
2
These days the "subjective study" of audio quality goes well beyond the what the PEAQ ITU-T BS.1387 and PESQ ITU-T P.862 set out to achieve. AI/ML have been now heavily employed. As listening tests (aka subjective evaluation in a controlled environment) are time consuming and costly and cannot be easily carried out at development stages, plus prevailing pandemic, researches are turning to ML/AI/meta-analyses to try and come up with models to predict various AQ / SQ. Eg. dynamics in music performance - its perceptual prediction. However, the advance is not without challenge, one main hurdle is the ground-truth quality and even bias and potential errors..
 