
Auditory Scene Analysis

John Kenny

Addicted to Fun and Learning
Joined
Mar 25, 2016
Messages
568
Likes
18
Yes, this hobby & the goal of audio reproduction is the creation of an illusion - an illusion that gives us enough audible cues to satisfy our ever-vigilant auditory processing. Much like television, videos or movies, which offer enough believability to let us forget their limitations & become engaged with the content - this is where the emotional connection begins. If there isn't enough believability, we are constantly aware of the medium & its portrayal - when we are bored by the sound, it's a sure indication that this believability is missing or less convincing.

Let's forget about the psychophysical aspect of the ear mechanism & focus on Auditory Scene Analysis, or ASA. This area of study, first formalised by Bregman around 1990, is concerned with how we make sense of the vibrations of the eardrums & create an auditory scene from these signals. Much as we create a visual scene from the impact of photons on the rod & cone cells in the eyes, we create an auditory scene from the two streams of electrical signals coming from the ears.

Now, when you think about it, this is a highly complex & interesting problem that the brain has to solve - to continuously create a fully realistic, moving auditory scene that maps the auditory objects in that scene & follows their changes through time. In other words, close your eyes & listen - you will hear & be able to locate all the sonic objects around you, including the size of the room, etc., just from the electrical signals being generated in the ears. Think about it - this is the equivalent of sitting at a corner of a swimming pool & using only the waves hitting that corner to sense how many people are in the pool, where they are & where they're moving to, what they're doing, how big the pool is, etc.

The idea of ASA seems to have its genesis in trying to answer the question that the "Cocktail Party Effect" gives rise to - how do we follow one conversation among all the other conversations & noise at a party? The audio signals from all sources hit the ear at the same time as the signal from the followed conversation, so how do we isolate & associate the signals that belong to just that conversation from among all the rest, i.e. how do we form an auditory object & follow it in the face of changing signals & changing surrounding auditory signals?

How the brain does this is being teased out in ASA & other areas of sound research. The auditory processing happens whether we are listening to the real world or to the signals from our speakers, which are attempting to create the illusion of an audio performance or event.

We perceptually ascertain audio objects in what we hear through the brain processing that we perform on the signal. The perception of these audio objects occurs because we seem to cross-correlate particular signal markers which we associate with that particular audio object - spatial location, timbre, temporal coherence & amplitude all seem to play a role - let's call these some of our perceptual factors for identifying the object. The interplay & relationships between these factors are the rules, or schemas, or models that are the study of ASA. If these rules are adhered to in the audio playback system then we have a believable illusion - the more the rules are diverged from, the less believable the illusion.
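The spatial-location cue gives a flavour of what "cross-correlating signal markers" might mean mechanically: interaural time difference (ITD) is commonly modelled as a cross-correlation between the two ear signals. Here is a toy numpy sketch - all parameters are invented for illustration, and this is a stand-in for the idea, not a model of the actual neural mechanism:

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 48_000                       # sample rate in Hz (illustrative)
itd_samples = 12                  # simulated interaural delay, ~0.25 ms

# A noise burst at the left ear; the right ear receives a delayed copy
left = rng.standard_normal(4096)
right = np.concatenate([np.zeros(itd_samples), left])[: len(left)]

def corr_at(a, b, lag):
    """Correlation of a[n] against b[n + lag]."""
    if lag >= 0:
        return float(np.dot(a[: len(a) - lag], b[lag:]))
    return float(np.dot(a[-lag:], b[: len(b) + lag]))

# Scan plausible lags (about +-1 ms either way) and pick the best match
lags = range(-fs // 1000, fs // 1000 + 1)
estimated_itd = max(lags, key=lambda l: corr_at(left, right, l))
print(estimated_itd)              # recovers the 12-sample delay
```

The correlation peak lands on the simulated 12-sample delay, which the brain would map to a direction; real binaural models (Jeffress-style delay lines, for instance) refine this idea considerably.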

Now, one thing about digital audio - because it is based on mathematics & is almost infinitely adjustable, it has many new ways to diverge from these rules & introduce audio anomalies that we have never heard in the real world - things like digital filter ringing spring to mind. When we encounter a new audio anomaly that we haven't met before, we tend to be subconsciously confused, as we have no biological model to fit it to & we are not consciously aware of what is wrong - just that we want to turn off the playback, or are bored by the sound & our attention drifts. I suspect that this occurs more often than we would like to admit & may well be where the disagreements arise - between those who intuitively (or explicitly) know this & those who believe that measurements tell us everything?

This part is something I wrote before which continues from the thoughts above (so forgive some of the repetition to what's above):

Yes, it's already been stated here but is worth repeating - what we hear is a construct of our brain processing. Fundamentally, there is not enough data in the signals that are picked up by the two ears to fully construct the auditory scene - we need to use all sorts of pattern-matching, extrapolation, experience of the behaviour of sounds in the real world (biological models of sound), sight, etc. to generate the fairly robust auditory scene that we continuously do.

One of the important points that comes from the research is that we are continually processing the auditory data & updating our best-guess auditory scene - decomposing & analysing the auditory signal stream & comparing it to already-stored auditory models of the world.

People who interpret psychoacoustics as the illusional part of hearing - what makes it untrustworthy - are completely missing this fundamental point: psychoacoustics is what allows us to make sense of the jumble of pressure waves impinging on our eardrums. It's what allows us to pick out auditory objects, such as the bassoon in the orchestra, & to follow its musical line through a performance, or to switch to listening to the string section.

Stereo reproduction is itself a trick - a trick that uses some learned knowledge about psychoacoustics to present an acceptable illusion of a real auditory scene. However, not knowing the full rules/techniques that our brains use in psychoacoustics somewhat hampers this goal of realistic audio reproduction. As a result, we can find that small discoveries are stumbled upon which audibly improve matters in a small way but we have no clear explanation yet for how they are working at the psychoacoustic level.

Without this knowledge of psychoacoustic rules, we are stumbling around using unsophisticated measurements & I believe, incorrect concepts about the limits of audibility. A lot of the improvements that I hear reported in audio are about increased realism, increased clarity, etc. - in other words they are no longer about frequency/amplitude improvements - they are improvements in other factors which our psychoacoustic rules are picking up on & we are perceiving as more realistic. Or, maybe they are small changes in freq/amplitude that currently are dismissed as inaudible but further knowledge about psychoacoustic workings may well reveal them to be audible when part of the dynamics of music & not when tested in a lab with simple tones?
 

fas42

Major Contributor
Joined
Mar 21, 2016
Messages
2,818
Likes
191
Location
Australia
Excellent move kicking this topic off, John - I have not researched ASA at all as yet, though I may have bumped into it when looking around and not realised how relevant it was. I'll check any references you recommend, and hunt for more material ...

One of the major rules seems to be the absence or presence of low level distortion masking similar amplitude acoustic and ambience clues in the incoming sound - IME, there is definitely a cut-off point where the processing fails, and it's quite abrupt: I've heard a system move gently between sufficient information, and insufficient information, but subjectively the transition was abrupt - the illusion collapsed.
 
OP
John Kenny
Excellent move kicking this topic off, John - I have not researched ASA at all as yet, though I may have bumped into it when looking around and not realised how relevant it was. I'll check any references you recommend, and hunt for more material ...

One of the major rules seems to be the absence or presence of low level distortion masking similar amplitude acoustic and ambience clues in the incoming sound - IME, there is definitely a cut-off point where the processing fails, and it's quite abrupt: I've heard a system move gently between sufficient information, and insufficient information, but subjectively the transition was abrupt - the illusion collapsed.
Yes, it's hugely relevant to this hobby & what is missing from most of the discussions on here. I tried to introduce some appreciation of it in most of my posts but it ended up being called "waffle" :(

Here's a good starting point which has on-line demos & videos https://auditoryneuroscience.com/scene_analysis
If you require further in-depth info about any of the topics in this link, just ask & I'll try to recommend something
 

fas42

Thanks for that, John ...
 

OP
John Kenny
Bob, I really don't have a clue what you're talking about or its relevance to the topic?
 

fas42

Yes, very interesting stuff ...
 

fas42

Done some Googling - yes, this looks like an excellent lode, Bregman is the man - I was always on the lookout for a theoretical underpinning of the sound behaviours that I was hearing, and this seems to be on the money. It is curious that this body of work had not struck loudly in my awareness up to now - better late than never!
 

Phelonious Ponk

Addicted to Fun and Learning
Joined
Feb 26, 2016
Messages
859
Likes
216
Yes, even when the vibrations fall initially on a microphone diaphragm instead of our ears, and are filtered through a reproduction system, we humans have the ability to focus on and follow the specific. And while I'm sure fidelity helps, it's certainly not a necessity. I've seen musicians point out that the chord opening that bridge is not C7, but C7 add 9...from listening to the speaker in an iPhone. That's focus. This is valuable research, though, because loss of this ability can be a symptom of general hearing loss. Maybe someday when the "rules" (are none of them understood? You didn't give us any) you refer to are known, and understood, we'll be able to help those people. I'm not at all sure about this, though:

"Without this knowledge of psychoacoustic rules, we are also stumbling around using unsophisticated measurements & I believe, incorrect concepts about the limits of audibility."

What we're measuring are the vibrations getting to the ear, not what the brain does with them. What we're measuring when establishing, so long ago, the limits of audibility, are the range of frequency and amplitude that can stimulate the eardrums enough to be recognized by the brain. I don't imagine the brain can perceptually process something the ears can't pick up. But hey, anything is possible, particularly given the "stumbling, unsophisticated measurements," of this technology with only a bit over a century of maturity.

Tim
 
OP
John Kenny
Done some Googling - yes, this looks like an excellent lode, Bregman is the man - I was always on the lookout for a theoretical underpinning of the sound behaviours that I was hearing, and this seems to be on the money. It is curious that this body of work had not struck loudly in my awareness up to now - better late than never!
Glad it has resonated :D with you
 

fas42

Just noted that his work's on ResearchGate, still producing articles - will check through the goodies ... :cool:
 

fas42

Some abstracts:

Article: Issues in the use of acoustic cues for auditory scene analysis
Albert S. Bregman
ABSTRACT: Issues concerning auditory scene analysis (ASA) raised by the previous speakers will be discussed: (1) Disorders of ASA in humans can tell us about the weighting of cues in ASA. (2) The apparent weakness of spatial cues for ASA may simply show that they interact strongly with other ASA cues (c.f., recent research in the author's lab). (3) The power of harmonic relations among partials as a grouping cue is not guaranteed, but depends on many other factors. (4) Abstract models of ASA may require the peripheral auditory system to carry out analyses that are questionable, based on current psychophysical and physiological findings. Is this where psychologists and computational ASA (CASA) modelers part company? (5) The ``old-plus-new heuristic,'' one of the most potent ASA mechanisms, is neglected by existing CASA models. (6) The different roles of bottom-up and top-down processes (e.g., in ``exclusive allocation'' of sensory evidence) should be reflected in models. (7) Should the output of a CASA system be the reconstructed signal of a single source, as a front end to a recognition system, or should grouping mechanisms merely form an interacting part of a larger system that outputs a higher-level description (e.g., a series of words)?

No preview · Article · Jan 2003 · The Journal of the Acoustical Society of America


Conference Paper: Progress in the Study of Auditory Scene Analysis
Albert S. Bregman
ABSTRACT: The early research on auditory scene analysis (ASA) - the subject of my talk at the corresponding IEEE workshop in 1995 - has been followed by many exciting studies that have opened up new directions of research. A number of them will be discussed under the following headings: (1) What is the role of attention in ASA? (2) What have we learned by using evoked potentials to study ASA? (3) To what extent has research on human babies and on non-human animals supported the idea that primitive ASA is "wired into" the brain? (4) What is the physiological basis of ASA? (5) How is "binding" carried out in the brain?
No preview · Conference Paper · Nov 2007


Article: Three directions in research on auditory scene analysis
Albert S Bregman
ABSTRACT: Research on auditory scene analysis (ASA) began with some simple laboratory phenomena such as streaming and illusory continuity. Subsequently, research has gone in three directions, downwards toward underlying mechanisms (by neurophysiologists), upwards toward system organization (by computer scientists), and sideways toward other species (by neurobiologists). Each direction has its problems. The downward approach sometimes takes a phenomenon-oriented view of ASA, leading to simple explanations of a single ASA demonstration, such as streaming, with no obvious connection to any larger system. Research done by the upward approach usually takes the form of a computer program to achieve ASA in a working system, often ignoring known facts about human ASA, in favor of mathematically understood principles. The sideways approach often finds that non-human animals can respond to an important sound despite the presence of other interfering sounds. However, there is no reason to believe that a frog, a fish, and a human accomplish this by means of the same mechanisms. So finding out how some animal does this, while interesting in its own right, may shed little light on how humans do it. I will describe some properties of the human ASA system that should be borne in mind when manufacturing explanations.
No preview · Article · May 2013 · The Journal of the Acoustical Society of America
 

NorthSky

Major Contributor
Joined
Feb 28, 2016
Messages
4,998
Likes
945
Location
Canada West Coast/Vancouver Island/Victoria area
Bob, I really don't have a clue what you're talking about or its relevance to the topic?

I deleted my post. It was too ambiguous, I concede.
 
OP
John Kenny
Yes, even when the vibrations fall initially on a microphone diaphragm instead of our ears, and are filtered through a reproduction system, we humans have the ability to focus on and follow the specific. And while I'm sure fidelity helps, it's certainly not a necessity. I've seen musicians point out that the chord opening that bridge is not C7, but C7 add 9...from listening to the speaker in an iPhone. That's focus. This is valuable research, though, because loss of this ability can be a symptom of general hearing loss. Maybe someday when the "rules" (are none of them understood? You didn't give us any) you refer to are known, and understood, we'll be able to help those people.
I stated a "set of rules" for simplicity - it's far more complex than that really. It's easier to talk about visual perception to explain what I meant, because we know a lot more about the workings of visual perception than of auditory perception. Essentially, we analyse the visual signal into visual features - it's split into 3 aspects: spots and edges, colors and shapes, movements and textures. These are all attributes that are not in themselves objects, but in combination they can define the objects we see.

So let's be clear here: we are not seeing an object, we are seeing these features from the visual scene. There is no signal in the brain that is the visual object. So we can see why perception is considered not reality but our creation. These features are the building blocks of perception. At a higher level in the brain, these 3 parallel analyses of features can be recombined in various ways to form visual objects or aspects of visual objects, i.e. the objects we perceive. If you want to read more about this you'll find a good source here.

Now, auditory processing works in similar ways but it's a little more hazy - we split the signals into various features (frequency analysis banks, amplitude modulation banks & temporal analysis) but it gets a lot more complicated than this.
I'm not at all sure about this, though:

"Without this knowledge of psychoacoustic rules, we are also stumbling around using unsophisticated measurements & I believe, incorrect concepts about the limits of audibility."

What we're measuring are the vibrations getting to the ear, not what the brain does with them.
Yes, that's signal analysis & acoustics
What we're measuring when establishing, so long ago, the limits of audibility, are the range of frequency and amplitude that can stimulate the eardrums enough to be recognized by the brain. I don't imagine the brain can perceptually process something the ears can't pick up. But hey, anything is possible, particularly given the "stumbling, unsophisticated measurements," of this technology with only a bit over a century of maturity.

Tim
Well, this is where it gets complicated, because we actually process the signals coming from the two ears & extract more than it may seem possible to. As a simple demonstration, the missing fundamental gives an idea of what's going on: if we hear the harmonics of a fundamental tone while the fundamental itself is actually absent from the signal, we still hear (perceive) the fundamental.
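The missing fundamental is easy to reproduce numerically: a signal built only from harmonics 2-5 of 100 Hz still repeats every 10 ms, and a simple autocorrelation - one plausible stand-in for the brain's pattern matching, not a claim about the actual neural mechanism - recovers that period. A sketch with illustrative parameters:

```python
import numpy as np

fs = 8_000                         # sample rate, Hz (illustrative)
f0 = 100                           # the "missing" fundamental, Hz
t = np.arange(0, 0.5, 1 / fs)

# Harmonics 2..5 only -- the spectrum contains no energy at 100 Hz
x = sum(np.sin(2 * np.pi * k * f0 * t) for k in range(2, 6))

# Autocorrelation still peaks at the common period 1/f0 = 10 ms (80 samples)
ac = np.correlate(x, x, mode="full")[len(x) - 1:]
candidate_lags = np.arange(20, 120)           # search roughly 67..400 Hz
period = int(candidate_lags[np.argmax(ac[candidate_lags])])
perceived_pitch = fs / period                 # -> 100.0 Hz
print(perceived_pitch)
```

Even though a frequency analysis of `x` shows nothing at 100 Hz, the periodicity analysis lands exactly on the absent fundamental - which is one story for why the pitch is heard.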

Also, let me state that music signals are far different to the tones & noise used for establishing limits of audibility. Why? Because music involves patterns, & our brains are pattern-matching engines par excellence - correlation engines. We can do a helluva lot more with correlation & pattern matching than with simple noise - for instance, by comparing repeating patterns we can tell when timing is off at a far greater sensitivity than in the audibility test that involves the timing separation of clicks.
Here's what Rob Watts says which is food for thought:
I like to think of the resolution problem via the 16 bit 44.1k standard - the ear's performance is pretty much the same as CD: 96 dB dynamic range, similar bandwidth. But with CD you can encode information that is much smaller than the 16 bit quantised level. Take a look at this FFT where we have a -144 dB signal encoded with 16 bits:

[FFT plot omitted: a -144 dB tone encoded in 16-bit data, clearly visible above the FFT noise floor]

So here we have a -144 dB signal in 16 bit data - the signal is 256 times smaller than the 16 bit resolution. So even though each quantised level is only at -96 dB, using an FFT it's possible to see the -144 dB signal. Now the brain probably uses correlation routines to separate sounds out - and the thing about correlation routines is that one can resolve signals that are well below the resolution of the system. So it is possible that small errors - which the ear can't resolve on its own - become much more important when they interfere with the brain's processing of the ear data. This is my explanation for why I have often reliably heard errors that are well below the threshold of hearing but nonetheless become audibly significant - because these errors interfere with the brain's processing of ear data - a process of which science is ignorant.
These are difficult concepts to get our heads around particularly when we experience vision & hearing on a daily basis & it seems effortless. But this is the research that is going on into how our perceptions work & it's not waffle, by any means!!
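Watts's sub-LSB point can be checked numerically. With TPDF dither (an assumption on my part - he doesn't specify his setup), a tone at -144 dBFS survives 16-bit quantisation and is pulled out of the noise by the FFT's processing gain. A rough sketch, all parameters invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)
N = 1 << 22                         # 4M-point FFT: ~63 dB of processing gain
lsb = 2.0 ** -15                    # one 16-bit step on a +-1.0 scale
amp = 10 ** (-144 / 20)             # -144 dBFS tone, ~1/500 of one LSB
k = 12_345                          # place the tone exactly on FFT bin k

n = np.arange(N)
tone = amp * np.sin(2 * np.pi * k * n / N)

# TPDF dither (+-1 LSB) before rounding to 16-bit levels -- without dither,
# a tone this far below one step would simply vanish in quantisation
dither = (rng.random(N) - rng.random(N)) * lsb
q = np.round((tone + dither) / lsb) * lsb

spectrum = np.abs(np.fft.rfft(q)) * 2 / N     # one-sided amplitude spectrum
level_db = 20 * np.log10(spectrum[k])
print(level_db)                               # about -144 dB: the tone survives
```

The recovered bin sits around -144 dB, roughly 48 dB below the -96 dB quantisation step, because the FFT correlates against each frequency over millions of samples - the same processing-gain argument Watts applies, speculatively, to the brain.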
 

Phelonious Ponk

That's all very interesting, John, but the brain can still only perceive what the ears are sensitive enough to pick up. I'm familiar with the missing fundamental. In the context of a musical progression, where the brain expects the next fundamental in that progression to be X, you can play a chord that doesn't actually contain the fundamental, and the fundamental is implied. That's not the same as the fundamental actually being there. It's not, and that's why it's not measurable; the phenomenon says nothing about what cannot be measured but can be perceived. The fundamental is not there; we do not "hear" it. It is implied and we fill in the blank from our experience and expectations.

This is fine research, not waffling. Enjoy. But I doubt it has anything to do with the complete measurement of the audio created by components and systems. That audio can only deliver to the ear its best representation of the recording. If you want that implied fundamental to work from a hifi perspective, what you need to do is deliver the notes of the chord around that missing fundamental, the stuff that is audible and gives the brain the material it needs to work with. Then the brain will fill in the blank.

To that end, your most productive path would be to acquire the tools of your trade and measure your products as thoroughly as you know how during testing and development. Speculating on the metaphysical impact of the inaudible may be a good hobby, though.

Tim
 
OP
John Kenny
That's all very interesting, John, but the brain can still only perceive what the ears are sensitive enough to pick up. I'm familiar with the missing fundamental. In the context of a musical progression, where the brain expects the next fundamental in that progression to be X, you can play a chord that doesn't actually contain the fundamental, and the fundamental is implied. That's not the same as the fundamental actually being there.
Tim, the point is what our perception picks up.
Your point is: if the eardrum doesn't vibrate, then there's no signal to process. But the eardrum vibrates down to just above the level of Brownian motion & we sense this vibration. What I think you are confusing is the concept of what is considered audible vs. what vibrates at the eardrum.
It's not, and that's why it's not measurable; the phenomenon says nothing about what cannot be measured but can be perceived. The fundamental is not there; we do not "hear" it. It is implied and we fill in the blank from our experience and expectations.
But again, the point is we construct what we perceive - the auditory signals are just the foundations, but the final constructed building is not the foundations.
If the fundamental was actually there, would it sound different or the same? If it would sound the same, then it is exactly "the same as the fundamental actually being there" - the perception is the same with or without the signal present.
This is fine research, not waffling. Enjoy. But I doubt it has anything to do with the complete measurement of the audio created by components and systems. That audio can only deliver to the ear its best representation of the recording. If you want that implied fundamental to work from a hifi perspective, what you need to do is deliver the notes of the chord around that missing fundamental, the stuff that is audible and gives the brain the material it needs to work with. Then the brain will fill in the blank. To that end, your most productive path would be to acquire the tools of your trade and measure your products as thoroughly as you know how during testing and development. Speculating on the metaphysical impact of the inaudible may be a good hobby, though.

Tim
Sure, the missing fundamental is an easy example, so we know what to measure to ensure it is correctly recreated - but there are lots of other processing tricks in use that we don't know about, so we don't know what/where to measure & to what level.

So let's say we didn't know the missing-fundamental trick & how it works, & we measured our waveform. Say we find a slight discrepancy in the 5th harmonic, but it is considered of no consequence because it is below audibility (based on listening tests with single test tones) - yet we find the perception of the fundamental is affected, skewed in some audible way. Not knowing where to focus, we see nothing wrong with the measured audio signal - all measurements are fine in the area above "audibility" - so we conclude it's a delusion, right? This is where I'm saying we are with current measurements - we don't know where to look & we are using old audibility thresholds which need to be re-evaluated.
 

Phelonious Ponk

Tim, the point is what our perception picks up ... but again, the point is we construct what we perceive - the auditory signals are just the foundations but the building is far different to the foundations ... you seem to have overlooked the FFT where, with a resolution of 96dB, we can still analyse down to 144dB ... Watts states that the brain works in a similar manner, but it's too late here to formulate my reply, must sleep

My point was...well, you made my point. "the auditory signals are just the foundations..." If the auditory signals are not picked up by the ears, they won't reach the brain. How can the brain perceive what it is completely unaware of? Now we can have a conversation about the limits of human audibility, and you can point me to research that shows humans hearing beyond the known limits, but this research doesn't address that. It is about what the brain does with what the ears deliver.

Tim
 
OP
John Kenny
My point was...well, you made my point. "the auditory signals are just the foundations..." If the auditory signals are not picked up by the ears, they won't reach the brain. How can the brain perceive what it is completely unaware of?

Tim
Tim, I just updated/changed my previous reply to your post - it was late last night when I was replying & I didn't have the energy for a full reply. Yes, I understand your point & I believe I've answered it in my update?
 