
Auditory Scene Analysis

OP
John Kenny - Addicted to Fun and Learning
Joined: Mar 25, 2016 | Messages: 568 | Likes: 18
Regarding measurements & ASA streams

I believe the cues we use to analyse sound into objects/streams are spatial, spectral & temporal. We analyse the relationships between these cues &, as a result, form a perception of an auditory object. The auditory object is the end result of our analysis - the lumping together of certain spatial, spectral & temporal cues in the signal that our processing determines belong together & are emanating from the same source.

Streaming is this ongoing analysis through time, which continually groups these same auditory-object cues together even as the object changes location, changes spectrally or changes temporally.

Now this is happening for all auditory objects in the soundfield - so we have multiple auditory objects & multiple auditory streams that we are able to switch our attention between.

So, going back to the ideas above: the perception of naturalness in auditory streams comes from there being no fluctuation in the stream percept - the streams all remain firmly defined throughout playback. A disturbance to a stream does not have to be a big change in its spatial, spectral or temporal aspect, nor does it have to last long (it can be very short). It's difficult to establish the minimum change in any one of these aspects that disturbs an auditory stream.

So, my view on measurements is that:
- firstly we need to be using test signals that are music
- we need to use techniques which will show any difference in these aspects throughout the whole music track
- we are not sure what depth we need to measure to in order to reveal the aspects above

So, I believe that techniques like Audio DiffMaker & Bibo01's/Tom's analysis techniques are a step in the right direction.
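A difference ("null") test of the Audio DiffMaker kind can be sketched in a few lines - align two captures of the same track, match their gain, subtract, and report the residual level. This is only an illustrative sketch, not the tool's actual algorithm; the cross-correlation alignment and the epsilon guard are my assumptions:

```python
import numpy as np

def residual_db(a, b):
    """Time-align b to a, match gain, subtract, and return the residual
    level in dB relative to the reference signal a."""
    n = min(len(a), len(b))
    a, b = a[:n].astype(float), b[:n].astype(float)
    # coarse alignment: lag that maximises the cross-correlation
    lag = int(np.argmax(np.correlate(a, b, mode="full"))) - (n - 1)
    if lag > 0:
        a, b = a[lag:], b[:n - lag]
    elif lag < 0:
        a, b = a[:n + lag], b[-lag:]
    # least-squares gain match before subtracting
    gain = np.dot(a, b) / np.dot(b, b)
    resid = a - gain * b
    # tiny epsilon keeps a perfect null finite
    return 10 * np.log10((np.sum(resid**2) + 1e-12) / np.sum(a**2))

t = np.linspace(0, 1, 44100, endpoint=False)
sig = np.sin(2 * np.pi * 440 * t)
print(residual_db(sig, 0.5 * sig))  # identical up to gain: a very deep null
```

The open question raised above is what residual depth actually matters for stream perception, rather than for the classic tone-based audibility thresholds.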
 
OP
John Kenny
You're probably seeing the same sort of problems as I am - you would need to physically move the listener between two separate rooms, as an example. However, I have spent many years hearing a system slip between the two states, back and forth, simply by altering a single variable in the environment - so, to some degree, it is doable.

The interesting aspect is that the streams separate or merge quite distinctly, depending on that quality level. So the testing idea would be along the lines of having two streams at various levels of "closeness in characteristics" - the measure would be the "closeness index" at which the ear could no longer segregate them.
Right, you may well have trained yourself to be sensitive to these issues, but Joe Audiophile has not had this training.
It might be interesting if you could post some examples of audio in which you hear these differences, & even let people listen to them blind before you reveal what you are hearing?
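The "closeness index" idea maps neatly onto the classic ABA- triplet paradigm from the streaming literature, where the frequency separation between two interleaved tones is the knob that is swept until the listener can no longer segregate them. A minimal stimulus generator (all parameter values are illustrative assumptions):

```python
import numpy as np

def aba_sequence(f_a=500.0, semitone_sep=6, tone_ms=100, n_triplets=10, sr=44100):
    """Build an A-B-A-(silence) tone sequence; semitone_sep is the
    'closeness' knob a listening test would sweep."""
    f_b = f_a * 2 ** (semitone_sep / 12)   # B tone, sep semitones above A
    n = int(sr * tone_ms / 1000)
    t = np.arange(n) / sr
    # 1 ms linear on/off ramps to avoid clicks at tone edges
    ramp = np.minimum(1, np.minimum(t, t[::-1]) * 1000)
    tone = lambda f: np.sin(2 * np.pi * f * t) * ramp
    silence = np.zeros(n)
    triplet = np.concatenate([tone(f_a), tone(f_b), tone(f_a), silence])
    return np.tile(triplet, n_triplets)

seq = aba_sequence(semitone_sep=6)  # ~6 semitones: near the segregation boundary
```

Small separations are heard as one galloping stream; large ones split into two independent streams - so sweeping `semitone_sep` directly measures the index at which segregation fails.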
 

fas42 - Major Contributor
Joined: Mar 21, 2016 | Messages: 2,818 | Likes: 191 | Location: Australia
Joe Audiophile may have problems, but not the ordinary consumer, methinks! Bev has never had any problem picking the difference, and neither did I have any training beforehand, when I first clearly "heard the streams".

With the measurements, it wouldn't be a "trick" test - in essence it would be a more sophisticated version of those simple tests you first pointed to: a software slider that adjusts the "degree of separation", with the aim of finding the number at which the sounds stop being distinct. The purpose is not to test the listener but the system whose competence he wants to assess.
 
OP
John Kenny
This is a paper that deals with one of the "auditory cues" responsible for streaming that I mentioned above: "Temporal predictability as a grouping cue in the perception of auditory streams".

What I was saying above was that streams can be disturbed by just one aspect being out of place - in this case if the temporal regularity is disturbed - "Based on our findings, we propose that temporal regularity is a form of predictability that tends to bind tone sequences into one stream, and adding temporal jitter violates to an increasing degree the expected pattern of regularity that would otherwise serve as a binding cue."

"temporal predictability may serve as a binding cue during auditory scene analysis."
"The results in this letter demonstrate that temporal regularity is a binding cue between tone sequences, and that this breaks down to an increasing degree as one sequence becomes more irregular in time. The auditory system constantly copes with temporal irregularities in natural auditory scenes. It remains to be seen whether this influence of temporal irregularity on streaming can be captured by any of the existing computational models. However, it is clear that temporal predictability plays an important role in the perception of auditory streams, and needs to be considered for any complete understanding of auditory scene analysis."
 
OP
John Kenny
Just a bit of a diversion from papers & research, but still educational - here's a video of an outer hair cell showing how it "dances" to sound. Pity I can't embed the video here.

"Since the amplitude, and hence the mechanical energy, of airborne sounds is tiny, the cochlea mechanically amplifies the incoming vibrations. The motors which supply this mechanical amplification are the outer hair cells. Like inner hair cells, they use stretch receptors associated with the stereocilia at their tips to sense vibrations and convert them to electrical currents. But only in outer hair cells are these currents used to control length changes which parallel, and reinforce, the incoming mechanical vibration. The video below, which was recorded in the laboratory of Prof. Jonathan Ashmore , shows an isolated guinea pig outer hair cell to which a whole cell patch electrode has been attached. Through the pipette, an alternating current signal is injected, and the resulting motor response is observed under a microscope. The alternating current signal is also played to a loudspeaker, so we can hear the signal that the outer hair cell receives."
 

fas42
"Based on our findings, we propose that temporal regularity is a form of predictability that tends to bind tone sequences into one stream, and adding temporal jitter violates to an increasing degree the expected pattern of regularity that would otherwise serve as a binding cue. "

"temporal predictability may serve as a binding cue during auditory scene analysis."
"The results in this letter demonstrate that temporal regularity is a binding cue between tone sequences, and that this breaks down to an increasing degree as one sequence becomes more irregular in time. The auditory system constantly copes with temporal irregularities in natural auditory scenes. It remains to be seen whether this influence of temporal irregularity on streaming can be captured by any of the existing computational models. However, it is clear that temporal predictability plays an important role in the perception of auditory streams, and needs to be considered for any complete understanding of auditory scene analysis."
Yes, this makes 100% sense to me - if the predictability drops below a certain point, then the streams merge - and the auditory illusion is lost. The noisiness of the cues that the bottom up processing is relying on to confirm prediction has become too great, and the balance between bottom and top is lost.
 

fas42
Another good piece to read, which further helps one get one's head around many of the concepts currently being very actively studied: "Predictability effects in auditory scene analysis: a review", http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3978260/.

A salient point here is

Further evidence for the view that predictability effects on stream segregation comprise more than attentional allocation comes from a recent MMN study (Bendixen et al., 2012b). This study showed that a series of interleaved tone sequences could be disentangled solely on the basis of predictability when the sequences were presented outside the focus of attention. In contrast, listeners mostly failed to segregate the streams when attentively trying to do so. This failure during active listening makes it highly unlikely that attention could have contributed to the predictability effect during passive listening. Hence the “pre-attentive” (Sussman, 2007) auditory system appears to be equipped with a bottom-up mechanism disentangling a mixture of two sound streams solely based on the predictability of these streams.

This is saying that in certain situations one's hearing is able to register details when not paying attention - the process of actively listening actually impedes the ability to discern aspects of the sound. IME this is exactly what I've found: I often very deliberately don't listen directly to music playback when assessing quality, as it helps me get a more "correct" answer.
 

Opus111 - Addicted to Fun and Learning
Joined: Mar 2, 2016 | Messages: 666 | Likes: 38 | Location: Zhejiang
I find a similar thing - I want to take in the whole gestalt, so I'm not paying attention to any one particular thing. I want things to 'pop out' and draw my attention - this seems to me a more sensitive way of listening than, for example, moving my spotlight of attention around the audio scenery. Asking 'Is there sibilance?' or 'Is the bass tight?' is a less sensitive way of listening, IME.
 
OP
John Kenny
Yes, this extract from the paper is relevant to the above:
Whereas there is not yet consensus as to the precise underlying neuronal mechanisms and the terminology best used to describe the relevant phenomena (e.g., Näätänen et al., 2005; May and Tiitinen, 2010), it is undisputed that the auditory system effortlessly acquires information about the regular structure of the surrounding sound sources. The term “effortlessly” is meant to imply that this information comes as an inherent property of auditory sensory information processing (as opposed to being made available only by actively searching for it).

And it proceeds to state that "There is also general agreement that information about the regular characteristics of sound sources is available to the auditory system at early processing stages—within 150 ms after sound onset"
And "In fact, the predictive notion implies that such information should be available even before the onset of the next signal emitted by a given sound source (Baldeweg, 2006; Bendixen et al., 2009, 2012a)."

The early availability of information on the regular characteristics of sound sources implies that predictability could, in theory, be used as an early cue in ASA. If auditory input is indeed processed in a predictive manner, information about the predictable succession of sound events could act upon ASA processes before any other grouping cue would be able to exert its influence. This is because all other cues need at least some rudimentary analysis of stimulus input, whereas prediction-based grouping could start at or even before the expected time of stimulus arrival. Therefore, predictability could in theory be the earliest grouping cue in ASA.

Which all points to the attentional effect that Opus describes - factors in the sound which "pop out & draw my attention".

All of this may well explain the controversial experiment "Human hearing beats FFT uncertainty principle", in which both timing & frequency were discriminated up to 10 times better than the limit imposed by the Fourier uncertainty principle.

There really is no better example of the fact that auditory perceptions (indeed all our perceptions) are constructs - in this case predictive processing comparing the prediction to signals coming from the auditory cortex. And in a more general sense, comparing previously stored models with incoming signals & adjusting to alternatives as needed.
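For reference, the Fourier (Gabor) limit the experiment beat is Δt·Δf ≥ 1/(4π), where Δt and Δf are the RMS widths of a signal and of its power spectrum. A Gaussian pulse attains the bound exactly, which a quick numerical check confirms (the 10 ms pulse width is an arbitrary choice):

```python
import numpy as np

sr, dur = 44100, 1.0
t = np.arange(-dur / 2, dur / 2, 1 / sr)
sigma = 0.01                          # 10 ms Gaussian pulse (assumed width)
x = np.exp(-t**2 / (2 * sigma**2))

def rms_width(axis, power):
    """RMS width of a (power) distribution along the given axis values."""
    p = power / power.sum()
    mean = (axis * p).sum()
    return np.sqrt((((axis - mean) ** 2) * p).sum())

dt = rms_width(t, x**2)                              # temporal width
X = np.fft.fftshift(np.fft.fft(x))
f = np.fft.fftshift(np.fft.fftfreq(len(t), 1 / sr))
df = rms_width(f, np.abs(X) ** 2)                    # spectral width
print(dt * df, 1 / (4 * np.pi))  # the product sits right at the Gabor bound
```

Any listener who does better than this product on the same task cannot be doing plain spectrogram-style analysis - which is the experiment's point about non-linear, predictive processing.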

BTW, Frank, great paper - those overview papers are gold, as they give a wider summary perspective (& references) on the field.

Great work!
 

fas42
Some very recent, and relevant discussion of Mismatch negativity (MMN), http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4069482/.

A telling extract,

MMN occurs when a sound violates some regular pattern within a sequence of sounds. Repetition of the pattern leads to the formation of what has been termed a “prediction model”- a memory containing information about sound characteristics and their transitions (Winkler et al., 1996a; Winkler, 2007). The learned pattern enables the brain to infer the most likely subsequent state of brain activation to follow the present state, in other words, to form predictions about what stimulus should come next based on a dynamically updated probabilistic inference. MMN is evoked when the prediction does not match the next state encountered.

IOW, the mind sensing that something is 'wrong' with the sound, even though you consciously can't put your finger on it ...
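A toy caricature of that "prediction model": keep a running average of the standards heard so far and flag any tone that deviates from it by more than a tolerance - loosely analogous to an MMN-eliciting deviant. The sequence and the 5% tolerance are illustrative assumptions, not anything from the paper:

```python
def flag_deviants(freqs, tolerance=0.05):
    """Return indices of tones deviating more than `tolerance` (fractional)
    from the running mean of the standards heard so far."""
    deviants, total, count = [], 0.0, 0
    for i, f in enumerate(freqs):
        if count and abs(f - total / count) / (total / count) > tolerance:
            deviants.append(i)                    # mismatch: prediction violated
        else:
            total, count = total + f, count + 1   # standard: update the model
    return deviants

# standard 1000 Hz tones with one 1200 Hz deviant
print(flag_deviants([1000, 1000, 1000, 1200, 1000, 1000]))  # → [3]
```

The real system tracks far richer regularities (transitions, rhythms, spectra), but the shape is the same: predict, compare, and react only to the mismatch.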
 

fas42
One of the cross-references in the paper above examines the level of sophistication of MMN - how it reacts to a variant in the sound, and how it deals with it by adjusting its predictability patterns:
"What controls gain in gain control? Mismatch negativity (MMN), priors and system biases"

http://link.springer.com/article/10.1007/s10548-013-0344-4
 
OP
John Kenny
And this is what I was trying to explain to Tim: it's the processing of the signal stream that is the important core issue - what we do with the signal & how this allows us to pick out & perceive elements that aren't easily measured in the signal stream, or are dismissed as below audibility - "audibility" meaning the thresholds that simple tone-based tests established decades ago. As I pointed out, an FFT doesn't really "measure" below the noise floor - rather, it processes the signal mathematically &, by averaging, raises the effective level of signals hidden within the noise floor.

A similar but much more sophisticated processing engine is at the heart of auditory processing. Why wouldn't it have the capabilities to "hear" (actually perceive) below the "thresholds" once complex signals are involved & not just tones?

Once we learn the underlying processes & what factors auditory processing considers important (& at what level), we can devise measurement techniques that use this understanding & become more sophisticated in our analysis of audio reproduction systems.
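The averaging point can be demonstrated directly: coherently averaging N synchronised frames leaves a repetitive signal untouched while uncorrelated noise power drops by a factor of N, i.e. the effective SNR rises by 10·log10(N) dB. A minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
frame = 256
# a tone roughly 29 dB below the per-sample noise power, buried in the floor
tone = 0.05 * np.sin(2 * np.pi * 8 * np.arange(frame) / frame)

def snr_db(n_frames):
    """SNR of the averaged frame, with fresh unit-variance noise per frame."""
    frames = tone + rng.normal(0.0, 1.0, (n_frames, frame))
    avg = frames.mean(axis=0)          # coherent (synchronous) average
    noise = avg - tone                 # what's left besides the tone
    return 10 * np.log10(np.sum(tone**2) / np.sum(noise**2))

print(snr_db(1))       # well below 0 dB: the tone is invisible in one frame
print(snr_db(10000))   # roughly 40 dB better: averaging recovered it
```

This is the mathematical sense in which an FFT-based analyser "reaches below" its own noise floor - and the analogy being drawn above is that auditory processing has far richer tricks of the same kind.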
 
OP
John Kenny
A related aspect of this whole predictability-&-MMN picture is how we might store models of sounds - what aspects we extract & group together as a representation of a sound object, & what aspects of this model our auditory processing system uses when comparing it with the signal stream.

Some evidence suggests that summary statistics are stored of the timing relationships that define the sound object.

From this paper "Humans Use Summary Statistics to Perceive Auditory Sequences"
"Here we provide evidence that the auditory system summarizes the temporal details of sounds using time-averaged statistics."

This has been shown for "sound textures" - "Sound Texture Perception via Statistics"
"Rainstorms, insect swarms, and galloping horses produce ‘‘sound textures’’—the collective result of many similar acoustic events. Sound textures are distinguished by temporal homogeneity, suggesting they could be recognized with time-averaged statistics."

In this paper they decomposed & statistically analysed certain sound textures & then synthesised new sounds which they assessed for realism. They found that "We then assessed the realism and recognizability of novel sounds synthesized to have matching statistics. Statistics of individual frequency channels, capturing spectral power and sparsity, generally failed to produce compelling synthetic textures; however, combining them with correlations between channels produced identifiable and natural-sounding textures. Synthesis quality declined if statistics were computed from biologically implausible auditory models. The results suggest that sound texture perception is mediated by relatively simple statistics of early auditory representations, presumably computed by downstream neural populations. The synthesis methodology offers a powerful tool for their further investigation."
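The "time-averaged statistics" idea can be caricatured as: split the signal into a few frequency bands, take each band's envelope, and keep only summary numbers - per-band means & variances plus cross-band envelope correlations. This is a crude stand-in for the far richer statistics used in the paper; the brick-wall bands and band edges are arbitrary assumptions:

```python
import numpy as np

def band_envelope_stats(x, sr=44100, edges=(100, 400, 1600, 6400)):
    """FFT-based band split, rectified envelopes, then summary statistics."""
    X = np.fft.rfft(x)
    f = np.fft.rfftfreq(len(x), 1 / sr)
    envs = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        Xb = np.where((f >= lo) & (f < hi), X, 0)  # crude brick-wall band
        band = np.fft.irfft(Xb, len(x))
        envs.append(np.abs(band))                  # rectified envelope
    envs = np.array(envs)
    means = envs.mean(axis=1)
    variances = envs.var(axis=1)
    corr = np.corrcoef(envs)           # cross-band envelope correlations
    return means, variances, corr

noise = np.random.default_rng(0).normal(size=44100)
means, variances, corr = band_envelope_stats(noise)
```

The paper's key finding maps onto exactly this structure: per-channel statistics alone were not enough for compelling synthetic textures, but adding the cross-channel correlations (the `corr` matrix here) was.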
 

fas42
One of the most clarifying aspects of this current research, for me, is that it gives a very good technical explanation of why I can listen to very "poor" historical recordings when the system is working well enough - the experienced event transcends the supposed limitations. The concept of a balance between bottom-up processing - the actual acoustic signal impinging on the ear, and being assessed - and top-down processing - the brain "expecting" something to come next, based on its memories and predictions - seems to be at the heart of this mental behaviour. If the bottom-up data begins to dominate, and fails to align with the top-down predictions to the necessary degree, then it just sounds like a scratchy old recording.
 

Dynamix - Addicted to Fun and Learning
Joined: Mar 29, 2016 | Messages: 593 | Likes: 216 | Location: Nörway
Subjectivists discussing psychoacoustics? Comedy gold.
 

fas42
So ... ASA is not relevant to audio?
 

fas42
For those who "like to watch", plenty of YouTube videos on this stuff: demos, examples, lectures and so on ...
 

fas42
A very recent article by Bregman, linking ASA and the sound world we're interested in:
"Progress in Understanding Auditory Scene Analysis", in Music Perception: An Interdisciplinary Journal, Sep 2015

http://www.jstor.org/stable/10.1525/mp.2015.33.1.12

Only the first page can be read, but it gives a flavour of his thinking ...
 

SoundAndMotion - Active Member
Joined: Mar 23, 2016 | Messages: 144 | Likes: 111 | Location: Germany
A very recent article by Bregman, linking ASA and the sound world we're interested in:
"Progress in Understanding Auditory Scene Analysis", in Music Perception: An Interdisciplinary Journal, Sep 2015

http://www.jstor.org/stable/10.1525/mp.2015.33.1.12

Only the first page can be read, but it gives a flavour of his thinking ...

From the horse's mouth (short and no slides, but...): http://www.music.mcgill.ca/bkn25/videos/Bregman.mp4
It's worth a listen.

Please don't quote just the first point from the first sentence of the abstract of a world-renowned scientist from the McGill Auditory Research Lab, who has won many awards. This is a science-oriented forum!
 

Thomas savage - Grand Contributor, The Watchman, Forum Donor
Joined: Feb 24, 2016 | Messages: 10,260 | Likes: 16,310 | Location: uk, taunton