
Relationships between physical sound, auditory sound perception, and music perception

j_j

Major Contributor
Audio Luminary
Technical Expert
Joined
Oct 10, 2017
Messages
2,282
Likes
4,788
Location
My kitchen or my listening room.
If I can offer a couple of slide decks/talks here:

First, on context, environment, etc.

http://www.aes-media.org/sections/pnw/pnwrecaps/2013/apr_jj/ Pay particular attention to the discussion around slide 14 of http://www.aes-media.org/sections/pnw/ppt/jj/heyser.pptx When anything (context, the wind in the willows, random chance, a fly outside the window, you name it) changes your focus, you will have a different experience with the same physical stimulus.

Then, on how the ear turns sound into neural impulses. I'm afraid there is no actual recording of this talk. There ought to be, but there isn't. Sorry.

But this is the slide deck, and warning, yes, it is a touch dated, it's 10 years old. http://www.aes-media.org/sections/pnw/ppt/jj/hearingtutorialv1.pptx
 

HuskerDu

Member
Joined
Mar 27, 2019
Messages
62
Likes
44
Location
Houston
When I was younger, a friend who was an orchestral percussionist told me that auditions for positions in orchestras are often done blinded (behind screens) so that the performance of candidates can be evaluated without the biases resulting from knowing who the candidate is, their appearance, gender, etc.

Ya know @Phronesis, that's an excellent point. I'd forgotten about that. I saw a similar write-up, probably traceable to Kahneman...

We now know all we need to know: professional musicians listening to LIVE music can't make reliable side-by-side judgements without intervention... Arguably the rest of us are just coping with audio-OCD.

As it is, I'm thinking about how people built airplanes before the Wright brothers figured out how to use a wind tunnel. Audible glare indeed!

No wonder there's so much dogma about different classes of amplifiers. No wonder audio reproduction is essentially unchanged since its invention. No wonder the metaphors to describe sound are full of references to visual phenomena.

Maybe we go back and start with what we can measure, in addition to how well a DAC can produce an analog test tone from its data. I think I have speakers, for example, that can't produce anything above 12 kHz... That should be measurable, albeit unsatisfactory by itself... My doctor knows exactly how much damage I've done with pheasant hunting. (To myself I mean... The pheasants were generally unharmed.) Maybe those two facts (about sound frequency, not bird hunting) would be relevant ...to each other, at least. Alas, even more reading ahead of me... (I'm thinking the Stones for background tunes. You can't always get what you want...)

Heck, I bet all versions of that recording were engineered to sound good on AM radio. I'd be surprised if its frequency range extended much beyond 35 Hz to 10 kHz... I'll bet it's possible to look that up for virtually all commercial recording practices, genre by genre and era by era... First pressing my... skepticism is aroused.
 


HuskerDu

Member
Joined
Mar 27, 2019
Messages
62
Likes
44
Location
Houston
...But this is the slide deck, and warning, yes, it is a touch dated, it's 10 years old. http://www.aes-media.org/sections/pnw/ppt/jj/hearingtutorialv1.pptx

THAT. IS. AWESOME!
Thank you!

I love this slide:
"HRTF’s vs. Stereo
•A stereo signal, sent to two speakers at symmetric angles (let’s use the standard setup), sends two signals to each ear.
•If the signals are duplicated in the two channels, i.e. center signal, the two HRTF’s interfere
–This means you have a dip in the actual frequency response in the midrange for center images."

My brain hurts now.

I was trying to figure out what "ERB" is when I found one of the papers JJ's slide deck is referencing... Neat! (@j_j are you the JJ in the slides?)

And for the love of Pete, what is ERB? As in, "...a 1 ERB wide Gaussian pulse will have a delay sensitivity (binaurally) around 2 samples at 44K..."

(Um, I know what those words mean separately... But what's ERB? Exceptionally robust bocabulary?)

Whatever ERB is, doesn't this suggest that sampling rates get kinda important in designing cochlear implants? (I'm kidding! Please don't answer sampling rates! ;-)
 

RayDunzl

Grand Contributor
Central Scrutinizer
Joined
Mar 9, 2016
Messages
13,250
Likes
17,186
Location
Riverview FL
And for the love of Pete, what is ERB?

Equivalent rectangular bandwidth

https://en.wikipedia.org/wiki/Equivalent_rectangular_bandwidth

From the REW Manual - https://www.roomeqwizard.com/help/help_en-GB/html/graph.html
"ERB smoothing uses a variable smoothing bandwidth that corresponds to the ear's Equivalent Rectangular Bandwidth, which is (107.77f + 24.673) Hz, where f is in kHz. At low frequencies this gives heavy smoothing, about 1 octave at 50Hz, 1/2 octave at 100 Hz, 1/3 octave at 200 Hz then levelling out to approximately 1/6 octave above 1 kHz."

REW example:

Same measurement, unsmoothed, 1/6 octave smoothing, and ERB smoothing (offset 10 and 20dB to separate the traces)

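The quoted REW formula can be sanity-checked in a few lines. This is only an illustrative sketch (the function names are mine, not REW's): it evaluates the ERB expression from the manual and converts the bandwidth at a given centre frequency into its fractional-octave equivalent.

```python
import math

def erb_hz(f_khz):
    """ERB in Hz, per the formula quoted in the REW manual (f in kHz)."""
    return 107.77 * f_khz + 24.673

def fractional_octave(f_hz, bw_hz):
    """Express a bandwidth bw_hz centred on f_hz as a fraction of an octave.

    Solves f * 2**(x/2) - f * 2**(-x/2) = bw_hz for x.
    """
    r = bw_hz / f_hz
    return 2 * math.log2((r + math.sqrt(r * r + 4)) / 2)

for f in (50, 100, 200, 1000, 4000):
    bw = erb_hz(f / 1000)
    print(f"{f} Hz: ERB = {bw:.1f} Hz = 1/{1 / fractional_octave(f, bw):.1f} octave")
```

Running it reproduces the manual's description: roughly an octave at 50 Hz, 1/2 octave at 100 Hz, 1/3 octave at 200 Hz, levelling out to about 1/6 octave by a few kHz.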
 

HuskerDu

Member
Joined
Mar 27, 2019
Messages
62
Likes
44
Location
Houston
Equivalent rectangular bandwidth

https://en.wikipedia.org/wiki/Equivalent_rectangular_bandwidth

From the REW Manual - https://www.roomeqwizard.com/help/help_en-GB/html/graph.html
"ERB smoothing uses a variable smoothing bandwidth that corresponds to the ear's Equivalent Rectangular Bandwidth, which is (107.77f + 24.673) Hz, where f is in kHz. At low frequencies this gives heavy smoothing, about 1 octave at 50Hz, 1/2 octave at 100 Hz, 1/3 octave at 200 Hz then levelling out to approximately 1/6 octave above 1 kHz."

REW example:

Same measurement, unsmoothed, 1/6 octave smoothing, and ERB smoothing (offset 10 and 20dB to separate the traces)

Okay. I just want y'all to know that I'm having to read this stuff out loud. Slowly. And it's not helping. Yet. UMmmmmmmmmmmmmmm...
 

HuskerDu

Member
Joined
Mar 27, 2019
Messages
62
Likes
44
Location
Houston
A little sleep goes a long way! I think I'm sorta starting to get it this morning. Maybe. I love learning from engineers. Usually purpose-specific and focused... Sort of the pursuit of the perfect tool.

This kinda reminds me of Day One in calculus class. A series of rectangles used to approximate the area under a curve? I feel like I'm about to see a bunch of connections I've never noticed before. Who knew hankering for decent speakers would be so integrative!?!?

From another angle, I did know about the tiny hair cells, but not the second set, nor the stretching. When an inner hair cell gets permanently bent over but does not break off (as a result of too-loud noise), I believe the result is an always-on tone (tinnitus).

Now I see how the tone steps are kinda like the wind tunnel, but I don't see how a complex noise (like a car crash or thunder) arises from the steps in the cochlea plus the timing differences. Also, I don't see why standing waves in a room are relevant, if our brains listen only to the leading edge and successfully shut out the rest. But I do start to see how harmonic noise would be distinct from, say, the same orchestra tuning up, or again a shattering window.

But then why do we graph this multiple simultaneous wave front as a 2-D squiggle, with "time" simplistically shown as the (one) x-axis? And how do we know whether the recording studio used a center mic? Why does anybody accept stereo playback?

Do microphones record in these distinct tone steps all at once? Wouldn't the decoder need to know precisely how the recording was encoded in the first place? Wouldn't that tend to produce a microphone that looks and acts like a cochlea?

If this stuff is all in the compression algo, why not simply read the DAC's software and be done? Why wouldn't we simply converge to a de facto standard for en/decoding, rather like HTML or Ethernet or compiled languages--whatever is correct for the abstraction layer we need? Assuming this is all already done, or I'm asking the wrong questions, what are the main barriers to grading the solutions in terms of compliance and non-compliance?

I love puzzles. Did I mention my brain hurts?
 

Cosmik

Major Contributor
Joined
Apr 24, 2016
Messages
3,075
Likes
2,180
Location
UK
Did I mention my brain hurts?
I sympathise. And I would suggest forgetting it all for now - or forever. I would start off by assuming that, to a first approximation, the job of an audio system is to make a speaker cone vibrate in such a way that it reproduces the air pressure vibrations that fell on a microphone. Notice there is no mention of sine waves, harmonics, or standing waves.

I think that you would be hard pressed to find such a simple statement within a forum such as this. The experts aren't even just elaborating on the above, or dwelling on the inevitable caveats and real world difficulties with it; they are starting from a completely different place. Woe betide any scientifically-minded layman who strays into the hobby, because they'll soon lose all sight of what it's about.

The challenge should be: explain audio to someone without reference to sine waves or the frequency domain.

The main problem with that challenge is that it can be done in a sentence so it won't impress anyone.
 

RayDunzl

Grand Contributor
Central Scrutinizer
Joined
Mar 9, 2016
Messages
13,250
Likes
17,186
Location
Riverview FL
Do microphones record in these distinct tone steps all at once?

Sound in air is one or more longitudinal waves.

http://resource.isvr.soton.ac.uk/spcg/tutorial/tutorial/Tutorial_files/Web-basics-show.htm

A microphone converts air pressure to a varying electrical voltage.

Graph the voltage (or pressure) over time to get the squiggly line.

But then why do we graph this multiple simultaneous wave front as a 2-D squiggle, with "time" simplistically shown as the (one) x-axis...?

Multiple waves (frequencies and amplitudes) sum to a single pressure value at the microphone.

http://resource.isvr.soton.ac.uk/spcg/tutorial/tutorial/Tutorial_files/Web-inter-superp.htm
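The summing can be sketched in a few lines: however many tones are in the air, the microphone sees a single pressure value at each instant, and sampling that over time gives the squiggle. A toy example (the amplitudes and frequencies are chosen arbitrarily for illustration):

```python
import math

SAMPLE_RATE = 44100  # samples per second

def pressure(t):
    """Two tones superposed: the sum is one pressure value per instant."""
    return 0.6 * math.sin(2 * math.pi * 440 * t) + 0.4 * math.sin(2 * math.pi * 1000 * t)

# Sampling the summed pressure over time yields the 2-D squiggle:
samples = [pressure(n / SAMPLE_RATE) for n in range(441)]  # 10 ms of audio
```

A Fourier transform of `samples` would recover the two component frequencies, which is why the frequency-domain view and the time-domain squiggle are two pictures of the same data.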
 

HuskerDu

Member
Joined
Mar 27, 2019
Messages
62
Likes
44
Location
Houston
Multiple waves (frequencies and amplitudes) sum to a single pressure value at the microphone.

This sentence does seem to approach @Cosmik's prime directive...

But, but, but...

I feel like I was asked to sum up life on Earth, and I've got it down to a sentence or two, but I now realize I forgot to consider sea life... But the approximation is still pretty good, if not particularly useful... (Huh?)

Basically, I'm stunned that a microphone works at all... Likewise, that a stylus in a teensy vinyl groove is sufficient as the 3rd or 4th generation reproduction... Um, also speakers... It's like trying to explain bumblebee flight. (That, too, was a steep learning curve, just so ya know...)

It "sounds" so rich, it's just hard to believe, for example, that the very specifically distinctive back-tone that makes Judy Garland's singing voice, well, distinct... That it's just a coefficient on a side channel in the sum of the squiggles... But if it is, then... If we record each thing in isolation and sum them up... (Aarggh! Gak! Thwpttt!)

But I do love to learn. Thanks. I think.
 

edechamps

Addicted to Fun and Learning
Forum Donor
Joined
Nov 21, 2018
Messages
910
Likes
3,621
Location
London, United Kingdom
THAT. IS. AWESOME!
I love this slide:
"HRTF’s vs. Stereo
•A stereo signal, sent to two speakers at symmetric angles (let’s use the standard setup), sends two signals to each ear.
•If the signals are duplicated in the two channels, i.e. center signal, the two HRTF’s interfere
–This means you have a dip in the actual frequency response in the midrange for center images."

If you want more information about this particular phenomenon, there is also this paper which shows that voice is more intelligible with a real center speaker than with a phantom stereo image, for precisely that reason.
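A very crude model of why the phantom centre dips: each ear receives the same signal twice, once from each speaker, with a small path-length delay between the two arrivals. Ignoring the HRTF magnitudes entirely and keeping only that delay (the 270 µs figure below is my assumption for the extra path from the far speaker, not a value from the slides), the same-ear sum behaves like a comb filter:

```python
import cmath
import math

def two_arrival_magnitude(f_hz, delay_s=270e-6):
    """|1 + exp(-j*2*pi*f*d)|: magnitude of two equal arrivals offset by d."""
    return abs(1 + cmath.exp(-2j * math.pi * f_hz * delay_s))

# Flat doubling at DC; first null where the delay is half a period:
null_hz = 1 / (2 * 270e-6)  # about 1852 Hz, i.e. right in the midrange
```

Real HRTFs smear this out (the two arrivals differ in level, and head shadowing is frequency dependent), so what survives is a broad midrange dip rather than a hard null, which is the effect the slide and the linked paper describe.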
 

HuskerDu

Member
Joined
Mar 27, 2019
Messages
62
Likes
44
Location
Houston
... after letting this rest in the back of my mind for a while, I think my expectations have been wildly unreasonable. Moving magnets record sounds with amazing accuracy and detail. Silver halide film records light with amazing accuracy and detail. These phenomena are just very cool.

I'm glad you guys are here, because I can't think of another source that would have allowed me to glimpse the enormity of what must be going on inside and outside my head as I'm sitting here looking at the trees out the window and listening to Blue Man Group. Yet I'm completely unaware of any kind of effort or process, apart from seeing and hearing.

And the wave mechanics must be some mind blowing math... Yet still really primitive. I ran across an article explaining that there is still no math to model rogue waves (in the ocean) and indeed the folks who would study such things didn't really believe such waves (I think 90 foot swells in the open ocean) were even possible until the last couple of decades.

By the way, the little Klipsch Powergate amp seems to be a very happy fit with the efficient Zu Dirty Weekend speakers. The result is crisp. The big magnets stop on a dime and start again instantly. The tweeters seem to have added another 10 kHz to the top of what I've been used to from decent popular Bluetooth speakers.
 

j_j

Major Contributor
Audio Luminary
Technical Expert
Joined
Oct 10, 2017
Messages
2,282
Likes
4,788
Location
My kitchen or my listening room.
THAT. IS. AWESOME!
Thank you!

I love this slide:
"HRTF’s vs. Stereo
•A stereo signal, sent to two speakers at symmetric angles (let’s use the standard setup), sends two signals to each ear.
•If the signals are duplicated in the two channels, i.e. center signal, the two HRTF’s interfere
–This means you have a dip in the actual frequency response in the midrange for center images."

My brain hurts now.

I was trying to figure out what "ERB" is when I found one of the papers JJ's slide deck is referencing... Neat! (@j_j are you the JJ in the slides?)

And for the love of Pete, what is ERB? As in, "...a 1 ERB wide Gaussian pulse will have a delay sensitivity (binaurally) around 2 samples at 44K..."

(Um, I know what those words mean separately... But what's ERB? Exceptionally robust bocabulary?)

Whatever ERB is, doesn't this suggest that sampling rates get kinda important in designing cochlear implants? (I'm kidding! Please don't answer sampling rates! ;-)


ERB ::= Equivalent Rectangular Bandwidth. It's an estimate of the bandwidth of a cochlear filter. You can think of it as the modern version of a Critical Bandwidth.
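To put numbers on that: alongside the bandwidth formula, Glasberg and Moore (1990) also published an ERB-rate scale, which counts how many ERB-wide cochlear filters fit below a given frequency. A small sketch (the function name is mine; the formula is the standard published one):

```python
import math

def erb_number(f_hz):
    """Glasberg & Moore (1990) ERB-rate scale: number of ERBs below f_hz."""
    return 21.4 * math.log10(4.37e-3 * f_hz + 1)
```

It comes out to roughly 16 ERBs below 1 kHz and around 42 across the whole audible band, which is one way to see why ERB-spaced filter banks in hearing models use a few dozen bands.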
 