
"Things that cannot be measured"

CMOT

Active Member
Joined
Feb 21, 2021
Messages
147
Likes
114
The question isn't whether generative graphics can put a realistic mustache on a face with scant evidence of the original, but rather whether the algorithm can put on a mustache that looks identical to the original mustache. And that's the fundamental philosophical discord in audio--the uncontrolled subjectivists want a realistic mustache, with realism defined by the picture of a mustache they have in their minds. High fidelity wants a mustache that looks as much like the original mustache as possible, even if the original photo was out of focus and poorly color-matched.

Rick “who can improve a photo, but only by risking a loss of indexical fact” Denney

That's not quite fair. One might want a realistic mustache to the best of the abilities of an algorithm for a variety of reasons that aren't "uncontrolled subjectivism" - that is, not just what "target" one has in their mind. For instance, for a really poorly done recording, removing noise might also muck with the perception of the signal - wouldn't we want the algorithm to do the best job it can reconstructing the signal given the preserved information and some model of what original signals "looked like"?

So yes, identical to the original mustache unless the mustache was badly deformed for whatever reason by the image capture, storage, and presentation considerations - then maybe some reasonable guess as to what the original was. We shouldn't be adding things for their own sake, but we might when things have been mangled... Not that this happens very often in modern recording, but it has and does...
 

rdenney

Major Contributor
Forum Donor
Joined
Dec 30, 2020
Messages
2,295
Likes
4,034
That's not quite fair. One might want a realistic mustache to the best of the abilities of an algorithm for a variety of reasons that aren't "uncontrolled subjectivism" - that is, not just what "target" one has in their mind. For instance, for a really poorly done recording, removing noise might also muck with the perception of the signal - wouldn't we want the algorithm to do the best job it can reconstructing the signal given the preserved information and some model of what original signals "looked like"?

So yes, identical to the original mustache unless the mustache was badly deformed for whatever reason by the image capture, storage, and presentation considerations - then maybe some reasonable guess as to what the original was. We shouldn't be adding things for their own sake, but we might when things have been mangled... Not that this happens very often in modern recording, but it has and does...

I wasn't making a value judgment, really, but just making a point. When it comes to photography, I fall back a lot on the meaning of the word indexical, which I actually learned from Paul Raphaelson, who also frequents this forum. The concept is that the changes in density of a negative or the pixels from a digital sensor were created by light coming from the scene at the time the photo was made. No photograph is an utterly factual representation, simply because it is static while in-person sight is anything but. Journalistic photography comes with a high expectation of indexicality even in the details of the image. One would not expect any manipulation of the image other than to correct known coloration from the digital sensor in the camera. Art photography, on the other hand, retains its definition as photography by its indexicality of form--the shapes of the subject (or subjects in the case of montages) are rendered by light from the scene striking the sensitized surface. But there are many tonal manipulations made by the photographer in making an art print, in addition to the initial artistic selection of what should be in the photo and when it should be taken.

When I decide to make a photographic print, an inkjet print, or publish the photo in a book, I have to deal with quite different constraint envelopes. I do not consider that art. I consider that craft--the art part is the initial interpretation. I think of making photos (after the initial snap of the shutter) to fall into three parts: 1.) Correction, to correct for the known errors in the capturing tools. Thus, I correct for known lens distortions, and I correct for known color casts for a scanner (and spend a lot on calibration tools to do so), and stuff like that. That is pure engineering--it's either right or it isn't. The product of this first step is a "corrected" image. 2.) Interpretation, where I do all the things that photographers do artistically, including shifting colors and tonalities, dodging, burning in, adjusting saturation, and so on with the purpose of creating the artistic result, as displayed on my calibrated monitor. The calibrated monitor is calibrated precisely for the same reason that mixing engineers want flat studio loudspeakers. 3.) Targeting, to make the adjustments necessary to maintain (to the extent possible) the artist's interpretation when displayed in various media.

I may hire a photo lab technician (or a publisher's printing technician) to perform the targeting function, in which case I expect only craft from that person, and not art. I may hire a technician to scan a negative for me and provide a corrected image file, in which case that role is craft and not art. But the interpretation comprises artistic decisions, and most photographers insist on doing that themselves. They think of that role as part of photography just as much as its fundamental indexicality. (The craft may, of course, require more training and talent than the art.)

All this discussion is aimed at identifying the boundaries of art and craft. Applying an EQ curve to correct a microphone is obviously craft. Audio playback equipment is not art, as I think most here would agree. It is craft, with the job of sustaining as much of the indexical relationship between the artist's product and the listener's ear as possible. The same is true for the mastering engineer. The mixing engineer may be the artist (or an artist), and may use a range of tools to artistically interpret the recording, but I'm usually increasingly annoyed as the mixing engineer's efforts become apparent, because they often undermine the art (same for ham-handed or overdone darkroom/editing work on a photo). The more apparent the manipulations, the more obviously the indexical relationship is lost (Autotune is just one example--there are lots of others that are much less bothersome to me).

An image-generation algorithm is guessing at what the thing it's generating looks like. No matter how well it guesses, it has no indexical relationship to the subject. It may look very good. Indeed, it may look much better than the original image from before the missing parts were removed, necessitating regeneration. But the portion of a picture that is a generated image is not a photograph.

I have old recordings from the deeps of time which have been doctored by modern technologists to make them more listenable. I don't have a problem with that. But I have to be careful about what's real and what's processing artifact. Probably, the conductor's tempos are real, and that's a big part of listening to a historical performance--to understand what the conductor was doing. But it would be difficult to draw many other conclusions from it without external knowledge of the process. So, my recording of Ralph Vaughan Williams's conducting of his own 4th Symphony, made in 1937, tells me a lot about how he interpreted the music, just from the tempi he selected. Assuming he was competent, it might even be considered a reference interpretation. But I would NOT expect that recording to give me much information about what orchestral tuba players sounded like in 1937, or even what dynamic markings in the music actually mean in performance.

I've also doctored many old photos to make them look "modern" and free of blemish. But there was a lot of non-indexical interpretation in those repairs.

I think we talk past each other quite a lot because we aren't clear about the role at each step and its responsibilities as art or craft to preserve the indexical relationship between source and playback.

Rick "with apologies--unconstrained for the moment by having an actual keyboard" Denney
 

CMOT

Active Member
Joined
Feb 21, 2021
Messages
147
Likes
114
I wasn't making a value judgment, really, but just making a point. When it comes to photography, I fall back a lot on the meaning of the word indexical, which I actually learned from Paul Raphaelson, who also frequents this forum. The concept is that the changes in density of a negative or the pixels from a digital sensor were created by light coming from the scene at the time the photo was made. No photograph is an utterly factual representation, simply because it is static while in-person sight is anything but. Journalistic photography comes with a high expectation of indexicality even in the details of the image. One would not expect any manipulation of the image other than to correct known coloration from the digital sensor in the camera. Art photography, on the other hand, retains its definition as photography by its indexicality of form--the shapes of the subject (or subjects in the case of montages) are rendered by light from the scene striking the sensitized surface. But there are many tonal manipulations made by the photographer in making an art print, in addition to the initial artistic selection of what should be in the photo and when it should be taken.

When I decide to make a photographic print, an inkjet print, or publish the photo in a book, I have to deal with quite different constraint envelopes. I do not consider that art. I consider that craft--the art part is the initial interpretation. I think of making photos (after the initial snap of the shutter) to fall into three parts: 1.) Correction, to correct for the known errors in the capturing tools. Thus, I correct for known lens distortions, and I correct for known color casts for a scanner (and spend a lot on calibration tools to do so), and stuff like that. That is pure engineering--it's either right or it isn't. The product of this first step is a "corrected" image. 2.) Interpretation, where I do all the things that photographers do artistically, including shifting colors and tonalities, dodging, burning in, adjusting saturation, and so on with the purpose of creating the artistic result, as displayed on my calibrated monitor. The calibrated monitor is calibrated precisely for the same reason that mixing engineers want flat studio loudspeakers. 3.) Targeting, to make the adjustments necessary to maintain (to the extent possible) the artist's interpretation when displayed in various media.

I may hire a photo lab technician (or a publisher's printing technician) to perform the targeting function, in which case I expect only craft from that person, and not art. I may hire a technician to scan a negative for me and provide a corrected image file, in which case that role is craft and not art. But the interpretation comprises artistic decisions, and most photographers insist on doing that themselves. They think of that role as part of photography just as much as its fundamental indexicality. (The craft may, of course, require more training and talent than the art.)

All this discussion is aimed at identifying the boundaries of art and craft. Applying an EQ curve to correct a microphone is obviously craft. Audio playback equipment is not art, as I think most here would agree. It is craft, with the job of sustaining as much of the indexical relationship between the artist's product and the listener's ear as possible. The same is true for the mastering engineer. The mixing engineer may be the artist (or an artist), and may use a range of tools to artistically interpret the recording, but I'm usually increasingly annoyed as the mixing engineer's efforts become apparent, because they often undermine the art (same for ham-handed or overdone darkroom/editing work on a photo). The more apparent the manipulations, the more obviously the indexical relationship is lost (Autotune is just one example--there are lots of others that are much less bothersome to me).

An image-generation algorithm is guessing at what the thing it's generating looks like. No matter how well it guesses, it has no indexical relationship to the subject. It may look very good. Indeed, it may look much better than the original image from before the missing parts were removed, necessitating regeneration. But the portion of a picture that is a generated image is not a photograph.

I have old recordings from the deeps of time which have been doctored by modern technologists to make them more listenable. I don't have a problem with that. But I have to be careful about what's real and what's processing artifact. Probably, the conductor's tempos are real, and that's a big part of listening to a historical performance--to understand what the conductor was doing. But it would be difficult to draw many other conclusions from it without external knowledge of the process. So, my recording of Ralph Vaughan Williams's conducting of his own 4th Symphony, made in 1937, tells me a lot about how he interpreted the music, just from the tempi he selected. Assuming he was competent, it might even be considered a reference interpretation. But I would NOT expect that recording to give me much information about what orchestral tuba players sounded like in 1937, or even what dynamic markings in the music actually mean in performance.

I've also doctored many old photos to make them look "modern" and free of blemish. But there was a lot of non-indexical interpretation in those repairs.

I think we talk past each other quite a lot because we aren't clear about the role at each step and its responsibilities as art or craft to preserve the indexical relationship between source and playback.

Rick "with apologies--unconstrained for the moment by having an actual keyboard" Denney

:)

Fair enough. In psychology/philosophy the term is called veridical. But because of the way the brain works, much of our experience is pretty non-veridical all the time. Of course, the evaluation function here isn't veridicality, but how likely we are to survive and pass on our genes.
 

audio2design

Major Contributor
Joined
Nov 29, 2020
Messages
1,769
Likes
1,842
This was really fun, a couple of dual op-amps, some caps and resistors with a pot. You could make the image go completely around your head.

You could only go around 180 degrees (practically less) with headphones, and with speakers, only from left to right within the limits of the speaker widths. Crosstalk negates any width wider than that. Signal processing launching cancellation signals can extend that but it has its own issues. You can use a 2nd set of speakers closer to the listener, and achieve better results.
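The crosstalk-cancellation idea mentioned above can be sketched numerically. This is a toy model, not a real HRTF-based canceller: it assumes the contralateral path is just a delay plus a fixed attenuation (both made-up values), and inverts the 2x2 acoustic matrix per frequency bin.

```python
import numpy as np

fs = 48000
n = 1024
freqs = np.fft.rfftfreq(n, 1 / fs)

# Toy head model (assumed numbers, not measured HRTFs): the contralateral
# path is 0.25 ms later and 6 dB quieter than the ipsilateral path.
itd = 0.25e-3
g = 10 ** (-6 / 20)
H_ii = np.ones_like(freqs, dtype=complex)       # speaker -> same-side ear
H_ci = g * np.exp(-2j * np.pi * freqs * itd)    # speaker -> opposite ear (crosstalk)

# Invert the 2x2 acoustic matrix [[H_ii, H_ci], [H_ci, H_ii]] per frequency bin
det = H_ii * H_ii - H_ci * H_ci
C_ii, C_ci = H_ii / det, -H_ci / det

# Desired binaural signal: content in the left ear only
dL = np.ones_like(freqs, dtype=complex)
dR = np.zeros_like(freqs, dtype=complex)

# Speaker feeds after cancellation, then what actually reaches the ears
sL, sR = C_ii * dL + C_ci * dR, C_ci * dL + C_ii * dR
eL, eR = H_ii * sL + H_ci * sR, H_ci * sL + H_ii * sR

print(np.max(np.abs(eL - dL)), np.max(np.abs(eR)))  # both ~0: crosstalk removed
```

The "issues" the post alludes to show up as soon as the listener moves off the assumed geometry: the inverse filter is computed for one head position, so the cancellation collapses away from the center line.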
 

audio2design

Major Contributor
Joined
Nov 29, 2020
Messages
1,769
Likes
1,842
And yep, that's all you get from 2 channels. You get a great deal more from 3 channels (LCR) and even more from 5 or 7. Past 2, the center channel is key, and is absolutely not "only for dialog".

ITD is just one aspect of sound placement, even in two channels.

A center channel does not help extend the width of the ITD related sound placement. However, with a center channel, you can place your 2 channels wider and still have a properly centered image, so it provides some ability to do that. Unfortunately adding more channels is not a panacea. It can work for simple discrete sounds, but it greatly increases cross-talk which ends up making accuracy of placement in front (assuming music) worse. It may be more pleasing, but not more accurate.
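For reference, the ITD being discussed is commonly approximated with Woodworth's spherical-head formula; the head radius and speed of sound below are conventional textbook values, not measurements from the thread.

```python
import math

def woodworth_itd(azimuth_deg, head_radius=0.0875, c=343.0):
    """Woodworth spherical-head estimate of interaural time difference, in seconds.

    head_radius (m) and c (m/s) are standard textbook defaults.
    """
    th = math.radians(azimuth_deg)
    return (head_radius / c) * (th + math.sin(th))

# ITD grows with azimuth; roughly 0.66 ms for a source at 90 degrees off-axis
for az in (0, 30, 60, 90):
    print(az, round(woodworth_itd(az) * 1e6), "us")
```

Crosstalk matters precisely because each speaker reaches both ears: the opposite-ear leakage superimposes a second, wrong-delay copy on top of these intended interaural differences.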
 

audio2design

Major Contributor
Joined
Nov 29, 2020
Messages
1,769
Likes
1,842
Scary thought!
IMO the "musicality" of a recording is entirely down to the musician and how he/she uses their instrument and voice.
If some sort of AI was used to f*ck about with it then the weirdos who talk about the musicality of their hifi would suddenly have a point but I am not sure how that would improve, rather than wreck, any of the music I personally like.


Trying to take a forward look, this was actually something a few of us bandied around a bit several years ago. How do we measure recording and mixing engineers and their environment? Sounds crazy, but what if you could? What if you could capture and quantify the nuances in how the individuals creating the art differ from those consuming the art? If you could, then you could hear the music closer to how it was really intended. Conceptually, we decided we would need AI and deep learning. As we were working through it, we realized it could be a good training tool and even allow people to "calibrate" themselves, or determine if they were in a good mindset.
 

j_j

Major Contributor
Audio Luminary
Technical Expert
Joined
Oct 10, 2017
Messages
2,293
Likes
4,820
Location
My kitchen or my listening room.
Out of honest curiosity, why?
Because, yes, I know the outcome, and it shows that you do not dare extrapolate the "click" test beyond its original stimulus. It's that simple.

Because it will be enlightening. There are many results in audio that show "phase doesn't matter" using specific stimuli for which phase does not, in fact, matter. That does not mean "phase does not matter," as can be trivially proven by using a very simple set of test signals. The trick is in figuring out WHEN phase does not matter.

There are many results, using clicks for the most part, that argue that "envelope arrival does not matter at high frequencies". Again, for some stimuli, very short ones with low energy in particular, that's pretty much true. It does not mean that envelope arrival does not matter, it means that for THAT STIMULUS it does not matter. Trying a Gaussian pulse train, say with a 100Hz rep rate (hint, that does not mean there is any 100Hz content at all, so don't go there) and testing for DL interaurally will be a great big surprise. (In headphones.) There will also be a stunning difference in loudspeakers, but there are multiple causes that confound that experiment so badly that it's really not meaningful.
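The suggested stimulus is easy to synthesize and check. The sketch below (the carrier frequency and pulse width are arbitrary illustrative choices, not values from the post) builds Gaussian-windowed tone bursts at a 100 Hz repetition rate and confirms the spectrum has essentially no energy at 100 Hz itself--the spectral lines sit around the carrier, spaced 100 Hz apart.

```python
import numpy as np

fs = 48000
dur = 1.0
t = np.arange(int(fs * dur)) / fs

# Gaussian-windowed tone bursts every 10 ms -> 100 Hz repetition rate.
# Carrier frequency and pulse width are arbitrary illustrative choices.
f_c = 4000.0       # carrier (Hz)
sigma = 0.5e-3     # pulse width (s)
x = np.zeros_like(t)
for c0 in np.arange(0.005, dur, 0.01):
    x += np.exp(-0.5 * ((t - c0) / sigma) ** 2) * np.cos(2 * np.pi * f_c * (t - c0))

spec = np.abs(np.fft.rfft(x))
freqs = np.fft.rfftfreq(len(x), 1 / fs)

at_100 = spec[np.argmin(np.abs(freqs - 100.0))]   # energy AT 100 Hz
near_4k = spec[np.argmin(np.abs(freqs - f_c))]    # spectral line near the carrier
print(at_100 / near_4k)  # tiny: the 100 Hz rep rate puts no content at 100 Hz
```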

Yes, that's why I tell you to try it for yourself. It is very unwise to extrapolate ANY test beyond the conditions in the test, without some supporting evidence that the extrapolation can be done.

That kind of extrapolation has led to many mistakes.

So try the test, instead of citing tests that may be accurate, but that can not be extrapolated to all signals in the real world.
 

j_j

Major Contributor
Audio Luminary
Technical Expert
Joined
Oct 10, 2017
Messages
2,293
Likes
4,820
Location
My kitchen or my listening room.
ITD is just one aspect of sound placement, even in two channels.

A center channel does not help extend the width of the ITD related sound placement. However, with a center channel, you can place your 2 channels wider and still have a properly centered image, so it provides some ability to do that. Unfortunately adding more channels is not a panacea. It can work for simple discrete sounds, but it greatly increases cross-talk which ends up making accuracy of placement in front (assuming music) worse. It may be more pleasing, but not more accurate.


Now that's wrong. Obviously, in addition to having more channels, one must use them properly. Sorry. Your concern about "cross-talk" is simply the result of production that's not appropriate to the channel layout. Yes, there is evidence of this. The first evidence comes in 1933 from Steinberg and Snow, which was republished as chapter 13 of the collected work of Harvey Fletcher.

As to extend width, well, that's wrong too, but more complex. One can extend the soundstage in 2-channel with proper signal construction, at least along the "sweet spot" which is a single line down the center of the listening area.

Various authors have claimed this means that multichannel has a single "sweet spot", which is trivially shown wrong with properly recorded (no processing at all) material. Said recording is NOT a co-incident recording of any kind, of course, there are a number of ways available that propose to solve this, each solving some subset of a larger problem.

Go read it. It's as conclusive today as it was then, and it completely dismisses concerns about 'crosstalk'. You can also go to thereabouts of the 199? stuff I published on "Perceptual Soundfield Reconstruction", authors Johnston and Lam, which show conclusively that the "sweet spot" is enormously enhanced when material is properly captured and presented. https://www.aes.org/e-lib/browse.cfm?elib=9136 is the AES link. There are various other citations.
 

rdenney

Major Contributor
Forum Donor
Joined
Dec 30, 2020
Messages
2,295
Likes
4,034
On the inclusion of the recording's room effects competing with the listener's room effects:

In the early days of samplers, an amateur orchestra in which I played decided to use one to produce a mallet instrument (I don't recall which--perhaps a xylophone). The problem we ran into is that the samples were recorded with room effects--reverb in particular. When we played them through a speaker at realistic loudness on stage, the sound was diffuse and muddy, because the room effects of our performance space were added on top of the reverb effects built into the samples. We had to search quite a bit to find a tone generator (and these were separate purpose-built devices at the time--I still have one made by Roland) that recorded the samples very dry--really in a very dead reference studio (an anechoic chamber would have been even better). That way, the instrument would sound like the real instrument up close, and benefit from the room effects out in the hall, as with the other real instruments in the orchestra.

The problem is that the more we try to include the sound of the original performance space, the less like our listening room it sounds, and that violates the willing suspension of disbelief. To sound like real instruments in our house, it has to be recorded dry as a bone, just for a start. The field recordings I have made that caught little or no room effects sound more realistic--in terms of sounding like the musicians are in my room--than recordings that include room effects. I still prefer to hear the performance space, because I want to transcend the room I'm sitting in to the concert hall. For rock music, though, I like the old, dead recordings, because they retain a jam-session-in-my-living-room sound. I think this all has to do with microphone placement and how much room effect (real or as a result of processing) is in the recording.
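The double-reverb effect described above can be demonstrated with a toy simulation: synthetic impulse responses made of exponentially decaying noise (a crude room model with made-up RT60s), convolved once for a dry sample in the room and twice for a "wet" sample in the room.

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 8000  # low sample rate keeps this toy fast

def room_ir(rt60, dur=1.0):
    """Crude room model: exponentially decaying white noise with a given RT60 (s)."""
    t = np.arange(int(fs * dur)) / fs
    return rng.standard_normal(len(t)) * np.exp(-6.9 * t / rt60)

def decay_time(x, floor_db=-40):
    """Time until the smoothed energy envelope falls floor_db below its peak."""
    env = np.convolve(x ** 2, np.ones(80) / 80, mode="same")
    db = 10 * np.log10(env / env.max() + 1e-30)
    return np.nonzero(db > floor_db)[0][-1] / fs

hall = room_ir(1.2)   # reverb baked into the sample (made-up RT60)
stage = room_ir(0.8)  # the performance space (made-up RT60)

dry_in_room = stage                      # a bone-dry hit played in the room
wet_in_room = np.convolve(hall, stage)   # a reverberant sample played in the room

print(decay_time(dry_in_room), decay_time(wet_in_room))  # the second rings longer
```

The stacked reverbs smear the decay well past either room alone, which is the "diffuse and muddy" result the sampler produced on stage.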

Change of topic--on the notion of a sound field:

I don't know whether this adds to the discussion, but I've said here before that what makes one tuba different from another, if there is a difference that anyone can detect, has more to do with how the sound comes to us in the room than with frequency spectrum as measured up close. We take this as a given with loudspeakers--the sound we hear isn't just the direct sound, but early reflections and room effects also. An instrument that sounds like it's "over there"--like the tall-bell German-style orchestral tubas--differs from a tuba that sounds like it's "right here," as with the shorter, fatter American-style orchestral tubas. That has to be an impression based on the sound field--the ratio of direct and reflected sound. The shape of the tuba's bell creates this difference, I believe. Try as I might, I've never been able to sustain that effect in a recording, and I believe that is because 1.) I lack the sight, which means the sense of over-there versus right-here is uninformed by knowing where the tuba is, and 2.) microphones aren't placed to capture the same ratios of direct and reflected sound. (Of course, most--but not all!--tuba tone is below 500 Hz.)

But it seems to me that it's not an issue of playback equipment, but rather an issue of--yup--microphone placement and how much room effect is in the recording.

Rick "whose recordings in his living room sound like a performer in his living room when listening on headphones" Denney
 

audio2design

Major Contributor
Joined
Nov 29, 2020
Messages
1,769
Likes
1,842
Now that's wrong. Obviously, in addition to having more channels, one must use them properly. Sorry. Your concern about "cross-talk" is simply the result of production that's not appropriate to the channel layout. Yes, there is evidence of this. The first evidence comes in 1933 from Steinberg and Snow, which was republished as chapter 13 of the collected work of Harvey Fletcher.

As to extend width, well, that's wrong too, but more complex. One can extend the soundstage in 2-channel with proper signal construction, at least along the "sweet spot" which is a single line down the center of the listening area.

Various authors have claimed this means that multichannel has a single "sweet spot", which is trivially shown wrong with properly recorded (no processing at all) material. Said recording is NOT a co-incident recording of any kind, of course, there are a number of ways available that propose to solve this, each solving some subset of a larger problem.

Go read it. It's as conclusive today as it was then, and it completely dismisses concerns about 'crosstalk'. You can also go to thereabouts of the 199? stuff I published on "Perceptual Soundfield Reconstruction", authors Johnston and Lam, which show conclusively that the "sweet spot" is enormously enhanced when material is properly captured and presented. https://www.aes.org/e-lib/browse.cfm?elib=9136 is the AES link. There are various other citations.


I am not sure we are talking about the same thing.

Are you saying that crosstalk, the effective destruction of ITD timing information and ILD level information, not to mention time-based comb filtering, can be negated by simple recording methods; or are you talking about applying crosstalk cancellation, which does not require the listener to be centered; or were you talking about cross-talk as it applies to a multichannel recording only? We may be totally in agreement but talking about different issues.

Do you mean Auditory Perspective - Physical Factors, Steinberg and Snow, 1934? I don't think that paper reveals anything at odds with what I stated. They did not recreate images outside the width of the speakers, and adding the center had the effect of narrowing the width, but improving depth perception. Increasing the width of speakers in a 2-channel system improves width of image recreation, but reduces the localizing of centered sounds. Adding a center channel fixes that localization issue. That was my only comment (positive) w.r.t. center channel. It allows the potential for a wider presentation with accuracy.
 

scott wurcer

Major Contributor
Audio Luminary
Technical Expert
Joined
Apr 24, 2019
Messages
1,501
Likes
2,822
You could only go around 180 degrees (practically less) with headphones, and with speakers, only from left to right within the limits of the speaker widths. Crosstalk negates any width wider than that. Signal processing launching cancellation signals can extend that but it has its own issues. You can use a 2nd set of speakers closer to the listener, and achieve better results.

I've heard some THX FX on a Kindle with its two tiny speakers also (one could argue the visual cues help). The Smyth Realizer also does an excellent job, but that's a very different thing; the only point is that it's still only 2 sources.

Zuccarelli was kind of a kook but the recording of being buried alive (listened to in the dark) is quite convincing.

EDIT - What I am saying is that (in the case of THX) a Kindle and the built-in speakers of an ordinary flat panel TV have on numerous occasions imaged realistically far outside the source (FOR ME). In one case I was not even watching what was on and reached to see what fell off my bench (sounds like the wife in the kitchen :)).
 

audio2design

Major Contributor
Joined
Nov 29, 2020
Messages
1,769
Likes
1,842
I've heard some THX FX on a Kindle with its two tiny speakers also (one could argue the visual cues help). The Smyth Realizer also does an excellent job, but that's a very different thing; the only point is that it's still only 2 sources.

Zuccarelli was kind of a kook but the recording of being buried alive (listened to in the dark) is quite convincing.

Using crosstalk cancellation algorithms you can do some pretty amazing things, but I am glad you brought up THX FX. That is an excellent example of visual bias. Play it on your laptop, both watching the THX image and with your eyes closed. I swear the image is wider when watching the image--not just wider, but harder to localize. However, when I close my eyes and concentrate, I can clearly hear the sound moving from speaker to speaker and never truly moving outside the speakers, at least not much.

The Smyth Realizer if we mean the same thing is specific to headphones though, and uber cool technology. When you don't have to worry about crosstalk, you have a lot more flexibility. If you have tried that, you may want to try out dSoniq.
 

j_j

Major Contributor
Audio Luminary
Technical Expert
Joined
Oct 10, 2017
Messages
2,293
Likes
4,820
Location
My kitchen or my listening room.
On the inclusion of the recording's room effects competing with the listener's room effects:
Your xylophone is an interesting one for several reasons. First, its radiation pattern isn't anything like uniform, and second, the "ring" does not come from exactly the same place as the "hit", although it's close.

The problem with the room goes to basic psychoacoustics. If you do not have enough information captured in the performance room, your ear can not separate out the xylophone from the room response, and "muddy" is exactly the outcome.

Back at AT&T I had a fantastic way to demonstrate this, and one that also related to .1 subwoofer used for anything beyond "sound effects".

I had a recording of a good pipe organ, in a room on Lake Erie that I can't mention by name, being played by the organ tuner's laptop via midi. (Yes, really.)

I had mono, stereo, and 5-channel recordings, using the PSR mike mentioned above. For the 5 channel, the mike wanted to be back about row 10, so that's where it sat, that being the point of the recording.

Mono was muddier than ()*&(&*(, bass to high frequencies.
Stereo did ok above about 500 Hz-1 kHz, but was muddier than ()*&(*&(. But, above 1 kHz imaging helped separate the reverb from the noise. Below 500 Hz or thereabouts, all was mud.
Five channel +-72, +-144, 0 degrees (not the best, but it was what it was, and yes, it had killer imaging in any direction you chose), hey, no, it's not muddy, you can, now, tell the hall from the recording.

Some interesting points. The 'C' channel was at least 6dB higher in energy than any other channel (yes, it was the one pointed straight-on at the main organ). The L/R/LR/RR had about equal energy. The stereo image was as wide as anyone could want, very much like the hall itself.
The recording was analyzed in more ways than you might want to imagine, and yes, the overall spectrum at the listener was very, very similar in Mono, Stereo, and PSR 5 channel. The perception was nothing like that.

In mono and stereo, the reflection from the balcony was not distinguishable except as a really annoying "blur". In 5 channel, you could localize the echo from the balcony front without any problem.

SO. My point? In order to be able to separate, perceptually, the space from the instrument, you need enough information delivered to the ear to do it.

To be clear, for those who want "coherence": there was nothing of the sort going on in the short-term interactions between speakers. In fact, those who are interested in 'absolute accuracy' would have been quite upset with the short-term "interference", but we got the right cues to the ears.

Also, the listening area was much wider. You could stroll around the listening area and get a sense of perspective.

Change of topic--on the notion of a sound field:

I don't know whether this adds to the discussion, but I've said here before that what makes one tuba different from another, if there is a difference anyone can detect, has more to do with how the sound comes to us in the room than with the frequency spectrum as measured up close. We take this as a given with loudspeakers--the sound we hear isn't just the direct sound, but early reflections and room effects too. An instrument that sounds like it's "over there"--as with the tall-bell German-style orchestral tubas--differs from one that sounds like it's "right here"--as with the shorter, fatter American-style orchestral tubas. That has to be an impression based on the sound field--the ratio of direct and reflected sound. The shape of the tuba's bell creates this difference, I believe. Try as I might, I've never been able to sustain that effect in a recording, and I believe that is because 1.) I lack the sight, which means the sense of over-there versus right-here is uninformed by knowing where the tuba is, and 2.) microphones aren't placed to capture the same ratios of direct and reflected sound. (Of course, most--but not all!--tuba tone is below 500 Hz.)

But it seems to me that it's not an issue of playback equipment, but rather an issue of--yup--microphone placement and how much room effect is in the recording.

Rick "whose recordings in his living room sound like a performer in his living room when listening on headphones" Denney

Well, yes, the radiation pattern of an instrument is a key part of how it sounds. You won't get an argument from me on that. One thing (not a tuba expert here) that may matter is the direction of radiation from the tuba. Tubas DO show a "leading edge" for the upper harmonics, as well as the "pitchy" sound one expects of brass. If that signal goes into the rafters and back down, it's going to sound far away, because those details are missing and you only have HRTF effects to localize it. If it's aimed at the listener, you have additional cues to localize with.

What's more, if you are also capturing the room at the same time, you must capture enough information for the ear/brain to separate out the room from the instrument. If you don't, mud happens.

That was the first result of trying a 5-channel recording with the particular array. First time we did it, we put the array where you'd put a stereo pair. Hmm, yeah, yep, that cornet was RIGHT THERE! And I mean about 2' away from your face. Ditto the trumpet, trombone, and everything else. So we backed off to about row 6. Now the 1- and 2-channel recordings were really not good, but the 5-channel recording was quite "present".

Lesson learned right there. The last recordings I made with it, of the old Cantus group in Northfield, MN, we moved out to about row 10. Imaging and sense of focus on performers were just fine in 5 channel. In 2 channel, not so much.
 

CMOT

Active Member
Joined
Feb 21, 2021
Messages
147
Likes
114
Because, yes, I know the outcome, and it shows that you do not dare extrapolate the "click" test beyond its original stimulus. It's that simple.

Because it will be enlightening. There are many results in audio that show "phase doesn't matter" using specific stimuli for which phase does not, in fact, matter. That does not mean "phase does not matter", as can be trivially proven with a very simple set of test signals. The trick is in figuring out WHEN phase does not matter.
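To make the "phase for specific stimuli" point concrete, here's a minimal sketch (my own illustration, not one of the tests discussed here): two signals with bin-for-bin identical magnitude spectra but different phase, whose waveforms nonetheless differ drastically.

```python
import numpy as np

fs = 44100
t = np.arange(fs) / fs

# A 200 Hz square wave, and a phase-scrambled version of it
x = np.sign(np.sin(2 * np.pi * 200 * t))
X = np.fft.rfft(x)

rng = np.random.default_rng(0)
phases = rng.uniform(0, 2 * np.pi, X.shape)
phases[0] = phases[-1] = 0          # keep DC/Nyquist bins real
y = np.fft.irfft(np.abs(X) * np.exp(1j * phases), n=len(x))

# Identical magnitude spectra...
same_mag = np.allclose(np.abs(np.fft.rfft(y)), np.abs(X))
# ...but very different waveforms (crest factor, for one)
crest_x = np.abs(x).max() / np.sqrt(np.mean(x ** 2))
crest_y = np.abs(y).max() / np.sqrt(np.mean(y ** 2))
print(same_mag, crest_x, crest_y)
```

Whether the two are audibly different depends, as j_j says, on the stimulus; the point is only that magnitude spectra alone don't pin down the signal.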

There are many results, using clicks for the most part, that argue that "envelope arrival does not matter at high frequencies". Again, for some stimuli, very short ones with low energy in particular, that's pretty much true. It does not mean that envelope arrival does not matter; it means that for THAT STIMULUS it does not matter. Trying a Gaussian pulse train, say with a 100 Hz rep rate (hint: that does not mean there is any 100 Hz content at all, so don't go there), and testing for the interaural DL (difference limen) will be a great big surprise. (In headphones.) There will also be a stunning difference in loudspeakers, but there are multiple causes that confound that experiment so badly that it's really not meaningful.
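The hint about the spectrum is easy to check directly. A quick sketch (the 8 kHz center and 3 kHz sigma are j_j's figures from later in the thread; the rest is my assumption): build a 100 Hz-rate train of Gaussian pulses and look at where the energy actually sits.

```python
import numpy as np

fs = 44100
rep_rate = 100          # pulse repetition rate, Hz
f_c = 8000              # carrier center frequency, Hz
sigma_f = 3000          # spectral sigma, Hz
dur = 0.5

t = np.arange(int(fs * dur)) / fs
sigma_t = 1 / (2 * np.pi * sigma_f)   # time-domain sigma of each pulse

# One Gaussian-windowed 8 kHz burst every 10 ms
x = np.zeros_like(t)
for t0 in np.arange(0, dur, 1 / rep_rate):
    x += np.exp(-((t - t0) ** 2) / (2 * sigma_t ** 2)) \
         * np.cos(2 * np.pi * f_c * (t - t0))

# The spectrum has lines every 100 Hz, but only under the Gaussian
# envelope around 8 kHz -- essentially no energy down at 100 Hz
X = np.abs(np.fft.rfft(x))
f = np.fft.rfftfreq(len(x), 1 / fs)
e_low = X[f < 1000].sum()                    # the "100 Hz region"
e_hi = X[(f > 5000) & (f < 11000)].sum()     # around the carrier
ratio = e_hi / (e_low + 1e-12)
print(ratio)   # large: the energy lives near 8 kHz, not at 100 Hz
```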

Yes, that's why I tell you to try it for yourself. It is very unwise to extrapolate ANY test beyond the conditions in the test, without some supporting evidence that the extrapolation can be done.

That kind of extrapolation has led to many mistakes.

So try the test, instead of citing tests that may be accurate, but that can not be extrapolated to all signals in the real world.

I will give it a try. And yes, I agree, many lab experiments do not generalize to real-world conditions for all sorts of reasons. FWIW, the people I have talked to about these sorts of things have never claimed anything like "envelope arrival does not matter at high frequencies" - in fact, I have heard them claim that this is precisely when envelope arrival time matters. So arrival time is in play across the frequency range, but at the fine-structure level for lower frequencies and at the envelope level for higher frequencies. (I do understand you mean a 100 Hz pulse train, not a 100 Hz signal, and that the pulse-train rate has nothing to do with the frequency of the sounds.)

"There will also be a stunning difference in loudspeakers, but there are multiple causes that confound that experiment so badly that it's really not meaningful." - you mean like moving your head around or not sitting in quite the right spot? :)

When I find some time, I will program it up - probably in Python, Matlab is too expensive - with both above-threshold clicks and with Gaussian pulse trains (GPT), and see what differences emerge. I can't find a reference that has directly compared the two for sound localization - it probably exists, but the literature has gotten too vast.

thanks.
 

j_j

Major Contributor
Audio Luminary
Technical Expert
Joined
Oct 10, 2017
Messages
2,293
Likes
4,820
Location
My kitchen or my listening room.
"There will also be a stunning difference in loudspeakers, but there are multiple causes that confound that experiment so badly that it's really not meaningful." - you mean like moving your head around or not sitting in quite the right spot? :)

Well, the cancellation due to a 1-sample difference and interaural mixing tends to jump out at you at a 44.1 kHz sampling rate. While that is an audible difference, it's rather obvious. :) There is also some pre-echo effect that tends to suppress high frequencies. That at least is a psychoacoustic effect. :) This shows, among other things, why stereo is a deficient system: if you moved the source that far angularly in a free field, there'd be no such problem. :D
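The 1-sample cancellation is easy to see on paper: if the two speaker feeds differ by an n-sample delay and sum at the ear, the result is a comb filter. A quick sketch of the arithmetic (assuming simple equal-amplitude mixing, my simplification):

```python
import numpy as np

fs = 44100
for n in (1, 2):
    tau = n / fs
    f = np.linspace(0, fs / 2, 2000)
    # Equal-amplitude sum of a signal and its n-sample-delayed copy:
    # |H(f)| = |1 + exp(-j 2 pi f tau)| -- a comb whose first
    # (and deepest) null falls at f = 1 / (2 tau)
    mag = np.abs(1 + np.exp(-2j * np.pi * f * tau))
    first_null = 1 / (2 * tau)
    print(n, first_null)   # 1 sample -> 22050.0 Hz, 2 samples -> 11025.0 Hz
```

At 2 samples the null lands at 11.025 kHz, squarely in the band of the 8 kHz pulses, which is consistent with the effect jumping out over loudspeakers.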

Try the following:

Rep rate of 100 Hz. Gaussian pulse centered at 8 kHz, with a 3 kHz sigma. Try 1-sample and 2-sample ITDs. I only ran 4 subjects, but they all caught 2 samples without any real effort (10/10 is convincing enough for me in ABC/hr), and 3 did well over chance at 1 sample. And that with no training or practice, just computer-moderated DBT into headphones. The effects in loudspeakers at ±30° are a bit of a trip at a 2-sample offset.
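For anyone wanting to replicate this, here's a sketch of the stimulus in Python (the rep rate, center frequency, and sigma are from the post above; the whole-sample shift via `np.roll` and the omission of the WAV-writing and test harness are my simplifications):

```python
import numpy as np

fs = 44100

def gaussian_pulse_train(itd_samples, dur=1.0, rep=100,
                         f_c=8000, sigma_f=3000):
    """Stereo Gaussian pulse train; right channel lags by itd_samples."""
    t = np.arange(int(fs * dur)) / fs
    sigma_t = 1 / (2 * np.pi * sigma_f)   # time-domain pulse width
    x = np.zeros_like(t)
    for t0 in np.arange(0, dur, 1 / rep):
        x += np.exp(-((t - t0) ** 2) / (2 * sigma_t ** 2)) \
             * np.cos(2 * np.pi * f_c * (t - t0))
    right = np.roll(x, itd_samples)       # whole-sample ITD
    return np.stack([x, right])           # shape (2, N): L, R

ref = gaussian_pulse_train(0)             # diotic reference
probe = gaussian_pulse_train(2)           # 2-sample ITD, ~45 us at 44.1 kHz
```

Write the pairs out with something like scipy.io.wavfile.write (transpose to shape (N, 2) first) and feed them to any ABX tool; 2 samples at 44.1 kHz is roughly 45 µs of interaural delay.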
 

j_j

Major Contributor
Audio Luminary
Technical Expert
Joined
Oct 10, 2017
Messages
2,293
Likes
4,820
Location
My kitchen or my listening room.
You could only go around 180 degrees (practically less) with headphones,

Sorry, you can go around to any direction with proper considerations of direct vs. reverberant sound plus HRTF's. Furthermore, you can even do elevation, although with less accuracy.
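As a toy illustration of the direct-sound half of that claim (this is NOT a real HRTF renderer--just a spherical-head approximation using Woodworth's ITD formula plus a crude level difference, all my own assumptions):

```python
import numpy as np

fs = 44100

def pan_binaural(mono, azimuth_deg, a=0.0875, c=343.0):
    """Toy binaural panner for the direct sound only.

    Uses the Woodworth spherical-head ITD, a/c * (theta + sin(theta)),
    valid for |azimuth| <= 90 degrees, plus a crude head-shadow
    attenuation. A real renderer convolves with measured HRIRs and
    adds the direct/reverberant balance j_j describes.
    """
    th = np.deg2rad(abs(azimuth_deg))
    itd = a / c * (th + np.sin(th))          # seconds; far ear lags
    n = int(round(itd * fs))
    far = np.concatenate([np.zeros(n), mono])[: len(mono)]
    far = far * (1.0 - 0.3 * np.sin(th))     # crude head shadow
    near = mono
    if azimuth_deg >= 0:                     # source to the right
        return np.stack([far, near])         # L (far), R (near)
    return np.stack([near, far])

rng = np.random.default_rng(1)
stereo = pan_binaural(rng.standard_normal(fs), 45.0)
```

With only ITD and ILD you get left-right placement; resolving front/back and elevation is exactly where the pinna filtering in real HRTFs comes in.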
 

CMOT

Active Member
Joined
Feb 21, 2021
Messages
147
Likes
114
Sorry, you can go around to any direction with proper considerations of direct vs. reverberant sound plus HRTF's. Furthermore, you can even do elevation, although with less accuracy.

Listen to a good KEMAR (dummy-head) recording (http://kemar.us). You can go all around.

The first track alone on this disc will convince you....
https://www.discogs.com/Various-The-Space-Sound-CD-Dummy-Head-Recording/release/7870641

But some owls are even better - they are "optimized" for localization in elevation - they have facial ruffs that are direction dependent filtering devices and some species have asymmetric pinnae (think one pointing up, the other pointing down) which enables high precision in elevation:
https://link.springer.com/chapter/10.1007/978-3-642-75869-0_17

I wonder if human listeners would be better in elevation estimates (using headphones) if they listened to a recording created with a full-on "dummy owl head" - recreating pretty closely the owl HRTF. We aren't "wired" for this specifically, but there would still be more direction dependent information in the signal. Someone needs to do an owl recording!
 