That looks like a good analogy.
You are showing a rope, a wall and a hard cylinder, which are probably just that.
And then claim they are a part of an invisible elephant.
I wouldn't care about an invisible elephant if it didn't "step on my ear" every once in a while.
This usually manifests itself in my aversion to a certain genre of music recorded on CD, until I hear it played live by competent musicians. I already mentioned gamelan. Over the years, I had to add to this list mariachi, and all kinds of rock styles involving heavily distorted electric guitars.
So far you have only shown that it is possible that a sharp pulse might be recorded differently than the ear's response to it.
You didn't show that this is actually the case.
Even if the physical response is different, you didn't show that this difference can be consciously perceived in an ABX test.
You also didn't show 16/44 is not enough to capture the perceived sound.
This means that even if the perception is different, it could be possible to compensate for the difference during the recording, for example by using binaural recording.
Well, there were attempts to prove just what you said needs to be proven. For instance:
http://boson.physics.sc.edu/~kunchur//Acoustics-papers.htm
http://boson.physics.sc.edu/~kunchur//papers/FAQs.pdf
It's been "debunked", for instance here:
https://www.audiosciencereview.com/forum/index.php?threads/milan-kunchur.522/
https://hydrogenaud.io/index.php/topic,73598.100.html
I carefully went through all of the above. Roughly, the "debunking" could be split into three categories:
(1) Arguing that Mr. Kunchur doesn't understand the Sampling Theory. Well, this didn't sound right to me, because, based on my first-hand experience, Sampling Theory is an order of magnitude easier to understand than Quantum Mechanics, which is in turn an order of magnitude easier to understand than Quantum Electrodynamics - the primary area of Mr. Kunchur's expertise, which his peers have never doubted.
After carefully looking at all the "debunking" presented, I realized that the would-be debunkers' understanding of what Mr. Kunchur was trying to prove wasn't satisfactory. For instance,
https://hydrogenaud.io/index.php/topic,73598.msg834710.html#msg834710 alleges that Monty Montgomery "debunks Kunchur soundly using a very clear example with no math:
https://xiph.org/video/vid2.shtml. That part starts near the end at around 21:55".
Well, what does Mr. Montgomery say at 21:24 in this video? Verbatim: "Again, our input signals are band-limited". And then he goes on to demonstrate that a band-limited signal that kinda sorta looks like the original in-between-samples pulse can indeed be perfectly captured. What can this prove about a signal with the spectrum depicted on Fig. 4 (page 597) of
http://boson.physics.sc.edu/~kunchur//temporal.pdf, or about a signal with spectrum depicted on Fig. 3 (page 5) of
https://pdfs.semanticscholar.org/d9bf/d506271a8c38cf0f77e6edfbffebf5e368b6.pdf?
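To make the band-limiting premise concrete, here is a small numerical sketch (Python/NumPy, purely illustrative): Whittaker-Shannon interpolation recovers a 7 kHz sine from 44.1 kHz samples essentially perfectly, while a 30 kHz sine - which, unlike Mr. Montgomery's input signals, is not band-limited to below Nyquist - comes back as an entirely different (aliased) waveform. The frequencies and window sizes here are arbitrary choices for the demonstration.

```python
import numpy as np

fs = 44100.0                               # CD sample rate
n = np.arange(-2000, 2000)                 # sample indices
t_s = n / fs                               # sampling instants
t = np.linspace(-0.004, 0.004, 1601)       # fine "analog" grid, well inside the window

def reconstruct(x_samples, t):
    """Whittaker-Shannon (sinc) interpolation of a sampled signal."""
    return np.array([np.dot(x_samples, np.sinc(fs * ti - n)) for ti in t])

errs = {}
for f0 in (7000.0, 30000.0):               # one in-band tone, one above Nyquist
    x_true = np.sin(2 * np.pi * f0 * t)
    x_hat = reconstruct(np.sin(2 * np.pi * f0 * t_s), t)
    errs[f0] = np.max(np.abs(x_hat - x_true))
    print(f"{f0/1000:g} kHz sine: max reconstruction error = {errs[f0]:.4f}")
```

The small 7 kHz error comes only from truncating the sinc series; the 30 kHz "error" is of order 2, because 30 kHz folds down to 14.1 kHz. Perfect capture is a statement about band-limited inputs only - which is precisely what is in question for the spectra in the figures above.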
(2) Arguing that Mr. Kunchur didn't control some significant experimental input variables, and thus purely physical measurement biases crept into the experiment. Once again, this didn't sound right to me, as Mr. Kunchur went to great lengths to eliminate such biases, drawing on his expertise in designing and conducting experiments with much more refined experimental machinery, and producing results of much higher precision, than what was required for the audio experiment.
Those were earnest and serious inquiries. One by one, they were answered: as far as I could tell, to the complete satisfaction of the inquiring parties. My best guess is that those online questions reflected what Mr. Kunchur encountered during the peer reviews, not only of the finished paper, but also during the experiment design, and discussions of its preliminary results.
(3) Debunking based on potential audiological experimental biases. Note that in this case I used debunking instead of "debunking". Mr. Kunchur answered some of those inquiries to my satisfaction, yet not all. Specifically, this one was never answered, to the best of my knowledge:
https://hydrogenaud.io/index.php/topic,73598.msg701379.html#msg701379. Indeed, there is no reference in Mr. Kunchur's writings to the experimental controls proving that the participants couldn't differentiate the primary harmonic of the 69 dB signal at 7 kHz with precision better than 0.25 dB.
I happen to share the perspective on these experiments of Mr. Johnston, who is so prominent on this forum as well (
https://hydrogenaud.io/index.php/topic,73598.msg676753.html#msg676753): "If you want to support this premise, repeat the experiment and see if you can confirm the results. You might even try to improve the experimental process, and try several different kinds of ultrasonic stimulii to see what's going on."
Knowing what we know now about the cochlea machinery, it is possible to imagine that the differences Mr. Kunchur measured could in fact change the perceived timbre. An experiment with even finer physical resolution, and more rigorous audiological controls, could either prove or disprove that, with an acceptable level of precision. As it stands, I can give a rebuttal to the only serious original rebuttal still standing that I'm aware of, but what would it prove? Essential audiological controls were absent in the original rebuttal too. My rebuttal could go like this:
So, you are saying that you:
"... generated two 7 kHz sines using Soundforge. Since the software is all 16 bits, I decreased the volume of the first one by 1 dB, and of the second one by 1.25 dB, in order to get the same quantization noise on both ... I was wearing headphones and the playback volume was moderate. Inferior to 80 dB, but I couldn't say how much .. Like the listeners in Kunchur's experiments, I found the louder sine "brighter"".
Please tell me:
What is the mechanism behind you finding the louder sine "brighter"? That is, why does the perceived timbre of what is ostensibly a pure sine signal appear to change noticeably with such a small change in volume? Could it be that this effect was caused by unaccounted-for harmonic distortions in the software and/or hardware you were using? Or by distortions particular to your hearing system at that unspecified sound level?
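One way to start probing that question digitally: generate the same two 16-bit 7 kHz sines and look at where the quantization products actually land. A minimal sketch (Python/NumPy; the -1 dB and -1.25 dB levels are taken from the quoted experiment, everything else - undithered rounding, a 1-second tone - is my own illustrative choice):

```python
import numpy as np

fs, f0, N = 44100, 7000, 44100        # 1 s of a 7 kHz sine: integer number of cycles
n = np.arange(N)

def analyze(level_db):
    """Quantize a sine to 16 bits (no dither) and report FFT bin levels in dBFS."""
    x = 10 ** (level_db / 20) * np.sin(2 * np.pi * f0 * n / fs)
    q = np.round(x * 32767) / 32767   # 16-bit quantization
    X = np.abs(np.fft.rfft(q)) * 2 / N
    db = lambda a: 20 * np.log10(max(a, 1e-12))
    return db(X[f0]), db(X[2 * f0])   # fundamental and 2nd harmonic (14 kHz)

fund1, h2_1 = analyze(-1.0)
fund2, h2_2 = analyze(-1.25)
print(f"fundamental difference: {fund1 - fund2:.3f} dB")
print(f"2nd harmonic below fundamental: {h2_1 - fund1:.1f} dB and {h2_2 - fund2:.1f} dB")
```

In this idealized model the quantization-born harmonics sit far below the fundamental, so any audible "brightening" would point to distortion elsewhere in the chain - the playback hardware, or the listener's own hearing system at that level.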
I would also suggest that this difference, even if all of the above is proven true sometime in the future, is not very relevant to music.
While there are instruments with a sharp onset, I would suspect most of the audible vibrations/energy would be in the subsequent "ringing" of that instrument (the instrument itself, not the ear's response).
Could be, or could not be. If the pulse is strong enough to excite essentially all auditory nerve fibers in the course of, say, 200 microseconds of the cochlear ringing, then for the next ~2 milliseconds the IHCs would be recovering, and wouldn't register what the instrument was sending during these 2 milliseconds, potentially making that instrument sound subjectively softer.
If, on the other hand, the pulse is so weak that it wouldn't be heard all by itself, it could still pump up the IHCs enough to trigger much faster during the integration of the subsequent sound that the instrument was sending. If the instrument's signal is also not very strong, then instead of being heard, let's say, only after being integrated over 200 ms, it could be heard just 50 ms after the non-perceived transient, which could be quite noticeable musically.
Qualitatively, this corresponds well to the differences audiophiles ascribe sometimes to the qualities of an "under-resolved" record: gross music components - loud and long - sound right, but levels and timing of decay tails and reverberations are off, taking away from the immersive experience.
The effects described above can potentially affect perception of any music where transients and sinusoids are placed close enough together: gamelan, mariachi, and heavily distorted guitars are perhaps extreme examples of that, yet I have personally also noticed that the "blackouts" cymbals impose on consecutive musical tones, and the way cymbals decay over time, are sometimes perceived differently on 192/24 vs 44/16 - for me.
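For what it's worth, the "pre-charged integrator" scenario above can be written down as a toy model. This is not cochlear physiology - just a leaky integrator with arbitrary made-up constants - but it shows the mechanism by which an unheard transient could shorten the time a subsequent soft tone needs to reach an audibility threshold:

```python
# Toy leaky integrator; all constants are illustrative, not physiological data.
def time_to_threshold(initial, drive, tau=0.2, threshold=1.0, dt=1e-3):
    """Steps y' = drive - y/tau from y(0)=initial until y crosses threshold."""
    y, t = initial, 0.0
    while y < threshold and t < 2.0:
        y += (drive - y / tau) * dt
        t += dt
    return t

drive = 6.0  # soft steady tone; steady state drive*tau = 1.2, just above threshold
t_cold = time_to_threshold(0.0, drive)  # tone starting from silence
t_warm = time_to_threshold(0.6, drive)  # tone following an unheard transient
print(f"time to audibility: {t_cold*1000:.0f} ms from silence, "
      f"{t_warm*1000:.0f} ms pre-charged")
```

With these numbers the pre-charged integrator crosses the threshold noticeably sooner than the cold one - the qualitative shift described above, if not the exact 200 ms vs 50 ms figures.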
Generating a sharp pulse is hard, even the air itself would attenuate it while it travels from the instrument to the ear.
Additionally, many of the studies used pulses at very high SPL to be audible at all.
I don't think most people listen to music at such levels.
Yes, a sharp pulse attenuates while traveling through the air. The hearing system probably takes this into account, I would expect most readily for natural sounds. I believe this could be one of the reasons why we can feel the "stage depth" of a good acoustic recording. But then again, such a pulse could be perceived as "veiled" instead of as more distant.
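The frequency dependence of that attenuation is easy to illustrate. The sketch below (Python/NumPy) uses a crude classical-absorption model where attenuation grows as f squared; the constant k is made up to exaggerate the effect over tens of metres, and real air absorption additionally involves humidity-dependent relaxation terms (ISO 9613-1):

```python
import numpy as np

fs = 192000
t = np.arange(-0.01, 0.01, 1 / fs)
pulse = np.exp(-0.5 * (t / 20e-6) ** 2)        # a 20 us Gaussian "click"

def after_distance(x, d_metres, k=1e-9):
    """Apply toy frequency-dependent air absorption exp(-k * f^2 * d)."""
    X = np.fft.rfft(x)
    f = np.fft.rfftfreq(len(x), 1 / fs)
    return np.fft.irfft(X * np.exp(-k * f ** 2 * d_metres), len(x))

near, far = after_distance(pulse, 1.0), after_distance(pulse, 50.0)
print(f"peak: emitted {pulse.max():.2f}, at 1 m {near.max():.2f}, "
      f"at 50 m {far.max():.2f}")
```

The distant pulse comes out both lower in peak and wider in time - smeared rather than merely quieter - which is one way a far-away transient could read as "veiled" rather than simply distant.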
Other strong depth clues are given by the patterns of reverberations, and by spectral changes of sounds emitted by sound objects with known characteristics. Perhaps the integration of the intensity of the direct transient, the clues coming from the reverberations, and spectral analysis gives us a robust estimate of the depth of the scene?
From evolutionary considerations, it must work well both in the open air and in the forest, where the transients can actually be veiled by the foliage. I'm not aware of detailed peer-reviewed research on that (I haven't yet looked deeply into the stereo stage depth effect), yet somehow the stage depth ought to be recorded and then perceived. Maybe other members have more insight into this?