So there are some misconceptions here, I think:
- That's the average of the entire recording, not the peak. The peak value will, of course, depend on the length of the FFTs we use, because you're effectively integrating over the entire duration of each FFT; but what's inarguable is that an FFT of the entire audio sample does not reflect the peak value.
True. The numbers were meant more as a ballpark figure. It would be more precise to look specifically for peaks in the HF content above 20 kHz, see what else is present at those moments, and compare those SPLs.
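A minimal sketch of that per-moment comparison, assuming numpy and a hypothetical synthetic signal (10 s of a steady 3 kHz tone plus a single 0.1 s burst at 21 kHz) rather than the actual test files. Levels are relative dB, not calibrated SPL, and the helper names are made up for illustration. It shows how averaging band power over the whole recording dilutes a short burst, then compares bands within the frame where the HF content peaks.

```python
import numpy as np

def band_power(x, fs, frame_len, f_lo, f_hi):
    """Linear power per non-overlapping frame in the band [f_lo, f_hi) Hz.

    Rectangular, non-overlapping frames keep the sketch short; a real
    analysis would normally use a window function and overlapping frames.
    """
    n_frames = len(x) // frame_len
    frames = x[: n_frames * frame_len].reshape(n_frames, frame_len)
    spectrum = np.fft.rfft(frames, axis=1)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / fs)
    in_band = (freqs >= f_lo) & (freqs < f_hi)
    return np.sum(np.abs(spectrum[:, in_band]) ** 2, axis=1) / frame_len

def to_db(p):
    """Linear power to relative dB (arbitrary reference)."""
    return 10.0 * np.log10(p)

# Hypothetical test signal: steady 3 kHz tone plus one 0.1 s burst at 21 kHz.
fs = 48_000
frame_len = 4_800  # 0.1 s frames
t = np.arange(10 * fs) / fs
x = 0.5 * np.sin(2 * np.pi * 3_000 * t)
burst = slice(50 * frame_len, 51 * frame_len)  # burst fills exactly frame 50
x[burst] += 0.1 * np.sin(2 * np.pi * 21_000 * t[burst])

hf = band_power(x, fs, frame_len, 20_000, 24_000)
mid = band_power(x, fs, frame_len, 2_500, 3_500)

avg_hf_db = to_db(hf.mean())          # what a whole-recording average reports
peak_frame = int(np.argmax(hf))
peak_hf_db = to_db(hf[peak_frame])    # HF level while the burst is playing
margin_db = to_db(mid[peak_frame]) - peak_hf_db  # what else is present then

print(f"HF average {avg_hf_db:.1f} dB, HF peak {peak_hf_db:.1f} dB (frame {peak_frame})")
print(f"3 kHz band is {margin_db:.1f} dB above the HF band in that frame")
```

With one burst frame out of 100, the whole-recording HF average comes out 20 dB below the HF level in the burst frame; with real music, the margin in the last line is the number that matters for masking.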
- -35 dB for a distortion product would be quite audible with a pure tone, as long as the tone was not at an ultra-low frequency and/or the harmonic wasn't H2. It's important to bear in mind that there's no universal threshold for distortion audibility apart from the actual threshold of hearing, because audibility depends entirely on masking, and masking varies with frequency and level.
This isn't about pure tones, though. The test sample we're talking about is music. With actual music and a broad distortion spectrum, most people still fail to identify distortion at -30 dB or lower. Yes, that's not the limit of audibility. Yes, trained listeners perform better. Yes, it doesn't cover all types of music and distortion, and it doesn't directly apply to this ABX test. It's just a ballpark figure for how much weaker a signal typically has to be before masking is even worth considering.
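To put the levels in this exchange into linear terms, here is a small conversion sketch in plain Python using the standard definitions (20·log10 for amplitude ratios, 10·log10 for power ratios). The function names are hypothetical. A level difference alone says nothing about masking, which depends on frequency and level.

```python
def db_to_amplitude_ratio(db):
    """Amplitude (sound pressure) ratio for a level difference in dB: 10**(dB/20)."""
    return 10 ** (db / 20)

def db_to_power_ratio(db):
    """Power (intensity) ratio for a level difference in dB: 10**(dB/10)."""
    return 10 ** (db / 10)

# The two distortion levels mentioned in the discussion.
for level in (-30, -35):
    amp = db_to_amplitude_ratio(level)
    pwr = db_to_power_ratio(level)
    print(f"{level} dB -> {amp * 100:.2f} % of the amplitude, {pwr * 100:.3f} % of the power")
```

So -30 dB corresponds to roughly 3.2 % of the amplitude (0.1 % of the power), and -35 dB to roughly 1.8 % of the amplitude.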
- Distortion audibility isn't a good analog for the audibility of portions of music, since distortion is tautologically a product of another signal, which, when the distortion level is a negative decibel value, is tautologically the stronger signal. The sections of high-frequency information captured with the wider-bandwidth filter are short, transient sounds that need not have a larger masking signal present. To give an example, I can get a -110 dB average value from a sample of a gunshot in a long period of silence, but it will still be audible at the moment it occurs if you can hear the frequencies it covers.
As stated, distortion was only an example to get an idea of the order of magnitude we're talking about. For the sample files in question, I checked the level difference between 3 kHz and 21 kHz in a ~0.1 s window around some of the transient "ping" sounds, which do have the highest HF content. The difference is at least 30 dB. Your point about short peaks getting ironed out by the average is correct. As far as I can tell from the analysis, my guesstimate of the SPL difference still happened to be correct in this case, probably just by luck.
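The averaging effect described above can be put in numbers. Here is a minimal sketch, assuming the transient has constant power while it plays and the rest of the recording is silent. That is a simplification, since real music is rarely silent between transients. The function name and durations are hypothetical. Averaging power over the whole recording shifts the reported level by 10·log10(transient duration / total duration), however loud the transient is while it plays.

```python
import math

def averaged_level_offset_db(burst_s, total_s):
    """Level change (dB) when a sound of constant power lasting burst_s seconds
    is power-averaged over total_s seconds of otherwise silent signal."""
    if not 0 < burst_s <= total_s:
        raise ValueError("need 0 < burst_s <= total_s")
    return 10 * math.log10(burst_s / total_s)

# Hypothetical example: a 0.1 s transient in recordings of different lengths.
for total in (1.0, 60.0, 600.0):
    print(f"0.1 s in {total:g} s -> {averaged_level_offset_db(0.1, total):.1f} dB")
```

A 0.1 s transient in a 10-minute recording already averages out to about 38 dB below its level while it plays, which is why a whole-file FFT says little about the peak.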
And I'd like to reiterate that, as far as I can tell from the given information, Cameron barely manages to identify tones at 21 kHz under ideal conditions (which is impressive enough). I'm just sceptical that he can identify content in this frequency region while listening to actual music with so much other "dirt" around at the same time. I'm not saying it's impossible; I'd just like to see clear evidence that this is the reason he aces that ABX test. I'd also like to understand other possible factors, in case there are any.