You are in the process of making Kunchur's mistake. JND is the difference in loudness you can perceive by ear: play two tones close together in time and pick the louder one, or say they sound the same. Depending upon the frequency, you'll get something between about 0.7 and 1.2 dB for the answer. It's been a while since I read the paper, but I think he was assuming 1.2 dB was the JND.
Yet in blind testing, 0.5 dB loudness differences will result in almost universal recognition of a difference. A 0.25 dB difference will be picked up significantly more often than chance, if not 100% of the time. Only around 0.1 dB will you get random results. If you look at the filters for the harmonics of 7 kHz, they had an effect on the level of the 7 kHz tone itself. Kunchur incorrectly believed keeping differences below 1 dB wouldn't affect results.
So when you look at the results, they are almost exactly what you'd expect if you had simply played 7 kHz tones with a slight difference in level. His test is invalidated right there. I don't have the files anymore, but I once made up some pure 7 kHz tones and sent them to a few people. The level differences matched the effect of his filters, but mine were not filtered. The results were similar to Kunchur's.
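A test like that is easy to reproduce. Here's a minimal sketch (NumPy assumed; the `tone` helper and the specific levels are mine, just for illustration) of generating two 7 kHz tones differing by 0.25 dB:

```python
import numpy as np

def tone(freq_hz, level_db, duration_s=1.0, rate_hz=48000):
    """Generate a sine at freq_hz, scaled by level_db relative to full scale."""
    t = np.arange(int(duration_s * rate_hz)) / rate_hz
    amplitude = 10 ** (level_db / 20)  # dB to linear amplitude
    return amplitude * np.sin(2 * np.pi * freq_hz * t)

# A reference tone and one 0.25 dB quieter -- per the discussion above,
# enough for many listeners to report "a difference" in a blind A/B test.
a = tone(7000, -6.0)
b = tone(7000, -6.25)

# RMS level difference in dB between the two tones
diff_db = 20 * np.log10(np.sqrt(np.mean(b**2)) / np.sqrt(np.mean(a**2)))
print(round(diff_db, 2))  # -0.25
```

Writing such files out and presenting them blind is all it takes to separate "sounds different" from "sounds louder."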
All of this is why level matching by ear for comparison listening does only one thing: it confuses you. You likely won't get levels matched closer than 0.5 to 1 dB, and at those offsets you'll always hear a difference, even though the two sound equally loud to you.
From Kunchur's paper:
As per the earlier discussion in subsection 2.2, only the 7 kHz component exceeds its threshold of audibility. The sound-level changes in all components (individually or collectively) fall below their JNDs. For the shortest discriminable displacement of d = 2.3 mm, we have ΔLp ≈ −0.2 dB (a 5% intensity decrease) for both rms and 7 kHz fundamental levels (Tables I and II, and Fig. 5). The JND (for f ≥ 7 kHz and Lp = 69 dB) is known from Jesteadt et al. [40] to be 0.7 dB (a 15% decrease in intensity). Even the 3 standard-error lower limit of this JND is 0.5 dB (an 11% decrease in intensity). Thus the level changes in the experiment (< 0.2 dB) appear to be subliminal and the discrimination might involve more than just spectral amplitude cues.
In the last portion, with a level difference right around 0.2 dB, one listener was 10 for 10 and the others 4 for 10. At the point where the level differed by 0.5 dB, results were around 8 or 9 of 10. Above that it was 100%. His test was fatally flawed for exactly this reason: mistaking the JND for the level differences that alter blind testing. It is a shame he expended so much effort. I was rather appalled that no one reading his write-ups caught this obvious goof before publication. Equally appalling is the outsized attention his work gets and apparently will get in perpetuity. This very basic flaw kills its relevance right there. Strong is the will of the audiophile wanting extra bandwidth to mean extra sound. More is better.
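The dB-to-intensity conversions quoted from the paper are easy to verify; a quick sketch (the helper name is mine):

```python
# A sound-level change in dB maps to a relative intensity change via
# intensity ratio = 10 ** (delta_db / 10).
def intensity_change_pct(delta_db):
    """Percent intensity decrease for a (negative) dB level change."""
    return (1 - 10 ** (delta_db / 10)) * 100

print(round(intensity_change_pct(-0.2), 1))  # 4.5 -- the paper rounds to "5%"
print(round(intensity_change_pct(-0.7), 1))  # 14.9 -- the "15% decrease" JND
print(round(intensity_change_pct(-0.5), 1))  # 10.9 -- the "11% decrease" lower limit
```

The paper's arithmetic is fine; the flaw is treating the published JND as the threshold at which level differences stop influencing a blind test.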
The other obvious tip off was his misunderstanding the timing resolution of digital audio to be the time between samples taken. Sound familiar?
Thank you for taking the time to explain the JND vs. blind-test deltas so eloquently. Much appreciated. We both believe that Kunchur's experiments were flawed because he took from a book what he believed was an accurate and relevant differentiation threshold for a pure tone, instead of measuring it in the context of his specific experiments, using exactly the same equipment, participants, and blind testing protocol he used for the square waves.
However, I haven't found any evidence pointing to Kunchur's misunderstanding of the Sampling Theorem, or that Bob Stuart and Peter Craven misunderstand it. Let me relay my view on this Theorem, if you will. I will use the definition and proof from
https://ccrma.stanford.edu/~jos/mdft/Sampling_Theorem.html.
What functions are subject to the Sampling Theorem? For an audio signal, let's assume the Theorem's first two technical requirements, that the function be continuous and have a continuous Fourier transform, are satisfied. The third technical requirement is tougher to satisfy though: the Fourier transform must vanish for all angular frequencies |ω| ≥ π/T, where T is the sampling interval in seconds.
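A quick way to see why that band-limit requirement matters: a sine above half the sampling rate produces exactly the same samples as one below it, i.e. it aliases. A minimal NumPy sketch (the 48 kHz rate and the 30/18 kHz pair are just illustrative):

```python
import numpy as np

rate = 48000          # sampling rate, so the limit is rate/2 = 24 kHz
t = np.arange(0, 0.01, 1 / rate)

# A 30 kHz sine violates the band limit; once sampled, it is
# indistinguishable from an 18 kHz sine (30 kHz folds down to 48 - 30).
above_limit = np.sin(2 * np.pi * 30000 * t)
alias = np.sin(2 * np.pi * 18000 * t)

print(np.allclose(above_limit, -alias))  # True: sample-for-sample identical
```

This is exactly the failure the third requirement rules out: two different continuous functions producing identical samples.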
Obviously, audio-practical sines with constrained frequencies and amplitudes, and their linear combinations with a finite number of terms, satisfy the third technical requirement of the Sampling Theorem. By audio-practical I mean a sine that could be gradually attenuated toward zero amplitude at its beginning and end without perceptually changing the resulting sound.
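Such an audio-practical sine is easy to construct. A minimal sketch (NumPy assumed; the `faded_sine` helper and the fade length are mine) applying raised-cosine fades at both ends:

```python
import numpy as np

def faded_sine(freq_hz, duration_s, fade_s, rate_hz=48000):
    """A sine gradually attenuated to zero at both ends with raised-cosine
    (Hann-shaped) ramps -- an "audio-practical" sine in the sense above."""
    n = int(duration_s * rate_hz)
    x = np.sin(2 * np.pi * freq_hz * np.arange(n) / rate_hz)
    nf = int(fade_s * rate_hz)
    ramp = 0.5 * (1 - np.cos(np.pi * np.arange(nf) / nf))  # rises 0 -> 1
    x[:nf] *= ramp          # fade in
    x[-nf:] *= ramp[::-1]   # fade out
    return x

x = faded_sine(7000, 1.0, 0.01)  # 7 kHz tone with 10 ms fades
print(x[0] == 0.0 and x[-1] == 0.0)  # True: starts and ends at silence
```

A 10 ms fade is far too slow to be heard as a click, which is the perceptual point being made.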
Yet plenty of practically relevant functions don't satisfy the third requirement. For instance, look at
https://web.calpoly.edu/~fowen/me318/FourierSeriesTable.pdf or
https://www.utdallas.edu/~raja1/EE 3302 Fall 16/GaTech/fseriesdemo/help/theory.html.
Correspondingly, the Sampling Theorem simply doesn't apply to such practical functions. If we want to perceptually accurately sample audio signals containing such functions as components (e.g., some practically observed transients), we first have to approximate those "inconvenient" functions with other functions that are perceptually equivalent yet satisfy the third technical requirement of the Sampling Theorem.
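As an illustration of that approximation step: a square wave's Fourier series (from the tables linked above) has infinitely many harmonics, but truncating it to the harmonics below half the sampling rate yields a band-limited function that the theorem does cover. A sketch (NumPy assumed; the helper name is mine):

```python
import numpy as np

def bandlimited_square(freq_hz, t, max_freq_hz):
    """Square-wave Fourier series, (4/pi) * sum of sin(2*pi*k*f*t)/k over
    odd k, truncated to harmonics below max_freq_hz so it is band-limited."""
    out = np.zeros_like(t, dtype=float)
    k = 1
    while k * freq_hz < max_freq_hz:
        out += (4 / np.pi) * np.sin(2 * np.pi * k * freq_hz * t) / k
        k += 2  # the series contains only odd harmonics
    return out

rate = 48000
t = np.arange(0, 0.001, 1 / rate)
# A 7 kHz square wave band-limited to rate/2 = 24 kHz keeps only the
# 7 kHz fundamental and the 21 kHz third harmonic -- essentially the
# situation in Kunchur's square-wave stimuli.
sq = bandlimited_square(7000, t, rate / 2)
```

Whether that truncation is *perceptually* equivalent to the original is exactly the empirical question the reply is raising.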
The perceptually accurate approximations of transients may require a lower, equal, or higher sampling rate and bit depth, compared to the sampling rate and bit depth required for the representation of the audio-practical sines.
If the perceptually accurate "transients-friendly" sampling rate and bit depth, for a specific audio signal, a specific delivery system, and a specific individual, happen to be lower than or equal to the perceptually accurate "sines-friendly" sampling rate and bit depth, we can just use the "sines-friendly" sampling rate and bit depth throughout, and the transients will be taken care of as well. Please note that such a sampling rate doesn't necessarily have to be 44.1 kHz or more: some phone systems still happily use a sampling rate of 8 kHz.
If, however, the combination of a specific audio signal, a specific delivery system, and a specific individual is such that the "transients-friendly" perceptually accurate sampling rate and bit depth turn out to be higher than the "sines-friendly" sampling rate and bit depth, we either have to use the "transients-friendly" sampling rate and bit depth throughout, or devise a scheme for encoding the sines and transients via different channels, each using the corresponding "friendly" sampling rate and bit depth.
It took some time for humankind to figure out the practical universal "sines-friendly" sampling rate and bit depth for music. Most people believed those to be 44.1 kHz and 16 bits back in the 20th century. Nowadays, they are believed to be 48 kHz (to accommodate a wider transition region for a perceptually friendlier antialias filter) and 24 bits (a perceptually friendly 20 bits rounded up to the nearest multiple of 8 bits).
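For reference, those bit depths map to dynamic range via the standard ~6.02 dB-per-bit rule of thumb for linear PCM (the helper name is mine; the 1.76 dB term is the usual full-scale-sine-vs-quantization-noise offset):

```python
# Theoretical SNR of an N-bit linear PCM channel with a full-scale sine:
# roughly 6.02 dB per bit plus 1.76 dB.
def dynamic_range_db(bits):
    return 6.02 * bits + 1.76

print(round(dynamic_range_db(16), 1))  # 98.1 -- the 20th-century 16-bit format
print(round(dynamic_range_db(20), 1))  # 122.2 -- the "perceptually friendly" 20 bits
print(round(dynamic_range_db(24), 1))  # 146.2 -- 20 bits rounded up to 24
```

The jump from 20 to 24 bits buys headroom in the format rather than anything demonstrably audible, which matches the "rounded up to the nearest multiple of 8 bits" framing above.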
As to the practical universal "transients-friendly" sampling rate, the debate keeps raging. Judging by the highest sampling rate that audio professionals and buyers of high-resolution audio and video recordings have voted for with their wallets, 192 kHz appears to be sufficient. There are hotly disputed indications that certain rare genres of music benefit from 384 kHz.
The practical universal "transients-friendly" bit depth is commonly believed to be either equal to, or lower than, the "sines-friendly" bit depth. While I haven't seen this assumption scientifically proven in an experiment directly aimed to do so, this does appear to be a reasonable approximation for now. Still, I'd like to see this validated one day.
What MQA ostensibly achieves is a compression ratio that allows storing a perceptually accurate representation of practically encountered music pieces, sampled at the highest currently believed "sines-friendly" and "transients-friendly" sampling rates and bit depths, in the space required for the uncompressed 20th-century 44/16 representation.
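For a rough sense of the compression ratio being claimed, comparing raw stereo PCM data rates (a sketch; assuming the nominal 44.1 kHz for the "44/16" baseline and 192/24 for the high-resolution case):

```python
# Raw (uncompressed) PCM data rate in kilobits per second.
def pcm_rate_kbps(sample_rate_hz, bits, channels=2):
    return sample_rate_hz * bits * channels / 1000

cd = pcm_rate_kbps(44100, 16)      # the 20th-century 44.1/16 baseline
hires = pcm_rate_kbps(192000, 24)  # a "transients-friendly" 192/24 stream

print(cd)                    # 1411.2
print(hires)                 # 9216.0
print(round(hires / cd, 1))  # 6.5 -- roughly the factor MQA must absorb
```

So the claim amounts to fitting about 6.5x the raw data rate into the baseline's space, via a mix of lossy and lossless techniques.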