OK, I fiddled with this some more and figured out that my limitation in going up in number of tones for the multitone test was the FFT size. The highest I could use repeatedly without the app crashing was 400 with the max FFT of 4 million (vs. FFT=1M in all previous tests). What shows up with FFT 4M and with all multitone numbers from 200 up is a stable picture of some 0.7-0.8 dB dips at 2k, 6k, 10k, 13k and 17k.
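For a sense of scale on those FFT sizes, here is a rough sketch of how many FFT bins separate the two closest tones in a multitone signal. Everything in it is an assumption for illustration: a 48 kHz sample rate, "1M" and "4M" taken as 2^20 and 2^22 points, and tones log-spaced from 20 Hz to 20 kHz. The tone spacing the software actually uses may well differ, and the function names are made up.

```python
def bin_width_hz(sample_rate_hz, fft_size):
    """Frequency spacing between adjacent FFT bins."""
    return sample_rate_hz / fft_size


def log_spaced_tones(count, f_lo_hz=20.0, f_hi_hz=20000.0):
    """ASSUMPTION: tones are log-spaced across the audio band."""
    ratio = (f_hi_hz / f_lo_hz) ** (1.0 / (count - 1))
    return [f_lo_hz * ratio ** i for i in range(count)]


def min_spacing_in_bins(count, fft_size, sample_rate_hz=48000.0):
    """Gap between the two closest tones, expressed in FFT bins.

    ASSUMPTION: 48 kHz sample rate (not stated in the post).
    """
    tones = log_spaced_tones(count)
    closest_gap_hz = min(hi - lo for lo, hi in zip(tones, tones[1:]))
    return closest_gap_hz / bin_width_hz(sample_rate_hz, fft_size)


if __name__ == "__main__":
    # ASSUMPTION: "1M" and "4M" mean 2**20 and 2**22 points.
    for fft_size in (2**20, 2**22):
        for count in (100, 200, 400):
            bins = min_spacing_in_bins(count, fft_size)
            print(f"FFT {fft_size:>8}, {count:>3} tones: {bins:6.1f} bins apart")
```

Under these assumptions the closest pair of 400 tones is about 0.35 Hz apart, which is roughly 7.6 bins at FFT 1M and about 30 bins at FFT 4M.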
[Plots: MT200, MT300, MT400]
With fewer multitones the dips are reflected less and less faithfully from about 100 down, and lower FFT sizes also start to smooth them out, so that fewer and fewer dips remain visible as the size goes down. I got it to work once with 500 multitones: it threw an error during the run and didn't compute the FR once it was done, but it did give me the base spectral plot, and if you look closely at the spike tips the same dips are there in the same spots:
They don't quite look sinusoidal, but I would still bet on this being based on the DAC's in-band ripples (one of the options suggested by @solderdude), just made a lot worse by the amp section, because the DAC's datasheet says these ripples should all be sub-0.01 dB no matter what filter you choose.
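To put the datasheet figure next to the measured dips, this is the plain dB-to-amplitude arithmetic. It is only the standard 20*log10 conversion, nothing device-specific, and the helper names are invented for the example.

```python
def db_to_amplitude_ratio(level_db):
    """Convert a level difference in dB to a linear amplitude (voltage) ratio."""
    return 10.0 ** (level_db / 20.0)


def percent_amplitude_change(level_db):
    """Same difference expressed as a percentage change in amplitude."""
    return (db_to_amplitude_ratio(level_db) - 1.0) * 100.0


if __name__ == "__main__":
    # 0.01 dB is the datasheet ripple bound quoted in the post;
    # 0.2 and 0.8 dB are the dip depths reported for noise and multitone.
    for dip_db in (-0.01, -0.2, -0.8):
        print(f"{dip_db:+.2f} dB -> {percent_amplitude_change(dip_db):+.2f} % amplitude")
```

A 0.8 dB dip is close to a 9 % drop in amplitude, while 0.01 dB is about a tenth of a percent.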
This doesn't look like an artifact created by the software itself or by pkane's choice of how to space his multitones apart, because the biggest differences are seen when I test different DAC/amps:
Cheapo dongle JCally JM6 Pro (CX31993):
Half decibel wobbles, heard them as treble grain/mush in the original listening test.
Also cheap AkLiam PD4 (CS43131, discontinued, maybe their PD4 Plus, PD5, PD6 etc. perform better):
Interestingly, a similar response shows up for the 2015 iFi Micro iDSD (first edition with the silver housing, Burr-Brown chip):
A bit fewer wobbles and somewhat lower in amplitude, should sound better but still has that "inflatable mattress response".
I'd say that was simply the state of the art at the time, but there did seem to be other devices around in 2015 that went for more of a KA17 approach, with far fewer dips but larger, more audible ones, e.g. the Oppo HA-2 (ES9018Q2M, line-out, MT400, FFT=4M):
And of course the star of the show so far, the HiBy FC3 (ES9281pro) with its pristine response, probably because it's such a simple device, almost just a "DAC chip with cable connectors", which I'm betting is why it can perform so close to the chip's datasheet specs:
(Thanks again to @Serge Smirnoff for pointing out this beautiful performer via his Df-metric results page.)
And since I actually started yesterday's tests with Solderdude's suggestion to put white noise through the KA17, I can now show it because we know better what to look for:
Audacity is dumb and doesn't let us zoom as much as we'd like on the Plot Spectrum graph. I had to drag the window partially off-screen to enlarge it a bit, and that got me 1-dB gridlines. Even at this resolution you can still see the dips: maybe not the 2k one, but there's one at 5-6k, one at 10k, one at 13k and one around 17k. They're much subdued, probably within 0.2 dB of amplitude delta, but still there, and could still be audible.
I think these ripple effects are not revealed as well by white noise because the randomness of the signal probably doesn't allow the oscillatory phenomena that must be the root cause of such behavior to ramp up over time as they would with a stable input. But even that doesn't completely hide the problem. With music I think the effect comes out far closer to what we see with multitone.
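Separately from the explanation above, and without confirming or refuting it, there is also a purely statistical limit on how well a noise spectrum can show 0.2 dB features. The sketch below assumes Gaussian white noise and averaging of independent, non-overlapping FFT frames, in which case the bin-to-bin scatter of the averaged power is about 4.34/sqrt(K) dB for K frames (a small-error approximation). The function names are invented, and how any particular program averages its spectrum is not something this example knows.

```python
import math
import random
import statistics

# 10/ln(10): dB change per unit of relative power error (small-error approximation).
DB_PER_UNIT_RELATIVE_ERROR = 10.0 / math.log(10.0)


def db_scatter_after_averaging(num_averages):
    """Approximate bin-to-bin standard deviation (dB) of an averaged noise spectrum."""
    return DB_PER_UNIT_RELATIVE_ERROR / math.sqrt(num_averages)


def averages_needed(target_db_scatter):
    """Number of independent frames needed to bring the scatter down to the target."""
    return math.ceil((DB_PER_UNIT_RELATIVE_ERROR / target_db_scatter) ** 2)


def simulate_db_scatter(num_averages, num_bins=2000, seed=1):
    """Monte Carlo check: one periodogram bin of Gaussian white noise has
    exponentially distributed power, so average K such draws per bin."""
    rng = random.Random(seed)
    levels_db = []
    for _ in range(num_bins):
        mean_power = sum(rng.expovariate(1.0) for _ in range(num_averages)) / num_averages
        levels_db.append(10.0 * math.log10(mean_power))
    return statistics.pstdev(levels_db)


if __name__ == "__main__":
    print(f"scatter with 100 averages: {db_scatter_after_averaging(100):.3f} dB")
    print(f"simulated, 100 averages:   {simulate_db_scatter(100):.3f} dB")
    print(f"frames for 0.05 dB scatter: {averages_needed(0.05)}")
```

Under these assumptions, getting the scatter well below a 0.2 dB dip takes thousands of averaged frames, which is one reason a multitone signal with its fixed-level tones can make small ripples easier to read.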
The next thing I want to try is to measure with all the different KA17 filter options, to see if that's really where this whole thing starts. Everything up to now has been with the linear-phase fast-rolloff filter, a.k.a. "FAST" in the menu. (What this also makes me wonder is which filter is pre-selected in the HiBy FC3. Maybe that superb flatness in the audio band comes at the cost of subpar attenuation in the stopband and more aliasing, though I'd bet that's far less audible than 0.2-0.8 dB passband ripples and might have been a smart trade-off.)