First, I appreciate your non-attack response. Good job.
What I mean to say below is:
A) Music is not generic sound because its sound can be cognitively reinforced by emotional pathways.
B) Listening is not passive hearing.
C) I don't challenge the data concerning the noise generated by DACs, but I challenge the conclusions.
D) I suspect that important, subtle elements of the signal a DAC generates can originate from recorded digital information at the DAC's input that is, on its own, below threshold and outside human bandwidth.
Point A: We're all emotionally unique. Enough said about what makes the sound of music unique...
Point B: As a first approximation, I agree that the sensitivity and bandwidth of human hearing are a large factor. And I know that my own sensitivity and bandwidth are not exceptional - in fact they are a little narrower than average. But hearing sensitivity can be focused (imagine the RCA dog cocking his head to critically listen to the Victrola). As one example, acoustic musical instruments can have their 'voices' changed by the smallest alterations in physical setup. That sonic change can be obvious to their players and luthiers, but less so to those not trained to attend to the instrument's 'voice'. Would you invite John Doe off the street to tune your piano? Don't do it. Piano tuners are highly trained listeners.
Point C:
Intro - I spend my leisure time working to better understand my observations and turn them into entertainment as I see fit. While the ideas from this shark tank are interesting, I remain deaf to the sharks' derision and/or caustic opinions. That said...
Background - I recently upgraded four ESS-based DACs that I built about 4 years ago. Each got a new external, frequency-switching master clock board, and I tested a small variety of clocks in the systems. With all of the DACs - two 9038Pro and two 9028Pro - the output sound changed when playing into headphones, and the change was confirmed by REW scans using my wide-bandwidth amp with highly resolving speakers.
Observations - Now, let's zoom into the 9038 with the cleanest clocks (which happen to natively output a square waveform). Especially with clean acoustic recordings at 24 bits of depth, all of the interpolation and noise-shaping stages before HyperStream (block diagram below) in the DAC are not just unnecessary, they cause small degradations of the things I listen for. DSD vs. PCM input sounded almost identical. After listening while manipulating the DAC's various digital filters, I decided to use PCM at the native recording frequency because a) it allows optimizing the DAC's 'pre-HyperStream' stages (whereas DSD shuts them down), and b) 8X upsampling the PCM paid almost no dividends - the sound was just a touch more coherent, but lacked a little of the 'dynamic punch/transient weight' that came from the native recording frequency.
Experiments - In PCM, the harmonic distortion neutralizer circuit (THD Compensation) was the one exception to digital degradation of the sound. I wanted to learn if it could be used to help the 'voice' of the system, and it turned out it can. But it's complicated, because it depends heavily on the recording quality, content, and playback volume. I had installed a web interface to manipulate the DAC chip from the player's internal RPi computer. It's 3 pages of buttons! Adjusting for each particular album or track - with instantaneous A/B switching - is no way to 'just enjoy' the music! In the DAC control website I built in 4 harmonic distortion profiles, and confirmed them with FFT plots taken directly from the DAC's balanced I/V board. They do this: 1 - correct for inherent 2nd and 3rd harmonic imbalance, creating a clean one-peak FFT; 2 - emulate the performance of a 6SN7 triode (like my old preamp, now long retired); 3 - slightly 'warm up' strong vocals that my listening room can't quite handle; 4 - approach the harmonic sound signature of a 300B room-heater amplifier. Long story short - I'm in the midst of developing a system of digitally 'fingerprinting' each track using an ffmpeg analysis*, and putting a lightweight daemon on the RPi to automatically optimize the 9038 processing for each song. No more flipping around the control website to get the best - most emotionally engaging - sound reproduction. You see why I just can't accept that all DACs are transparent, and that brings me to my last point.
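For anyone curious what the fingerprinting step might look like, here is a minimal Python sketch, not the actual daemon. It assumes ffmpeg is on the PATH and uses its `loudnorm` filter in analysis mode (`print_format=json`) to get integrated loudness and loudness range; crest factor and flat factor are left out of the sketch. The function names and the thresholds in `choose_profile` are placeholders made up for illustration, and the profile numbers simply refer to the four profiles listed above.

```python
import json
import subprocess


def measure_loudness(path):
    """Run ffmpeg's loudnorm filter in analysis mode and return its stderr text."""
    cmd = [
        "ffmpeg", "-hide_banner", "-nostats", "-i", path,
        "-af", "loudnorm=print_format=json", "-f", "null", "-",
    ]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stderr  # ffmpeg writes the analysis to stderr


def parse_loudnorm(stderr_text):
    """Extract integrated loudness (LUFS) and loudness range (LU) from the JSON block."""
    start = stderr_text.rfind("{")
    end = stderr_text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no loudnorm JSON block found")
    stats = json.loads(stderr_text[start:end + 1])
    # loudnorm reports its numbers as strings, so convert them explicitly.
    return {"lufs": float(stats["input_i"]), "lra": float(stats["input_lra"])}


def choose_profile(fingerprint):
    """Map a fingerprint to a profile number (1-4).

    PLACEHOLDER rules: these thresholds are invented for illustration and
    would have to be replaced by values found through listening tests.
    """
    if fingerprint["lra"] >= 12.0:   # wide dynamics, e.g. clean acoustic takes
        return 1
    if fingerprint["lufs"] > -12.0:  # loud, dense masters
        return 3
    return 2
```

A daemon would call `choose_profile(parse_loudnorm(measure_loudness(track_path)))` once per track and cache the result, so the analysis cost is only paid the first time a file is played.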
Point D: The DPLL and Sub-Threshold Information
In years past, it was generally accepted that when using the DAC's DPLL to eliminate jitter, narrower was better — a well-designed system would hold lock at the tightest possible window. With the low-jitter clock boards I installed, the DPLL is no longer needed, and the sound is measurably cleaner without it — just as in DSD mode.
The effect is most obvious on my best live acoustic recordings, the ones my son describes as "magical." When I engage the DPLL and progressively narrow its timing window, something essential disappears: the fine spatial clarity of the soundstage, the audience settling into silence, the small sounds of instrument handling and microphone movement during a live performance. That texture of being there goes away. On the FFT plots, all of that information sits down in the grass around −110 dBr. By any conventional standard, it shouldn't be audible.
My working hypothesis is that the DPLL's temporal averaging destroys the fine timing relationships between these sub-threshold signals — and that those relationships matter. In a nonlinear system, sub-threshold signals don't just disappear; they can combine to produce intermodulation products that cumulatively reach perceptual threshold. Destroy the timing coherence, and those products either don't form or shift out of the frequency ranges where they'd reinforce the original signal. What's lost isn't the signals themselves — it's the emergent audibility they were generating together.
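To make the intermodulation part of this hypothesis concrete enough to test, here is a toy Python sketch using only the standard library. It passes two low-level tones through a weak second-order nonlinearity, y = x + k2·x², and measures the difference-frequency product with a single-bin DFT. Everything in it is an assumption for illustration: the tone frequencies, the −110 dB levels, and the coefficient `k2` are hypothetical and are not measurements of the 9038, the amplifier, or the ear.

```python
import math


def two_tone(f1, f2, a1, a2, sample_rate, n):
    """Sum of two cosine tones with amplitudes a1 and a2."""
    w1 = 2 * math.pi * f1 / sample_rate
    w2 = 2 * math.pi * f2 / sample_rate
    return [a1 * math.cos(w1 * i) + a2 * math.cos(w2 * i) for i in range(n)]


def weakly_nonlinear(samples, k2):
    """Memoryless second-order nonlinearity: y = x + k2 * x^2 (k2 is hypothetical)."""
    return [s + k2 * s * s for s in samples]


def tone_amplitude(samples, freq_hz, sample_rate):
    """Amplitude of the component at freq_hz, measured with a single-bin DFT."""
    w = 2 * math.pi * freq_hz / sample_rate
    re = sum(s * math.cos(w * i) for i, s in enumerate(samples))
    im = sum(s * math.sin(w * i) for i, s in enumerate(samples))
    return 2.0 * math.hypot(re, im) / len(samples)


def db(amplitude):
    """Level in dB relative to an amplitude of 1.0."""
    return 20 * math.log10(amplitude)


SAMPLE_RATE = 48000
N = 4800                      # 0.1 s, a whole number of cycles for every tone used
LEVEL = 10 ** (-110 / 20)     # each tone at -110 dB re full scale
K2 = 0.01                     # assumed nonlinearity strength, purely illustrative

x = two_tone(5000, 6000, LEVEL, LEVEL, SAMPLE_RATE, N)
y = weakly_nonlinear(x, K2)
imd = tone_amplitude(y, 1000, SAMPLE_RATE)   # difference product at f2 - f1
```

In this simple model the difference product has amplitude k2·A1·A2, so its level in dB is the sum of the two tone levels plus 20·log10(k2). Calling `db(imd)` for other values of `K2` and `LEVEL` shows how strong the nonlinearity would have to be for such products to approach threshold, which is one way to put numbers on the hypothesis.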
I can't prove this yet. But I hear it consistently, my measurements are consistent with it, my A/B testing is consistent with it, and I'm not done investigating and enjoying it.
All the best, as always.
* - ffmpeg variables being characterized include: integrated loudness (LUFS), loudness range (LRA), crest factor, and flat factor