I am so sick of people insisting on unreal levels of bit depth and sample rate when Nyquist clearly shows 24/48 is adequate for home listening. I used to be a snob about MP3s and compressed formats, but I certainly can't hear a difference after repeated blind tests, and my speakers are certainly "revealing enough". Ditto for DSD: it seems like a clever way to drive sales of media and "features", though it's certainly a better listening experience than vinyl.
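The arithmetic behind "24/48 is adequate" can be sketched in a few lines of Python. This assumes the textbook model: a sample rate fs represents frequencies up to fs/2 (the Nyquist frequency), and an ideal N-bit quantizer gives a full-scale sine an SNR of about 6.02·N + 1.76 dB. Real converters fall short of these theoretical figures, and the function names are just for illustration.

```python
import math

def nyquist_hz(sample_rate_hz):
    """Highest frequency a given sample rate can represent (fs / 2)."""
    return sample_rate_hz / 2

def ideal_snr_db(bits):
    """Theoretical SNR of a full-scale sine through an ideal N-bit quantizer.

    Equivalent to the familiar 6.02 * N + 1.76 dB rule of thumb.
    """
    return 20 * math.log10(2 ** bits) + 10 * math.log10(1.5)

for rate in (44_100, 48_000, 96_000):
    print(f"{rate} Hz -> Nyquist {nyquist_hz(rate) / 1000:.2f} kHz")
for bits in (16, 24):
    print(f"{bits}-bit -> ~{ideal_snr_db(bits):.1f} dB")
```

This prints a Nyquist frequency of 24.00 kHz for 48 kHz and roughly 98.1 dB and 146.3 dB of theoretical SNR for 16 and 24 bits.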
High-resolution audio is often misunderstood, but that's to be expected, since the industry falsely advertises what it represents and markets it to the wrong audience.
High-resolution music is useful for two things: DAC hardware and DSP.
DACs work best with high sample rate audio. All* DACs do their own internal oversampling because it's easier (as in cheaper, less labor-intensive, and less digitally complex) to build a proper reconstruction filter at high sample rates than at low ones. And yes, 44.1, 48, even 96 kHz count as low sample rates to the DAC. Higher-resolution audio puts less "burden" on this oversampling process by having some of it already baked in from the source rather than artificially created. It's akin to native resolution on LCD and OLED screens. Most* 4K TVs have a fixed native resolution: they can only output a 4K image, whatever you feed them. Would you rather feed one a 720p source to stretch to 4K, or a 1080p source? Or you could do better and give it 1440p. The stretching has to happen because the panel needs that 4K output, just like almost all* DACs need that high sample rate to work with. Designers have worked around it for so long that it's effectively a physical limitation now. But you could bypass the entire scaler on a TV by just feeding it 4K. Same for audio. Whether or not you hear a difference is another issue entirely, but if you treat audio as a chain, feeding in higher sample rates makes the DAC more "transparent"/pure because you're doing less of that extra DSP step. If you feed it 96 kHz, it won't have to oversample as much.
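To make "internal oversampling" concrete, here is a minimal Python sketch of the textbook approach: insert zeros between samples (zero-stuffing), then low-pass filter at the original Nyquist frequency to remove the spectral images. It's an illustration under simplifying assumptions (a single-stage Hann-windowed-sinc FIR, floating point, pure Python); real DAC chips typically use multi-stage filters followed by delta-sigma modulation, and the function names here are hypothetical.

```python
import math

def windowed_sinc_lowpass(num_taps, cutoff):
    """Linear-phase FIR low-pass (Hann-windowed sinc), normalized to unity DC gain.

    `cutoff` is a fraction of the sample rate the filter runs at (0 < cutoff < 0.5).
    """
    m = num_taps - 1
    taps = []
    for n in range(num_taps):
        x = n - m / 2
        # Ideal low-pass impulse response, sampled: 2*fc*sinc(2*fc*x)
        if x == 0:
            h = 2 * cutoff
        else:
            h = math.sin(2 * math.pi * cutoff * x) / (math.pi * x)
        w = 0.5 - 0.5 * math.cos(2 * math.pi * n / m)  # Hann window
        taps.append(h * w)
    total = sum(taps)
    return [t / total for t in taps]

def oversample(samples, factor, num_taps=63):
    """Raise the sample rate by an integer `factor`."""
    # 1. Zero-stuff: keep each sample and insert factor - 1 zeros after it.
    #    Scaling by `factor` restores the level lost to the inserted zeros.
    stuffed = []
    for s in samples:
        stuffed.append(s * factor)
        stuffed.extend([0.0] * (factor - 1))
    # 2. Low-pass at the ORIGINAL Nyquist frequency (0.5 / factor of the new rate)
    #    to remove the images that zero-stuffing creates.
    taps = windowed_sinc_lowpass(num_taps, 0.5 / factor)
    out = []
    for i in range(len(stuffed)):
        acc = 0.0
        for k, t in enumerate(taps):
            if 0 <= i - k < len(stuffed):
                acc += t * stuffed[i - k]
        out.append(acc)
    return out
```

For example, `oversample([1.0] * 64, 4)` returns 256 samples, and once the filter has settled past its 31-sample delay the output sits at about 1.0: same level, four times the rate. In this model, a higher input rate into a DAC that runs internally at a fixed rate just means a smaller `factor`.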
*Before this was known, the concept was explored in the 90s with NOS (non-oversampling) DACs, which, instead of oversampling, run at the sample rate of the music being played. So if you input 44.1 kHz, it outputs 44.1 kHz. Conclusion: it's hard to make a reconstruction filter with the necessary steep drop-off near 20 kHz. It can be done, but it's not easy. It also really fucking limits the choice of filter types (linear phase sharp, minimum phase slow, etc.), because you literally don't have enough bandwidth to "create" a good impulse response. Remember that a filter's impulse response and frequency response are two views of the same thing (a Fourier transform pair), which is another reason oversampling is chosen: a higher sample rate gives the filter more bandwidth to work with. This is a whole can of worms if you want to dive in: DAC reconstruction filters are a necessary evil for dealing with the reality that low sample rate music is ubiquitous, and the different approaches to reconstruction serve both faithful reproduction and subtle tone shaping along the way. Ultimately, the first attempts at digital audio dropped the ball here through unintentional ignorance and didn't take the electronics into account. Now we have the Red Book standard, with this naive aspect built in, and it refuses to die.
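The "steep drop-off" problem can be put in rough numbers. A Python sketch, assuming an analog Butterworth reconstruction filter with its -3 dB point at 20 kHz, a target of 90 dB attenuation (an arbitrary illustrative figure), and that the first image of a 20 kHz tone lands at the output rate minus 20 kHz:

```python
import math

def butterworth_order(passband_hz, stopband_hz, attenuation_db):
    """Minimum Butterworth order for `attenuation_db` at `stopband_hz`.

    Uses |H(f)|^2 = 1 / (1 + (f / fc)^(2n)) with fc = passband_hz (the -3 dB point).
    """
    ratio = stopband_hz / passband_hz
    n = math.log10(10 ** (attenuation_db / 10) - 1) / (2 * math.log10(ratio))
    return math.ceil(n)

audio_top = 20_000
for label, output_rate in (("NOS, 44.1 kHz", 44_100), ("8x oversampled", 352_800)):
    first_image = output_rate - audio_top  # lowest image of a 20 kHz tone
    order = butterworth_order(audio_top, first_image, 90)
    print(f"{label}: first image at {first_image / 1000:.1f} kHz -> order {order}")
```

Under these assumptions the NOS case needs a 56th-order analog filter (image at 24.1 kHz) while the 8x case needs only 4th order (image at 332.8 kHz). In the oversampled design the steep transition just above 20 kHz moves into the digital interpolation filter, where it's far cheaper to build.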
The second use for it is DSP. Signal processors can behave differently depending on the source material, just like every other form of A/V signal processing. This can be because of aliasing, pushing an algorithm too hard, or even hand-tweaked differences depending on the sample rate (VST plugins). Go back to the TV example and think about sharpening. Sharpening looks uglier the lower the resolution and bitrate of the video you feed it. You can't crank it, because you expose its own internal weaknesses (halos, geometric shapes, too much contrast, etc.), but you can start cranking it when you feed it higher resolution, higher bitrate video. So the result of a processing step isn't just a function of how well it's designed; it also depends on what you feed it. The same applies to high-resolution audio if you use an iZotope Declipper, EQs, stereo wideners, etc.: higher-resolution audio should take to signal processing better. Again, whether there's a perceivable difference to the listener is another matter, as is whether it's subtle or easy to detect, but the concept is sound.
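One concrete way the sample rate changes DSP results is aliasing from nonlinear processing. A clipper or saturator adds harmonics, and any harmonic above fs/2 folds back below it. A small Python sketch, assuming an idealized symmetric clipper that adds only the 3rd, 5th and 7th harmonics of a 10 kHz tone, and ignoring the internal oversampling many plugins already do:

```python
def folded_frequency(freq_hz, sample_rate_hz):
    """Where a frequency lands after sampling at `sample_rate_hz` (aliasing)."""
    f = freq_hz % sample_rate_hz
    # Anything above Nyquist mirrors back down below it.
    return sample_rate_hz - f if f > sample_rate_hz / 2 else f

tone = 10_000
for rate in (48_000, 96_000):
    landed = [folded_frequency(tone * h, rate) for h in (3, 5, 7)]
    audible = [f for f in landed if f <= 20_000]
    print(f"{rate} Hz: products land at {landed}, in the audio band: {audible}")
```

At 48 kHz the products land at 18, 2 and 22 kHz, so two of them alias into the audio band; at 96 kHz they land at 30, 46 and 26 kHz, all above 20 kHz, where a later low-pass can remove them.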
I don't know much about DSD and don't take it or MQA seriously, because of what you said. They seem like exercises in greed first and foremost, both in how you have to access them and in the potentially dangerous endgame scenarios that would follow if they became the default for audio. And even if they were better, hell, we can't move people beyond Red Book or 24/48 after decades! And now you want us to break free of PCM and go to something radically different!? Maybe if human beings had another 100,000 years of evolution or differently structured brains.