My understanding of the MQA explanations is below. As you are obviously much more knowledgeable on the subject than I am, please 'filter' :)) what may be wrong:
They are not primarily talking about the digital filters you mentioned (although they do analyze in depth the effects of decimation filters in the ADC process, a subject that's beyond my understanding), but about the analog ones, with the known phase shift of harmonics that they produce because of their capacitances and resistances.
In a very simplified explanation (sorry, not lecturing about Nyquist-Shannon, just trying to make myself clear): while you are converting an analog source to digital, you need to cut the frequencies of the incoming analog signal before they reach the Nyquist frequency (NF), 22.05 kHz for Red Book. Otherwise, severe aliasing artifacts will occur in the ADC. If filters as close as possible to 'brickwall' are used for this, the time-smearing problems in the frequencies below the cutoff increase considerably. If a gentler filter slope is used instead, either you will have signal remaining beyond the NF and won't eliminate the aliasing problems you are trying to avoid, or, by moving the filter to a lower frequency, you will start to get poor high-frequency response in the audible band, as the NF in Red Book is quite close to the audible limit. And you would still have phase problems to some degree. Every ADC process must balance these opposing problems, but it is not possible to get rid of them completely. This is one of the reasons, according to Meridian, why digital at low sampling rates sounds harsh compared to analog sources.
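To put a number on the aliasing part, here is a minimal sketch (my own illustration, not anything taken from the MQA papers) of where a tone above the NF would land if it reached the sampler with no anti-alias filter at all. The function name and the test tones are just assumptions for the example.

```python
def alias_frequency(f_hz, fs_hz):
    """Apparent frequency of a pure tone at f_hz after ideal sampling at fs_hz.

    Assumes no anti-alias filtering: anything above fs_hz / 2 folds back
    into the band between 0 and fs_hz / 2.
    """
    folded = f_hz % fs_hz
    return min(folded, fs_hz - folded)


FS_REDBOOK = 44_100.0  # Red Book sampling rate, so the NF is 22.05 kHz

# Hypothetical test tones: one below the NF, two above it.
for tone_hz in (10_000.0, 25_000.0, 30_000.0):
    print(f"{tone_hz:8.0f} Hz in -> {alias_frequency(tone_hz, FS_REDBOOK):8.0f} Hz out")
```

A tone below the NF comes out unchanged, while the ultrasonic ones fold back into the audible band, which is why the input has to be cut before the NF in the first place.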
Among the basic premises of MQA is that one of the reasons why HD files (any high-resolution file, not MQA only) sound better, if done properly, is that the NF is moved to a higher frequency. Much gentler analog filters can then be used to cut the incoming signal prior to quantization without reaching into the audible band, thus preserving the phase coherence between fundamentals and harmonics in that audible band, while at the same time avoiding the aliasing problems of Red Book (at least close to the audible region). There are of course other reasons that justify a higher sampling rate (like the better impulse response you get the higher your sample rate is), but I mentioned this issue because of the rather simple fact of avoiding the effects of *analog* filters in the audible band.
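A rough way to see how much gentler the analog filter can be is the textbook order formula for a Butterworth low-pass. This is only a sketch under my own assumptions: a Butterworth response (real converters use other designs, and oversampling changes the picture), a 20 kHz passband edge with 3 dB allowed there, and 60 dB of attenuation required at the NF. None of those figures come from Meridian.

```python
import math


def butterworth_order(f_pass_hz, f_stop_hz, pass_loss_db, stop_atten_db):
    """Minimum order of an analog Butterworth low-pass that loses at most
    pass_loss_db at f_pass_hz and attenuates at least stop_atten_db at f_stop_hz."""
    ratio = (10 ** (stop_atten_db / 10) - 1) / (10 ** (pass_loss_db / 10) - 1)
    return math.ceil(math.log10(ratio) / (2 * math.log10(f_stop_hz / f_pass_hz)))


F_PASS_HZ = 20_000.0   # assumed upper edge of the audible band
PASS_LOSS_DB = 3.0     # assumed loss allowed at that edge
STOP_ATTEN_DB = 60.0   # assumed attenuation wanted at the NF

for fs_hz in (44_100.0, 96_000.0, 192_000.0):
    nyquist_hz = fs_hz / 2
    order = butterworth_order(F_PASS_HZ, nyquist_hz, PASS_LOSS_DB, STOP_ATTEN_DB)
    print(f"fs = {fs_hz:7.0f} Hz, NF = {nyquist_hz:7.0f} Hz, order needed = {order}")
```

With these assumed specs the required order comes out far higher at 44.1 kHz than at 96 kHz or 192 kHz, simply because the room between 20 kHz and the NF grows from a small fraction of an octave to more than a full octave.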
Then, if that explanation satisfies you, perhaps you could address the other "aberrations" in my previous post that you were commenting on. It would be helpful for me too.