I do not believe MQA is dumb. Paul McGowan, a noted critic of MQA, compared it with the original 192k masters. They were very close, and he was impressed by just how close. As he said, 'After all, this is a pretty amazing process that allows streaming services to send high-resolution audio without degradation – no small feat.' Strictly, that should read 'with nearly inaudible degradation', based on his listening tests.
The problem with MQA is that they claim it is better than the original (I will not go into why that is claimed, but they make the claim). As Paul said, 'If all the hype and hoopla had merely stated the results were indistinguishable from the original, I might be jumping up and down with how close they got it. Few systems have the resolving power of Music Room One, and the fact they got close after folding the music into a smaller file size is quite an achievement.'
The real issue with MQA is not that it is a lousy format - it is, as Paul said, quite an achievement. It is the BS marketing hoopla around it, claiming it is better than the original and so on. But the MQA people are hardly alone in that tactic.
Technically, it is an interesting way to transmit modern recordings. Mostly, the original performance these days is captured using some one-bit, high-sampling-rate format such as 10xDSD. However, MQA claims all the audible information is contained in a triangle shown in figure 7 of the following article:
Even at high sample rates, standard PCM audio ‘smears’ important timing information. A new digital format, MQA, promises vastly improved time-domain accuracy — without the huge file sizes.
www.soundonsound.com
To transmit it efficiently, you first run it through a slow roll-off 32-bit filter that is flat to 20 kHz. It slowly rolls off to about 8 dB down at 48 kHz and keeps reducing even higher frequencies. Since it is all noise above 48 kHz, that is not a worry, nor likely is being 8 dB down at 48 kHz. That is conjecture; presumably (at least hopefully) they have done blind listening tests to confirm it. Suppose you have a 192k recording and chuck away every second sample; you then get a 96k stream. But this chucking away has a consequence: information above 48 kHz is reflected into the 0-48 kHz region. However, because of the shallow filter, this reflected information lies below the noise floor and hence will make no audible difference - you are just changing the noise a bit. You can do the same to convert 384k to 192k, and likewise for the high DSD rates recorded. So all the audible information has been captured at 96k.
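The reflection step is easy to see numerically. This is my own toy illustration, not MQA's encoder: a 60 kHz tone in a 192 kHz stream, decimated by simply chucking away every second sample, reflects to 96 − 60 = 36 kHz in the new 0-48 kHz band.

```python
import numpy as np

# Toy sketch (mine, not MQA's): discarding every second sample of a
# 192 kHz stream reflects a tone above the new 48 kHz Nyquist limit
# back into the 0-48 kHz band (aliasing).
fs = 192_000            # original sample rate
n = 1 << 14             # number of samples (integer cycles, no leakage)
t = np.arange(n) / fs

f_tone = 60_000         # ultrasonic tone, above the new 48 kHz Nyquist
x = np.sin(2 * np.pi * f_tone * t)

# Decimate by 2 with no filtering: "chuck away every second sample".
y = x[::2]
fs2 = fs // 2           # 96 kHz stream, Nyquist is now 48 kHz

# Locate where the tone landed in the decimated spectrum.
spec = np.abs(np.fft.rfft(y))
peak_hz = np.argmax(spec) * fs2 / len(y)
print(round(peak_hz))   # 36000: the 60 kHz tone reflected to 96k - 60k
```

With the shallow filter applied first, this reflected image would already sit below the noise floor before it folds back, which is the whole point of the slow roll-off.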
Now you can locate the noise floor and figure out how many bits you need to reproduce the music in the triangle. Add a couple of bits to be on the safe side, then reduce the stream to that many bits using dithering techniques (look dithering up if it is not something you understand). From what I have read, it can be anywhere from about 15 to 18 bits. This means the reflected stuff has been chopped off. Then, for compatibility reasons, it adds and subtracts adjacent samples. The removed information is compressed into the bits below what was chopped off, and if played back at 48k/24 bits, it will simply sound like noise. But to get the 96k back, you use those bottom bits to recover the difference information, so you get the original back by adding and subtracting. In practice, it uses what is called a quadrature filter, but the principle is the same.

We know the slow roll-off filter used, so we can figure out the best filter to upsample it back to some high bitrate. The filtering and chucking-away process is done using the triangle sampling discussed in the link above; the inverse of that is simple linear interpolation. It is just an approximation of what was chucked away, but since it is all noise, who cares. The high-sampling-rate stream has all the bits except the top bits thrown away, with dithering applied, so that in the audio band you get the original bit depth. That makes reconstructing the audible signal easy: you pass it through a simple analog filter. The PS Audio guys use an 80 kHz, 24 dB filter including an output transformer in the network. A friend of mine takes a DAC output and passes it through just a transformer, relying on the transformer's natural roll-off. Either way, the idea is to introduce the least audible processing to get that very high-rate DSD delivered to the listener. In practice, more complex functions than simple triangles and linear approximation, called B-splines, are used in downsampling to 96k and upsampling.
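The add-and-subtract idea can be sketched with the simplest possible quadrature pair, a Haar-style two-band split. This is my own toy, not MQA's actual codec: sums of adjacent samples act as the compatible lower-rate stream, differences carry the extra detail (the part MQA buries in the bottom bits), and adding and subtracting again recovers the original exactly.

```python
import numpy as np

# Haar-style two-band split: the simplest quadrature filter pair.
# My own toy illustration of "add and subtract adjacent samples",
# not MQA's actual codec.
rng = np.random.default_rng(0)
x = rng.standard_normal(16)          # stand-in for a 96k stream

# Analysis: sums behave like the compatible 48k stream, differences
# carry the detail that would be hidden in the low bits.
sums = (x[0::2] + x[1::2]) / 2
diffs = (x[0::2] - x[1::2]) / 2

# Synthesis: adding and subtracting again recovers the original.
even = sums + diffs
odd = sums - diffs
y = np.empty_like(x)
y[0::2], y[1::2] = even, odd

print(np.allclose(x, y))             # True: perfect reconstruction
```

A real codec would use longer quadrature mirror filters for better band separation, but the round trip works on the same add/subtract principle.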
When encoding MQA, the encoder decides on the best filter and transmits that choice with the 48k/24-bit file. I think all this merging into the bottom bits is silly - just transmit the 96k using FLAC.
OK, that is what MQA is supposed to do with modern recordings. You would usually use a sharp-cutoff sinc filter to reduce the one-bit stream to, say, 192k and transmit that. The MQA claim is that doing so introduces time smear, whereas the shallow filter introduces negligible time smear. But the MQA people forget one thing - Shannon's Sampling Theorem. If you take a bandlimited signal and upsample it using a sinc filter, the bandlimited signal is reproduced precisely. That is what DSD and Chord DACs do. There is no need to try to recover an approximation of what is just noise. Only a blind listening test can determine which is best, but MQA certainly is a tricky way of doing it.
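Shannon's theorem is easy to check on a toy periodic case. In this sketch of mine, FFT zero-padding plays the role of the ideal sinc interpolator (for a periodic, bandlimited signal the two are equivalent), and the reconstruction matches the underlying continuous tones exactly, including at the brand-new in-between sample points.

```python
import numpy as np

# Toy check of Shannon's Sampling Theorem (my own sketch). For a
# periodic, bandlimited signal, zero-padding the spectrum is the
# ideal (sinc) interpolator, and reconstruction is exact.
n = 256
k = np.arange(n)
x = np.sin(2 * np.pi * 20 * k / n) + 0.5 * np.sin(2 * np.pi * 45 * k / n)

# Upsample 2x: place the spectrum in a larger transform, inverse FFT.
X = np.fft.rfft(x)                       # 129 bins for 256 samples
X2 = np.zeros(n + 1, dtype=complex)      # 257 bins for 512 samples
X2[:len(X)] = X
y = np.fft.irfft(X2, 2 * n) * 2          # rescale for the longer transform

# Compare against the underlying continuous-time tones, evaluated at
# every point of the doubled-rate grid (including the new midpoints).
j = np.arange(2 * n) / 2
truth = np.sin(2 * np.pi * 20 * j / n) + 0.5 * np.sin(2 * np.pi * 45 * j / n)
print(np.allclose(y, truth))             # True: reproduced precisely
```

That exactness is the point of the argument above: once the signal is bandlimited, ideal sinc upsampling loses nothing, so there is nothing left for MQA's approximation to improve on except the noise.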
Note, however, that when audio is made into MQA, what is supplied is usually the 48k, 96k, 192k or even DXD master rather than the original very high-rate recording. That master will likely have been made with a sinc filter, which introduces what MQA calls time smear. When converted to MQA, that time smear remains, so their claimed advantage of reducing time smear to really low levels does not hold. They would have to upsample it to some high frequency, relying on Shannon's Sampling Theorem, then apply the MQA process. That is where the controversy starts. People like Rob Watts claim that unless you use his 1-million-tap filter to get an exact reconstruction to 16 bits, audible inaccuracies are introduced. Most engineers would claim that with modern filters, such inaccuracies are inaudible; only a blind listening test can tell. But measurements show Rob certainly has produced a perfect sinc filter to at least 16-bit accuracy. However, he has never published the math showing you need a million taps, which require a lot of processing power.
Just a personal opinion: I think MQA has gone in the wrong direction in trying to be compatible with existing standards. One could apply the slow roll-off filter and use a new compression method invented by Microsoft:
https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/Malvar_DCC07.pdf
You chop off everything below 18 bits in the frequency domain and transmit at the lowest rate at which all frequencies above it are zero after chopping to 18 bits. Any loss of resolution in the frequency domain is much harder to detect than in the time domain. It would likely produce better quality and smaller files, and capture the information in the small number of recordings that do not fit into the triangle.
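A toy sketch of that chopping idea (my own interpretation of the principle, not the Malvar coder from the paper): quantise the spectrum to 18 bits relative to the loudest bin; faint ultrasonic content rounds to zero, so only the bins up to the highest survivor need to be transmitted.

```python
import numpy as np

# My own toy interpretation of "chop off everything below 18 bits in
# the frequency domain" - not the actual Malvar codec. A loud audio-band
# tone plus very faint ultrasonic noise: after 18-bit quantisation of
# the spectrum, only the tone's bin survives.
rng = np.random.default_rng(1)
n = 1 << 12
t = np.arange(n)
x = np.sin(2 * np.pi * 205 * t / n) + 1e-7 * rng.standard_normal(n)

X = np.fft.rfft(x) / n                    # normalised spectrum, 2049 bins
step = np.max(np.abs(X)) / (1 << 18)      # 18-bit quantisation step
Xq = np.round(X / step) * step            # chop everything below 18 bits

# Highest bin that survives: everything above it is exactly zero, so a
# lower transmission rate suffices.
last = np.max(np.nonzero(np.abs(Xq))[0])
print(last)                               # 205 of 2048: noise rounded away
```

In this contrived case only 1 bin in 2048 survives; real music would keep far more, but the same rule (transmit up to the last nonzero bin) sets the rate.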