It appears some parts of this may have been run through a soft limiter, or perhaps a microphone was near overloading on cymbals. Do you have any information on the provenance of how it was recorded or processed? I suppose that really isn't pertinent.
That's a good question! I believe provenance is pertinent. I tracked and mixed that song myself. I picked a simple and quiet passage, so no overloading there, at least nothing significant: other parts of the song were performed much louder. Onboard FX was completely off during tracking.
Soft limiting during tracking: I can't completely exclude this. The MOTU 896mk3 provides about 12 dB of additional soft-clipping headroom by running two ADC channels in parallel per physical microphone channel, with the second ADC channel attenuated by ~12 dB and a short "look-ahead" buffer.
It is not very likely, though, that the soft limiting mechanism was triggered during these 6 seconds, as this was a really quiet passage. Cymbals were struck a lot harder during other parts of the song, and I don't recall seeing an overload there. The mix was made from the last take, so I had already dealt with the overloads during the sound check and previous takes.
Compression was applied to the vocals, and to the overall mix. Yet once again, it shouldn't matter much on this quiet part of the song. And even if the compressor did significantly affect the signal, compression is a virtually omnipresent factor in mixing and mastering these days: the delivery chain has to deal with it transparently.
Just what aspect of this is showing us momentum transfer from ultrasonics?
The idea was to determine what information was lost while capturing at 48/24 vs capturing at 192/24. Also, what was lost when resampling and dithering to 44.1/16 from the 48/24 and 192/24 captures. After that is determined, we can look at the lost information, try to hear it, and figure out whether it indicates a lost transfer of mechanical momentum, or otherwise just represents useless true-ultrasonic-sinusoids and noise.
A is a 6-second fragment of the 192/24 master.
B is a straight resample of A into 48/24, using Audacity 2.3.0 on macOS, with the "Best Quality" setting.
C is a straight resample of B into 192/24, for comparison with the master, also using Audacity with the same settings.
D is A downsampled using SoX: three out of every four 192/24 samples were replaced with zeroes.
E is a straight resample of D into 48/24, using the same parameters as for the A->B resample.
F is a straight resample of E into 192/24, for comparison with the master, using the same parameters as for the B->C resample.
A->B represents a capture at 192/24, with subsequent resampling to 48/24 using the full information contained in the 192/24.
A->D->E emulates a capture directly at 48/24 using a delta-sigma ADC.
A->G and A->D->I are analogous to A->B and A->D->E, but for the 44.1/16 (CD) output format.
H and J are analogous to C and F, but for 44.1/16 instead of 48/24: they are only used to ease the comparisons with the master.
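For readers who want to reproduce the D step without SoX, here is a minimal sketch of "three out of every four samples replaced with zeroes". It is an illustration in plain Python, not the SoX command that was actually used; the function name `zero_three_of_four` and the toy sample values are made up for the example.

```python
def zero_three_of_four(samples, keep_every=4):
    """Keep the samples at indices 0, 4, 8, ... and replace all others with 0.0.

    The output has the same length (and nominal sample rate) as the input;
    only every `keep_every`-th sample still carries information.
    """
    return [s if i % keep_every == 0 else 0.0 for i, s in enumerate(samples)]


# Toy stand-in for a few 192 kHz samples of file A (hypothetical values).
a = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
d = zero_three_of_four(a)
print(d)
```

The result plays the role of D: it keeps the 192 kHz length, and is then resampled to 48/24 (E) and back to 192/24 (F) as described above.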
What you'll see and hear:
A_vs_C and A_vs_H, representing differences between the 192/24 master and properly produced resamples of the master to 48/24 and 44.1/16 respectively, do show that some information was lost (look at the waveform dB view in Audacity), and that 48/24 lost less than 44.1/16.
Amplitudes in the A_vs_C and A_vs_H signals are not that high, and the signals themselves are not audible. It appears that proper resampling directly from 192/24 to 48/24 or 44.1/16, while throwing away some transient detail, manages to capture some useful information about the transients (just like you said it would).
A_vs_F, representing the difference between the 192/24 master and the signal captured at 48/24, shows a lot more loss. You can hear A_vs_F as is. The level of that signal is at times just 20 dB lower than the master peak.
A_vs_J, representing differences between the 192/24 master and the signal first captured at 48/24 and then resampled and dithered (with noise-shaping) to 44.1/16, shows even more loss. A_vs_J can also be heard as is. The level of that signal is at times just 18 dB lower than the master peak.
Amplitudes of A_vs_F and A_vs_J are significant, and quite a few of the fragments of these signals are asymmetrical, which indicates that the corresponding encoding processes didn't capture the mechanical momentum that was present in the transients (just like I said they wouldn't).
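If you'd rather put numbers on this than read them off the Audacity graph, the comparison can be sketched roughly as below. This is only an illustration: it assumes both signals are already time-aligned, equal-length lists of float samples, and `peak_asymmetry` is just one simple way I could imagine quantifying "asymmetrical", not the measure used for the observations above (those were made by eye).

```python
import math


def difference(a, b):
    """Sample-by-sample difference of two equal-length signals (an A_vs_X file)."""
    if len(a) != len(b):
        raise ValueError("signals must have the same length")
    return [x - y for x, y in zip(a, b)]


def peak(x):
    """Largest absolute sample value."""
    return max(abs(s) for s in x)


def level_below_master_db(master, other):
    """Peak of (master - other) relative to the master peak, in dB.

    Negative values mean the difference signal is quieter than the master.
    """
    p = peak(difference(master, other))
    if p == 0.0:
        return float("-inf")  # identical signals: nothing was lost
    return 20.0 * math.log10(p / peak(master))


def peak_asymmetry(x):
    """Ratio of the largest positive to the largest negative excursion.

    1.0 means the positive and negative peaks are equal in size.
    """
    pos = max(max(x), 0.0)
    neg = max(-min(x), 0.0)
    return pos / neg if neg > 0.0 else float("inf")


# Toy data (hypothetical): the copy misses 0.1 of the positive peak.
master = [0.0, 1.0, 0.0, -1.0]
copy = [0.0, 0.9, 0.0, -1.0]
level = level_below_master_db(master, copy)
asym = peak_asymmetry([0.5, -0.25, 0.1])
print(round(level, 1), asym)
```

With real files, `master` would be A and `other` would be C, F, H, or J, loaded with whatever audio reader you prefer.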
We can now contemplate in a more informed way the pros and cons of the approaches listed below:
(1) Guaranteed good sound quality, yet challenging to stream because of the data size: capture with 192/24 PCM; losslessly compress; store and deliver; uncompress.
(2) More involved algorithmically, yet resulting in more compact data, while still maintaining good sound quality: capture with 192/24 PCM; split into three parts: hearing-range sinusoids (encode at 48/24), noise (encode compactly), and transients (encode compactly); compress the three parts via lossless algorithms; store and deliver; uncompress the parts; combine the parts and deliver in 192/24 PCM. MQA supposedly does something like that.
(3) Lossy in regard to transients, yet still acceptable for "non-complex" music: capture with 192/24 PCM; properly resample to 48/24; losslessly compress; store and deliver; uncompress.
(4) Lossy in regard to transients, yet still acceptable for "simple" music: capture with 192/24 PCM; properly resample to 44.1/16; losslessly compress; store and deliver; uncompress.
(5) Not recommended (too many important transients will be lost): capture with 48/24 PCM; losslessly compress; store and deliver; uncompress.
(6) Not recommended (even more of the important transients will be lost): capture with 48/24 PCM; properly resample to 44.1/16; losslessly compress; store and deliver; uncompress.
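To put the "data size" side of these trade-offs in perspective, here is a quick back-of-the-envelope calculation of the raw PCM bit rates before any lossless compression. It assumes two channels (stereo), which is my assumption for the example; the actual savings from lossless compression depend on the material and are not modeled here.

```python
def pcm_bitrate_kbps(sample_rate_hz, bits_per_sample, channels=2):
    """Raw (uncompressed) PCM bit rate in kbit/s."""
    return sample_rate_hz * bits_per_sample * channels / 1000


# (sample rate in Hz, bits per sample) for the formats discussed above.
formats = {
    "192/24": (192_000, 24),
    "48/24": (48_000, 24),
    "44.1/16": (44_100, 16),
}

rates = {name: pcm_bitrate_kbps(sr, bits) for name, (sr, bits) in formats.items()}
for name, kbps in rates.items():
    print(f"{name}: {kbps:.1f} kbit/s")
```

So raw 192/24 stereo carries four times the data of 48/24 and roughly 6.5 times the data of CD-format 44.1/16.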