Ideally it would work this way, and in many streamers it does. There are some barriers here, though:
1. Measurements. I always measure with 24-bit signals. When the transport or device doesn't support 24 bits, the reduction to fewer bits generates artifacts. Usually that conversion is done poorly, by truncation rather than with proper dither. Where in the stack it happens, I don't know. If you never play anything at 24 bits, this is only a measurement artifact.
Unfortunately, in this case, even feeding the device 16-bit content didn't produce the theoretically correct digital output. It may have an audio processing pipeline that performs conversions. It may also have a noisy digital output clock.
2. Transport. We can only get transparency if the wireless transport supports 24 bits. AirPlay clearly does not, and in my past testing it can limit performance to about 94 dB. This is why I always like to see Roon support, as it provides the same fidelity as a wired transport. If you don't use Roon, then this won't work for you anyway.
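To illustrate why truncation shows up as distortion in a 24-bit measurement while proper dither does not, here is a minimal Python sketch. It assumes the device behaves like a plain 24-to-16-bit truncation (where that actually happens in the stack is unknown), and it uses a hypothetical very low-level test tone (about 0.7 LSB at 16 bits), a 4096-sample window, and TPDF dither as the "proper" alternative. All function names are made up for illustration.

```python
import math
import random

# Hypothetical test setup, chosen only to make the effect easy to see.
N = 4096      # samples in the analysis window
CYCLES = 16   # integer number of cycles, so each harmonic lands exactly on a DFT bin
AMP_24 = 179  # peak amplitude in 24-bit LSBs (about 0.7 LSB after reduction to 16 bits)


def make_sine_24bit():
    """A very low-level test tone as 24-bit integer samples."""
    return [round(AMP_24 * math.sin(2 * math.pi * CYCLES * n / N)) for n in range(N)]


def truncate_to_16(samples):
    """Assumed device behavior: drop the low 8 bits.
    Python's >> floors negative numbers, like truncating two's-complement words."""
    return [s >> 8 for s in samples]


def tpdf_dither_to_16(samples, rng):
    """Add triangular (TPDF) dither spanning +/-1 LSB at the 16-bit level, then round."""
    out = []
    for s in samples:
        d = rng.random() - rng.random()  # difference of two uniforms -> triangular on (-1, 1)
        out.append(round(s / 256 + d))
    return out


def harmonic_amplitude(x, harmonic):
    """Peak amplitude (in 16-bit LSBs) of the given harmonic, via a single-bin DFT."""
    k = harmonic * CYCLES
    re = sum(v * math.cos(2 * math.pi * k * n / N) for n, v in enumerate(x))
    im = sum(v * math.sin(2 * math.pi * k * n / N) for n, v in enumerate(x))
    return 2 * math.hypot(re, im) / N


if __name__ == "__main__":
    tone = make_sine_24bit()
    versions = {
        "truncated": truncate_to_16(tone),
        "TPDF dithered": tpdf_dither_to_16(tone, random.Random(1)),
    }
    for name, sig in versions.items():
        h1 = harmonic_amplitude(sig, 1)
        h3 = harmonic_amplitude(sig, 3)
        print(f"{name:>14}: H1 = {h1:.3f} LSB, H3 = {h3:.3f} LSB")
```

At this level, truncation turns the tone into a two-level square wave, so its 3rd harmonic comes out around 0.21 LSB: a discrete distortion spike of the kind a 24-bit measurement picks up. The dithered version keeps the fundamental near 0.70 LSB and leaves only low-level random noise in the harmonic bins. At higher signal levels the truncation error is smaller relative to the signal, but it remains a deterministic function of the signal rather than benign noise.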
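For context on the transport numbers, the textbook ceiling of an ideal N-bit channel with a full-scale sine is about 6.02·N + 1.76 dB. Below is a small Python sketch of that arithmetic. The function name is made up, and it models only an ideal quantizer, not any real transport.

```python
import math


def ideal_snr_db(bits: int, tpdf_dither: bool = False) -> float:
    """SNR of an ideal N-bit quantizer for a full-scale sine.

    Undithered, quantization noise power is LSB^2/12, which works out to
    20*log10(2^N) + 10*log10(1.5), i.e. about 6.02*N + 1.76 dB.
    TPDF dither triples the total noise power (LSB^2/4), costing about 4.77 dB.
    """
    snr = 20 * math.log10(2 ** bits) + 10 * math.log10(1.5)
    if tpdf_dither:
        snr -= 10 * math.log10(3)
    return snr


if __name__ == "__main__":
    for bits in (16, 24):
        print(f"{bits}-bit: {ideal_snr_db(bits):.1f} dB undithered, "
              f"{ideal_snr_db(bits, tpdf_dither=True):.1f} dB with TPDF dither")
```

That gives roughly 98 dB for 16 bits and 146 dB for 24 bits. A result of about 94 dB is therefore below even the ideal 16-bit figure, while a true 24-bit path leaves far more headroom than any DAC needs.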
Tons of streamers do this right, so the solution is out there, and the general rule you stated applies. But we need to measure to be sure, especially with budget OEM designs like this one.
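One way to check whether a device's digital output is actually bit-perfect is to play a known 16-bit file, capture the output, and compare sample by sample. The Python sketch below assumes both are already loaded as lists of integer samples for one channel, that the capture starts within `max_lag` samples of the file (an arbitrary bound), and that the test signal does not begin with silence, since a run of zeros would make the alignment ambiguous. `BitPerfectResult`, `find_lag`, and `check_bit_perfect` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class BitPerfectResult:
    lag: int            # capture latency in samples, found by alignment
    compared: int       # number of samples actually compared
    mismatches: int     # samples whose value differs from the source
    max_abs_error: int  # largest deviation, in LSBs

    @property
    def bit_perfect(self) -> bool:
        return self.compared > 0 and self.mismatches == 0


def find_lag(reference: Sequence[int], captured: Sequence[int],
             max_lag: int, probe_len: int = 256) -> int:
    """Pick the lag where the start of `reference` best matches `captured`,
    using the minimum sum of absolute differences over a short probe window.
    Assumes the probe is not silent; a run of zeros would align ambiguously."""
    probe = reference[:probe_len]
    best_lag, best_cost = 0, None
    for lag in range(max_lag + 1):
        window = captured[lag:lag + len(probe)]
        if len(window) < len(probe):
            break  # not enough captured samples left to test this lag
        cost = sum(abs(c - r) for c, r in zip(window, probe))
        if best_cost is None or cost < best_cost:
            best_lag, best_cost = lag, cost
    return best_lag


def check_bit_perfect(reference: Sequence[int], captured: Sequence[int],
                      max_lag: int = 48000) -> BitPerfectResult:
    """Align the capture to the source, then compare every overlapping sample."""
    lag = find_lag(reference, captured, max_lag)
    aligned = captured[lag:lag + len(reference)]
    errors = [c - r for c, r in zip(aligned, reference)]
    return BitPerfectResult(
        lag=lag,
        compared=len(errors),
        mismatches=sum(1 for e in errors if e != 0),
        max_abs_error=max((abs(e) for e in errors), default=0),
    )
```

Zero mismatches over a long, non-silent test file is good evidence of a clean path. Many small mismatches suggest some processing stage is touching the samples, though the numbers alone can't tell you which one.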