Even if you set the sampling rate by hand and it matches your material, you are still not protected from internal processing in the OS mixer.
This is not necessarily bad from a fidelity point of view; it only increases latency. That is not a problem for music streaming, only for gaming, recording, real-time communications, etc. But your music might get interrupted by system sounds or by other applications making sound.
The exception is if you have also turned on some of the effects processing that the mixer is able to do. There is no digital-to-analog conversion in the mixer, and if there is no sample rate conversion either, the same PCM that comes in will come out. Since the audio engine can only handle PCM, this can be a problem for applications that want to pass through encoded formats like Dolby, DTS, etc., as is typical in HT (home theater) applications: they cannot do pass-through unless they use direct mode. Music streamers send PCM anyway (as far as I know, almost all do).
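To make the "same PCM in, same PCM out" point concrete, here is a toy sketch in Python. It is not the Windows audio engine; it only assumes the mixer converts 16-bit PCM to floating point, optionally applies a gain (standing in for any effect), and converts back. The function name `mixer_round_trip` is made up for illustration.

```python
def mixer_round_trip(samples, gain=1.0):
    """Toy shared-mode path: int16 PCM -> float -> optional gain -> int16 PCM.

    Hypothetical model for illustration only, not a real OS mixer.
    """
    out = []
    for s in samples:
        f = (s / 32768.0) * gain          # int16 -> float in [-1.0, 1.0)
        v = int(round(f * 32768.0))       # float -> int16
        out.append(max(-32768, min(32767, v)))  # clamp to the int16 range
    return out


pcm_in = [0, 1, -1, 12345, -12345, 32767, -32768]

# No gain change, no resampling: the samples survive untouched.
print(mixer_round_trip(pcm_in) == pcm_in)            # True

# Any processing (here a simple gain) changes the samples.
print(mixer_round_trip(pcm_in, gain=0.5) == pcm_in)  # False
```

In this model the format conversion by itself costs nothing; it is the processing in between (effects, volume, resampling) that changes the samples.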
Your best bet is to try something like JRiver and its WDM driver. This is a virtual device driver that you set as the default OS device; it is able to route the music samples into JRiver, bypassing the mixer, and from there you can forward them to the ASIO or WASAPI exclusive device.
Voicemeeter is free, unlike JRiver. But you would not avoid going through the Windows audio engine to get to the virtual sound devices if the app cannot do direct mode. As I said above, though, this is not necessarily a problem. What Voicemeeter or JRiver will do is take any sample rate that the player provides and either feed it bit-perfect to the output or do a competent sample rate conversion if necessary.
Their app needs to be able to talk to an ASIO driver or the WASAPI exclusive interface. It can't right now; maybe in the future it will be able to. They were either lazy, rushed, or plain ignorant in this regard.
They don't need to go into those low-level APIs unless latency is an issue, which it is not in music streaming. Plain Windows direct mode is fine (it is not that different from the APIs they are using now in shared mode), but they will need to be prepared to do sample rate conversion themselves if they go direct, since no one in the middle will do it in direct mode.
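As a rough illustration of what "be prepared to do sample rate conversion" means for a player going direct, here is a minimal Python sketch. Everything in it is an assumption for the sake of the example: the helper names `choose_output_rate` and `resample_linear` are invented, the list of device rates would really come from querying the device, and linear interpolation is only a stand-in for the much better resampler a real player would use.

```python
def choose_output_rate(source_rate, device_rates):
    """Pick the rate to open the device at (hypothetical policy)."""
    if source_rate in device_rates:
        return source_rate                      # pass through untouched
    # Otherwise prefer an integer multiple of the source rate...
    multiples = sorted(r for r in device_rates if r % source_rate == 0)
    if multiples:
        return multiples[0]
    # ...then the nearest higher rate, then whatever the device tops out at.
    higher = sorted(r for r in device_rates if r > source_rate)
    return higher[0] if higher else max(device_rates)


def resample_linear(samples, src_rate, dst_rate):
    """Naive linear-interpolation resampler, for illustration only."""
    if src_rate == dst_rate or not samples:
        return list(samples)
    n_out = len(samples) * dst_rate // src_rate
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate           # position in the source signal
        j = int(pos)
        frac = pos - j
        a = samples[j]
        b = samples[min(j + 1, len(samples) - 1)]  # hold last sample at the end
        out.append(a + (b - a) * frac)
    return out


track_rate = 44100
dac_rates = [48000, 88200, 96000]               # assumed device capabilities
out_rate = choose_output_rate(track_rate, dac_rates)
print(out_rate)                                  # 88200
print(resample_linear([0, 10, 20, 30], track_rate, out_rate))
```

When the track's rate is one the device supports, the first branch returns it unchanged and no conversion happens, which is the bit-perfect case. The conversion only kicks in when the device cannot run at the source rate.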