That would be getting something for nothing. Dither randomizes the LSB so by definition, you lose bits and dynamic range. You can noise shape that so that in the audible band the effect is not there but then you better have ultrasonic spectrum to park the noise.
There are some subtleties here.
When sampling, it is a requirement that dither is used. This is often not appreciated. An ADC that has no dither applied is incorrectly implemented and will yield objectionable signal-correlated quantisation noise. This is true in any digital system, whether audio or anything else. Dither is not an option. It is a mathematical necessity to obtain correct operation from the quantiser. Dither is minimally AIWN of one half LSB, but it can be other distributions. Correct dither doesn't reduce the available bit depth. Incorrect dither might. Dither other than AIWN can lead to some surprising benefits. (Noise this low in amplitude is often below the inherent noise floor of the recording process. So long as there is AIWN in the chain, and most microphones will oblige, certainly any small diaphragm mic, there is often no real need to add it in explicitly. In early digital recordings this was often all that was done. But you need to know and be sure.)
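To make the point about the quantiser concrete, here is a minimal sketch in Python. The choices are mine, not part of the argument above: a rounding quantiser with a step of one LSB, triangular (TPDF) dither spanning +/-1 LSB as one commonly used distribution, and a test tone peaking at 0.4 LSB. The function names are purely illustrative.

```python
import math
import random

def quantise(x):
    """Round to the nearest integer, where one unit is one LSB."""
    return math.floor(x + 0.5)

def tpdf_dither(rng):
    """Triangular dither spanning +/-1 LSB (assumed choice): sum of two uniform +/-0.5 LSB values."""
    return rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)

def recovered_gain(output, reference):
    """Least-squares gain of the reference found in the output (1.0 means fully preserved)."""
    return sum(o * r for o, r in zip(output, reference)) / sum(r * r for r in reference)

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 20000
# A sine peaking at 0.4 LSB: entirely below the quantiser step.
signal = [0.4 * math.sin(2 * math.pi * 50 * i / n) for i in range(n)]

undithered = [quantise(s) for s in signal]
dithered = [quantise(s + tpdf_dither(rng)) for s in signal]

gain_undithered = recovered_gain(undithered, signal)
gain_dithered = recovered_gain(dithered, signal)
print(f"undithered gain: {gain_undithered:.3f}")
print(f"dithered gain:   {gain_dithered:.3f}")
```

Without dither every output sample is zero, so the tone is simply gone. With dither the output is noisy, but the tone is still in there at close to its original level, which is why correct dither does not cost you bit depth.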
The overarching theory is of course Shannon. I highly suggest that anyone interested read his original paper. It is perhaps my favourite scientific paper. Well written, groundbreaking, and one of the most important of the last century.
There are a lot of subtleties in the paper that go unappreciated.
The core point is signalling in a noisy channel. Every communication channel is a bandwidth-limited channel with a defined signal to noise ratio. The maximum information transfer rate is exactly determined by this pair of metrics. However, SNR need not be constant over the channel. It is the area under the curve of SNR against frequency (with SNR taken logarithmically, as in dB) that determines the total information capacity. (This is the key insight that allows ADSL to work.)
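As a rough numerical illustration, the capacity of a channel whose SNR varies with frequency can be written as the integral of log2(1 + SNR(f)) over the band. The sketch below integrates that numerically for two made-up SNR profiles over a 22.05 kHz band: a flat 96 dB, and a hypothetical one with 108 dB in the lower half and 84 dB in the upper half. The profiles and function names are assumptions for illustration only.

```python
import math

def capacity_bits_per_second(snr_db_at, f_lo, f_hi, steps=10000):
    """Midpoint-rule integral of log2(1 + SNR(f)) df over [f_lo, f_hi]."""
    df = (f_hi - f_lo) / steps
    total = 0.0
    for k in range(steps):
        f = f_lo + (k + 0.5) * df
        snr_linear = 10 ** (snr_db_at(f) / 10)  # dB to power ratio
        total += math.log2(1 + snr_linear) * df
    return total

def flat_profile(f):
    """Hypothetical channel: 96 dB SNR everywhere."""
    return 96.0

def traded_profile(f):
    """Hypothetical channel: 12 dB better in the lower half, 12 dB worse in the upper half."""
    return 108.0 if f < 11025 else 84.0

band_top = 22050.0
c_flat = capacity_bits_per_second(flat_profile, 0.0, band_top)
c_traded = capacity_bits_per_second(traded_profile, 0.0, band_top)
print(f"flat:   {c_flat:.0f} bit/s")
print(f"traded: {c_traded:.0f} bit/s")
```

The two totals agree to many digits: SNR given up in one part of the band buys SNR elsewhere with the capacity unchanged. The flat figure also lands close to 16 bits x 44100 samples per second, which is no coincidence.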
It is perfectly possible to resolve signal that is well below the noise. Humans can, and radar and sonar systems exploit this to the hilt. There is signal below tape hiss. Detecting it is covered by Shannon's theorem. The key is that you need time to get enough information into the system to allow a signal to be found. And there is no free lunch: you can't detect arbitrary signals, the signal must itself have a low information content, and overall the books will balance. Signals that are self-correlated are detectable, since self-correlation limits their information content. Eventually you exchange ability to detect signal in one part of the band with ability in another. But it can be done any way you like by perturbing the overall SNR/frequency curve, so long as the area remains constant. The bit depth does not automatically set the signal dynamic range in the channel. It constrains the overall information content, but not the dynamic range at any one frequency.
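A small sketch of detection below the noise, under assumptions of my own: the signal is a sine of known frequency and phase (about as low in information content as a signal gets), peaking at 0.1 against Gaussian noise with a standard deviation of 1.0, which is roughly -23 dB SNR. Correlating against a reference sine pulls the amplitude out, provided you give it enough samples.

```python
import math
import random

def estimate_amplitude(samples, period):
    """Correlate against a reference sine of known period and phase."""
    n = len(samples)
    acc = sum(x * math.sin(2 * math.pi * i / period) for i, x in enumerate(samples))
    return 2.0 * acc / n

rng = random.Random(1)   # fixed seed so the run is repeatable
period = 100             # samples per cycle (assumed)
true_amplitude = 0.1     # against noise sigma of 1.0: about -23 dB SNR

def noisy_tone(n):
    """The buried tone plus white Gaussian noise."""
    return [true_amplitude * math.sin(2 * math.pi * i / period) + rng.gauss(0.0, 1.0)
            for i in range(n)]

short_estimate = estimate_amplitude(noisy_tone(200), period)
long_estimate = estimate_amplitude(noisy_tone(200000), period)
print(f"200 samples:    {short_estimate:+.3f}")
print(f"200000 samples: {long_estimate:+.3f}")
```

With 200 samples the estimate is mostly noise. With 200000 it settles near the true 0.1. That is the time-for-information exchange: the estimate's uncertainty shrinks with the square root of the observation length, and it only works because the signal is so predictable.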
If we note that the human ear has poor SNR at high frequencies (say the top octave, and certainly above 15kHz), there is a lot of wasted channel capacity. There is no chance any human, even with perfect hearing, can hear down to the LSB at 15kHz. Zip. Not even the lowest couple of bits. So, in the digitisation process we can exchange SNR across the audible band. And this is exactly what all the various audio dither mechanisms do. They effectively perturb the dither in such a way that the SNR in the ear's critical mid bands is better than 96dB, whilst it is reduced at the higher frequencies.
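The mechanism can be sketched with the simplest possible shaper. This is an assumption for illustration: a first-order error-feedback loop with TPDF dither, not the psychoacoustically weighted curves that commercial dither algorithms use. The power of 32-sample block averages of the noise stands in as a crude proxy for low-frequency noise.

```python
import math
import random

def tpdf(rng):
    """Triangular dither spanning +/-1 LSB (assumed choice)."""
    return rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)

def requantise(samples, rng, shaped):
    """Dithered rounding to whole LSBs, optionally feeding the previous error back."""
    out = []
    previous_error = 0.0
    for x in samples:
        target = x - previous_error if shaped else x
        y = math.floor(target + tpdf(rng) + 0.5)
        previous_error = y - target  # total error made on this sample
        out.append(y)
    return out

def power(values):
    return sum(v * v for v in values) / len(values)

def block_means(values, size):
    """Average consecutive blocks: a crude low-pass filter."""
    return [sum(values[i:i + size]) / size
            for i in range(0, len(values) - size + 1, size)]

rng = random.Random(2)
n = 64000
signal = [1000.3 * math.sin(2 * math.pi * i / 4000) for i in range(n)]  # in LSB units

results = {}
for name, shaped in (("flat", False), ("shaped", True)):
    out = requantise(signal, rng, shaped)
    noise = [y - x for y, x in zip(out, signal)]
    results[name] = (power(noise), power(block_means(noise, 32)))
    print(f"{name:6s} total noise {results[name][0]:.4f}  low-band proxy {results[name][1]:.5f}")
```

The shaped version has more noise in total, yet much less of it at low frequencies. Nothing was gained for free; the noise was moved to where it matters less.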
You can be sure that nearly every CD you ever bought has been so dithered. Back in the early days companies were very proud of their dither algorithms, and sometimes named them on the liner notes. Every DAW you can buy has a final dither step, and usually a choice of dither algorithms.
What is critical is this: dithering is pretty much a final step. It is applied in a DAW when the entire production and mastering is complete and the result is created for distribution. Typically the internal 24 bit samples are being truncated to 16 bit. Again, even if no specialised dither is applied, you cannot simply truncate the bits from 24 to 16. You get quantisation artefacts. In exactly the same way as ADCs must dither, truncation must dither.
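Here is a hedged sketch of what goes wrong. It takes a 24 bit tone peaking at 100 (under half of a 16 bit LSB, which is 256 in 24 bit units) and reduces it to 16 bit two ways: by dropping the low 8 bits, and by adding TPDF dither of +/-1 16 bit LSB before rounding. The tone level, the dither choice and the function names are all my assumptions. The artefact is measured as the level of the third harmonic, in 16 bit LSBs.

```python
import math
import random

def truncate_to_16(s24):
    """Drop the low 8 bits with no dither."""
    return s24 >> 8

def dither_to_16(s24, rng):
    """Add triangular dither of +/-256 (one 16 bit LSB), then round to 16 bit."""
    dither = int(round((rng.random() - rng.random()) * 256))
    return (s24 + dither + 128) >> 8

def harmonic(samples, period, k):
    """Amplitude of the k-th harmonic in sine phase, by correlation."""
    n = len(samples)
    acc = sum(y * math.sin(2 * math.pi * k * i / period) for i, y in enumerate(samples))
    return 2.0 * acc / n

rng = random.Random(3)
period, n = 1000, 100000
source = [int(round(100 * math.sin(2 * math.pi * i / period))) for i in range(n)]

truncated = [truncate_to_16(s) for s in source]
dithered = [dither_to_16(s, rng) for s in source]

h3_truncated = harmonic(truncated, period, 3)
h3_dithered = harmonic(dithered, period, 3)
h1_dithered = harmonic(dithered, period, 1)
print(f"3rd harmonic, truncated: {h3_truncated:.3f}")
print(f"3rd harmonic, dithered:  {h3_dithered:.3f}")
print(f"fundamental, dithered:   {h1_dithered:.3f}")
```

Truncation turns the tone into a square wave flipping between 0 and -1, with a strong third harmonic that was never in the source. The dithered version has no measurable third harmonic and keeps the fundamental at about 100/256 of a 16 bit LSB, sitting in benign noise.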
Whilst shaped dither is related to noise shaping in DACs, as both are described by the same theory, the implementation and effects of the two are quite different. Dither results in in-band shaped noise. But the shaping is controlled so that it remains inaudible.
Internal to DAWs there is a lot of care and management needed in the digital chain. In the ideal case there is never any truncation or re-sampling applied to a signal. But if there is, it is critical that the algorithms do not lose the below-LSB signal. Plugins can be poorly behaved in this respect, and silently introduce quantisation noise into the chain. A 16 bit audio signal commonly has information below 96dB at some frequencies, and the DAW must ensure that is preserved.
This is also important when considering the ultimate quality of a DAC. A DAC that shows 96dB SNR does not actually provide perfect reproduction of CD quality music. Typically there is another 6 odd dB of real signal in the mid bands. Any DAC aspiring to be CD transparent needs to hit over 100dB SNR in the mid bands.