Fixed vs Floating point explanation

We need a 64-bit word size. It was da schnizzle way back when.
 
Most fixed-point DSPs work internally with high precision, often somewhere around 40 to 50 bits. That mostly mitigates the downsides of fixed-point integer calculations.
Yep, the most famous was the Pro Tools HD TDM cards, which worked at 48-bit fixed point.
 
I find it hard to believe that the noise or distortion limiter in any audio system is the DSP, fixed or floating point...
 
I wonder if all DSP makes use of floating point.
Most. Sometimes no floating-point processor is available or maybe some simple operations can be done in fixed-point.

There is a lot of summation (and sometimes multiplication) involved in DSP and it makes things a whole lot easier if you don't have to worry about overflowing an integer value (and clipping).
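A minimal sketch of the overflow problem, assuming 16-bit signed samples and two made-up sample values. The helper names `wrap_int16` and `saturate_int16` are hypothetical and only illustrate the two usual things that happen when an integer sum does not fit; the floating-point sum simply keeps the larger value.

```python
def wrap_int16(x: int) -> int:
    """Wrap an integer into the signed 16-bit range, like two's-complement overflow."""
    return (x + 0x8000) % 0x10000 - 0x8000


def saturate_int16(x: int) -> int:
    """Clip an integer to the signed 16-bit range (hard clipping)."""
    return max(-0x8000, min(0x7FFF, x))


# Two loud 16-bit samples (illustrative values) that are summed in a mixer.
samples = [20000, 20000]
total = sum(samples)  # 40000, which does not fit in 16 bits

wrapped = wrap_int16(total)      # overflow wraps around to a negative value
clipped = saturate_int16(total)  # saturating arithmetic clips at +32767

# The same sum in floating point, with full scale normalized to 1.0.
as_float = sum(s / 32768.0 for s in samples)

print(wrapped, clipped, as_float)
```

The floating-point result is above 1.0 (over full scale), but nothing is lost as long as the level is brought back down before conversion to the integer output format.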

The other big advantage if you're editing & mixing is that you (temporarily) can go over 0dB without clipping. And you can (temporarily) lower the levels without losing resolution. 32-bit floating-point covers roughly -760dB to +770dB relative to 1.0 (for normalized values), so no matter the actual numbers, for all practical purposes there is no upper or lower limit.
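The range of 32-bit floating point can be checked directly from the IEEE 754 single-precision bit patterns. The sketch below uses only the Python standard library; the helper names and the +20 dB gain example are assumptions for illustration, and the "fixed" path is just a simple model of a format that clips at full scale.

```python
import math
import struct


def bits_to_float32(hex_bits: str) -> float:
    """Interpret 8 hex digits as a big-endian IEEE 754 single-precision value."""
    return struct.unpack(">f", bytes.fromhex(hex_bits))[0]


def to_float32(x: float) -> float:
    """Round a Python float (64-bit) to the nearest 32-bit float."""
    return struct.unpack(">f", struct.pack(">f", x))[0]


def db(x: float) -> float:
    """Level in dB relative to 1.0."""
    return 20.0 * math.log10(abs(x))


FLT_MAX = bits_to_float32("7f7fffff")            # largest finite float32
FLT_MIN_NORMAL = bits_to_float32("00800000")     # smallest normalized float32
FLT_MIN_SUBNORMAL = bits_to_float32("00000001")  # smallest subnormal float32

print(f"max:            {db(FLT_MAX):8.1f} dB")
print(f"min normalized: {db(FLT_MIN_NORMAL):8.1f} dB")
print(f"min subnormal:  {db(FLT_MIN_SUBNORMAL):8.1f} dB")

# Temporarily going over full scale: boost by +20 dB, then cut by 20 dB.
sample = 0.5
boost = 10.0  # a linear gain of 10 is +20 dB

float_path = to_float32(to_float32(sample * boost) / boost)  # survives intact
fixed_path = min(sample * boost, 1.0) / boost  # clipped at full scale first

print(float_path, fixed_path)
```

In the floating-point path the sample comes back unchanged, while the path that clips at full scale returns a different (damaged) value.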

Reading this blog I'd say it should give better SNR.
DACs & ADCs are integer-based so you can't avoid the quantization noise. But I can't hear quantization at 16 bits... Most people can't... and at 24 bits it's simply not an issue. (You can hear it at 8 bits.)
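For a rough sense of scale, the textbook signal-to-quantization-noise ratio for a full-scale sine wave and an ideal uniform quantizer is about 6.02 dB per bit plus 1.76 dB. The sketch below just evaluates that formula; the function name is made up, and real converters fall short of these ideal figures.

```python
import math


def ideal_quantization_snr_db(bits: int) -> float:
    """Ideal SNR of a full-scale sine through an N-bit uniform quantizer.

    Equivalent to the usual 6.02 * N + 1.76 dB rule of thumb.
    """
    return 20.0 * math.log10(2 ** bits) + 10.0 * math.log10(1.5)


for bits in (8, 16, 24):
    print(f"{bits:2d} bits: {ideal_quantization_snr_db(bits):6.1f} dB")
```

That works out to roughly 50 dB at 8 bits, 98 dB at 16 bits, and 146 dB at 24 bits, which lines up with quantization being audible at 8 bits and a non-issue at 24.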
 
There are floating-point (or equivalent, e.g. multiplying) ADCs and DACs though I am unaware of any used for mainstream audio devices. And they still have quantization noise along with all the other nonlinearities of data conversion. The ones I designed generally had a small "mantissa" converter with the "exponent" used to adjust dynamic range so, while dynamic range was large, resolution was not all that great (did not need to be for the applications targeted). But mostly in my world the "small signal surrounded by big signals in a wide bandwidth" problem meant floating-point ADCs did not help much.
 
Lots of DSP is done in fixed-point. Lots of DSP is done in floating point as well. However, the implementation of the code must be altered accordingly. For example, as I talked about in another blog posting, a biquad implemented in Direct Form 1, doing low-frequency, high-Q filtering at a high sampling rate (say, 192 kHz) can go down to a signal level below -150 dB FS internally. If you're running a fixed-point system, this would be dumb - so you change the biquad implementation to suit. In a double-precision floating point system, this would be no problem, so you don't need to worry.
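A rough sketch of where numbers that small can come from. This is not taken from the blog post; it assumes a low-pass biquad designed with the widely used RBJ "Audio EQ Cookbook" formulas, with made-up settings of 10 Hz, Q = 10 and a 192 kHz sampling rate. The feed-forward coefficient `b0` that scales the input in a Direct Form 1 structure ends up below -150 dB, which is smaller than one LSB of a 24-bit fixed-point coefficient.

```python
import math


def rbj_lowpass(fc: float, q: float, fs: float):
    """Low-pass biquad coefficients (RBJ cookbook), normalized so a0 = 1."""
    w0 = 2.0 * math.pi * fc / fs
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    # (1 - cos(w0)) / 2 written as sin^2(w0 / 2) to avoid cancellation at small w0.
    b0 = math.sin(w0 / 2.0) ** 2 / a0
    b1 = 2.0 * b0
    b2 = b0
    a1 = -2.0 * math.cos(w0) / a0
    a2 = (1.0 - alpha) / a0
    return (b0, b1, b2), (a1, a2)


def db(x: float) -> float:
    """Level in dB relative to 1.0."""
    return 20.0 * math.log10(abs(x))


# Illustrative settings only: low frequency, high Q, high sampling rate.
(b0, b1, b2), (a1, a2) = rbj_lowpass(fc=10.0, q=10.0, fs=192000.0)

print(f"b0 = {b0:.3e}  ({db(b0):.1f} dB)")
print(f"a1 = {a1:.9f}, a2 = {a2:.9f}")

# Quantize b0 to a signed 24-bit fraction (23 fractional bits).
b0_q23 = round(b0 * 2 ** 23)
print("b0 as a 24-bit fixed-point integer:", b0_q23)
```

With these settings `b0` rounds to zero in 24-bit fixed point, so a naive port of the same structure would pass no signal at all, whereas double-precision floating point represents it without trouble.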

Regarding the 48-bit fixed point, mentioned above: This was a clever use of the 48 bits, according to the stories I've been told. The signal came in with a 24-bit word length, which was "inserted" into the 48-bit processor in the middle - with 8 bits below (so, 32-bit fixed point if you didn't use a gain above 0 dB) and 8 above (because the system has an internal mixer with the possibility of adding gain above 0 dB). At about 6 dB per bit, 8 bits above gives you about 48 dB of headroom.
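The headroom arithmetic can be sketched as follows. Each extra bit above the signal doubles the largest representable value, which is about 6.02 dB, so 8 bits is about 48 dB and 4 bits is about 24 dB. The layout constants below (`FOOT_BITS`, `GUARD_BITS`) simply mirror the "8 below, 24 in the middle, 8 above" description in the post and are hypothetical names; this is not a claim about how the actual hardware was built.

```python
import math

SAMPLE_BITS = 24  # incoming word length
FOOT_BITS = 8     # extra bits below the sample (assumed from the description)
GUARD_BITS = 8    # extra bits above the sample (assumed from the description)


def headroom_db(guard_bits: int) -> float:
    """Headroom gained from extra bits above full scale, about 6.02 dB per bit."""
    return 20.0 * math.log10(2 ** guard_bits)


def place_sample(sample24: int) -> int:
    """Put a signed 24-bit sample 'in the middle' by shifting it up past the foot bits."""
    return sample24 << FOOT_BITS


print(f"{GUARD_BITS} guard bits: {headroom_db(GUARD_BITS):.1f} dB of headroom")
print(f"4 guard bits: {headroom_db(4):.1f} dB of headroom")

# Sum 256 positive full-scale samples without overflowing the wider word.
full_scale = 2 ** (SAMPLE_BITS - 1) - 1
accumulator = sum(place_sample(full_scale) for _ in range(2 ** GUARD_BITS))
largest_value = 2 ** (SAMPLE_BITS + FOOT_BITS + GUARD_BITS - 1) - 1

print("sum fits:", accumulator <= largest_value)
```

With 8 guard bits, 256 full-scale samples can be summed before the word overflows, which is the same 48 dB expressed as a linear factor.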

Cheers
-geoff
 
Great info, Geoff. Could you link to the blog post? I have a feeling many among us would like to follow up. As for me, I knew there was a reason I used DF2 way back when... :)
 
One more thing... someone above said that you can't hear dither/quantisation of a 16-bit system.

This might be true in a lot of cases, but it's dependent on the gain structure of the entire system, the sensitivity of the loudspeakers, and the dynamic range of the signal. I talked about this in another chapter in the "high res" series here:
https://www.tonmeister.ca/wordpress/2021/07/05/high-res-audio-part-12-outputs/
but the setup for it is here:
https://www.tonmeister.ca/wordpress/2021/06/24/high-res-audio-part-6-noise-noise-noise/

Cheers
-geoff
 