32 Bit Float Explained

maxotics · Sep 14, 2023

kchap said:
Is 32 bit FP a technological cul-de-sac?

OK, I watched a couple of YouTube videos and read a few web pages. My mathematics and engineering are very basic but I can accept that stacking ADCs can give an additional 2 or 3 bits of resolution and there may be advantages in using 32 bit FP.

Imagine a few years' time when it is possible to build an ADC with 23 bits of resolution and by using stacking in conjunction with SOTA amp design, the resolution can be extended to 25~26 bits. At that point you need to go to 32 bit integers or 64 bit FP.

Hi kchap, thanks for watching my videos. I am not skilled at mathematics and engineering (though I suppose I do the latter with computers). I am the boy pointing out the King has no clothes.

When you talk about an additional 2 or 3 bits of resolution would you want to save those two bits as those precise bits, or as a mathematical formula N^e which could potentially calculate to 2.11111111_?

Indeed, why did they go to 32-bit float instead of 32-bit fixed (integers) (which is what the electronics manufacturers do internally on the chips--duh)? Would you do that? I can't think of any engineer who would. I am waiting for any engineer here to explain why there is a benefit of 32-bit float over 32-bit fixed when digitizing an analog current. Why don't they answer, categorically? Am I wrong and they are being polite? Do they just not know? Are they unsure? There's nothing secret about how 32-bit fixed and 32-bit float work. The latter is complicated, but fundamentally simple if you ignore the bit-math and focus on what it does mechanically, so to speak.

I suspect there is too much thinking in "waves", "harmonic distortion", "high frequency", etc, which are human constructs and have nothing to do with analog voltages/currents. At any point in time you try to count the properties of electricity. You can only discern the above properties, like "waves", if you "look at them" as a group. No electron knows it's part of a wave or noise, whether it came from the microphone or a neutrino hitting it from a gazillion light years away. All that "frequency" talk has absolutely nothing to do with measuring a charge at 1/48,000th of a second.

Sure, you can use mathematical/digital methods to reconstruct what you believe the analog value should have been, you can "re-sample" etc., but doing that is a distortion--even if it is, in the end, pleasing to the ear. Shouldn't we keep separated what is real and what is artificial?

It's sad so few will confirm (or correct) that but stand idly by

markanini · Sep 14, 2023

maxotics said:
Exactly, 32-bit float is, if I'm generous, a short-hand way of saying "waveform or digital reconstruction". I mean fine! I don't stop taking photos with my phone just because it's making up pixel values in a way Google has found to be pleasing to people. If a manufacturer creates an audio interface that produces more pleasing voice recordings with less apparent clipping then I'd use it! I wouldn't hesitate. I get red-ass because when we change the meaning of words that have very explicit meanings to sell more products we are producing propaganda, not technology.

I'd rather not play the WWII card, but isn't that what happened in Germany? Isn't it what's happened in Russia today? Is it happening in the U.S.? If the emotional attribute of a word means more than its empirical meaning, if "32-bit float" means you'll be rich and successful, is that good for society? Is it good that people believe EV means less carbon released into the atmosphere? Should technical people allow that? Technical people during the pandemic tried to say there is no empirical evidence that masks work very well. They were silenced. There are scientists that question the side-effects of mRNA technology? Can they speak freely about their technical views? No, because as soon as they say "side effect" they're shouted down with "people are dying you out of touch know-it-all".

I'm sure many people reading this believe I'm making a mountain out of a molehill, as it used to be said. We can't know in science what are important words and what unimportant. Should we try to defend them all equally?

If everyone on Audio Science Review is okay with "32-bit float" exchanging its technical meaning for one of gold dust and fairies, okay

Everyone accepted that 16 and 24 bit devices wouldn't utilize the maximum theoretical SNR. Why would 32-bit float be different?

maxotics · Sep 14, 2023

markanini said:
Everyone accepted that 16 and 24 bit devices wouldn't utilize the maximum theoretical SNR. Why would 32-bit float be different?

I feel like I'm being gaslit

There is no SNR in the quantization of an analog current. If there is, can you explain that to me? (S)ignal is an abstract concept of information (instead of randomness/entropy etc). Therefore, any single quantization of analog current, by definition, has a 0/0 ratio. The signal or noise is not known.

There is no limit to the theoretical quantization of a current. If we accept that there are 1.602 x 10^-19 electrons in a millivolt, then certainly, theoretically, you'd need something like 266 bit memory (if ChatGPT is to be believed) to represent them.

To rephrase your question, if 16 or 24-bit doesn't utilize the full SNR we can meaningfully calculate, should we use a higher bit depth? Absolutely. But why "float" over "fixed"? What is the benefit? To me it is worse than 32-bit fixed.

As I've been saying, you only ask about "32-bit float" because of what is essentially propaganda/marketing. I too once believed it. Not judging

KSTR · Sep 14, 2023

For a stacking ADC in an integrated system 32bit (or 64bit) floating point output is a natural choice as the crossfade process leads to fractional bits anyway... and any post-processing (like in a guitar modelling amp etc) is usually done in floating point as well.

For a standalone ADC or DAC, 32bit integer is all that's ever needed and that is readily available with ASIO and other driver schemes.

maxotics · Sep 14, 2023

KSTR said:
For a stacking ADC in an integrated system 32bit (or 64bit) floating point output is a natural choice as the crossfade process leads to fractional bits anyway... and any post-processing (like in a guitar modelling amp etc) is usually done in floating point as well.

For a standalone ADC or DAC, 32bit integer is all that's ever needed and that is readily available with ASIO and other driver schemes.

We may be saying the same thing? I'm not sure. This is vague to me: "For a stacking ADC in an integrated system 32bit (or 64bit) floating point output is a natural choice"

I haven't seen 32-bit or 64-bit "floating" point data as an output of any ADC's strictly analog-to-digital electronics. Sure, some of the chips these days perform post-ADC data manipulations which could be written as floating point. I mean, write what you want

There can only be one analog input. There can only be one digital output. Sure, you can perform all kinds of electronic manipulations on the current. You can do all kinds of digital manipulations on the digital output.

In the end, there can only be one tire, one road

Can you point me to a data sheet where the analog to digital conversion process is written out as floating point? No post digital processing. Thanks!

KSTR · Sep 14, 2023

@maxotics, for the ADC proper (the chip or a channel of a chip) the natural format is integer as all known (to me) ADCs have integer internal processing and the chip interface (I2S) is integer. But in a stacked ADC system where multiple channels are combined algorithmically with a DSP a floating point format is more adequate as the output format. In some cases fixed point integer could be an option for computing, that depends on the architecture of the DSP or generic CPU used in the system.
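To make the "combined algorithmically" step concrete, here is a minimal sketch of one way two integer ADC paths might be crossfaded into a single output. It is not taken from any product or data sheet: the gain ratio `HIGH_GAIN`, the thresholds `FADE_START` and `FADE_END`, and the function `combine` are all hypothetical. The point is only that scaling and weighting integer codes produces fractional values, which a floating-point output can carry directly.

```python
# Hypothetical parameters for illustration only.
HIGH_GAIN = 16      # the high-gain path sees the same input amplified 16x
FADE_START = 1000   # below this low-gain code magnitude, use the high-gain path only
FADE_END = 2000     # above this, use the low-gain path only


def combine(low_gain_code: int, high_gain_code: int) -> float:
    """Crossfade two integer ADC codes into one value in low-gain code units."""
    level = abs(low_gain_code)
    # Refer the high-gain reading back to the low-gain scale (often fractional).
    high_path = high_gain_code / HIGH_GAIN
    if level <= FADE_START:
        weight_high = 1.0
    elif level >= FADE_END:
        weight_high = 0.0
    else:
        weight_high = (FADE_END - level) / (FADE_END - FADE_START)
    return weight_high * high_path + (1.0 - weight_high) * low_gain_code


print(combine(100, 1607))    # quiet sample: taken from the high-gain path
print(combine(1500, 24008))  # mid-level sample: a blend of both paths
print(combine(5000, 0))      # loud sample: high-gain path ignored (it would be clipped)
```

In this sketch the quiet sample comes out between two low-gain integer steps, which is the "fractional bits" effect described above. A fixed-point output with extra fractional bits could carry the same value.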

maxotics · Sep 14, 2023

KSTR said:
@maxotics, for the ADC proper (the chip or a channel of a chip) the natural format is integer as all known (to me) ADCs have integer internal processing and the chip interface (I2S) is integer. But in a stacked ADC system where multiple channels are combined algorithmically with a DSP a floating point format is more adequate as the output format. In some cases fixed point integer could be an option for computing, that depends on the architecture of the DSP or generic CPU used in the system.

You're talking about digital processing, not the quantization of an analog current. I feel like I keep trying to talk about apples but you bring up oranges

But okay, I'll roll with you. So you have an integer of 4 from the first "stacked" ADC, then an integer of 3 from the second ADC (pains me to talk this way), then you want to combine them "algorithmically" by adding them up and dividing by 2. So 4+3=7, with the result of 3.5, which you save as a floating point number. I mean okay, but you never measured 3.5 of a current. You measured a 4 and a 3.

3.5 only exists as an abstraction. It's a symbolic manipulation not an empirical piece of data.

Why do you keep bringing up DSPs? What does a DSP have to do with the core function of analog to digital conversion? (D)igital (S)ignal (P)rocessing. There is no (A)nalog there, except as the one/off current as you mentioned with the I2S protocol or similar.

antcollinet · Sep 14, 2023

maxotics said:
Exactly, 32-bit float is, if I'm generous, a short-hand way of saying "waveform or digital reconstruction". I mean fine! I don't stop taking photos with my phone just because it's making up pixel values in a way Google has found to be pleasing to people. If a manufacturer creates an audio interface that produces more pleasing voice recordings with less apparent clipping then I'd use it! I wouldn't hesitate. I get red-ass because when we change the meaning of words that have very explicit meanings to sell more products we are producing propaganda, not technology.

I'd rather not play the WWII card, but isn't that what happened in Germany? Isn't it what's happened in Russia today? Is it happening in the U.S.? If the emotional attribute of a word means more than its empirical meaning, if "32-bit float" means you'll be rich and successful, is that good for society? Is it good that people believe EV means less carbon released into the atmosphere? Should technical people allow that? Technical people during the pandemic tried to say there is no empirical evidence that masks work very well. They were silenced. There are scientists that question the side-effects of mRNA technology? Can they speak freely about their technical views? No, because as soon as they say "side effect" they're shouted down with "people are dying you out of touch know-it-all".

I'm sure many people reading this believe I'm making a mountain out of a molehill, as it used to be said. We can't know in science what are important words and what unimportant. Should we try to defend them all equally?

If everyone on Audio Science Review is okay with "32-bit float" exchanging its technical meaning for one of gold dust and fairies, okay

Talking about off with the fairies - I (again) have no idea what you are trying to say with your recent posts.

At the end of the day - the voltage or the current is not the sound. It is just an analogue of the sound. A representation of it. Pressure waves measured and represented as electrical waves.

Then the digital sampling of that waveform (one individual sample is meaningless and useless - I have no idea why you are decrying thinking of waves and frequencies - without them we have no music) is just another analogue (in the meaning of representation) of the electrical signal.

You can represent that as 24 bit integer, or 32 bit float (with similar resolution), or even 32 bit integer - either way it doesn't really matter, because 24 bits is already more resolution than we are able to have in the electrical domain. The LS 2 to 8 bits are just sampling noise, depending on the quality of the electronics. And even if they weren't it still wouldn't matter since our auditory system is not able to detect anything 144dB down.
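The 144 dB figure can be checked with a short calculation. The sketch below computes the ratio of full scale to one least-significant bit for an ideal N-bit integer format, 20·log10(2^N). The function name is made up for illustration. The "similar resolution" remark follows from IEEE 754 single precision carrying a 24-bit significand (23 stored bits plus an implicit leading bit).

```python
import math


def integer_dynamic_range_db(bits: int) -> float:
    """Ratio of full scale to one LSB, in dB, for an ideal N-bit integer format."""
    return 20 * math.log10(2 ** bits)


for bits in (16, 24, 32):
    print(f"{bits}-bit integer: {integer_dynamic_range_db(bits):.1f} dB")

# IEEE 754 binary32 has a 24-bit significand, so at any given exponent its
# step size relative to the value is comparable to a 24-bit integer's.
FLOAT32_SIGNIFICAND_BITS = 24
print(f"float32 significand: {integer_dynamic_range_db(FLOAT32_SIGNIFICAND_BITS):.1f} dB")
```

This is the format's arithmetic only; it says nothing about what the analog electronics in front of the converter can actually deliver.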

KSTR · Sep 14, 2023

antcollinet said:
Talking about off with the fairies - I (again) have no idea what you are trying to say with your recent posts.

Same for me here. Looks like he lacks even basic understanding of how DACs and ADCs work and how they are used, hence compensated with philosophical outpourings.

DonH56 · Sep 14, 2023

I have not been following this (or any, lately) thread so apologies if this has already been explained. A few comments:

maxotics said:
I feel like I'm being gaslit. There is no SNR in the quantization of an analog current. If there is, can you explain that to me? (S)ignal is an abstract concept of information (instead of randomness/entropy etc). Therefore, any single quantization of analog current, by definition, has a 0/0 ratio. The signal or noise is not known.

Any time you quantize a signal you create quantization noise, and thus can calculate the SNR relative to the resolution of the quantization (ADC or DAC). It does not matter if the signal is current, voltage, pressure, temperature, or whatever -- quantization creates an error by definition and limits the maximum theoretical SNR. There are several articles on sampling and signal conversion linked in my signature, as well as numerous other references.

Other noise sources (thermal/Johnson, shot noise, flicker noise, etc.), distortion, and so forth further degrade an ideal quantizer.
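As a rough illustration of the quantization-noise point, the sketch below quantizes a full-scale sine to N bits and compares the measured SNR with the textbook approximation for an ideal quantizer, SNR ≈ 6.02·N + 1.76 dB. The test frequency, sample count, and function names are arbitrary choices made for this example, and no other noise source is modeled.

```python
import math


def theoretical_snr_db(bits: int) -> float:
    """Textbook SNR of an ideal N-bit quantizer driven by a full-scale sine."""
    return 6.02 * bits + 1.76


def measured_snr_db(bits: int, num_samples: int = 50_000) -> float:
    """Quantize a full-scale sine to `bits` bits and measure signal/error power."""
    step = 2.0 / (2 ** bits)  # the range -1..+1 divided into 2^N steps
    signal_power = 0.0
    noise_power = 0.0
    for n in range(num_samples):
        # Arbitrary frequency (cycles per sample), chosen not to repeat quickly.
        x = math.sin(2 * math.pi * 0.123456789 * n)
        quantized = round(x / step) * step
        signal_power += x * x
        noise_power += (quantized - x) ** 2
    return 10 * math.log10(signal_power / noise_power)


for bits in (8, 12, 16):
    print(f"{bits} bits: measured {measured_snr_db(bits):.2f} dB, "
          f"theory {theoretical_snr_db(bits):.2f} dB")
```

The error exists for every individual sample (it is the difference between the input and the nearest step); the SNR is a statistic over many such samples.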

maxotics said:
There is no limit to the theoretical quantization of a current. If we accept that there are 1.602 x 10^-19 electrons in a millivolt, then certainly, theoretically, you'd need something like 266 bit memory (if ChatGPT is to be believed) to represent them.

The limit for any signal is its fundamental unit, like electron charge, noise floor of the medium (air, etc. has a fundamental noise floor). Analog current at the micro level is not a steady flow, and anytime you introduce an active device to create a real circuit you generate additional noise.

maxotics said:
To rephrase your question, if 16 or 24-bit doesn't utilize the full SNR we can meaningfully calculate, should we use a higher bit depth? Absolutely. But why "float" over "fixed"? What is the benefit? To me it is worse than 32-bit fixed.

Floating-point numbers are used to extend the system's dynamic range without resorting to very large integers that waste memory and processing power. This is often in the processing where various operations like filters can increase the signal level and cause internal clipping without floating-point (or much larger fixed-point -- floating-point often uses less memory). On the quantization side, a simple (but common) system taking advantage of floating-point dynamic range uses one or more converters and gain-ranging as described previously. Sometimes effective resolution is also varied.

A system I have often seen comprises a single ADC with selectable gain (gain ranging) before the ADC to expand its range. Here is a very simplified example. The ADC has a range of 1 V (or 1 A if you prefer) and 16-bit resolution. When the signal drops near or below an lsb, an additional gain stage is switched in before the ADC, amplifying the signal back to full-scale range of the ADC. Repeat as needed to cover the signal range desired. Say you have four ranges, then instead of using 4 x 16 = 64 bit words in memory, you could use floating-point format with 16-bit mantissa and 2-bit exponent, saving copious memory.
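The simplified gain-ranging example above can be sketched in a few lines. This follows the numbers in the post (a 16-bit ADC with a 1 V range, four ranges, hence a 2-bit exponent), but the function names, the unipolar 0..1 V input, and the choice of one full 2^16 of gain per stage are assumptions made for illustration; a real design would use far smaller gain steps with overlap between ranges.

```python
ADC_BITS = 16
MAX_CODE = 2 ** ADC_BITS - 1
FULL_SCALE_VOLTS = 1.0
NUM_RANGES = 4                    # four gain ranges fit in a 2-bit exponent
GAIN_PER_STAGE = 2 ** ADC_BITS    # assumption: each stage lifts ~1 LSB back to full scale


def encode(voltage: float):
    """Return (range_index, adc_code) using the highest gain that does not clip."""
    if not 0.0 <= voltage <= FULL_SCALE_VOLTS:
        raise ValueError("voltage outside the unipolar 0..1 V range of this sketch")
    for stage in range(NUM_RANGES - 1, -1, -1):  # try the highest gain first
        amplified = voltage * GAIN_PER_STAGE ** stage
        if amplified <= FULL_SCALE_VOLTS:
            return stage, round(amplified / FULL_SCALE_VOLTS * MAX_CODE)
    raise AssertionError("unreachable: stage 0 always fits")


def decode(stage: int, code: int) -> float:
    """Undo the gain to recover the approximate input voltage."""
    return code / MAX_CODE * FULL_SCALE_VOLTS / GAIN_PER_STAGE ** stage


for volts in (0.5, 1e-6, 1e-11):
    stage, code = encode(volts)
    print(f"{volts:g} V -> range {stage}, code {code}, decoded {decode(stage, code):g} V")

print("bits stored per sample:", ADC_BITS + 2)  # 16-bit mantissa + 2-bit exponent
```

Each sample is stored as a 16-bit code plus a 2-bit range index, 18 bits in total, rather than one 64-bit word covering the whole span.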

maxotics said:
As I've been saying, you only ask about "32-bit float" because of what is essentially propaganda/marketing. I too once believed it. Not judging

I have only seen a few 32-bit ADCs that actually approached ideal 32-bit performance (noise and distortion), and their bandwidth was very limited (much less than the audio band), but I have not been following them (my career was in higher-speed and lower-resolution devices). In audio (and other) systems floating-point is very helpful when capturing wide dynamic range signals (e.g. live recording using gain-ranged ADCs) and subsequent mixing and processing (EQ etc.) in the digital domain where the inputs from multiple ADCs are combined and processed to create the final product.

HTH - Don

maxotics · Sep 14, 2023

KSTR said:
Same for me here. Looks like he lacks even basic understanding of how DACs and ADCs work and how they are used, hence compensated with philosophical outpourings.

So you're all ganging up on me? Let's say I don't have a "basic understanding" of how DACs or ADCs work. Does that mean you do?

@antcollinet wrote "Then the digital sampling of that waveform (one individual sample is meaningless and useless - I have no idea why you are decrying thinking of waves and frequencies - without them we have no music) is just another analogue (in the meaning of representation) of the electrical signal."

He's complicating the question. When did I ever say we shouldn't talk about waves or frequencies? I simply said they have nothing to do with the simple counting of electrons in a current (and digitizing them).

There is no "digital sampling of a waveform". He'd have to explain it here, slowly, explicitly, because as you said, I know nothing. How would one digitally sample a wave at the beach?

What he said is like someone with their house burning saying to go to the seashore and get water and when the person asks how much. He answers "3 waves".

A "sample" is a sample of something. Sure, it can be a sample of digital data, but in quantizing electrons to numbers it is a sample of current. Many here keep confusing the two. When I point it out, more jargon spills out.

maxotics · Sep 14, 2023

DonH56 said:
I have not been following this (or any, lately) thread so apologies if this has already been explained. A few comments:

Any time you quantize a signal you create quantization noise, and thus can calculate the SNR relative to the resolution of the quantization (ADC or DAC). It does not matter if the signal is current, voltage, pressure, temperature, or whatever -- quantization creates an error by definition and limits the maximum theoretical SNR. There are several articles on sampling and signal conversion linked in my signature, as well as numerous other references.

Other noise sources (thermal/Johnson, shot noise, flicker noise, etc.), distortion, and so forth further degrade an ideal quantizer.

The limit for any signal is its fundamental unit, like electron charge, noise floor of the medium (air, etc. has a fundamental noise floor). Analog current at the micro level is not a steady flow, and anytime you introduce an active device to create a real circuit you generate additional noise.

Floating-point numbers are used to extend the system's dynamic range without resorting to very large integers that waste memory and processing power. This is often in the processing where various operations like filters can increase the signal level and cause internal clipping without floating-point (or much larger fixed-point -- floating-point often uses less memory). On the quantization side, a simple (but common) system taking advantage of floating-point dynamic range uses one or more converters and gain-ranging as described previously. Sometimes effective resolution is also varied.

A system I have often seen comprises a single ADC with selectable gain (gain ranging) before the ADC to expand its range. Here is a very simplified example. The ADC has a range of 1 V (or 1 A if you prefer) and 16-bit resolution. When the signal drops near or below an lsb, an additional gain stage is switched in before the ADC, amplifying the signal back to full-scale range of the ADC. Repeat as needed to cover the signal range desired. Say you have four ranges, then instead of using 4 x 16 = 64 bit words in memory, you could use floating-point format with 16-bit mantissa and 2-bit exponent, saving copious memory.

I have only seen a few 32-bit ADCs that actually approached ideal 32-bit performance (noise and distortion), and their bandwidth was very limited (much less than the audio band), but I have not been following them (my career was in higher-speed and lower-resolution devices). In audio (and other) systems floating-point is very helpful when capturing wide dynamic range signals (e.g. live recording using gain-ranged ADCs) and subsequent mixing and processing (EQ etc.) in the digital domain where the inputs from multiple ADCs are combined and processed to create the final product.

HTH - Don

I don't disagree with what you wrote. But I don't feel you're reading me closely. For example, you wrote, "Floating-point numbers are used to extend the system's dynamic range". No, they're used to extend its range, min and max. And they do so at the cost of precision (dynamism). To compare apples to apples, you would have to say 32-bit float extends the dynamic range from 24-bit float because the precision of those two would be equal. Will you gang up on me like everyone else or go back to read more closely what my argument is (and isn't)? Thanks for your input!

Hayabusa · Sep 14, 2023

voodooless said:
While very impressive, still very far off 192 dB and still within the 144 dB 24 bit integer gives us.

If you want to do any DSP with 32 or 24 bits integers you need a lot more bits in your multiplication...
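A quick way to see the multiplication point: the product of two signed N-bit integers can need up to 2N bits, so a fixed-point DSP multiply of 24-bit samples by 24-bit coefficients wants a 48-bit result (and more again for accumulation). The helper name below is made up for this sketch; it relies only on Python's arbitrary-precision integers.

```python
def signed_bits_needed(value: int) -> int:
    """Smallest two's-complement width (in bits) that can hold `value`."""
    if value >= 0:
        return value.bit_length() + 1          # +1 for the sign bit
    return (-value - 1).bit_length() + 1


for bits in (16, 24, 32):
    most_negative = -(2 ** (bits - 1))         # e.g. -8388608 for 24 bits
    worst_case_product = most_negative * most_negative
    print(f"{bits}-bit x {bits}-bit -> up to "
          f"{signed_bits_needed(worst_case_product)} bits")
```

Floating-point hardware sidesteps the growing word length by renormalizing after each operation, at the cost of rounding the result to the significand width.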

maxotics · Sep 14, 2023

Hayabusa said:
If you want to do any DSP with 32 or 24 bits integers you need a lot more bits in your multiplication...

I get it, I really do. But I don't look at an ADC as set of electronics that must exist within a DSP. Many do of course. When someone plugs in a microphone and it isn't working I don't say, "I'm not getting a current," I say, "I'm not getting a signal", even though signal has nothing to do with the problem. We all use short-hand in discussing technical subjects.

I say one must be careful not to lose sight of the fundamentals, of what really goes on. It's SLOPPY thinking. It leads to errors because you don't know the difference. If a NASA engineer did all their calculations using 32-bit float would anyone be surprised, after all the analysis is done, that a spacecraft crashed because a 32-bit float value was imprecise and gave a wrong value for the purpose at hand?

One can use 32-bit float wherever they want. I am NOT the 32-bit float police

But they can't use it to count electrons. Represent them, sure. Count them. No.
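The counting limit of 32-bit float is easy to demonstrate. IEEE 754 single precision has a 24-bit significand, so it represents every integer exactly up to 2^24 and then starts skipping. The sketch below round-trips values through a 4-byte float using Python's `struct` module; the helper name `to_float32` is made up for this example.

```python
import struct


def to_float32(value: float) -> float:
    """Round a Python float (64-bit) to the nearest IEEE 754 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", float(value)))[0]


for n in (2 ** 24 - 1, 2 ** 24, 2 ** 24 + 1, 2 ** 24 + 2):
    stored = to_float32(n)
    print(f"{n} -> {stored:.1f} ({'exact' if stored == n else 'rounded'})")
```

A 32-bit integer counts exactly up to about 4.29 billion (unsigned), while float32 trades exactness above 2^24 for a much wider range of magnitudes. A 24-bit integer sample always fits in float32 without loss.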

DonH56 · Sep 14, 2023

maxotics said:
I don't disagree with what you wrote. But I don't feel you're reading me closely.

As I said, I have not been following this thread.

maxotics said:
For example, you wrote, "Floating-point numbers are used to extend the system's dynamic range". No, they're used to extend its range, min and max. And they do so at the cost of precision (dynamism).

Precision, resolution, accuracy, and dynamic range are different. I have never before seen precision equated to "dynamism" or dynamic range.

A DSP system is generally limited in dynamic range by the performance of the analog circuits and data converters (ADC/DAC), not the numerical format. I had to explain to program managers many times over the years why I couldn't just design a 16/32/64-bit ADC to match the DSP in the system. Physics is a %^*$#.

maxotics said:
To compare apples to apples, you would have to say 32-bit float extends the dynamic range from 24-bit float because the precision of those two would be equal.

The dynamic range and precision depend upon the format of the floating-point numbers used. A 32-bit FP number having 24-bit mantissa and 8-bit exponent would have greater precision than a 24-bit FP number with 16-bit mantissa and 8-bit exponent. The dynamic range could be roughly the same as determined by the exponent.
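The comparison can be put into rough numbers. The sketch below uses a deliberately simplified model: precision is taken as about 6.02 dB per mantissa bit, and range as about 6.02 dB per power of two the exponent can express. It ignores the sign bit, the implicit leading bit, subnormals, and reserved exponent codes, so the figures are approximate, and the function name is made up for illustration.

```python
import math

DB_PER_BIT = 20 * math.log10(2)  # about 6.02 dB per doubling


def describe(mantissa_bits: int, exponent_bits: int):
    """Return (precision_db, range_db) for a simplified floating-point format."""
    precision_db = DB_PER_BIT * mantissa_bits         # resolution at any one exponent
    range_db = DB_PER_BIT * (2 ** exponent_bits)      # span covered by the exponent
    return precision_db, range_db


for name, mantissa, exponent in (("32-bit FP", 24, 8), ("24-bit FP", 16, 8)):
    precision_db, range_db = describe(mantissa, exponent)
    print(f"{name}: precision ~{precision_db:.0f} dB, exponent range ~{range_db:.0f} dB")
```

In this model the two formats share the same exponent range and differ only in precision, which matches the description above.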

maxotics said:
Will you gang up on me like everyone else or go back to read more closely what my argument is (and isn't)? Thanks for your input!

I barely skimmed the previous posts and had a hard time following your argument, sorry. You appear to lack fundamental understanding of the sampling and quantization process and how it relates to numerical representation. I had hoped to help clarify but obviously failed and will bow out -- too many other things going on right now and of no help to you, no point in being part of "the gang".

voodooless · Sep 14, 2023

Hayabusa said:
If you want to do any DSP with 32 or 24 bits integers you need a lot more bits in your multiplication...

Why is that relevant to an ADC interface? If you need more bits, then just add them.

Barrelhouse Solly · Sep 14, 2023

Hoo boy. I took C in 1990 or thereabouts and had to do a quick review of floats and doubles. I remembered it was a "logarithmic" representation. In my work life I used mostly BCD. A lot of the sampling stuff is over my head because I'm just plain ignorant of the details. Even so, I'm enjoying the discussion.

From my position of ignorance it seems to me that in order to do away with clipping you need an infinite degree of precision and infinitely responsive hardware.

maxotics · Sep 14, 2023

DonH56 said:
You appear to lack fundamental understanding of the sampling and quantization process and how it relates to numerical representation

Thanks for leaving the conversation with that. If it's so fundamental, I don't know why you or others can't explain it to me. Instead, all I get is DSP this or DSP that. DSPs have nothing to do with quantizing a voltage. "I'm not part of the gang but I'll kick you on the way out."

maxotics · Sep 14, 2023

One of my comments was deleted for "political content". Not sure which one. Anyway, thanks for all the comments. I explained why 32-bit float cannot mathematically be used to add precision to ADCs (over 24-bit fixed) in digitizing a current. I explain why a current cannot be measured directly as a mantissa/exponent pair. Sure, you could split the current into a bunch of exponential measurement and then split those buckets into mantissas, but I don't see how that would work better than 24-bit output. There are links here to my videos and explanations of 32-bit float. If one is interested, one can peruse them at their leisure.

Goodbye.
