Am I understanding basic audio science right?

Joined
Jan 13, 2021
Messages
5
Likes
21
#1
Ok, so I only just passed my science exams when aged 16 so my science knowledge is really shaky so bear that in mind; I've read so many posts on this forum and elsewhere to try to understand what I can expect to get sound quality wise from my audio equipment and from audio sources and wanted to check I'm on the right lines. I'm posting here though because it's a science forum and I trust scientific explanations over the waffle I have been trained to write as a flaky humanities graduate. If you can help with a few questions and check if my understanding is right then that would be really helpful.

1. Is it worth paying for 24 bit audio files at 48 kHz or higher, as opposed to CD quality tracks with 16 bit at 44.1 kHz?

part a: bits - my understanding: bits are related to volume levels. A 16 bit audio file can record up to 96db of dynamic range, so the difference between the quietest sound and the loudest sound on the CD is 96db. 96db is really loud. I would only benefit from having 24 bit audio if I was listening to music that was set at a volume where the loudest parts were more than 96db louder than the quieter parts. A whisper is about 30db in volume when I sit a couple of metres from the speakers. So, if my amp was set so that I could just hear a recorded whisper (let's say the sound engineer finished the CD with this 30db whisper as the quietest possible sound on the CD), and the loudest sound on the CD was 96db higher, that would be around 126db loud and that would not be a pleasurable experience. So I really don't need more than 16 bit audio?

part b: hertz - my understanding: hertz (in this context) refers to the number of times an analogue sound wave is sampled each second and converted into a digital value. So, if a musical performance was sampled only 30 times a second, it would be a 30hz recording and would be virtually impossible to listen to with pleasure because it would have missed so much of the performance. It is possible to capture an analogue sound signal perfectly if it is sampled at least twice as many times per second as the sound’s frequency expressed in hertz (from Nyquist’s theorem, ok - I’ve butchered it). So, if there’s a low note at 250Hz and it is sampled at 500Hz, that analogue signal can be captured digitally perfectly and later perfectly converted back into the analogue realm. So, because a CD is sampled at 44.1khz, it can send an accurate analogue signal to the amp providing the recording has frequencies of 22khz or less. As I’m an adult, I can’t hear any frequency above 22khz (and I can’t even hear that), so I don’t need to buy music that is sampled at a higher rate than 44.1khz?

2. How good does my DAC need to be so that I can’t hear the difference with a better DAC?

My understanding is really confused here. I’ve read on this forum (guess what, I can’t find where), that if a DAC has a Sinad of 96db then that should be transparent for a CD recording. So is it right to say that the Sinad measurement (which I get means ‘total harmonic distortion + noise’, but I don’t get what that actually means) means that any sound at a decibel level up to the Sinad value will be processed transparently, ie a DAC with 93db Sinad will process nearly all of an audio CD’s content transparently, ie all but any information from 94db-96db?

3. Do I even need CD quality or is 320Kbps good enough?

My understanding: a CD quality track plays at a much higher bit rate (1,411 Kbps) than a 320Kbps track and this means more musical information is transferred each second to the amplifier. So, theoretically CD could sound better than 320Kbps compressed files (which is entirely different to saying it sounds better in practice)? I’m aware of the huge arguments people have over this in practice, so I’m just focusing my question on in theory.

Any help is much appreciated.
 
Joined
Jan 13, 2021
Messages
5
Likes
23
#2
1a. 'dB' is like '%' or 'thousand' in that it tells you about the relative scale of the value provided, but does not tell you what the scale is measuring. (You can have ten thousand dollars or ten thousand volts, but having "ten thousand" without specifying the thing being counted is meaningless.) A Bel is a ten-fold change, so 0 Bel dollars, 1 Bel dollars, and 2 Bel dollars would be $0, $10, and $100; -1 Bel dollars and -2 Bel dollars would be $0.10 and $0.01. A decibel is one-tenth of a Bel, so 20 dB is the same as 2 B. Straight 'Bel' is very rarely used, though, because units in dB are usually more practical.

In this context, dB are often used to describe recordings as dB full scale, or dBFS, where 0 dBFS is as loud as the samples can record, -10 dBFS is -1 Bel so 1/10th so 10% of 0 dBFS. Another use of dB is for sound pressure level, or SPL, which measures the amount of force a sound wave is carrying. 100 dB SPL has ten times as many of those force units (which isn't very complex, but let's not get into those details right now) as 90 dB SPL. Whether 10 dBFS -- going from -10 to 0 dBFS, for example -- translates into 10 dB SPL -- going from 80 to 90 dB SPL, for example -- is entirely dependent on how much amplification is used.

Imagine a two-bit recording: each sample would have loudest, moderate, quiet, and silent levels, and that's as detailed as each sample can be. But, an amplifier could play the loudest samples at 120 dB SPL, moderate at 90 dB SPL, and quiet at 60 dB SPL -- or, an amplifier could play the same samples at 80, 75, and 70 dB SPL. Either reproduction would be faithful to the recording because the relative levels are maintained.

So, no, more bits does not equal more sound; volume levels are not the issue here.

1b. Sounds like you've got the basics down pretty well. In theory -- and that is a very important qualification if you want to go deeper into this -- a recording that takes x samples per second will be able to accurately reproduce tones that are lower than 1/2 x. A 500 Hz sampling frequency would allow you to accurately record sounds up to 250 Hz. (Perfectly? That's a big word that could open nuanced discussion. Impossible to hear the difference? Absolutely doable.)
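To make the "lower than 1/2 x" limit concrete, here is a small sketch (my own illustration, not something from the posts above) using the 500 Hz sampling frequency from the example. It samples a 300 Hz tone, which is above the 250 Hz limit, and shows that the samples are indistinguishable from an inverted 200 Hz tone. The helper name `sample_tone` and the choice of 50 samples are assumptions made for the demo.

```python
import math

def sample_tone(freq_hz, sample_rate_hz, n_samples):
    """Return n_samples of a unit-amplitude sine at freq_hz, sampled at sample_rate_hz."""
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate_hz)
            for n in range(n_samples)]

SAMPLE_RATE = 500                                   # Hz, so the limit is 250 Hz
above_limit = sample_tone(300, SAMPLE_RATE, 50)     # 300 Hz is over the limit
alias = sample_tone(200, SAMPLE_RATE, 50)           # 500 - 300 = 200 Hz

# Adding the two sample sets cancels them: the 300 Hz samples are
# exactly an inverted 200 Hz tone, so the sampler cannot tell them apart.
max_diff = max(abs(a + b) for a, b in zip(above_limit, alias))
print(f"largest difference: {max_diff:.2e}")
```

This is why content above half the sampling rate has to be filtered out before sampling: once sampled, it shows up as a lower tone that was never in the performance.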

There are good reasons for high bitrate material. It allows further processing of the sounds with fewer artifacts. It changes requirements for filtering out frequencies above the Nyquist value, which if done poorly can create noise in the audible range. It also can make recordings more accurate -- more true to the original sounds -- although I've never seen any evidence the gains here from going above 44.1 kHz are remotely audible. The primary reason is just to have better input so you don't need to worry about whether it is a problem in the first place. Well, the folks who designed CDs (44.1kHz) and DVDs (48 kHz) knew what they were doing and did a good job, so you don't need to worry about it for listening.

2. This is a more difficult topic, but I'll summarize my thoughts by saying (1) you're absolutely entitled to be confused, pretty much everyone who isn't an EE and actively working in this field is confused by this stuff, and (2) it isn't as simple as masking out bits here and there or dBFS values here and there. My takeaway will be: don't think of noise as masking other noise, think of all noise in the system adding up together.

And also: learn to recognize which kinds of noise bother you and which don't. I'll tolerate a lot of 2nd order harmonics while 60 Hz hum drives me absolutely nuts, and both measure the same in THD+N numbers.

3. In theory, yes. Compressed files come in two types: lossy and lossless. Lossy formats, which include everything that'll encode CD-quality music at 320kbps, work in part by throwing information away. The entire goal of those compression schemes, though, is to throw away information that is not audible. I can certainly hear the difference between uncompressed and 96 kbps compressed sound; using less compression makes those changes less audible, perhaps inaudible, but they are still changing the signal, and in theory any relevant change that's large enough would be audible, and any audible change might be considered to sound better. On the other hand, there's also lossless compression that makes for a smaller file, although decompressing it yields the exact same flow of bits (at 1411 kbps?) to the DAC when the time comes.
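For anyone wondering where the 1411 kbps figure comes from, it is just sample rate times bit depth times channels for uncompressed PCM. A quick sketch, where the function name `pcm_bit_rate_kbps` is made up for illustration and stereo is assumed:

```python
def pcm_bit_rate_kbps(sample_rate_hz, bit_depth, channels):
    """Bit rate of uncompressed PCM audio in kilobits per second."""
    return sample_rate_hz * bit_depth * channels / 1000

cd_kbps = pcm_bit_rate_kbps(44_100, 16, 2)       # CD: 16 bit, 44.1 kHz, stereo
hires_kbps = pcm_bit_rate_kbps(96_000, 24, 2)    # an example 24 bit / 96 kHz file

print(f"CD quality:   {cd_kbps:.1f} kbps")
print(f"24/96 stereo: {hires_kbps:.1f} kbps")
print(f"CD vs 320 kbps lossy: {cd_kbps / 320:.2f}x the bit rate")
```

The ratio only compares how many bits are moved per second. As the post says, it does not by itself tell you how much of the difference is audible.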
 

sergeauckland

Major Contributor
Forum Donor
Joined
Mar 16, 2016
Messages
1,838
Likes
4,191
Location
Suffolk UK
#3
Part a and Part b - spot on, correct.
Question 3 is a bit more complicated. 320kbps MP3, and definitely AAC, are transparent for almost everybody and every recording, but there are always the outliers, especially trained listeners who can listen for the artefacts most of us miss. So, yes generally, 320kbps is 'good enough' but storage and data are cheap, and something lossless like FLAC or ALAC only takes up twice the space of MP3 so one might as well have it, even if mostly the difference is inaudible.

As we mostly agree on here, the main difference is in the loudspeakers and their interaction with the room rather than anything in the electronics. Probably the biggest difference of all is the quality of the recording, something we can do little about.

S
 

RayDunzl

Major Contributor
Central Scrutinizer
Joined
Mar 9, 2016
Messages
10,845
Likes
10,332
Location
Riverview FL
#4
A Bel is a ten-fold change, so 0 Bel dollars, 1 Bel dollars, and 2 Bel dollars would be $0, $10, and $100; -1 Bel dollars and -2 Bel dollars would be $0.10 and $0.01. A decibel is one-tenth of a Bel, so 20 dB is the same as 2 B. Straight 'Bel' is very rarely used, though, because units in dB are usually more practical.
0 Bel Dollars cannot be "nothing" as in no dollars.

0 Bel $ would have to refer to some non-zero value - not $0

Negative Bel $ would represent fractions of that value

Positive Bel $ would represent multiplications of that value.

Your example might otherwise be correct if you specified 0 Bel Dollar = $1
 

voodooless

Senior Member
Forum Donor
Joined
Jun 16, 2020
Messages
348
Likes
350
#6
So I really don't need more than 16 bit audio?
As a source, probably not. But if you want to do digital processing like filters you do want more bits. It gives more headroom and calculation errors are much smaller.

so I don’t need to buy music that is sampled at a higher rate than 44.1khz?
nope! Most of that music does not contain much content above 20 kHz anyway.

2. How good does my DAC need to be so that I can’t hear the difference with a better DAC?
Probably anything with a SINAD of 96 or better will be transparent. Just to be on the safe side I would personally use 100 as a good boundary, but that has no objective basis

My understanding: a CD quality track plays at a much higher bit rate (1,411 Kbps) than a 320Kbps track and this means more musical information is transferred each second to the amplifier.
I can debunk your theory (or actually hypothesis if we're talking science lingo): a FLAC file has a lower bit rate than a WAV file (for equal bit depth and sampling rate), yet contains the exact same information. Encoding is the trick here, and in the case of FLAC it’s lossless. Not so for MP3 and its equivalents.
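The "fewer bits, same information" point can be tried out with any lossless compressor. The sketch below uses Python's general-purpose `zlib` as a stand-in, which is not FLAC and compresses audio far less cleverly, on a made-up test signal (one second of a 440 Hz tone as 16-bit mono PCM). The signal and amplitude are assumptions chosen for the demo.

```python
import math
import struct
import zlib

SAMPLE_RATE = 44_100
# One second of a 440 Hz tone, rounded to 16-bit integer sample values.
samples = [round(10_000 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE))
           for n in range(SAMPLE_RATE)]
pcm_bytes = struct.pack(f"<{len(samples)}h", *samples)   # little-endian int16

compressed = zlib.compress(pcm_bytes, 9)    # 9 = strongest compression level
restored = zlib.decompress(compressed)

print(f"original:   {len(pcm_bytes)} bytes")
print(f"compressed: {len(compressed)} bytes")
print(f"bit-for-bit identical after decompressing: {restored == pcm_bytes}")
```

A pure tone is an unrealistically easy case, so the size reduction here says nothing about real music. The part that carries over is that the decompressed bytes match the original exactly.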
Any help is much appreciated.
The Force is strong in this one! You’re doing quite well on your own. Keep it up!
 

Killingbeans

Major Contributor
Joined
Oct 23, 2018
Messages
1,115
Likes
1,908
Location
Bjerringbro, Denmark.
#7
1. Spot on.

2. SINAD tells you how far below the original signal the sum of the added harmonic distortion and noise sits.

How good a DAC you need probably depends on what you intend on using it for.

If you love dissecting gear with your ears and spend most of the time painstakingly searching for artifacts, the answer most likely lies somewhere in the established thresholds of human hearing.

But if you would just like to enjoy some music without being bothered by anything, distortion at -60db and noise at -80db will very likely be fine(?).
(It's a bit of a controversial subject)
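As a rough illustration of how distortion and noise figures roll up into one SINAD number, here is a sketch that adds the two example levels above as powers. It assumes the distortion and noise are uncorrelated, so their powers simply add, and the helper name `combine_db` is made up.

```python
import math

def combine_db(*levels_db):
    """Power-sum components given in dB relative to the signal level."""
    total_power = sum(10 ** (level / 10) for level in levels_db)
    return 10 * math.log10(total_power)

# Distortion at -60 dB and noise at -80 dB, both relative to the signal.
thd_plus_noise_db = combine_db(-60, -80)
sinad_db = -thd_plus_noise_db    # how far the sum sits below the signal

print(f"THD+N: {thd_plus_noise_db:.2f} dB")
print(f"SINAD: {sinad_db:.2f} dB")
```

The larger component dominates: with noise 20 dB below the distortion, the combined figure barely moves away from 60 dB.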

3. MP3 and other lossy formats compress the data with algorithms designed to take advantage of the shortcomings of human hearing. Once in a while they screw up a bit (no pun intended), but the vast majority of the time they do a remarkably good job. Often 192 kbps is perfectly adequate.
 

MRC01

Major Contributor
Joined
Feb 5, 2019
Messages
1,479
Likes
1,442
Location
Pacific Northwest
#8
... A 16 bit audio file can record up to 96db of dynamic range, so the difference between the quietest sound and the loudest sound on the CD is 96db. 96db is really loud. I would only benefit from having 24 bit audio if I was listening to music that was set at a volume where the loudest parts were more than 96db louder than the quieter parts. ...
Essentially correct, but there's another factor to consider...

Suppose you're listening to music with high dynamic range, a quiet section that is 48 dB quieter than peak levels. 48 dB is 8 bits, so in this musical section the top 8 MSB of each sample are all zero and the resolution of what you're hearing is only 8 bits. Furthermore, since energy drops with frequency (at least with normal acoustic music), most of that amplitude comes from the lowest frequencies. So the violin & viola you hear in this quiet section is likely another 12 dB or 2 bits quieter, so they're only 6 bits. Then the difference between them (why a violin sounds different from a viola) is in the harmonics, which are octaves higher, thus lower in level, having even fewer bits. So the actual resolution may be quite a bit less than 16 bits, especially in treble frequencies where our hearing is most sensitive.

This is one reason why the redbook CD standard had "pre-emphasis". They boosted the treble by up to 10 dB before A/D conversion, then cut it accordingly after D/A on playback, to get better bit resolution. The main reason for this was because some early CD players weren't really 16 bit, they didn't process the bottom 1 or 2 bits properly. Another "solution" was "HDCD" which boosted the level of quiet parts in order to increase the bit resolution, then cut them on playback. It was similar in concept to pre-emphasis, but based on dynamic level instead of on frequency.

In short, 16 bits is mostly transparent and sufficient for most music. But it is not 100% transparent for music having wide dynamic range, and the limitations can become audible at something less than 96 dB of dynamic range, as in quiet parts the number of bits actually used for music is much smaller, thus noise (bottom 1-2 bits) consequently becomes a bigger % of the total.
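The "48 dB is 8 bits" arithmetic above rests on each bit being worth about 6 dB, since every extra bit doubles the number of amplitude steps. A short sketch of that conversion, with made-up function names:

```python
import math

DB_PER_BIT = 20 * math.log10(2)    # about 6.02 dB for each doubling of amplitude

def dynamic_range_db(bits):
    """Theoretical range between the largest value and one step, in dB."""
    return bits * DB_PER_BIT

def bits_below_peak(db_down):
    """How many top bits go unused when the signal sits db_down below peak."""
    return db_down / DB_PER_BIT

print(f"16 bit: {dynamic_range_db(16):.2f} dB")
print(f"24 bit: {dynamic_range_db(24):.2f} dB")
print(f"48 dB below peak: about {bits_below_peak(48):.2f} bits unused")
print(f"a further 12 dB:  about {bits_below_peak(12):.2f} bits")
```

This only counts bits. Whether the remaining bits limit what you can hear is the part being debated in the thread.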
 

MrPeabody

Active Member
Joined
Dec 19, 2020
Messages
138
Likes
127
Location
USA
#11
The most important thing to remember about any value expressed in Bels and deciBels is that it is a ratio of two numbers, expressed in a specific way that has various benefits. Common logarithms are used. The primary effect of the use of logarithms is that a collection of ratios that cover a large range in magnitude is squeezed into a much smaller range of magnitude. This is the primary effect of using logarithms, but this is not the reason per se that we use a logarithmic scale for expressing ratios.

Logarithms are kind of like playing Jeopardy. In Jeopardy, you are given the answer and you have to figure out the question. Same thing with logarithms. You're given the answer, and you have to figure out the question.

Finding the base 10 logarithm of a given number, say 1000, is like being asked, "If I tell you that 10 raised to some unknown power is equal to 1000, what would the power be?" Obviously the answer is 3, which is to say, the base 10 logarithm of 1000 is 3. The base 10 logarithm of 10000 is 4. The base 2 logarithm of 16 is also 4, because 4 is the power that you have to raise 2 to, to get 16. Bels and decibels are based on base 10 logarithms.

In mathematical parlance, logarithm functions are the inverse of exponential functions. If you raise 10 to some particular power X, then take the base 10 logarithm of that result, whatever it is, you get back the X. And you can do this the other way around. One way to sketch a graph of the base 10 logarithm function is to use slow-drying ink to sketch a graph of the function 'Y = 10^X', then carefully fold the graph paper along the diagonal line 'Y = X'.

No matter the base, any exponential function will intercept the y-axis at Y = 1, because no matter the value of n, n^0 is equal to 1. Since logarithm functions are the inverse of exponential functions and are reflective of their complementary exponential functions about the diagonal line 'Y = X', it is apparent that the logarithm of 1 will always equal 0, regardless of the base. In other words, logarithm functions always intercept the x-axis at X=1.

As you look to the left of the y-axis, where the independent variable is negative, the exponential function approaches the x-axis asymptotically. In other words, for negative values that are very large in absolute value, like -100000000, the value of the exponential function is positive but very close to 0. For negative values of the independent variable, the value of the exponential function is between 0 and 1.

Again applying the reflective property of inverse functions, it is not difficult to see that for any logarithm function the value will be negative for independent variable values between 0 and 1, and will be positive for independent variable values greater than 1. For independent variable values that are positive but very close to zero, the value of the logarithm function will be negative but very large in absolute value, like -100000000.

A Bel value is calculated by taking the base 10 logarithm of some ratio. In practice it is generally assumed and true that the ratio is positive, i.e., that either both values are positive or both are negative. It is also generally assumed that both quantities have the same units of measure and represent the same physical quantity. Thus, even though it is certainly meaningful to talk about the ratio of voltage to current, it would not make sense to express this ratio in Bel or deciBel.

Since the Bel value is simply the logarithm of the ratio, it is apparent that for ratios where the numerator is less than the denominator, the ratio itself will be between 0 and 1, which means the Bel value will be negative. If a given, measured value is smaller than the value to which it is being compared (which may or may not be a standardized reference value), the Bel value will be negative. If a given, measured value is greater than the value to which it is being compared, the Bel value will be positive. A deciBel is simply one-tenth of a Bel. To convert Bel to deciBel, you multiply by 10 because deciBel is only one-tenth as large as Bel. To convert deciBel to Bel, you do the reverse. To convert a deciBel value to an ordinary decimal fraction, the first thing you would ordinarily do is divide by 10. Then instead of using the Log key on your calculator, you use the 10^x key.
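The conversion steps described above can be written out in a few lines. This sketch follows the post's recipe exactly (base 10 log times 10 going one way, divide by 10 then 10^x going back), and the function names are made up for illustration.

```python
import math

def ratio_to_db(ratio):
    """Bel value is log10 of the ratio; multiply by 10 to get deciBel."""
    return 10 * math.log10(ratio)

def db_to_ratio(db):
    """Divide by 10 to get Bel, then use 10^x instead of log."""
    return 10 ** (db / 10)

print(math.log10(1000))                 # the power 10 must be raised to for 1000
print(math.log2(16))                    # the power 2 must be raised to for 16
print(ratio_to_db(100))                 # a ratio above 1 gives a positive value
print(ratio_to_db(0.01))                # a ratio between 0 and 1 gives a negative value
print(db_to_ratio(ratio_to_db(0.25)))   # the round trip returns the ratio
```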

The deciBel value corresponding to the doubling or halving of a quantity makes for a handy point of reference. To obtain this deciBel value, we need only take the base 10 log of 2 and then multiply by 10. We get 3.0103. In sloppy government work (or sloppy work in private practice), this is conveniently rounded off to 3.0. Thus, each 3 dB of gain or loss corresponds to a doubling or halving. If a quantity is reduced to 1/4 of its previous value, this is a reduction of approximately 6 dB. If the sensitivity of a speaker stays within a +/- 3 dB window over some particular frequency range, this means that the lowest SPL value is not less than 1/4 of the greatest SPL value (if the values are expressed using physical units for pressure), or equivalently, the greatest SPL value is not greater than 4x the lowest SPL value.
 

Killingbeans

Major Contributor
Joined
Oct 23, 2018
Messages
1,115
Likes
1,908
Location
Bjerringbro, Denmark.
#12
But it is not 100% transparent for music having wide dynamic range, and the limitations can become audible at something less than 96 dB of dynamic range, as in quiet parts the number of bits actually used for music is much smaller, thus noise (bottom 1-2 bits) consequently becomes a bigger % of the total.
Well that's the thing isn't it? The noise floor doesn't change when the music uses fewer bits. The SNR goes down, but that seems natural enough. It's not a problem if the noise floor is low enough?

https://sonicscoop.com/2013/08/29/w...t-you-knew-about-bit-depth-is-probably-wrong/
 

MRC01

Major Contributor
Joined
Feb 5, 2019
Messages
1,479
Likes
1,442
Location
Pacific Northwest
#13
Sure, dither makes all the difference. Without dither, bit depth is resolution and there is no noise. With dither, you get virtually infinite resolution and bit depth determines the noise level.

I did a little experiment demonstrating this by encoding a -114 dB signal into 16-bit digital. Without dither, you get nothing, since -114 is below -96 dB. The signal never rises above the LSB of 16-bit, so every sample is all zeros. When dithered, you can clearly hear the -114 dB signal. It's embedded in hiss (the dither noise), actually below the noise level, but you can still hear it quite clearly.
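A rough way to repeat this kind of experiment without audio tools is sketched below. It is my own reconstruction under stated assumptions, not the setup used in the post: a 1 kHz tone at -114 dBFS, five seconds at 44.1 kHz, TPDF dither spanning +/- 1 LSB, a fixed random seed, and correlation against a reference sine standing in for "can you hear it".

```python
import math
import random

SAMPLE_RATE = 44_100
N = 5 * SAMPLE_RATE                  # five seconds, a whole number of 1 kHz cycles
FULL_SCALE = 32_767                  # largest 16-bit sample value
amplitude = FULL_SCALE * 10 ** (-114 / 20)    # about 0.065 of one LSB

reference = [math.sin(2 * math.pi * 1_000 * n / SAMPLE_RATE) for n in range(N)]

def quantize(x):
    """Round to the nearest integer step (one 16-bit LSB)."""
    return math.floor(x + 0.5)

def recovered_amplitude(samples):
    """Estimate the 1 kHz amplitude by correlating against the reference sine."""
    return 2 * sum(s * r for s, r in zip(samples, reference)) / N

rng = random.Random(0)               # fixed seed so the run is repeatable
undithered = [quantize(amplitude * r) for r in reference]
# TPDF dither: the difference of two uniform values, spanning +/- 1 LSB.
dithered = [quantize(amplitude * r + rng.random() - rng.random())
            for r in reference]

undithered_estimate = recovered_amplitude(undithered)
dithered_estimate = recovered_amplitude(dithered)
print(f"signal amplitude: {amplitude:.4f} LSB")
print(f"without dither:   {undithered_estimate:.4f} LSB")
print(f"with TPDF dither: {dithered_estimate:.4f} LSB")
```

Without dither every sample rounds to zero, so nothing of the tone survives. With dither the individual samples are noisy, but the tone is still there on average, which is what the correlation picks up.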

My point is, 16 bits isn't enough to be 100% transparent for high dynamic range music; 24 bit has some benefit.
 

danadam

Active Member
Joined
Jan 20, 2017
Messages
187
Likes
235
#14
The deciBel value corresponding to the doubling or halving of a quantity makes for a handy point of reference. To obtain this deciBel value, we need only take the base 10 log of 2 and then multiply by 10.
It is "multiply by 10" for quantities related to power. For quantities related to amplitude it is 20.
https://dspillustrations.com/pages/posts/misc/decibel-conversion-factor-10-or-factor-20.html

If the sensitivity of a speaker stays within a +/- 3 dB window over some particular frequency range, this means that the lowest SPL value is not less than 1/4 of the greatest SPL value (if the values are expressed using physical units for pressure), or equivalently, the greatest SPL value is not greater than 4x the lowest SPL value.
I have an impression something is mixed up in the above. AFAIK, SPL is "amplitude" quantity and halving/doubling happens at 6 dB, not 3 dB.
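To put numbers on the two readings of a +/- 3 dB window, here is a small sketch. It assumes the window is taken as a 6 dB span from lowest to highest, and the helper name `span_to_ratio` is made up.

```python
def span_to_ratio(span_db, factor):
    """Convert a dB span back to a plain ratio; factor is 10 (power) or 20 (amplitude)."""
    return 10 ** (span_db / factor)

window_db = 6.0    # a +/- 3 dB window spans 6 dB from lowest to highest

pressure_ratio = span_to_ratio(window_db, 20)    # pressure is an amplitude quantity
power_ratio = span_to_ratio(window_db, 10)       # the same span read as a power ratio

print(f"pressure: highest is {pressure_ratio:.3f}x the lowest")
print(f"power:    highest is {power_ratio:.3f}x the lowest")
```

So the 4x figure fits a power ratio, while for sound pressure the same window works out to roughly 2x.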
 
Joined
Mar 24, 2019
Messages
76
Likes
57
#16
16 bit noise can be heard using IEMs.

24bit 48kHz is already very good.
Anything more than that is like buying salt in France and flying back with it to your home country.
Then having a salt tasting party with your friends with salt from various countries and claiming there's a difference.

Don't get me wrong, there's a difference. ;)
 

danadam

Active Member
Joined
Jan 20, 2017
Messages
187
Likes
235
#17
My point is, 16 bits isn't enough to be 100% transparent for high dynamic range music; 24 bit has some benefit.
Um... not for me. At the max volume level that I listen to dynamic tracks, I won't hear anything at -114 dBFS. I start detecting a 3 kHz tone at -105, maybe -108 if it is really quiet around. After converting that to 16 bit I can hear dither noise but not shaped dither noise. More complex signals, like a guitar chord, I start detecting at -88 (max peak) and again can hear dither noise but not shaped dither noise.

That's with IEMs, maybe with speakers the range would be bigger?

Another question is if you have any examples of such dynamic tracks, which would reach full potential of 16 bit, let alone exceed it? :)
 

MRC01

Major Contributor
Joined
Feb 5, 2019
Messages
1,479
Likes
1,442
Location
Pacific Northwest
#20
It's even better than intuitive. It's in the arithmetic. :)
The arithmetic doesn't require it, it's done for convenience.
In other words, we could have a single definition of a dB, say 20 * log(R). That would be simpler and mathematically consistent, eliminating any confusion about which formula to use since there's only 1. But that would mean a 6 dB change in voltage (say increasing volume) leads to a 12 dB change in power delivered to the speaker. By defining a power dB to be different from a voltage dB, 10*log(R) and 20*log(R) respectively, a 6 dB change in voltage leads to a 6 dB change in power, simply because a power dB is "bigger".
Having 1 formula or 2, the math works either way. Using 2 different formulas and definitions for a dB is done by convention, presumably for convenience.
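A quick numeric check of this, as a sketch: it assumes a fixed load, so power goes with the square of voltage, and the function names are made up.

```python
import math

def voltage_db(v_ratio):
    """dB for amplitude-type quantities: 20 * log10 of the ratio."""
    return 20 * math.log10(v_ratio)

def power_db(p_ratio):
    """dB for power-type quantities: 10 * log10 of the ratio."""
    return 10 * math.log10(p_ratio)

voltage_ratio = 2.0
power_ratio = voltage_ratio ** 2     # doubling voltage quadruples power into the same load

two_formula_voltage = voltage_db(voltage_ratio)
two_formula_power = power_db(power_ratio)
single_formula_power = 20 * math.log10(power_ratio)   # if 20*log were used for everything

print(f"voltage change, 20*log:  {two_formula_voltage:.2f} dB")
print(f"power change, 10*log:    {two_formula_power:.2f} dB")
print(f"power change, 20*log:    {single_formula_power:.2f} dB")
```

With the two-formula convention both numbers agree, so a gain quoted in dB means the same thing whether you are looking at voltage or power.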
 