
AAC's quality...

GrO (OP) · Member · Poland
i can't agree with this, ever heard of upsampling? imagine having two data points, one at 0% volume, the other at 100% volume; now imagine doubling the sampling rate and having a third data point at 50% in between the first two. that's why upsampling actually makes a difference for the better (though it's a simplified way to explain it)

As far as I know, it only makes a difference in file size and the container being used. I have to admit that I don't get your idea with those volume levels; maybe it would make more sense if you'd used 25% (instead of 0), 50% (as the low one), and 100% (as the doubled one), but it's still not a very good example in my opinion.

that's why high-kHz files exist, it's not about hearing above 20 kHz but about having more data points so fast volume changes are more fluid and therefore sound better
it's just that 192 kHz files are named that way because in "theory" they can hold data that reproduces 192 kHz (or half that because of two channels) with the data points in the file; for 44.1 kHz material it means more resolution in the data points

...it's not like that. The sample rate's frequency is not really related to the actual audible frequencies or to volume changes (dynamics). Simplifying, we could say that the sample rate is the resolution of the digitized sound, and the bit depth is the resolution of each sample. Neither should be limited by the bit rate per second, as happens with lossy compression encoders.

You may also want to read this article:

https://en.wikipedia.org/wiki/Audio_bit_depth

As a curiosity, I can mention DSD (Direct Stream Digital), where only 1 bit is used (instead of 16 or 24) with a very high sampling frequency.
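To make that concrete, here is a minimal sketch of the idea (my own illustration in Python/NumPy, assuming a first-order modulator; real DSD modulators are higher-order): the signal becomes a ±1 bitstream at a very high rate, and even a crude lowpass recovers the audio.

```python
import numpy as np

def sigma_delta_1bit(x):
    """First-order sigma-delta: encode x (values in [-1, 1]) as a +/-1 bitstream."""
    v = 0.0                                  # integrator state
    bits = np.empty_like(x)
    for i, s in enumerate(x):
        bits[i] = 1.0 if v >= 0 else -1.0    # the single-bit quantizer
        v += s - bits[i]                     # feed the quantization error back
    return bits

fs = 2_822_400                               # DSD64 rate: 64 x 44.1 kHz
t = np.arange(fs // 100) / fs                # 10 ms of a 1 kHz tone
x = 0.5 * np.sin(2 * np.pi * 1000 * t)

bits = sigma_delta_1bit(x)
recovered = np.convolve(bits, np.ones(64) / 64, mode="same")   # crude lowpass
print(np.max(np.abs(recovered[64:-64] - x[64:-64])))
# small; a real decimation filter does far better
```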
 

BDWoody · Chief Cat Herder · Moderator · Forum Donor · Mid-Atlantic, USA (Maryland)
The sample rate's frequency is not really related to the actual audible frequencies

They are directly related, or did I misunderstand you?

Sampling at 44.1 kHz gives audio up to 22.05 kHz, etc.
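A quick numerical check of that limit (a minimal sketch, assuming NumPy): a tone above half the sample rate produces exactly the same samples as its mirror image below half the sample rate, so 44.1 kHz sampling simply cannot carry anything above 22.05 kHz.

```python
import numpy as np

fs = 44100
n = np.arange(1000)

high = np.cos(2 * np.pi * 30000 * n / fs)   # 30 kHz, above the 22.05 kHz limit
alias = np.cos(2 * np.pi * 14100 * n / fs)  # its mirror image: 44100 - 30000 Hz

print(np.max(np.abs(high - alias)))         # ~0: once sampled, the two are identical
```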
 
GrO (OP) · Member · Poland
They are directly related, or did I misunderstand you?

Sampling at 44.1 kHz gives audio up to 22.05 kHz, etc.

...and why couldn't any sample play at 22 kHz, no matter how many of them have been used?

[attached image: sample_rates.jpeg, a stair-step "digital result" diagram comparing sample rates]


By the way, the human audible frequency range is 20 Hz–20 kHz, but that's off topic.
 

BDWoody · Chief Cat Herder · Moderator · Forum Donor · Mid-Atlantic, USA (Maryland)
...and why couldn't any sample play at 22 kHz, no matter how many of them have been used?

[attached image: sample_rates.jpeg]

By the way, the human audible frequency range is 20 Hz–20 kHz, but that's off topic.

I don't know what all that means.

Sampling theory needs two samples per cycle to recreate any given frequency. If all we wanted was content up to 10 kHz, a 20 kHz sampling rate gives us that, but no higher. More samples per cycle do not increase the fidelity. The samples don't get connected like a connect-the-dots drawing; the waveform is recreated with no loss of information within the bandwidth limit.

Monty should be required viewing...
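As a small demonstration of that two-samples-per-cycle claim (my own sketch using the textbook Whittaker-Shannon interpolation formula, assuming NumPy), a 20 kHz tone sampled at 48 kHz, barely more than two samples per cycle, comes back as a smooth sine between the samples:

```python
import numpy as np

fs = 48000
n = np.arange(200)
x = np.sin(2 * np.pi * 20000 * n / fs)     # 20 kHz: only 2.4 samples per cycle

# Whittaker-Shannon reconstruction on a 10x finer time grid (in sample periods)
t = np.arange(0, len(n), 0.1)
y = np.array([np.sum(x * np.sinc(ti - n)) for ti in t])

true = np.sin(2 * np.pi * 20000 * t / fs)  # the underlying continuous waveform
err = np.abs(y - true)
print(err[400:-400].max())                 # small away from the truncated edges
```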
 
GrO (OP) · Member · Poland
I don't know what all that means.

Sampling theory needs two samples per cycle to recreate any given frequency. If all we wanted was content up to 10 kHz, a 20 kHz sampling rate gives us that, but no higher. More samples per cycle do not increase the fidelity. The samples don't get connected like a connect-the-dots drawing; the waveform is recreated with no loss of information within the bandwidth limit.

Monty should be required viewing...

...can you provide any scientific article with references that confirm this? I don't want any Monty from 'youtube'.
 
Deleted member 23982 · Guest
That's a bunch of nonsense.

Maybe a little Monty is in order here...
"a bunch of nonsense" is a bit harsh :D at least i thought this would be it
thanks for the video, makes sense, though i'm unsure why upsampling has a "good" effect then (like less harsh transients, and how the letters b, p, k, t, s, z sound, more smoothly)

also there is a notable difference in 24-bit 44.1 kHz vs 192 kHz, i can't pinpoint it but it just sounds a tad more clear (not a huge difference like mp3 320 kbit vs 16-bit/44.1 kHz but like 10-30% of that difference)

this was basically what i tried to say :) https://www.psaudio.com/pauls-posts/getting-something-for-nothing/#:~:text=When we upsample a 44.1,file size is considerably bigger
 
UKPI · Guest
...can you provide any scientific article with references that confirm this? I don't want any Monty from 'youtube'.
Just open up any freshman- to sophomore-level (a.k.a. entry-level) electrical/electronics engineering textbook on signal processing, like Signals and Systems by Simon Haykin or Signals and Systems by Alan V. Oppenheim.

This article is misleading in that it treats the filter and the interpolation process as separate techniques. Almost all modern DACs internally upsample the input with a digital low-pass filter and then use a relatively gentle analog low-pass filter for the final output. Digital low-pass filters for upsampling and analog low-pass filters for the final output signal are both interpolation filters.
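To illustrate that interpolation step (a sketch using SciPy's polyphase resampler, not any particular DAC's actual filter):

```python
import numpy as np
from scipy.signal import resample_poly

fs = 44100
n = np.arange(4096)
x = np.sin(2 * np.pi * 1000 * n / fs)      # any band-limited input

# 8x oversampling through a digital lowpass (interpolation) filter,
# the same idea a DAC's digital filter stage applies before its analog filter
y = resample_poly(x, up=8, down=1)
print(len(x), "samples in,", len(y), "samples out at", 8 * fs, "Hz")
```

The extra samples add no information below 22.05 kHz; oversampling just moves the spectral images far away so the analog output filter can be gentle.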
 

earlevel · Addicted to Fun and Learning
...and why couldn't any sample play at 22 kHz, no matter how many of them have been used?

[attached image: sample_rates.jpeg]

By the way, the human audible frequency range is 20 Hz–20 kHz, but that's off topic.
I did a reverse image search, so I see this image is used in several places on the web (including an iZotope article; how embarrassing for them, sad to see it). But it's... I'll be kind and say "wrong". Sure, there is typically a sample-and-hold function, to make it easier/practical to acquire each measurement, but first, there is utterly no reason for it to extend all the way to the next sample time, and second, the measurement is still equivalent to a measurement at an instant in time. All the measurements just end up delayed in time by a tiny (but constant) amount.

In other words, everything on the right ("digital result", even though it's not actually digitized yet; it's really "discrete-time analog") should look like an impulse train, not stair steps. Pretending they are stair steps exaggerates the mistaken idea that sampling is "coarse", and that higher sampling rates are progressively more "fine" and get closer to the analog waveform.

And that notion is fundamentally wrong. If you have a signal that has no frequency components above 2 kHz, it doesn't matter whether you sample it and play it back at a 5 kHz sample rate, 20k, 96k, or 500k. Except that the 5 kHz sample rate requires a lot less storage.

PCM = Pulse Code Modulation, which is just "coded" (converted to digital values) Pulse Amplitude Modulation, the modulation of a signal by a pulse train. Not a "step train", a pulse train. Do that modulation and the result is a bunch of evenly spaced pulses of varying height. That's the very reason we do this: because we want to end up with a bunch of instantaneous values, letting us discard everything else, so we can put them in discrete memory.

Ill-conceived drawings like the one above are a big reason people get confused about digital audio. :confused:
 
UKPI · Guest
Regarding implementation I highly prize Opus which is actually OGG on high bit rate with fast impulse response.
So, Opus is a lossy compression format that can provide acceptable quality at lower bitrates when compared to Vorbis, but it is actually Vorbis on a high bit rate with a fast impulse response?

Come to think of it, exactly what parts of Vorbis and Opus are similar? They both use some type of vector quantization, but that's not enough to claim "Opus is just Vorbis with something".

Opus uses impulse response and 32 bit FP precision to mask the artifacts sort of feeling it more natural while codec is the same (container differs).
This is interesting... I've read through the Opus standard and High-Quality Low-Delay Music Coding in the Opus Codec by Jean-Marc Valin et al. before, and I don't recall anything about 32-bit FP precision and impulse response masking the artifacts. Opus uses filters with certain impulse responses for preprocessing and postprocessing, but describing that as "using an impulse response" doesn't make sense. Also, the reference implementation of Opus has an option to use only fixed-point calculations when you compile the source code.

Would you care to provide any reading material for this?

EDIT: Typo correction.
 

earlevel · Addicted to Fun and Learning
Any time you see digital samples represented as staircases, you should quickly ignore any textual content that goes with it.
Yes, and I would add that any time you come across a drawing that tries to show why a higher rate of samples makes the "digital" (sampled) representation look more like the analog, it's time to ignore that source and do something more useful with your time. There is no requirement that discrete time has to look close to analog, and the idea that we'll somehow be better off if it does is fantasy.

If that seems confusing because the waveforms show up pretty nicely in your DAW, consider that your DAW is doing crude upsampling (filling in the waveform) as needed (depending on your horizontal magnification), plus analog conversion ("smoothing"). And even if it didn't, real music has a wide span of frequencies, so the low-frequency components like bass, where much of the music's energy is, end up heavily oversampled even at 44.1k and would still give you a decent visual representation. But if you were looking at the data without the benefit of a DAW's smoothing/upsampling, and were trying to look at sampled signals closer to half the sample rate (which therefore don't have the benefit of that oversampled effect), it would not look much like the analog version at all, despite having the same frequency content.

In other words, there is no reason that the digital representation needs to look much like the analog wave. It often will, but how close it looks has nothing to do with the effectiveness of the digitization and recreation. Because in discrete time, 20 kHz sampled at 48 kHz is really 20k + 28k + 68k + 76k + 116k + 124k + 164k... plus/minus 20k of every multiple of 48k to infinity, and don't forget to repeat all that with negative frequencies if you want to be picky. There is no expectation that it should look like a sine wave, or any need for it to. When it gets played back, the output lowpass filter will remove everything but the 20k.
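That list of images can be checked numerically (a sketch, assuming NumPy; the zero-stuffed array below is exactly the impulse-train picture of the 48 kHz samples viewed on a 480 kHz grid):

```python
import numpy as np

fs = 48000
n = np.arange(4800)                       # 0.1 s of a 20 kHz tone sampled at 48 kHz
x = np.sin(2 * np.pi * 20000 * n / fs)

# place each sample on a 480 kHz grid with zeros in between: the impulse-train view
up = np.zeros(10 * len(x))
up[::10] = x

spec = np.abs(np.fft.rfft(up))
freqs = np.fft.rfftfreq(len(up), d=1 / (10 * fs))
print(np.round(freqs[spec > 0.5 * spec.max()]))
# -> 20000, 28000, 68000, 76000, 116000, 124000, 164000, 172000, 212000, 220000
```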
 


Blumlein 88 · Grand Contributor · Forum Donor
...can you provide any scientific article with references that confirm this? I don't want any Monty from 'youtube'.
I find this stance insanely bothersome. Why? Because you don't have to believe Monty, you can go do a test yourself.

Will 44.1 kHz sampling encode a 20 kHz sine wave with high accuracy? Send it to an ADC, then play back the result and see if you get it back. You will. If you think sine waves are too simple, add other tones, add music, add anything you want with only sub-20 kHz content, and you'll get it back out the other end with high accuracy.
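For example, here is that experiment in software form (a sketch, assuming NumPy/SciPy in place of a real ADC/DAC chain):

```python
import numpy as np
from scipy.signal import resample_poly

fs = 44100
tones = [440, 3000, 9000, 16000]                     # all well below 20 kHz

n = np.arange(fs)                                    # one second at 44.1 kHz
x = sum(np.sin(2 * np.pi * f * n / fs) for f in tones)

m = np.arange(8 * fs)                                # the same second at 352.8 kHz
direct = sum(np.sin(2 * np.pi * f * m / (8 * fs)) for f in tones)

restored = resample_poly(x, up=8, down=1)            # "play back" at the higher rate
mid = slice(4000, -4000)                             # skip the filter's edge effects
print(np.max(np.abs(restored[mid] - direct[mid])))
# small relative to the signal: the sub-20 kHz content survives the trip
```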

Did you somehow miss one of the most important points of the Monty video? Those generators and scopes he was using to monitor the result were all 100% analog devices, some of the very highest-quality such devices ever made. And they show that a trip through digital land does everything it claims to do. You don't have to think about, believe, or understand any theory of any kind. You can see a straightforward test of whether digital can do this or not. And the answer is simply: YES, IT CAN!

So my question to you is: if this isn't convincing, then tell me what would convince you?
 

ZolaIII · Major Contributor
@UKPI yes, for processing. Opus has both integer and FP implementations and by default uses the FP one.
The reference you looked at is old (2012). I think there was a float32 implementation error which caused a rounding integer overflow, which as far as I recall was resolved in 2017. I haven't been actively tracking it for a long time now. I've seen the same FP confusion in WavPack recently.
There is no need for more than 21-bit integers in any written form for lossless.
 
UKPI · Guest
@UKPI yes, for processing. Opus has both integer and FP implementations and by default uses the FP one.
The reference you looked at is old (2012). I think there was a float32 implementation error which caused a rounding integer overflow, which as far as I recall was resolved in 2017. I haven't been actively tracking it for a long time now. I've seen the same FP confusion in WavPack recently.
There is no need for more than 21-bit integers in any written form for lossless.
Like you said, Opus doesn't need floating-point calculations for its implementation. That's why the statement that Opus uses 32-bit floating-point precision to mask the artifacts is misleading. Encoders for other formats also have implementations that use floating-point calculations for the Fourier transform, psychoacoustic models, etc. Yet they fail to achieve what Opus does at lower bitrates. So floating point is not a decisive factor in Opus's ability to maintain its quality at lower bitrates.
 

ZolaIII · Major Contributor
Like you said, Opus doesn't need floating-point calculations for its implementation. That's why the statement that Opus uses 32-bit floating-point precision to mask the artifacts is misleading. Encoders for other formats also have implementations that use floating-point calculations for the Fourier transform, psychoacoustic models, etc. Yet they fail to achieve what Opus does at lower bitrates. So floating point is not a decisive factor in Opus's ability to maintain its quality at lower bitrates.
But a GSM-class codec is (SILK). Take a look at how the hybrid mode combines them (see 1.11 in the Opus FAQ). Not only does Opus have an FP implementation, but SIMD implementations are an ongoing development (nightly builds currently).
 

Grooved · Addicted to Fun and Learning
That's a bunch of nonsense.

Maybe a little Monty is in order here...

And add the video of Dan Worrall if you want another source on sample rates, aliasing, and other things.
 
UKPI · Guest
But a GSM-class codec is (SILK). Take a look at how the hybrid mode combines them (see 1.11 in the Opus FAQ). Not only does Opus have an FP implementation, but SIMD implementations are an ongoing development (nightly builds currently).
I am aware of the hybrid mode. Bandwidth of 8 kHz or less is covered by SILK; the rest is covered by CELT. That's for high-quality speech. I'm pretty sure the fixed-point-only option also applies to the SILK part. SIMD would speed up the encoding/decoding process, but per se it won't improve quality.

What makes Opus (CELT) a better codec for music compared to others is that it combines a lot of good ideas: explicitly quantizing the energy level of each critical band, replacing the spectrum of a frequency band with a copy of a normalized lower-frequency band when there aren't enough bits to make that part sound good (techniques similar but not identical to this have been used in HE-AAC and mp3PRO), being able to change the time-frequency resolution of a frequency band through the Hadamard transform, etc.

It is all in that paper I mentioned. It is good complementary material to the Opus standard. Needless to say, Opus and Vorbis don't have that much in common.
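As a generic illustration of the Hadamard idea mentioned above (my own toy example, not Opus's actual code): the 2-point Hadamard butterfly mixes the same coefficient from two consecutive short transforms, trading time resolution for frequency resolution, and it is its own inverse, so the switch is lossless.

```python
import numpy as np

# Orthonormal 2-point Hadamard butterfly
H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)

a = np.array([1.0, 0.2])   # one bin from two consecutive short transforms (made-up values)
mixed = H @ a              # sum/difference: finer frequency resolution, coarser time
print(mixed, H @ mixed)    # applying H again recovers the original pair exactly
```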
 

earlevel · Addicted to Fun and Learning
...can you provide any scientific article with references that confirm this? I don't want any Monty from 'youtube'.
If you want a mathematically complete explanation (not drawings and hand waving) that is not difficult to understand, I offer:

Sampling theory, the best explanation you've ever heard

I wasn't trying to be cocky with that title; my goal was for it to be just that for many people. I spent a lot of time thinking about it, and I don't know of an easier-to-understand explanation that is mathematically complete. It's multipart, but each part is short and presents a key idea. Just pause and think about it a little after each part.

The key difference in the way I'm explaining sampling, versus most everyone else, is that it focuses on the frequency domain instead of time-domain drawings. Has anyone ever been convinced by the ubiquitous drawings that show a dotted outline of another sine wave, and the word "aliasing"? Maybe for a moment, until you start thinking, "but what if it's not a simple sine wave, what if it's a transient? Is there a flaw in the sampling theory that makes it not suitable for music, but it's hard to notice because it works fine for the steadier parts?" It's this kind of paranoia that even digital audio people get caught up in, because even most digital audio experts don't understand it fully.

But if you do take the time to understand it, you'll recognize that it works exactly as claimed: frequency components below half the sample rate are encoded perfectly. How much below? Any amount at all; a point that's hard to understand with the time-domain drawings, but perfectly clear when viewed as a modulation in the frequency domain (sampling and AM radio are nearly the same thing!). It will be clear that the "only" source of imperfection is the choice and implementation of lowpass filtering. (I'm being slightly gray here with "only", but let's just say we can assume the other conversion details are done properly and to the state of the art, which for practical purposes we can consider perfect.)
 