
High Resolution Audio: Does It Matter?

j_j

Major Contributor
Audio Luminary
Technical Expert
Joined
Oct 10, 2017
Messages
2,282
Likes
4,789
Location
My kitchen or my listening room.
* I am not an expert on psychoacoustics and I dare say, no one on this planet is either.

I am. I know quite a few others. This kind of "mankind can never know" is really tedious. No, nobody knows EVERYTHING, but it is possible to determine limits, and live within them.

Now don't take that as an endorsement of Red Book, it's not. However, your comments about Red Book either repeat myths or simply raise issues not in dispute (like capture: nobody captures at 16/44 any more, so just give that up).

There are massive parts of how the brain does all this which are entirely unknown. We don't actually know the full "specification", but our attempt at doing so with the CD standard was a laughable and childish attempt for sure. There is no way any "golden ear" ever declared CD to be transparent to LP, and the fact is, the most casual human knows it's a digital recording too, because the sample rate is way, way less than what humans need for directionality

Now, please stop making this absolutely ridiculous claim about "what humans need for directionality". The time resolution, even of Red Book, is not 1/44100 of a second. That flat-out, in some cases knowingly dishonest, myth is very tiresome. A few journalists simply make things up. I speak from more than limited experience.

The actual resolution of Red Book CD (and you're talking to someone who thinks that, for reasons related to nonlinearity in the auditory system, we need at least 50kHz, and I would go with 64kHz myself, but not for this "time resolution" codswallop) in terms of TIME RESOLUTION is, in fact, more like
1/(2*pi*20000*65536) seconds. The best proven ITD anywhere for a human listener is 5 microseconds, and the lowest report ever that has any minuscule chance of being accurate is 2 microseconds.

The time resolution of a full scale pulse in 16/44 is, in case you're wondering, just about 0.1 NANOseconds. So give it up. Please. The scientific community is nowhere near as ignorant as you have defamed us, and you've simply repeated a shoebox of myths that have been debunked over and over and over. Lest you fuss about full scale being an issue, sorry to tell you, the noise level of the atmosphere itself is about 6dB SPL (up to 8 or so) at the eardrum, so there are very real physical limits you can't ever get past. That's the noise floor your ear has, like it or not. Now that's 20-20K white noise (no, not pink, not Brownian; although it is due to Brownian motion, the frequency is still a collection of very narrowband pulses) so you can't hear it (just barely, I might note) even with perfect human hearing in a silent place.
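
For anyone who wants to check these figures, here is a minimal Python sketch of the arithmetic. It assumes the expression above is meant as the time a full-scale 20 kHz sine takes to cross one 16-bit quantization step at its steepest point, with the last factor taken as 2**16 = 65536 levels. That reading and the variable names are assumptions for illustration, not something stated in the post.

```python
import math

FS = 44_100         # Red Book sample rate, Hz
F_MAX = 20_000      # top of the audio band used in the post, Hz
LEVELS = 2 ** 16    # 16-bit quantization levels (assumed meaning of the last factor)

# Interval between successive samples; this is NOT the timing resolution.
sample_period_s = 1 / FS

# Rough estimate: time for a full-scale F_MAX sine to move by one
# quantization step at its steepest point (slope 2*pi*f per unit amplitude).
time_resolution_s = 1 / (2 * math.pi * F_MAX * LEVELS)

# Best well-established interaural time difference (ITD) threshold cited in the post.
best_itd_s = 5e-6

print(f"sample period   : {sample_period_s * 1e6:.3f} microseconds")
print(f"time resolution : {time_resolution_s * 1e9:.3f} nanoseconds")
print(f"ITD / resolution: {best_itd_s / time_resolution_s:,.0f}x")
```

The sample period comes out in microseconds, while the resolution estimate is a fraction of a nanosecond, which is the gap the post is pointing at.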
 

j_j

Major Contributor
Audio Luminary
Technical Expert
Joined
Oct 10, 2017
Messages
2,282
Likes
4,789
Location
My kitchen or my listening room.
So much incorrect with your post, but let’s start with the above snippet - you’re about 3 orders of magnitude off. Not sure where you found that figure.

We can ignore your incorrect notions on the relationship between sampling rate and time resolution for now.

Oh and welcome to ASR.

Yeah, the nanosecond claim is what used to be a microsecond claim. It's strange, as the facts about sampling theory are taught more widely, the sensitivity of the human ear is claimed to get reduced to whatever the actual evidence is, and then a bit.

But even then, even if the ridiculous 10 nanoseconds was true, which it isn't, since it's more like 10 microseconds, tested and verified over and over again, both with analog and digital stimuli, that's still a factor of 100 LONGER than what you get from Red Book CD. Ooopsie.
 

Dismayed

Senior Member
Joined
Jan 2, 2018
Messages
392
Likes
417
Location
Boston, MA
Anyone who claims that the CD standard is inadequate needs to present proof in the form of a blinded, randomized controlled study with matched levels. Otherwise you’re just spouting BS.
 

j_j

Major Contributor
Audio Luminary
Technical Expert
Joined
Oct 10, 2017
Messages
2,282
Likes
4,789
Location
My kitchen or my listening room.
Yes and hi, thanks for the welcome mate

I'll find a paper to reference it. I think you're trying to reference an old standard, which was about microseconds. There is research showing it's about nanoseconds. What attracts me to this board is the rationalism and science - so I do freely welcome debate and criticism. I care about this subject a lot.

Standards are not involved. What's more, credible results were, are, and continue to be over 5 microseconds in ITD sensitivity for highly sensitive signals. There is a "paper" out there that thinks that the time resolution of Red Book is 1/44100 of a second. It's dead, flat, profoundly wrong to assert that. Before you go off on that particularly and terribly mistaken claim, I suggest you read a bit more on the subject.
 

Bathrone

Member
Joined
Jun 28, 2022
Messages
17
Likes
5
Location
Australia
The best proven ITD anywhere for a human listener is 5 microseconds, and the lowest report ever that has any minuscule chance of being accurate is 2 microseconds.

Thanks for your contribution. As I said above, I will come back with the research. The data is what it is.

I don't see how you can claim to be an expert in psychoacoustics - the field just isn't as known and proven as many other disciplines. Once it's known, we'll be able to model it fully in simulation and model all our designs to that. In video, this has been done a lot for use cases like making additional gains in encoding by throwing away data that is less important to humans, assisting the compression. I'm not being emotionally anti or anything; I find the field exciting, and interesting things are happening often within it.

regards
 

Bathrone

Member
Joined
Jun 28, 2022
Messages
17
Likes
5
Location
Australia
Anyone who claims that the CD standard is inadequate needs to present proof in the form of a blinded, randomized controlled study with matched levels. Otherwise you’re just spouting BS.
Yes, of course, that's been done. I realise the first one (was it Maye or Mayor or something? I'll look it up later) showed to the engineering society that it was statistically good. Then later, a whole series of papers were released to the engineering society showing the contrary, where serious and fair criticisms were made of the original paper. I believe also, from memory, there was a meta-analysis of all the papers.

I support your rationalism though - I agree on your intent and method - but the data does show CD is inadequate. I'll see about references.
 

earlevel

Addicted to Fun and Learning
Joined
Nov 18, 2020
Messages
550
Likes
779
This is a subject close to my heart. I have put significant research and time into this field. High Res is necessary because

* The claim that Sony and Philips got the best golden ears across the industry they could find at the time in developing Red Book CD, and that 44.1kHz was settled on because a) none of the golden ears could hear a difference to LPs and b) the Nyquist-Shannon sampling theorem reassured them all that they were OK, is a nebulous myth. Human beings haven't changed in psychoacoustics since the 70s and 80s. The standard wasn't biblically blessed and it is not transparent.
* A 16-bit depth is not perfectly suitable because human hearing has around 120dB of dynamic range and quantization to 16 bits doesn't meet this. I would tend to reject noise floor arguments which try to suggest that, once you add the noise floor in, it's in the same ballpark, because arguing that something like 40dB is lost to the listening location doesn't account for a quiet listening room with closed-back headphones.
* A 16-bit depth is a poor quantization approach for use cases where folk or hardware makers want to process digital audio further themselves for specific reasons.
* With the sampling rate, no one educated would try to say that Nyquist-Shannon is somehow wrong. The maths are clear and it's proven time and again. Everyone rational agrees that 44.1kHz is more than sufficient for reproducing the highest frequency that humans can hear, and it's certainly way more than sufficient for the majority of the population who can't even hear much beyond 15kHz.
* But it's "wrong" because we're measuring the wrong things. We need as an industry to focus way more on psychoacoustics. It was a good start to explore the maximum frequency a human could ever hear. Dumb-ass move of the decade, though, for Sony and Philips to try to declare that was it and that now their "golden ears" can't even tell. It's BS. The real question beyond pre-school, the most basic next question, should always have been: OK, so high frequencies are covered, how do we deal with transients and directionality in human hearing?
* To answer that.................44.1 kHz is 22.676 nanoseconds in time. Epic fail. Science knows that human beings have directionality sense in audio below 10 nanoseconds. A human being, even a casual listener and anything but a "golden ear", is very, very good at figuring out directionality, so to be properly transparent a system cannot deliver anything less than what our brain processes every day. After countless evolutionary pressures killing off half-deaf cannon fodder to predators, what has remained through evolution is a human being that has clear perception of directionality to properly respond to threats of harm. And, as well, to cause harm to the other animals we want as food lol
* Yes yes, it's a horror show of countless data sets and all this pesky cardinality of it all in discrete maths......but we are humans and we have capabilities beyond hearing to understand and influence all these phenomena. So if a human needs a few nanoseconds to not lose the sense of directionality, then a 768kHz sample rate has a time interval of 1.302 nanoseconds which, for all that is known to science, exceeds what we determine for directionality
* I am not an expert on psychoacoustics and I dare say, no one on this planet is either. There are massive parts of how the brain does all this which are entirely unknown. We don't actually know the full "specification", but our attempt at doing so with the CD standard was a laughable and childish attempt for sure. There is no way any "golden ear" ever declared CD to be transparent to LP, and the fact is, the most casual human knows it's a digital recording too, because the sample rate is way, way less than what humans need for directionality
As dc655321 said, several things wrong here, but briefly:

16 bits was an obvious choice at the time for several reasons: First, it's two bytes. Sony (or was it Philips?) initially spec'd 14, but if you're going to do 14, you might as well do 16, with a small loss of playing time but otherwise convenient. If you want 17 or 18, it's at a great cost because it needs another byte in normal computer processing, and converters would have been a problem anyway. And to whatever degree one might argue that we can hear more bits, 16 largely covers the practical dynamic range. And 16-bit converters were at the state of the art of the time. Many early CD players used 14-bit converters while using oversampling to drop the noise level.
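
To put rough numbers on the bit-depth point, here is a small Python sketch using the standard textbook approximations for an ideal quantizer: about 6.02 dB per bit plus 1.76 dB for a full-scale sine, and 10*log10(ratio) dB of in-band noise reduction from plain oversampling. The function names and the 4x ratio are illustrative assumptions, not figures from the post.

```python
import math

def sqnr_db(bits):
    """Ideal signal-to-quantization-noise ratio of a full-scale sine, in dB."""
    return 20 * math.log10(2 ** bits) + 10 * math.log10(1.5)

def oversampling_gain_db(ratio):
    """In-band noise reduction from plain oversampling (no noise shaping), in dB."""
    return 10 * math.log10(ratio)

for bits in (14, 16):
    print(f"{bits}-bit: {sqnr_db(bits):.1f} dB")

# Hypothetical example: a 14-bit converter run at 4x the sample rate.
print(f"14-bit at 4x oversampling: {sqnr_db(14) + oversampling_gain_db(4):.1f} dB")
```

Noise shaping, which this sketch leaves out, can move more of the quantization noise out of the audio band than oversampling alone.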

44.1k was chosen as a clock-convenient (related to video recorders, used for digital audio at the time) rate that allowed 20k bandwidth. Again, no golden-eared listeners selecting this, it was purely a practical decision. And they wanted the playing time to be significantly longer than LPs, higher sample rates robbed from that. If you doubled the sample rate, you couldn't get close to fitting an LP of the time on a single CD.

The time resolution thing is a bogus issue, from people who don't understand sampling theory. Time resolution in digital audio is independent of sample rate; it's limited only by bit depth and by how much your ears can resolve. To put it another way, timing features (say, an impulse) can be placed anywhere; they aren't dependent on where the samples are in time. Obviously, high sample rates can record steeper features in the time domain, but that's not a timing resolution issue, it's simply a matter of accommodating higher frequencies.
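
A quick way to see this for yourself is the Python sketch below. It samples a 1 kHz tone at 44.1 kHz twice, once delayed by 1/64 of a sample period, quantizes both to 16 bits with TPDF dither, and then recovers the delay from the phase difference at the tone frequency. The tone frequency, the 1/64 shift, the dither choice, and all names are arbitrary assumptions picked for illustration.

```python
import cmath
import math
import random

FS = 44_100                  # Red Book sample rate, Hz
F0 = 1_000                   # test-tone frequency, Hz (arbitrary choice)
N = 4_410                    # 0.1 s, exactly 100 cycles of F0
AMP = 0.9 * 32767            # tone amplitude in 16-bit LSBs
TRUE_DELAY = (1 / 64) / FS   # 1/64 of a sample period, about 354 ns

def capture(delay_s, rng):
    """Sample a delayed sine at FS and quantize to integers with TPDF dither."""
    out = []
    for n in range(N):
        t = n / FS - delay_s
        dither = rng.random() - rng.random()   # triangular PDF, +/- 1 LSB
        out.append(round(AMP * math.sin(2 * math.pi * F0 * t) + dither))
    return out

def tone_phasor(samples):
    """Project the samples onto a complex exponential at F0 (a single DFT bin)."""
    w = 2 * math.pi * F0 / FS
    return sum(s * cmath.exp(-1j * w * n) for n, s in enumerate(samples))

rng = random.Random(0)
reference = capture(0.0, rng)
shifted = capture(TRUE_DELAY, rng)

# A delay of d seconds shows up as a phase lag of 2*pi*F0*d at the tone frequency.
phase_diff = cmath.phase(tone_phasor(reference) * tone_phasor(shifted).conjugate())
estimated_delay = phase_diff / (2 * math.pi * F0)

print(f"sample period  : {1e6 / FS:.3f} us")
print(f"true delay     : {TRUE_DELAY * 1e9:.1f} ns")
print(f"estimated delay: {estimated_delay * 1e9:.1f} ns")
```

The recovered delay is a small fraction of the sample period, so the sample grid itself is not what limits timing.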
 

j_j

Major Contributor
Audio Luminary
Technical Expert
Joined
Oct 10, 2017
Messages
2,282
Likes
4,789
Location
My kitchen or my listening room.
I don't see how you can claim to be an expert in psychoacoustics - the field just isn't as known and proven as many other disciplines.

Get used to it. We are a rare breed, but there are rather a few of us around.

Once it's known, we'll be able to model it fully in simulation and model all our designs to that.

Catch up, then.

In video, this has been done a lot for use cases like making additional gains in encoding by throwing away data that is less important to humans, assisting the compression. I'm not being emotionally anti or anything; I find the field exciting, and interesting things are happening often within it.

regards

No kidding. I have patents for perceptual image and video coding as well as patents, awards, professional recognition, yadda, yadda for using psychoacoustics for a variety of purposes including, but not limited to, coding.

As to your citations, I see none. Surprise, surprise.

Your claim to "not being emotionally charged" is, however, a joke, when we look at your initial rant, full of nonsense. I rather think your insult above is intended.
 

Bathrone

Member
Joined
Jun 28, 2022
Messages
17
Likes
5
Location
Australia
But even then, even if the ridiculous 10 nanoseconds was true, which it isn't, since it's more like 10 microseconds, tested and verified over and over again, both with analog and digital stimuli, that's still a factor of 100 LONGER than what you get from Red Book CD. Ooopsie.

Hi again. I did not say that the field of psychoacoustics is impossible to know, and I certainly support your rejection of throwing our hands up and getting all nebulous about it for all time. I didn't say it was for all time. What I'm saying is that new research and exciting things happen in the field, and that the field isn't as far along in our history of study as many other fields are. That's not in any way a criticism of you or other people involved in it; on an emotional level I find that to be exciting and interesting and fun.

I do understand what you're trying to assert about Red Book CD. In digital processing using, say, time domain processing techniques, all the maths with the algorithms don't work the way you have interpreted it. The sample rate is central to the DSP filter. Yes, supersampling, oversampling, running more taps between the samples: these things have a lot of value. Ultimately, though, the cadence of all the processing in DSP is keyed off the sample rate. It simply would not be possible to do the calculations you're talking of in the way you're trying to define it with small devices in the current era of computing. I'm not sure about quantum, but even if it could, it's too big for those use cases.
 

Deleted member 16543

Guest
.. and you're talking to someone who thinks that, for reasons related to nonlinearity in the auditory system we need at least 50kHz, and I would go with 64kHz myself ...
This is interesting, as to me the necessity of 50+ kHz sample rate seems to stem from wanting to model the non linear behavior of the ear itself. The question is, why is this NECESSARY?
In my opinion, instead, all that's SUFFICIENT to do is to just replicate the stimuli that trigger that non linear system. That is, the audible spectrum. From that point of view, 44.1 kHz is more than sufficient.
I'm not sold on the reasons (but willing to consider them) why we need to cover stimuli above the audible range just because the ear extends that range due to its non-linearities.
 

tomelex

Addicted to Fun and Learning
Forum Donor
Joined
Feb 29, 2016
Messages
990
Likes
572
Location
So called Midwest, USA
JJ, thanks for your input. It does get tedious, I know; however, correcting these misconceptions once in a while helps keep the audio train on the rails, Sir. And it is a good review for a lot of us who appreciate truth and facts.
 

j_j

Major Contributor
Audio Luminary
Technical Expert
Joined
Oct 10, 2017
Messages
2,282
Likes
4,789
Location
My kitchen or my listening room.
This is interesting, as to me the necessity of 50+ kHz sample rate seems to stem from wanting to model the non linear behavior of the ear itself. The question is, why is this NECESSARY?
In my opinion, instead, all that's SUFFICIENT to do is to just replicate the stimuli that trigger that non linear system. That is, the audible spectrum. From that point of view, 44.1 kHz is more than sufficient.
I'm not sold on the reasons (but willing to consider them) why we need to cover stimuli above the audible range just because the ear extends that range due to its non-linearities.

A good question. Consider: the ear is pretty much a minimum-phase system (there are some reflections inside the cochlea afterwards, but we'll get to that).

The very first time that an inner hair cell fires is when the detection starts. It's all about that. There are several ways to bandlimit a signal, all of which result in a signal that is limited to 20kHz. The FIRST firing. That is nonlinear to the max, really, it's a detection/no detection, nothing more, and it feeds back to the outer hair cells to desensitize the ERB around it inside a millisecond.

You can do a constant delay system to limit the bandwidth, that's what most people call an FIR. That has both 'preringing' and 'postringing', in fact the system is symmetric about the centerpoint. This can (and for really wild delays that result from bad or obsolete systems, does, i.e. pre-echo) start the cochlear compression before the main impulse (which does fire many more neurons). When that happens, it ***CAN*** cause a change in sensation, in that it can slightly reduce the loudness (in an ERB) of the main signal. This clearly happens with excessively poor filter designs, but of course, anyone can do it wrong, and most do. (the point being that the mechanism is proven to work to at least some level)

For that constant delay system, the loudness (remember, loudness is a perceptual attribute, not least-mean-squares energy) of the attack can be changed. That is obvious for pre-echo large enough to cause '2 detections', but less obvious pre-echo still changes the timbre of the attack. Now when filtering between 20 and 22 kHz, the main lobe (width of the filter's more energetic part) is rather wide, in particular 2/(transition bandwidth), or 2/2000 seconds (1 ms) long. That's right on the edge of pre-echo sensation.

When you move to 48kHz you have 2 choices: either wind up with a main lobe width of 2/4000, or have the same 2/2000 but starting at 22 kHz. (I vote for the first).

There are some old results from Tom Stockham's work that showed that 50kHz was a good number, but that does not apply directly to FIR-based antialiasing filters. It does, however, still relate to the filter main lobe width, which thanks to Fourier remains the same length. Stockham used elliptic filters. They are NOT constant delay, and have a different problem, in that different frequencies are delayed by different amounts through the filter, resulting in dispersion. Enough dispersion is certainly audible, but again, when does that end? At about the same point, apparently, as pre-echo. This makes some sense, but does not directly follow.

The 2/5000 length for a 50kHz sampling rate is JUST under the worst known pre-echo sensitivity. Again there is no proof of this, and the test is an absolute mother****er to run, from an equipment and subject point of view.

HOWEVER, running at 64kHz, with a 12 kHz transition band, we're done. Covered, even having some sensitivity well above 20kHz, much better than any analog equipment (nearly all of which is minimum phase, by the way). So my choice is 64kHz.
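
The main-lobe figures in this post are easy to tabulate. The Python sketch below simply applies the rule of thumb used here, main lobe length = 2 / (transition bandwidth), to the transition bands mentioned for each sample rate. The dictionary and function names are made up for illustration.

```python
# Transition bands as used in the post (20 kHz passband edge up to roughly Nyquist).
TRANSITION_HZ = {
    "44.1 kHz": 2_000,
    "48 kHz": 4_000,
    "50 kHz": 5_000,
    "64 kHz": 12_000,
}

def main_lobe_ms(transition_hz):
    """Main-lobe length from the rule of thumb 2 / (transition bandwidth), in ms."""
    return 2 / transition_hz * 1e3

lobes = {rate: main_lobe_ms(bw) for rate, bw in TRANSITION_HZ.items()}
for rate, width in lobes.items():
    print(f"{rate:>8}: {width:.3f} ms")
```

Going from a 2 kHz to a 12 kHz transition band shortens the main lobe by a factor of six, which is the margin the 64 kHz choice is buying.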

Again, there is no test supporting this. There is no test completely discounting it. Nobody's run the test, and probably never will. In the REAL world, with most REAL stimuli, it probably simply does not matter.

Remember, when you drop the "linear system" part, the bandwidth limiting does not do what you think it does.

I hope that helped.
 

j_j

Major Contributor
Audio Luminary
Technical Expert
Joined
Oct 10, 2017
Messages
2,282
Likes
4,789
Location
My kitchen or my listening room.
These things have a lot of value. Ultimately, though, the cadence of all the processing in DSP is keyed off the sample rate. It simply would not be possible to do the calculations you're talking of in the way you're trying to define it with small devices in the current era of computing. I'm not sure about quantum, but even if it could, it's too big for those use cases.

You have no idea what you're talking about, sorry. All you have to do is look at the required time response of either an antialiasing or antiimaging filter, and you will learn otherwise.

Somewhere on this board is a graph where I point out that shifts of 1/64th of a sample are easily captured and reproduced. It's been a while, so go find it instead of being insulting.

I might add, my other most quoted paper is on digital filter design. I'm glad to explain things, but first, you had better start listening.
 

BDWoody

Chief Cat Herder
Moderator
Forum Donor
Joined
Jan 9, 2019
Messages
7,079
Likes
23,512
Location
Mid-Atlantic, USA. (Maryland)

Deleted member 16543

Guest
...

I hope that helped.
It did, somewhat. I'd like to dig a little deeper. More of a free-form reasoning than a rebuttal to what you said.

Discarding the less than ideal filter designs out there that you mentioned (which is an actual issue even in exotically priced converters), and sticking to well designed, linear phase brickwall FIR filters, I think that examining the width of the main lobe of the filter kernel and the pre-ringing can be somewhat misleading, as the kernel is not what's actually at the output of a converter.
If the input is already limited below 20 kHz, there is mathematically no pre- or post-ringing added. The output is exactly (save quantization and aliasing noise) the same as the input. You don't need me to tell you this. However..
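
That last statement can be checked numerically. The Python sketch below builds a linear-phase lowpass (a Blackman-windowed sinc, which is only an approximation of an ideal brickwall), feeds it a sum of tones that all sit below 20 kHz, and compares the output with a purely delayed copy of the input. The cutoff, tap count, test frequencies, and names are assumptions chosen for illustration.

```python
import math

FS = 44_100
CUTOFF = 21_000           # Hz, assumed filter cutoff just above the audio band
TAPS = 255                # odd length gives a symmetric, linear-phase FIR
DELAY = (TAPS - 1) // 2   # group delay of the filter in samples

def lowpass_taps():
    """Blackman-windowed sinc lowpass, normalized to unity gain at DC."""
    fc = CUTOFF / FS      # cutoff in cycles per sample
    taps = []
    for n in range(TAPS):
        k = n - DELAY
        ideal = 2 * fc if k == 0 else math.sin(2 * math.pi * fc * k) / (math.pi * k)
        window = (0.42 - 0.5 * math.cos(2 * math.pi * n / (TAPS - 1))
                  + 0.08 * math.cos(4 * math.pi * n / (TAPS - 1)))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]

def band_limited_input(n):
    """Sum of tones that all lie below 20 kHz (arbitrary test frequencies)."""
    return sum(math.sin(2 * math.pi * f * n / FS) for f in (1_000, 7_000, 19_000))

h = lowpass_taps()
x = [band_limited_input(n) for n in range(2_000)]

# Direct-form convolution, evaluated only where the filter fully overlaps the input.
y = [sum(h[k] * x[n - k] for k in range(TAPS)) for n in range(TAPS - 1, len(x))]

# Compare each output sample with the input sample it should be a delayed copy of.
max_error = max(abs(y[i] - x[i + DELAY]) for i in range(len(y)))
print(f"largest deviation from a pure delay: {max_error:.2e}")
```

The residual comes from the finite filter's passband ripple, not from ringing added to the in-band signal; a longer filter shrinks it further.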

The more interesting scenario is when the input is NOT limited to below 20 kHz. In such case there could be some wiggly-looking things in the output, in the time domain, some of which could be, misleadingly in my opinion, called "pre-ringing". In reality, it's just the result of taking away inaudible frequencies from a signal that had them to begin with.
This scenario could be interesting as the case could be made for those let's call them "pre-wiggles" to start the cochlear compression before their full range original signal would, and potentially not by the same amount.
However, this would mean that frequencies otherwise inaudible on their own, when subtracted from a signal that has them, modify the audible part of the signal.
This is hard to swallow for me, and I would definitely need to see test results to believe it actually does happen. But.. it is a theoretical possibility, at least.
Alas, I agree with you that probably nobody will actually do the test for the foreseeable future.
Besides, there's way bigger fish to fry in the audio chain to worry about this. So for me the verdict, all in all, is still 44.1 kHz is all that's necessary.
 

Deleted member 16543

Guest
I'll give you a little time off to get that together. You are in way over your head.
 

j_j

Major Contributor
Audio Luminary
Technical Expert
Joined
Oct 10, 2017
Messages
2,282
Likes
4,789
Location
My kitchen or my listening room.
The more interesting scenario is when the input is NOT limited to below 20 kHz. In such case there could be some wiggly-looking things in the output, in the time domain, some of which could be, misleadingly in my opinion, called "pre-ringing". In reality, it's just the result of taking away inaudible frequencies from a signal that had them to begin with.

That's kind of right, but also kind of not right. If I put an impulse into the system and run it through the filter (yes, that is most assuredly containing out of band material, of course), you get the antialiasing filter response back.

NOW, you analyze ****that**** response running it through, say, an ERB cochlear filter (you can't bypass this one, it's in your head, say at 18k), did you change the detection vs. putting the impulse directly into the ear canal and then the ERB cochlear filter?

***that*** is the question.

It's not even remotely easy to answer, and since it involves detection being a nonlinear property, you can't just apply the math directly.

Consider, pre-echo, which is most surely audible, is purely in-band. No out-of-band material at all.

So, if I put an impulse into your inner hair cell in two ways, one through an anti-imaging (or aliasing) filter, and the other not, will they respond at different times?

Back of the envelope actually suggests 'yes'. It would explain Stockham's old results (sigh, which he kept for years and did not publish; then he became ill and never published them) very cleanly, and without having to invent super-hearing or some new mechanism.

In any case, going to 64 seems ideal to me, it is a very good engineering choice, and is probably fine for most of us over 40, for sure. (that crap at 18kHz is no longer our problem).
 

Deleted member 16543

Guest
That's kind of right, but also kind of not right. If I put an impulse into the system and run it through the filter (yes, that is most assuredly containing out of band material, of course), you get the antialiasing filter response back.

NOW, you analyze ****that**** response running it through, say, an ERB cochlear filter (you can't bypass this one, it's in your head, say at 18k), did you change the detection vs. putting the impulse directly into the ear canal and then the ERB cochlear filter?

***that*** is the question.

It's not even remotely easy to answer, and since it involves detection being a nonlinear property, you can't just apply the math directly.

Consider, pre-echo, which is most surely audible, is purely in-band. No out-of-band material at all.

So, if I put an impulse into your inner hair cell in two ways, one through an anti-imaging (or aliasing) filter, and the other not, will they respond at different times?

Back of the envelope actually suggests 'yes'. It would explain Stockham's old results (sigh, which he kept for years and did not publish; then he became ill and never published them) very cleanly, and without having to invent super-hearing or some new mechanism.

In any case, going to 64 seems ideal to me, it is a very good engineering choice, and is probably fine for most of us over 40, for sure. (that crap at 18kHz is no longer our problem).
If you put an impulse (which is not band limited by its nature) through an anti-aliasing filter you remove the inaudible frequencies (and only those), like I was saying.
Is there an audible difference?
As you say.. ***that*** is the question.
To me the answer is very likely not. Even if the kernel of the anti-aliasing filter does have all components below 20 kHz, a sinc representing a 20 kHz linear phase brickwall sounds to me the same as any other representing a cut-off frequency above that.
And if there's an audible difference it's probably to be attributed to the bigger fish I was talking about, like IMD and tweeter resonance, for example.
Again, I'm only limiting my argument to well designed linear phase brick-wall filters. If there's phase shift within the audio band, or amplitude roll-off, I have no problem admitting it's audible.
But as long as the amplitude and phase of the content below 20 kHz is accurately reproduced, I don't see why removing ultrasonic content should cause any audible difference.
Will we ever know for sure? Probably not..
 

DonH56

Master Contributor
Technical Expert
Forum Donor
Joined
Mar 15, 2016
Messages
7,894
Likes
16,708
Location
Monument, CO
You'd think before telling one of the best-known experts in the field he's not an expert and doesn't understand what he's talking about the person would've at least checked for a bio or something...

A question on the band-limited impulse response: does Gibbs apply, and would that lead to audible artifacts? I ran into Gibbs again in my day job (at about 30 GHz, not audio) and got to wondering if it really mattered for this (audio) case...
 