
MQA creator Bob Stuart answers questions.

Sergei

Senior Member
Forum Donor
Joined
Nov 20, 2018
Messages
361
Likes
272
Location
Palo Alto, CA, USA
Weren't you saying earlier that 192 kHz PCM was 100% sufficient? That MQA was an intelligent compression method but not an improvement on a sufficiently high sampling rate and bit depth?

According to the classic DSP and hearing theories, 48/24 PCM should be 100% sufficient. I'm still saying that 192/24 appears to be sufficient: for me, for the music I've heard so far. Other people say that 384/24 works even better for them, for recordings of symphony orchestras.

I retracted my statement regarding Vorbis. It can resort to a bit-perfect representation of 192/24 for the frames where its encoder detects transients. I was originally discussing lossy codecs in the context of individual compressed frames; Amir pointed out that a more appropriate context is the whole song.

I would add today that an even more appropriate context is the actual statistics of the streams played by a provider. I guess Spotify can afford an occasional hit on its internet bandwidth bill from a complex song with many transients, as long as the bulk of the songs played are unsophisticated Pop and EDM.
 

amirm

Founder/Admin
Staff Member
CFO (Chief Fun Officer)
Joined
Feb 13, 2016
Messages
44,808
Likes
242,830
Location
Seattle Area
A couple of papers talking about the differences between sinusoids and transients:
Please stop googling for this stuff. It is not good for your audio health.

Bandwidth and rise time are completely linked. The formula is everywhere, as is its derivation: BW = 0.35/rise time. If you set the rise time to 0, i.e., a perfect square wave, then the bandwidth becomes infinite, as it should. If you set the rise time to higher values, then you don't need as much bandwidth. You can derive this from the Fourier Transform.
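Where the 0.35 comes from: for a first-order (single-pole RC) low-pass with the rise time measured from 10% to 90%, the step response is 1 - e^(-t/τ), so t_r = τ·ln 9, while the -3 dB bandwidth is 1/(2πτ); their product is ln 9/(2π) ≈ 0.35. Other filter shapes give somewhat different constants. A minimal Python sketch that checks this numerically, assuming that single-pole model and a hypothetical 1 µs time constant; by the same formula, the 70 kHz figure below corresponds to a 5 µs rise time:

```python
import math

def rise_time_10_90(tau, dt=None):
    """Measure the 10%-90% rise time of the single-pole step response 1 - exp(-t/tau)
    by stepping through time in increments of dt."""
    if dt is None:
        dt = tau / 10_000
    t, t10 = 0.0, None
    while True:
        y = 1.0 - math.exp(-t / tau)
        if t10 is None and y >= 0.1:
            t10 = t
        if y >= 0.9:
            return t - t10
        t += dt

def bandwidth_from_rise_time(t_r):
    """Rule of thumb quoted in the post: BW = 0.35 / rise time."""
    return 0.35 / t_r

tau = 1e-6                               # hypothetical 1 microsecond RC time constant
t_r = rise_time_10_90(tau)
f_3db = 1.0 / (2 * math.pi * tau)        # -3 dB bandwidth of that RC low-pass
print(f"BW * t_r = {f_3db * t_r:.3f}")   # close to ln(9)/(2*pi), about 0.35
print(f"{bandwidth_from_rise_time(5e-6) / 1e3:.0f} kHz for a 5 us rise time")  # 70 kHz
```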

No music can produce the rise time you mentioned unless it has a bandwidth of 70 kHz. If it does, then a digital system with 176 kHz sampling will nicely capture it, and it will do so with 20+ bits of dynamic range. No analog system has a prayer of doing anything remotely close to this. You can't even extract the 70 kHz content out of the noise, let alone with meaningful dynamic range.
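Both numbers in this argument follow from standard formulas: a sampling rate fs represents content up to the Nyquist frequency fs/2, and an ideal N-bit quantizer has a full-scale-sine SNR of about 6.02·N + 1.76 dB. A small Python sketch, assuming ideal conversion (real converters fall short of the theoretical figure) and taking 176.4 kHz (4 × 44.1 kHz) as the concrete rate behind "176 kHz":

```python
def nyquist_hz(fs_hz):
    """Highest frequency a sampled system can represent without aliasing."""
    return fs_hz / 2.0

def ideal_dynamic_range_db(bits):
    """Theoretical SNR of an ideal N-bit quantizer for a full-scale sine wave."""
    return 6.02 * bits + 1.76

fs = 176_400          # 4 x 44.1 kHz
signal_bw = 70_000    # bandwidth figure from the post
print(nyquist_hz(fs), nyquist_hz(fs) > signal_bw)   # 88200.0 True
print(round(ideal_dynamic_range_db(20), 2))         # about 122 dB for 20 bits
```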
 

amirm

[Quoting Sergei's post above on the statistics of streams played by a provider.]
When lossy codecs are used in streaming, the content is encoded using constant bit rate (CBR) mode. Here, a time window of, say, 1 to 5 seconds is specified, during which the data rate is guaranteed not to average more than what is encoded. So any peaks have to be compensated by valleys that don't need as many bits. If you don't get lucky with this coincidence, then distortion sets in. Bit rate always trumps fidelity in CBR mode.

There is another mode, which is the default for Ogg Vorbis: variable bit rate (VBR) mode. There the peaks can be anything they want for as long as they want. This makes VBR files harder to stream, so it is likely not used by the likes of Spotify. Any time you see a bit rate designation for a lossy encoded file, e.g. 128 kbps, it is using CBR mode.
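The CBR trade-off can be illustrated with a toy model. The sketch below assumes hypothetical per-frame bit "demands" (the bits an encoder would like to spend on each frame) and compares a CBR allocator with a bounded bit reservoir, similar in spirit to MP3's bit reservoir, which can only borrow bits saved in earlier frames, against unconstrained VBR. Frame sizes, reservoir size and demands are made up and do not describe any particular codec:

```python
def cbr_allocate(demands, bits_per_frame, reservoir_max):
    """Grant each frame what it asks for, limited by a constant per-frame budget
    plus bits saved earlier in a bounded reservoir. Returns the bits granted."""
    reservoir = 0
    granted = []
    for want in demands:
        available = bits_per_frame + reservoir
        give = min(want, available)
        # unspent bits are saved for later frames, up to the reservoir size
        reservoir = min(available - give, reservoir_max)
        granted.append(give)
    return granted

def vbr_allocate(demands):
    """Unconstrained VBR: every frame gets exactly what it asks for."""
    return list(demands)

# hypothetical demands: a quiet passage, then a burst of transients
demands = [2000, 2000, 2000, 9000, 9000, 9000, 2000, 2000]
cbr = cbr_allocate(demands, bits_per_frame=4000, reservoir_max=6000)
vbr = vbr_allocate(demands)
starved = [i for i, (want, got) in enumerate(zip(demands, cbr)) if got < want]
print("CBR granted:", cbr)
print("frames short of bits under CBR:", starved)
print("VBR average:", sum(vbr) / len(vbr), "bits/frame")
```

With these made-up numbers, the reservoir covers the first transient frame, but frames 4 and 5 receive fewer bits than requested (the point where the post says distortion sets in), while the VBR version simply averages 4625 bits per frame, above the 4000-bit CBR budget.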
 

svart-hvitt

Major Contributor
Joined
Aug 31, 2017
Messages
2,375
Likes
1,253
This is a never-ending thread giving new members the impression that digital is faulty while LP captures reality. So I post this link once again, an article written by Jim Lesurf on the limits of the playback medium:

He concludes thus:

«You should now understand that the terms ‘analog’ and ‘digital’ are based on idealisations. Real systems and signals will show a mixture of analog (smooth continuous) and digital (quantised) properties. Although it's often convenient to assume a signal/system is one thing or the other, this mixed behaviour is an unavoidable consequence of the way the world works».

Source: https://www.st-andrews.ac.uk/~www_pa/Scots_Guide/iandm/part12/page2.html
 

Blumlein 88

Grand Contributor
Forum Donor
Joined
Feb 23, 2016
Messages
20,909
Likes
37,973
[Quoting amirm's post above on CBR and VBR encoding in streaming.]
Spotify used Ogg at one time. I think they use AAC currently. Amazon Music uses VBR 256 kbps at least for some of its available music.
 

amirm

Let's take a tweeter capable of reproducing a 20 kHz sinusoid at full amplitude. If it can truly track the analog signal in real time, this means that its diaphragm can go from zero displacement and zero velocity to full displacement in 12.5 microseconds. If we only need to go to 42% of amplitude for the pulse described above, such a tweeter can reproduce the pulse too, because the tweeter's slew rate is high enough.

But a tweeter capable of reproducing a 20 kHz sinusoid at full amplitude isn't necessarily capable of tracking the analog signal in real time. It may take its sweet time to get there, requiring multiple signal cycles to reach full amplitude. Formally, it is rated up to 20 kHz, or maybe even up to 40 kHz. Yet it is not capable of reproducing the pulse.
All wrong. Sorry.

That pulse can be decomposed using the Fourier Transform, which mandates that you have the higher-frequency harmonics. If you don't, then you have broken the Fourier Transform, which is impossible because it is mathematically proven.

If a pulse is "taking its time" then its rise time is very slow and therefore, it is not the pulse you are thinking about.
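To put a number on why a fast pulse needs the higher-frequency content: an idealized rectangular pulse of width T has an energy spectrum proportional to sinc²(fT), so by Parseval's theorem the fraction of its energy below a cutoff fc is 2·∫₀^(fc·T) sinc²(u) du, which depends only on fc·T. A minimal Python sketch, assuming an ideal rectangular pulse (real acoustic pulses have smoother edges and therefore less high-frequency energy) and a hypothetical 10 µs width:

```python
import math

def sinc_sq(u):
    """Normalized sinc squared, (sin(pi*u) / (pi*u))**2, equal to 1 at u = 0."""
    if u == 0.0:
        return 1.0
    s = math.sin(math.pi * u) / (math.pi * u)
    return s * s

def energy_fraction_below(cutoff_hz, width_s, n=2000):
    """Fraction of a rectangular pulse's energy at |f| < cutoff_hz, computed as
    2 * integral_0^(cutoff*width) sinc^2(u) du with Simpson's rule (n must be even)."""
    upper = cutoff_hz * width_s
    h = upper / n
    total = sinc_sq(0.0) + sinc_sq(upper)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * sinc_sq(i * h)
    return 2.0 * total * h / 3.0

T = 10e-6                                # hypothetical 10 microsecond pulse
for fc in (20e3, 50e3, 100e3):           # 100 kHz = 1/T, the first spectral null
    print(f"{fc / 1e3:>5.0f} kHz: {energy_fraction_below(fc, T):.3f} of the pulse energy")
```

Under these assumptions only about 38% of the pulse's energy lies below 20 kHz, and about 90% lies within the main lobe up to 1/T = 100 kHz; a slower pulse (larger T) moves that energy down in frequency.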
 

amirm

Those who are interested in learning about the 5% will read my postings in this thread and follow the references to supporting information they contain. Interested readers will learn that the mammalian hearing system doesn't perform a Fourier Transform, that the Sampling Theorem doesn't actually apply to PCM, and that the cochlea is neither a Linear nor a Time-Invariant system. Those who are not interested ... well ... that's their choice.
I don't think anyone is going to bother following any of those obscure links. I know I hit the quit button as soon as I get there, realizing they are completely out of the context of the discussion.

No one has said human hearing is a Fourier Transform. An FT analysis, however, is essential for understanding the nature of the signals we are talking about. As I noted in my other post, you cannot say anything that breaks a Fourier analysis, because the transform is mathematically provable and so not subject to ifs and buts.

Once we diagnose a signal, we can apply psychoacoustics to it. If you have a signal that extends to 70 kHz, we know the ear won't hear it, period. Indeed, a square wave at 15 kHz sounds like a pure tone, not a square wave, because you can't hear the harmonics.
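The 15 kHz square-wave example follows from its Fourier series: an ideal square wave contains only odd harmonics, at n·f with amplitudes 4/(π·n) relative to its peak. A short Python sketch listing which harmonics fall below a 20 kHz hearing limit (the 20 kHz cutoff is the usual textbook assumption, not a figure from the post):

```python
import math

def square_wave_harmonics(f0_hz, max_hz):
    """(frequency, amplitude) of the odd harmonics of an ideal unit square wave below max_hz.
    Fourier series: x(t) = (4/pi) * sum over odd n of sin(2*pi*n*f0*t) / n."""
    out = []
    n = 1
    while n * f0_hz < max_hz:
        out.append((n * f0_hz, 4.0 / (math.pi * n)))
        n += 2
    return out

audible = square_wave_harmonics(15_000, 20_000)
print(audible)   # only the 15 kHz fundamental; the next harmonic is at 45 kHz
print(square_wave_harmonics(1_000, 20_000)[:3])   # a 1 kHz square keeps many harmonics
```

Band-limiting a 15 kHz square wave to the audible range therefore leaves a single sine, which is why it sounds like a pure tone.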
 

Sergei

[Quoting amirm's post above on bandwidth and rise time.]

OK. Let me explain what I'm saying from a different angle. Nothing about Fourier Transform anymore :) Imagine that you are in a boxing gym in front of a heavy punching bag. You have an equally strong sparring partner, standing on the opposite side of the bag.

First experiment: you push on the bag slowly, the pressure of your hand approximating a sinusoid. The bag starts moving. Once it passes the zero-velocity point on the partner's side, the partner does what you just did. Continue the exercise, each of you pushing on the bag while it is moving away from you, until the bag reaches a very high amplitude. That's an approximation of normal basilar membrane resonance behavior.

Second experiment: you and your partner are punching the bag, each of you landing a strong straight every second, yet out of phase, so that the bag is hit on your side at 0 milliseconds, then on your partner's side at 500 milliseconds, then on your side at 1,000 milliseconds, and so on. You can do this for a very long time, assuming you are very fit :) Yet the bag will not reach the amplitude that it reached in the first experiment. That's an approximation of a basilar membrane response to an ultrasonic wave.

Third experiment: you punch the bag, once. The bag won't reach the amplitude of the first experiment. Yet it won't stay in place, as it effectively did during the second experiment, either. It will move, because you transferred mechanical momentum (mass multiplied by velocity) to it, which was not quickly counteracted by your partner.

Eventually, the bag will settle in an oscillation with its own characteristic frequency, no matter how softly and slowly, or how hard and quickly, you punched it. The maximum amplitude it reaches will depend on the mechanical momentum transferred by your punch. You can try a "noodle slap" vs "all my body weight in" punches and observe the difference. That's an approximation of basilar membrane response to a short pulse.

In the first and second experiments, we can talk about the frequency, because the applications of force are repetitive. In the third experiment, we can't, because with just one punch, how do we determine the time period until the second punch? The second punch simply doesn't come during the third experiment.

Analogously, the basilar membrane effectively ignores a vigorous application of ultrasonic force, as in the second experiment. Yet it has to react, somehow, upon receiving half of that ultrasonic wave, as in the third experiment, because the half wave is asymmetrical and thus transfers mechanical momentum.
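The three experiments can be played out with a textbook linear model: a lightly damped mass on a spring, m·x'' + c·x' + k·x = F(t). This is a stand-in for the heavy bag only, not a model of the cochlea, and the natural frequency (1 Hz), damping ratio (0.05), force level and the 10 Hz "fast" drive are all hypothetical. The sketch compares sustained drive at the natural frequency, equally strong drive ten times faster, single punches of two strengths (modeled as momentum delivered at once, i.e. an initial velocity), and one isolated half cycle of the fast drive:

```python
import math

def peak_displacement(force, v0=0.0, f0=1.0, zeta=0.05, m=1.0,
                      dt=1e-3, duration=60.0, measure_from=0.0):
    """Integrate m*x'' + c*x' + k*x = force(t), starting at x = 0 with velocity v0,
    using classic 4th-order Runge-Kutta. Returns the peak |x| at times >= measure_from."""
    w0 = 2 * math.pi * f0
    k = m * w0 ** 2            # stiffness that gives natural frequency f0
    c = 2 * zeta * m * w0      # damping coefficient for damping ratio zeta

    def accel(t, x, v):
        return (force(t) - c * v - k * x) / m

    x, v, t, peak = 0.0, v0, 0.0, 0.0
    h = dt / 2
    for _ in range(int(round(duration / dt))):
        k1x, k1v = v, accel(t, x, v)
        k2x, k2v = v + h * k1v, accel(t + h, x + h * k1x, v + h * k1v)
        k3x, k3v = v + h * k2v, accel(t + h, x + h * k2x, v + h * k2v)
        k4x, k4v = v + dt * k3v, accel(t + dt, x + dt * k3x, v + dt * k3v)
        x += dt / 6 * (k1x + 2 * k2x + 2 * k3x + k4x)
        v += dt / 6 * (k1v + 2 * k2v + 2 * k3v + k4v)
        t += dt
        if t >= measure_from:
            peak = max(peak, abs(x))
    return peak


def sine_push(amplitude, freq_hz):
    """A sustained sinusoidal force, standing in for the alternating pushes or punches."""
    return lambda t: amplitude * math.sin(2 * math.pi * freq_hz * t)


def no_force(t):
    return 0.0


F0, F_NATURAL, F_FAST = 1.0, 1.0, 10.0   # hypothetical force and frequencies

# Experiment 1: sustained drive at the natural frequency (resonance builds up)
resonant = peak_displacement(sine_push(F0, F_NATURAL), measure_from=50.0)
# Experiment 2: equally strong drive, ten times faster than the natural frequency
fast = peak_displacement(sine_push(F0, F_FAST), measure_from=50.0)
# Experiment 3: single punches, i.e. momentum J delivered at once (v0 = J / m, m = 1)
soft = peak_displacement(no_force, v0=0.1, duration=10.0)
hard = peak_displacement(no_force, v0=0.2, duration=10.0)
# A lone half cycle of the fast drive; its net impulse is F0 / (pi * F_FAST)
fast_push = sine_push(F0, F_FAST)
half_cycle = peak_displacement(
    lambda t: fast_push(t) if t < 0.5 / F_FAST else 0.0, duration=10.0)

print(f"resonant / fast drive amplitude: {resonant / fast:.0f}")
print(f"hard / soft punch amplitude:     {hard / soft:.2f}")
print(f"half cycle / fast drive:         {half_cycle / fast:.1f}")
```

In this linear model the sustained fast drive barely moves the mass, the peak after a single punch scales in proportion to the momentum delivered, and a lone half cycle moves it because its net impulse is nonzero. In Fourier terms, that net impulse equals the force's spectrum at 0 Hz, i.e. low-frequency content that a sustained sine drive does not have.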
 

Blumlein 88

That would be for their downloads. I was speaking for streaming.
"Where possible, we encode our MP3 files using variable bit rates for optimal audio quality and file sizes, aiming at an average of 256 kilobits per second (kbps). Using a variable bit rate allows us to allocate a higher bit rate to the more complex sections of music files while using a smaller bit rate for the less complex sections. The average of these rates is then calculated to produce an average bit rate for the entire file that represents the overall sound quality. Some of our content is encoded using a constant bit rate of 256 kbps."

So at least when I listen to Amazon Music, it isn't a continuous stream. It downloads an entire track and stops. If you have good bandwidth, that takes a few seconds. Soundcloud works the same way. I don't know how Spotify does it. With Amazon and Soundcloud, it's obvious that I'm downloading the track rather than live streaming it.
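The "few seconds" figure is easy to sanity-check. A rough Python sketch, assuming a hypothetical 4-minute track at an average of 256 kbps and a few example connection speeds, with protocol overhead ignored:

```python
def track_size_bits(duration_s, avg_kbps):
    """Size of an encoded track in bits at a given average bit rate."""
    return duration_s * avg_kbps * 1000

def download_seconds(size_bits, link_mbps):
    """Time to fetch size_bits over a link of link_mbps, ignoring overhead."""
    return size_bits / (link_mbps * 1_000_000)

size = track_size_bits(4 * 60, 256)          # 61,440,000 bits, about 7.7 MB
for mbps in (10, 50, 100):
    print(f"{mbps:>3} Mbps: {download_seconds(size, mbps):.1f} s")
```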

Some FM radio station websites, by contrast, do stream continuously with a slight buffer. Here is a good site for finding stations that suit your musical taste and streaming them:
https://www.internet-radio.com/
 

Blumlein 88

[Quoting Sergei's punching-bag post above.]
[Image: facepalm statue]
 

Hugo9000

Addicted to Fun and Learning
Joined
Jul 21, 2018
Messages
576
Likes
1,760
Location
U.S.A. | Слава Україні
I will be happy when the political season for the next election cycle heats up and a certain person's duties spreading FUD on Twitter and Facebook occupy all of his time. Are audio forums a side interest to that job, or perhaps a warmup during political lulls?
 

SIY

Grand Contributor
Technical Expert
Joined
Apr 6, 2018
Messages
10,589
Likes
25,482
Location
Alfred, NY
What's the opposite of "value-added"?
 

Blumlein 88

[Quoting Sergei's punching-bag post above.]

Okay, let's just spitball this explanation in fairly unrelated terms. That seems to be a favorite method. I mean literally spitball this.

A punching bag weighs 100 pounds (45 kilos). I make a paper spitball and shoot it through a regulation straw from 10 feet (3 meters) away. The spitball hits the bag and bounces off. We know it must have imparted some momentum. We can even calculate it if necessary. We might also need to include how much of the energy was absorbed by deformation of the paper spitball vs. the relatively stiff punching bag. And whatever energy did get to the interior of the punching bag (coupled through some impedance, we'll say) will be largely absorbed by the grains of sand in the bag. In fact, there's a good chance that most or all of this energy gets turned into heat, and no momentum that can be ascertained is imparted to the bag. Which likely means the bag doesn't start to move, and repeated high-frequency spitballs won't make it move.
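Rough numbers for the spitball, assuming a hypothetical 0.5 g spitball at 10 m/s against the 45 kg bag. Momentum is conserved, so in a rigid-body picture the bag does receive the spitball's momentum (up to twice it for a perfectly elastic bounce); the sketch shows how little velocity even that upper bound gives the bag, before any of the internal losses described above:

```python
def bag_velocity(projectile_kg, projectile_mps, bag_kg, restitution=1.0):
    """Velocity given to a free bag by a small projectile.
    restitution=1 is an elastic bounce (momentum change 2*m*v); 0 means it sticks (m*v).
    Assumes projectile_kg << bag_kg, so the bag's recoil during impact is neglected."""
    momentum_transfer = (1.0 + restitution) * projectile_kg * projectile_mps
    return momentum_transfer / bag_kg

v_bag = bag_velocity(0.0005, 10.0, 45.0)     # hypothetical 0.5 g spitball at 10 m/s
print(f"bag speed: {v_bag * 1000:.3f} mm/s")  # roughly 0.2 mm/s
```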

Maybe I should offer some links on the acoustic impedance of the ear. But I feel that would be too directly related to the topic. So I'll forgo that.
 

MRC01

Major Contributor
Joined
Feb 5, 2019
Messages
3,503
Likes
4,145
Location
Pacific Northwest
[Quoting the last paragraph of Sergei's punching-bag post above.]
This topic of how well digital encoding captures transients seems to be putting the cart before the horse. The first step is to determine whether it is relevant to human hearing. That means devising a test. Record or construct a high-def (say, 192 kHz, 24-bit) audio signal with transient pulses that you believe embody this idea. It doesn't have to be music; it can be any kind of sound or even a synthetic test signal. Now resample it to 44.1 kHz/16-bit (with proper anti-alias filtering, resampling, dither, etc.). The only differences between these two audio files are the bandwidth and the bit depth. Play them back in a controlled experiment or ABX test and see whether people can differentiate them better than random guessing.
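"Better than random guessing" in an ABX test is usually judged with a one-sided binomial test: under the null hypothesis each trial is a coin flip (p = 0.5), and the p-value is the probability of getting at least k correct out of n trials by luck. A minimal Python sketch; the 16-trial run and the 0.05 threshold are common conventions, not requirements:

```python
from math import comb

def abx_p_value(correct, trials):
    """One-sided binomial p-value: P(X >= correct) when X ~ Binomial(trials, 0.5)."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

def min_correct_for(trials, alpha=0.05):
    """Smallest number of correct answers whose p-value is at or below alpha."""
    for k in range(trials + 1):
        if abx_p_value(k, trials) <= alpha:
            return k
    return None

print(abx_p_value(12, 16))      # 2517 / 65536, about 0.038
print(min_correct_for(16))      # 12
```

So with 16 trials a listener needs at least 12 correct answers before the result is unlikely (p < 0.05) to be guessing.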

But wait... you don't have to do that because it's already been done.

BTW, what's wrong with Fourier transforms? Any sound that actually propagates in air, that we can hear, is by definition a continuous bandwidth-limited function that can be perfectly represented by a FT.
 

Cosmik

Major Contributor
Joined
Apr 24, 2016
Messages
3,075
Likes
2,181
Location
UK
[Quoting Sergei's post above on 48/24, 192/24 and 384/24 PCM.]
So where did the 95% come from? I'm not trying to pin you down on the exact figure, but several pages back you said it was 100% without any caveat.
 

agtp

Member
Joined
Dec 16, 2018
Messages
95
Likes
60
Well, the MQA shill has successfully derailed the thread and completely shifted the focus away from Bob and MQA. Has no one noticed yet? Is Sergei still arguing for MQA with all these posts?
 