
MQA creator Bob Stuart answers questions.

Costia

Member
Joined
Jun 8, 2019
Messages
37
Likes
21
The compression ratio is compressed size / uncompressed size × 100, so lower is better.

So it says it's 1.4% less efficient?
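The definition above is easy to state in code. A minimal sketch (the byte counts are made-up examples, not figures from the table); multiplying before dividing keeps the example result exact:

```python
def compression_ratio(compressed_bytes, uncompressed_bytes):
    """Ratio as a percentage of the original size; lower is better."""
    return compressed_bytes * 100 / uncompressed_bytes

# e.g. a 100 MB WAV that a lossless codec shrinks to 58 MB:
print(compression_ratio(58_000_000, 100_000_000))  # 58.0
```

So a codec whose ratio is 1.4 percentage points higher produces files 1.4% of the original size larger, which is the "less efficient" reading above.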
 

MRC01

Major Contributor
Joined
Feb 5, 2019
Messages
3,407
Likes
4,004
Location
Pacific Northwest
The best in the chart is OptimFROG which is only 2.4% smaller than FLAC, and much slower.
I wonder if we can get a mathematical baseline to estimate the theoretical limit: what is the smallest % that is possible? That's what I meant earlier by measuring Shannon entropy of PCM/WAV files.
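One crude way to start on that baseline is the zeroth-order Shannon entropy of the sample histogram. A hedged sketch: this deliberately ignores inter-sample correlation, which is exactly what FLAC's predictors exploit, so it is only a rough upper bound on the achievable bits per sample, not the theoretical limit the post is asking about:

```python
import math
from collections import Counter

def entropy_bits_per_sample(samples):
    """Zeroth-order Shannon entropy: H = -sum(p * log2(p)) over the histogram."""
    n = len(samples)
    counts = Counter(samples)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Two equally likely sample values carry exactly one bit per sample:
print(entropy_bits_per_sample([0, 1] * 500))  # 1.0
```

A real estimate would need conditional entropy on previous samples (or entropy of the prediction residual), which is much closer to what the codecs in the table actually achieve.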
 
Last edited:

mansr

Major Contributor
Joined
Oct 5, 2018
Messages
4,685
Likes
10,697
Location
Hampshire
FLAC is not that good, although the distance between good and great is very small. My team at Microsoft developed WMA Lossless with the goal of beating all the others in efficiency. You can see that ("WMAL") in this table from HA wiki: https://wiki.hydrogenaud.io/index.php?title=Lossless_comparison

View attachment 27481

That extra 1.4% of efficiency cost us in the marketplace: at the time, many embedded CPUs were too slow to decode it.

This is a bit like getting cream/butter from milk. The first pass will take out most of the fat. What is left over is very little.
According to that table, WMAL has slightly worse efficiency than FLAC. Of course, the exact results will depend on the material, but it's not looking good for WMAL.
 

Blumlein 88

Grand Contributor
Forum Donor
Joined
Feb 23, 2016
Messages
20,451
Likes
36,880
I can get FLAC to compress a 24-bit WAV file to only 8.9% of the size of the original. Of course, I cheated: I was encoding a tone at a quarter of the sample rate. Looks like that is as good as FLAC gets, because encoding all zeros gave the same file size in FLAC.
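The "cheat" is easy to reproduce. A hedged sketch of the idea (the post used 24-bit; this writes 16-bit mono via the stdlib `wave` module for simplicity, and the filename and amplitude are arbitrary): a sine at exactly fs/4 visits only four sample values per period, so FLAC's linear predictor models it almost perfectly.

```python
import struct
import wave

FS = 44100
A = 30000  # peak amplitude, just below 16-bit full scale

# One period of a sine sampled at exactly FS/4: [0, A, 0, -A] repeating.
pattern = [0, A, 0, -A]

with wave.open("quarter_rate_tone.wav", "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)    # 16-bit samples
    w.setframerate(FS)
    frames = bytearray()
    for n in range(FS):  # one second of audio
        frames += struct.pack("<h", pattern[n % 4])
    w.writeframes(frames)
```

Running `flac` on the result should get close to the floor described above, since the residual after prediction is essentially zero.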
 

MRC01

Major Contributor
Joined
Feb 5, 2019
Messages
3,407
Likes
4,004
Location
Pacific Northwest
You might get it even smaller if the amplitude is lower: all the higher order bits will be zero (assuming integer, not float).
The theoretical minimum will depend on the content of the music. Tests like yours give a lower bound. The upper bound could be measured from PCM files of complex signals having wide dynamic range with peak levels near max / 0 dB.
 

Costia

Member
Joined
Jun 8, 2019
Messages
37
Likes
21
Upper bound is no compression at all. Try compressing white noise.
It can actually be bigger than the source file due to metadata.
 

PierreV

Major Contributor
Forum Donor
Joined
Nov 6, 2018
Messages
1,437
Likes
4,686
Mathematically, I can define a square wave that has a single, unique value for every time t; no need for a single time t to have multiple values:
F(t) = sign(cos(t))
Consider a transition point (say, t = pi/2). When t = pi/2, my square wave (voltage) is 0. Just before that, it's 1. Just after that, it's -1.
Thus, there are never multiple voltages at the same time. But there is an infinite rate of change, which is impossible in the real world.

Yes, I am aware of that; there are still discontinuities, though, and it is a convergence issue. The Heaviside approach is a convention, but a useful tool. sin works as well, of course, as do many other constructions. IMHO it is a discontinuous pulse train, not a "wave". The definition of a function requires each point of the domain to map to exactly one point of the co-domain, and that would be violated by the theoretical perfect square "wave" as drawn on audio sites. In the real world one needs a dx, even an infinitely small one; in that case it is a function. Without a dx, it is not a function, or is one by convention only.

It's nitpicking, maybe, but many people object, rightly imho, to the staircase representation of a digitally sampled signal as it opens the door to misinterpretation.

I feel the "square wave" representation leads to the same type of misinterpretation. Here's the mathematically correct representation of the function you gave for the "square wave" - a bit different from what people usually picture in their mind.

1560214858249.png

1560215327708.png


Note: not trying to lecture or be pedantic in any way, and I am sure you are well aware of all this. But if the staircase representation should be criticized as it is rightly in that famous video, the square "wave" deserves the same treatment I think.
 

MRC01

Major Contributor
Joined
Feb 5, 2019
Messages
3,407
Likes
4,004
Location
Pacific Northwest
I get your point. I wasn't trying to be pedantic either, but point out that of the 2 reasons mentioned here why a square wave can't exist in nature, one is more fundamental than the other.
Reason 1: because it has multiple values for a single instant in time
Reason 2: because it requires an infinite rate of change
The first reason is not fundamental because it is possible to define a square wave function that doesn't have multiple Y values for a given X value. However, the second is fundamental because it is impossible to define a square wave that doesn't have an infinite rate of change.
Whether a square wave really is a wave... I'll leave that to the philosophers. Suffice to say that it's a term in common use.
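The band-limit point can be made concrete by truncating the Fourier series of a square wave. A minimal sketch (the 1 kHz fundamental is an illustrative choice; the 25 kHz and 1 MHz cutoffs echo the bandwidths discussed later in the thread): with more harmonics the edges sharpen, but the slew rate at each transition stays finite, so the infinite rate of change never appears.

```python
import math

def bandlimited_square(t, f0, max_freq):
    """Fourier series of a square wave, truncated at max_freq:
    (4/pi) * sum over odd k of sin(2*pi*k*f0*t) / k."""
    total = 0.0
    k = 1
    while k * f0 <= max_freq:
        total += math.sin(2 * math.pi * k * f0 * t) / k
        k += 2
    return 4 / math.pi * total

# A 1 kHz square wave limited to 25 kHz vs 1 MHz of bandwidth,
# evaluated mid-plateau (t = 0.25 ms): both sit near +1,
# but the ripple near the edges differs.
print(bandlimited_square(2.5e-4, 1000, 25e3))
print(bandlimited_square(2.5e-4, 1000, 1e6))
```

This is also why the "perfect" square wave picture is a limit object: every physically realizable (band-limited) version is a smooth function with finite slope everywhere.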
 

MRC01

Major Contributor
Joined
Feb 5, 2019
Messages
3,407
Likes
4,004
Location
Pacific Northwest
Upper bound is no compression at all. Try compressing white noise.
It can actually be bigger than the source file due to metadata.
Interesting experiment. I just tried it: got 49.7% ratio (FLAC half the size of the original).
But that file was at -6 dB, so the top bit of amplitude wasn't used. I made another at digital full scale and it FLAC compressed at 50.2%.
 

Sergei

Senior Member
Forum Donor
Joined
Nov 20, 2018
Messages
361
Likes
272
Location
Palo Alto, CA, USA
Can you link to a paper about the fast integrator?

That's a good, albeit older, read: https://www.pnas.org/content/100/10/6151.

Please note that the study was done just above the threshold of hearing, so the integration times needed were of the order of milliseconds. At 1000x sound pressures - that is, at normal listening level - they are of the order of microseconds.

In later studies, coefficients of the integration algorithm were refined, yet its formula remained the same.
 

Sergei

Senior Member
Forum Donor
Joined
Nov 20, 2018
Messages
361
Likes
272
Location
Palo Alto, CA, USA
Thanks. But that misses the last part of my sentence. I like to see @Sergei run his listening tests and post his observation and files. Then we can get somewhere as opposed to a theoretical discussion, or dismissal of the results after the fact because the test files were not this way or that way.

Cat is out of the bag on this one. Will take Amir's suggestion when something like this is needed again.
 

Sergei

Senior Member
Forum Donor
Joined
Nov 20, 2018
Messages
361
Likes
272
Location
Palo Alto, CA, USA
Fair point. With my devil's advocate hat on, I'd argue telling you what you're supposed to hear before you hear it would affect what you hear. Is there a way to do a sealed post with the "here's what you should have heard and why" part that can only be opened some time later, or do we just have to trust people not to open spoiler tags in this sort of situation?
I like the idea.
Having said that I'm missing how this specific test is relevant. It's an interesting demonstration of a phenomenon I didn't know about, but it's something that can be captured and reproduced by the existing recording/playback chain.

It demonstrates that the hearing system entangles duration and loudness. The LTI theory doesn't entangle duration and power. Thus, an energy-preserving linear transform valid under the LTI, such as a filter, may not be automatically perceptually faithful.

This was an illustration of the statement that widening an analog "too sharp to represent" pulse by replacing it with a wider "sampling rate friendly" pulse having the same LTI energy, or even the same perceptually accurate energy, may not be faithful because the perceptual timing may be off.
 

nscrivener

Member
Joined
May 6, 2019
Messages
76
Likes
117
Location
Hamilton, New Zealand
I like the idea.


It demonstrates that the hearing system entangles duration and loudness. The LTI theory doesn't entangle duration and power. Thus, an energy-preserving linear transform valid under the LTI, such as a filter, may not be automatically perceptually faithful.

This was an illustration of the statement that widening an analog "too sharp to represent" pulse by replacing it with a wider "sampling rate friendly" pulse having the same LTI energy, or even the same perceptually accurate energy, may not be faithful because the perceptual timing may be off.

The key word in your statement being MAY.

This is waxing hypothetical taken to the extreme.

It's just another way of saying that the effects of anti-aliasing filters may be audible under some conditions.
 

Sergei

Senior Member
Forum Donor
Joined
Nov 20, 2018
Messages
361
Likes
272
Location
Palo Alto, CA, USA
Put differently: construct square wave (A) using 1 MHz bandwidth. Construct square wave (B) using 25 kHz bandwidth. All else equal: frequency, amplitude, phase. We humans can't hear the difference between A and B. At least, I've never seen evidence suggesting we can.

Depends on the amplitude. The 25 kHz version may still keep the cone velocity subsonic, whereas the 1 MHz one could require the transducer cone to move faster than the speed of sound, generating a shockwave.

A regular sound wave can reach at most 194 dB SPL: very loud and ear-damaging, but that's it. A strong enough shockwave topples buildings. Or the main character in "Back to the Future" :)
 

Sergei

Senior Member
Forum Donor
Joined
Nov 20, 2018
Messages
361
Likes
272
Location
Palo Alto, CA, USA
The key word in your statement being MAY.

You got it. Precisely. Each of us only MAY get into a car accident on a given day. Still, isn't it wise to wear a seat belt every time anyway?

I view higher sampling rates in a similar light. Most of the time we don't need them to faithfully record music. Yet sometimes we do.

Maybe to capture that once-in-a-lifetime crazy electric guitar solo, the heart-grabbing edginess of which would be smoothed to the point of boredom by a lower sampling rate.
 

nscrivener

Member
Joined
May 6, 2019
Messages
76
Likes
117
Location
Hamilton, New Zealand
You got it. Precisely. Each of us only MAY get into a car accident on a given day. Still, isn't it wise to wear a seat belt every time anyway?

I view higher sampling rates in a similar light. Most of the time we don't need them to faithfully record music. Yet sometimes we do.

Maybe to capture that once-in-a-lifetime crazy electric guitar solo, the heart-grabbing edginess of which would be smoothed to the point of boredom by a lower sampling rate.

Actually, no, I take "may" to mean that it's not demonstrated and may not be an effect at all.
 

Cosmik

Major Contributor
Joined
Apr 24, 2016
Messages
3,075
Likes
2,180
Location
UK
The fact that some people under ideal conditions can discern 44-16 from higher rate formats was discussed and referenced earlier in this thread.
Was it? Didn't someone point out that in terms of sample rate and bandwidth they may simply be hearing audible products of intermodulation in the hardware rather than the format itself..?
 

Costia

Member
Joined
Jun 8, 2019
Messages
37
Likes
21
That's a good, albeit older, read: https://www.pnas.org/content/100/10/6151.

Please note that the study was done just above the threshold of hearing, so the integration times needed were of the order of milliseconds. At 1000x sound pressures - that is, at normal listening level - they are of the order of microseconds.

In later studies, coefficients of the integration algorithm were refined, yet its formula remained the same.
Could you point me to the part of the paper which talks about a fast integrator working over a time frame of microseconds?
The lowest integration time I saw in the graphs there was ~1ms.
 

Costia

Member
Joined
Jun 8, 2019
Messages
37
Likes
21
Interesting experiment. I just tried it: got 49.7% ratio (FLAC half the size of the original).
But that file was at -6 dB, so the top bit of amplitude wasn't used. I made another at digital full scale and it FLAC compressed at 50.2%.
That doesn't sound right.
How did you generate the noise?
Did you make a stereo file with 2 identical channels?
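Those questions matter: full-scale uniform white noise uses all 16 bits of every sample, so FLAC should land near 100%, and a ~50% ratio usually means the generator didn't exercise the full word (low amplitude, duplicated stereo channels, or Gaussian noise at modest RMS). A hedged sketch of the maximal-entropy version of the test (filename and one-second length are arbitrary):

```python
import random
import struct
import wave

random.seed(0)  # deterministic, so the experiment is repeatable

with wave.open("white_noise.wav", "wb") as w:
    w.setnchannels(1)    # mono -- duplicated stereo channels would roughly halve the ratio
    w.setsampwidth(2)    # 16-bit
    w.setframerate(44100)
    frames = bytearray()
    for _ in range(44100):  # one second of full-scale uniform noise
        frames += struct.pack("<h", random.randint(-32768, 32767))
    w.writeframes(frames)
```

Compressing this with `flac` is the cleanest way to see the "upper bound is no compression at all" claim in practice.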
 

SIY

Grand Contributor
Technical Expert
Joined
Apr 6, 2018
Messages
10,359
Likes
24,661
Location
Alfred, NY
This is waxing hypothetical taken to the extreme.

It's goalpost-moving. We're now far into the territory of irrelevant, obfuscatory, and scattered. The original (incorrect) points related to signal generation, capture, and replay have long been forgotten, which may be the idea.
 