
Convert FLAC 192 to 96 and the files get Bigger?!

MRC01

When I download music in 192-24 I often downsample it to 96-24 to save space, since there should be no perceptual difference. To do this I use sox 14.4 on my Ubuntu 24 system, specifically:
Code:
mt="--multi-threaded"
bf="--buffer 2097152"
srate="96000"
sox -V2 -S $bf $mt "$file" "${outdir}/${file}" rate -v "${srate}"
Usually the 96k files are smaller, 3/4 of the size. Yet lately I've encountered a couple of albums where all the files got slightly bigger when downsampled. Both versions (192 and 96) sound fine and a spectrum analysis looks the same. I can't think of anything that would cause this. Any ideas?
 
Were they mono input files, getting converted to stereo? Or dithering making them less compressible?
 
All files are stereo, though sox does apply dither (TPDF by default). Yet the original should already be dithered, and it is a natural acoustic recording that captures the ambient room and mic noise (similar to dither). Zooming into the original 192-24 files shows nothing "smooth".

PS: correction: on further read of the SOX man page, it might not be applying dither. My simple conversion doesn't meet any of the conditions that it lists for applying it.
 
Are you applying the max compression flac supports?
 
No compression at all. Cutting the sample rate in half with no other processing.
Oh, you meant lossless file compression, not dynamic range compression. Yes, they are both FLAC level 8.
If I un-FLAC the files, the 192k WAV file is twice the size of the 96k WAV file, as expected.
Hang on, I'll test re-flaccing the WAV files again...
And yes, the strangeness continues.
The 96k WAV has size 98.5 MB, its FLAC -8 is 51.8 MB, same as before.
The 192k WAV has size 197 MB, its FLAC -8 is 39.9 MB, same as before.
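As a sanity check on the sizes above, here is a rough Python sketch (not from the thread) that computes the uncompressed PCM size from the per-channel sample counts sox reports for this track, plus the FLAC-to-WAV ratio for each rate. The function name is illustrative, and the small WAV header is ignored.

```python
def pcm_bytes(frames, channels=2, bits=24):
    """Uncompressed PCM payload in bytes (WAV header ignored)."""
    return frames * channels * bits // 8

# Per-channel sample counts as reported by sox for this track.
wav_192 = pcm_bytes(32_838_620)   # 192 kHz source
wav_96 = pcm_bytes(16_419_310)    # 96 kHz output

print(wav_192, wav_96)            # 197031720 98515860

# FLAC size as a fraction of WAV size, using the MB figures quoted above.
print(f"192k: {39.9 / 197:.1%}")  # 192k: 20.3%
print(f"96k:  {51.8 / 98.5:.1%}") # 96k:  52.6%
```

The 192k file compressing to about a fifth of its WAV size is far better than FLAC usually manages on full 24-bit material, which hints that the source does not actually use all 24 bits.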
 
How does the file size compare to your other 24/96 FLACs? Just approximating... I'd expect a bitrate of around 3200 kbps (around 24 MB per minute) for a 24/96 FLAC. (Around 60-70% of the WAV, which runs about 4600 kbps or 35 MB per minute.)

One possible scenario is that the files were converted up from 16-bit to 24-bit without dither. That leaves the bottom 8 bits full of zeros (easy to compress), and although the WAV will be 50% larger, the FLAC would be about the same size as a FLAC made from the original 16-bit file. If you change the sample rate, all 24 bits will be filled with data and you'd get a bigger FLAC.
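A minimal sketch of how to test for that scenario, assuming you already have the decoded samples as signed integers (the helper name `wasted_bits` is made up for this example). The FLAC format has a "wasted bits" field in each subframe for exactly this case, so an encoder can skip low-order bits that are zero in every sample.

```python
def wasted_bits(samples, container_bits=24):
    """Count low-order bits that are zero in every sample."""
    combined = 0
    for s in samples:
        combined |= s          # any set bit in any sample survives the OR
    if combined == 0:
        return container_bits  # pure digital silence
    count = 0
    while combined & 1 == 0:
        combined >>= 1
        count += 1
    return count

# 16-bit values placed in a 24-bit container without dither.
sixteen_bit = [1000, -2000, 3, -32768]
padded = [s << 8 for s in sixteen_bit]

print(wasted_bits(padded))         # 8
print(wasted_bits(padded + [1]))   # 0
```

A single sample with its lowest bit set is enough to drop the count to zero, which is why any processing that touches the low bits removes the saving.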
 
BINGO! I think this is close to the mark. In the original file, the smallest nonzero sample values are around -115 dB, and some samples are exactly 0. There are no sample values between -115 dB and zero.
1759264194735.png


In the downsampled file, few or none of the samples are exactly 0, and the range between -115 dB and zero is actually used:
1759264278836.png


That suggests this was originally a 20-bit recording. And this says that sox did dither the file. Perhaps that is necessary when applying the low-pass filter, even though it's downsampling by an integer multiple (half the rate).
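The 20-bit inference can be checked with a little arithmetic. This sketch assumes signed PCM where full scale is 2^(bits-1), so the smallest nonzero sample (1 LSB) sits at 20*log10(1 / 2^(bits-1)) dB relative to full scale. The function name is illustrative.

```python
import math

def lsb_dbfs(bits):
    """Level of a 1-LSB sample relative to full scale, in dB."""
    return 20 * math.log10(1 / 2 ** (bits - 1))

for bits in (16, 20, 24):
    print(bits, round(lsb_dbfs(bits), 1))
# 16 -90.3
# 20 -114.4
# 24 -138.5
```

A floor near -115 dB matches the 20-bit figure, while a true 24-bit file could hold values down to about -138 dB.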
 
You have to low-pass filter when down-sampling to prevent (potential) aliasing. And I assume the same algorithm is used whether it's an even multiple or factor, or not.
 
Yah, downsampling by an integer multiple can be done more efficiently, but I don't know whether sox takes advantage of that.

Wikipedia has a description: https://en.wikipedia.org/wiki/Downsampling_(signal_processing)

If I understand the math, the h[k] coefficients scale and sum neighboring samples, so when the result is computed at 24-bit, even from data that was originally 20-bit, it could fill in the missing tiny values like we see above, even if no dither was applied.
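A toy illustration of that idea, assuming a short made-up low-pass filter (these five taps are not the filter sox uses) applied while keeping every second output. The input is 20-bit data in a 24-bit container, so every sample is a multiple of 16.

```python
def decimate_by_2(samples, taps):
    """Apply a symmetric FIR filter and keep every second output, rounded to int."""
    out = []
    for start in range(0, len(samples) - len(taps) + 1, 2):
        window = samples[start:start + len(taps)]
        out.append(round(sum(h * x for h, x in zip(taps, window))))
    return out

# Illustrative taps that sum to 1.0; an assumption for this sketch only.
TAPS = [-0.05, 0.25, 0.6, 0.25, -0.05]

# A single 1-LSB (at 20-bit) impulse: low 4 bits are zero in every input sample.
x = [0, 0, 16, 0, 0, 0, 0]
y = decimate_by_2(x, TAPS)

print(y)                                # [10, -1]
print(all(v % 16 == 0 for v in x))      # True
print(all(v % 16 == 0 for v in y))      # False
```

The outputs land on values that a 20-bit grid cannot represent, so the low 4 bits are no longer all zero even though nothing random was added.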
 
sox -V3 will show if dither is applied in the effects chain.

What file size do you get with
Bash:
sox -V3 -S $bf $mt "$file" "${outdir}/${file}" rate -v "${srate}" dither -p 20
 
Sure, but even if no dither was applied, it looks like the downsampling itself could have filled in the "gap" in values between -115 and 0 observed above.

PS: when I run sox at -V3 on that file I get this. It doesn't mention dither, so it appears the "randomization" above is coming from the resampling computation.

Code:
sox:      SoX v14.4.2
sox INFO formats: detected file format type `flac'

Input File     : '01-Contrapunctus 1.flac'
Channels       : 2
Sample Rate    : 192000
Precision      : 24-bit
Duration       : 00:02:51.03 = 32838620 samples ~ 12827.6 CDDA sectors
File Size      : 39.9M
Bit Rate       : 1.87M
Sample Encoding: 24-bit FLAC
Endian Type    : little
Reverse Nibbles: no
Reverse Bits   : no

sox INFO flac: encoding at 24 bits per sample

Output File    : '96/01-Contrapunctus 1.flac'
Channels       : 2
Sample Rate    : 96000
Precision      : 24-bit
Duration       : 00:02:51.03 = 16419310 samples ~ 12827.6 CDDA sectors
Sample Encoding: 24-bit FLAC
Endian Type    : little
Reverse Nibbles: no
Reverse Bits   : no
Comment        : 'Processed by SoX'

sox INFO sox: effects chain: input       192000Hz  2 channels
sox INFO sox: effects chain: rate         96000Hz  2 channels
sox INFO sox: effects chain: output       96000Hz  2 channels
In:100%  00:02:51.03 [00:00:00.00] Out:16.4M [      |      ] Hd:4.7 Clip:0    
Done.
 
Agreed. The cost of 24 bit precision.
 