
How to detect clipping when converting DSD to PCM?

TunaBug

TLDR: using FFMPEG, how can I programmatically detect clipping when converting DFF to PCM if I have the gain too high?

I have a couple hundred SACDs which I normally listen to as DSF, but I'm tired of them being a pain in the audio chain. I've put together scripts to convert to PCM. My goal here is to script the whole thing so that I can go to bed, wake up the next morning, and have a bunch of PCMs. Fortunately I saved the ISOs when originally ripping :)

I have read somebody's advice to use +4 gain when converting as a good way to avoid clipping, but my preference is to have the gain as close to +6 as possible. Since I'm scripting everything it would be no big deal to try at +6 and then back off, if only I knew when to back off. Any pointers? I'm currently using ffmpeg for the conversion, but I'm not married to it; I was originally using SOX until I discovered it doesn't support multichannel DSD -> PCM. Something that would return a non-zero exit code would be ideal, but it's no big deal if I have to invoke a tool with detailed logging and then dig through stdout/stderr.

BTW, I know somebody will say "Just buy a DAC that supports DSD". I have those.

ETA: Trying to summarize requirements:
* Output to 24bit PCM. Specifically signed integer 24 bit.
* Command line only. No GUI
* Support multichannel
 
Audacity can Show Clipping (I believe that's the default.) It can also normalize, or otherwise adjust the volume so you can leave headroom and amplify later.

Or, if you can convert to floating point PCM it won't clip. If it goes over 0dB you can load it into Audacity, lower the volume, and export as your desired format.
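To illustrate why a floating-point intermediate survives "overs" while an integer one does not, here is a minimal Python sketch. It is not from the thread, the function names are made up for illustration, and the scaling by 2^23 is a simplifying assumption rather than a description of what any particular converter does internally.

```python
def db_to_linear(gain_db: float) -> float:
    """Convert a gain in dB to a linear amplitude factor."""
    return 10.0 ** (gain_db / 20.0)

S24_MAX = 2**23 - 1   # largest signed 24-bit value
S24_MIN = -(2**23)    # smallest signed 24-bit value

def to_s24(sample: float) -> int:
    """Scale a normalized sample (full scale = 1.0) to signed 24-bit, clamping overs."""
    scaled = round(sample * 2**23)
    return max(S24_MIN, min(S24_MAX, scaled))

def is_clipped(sample: float) -> bool:
    """True when the float sample lies outside the signed 24-bit range."""
    scaled = round(sample * 2**23)
    return scaled > S24_MAX or scaled < S24_MIN

# A sample at 0.9 of full scale boosted by +6 dB ends up near 1.8.
# As a float that is just a number above 1.0; as s24 it has to be clamped.
boosted = 0.9 * db_to_linear(6.0)
```

The float value keeps its true magnitude, so the volume can still be lowered afterwards without damage; once the value has been clamped to `S24_MAX`, the original peak is gone.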
 
but my preference is to have the gain as close to +6 as possible.
Personally, I'd do it without gain and then normalize afterward. It's not like any playback hardware exists on which it would make a difference.

Since I'm scripting everything it would be no big deal to try at +6 and then back off, if only I knew when to back off. Any pointers?
Convert to floating point wav:
Code:
]$ ffmpeg -i INPUT_FILE -lavfi "aresample=96000, volume=6dB" -c pcm_f32le  tmp.wav
(I'm not sure about the best DSD conversion options, that's just an example and the point is the "-c pcm_f32le" part.)

Then check the sample peak:
Code:
]$ ffmpeg -i tmp.wav -af astats=measure_perchannel=none -f null - 2>&1 | grep "Peak level"
[Parsed_astats_0 @ 0x7f53d0002680] Peak level dB: 1.611714
(or check true peak if you prefer)

Then reduce the volume appropriately and convert to flac:
Code:
]$ ffmpeg -i tmp.wav -lavfi "volume=-2.6dB" out.flac

In case you convert to 16-bit, consider using dither:
Code:
]$ ffmpeg -i tmp.wav -lavfi "volume=-2.6dB,aresample=osf=s16:dither_method=triangular" out.flac
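For scripting, the measure-then-adjust step can be sketched in Python as below. This is a rough sketch, not part of the original post: the function names are hypothetical, and the -1 dBFS ceiling is my assumption (it happens to reproduce the -2.6 dB used in the commands above for a measured peak of 1.61 dB).

```python
import re

# Matches lines such as "... Peak level dB: 1.611714" (also "-inf" for silence).
PEAK_RE = re.compile(r"Peak level dB:\s*(-?(?:inf|\d+(?:\.\d+)?))")

def parse_peak_db(ffmpeg_log: str) -> float:
    """Return the highest 'Peak level dB' value found in astats log output."""
    peaks = [float(value) for value in PEAK_RE.findall(ffmpeg_log)]
    if not peaks:
        raise ValueError("no 'Peak level dB' line found")
    return max(peaks)

def volume_adjustment_db(peak_db: float, ceiling_db: float = -1.0) -> float:
    """Gain in dB that brings the measured peak down to the ceiling; never boosts."""
    return min(0.0, ceiling_db - peak_db)

# The log line shown above, fed through both helpers.
log = "[Parsed_astats_0 @ 0x7f53d0002680] Peak level dB: 1.611714"
adjust = volume_adjustment_db(parse_peak_db(log))
volume_filter = f"volume={adjust:.1f}dB"
```

The resulting `volume_filter` string can be dropped into the `-lavfi` argument of the second ffmpeg call.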
 
I don't know. Perhaps split the difference at +5?
+4 already splits the difference between 0 and 6. To be clear, +4 would work.

To be clear, what I'm going for is silly hobbyist nerdy engineer stuff. I can imagine how this would work therefore I must build it. :)
 
Thank you @Kal Rubinson, @GXAlan, and @danadam. I didn't know about the Tascam Hi-Res Editor. I'll look at it and play with it even if it doesn't support multichannel, just to learn about additional tools.

@danadam, that's a pretty good suggestion on the toolchain. The only thing that jumps out at me is that in a previous attempt at using WAV as an intermediate format I received warnings that it exceeded the WAV 2G file limit, and that indeed I was unable to read it back afterwards.

I didn't mention this before, but I'm converting the ISO to a single large DSDIFF Edit Master file, then translating that, gaps and all, to PCM, then slicing the PCM into individual tracks as a near-final step. That was recommended somewhere (I lost the link) as best practice for gapless conversions. I don't care about the format of the intermediate files, though. Raw PCM would probably be ideal, in which case I'd have questions about how the f* I specify the input/output formats to ffmpeg. Frankly, ffmpeg's command line confuses the heck out of me as far as syntax and what is allowed in different places.
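On the raw PCM question: a headerless file carries no metadata, so ffmpeg has to be told the sample format, rate, and channel count before the `-i` it applies to. The Python sketch below only builds the argument lists. `-f f32le`, `-ar`, `-ac`, and `-c:a pcm_f32le` are standard ffmpeg options as far as I know, but channel-layout handling has changed between ffmpeg releases, so treat this as a starting point to verify against your build. The file names, the 88200 Hz rate, and the 6 channels are placeholders.

```python
def raw_output_args(src: str, dst: str, rate: int, gain_db: float) -> list:
    """Decode `src` and write headerless 32-bit float little-endian PCM to `dst`."""
    return [
        "ffmpeg", "-i", src,
        "-af", f"aresample={rate},volume={gain_db:g}dB",
        "-f", "f32le", "-c:a", "pcm_f32le",   # output container and codec: raw float
        dst,
    ]

def raw_input_args(src: str, dst: str, rate: int, channels: int) -> list:
    """Read headerless f32le PCM from `src`; the options before -i describe the input."""
    return [
        "ffmpeg",
        "-f", "f32le", "-ar", str(rate), "-ac", str(channels),
        "-i", src,
        dst,   # output format is inferred from the extension, e.g. .flac
    ]

to_raw = raw_output_args("album.dff", "album.raw", 88200, 6.0)
from_raw = raw_input_args("album.raw", "album.flac", 88200, 6)
```

The general rule is that options placed before an `-i` apply to that input, and options placed after the last `-i` and before an output name apply to that output.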

I have no problems with FP32 PCM as an intermediary format, as it makes a lot of sense for an internal format. But I want good ol' 24-bit signed integer PCM FLAC as a final format. For home audio purposes FP32 seems almost as inconvenient a format as DSD.
 
The only thing that jumps out at me is that in a previous attempt at using WAV as an intermediate format I received warnings that it exceeded the WAV 2G file limit, and that indeed I was unable to read it back afterwards.
You can try ".w64" instead of ".wav". That's supposed to be a WAV variant (Wave64) that supports large files.
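A quick way to see when a plain WAV intermediate will run into trouble is to estimate the PCM payload up front. The sketch below is only an illustration with hypothetical names. It assumes the usual explanation for the limits: WAV stores sizes in 32-bit fields, so the hard ceiling is about 4 GiB, and tools that treat the field as signed complain at about 2 GiB.

```python
WAV_SIGNED_LIMIT = 2**31 - 1     # where tools using a signed size field give up
WAV_UNSIGNED_LIMIT = 2**32 - 1   # hard limit of a 32-bit size field

def pcm_bytes(seconds: float, rate: int, channels: int, bytes_per_sample: int) -> int:
    """Size of the raw PCM payload, ignoring the small file header."""
    return int(seconds * rate * channels * bytes_per_sample)

def needs_large_file_format(size: int) -> bool:
    """True when a 2 GiB-limited reader could not handle a WAV of this size."""
    return size > WAV_SIGNED_LIMIT

# One hour of 88.2 kHz float32 audio, stereo versus 6 channels.
stereo_hour = pcm_bytes(3600, 88200, 2, 4)
surround_hour = pcm_bytes(3600, 88200, 6, 4)
```

Even a one-hour stereo album in 32-bit float at 88.2 kHz is past the 2 GiB mark, and a 6-channel one is well past 4 GiB, so a whole-album intermediate needs either ".w64" or raw PCM.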
 
Then check the sample peak:
Code:
]$ ffmpeg -i tmp.wav -af astats=measure_perchannel=none -f null - 2>&1 | grep "Peak level"
[Parsed_astats_0 @ 0x7f53d0002680] Peak level dB: 1.611714
(or check true peak if you prefer)

Thanks for "astats", that's excellent. I had been thinking of conversion just failing if there was clipping, meaning I would potentially re-convert multiple additional times. But with astats I can know the peak and then just convert exactly one more time. The docs were not clear on the "overall" vs "perchannel" values from astats, but it appears that the overall values are maximums and not averages, which is what I want.

You mentioned "or check true peak if you prefer", which sounds good, but a) I don't see how to get that, and b) what's the difference between what astats calls "Peak level" and what you're referring to as "true peak"?
 
what's the difference between what astats calls "Peak level" and what you're referring to as "true peak"?
"Peak level" is the sample peak level. "True peak" is an estimate of the peak in the reconstructed analog signal, so it considers what happens between the samples. Usually it's computed by oversampling N times and taking the sample peak of the oversampled output.
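Here is a small pure-Python sketch of that oversampling idea. It is not how ffmpeg or any of the tools below implement it; it uses a Hann-windowed sinc interpolator that I picked for simplicity, and the names and parameters are hypothetical. The test signal is a full-scale sine at a quarter of the sample rate, phased so that every sample lands at about 0.707 while the waveform between the samples reaches 1.0.

```python
import math

def sinc(x: float) -> float:
    """Normalized sinc: sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def true_peak_estimate(samples, oversample=4, half_width=16):
    """Estimate the inter-sample peak by evaluating a windowed-sinc interpolation
    at `oversample` points per sample period (edges of the signal are skipped)."""
    peak = 0.0
    for base in range(half_width, len(samples) - half_width):
        for k in range(oversample):
            t = base + k / oversample
            acc = 0.0
            for n in range(base - half_width + 1, base + half_width + 1):
                d = t - n
                window = 0.5 * (1.0 + math.cos(math.pi * d / half_width))  # Hann
                acc += samples[n] * sinc(d) * window
            peak = max(peak, abs(acc))
    return peak

# Full-scale sine at fs/4 with a 45 degree phase offset.
samples = [math.sin(math.pi / 2 * n + math.pi / 4) for n in range(128)]
sample_peak = max(abs(s) for s in samples)
true_peak = true_peak_estimate(samples)
```

In this constructed case the sample peak reads about -3 dBFS while the true peak is close to 0 dBFS, which is the kind of gap the true-peak meters are meant to reveal.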

You mentioned "or check true peak if you prefer", which sounds good, but a) I don't see how to get that
There's either "loudnorm" or "ebur128" filter in ffmpeg:
Code:
]$ ffmpeg -i INPUT_FILE -af loudnorm=print_format=json -f null - 2>&1 | grep input_tp
        "input_tp" : "2.63",

]$ ffmpeg -i INPUT_FILE -af ebur128=peak=true -f null - 2>&1 | grep Peak
    Peak:        2.6 dBFS
Or loudness-scanner (or similar tools):
Code:
]$ loudness scan -p all INPUT_FILE
  Loudness, Sample peak,   True peak,  True peak
 -7.9 LUFS,    0.942200,    1.299965,   2.3 dBTP, INPUT_FILE
-------------------------------------------------------------------------------
 -7.9 LUFS,    0.942200,    1.299965,   2.3 dBTP
Or even manual upsampling in SoX:
Code:
]$ sox INPUT_FILE -n gain -10 rate $((4*44100)) stats 2>&1 | grep "Pk lev dB"
Pk lev dB      -7.26     -7.26     -9.90
(Here you have to reduce the gain first so the resampled signal doesn't clip, then add it back to get the true peak; in this case: 10 - 7.26 = 2.74 dB.)

As you can see, depending on the tool, the true peak estimation may vary.
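If you want a script to use these numbers, the two text-based cases above can be handled roughly as follows. This is a sketch with hypothetical helper names. The regular expression is written against the `"input_tp"` line exactly as printed above, and taking the largest estimate as the one to act on is my own conservative choice, not something the tools prescribe.

```python
import re

def loudnorm_true_peak_db(ffmpeg_log: str) -> float:
    """Pull the "input_tp" value out of loudnorm's print_format=json output."""
    match = re.search(r'"input_tp"\s*:\s*"([^"]+)"', ffmpeg_log)
    if match is None:
        raise ValueError('no "input_tp" field found')
    return float(match.group(1))

def sox_true_peak_db(pk_lev_db: float, pre_attenuation_db: float) -> float:
    """Undo the `gain -N` that was applied before resampling in SoX."""
    return pk_lev_db + pre_attenuation_db

# Values taken from the example outputs above.
estimates = {
    "loudnorm": loudnorm_true_peak_db('        "input_tp" : "2.63",'),
    "sox 4x": sox_true_peak_db(-7.26, 10.0),
}
worst_case = max(estimates.values())
```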

(BTW, the track above with true peak higher than +2 dB is Nightwish / Highest Hopes (Limited Edition) / 11. Deep Silent Complete.flac)
 
Thanks @danadam for the pointers. I'm declaring the astats Peak Level as good enough for now. This is working quite well.

An observation for anybody that tries this:

"astats" can be placed in the filter chain that converts DSD to PCM, and it runs before the down-conversion from the internal FP32 to whatever output you're using. So this means that

Code:
ffmpeg  -i album.dff -lavfi "aresample=88200, volume=6dB, astats=measure_perchannel=none" album.flac

will convert the DFF album into 24-bit FLAC, which might clip values, but the astats processing happens before that clipping. I've found that 95% of the time there's no clipping and you end up with a single pass over the file to create a 24-bit PCM that is ready to split into tracks. This seems better and faster than the approach discussed above, which used fp32 files as an intermediate and required another pass over the data to convert fp32 into s24.
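For anybody who wants to automate the "convert once, redo only if it clipped" idea, here is a rough Python sketch. It relies on the observation above that astats reports the peak before the final integer conversion. The function names and the 0.1 dB safety margin are my assumptions, and `convert` needs ffmpeg on the PATH, so only the helper logic has been exercised.

```python
import re
import subprocess

PEAK_RE = re.compile(r"Peak level dB:\s*(-?(?:inf|\d+(?:\.\d+)?))")

def filter_chain(rate: int, gain_db: float) -> str:
    """Same chain as the command above, with the gain as a parameter."""
    return f"aresample={rate},volume={gain_db:g}dB,astats=measure_perchannel=none"

def retry_gain_db(gain_db: float, peak_db: float, ceiling_db: float = -0.1):
    """Return None if the first pass stayed under the ceiling,
    otherwise the reduced gain to use for a second pass."""
    if peak_db < ceiling_db:
        return None
    return gain_db - (peak_db - ceiling_db)

def convert(src: str, dst: str, rate: int, gain_db: float) -> float:
    """Run one ffmpeg pass and return the peak level that astats logged."""
    cmd = ["ffmpeg", "-y", "-i", src, "-lavfi", filter_chain(rate, gain_db), dst]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return max(float(p) for p in PEAK_RE.findall(result.stderr))

def convert_without_clipping(src: str, dst: str, rate: int = 88200,
                             gain_db: float = 6.0) -> float:
    """Convert at `gain_db`; if that clipped, convert once more at a safe gain.
    Returns the gain that produced the final file."""
    peak = convert(src, dst, rate, gain_db)
    second = retry_gain_db(gain_db, peak)
    if second is None:
        return gain_db
    convert(src, dst, rate, second)
    return second
```

Because `subprocess.run(..., check=True)` raises on a non-zero ffmpeg exit code, a wrapper script built on this can fail loudly during an overnight batch instead of silently producing a bad file.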
 