Hey everyone,
I've been experimenting with real-time loudness leveling (LUFS-based, with the ability to swap K-weighting for ECMA-418 or ITU-468 for example). The LUFS side of things wasn't much of a problem because I've been working on that for about 4 years straight now. But one challenge I ran into was breaths getting pulled up by the leveler.
The solution I ended up going with is a "Guard" mechanism that uses a secondary, shorter LUFS window to respond quickly to out-of-distribution loudness levels, which is where breaths tend to live. It maintains a rolling histogram of recent loudness measurements at that shorter window duration, and lets you configure a percentile threshold below which the leveler holds its gain instead of amplifying. That way it preserves the natural distance between the main audio and the breaths. There's also a parameter to suppress those breaths further, though you generally pay for that because the gain has to ramp back up for the next phrase.
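For anyone who wants to picture the gain-hold idea, here is a rough Python sketch. It is not the actual Guard implementation: the class name `BreathGuard`, the `level` function, and every parameter (`history_len`, `percentile`, `floor_lufs`, `max_step_db`) are hypothetical, and a sorted rolling window stands in for the histogram. It assumes the long-window and short-window LUFS values are already measured elsewhere.

```python
from collections import deque


class BreathGuard:
    """Hypothetical guard: flag measurements below a rolling percentile."""

    def __init__(self, history_len=200, percentile=20.0, floor_lufs=-70.0):
        # Rolling window of recent short-window loudness values (LUFS).
        self.history = deque(maxlen=history_len)
        self.percentile = percentile
        # Values at or below this floor are treated as silence and not stored.
        self.floor_lufs = floor_lufs

    def threshold(self):
        """Loudness at the configured percentile, or None with no history."""
        if not self.history:
            return None
        ordered = sorted(self.history)
        idx = int(round((self.percentile / 100.0) * (len(ordered) - 1)))
        return ordered[idx]

    def should_hold(self, short_lufs):
        """True if the leveler should freeze its gain for this measurement."""
        thr = self.threshold()
        hold = thr is not None and short_lufs < thr
        if short_lufs > self.floor_lufs:
            self.history.append(short_lufs)
        return hold


def level(measurements, target_lufs=-23.0, guard=None, max_step_db=0.5):
    """Return the gain (dB) chosen for each (long_lufs, short_lufs) pair."""
    gain_db = 0.0
    gains = []
    for long_lufs, short_lufs in measurements:
        if guard is not None and guard.should_hold(short_lufs):
            gains.append(gain_db)  # hold: do not chase the quiet outlier
            continue
        desired = target_lufs - long_lufs
        # Move toward the desired gain by at most max_step_db per measurement.
        step = max(-max_step_db, min(max_step_db, desired - gain_db))
        gain_db += step
        gains.append(gain_db)
    return gains
```

With speech hovering around -20 LUFS in the window, a short-window reading near -45 LUFS falls under the percentile threshold, so the gain stays where it was rather than climbing to pull the breath up. The further-suppression parameter mentioned above is left out of this sketch.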
It's working quite well, validated through a dataset of annotated breaths:
https://apu.software/analysis/Guard...ev-clean-3853_163249_000160_000000event2.html
The leveler itself continuously monitors the source audio's LRA and uses that to corral the target into the configured loudness/tolerance band. I've been enjoying tossing it on my PC's system audio via VB-Cable. I listen to lots of real-time dialog online and this fixes everyone's inconsistent microphone levels with very little configuration.
Has anyone else worked with this type of tech before? I'd love to hear any ideas for future improvements, or problems this tool should solve.
https://apu.software/leveler/