
DIY Loudness Compensation Algorithm

koopal151

Hi, I am new to this forum but have been trying to improve my audio setup for a long time. I produce music as a hobby and work as a computer scientist (in algorithm engineering). However, I have no experience with signal processing. For room compensation/frequency response correction, I once used a Behringer DEQ 2496. Then, I switched to a Hifiberry DAC+ DSP running HifiberryOS. In the last month, I have been working in my free time on a Loudness Compensation (LC) algorithm that runs on the DAC+ DSP. I implemented the algorithm in SigmaStudio (see the attached dsp.dspproj).

My LC algorithm analyzes the loudness of the incoming signal to adjust the LC intensity. It is intended to be used in the following way: The signal exiting the DSP should be played back at the same volume always, i.e., you should set your DAC/Amp to a constant fixed volume and instead adjust the volume digitally at the source device. The main philosophy of this approach is that we assume the input signal has been created/mastered at 80 Phon, and that it should sound the same at any loudness. There are upsides and downsides to this approach:

Pro:
- The state of the volume control can be oblivious to the DSP. Traditional LC implementations use the state of the gain knob of an amplifier as a reference for the equalization intensity. This implementation does not require this.
- This algorithm adapts to signals with different loudnesses. Unless you are using a music streaming service with "equal loudness" turned on, the loudness of the signal output by your device may vary significantly depending on the content. Whereas you would have to adjust the gain on an amp (and thereby also change the LC intensity in a wrong way) with a traditional LC implementation, my algorithm automatically adapts to changing loudness.
- This implementation does not change the signal at 80 Phon, if configured correctly. Most other implementations do not support calibration to a reference loudness and therefore almost certainly will alter the sound at 80 Phon.

Contra:
- With very dynamic signals, my LC algorithm may unnecessarily and erratically change the LC intensity. However, by averaging the computed reference loudness over the past ~400ms, I could not perceive this effect in practice.
- This implementation requires an input signal with a bit depth of at least 24. This is necessary because the volume control must be done on the source device, and 16 bits of dynamic range are only sufficient for signals that use the full dynamic range.

Algorithm description:
The algorithm first computes the digital loudness L(l,r) of the input signal (channels l and r) in dBFS, using the flipped ISO 226 80 Phon curve W_80 as frequency weighting (see attached picture) and a first-order low-pass filter F_T with a configurable time constant T (T = 0.4 s by default). More formally, it computes

L(l,r) = 10 * log_10(F_T((W_80(l)^2 + W_80(r)^2) / 2))

Then it computes the real perceived loudness in phon,

P(l,r) = 80 + L(l,r) - L_80

where the parameter L_80 is the digital loudness in dBFS that results in a real perceived loudness of 80 Phon. This parameter depends on the gain of the amplifier and the playback device and should be calibrated as follows: play back a 1000 Hz sine wave and adjust the digital volume until a calibrated SPL meter reports 80 dB SPL; L_80 is then the digital loudness L(l,r) of that signal. This works because the ISO 226 80 Phon curve is neutral at 1000 Hz. Note that using the flipped ISO 226 80 Phon curve for weighting is philosophically correct, because we assume the signal has been created/mastered at 80 Phon.
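To make the loudness formulas above concrete, here is a minimal Python sketch. It is not the SigmaStudio implementation: the function names are made up for illustration, the inputs are assumed to be already filtered by W_80 (the weighting filter itself is not reproduced here), and the small floor that avoids log10(0) during silence is my own addition.

```python
import math


def one_pole_lowpass(values, sample_rate, time_constant=0.4):
    """First-order low-pass F_T over a sequence of power values."""
    # Per-sample smoothing factor for the time constant T (in seconds).
    alpha = 1.0 - math.exp(-1.0 / (time_constant * sample_rate))
    state = 0.0
    smoothed = []
    for value in values:
        state += alpha * (value - state)
        smoothed.append(state)
    return smoothed


def perceived_loudness_phon(weighted_l, weighted_r, sample_rate,
                            l_80_dbfs, time_constant=0.4):
    """P(l,r) = 80 + L(l,r) - L_80 for channels that are ALREADY W_80-weighted."""
    # Mean power of both channels: (W_80(l)^2 + W_80(r)^2) / 2
    power = [(a * a + b * b) / 2.0 for a, b in zip(weighted_l, weighted_r)]
    smoothed = one_pole_lowpass(power, sample_rate, time_constant)
    floor = 1e-12  # assumption: avoids log10(0) on digital silence
    return [80.0 + 10.0 * math.log10(max(p, floor)) - l_80_dbfs
            for p in smoothed]
```

Since W_80 is neutral at 1000 Hz, a 1000 Hz sine can be passed in unweighted. If L_80 is set to the digital loudness of that sine, the result settles at 80 phon after a few time constants, and a sine 20 dB quieter settles at 60 phon.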

Then, given the resulting real loudness x = P(l,r) in Phon, the algorithm computes three values d(x), a(x) and t(x), where d(x) = 1 - (x - 40) / 40, a(x) = |d(x)| and t(x) = 1 if and only if d(x) > 0 (otherwise t(x) = 0). d(x) is the loudness deviation relative to 80 phon, normalized so that d = 1 at 40 phon, d = 0 at 80 phon and d = -1 at 120 phon; a(x) is the absolute deviation, and t(x) is the deviation type (1 for signals quieter than 80 phon).

Now, let y be (a channel of) the input signal, i.e., l or r. With a(x) and t(x), the algorithm then computes the output signal as follows:
If t(x) > 0, then the output signal o(x,y) is o(x,y) = C_L(y) a(x) + y (1 - a(x)).
Similarly, if t(x) <= 0, then we have o(x,y) = C_H(y) a(x) + y (1 - a(x)).
Here, C_L and C_H (see attached pictures) are sets of filters that transform the ISO 226 80 Phon curve into the ISO 226 40 Phon and ISO 226 120 Phon curves in the audible band, respectively (see relative curves in the attached picture). Note that this algorithm closely follows the ISO 226 60 Phon and 100 Phon curves at 60 Phon and 100 Phon, respectively.
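The blending step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `c_low` and `c_high` are hypothetical callables standing in for the filter sets C_L and C_H (real filters are stateful and would process whole streams), and the branch is chosen on the assumption that C_L, which leads towards the 40 Phon curve, is the one used below 80 phon.

```python
def deviation(x_phon):
    """d(x) = 1 - (x - 40) / 40: 1 at 40 phon, 0 at 80 phon, -1 at 120 phon."""
    return 1.0 - (x_phon - 40.0) / 40.0


def blend(y, x_phon, c_low, c_high):
    """Crossfade between the dry sample y and its compensated version.

    c_low / c_high are placeholders for C_L / C_H (assumption: each returns
    the filtered sample for the given input sample).
    """
    d = deviation(x_phon)
    a = abs(d)  # a(x): blend intensity
    # Assumption: quieter than 80 phon -> C_L, louder -> C_H.
    compensated = c_low(y) if d > 0 else c_high(y)
    return compensated * a + y * (1.0 - a)
```

With this structure the signal passes through unchanged at exactly 80 phon (a = 0), and the full C_L or C_H response is reached at 40 and 120 phon respectively (a = 1).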

Finally, the algorithm limits the signals to prevent clipping. However, if you have enough headroom (L_80 = -25 dB in my case), then the limiters will never act.

In conclusion, the algorithm transforms the input sound such that it is not altered if it is played back at 80 phon (which is correct, because we assume that it has been created/mastered at 80 Phon). If, however, it is quieter or louder, then it is transformed in such a way that it sounds the same according to ISO 226, except for its loudness.

Let me know if you have questions regarding the algorithm or the implementation. Do you have suggestions for improving the algorithm?
 

Attachments

  • ISO 226 80 Phon flipped.png
  • C_H.png
  • C_L.png
  • ISO 226 relative curves.png
  • dsp.zip
Hello!

From what I've learned recently, modern digital LC implementations are either tied to room correction or do some measurements on their own: Audyssey Dynamic EQ, YPAO Volume, Onkyo Fidelity IQ.

However my findings were focused specifically on consumer devices with room correction AND loudness compensation.

Speaking of implementations, you can take a look at Linux Studio Plugins (LSP). The source code is available, so you can compare it with your implementation or improve theirs.
https://lsp-plug.in/?page=manuals&section=loud_comp_stereo
 
I appreciate what you're doing but personally I don't have an easy way to use DSP in my home theater/audio system. :(

I DO miss the old loudness compensation even though it was imperfect. (Sometimes I manually adjust the subwoofer level.)

Contra:
- With very dynamic signals, my LC algorithm may unnecessarily and erratically change the LC intensity. However, by averaging the computed reference loudness over the past ~400ms, I could not perceive this effect in practice.
Ideally, it should scan the file in advance like ReplayGain or other loudness normalization. Then your algorithm can be linear and consistent throughout the song (or album). That wouldn't work with streaming, but most streaming is loudness normalized, so you only need to know the (calibrated) volume control setting.

Finally, the algorithm limits the signals to prevent clipping.
Limiting... Eeek! Again I'd prefer it to be linear which can be done if you pre-scan to check the peak levels (like ReplayGain). You may not hit the 80 phon target but IMO a loss of a couple of dB is a better compromise than limiting the dynamics.
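A rough sketch of what such a pre-scan could look like, in Python. This is not how ReplayGain itself is implemented; `prescan_gain_db` and `max_boost_db` are hypothetical names, and the sketch conservatively assumes the largest compensation boost could coincide with the loudest sample.

```python
import math


def prescan_gain_db(samples, max_boost_db):
    """Static gain (<= 0 dB) that keeps peak + worst-case boost below 0 dBFS.

    samples: full-scale float samples in [-1.0, 1.0] for the whole track.
    max_boost_db: assumed worst-case boost of the compensation filters.
    """
    peak = max(abs(s) for s in samples)
    if peak == 0.0:
        return 0.0  # silence: nothing to protect
    headroom_db = -20.0 * math.log10(peak)  # distance from the peak to 0 dBFS
    # Attenuate only when the boost would exceed the available headroom.
    return min(0.0, headroom_db - max_boost_db)
```

The returned gain is constant for the whole track, so the dynamics stay untouched; the cost is a slightly lower overall level.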

- This implementation requires an input signal with a bit depth of at least 24. This is necessary because the volume control must be done on the source device, and 16 bits of dynamic range are only sufficient for signals that use the full dynamic range.
More bits might be "better" but it can still be useful to process or "improve" 16-bit audio. And I assume your algorithm is working in floating point, so the result could be converted to 24 bits for a 24-bit DAC and no resolution is lost (as long as you have a 24-bit DAC).
 
[Image: 1767377879923.png]

and let your ears decide :)

(I'm just kidding - it would be nice to have LC baked in so you never need to think about it)
 

Thanks for the info. As far as I can tell, those LC implementations are only static and linked to the gain setting, so they are not really comparable to mine.


Of course, computing the loudness of a track/album/movie in advance and adjusting the compensation strength accordingly is optimal, but it can only be implemented in software that has access to the whole audio track before playback. This is implicitly done in music streaming apps like Spotify. They make songs quieter based on their excess loudness (beyond -11 LUFS). However, LUFS uses K-weighting for the loudness computation (probably in order to reduce computational complexity), which is not as accurate as the flipped ISO-226 curve. Also, tracks quieter than -11 LUFS and below -1 dBTP (true peak) are not turned up until they reach either -11 LUFS or -1 dBTP (which should be done in my opinion). This results in audible volume differences. Finally, the static approach cannot be used in real-time applications where the audio is created on the fly, e.g., voice chat, games, and music production/mixing.

My implementation is aimed at approximately compensating the loudness also for these other applications. Due to its dynamic nature, quiet and loud moments will be overcompensated, but that is a worthy trade-off in my opinion.

Regarding the limiter: The limiter could also be removed. If configured correctly (enough headroom), it never acts.

And yes, it can be used to improve 16-bit audio. My point was that with 16-bit audio, you should use the whole range of those 16 bits and do the volume control at the last stage before the DAC, or in the analog domain, if possible. Just to preserve the dynamic range. With 24 bits, there are 24-16=8 bits of dynamic range headroom for the volume control (and EQ filters) in the digital domain, which is better.
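The bit-depth argument can be checked with a little arithmetic, using the usual rule of thumb of about 6.02 dB per bit. The helper names below are made up for illustration.

```python
import math

DB_PER_BIT = 20.0 * math.log10(2.0)  # about 6.02 dB of dynamic range per bit


def bits_lost(attenuation_db):
    """Bits of resolution consumed by attenuating digitally by attenuation_db."""
    return attenuation_db / DB_PER_BIT


def max_attenuation_db(container_bits, content_bits=16):
    """Attenuation available before content_bits no longer fit the container."""
    return (container_bits - content_bits) * DB_PER_BIT
```

For a 24-bit path carrying 16-bit content, `max_attenuation_db(24)` comes out at roughly 48 dB, and the L_80 = -25 dB headroom mentioned in the first post costs roughly 4 bits.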

Regarding my algorithm, I improved it by making the following changes:
- I changed C_L: It now consists of two filters: a +7 dB Low Shelf at 155 Hz with 0.45 Q and a +5 dB High Shelf at 6500 Hz with 0.75 Q. This sounds more natural to me and several people I asked to test it.
- I omitted C_H and a(x) and instead always apply C_L (also if d(x) < 0). More precisely, we have o(x,y) = C_L(y) d(x) + y (1 - d(x)). This sounds more natural for loud signals (> 80 phon) in my opinion.
- I implemented the algorithm in ReaJS, which is a VST 2.0 effects processor that can load user-created scripts (see the attached loudness_compensation.jsfx script). The adaptation time and L_80 are configurable. With this, my algorithm can be used directly in any DAW and also system-wide in Windows using, e.g., EqualizerAPO. Since every processing block in EqualizerAPO sits before the Windows volume control, you should use it in combination with the loudness correction plugin that is part of EqualizerAPO (with attenuation set to 0.5). This adds (static) loudness compensation for the Windows volume control and also has to be calibrated to be neutral at 80 phon.
- The updated SigmaStudio version of my algorithm is attached as a board file (LoudnessCompensation_V2.bin).
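For readers who want to see what the revised C_L looks like numerically, here is a Python sketch. It makes several assumptions that the posts do not confirm: the shelves follow the widely used "Audio EQ Cookbook" biquad formulas with their Q convention (SigmaStudio and ReaJS may define Q or slope differently), the two filters run in series, and the sample rate is 48 kHz. All function names are made up.

```python
import cmath
import math


def shelf_coefficients(kind, gain_db, f0, q, sample_rate):
    """Cookbook-style shelving biquad; returns (b0, b1, b2, a0, a1, a2)."""
    A = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * f0 / sample_rate
    c = math.cos(w0)
    k = 2.0 * math.sqrt(A) * math.sin(w0) / (2.0 * q)
    if kind == "low":
        return (A * ((A + 1) - (A - 1) * c + k),
                2 * A * ((A - 1) - (A + 1) * c),
                A * ((A + 1) - (A - 1) * c - k),
                (A + 1) + (A - 1) * c + k,
                -2 * ((A - 1) + (A + 1) * c),
                (A + 1) + (A - 1) * c - k)
    if kind == "high":
        return (A * ((A + 1) + (A - 1) * c + k),
                -2 * A * ((A - 1) + (A + 1) * c),
                A * ((A + 1) + (A - 1) * c - k),
                (A + 1) - (A - 1) * c + k,
                2 * ((A - 1) - (A + 1) * c),
                (A + 1) - (A - 1) * c - k)
    raise ValueError("kind must be 'low' or 'high'")


def response(coeffs, freq, sample_rate):
    """Complex frequency response of one biquad at freq (Hz)."""
    b0, b1, b2, a0, a1, a2 = coeffs
    z1 = cmath.exp(-1j * 2.0 * math.pi * freq / sample_rate)
    return (b0 + b1 * z1 + b2 * z1 * z1) / (a0 + a1 * z1 + a2 * z1 * z1)


def lc_gain_db(freq, d, sample_rate=48000.0):
    """Gain in dB of o = C_L(y) d + y (1 - d) at one frequency."""
    low = shelf_coefficients("low", 7.0, 155.0, 0.45, sample_rate)
    high = shelf_coefficients("high", 5.0, 6500.0, 0.75, sample_rate)
    # Assumption: the two shelves are cascaded (series connection).
    h = response(low, freq, sample_rate) * response(high, freq, sample_rate)
    return 20.0 * math.log10(abs(d * h + (1.0 - d)))
```

Calling `lc_gain_db(f, d)` over a range of frequencies for d between 0 and 1 gives the family of compensation curves. At d = 0 the result is 0 dB everywhere, and at d = 1 it approaches +7 dB towards DC and +5 dB towards the Nyquist frequency.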
 

Due to its dynamic nature, quiet and loud moments will be overcompensated, but that is a worthy trade-off in my opinion.
I think this will be most of the track. IMHO this is wrong, because overcompensation will make the sound deviate from the original creation.
I opt for a static solution.
 
Have you tried it? I did some blind A/B tests and could not detect the difference between no LC @80 phon and my LC @80 phon, at least with the music I listen to and produce (mostly synthwave). For games, voice chat, and videos with varying loudness, it is much better than having no LC. And you can also increase the adaptation time to mitigate overcompensation.
 
Have you tried it?
No, it was my opinion on how loudness should work, to maximally preserve the original intent. You can preprocess an audio example with your algorithm and publish it to make a comparison possible. But the idea of processing to get better sound already looks questionable.
 
If I have time on the weekend, I will upload audio examples. Trying out this plugin is very simple. Just download ReaPlugs and load the script in a DAW or a system-wide VST 2 Host like EqualizerAPO.

Why is the concept of "processing to get better sound" questionable? Exactly which types of processing deviate from the original intent? Would you say speaker/headphone equalization does? Strictly speaking, static gain-dependent LC also does.
 
Just download ReaPlugs and load the script in a DAW or a system-wide VST 2 Host like EqualizerAPO.
I don't use a PC for music.

Exactly which types of processing deviate from the original intent?
For example, my AVR offers something called Advanced Sound Retriever.

Would you say speaker/headphone equalization does? Strictly speaking, static gain-dependent LC also does.
It does, but we measure what is wrong and we try to correct it. Your procedure changes sound, when there is no problem, like playing at reference level. If you can't hear differences, then the more reason to drop it.
A question: are you sure that the mastering engineer hasn't already done something similar?
 
It does, but we measure what is wrong and we try to correct it.
My algorithm has the same goal as static LC, but can additionally adapt to changing content loudness.

Your procedure changes sound, when there is no problem, like playing at reference level. If you can't hear differences, then the more reason to drop it.
No, you do not understand my algorithm, please read it again (or look at the code). As my A/B test concluded, my algorithm does not alter the sound audibly at 80 phon (I suppose this is what you mean by "reference level"). This is the intended result, as it implies that you cannot hear the difference between static gain-dependent LC and my algorithm at lower volumes either. This is because the difference (literal signal difference) between what the two LC implementations (static and dynamic) produce is volume-invariant, assuming they use the same compensation curve. Again, it is the goal of my algorithm to also account for the content loudness, not only the gain-knob state. I hope this makes it clearer.

A question: are you sure that the mastering engineer hasn't already done something similar?
Something similar to what? This is unclear.
 
As my A/B test concluded, my algorithm does not alter the sound audibly at 80 phon (I suppose this is what you mean by "reference level")
I mean volume settings, constant for playback of the whole piece of music. Obviously there are louder and quieter parts, where your algorithm will apply correction, which is not needed.
 
Obviously there are louder and quieter parts, where your algorithm will apply correction, which is not needed.
This is exactly what I tried to verify with my A/B tests, but I failed. So I concluded that the dynamic algorithm works in practice.

Your argument is like saying that filtering out everything above 20 kHz is not needed and therefore detrimental to sound quality. But it is not audible and therefore irrelevant (or even necessary for a proper, practically lossless DAC).
 
This is exactly what I tried to verify with my A/B tests, but I failed. So I concluded that the dynamic algorithm works in practice.
OK, it works, but can't be heard :)
Which is actually good, if it can be thoroughly tested and confirmed.
 
I have done some tests. I used the first track from Dark Side Of The Moon, after normalizing the album volume with WaveGain.

Using pink noise normalized with WaveGain, I checked that your plugin doesn't change the track volume if I set digital loudness to -23.5 dB. Then I processed the music with these settings.

I had no problem hearing the difference: I got 8/8 in ABX on the first try.
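As a side note on how strong an 8/8 ABX result is: under pure guessing each trial is a coin flip, so the chance of a given score can be computed with a one-sided binomial test. The helper below is a small sketch with a made-up name.

```python
from math import comb


def abx_p_value(correct, trials):
    """Probability of at least `correct` hits in `trials` by guessing (p = 0.5)."""
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials
```

For 8 correct out of 8 trials this gives 1/256, i.e. about 0.4 %, so guessing is a very unlikely explanation for that score.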
 
There is a methodological flaw with your approach. And a philosophical difference between your and my view on how LC should work.

First, ReplayGain (part of WaveGain) uses K-weighting for the loudness computation, whereas my algorithm uses the flipped ISO-226 80 phon curve. You set digital loudness to -23.5 dB, i.e., the flipped-ISO-226-weighted loudness of pink noise played back at a level whose K-weighted loudness matches the album average. Even when assuming the music has the same average spectral distribution as pink noise, this won't result in the d(x) parameter of my algorithm averaging 0 after one whole playback of the album, which is probably what you tried to achieve. The resulting offset of the d(x) value probably causes a general misalignment of the overall tonality. For your approach, it would be correct to compute the (flipped ISO-226 weighted) loudness of the average of the album and set digital loudness to that value.
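How a mis-set digital loudness turns into an offset of d(x) follows from the definitions in the first post: with P = 80 + L - L_80 and d = 1 - (P - 40) / 40, one gets d = (L_80 - L) / 40. The sketch below only restates that arithmetic; the function names are made up, and the 2 dB figure in the usage note is purely illustrative, not a measured value.

```python
def deviation_from_levels(level_dbfs, l_80_dbfs):
    """d = (L_80 - L) / 40, derived from P = 80 + L - L_80 and d = 1 - (P - 40) / 40."""
    perceived_phon = 80.0 + level_dbfs - l_80_dbfs
    return 1.0 - (perceived_phon - 40.0) / 40.0


def deviation_offset(l_80_error_db):
    """Constant shift of d caused by setting L_80 too high by l_80_error_db."""
    return l_80_error_db / 40.0
```

For example, if the digital loudness setting were 2 dB above the weighted average level of the album, d would average about 0.05 instead of 0, so a small amount of compensation would be applied throughout.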

I have processed some private music with the algorithm and set digital loudness (DL) manually for each track. Note that this cannot result in a smaller difference (between raw and processed) than the optimal value for DL would. I also provide the difference between the raw and processed audio. https://drive.google.com/drive/folders/1-7LF2jwGn9-9rx6JzcuwhuqX0fnv04ow?usp=sharing

Regarding the philosophical difference between your and my view on how LC should work: Let Y be the maximum volume (amplitude) your neutral (linear) monitoring system (MS) is capable of producing sound at. In your view, you set your neutral static LC gain to some volume X in dBFS. And then you assume that whatever music is played back was mixed and mastered with a neutral MS at volume Y + X and therefore (due to the neutral frequency responses) at the same loudness as with your monitoring system. Then you turn your knob, and you will get exactly the same perceived tonality using static LC, assuming the compensation curve matches your hearing or brain processing or whatever. The problem with this approach is that it does not work, because Y is unknown. So if you normalize a whole album (like you did), this will result in a wrong static LC gain offset. If Y were shipped with the music (like scene-based tonemapping in HDR10+ or Dolby Vision), then your approach would be correct.

My approach is to assume that (even every single section in) music has been mastered at a monitoring loudness of 80 phon, which is standard practice in mastering studios. This is because traditionally, a mastering engineer sets up the gain such that the "slow" C-weighted average loudness of the past few seconds is approximately 80 to 85 dB SPL. This is roughly equivalent to 80 phon, weighted with the flipped ISO-226 80 phon curve (maybe there is a 1-2 dB difference based on the spectral distribution of the music). It is also standard practice to adjust the gain per section. In summary, my algorithm ensures that the input signal is perceived with the same spectral distribution as if it was played back at 80 phon (the loudness the mastering engineer probably used). It abstracts away from the digital loudness of a signal and strictly enforces the philosophy of fixed perceived monitoring loudness, even on a momentary basis.

So, even if a difference between the raw and processed audio is audible, it is unclear whether this difference is incorrect, because if there is a difference in a given section of a musical piece, then there is certainly a large deviation between the average loudness of the track and the loudness the mastering engineer used to edit this section, and my algorithm works to compensate for that.
 
For your approach, it would be correct to compute the (flipped ISO-226 weighted) loudness of the average of the album and set digital loudness to that value.
Then how do you calibrate your setup? What are the test signal and measurement procedure for setting the amplifier gain for 80 phon at the listening position?
 