So, I feel @Soundmixer has been offering counterargument/disagreement with my initial statement. And, having thought about it, I think I'm kinda wrong. Or rather, perhaps the slant I've put on the issue is. (I do disagree on a few other minor details, but I'll circle back to those at the end of this post...)
Usually speakers are optimized in isolation. So the phase response between speakers doesn't match. The result at low frequencies is a bass response that isn't representative of the bass content in the recording whenever coherent bass content is present in multiple channels.
AND
He's not trying to account for unknowns. He's experiencing a level discrepancy between the recorded content and playback in his reference environment with/without bass management enabled and varying speaker counts.
This is exactly it.
And the reason for it is that two (or more) speakers won't play as fully coherent sources in most normal circumstances. Factors affecting how much they sum include, but are probably not limited to:
- The lower the frequency, the more completely they sum
- The closer the speakers are to each other, the more they sum
- How absorbent the room's acoustic treatment is at LF
- The size of the "mix position" / "sweet spot"
- Any phase correction applied
Instead, we normally assume a doubling of speaker sources to add 3dB. This is in everything we do really, and it's an oversimplified assumption. Even in a basic stereo mix, we use a -3dB pan law. (Meaning, something panned hard left or right will modulate 3dB hotter on that single channel than if panned centrally and mod'ing on both.) In reality, some frequencies will be more additive in the room (up to 6dB) and others will be less; subtractive even, depending on listening position vs speaker locations, room reflections etc. But this -3dB pan law is everywhere.
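To put numbers on that, here's a quick Python sketch of a constant-power (-3dB) pan law. The function is just my own illustration, not any particular panner's implementation:

```python
import math

def constant_power_pan(level_db, pan):
    """-3dB (constant-power) pan law. pan: -1 = hard left, +1 = hard right.
    Returns (left_db, right_db) for a source sitting at level_db."""
    theta = (pan + 1) * math.pi / 4       # map [-1, 1] onto [0, pi/2]
    gains = (math.cos(theta), math.sin(theta))
    return tuple(level_db + 20 * math.log10(g) if g > 1e-9 else float("-inf")
                 for g in gains)

print(constant_power_pan(80.0, 0.0))  # centre: (~77.0, ~77.0), each channel 3dB down
print(constant_power_pan(80.0, 1.0))  # hard right: (-inf, 80.0), 3dB hotter than centre
```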
A big part of this is the really simple physics that two point sources in different locations will have points of constructive and destructive interference. You can't phase-optimise a usable listening area over the whole spectrum, as the wavelengths are too short. This website's got the usual "two-point source" diagram showing this:
https://www.olympus-ims.com/pl/ndt-tutorials/intro/breif-history/
Of course, the lower the frequency, the less of an issue this becomes (or at least, the more correctable it is with time/phase alignment within a small physical area), but it's still there. This is before you even take into account the reflections.
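If you want to play with it, here's a toy Python calculation of that two-point-source picture. The geometry and frequencies are made up purely to demonstrate the comb filtering:

```python
import cmath, math

SPEED_OF_SOUND = 343.0  # m/s

def two_source_sum_db(freq_hz, path_diff_m):
    """Level of two equal, coherent sources at a listening point, relative to
    one source alone; only the path-length phase difference is considered."""
    phase = 2 * math.pi * freq_hz * path_diff_m / SPEED_OF_SOUND
    amplitude = abs(1 + cmath.exp(1j * phase))
    return 20 * math.log10(amplitude) if amplitude > 1e-12 else float("-inf")

# Listener slightly off-centre: a 7cm path difference between the two speakers.
for f in (60, 1225, 2450):
    print(f, "Hz:", round(two_source_sum_db(f, 0.07), 1), "dB")
# 60Hz -> ~+6dB (near-full coherent sum); 1225Hz -> ~+3dB; 2450Hz -> full null
```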
In BM, there are no such "mathematical" summing imperfections. But does this mean bass management is better? It's just different (which is really what everyone's been saying here; I'm just a bit late to the party!). It's mono. It's mathematically perfect. But it's missing information. Two out-of-phase signals cancel out in a way they never would in a real space.
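A trivial sketch of that last point: summed electrically, anti-phase content disappears completely, in a way it never quite does acoustically:

```python
import numpy as np

fs = 48000
t = np.arange(fs) / fs
left = np.sin(2 * np.pi * 50 * t)   # 50Hz tone in the left channel
right = -left                       # identical content, polarity flipped

bm_mono = left + right              # the bass-managed electrical sum
print(np.max(np.abs(bm_mono)))      # 0.0 -- complete cancellation
```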
I think I've simplified and gone back to basics in my head about where I feel the discrepancy is, and it comes back to our -3dB pan law. In fact, I'd say the -3dB pan law IS the error. The discrepancy is very simply the difference between a -3dB pan law and a +6dB summing of phase-coherent sound from 2 channels in BM.
So, I'll give an example:
1 channel = 80dB
Pan this to 2 channels via pan law = 77dB x2.
Sum these back together into a mono and send to the LFE: 77 + 20log10(2) = 83dB.
Meanwhile, 77dB x2 above the BM frequency only sums back to around 80dB in the room. Maybe a bit more, but given the shorter wavelengths, highly unlikely to ever give 83.
Now extrapolate that out to 7.1.4 and the discrepancy gets wider.
This isn't exactly right, as the levels are not what you'd measure - that would depend on how much content was above and below the BM filter - but the imbalance is the important bit.
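Putting that arithmetic in one place (back-of-envelope only; the "in-room" figure assumes an idealised power sum, which a real room won't give you exactly):

```python
import math

source_db = 80.0
per_channel = source_db - 3.0                    # -3dB pan law: 77dB per channel

bm_sum = per_channel + 20 * math.log10(2)        # coherent electrical sum: 83.0dB
room_sum = per_channel + 10 * math.log10(2)      # idealised in-room power sum: ~80.0dB

print(round(bm_sum - room_sum, 1))               # ~3.0dB discrepancy for 2 channels

# Same coherent content across all 7 bed channels of a 7.1 (heights/LFE ignored):
print(round(20 * math.log10(7) - 10 * math.log10(7), 1))   # ~8.5dB
```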
the main channel limiter (to address bass build-up with BM), and an LFE limiter (to prevent overload of that channel while encoding the content). The specifics of how they all work together are not well understood, and I am sure Dolby wants it to stay that way (proprietary information?).
So, yes, information is limited and I don't know about the object limiting. However, I do feel that these (channel / LFE) limiters are nothing more than a "clip-stopper" on a per-speaker basis.
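To be clear about what I mean by "clip-stopper": something conceptually like the sketch below. Obviously not Dolby's actual algorithm, and missing the attack/release smoothing a real limiter would have:

```python
import numpy as np

def clip_stopper(block, ceiling=1.0):
    """Crude per-channel 'clip-stopper': scale a block down only when its
    peak would exceed the ceiling. Just the bare concept, nothing more."""
    peak = np.max(np.abs(block))
    return block * (ceiling / peak) if peak > ceiling else block
```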
I believe the encoded object count to be 11+1 (LFE) or 15+1 (LFE), depending on use case. If you don't mod the LFE track, you regain that as a spatial object. Difficult to prove, but it could be done, I guess. This is neither here nor there, really.
Regarding LFE with full-band content: I would say that, personally, I wouldn't intentionally let a mix go out with high-frequency content in the LFE. It's just too risky for me, regardless of the negative effects. As it goes, for film work I tend to LPF the LFE at like 75Hz or lower (unless I really need the LFE a bit wider, in which case I'll just open up the LPF temporarily), which I guess accidentally goes a little way towards reducing the discrepancy by it being re-filtered down the line. I may do things differently in music mixes, of which I've done a few but not loads.
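For what it's worth, offline in Python that kind of LFE low-passing might look like the sketch below. The 75Hz corner and 4th-order Butterworth are arbitrary choices of mine, and zero-phase filtering is an offline luxury a real-time monitor chain wouldn't have:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def lpf_lfe(lfe, fs, cutoff_hz=75.0, order=4):
    """Zero-phase Butterworth low-pass on an LFE track (offline only)."""
    sos = butter(order, cutoff_hz, btype="low", fs=fs, output="sos")
    return sosfiltfilt(sos, lfe)

fs = 48000
t = np.arange(fs) / fs
lfe = np.sin(2*np.pi*40*t) + 0.5*np.sin(2*np.pi*500*t)  # wanted 40Hz + 500Hz leakage
cleaned = lpf_lfe(lfe, fs)                              # 500Hz heavily attenuated
```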
@Soundmixer appears to feel differently and personally I think that's fine. Both stances are justifiable and logical to my mind.
Doesn't the limiter kick in if you have like 100 FS objects in the DAW? Does it change with monitoring configuration, e.g. 5.1.2 instead of 7.1.4?
I'd like to find out because this would allow us to determine the worst case scenario. Most AVR manufacturers seem to apply their own bass management downstream of Dolby processing and I've found that they simply use a static headroom. Sometimes the headroom even shrinks when turning up the master volume control. Not a desirable situation.
Yes, the renderer's limiter absolutely does kick in. And yes, it changes with monitoring config - but only in the sense that you'll probably run out of headroom in 5.1.2 sooner than in 7.1.4. More objects get summed into the same "output" hole with lower speaker counts, so the limiter will grab more to keep it from clipping.
I think my point was more that, in the RMU (un-encoded Atmos) you can have 100 really loud objects and the limiter will work really hard. Once it's been encoded down to 12 spatial objects, I'm not sure they actually get truncated to 12 "full scale" objects at that point. My suspicion is they're stored with headroom (in floating point) and then limited at the decoder. I could easily be talking nonsense; I don't know....
If that's the case, the totally hypothetical worst-case scenario without a limiter would be insane: 118-ish objects all at full scale, encoded down to 12 beyond full scale in FP, then summed down to a single channel....
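For scale, the arithmetic on that hypothetical (assuming equal, fully coherent, full-scale signals, which real content never would be):

```python
import math

def coherent_overload_db(n):
    """dB above full scale when n equal full-scale coherent signals sum."""
    return 20 * math.log10(n)

print(round(coherent_overload_db(12), 1))    # ~21.6dB over FS: 12 objects into one channel
print(round(coherent_overload_db(118), 1))   # ~41.4dB over FS for the 118-object case
```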
And I agree, the last situation you describe is a bit crazy.