If you're like me, when you listen to music (or TV/movies or any other content), you will increase the volume to the maximum level that still sounds good. It's well known that, in general, "louder" is perceived as "better". But I think everyone would agree that this is only true up to a point. There is always a point where increasing the volume further makes the sound less appealing and more… annoying, "aggressive", "shouty" or whatever you want to call it.
From a psychoacoustical perspective, for a given piece of content, how is this "maximum subjectively preferred loudness level" determined? What makes us want to turn down the volume?
(Note that for the remainder of this discussion, I am assuming that we are not discussing levels so high as to get physically uncomfortable (i.e. close to the pain threshold and the like). I think it's already quite obvious why someone would not want to push the volume past such limits.)
Intuitively, the most straightforward theory is that when we perceive something as "too loud" it is because the playback system is being driven into clipping. (By "clipping" here I mean any kind of non-linear distortion that increases quickly above a certain playback level threshold, including compression.) Clipping degrades the audio quality (duh), making the sound less pleasant and compelling the listener to turn down the volume to get a more enjoyable result.
I'm not saying this doesn't happen - clearly, if you run into clipping you're gonna have a Bad Time and you'll want to turn the volume down.
However, in my experience clipping alone doesn't explain all instances where we perceive something as "too loud".
In particular, I tend to notice (and obviously this is very subjective and anecdotal) that this "maximum subjectively preferred loudness level" tends to be significantly higher on systems that have better frequency response. One especially telling example is that I'm happy to listen to properly room EQ'd systems much louder than I would without EQ. This leads me to suspect that this is not just about clipping or non-linear distortion (in fact room EQ would tend to reduce headroom) - the frequency response of the system plays a role as well.
I was comforted in this view when I ran an experiment whereby I recorded "reasonably loud" as well as "uncomfortably loud" music played back through my speakers to a microphone, and then level-matched the recordings and ABX'd them through headphones. I was unable to distinguish between them, which strongly suggested that no audible clipping/compression was taking place and that system non-linearities could be ruled out as a factor in my perception of excessive loudness. (This was a very useful experiment by the way, because it convinced me that the peak SPL of my speakers was perfectly fine and that there was no need to upgrade them.)
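For anyone wanting to reproduce this, the key step is the level-matching: scaling one recording so its RMS level matches the other removes the overall loudness cue, so the ABX only tests for spectral or distortion differences. Here's a minimal sketch of that step (the toy signals stand in for the actual microphone recordings):

```python
# Minimal sketch of RMS level-matching two recordings before ABXing
# them. The signals below are synthetic placeholders for the actual
# microphone captures of the "reasonably loud" and "uncomfortably
# loud" playback.
import numpy as np

def rms(x):
    """Root-mean-square level of a signal."""
    return np.sqrt(np.mean(np.square(x)))

def level_match(reference, candidate):
    """Scale `candidate` so its RMS level matches `reference`."""
    return candidate * (rms(reference) / rms(candidate))

# Toy example: the same waveform played back 12 dB louder.
t = np.linspace(0, 1, 48000)
quiet = 0.1 * np.sin(2 * np.pi * 440 * t)
loud = quiet * 10 ** (12 / 20)

matched = level_match(quiet, loud)
# After matching, the two signals have the same RMS level; absent
# non-linearities, they are now identical up to floating-point error.
print(round(rms(matched) / rms(quiet), 6))  # → 1.0
```

If clipping or compression had occurred at the higher level, the waveforms would still differ after this gain match, and the difference would be audible in the ABX.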
I would like to put forth a hypothesis, which is that we perceive something as "too loud" when the peaks in the signal spectrum are too loud. This would explain why a non-EQ'd system with a poor frequency response runs into "too loud" territory quicker as one turns up the volume - it's because the peaks (i.e. the resonances) in the irregular frequency response get too loud, creating a "perceptual loudness bottleneck" preventing the rest of the spectrum from being turned up higher. This would be consistent with the well-known fact that we are more sensitive to peaks in the frequency response than to dips.
This problem can be caused by a problematic playback system, but of course these peaks could be in the source material too. Such poor recordings will sound unpleasant when pushed too loud, whereas properly made recordings free of resonances can be enjoyed at higher volume.
I wonder if anyone is aware of any research that looked at this question? Namely the factors that contribute to the "maximum subjectively preferred loudness level" for audio reproduction.
If my hypothesis is correct, this has implications with regard to the advice that should be given to someone who complains that their system doesn't produce enjoyable sound at high levels. I get the impression that the standard approach to this question (on ASR and elsewhere) is to assume clipping and recommend equipment capable of higher SPL levels. But I'm really sceptical this is the right approach in all (or even the majority of) cases, and I suspect many such cases have more to do with frequency response and source material quality instead.