Just looking initially, I notice that the rise or decline in loudness by year is a reasonably close fit to the number of albums released each year. Probably just a happenstance correlation.
https://en.wikipedia.org/wiki/Category:Albums_by_year
Now, looking at their methodology, which uses the album rankings at
www.besteveralbums.com, maybe the above correlation isn't happenstance. They looked at the top-ranked 150 albums of each year. In the '60s that would be 150 out of a few hundred albums; in the 2010s it would be the top 150 of more than 3,000. It may be that in years with more releases, loudness matters more for standing out from the large number released, while in years with fewer releases it is a lesser factor. Perhaps they should have looked at the top 10% of each year to account for this possible effect.
It is possible that louder albums always sit near the top of such a list. By not adjusting to a fixed percentile (top 10%, 5%, or the like), they allow the top 5% to be counted in some years and the top 30% in others, which would tend to hide the extent to which louder is a plus, while still appearing to support the idea that louder is better.
I've only read the first little bit, though; perhaps they somehow account for this later in the article.
EDIT: I finished the article, and they never deal with the above issue of variable album numbers per year; they sample the same number from each year. An obvious fault in the methodology of their analysis. Even if their conclusions were to hold up with a per-year sample size adjusted for release counts, it seems very likely the robustness of those conclusions would be watered down. The conclusions may not hold up once the sampling methodology is improved.
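The sampling problem described above can be sketched numerically. The release counts below are made-up illustrative figures (not data from the paper or from besteveralbums.com); the point is only that a fixed top-150 covers a very different slice of each year's output:

```python
# Sketch of the sampling issue: a fixed top-150 per year covers a very
# different share of each year's releases. The release counts here are
# rough illustrative figures, NOT data from the paper.
albums_per_year = {1965: 400, 1975: 900, 1985: 1500, 1995: 2200, 2010: 3200}

SAMPLE_SIZE = 150  # the paper's fixed per-year sample

for year, total in sorted(albums_per_year.items()):
    coverage = SAMPLE_SIZE / total            # fraction of that year's releases sampled
    top_10pct = max(1, round(total * 0.10))   # a fixed-percentile alternative
    print(f"{year}: top 150 = top {coverage:.0%} of releases; "
          f"a top-10% rule would sample {top_10pct} albums")
```

With these assumed numbers, the fixed sample spans the top ~38% of 1965's releases but only the top ~5% of 2010's, which is exactly the inconsistency a fixed-percentile rule would avoid.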
@Blumlein 88, thanks for your comment on the Deruty-Pachet (2015) paper!
Yes, you are right that there may be hidden biases in the selection of empirical data. This is a researcher's worst nightmare: that the data are faulty and full of biases.
Let me now try to comment on your remarks, keeping other ASR readers in mind as well.
The authors discuss their findings (on year being more important than genre) in section 4.3, where they write: "This may bring the suspicion that dynamics are only dependent on the trends followed by the most represented genres, such as the subgenres of rock represented in Figure 3, but independent from the trends followed by most other genres, in which case our conclusion would not stand" (my underlining).
They try to show that their results are year-related while controlling for genre. That analysis may implicitly give us an insight into what would have happened to their results had they controlled for total album releases per year. The 5-95 percentile bars in Figure 1 may also indicate how a control for total album releases per year would have fared.
See also this ISMIR poster that accompanies their 2015 article:
http://www.emmanuelderuty.com/pages/publications/2015_ISMIR_poster.pdf
FWIW, the entire dataset of Deruty-Pachet (2015) can be found here:
http://emmanuelderuty.com/pages/dynamics/Corpus7200/Values.xlsx
But you are right: it would have been great to have a complete dataset of all recordings made since the 1950s and 1960s.
One interesting finding in their paper is their note on micro vs. macrodynamics:
"A notable exception lies in macrodynamics as measured by the EBU 3342 Loudness Range, which are more independent from both genre and year of release. In other words, dynamic range in the musical sense (pianissimo to fortissimo) is only marginally dependent on either mainstream genre or trend (...) As an exception, macrodynamics, which have not been significantly influenced by the loudness war, appear to increase since the loudness war’s peak, and are currently reaching very high values".
In other words, the loudness debate lacks nuance if it doesn't distinguish between micro- and macrodynamics.
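To make the micro/macro distinction concrete, here is a toy sketch. It is NOT the EBU Tech 3342 algorithm (which works on gated LUFS loudness); it uses plain RMS in dB as a crude stand-in, with crest factor as a microdynamics proxy and the spread of short-term levels as a macrodynamics proxy:

```python
# Simplified illustration of micro- vs. macrodynamics on a synthetic signal.
# This is not the EBU Tech 3342 measurement; RMS in dB stands in for LUFS.
import numpy as np

sr = 1000                      # toy sample rate
t = np.arange(0, 30, 1 / sr)   # 30 "seconds" of signal

# A tone whose long-term level alternates between loud and quiet passages
# (macrodynamics) while the waveform itself stays smooth (low microdynamics).
envelope = 0.2 + 0.8 * (np.sin(2 * np.pi * t / 15) > 0)  # loud/quiet sections
signal = envelope * np.sin(2 * np.pi * 5 * t)

def rms_db(x):
    return 20 * np.log10(np.sqrt(np.mean(x ** 2)) + 1e-12)

# Microdynamics proxy: crest factor (peak level minus overall RMS level).
crest_db = 20 * np.log10(np.max(np.abs(signal))) - rms_db(signal)

# Macrodynamics proxy: spread of short-term (3 s) levels, akin in spirit
# to the Loudness Range's percentile-based spread.
win = 3 * sr
short_term = [rms_db(signal[i:i + win]) for i in range(0, len(signal) - win, win)]
lra_proxy = np.percentile(short_term, 95) - np.percentile(short_term, 10)

print(f"crest factor ~ {crest_db:.1f} dB, loudness-range proxy ~ {lra_proxy:.1f} dB")
```

A heavily limited master would show the opposite pattern: the crest factor collapses (microdynamics squashed) even while loud/quiet song sections, and hence the macrodynamic spread, can survive, which is the paper's point.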
Please note that the paper was presented at International Society for Music Information Retrieval, ISMIR:
http://www.ismir.net/society.php
This perspective, big data in audio, is highly interesting because it replaces opinion and anecdote with fact. It may also draw more computer people into audio science. Needless to say, habit and convention predict that old-school audio people will be a bit skeptical towards this new breed of audio scientists, who are more pattern-oriented than case- and anecdote-oriented.
Many people, including people at ASR, think the loudness war is a battle lost, so they keep on fighting as guerrillas. However, other people who are held in high regard are of a totally different opinion. Bob Katz (2013) thinks the war is over, thanks to loudness normalization features in distributors like iTunes (Katz's original blog post is no longer available):
https://www.soundonsound.com/techniques/end-loudness-war
Lastly, just a couple of words on the authors, Deruty and Pachet. Deruty is a frequent publisher of scientific audio articles:
http://emmanuelderuty.com
Pachet is the better-known name of the two:
https://en.wikipedia.org/wiki/François_Pachet
He is, among other things, a fellow of the European Association for Artificial Intelligence, which may be an indication of his ability to deal with datasets.