It might be easier to interpret the results if you order the mean ratings from highest to lowest. I did this and removed the 2-3 modifications they made to the 8 headphones, since I was less interested in those, although they might be the focus of a different question.
The graph below shows the mean of all ratings (all listeners, programs, and repeats). They didn't code the data by location (Danish vs. Japanese listeners), so we can't analyze it by culture.
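For anyone who wants to redo this with the raw data, here is a minimal Python sketch of the averaging and ordering step. It pools every rating per headphone condition across listeners, programs, and repeats, then sorts from highest to lowest mean. It is not the study authors' analysis. The tuple layout, the condition names, and the `exclude` parameter (used to drop the modified versions) are hypothetical and only serve as illustration.

```python
from collections import defaultdict
from statistics import mean


def rank_by_mean_rating(ratings, exclude=()):
    """Return (condition, mean_score) pairs sorted from highest to lowest mean.

    Hypothetical input format: an iterable of
    (condition, listener, program, repeat, score) tuples.
    Listener, program, and repeat are pooled, so each condition's mean is
    taken over all listeners, programs, and repeats.
    """
    excluded = set(exclude)  # e.g. names of the modified headphone variants
    scores = defaultdict(list)
    for condition, _listener, _program, _repeat, score in ratings:
        if condition not in excluded:
            scores[condition].append(score)
    means = {condition: mean(values) for condition, values in scores.items()}
    # Highest mean rating first
    return sorted(means.items(), key=lambda item: item[1], reverse=True)
```

Because the listener column is pooled, this cannot separate Danish and Japanese listeners. That split would require a location field that, as noted above, the data doesn't include.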
The ratings of the headphone curves are remarkably similar to our results: the Harman target is preferred, and the diffuse-field and free-field curves are rated progressively lower. It is amazing to me, and to one of the co-authors I spoke with, that two separate experiments using different subjects and methodologies can come up with similar conclusions.
They did use two of our standard tracks (Jennifer Warnes, "Bird on a Wire", and Tracy Chapman, "Fast Car"), so there was that commonality in methods. They also used a virtual headphone method, applying equalization filters to a replicator headphone so that rapid A/B comparisons could be made using a MUSHRA method. They used a closed-back headphone, which surprises me because open-back designs generally eliminate the issue of leakage varying among subjects.
[Graph: mean ratings across all listeners, programs, and repeats, ordered from highest to lowest]