So maybe the effect isn't significant enough to produce a pronounced difference in how songs are mixed, and so a roughly flat speaker is preferred even when these originally stereo songs are played in mono. The preference scores correlate well overall, but IMO more work needs to be done on how well preferences track measurements among multiple very competent loudspeakers (e.g. approaching ±1 dB flatness with good directivity). So many speakers don't approach this level in the first place, so I feel the existing work on preference scores is more of a separating-the-wheat-from-the-chaff type of thing, but it seems like a wash when all the speakers under test meet a basic level of competence. In other words, the perceived FR deviations caused by the crosstalk, and/or by mixing engineers compensating for it, may be dwarfed by the sins of the speakers themselves the vast majority of the time.
I’m not an expert in this area by any means, so these are just a couple of possibilities that came to mind. I’m also drunk and posting this in bed, so sorry if I missed something obvious. It’s an interesting question!