I ran the scores for the Neumann KH80, Focal Chora 806, Devialet Reactor 900, Q Acoustics 3030i, D&D 8C, and Fluance Ai60 today. I was getting a little paranoid at how close all the scores were, but then the Fluance Ai60 ended up with significantly worse performance, as expected.
Scores are listed as: normal score / ignoring LFX / using listening window
KH80: 5.7/8.1/8.2
Chora 806: 5.7/7.7/8.1
Q Acoustics 3030i: 5.2/7.5/7.8
Devialet Reactor 900: 7.2/7.6/7.8
Dutch 8C: 7.4/8.4/8.4
Fluance Ai60: 3.4/5.4/5.6
Amphion Argon1: 5.3/7.3/7.5
Polk L200: 5.6/7.6/7.7
KEF R3: 6.3/7.8/7.8
Buchardt S400: 6.3/8.1/8.1
Obviously take this all with a grain of salt. Just sharing for the curious and to see if there's any merit in using similar performance metrics from DIY data.
Aside from the obvious fact that my measurements aren't truly anechoic and are more subject to human error, I also have to 'cheat' on bass performance for the off-axis curves because I normally only do the single nearfield splice for the on-axis. VituixCAD lets me simulate the other angles, but it's kind of trial and error and guesstimating based on expected results. Still, I think the results are largely in line with what I'd expect from my own interpretations of the measurements, especially the ignore-LFX score.
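For anyone wondering how the normal and ignore-LFX columns relate, here is a minimal sketch of the score formula, assuming the calculation follows Olive's published preference model (the coefficients below come from that model). The function name and sample inputs are made up for illustration, and the 14.5 Hz value used for the ignore-LFX case is an assumed convention, not something confirmed by the numbers above.

```python
import math

# Assumed convention: "ignoring LFX" means pretending the speaker (with a
# perfect sub) reaches -6 dB at 14.5 Hz. This value is an assumption.
IDEAL_SUB_HZ = 14.5

def olive_preference(nbd_on, nbd_pir, lfx_hz, sm_pir):
    """Olive preference score from the four model inputs.

    nbd_on  : narrow band deviation of the on-axis response (dB)
    nbd_pir : narrow band deviation of the predicted in-room response (dB)
    lfx_hz  : -6 dB low frequency extension point in Hz
    sm_pir  : smoothness (r^2) of the predicted in-room response, 0..1
    """
    lfx = math.log10(lfx_hz)
    return 12.69 - 2.49 * nbd_on - 2.99 * nbd_pir - 4.31 * lfx + 2.32 * sm_pir

# Hypothetical inputs, not taken from any of the speakers listed above.
normal = olive_preference(0.4, 0.3, 50.0, 0.9)
ignore_lfx = olive_preference(0.4, 0.3, IDEAL_SUB_HZ, 0.9)
print(f"normal: {normal:.1f}, ignoring LFX: {ignore_lfx:.1f}")
```

Under these assumptions, a speaker with zero narrow band deviation and perfect smoothness tops out at roughly 10 once LFX is ignored.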
Q Acoustics is definitely winning on price-to-performance ratio. The Devialet and Dutch 8C are notable for having by far the highest preference scores without subs, because of their bass extension. It's also encouraging to see that the 8C scores best, which is the expected result, though diminishing returns kick in early.
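One quick way to see the bass effect is to subtract the normal score from the ignore-LFX score for each speaker. This is just a sketch using the numbers posted above; the dictionary layout and function name are my own choices for illustration.

```python
# Scores copied from the list above: (normal, ignoring LFX, listening window).
scores = {
    "KH80": (5.7, 8.1, 8.2),
    "Chora 806": (5.7, 7.7, 8.1),
    "Q Acoustics 3030i": (5.2, 7.5, 7.8),
    "Devialet Reactor 900": (7.2, 7.6, 7.8),
    "Dutch 8C": (7.4, 8.4, 8.4),
    "Fluance Ai60": (3.4, 5.4, 5.6),
    "Amphion Argon1": (5.3, 7.3, 7.5),
    "Polk L200": (5.6, 7.6, 7.7),
    "KEF R3": (6.3, 7.8, 7.8),
    "Buchardt S400": (6.3, 8.1, 8.1),
}

def lfx_gain(table):
    """Points gained when bass extension is taken out of the score."""
    return {name: round(ignore - normal, 1)
            for name, (normal, ignore, _lw) in table.items()}

gains = lfx_gain(scores)

# Smallest gain first: these speakers lose the least to limited bass.
for name, gain in sorted(gains.items(), key=lambda kv: kv[1]):
    print(f"{name:22s} +{gain:.1f}")
```

The two speakers with the smallest gain are the Devialet and the 8C, which is consistent with their bass extension doing most of the work in the normal score.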
I think this also shows it's going to be really hard for any speaker to ever get out of the 8s.
Next step is to take some of Amir's Klippel-generated data, apply a 6.5 ms gate, and see how much the score calculations change, to estimate how much the lowered resolution affects score accuracy.
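As a rough guide to what a 6.5 ms gate costs, the frequency resolution of a gated measurement is about one over the gate length. The sketch below only does that arithmetic; the function names are hypothetical and it does not process any measurement data.

```python
def gate_resolution_hz(gate_ms):
    """Approximate frequency resolution (Hz) of a gated measurement: 1 / gate length."""
    return 1000.0 / gate_ms

def points_per_octave(freq_hz, gate_ms):
    """Independent data points in the octave starting at freq_hz.

    The octave spans freq_hz to 2 * freq_hz, so its width equals freq_hz.
    """
    return freq_hz / gate_resolution_hz(gate_ms)

gate_ms = 6.5
print(f"resolution: {gate_resolution_hz(gate_ms):.0f} Hz")
for f in (300, 1000, 10000):
    print(f"{f:>6} Hz: {points_per_octave(f, gate_ms):.1f} points in the octave above")
```

So with this gate there are only about two independent points in the octave above 300 Hz, which is where I'd expect the score to be most sensitive to the lower resolution.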
EDIT: Realized that my L200 calculation was only using 12 points instead of 24. Scores went up a bit from 5.5/7.5/7.5 to 5.6/7.6/7.7