@Soundescape Hmm, I've added GTZAN data, but I'm seeing several issues with these tracks:
1) wildly varying in quality
2) mono
3) 22050 Hz sampling rate
4) no artist/album/track/release information
But it helped me show one thing: total unweighted RMS and K-weighted RMS are within +/-1 dB of each other for most (about 90%) of the music.
So if you can't measure the K-weighted RMS value, it's safer to just use the unweighted RMS value instead of adding a correction to the A-weighted value, as I suggested before.
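Roughly, that fallback could look like the sketch below. The function names are my own (hypothetical, not taken from the spreadsheet), samples are assumed to be floats in the range -1.0..1.0, and the input is assumed not to be pure silence.

```python
import math

def rms_dbfs(samples):
    """Unweighted RMS level in dB relative to full scale (1.0).

    Assumes float samples in -1.0..1.0 and a non-silent signal.
    """
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10 * math.log10(mean_square)

def level_for_calibration(unweighted_db, k_weighted_db=None):
    """Prefer the K-weighted value; otherwise fall back to unweighted RMS."""
    return k_weighted_db if k_weighted_db is not None else unweighted_db

# Full-scale sine, exactly 10 periods, just to have something to measure.
sine = [math.sin(2 * math.pi * n / 100) for n in range(1000)]
level = level_for_calibration(rms_dbfs(sine))
print(f"{level:.2f} dBFS")  # about -3.01 dBFS for a full-scale sine
```

No A-weighted value appears anywhere in this path, which is the point: the unweighted number is used as-is when no K-weighted measurement exists.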
Due to the issues listed above, I think I'll remove the GTZAN data again in the next version.
Here's something that makes more sense to me:
Pick at least one reference album per genre. Analyze that. People can then play the exact same album on their system to calibrate their volume levels.
Different services may use different masters, so it could make sense to repeat that for every service. It could also make sense to pick a "loud" (heavily dynamic-range-compressed) and a "quiet" album in each genre. (As we saw, with proper loudness normalization the differences in volume setting for the same target will be within ~2 dB, so picking the "louder" one would be the safest option.)
In the end, the user experience would be like this:
I listen to metal on Spotify, so I filter for that and copy the values of the reference album into the main sheet.
I configure my target and select a headphone. The spreadsheet tells me I need X volts and Y dB amp gain. Cool, now I know what to buy.
I configure the source and amp. Now the spreadsheet tells me that I need to set the volume to Z dB to be on target. Done.
As long as loudness normalization is engaged, all comparable music I listen to will be at roughly the same target level.
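To make the X / Y / Z steps above a bit more concrete, here is a minimal sketch of the kind of arithmetic involved. It is not the spreadsheet's actual formulas. It assumes the headphone sensitivity is given in dB SPL at 1 Vrms, that 0 dBFS RMS corresponds to the source's full-scale output voltage, and that only the average (RMS) level is considered, so peaks would need extra headroom on top. All function names and example numbers are made up for illustration.

```python
import math

def required_vrms(target_spl_db, sensitivity_db_spl_per_vrms):
    """Voltage at the headphone needed to reach the target average SPL."""
    return 10 ** ((target_spl_db - sensitivity_db_spl_per_vrms) / 20)

def source_vrms(full_scale_vrms, album_rms_dbfs):
    """Average voltage the source delivers when playing the reference album."""
    return full_scale_vrms * 10 ** (album_rms_dbfs / 20)

def required_gain_db(needed_vrms, available_vrms):
    """Total gain (negative means attenuation) between source and headphone."""
    return 20 * math.log10(needed_vrms / available_vrms)

def volume_setting_db(total_gain_db, amp_gain_db):
    """Volume control setting once the amp's fixed gain is accounted for."""
    return total_gain_db - amp_gain_db

# Illustrative numbers only (assumptions, not measured data):
x_volts = required_vrms(target_spl_db=85, sensitivity_db_spl_per_vrms=105)
available = source_vrms(full_scale_vrms=2.0, album_rms_dbfs=-20)
total_db = required_gain_db(x_volts, available)
z_db = volume_setting_db(total_db, amp_gain_db=6.0)

print(f"X = {x_volts:.3f} Vrms, total gain = {total_db:.2f} dB, Z = {z_db:.2f} dB")
```

With these example inputs, the headphone needs 0.1 Vrms, the source already delivers 0.2 Vrms on average, so about 6 dB of attenuation is needed overall, and the volume control ends up around -12 dB with a 6 dB amp.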