Yes, the idea is hypothetical and is pared down to an easy-to-pose question - maybe not easy to answer for some, though.
This is for the sake of pondering the idea.
Maybe you are an outlier or whatever; in any case, you are forced to choose between a genuine tested preference and genuine SOTA accuracy (which, as you found in a test, you enjoy less for whatever reason).
IMO your question is excellent for making me ponder the idea.
Roughly thirty-five years ago I ran up against almost exactly that situation as an amateur speaker builder. Having managed to borrow high-quality test equipment from a technician for the weekend, I was fine-tuning a loudspeaker design in pursuit of what I KNEW was the ideal frequency response: flat. The closer I got to "flat", the worse the speaker sounded, but I persevered faithfully because I KNEW that once I got to the "promised land" of flat frequency response, everything would snap together and sound magnificent.
It took me all weekend but I finally had a speaker that measured plus or minus about 1.25 dB over the range where I thought I was getting valid data. And it was unlistenable. Absolutely dreadful.
Well, obviously what I "KNEW" was incorrect. As a result I started spending evenings at the local university library reading relevant papers in the Journal of the Audio Engineering Society, Acoustical Society of America, Wireless World, and whatever else seemed applicable. I won't bore you with the details, but briefly, that's how I first learned that the off-axis response matters too.
Getting back to your hypothetical, IF I had good reason to believe that my own preference was anomalous AND I was making the selection primarily for the listening enjoyment of other people who would presumably prefer Set A, THEN I would buy Set A. Otherwise, I'd buy Set B.