As you say, our ears adapt to quirks in a speaker design fairly quickly.
I do most of my design work with measurements, paying attention to directivity and making sure there are no major resonances. Only then do I have 2-3 people do a single-blind test: I play them short clips of songs in mono, swap to the second speaker, and ask them to rate any differences on a scale of -10 to +10.
Most of the time, people are pretty happy with a neutral response. Some prefer a slight treble rise, although I usually don't build this into the speakers, because it becomes less endearing over longer listening sessions. A universal constant is that people love good bass response, so I conduct listening tests with well-integrated subs. The subs are only turned off if I am specifically evaluating a speaker whose owner will not use a sub.