Hey, welcome to ASR!
Let's assume the room is the same and only the speakers or headphones vary.
Perceptions of "detail" can come from a few different factors. I tend to interpret "detail" as "subjective ability to hear short and/or quiet sounds in the recording".
It's often associated with boosted high frequencies or even distortion in high frequencies. This is probably because short, sharp, high-frequency sounds tend to come across as "detail". So with more energy in the highs, you hear "more detail".
However, flat frequency response and low noise and distortion (THD as well as IMD) should also lead to more "detail". With distortion or uneven frequency response, you end up with masking, i.e. the "detail" is overwhelmed by something else coming out of the speaker. If everything is flat, with no noise or distortion to cover up the detail, you should also hear a clear, "detailed" sound.
You can sometimes experience this for yourself in interesting ways. If you play with EQ while listening to a song, sometimes a "detail" will disappear or reappear in a frequency range that you didn't touch with EQ. This is because of masking!
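To put rough numbers on the masking idea, here is a minimal Python sketch that only compares levels. It assumes the distortion products and the "detail" land close enough in frequency to compete, which real masking depends on heavily, so treat it as back-of-the-envelope arithmetic and not a psychoacoustic model. The function names and the -50 dB "detail" level are made up for illustration.

```python
import math

def ratio_to_db(ratio):
    """Convert an amplitude ratio to decibels."""
    return 20.0 * math.log10(ratio)

def detail_margin_db(detail_db, thd_percent):
    """How far a quiet 'detail' sits above (+) or below (-) the distortion products.

    detail_db   : level of the quiet sound relative to the loud one (hypothetical)
    thd_percent : total harmonic distortion of the loud sound, in percent
    """
    distortion_db = ratio_to_db(thd_percent / 100.0)
    return detail_db - distortion_db

# Hypothetical quiet sound 50 dB below the main signal
margin_high_thd = detail_margin_db(-50.0, 1.0)    # speaker with 1% THD
margin_low_thd = detail_margin_db(-50.0, 0.01)    # speaker with 0.01% THD
```

1% THD works out to distortion products around -40 dB, so the -50 dB sound ends up below them. At 0.01% THD the distortion is around -80 dB and the same sound sits comfortably above it.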
Lack of resonances in the speaker might also be important, for the same reason. If you have a honky 700 Hz resonance tail coming out of the port for longer than it should, it could mask quiet, short sounds, i.e. "details".
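For a feel of how long such a tail can hang around, here is a small sketch that treats the resonance as an idealized second-order resonator, whose envelope decays roughly as exp(-pi * f0 * t / Q). That model and the Q values are assumptions for illustration. A real port resonance is messier, and `ring_time_s` is just a name I picked.

```python
import math

def ring_time_s(f0_hz, q, drop_db=60.0):
    """Time for an idealized second-order resonance to decay by drop_db.

    Assumes the envelope follows exp(-pi * f0 * t / Q).
    """
    drop_ratio = 10.0 ** (drop_db / 20.0)
    return q * math.log(drop_ratio) / (math.pi * f0_hz)

# Hypothetical Q values for a 700 Hz resonance
well_damped = ring_time_s(700.0, 2.0)
ringy = ring_time_s(700.0, 10.0)
```

With these assumptions, a Q of 10 at 700 Hz takes about 31 ms to drop by 60 dB, five times longer than a Q of 2. The higher the Q, the longer the tail sits on top of whatever quiet sound comes next.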
A more minor factor is probably the group delay / phase response of the speaker. If everything comes out of the speaker at the same time, this makes for a sharper impulse response. Very small sounds with impulse-like envelopes might be more audible if the phase response is really clean. This is a matter of some debate, but at the end of the day, all else held equal, it can't hurt to have more phase coherence.
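If you want to see the "sharper impulse response" part in numbers, here is a minimal sketch using a chain of first-order all-pass filters. All-pass filters leave the magnitude response flat and only change phase, so they are a convenient stand-in for phase distortion. The coefficient 0.5 and the four stages are arbitrary choices, not a model of any real speaker, and it says nothing about audibility.

```python
import cmath
import math

def allpass(x, a):
    """First-order all-pass: y[n] = -a*x[n] + x[n-1] + a*y[n-1] (needs |a| < 1)."""
    y = []
    prev_x = prev_y = 0.0
    for xn in x:
        yn = -a * xn + prev_x + a * prev_y
        y.append(yn)
        prev_x, prev_y = xn, yn
    return y

def magnitude_at(h, freq_fraction):
    """|H| at a frequency given as a fraction of the sample rate (0 to 0.5)."""
    w = 2.0 * math.pi * freq_fraction
    return abs(sum(hn * cmath.exp(-1j * w * n) for n, hn in enumerate(h)))

# A perfect impulse: all the energy in one sample
impulse = [1.0] + [0.0] * 255

# Smear it with four all-pass stages (phase changes, magnitude does not)
smeared = impulse
for _ in range(4):
    smeared = allpass(smeared, 0.5)

peak = max(abs(s) for s in smeared)       # lower than the original peak of 1.0
energy = sum(s * s for s in smeared)      # stays at essentially 1.0
magnitudes = [magnitude_at(smeared, f) for f in (0.01, 0.1, 0.25, 0.4)]  # all ~1.0
```

The frequency response measures flat and the total energy is unchanged, but the energy is spread over more samples, so the peak is lower. Whether that smearing is audible is the debated part.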