Bit late to the party, but as I consider it page 1 of my day job to make dialogue audible in films, I could talk on this subject for days.... but I'll try my best not to!
I like Amir's idea of a usability study, but sadly we have to press on fast with a mix if we even want to listen to it ourselves and make tweaks before it goes out the door! Schedules are stupid & film mixes are not usually a refined work of art (as much as I would love to pretend otherwise!). They are a commercial product done to a schedule, and that schedule probably isn't based on the specific requirements of the film.
That aside (and I don't mean to pass the buck entirely), as others have said or alluded to, the bigger problems with dialogue intelligibility usually come from the performance and its capture.
If I can't understand the dialogue on the raw recording in isolation, it's unlikely to get more than a few percent better in the mix. Our toolkit is mostly EQ and compression, and there's a cost (a trade-off) in using them.
Regarding dynamics of the Home Ent mix, we're in a difficult position. Some HT owners love it when we have a wide dynamic range mix, but a larger percentage of listeners prefer a mix more like what you get on YouTube, with nothing being louder than the dialogue.
So what do we do? We just have to compromise of course. Personally I generally err on the side of lower dynamic range unless there's a very good reason not to. I just don't think the dynamics of the piece move you in the same way at home as they do in the cinema. And if I have to reach for the volume control myself, I've failed.
Moving on to replay, I think one thing that's sometimes overlooked is that 2-way speakers often have an FR dip / directivity error somewhere in the critical 1 to 4kHz range. Even if the speaker is relatively good and it's +/- 3dB from, say, 150Hz to 15kHz, it could be that e.g. 2kHz is as much as 6dB down on 200Hz (one end of the tolerance band against the other), and obviously this will have a material effect on intelligibility if it's not corrected in-room.
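To put rough numbers on that +/- 3dB point, here's a quick Python sketch of the arithmetic. The function names are just mine for illustration, and it assumes the worst case where 200Hz sits at the top of the tolerance band and 2kHz at the bottom. It only does the standard dB conversions and says nothing about any particular speaker.

```python
def worst_case_spread_db(tolerance_db):
    """Largest possible level difference between two frequencies
    when a response is specified as +/- tolerance_db."""
    # One frequency at +tolerance, the other at -tolerance.
    return 2 * tolerance_db


def db_to_amplitude_ratio(db):
    """Convert a level change in dB to a sound pressure (amplitude) ratio."""
    return 10 ** (db / 20)


def db_to_power_ratio(db):
    """Convert a level change in dB to a power ratio."""
    return 10 ** (db / 10)


spread = worst_case_spread_db(3.0)           # 6.0 dB between the two extremes
amplitude = db_to_amplitude_ratio(-spread)   # roughly 0.50, i.e. about half the pressure
power = db_to_power_ratio(-spread)           # roughly 0.25, i.e. about a quarter of the power

print(f"Worst-case spread: {spread:.1f} dB")
print(f"Amplitude ratio at the dip: {amplitude:.2f}")
print(f"Power ratio at the dip: {power:.2f}")
```

So a speaker that looks perfectly respectable on paper could, in the worst case, be delivering the 2kHz region at around half the sound pressure of the 200Hz region.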
I've also found dialogue hard to hear on flatscreen built-in speakers but I'm not entirely sure why... I guess the simple answer is they're rubbish. But given they tend to favour dialog frequencies I'm often surprised at how hard it is to hear even on sparse mixes, played in a quiet room, where I know intelligibility to be fine on the recording.
To touch on what @tecnogadget pointed out: In the case where the recordings are OK, I think where things *can* go wrong in the mix is that younger mixers (and I've fallen into the trap myself before now, so I don't mean to be too critical) end up using too many plugins. It's quite easy to over-denoise, and the thing is, it makes it easier to mix if you do. When dialogue is noisy you can end up accidentally mixing the noise, not the dialogue. So if time is tight, putting de-noising on everything can bring the noise floor changes from take to take down to below an acceptable ambience cover level. But dialogue quality suffers of course. (Most denoisers impart a low-bitrate MP3-like quality.) Then you end up adding de-essing or multiband dynamics to fix the horrible stuff, and the whole thing is just way over-processed. It takes conviction to sit in a mix and do almost nothing to the dialogue, but often that's exactly what's needed. I suspect the more experienced a mixer is, the more likely they are to do what's right, rather than showing off their plugin arsenal.
Anyway, that's enough rambling from me... I said I would try not to, but I've not really succeeded.