One possible explanation for why people hear differences is something like the McGurk effect. Try it yourself:
You perceive a difference even though the sound itself does not change; it arises from combining simultaneous visual and audio impressions.
No matter what you do, visual information alters how you interpret the audio. So you hear something different depending on what you see.
We have to accept that our hearing (or more precisely our perception of sound) is not a measurement device.
Recording engineers know about this. In an audio-only recording, the lead singer's level needs to be raised compared with a video production to make the singer stand out from the rest of the orchestra to the same degree. Your eyes "amplify the sound" of the singer you are watching.
What is interesting, and what can be learnt from the video, is that you can't make that impression go away, even if you consciously know about it.