Room modes are irrelevant above the Schroeder frequency. That is, after all, what defines the Schroeder frequency: above it, the room modes are so numerous and so closely spaced in frequency that they blend together.
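To put a rough number on that transition, here is a minimal sketch using the common metric-units approximation f ≈ 2000·sqrt(RT60 / V), with RT60 in seconds and V in cubic metres. The function name and the example room (50 m³, RT60 of 0.4 s) are my own illustrative assumptions, and the result is a ballpark for a gradual transition region, not a hard boundary.

```python
import math

def schroeder_frequency(rt60_s: float, volume_m3: float) -> float:
    """Approximate Schroeder frequency in Hz.

    Uses the common approximation f = 2000 * sqrt(RT60 / V),
    with RT60 in seconds and room volume V in cubic metres.
    """
    if rt60_s <= 0 or volume_m3 <= 0:
        raise ValueError("rt60_s and volume_m3 must be positive")
    return 2000.0 * math.sqrt(rt60_s / volume_m3)

# Hypothetical small room: 5 m x 4 m x 2.5 m = 50 m^3, RT60 = 0.4 s
print(round(schroeder_frequency(0.4, 50.0), 1))  # about 178.9 Hz
```

In a typical domestic room this lands somewhere in the low hundreds of hertz, so individual modes are mostly a bass problem.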
What you described is comb filtering. Quoting Dr. Toole:
Comb filtering is often mentioned in the context of these side wall reflections. It is indeed true that one measures what looks like a comb filter. However, two ears and a brain process sounds in a manner that distinguishes between sounds based on the angle of incidence, a microphone does not. When the direct and reflected sounds arrive from different directions, the perception is normally of a small spatial effect not destructive timbral distortion. Figure 7.3 (p. 164) and the associated discussion are relevant.
An interesting fact is that when we are moving we can hear things that we don’t when we are stationary. I have witnessed an acoustical consultant playing pink noise and demonstrating that acoustical interference, which was called “phasiness,” was audible when swaying the head from side to side. However, the same phenomenon that was audible in the dynamic situation with pink noise, a highly revealing signal, becomes inaudible if a listener simply walks in, sits down, and listens to music or movies. Such reflections, and there are many of them, fall into the context of “room sound,” which human listeners are known to readily adapt to. To a very substantial extent, we are able to “listen through” rooms. It is what happens in live, unamplified music performances, and everyday conversation. In terms of speech intelligibility, most small room early reflections are desirable (pp. 200–201).
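(End of quote.) For reference, this is the comb filter a microphone would register from a single reflection: the notches fall at odd multiples of 1/(2τ), where τ is the extra travel time of the reflected sound. A minimal sketch, assuming one ideal reflection, a speed of sound of 343 m/s, and a hypothetical 1 m path-length difference; the function name is mine, not from the book.

```python
def comb_notch_frequencies(path_diff_m: float, count: int, c: float = 343.0) -> list[float]:
    """First `count` notch frequencies (Hz) for direct sound plus one reflection.

    The reflection arrives tau = path_diff_m / c seconds late, and
    cancellation occurs at f = (2k + 1) / (2 * tau) for k = 0, 1, 2, ...
    """
    if path_diff_m <= 0:
        raise ValueError("path_diff_m must be positive")
    return [(2 * k + 1) * c / (2.0 * path_diff_m) for k in range(count)]

# Hypothetical side-wall reflection travelling 1 m farther than the direct sound
print(comb_notch_frequencies(1.0, 3))  # [171.5, 514.5, 857.5]
```

That is what shows up on the measurement; per the quote, two ears and a brain do not hear it as that kind of timbral damage when the reflection arrives from a different direction.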