@j_j
1. You mentioned a codec case where the same issue caused 70% of listeners to observe pre-echo (nearly all of the people who worked in studios) and 30% to observe messed-up imaging (those who have done a lot of work in live sound). Can you please comment further on the characteristics of these groups? I was reminded of how musicians and audio professionals seem to be more sensitive to lateral reflections, but I don't assume any correlation to what you're describing.
2. In your discussion about transducers, you briefly alluded to the reflections off the back of the room, and your slide indicated that these rear wall reflections would be similar to the on-axis sound. Does that lack of spectral or timbral distortion change the perceptual significance of these rear wall reflections, compared with lateral, floor, and ceiling reflections that would usually be expected to differ with frequency from the on-axis response?
Regarding the second, I'm wondering about a comment from Griesinger:
"It is well known that a prompt early reflection can augment speech and musical instruments…Often in a room where localization and proximity is poor a seat in the very last row, up against the back wall, will sound much better…In practice I had found that your ears had to be within two and a half feet of the wall for the trick to work…The results showed that a reflection at 5ms which was 6dB less strong than the direct sound did augment the loudness and the localizability of a source without detrimental effects on timbre…" Also, Joachim Gerhard (previously of Audio Physic) had proposed listener placement <1 m from the wall behind him/her, arguing that "From experience that this reflection is not so objectionable for phantom image perception."
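As a rough check on how the distances quoted above map to reflection delays, here is a minimal sketch. It assumes a speed of sound of 343 m/s and a reflection that travels straight past the ears to the wall and back (normal incidence), so the extra path is simply twice the ear-to-wall distance. The function name and constants are illustrative only, not taken from either source.

```python
# Rough delay of a rear wall reflection relative to the direct sound.
# Assumptions (not from the quoted sources): 343 m/s speed of sound,
# normal incidence, extra path = 2 x ear-to-wall distance.
SPEED_OF_SOUND_M_S = 343.0
FEET_TO_M = 0.3048


def rear_wall_delay_ms(distance_m: float) -> float:
    """Extra arrival time, in ms, of the reflection off a wall distance_m behind the ears."""
    extra_path_m = 2.0 * distance_m
    return extra_path_m / SPEED_OF_SOUND_M_S * 1000.0


if __name__ == "__main__":
    for label, distance_m in (("2.5 ft", 2.5 * FEET_TO_M), ("1 m", 1.0)):
        print(f"{label}: about {rear_wall_delay_ms(distance_m):.1f} ms")
```

Under those assumptions, 2.5 ft gives roughly 4.4 ms and 1 m gives roughly 5.8 ms, so both placements put the rear wall reflection close to the 5 ms figure in the Griesinger quote.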
It seems to me, as a naive low-level student of these phenomena, that these similar thresholds align with the transition from fusion to localization dominance and discrimination suppression. So perhaps the arrival of a spectrally similar reflection delayed by more than about 0.63-1 ms (beyond which loudness compresses within the ERBs) and up to about 5 ms (beyond which fusion seems to fail for most listeners with clicks, though the limit would be longer for speech and especially music) could be perceptually "beneficial" through:
A. Loudness enhancement as noted, since the very early rear wall reflection would be perceptually "added" to the direct signal, perhaps with comb filtering but otherwise nearly identical. The ERB loudness compression at around 1 ms would emphasize the leading edges of the direct signal, while the rear wall reflection would arrive early enough to avoid adversely affecting localization?
B. The very early nature of the rear wall reflections could relatively preserve envelopes without scrambling them, again not adversely affecting localization?
C. Depending on listening proximity to the rear wall and system setup, could head shadowing effects potentially enhance localization by reducing interaural cross-talk? That is, could the head itself, when close enough to the rear wall, block to some degree the signal from one loudspeaker from reflecting off the rear wall and reaching the other ear? I'm assuming that would be expected to enhance localization.
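To put numbers on the comb filtering mentioned in point A, here is a minimal sketch that sums a direct signal with a single delayed, attenuated copy. It assumes an idealized reflection with a flat spectrum (no wall absorption, no head or pinna effects) and uses the 5 ms, -6 dB case from the Griesinger quote. The function names are illustrative only.

```python
import cmath
import math


def comb_gain_db(freq_hz: float, delay_s: float, level_db: float) -> float:
    """Level (dB re: direct sound alone) of direct sound plus one delayed copy.

    Assumes an idealized, spectrally flat reflection at level_db relative
    to the direct sound, arriving delay_s later.
    """
    g = 10.0 ** (level_db / 20.0)  # linear amplitude of the reflection
    h = 1.0 + g * cmath.exp(-2j * math.pi * freq_hz * delay_s)
    return 20.0 * math.log10(abs(h))


def first_notches_hz(delay_s: float, count: int = 3) -> list:
    """Lowest comb-filter dip frequencies: odd multiples of 1 / (2 * delay)."""
    return [(2 * k + 1) / (2.0 * delay_s) for k in range(count)]


if __name__ == "__main__":
    delay_s, level_db = 0.005, -6.0
    print("dips near (Hz):", first_notches_hz(delay_s))
    print("level at a peak (200 Hz):", round(comb_gain_db(200.0, delay_s, level_db), 2), "dB")
    print("level at a dip (100 Hz):", round(comb_gain_db(100.0, delay_s, level_db), 2), "dB")
```

With these idealized assumptions the dips fall near 100, 300, 500 Hz and so on, with peaks of about +3.5 dB and dips of about -6 dB. The ripple is therefore bounded rather than producing deep nulls, though whether it is audible as timbre change is a perceptual question this sketch does not address.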
Thanks in advance for your time and consideration,
Young-Ho
[edit: italics removed]