Riled up? I'm a bit confused, did I incite some sort of negative emotion? I hope not.
Quite the opposite! Conversing with you has been exceptionally positive.
There was also the Haas effect, which can be simulated in the digital domain; that's how many of these mono-to-stereo widening examples have been produced (so again, no qualms here). Though some of your numbers run contrary to what I've been exposed to. Since I'm not as deep into these things as a manufacturer might be, I'd simply take your word for it on authority alone on this front. By "the numbers" I mean that ~20 ms figure. I can hear differences with a channel-panned delay of a mono sound even at ~10 to 20 ms, though at anything less than that it just sounds "louder" on the other channel instead of "wider" overall. I was also under the impression that the discernibility of the 'echo effect' starts at ~40 ms; I wasn't aware the Haas effect actually sits at ~20 ms.
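For what it's worth, the kind of channel-panned delay I'm describing is easy to sketch in code. This is just a minimal illustration, not anything from either of our setups; the sample rate and the 15 ms delay are assumptions chosen for the example:

```python
# Minimal sketch of Haas-style widening: duplicate a mono signal into two
# channels and delay one channel by a few milliseconds.

SAMPLE_RATE = 48_000  # Hz; an assumed rate for the example

def haas_widen(mono, delay_ms, rate=SAMPLE_RATE):
    """Return (left, right), where right is a delayed copy of the mono input.

    Below roughly 1 ms the delay acts as a localization (ITD) cue; in the
    ~5-30 ms range the delayed copy fuses with the first arrival and tends
    to be heard as widening; past ~40 ms it separates into an audible echo.
    """
    delay_samples = int(rate * delay_ms / 1000.0)
    left = list(mono) + [0.0] * delay_samples   # pad so channels match length
    right = [0.0] * delay_samples + list(mono)  # delayed copy
    return left, right

# 15 ms at 48 kHz works out to 720 samples of inter-channel delay.
left, right = haas_widen([1.0, 0.5, 0.25], delay_ms=15)
```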
I may have been misremembering that 20 ms. And I agree with your observations: the Haas effect is NOT a perfect suppression of directional cues. This is demonstrated, to those familiar with Floyd Toole's work, by the widening of the Apparent Source Width caused by the first sidewall reflections (which have their downsides as well, including a reduction in the precision of sound image locations).
As far as false azimuth cues go: I'm not sure why we would be evaluating "faulty" baffle or speaker designs that lead to these false cues based on timings (unless, of course, this is actually one of the primary aspects of what you take soundstage to be, in which case the speaker with the fewest false azimuth cues would equal the speaker with the best soundstage, or should I say imaging). Instead we should be evaluating speakers with minimal reflections and diffraction. There's nothing particularly interesting about a driver if most of the directionality is driven by things you mentioned like waveguides and such. That's just trivially true, in the same way a massively distorting woofer will eventually distort enough to ruin "imaging" simply by virtue of ruining basically everything about the sound produced as a byproduct.
Agreed, I was just pointing out a case where there was variation in spatial quality which could be traced to characteristics of some drivers.
I know you opened up by telling me most of your talking points come from the perspective of speaker-room interactions, but if soundstage exists in things like headphones and IEMs, then I find it too unwieldy to start soundstage discussions, since that's yet another thing that needs to be juggled (unless, of course, room reflections are tied to your notion of soundstage definitionally). What would happen, for instance, if someone asked you: "Yeah, no problem with anything you're saying here, but now start talking to me about soundstage with respect to IEMs"? Unless, of course, you're ready to say that soundstage is simply tied to bore geometry (in the same way baffle design or waveguides partly determine directionality in speakers). And if that's all soundstage actually is, then Y-axis imaging/soundstage is going to need an explanation of how such a thing is possible in stereo (or better yet, even mono) without the use of visual or pre-contextualized cues (like silly binaural demos that psychologically pre-load you by telling you which sound is coming from the top and which is coming from the bottom). I won't ask that question, though, because I first need a definition to the best of your ability, but we've abandoned soundstage as the main talking point, so sorry for even bringing it up like this again. Though I would like to ask you something before I conclude: do you actually believe something like vertical imaging is remotely possible in a blind evaluation of a sound that was recorded in an anechoic chamber, with you listening to said sound on headphones, also in a chamber? I ask this because I want to remove all factors that generalize the concept or bog it down with too many moving parts. So: no reflections in the recording, no reflections during playback, and no reflections off the pinna (though for that last one it would need to be IEMs).
I know virtually nothing about soundstaging with headphones and/or IEMs. My opinion is that pinna transforms are worth including somewhere in the signal path for IEMs, but I don't know how to do so if the recording was not made binaurally using dummy ears (or one's own ears).
In conclusion, nothing you mentioned here is anything I'd really contest as far as its effects on soundstage go; better housing design can certainly lend itself to a more "expansive" sound image experience. (Obviously the actual numbers are for manufacturers to figure out as they think best, but I can't imagine why anyone would opt for anything less than maximal expansiveness for a given form factor, at least where speakers are concerned, unless of course that ruins some other intended metric.)
I make the assumption that the "package" of spatial cues on the recording are what we want to hear, which in turn implies that we want to present those cues effectively while suppressing the effectiveness with which the playback room's "package" of spatial cues are presented.
When I say "package of spatial cues", I'm thinking primarily in terms of four things: the first-arrival sound; the first reflections; the reverberation tails (and yes, I know the term "reverberation" is not the most precise in the context of small-room acoustics, but it does apply to the spatial cues on the recording); and the "temporal center of gravity" of the reflections. The first 0.68 milliseconds is primarily what gives us the sound image direction; the first reflections tell us about room size and liveliness or deadness; the reverberation tails inform us of distance, room size, and room liveliness or deadness; and the temporal center of gravity of the reflections informs us of distance and room size.
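As an aside, that ~0.68 ms window corresponds roughly to the largest possible interaural time difference, i.e. sound arriving from 90 degrees off-axis. A back-of-the-envelope check using Woodworth's spherical-head approximation (the head radius and speed of sound here are the usual textbook assumptions, not measurements):

```python
import math

HEAD_RADIUS_M = 0.0875   # ~8.75 cm, a commonly assumed average head radius
SPEED_OF_SOUND = 343.0   # m/s at room temperature

def itd_seconds(azimuth_rad, r=HEAD_RADIUS_M, c=SPEED_OF_SOUND):
    """Woodworth's spherical-head ITD model: r * (theta + sin(theta)) / c."""
    return r * (azimuth_rad + math.sin(azimuth_rad)) / c

# Maximum ITD occurs at 90 degrees (pi/2 radians) off-axis.
max_itd_ms = itd_seconds(math.pi / 2) * 1000.0
# This lands around 0.66 ms, in the same ballpark as the 0.68 ms figure.
```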
My premise is that the ear/brain system tends to select the most plausible "package" of spatial cues, choosing between the "small room signature" package of the playback room and the "venue spatial cues" package on the recording.
My understanding is that the first reflections are the strongest indicators of room size, so we can DISRUPT the "small room signature" cues by suppressing and/or delaying the onset of the first reflections. Doing so ALSO pushes the temporal center of gravity of the reflections back in time. The net result is that the ear/brain system is no longer presented with a convincing package of "small room signature" cues, as now they are somewhat scrambled and self-contradictory.
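The "pushing back" of the temporal center of gravity can be sketched numerically. The reflection arrival times and energies below are made-up illustrative values, not measurements of any real room:

```python
# Temporal center of gravity of the reflections: the energy-weighted mean
# arrival time. Suppressing the first reflections shifts it later in time.

def temporal_center_of_gravity(reflections):
    """reflections: list of (arrival_ms, energy). Returns the weighted mean
    arrival time in milliseconds."""
    total_energy = sum(e for _, e in reflections)
    return sum(t * e for t, e in reflections) / total_energy

# Hypothetical small-room reflection pattern: (arrival in ms, relative energy).
untreated = [(3, 1.0), (5, 0.8), (8, 0.6), (20, 0.3)]

# Same room with the first two reflections heavily suppressed, as described:
treated = [(3, 0.1), (5, 0.08), (8, 0.6), (20, 0.3)]

# The centroid of the treated pattern lands several milliseconds later,
# which is the "pushed back in time" effect.
```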
My understanding is that the reverberation tails on the recording are effective indicators of venue size and venue acoustics and sound source distance, and the later-arriving in-room reflections are the most effective way of delivering those reverberation tails from many directions. The better we preserve those reverberation tails and present them to the listener from all around, the stronger the presentation of the "venue spatial cues" package. (This is what a good multi-channel recording does, as the rear channels deliver the reverberation tails from multiple desirable directions.) In two-channel the in-room reflections function as the "carriers" of those reverberation tails, so in my opinion we want to preserve the later-arriving in-room reflections rather than eliminate them (as some advocate; I'm not saying that YOU do).
If this combination of room acoustic characteristics sounds a bit like "live end/dead end", well, that's because this concept of what constitutes desirable loudspeaker/room interaction is not a new idea.
In this context, I believe there is room for improvement over "conventional" loudspeaker radiation patterns. I believe there are better "starting points" for how the energy is radiated out into the room, given that the desired "end goal" is minimal early reflections + plenty of spectrally-correct late reflections. And apparently my opinion is in the small minority, given the degree to which "conventional" loudspeakers dominate the marketplace!