I thought my browser failed to render something.
Anyway, to recap: conventionally we have sensory memory (direct, high resolution, fast-decaying), of which echoic memory is the auditory component (2-3 seconds apparently), then short-term memory (10-15 seconds), and long-term memory (where short-term memory is consolidated and stored, unless discarded). Sensory memory is sometimes contested and is presumably the part you referred to as not so well understood (or agreed upon), so there's that. Then we have synaptic consolidation (occurring within a few hours), larger-scale systems consolidation (occurring over weeks to years), and reconsolidation (reactivation of consolidated memories). Standard-ish stuff.
I think the echoic audio testing dogma misses something. Fast switching with variable duration makes perfect sense. But I'll speculate that consolidated audio memory is possible at a level of detail beyond broad-brush. My example: I don't change audio gear much or often, but I did replace an amp I'd had for ~15 years. The new amp was class AB, measured well, yada yada, like the old one, and was initially quite satisfactory (yeah, my experience happened to be with an amp; if that's bothersome, follow along and pretend we're talking about a speaker to get to the point, I imagine the same would apply). Same setup in every other detail. After a short time (days, a week or so) I was bothered when playing familiar music, which didn't sound quite right. New music, no problem. Familiar music, wrong. I expect it was an amp-speaker-room synergy issue (assuming it existed, but again, bear with me). This went on for several months. Yes, sighted bias, all the things. But still, I couldn't shake it. I assumed my ear/brain would recalibrate, but it never did (at least not over the course of 3-6 months). I suspect I'd consolidated the memory of the previous, long-term amp's sound in sufficient detail to notice the difference. Doesn't prove a thing of course, but it was pretty weird.