
Harshness and Image Instability in Aging ESL Hybrids: A Phase-Timing Hypothesis by ChatGPT

tengiz

The text below is the result of an extended "conversation" with ChatGPT, based on my experience—along with significant help from the forum—diagnosing harshness and imaging issues in an aging pair of Martin Logan Ethos speakers. I’m particularly curious about the strength of the psychoacoustic explanations; everything else seems solid.

In an aging pair of Martin Logan hybrid electrostatic speakers—with passive crossovers for the panel and active analog low-pass filters for the powered woofers—a listener reported increasing vocal harshness, grain, or instability. This is most audible, though not exclusively, during dynamic vocal passages, where the fundamental (around 300–400 Hz) becomes more prominent than the harmonics. Phantom imaging may also become unstable or shift slightly in these moments. The effect disappears completely with a different model from the same manufacturer - an older pair of Martin Logan Vista speakers - or with brand-new KEF R5 Meta speakers in the same listening room, with the same amplifier.

Martin Logan hybrid speakers commonly cross over squarely in the vocal fundamental range. A likely cause of the perceived harshness is subtle phase or group delay misalignment between the woofer and panel, introduced over time by component drift—particularly in capacitors, op-amps, or power supply elements within the active crossover. Even small shifts in timing or polarity near crossover can disrupt driver integration enough to affect perceived coherence.
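To make the drift argument concrete, here is a minimal sketch of the mechanism. It is not Martin Logan's actual crossover topology: the idealized 2nd-order low-pass, the 380 Hz corner, and the 15% capacitance drift are all illustrative assumptions. Since a filter's corner frequency scales roughly as 1/C, a capacitor that has drifted high lowers the corner and adds extra phase lag at the original crossover point:

```python
import numpy as np

def lowpass_phase_deg(f, fc):
    """Phase (degrees) of an ideal 2nd-order Butterworth low-pass at frequency f.

    H(s) = 1 / (s^2 + sqrt(2)*s + 1), with s = j*f/fc (normalized).
    """
    s = 1j * f / fc
    h = 1.0 / (s**2 + np.sqrt(2) * s + 1.0)
    return np.degrees(np.angle(h))

# Hypothetical 380 Hz woofer-panel crossover; a capacitor that has drifted
# high by 15% lowers the filter corner by the same factor (fc ~ 1/C).
f = 380.0                      # evaluate at the nominal crossover frequency
fc_nominal = 380.0
fc_drifted = 380.0 / 1.15      # 15% capacitance increase (assumed)

shift = lowpass_phase_deg(f, fc_drifted) - lowpass_phase_deg(f, fc_nominal)
print(f"Extra low-pass phase lag at {f:.0f} Hz: {shift:.1f} degrees")
```

With these assumed numbers the drift adds roughly ten degrees of extra lag right at the crossover, on top of the nominal 90 degrees - small in a magnitude plot, but exactly the kind of woofer–panel timing offset the hypothesis describes.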


Standard REW sine sweep measurements may not reveal any obvious blending issues or destructive interference. The magnitude and phase traces may appear nominal, with no major dips or anomalies between drivers. However, the absence of visible flaws in these plots does not rule out perceptual degradation.

Sweep-based measurements operate at moderate, constant levels and may not expose level-dependent group delay or phase shift. Additionally, they reflect sound at a single point in space, whereas the auditory system processes information across time, both ears, and multiple spatial reflections. A mismatch of just a few degrees in phase—or a few milliseconds in group delay—especially in the 100–500 Hz range, can cause smearing and textural artifacts that resemble distortion.
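Group delay itself is straightforward to estimate from a measurement: it is the negative derivative of unwrapped phase with respect to angular frequency. As a sketch, assuming a phase-vs-frequency trace like one exported from REW (the synthesized 2nd-order low-pass here is a stand-in, not real measurement data):

```python
import numpy as np

# freqs_hz and phase_rad stand in for an exported phase trace; here we
# synthesize them from a hypothetical 380 Hz 2nd-order Butterworth low-pass.
freqs_hz = np.linspace(100.0, 500.0, 401)
s = 1j * freqs_hz / 380.0
phase_rad = np.unwrap(np.angle(1.0 / (s**2 + np.sqrt(2) * s + 1.0)))

# Group delay: tau_g(f) = -d(phi)/d(omega) = -(1 / (2*pi)) * d(phi)/df
tau_g_ms = -np.gradient(phase_rad, freqs_hz) / (2.0 * np.pi) * 1000.0

print(f"Group delay at 300 Hz: {tau_g_ms[freqs_hz.searchsorted(300.0)]:.2f} ms")
```

For this toy filter the delay in the 100–500 Hz band is well under a millisecond, which is a useful sanity check on the claim above: audible group-delay errors near a low-frequency crossover are plausibly on the order of milliseconds, not "a few degrees of phase" alone.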

These effects are particularly audible in systems with electrostatic panels due to their exceptional transient clarity. The ear’s sensitivity to midrange timing irregularities, combined with asymmetric hearing or tinnitus, can further exaggerate spatial or tonal inconsistencies—even if the measurements look balanced.

In summary, age-related drift in the active crossover path may introduce subtle phase or timing errors near the woofer–panel transition. While these errors may not be easily captured in traditional sweep measurements, they can cause vocal roughness and spatial instability that are perceptually significant. The issue lies not in nonlinear distortion per se, but in the breakdown of temporal coherence across the crossover—a domain where the ear is far more sensitive than standard measurements suggest.
 
I think we have seen enough stories about the misadventures of lawyers submitting court documents with AI-generated citations that the citations would have to be double-checked by a human in any case.
 
Anecdotally (and this subject is intimately intertwined with psychoacoustics), I've found that with loudspeakers having a very high direct-to-reflected ratio on transients - like dipole panel-type loudspeakers and full-range horn-loaded loudspeakers with polar control down to the room's calculated Schroeder frequency - in a room treated for early reflections around the loudspeakers and the listening position(s), improvements in phase/group delay response that eliminate the all-pass phase growth through the crossover networks become quite audible.

This includes elimination of harshness and increased perception of deep bass response.

More on these subjects at Linkwitz's site: Some Attributes of Linear-Phase Loudspeakers

Also anecdotally, problems with passive crossover networks - thermal heating and aging capacitors (ESR growth) - also lead to soundstage imaging wander. This is particularly pronounced with loudspeakers/room acoustics having a higher direct-to-reflected ratio, and with whizzer-cone drivers that are sensitive to heating and humidity changes while reproducing dynamic music transients.

Chris
 
In summary, age-related drift in the active crossover path may introduce subtle phase or timing errors near the woofer–panel transition. While these errors may not be easily captured in traditional sweep measurements, they can cause vocal roughness and spatial instability that are perceptually significant. The issue lies not in nonlinear distortion per se, but in the breakdown of temporal coherence across the crossover—a domain where the ear is far more sensitive than standard measurements suggest.
Pseudorandom autogenerated conjecture pulled from various online collections of bull.

Electrostats have no more natural transient clarity or other characteristics than any other speaker. It comes down to directivity and, in their case, strong resonances.

Problems in the crossover are diagnosed with measurements. Temporal problems are frequency-domain problems - linear, rather than nonlinear.
 
Ok, case closed then, thank you!
 
If they measure differently (IF), it's due to membrane aging, sagging, or resistivity changes (charge migration).
 
With the Ethos speakers now decommissioned, this is purely academic for me. Both the Vista and KEF perform flawlessly - no harshness, no imaging issues, no need for tuning. Whatever went wrong with the Ethos could only be masked through minor EQ or placement tweaks - and I wasn’t interested in chasing another rabbit hole trying to fix them. It felt like voodoo - unpredictable and unexplainable.

AI models often tout the superior “clarity and transparency” of electrostatic panels, but to my ears, there’s no meaningful difference between the KEF and the Vista in that respect. Maybe I’m just getting old - but it is what it is.
 
If they measure differently (IF), it's due to membrane aging, sagging, or resistivity changes (charge migration).
One of the Ethos speakers in my system showed a clear pattern of degradation. The first symptom was a slow start - on power-up, output was slightly reduced, just enough to shift the phantom image off-center. Within a few seconds, it would slide back to center and stabilize.

Gradually, recovery took longer - up to a minute - and eventually it would never return to center. So, I had to adjust speaker levels in the AVR to compensate, but by then the woofer–panel integration was off due to reduced panel output.

After consulting MartinLogan and ruling out other causes, it became clear the issue was with the panel, not the electronics. I replaced both panels, and the system worked fine for a few years.

While swapping the panels, I noticed burnt components in a crude voltage converter designed to drop voltage from the ICEPower module to the ~18V needed for the high-voltage panel bias board - indicating excess current draw at some point. I think it would be consistent with leakage in the panel, which would also explain the loss of output. Prior operation outside spec may have contributed to further degradation over time even with the new panels.
 
Might be worth looking up how AIs work on the back end.
Even with one of the biggest private corporate models trained on carefully picked data, my experience hasn’t been much different - they still hallucinate or make stuff up like it’s a normal part of the job. To be fair, they usually own it when you ask, 'Are you bullshitting me?' - no hesitation :cool: .
 
It's disappointing to see some audio gear malfunction over the span of only a decade or so when my 1958 Tannoy Reds continue to sound great in my vintage bedroom system. I certainly don't expect all gear to last nearly a lifetime, but gear should not be designed so poorly that it fails in just over a decade: electronics with inadequate cooling on circuit-board elements (like the NAD C-356BEE), or rotting woofer foam (like in some Infinity speakers).
 
The text below is the result of an extended "conversation" with ChatGPT, based on my experience—along with significant help from the forum—diagnosing harshness and imaging issues in an aging pair of Martin Logan Ethos speakers.
What is the attraction of using an LLM/GPT for this subject? (It seems odd.) Was it to generate responses?

In my response above, I found that I had to turn on my most diplomatic mode to answer your question--since the GPT response on the psychoacoustics part was clearly so poorly conceived (as you yourself pointed out in your first post).

I can see why the forum founder is not so keen on using GPTs.

Perhaps in 20-50 years, the quality of GPT-like responses on these types of subjects will have improved via trial-and-error learning by the GPT providers themselves. Nowadays, I find it's generally a waste of time, since most of the responses that are clearly in error are pulled from thin air (no real references that can be traced).

Additionally, there was no link to your prior experience with your failing loudspeakers--something that if I'd seen what you described in post #11 (above), I would not have responded here.

Chris
 
What is the attraction of using an LLM/GPT for this subject? (It seems odd.) Was it to generate responses?
LLMs like ChatGPT are great for advanced search. Last night I was looking for a dry sounding recording of a specific classical piece - minimal reverb, studio-like, etc. Google wasn’t much help. It’s keyword-based and doesn’t grasp what dry means in audio unless someone used that exact term. I’d have to manually try endless keyword combinations.

ChatGPT, on the other hand, understood the intent and suggested relevant recordings - even when the words I used weren’t explicitly mentioned. It’s about meaning, not just word matching.

In audio, it works well, though results can get muddied by vague audiophile jargon like microdynamics, musicality, or air. Still, with careful phrasing, you can usually steer around it.
 
LLMs like ChatGPT are great for advanced search. Last night I was looking for a dry sounding recording of a specific classical piece - minimal reverb, studio-like, etc. Google wasn’t much help. It’s keyword-based and doesn’t grasp what dry means in audio unless someone used that exact term. I’d have to manually try endless keyword combinations.

ChatGPT, on the other hand, understood the intent and suggested relevant recordings - even when the words I used weren’t explicitly mentioned. It’s about meaning, not just word matching.

In audio, it works well, though results can get muddied by vague audiophile jargon like microdynamics, musicality, or air. Still, with careful phrasing, you can usually steer around it.

I disagree that LLMs work well for the purposes of a forum like this one. Like most of us, I end up using LLMs all the time, because now whenever I do a Google search, the first thing I see at the top of the results page is a Google AI summary of the responses to my query. And these summaries are, as you say, generally quite good and helpful, at least for a start.

However, I have found - with Google AI and also with ChatGPT and other LLMs - that they do best when the query is not hyper-specialized, when the topic is very well-studied, and when there is either an answer or a strong empirical consensus, or there is disagreement but the terms of the disagreement are clear, well-defined, and well-represented by a large number of people on all sides of the debate.

So when it comes to the kind of question you posed - a specific kind of speaker, with a specific subjective issue possibly based on age and/or design, and a field dominated by untested, pseudoscientific claims - you're going to get a response that's faithful to the sum total of human discourse that's out there. But the problem is that the sum total of human discourse on this question is of poor quality.

To be clear, that's not a reason to avoid asking ChatGPT about this kind of thing - go for it if you want! But when you or anyone else copy-pastes such AI responses here and starts a thread with it, it sets the agenda for discussion based on what is essentially nonsense.

To put it another way, consider these two options:

1. Post a LLM response and ask people if it's true or what they think about it; or

2. Post your initial question or concern and ask people what their thoughts are on it.

I would argue that in a forum like this that's filled with folks who have expert knowledge and an above-average understanding of which audio claims are supported by evidence, which are not, and which remain unsettled, option 2 is far more productive for the OP and everyone who responds and reads the thread. It essentially enables everyone to start on solid ground - whereas option 1 makes everyone start down in a hole of bad information. Before they can start helping to build good information, they have to spend many comments digging us all collectively out of the hole of all the bad or incorrect or unproven info in the LLM response.

So that's my $0.02 in agreement with @Curvature and others who are urging folks not to cut and paste AI responses to make threads.
 
Interestingly, however, AI is engaging in cut-and-paste from ASR and probably a multitude of other sources. As I stated elsewhere here, an AI lifted a whole sentence from one of my comments in a response to an online inquiry of mine just two days after I posted that comment here.
 
I would argue that in a forum like this that's filled with folks who have expert knowledge and an above-average understanding of which audio claims are supported by evidence, which are not, and which remain unsettled, option 2 is far more productive for the OP and everyone who responds and reads the thread. It essentially enables everyone to start on solid ground - whereas option 1 makes everyone start down in a hole of bad information. Before they can start helping to build good information, they have to spend many comments digging us all collectively out of the hole of all the bad or incorrect or unproven info in the LLM response.
This sounds reasonable to me.
 
ChatGPT, on the other hand, understood the intent and suggested relevant recordings - even when the words I used weren’t explicitly mentioned. It’s about meaning, not just word matching.
All fine and good. Then why are you posting this on this forum? I don't think anyone would be terribly surprised to learn that GPTs can do what they do (to be honest).

In other words, were you simply trying to generate human responses to bolster or replace the weaknesses in ChatGPT's psychoacoustics response (as I guessed you were), or is there some other reason?

It's very disconcerting to spend time answering what appeared to be a vague question, then get ignored while you post information that was needed to answer your question. Do you see what I'm saying?

Chris
 
In other words, were you simply trying to generate human responses to bolster or replace the weaknesses in ChatGPT's psychoacoustics response (as I guessed you were), or is there some other reason?

It's very disconcerting to spend time answering what appeared to be a vague question, then get ignored while you post information that was needed to answer your question. Do you see what I'm saying?
I see what you mean now. I didn’t mean to cause any confusion - apologies if I did. The point of this topic was just to see whether, at least in this particular case, the robot’s summary of responses was even in principle in the ballpark. I wasn’t trying to actually diagnose or debug anything at that point. I tried to make that clear in the first paragraph, but I guess it didn’t quite come through.
 