
What role should listening play in speaker design

I read multiple comments about how we audio designers tell stories just to please the crowd or simply for commercial gain. I find this depressing. If I wanted to make money then I would not be designing audio gear. My motivation is to create stuff that I would love to have myself but that does not yet exist. I can then only hope that others may like it too.

But is it sometimes not also a reality one has to deal with? Audio, in this segment of the market at least, is a place where for much of the consumer base objectivity often comes secondary (if it comes at all).

I mean, I would totally understand if certain aspects are romanticized when designers talk about products, if that means staying alive in a competitive (luxury) business.
Ideally you appeal to all parties of course, both the subjectivists and the objectivists alike - but that I think, requires a good story :)
 
My DIY MTM monitors. Later I added a 0.5 ohm resistor to tame the tweeter by 1.5 dB or so :)
Outdoor measurements at 1 meter distance. They sound very good, with paper Satori woofers and the TW29-B tweeter.
They are sealed and for use with subwoofers.
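As a rough cross-check of the resistor arithmetic above, here is a minimal sketch of how much a series resistor drops the level into a purely resistive load. The load impedances in the loop are hypothetical placeholders, not measured values for any particular tweeter, and the sketch assumes the resistor sits directly in series with the driver and that the amplifier's output impedance is negligible. A real tweeter's impedance varies with frequency, and a resistor placed ahead of the crossover also shifts the filter, so treat the result as an order-of-magnitude guide only.

```python
import math


def series_resistor_attenuation_db(r_series_ohm, z_load_ohm):
    """Level drop (dB) at the load when a resistor is put in series with it.

    Assumes a purely resistive load and an ideal voltage source.
    """
    return 20.0 * math.log10((z_load_ohm + r_series_ohm) / z_load_ohm)


# Hypothetical load impedances, for illustration only.
for z_load in (3.0, 4.0, 6.0, 8.0):
    drop = series_resistor_attenuation_db(0.5, z_load)
    print(f"{z_load:.0f} ohm load: {drop:.2f} dB")
```

The lower the load impedance, the more a given series resistor attenuates, which is why the same 0.5 ohm can land anywhere from roughly half a dB to well over 1 dB depending on the driver.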


[Attachments: 20251222_154302.jpg, 20251218_204455.jpg, 20251220_132105.jpg]
 
What is the difference between direct first arrival and axial response?
Direct first arrival generally has a shallow slope down with increasing frequency, largely due to attenuation by air at higher frequencies. It's also influenced by the directionality of drivers, primarily the tweeter but also small midranges (my preference). If you don't listen perfectly on-axis with drivers pointed straight at your ears there will be influence due to that. I have had systems that sounded a bit harsh directly on-axis, so a slight toe out that placed my ears a bit off-axis made an improvement. Others were the exact opposite. There's a lot of variability between systems in this regard.

This, I suppose, is also a factor relating to the OP's question about what role listening should play in design. I always listen and experiment with minor changes in my systems to decide if it's right. That said, my solution is to make changes using the design software, listen, and repeat as much as necessary. Most often the issue turns out to have been the choice of crossover frequency/slopes for some pair of drivers in a system. As an aside, that is one reason I have used SoundEasy (SE) and the Ultimate Equalizer (UE) for years. SE provides an option to audition even passive designs before constructing the final crossover, thanks to its digital filter. I spend a lot of time correcting and/or fine-tuning the crossover Fc/slopes to find what sounds best since, in my opinion, there is no "perfect" crossover for any set of speakers. The UE provides for almost immediate changes that can be auditioned with a few keystrokes and a mouse click to start.

Edit: There is also the impact in the off-axis due to diffraction and driver off-axis irregularity. I always apply felt for tweeters and small midranges such that the off-axis response has a smooth, monotonic drop with frequency. Without that, if you do listen off-axis there can be rather severe irregularities, especially if an attempt was made to make the direct on-axis response as flat as possible. This often results in the oft-measured off-axis flare between tweeter and the next driver.
 
Direct first arrival generally has a shallow slope down with increasing frequency, largely due to attenuation by air at higher frequencies. It's also influenced by the directionality of drivers, primarily the tweeter but also small midranges (my preference). If you don't listen perfectly on-axis with drivers pointed straight at your ears there will be influence due to that. I have had systems that sounded a bit harsh directly on-axis, so a slight toe out that placed my ears a bit off-axis made an improvement. Others were the exact opposite. There's a lot of variability between systems in this regard.
It sounds to me like you're equating "direct first arrival" and "listening window". Is that right?
 
It sounds to me like you're equating "direct first arrival" and "listening window". Is that right?
No. Listening window is a factor in power response and affects systems where more than one person may listen at the same time such that more than one person cannot be in the "sweet spot". I listen alone, so I can position my systems optimally for my listening position. Direct first arrival is exactly that, the initial response at the listener's seated position, wherever that is.

That also brought up another possible factor in the design of a system. Much depends on placement within the room of course. If there are limits to that, say that the system must be placed near the front or side walls, the design should take that into account. First arrival could include reflections that our ears interpret as direct rather than as time-delayed reflections that can affect timbre and other aspects. Just one more possible factor related to what role listening plays in design.
 
No. Listening window is a factor in power response and affects systems where more than one person may listen at the same time such that more than one person cannot be in the "sweet spot". I listen alone, so I can position my systems optimally for my listening position. Direct first arrival is exactly that, the initial response at the listener's seated position, wherever that is.

That also brought up another possible factor in the design of a system. Much depends on placement within the room of course. If there are limits to that, say that the system must be placed near the front or side walls, the design should take that into account. First arrival could include reflections that our ears interpret as direct rather than as time-delayed reflections that can affect timbre and other aspects. Just one more possible factor related to what role listening plays in design.
No, first arrival never includes reflections. That's why it's called first arrival.
 
"Initial comparative listening tests, using varied programme material*, suggested that the reproduced sound was less coloured by the LS 5/8 than by any other loudspeaker so far tested at Research Department. Nevertheless, audible colorations are present, and the most noticeable of these is at about 600 Hz. A low-Q notch filter at this frequency was therefore inserted and it was found that a 2 dB dip produced a marked subjective improvement without noticeably affecting the tonal balance. *This included serious music (solo instrumental, orchestral and choral) as well as light music and speech."

The above quote was taken from "Design of the high-level studio monitoring loudspeaker type LS 5/8", C.D.Mathers, M.Sc., M.I.E.E., BBC Research Department Report 1979/22 (November 1979). Hence, it seems that listening does/should play a role in the design of loudspeakers. 'Twas thus in the 1970s, and not much has changed since that time. The need for that 2 dB dip mentioned above seems to be a self-evident adjustment based on the simple measured on-axis frequency response curve, shown below. This still shows a bit of relative boost in the 500 Hz to 1000 Hz frequency region, which almost seems to be compensating for the broad dip in the 200 Hz to 500 Hz region. Note the mislabelled 500 Hz frequency in this graph.

[Attachment 522214: measured on-axis frequency response of the LS 5/8]
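For anyone wanting to audition something like the 2 dB dip described in the BBC report, here is a minimal digital sketch, assuming a standard peaking biquad (the widely used "Audio EQ Cookbook" form) rather than the analogue network the BBC actually used. The report only says "low-Q", so the Q of 1.0 and the 48 kHz sample rate below are assumptions picked for illustration.

```python
import cmath
import math


def peaking_biquad(f0_hz, gain_db, q, fs_hz):
    """Peaking-EQ biquad coefficients (b, a); negative gain gives a dip."""
    a_lin = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * f0_hz / fs_hz
    alpha = math.sin(w0) / (2.0 * q)
    b = (1.0 + alpha * a_lin, -2.0 * math.cos(w0), 1.0 - alpha * a_lin)
    a = (1.0 + alpha / a_lin, -2.0 * math.cos(w0), 1.0 - alpha / a_lin)
    return b, a


def magnitude_db(b, a, f_hz, fs_hz):
    """Magnitude response of the biquad at one frequency, in dB."""
    z1 = cmath.exp(-1j * 2.0 * math.pi * f_hz / fs_hz)  # z^-1 on the unit circle
    num = b[0] + b[1] * z1 + b[2] * z1 * z1
    den = a[0] + a[1] * z1 + a[2] * z1 * z1
    return 20.0 * math.log10(abs(num / den))


FS_HZ = 48000.0  # assumed sample rate
b, a = peaking_biquad(600.0, -2.0, 1.0, FS_HZ)  # Q = 1.0 is an assumed "low-Q" value
for f in (100.0, 300.0, 600.0, 1200.0, 5000.0):
    print(f"{f:6.0f} Hz: {magnitude_db(b, a, f, FS_HZ):+.2f} dB")
```

Changing the Q shows how quickly such a dip spreads into neighbouring octaves, which is part of why its audible effect depends so much on the rest of the response.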
I was working at the BBC in those days - the LS5/8 were everywhere. Operationally, the sound engineers would adjust the EQ to make whatever program material was being used sound “correct” out of this loudspeaker. What I mean by correct is how the music or speech should sound - I recall they became very skilled at this. You have to remember that they could go next door into the studio and hear the real thing for reference. They knew how the newsreader's voice should sound because they talked to each other and so “knew their voice”, but it was more general than that - imagine you spend a career as a sound engineer, and every day you get to hear the real thing and the LS version of it - immediately you can detect an unnatural-sounding loudspeaker and, through skill, you know what EQ is needed to fix it. One sound engineer that I worked with could walk into a space, click his fingers and tell you if the room was too reverberant and whether close mics would be needed, and he was correct most of the time.

Also, a trend amongst microphone manufacturers (particularly Shure) was to create a “presence” hump around 1-3kHz to improve speech intelligibility, so that often required EQ. I don’t recall anything in particular about the so-called BBC dip, but remember the LS5/8 were used in studios, not in your listening room, so vastly different acoustics. Ditto the source acoustics where the program originated. So I would caution against using this as a kind of acoustic profile “standard” for your application. I think it's pretty much irrelevant and specific to the LS5/8 - a good speaker in its time, but crap compared with good modern designs.

On a related topic, there has been a raging debate on another thread, when Andrew Jones (ex-KEF designer) said you have to use both measurements (mostly for design validation, IMO) and then listen to the final product and tweak it if needed. As a designer, he will have listened to hundreds of speakers with his colleagues, musicians, etc., identifying audible resonances, distortions and so on, and he will have developed skills similar to those of the BBC sound engineers, of mostly knowing if something is “wrong”. This is the type of person you want to be involved in designing a speaker that has to “sell”.

Anyway, I digressed, but my suggestion is not to pay too much attention to the BBC dip and definitely don’t use it as a general EQ criterion.
 
On a related topic, there has been a raging debate on another thread, when Andrew Jones (ex-KEF designer) said you have to use both measurements (mostly for design validation, IMO) and then listen to the final product and tweak it if needed.

Isn't that this thread? Or is there another too?
 
No, first arrival never includes reflections. That's why it's called first arrival.
That is true unless the reflections are so close in time that our ears treat them as first arrival. That's no different than the impact of diffraction that alters the direct radiator wave, or some very close object (such as a very close wall) that reflects it within a time limit that makes our ears hear it as the first arrival. Put a system up against a wall and the reflection will arrive at the same time as the diffraction that would have occurred had the system been away from the wall. As I see it, it's first arrival at the listener's ears and how our hearing treats it.

How accurate this is I can't say, but some online research into the time frame returned this:
A reflection from a speaker is generally considered part of the first arrival (or "direct sound") when it arrives within roughly 20–40 milliseconds (ms) of the initial direct sound. This phenomenon is part of the Precedence Effect (or Haas Effect), where the brain fuses early reflections with the direct sound into a single perceptual event, increasing the perceived loudness and clarity rather than being heard as a distinct echo.
It sounds to me like you're taking "direct sound" to be first arrival, but to me it's direct sound plus early reflections that our brain fuses into a single event, as is certainly the case with diffraction. I see no reason to think of it differently for some early reflection within a short time window.
 
... some people can have a relatively good ear with regards to hearing if something sounds off, or if something sounds right when reproducing music through a loudspeaker. ...that a person who designs speakers for a living actually can trust his ears to quite some extent is not very far fetched.
There may be an inherent difficulty with that approach, which seems to be touched on by Harwood's experience, noting in particular the very last sentence:

"Now the alarming fact is that A/B testing may under certain circumstances give rise to completely wrong results when comparing the sound quality of two loudspeakers. If pink noise is used as a convenient source, and a deep narrow crevasse produced in it, it has been shown that the effect will be almost inaudible. If this is listened to for, say, half a minute as if programme were being used to judge a loudspeaker, and then the crevasse is switched out so that a uniform spectrum is produced, the ear will hear a strong colouration at the frequency of the crevasse. It seems that there are two mechanisms at work; the conscious one ignores the crevasse but the subconscious one detects it clearly. When the uniform condition is suddenly heard the subconscious mechanism comes forward and points out that there is now a considerable amount more sound energy at the frequency of the crevasse, and as that condition had been accepted as satisfactory the only conclusion to be reached is that there is now an excess in this region and that the sound must now be highly coloured. Transferring this to loudspeakers it is implied that if one with a crevasse is first listened to then it will probably appear that one with a uniform response is coloured."

From: "Some factors in loudspeaker quality", H. D. Harwood, Wireless World, May 1976, pages 45–48, 51–54.

It would seem that such vagaries of human perception can be largely removed by accurate measurements of loudspeaker on-axis and off-axis frequency responses (including distortion), supported by an understanding of the typical variations inherent in domestic listening rooms inhabited by users of those same loudspeakers.

Hence, the above seems to indirectly point to the understanding that a "good" loudspeaker system needs to be designed for a flat on-axis response, irrespective of other approaches proposed in the same article. This seems to be broadly supported by another Harwood observation:

"The conclusion therefore was that it is essentially the direct sound which determines the sound quality and not the spherical response. The measurement of frequency response at various angles in a free-field room is therefore a much better indication of performance than the spherical response even when listening in the reverberant field, and this has been confirmed by careful listening tests many times since. ... The sound quality of a loudspeaker is determined much more by the direct response at any given angle than by the spherical integrated response, and at any rate for stereophonic purposes there may well be a degree of omnidirectionality beyond which it is inadvisable to go."

Supporting that approach, Toole noted that:

"It is well-documented that the timbre of voices and musical instruments is mainly dependent upon the amplitude-versus-frequency spectrum of the sounds. Other linear and non-linear effects can be influential but, for simplicity, this discussion will concentrate on the dominant factor. The accurate reproduction of timbre, therefore, requires a wideband, flat and smooth amplitude response. Wide bandwidth is necessary to reproduce all of the sound, flat response is required for neutral spatial balance, and a smooth response is a good indicator of the absence of audible resonances."

"The deviations from a linear, flat amplitude response that can be detected are very small indeed, a few tenths of a dB for low-Q resonances and other wide-bandwidth deviations. The problem in assessing loudspeakers has been that their sound is very much dependent on the listening conditions. Different organisations and different individuals have different views about what constitutes ideal listening conditions."

"But there still remains a problem, since there seems to be about as much inconsistency in reviewing as there is in listeners' homes. As a consequence, there have developed cliques of manufacturers and reviewers who seem to be mutually appreciative of each others' efforts, and others that go their separate ways."


From: "Loudspeakers and Rooms For Stereophonic Sound Reproduction - Part IV", Floyd E. Toole, Australian Hi-Fi, pages 30–34 (year unknown).


So, the question seems to be, which hi-fi loudspeaker manufacturer evaluates their loudspeakers in four or five different listening rooms that are representative of the totality of their (discerning) customer base? That seems to be the final frontier, but also an inherently expensive and time consuming one to tackle with any alacrity.
 
Direct first arrival generally has a shallow slope down with increasing frequency largely due to attenuation by air at higher frequencies.
Over a 3–5 metre listening distance, what is the expected attenuation of a 5kHz signal in a typical listening room environment? Couldn't that all be taken into account during the initial design process?
I have had systems that sounded a bit harsh directly on-axis, so a slight toe out that placed my ears a bit off-axis made an improvement.
Can you recall what their on-axis frequency response was like? Would not some judicious application of EQ have been a more direct and correct way of addressing such an issue? Wouldn't toe out simply in and of itself create issues of its own, which may or may not immediately surface on the currently-being-heard programme material?
There's a lot of variability between systems in this regard.
As there undoubtedly is in the acoustical nature of listening rooms of owners of loudspeakers, without even touching on relative placement of loudspeakers in said listening rooms.
 
Over a 3–5 metre listening distance, what is the expected attenuation of a 5kHz signal in a typical listening room environment? Couldn't that all be taken into account during the initial design process?
I don't know the specific answer to the first question without some research and yes to the question for personal designs. My issues were partly in the past with commercial systems.
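For a rough number on the air-attenuation question, here is a minimal sketch based on the pure-tone atmospheric absorption formula of ISO 9613-1, as I understand it. The default conditions (20 °C, 50 % relative humidity, standard pressure) are assumptions, and the constants should be checked against the standard before relying on them. It covers absorption by air along the direct path only, not room boundaries or driver directivity.

```python
import math


def air_absorption_db_per_m(freq_hz, temp_c=20.0, rel_humidity_pct=50.0,
                            pressure_kpa=101.325):
    """Pure-tone atmospheric absorption in dB per metre (ISO 9613-1 form)."""
    t_kelvin = temp_c + 273.15
    t_rel = t_kelvin / 293.15            # relative to the 20 degC reference
    p_rel = pressure_kpa / 101.325       # relative to one standard atmosphere

    # Molar concentration of water vapour (percent) from relative humidity.
    p_sat_rel = 10.0 ** (-6.8346 * (273.16 / t_kelvin) ** 1.261 + 4.6151)
    h = rel_humidity_pct * p_sat_rel / p_rel

    # Relaxation frequencies of oxygen and nitrogen (Hz).
    fr_o = p_rel * (24.0 + 4.04e4 * h * (0.02 + h) / (0.391 + h))
    fr_n = p_rel * t_rel ** -0.5 * (
        9.0 + 280.0 * h * math.exp(-4.170 * (t_rel ** (-1.0 / 3.0) - 1.0)))

    classical = 1.84e-11 * math.sqrt(t_rel) / p_rel
    oxygen = 0.01275 * math.exp(-2239.1 / t_kelvin) / (fr_o + freq_hz ** 2 / fr_o)
    nitrogen = 0.1068 * math.exp(-3352.0 / t_kelvin) / (fr_n + freq_hz ** 2 / fr_n)
    return 8.686 * freq_hz ** 2 * (classical + t_rel ** -2.5 * (oxygen + nitrogen))


for distance_m in (3.0, 5.0):
    loss_db = distance_m * air_absorption_db_per_m(5000.0)
    print(f"5 kHz over {distance_m:.0f} m: {loss_db:.2f} dB")
```

Under these assumed conditions the loss at 5 kHz over 3–5 metres comes out at a fraction of a dB, so at domestic listening distances air absorption alone looks like a small contributor compared with driver directivity.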
Can you recall what their on-axis frequency response was like? Would not some judicious application of EQ have been a more direct and correct way of addressing such an issue? Wouldn't toe out simply in and of itself create issues of its own, which may or may not immediately surface on the currently-being-heard programme material?
I don't have the memory to recall specifics. EQ could fix most problems, though some tweeters years ago had peaks on-axis that you don't try to correct with a crossover due to the impact off-axis, old metal diaphragms in particular. I also recall coaxial systems that were designed to be used off-axis specifically due to on-axis tweeter response. Look at the manufacturer's measurements of many coaxials and you might agree that the response at off-axis, maybe 5 degrees, is the better design axis. They may have extreme peaks on-axis that shouldn't be controlled completely via the crossover. I never considered using those old metal dome tweeters largely for that reason. Certainly today they are much improved.
As there undoubtedly is in listening rooms.
Yes, which I think tends to support the idea (for personal designs at least) that listening should play a role in design to assess if it's adequate and/or optimum. One thing I do know is that in the past when I was in the market for speakers I would never buy any based on the measurements alone, although that was a good starting point. I always auditioned them knowing where and how they would be used at home. Unfortunately there were stores that had good speakers, but they didn't know how to set them up in a room. In a couple of cases well regarded systems simply sounded awful due to the room, so I couldn't risk that they would be poor in my room. Evaluation by listening is what I end up doing with my own designs. I have never had a system that I didn't tweak or make a significant change to after listening, due to poor crossover choice or implementation, especially with regard to the off-axis (power) response. Getting a flat on-axis response is the easy part.
 
How accurate this is I can't say, but some online research into the time frame returned this:

"A reflection from a speaker is generally considered part of the first arrival (or "direct sound") when it arrives within roughly 20–40 milliseconds (ms) of the initial direct sound. This phenomenon is part of the Precedence Effect (or Haas Effect), where the brain fuses early reflections with the direct sound into a single perceptual event, increasing the perceived loudness and clarity rather than being heard as a distinct echo."

Was what you quoted generated by AI? I ask because it's misleading. What it says about the Haas Effect is correct, but the assertion that everything within the Haas Effect interval is "considered part of the first arrival (or "direct sound")" is incorrect, at least in a home audio setting.

The Haas Effect suppresses directional cues from reflections arriving within 20-40 milliseconds of the direct sound so that we can tell the direction a sound came from in a reverberant environment. After that 20-40 milliseconds (the timespan varies with the specifics), reflections can be heard as distinct echoes, assuming they are still loud enough.

The Haas Effect kicks in at about .68 milliseconds, which roughly corresponds to the path length around your head from one ear to the other. Before .68 milliseconds, the "window" is open for first-arrival sounds (no Haas-effect suppressions yet). A first-arrival sound arriving from directly to your left will wrap around your head and reach your right ear .68 milliseconds after it reaches your left ear. The ear can tell the arrival direction from how long the interval is between arrival at the first ear and arrival at the second ear. That time interval will be less than .68 milliseconds for any other arrival direction. Then the "window" closes after .68 milliseconds because after that reflections would be giving your brain contradictory directional cues. The Haas Effect wears off after about 20-40 milliseconds.

But psychoacoustically there is a great deal happening due to reflections that arrive between the initial arrival of the sound and the end of the Haas Effect 20-40 milliseconds later!

If reflections off the front face of the speaker (or its edges) occur, they almost always arrive within that initial .68 milliseconds when the "window" is open to receiving new first-arrival sounds. And the ear/brain system can misinterpret those extremely early reflections! They can function as false azimuth cues: The ear/brain system computes the horizontal arrival angle (azimuth) of a sound from the less-than-.68-milliseconds time gap between its arrival at one ear and then the other ear. Super-early baffle reflections arriving before .68 milliseconds can interfere with this and result in degraded sound image localization. So these very early, pre-Haas-effect reflections are disproportionately detrimental to spatial quality, and you'll often see manufacturers use rounded-over edges or very narrow baffles for the sake of better imaging.

Reflections off of room surfaces all arrive later than .68 milliseconds so the Haas Effect suppresses - but does not necessarily eliminate - their influence on the perceived arrival direction. More specifically, a speaker's strong first reflection off the same-side-wall will tend to widen the soundstage in the direction of that reflection. There are arguably some trade-offs from having strong early same-side-wall reflections, but in general most listeners find the soundstage-widening effect to be worthwhile and enjoyable.
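To put the time windows above into distances, here is a minimal sketch that converts extra path length into delay and uses a simple mirror-image construction for a single side-wall reflection. The room coordinates, the 343 m/s speed of sound, and the 30 ms fusion limit (a midpoint of the 20-40 ms range quoted above) are all assumptions for illustration. The three labels simply restate the windows described in this post and are not a psychoacoustic model.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed, roughly 20 degC air


def delay_ms(extra_path_m):
    """Arrival delay (ms) of a sound that travels extra_path_m further."""
    return extra_path_m / SPEED_OF_SOUND_M_S * 1000.0


def side_wall_reflection_delay_ms(speaker_xy, listener_xy, wall_x=0.0):
    """Delay of the first side-wall reflection relative to the direct sound.

    Uses the mirror-image method: the wall is the plane x = wall_x and the
    reflection behaves as if it came from the speaker mirrored in that wall.
    """
    sx, sy = speaker_xy
    lx, ly = listener_xy
    direct = math.hypot(lx - sx, ly - sy)
    mirrored_x = 2.0 * wall_x - sx
    reflected = math.hypot(lx - mirrored_x, ly - sy)
    return delay_ms(reflected - direct)


def classify(delay, itd_window_ms=0.68, fusion_limit_ms=30.0):
    """Label a delay using the windows described in the post above."""
    if delay < itd_window_ms:
        return "inside the initial interaural window"
    if delay < fusion_limit_ms:
        return "within the Haas interval"
    return "late enough to be heard as a separate echo"


# Hypothetical layout: speaker 0.8 m from the side wall, listener 3 m back.
wall_delay = side_wall_reflection_delay_ms((0.8, 0.0), (2.0, 3.0))
print(f"side wall: {wall_delay:.2f} ms, {classify(wall_delay)}")

# Hypothetical baffle-edge diffraction with 10 cm of extra path.
edge_delay = delay_ms(0.10)
print(f"baffle edge: {edge_delay:.2f} ms, {classify(edge_delay)}")
```

Moving the speaker closer to the wall in this sketch shrinks the side-wall delay, which is a quick way to check when a boundary reflection starts to approach the .68 millisecond window.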

Pretty much all of the in-room reflections affect sound quality. If these reflections are spectrally similar to the first-arrival sound, their effect on sound quality is generally beneficial. If they are spectrally significantly different from the first-arrival sound, they may be detrimental to sound quality. How loud they are also matters, as does how quickly they decay.

The longer a sound lasts, the louder it seems to be, even if the measured SPL is unchanged. So to the extent that some frequencies take longer to decay in-room, those frequencies can be perceived as louder, even if they do not MEASURE as louder.

The in-room reflections also convey information about the acoustic space of the playback room, with earlier initial reflection arrival times obviously corresponding to shorter reflection paths and therefore a smaller playback room size. Speaker placement and orientation, listener location, and room acoustic treatment can be used to reduce the "small room signature" of the playback room, which in turn can make the "sense of space" on the recording itself more perceptually dominant.

Getting back on point, I would probably consider reflections off the front baffle (and/or diffraction) arriving within .68 milliseconds of the first wavefront to be effectively part of the "first arrival or direct sound", but everything after that is, imo, clearly NOT part of the "first arrival or direct sound", from the standpoint of playback in a home audio setting.

This is just scratching the surface; people far smarter than me have written chapters if not books on this topic.

And, I welcome correction and/or elaboration from any such people far smarter than me (or even just a little smarter than me) who read this.
 
I'm not arguing listening and doing some tone shaping is not desirable,
I guess that begs the question of why would tone shaping be desirable? Doesn't it go against the principle of neutrality to the source material being reproduced? After all, to some degree, one can always season an otherwise "perfect" loudspeaker to suit one's taste. And who's to say that one person won't prefer salt while another prefers pepper on even the same programme material.
 
I'll let you know. Here's a 5" + 1" sealed monitor to be used with a subwoofer as a living room TV setup. It has a hybrid crossover where the passive components handle the crossover, and DSP is used to shape the signal (like the JBL 7-series). Measurements were made, and a quick active crossover was tested to verify the measurements under a couple of vertical angles.
An interesting little design. I'm wondering if you have deliberately included the 1dB boost across a quite wide frequency range, from about 800 Hz all the way to 4 kHz. Funnily enough, this seems almost to be the inverse of a "BBC Dip". :)
1775358343498.png
 
Maybe alternate between theory, calculations, measurements and listening to various irritations of the new model?
That's a pretty neat Freudian slip: "irritations" instead of "iterations". :) Indirectly, you seem to have hit the proverbial nail right on the head!
 
But what if it doesn't sound good?
That begs the question: by what metric doesn't it sound good?
As soon as I change a crossover part, it now doesn't measure "perfectly". I change a single capacitor to the next lower value and two resistors by 0.5 ohms and it is a clear and obvious change for the better but now it doesn't "measure as good".
If it doesn't "measure as good", why would it be considered to "sound better"? Clear and obvious changes, such as a small increase in output level in a small part of the frequency range, can easily mislead the listener into proclamations of "that's better". It's often been noted that "louder" is "better" in listening tests.
 
This reminds me of the subjectivity that comes in (from any number of factors) in deciding whether a loudspeaker is doing a good job with any particular genre.
So what is "doing a good job" going to sound like with a loudspeaker across the wide gamut of musical genres that are out there?

Isn't the first and foremost aim of the listening "test" to try and discern whether the loudspeaker sounds "natural", using programme material of such a type that the "naturalness" of the sound reproduction can be subjectively judged? If one is a trained/experienced listener, maybe "subjectively" could be replaced with "objectively". The reproduction sound quality on other sources would then simply be entirely irrelevant, as once the "natural" sound quality target is hit, let the other genres fall where they may. They may not sound "good" on the loudspeaker, but the loudspeaker itself would not be at fault in this instance.
 
I guess that begs the question of why would tone shaping be desirable? Doesn't it go against the principle of neutrality to the source material being reproduced? After all, to some degree, one can always season an otherwise "perfect" loudspeaker to suit one's taste. And who's to say that one person won't prefer salt while another prefers pepper on even the same programme material.
There can be reasons, for commercial designs for example: the fewer crossover parts you can get away with without obviously degrading the sound, the better for cost management.
Or you're trying out some configuration in which you have little or no experience, for example adding a rear-firing ambiance tweeter, which affects sound power.

An interesting little design. I'm wondering if you have deliberately included the 1dB boost across a quite wide frequency range, from about 800 Hz all the way to 4 kHz. Funnily enough, this seems almost to be the inverse of a "BBC Dip".

Good catch, since these will not be used nearfield I did opt to balance out the curves somewhat. I have been listening to them in the nearfield for the past week, can't say anything is bothering me. It's still remarkably flat overall.
1775370343361.png
 
The largest study covering this is "A Multiple Regression Model for Predicting Loudspeaker Preference Using Objective Measurements" by Dr. Sean Olive.
If you analyse the study, the preferred slopes can be deduced. Conveniently, in VituixCAD the preferred slopes can now be set as standard "shadow slopes" so speaker data can be referenced against them.
Slope targets in Olive's preference study were taken for the most preferred speakers. VituixCAD has very nearly the same default target lines for the 'Full space' option, i.e. conventional small...mid-sized loudspeakers being omni at sub frequencies and cardioidish at high frequencies, and located in free space.
But slope target zones in VituixCAD are calculated using the measured textbook directivity index DI. A speaker with a DI slope of ca. 1.2 dB/oct. can have an ON slope of 0 dB/oct. Targets with higher DI slopes are limited to an ON slope of +0.05 dB/oct to avoid giving the impression that steeper DI slopes would be okay. They're certainly not, especially if the speakers are used for casual listening anywhere. In those cases a DI slope of 1.2 dB/oct. could already be too steep.
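To make the slope figures concrete, here is a minimal sketch that fits a straight line in dB per octave through response data by least squares. It is not VituixCAD's own calculation. The seven-point data set is made up, and it assumes the textbook relation that DI is the on-axis level minus the sound power level, so a flat ON curve with a rising DI implies a power response falling at the same rate.

```python
import math


def slope_db_per_octave(freqs_hz, levels_db):
    """Least-squares slope of level (dB) against log2(frequency)."""
    octaves = [math.log2(f) for f in freqs_hz]
    n = len(octaves)
    mean_x = sum(octaves) / n
    mean_y = sum(levels_db) / n
    sxx = sum((x - mean_x) ** 2 for x in octaves)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(octaves, levels_db))
    return sxy / sxx


# Hypothetical data at octave spacing from 200 Hz to 12.8 kHz.
freqs = [200.0 * 2 ** k for k in range(7)]
on_axis_db = [0.0] * 7                      # flat ON response
di_db = [1.2 * k for k in range(7)]         # DI rising 1.2 dB per octave
power_db = [on - di for on, di in zip(on_axis_db, di_db)]  # DI = ON - power

print(f"ON slope:    {slope_db_per_octave(freqs, on_axis_db):+.2f} dB/oct")
print(f"DI slope:    {slope_db_per_octave(freqs, di_db):+.2f} dB/oct")
print(f"Power slope: {slope_db_per_octave(freqs, power_db):+.2f} dB/oct")
```

With real measurements the fit would normally be limited to a chosen frequency band, since baffle step and room effects bend the curves at the extremes.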

All three slopes of a perfectly constant-directivity speaker (theoretical) would be equal, so the slope targets should be equal too. In practice it's some negative value, to avoid too thin and bright a sound in the mid to far field in presumably balanced room acoustics.

The half space concept and its default slope targets are somewhere between the conventional free space and constant directivity concepts. Half space has DI >= 3 dB at LF, so that too would sound thin compared to the conventional free space concept with an ON slope of 0 dB/oct.

Olive's preference study does not cover the last two concepts and their close relatives, but designers have to design decent default balancing for those concepts too. Therefore VituixCAD has to have a logical extension beyond a single simplified preference study.
 