
Couldn't we technically standardize "perceived" frequency response of a loudspeaker system?

If estimated in-room response means the estimated frequency response that a microphone would pick up in a room, then no, that is not what I mean.

I mean a curve that shows frequency response from the listening position but modified to include how the characteristics of the reflections would affect your perception of the frequency response in a way that the raw frequency response from the mic would not show.

EDIT: I may have worded this badly. So when a mic measures frequency response, it does not distinguish between direct and reflected sound. It is all the same to the mic. But our brains do, and are actively trying to hear what the direct sound sounds like, but they can't do it perfectly. PFR would be akin to the frequency response after your brain is done interpreting it.
Unfortunately, there is no simple way to do this, due to the nature of how sound in a reflective environment works.

All the information is already in the measurement (acoustic measurement in-room at listening position), but it is not possible to map all the information into one graph that is easy to read.

This is because the perceived frequency response will be affected differently for different signals, so that a stationary waveform will be like the frequency response graph (no gating), while transient signals (everything where level changes over time) will be colored by the reflections and decaying energy in the room, and this coloration will be different depending on the duration and spectral distribution of the signal.

So tonality for a stationary, non-percussive instrument will be affected differently from that of a transient instrument, like drums. This is also why a live room with lots of reverb can never sound the same as a dry room, and why speakers with different radiation patterns sound different in the same room.
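
To put rough numbers on the stationary-versus-transient point, here is a minimal Python sketch (not from the original post). It assumes a toy "room" with only the direct sound plus one reflection; the 5 ms delay, the 0.7 reflection gain, and all function names are made up for illustration. It compares the level of a tone before the reflection arrives (what a short transient would show) with its level once the reflection has settled in.

```python
import numpy as np

FS = 48_000        # sample rate in Hz (assumed)
DELAY_S = 0.005    # the single reflection arrives 5 ms after the direct sound (assumed)
REFL_GAIN = 0.7    # reflection amplitude relative to the direct sound (assumed)
N_DELAY = round(DELAY_S * FS)

def toy_room_ir():
    """Toy impulse response: direct sound plus one delayed, attenuated reflection."""
    h = np.zeros(N_DELAY + 1)
    h[0] = 1.0
    h[N_DELAY] = REFL_GAIN
    return h

def rms_db(x):
    """RMS level of a signal in dB."""
    return 20 * np.log10(np.sqrt(np.mean(np.square(x))))

def tone_levels(freq_hz, duration_s=0.5):
    """Return (onset_dB, steady_dB) of a sine played through the toy room,
    each relative to the dry tone over the same time span."""
    t = np.arange(round(duration_s * FS)) / FS
    dry = np.sin(2 * np.pi * freq_hz * t)
    wet = np.convolve(dry, toy_room_ir())[: len(dry)]
    onset = rms_db(wet[:N_DELAY]) - rms_db(dry[:N_DELAY])    # before the reflection arrives
    steady = rms_db(wet[N_DELAY:]) - rms_db(dry[N_DELAY:])   # after it has arrived
    return onset, steady

for f in (100, 200):
    onset, steady = tone_levels(f)
    print(f"{f} Hz: onset {onset:+.1f} dB, steady state {steady:+.1f} dB")
```

In this toy case the onset is unchanged at both frequencies, while the settled level is about 10.5 dB down at 100 Hz and about 4.6 dB up at 200 Hz. A single ungated frequency response graph only shows the second pair of numbers.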
 
If estimated in-room response means the estimated frequency response that a microphone would pick up in a room, then no, that is not what I mean.

I mean a curve that shows frequency response from the listening position but modified to include how the characteristics of the reflections would affect your perception of the frequency response in a way that the raw frequency response from the mic would not show.

EDIT: I may have worded this badly. So when a mic measures frequency response, it does not distinguish between direct and reflected sound. It is all the same to the mic. But our brains do, and are actively trying to hear what the direct sound sounds like, but they can't do it perfectly. PFR would be akin to the frequency response after your brain is done interpreting it.
It produces an estimate based on the direct sound, the reflections, and the reverberant field. They have documented the method extensively, with pictures; perhaps start here. They use a generalized room, based on measurements of multiple rooms. The estimated in-room response is calculated from the 3D sound field measured from the speaker. If your room has different reverberation times, reflections, and some resonances, then you too can take the raw spin data and create an estimate of what your room will sound like with said speaker, based on a specific combination of all the direct, reflected, and reverberant sound. I think this is what professional sound consultants do for a client's specific room.
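
As a rough illustration of building such an estimate from spin data, here is a minimal Python sketch. It uses the commonly cited CTA-2034 weighting for the predicted in-room response (12% listening window, 44% early reflections, 44% sound power). Treat the weights, the energy-domain averaging, and the function name as assumptions; this is not necessarily what any particular measurement system does internally. Changing the weights is one crude way to mimic a more or less reflective room.

```python
import numpy as np

# Commonly cited CTA-2034 weights for the predicted in-room response (assumption).
DEFAULT_WEIGHTS = {
    "listening_window": 0.12,
    "early_reflections": 0.44,
    "sound_power": 0.44,
}

def estimated_in_room_db(listening_window_db, early_reflections_db, sound_power_db,
                         weights=DEFAULT_WEIGHTS):
    """Weighted average of three spinorama curves given in dB on a shared frequency grid.
    The curves are averaged as energy and converted back to dB (an assumption)."""
    curves = {
        "listening_window": np.asarray(listening_window_db, dtype=float),
        "early_reflections": np.asarray(early_reflections_db, dtype=float),
        "sound_power": np.asarray(sound_power_db, dtype=float),
    }
    energy = sum(w * 10 ** (curves[name] / 10) for name, w in weights.items())
    return 10 * np.log10(energy)

# Hypothetical three-point curves, for illustration only (not measured data).
lw = [0.0, 0.0, 0.0]
er = [0.0, -1.5, -3.0]
sp = [0.0, -3.0, -6.0]
print(np.round(estimated_in_room_db(lw, er, sp), 2))
```

With these made-up inputs the estimate tilts downward toward the last point, because the early-reflection and sound-power curves fall there while the listening window stays flat.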

I know of no generalizable way to make the figure of merit you want. I tend to go at the problem from the opposite direction; I have measured many speakers in my room, and am pretty clear on the bass modes, the reflected sound, and the reverberant energy in my space. I have dealt with the most problematic issues as best I can, and have a fair idea of how a speaker will behave based on good quality published measurements. From those measurements, I also have a decent idea about how responsive they will be to EQ, placement, etc. And a reasonable idea if a particular model can be ruled out for my use.
 
I've wondered this, too: given full spinorama data for a speaker and the impulse response at the listening position, is it possible to derive a correction filter that gives a "canonically" neutral response?

DRC-FIR attempts something like this using just a single listening position impulse response using some psychoacoustic principles described here:


However, in my room the resulting "psychoacoustic target" is nearly flat rather than the usual downward slope, which of course sounds too bright in room. Some variant of the B&K target usually sounds better.
 
One simple reason is that (so far) we listen with ears, not microphones or neural implants, and since ears vary in size and shape, perceived response can thus also vary greatly. The best we can hope to achieve with ears is subjective perception. Unless we clone humans. Then, we may be able to do it for a particular group of clones.
Seems to me the direct-to-reflected sound ratio would play a role, and that's going to change with listening distance, and from one room to the next, and would be somewhat dependent on the acoustic properties of the room surfaces, so I think that standardization might need to include specifications covering such variables.
I'm confident the two statements above account for most of the differences.

1. No two sets of ears, much less 7.9 billion sets of ears, are the same.

2. Seldom are two rooms set up the same.

3. EVERYBODY has different taste in music, including sitting in a church in Brazil vs. sitting in a church in Chicago. That's just "church songs."
Head to New Orleans and all bets are off, once again.

Personally I'm glad there are different preferences from one person to the next. "To thine own self be true." One of the most prophetic statements
ever made. One man's garbage is another man's treasure. "He likes opera," "she likes rap." One likes potato chips, the other likes Fritos.

Regards
 
I'm confident the two statements above account for most of the differences.
The speaker can also measure differently.
I am not sure it is wise to so quickly discount measurements as a meaningful help.

1. No two sets of ears, much less 7.9 billion sets of ears, are the same.
It is still a fact that the same person, in the same room, with the same speakers… will hear a difference as the speakers are moved around.
Let’s resist throwing in more variables, like the ones below, until we can agree on what one set of ears, in one room, with one pair of speakers can do.

2. Seldom are two rooms set up the same.

3. EVERYBODY has different taste in music, including sitting in a church in Brazil vs. sitting in a church in Chicago. That's just "church songs."
Head to New Orleans and all bets are off, once again.

Personally I'm glad there are different preferences from one person to the next. "To thine own self be true." One of the most prophetic statements
ever made. One man's garbage is another man's treasure. "He likes opera," "she likes rap." One likes potato chips, the other likes Fritos.

Regards

To thine own ears be true is great as a slogan.
But I am not sure what the OP is yearning for in a “new standard” or with a “new measurement”.
It is all pretty frequency response centric.
 
I think more researchers are now measuring and using binaural room impulse responses (BRIR) to help understand and characterize room acoustics and room-loudspeakers interactions. Hopefully soon they may further our knowledge beyond what Dr Toole has already contributed.

Here is a link to Dr Olive's fancy KEMAR setup to measure the BRIR of the Harman reference listening room he designed.
 

Well this won't speak directly to the Perceived Frequency Response metric you propose; this is just an anecdote that may or may not be relevant:

I have done some experimenting with unorthodox relationships between the direct sound and the in-room sound, and one of my observations has been that minimizing the spectral discrepancy between the two is desirable. This contradicts the prevailing wisdom, which holds that the ideal is flat direct sound and downward-sloping in-room sound; in other words, the prevailing wisdom calls for a (deliberate?) spectral discrepancy between the direct sound and the in-room sound.

Therefore, at least in my opinion, exactly "where the goal posts are" for the relationship between the direct and in-room sound has not been conclusively established. So I would resist a "standardization" which is based on the assumption that this matter has been settled.

And I might well be in the extreme minority there!
 
I suspect Linkwitz would have agreed on that, given the unusual published designs, experiments and musings on room interaction over the years.
 

His Orion speaker did not have a rear-firing tweeter, or at least it did not when I first encountered it many years ago, and I sent him an e-mail advocating the use of a rear-firing tweeter so that the backwave energy would have the same spectral balance as the front radiation. He did not reply to my e-mail so I don't know whether it had any influence on his thinking, but the next time I encountered one of his designs it had a rear-firing tweeter.
 
If estimated in-room response means the estimated frequency response that a microphone would pick up in a room, then no, that is not what I mean.

I mean a curve that shows frequency response from the listening position but modified to include how the characteristics of the reflections would affect your perception of the frequency response in a way that the raw frequency response from the mic would not show.

EDIT: I may have worded this badly. So when a mic measures frequency response, it does not distinguish between direct and reflected sound. It is all the same to the mic. But our brains do, and are actively trying to hear what the direct sound sounds like, but they can't do it perfectly. PFR would be akin to the frequency response after your brain is done interpreting it.

The spin contains a good representation of the actual 3D sound field of the speaker, that is a pretty complete measurement.
Klippel does a mighty good job of generalizing the integrated response in a canonical room.

You are asking for an estimation that involves the exact shape, reflectivity, and reverberant characteristics of your room. And a model of your head and ears. And since you mention 'perception' and seem to blend concepts like room targets with calculated responses, you would need some statistical study on perception and preference.

Perception and preference studies indicate frequency response is the most important factor. The rest is math. But for sure, you will still be left with changing toe-in, back wall position, because perceptions have a wider error bar than the result of a detailed calculation of the sound field.

Do you have a microphone, have you measured your speakers and room?
 
Everybody has given me a ton to think about here and it will take me a long time to fully comprehend everything that has been said. Thanks to everyone who has participated in the convo so far. In the meantime I want to make some things clear because I might not have done the best job explaining.

First of all, my idea of a "standard" has nothing to do with preference. In a way, whatever the standard ends up being is somewhat arbitrary. Take video calibration for instance. There are standards there, but all that calibrating to those standards means is that what you see on your TV is going to be as close as possible to what the post-production team saw on their monitors because they're using the same standard. But the standard doesn't determine what the content actually looks like. The movie team does, and you do too, should you decide to deviate from the calibration.

Another analogy: imagine loudspeakers didn't exist and the only way people listened to music was headphones. The Harman Curve (the one for headphones, not loudspeakers) is not an official standard but let's say it became one. It seems like a lot of people have an issue with the Harman Curve, but if it became a standard, everyone's issues with it would pretty much become moot. Audio engineers would mix and master music and film using headphones that had the Harman Curve and if there was anything wrong with the curve, they would be compensating for it automatically. Eventually the industry would start remastering old recordings made before the Harman Curve became a standard to make sure they sound good on headphones that use the Harman Curve. Really, the standard could be the goofiest looking curve imaginable and it would still work as long as engineers are using it and expect the average consumer to be using similar headphones. Of course different headphones and EQ options would always be available to anyone whose preference doesn't line up with the average audio engineer, or as budget options, and yet the standard would still be really helpful as a reference point.

Now that that's out of the way, I want to make it really clear what I am actually proposing.

I was watching this Audioholics video and they were discussing how certain speakers with certain directivity would sound incorrect if they EQ'd them to match their usual preferred frequency response. Directivity influences the mix of reflected and direct sound, which influences our perception of the frequency response in a way that the raw frequency response measured from the listening position does not show. My thought was "It would sure be nice if there was a way of measuring frequency response that would naturally compensate for this irregularity, so that instead of all that trial and error, you really could just use your usual preferred target frequency response and be good to go."

I was also looking at this thread by Amir and saw just how different a target curve can look for a big theater versus a small room. This difference is IIRC largely because in a larger room the mix of reflected and direct sound is different, which again, influences our perception of the frequency response in a way that the raw frequency response doesn't show.

I'm no scientist, but couldn't we theoretically make a bunch of binaural recordings with a dummy head in as many different kinds of rooms as we could manage with a bunch of different speakers that have different directivity playing the same music, and ask, say, 100 audio engineer participants to listen to these recordings on headphones and equalize every recording until they all sound like they have the same frequency balance to them? And then we could use that data to create an algorithm that could extrapolate how pretty much every possible mix of reflected and direct sound influences our perception of frequency balance? And then once we have that, could we not create a standard curve for that perceived frequency response that all music and film studios could use?

And again, I am well aware that this only standardizes one aspect of sound and doesn't mean your system would sound the same as another using the same PFR. However, that still seems useful. Like how correct screen calibration is based on the brightness of the room. No one can argue that watching a movie in a bright room will ever be the same as watching a movie in a dark room yet with the right calibration they are somewhat comparable. No one can argue that a typical projector can look like an OLED screen, yet a standard for calibration is still useful for both of them. A reference point is useful.

I don't mean to come in and act like I know what's best for the future of audio. There is SO much I don't know. I am just legitimately curious if this makes sense.
 
First of all, my idea of a "standard" has nothing to do with preference. In a way, whatever the standard ends up being is somewhat arbitrary.

I don't want arbitrary standards, thanks very much.
 
could we not create a standard curve for that perceived frequency response that all music and film studios could use?
Sort of but not really.

You're on the right track in that the difficulty comes in with variations in directivity. And you can, in fact, EQ the overall in-room response of a speaker (including all the reflections and stuff) so that the perceived frequency response is the same across the board. This is not even that hard, you can pretty much do it with pink noise and a mic.

However, the perceived frequency response is not the only problem. When you get variations in directivity you also get variations in timing, which affect stereo image and have secondary effects on perceived frequency response. Even if you can fix the latter you cause more problems in the former at the same time.

You are looking to close the famous "circle of confusion"; your idea about just settling on a single response industry-wide is basically that. So this problem has a well-known name, and you are not really on the wrong track IMO. I think that in practice the industry is not TOO far from what you propose already. Most studios try to use a relatively flat, slightly downward-sloping power response in-room. There are also fairly clear (if varied) ideas (if you look around) on what decay times should look like, when reflections should arrive, etc.

The real problem, which is a little hard to grasp at first, is that 1) EQ can't fix a room and 2) directivity for a given speaker is fixed, and EQ can't really do anything about that either.

Acoustics and directivity are 4-dimensional problems (space + time), but an audio signal is only amplitude over time and has no spatial dimension, so you can't really do anything to the audio signal that fixes those issues.
 
I was watching this Audioholics video and they were discussing how certain speakers with certain directivity would sound incorrect if they EQ'd them to match their usual preferred frequency response. Directivity influences the mix of reflected and direct sound, which influences our perception of the frequency response in a way that the raw frequency response measured from the listening position does not show. My thought was "It would sure be nice if there was a way of measuring frequency response that would naturally compensate for this irregularity, so that instead of all that trial and error, you really could just use your usual preferred target frequency response and be good to go."

This is just my opinion:

Even if you have an algorithm that takes into account what happens when you EQ a speaker having an off-axis response anomaly such that the Perceived Frequency Response can be accurately predicted and corrected, the off-axis response anomaly is still there. You cannot EQ away its effects, prominent among which is an inevitable spectral discrepancy between the direct sound and the reflected sound.

By way of example, Harman developed an algorithm that derives a "Preference Score" which has been used to rank speakers. Should we be relying on these scores? Not according to @Floyd Toole! Quoting from a post he made on this forum on March 25th of this year:

"...please stop putting any reliance on the calculated "scores". Learn to interpret the spinorama curves. That will have to do until we have an "educated" AI version of sound quality prediction. The ratings that were calculated by the Harman research group were done to prove a scientific point, and that done, they ceased to be used even by the people who created them. We rely on visual interpretations of the family of curves." [emphasis Duke's]

In other words: The specifics matter, the details matter, and an oversimplified metric which obscures the relevant details is of limited usefulness.
 
… I'm no scientist, but couldn't we theoretically make a bunch of binaural recordings with a dummy head in as many different kinds of rooms as we could manage with a bunch of different speakers that have different directivity playing the same music, and ask, say, 100 audio engineer participants to listen to these recordings on headphones and equalize every recording until they all sound like they have the same frequency balance to them? And then we could use that data to create an algorithm that could extrapolate how pretty much every possible mix of reflected and direct sound influences our perception of frequency balance? And then once we have that, could we not create a standard curve for that perceived frequency response that all music and film studios could use?
Proper scientists have done that.
It is not easy, and it involves a lot of ray tracing etc.



I don't mean to come in and act like I know what's best for the future of audio. There is SO much I don't know. I am just legitimately curious if this makes sense.
It sort of makes sense.
But most people do not want to pay a lot in consultancy fees for a room that they are renting, or one that their other half ‘owns’… so it can get sort of academic for many people.
 
I don't see why a microphone plus a really good algorithm couldn't match an ear plus a brain pretty much spot on.

It is a good point, and I think you hint at it in later posts. The key is that you don’t need a good microphone. You need good microphoneS to do things like the Klippel NFS, Trinnov Optimizer, Yamaha YPAO, or Sony 360SS.

A single microphone cannot distinguish between on axis and reflected sound. An array of microphones can do that because the direct sound will be delayed by exactly the distance to the mics and the reflections will differ depending on the mic orientation.

But patents and business decisions probably get in the way.
 
An autocorrelation, or cross correlation, can show the direct peak and the subsequent reflections.
A sweep or pink noise can be used for that, just not a sine wave tone.
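
A minimal Python sketch of the cross-correlation idea, under stated assumptions: the "room" is simulated as a direct arrival plus two reflections with made-up delays and gains, the test signal is white noise rather than pink noise or a sweep (to keep the peaks one sample wide), and the function names are hypothetical.

```python
import numpy as np

FS = 48_000                      # sample rate in Hz (assumed)
rng = np.random.default_rng(0)   # seeded so the example is repeatable

def simulate_mic(source, arrivals):
    """Sum delayed, scaled copies of the source. arrivals: list of (delay_samples, gain)."""
    out = np.zeros(len(source) + max(delay for delay, _ in arrivals))
    for delay, gain in arrivals:
        out[delay:delay + len(source)] += gain * source
    return out

def arrival_delays(source, mic, n_peaks, max_lag):
    """Cross-correlate the mic signal with the known source (via FFT) and
    return the lags, in samples, of the n_peaks strongest correlation peaks."""
    n = len(source) + len(mic)   # zero-pad enough to avoid circular wrap-around
    spectrum = np.fft.rfft(mic, n) * np.conj(np.fft.rfft(source, n))
    xcorr = np.fft.irfft(spectrum, n)[:max_lag]
    strongest = np.argsort(np.abs(xcorr))[-n_peaks:]
    return sorted(int(i) for i in strongest)

source = rng.standard_normal(FS)                  # 1 s of white noise
arrivals = [(144, 1.0), (384, 0.5), (720, 0.3)]   # direct at 3 ms, reflections at 8 and 15 ms
mic = simulate_mic(source, arrivals)
lags = arrival_delays(source, mic, n_peaks=3, max_lag=2000)
print([1000 * lag / FS for lag in lags])          # arrival times in milliseconds
```

With a single-frequency sine instead of noise, the correlation would repeat every period and the individual arrivals could not be told apart, which is why a broadband signal is needed. In a real room the peaks would be smeared and far more numerous than in this toy case.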
 
A single microphone cannot distinguish between on axis and reflected sound. An array of microphones can do that because the direct sound will be delayed by exactly the distance to the mics and the reflections will differ depending on the mic orientation.
This is what you get with an MMM measurement and a single microphone: at higher frequencies, reflections are proportionally removed from the frequency response. Above 600 to 800 Hz, the response is mostly direct sound.
 
I think perceptions and preferences should be kept distinct in this type of discussion.

There are well established methods for getting different people to judge subjective experience of objective stimuli in ways that line up. It boils down to training, and getting everyone on the same page, resulting in high inter-rater reliability statistics. On the individual level, once you have listened and measured a lot, you will likely develop a sharper ability to judge what you hear in terms of objective measurements. As you do that more and more, what you hear lines up with the measures better and better.

For example, I can DEFINITELY hear my "room", in terms of reflected sound. But that took a lot of different setups and measurements before I could hear it clearly. That included toe-in, speaker height, and speaker position, in a lot of variations. At this point, I can tell you if my RT60 is spiking above 400 ms below 100 Hz, for one example. But I would only claim accuracy for things that my room affects, because that is what I have "trained" on. Other frequencies, I am far less accurate with.

Preferences, trying to standardize those is a fool's errand. Show me any target curve and there will be people who will dislike it. Lots of them.

For a non-audio example, I can tell you if food is well made, take it apart component by component. Whether I like it or not. Kidneys can be very well cooked and seasoned, which I can certainly pick up on, but I hate them. Even when they are of the highest quality.
 