Understanding the State of the Art of Digital Room Correction

dc655321 · Oct 28, 2021

JRS said:
Hi Mitch--thanks for the presentation. I was wondering why the latencies are so long? I was poking around and found this claim made by Brute-FIR:

I guess I don't understand how dedicated 500MHz SHARCs can't seem to manage anything close? Granted my understanding is superficial, but it seems that most DSP speaker correction isn't coming close to this figure.

Latency is primarily a function of FIR filter design/requirements and convolution algorithm.

I don't understand your question(s), sorry.

JRS · Oct 28, 2021

dc655321 said:
Latency is primarily a function of FIR filter design/requirements and convolution algorithm.

I don't understand your question(s), sorry.

Sorry, let me try again: I noticed that the spec Brute-FIR lists shows it running a lot of filters with 130K taps on multiple channels in real time. Yet the software implementations I have used (as well as the number Amir posted) are a whole lot worse. This is what brought up the whole issue of A-V syncing. Unless I am reading it wrong, it seems to suggest that running, say, three XOs and 8 PEQs per channel ought to be no sweat. I'm just trying to get a grasp on the bottleneck.

So when they say "in real time", they mean they can keep up with a 44k data stream, and there will always be an initial and inherent latency in the process. So really, once one goes beyond a certain threshold by adding additional FIRs there may be a bit more latency, but it won't grow by, say, 0.3 seconds with each additional filter. In other words, running the acausal filters means a finite wait until enough data is buffered and the recursion begins, but adding many more will not necessarily make it much worse? Edit: I don't mean stacking lots of LF filters on top of one another in one channel, but EQing, say, 6 channels at a time, each with some LF EQ.

fluid · Oct 28, 2021

Most FIR filters use a centred impulse alignment, where the latency due to the filter alone is half the total length of the filter, as shown above. Linear phase filters have symmetrical impulse responses, so this makes sense. An acausal filter could have the impulse placed anywhere within the filter, but that would limit the amount of time correction to the number of samples of delay, so it is more unusual.
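
A minimal sketch of the arithmetic behind the centred alignment described above, assuming the peak sits exactly at the middle tap and ignoring all buffering. The function name and the tap counts are illustrative; 131072 simply stands in for the "130K taps" figure mentioned earlier in the thread.

```python
def centred_fir_latency_s(num_taps: int, sample_rate_hz: float) -> float:
    """Delay contributed by an FIR filter whose impulse peak is at the centre tap."""
    return (num_taps - 1) / 2 / sample_rate_hz


if __name__ == "__main__":
    # Illustrative tap counts only; buffer latency of the host is not included.
    for taps in (1024, 16384, 131072):
        delay_ms = 1000 * centred_fir_latency_s(taps, 44_100)
        print(f"{taps:>7} taps -> {delay_ms:8.1f} ms")
```

At 44.1 kHz the 131072-tap case works out to roughly 1.49 s from the filter alone, before any soundcard or convolver buffers are added.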

Every soundcard, computer system, DSP, software etc. uses buffers to prevent dropouts; even DACs have propagation delays through them. It is these in combination with the latency of the filters that make up the final delay from pressing go to when audio comes out.

An IIR causal filter in itself should cause no delay as the impulse peak is at the start, but if the software or associated processing has buffers this might not end up being true in practice. An IIR/causal filter encoded as an FIR only needs to incur 1 sample of delay if the convolver is written to make it so, but many are not.

Parallel processing of the same type should not incur any extra delay unless it causes the software to need to increase its buffer settings to keep up.

fluid · Oct 28, 2021

jhaider said:
How so? Does it really tell you anything except that the software can auto-EQ a (single? if so even less useful) point in space? Given that we generally hear the speaker above the transition region and not what-a-mike-sees-at-one-point-in-the-room, I'm far from convinced that auto-EQ gyrations based on listening position measurements do anything except for soothe the eyes of people who like orderly-looking graphs.

Mitch didn't bite so I'll give it a go. Some of the points you make are relevant and valid but you might be looking at the glass as half empty instead of half full. I have done a lot of this myself so my comments are based on that experience.

Whether a single point or multipoint measurement gives the best subjective result depends on the speaker, room, and treatment. In an untreated, reflective room where multiple listening positions are to be used, multipoint measurements are more likely to be successful. In a treated room with a single main listening position and a speaker with good controlled directivity, a single point tends to be better. Taking multiple points within a small space is another option.

Filters that have any real "time" correction in them almost always work better with single point measurements. When you measure and vector average multiple impulses to allow phase correction on them, it does not always work so well.

It is entirely possible to make something that looks good in graphs and sounds bad. Using single point measurements and keeping the level of correction reasonable tends to make this less likely. The measurement that the correction is based on must be good and made properly. If this is not done correctly then all bets are off.
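
One plausible contributor to this, offered as an assumption rather than as the explanation given in the post: when impulses from different mic positions arrive at slightly different times, a complex (vector) average cancels at frequencies where the arrivals are out of phase, while a magnitude average does not. The sketch below uses two idealised unit impulses and a hypothetical 0.1 ms arrival difference (about 3.4 cm of path length).

```python
import cmath
import math


def vector_average_mag(freq_hz: float, delays_s: list[float]) -> float:
    """Magnitude of the complex average of unit impulses with the given delays."""
    total = sum(cmath.exp(-2j * math.pi * freq_hz * d) for d in delays_s)
    return abs(total) / len(delays_s)


def magnitude_average(freq_hz: float, delays_s: list[float]) -> float:
    """Average of the individual magnitudes; phase is discarded first."""
    mags = [abs(cmath.exp(-2j * math.pi * freq_hz * d)) for d in delays_s]
    return sum(mags) / len(mags)


if __name__ == "__main__":
    delays = [0.0, 1e-4]  # hypothetical arrival times at two mic positions
    for f in (100.0, 1000.0, 5000.0):
        print(f, vector_average_mag(f, delays), magnitude_average(f, delays))
```

With these delays the vector average is nearly untouched at 100 Hz but collapses towards zero at 5 kHz, even though each individual measurement is perfectly flat.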

Getting the levels matched between channels to a high degree and making the frequency responses between them as similar as possible does have a really positive impact on the timbre as well as imaging/soundstage.

Optimizing the position of the speakers and listening position to avoid avoidable boundary interference is still necessary. A good electronic correction does not seek to fix these sorts of avoidable acoustic problems with EQ. In the graph screenshot from the video posted before you can see that the base response of the speaker is very correctable. This is not always the case, and brute forcing a nice looking graph with uncorrectable issues will only lead to disappointment and deciding that all "Room Correction" is flawed.

All of the good room correction software will use frequency dependent windowing to allow different parts of the spectrum to be treated differently. The success or failure of the algorithm relies heavily on this aspect. The aim is really to correct the direct sound from mid to high frequencies, with progressively more room and time added in as frequency goes down. This is very similar in principle to using NFS or anechoic measurements to correct the on-axis response and in-room measurements to deal with modal issues. How successful one or the other technique is will depend a lot on the speaker being corrected. If it has horrible directivity and is a mess then the ability of any EQ to correct that is limited. But if that speaker had smoother direct sound then it would very likely sound better.
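
A rough sketch of the idea, assuming the window is defined as a fixed number of cycles at each frequency. That parameterisation and the value of 15 cycles are assumptions for illustration; the packages named in this thread each expose their own window settings.

```python
def fdw_window_ms(freq_hz: float, cycles: float) -> float:
    """Window length in milliseconds when the window spans `cycles` periods."""
    return 1000.0 * cycles / freq_hz


if __name__ == "__main__":
    cycles = 15.0  # hypothetical setting, not taken from any specific package
    for f in (20.0, 100.0, 1000.0, 10_000.0):
        print(f"{f:>8.0f} Hz -> {fdw_window_ms(f, cycles):8.2f} ms")
```

A constant cycle count gives a long window at low frequencies, letting the room in, and a very short one at high frequencies, leaving mostly direct sound.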

It is very important for the operator or the algorithm to know what to correct and what to leave alone because forcing everything flat or to a target will not work. Mitch knows what he is doing and so do the authors of Acourate, Audiolense and DRC-FIR, but in the wrong hands the software is powerful enough to destroy the sound if the wrong settings are used.

Holmz · Oct 28, 2021

dlaloum said:
So if Dirac Live does full frequency range Time and Frequency domain correction- then one would assume it would do phase correction throughout the same range?

Not so critical for those with "crossoverless" speakers... but many such speakers have a crossover for the Woofer, even if crossoverless between mid/tweeter.... and one would hope that Dirac would correct that?

Optimally one would process the subwoofer channels with a lower sample rate, and also fewer taps. But a 1 Hz resolution still takes a second of data.
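
A small sketch of why a lower sample rate saves taps but not time, assuming the usual rule of thumb that frequency resolution is the sample rate divided by the number of taps. The 1 kHz sub-channel rate is hypothetical.

```python
import math


def taps_for_resolution(sample_rate_hz: float, resolution_hz: float) -> int:
    """Taps needed so that sample_rate / taps does not exceed the resolution."""
    return math.ceil(sample_rate_hz / resolution_hz)


def filter_span_s(num_taps: int, sample_rate_hz: float) -> float:
    """Time covered by the filter, in seconds."""
    return num_taps / sample_rate_hz


def resolution_hz(num_taps: int, sample_rate_hz: float) -> float:
    """Approximate frequency resolution of a filter of the given length."""
    return sample_rate_hz / num_taps


if __name__ == "__main__":
    for rate in (48_000, 1_000):  # full-rate channel vs hypothetical decimated sub channel
        taps = taps_for_resolution(rate, 1.0)
        print(rate, taps, filter_span_s(taps, rate))
    print(resolution_hz(1024, 48_000))
```

Both channels need one second of data for 1 Hz resolution; only the tap count changes. By the same rule, 1024 taps at 48 kHz resolve roughly 47 Hz.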

There is not a great reason to have 1 Hz resolution between 1 kHz and 20 kHz, unless the speaker is swinging wildly… or there is some desire to correct room reflections (which could be wild).

So the usual 1024 taps can do a lot, and IIR filters with ALLPASS can also be used to handle some of the large group delay with higher order subwoofer arrangements.

One can get the 8-10 channel DSPs, and therefore could do a 3 or 4 way, and a couple of subs. Or do the subs with IIR and just add bulk delay to MB/MR/Tweeter to align them along with the phase/impulse response EQ.

Ideally all speakers would be time and phase correct, and then the EQ is not needed for the direct path… Then the remaining EQ just accounts for the room's frequency response.

Chromatischism · Oct 28, 2021

McFly said:
Not trying to adjust audio delay. Trying to add video delay, to allow for possible seconds worth of outboard FIR filtering.

For example, particularly when using the TV as the source with streaming services via the TV's apps: if you output your audio to an outboard processor of your choice (say, miniDSP 2x4HD), the TV knows that the audio and video were synced when it sent the audio out the optical or HDMI ARC outlet to be converted in a DAC, but the DSP processor can add significant delays if you don't just stick to IIR filtering. The TV only has an option to add more audio delay!

But I want negative audio delay, which is impossible, so add video delay! TVs have RAM now; surely they could hold 24-48 4K frames in there? 25-50ish Mbps, is it not? Nothing to an 8 GB stick of RAM.

I hope this will be a feature on TVs soon, and I’ll be first in line if it does.
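
A back-of-envelope check of the buffer sizes suggested above, under stated assumptions: the compressed case simply holds the incoming stream for the required delay, and the raw case assumes 3840x2160 frames at 3 bytes per pixel. Both are illustrative; real TV pipelines may use different formats.

```python
def compressed_buffer_bytes(bitrate_mbps: float, delay_s: float) -> float:
    """Bytes needed to hold a compressed stream for `delay_s` seconds."""
    return bitrate_mbps * 1_000_000 * delay_s / 8


def raw_frame_bytes(width: int, height: int, bytes_per_pixel: int) -> int:
    """Bytes in one uncompressed frame."""
    return width * height * bytes_per_pixel


if __name__ == "__main__":
    print(compressed_buffer_bytes(50, 2.0) / 1e6, "MB for 2 s at 50 Mbps")
    frame = raw_frame_bytes(3840, 2160, 3)  # assumed 8-bit, 3 channels
    print(48 * frame / 1e9, "GB for 48 raw frames")
```

Under these assumptions two seconds of a 50 Mbps stream is 12.5 MB, and 48 uncompressed frames come to about 1.2 GB, so memory capacity alone does not look like the obstacle.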

I would think that you wouldn't need to store any video frames, just lead the video with the audio. Basically, audio track plays first, then video. A lead time gap should be able to be maintained.

thorvat · Oct 28, 2021

fluid said:
Whether a single point or multipoint measurement gives the best subjective result depends on the speaker, room, and treatment. In an untreated, reflective room where multiple listening positions are to be used, multipoint measurements are more likely to be successful. In a treated room with a single main listening position and a speaker with good controlled directivity, a single point tends to be better. Taking multiple points within a small space is another option.

IMHO taking multiple measurements always provides a more accurate basis for room correction. In the second scenario you are mentioning (treated room with single seat) you should be, of course, getting very similar results from single point vs small space multipoint measurements, but I would still recommend using multipoint measurement.

mitchco · Oct 28, 2021

I know it is counterintuitive, but multiple measurements can actually reduce the resolution of the correction at the listening position. This was mentioned in Sean Olive's study, "The Objective and Subjective Evaluation of Room Correction Products." https://www.audiosciencereview.com/...review-room-eq-setup.26397/page-9#post-906241 You can hear it in AB tests if you try it for yourself.

As explained in detail in the video, the key is using DRC/DSP software with psychoacoustic filtering and frequency dependent windowing. As demonstrated in the video, and in detail in my book, a single analysis measurement will cover a 6ft x 2ft grid area with smooth frequency and ideal timing response.

Also mentioned in the video, in my tests, David’s Focus Fidelity Designer is the only DSP FIR designer that gets multiple measurements correct. Focus Fidelity uses multiple measurements to build a transfer function model (this is difficult to do well) and from there apply less correction to features which change with position. Focus Fidelity avoids the resolution reduction (heard as over correction) by not just averaging the multiple measurements like so many other DSP/DRC packages do. Talking about the state of the art here.

dannut · Oct 28, 2021

jhaider said:
How so? Does it really tell you anything except that the software can auto-EQ a (single? if so even less useful) point in space? Given that we generally hear the speaker above the transition region and not what-a-mike-sees-at-one-point-in-the-room, I'm far from convinced that auto-EQ gyrations based on listening position measurements do anything except for soothe the eyes of people who like orderly-looking graphs.

Hi - the video is really long, so I'm going to assume many haven't watched it, myself included. One picture needs context, that is not provided. So:

You can use 2 techniques above the transition region. One is Geddes and Blind's 'Localized Sound Power Method', the other is windowing. Yes, reflection-free time will be very short, but with frequency dependent windowing you can gradually increase the contribution of reflections, i.e. below 1-2 kHz. Yes, you only get one axis from the speaker, so the speaker needs to be well behaved spatially (although you can take into account the power response). Yes, with a 'perfect speaker' and 'perfect room' there will be no correction above a certain frequency. Yes, the best of both worlds is to combine these techniques. We all know what psychoacoustic studies say about how we perceive sound in small rooms, so I won't go there. And yes, 'stuff' needs to get out of the way from the speaker-to-listener point of view: ~10 ms reflection-free time above some 'arbitrarily' chosen frequency.

Actually I have a lot of respect for what @j_j & @amirm & co achieved in Windows Vista, of all things! That was also a mike-in-one-point-in-the-room solution. The only thing missing was user-adjustability of the spectral tilt, so kinda useless in the real world. You should try http://drc-fir.sourceforge.net/ where any and every imaginable parameter can be modified. Or not: sadly, it doesn't have a UI.

fluid · Oct 28, 2021

thorvat said:
IMHO taking multiple measurements always provides a more accurate basis for room correction. In the second scenario you are mentioning (treated room with single seat) you should be, of course, getting very similar results from single point vs small space multipoint measurements, but I would still recommend using multipoint measurement.

You are of course entitled to your opinion. As I said above, my opinion is based on my experience of using all of these different processes on the same speakers in different rooms. I think I included enough caveats to allow for others' differing experience, or at least I tried to.

I would always suggest to anyone experimenting to try all methods for themselves. I have had good results with both and the best subjective result used a different technique in each room. I echo Mitch's points on averages.

dannut said:
You should try http://drc-fir.sourceforge.net/ any and every imaginable parameter can be modified. Or not - sadly, it doesn't have a UI.

DRC Designer gives a UI, but I much prefer gmad's scripts that Mitch linked to above. With those, any parameter can be changed by just adding it to a line in the script, which overrides the value set in the base profile being used. This allows fairly quick filter generation and testing once you get used to it. Anything with a -- in front of it signifies a change from the base profile. The script includes sox commands for turning the raw PCM output into WAV files, and then moves the filters and test convolution to specific folders and deletes the unnecessary files all in one hit.

dannut · Oct 28, 2021

Thank you! Just mentioned as a warning, that it isn't a 'ready-to-go' solution, but fiddleware.

Actually, my hope is that the AVR industry would come out with a user uploadable convolution engine (yes, I know, latency...). And Christmas is coming, so: "HDMI eARC, bitstream decoding, music upmixing (Logic 7 or ProLogic2), flexible matrix mixing, lowpass/highpass/PEQ/biquad and FIR processing. Oh, and some cheap amplification to go with it, i.e. TPA3255"

fluid · Oct 28, 2021

dannut said:
Thank you! Just mentioned as a warning, that it isn't a 'ready-to-go' solution, but fiddleware.

That is true, but anything that is ready to go without any fiddling does not give as good a result. Because you can change any parameter, the world is your oyster, or a huge rabbit hole to fall into. I've been down it, and I can understand why others would not want to; for that you can pay Mitch to do it for you.

Ata · Oct 28, 2021

Ron Texas said:
Hey @mitchco thanks for all your help and inspiration. I'm back to a 100 hz crossover on my LS50's, btw. Will watch the video later.

Another shout-out to @mitchco. I have been waiting for this thread for a while!

FWIW I also settled on 100Hz xover for my LS50M.

thorvat · Oct 28, 2021

fluid said:
You are of course entitled to your opinion. As I said above, my opinion is based on my experience of using all of these different processes on the same speakers in different rooms. I think I included enough caveats to allow for others' differing experience, or at least I tried to.

Well, let me also say that the opinion I stated is also based on pretty extensive personal experience with room correction, not only with "ordinary" rooms like the ones we usually have at our homes, but also with professional recording studios.

What may be the subject of the discussion here is the size and shape of the "small" space of the single seat in relation to the distance to the speakers, the type of speakers, the type of room treatment, and whether the listener is sitting behind the mixing console or on a sofa in the room, to name just some of the factors. But if we assume an "ordinary" untreated room, "ordinary" consumer speakers and a listening distance of approx. 3 meters, then the Dirac single seat measurement area in which the multipoint measurement should be taken pretty much gets it right.

j_j · Oct 28, 2021

dannut said:
Actually I have a lot of respect for what @j_j & @amirm & co achieved in Windows Vista, of all things! That was also a mike-in-one-point-in-the-room solution. The only thing missing was user-adjustability of the spectral tilt, so kinda useless in the real world. You should try http://drc-fir.sourceforge.net/ where any and every imaginable parameter can be modified. Or not: sadly, it doesn't have a UI.

Unless you used the option that claimed you had a good microphone, it didn't even flatten room response. What it did was make the time, gain, and frequency response the same with all speakers at the microphone point. It did account for basic psychoacoustics.

I am amazed that it's still there in W10, and still has the same broken UI wherein sometimes training fails due to file ownership. I doubt that will ever be fixed.

dasdoing · Oct 28, 2021

I have a hard time understanding the use of Acourate's envelope graph and the FDW combined.
AFAIUI the envelope should represent what we hear (speakers and room combined), while the FDW is direct sound only. It doesn't seem to make sense to use them both at the same time.

markus · Oct 28, 2021

mitchco said:
A deep dive presentation on the fundamentals of "proper" Digital Room Correction (DRC). Includes hands-on DSP FIR Filter Designer demos using Acourate and Audiolense.

Having participated in many audio forum discussions, having watched online videos on Digital Room Correction (or DRC), and having reviewed over a dozen DRC products over the past 11 years, I have come to two conclusions. One is that there is considerable misunderstanding about DRC, how it works and even what problems DRC is trying to solve. And, just as important, understanding what is possible using the SOTA of DRC. I hope you find the content educational and practical.

First of all thank you for taking the time to make this video and explaining your process with Acourate and other software which you – as I understand it – offer as a commercially available service. No excuse for people anymore to not know about the fundamentals.

As much as I tend to agree with JJ's "laws", there has been no formal verification, i.e. scientific studies. Such a study would need to include many different configurations: small rooms, large rooms, different reverberation times, different ratios of direct and reflected sound, reflection angles, timing and spectrum, stereo, multichannel, etc.

Regarding your approach, some remarks:

Latency with ultra long FIR filters: The signal has to pass the whole filter first before sound comes out at the end. This prohibits such filters for any application that requires (near) realtime processing like gaming or really any video streaming. One could build a video buffer to sync audio and video but at this point such a solution does NOT exist in consumer AV space (and the gaming problem remains).

Measuring the (quasi) anechoic speaker response: This is virtually impossible even when using windowing as room boundaries, furniture (seat back!) and objects are too close to the microphone. The magnitude response is skewed and the resolution is coarse.

Psychoacoustic filtering: This is something Uli Brüggemann introduced without providing any information on how he arrived at it. It seems more like an educated guess that this is closer to what we hear, but it is certainly not backed up by any scientific study (I know of).
"We don't hear dips (as much)": While I agree, they greatly contribute to perceived overall timbre. Simply ignoring them (by visually filling them in) probably isn't helpful in that regard.

Audibility of pre-ringing: This is not a well researched topic either. The threshold probably depends on frequency, signal and the specific room reverberation time (masking effects).

Single mic position: We have two ears, so what do other points around a central position look like? You're only showing what looks like heavily smoothed measurements. Did these points improve too? Or are they worse? What about multiple seat optimizations?

dasdoing · Oct 28, 2021

markus said:
it is certainly not backed up by any scientific study (I know of).

According to Mr Sbragion http://drc-fir.sourceforge.net/doc/drc.html#sec35

The spectral envelope is a concept which has been introduced in the field of speech synthesis and analysis and is defined simply as a smooth curve connecting or somewhat following the peaks of the signal spectrum. There are strong arguments and experimental evidence supporting this approach and the idea that our ear uses the spectral envelope for the recognition of sounds. The spectral envelope, for example, allow our ear to understand speech under many different conditions, whether it is voiced, whispered or generated by other means. These different conditions generate completely different spectrums but usually pretty similar spectral envelopes. The spectral envelope also easily explains why our ear is more sensitive to peak in the magnitude response and less sensitive to dips. A curve based on the peaks of the magnitude response is by definition little or not affected at all by dips in the frequency response.
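
A crude illustration of a peak-following curve, written as a sliding maximum over a fractional-octave window. This is an assumption-laden sketch for intuition only, not the algorithm DRC-FIR or any other package uses, and it omits the smoothing a real envelope would apply. All names and the test response are hypothetical.

```python
def peak_envelope_db(freqs_hz, mags_db, octave_fraction=3.0):
    """For each bin, take the largest magnitude within a 1/octave_fraction octave window."""
    half = 2.0 ** (1.0 / (2.0 * octave_fraction))  # half-window as a frequency ratio
    env = []
    for f in freqs_hz:
        lo, hi = f / half, f * half
        # O(n^2) scan; fine for a sketch, too slow for long measurements.
        env.append(max(m for g, m in zip(freqs_hz, mags_db) if lo <= g <= hi))
    return env


if __name__ == "__main__":
    # Two octaves in semitone steps: flat, with one narrow dip and one narrow peak.
    freqs = [100.0 * 2 ** (i / 12) for i in range(25)]
    mags = [0.0] * 25
    mags[8] = -20.0   # narrow dip
    mags[16] = 6.0    # narrow peak
    env = peak_envelope_db(freqs, mags)
    print(env[8], env[16])
```

In this toy response the envelope passes straight over the narrow dip while the peak survives, which is the asymmetry the quoted passage describes.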

dasdoing · Oct 28, 2021

In my tests with unsmoothed graphs EQed to a flat-ish spectral envelope, the smoothed versions of those graphs strongly resemble popular house curves that were developed for smoothed graphs by trial and error.
Personally, I think this is strong evidence for this theory.

markus · Oct 28, 2021

dasdoing said:
According to Mr Sbragion http://drc-fir.sourceforge.net/doc/drc.html#sec35

"strong arguments and experimental evidence" is not the same as "scientific study"

And, what smoothing is "correct" in order to create the "right" spectral envelope?
