
RTFIR - realtime convolver for Windows

stefanyovev

Member
Joined
Jan 6, 2024
Messages
5
Likes
5
Hi,
Take a look at this real-time FIR filtering tool I made recently.
What do you think?
Does it have a future?
:)

screenshot.png


https://rtfir.com/
 

StrummerJones

Member
Joined
Apr 13, 2021
Messages
5
Likes
2
Hi, I haven't tried your SW yet but I will. Have you run any really long filters, and do you have an idea of the lengths you could run on two channels on your own PC?
 
OP

stefanyovev

Member
Joined
Jan 6, 2024
Messages
5
Likes
5
Hi, I run 10 channels with 200 taps each at 48 kHz, and that's about the maximum for smooth operation on my i7. So for 2 channels on the same PC it would run 1000 taps per channel smoothly.
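For reference, the extrapolation above can be sketched as simple arithmetic. This is a minimal sketch that assumes the tool uses direct-form (time-domain) convolution, where cost scales with channels × taps × sample rate; the thread does not confirm how RTFIR is implemented, and the function names are hypothetical.

```python
def mac_rate(channels: int, taps: int, sample_rate_hz: int) -> int:
    """Multiply-accumulates per second for direct-form FIR filtering."""
    return channels * taps * sample_rate_hz


def taps_per_channel(budget_macs_per_s: int, channels: int, sample_rate_hz: int) -> int:
    """Taps per channel that fit into a fixed MAC-per-second budget."""
    return budget_macs_per_s // (channels * sample_rate_hz)


# Budget implied by "10 channels with 200 taps each at 48 kHz".
budget = mac_rate(10, 200, 48_000)
print(budget)                               # total MACs per second
print(taps_per_channel(budget, 2, 48_000))  # taps per channel for 2 channels
```

Under that assumption the same budget spread over 2 channels gives the 1000 taps per channel quoted in the post.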

(It's on my to-do list to speed this up 4x with SIMD.)

I am happy about the interest, and I welcome criticism and requests.
 

StrummerJones

Member
Joined
Apr 13, 2021
Messages
5
Likes
2
Thanks for the reply @stefanyovev, that's a useful anchor to extrapolate from. I have a target of running +10k taps @ 48kHz on two channels for full-range (incl. LF) phase compensation.
 

ZolaIII

Major Contributor
Joined
Jul 28, 2019
Messages
4,195
Likes
2,475
Such a thing doesn't exist. First of all, it's not an RT OS, USB has its own latency, and so on. There are fast, low-latency ones and MBs that are good with micro-ops (usually server/workstation ones). So your mileage will vary: the latency you get depends on many factors.
 

voodooless

Grand Contributor
Forum Donor
Joined
Jun 16, 2020
Messages
10,404
Likes
18,364
Location
Netherlands
Hi, I run 10 channels with 200 taps each at 48 kHz, and that's about the maximum for smooth operation on my i7
That isn’t exactly good performance. Tools like CamillaDSP can do thousands of multichannel taps even on a cheap SBC. Sounds like you still have some optimizing to do ;)

A virtual soundcard would also be a useful addition. How do you handle the various clock domains?
 

StrummerJones

Member
Joined
Apr 13, 2021
Messages
5
Likes
2
Such a thing doesn't exist. First of all, it's not an RT OS, USB has its own latency, and so on. There are fast, low-latency ones and MBs that are good with micro-ops (usually server/workstation ones). So your mileage will vary: the latency you get depends on many factors.
Latency may be a function of the FIR filter design (whatever energy latency you design in) or of other system factors, but I'm curious about your statement that "Such a thing doesn't exist".
Per the reference above, the CamillaDSP project states, e.g.:
  • "Raspberry Pi 4 doing FIR filtering of 8 channels, with 262k taps per channel, at 192 kHz. CPU usage about 55%."
  • "An AMD Ryzen 7 2700u (laptop) doing FIR filtering of 96 channels, with 262k taps per channel, at 192 kHz. CPU usage just under 100%."
What anchors can you provide with respect to the FIR filter length that can be executed on a PC?
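A note on how tap counts like these become feasible: long FIR filters are usually run with FFT-based block convolution rather than one multiply per tap per sample. Below is a minimal overlap-add sketch in pure Python. It is an illustration of the general technique only, not code from CamillaDSP or RTFIR; the function names and the tiny test signal are hypothetical, and a real implementation would use an optimised FFT library and partition the filter.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT. len(x) must be a power of two; inverse is unscaled."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def direct_convolve(signal, taps):
    """Reference time-domain convolution: len(signal) * len(taps) multiplies."""
    out = [0.0] * (len(signal) + len(taps) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(taps):
            out[i + j] += s * h
    return out


def overlap_add(signal, taps, block):
    """Convolve block by block in the frequency domain and add the overlapping tails."""
    # FFT size must cover block + len(taps) - 1 so circular convolution equals linear.
    n = 1
    while n < block + len(taps) - 1:
        n *= 2
    H = fft(list(taps) + [0.0] * (n - len(taps)))  # filter spectrum, computed once
    out = [0.0] * (len(signal) + len(taps) - 1)
    for start in range(0, len(signal), block):
        chunk = signal[start:start + block]
        X = fft(list(chunk) + [0.0] * (n - len(chunk)))
        y = fft([a * b for a, b in zip(X, H)], inverse=True)
        for i in range(len(chunk) + len(taps) - 1):
            out[start + i] += y[i].real / n  # scale the unscaled inverse FFT
    return out


signal = [float((7 * i) % 5 - 2) for i in range(20)]
taps = [0.5, 0.25, -0.125, 0.0625]
reference = direct_convolve(signal, taps)
fast = overlap_add(signal, taps, block=8)
print(max(abs(a - b) for a, b in zip(reference, fast)))  # numerical difference
```

The two results agree to within floating-point rounding; the benefit of the FFT route only shows up for long filters, where the per-sample cost grows roughly with the logarithm of the FFT size instead of with the tap count.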
 

ZolaIII

Major Contributor
Joined
Jul 28, 2019
Messages
4,195
Likes
2,475
RT doesn't exist! Near real time, or latency we can't perceive, does.
Measure the micro-op latency on your own PC yourself.
In audio-only reproduction latency is not a big problem; syncing with other things like video works to an extent, but in games it doesn't work.
Grow up.
 

voodooless

Grand Contributor
Forum Donor
Joined
Jun 16, 2020
Messages
10,404
Likes
18,364
Location
Netherlands
With FIR filters, low-latency isn’t really a thing to strive for. Inherently it is not a low-latency technique.

So no, it’s not for real-time applications, nor was that claim ever made. I don’t understand the animosity here..
 

StrummerJones

Member
Joined
Apr 13, 2021
Messages
5
Likes
2
I didn't see anyone in here talking about real-time in the context of low latency at all, so I don't know where your cool vibes come from.
What I asked about here was DATA concerning the ability to execute a certain FIR filter length in real-time on a PC. Not opinions on random topics.
Nor does a system design with a longer FIR filter necessarily add any energy latency through the system.
 

ZolaIII

Major Contributor
Joined
Jul 28, 2019
Messages
4,195
Likes
2,475
With FIR filters, low-latency isn’t really a thing to strive for. Inherently it is not a low-latency technique.

So no, it’s not for real-time applications, nor was that claim ever made. I don’t understand the animosity here..
I played with complex convolution kernels (not only EQ, but EQ and various effects combined) a long time ago on low-performance ARM SoCs (Viper for Android, quad-core A5 at 1 GHz), and it worked fine even for movies. The thing is, when you do it like that, the latency will be lower than the sum of the effects in the loop, even significantly lower if it's optimised. I also did bus assessments for HPC and FPGA interconnects, along with various programming techniques (but step, shadow...) and architectures (long words, bus packing and so on).
RT doesn't exist! If you want to measure micro-ops (packed short words of various types), you have the opportunity to do so; if you don't, that's your problem.
Take a look at opening post and thread name.
 

voodooless

Grand Contributor
Forum Donor
Joined
Jun 16, 2020
Messages
10,404
Likes
18,364
Location
Netherlands
Take a look at opening post and thread name.
Yes, it says real-time. So what? I think your definition of real-time is somewhat limited. Real-time does not mean low (or zero) latency. It just means the system adheres to strict timing constraints.
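To illustrate the timing-constraint view of real-time for block-based audio: each buffer has to be processed before the next one arrives, regardless of how much delay the filter itself adds. A minimal sketch follows; the buffer sizes and processing times are made-up example values and the function names are hypothetical.

```python
def buffer_deadline_ms(frames: int, sample_rate_hz: int) -> float:
    """Time available to process one buffer before the next one is due."""
    return 1000.0 * frames / sample_rate_hz


def meets_deadline(processing_ms: float, frames: int, sample_rate_hz: int) -> bool:
    """True if the processing time fits inside the buffer period."""
    return processing_ms <= buffer_deadline_ms(frames, sample_rate_hz)


print(buffer_deadline_ms(480, 48_000))   # period of a 480-frame buffer at 48 kHz
print(meets_deadline(4.0, 480, 48_000))  # 4 ms of work per buffer
print(meets_deadline(12.0, 480, 48_000)) # 12 ms of work per buffer
```

A system that always stays inside this budget runs in real time in this sense, even if the filter adds a long, fixed delay to the signal.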

 

ZolaIII

Major Contributor
Joined
Jul 28, 2019
Messages
4,195
Likes
2,475
Yes, it says real-time. So what? I think your definition of real-time is somewhat limited. Real-time does not mean low-latency. It just means the system adheres to strict timing constraints.

It's not how big and bad the bastard is, it's how you feed it. A victim cache won't help, the predictor will have limited success, every miss or migration is very costly, and so on. The best you can do is use good programming in a suitable programming language (still mostly C) and use an optimised compiler with additional optimisation flags. Packaging to architectural specifics, fast flag (machine register store) semaphore loops, and so on. It means it stops until the instruction in the queue is done, so it can't fail out or overload the queue.
I really don't know why I bother. I left that world a couple of years ago because of clowns in the first place. I was especially into schedulers.
 
OP

stefanyovev

Member
Joined
Jan 6, 2024
Messages
5
Likes
5
... I have a target of running +10k taps @ 48kHz on two channels ...

Why exactly 10k? How did you decide on this number?
I have a strange feeling that no one will ever need more than samplerate/20 = 2400 taps?

  • "Raspberry Pi 4 doing FIR filtering of 8 channels, with 262k taps per channel, at 192 kHz. CPU usage about 55%.".
  • "An AMD Ryzen 7 2700u (laptop) doing FIR filtering of 96 channels, with 262k taps per channel, at 192 kHz. CPU usage just under 100%."

Wow... just wow. Makes me want to check it. It sounds like quite a statement. A 1.3 second delay for that filter, but 2M multiplications per second?
Is it 16-bit ints? Is there any OS running at that time...
Sounds like very marketing-pumped numbers.
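The numbers in the quoted claim can be checked with a little arithmetic. This sketch assumes "262k" means 2**18 = 262144 taps and shows what a direct-form (time-domain) implementation would cost; it says nothing about how the quoted project actually computes the filter, and the function names are hypothetical.

```python
TAPS = 262_144        # assumption: "262k" is 2**18
SAMPLE_RATE = 192_000  # Hz, from the quoted claim
CHANNELS = 8           # from the Raspberry Pi 4 claim


def filter_duration_s(taps: int, sample_rate_hz: int) -> float:
    """Length of the impulse response in seconds."""
    return taps / sample_rate_hz


def direct_macs_per_s(channels: int, taps: int, sample_rate_hz: int) -> int:
    """Multiply-accumulates per second if every tap is applied to every sample."""
    return channels * taps * sample_rate_hz


print(filter_duration_s(TAPS, SAMPLE_RATE))            # impulse response length in seconds
print(direct_macs_per_s(CHANNELS, TAPS, SAMPLE_RATE))  # direct-form MACs per second
```

The impulse response is about 1.37 s long, and a direct-form implementation would need roughly 4 × 10^11 multiply-accumulates per second, so figures like these point to a frequency-domain method rather than tap-by-tap multiplication.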

That isn’t exactly good performance. Tools like CamillaDSP can do thousands of multichannel taps even on a cheap SBC. Sounds like you still have some optimizing to do ;)
A virtual soundcard would also be a useful addition. How do you handle the various clock domains?

Well, I never claimed that. But I do claim that it is enough for a 4-way crossover.
Yes, I have more optimisations to do. I expect 4x and maybe 8x.

Clock domains were a high mountain to climb for me. I think most people resample every buffer
at whatever sample rate they currently estimate will fit, but my approach is different.
I measure the displacement on every buffer but don't apply it; I just write it down. Then, every
second, I accumulate all the displacements, and if they add up to more than one sample, I
actually move the read cursor by that amount. The program says "correction of X samples."
For me it ranges from about 1 to 3 samples, depending on the quality of the devices used.
That way I don't resample anything. Of course, the corrections are audible, so I now plan to
crossfade between the old (would-be) and new buffers at the time of correction.
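The approach described above can be sketched roughly as follows. This is only an interpretation of the description, not RTFIR's actual code: it assumes each per-buffer displacement is an incremental drift in (fractional) samples, that corrections are applied as whole samples, and that the planned fade is a linear crossfade. All class and function names are hypothetical.

```python
class DriftCorrector:
    """Accumulates measured clock drift and applies it in whole samples once per second."""

    def __init__(self, threshold_samples: float = 1.0):
        self.pending = 0.0          # drift measured but not yet applied
        self.threshold = threshold_samples
        self.read_cursor = 0        # position in the input ring buffer, in frames

    def on_buffer(self, displacement_samples: float, frames: int) -> None:
        """Note the displacement for this buffer and advance the cursor normally."""
        self.pending += displacement_samples
        self.read_cursor += frames

    def on_second(self) -> int:
        """Apply the accumulated drift if it has reached the threshold."""
        if abs(self.pending) < self.threshold:
            return 0
        correction = int(self.pending)  # whole samples, truncated toward zero
        self.read_cursor += correction
        self.pending -= correction      # keep the fractional remainder
        return correction


def crossfade(old, new):
    """Linear fade from the uncorrected buffer to the corrected one (needs >= 2 samples)."""
    n = len(old)
    return [o * (1 - i / (n - 1)) + w * (i / (n - 1))
            for i, (o, w) in enumerate(zip(old, new))]


corrector = DriftCorrector()
for _ in range(100):                    # 100 buffers of 480 frames = 1 s at 48 kHz
    corrector.on_buffer(0.013, 480)
print("correction of", corrector.on_second(), "samples.")
```

Blending the buffer read at the old cursor position with the one read at the corrected position spreads the jump over one buffer instead of producing a single discontinuity.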
 

voodooless

Grand Contributor
Forum Donor
Joined
Jun 16, 2020
Messages
10,404
Likes
18,364
Location
Netherlands
Is it 16 bit ints ? Is there any OS running on that time ...
Sounds like very marketing-pumped numbers.
No, it’s double-precision float, and it can run on Linux, macOS or Windows. As for marketing… it’s just a guy like you, and it’s open source. You figure it out ;)
 
OP

stefanyovev

Member
Joined
Jan 6, 2024
Messages
5
Likes
5
Seems like SSE2, but still, such a difference... We'll see.

Btw, a virtual sound card requires an OS driver, which requires a special signing certificate, which requires an existing company,
which requires me to pay my own salary every month plus all the taxes on it, which adds up to several hundred per month. :)
I use the built-in Stereo Mix of my Realtek. And the Voicemeeter VB-Cable is not so bad.
 
OP

stefanyovev

Member
Joined
Jan 6, 2024
Messages
5
Likes
5
@ZolaIII
I see what you are trying to say. Yes, gaming has even stricter latency requirements than live performance.
I've heard that musicians get confused if they hear their sound delayed more than 3-4-5 ms. And if gaming is a sport,
we need less than that. But say you listen to speakers one meter away: the sound waves need about 3 ms just to reach
your ears. So you couldn't even play on speakers in the first place...
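The 3 ms figure follows from the speed of sound. A small sketch, assuming 343 m/s (dry air at roughly 20 °C); the function names are hypothetical.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumption: dry air at about 20 degrees C


def acoustic_delay_ms(distance_m: float) -> float:
    """Time for sound to travel the given distance, in milliseconds."""
    return 1000.0 * distance_m / SPEED_OF_SOUND_M_S


def distance_for_delay_m(delay_ms: float) -> float:
    """Distance sound travels in the given time, in meters."""
    return SPEED_OF_SOUND_M_S * delay_ms / 1000.0


print(acoustic_delay_ms(1.0))     # delay for a speaker one meter away
print(distance_for_delay_m(5.0))  # distance equivalent to 5 ms of delay
```

One meter of air corresponds to a little under 3 ms, so a few milliseconds of processing delay is comparable to moving the speaker a meter or two further away.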
 

StrummerJones

Member
Joined
Apr 13, 2021
Messages
5
Likes
2
@stefanyovev
Why exactly 10k? How did you decide on this number?
I have a strange feeling that no one will ever need more than samplerate/20 = 2400 taps?
As I said, I intend to run full-range (incl. LF) phase corrections. And as you know, frequency resolution is inversely proportional to the number of FIR filter taps: more taps give finer resolution. As a rule of thumb you can estimate frequency resolution as 3*Fs/#Taps, hence 3*48k/10k gives you ~14.4 Hz. While all-pass filtering for a correct full-range phase response with FIR, I can easily implement the desired magnitude correction within that same filter.
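The rule of thumb quoted above is easy to tabulate. This sketch simply evaluates 3*Fs/#Taps as stated in the post; the factor of 3 is the poster's rule of thumb, not an exact property of FIR filters, and the function names are hypothetical.

```python
import math


def freq_resolution_hz(sample_rate_hz: float, taps: int, factor: float = 3.0) -> float:
    """Approximate frequency resolution from the rule of thumb factor * Fs / taps."""
    return factor * sample_rate_hz / taps


def taps_for_resolution(sample_rate_hz: float, resolution_hz: float, factor: float = 3.0) -> int:
    """Smallest tap count that reaches the requested resolution under the same rule."""
    return math.ceil(factor * sample_rate_hz / resolution_hz)


print(freq_resolution_hz(48_000, 10_000))  # the 10k-tap target
print(freq_resolution_hz(48_000, 2_400))   # the samplerate/20 suggestion
print(taps_for_resolution(48_000, 20.0))   # taps for 20 Hz resolution
```

Under this rule, 2400 taps at 48 kHz resolve about 60 Hz, which is why low-frequency phase correction pushes the tap count toward 10k and beyond.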

Wow... just wow. Makes me wanna check it.
Sounds like very marketing-pumped numbers.
Yeah, it sounds quite amazing, and I have no experience with the CamillaDSP project, with filter optimizations on x86/ARM architectures, or with the implications of a high-level OS, hence my question. I have been implementing FIR filter kernels for fixed-point DSPs executing full single-cycle sint16 Dual-MAC instructions (i.e. deterministic single-MAC/cycle/channel) @ ~168MHz for embedded audio products, which provides at least one performance anchor.
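As a rough reading of that DSP anchor: with one MAC per cycle per channel, the upper bound on direct-form taps per channel is the clock rate divided by the sample rate. This sketch ignores all loop, memory and I/O overhead, which the post does not quantify, and the function name is hypothetical.

```python
def max_taps_per_channel(clock_hz: int, sample_rate_hz: int,
                         macs_per_cycle_per_channel: int = 1) -> int:
    """Upper bound on direct-form FIR taps per channel, ignoring all overhead."""
    return (clock_hz * macs_per_cycle_per_channel) // sample_rate_hz


print(max_taps_per_channel(168_000_000, 48_000))  # ideal bound at 48 kHz
```

Under these idealised assumptions the bound is 3500 taps per channel at 48 kHz, which gives a sense of scale for the 10k-tap target.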
His numbers could be wrong, but would be quite remarkable if they were simply lies.
 

RayDunzl

Grand Contributor
Central Scrutinizer
Joined
Mar 9, 2016
Messages
13,250
Likes
17,199
Location
Riverview FL
I've heard that musicians get confused if they hear their sound delayed more than 3-4-5 ms.

If electrified, do they have to stand that close (3, 4, 5 feet) to their speakers?
 