Pure FIR, brute force... it's a loop over the samples, then another loop inside it over the taps, plus some shifting... that inner part is what takes a lot of time, but it's the most basic and accurate way.
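
To make the two loops concrete, here's a rough sketch in Python. It assumes the "shifting" means moving a delay line along by one slot per sample, and the names (`fir_brute_force`, `delay`) are just made up for illustration, not from any particular library:

```python
def fir_brute_force(samples, taps):
    """Direct-form FIR: one multiply-accumulate per tap, per input sample."""
    delay = [0.0] * len(taps)  # delay line, newest sample kept at index 0
    out = []
    for x in samples:  # outer loop: one pass per input sample
        # shift the delay line by one slot, oldest sample falls off the end
        for i in range(len(delay) - 1, 0, -1):
            delay[i] = delay[i - 1]
        delay[0] = x
        # inner loop: multiply-accumulate across all taps
        acc = 0.0
        for h, d in zip(taps, delay):
            acc += h * d
        out.append(acc)
    return out
```

Feeding it a single impulse, e.g. `fir_brute_force([1, 0, 0, 0], [0.5, 0.25])`, should just play the taps back out, which is a quick sanity check. The cost is len(samples) * len(taps) multiply-accumulates, plus the shifting on top.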
2048 samples * x taps doesn't seem like a lot to compute... but at 192 kHz those samples are coming in hot!
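
Quick back-of-envelope version of that, as a sketch. The tap count is left as "x" above, so the 1024 used in the example call is purely an assumed number, and `fir_budget` is a hypothetical helper name:

```python
def fir_budget(block_size, sample_rate_hz, num_taps):
    """Rough time and work budget for a brute-force FIR on one channel."""
    block_seconds = block_size / sample_rate_hz   # time before the next block arrives
    macs_per_block = block_size * num_taps        # multiply-accumulates per block
    macs_per_second = sample_rate_hz * num_taps   # sustained multiply-accumulate rate
    return block_seconds, macs_per_block, macs_per_second


# 2048-sample block at 192 kHz; 1024 taps is an assumption for illustration
block_seconds, macs_per_block, macs_per_second = fir_budget(2048, 192_000, 1024)
```

A 2048-sample block at 192 kHz has to be finished in roughly 10.7 ms (2048 / 192000 s), and the multiply-accumulate rate grows linearly with whatever the tap count turns out to be.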
Then...