CamillaDSP does FIR very differently: "while FIR use convolution via FFT/IFFT".
Technically, that's not FIR, or at least not FIR done the correct way.
FFT is a lot faster, but it gets there by skipping, rounding, cutting, adding and merging, and the output doesn't look too good. No wonder the Pi4 is that fast...
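
Whether the block-wise FFT route really changes the output can be checked numerically. The sketch below is not CamillaDSP's code. It is a minimal pure-Python illustration, assuming a textbook overlap-add scheme with a toy radix-2 FFT, and the names `fft`, `fir_direct`, `fir_overlap_add`, the block size of 8, and the test signal are all made up for the example. It runs the same taps through a direct time-domain FIR and through FFT/IFFT blocks ("cutting" into blocks, "adding" the overlapping tails) and reports the largest sample difference. In exact arithmetic both compute the same linear convolution, so any difference that shows up should be floating-point rounding.

```python
import cmath

def fft(x, inverse=False):
    """Toy recursive radix-2 FFT (unscaled); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def fir_direct(x, h):
    """Direct time-domain FIR: y[n] = sum over k of h[k] * x[n - k]."""
    y = [0.0] * (len(x) + len(h) - 1)
    for n, xn in enumerate(x):
        for k, hk in enumerate(h):
            y[n + k] += xn * hk
    return y

def fir_overlap_add(x, h, block=8):
    """FFT convolution via overlap-add (block size is an arbitrary choice)."""
    m = len(h)
    # FFT length: next power of two that holds one block plus the filter tail
    nfft = 1
    while nfft < block + m - 1:
        nfft *= 2
    H = fft(list(h) + [0.0] * (nfft - m))       # filter spectrum, computed once
    y = [0.0] * (len(x) + m - 1)
    for start in range(0, len(x), block):       # "cutting" the input into blocks
        seg = x[start:start + block]
        X = fft(list(seg) + [0.0] * (nfft - len(seg)))
        Y = [a * b for a, b in zip(X, H)]       # multiply spectra = convolve
        yseg = fft(Y, inverse=True)
        for i in range(len(seg) + m - 1):       # "adding" the overlapping tails
            y[start + i] += yseg[i].real / nfft # divide by nfft to scale the IFFT
    return y

# Hypothetical test data: a deterministic pseudo-signal and 5 made-up taps
x = [((7 * n) % 11) - 5.0 for n in range(40)]
h = [0.25, 0.5, 0.25, -0.1, 0.05]

y_direct = fir_direct(x, h)
y_fft = fir_overlap_add(x, h)
max_err = max(abs(a - b) for a, b in zip(y_direct, y_fft))
print("largest difference between direct and FFT output:", max_err)
```

The speed gap comes from the operation count: the direct form does one multiply per tap per sample, while the FFT form pays roughly N log N per block regardless of how long the filter is, which is why long filters become feasible on something like a Pi4.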