I think there was at one time a cult devoted to brute-force convolution, i.e. explicit, exhaustive time-domain multiply-and-adds, as opposed to the much more economical multiplication in the frequency domain made possible by the FFT. In exact arithmetic the two give identical results, so any difference comes down to rounding. Certainly, if it could be shown that the two versions produce different numerical outputs, that might suggest audible differences, provided those differences were large enough.
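For anyone who wants to check this themselves, here is a minimal sketch in Python (standard library only, so slow, and certainly not how FFTW does it) that computes the same convolution both ways on random test data and reports the largest difference. The function names, the toy recursive FFT and the test lengths are my own choices, purely for illustration.

```python
import cmath
import random

def fft(a, invert=False):
    """Toy recursive radix-2 FFT; len(a) must be a power of two.
    The inverse transform is left unscaled (the caller divides by n)."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def direct_convolve(x, h):
    """Brute-force time-domain convolution: every multiply-and-add done explicitly."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

def fft_convolve(x, h):
    """Linear convolution via the FFT: zero-pad both inputs to a power of two
    at least len(x) + len(h) - 1 long, multiply the spectra, transform back."""
    m = len(x) + len(h) - 1
    n = 1
    while n < m:
        n *= 2
    X = fft([complex(v) for v in x] + [0j] * (n - len(x)))
    H = fft([complex(v) for v in h] + [0j] * (n - len(h)))
    y = fft([a * b for a, b in zip(X, H)], invert=True)
    return [v.real / n for v in y[:m]]

random.seed(1)
x = [random.uniform(-1, 1) for _ in range(1000)]  # stand-in "audio" signal
h = [random.uniform(-1, 1) for _ in range(300)]   # stand-in impulse response
y_direct = direct_convolve(x, h)
y_fft = fft_convolve(x, h)
max_err = max(abs(a - b) for a, b in zip(y_direct, y_fft))
print(f"largest difference: {max_err:.3e}")
```

In double precision I would expect the two outputs to differ only by rounding noise many orders of magnitude below a 24-bit quantisation step, but run it and see.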
My own 'convolution engine' code is based on the frequency-domain variant, and I have never worried too much about it. I simply use the FFTW library to do the forward and inverse FFTs. No smoothly tapered window is applied before calculating the FFTs, because the system can use overlap-and-add, which works fine with a rectangular window: provided each block is zero-padded so that its full convolution with the impulse response fits in the FFT, the overlapping tails are simply added together.
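To illustrate why no tapered window is needed, here is a minimal overlap-add sketch in Python. It is standard library only: a toy radix-2 FFT stands in for FFTW, and the block size of 256 and the function names are arbitrary choices of mine. Each plain rectangular block is zero-padded so its whole convolution fits in the FFT without circular wrap-around, and the tails are added into the output.

```python
import cmath
import random

def fft(a, invert=False):
    """Toy recursive radix-2 FFT (stand-in for FFTW); inverse is unscaled."""
    n = len(a)
    if n == 1:
        return list(a)
    even, odd = fft(a[0::2], invert), fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + t, even[k] - t
    return out

def overlap_add(x, h, block=256):
    """Overlap-add convolution: chop x into rectangular (untapered) blocks,
    convolve each with h via a zero-padded FFT, and add the overlapping tails."""
    m = block + len(h) - 1  # length of one block's linear convolution
    n = 1
    while n < m:
        n *= 2  # FFT size large enough that nothing wraps around
    H = fft([complex(v) for v in h] + [0j] * (n - len(h)))  # computed once
    y = [0.0] * (len(x) + len(h) - 1)
    for start in range(0, len(x), block):
        seg = x[start:start + block]
        X = fft([complex(v) for v in seg] + [0j] * (n - len(seg)))
        yb = fft([a * b for a, b in zip(X, H)], invert=True)
        for k in range(len(seg) + len(h) - 1):
            y[start + k] += yb[k].real / n  # tail overlaps the next block
    return y

def direct_convolve(x, h):
    """Brute-force reference for comparison."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

random.seed(3)
x = [random.uniform(-1, 1) for _ in range(1000)]
h = [random.uniform(-1, 1) for _ in range(100)]
y_ola = overlap_add(x, h)
y_ref = direct_convolve(x, h)
max_err = max(abs(a - b) for a, b in zip(y_ola, y_ref))
print(f"overlap-add vs direct, largest difference: {max_err:.3e}")
```

Because the zero-padding already guarantees the blocks recombine exactly, a taper would only be needed if the blocks themselves overlapped, which in overlap-add they do not.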
Numerical accuracy could also be influenced by floating point versus fixed point, rounding errors, 32-bit versus 64-bit arithmetic, whether dither is applied, etc. Hopefully such differences would be minuscule in terms of audibility, but as long as there is any difference at all, people can claim they hear it. If two systems produce the same numerical output then of course they cannot sound different, though it would then be claimed that they stress the PC power supply differently, etc.
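As a rough illustration of the 32-bit versus 64-bit point only (it says nothing about fixed-point or dithered systems), this Python sketch runs the same brute-force convolution twice: once in ordinary double precision, and once with every intermediate result rounded to single precision via the struct module. It reports the worst difference in dB relative to full scale. The lengths, scaling and seed are arbitrary assumptions of mine.

```python
import math
import random
import struct

def to_f32(v):
    """Round a Python float (64-bit) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", v))[0]

def convolve(x, h, f32=False):
    """Direct convolution; with f32=True each product and each running sum
    is rounded to single precision, mimicking a 32-bit float engine."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            if f32:
                y[i + j] = to_f32(y[i + j] + to_f32(xi * hj))
            else:
                y[i + j] += xi * hj
    return y

random.seed(2)
# Inputs are made exactly representable in float32 so only the arithmetic differs;
# sum(|h|) <= 1 keeps the output within full scale (+/-1).
x = [to_f32(random.uniform(-0.5, 0.5)) for _ in range(400)]
h = [to_f32(random.uniform(-1, 1) / 256) for _ in range(256)]
y64 = convolve(x, h)
y32 = convolve(x, h, f32=True)
err = max(abs(a - b) for a, b in zip(y64, y32))
err_db = 20 * math.log10(err) if err > 0 else float("-inf")
lsb_db = 20 * math.log10(2 ** -23)  # one LSB of a 24-bit signal, full scale = 1
print(f"worst 32-bit vs 64-bit difference: {err_db:.1f} dBFS (24-bit LSB: {lsb_db:.1f} dBFS)")
```

Comparing the reported figure against the 24-bit LSB gives a feel for whether single-precision rounding could plausibly matter at all for a given filter length.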