A popular method for room correction is to apply FIR filters by convolution, and a good way to implement it is non-uniform partitioned convolution (NUPC), because it keeps latency acceptable even with long FIR filters.
However, NUPC is not a standard, fixed process: it can be configured in different ways.
One variable is the size of the initial block and of the successive blocks, which, besides affecting latency and computational load, also seems to influence the signal in various ways.
Here I would like to discuss this last aspect, to understand what the ideal block sizes are for room correction.
Unfortunately I couldn't find a practical guide on this, so I brainstormed with ChatGPT.
This is the result, whose points I would like us to confirm or refute (keep in mind that there may be translation errors):
1. Linear convolution is partition invariant
Theoretically, partitioned convolution (even non-uniform) is an equivalent way of realizing the same linear convolution. So, if everything is done correctly, the processed signal does not change compared to direct convolution.
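To convince myself of this, I put together a minimal numpy/scipy sketch (my own, not taken from any particular convolver): partition the impulse response, convolve each partition separately, delay each contribution by its partition offset, and sum. The result matches one-shot linear convolution to within floating-point precision:

```python
import numpy as np
from scipy.signal import fftconvolve

rng = np.random.default_rng(0)
x = rng.standard_normal(48000)       # 1 s of noise at 48 kHz, an arbitrary test signal
h = rng.standard_normal(8192)        # stand-in for a room-correction FIR

y_ref = fftconvolve(x, h)            # direct (one-shot) linear convolution

part = 1024                          # uniform partitioning for simplicity
y_part = np.zeros(len(x) + len(h) - 1)
for k in range(0, len(h), part):
    c = fftconvolve(x, h[k:k + part])
    y_part[k:k + len(c)] += c        # each partition's delay equals its offset k

print(np.max(np.abs(y_ref - y_part)))   # prints a tiny value: numerically the same signal
```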
2. Possible differences in the signal arise from:
a. Overlap errors
Each block must be correctly aligned and added to its time window in the convolution.
If the temporal offset associated with each block is not precise (e.g. a 256-sample offset for a 256-sample block), you can introduce phase shifts or comb filtering in the resulting signal.
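A quick self-contained illustration of this failure mode (the signal lengths and the 3-sample error are arbitrary choices of mine): mis-placing a single partition by a few samples makes the output diverge from the true convolution, with the residual behaving like a short delayed echo:

```python
import numpy as np
from scipy.signal import fftconvolve

rng = np.random.default_rng(3)
x = rng.standard_normal(48000)
h = rng.standard_normal(4096)
y_ref = fftconvolve(x, h)
part = 1024

y_bad = np.zeros(len(x) + len(h) - 1)
for i, k in enumerate(range(0, len(h), part)):
    c = fftconvolve(x, h[k:k + part])
    off = k + (3 if i == 1 else 0)          # deliberately mis-place the second partition by 3 samples
    y_bad[off:off + len(c)] += c

print(np.max(np.abs(y_ref - y_bad)))        # large error, unlike a correctly aligned partitioning
```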
b. Aliasing problems in FFT convolution
FFT-based convolution is inherently circular, so it is necessary to apply zero-padding and overlap-add (or overlap-save).
If the padding is not respected, or the FFTs are too short relative to the block and the IR segment, you can introduce time-domain aliasing (wrap-around) and alter the frequency response.
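Here is a small self-contained sketch of that constraint as I understand it: multiplying FFTs gives circular convolution, so the FFT length must be at least block length + partition length - 1, otherwise the tail wraps around onto the head:

```python
import numpy as np

rng = np.random.default_rng(1)
block = rng.standard_normal(256)     # one input block
seg = rng.standard_normal(256)       # one IR partition
lin = np.convolve(block, seg)        # true linear convolution, length 511

for nfft in (256, 512):              # 256 is too short; 512 >= 256 + 256 - 1
    circ = np.fft.irfft(np.fft.rfft(block, nfft) * np.fft.rfft(seg, nfft), nfft)
    n = min(nfft, len(lin))
    print(nfft, np.max(np.abs(circ[:n] - lin[:n])))   # large error at 256, near machine precision at 512
```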
c. Truncation and numerical precision
Small blocks truncate the IR to short segments. In NUPC, if the partitioning is too aggressive (e.g. small blocks for everything), you may miss important energy components of the IR in each block.
Furthermore, each FFT introduces floating point and round-off errors, which accumulate. If you use very small blocks, you increase the number of FFTs, thus amplifying the numerical error.
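This part I can only probe empirically. A rough, assumption-laden measurement (my own choice of lengths and dtypes, not a proof): run the same partitioned convolution in float32 with small vs. large partitions and compare each against a float64 reference:

```python
import numpy as np
from scipy.signal import fftconvolve

rng = np.random.default_rng(2)
x64 = rng.standard_normal(48000)
h64 = rng.standard_normal(8192)
y_ref = fftconvolve(x64, h64)                     # float64 reference

def partitioned_f32(x, h, part):
    x, h = x.astype(np.float32), h.astype(np.float32)
    y = np.zeros(len(x) + len(h) - 1, dtype=np.float32)
    for k in range(0, len(h), part):
        c = fftconvolve(x, h[k:k + part])
        y[k:k + len(c)] += c
    return y

for part in (64, 4096):
    err = np.max(np.abs(partitioned_f32(x64, h64, part) - y_ref))
    print(part, err)    # compare accumulated round-off for many small vs. few large partitions
```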
3. Frequency-domain considerations
The accuracy of the frequency representation is better with longer FFTs. Small blocks have short FFTs, thus a low frequency resolution.
This means that any manipulation or filtering in the frequency domain (e.g. with HRTFs or dynamic filters) will have less precision if you apply too short FFTs.
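For reference, the bin spacing of an N-point FFT at sample rate fs is fs / N, so any per-block frequency-domain processing beyond plain convolution works on a coarser grid when the block is small (48 kHz is just my example rate):

```python
fs = 48000
for n in (128, 256, 1024, 4096, 16384):
    print(f"N = {n:5d}  ->  bin spacing = {fs / n:7.2f} Hz")
```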
4. Leakage and discontinuity phenomena
If you apply time windows (e.g. Hann) on the blocks before the FFT, smaller blocks introduce more leakage.
If you do not apply windows, but the blocks do not match perfectly at the edges (especially in non-uniform cases), discontinuities and therefore transient artifacts can be generated.
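A toy illustration of the leakage part (only relevant if the blocks are actually windowed, which plain overlap-add/overlap-save convolution does not require): the same 1 kHz tone, Hann-windowed, smears over a much wider band with a short block than with a long one:

```python
import numpy as np

fs, f0 = 48000, 1000.0
for n in (128, 4096):
    t = np.arange(n) / fs
    spec = np.abs(np.fft.rfft(np.sin(2 * np.pi * f0 * t) * np.hanning(n)))
    spec /= spec.max()
    spread_hz = np.count_nonzero(spec > 10 ** (-20 / 20)) * fs / n   # bins above -20 dB
    print(n, f"{spread_hz:.0f} Hz of spectrum above -20 dB")
```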
5. Effects of Block Size on Transients
• Smaller blocks (e.g. 64-128 samples):
Allow for faster response to changes in the signal, faithfully reproducing transients.
Require more processing power, as they increase the number of FFT/IFFT operations.
• Larger blocks (e.g. 1024-4096 samples):
May cause delays in the transient response, smoothing or attenuating fast details.
They are more computationally efficient, reducing the load on the CPU.
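As a back-of-the-envelope check, and under the common assumption that the I/O latency of a non-uniform partitioned convolver is set by its first (smallest) partition while the larger later partitions mainly reduce CPU load, the latency in milliseconds is simply first block / fs:

```python
fs = 48000
for first_block in (64, 128, 256, 512, 1024):
    print(f"first partition {first_block:5d} samples  ->  ~{1000 * first_block / fs:5.2f} ms latency")
```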
Do you think these things can be confirmed?
If so, can we determine an ideal size for first and subsequent blocks?