This may not make the outputs exactly equivalent, mathematically speaking. But it does make them sonically equivalent.
This is where you err. Since the OUTPUT of either the single or joint filter is what drives the inner hair cell, THAT is what ***DETERMINES*** the sonic effect at the detector. There is no more filter after that, so anything that leaked through can have an effect.
This is what you need to understand. While you may never hear a 21 kHz tone, it is indeed possible that a bit of 21 kHz content leaking past the cochlear filter COULD (not "does") affect the detection. After all, it's that movement that triggers the detector. There is nothing to gain at this point by making a small incremental change that will absolutely be clear.
Nonlinearities suck.
Consider how FIRs work (constant-delay FIRs, not the general case). In a very real sense your filter has energy outside the cutoff (talking about an LPF) that is REMOVED by being antiphase in the second, symmetric half of the filter. If your "detector" detects something BEFORE the second half comes along, whoops, the nonlinearity bit you.
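A rough numerical sketch of that cancellation, under stated assumptions: a Kaiser-windowed sinc stands in for the low-pass, and the 201 taps, 22 kHz cutoff, 24 kHz stopband edge, and the helper names `kaiser_lowpass` and `stopband_peak_db` are all illustrative choices, not anything from the post. It compares the stopband of the whole filter with the stopband of only the taps before the center.

```python
import numpy as np

FS = 96_000.0        # sampling rate in Hz (assumed)
STOP_HZ = 24_000.0   # assumed stopband edge

def kaiser_lowpass(num_taps, cutoff_hz, fs_hz, atten_db):
    """Symmetric (constant-delay) windowed-sinc low-pass, unity gain at DC."""
    beta = 0.1102 * (atten_db - 8.7)              # Kaiser's rule for > 50 dB
    n = np.arange(num_taps) - (num_taps - 1) / 2  # tap index centered on zero
    fc = cutoff_hz / fs_hz                        # cutoff in cycles/sample
    h = 2 * fc * np.sinc(2 * fc * n) * np.kaiser(num_taps, beta)
    return h / h.sum()

def stopband_peak_db(taps, nfft=8192):
    """Largest magnitude at or above STOP_HZ, in dB re the full filter's DC gain."""
    mag = np.abs(np.fft.rfft(taps, nfft))
    in_stopband = np.fft.rfftfreq(nfft) >= STOP_HZ / FS  # cycles/sample
    return 20 * np.log10(mag[in_stopband].max())

h = kaiser_lowpass(201, 22_000.0, FS, 90.0)
first_half = h[: len(h) // 2]   # only the taps before the center tap

full_db = stopband_peak_db(h)
half_db = stopband_peak_db(first_half)
print(f"stopband peak, whole filter: {full_db:6.1f} dB")
print(f"stopband peak, first half:   {half_db:6.1f} dB")
```

The first half on its own should show far more stopband content than the complete filter, since the out-of-band part only cancels once the mirrored second half has arrived.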
Try this. Make yourself a really sharp antialias filter, with, say, 1 dB ripple and 90 dB rejection, in MATLAB. Make it at 96 kHz.
NOW analyze that impulse response sample by sample, using a 64-tap Hann-windowed FFT. Move that window sample by sample along the impulse response. Look what you get. Maybe THIS will show you what I'm talking about. The ear does not have to, and does NOT, consider the entirety of a filter on an impulse; it detects on the part of it that corresponds to the CURRENT TIME with its impulse response width, NOT the whole filter length.
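A minimal sketch of that experiment in Python with NumPy rather than MATLAB, with the assumptions stated up front: the filter is a Kaiser-windowed sinc (so it reaches roughly 90 dB rejection but does not reproduce the 1 dB passband ripple), and the 201 taps, 22 kHz cutoff, 24 kHz stopband edge, and the helper name `kaiser_lowpass` are illustrative picks, not values from the post. For each position of a 64-tap Hann window it reports what fraction of that window's energy lies at or above the stopband edge.

```python
import numpy as np

FS = 96_000.0        # sampling rate in Hz
STOP_HZ = 24_000.0   # assumed stopband edge
WIN = 64             # analysis window length in taps

def kaiser_lowpass(num_taps, cutoff_hz, fs_hz, atten_db):
    """Symmetric windowed-sinc low-pass with unity gain at DC."""
    beta = 0.1102 * (atten_db - 8.7)              # Kaiser's rule for > 50 dB
    n = np.arange(num_taps) - (num_taps - 1) / 2
    fc = cutoff_hz / fs_hz
    h = 2 * fc * np.sinc(2 * fc * n) * np.kaiser(num_taps, beta)
    return h / h.sum()

h = kaiser_lowpass(201, 22_000.0, FS, 90.0)

# Stopband of the filter taken as a whole, for comparison.
mag = np.abs(np.fft.rfft(h, 8192))
full_stop_db = 20 * np.log10(mag[np.fft.rfftfreq(8192) >= STOP_HZ / FS].max())

# Slide the Hann window one sample at a time along the impulse response.
hann = np.hanning(WIN)
stop_bins = np.abs(np.fft.fftfreq(WIN)) >= STOP_HZ / FS   # both +f and -f bins
local_db = []
for start in range(len(h) - WIN + 1):
    power = np.abs(np.fft.fft(h[start:start + WIN] * hann)) ** 2
    local_db.append(10 * np.log10(power[stop_bins].sum() / power.sum()))

worst_local_db = max(local_db)
print(f"whole-filter stopband peak:          {full_stop_db:6.1f} dB")
print(f"worst per-window out-of-band share:  {worst_local_db:6.1f} dB")
print(f"window start of worst case: tap {int(np.argmax(local_db))}")
```

Plotting `local_db` against window position gives the sliding picture described above. The per-window figures are relative to each window's own energy, so they are not directly comparable to absolute rejection, but they do show that a short-time view never sees the full 90 dB.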
Hence my willingness to move to 64 kHz, where any presently conceivable mechanism can be ignored. Going to 96, or 128, or 192, etc., simply makes storage a lot harder. Remember, when you double the sampling rate, you now require filters that are twice as long, at twice the rate, for 4x the calculations in an FIR. Furthermore, many IIRs that work just fine in single precision require double precision when you raise the sampling rate.
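The 4x figure can be checked with back-of-envelope arithmetic. This sketch assumes the transition band stays the same width in Hz (4 kHz here, an arbitrary pick), that the filter produces output at the full sampling rate, and that Kaiser's length estimate is a fair proxy for tap count; the function names are hypothetical.

```python
import math

def kaiser_taps(fs_hz, transition_hz, atten_db):
    """Kaiser's estimate of FIR length for a given transition width and rejection."""
    d_omega = 2 * math.pi * transition_hz / fs_hz   # transition width in rad/sample
    return math.ceil((atten_db - 7.95) / (2.285 * d_omega) + 1)

def macs_per_second(fs_hz, transition_hz=4_000.0, atten_db=90.0):
    """Multiply-accumulates per second: one pass over all taps per output sample."""
    return kaiser_taps(fs_hz, transition_hz, atten_db) * fs_hz

taps_64k = kaiser_taps(64_000.0, 4_000.0, 90.0)
taps_128k = kaiser_taps(128_000.0, 4_000.0, 90.0)
ratio = macs_per_second(128_000.0) / macs_per_second(64_000.0)

print(f"taps at 64 kHz:  {taps_64k}")
print(f"taps at 128 kHz: {taps_128k}")
print(f"cost ratio for doubling the rate: {ratio:.2f}")
```

The ratio lands close to 4 rather than exactly on it only because the tap count is rounded up to an integer.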
Somewhere I have a nice photo of sliding an analysis window, reasonably fit to the HF cochlear filter length, along a very tight antialias filter. It's been a while, and nobody's debated this for quite a while, but it shows quite graphically how this can go wrong.