You can also re-normalize the coefficients for 4x upsampling. That would save you the decimation step.
The sum of the filter coefficients represents the integral under the impulse response curve. Technically it's a stair-stepped approximation, but at 513 coeffs it's close enough. That sum is the filter's amplitude gain at DC (0 Hz), which for a low-pass sets the passband level. Currently the sum is 1, but for your use case I think it should be 2.
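To check the gain claim numerically, here is a rough sketch. The windowed-sinc taps are only a stand-in for your actual 513 coefficients, and `normalize_taps` is a made-up helper name, not something from a library. It rescales the taps so they sum to a chosen value, then reads the DC gain off bin 0 of an FFT, which should match that sum.

```python
import numpy as np

def normalize_taps(taps, target_gain):
    """Scale FIR taps so that they sum to target_gain (the gain at DC)."""
    taps = np.asarray(taps, dtype=float)
    return taps * (target_gain / taps.sum())

# Placeholder filter: 513-tap Hamming-windowed sinc (an assumption, not your filter).
n = np.arange(513) - 256
taps = np.sinc(n / 8.0) * np.hamming(513)

unity = normalize_taps(taps, 1.0)    # sum of taps is 1
doubled = normalize_taps(taps, 2.0)  # sum of taps is 2

# Bin 0 of the DFT is the plain sum of the taps, i.e. the gain at 0 Hz.
dc_gain = np.abs(np.fft.fft(doubled, 4096)[0])
print(unity.sum(), doubled.sum(), dc_gain)
```

Scaling every tap by the same constant only changes the gain; it does not move the cutoff or change the filter length.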
EDIT: That will change the delay of the filter, though. And thinking about it, it should also halve the stopband frequency, right? Probably better to stay with upsampling to 8x and decimating.