Here we go. File is mono content, so only one channel was processed in the following.
Original track at the largest ISO, with a peak level of more than +5 dBFS, which is beyond what any DAC I'm aware of would handle without clipping:
View attachment 323918
Reference track (upsampled to 705.6 kHz 32 bit, -6 dB gain applied, downsampled to 176.4 kHz 16 bit):
View attachment 323920
Clipped (same as above, with hard-clipping at -6dBFS before the final downsampling):
View attachment 323921
Note: The "dimple on the roof" comes from the downsampling as theory says it should (ringing from the linear phase resampling).
Now let's have some fun comparing the two versions (ABX logs welcomed). I didn't bother as I'm heavily biased anyway towards "no difference" from previous experiences.
I'm aware that this simple test might not fully represent what could happen in practice (notably, the analog output stage of a DAC might run into saturation with a sticky recovery even when the DAC chip itself could handle +6 dBFS peaks properly). But it is a starting point at least.
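For anyone who wants to play with the idea without an audio editor, here is a minimal sketch of the gain-then-clip step only. It is not the processing chain used for the attachments: the resampling is skipped entirely, and the "upsampled track" is a stand-in, an fs/4 sine with amplitude sqrt(2) evaluated directly on a 16x finer grid. All names (GAIN, CLIP, OVERSAMPLE, hard_clip) are made up for the illustration.

```python
import math

GAIN = 10 ** (-6 / 20)   # -6 dB as a linear factor (about 0.501)
CLIP = 10 ** (-6 / 20)   # hard-clip threshold at -6 dBFS
OVERSAMPLE = 16          # assumed oversampling factor for the stand-in signal


def dbfs(x):
    """Level of a linear magnitude in dB relative to full scale."""
    return 20 * math.log10(abs(x))


def hard_clip(x, limit):
    """Limit x to the range [-limit, +limit]."""
    return max(-limit, min(limit, x))


# Stand-in for the upsampled track: fs/4 sine, amplitude sqrt(2), phase pi/4,
# so the samples on the original grid sit exactly at +-1 (0 dBFS).
dense = [
    math.sqrt(2) * math.sin(2 * math.pi * 0.25 * n / OVERSAMPLE + math.pi / 4)
    for n in range(4 * OVERSAMPLE)
]

reference = [GAIN * x for x in dense]             # -6 dB gain, no clipping
clipped = [hard_clip(x, CLIP) for x in reference]  # same, clipped at -6 dBFS

ref_peak = max(abs(x) for x in reference)
clip_peak = max(abs(x) for x in clipped)
print(f"reference peak: {dbfs(ref_peak):.2f} dBFS")
print(f"clipped peak:   {dbfs(clip_peak):.2f} dBFS")
```

The reference keeps the roughly 3 dB of intersample headroom of this test signal, while the clipped version is flattened at the level that corresponded to 0 dBFS before the gain was applied.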
Slightly more than four samples between the negative peaks, with two zero crossings, means the period is slightly more than four samples, so the signal is very close to fs/4.
This got me thinking. For any half-cycle of a sine below fs/2 (from 0 to pi, pi to 2 pi, etc.), the maximum intersample over occurs when the two samples are placed symmetrically about the peak and thus have the same value. This is of course why, for fs/4, sampling is at pi/4 and 3pi/4 and the sample values are sqrt(1/2). For a non-clipped, constant-amplitude sine, fs/4 gives the maximum intersample overs. However, if the amplitude is varied, higher frequencies may be used.
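The symmetric-placement claim for fs/4 can be checked with a quick brute-force scan over the phase. This is only a sketch: the 1-degree phase grid and the function name max_sample_magnitude are arbitrary choices for the illustration.

```python
import math


def max_sample_magnitude(k, phi, count=64):
    """Largest |sample| of sin(2*pi*k*n + phi) over the first `count` samples."""
    return max(abs(math.sin(2 * math.pi * k * n + phi)) for n in range(count))


k = 0.25      # signal at fs/4
steps = 90    # scan phi from 0 to pi/2 in 1-degree steps
candidates = [i * (math.pi / 2) / steps for i in range(steps + 1)]

# The phase with the smallest maximum sample leaves the most room between
# the samples and the true peak of 1, i.e. the largest intersample over.
best_phi = min(candidates, key=lambda phi: max_sample_magnitude(k, phi))
best_max = max_sample_magnitude(k, best_phi)
over_db = 20 * math.log10(1 / best_max)

print(f"best phi = {best_phi / math.pi:.3f}*pi")
print(f"largest sample = {best_max:.4f}, intersample over = {over_db:.2f} dB")
```

The scan lands on phi = pi/4, where every sample has magnitude sqrt(1/2) and the true peak sits about 3.01 dB above the samples.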
A sine signal at frequency f = k*fs, where 0 < k < 1/2, with phase phi, has the function
x(t) = sin(2*pi*f*t + phi) = sin(2*pi*k*fs*t + phi).
Sampling is done at times tn = n/fs. Thus, the sampled values are
x(n) = sin(2*pi*k*fs*tn + phi) = sin(2*pi*k*fs*n/fs + phi) = sin(2*pi*k*n + phi).
For the maximum value over the adjacent sample values, x(n0) = x(n0+1) and the fictional sample of the maximum is at x(n0+1/2) = 1 or x(n0+1/2) = -1 for a minimum. Since asin(+-1) = pi-+pi/2, it is the case that 2*pi*k*(n0+1/2) + phi = pi-+pi/2, which gives that n0+1/2 = (1-+1/2)/(2k) - phi/(2*pi*k) which gives
n0 = (1-+1/2 - phi/pi)/(2k) - 1/2 = (1 - k -+ 1/2 - phi/pi)/(2k).
Thus, the other sampling points are at
n = n0 + dn = (1-+1/2 - phi/pi)/(2k) - 1/2 + dn = ( 1 + (2*dn-1)*k -+1/2 - phi/pi)/(2k).
This gives the sample values
x(n0) = sin(2*pi*k*(1 - k -+ 1/2 - phi/pi)/(2k) + phi) = sin(pi*(1 - k -+ 1/2))
and x(n0+1) = sin(pi*(1 + k -+ 1/2)).
Thus, the signal around that point can be scaled by the gain G = 1/x(n0) without either adjacent sample exceeding 1, and the peak between the samples is scaled by the same factor, reaching 1/x(n0), which exceeds 1.
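As a sketch, the gain can be tabulated for a few values of k. This only looks at the two samples adjacent to the positive maximum, so it assumes the amplitude elsewhere is shaped such that no other sample exceeds 1. The function names are made up for the illustration.

```python
import math


def adjacent_sample_value(k):
    """x(n0) for the positive maximum: sin(pi*(1 - k - 1/2)) = cos(pi*k)."""
    return math.sin(math.pi * (0.5 - k))


def intersample_over_db(k):
    """Peak level in dBFS after scaling by G = 1/x(n0)."""
    return 20 * math.log10(1 / adjacent_sample_value(k))


for k in (0.25, 1 / 3, 0.4, 20 / 44.1):
    x0 = adjacent_sample_value(k)
    print(f"k = {k:.4f}: x(n0) = {x0:.4f}, "
          f"G = {1 / x0:.3f}, peak = {intersample_over_db(k):+.2f} dBFS")
```

The values grow without bound as k approaches 1/2, since the adjacent samples move towards the zero crossings.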
In order for n0 to be an integer, phi has to have the correct value, i.e. (1 - k -+ 1/2 - phi/pi)/(2k) = m where m is an integer. Thus, 1 - k -+ 1/2 - phi/pi = 2*k*m and thus
phi/pi = 1 - k -+ 1/2 - 2*k*m = 1 - k*(1+2*m) -+ 1/2.
Thus, the phase has to be
phi = (1 - k*(1+2*m) -+ 1/2)*pi.
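A quick numerical check of the phase formula, as a sketch: for a given k and integer m, the two samples at n = m and n = m+1 should be equal, and the value halfway between them should be +1 (maximum) or -1 (minimum). The helper names phase_for_peak and x are made up for the illustration.

```python
import math


def phase_for_peak(k, m, positive=True):
    """phi = (1 - k*(1+2*m) -+ 1/2)*pi; upper sign for a maximum."""
    half = 0.5 if positive else -0.5
    return (1 - k * (1 + 2 * m) - half) * math.pi


def x(n, k, phi):
    """Sine at k*fs evaluated at (possibly fractional) sample index n."""
    return math.sin(2 * math.pi * k * n + phi)


k = 20 / 44.1
for m, positive in ((0, True), (3, True), (3, False)):
    phi = phase_for_peak(k, m, positive)
    print(f"m = {m}, {'max' if positive else 'min'}: "
          f"phi = {phi / math.pi:+.4f}*pi, "
          f"x(m) = {x(m, k, phi):+.4f}, "
          f"x(m+1) = {x(m + 1, k, phi):+.4f}, "
          f"x(m+1/2) = {x(m + 0.5, k, phi):+.4f}")
```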
For k = 1/4, taking the positive maximum, n0 = 0 if phi = pi/4 and n0 = 1 if phi = -pi/4. The sample values adjacent to the maximum are sin(pi/4) = sqrt(1/2). Thus, the signal can be increased by sqrt(2) without exceeding 1, giving intersample peaks of sqrt(2) or +3.01 dBFS. In the limit at k = 1/2, n0 = -phi/pi, which is 0 if phi = 0 and 1 if phi = -pi. The sample values adjacent to the maximum are sin(0) = 0, with the inverse being infinitely large. For a 20 kHz signal sampled at 44.1 kHz, for the maximum, n0 = 0 if
phi = (1 - 20/44.1 - 1/2)*pi = (1/2 - 20/44.1)*pi = (44.1-2*20)/(2*44.1)*pi = 4.1/88.2*pi.
The sample values adjacent to the maximum are x(n0) = x(n0+1) = sin(pi*4.1/88.2) = 0.1455, which is -16.74 dBFS. The following sample values are
x(n0+2) = sin((1 + (2*2-1)*20/44.1 - 1/2)*pi) = -0.4242, which is -7.45 dBFS,
x(n0+3) = sin((1 + (2*3-1)*20/44.1 - 1/2)*pi) = 0.6670, which is -3.52 dBFS,
x(n0+4) = sin((1 + (2*4-1)*20/44.1 - 1/2)*pi) = -0.8533, which is -1.38 dBFS,
x(n0+5) = sin((1 + (2*5-1)*20/44.1 - 1/2)*pi) = 0.9673, which is -0.29 dBFS, and
x(n0+6) = sin((1 + (2*6-1)*20/44.1 - 1/2)*pi) = -0.9994, which is -0.0055 dBFS.
Thus, if the amplitude is ramped up the right way, very high intersample overs may be produced: here up to 1/0.1455 = 6.87, or +16.74 dBFS.
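The numbers for the 20 kHz case can be reproduced with a few lines. This is only a sketch of the arithmetic above (positive maximum, unit-amplitude sine), with sample_after_peak as a made-up helper name.

```python
import math

k = 20 / 44.1  # 20 kHz sampled at 44.1 kHz


def sample_after_peak(dn, k):
    """x(n0 + dn) = sin(pi*(1 + (2*dn - 1)*k - 1/2)) for the positive maximum."""
    return math.sin(math.pi * (1 + (2 * dn - 1) * k - 0.5))


def dbfs(value):
    """Level of a linear magnitude in dB relative to full scale."""
    return 20 * math.log10(abs(value))


for dn in range(1, 7):
    v = sample_after_peak(dn, k)
    print(f"x(n0+{dn}) = {v:+.4f}  ({dbfs(v):.4f} dBFS)")

# Scaling so that the two samples next to the peak just reach full scale
gain = 1 / sample_after_peak(1, k)
print(f"gain = {gain:.2f}, intersample peak = {dbfs(gain):+.2f} dBFS")
```

The samples approach full scale only gradually (a unit-amplitude sine reaches -0.0055 dBFS at n0+6), which indicates how quickly the envelope has to rise around the peak for the surrounding samples to stay at or below 0 dBFS.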