As @Kal Rubinson indicated, the basilar membrane of the cochlea in mammalian ears performs something like a Fourier (frequency-based) decomposition of the sound, whether the cochlea is excited by air conduction or bone conduction. The basilar membrane resonates locally at different locations along its coiled length in response to tones of different frequencies, ranging from around 20 kHz at its base to around 20 Hz at its apex in human ears. This stimulates the hair cells at that location to release synaptic neurotransmitters that in turn stimulate the cochlear nerve fibers at that location to fire, which the brain perceives as a tone of the corresponding pitch. Check out the linear differential equation representation of a simple driven damped harmonic oscillator in the Wikipedia page on "Resonance". If sound tone amplitudes are within normal hearing levels, the vibration response of the basilar membrane may be linear(-ish) with respect to the tone amplitude at any fixed frequency. However, at any fixed location along the membrane, the response is nonlinear with respect to the tone frequency, due to the resonance. As there is no basilar membrane location, with its corresponding hair cells, that resonates (above the threshold needed for the neurons to fire) in response to tones of frequency greater than about 20 kHz in humans, this mechanism cannot sense tones of greater frequency (ultrasonic). Incidentally, this is why the Fletcher-Munson iso-loudness-level curves in the SPL vs frequency plane are depicted as rising vertically at around 20 kHz: no matter how high the SPL of an ultrasonic tone, you are not going to sense it by this mechanism. The ultrasonic tones get partially reflected off the eardrum, and the rest propagate through the middle and inner ears and other tissue and eventually die out as their energy dissipates as heat.
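To make the "linear in amplitude, nonlinear in frequency" point concrete, here is a minimal Python sketch of the textbook steady-state amplitude of a driven damped harmonic oscillator, m*x'' + c*x' + k*x = F*cos(w*t). It treats one location on the membrane as a single lumped oscillator, which is a big simplification, and every number in it (mass, 1 kHz resonance, damping ratio) is a hypothetical value picked for illustration, not a measured cochlear parameter.

```python
import math

def steady_state_amplitude(force, mass, damping, stiffness, freq_hz):
    """Steady-state displacement amplitude of m*x'' + c*x' + k*x = F*cos(w*t)."""
    w = 2.0 * math.pi * freq_hz
    return force / math.sqrt((stiffness - mass * w**2) ** 2 + (damping * w) ** 2)

# Hypothetical, illustrative values only (assumptions, not cochlear data).
MASS = 1.0e-9                                        # kg
F_RES = 1000.0                                       # Hz, resonance of this "location"
STIFFNESS = MASS * (2.0 * math.pi * F_RES) ** 2      # N/m, chosen to give F_RES
ZETA = 0.05                                          # damping ratio (assumed)
DAMPING = 2.0 * ZETA * math.sqrt(STIFFNESS * MASS)   # N*s/m

for freq in (250.0, 500.0, 1000.0, 2000.0, 4000.0):
    single = steady_state_amplitude(1.0e-9, MASS, DAMPING, STIFFNESS, freq)
    double = steady_state_amplitude(2.0e-9, MASS, DAMPING, STIFFNESS, freq)
    # Doubling the force doubles the amplitude at every frequency (linear),
    # while the amplitude itself peaks sharply near F_RES (nonlinear in frequency).
    print(f"{freq:6.0f} Hz  amplitude {single:.3e} m  ratio for 2x force {double / single:.2f}")
```

In this toy model the amplitude near 1 kHz is several times larger than an octave above or below, while the ratio column stays at 2.00 throughout.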
The basilar membrane and the hair cells respond to oscillating pressures exerted by external air on the eardrum. Further into the ear after the eardrum, the pressure waves are transmitted through bone, cochlear fluids and membranes at the speed of sound of the specific material. The bones in the middle ear serve mainly as a piston arrangement between two cylinders of unequal area to provide hydraulic pressure amplitude gain. The entire system, including the air carrying the sound to the ear, is mechanical in nature. The solids and fluids involved all have properties of density (and thus mass) and "viscosity" (and thus friction). The time derivative of the pressure oscillation at any particular point is not particularly significant. The firing of the neurons at any particular location is a response to the amplitude of the resonant oscillation of the basilar membrane at that location. The amplitude, or amount of peak motion, follows of course from Newton's Second Law of Motion. Basically it depends on the magnitude and duration of the impressed net force, the inertial mass and the damping. No macroscopic object with positive rest mass moves instantaneously by a finite non-zero amount. When a force is applied (in this case a pressure difference of fixed SPL over a small area, together with a motion-resisting frictional shear force), that lump of matter with fixed mass is accelerated, its speed increases and it suffers a displacement. For a roughly constant force, the speed increases linearly with the duration for which the force is applied, and the displacement increases as the square of that duration. Then the force reverses, and the lump begins to displace in the opposite direction. So if the SPL is kept fixed, a doubling of the tone frequency halves the duration of each half period and quarters the amplitude of the displacement. Thus the displacement of the basilar membrane (as well as of all the other parts) drops off rapidly (quadratically, not linearly) with rising tone frequency.
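The quadratic drop-off can be checked with a few lines of Python. This is only a sketch under a strong assumption: a lump of mass driven by a sinusoidal force of fixed peak value, with stiffness and damping ignored (the "mass-controlled" case), for which the displacement amplitude is F / (m * w^2). The force and mass values are hypothetical placeholders.

```python
import math

def mass_controlled_amplitude(force, mass, freq_hz):
    """Displacement amplitude of a free mass driven by F*cos(w*t): X = F / (m * w**2)."""
    w = 2.0 * math.pi * freq_hz
    return force / (mass * w**2)

FORCE = 1.0e-9   # N, hypothetical fixed peak force (stands in for fixed SPL)
MASS = 1.0e-9    # kg, hypothetical lump of tissue and fluid

reference = mass_controlled_amplitude(FORCE, MASS, 1000.0)
for freq in (1000.0, 2000.0, 4000.0, 8000.0):
    ratio = mass_controlled_amplitude(FORCE, MASS, freq) / reference
    # Each doubling of frequency quarters the displacement, i.e. about -12 dB per octave.
    print(f"{freq:6.0f} Hz  relative displacement {ratio:.4f}  ({20.0 * math.log10(ratio):6.1f} dB)")
```

Under these assumptions the relative displacement falls by a factor of four for every octave, before any damping is taken into account.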
Further, for a given displacement amplitude, the magnitude of the frictional damping force rises linearly with frequency and causes additional attenuation of the displacement amplitude for increasing tone frequency. The time derivative of pressure plays little part in these primary effects. A nearly square wave with alternating sign, which has a high derivative during its rise and fall, will produce a displacement amplitude somewhat larger than would a sine wave of the same peak amplitude (and produce a complex response, equivalent to multiple simultaneous frequencies). However, a short alternating-sign pulse of the same peak amplitude and frequency (but with each pulse lasting only a small fraction of the half-period of the wave), and having a derivative similar to that of the nearly square wave, will produce a peak displacement less than does the sine wave. The average force applied during a half period is greater than that of the sine wave in one case and less in the other case. The square-like and pulse-like waves will still be perceived as primarily having the same frequency as the sine wave, but accompanied by some overtones (harmonics) of lesser amplitude. The increasing attenuation of the displacement amplitude with rising tone frequency would certainly limit how far into the ultrasonic we could sense, but the lack of resonance of the basilar membrane of high enough amplitude to trigger neuronal firing for frequencies above around 20 kHz (even in young children with undamaged hearing) causes a sharp cut-off in what we can hear, so we need not actually bother calculating the attenuation at 96, 192 or 384 kHz.
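The comparison between sine, square and short-pulse waveforms can be illustrated by projecting each waveform onto a sine at the fundamental frequency. The Python sketch below does this numerically for idealized unit-peak waveforms; the perfectly sharp edges and the pulse lasting 10% of each half-period are assumptions made for the example. In a linear system the response at the fundamental scales with this component, so it serves here as a rough stand-in for the relative displacement.

```python
import math

def fundamental_amplitude(wave, n_samples=20000):
    """Sine component at the fundamental, by midpoint-rule projection over one period."""
    total = 0.0
    for i in range(n_samples):
        phase = (i + 0.5) / n_samples          # fraction of one period, 0..1
        total += wave(phase) * math.sin(2.0 * math.pi * phase)
    return 2.0 * total / n_samples

def sine(phase):
    return math.sin(2.0 * math.pi * phase)

def square(phase):
    return 1.0 if phase < 0.5 else -1.0

def make_pulse(duty):
    """Alternating-sign pulse occupying `duty` of each half-period, centered in it."""
    def pulse(phase):
        offset = abs((phase % 0.5) - 0.25)     # distance from the half-period center
        if offset > duty * 0.25:
            return 0.0
        return 1.0 if phase < 0.5 else -1.0
    return pulse

# All three waveforms have the same peak value (1.0) and the same period.
for name, wave in (("sine", sine), ("square", square), ("10% pulse", make_pulse(0.1))):
    print(f"{name:10s} fundamental component = {fundamental_amplitude(wave):.3f}")
```

The square wave's fundamental comes out larger than that of the sine (4/pi times the peak), and the short pulse's comes out well below it, which matches the ordering described above.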
We must discount the stories of audiophiles enthused by what they hear in "HiRes" audio in "sighted" listening, because of the giant confounding factor of cognitive bias. They have been primed to believe by companies touting DSD, MQA and high-bitrate PCM, who tell a seemingly coherent story and exploit Kahneman and Tversky's System 1 to induce belief motivated by consumers' desires for increased music enjoyment and one-upmanship over the Joneses. Calling the tones hypersonic instead of ultrasonic not only does not change the underlying physics, but is also a misuse of the term hypersonic, which has had a well-established and different meaning in fluid and solid mechanics for over a century. Spending resources on research into how the brain might sense ultrasonic tones requires justification first by rigorous double-blind testing (DBT), designed to exclude aliasing of ultrasonic tones into the audible range in the equipment and conducted by credible, impartial researchers, proving that the brain can do so. As several folks have posted on ASR from time to time, such effects, even if proved to physically exist, will be small and most likely will not significantly change your experience of recorded music.