Doesn't it depend on how the term "high fidelity" is defined?
You asserted that most would agree that it is tied to what "producers/sound engineers" hear, and that a "totally" linear reproduction system would be the best approximation to that imagined "what they heard" situation.
This raises two questions: whether "most" really agree (is there a survey on that?), and whether a totally linear system would in reality be the best approximation, since we usually don't know the environment in which the "what they heard" took place.
Wrt the first point, I cited a statement by Wolfgang Hoeg that seems, at least to me, to define the term "high fidelity" differently.
Wrt the second point, I mentioned the variations found between different recording/broadcasting facilities. The aforementioned Wolfgang Hoeg, together with his co-authors, tried to explain what the EBU considers important in the subjective assessment of audio quality:
W. Hoeg, L. Christensen, R. Walker, Subjective assessment of audio quality – the means and methods within the EBU
https://tech.ebu.ch/docs/techreview/trev_274-hoeg.pdf
in which the authors describe measurements of the "in-room response" at different broadcasting studios and show the results (Fig. 4 in the PDF).
A much more comprehensive study was done a couple of years later:
A. V. Mäkivirta, Chr. Anet, A Survey Study Of In-Situ Stereo And Multi-Channel Monitoring Conditions
https://www.genelec.com/sites/default/files/media/About Us/Academic_Papers/2001_makivirta_anet.pdf
in which the authors collected the "in-room response" of 372 loudspeakers in 164 professional monitoring rooms around the world, all equipped with the same factory-calibrated loudspeaker system and measurement gear.
Even under these conditions the differences are quite pronounced, which is why I wrote that "in a statistical sense" the "totally linear" approach might be correct in the long run, but might not be for specific recordings.
If we additionally take into account that many other loudspeakers are used in monitoring rooms, and that tens of thousands of records have already been produced in the past under quite different conditions, we imo have to conclude that we don't know much about the conditions under which the best "high fidelity" is delivered.
And we know from some studies (which I've cited in the past, iirc) that the same raw material mixed/produced by the same people (or by different people) under various conditions apparently led to different results.