I have always been bemused by this line of reasoning:

"In-room response should be smooth but sloping down. The reason is that high frequencies are directional and there is more absorption of them in a typical room. Anechoic response should be flat, since reflections are excluded there by definition.
If we can get production done on speakers with a similar response, then we are golden."
I have read it often, but it only makes sense if people do not want a level (flat) frequency response when they are listening.
Obviously a speaker that measures flat in an anechoic chamber will have a downward-sloping frequency response in a room, for the reason you mention. Logically, though, that means one of two things: either people don't actually like a flat response when they listen to music, or the recordings have had their frequency response tilted up in mastering because the mastering room is absorbent and the mastering engineer compensates.
I am not disputing the published findings. I am simply saying that an in-room response that is smooth but sloping down is not "high fidelity" to the input signal in an engineering sense, unless the input signal is always tilted up by exactly the same amount.
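To make the arithmetic of that argument concrete, here is a toy sketch. It assumes, purely for illustration, that the room's effect is a straight-line tilt of -1 dB per octave around 1 kHz (real room curves are not this tidy) and that the gains of each stage simply add in dB. The names `ROOM_TILT_DB_PER_OCTAVE`, `tilt_db`, `in_room_db` and `heard_db` are hypothetical.

```python
import math

# Illustrative assumption only: the room acts as a constant tilt in dB/octave.
ROOM_TILT_DB_PER_OCTAVE = -1.0


def tilt_db(freq_hz, db_per_octave, ref_hz=1000.0):
    """Gain in dB of a straight-line tilt on a log-frequency axis, 0 dB at ref_hz."""
    return db_per_octave * math.log2(freq_hz / ref_hz)


def in_room_db(freq_hz, speaker_anechoic_db=0.0):
    """A speaker that is flat anechoically, plus the assumed room tilt."""
    return speaker_anechoic_db + tilt_db(freq_hz, ROOM_TILT_DB_PER_OCTAVE)


def heard_db(freq_hz, mastering_tilt_db_per_octave):
    """What reaches the listener: tilt baked in at mastering plus the playback room."""
    return tilt_db(freq_hz, mastering_tilt_db_per_octave) + in_room_db(freq_hz)


if __name__ == "__main__":
    for f in (125, 1000, 8000):
        # Untilted recording: the listener hears the room's downward slope.
        # Recording tilted up by +1 dB/octave in mastering: the slopes cancel.
        print(f, round(heard_db(f, 0.0), 2), round(heard_db(f, 1.0), 2))
```

In this model the sound reaches the ear flat only when the mastering tilt exactly cancels the playback-room tilt. That is the "identically sloping up" condition above. With an untilted recording, the downward slope is something the listener is choosing, not fidelity to the signal.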