Pavel, that's certainly an interesting experiment.
I'd love to take part in the listening comparisons, but unfortunately I can't because of my current medical condition (middle ear issues after Covid).
I recall we did similar tests about 10 years ago among friends, mostly sound engineers, although those were mainly about filtering and methods of downsampling 96k original recordings to 44.1k. The main objective was to make some sense of the various options for the task. There were very expensive standalone software resamplers, open-source tools and built-in algorithms in DAWs, and some people swore only by analog re-capture at the target rate. One SRC algorithm might have a very steep filter curve, another a leaky, slower filter, and yet another is fully adjustable in steepness, cutoff frequency and phase response.
I contributed some tools and processed samples. Comparisons of the source and resampled files were largely inconclusive and fairly random. Along the way we found some initially unexpected things. For example, one pro audio converter performed measurably differently at 48k-based and 44.1k-based frequencies when externally clocked, so the difference from the original files was exaggerated, because 96k is a multiple of 48k. Also, some DACs, like the Benchmark DAC-1, have an inline ASRC that resamples everything to a single frequency for the actual conversion, so we felt this somewhat diminished the differences among the source downsampling algorithms and their settings. Any perceived differences depended heavily on the program material and the playback system, and of course we made a lot of mistakes in our testing procedures along the way.
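To put the 48k vs 44.1k point in numbers, here is a minimal sketch that reduces each conversion from 96k to its up/down factors, assuming a rational-ratio (polyphase-style) resampler, which is only one of several SRC designs. The helper name resampling_factors is hypothetical. 96k to 48k is a plain 1:2 decimation, while 96k to 44.1k needs an awkward 147/320 ratio.

```python
from fractions import Fraction


def resampling_factors(src_rate: int, dst_rate: int) -> tuple[int, int]:
    """Return (up, down) so that dst_rate = src_rate * up / down, in lowest terms."""
    ratio = Fraction(dst_rate, src_rate)  # Fraction reduces to lowest terms automatically
    return ratio.numerator, ratio.denominator


for dst in (48_000, 44_100):
    up, down = resampling_factors(96_000, dst)
    print(f"96000 -> {dst}: upsample by {up}, downsample by {down}")
```

This only illustrates the rate arithmetic. The converter behaviour mentioned above was a clocking issue, not something this calculation models.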

Finally, as expected, we started to argue about everything: whether there was any difference at all, whether it was a worthwhile difference in the grand scheme of things, and whether it really matters to match the source or it's better to simply follow your gut and pick what sounds best at the target rate. But it was a lot of fun, and for me it was definitely a starting point for further technical exploration.

I then naturally came to the conclusion that the next test needed to be better prepared, with ABX tools, ideally targeting individual aspects in isolation (like your test with sources at a common rate and only the filters differing, which avoids some of the issues with playback systems and DACs mentioned above). But in reality I never repeated it properly with multiple listeners, as I was busy with other work.
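For the ABX side, one common way to judge a listener's score is a one-sided binomial test against pure guessing (50% per trial). A minimal sketch, assuming independent trials; the function name abx_p_value is hypothetical and not taken from any particular ABX tool:

```python
from math import comb


def abx_p_value(correct: int, trials: int) -> float:
    """Probability of scoring at least `correct` out of `trials` by guessing (p = 0.5 per trial)."""
    if not 0 <= correct <= trials:
        raise ValueError("correct must be between 0 and trials")
    # Count all outcomes with `correct` or more hits, divide by the 2**trials equally likely outcomes
    hits = sum(comb(trials, k) for k in range(correct, trials + 1))
    return hits / 2 ** trials


print(f"{abx_p_value(12, 16):.4f}")  # prints 0.0384
```

So 12 correct out of 16 has roughly a 4% chance of happening by luck. Pooling several listeners or running many comparisons calls for stricter thresholds.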
Anyway, one interesting finding among quite a few people, including myself, was that the super steep filters with the shortest transition band weren't generally preferred as the best option for everything. Later I also found that a few of the most used and praised external resamplers, like Weiss Saracon, iZotope RX or Hepta SRC in the Pyramix DAW, either have hardcoded parameters or default to somewhat "leaky", more relaxed responses, akin to the common half-band filters in converters. Sure, in those chips you have computational and latency constraints, so such a design choice is often a necessity there. However, even though offline SRC can use really steep filters with pretty much as many taps as you need, those vendors opted for slower filters with less ringing, albeit with some deliberate aliasing, especially at 1fs rates. I don't think it was just a technical decision by their programmers, as there is usually also beta testing with various sound engineers.
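To show why steepness costs taps (and ringing length), here is a rough sketch using Kaiser's well-known FIR length estimate, N ≈ (A − 8) / (2.285 Δω). All parameters are assumptions for illustration and not how the resamplers named above are actually designed: a 96 kHz design rate, a 20 kHz passband edge, 100 dB stopband attenuation, and either a stopband at the new Nyquist (22.05 kHz) or a half-band-like stopband at 24.1 kHz, which lets 22.05 to 24.1 kHz fold back into 20 to 22.05 kHz. The function name is hypothetical.

```python
import math


def kaiser_tap_estimate(fs: float, f_pass: float, f_stop: float, atten_db: float) -> int:
    """Kaiser's approximate FIR length for a given transition band and stopband attenuation."""
    delta_omega = 2 * math.pi * (f_stop - f_pass) / fs  # transition width in rad/sample
    order = (atten_db - 8) / (2.285 * delta_omega)
    return math.ceil(order) + 1  # number of taps = filter order + 1


FS = 96_000  # assumed design rate: the 96k source

# Assumed "steep" filter: fully attenuated by the 44.1k Nyquist frequency
steep = kaiser_tap_estimate(FS, 20_000, 22_050, 100)
# Assumed half-band-like filter: stopband mirrored around 22.05 kHz, some aliasing above 20 kHz
relaxed = kaiser_tap_estimate(FS, 20_000, 24_100, 100)

for name, taps in (("steep", steep), ("relaxed", relaxed)):
    print(f"{name}: {taps} taps, impulse response ~{taps / FS * 1000:.2f} ms long")
```

Halving the transition band roughly doubles the filter length, and with it the time over which a linear-phase filter rings around transients, which is the trade-off those vendors appear to be making.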
So if you're interested, maybe you could do another round of ABX testing with a gentler LPF.
Michal