As I linked previously, Dr Olive and his team showed that yes, it does work a 'fraction' of the time, and in their study that fraction was 85%:
https://seanolive.blogspot.com/2016/04/a-virtual-headphone-listening-test.html
That's very good for an initial attempt, and Dr Olive suggests reasons for the remaining error, and ways to reduce it, here:
"The differences between virtual and standard test results we believe are in part due to nuisance variables that were not perfectly controlled across the two test methods. A significant nuisance variable would likely be headphone leakage that would affect the amount of bass heard depending on the fit of the headphone on the individual listener. This would have affected the results in the standard test but not the virtual one where we used an open-back headphone that largely eliminates leakage variations across listeners. Headphone weight and tactile cues were present in the standard test but not the virtual test, and this could in part explain the differences in results. If these two variables could be better controlled even higher accuracy can be achieved in virtual headphone listening."
You could perhaps argue that the remaining 15% error in this EQ-based headphone virtualization method is also partly due to non-minimum-phase effects and the other difficulties you raised, but I would expect those problems to be at least partially overcome by further refining the technique and by controlling for the variables Dr Olive proposed. Of course, you will never get zero error, but I can see it getting close, to within the limits of audibility.
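To make "virtualization via EQ" concrete, here is a minimal sketch of the basic idea, assuming the target headphone and the replicator headphone have both been measured on the same coupler at the same frequencies. The per-frequency correction is simply the dB difference between the two magnitude responses. The function names and the example numbers below are hypothetical, chosen only for illustration; they are not Dr Olive's data or code.

```python
def replicator_eq_db(target_db, replicator_db):
    """Per-frequency EQ gain (dB) that makes the replicator's magnitude
    response match the target headphone's. Magnitude only: phase is ignored."""
    if len(target_db) != len(replicator_db):
        raise ValueError("responses must be sampled at the same frequencies")
    return [t - r for t, r in zip(target_db, replicator_db)]


def apply_eq_db(replicator_db, eq_db):
    """Replicator response after the EQ gains (dB) are applied."""
    return [r + g for r, g in zip(replicator_db, eq_db)]


# Hypothetical measurements at 100 Hz, 1 kHz and 10 kHz (made-up dB SPL values)
target = [98.0, 94.0, 90.0]
replicator = [95.0, 94.0, 92.5]
eq = replicator_eq_db(target, replicator)
print(eq)  # [3.0, 0.0, -2.5]
print(apply_eq_db(replicator, eq))  # [98.0, 94.0, 90.0]
```

Because this subtraction only matches magnitude on the measurement fixture, anything it does not see, such as phase differences or how differently each headphone seals (and therefore leaks bass) on a real listener's head, is left as residual error.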