This supports my initial comment that it takes more than a specific set of headphones; it needs some sort of binauralization. In your case, this is achieved with the Meier crossfeed.
I achieve substantial externalisation purely through parametric EQ of my LCD-2s, and it is fairly consistent across recordings and placement (because Audeze planars are relatively insensitive to seal and placement). This EQ deviates substantially from Harman, but it is nonetheless the best balance between FR and spatial properties I've experienced yet. As I have written before, the average HRTF for "Asians", insofar as there can be a meaningful average, appears significantly different from that of "Caucasians". For that reason, Chesky binaural recordings sound like ass to me, EQ or not. The Amber Rubarth recording sounds like a mass of indistinct sounds clinging to my scalp, flitting in and out of my head.
My EQ'd headphones do a really good illusion of nearfield monitors on recordings with relatively strong reverb (think Bon Jovi's Livin' on a Prayer). That is, I can distinguish where the instruments sit relative to each other in a left-right arc with depth, I can place the drum kit and localise each cymbal, and the ambient reverb envelops me all the way to the back of my head.
Even poorer recordings like Eva Cassidy's magnificent rendition of Wade in the Water give me externalisation of Eva's voice, the backing vocals (I count five men: two on the left, three on the right), and all the instruments. The externalisation extends even to the ground loop that's clearly audible in the recording!
The best part? I'm a skeptical listener. I would have Genelecs if I had the space. I am rarely impressed by headphones, and I know enough about spatial audio to be thoroughly unconvinced by reverb gimmicks, poor HRTFs, or crossfeed (the latter except in extreme cases of hard panning, like Sam and Dave's Soul Man). The externalisation is so compelling that I cannot deny it. But I don't pretend that these are the spatial cues that are meant to be heard, which is why I have refrained from calling it accurate. I prefer to call what I have spatially and tonally plausible, in a way I find deeply satisfying and moving. Switching to speakers would frankly come down mostly to the communal experience and the industrial design.