I'm going to be one of those annoying "well, actually" guys here, but Harman has published quite a few AES papers on finding an appropriate IE target, and some of them used a larger sample of listeners. Some examples:
A series of controlled listening tests were conducted to determine the preferred low frequency response of in-ear (IE) headphones. Using a method of adjustment ten trained listeners adjusted the bass level and frequency of a 2nd order low shelving filter applied to a high quality IE headphone...
Controlled, comparative double blind listening tests on different in-ear (IE) headphones are logistically impractical. One solution is to present listeners virtualized versions of the headphones through a high quality IE replicator headphone equalized to match their measured frequency responses...
A series of controlled listening tests were conducted on 30 different models of in-ear (IE) headphones to measure their relative sound quality. A total of 71 listeners both trained and untrained rated the headphones on a 100-point preference scale using a multiple stimulus method with a hidden...
I wish these articles weren't behind a paywall.
I also think that the IE target research is less robust than the OE target research, but not for these reasons.
Note that this is the Buds2 Pro in passive mode. Per Crinacle's measurements, the results are a bit different when ANC is turned on. Then again, I also believe that, at least when using 711 couplers, you should be careful about comparing active IEMs with a feedback system to passive ones, or to targets designed for passive IEMs. So this is just me, but I'd be hesitant to use that model as an example of a deliberate departure from the IE target.
Also, do we know if Samsung and Harman share teams and resources when developing headphones?
PS: I've noticed that people tend to use the default Y scaling in Crinacle's tool. You can adjust it by grabbing the handles here:
View attachment 288389