The p-value shown in your screenshot seems to be wrong. For 10 correct answers out of 10 trials it should be about 0.001, i.e. 0.1% (0.5^10 ≈ 0.00098), not 0.00169% as shown. I do see this ABX tool producing correct p-values for other tests, so I'm not sure how you got that particular result...
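For anyone who wants to check the number themselves, here is a minimal sketch of the usual ABX calculation, assuming the tool reports a one-sided binomial p-value with a 50% chance of guessing each trial correctly. The function name abx_p_value is just something I made up for illustration; it is not taken from the ABX tool.

```python
from math import comb


def abx_p_value(correct: int, trials: int) -> float:
    """Probability of getting at least `correct` answers right in `trials`
    ABX trials by pure guessing (assumed 50% chance per trial)."""
    if not 0 <= correct <= trials:
        raise ValueError("correct must be between 0 and trials")
    # Count every outcome with `correct` or more right answers,
    # then divide by the 2**trials equally likely guess sequences.
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials


p = abx_p_value(10, 10)
print(f"{p:.6f} ({p:.4%})")  # prints 0.000977 (0.0977%)
```

So 10/10 comes out at roughly 0.1%, and for comparison 9/10 would give 11/1024, or about 1.07%.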
Indeed I have. First I completed the pretty obvious ABX demo and got the expected results.
Then I also tested the tool's behaviour while I was measuring the streamed audio quality with my own 1 kHz test tones. Here's the result of that:
View attachment 175167
In other words, the behaviour of the ABX tool in all my test trials was consistent and exactly as expected. I found no such issues.
Perhaps I should note that I too cannot hear any difference in my E50/D03K ABX test (my HF hearing ends somewhere between 16 and 17 kHz).