My assertion:
What I heard initially (when the x16 was brand new) was unlikely to be expectation bias or a voltage-matching issue between the E30 and x16.
Like I mentioned multiple times: although my meter may not be accurate, it is consistent.
I am going to repeat this.
I have measured the Topping E30 in three different locations in my house, with multiple audio setups, at multiple different times and on multiple dates.
Once I voltage-matched it with the L30 to a certain output voltage at 1 kHz, every subsequent voltage captured at 2 kHz, 3 kHz, ... down to 10 Hz was within 0.001 V of my previous measurements.
Same result for the KTB: no more than a 0.001 V difference.
So why is the Gustard x16 showing as much as a 0.004 V difference after just 3 weeks of use from brand new?
Lots of people here have talked about a controlled environment, but no one has explained why my measurements for the E30 are so stable.
I admire the curiosity, but we're only criticizing the method. I design electronics for a living (not commercial ones, but including DACs and ADCs), and a fair bit of my job is testing and debugging. If a junior engineer came to me with these test results, the first thing I would do is repeat the tests with appropriately rated equipment. I know this is hard for you to do, because the appropriate equipment is extremely expensive, and things are only brand new once.

But without a known-good reference (which would be the higher-accuracy equipment), you can't verify the accuracy or the precision of your tests. Repeated, precise measurements under one known condition do not guarantee the same precision under all conditions, and that is where your testing falls apart.
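To make the precision-versus-accuracy point concrete, here is a minimal Python sketch with made-up numbers (the 2.000 V reference and 0.050 V offset are illustrative assumptions, not anyone's real data). A meter with a stable systematic offset produces readings that agree with each other to well within 0.001 V, yet every one of them is wrong by the same amount, and nothing in the readings themselves reveals that:

```python
# Hypothetical illustration: a simulated meter with a fixed offset is
# perfectly repeatable (precise) while still being inaccurate.

TRUE_VOLTAGE = 2.000  # volts, what a calibrated reference would report


def biased_meter(v_true: float, offset: float = 0.050) -> float:
    """Simulated meter: always reads high by a fixed systematic offset."""
    return v_true + offset


# Five repeated measurements of the same source.
readings = [biased_meter(TRUE_VOLTAGE) for _ in range(5)]

# Repeatability looks perfect: the spread between readings is 0 V,
# comfortably inside a 0.001 V consistency window.
spread = max(readings) - min(readings)

# ...yet every reading is 0.050 V wrong. Only comparison against a
# known-good reference exposes the error.
error = readings[0] - TRUE_VOLTAGE

print(f"spread = {spread:.3f} V, error vs reference = {error:.3f} V")
```

The same logic is why a consistent E30 result doesn't validate the meter: consistency bounds the random error, but says nothing about a systematic error, and a systematic error that drifts with temperature, input level, or frequency would look exactly like a change in the device under test.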