My thoughts repeat some of the advice already given. I would consider running more tests on fewer devices. A 4-way double-blind test sounds exhausting to me, and fatigue will make it harder to focus. I'd narrow your testing down to the two devices you care most about, or the two with the biggest gap in tech. More trials on fewer devices will give you more confidence in your result. If you can't hear a difference there, I don't know that I'd go through the effort of more testing other than to satisfy my own curiosity (which I guess is what you are doing anyway).
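To put rough numbers on why more trials help, here is a quick sketch. It assumes a plain two-device ABX-style test where a pure guess is right half the time. The function name `guess_p_value` and the trial counts are just made up for illustration, not anything from your setup. It computes how likely it is to score at least that well by guessing alone.

```python
from math import comb

def guess_p_value(correct: int, trials: int, chance: float = 0.5) -> float:
    """Probability of getting at least `correct` answers right out of
    `trials` by guessing alone (one-sided binomial tail).

    `chance` is the assumed per-trial hit rate of a pure guess
    (0.5 for a two-device ABX-style test).
    """
    return sum(
        comb(trials, k) * chance**k * (1 - chance) ** (trials - k)
        for k in range(correct, trials + 1)
    )

# Same 75% hit rate at three different (hypothetical) session lengths.
for correct, trials in [(6, 8), (12, 16), (24, 32)]:
    p = guess_p_value(correct, trials)
    print(f"{correct}/{trials} correct: chance of doing this well by guessing = {p:.3f}")
```

The same 75% score gets much harder to explain by luck as the trial count goes up, which is the argument for spending your listening time on two devices rather than spreading it across four.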
As for music, I agree with the school of thought of picking something you enjoy and are very familiar with. I'd make sure it has plenty of dynamic range and is very well recorded. I don't expect any perceivable differences, but if you're really trying to suss out whether you can hear one, you want to stack the deck so that if there is a difference, you have every opportunity to catch it. Otherwise you might still question your result later.
Lastly, when I first tried DBT, I really wanted to detect a difference. I believed there was one, and that I could hear it, because my sighted tests seemed obvious to me. It became frustrating quickly as I started losing confidence and couldn't pick out the differences I thought were there. I had to step away and come back with a different mentality: just being curious about the whole thing instead of trying to "pass a test" or prove something to myself. Do what you can to stay relaxed, and remember that not detecting differences (which might not be there) is not the same as failing the test.