You can measure the voltage going to the speakers with a multimeter (AKA "DMM" = digital multimeter), and you can generate a test-tone file with Audacity. (Regular program material jumps around too much to measure accurately.) Just about any meter will work. It doesn't have to be super-accurate, just repeatable and stable. Note that some meters don't measure low AC voltages accurately (I have one like that), so if you ever want to measure line-level signals (around 1V or less, depending on the "loudness" and the volume control), make sure to get one with a low AC range.
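If you'd rather script the test tone than make it in Audacity, here's a minimal sketch using only the Python standard library. The function name and the defaults (60 Hz, 30 seconds, -12 dBFS, 44.1 kHz mono 16-bit) are my assumptions, not anything required. I picked a low frequency because many inexpensive meters are only specified for AC near mains frequency, so check your meter's manual and change it if needed.

```python
import math
import struct
import wave


def write_test_tone(path, freq_hz=60.0, seconds=30.0,
                    sample_rate=44100, level_dbfs=-12.0):
    """Write a mono 16-bit sine-wave WAV file and return the frame count.

    All defaults are illustrative assumptions; adjust them for your setup.
    """
    # Convert the level in dB relative to full scale into a peak sample value.
    amplitude = 32767 * 10 ** (level_dbfs / 20.0)
    n_frames = int(seconds * sample_rate)

    frames = bytearray()
    for n in range(n_frames):
        sample = amplitude * math.sin(2 * math.pi * freq_hz * n / sample_rate)
        # "<h" packs one little-endian signed 16-bit sample.
        frames += struct.pack("<h", int(round(sample)))

    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)            # mono
        wav.setsampwidth(2)            # 2 bytes = 16 bits per sample
        wav.setframerate(sample_rate)
        wav.writeframes(bytes(frames))
    return n_frames
```

Calling `write_test_tone("tone.wav")` writes the file to the current folder. Play it at your normal listening volume and read the AC voltage at the speaker terminals for each setup you want to compare.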
In case you haven't seen this -
HydrogenAudio - What is an ABX test?
Theoretically, it's supposed to be double-blind, so she doesn't know what "X" is and can't give you any clues, accidentally or intentionally, to fool you or help you. But double-blind is usually not easy or practical, so try to set it up so you can't see her face, and if possible she shouldn't say anything (or should say very little). You can ask for "A", "B", or "X", then give your answer/guess and move on to the next trial. (You need multiple trials to get a statistically valid result.)
"A" and "B" can be known. "X" should be truly random (a coin flip, etc.). She can flip a coin in advance and make a "secret sequence" list after you've decided on a number of trials, and you can ask to hear "A", "B", or "X" again until you make a decision or a guess.
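If a coin isn't handy, the secret sequence and the scoring can be sketched in a few lines of Python. This is just one way to do it. The function names are made up for illustration, and the "chance of doing this well by guessing" figure is a plain one-sided binomial calculation that assumes each guess is a 50/50 coin flip.

```python
import secrets
from math import comb


def make_secret_sequence(trials):
    """Return a list like ['A', 'B', 'B', ...] saying what "X" is on each trial."""
    # secrets.choice draws from the operating system's random source.
    return [secrets.choice("AB") for _ in range(trials)]


def score(sequence, answers):
    """Count how many answers match the secret sequence."""
    return sum(1 for x, guess in zip(sequence, answers) if x == guess)


def abx_p_value(correct, trials):
    """Probability of getting at least `correct` right out of `trials` by pure guessing."""
    ways = sum(comb(trials, k) for k in range(correct, trials + 1))
    return ways / 2 ** trials
```

She would run `make_secret_sequence(16)` and keep the list to herself, and afterwards you compare your written answers with `score(...)`. As a worked example, 12 right out of 16 gives `abx_p_value(12, 16)` of about 0.038, which is under the commonly used (but arbitrary) 0.05 cutoff.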
Do you have an easy way to switch, and an easy way to switch the sound off between switches so you can't tell whether it was switched or not?
You may already know this too, but ABX doesn't tell you which is better; it just tells you whether you can reliably hear a difference between "A" and "B".