As much as I believe your efforts are misguided, I respect you for taking the time to do it right--well, almost, but good enough. You are being a little too tough on yourself--8/10 comes within a whisker of significance (p of about 0.055), although strictly it takes 9/10 (p of about 0.011) to get under 0.05. Typically a p=0.05 is considered sufficient evidence--what this means is that there is a 5 percent or less chance that you would achieve a result at least that good if the two outcomes (choosing either A or B) were equally probable, as would be the case with a heads/tails coin flip. Now if there is one principle in statistics that everyone should know, it is the law of large numbers. To illustrate, consider doing 10,000 trials. Headache, right? Well, if the two are equally probable, the number of, say, tails will be very close to 5000. A deviation of even 100, as in 5100-4900, is two standard deviations out and turns up by chance only about 5 percent of the time.
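If you want to check the 10-trial and 10,000-trial figures yourself, here is a minimal sketch in plain Python. It assumes every trial is an independent 50/50 guess and uses a one-sided test (only "this many right or more" counts); the function name tail_probability is just something I made up for illustration.

```python
from math import comb

def tail_probability(n: int, k: int) -> float:
    """Chance of getting k or more right out of n when every trial is a 50/50 guess."""
    # Count the ways to get k, k+1, ..., n right, then divide by all 2**n outcomes.
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n

p_8_of_10 = tail_probability(10, 8)   # 56/1024
p_9_of_10 = tail_probability(10, 9)   # 11/1024
print(f"8 or more of 10: p = {p_8_of_10:.4f}")
print(f"9 or more of 10: p = {p_9_of_10:.4f}")

# 10,000 coin flips: chance of landing 100 or more away from 5000 in either direction.
# The distribution is symmetric, so the two-sided figure is twice the upper tail.
p_off_by_100 = 2 * tail_probability(10_000, 5_100)
print(f"off by 100 or more in 10,000 flips: p = {p_off_by_100:.4f}")
```

This needs Python 3.8 or later for math.comb. The 10,000-flip line sums a few thousand very large integers, so it may take a moment.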
The more trials, the greater the power; that is, the less likely it is that a result is a fluke. If we were to flip a coin 4 times and it came up heads each time, it would almost reach "significance" (less than 0.05): the probability is 1/16, or 0.0625.
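To see how the bar drops as the number of trials goes up, here is a rough sketch that finds the smallest score reaching p of 0.05 or less for a few trial counts. Same assumptions as any coin-flip model: independent trials, 50/50 guessing, one-sided test. The function names and the list of trial counts are my own picks, not anything standard.

```python
from math import comb

def tail_probability(n, k):
    """Chance of k or more right out of n under pure 50/50 guessing."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n

def min_correct_for_significance(n, alpha=0.05):
    """Smallest k with P(k or more right) <= alpha, or None if even n/n falls short."""
    for k in range(n + 1):
        if tail_probability(n, k) <= alpha:
            return k
    return None

# With only 4 trials even a perfect score is 1/16 = 0.0625, so nothing qualifies.
print(4, min_correct_for_significance(4))

for n in (10, 20, 40, 100):
    k = min_correct_for_significance(n)
    print(f"n={n}: need {k} right ({k / n:.0%} of trials)")
```

The percentage needed shrinks as n grows, which is the whole point of running more trials.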
So for 40 trials, you only need 26 or more right to reach significance. Nowhere is it more true than in math that a picture is worth a thousand words. Look at the difference between the n=8 curve and the n=128 curve. Notice that it gets spikier as n increases. That's the law of large numbers--more and more of the outcomes (were we to repeat the trial, say, 100 times) are close to the expected mean of 1/2*128 = 64. They have normalized the curve--in this case the -0.2 and +0.2 would be at roughly 51 and 77. Let's look at your case below:
What you are seeing is a bar chart representing the probability of each particular outcome. The bar at 20 is the tallest--seems reasonable. But now go to 26 (the first brown bar) and read across to the y axis, and you see it's tiny--maybe 0.02. But we don't want to restrict ourselves to getting exactly 26 right; we want to include every outcome that is even better: 27, 28, 29, ... 40. Well, if you're with me this far, the probability of just 29 is minute, and 30 can't be seen; also invisible are 31, 32, 33 ... 40. So long story short, if you get 30/40, you're smoking something. If you get 35, I want whatever you're smoking.
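For anyone who would rather see numbers than squint at bars, here is a small sketch of the 40-trial case. It assumes independent 50/50 guesses; the names exact_probability and at_least are made up for this post.

```python
from math import comb

N_TRIALS = 40  # the 40-trial case discussed above

def exact_probability(k, n=N_TRIALS):
    """Height of one bar: chance of exactly k right out of n by guessing."""
    return comb(n, k) / 2 ** n

def at_least(k, n=N_TRIALS):
    """Sum of the bars from k up to n: chance of k or more right by guessing."""
    return sum(exact_probability(i, n) for i in range(k, n + 1))

for k in (20, 26, 29, 30, 35):
    print(f"exactly {k}: {exact_probability(k):.6f}   {k} or more: {at_least(k):.6f}")
```

The "exactly" column is what the chart shows; the "or more" column is the p-value you actually care about.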
Now, as payback for the free statistics guide: why is it that even with a low bar like 26, people can't do it? Oh, I mean once in a while someone might, but not reliably.
Is it possible that there is no difference?