It depends on which AI model you're using and how well it's performing. One way providers have been improving performance while lowering running costs is to work out what skills a prompt requires and hand it off to smaller, more specialised models. If you're running a model that does this, and it interprets your prompt correctly, you'll probably get the right answer.

If you're still using a pure language model, or the handover goes wrong somehow, you'll probably get something that looks plausible but turns out to be wrong when you check the calculation. Even with the same model, the result may depend on what backend tuning the operator has applied: they may prefer a cheaper but less accurate configuration when load is high or electricity is more expensive at that moment.

All of this is opaque to the user, so if the answer matters, your best bet is to ask the model to show its working and then check it yourself.
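The handover idea can be sketched in a few lines. This is a hypothetical toy, not how any real provider routes prompts: a router checks whether the prompt is plain arithmetic and, if so, answers with a deterministic evaluator instead of the language model, which is where the accuracy gain comes from. The `language_model` function here is just a placeholder.

```python
import ast
import operator

# Operators the "calculator path" is willing to evaluate.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
    ast.USub: operator.neg,
}

def _eval_node(node):
    """Recursively evaluate a parsed arithmetic expression; reject anything else."""
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval_node(node.left), _eval_node(node.right))
    if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval_node(node.operand))
    raise ValueError("not plain arithmetic")

def language_model(prompt: str) -> str:
    # Stand-in for the general-purpose model: plausible text, no guarantees.
    return "a plausible-looking answer"

def route(prompt: str) -> str:
    """Try the cheap, exact specialised path first; fall back to the language model."""
    try:
        result = _eval_node(ast.parse(prompt, mode="eval").body)
        return str(result)          # exact answer from the calculator path
    except (ValueError, SyntaxError):
        return language_model(prompt)

print(route("12345 * 6789"))        # handled by the calculator path
print(route("Why is the sky blue?"))  # falls through to the language model
```

The point of the sketch is the failure mode in the answer above: everything hinges on the router classifying the prompt correctly. A question that is really arithmetic but phrased in words ("what's twelve thousand times seven?") would fall through to the placeholder model here, just as a mis-routed prompt does in a production system.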