The images follow a sinc (sin(x)/x) envelope, so signals in the upper midrange are mirrored around Nyquist (one-half the sampling frequency) and can appear above 20 kHz in "significant" amounts. The levels may indeed be "fairly low", but then again tweeters aren't designed to take a lot of power. I have an article on it somewhere, but in any event I have given up trying to convince anyone of anything in audio these days. More to the point, oversampled DACs (currently the vast majority of them) move the images well above the audio band, where they are less likely to intermodulate in a driver and land back in the audio band. They are still likely to heat up the tweeter, of course.
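To put rough numbers on the sinc point, here is a minimal sketch. It assumes an ideal zero-order-hold DAC running at 44.1 kHz with no oversampling and no analog reconstruction filter, which is a worst case and not a description of any particular product. The function names are made up for illustration.

```python
import math

def image_frequency(f_hz, fs_hz):
    """First image of a baseband tone, mirrored about Nyquist (fs/2)."""
    return fs_hz - f_hz

def zoh_gain_db(f_hz, fs_hz):
    """Gain in dB of an ideal zero-order hold: |sin(x)/x| with x = pi*f/fs."""
    x = math.pi * f_hz / fs_hz
    if x == 0.0:
        return 0.0  # sin(x)/x tends to 1 at DC
    return 20.0 * math.log10(abs(math.sin(x) / x))

fs = 44_100.0  # assumed sample rate (CD audio)
for tone in (5_000.0, 10_000.0, 15_000.0, 20_000.0):
    img = image_frequency(tone, fs)
    # Level of the image relative to the tone that produced it.
    rel = zoh_gain_db(img, fs) - zoh_gain_db(tone, fs)
    print(f"{tone / 1000:5.1f} kHz tone -> image at {img / 1000:5.1f} kHz, "
          f"{rel:6.1f} dB relative to the tone")
```

Under these assumptions the image-to-tone ratio reduces to f / (fs - f), so a 15 kHz tone has its first image at 29.1 kHz only about 5.8 dB below the tone itself. Any real reconstruction filter or oversampling stage would lower that further.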
This article touches on images: https://www.audiosciencereview.com/...ital-audio-converters-dacs-fundamentals.1927/
I think what saves the tweeters is the fact that HF content is relatively short-lived in nature. Just for kicks, here is the peak spectrum of Pink Floyd's "Money" (in red). It would appear that the image levels should not exceed -40 dB, which is hardly any danger to tweeters at normal listening levels.
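For a sense of scale, -40 dB corresponds to a power ratio of 1/10,000. The sketch below assumes the -40 dB figure is relative to full scale and uses a hypothetical amplifier that delivers 50 W at full scale. Both are assumptions for illustration, not measurements.

```python
def db_to_power_ratio(level_db):
    """Convert a level in dB to a power ratio (10 dB per decade of power)."""
    return 10.0 ** (level_db / 10.0)

full_scale_watts = 50.0   # hypothetical amplifier output at full scale
image_level_db = -40.0    # image ceiling estimated from the spectrum above

image_watts = full_scale_watts * db_to_power_ratio(image_level_db)
print(f"Image power: {image_watts * 1000:.1f} mW")
```

With those assumed figures the images amount to a few milliwatts, which is consistent with the point that they pose little thermal risk at normal listening levels.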