Can you please explain in greater detail, without math. Thanks.
That's harder to do than it sounds. You might be able to get an intuitive understanding by looking at some pictures. This is the frequency response of an ideal bandlimited filter, referred to as a sinc filter:
https://en.wikipedia.org/wiki/Sinc_filter#/media/File:Rectangular_function.svg It's kind of intuitive: you want to keep everything in one region of the spectrum and discard the rest. Sampling any signal does exactly this behind the scenes -- we're keeping a certain amount of information and discarding the rest (because we're storing a set of samples, not the continuous signal).
Now, what if we send an "impulse" through that filter? An impulse is a sound pulse that lasts an infinitesimally short time, like a really quick, perfectly damped drum hit (there are no true impulses in the real world because of the physics of air, materials, and microphones, but we're dealing with mathematical models here). Here's the actual signal that comes out of that filter:
https://en.wikipedia.org/wiki/Sinc_filter#/media/File:Sinc_function_(normalized).svg That's a little odd at first glance, right? There's a big lobe that represents the impulse, but then what are the waves before and after the impulse? Those weren't there before. Where did they come from? They're a mathematical artifact of the sinc filter/sampling process itself, referred to as pre- and post-ringing artifacts.
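If you'd rather poke at numbers than look at a picture, here is a minimal Python sketch of that filter output. It assumes time is measured in sample periods with the impulse arriving at t = 0; the names `sinc`, `times`, and `output` are just illustrative choices, not anything from a particular audio library.

```python
import math


def sinc(t: float) -> float:
    """Normalized sinc, sin(pi*t) / (pi*t), defined as 1 at t = 0."""
    if t == 0.0:
        return 1.0
    return math.sin(math.pi * t) / (math.pi * t)


# Time in units of one sample period (an assumption); the impulse is at t = 0.
times = [n / 2 for n in range(-8, 9)]  # -4.0, -3.5, ..., +4.0
output = [sinc(t) for t in times]

for t, y in zip(times, output):
    if t < 0:
        label = "before the hit"
    elif t == 0:
        label = "the hit"
    else:
        label = "after the hit"
    print(f"t = {t:+.1f}  output = {y:+.4f}  ({label})")
```

The rows labeled "before the hit" are not all zero, and they mirror the rows after it. Those nonzero values ahead of t = 0 are the pre-ringing.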
That rippling output is a perfect reconstruction of what we've sampled, in terms of sampling theory. In the real world, if we're trying to represent a percussion instrument like a drum, the waves before the impulse are not faithful to the original: they are sound that occurs before the drum was actually struck. We can reduce them by sacrificing the accuracy of our reconstruction in either frequency or phase, but it involves trade-offs. (Note: I'm saying nothing about whether those artifacts are audible or not. Just talking about math here.)
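As a rough illustration of the frequency-side trade-off, the sketch below tapers the sinc with a Hann window and compares how large the ripple gets before the hit. Windowing is just one common way to do this and is my choice for the example, not necessarily what any given converter does; `HALF_WIDTH`, `hann`, and `peak_pre_ringing` are hypothetical names and values picked for illustration.

```python
import math


def sinc(t: float) -> float:
    """Normalized sinc, sin(pi*t) / (pi*t), defined as 1 at t = 0."""
    if t == 0.0:
        return 1.0
    return math.sin(math.pi * t) / (math.pi * t)


def hann(t: float, half_width: float) -> float:
    """Hann taper: 1 at t = 0, falling smoothly to 0 at |t| = half_width."""
    if abs(t) >= half_width:
        return 0.0
    return 0.5 * (1.0 + math.cos(math.pi * t / half_width))


HALF_WIDTH = 4.0  # assumed filter half-length, in sample periods
times = [n / 4 for n in range(-16, 17)]  # -4.0 ... +4.0 in quarter steps

plain = [sinc(t) for t in times]
tapered = [sinc(t) * hann(t, HALF_WIDTH) for t in times]


def peak_pre_ringing(values):
    """Largest ripple magnitude at least one sample period before the hit."""
    return max(abs(y) for t, y in zip(times, values) if t <= -1.0)


print(f"plain sinc   peak pre-ringing: {peak_pre_ringing(plain):.4f}")
print(f"tapered sinc peak pre-ringing: {peak_pre_ringing(tapered):.4f}")
```

The tapered version rings less before the hit, but the price is that its frequency response is no longer the sharp rectangle from the first picture; the cutoff becomes more gradual.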