For phase p and frequency f, group delay GD is the negative of the derivative of phase with respect to frequency: GD = -dp/df. (That gives seconds directly if p is in cycles and f in Hz, or if p is in radians and f is in radians per second; for p in radians and f in Hz, divide by 2π.) In general phase is a function of frequency, p(f); that is, phase changes with frequency. That's the math behind it.
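As a rough numerical sketch of that definition, the snippet below estimates group delay with a finite difference. It assumes phase in radians and frequency in Hz, so a 1/(2π) factor is needed to get seconds. The function names and the 1 ms delay are made up for illustration only.

```python
import math

def phase_of_pure_delay(f_hz, delay_s):
    """Phase in radians of an ideal delay: p(f) = -2*pi*f*delay."""
    return -2.0 * math.pi * f_hz * delay_s

def group_delay(phase_fn, f_hz, df_hz=1.0):
    """Estimate GD = -(1/(2*pi)) * dp/df with a central difference.

    The 1/(2*pi) factor converts radians-per-Hz into seconds.
    """
    dp = phase_fn(f_hz + df_hz) - phase_fn(f_hz - df_hz)
    return -dp / (2.0 * df_hz) / (2.0 * math.pi)

delay_s = 0.001  # hypothetical 1 ms delay, chosen only for illustration
for f in (100.0, 1000.0, 10000.0):
    gd = group_delay(lambda x: phase_of_pure_delay(x, delay_s), f)
    print(f"{f:8.0f} Hz -> group delay {gd * 1000:.3f} ms")
```

Because the phase of a pure delay is a straight line in f, the estimate comes out the same at every frequency and equals the delay itself.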
A change in phase is equivalent to a time shift; for example, a 180-degree phase shift is like shifting the time by one-half cycle of the signal. The derivative is a fancy expression for the slope of a line, so how constant it stays is a measure of phase linearity. A straight (linear) line has the form y = mx + b, where every point x is multiplied by the slope m and added to the offset b to produce a y value. For a straight line, m is constant (just a number, not a function of something else); hopefully we remember this formula from school. Now replace m with -GD (y is the phase p and x is the frequency f), so, to get a straight line, that means the change in phase divided by the change in frequency...