Phase delay and group delay are both derived from phase. Phase, in turn, comes out of the steady-state frequency-domain approach that we prefer over working directly in the time domain: we work with phasors and the corresponding phasor phase, which is what is plotted in a frequency response. Phase delay and group delay therefore represent "steady-state delays". As a consequence, you can have systems, such as lumped circuits, that have no true delay (there is an immediate output for ANY input) but still have non-zero, frequency-dependent phase delays and group delays, which can even be negative, while the circuit itself is described by a causal transfer function. Phase delay and group delay are apparent delays: a 'best bet' of sorts at the true delay that could have produced the observed phase under steady-state conditions. However, infinitely many systems COULD have given you this phase delay/group delay at a particular frequency, and they would all give you different transient responses, even at that one frequency.
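To make the two quantities concrete, let φ(ω) be the (unwrapped) phase of the frequency response H(jω). The phase delay is then τ_p(ω) = -φ(ω)/ω and the group delay is τ_g(ω) = -dφ/dω. The following is a minimal sketch, assuming a first-order RC lowpass H(s) = 1/(1 + sRC) with RC = 1 ms. Both the component value and the helper names are illustrative choices, not taken from the text. The circuit is lumped and has no true delay, yet it reports two different, non-zero "delays".

```python
import cmath
import math

RC = 1e-3  # illustrative time constant (1 ms), not taken from the text

def lowpass(s):
    """First-order RC lowpass H(s) = 1 / (1 + s*RC)."""
    return 1 / (1 + s * RC)

def phase_delay(H, w):
    """-phi(w)/w using the principal phase; only valid while the
    unwrapped phase stays inside (-pi, pi], as it does here."""
    return -cmath.phase(H(1j * w)) / w

def group_delay(H, w, dw=1e-4):
    """-dphi/dw by a central difference; taking the phase of the ratio
    H(w+dw)/H(w-dw) sidesteps 2*pi wrapping."""
    ratio = H(1j * (w + dw)) / H(1j * (w - dw))
    return -cmath.phase(ratio) / (2 * dw)

for w in (0.1 / RC, 1.0 / RC, 10.0 / RC):
    print(f"f = {w / (2 * math.pi):8.2f} Hz: "
          f"phase delay {phase_delay(lowpass, w) * 1e3:.4f} ms, "
          f"group delay {group_delay(lowpass, w) * 1e3:.4f} ms")
```

At ω = 1/RC the phase delay is (π/4)·RC ≈ 0.785 ms and the group delay is RC/2 = 0.5 ms. These are two different apparent delays at the same frequency, from a circuit whose impulse response starts at t = 0.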
For example, a polarity flip has a 180 degree phase shift at all frequencies. Pick a particular frequency and plot input vs output for a system that flips the sign of the input. If you look only at the steady-state response, your phase delay will be half a period. Yet this comes from a system that has no inherent delay/true delay/transport delay (where would it come from? You have essentially just swapped two wires). If you don't know any better, you might think that this apparent time delay is constant across all frequencies ("Phase is just time delay." No, it is not.), since this steady-state phase delay COULD have come from a true time delay. In fact, the phase delay of the polarity flip varies with frequency. It is always half a period, so it goes to infinity at low frequencies. At the same time (no pun intended), the system always gives an immediate (but opposite) output for any non-zero input. Suppose you only had a true time delay at your disposal, and no polarity flip, and wanted to delay by half a period at low frequencies. You would indeed need a larger and larger true time delay, and that is exactly what the phase delay tells you. But that does not mean that this delay IS in the system; it is just an apparent delay. If steady state is all that matters, you can rightfully say that this is your delay of interest. For impulse and step responses, however, it is crucial to work with the actual phase and the delays that result from it, rather than reading the phase delay as a true delay.
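The polarity flip can be checked numerically. For H(s) = -1 the phase is ±π at every frequency, and the sign is ambiguous. The sketch below explicitly assumes the lagging branch φ = -π; taking +π instead would turn the result into a half-period "advance". The phase delay then equals half a period, 1/(2f), and grows without bound as f goes to zero. The group delay, by contrast, is zero, and the output is simultaneous with the input. The helper names are illustrative.

```python
import math

def polarity_flip_phase(w):
    # H(jw) = -1 has constant phase; we pick the lagging branch -pi (assumption)
    return -math.pi

def phase_delay(phase_fn, w):
    return -phase_fn(w) / w

def group_delay(phase_fn, w, dw=1e-4):
    # central difference of an already unwrapped phase function: -dphi/dw
    return (phase_fn(w - dw) - phase_fn(w + dw)) / (2 * dw)

for f in (1.0, 10.0, 100.0, 1000.0):
    w = 2 * math.pi * f
    print(f"{f:7.1f} Hz: phase delay {phase_delay(polarity_flip_phase, w):.6f} s "
          f"(half period {1 / (2 * f):.6f} s), "
          f"group delay {group_delay(polarity_flip_phase, w):.1f} s")
```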
Only in the special case of a true time delay will the phase delay (and the group delay) take on that delay value at all frequencies. The differential delay (phase delay minus group delay) is then zero, indicating a true delay (barring a polarity flip, which would obscure this). But phase is not just a time delay. For the full information at a single frequency (transient and steady state), you need more than the magnitude and phase response at that frequency. Otherwise you only know the (apparent) phase delay under steady-state conditions, not what happens transiently (if that is a word).
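The differential delay can be sketched with closed-form, unwrapped phase functions. These are used instead of a numerically computed phase, which would wrap every 2π. The example assumes a true delay of T = 2 ms (an illustrative value) and compares two cases: the plain delay, and the same delay followed by a polarity flip on the lagging branch -π. For the plain delay the differential delay is zero. With the flip it becomes π/ω, i.e. half a period, even though the transport delay is unchanged.

```python
import math

T = 2e-3  # illustrative transport delay: 2 ms

def delay_phase(w):
    # unwrapped phase of exp(-j*w*T)
    return -w * T

def flipped_delay_phase(w):
    # same delay followed by a polarity flip (lagging branch -pi assumed)
    return -w * T - math.pi

def phase_delay(phase_fn, w):
    return -phase_fn(w) / w

def group_delay(phase_fn, w, dw=1e-3):
    # -dphi/dw by central difference
    return (phase_fn(w - dw) - phase_fn(w + dw)) / (2 * dw)

def differential_delay(phase_fn, w):
    return phase_delay(phase_fn, w) - group_delay(phase_fn, w)

for f in (10.0, 100.0, 1000.0):
    w = 2 * math.pi * f
    print(f"{f:7.1f} Hz: pure delay -> {differential_delay(delay_phase, w):+.2e} s, "
          f"delay + flip -> {differential_delay(flipped_delay_phase, w):+.2e} s")
```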
Same thing with group delay. You might find that a modulated signal appears to be delayed, with your group delay giving you the offset value, but there was likely an immediate output from the system, so no "actual" delay, just an apparent one. Of course, the system could have a DSP delay, or a transport delay from wave propagation, but the lumped systems that we use for representing transducers cannot have such true delays, as will be discussed later.
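A quick way to see "apparent, not actual" is to compare three things for the RC lowpass H(s) = 1/(1 + sRC), again assuming RC = 1 ms for illustration:

- the group delay at DC;
- the centroid (first moment) of its impulse response h(t) = e^(-t/RC)/RC;
- the value of h(t) at t = 0.

The group delay at DC equals the centroid, both RC. Yet h(t) is largest at t = 0, so the output starts immediately. The centroid below is a simple midpoint Riemann sum; this is a convenience, not a prescribed method.

```python
import cmath
import math

RC = 1e-3  # illustrative time constant: 1 ms

def lowpass(s):
    return 1 / (1 + s * RC)

def impulse_response(t):
    # inverse Laplace transform of 1/(1 + s*RC), valid for t >= 0
    return math.exp(-t / RC) / RC

def group_delay(H, w, dw=1e-4):
    ratio = H(1j * (w + dw)) / H(1j * (w - dw))
    return -cmath.phase(ratio) / (2 * dw)

# Centroid of the impulse response, midpoint Riemann sum over 0 .. 20*RC
dt = RC / 1000
ts = [(k + 0.5) * dt for k in range(20000)]
h = [impulse_response(t) for t in ts]
centroid = sum(t * x for t, x in zip(ts, h)) / sum(h)

print(f"group delay at DC: {group_delay(lowpass, 0.0) * 1e3:.4f} ms")
print(f"impulse centroid:  {centroid * 1e3:.4f} ms")
print(f"h(0) = {impulse_response(0.0):.1f} 1/s, h(RC) = {impulse_response(RC):.1f} 1/s")
```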
If these steady-state delays are negative at some frequencies, it just means that it will 'look' as if the output leads the input once everything has settled to steady state. We know that all physical systems are causal, so at most there can be an immediate output, never an output before the input. If you allow non-causal parts in your total system, some or all of the negative delay could of course come from there, but let us stick with physical systems here.
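A negative group delay from a causal, stable, minimum phase system can be sketched with a simple lead network, H(s) = (1 + s/z)/(1 + s/p) with z < p. The values z = 1 rad/s and p = 10 rad/s are illustrative assumptions. Its group delay at DC is 1/p - 1/z = -0.9 s. Its impulse response, (p/z)·δ(t) + (p/z)(z - p)·e^(-pt) for t ≥ 0, is nevertheless exactly zero for t < 0.

```python
import cmath

z, p = 1.0, 10.0  # illustrative zero and pole frequencies in rad/s (z < p)

def lead(s):
    # minimum phase, causal and stable: zero at s = -z, pole at s = -p
    return (1 + s / z) / (1 + s / p)

def group_delay(H, w, dw=1e-5):
    ratio = H(1j * (w + dw)) / H(1j * (w - dw))
    return -cmath.phase(ratio) / (2 * dw)

def group_delay_exact(w):
    # -d/dw [atan(w/z) - atan(w/p)]
    return p / (p**2 + w**2) - z / (z**2 + w**2)

for w in (0.0, 0.5, 3.0, 30.0):
    print(f"w = {w:5.1f} rad/s: numeric {group_delay(lead, w):+.4f} s, "
          f"exact {group_delay_exact(w):+.4f} s")

# Partial fractions: H(s) = (p/z) * [1 + (z - p)/(s + p)], so
# h(t) = (p/z)*delta(t) + (p/z)*(z - p)*exp(-p*t) for t >= 0, and 0 for t < 0.
impulse_weight = p / z
tail_amplitude = (p / z) * (z - p)
print(f"delta weight {impulse_weight}, exponential tail amplitude {tail_amplitude}")
```

The group delay is negative at low frequencies and turns positive further up. The "early" look in steady state comes from the immediate δ(t) term, not from any output before the input.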
A general (causal and stable) transfer function can be split into a minimum phase part and an excess phase part. The excess phase part, in turn, generally has two components:

- an allpass part, with zeros in the right half-plane (RHS/RHP), each mirrored by a pole in the left half-plane;
- a 'true delay' part with linear phase.

Both of these excess phase parts have constant (unity) magnitude. The latter must keep its linear phase all the way to infinite frequency to be a true delay. This is difficult to check in practice, so a true delay can be hard to separate out, since an allpass function can approximate a true delay rather well.
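In symbols, the split reads H(s) = H_mp(s) · H_ap(s) · e^(-sT). The sketch below shows the factorization for the simplest non-minimum phase case, assuming T = 0 and the illustrative example H(s) = (a - s)/(s + b), which has a zero at s = +a. Reflecting that zero into the left half-plane gives the minimum phase part H_mp(s) = (s + a)/(s + b). What is left over is the first-order allpass H_ap(s) = (a - s)/(a + s). The allpass carries all the excess phase, -2·atan(ω/a), and none of the magnitude.

```python
import cmath
import math

a, b = 2.0, 5.0  # illustrative: RHP zero at s = +a, pole at s = -b (rad/s)

def H(s):
    # non-minimum phase: zero in the right half-plane
    return (a - s) / (s + b)

def H_mp(s):
    # minimum phase part: the RHP zero reflected to s = -a
    return (s + a) / (s + b)

def H_ap(s):
    # first-order allpass carrying the excess phase
    return (a - s) / (a + s)

for w in (0.1, 2.0, 20.0):
    s = 1j * w
    print(f"w = {w:5.1f}: |H| = {abs(H(s)):.4f}, |H_mp| = {abs(H_mp(s)):.4f}, "
          f"|H_ap| = {abs(H_ap(s)):.4f}, "
          f"excess phase = {math.degrees(cmath.phase(H(s) / H_mp(s))):7.2f} deg")
```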
A minimum phase transfer function can thus not contain any true delay, so we know a priori that any input will give an immediate output. There may, however, be some linear distortion; one indicator of this is how much the phase delay differs from the group delay. We assume our systems to be causal and stable, and so we can find a causal and stable inverse of the minimum phase part (in practice the inverse may need band-limiting if it would otherwise have more zeros than poles). The allpass part adds to the linear distortion, but the time delay part does not. The problem comes when we try to invert the excess phase part. Either we end up with an unstable inverse of the allpass part, because cancelling its RHS zeros requires RHS poles, or we run into problems with causality, because the inverse must be anticipatory to counter the true delay in the system.
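A discrete-time analogue makes the inversion problem tangible. This is a swap to sampled signals for convenience, not the continuous-time setting of the text. The two-tap filters [1, 0.5] and [0.5, 1] have identical magnitude responses. The first has its zero inside the unit circle (minimum phase); the second has it outside, and equals the first times an allpass. The sketch runs the causal recursion that inverts each one. The helper name is hypothetical.

```python
def causal_inverse_impulse_response(h0, h1, n):
    """First n samples of the causal inverse of the FIR filter h0 + h1*z^-1,
    computed with the recursion g[k] = (delta[k] - h1*g[k-1]) / h0."""
    g = []
    prev = 0.0
    for k in range(n):
        x = 1.0 if k == 0 else 0.0
        prev = (x - h1 * prev) / h0
        g.append(prev)
    return g

mp_inverse = causal_inverse_impulse_response(1.0, 0.5, 30)   # zero at z = -0.5
nmp_inverse = causal_inverse_impulse_response(0.5, 1.0, 30)  # zero at z = -2

print("minimum phase inverse, last sample:    ", mp_inverse[-1])
print("non-minimum phase inverse, last sample:", nmp_inverse[-1])
```

The minimum phase inverse decays geometrically, while the causal inverse of the other filter grows without bound. Running that second recursion backwards in time (an anti-causal, anticipatory inverse) would be stable instead, which mirrors the stability-versus-causality trade-off described above.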
The minimum phase transfer function on its own may be all that is needed to model some relevant system, such as many (lumped) transducer models. It is, however, not quite accurate to say that a system(!) is minimum phase. For example, the transfer function from input voltage to the displacement at some point on the membrane or surround may seemingly be minimum phase (mp), but move to another spatial point, and it is no longer mp. When people say that a loudspeaker driver is minimum phase, they typically mean that the transfer function from input voltage to the pressure at some distance is minimum phase. Even this may break down at higher frequencies, as the spatial 3D aspect collides with the notion of 0D transfer functions. Reasons include modal behaviour, the lack of a single well-defined distance from the microphone point to the various points across the membrane at these short wavelengths, and temperature changes affecting the speed of sound, which matters more at higher frequencies. The minimum phase property generally goes out the window for a complete multi-driver loudspeaker. The crossover alone is likely non-minimum phase (an allpass pressure response is what you are aiming for, and a non-trivial allpass is by definition non-mp), and summing mp transfer functions does not guarantee a resulting mp transfer function.
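The crossover point can be sketched with a fourth-order Linkwitz-Riley crossover, assuming a normalized crossover frequency of 1 rad/s and the textbook form:

- LP(s) = 1/D(s)²
- HP(s) = s⁴/D(s)²
- D(s) = s² + √2·s + 1

Neither filter has zeros in the right half-plane. Since 1 + s⁴ = (s² - √2·s + 1)(s² + √2·s + 1), however, their sum reduces to the second-order allpass (s² - √2·s + 1)/(s² + √2·s + 1), which has two RHP zeros. Summing two mp transfer functions has thus produced a non-mp result.

```python
import cmath
import math

SQRT2 = math.sqrt(2.0)

def butter2(s):
    # 2nd-order Butterworth denominator, crossover normalized to 1 rad/s
    return s * s + SQRT2 * s + 1

def lr4_lowpass(s):
    return 1 / butter2(s) ** 2

def lr4_highpass(s):
    return s**4 / butter2(s) ** 2

def lr4_sum(s):
    return lr4_lowpass(s) + lr4_highpass(s)

def allpass2(s):
    # (s^2 - sqrt2*s + 1) / (s^2 + sqrt2*s + 1)
    return (s * s - SQRT2 * s + 1) / butter2(s)

for w in (0.1, 1.0, 10.0):
    s = 1j * w
    print(f"w = {w:5.1f}: |LP+HP| = {abs(lr4_sum(s)):.6f}, "
          f"phase = {math.degrees(cmath.phase(lr4_sum(s))):8.2f} deg")

# Zeros of the summed response: roots of s^2 - sqrt2*s + 1 (after cancellation)
disc = cmath.sqrt(2.0 - 4.0)
zeros = [(SQRT2 + disc) / 2, (SQRT2 - disc) / 2]
print("zeros of LP+HP:", zeros)  # real parts are +sqrt2/2 > 0: right half-plane
```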
Both the mp and the allpass transfer functions are (typically) described, or can be approximated, by so-called rational transfer functions, with polynomials in the numerator and denominator. Rational functions cannot exactly describe a pure time delay, although, with a finite number of poles and zeros, they can approximate one up to a certain frequency. This is very useful to know. Many physical systems are given via lumped models, which can be described by rational transfer functions, and so you know that they contain no true delay.
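One standard way to approximate e^(-sT) with a rational function is the Padé approximant. It is swapped in here as an illustration; the text does not name a particular method. The first- and second-order approximants are allpass functions with RHP zeros. The sketch assumes T = 1 ms and measures each approximant's phase error relative to the exact delay. The error grows with ωT, and the higher order only pushes the usable frequency range upward.

```python
import cmath
import math

T = 1e-3  # illustrative delay: 1 ms

def pade1(s):
    # first-order Pade approximant of exp(-s*T): one LHP pole, one RHP zero
    return (1 - s * T / 2) / (1 + s * T / 2)

def pade2(s):
    # second-order Pade approximant: two LHP poles, two RHP zeros
    x = s * T
    return (1 - x / 2 + x * x / 12) / (1 + x / 2 + x * x / 12)

def phase_error(H, w):
    # phase of H relative to the exact delay exp(-j*w*T), in radians
    return cmath.phase(H(1j * w) / cmath.exp(-1j * w * T))

for wT in (0.1, 0.5, 1.0, 2.0):
    w = wT / T
    print(f"w*T = {wT:3.1f}: order 1 error {math.degrees(phase_error(pade1, w)):8.4f} deg, "
          f"order 2 error {math.degrees(phase_error(pade2, w)):8.4f} deg")
```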
No need to think about waves and possible dispersion (in acoustics we typically work with a constant speed of sound, which makes it simpler) until these signal processing aspects are well understood.