The safe temperature values depend upon the components used; for example, you can get capacitors in 65, 85, 105, 125, and 140 degC versions (among others). There are some defined temperature qualification ranges we could use: 0 to 70 degC for consumer, -40 to 85 degC for industrial, and -55 to 125 degC for military-spec components. There are variations on each theme, natch, and these are component ratings, not external case ratings. If your amplifier's case is at 70 degC (about 160 degF, too hot to touch), then internal components are probably 40 degC or so warmer (110 degC), too hot for comfort.
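A quick sketch of that arithmetic, assuming the rough 40 degC case-to-internal rise holds (in practice it varies a lot with layout and airflow). The function names are just for illustration; the ranges and capacitor ratings are the nominal ones quoted above, and no derating margin is applied.

```python
# Nominal component qualification ranges quoted in the text (degC).
RANGES_C = {
    "consumer": (0.0, 70.0),
    "industrial": (-40.0, 85.0),
    "military": (-55.0, 125.0),
}
# Example capacitor temperature ratings mentioned in the text (degC).
CAP_RATINGS_C = [65, 85, 105, 125, 140]


def c_to_f(deg_c: float) -> float:
    """Convert degrees Celsius to degrees Fahrenheit."""
    return deg_c * 9.0 / 5.0 + 32.0


def estimated_internal_c(case_c: float, rise_c: float = 40.0) -> float:
    """Rough internal temperature: case temperature plus an ASSUMED rise (40 degC default)."""
    return case_c + rise_c


def grades_within_rating(internal_c: float) -> list[str]:
    """Qualification ranges that still cover the estimated internal temperature."""
    return [name for name, (lo, hi) in RANGES_C.items() if lo <= internal_c <= hi]


def caps_within_rating(internal_c: float) -> list[int]:
    """Capacitor ratings at or above the estimated internal temperature (no derating)."""
    return [r for r in CAP_RATINGS_C if r >= internal_c]


if __name__ == "__main__":
    case_c = 70.0
    internal_c = estimated_internal_c(case_c)
    print(f"case: {case_c:.0f} degC = {c_to_f(case_c):.0f} degF")
    print(f"estimated internal: {internal_c:.0f} degC")
    print("qualification ranges still OK:", grades_within_rating(internal_c))
    print("capacitor ratings still OK:", caps_within_rating(internal_c))
```

With a 70 degC case, the estimate comes out at 110 degC inside, which only the military range and the 125 and 140 degC capacitors cover.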
I have noticed that during recent Stereophile testing there have been several amplifiers that shut down during preconditioning and/or would not reach their full power ratings at low impedance without protection circuitry being engaged. (I have about six months of back issues and am slowly catching up whilst I have some time off.)
Rather than music, how about using pink or other colored noise, or some other signal with a defined crest factor, for power tests? That has been proposed before, but I think it was shot down as too stringent (too hard on amplifiers). My memory is foggy again, but this goes back to the 1970s and 1980s, when the power wars were ongoing and various standards bodies were trying to develop something reasonable. These days, we could create all sorts of fancy test signals. How about a multitone test with 10 or more tones across the 20 Hz - 20 kHz bandwidth? That could provide a good crest factor for testing. Or a number of tones weighted to follow the Fletcher-Munson loudness curves (but at what level?). For me, the problem is deciding which test conditions are reasonable and realistic, not our ability to create and run them.
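To put numbers on the crest-factor idea, here is a minimal sketch of a 10-tone signal with log-spaced tones from 20 Hz to 20 kHz. The tone count, rounding to whole hertz (so one second holds whole cycles of every tone), the 96 kHz sample rate, and the random phases are my assumptions, not any standard.

```python
import math
import random


def log_spaced_tones(n, f_lo=20.0, f_hi=20000.0):
    """n tone frequencies evenly spaced on a log axis, rounded to whole hertz
    so a 1-second record holds an integer number of cycles of every tone."""
    ratio = (f_hi / f_lo) ** (1.0 / (n - 1)) if n > 1 else 1.0
    return [round(f_lo * ratio ** k) for k in range(n)]


def multitone(freqs, phases, fs=96000):
    """One second of equal-amplitude cosines, scaled so the peak can never exceed 1.0."""
    amp = 1.0 / len(freqs)
    return [
        sum(amp * math.cos(2 * math.pi * f * i / fs + p) for f, p in zip(freqs, phases))
        for i in range(fs)
    ]


def crest_factor_db(x):
    """Sampled peak over rms, in dB (true inter-sample peaks may be a bit higher)."""
    peak = max(abs(v) for v in x)
    rms = math.sqrt(sum(v * v for v in x) / len(x))
    return 20 * math.log10(peak / rms)


if __name__ == "__main__":
    freqs = log_spaced_tones(10)
    rng = random.Random(1)  # fixed seed so the result is repeatable
    zero = multitone(freqs, [0.0] * len(freqs))
    rand = multitone(freqs, [rng.uniform(0, 2 * math.pi) for _ in freqs])
    print("tones (Hz):", freqs)
    print(f"zero-phase crest factor:   {crest_factor_db(zero):.2f} dB")
    print(f"random-phase crest factor: {crest_factor_db(rand):.2f} dB")
```

All tones in phase is the worst case: they line up at t = 0, so the crest factor is 10*log10(2n) dB, about 13 dB for ten tones. Any other phase set can only match or lower that, since the peak can't exceed the sum of the amplitudes while the rms stays the same.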
What I did in the primordial past, back when I worked "in the biz" and had audio test equipment and all the gear to play with:
- Standard frequency response at 1 W into resistive dummy load;
- SINAD (THD+N) sweeps to full power (typically low duty cycle);
- IMD sweeps or spot checks (e.g. 100 Hz, 1 kHz, 10 kHz);
- A few square-wave signals;
- Full-power test (FTC, including pre-conditioning);
- IHF burst test;
- Overload test -- pulsed signal at 3 dB over max output to assess clipping and recovery behavior; and,
- Pink noise at -20 dB and -10 dB (the latter would usually cause some clipping).
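For the level-based items in that list, the drive arithmetic is simple. This sketch assumes a hypothetical 100 W / 8-ohm rating and reads the pink-noise levels as dB relative to rated power, which the list doesn't actually state.

```python
import math

RATED_W = 100.0   # HYPOTHETICAL rated power
LOAD_OHMS = 8.0   # resistive dummy load


def power_at_db(rated_w, db_re_rated):
    """Power for a level in dB relative to rated power (a power ratio, so /10)."""
    return rated_w * 10 ** (db_re_rated / 10)


def rms_volts(power_w, load_ohms):
    """RMS voltage across a resistive load: V = sqrt(P * R)."""
    return math.sqrt(power_w * load_ohms)


if __name__ == "__main__":
    levels = {
        "1 W response sweep": 1.0,
        "pink noise -20 dB": power_at_db(RATED_W, -20),
        "pink noise -10 dB": power_at_db(RATED_W, -10),
        "full power": RATED_W,
        "overload +3 dB": power_at_db(RATED_W, 3),
    }
    for name, p in levels.items():
        print(f"{name:20s} {p:7.1f} W  {rms_volts(p, LOAD_OHMS):6.2f} V rms")
```

For a 100 W amp, -20 dB works out to 1 W (2.83 V rms into 8 ohms) and -10 dB to 10 W (8.94 V), while the +3 dB overload pulse asks for about 200 W (about 40 V). Since noise peaks sit well above its rms level, the -10 dB run reaching clipping is not surprising.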
Pretty sure my original list had ten things on it, but I don't have the list anymore (at least not anywhere I would be able to find it). I did run the slew/TIM test that was (I think) a 100 Hz square wave with a 10 kHz sine wave riding on it, or something like that, and some other burst tests, like five full-power cycles at some frequency, so I could see how quickly the amplifier recovered. At lower power (usually), it was interesting to see what happened with such a test signal when driving speakers. I did make up some speaker dummy loads with a couple of resonators to add a peak and valley at LF (100 Hz) and HF (10 kHz). Not a Power Cube, just a home-brew reactance network I added to my dummy loads that emulated some of the gnarlier speaker impedance plots I measured at the time. And as I have said before, my dummy loads were big gold-finned power resistors stuck in gallon paint cans filled with transformer oil. I had 4- and 8-ohm loads and could create other values by combining those.
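A home-brew reactive load like that can be modeled with complex impedances. This is a minimal sketch, not the network described above (its values aren't given): a resistor in series with a parallel RLC tank gives an impedance peak at the tank's resonance, and a series RLC branch across the terminals gives a dip at its resonance. Putting the peak at 100 Hz and the dip at 10 kHz is one reading of "peak and valley"; all component values are hypothetical, and inductor resistance is ignored.

```python
import cmath
import math


def series(*z):
    """Impedances in series add."""
    return sum(z)


def parallel(*z):
    """Impedances in parallel: reciprocal of the summed admittances."""
    return 1 / sum(1 / zi for zi in z)


def z_inductor(l_h, f_hz):
    return 1j * 2 * math.pi * f_hz * l_h


def z_capacitor(c_f, f_hz):
    return 1 / (1j * 2 * math.pi * f_hz * c_f)


def c_for_resonance(l_h, f0_hz):
    """Capacitance that resonates with l_h at f0: f0 = 1 / (2*pi*sqrt(L*C))."""
    return 1 / ((2 * math.pi * f0_hz) ** 2 * l_h)


# HYPOTHETICAL values, chosen only to illustrate the shape.
R0 = 8.0                                   # base dummy-load resistor (ohms)
LF_F0, LF_L, LF_R = 100.0, 10e-3, 20.0     # LF peak: parallel tank in series with R0
LF_C = c_for_resonance(LF_L, LF_F0)
HF_F0, HF_L, HF_R = 10e3, 0.1e-3, 4.0      # HF dip: series RLC across the terminals
HF_C = c_for_resonance(HF_L, HF_F0)


def load_impedance(f_hz):
    tank = parallel(LF_R, z_inductor(LF_L, f_hz), z_capacitor(LF_C, f_hz))
    shunt = series(HF_R, z_inductor(HF_L, f_hz), z_capacitor(HF_C, f_hz))
    return parallel(series(R0, tank), shunt)


if __name__ == "__main__":
    for f in (20, 50, 100, 200, 1000, 5000, 10000, 20000):
        z = load_impedance(f)
        print(f"{f:6d} Hz  |Z| = {abs(z):6.2f} ohm  phase = {math.degrees(cmath.phase(z)):6.1f} deg")
```

The same helpers cover combining plain loads: parallel(8, 8) gives 4 ohms, series(4, 8) gives 12 ohms, and parallel(4, 8) about 2.67 ohms.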
Steady-state responses were (and are) typically well-defined and well-measured by manufacturers, so while I measured those when repairing or checking amplifiers, I got caught up in the transient, time-domain response of amplifiers to various "burst" conditions in my long-ago search for what differentiated amplifiers. Sometimes it was very revealing, sometimes not. Remember that back then there was no nice digital analysis gear like we have now, or at least I did not have any. (I had Nak and HP audio analyzers plus all the usual test gear like analog 'scopes and meters, including my big HP rms voltmeter, and a bunch of commercial and DIY filters for testing.) I could grab frequency sweeps on a spectrum analyzer and take a screen shot with a Polaroid camera pack, hand-record the THD numbers and graphs, etc. (HP made a chart recorder, but it was pricey and the places I worked did not have one, though the local college did, so I got to play with it some.) Burst response and recovery I could see on a 'scope with persistence (phosphor, not digital memory like today), and again I grabbed the Polaroid to capture what I saw.
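Burst signals like those are easy to generate digitally now. A minimal sketch of a gated sine burst, assuming the burst starts at a zero crossing and holds whole cycles; the 1 kHz, five-cycle, 95 ms-off example is hypothetical and is not the IHF burst specification.

```python
import math


def tone_burst(freq_hz, cycles_on, off_s, fs=48000, amplitude=1.0):
    """One period of a gated sine: `cycles_on` whole cycles starting at a zero
    crossing, followed by `off_s` seconds of silence (recovery window)."""
    on_samples = round(cycles_on * fs / freq_hz)
    off_samples = round(off_s * fs)
    on = [amplitude * math.sin(2 * math.pi * freq_hz * i / fs) for i in range(on_samples)]
    return on + [0.0] * off_samples


def duty_cycle(freq_hz, cycles_on, off_s):
    """Fraction of time the burst is on."""
    on_s = cycles_on / freq_hz
    return on_s / (on_s + off_s)


if __name__ == "__main__":
    burst = tone_burst(1000, 5, 0.095)   # HYPOTHETICAL: five 1 kHz cycles, 95 ms off
    print(len(burst), "samples per period")
    print(f"duty cycle: {duty_cycle(1000, 5, 0.095):.1%}")
```

Five 1 kHz cycles followed by 95 ms of silence is a 5% duty cycle; concatenating the list gives a continuous burst train, and scaling `amplitude` sets the drive level.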
I could easily spend a day or two testing an amplifier, and had a blast doing it. Sometimes literally...
And at the end of the day, I found that pretty high levels of distortion (1% to 10% or more, depending upon frequency and the type of distortion) were essentially undetectable when music was playing. I added distortion before the amplifier to emulate high nonlinearity, and in blind tests it was rarely detected. Music is often complex and includes so many harmonic and non-harmonic signal components that the distortion added by the amps was lost in the mud.
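A digital stand-in for adding distortion ahead of the amp: a memoryless polynomial nonlinearity (my choice, not necessarily what was used back then) applied to a 1 kHz sine, with THD measured from single DFT bins. The record holds exactly 100 cycles so each harmonic lands in its own bin; the coefficients a2 and a3 are hypothetical.

```python
import math


def waveshape(x, a2=0.0, a3=0.0):
    """Memoryless polynomial nonlinearity y = x + a2*x^2 + a3*x^3, sample by sample."""
    return [v + a2 * v * v + a3 * v * v * v for v in x]


def sine(f_hz, fs, n):
    return [math.sin(2 * math.pi * f_hz * i / fs) for i in range(n)]


def tone_amplitude(x, f_hz, fs):
    """Amplitude of the component at f_hz from a single DFT bin.
    Accurate only when the record holds a whole number of cycles of f_hz."""
    n = len(x)
    w = 2 * math.pi * f_hz / fs
    re = sum(v * math.cos(w * i) for i, v in enumerate(x))
    im = sum(v * math.sin(w * i) for i, v in enumerate(x))
    return 2 * math.hypot(re, im) / n


def thd_percent(x, f0_hz, fs, max_order=10):
    """THD (harmonics 2..max_order below Nyquist, no noise) relative to the fundamental."""
    fund = tone_amplitude(x, f0_hz, fs)
    harmonics = [tone_amplitude(x, k * f0_hz, fs)
                 for k in range(2, max_order + 1) if k * f0_hz < fs / 2]
    return 100 * math.sqrt(sum(h * h for h in harmonics)) / fund


if __name__ == "__main__":
    FS, F0, N = 48000, 1000, 4800  # 0.1 s record = exactly 100 cycles of 1 kHz
    y = waveshape(sine(F0, FS, N), a2=0.1, a3=0.1)
    print(f"THD = {thd_percent(y, F0, FS):.2f} %")
```

With a2 = a3 = 0.1 the fundamental grows to 1.075 and the 2nd and 3rd harmonics come out at 0.05 and 0.025, so THD is about 5.2%, squarely in the range that went unnoticed with music. The same waveshape() call works on music samples, where it also produces intermodulation products.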
Oops, this got long, sorry! This is why I should not take days off.
FWIWFM - Don