Only did one and four. Well, four is longer than one. Four sounds a little louder than one.
Why? Because somewhere in the 150 millisecond range is the integration time of the ear. At somewhat longer durations the perceived loudness becomes stable. I was aware of this effect, but had to look up the particulars. Here is the first place I found the info. It would be in most college-level texts on psychoacoustics, like B.C.J. Moore's text, which I have a copy of, though it wasn't handy.
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.580.3611&rep=rep1&type=pdf
It is assumed that the auditory system contains a temporal energy integrator, i.e. it performs a summation of the input signal. A simple way of estimating the relationship between thresholds and durations is to plot threshold against duration on a dB vs. logarithmic-time scale. Data will fall roughly on a straight line with a slope of -3dB per doubling of log duration. Letting J represent the integration time of the auditory system, several scientists have estimated its magnitude, - the estimated values lying in the region of 50-200 ms. Some researchers report that J is greater at low frequencies while others have found no frequency dependency. Another way of determining the integration ability of the auditory system is to present equal energy tone bursts of different durations. An ideal energy integration would imply that the detectability of these tone bursts would be independent of duration. According to Green, this is only the case in the region 15-150 ms outside of which the detectability will fall off. The fall off at long durations indicates that the integration operation is time delimited, while the fall off at very short durations might be a result of the spread of energy over the frequency range that occurs for short duration pulses. Other scientists have found similar results, but again there is some variation in the results, - some scientists reporting frequency dependency (low frequency, long duration and vice versa) and others do not. According to [4], and essentially also [1], the integration time is frequency dependent, about 60 ms up to 1000 Hz decreasing linearly to around 10 ms at 5 kHz. This is fairly consistent with an effective time constant (or integration time) for speech around 20 ms. According to Niese, speech intelligibility can be predicted using an integration of so-called useful energy in the range up to 17 ms (full weight) and a linearly decreasing weight factor in the range 17-30 ms.
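As a rough illustration of the quoted "-3 dB per doubling" relationship, here is a minimal sketch of an idealized energy integrator. The 150 ms integration time is an assumption picked from within the 50-200 ms range quoted above, and the hard cutoff at that duration is a simplification; the function name is made up for this example.

```python
import math

def threshold_shift_db(duration_ms, integration_ms=150.0):
    """Threshold elevation in dB, relative to a long tone, for an
    idealized energy integrator (assumed model, hard cutoff at
    integration_ms). Below the integration time, threshold energy is
    constant, so halving the duration requires doubling the intensity."""
    effective_ms = min(duration_ms, integration_ms)
    return 10.0 * math.log10(integration_ms / effective_ms)

# Hypothetical burst durations, just to show the trend.
for d in (10, 20, 40, 80, 160, 320):
    print(f"{d:4d} ms: +{threshold_shift_db(d):.1f} dB")
```

Each doubling of duration below the assumed integration time lowers the threshold by about 3 dB, and past it the threshold stops changing, which is the flattening the passage describes.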
You also could look at this powerpoint on the matter.
http://depts.washington.edu/sphsc461/temp_res/temp_res.pdf
Perfect!
So my point was that expectations based on the theory of Linear Time Invariant (LTI) systems, which are traditionally analyzed with the help of the Fourier transform, break down for the Mammal Hearing System (MHS), which is neither linear nor time invariant.
In LTI, we care about durations, frequencies, sampling rates, and amplitudes in the time and frequency domains. In MHS, we also have to care about onset times, recuperation periods, levels of perceived loudness, inter-frequency masking, etc. "Four sounds a little louder than One" is not what LTI predicts, yet it makes perfect sense in the MHS framework.
The experiment illustrates at least two things:
(1) In MHS, perceived loudness depends not only on amplitude, but also on duration. This is a robust effect, linked to the hearing system's "slow" integrator, which operates over tens of milliseconds. There is also a less robust effect, not demonstrated by this experiment, due to a "fast" integrator, which operates over tens of microseconds and makes the perceived onset time depend on the amplitude.
(2) Some of you will be able to differentiate between One and Two, some not. The same goes for Two and Three. Virtually everyone will be able to differentiate between One and Four. And this is for the "slow" integrator, which is considered rather consistent! Individual differences in the functioning of the fast integrator are more difficult to elicit experimentally, yet they do exist.
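The duration dependence in point (1) can be sketched with a toy model: a first-order leaky integrator fed a constant-intensity burst. This is only an assumed stand-in for the "slow" integrator, not a model of actual loudness perception; the 100 ms time constant and the burst durations are hypothetical and are not the durations used in the experiment.

```python
import math

def peak_integrated_level_db(duration_ms, tau_ms=100.0):
    """Peak output of a first-order leaky integrator driven by a
    constant-intensity burst, in dB relative to the steady-state output
    for an indefinitely long tone. tau_ms is an assumed time constant."""
    # The integrator output rises as 1 - exp(-t / tau) and peaks at burst end.
    peak = 1.0 - math.exp(-duration_ms / tau_ms)
    return 10.0 * math.log10(peak)

# Hypothetical burst durations with identical amplitude.
for d in (25, 50, 100, 200, 400):
    print(f"{d:4d} ms burst: {peak_integrated_level_db(d):6.2f} dB re steady state")
```

In this sketch, bursts of equal amplitude reach a higher integrated level the longer they last, and the level settles once the duration is several time constants long. Small differences between neighboring durations and a clear difference between the shortest and longest burst are at least consistent with what people report for One through Four.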
Qualitatively, the number of dimensions LTI operates in is smaller than the number of MHS dimensions. If we hold the values in one or more MHS dimensions constant, we take those dimensions out of play, and MHS behavior then follows the LTI-predicted behavior more closely.
That's the general reason why "simple" music, mostly consisting of a small number of sinusoids slowly changing their amplitudes and frequencies over time, is more readily amenable to LTI analysis. The effects of the perceptual integrators fade away. Onset times matter less.
"Complex" music, with a large number of sinusoids exhibiting fast and frequent onsets and fadeouts, chirps, and transients, is not as amenable to LTI analysis. The integrators play an important role in this case, so we had better preserve the onset-time information more accurately.