#### Introduction

In the engineering of electronic devices for music production or reproduction, all designers face the problem of minimizing the various forms of distortion that their creations inevitably introduce into the audio signal. These distortions are commonly classified into two categories: **linear** and **nonlinear**. With reference to the representation of the signal in the frequency domain, the former are characterized by alterations of the amplitude and phase of the signal. The latter instead add new frequency components not present in the original signal, due to the nonlinear interaction of the frequencies present in the input signal. Both forms of distortion are often introduced deliberately in production to give particular character to voices or musical instruments, or to manage loudness and dynamics. In playback, however, they are to be avoided, especially the nonlinear ones, since their effects on our perception, while possibly slightly "euphonic", are not reversible and are still being studied. It should nonetheless be noted that some audiophiles prefer, even unconsciously, the presence of certain types of nonlinear distortion in playback. More specifically, common experience shows that modest amounts of low-order nonlinear distortion can give the sound properties such as:

- 2nd order distortion: “warmer” and “softer” sound;
- 3rd order distortion: sound with more “dynamic contrast”.

The aim of this study is to investigate the cause of these effects on perception, looking for a correlation between the physical aspects of nonlinear distortion on signals and some results in the fields of psychoacoustics and neuroscience. Beyond measurements and, above all, the listening tests supporting the above (not detailed here), the main investigation tool will be computer simulation of the behavior of some families of nonlinear systems driven by music-like signals. This will lead us to define numerical indicators useful for qualitative speculation about the effects on perception.

**Nonlinear distortion measurement**

The commonly adopted approach to measuring nonlinear distortion in audio devices such as an amplifier assumes that the device is linear and time-invariant, and then looks for deviations from this ideal model. We then measure the amount of new tones generated by the device when it is driven by single or multiple tones (sinusoidal components at fixed frequencies). In the first case we obtain new harmonic components at multiples of the fundamental frequency; in the second, intermodulation products resulting from the interaction of two or more tones, at frequencies given by their linear combinations. We then quantify the former with the THD (Total Harmonic Distortion) and the latter with the IMD (Intermodulation Distortion). Both indicators are calculated as the ratio between the RMS (Root Mean Square) value of the components introduced by the device and the RMS value of the source signal. For example, the bottom of Figure 1 shows the distortion introduced by a real tube preamp (6H30 in a Mu-follower stage) for a 1KHz sinusoidal signal.

Fig. 1 - Measurement of nonlinear distortion: two tones @19KHz+20KHz (CCIF test, top) and a single tone at 1KHz (bottom).

Here the 2nd order harmonic (at 2KHz) dominates, while the 3rd (at 3KHz) is very small; the measured THD is -72dB. The higher orders all have lower levels and are not detectable here; the other tones present at lower frequencies are independent of the signal and are due to the power supply.
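To make the RMS-ratio definition concrete, the following minimal sketch converts individual harmonic levels, assumed to be given in dB relative to the fundamental, into a THD figure. The function name and the input format are illustrative assumptions, not part of the original measurement setup.

```python
import math


def thd_from_levels_db(harmonic_levels_db):
    """THD as the ratio between the RMS sum of the harmonics and the fundamental.

    harmonic_levels_db: levels of HD2, HD3, ... in dB relative to the
    fundamental (an assumed input format). Returns (ratio, percent, dB).
    """
    # 10**(L/10) is the squared amplitude ratio, so the sum is a power sum.
    power = sum(10 ** (level / 10) for level in harmonic_levels_db)
    ratio = math.sqrt(power)
    return ratio, 100 * ratio, 20 * math.log10(ratio)


# A single harmonic at -60dB gives a THD of 0.1%.
print(thd_from_levels_db([-60]))
# Two harmonics at -60dB each add in power: about -57dB overall.
print(thd_from_levels_db([-60, -60]))
```

Summing powers rather than amplitudes is what makes two equal harmonics raise the THD by about 3dB rather than 6dB.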

With two tones at 19KHz and 20KHz in a 1:1 ratio, at the top of the same figure, we see the intermodulation products at 1KHz and in the sidebands. The measured IMD is -76dB. Increasing the number of tones causes a considerable proliferation of intermodulation products, in this case at frequencies given by linear combinations of all pairs or triplets of original tones, which form a sort of “carpet” at non-harmonic frequencies. Finally, other measurements can detect the THD and IMD levels as a function of frequency or of the input signal level.
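The positions of the CCIF intermodulation products can be checked by enumerating the frequencies $|m f_1 + n f_2|$, where $|m| + |n|$ is the order of the product. This is a minimal sketch; the function name and the restriction to products up to 3rd order are illustrative assumptions.

```python
from itertools import product


def im_products(f1, f2, max_order=3):
    """Frequencies |m*f1 + n*f2| of the intermodulation products of order
    |m| + |n|, for 2 <= order <= max_order. Both m and n are non-zero, so the
    pure harmonics of either tone are excluded."""
    result = {}
    for order in range(2, max_order + 1):
        freqs = set()
        for m, n in product(range(-order, order + 1), repeat=2):
            if m != 0 and n != 0 and abs(m) + abs(n) == order:
                freqs.add(abs(m * f1 + n * f2))
        result[order] = sorted(freqs)
    return result


# CCIF test: two tones at 19KHz and 20KHz.
print(im_products(19000, 20000))
# {2: [1000, 39000], 3: [18000, 21000, 58000, 59000]}
```

The 1KHz difference tone is a 2nd order product, while the 18KHz and 21KHz sidebands next to the test tones come from 3rd order terms; the remaining components lie above 38KHz, outside the audio band.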

Listening to this preamp reveals a soft sound, with a slight emphasis on the mid-low frequencies and a modest soundstage, at least when compared to a neutral solid-state preamplifier such as the Threshold FET Ten/e used as a reference. This agrees with common experience on the perception of distortion, since the preamp has predominantly 2nd order distortion. However, none of the measurements or parameters listed provides clear clues to physical alterations of the signal that could explain this perceived character of 2nd order distortion, as distinct from that of the 3rd. So let's try to understand which determining aspects may be hidden in the measurements made.

**Mathematical model of nonlinearity**

One way to study the effects of nonlinear distortion in detail is to model mathematically the input/output behavior of amplifiers with respect to these aspects. We can then build a simulator (a program) that replicates this behavior for any input signal, and thus carry out all our investigations on any type of signal.

The definition of the model starts from identifying the underlying cause of nonlinear distortion in amplifiers: the gain is not constant across input signal levels. In other words, the input/output transfer curve $f(x)$ is not a straight line over the working range but has "imperfections" around 0 or towards the extremes. By modeling this curve and computing the output value for each value of the input signal, we obtain a simulator of our nonlinear system. This simple approach yields a **static** model, in which the value of the output signal depends only on the instantaneous value of the input signal. In real systems, including common audio amplifiers, the output also depends on the values assumed in the past: the nonlinear system is said to be “with memory”, and the corresponding model, much more complex, is of the **dynamic** type.

The static model, which we will adopt in this study, nevertheless constitutes a good approximation of the behavior of amplifiers that show very limited memory effects when operating in their normal working region. This class is recognizable by the following characteristics:

- Flat frequency response, with extension at least up to 100KHz.
- Low harmonic distortion (< 1%), independent of frequency within the audio band.
- Phases of the harmonic distortion components independent of both frequency (within the audio band) and signal level.

Under these hypotheses we can model the transfer function $f(x)$ with a polynomial. For the 2nd and 3rd order distortions we are interested in, we can limit ourselves to third-degree polynomials:

$$f(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3$$

We will also assume that the coefficients are all non-negative; we will remove both of these restrictions later. If we now feed this polynomial with a simple sinusoidal signal, $x(t) = \sin(\omega t)$, the different terms "control" the following:

- $a_0$: DC component, normally zero.
- $a_1 x$: amplification (gain $g$) of the fundamental component.
- $a_2 x^2$: 2nd order distortion, consisting of two components:
  - 2nd harmonic (HD2), phase-shifted by -90 degrees;
  - DC, of the same magnitude as HD2.
- $a_3 x^3$: 3rd order distortion, consisting of two components:
  - 3rd harmonic (HD3), 180 degrees out of phase;
  - a contribution to the fundamental component (HD1), about 10dB higher than HD3.
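This decomposition follows from the identities $\sin^2\theta = (1 - \cos 2\theta)/2$ and $\sin^3\theta = (3\sin\theta - \sin 3\theta)/4$. HD2 and the DC term both have amplitude $a_2/2$, while HD3 has amplitude $a_3/4$ and the contribution to the fundamental is $3a_3/4$, a ratio of 3 (about 9.5dB). As a numerical cross-check, the sketch below feeds one period of a sine into the polynomial and reads each harmonic from a single DFT bin; the function name and the 64-sample period are illustrative choices.

```python
import cmath
import math


def harmonics_of_polynomial(coeffs, max_order=4, n_samples=64):
    """Feed x = sin(wt) into f(x) = sum(coeffs[k] * x**k) and return a dict
    {k: (amplitude, phase_deg)}, where harmonic k is amplitude*sin(k*wt + phase).
    For k = 0 the amplitude is the signed DC value."""
    def f(x):
        return sum(c * x ** k for k, c in enumerate(coeffs))

    # Exactly one period of the fundamental, so each harmonic falls on a bin.
    y = [f(math.sin(2 * math.pi * n / n_samples)) for n in range(n_samples)]
    out = {}
    for k in range(max_order + 1):
        X = sum(y[n] * cmath.exp(-2j * math.pi * k * n / n_samples)
                for n in range(n_samples))
        if k == 0:
            out[0] = (X.real / n_samples, 0.0)
        else:
            # A*sin(k*wt + phi) appears in bin k as (N*A/2) * exp(j*(phi - pi/2)).
            amp = 2 * abs(X) / n_samples
            phase = math.degrees(cmath.phase(X) + math.pi / 2)
            out[k] = (amp, (phase + 180) % 360 - 180)  # wrap to [-180, 180)
    return out


h = harmonics_of_polynomial([0, 1, 0.002, 0.004])
for k in range(4):
    print(k, h[k])
```

With $a_2 = 0.002$ and $a_3 = 0.004$, the DC term, HD2 and HD3 each come out at 0.001 (-60dB), HD2 at -90 degrees and HD3 at 180 degrees, while the fundamental grows to 1.003 because of the $3a_3/4$ contribution.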

Therefore, it is appropriate to distinguish the distortion of order $i$, due to the $i$-th power term of the transfer function, from the harmonic component of order $i$. In formulas, the former is represented by the $i$-th power of the function, $\sin^i(\omega t)$, and the latter by the harmonic $\sin(i\omega t)$.

To perform the discrete simulation of these nonlinear distortions for a preamplifier, the function $f(x)$ is inserted in the flow of operations illustrated in the following block diagram.

Fig. 2 - Scheme of the non-linear distortion simulator.

The discrete input signal $x[m]$ is subjected to the following operations in cascade (chain in the center):

- Upsampling of $x[m]$ by an integer factor $M$, so that all subsequent processing works on a signal sampled at 192KHz or more. This avoids potential aliasing, since the subsequent application of the transfer function $f(x)$ can create harmonics beyond the maximum frequency representable at the original sampling rate.
- Attenuation by the value $g_x$, which simulates the input volume control.
- Application of the nonlinear distortion curve $f(x)$, of gain $a_1$. Optionally, at this stage one can:
  - Apply global negative linear feedback (bottom chain) with factor $\beta$, resulting in a gain $G = a_1 / (1 + \beta a_1)$.
  - Apply the complementary function $g(x)$ in place of $f(x)$, chosen so as to cancel the effect of $f(x)$, i.e. $g(f(x)) = x$.
  - Eliminate the DC component created by the distortion.
- Addition of Gaussian white noise $N[n]$, of configurable level, to simulate the contribution of thermal noise.
- Attenuation by the value $g_y$, which simulates the output volume control.

The signal thus obtained, $y[n]$, represents the distorted signal. The subsequent statistical analyses, however, require separating $y[n]$ into two parts: the distortion component alone, $d[n]$, and the signal $s[n]$, equal to the input signal amplified by the product of the gains listed above ($g_x$, $G$ and $g_y$). The separation is obtained by subtracting from $y[n]$ the same input signal, amplified in a perfectly linear (top chain) and synchronous way.

To focus on the main concepts of the study, in the first simulations reported we will make the following assumptions:

- Absence of feedback ($\beta = 0$).
- Absence of thermal noise ($N[n] = 0$).
- Unity attenuations and gain ($g_x = a_1 = g_y = 1$), so that the input signal $x[m]$ and the output signal $s[n]$ coincide, except for the oversampling.

The preamplifier described in the previous section (which, moreover, has no global feedback) is well represented by this model. As a check, we can generate test signals and compare the output of the amplifier with that of the simulator. As an example, the first diagram in Figure 3 shows the preamplifier output for a series of 8 tones equally spaced in frequency from 10KHz to 17KHz, at a level of -18dB and $f_s$ = 192KHz. The graph clearly shows the intermodulation products in the sidebands of the original tones. The second diagram shows the simulator output for the same source signal, with 2nd and 3rd order distortion levels taken from the single-tone measurements, together with the ‘noise carpet’; the undistorted signal $s[n]$ is shown in red and the distortion component $d[n]$ in green.

Fig. 3 - Measurement (top) and simulation (bottom) with 8 tones @ 10KHz, 11KHz, …, 17KHz; input at 1Vrms; 0dB gain.

The agreement is very good; the small differences are due to neglecting linear distortions (the preamp's frequency response is not perfectly flat in magnitude and phase) and to interactions with the nonlinear distortions of the DAC/ADC converters of the measuring instrument. Note the distortion components coinciding with the original tones, which cannot be detected in the measurements; we will return to them later.

As a last note, using $y[n]$ in listening tests to simulate the distortions of a real amplifier requires some precautions; details are given in this post.

**Analysis of sinusoidal and impulsive signals for single order distortions**

Let's start by analyzing, in the time domain, the shape of the distortions produced by transfer curves that contain only one of the two orders, 2nd or 3rd. This is not a very realistic situation, but it lets us understand how the two orders act separately on the signal. The first two graphs at the top of Figure 4 show these transfer curves in the normalized working range [-1, +1], with $a_0 = 0$ (DC term) and $a_1 = 1$ (unity gain): the one on the left with only HD2 = -60dB ($a_2 = 0.002$, $a_3 = 0$) and the one on the right with only HD3 = -60dB ($a_2 = 0$, $a_3 = 0.004$). For both, the THD is equal to 0.1%. The deviation of each curve from the 45-degree straight line is magnified 100 times to show its trend.

Fig. 4 - Source signal (top) and its 2nd order (middle) and 3rd order (bottom) distortions, last two amplified. Detail, Time simulation.

The lower part shows the deformation undergone by the classic sine wave: the source curve in orange, the distorted curve in blue and the distortion alone in red, again amplified 100 times. The transfer functions act on the signal as follows:

- 2nd order: expansion of the positive values of the signal ($f(x) > x$ for $x > 0$) and compression of the negative ones ($|f(x)| < |x|$ for $x < 0$): asymmetric distortion.
- 3rd order: expansion of both the positive and the negative values of the signal ($|f(x)| > |x|$ for $|x| > 0$): symmetric distortion.
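The coefficient values quoted for Figure 4 follow from the single-tone decomposition, in which HD2 has amplitude $a_2/2$ and HD3 amplitude $a_3/4$ relative to a unit fundamental (neglecting the small $3a_3/4$ contribution to the fundamental). The sketch below, with hypothetical helper names, derives the coefficients from the target levels and checks the expansion and compression conditions on a grid of inputs in [-1, +1].

```python
def coeffs_from_levels(hd2_db=None, hd3_db=None):
    """Return [a0, a1, a2, a3] for unity gain, using HD2 = a2/2 and HD3 = a3/4
    (amplitudes relative to a unit fundamental); None means the order is absent."""
    a2 = 0.0 if hd2_db is None else 2 * 10 ** (hd2_db / 20)
    a3 = 0.0 if hd3_db is None else 4 * 10 ** (hd3_db / 20)
    return [0.0, 1.0, a2, a3]


def f(coeffs, x):
    """Evaluate the transfer polynomial at x."""
    return sum(c * x ** k for k, c in enumerate(coeffs))


only_hd2 = coeffs_from_levels(hd2_db=-60)   # a2 = 0.002
only_hd3 = coeffs_from_levels(hd3_db=-60)   # a3 = 0.004
grid = [i / 100 for i in range(-100, 101) if i != 0]

# 2nd order: expands x > 0, compresses x < 0 (asymmetric).
print(all(f(only_hd2, x) > x for x in grid if x > 0),
      all(abs(f(only_hd2, x)) < abs(x) for x in grid if x < 0))
# 3rd order: expands both polarities (symmetric).
print(all(abs(f(only_hd3, x)) > abs(x) for x in grid))
```

All three checks hold: $x + a_2x^2 = x(1 + a_2x)$ shrinks in magnitude for negative $x$, while $x + a_3x^3 = x(1 + a_3x^2)$ grows for both signs.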

We now use the simulator to study more complex signals. We build a source signal composed of several tones equally spaced in frequency, all at the same level and phase: 100 tones spaced 200Hz apart, from 50Hz to 20KHz, at a level of -40dB to avoid clipping. For the phase we choose the same constant value for all tones, for example -90 degrees, so as to maximize the signal swing. In the time domain all these tones add up constructively within a few short time windows: given the band limitation at 20KHz, the signal takes the form of a train of sinc pulses spaced 1/200 s apart, with alternating phases of 0, -90, -180, -270, 0, ... degrees.
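A sketch of how such a source signal can be generated. The parameter names are hypothetical, and writing the tones as $A\sin(2\pi f_k t + \varphi)$ with $f_k = 50 + 200k$ Hz is our reading of the description above.

```python
import math


def pulse_train_source(fs=192000, n_samples=1921, n_tones=100, f_start=50.0,
                       spacing=200.0, level_db=-40.0, phase_deg=-90.0):
    """Sum of equally spaced tones, all with the same level and phase:
    x[n] = sum_k A * sin(2*pi*f_k*n/fs + phi), with f_k = f_start + k*spacing."""
    amp = 10 ** (level_db / 20)
    phi = math.radians(phase_deg)
    freqs = [f_start + k * spacing for k in range(n_tones)]
    return [sum(amp * math.sin(2 * math.pi * f * n / fs + phi) for f in freqs)
            for n in range(n_samples)]


x = pulse_train_source()
period = int(192000 / 200)   # pulses every 1/200 s = 960 samples at 192KHz
print(x[0], x[period], x[2 * period])
```

With 100 tones at -40dB (amplitude 0.01) the peak is exactly 100 × 0.01 = 1, i.e. full scale. Because every tone is offset by 50Hz from a multiple of 200Hz, each successive pulse is rotated by 2π·50/200 = 90 degrees: with this sign convention the pulse at t = 0 reaches -1, the one at 1/200 s is odd-symmetric and passes through zero at its center, and the one at 2/200 s reaches +1.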

Let’s now simulate the passage of this signal through two amplifiers without memory effects and with 0dB of gain, the first affected only by 2nd order distortion and the second only by 3rd order distortion. In single-tone measurements we would thus observe only the 2nd harmonic and only the 3rd harmonic, respectively. In the simulation we set both levels at -60dB, which corresponds to a THD of 0.1%. Figure 5 shows the detail of a single pulse with a -90 degree phase in channel 1; channels 2 and 3 show only the resulting distortion components, both amplified by 50dB to highlight their trend. The signal is sampled at $f_s$ = 192KHz.

Fig. 5 - Source signal (top) and its 2nd order (middle) and 3rd order (bottom) distortions, last two amplified. Detail, Time simulation.

We observe that:

- 2nd order distortion always takes positive values, proportional to the square of the source signal, compressing the negative half-wave and expanding the positive one (asymmetric distortion). As a consequence, transients are attenuated and enhanced at the same time.
- 3rd order distortion instead enhances both the positive and the negative half-waves, thus increasing the steepness of all transients; the signal excursion also increases symmetrically.

Therefore, the expansive and compressive effects behave as in the simple sinusoidal case. Suppose now that these alterations are representative of what happens more generally with musical signals: which characteristics of our auditory system come into play in determining how we perceive them?

To try to answer this question, we must resort to some notions of psychoacoustics.

As for the effects in the frequency domain, it is now well understood that even non-negligible levels of low-order harmonic distortion are not audible. Simplifying, this is mainly because the ear, when stimulated by a pure tone, sends impulses to the brain that involve a set of frequencies in the "neighborhood" of the fundamental, extending further towards the high frequencies than the low ones and depending on the signal level. This produces the so-called masking effect, which raises the audibility threshold of frequencies close to the louder, masking tone. The amount of masking also depends on the relative phase of the tones involved, and masking also occurs when the signals are separated in time. This gives rise to the so-called "critical bands", which produce different stimuli depending on whether tones fall in the same band or in different bands (the MP3 format relies on this phenomenon for compression). It should be added that masking is less effective for intermodulation products, which generally lie at frequencies much more distant from the original ones, are non-harmonic and therefore "dissonant", and can potentially confuse other tones present in the signal.

As for the temporal effects of distortion, we can refer to recent advances in neuroscience. These have confirmed that, under certain circumstances, our auditory system exhibits a much higher resolution in time than in frequency. While it is widely established that our auditory system can detect frequencies up to around 20KHz, its temporal resolution, that is, the ability to locate transient events in time, is between 6μs and 10μs. Taking the reciprocal, this corresponds to a frequency of around 125KHz, more than six times higher than the audible band. It is therefore hypothesized that these two aspects of a signal, its frequency content and its arrival times, are processed by different parts of our auditory system, probably in a way similar to what our eye does, where the cells of the retina are differentiated into two types, rods and cones, which detect brightness and color separately. Not surprisingly, the MQA format counts "time precision" among the key aspects of its design (further details in this article).
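A quick check of the arithmetic, taking the reciprocal of the temporal resolution (reading the 125KHz figure as the value for the middle of the 6-10μs range is our assumption):

$$f = \frac{1}{\Delta t}: \qquad \frac{1}{10\,\mu\text{s}} = 100\ \text{KHz}, \qquad \frac{1}{8\,\mu\text{s}} = 125\ \text{KHz}, \qquad \frac{1}{6\,\mu\text{s}} \approx 167\ \text{KHz}, \qquad \frac{125\ \text{KHz}}{20\ \text{KHz}} = 6.25$$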

Returning to the physical effects of the distortions shown in the previous diagram, we have seen how they act on transients, with effects of an expansive or compressive type and a consequent advance or delay in time. It is therefore reasonable to hypothesize that nonlinear distortions may also affect our perception of the temporal aspects of sounds, in addition to the well-known effects related to the presence of additional harmonic components. In this study we will focus on the temporal aspect, to find a possible justification for the effects described in the introduction.

An analogy with our visual system regarding the alteration of transients is also intriguing. The 3rd order distortion curve in Figure 5 closely resembles the effect of the image filter known as "unsharp mask". The underlying filter, a Laplacian, simply detects the edges in an image: subtracting its output (possibly attenuated) from the original image adds a sort of double border to each element in the image, where the darker side is further darkened and the lighter side is lightened. This effect is clearly visible at very close range, but when the image is observed from a distance it induces a sensation of greater sharpness in our visual system. A filter corresponding to the 2nd order, on the other hand, is not in common use. An example of applying these filters in a non-aggressive way is shown in the following figure (to be viewed at 100% size).

Fig. 6 - Simulation of 3rd order distortion at left; 2nd order at right; original RAW image in the center.

The image on the left (equivalent to the 3rd order) appears sharper and more "dynamic", with both transients enhanced; in the one on the right (equivalent to the 2nd order) the effect is still present but less pronounced, enhancing the transients in the light tones and compressing those in the dark tones, so that the image appears overall “softer”. Both modified versions seem to improve on the original image: of course, this is not exactly the same processing that our auditory system performs, but it is a nice coincidence with the corresponding effect on sound!
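The double border can be reproduced in one dimension. The sketch below applies Laplacian sharpening, i.e. it subtracts a scaled discrete Laplacian from the signal as described above for the unsharp mask, to a dark-to-light step; the function name and the `amount` parameter are illustrative assumptions.

```python
def laplacian_sharpen(signal, amount=0.5):
    """1-D analogue of Laplacian sharpening: out[n] = s[n] - amount * L[n],
    with the discrete Laplacian L[n] = s[n-1] - 2*s[n] + s[n+1].
    The first and last samples are left unchanged."""
    out = list(signal)
    for n in range(1, len(signal) - 1):
        lap = signal[n - 1] - 2 * signal[n] + signal[n + 1]
        out[n] = signal[n] - amount * lap
    return out


edge = [0.0] * 5 + [1.0] * 5          # a dark-to-light step
print(laplacian_sharpen(edge))
# [0.0, 0.0, 0.0, 0.0, -0.5, 1.5, 1.0, 1.0, 1.0, 1.0]
```

The undershoot on the dark side and the overshoot on the light side are symmetric, which is the pattern the text associates with 3rd order distortion; flat regions are left untouched because their Laplacian is zero.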

**Analysis of music-like signals for single order distortions**

Now let's try to understand how much of the previous section's considerations applies to a signal closer to a musical one. We build a harmonic signal with different phases and intensity decreasing with frequency: 64 tones from 30Hz to 48KHz in 1/6 octave steps, starting at -25dB, with the level decreasing linearly with frequency at -2dB/KHz, each with a random phase. Given the random phases and the frequency relationships, this signal appears erratic over time and is difficult to analyze directly in the time domain. The 2nd and 3rd order distortions, again at -60dB for a single tone, follow the same erratic trend, as shown in the following figure.

Fig. 7 - Source signal (top) and its 2nd order (middle) and 3rd order (bottom) distortions, last two amplified by 55dB. Time simulation, detail.
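A possible construction of this music-like signal is sketched below. Measuring the -2dB/KHz slope from the first tone, using a fixed random seed for the phases and the names used are all our assumptions.

```python
import math
import random


def music_like_source(fs=192000, n_samples=4096, n_tones=64, f_start=30.0,
                      step_octaves=1 / 6, level0_db=-25.0,
                      slope_db_per_khz=-2.0, seed=1):
    """Multitone with 1/6-octave spacing, level falling linearly with
    frequency and random phases. Returns (x, freqs, levels_db)."""
    rng = random.Random(seed)   # fixed seed so runs are repeatable
    freqs = [f_start * 2 ** (k * step_octaves) for k in range(n_tones)]
    # Assumption: the -2dB/KHz slope is measured from the first tone.
    levels_db = [level0_db + slope_db_per_khz * (f - f_start) / 1000
                 for f in freqs]
    tones = [(10 ** (lv / 20), f, rng.uniform(0, 2 * math.pi))
             for lv, f in zip(levels_db, freqs)]
    x = [sum(a * math.sin(2 * math.pi * f * n / fs + ph) for a, f, ph in tones)
         for n in range(n_samples)]
    return x, freqs, levels_db


x, freqs, levels = music_like_source()
print(round(freqs[-1]), round(levels[-1], 1))
```

With 64 tones starting at 30Hz, this 1/6-octave grid ends near 43KHz, so the exact grid used for the original simulations may differ slightly; the level at the top of the grid falls to roughly -112dB.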

On visual inspection, both distortions seem attributable to the curves of Figure 5 applied to each micro-transient: here too the 3rd order distortion enhances all transients, while the 2nd order one enhances the transients where the signal is positive and attenuates those where it is negative. At this point it is appropriate to introduce a couple of statistical indicators that give quantitative evidence, exploiting the availability of both the source signal and the distortion component alone, perfectly time-aligned.

**DSA (Derivative Sign Agreement)**

With the first indicator, the DSA, we characterize the type of alteration undergone by the transients, where by “type” we mean the two effects encountered so far, which we will call **expansive** and **compressive**. Mathematically, the DSA is the fraction of samples over which the transients of the source signal and those of the distortion agree. The transients are characterized through the derivative of each signal, which expresses how fast the signal varies: it is positive when the signal grows and negative when it decreases. Therefore:

- Where the sign of the derivative of the source signal agrees with that of the distortion, the speed of the transient, rising or falling, increases; hence, an expansive effect.
- Where they disagree, the distortion reduces the speed and the level reached by the transient; hence, a compressive effect.

If the signal has been upsampled by a factor of 4 or more, a good approximation of the derivatives is obtained simply by taking the difference between consecutive pairs of samples at this higher rate. Let's see what has been described with an example.

Fig. 8 – DSA Domains identification - Example.

Figure 8 shows at the top a segment of the original signal $s[n]$ in blue, monotonically increasing, and in purple the distorted signal $s[n]+d[n]$ (with the distortion heavily exaggerated to show the details). The difference between the two signals, in red, is therefore the distortion component $d[n]$: it grows quickly over a first stretch and then decreases more slowly. The center shows the derivative $s'[n]$ of the original signal, which is always positive since $s[n]$ is always increasing. The bottom shows the derivative $d'[n]$ of the distortion component: up to $n = 5$ it is positive, like $s'[n]$, so the distortion increases the slope of $s[n]$, as is evident in the purple curve. Afterwards $d'[n]$ becomes negative, opposite in sign to $s'[n]$, indicating that the distortion decreases the slope of $s[n]$. Here the transient agreement is 5/(5+9) ≈ 0.36.

Below, we will denote with '+' (positive) the set (or domain, generically indicated with $D$) of the indices of the signal samples where there is expansion, and with '-' (negative) the one where there is compression. As defined, the DSA is the fraction of expanded transients only; unless otherwise specified, the fraction of compressed ones is 1 - DSA.
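The DSA computation can be sketched as follows, using forward differences as the derivative approximation described above. Ignoring samples where a derivative is exactly zero, the function name and the toy data are our assumptions; the toy data only mimic the 5-versus-9 split of the Figure 8 example.

```python
def dsa_domains(s, d):
    """Derivative Sign Agreement between source s[n] and distortion d[n].

    Derivatives are approximated by forward differences (adequate if the
    signals are upsampled x4 or more). Returns (dsa, plus, minus): the
    fraction of expansive samples and the index lists of the '+' and '-'
    domains. Samples where either derivative is exactly zero are ignored."""
    plus, minus = [], []
    for n in range(len(s) - 1):
        ds = s[n + 1] - s[n]
        dd = d[n + 1] - d[n]
        if ds == 0 or dd == 0:
            continue
        # Same sign: expansive ('+'); opposite sign: compressive ('-').
        (plus if (ds > 0) == (dd > 0) else minus).append(n)
    total = len(plus) + len(minus)
    return (len(plus) / total if total else float("nan")), plus, minus


# Toy version of the Figure 8 situation (hypothetical data): s rises
# steadily, d rises for 5 steps and then falls for 9 steps.
s = [float(n) for n in range(15)]
d = [0, 1, 2, 3, 4, 5, 4.5, 4, 3.5, 3, 2.5, 2, 1.5, 1, 0.5]
dsa, plus, minus = dsa_domains(s, d)
print(round(dsa, 2), len(plus), len(minus))   # 0.36 5 9
```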

**PSD (Partialized Signal Distortion)**

With this second indicator we quantify the amount of distortion undergone by the signal, restricted to each of the two types of alteration, expansive or compressive. Borrowing the definition of THD or IMD, we define the PSD as the ratio between the RMS value of the distortion component $d[i]$ and the RMS value of the undistorted signal component $s[i]$, both calculated on the samples belonging to one of the two domains, '+' or '-'. In formulas:

$$\mathrm{PSD}_D = 20\log_{10}\frac{\sqrt{\frac{1}{|D|}\sum_{i \in D} d[i]^2}}{\sqrt{\frac{1}{|D|}\sum_{i \in D} s[i]^2}} = 10\log_{10}\frac{\sum_{i \in D} d[i]^2}{\sum_{i \in D} s[i]^2}, \qquad D \in \{+, -\}$$

Compared to THD, or rather IMD, we note that:

- The PSD is calculated in the time domain instead of the frequency domain. This is not a determining aspect, however: by Parseval's theorem, the two methods are equivalent.
- The PSD can be evaluated on any signal, and accounts for all the alterations undergone, including noise.
- Only segments of the overall signal are considered.
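A matching sketch of the PSD follows. The domain is passed as a list of sample indices (for instance the '+' or '-' index lists obtained for the DSA), and the function name is hypothetical; returning `None` for an empty domain mirrors the N.D. values reported for the signal analyzed here.

```python
import math


def psd_db(s, d, domain):
    """Partialized Signal Distortion over the sample indices in `domain`:
    RMS(d) / RMS(s) restricted to those samples, in dB. Returns None when
    the domain is empty (the 'N.D.' case)."""
    if not domain:
        return None
    num = sum(d[i] ** 2 for i in domain)
    den = sum(s[i] ** 2 for i in domain)
    # The 1/|D| factors of the two RMS values cancel in the ratio.
    return 10 * math.log10(num / den)


s = [math.sin(2 * math.pi * n / 64) for n in range(64)]
d = [0.001 * v for v in s]              # distortion proportional to the signal
print(psd_db(s, d, range(64)))          # about -60 dB
print(psd_db(s, d, []))                 # None
```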

Returning to the signal in question, we obtain:

- 3rd order: DSA = 100%, PSD+ = -60.3dB, PSD- = N.D., IMD = -60.3dB

The DSA values confirm the visual inspection of the transients: both orders act on the micro-transients as seen for the impulsive signal (here too the DSA takes the same values). The PSD values tell us that the amount of distortion is slightly greater for the 3rd order.

To complete the picture, we analyze the distortions in the frequency domain. The following diagrams show the spectrum of the distortions in Figure 7, referenced to the level of the source signal, which decreases with frequency at -2dB/KHz. Figure 9a shows the 2nd order distortion, both in full and without the components at frequencies coinciding with those of the source signal; Figure 9b shows the 3rd order distortion.

Fig. 9a - 2nd order distortion (in dB, blue line) and average (red line); referenced to the input signal.

Fig. 9b - 3rd order distortion (in dB, blue line) and average (red line); referenced to the input signal.


On the qualitative trend of the distortions we can say that:

- The level of the "carpet" of distortion is almost constant for both orders.
- The peaks of 2nd order distortion are modest, almost at the same level at each frequency, denser in the medium-low frequencies and generally not coincident with the tones of the original signal.
- The peaks of 3rd order distortion are very pronounced, almost at the same level at each frequency, and coincide with the tones of the original signal (and are in phase with them, not shown).
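The difference between the two orders can be illustrated with the textbook two-tone case. Assuming pure x² and x³ terms and two tones placed exactly on DFT bins 5 and 8 (an arbitrary test setup, not the article's music-like signal), the sketch lists the bins where each term puts energy. The 2nd order term produces only DC, harmonics and the sum and difference frequencies (the difference f2 - f1 falling lower in frequency), none of which coincide with the tones, while the 3rd order term also puts energy back at the tone frequencies themselves.

```python
import cmath
import math

N = 64            # DFT length (arbitrary test setup)
F1, F2 = 5, 8     # two test tones, in DFT bins


def spectrum_bins(x, tol=1e-6):
    """Return the DFT bins 0..N/2 whose magnitude exceeds tol."""
    bins = []
    for k in range(N // 2 + 1):
        X = sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
        if abs(X) > tol:
            bins.append(k)
    return bins


tones = [math.cos(2 * math.pi * F1 * n / N) + math.cos(2 * math.pi * F2 * n / N)
         for n in range(N)]

second = spectrum_bins([x**2 for x in tones])  # pure 2nd order term
third = spectrum_bins([x**3 for x in tones])   # pure 3rd order term
print("2nd order:", second)  # [0, 3, 10, 13, 16]: no energy at bins 5 or 8
print("3rd order:", third)   # [2, 5, 8, 11, 15, 18, 21, 24]: includes 5 and 8
```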

These physical characteristics, combined with those found in the time domain, allow us to speculate on a correlation with the effects on perception: 2nd order distortion emphasizes the medium-low frequencies (an effect that ordinary frequency-response measurements do not reveal) and "softens" the signal (points 1 and 2), while 3rd order distortion increases the dynamic contrast (point 3). It must be said that these considerations derive from analyzing the distortion trends on a hypothetical musical signal; other simulations with signals of different characteristics change the quantitative aspects (e.g. a steeper carpet, a different peak density), but, under the stated working hypotheses, the behaviors described remain substantially unchanged.

**Analysis of music-like signals with distortion of multiple orders**

Let's now analyze how DSA and PSD behave in more realistic situations where both orders of distortion are present. We start by fixing a reference level for the 2nd harmonic and varying the level of the 3rd. For brevity, we express this variation as the 3rd/2nd level ratio in dB. Figure 10a shows two graphs; the first, computed for a single tone as a reference, reports the following curves:

- The levels of the harmonics due to 2nd order distortion (in red) and to 3rd order distortion (in blue). For the 2nd order there is the 2nd harmonic (HD2), fixed at the reference value of -90dB (DC is omitted). For the 3rd order there are two components: the 3rd harmonic (HD3) and the contribution at the fundamental frequency (HD1), which is always about 10dB higher than HD3. Both are increased progressively, from -48dB to +48dB relative to the HD2 level.
- The **True-THD** (in purple), our name for the Total Harmonic Distortion computed including the distortion at the fundamental (HD1), which is normally neglected. It coincides with the “classic” THD when the distortion is mainly due to the 2nd order (left side of the graph) and is about 10dB higher when it is due to the 3rd (right side).
- The DSA (in gray, with ordinate reference on the right), which shows the percentage of expanded transients; the compressed ones are here equal to 1 - DSA, i.e. complementary. In line with what has already been seen, where the 2nd order prevails there is parity between expanded and compressed transients (“warm” effect); where the 3rd order prevails there are only expanded transients (“dynamic” effect). The transition from the first to the second situation occurs between -12dB and +3dB, with a very steep slope.
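The roughly 10dB gap between HD1 and HD3, and between True-THD and classic THD when the 3rd order dominates, follows from the identity sin³θ = (3 sin θ - sin 3θ)/4: a pure cubic term puts three times as much amplitude on the fundamental as on the 3rd harmonic (20·log10 3 ≈ 9.5dB), so True-THD/THD = √(3² + 1) = √10, i.e. 10dB. The sketch below checks this with a naive DFT, assuming a memoryless nonlinearity whose distortion is d = a2·x² + a3·x³, driven by a single sine (our simplification); the helper names are hypothetical.

```python
import cmath
import math


def harmonic_levels(a2, a3, amp=1.0, n=256, kmax=3):
    """Levels |HD_k| (k = 0..kmax) of the distortion d = a2*x^2 + a3*x^3
    for x = amp*sin(theta), normalized to the input amplitude amp."""
    levels = []
    for k in range(kmax + 1):
        acc = 0j
        for i in range(n):
            theta = 2 * math.pi * i / n
            x = amp * math.sin(theta)
            acc += (a2 * x**2 + a3 * x**3) * cmath.exp(-1j * k * theta)
        scale = 1 / n if k == 0 else 2 / n  # one-sided amplitude
        levels.append(abs(acc) * scale / amp)
    return levels


def db(v):
    return 20 * math.log10(v)


def true_thd(levels):
    # HD1 + HD2 + HD3 (DC omitted, as in the text)
    return math.sqrt(sum(h * h for h in levels[1:]))


def classic_thd(levels):
    # HD2 and up only: the distortion at the fundamental is ignored
    return math.sqrt(sum(h * h for h in levels[2:]))


hd = harmonic_levels(a2=0.0, a3=1e-3)                 # pure 3rd order
print(round(db(hd[1] / hd[3]), 2))                    # 9.54: HD1 is 3x HD3
print(round(db(true_thd(hd) / classic_thd(hd)), 2))   # 10.0: ratio is sqrt(10)
```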

The second graph shows the DSA and PSD curves for the simulated music-like signal, for three reference values of the 2nd harmonic: -110dB, -90dB and -70dB.

Fig. 10a - HD level, True-THD, DSA (top) and PSD, DSA (below) versus 3rd/2nd harmonic ratio (dB), for several 2nd harmonic reference levels.

Here we observe that:

- The DSA curve (in gray, with ordinate reference on the right) has a trend similar to the single-tone one, but is less steep in its initial rise and shifted to the right. This shows that the presence of multiple tones delays the emergence of 3rd order dynamic effects and makes it more gradual. Furthermore, the curve is invariant with respect to the reference level: the mix of the two types of distortion on the transients therefore depends only on the ratio between the levels of the two harmonics.
- The PSD curves are affected, as expected, by the change in the 2nd order reference level, which shifts each pair of PSD+ (in red) and PSD- (in blue) curves vertically by the same amount as the reference, 20dB. For a given reference level, the PSD curves stay constant and equal to the reference value as long as the 2nd harmonic prevails; beyond that:
  - PSD+, associated with the expanded transients, begins to increase at a 3rd/2nd harmonic ratio of about -12dB and reaches a constant slope beyond a ratio of 6dB, similar to the True-THD trend. The dynamic effect of 3rd harmonic distortion therefore begins to act before the DSA increases, which happens around -3dB.
  - PSD-, associated with the compressed transients, mirrors PSD+, but its descent begins about 10dB earlier on the 3rd/2nd ratio axis. This indicates that, as the 3rd order increases, the distortion on the compressed transients decreases faster than the contributions to the expanded transients increase.
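The invariance of the DSA with respect to the reference level can be seen from the sign of the distortion in a simple single-tone model y = x + a2·x² + a3·x³ (our assumption): since d = x²·(a2 + a3·x), scaling a2 and a3 together (moving the reference level while keeping the 3rd/2nd ratio) leaves every sign, and therefore the DSA, unchanged. The sketch checks this with a hypothetical sign-based classification of samples (expanded when d has the same sign as x), which is not the article's definition. With this toy definition the DSA stays at 50% up to a ratio of about -6dB, so the transition points differ from those in Fig. 10a; only the qualitative trend and the level invariance should be read from it.

```python
import math


def dsa(s, y):
    """HYPOTHETICAL per-sample DSA: share of samples whose distortion
    d = y - s has the same sign as s (zero samples skipped)."""
    expanded = compressed = 0
    for si, yi in zip(s, y):
        di = yi - si
        if si == 0 or di == 0:
            continue
        if (di > 0) == (si > 0):
            expanded += 1
        else:
            compressed += 1
    return expanded / (expanded + compressed)


def coeffs(hd2_db, ratio_db):
    """a2, a3 giving HD2 = hd2_db and HD3 = hd2_db + ratio_db on a unit sine,
    using HD2 = a2/2 and HD3 = a3/4."""
    hd2 = 10 ** (hd2_db / 20)
    hd3 = hd2 * 10 ** (ratio_db / 20)
    return 2 * hd2, 4 * hd3


def dsa_single_tone(hd2_db, ratio_db, n=1000):
    a2, a3 = coeffs(hd2_db, ratio_db)
    s = [math.sin(2 * math.pi * (i + 0.5) / n) for i in range(n)]
    y = [x + a2 * x**2 + a3 * x**3 for x in s]
    return dsa(s, y)


# One row per 3rd/2nd ratio; columns are the -110, -90 and -70 dB references.
for ratio_db in (-12, -6, 0, 6, 12, 24):
    row = [dsa_single_tone(ref, ratio_db) for ref in (-110, -90, -70)]
    print(ratio_db, [round(v, 3) for v in row])
```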

We now plot the same curves, again as a function of the 3rd/2nd harmonic ratio, but holding the overall distortion, i.e. the True-THD, constant at three reference values: -80dB, -70dB and -60dB.

Fig. 10b - HD level, True-THD, DSA (top) and PSD, DSA (below) versus 3rd/2nd harmonic ratio (dB), for several True-THD reference values.

For simplicity, the harmonic distortion graph refers to a single True-THD value, -80dB. As expected, for music-like signals the DSA remains invariant, while the PSD curves shift vertically by the same amount as the True-THD changes. Where the 2nd order prevails, distortion is equally distributed between expanded and compressed transients. When the level of the 3rd harmonic approaches that of the 2nd, the distortion on the expanded transients (PSD+) drops by 10dB. This is because the True-THD reference value also includes the distortion on the fundamental (HD1). The compressed transients (PSD-) are affected more markedly, with PSD- falling rapidly in the same situation.
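The role of HD1 in the True-THD budget can be made concrete with a little arithmetic. Assuming, as a simplification, a single tone and a pure 2nd plus 3rd order polynomial (so that HD1 = 3·HD3), holding True-THD fixed at T means T² = HD2² + HD3² + 9·HD3² = HD2²·(1 + 10·r²), with r = HD3/HD2. When the 3rd order dominates, nine tenths of the power budget goes to the fundamental and HD3 settles about 10dB below T. The sketch only tabulates this split to illustrate the explanation above; it does not compute the PSD itself, and `split_true_thd` is a hypothetical helper.

```python
import math


def db(v):
    return 20 * math.log10(v)


def split_true_thd(true_thd_db, ratio_db):
    """Levels (dB re. input) of HD1, HD2 and HD3 that add up (in power) to a
    fixed True-THD, for a given HD3/HD2 ratio. ASSUMES a single tone and a
    pure a2*x^2 + a3*x^3 nonlinearity, for which HD1 = 3 * HD3."""
    t = 10 ** (true_thd_db / 20)
    r = 10 ** (ratio_db / 20)             # HD3 / HD2, linear
    hd2 = t / math.sqrt(1 + 10 * r * r)   # t^2 = hd2^2 + hd3^2 + (3 * hd3)^2
    hd3 = r * hd2
    hd1 = 3 * hd3
    return db(hd1), db(hd2), db(hd3)


# 3rd/2nd ratio, then HD1, HD2, HD3 in dB for a True-THD of -80 dB
for ratio_db in (-24, -12, 0, 12, 24):
    hd1, hd2, hd3 = split_true_thd(-80, ratio_db)
    print(ratio_db, round(hd1, 1), round(hd2, 1), round(hd3, 1))
```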

Let's now change perspective and analyze the distortions as a function of the source signal level. The graph in Figure 11 shows the True-THD, DSA and PSD curves for input levels of 0dB, -10dB, -20dB and -30dB, with the 2nd harmonic reference level set at -70dB.


Fig. 11 - True-THD, DSA (top) and PSD, DSA (below) versus 3rd/2nd harmonic ratio (dB), for several input levels, ref. 2nd harmonic.

The curves in the first graph (True-THD and DSA) again refer to the distortion of a single tone, now at different input levels. For each 10dB of attenuation, the True-THD curve shifts 10dB downwards and 10dB to the right, while the DSA curve only shifts 10dB to the right. These effects arise because 3rd order distortion falls faster than 2nd order distortion as the level decreases, with the cube and the square of the signal level respectively. Therefore, for the same ratio between the two harmonics, the characteristic effect of 3rd order distortion (the dynamic effect) “struggles” more to emerge as the input level decreases. The second graph shows the PSD and DSA curves for the music-like signal, shifted in the same way, and for the same reasons, as the True-THD and DSA curves of the first graph. Otherwise, the trend is similar to that already described for the curves of Fig. 10a.
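The level dependence can be worked out for a single-tone polynomial model y = x + a2·x² + a3·x³ (an assumption made for illustration). With x = A·sin θ, the 2nd and 3rd harmonics have amplitudes a2·A²/2 and a3·A³/4, i.e. they grow with the square and the cube of the level as stated above; relative to the input they are a2·A/2 and a3·A²/4, so every 10dB of attenuation lowers HD2 by 10dB, HD3 by 20dB and the 3rd/2nd ratio by 10dB. The coefficient values below are chosen only to reproduce a -70dB 2nd harmonic reference and a 0dB ratio at 0dB input.

```python
import math


def db(v):
    return 20 * math.log10(v)


def relative_harmonics(a2, a3, amp):
    """HD2 and HD3 relative to the input amplitude for y = x + a2*x^2 + a3*x^3
    with x = amp*sin(theta). From sin^2 = (1 - cos 2t)/2 and
    sin^3 = (3 sin t - sin 3t)/4, the absolute harmonic amplitudes are
    a2*amp^2/2 and a3*amp^3/4; dividing by amp gives the relative levels."""
    hd2 = abs(a2) * amp**2 / 2 / amp
    hd3 = abs(a3) * amp**3 / 4 / amp
    return hd2, hd3


# Illustrative coefficients: HD2 = -70 dB and HD3/HD2 = 0 dB at 0 dB input.
a2 = 2 * 10 ** (-70 / 20)
a3 = 4 * 10 ** (-70 / 20)

for level_db in (0, -10, -20, -30):
    amp = 10 ** (level_db / 20)
    hd2, hd3 = relative_harmonics(a2, a3, amp)
    # input level, HD2 re. input, HD3/HD2 ratio (all in dB)
    print(level_db, round(db(hd2), 1), round(db(hd3 / hd2), 1))
```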

Note that these effects arise both when adjusting the playback volume, if the amplifier's volume control is at its signal input, and within the same piece of music, when passing from loud passages to quieter ones and vice versa. The latter aspect is more evident in Fig. 12, which reports the DSA and PSD values as a function of the input level. The curves refer to different 3rd/2nd harmonic ratios (from 0dB to 30dB in 10dB steps), with the 2nd harmonic reference at -70dB.


Fig. 12 - True-THD, DSA (top) and PSD, DSA (below) versus input level (dB), for several 3rd/2nd harmonic ratios, ref. 2nd harmonic.

At low input levels the distortion is always dominated by 2nd order characteristics, as all the curves show; at higher input levels the expansive effects of the 3rd order appear, on both the DSA and the PSD, and they are stronger the higher the 3rd/2nd harmonic ratio. It should be added that the curves do not account for background noise, which in practice masks the lowest distortion levels. With noise included, the graph would start as a straight line, falling progressively from high values at low input levels until it meets the curves shown.
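The effect of the background noise can be sketched by adding a fixed absolute noise floor to the distortion in power (the two being uncorrelated). Relative to the signal, that floor falls 1dB for every dB of input level, which produces the straight line described above. The -100dB noise floor and the 2nd-order-dominated distortion curve (-70dB at 0dB input, falling with the input level) are hypothetical values chosen only for illustration.

```python
import math


def power_sum_db(*levels_db):
    """Add uncorrelated contributions given in dB (power sum)."""
    return 10 * math.log10(sum(10 ** (level / 10) for level in levels_db))


def measured_db(distortion_rel_db, noise_floor_db, input_db):
    """Distortion plus noise relative to the input level. The noise floor is
    fixed in absolute terms, so relative to the signal it falls 1 dB for
    every dB the input rises."""
    noise_rel_db = noise_floor_db - input_db
    return power_sum_db(distortion_rel_db, noise_rel_db)


NOISE_FLOOR_DB = -100  # hypothetical absolute noise floor
for input_db in range(-60, 1, 10):
    distortion_db = -70 + input_db  # hypothetical 2nd-order-dominated curve
    print(input_db, round(measured_db(distortion_db, NOISE_FLOOR_DB, input_db), 1))
```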

**Preliminary Conclusions**

The above helps us bridge the apparent gap between the subjective experience of listening to music and measurements of nonlinear distortion. The study suggests that the "dynamism" effect is probably caused by the expansive contributions of 3rd order distortion to the tones of the main signal, which reinforce its energy content and therefore its transients, to which we are highly sensitive. These contributions are milder for the 2nd order, which spreads energy as a "carpet", more pronounced in the medium-low frequencies. The presence of both orders of distortion produces intermediate effects, which also depend on the level of the input signal. The DSA and PSD parameters allow us to quantify these effects numerically. Here we stop with speculation for now: how strongly a larger or smaller variation in these physical quantities is perceived must be established through listening tests.
