I notice that Amir posted this a few months ago and I might have missed it, but I didn't see the results of the individual listening trials posted, let alone the statistics that would accompany them. Given the level of scientific rigor ...
The limited preliminary tests that I described some months ago were useful for deciding whether the hypothesis (a relationship between specific types of distortion and listening pleasure, which relies on the masking effect of our ear) is worth a scientific study or not. These tests, let's say “non-formal”, gave encouraging results, in line with other experiences “in the field”. To proceed with the study, I agree with you, it is necessary to build a rigorous listening test protocol. This is one of the reasons why I opened the thread, looking for opinions and suggestions, and many have in fact arrived; I take this opportunity to thank everyone for the time dedicated to me.
At the same time, a prerequisite for setting up the test protocol is the definition of the mathematical model used to digitally simulate the nonlinear distortions. Interesting aspects emerged while working on the model, which led me to many of the considerations described in Chap. 4 of the GedLee publication. I summarise below the main aspects of the adopted model, to highlight how complex the topic is and to give an idea of how difficult it is to understand "how a particular amplifier sounds". In closing, I give the details of the test protocol that I am developing. Suggestions are always welcome.
Static nonlinear distortion model
In a nutshell, we can divide the distortions caused by systems (in this context, any audio electronic device) into two families: linear and nonlinear.
- Linear distortions “only” alter the magnitude and phase of each of the frequencies contained in the input signal. They are fully modelled by the Transfer Function, the curve most frequently measured and shown for any audio device: in general, a curve that is as flat as possible (in the audible band) is preferred, to avoid "sound colouring".
- Nonlinear distortions instead introduce frequency components into the output that are not present in the input signal. Their frequencies are integer linear combinations (harmonics and sum/difference products) of the frequencies in the input signal, and they are generally reported through THD (total harmonic distortion) and IMD (intermodulation distortion), which quantify them with single numbers. Lower values are better than higher ones, even if what really matters is the frequency structure of the distortions.
These two components are always present in a system, so we can represent our system as in the following figure:
Figure 1 - Simple Nonlinear Static amplifier model
Here an analog input signal x(t) passes through a system which transforms it into an output signal that we can represent for convenience in the form g⋅y(t), where:
- g is the gain applied by the system (generally a voltage gain), which we will assume constant and frequency independent, at least in the usual range 20Hz-20KHz. It can be controlled through the volume if the device is a preamplifier; it is fixed if the device is a power amp.
- f(x) models the nonlinear aspect of the system. This is a function that for each instantaneous value of the input x(t) returns a value u(t) which depends only on the value of x at that instant (apart from the propagation delay through the device, which is of no interest here); moreover, it does not change over time. It is normalised (gain = 0dB). If f(x) = x (output = input) then the system has no nonlinear distortion; any other relation creates new frequency components in u(t).
- H(f) models the linear part, i.e. the transfer function that determines how each frequency component of its input is altered. This function also models the memory effects of the system, which are absent in f(x). The time-domain counterpart of H(f) is the impulse response h(t); the two are linked by the Fourier transform.
Both curves f(x) and H(f) depend on the type of amplifier; more specifically, f(x) is very different between the solid state and tube families of amplifiers. Using a "black box" approach, we assume that it can be represented by a polynomial, which is essentially the Taylor series expansion of the original function, truncated to a certain number of terms:
f(x) = a0 + a1⋅x + a2⋅x^2 + ... + aN⋅x^N
By feeding a pure sinusoid x(t) into f(x) and carrying out the expansions, it turns out that each term ai controls:
- a0: DC component, usually equal to 0.
- a1: component of the original signal, close to unity.
- ai, i > 1: harmonic distortion components of order i, i - 2, i - 4, ...
Therefore, the degree of f(x) also determines the maximum order of harmonic distortion that is generated. For amplifiers in normal working conditions the distortion components are generally limited to orders below the 10th; the higher ones are buried in the thermal noise.
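The rule that a term of degree i feeds the harmonics of order i, i - 2, i - 4, ... can be checked numerically. What follows is a minimal sketch, separate from the simulation program and assuming NumPy is available; the function name `harmonic_amplitudes` is hypothetical. It raises one exact cycle of a unit sine to the power i and reads the harmonic amplitudes from an FFT.

```python
import numpy as np

def harmonic_amplitudes(power, n_samples=1024, max_order=10):
    """Amplitudes of harmonics 0..max_order of sin(theta)**power over one cycle."""
    theta = 2 * np.pi * np.arange(n_samples) / n_samples
    spectrum = np.fft.rfft(np.sin(theta) ** power) / n_samples
    amps = 2 * np.abs(spectrum[: max_order + 1])  # one-sided amplitude
    amps[0] /= 2                                  # the DC bin is not doubled
    return amps

for i in (2, 3, 4, 5):
    amps = harmonic_amplitudes(i)
    present = [k for k, a in enumerate(amps) if a > 1e-9]
    print(f"x^{i}: harmonics present = {present}")
```

Order 0 in the printed lists is the DC term: even powers produce DC plus even harmonics, odd powers produce the fundamental plus odd harmonics. For example, x^3 gives 3/4 at the fundamental and 1/4 at the third harmonic.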
For example, the following figure shows the f(x) curve of the preamplifier used for the tests, the low-distortion Threshold FET10/e, obtained from distortion measurements with a 1KHz tone at 1Vrms and 0dB gain. Here the deviation from the 45 degree line is magnified 10000 times (80dB) to make it visible.
Figure 2 - f(x) for Threshold FET10/e at 1KHz, 1Vrms, 0dB gain
It is evident here how the distortion, dominated by the second harmonic, is asymmetrical. Furthermore:
- Positive values of x(t) are slightly "compressed" for low values, and then "expanded" for higher values.
- Negative values of x(t) are always compressed more.
Simulation
Now let's try to transform the system of Fig. 1 into an equivalent one, where the target system is replaced by an ultra-linear one and its distortions are injected digitally. A first step is to consider the discrete version x[n] of x(t) and to introduce a DAC in the chain upstream of the amplifier, as shown in Fig. 3-a.
Figure 3 - Simple Nonlinear Static amplifier simulation
The next step is to replace the target system with two components:
- An ideal amplifier downstream of the DAC with zero linear and nonlinear distortion, which only amplifies its input signal by the gain g. In reality, we will be "satisfied" with using an amplifier with very low distortion levels and a high bandwidth.
- A component upstream of the DAC (the program I wrote) that digitally applies f(x) and H(f) (h[n] in the discrete time domain) to the discrete signal x[n]. Together they produce the unity-gain signal y[n] at the input of the DAC, which converts it into y(t) at the input of the linear amplifier, as shown in Fig. 3-b.
So, the value of y[n] is obtained from x[n] with the simple formula:
y[n] = h[n] * f(x[n])
where * represents the convolution operation.
As an additional note, the DAC should be configured to avoid oversampling: we'll see why in the next section.
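To make the signal path concrete, here is a minimal sketch of the static model (memoryless polynomial followed by a linear filter), assuming NumPy. It is an illustration and not the code of the simulation program: the function name `simulate_static`, the FIR representation of h[n] and the coefficient values are all assumptions.

```python
import numpy as np

def simulate_static(x, poly_coeffs, h):
    """y[n] = (h * f(x))[n]: memoryless polynomial followed by an FIR filter.

    poly_coeffs: a0, a1, ..., aN of f(x), lowest order first.
    h: impulse response of the linear part (h = [1.0] is a flat response).
    """
    x = np.asarray(x, dtype=np.float64)
    u = np.polynomial.polynomial.polyval(x, np.asarray(poly_coeffs))  # u[n] = f(x[n])
    return np.convolve(u, h)[: len(x)]                                # y[n] = (h * u)[n]

fs = 352_800                                # assumed processing rate in Hz
t = np.arange(fs // 100) / fs               # 10 ms of signal
x = 0.5 * np.sin(2 * np.pi * 1000 * t)      # 1 kHz test tone
a = [0.0, 1.0, 0.01, 0.005]                 # illustrative a0..a3, not measured values
y = simulate_static(x, a, h=[1.0])
```

With f(x) = x and h = [1.0] the output equals the input, which is a convenient sanity check before injecting any distortion.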
Identification of f(x)
The coefficients of f(x) are calculated from the harmonic distortion value in dB for each order (up to 32). These values can be obtained from a real device by measuring the harmonic distortion of a pure tone at 1KHz. From these data, the coefficients ai of f(x) can be obtained by solving a set of linear equations.
For example, for distortion components of -50dB, -60dB and -70dB for the second, third and fourth harmonic respectively (THD = 0.33%), each harmonic makes the following contribution in time over one cycle of the fundamental (DC is omitted; the resulting curve is dashed):
Figure 4 - Distortion shape by frequency synthesis
The corresponding polynomial f(x), of 4th degree, has coefficients a1 = 0.9897, a2 = 0.0037, a3 = 0.0039, a4 = 0.002, whose terms together give rise to the same curve for a sinusoidal input signal:
Figure 5 - Distortion shape by time domain components
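One way to go from harmonic levels to polynomial coefficients is sketched below, assuming NumPy. It relies on two assumptions of mine: the test tone is taken as a cosine with all harmonics in phase, so that cos(kθ) = T_k(cos θ) turns the spectrum into a Chebyshev series, and the result is normalised so that f(1) = 1 with DC omitted. Converting the Chebyshev series to ordinary powers of x is equivalent to solving the linear system; here that step is delegated to `cheb2poly`. The function name `coeffs_from_harmonics` is hypothetical.

```python
import numpy as np
from numpy.polynomial import chebyshev

def coeffs_from_harmonics(levels_db):
    """Polynomial coefficients a0..aN from harmonic levels in dB re the fundamental.

    levels_db[k] is the level of harmonic k + 2 (second harmonic first).
    Assumes a cosine test tone with all harmonics in phase, so that
    cos(k*theta) = T_k(cos(theta)) maps the spectrum onto a Chebyshev series.
    """
    harmonics = 10.0 ** (np.asarray(levels_db, dtype=float) / 20.0)
    cheb = np.concatenate(([0.0, 1.0], harmonics))  # no DC, unit fundamental
    a = chebyshev.cheb2poly(cheb)                   # power-basis coefficients, lowest first
    a[0] = 0.0                                      # DC omitted, as in the text
    return a / a.sum()                              # normalise so that f(1) = 1

a = coeffs_from_harmonics([-50.0, -60.0, -70.0])
print(np.round(a[1:], 4))
```

For the -50dB, -60dB, -70dB example this gives values close to the coefficients quoted above, with a1 just under 0.99.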
Processing of x[n]
In the discrete processing of x[n] several precautions must be taken:
- Generally, the bandwidth of an amplifier exceeds 100KHz. This implies that the discrete signal processing, which can create ultrasonic frequencies, must use a sampling frequency fs of at least 200KHz to avoid aliasing. We have therefore chosen to oversample x[n] as the first operation, bringing it at least to fs = 352.8KHz or 384KHz (configurable up to 32x).
- After the distortion injection, the signal can then be brought to a lower sampling frequency, with a decimation operation downstream of an anti-aliasing filter that limits it to the new band. This operation is also configurable: at least fs = 176.4KHz or 192KHz is recommended.
- Given the numerous operations on the signal, the risk of introducing distortions due to rounding errors is contained by performing all calculations in 64-bit floating point. At the output, the signal is brought to a bit depth of 24 bits through a re-quantization operation. To avoid introducing unwanted distortions in this operation, the signal is dithered.
- Since f(x) is normalised (f(1) = 1 or f(-1) = -1), the injected harmonics are referred to this full-scale value. If the gain g (volume) of the ideal amplifier is changed, the amount of distortion in the output signal g⋅y'(t) remains the same in dB relative to the signal. This is not what happens with a real amplifier, where the distortion level depends on g. The implication is that the injected distortion matches what would actually be heard only at one listening level. When calculating f(x) it is however possible to set the desired value of g to rescale the amount of distortion injected.
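The first precaution can be illustrated with a small experiment. This is a sketch assuming NumPy; the 15 kHz tone, the cubic nonlinearity and the function name `strongest_distortion_hz` are illustrative choices. It cubes a tone at two sampling rates and reports where the strongest distortion product lands.

```python
import numpy as np

def strongest_distortion_hz(fs, f0=15_000.0, duration=0.1):
    """Frequency of the largest spurious component after cubing a tone at f0."""
    n = int(round(fs * duration))
    t = np.arange(n) / fs
    u = np.sin(2 * np.pi * f0 * t) ** 3            # produces f0 and 3*f0
    spectrum = np.abs(np.fft.rfft(u * np.hanning(n)))
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    spectrum[np.abs(freqs - f0) < 500.0] = 0.0     # ignore the fundamental
    return freqs[np.argmax(spectrum)]

for fs in (44_100, 352_800):
    print(f"fs = {fs} Hz: strongest product near {strongest_distortion_hz(fs):.0f} Hz")
```

At 44.1 kHz the third harmonic (45 kHz) lies above fs/2 and folds back to about 900 Hz, well inside the audible band; at 352.8 kHz it stays at 45 kHz, where the later anti-aliasing filter and decimation can deal with it.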
Structure of distortions
Some interesting properties derive from the analysis of the equations that control the estimation of f(x):
- The phase of each distortion order relative to a pure tone has the following values:
  - order 1, 5, 9,…: 0 degree
  - order 2, 6, 10,…: -90 degree
  - order 3, 7, 11,…: 180 degree
  - order 4, 8, 12,…: +90 degree
- Odd harmonics (symmetric) contribute to the fundamental:
  - order 3: 3/4⋅a3⋅x^3
  - order 5: 10/16⋅a5⋅x^5
  - order 7: 35/64⋅a7⋅x^7
  - order 9: 126/256⋅a9⋅x^9
- Even harmonics (asymmetric) add a DC component:
  - order 2: 1/2⋅a2⋅x^2
  - order 4: 3/8⋅a4⋅x^4
  - order 6: 10/32⋅a6⋅x^6
  - order 8: 35/128⋅a8⋅x^8
- If the coefficients of f(x) are all positive, a harmonic of order d adds in-phase contributions to the lower harmonics of order d-2, d-4, ..., in increasing quantity as the order decreases. This implies that distortion values decrease with increasing order, separately for even and odd harmonics.
- As the level of the input signal increases, the amount of distortion increases faster for higher order harmonics (because of the dependence on x^n). For example, when measuring the RME-ADI2 pro fs DAC chain with the Threshold FET10/e preamplifier, the following distortion pattern results (1KHz tone, 0dBFS, 1Vrms in/out):
Figure 6 - Measured Harmonic Distortion of a 1KHz tone
The measured distortion as a function of input level is as follows:
Figure 7 - Measured Harmonic Distortion per input level of a 1KHz tone
Computing the curve f(x) with the described model gives the following diagram for the same graph, except of course for the background noise (-135dB).
Figure 8 - Harmonic Distortion per input level by the model
The agreement is moderate, but not perfect... why?
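The level dependence predicted by the static model can be reproduced with a few lines. This is a sketch assuming NumPy, with illustrative coefficients that are not the measured ones; the function name `harmonic_levels_db` is hypothetical.

```python
import numpy as np

def harmonic_levels_db(coeffs, amplitude, max_order=3, n=4096):
    """Levels of harmonics 2..max_order (dB re fundamental) of f(A*sin(theta))."""
    theta = 2 * np.pi * np.arange(n) / n
    u = np.polynomial.polynomial.polyval(amplitude * np.sin(theta), np.asarray(coeffs))
    mag = np.abs(np.fft.rfft(u))
    return 20 * np.log10(mag[2 : max_order + 1] / mag[1])

coeffs = [0.0, 1.0, 0.004, 0.004]      # illustrative a0..a3, not measured values
for level_db in (0, -10, -20):
    amp = 10 ** (level_db / 20)
    h2, h3 = harmonic_levels_db(coeffs, amp)
    print(f"{level_db:4d} dB input: H2 = {h2:6.1f} dB, H3 = {h3:6.1f} dB")
```

In this memoryless model the second harmonic drops by about 1 dB, and the third by about 2 dB, for every dB of input level reduction. A measured curve that departs from these slopes is a hint that the static model is missing something.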
Dynamic nonlinear distortion model
The main reason for the only moderate agreement in the data above is the nonlinear distortion model adopted, of the static type, which assumes that the nonlinear component f(x) has no memory of the values taken by the signal in previous instants (only the linear part deals with that). Unfortunately, this hypothesis does not hold for audio devices, which all exhibit this type of memory effect, also called dynamic nonlinear distortion. From a physical point of view this is caused by the non-linearity of the components that make up the amplifier, together with thermal effects. From a mathematical point of view, memory makes the value of u(t) at a given instant depend also on the values taken by the input at previous instants. In a sense, the way in which x(t) moves along the “static” f(x) curve modifies the f(x) curve itself, which alters the structure of the distortions; in other words, f(x) depends on the frequencies present in x(t).
An indication of this effect can be found in the measurement of harmonic distortion as a function of frequency: the more the curves deviate from a constant value, the more the system is affected by dynamic distortion. The following graph illustrates this effect for the Threshold FET10/e, which is very good compared to the average:
Figure 9 - Measured Harmonic Distortion per frequency
A powerful mathematical “black box” model that describes these effects is based on Volterra kernels. A strong simplification applicable in the audio field is that of diagonal Volterra kernels, which again splits the system into a nonlinear part and a linear part, as shown in Fig. 10:
Figure 10 - Diagonal Volterra Model
The output signal here is obtained by adding together N parallel streams. The i-th stream again consists of a memoryless nonlinear part that models the nonlinearity of order i (a polynomial gi(x) of degree i), followed by a dedicated linear system that models the memory effect for that order of distortion only. With this scheme, the contribution of each nonlinear distortion order is modified in magnitude and phase before all of them are added together (remember that the distortion of order i also includes components of order i - 2, i - 4, ...). So, the final distortion structure can differ a lot from the static one, where instead the distortions summed in f(x) were all in phase. The value of y[n] is expressed by the following "relatively" simple formula:
y[n] = h1[n] * g1(x[n]) + h2[n] * g2(x[n]) + ... + hN[n] * gN(x[n])
The real complexity lies in identifying the impulse responses hi[n] from measurements of real devices; the gi(x) polynomials are fixed in advance to simplify this estimation. Unfortunately, the measurements that are normally made do not sufficiently explore this aspect. As the term "dynamic" suggests, this type of distortion has a major impact on transients, which real musical signals are full of... by neglecting it, we lose non-secondary aspects of the amplifier's behaviour.
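The parallel structure of the diagonal Volterra model can be sketched as follows, assuming NumPy. This is only an illustration of the signal flow: the choice gi(x) = x^i, the FIR kernels and the function name `simulate_diagonal_volterra` are my assumptions, and the hard part (estimating the kernels from measurements) is not covered.

```python
import numpy as np

def simulate_diagonal_volterra(x, kernels):
    """y[n] = sum over i of (h_i * g_i(x))[n], with g_i(x) = x**i assumed here.

    kernels: list of FIR impulse responses; kernels[0] is h_1 (linear branch),
             kernels[1] is h_2, and so on.
    """
    x = np.asarray(x, dtype=np.float64)
    y = np.zeros(len(x))
    for i, h_i in enumerate(kernels, start=1):
        branch = x ** i                              # memoryless part g_i(x)
        y += np.convolve(branch, h_i)[: len(x)]      # per-order linear filter
    return y

# Illustrative kernels: flat linear branch, second order delayed by one sample.
x = np.array([1.0, 2.0, 0.0, -1.0])
y = simulate_diagonal_volterra(x, [[1.0], [0.0, 0.5]])
```

If every kernel is a single tap, h_i = [a_i], the model collapses to the static polynomial, which makes the static case a special case of this structure.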
This overview shows how difficult it is to fully characterise the sound of an audio device. Anyway, I won't go any further now: there is an extensive bibliography on the subject for those who want to know more. I am currently investigating this model further; the current simulation program implements only static distortion.
Subjective Listening Test Protocol
Moving on to lighter and more fun aspects, I report below the procedure for the listening tests that I am preparing.
Preparation
A certain number of tracks of different genres and of proven quality are selected. Each one is prepared in two steps:
- Selection of a significant part of the original track of no more than 30 seconds.
- Creation of a new track in high resolution (at least 192KHz/24bit) which contains three subtracks in a random order:
  - The original track part, oversampled only.
  - Two additional tracks with the injection of different levels of second and third order distortions: (high, low) and (low, high). The low and high values are to be determined based on preliminary audibility tests on the reference system used in the listening test.
In the injection of harmonics, the gain (or attenuation) value of the original signal must be chosen in such a way that:
- The RMS level of the resulting signal does not differ by more than 0.5dB in the different versions.
- There is no clipping.
- The listening level, set at 90dB SPL and 70dB SPL, is taken into account (so there may be several versions of the same track).
These aspects can be verified with the same program that injects the distortions.
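The first two checks are simple to automate. Here is a minimal sketch assuming NumPy and floating-point signals with full scale at ±1.0; the function names `rms_db` and `check_versions` are hypothetical and this is not the code of the program mentioned above.

```python
import numpy as np

def rms_db(x):
    """RMS level in dB relative to full scale (1.0)."""
    return 20 * np.log10(np.sqrt(np.mean(np.square(x))))

def check_versions(versions, max_rms_diff_db=0.5, full_scale=1.0):
    """Return (RMS spread in dB, clipping flag, overall pass flag)."""
    levels = [rms_db(v) for v in versions]
    spread = max(levels) - min(levels)
    clipping = any(np.max(np.abs(v)) >= full_scale for v in versions)
    return spread, clipping, bool(spread <= max_rms_diff_db and not clipping)

t = np.arange(4800) / 48_000
original = 0.5 * np.sin(2 * np.pi * 1000 * t)
version_b = 1.02 * original                  # stand-in for a processed version
spread, clipping, ok = check_versions([original, version_b])
print(f"RMS spread = {spread:.2f} dB, clipping = {clipping}, ok = {ok}")
```

The comparison is meant to run on the floating-point signals before re-quantization, so that peaks above full scale are still visible instead of already clipped.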
Execution
On the reference system for the test, the correct volume is set. The person carrying out the listening test can independently select the tracks (each containing the three subtracks A, B, C) and listen to them at will, without any external help. Then they fill in a questionnaire with the following items, giving each a score from 1 (low) to 5 (high) for versions A, B and C of each track:
- A / B difference
- B / C difference
- Pleasant
- Distortion
- Timbre
- Dry
- Enveloping
- Quality of Bass
- Quality of Midrange
- Quality of Highs
After the questionnaires have been completed for all tracks, the data are collected and analysed with classic statistical methods.
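As one possible example of such an analysis (the choice of test is my assumption; the protocol does not fix it), the sketch below runs an exact paired sign-flip permutation test on the scores that the same listeners gave to two versions for one questionnaire item. It assumes NumPy, the function name `paired_sign_flip_pvalue` is hypothetical, and the scores are placeholders, not results.

```python
import itertools
import numpy as np

def paired_sign_flip_pvalue(scores_a, scores_b):
    """Exact two-sided sign-flip (permutation) test on paired score differences."""
    d = np.asarray(scores_a, dtype=float) - np.asarray(scores_b, dtype=float)
    observed = abs(d.mean())
    count = 0
    total = 0
    # Under the null hypothesis each difference is equally likely to be + or -.
    for signs in itertools.product((1.0, -1.0), repeat=len(d)):
        total += 1
        if abs(np.mean(d * np.array(signs))) >= observed - 1e-12:
            count += 1
    return count / total

# Placeholder "Pleasant" scores from 8 listeners for versions A and B.
pleasant_a = [4, 5, 4, 3, 4, 5, 4, 4]
pleasant_b = [3, 4, 4, 3, 3, 4, 4, 3]
print(f"p = {paired_sign_flip_pvalue(pleasant_a, pleasant_b):.4f}")
```

The exhaustive enumeration doubles in cost with every listener, so it is only practical for small panels (roughly 20 listeners or fewer); for larger panels a random sample of sign flips would be the usual substitute.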