
Alternative method for measuring distortion

Serge Smirnoff

Active Member
Joined
Dec 7, 2019
Messages
240
Likes
136
Hi everybody! This is my first post on the forum.

I would like to present the results of audio measurements made according to a new music-based audio metric which I developed. I think this listener-centric metric is mature enough to be discussed in public, and a community of scientifically minded audio enthusiasts and professionals is the best place for that purpose.

The devices under test are mostly portable audio players. Some of them were also measured on this forum according to the traditional audio metrics.

In simple words, the core idea of the new metric can be explained as follows. The traditional parameter THD measures the level of distortion/degradation of a pure sine wave. The same property can be measured using a different method/procedure. With this new method, the level of distortion can be measured not only for sine signals but also for square, triangle, white noise, ..., any signal. Needless to say, the distortion of a real music signal can be measured as well. Thus, the new audio metric is based on a generalized method of distortion measurement that has THD+N as a special case when the test signal is a sine. Such measurements with real music signals correlate well with perceived quality estimations, and this relationship can be researched directly, since the same signal can be used for both objective measurements and subjective evaluations.

The level of distortion is quantified by the parameter Difference Level (Df), which shows ... the difference in dB between two waveforms. The measurement algorithm does not account for linear distortion along either coordinate axis of the waveform - time and amplitude. So, only degradation of the waveform's “structure” is measured; DC offset/amplitude and pitch/phase shifts are not counted (details of the algorithm can be discussed separately if anybody is interested). Df can be measured using various time windows: 50, 100, 400, 3000 ... ms. The examples below use a 400ms window.
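To make this concrete, here is a minimal sketch of how such a difference level could be computed, assuming only what is described above (a fixed time offset and gain are compensated, and the residual is expressed in dB relative to the reference). The function and its exact normalization are my own illustration, not SoundExpert's published algorithm:

```python
import numpy as np

def difference_level(reference, recorded, fs=44100, window_ms=400.0):
    """Per-window difference levels (dB) between two equal-rate mono signals."""
    # Compensate a fixed delay via cross-correlation; the time axis itself is
    # not judged. (Fine for a sketch; a real implementation would use
    # FFT-based correlation for long recordings.)
    xcorr = np.correlate(recorded, reference, mode="full")
    lag = int(np.argmax(xcorr)) - (len(reference) - 1)
    if lag > 0:
        recorded = recorded[lag:]
    elif lag < 0:
        reference = reference[-lag:]
    n = min(len(reference), len(recorded))

    win = int(fs * window_ms / 1000)
    eps = 1e-20                                   # guards against log(0)
    df = []
    for start in range(0, n - win + 1, win):
        r = reference[start:start + win]
        x = recorded[start:start + win]
        g = np.dot(x, r) / (np.dot(r, r) + eps)   # least-squares gain match
        residual = x - g * r                      # what the DUT got "wrong"
        df.append(20 * np.log10(np.linalg.norm(residual) /
                                (np.linalg.norm(g * r) + eps) + eps))
    return np.array(df)
```

A recording can then be summarized by a single number, e.g. np.median(difference_level(reference, recorded)), which is presumably what the histogram medians quoted below refer to.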

Let's have a look at some real-life df-measurements in order to get an idea of the scale, sensitivity and accuracy of the Df parameter. I will use measurements that were performed in cooperation with a member of the Head-Fi forum - csglinux. He made recordings of some high-end portable players with his RME Babyface Pro; I computed the Df values and put them on df-slides. Each df-slide shows the level of degradation for 10 technical signals and one real-life audio signal (two hours of various music material).

Here are the results for the best and the worst of the 11 tested devices:

[df-slide: ChordHugo2@SE.png]

[df-slide: FiiOM11@SE.png]

Pure sine signals are distorted almost identically in both devices, while the other tech signals are distorted more heavily in the FiiO. Predictably, the overall distortion of the real audio signal is also higher (+7.8dB, histogram median). Such a difference in quality will certainly be audible. For simplicity of explanation we can safely assume here that a 1-2dB difference is not perceptible to human hearing.


Another two examples show df-measurements of the LG V30 in two modes - low impedance and high impedance:

[df-slide: LGV30@SE.png]

[df-slide: LGV30(high)@SE.png]

As expected, the Df values are almost identical for all signals (including real-life), with small differences for Sine1k and the 1-bit signal.


Another two df-slides show devices that use the same audio solution inside:

[df-slide: AppleiPhoneSE@SE.png]

[df-slide: AppleiPhone5@SE.png]

All signals have the same Df levels ... except the two pure sine waves (1k and 12.5k).


This device was measured in two filter modes - steep and gradual:

[df-slide: ShanlingM2s@SE.png]

[df-slide: ShanlingM2s-gr@SE.png]

In the gradual filter mode most tech signals have worse Df levels, but the resulting Df level with the real music signal lost only 0.1dB, which is too small to perceive.


The device above was measured using a less accurate recorder, the M-Audio MicroTrack 96/24 (see Testing Diagram). Yes, the quality of the recorder affects df-measurements, but the relationship is not trivial. Below are two df-slides of the same device measured with recorders of different precision:

[df-slide: AppleA1749csglinux@SE.png]

[df-slide: AppleA1749@SE.png]

Df-measurements with some signals are equal for both recorders; others differ to varying extents. The most important indicator - Df with the real-life signal - is 3dB better with the better recorder. So, the measurement recorder/interface should be mentioned in reports. But within a class of recorders/interfaces of comparable precision, Df levels are consistent and can be compared and used for inference about perceived audio quality. This recorder dependency can be considered a drawback of the measurement method. On the other hand, it reflects the natural fact that the resulting audio measurements depend on the quality of the measurement interface. In the df-metric this fact is just explicitly visible. So, the measurement interface must be better than the device under test )).


Df-slides of other devices can be found here - http://soundexpert.org/articles/-/blogs/audio-quality-of-high-end-portable-players
How to read df-slides - http://soundexpert.org/portable-players#howtoreaddfslides
One-page introduction to df-metric - http://soundexpert.org/documents/10179/11017/se-audio-metric.pdf
Tech. signals used for testing - http://soundexpert.org/test-signals
Audio material used for testing - http://soundexpert.org/test-signals#variety

Many questions and features of the new metric remain beyond the scope of this post, including a very important one - do Df levels correlate well with perceived audio quality in all cases, or are there some limitations (spoiler: yes, there are some limitations)? To answer that question I need to explain the concept of artifact signatures, which is an essential part of the df-metric (artifact signatures can be quantified and compared). Also, some features/bugs of the metric remain questionable and need additional research (the similarity metric for artifact signatures, for example). Nevertheless, I hope the basic idea of the new metric/measurements is pretty clear now, and I'll be thankful for your comments, questions and objections.

-- Serge Smirnoff
 

PierreV

Major Contributor
Forum Donor
Joined
Nov 6, 2018
Messages
1,437
Likes
4,686
The data visualization is spectacular, but not very informative imho. Using the same color scale for different tests ends up giving a nice bluish color to all the results of a certain test (say Sine 1 kHz), when obviously some devices are performing much better than the others, while the best results of other tests are still very red. I would map colors per test category. The best Sine 1kHz result would end up blue, the worst red. And, likewise, the best program simulation noise result would end up blue... This would have the added advantage of making the reason for some bad or good synthetic results more obvious visually.
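For illustration, a toy sketch of that per-test mapping (made-up Df numbers and device/test labels; Python/matplotlib, not tied to SoundExpert's actual tooling):

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up Df values in dB (rows: devices, columns: tests); lower = better.
df_grid = np.array([[-92.0, -61.0, -43.0],
                    [-71.0, -55.0, -21.0]])
tests = ["Sine 1kHz", "Square 1kHz", "Program sim. noise"]
devices = ["Device A", "Device B"]

# Normalize each test (column) to [0, 1]: best Df -> 0 (blue), worst -> 1 (red).
lo, hi = df_grid.min(axis=0), df_grid.max(axis=0)
normed = (df_grid - lo) / np.where(hi > lo, hi - lo, 1.0)

fig, ax = plt.subplots()
ax.imshow(normed, cmap="coolwarm", vmin=0.0, vmax=1.0)
ax.set_xticks(range(len(tests)), tests)
ax.set_yticks(range(len(devices)), devices)
for i in range(len(devices)):                 # keep absolute dB values readable
    for j in range(len(tests)):
        ax.text(j, i, f"{df_grid[i, j]:.0f} dB", ha="center", va="center")
plt.show()
```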

Then, there is the issue of weighting of different parameters and correlation with perceived SQ.
 

pkane

Master Contributor
Forum Donor
Joined
Aug 18, 2017
Messages
5,630
Likes
10,205
Location
North-East
Serge Smirnoff said: (opening post quoted in full)

Sorry, haven't had a chance to look through the details of the algorithm, hoping you can outline it here. Is this metric using differences in dB levels over a number of short fragments and then selecting the median value as representative of the device, or did I completely misunderstand?
 

pma

Major Contributor
Joined
Feb 23, 2019
Messages
4,591
Likes
10,727
Location
Prague
I think I understand. He takes two level-matched, time-aligned files, applies an algorithm to minimize phase differences, takes the difference and gets something like this:

[attached plot: difference of the two waveforms]

This may correlate quite well with subjective evaluation. I am probably doing something similar, just not refined into a specific method.
 

pkane

Master Contributor
Forum Donor
Joined
Aug 18, 2017
Messages
5,630
Likes
10,205
Location
North-East
I think I understand. He takes two level-matched, time-aligned files, applies an algorithm to minimize phase differences, takes the difference and gets something like this:

[attached plot: difference of the two waveforms]
This may correlate quite well with subjective evaluation. I am probably doing something similar, just not refined into a specific method.

Yeah, that’s similar to what DeltaWave does in the error distribution plot. Here's an example:

[attached plot: DeltaWave error distribution]
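For readers who haven't tried DeltaWave: the plot is essentially a histogram of the sample-by-sample residual between the two aligned captures. A self-contained toy version with synthetic data (not DeltaWave's actual code):

```python
import numpy as np
import matplotlib.pyplot as plt

fs = 48000
t = np.arange(fs) / fs
reference = np.sin(2 * np.pi * 1000 * t)
recorded = reference + 1e-3 * np.random.randn(fs)  # stand-in for a real capture

error = recorded - reference                       # residual after alignment
plt.hist(error, bins=200, density=True)
plt.xlabel("error amplitude")
plt.ylabel("density")
plt.title("Error distribution of the difference signal")
plt.show()
```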
 

Xulonn

Major Contributor
Forum Donor
Joined
Jun 27, 2018
Messages
1,828
Likes
6,311
Location
Boquete, Chiriqui, Panama
I don't have the time to investigate this, but look forward to hearing from resident experts on the value of this method and its results.

Apparently SoundExpert has been around for a few years, but has not taken the audio world by storm.

Are the listener testing results "preference" or "objective" data - or a combination of the two?
 

pma

Major Contributor
Joined
Feb 23, 2019
Messages
4,591
Likes
10,727
Location
Prague
If you look at the few document links in the post here
https://www.audiosciencereview.com/...hod-of-measuring-distortion.10282/post-282216
you will get the fastest answer; it will take something like 2 minutes of your time.

Activity like this is very welcome, because the correlation of listening preferences with THD, SINAD (which is nothing other than THD+N expressed in dB) at 5W, and other standard non-linear measurements is close to zero, and this near-zero correlation has been known for decades.
 

pkane

Master Contributor
Forum Donor
Joined
Aug 18, 2017
Messages
5,630
Likes
10,205
Location
North-East
Activity like this is very welcome, because the correlation of listening preferences with THD, SINAD (which is nothing other than THD+N expressed in dB) at 5W, and other standard non-linear measurements is close to zero, and this near-zero correlation has been known for decades.

Agree completely!
 

pkane

Master Contributor
Forum Donor
Joined
Aug 18, 2017
Messages
5,630
Likes
10,205
Location
North-East

Thanks, Arpiben. I’ll take a look when I get back to my large screen at home :)

The subject is of great interest to me, as I've done all sorts of measurements of various devices using various calculations. And in writing Distort, I found very little correlation between standard engineering metrics and what I can reliably distinguish.
 

Blumlein 88

Grand Contributor
Forum Donor
Joined
Feb 23, 2016
Messages
20,522
Likes
37,050
From the AES document, this looks very similar to, and essentially the same concept as, what Paul has in DeltaWave. So Paul can probably look at the AES document to see what is similar and what is different.

It might be interesting if @Serge Smirnoff could look at DeltaWave and try his algorithm on some given files to see if the results are the same as, or very similar to, DeltaWave's. You can download DeltaWave here:
https://deltaw.org/

Oh, and welcome to ASR Serge Smirnoff.
 

DDF

Addicted to Fun and Learning
Joined
Dec 31, 2018
Messages
617
Likes
1,355

pkane

Master Contributor
Forum Donor
Joined
Aug 18, 2017
Messages
5,630
Likes
10,205
Location
North-East
Earl Geddes attacked this conundrum and developed the GedLee metric, which you're probably familiar with. This link includes a tool for calculating it:
https://www.diyaudio.com/forums/software-tools/287291-gedlee-metric-demystified.html

I’ve come across it. A nonlinear transfer function is what causes THD and IMD. These are almost irrelevant to sound quality unless present at a very high magnitude. THD of about 1% seems very close to the threshold of audibility, for example.
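That relationship is easy to check numerically: pass a pure sine through a mildly nonlinear static transfer function and the harmonics fall straight out of the FFT. The cubic coefficient below is arbitrary, picked only to land near the 1% figure:

```python
import numpy as np

fs = 48000
n = fs                                   # 1 s of signal -> 1 Hz bin spacing
t = np.arange(n) / fs
x = np.sin(2 * np.pi * 1000 * t)
y = x + 0.04 * x**3                      # static nonlinear transfer function

spectrum = np.abs(np.fft.rfft(y * np.hanning(n)))
bins = [1000 * k for k in range(1, 6)]   # fundamental plus harmonics 2..5
fund = spectrum[bins[0]]
harm = np.sqrt(sum(spectrum[b] ** 2 for b in bins[1:]))
print(f"THD = {100 * harm / fund:.2f}%") # ~0.97% for this coefficient
```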

Nonlinear phase distortions are different and I haven’t added these yet to Distort (coming soon), but I have these measured and corrected for in DeltaWave.

I’m putting some finishing touches on the jitter generation option for Distort. Preliminary tests show that jitter is also very hard to hear unless added in copious amounts.
 

oivavoi

Major Contributor
Forum Donor
Joined
Jan 12, 2017
Messages
1,721
Likes
1,934
Location
Oslo, Norway
I’ve come across it. A nonlinear transfer function is what causes THD and IMD. These are almost irrelevant to sound quality unless present at a very high magnitude. THD of about 1% seems very close to the threshold of audibility, for example.

Nonlinear phase distortions are different and I haven’t added these yet to Distort (coming soon), but I have these measured and corrected for in DeltaWave.

I’m putting some finishing touches on the jitter generation option for Distort. Preliminary tests show that jitter is also very hard to hear unless added in copious amounts.

So what kinds of distortion can be heard, then, according to the tests you've run so far? (asking on behalf of all the other dogs who are too lazy to run listening tests on themselves)
 

pma

Major Contributor
Joined
Feb 23, 2019
Messages
4,591
Likes
10,727
Location
Prague
I agree with pkane. Time has passed since the Geddes metric was published, and we are no longer solving trivialities like crossover distortion.
 

Blumlein 88

Grand Contributor
Forum Donor
Joined
Feb 23, 2016
Messages
20,522
Likes
37,050
So what kinds of distortion can be heard, then, according to the tests you've run so far? (asking on behalf of all the other dogs who are too lazy to run listening tests on themselves)
Answering for myself, having used Distort mostly around 1% levels on music: I set up distortion profiles similar to real gear like amps and DACs, but increased the amount of distortion. I listened over headphones on the assumption that distortion from the phones is less than from speakers, and double-checked by running Foobar ABX on the saved results at different levels.

Haven't taken time to do things like compare all even vs all odd harmonics with any rigor.

EDIT to add: 1% can be discerned in comparison, but doesn't much bother you in isolation. Worse, even 3% doesn't bother you much with most music in isolation. 10% sounds BAD. So distortion levels at -40 dB are pretty okay, -30 dB not so bad, -20 dB oh no!
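(The percent-to-dB mapping here is just 20·log10 of the distortion ratio; a two-line check:)

```python
import math
for pct in (1, 3, 10):
    print(f"{pct:>2}% -> {20 * math.log10(pct / 100):.1f} dB")  # -40.0, -30.5, -20.0
```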
 

oivavoi

Major Contributor
Forum Donor
Joined
Jan 12, 2017
Messages
1,721
Likes
1,934
Location
Oslo, Norway
Answering for myself, having used Distort mostly around 1% levels on music: I set up distortion profiles similar to real gear like amps and DACs, but increased the amount of distortion. I listened over headphones on the assumption that distortion from the phones is less than from speakers, and double-checked by running Foobar ABX on the saved results at different levels.

Haven't taken time to do things like compare all even vs all odd harmonics with any rigor.

Cool. That's more or less in line with expectations, isn't it?

It would be interesting to compare the perception of THD directly with other kinds of distortion metrics if they were available, like the GedLee metric, the Rnonlin metric, or Harman's "non-coherent distortion" metric. Edit: yeah, and the new one developed by @Serge Smirnoff!
 

pkane

Master Contributor
Forum Donor
Joined
Aug 18, 2017
Messages
5,630
Likes
10,205
Location
North-East
So what kinds of distortion can be heard, then, according to the tests you've run so far? (asking on behalf of all the other dogs who are too lazy to run listening tests on themselves)

I’ll tell you when I finish adding all the possible distortions to Distort ;) Meanwhile we’ll run some blind tests on the audibility of various distortions to get a better idea of what others can and cannot hear.

Maybe the DF metric Serge shared here will turn out to be a better predictor of SQ than others. I can see how this could be valid, although it hides the actual (multiple) sources of distortion behind a simple amplitude error.
 

Serge Smirnoff

Active Member
Joined
Dec 7, 2019
Messages
240
Likes
136
The data visualization is spectacular, but not very informative imho. Using the same color scale for different tests ends up giving a nice bluish color to all the results of a certain test (say Sine 1 kHz), when obviously some devices are performing much better than the others, while the best results of other tests are still very red. I would map colors per test category. The best Sine 1kHz result would end up blue, the worst red. And, likewise, the best program simulation noise result would end up blue... This would have the added advantage of making the reason for some bad or good synthetic results more obvious visually.

Then, there is the issue of weighting of different parameters and correlation with perceived SQ.
Normally, in the df-metric there is no need to weight the various distortions, because all of them are already perfectly weighted in the output music signal (m-signal) of a DUT. And we can measure and research the distortion of that m-signal directly. Technical signals (t-signals) are only useful during the development of audio equipment; there is no need to use them for assessment of audio quality. All the required information about the latter is in the output m-signal.

There is another reason why the distortion of the m-signal has the highest status. Its distortion scale has a point (around -50dB, to my current understanding) at which the DUT becomes transparent for any listener. In other words, the output signal follows the input one so accurately that human hearing cannot distinguish them. In the end, a listener wants to have at the output of his amplifier exactly the same waveform he has in the file.

So, the color scale is absolute, as is the scale of Df values. And yes, some signals are distorted badly even in high-quality audio devices, so they are red, in accordance with the measurements )).
 

PierreV

Major Contributor
Forum Donor
Joined
Nov 6, 2018
Messages
1,437
Likes
4,686
There is another reason why the distortion of the m-signal has the highest status. Its distortion scale has a point (around -50dB, to my current understanding) at which the DUT becomes transparent for any listener. In other words, the output signal follows the input one so accurately that human hearing cannot distinguish them. In the end, a listener wants to have at the output of his amplifier exactly the same waveform he has in the file.

That sounds reasonable and, in any case, matches the self-tests I took.

So, the color scale is absolute, as is the scale of Df values. And yes, some signals are distorted badly even in high-quality audio devices, so they are red, in accordance with the measurements )).

OK, so most of the info is "by engineers for engineers", fair enough. I was thinking more in terms of the point of view of customers/average users, who would possibly be interested in a synthetic SQ measure/estimate that is grounded in science, and who would also like to know which particular characteristic "fails" an otherwise well-performing device.

Anyway, thanks for replying.
 