
Benefits of capturing audio at higher sampling rates

Sergei

Senior Member
Forum Donor
Joined
Nov 20, 2018
Messages
361
Likes
272
Location
Palo Alto, CA, USA
Branching off the "MQA creator Bob Stuart answers questions" thread. One of the discussion topics that emerged there was whether higher sampling rates and bit depths, such as 192/24, are better suited to capturing transients and other "inconvenient" components of music signals. I want this branched thread to be more rational and less emotional than the MQA thread.

First, please consider the evidence I'm presenting: https://www.dropbox.com/s/qii14v911sregjw/S192_v4.zip?dl=0

I made all the file names descriptive. What the files with isolated distortions demonstrate is that even if the destination medium is CD, it still makes sense to capture audio at a higher sampling rate and bit depth. I'm interested in reading your accounts of whether you personally hear the difference between the file A and file J, and if so, what are the characteristics of music that you feel are different.
 

reza

Active Member
Joined
Mar 27, 2019
Messages
110
Likes
131
FYI: The Dropbox link, when clicked, redirects to ASR.
 

flipflop

Addicted to Fun and Learning
Joined
Feb 22, 2018
Messages
927
Likes
1,240
I'm interested in reading your accounts of whether you personally hear the difference between the file A and file J
I couldn't. How about you?
foo_abx 2.0.6c report
foobar2000 v1.3.16
2019-06-29 21:22:34

File A: A-S192.wav
SHA1: eec4c1c7eb651849d46d15571e86be874218ece8
File B: J-S192_33TapsHalfBandDoubleDownsampledTo48_AudacityResampledAndDitheredToCD_AudacityResampledTo192.wav
SHA1: 6e859799534ae5ef681496da3f097852e9ef0904

Output:
DS : Primær lyddriver
Crossfading: NO

21:22:34 : Test started.
21:23:24 : 01/01
21:24:06 : 02/02
21:25:01 : 02/03
21:25:26 : 02/04
21:25:49 : 02/05
21:26:10 : 02/06
21:26:59 : 02/07
21:28:02 : 03/08
21:28:02 : Test finished.

----------
Total: 3/8
p-value: 0.8555 (85.55%)

-- signature --
6505aeb13967fba53262ee6ebffa5fbd26b29ddc
 

solderdude

Grand Contributor
Joined
Jul 21, 2018
Messages
16,068
Likes
36,479
Location
The Neitherlands
I think most if not all folks here will agree that capturing at 24/192 is useful for many reasons.
Most studios work with either 96/24 or 192/24 for capture anyway.
It can record all that is needed.
What some here question is if it is really needed to distribute it that way as well.
For the majority of people 16/44 will suffice, hell even MP3/VBR2 will.
Those that want to listen to 192/24 or DSDx8 or whatever have a choice to do so.

What is the object of the exercise?
 

Sergei

Senior Member
Forum Donor
Joined
Nov 20, 2018
Messages
361
Likes
272
Location
Palo Alto, CA, USA
I couldn't. How about you?

No statistically significant difference for me either, yet with a couple of twists: one related to hardware and software, the other to listening fatigue. I decided to do the comparison with the same foo_abx version you used, on an unfamiliar Windows laptop, supposedly capable of playing 192/24, with the laptop directly driving Beyerdynamic DT 770 headphones. Got a result similar to yours.
foo_abx 2.0.6c report
foobar2000 v1.4.5
2019-06-29 23:55:34

File A: A-S192.wav
SHA1: eec4c1c7eb651849d46d15571e86be874218ece8
File B: J-S192_33TapsHalfBandDoubleDownsampledTo48_AudacityResampledAndDitheredToCD_AudacityResampledTo192.wav
SHA1: 6e859799534ae5ef681496da3f097852e9ef0904

Output:
DS : Primary Sound Driver
Crossfading: NO

23:55:34 : Test started.
23:58:47 : 00/01
23:58:50 : 01/02
23:58:53 : 01/03
23:58:57 : 02/04
23:59:00 : 03/05
23:59:03 : 03/06
23:59:06 : 03/07
23:59:09 : 04/08
23:59:13 : 05/09
23:59:20 : 05/10
23:59:23 : 05/11
23:59:26 : 06/12
23:59:29 : 06/13
23:59:33 : 06/14
23:59:37 : 06/15
00:01:11 : 06/16
00:01:11 : Test finished.

----------
Total: 6/16
p-value: 0.8949 (89.49%)

-- signature --
88c1afdd40151fa522c0528f7bdf301b8a92ab74
Couldn't tell any difference at all, sighted or unsighted. Upon investigation, which involved switching foobar2000 from DS : Primary Sound Driver to an ASIO driver for the laptop's internal DAC, it turned out that the DAC is not actually capable of playing 192/24, and either the vendor-supplied driver or the Windows Audio Engine was silently resampling. No wonder I had to essentially give up in the middle of the test!

Switched to an external consumer-grade USB DAC with sampling rate indicator. Same laptop. Same headphones. This time it wasn't as hopeless. I thought I could hear a tiny difference, most of the time. Sometimes I couldn't hear anything different, and resorted to guessing. Ran ABX twice, with the same result.
foo_abx 2.0.6c report
foobar2000 v1.4.5
2019-06-30 19:08:38

File A: A-S192.wav
SHA1: eec4c1c7eb651849d46d15571e86be874218ece8
File B: J-S192_33TapsHalfBandDoubleDownsampledTo48_AudacityResampledAndDitheredToCD_AudacityResampledTo192.wav
SHA1: 6e859799534ae5ef681496da3f097852e9ef0904

Output:
WASAPI (push) : Speakers (DX3 Pro), 24-bit
Crossfading: NO

19:08:38 : Test started.
19:12:12 : 01/01
19:13:57 : 02/02
19:15:14 : 02/03
19:16:18 : 03/04
19:17:39 : 04/05
19:19:54 : 05/06
19:20:57 : 05/07
19:22:28 : 05/08
19:25:20 : 05/09
19:26:53 : 06/10
19:28:14 : 06/11
19:31:14 : 07/12
19:33:02 : 07/13
19:35:06 : 08/14
19:36:14 : 09/15
19:37:49 : 09/16
19:37:49 : Test finished.

----------
Total: 9/16
p-value: 0.4018 (40.18%)

-- signature --
622611de07cd1bdc171aeb4ce3af6cfafebcbcf8
foo_abx 2.0.6c report
foobar2000 v1.4.5
2019-06-30 19:51:01

File A: A-S192.wav
SHA1: eec4c1c7eb651849d46d15571e86be874218ece8
File B: J-S192_33TapsHalfBandDoubleDownsampledTo48_AudacityResampledAndDitheredToCD_AudacityResampledTo192.wav
SHA1: 6e859799534ae5ef681496da3f097852e9ef0904

Output:
WASAPI (push) : Speakers (DX3 Pro), 24-bit
Crossfading: NO

19:51:01 : Test started.
19:55:14 : 00/01
19:56:30 : 01/02
19:58:42 : 02/03
20:00:31 : 02/04
20:02:28 : 02/05
20:04:14 : 03/06
20:05:38 : 03/07
20:07:11 : 03/08
20:08:28 : 04/09
20:11:03 : 05/10
20:12:43 : 05/11
20:14:31 : 06/12
20:18:06 : 07/13
20:19:35 : 07/14
20:21:37 : 08/15
20:23:35 : 09/16
20:23:35 : Test finished.

----------
Total: 9/16
p-value: 0.4018 (40.18%)

-- signature --
8cfa65d8a9d76a6db5b95a762c3374567529c21e
Far from spectacular, or even significant, as expected on such a simple piece of music. The influence of listening fatigue was interesting though. In both tests involving hardware and software actually capable of playing 192/24, I did better at the very beginning of the test. In the second test, I took a break of more than 3 minutes near the end, and this seemingly restored my ability to differentiate for a while.
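
For anyone who wants to double-check the p-values in these reports: they are consistent with a one-sided binomial test against pure guessing. Below is a minimal sketch, assuming that is how foo_abx arrives at them (I have not checked the plugin's source). The function name `abx_p_value` is made up for illustration.

```python
from math import comb

def abx_p_value(correct: int, trials: int) -> float:
    """Chance of scoring at least `correct` out of `trials` by coin-flip guessing."""
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials

# The three distinct scores reported in this thread.
for correct, trials in [(3, 8), (6, 16), (9, 16)]:
    print(f"{correct}/{trials}: p = {abx_p_value(correct, trials):.4f}")
```

By the same formula, a 16-trial run needs at least 12 correct answers to get below the customary 0.05 threshold.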
 

Sergei

Senior Member
Forum Donor
Joined
Nov 20, 2018
Messages
361
Likes
272
Location
Palo Alto, CA, USA
I think most if not all folks here will agree that capturing at 24/192 is useful for many reasons.
Most studios work with either 96/24 or 192/24 for capture anyway.
It can record all that is needed.

That's debatable. I thought that way too, yet it turned out to be a Californian bias, I guess mostly due to the proximity of major Hollywood studios, which had to switch to 192/24 during the ascendance of Blu-ray, and could afford the switch. Now that I have thought more about it, I'm siding with what other ASR members said: there is still a significant number of studios in the world using 48/24 and even 44/16 for capture.
What some here question is if it is really needed to distribute it that way as well.
For the majority of people 16/44 will suffice, hell even MP3/VBR2 will.
Those that want to listen to 192/24 or DSDx8 or whatever have a choice to do so.

Another view I changed. I thought that the existing corpus of songs shared by major streaming services would gradually be converted to Hi-Res. Now I think that one of the reasons the conversion proceeds at such a glacial pace is that Hi-Res masters are simply not available for most songs. If those studios keep capturing in 48/24 and 44/16, the situation isn't going to change drastically anytime soon.
What is the object of the exercise?
To give the interested ASR members a glimpse of what significantly influenced my opinion on the subject of transients. Instead of considering music as a pure mix of slowly evolving sinusoids, I believe it pays to take into account the intentional noise and transients as well. Especially in the context of lossy formats.
 

solderdude

Grand Contributor
Joined
Jul 21, 2018
Messages
16,068
Likes
36,479
Location
The Neitherlands
there is still a significant number of studios in the world using 48/24 and even 44/16 for capture.

The local studio of George Baker (of the hit single "Little Green Bag") indeed still records at 48/24, which he feels is enough.
This at least was the case when I spoke to him in his studio about 2 years ago.


Methinks your view of transients just differs from others'. There is no one here who believes audio is a mix of slowly evolving sinusoids.
 

Sergei

Senior Member
Forum Donor
Joined
Nov 20, 2018
Messages
361
Likes
272
Location
Palo Alto, CA, USA
I've also filtered out the 20 kHz and below, then done the shift to lower frequencies so I can surely hear all of it. There is nothing to hear. If you boost the volume some there is still very little to hear. People keep promising how this stuff matters. There just isn't much of anything up there, and we SIMPLY CANNOT HEAR those frequencies.

Not arguing with that. You may want to take a look at this though:

http://www.till.com/articles/PickupResponse/
https://robrobinette.com/Tube_Guitar_Amp_Overdrive.htm

I don't remember for sure regarding this particular song, because I didn't set up the microphones and amp feeds myself, yet I believe we captured the guitars either with a microphone placed close to an amp, or via an amp's post-distortion line output, if it had one. The guitars and amps generate some weirdly-shaped signals with correspondingly unwieldy spectra, like the ones pictured in the referred articles.

e_E_combo.bmp


lowEMid.gif


HeavyOverdriveScope.jpg



I believe that most of what appears to be "ultrasonic noise" consists of artifacts caused by an attempt to Fourier-transform components of signals that are "inconvenient" for such a transform. In effect, those could be remnants generated by the forceful process of approximating triangles and squares with a weighted sum of sinusoids.
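
To put a rough number on the "weighted sum of sinusoids" picture: an ideal square wave contains only odd harmonics, with amplitudes falling off as 1/k, so part of its power always lies above any cutoff. The sketch below assumes an ideal 1 kHz square wave and a 20 kHz cutoff. Both are hypothetical illustration values, not measurements of the guitar signals in the linked articles, and the function name is made up.

```python
import math

def square_power_fraction_above(f0_hz: float, cutoff_hz: float,
                                max_harmonic: int = 200_001) -> float:
    """Fraction of an ideal square wave's AC power in harmonics above cutoff_hz.

    Odd harmonic k has amplitude 4/(pi*k), so its power is proportional to 1/k**2.
    The sum of 1/k**2 over all odd k is pi**2/8, which serves as the total.
    """
    above = sum(1.0 / k ** 2
                for k in range(1, max_harmonic + 1, 2)
                if k * f0_hz > cutoff_hz)
    return above / (math.pi ** 2 / 8)

print(f"{square_power_fraction_above(1_000, 20_000):.2%} of the power is above 20 kHz")
```

For these assumed values the fraction comes out at about 2%. A real clipped guitar signal has finite rise times, so its share would be smaller.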

You may notice that the cymbal strikes tend to be aligned with these "ultrasonic noise" bursts. The 17 Hz could be a half of a five- or six-string bass guitar 34 Hz fundamental: the performing talent has been well known for their love of weird and rare guitars. The first article referred above explains how such 34->17 Hz non-linear metamorphosis might happen.

Well, those articles don't talk in depth about the bizarre-signal-generating puppies depicted below, which the performing talent is wildly obsessed with (at least here in California). If you think that's the only pedal board involved in a typical performance, think again! And what's the deal with those pink boxes, supposedly applying a British transform to the American sound? :)

IMG_8706_downsampled.jpg
 

solderdude

Grand Contributor
Joined
Jul 21, 2018
Messages
16,068
Likes
36,479
Location
The Neitherlands
I believe we captured the guitars either with a microphone placed close to an amp

In that case there surely will be no 'sharp transients', because the large speakers usually roll off way before 20 kHz. Those who have tried to use guitar amps as hi-fi speakers will know.

Even when fed directly in a console the input filter of the ADC will remove all* the US content and 'sharper than allowed' transients anyway.

* all as in how that filter is constructed.

You are showing a captured waveform.
How do you know if there was relevant ultrasonic info present that had not been recorded ?
One can only prove this when the same signal was also recorded at a much higher rate and compared.

The squarewave... it's a nice plot, but there is no timebase shown, nor amplitude. Also, we don't know anything about the signal.
It could well be a 100Hz signal for instance.
What was it ? the output of an effect box ? what was it fed ? (seems like a continuous signal) a squarewave ? an undistorted sine wave ? A sinewave with harmonics in it ?
The scope image really means nothing here as 'proof' of anything.
 

Cosmik

Major Contributor
Joined
Apr 24, 2016
Messages
3,075
Likes
2,180
Location
UK
@Sergei

I don't think you saw my question on the other thread?

Which is more detrimental to music?
  1. Woofer/mid/tweeter arriving at different times at the listener's ear - say discrepancies of the order of 100us
  2. Recording and playback at 44.1kHz
The impression I get is that people are very interested in this sampling stuff while listening to it on very expensive, but very corrupting, speakers.
 

Sergei

Senior Member
Forum Donor
Joined
Nov 20, 2018
Messages
361
Likes
272
Location
Palo Alto, CA, USA
@Sergei

Which of these two would you advocate that the would-be audiophile should attend to first?
  1. Eliminate phase and timing anomalies in their speakers
  2. Ensure that most of their recordings from now on are 192/24 or MQA
If there is any notion of speaker timing anomalies being 'constant' and therefore nulled out by human hearing, I would dispute it. If a percussionist hits a drum twice in a row, both strikes will be different. Different proportions of the sound will make their way to the woofer versus the tweeter, for example. If there's a physical timing anomaly in the speakers (such as both drivers being mounted on a vertical flat baffle without delay correction), the timing variations thus produced will swamp the supposed inaccuracy of CD by a factor of 'many'.

I agree that addressing first the weakest link in a particular music delivery chain is the most prudent approach.
And yet the MQA people don't mention this in their publicity material...

Well, they do recommend buying Meridian gear, such as their phase-aligned speakers, utilizing digital crossovers. No doubt, yet another part of the "evil plan" :)
 

Cosmik

Major Contributor
Joined
Apr 24, 2016
Messages
3,075
Likes
2,180
Location
UK
I agree that addressing first the weakest link in a particular music delivery chain is the most prudent approach.
Before worrying about this sampling stuff, did you do any work on quantifying the timing misalignments due to speakers in the real world? If a multi-way speaker is set up 'perfectly', and the listener sinks in their seat and ends up 1" lower, does the resulting misalignment dwarf your worries regarding sampling?

My suspicion is that speakers are never set up perfectly, and even if they were, the listener would have to have their head in the proverbial vice to benefit from it to anywhere near the levels of finesse you are worrying about. Plus you would need humidity and temperature control.

As I said in another thread, with digital audio you can keep zooming in on the theoretical errors until they fill the screen, ad infinitum. With real world hardware, there reaches a point where you have to stop zooming.
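
For anyone who wants to put numbers on the "sinks 1 inch lower" question, here is a rough geometric sketch. Everything in it is an assumption for illustration: the driver heights, the 3 m listening distance, the speed of sound, and the function name. It ignores crossover phase, baffle effects and room reflections entirely.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed (dry air at roughly 20 °C)

def driver_delay_difference_us(ear_height_m: float, tweeter_height_m: float,
                               woofer_height_m: float, distance_m: float) -> float:
    """Arrival-time difference (woofer minus tweeter) at the ear, in microseconds."""
    to_tweeter = math.hypot(distance_m, ear_height_m - tweeter_height_m)
    to_woofer = math.hypot(distance_m, ear_height_m - woofer_height_m)
    return (to_woofer - to_tweeter) / SPEED_OF_SOUND * 1e6

# Hypothetical setup: tweeter at 1.00 m, woofer at 0.80 m, listener 3 m away.
before = driver_delay_difference_us(1.00, 1.00, 0.80, 3.0)
after = driver_delay_difference_us(1.00 - 0.0254, 1.00, 0.80, 3.0)  # ears 1 inch lower

print(f"shift from sinking 1 inch: {after - before:+.1f} µs")
print(f"44.1 kHz sample period:    {1e6 / 44_100:.1f} µs")
```

The result depends strongly on driver spacing and listening distance, so it is worth plugging in your own measurements before drawing conclusions either way.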
 

Cosmik

Major Contributor
Joined
Apr 24, 2016
Messages
3,075
Likes
2,180
Location
UK
When setting up my speakers, the delay between drivers is controlled in units of the humongous 44.1 kHz sample period that so appals you. And even then, there's a bit of guesswork in exactly which one to choose; what height my ears are going to be at and so on.
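
For scale, one sample period can be converted into the distance sound travels in that time. A small sketch, assuming a speed of sound of 343 m/s; the helper names are made up for illustration.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed

def sample_period_us(rate_hz: float) -> float:
    """Duration of one sample in microseconds."""
    return 1e6 / rate_hz

def distance_per_sample_mm(rate_hz: float) -> float:
    """How far sound travels in air during one sample period, in millimetres."""
    return SPEED_OF_SOUND / rate_hz * 1000.0

for rate in (44_100, 48_000, 192_000):
    print(f"{rate:>7} Hz: {sample_period_us(rate):6.2f} µs per sample, "
          f"{distance_per_sample_mm(rate):5.2f} mm of travel")
```

At 44.1 kHz one sample corresponds to a little under 8 mm of path length, which is the granularity of a whole-sample driver delay setting.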
 

FrantzM

Major Contributor
Forum Donor
Joined
Mar 12, 2016
Messages
4,377
Likes
7,881
I am all for capturing at the best rate technically and financially achievable. IMO good compromises would be 192/24, or perhaps 176.4/24 since the final delivery will be in 44.1/16. Manipulations are intrinsic to the process of producing recorded music; if these are performed on an original recorded at a higher rate, then we are (somewhat) assured of optimal quality in the finished product.

High Resolution is mostly an audiophile reaction IMO. When they started showing, wrongly, those staircases to represent what PCM does, it struck a chord in the minds of most audiophiles... From that point they started thinking (and this hasn't changed today) that something in digital had to be missing... The cut-off at 22 kHz was too low, we were losing precious ultrasonics (those we can't hear anyway :)), and analog became the absolute reference. The counterpoint from the commercial digital faction was to come up with something that would be "better than CD" and "closer to analog", thus High Res 96/24, where the cut-off is 48 kHz... Most ultrasonics (those we can't hear anyway) should be fully recorded... On the face of it, it doesn't matter that microphones do not capture those, nor do console electronics, or that, as Sergei remarked, most music is produced in 44.1 or 48/16. No! They will still sell you upsampled constructs as High Res and... so conditioned, we will "hear the differences"... A bit of re-mastering will sway most in the so-called right direction.

I know enough now, thanks to ASR, to enjoy music even at the 320 kbps rate of Spotify and Tidal. Perfectly content. The real progress resides in speakers and their digital treatment. The existence of (relatively) inexpensive high-quality amplification modules (Hypex, B&O CE, etc.) ushers in a new era of active speakers with better DSP. Speaker systems as diverse as the JBL M2, Beolab 90 & 50, Kii 3 and Kii BXT, Dutch & Dutch 8c, and yes, the Linkwitz LX521 and LXmini are the harbingers of a DSP and active speaker future. Speakers that can be tuned to the room.

P.S and OT :)

A speaker system that needs to be auditioned to get an idea of how good DSP and 44.1 can be is the no-longer-produced B&O Beolab 5. It deserves much respect. I have heard these and came away impressed, but at that time I was too much of an audiophile to accept that fact. Details are a bit fuzzy, but I know of a person in Miami, Florida who replaced his >$300,000 Wilson-based system (you know, the classic audiophile "shrine" with rack, cables, DAC, pre, monoblocks, etc.) with a pair of Beolab 5s and has been happy since.
 

March Audio

Master Contributor
Audio Company
Joined
Mar 1, 2016
Messages
6,378
Likes
9,321
Location
Albany Western Australia
To give the interested ASR members a glimpse of what significantly influenced my opinion on the subject of transients. Instead of considering music as a pure mix of slowly evolving sinusoids, I believe it pays to take into account the intentional noise and transients as well. Especially in the context of lossy formats.

@Sergei you have to get this idea out of your head that "transients" have some kind of fundamentally different meaning to "high frequency content".

If a transient has high frequency content above the range of hearing *this content will not be heard*.

Capturing at 96 kHz bandwidth simply won't help you hear content that is above your hearing range.

You can repeat yourself as much as you like, but it won't change simple facts.
 

Rja4000

Major Contributor
Forum Donor
Joined
May 31, 2019
Messages
2,770
Likes
4,721
Location
Liège, Belgium
I'm quite sure no pro studio is recording at 16 bits nowadays.
Not even home studios.
(Whether the hardware they use allows the full benefit of this resolution is another question.)

But probably, most production is still 48kHz.
Or 96kHz.
Live events are typically 48kHz too (except really big productions maybe).

The benefit of the higher resolution seems obvious to me: it offers more room for sound transformation and manipulations.
And more error margin while recording.
You're also further away from the borders of the significant data, so less risk to corrupt it.
More margin again.

It's like digital photography. When the max resolution was 6 MP, that was about what was needed to print at an OK resolution.
Now, with a market average around 24 MP, there is a significant margin, allowing cropping and heavy processing while keeping the final 6 MP or 8 MP target. In the early days of digital photography, some thought film was still better, because they could see some problems at the margin. Nowadays, this is no longer true.
The exact same thing happened in the history of digital audio.
 

Sergei

Senior Member
Forum Donor
Joined
Nov 20, 2018
Messages
361
Likes
272
Location
Palo Alto, CA, USA
In that case there surely will be no 'sharp transients', because the large speakers usually roll off way before 20 kHz. Those who have tried to use guitar amps as hi-fi speakers will know.

If you try to reproduce sinusoids, yes, they roll off. But these amps and speakers are not optimized for sinusoids. They are designed to emit massively distorted signals. Look at this for instance: https://www.parts-express.com/pedocs/specs/290-486--eminence-legend-1258-spec-sheet.pdf - it is even advertised as a highly non-linear device.

One way to generate distortions is to accelerate the speaker cone to a significant speed and then let it run into an elastic yet heavily decelerating stop. This will generate a significant change in the cone speed, and thus a sound pressure spike. Another way to generate distortions is to make the speaker cone non-rigid, so that the non-piston (a.k.a. breakup) vibration modes get excited.
Even when fed directly in a console the input filter of the ADC will remove all* the US content and 'sharper than allowed' transients anyway.

* all as in how that filter is constructed.

Indeed, that's approximately what the double-half-band-LPF in the ADC I used does when capturing at 48/24. Technically speaking, it doesn't completely remove the transients. It smooths them out. The energy of a one-sample 192/24 transient is spread over several neighboring samples in 48/24. The amplitude of the transient at the 48/24 sample corresponding in time to the original 192/24 sample is attenuated by a factor of about 4.
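
The smoothing described above can be reproduced with a generic filter. The sketch below is not the ADC's actual filter: it assumes a textbook 33-tap Hamming-windowed-sinc half-band low-pass (tap count borrowed from the test file name), applied twice with a drop-every-other-sample step after each pass. The function names and the `phase` parameter are made up for illustration.

```python
import numpy as np

def half_band_fir(num_taps: int = 33) -> np.ndarray:
    """Windowed-sinc low-pass, cutoff at a quarter of the sample rate, unity DC gain."""
    n = np.arange(num_taps) - (num_taps - 1) // 2
    h = 0.5 * np.sinc(0.5 * n) * np.hamming(num_taps)
    return h / h.sum()

def decimate_by_4(x: np.ndarray, phase: int = 0) -> np.ndarray:
    """Two filter-then-halve stages: 192 kHz -> 96 kHz -> 48 kHz.

    `phase` (0..3) selects which of the four possible sample alignments is kept.
    """
    h = half_band_fir()
    stage1 = np.convolve(x, h)[phase % 2::2]
    return np.convolve(stage1, h)[phase // 2::2]

impulse = np.zeros(256)
impulse[128] = 1.0  # a one-sample transient at 192 kHz

for phase in range(4):
    y = decimate_by_4(impulse, phase)
    spread = int(np.sum(np.abs(y) > 0.01))  # samples carrying more than 1% of the input peak
    print(f"alignment {phase}: peak {np.abs(y).max():.3f}, samples above 0.01: {spread}")
```

When the transient happens to line up with a kept sample, the peak comes out near 0.25 of the original. For the other alignments the peak is lower and the energy is spread over neighboring samples.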

When capturing at 192/24, the ADC I used logically captures signal change (delta) every 163 nanoseconds, and converts from that high-sample-rate yet shallow-bit-depth differential format into 24-bit PCM format by effectively averaging (sigma) the signal over 5.2 microseconds. Physically, the delta and sigma operations may be swapped at some or all stages (there are 5 stages). What matters in the end is the averaging over the 5.2 microseconds.
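
The two time constants above are consistent with simple arithmetic, assuming each of the 5 stages halves the sample rate. That per-stage factor is my assumption for this sketch; the post only states the number of stages.

```python
OUTPUT_RATE_HZ = 192_000
DECIMATION_STAGES = 5        # stated in the post
DECIMATION_PER_STAGE = 2     # assumption: every stage decimates by 2

# Rate at which the modulator would run under that assumption (32 x 192 kHz).
modulator_rate_hz = OUTPUT_RATE_HZ * DECIMATION_PER_STAGE ** DECIMATION_STAGES

modulator_period_ns = 1e9 / modulator_rate_hz
output_period_us = 1e6 / OUTPUT_RATE_HZ

print(f"modulator rate:   {modulator_rate_hz / 1e6:.3f} MHz")
print(f"modulator period: {modulator_period_ns:.1f} ns")
print(f"output period:    {output_period_us:.2f} µs")
```

Under that assumption the modulator runs at 6.144 MHz, which gives the roughly 163 ns step, and one 192 kHz output sample spans about 5.2 µs.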
You are showing a captured waveform.
How do you know if there was relevant ultrasonic info present that had not been recorded ?
One can only prove this when the same signal was also recorded at a much higher rate and compared.

I agree. What is captured at 192/24 is not the absolutely precise rendition of the signal. What I'm demonstrating is the comparison between the 192/24 and 48/24 captures of the signal, for a specific ADC chip, modeled with the level of understanding I could attain from the chip's data sheet and other technical data sources. For all I know, the difference between 384/24 and 48/24 captures using another chip may have revealed other distortions.
The squarewave... it's a nice plot, but there is no timebase shown, nor amplitude. Also, we don't know anything about the signal.
It could well be a 100Hz signal for instance.
What was it ? the output of an effect box ? what was it fed ? (seems like a continuous signal) a squarewave ? an undistorted sine wave ? A sinewave with harmonics in it ?
The scope image really means nothing here as 'proof' of anything.

That was the output of a specific guitar amp described in the article, in its overdrive (clipping) mode. The article is pretty educational. Such a guitar "amp" is not really a conventional amplifier per se, but a highly non-linear electronic musical instrument, which is controlled by yet another highly non-linear electromechanical musical instrument: the guitar.
 

Sergei

Senior Member
Forum Donor
Joined
Nov 20, 2018
Messages
361
Likes
272
Location
Palo Alto, CA, USA
Before worrying about this sampling stuff, did you do any work on quantifying the timing misalignments due to speakers in the real world? If a multi-way speaker is set up 'perfectly', and the listener sinks in their seat and ends up 1" lower, does the resulting misalignment dwarf your worries regarding sampling?

My suspicion is that speakers are never set up perfectly, and even if they were, the listener would have to have their head in the proverbial vice to benefit from it to anywhere near the levels of finesse you are worrying about. Plus you would need humidity and temperature control.

As I said in another thread, with digital audio you can keep zooming in on the theoretical errors until they fill the screen, ad infinitum. With real world hardware, there reaches a point where you have to stop zooming.
I agree with everything you said. However, a simple way to escape the tyranny of imperfect speakers and uncontrolled room acoustics is to use studio-quality headphones.
 