GPU Audio - the future of DSP?

Ra1zel · Jul 28, 2022

Dlomb11 said:
Honestly I would very much like to try FIR convolver because I think you could make a remarkable HiFi setup without having the problem induced by upsampling and linear phase filters with many TAPs (I think of some great crossovers).

I have no idea how it is possible to reduce inherent latency of high tap linear phase FIR filters. Thats why all studio converters use minimum phase. Sounds like people at ASR try to fight with math and physics again.

btw, I only ever achieved stable behavior below 1ms latency with AES67 networked converters.

ZolaIII · Jul 28, 2022

1 ms is a huge latency (equal to disastrous) in the world of modern HPC systems PCI bus considered as highest offender and that's why faster links and switches are developed. Latency is cumulative (all parts take their latency in final equation) and traditional GPU's aren't suitable for latency critical operations.

HPC-oriented Latency Numbers Every Programmer Should Know

HPC-oriented Latency Numbers Every Programmer Should Know - latency.txt

gist.github.com

Audio isn't latency critical but GPU's aren't really suitable for such operations that's why they have deticated (usually outdated) DSP's for such purposes (for example AMD still uses old Tensilica ones). Modern SoC's with modern interconnects are much more suitable for the purpose. Unfortunately we never got a good developer bord for such a purpose with full documentation and mainlined in order to such development even ever start. Some come close HiSilicone with Tensilica P5 P6 (never got mainlined and future work is dropped thanks to sanctions), Broadcom SoC's on the more potent Pi boards with V Core ADSP which all do old could have carried a lot of development burden for which all documentation disappeared (purposely), Old Ti opma panda boards... Industry obviously doesn't want this to happen so it remains the paper dream up to this day's (especially when it comes to GPU's for which Nv is one to blame the most).
Still there is hope as modern CPU's are becoming more and more SoC alike like recent Intel graphic units on the dye (partly based on Creative DSP design), hopefully one day someone will brake the chains. Best regards and have a good time.

Davide · Jul 29, 2022

As far as I'm concerned, no one here tries to fight anything, much less physics and math.
It is a forum, and here we discuss. Obviously within the limits of everyone's knowledge and skills. And this serves to increase these.
Your comments are interesting indeed, it is nice to receive quick technical information on non-mainstream issues.
What, as a layman, I struggle to understand, is why GPUs cannot be considered suitable for audio processing, when today they provide adequate latencies in video games (from the point of view of human interaction).

voodooless · Jul 29, 2022

Dlomb11 said:
What, as a layman, I struggle to understand, is why GPUs cannot be considered suitable for audio processing, when today they provide adequate latencies in video games (from the point of view of human interaction).

It has nothing to do with the GPU, but rather the filter technique itself: looking into the future has an inherent time penalty.

pierre · Jul 29, 2022

Dlomb11 said:
I don't know if you've ever tried mastering in a DAW with 7 or 8 plugins that do x32 upsampling and 64-bit processing (sometimes they are necessary so that the chain of these does not excessively degrade the audio quality).

The CPU runs at very high loads and the latency is just as high. And the problem is that if you have 1 second of latency and you are there fiddling with the effects, you cannot directly perceive the effect of what you are doing (it's a bit like ABX tests).

This is why taking advantage of GPUs, that already exist widely, is a valid solution.

if you have a modern CPU with 32 or 64 cores, the problem goes away if your DAW knows how to use them efficiently. The other option is to buy a DSP card from AVID or BlackMagick or Waves. PT or Resolve scale well.

For GPU, I am not sure I understand how they want to leverage them. They have a lot of small cores, so it is good for CPU expensive operations like video encoding but for Audio most operation are (relatively) lite on the CPU. I had in mind that most of the latency was from FIR filters now or operations that are sequential and less due to CPU being the bottleneck. To be efficient, you also would like the audio stream to come in and out from the GPU, possibly on HDMI.

ZolaIII · Jul 29, 2022

@Dlomb11 let's put it this way; hard to program, not well documented at all, made really for something else and include ASIC's and DSP's (that you again don't have access to) for such, big, chunky, expensive and noisy. Now you get a better picture. Now compare that to the SoC powering something like Samsung late buds (complete analog/digital system with DSP and all) that you could implement even in something like large 6.35 mm jack hosting. So why on earth would you prefer to use GPU? We are stuck with development because there is a need for Linux (Kernel) mainlined SoC - development board with potent DSP that is well documented and potent enough everything else to get it out public or open source if you wish. Today you at best get 64 bit FP (not for a sakes of audio precision but as wide as that) procsing done on CPU FPU - MPC units. Even moving to SIMD's (again not really easy to program) would be a big gain in both efficiency and processing power as you would have 128~256 bit vectors to pack them (with standard length instructions and even mixed). Of course this is for architectures which incorporate SIMD's and per each one independently. Future evolution of multipurpose multi functional accelerators should be based on flexible DSP's (in consumer grade products) and again only well documented and with appropriate toll chains disregarding of their additional graphic processing capabilities or if they are taylored for such in the first place (GPU's are large DSP areas after all).
Described Nv series are future more tailored for graphic processing and even lose badly in duble precision floting point calculations to average desktop CPU's with good optimised code because their FP duble precision performance is deliberately crippled severely by manufacturers (if you want unlocked FP64 you have to pay for it and get Quadro or such and the same thing goes for AMD).
To be fair they are much faster in 32 bit FP and integer (24, 32 & 64 bit) operations.
GPU's on propetry snake legs (no documentation no access on higher level) architecture is not worth of development time (including pretty much all GPU's and DSP architectures from such as QC).

Davide · Jul 29, 2022

ZolaIII said:
@Dlomb11 let's put it this way; hard to program, not well documented at all, made really for something else and include ASIC's and DSP's (that you again don't have access to) for such, big, chunky, expensive and noisy. Now you get a better picture. Now compare that to the SoC powering something like Samsung late buds (complete analog/digital system with DSP and all) that you could implement even in something like large 6.35 mm jack hosting. So why on earth would you prefer to use GPU? We are stuck with development because there is a need for Linux (Kernel) mainlined SoC - development board with potent DSP that is well documented and potent enough everything else to get it out public or open source if you wish. Today you at best get 64 bit FP (not for a sakes of audio precision but as wide as that) procsing done on CPU FPU - MPC units. Even moving to SIMD's (again not really easy to program) would be a big gain in both efficiency and processing power as you would have 128~256 bit vectors to pack them. Of course this is for architectures which incorporate SIMD's and per each one independently. Future evolution of multipurpose multi functional accelerators should be based on flexible DSP's and again only well documented and with appropriate toll chains disregarding of their additional graphic processing capabilities or if they are taylored for such in the first place (GPU's are large DSP areas after all).
Described Nv series are future more tailored for graphic processing and even lose badly in duble precision calculations floting point calculations to average desktop CPU's with good optimised code because their FP duble precision performance is deliberately crippled severely by manufacturers (if you unlocked FP64 you have to get Quadro or such and the same thing goes for AMD).
To be fair they are much faster in 32 bit FP and integer (24, 32 & 64 bit) operations.
GPU's on propetry snake legs (no documentation no access on higher level) architecture is worth of development time (including pretty much all GPU's and DSP architectures from such as QC).

Thank you @ZolaIII .
Very explanatory.

changster · Jul 31, 2022

Latency as the key selling point? That sounds very unattractive. There’s a thing called buffering for this.

pierre · Jul 31, 2022

changster said:
Latency as the key selling point? That sounds very unattractive. There’s a thing called buffering for this.

Can you explain what you mean? Latency is important in some use case, like recording or mixing, The lower the better, the best tail latency, the better. Buffering is increasing latency …

abdo123 · Jul 31, 2022

pierre said:
if you have a modern CPU with 32 or 64 cores.

It's really funny that you classify a EPYC Ryzen powered servers as 'modern CPU'.

You're not technically wrong but that's not typically a word i normally see to describe these.

Ra1zel · Jul 31, 2022

abdo123 said:
It's really funny that you classify a EPYC Ryzen powered servers as 'modern CPU'.

Threadrippers also have 32 cores, either way if we are considering today's hardware good enough even for most advanced audio applications give it 5 more years and it's hard to imagine any audio use will be considered more than pedestrian. Just now next generation of Intel and amd cpus should launch this year with 30% performance increase.

abdo123 · Jul 31, 2022

Ra1zel said:
Threadrippers also have 32 cores

AMD canceled the threadripper line.

voodooless · Jul 31, 2022

abdo123 said:
AMD canceled the threadripper line.

The Pro line is still there, targeted towards professional workloads. Seems like this kind of stuff would certainly quantify.

abdo123 · Jul 31, 2022

voodooless said:
The Pro line is still there, targeted towards professional workloads. Seems like this kind of stuff would certainly quantify.

Which is a weird product on its own but i digress.

GPU Audio - the future of DSP?

Ra1zel

Addicted to Fun and Learning

ZolaIII

Major Contributor

HPC-oriented Latency Numbers Every Programmer Should Know

Davide

Senior Member

voodooless

Grand Contributor

pierre

Addicted to Fun and Learning

ZolaIII

Major Contributor

Davide

Senior Member

changster

Member

pierre

Addicted to Fun and Learning

abdo123

Master Contributor

Ra1zel

Addicted to Fun and Learning

abdo123

Master Contributor

voodooless

Grand Contributor

abdo123

Master Contributor

Similar threads