New low-cost DSP platform in development

It's a very logical expectation. Yet AFAIK no such device is on the market, not even a simple USB DSP filter with configurable input sample rates and channel combinations, let alone one that passes proprietary protocols further downstream :) I can imagine the DSP getting in the way of the UAC2 protocol/function and just passing through all the other functions (HID, mass storage, etc.).
Sometimes the screen and controls may be wrapped up in UAC2 quirks rather than nicely separated out into HID or whatever. That's how the Forte handles its display, rotary encoder and touch buttons. That makes @TunaBug's desired operation even trickier.
 
I'm not seeing i2s on that one.
Where's Waldo?
 
You tell me. It's not clear that it makes it out to the header:
PIN  Function          Function          PIN
1    VCC3V3_SYS        VCC5V0_OUT        2
3    AP_I2C4_SDA_3V3   VCC5V0_OUT        4
5    AP_I2C4_SCL_3V3   GND               6
7    GPIO_70_3V3       R_UART0_TXD_3V3   8
9    GND               R_UART0_RXD_3V3   10
11   GPIO_71_3V3       GPIO_74_3V3       12
13   GPIO_72_3V3       GND               14
15   GPIO_73_3V3       GPIO_91_3V3       16
17   VCC3V3_SYS        GPIO_92_3V3       18
19   SPI3_MOSI_3V3     GND               20
21   SPI3_MISO_3V3     GPIO_49_3V3       22
23   SPI3_SCLK_3V3     SPI3_CS_3V3       24
25   GND               GPIO_50_3V3       2
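
A quick way to check the header pinout above for a given peripheral, as a rough sketch. The function names are copied from the listing; `find_functions` is a hypothetical helper, not part of any board tooling. It only matches on the printed names, so it cannot tell whether one of the GPIO pins could be muxed to I2S on the SoC side.

```python
# Function names as listed on the header, in pin order (odd/even pairs per row).
HEADER_FUNCTIONS = [
    "VCC3V3_SYS", "VCC5V0_OUT",
    "AP_I2C4_SDA_3V3", "VCC5V0_OUT",
    "AP_I2C4_SCL_3V3", "GND",
    "GPIO_70_3V3", "R_UART0_TXD_3V3",
    "GND", "R_UART0_RXD_3V3",
    "GPIO_71_3V3", "GPIO_74_3V3",
    "GPIO_72_3V3", "GND",
    "GPIO_73_3V3", "GPIO_91_3V3",
    "VCC3V3_SYS", "GPIO_92_3V3",
    "SPI3_MOSI_3V3", "GND",
    "SPI3_MISO_3V3", "GPIO_49_3V3",
    "SPI3_SCLK_3V3", "SPI3_CS_3V3",
    "GND", "GPIO_50_3V3",
]


def find_functions(names, keyword):
    """Return every listed function whose name contains keyword (case-insensitive)."""
    return [name for name in names if keyword.lower() in name.lower()]


if __name__ == "__main__":
    for keyword in ("I2S", "I2C", "SPI3", "UART"):
        print(keyword, find_functions(HEADER_FUNCTIONS, keyword))
```

With this listing, the I2S search comes back empty while I2C, SPI3 and UART each match, which is consistent with I2S not being labelled on the header.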
 
Here's the complete pinout of the SoC:
Looks like Mr Banana here actually has very few of those pins available, so our next best bet is to custom craft a board with everything we need. Feeling up for the challenge?
 
Unfortunately, this project requires RISC-V, so ARM boards are a nonstarter.
 
Looks like Mr Banana here actually has very few of those pins available, so our next best bet is to custom craft a board with everything we need. Feeling up for the challenge?
All SBCs pass only a small subset of the extensive SoC features onto their pin headers. For more features, or for actual embedding into a product, core boards/compute modules are used, typically with edge connectors/castellated holes/LGA pads/high-density connectors. Only the large-volume integrators solder the actual SoCs, DDRs etc. onto their product boards. Very likely no core board is available for your chosen SoC; the market is still tiny and young.
Unfortunately, this project requires RISC-V, so ARM boards are a nonstarter.
Yes, if you want to code the DSP in assembler. Honestly, good luck with that. For any higher-level language the architecture is not that important. E.g. CamillaDSP leaves the optimization of vector operations for the particular architecture to the Rust compiler, or uses small sections designed for the architecture, e.g. https://github.com/HEnquist/rubato/...nterpolator/sinc_interpolator_neon.rs#L60-L89 .

IMO that's the way to go when writing high-performance code. My 2 cents: 99.9% of the code for this project, as large as it is envisioned to be, would be identical for any CPU architecture. But that's just my opinion.
 
Another issue with the USB bridge compatible with the original custom-designed vendor drivers would be the USB endpoint numbers which the bridge would have to duplicate from the bridged device. IMO this is not a viable project. Perhaps a pushbutton to physically bridge the USB lines within the DSP device to allow convenient quick access to the bridged device from the host would do.
 
Progress update: The custom assembler is now generating RISC-V machine code and almost on par with the GCC reference!
Code:
Ok, here's the assembled code:
07 35 05 00 87 35 85 00 07 36 05 01 87 36 85 01 93 05 10 00 d3 85 05 d2 43 05 b5 02 93 02 a0 00 13 03 a0 00 b7 03 00 40 93 83 03 00 53 80 03 c0 53 00 00 42 43 05 05 02 37 0e 80 3f 13 0e 0e 00 d3 00 0e c0 d3 80 00 42 53 05 15 0a 6f 80 00 01 63 86 62 00 6f c0 82 05 43 05 b5 02 b7 0e 20 41 93 8e 0e 00 53 81 0e c0 53 01 01 42 53 05 25 12 63 96 62 00 6f c0 80 01 43 05 b5 02 13 0f a0 00 63 86 e2 01 6f c0 80 01 43 05 b5 02 93 0f a0 00 63 86 5f 00 6f c0 80 01 43 05 b5 02 13 06 a0 00 63 86 c2 00 6f c0 80 01 43 05 b5 02 63 06 00 00 6f c0 80 01 43 05 b5 02 d3 26 a5 a2 63 96 06 00 6f c0 80 01 43 05 b5 02 37 07 70 41 13 07 07 00 d3 01 07 c0 d3 81 01 42 63 16 07 00 6f c0 80 01 43 05 b5 02 b7 07 70 41 93 87 07 00 53 82 07 c0 53 02 02 42 63 96 07 00 6f c0 80 01 43 05 b5 02 13 08 10 00 27 30 a5 00 27 b4 a5 00 27 38 a6 00 27 bc a6 00 67 80 00 00
temp.s: Assembler messages:
temp.s: Warning: end of file not at end of a line; newline inserted
Allocating 31 bytes, 48998 used
Allocating 61 bytes, 49029 used
Allocating 121 bytes, 49090 used
Allocating 241 bytes, 49211 used
Allocating 481 bytes, 49452 used
Ok, here's the reference code:
07 35 05 00 87 35 85 00 07 36 05 01 87 36 85 01 93 05 10 00 d3 85 05 d2 53 75 b5 02 93 02 a0 00 13 03 a0 00 b7 03 00 40 53 80 03 f0 53 00 00 42 53 75 05 02 37 0e 80 3f d3 00 0e f0 d3 80 00 42 53 75 15 0a 6f 00 40 00 63 84 62 00 6f 00 40 02 53 75 b5 02 b7 0e 20 41 53 81 0e f0 53 01 01 42 53 75 25 12 63 94 62 00 6f 00 80 00 53 75 b5 02 13 0f a0 00 63 84 e2 01 6f 00 80 00 53 75 b5 02 93 0f a0 00 63 84 5f 00 6f 00 80 00 53 75 b5 02 13 06 a0 00 63 84 c2 00 6f 00 80 00 53 75 b5 02 63 04 00 00 6f 00 80 00 53 75 b5 02 d3 26 a5 a2 63 94 06 00 6f 00 80 00 53 75 b5 02 37 07 70 41 d3 01 07 f0 d3 81 01 42 53 17 35 a2 63 14 07 00 6f 00 80 00 53 75 b5 02 b7 07 70 41 53 82 07 f0 53 02 02 42 d3 17 45 a2 63 94 07 00 6f 00 80 00 53 75 b5 02 13 08 10 00 27 30 a5 00 27 34 b5 00 27 38 c5 00 27 3c d5 00 67 80 00 00
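
For anyone who wants to compare dumps like these without eyeballing them, here is a minimal sketch. It assumes the bytes are little-endian 32-bit instruction words (no compressed instructions), and `words` and `diff_words` are hypothetical helper names. Only the first 28 bytes of each dump are used here. A plain word-by-word comparison goes out of step as soon as one side contains an extra instruction, so a real comparison of the full dumps would need an alignment step.

```python
def words(hex_dump):
    """Split a space-separated hex dump into little-endian 32-bit words."""
    data = bytes.fromhex(hex_dump)
    return [int.from_bytes(data[i:i + 4], "little") for i in range(0, len(data), 4)]


def diff_words(assembled, reference):
    """Return (index, assembled_word, reference_word) for each differing position."""
    pairs = zip(words(assembled), words(reference))
    return [(i, got, want) for i, (got, want) in enumerate(pairs) if got != want]


# First seven words of each dump from the post above.
ASSEMBLED = ("07 35 05 00 87 35 85 00 07 36 05 01 87 36 85 01 "
             "93 05 10 00 d3 85 05 d2 43 05 b5 02")
REFERENCE = ("07 35 05 00 87 35 85 00 07 36 05 01 87 36 85 01 "
             "93 05 10 00 d3 85 05 d2 53 75 b5 02")

if __name__ == "__main__":
    for index, got, want in diff_words(ASSEMBLED, REFERENCE):
        # The low 7 bits of a 32-bit RISC-V instruction are the opcode field.
        print(f"word {index}: got {got:08x}, reference {want:08x}, "
              f"opcode {got & 0x7f:#04x} vs {want & 0x7f:#04x}")
    # prints: word 6: got 02b50543, reference 02b57553, opcode 0x43 vs 0x53
```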
 
The assembler is now fully working with my test file! It is slightly faster than the GCC assembler on smaller files and slightly slower on bigger ones, and that's without any extra optimization. There will be no waiting around for your DSP code to build, because both can process a 4k-line file (probably typical for EQ effects) in about 50ms on my slow LicheePi 4A, whose processor is similar to the one in the Duo S, which will be in the first model in the series.

But why write a whole new assembler? I did so because bundling the GCC one would add bloat and because it has no C++ API, so you need to use an ugly hack to use it. I don't like ugly hacks (and neither should you) and bloat is important to avoid on embedded systems.
 
Just curious, how does gas "add bloat"?
 
The size of the dynamically linked version of gas (on my system) is 320kB but the size of my DSP library, including everything else and the new assembler, is just 140kB. However, on the DSP device, we won't actually be using the dynamically linked version, but rather the static one to eliminate dependency hell and the slight overhead of calling dynamically linked functions. Since statically linked programs have to bundle every dependency, that could easily inflate gas to several megabytes. And since gas is a separate binary, that's several megabytes of libraries duplicated and accessed every time you want to assemble some code. Bundling the assembler into the same DSP library would completely eliminate this size/memory overhead and also allow us to further optimize it if needed.
 
If I read that correctly, you're concerned about the size of the assembler itself, and not the size of the code that it generates?
 
The assembler doesn't actually do anything other than convert human readable instructions into raw data that the processor can understand, and so it doesn't change the meaning of any individual instruction (with only a couple exceptions). Additionally, all uncompressed RISC-V instructions are 4 bytes long, while compressed ones are 2 bytes long. This basically means that given some input, all RISC-V assemblers should generate the same output. For example, the instruction addi t0, a0, 10 becomes 00000000101001010000001010010011 because the format for addi is
imm[11:0]   rs1   000   rd   0010011   ADDI
Check out this page for all the details: https://github.com/riscv/riscv-isa-manual/blob/main/src/rv-32-64g.adoc

Therefore, what the assembler does doesn't matter as long as it's correct, so we only need to think about the size of the assembler itself.
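
To make the addi example concrete, here is a minimal Python sketch of that one encoding step. It is only an illustration, not the actual assembler from this project; `encode_i_type`, `addi` and the tiny `ABI` table are hypothetical names that cover just the registers used in the example.

```python
# ABI register names -> register numbers (only the ones this example needs).
ABI = {"zero": 0, "t0": 5, "a0": 10}


def encode_i_type(imm, rs1, funct3, rd, opcode):
    """Pack the I-type fields imm[11:0] | rs1 | funct3 | rd | opcode into 32 bits."""
    if not -2048 <= imm <= 2047:
        raise ValueError("I-type immediate must fit in 12 signed bits")
    return ((imm & 0xFFF) << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | opcode


def addi(rd, rs1, imm):
    """Encode `addi rd, rs1, imm` (funct3 = 000, opcode = 0010011)."""
    return encode_i_type(imm, ABI[rs1], 0b000, ABI[rd], 0b0010011)


if __name__ == "__main__":
    word = addi("t0", "a0", 10)
    print(f"{word:032b}")                       # 00000000101001010000001010010011
    print(word.to_bytes(4, "little").hex(" "))  # 93 02 a5 00
```

The second line printed is the same instruction in the byte order used by the hex dumps earlier in the thread.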
 
I know what an assembler does. What I am confused about is that it sounds like you care how large the assembler itself is. How does that impact the product? Is it being used during use of the product to dynamically generate code?
 
It directly impacts the product's performance during compilation of effects. Why wouldn't we want the fastest possible user experience? In quantitative finance we worry about 1% performance penalties and this is far larger than that. The problem is, the GCC assembler is too big and bloated and requires an ugly hack to even use, so it's worth it to custom craft our own to act exactly how we want.
 
Thanks. That is what I was trying to understand. I did not realize there was code-gen at runtime.
 