New low-cost DSP platform in development

somebodyelse · Dec 11, 2024

bachatero said:
Check this one out: https://docs.banana-pi.org/en/BPI-F3/BananaPi_BPI-F3 It's a lot more expensive though.

I'm not seeing I2S on that one.

somebodyelse · Dec 11, 2024

phofman said:
It's a very logical expectation. Yet AFAIK no such device, even just a simple USB DSP filter with configurable input samplerates and channel combinations, is on the market. Far from passing proprietary protocols further downstream, I can imagine the DSP getting in the way of the UAC2 protocol/function and just passing all the other functions (HID, mass storage, etc.).

Sometimes the screen and controls may be wrapped up in UAC2 quirks rather than nicely separated out into HID or whatever. That's how the Forte handles its display, rotary encoder and touch buttons. That makes @TunaBug's desired operation even trickier.

bachatero · Dec 11, 2024

somebodyelse said:
I'm not seeing I2S on that one.

Where's Waldo?

somebodyelse · Dec 12, 2024

bachatero said:
Where's Waldo?
View attachment 413225

You tell me. It's not clear that it makes it out to the header:

PIN  Function          Function          PIN
1    VCC3V3_SYS        VCC5V0_OUT        2
3    AP_I2C4_SDA_3V3   VCC5V0_OUT        4
5    AP_I2C4_SCL_3V3   GND               6
7    GPIO_70_3V3       R_UART0_TXD_3V3   8
9    GND               R_UART0_RXD_3V3   10
11   GPIO_71_3V3       GPIO_74_3V3       12
13   GPIO_72_3V3       GND               14
15   GPIO_73_3V3       GPIO_91_3V3       16
17   VCC3V3_SYS        GPIO_92_3V3       18
19   SPI3_MOSI_3V3     GND               20
21   SPI3_MISO_3V3     GPIO_49_3V3       22
23   SPI3_SCLK_3V3     SPI3_CS_3V3       24
25   GND               GPIO_50_3V3       2

bachatero · Dec 12, 2024

somebodyelse said:
You tell me. It's not clear that it makes it out to the header:

Here's the complete pinout of the SoC:

SpacemiT Developer Community (developer.spacemit.com)

Looks like Mr Banana here actually has very few of those pins available, so our next best bet is to custom craft a board with everything we need. Feeling up for the challenge?

somebodyelse · Dec 12, 2024

bachatero said:
Feeling up for the challenge?

Hell no! I'd go for one of the cheap ARM-based boards that have USB host and OTG, plus I2S/TDM on the headers.

bachatero · Dec 12, 2024

Unfortunately, this project requires RISC-V, so ARM boards are a nonstarter.

phofman · Dec 12, 2024

somebodyelse said:
That's how the Forte handles its display, rotary encoder and touch buttons.

Thanks. Is there any more detailed info on this? I could not find any Focusrite Forte quirk in the mainline ALSA drivers, and this project https://github.com/alastair-dm/forte-mixer handles only the mixer controls related to audio.

phofman · Dec 12, 2024

bachatero said:
Looks like Mr Banana here actually has very few of those pins available, so our next best bet is to custom craft a board with everything we need. Feeling up for the challenge?

All SBCs pass only a small subset of the extensive SoC features onto their pin headers. For more features, or for actual embedding into a product, core boards/compute modules are used, typically with edge connectors/castellated holes/LGA pads/high-density connectors. Only the large-volume integrators solder the actual SoCs, DDRs, etc. onto their product boards. Very likely no core board is available for your chosen SoC; the market is still tiny and young.

bachatero said:
Unfortunately, this project requires RISC-V, so ARM boards are a nonstarter.

Yes, if you want to code the DSP in assembler. Honestly, good luck with that. For any higher-level language the architecture is not that important. E.g. CamillaDSP leaves the optimization of vector operations for the particular architecture to the Rust compiler, or uses small sections designed for the architecture, e.g. https://github.com/HEnquist/rubato/...nterpolator/sinc_interpolator_neon.rs#L60-L89 .

IMO that's the way to go when writing high-performance code. My 2 cents: 99.9% of the code for this extremely ambitious project would be identical for any CPU architecture. But that's just my opinion.

phofman · Dec 12, 2024

Another issue with the USB bridge compatible with the original custom-designed vendor drivers would be the USB endpoint numbers which the bridge would have to duplicate from the bridged device. IMO this is not a viable project. Perhaps a pushbutton to physically bridge the USB lines within the DSP device to allow convenient quick access to the bridged device from the host would do.

bachatero · Dec 15, 2024

Progress update: The custom assembler is now generating RISC-V machine code and almost on par with the GCC reference!

Code:

Ok, here's the assembled code:
07 35 05 00 87 35 85 00 07 36 05 01 87 36 85 01 93 05 10 00 d3 85 05 d2 43 05 b5 02 93 02 a0 00 13 03 a0 00 b7 03 00 40 93 83 03 00 53 80 03 c0 53 00 00 42 43 05 05 02 37 0e 80 3f 13 0e 0e 00 d3 00 0e c0 d3 80 00 42 53 05 15 0a 6f 80 00 01 63 86 62 00 6f c0 82 05 43 05 b5 02 b7 0e 20 41 93 8e 0e 00 53 81 0e c0 53 01 01 42 53 05 25 12 63 96 62 00 6f c0 80 01 43 05 b5 02 13 0f a0 00 63 86 e2 01 6f c0 80 01 43 05 b5 02 93 0f a0 00 63 86 5f 00 6f c0 80 01 43 05 b5 02 13 06 a0 00 63 86 c2 00 6f c0 80 01 43 05 b5 02 63 06 00 00 6f c0 80 01 43 05 b5 02 d3 26 a5 a2 63 96 06 00 6f c0 80 01 43 05 b5 02 37 07 70 41 13 07 07 00 d3 01 07 c0 d3 81 01 42 63 16 07 00 6f c0 80 01 43 05 b5 02 b7 07 70 41 93 87 07 00 53 82 07 c0 53 02 02 42 63 96 07 00 6f c0 80 01 43 05 b5 02 13 08 10 00 27 30 a5 00 27 b4 a5 00 27 38 a6 00 27 bc a6 00 67 80 00 00
temp.s: Assembler messages:
temp.s: Warning: end of file not at end of a line; newline inserted
Allocating 31 bytes, 48998 used
Allocating 61 bytes, 49029 used
Allocating 121 bytes, 49090 used
Allocating 241 bytes, 49211 used
Allocating 481 bytes, 49452 used
Ok, here's the reference code:
07 35 05 00 87 35 85 00 07 36 05 01 87 36 85 01 93 05 10 00 d3 85 05 d2 53 75 b5 02 93 02 a0 00 13 03 a0 00 b7 03 00 40 53 80 03 f0 53 00 00 42 53 75 05 02 37 0e 80 3f d3 00 0e f0 d3 80 00 42 53 75 15 0a 6f 00 40 00 63 84 62 00 6f 00 40 02 53 75 b5 02 b7 0e 20 41 53 81 0e f0 53 01 01 42 53 75 25 12 63 94 62 00 6f 00 80 00 53 75 b5 02 13 0f a0 00 63 84 e2 01 6f 00 80 00 53 75 b5 02 93 0f a0 00 63 84 5f 00 6f 00 80 00 53 75 b5 02 13 06 a0 00 63 84 c2 00 6f 00 80 00 53 75 b5 02 63 04 00 00 6f 00 80 00 53 75 b5 02 d3 26 a5 a2 63 94 06 00 6f 00 80 00 53 75 b5 02 37 07 70 41 d3 01 07 f0 d3 81 01 42 53 17 35 a2 63 14 07 00 6f 00 80 00 53 75 b5 02 b7 07 70 41 53 82 07 f0 53 02 02 42 d3 17 45 a2 63 94 07 00 6f 00 80 00 53 75 b5 02 13 08 10 00 27 30 a5 00 27 34 b5 00 27 38 c5 00 27 3c d5 00 67 80 00 00
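To see where two dumps like these diverge, it can help to group the bytes into 32-bit instruction words and compare them position by position. The following is only a minimal sketch, not part of the project: the helper names `words_from_hex_dump` and `diff_words` are made up for illustration, it assumes the dump contains no 2-byte compressed instructions, and the sample data is just the first 28 bytes of each dump above.

```python
def words_from_hex_dump(dump: str) -> list[int]:
    """Group a space-separated byte dump into little-endian 32-bit words.

    Assumption: every instruction is 4 bytes (no compressed instructions).
    """
    data = bytes.fromhex(dump)  # whitespace between byte pairs is ignored
    if len(data) % 4 != 0:
        raise ValueError("dump length is not a multiple of 4 bytes")
    return [int.from_bytes(data[i:i + 4], "little") for i in range(0, len(data), 4)]


def diff_words(ours: list[int], reference: list[int]) -> list[tuple[int, int, int]]:
    """Return (index, our_word, reference_word) for every differing position."""
    return [(i, a, b) for i, (a, b) in enumerate(zip(ours, reference)) if a != b]


# First 28 bytes of the "assembled" and "reference" dumps posted above.
ASSEMBLED = ("07 35 05 00 87 35 85 00 07 36 05 01 87 36 85 01 "
             "93 05 10 00 d3 85 05 d2 43 05 b5 02")
REFERENCE = ("07 35 05 00 87 35 85 00 07 36 05 01 87 36 85 01 "
             "93 05 10 00 d3 85 05 d2 53 75 b5 02")

if __name__ == "__main__":
    for index, ours, ref in diff_words(words_from_hex_dump(ASSEMBLED),
                                       words_from_hex_dump(REFERENCE)):
        print(f"word {index}: ours={ours:#010x} reference={ref:#010x}")
```

A position-by-position comparison only stays meaningful while both outputs contain the same number of instructions. In the full dumps above, the assembled output has an extra word (93 83 03 00) right after b7 03 00 40, so later words would no longer line up without realigning first.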

bachatero · Dec 16, 2024

The assembler is now fully working with my test file! It is also slightly faster than the GCC assembler for smaller files and slightly slower for bigger ones, and that's without any extra optimization. There will be no waiting around for your DSP code to build, because both can process a 4k-line file (probably typical for EQ effects) in about 50 ms on my slow LicheePi 4A, whose processor is similar to the one in the Duo S, which will be in the first model in the series.

But why write a whole new assembler? I did so because bundling the GCC one would add bloat and because it has no C++ API, so you need to use an ugly hack to use it. I don't like ugly hacks (and neither should you) and bloat is important to avoid on embedded systems.

epicure · Dec 16, 2024

Very cool project!
Would like to have your DSP with AES/EBU in and out.

TunaBug · Dec 16, 2024

bachatero said:
The assembler is now fully working with my test file! It is also slightly faster than the GCC assembler for smaller files and slightly slower for bigger ones, and that's without any extra optimization. There will be no waiting around for your DSP code to build, because both can process a 4k-line file (probably typical for EQ effects) in about 50 ms on my slow LicheePi 4A, whose processor is similar to the one in the Duo S, which will be in the first model in the series.

But why write a whole new assembler? I did so because bundling the GCC one would add bloat and because it has no C++ API, so you need to use an ugly hack to use it. I don't like ugly hacks (and neither should you) and bloat is important to avoid on embedded systems.

Just curious, how does gas "add bloat"?

bachatero · Dec 16, 2024

TunaBug said:
Just curious, how does gas "add bloat"?

The size of the dynamically linked version of gas (on my system) is 320kB but the size of my DSP library, including everything else and the new assembler, is just 140kB. However, on the DSP device, we won't actually be using the dynamically linked version, but rather the static one to eliminate dependency hell and the slight overhead of calling dynamically linked functions. Since statically linked programs have to bundle every dependency, that could easily inflate gas to several megabytes. And since gas is a separate binary, that's several megabytes of libraries duplicated and accessed every time you want to assemble some code. Bundling the assembler into the same DSP library would completely eliminate this size/memory overhead and also allow us to further optimize it if needed.

TunaBug · Dec 17, 2024

bachatero said:
The size of the dynamically linked version of gas (on my system) is 320kB but the size of my DSP library, including everything else and the new assembler, is just 140kB. However, on the DSP device, we won't actually be using the dynamically linked version, but rather the static one to eliminate dependency hell and the slight overhead of calling dynamically linked functions. Since statically linked programs have to bundle every dependency, that could easily inflate gas to several megabytes. And since gas is a separate binary, that's several megabytes of libraries duplicated and accessed every time you want to assemble some code. Bundling the assembler into the same DSP library would completely eliminate this size/memory overhead and also allow us to further optimize it if needed.

If I read that correctly, you're concerned about the size of the assembler itself, and not the size of the code that it generates?

bachatero · Dec 17, 2024

TunaBug said:
If I read that correctly, you're concerned about the size of the assembler itself, and not the size of the code that it generates?

The assembler doesn't actually do anything other than convert human readable instructions into raw data that the processor can understand, and so it doesn't change the meaning of any individual instruction (with only a couple exceptions). Additionally, all uncompressed RISC-V instructions are 4 bytes long, while compressed ones are 2 bytes long. This basically means that given some input, all RISC-V assemblers should generate the same output. For example, the instruction addi t0, a0, 10 becomes 00000000101001010000001010010011 because the format for addi is

imm[11:0] rs1 000 rd 0010011 ADDI

Check out this page for all the details: https://github.com/riscv/riscv-isa-manual/blob/main/src/rv-32-64g.adoc

Therefore, what the assembler does doesn't matter as long as it's correct, so we only need to think about the size of the assembler itself.

TunaBug · Dec 18, 2024

bachatero said:
The assembler doesn't actually do anything other than convert human readable instructions into raw data that the processor can understand, and so it doesn't change the meaning of any individual instruction (with only a couple exceptions). Additionally, all uncompressed RISC-V instructions are 4 bytes long, while compressed ones are 2 bytes long. This basically means that given some input, all RISC-V assemblers should generate the same output. For example, the instruction addi t0, a0, 10 becomes 00000000101001010000001010010011 because the format for addi is

imm[11:0] rs1 000 rd 0010011 ADDI

Check out this page for all the details: https://github.com/riscv/riscv-isa-manual/blob/main/src/rv-32-64g.adoc

Therefore, what the assembler does doesn't matter as long as it's correct, so we only need to think about the size of the assembler itself.

I know what an assembler does. What I am confused about is that it sounds like you care how large the assembler itself is. How does that impact the product? Is it being used during use of the product to dynamically generate code?

bachatero · Dec 18, 2024

TunaBug said:
I know what an assembler does. What I am confused about is that it sounds like you care how large the assembler itself is. How does that impact the product? Is it being used during use of the product to dynamically generate code?

It directly impacts the product's performance during compilation of effects. Why wouldn't we want the fastest possible user experience? In quantitative finance we worry about 1% performance penalties and this is far larger than that. The problem is, the GCC assembler is too big and bloated and requires an ugly hack to even use, so it's worth it to custom craft our own to act exactly how we want.

TunaBug · Dec 18, 2024

bachatero said:
It directly impacts the product's performance during compilation of effects. Why wouldn't we want the fastest possible user experience? In quantitative finance we worry about 1% performance penalties and this is far larger than that. The problem is, the GCC assembler is too big and bloated and requires an ugly hack to even use, so it's worth it to custom craft our own to act exactly how we want.

Thanks. That is what I was trying to understand. I did not realize there was code-gen at runtime.
