## 2024 01 03

Today I'd like to check out the RP2040's PIO via the UART example, and see how fast we can send UART frames between two devices, as a preliminary speed test.

So - we're wired up, we should hit the scope and then fire up some test code: display (?) uart PIO, blinky, etc ... 

The first thing [I'm learning here](https://www.eevblog.com/forum/microcontrollers/8-uarts-using-asm_pio-pio-dma-micropython-on-the-rpi-pico/) and as shown in the [pio rx](https://github.com/raspberrypi/pico-examples/tree/master/pio/uart_rx) and [pio tx](https://github.com/raspberrypi/pico-examples/blob/master/pio/uart_tx/uart_tx.pio) examples is that each state machine can only do one half of a UART... TX or RX, not both as we are accustomed to with a UART peripheral. So, since we are interested in building our little router thing, we would max out at 6 total UARTs there: 2x the-og-peripheral, and 4x from PIO (2 PIO blocks x 4 state machines = 8 state machines, and each full UART eats two of them).

The second thing [I'm learning](https://www.instructables.com/Using-RP2040-PIO-in-Arduino-IDE-on-Windows/) is that working with PIO in C is not *that* simple; we write a PIO program `uart_tx.pio` and then use a pio-assembler to generate `uart_tx.pio.h`, which we can include in our sketch. There is an [online pioasm instance](https://wokwi.com/tools/pioasm). Doing this on Windows is a little bit of a pain, and it means that we will have two things to call before we can upload code, but it's not a major-major roadblock.

So, as for reasonable goals for today, I should basically just try to throw-and-catch a block, really simple-like, to test baseline perf. 

Well actually, fk it, I will use the online pioasm for the time being...

### UART PIO TX'ing 

And we're up with a test, I will find the BAUD limit next, and check if that is affected by changes to f_cpu... 

This should be simple: the example sets the state machine's clock divider as

```cpp
float div = (float)clock_get_hz(clk_sys) / (8 * baud);
```

So we should be able to do clk/8 : 16MBit/s, and indeed we can see things working up to 15MBit/sec. To get to 30, we can crank the CPU to 250MHz.
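To sanity check those limits off-target, here is a minimal sketch of the same divider arithmetic with the clock passed in as a plain number. The function names are made up for illustration, and the 125 MHz figure in the usage note is an assumed clock, not a measured one. The only extra fact it leans on is that the PIO clock divider cannot go below 1.0.

```cpp
#include <cassert>
#include <cstdint>

// Cycles per bit, taken from the `8 * baud` term in the divider line above.
constexpr uint32_t kCyclesPerBit = 8;

// Same arithmetic as the divider line above, but with the clock passed in
// so it can be evaluated on a desktop (hypothetical helper name).
float pio_uart_div(uint32_t clk_hz, uint32_t baud) {
    assert(baud > 0);
    return (float)clk_hz / (float)(kCyclesPerBit * baud);
}

// Highest baud before the divider would drop below 1.0, i.e. before the
// state machine would need to run faster than the system clock.
uint32_t pio_uart_max_baud(uint32_t clk_hz) {
    return clk_hz / kCyclesPerBit;
}
```

With an assumed 125 MHz clock, `pio_uart_max_baud(125000000)` gives 15625000, and `pio_uart_div(250000000, 30000000)` stays just above 1.0, which lines up with needing 250MHz for the 30 Mbit/s trace.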

| Mbit/s | Traces |
| --- | --- |
| 1   | ![img](images/2024-01-03_uart-tx-1mb.png)   |
| 2.5 | ![img](images/2024-01-03_uart-tx-2p5mb.png) |
| 10   | ![img](images/2024-01-03_uart-tx-10mb.png)   |
| 15  | ![img](images/2024-01-03_uart-tx-15mb.png)   |
| 25  | ![img](images/2024-01-03_uart-tx-25mb.png)   |
| 30  | ![img](images/2024-01-03_uart-tx-30mb.png)   |

But this is not terribly interesting: we want to see that we can catch words fast enough: the ISR on the RX side is normally where we meet our limits. 

### UART PIO RX'ing with PIO Example 

So, I should spin up an RX line now and see about firing an interrupt there... I'll get one chip TX'ing at a fixed rate and then start in on no. 2 

... doing this using [the blocking example](https://github.com/raspberrypi/pico-examples/blob/master/pio/uart_rx/uart_rx.c) catches *some* bytes, but not that many (at only 1 Mbaud). Perhaps it's only latching when we *just* catch the byte in time, i.e. if we poll just as the last bit has arrived, or perhaps we're only sometimes catching in time and the thing is then not receiving the next bytes, etc.

So, the interrupt version... works at a similar quality: some bytes are captured, many are not. It tends to happen in phases. In traces below, CH1 goes lo-then-hi whenever a new byte is loaded into the TX chip, CH2 is the UART trace (TX/RX), and CH4 flips state whenever the UART RX IRQ fires. 

![irq](images/2024-01-03_iffy-irq-rx-01.png)
![irq](images/2024-01-03_iffy-irq-rx-02.png)

So, this is all kind of bad news for our project, and I suspect I would have to get into some of the PIO depths to figure out what's going wrong, which I don't really have the time for at the moment. I can try slowing it down to see if this is a chunking error or something else... it's the same even at 115200 BAUD.

So - for troubleshooting, I am perhaps missing some pin config? But that looks to be handled in the example's setup. 

Not totally sure, but I'm going to move on to try out Earle's software serial PIO, which I have some prior experience with, but also found some bugs in (?) IIRC.

### UART PIO RX'ing with Earle's Software Serial PIO 

[the earle commit](https://github.com/earlephilhower/arduino-pico/pull/391)  
[the earle pio_uart.h](https://github.com/earlephilhower/arduino-pico/blob/326697bbe1cc3b4b5f7c140dca10a6924262539d/cores/rp2040/pio_uart.pio.h)  
[the earle pio softwareserial.h](https://github.com/earlephilhower/arduino-pico/blob/4c1c72c996b6a1243b0eafa956dd0eb6410e2362/cores/rp2040/SerialPIO.h)  
[the earle pio softwareserial.cpp](https://github.com/earlephilhower/arduino-pico/blob/4c1c72c996b6a1243b0eafa956dd0eb6410e2362/cores/rp2040/SerialPIO.cpp)  

So - let's try this out. 

OK, I have this up and running: I am streaming at a fixed BAUD, then counting the ratio of bytes we miss in a stream. The uc's are counting missed and proper bytes (reading monotonic sequence numbers) and then checking also intervals between transmissions... 
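For reference, the miss counting can be sketched roughly like this. It is not the code that ran on the chips: `SeqTally` is a hypothetical name, and it assumes each received byte is itself an 8-bit sequence number that the sender increments by one and lets wrap at 256.

```cpp
#include <cstdint>

// Hypothetical receive-side tally; every received byte is assumed to be an
// 8-bit sequence number incremented by one per transmit (wrapping at 256).
struct SeqTally {
    uint32_t received = 0;  // bytes that actually arrived
    uint32_t missed = 0;    // bytes inferred lost from gaps in the sequence
    uint8_t last = 0;
    bool primed = false;    // false until the first byte sets the reference

    void push(uint8_t seq) {
        if (primed) {
            // Modulo-256 arithmetic makes 255 -> 0 count as a gap of zero.
            missed += (uint8_t)(seq - last - 1);
        }
        last = seq;
        primed = true;
        ++received;
    }

    // Fraction of sent bytes that never arrived (errs / total-bytes).
    double miss_ratio() const {
        uint32_t total = received + missed;
        return total ? (double)missed / total : 0.0;
    }
};
```

Two caveats with an 8-bit counter: a run of 256 or more lost bytes is undercounted, and a repeated byte would be read as 255 misses, so this only holds up while losses come in short bursts.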

| Mbit/s | Misses (errs / total-bytes) | Expected Byte Time (us) | Avg Byte Time (us) |
| --- | --- | --- | --- |
| 0.1 | nil | 100 | 110.2 |
| 0.5 | nil | 20 | 22.2 | 
| 1.0 | 0.025 (1/40) (!) | 10 | 13.4 |
| 2.5 | 0.497 | 4 | 11.0 |
| 5.0 | 0.755 | 2 | 11.0 |

So, I'm not convinced that I haven't fuxd anything here... it seems wild that we would have such a bottleneck to performance, and there are a few red flags here; namely that we have a lower bound of 11us between transmits, which would also explain our increasing error rate when we surpass 1 Mbit/sec (as the byte-wise period there is 10us or so). 
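The expected byte times in the table follow from 10 bits on the wire per byte, assuming 8N1 framing (1 start + 8 data + 1 stop), which matches the 100us at 0.1 Mbit/s row. A small sketch of that arithmetic, with made-up helper names, also turns the 11us floor into an equivalent line rate:

```cpp
#include <cassert>

// Bits on the wire per byte, assuming 8N1 framing: 1 start + 8 data + 1 stop.
constexpr double kBitsPerFrame = 10.0;

// Time for one back-to-back frame, in microseconds.
double byte_time_us(double baud) {
    assert(baud > 0.0);
    return kBitsPerFrame * 1e6 / baud;
}

// Baud at which back-to-back frames are exactly `floor_us` apart.
double baud_for_byte_time(double floor_us) {
    assert(floor_us > 0.0);
    return kBitsPerFrame * 1e6 / floor_us;
}
```

`baud_for_byte_time(11.0)` comes out near 0.91 Mbit/s, so if the 11us floor is real, anything above roughly 0.9 Mbit/s is already past what the byte-wise handling can sustain back-to-back.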

---

### Not in Flash

I'm also noticing [this pattern](https://github.com/earlephilhower/arduino-pico/blob/4c1c72c996b6a1243b0eafa956dd0eb6410e2362/cores/rp2040/SerialPIO.cpp#L90) wrapped around some handlers:

```cpp
// the macro wraps the function *name*; `handler` here is just a placeholder
void __not_in_flash_func(handler)(){}
```

This puts the function in RAM rather than flash, so it isn't waiting on flash fetches when it runs. I wonder if this is a missing step on lots of these codes... This is discussed [in this forum post](https://forums.raspberrypi.com/viewtopic.php?t=311811) and also shows up in [the sdk here](https://github.com/raspberrypi/pico-sdk/blob/master/src/rp2_common/hardware_spi/spi.c#L84) - for fast shit. Maybe important...

Also points to a larger red flag for me about the system... maybe this is not actually the microcontroller for hard realtime stuff...