The previous essay said ssh pinephone works. This one is what it took to get there. The DWC3 device-mode driver at src/sys/dev/usb/controller/dwc3/dwc3_gadget.c is 2,707 lines I wrote against the Synopsys DesignWare DWC3 inside the RK3399S, with the host-side bus traced on a Linux laptop running usbmon, because otherwise the device under test was the only thing that could print debug output. There is no upstream FreeBSD device-mode DWC3. Starting points were Linux 5.2, Barebox, and the DWC3 databook. None of them agree about EP0.
Two arcs took ten days. EP0 SETUP, where the hardware reported 8-byte completions with a buffer full of zeros. And bulk TX, where CDC Ethernet would push two packets and deadlock — except sometimes it didn’t, and there’s the rub.
[WAR STORY]
The EP0 SETUP TRB infinite loop
▸ symptom
Plug USB-C into the host. Linux journal logs “device descriptor read/64, error -110” three times. Phone-side serial shows the DWC3 firing XferComplete events on EP0 OUT immediately after each SETUP TRB is queued, but ep0_bounce[0..7] is 00 00 00 00 00 00 00 00. The host gives up, the device retries, the host gives up. Forever. No data ever flows.
▸ hypothesis 1
DMA cache coherency. EP0 bounce buffer is BUS_DMA_COHERENT, which on arm64 with a non-coherent SoC like RK3399 doesn’t actually mean coherent. The DWC3 might be writing data we read as a stale cache line. Switched to BUS_DMA_NOCACHE. 50b4566 dwc3_gadget: move SETUP TRB to connect-done, use BUS_DMA_NOCACHE
Same result. Bounce still zeros. Not coherency — the DWC3 isn’t writing anything to that address.
▸ hypothesis 2
The DWC3 is auto-completing the SETUP TRB during bus reset, before the host has sent a SETUP. Linux waits for connect-done before the first SETUP queue; we were queuing during reset. Moved SETUP to connect-done. 50b4566 dwc3_gadget: move SETUP TRB to connect-done, use BUS_DMA_NOCACHE
Fewer spurious completions during reset, but steady-state is identical. Even after enumeration starts properly we get XferComplete with zero data. Added a “spurious SETUP, re-queuing” path; the phone logged that line forever. 0d538c3 dwc3_gadget: add spurious SETUP detection and re-queue — note the commit message: “This is not a timing issue (re-queuing loops infinitely) but a hardware state issue.”
▸ hypothesis 3
EP0 state inherited from U-Boot. Maybe rk2aw left the controller half-initialized. Did DCTL.CSFTRST at attach plus a full CRU reset of the PHY. 9a99f0d dwc3: use DCTL.CSFTRST for device-mode reset, add CRU reset dance
No change. Reset was clean. Something we were programming was wrong.
▸ breakthrough
Two register-define bugs in dwc3.h. DWC3_DEPCFG_EP_NUMBER(x) was ((x) << 1) — bit 1. The databook puts EP number at bits [29:25]. 60f1183 dwc3: fix DEPCFG EP_NUMBER (bits [29:25] not [1:0]) and TRBCTL_SETUP (6 not 2)
The actual root cause of the zero-data SETUP: per databook table 6-3, the TRBCTL encoding is 1=Normal, 2=Control-Setup, 3=Status-2, 4=Status-3, 5=Control-Data, 6=Isochronous-First, 7=Isochronous, 8=Link. We were using 6 for SETUP — which is Isochronous-First. The DWC3 saw TRBCTL=6, decided this was the first TRB of an iso transfer, and auto-completed without data because that’s what iso-first does. 92b3f13 dwc3_gadget: fix TRBCTL values — CONTROL_SETUP is 2 not 6
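For reference, here is roughly what the corrected definitions look like as a standalone sketch. The TRBCTL values and the EP number position come from the text above. The TRB control-word layout (HWO at bit 0, TRBCTL at bits [9:4]) and the helper names dwc3_trb_type and dwc3_depcfg_ep_number are assumptions for illustration, so check them against the databook before copying anything.

```c
#include <assert.h>
#include <stdint.h>

/* DEPCFG: endpoint number is a 5-bit field at bits [29:25]. */
#define DWC3_DEPCFG_EP_NUMBER(x)  (((uint32_t)(x) & 0x1fu) << 25)

/* TRB control word. Assumed layout: HWO at bit 0, TRBCTL at bits [9:4]. */
#define DWC3_TRB_CTRL_HWO         (1u << 0)
#define DWC3_TRB_CTRL_TRBCTL(n)   (((uint32_t)(n) & 0x3fu) << 4)

/* TRBCTL encodings as listed in the text above. */
enum dwc3_trbctl {
    DWC3_TRBCTL_NORMAL            = 1,
    DWC3_TRBCTL_CONTROL_SETUP     = 2,
    DWC3_TRBCTL_CONTROL_STATUS2   = 3,
    DWC3_TRBCTL_CONTROL_STATUS3   = 4,
    DWC3_TRBCTL_CONTROL_DATA      = 5,
    DWC3_TRBCTL_ISOCHRONOUS_FIRST = 6,
    DWC3_TRBCTL_ISOCHRONOUS       = 7,
    DWC3_TRBCTL_LINK_TRB          = 8
};

/* Pin the two values that bit us, so a bad edit fails the build. */
static_assert(DWC3_TRBCTL_CONTROL_SETUP == 2, "SETUP is 2, not 6");
static_assert(DWC3_DEPCFG_EP_NUMBER(1) == 0x02000000u,
    "EP number starts at bit 25");

/* Hypothetical helpers: pull the fields back out of a register word. */
static inline unsigned dwc3_trb_type(uint32_t ctrl)
{
    return (ctrl >> 4) & 0x3fu;
}

static inline unsigned dwc3_depcfg_ep_number(uint32_t param)
{
    return (param >> 25) & 0x1fu;
}
```

The static assertions only help if they are written from the databook table rather than copied from the define next to them. The extract helpers are there so a debug print can show the decoded TRB type instead of a raw control word.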
▸ fix
Patch one register definition file. The next boot, the very first SETUP TRB completed with real bytes:
SETUP: 80 06 00 01 00 00 40 00
80 06 = bmRequestType=0x80, bRequest=GET_DESCRIPTOR. 00 01 = descriptor type Device, index 0. 00 00 = lang. 40 00 = wLength 64. The host was asking the right question; we’d been answering the wrong one for two weeks because of one bit-position constant.
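Decoding those eight bytes by hand gets old quickly. A small sketch of the decode, plus the all-zeros check that would have flagged the bad completions. The struct and function names (setup_pkt, setup_decode, setup_is_zero) are hypothetical and not from the driver; the field layout is the standard little-endian USB SETUP packet.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Standard 8-byte USB SETUP packet, multi-byte fields little-endian. */
struct setup_pkt {
    uint8_t  bmRequestType;
    uint8_t  bRequest;
    uint16_t wValue;
    uint16_t wIndex;
    uint16_t wLength;
};

/* Read a little-endian 16-bit value without alignment assumptions. */
static uint16_t le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Decode the raw bounce-buffer bytes into fields. */
static struct setup_pkt setup_decode(const uint8_t raw[8])
{
    struct setup_pkt s;

    assert(raw != NULL);
    s.bmRequestType = raw[0];
    s.bRequest      = raw[1];
    s.wValue        = le16(&raw[2]);
    s.wIndex        = le16(&raw[4]);
    s.wLength       = le16(&raw[6]);
    return s;
}

/* Returns 1 if the buffer is all zeros, i.e. nothing was written to it. */
static int setup_is_zero(const uint8_t raw[8])
{
    size_t i;

    assert(raw != NULL);
    for (i = 0; i < 8; i++)
        if (raw[i] != 0)
            return 0;
    return 1;
}
```

For the packet above, wValue decodes to 0x0100: the high byte is the descriptor type (1, Device) and the low byte is the index (0).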
▸ lesson
Register-define bugs are silent and indistinguishable from working hardware. The TRB completion event fires either way; the bus signal differs only in what the host sees. The “reset and re-queue” instinct is wrong: the hardware is doing exactly what you’re telling it to. Find these by tracing the bus from a different machine and checking every bit position against a known-working register-write trace.
The other half of EP0 took its own path. The original integration tried FreeBSD’s usb_template framework, which defers usbd_transfer_done callbacks to the USB process kthread. By the time ctrl_start ran and started the EP0 IN transfer, the host had already retried the bus. DWC3 EP0 needs the response inside the same ISR — the host gives up at 50 ms; on a contended kernel we were missing the deadline by one or two. Reverted the kthread path and handle EP0 inline. 2ed4468 Revert usb_template integration — deferred callbacks too slow for EP0
That plus the TRBCTL fix, plus smaller cleanups (5db8e09, d2cf055, 52f48b9), got EP0 enumerating cleanly. Then bulk data started, and broke in its own way.
[WAR STORY]
The lost TX completion
▸ symptom
CDC ECM enumeration completes. Host sees enxaabbccddeef0, assigns 10.0.0.1, sends a ping. Phone receives it (RX completion fires, ICMP packet enters the network stack). Phone tries to reply. The reply mbuf gets into dwc3_gadget_if_start, gets DMA-prepared, STARTTRANSFER is issued — and then nothing. No XferComplete on EP1 IN. Subsequent TX attempts queue and stall. After about a minute the host’s ARP entry expires. Sometimes the second packet works. Sometimes 100 packets work and the 101st loses a completion. Reproducible only by sustained traffic.
▸ hypothesis 1
DMA coherency on TX. Allocated BUS_DMA_COHERENT, sync PREWRITE before STARTTRANSFER — maybe the TRB.HWO clear isn’t propagating to cache. Switched non-cached, added explicit invalidate. No change.
▸ hypothesis 2
Single TX buffer reuse. Maybe the previous DMA hadn’t finished. Audited: tx_busy set at submit, cleared on completion, counters checked. tx_busy was clear, transfer issued, no completion ever arrived. Not a reuse race.
▸ hypothesis 3
TRB ring index drift. Per-endpoint ring with trb_enqueue/trb_dequeue. prepare_trb advances enqueue; start_transfer reads from dequeue, which never moved on TX. The second TX wrote a fresh TRB at slot 1; STARTTRANSFER pointed at stale TRB[0]. Hardware processed the stale TRB (HWO=0 already), saw nothing, silently completed without firing IOC. 9255d7a dwc3_gadget: fix TRB ring index mismatch — always use TRB[0]
▸ breakthrough
Right diagnosis. The smoking gun was logging in 9f13c7b dwc3_gadget: add debug prints for bulk RX/TX completion that printed TRB.bpl/bph/size/ctrl after each submit — bpl values stuck at the first packet’s DMA address through three submits. Single-buffer mode means there’s never more than one outstanding transfer per endpoint; the ring is overkill. So instead of fixing the dequeue advance, always use TRB[0]:
/*
 * Single-buffer mode: always use TRB[0]. We only have one
 * outstanding transfer per endpoint at a time, so there's no
 * need to advance through the ring. Reset both indices to 0
 * to keep prepare_trb and start_transfer in sync.
 */
ep->trb_enqueue = 0;
ep->trb_dequeue = 0;
trb = &ep->trb_ring[0];
▸ fix
That’s the version that’s been stable since April 3. It pins enqueue/dequeue to 0 in dwc3_gadget_ep_prepare_trb and the start_transfer reads from the same slot. RX got a parallel fix — the previous code was advancing trb_dequeue on every RX completion and re-queueing into TRB[1], TRB[2], etc., which mostly worked but would occasionally lose a completion when the indices wrapped. 9255d7a dwc3_gadget: fix TRB ring index mismatch — always use TRB[0]
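The drift is easier to see in a toy model than in the driver. This is a userland sketch, not the driver code: every toy_ name is hypothetical, and the "hardware" is reduced to one rule taken from the description above (a TRB with HWO already clear produces no completion).

```c
#include <assert.h>
#include <stdint.h>

#define RING_SLOTS 4

struct toy_trb {
    uint32_t bpl;   /* low 32 bits of the buffer address */
    int      hwo;   /* 1 = owned by hardware, 0 = already consumed */
};

struct toy_ep {
    struct toy_trb ring[RING_SLOTS];
    unsigned enqueue;
    unsigned dequeue;
};

/*
 * Model of STARTTRANSFER pointed at ring[slot].
 * Returns 1 if a completion would fire, 0 if the TRB was stale.
 */
static int toy_hw_start(struct toy_ep *ep, unsigned slot)
{
    assert(slot < RING_SLOTS);
    if (!ep->ring[slot].hwo)
        return 0;            /* stale TRB: nothing to do, no IOC */
    ep->ring[slot].hwo = 0;  /* hardware consumes the TRB */
    return 1;
}

/* Buggy path: prepare advances enqueue, start reads a dequeue that never moves. */
static int toy_submit_buggy(struct toy_ep *ep, uint32_t dma_addr)
{
    struct toy_trb *t = &ep->ring[ep->enqueue];

    t->bpl = dma_addr;
    t->hwo = 1;
    ep->enqueue = (ep->enqueue + 1) % RING_SLOTS;
    return toy_hw_start(ep, ep->dequeue);
}

/* Fixed path: both indices pinned to 0, one transfer outstanding at a time. */
static int toy_submit_pinned(struct toy_ep *ep, uint32_t dma_addr)
{
    ep->enqueue = 0;
    ep->dequeue = 0;
    ep->ring[0].bpl = dma_addr;
    ep->ring[0].hwo = 1;
    return toy_hw_start(ep, 0);
}
```

In this model the first buggy submit completes and every later one does not, while ring[0].bpl stays at the first packet's address, which is the same pattern the debug prints showed. The pinned version completes every time.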
▸ lesson
Two-pointer ring index code is famously easy to get subtly wrong. If your driver only ever has one transfer outstanding per endpoint, do not implement a ring. The Linux DWC3 driver implements a real ring because Linux supports streaming bulk endpoints with multiple in-flight TRBs; we don’t, we won’t, and the half-implemented version was strictly worse than no ring at all. The CDC ECM TX ring we do have (8-slot, d4dfbbd dwc3_gadget: allocate 8 TX ring DMA buffers in attach + 569ebb7 dwc3_gadget: rewrite if_start with TX ring — queue up to 8 packets) sits at a different layer: it queues mbufs in software so we can submit the next one as soon as the previous TX completes. That’s not a hardware ring; that’s a software queue feeding a single-slot hardware path.
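A sketch of that layering, assuming nothing about the real if_start beyond what the paragraph says: up to 8 packets wait in software, and exactly one is in flight on the hardware path at a time. All names here (txq, txq_enqueue, txq_complete, txq_hw_submit) are hypothetical, and txq_hw_submit is a stand-in for programming TRB[0] and issuing STARTTRANSFER.

```c
#include <assert.h>
#include <stddef.h>

#define TXQ_SLOTS 8

struct txq {
    const void *pkt[TXQ_SLOTS];
    size_t      len[TXQ_SLOTS];
    unsigned    head;       /* slot currently in flight, or next to submit */
    unsigned    count;      /* packets queued, including the one in flight */
    int         hw_busy;    /* 1 while the single hardware slot is in use */
    unsigned    submitted;  /* number of hardware submissions, for inspection */
};

/* Stand-in for the real submit: marks the hardware slot busy. */
static void txq_hw_submit(struct txq *q)
{
    q->hw_busy = 1;
    q->submitted++;
}

/* Start the packet at head if the hardware is idle and something is queued. */
static void txq_kick(struct txq *q)
{
    if (!q->hw_busy && q->count > 0)
        txq_hw_submit(q);
}

/* Queue a packet. Returns 0 on success, -1 if all 8 slots are taken. */
static int txq_enqueue(struct txq *q, const void *pkt, size_t len)
{
    unsigned tail;

    if (q->count == TXQ_SLOTS)
        return -1;
    tail = (q->head + q->count) % TXQ_SLOTS;
    q->pkt[tail] = pkt;
    q->len[tail] = len;
    q->count++;
    txq_kick(q);
    return 0;
}

/* Called from the TX completion path: retire head, submit the next packet. */
static void txq_complete(struct txq *q)
{
    assert(q->hw_busy && q->count > 0);
    q->hw_busy = 0;
    q->head = (q->head + 1) % TXQ_SLOTS;
    q->count--;
    txq_kick(q);
}
```

The queue keeps a head index and a count rather than two free-running pointers, so there is only one index that can drift.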
An honest gap remains. Linux’s DWC3 driver handles batched completions with a mask-and-drain pattern: at IRQ entry, set GEVNTSIZ.intmask=1; drain the entire event ring; clear the mask at exit. A completion can land between your last drain and your IRQ-return, and without the mask you’ll never see it. We don’t do that because we don’t multi-buffer. If we ever push CDC throughput hard enough to need multiple in-flight TRBs, we will need that pattern too.
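A minimal model of the mask-and-drain shape described above, with the registers replaced by plain struct fields so it runs in userland. DWC3_GEVNTSIZ_INTMASK at bit 31 follows the Linux driver's definition. Everything prefixed toy_ is hypothetical, and the rule in toy_irq_asserted (the line is up whenever the count is non-zero and the mask is clear) is a modeling assumption, not a databook quote.

```c
#include <assert.h>
#include <stdint.h>

#define DWC3_GEVNTSIZ_INTMASK (1u << 31)
#define EVT_SLOTS 64

struct toy_evq {
    uint32_t gevntsiz;        /* model of GEVNTSIZ, only INTMASK is used */
    uint32_t gevntcount;      /* model of GEVNTCOUNT: pending bytes */
    uint32_t buf[EVT_SLOTS];  /* event buffer, 4 bytes per event */
    unsigned rd;              /* software read index */
    unsigned wr;              /* hardware write index (model only) */
};

/* Model of the controller posting one event. */
static void toy_hw_post(struct toy_evq *q, uint32_t evt)
{
    q->buf[q->wr] = evt;
    q->wr = (q->wr + 1) % EVT_SLOTS;
    q->gevntcount += 4;
}

/* Modeling assumption: IRQ is asserted while events are pending and unmasked. */
static int toy_irq_asserted(const struct toy_evq *q)
{
    return !(q->gevntsiz & DWC3_GEVNTSIZ_INTMASK) && q->gevntcount != 0;
}

/* Mask, drain until the count reads zero, unmask. Returns events copied to out. */
static unsigned toy_irq_handler(struct toy_evq *q, uint32_t *out, unsigned max)
{
    unsigned handled = 0;
    uint32_t pending;

    assert(out != 0 && max > 0);
    q->gevntsiz |= DWC3_GEVNTSIZ_INTMASK;            /* mask at entry */
    while ((pending = q->gevntcount) != 0 && handled < max) {
        uint32_t consumed = 0;

        while (consumed < pending && handled < max) {
            out[handled++] = q->buf[q->rd];
            q->rd = (q->rd + 1) % EVT_SLOTS;
            consumed += 4;
        }
        q->gevntcount -= consumed;                   /* acknowledge bytes read */
    }
    q->gevntsiz &= ~DWC3_GEVNTSIZ_INTMASK;           /* unmask at exit */
    return handled;
}
```

The outer loop re-reads the count after each batch, so an event posted while the handler is still draining is picked up in the same pass. Under the modeling assumption, one posted after the unmask raises the line again.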
Two more fixes to actually pass packets: f580710 dwc3_gadget: fix CDC Ethernet data path — different MAC, send notification — phone and host both had MAC aa:bb:cc:dd:ee:f0, so ARP couldn’t disambiguate; phone is now …f2. And we have to send USB_CDC_NOTIFY_NETWORK_CONNECTION on EP2 IN after SET_CONFIGURATION or Linux’s cdc_ether stays in operstate “unknown” forever. The first attempt at that notification used tx_buf and collided with bulk TX; 981ac97 dwc3_gadget: fix CDC notification buffer collision with bulk TX/RX moved it to ep0_bounce, which is idle during SET_CONFIGURATION STATUS.
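The notification itself is eight bytes with the same layout as a SETUP packet. A sketch of building it, assuming the CDC control interface is interface 0 (the text does not say) and using a hypothetical helper name, cdc_build_network_connection. The 0xA1 request type and the 0x00 notification code are the standard CDC values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Device-to-host, class request, recipient interface. */
#define CDC_NOTIFY_REQTYPE                 0xA1
/* CDC notification code for NETWORK_CONNECTION. */
#define USB_CDC_NOTIFY_NETWORK_CONNECTION  0x00

/*
 * Fill an 8-byte NETWORK_CONNECTION notification.
 *   iface:     control interface number (assumed 0 in this sketch)
 *   connected: non-zero reports link up, zero reports link down
 * The notification carries no data stage, so wLength is 0.
 */
static void cdc_build_network_connection(uint8_t out[8], uint16_t iface,
    int connected)
{
    assert(out != NULL);
    out[0] = CDC_NOTIFY_REQTYPE;
    out[1] = USB_CDC_NOTIFY_NETWORK_CONNECTION;
    out[2] = connected ? 1 : 0;          /* wValue low byte */
    out[3] = 0;                          /* wValue high byte */
    out[4] = (uint8_t)(iface & 0xff);    /* wIndex low byte */
    out[5] = (uint8_t)(iface >> 8);      /* wIndex high byte */
    out[6] = 0;                          /* wLength low byte */
    out[7] = 0;                          /* wLength high byte */
}
```

Whichever buffer these bytes land in must not be shared with a transfer that can be in flight at the same time, which is the collision the commit above fixed.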
A side story: do not modify dwc3_gadget.c
[WAR STORY]
The debug-printf catastrophe
▸ symptom
April 8, 2026. I wrap noisy device_printf calls in #ifdef DWC3_DEBUG because the driver is “stable now.” Macro is undefined. Build succeeds. Reboot. USB networking does not come up. SSH — the only path into the phone — is dead.
▸ breakthrough
The debug printfs weren’t decoration. They were buffering timing. The device_printf in dwc3_ep0_start_setup added ~30 µs of serial-console latency between STARTTRANSFER and the next operation — exactly enough margin that the SETUP queue happened in a quiet window. With the printf removed, STARTTRANSFER raced the bus reset and EP0 wedged.
▸ fix
Reverted in adfcb21 with a one-line commit message: “revert dwc3_gadget DWC3_DEBUG changes — broke USB networking.” The macro stays undefined in normal builds; the printfs stay verbose.
▸ lesson
In hardware drivers, debug printfs are not free decoration. They buffer timing, and the timing is part of the contract. Our EP0 is implicitly racing the host on the SETUP-vs-reset window. The race is invisible because the printf delay covers it. The right fix is to remove the race — but that’s a multi-day patch on a driver that’s currently the only path SSH takes into the phone. So the printf stays. Filed under “tech debt, accepted, with a note.”