The Bluetooth radio on the PinePhone Pro lives inside the AzureWave AP6255 module — a BCM4345C5 die — talking HCI over UART2 at 1500000 baud with hardware flow control. UART2 is /dev/cuau0 on FreeBSD. Nothing else on this SoC needs UART2; if the chip and the kernel can agree on framing, we have a Bluetooth controller. The catch: FreeBSD’s HCI stack assumes a transport that already delivers H4-framed packets, and the deprecated ng_h4(4) line discipline that used to do that framing was removed years ago. There is no in-tree path from cuau0 to ng_hci.
Three things had to happen before the chip could ever issue an event the kernel would understand:
- A netgraph node that does HCI H4 framing (a type byte followed by a per-type variable-length packet), sitting between ng_tty (raw bytes) and ng_hci (HCI frames).
- A patchram uploader that streams the BCM4345C0.hcd file into the chip without driving the firmware into its “Hardware_Error 0x00 burst, then permanently deaf” failure mode.
- A UART RX-FIFO trigger that fires on every byte, not every eight bytes, so HCI Events that are only 6–13 bytes long actually leave the FIFO before the chip stalls waiting for a host turnaround.
Each one took its own arc.
ng_h4frame: H4 framing as a netgraph node
ng_h4 was a tty line discipline. Modern FreeBSD does framing as a netgraph node; that’s how ng_tty already exposes a UART byte stream into the netgraph mesh. So ng_h4frame is a node with two hooks (lower for raw bytes from ng_tty, upper for HCI frames to ng_hci) and a state machine in the middle. Commit 248fe0e, “ng_h4frame: new netgraph node for HCI H4 framing”, is the first cut: 336 lines, no debug knobs, and a single h4frame_feed() function that walks one byte at a time through a 3-state machine: H4S_TYPE → H4S_HDR → H4S_PAYLOAD → emit.
The header sizes are fixed per type and live in h4_header_size() at src/sys/netgraph/bluetooth/drivers/h4frame/ng_h4frame.c:131:
case H4_PKT_CMD: return 3; /* opcode[2] + plen[1] */
case H4_PKT_ACL: return 4; /* handle[2] + len[2] */
case H4_PKT_SCO: return 3; /* handle[2] + len[1] */
case H4_PKT_EVT: return 2; /* code[1] + plen[1] */
case H4_PKT_ISO: return 4; /* handle[2] + len[2] */
After enough header bytes are in to read the length field, h4_payload_len() decodes the per-type payload size and the state machine transitions to H4S_PAYLOAD (or emits immediately on a zero-length packet). The TX path in h4frame_rcvdata() is degenerate: ng_hci writes already include the H4 type byte, so we forward unchanged to the lower hook.
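To make the RX state machine concrete, here is a minimal userspace sketch of the same three-state walk. It is not the code in ng_h4frame.c: the struct, the h4_feed() name, the H4_MAX_FRAME cap, and the return-value convention are all assumptions made for illustration. Only the state names and header sizes come from the text above; the length-field offsets follow the standard HCI packet layouts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace sketch; not the code in ng_h4frame.c. */
enum h4_state { H4S_TYPE, H4S_HDR, H4S_PAYLOAD };

enum {
	H4_PKT_CMD = 0x01, H4_PKT_ACL = 0x02, H4_PKT_SCO = 0x03,
	H4_PKT_EVT = 0x04, H4_PKT_ISO = 0x05
};

#define H4_MAX_FRAME 1024	/* illustrative cap, smaller than a max ACL */

struct h4_parser {
	enum h4_state state;
	uint8_t buf[H4_MAX_FRAME];	/* buf[0] is the H4 type byte */
	size_t have;			/* bytes buffered so far */
	size_t hdr;			/* header size for this type */
	size_t need;			/* full frame size, once known */
};

/* Header size after the type byte; 0 means "not an H4 type". */
static size_t
h4_header_size(uint8_t type)
{
	switch (type) {
	case H4_PKT_CMD: return (3);
	case H4_PKT_ACL: return (4);
	case H4_PKT_SCO: return (3);
	case H4_PKT_EVT: return (2);
	case H4_PKT_ISO: return (4);
	default: return (0);
	}
}

/* Payload length, read from a fully buffered header. */
static size_t
h4_payload_len(const uint8_t *f)
{
	switch (f[0]) {
	case H4_PKT_EVT: return (f[2]);
	case H4_PKT_CMD:
	case H4_PKT_SCO: return (f[3]);
	case H4_PKT_ACL: return ((size_t)f[3] | ((size_t)f[4] << 8));
	/* ISO carries a 14-bit length; the top two bits are reserved. */
	case H4_PKT_ISO: return (((size_t)f[3] | ((size_t)f[4] << 8)) & 0x3fff);
	default: return (0);
	}
}

/*
 * Feed one byte. Returns the frame length (type byte included) when a
 * complete frame sits in p->buf, 0 when more bytes are needed, and -1
 * when the byte is rejected.
 */
static int
h4_feed(struct h4_parser *p, uint8_t byte)
{
	if (p->state == H4S_TYPE) {
		p->hdr = h4_header_size(byte);
		if (p->hdr == 0)
			return (-1);	/* unknown type: stay in H4S_TYPE */
		p->buf[0] = byte;
		p->have = 1;
		p->state = H4S_HDR;
		return (0);
	}

	assert(p->have < H4_MAX_FRAME);
	p->buf[p->have++] = byte;

	if (p->state == H4S_HDR) {
		if (p->have < 1 + p->hdr)
			return (0);
		p->need = 1 + p->hdr + h4_payload_len(p->buf);
		if (p->need > H4_MAX_FRAME) {
			p->state = H4S_TYPE;	/* oversize: drop the frame */
			return (-1);
		}
		p->state = H4S_PAYLOAD;
	}

	if (p->have < p->need)
		return (0);
	p->state = H4S_TYPE;	/* emit, then wait for the next type byte */
	return ((int)p->have);
}
```

Feeding a zero-initialized parser the four bytes 01 03 0c 00 (an HCI_Reset command) exercises the zero-length path: the frame is emitted as soon as the header completes. A real node would hand the buffer to the upper hook at that point instead of returning a length.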
That much is mechanical. The interesting parts are everything that came afterward, when the kernel started getting bytes that didn’t match the chip’s behavior.
[WAR STORY]
The trailing-byte stall
▸ symptom
After firmware load, every HCI Event from the chip arrives in two phases. The first 7-ish bytes hit the host UART promptly. The remaining bytes — the tail of the same event — sit in the chip’s TX FIFO indefinitely. Read_BD_Addr (which returns 13 bytes) prints the first half on the serial sniffer and then nothing. Until the host writes another HCI command. Then the rest of the previous event flushes, immediately followed by the response to the new command.
▸ hypothesis 1
RX overrun in our UART. Maybe the snps UART is dropping bytes when it can’t keep up. Added counters: hw.h4frame.bytes_in vs deltas, plus polled RFL (Receive FIFO Level). RFL is zero through the stall. The bytes are not in our hardware. They are in the chip’s TX FIFO, waiting.
▸ hypothesis 2
Chip-side LPM (Low Power Mode). Linux’s hci_bcm driver issues VSC 0xfc27 (Set_Sleep_Mode) with sleep_mode=0 to disable LPM on boards without HOST_WAKE wired through. We were sending the same. Disabling LPM didn’t change the stall pattern. Removing the VSC entirely also didn’t help. Worse, sending it while the chip was in an inconsistent state was implicated in Hardware_Error 0x00 cascades, so commit 36d84b3, “bcm_firmware_load.pl: stop sending 0xfc27 — caused chip HW errors”, dropped the VSC.
▸ breakthrough
The chip’s UART transmitter is gated on host-side TX activity. There’s no HOST_WAKE / DEV_WAKE wiring in stock FreeBSD on this board (essay 10 fixes that), so the chip’s UART power-management state machine ends up holding partial event bytes until it sees us write anything back. A single null byte is enough — the chip drops invalid H4 type bytes silently, but the UART activity itself wakes the chip’s TX path and flushes the held event tail.
▸ fix
A “tickler” callout in ng_h4frame that periodically pokes the chip with a 0x00 byte when an HCI packet has been mid-parse for more than 500 ms. It is gated on the parse state: it fires only when the parser is sitting idle in H4S_HDR or H4S_PAYLOAD, never from H4S_TYPE. It is also gated on packet type: only events. ACL exchanges naturally have many TX rounds that wake the chip on their own, and tickling mid-ACL confuses the chip’s UART state machine. Commit 52a4ec0, “ng_h4frame: TX tickler workaround for BCM trailing-byte stall”, first landed at a 100 ms threshold; 35a1969, “ng_h4frame: bump tickle threshold to 500ms to avoid interfering with active L2CAP”, raised it to 500 ms after we observed interference with active L2CAP traffic; and 185a961, “ng_h4frame: only tickle when mid-parsing HCI event packets”, narrowed the gate to event packets only. The sysctl hw.h4frame_tickle_enable lets userland disable it.
▸ lesson
The chip’s specification says nothing about this. The Linux hci_bcm driver “doesn’t have the bug” because Linux always wires HOST_WAKE (or doesn’t run on hardware where it’s missing). When you port a driver into an environment the original author never tested, every implicit assumption about the surrounding system becomes a bug to discover. The TX tickler is a workaround, not a fix; the real fix is HOST_WAKE, which is essay 10’s story. The tickler stays as a sysctl-controlled belt-and-suspenders option for boards where HOST_WAKE isn’t reliable yet.
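The tickler’s gating rules are easy to get subtly wrong, so here is a small sketch of them as a pure predicate. Everything in it is hypothetical: the struct, the field names, and the idea of measuring the stall as milliseconds since the last RX byte are assumptions, not the callout code in ng_h4frame. Only the three gates (enabled, mid-parse, event packets) and the 500 ms threshold come from the story above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum h4_state { H4S_TYPE, H4S_HDR, H4S_PAYLOAD };

#define H4_PKT_EVT	0x04
#define TICKLE_IDLE_MS	500	/* threshold described in the text */

/* Hypothetical snapshot of what the callout would look at. */
struct tickle_view {
	bool enabled;		/* mirrors the sysctl knob */
	enum h4_state state;	/* current parser state */
	uint8_t pkt_type;	/* H4 type of the packet being parsed */
	uint32_t stalled_ms;	/* assumed: time since the last RX byte */
};

/* True when a single 0x00 byte should be written to the chip. */
static bool
h4_should_tickle(const struct tickle_view *v)
{
	assert(v != NULL);
	if (!v->enabled)
		return (false);
	if (v->state == H4S_TYPE)	/* between packets: nothing is held */
		return (false);
	if (v->pkt_type != H4_PKT_EVT)	/* never tickle mid-ACL */
		return (false);
	return (v->stalled_ms > TICKLE_IDLE_MS);
}
```

Keeping the decision separate from the callout makes each gate testable on its own: an ACL packet stalled for a full second, for example, must still return false.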
The 1-byte UART trigger
Even with the tickler, performance was awful. Every event took two interrupts: one for the first ~7 bytes (filling the FIFO past its default 8-byte trigger), one for the timeout (the rest). The 16550-style FCR has four trigger levels: 1, 4, 8, and 14 bytes. The default is 8. With the BCM4345 emitting events as small as 6 bytes (Command_Status), an 8-byte trigger means every short event sits in the FIFO until the character-timeout interrupt fires (~4 char times = ~27 µs at 1.5 Mbaud). Multiply by hundreds of events per pairing flow and the latency adds up.
But unconditionally lowering the trigger to 1 byte for every snps UART would cost the rest of the system: a serial console at 115200 doesn’t need single-byte interrupts. So the trigger change has to be targeted at UARTs that have a Broadcom BT chip child. The DT already encodes that: the BT subnode under UART2 has compatible = "brcm,bcm43xx". snps_attach() calls a helper that walks the UART’s FDT children and matches the "brcm,bcm43" prefix:
sys/dev/uart/uart_dev_snps.c
#include <sys/bus.h>
#include <sys/kernel.h>
#include <sys/module.h>
#include <sys/sysctl.h>
#include <machine/bus.h>
#include <dev/uart/uart.h>
#include <dev/uart/uart_bus.h>
#include <dev/uart/uart_cpu_fdt.h>
#include <dev/uart/uart_dev_ns8250.h>
#include <dev/ic/ns16550.h>
#include <dev/ofw/ofw_bus.h>
#include <dev/ofw/ofw_bus_subr.h>

/* ... */
	sc->reset = reset;

	return (BUS_PROBE_VENDOR);
}

/*
 * If a child node compatible with "brcm,bcm43*" is present under this
 * UART (i.e. a UART-attached Broadcom BT chip), force the RX FIFO
 * trigger to 1 byte. The BCM4345 sends short HCI Event responses
 * that would otherwise sit below the default 8-byte trigger and only
 * be picked up via the RXTOUT char-timeout interrupt.
 */
static bool
snps_has_bcm_child(device_t dev)
{
	phandle_t node, child;
	char compat[64];
	int len;

	node = ofw_bus_get_node(dev);
	if (node == -1)
		return (false);
	for (child = OF_child(node); child != 0; child = OF_peer(child)) {
		len = OF_getprop(child, "compatible", compat, sizeof(compat));
		if (len > 0 && strncmp(compat, "brcm,bcm43", 10) == 0)
			return (true);
	}
	return (false);
}

static int
snps_attach(device_t dev)
{
	struct snps_softc *sc;
	struct uart_bas *bas;
	phandle_t node;
	int ret;

	/* ... */
		node = ofw_bus_get_node(dev);
		/* Set up phandle to dev mapping */
		OF_device_register_xref(OF_xref_from_node(node), dev);

		if (snps_has_bcm_child(dev)) {
			sc = device_get_softc(dev);
			bas = &sc->ns8250.base.sc_bas;
			/* Set RX trigger to 1 byte (every byte interrupts). */
			sc->ns8250.fcr = (sc->ns8250.fcr & ~0xc0) | FCR_RX_LOW;
			uart_setreg(bas, REG_FCR, sc->ns8250.fcr);
			uart_barrier(bas);
			device_printf(dev,
			    "BT chip detected; RX FIFO trigger set to 1 byte\n");
		}
	}

	return (ret);
}

After this patch, dmesg shows uart2: BT chip detected; RX FIFO trigger set to 1 byte at boot, and HCI Event latency drops from a tenth of a millisecond to one byte time at 1.5 Mbaud (about 7 µs per byte, with an immediate IRQ on each). The tickler still fires occasionally but its rate dropped by an order of magnitude.
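The timing figures in this section (about 7 µs per byte, about 27 µs for the character timeout) are plain arithmetic, and a few lines of C reproduce them. This is a simplified model, not a measurement: it assumes 8N1 framing (10 bits per character on the wire) and a 16550-style rule that a partial FIFO is only reported after four idle character times. The function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define BITS_PER_CHAR	10.0	/* assumed 8N1: start + 8 data + stop */
#define TIMEOUT_CHARS	4.0	/* 16550 character-timeout window */

/* Time on the wire for one character, in microseconds. */
static double
uart_char_time_us(uint32_t baud)
{
	assert(baud != 0);
	return (BITS_PER_CHAR * 1e6 / (double)baud);
}

/*
 * Extra wait after the last byte of an event arrives before the host
 * is interrupted for it. Bytes that complete a full trigger-sized
 * batch interrupt at once; a leftover tail waits for the timeout.
 */
static double
rx_tail_delay_us(unsigned len, unsigned trigger, uint32_t baud)
{
	assert(trigger != 0);
	if (len % trigger == 0)
		return (0.0);
	return (TIMEOUT_CHARS * uart_char_time_us(baud));
}
```

With these definitions, uart_char_time_us(1500000) comes out at roughly 6.7 µs, and rx_tail_delay_us(6, 8, 1500000) at roughly 26.7 µs, while any length with a trigger of 1 gives zero. That is the whole argument for the 1-byte trigger in one comparison.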
The patchram dance
The chip boots in ROM mode at 115200 baud and won’t do anything useful — no scanning, no advertising, no SDP — until you stream a vendor .hcd patchram into it. The file is a sequence of HCI command records: 2-byte opcode, 1-byte parameter length, parameters. The loader sends them as HCI commands and waits for Command_Complete on each. The very last record is Launch_RAM (0xfc4e) which resets the chip; you do NOT wait for a completion on that one.
Sounds simple. It isn’t.
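The record layout itself is the simple part. As a rough illustration, here is a C sketch of walking such records in a memory buffer. The real loader is a Perl script; the names hcd_record, hcd_next(), and hcd_expects_completion() are hypothetical, and the opcode is assumed to be stored little-endian, as HCI opcodes are on the wire.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HCD_OP_LAUNCH_RAM 0xfc4e	/* last record; resets the chip */

/* One record: 2-byte opcode, 1-byte parameter length, parameters. */
struct hcd_record {
	uint16_t opcode;
	uint8_t plen;
	const uint8_t *params;	/* points into the caller's buffer */
};

/*
 * Decode the record at buf. Returns the number of bytes consumed, or 0
 * if fewer than a whole record remains.
 */
static size_t
hcd_next(const uint8_t *buf, size_t len, struct hcd_record *rec)
{
	assert(buf != NULL && rec != NULL);
	if (len < 3)
		return (0);
	rec->opcode = (uint16_t)(buf[0] | (buf[1] << 8));
	rec->plen = buf[2];
	if (len < (size_t)3 + rec->plen)
		return (0);	/* truncated file */
	rec->params = buf + 3;
	return ((size_t)3 + rec->plen);
}

/* Launch_RAM resets the chip, so no Command_Complete will follow it. */
static int
hcd_expects_completion(const struct hcd_record *rec)
{
	return (rec->opcode != HCD_OP_LAUNCH_RAM);
}
```

A loader built on this would loop on hcd_next(), prepend the H4 command type byte 0x01 to each record before writing it to the UART, and wait for a completion only when hcd_expects_completion() says so.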
overlay/usr/local/sbin/bcm_firmware_load.pl is the loader. The lessons it encodes are blood:
- Strict opcode-matching of Command_Complete. The chip’s ncmd field in Command_Complete refreshes the host’s command credit. If you send a command before the previous credit refresh, or if you ignore the opcode in the response and assume “any complete is good,” the firmware drives itself into a Hardware_Error 0x00 burst (10+ events in a row) and goes deaf. The loader tracks $cmd_credit, refuses to send when it’s zero, and warns on opcode mismatch.
- Launch_RAM is fire-and-forget. It resets the chip; no completion ever arrives. Commit 996c880, “bcm_firmware_load.pl: fix Launch_RAM handling”, fixed an early version that hung waiting.
- Baud rate sequencing has to happen on both sides, twice. The chip starts at 115200. After firmware upload, you switch the chip to 1500000 with VSC 0xfc18 (the parameter is pack("C2 V", 0, 0, 1500000): 2 zero bytes plus a little-endian u32). Then you stty the host UART to match. Then you re-issue HCI_Reset to confirm the chip is alive at the new baud.
- Do NOT issue post-patchram commands blindly. Linux’s btbcm_setup does NOT send Set_Sleep_Mode (0xfc27), Set_Event_Mask (0x0c01), or Write_Simple_Pairing_Mode (0x0c56) at this layer. Those belong to ng_hci’s connection-manager init phase. The first few iterations of the loader sent them all “for safety.” The chip responded with Hardware_Error events. Removing the unnecessary commands made the chip stable.
- Write_Secure_Connections_Host_Support (0x0c7a) requires the right firmware build. The default linux-firmware BCM4345C0 build (RPi 3+, 0190) returns Command Disallowed (0x0c). That’s essay 11’s problem.
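The credit and opcode-matching rule from the first item is the one most worth seeing in code. The sketch below is a C rendering of the idea, not a translation of the Perl loader: cmd_flow, cmd_may_send(), cmd_sent(), and cmd_on_event() are hypothetical names, and it assumes a single outstanding command and events delivered as whole H4 frames.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define H4_PKT_EVT		0x04
#define HCI_EVT_CMD_COMPLETE	0x0e

/* Hypothetical host-side bookkeeping for one outstanding command. */
struct cmd_flow {
	unsigned credit;	/* commands the chip will accept right now */
	uint16_t pending;	/* opcode we are waiting on */
};

enum cc_result { CC_MATCH, CC_MISMATCH, CC_NOT_COMPLETE };

/* Refuse to send while the chip has granted no credit. */
static bool
cmd_may_send(const struct cmd_flow *f)
{
	return (f->credit > 0);
}

/* Record that a command went out: spend one credit, remember the opcode. */
static void
cmd_sent(struct cmd_flow *f, uint16_t opcode)
{
	assert(f->credit > 0);
	f->credit--;
	f->pending = opcode;
}

/*
 * Inspect one H4 event frame. A Command_Complete always refreshes the
 * credit from ncmd, but only counts as "ours" if the opcode matches.
 */
static enum cc_result
cmd_on_event(struct cmd_flow *f, const uint8_t *frame, size_t len)
{
	uint16_t opcode;

	if (len < 6 || frame[0] != H4_PKT_EVT ||
	    frame[1] != HCI_EVT_CMD_COMPLETE)
		return (CC_NOT_COMPLETE);
	f->credit = frame[3];				/* ncmd */
	opcode = (uint16_t)(frame[4] | (frame[5] << 8));
	return (opcode == f->pending ? CC_MATCH : CC_MISMATCH);
}
```

On CC_MISMATCH the caller would log a warning and keep waiting rather than move on to the next record; treating any completion as good is exactly the mistake described above.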
The loader output for a clean run looks like:
Step 1: HCI_Reset...
Step 2: Read_Local_Name (pre-patchram)... BCM4345C0
Step 3: Download_Minidriver (0xfc2e)...
Step 4: Uploading firmware records from /tmp/BCM4345C0.hcd...
uploaded 50 records...
uploaded 100 records...
...
record 1832: Launch_RAM (0xfc4e) — fire-and-forget
Step 4: Uploaded 1832 records, 256441 bytes
Step 5: Waiting 250ms for chip to reset into patched firmware...
Step 6: Post-patch HCI_Reset... chip is alive in patched mode
Step 7: Switch chip baud to 1500000 (VSC 0xfc18)... chip baud set to 1500000
Step 8: Read_Local_Name (post-patchram)... BCM4345C0 Murata Type-1MW UART 37.4 MHz BT 5.0-0187
Step 9: Read_BD_Addr... BD_Addr: 22:22:b1:f3:11:e2
That’s the chip introducing itself, in its post-patchram identity. The very first time we saw step 9 print a real BD address — instead of timing out, instead of Hardware_Error 0x00, instead of garbled bytes from a baud mismatch — that’s when the chip was truly attached.
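Step 7 is the one place in that run where a byte-order slip leaves the two ends at different speeds, so it is worth spelling out what the baud-rate command looks like on the wire. The following C sketch builds the same bytes that pack("C2 V", 0, 0, 1500000) produces in the Perl loader, wrapped in an H4 command frame. The function name and the fixed 10-byte buffer are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define H4_PKT_CMD	0x01
#define VSC_SET_BAUD	0xfc18	/* opcode named in the text */

/*
 * Build the H4 frame for the baud-rate VSC: type byte, little-endian
 * opcode, parameter length 6, two zero bytes, little-endian u32 baud.
 * Returns the number of bytes written (always 10).
 */
static size_t
bcm_build_set_baud(uint8_t out[10], uint32_t baud)
{
	assert(out != NULL);
	out[0] = H4_PKT_CMD;
	out[1] = VSC_SET_BAUD & 0xff;
	out[2] = VSC_SET_BAUD >> 8;
	out[3] = 6;				/* parameter length */
	out[4] = 0;
	out[5] = 0;
	out[6] = (uint8_t)(baud & 0xff);
	out[7] = (uint8_t)((baud >> 8) & 0xff);
	out[8] = (uint8_t)((baud >> 16) & 0xff);
	out[9] = (uint8_t)((baud >> 24) & 0xff);
	return (10);
}
```

For 1500000 (0x0016e360) the frame is 01 18 fc 06 00 00 60 e3 16 00. The host side must stay at the old rate until the Command_Complete for this frame has arrived, and only then switch.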
What it looks like together
A complete Read_BD_Addr exchange after all three pieces are in place:
1776832010.012444 CMD len=4 01 09 10 00
1776832010.013998 EVT len=14 04 0e 0a 01 09 10 00 e2 11 f3 b1 22 22
01 09 10 00 is the HCI Command packet (type 0x01) for opcode 0x1009 (Read_BD_Addr) with no parameters. 04 0e 0a 01 09 10 00 e2 11 f3 b1 22 22 is Command_Complete (event 0x0e, parameter length 0x0a) with ncmd=1, opcode 0x1009, status 0x00, and the BD address in little-endian byte order. Four bytes outbound, thirteen bytes inbound as dumped (the logger reports a length of 14, one more than the bytes it prints), both ends acknowledging each other for the first time.
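The field-by-field reading above can be checked mechanically. Here is a small C sketch that decodes exactly that 13-byte frame; the struct and function names are hypothetical, and the parser deliberately accepts only this one event shape.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

struct bd_addr_reply {
	uint8_t ncmd;
	uint16_t opcode;
	uint8_t status;
	uint8_t addr[6];	/* display order, most significant byte first */
};

/* Decode a Read_BD_Addr Command_Complete H4 frame. Returns 0 on success. */
static int
parse_read_bd_addr(const uint8_t *f, size_t len, struct bd_addr_reply *r)
{
	assert(f != NULL && r != NULL);
	/* type 0x04, event 0x0e, parameter length 0x0a: 13 bytes total */
	if (len != 13 || f[0] != 0x04 || f[1] != 0x0e || f[2] != 0x0a)
		return (-1);
	r->ncmd = f[3];
	r->opcode = (uint16_t)(f[4] | (f[5] << 8));
	r->status = f[6];
	/* The address travels least significant byte first; reverse it. */
	for (int i = 0; i < 6; i++)
		r->addr[i] = f[12 - i];
	return (0);
}

/* Format an address as aa:bb:cc:dd:ee:ff into an 18-byte buffer. */
static void
bd_addr_format(const uint8_t addr[6], char out[18])
{
	snprintf(out, 18, "%02x:%02x:%02x:%02x:%02x:%02x",
	    addr[0], addr[1], addr[2], addr[3], addr[4], addr[5]);
}
```

Run against the dump, this yields ncmd 1, opcode 0x1009, status 0, and the string 22:22:b1:f3:11:e2, which is the same address the loader printed in step 9.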
ng_h4frame is in the tree at src/sys/netgraph/bluetooth/drivers/h4frame/ng_h4frame.c with its module skeleton at src/sys/modules/netgraph/bluetooth/h4frame/Makefile. The patchram loader is at overlay/usr/local/sbin/bcm_firmware_load.pl. The UART trigger fix is the patch above. With those three landed, every essay after this can assume “the chip responds to HCI commands” — which is what essay 10 is going to need before it can fight the next battle.