09 · bluetooth

Bluetooth attached: BCM4345C5 firmware load

A from-scratch netgraph H4 framing node, a 1-byte UART trigger, and a chip that speaks back.

● working

The Bluetooth radio on the PinePhone Pro lives inside the AzureWave AP6255 module — a BCM4345C5 die — talking HCI over UART2 at 1500000 baud with hardware flow control. UART2 is /dev/cuau0 on FreeBSD. Nothing else on this SoC needs UART2; if the chip and the kernel can agree on framing, we have a Bluetooth controller. The catch: FreeBSD’s HCI stack assumes a transport that already delivers H4-framed packets, and the deprecated ng_h4(4) line discipline that used to do that framing was removed years ago. There is no in-tree path from cuau0 to ng_hci.

Three things had to happen before the chip could ever issue an event the kernel would understand:

  1. A netgraph node that does HCI H4 framing — type byte plus per-type variable-length packet — sitting between ng_tty (raw bytes) and ng_hci (HCI frames).
  2. A patchram uploader that streams the BCM4345C0.hcd file into the chip without driving the firmware into its “Hardware_Error 0x00 burst, then permanently deaf” failure mode.
  3. A UART RX-FIFO trigger that fires on every byte, not every eight bytes, so HCI Events that are only 6–13 bytes long actually leave the FIFO before the chip stalls waiting for a host turnaround.

Each one took its own arc.

ng_h4frame: H4 framing as a netgraph node

ng_h4 was a tty line discipline. Modern FreeBSD does framing as a netgraph node; that’s how ng_tty already exposes a UART byte stream into the netgraph mesh. So ng_h4frame is a node with two hooks (lower for raw bytes from ng_tty, upper for HCI frames to ng_hci) and a state machine in the middle. Commit 248fe0e (“ng_h4frame: new netgraph node for HCI H4 framing”) is the first cut: 336 lines, no debug knobs, and a single h4frame_feed() function that walks one byte at a time through a 3-state machine: H4S_TYPE → H4S_HDR → H4S_PAYLOAD → emit.

The header sizes are fixed per type and live in h4_header_size() at src/sys/netgraph/bluetooth/drivers/h4frame/ng_h4frame.c:131:

case H4_PKT_CMD: return 3;	/* opcode[2] + plen[1] */
case H4_PKT_ACL: return 4;	/* handle[2] + len[2] */
case H4_PKT_SCO: return 3;	/* handle[2] + len[1] */
case H4_PKT_EVT: return 2;	/* code[1] + plen[1] */
case H4_PKT_ISO: return 4;	/* handle[2] + len[2] */

After enough header bytes are in to read the length field, h4_payload_len() decodes the per-type payload size and the state machine transitions to H4S_PAYLOAD (or emits immediately on a zero-length packet). The TX path in h4frame_rcvdata() is degenerate: ng_hci writes already include the H4 type byte, so we forward unchanged to the lower hook.
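The state machine described above can be sketched in userland C. This is a minimal illustration, not the kernel node: the struct h4_parser layout, the h4_feed() name, and the "drop unknown type bytes" policy are assumptions made for the example, and the header sizes are the ones listed above. The length decoding follows the standard HCI packet layouts (little-endian 16-bit length for ACL, 14-bit length for ISO).

```c
#include <stddef.h>
#include <stdint.h>

/* H4 packet indicator bytes from the HCI UART transport. */
enum { H4_PKT_CMD = 0x01, H4_PKT_ACL = 0x02, H4_PKT_SCO = 0x03,
       H4_PKT_EVT = 0x04, H4_PKT_ISO = 0x05 };

enum h4_state { H4S_TYPE, H4S_HDR, H4S_PAYLOAD };

/* Hypothetical parser context, for illustration only. */
struct h4_parser {
	enum h4_state state;
	size_t have;                 /* bytes collected so far, incl. type byte */
	size_t need;                 /* bytes required before the next step */
	uint8_t buf[1 + 4 + 65535];  /* type + largest header + largest payload */
};

static int
h4_header_size(uint8_t type)
{
	switch (type) {
	case H4_PKT_CMD: return 3;	/* opcode[2] + plen[1] */
	case H4_PKT_ACL: return 4;	/* handle[2] + len[2] */
	case H4_PKT_SCO: return 3;	/* handle[2] + len[1] */
	case H4_PKT_EVT: return 2;	/* code[1] + plen[1] */
	case H4_PKT_ISO: return 4;	/* handle[2] + len[2] */
	default:         return -1;	/* not an H4 type byte */
	}
}

/* hdr points at the first header byte, i.e. just past the type byte. */
static size_t
h4_payload_len(uint8_t type, const uint8_t *hdr)
{
	switch (type) {
	case H4_PKT_CMD: return hdr[2];
	case H4_PKT_SCO: return hdr[2];
	case H4_PKT_EVT: return hdr[1];
	case H4_PKT_ACL: return (size_t)hdr[2] | ((size_t)hdr[3] << 8);
	case H4_PKT_ISO: return ((size_t)hdr[2] | ((size_t)hdr[3] << 8)) & 0x3fff;
	default:         return 0;
	}
}

/*
 * Feed one byte. Returns the full frame length (type byte included) when a
 * frame has just completed in p->buf, otherwise 0.
 */
static size_t
h4_feed(struct h4_parser *p, uint8_t byte)
{
	int hs;

	switch (p->state) {
	case H4S_TYPE:
		hs = h4_header_size(byte);
		if (hs < 0)
			return 0;	/* assumed policy: drop and stay in H4S_TYPE */
		p->buf[0] = byte;
		p->have = 1;
		p->need = 1 + (size_t)hs;
		p->state = H4S_HDR;
		return 0;
	case H4S_HDR:
		p->buf[p->have++] = byte;
		if (p->have < p->need)
			return 0;
		p->need += h4_payload_len(p->buf[0], &p->buf[1]);
		if (p->have < p->need) {
			p->state = H4S_PAYLOAD;
			return 0;
		}
		break;			/* zero-length payload: emit now */
	case H4S_PAYLOAD:
		p->buf[p->have++] = byte;
		if (p->have < p->need)
			return 0;
		break;
	}
	p->state = H4S_TYPE;
	return p->have;
}
```

Feeding a stream one byte at a time and acting only on a non-zero return keeps the caller free of any framing knowledge, which is the same division of labor the node has between its lower and upper hooks.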

That much is mechanical. The interesting parts are everything that came afterward, when the kernel started getting bytes that didn’t match the chip’s behavior.

[WAR STORY]

The trailing-byte stall

ng_h4frame / BCM4345 UART

▸ symptom

After firmware load, every HCI Event from the chip arrives in two phases. The first 7-ish bytes hit the host UART promptly. The remaining bytes — the tail of the same event — sit in the chip’s TX FIFO indefinitely. Read_BD_Addr (which returns 13 bytes) prints the first half on the serial sniffer and then nothing. Until the host writes another HCI command. Then the rest of the previous event flushes, immediately followed by the response to the new command.

▸ hypothesis 1

RX overrun in our UART. Maybe the snps UART is dropping bytes when it can’t keep up. Added counters: hw.h4frame.bytes_in vs deltas, plus polled RFL (Receive FIFO Level). RFL is zero through the stall. The bytes are not in our hardware. They are in the chip’s TX FIFO, waiting.

▸ hypothesis 2

Chip-side LPM (Low Power Mode). Linux’s hci_bcm driver issues VSC 0xfc27 (Set_Sleep_Mode) with sleep_mode=0 to disable LPM on boards without HOST_WAKE wired through. We were sending the same. Disabling LPM didn’t change the stall pattern. Removing the VSC entirely also didn’t help. Worse, sending it while the chip was in an inconsistent state was implicated in Hardware_Error 0x00 cascades, so commit 36d84b3 (“bcm_firmware_load.pl: stop sending 0xfc27 — caused chip HW errors”) dropped the VSC.

▸ breakthrough

The chip’s UART transmitter is gated on host-side TX activity. There’s no HOST_WAKE / DEV_WAKE wiring in stock FreeBSD on this board (essay 10 fixes that), so the chip’s UART power-management state machine ends up holding partial event bytes until it sees us write anything back. A single null byte is enough — the chip drops invalid H4 type bytes silently, but the UART activity itself wakes the chip’s TX path and flushes the held event tail.

▸ fix

A “tickler” callout in ng_h4frame that periodically pokes the chip with a 0x00 byte when an HCI packet has been mid-parse for more than 500 ms. It is gated on the parse state: it fires only when we’re somewhere between H4S_HDR and H4S_PAYLOAD and idle, never from H4S_TYPE. It is also gated on packet type: only events. ACL exchanges naturally have many TX rounds that wake the chip on their own, and tickling mid-ACL confuses the chip’s UART state machine. The workaround landed in three commits: 52a4ec0 (“ng_h4frame: TX tickler workaround for BCM trailing-byte stall”) introduced it at a 100 ms threshold; 35a1969 (“ng_h4frame: bump tickle threshold to 500ms to avoid interfering with active L2CAP”) raised the threshold to 500 ms after observed interference with active L2CAP traffic; 185a961 (“ng_h4frame: only tickle when mid-parsing HCI event packets”) narrowed the gate to event packets only. The sysctl hw.h4frame_tickle_enable lets userland disable it.
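The gating rules reduce to a small predicate. The sketch below is an illustration under assumptions: struct tickle_ctx, should_tickle(), and the idea of measuring idleness from the timestamp of the last received byte are hypothetical names and choices, not taken from ng_h4frame.c. Only the three gates themselves (enable knob, parse state, event-only) and the 500 ms threshold come from the description above.

```c
#include <stdbool.h>
#include <stdint.h>

enum h4_state { H4S_TYPE, H4S_HDR, H4S_PAYLOAD };

#define H4_PKT_EVT		0x04
#define TICKLE_THRESHOLD_MS	500	/* threshold after the 100 ms -> 500 ms bump */

/* Hypothetical snapshot of parser state, for illustration only. */
struct tickle_ctx {
	enum h4_state state;	/* where the parser currently is */
	uint8_t pkt_type;	/* H4 type byte of the packet being parsed */
	uint64_t last_rx_ms;	/* assumed: time the last RX byte arrived */
	bool enabled;		/* mirrors the tickle-enable sysctl */
};

/* Returns true when the callout should write a single 0x00 byte. */
static bool
should_tickle(const struct tickle_ctx *c, uint64_t now_ms)
{
	if (!c->enabled)
		return false;
	if (c->state == H4S_TYPE)	/* not mid-packet: nothing is stuck */
		return false;
	if (c->pkt_type != H4_PKT_EVT)	/* never tickle in the middle of ACL */
		return false;
	return (now_ms - c->last_rx_ms) > TICKLE_THRESHOLD_MS;
}
```

Keeping the decision in a pure function of the parser state makes each gate easy to exercise in isolation, which matters for a workaround whose failure mode is confusing the chip.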

▸ lesson

The chip’s specification says nothing about this. The Linux hci_bcm driver “doesn’t have the bug” because Linux always wires HOST_WAKE (or doesn’t run on hardware where it’s missing). When you port a driver into an environment the original author never tested, every implicit assumption about the surrounding system becomes a bug to discover. The TX tickler is a workaround, not a fix; the real fix is HOST_WAKE, which is essay 10’s story. The tickler stays as a sysctl-controlled belt-and-suspenders option for boards where HOST_WAKE isn’t reliable yet.

The 1-byte UART trigger

Even with the tickler, performance was awful. Every event took two interrupts: one for the first ~7 bytes (filling the FIFO past its default 8-byte trigger), one for the timeout (the rest). The 16550-style FCR has four trigger levels: 1, 4, 8, and 14 bytes. The default is 8. With the BCM4345 emitting events as small as 6 bytes (Command_Status), an 8-byte trigger means every short event sits in the FIFO until the character-timeout interrupt fires (~4 char times = ~27 µs at 1.5 Mbaud). Multiply by hundreds of events per pairing flow and the latency adds up.
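The numbers in that paragraph can be checked with a few lines of C. This sketch assumes 8N1 framing (10 bits on the wire per character) and uses hypothetical helper names. The trigger encoding is the standard 16550 one, where FCR bits 7:6 select the RX trigger level.

```c
#include <stdint.h>

/* Time on the wire for one character, assuming 8N1: 1 start + 8 data + 1 stop. */
static double
char_time_us(uint32_t baud)
{
	return 10.0 * 1e6 / (double)baud;
}

/* The character-timeout interrupt fires after roughly 4 idle character times. */
static double
rx_timeout_us(uint32_t baud)
{
	return 4.0 * char_time_us(baud);
}

/* FCR bits 7:6 for each supported RX trigger level; -1 for anything else. */
static int
fcr_rx_trigger_bits(unsigned bytes)
{
	switch (bytes) {
	case 1:  return 0x00;
	case 4:  return 0x40;
	case 8:  return 0x80;
	case 14: return 0xc0;
	default: return -1;
	}
}
```

At 1500000 baud, char_time_us() gives about 6.7 µs and rx_timeout_us() about 26.7 µs, matching the figures quoted above.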

But unconditionally lowering the trigger to 1 byte for every snps UART would cost the rest of the system: a serial console at 115200 doesn’t need single-byte interrupts. So the trigger change has to be targeted at UARTs that have a Broadcom BT chip child. The DT already encodes that: the BT subnode under UART2 has compatible = "brcm,bcm43xx". snps_attach() walks its FDT children and matches the prefix:

sys/dev/uart/uart_dev_snps.c

@@ -28,12 +28,14 @@
 #include <sys/bus.h>
 #include <sys/kernel.h>
 #include <sys/module.h>
+#include <sys/sysctl.h>
 #include <machine/bus.h>
 
 #include <dev/uart/uart.h>
 #include <dev/uart/uart_bus.h>
 #include <dev/uart/uart_cpu_fdt.h>
 #include <dev/uart/uart_dev_ns8250.h>
+#include <dev/ic/ns16550.h>
 
 #include <dev/ofw/ofw_bus.h>
 #include <dev/ofw/ofw_bus_subr.h>
@@ -231,11 +233,38 @@
 	sc->reset = reset;
 
 	return (BUS_PROBE_VENDOR);
+}
+
+/*
+ * If a child node compatible with "brcm,bcm43*" is present under this
+ * UART (i.e. a UART-attached Broadcom BT chip), force the RX FIFO
+ * trigger to 1 byte. The BCM4345 sends short HCI Event responses
+ * that would otherwise sit below the default 8-byte trigger and only
+ * be picked up via the RXTOUT char-timeout interrupt.
+ */
+static bool
+snps_has_bcm_child(device_t dev)
+{
+	phandle_t node, child;
+	char compat[64];
+	int len;
+
+	node = ofw_bus_get_node(dev);
+	if (node == -1)
+		return (false);
+	for (child = OF_child(node); child != 0; child = OF_peer(child)) {
+		len = OF_getprop(child, "compatible", compat, sizeof(compat));
+		if (len > 0 && strncmp(compat, "brcm,bcm43", 10) == 0)
+			return (true);
+	}
+	return (false);
 }
 
 static int
 snps_attach(device_t dev)
 {
+	struct snps_softc *sc;
+	struct uart_bas *bas;
 	phandle_t node;
 	int ret;
 
@@ -244,6 +273,17 @@
 		node = ofw_bus_get_node(dev);
 		/* Set up phandle to dev mapping */
 		OF_device_register_xref(OF_xref_from_node(node), dev);
+
+		if (snps_has_bcm_child(dev)) {
+			sc = device_get_softc(dev);
+			bas = &sc->ns8250.base.sc_bas;
+			/* Set RX trigger to 1 byte (every byte interrupts). */
+			sc->ns8250.fcr = (sc->ns8250.fcr & ~0xc0) | FCR_RX_LOW;
+			uart_setreg(bas, REG_FCR, sc->ns8250.fcr);
+			uart_barrier(bas);
+			device_printf(dev,
+			    "BT chip detected; RX FIFO trigger set to 1 byte\n");
+		}
 	}
 
 	return (ret);

After this patch, dmesg shows uart2: BT chip detected; RX FIFO trigger set to 1 byte at boot, and HCI Event latency drops from tens of microseconds to one byte time at 1.5 Mbaud: about 7 µs per byte, with an immediate IRQ on each. The tickler still fires occasionally, but its rate dropped by an order of magnitude.
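One caveat about the prefix match: a devicetree compatible property may hold several NUL-separated strings, and a plain strncmp() on the buffer only inspects the first of them. That is enough when the Broadcom string comes first. A hedged sketch of a matcher that walks every string in the property is below; compat_has_prefix() is a hypothetical helper written for illustration, not part of the patch or of the FreeBSD OFW API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * prop/len describe the raw property bytes as returned by a getprop-style
 * call: zero or more strings, each terminated by a NUL byte.
 * Returns true if any one of them starts with prefix.
 */
static bool
compat_has_prefix(const char *prop, size_t len, const char *prefix)
{
	size_t plen = strlen(prefix);
	size_t off = 0;

	while (off < len) {
		const char *s = prop + off;
		const char *nul = memchr(s, '\0', len - off);
		/* Tolerate a final string that is missing its terminator. */
		size_t slen = (nul != NULL) ? (size_t)(nul - s) : len - off;

		if (slen >= plen && memcmp(s, prefix, plen) == 0)
			return true;
		off += slen + 1;	/* step past this string and its NUL */
	}
	return false;
}
```

With a property such as "vendor,board-bt" followed by "brcm,bcm4345c5", this returns true for the prefix "brcm,bcm43" where a first-string-only comparison would not.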

The patchram dance

The chip boots in ROM mode at 115200 baud and won’t do anything useful — no scanning, no advertising, no SDP — until you stream a vendor .hcd patchram into it. The file is a sequence of HCI command records: 2-byte opcode, 1-byte parameter length, parameters. The loader sends them as HCI commands and waits for Command_Complete on each. The very last record is Launch_RAM (0xfc4e) which resets the chip; you do NOT wait for a completion on that one.
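The record layout is simple enough to sketch as a parser. The loader itself is a Perl script; this C version only illustrates the format described above, and struct hcd_record, hcd_next(), and hcd_expects_reply() are hypothetical names. The opcode is assumed to be stored little-endian, as HCI opcodes are on the wire.

```c
#include <stddef.h>
#include <stdint.h>

#define HCD_OP_LAUNCH_RAM	0xfc4e	/* last record: fire-and-forget */

/* One record of the .hcd file: opcode[2] + plen[1] + params[plen]. */
struct hcd_record {
	uint16_t opcode;
	uint8_t plen;
	const uint8_t *params;	/* points into the caller's buffer */
};

/*
 * Decode the record at the start of buf. Returns the number of bytes it
 * occupies, or 0 if the buffer is too short to hold a whole record.
 */
static size_t
hcd_next(const uint8_t *buf, size_t len, struct hcd_record *rec)
{
	if (len < 3)
		return 0;
	rec->opcode = (uint16_t)(buf[0] | (buf[1] << 8));
	rec->plen = buf[2];
	if (len < 3u + rec->plen)
		return 0;
	rec->params = buf + 3;
	return 3u + rec->plen;
}

/* Every record is acknowledged with Command_Complete except Launch_RAM. */
static int
hcd_expects_reply(const struct hcd_record *rec)
{
	return rec->opcode != HCD_OP_LAUNCH_RAM;
}
```

A loader loop would call hcd_next() repeatedly, prefix each record with the H4 command type byte 0x01 before writing it to the UART, and block on the reply only when hcd_expects_reply() is non-zero.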

Sounds simple. It isn’t.

overlay/usr/local/sbin/bcm_firmware_load.pl is the loader. Every lesson it encodes was paid for in blood.

The loader output for a clean run looks like:

Step 1: HCI_Reset...
Step 2: Read_Local_Name (pre-patchram)... BCM4345C0
Step 3: Download_Minidriver (0xfc2e)...
Step 4: Uploading firmware records from /tmp/BCM4345C0.hcd...
  uploaded 50 records...
  uploaded 100 records...
  ...
  record 1832: Launch_RAM (0xfc4e) — fire-and-forget
Step 4: Uploaded 1832 records, 256441 bytes
Step 5: Waiting 250ms for chip to reset into patched firmware...
Step 6: Post-patch HCI_Reset... chip is alive in patched mode
Step 7: Switch chip baud to 1500000 (VSC 0xfc18)... chip baud set to 1500000
Step 8: Read_Local_Name (post-patchram)... BCM4345C0 Murata Type-1MW UART 37.4 MHz BT 5.0-0187
Step 9: Read_BD_Addr...   BD_Addr: 22:22:b1:f3:11:e2

That’s the chip introducing itself, in its post-patchram identity. The very first time we saw step 9 print a real BD address — instead of timing out, instead of Hardware_Error 0x00, instead of garbled bytes from a baud mismatch — that’s when the chip was truly attached.

What it looks like together

A complete Read_BD_Addr exchange after all three pieces are in place:

HCI trace 2 packets
1776832010.012444 CMD len=4 01 09 10 00
1776832010.013998 EVT len=13 04 0e 0a 01 09 10 00 e2 11 f3 b1 22 22

01 09 10 00 is the HCI Command packet (type 0x01) for opcode 0x1009 (Read_BD_Addr) with no parameters. 04 0e 0a 01 09 10 00 e2 11 f3 b1 22 22 is Command_Complete (event 0x0e) with a parameter length of 0x0a, ncmd=1, opcode 0x1009, status 0x00, and the BD address in little-endian. Four bytes outbound, thirteen bytes inbound, both ends acknowledging each other for the first time.
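The byte-by-byte reading of that event can be written down as a small decoder. This is an illustrative sketch, not code from the tree: parse_read_bd_addr() is a hypothetical helper that accepts exactly the H4-framed Command_Complete for Read_BD_Addr and prints the address most-significant byte first, reversing the little-endian order on the wire.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * pkt is a complete H4 frame: type, event code, plen, then parameters.
 * On success writes "xx:xx:xx:xx:xx:xx" into out and returns 0.
 * Returns -1 if the frame is not a successful Read_BD_Addr completion.
 */
static int
parse_read_bd_addr(const uint8_t *pkt, size_t len, char out[18])
{
	uint16_t opcode;
	const uint8_t *a;

	/* 1 type + 1 code + 1 plen + 10 parameter bytes = 13 */
	if (len != 13 || pkt[0] != 0x04 || pkt[1] != 0x0e || pkt[2] != 0x0a)
		return -1;
	/* pkt[3] is ncmd; the opcode follows, little-endian */
	opcode = (uint16_t)(pkt[4] | (pkt[5] << 8));
	if (opcode != 0x1009 || pkt[6] != 0x00)	/* pkt[6] is status */
		return -1;
	a = &pkt[7];				/* 6 address bytes, LSB first */
	snprintf(out, 18, "%02x:%02x:%02x:%02x:%02x:%02x",
	    a[5], a[4], a[3], a[2], a[1], a[0]);
	return 0;
}
```

Run against the event bytes in the trace, the decoder yields the same address string the loader prints in step 9.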

ng_h4frame is in the tree at src/sys/netgraph/bluetooth/drivers/h4frame/ng_h4frame.c with its module skeleton at src/sys/modules/netgraph/bluetooth/h4frame/Makefile. The patchram loader is at overlay/usr/local/sbin/bcm_firmware_load.pl. The UART trigger fix is the patch above. With those three landed, every essay after this can assume “the chip responds to HCI commands” — which is what essay 10 is going to need before it can fight the next battle.