10 · bluetooth

The HOST_WAKE IRQ saga

Polling, panicking, pivoting to a DT overlay — and discovering the same five PIC methods that bit Goodix had bitten Bluetooth too.

◐ partial

The BCM4345C5 has two GPIO sideband signals to the host — BT_HOST_WAKE (chip → host, “I have data, please come read”) and BT_DEV_WAKE (host → chip, “I’m awake and willing to receive commands”). On the PinePhone Pro these route to GPIO0_PA4 and GPIO2_PD2 respectively, both pulled out of the AzureWave AP6255 module and wired to the RK3399S’s pinctrl. Linux’s hci_bcm driver toggles DEV_WAKE dynamically (asserts before TX, deasserts after idle) and treats HOST_WAKE as an IRQ that wakes the UART out of suspend. Together they let the controller sit in low-power state and still deliver events promptly.

Without HOST_WAKE wired into the kernel, you have two options. Either poll the chip’s UART forever (burns power, adds latency, and triggers the trailing-byte stall from essay 9). Or pin DEV_WAKE high so the chip never gates RX, and live without the wake-on-event. We did the second. But before we got there, we tried — and broke — most of the alternatives.

This is also the story where a problem we’d already solved for the touchscreen (essay 6) turned out to have been blocking Bluetooth all along.

Why HOST_WAKE matters

Without HOST_WAKE, the chip sometimes holds events in its TX FIFO until host TX activity flushes them — the trailing-byte stall from essay 9. The h4frame TX tickler is a workaround for exactly that condition. With HOST_WAKE asserting on event arrival, the host knows immediately that data is ready; we can stop poking the chip with NULL bytes and instead service it on demand.

Even with sleep_mode=0 firmware, the BCM4345 gates command acceptance on the DEV_WAKE level, so DEV_WAKE has to be pinned high or asserted before each TX. We discovered this empirically, and bcm_hostwake.c documents it at the top of the file: dropping DEV_WAKE after idle causes hccontrol timeouts on the very next command. Linux’s dynamic toggle works because Linux uses sleep_mode=1 and the chip honors LPM. With our sleep_mode=0 we don’t get LPM benefits, so the simplest correct policy is “DEV_WAKE high forever.”

The polling phase

The first attempt at HOST_WAKE was a 1 kHz callout in a custom bcm_hostwake.ko (commit cd6b7ed, “bcm_hostwake: experimental module probing BCM4345 HOST_WAKE GPIO”) that polled the GPIO via the gpioc(4) userland-facing API and drove DEV_WAKE on rising edges. Crude, lossy, but enough to prove the contract: with this in place, SSP Simple_Pairing_Complete came back with status=0x00 instead of timing out. The mechanism was right; the implementation was wasteful.
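To make “crude, lossy” concrete, here is a minimal userspace sketch of the rising-edge detection a per-tick poller performs. It is not the code from bcm_hostwake.ko: the names hw_poller, hw_poll_tick and hw_poll_trace are hypothetical, and the pin level is passed in as a plain boolean rather than read from a GPIO.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical poller state: remembers the previously sampled level. */
struct hw_poller {
	bool last_level;	/* level seen on the previous tick */
	unsigned long edges;	/* rising edges observed so far */
};

/*
 * Called once per tick (every 1 ms at 1 kHz) with the freshly sampled
 * HOST_WAKE level. Returns true when this sample is a rising edge,
 * which is the point where a poller would react (e.g. drive DEV_WAKE).
 */
bool
hw_poll_tick(struct hw_poller *p, bool level)
{
	bool rising;

	assert(p != NULL);
	rising = level && !p->last_level;
	p->last_level = level;
	if (rising)
		p->edges++;
	return (rising);
}

/* Feed a whole trace of per-tick samples through the poller. */
unsigned long
hw_poll_trace(const bool *samples, size_t n)
{
	struct hw_poller p = { .last_level = false, .edges = 0 };

	for (size_t i = 0; i < n; i++)
		(void)hw_poll_tick(&p, samples[i]);
	return (p.edges);
}
```

The loss is visible in the model: the poller only ever sees one sample per tick, so a HOST_WAKE pulse that rises and falls between two ticks never appears in the trace and is never counted. An interrupt on the edge itself does not have that blind spot.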

The IRQ attempt that panicked

Moving from polling to interrupts on a GPIO line in FreeBSD INTRNG goes through gpio_alloc_intr_resource(). The first argument is the consumer device — the device that will be receiving the interrupt. We passed gpio_dev (the rk_gpio controller’s device_t) as the consumer, reasoning that the rk_gpio device is what’s hosting the IRQ source.

[WAR STORY]

simplebus alloc_resource panic on rk_gpio consumer

bcm_hostwake / simplebus

▸ symptom

Kernel panic on kldload bcm_hostwake. Backtrace: simplebus_alloc_resource → bus_generic_rl_get_resource → null deref. The simplebus child accounting walks the consumer’s FDT-derived resource list looking for the synthetic IRQ rid that gpio_alloc_intr_resource allocated. rk_gpio has no such rid in its own resource list: it’s a GPIO controller, not a consumer of its own IRQs.

▸ hypothesis 1

Wrong rid. Maybe gpio_alloc_intr_resource is returning a rid that simplebus doesn’t recognize. Audited the rid handling — it was correct, the rid was the one freshly-minted by the GPIO PIC for this consumer. The lookup wasn’t the problem; the consumer identity was.

▸ hypothesis 2

Maybe pass the bcm_hostwake’s own device_t instead. But our bcm_hostwake module wasn’t a device at the time — it was a one-shot MOD_LOAD callback that grabbed the GPIO and registered a callout. There was no device_t to attach to.

▸ breakthrough

The pattern in upstream FreeBSD’s gpio-keys and gpiopps is: the consumer must be a separate simplebus child device with its own DT node and compatible string. Then simplebus allocates its resource list from that DT node, and gpio_alloc_intr_resource(sc->dev, ...) works because sc->dev is the simplebus child’s device_t, not the GPIO controller. We need a DT node.

▸ fix

Rewrite bcm_hostwake.c as a simplebus child driver matching compatible = "pine64,bcm-hostwake" (commit 4fedd5c, “bcm_hostwake: rewrite as simplebus FDT driver for real GPIO IRQ”). Register it via DRIVER_MODULE(bcm_hostwake, simplebus, ...). Add a DT node so simplebus has something to attach to. Inside bhw_attach, call gpio_pin_get_by_ofw_property(dev, node, "host-wakeup-gpios", ...), then gpio_alloc_intr_resource(dev, ...) with the child’s device_t, then bus_setup_intr with our IRQ filter.

DT overlay, not base-DTB rebuild

Adding a DT node is normally a base-DTB problem: edit the upstream rk3399-pinephone-pro.dts, regenerate the DTB, drop it into /boot/dtb/. We tried that. Twice. Both times we hit something stupid: first typoed labels (bt_host_wake_l is what the upstream pinctrl labels say, not bt_host_wake_pin; commit 0b3ae0c, “dts patch: use correct bt pinctrl labels (bt_host_wake_pin / bt_wake_pin)”, fixed the labels), then line-number drift in our patch (commit 8f3fdeb, “dts patch: fix line numbers for bcm-hostwake insertion”). Each iteration left a broken DTB on the SD card and required a U-Boot recovery cycle.

The pivot: FreeBSD’s loader supports FDT overlays via fdt_overlays in loader.conf. Compile a .dtbo, drop it in /boot/dtb/overlays/, set the variable, done. No upstream DTS surgery required, no risk of bricking boot, and the overlay applies cleanly on top of whatever the base DTB happens to be. Commit 38146c0 (“bcm-hostwake: switch to DT overlay instead of base-DTB patch”) moved everything to that approach.

The overlay itself is a 34-line file:

src/sys/dts/arm64/overlays/bcm-hostwake.dtso (excerpt)
	bcm-hostwake {
		compatible = "pine64,bcm-hostwake";
		host-wakeup-gpios   = <&gpio0 4  0>;
		device-wakeup-gpios = <&gpio2 26 0>;
		pinctrl-names = "default";
		pinctrl-0 = <&bt_host_wake_l &bt_wake_l>;
		status = "okay";
	};

Two GPIO references and a pinctrl pair. RK_PA4 is encoded as bank-offset 4 (bank 0, pin 4); RK_PD2 as 26 (bank 2, pin 3*8+2). The pinctrl-names / pinctrl-0 pull in the upstream pinctrl labels (bt_host_wake_l, bt_wake_l) so the GPIOs are in the right pinmux state at attach.
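The offset arithmetic is easy to get wrong by hand, so here is a small sketch of it. The helper name rk_pin_offset is hypothetical (it is not a FreeBSD or Rockchip API); it only assumes what the paragraph above states, namely that each controller has four sub-banks A to D of eight pins each.

```c
#include <assert.h>

/* Each Rockchip GPIO sub-bank (A, B, C, D) holds 8 pins. */
#define RK_PINS_PER_GROUP	8

/*
 * Convert a sub-bank letter ('A'..'D') and a pin index (0..7) into the
 * per-controller offset used in a gpios cell. Returns -1 on bad input.
 */
int
rk_pin_offset(char group, int pin)
{
	if (group < 'A' || group > 'D' || pin < 0 || pin >= RK_PINS_PER_GROUP)
		return (-1);
	return ((group - 'A') * RK_PINS_PER_GROUP + pin);
}

/* Sanity check against the two pins referenced by the overlay. */
void
rk_pin_offset_selfcheck(void)
{
	assert(rk_pin_offset('A', 4) == 4);	/* GPIO0_PA4, HOST_WAKE */
	assert(rk_pin_offset('D', 2) == 26);	/* GPIO2_PD2, DEV_WAKE */
}
```

The bank number itself (0 or 2 here) is not part of the offset; it is selected by the phandle (&gpio0, &gpio2) in the first cell.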

And then it didn’t work — again

With the DT overlay loaded, the bcm_hostwake driver attached. gpio_alloc_intr_resource returned a non-NULL struct resource *. bus_setup_intr returned ENXIO.

If you read essay 6 (Goodix and the PIC methods), the next paragraph writes itself.

[WAR STORY]

The same five missing PIC methods, again

bcm_hostwake / rk_gpio

▸ symptom

bcm_hostwake0: bus_setup_intr: 6 (ENXIO) at attach. The IRQ resource was allocated. The DT overlay wired up cleanly. Nothing in our driver was suspect. The error was identical to the one that had stalled Goodix touch — same return code, same call site.

▸ breakthrough

Same root cause. rk_gpio.c had registered a PIC class with only three of the eight INTRNG-required methods (pic_map_intr, pic_setup_intr, pic_teardown_intr) — missing pic_enable_intr, pic_disable_intr, pic_pre_ithread, pic_post_ithread, pic_post_filter. Calling any of the missing methods returns kobj_error_method’s ENXIO. bus_setup_intr calls pic_enable_intr after pic_setup_intr. Boom.

The five missing methods landed for Goodix in essay 6 (commit 5d6a594, “rk_gpio: add missing PIC methods for GPIO interrupts (enable/disable/pre/post)”). With those methods present in rk_gpio, bus_setup_intr for bcm_hostwake suddenly returned 0.
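The failure mode is easier to see in a toy model. The sketch below is not FreeBSD’s kobj or INTRNG code: every name in it (toy_pic, toy_call, toy_setup_intr and so on) is hypothetical, and real kobj dispatch goes through method tables rather than NULL checks. It only models the two facts stated above: an unregistered method falls back to a default that returns ENXIO, and the setup path calls the enable method after the setup method.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy stand-in for a PIC method: takes an opaque source, returns errno. */
typedef int (*pic_method_t)(void *isrc);

/* Toy method table; a NULL slot means the driver never registered it. */
struct toy_pic {
	pic_method_t setup_intr;
	pic_method_t enable_intr;
};

/* Default for unregistered methods, modelled on kobj_error_method(). */
int
toy_error_method(void *isrc)
{
	(void)isrc;
	return (ENXIO);
}

/* A registered method that simply succeeds. */
int
toy_ok(void *isrc)
{
	(void)isrc;
	return (0);
}

/* Dispatch helper: use the ENXIO default when the slot is empty. */
int
toy_call(pic_method_t m, void *isrc)
{
	return ((m != NULL ? m : toy_error_method)(isrc));
}

/* Model of the setup path: setup first, then enable; first error wins. */
int
toy_setup_intr(const struct toy_pic *pic, void *isrc)
{
	int error;

	assert(pic != NULL);
	error = toy_call(pic->setup_intr, isrc);
	if (error != 0)
		return (error);
	return (toy_call(pic->enable_intr, isrc));
}
```

With only setup_intr populated, toy_setup_intr returns ENXIO even though the setup step itself succeeded; once enable_intr is filled in, the same call returns 0. Nothing in the caller changes between the two cases, which matches what both the Goodix and bcm_hostwake bringups saw.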

▸ fix

There was nothing to fix in bcm_hostwake. The rk_gpio fix from essay 6 was the fix for HOST_WAKE too. Two unrelated GPIO IRQ consumers — a touchscreen on bank 1 pin C5, a Bluetooth wake on bank 0 pin A4 — both blocked on the same set of missing PIC methods that nobody had ever needed to implement because the upstream FreeBSD tree had no in-tree consumer that wanted GPIO interrupts on this controller.

▸ lesson

In a downstream port, every untested code path in upstream is a bug waiting for your customer. The rk_gpio PIC methods had been missing for years. No upstream device cared. The Goodix touch panel and the Broadcom BT chip both cared. Two separate driver bringups, two unrelated subsystems, both hit the same bug independently, and the second one was free, because we’d already paid the cost of finding it the first time.

The corollary: when you’re the first person to use a code path, you are also the test suite for everything beneath it. The bug is rarely in your driver. It’s in the foundation that nobody had stress-tested in your direction.

What the driver actually does

src/sys/modules/bcm_hostwake/bcm_hostwake.c is 168 lines. It probes pine64,bcm-hostwake, grabs both GPIOs, sets DEV_WAKE high once at attach, registers a rising-edge IRQ on HOST_WAKE, and counts edges in hw.bcm_hostwake.irqs. The IRQ handler does almost no work:

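/* Interrupt filter for HOST_WAKE: counts the edge and nothing else. */
static int
bhw_intr(void *arg __unused)
{
    /* bhw_irqs is the edge counter reported via hw.bcm_hostwake.irqs. */
    atomic_add_long(&bhw_irqs, 1);
    return (FILTER_HANDLED);
}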

No DEV_WAKE pulse, no notification to ng_hci, no reading of UART data. Just a counter. The point of the IRQ-driven path right now is to prove the IRQ path works and to provide a sysctl that increments visibly during BT activity. The actual servicing of HCI events still happens via ng_h4frame getting bytes from ng_tty getting bytes from the UART RX interrupt.

This is intentional. The DEV_WAKE policy (“pin high forever”) is documented at the top of the file: with sleep_mode=0 firmware the chip doesn’t honor LPM, so dynamic DEV_WAKE toggling is at best wasted, at worst breaks TX. Pin it high and move on.

Status: partial, not working

This essay is marked partial because the integration with ng_h4frame’s tickler is unfinished. The honest debate inside the project right now: with HOST_WAKE armed and bhw_irqs incrementing on every event, do we still need the tickler at all? Empirically, the trailing-byte stall hasn’t been reproduced in the last 30+ pairing flows since HOST_WAKE went in. But the tickler is still enabled (default hw.h4frame_tickle_enable=1) because nobody has done the controlled experiment to confirm it’s now redundant. Disabling it might surface a remaining stall path; leaving it on costs one packet every few seconds in the worst case. We left it on.

The other deferred item: the IRQ handler should eventually do something useful — at minimum, kick the UART RX path so we don’t wait for the next cuau0 interrupt. That requires plumbing from bcm_hostwake into ng_tty or into the snps UART driver, neither of which has a clean entry point yet. Filed.

The chip is now reachable, attached, and emitting events the kernel sees promptly. Next we have to make it pair with something — which is where SSP, Secure Connections, and a firmware swap come in.