Appendix · reference

Cross-driver audit (2026-04-30)

Side-by-side review of bwfm/dwmmc, rk_vop, fusb302+rk818, and rt5640 against Linux/OpenBSD references — divergences, suspect band-aids, and the prioritized fix list.

This is the punch-list that came out of auditing five subsystems against their Linux mainline (and where applicable OpenBSD) references on 2026-04-30, after the WiFi stability arc landed and we picked our heads up. The audit lists what each driver does, what the reference implementation does, and what is most likely the cause of the next user-visible flake. Recommendations are actionable: file, line, register, bit. Every item is falsifiable on the next bench session.

The site already has subsystem essays for the surface symptoms — this page is the diff between what we ship and what the canonical drivers do.

SDIO IRQ delivery — DWMMC + bwfm

The current bench reality (2026-04-29 logs): WiFi associates, completes a 2 MiB transfer with cmd53_err=0, but dev.rockchip_dwmmc.0.sdio_intrs stays at 0. The driver only survives because bwfm_sdio_poll_callout ticks at 100 Hz and enqueues the task regardless of whether the chip raised an IRQ.

Reference path (Linux mainline)

drivers/mmc/host/dw_mmc.c:

The order is CLKENA(LP=0) + UPD_CLK → INTMASK |= SDIO → CCCR INT_ENABLE = 0x03.
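Here is a minimal sketch of that order, run against a mock register file rather than real hardware. Offsets and bit positions follow Linux's dw_mmc.h. Several pieces are assumptions for illustration: the mock_host struct, the helper names, the update-clock command that completes instantly, and the stale-RINTSTS clear in step 3. That clear is our overlay's addition and is not part of the Linux sequence.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Offsets and bits as in Linux drivers/mmc/host/dw_mmc.h. */
#define SDMMC_CLKENA            0x010
#define SDMMC_INTMASK           0x024
#define SDMMC_CMD               0x02c
#define SDMMC_RINTSTS           0x044
#define SDMMC_CLKEN_ENABLE      (1u << 0)
#define SDMMC_CLKEN_LOW_PWR     (1u << 16)
#define SDMMC_INT_SDIO(n)       (1u << (16 + (n)))
#define SDMMC_CMD_PRV_DAT_WAIT  (1u << 13)
#define SDMMC_CMD_UPD_CLK       (1u << 21)
#define SDMMC_CMD_START         (1u << 31)

#define LOG_CCCR_WRITE          0xfffu  /* pseudo-offset for the CMD52 to the card */

/* Hypothetical mock: register file plus a log of write offsets, in order. */
struct mock_host {
    uint32_t regs[32];
    uint32_t log[8];
    size_t   nlog;
    uint8_t  cccr_int_enable;   /* card CCCR Int Enable register */
};

uint32_t read4(struct mock_host *h, uint32_t off)
{
    return h->regs[off / 4];
}

void write4(struct mock_host *h, uint32_t off, uint32_t val)
{
    assert(h->nlog < 8);
    h->log[h->nlog++] = off;
    if (off == SDMMC_RINTSTS)
        h->regs[off / 4] &= ~val;                     /* write-1-to-clear */
    else if (off == SDMMC_CMD)
        h->regs[off / 4] = val & ~SDMMC_CMD_START;    /* mock CIU accepts at once */
    else
        h->regs[off / 4] = val;
}

void card_write_cccr_int_enable(struct mock_host *h, uint8_t v)
{
    assert(h->nlog < 8);
    h->log[h->nlog++] = LOG_CCCR_WRITE;
    h->cccr_int_enable = v;
}

int sdio_irq_enable(struct mock_host *h, int slot)
{
    uint32_t clkena;
    int spins;

    /* 1. Ungate the card clock: set ENABLE, clear LOW_PWR for this slot. */
    clkena = read4(h, SDMMC_CLKENA);
    clkena |= SDMMC_CLKEN_ENABLE << slot;
    clkena &= ~(SDMMC_CLKEN_LOW_PWR << slot);
    write4(h, SDMMC_CLKENA, clkena);

    /* 2. Latch it with an update-clock command; poll until START clears. */
    write4(h, SDMMC_CMD, SDMMC_CMD_START | SDMMC_CMD_UPD_CLK | SDMMC_CMD_PRV_DAT_WAIT);
    for (spins = 0; read4(h, SDMMC_CMD) & SDMMC_CMD_START; spins++)
        if (spins > 1000)
            return -1;

    /* 3. Drop SDIO status latched while the clock was gated. */
    write4(h, SDMMC_RINTSTS, SDMMC_INT_SDIO(slot));

    /* 4. Only now unmask SDIO at the host. */
    write4(h, SDMMC_INTMASK, read4(h, SDMMC_INTMASK) | SDMMC_INT_SDIO(slot));

    /* 5. Last, let the card signal: master enable (bit 0) + function 1 (bit 1). */
    card_write_cccr_int_enable(h, 0x03);
    return 0;
}
```

Seeding CLKENA with LOW_PWR set and a stale RINTSTS.SDIO bit, the write log should read CLKENA, CMD, RINTSTS, INTMASK, then the CCCR write.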

Our path

patches/sys/dev/mmc/host/dwmmc.c.patch::dwmmc_cam_sdio_intr (added by 916fe51 “Add SDIO IRQ support for bwfm” and extended by 5c4b836 “Keep DWMMC clock running for SDIO IRQ”) used to set INTMASK first and then CLKENA + UPD_CLK, the reverse of Linux. The current overlay arms the handler state first, commits CLKENA with low-power mode cleared, clears stale RINTSTS.SDIO, and only then unmasks INTMASK.SDIO. The bench predicate is still dev.rockchip_dwmmc.0.sdio_intrs > 0 during a bounded transfer.

Two further gaps remain. The dwmmc_intr() SDIO branch acks RINTSTS but does not clear the SDIO bit in INTMASK when it fires, so the source stays permanently armed. dwmmc_attach() installs an initial INTMASK that does not include SDMMC_INTMASK_SDIO, so any code path that re-issues that mask silently disarms us.

src/sys/dev/bwfm/bwfm_sdio.c:

Top divergences ranked by “could explain sdio_intrs=0”

  1. DAT[1] pinctrl pull configuration. SDIO function-1 IRQ is signaled by the card pulling DAT[1] low while in 4-bit mode. If our DTS pinctrl for the SDIO node leaves DAT[1] without a pull-up (or with the wrong drive strength), the card asserts low but the controller front-end sees nothing — exactly matching the symptom (CMD53 holds the line driven during a transfer, so polling works; between transfers the line floats and the IRQ is lost). This is the single highest-likelihood root cause.

  2. INTMASK / CLKENA write order. Linux is explicit that the clock must be ungated before INTMASK_SDIO is set; otherwise an initial spurious DAT[1] edge during clock startup can be latched and the real IRQ lost.

  3. No mask-on-fire in dwmmc_intr SDIO branch. Linux disables INTMASK_SDIO in the handler and re-enables after sdio_signal_irq returns. We don’t, which means a card holding DAT[1] low through our function handler will create an IRQ storm — but right now no IRQ ever fires in the first place, so we haven’t hit it.
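Item 3 can be modeled with two integers that stand in for INTMASK and RINTSTS. host_intr() follows what Linux's dw_mci_interrupt() does for the SDIO bit: ack, mask, signal. host_sdio_irq_done() plays the part of Linux's ack_sdio_irq hook, which re-arms the source. The struct and function names are hypothetical, and MINTSTS is approximated as RINTSTS & INTMASK.

```c
#include <assert.h>
#include <stdint.h>

#define SDMMC_INT_SDIO(n) (1u << (16 + (n)))   /* as in Linux dw_mmc.h */

/* Hypothetical model: two words stand in for INTMASK and RINTSTS. */
struct sdio_host {
    uint32_t intmask;
    uint32_t rintsts;      /* raw status, write-1-to-clear on hardware */
    unsigned signals;      /* times the SDIO task was scheduled */
};

/* Interrupt side: ack, mask the SDIO source, then hand off to the task. */
void host_intr(struct sdio_host *h, int slot)
{
    uint32_t bit = SDMMC_INT_SDIO(slot);
    uint32_t pending = h->rintsts & h->intmask;   /* what MINTSTS reports */

    if (pending & bit) {
        h->rintsts &= ~bit;    /* ack */
        h->intmask &= ~bit;    /* mask-on-fire */
        h->signals++;          /* analogue of sdio_signal_irq() */
    }
}

/* Task side, after the function driver has run: re-arm the source. */
void host_sdio_irq_done(struct sdio_host *h, int slot)
{
    h->intmask |= SDMMC_INT_SDIO(slot);
}
```

A card that keeps DAT[1] low while masked produces one delivery per re-arm, not a storm.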

Recommendations (priority order)

  1. Audit the sdmmc0 / sdio0 pinctrl entries in our PPP DTS for explicit bias-pull-up on DAT[0..3] and CMD. Cross-reference the upstream rk3399.dtsi and the postmarketOS PPP DTS. If DAT[1] has no pull, no software fix in dwmmc.c will help.
  2. Reorder dwmmc_cam_sdio_intr to write CLKENA (clear LP) + UPD_CLK busy-poll before INTMASK. After UPD_CLK and before INTMASK, also clear any latched stale state with WRITE4(SDMMC_RINTSTS, SDMMC_INTMASK_SDIO).
  3. Add a single capture point in sdiob_claim_irq: dump INTMASK / MINTSTS / RINTSTS / GRF pull register for the SDIO pads immediately after the MMC_SIM_SDIO_INTR(...,true) call and before the CCCR 0x00 → 0x03 write. The next bench session can then say in one log entry whether the host bit is set, the GPIO is pulled, and whether the card has already raised the line.
  4. Mirror Linux’s mask-on-fire in dwmmc_intr SDIO branch once IRQs are flowing.
  5. Stop bwfm_sdio_poll_callout once sc_irq_count > 0 so logs stop conflating IRQ and poll-driven progress.
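For recommendation 3, a small decoder can reduce the three status words from one capture to a single verdict. That way each bench log entry answers the host-side question directly. The sdio_capture record and the verdict names are hypothetical. The GRF pull value is carried raw because its pad-field layout is board-specific and is not given here.

```c
#include <assert.h>
#include <stdint.h>

#define SDMMC_INT_SDIO(n) (1u << (16 + (n)))

/* Hypothetical record captured in sdiob_claim_irq. */
struct sdio_capture {
    uint32_t intmask;
    uint32_t mintsts;
    uint32_t rintsts;
    uint32_t grf_pull_raw;   /* logged raw; pad decoding is board-specific */
};

enum sdio_verdict {
    SDIO_HOST_NOT_ARMED,     /* INTMASK.SDIO clear, nothing raised */
    SDIO_RAISED_BUT_MASKED,  /* card raised DAT[1], host bit clear */
    SDIO_ARMED_IDLE,         /* host armed, card quiet */
    SDIO_DELIVERABLE,        /* MINTSTS.SDIO set: dwmmc_intr should see it */
    SDIO_INCONSISTENT        /* raw set and armed but MINTSTS clear: re-read */
};

enum sdio_verdict sdio_capture_verdict(const struct sdio_capture *c, int slot)
{
    uint32_t bit = SDMMC_INT_SDIO(slot);
    int armed = (c->intmask & bit) != 0;
    int raised = (c->rintsts & bit) != 0;
    int masked_pending = (c->mintsts & bit) != 0;

    if (!armed)
        return raised ? SDIO_RAISED_BUT_MASKED : SDIO_HOST_NOT_ARMED;
    if (masked_pending)
        return SDIO_DELIVERABLE;
    return raised ? SDIO_INCONSISTENT : SDIO_ARMED_IDLE;
}
```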

Strange code

bwfm — TX, NVRAM, scan, events

Things still OpenBSD-shaped

NVRAM padding (likely root cause of intermittent firmware weirdness)

bwfm_sdio.c:1057-1075: after stripping trailing NULs we do nvlen += 2, then roundup(nvlen, 4). Linux’s brcmf_fw_nvram_strip() does roundup(nvram_len + 1, 4), which is exactly one trailing NUL plus alignment fill. We always add one more NUL than Linux. On any input whose stripped length is ≡ 3 (mod 4), that extra NUL pushes the padded length up by one 32-bit word. The trailer token therefore lands four bytes later and encodes one more word than Linux would, and the firmware-side check of the NVRAM block disagrees with the count we encode in the trailer.

The fix is a one-character change, nvlen += 2 to nvlen += 1 at bwfm_sdio.c:1060, plus the matching adjustment to the nvlen - 2 reference at line 1068.
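The padding arithmetic is easy to check off-target. This sketch assumes the stripped length is measured before any NULs are added. nvram_padded_len_linux() follows brcmf_fw_nvram_strip(), and nvram_padded_len_ours() models the current nvlen += 2 path as described above. nvram_token() builds the trailer word the way Linux does, with the word count in the low half and its complement in the high half. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ROUNDUP4(x) ((((x) + 3u) / 4u) * 4u)

/* Linux brcmf_fw_nvram_strip(): one NUL, then pad to a 4-byte boundary. */
size_t nvram_padded_len_linux(size_t stripped)
{
    return ROUNDUP4(stripped + 1);
}

/* Current bwfm_sdio.c behaviour as described: two NULs, then pad. */
size_t nvram_padded_len_ours(size_t stripped)
{
    return ROUNDUP4(stripped + 2);
}

/* Trailer token: padded length in 32-bit words, complement in the high half. */
uint32_t nvram_token(size_t padded_len)
{
    uint32_t words = (uint32_t)(padded_len / 4);
    return (~words << 16) | (words & 0xffffu);
}
```

Only stripped lengths ≡ 3 (mod 4) differ, and always by exactly one word. A 7-byte input pads to 8 in Linux (token 0xFFFD0002) but to 12 with the extra NUL (token 0xFFFC0003).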

Scan, credits, events

Strange code / band-aids

rk_vop — the modeset-lock wedge

The modeset-lock wedge is the open WarStory in essay 08. The audit found three structurally suspect items in the same code path; two of them match the wedge symptoms exactly.

Reference path (Linux mainline)

drivers/gpu/drm/rockchip/rockchip_drm_fb.c: the mode_config helpers point at the stock drm_atomic_helper_commit_tail_rpm, which runs modeset_disables → modeset_enables → commit_planes(ACTIVE_ONLY) → fake_vblank → commit_hw_done → wait_for_vblanks → cleanup_planes.

rockchip_drm_vop.c::vop_crtc_atomic_flush (1618-1685) latches the event into vop->event under event_lock, calls drm_crtc_vblank_get, then vop_cfg_done. The IRQ handler (vop_isr 1393-1455) on FS_INTR calls drm_crtc_handle_vblank, then vop_handle_vblank (2234-2246) drains vop->event with drm_crtc_send_vblank_event and calls drm_crtc_vblank_put.

The point is that the event lives on the driver between flush and vblank IRQ — never on crtc->state once flush has run.
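The Linux latch pattern reduces to a small piece of state. In this single-threaded model the event is an opaque pointer, vblank_refs counts drm_crtc_vblank_get/put, and events_sent counts drm_crtc_send_vblank_event. In the real driver both functions run under event_lock. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical single-threaded model of the vop->event latch. */
struct mock_crtc {
    void *state_event;   /* crtc->state->event */
    void *vop_event;     /* driver-side latch (vop->event) */
    int   vblank_refs;   /* drm_crtc_vblank_get/put balance */
    int   events_sent;   /* drm_crtc_send_vblank_event calls */
    int   warnings;      /* WARN_ON hits */
};

/* atomic_flush: move the event from the state to the driver (event_lock held). */
void vop_flush(struct mock_crtc *c)
{
    if (c->state_event != NULL) {
        if (c->vop_event != NULL)
            c->warnings++;              /* WARN_ON(vop->event) */
        c->vblank_refs++;               /* drm_crtc_vblank_get */
        c->vop_event = c->state_event;
        c->state_event = NULL;
    }
}

/* FS_INTR path, after drm_crtc_handle_vblank (event_lock held). */
void vop_handle_vblank(struct mock_crtc *c)
{
    if (c->vop_event != NULL) {
        c->events_sent++;               /* drm_crtc_send_vblank_event */
        c->vblank_refs--;               /* drm_crtc_vblank_put */
        c->vop_event = NULL;
    }
}
```

The invariant worth checking on the bench is the one the model makes obvious: vblank_refs returns to zero after every vblank that drains an event.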

Our path

src/sys/dev/drm/rockchip/rk_drm.c:147-165: a custom rk_drm_atomic_commit_tail that does modeset_disables → commit_planes → modeset_enables → fake_vblank → commit_hw_done → cleanup_planes. The comment at 157-162 explicitly says wait_for_vblanks was removed because “the wait can deadlock when a GPU reset parks the scheduler thread while a DRM commit is in flight”. That diagnosis predates the panfrost reset rework (3378d3d “Queue Panfrost reset before scheduler stop”, cf2dd90 “Fix Panfrost reset fence signaling context”).

src/sys/dev/drm/rockchip/rk_vop.c:428-444: rk_crtc_atomic_begin sends crtc->state->event immediately and nulls it. rk_crtc_atomic_flush (446-481) re-reads crtc->state->event; if begin already drained it, flush is a no-op. Both paths exist, and the prior git history (per the agent’s read of revert chains) shows them fighting across multiple revisions.

rk_vop.c:269-290: rk_vop_intr reads INTR_STATUS0, then writes ~0 to INTR_CLEAR0 (line 280). That clears every pending bit in the status register, including bits that asserted after the read. On a shared interrupt line, a frame completion that arrives between the read and the write is silently lost.
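The race and the narrower ack can be modeled without hardware. Linux's rockchip_vop_reg.c declares the RK3399 interrupt clear register with a write mask (VOP_REG_MASK_SYNC): bits 31:16 select which of bits 15:0 a write touches. For that reason the sketch acks with (status << 16) | status. If rk_vop.c's register helper already adds the mask half, plain status is what it should be handed. FS_INTR at bit 0 matches Linux's rockchip_drm_vop.h. OTHER_INTR, the return codes, and all names are stand-ins. Like recommendation 3, the sketch leaves non-FS bits unacked when FS_INTR is clear, which is only safe if INTR_EN0 enables FS_INTR alone.

```c
#include <assert.h>
#include <stdint.h>

#define FS_INTR     (1u << 0)   /* as in Linux rockchip_drm_vop.h; check rk_vop.c */
#define OTHER_INTR  (1u << 1)   /* hypothetical stand-in for any other source */

enum { INTR_STRAY = 0, INTR_HANDLED = 1 };  /* stand-ins for FILTER_STRAY / FILTER_HANDLED */

struct mock_vop {
    uint32_t status;     /* INTR_STATUS0, low 16 bits */
    unsigned fs_count;   /* frames accounted to vblank */
};

/* Write-masked clear: bits 31:16 select which of bits 15:0 the write applies to. */
void intr_clear0_write(struct mock_vop *v, uint32_t val)
{
    v->status &= ~(val & (val >> 16) & 0xffffu);
}

/* `late` models bits that assert between the status read and the ack write. */
int vop_intr_model(struct mock_vop *v, uint32_t late)
{
    uint32_t status = v->status & 0xffffu;

    v->status |= late;
    if ((status & FS_INTR) == 0)
        return INTR_STRAY;
    intr_clear0_write(v, (status << 16) | status);   /* ack exactly what we read */
    v->fs_count++;
    return INTR_HANDLED;
}
```

Writing ~0 through the same helper clears every low bit, including a frame start that arrived after the read.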

Top divergences ranked by “could explain the wedge”

  1. drm_atomic_helper_wait_for_vblanks removed. Without this, cleanup_planes runs synchronously in the same commit as commit_hw_done. A later commit’s wait_for_dependencies then sees a commit->flip_done that nobody completes. Exact symptom: [CRTC:33:crtc-0] flip_done timed out on the wedge screenshot.
  2. Shared-IRQ ack writes ~0. rk_vop.c:280 clobbers any in-flight FS_INTR by clearing the entire status register. Frame completions that race the ack are lost — so vblank counter advances stop, the CRTC commit chain stalls.
  3. Double-event drain. Both atomic_begin and atomic_flush handle the event, in different ways, with no vop->event driver-side latch. vblank_get is taken in flush (line 470) but never balanced by a vblank_put (the IRQ never sees the event because the IRQ doesn’t drain vop->event — there is no vop->event). A vblank_get without a matching vblank_put is exactly the kind of imbalance that surfaces as “flip_done never completes” under sustained load.
  4. vblank_get called with event_lock held. rk_vop.c:465 takes event_lock then 470 calls drm_crtc_vblank_get, which takes vbl_lock. The IRQ path takes vbl_lock and may signal a waiter on event_lock. Classic A-B / B-A.

Recommendations (priority order)

  1. Match Linux’s vop->event pattern. Add vop_event to softc. In rk_crtc_atomic_flush only: under event_lock, WARN_ON(vop_event != NULL); WARN_ON(drm_crtc_vblank_get(crtc) != 0); vop_event = state->event; state->event = NULL. Gut rk_crtc_atomic_begin’s event handling to a no-op. In rk_vop_intr after drm_crtc_handle_vblank, call a new rk_vop_handle_vblank(sc) that under event_lock does drm_crtc_send_vblank_event(crtc, vop_event); drm_crtc_vblank_put(crtc); vop_event = NULL.
  2. Restore drm_atomic_helper_wait_for_vblanks in rk_drm_atomic_commit_tail before cleanup_planes. The original panfrost-reset deadlock concern is mooted by the reset path after 3378d3d “Queue Panfrost reset before scheduler stop” (deferred to taskqueue, fence signaled before scheduler stop).
  3. Fix the shared-IRQ ack: change INTR_CLEAR0 = ~0 to INTR_CLEAR0 = status, and gate the handler with if ((status & INTR_STATUS0_FS_INTR) == 0) return; (return FILTER_STRAY if converted to a filter).
  4. Add a per-commit ringbuffer sysctl for scripts/wedge-repro to bisect the remaining cases when (1)-(3) land. Capture per commit: (timestamp, crtc_seq, atomic_begin_event_ptr, atomic_flush_event_ptr, vop_event_ptr, fs_intr_count, vbl_counter, curthread).
  5. Drop the U-Boot-preserve early-return in rk_crtc_atomic_enable (rk_vop.c:502-514) once first_modeset_done is set. Today every Hyprland output reconfig goes through the short-circuit and skips timing reprogramming.
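A per-commit ring for recommendation 4 could look like this. The field names follow the tuple listed above. The depth, the struct names, and the single-writer assumption are ours; a kernel version would record under the commit path's lock or use an atomic head. The snapshot returns the newest records oldest-first, which is the order a sysctl handler would want to print.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define WEDGE_RING_LEN 64   /* hypothetical depth; power of two so indices wrap with a mask */

struct wedge_rec {
    uint64_t  timestamp;
    uint64_t  crtc_seq;
    uintptr_t atomic_begin_event_ptr;
    uintptr_t atomic_flush_event_ptr;
    uintptr_t vop_event_ptr;
    uint32_t  fs_intr_count;
    uint32_t  vbl_counter;
    uintptr_t curthread;
};

struct wedge_ring {
    uint64_t head;                      /* total records ever written */
    struct wedge_rec rec[WEDGE_RING_LEN];
};

void wedge_ring_push(struct wedge_ring *r, const struct wedge_rec *e)
{
    r->rec[r->head & (WEDGE_RING_LEN - 1)] = *e;
    r->head++;
}

/* Copy out up to `max` newest records, oldest first. */
size_t wedge_ring_snapshot(const struct wedge_ring *r, struct wedge_rec *out, size_t max)
{
    uint64_t n = r->head < WEDGE_RING_LEN ? r->head : WEDGE_RING_LEN;

    if (n > max)
        n = max;
    for (uint64_t i = 0; i < n; i++)
        out[i] = r->rec[(r->head - n + i) & (WEDGE_RING_LEN - 1)];
    return (size_t)n;
}
```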

Strange code

fusb302 + rk818 — Hard_Reset and OTG

USB-PD sink works on a fresh boot. Two open issues: Hard_Reset lands in PE_DISABLED and stays there until manually reattached; no source-role path for future host-mode / OTG.

Linux Hard_Reset path

drivers/usb/typec/tcpm/tcpm.c:

Linux does not have PE_DISABLED. Recovery is a timed sequence; the BC_LVL signal is not used as a “drop everything” trigger.

Our equivalent

fusb302.c:1329-1343 HARDRST handler does TX/RX flush, zeros message-id counters and source-PDO state, jumps directly to PE_SNK_STARTUP. It does not write FUSB302_RESET = RESET_PD, and it does not wait PD_T_SAFE_0V for the source’s VBUS to settle. Verified at the cited lines.

fusb302.c:1296-1306: BC_LVL=0 watchdog. If BC_LVL_MASK is zero for 500 ms while VBUS is present, drop to PE_DISABLED. Verified. This is the root cause of the post-Hard_Reset wedge. After Hard_Reset the source briefly drops Rp before re-applying; the watchdog sees BC_LVL=0

fusb302.c:1496-1509 PE_SNK_HARD_RESET (the case where we send Hard_Reset, not receive) does DELAY(5000) — that is 5 ms, not 5 µs (FreeBSD DELAY() is microseconds). Linux’s PD_T_PS_HARD_RESET = 30 ms is six times longer; we may transition to PE_SNK_STARTUP before the chip has finished BMC-encoding the Hard_Reset preamble.

Linux source-role → rk818 OTG path

PPP DTS:

Source attach: TCPM enters SRC_STARTUP → tcpm_set_vbus(port, true) → fusb302::tcpm_set_vbus (drivers/usb/typec/tcpm/fusb302.c:759) → regulator_enable(chip->vbus) → rk808-regulator core flips DCDC_EN_REG bit 7. Bit 6 (SWITCH2) is unrelated: it is a generic switched output (RK818_ID_SWITCH2), not OTG. Our DCDC_EN_OTG_MASK = OTG | SWITCH2 (rk818_battery.h:54) bundles something that shouldn’t be bundled.
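Dropping SWITCH2 from the OTG mask amounts to a read-modify-write that touches bit 7 only. The macro and function names are hypothetical; the bit positions are the ones given above for DCDC_EN_REG.

```c
#include <assert.h>
#include <stdint.h>

#define DCDC_EN_OTG     (1u << 7)   /* OTG boost switch */
#define DCDC_EN_SWITCH2 (1u << 6)   /* unrelated switched output: never touched here */

/* Return the new DCDC_EN_REG value with only the OTG bit changed. */
uint8_t rk818_dcdc_en_set_otg(uint8_t cur, int on)
{
    return on ? (uint8_t)(cur | DCDC_EN_OTG) : (uint8_t)(cur & ~DCDC_EN_OTG);
}
```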

Recommendations (priority order)

  1. Fix Hard_Reset recovery. In the INTA.HARDRST handler at fusb302.c:1329:
    • Write FUSB302_RESET = RESET_PD (BIT(1)) immediately.
    • Reset bc_lvl_zero_ticks = 0 so the BC_LVL watchdog doesn’t fire on the post-Hard_Reset CC blip.
    • Add an explicit PE_SNK_HARD_RESET_RECOVERY state with a PD_T_SAFE_0V = 650 ms deadline; only enter PE_SNK_STARTUP after that.
  2. Skip BC_LVL=0 watchdog for ~1 s after HARDRST. Or bump from 500 ms to 1000 ms+. Today this is the actual cause of the PE_DISABLED wedge — a timing-band mismatch between two correct policies.
  3. Auto-reattach as a backstop. On entry to PE_DISABLED after HARDRST, schedule a 2-second callout that re-runs the equivalent of dev.fusb302.0.reattach=1. Keep the sysctl for debug.
  4. Drop SWITCH2 from DCDC_EN_OTG_MASK. It’s an unrelated rail. Keep DCDC_EN_OTG = BIT(7) only.
  5. Source-role state machine when we tackle OTG. Add PE_SRC_* states; on PR_Swap from sink to us-as-source, call rk818_charger_set_source_role(true) (already exists at rk818_battery.c:710) from SRC_TRANSITION_TO_DEFAULT.
  6. Decouple fusb302 ↔ rk818 from the global-singleton call by exposing an eventhandler (EVENTHANDLER_DECLARE(typec_role_change, ...)) so PineTab2’s rk817 can register the same listener.
  7. Fix PE_SNK_HARD_RESET send delay at fusb302.c:1507: bump DELAY(5000) (5 ms) to a pause() of 30 ms to match PD_T_PS_HARD_RESET.
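Here are recommendations 1 and 2 combined, as a millisecond-tick model of the policy engine. On HARDRST it writes RESET_PD, clears the BC_LVL counter, opens a 650 ms PD_T_SAFE_0V recovery window, and suppresses the BC_LVL=0 watchdog for 1 s. Several pieces are assumptions: PE_SNK_READY, the struct, the mocked RESET_PD write counter, and the millisecond bookkeeping that stands in for bc_lvl_zero_ticks.

```c
#include <assert.h>
#include <stdint.h>

#define PD_T_SAFE_0V_MS       650u
#define BC_LVL_ZERO_LIMIT_MS  500u    /* current watchdog window */
#define HARDRST_HOLDOFF_MS   1000u    /* ~1 s holdoff after HARDRST */

enum pe_state { PE_SNK_READY, PE_SNK_HARD_RESET_RECOVERY, PE_SNK_STARTUP, PE_DISABLED };

struct pe {
    enum pe_state state;
    uint32_t recovery_deadline_ms;
    uint32_t holdoff_until_ms;
    uint32_t bc_lvl_zero_ms;      /* stands in for bc_lvl_zero_ticks */
    unsigned reset_pd_writes;     /* mocked FUSB302_RESET = RESET_PD writes */
};

void pe_on_hardrst(struct pe *p, uint32_t now_ms)
{
    p->reset_pd_writes++;                     /* FUSB302_RESET = RESET_PD (BIT(1)) */
    p->bc_lvl_zero_ms = 0;                    /* forget pre-reset CC history */
    p->holdoff_until_ms = now_ms + HARDRST_HOLDOFF_MS;
    p->recovery_deadline_ms = now_ms + PD_T_SAFE_0V_MS;
    p->state = PE_SNK_HARD_RESET_RECOVERY;
}

void pe_tick(struct pe *p, uint32_t now_ms, uint32_t dt_ms, int vbus, int bc_lvl_nonzero)
{
    /* Leave recovery only after PD_T_SAFE_0V has elapsed. */
    if (p->state == PE_SNK_HARD_RESET_RECOVERY &&
        (int32_t)(now_ms - p->recovery_deadline_ms) >= 0)
        p->state = PE_SNK_STARTUP;

    /* BC_LVL=0 watchdog, suppressed during the post-HARDRST holdoff. */
    if (vbus && !bc_lvl_nonzero && (int32_t)(now_ms - p->holdoff_until_ms) >= 0) {
        p->bc_lvl_zero_ms += dt_ms;
        if (p->bc_lvl_zero_ms >= BC_LVL_ZERO_LIMIT_MS)
            p->state = PE_DISABLED;
    } else {
        p->bc_lvl_zero_ms = 0;
    }
}
```

Under this model, a CC blip that outlasts the old 500 ms window no longer disables the port. A genuine loss of Rp after the holdoff still trips the watchdog 500 ms later.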

Other small bugs

rt5640 — locking, capture path, rk_tsadc

Lock-order reversal

rt5640.c:865-907: rt5640_attach runs in newbus attach context, and iicbus child-attach already holds the iicbus lock when it calls our attach. Attach calls rt5640_init(sc) (line 899), which fires ~50 register writes via rt5640_write → iicdev_writeto(IIC_WAIT) → iicbus_request_bus, re-acquiring the iicbus mutex. Reversal.

Linux avoids this with regmap: every regmap_update_bits performs its own bus arbitration, and codec init runs from the ASoC component probe rather than under a held bus lock.

The right fix is the Linux pattern: in rt5640_attach, replace inline rt5640_init(sc) with taskqueue_enqueue(taskqueue_thread, &sc->init_task). Drop RT5640_LOCK around single iicdev_writeto calls (which already serialize on the iicbus lock); keep it only for read-modify-write sequences in rt5640_modify.
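The lock split can be modeled in userland. A pthread mutex stands in for RT5640_LOCK, and plain array accesses stand in for iicdev_readfrom/iicdev_writeto, whose bus serialization the model takes as given. The taskqueue deferral itself needs kernel headers and is not shown. All names here are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical codec model; each single access is assumed bus-serialized. */
struct mock_codec {
    uint16_t regs[256];
    pthread_mutex_t rmw_lock;   /* stands in for RT5640_LOCK */
};

uint16_t codec_read(struct mock_codec *c, uint8_t reg) { return c->regs[reg]; }
void codec_write(struct mock_codec *c, uint8_t reg, uint16_t v) { c->regs[reg] = v; }

/* Single write: no driver lock, the bus already serializes it. */
void rt5640_write_model(struct mock_codec *c, uint8_t reg, uint16_t v)
{
    codec_write(c, reg, v);
}

/* Read-modify-write is two bus transactions; the driver lock keeps another
 * modify of the same register from slipping in between them. */
void rt5640_modify_model(struct mock_codec *c, uint8_t reg, uint16_t mask, uint16_t val)
{
    pthread_mutex_lock(&c->rmw_lock);
    uint16_t cur = codec_read(c, reg);
    codec_write(c, reg, (uint16_t)((cur & ~mask) | (val & mask)));
    pthread_mutex_unlock(&c->rmw_lock);
}
```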

Microphone capture (DAPM walk)

The PinePhone Pro internal mic is DMIC1 on IN1P (per the project_rt5640_mic_plan.md memory + PPP DTS realtek,dmic1-data-pin = <1>). Our rt5640reg.h is missing every register needed for capture — STO_ADC_MIXER (0x27), REC_L1/2_MIXER (0x3b/0x3c), REC_R1/2_MIXER (0x3d/0x3e), ADC_DIG_VOL (0x1c), IN1_IN2 (0x0d), INL_INR_VOL (0x0f), DMIC (0x75). Add these defs.

DMIC1 init sequence (append to rt5640_init):

```c
/* PWR_ADC_L|R, then PWR_ADC_SF (stereo filter) */
rt5640_modify(sc, RT5640_PWR_DIG1, (1<<2)|(1<<1), (1<<2)|(1<<1));
rt5640_modify(sc, RT5640_PWR_DIG2, (1<<15), (1<<15));

/* DMIC1 enable, IN1P data pin per PPP DTS, divider index 3.
 * Confirm the data-pin bits against rt5640.h (RT5640_DMIC_1_DP_*)
 * before relying on (1<<11). */
rt5640_write(sc, 0x75 /*DMIC*/, (1<<15) | (1<<11) | (3<<5));

/* Stereo ADC mixer: ADC2 = DMIC1, unmute L2/R2, mute L1/R1 */
rt5640_write(sc, 0x27 /*STO_ADC_MIXER*/, 0x4040);

/* ADC digital volume ~0 dB */
rt5640_write(sc, 0x1c /*ADC_DIG_VOL*/, 0x2f2f);
```

For analog headset mic on IN2P (gated on jack-detect later): power BST2 + MICBIAS1 in PWR_ANLG2, set BST2 gain in IN1_IN2, unmute the BST2 input in REC_L2_MIXER / REC_R2_MIXER, switch STO_ADC_MIXER to analog (0x2020).

mixer_setrecsrc is unwired

rt5640.c:487-492 returns 0 unconditionally. Wire it to flip STO_ADC_MIXER between DMIC and analog so userland can pick the input.
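One way to wire it assumes two capture sources: the internal DMIC and the analog headset mic. The RECSRC_* bits are local stand-ins for whichever SOUND_MASK_* values the mixer exposes. The policy is our choice: the headset wins when both are requested, and the DMIC is used otherwise. The mixer values are the 0x4040 / 0x2020 settings given above. Like a FreeBSD setrecsrc method, the sketch returns the source actually selected rather than 0.

```c
#include <assert.h>
#include <stdint.h>

#define RECSRC_DMIC   (1u << 0)   /* hypothetical: internal DMIC1 */
#define RECSRC_HSMIC  (1u << 1)   /* hypothetical: analog headset mic */

#define STO_ADC_MIXER_DMIC   0x4040u
#define STO_ADC_MIXER_ANALOG 0x2020u

/* Pick a source, report the STO_ADC_MIXER value to write, return what was selected. */
uint32_t rt5640_setrecsrc_model(uint32_t requested, uint16_t *sto_adc_mixer)
{
    if (requested & RECSRC_HSMIC) {
        *sto_adc_mixer = STO_ADC_MIXER_ANALOG;
        return RECSRC_HSMIC;
    }
    *sto_adc_mixer = STO_ADC_MIXER_DMIC;   /* default and fallback */
    return RECSRC_DMIC;
}
```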

rt5640_dai_trigger is a no-op for PCMDIR_REC

rt5640.c:802-819. With capture enabled, REC must gate PWR_DIG1 ADC bits and PWR_ANLG2 BST/MB bits on/off so we don’t burn current when no stream is open. Mirror Linux’s DAPM transitions.
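Here is a sketch of the REC gating. It powers up on start and powers down on stop or abort, touching only the capture bits so that playback power in the shared PWR_DIG1 register is left alone. The trigger enum stands in for FreeBSD's PCMTRIG_* codes. The masks are parameters because this page does not list the PWR_ANLG2 BST/MICBIAS bit positions. For PWR_DIG1, pass the ADC_L|ADC_R bits from the init sequence above.

```c
#include <assert.h>
#include <stdint.h>

enum trig { TRIG_START, TRIG_STOP, TRIG_ABORT };   /* stand-ins for PCMTRIG_* */

struct rec_power_masks { uint16_t dig1, anlg2; };  /* capture-only bits per register */
struct codec_pwr { uint16_t pwr_dig1, pwr_anlg2; };  /* cached register values */

int rt5640_rec_trigger_model(struct codec_pwr *p, const struct rec_power_masks *m, enum trig go)
{
    switch (go) {
    case TRIG_START:
        p->pwr_dig1 |= m->dig1;
        p->pwr_anlg2 |= m->anlg2;
        return 0;
    case TRIG_STOP:
    case TRIG_ABORT:
        p->pwr_dig1 &= (uint16_t)~m->dig1;
        p->pwr_anlg2 &= (uint16_t)~m->anlg2;
        return 0;
    }
    return -1;
}
```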

Strange code

rk_tsadc — superseded by the 2026-05-01 audit

This section was written before src/sys/arm64/rockchip/rk_tsadc.c landed. The follow-up candidate driver audit corrected one important detail: rk3399_tsadc_data uses v3 initialize/control hooks, but v2 callbacks for data, alarm, and shutdown registers. The current driver should therefore mirror drivers/thermal/rockchip_thermal.c this way:

The old note’s generic v3 COMP_SHUT = 0x10c + chn*4 advice is wrong for RK3399.

Cross-cutting priority list

These are the items most likely to convert “mostly works” into “works”:

  1. ◐ partial rk_vop: wait_for_vblanks, the vop->event driver-side latch, and the shared-IRQ ack fix have code in the overlay. Bench still needs a scripts/wedge-repro soak to prove the modeset-lock wedge is gone.
  2. ◐ partial fusb302: Hard_Reset recovery code is present for both received and locally-sent resets; bench still needs to prove neither direction falls back to PE_DISABLED.
  3. ◐ partial bwfm_sdio: NVRAM trailing-NUL count, EAPOL priority, scan-version shortcut, control-credit reservation, and TX credit-window clamp are in the overlay. Remaining work is hardware soak, not first-code.
  4. ◐ partial bwfm_sdio + dwmmc: pinctrl audit on SDIO DAT[1] pull, then bench the INTMASK / CLKENA reorder patch, then add a capture point in sdiob_claim_irq only if sdio_intrs=0 persists.
  5. ● working rt5640: DMIC1 has an audible div3/edge3/adc32 bench setting; analog headset capture still needs a receipt.
  6. ◐ partial fusb302 / rk818: type-C role-change eventhandler and OTG rail sequencing are scaffolded; DWC3 host-mode bridge is next once a USB-C accessory can be tested.

Each item has a verified file:line and a falsifiable next-bench predicate. The wedge-repro instrumentation (279739c “scripts/wedge-repro: deterministic load generator for the modeset-lock wedge”, e22bae6 “drm: gate WARN_ON kdb_backtrace+panic behind sysctls for wedge debugging”) is the lever for #1; the existing debug:wifi:transfer harness is the lever for #3 and #4.

Caveats

This audit was produced by reading our drivers against Linux mainline and OpenBSD HEAD on 2026-04-30. Two corrections were made during review of the raw findings: (a) DELAY(5000) is 5 ms, not 5 µs as the initial draft stated; (b) the txq-repair band-aid uses BWFM_TXQLEN, not the hard-coded 256 initially flagged. Other claims have been spot-checked against the cited file:line locations; readers should verify before patching.

Reproduction notes for each subsystem live in the existing recipe appendices (finishing plan, USB-C / PD verification, GPU debugging).