Appendix · reference

When to reach for callout(9): a driver audit

Where the FUSB302 refactor's lesson generalizes — and where it doesn't

The FUSB302 type-C/PD driver had a class of bug that’s easy to write into any event-driven driver: spec deadlines (tNoResponse, tSenderResponse, tPSTransition) were checked by computing ticks - state_entered_ticks at the top of the worker, on every wake. That works fine when the worker runs on a fixed cadence faster than the shortest deadline. It stops working the moment the worker becomes purely event-driven — an IRQ wake plus an occasional 200 ms heartbeat is enough to step past a 30 ms deadline without ever evaluating the in-state logic, and the PD policy engine simply hangs in PE_SNK_SELECT_CAPABILITY waiting for an Accept that already came and went.

12a63fe fusb302: PD timers via callout instead of state-age polling moved those three deadlines onto a single struct callout per softc, armed via callout_init_mtx so the callback runs under the same mutex as the worker. set_state() cancels any prior arming, installs the new state, and arms the callout if the new state has a deadline; the callback sets pe_timer_fired and wakes the kproc exactly when the deadline expires.

This appendix walks the rest of the drivers we own or heavily modified and asks the same question: is there time-based logic that polls “has N ms passed since X” instead of arming a deadline? Where the answer is “yes and that’s a bug waiting to happen” we say so. Where the existing pattern (callout-for-poll, state-age-for-rate-limit, spin-with-DELAY-for-microsecond-PHY-settling) is actually correct, we say that too — the goal is not to migrate everything to callouts.

src/sys/dev/iicbus/fusb302.c — the motivating case

Refactored in 12a63fe fusb302: PD timers via callout instead of state-age polling . The driver still uses fusb302_state_age_ms() in exactly one place after the refactor: PE_DISABLED retries detect_cc() once per second when VBUS is up but no source has been detected. That’s a rate limit, not a spec deadline — if the retry happens at 1.0 s or 1.2 s, nothing breaks. Leaving it on state-age math is the right call; arming a callout for a “try again in roughly a second” loop would just add cancel/rearm churn around every interrupt.

● working All three PD spec timers now arm via callout; state-age math survives only for the 1 Hz cable-rescan loop.

src/sys/dev/usb/controller/dwc3/dwc3_gadget.c — already on callout

The DWC3 gadget TX watchdog (dwc3_gadget_tx_watchdog) is exactly the pattern FUSB302 just adopted: callout_init_mtx(&sc->tx_watchdog, &sc->sc_bus.bus_mtx, 0) at attach, callout_reset(&sc->tx_watchdog, hz / 2, ...) at start and at the end of every tick. The callback runs under USB_BUS_LOCK, compares tx_ok_count against tx_ok_at_last_watchdog, and unsticks the endpoint if no progress happened. That’s a 500 ms sweep, not a one-shot deadline, but the mtx-bound callout discipline is the same one we want.

The other timing in DWC3 is microsecond-scale PHY/clock settling (DELAY(2500) after CSFTRST, DELAY(2500) for UTMI clock) and register-poll loops (DELAY(1) × 1000 waiting on DCTL.CSFTRST to self-clear). Both are correct uses of DELAY — converting a 1 µs register poll into a callout is silly; the cost of the callout machinery dwarfs the wait.

● working No migration needed.

src/sys/dev/iicbus/goodix.c — already on callout, correctly

When goodix can’t allocate an IRQ it falls back to polling at 100 Hz. callout_init(&sc->sc_poll_callout, 1) then callout_reset(..., hz / 100, goodix_poll, sc) from attach, and the ISR re-arms the callout from its tail (the if (sc->sc_polling) re-arm at the end of goodix_intr). That’s a periodic sweep, which is what callouts are for. There is no state-age arithmetic anywhere else in the file.

The boot sequence has hard-coded DELAY(20000) / DELAY(50000) waits between reset and read of the firmware version. Per the Goodix controller datasheet these are required power-up waits, not adjustable timeouts, and they happen exactly once at attach. Leaving them as DELAY is correct — there’s no kproc to wake yet.

● working No migration needed. (The recent rk_i2c fix in 2972362 rk_i2c + goodix: fix iicbus child IRQ alloc is what got the IRQ path working in the first place; goodix only used the polling fallback because the i2c bus was eating its IRQ allocation.)

src/sys/dev/iicbus/rt5640.c — codec init, no state machine

The RT5640 codec init writes a sequence of registers with DELAY(10000) / DELAY(15000) / DELAY(80000) between groups. These are power-rail / VREF / DAC-bias settling times specified by the Realtek datasheet, and they happen exactly once during rt5640_init while the codec is being brought out of standby. There is no state machine running on top of them — the driver is request/response over i2c plus ALSA-style register routing — so there’s nothing to convert. State-age polling never appears in this file.

The mic-init writes drafted in project_rt5640_mic_plan (still untested) follow the same pattern: a fixed sequence of register writes with datasheet-mandated settle times. Use DELAY there too.

▸ next Mic init still untested; pattern stays as-is.

src/sys/dev/sound/fdt/simple_amplifier.c — no post-trigger delay

The simple-audio-amp shim (the one that drives PB3 from a PCMTRIG_START to enable the speaker amp, landed 3b23d2f audio: drive simple amplifier enable GPIO ) toggles the GPIO synchronously in simple_amp_dai_trigger: gpio_pin_set_active(..., 1) on PCMTRIG_START, gpio_pin_set_active(..., 0) on PCMTRIG_STOP. There is no post-trigger pop-suppression delay, no soft-mute, no fade.

If we ever hear pop on enable that we want to suppress, the right tool is not a callout — it’s writing the codec’s HP_SOFT_MUTE / DAC_DIG_VOL ramp registers before raising the GPIO, which is request/response i2c again. The amp turning on a few ms after the DAC has its analog output muted is what suppresses pop on real Linux too.

● working No migration needed.

src/sys/dev/iicbus/pmic/rockchip/rk818_battery.c — periodic sweep, tsleep is fine

The battery monitor runs as its own kproc and does tsleep(&my_rwlock, PWAIT, "rk818_battery_thread", 1 * hz) between samples. That’s a 1 Hz fuel-gauge poll — exactly the use case tsleep is designed for. A callout would also work but would require teaching the i2c read path to be callable from callout context (it currently isn’t — i2c transfers can sleep). The kproc + tsleep pattern is strictly simpler.

There is no time-based state machine on the charger side. The driver doesn’t currently arm “if charge current hasn’t risen above X mA in N seconds, throttle back” rules, which would be the legitimate place for a callout. If we add that logic later, it should arm a callout on the relevant register write and cancel/rearm on each charger event, not check elapsed time on every poll wake.

● working No migration needed today; flag for review if we add charger-state timeouts.

src/sys/dev/bwfm/bwfm_sdio.c — already on callout, BCDC tsleep is correct

The SDIO IRQ-poll fallback (bwfm_sdio_poll_callout) is a proper periodic sweep: callout_init(&sc->sc_poll, 1) at attach, callout_reset(&sc->sc_poll, hz / 100, ...) at the bottom of every tick. There’s no state-age arithmetic.

Firmware download and BCDC command paths use tsleep(ctl, 0, "bwfm", hz) to wait for a response with a 1 s timeout. That’s the correct pattern for a “send command, sleep until either response arrives or 1 s expires” RPC. The response handler wakeup()s the channel; if no response arrives, tsleep returns nonzero and the caller treats it as a BCDC timeout. Converting this to a callout gains nothing — tsleep already has the timeout built in.

The boot-time DELAY(65) / DELAY(1000) / DELAY(10000) waits are SDIO bus-reset, F2-ready, and chip-up settle times pulled from Broadcom’s reference driver. Same logic as the codec — datasheet- mandated, one-shot, leave them as DELAY.

◐ partial WiFi still blocked on BCDC timeout (see project_wifi_status), but the timing-primitive choices are correct; the bug is in the SDIO IRQ wiring, not the timeout discipline.

src/sys/dev/drm/panfrost/ — DRM scheduler owns the timeout

Panfrost doesn’t have its own callout for GPU job timeouts. It sets queue->sched.timeout = msecs_to_jiffies(JOB_TIMEOUT_MS) (5 s) and lets the DRM scheduler arm a delayed_work per submitted job; when the work fires, panfrost_job_timedout is called. That’s the same discipline as a callout — armed when the job goes out, cancelled when the fence signals, fires at the deadline regardless of how long the worker has been idle.

The recent panfrost reset arc ( 1a5666e Defer Panfrost timeout reset to taskqueue , 3378d3d Queue Panfrost reset before scheduler stop , cf2dd90 Fix Panfrost reset fence signaling context , f5184d0 Harden Panfrost reset fence recovery ) had crash issues, but they were all locking / fence-context / use-after-free bugs, not “did we miss a deadline” bugs. The deadline plumbing itself — DRM scheduler’s delayed-work + spurious-timeout check (dma_fence_is_signaled at the top of panfrost_job_timedout) — is correct.

The only direct timing primitive in panfrost is DELAY(10000) after JS_COMMAND_SOFT_STOP — a 10 ms wait for the GPU to acknowledge the soft-stop before we hard-reset. That’s a hardware-handshake settle, microseconds-to-milliseconds scale, called from a taskqueue context that’s already committed to resetting the device. DELAY is the right tool.

◐ partial Heavy-load freezes are still open (see project_gpu_crash_fixes); they aren’t a deadline-miss bug.

patches/sys/dev/iicbus/controller/rockchip/rk_i2c.c.patch — no timing change

Our rk_i2c patch (the one that fixed goodix IRQ allocation, 2972362 rk_i2c + goodix: fix iicbus child IRQ alloc ) adds bus_* method passthroughs to the DEVMETHOD table. It doesn’t touch any timing logic. The upstream i2c bus controller already uses tsleep with timeout for transfer completion; that’s correct.

When state-age polling is fine

The point of the FUSB302 refactor is not “always use callouts.” It’s “if you have a hard spec deadline that fires once at a known offset from a known event, arm a callout the moment the event happens; don’t hope the worker wakes at the right moment to notice.”

State-age polling — if (now - state_entered > N) do X — is fine when:

State-age polling becomes a bug when:

Pattern summary

DriverTime-based logicPatternMigrate to callout?
fusb302.cPD spec timers (tNoResponse, tSenderResponse, tPSTransition)callout (post-12a63fe)done
fusb302.cPE_DISABLED retry-ccstate-age (1 Hz rate limit)no
dwc3_gadget.cTX watchdogcallout_init_mtx, hz/2 sweepalready done
goodix.cPoll-mode fallbackcallout, hz/100 sweepalready done
goodix.cReset/firmware-ready settleDELAY (one-shot)no — DELAY is correct
rt5640.cCodec init register settleDELAY (one-shot)no — DELAY is correct
simple_amplifier.cPCMTRIG GPIO togglesynchronousno
rk818_battery.c1 Hz fuel-gauge polltsleep in kprocno
bwfm_sdio.cSDIO IRQ poll fallbackcallout, hz/100 sweepalready done
bwfm_sdio.cBCDC command timeouttsleep with timeoutno — tsleep timeout is correct
bwfm_sdio.cChip bring-up settleDELAY (one-shot)no
panfrost_job.cGPU job timeoutDRM scheduler delayed_workno — already armed-deadline
panfrost_job.cSoft-stop ack settleDELAY (one-shot)no
rk_i2c.c.patch(no timing change)n/an/a

The lesson generalizes to one rule: arm the deadline when the event that starts it happens; cancel it when the deadline becomes irrelevant; never compute “has the deadline expired” from the outside. Everywhere else in the tree we already follow it; FUSB302 is now the ninth.