The FUSB302 type-C/PD driver had a class of bug that’s easy to write
into any event-driven driver: spec deadlines (tNoResponse, tSenderResponse,
tPSTransition) were checked by computing ticks - state_entered_ticks at
the top of the worker, on every wake. That works fine when the worker
runs on a fixed cadence faster than the shortest deadline. It stops
working the moment the worker becomes purely event-driven — an IRQ wake
plus an occasional 200 ms heartbeat is enough to step past a 30 ms
deadline without ever evaluating the in-state logic, and the PD policy
engine simply hangs in PE_SNK_SELECT_CAPABILITY waiting for an Accept
that already came and went.
12a63fe fusb302: PD timers via callout instead of state-age polling moved those three deadlines onto a single
struct callout per softc, armed via callout_init_mtx so the callback
runs under the same mutex as the worker. set_state() cancels any prior
arming, installs the new state, and arms the callout if the new state
has a deadline; the callback sets pe_timer_fired and wakes the kproc
exactly when the deadline expires.
This appendix walks the rest of the drivers we own or heavily modified and asks the same question: is there time-based logic that polls “has N ms passed since X” instead of arming a deadline? Where the answer is “yes and that’s a bug waiting to happen” we say so. Where the existing pattern (callout-for-poll, state-age-for-rate-limit, spin-with-DELAY-for-microsecond-PHY-settling) is actually correct, we say that too — the goal is not to migrate everything to callouts.
src/sys/dev/iicbus/fusb302.c — the motivating case
Refactored in 12a63fe fusb302: PD timers via callout instead of state-age polling . The driver still uses
fusb302_state_age_ms() in exactly one place after the refactor:
PE_DISABLED retries detect_cc() once per second when VBUS is up
but no source has been detected. That’s a rate limit, not a spec
deadline — if the retry happens at 1.0 s or 1.2 s, nothing breaks.
Leaving it on state-age math is the right call; arming a callout
for a “try again in roughly a second” loop would just add cancel/rearm
churn around every interrupt.
● working All three PD spec timers now arm via callout; state-age math survives only for the 1 Hz cable-rescan loop.
src/sys/dev/usb/controller/dwc3/dwc3_gadget.c — already on callout
The DWC3 gadget TX watchdog (dwc3_gadget_tx_watchdog) is exactly the
pattern FUSB302 just adopted: callout_init_mtx(&sc->tx_watchdog, &sc->sc_bus.bus_mtx, 0) at attach, callout_reset(&sc->tx_watchdog, hz / 2, ...) at start and at the end of every tick. The callback runs
under USB_BUS_LOCK, compares tx_ok_count against
tx_ok_at_last_watchdog, and unsticks the endpoint if no progress
happened. That’s a 500 ms sweep, not a one-shot deadline, but the
mtx-bound callout discipline is the same one we want.
The other timing in DWC3 is microsecond-scale PHY/clock settling
(DELAY(2500) after CSFTRST, DELAY(2500) for UTMI clock) and
register-poll loops (DELAY(1) × 1000 waiting on DCTL.CSFTRST to
self-clear). Both are correct uses of DELAY — converting a 1 µs
register poll into a callout is silly; the cost of the callout
machinery dwarfs the wait.
● working No migration needed.
src/sys/dev/iicbus/goodix.c — already on callout, correctly
When goodix can’t allocate an IRQ it falls back to polling at 100 Hz.
callout_init(&sc->sc_poll_callout, 1) then callout_reset(..., hz / 100, goodix_poll, sc) from attach, and the ISR re-arms the
callout from its tail (the if (sc->sc_polling) re-arm at the end of
goodix_intr). That’s a periodic sweep, which is what callouts are
for. There is no state-age arithmetic anywhere else in the file.
The boot sequence has hard-coded DELAY(20000) / DELAY(50000) waits
between reset and read of the firmware version. Per the Goodix
controller datasheet these are required power-up waits, not
adjustable timeouts, and they happen exactly once at attach. Leaving
them as DELAY is correct — there’s no kproc to wake yet.
● working No migration needed. (The recent rk_i2c
fix in 2972362 rk_i2c + goodix: fix iicbus child IRQ alloc is what got the IRQ path working in
the first place; goodix only used the polling fallback because the
i2c bus was eating its IRQ allocation.)
src/sys/dev/iicbus/rt5640.c — codec init, no state machine
The RT5640 codec init writes a sequence of registers with DELAY(10000)
/ DELAY(15000) / DELAY(80000) between groups. These are
power-rail / VREF / DAC-bias settling times specified by the Realtek
datasheet, and they happen exactly once during rt5640_init while
the codec is being brought out of standby. There is no state machine
running on top of them — the driver is request/response over i2c plus
ALSA-style register routing — so there’s nothing to convert. State-age
polling never appears in this file.
The mic-init writes drafted in project_rt5640_mic_plan (still
untested) follow the same pattern: a fixed sequence of register writes
with datasheet-mandated settle times. Use DELAY there too.
▸ next Mic init still untested; pattern stays as-is.
src/sys/dev/sound/fdt/simple_amplifier.c — no post-trigger delay
The simple-audio-amp shim (the one that drives PB3 from a PCMTRIG_START
to enable the speaker amp, landed 3b23d2f audio: drive simple amplifier enable GPIO ) toggles the
GPIO synchronously in simple_amp_dai_trigger: gpio_pin_set_active(..., 1) on PCMTRIG_START, gpio_pin_set_active(..., 0) on PCMTRIG_STOP.
There is no post-trigger pop-suppression delay, no soft-mute, no fade.
If we ever hear pop on enable that we want to suppress, the right tool is not a callout — it’s writing the codec’s HP_SOFT_MUTE / DAC_DIG_VOL ramp registers before raising the GPIO, which is request/response i2c again. The amp turning on a few ms after the DAC has its analog output muted is what suppresses pop on real Linux too.
● working No migration needed.
src/sys/dev/iicbus/pmic/rockchip/rk818_battery.c — periodic sweep, tsleep is fine
The battery monitor runs as its own kproc and does
tsleep(&my_rwlock, PWAIT, "rk818_battery_thread", 1 * hz) between
samples. That’s a 1 Hz fuel-gauge poll — exactly the use case tsleep
is designed for. A callout would also work but would require teaching
the i2c read path to be callable from callout context (it currently
isn’t — i2c transfers can sleep). The kproc + tsleep pattern is
strictly simpler.
There is no time-based state machine on the charger side. The driver doesn’t currently arm “if charge current hasn’t risen above X mA in N seconds, throttle back” rules, which would be the legitimate place for a callout. If we add that logic later, it should arm a callout on the relevant register write and cancel/rearm on each charger event, not check elapsed time on every poll wake.
● working No migration needed today; flag for review if we add charger-state timeouts.
src/sys/dev/bwfm/bwfm_sdio.c — already on callout, BCDC tsleep is correct
The SDIO IRQ-poll fallback (bwfm_sdio_poll_callout) is a proper
periodic sweep: callout_init(&sc->sc_poll, 1) at attach,
callout_reset(&sc->sc_poll, hz / 100, ...) at the bottom of every
tick. There’s no state-age arithmetic.
Firmware download and BCDC command paths use tsleep(ctl, 0, "bwfm", hz) to wait for a response with a 1 s timeout. That’s the correct
pattern for a “send command, sleep until either response arrives or
1 s expires” RPC. The response handler wakeup()s the channel; if
no response arrives, tsleep returns nonzero and the caller treats it
as a BCDC timeout. Converting this to a callout gains nothing — tsleep
already has the timeout built in.
The boot-time DELAY(65) / DELAY(1000) / DELAY(10000) waits are
SDIO bus-reset, F2-ready, and chip-up settle times pulled from
Broadcom’s reference driver. Same logic as the codec — datasheet-
mandated, one-shot, leave them as DELAY.
◐ partial WiFi still blocked on BCDC timeout (see project_wifi_status), but the timing-primitive choices are correct; the bug is in the SDIO IRQ wiring, not the timeout discipline.
src/sys/dev/drm/panfrost/ — DRM scheduler owns the timeout
Panfrost doesn’t have its own callout for GPU job timeouts. It sets
queue->sched.timeout = msecs_to_jiffies(JOB_TIMEOUT_MS) (5 s) and
lets the DRM scheduler arm a delayed_work per submitted job; when
the work fires, panfrost_job_timedout is called. That’s the same
discipline as a callout — armed when the job goes out, cancelled when
the fence signals, fires at the deadline regardless of how long the
worker has been idle.
The recent panfrost reset arc ( 1a5666e Defer Panfrost timeout reset to taskqueue ,
3378d3d Queue Panfrost reset before scheduler stop , cf2dd90 Fix Panfrost reset fence signaling context ,
f5184d0 Harden Panfrost reset fence recovery ) had crash issues, but they were all
locking / fence-context / use-after-free bugs, not “did we miss a
deadline” bugs. The deadline plumbing itself — DRM scheduler’s
delayed-work + spurious-timeout check (dma_fence_is_signaled at
the top of panfrost_job_timedout) — is correct.
The only direct timing primitive in panfrost is DELAY(10000) after
JS_COMMAND_SOFT_STOP — a 10 ms wait for the GPU to acknowledge the
soft-stop before we hard-reset. That’s a hardware-handshake settle,
microseconds-to-milliseconds scale, called from a taskqueue context
that’s already committed to resetting the device. DELAY is the
right tool.
◐ partial Heavy-load freezes are still open (see project_gpu_crash_fixes); they aren’t a deadline-miss bug.
patches/sys/dev/iicbus/controller/rockchip/rk_i2c.c.patch — no timing change
Our rk_i2c patch (the one that fixed goodix IRQ allocation,
2972362 rk_i2c + goodix: fix iicbus child IRQ alloc ) adds bus_* method passthroughs to the
DEVMETHOD table. It doesn’t touch any timing logic. The upstream
i2c bus controller already uses tsleep with timeout for transfer
completion; that’s correct.
When state-age polling is fine
The point of the FUSB302 refactor is not “always use callouts.” It’s “if you have a hard spec deadline that fires once at a known offset from a known event, arm a callout the moment the event happens; don’t hope the worker wakes at the right moment to notice.”
State-age polling — if (now - state_entered > N) do X — is fine when:
-
It’s a rate limit, not a deadline. PE_DISABLED retrying
detect_cc()once per second is fine: missing the window by 200 ms doesn’t break anything. The spec deadlines (tSenderResponse = 30 ms, tNoResponse = 5 s, tPSTransition = 500 ms) are different — missing the window means the PD partner has already moved on. -
It’s a periodic sweep already on a callout. The DWC3 TX watchdog computes “has tx_ok_count moved since last tick?” against a counter snapshot. That’s state-age arithmetic, but the tick is itself armed by callout, so the worst-case latency is bounded by the callout period (500 ms), not by some upstream worker’s wake pattern.
-
It’s coarse and the worker provably wakes faster. A 1 Hz fuel gauge running its own kproc with
tsleep(..., 1 * hz)is fine — the kproc is the only thing that decides when to wake.
State-age polling becomes a bug when:
-
The worker became event-driven and the deadline is shorter than the longest gap between events. This is what bit FUSB302: once IRQ wakes replaced the 50 ms heartbeat, a 30 ms deadline could fall entirely inside one inter-event gap and never get evaluated.
-
The deadline matters for protocol correctness. Anything tagged with a “t” prefix in a USB-PD / Bluetooth / 802.11 spec is in this bucket. The remote side will give up at exactly that offset; we must too.
Pattern summary
| Driver | Time-based logic | Pattern | Migrate to callout? |
|---|---|---|---|
fusb302.c | PD spec timers (tNoResponse, tSenderResponse, tPSTransition) | callout (post-12a63fe) | done |
fusb302.c | PE_DISABLED retry-cc | state-age (1 Hz rate limit) | no |
dwc3_gadget.c | TX watchdog | callout_init_mtx, hz/2 sweep | already done |
goodix.c | Poll-mode fallback | callout, hz/100 sweep | already done |
goodix.c | Reset/firmware-ready settle | DELAY (one-shot) | no — DELAY is correct |
rt5640.c | Codec init register settle | DELAY (one-shot) | no — DELAY is correct |
simple_amplifier.c | PCMTRIG GPIO toggle | synchronous | no |
rk818_battery.c | 1 Hz fuel-gauge poll | tsleep in kproc | no |
bwfm_sdio.c | SDIO IRQ poll fallback | callout, hz/100 sweep | already done |
bwfm_sdio.c | BCDC command timeout | tsleep with timeout | no — tsleep timeout is correct |
bwfm_sdio.c | Chip bring-up settle | DELAY (one-shot) | no |
panfrost_job.c | GPU job timeout | DRM scheduler delayed_work | no — already armed-deadline |
panfrost_job.c | Soft-stop ack settle | DELAY (one-shot) | no |
rk_i2c.c.patch | (no timing change) | n/a | n/a |
The lesson generalizes to one rule: arm the deadline when the event that starts it happens; cancel it when the deadline becomes irrelevant; never compute “has the deadline expired” from the outside. Everywhere else in the tree we already follow it; FUSB302 is now the ninth.