The PinePhone Pro reboots on any unhandled kernel fault. When the host is
not already logging /dev/ttyUSB1, the best artifact is often a phone
photo of the framebuffer after the crash. That is not good enough: the
register dump, DRM warnings, and reset path need to land in a file on the
laptop before the board is power-cycled.
This page is the recipe for not losing the next one.
0. Snapshot a live wedge over SSH
When the panel is frozen but USB Ethernet still answers, capture the live process stacks before killing clients, restarting Sway, or rebooting:
mise run debug:gpu:wedge:phone -- firefox-glxtest-wedge
The task writes logs/gpu-wedge/<timestamp>-<name>.log. It is read-only:
it records process state, procstat -k stacks, Sway IPC if it still answers,
DRM/Panfrost sysctls, thermal state, dmesg, and /var/log/messages. The local
analysis classifies the known signatures:
- panfrost-bo-wait: Sway or a browser helper is stuck in
  panfrost_ioctl_wait_bo / dma_resv_wait_timeout_rcu.
- drm-modeset-wedge: the old drm_modeset_lock / hw_done / flip_done
  timeout cascade is present.
- memory-pressure-with-gpu-clients: Firefox or another GL client was killed
  by VM reclaim before the compositor got sick.
- media-audio-spin: a browser/media workload left PulseAudio burning CPU
  while Sway and Panfrost were still responsive.
- firefox-media-cpu: the PulseAudio path is avoided, but Firefox’s
  content/RDD/media processes are saturating the RK3399S CPUs.
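As a rough illustration of how the first two signatures can be recognised, here is a minimal sketch that greps a wedge log for the kernel symbols named above. The function name classify_wedge_log is hypothetical, and this is not the analysis the mise task actually runs; the other three signatures depend on process and CPU state and are not covered.

```sh
# Hypothetical helper, not the project's real classifier.
# Matches only the symbols this page names for the two kernel-side signatures.
classify_wedge_log() {
    log=$1
    if grep -qE 'panfrost_ioctl_wait_bo|dma_resv_wait_timeout_rcu' "$log"; then
        echo panfrost-bo-wait
    elif grep -qE 'drm_modeset_lock|hw_done|flip_done' "$log"; then
        echo drm-modeset-wedge
    else
        echo unclassified
    fi
}
```

Call it with the path of one captured log under logs/gpu-wedge/ when the built-in analysis is not at hand.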
This is the first command to run on the next browser/WebGL or compositor
freeze. It exists because the 2026-05-06 Firefox bench left sway
unkillable inside panfrost_ioctl_wait_bo; even SIGKILL could not reap it,
and sudo reboot stopped sshd but then hung late while the kernel still
answered ICMP.
For browser media tests the receipt also captures /dev/sndstat,
mixer(8), PulseAudio / PipeWire / virtual_oss process stacks, RT5640
dmesg lines, and the selected Panfrost counters. That separation mattered on
2026-05-06: the first YouTube run did not show a compositor wait; it showed
PulseAudio’s OSS thread pegged. Forcing Firefox’s cubeb backend to OSS
through mobile-config-firefox and disabling PulseAudio autospawn made the
same YouTube launch avoid PulseAudio entirely. The phone still got hot, but
the receipt showed ordinary Firefox/media CPU saturation rather than a GPU
timeout, DRM modeset lock, or PulseAudio spin.
1. Live serial logging
Before triggering anything that might crash — running glmark2, reloading
a sway theme, resetting WiFi after a wedged transfer, or plugging in the
gigabit-ethernet dongle for the first time — start a serial capture on the
laptop. mise run serial:capture -- <name> wraps tools/capture-serial.sh,
which opens picocom against /dev/ttyUSB1 at 1500000 baud (the FT232
dongle — the CP2102 doesn’t go that high) and logs everything to
logs/serial/<timestamp>-<name>.log:
mise run serial:capture -- stress-test
# … hit Ctrl-A Ctrl-X to exit
grep -nE 'panic|Fatal|abort|WARNING' logs/serial/*-stress-test.log
Even when the crash hard-reboots the phone before any disk write completes, the serial transcript on the laptop is intact.
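The glob in the grep above matches every run that shares a name. A small sketch for narrowing it to the most recent capture follows; newest_serial_log is a hypothetical helper, and it assumes the <timestamp> prefix sorts lexically in time order, which this page does not confirm.

```sh
# Hypothetical helper: print the newest logs/serial/<timestamp>-<name>.log.
# Assumes timestamps sort lexically, so the last glob match is the newest.
newest_serial_log() {
    dir=$1 name=$2
    newest=
    for f in "$dir"/*-"$name".log; do
        # An unmatched glob stays literal; skip it.
        [ -e "$f" ] && newest=$f
    done
    [ -n "$newest" ] && printf '%s\n' "$newest"
}
```

Usage: grep -nE 'panic|Fatal|abort|WARNING' "$(newest_serial_log logs/serial stress-test)". A name that is a suffix of another name (for example test and stress-test) would match both, so pick distinct names.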
For longer risky sessions, use detached capture so the terminal running the test does not own the serial port:
mise run serial:capture:daemon -- wifi-reset
mise run serial:capture:status
# … run the risky test, reboot, or hard-reset the phone …
grep -nE 'panic|Fatal|abort|WARNING' "$(cat logs/serial/.capture.log)"
mise run serial:capture:stop
Detached capture records U-Boot, the EFI loader, kernel boot, and post-boot
console output as long as the FT232 adapter stays plugged in. It also answers
the loader’s ESC[6n cursor-position query; a raw cat logger can record
that query but leave the phone stuck before FreeBSD reaches USB networking.
2. Mini-dump on panic via savecore (caveat: needs a real swap)
FreeBSD writes a kernel mini-dump to the swap partition on panic(), then
savecore copies it to /var/crash/ on the next boot. This is the most
useful piece of post-mortem state — it captures the panic message, register
state, and a stack trace into a file you can read on honor with kgdb.
The overlay’s rc.conf sets dumpdev="AUTO" + savecore_enable="YES" +
savecore_flags="-z", and tools/configure-dump.sh applies the same to a
live phone:
./tools/configure-dump.sh
ssh pinephone 'dumpon -l && ls /var/crash'
BUT: today’s Honeyguide image has no swap partition, and md(4)-backed
swap files do not support DIOCSKERNELDUMP (verified — the ioctl
returns Operation not supported). So dumpon -l will say /dev/null
and a panic will not be captured by savecore until one of:
- The next image build is repartitioned to include a real swap slice
  (~512 MB is plenty for a mini-dump on a 4 GB phone). Edit
  honeyguide/img/create_img_clean.sh.
- netdump(4) is wired up to honor over the USB-Ethernet link. The
  cdce / usb_template driver stack would need DEBUGNET hooks first
  (net.netdump.enabled will refuse to come up otherwise).
Until then, fall back to sections 1 and 3.
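To make the missing dump device obvious before a risky test, the dumpon -l answer can be checked mechanically. This is a minimal sketch; check_dumpdev is a hypothetical name, and it only interprets the string that dumpon -l printed.

```sh
# Hypothetical helper: interpret the output of `dumpon -l`.
# dumpon prints /dev/null when no dump device is configured.
check_dumpdev() {
    case "$1" in
        ''|/dev/null)
            echo "no dump device configured"
            return 1
            ;;
        *)
            echo "dump device: $1"
            ;;
    esac
}
```

Usage: check_dumpdev "$(ssh pinephone 'dumpon -l')" || echo "start a serial capture instead".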
After a panic and reboot, fish out the dump:
ssh pinephone 'ls -lhrt /var/crash | tail'
ssh pinephone 'cat /var/crash/info.last' # panic string + uptime + version
ssh pinephone 'gunzip -c /var/crash/vmcore.last.gz' \
| ssh honor 'cat > /tmp/vmcore'
ssh honor 'kgdb \
~/pine64-freebsd/honeyguide/obj.clang/.../sys/PINEPHONE_PRO/kernel.debug \
/tmp/vmcore'
The kernel.debug (the unstripped one) lives in the buildkernel object
tree on honor. bt in kgdb gives you the panic backtrace.
To verify the path works without waiting for a real crash, deliberately panic the phone:
ssh pinephone 'sudo sysctl debug.kdb.panic=1'
# … wait for reboot, then check /var/crash
(Don’t do this casually — it’s a hard panic.)
3. Per-minute dmesg snapshots
Mini-dumps require a clean panic(). A hang — kernel still alive but
display dead, IRQs frozen, watchdog bites — leaves nothing for savecore.
For those, we want the most recent dmesg written to disk before things
went sideways.
mise run debug:wifi:setup:phone installs tools/install-dmesg-snapshots.sh
on the phone: a per-minute cron that writes dmesg to
/var/log/dmesg-snapshots/dmesg-HHMM.log and rotates anything older than
an hour. Disk impact is bounded (60 small files, total under a megabyte).
mise run debug:wifi:setup:phone
# After a hang and reboot:
ssh pinephone 'ls -lhrt /var/log/dmesg-snapshots/ | tail'
The HHMM in the filename is the time the snapshot was taken, so you can
match it against your serial log to see what was in dmesg during the last
minute before the wedge.
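The per-minute job can be very small. The sketch below is not the contents of tools/install-dmesg-snapshots.sh; snapshot_dmesg and the DMESG_CMD override are hypothetical names, used to show the write-then-rotate idea.

```sh
# Hypothetical sketch of a per-minute snapshot job, meant to be run from cron.
# DMESG_CMD is an assumed override so the function can be exercised off-phone.
snapshot_dmesg() {
    dir=$1
    mkdir -p "$dir" || return 1
    # One file per minute of the hour; the same minute overwrites itself.
    ${DMESG_CMD:-dmesg} > "$dir/dmesg-$(date +%H%M).log"
    # Drop snapshots last modified more than 60 minutes ago.
    find "$dir" -name 'dmesg-*.log' -mmin +60 -delete
}
```

Because the filename carries only HHMM, rotating by modification time is what keeps yesterday's file from being mistaken for a fresh one.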
4. Snapshot the live WiFi state before and after a risky test
For the bwfm(4) / BCM43455 work, the most common failure mode is not an
immediate panic but a firmware command rejection followed by a wedged or
missing interface. Keep debug reads passive. The state sysctls used by the
current harness should report cached counters and MMIO state; they should not
issue fresh SDIO CMD52/CMD53 transactions from sysctl context.
mise run debug:wifi:phone collects the pieces that are easy to lose track of
mid-session:
- local and honor git SHAs
- honor module path + SHA256
- phone /boot/modules/<name>.ko SHA256
- kldstat
- ifconfig bwfm_sdio0 / ifconfig wlan0
- filtered dmesg, /var/log/messages, and the latest dmesg snapshot
- tail of the most recent local serial log
It writes the snapshot to logs/wifi/<timestamp>-<name>.log.
The script appends a short receipt analysis at the end of the log; rerun it
manually with mise run debug:wifi:analyze -- logs/wifi/<file>.log if you
want to compare older captures.
mise run module:refresh:phone -- bwfm_sdio
mise run module:compare:phone -- bwfm_sdio
mise run debug:wifi:phone -- before-scan bwfm_sdio
ssh pinephone 'sudo ifconfig wlan0 scan'
mise run debug:wifi:phone -- after-scan bwfm_sdio
For transfer tests, prefer the bounded harness and leave verbose trace dumps off unless you are specifically debugging the trace ring:
SIZE_MIB=2 POLL_SECS=5 DEBUG_TIMEOUT=5 DUMP_TRACE=0 \
mise run debug:wifi:transfer -- both sdio-irq-clock-fix-small
The transfer harness appends the same analysis after it stops the poller. The
important line is summary: classification=...: irq-active-watchdog means
the host delivered function interrupts to bwfm_sdio; irq-armed-poll-fallback
means the function IRQ was claimed but the watchdog/poll path did the work.
completed-with-usb-stalls is still useful evidence, but it says the USB
management link timed out during the run and the serial log should be checked.
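When comparing several runs it helps to pull just the classification out of each log. A minimal sketch, assuming the value after summary: classification= runs up to the next whitespace (the exact line format is not shown on this page); both function names are hypothetical.

```sh
# Hypothetical helpers for reading the receipt summary line.
# Assumes the classification value ends at the first whitespace.
transfer_classification() {
    sed -n 's/.*summary: classification=\([^[:space:]]*\).*/\1/p' "$1" | tail -n 1
}

# Restates the meanings given on this page; anything else is reported as unknown.
explain_classification() {
    case "$1" in
        irq-active-watchdog)       echo "host delivered function interrupts to bwfm_sdio" ;;
        irq-armed-poll-fallback)   echo "IRQ claimed, but the watchdog/poll path did the work" ;;
        completed-with-usb-stalls) echo "USB management link timed out; check the serial log" ;;
        *)                         echo "unknown classification: $1" ;;
    esac
}
```

Usage: explain_classification "$(transfer_classification logs/wifi/<file>.log)".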
If a debug sysctl ever hangs the phone by itself, treat that as a driver bug. The next transfer should wait until the sysctl has been made passive again.
5. Reading a captured panic
A typical fault frame on aarch64 looks like:
Fatal data abort:
x0: 0xffff... x7: 0xffffffffffffff
...
far: 0 esr: 0x96000004
- far is the faulting address. 0 means a NULL deref; small offsets (0x20,
  0x40) mean we deref’d through a NULL struct pointer to a field at that
  offset.
- esr decodes the fault class. 0x96 is “Data abort taken without a change
  in EL”; the lower bits are the fault status code. 0x04 is “translation
  fault, level 0”: there’s literally no page table entry, consistent with
  NULL.
- x0 is usually the first argument or this-equivalent at the point of the
  call.
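The esr split can be done with shell arithmetic. The sketch below pulls out the exception class (bits 31:26), the IL bit (bit 25) and the low six bits, and names only the translation-fault codes; decode_esr is a hypothetical helper, and every other code is left to the architecture manual.

```sh
# Hypothetical helper: split an AArch64 ESR value into EC, IL and DFSC.
# Only translation faults (DFSC 0x04..0x07) are named here.
decode_esr() {
    esr=$(( $1 ))
    ec=$(( (esr >> 26) & 0x3f ))
    il=$(( (esr >> 25) & 0x1 ))
    dfsc=$(( esr & 0x3f ))
    case $ec in
        36|37)  # 0x24 / 0x25: data abort from a lower EL / the same EL
            case $dfsc in
                4|5|6|7) what="translation fault, level $(( dfsc & 3 ))" ;;
                *)       what="see the DFSC table" ;;
            esac
            ;;
        *)  what="not a data abort" ;;
    esac
    printf 'ec=0x%02x il=%d dfsc=0x%02x (%s)\n' "$ec" "$il" "$dfsc" "$what"
}
```

For the frame above, decode_esr 0x96000004 prints ec=0x25 il=1 dfsc=0x04 (translation fault, level 0). The 0x96 top byte is EC 0x25 with the IL bit set.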
For NULL derefs, the most useful follow-up is:
- Find the function name from pc (addr2line or kgdb’s info line *0x...).
- Look at the instruction at pc, usually a ldr xN, [xM, #offset]. xM is
  the base pointer (matches a register in the dump); offset tells you
  which field.
- Trace back from the call site to find what made xM NULL.
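Before walking those steps, it is worth confirming that far really looks like a NULL-based access. A minimal sketch of the rule of thumb; classify_far is a hypothetical helper, and the 4096-byte cutoff (one 4 KiB page) is an assumption, not something this page specifies.

```sh
# Hypothetical helper: apply the far rule of thumb to one address.
# Assumption: anything below 4096 counts as "a field off a NULL pointer".
classify_far() {
    far=$(( $1 ))
    if [ "$far" -eq 0 ]; then
        echo "NULL dereference"
    elif [ "$far" -gt 0 ] && [ "$far" -lt 4096 ]; then
        printf 'field at offset 0x%x through a NULL struct pointer\n' "$far"
    else
        echo "not a NULL-based access"
    fi
}
```

If the offset case fires, the printed value is the #offset to look for in the ldr at pc.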
The panic that prompted this page was deliberate:
panic("plane is not visible") left as a /* TODO */ in
rk_vop_plane_atomic_update. The lock-assertion warnings further down the
log were tail-end damage as the panic unwound state still held by another
core. See the war story.
6. When all else fails
If every other capture fails (savecore unconfigured, hang too hard, serial cable disconnected), the last line of defence is a screen photograph through the privacy switch. Keep a phone-with-decent-camera nearby. Aim at the EFI framebuffer; the panic frame stays visible for ~1 second before the watchdog reboots. Decode by hand. We have the technology to do better than this — use the steps above so we don’t have to.