The PinePhone Pro reboots on any unhandled kernel fault. The CRU hack (see essay 7) prevents the early-boot serial console from coming back up, so by the time the kernel logs “Fatal data abort” the host serial port is already clear of the boot banner — and within a second the chip resets. That’s a 200 ms window to photograph the screen, and after that, all state is gone.
This page is the recipe for not losing the next one.
1. Live serial logging
Before triggering anything that might crash — running glmark2, reloading
a sway theme, plugging in the gigabit-ethernet dongle for the first time —
start a serial capture on the laptop. The script tools/capture-serial.sh
opens picocom against /dev/ttyUSB1 at 1500000 baud (the FT232 dongle —
the CP2102 doesn’t go that high) and tees everything to
logs/serial/<timestamp>-<name>.log:
./tools/capture-serial.sh stress-test
# … hit Ctrl-A Ctrl-X to exit
grep -nE 'panic|Fatal|abort|WARNING' logs/serial/*-stress-test.log
Even when the crash hard-reboots the phone before any disk write completes, the serial transcript on the laptop is intact.
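The contents of tools/capture-serial.sh are not reproduced on this page. As a rough sketch of what such a wrapper could look like, assuming picocom and tee are on the laptop's PATH: the function names log_path and capture and the timestamp format are illustrative choices, not taken from the real script.

```shell
#!/bin/sh
# Hypothetical sketch of a serial-capture wrapper; NOT the real tools/capture-serial.sh.

# Build logs/serial/<timestamp>-<name>.log for a given run name.
# The %Y%m%d-%H%M%S timestamp format is an assumption.
log_path() {
    printf 'logs/serial/%s-%s.log\n' "$(date +%Y%m%d-%H%M%S)" "$1"
}

# Open the serial port and tee everything to the log file.
# Port and baud rate are the ones named in the text above.
capture() {
    log=$(log_path "$1")
    mkdir -p "$(dirname "$log")"
    picocom -b 1500000 /dev/ttyUSB1 | tee "$log"
}

# Usage (blocks until picocom exits): capture stress-test
```

Because the log is written on the laptop side of the pipe, nothing depends on the phone surviving long enough to flush anything.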
2. Mini-dump on panic via savecore (caveat: needs a real swap)
FreeBSD writes a kernel mini-dump to the swap partition on panic(), then
savecore copies it to /var/crash/ on the next boot. This is the most
useful piece of post-mortem state — it captures the panic message, register
state, and a stack trace into a file you can read on honor with kgdb.
The overlay’s rc.conf sets dumpdev="AUTO" + savecore_enable="YES" +
savecore_flags="-z", and tools/configure-dump.sh applies the same to a
live phone:
./tools/configure-dump.sh
ssh pinephone 'dumpon -l && ls /var/crash'
BUT: today’s Honeyguide image has no swap partition, and md(4)-backed
swap files do not support DIOCSKERNELDUMP (verified — the ioctl
returns Operation not supported). So dumpon -l will say /dev/null
and a panic will not be captured by savecore until one of:
- The next image build is repartitioned to include a real swap slice
(~512 MB is plenty for a mini-dump on a 4 GB phone). Edit
honeyguide/img/create_img_clean.sh.
- netdump(4) is wired up to honor over the USB-Ethernet link. The
cdce/usb_template driver stack would need DEBUGNET hooks first
(net.netdump.enabled will refuse to come up otherwise).
Until then, fall back to sections 1 and 3.
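Before relying on savecore, it may be worth checking whether a dump device is actually configured. The sketch below is an illustration only: the helper names dump_ready and check_phone_dump are hypothetical, and the only behaviour assumed is the one stated above, namely that dumpon -l reports /dev/null when nothing is set up.

```shell
#!/bin/sh
# Hypothetical helpers; not part of the repository's tools/.

# Given the output of `dumpon -l`, succeed only if a real dump device is set.
# An empty string or /dev/null means a panic would not be captured.
dump_ready() {
    case "$1" in
        ""|/dev/null) return 1 ;;
        *) return 0 ;;
    esac
}

# Query the phone over ssh and report which capture paths are usable.
check_phone_dump() {
    dev=$(ssh pinephone 'dumpon -l')
    if dump_ready "$dev"; then
        echo "dump device: $dev"
    else
        echo "no dump device; rely on serial capture and dmesg snapshots" >&2
        return 1
    fi
}

# Usage: check_phone_dump && echo "savecore path should work"
```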
After a panic and reboot, fish out the dump:
ssh pinephone 'ls -lhrt /var/crash | tail'
ssh pinephone 'cat /var/crash/info.last' # panic string + uptime + version
ssh pinephone 'gunzip -c /var/crash/vmcore.last.gz' \
| ssh honor 'cat > /tmp/vmcore'
ssh honor 'kgdb \
~/pine64-freebsd/honeyguide/obj.clang/.../sys/PINEPHONE_PRO/kernel.debug \
/tmp/vmcore'
The kernel.debug (the unstripped one) lives in the buildkernel object
tree on honor. bt in kgdb gives you the panic backtrace.
To verify the path works without waiting for a real crash, deliberately panic the phone:
ssh pinephone 'sudo sysctl debug.kdb.panic=1'
# … wait for reboot, then check /var/crash
(Don’t do this casually — it’s a hard panic.)
3. Per-minute dmesg snapshots
Mini-dumps require a clean panic(). A hang — kernel still alive but
display dead, IRQs frozen, watchdog bites — leaves nothing for savecore.
For those, we want the most recent dmesg written to disk before things
went sideways.
tools/install-dmesg-snapshots.sh adds a per-minute cron to root on the
phone that writes dmesg to /var/log/dmesg-snapshots/dmesg-HHMM.log and
rotates anything older than an hour. Disk impact is bounded (60 small
files, total under a megabyte).
./tools/install-dmesg-snapshots.sh
# After a hang and reboot:
ssh pinephone 'ls -lhrt /var/log/dmesg-snapshots/ | tail'
The HHMM in the filename is the time the snapshot was taken, so you can
match it against your serial log to see what was on dmesg in the minute
before the wedge.
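What the installed job does can be sketched roughly as follows. This is an assumption about its shape, not a copy of what tools/install-dmesg-snapshots.sh actually installs: the SNAP_DIR and DMESG variables, the function name, and the script path in the crontab comment are all hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch of the per-minute snapshot job; NOT the real installer's output.
# SNAP_DIR and DMESG are overridable so the function can be tried off-device.
SNAP_DIR=${SNAP_DIR:-/var/log/dmesg-snapshots}
DMESG=${DMESG:-dmesg}

snapshot_dmesg() {
    mkdir -p "$SNAP_DIR" || return 1
    # One file per minute of the day, named dmesg-HHMM.log as described above.
    $DMESG > "$SNAP_DIR/dmesg-$(date +%H%M).log"
    # Rotate: delete snapshots last modified more than 60 minutes ago.
    find "$SNAP_DIR" -name 'dmesg-*.log' -mmin +60 -delete
}

# Assumed root crontab entry (script path is hypothetical):
# * * * * * /usr/local/libexec/dmesg-snapshot.sh
```

Rotating by modification time rather than by filename keeps the logic simple across midnight, when the HHMM names wrap around.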
4. Reading a captured panic
A typical fault frame on aarch64 looks like:
Fatal data abort:
x0: 0xffff... x7: 0xffffffffffffff
...
far: 0 esr: 0x96000004
- far is the faulting address. 0 means a NULL deref; small offsets (0x20, 0x40) mean we deref’d through a NULL struct pointer to a field at that offset.
- esr decodes the fault class. 0x96 is “Data abort taken without a change in EL”; the lower bits are the fault status code. 0x04 is “translation fault, level 0”: there’s literally no page table entry, consistent with NULL.
- x0 is usually the first argument or this-equivalent at the point of the call.
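The esr split can be done with plain shell arithmetic instead of by eye. A small sketch, assuming the standard AArch64 ESR layout (exception class in bits 31:26, fault status code in bits 5:0); the function names esr_ec and esr_dfsc are made up for this example.

```shell
#!/bin/sh
# Illustrative helpers for decoding an aarch64 ESR value by hand.

# Exception class: bits 31:26. 0x25 is a data abort taken without a change in EL.
esr_ec() {
    printf '0x%02x\n' $(( ($1 >> 26) & 0x3f ))
}

# Fault status code: bits 5:0. 0x04 is a translation fault at level 0.
esr_dfsc() {
    printf '0x%02x\n' $(( $1 & 0x3f ))
}

# Usage:
#   esr_ec 0x96000004     # prints 0x25
#   esr_dfsc 0x96000004   # prints 0x04
```

The top byte 0x96 seen in the dump is the exception class 0x25 shifted into place with the IL bit (bit 25) set, which is why the raw byte and the class number differ.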
For NULL derefs, the most useful follow-up is:
- Find the function name from pc (addr2line or kgdb’s info line *0x...).
- Look at the instruction at pc, usually a ldr xN, [xM, #offset]. xM is the base pointer (matches a register in the dump); offset tells you which field.
- Trace back from the call site to find what made xM NULL.
The panic that prompted this page was deliberate: a
panic("plane is not visible") left as a /* TODO */ in
rk_vop_plane_atomic_update. The lock-assertion warnings further down the
log were tail-end damage as the panic unwound state still held by another
core. See the war story.
5. When all else fails
If every other capture fails (savecore unconfigured, hang too hard, serial cable disconnected), the last line of defence is a screen photograph through the privacy switch. Keep a phone-with-decent-camera nearby. Aim at the EFI framebuffer; the panic frame stays visible for ~1 second before the watchdog reboots. Decode by hand. We have the technology to do better than this — use the steps above so we don’t have to.