Viktor Petersson cca69b594d fix(balena): repair Pi 4 graphics overlay + manage fleet host config as IAC (#2947) (#2949)
* fix(viewer): detect Pi 4 eglfs DRM card at runtime so boot doesn't hang (#2947)

- vc4-drm (display) and v3d (render-only) race during probe, so the
  display node is card0 on some boots/images and card1 on others
- #2905 hardcoded /dev/dri/card1; when vc4 loses the race eglfs opens
  the render-only node, finds no connectors, and the device hangs on
  the balena splash forever
- start_viewer.sh now picks the card that owns connectors at runtime
  and rewrites QT_QPA_EGLFS_KMS_CONFIG before launch
- prefers a connected connector, falls back to any card exposing
  connectors (excludes the connector-less v3d node)
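
A minimal sketch of the selection logic described above, not the actual
start_viewer.sh code. The function names, the sysfs-root argument, and the
config file path are assumptions made for illustration; the `device` key is
the one Qt's eglfs KMS JSON config reads.

```shell
# Hypothetical helper: print the /dev/dri node of the card that owns connectors.
# $1 is the DRM sysfs root (defaults to /sys/class/drm) so it can be tested.
pick_drm_card() {
    drm_root="${1:-/sys/class/drm}"
    fallback=""
    # Connector entries look like card1-HDMI-A-1; a render-only card has none.
    for status in "$drm_root"/card*-*/status; do
        [ -e "$status" ] || continue
        connector="$(basename "$(dirname "$status")")"
        card="${connector%%-*}"
        # Prefer a card with a connected connector.
        if [ "$(cat "$status")" = "connected" ]; then
            echo "/dev/dri/$card"
            return 0
        fi
        # Otherwise remember the first card that exposes any connector.
        [ -n "$fallback" ] || fallback="$card"
    done
    [ -n "$fallback" ] || return 1
    echo "/dev/dri/$fallback"
}

# Hypothetical helper: write an eglfs KMS config pointing at the chosen device.
# $1 is the device node, $2 is the config file path.
write_kms_config() {
    printf '{ "device": "%s" }\n' "$1" > "$2"
}
```

A caller would run `pick_drm_card`, pass the result to `write_kms_config`,
and export QT_QPA_EGLFS_KMS_CONFIG with that file's path before starting the
viewer.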

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(balena): repair Pi 4 graphics overlay, manage fleet host config as IAC (#2947)

Root cause of the Pi 4 boot-splash hang: the anthias-pi4 fleet's
dtoverlay was stored as the malformed value `"vc4-kms-v3d"` — literal
double-quotes, on the legacy RESIN_ prefix — unlike every other board's
clean `vc4-kms-v3d`. The quotes stop the firmware loading the overlay,
so the Pi 4 fell back to firmware-KMS; since #2905 the viewer renders
through Qt eglfs_kms, which needs full KMS, so the display never came up
and the device hung on the splash. (linuxfb, used before #2905, didn't
care, which is why this surfaced now.) The malformed value was a manual
dashboard edit — the config was never codified.
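
As an illustration of the failure mode, a check like the following would
flag a value that carries literal quote characters. This is a hypothetical
helper, not part of the change itself.

```shell
# Hypothetical helper: succeed if a config value starts or ends with a
# literal quote character, as the malformed "vc4-kms-v3d" value did.
has_literal_quotes() {
    case "$1" in
        \"*|*\"|\'*|*\') return 0 ;;
        *) return 1 ;;
    esac
}
```

With this, `has_literal_quotes '"vc4-kms-v3d"'` succeeds and
`has_literal_quotes 'vc4-kms-v3d'` fails.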

- add balena-host-config.json: declarative per-board config.txt knobs,
  reconciled from the live fleets and corrected (clean pi4 dtoverlay;
  drop bogus `dtparam=...,vc4-kms-v3d`; standardize pi2/pi3 off the
  RESIN_ prefix; drop pi5 gpu_mem which a Pi 5 ignores; add cma-512 to
  pi5 per docs/board-enablement.md)
- build-balena-disk-image.yaml reconciles each fleet to the file:
  upsert under the canonical BALENA_ prefix, then prune anything not in
  the file (incl. legacy RESIN_HOST_CONFIG_* dupes). Supervisor vars
  untouched.
- docs/balena-fleet-host-config.md documents the mechanism + the full
  per-board audit; modernize the self-hosted doc's `env add`->`env set`
- drop `--pin-device-to-release` from `balena preload` (added in #2098)
  so flashed devices track the latest stable release instead of freezing;
  correct installation-options.md / faq.yaml accordingly

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(ci): harden fleet host-config reconcile against fleet-wide wipe (#2947)

Review of the prune step surfaced three failure modes:

- An empty desired set (absent board key, or jq/file parse failure not
  caught by set -e through a process substitution) made the prune delete
  every config var on the fleet, incl. dtoverlay=vc4-kms-v3d. Now resolve
  the board config with `jq -ec .boards[$b]` and hard-fail if it's null
  or the file is invalid; a `{}` board (x86) is truthy so it still
  reconciles to "no config.txt".
- The prune selector `test("HOST_CONFIG")` was an unanchored substring
  match: a var merely containing HOST_CONFIG (e.g.
  BALENA_HOST_CONFIGURATION_BACKUP) would be pruned. Now anchored to
  `^(BALENA|RESIN)_HOST_CONFIG_`.
- A transient `balena env list` / jq failure in the prune's process
  substitution was swallowed (pipefail doesn't propagate out of `<(...)`),
  silently skipping the prune and leaving stale RESIN_ duplicates. Capture
  the listing into a var first so the failure aborts the step.
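
The anchored selector and the capture-first pattern can be sketched roughly
as below. This is not the workflow's code: the function names are
hypothetical, the listing command is passed in as a stand-in for the real
fleet listing, and grep replaces the jq `test()` filter.

```shell
# Hypothetical helper: print the var names that should be pruned.
# $1: newline-separated names currently set on the fleet
# $2: newline-separated names the config file wants
select_prunable() {
    current="$1"
    desired="$2"
    printf '%s\n' "$current" | while IFS= read -r name; do
        # Anchored match: only real host-config vars are candidates.
        printf '%s\n' "$name" | grep -Eq '^(BALENA|RESIN)_HOST_CONFIG_' || continue
        # Keep anything the file still wants (exact, whole-line match).
        printf '%s\n' "$desired" | grep -Fxq -- "$name" && continue
        printf '%s\n' "$name"
    done
}

# Hypothetical wrapper: $1 is the desired name list, the remaining arguments
# are the command that lists current names. The listing is captured into a
# variable first, so a failing command aborts instead of feeding an empty
# stream to the loop.
prune_from() {
    want="$1"
    shift
    listing="$("$@")" || {
        echo "listing failed; refusing to prune" >&2
        return 1
    }
    select_prunable "$listing" "$want"
}
```

With BALENA_HOST_CONFIG_dtoverlay desired, a legacy RESIN_HOST_CONFIG_dtoverlay
is selected for pruning, while BALENA_HOST_CONFIGURATION_BACKUP and supervisor
vars are left alone.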

Also folds the duplicate jq pass over balena-host-config.json into the
single `board_json` resolve.
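
A sketch of what a hard-failing board resolve could look like, assuming jq is
available. Only `.boards[$b]` and the `-ec` flags come from the description
above; the function name and the error message are hypothetical.

```shell
# Hypothetical helper: print the compact JSON object for one board, or fail.
# $1: path to the host-config JSON file, $2: board key
resolve_board() {
    file="$1"
    board="$2"
    # -e: exit non-zero when the result is null or false; -c: compact output.
    # An empty object {} is truthy, so a board with no config.txt knobs passes.
    # On a missing board jq still prints "null", so callers must check the
    # exit status, not just the output.
    jq -ec --arg b "$board" '.boards[$b]' "$file" || {
        echo "no config for board '$board' in $file; refusing to reconcile" >&2
        return 1
    }
}
```

A caller would assign the output with `board_json="$(resolve_board FILE BOARD)"`
and stop on a non-zero status before any upsert or prune runs.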

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs(balena): describe current host config, not the one-time cleanup

Replace the point-in-time "Audit" table (old quoted/broken values, was→fixed
narrative) with a forward-looking per-key rationale. The cleanup history lives
in the PR / git history; the doc should describe what each setting is and why.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* chore(ci): drop the inline rationale comment from the preload step

Move the explanation out of the workflow and into history (this is where
`git blame` on the preload step lands):

`--commit latest` seeds the image with the current release's container
images so a freshly flashed device boots fully offline. We deliberately
do NOT pass `--pin-device-to-release`: pinning (added in #2098) froze
flashed devices to the downloaded release, so they never received OTA
fixes. Anthias balena devices should track the fleet's latest stable
release, so the device joins the fleet unpinned and auto-updates from
here on.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-05-30 19:46:13 +02:00