Viktor Petersson fb4770bfe3 feat(balena): unpinner also rolls OS + supervisor updates (#2984)
* fix(viewer): fix two startup warnings surfaced by the log cleanup

With debug logging removed (#2977), two pre-existing startup warnings
became visible in the viewer logs. Both are fixed here:

- migrate_legacy_paths.sh: under `set -euo pipefail`, the bare `${USER}`
  in USER_HOME aborted the script when $USER is unset, which is the case
  in-container (DATA mode, via migrate_in_container_paths.sh). On a
  legacy (screenly→anthias) device that would skip the in-container
  migration entirely. Fall back to `id -un` so USER_HOME resolves
  without tripping `set -u`.
- AnthiasViewer: the MainWindow ctor called showFullScreen() AND main()
  called window->show(), showing the window twice. Under cage/wayland
  the double surface-commit tripped wlroots' "A configure is scheduled
  for an uninitialized xdg_surface" warning at startup. Show once, in
  main(), after construction.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(balena): unpinner also rolls OS + supervisor updates

bin/balena_unpin_devices.py cleared the app-release pin but left
unpinned devices on whatever older balenaOS / supervisor they were running.
Add two optional, cloud-API-only phases so the one hourly job brings
the fleet fully current:

- --os-update: start a resinhup toward the latest balenaOS for the
  device type (POST actions/<base>/v2/<uuid>/resinhup, base resolved
  from /config). Online devices only; skips devices below OS 2.14.0
  (the single-hop HUP floor per balena-hup-action-utils); bounded to
  --os-percent (default 5%) of the eligible population per run, so the
  backlog ramps up gradually instead of stampeding.
- --supervisor: point devices at the newest supervisor release for
  their CPU architecture (PATCH should_be_managed_by__release).
  Restricted to devices already on the target OS so the supervisor/OS
  pairing stays compatible, and tranche-bounded like the OS phase.
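Both phases reduce to a per-device filter plus a bounded tranche. A minimal sketch of that selection logic follows. It assumes a hypothetical `Device` record, version strings that contain a MAJOR.MINOR.PATCH triple, and a tranche rounded down with a minimum of one device, picked by random sampling; none of these details are taken from the script itself.

```python
import math
import random
import re
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

HUP_FLOOR = (2, 14, 0)  # single-hop HUP floor quoted in the commit


@dataclass
class Device:
    # Hypothetical record shape; the script's real API objects are not shown.
    uuid: str
    is_online: bool
    os_version: str  # e.g. "balenaOS 5.1.20", "balenaOS 2025.1.0"
    supervisor_version: str
    keep_pinned: bool = False  # stand-in for the anthias_keep_pinned opt-out


def parse_version(text: str) -> Optional[Tuple[int, ...]]:
    """Extract the first MAJOR.MINOR.PATCH triple as a comparable tuple."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    return tuple(int(g) for g in match.groups()) if match else None


def os_update_candidates(devices: Sequence[Device], target_os: str) -> List[Device]:
    """Online, not opted out, at or above the HUP floor, and below the target OS."""
    target = parse_version(target_os)
    picked = []
    for d in devices:
        current = parse_version(d.os_version)
        if d.keep_pinned or not d.is_online or current is None or target is None:
            continue
        if HUP_FLOOR <= current < target:
            picked.append(d)
    return picked


def supervisor_candidates(
    devices: Sequence[Device], target_os: str, target_supervisor: str
) -> List[Device]:
    """Only devices already on the target OS, so the OS/supervisor pair stays matched."""
    target = parse_version(target_os)
    return [
        d
        for d in devices
        if not d.keep_pinned
        and target is not None
        and parse_version(d.os_version) == target
        and d.supervisor_version != target_supervisor
    ]


def tranche(eligible: Sequence[Device], percent: float, rng: random.Random) -> List[Device]:
    """Cap one run at `percent`% of the eligible population (at least one device)."""
    if not eligible:
        return []
    size = max(1, math.floor(len(eligible) * percent / 100))
    return rng.sample(list(eligible), min(size, len(eligible)))


def supervisor_patch_body(release_id: int) -> dict:
    # Field name from the commit; sent as the body of a PATCH on the device resource.
    return {"should_be_managed_by__release": release_id}
```

Random sampling spreads each run across the backlog; a deterministic order (for example by uuid) would work equally well and make runs reproducible.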

Both phases honour the anthias_keep_pinned opt-out, stay dry-run by
default, and keep the aggregate-only public-log output. The hourly
workflow now passes --os-update --supervisor.
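The flag surface can be sketched with argparse. `--os-update`, `--supervisor`, and `--os-percent` (default 5) come from the commit; `--apply` is a hypothetical name for whatever switch turns off the default dry-run.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Unpin Anthias balena devices (sketch).")
    parser.add_argument("--os-update", action="store_true",
                        help="start a resinhup toward the latest balenaOS")
    parser.add_argument("--supervisor", action="store_true",
                        help="point devices at the newest supervisor release")
    parser.add_argument("--os-percent", type=float, default=5.0,
                        help="max percent of eligible devices touched per run")
    # Hypothetical flag: without it, every phase only reports what it would do.
    parser.add_argument("--apply", action="store_true",
                        help="perform writes instead of a dry run")
    return parser


def parse(argv: Optional[List[str]] = None) -> argparse.Namespace:
    args = build_parser().parse_args(argv)
    args.dry_run = not args.apply  # dry-run unless writes are explicitly enabled
    return args


if __name__ == "__main__":
    # The phase flags the hourly workflow passes.
    print(parse(["--os-update", "--supervisor"]))
```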

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(balena): skip supervisor bump on legacy-OS device types

getSupervisorReleasesForCpuArchitecture returns the newest supervisor
for an arch regardless of OS. On a device type frozen on a legacy
balenaOS line that's unsafe: raspberry-pi2 tops out at balenaOS 5.1.x
(devices run supervisor 15.x), and pointing them at 17.x would likely
break them. Only run the supervisor phase when the fleet's target OS is
on the calendar-versioned line (major >= 2025); legacy-OS fleets keep
their OS-matched supervisor.
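The gate reduces to a major-version check on the fleet's target OS. A sketch, assuming target versions are strings containing a MAJOR.MINOR.PATCH triple (for example `2025.1.0` versus `5.1.20`); a target that cannot be parsed is treated as legacy, so the supervisor is left alone.

```python
import re
from typing import Optional

CALENDAR_MAJOR_MIN = 2025  # calendar-versioned balenaOS line, per the commit


def os_major(version: str) -> Optional[int]:
    """Return the major component of the first MAJOR.MINOR.PATCH triple, if any."""
    match = re.search(r"(\d+)\.\d+\.\d+", version)
    return int(match.group(1)) if match else None


def supervisor_phase_allowed(target_os: str) -> bool:
    """Bump the supervisor only when the fleet's target OS is on the calendar line."""
    major = os_major(target_os)
    return major is not None and major >= CALENDAR_MAJOR_MIN
```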

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(balena): set a User-Agent so Cloudflare allows the resinhup call

The device-actions host (actions.balena-devices.com) is behind
Cloudflare, which rejects the default Python-urllib User-Agent with a
403 as a banned client signature (error 1010), so every OS-update POST
failed. Send a descriptive User-Agent on all requests; the resinhup
action then triggers normally (HTTP 202).
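A minimal request builder illustrates the fix: every call carries an explicit User-Agent instead of urllib's default `Python-urllib/3.x`. The agent string and the Bearer-token header are assumptions for illustration, and the caller supplies the full endpoint URL (the resinhup path stays as described above).

```python
import json
import urllib.request
from typing import Optional

# Hypothetical agent string: any descriptive, non-default value avoids the
# Python-urllib signature that Cloudflare blocks.
USER_AGENT = "anthias-balena-unpinner/1.0"


def build_request(
    url: str, token: str, method: str = "GET", payload: Optional[dict] = None
) -> urllib.request.Request:
    """Build a request that always sends an explicit User-Agent."""
    data = None if payload is None else json.dumps(payload).encode()
    headers = {
        "Authorization": f"Bearer {token}",  # assumed auth scheme
        "Content-Type": "application/json",
        "User-Agent": USER_AGENT,
    }
    return urllib.request.Request(url, data=data, headers=headers, method=method)
```

`urllib.request.urlopen(req, timeout=30)` sends it; a Cloudflare 403 would surface as `urllib.error.HTTPError`.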

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(balena): report actual per-fleet OS-update started/failed counts

The per-fleet line printed the tranche size regardless of how many
resinhup calls actually succeeded. Track started/failed per fleet and
print both, so a few transient busy/offline failures are visible
instead of hidden behind the planned count.
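Per-fleet accounting can be sketched as follows: each call's outcome is counted, so the printed line reflects what actually happened rather than the planned tranche size. `start_hup` is a hypothetical callable wrapping the resinhup POST; treating `OSError` (which `urllib.error.URLError` and `HTTPError` subclass) as a failure is an assumption.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Tuple


def trigger_tranche(
    tranche: Iterable[Tuple[str, str]],
    start_hup: Callable[[str], None],
) -> Dict[str, Dict[str, int]]:
    """Count started/failed resinhup calls per fleet; `tranche` yields (fleet, uuid)."""
    counts: Dict[str, Dict[str, int]] = defaultdict(lambda: {"started": 0, "failed": 0})
    for fleet, uuid in tranche:
        try:
            start_hup(uuid)
        except OSError:  # e.g. device busy or gone offline since planning
            counts[fleet]["failed"] += 1
        else:
            counts[fleet]["started"] += 1
    return dict(counts)


def report(counts: Dict[str, Dict[str, int]]) -> None:
    # Aggregate-only output: one line per fleet, no device identifiers.
    for fleet in sorted(counts):
        c = counts[fleet]
        print(f"{fleet}: os-update started={c['started']} failed={c['failed']}")
```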

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 07:24:43 +02:00