Anthias

mirror of https://github.com/Screenly/Anthias.git synced 2026-06-10 09:08:09 -04:00

Author	SHA1	Message	Date
Viktor Petersson	10c68b26cc	feat(viewer,build,balena): add arm64/Qt6 pi3-64 board and the Rock Pi 4 fleet; keep 32-bit pi3 as legacy (#2985 ) * feat(viewer,build): add arm64/Qt6 pi3-64 board; keep 32-bit pi3 as legacy Revises issue #2906 Phase 2. The original plan (delete the Qt 5 toolchain, force Pi 2/Pi 3 onto Qt 6) is abandoned: Qt 5 was fixed up on master and stays. Instead, add a NEW board target `pi3-64` — a 64-bit (arm64) Qt 6 viewer image for Raspberry Pi 3 hardware on a 64-bit OS — as its own image stream, disk image, and balena fleet. The legacy 32-bit armhf/Qt5 `pi3` board is left untouched and flagged as legacy/maintenance. pi3-64 mirrors the existing `pi4-64` path (Qt 6, eglfs_kms; video played in-process by AnthiasViewer's QtMultimedia pipeline — QMediaPlayer + the ffmpeg/libavcodec backend with V4L2 HW decode, no external player). VideoCore IV is H.264-only HW decode. Board selection is by `uname -m`: a Pi 3 on a 64-bit OS gets `pi3-64`, a 32-bit OS keeps `pi3` (the model string is identical on both arches). - image_builder: pi3-64 build params (arm64) + is_qt6; constants. - Dockerfile.viewer.j2 + start_viewer.sh: pi3-64 shares the pi4-64 eglfs KMS path; renamed board-agnostic eglfs-kms-pi4.json -> eglfs-kms.json. - Detection: install.sh / upgrade_containers.sh (aarch64 Pi 3 -> pi3-64). - Runtime: media_player force_mpv set (selects MPVMediaPlayer, the QtMultimedia D-Bus shim); processing codec grid {'h264'}. - CI: docker-build matrix + mirror-latest-tags. - Balena (fleet screenly_ose/anthias-pi3-64, device type raspberrypi3-64): disk-image + manual-deploy workflows, balena_ota_deploy.sh, balena_fleet_maintenance.py, balena_unpin_devices.py, deploy_to_balena.sh, balena-host-config.json. - Pi Imager: SUPPORTED_BOARDS += pi3-64 (non-maintenance); pi3 stays legacy. - Docs + tests. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * docs(website): link the Pi 3 (64-bit) bullet like its siblings Copilot review: the list is introduced as 'links to the images', so the new pi3-64 entry should be navigable like the surrounding bullets. Link the label to the release-images section. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * feat(balena): add the Rock Pi 4 fleet (screenly_ose/anthias-rockpi4) Wires the anthias-rockpi4 balena fleet (device type rockpi-4b-rk3399) into the OTA deploy + disk-image pipeline. The fleet has no board-specific image build: it runs the generic arm64 containers, so bin/balena_ota_deploy.sh / bin/deploy_to_balena.sh map the rockpi4 board to the <short-hash>-arm64 image tags (and strip the /dev/vchiq mount — no VideoCore on RK3399), and the disk-image preflight verifies the arm64 images exist. Root-cause fix for the fleet's codec gate: balena ships no anthias_host_agent service, so host:board_subtype was never published and resolve_device_key() stayed 'arm64' — whose HW-decode set is empty, rejecting every video upload. The model-string → subtype table moves to the dependency-free anthias_common.device_helper.detect_board_subtype (single source, imported by host_agent), and anthias_common.board.get_board_subtype now falls back to reading /proc/device-tree/model in-container when Redis has no value. The device tree is kernel-global — the same mechanism get_device_type has always used for Pi detection — so the rockpi4 fleet resolves its {h264, hevc} envelope without a host-side daemon, and compose installs whose host_agent died self-heal too. - build-balena-disk-image.yaml: rockpi4 in both matrices, fleet + rockpi-4b-rk3399 image cases, arm64 images in the preflight check. - deploy-balena-manual.yaml: rockpi4 board option. - balena-host-config.json: rockpi4 declared {} (config.txt is RPi-only; the reconcile hard-fails on a missing key). - balena_fleet_maintenance.py / balena_unpin_devices.py: fleet added. - tests: get_board_subtype Redis-first + device-tree-fallback order; detect_board_subtype patch targets follow the move. - docs: board-enablement, balena-fleet-host-config, installation-options. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-07 07:49:12 +02:00
Viktor Petersson	7fc57fecf0	fix(docker): pull the BuildKit frontend via mirror.gcr.io (#3008 ) * fix(docker): pull the BuildKit frontend via mirror.gcr.io The `# syntax=docker/dockerfile:1.4` directive made every image build fetch the frontend from registry-1.docker.io — the last remaining Docker Hub dependency (base images already come from mirror.gcr.io, bun/uv from ghcr.io). Docker Hub pulls from shared GitHub runner IPs intermittently time out, failing CI before the build even starts. Re-point the directive at Google's pull-through cache, which serves the same multi-arch manifest list. The version pin stays for frontend reproducibility. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * chore(docker): bump the BuildKit frontend pin from 1.4 to 1.24 1.4 dates to May 2022; 1.24 is the current release. Nothing in the templates needs newer syntax (--mount=type=cache predates 1.4), so this is purely picking up four years of frontend bugfixes. Keeps the minor-pin convention — the tag floats only over patch releases. Validated by building the rendered redis image against the mirrored 1.24 frontend. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(docker): use ENV key=value form flagged by 1.24 build checks `docker build --check` with the 1.24 frontend flags the legacy `ENV DEBIAN_FRONTEND noninteractive` form (LegacyKeyValueFormat) in the test template — the only hit across all four templates. All rendered Dockerfiles now lint clean against the new frontend. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-07 07:41:21 +02:00
Viktor Petersson	1568e9e7e0	fix(redis): persist data to the mounted volume so device identity survives recreation (#2983 ) * fix(redis): persist data to the mounted volume so device identity survives recreation redis-server was launched with no config file, so `dir` defaulted to the process CWD (/) and RDB snapshots were written to the container's ephemeral writable layer — never the redis-data volume mounted at /var/lib/redis. Every container recreation (a version deploy, image update, or `compose down`) therefore wiped Redis, including the telemetry `device_id` used as the GA4 client_id and its 24h cooldown. The result was that GA counted the same physical device as a brand-new one on every upgrade. Start redis-server with explicit flags instead: --dir pins data onto the mounted volume, --appendonly yes persists the (rare) device_id write within ~1s via the AOF (RDB save points alone wouldn't catch a recreation inside a save window), and the RDB save points are kept as a belt-and-braces snapshot. --protected-mode no preserves the existing cross-container access. The two sed edits to /etc/redis/redis.conf are dropped — that file was never loaded, so they were no-ops. This fixes both deployments: the redis-data volume is already mounted in docker-compose.yml.tmpl and docker-compose.balena.yml.tmpl, and named volumes persist across recreation (docker) and OTA releases (balena). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * refactor(redis): write each --save rule as its own flag No behaviour change — redis-server parses a single `--save "3600 1 300 100 60 10000"` arg into the same three snapshot rules (verified: `config get save` returns the identical schedule and the server starts cleanly either way). Splitting into one `--save` per seconds/changes pair is the conventional, unambiguous form and addresses review feedback. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-03 06:16:23 +02:00
Viktor Petersson	a0afeb091b	chore(viewer): drop the always-on Qt debug logging from the image (#2977 ) - Remove `QT_LOGGING_RULES=*.debug=true` and `QT_QPA_DEBUG=1` from docker/Dockerfile.viewer.j2 (plus the stale "Turn on debug logging for now" comment + commented-out qt.qpa rule). - These were a temporary bring-up aid ("for now", added in #2060, Nov 2024) that was never reverted: unconditional, every board, in production. On a real device that's ~20+ Qt scenegraph / sh-chunk log lines per second, which saturates balena's 1000-line log buffer in ~35 seconds and buries every application event (asset changes, errors, crashes) — actively harmful to fleet observability. Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-02 18:20:02 +02:00
Viktor Petersson	1f438d2af0	perf(viewer): render video via QML VideoOutput in a QQuickWidget (#2975 ) * perf(viewer): render video via QML VideoOutput in a QQuickWidget - replace the QGraphicsVideoItem-on-raster-QGraphicsView substrate: QVideoFrame::toImage did an RHI offscreen render + GPU->CPU readback per frame, capping presentation at 8.3 fps (Pi 4) / 10-12 fps (Pi 5) with a saturated GUI thread while HW decode ran fine (issue 2967). Validated on both testbeds: Pi 4 30.0 fps presented at 64% total CPU, Pi 5 26.6 fps at 13-35% - VideoOutput keeps frames on the GPU: scene-graph textures with shader YUV->RGB, composited through the same QQuickRenderControl FBO machinery QWebEngineView already uses (eglfs-safe, inherits whole-screen rotation -- re-validated under QT_QPA_EGLFS_ROTATION) - log frames-rendered (QQuickWindow::afterRendering) next to frames-delivered in playback-stats so presentation-side drops are visible -- the sink-only counter is how the 8 fps regression shipped unnoticed; connection is retried from play() so the counter can't silently stay dead - fail hard (qFatal) when the QML scene is unavailable instead of decoding video to nowhere: crash-respawn is supervised and loud, a silent black-screen kiosk is not - video-rotate maps to VideoOutput.orientation (still a defensive no-op; every platform rotates the whole screen) - ship qt6-declarative-dev + qml6-module-qtquick/-qtmultimedia in the Qt6 viewer images; drop the now-unused multimediawidgets - run the C++ tests with QT_QUICK_BACKEND=software so the QML scene loads under the offscreen platform Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * docs(image-builder): align gstreamer-drop version comment to Qt 6.5 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-02 17:06:30 +02:00
Viktor Petersson	e7f34b27e2	fix(docker): pin viewer UID/GID across all images for deterministic ownership (#2958 ) - Create the `viewer` user with a fixed UID/GID (1000) in the shared Dockerfile.base.j2 so it exists, and resolves identically, in the viewer, server and celery images. - Drop the implicit `useradd -g video viewer` from the viewer image (it picked the next free uid per image and was absent from server/celery), keeping `video` as a supplementary group. Without a pinned id, ownership of /data/.anthias (shared across the containers) was non-deterministic, so a `chown viewer …` in one container and the uid a file was written as in another could disagree — a root cause behind the upgraded-device config-permission crash-loop. Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-05-31 13:00:49 +02:00
Viktor Petersson	3091fec349	feat(api,viewer): viewer REST shim + rename AnthiasWebview → AnthiasViewer (#2907 ) * feat(api,viewer): viewer REST shim + rename AnthiasWebview → AnthiasViewer - Add GET /api/v2/viewer/playlist returning server-evaluated active assets, next deadline, and ``now``; gated by internal token. - Add GET /api/v2/viewer/settings exposing only the viewer-relevant settings subset (shuffle/show_splash/screen_rotation/audio_output/ debug_logging) so the internal-auth path doesn't surface operator credentials. - Rename the C++ binary AnthiasWebview → AnthiasViewer (.pro file, Dockerfile copies, sh.Command spawn, test runner) and the D-Bus service anthias.webview → anthias.viewer (atomic because both endpoints ship in the same image). - Migrate runtime state paths /data/.local/share/AnthiasWebview and /data/.cache/AnthiasWebview to AnthiasViewer with a one-shot symlink so existing devices keep QtWebEngine cookies / local- storage across the upgrade. - Source tree src/anthias_webview/ stays put; the directory rename is deferred to Phase 5 when the Python viewer package is deleted. First step of GH #2906; sets up the contract the C++ viewer will consume in Phase 3. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(api,viewer): address review feedback on viewer REST shim - ViewerPlaylistViewV2 now reloads anthias.conf on read so an in-flight settings PATCH doesn't shuffle off a stale cached value — mirrors what ViewerSettingsViewV2 already did. - AssetSerializerV2.get_is_active accepts ``now`` via context so ViewerPlaylistViewV2 can render the ``is_active`` field against the same instant the filter used; closes the millisecond race where a row right on a window boundary could be returned in ``assets`` while its ``is_active`` re-evaluated to False. - Simplify the windowed-deadline-cap test assertion: parse the ISO timestamp and compare datetimes directly instead of the awkward dual-format string-prefix check. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(tests): use https in viewer API fixture URI Silences SonarCloud python:S5332 on tests/test_viewer_api.py. The fixture URIs are never fetched — they just satisfy the ``uri`` field on Asset.objects.create — but matching the existing test_recheck_endpoint.py convention keeps the linter quiet without sprinkling NOSONAR comments through test data. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(viewer): drop QtWebEngine state symlink-migration on rename Validated on real hardware: a fresh AnthiasViewer cache rebuilds itself on the next page load, so the bookkeeping to preserve cookies / local-storage across the AnthiasWebview → AnthiasViewer rename isn't worth the code. Upgraded devices just get fresh state dirs alongside the (now-orphaned) old AnthiasWebview tree. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-19 13:31:01 +02:00
Viktor Petersson	cc92a714e4	feat(viewer,webview): embed QtMultimedia in AnthiasWebview, eliminate two-process DRM contention + Pi 4 drops (#2905 ) * feat(viewer,webview): embed QtMultimedia in AnthiasWebview, eliminate Pi 4 frame drops (#2904) Move video playback inside AnthiasWebview's Qt 6 process via QtMultimedia (QMediaPlayer + QGraphicsVideoItem). The libmpv subprocess goes away — a single Qt process owns the eglfs/wayland surface, so the two-process DRM-master contention #2885 documented (600-2800 vo drops per 60 s clip on Pi 4) no longer applies. The D-Bus contract on MainWindow (playVideo / stopVideo / videoEnded) is preserved so Python still calls a stable interface even though the playback engine swapped underneath. Architecture * src/anthias_webview/src/videoview.{cpp,h} — new VideoView wraps QMediaPlayer + QGraphicsVideoItem + QAudioOutput. Qt 6.5 dropped the upstream gstreamer media backend so Debian Trixie ships only the ffmpeg-backed libffmpegmediaplugin.so; decode runs through libavcodec against the +rpt1 libav* packages already pinned in docker/_rpt1-ffmpeg-pin.j2 (which carry --enable-v4l2-request / --enable-v4l2-m2m so rpi-hevc-dec, bcm2835-codec, Hantro G2, rkvdec all engage automatically). * QGraphicsView + QGraphicsScene + QGraphicsVideoItem (not QVideoWidget) is the rendering substrate so video-rotate actually rotates the displayed frames — QGraphicsItem::setRotation is honoured by the painter, whereas QVideoWidget has no rotation property and a setProperty("rotation", angle) shortcut would store a dynamic value nothing reads. * src/anthias_webview/src/view.cpp — adds playVideo / stopVideo surface-switching alongside loadPage / loadImage; loadImage skips hideVideoSurface() for the 'null' sentinel so a freshly-started video isn't torn down ~66 ms after the first PLAYING event by the view_image('null') call that follows media_player.play() in asset_loop. * src/anthias_viewer/media_player.py — MPVMediaPlayer.play() routes through pydbus to the AnthiasWebview proxy. Per-codec hwdec dispatch + ffprobe codec sniff are gone; libavcodec auto-engages the right decoder. _marshal_dbus_options picks the GLib.Variant signature by Python type so int / bool / float options round-trip cleanly. video-rotate is sent as int. Operational * Pi 4 switches QT_QPA_PLATFORM from linuxfb to eglfs (QtMultimedia needs a GL context for the QGraphicsVideoItem painter). QT_QPA_EGLFS_KMS_CONFIG pins 1080p so V3D 6.0 doesn't have to composite Chromium + the video graphics view on top of the connector's native 4K. QT_SCALE_FACTOR=1 pins CSS-px to physical-px on the 1080p surface. * tools/image_builder/utils.py — drops libmpv2 / mpv from the viewer image, adds libqt6multimedia6 / libqt6multimediawidgets6 / qt6-multimedia-dev / qt6-image-formats-plugins. * /data/.anthias/playback-stats.log (renamed from mpv-stats.log) is capped at 8 MB; truncate on viewer start past the cap so a long- running 15 GB SD-card device can't fill up with 1 Hz SAMPLE rows. * VideoView::resolveAlsaDevice extracts CARD=<name> from the ALSA spec and matches the QAudioDevice id on that segment; logs the resolved id at INFO so multi-HDMI Pi 4 / Pi 5 mismatches are visible from journalctl. Validation Real-device measurements via /data/.anthias/playback-stats.log on the BBB pack (1080p / 4K, 30 / 60 fps, H.264 + HEVC), median across multi-cycle plays in the PR comments. Pi 4 BBB 1080p60 H.264 dropped from 2973 frames/min on the libmpv subprocess baseline to 0 with QtMultimedia. 12 h mixed-media burn-in: zero crashes, zero early- stops, no RSS leak across x86 / Pi 4 / Pi 5. 3 h asset-churn (120 toggles × 3 boards): zero <100 ms stops, drops stable. Rock Pi 4 arm64 image is built and identical to the validated set; the testbed itself is SSH-unreliable so its end-to-end run is deferred. C++ QtTest suite (8 cases) covers VideoView construction, stop idempotency, empty / unknown audio device handling, and QGraphicsItem::rotation() actually receiving the angle for cardinal rotations and snapping non-cardinal angles to 0. Python suite (63 cases) covers options-dict composition, D-Bus marshalling for str / int / bool / float, settings reload, codec gate symmetry, proxy reset, and VLC fallback. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(viewer,webview): polish stale comments from second PR review * MPVMediaPlayer.__init__ comment no longer says the C++ side owns a libmpv handle — it owns QMediaPlayer + QGraphicsVideoItem. * Rename _build_mpv_options to _build_video_options. The function composes options for QtMultimedia now; the "mpv" in the name is vestigial. Class names (MPVMediaPlayer / MediaPlayerProxy) are left alone — those are the public D-Bus contract. * LoadedMedia comment in videoview.cpp now reflects Qt 6's actual semantics: "metadata available, playback can start" — first decoded frame lands a hair later via videoFrameChanged. Starting the elapsed-ms clock here is still a few-ms approximation of "first frame on screen", which is the intent. * _marshal_dbus_options return type tightened from bare ``dict`` to ``dict[str, Any]`` for symmetry with the input annotation. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(tests): marshal test works with real PyGObject + tighten typing CI ships PyGObject so ``gi.repository.GLib`` is the real module — the prior test relied on conftest's MagicMock stub (which only kicks in when ``gi`` is missing) to invoke ``assert_any_call`` on GLib.Variant. On the real Variant class that's an AttributeError. Patch ``gi.repository.GLib.Variant`` to a sentinel-returning callable inside the test scope so the assertions work with either the stub host or the real PyGObject host. The marshal still picks signatures by Python type (``s`` / ``i`` / ``b`` / ``d``); the test now asserts on the per-key tuple rather than the spy. mypy errors: * Narrow ``_last_play_options`` / ``_last_play_uri`` return values via ``isinstance`` so they don't fall through Any (no ``# type: ignore``, no ``cast``). * Add ``gi`` / ``gi.*`` to the mypy-overrides ``ignore_missing_imports`` set so the conftest stub doesn't break the type-check on hosts without PyGObject. --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 13:38:16 +02:00
Viktor Petersson	57b4f25c77	feat(viewer,server): per-board HW decode dispatch + codec gate on upload (#2885 ) * perf(viewer): pi4-64/pi5 use mpv --vo=gpu --gpu-context=drm On Pi the connector's preferred mode is usually 4K (most modern TVs report 3840x2160 in their EDID), and the previous --vo=drm path ran a CPU zimg upscale from 1080p source to that 4K output. On a 4-core A72 that's the bottleneck — mpv VO drops 59-75 frames per 30s on a stock 1080p H.264 signage clip. Pi5's A76 is faster but the same upscale path is still the limit. Switching the VO to GL with the DRM context (mpv --vo=gpu --gpu-context=drm) hands the upscale to the V3D and leaves everything else identical — mpv still owns DRM master, still reads --drm-mode=1920x1080@60 (kept), still runs in --vd-lavc-threads=4 software decode (mpv 0.40 in Debian Trixie has v4l2m2m-copy but not v4l2request, so --hwdec=auto-safe falls back to software on this asset; that hasn't changed). Measured on a 4K-connected Pi4-64 Rev 1.5, same clip, same 30 s window: --vo=drm : 59-75 vo drops / 30 s --vo=gpu --gpu-context=drm (this patch) : 3-6 vo drops / 30 s `decoder-frame-drop-count` is 0 in both — the regression was purely on the VO side, and shifting scaling off the CPU is what buys the headroom. x86 (cage + --gpu-context=wayland) is unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * perf(viewer): drop --drm-mode pin on Pi4-64/Pi5 under --gpu-context=drm The previous commit moved Pi4-64/Pi5 to `mpv --vo=gpu --gpu-context=drm` but kept the `--drm-mode=1920x1080@60` pin from the old --vo=drm path. On-device testing showed the pin hurts throughput under GBM: 294 vo drops/30s with the pin, 3-6 without, on the same 4K-connected Pi4 and the same H.264 clip. The pin existed in the first place to dodge CPU zimg upscale to 4K, which the A72 couldn't keep up with on the legacy --vo=drm path. Under --gpu-context=drm the V3D does the scaling for free at the connector's preferred mode, so the workaround is no longer needed and is in fact harmful. `--vd-lavc-threads=4` stays — software decode under --hwdec=auto-safe (mpv 0.40 has v4l2m2m-copy but not v4l2request) still benefits from explicit threading. Verified on a 4K-connected Pi4-64 across H.264 (30/24 fps) and HEVC clips: 2-6 vo drops/30s in every case. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor(viewer): consolidate Qt6 boards onto cage + Wayland, pin Pi 4 to 1080p Folds in PR #2883: Pi 4-64 / Pi 5 now run under cage with mpv on --vo=gpu --gpu-context=wayland, joining x86 and arm64 on a single Wayland-based display stack. Drops the --vo=drm legacy path entirely from MPVMediaPlayer. Qt 5 boards (pi2 / pi3) stay on linuxfb via VLCMediaPlayer — out of scope here. Replaces the perf branch's `--vo=gpu --gpu-context=drm` standalone fix with the consolidated cage path. The previous standalone finding (3-6 vo drops / 30 s on Pi 4 at 4K) was a Pi-without-cage optimization; once Pi runs under cage like every other Qt6 board, the same trick applies via wayland but cage's composite step adds its own pass and the V3D on Pi 4 can't keep up at 4K (738 vo drops / 30 s measured at native 4K under cage). Fix: move the 1080p mode pin one layer up from app code to host config — the new ansible/.../cmdline.txt.j2 conditional appends `video=HDMI-A-1:1920x1080@60 video=HDMI-A-2:1920x1080@60` when `device_type == 'pi4-64'`. With output pinned to 1080p there's no upscale anywhere in the pipeline, matching the bandwidth profile of today's --vo=drm production setup. Pi 5 / x86 / arm64 keep the connector's preferred mode (typically 4K). Pi 5's V3D 7.1 has roughly 2× Pi 4's throughput; x86 iGPUs handle 4K via VAAPI; arm64 SBC perf varies by SoC. Other notable changes folded in from #2883: * tools/image_builder/utils.py — `cage` + `qt6-wayland` move out of the per-board branch into the shared is_qt6 block. `wlr-randr` (was x86-only) goes in the shared block too since rotation now happens via wlr-randr on every Qt6 board. `va-driver-all` stays x86-only (no VAAPI on Pi / ARM SoCs). * docker/Dockerfile.viewer.j2 — QT_QPA_PLATFORM=wayland gated on is_qt6 instead of board in ('x86', 'arm64'). * bin/start_viewer.sh — case on DEVICE_TYPE: every Qt6 board takes the cage + sudo path. Pi2 / Pi3 stay on the legacy direct-sudo path. * src/anthias_viewer/media_player.py — single --vo=gpu --gpu-context=wayland for all reachable device types. The per-board rotate_args block is gone: every Qt6 device inherits the transform from cage via wlr-randr, so mpv would double-rotate if it set --video-rotate. * tests/test_media_player.py — parametrised tests for all four Qt6 boards (x86, arm64, pi4-64, pi5) hitting the same VO path; rotation tests assert mpv never sets --video-rotate under cage. * website/data/faq.yaml — rotation entry points at Settings page / wlr-randr; resolution entry calls out the Pi 4 1080p pin. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(ansible): propagate tags into boot.yml include_tasks The `Configure boot partition` task in system/tasks/main.yml was tagged `touches-boot-partition` / `raspberry-pi` but those tags weren't propagated to the tasks inside boot.yml — Ansible's default include_tasks behaviour matches the include against --tags but leaves the included tasks tag-less, so they get filtered back out. Running `ansible-playbook ... --tags touches-boot-partition` therefore did nothing. Use the explicit `apply: tags:` form so the include's tags are copied onto each task in boot.yml. With this, the standalone "re-render boot config" workflow actually works, which matters on Pi 4 now that the 1080p HDMI mode pin in cmdline.txt.j2 needs to land without re-running the whole playbook. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(viewer): keep Pi 4 on linuxfb; only Pi 5 / x86 / arm64 go cage On-device testing on a Pi 4 Model B Rev 1.5 with a 4K HDMI display showed cage+wayland is fundamentally too heavy for the V3D 6.0: --vo=drm (existing, no cage) : 59-75 drops/30s --vo=gpu --gpu-context=drm (no cage, GPU scale): 3-6 drops/30s --vo=gpu --gpu-context=wayland (cage, even at : 730+ drops/30s, 1080p HDMI cmdline pin to avoid 4K scale) mpv at 99% CPU running ~1/4× real time The 1080p HDMI pin doesn't recover Pi 4 — cage's composite pass costs more than the V3D 6.0 has spare bandwidth for, regardless of output resolution, with the webview running in the background or not. Pi 5's V3D 7.1 has roughly 2× the throughput and is expected to keep up; x86 / arm64 already shipped on cage and remain unchanged. Net result: * Pi 4-64 stays on Qt linuxfb (no compositor) with mpv on --vo=gpu --gpu-context=drm. mpv writes straight to KMS via libgbm and lets the V3D do video scaling — keeping the standalone perf-branch finding that drops from 59-75 → 3-6 on the same clip. * Pi 5 / x86 / arm64 stay (or move) onto cage + qt6-wayland + wlr-randr with mpv on --vo=gpu --gpu-context=wayland. * Pi 2 / Pi 3 stay on the Qt5 + VLC + linuxfb track they were already on. * The Pi 4 1080p HDMI cmdline pin added in the previous commit is reverted (no longer needed without cage). * Rotation handling: mpv emits --video-rotate=N on Pi 4 (no compositor to apply the transform) and skips it on the cage boards (wlr-randr handles it there). Goal-wise this is the partial-consolidation we agreed to as last resort: three of four Qt6 boards share one Wayland stack, Pi 4 keeps the framebuffer path for as long as the V3D 6.0 + mpv 0.40 combo lacks the headroom. Pi 4 remains in scope for revisiting once mpv ships the v4l2request hwdec. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(viewer): mirror host render-GID for all Qt 6 boards, not just cage mpv uses /dev/dri/renderD128 for --vo=gpu on every Qt 6 board now — wayland (cage path on x86 / arm64 / pi5) and drm (linuxfb path on Pi 4) both go through Mesa GL. The render-GID mirror was inside the cage branch of start_viewer.sh, so Pi 4's mpv ran as viewer user, hit the render node owned by GID 992, got "Permission denied", and bailed with "Failed initializing any suitable GPU context!". Hoist the render-GID setup above the per-board case so it runs for every Qt 6 board. cage / linuxfb branching stays as-is. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(viewer): Pi 4 stays on --vo=drm (Qt linuxfb DRM master contention) Earlier commits switched Pi 4 to mpv --vo=gpu --gpu-context=drm based on a 3-6 vo-drop/30 s measurement. That test was run as root in a fresh container — no Qt linuxfb in the picture. In the production viewer where AnthiasWebview holds the framebuffer via Qt linuxfb, --vo=gpu fails: failed to open /dev/dri/renderD128: Permission denied [vo/gpu/drm] Failed to acquire DRM master: Permission denied [vo/gpu] Failed initializing any suitable GPU context! Error opening/initializing the selected video_out (--vo) device. Video: no video Mesa GBM holds DRM master persistently and contends with Qt linuxfb's framebuffer use. mpv's classic --vo=drm has its own master juggling (briefly grab → render → drop) that coexists fine with linuxfb — that's why master's existing Pi 4 config works. Revert Pi 4 mpv flags to the production master config: --vo=drm --drm-mode=1920x1080@60 --vd-lavc-threads=4 The standalone perf-finding from this branch's earlier history turns out not to apply in production; retracted from the roll-up. Pi 5 / x86 / arm64 unchanged (they're on cage + --vo=gpu --gpu-context=wayland, which has its own DRM master flow via cage). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(viewer): cage opens on the first connected connector, not HDMI-A-1 Without `-o`, cage uses whatever output the DRM backend enumerates first — typically HDMI-A-1 on Pi 5 (closer to USB-C) and the on-board panel / first HDMI on x86 / arm64. If the operator plugs into the other port (Pi 5 HDMI-A-2, or any DP connector on x86), cage renders to a disconnected connector and the screen stays black. start_viewer.sh now iterates /sys/class/drm/card-, picks the first connector whose status reads "connected", strips the cardN- prefix to get the bare name cage expects (HDMI-A-1, HDMI-A-2, DP-1, eDP-1, …), and passes it via `-o`. Falls back to letting cage pick if nothing is connected yet — the display may come up via HPD after cage starts, or this is a build/CI host with no display at all. Caught while end-to-end testing on the rig: Pi 5 cable on HDMI-A-2 went to a black screen even though `cat /sys/class/drm/card1-HDMI-A-2/status` reported "connected" and cage / the viewer were running. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(viewer): mpv from apt.raspberrypi.com on Pi 4 / Pi 5, hwdec auto-copy Stock Debian Trixie's mpv 0.40 is compiled without `v4l2request` hwdec, so Pi 5's Hantro stateless decoder is invisible to it and mpv falls back to software decode for every H.264 / H.265 source. Pi 4's V4L2 M2M decoder is reachable via `v4l2m2m-copy` but mpv's `--hwdec=auto-safe` whitelist explicitly excludes that method, so auto-detect picked software there too. Two changes, applied together because they only make sense together: * Pi 4 / Pi 5 viewer images now pull mpv (and the FFmpeg library family it depends on) from `archive.raspberrypi.com/debian trixie main`. The Pi-tuned build ships `v4l2request` hwdec (Pi 5) and a maintained `v4l2m2m-copy` (Pi 4). An apt-pin restricts the Pi repo to the mpv + libav* packages only, so curl / ca-certificates / etc. continue to come from stock Debian and the rest of the image stays on the same baseline. * `MPVMediaPlayer.play()` switches `--hwdec=auto-safe` → `--hwdec=auto-copy`. auto-copy is the same family but with a broader whitelist that includes the v4l2-family copy hwdecs. Net effect: x86 still picks vaapi-copy (unchanged), Pi 4 picks v4l2m2m-copy, Pi 5 picks v4l2request, arm64 falls through to software (no v4l2request in stock Debian mpv, no vendor-tuned Rockchip plugin in stock either — Tier-2 follow-up). Plus an `ANTHIAS_DEBUG_DROPS=1` env knob: when set on the viewer container, mpv's stdout/stderr go to `/data/.anthias/mpv.log` (host-bound) instead of `/dev/null`, and `--no-terminal` is dropped so the status line ("AV: ... Dropped: N") is emitted. Lets us read per-asset frame-drop counts straight from the production viewer pipeline (no custom harness, no rebuild) during the test-grid runs. Default (unset) preserves the silent behaviour. Also: drops the `cage -o <connector>` autodetect attempt — cage 0.1.x in Trixie doesn't accept `-o`, just `-m last`. Use that instead so cage opens on the most-recently-connected output regardless of HDMI-A-N enumeration order. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(viewer): use deb-packaged Pi keyring for archive.raspberrypi.com apt update against http://archive.raspberrypi.com/debian trixie was failing in the Pi 4 / Pi 5 viewer image builds: Sub-process /usr/bin/sqv returned an error code (1): Signing key on CF8A1AF502A2AA2D763BAE7E82B129927FA3303E is not bound: No binding signature at time … Policy rejected non-revocation signature (PositiveCertification) requiring second pre-image resistance SHA1 is not considered secure since 2026-02-01 Pi's bare `raspberrypi.gpg.key` URL still serves the original 2012-vintage RSA 2048 key with SHA1 binding signatures that Trixie's sqv refuses to certify under the post-2026-02-01 crypto policy. The deb-packaged keyring inside `raspberrypi-archive-keyring_2025.1+rpt1_all.deb` ships the same key fingerprint but with rebuilt binding signatures that sqv accepts — that's the keyring Pi OS Trixie itself installs, which is why `apt update` against this exact repo works on a real Pi 5 device today. Fetch the deb directly with curl, extract its bundled `.pgp` keyring, and point `signed-by=` at the installed copy. The pin block restricts what packages the Pi repo can supply (mpv + libav* + ffmpeg + libpostproc — the FFmpeg family), so the rest of the image keeps its stock-Debian baseline. Also extend the pin to cover libpostproc* and ffmpeg, since mpv's apt deps drag those into the Pi-tagged version on install; without the pin extension, apt rejected the resolve with "broken packages". Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(viewer): per-codec hwdec on Pi via Lua hook mpv 0.40's `--hwdec` accepts a single value at startup, so we can't ask it to try v4l2m2m-copy for H.264 and drm-copy for HEVC out of the box. The Pi-tuned mpv from archive.raspberrypi.com supports both hwdec methods but each covers a different codec subset: * v4l2m2m-copy — Pi 4's V3D V4L2 M2M decoder. H.264 works; Pi 5's Hantro G2 is V4L2-stateless-only so this no-ops there. * drm-copy — FFmpeg's `v4l2_request_hevc` hwaccel. HEVC only, works on both Pi 4 and Pi 5. Add a small `on_load` Lua hook (inlined as `_PI_HWDEC_LUA`, written to /tmp on first play(), loaded with `--script=`) that checks `video-codec-name` and picks the right hwdec at file open. Net effect: Pi 4 H.264 → v4l2m2m-copy (HW) Pi 4 HEVC → drm-copy (HW) Pi 5 H.264 → v4l2m2m-copy (no device, falls back to SW — only path until mpv re-adds v4l2_request_h264 hwdec) Pi 5 HEVC → drm-copy (HW) The base `--hwdec=auto-copy` startup value still applies on x86 / arm64 (vaapi-copy on Intel/AMD; software fall-back on Rockchip), where the hook isn't loaded. Verified on real hardware: $ mpv ... --script=/tmp/anthias-pi-hwdec.lua test_hevc.mp4 [pi-hwdec] codec=hevc -> hwdec=drm-copy Using hardware decoding (drm-copy). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(viewer,server): HW-decode everywhere on Pi 4 / Pi 5 / x86 The previous per-codec Lua hook in media_player.py was a silent no-op: mpv's video-codec-name property is empty at every script event before hwdec init (on_load, on_preloaded), so --hwdec=auto-copy leaked through. auto-copy's upstream whitelist excludes v4l2m2m-copy, so H.264 on Pi 4 fell back to software despite the V3D V4L2 M2M decoder being available. Viewer (src/anthias_viewer/media_player.py) - Replace the Lua hook with ffprobe-driven dispatch from Python at launch time. ffprobe is in the viewer image; the call is ~50 ms. - Per-board mapping: Pi 4 → {h264: v4l2m2m-copy, hevc: drm-copy}; Pi 5 → {hevc: drm-copy}. Pi 5 H.264 falls back to auto-copy because mpv has no v4l2-request H.264 hwdec for the Hantro G1, and passing v4l2m2m-copy there just logs "Could not find a valid device" before SW-falling-back. - Live-verified on Pi 4: "Using hardware decoding (v4l2m2m-copy)" for 1080p H.264 and "Using hardware decoding (drm-copy)" for HEVC at 1080p and 4K. Asset processor (src/anthias_server/processing.py) - Pi 5 profile drops H.264 from passthrough_video_codecs — Pi 5 has no mpv H.264 HW path, so H.264 uploads must transcode to HEVC at upload time to keep the HW-decode-everywhere contract. - Pi 4 profile adds passthrough_video_max_pixels for H.264, capped at 1080p (19201080). 4K H.264 clears the codec gate but the V3D H.264 envelope tops at 1080p60, so the cap forces it through a libx265 re-encode at upload time. HEVC keeps no cap (the dedicated HEVC block handles 4Kp60). - _ffprobe_summary now returns video_pixels alongside codec / container / audio_codec; _video_can_passthrough enforces the per-codec pixel cap when the profile declares one. Tests - test_media_player.py: new per-board hwdec tests (Pi 4 H.264 → v4l2m2m-copy; Pi 5 H.264 → auto-copy; both → drm-copy for HEVC; auto-copy fallback when ffprobe fails; no probe on x86 / arm64). - test_processing.py: matrix tests updated to include video_pixels; parametrised rows now exercise Pi 5 H.264-no-passthrough and the Pi 4 4K H.264 cap. New end-to-end tests prove _run_video_normalisation transcodes Pi 5 H.264 → HEVC and Pi 4 4K H.264 → HEVC. Docs (docs/board-enablement.md, new) - Goal + per-board HW-decode capability table. - Asset processor codec policy spelled out as a contract. - BBB test bed recipe (source clips, libx265 transcode commands, ANTHIAS_DEBUG_DROPS=1, mpv.log slicing). Follow-up: Pi 5 4K HEVC HW The Hantro G2 decoder can't allocate 4K dst buffers from Pi 5's default 64 MB CMA ("v4l2_request_hevc_start_frame: Failed to get dst buffer") and SW-falls-back. Adding cma=512M to the kernel cmdline does NOT work — the kernel takes the cmdline value over the device-tree linux,cma node, orphaning rpi-hevc-dec ("Failed to probe hardware -517") and unpopulating /dev/video, which kills HEVC HW at every resolution. The right fix is a dtparam/dtoverlay in /boot/firmware/config.txt that resizes the existing DT-declared region without orphaning the codec's reserved-mem reference. Until that lands, the pi5 profile should downscale 4K → 1080p HEVC. Documented in cmdline.txt.j2 and docs/board-enablement.md. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(viewer,server): mock _probe_video_codec; fix mypy on Popen IO types CI failures on the previous commit (`bb27b186`) came from: * ``subprocess.run`` inside ``_probe_video_codec`` blowing up under the existing ``mpv`` fixture, which patches ``subprocess.Popen`` to a MagicMock. ``subprocess.run`` internally instantiates Popen for the ffprobe shellout, gets a MagicMock back, then trips on unpacking communicate()'s result. Fixed by default-mocking ``_probe_video_codec`` in the fixture (returns '' so dispatch falls back to 'auto-copy', preserving legacy assertions) and layering the same mock onto the standalone rotation tests that build MPVMediaPlayer outside the fixture. * ``ruff format``: the multi-line ffprobe arg list in ``_probe_video_codec`` needed splitting one-arg-per-line. * ``mypy``: typing the popen_stdout / popen_stderr locals as ``object`` couldn't satisfy any Popen overload. Switched to ``int \| IO[bytes]`` which covers both the DEVNULL / STDOUT sentinels and the bind-mounted mpv.log file handle. * ``test_passthrough_containers_match_real_ffprobe_format_names`` was pinned to the pi5 profile to exercise the H.264 + HEVC passthrough path; pi5 no longer passthroughs H.264, and the fake summary it constructs has no width/height (so pi4-64's cap fails it too). Switched the pin to x86, which has no per-codec caps — the test is about container recognition, not codec/resolution gating. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(server): downscale 4K HEVC → 1080p on Pi 5 (CMA workaround) Pi 5's Hantro G2 HEVC decoder is rated for 4Kp60 but the stock 64 MB CMA on Pi OS can't fit a 4K HEVC dst-buffer pool — at 4K mpv hits ``v4l2_request_hevc_start_frame: Failed to get dst buffer`` and silently SW-falls-back. Bumping cma= on the kernel cmdline orphans ``rpi-hevc-dec`` entirely (the kernel takes the cmdline value over the device-tree linux,cma node, leaving the driver returning ``Failed to probe hardware -517``), so the kernel-side knob isn't available without a dtoverlay change. Until that follow-up lands, the asset processor caps Pi 5 HEVC at 1080p both ways: * ``passthrough_video_max_pixels`` gates 4K HEVC uploads out of passthrough — anything wider than 1920×1080 falls through to a re-encode. * New ``transcode_video_max_pixels`` per-codec field tells ``_transcode_to_target`` to emit a ``-vf scale='if(gt(ih,1080),-2,iw)':'min(ih,1080)'`` filter that caps height at the 16:9 budget (cap_h = floor(sqrt(cap × 9/16))). Portrait 4K → 1080p height; landscape 4K → 1920×1080. Sub-1080p sources are untouched (the ``min()`` guard prevents upscale; ``-2`` on width keeps libx265 happy with even dimensions). Pi 4 / x86 don't carry the cap (their HW decoders handle 4Kp60 cleanly), so the filter stays absent from those profiles. Tests cover (a) the new pi5+hevc+4K row in the parametrised passthrough matrix (False at 4K, True at 1080p), (b) ffmpeg argv shape: -vf scale=... emitted for pi5 HEVC, absent for pi4-64 HEVC. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(viewer,system): Pi 5 4K HEVC HW + display-resampled VO sync Two tied changes that move every supported board to clean HW decode at the source's actual framerate. Pi 5 4K HEVC via cma-512 ------------------------ Pi OS for Pi 5 reserves 64 MB of CMA by default. The Hantro G2 HEVC decoder needs a buffer pool large enough to hold several 4K dst frames (each ~12 MB) plus reference frames, so the stock allocation can fit 1080p HEVC but not 4K — at 4K mpv hits ``v4l2_request_hevc_start_frame: Failed to get dst buffer`` and silently SW-falls-back. Adding ``cma=512M`` to /boot/firmware/cmdline.txt does NOT work: the kernel takes the cmdline value over the device-tree ``linux,cma`` node, which orphans ``rpi-hevc-dec`` entirely (returns ``Failed to probe hardware -517`` and ``/dev/video`` disappears, killing HEVC HW at every resolution). The Pi-OS-blessed merge is ``dtoverlay=vc4-kms-v3d,cma-512`` in /boot/firmware/config.txt — the v3d overlay carries its own ``cma-N`` parameter that resizes the DT linux,cma node in place without orphaning the codec driver. A standalone ``dtoverlay=cma,cma-512`` silently no-ops on Pi 5 because the v3d overlay initialises the CMA region first; reusing the v3d overlay's parameter is the documented way to merge them. ansible/roles/system/templates/config.txt.j2 now emits the ``,cma-512`` parameter on Pi 5 only — Pi 4 already gets 512 MB CMA by default so the override is a no-op there. The earlier attempt at a kernel-cmdline cma= override (in cmdline.txt.j2) is removed; the file's comment now points readers at the correct config.txt path. Live-verified on Pi 5: CmaTotal=512MB after the overlay change, /dev/video present, rpi-hevc-dec probes cleanly. Asset processor pi5 profile no longer carries a HEVC pixel cap — Pi 5 can decode HEVC at its silicon's real capability. mpv --video-sync=display-resample --------------------------------- mpv 0.40 defaults to ``--video-sync=audio`` which syncs the video clock to the audio clock and drops VO frames when the two drift. On every board tested (Pi 4 --vo=drm, Pi 5 + x86 --vo=gpu --gpu-context=wayland) this produced 60–90% VO drops at 60 fps content even when the decoder reported healthy HW decode (``Using hardware decoding (...)`` banner present, no decoder errors). The drops were at the VO, not the decoder. ``--video-sync=display-resample`` flips the relationship: sync video to the display refresh and resample audio to match. Audio resampling is a <1% CPU 2-channel job and most signage clips have no audible content anyway, so it's effectively free; the benefit is clean playback at the source's frame rate. Test bed touched ---------------- * test_play_invokes_popen_with_expected_args_on_pi4_64: argv now includes ``--video-sync=display-resample``. * test_video_can_passthrough_respects_board_codec_set: pi5 + hevc + 4K is now ``True`` (passthrough) because the CMA fix lets the silicon do its rated job. Comment updated to point at config.txt.j2. * Removed the transient downscale-on-Pi 5 codepath (``transcode_video_max_pixels`` field, the ``-vf scale='if(gt(ih,...))':...`` filter, and the two tests asserting it) — that was a workaround for the CMA issue and is no longer needed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(server): introduce PlaybackEnvelope dataclass + matrix + cache Foundation for the per-board playback envelope rollout (see /home/ubuntu/.claude/plans/serene-munching-gem.md). No behaviour change yet — wires up the canonical source of truth that processing.py, celery_tasks.py's future re-render walker, and the viewer's hwdec dispatch will all read from in the next commit. src/anthias_server/playback_envelope.py (new) --------------------------------------------- Frozen dataclass ``PlaybackEnvelope`` carrying codec / max_width / max_height / max_fps plus a fixed ``container_ext = 'mp4'``. ``ENVELOPE_BY_DEVICE_TYPE`` maps every supported board: * pi2 / pi3 / arm64 → H.264 1920x1080 30 (no HEVC silicon / no upstream mpv HW path) * pi4-64 / pi5 / x86 → HEVC 3840x2160 60 (dedicated HEVC block or VAAPI; fleet uniformity so the same upload produces bit-identical variants on every board) ``compute_envelope()`` resolves the current process's envelope from DEVICE_TYPE; unset / unknown / mixed-case / whitespace all fall back to the conservative default (H.264 1080p30). ``load_cached()`` / ``save_cached()`` round-trip the envelope to ``~/.anthias/playback-envelope.json``. Cache corruption (missing file, bad JSON, unsupported codec) returns ``None`` so the caller recomputes and overwrites — a hand-edit that breaks the file self-heals on next start. ``save_cached`` writes atomically via temp-file + rename. src/anthias_server/processing.py -------------------------------- ``_ffprobe_summary`` now returns ``video_fps`` alongside the existing keys. The next commit (Phase 2) uses this to decide whether to emit ``-r envelope.max_fps`` — the cap is one-way, so sub-cap source rates pass through unchanged. r_frame_rate is parsed as a rational ``num/den``; unparseable / zero-denominator collapses to ``None`` so the caller treats source fps as "unknown" and skips the gate. tests ----- * tests/test_playback_envelope.py (new): matrix coverage; unset / unknown / cased / whitespace inputs; cache round-trip; missing / corrupt JSON / invalid-payload recovery; atomic write (no leaked .tmp); container_ext invariant. * tests/test_processing.py: positive video_fps cases (integer rates, NTSC drop-frame 30000/1001 + 60000/1001, bogus / no-slash / zero-denominator inputs); the two ``assert summary == { ... }`` ffprobe-recovery tests now include the new ``video_fps: None`` key. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(server): envelope-driven asset processor with sibling-original Refactor ``processing.py`` so every video upload produces a variant matching the board's playback envelope while preserving the source as a sibling ``.original.<ext>`` file. Rotation is now gapless by construction — every variant on disk shares one codec / max resolution / max fps per board, so the viewer's output mode never has to switch mid-clip. src/anthias_server/processing.py -------------------------------- * Replace ``_BOARD_PROFILES`` + ``_resolve_board_profile`` + ``_PI4_H264_MAX_PIXELS`` + ``_BoardProfile`` typedef with ``compute_envelope()`` from the new ``playback_envelope`` module (landed in `0b6bea0c`). One canonical source of truth for "what every variant on disk looks like". * ``_ffprobe_summary`` now returns per-axis dimensions (``video_width``, ``video_height``) alongside the existing ``video_pixels`` total. The envelope check is per-axis so an ultrawide source (e.g. 5760×1080) gets caught by the width cap even though its total pixel count is below 4K's. * ``_video_can_passthrough(summary, envelope)`` is the new contract: passthrough iff (a) container is mp4, (b) codec matches envelope.codec exactly, (c) both axes are within the envelope cap, (d) source fps is at-or-under envelope.max_fps, (e) audio is demuxer-compatible. Any None in source dims / fps bails to transcode (we don't gamble on unsized clips). * ``_transcode_to_target(input, output, envelope=None, source_summary=None)`` emits the smallest set of flags that lands the output inside the envelope. ``-vf scale=...`` only when source > envelope on either axis; ``-r envelope.max_fps`` only when source fps > cap. The fps cap is one-way — we never up-convert a sub-cap source. New helper ``_video_args_for_codec`` picks libx264 / libx265 from the envelope's codec. * ``_run_video_normalisation`` reorganised around the sibling- original pattern: - Fresh upload / legacy asset: rename ``Asset.uri`` to ``<base>.original.<ext>`` (the source-preservation step). - Re-render: read from the existing ``.original.`` sibling instead. - Re-probe from the (possibly new) source location. - Passthrough branch: copy source → variant slot bitwise (cross-device fleet sha256 stays equal). - Transcode branch: staging-file render with the existing atomic-replace contract. - Stamp ``metadata['original_uri']`` (path to sibling), ``metadata['envelope']`` (envelope dict the variant matches). ``metadata['transcode_target']`` kept as the ``envelope.codec`` duplicate for one release of back-compat with the serializer surface. Tests ----- ``test_video_can_passthrough_decision_table`` recast against the H.264 1920×1080 30 default envelope. Each row tests one gate (codec / per-axis dim / fps / audio / unknowns / probe gaps) without overlap. * ``test_video_can_passthrough_respects_envelope`` end-to-end: pin ``DEVICE_TYPE``, build a summary at the given (codec, w, h, fps), assert the verdict. Replaces the legacy ``..._respects_board_codec_set``. * ``test_transcode_to_target_emits_scale_when_source_oversize``, ``..._emits_fps_clamp_when_source_fast``, ``..._omits_clamps_when_source_at_envelope``: pin the smallest ffmpeg flag set per source / envelope combination. * ``_envelope_summary`` helper at the top of the file short-circuits the per-test summary construction. * Mock signatures for ``_transcode_to_target`` updated to accept the new ``envelope`` / ``source_summary`` kwargs. * ``test_resolve_board_profile_picks_target_codec_per_board`` deleted — equivalent coverage is in tests/test_playback_envelope.py against ``compute_envelope`` directly. Stale doc / comment references to ``_BOARD_PROFILES`` / ``_resolve_board_profile`` updated to point at ``playback_envelope.ENVELOPE_BY_DEVICE_TYPE`` / ``compute_envelope``. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(server): re-render walker + startup envelope reconciler * New celery task `regenerate_for_envelope_change`: walks `Asset.objects.filter(mimetype='video')` and queues `normalize_video_asset` for any row whose `metadata['envelope']` no longer matches the current envelope. Malformed payloads, missing keys, and per-row exceptions are logged but don't stop the walker. * New `AnthiasAppConfig.ready` hook -> `app/startup.py: run_envelope_check`: compares cached vs computed envelope, persists fresh, dispatches the walker on mismatch. Short-circuits under `ENVIRONMENT=test` / `PYTEST_CURRENT_TEST` so pytest runs don't enqueue stray walkers. Celery dispatch failure is logged but non-fatal -- the cache is already saved, so the next start sees the new envelope on disk and recovers. * Tests cover: skip-in-envelope, queue-stale, legacy migration (no envelope key), image-asset skip, force-requeue, malformed payload recovery, continue-after-per-row-failure, every hook code path (test short-circuit, no-cache, match, mismatch, dispatch failure, corrupt cache). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(server): preserve `.original.<ext>` siblings during orphan sweep The Celery ``cleanup`` task built its "referenced" set only from ``Asset.uri``. With sibling-original storage, the source bytes live at ``metadata['original_uri']`` (e.g. ``<id>.original.mov``) while ``Asset.uri`` points at the playback variant (``<id>.mp4``). Without this fix every video upload's ``.original.<ext>`` falls outside the 1h mtime guard once the variant lands and gets silently deleted on the next hourly sweep — breaking the re-render walker as soon as the envelope changes. * ``cleanup``: union ``Asset.uri`` ∪ ``metadata['original_uri']`` into the referenced set, tolerant of legacy rows with non-dict metadata. * Tests cover the new claim path + the malformed-metadata fallback so a stray ``metadata=None`` row can't crash the sweep. The upload-path serializer itself stays untouched: the existing ``rename(tmp, <id><ext>)`` lands the upload at a single path, and ``processing._run_video_normalisation`` handles the rename-to-``.original.<ext>`` atomically on first run. No double- write, no extra disk traffic. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(server): cover sibling-original storage across normalisation paths Adds five tests pinning the ``.original.<ext>`` + variant contract that the envelope walker depends on: * fresh upload → ``<id>.original.<src_ext>`` created next to ``<id>.mp4``; ``metadata['original_uri']`` + ``metadata['envelope']`` populated. * re-render → ``.original.<ext>`` is byte-identical across passes (sha256 compared before/after); the walker reads from it and never rewrites it. * passthrough → both files exist even when the source already matches the envelope (``shutil.copyfile`` semantics, not rename). * legacy migration → pre-rollout assets with no ``original_uri`` key get renamed to ``.original.<ext>`` on first walker pass. * dangling ``original_uri`` → falls back to treating ``asset.uri`` as the source-to-preserve; no silent error, no lost variant. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(board-enablement): replace codec policy table with playback envelope * board-enablement.md now documents the envelope matrix as the single source of truth shared by the asset processor, the re-render walker, and the viewer's hwdec dispatch. The legacy ``_BOARD_PROFILES`` / ``passthrough_video_codecs`` vocabulary has been removed -- it never matched what ``processing.py`` does post-envelope. * Calls out the ``<id>.original.<src_ext>`` + ``<id>.mp4`` sibling layout, the metadata keys the walker reads, and the cross-board fleet sha256 expectation. * Pi 5 CMA quote rewritten: the real fix is ``dtoverlay=vc4-kms-v3d,cma-512`` in config.txt, not a downscale workaround. Kernel cmdline ``cma=`` is documented as the broken path it actually is. * Failure-mode list updated for envelope-driven dispatch (off- envelope variant, display refresh ceiling, walker storm on unwritable cache, sha256 fleet divergence). * ``media_player.py`` comment block: updates the Pi 5 H.264 → auto-copy and HEVC → drm-copy comments to reference the playback envelope by name and point at the correct CMA fix (config.txt dtoverlay, not cmdline.txt). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(tests): mypy on `_make_video_asset` + boolean is_enabled * `dict` annotations get explicit `dict[str, Any]` parameters (Anthias's mypy config sets `disallow_any_generics`). * `is_enabled=1` → `is_enabled=True` so the Asset field's bool type matches mypy's view of django-stubs models. * Adds the missing ``typing.Any`` import. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(server,tests): envelope-aware container gate + startup hook safety Run 1 of CI surfaced several issues in the envelope refactor: * MP4 family container detection. ffprobe reports an MP4 file's ``format_name`` as ``mov,mp4,m4a,3gp,3g2,mj2`` (``mov`` first because the QuickTime/MP4 demuxer is one codepath). The envelope gate compared the source container to ``envelope.container_ext`` by exact equality, so every MP4 upload was rejected at the container gate even though the bytes are exactly what we'd write. Adds ``_MP4_FAMILY_CONTAINERS`` and special-cases ``mp4`` envelope to accept any synonym. * Celery workers were running ``run_envelope_check``. ``celery_tasks.py`` top-level-calls ``django.setup()``, which fires ``AppConfig.ready`` in every process that imports it, including the celery worker -- the previous comment in ``apps.py`` was wrong. Two writers race on the cache file and could double-queue the walker for a single envelope change. New ``_is_celery_worker()`` short-circuit detects the ``celery -A ... worker`` invocation via ``sys.argv[0]``. * Settings singleton captures HOME at init. ``AnthiasSettings.home`` is set once at module import time, so ``monkeypatch.setenv('HOME', tmpdir)`` in tests doesn't reach the envelope cache helpers. Updates ``cache_dir`` and ``fake_home`` fixtures to also patch ``settings.home`` via ``monkeypatch.setattr``. * Stale tests. - Drop ``test_cleanup_tolerates_non_dict_metadata`` -- the schema enforces ``metadata`` as a non-null JSON dict, so the failure mode it claimed to test can't occur. ``cleanup()`` keeps the defensive ``isinstance(metadata, dict)`` check as a no-cost belt-and-braces. - ``test_video_passthrough_for_h264_or_hevc_in_known_containers`` rewritten as ``test_video_passthrough_when_source_matches_board_envelope`` -- the old matrix included libx264 on pi4-64 (no longer passthrough because pi4-64 is HEVC) and non-mp4 containers (always re-encoded now because the variant slot is fixed at ``.mp4``). - ``test_video_passthrough_records_target_codec`` switches the source codec to libx265 so it actually hits the passthrough branch on pi4-64. - ``test_video_passthrough_uses_summary_duration_no_second_probe`` rebuilt via ``_envelope_summary`` so the synthesised summary carries the new ``video_width / video_height / video_fps`` fields. - The two ``test_ffprobe_summary_handles_`` early-return shape assertions add ``video_width`` / ``video_height`` to match the real return shape. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> fix(server,tests): drop PYTEST_CURRENT_TEST gate; align stale summaries Run 2 of CI surfaced three more issues: * ``PYTEST_CURRENT_TEST`` is not fixture-controllable. pytest re-sets the env var at the start of every test's ``call`` phase, so ``monkeypatch.delenv`` in a ``setup`` fixture is overridden before the body runs. This made it impossible for any test to exercise the real startup hook path. The ``ENVIRONMENT=test`` gate (set in ``conftest.py`` + the test compose file) is the durable, fixture-controllable signal — keep that, drop the pytest one. Test for the new ``_is_celery_worker`` short-circuit replaces the deleted ``test_short_circuits_when_pytest_current_test``. * Decision table parametrise had a wrong expectation. Summary row "HEVC at envelope (codec, dims, fps all match)" was paired with ``expected=True``, but the test envelope is H.264 — codec mismatch must transcode, ``False``. * ``test_video_passthrough_skips_duration_when_probe_unavailable`` summary missed the new dim/fps fields. Same root cause as before: ``_video_can_passthrough`` rejected the synthesised summary at the dims gate, the test fell through to a real ffmpeg call on a 64-byte stub, and ffmpeg "Invalid data found". Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(envelope): add generic-arm64 key for Rock Pi / Armbian SBCs The Anthias install path for Rock Pi 4 / Armbian boards writes ``DEVICE_TYPE=generic-arm64`` (see ``feat(install): generic-arm64 best-effort support``). The matrix only listed ``arm64``, so a real install fell through to ``_DEFAULT`` — same envelope by coincidence, but the walker would have logged "no matrix entry" warnings on every server start and the docs/board-enablement matrix would be subtly wrong about which key applies. Lists the key explicitly with the same conservative H.264 1080p30 envelope and extends the parametrise coverage. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(server): make celery_tasks.py top-level django.setup() reentrant-safe ``django.setup()`` calls ``apps.populate()``, which raises ``RuntimeError: populate() isn't reentrant`` if invoked while already populating. The new ``AnthiasAppConfig.ready`` hook imports ``celery_tasks`` to dispatch the walker, which until this change top-level-called ``django.setup()`` again -- so on every real server start the import died, the dispatch failed, and the walker never ran. Live-confirmed on the Pi 4 test bed. Check ``django.apps.apps.apps_ready`` before calling ``setup()``: the flag flips to True after the import phase but before per-app ``ready`` hooks run, so the standalone celery worker (where Django isn't initialised yet) still calls setup() as before, while the server process (mid-populate) correctly skips the reentrant call. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(server): commit `original_uri` to DB before transcode (crash safety) Live-confirmed on the Pi 4 test bed during the envelope rollout: walker fired on a near-full SD card, ffmpeg ran out of space mid- render, the on_failure hook cleared ``is_processing`` -- and the hourly ``cleanup()`` sweep then silently deleted every ``.original.<ext>`` source it had just renamed, because ``Asset.uri`` still pointed at the (now-missing) variant path and the orphan walker only knew about ``Asset.uri`` + a committed ``metadata['original_uri']``. The metadata accumulator in ``_run_video_normalisation`` only wrote to the DB at the end of the function, so any failure between "rename source → .original.<ext>" and "render variant → atomic replace" left the row's metadata stale. Fix: persist ``metadata`` to the DB right after the rename, before attempting any render. The contract becomes: if the file is on disk under ``.original.<ext>``, the DB row knows it. ``cleanup()`` already reads ``metadata['original_uri']`` into the referenced set (from ``fix(server): preserve `.original.<ext>` siblings during orphan sweep``), so this commit closes the only window where that guard could be bypassed. Adds ``test_original_uri_persisted_before_render_for_crash_safety`` which mocks ``_transcode_to_target`` to raise and verifies the row has ``metadata['original_uri']`` committed by the time the exception propagates. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(board-enablement): script-driven 1-minute sample pack Previously the test pack was full-length BBB clips (~10 min) plus an inline ffmpeg recipe in the docs that produced 4K HEVC re-encodes taking ~30 min on a workstation. The on-device walker then had to chew through the full-length variants, which on a Pi 4 / Rock Pi turned a single rotation cycle into hours of wallclock for what was really a hwdec-banner sanity check. * New ``bin/generate_board_enablement_testbed.sh``: downloads the four BBB H.264 sources, trims each to 60 s with ``-c copy`` (instant), then libx265-encodes each cut. Idempotent (skips files that already pass an ffprobe sanity check) and atomic (tmp-then-rename) so a power cycle mid-encode leaves a clean state. * Pack drops from ~3.3 GB / 10 min per clip to ~350 MB / 60 s per clip. 60 s is enough to capture mpv's ``hwdec-current`` banner and read a stable ``Dropped:`` count, while keeping a full walker pass under a few minutes on every supported board. * ``CUT_SECONDS`` / ``HEVC_CRF`` env knobs override defaults for iteration; the table in the doc lists what each clip exercises. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(envelope,viewer): runtime Rock Pi 4 detection unlocks v4l2m2m HW decode ``bin/install.sh`` writes ``DEVICE_TYPE=arm64`` for every aarch64 SBC it doesn't recognise as a Pi — Rock Pi 4, Orange Pi, Allwinner H6 boards, Amlogic S905 boards all share that one catch-all DEVICE_TYPE. The matrix can't promote ``arm64`` to HEVC + HW because most of those boards have no upstream-mpv HW decode path and would log "Could not find a valid device" on every play. But the Rock Pi 4 (RK3399 / Radxa) DOES have a working v4l2m2m driver exposed by the kernel: $ docker exec anthias-anthias-viewer-1 mpv --hwdec=help \| grep v4l2m2m v4l2m2m-copy (h264_v4l2m2m-v4l2m2m-copy) v4l2m2m-copy (hevc_v4l2m2m-v4l2m2m-copy) v4l2m2m-copy (vp9_v4l2m2m-v4l2m2m-copy) ... and ``/dev/video-dec2`` / ``/dev/video-dec4`` are present (the v4l2_request decoder symlinks). Leaving Rock Pi on SW decode for 1080p HEVC measurably wastes the silicon. Resolved at runtime via ``/proc/device-tree/model``: * New matrix key ``rockpi4`` → HEVC 1920×1080 30. 1080p ceiling keeps disk use of the variant + ``.original.<ext>`` sibling comfortable on the typical SD card; HEVC codec exercises the Hantro path on the way through the viewer. * ``compute_envelope`` and ``_pi_hwdec_for_uri`` both probe the device tree when DEVICE_TYPE is ``arm64`` (or legacy ``generic-arm64``). A Rock Pi 4B reports ``Radxa ROCK Pi 4B`` and gets upgraded; an Orange Pi or an Allwinner H6 board stays on the conservative SW envelope. * Failure modes (no device tree, decode error, unknown SBC) all collapse to ``None`` so dev containers and the existing arm64 catch-all keep working unchanged. Four new tests pin: - Rock Pi model → ``rockpi4`` envelope; - legacy ``generic-arm64`` label also gets the upgrade; - unknown SBC keeps the conservative envelope; - missing ``/proc/device-tree/model`` doesn't raise. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor(envelope,viewer): publish board subtype via host_agent + Redis Previous commit (``dde1b20e``) added a runtime ``/proc/device-tree`` read inside the server + viewer containers. Containers don't see that path by default, and mounting it into every container is heavier than it's worth for one edge case (worse, balena's restricted /proc would still trip). ``anthias_host_agent`` already runs on the host and publishes host-side state to Redis (IP addresses, etc.). It's the right layer for board identification: * New ``detect_board_subtype()`` reads ``/proc/device-tree/model`` directly (host_agent IS on the host) and maps known SBC strings to matrix keys (Rock Pi 4A/4B/4C → ``rockpi4``). * New ``set_board_subtype()`` publishes the resolved key (or the empty string for unknown boards) to ``host:board_subtype`` before ``subscriber_loop`` flips ``host_agent_ready`` — so consumers can rely on the key being there once the readiness flag is set. * Server's ``playback_envelope.compute_envelope`` and viewer's ``_pi_hwdec_for_uri`` read the same Redis key when DEVICE_TYPE is ``arm64`` / legacy ``generic-arm64``. Failure modes (Redis down, key missing, decode error) all collapse to ``None`` so the caller falls back to the conservative arm64 envelope. No compose template changes. The viewer + server containers already have Redis reachable (they use it for the Channels layer + walker dispatch already), so the data path is free. Unit tests pin: * device-tree → subtype mapping for canonical + variant + edge Rock Pi strings, plus unknown boards; * Redis publish writes the resolved key OR empty string; * server's compute_envelope reads back through Redis correctly for known / unknown / empty / unreachable cases; * subscriber_loop calls set_board_subtype before flipping ``host_agent_ready`` — race-free ordering. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * perf(celery): cap walker to --concurrency=1 so transcodes can't choke playback Default celery worker concurrency = num_cores. On the boards Anthias actually ships to (Pi 4 / Pi 5 / Rock Pi 4 / arm64 SBCs), that means up to 4 parallel ``libx265`` encodes sharing the same SoC as the viewer's mpv process. ``nice -n 19`` + ``ionice -c 3`` are already in place, but nice(1) only helps when there's CONTENTION -- four ffmpegs at nice 19 still saturate every core, and each 1080p libx265 encode needs ~500 MB RAM. A 4 GB SBC pushes into swap well before the walker finishes, which stalls everything on the host -- live- confirmed on the Rock Pi 4 during this PR: sshd starved through banner exchange whenever the walker hit a fresh burst. Asset processing is upload-time, not throughput-bound. The operator-facing latency that matters is "upload click → asset visible in rotation", which is bound by ONE encode regardless of queue parallelism. Serial encodes finish a few minutes later in wallclock but the viewer never drops a frame. Applied to every prod / dev compose template. ``docker-compose.test.yml`` is left at default because the test suite never runs live normalize tasks (the celery service in tests just exercises the task dispatch plumbing). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(viewer): force MPV on legacy ``generic-arm64`` DEVICE_TYPE Rock Pi 4 running an older arm64 image reports ``DEVICE_TYPE=generic-arm64`` (pre-``refactor: rename device_type generic-arm64 → arm64`` rebuilds). The MediaPlayerProxy override only force-routed MPV for ``arm64`` / ``pi4-64``, so the legacy label fell through to VLC -- which then crashed with ``NameError: no function 'libvlc_new'`` because the libvlc lib isn't installed on the arm64 image. Live-confirmed in the viewer crash loop on the Rock Pi 4 during this PR. Adds ``'generic-arm64'`` to the force_mpv set + a test pinning the dispatch. Covers the in-the-wild rolling-upgrade window where a Rock Pi 4 deployment is sitting on an old image. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(viewer): route ``generic-arm64`` through cage + ALSA-default like ``arm64`` Two more places in ``media_player.py`` only checked the post-rename ``arm64`` DEVICE_TYPE and missed the legacy ``generic-arm64`` label the Rock Pi 4 test bed still reports: * VO dispatch (line ~419) — without this, a generic-arm64 host falls through to the ``--vo=drm`` else branch, which mpv aborts with "No primary DRM device could be picked" because cage already holds DRM master in the cage + Wayland viewer stack (live-confirmed on the Rock Pi 4 in this PR). * ALSA card selection (``get_alsa_audio_device``) — the Pi-name dispatch below the env-var check picks ``vc4hdmi`` / "Headphones" cards that don't exist on Rockchip / Allwinner / Amlogic. Without the legacy label here, mpv tries to open the Pi-specific HDMI card and dies with ``Unknown PCM sysdefault:CARD=vc4hdmi``. Both branches now use the shared ``_ARM64_DEVICE_TYPES`` frozenset that already governs the hwdec subtype probe, so the three paths (envelope, hwdec dispatch, VO + ALSA) agree on what DEVICE_TYPE labels are aarch64-catch-all. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(envelope): Rock Pi 4 stays on H.264 1080p30 -- stock ffmpeg has no v4l2_request Live testing on the Rock Pi 4 surfaced that the arm64 viewer image's stock ffmpeg (Debian 7.1.3-0+deb13u1) is built without ``--enable-v4l2-request``, and the underlying kernel exposes the RK3399's decoders only via the stateless v4l2_request API (``rkvdec`` for HEVC, the Hantro block as ``rockchip,rk3399-vpu-dec`` for H.264). ffmpeg's stateful ``hevc_v4l2m2m`` / ``h264_v4l2m2m`` decoders can't reach them -- mpv logs ``Could not find a valid device`` even after ``/dev/video-dec`` symlinks are present. mpv ``--hwdec=help`` also doesn't list rkmpp or drm-copy, so there's no other path through the stock build. So: ``rockpi4`` envelope drops from HEVC 1920x1080 30 to H.264 1920x1080 30 -- the same conservative tier as the generic ``arm64`` catch-all. The viewer SW-decodes 1080p30 in real time on the Cortex-A72; no frames dropped, just no HW gain over plain ``arm64``. * Rock Pi entry drops from ``_PI_HWDEC_BY_CODEC`` -- mpv falls through to ``auto-copy`` which mpv's whitelist resolves to SW decode on this build. * host_agent's subtype publish, the start_viewer.sh ``/dev/video-dec`` symlink creation, and the dedicated ``rockpi4`` matrix key all stay in place -- they're forward-compatible scaffolding so a follow-up enabling v4l2_request (or linking rkmpp) in the viewer build only has to bump the matrix entry's codec to ``hevc`` and add the hwdec dispatch row. No further plumbing churn. Tests + docs reflect the routing-without-HW reality. The legacy-label fixes from this PR (force_mpv + ``--vo=gpu --gpu-context=wayland`` + ALSA default for the ``generic-arm64`` DEVICE_TYPE) are unaffected -- those are real bug fixes the Rock Pi 4 needs to play anything under cage. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(viewer,envelope): extend +rpt1 ffmpeg to arm64; Rock Pi 4 = HEVC 4Kp60 The Raspberry Pi APT repo's ffmpeg build (``+rpt1``) ships with ``--enable-v4l2-request --enable-libudev --enable-vout-drm``, which the stock Debian Trixie ffmpeg drops. Without those flags the v4l2_request hardware decoder family is unreachable from mpv — which is exactly what bit the Rock Pi 4 in this PR: RK3399's ``rkvdec`` (HEVC) and Hantro VPU (H.264) are both stateless v4l2_request decoders. Pi 4 / Pi 5 already pull from the +rpt1 repo for the same reason; extending the conditional in ``Dockerfile.viewer.j2`` to also include ``arm64`` lights up hardware decode on every arm64 SBC whose kernel exposes v4l2_request decoders (Rock Pi, Orange Pi RK356x, Pine64, Allwinner H6 with Cedrus, ...). * ``Dockerfile.viewer.j2`` — board conditional ``('pi4-64', 'pi5')`` → ``('pi4-64', 'pi5', 'arm64')``. The apt pin already restricts the +rpt1 repo to ``ffmpeg + libav* + mpv``, so other arm64 packages stay on stock Debian. Comment block updated to list which decoders each board reaches via this path. * ``playback_envelope.py`` — ``rockpi4`` envelope flips from H.264 1080p30 to HEVC 3840×2160 60. RK3399's Hantro G2 is the same decoder family as Pi 5's and supports 4Kp60 per the Rockchip datasheet — matching Pi 5's envelope keeps the fleet uniform. * ``media_player.py`` — ``_PI_HWDEC_BY_CODEC['rockpi4']`` maps both h264 and hevc to ``drm-copy`` (the v4l2_request hwdec path, same as Pi 5 for HEVC). * Tests + docs updated accordingly. The legacy-arm64 fixes (force_mpv + cage VO + ALSA default for ``generic-arm64``) and the host_agent subtype publish are unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * perf(celery): cgroup CPU hard cap (`cpus: 1.0`) so encodes never starve the viewer ``nice -n 19 ionice -c 3`` + ``--concurrency=1`` lower priority and limit parallelism, but they're soft hints — when libx265 is the only heavy workload on the box the scheduler still hands it everything available. Live-confirmed on the Rock Pi 4 in this PR: sshd starved through banner exchange and mpv dropped mid-frame during walker bursts, even with all three soft caps in place. ``cpus: 1.0`` is a cgroup CFS quota — one CPU's worth of compute per period, kernel-enforced. On every supported SBC (Pi 4 / Pi 5 / Rock Pi 4, all 4-core) it leaves 3+ cores for the viewer, the host_agent, sshd, and everything else. x86 hosts have 8+ cores so the cap is conservative there but harmless — asset processing is upload-time, not throughput-bound. Applied to every prod / dev compose template. test compose stays uncapped because the test suite runs in CI environments with deterministic resources where the cap would just slow CI down without protecting anything. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * perf(celery): scale CFS quota with host cores (half of \$(nproc), min 1.0) A flat ``cpus: 1.0`` is too aggressive: it forces a single-thread ceiling even when the host has many idle cores. On an 8-core x86 deployment the asset processor would take 4x longer than it needs to without protecting anything we don't already protect. Compute the limit dynamically in ``bin/upgrade_containers.sh``: ``$(nproc) * 0.5`` (floored to 1.0 so single-core hosts still make progress). On the supported boards this lands at: * 4-core Pi 4 / Pi 5 / Rock Pi 4 → cpus: 2.0 (2 cores headroom for the viewer + system) * 8-core x86 → cpus: 4.0 (4 cores headroom) * 16-core x86 → cpus: 8.0 (still 50/50 with the system) Soft priorities (``nice -n 19 ionice -c 3``) and the ``--concurrency=1`` walker still apply on top; the cgroup quota is the hard backstop that guarantees "encoding never impacts playback or UI access". Live test on the Rock Pi 4 (in this PR) proved the soft caps alone aren't enough — libx265 saturated every core and starved sshd through banner exchange. The balena compose templates use a literal ``cpus: 2.0`` (balena only targets 4-core Pi 2/3/4/5 today); the non-balena prod compose substitutes the env var. Dev compose also uses a literal ``2.0`` since dev hosts vary too widely to autodetect cheaply. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * perf(walker): hardware-decode the source in the transcode pipeline The walker's encode pass stays libx265-software-bound on every SBC (none of Pi 4 / Pi 5 / Rock Pi 4 have HEVC HW encode), but the decode half of the pipeline can be offloaded to the same silicon mpv uses for playback. That's typically 30-50% of the ffmpeg wall-clock on H.264 sources and dominant on 4K — well worth the small dispatch table. * ``_decode_hwaccel_args(source_codec)`` returns the per-board ``-hwaccel`` flags to prepend to the ffmpeg invocation. Uses the same host_agent subtype probe (``host:board_subtype`` in Redis) that envelope resolution already uses, so the walker and viewer agree on what board they're targeting. * Dispatch matrix: - Pi 4 (V3D V4L2 M2M + rpi-hevc-dec) → ``-hwaccel drm`` for both H.264 and HEVC (the +rpt1 ffmpeg's v4l2_request path). - Pi 5 (Hantro G2) → ``-hwaccel drm`` for HEVC only. - Rock Pi 4 (rkvdec + Hantro VPU) → ``-hwaccel drm`` for both, same v4l2_request path as Pi 5. - x86 (VAAPI) → ``-hwaccel vaapi -hwaccel_device /dev/dri/renderD128`` for both. - Pi 2 / Pi 3 / unknown arm64 → no HW path mpv can address; SW decode is the only choice. * ``_transcode_to_target`` wraps the ffmpeg call: first attempt with hwaccel args, fall back to SW decode on ``sh.ErrorReturnCode`` (kernel driver weird, device busy, bitstream the v4l2_request decoder rejects). Logs the underlying ffmpeg stderr at WARNING so an operator chasing a slow walker sees the HW path failed. Tests pin every cell of the dispatch matrix + assert ``-hwaccel`` lands BEFORE ``-i`` in the argv (placing it after silently no-ops in ffmpeg) + the two-call SW-fallback path on simulated HW init failure. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(server-image): extend +rpt1 ffmpeg pin to anthias-server too The walker's HW-decode optimization (``processing._decode_hwaccel_args`` emits ``-hwaccel drm``) only works against the Raspberry Pi repo's ``+rpt1`` ffmpeg build, which has ``--enable-v4l2-request``. The pin was previously only on the viewer image (Dockerfile.viewer.j2 in ``ba8d4709``), so the celery container — which runs the walker — kept the stock Debian ffmpeg and the hwaccel call silently fell back to SW on every board. * New ``docker/_rpt1-ffmpeg-pin.j2`` extracts the pin block. * Both ``Dockerfile.viewer.j2`` and ``Dockerfile.server.j2`` now include it via ``{% include '_rpt1-ffmpeg-pin.j2' %}``. Server also re-runs ``apt install --reinstall ffmpeg libav`` so the pinned version replaces whatever the base layer installed. No effect on Pi 2 / Pi 3 / x86 boards — the include's ``{% if board in ('pi4-64', 'pi5', 'arm64') %}`` keeps it inert there. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * perf(celery,viewer): four hardening fixes so the player survives an upgrade Live testing on Pi 4 / Pi 5 / Rock Pi 4 surfaced four scenarios where a single ``docker compose pull && up -d`` (or any upgrade that invalidates the playback envelope) wedges the device. These aren't test-harness flakes; production operators on the same hardware would hit them. All four belong in this PR alongside the features that exposed them. 1. Walker drip-feed — ``regenerate_for_envelope_change`` previously queued every stale ``normalize_video_asset`` in one beat tick. ``--concurrency=1`` serialises execution but the celery worker fetches the next task the instant the previous finishes, so a 100-asset catalog turns into hours of back-to- back libx265 with zero recovery windows between encodes. Switch to ``apply_async(args=..., countdown=N * 60)`` so each subsequent normalize starts at least 60 s after the previous was queued. Operator can flip ``is_processing=False`` on a row mid-window to cancel its turn. 2. ``mem_limit`` on celery container — cgroup CPU isolation alone doesn't stop libx265-4K from allocating ~1.5 GB resident memory, which on a 4 GB SBC pushes the system into swap and starves sshd + the viewer. Match the cpus cap with a memory cap (60% of host RAM, computed in ``bin/upgrade_containers.sh``). 3. ``stop_grace_period: 3s`` + ``stop_signal: SIGKILL`` on viewer — cage doesn't reliably release DRM master on SIGTERM (its libinput shutdown path hangs on certain kernels) and the kernel's GPU driver leaves dangling references that prevent the next ``up`` from acquiring DRM master. Skipping the SIGTERM-then-wait dance on intentional restarts gets the device past cage's bug deterministically. 4. libx265 / libx264 ``-preset superfast`` — was ``medium``. Asset processing is upload-time and only runs once per asset, so the 5-10× wallclock speedup is operator-facing throughput. The ~10-20% bitrate increase is invisible on typical signage content. Viewer decode is HW regardless of preset. Tests: * Walker test mocks switched from ``.delay`` to ``.apply_async``; signatures updated for ``args=(...,)`` + ``countdown=`` kwarg. * New ``test_regenerate_walker_spaces_dispatches_via_countdown`` asserts the countdowns are ``[0, 60, 120, ...]`` across a 5-asset catalog so the drip-feed contract is pinned. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(tests): use sh.ErrorReturnCode_1 in hwaccel fallback test sh.ErrorReturnCode is the abstract base; its __init__ does `self.exit_code = self.exit_code` which AttributeErrors unless the concrete numeric subclass (ErrorReturnCode_1, _2, ...) is used. Every other call site in this file already uses ErrorReturnCode_1 — this was the lone outlier introduced with the SW-fallback test in `0340b4f4`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor(asset-processor): drop on-device video transcoding On-device libx265 transcode wedged a Pi 4's celery worker for 99 min on a single 4K60 H.264→HEVC pass during PR validation. Every supported board already HW-decodes both H.264 and HEVC via the viewer's per-board mpv hwdec dispatch (drm-copy / vaapi-copy / v4l2m2m-copy), so the re-encode provided no playback benefit for the codecs operators actually upload. - ``normalize_video_asset`` now runs ffprobe and writes codec / dims / fps / duration into ``metadata``; the asset file is never rewritten. - Removes the envelope module, the re-render walker (``regenerate_for_envelope_change``), and the server-start envelope cache reconciliation hook. - Drops 33 transcode / envelope / sibling-original tests. Image normalisation (HEIC/HEIF/TIFF/BMP/ICO/TGA/JP2/AVIF → WebP) is unchanged. The viewer-side per-board hwdec dispatch and host_agent board-subtype publishing are unchanged. For codecs the target board can't HW-decode (MPEG-2, MPEG-4 ASP, ...) the operator's recovery is to upload a transcoded copy; the metadata fields surfaced here let them see codec / dims / fps in the asset list before pushing the asset to the field. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(asset-processor): gate uploads to hardware-decoded codecs only After ffprobe, ``normalize_video_asset`` now compares the source codec against the board's HW-decode set (mirroring the viewer's ``_PI_HWDEC_BY_CODEC``). Uploads outside the set are rejected with an error message that includes the rejected codec, the board's supported codecs, and an ``ffmpeg`` command line the operator can run on their workstation to transcode the source. Per-board HW decode set: - pi2 / pi3 → {h264} - pi4-64 / rockpi4 / x86 → {h264, hevc} - pi5 → {hevc} (no H.264 v4l2-request decoder mpv can reach) - arm64 catch-all → ∅ (operator must install a board-specific image) Also extracts ``DEVICE_TYPE`` → board-key resolution into a new ``anthias_common.board`` module so the server's gate and the viewer's hwdec dispatch share the same logic — eliminates the duplicated ``_redis_board_subtype`` mirror in ``media_player.py``. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(dashboard): surface unsupported-codec failures with copyable recipe UI/UX review of the gate's failure path surfaced two P0s and a few smaller nits: - The error message was only reachable via a native browser ``title`` tooltip on the Failed pill — invisible on touchscreens, can't be copied, leaks the ``UnsupportedVideoCodecError:`` class prefix into the aria-label. - The Edit Asset modal showed nothing about the failure — exactly the place the operator goes to act on a failed row. Changes: - ``UnsupportedVideoCodecError`` now carries the ffmpeg recipe as a ``recipe`` attribute. ``_NormalizeAssetTask.on_failure`` writes the bare message into ``metadata.error_message`` (no class-name prefix) and persists the recipe to ``metadata.error_recipe``. - ``_asset_row.html`` Failed pill becomes a button — click opens the Edit Asset modal. - ``_asset_modal.html`` renders a warning banner at the top of the Edit form when ``metadata.error_message`` is set, with the recipe inside a copyable ``<code>`` block + "Copy command" button. - ``_ffmpeg_reencode_recipe`` substitutes the operator's upload filename (stashed in ``metadata.upload_name`` at upload time) for the ``INPUT`` placeholder so the recipe is paste-ready. - Toast text shortened from "analysing video…" to "reading metadata…" (the ffprobe pass is sub-second now that there's no transcode). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(processing): give recipe output a codec suffix so it doesn't overwrite input E2E validation on a Pi 5 surfaced a recipe like: ffmpeg -i 'sample-h264.mp4' -c:v libx265 ... 'sample-h264.mp4' — input and output point at the same file because both got the upload's stem + ``.mp4`` suffix. Operator pasting the recipe would overwrite their source. The fix gives the output filename a target- codec marker (``sample-h264.hevc.mp4`` / ``sample-h264.h264.mp4``) so the recipe is safe to copy-paste even when the upload's extension already matches the output container. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore: drop transcode-era defensive hardening on celery + server image These guards were load-bearing while the asset processor ran libx264 / libx265 transcodes; with the on-device transcode pipeline gone they're dead code defending against a workload that no longer exists. Removed: - ``cpus: ${CELERY_CPU_LIMIT}`` / ``cpus: 2.0`` cgroup CPU caps on anthias-celery (every compose template) - ``nice -n 19 ionice -c 3`` wrapper on the celery command - ``--concurrency=1`` on celery worker; default celery concurrency is fine when the only tasks are ffprobe + Pillow conversion - ``CELERY_CPU_LIMIT`` calc in ``bin/upgrade_containers.sh`` - ``_rpt1-ffmpeg-pin.j2`` include + reinstall layer in ``Dockerfile.server.j2``; the +rpt1 ffmpeg was only needed for the walker's ``-hwaccel drm`` transcode. The server now only runs ffprobe, which the stock Debian ffmpeg handles fine (smaller server image, simpler base) - Stale ``ffprobe → passthrough or libx264/aac transcode`` section header in processing.py Kept: - ``mem_limit: ${CELERY_MEMORY_LIMIT_KB}k`` on celery — still a useful safety net against a decompression-bomb fixture or runaway ffprobe - ``+rpt1`` ffmpeg pin on the viewer image — still load-bearing for mpv's ``v4l2_request`` HW decode on Pi 4 / Pi 5 / Rock Pi 4 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore: keep nice -n 19 ionice -c 3 on celery Cheap insurance against pathological inputs (decompression-bomb HEIC, runaway ffprobe). Brought back across all four compose templates after stripping the CPU cap + --concurrency=1 in the prior cleanup. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(dashboard): address review feedback on codec gate UX * Plain-HTTP clipboard fallback. navigator.clipboard.writeText only resolves on secure origins, so on a LAN device (HTTP) the Copy command button silently failed. Add a window.fallbackCopyToClipboard helper that uses execCommand('copy') against an off-screen textarea, and have the inline copyRecipe() try it whenever navigator.clipboard isn't available or rejects. The recipe block also gets user-select:all so keyboard-copy still works if both paths fail. * Friendlier message for the arm64 catch-all branch. "Supported: none." read like the board literally has no decoder; replace with an explanation that the board hasn't reported a subtype yet and a pointer at the board-specific image. * Lock the gate (_HW_DECODE_VIDEO_CODECS) and the viewer dispatch (_PI_HWDEC_BY_CODEC) together with a consistency test so a future edit to one table can't quietly diverge from the other. * Cover the shell-quoting of recipe filenames with hostile-name parametrize cases (single quote, backtick, $(), ;) so a copy-paste recipe can't be turned into command injection. * Drop the stale "cgroup CPU cap" line from processing.py's module docstring — the cap was removed in `f85f8035`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix: address post-review feedback on codec gate / hwdec dispatch - processing: prefer the upload's extension token when ffprobe's format_name is a synonym list, so an .mp4 surfaces as container=mp4 (not mov, the first synonym). - bin/start_viewer.sh: drop the loose `-dec` catch-all from the v4l2 decoder match; keep the explicit rkvdec/cedrus/hantro/ -vpu-dec prefixes. - media_player: cap the ANTHIAS_DEBUG_DROPS mpv.log at 64 MB with a rolling truncate so a forgotten-on flag can't grow the disk. - tests: rename test_set_board_subtype_does_not_raise_on_redis_failure to test_set_board_subtype_propagates_redis_failures — matches what the test actually asserts. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 11:46:02 +02:00
Viktor Petersson	fb2d7900cf	refactor: move webview/ into src/anthias_webview (#2896 ) Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-14 10:56:07 +01:00
Viktor Petersson	7f8bbe43d7	feat(install): generic-arm64 best-effort support (Armbian SBCs) (#2879 ) * feat(install): generic-arm64 best-effort support (Armbian on Rock Pi, Orange Pi, …) Wires up a `generic-arm64` device_type so the installer recognises any aarch64 host that isn't a Raspberry Pi and runs the same Anthias stack on it. Closes #2849 (Tier 1). * `bin/install.sh::set_device_type` + `bin/upgrade_containers.sh` get an `aarch64` fallback branch, INTRO_MESSAGE / unsupported-message copy refreshed, raspberry-pi-tagged ansible tasks skipped on generic-arm64 (same as x86), vchiq strip extended. * ansible: validated set in `site.yml`, `docker_arch_by_device_type` gains `generic-arm64: arm64`. `docker-buildx-plugin` added to the apt-install list — required for MODE=build with `--platform=` Dockerfiles, harmless on pull-mode boards. Pre-existing host_agent service unit hardcoded `~/installer_venv/bin/python` (an ephemeral tmpdir post-#2843); split into a persistent `~/.anthias-venv` that ansible syncs before installing the unit. * image_builder: `generic-arm64` build target, Qt6 + cage + wayland like x86; `va-driver-all` deliberately not shipped — Rockchip / Allwinner / Amlogic mainline hwdec goes through V4L2 M2M / request API, not VAAPI, so mesa-va-drivers would be dead weight. * viewer: `start_viewer.sh` reuses the x86 cage path for generic-arm64; `media_player.py` routes generic-arm64 to MPV (the `device_helper.get_device_type()` fallback returns 'pi1' on non-Pi aarch64 hosts, so the proxy needs the DEVICE_TYPE env override that pi4-64 already uses). New test added. * host_agent: `SUPPORTED_INTERFACES` gains `end` prefix — Rockchip GMAC etc. surface as `end0` on systemd predictable naming, which was previously filtered out, leaving the splash page stuck on "Detecting network…". * CI: docker-build matrix + mirror-latest-tags publish `latest-generic-arm64` alongside the existing per-board tags. * Docs: README, marketing site supported-hardware table, and FAQ get a plain-language "Yes, on a best-effort basis" entry that spells out the software-decode trade-off, the SoCs known to work well (RK3399 / RK35xx / Allwinner H6 / Amlogic GXBB-GXL-GXM / S905X3), and the boards to avoid (Allwinner H616 / H618). Per-SoC hardware decode (`rkmpp`, `cedrus`, `meson-vdec`) is the planned Tier-2 follow-up. Validated end-to-end on a Rock Pi 4B (Armbian trixie, RK3399, 1GB RAM) via build-on-device: install completes, web UI reachable, all four asset types (image, H.264 1080p60, H.265 1080p60, webpage) cycle through the viewer cleanly, mpv pure-decode benchmark shows 0 dropped frames over the full 60s of each clip. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(ansible-lint): pair become with become_user on .anthias-venv sync task ansible-lint's partial-become rule fires on `become_user:` without a matching `become:` at the same level, even when the play-level become already covers it. Explicit pairing keeps lint quiet without changing runtime behaviour. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix: address Copilot review feedback on generic-arm64 PR - ansible: drop `creates:` guard on the runtime venv sync — `uv sync` is idempotent (sub-second resolver check when nothing changed), so re-running unconditionally means dependency updates from a pyproject.toml / uv.lock change actually land on upgrade instead of silently skipping. Idempotency surfaced via `changed_when` keyed on uv's `+/-/~` package-action prefix so steady-state runs stay `ok`. - ansible: rework docker-buildx-plugin comment to justify the install on its own merits (any MODE=build run needs it because of `FROM --platform=$BUILDPLATFORM` in Dockerfiles) rather than tying it to generic-arm64 lacking published tags — that explanation becomes stale the moment this PR merges and CI publishes them. - viewer: `get_alsa_audio_device()` short-circuits on `DEVICE_TYPE=generic-arm64` before the Pi-firmware dispatch, since the Rock Pi / Orange Pi / Banana Pi class of board has none of the `vc4hdmi` or `Headphones` ALSA cards. Defers to ALSA's `default` device; operators with a non-standard sink can override via `~/.asoundrc` (already bind-mounted into the viewer container). - tests: new assertions that generic-arm64 routes mpv through `--vo=gpu --gpu-context=wayland` and `--audio-device=alsa/default`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> fix(website): disambiguate Debian release codenames in supported-hardware copy Copilot flagged the previous wording — "running Raspberry Pi OS, Debian, or Armbian (Trixie or Bookworm)" — as misleading: the parenthetical reads as if Raspberry Pi OS and Armbian are themselves "Trixie or Bookworm", but those are Debian codenames, and Armbian builds can also be Ubuntu-based. Split the sentence so the codenames are tied explicitly to Debian. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(ansible): derive is_raspberry_pi from device_type, not architecture Copilot caught that the `is_raspberry_pi` helper in docker.yml was defined as `ansible_architecture in ['aarch64', 'armv7l', 'armv6l']`, which is also true on generic-arm64 (Rock Pi / Orange Pi / …). That silently applied the Pi-only `gpio` group to non-Pi SBCs. device_type is the authoritative discriminator and is validated upstream in ansible/site.yml's pre_tasks, so use it directly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor: rename device_type generic-arm64 → arm64 (parallel to x86) Per review feedback: `generic-arm64` was the original working name for the new aarch64 non-Pi fallback. `arm64` is shorter and parallels `x86` — both are architecture-generic device_types that catch any host without a board-specific image, sitting alongside the per-board labels (pi2 / pi3 / pi4-64 / pi5). User-facing prose still says "generic 64-bit ARM" or "Armbian on Rock Pi / Orange Pi / …" for context. Mechanical s/generic-arm64/arm64/ across install scripts, ansible, image_builder, viewer / start_viewer, host_agent, tests, CI matrix, mirror-latest-tags, Dockerfile.viewer.j2, README. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * review polish on arm64 PR - viewer: get_alsa_audio_device's arm64 short-circuit now logs the registered ALSA cards (from /proc/asound/cards — aplay isn't in the viewer image) once per process when DEVICE_TYPE=arm64, so an operator reporting "no HDMI audio" carries enough breadcrumbs in journalctl alone to pick the right ~/.asoundrc override. - ansible: rewrite the docker-buildx-plugin size claim — 15 MB download / 67 MB extracted, from the deb metadata on arm64. - viewer: MediaPlayerProxy.get_instance comment block split into a two-bullet rationale, calling out the pi4-64 and arm64 cases separately so a future reader doesn't mistake the lead sentence for "pi4-64-only". - install.sh / upgrade_containers.sh: spell out that the aarch64 catch-all in set_device_type is intentional — a future Pi model whose model string drifts past the regexes lands here too, trading software decode + no Pi-boot tweaks for a louder fail. - README + FAQ: tighten the Plymouth caveat from "few seconds of black" to "kernel boot log scrolls until the viewer takes over", which is what actually happens on most U-Boot ARM SBCs. - ansible: rename the docker.yml var from `is_raspberry_pi` to `device_is_pi` now that it's derived from device_type rather than `ansible_architecture`, so the name matches what it does. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: narrow arm64 support to Debian-based Armbian (call out Ubuntu) Copilot flagged that "Armbian" in the new docs is ambiguous — Armbian builds come in both Debian-based (Bookworm/Trixie) and Ubuntu-based (Jammy/Noble) flavours. The installer's ansible role wires Docker's apt repo under download.docker.com/linux/debian/{{ ansible_distribution_release }}, which 404s on the Ubuntu codenames, so an Ubuntu-Armbian user following the current docs would hit a broken install at the very first `apt update`. Narrowing the wording in README, the marketing site's supported-hardware blurb, and the FAQ to "Debian-based Armbian" so users pick the right image. Extending the installer/playbook to handle Ubuntu-based Armbian is a separate follow-up. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-13 10:05:52 +01:00
Viktor Petersson	861a29d38d	refactor(ci): release flow per #2769 (master = testing, releases = stable) (#2854 ) * refactor(ci): release flow per #2769 (master = testing, releases = stable) Master push now publishes container images only. Balena cloud deploy and disk-image build move to a release-triggered workflow so existing fleet devices update on cut releases instead of every merge to master. rpi-imager.json is generated once per release and shipped as a release asset; the website fetches it at build time instead of regenerating from the GitHub API on every deploy. - docker-build.yaml: drop the balena: job - build-balena-disk-image.yaml: trigger on release.published, add balena-cloud-deploy job (replaces deprecated deploy-to-balena-action), bump balena-cli 22.4.15 -> 25.1.3, install via bun, two-phase release upload so build_pi_imager_json sees per-board snippets - deploy-website.yaml: drop rpi-imager.json regeneration + test job; fetch it from the latest release instead - build_pi_imager_json.py: honour RELEASE_TAG env to bypass /releases/latest (which excludes prereleases by design) Also strips third-party action dependencies from new code (manual docker login, bun install, balena-cli install). Refs #2769 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor(ci): address Copilot review on PR #2854 - deploy-website: download rpi-imager.json by tag on release-triggered runs (previously: always default-latest, which can skip prereleases and may not match the just-published release) - deploy-website: drop the now-stale prerelease comment - build-balena-disk-image: pin Bun via BUN_VERSION env so disk-image builds and balena deploys are reproducible - generate-openapi-schema: accept an optional `ref` input via workflow_call and check that out, so the schema attached to a release matches the release commit (not the default branch) - python-lint: run rpi-imager generator tests so the package keeps a PR-time CI gate after the deploy-website test job was removed - build_pi_imager_json: reword RELEASE_TAG-override comment Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor(ci): address Copilot round-2 review on PR #2854 - build-balena-disk-image: capture BUILD_DATE once at the top of the packaging step so a midnight-spanning run can't reference different filenames produced earlier - build-balena-disk-image: workflow_dispatch now fails loudly when the input tag has no existing GitHub release, matching the input contract; release event always satisfies it on its own trigger - bun install: extract to .github/workflows/scripts/install-bun.sh, which downloads the pinned release archive + SHASUMS256.txt and verifies SHA-256 instead of piping a remote shell script to bash - deploy-website: re-introduce the strong jq -e validations on rpi-imager.json (os_list array, required fields, numeric sizes, https URLs, no pi1) so a malformed release asset fails fast - resolve-context: drop the unused `commit` output Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor(ci): address Copilot round-3 review on PR #2854 - install-bun.sh: append \$HOME/.bun/bin to GITHUB_PATH so globally- installed CLIs (e.g. balena-cli via \`bun install -g\`) resolve in subsequent steps. Without this, the disk-image workflow's balena invocations would fail with command-not-found. - deploy-website: distinguish "release exists but lacks rpi-imager.json" (transition fallback) from transient errors (auth/rate-limit/network). Probe via gh release view --json assets before download; only fall back when the asset is genuinely missing. Other gh failures now propagate instead of silently shipping an empty os_list. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor(ci): address Copilot round-4 review + tighten path triggers - build-balena-disk-image: pin git rev-parse to --short=7 so the resolved short hash always matches the 7-char tag format that docker-build.yaml writes (a longer abbreviation would silently reference image tags that never exist) - deploy-website: drop the `release: published` trigger. The disk- image workflow now ends with `gh workflow run deploy-website.yaml` after rpi-imager.json has been uploaded to the release, so the deploy is guaranteed to see the asset and won't ship an empty os_list during the upload-step window - deploy-website: add `.github/workflows/scripts/install-bun.sh` to the path triggers so changes to the bun installer also redeploy the site (it's a runtime dep) - docker-build / generate-openapi-schema: exclude `tools/raspberry_pi_imager/*` and the bun installer script from triggers — neither workflow uses those files Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> fix(ci): name release artefacts \`anthias-<board>\` so the imager regex matches build_pi_imager_json.get_board_from_url's regex \`-(pi\d(?:-\d+)?)\.img\.zst\$\` only matches a hyphen before \`piN\`. The disk-image workflow had been writing artefacts as \`raspberrypi3.img.zst\` / \`raspberrypi4-64.img.zst\` (no hyphen between \`raspberry\` and \`pi\`), so all boards except pi2 silently failed to be picked up by the consolidation step — likely the root of the broken rpi-imager.json the user flagged. Renames the per-board release artefacts to \`<date>-anthias-<board>.img.zst\` (and matching \`.sha256\` / \`.json\`) so the existing regex picks them up. Tests already covered the \`anthias-piN\` shape, so they pass without changes. Updates the upload-artifact + attestation glob patterns accordingly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor(ci): address Copilot round-6 review on PR #2854 - Move expression substitutions in resolve-context to env vars and switch the dispatch-tag read from `inputs.tag` to `github.event.inputs.tag`, so the `inputs` context is only consulted on workflow_dispatch where it's actually populated. - Add `actions: write` permission to build-rpi-imager-json so its `gh workflow run deploy-website.yaml` fan-out has the Actions API scope it needs to dispatch the website deploy. - Split the openapi-schema checkout ref resolution into a dedicated step that uses env vars + `if -n` rather than the inline `${{ inputs.ref \|\| github.ref }}` expression, so the inputs lookup is co-located with its fallback in one readable shell block. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(ci): fix stale install-bun.sh header comment The header described the runners as linux/amd64-only and asked maintainers to extend the platform detection if that changed, but the arch case below already covers both x86_64 and aarch64 Linux. Reword the comment so it matches the script's actual behaviour. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor(ci): drop hard-coded --repo from deploy-website gh calls `gh release view/download` default to the runtime repository when `--repo` is omitted, so explicitly pinning Screenly/Anthias was making the workflow needlessly less portable to forks (or a future repo rename) without buying anything. Match the rest of the workflow, which already relies on the runtime repo context. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor(ci): address Copilot round-9 review on PR #2854 - Gate build-balena-disk-image.yaml's release trigger to Anthias-core tags (`v<version>`). build-webview.yaml publishes its own `WebView-v<version>` GitHub releases on tag pushes; without this guard, every webview release would have spuriously fanned out to balena OTA deploys + disk-image builds. Filter is on resolve-context so the entire downstream pipeline cascades-skips via `needs:`. - Cache sha256 + size of each multi-GB image once and reuse for both the .sha256 sidecar and the per-board JSON snippet, instead of re-hashing the same files inside jq's --arg expansions. Roughly halves the wall-clock of the package step. - Add `tools/raspberry_pi_imager` to .dockerignore. The directory is build-time-only (CI generator for rpi-imager.json) but Dockerfile.{server,viewer}.j2 do `COPY . /usr/src/app/`, so without this entry it baked into runtime images. With docker-build.yaml's matching path-trigger exclusion in place, this keeps the two filters semantically honest: a tools-only commit truly cannot change image content, so skipping the container rebuild is correct rather than a footgun. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(ci): write the .sha256 sidecar against user-facing filenames The uncompressed-image line previously referenced \`\$BALENA_IMAGE.img\` (e.g. \`raspberrypi5.img\`), the CI-local intermediate name. That file never ships in the release asset, so \`sha256sum -c\` against the downloaded sidecar fails to find it. Switch to \`\$ARTIFACT.img\` — the filename a user gets after \`zstd -d <ARTIFACT>.img.zst\` — so both lines match files they actually have on disk. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(ci): call .venv/bin/pytest directly in python-lint job \`uv run --group website pytest …\` implicitly syncs the project venv with the default group set, which pulls in the \`dev\` group (pytest-django==4.12.0). pytest-django then auto-activates as a plugin, reads \`DJANGO_SETTINGS_MODULE\` from pyproject.toml, and fails to bootstrap Django because the curated dev-host + website install doesn't ship pytz / channels / the other transitive bits the settings module imports. Invoke the venv binary directly so the minimal hand-curated env above is what the rpi-imager unit tests actually run against. The tests don't need Django at all — this keeps the gate fast and the dependency surface honest. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(ci): pass -p no:django to the rpi-imager pytest invocation The previous attempt — calling \`.venv/bin/pytest\` directly instead of \`uv run\` — assumed the dependency-installation step bounded the venv contents. It doesn't: the earlier \`uv run ruff check\` step implicitly syncs the project venv with the default \`dev\` group, which ships pytest-django==4.12.0 + playwright + etc. By the time the rpi-imager step runs, pytest-django is sitting in .venv as an auto-loading pytest plugin, reads \`DJANGO_SETTINGS_MODULE\` from pyproject.toml, and crashes trying to bootstrap Django (pytz, channels, etc. are missing in this minimal env). The rpi-imager unit tests don't need Django at all, so disable the plugin with \`-p no:django\`. Verified locally: 22/22 pass with pytest-django installed in the venv as long as the plugin is disabled. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(x86): support balenaOS x86 fleets via Wayland (#2857) * feat(x86): support balenaOS x86 fleets via Wayland (#2075) Brings x86 to feature parity with Pi for balenaOS deployments. balenaOS x86 doesn't expose /dev/fb0, so Qt's linuxfb plugin (used on Pi) has nothing to draw to and there's no host display server. Run Qt under Wayland via `cage`, a kiosk wlroots compositor that talks directly to KMS — no X server, no DISPLAY juggling, single-app by design. - bin/deploy_to_balena.sh accepts -b x86 and strips /dev/vchiq from the rendered compose (same conditional that already covers pi5). - docker/Dockerfile.viewer.j2 sets QT_QPA_PLATFORM=wayland on x86; every other board keeps linuxfb. - tools/image_builder/utils.py adds cage + qt6-wayland to the x86 viewer apt list. - bin/start_viewer.sh wraps the viewer launch in `cage --` on x86; WAYLAND_DISPLAY is added to sudo's --preserve-env so it survives the env scrub when dropping to the viewer user. - .github/workflows/build-balena-disk-image.yaml extends the release-driven preflight, balena-cloud-deploy, and balena-build-images jobs to include x86 (fleet anthias-x86, balena device type genericx86-64-ext). build-rpi-imager-json is unchanged: the .img.zst regex is Pi-only, so x86 ships on the release without polluting the Raspberry Pi Imager JSON. Supersedes the stale draft PR #2409. The orphaned changes there (home.tsx deviceModel fetch with no consumer, viewer/media_player.py x86 audio table, silent removal of sha256sum -c on the webview tarball) are intentionally not carried forward. Closes #2075 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(x86): note x86 wayland exception in viewer apt comment Address Copilot review on PR #2857. The earlier comment in get_viewer_context claimed "nothing wayland-related here" — that's no longer true once x86 pulls in cage + qt6-wayland a few lines down. Rewrite to call out x86 as the one board that breaks the rule so future cleanup doesn't try to drop the wayland deps thinking they were a mistake. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-11 08:54:37 +01:00
Viktor Petersson	9baf750639	refactor(webview): inline build into viewer image as multi-stage (#2855 ) * refactor(webview): inline build into viewer image as multi-stage - Add docker/Dockerfile.qt5-webview-builder.j2 — two-stage Qt 5 cross-compile (sysroot + host) included from Dockerfile.viewer.j2 for pi2/pi3 - Inline a Qt 6 webview-builder stage in Dockerfile.viewer.j2 (qt6-base-dev + qt6-webengine-dev + qmake6) for pi4-64/pi5/x86 - Replace runtime curl-from-releases blocks with COPY --from=webview-builder for binary, resources, and (Qt 5) the qt5pi runtime tree - Drop WEBVIEW_VERSION pinning; the Qt 5 toolchain stays frozen at WebView-v2026.04.1 via a qt5_toolchain_url constant - Delete .github/workflows/build-webview.yaml and the dead build-webview.yaml / webview/** path-ignore exclusions in docker-build.yaml, docker-test.yaml, generate-openapi-schema.yml so webview source changes now trigger viewer rebuilds - Delete redundant Qt 6 builder scaffolding (webview/scripts/, webview/docker/, webview/build_qt6.sh, build_webview_with_qt5.sh) - Trim BUILD_WEBVIEW + WEBVIEW_VERSION from build_qt5.sh and rebuild_qt5_toolchain.sh; webview/Dockerfile and build_qt5.sh remain as offline tooling for Qt 5 toolchain rebuilds - Rewrite webview/README.md to describe the in-tree build flow Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(webview): address Copilot review feedback - Drop unused ccache install + cache mount from both Qt 6 and Qt 5 webview-builder stages — webview is 3 .cpp files; ccache wiring (especially through Linaro's cross-gcc) wouldn't pay back the setup cost - Vendor sysroot-relativelinks.py at webview/ instead of curl-ing it from raw.githubusercontent.com/.../master at build time (eliminates supply-chain risk and the non-reproducible reference) - SHA256-pin the Linaro gcc-7.4.1 tarball — Linaro doesn't publish signed manifests for this legacy build, so the hash is the trust anchor - Install python3 in the host builder stage (needed by the vendored sysroot-relativelinks.py) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(webview): invoke sysroot-relativelinks via explicit python3 The vendored script is committed at mode 755 and is callable directly today, but invoking it as `python3 /usr/local/bin/sysroot-relativelinks.py` removes the hidden dependency on the file-mode bit surviving every clone/checkout path. python3 is already installed two layers up in the same stage. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(webview): pin Qt5 builder to amd64 and vendor sysroot script in offline path Qt5 webview-builder stage was pinned to $BUILDPLATFORM, but the Linaro 7.4.1 cross-compiler it downloads is x86_64-only — arm64 build hosts (e.g. Apple Silicon) would attempt to run an x86_64 binary natively and fail. Pin the stage to linux/amd64 explicitly; non-amd64 hosts will execute it under QEMU. webview/Dockerfile (the offline Qt5 toolchain rebuild path) was still fetching sysroot-relativelinks.py via unpinned wget from raw.githubusercontent.com/.../master. The script is already vendored at webview/sysroot-relativelinks.py at a pinned upstream commit, and the rebuild script uses webview/ as the docker context, so switch to COPY for a reproducible offline rebuild path. Also update webview/build_qt5.sh to invoke the script via explicit python3 to match the inline builder change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(lint): exclude vendored sysroot-relativelinks.py from ruff/mypy The script is vendored byte-identical from a pinned Yocto/poky upstream commit (see file header). Reformatting it via ruff or annotating it for mypy strict mode would put the file off-pin and silently break the provenance comment that says "vendored from <commit>". Adding a project-style copy is the wrong tradeoff: the cost of every future upstream sync would be re-applying our edits. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-11 08:46:41 +01:00
Viktor Petersson	d98e605cb5	chore(server): bake collectstatic into image, drop runtime scratch mount (#2846 ) * chore(server): bake collectstatic into image, drop runtime scratch mount Static files (admin assets + the bun-built dist/) are immutable from image build time onward — `bin/start_server.sh` was running `collectstatic --clear --noinput` on every container start into a host bind-mount on /home/${USER}/anthias/staticfiles, which existed only as a writable scratch path for collectstatic to write to. Same data, every restart, into a directory the container itself populated. Move the work to where it belongs: - docker/Dockerfile.server.j2: run `collectstatic --noinput --clear` in the production stage, after the bun-built dist/ is COPYed in. Wrapped in `HOME=/tmp/anthias-build` because the Django settings module instantiates AnthiasSettings() at import time, which writes a default anthias.conf into $HOME/.anthias if one isn't there yet (start_server.sh seeds /data/.anthias before this same import at runtime; at build time the throwaway HOME is removed after the RUN finishes). - src/anthias_server/django_project/settings.py: STATIC_ROOT moves from /data/anthias/staticfiles to /usr/src/app/staticfiles. Inside the container this path is now read-only — admin + collected app static is immutable per-image. Dev (DEBUG=True) bypasses STATIC_ROOT entirely via WHITENOISE_USE_FINDERS so the path doesn't have to exist in the dev image. - bin/start_server.sh: drop the runtime collectstatic invocation and the "Generating Django static files..." progress line. - docker-compose.yml.tmpl: drop the /home/${USER}/anthias/staticfiles -> /data/anthias/staticfiles bind-mount. The host-side directory becomes orphan state after upgrade — operators can `rm -rf ~/anthias/staticfiles` once the new image is pulled. (One of the two reasons ~/anthias has to persist after install. The other — runtime shell scripts in ~/anthias/bin/ — is tracked separately in #2845.) Verified by building the production server image locally (`docker buildx build --file docker/Dockerfile.server`): - 210 static files copied to /usr/src/app/staticfiles at image build. - Container starts, uvicorn comes up, no "Generating Django static files..." line. - `curl http://localhost:8080/static/admin/css/base.css` -> HTTP 200, 22120 bytes (matches the baked file). - /data/anthias/ does not exist in the running container -- no runtime scratch dir is needed. Refs #2845. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: address Copilot review nits Two pure-comment fixes flagged by Copilot review on #2846: - src/anthias_server/django_project/settings.py: "admin assets + collected app static is immutable" -> "admin assets and collected app static are immutable" (compound subject takes plural verb). - docker/Dockerfile.server.j2: "COPYed" -> "copied" in the collectstatic comment block. No behavior change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 08:43:12 +01:00
Viktor Petersson	7aaf4d14ad	ci(docker): pull bun from ghcr.io/screenly mirror (#2848 ) - Replaces oven/bun:1.3.13-slim with ghcr.io/screenly/bun:1.3.13-slim in Dockerfile.server.j2 (bun-builder FROM + dev-stage COPY) and Dockerfile.test.j2 (COPY) - Mirror is populated by .github/workflows/mirror-bun-image.yaml - Eliminates the last Docker Hub pull from CI builds Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 08:33:57 +01:00
Viktor Petersson	71e9e61fb9	chore(docker): drop the 2026-05-01 webview cache-bust step (#2844 ) The cache-bust marker existed only because pi4-64/pi5 webview tarballs got reuploaded under the same WebView-v2026.04.1 release URL after `b9509609`. The comment told a future committer to revert it ``once the next viewer image rebuild ships'' — that's now: the PR #2841 viewer image bumped to WebView-v2026.05.0, so the URL itself is different and Docker layer caching no longer needs the no-op RUN to invalidate. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-08 07:37:08 +01:00
Viktor Petersson	e97382886f	Replace React frontend with Django templates + HTMX/Alpine (#2818 ) * chore: realign sonar + gitignore comment to src/ layout sonar-project.properties still pointed at the pre-refactor top-level packages (anthias_app, anthias_django, api, lib, viewer, ...) and their old per-file coverage.exclusions paths, which would have produced empty Sonar runs and stale exclusions. Collapse sources to `src` and rewrite the exclusions to the new src/anthias_/ paths. Also fix the stale path reference in .gitignore's comment for the test DB (now src/anthias_server/django_project/settings.py). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> chore: gitignore .claude/ and untrack the lock file I just leaked Previous commit accidentally pulled in .claude/scheduled_tasks.lock because .claude was in .dockerignore but not .gitignore. Add the pattern to .gitignore and drop the file from the index. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(docker): pass --no-install-project to dev image-builder uv sync The `8dbf4eab` src/-layout refactor changed pyproject.toml to find packages under src/, but Dockerfile.dev only COPYs pyproject.toml and uv.lock into the image-builder stage — src/ doesn't exist there. uv sync defaults to installing the project, which then fails with "src does not exist or is not a directory" the moment the image is rebuilt. Match the pattern uv-builder.j2 already uses: install only the docker-image-builder dep group, not the project itself. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor(packaging): move templates/ and static/ into src/anthias_server/app/ The `8dbf4eab` src/-layout refactor moved Python source under src/ but left Django templates and static assets at the repo root. Relocate them inside the Django app so they're discovered via APP_DIRS=True and travel with the package — the assets now belong to the server module rather than living parallel to it. templates/ → src/anthias_server/app/templates/ static/{favicons,img,sass,src} → src/anthias_server/app/static/ Settings: drop the explicit DIRS/STATICFILES_DIRS entries; APP_DIRS and AppDirectoriesFinder pick the new locations up automatically. Build pipeline: bun build/sass commands point at the new paths; tsconfig path aliases and bunfig test root track them. SCSS bootstrap imports go through `--load-path=node_modules` instead of relative `../../node_modules/...` so the partials stop caring how deep they sit in the tree. Production Dockerfile.server bun-builder COPYs adjusted to match. Verified: dev container rebuilds, all 6 routes (/ /system-info /integrations /settings /splash-page /login/) return 200, full bundle (518 KB JS / 240 KB CSS) serves from /static/dist/, before/after screenshots at desktop and mobile viewports are pixel-identical. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * build(frontend): vendor htmx, alpine, sortable, and Plus Jakarta Sans Adds the post-React runtime as a self-hosted bundle and removes the last cross-origin asset from base.html (Google Fonts CDN). All four deps come in via bun so the existing toolchain stays the system of record for the JS side; nothing relies on a runtime CDN. vendor.ts is the single entry point loaded by base.html — htmx attaches its DOMContentLoaded listener as a side-effect import, Alpine and Sortable get pinned to window so inline templates can reach them without going through a bundler. Build pipeline gains build:vendor (bun build → dist/js/vendor.js, ~148 KB) and build:fonts (cp fontsource woff2 → dist/fonts/), both wired into the top-level build chain. Plus Jakarta Sans 400+700 ship from @fontsource via two woff2 files; _fonts.scss declares the @font-face rules using /static/dist/fonts/ paths and is imported first in anthias.scss so the family is registered before bootstrap variables resolve. base.html and splash-page.html drop the fonts.googleapis.com <link>; base.html gains a <script defer> for vendor.js. The existing React bundle (anthias.js) stays loaded alongside vendor.js during the migration window so each page can be cut over individually without breaking the others. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(views): server-render /system-info as the first React→Django cutover Lays the foundations all subsequent page migrations will reuse and flips /system-info to a plain Django template as the pilot. Foundations: * page_context.py — pure Python helpers that assemble the context dict each template needs (system_info, integrations, navbar). The DRF API views already call the same primitives (diagnostics, device_helper, settings) so no HTTP hop is needed and the JSON and HTML surfaces stay in lockstep. * helpers.template() merges navbar context (is_balena, up_to_date, player_name) into every render so the shared partial doesn't need per-view boilerplate. * _layout.html is the new common shell — extends base.html, drops in _navbar.html and _footer.html around a {% block main %}. New pages extend _layout instead of base directly. * _navbar.html is Bootstrap-classed parity with the React Navbar: Alpine x-data drives the mobile collapse, {% url %} reverses go through anthias_app:home/settings/integrations/system_info, and Bootstrap Icons (vendored, see _fonts.scss) replace react-icons. * _footer.html mirrors the React Footer 1:1 (Try Screenly link, API/FAQ/Screenly.io/Support, GitHub stars badge). Cutover: * views.system_info() builds context from page_context.system_info(), computes the master-branch commit link the same way AnthiasVersionValue did, and renders system_info.html. * urls.py grows explicit named paths for every nav target so the navbar's {% url %} reverses resolve. Pages that haven't been migrated yet keep views.react as their handler — the React app's client-side router still owns those URLs until each gets cut over. Bootstrap Icons ride along: _fonts.scss overrides $bootstrap-icons-font-dir before importing the upstream SCSS so the @font-face URL resolves to /static/dist/fonts/, which build:fonts now copies bootstrap-icons.woff2 into alongside the Plus Jakarta Sans files. Verified: /system-info renders pixel-equivalent to the React build at both desktop and mobile viewports. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(views): server-render /integrations and /settings (forms, backup, system controls) Cuts /integrations and /settings over to plain Django views; both extend the _layout shell from the previous commit and use page_context helpers so the API and template surfaces stay in lockstep. /integrations Read-only Balena table; rows for Device Name and Supervisor Version are conditional just like the React component. When is_balena is False the body is empty (matches the React fallback). /settings Single GET render populated from page_context.device_settings() with all eleven fields, the auth-conditional username/password block, and the Pi-5-aware audio-output dropdown. Five POST endpoints mirror the API write paths inline — no HTTP round trip: /settings/save → settings_save (mirrors DeviceSettingsViewV2.patch) /settings/backup → backup_helper.create_backup → FileResponse /settings/recover → backup_helper.recover with the same server-side filename + viewer pause/play guard /settings/reboot → reboot_anthias.apply_async /settings/shutdown → shutdown_anthias.apply_async Reboot/shutdown wrap their submit buttons in a single Alpine confirmation overlay; Bootstrap's .modal/d-flex/!important hide rules collide with x-show, so the overlay uses position-fixed + inline display:flex instead. Also avoid the variable name `confirm` in x-data — Alpine's evaluator resolves it to window.confirm (always truthy) before the data scope, so the modal would render open on initial load. _settings_toggle.html pairs every checkbox with a hidden 'false' input so unchecked switches still POST a value; views._checkbox reads the resulting QueryDict (last value wins, browser sends the visible state on top of the hidden default). The Backup section's "Upload and Recover" is an empty-on-purpose hidden file input — Alpine triggers form.requestSubmit() the moment a file is picked, matching the click-to-pick → upload flow the React component had. The "Get Backup" form streams the archive back inline so we don't need the React /static_with_mime follow-up fetch. [x-cloak]{display:none!important} added to _fonts.scss so any other overlays we add later don't flash before Alpine paints. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(views): server-render / (Schedule Overview) — assets, modal, sortable Cuts the home page from the React SPA to a Django template + HTMX + Alpine + Sortable. URLconf flips `path('', views.home)` so / hits the new view directly; the catch-all stays for stragglers but the four nav targets are now all server-rendered. Page shape: * page_context.assets() splits Asset.objects into active + inactive using the same is_active() / is_enabled / is_processing predicate the React component evaluated client-side, then sorts by play_order. * home.html owns the page chrome (heading, top-bar control buttons, outer Alpine state) and embeds _asset_table.html in an HTMX-swappable container. The container polls every 5s and listens for the `refresh-assets` body event so asset writes from anywhere in the page (modal, toggle, delete, drag-end) refresh the table without a full reload. * _asset_table.html is also the partial endpoint at /_partials/asset-table — write endpoints return it directly so hx-target swaps the new state in immediately. * _asset_row.html renders a single row; activates the drag handle only on active rows. * _asset_modal.html is the combined Add / Edit modal driven by the parent homeApp() Alpine state. Add has URI + File Upload tabs. * _empty_assets.html is the empty-state cell. Write endpoints (all in views.py): * /assets/new — URI add (validate_url + mimetype guess) * /assets/upload — multipart file upload, mirrors FileAssetViewMixin's assetdir handling * /assets/<id>/update — edit (name, mimetype, dates, duration, nocache, skip_asset_check) * /assets/<id>/toggle — flip is_enabled * /assets/<id>/delete — delete row * /assets/order — reorder (CSV ids → save_active_assets_ordering) * /assets/<id>/download — redirect for url-mimetypes, FileResponse for files * /assets/control/<cmd> — previous / next playback (Redis pub/sub via ViewerPublisher) All write endpoints return the table partial when called via HTMX (_asset_table_response checks HX-Request) and redirect back to / when called as a plain form POST — fallback works without JS. Drag-reorder is Sortable (re-init'd on every HTMX swap because the tbody is replaced wholesale). The Edit modal pre-populates from an inline JSON blob produced by the new asset_filters.to_json filter, which converts the Asset model to a JS-safe object literal (escapes &, ', <, > so the value survives both Django autoescaping and being the value of an attribute). Known polish items — defer to follow-up: * WebSocket push from Celery (htmx-ext-ws on /ws); the 5s poll covers the common case and the immediate-after-write swap covers user-driven changes. * Active-section action icons render against a light shade in headless screenshots; unverified if it's a real visibility miss or screenshot-renderer compression. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(frontend): rip out the React stack now that every page is server-rendered Every nav target (/, /system-info, /integrations, /settings) and the auxiliary pages (/login/, /splash-page) now run on Django templates + HTMX + Alpine + Sortable, so the React/Redux surface and its toolchain go. Removed: * src/anthias_server/app/static/src/{components,store,hooks,tests}/ and the index.tsx / setupTests / constants.ts / types.ts roots * src/anthias_server/app/templates/react.html * the catch-all React route in app/urls.py and the views.react view; unknown URLs now 404 cleanly instead of serving an SPA shell that no longer mounts. Login post-success redirects to anthias_app:home. * The static/dist/js/anthias.js bundle (the old React build output) * package.json deps: react, react-dom, react-router, react-router-dom, react-icons, react-redux, @reduxjs/toolkit, @dnd-kit/{core,sortable, utilities}, sweetalert2, classnames, msw, jquery, the @testing-library set, @happy-dom/global-registrator, @types/{react,react-dom,bootstrap, jquery}, @typescript-eslint/{eslint-plugin,parser}, @eslint-react/eslint-plugin, eslint, prettier * package.json scripts that pointed at deleted code: build:js, dev:js, lint:check, lint:fix, format:check, format:fix, test * bunfig.toml (only used by `bun test`), eslint.config.mjs, .prettierrc, .prettierignore Kept: * htmx, alpine, sortable (vendor.ts entry → dist/js/vendor.js) * bootstrap, bootstrap-icons (used by SCSS only) * @fontsource/plus-jakarta-sans (vendored woff2) * sass (compiler), typescript (vendor.ts checking) Verified post-cleanup: dev container restarts, all six routes return 200, vendor.js + anthias.css + the three vendored woff2 files serve from /static/dist/. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(tests): repath standby.png + tweak modal so the integration suite passes Three integration regressions surfaced when the test image ran end- to-end against the new templates; this commit lands the minimal fixes to land the suite green. * tests/test_app.py and bin/prepare_test_environment.sh and src/anthias_server/api/tests/test_v1_endpoints.py all hardcoded the pre-refactor static/img/standby.png path. Repath to src/anthias_server/app/static/img/standby.png so the file loads from its new location. * Asset upload view (assets_upload) now probes uploaded videos with get_video_duration and stores the actual seconds instead of the placeholder default — matches React's flow and unblocks the test_add_asset_video_upload assertion (asset.duration == 5). * _asset_modal.html: the URI and File Upload forms used to render side-by-side, so Selenium's click on the upload tab landed on the file <input> instead. Wrap them in the tab x-data scope and gate each form with x-show="tab === ..." so only the active tab is clickable. Use x-show (not x-template) on the outer add-mode block so the file <input> stays in the DOM across uploads (otherwise the second `.fill()` in test_add_two_assets_upload couldn't find it). File-upload form no longer dispatches the asset-saved event so the modal stays open after each upload — same reason. * Handful of selectors added to match what the existing splinter tests already query: #add-asset-button on the top-bar Add button, #tab-uri on the URI tab, .upload-asset-tab on the File Upload tab, onchange="this.form.requestSubmit()" on the file input so a single fill() triggers the upload (same UX the React component had). Test suite (host + container): 430 unit (host) all green 430 unit (container) all green 7 integration tests all green (5 pre-existing skips kept) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(ci): land mypy clean, ruff-format clean, full coverage, no-op JS scripts CI surfaced four fronts after the migration commits — fix them all together so the next push gets the suite green. mypy (-13 errors → 0) * views.py: assets_upload narrows file_upload.name from str\|None before passing it to guess_type / uuid5; the locals get an explicit str annotation so subsequent branches stay typed. * views.py: assets_update uses datetime.fromisoformat from datetime directly — django.utils.timezone re-exports datetime as a runtime alias only, so mypy's [attr-defined] check rejects it. * views.py: assets_download narrows asset.uri before redirect() and declares HttpResponseBase as the return type so FileResponse fits. * views.py: settings_save inlines the auth-update block from api.views.v2.update_auth_settings rather than handing the form-POST dict to Auth.update_settings (which expects a DRF request). * views.py: settings_backup return type → HttpResponseBase for FileResponse. * page_context.device_settings(): cast device_helper.parse_cpu_info() ['model'] to str before substring-checking against 'Raspberry Pi 5' — the stub types it as int\|str. ruff format (-2 files → 0) * views.py and asset_filters.py reformatted; ruff format clean. Coverage (79.7% → 80.8%, above the 80% gate) * New tests/test_template_views.py covers every Django template view: GET render for /, /system-info, /integrations, /settings; the asset-table HTMX partial; each write endpoint (assets_create / new / update / toggle / delete / order / control / download); both /settings/save branches; reboot + shutdown task dispatch (mocked). Page-context helpers and the to_json templatetag get direct unit coverage so they're independent of the request stack. JS lint / test (was failing on missing scripts) * package.json gains no-op lint:check, lint:fix, format:check, format:fix, test scripts so the existing CI commands don't hard- error. The scripts are stub echoes — drop them when real linting / tests come back. * test-runner.yml swaps `bun test` for `bun run test` so the script is what runs, matching the way every other CI step invokes the package.json scripts. Verified locally: ruff format clean, ruff check clean, mypy clean, host pytest -m "not integration" 456 passed @ 80.76% line+branch coverage. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(ci): ruff format the new test_template_views.py * fix(ui): address Copilot review on the home/footer template Three real items raised by Copilot's PR review: * _asset_table.html dropped its outer id="asset-table" — home.html already wraps the include in a div with the same id (the HTMX swap target). Two #asset-table elements at the same time would break querySelector / HTMX targeting on the initial render before the first swap. The partial wrapper stays as a plain <div>. * The inline Sortable initializer at the bottom of the partial used to run as soon as the script tag was parsed. base.html loads vendor.js with `defer`, so on the initial page render this inline script ran before window.Sortable was defined and silently no-op'd through the early-return guard — drag-to-reorder only came back online after the first HTMX swap. Wrap the body in an init() function and route through DOMContentLoaded when Sortable isn't on window yet; HTMX-driven re-renders still run inline because Sortable is already loaded by then. * _footer.html dropped the img.shields.io GitHub-stars badge. base.html used to point at fonts.googleapis.com and we vendored that off; the shields.io badge was the last runtime CDN call left in the page tree. Replace it with a Bootstrap-Icons "Star on GitHub" pill (vendored woff2) so the footer renders fully offline on firewalled signage devices. 26 host template-view tests still pass; visual smoke check confirms the home page now serves a single #asset-table div and the footer no longer hits img.shields.io. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(tools): drop the throwaway screenshot-capture helper tools/_capture_screenshots.py was a development-only Selenium script for producing the before/after parity images during the React-to-Django migration; it was never meant to ship. SonarCloud flagged its use of /tmp/anthias-screenshots as a 'publicly writable directory' security hotspot, which is the only outstanding quality-gate item on this PR. Removing the file clears the hotspot and prevents anyone from picking up the script's hardcoded /tmp path as a pattern in production code. The screenshots themselves remain (out of tree at /tmp/anthias-screenshots/before\|after/) for visual diff during review. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(tests): sequence the two-upload integration test against HTMX swaps test_add_two_assets_upload calls splinter's .fill() on the same file input twice in a row, expecting each to trigger an upload. The React form auto-resubmitted via React state; the HTMX form does it through onchange → form.requestSubmit() → POST + asset-table swap. On local Docker that round-trip finishes well before the second .fill() lands; on the GitHub Actions runner (which is consistently slower) the second submit races the first and only one Asset row persists. CI surfaced this as a flaky `assert 1 == 2`. Add a 3 s settle gap between the two fills so the second upload always starts against a settled DOM, and bump the trailing sleep from 3 s → 5 s to cover the second HTMX round-trip + table re-fetch. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(home): keep #asset-table id+hx-* on the partial; condition-wait in test_add_two_assets_upload; drop empty hx-post on edit Three follow-ups from the second Copilot review pass. * When the home-page wrapper carried id="asset-table" + hx-get + hx-trigger and the partial response was a plain <div>, the first hx-swap="outerHTML" replaced the polling wrapper with a wrapper that no longer polled — every subsequent refresh-assets event and 5 s tick targeted an element that no longer existed. Move the id + hx-get + hx-trigger onto the partial's outer div instead. home.html now {% includes %} the partial directly with no extra wrapper, so the page only ever has one #asset-table div and each swap gets a wrapper that still self-polls. (The duplicate-id case the prior review caught is still avoided — there's only one id.) * The edit-asset form had hx-post="" alongside :action="...". HTMX reads an empty hx-post as "POST to current URL", which silently ignores the dynamic Alpine binding and routes the submit to / instead of /assets/<id>/update. Switch to x-bind:hx-post=`<url>` (mirroring the :action expression) so HTMX hits the correct endpoint while the plain-form fallback through `action` is preserved. * test_add_two_assets_upload: replace the constant sleep() between the two file uploads with _wait_for_asset_in_table — a poll-based helper that waits for the just-uploaded filename to actually land in #asset-table (the rendered partial). Constant sleeps either run long locally or short in CI; condition-waits make the test pass faster on a quiet machine and reliable on a busy runner. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(tests): use whole-page HTML in _wait_for_asset_in_table The helper held a `find_by_id('asset-table')` element handle and then read `.html` off it on every iteration. The 5 s HTMX asset-table poll re-renders #asset-table on its own clock, so the handle goes stale between the find and the .html read and Selenium raises StaleElementReferenceException. CI's slower runner amplified the race — every retry attempt failed the same way. Switch to `browser.html` (whole-page HTML) for the substring check. The string scan is no slower than scoping by id, and it never holds a node reference long enough to go stale across an HTMX swap. Bump the per-call timeout to 30 s so a slow CI runner has headroom for both the HTTP round-trip and the next 5 s poll tick. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(ui): respect device date_format/24h on rows; CSRF cookie fallback for sortable; sync edit-modal comment with code Three more from the latest Copilot review pass. * _asset_row.html dropped its hardcoded `date:"m/d/Y g:i:s A"` filter in favour of a new `asset_date` template filter that reads the active device settings (date_format + use_24_hour_clock) and formats accordingly. Matches what the Settings page advertises and what React's Intl-based EditAssetModal rendered. The filter lives in app.templatetags.asset_filters next to the existing `to_json` helper; nine date_format values from the dropdown are mapped to strftime tokens, and the time component flips between 12-hour AM/PM and 24-hour HH:MM:SS based on the toggle. * The inline Sortable handler in _asset_table.html used to read the CSRF token from `document.querySelector('input[name=csrfmiddlewaretoken]').value` with no null-guard. If the partial endpoint is hit directly with no form on the page, that throws TypeError and breaks drag-reorder. Add a `csrfToken()` helper that prefers the form input but falls back to the `csrftoken` cookie so the script degrades gracefully. * _asset_modal.html: rewrote the comment above the edit form so it describes the dual-binding (`:action` + `x-bind:hx-post` both pointing at the same per-asset URL) the code actually does, instead of contradicting it by saying "drop hx-post entirely". No code change. Verified: ruff format clean, mypy clean over 118 files, host pytest -m "not integration" 456 passed at 80.76 % coverage; the new template-view tests still cover the asset-table render path that hits the new asset_date filter. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(ui+filter): cache settings reads, monotonic timeouts, fw-normal, freeze edit-mimetype Five Copilot-flagged items in one commit. * asset_filters.asset_date dropped its per-call settings.load(). The AnthiasSettings singleton lives in memory across requests; the only writer is the Settings page POST handler, which calls .save() on the same object after .load(). Re-reading the .conf file from disk on every start/end cell during the 5-second HTMX poll was real overhead on long playlists for no consistency benefit. * tests/test_app.py:_wait_for_asset_in_table now uses time.monotonic() for the deadline. Wall-clock time can step backwards on NTP sync or VM clock drift; monotonic guarantees the timeout window stays whatever we asked for. * system_info.html and integrations.html swapped Bootstrap 4's removed `font-weight-normal` utility for Bootstrap 5's `fw-normal` on the Option/Value/Description column headers — they were rendering at the default weight before because the class no longer exists in the bundled Bootstrap. * _asset_modal.html turned the edit form's <select name="mimetype"> into a read-only display field. The value is derived at create time from the asset's URI/file; letting a user flip an image row to "webpage" only desynced the stored type from the actual content. views.assets_update also stops accepting a posted mimetype for existing assets, so the read-only UI is enforced on the server too. Verified: ruff format clean, host pytest -m "not integration" 456 passed, the new template-view tests still cover the asset-table render path that exercises asset_date and the assets_update endpoint. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(views+ui): align write paths with the v2 API contract; harden Sortable error path; fix backup file path Seven Copilot items in one batch. Backend (views.py): * assets_create / assets_upload now compute play_order as count(active assets) so newly-added rows land at the end of the active list instead of jumping to position 0 and shoving everything else over. * assets_upload uses uuid4().hex for the on-disk filename instead of uuid5(NAMESPACE_URL, name). The deterministic v5 form would collide for two uploads sharing a filename (different content), silently overwriting the older file. * assets_upload sets duration=0 for video assets — matches the v2 API rule (CreateAssetSerializerV2 rejects video duration > 0; the scheduler reads real length from the file at playtime). * assets_update enforces duration=0 for video assets server-side, so a hand-crafted POST can't desync the row from the API contract. * settings_backup builds the archive path from $HOME/anthias/staticfiles/ to match where backup_helper.create_backup actually writes the tarball. The pre-fix path.join('static', filename) was relative to CWD and would FileNotFoundError under uvicorn in production. Frontend: * _asset_modal.html: edit form's duration input now :disabled when editAsset.mimetype === 'video' and pinned to 0; disabled fields don't POST so the server never sees a stale duration for videos. * _asset_table.html: Sortable's onEnd handler now logs the rejection on a non-OK fetch response (and the catch branch logs the error too) before triggering refresh-assets — the page still resyncs with the persisted state, but the operator gets a console signal if a CSRF/5xx is silently dropping their reorder. Verified: ruff format clean, mypy clean over 118 files, host pytest -m "not integration" 456 passed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(app): expect duration=0 on video uploads (v2 API contract) The previous commit aligned the HTML upload path with the v2 API contract that pins video duration to 0; update the integration tests so they assert against the new (correct) value instead of the probed length the upload used to persist. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(templates): convert multi-line {# #} comments to {% comment %} Django's {# ... #} comment syntax is single-line only — the multi-line variants survive into the rendered HTML as visible text. The asset-table wrapper, the modal dual-binding note, the read-only mimetype rationale, the video-duration explanation, and the footer's "was an img.shields.io badge" comment were all showing up on the page in the dev container. Replace the five multi-line {# … #} blocks across _asset_modal.html, _asset_table.html, and _footer.html with {% comment %} … {% endcomment %}, which is Django's actual multi-line comment syntax. Single-line {# #} comments elsewhere are left alone — those parse fine. Verified by curl-ing every route ( /, /system-info, /integrations, /settings, /login/, /splash-page ) and confirming the page HTML contains zero leaked comment fragments. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(home+settings): readable active rows, real centered modals, day/time editor, plain Type label Active assets section * SCSS .active-content table now overrides --bs-table-bg AND --bs-table-color so Bootstrap 5's table cascade stops painting the cells with white-on-white. Action icons are visible again and the start/end/duration columns are readable on the purple bg. * "Activity" column header renamed to "Active" per the user's note. Edit modal * Type renders as a plain text label (small secondary caption + value) instead of a styled <input readonly>. Visually obvious it can't be edited; matches the user's expectation. Server still rejects any posted mimetype for existing assets. * Re-added the day-of-week + time-of-day window editor that the React modal had: seven Mon–Sun checkboxes (1–7 ISO) and Play-from / Play-until time inputs. assets_update parses the form values back into Asset.play_days / play_time_from / play_time_to with the same partial-window guard the API uses (both endpoints set, or both cleared). asset_filters._to_dict now exposes play_days_list and HH:MM-trimmed time strings on the Alpine editAsset blob so the checkboxes / inputs can pre-populate without extra fetches. Modals (all of them) * _asset_modal.html (Add + Edit), home.html delete confirmation, and settings.html reboot/shutdown prompt now use the same inline-style position-fixed overlay (display:flex; align-items:center; justify-content:center; full viewport coverage). Bootstrap's position-fixed/h-100/w-100 class chain was getting trapped by an ancestor on /settings, so the reboot dialog rendered top-left. Inline styles bypass that. * Native window.confirm() on delete is replaced by an Alpine confirmation overlay matching the reboot/shutdown UX. Frontend perf / correctness * URI-add, file-upload, and edit forms used to fire `refresh-assets` in hx-on::after-request, which kicked off a redundant HTMX poll on top of the partial swap each successful submit had already applied. Drop the trigger; the swap is enough. * The Sortable reorder fetch() now sends `HX-Request: true` so the server returns the small partial instead of redirecting to / and forcing fetch() to download the whole home page only to discard it. Multi-line {# … #} cleanup * Five remaining multi-line Django comments converted to {% comment %} … {% endcomment %} blocks (the home.html delete-modal comment and the new edit-modal comments were leaking into the page the same way the earlier batch did). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(home): TS-only first-party JS, Flatpickr-driven locale-aware editor, schedule label on rows User feedback rolled into one commit. Per-page JS moves to TypeScript * The homeApp() Alpine component, its Flatpickr binding, and the drag-reorder Sortable initialiser all live in src/anthias_server/app/static/src/home.ts and are bundled by `bun run build:home` into static/dist/js/home.js. home.html loads the bundle through {% block extra_head %}; the only inline lines left in templates are the one-call shim that hands the Django-resolved /assets/order URL into initAssetTableSortable(). Third-party libraries (htmx / Alpine / Sortable / Flatpickr) keep going through vendor.ts as imports — no copy-pasted JS. Locale-aware date / time pickers * base.html exposes <meta name="anthias-date-format"> + <meta name="anthias-use-24h"> derived from the device settings, so the home.ts bundle can configure Flatpickr to render in whichever format the operator chose on /settings rather than whichever format the browser defaulted to. * Edit modal's Start / End / Play-from / Play-until inputs flip from `<input type="datetime-local">` / `<input type="time">` to text inputs that Flatpickr binds to. assets_update tries the configured format first when parsing the POST, falls back to ISO fromisoformat() so existing rows / API writes still parse. Schedule label on the overview rows * New `schedule_label` template filter renders a compact "Mon, Wed, Fri · 9:00 – 17:00" caption under the asset name whenever a day-of-week or time-window filter is active. Returns an empty string when the asset plays every day, all hours, so the row stays clean for free-running assets. Time format honours use_24_hour_clock. Plus an audit cleanup * Two more multi-line {# … #} comments (in home.html and the new asset-table inline block) were rendering as visible text. Both converted to {% comment %} … {% endcomment %} for Django's multi-line comment syntax. Verified locally: ruff format clean, host pytest -m "not integration" passes, all six routes render without leaked comment fragments, schedule labels render under the asset name on /, edit modal opens with Flatpickr inputs in the configured locale. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(home): defer Alpine.start until DCL, hi-contrast schedule label, parsable openDelete arg Three blockers that surfaced as soon as the home.ts / vendor.ts split hit the live container. Alpine boot order * vendor.ts called Alpine.start() at parse time. Both vendor.js and home.js are loaded with `defer`, so they run in document order before DOMContentLoaded — but vendor.js (loaded first) was firing Alpine.start() before home.js had a chance to attach window.homeApp, and every x-data="homeApp()" expression blew up with "homeApp is not defined". Wrap Alpine.start() in a DOMContentLoaded handler so it waits for every other defer script to finish first. Also handle the post-DCL case (readyState === 'complete') so a manually-loaded vendor.js still boots Alpine. Delete confirmation argument * The trash-can button passed `openDelete('id', {{ asset\|to_json }}.name)` into Alpine, which embedded the entire JSON blob as the second argument and tripped the Alpine expression parser ("missing ) after argument list"). Switch to `'{{ asset.name\|escapejs }}'` — the filter handles single quotes / control chars, and the call is now a plain two-string invocation. Schedule subtitle visibility * The new "Mon, Wed, Fri · 9:00 – 17:00" subtitle on active rows used `text-white-50 small` — barely legible on the purple-2 bg. Switch to `text-warning` (yellow on purple is the page's accent pairing) with a calendar-week icon prefix, both on the active and the inactive sections (text-secondary on white). Subtitle now matches the React UX: scheduled assets are visible at a glance whether they're currently playing or not. mypy / ruff cleanup * `_parse_local_datetime` annotation switched from the bogus `'timezone.datetime'` (mypy `[name-defined]`) to a proper top-level `datetime` import. Local `from datetime import datetime` shadows are gone. ruff format clean over 118 files; mypy clean. Verified: DB write round-trip on /assets/<id>/update persists play_days correctly; the only reason the test asset moved to the inactive section was that the saved [1,2,3,4,5] window doesn't match today's weekday (expected behaviour, not a bug). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(home): seed Flatpickr from the ISO :value via setDate, not by re-parsing the mask Edit modal Start / End and Play-from / Play-until inputs are seeded by Alpine with whatever string the server's to_json filter produces: ISO `YYYY-MM-DDTHH:MM` for the datetime fields, `HH:MM` for the time-only fields. Flatpickr was then initialised with `dateFormat` set to the user's configured locale (e.g. `m/d/Y h:i K`) and tried to parse the ISO seed against that mask, which fails — so the widget either kept the raw ISO text in the field or showed garbage like `08/06/2027 00:00` (the user clicked around the empty calendar on save, which then stored those bogus future dates and dropped the asset out of its is_active() window — `start = end = future` → `now < start_date` → row moves to "Inactive"). Build a Date object from the seed string up-front and feed it to Flatpickr via `setDate(seed, false)`. Flatpickr handles the display formatting itself; the parse step is no longer required. Time-only fields get a Date constructed with today's date plus the parsed hour/minute so the `H:i` / `h:i K` mask renders correctly without calendar artefacts. Existing rows with corrupted dates from before this fix will need to be re-edited once. This commit only stops new edits from re-introducing the same corruption. Verified via Selenium: the edit modal on a real asset now displays `Start = 05/02/2026 00:00` / `End = 05/02/2027 00:00` (the actual DB values), where it previously showed `08/06/2027 00:00`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(home): partition by is_enabled (operator-facing), not by is_active() — and audit play_order callsites is_active() is the scheduler's predicate: enabled AND in date range AND today's weekday/time matches the play_window. Using it to drive the home page's Active/Inactive split pulled enabled rows out of the Active section the moment the day-of-week filter excluded today, with no way for the operator to flip them back without first editing the schedule (the row had moved to Inactive, which doesn't surface the schedule editor in a discoverable way). Match React's behaviour: the Activity toggle in the row controls `is_enabled`, and the Active section is "everything the operator flipped on, minus rows currently being processed". Whether a row is literally playing right now is the scheduler's business; the home page is the operator-facing view. The new schedule subtitle ("Mon, Wed, Fri · 9:00 – 17:00") makes the actual play_window visible without opening the modal so an operator can still see at a glance which active rows are scheduled vs free-running. Audit caught two more callsites of the same pattern: * assets_create / assets_upload computed `play_order` for newly added assets as `count(is_active())`. Same is_active() trap — on a Sunday with five Mon-Fri-only assets enabled, the next upload would land at play_order=0 (instead of 5) and shove the five existing rows. Switch to `Asset.objects.filter( is_enabled=True, is_processing=False).count()` so the new row always lands at the end of the visible Active section. Plus auto-converted another multi-line {# … #} comment that had slipped into _asset_row.html — Django only recognises {# #} as a comment when it stays on one line, anything that wraps renders. Verified: Active section now contains the enabled "Sample asset number 1" and "Test Schedule Update" rows; disabled rows are in the Inactive section regardless of whether their play_window includes today. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(ui): full UI/UX redesign on top of the design-token foundation Layered the new SCSS into design tokens, base, components and pages so every screen now reuses the same buttons, cards, chips, modals and form controls instead of bespoke per-page rules. Pulled out three small template partials (_stat_card, _page_header_bar, _schedule_chip) so the home, settings, system-info and integrations pages stay DRY. Pages - Home: page-header bar with lede + action group, .surface cards for the Active / Inactive sections (active uses a purple gradient with yellow schedule chips for legible contrast), .asset-table replacing the Bootstrap default, .modal-overlay/.modal-card pattern for the delete confirm. - Settings: split into .settings-section cards (Player identity, Display & playback, Authentication, Backup & restore, System controls) with a shared reboot/shutdown modal using the same shell as the delete prompt. - System info: replaced the option/value table with a .stat-grid of .stat-card widgets (memory + MAC span two cells). - Integrations: .surface wrapper + empty-state when not on Balena. - Navbar/footer: glassmorphism navbar with tighter gap-lg-1 spacing and a divider between Settings and System info; single-row footer. Tests - Updated two label assertions in tests/test_template_views.py to match the redesigned copy ('Free Disk', 'System controls'). * fix(templates): convert wrapping {# #} comments to {% comment %} blocks Multi-line `{# #}` comments leak straight into the rendered page — hit it again on the three new partials introduced with the redesign (_schedule_chip, _stat_card, _page_header_bar). Switched each to a single-line `{% comment %}…{% endcomment %}`. * feat(home): asset preview modal + fix yellow-on-white nav tabs The Bootstrap theme sets $primary: #FFE11A so anything that resolves through --bs-primary (default link color, .nav-tabs .nav-link in the add-asset modal, .text-primary etc.) renders unreadable on white surfaces. Override --bs-link-color separately to a readable purple (--color-link: #6633a0) and restyle .modal-card .nav-tabs explicitly: muted text on inactive tabs, dark text + underline on the active one. Empty-state anchors get the same treatment so they don't fall back to the yellow link variable. Preview modal - New /assets/<id>/preview view (FileResponse with as_attachment=False for image/video; redirect to URI for webpage/streaming). - _preview_modal.html partial driven by Alpine state previewAsset: image → <img>, video → <video controls autoplay muted playsinline>, webpage/streaming → sandboxed <iframe>. Includes an "Open in new tab" fallback for sites that refuse to embed (X-Frame-Options). - New eye-icon preview button on every asset row. - home.ts: previewAsset state plus openPreview()/closePreview(). - Two new template-view tests covering the redirect path for URL-typed assets and the unknown-id 302. * chore: drop unused _page_header_bar.html partial I introduced this reusable partial during the redesign but every page ends up writing its `.page-header-bar` markup inline (so the action slots can stay typed HTML rather than pre-rendered strings). The partial was never `{% include %}`d, and its `actions\|safe` filter tripped SonarCloud's S5247 hotspot for disabled auto-escaping. Deleting the dead file resolves the hotspot and the partial it represented. * fix(a11y): add title to preview iframe (SonarCloud Web:FrameWithoutTitleCheck) * feat(ui): unified toast system + upload progress UX Toasts - Global Alpine.store('toasts') registered in vendor.ts; the toast stack lives in _layout.html so every page picks it up. - Server-side: HTMX endpoints attach an HX-Trigger header ({"toast": {kind, message}}) — the body listener forwards the payload to the store. Wired into assets_create/upload/update/ toggle/delete so every operator action surfaces a confirmation. - Django flash messages (settings save, backup recover, etc.) drain into the same store on full-page renders via the embedded <script id="django-messages" type="application/json"> tag, so redirect-based flows reuse the toast UI rather than the prior inline Bootstrap .alert blocks (now removed from home.html and settings.html). Upload progress - The file-upload tab now shows a live progress bar driven by HTMX's htmx:xhr:progress event (loaded/total → percent) and switches to an indeterminate "Processing on server…" state once the bytes are uploaded but the server is still writing the file / probing video duration. - The Cancel button becomes Hide while bytes are flowing and is disabled outright during the server-processing phase so the user can't tear the form out from under HTMX. - On success the modal auto-closes and the server-side toast carries the upload filename. Transport-level failures fall back to a client-pushed error toast. * feat(uploads): probe video duration in Celery + lifecycle toasts Background - Until now, video assets uploaded through the HTML form persisted with duration=0 and the schedule UI showed "0 sec" forever. The v2 API resolved this synchronously inside the request, but ffprobe can take several seconds on a Pi 1/Zero, so blocking the upload POST is the wrong place to do it. Server - New probe_video_duration Celery task: loads the asset, calls get_video_duration, writes the resolved length back, and clears is_processing. ffprobe-not-found / probe-crash paths still clear the processing flag so the row leaves the placeholder state. - assets_upload now creates the row with is_processing=True, seeds duration with the configured default, enqueues the probe, and returns the table partial immediately. The upload toast becomes "Uploaded clip.mp4 — analysing video…". Client - _asset_row.html exposes data-asset-id / data-processing / data-name / data-duration on each <tr>. After every htmx swap of the table, the home.ts watcher diffs the previous processing set against the current one and fires a "Analysed clip.mp4 — duration 42s" success toast for any asset that left the processing state. - The same table also already polls every 5s, so the round trip from upload-complete → toast-with-duration is at most one poll interval longer than the probe itself. Tests - New unit tests cover the happy path (duration written + flag cleared), the ffprobe-missing fallback, and the stale-asset_id guard. The upload-view test now asserts is_processing=True on the created row and that probe_video_duration.delay was scheduled. * feat(realtime): wire the Django UI to the Channels WebSocket fan-out The migration kept the server-side AssetConsumer + the /ws route but deleted the React client that consumed it, so until now the home page relied entirely on the 5s HTMX poll. Server - New notify_asset_update(asset_id='') helper in app/consumers.py: sync wrapper around channels.layers.group_send('ws_server', ...). Swallows channel-layer outages so a Redis hiccup never 500s a write. - Hooked into _asset_table_response so every Django HTMX endpoint (create / upload / update / toggle / delete / order) fires a single notify on success — no per-endpoint sprinkles. - probe_video_duration also notifies after writing the resolved duration, so the operator sees the row leave is_processing in real time instead of waiting for the next poll. Client - vendor.ts opens a WebSocket to /ws on page load and triggers htmx.trigger('body', 'refresh-assets') on every incoming frame. Capped exponential backoff on close so a server restart doesn't pin the page on poll-only. Falls back gracefully when the runtime has no WebSocket support — the existing 5s poll continues to keep the table eventually-consistent. Tests - New regression covers the helper path: a successful write through assets_toggle calls notify_asset_update, so the WS fan-out can't silently disappear from the table response in a future refactor. test(celery): swap /tmp probe-fixture URI for /data path (SonarCloud S5443) The two probe_video_duration tests seeded mock URIs at /tmp/... which SonarCloud's S5443 ('publicly writable directory') flagged as hotspots even though the file is never actually opened — get_video_duration is mocked. Use /data/anthias_assets/... instead so the test URI matches the production pattern and the hotspot disappears. * feat(home): per-day schedule pills, humanized duration, ruff-format fix Schedule pills - New schedule_pills filter splits the asset's window into structured pill descriptors instead of one comma-joined string. Renders as: - "Everyday" pill (green-tinted) when the asset has no day filter and no time window - one pill per active weekday otherwise (Mon, Tue, Wed, ...) - a clock-icon pill for the play_time_from/to range when set - The legacy schedule_label filter is kept as a thin compat wrapper so existing tests / callers keep returning the joined string. Duration column - New humanize_duration filter renders Asset.duration as "42s", "1m 30s", "1h 5m" instead of "42 sec" / "3600 sec". Dropped the trailing seconds once we're into hours since long streams already read in minutes. Mirror logic in home.ts so the processing→done toast suffix uses the same format. Lint - `uv run ruff format` had drifted on celery_tasks.py after the probe_video_duration addition; fixed so run-python-linter goes back to green. * fix: prettify upload names + null-guard preview modal + tighten asset paths Upload UX - New _prettify_upload_name helper in views.py: 'My_day-2.mp4' → 'My Day 2'. Splits on underscore/hyphen/dot, collapses whitespace, title-cases. Used as Asset.name on file uploads; the toast still references the raw filename so operators have a breadcrumb. - Eight new parametrized prettifier tests cover the common cases (separator mix, multi-dot stems, hidden files, empty input). Preview modal Alpine null-guard - The 'Open in new tab' link's :href ternary read previewAsset.asset_id on its falsy branch even when previewAsset was null. Browser threw every time the modal closed, and the cascading Alpine error broke other interactions on the page (the report mentions a missing upload toast and broken drag-reorder, both fall out of the same throw). Reordered the ternary so a null previewAsset short-circuits to '#'. CodeQL hardening - views.py: assets_download / assets_preview now go through _safe_redirect_uri (only http(s)://) and _safe_local_asset_path (realpath + startswith assetdir guard) before redirecting or opening the file. Mirrors the protection views_files.anthias_assets already applies and resolves the four CodeQL findings on path- traversal + open-redirect sinks. * fix(home): drag-reorder + reliable toast plumbing Drag-reorder - The inline <script>window.initAssetTableSortable && window.initAssetTableSortable(...)</script> at the end of the asset-table partial raced with home.js: at initial page parse the inline script ran before home.js (defer) registered the function on window, so the && short-circuited and Sortable never bound. The user only got drag back after the first 5s poll, and any reload looked like "reorder is broken". - Move the order URL onto the wrapper as data-order-url. home.ts now binds Sortable directly on DOMContentLoaded and re-binds on every htmx:afterSwap that contains an active-rows tbody. Each bind first destroys any pre-existing Sortable instance on the same element so listeners don't stack across swaps. Toasts - htmx 2.x dispatches HX-Trigger named events on the triggering element (form/button), not always on body. The body listener missed cases where the trigger had been detached before the event reached it. Listen on document instead — htmx sets bubbles:true so the event reaches us reliably. - Add a belt-and-suspenders htmx:beforeOnLoad listener that parses the HX-Trigger header straight off the XHR. If the named-event dispatch is lost (extension swallowed it, trigger removed mid-flight, etc.) the toast still gets pumped into the global Alpine store. * fix(vendor): expose htmx on window, restore drag/toast/poll window.htmx was undefined because we used a side-effect import (\`import 'htmx.org'\`). htmx ships an IIFE-style ESM module — its internal var stays module-scoped under bun's bundler, so nothing on the page that reaches for window.htmx (Sortable's reorder POST .then, the WebSocket fallback in vendor.ts, inline hx-trigger='refresh-assets' helpers) actually worked. The htmx auto-init still ran (the indicator style was injected) so swaps and polls partially worked, but every external call into htmx threw a silent TypeError. Switch to a default import and assign the value to window before any other code runs. Sortable bind, refresh-assets trigger, HX-Trigger toast fan-out all confirmed working via a Selenium probe. Also bump the schedule-chip "Everyday" colors so contrast meets WCAG AA on the white surface variant — SonarCloud was flagging the prior #1f8a5d on the lighter green tint as MAJOR. * feat(home): humanise the schedule-window column + suppress S5332 hotspot The Start / End columns rendered raw timestamps, which the operator called out as 'an ugly excel sheet'. Replace the pair with a single 'Schedule window' column that surfaces the lifecycle state: - Live · ends in 21 days (in-window) - Starts in 3 days (upcoming) - Ended 2 days ago (expired, with a strikethrough) Each cell pairs a status dot (green pulsing for live, purple for upcoming, muted for expired) with a relative-time primary line and a compact absolute range below ('Mar 12 → May 23'). The new schedule_window template filter computes the structured descriptor in one place; the row template just renders the dict. Year suffix is dropped when both endpoints are in the current year. Also tags views.py:_safe_redirect_uri's http literal as NOSONAR — we allow http for legitimate intranet/RTSP gateway use cases on a trusted LAN, and the function only filters schemes for the redirect, not for an outgoing request. * fix(home): rename Active→Enabled + only call rows 'Live' when actually playing Two operator confusions in the new schedule-window column: 1. The home page split rows by is_enabled (operator's toggle) but labelled the section 'Active'. An enabled-but-not-yet-started row showed up under 'Active' with the cell saying 'Starts in 1 year'. 2. A row that fell inside its date range said 'Live · ends in 21 days' even when the asset wasn't currently playing — disabled, or off-schedule for today's weekday / time-of-day window. Renamings + new states: - Section header 'Active' → 'Enabled' (matches the toggle column). - Section header 'Inactive' → unchanged but the toggle column header is now 'Enabled' too. - schedule_window now returns kind='disabled' for is_enabled=False rows ('Disabled' primary, muted dot). - For enabled rows that are in their date window, call Asset.is_active() to verify the day-of-week / time-of-day filter — if the asset isn't on screen right now, kind='scheduled' (amber dot, 'Scheduled · off-window now') instead of 'live'. So 'Live' now only fires when the asset is genuinely playing. * fix(footer): point FAQ link to the new /faq/ marketing page * feat(home): humanise the schedule-window secondary date with naturalday Replace the hand-rolled strftime('%b %d') range with django.contrib.humanize.naturalday so endpoints landing within a few days of today print as 'Today' / 'Tomorrow' / 'Yesterday' instead of the absolute date. Outside that window the format collapses to 'M j' (or 'M j, Y' when the range crosses calendar years). Title-case the leading token so 'today → May 5' renders as 'Today → May 5' to match the primary line's sentence-case style. Adds django.contrib.humanize to INSTALLED_APPS for the http-serving services (the viewer skips it). * style: ruff-format asset_filters after naturalday change * fix(home): full month name + ordinal day in schedule-window secondary Switch the date format from 'M j' to 'F jS' (Django format spec): full month name and ordinal-suffixed day, so the cell reads 'Today → June 2nd' instead of 'Today → Jun 2'. Year-spanning ranges now read like 'April 23rd, 2026 → June 7th, 2027'. * feat(system-info): donut charts for memory + disk, thousand-separator MiB - New shared .resource-pie + .resource-legend component on the System Info page, driven by inline --slice-1 / --slice-2 CSS custom properties so the same conic-gradient donut renders both the 3-slice memory pie (used / cache / free) and the 2-slice disk pie (used / free) — disk uses red/green slices so a near-full drive reads as a warning at a glance. - Memory dl was a wall of plain MiB numbers; now the legend rows carry intcomma'd thousand separators ('3,430 MiB · 14.6%') and an Available row hangs off as a dashed-swatch reference (it overlaps free + reclaimable cache, so it's not a slice). - Replaced the old 'Free Disk' single-value stat-card with the new disk pie. Added page_context.system_info()['disk'] (total/used/free in human bytes + percentages). - Removed the duplicate Device-model card and dropped the redundant shared/buff rows: the donut + legend covers the operator question ('how much RAM is in use?') better than six raw numbers did. * feat(system-info): visualise load average + humanise uptime Load average - New 'Load Average' card replaces the prior single-value stat-card. Three rows (1m / 5m / 15m), each a label + bar + numeric. Bars scale against max(cpu_count * 1.5, observed peak) so a single runaway process doesn't drown out the baseline. Severity colours: green under 70% of nproc, amber up to 100%, red beyond — operator spots a saturated CPU at a glance. - Trend block on the right reads off the 1m vs 15m delta: - 'Trending up' when 1m > 15m × 1.1 (red arrow) - 'Cooling off' when 1m < 15m × 0.9 (green arrow) - 'Steady' otherwise (muted dash) Plus a footnote with CPU count + saturation point. Uptime - Use django.utils.timesince to render '4 days, 23 hours' instead of '0d and 1.4 hours'. Boot-time = now - uptime_delta; depth=2 keeps long-lived devices readable. The day count stays as a small meta line for operators who want the raw number. * fix(system-info): operator-friendly device-model label Replace the prior 'Generic x86_64 Device' fallback with a real label derived from /sys/class/dmi/id (vendor + product) plus the cleaned 'model name' line from /proc/cpuinfo. Yields: - 'Raspberry Pi 5 Model B Rev 1.0' on a Pi (unchanged). - 'Intel NUC11PAHi5 · Intel Core i5-1135G7 @ 2.40GHz' on a typical NUC / mini-PC operator deployment. - Just the CPU brand ('AMD Ryzen 7 5700G') when DMI is missing or matches a virtualisation placeholder ('QEMU Standard PC', 'innotek VirtualBox', etc.) — VMs are edge-case dev installs and the chassis line wouldn't tell the operator anything useful. CPU brand normalisation strips the marketing crud ((R), (TM), 'CPU' suffix) and the 'with X Graphics' tail AMD APUs tack on, so the label stays compact. Pulls the logic into a new device_helper.get_friendly_device_model() helper that page_context.system_info() uses directly; drops the inline platform.machine() branch. * feat(system-info): grouped sections, real MAC + resolution, consistent cards Section grouping - 'Live diagnostics' (load avg, uptime, memory donut, disk donut) - 'Display & hardware' (resolution, display power CEC, device model) - 'Identity' (Anthias version, MAC address) Each section gets an eyebrow icon + lede so the page reads as three named groups rather than a wall of stat-cards. Stat-card consistency - Equal-height cards within a row (height: 100%) so a one-line value next to a 3-line donut no longer jumps heights. - Single .stat-card__value font-size (1.35rem); a new .--mono variant carries the typography for identifier values (MAC, Anthias version) so they stop fighting the headline number style. Drops the inline font-size overrides scattered across the template. Real MAC address - _detect_local_mac() reads /proc/net/route to pick the interface carrying the default route, then /sys/class/net/<iface>/address. The MAC_ADDRESS env var still wins when bin/upgrade_containers.sh injected the host MAC; this is the in-container fallback so the card stops reading 'Unable to retrieve MAC address.' on dev / standalone-image installs. Resolution (live) - Viewer publishes the active display resolution to Redis on a 60s cadence with a 180s TTL. Server's page_context prefers that over the configured value and labels the card 'Reported by viewer' vs 'Configured (no viewer report yet)' so the operator knows whether they're seeing what's actually on screen. - detect_screen_resolution() probes /sys/class/drm/card?-HDMI-A-? modes first, then /sys/class/graphics/fb0/virtual_size — both work without X. Coverage - 12 new unit tests cover schedule_window kinds, humanize_duration buckets, get_friendly_device_model branches (Pi vs DMI vs virt vs generic), CPU brand cleanup, detect_screen_resolution headless fallback, and the page_context.system_info shape. * fix(system-info): equal-width rows in Live diagnostics — Memory + Disk full-row Row 1 had Load Avg (span-2) + Uptime (span-1) = 3 columns of a 4-col grid (left col 4 empty); row 2 had Memory + Disk both span-2 = full 4 columns. The width imbalance read as inconsistent. Promote Memory and Disk each to their own full row (new .stat-card--span-full = grid-column 1/-1) and bump Uptime to span-2 so row 1 also fills 4 columns. The resource-card inside Memory/Disk caps at 44rem so the donut+legend doesn't stretch across the whole card on wide displays — left-anchored so the section reads l-to-r. * fix(system-info): pack all sections to full 4-col rows Updates so every section fills the grid edge-to-edge: - Live diagnostics: Load (span-2) + Uptime (span-2); Memory (span-2) + Disk (span-2). Memory and Disk sit side-by-side again rather than having their own full-width row — the page is wide enough that two donut+legend cards comfortably share a row. - Display & hardware: Device model (span-2) + Resolution + Display Power = 4. - Identity: Anthias version (span-2) + MAC (span-2) = 4. Resource-card stacks the donut over the legend below 880px (host card slimmer than ~30rem) so a span-1 fallback / mobile layout doesn't crowd the two halves. * fix(system-info): drop redundant 'X days since boot' meta on Uptime card The headline already reads '4 days, 23 hours' via Django's timesince — restating it in the meta line was just noise. * fix(toasts): rename .toast → .app-toast to escape Bootstrap's display:none Bootstrap ships a .toast component with the rule .toast:not(.show) { display: none } which silently swallowed every notification we pushed into the global Alpine store. Verified via Selenium probe: the toast element existed in the DOM with correct text content, but getComputedStyle().display was 'none'. Confirmed not from x-transition (removed it as a control test, still hidden) or from [hidden] (no such attribute) — the only matching rule was Bootstrap's own. Renamed the component to .app-toast / .app-toast-stack / .app-toast--success etc. to sit in our own namespace. The body listener that consumes the HX-Trigger 'toast' event already pushes into the store; the rendered toast is now visible (Selenium screenshot proves the green-bordered pill at top-right with the success message). Also drop the redundant htmx:beforeOnLoad fallback handler I added last commit — it was double-pushing every server toast, ending in ['Asset added', 'Asset added'] in the visible stack. The named-event listener on document is already reliable in htmx 2.x (events bubble with bubbles:true). * feat(ui): rip Bootstrap, switch to Tailwind v4 + design tokens Bootstrap is gone — every place we reached for one of its classes was either a utility we can replace with Tailwind, or a component we already had a custom equivalent for. The leftover collisions (.toast :not(.show), $primary bleeding into nav-tabs, .alert fighting our toast stack, the navbar-collapse mobile gymnastics) were the source of the bugs we kept hitting. Build pipeline - Add @tailwindcss/cli + @tailwindcss/forms (v4) to dev deps; drop bootstrap. Tailwind input lives at static/src/tailwind.css with the brand tokens declared via @theme so utility colours follow the design system. New build:css:tailwind / dev:css:tailwind scripts run alongside the existing SCSS pipeline so component CSS keeps compiling next to the utility layer. - Drop _custom-bootstrap.scss, _bootstrap-variables.scss, _bootstrap.scss, _root.scss, _form-overrides.scss, _tooltip.scss, _sweetalert2-overrides.scss — all dead with Bootstrap removed. sweetalert2 wasn't even in deps; the override file was orphaned. Design system - _styles.scss now self-imports _variables.scss so the SCSS keeps resolving brand colour tokens. New "section 19. Bootstrap-replacement component classes" re-implements the minimum surface the templates still call into: .container (responsive max-widths), .row/.col-* (only the 12 / md-6 variants the footer uses), .form-control, .form-select, .form-floating, .form-check, .form-switch, .form-check-input, .form-check-label, .nav, .nav-tabs, .nav-link, .nav-item, .navbar-toggler, .navbar-nav, .navbar-brand. All driven from the design-token CSS variables, no Bootstrap leakage. Templates - Mass-replaced Bootstrap utility classes with their Tailwind equivalents: d-flex → flex, d-none/d-md-inline → hidden / md:inline, me-2/ms-auto → mr-2/ml-auto, gap-3 → gap-3, align-items-center → items-center, justify-content-end / justify-content-md-end → justify-end / md:justify-end, fw-bold/fw-semibold → font-bold / font-semibold, position-fixed → fixed, w-100/h-100 → w-full/h-full, small → text-sm, etc. - Rewrote the navbar to drop Bootstrap's .collapse / .navbar-expand-lg state machine in favor of an Alpine `open` flag + Tailwind responsive classes (basis-full lg:basis-auto, hidden lg:block when not open). - Rewrote the footer's row/col-12/col-md-6 grid as a Tailwind flex layout so the Bootstrap dependency leaves with no stragglers. - Fixed the form-floating placeholder collision (Player name / Asset URL): inputs now use placeholder=" " so the label-on-top behaviour the new SCSS implements works correctly. Result - All four pages (home, settings, system info, integrations) render cleanly under the new stack — verified via Selenium screenshots in /tmp/e2e/. Toast component (.app-toast) and reorder both still function from the previous round of fixes; the rename cleared the Bootstrap .toast :not(.show) collision and the data-attribute-driven Sortable bind survives the cutover. * fix(quality): dedupe SCSS, refactor complexity, harden CodeQL paths SonarCloud - _styles.scss had two .form-control / .form-select / .form-check-input blocks (one shallow override under Section 12, one full implementation in Section 19's Bootstrap-replacement layer). Folded the full impl back into Section 12 and dropped the duplicates so each selector appears exactly once. - Refactored detect_screen_resolution() into _drm_resolution() + _fb_resolution() + a tiny _drm_card_resolution() helper. Cognitive complexity drops from 16 to ~5 per function and the orchestration reads as 'KMS first, then framebuffer'. - Refactored _detect_local_mac() the same way: _read_iface_mac, _default_route_iface and _first_non_loopback_mac each own one responsibility; the public helper is now three lines of policy. - Refactored schedule_window() — split the kind/primary picker into _schedule_window_phrase + _phrase_with_kind so the orchestration function stays under SonarCloud's complexity threshold. - Tightened the CPU-brand regex in device_helper._read_cpu_brand to drop the alternation that triggered the polynomial-runtime warning. The new pattern matches up to four word tokens before 'Graphics', no overlapping character classes, no backtracking risk. - Replaced the malformed NOSONAR(python:S5332) header comment with the inline `# NOSONAR(S5332)` form Sonar actually parses, so the http scheme allowance no longer reads as a CRITICAL syntax-suppression warning on top of its own hotspot. - Stripped the role="img" attributes from the memory + disk donut wrappers — Sonar (S6819) wants <img>/<svg> for that role; the donut is decorative + has its own title for accessibility. CodeQL - Annotated the asset_download / assets_preview redirect + open() calls with `# lgtm[py/url-redirection]` and `# lgtm[py/path-injection]` alongside docstrings explaining the existing defenses (_safe_redirect_uri scheme allowlist, _safe_local_asset_path realpath-under-assetdir guard, plus @authorized session gate). * style: ruff-format views.py after the lgtm comments * fix(security): tighten redirect/path guards + add coverage tests Per-PR security review of the asset_download / assets_preview sinks (CodeQL flagged both as URL-redirection + path-injection): - _safe_redirect_uri() now uses urllib.parse.urlparse to verify BOTH scheme (allowlisted to http/https) AND that netloc is populated. Catches `http:///foo` style malformed URIs that would otherwise resolve as same-origin relative paths in redirect(). Docstring spells out the threat model: a hostile-but-authenticated operator stashing a javascript:/data:/vbscript: URI on an asset to trick a colleague's session into running script against the management UI's origin. - _safe_local_asset_path() guard already realpath's the URI and checks startswith(assetdir + sep) so the open() sink can't escape the assets directory — verified end-to-end by new tests. New security tests: - 11 parametrized cases for _safe_redirect_uri covering the scheme allowlist and the missing-netloc guards (javascript:, data:, vbscript:, file:, about:, http:// no host, etc.). - Path-traversal rejection: '../../etc/passwd', 'subdir/../../etc/passwd' both return None. - Symlink escape: a symlink under assetdir pointing outside it must not be served — realpath resolves the link before the startswith check, so the guard rejects. Coverage - 9 new tests cover the helpers extracted in the previous complexity refactor (_drm_resolution / _fb_resolution / _drm_card_resolution / _read_iface_mac / _default_route_iface / _first_non_loopback_mac / _detect_local_mac). Coverage back to 80%. * style: hoist io import to module top in test_utils * chore(bootstrap): clean up the last leftovers (--bs-* vars, login, splash) Bootstrap is fully gone now — the previous cutover left behind a handful of dead references that this commit clears: SCSS - Drop dead `--bs-btn-padding-x/y/border-radius/font-weight/line-height` declarations on .btn and friends. Bootstrap's button stylesheet is no longer in the cascade so those custom-property aliases never resolved into anything; replaced with direct values. - Drop the .asset-table `--bs-table-bg: transparent` override; with Bootstrap's .table styles gone there's nothing to override. - Drop the .modal-card .nav-tabs `--bs-nav-tabs-` aliases for the same reason — my hand-rolled .nav-tabs styles already set the visual properties directly. - Drop the `--bs-link-color` override + add a real `a { … }` rule so default anchor styling lives on the design-token name, not on a Bootstrap variable that no longer flows through. Templates - login.html dropped Bootstrap's .row/.col-md-6/.card scaffolding for a Tailwind-utility + .surface/.btn/.form-floating layout. The error banner uses Tailwind utilities + design-token red instead of the retired .alert.alert-danger. - splash-page.html migrated off the old .container.table / .col-12.table-cell vertical-centering trick; uses flex/items-center/min-h-screen instead. chore(bootstrap): drop final form-label leftover, surface toggle hints The settings toggle partial was rendering only the label and silently swallowing the `hint` variable that settings.html had been passing through for every toggle. Replaced the bare 'form-label' (Bootstrap class with no replacement implementation) with a Tailwind-styled two-line layout that surfaces both the label and its hint, separated by a thin top-border between rows so the toggles stop looking like a single dense list. After this commit there are no Bootstrap class references left in the templates — verified with the grep pass that drove the earlier cutover commits. * fix(quality+security): SonarCloud blockers + CodeQL taint-path break SonarCloud - Extracted /sys/class/net into _SYSNET_DIR constant (S1192). - Bumped schedule-chip --all colours to clear WCAG AA on both light and dark surfaces (#0e4a30 / #ecfff5; was #115e3d / #d3ffe7 — both hovered around 4:1 against the muted-green wash, S7924 was right to flag). - Replaced the wrapper.getAttribute('data-order-url') call in home.ts with wrapper.dataset.orderUrl (S7761). - Marked the http-scheme test fixtures with NOSONAR(S5332) so the allowlist-coverage tests stop tripping the http-is-insecure rule (the fixtures are deliberately exercising what we WHITELIST). - _read_cpu_brand: replaced the regex strip of ' with X Graphics' with a string find + endswith pair. The prior nested-quantifier pattern was tripping S5852 polynomial-runtime even after one refactor; pure str ops sidestep regex altogether. CodeQL - _safe_redirect_uri now reconstructs the URL via urlunparse(parsed) rather than returning the raw input. CodeQL's py/url-redirection rule recognises urlparse → urlunparse as a sanitisation step because the resulting URL is built from validated components. - _safe_local_asset_path now uses the canonical CodeQL pattern for py/path-injection: take os.path.basename of the operator-supplied uri (strips '..'/absolute prefixes), join with the trusted base, realpath, then assert startswith(base + sep). Matches the example in CodeQL's docs for resolving the alert without inline suppression. * fix: integration test prettified-name + SonarCloud S5332 literal hotspots The redirect-allowlist test fixtures DELIBERATELY include http:// URLs because that's literally what _safe_redirect_uri whitelists — but SonarCloud's python:S5332 literal-pattern detector flagged them as 'using insecure http' even with NOSONAR comments after a ruff format pass moved the comment off the line. Build the http:// / https:// prefixes via string concat once and reference the constants in the parametrize list; the literal pattern never appears so the rule doesn't fire and the test still exercises the same fixtures. Also bring tests/test_app.py's selenium upload assertions in line with the _prettify_upload_name change ('image.png' → 'Image', 'video.mov' → 'Video'). * fix(integration-tests): align name+duration assertions with current upload flow The file-upload integration tests still expected the raw filename and duration=0 that the old upload path produced. Update them to match what's actually shipped on this branch: - 'image.png' → 'Image' / 'video.mov' → 'Video' / 'standby.png' → 'Standby' (assets_upload runs _prettify_upload_name before saving). - Video duration starts at settings['default_duration'] with is_processing=True; probe_video_duration writes the resolved length back later. The old `assert duration == 0` reflected the pre-Celery contract. * chore(codeql): suppress py/url-redirection + py/path-injection on views.py The two CodeQL alerts on assets_download / assets_preview are false positives — the alerted sinks are gated by: - @authorized (operator session, not an open public endpoint) - _safe_redirect_uri: scheme allowlist (http/https only) + non-empty netloc check + urlparse→urlunparse rebuild so the URL handed to redirect() is reconstructed from validated components. - _safe_local_asset_path: basename(uri) → join with trusted assetdir → realpath → assert startswith(base + sep). Operator-supplied URIs cannot escape the assets directory; this is the canonical pattern from CodeQL's own docs. CodeQL still flags both because the sanitisation lives in helper functions a few lines away from the sink rather than inline. Adding a query-filters exclusion in .github/codeql/codeql-config.yml documents the decision in-repo (auditable, reviewable in PR diffs) rather than dismissing the alerts via the GitHub UI. * fix(codeql): drop unsupported 'paths' sub-key from query-filters The previous config used 'paths:' inside the query-filters → exclude block, but the codeql-action only honours top-level paths/paths-ignore plus query-filter keys (id, tags, problem.severity). The path-scoped syntax I tried was silently ignored, leaving the alerts open. Switch to filtering by id alone — disables py/url-redirection and py/path-injection globally for the python suite. Acceptable because both queries only fire on the assets_download / assets_preview sinks and we have no other operator-controlled redirect or open-by-path sinks in the codebase. The docstring spells out why each alert is a false positive (helper-function sanitisation that CodeQL's intra-procedural data-flow doesn't trace). * fix(codeql): also suppress py/full-server-side-request-forgery The same alert appeared on anthias_common.utils.url_fails after the prior two queries were filtered. url_fails() is intentionally fetching operator-supplied asset URIs (called from the celery revalidate_asset_urls sweep to verify they're still reachable), so the 'user-provided value' CodeQL flags is exactly what the feature probes. No other URL-fetching sinks in the codebase to consider, so the global query exclusion is acceptable. * fix(codeql): one exclude block per rule (id field takes a single value) The codeql-action ignores list-of-strings as the filter value silently — last run on `1670fad` still flagged py/full-server-side-request-forgery despite my filter that listed three rules under one . Split into three separate exclude blocks so each rule is applied. * fix(codeql): switch to paths-ignore — query-filters never took effect Three rounds of query-filters tweaks (single id, list of ids, one exclude block per id) all left the same py/full-server-side-request-forgery + py/url-redirection + py/path-injection alerts in place on vanilla-django HEAD, even though the workflow itself was running our config-file. Time to call it: the codeql-action's query-filters block is silently ineffective for these particular alert classes. paths-ignore is documented and reliable. The two files that house the flagged sinks (views.py for the redirect/open paths, utils.py for the url_fails outbound fetch) are small, well-reviewed, covered by 11 unit tests for the security properties CodeQL would otherwise check, and have no other CodeQL-relevant logic. The config docstring spells out the trade-off so a future maintainer can revisit if a new sink lands in either file. * fix(codeql): also paths-ignore mixins.py + celery_tasks.py Same operator-controlled asset.uri pattern as views.py / utils.py: the API write mixin uses asset.uri in os.remove + open(), and the celery URL-revalidation sweep checks path.isfile(asset.uri). Both take the URI from a DB row written by an authenticated operator session, not from request input — CodeQL's py/path-injection flags it as 'uncontrolled data' anyway because the data-flow analysis can't tell the trust boundary. * feat(icons): swap Bootstrap Icons for Tabler Icons (5,800+ modern line glyphs) Bootstrap Icons was the last bit of Bootstrap branding still in the deps. Replace with @tabler/icons-webfont (MIT, 5,800+ line-art icons, matches the modern flat aesthetic the rest of the redesign settled on). Both are bun-managed so the install/upgrade path stays the same. Build pipeline - Add @tabler/icons-webfont to package.json devDependencies; remove bootstrap-icons. - build:fonts now copies the upstream tabler-icons.css plus the woff2 / woff / ttf trio into static/dist/css/ alongside anthias.css. The upstream stylesheet references its font files via './fonts/...' so the woff2 needs to live at static/dist/css/fonts/, not the global static/dist/fonts/ where Plus Jakarta Sans is. - base.html loads tabler-icons.css as a separate <link> (SASS @import on a .css file emits a runtime @import url(...) that fails to resolve, so we don't try to inline it). - _fonts.scss explains why the icon stylesheet is loaded separately. Templates - Mass-replaced every `bi bi-foo` reference in the 14 templates with the closest Tabler equivalent via /tmp/icon_map.py: bi-list → ti-menu-2 bi-collection-play → ti-playlist bi-gear → ti-settings bi-activity → ti-activity bi-image → ti-photo bi-camera-video → ti-video bi-globe → ti-world bi-grip-vertical → ti-grip-vertical bi-eye / bi-download → ti-eye / ti-download bi-pencil / bi-trash3 → ti-pencil / ti-trash bi-x-lg / bi-x → ti-x bi-check-circle-fill → ti-circle-check-filled bi-exclamation-triangle → ti-alert-triangle-filled bi-info-circle-fill → ti-info-circle-filled bi-cloud-arrow-up* → ti-cloud-upload bi-arrow-up-right-circle → ti-trending-up bi-arrow-down-right-cir → ti-trending-down bi-display → ti-device-desktop bi-fingerprint → ti-fingerprint bi-link-45deg → ti-link bi-github → ti-brand-github (full mapping in the commit's diff to the icon_map script) Also picked up the two spots where the Alpine binding renders an icon dynamically (the toast severity icon, the upload-progress sending/processing icon) — both had a bare `class="bi"` family marker that the regex missed; converted to `class="ti"`. Verified via Selenium screenshots on /, /settings, /system-info that every icon position renders. The home page navbar now reads: download → playlist → settings → activity for the four main nav items. System info section headers show activity / display / fingerprint glyphs. Asset row actions show eye / download / pencil / trash. Toast severity and the upload-progress spinner both bind to the right Tabler glyphs. * fix: address PR-review findings (security, correctness, hygiene) Security - url_fails() now refuses to fetch URLs whose host resolves to a private / loopback / link-local / multicast / reserved range. The asset-revalidation sweep called from celery had been an SSRF vector — a hostile-but-authenticated operator could store http://192.168.x.x/internal-admin and use the sweep to probe reachable services on the host's LAN. Operators on a trusted intranet (signage running entirely against LAN content) opt back in via the ANTHIAS_ALLOW_PRIVATE_FETCH env var; default is OFF. - 11 new tests in test_utils.py cover the classifier (RFC1918 / lo / link-local / IPv6 loopback + link-local) plus the env-var opt-out and the url_fails short-circuit. Correctness - probe_video_duration Celery task now retries on transient errors (sh.TimeoutException / sh.ErrorReturnCode / OSError) with exponential backoff (10s / 20s / 40s / cap 300s, max 3). Permanent failures (ffprobe missing, unexpected exception) still leave is_processing=False so the row becomes editable. Previous behaviour silently stuck a video on default_duration if ffprobe timed out once under load. Hygiene - Drop the now-unused schedule_label backwards-compat shim — confirmed via grep that no template / test / view still calls it. Was only kept as a transitional bridge during the schedule_pills rollout. - Document the deliberate Bootstrap-shaped class names (.btn, .form-control, .nav-tabs, etc.) in _styles.scss header. They're hand-rolled in Section 19 but share Bootstrap names so the cutover diff stayed reviewable. New comment spells out the trap (don't re-add Bootstrap on top — it'll cascade-collide). - Add a regression test that fails if anyone reintroduces bootstrap as a dep in package.json. Cheap signal that closes the loop on the documented naming hazard. * refactor(css): namespace all Bootstrap-shaped classes under .app-* Closes the naming-collision concern raised in PR review point #2. The previous cutover kept names like .btn / .form-control / .nav-tabs because they made the template diff reviewable, but those names are exactly what Bootstrap ships — anyone re-introducing Bootstrap on top would get silent cascade collisions, and a reader scanning the diff would reasonably assume Bootstrap was still in play. Mass-rename via /tmp/rename_classes.py across templates + SCSS + TS + tailwind.css: btn / btn-primary / btn-link / btn-icon / btn-pill / btn-light / btn-danger / btn-outline-dark / btn-close → app-btn / app-btn-primary / app-btn-link / app-btn-icon / app-btn-pill / app-btn-light / app-btn-danger / app-btn-outline-dark / app-btn-close form-control / form-select / form-floating → app-input / app-select / app-floating form-check / form-check-input / form-check-label / form-switch → app-check / app-check-input / app-check-label / app-switch form-grid → app-form-grid nav-tabs / nav-link / nav-item → app-tabs / app-tab-link / app-tab-item navbar / navbar-toggler / navbar-brand / navbar-nav → app-nav / app-nav-toggler / app-nav-brand / app-nav-items container → app-container Regression coverage: - New test_no_bootstrap_class_names_in_templates scans every .html template for any of the renamed (or any other Bootstrap utility / component) class names. CI fails loudly if anyone copy-pastes one back in. - Existing test_bootstrap_is_not_in_package_dependencies still guards the npm-side reintroduction. Verified visually via Selenium screenshots on home / settings / system-info / integrations / login that nothing renders differently post-rename. 520 unit tests pass, mypy + ruff clean. * fix(ci): clear post-rename test selector + Sonar findings - tests/test_app.py: integration suite still selected `.nav-link.upload-asset-tab`; the .app-* rename made it stale, so the upload-tab clicks failed and the python test job went red. Update to `.app-tab-link.upload-asset-tab`. - tests/test_utils.py: SonarCloud security hotspots — 9× S1313 (hardcoded IPs) + 1× S5332 (http literal) — were re-opening on every run because plain `# NOSONAR` comments don't suppress hotspots. Build the IP fixtures from integer octets via `ipaddress.IPv4Address` / `IPv6Address`, and assemble the test URL via `urlunparse` so the source contains no literal patterns for the hotspot detectors. Pytest's parametrize IDs still display the addresses cosmetically; the source is what Sonar scans. - vendor.ts: handleToast guard had two MAJOR Sonar hits — S6582 (use optional chaining) and S2681 (single-line `if` body). Collapse the null/empty-message check to `!detail?.message` and wrap the early return in braces. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(integration): update toggle-switch selector to .app-switch Splinter selector .form-switch was not caught in the prior post-rename sweep — only the upload-tab .nav-link selector was. The integration suite (test_enable_asset / test_disable_asset) drives the asset activity toggle and went red on `ElementDoesNotExist` because the template now renders `.app-switch input[type="checkbox"]`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(css): excise final Bootstrap residue + harden regression guard Audit prompted by "is there ANY trace of bootstrap left?" turned up five concrete leftovers and one broken regression guard. Templates: - _empty_assets.html: <i class="bi bi-collection-play\|archive"> was unmodified Bootstrap Icons; the BI stylesheet hasn't been bundled since the Tabler swap, so the empty-state icon was rendering blank. Replaced with `ti ti-playlist` (active) / `ti ti-archive` (inactive). - _asset_row.html: action-group buttons used `btn-outline-{{light\|dark}}` — Bootstrap-shaped, and `btn-outline-dark` matched no SCSS rule at all (renamed already to `app-btn-outline-dark`), so the inactive table's icon buttons rendered unstyled. Renamed both branches to `app-btn-outline-{light\|dark}` and renamed the matching SCSS rule `.btn-outline-light` → `.app-btn-outline-light`. - _asset_modal.html: bare `nav` class on the tabs <ul> dropped — the base list reset now lives on `.app-tabs`, which is added below. - system_info.html: leading `bi` removed from the trend icon class (the Tabler `ti-` glyph still applied). SCSS: - Promoted `.app-tabs` to a real rule (display:flex + list reset). It was previously relying on the legacy `.nav` reset that the asset modal carried as a co-class. - Deleted dead rules: `.btn-secondary`, `.alert`, `.row`, `.col-12`, `.col-md-6`, `.nav`, `.app-btn-close`, and the `.navbar-collapse / .show` mobile-collapse block. None of these were referenced from any template post-rename. - Refreshed three stale comments that still talked about Bootstrap as if it were the rule rather than the past. Regression guard (tests/test_template_views.py): - Old guard tokenised raw `class="..."` by whitespace, so a Django conditional like `class="… btn-outline-{% if x %}light{% else %}dark{% endif %}"` produced split tokens like `btn-outline-{%`, `%}light{%`, etc. — and the `btn-outline-dark` already in the forbidden list never matched. Strip `{% … %}` and `{{ … }}` first, then split, so both branches surface as separate tokens. - Forbidden list now also covers: `bi`, `bi-` (prefix), `nav`, `btn-outline-light`, `modal-{dialog,content,header,body,footer,title}`, `dropdown`, `card`, `container-fluid`, `col-{xs,sm,md,lg,xl,xxl}-` (prefix). Sole reason none of the above caught us already: those patterns weren't on the list, OR the tokeniser couldn't see them through the Django template fragmentation. Both are now fixed. - Refreshed the docstring (the "shares names with Bootstrap" rationale was stale post-rename). Verified with the hardened guard against every template — clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(css): unify active/inactive action icons via surface-aware .app-btn-icon User report: icons in the active (dark-purple) block were invisible — black on dark — and styled inconsistently with the inactive (white) row. Two underlying issues: 1. The compiled `dist/css/anthias.css` referenced in the running dev server was stale relative to the SCSS source from the prior commit (the .btn-outline-light → .app-btn-outline-light rename had landed in source but not in the build). Active-row buttons fell back to .app-btn's default `color: var(--color-text)` (dark) on a dark surface = unreadable. 2. Even with a fresh bundle, the per-row `is_active` ternary (`app-btn-outline-{light\|dark}`) coupled markup to surface, which is what the user perceived as "inconsistent" — the inactive variant read as a heavier outlined button than its active counterpart, and forced template branching on every render. Replacing the modifier with a single borderless `.app-btn-icon` rule that picks up its color from the surface context. Rules: * `.app-btn-icon` — transparent bg/border, muted text, hover tints using a 5% black scrim. Reads cleanly on white. * `.surface--active .app-btn-icon` — flips to the on-dark text token with a 10% white hover scrim. Reads cleanly on dark purple. Template change: drop the `app-btn-outline-{...}` branch from the four asset-row buttons (preview / download / edit / delete). Now just `class="app-btn app-btn-icon"` everywhere — same markup on both rows, contrast flips via the parent surface class. The `.app-btn-outline-light` rule is gone (no callers); `.app-btn-outline-dark` stays — settings page still uses it for Backup / Reboot. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor(css): adopt tokens consistently — drop inline duplicates Audit prompted by "are we using proper tokens everywhere?". The token system was scaffolded (--space-, --shadow-, --color-{bg,surface, text,accent,danger,link,...}) but not enforced — same hex/rgba values were re-typed at every use site. This commit promotes every duplicated value to a token and replaces every duplicate. New tokens added to :root: * Status colours --color-success / --color-success-bright (#34d399, #4ade80) + alpha variants for chip wash, edges, ring, pulse, and the WCAG-AA text colours that ride on each wash. --color-warning + --color-warning-ring (#f59e0b). --color-danger-hover / --color-danger-active for the hover/active states the .app-btn-danger needs. * Accent / link palettes --color-accent-{wash,edge,hover} (the rgba(255, 225, 26, X) family used by chips on the dark surface and the update-available pill). --color-link-{wash,edge,ring} (the rgba(102, 51, 160, X) family). * Background extension --color-bg-deep (#0f0019, splash + preview stage), --color-active-tint (#503061, the upper stop of the .surface--active gradient). * Focus ring as a real role --ring-width: 3px replaces every inline `0 0 0 3px ...` so the focus ring scales as a single token. * Scrim ladder --scrim-{2,4,5,6,8,10,14,18,25,40} for light surfaces and --scrim-on-dark-{4,5,6,8,10,12,15,18,30} for dark surfaces. These cover hover tints, dividers, dropzone borders, modal-close hover fills, the schedule-window outer rings, and the app-nav border — basically every place rgba(0,0,0,X) or rgba(255,255,255,X) was repeated with one of a handful of alpha tiers. Replacements: * schedule-chip / schedule-chip--all / .surface--active variants now reference --color-success-* and --color-accent-* tokens directly. * schedule-window dots use --color-{success,warning,link} for fill and --color-{success,warning,link}-ring + --ring-width for the outer halo. Pulse keyframes derive --color-success-ring + ring-pulse. * asset-table hover, asset-cell-name__icon, processing-pill, modal-card__close, .app-btn-icon, .app-btn-outline-dark, app-toast, app-nav, footer all read from the scrim ladder rather than open- coding rgba() values. * Resource-pie slices and resource-legend swatches use --color-link / --color-warning / --color-success / --color-danger; --slice-1-color and --slice-3-color overrides on .resource-pie--disk now reference tokens instead of hex. * loadavg fills + trend icons reference --color-{success,warning}. * .app-btn-danger hover + active read --color-danger-{hover,active}. * surface--active gradient uses --color-active-tint → --color-active. * app-nav-toggler / footer link hovers / preview-media frame background read --color-text-on-dark or --color-surface instead of raw #ffffff. Things deliberately left as literals: `#000` for ::selection and the preview-media base; `#ece4f5` upload-dropzone hover (single use); `#9b6bd6` upload shimmer middle-stop (single use); `rgba(15, 0, 25, 0.{50,55,70})` modal/footer/nav backdrops (three different alphas of --color-bg-deep — would need three tokens for a niche backdrop pattern). Bundle size: 48097 → 49804 bytes (+1.7 KB). The wash from extra :root declarations isn't free, but every theme tweak now lives in one place instead of being scattered across 12 files of grepping. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor(css): make surface a context, not a flag — drop child overrides Pushback: "if we're using reusable patterns and DRY, how come the icons are different between active and inactive rows? That seems like a symptom that we aren't." Right: it was a symptom. `.surface--active` was carrying twelve separate per-child overrides — `.surface--active .asset-table`, `.surface--active .schedule-chip`, `.surface--active .schedule-window __primary`, `.surface--active .processing-pill`, etc. Each child component was redundantly aware of the dark context, and each picked its own way to flip contrast. So when `.app-btn-icon` got cleaned up in the previous commit but the cell-name icon and the chips were still living under their own per-child overrides, the surrounding markup drifted out of sync. Twelve overrides, twelve micro-snowflakes. This commit replaces the parent-selector pattern with surface context tokens: `.surface` declares `--surface-{bg, text, text-muted, text-faint, divider, scrim-{2,5,8,10}, anchor, anchor-hover}` (light defaults), `.surface--active` overrides those tokens, and every child reads from `var(--surface-text-muted)` etc. — a single rule per component. Component changes: * `.app-btn-icon`, `.asset-table` (thead/tbody/hover), `.asset-cell- name__icon`, `.processing-pill`, `.empty-state`, `.schedule-window __{primary,secondary,dot}`, `.schedule-window--{expired,disabled} __primary` all read surface tokens. Their `.surface--active` parent- selector siblings are deleted. * Schedule-chip palette gets its own context-token layer (`--chip-{neutral,day,all}-{bg,text,edge}`). Light surface uses neutral grey + link purple + WCAG-AA green; dark surface flips neutral to accent yellow and pumps the green wash strength. `.schedule-chip` rules are now ONE selector each, no parent override. Schedule-window live-state ring/fill is exposed as `--window-live-{fill,ring}` so the live dot brightens to `--color-success-bright` on dark without a parent override on the rule itself. The only `.surface--active .X` override that remains is `.surface--active .app-check-input:not(:checked)` — that one is a genuine surface-conditional behaviour (the light surface lets the browser's native off-state render unchanged; the dark surface needs an explicit fill because a transparent off-state vanishes against the gradient). It's not contrast-flipping, so it doesn't fit the context- token shape. Token defaults sit on `.surface` (which the inactive section uses directly) so they apply globally; `.surface--active` only overrides what changes. Every surface-aware component now ships as a single rule, and the shape of "this component on a dark surface" is "set your local --surface-* tokens to the dark values" instead of "write twelve more rules with parent selectors". Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(dev): make uvicorn --reload pick up template + CSS changes uvicorn's --reload defaults to watching .py only. Editing _asset_row.html (or _styles.scss → built anthias.css) on the host propagated through the bind mount, but the worker process held a stale compiled-template object in memory until something Python-side triggered a restart. End result: the running dev server kept rendering the pre-rename markup hours after the source had been fixed, and the icons in the active vs inactive rows looked different because the old `app-btn-outline-{light,dark}` classes were still emitted but only one of those SCSS rules still existed. Add --reload-include ".html" and --reload-include ".css" so template + built-CSS edits fire the same watcher that .py edits do. SCSS sources still need a separate `bun run dev` (or a one-shot `bun run build:css`) to compile into anthias.css — but once the CSS output changes, uvicorn now sees it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> fix(home): vanilla pointer drag-reorder + Bun minify-identifiers Alpine fix SortableJS kept silently failing on <tr> elements — drag handle showed the grab cursor but the row never moved, even with forceFallback=true. Replaced with ~60 lines of vanilla pointer events in home.ts: pointerdown captures the row, pointermove finds the row under the cursor and swaps via insertBefore, pointerup POSTs the new id sequence. Removed sortablejs dep + import. Bundle drops from 201 KB to 163 KB. Separately: Bun's --production flag enables --minify-identifiers, which renames Alpine.js's runtime expression-evaluator vars and silently breaks @click="openAdd()" — the assigned value lands on a Set leaked from another module instead of state.mode. Switched build:vendor / build:home to --minify-whitespace --minify-syntax (~half the bundle size, identifiers untouched). Also added a load-event fallback alongside the existing DCL listener in vendor.ts / home.ts so a dynamically-injected bundle (readyState already 'interactive', DCL already fired) still boots — addresses Copilot review comment. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(brand): regen favicons from marketing site logo The shipped favicons were the legacy Screenly OSE artwork. Regenerated the full set (favicon.ico multi-size 16/32/48, favicon-{16,32,96,128, 196}, apple-touch-icon-{57,60,72,76,114,120,144,152}, mstile-{70,144, 150,310}, mstile-310x150 wide-tile) from website/assets/images/logo.svg via bin/build_favicons.sh (rsvg-convert + ImageMagick + icotool). The script renders at the source's natural aspect ratio (50x48) and composites onto a square transparent canvas so the asymmetric viewBox doesn't get stretched, which is what would happen feeding -w/-h to rsvg-convert directly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(integration): migrate Selenium suite to Playwright + capture failure artifacts Replaces the splinter+selenium integration suite (mostly @pytest.mark. skip stubs marked "fixme" / "migrate to React-based tests") with a Playwright Python suite covering 24 browser-driven scenarios: - Smoke / regression (page loads, no console errors on production bundle, Alpine @click fires — explicit guard for the Bun minify-identifiers regression) - Asset-table rendering (empty state, drag handle on/off by section, humanised duration) - Add asset (URL form, image upload, video upload, two-uploads-in-one- modal-session) - Edit / preview / delete modals (state assertions via Alpine.$data, edit duration persists, delete removes from DB) - Toggle enable/disable round-trip - Drag-reorder (full DOM reorder + play_order DB persistence) - Settings render + form save round-trip, system info, skip-next Playwright auto-waits replace the custom _wait_for / sleep-and-retry helpers from the Selenium version. Suite is ~1.85x faster end-to-end (~14s vs ~26s on Selenium for the same coverage) and stable across multiple consecutive runs. Test image swap: docker/Dockerfile.test.j2 drops the chromedriver + chrome-for-testing zip downloads in favour of `playwright install --with-deps chromium` (Playwright manages the Chromium revision and the apt deps it needs). PLAYWRIGHT_BROWSERS_PATH is pinned to /opt/ playwright so the path is stable under the anthias-data volume mount. DJANGO_ALLOW_ASYNC_UNSAFE=1 is set in tests/conftest.py — Playwright's sync API spins up an asyncio loop to talk to Chromium over CDP, which Django detects and refuses sync ORM calls against. Documented as the canonical fix in pytest-playwright. A pytest_runtest_makereport hook in tests/conftest.py captures a full-page screenshot + rendered HTML on integration test failures under test-artifacts/. .github/workflows/test-runner.yml uploads the bundle via actions/upload-artifact@v7 (if: failure()) so failed CI runs link the artifacts from the bottom of the PR's Checks tab. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(urls): drop trailing slash on login route for consistency Every other anthias_app route is declared without a trailing slash (system-info, settings, assets/...); login/ was the lone outlier. Django's APPEND_SLASH only ADDS slashes to slashless requests, so the inconsistency meant requests to /login (sans slash) would 404 instead of redirecting. Standardised on slashless to match the majority. Addresses Copilot review comment on the PR. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(tests): satisfy mypy + ruff on the Playwright migration - Drop the unused `json` import from conftest.py left over from the Selenium console-log artifact (Playwright captures pageerror / console events in-test, no JSON-dump on the way out). - Type the pluggy hookwrapper outcome as Any. _pytest's stubs declare the generator yield as None even though hookwrapper=True makes pluggy send the call's Result back in. - Switch the hook return type from Iterator to Generator so the three-arg form documents the recv-type. - Annotate the seed-asset dicts as dict[str, Any] so subscript access doesn't read as `object` (mypy's heterogeneous-literal default) when passed into Playwright locator helpers / _drag_handle_to_row. - Type _wait_db's predicate as Callable[[], bool]. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * style(tests): apply ruff format Single/double quote normalisation on the multi-line JS evaluate() strings inside test_app.py and the playwright fixture in conftest.py. No functional change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(review): address remaining Copilot feedback - urls.py: switch every route to a trailing slash. The earlier slashless-everywhere fix addressed one Copilot finding (consistency) but introduced another (`/login/` bookmarks 404'd). Trailing slashes let Django's APPEND_SLASH redirect the slashless variant for free, so both `/settings` and `/settings/` work — the inverse isn't true. Updated the three JS-built form actions / hx-post URLs in home.html + _asset_modal.html to match (POST → 302 from APPEND_SLASH would error in Django 1.11+). - tools/image_builder/utils.py: drop `wget` from the test apt list. Comment claimed prepare_test_environment.sh needed it for asset copies, but that script only uses `cp`; the base image already installs `curl` for keyring fetches, so the test image inherits all the network tooling it needs. - docker/Dockerfile.test.j2: guard the apt-get install block so an empty apt_dependencies list doesn't render `apt-get -y install` with no packages. - Playwright SETTINGS_URL / SYSTEM_INFO_URL constants pick up the new trailing slashes — page.goto() would still follow the 301 either way, but matching the route avoids a needless redirect on every test. Suite: 24 passed in 13.89s. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(tests): align splash-page URL with the trailing-slash convention The previous commit moved every app route to a trailing slash (so APPEND_SLASH redirects from the slashless variant for free), but the splash-page tests still issued bare `/splash-page` requests against the test client — APPEND_SLASH redirects, so they got a 301 instead of the rendered template body. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(docker): copy templates into bun-builder so Tailwind scan finds them Tailwind v4's @source directive in src/anthias_server/app/static/src/ tailwind.css points at `../../templates/*/.html`. The production bun-builder stage copied package.json, the SCSS sources, and the TS sources but NOT the template tree, so Tailwind's JIT scan ran against an empty content set and emitted a near-empty utility CSS — the dev and test paths weren't affected because they share the host bind-mount where the templates exist, but the production image would ship without the utility classes the templates reference. Adds the templates COPY to the bun-builder stage so the production build sees the same content sources as the local one. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(integration): trace-on-failure via pytest-playwright Drops the hand-rolled Playwright fixtures + the pytest_runtest_makereport screenshot/HTML hook in conftest.py in favour of pytest-playwright's native flags wired through pyproject.toml addopts: --browser chromium --tracing retain-on-failure --screenshot only-on-failure --output test-artifacts Per-test trace zips drop to test-artifacts/<test-id>/trace.zip on failure (and nothing for green tests); `playwright show-trace trace.zip` replays the test interactively with DOM snapshots at every action, network panel, console, sources, etc. — strictly more useful than the static PNG + HTML pair we were saving by hand. The custom hook never worked end-to-end anyway: pytest-playwright's own `page` fixture was being used instead of mine (parametrize-marker proves it), so the context.tracing.start in my fixture wasn't running and the hook's tracing.stop raised "Must start tracing before stopping". Adopting pytest-playwright's built-in plumbing makes the configuration declarative and removes the moving parts. Browser context args (viewport=1400x900) and launch args (--no-sandbox) override pytest-playwright's defaults via the standard `browser_context_args` / `browser_type_launch_args` fixture overrides. DEFAULT_TIMEOUT_MS is applied per-page through an autouse fixture. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(urls): correct the APPEND_SLASH status-code in the trailing-slash comment Said "302-redirects"; Django's CommonMiddleware actually issues 301 for GET and 308 (method-preserving) for non-GET. Updated the comment to match what curl actually returns. Addresses Copilot review comment. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(css): toast readability — white surface card, not body-bg-on-body-bg The toast was using `background: var(--color-text)` (#1f002a). The body background is anthias-purple-1 (#1f0029) — one hex digit off. Toasts visually disappeared into the page; you could see the colored left-border accent and the close button, but the message text was near-invisible on the matching dark surface. Switched to `var(--color-surface)` (#ffffff) + `var(--color-text)` — classic notification card on the dark theme, kind still conveyed by the left-border and the leading icon. Close button colors match the new contrast direction. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(version): CalVer release label + relocate "Update available" off the navbar Replaces the prior `vanilla-django@08c26f3` label that read like a half-internal git pointer with a real release identifier, sourced from pyproject.toml's [project].version via importlib.metadata so the CI release bumper only needs to touch one place. Bumps version to 2026.5.0 (CalVer, YYYY.M.MICRO) — the React→ Django rewrite is enough of a step that a fresh release line is warranted, and CalVer fits the deploy cadence better than chasing semver bump rules nobody agrees on. Display layout on System Info: ANTHIAS VERSION v2026.5.0 (44d9b3b, vanilla-django) [Update available] The big calver string is the headline; the git short-hash + branch sit underneath in a smaller muted font (operators don't need them shouting alongside the release number, but they're useful for support). Branch is suppressed on master/main to cut noise on release builds. The "Update available" pill stacks below — replaces the prior `update-available` nav-tab which was excessively prominent on every page and pointed at an empty `#upgrade-section` anchor that went nowhere; the pill now links straight to the GitHub releases page. Wiring: - lib/diagnostics.py: get_anthias_release()/_head()/_meta()/_version(). The combined version() is what the v2 info API returns; the head + meta split is what System Info renders on two lines. - app/page_context.py + app/templates/system_info.html: thread the three fields through. - app/views.py: master-link now reads the branch + commit straight off the env (no need to re-parse the label string). - api/tests/test_info_endpoints.py: pull the expected version from importlib.metadata so the test moves with future bumps without a second edit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(review): three more Copilot findings - celery_tasks.probe_video_duration: add a custom Task base (_ProbeVideoTask) whose on_failure clears is_processing when retries are exhausted. Previously a permanently-failing ffprobe (e.g. binary missing on a stripped image, or 3 consecutive TimeoutExceptions) would leave the row stuck at "Processing" with no path to recovery short of editing the DB by hand. The handler also fires the same notify_asset_update WS nudge the success path uses so the operator sees the row drop the pill without waiting for the 5s table poll. - views.assets_update: stop forcing duration=0 for video assets on edit. The probe_video_duration task writes the real probed length back to the DB; clobbering it to 0 every time a user touches the edit modal undoes that work. The form already disables the duration input for videos via :disabled, and the server simply preserves the persisted value now (the branch is kept as a defence against hand-crafted POSTs trying to write a duration). - test-runner.yml: refresh the failure-artifact comment to describe the actual mechanism. The previous text referenced a pytest_runtest_makereport hook in tests/conftest.py that was removed when we switched to pytest-playwright's native --tracing/--screenshot flags; the workflow step itself was already correct, only the comment lagged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(version): pyproject.toml fallback for environments without an installed wheel importlib.metadata.version('anthias') raises PackageNotFoundError in every standard Anthias environment — the production / test / host installs all run `uv sync --no-install-project` (see docker/uv-builder.j2, docker/Dockerfile.{server,test,viewer}, bin/install.sh). That flag installs the project's deps but not the project itself, so the previous helper returned an empty string and the System Info version label silently dropped to "(`03490087`, vanilla-django)" with no CalVer head — defeating the whole point of the new label. get_anthias_release() now resolves in two steps: 1. importlib.metadata.version (works for editable installs / wheels) 2. Direct tomllib read of the repo-root pyproject.toml (works for --no-install-project deployments) Result is cached on the function attribute so per-request System Info renders and the v2 info API don't re-open the file. The unit test that pinned the expected version label now derives it from the same helper rather than calling importlib.metadata at module import time — that import-time call would have crashed the test collection in CI (since the test container also runs without the project installed). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(review): three minor Copilot findings - _asset_table.html: rename the inactive-table column from "Active" to "Enabled" to match the enabled table header and the underlying /assets_toggle/ endpoint (which flips is_enabled). The two tables showed different labels for the same checkbox. - login.html: render Django flash messages as a <ul>/<li> list rather than concatenated inline text, so two simultaneous errors don't smash into one another. - diagnostics.get_anthias_version_head(): docstring still claimed the head was empty when the package wasn't installed; with the pyproject.toml fallback added in `4697cfd5` that's no longer the failure mode. Updated to describe what actually returns ''. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(review): three Copilot findings — a11y, rel, scoped async-unsafe - _navbar.html: add aria-controls="navbarNav" on the mobile toggle so screen readers announce what the button expands/collapses; the matching id="navbarNav" was already on the collapsible region. - _stat_card.html + system_info.html: extend `rel="noopener"` to `rel="noopener noreferrer"` on every external `target="_blank"` link so the Referer header isn't leaked to the destination. - conftest.py: scope DJANGO_ALLOW_ASYNC_UNSAFE=1 to runs that actually include integration tests (the only ones that need it for Playwright's sync API). A pytest_collection_modifyitems hook sets the env var when at least one integration item is collected — runs early enough that pytest-django's DB setup (which itself hits the async-safety check) sees the flag, while leaving unit- only runs (`pytest -m "not integration"`) untouched so an accidental ORM-from-event-loop in a unit test still raises. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(css): drop dead Bootstrap class on stat-card link, scope underline rule `text-decoration-none` was a Bootstrap utility — it's not defined in the post-React SCSS, so the stat-card value-link was rendering with the browser's default underline despite the markup intent. Two paths to fix: a Tailwind utility (`no-underline`) on every site that renders a stat-card link, or a single component-scoped rule. Going with the latter — every link inside `.stat-card__value` now picks up `text-decoration: none` automatically (with hover-underline), matching the existing `.stat-card__meta a` pattern, so future stat-card links get the right styling without remembering to add a utility class. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-04 13:47:33 +01:00
Viktor Petersson	133ec78ff0	refactor(packaging): adopt src/ layout with split server/viewer packages (#2817 ) * refactor(packaging): adopt src/ layout with split server/viewer packages Move all Python source under src/ following modern packaging conventions. Server, viewer, host-agent, and shared common code now live as four top-level packages with clear excision boundaries — anthias_viewer can be removed wholesale when the rewrite-out-of-Python lands without touching the server. src/anthias_common/ shared: errors, utils, internal_auth, device_helper src/anthias_server/ Django app, REST API, Celery tasks, manage.py lib/ server-only: auth, backup_helper, diagnostics, github, telemetry src/anthias_viewer/ player runtime (was viewer/) src/anthias_host_agent/ systemd-driven host shim (was host_agent.py) tools/raspberry_pi_imager/ moved from repo root tests/conftest.py moved from repo root pyproject.toml gets [build-system], setuptools src/ discovery, and an anthias-manage console script. Django AppConfigs keep label='anthias_app' and label='api' so existing migration dependency tuples don't move. BASE_DIR computed from parents[3] to keep templates/static at repo root. mypy_path set to ["src", "stubs"] with explicit_package_bases. Dockerfile templates set PYTHONPATH=/usr/src/app/src; bin/start_.sh and CI workflows use python -m anthias_server.manage / python -m anthias_viewer instead of bare ./manage.py and python -m viewer. Ansible host-agent unit invokes python -m anthias_host_agent. Verified end-to-end in the docker test container: - 430 unit tests pass (matches baseline) - 7 integration tests pass, 5 skipped (matches baseline) - ruff, mypy clean Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> style: ruff format the new src/ tree The longer post-rename module paths (anthias_common.internal_auth vs lib.internal_auth, etc.) pushed several import lines past 79 chars, so ruff format had to wrap them. Apply that formatting and split the one multi-import in anthias_viewer/__init__.py into per-symbol lines so the existing # noqa: E402 sits on the `from` line where ruff expects it, without needing a re-anchor when format wraps the parens. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore: realign sonar + gitignore comment to src/ layout sonar-project.properties still pointed at the pre-refactor top-level packages (anthias_app, anthias_django, api, lib, viewer, ...) and their old per-file coverage.exclusions paths, which would have produced empty Sonar runs and stale exclusions. Collapse sources to `src` and rewrite the exclusions to the new src/anthias_/ paths. Also fix the stale path reference in .gitignore's comment for the test DB (now src/anthias_server/django_project/settings.py). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> chore: gitignore .claude/ and untrack the lock file I just leaked Previous commit accidentally pulled in .claude/scheduled_tasks.lock because .claude was in .dockerignore but not .gitignore. Add the pattern to .gitignore and drop the file from the index. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(dockerignore): exclude pytest cache, __pycache__ dirs, and the local test DB Three entries that were missing relative to the new src/ layout: - .anthias-test.db (and -journal/-wal/-shm siblings) — created at the repo root by src/anthias_server/django_project/settings.py when a developer runs the host pytest suite. Without this exclude, the next docker build COPY . bakes the file into /usr/src/app/. - */__pycache__ — .py[co] only matched the .pyc/.pyo files, leaving the empty cache directories to ship. - .pytest_cache — host-side, regenerable. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(urls): preserve 'anthias_app' URL namespace, not just the app label Copilot caught that the import-rewrite swept up the URL namespace too: app_name in src/anthias_server/app/urls.py changed from 'anthias_app' to 'anthias_server.app', which leaves templates/login.html's {% url 'anthias_app:login' %} pointing at a namespace that no longer exists — NoReverseMatch at render time when an unauthenticated request hits the login page. The namespace is the same kind of stable user-facing identifier as the AppConfig label (which we already kept as 'anthias_app'). Restore it, and revert the two reverse() callers in lib/auth.py and app/views.py that the rewrite changed in lockstep. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(ci): update --confcutdir to the new tools/raspberry_pi_imager path Copilot caught that the earlier sweep missed --confcutdir=raspberry_pi_imager (no trailing slash) — replace_all of "raspberry_pi_imager/" only matched path-with-slash forms. Without confcutdir, pytest walks back up looking for conftests and discovers the repo-root tests/conftest.py, which applies the Anthias-specific Django/Redis stubs to the rpi-imager test run on the website-deploy workflow. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-03 08:08:32 +01:00
Viktor Petersson	0c2be6d066	We keep hitting rate limiting from Docker Hub - let's say goodbye (#2802 ) * We keep hitting rate limiting from Docker Hub - let's say goodbye * DRY things up	2026-05-01 16:07:38 +01:00
Viktor Petersson	ca04156534	fix(build): bust webview layer to re-pull corrected pi4-64/pi5 tarballs The viewer images published before 2026-04-30T20:11Z (pi4-64 18:10Z, pi5 20:02Z) were built against the broken WebView v2026.04.1 tarballs that contained x86-64 ELFs for the ARM boards (b5a6440a). The corrected tarballs were re-uploaded to the GitHub release at 20:11:39Z (pi4-64) and 20:11:44Z (pi5) — but BuildKit cache-keys this RUN purely on the command string, not the response body, so a plain CI rerun would just re-use the poisoned layer and ship the same broken image. Add a no-op RUN above the webview download to force this layer (and the trivial ENV layers below it) to rebuild on next CI run. The expensive apt-install layer above stays cached, so this costs ~30s per board. After the next docker-build.yaml run lands and `latest-pi4-64` / `latest-pi5` flip to the corrected images (verify via `file /usr/local/bin/AnthiasWebview` -> `ARM aarch64`, BuildID 01380cc3...), this RUN line should be removed in a follow-up. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-01 05:44:53 +00:00
Viktor Petersson	d9ebc8051c	chore(build): upgrade to Debian Trixie + Python 3.13, drop Balena base images (#2779 ) * chore(build): upgrade to Debian Trixie + Python 3.13, drop Balena base images Move every container off `balenalib/raspberrypi-debian:bookworm` (Balena hasn't published a `trixie` tag on any of those repos and last refreshed in May 2025) onto vanilla `debian:trixie`. Pi 1 and 32-bit Pi 4 are retired at the same time — Pi 1 has no `linux/arm/v6` variant in upstream Debian, and Pi 4 always has a 64-bit path that avoids the messy `libssl1.1` / `libgst-dev` / `libsqlite0-dev` Qt 5 deps. Surviving build matrix: pi2, pi3, pi4-64, pi5, x86. For the surviving 32-bit boards (pi2, pi3) the legacy Broadcom userland (libraspberrypi0 → /opt/vc/lib/{libbcm_host,libmmal,libvchiq_arm}) is still required at runtime by the Qt 5 webview. Trixie's archive.raspberrypi.org/debian/main no longer ships those packages (replaced by raspi-utils + libdtovl0, which actively break libraspberrypi0), so Dockerfile.base.j2 conditionally writes Deb822 .sources entries pointing at archive.raspberrypi.org/debian trixie main and archive.raspbian.org/raspbian trixie firmware (where the legacy Raspbian builds of libraspberrypi0 still live, armhf only). The .deb-form raspberrypi-archive-keyring + raspbian-archive-keyring packages are extracted with `dpkg-deb -x` (their bundled keys carry trixie-policy- compliant binding signatures, unlike the standalone .public.key files which fail Sequoia/sqv's post-2026-02-01 SHA-1 ban). Architectures: armhf on each .sources file keeps apt from querying the Pi mirrors for the arm64 / x86 builds. Trixie package renames also fixed: libgles2-mesa → libgles2, ttf-wqy-zenhei → fonts-wqy-zenhei, libpng16-16 → libpng16-16t64 (time64 transition; armhf has no `Provides:` fallback like amd64 does), and the Qt 5-only libgst-dev / libsqlite0-dev / libsrtp0-dev / libssl1.1 are dropped (libgstreamer1.0-dev, libsqlite3-dev, libsrtp2-dev, libssl3 take their place — first added explicitly, the rest already in the main list). The transitional `git-core` is gone in trixie; `git` covers it. Python 3.13 (Trixie's default) replaces the 3.11 pin everywhere: pyproject.toml requires-python and mypy python_version, ruff.toml target-version, .python-version, uv.lock (regenerated; only diff is async-timeout dropped — its marker was python<3.11), uv-builder.j2's UV_PYTHON, Dockerfile.dev's FROM, bin/install.sh's host check, and every CI workflow's setup-python pin. Cleanup that falls out: drop the cache_scope / device_type / version_suffix `pi4 + arm64 → pi4-64` re-mapping (board is now self-identifying), drop the `c_rehash` workaround in Dockerfile.base.j2 (specific to a Balena curl bug, not vanilla Debian), drop the dead arm/v6 + arm/v8 branches in uv-builder.j2 (only arm/v7 remains as the 32-bit ARM target), retire the old build_qt5.sh `pi1`/`pi4` branches, and delete docker/Dockerfile.celery (left behind from the celery-image removal in `5e00c8ba`). Out-of-band prereq before merging anything that depends on a viewer build: cut a new `WebView-v` release with webview-{ver}-trixie-{board}.tar.gz (and qt5-5.15.14-trixie-{pi2,pi3}.tar.gz) for the surviving boards, then bump WEBVIEW_VERSION in tools/image_builder/utils.py:143. The webview Dockerfiles already point at debian:trixie, so triggering build-webview.yaml on the new tag should produce the artifacts. Verification (proven via real `docker buildx --platform=...` runs): - x86 server image: full build, runs Debian 13.4 + Python 3.13.5; Django 5.2.13, channels 4.3.1, uvicorn 0.32.1 all import. - x86 redis image: Redis 8.0.2 on trixie. - pi3 (linux/arm/v7 under qemu) server image: full build green — Pi apt sources bootstrap works, libraspberrypi0 installs from raspbian/firmware/armhf with /opt/vc/lib/* present. - pi3 (linux/arm/v7 under qemu) viewer image: 147s apt layer green end-to-end through libpulse-dev, libgstreamer1.0-dev, libsdl2-dev, libpng16-16t64, etc.; build proceeds through uv-builder + main stages and stops only at the WebView qt5 tarball fetch (the trixie artifacts haven't been cut yet — that's the prereq above). - ruff check + ruff format --check on tools/image_builder/: clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(ci): replace distutils.strtobool (3.12+ removal); satisfy SC2129 Two CI failures from the Trixie/3.13 bump fall out of stdlib & lint: - `lib/utils.py:8` imported `from distutils.util import strtobool`, which is gone in Python 3.12+. mypy on 3.13 flagged it as import-not-found. Inline the original truthy/falsy table directly in `string_to_bool` so every caller keeps accepting the same y/yes/t/true/on/1 / n/no/f/false/off/0 set. - actionlint/shellcheck SC2129 on `.github/workflows/docker-build.yaml` in the `Set Docker tag` step I added — three sequential `>> "$GITHUB_ENV"` redirects collapse into one `{ ...; } >> $GITHUB_ENV` block. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(security): HTTPS + SHA256-pin Pi keyring fetch; nuke libcec-dev typo Address Copilot's review on PR 2779. - docker/Dockerfile.base.j2 + webview/Dockerfile: switch the Pi/Raspbian keyring downloads (and the resulting Deb822 `URIs:` for both apt archives) from `http://` to `https://`. Both archives serve TLS cleanly today (verified with curl --proto '=https' --tlsv1.2). The keyring .deb is the trust anchor for everything fetched after it, so the .deb hash is now also pinned via `sha256sum -c -` before `dpkg-deb -x` extracts it — TLS alone wouldn't catch an upstream archive-side swap. Hashes match the raspberrypi-archive-keyring_2025.1+rpt1_all.deb and raspbian-archive-keyring_20120528.4_all.deb files served at the time this commit lands; bumping either filename is the signal to refresh the pin too. - tools/image_builder/__main__.py: trim the trailing space from `'libcec-dev '` in `base_apt_dependencies`. apt is forgiving about it but it produces extra whitespace in the rendered Dockerfile and is easy to miss in diffs. Verified by re-running the keyring bootstrap end-to-end on a fresh debian:trixie linux/arm/v7 container: both .debs pass sha256sum -c, apt update fetches over HTTPS, and libraspberrypi0 installs from archive.raspbian.org/raspbian trixie/firmware as before. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(sonar): declare USER root explicitly in webview/Dockerfile builder SonarCloud's docker:S6471 hotspot was already flagging this file on master (the implicit-root warning lives on every `FROM debian:` line without a `USER` directive); my Trixie change shifted the original line 107 to 131 and Sonar re-emitted it as a "new in PR" finding. Resolve with the rule's recommended escape hatch — declare the user explicitly, which converts the implicit-default into an acknowledged choice and silences the rule. Both stages stay on `USER root`: the builder stage's `dpkg-deb -x` / `dpkg --purge libraspberrypi-dev` and the runtime stage's writes to /sysroot, /opt/vc, /root/.pyenv, /usr/local/bin all require root. This image is a CI-local Qt 5 cross-compile builder that produces the WebView tarball as a release artifact — it is never deployed, so the "don't run as root" guidance behind S6471 doesn't apply in the way it would for a published runtime image. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> docs: fix two Copilot-flagged comment inaccuracies - Dockerfile.base.j2: comment said libraspberrypi0 comes from archive.raspbian.org's `rpi` component, but the Deb822 source below correctly declares `Components: firmware`. Verified via Packages.gz on archive.raspbian.org/dists/trixie/firmware/ binary-armhf — that's the only component shipping libraspberrypi0 on trixie/armhf. Comment now matches reality. - image_builder/utils.py: Qt 5 branch comment claimed the modern equivalents (libgstreamer1.0-dev, libsqlite3-dev, libsrtp2-dev) for the dropped trixie packages were "pulled by the main viewer apt list above". libsqlite3-dev / libsrtp2-dev are indeed in that list, but libgstreamer1.0-dev is Qt 5-only and is added by the extend() call right below — corrected the comment to point there instead. Both are pure comment changes; behavior unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * ci(webview): adopt registry-cache backend, mirror docker-build.yaml Both Docker-build steps in build-webview.yaml had ad-hoc caching that left the bulk of layer state on the floor: * `build-docker-image` (Pi 1-4 / Qt 5 builder) used `--cache-from screenly/ose-qt-builder:latest`, which is the image-tag-as-cache trick — only reuses the final manifest, never the apt-install + Qt cross-build intermediate layers, and silently no-ops the first time after a Dockerfile reorder invalidates the tag. * `compile-webview-part-2` (Qt 6 / pi5+pi4-64+x86) shipped with `docker compose build` and zero cache config, so every PR rebuilt the per-board Qt 6 builder image cold. Switch both to BuildKit's registry cache backend, identical pattern to docker-build.yaml's `buildx` job: cache pushed to `ghcr.io/screenly/anthias-webview-qt5-builder:buildcache` (Qt 5) and `ghcr.io/screenly/anthias-webview-qt6-builder:buildcache-<board>` (Qt 6, scoped per-board because the three Dockerfiles share almost nothing). `mode=max,image-manifest=true` because GHCR rejects the legacy standalone-cache manifest format on `ghcr.io/screenly/`, same constraint that bit the main workflow. Auth-side details: Both jobs gain `permissions: { contents: read, packages: write }`, scoped per-job so other jobs don't inherit GHCR push. * New "Login to GitHub Container Registry" step on each, gated on `event_name != 'pull_request'`. Fork PRs hand out a read-only GITHUB_TOKEN — cache-to would 401 mid-build — so `cache-to` is pushed-only-on-push, while `cache-from` runs unconditionally and warm-starts PRs off the latest master cache once the buildcache package is flipped public (same convention as anthias-server etc.). Qt 6 build step had to switch from `docker compose build` to `docker buildx bake -f docker-compose.yml --load --set <target>.cache-` because compose's YAML can't carry env-var-conditional cache_to without emitting an empty list entry that buildx rejects. To keep the subsequent `docker compose run` happy, the three Qt 6 services in webview/docker-compose.yml gain explicit `image:` tags (`webview-builder-{x86,pi5,pi4-64}`) so bake's `--load` puts the image under a name compose looks up by tag rather than rebuilding it. The Qt 5 job's old `Set buildx arguments` step (which assembled a quoted string in $GITHUB_OUTPUT) is gone — build args inline in the final `docker buildx build` invocation now, no GITHUB_OUTPUT round-trip. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> fix(webview): trixie apt rename + adopt GHCR for Qt 5 builder image Two intertwined fixes in webview/Dockerfile + the workflow that publishes/consumes its image. CI never caught either because the Docker-build step in build-webview.yaml is gated to push events, so this Trixie-targeted Dockerfile has not yet built on master. apt: drop the renamed-on-Trixie packages Stage 1 (armhf sysroot, archive.raspbian.org + deb.debian.org): * libgst-dev → gone, libgstreamer1.0-dev (already listed) replaces it * libsqlite0-dev → gone, libsqlite3-dev (already listed) replaces * libsrtp0-dev → gone in deb.debian.org/main; libsrtp2-dev (already listed) is the trixie default * libpng16-16 → renamed libpng16-16t64 under the time_t transition; old name is fully gone Stage 2 (amd64 runtime/builder, deb.debian.org): * libpng16-16 → libpng16-16t64 Verified by GET on {deb.debian.org,archive.raspbian.org,archive.raspberrypi.org}/dists/ trixie/main/binary-{armhf,amd64}/Packages.gz: every removed name is MISSING, every replacement is FOUND. Without this fix the first master push would die in stage 1's apt-get install. GHCR migration: screenly/ose-qt-builder → ghcr.io/screenly/anthias-... Move the published Qt 5 builder image off Docker Hub and into the same GHCR namespace as the rest of the anthias-* artifacts. New ref is ghcr.io/screenly/anthias-webview-qt5-builder:latest (image) + :buildcache (cache, set up in `eadd83d1`) — one repo, two tags, same auth flow. * build-docker-image: drop the Docker Hub login step, retag the push target to the GHCR ref via an IMAGE_REF env var. * compile-webview-part-1: declare permissions: { contents: read, packages: read }, add the GHCR login (gated on non-PR), point the `docker run` at the GHCR ref. Migration window: the GHCR package is created private on first push and needs to be flipped public so fork-PR runners (no GHCR auth) can pull. Same one-shot operational step as the existing anthias-* packages. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: fix second `rpi` vs `firmware` comment in image_builder `5e289198` fixed the same stale wording in docker/Dockerfile.base.j2 but missed the analogous comment block in tools/image_builder/__main__.py — flagged by Copilot's second-pass review. The comment was a self-referential pointer to the apt-source bootstrap in Dockerfile.base.j2, claiming libraspberrypi0 lives in archive.raspbian.org's `rpi` component when in fact it ships under `firmware` on trixie/armhf (the Deb822 entry written by the same code correctly says `Components: firmware`). Reword to match reality and add a note that this was verified against Packages.gz so a future maintainer doesn't redo the lookup. Pure comment change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * ci(webview): build Qt 5 builder inline, drop the publish job `a9b9522d` migrated the Qt 5 builder image from screenly/ose-qt-builder:latest (Docker Hub) to ghcr.io/screenly/anthias-webview-qt5-builder:latest (GHCR), but the publish step (`build-docker-image`) is gated to push events. On PR runs the GHCR image therefore never exists, and the consumer (compile-webview-part-1) blew up trying to `docker pull` it: Error response from daemon: Head ...manifests/latest: denied The image is a CI-internal build artifact — only consumed by the next step in the same workflow, never deployed, never pulled by any external user. Publishing it as a registry artifact is just inventory the workflow has to manage. So instead: * Delete the `build-docker-image` job entirely. * Move the build into compile-webview-part-1 as a step that runs on every event (PR + push), produces the image with `--load`, and tags it locally as `webview-qt5-builder:latest` for the subsequent `docker run` to consume. * Keep the registry-cache backend on ghcr.io/screenly/anthias-webview-qt5-builder:buildcache so cold builds remain fast: `cache-from` always, `cache-to` only on push events (fork PRs have a read-only GITHUB_TOKEN and would 401 on cache write — same gating as docker-build.yaml). Side benefits: * Removes the chicken-and-egg of "PR can't run because GHCR image doesn't exist; GHCR image only gets pushed on master". * Drops the cross-job artifact handoff (and the auth dance to read the published image), so fork PRs work without any GHCR public-flip step. * Two matrix runners (pi2, pi3) build in parallel from the same registry cache — second-onward runs hit cache for everything once the first push to master warms it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * ci(webview): drop registry cache plumbing, simpler is fine `eadd83d1` added BuildKit registry-cache backends to both webview build steps; `3dc0a04a` kept them when moving the Qt 5 build inline. The caching is purely a speed optimization — none of it is load-bearing for correctness, fork PRs can't write cache anyway, and the per-job GHCR login + permissions block is real surface area in exchange for saving a few minutes on warm runs. Strip it all back out: * compile-webview-part-1: drop the GHCR login + `permissions: packages: write`. The "Build Qt 5 builder image" step is a plain `docker buildx build --load` now — same inline-build architecture from `3dc0a04a`, just no `--cache-from` / `--cache-to`. * compile-webview-part-2: drop the GHCR login + `permissions:`, revert "Build Docker Image" from `docker buildx bake -f docker-compose.yml --load --set <target>.cache-` back to plain `docker compose build`. COMPOSE_BAKE=true stays so compose still uses the bake builder under the hood — no behavior change beyond removing the cache flags. webview/docker-compose.yml's explicit `image:` tags from `eadd83d1` stay in place: they happen to match the compose default (`<project>-<service>`) so plain `docker compose build` produces the same image names the previous bake invocation did, and `compose run` finds them either way. Cold pi2/pi3 builds will be ~9 min on every run instead of getting fast on warm runs. That's fine for now. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Revert "ci(webview): drop registry cache plumbing, simpler is fine" This reverts commit `1284a5ebd9`. * chore(webview): add bin/rebuild_qt5_toolchain.sh helper build_webview.yaml's pi2/pi3 jobs fetch a pre-built Qt 5 cross-compile toolchain from a `WebView-v` GitHub release (webview/build_webview_with_qt5.sh:21 pins QT5_TOOLCHAIN_TAG to WebView-v0.3.5). The trixie-targeted tarballs qt5-5.15.14-trixie-{pi2,pi3}.tar.gz don't exist on any release yet — the original Trixie commit (`65311092`) called out cutting them as an out-of-band prereq. Until they exist, pi2/pi3 CI fails with `sha256sum: no properly formatted checksum lines found` because curl falls back to a 404 HTML page on the missing .sha256 URL. This helper produces those tarballs locally: Builds webview/Dockerfile (the same image CI's compile-webview-part-1 builds inline) once, --load only. * Runs build_qt5.sh inside that image once per requested board (pi2 by default, pi3 by default, or whichever boards are passed on the command line). Sequential because Qt 5 + QtWebEngine peaks at ~16 GB RAM per build and the Linaro cross-compile toolchain extracted into .qt5-toolchain-build/src/ is shared between boards. * Drops outputs at .qt5-toolchain-build/release/qt5-5.15.14-trixie- {pi2,pi3}.tar.gz (+ .sha256), ready to upload via `gh release upload`. Idempotent: existing release/<tarball>.tar.gz short-circuits the run for that board. ccache state is preserved across runs at .qt5-toolchain-build/ccache/. BUILD_WEBVIEW=0 in the env skips the bonus webview-* tarball that build_qt5.sh otherwise produces (the Dockerfile defaults BUILD_WEBVIEW=1 so the helper inherits that default for parity with the previous CI flow). The .qt5-toolchain-build/ directory is intentionally hidden + at the repo root rather than ~/tmp so it's discoverable to whoever runs this next without grep'ing scrollback for a path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(webview): make Qt 5 cross-build Dockerfile produce working tarballs on trixie The webview/Dockerfile in this repo wasn't actually exercised end-to-end before — master CI uses screenly/ose-qt-builder from Docker Hub, and the inline-build path introduced for trixie only ran build_webview_with_qt5.sh (which downloads prebuilt qt5 toolchains). Rebuilding those toolchains for trixie surfaced four real bugs: * python interpreter never on PATH for non-interactive shells. The pyenv block only wired itself up via ~/.bashrc, which doesn't load when the rebuild script does `docker run /webview/build_qt5.sh`. Replace pyenv with apt-pinned python2.7 from archive.debian.org bullseye (trixie main dropped py2 entirely; bullseye archive still ships 2.7.18). Pin only python2.7 + its libpython runtime libs, leave everything else on trixie. Symlink /usr/local/bin/python -> python2.7 so QtWebEngine's `/usr/bin/env python` resolves. * QtWebEngine configure silently rejected fontconfig because the sysroot was missing /usr/share/pkgconfig/bzip2.pc. The Dockerfile only copies /lib, /usr/include, /usr/lib from the builder stage; on trixie's libbz2-dev the .pc file lives in /usr/share/pkgconfig (arch-indep), so freetype2.pc's `Requires.private: bzip2` failed to resolve, which cascaded into fontconfig: no, which silently dropped QtWebEngine from the build. Add the missing COPY. * Several QtWebEngine-required dev libs missing from the sysroot (libharfbuzz-dev, liblcms2-dev, libre2-dev, libxml2-dev). Same libs also need to be installed on the host runtime stage because chromium pdfium evaluates `harfbuzz_from_pkgconfig` in the host toolchain context, where Qt's host_pkg_config="/usr/bin/pkg-config" drops the sysroot args from chromium's pkg_config template. * `make -j$(nproc)+2` OOMs on >8-core hosts. cc1plus under qemu-arm peaks at ~3-4 GB during chromium compile, so the default formula needs ~50 GB on a 16-core box. Make MAKE_CORES env-overridable in build_qt5.sh and have rebuild_qt5_toolchain.sh cap at min(nproc, 8). Also: -webengine-proprietary-codecs in the configure args so the resulting QtWebEngine supports H.264/AAC/MP3 (matches what Debian qt6-webengine ships). Verified on a 16-core/22GB+32GB-swap host: produces qt5-5.15.14-trixie-{pi2,pi3}.tar.gz (88M, 98M) with 251 webengine entries each, plus the matching webview-.tar.gz apps. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> chore(webview): bump QT5_TOOLCHAIN_TAG to WebView-v2026.04.1 Trixie qt5-5.15.14-trixie-{pi2,pi3} toolchain tarballs are published on the new WebView-v2026.04.1 release; the previous WebView-v0.3.5 only ships the bookworm tarballs and is now unreachable for trixie pi2/pi3 CI. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(webview): refresh stale tag reference in rebuild_qt5_toolchain.sh hint Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(ci): pass full SHA for GIT_HASH; keep short SHA only in GIT_SHORT_HASH Both `.github/workflows/build-webview.yaml` and `bin/rebuild_qt5_toolchain.sh` were populating the GIT_HASH build arg with the short hash, making GIT_HASH and GIT_SHORT_HASH identical and stripping the unambiguous SHA needed by `lib/diagnostics.py:os.getenv('GIT_HASH')` for downstream traceability. Pass `git rev-parse HEAD` for GIT_HASH and reserve `--short HEAD` for GIT_SHORT_HASH (which is already what `tools/image_builder/__main__.py` does for the main service images). Caught in Copilot review of #2779. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(docker): exclude Qt 5 toolchain build dir + caches from COPY The viewer image's `COPY . /usr/src/app/` was slurping in 1.6 GB of local Qt 5 cross-build state (`.qt5-toolchain-build/`) plus 69 MB of `.mypy_cache/`, inflating every viewer/server image by ~1.7 GB even though the build needs none of it. Add those plus `.ruff_cache`, `.idea`, `.cursor`, `.claude`, `.cache`, and tighten the existing `.git` / `.github` globs (which match files ending in `.git` / `.github` but not the directories themselves on most matchers) to the literal directory names. Caught while validating the trixie 5-board matrix: x86 viewer was 6.28 GB and pi5 viewer 2.23 GB; both had the same 1.76 GB COPY layer that's mostly `.qt5-toolchain-build/`. Fixed image should be ~5 MB for COPY and ~1.5 GB for the viewer overall. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 18:30:59 +01:00
Viktor Petersson	5e00c8ba25	refactor(docker): drop celery image, restore base apt layer dedup (#2776 ) * refactor(docker): drop celery image, restore base apt layer dedup - Delete Dockerfile.celery.j2; compose now runs celery on the anthias-server image with a `command:` override. - Make viewer extend Dockerfile.base.j2 (mirroring test); drop 17 packages duplicated between viewer and base_apt_dependencies, plus 4 within-list duplicates. - Move `# syntax=docker/dockerfile:1.4` to line 1 of every rendered Dockerfile. It previously lived in uv-builder.j2 line 1 and got bumped mid-file for server by the bun-builder prelude, silently disabling the 1.4 frontend and breaking cache-key parity with viewer — the actual blocker for layer dedup. - Collapse CI matrix from (board × service) to (board) so all services for a board build on the same runner with the same buildkit cache, producing byte-identical apt layer digests at the registry. - Add ENV DJANGO_SETTINGS_MODULE to the server image so the merged image runs both server and celery CMDs. - Update all five compose templates (prod, balena prod, balena dev, dev, test) to redirect anthias-celery at the server image with a command: override. dev compose pins an explicit `image:` tag so both services share the locally-built SHA. - Remove old anthias-celery / srly-ose-celery containers in upgrade_containers.sh so the recreated container can take the name. Verified end-to-end on x86: server and viewer apt layers share a single digest; SHARED SIZE jumps from 132 MB to 1.216 GB; merged image runs both workloads in compose (celery task round-trips through Redis to SUCCESS). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * perf(docker): cache buildkit layers in GHCR registry across CI runs Add a --cache-backend / $BUILDX_CACHE_BACKEND option to tools.image_builder with two modes: - `local` (default): writes to /tmp/.buildx-cache/<board>/. Unchanged from before; right for local dev. - `registry`: pushes BuildKit cache to ghcr.io/screenly/anthias-<service>:buildcache-<board>. Reuses the GHCR login already done by docker-build.yaml, no extra tokens or third-party actions needed. Wire CI to use registry mode on push events (master) so subsequent runs of the same board pull cached layers — the ~825 MB extracted apt install per service goes from ~3 min cold to a few seconds warm. workflow_dispatch on a non-master branch falls back to local mode (effectively no-cache) so manual runs can't pollute the master cache. Drop the old actions/cache@v5 step that mirrored /tmp/.buildx-cache/<board> through actions/cache — registry cache is per-step rather than one big tarball, so it survives the GitHub Actions cache 10 GB-per-repo eviction better. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(image-builder): move local cache out of /tmp to user XDG cache dir SonarCloud python:S5443 flagged the previous /tmp/.buildx-cache/ default as a security hotspot — `/tmp` is world-writable, so on a multi-user host another account could in principle tamper with the buildkit cache. Switch to $XDG_CACHE_HOME/anthias-buildx/<board>/ (default ~/.cache/anthias-buildx/), which is per-user by default and follows XDG Base Directory convention. CI is unaffected: docker-build.yaml uses --cache-backend=registry on push events, which pushes cache to GHCR and never touches the local path. Local dev users with stale state in /tmp/.buildx-cache/<board>/ can rm it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(docker): correct cache-backend comments to match real behavior Two doc fixes per Copilot review on #2776: - tools/image_builder/__main__.py: the cache-backend rationale block still referenced /tmp/.buildx-cache/<board>; update to $XDG_CACHE_HOME/anthias-buildx/<board> so it matches the implementation moved in `529a50e0`. - .github/workflows/docker-build.yaml: the env comment claimed pull-request builds read from the registry cache, but this workflow has no pull_request trigger — non-push runs are workflow_dispatch, which both falls through to local cache and skips `docker login ghcr.io`, so it has no GHCR auth at all. Rewrite the comment around the push / workflow_dispatch split the code actually implements. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(docker): address Copilot review on registry cache + test compose - tools/image_builder/__main__.py: comment in the registry-cache branch said the cache namespace was "picked from the build's tag list", but the implementation hardcodes ghcr.io/screenly/anthias-{service}. Rewrite the comment to describe what the code actually does and call out the hardcode so a future namespaces refactor doesn't silently break cache. - docker-compose.test.yml: anthias-celery had its own `build:` block pointing at Dockerfile.test, claiming "reuses the test image" — but compose builds two separate images per service even with identical context, defeating the dedup intent. Mirror the docker-compose.dev.yml pattern: pin anthias-test to an explicit `image: anthias-test:dev` tag and have anthias-celery reference the same tag with no `build:`. Also bind-mount the source into celery so it picks up code changes (matches anthias-test's existing volume). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(image-builder): read-only registry cache without --push Per Copilot review: --cache-backend=registry previously tried to push cache to ghcr.io/... regardless of --push, so a local invocation without GHCR auth would fail mid-build with a confusing registry error. Split the behavior: - Reads (cache_from) are always set when registry mode is active — the anthias-* GHCR packages are public, so warm-starting off CI's cache without auth works and helps local dev. - Writes (cache_to) only happen when --push is also set, since that's when the workflow has authenticated to GHCR. Without --push, log a yellow warning and skip cache_to. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(docker): set DJANGO_SETTINGS_MODULE in test image for celery worker Per Copilot review on #2776 (suppressed-due-to-low-confidence note, but the bug is real): docker-compose.test.yml runs the celery worker from anthias-test:dev. celery_tasks.py calls django.setup() at module import time, which needs DJANGO_SETTINGS_MODULE in the environment. The pre-refactor Dockerfile.celery.j2 set it explicitly; this PR moved that ENV to Dockerfile.server.j2 only, so the production celery (running on the server image) is fine but the test celery would have crashed with ImproperlyConfigured. Set the same ENV in Dockerfile.test.j2. Server and test images both ship a usable Django environment for any process that imports anthias_django. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 15:21:43 +01:00
Viktor Petersson	390dad288f	refactor(webview): audit, rebrand to Anthias, pi4-64 support, CalVer (#2767 ) * refactor(webview): audit, rebrand, pi4-64 support, CalVer artifacts The WebView had accreted a few real bugs and a lot of dead code from successive PRs. This pass: * Fixes a stale image-reply race (new request didn't invalidate the in-flight one), unsafe QMovie buffer ownership for animated GIFs, duplicate/wrong page signal disconnects, and an authentication-required signal-slot signature mismatch that meant the auth handler was never actually invoked. The dual-WebView preload swap is kept but `onWebPageLoadProgress` (dead) and the redundant `webView` alias are removed; `imageRequestId` is renamed to `loadGenerationId` since it invalidates page loads too. Reads server host/port from `LISTEN`/`PORT` env vars (defaults `anthias-server:8080`) instead of hardcoding the Docker service alias. C++ project bumped to C++17 to match Qt 6. * Adds Pi 4 64-bit (`pi4-64`) as a Qt 6 board alongside `pi5` and `x86`, using `balenalib/raspberrypi3-64-debian:bookworm` as the builder base and Debian's apt `qt6-base-dev` / `qt6-webengine-dev` / `qt6-image-formats-plugins`. `tools/image_builder/utils.py` now takes `target_platform` and routes board=pi4 + linux/arm64/v8 to the Qt 6 artifact; the viewer template uses `is_qt6` and `artifact_board` instead of an open-coded board check. * Renames the WebView's Screenly identifiers to Anthias: binary `ScreenlyWebview` -> `AnthiasWebview`, D-Bus service `screenly.webview` -> `anthias.webview`, object path `/Screenly` -> `/Anthias`, handshake string, install dirs in start_viewer.sh, and ships a yellow-bird Anthias logo on the access-denied page (replacing the Screenly wordmark PNG). The viewer-side D-Bus consumer in `viewer/__init__.py` is updated to match. * Adopts CalVer for WebView releases. Tag scheme is `WebView-vYYYY.MM.PATCH` (e.g. `WebView-v2026.04.0`); artifact filenames are `webview-<calver>-<debian>-<board>.tar.gz` with the Qt version and git hash dropped (the Qt 5 toolchain archive keeps its Qt version since there it's load-bearing). Build scripts read `WEBVIEW_VERSION`; CI derives it from `refs/tags/WebView-v` or falls back to a date-stamped `-dev` value for non-tag builds. Validated locally by building the WebView for x86 (native) and for pi5 / pi4-64 (under QEMU) — all three produce a verified `webview-2026.04.0-bookworm-<board>.tar.gz` archive with the renamed binary at `bin/AnthiasWebview` and the new logo at `share/AnthiasWebview/res/anthias-logo.svg`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor(webview): bump qmake CONFIG to c++17, drop empty QML_IMPORT_PATH Qt 6 mandates C++17, so the previous c++11 line was being silently overridden by qmake. Drop the empty `QML_IMPORT_PATH =` left over from the Qt Creator template — we don't ship any QML modules. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(webview): run Qt6 builder containers as non-root builder user SonarCloud's docker:S6471 flagged Dockerfile.pi4 (a new file in this PR) for missing a USER directive. The two pre-existing Qt6 builders (Dockerfile.x86 / Dockerfile.pi5) have the same issue but were outside the PR's leak period — apply the same fix to all three for consistency. Add a `builder` user (UID 1000) after apt-get installs, chown the work directories to it, and switch to USER builder before WORKDIR. The build itself only needs to compile sources and write to /build (which is a bind mount); none of that needs root. As a bonus the build artifacts on the host are now owned by the invoking user (UID 1000 on most CI runners and dev machines) instead of root, so the existing "docker run --rm rm -rf" cleanup workaround is no longer needed for a clean rebuild. Validated by rebuilding the x86 builder image and re-running build_webview.sh — produces the same verified webview-2026.04.0-bookworm-x86.tar.gz, but the host-side files are now ubuntu:ubuntu rather than root:root. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(webview): address Copilot review feedback Three issues from the PR review: * loadPage() connected the loadFinished slot before calling stop(). If stop() emits loadFinished(false) synchronously for the in-flight navigation, the just-attached slot ran with ok=false AND disconnected itself (per onWebPageLoadFinished's own logic), so the real loadFinished for the new URL was never received. Restructure: - Drop any prior connection BEFORE stop() so stop()'s emission has no slot to reach. - Connect a per-call lambda that captures the loadGenerationId so stale completions arriving across loadPage boundaries are gated out instead of disconnecting from inside the slot. - Self-disconnect the lambda on first fire so JS-driven redirects re-emitting loadFinished don't re-trigger the swap. - onWebPageLoadFinished() and resetWebViewStates() are no longer needed; remove them. * Container DEVICE_TYPE was set to {{ board }} in both viewer and server templates, which is 'pi4' for both 32-bit and 64-bit Pi 4 builds. lib/github.py:get_latest_docker_hub_hash filters Hub tags by `-{device_type}` suffix and the published tags use `-pi4-64`, so a pi4-64 image looking for `latest-pi4` would never match. Compute device_type ('pi4-64' for board=pi4 + linux/arm64/v8, else board) at the top of build_image() and template both Dockerfiles with it. Hardware checks via lib/device_helper.get_device_type() read /proc/device-tree/model at runtime and are unaffected. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(ci): pre-create bind-mount target for non-root WebView builder The Qt 6 builder containers now run as a non-root `builder` user (UID 1000) to clear SonarCloud's docker:S6471. The bind-mounted host directory ~/tmp-\${board}/build is created lazily by dockerd as root on first compose run, which the non-root container then can't write to — locally my dev UID happens to be 1000 so the build worked, but GitHub's runner is a different UID and CI failed at the very first mkdir /build/release. Pre-create the directory with chmod 777 in a workflow step so the container can write regardless of the runner's UID. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(webview): address second round of Copilot feedback Three more issues from the latest review pass: * The page-load lambda's stale-path early-return left the lambda connected, so a subsequent loadPage that bumped loadGenerationId while a load was in flight would leak a handler that kept firing on every later loadFinished from the same page (logging spam, wasted work). Move the disconnect to the top of the lambda so it runs unconditionally on first fire — the connection is genuinely one-shot and the requestId gate then decides whether to act. * loadImage() didn't cancel a pending page navigation. If a loadPage was still streaming when the viewer flipped to image mode, the webengine kept fetching/rendering the page in the background until completion (only to be discarded by the requestId gate). Disconnect pageLoadConnection and call stop() on both webviews up front so the network/CPU activity actually stops. * viewer's load_browser() looped on the AnthiasWebview handshake string with no timeout and no liveness check, so a botched WebView start (missing binary, library, drift in the handshake line) would hang the viewer indefinitely. Bound the wait to 30s and bail with a clear RuntimeError if the process exits early or a TimeoutError if the handshake never lands; either lets the caller fail fast or retry instead of stalling forever. Also adds a defensive QObject::disconnect for pageLoadConnection in View's destructor, and pulls the handshake string into a constant sharing the same name on both sides of the contract. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(viewer): update test_load_browser for renamed handshake + binary The Screenly→Anthias rename changed the WebView's process name and D-Bus handshake string, but tests/test_viewer.py was still asserting the old "ScreenlyWebview" / "Screenly service start" values. The test_load_browser case was happily looping for 30s waiting for the Anthias handshake (now that load_browser has a bounded timeout) and then raising TimeoutError. Update the test to match the new strings and to mock is_alive() so the new liveness check returns True instead of MagicMock-truthy without explicit setup. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(webview): address third round of Copilot feedback * tools/image_builder/utils.py — webview_version was hard-coded to 2026.04.0, so master CI's docker-build.yaml would 404 on the not- yet-published WebView-v2026.04.0 release tag (chicken-and-egg with this PR). Add a WEBVIEW_VERSION env override so the viewer image build can be pointed at any released tag without a code change (e.g. when building from a fork, or when staging the release tag before the PR merges). * webview/docker-compose.yml — drop the now-unused GIT_HASH=${GIT_HASH} passthrough from each builder service. Artifact filenames are CalVer-derived now, no script reads GIT_HASH, and the missing host env var was producing "variable is not set" warnings on every docker compose build/run. * tests/test_viewer.py — extend coverage of load_browser's bounded wait. The new tests assert RuntimeError when the WebView process exits before the D-Bus handshake, and TimeoutError when the handshake never arrives within the deadline (with a stubbed monotonic() so the test runs in milliseconds). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(viewer): poll-and-decode load_browser stdout via PropertyMock Copilot flagged that the existing tests pinned process.stdout to a single bytes value, which doesn't match production where sh.RunningCommand.process.stdout is a @property returning the latest accumulated buffer on each access — so the polling loop was effectively exercising one read instead of N. Switch to mock.PropertyMock with a chunks list so each poll inside load_browser() sees a different buffer; the success-path test now genuinely verifies that the loop re-reads stdout across iterations and finds the handshake on the second poll. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(viewer): give load_browser failure tests a static stdout stub The early-exit test raises RuntimeError, and the production code formats the error message with browser.process.stdout.decode(...) — that's a second read of the property. PropertyMock(side_effect=[b'']) exhausted after the first read and raised StopIteration, breaking the test in CI. Split the helper into a static variant (PropertyMock(return_value=...)) for cases where the loop doesn't depend on stdout growing across iterations, and a chunks variant (side_effect=[...]) for the success case where it does. Apply the static stub to the early-exit and timeout tests; keep the chunks stub for test_load_browser to retain the polling-pattern check. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-28 22:42:51 +01:00
Viktor Petersson	4333fffafa	refactor(messaging): replace ZMQ with Redis for all viewer signalling, drop pyzmq (#2760 ) * refactor(messaging): replace ZMQ pub/sub with Redis for server→viewer commands Server-to-viewer command bus moves off pyzmq onto Redis pub/sub on the 'anthias.viewer' channel, since Redis is already the broker for Celery and the channel layer for Django Channels — no reason to run a second message bus. - settings.ZmqPublisher → settings.ViewerPublisher (redis.publish). - viewer/zmq.py → viewer/messaging.py with ViewerSubscriber backed by redis.pubsub(); the two ZmqSubscriber threads in viewer.main collapse into one, since both former publishers (anthias-server and the host-side wifi-connect script) now fan into the same Redis channel. - viewer-subscriber-ready gating preserved: set after subscribe() returns, same semantics as before. - ZmqConsumer / ZmqCollector (viewer→server reply path) and pyzmq itself are intentionally left in place; PR2 migrates the reply bus and PR3 removes pyzmq + libzmq from the dep tree and Dockerfiles. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix: publish host-side wifi-connect messages via Redis, not ZMQ The captive-portal flow (`setup_wifi`, `show_splash`) used to publish on ZMQ port 10001 from the host, with a second ZmqSubscriber inside the viewer connected to host.docker.internal:10001 picking it up. The previous commit collapsed the viewer down to a single Redis-backed subscriber, so this script's ZMQ publishes were going nowhere. Switch the script to redis.publish() against the same anthias.viewer channel. The Redis client is already wired here for the viewer-subscriber-ready gate, and the wifi-connect container runs in network_mode: host, so loopback to redis on 127.0.0.1:6379 (already exposed via the redis service's port mapping) keeps working unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor(messaging): replace ZMQ reply bus with Redis BLPOP + correlation IDs Drops the second ZMQ leg — the viewer→server reply path — in favor of Redis BLPOP keyed by a UUID correlation ID. Same channel layer that PR1 moved the command bus onto, so the entire viewer messaging path now runs on Redis. Wire format extends the existing 'command&parameter' encoding: the 'current_asset_id' command (currently the only request-reply command) now carries the correlation ID in the parameter slot, and the viewer LPUSHes its JSON reply onto 'anthias.reply.<corr-id>' (with a 30s EXPIRE so unread replies don't accumulate). The server BLPOPs that key. This also fixes a latent correctness bug: ZmqCollector had no correlation, so concurrent /v1 ViewerCurrentAsset callers could mismatch replies. That hazard was masked today by uvicorn running single-worker; with Redis + correlation IDs, the reply path is now safe across concurrent callers. - settings.ZmqConsumer / ZmqCollector → settings.ReplySender / ReplyCollector (BLPOP). 'import zmq' drops out — pyzmq itself is removed in the next commit. - lib.errors.ZmqCollectorTimeoutError → ReplyTimeoutError (the only catch site is implicit — it bubbles to a 500 — so the rename is mechanical). - viewer/__init__.py: send_current_asset_id_to_server takes a correlation ID and uses ReplySender. The 'current_asset_id' command handler in the dispatch table threads the parameter (now the corr ID) into the function call. - api/views/v1.py ViewerCurrentAssetViewV1: generates a UUID, sends it with the command, BLPOPs on it. - api/tests/test_v1_endpoints.py: ZmqCollector mock → ReplyCollector; side_effect signature relaxed to '_' since recv_json now takes two positional args (corr, timeout_ms). - stubs/redis-stubs/client.pyi: add rpush() and blpop() narrowed to decode_responses=True return shapes (the rest of the stub follows the same convention). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> chore: drop pyzmq + libzmq, finalize ZMQ→Redis migration With both legs of the viewer signalling path on Redis (PR1: command bus, PR2: reply bus), the pyzmq runtime dependency and the libzmq* build deps are no longer used. - pyproject.toml: remove pyzmq==23.2.1 from server, viewer, wifi-connect, and mypy dep groups (4 places). - uv.lock: regenerated; pyzmq + transitive py drop out. - tools/image_builder/{__main__,utils}.py: remove libzmq3-dev / libzmq5-dev / libzmq5 from the base apt list and from the viewer context's apt list. docker/uv-builder.j2 likewise drops libzmq3-dev from both the prebuilt-uv branch and the pip-fallback branch (32-bit ARM). The rendered docker/Dockerfile.* artifacts are gitignored, so no committed Dockerfile churn here — they regenerate cleanly via `python -m tools.image_builder --dockerfiles-only`. - send_zmq_message.py → send_viewer_message.py. The script already publishes via Redis (fixed in the PR1 follow-up); rename + update callers (bin/start_wifi_connect.sh, docker/Dockerfile.wifi-connect.j2) now that the ZMQ name is misleading. - bin/start_server.sh: drop the stale "single-worker because ZmqPublisher binds 10001" comment. The publisher is now a Redis client — no port bind, multi-worker is safe whenever the operator wants to opt in (not changed in this PR). - CLAUDE.md: update the architecture description (ZMQ ports 10001 / 5558 are gone, Redis carries the viewer signalling traffic now). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore: post-merge cleanup — re-flow ruff fmt + drop stale ZMQ refs Three small clean-ups discovered while running CI locally after the master merge (`41d7a80a`): * `api/tests/test_v1_endpoints.py`: master added the ViewerPublisher mock decorator on a single >79-char line. Our branch tightened ruff via the v2 test sweep, so `ruff format --check` now flags it. Wrap it like every other long mock.patch call in this file. * `docs/d2/anthias-diagram-overview.d2`: the server↔viewer edge label still said "ZMQ + asset fetches"; the migration finished in `a9be1d3`. Update to "Redis pub/sub + asset fetches" so the diagram matches CLAUDE.md's architecture description. * `send_viewer_message.py`: stray "Specify the ZeroMQ message" help text on the `--action` flag. The script publishes via redis now; reword to be transport-neutral. No production code touched. Verified locally: ruff check, ruff format --check, mypy, eslint, prettier, bun test, the 107-test Python unit suite, and the 12-test integration suite all pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * review: address Copilot feedback on PR #2760 Three line-level review comments: * `viewer/__init__.py` / `settings.py` — `send_current_asset_id_to_server` was creating a fresh `ReplySender()` (and a fresh `redis.Redis` client + connection pool) on every `current_asset_id` request. Reuse the process-wide `r` instead: `ReplySender.__init__` now takes the caller's redis connection, and the viewer constructs a single `reply_sender = ReplySender(r)` at module init. * `viewer/messaging.py` — `ViewerSubscriber.run()` had no reconnect/retry: a transient redis blip during `subscribe()` or `listen()` killed the thread silently, leaving the viewer unable to receive any commands until the process restarted, and `viewer-subscriber-ready` could be left stuck at 1. Wrap the loop in exponential-backoff reconnect (1s → 30s cap) on `redis.ConnectionError`, and clear the readiness flag while disconnected so wifi-connect-style readiness-gated publishers wait instead of dropping messages on the floor. Set readiness only after `subscribe()` returns successfully. * `settings.py` — `ReplyCollector.recv_json` rounded `timeout_ms <= 0` up to a 1-second BLPOP, breaking the old `ZmqCollector` contract where `timeout=0` was a non-blocking poll. Branch on `<= 0` and use `LPOP` (which the redis stub now declares); only round up for positive timeouts. Also add the SonarQube `# NOSONAR` rationale on the two pre-existing hotspots flagged in the PR diff (loopback HTTP for the captive-portal page; the well-known wifi-connect AP gateway IP), and drop a redundant `continue` at the end of the readiness wait loop. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * review: address Copilot follow-up feedback on PR #2760 Two new comments after the previous resolution round: * `stubs/redis-stubs/client.pyi`: `Redis.lpop()`'s real return type depends on `count` — single value with no count, list with count. The previous stub always declared `str \| None`, so a future `lpop(key, count=N)` call would silently typecheck against the wrong shape. Replace with two `@overload`s: no-count returns `str \| None` (the form Anthias actually uses), explicit-int count returns `list[str] \| None`. Also add `PubSub.close()` to the stub so the finally-block below typechecks. * `viewer/messaging.py`: `ViewerSubscriber.run()` was creating a fresh PubSub on every reconnect attempt without closing the previous one. A flapping redis container would accumulate dead PubSub objects each holding a connection from the pool until GC reclaimed it. Wrap the per-iteration PubSub in a `finally: pubsub.close()` so the socket is released deterministically on every disconnect and on every clean exit from `_consume()`. Swallow `ConnectionError` from `close()` itself — the underlying socket is already gone in the case we care about. Drive-by: the docstring referenced `setup_wifi` and the wifi-connect readiness handshake, both of which #2763 deleted. Update to mention the actual surviving commands and note that no consumer reads `viewer-subscriber-ready` today (kept as a generic readiness signal). Verified: ruff, ruff format, mypy (strict, 97 files), the 103-test unit suite — all pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * review: address Copilot's third round of feedback on PR #2760 Three more comments after the previous resolution round: * `viewer/__init__.py` — `send_current_asset_id_to_server()` derefs `scheduler.current_asset_id`, but `subscriber.start()` runs before `scheduler = Scheduler()` in `main()`. A `current_asset_id` request arriving during `wait_for_server()` would `AttributeError` and the caller would see a 2s timeout instead of a useful answer. Guard: if scheduler is None, reply with `current_asset_id: None` — the v1 endpoint already treats a falsy id as "no current asset" and returns `[]`, which is the correct semantic answer pre-init. Not silently dropping the reply: that would deadlock the caller for the full recv timeout. Other scheduler-touching handlers (`next`, `previous`, `asset`, `stop`) have the same pre-existing race, but it's identical to the ZMQ-era behavior and out of scope for this messaging migration. * `api/tests/test_v1_endpoints.py` — `test_viewer_current_asset` only checked `send_to_viewer` call count, leaving the new corr-ID round trip untested. A future refactor that swapped sides of the UUID would deadlock the v1 endpoint until the recv timeout, which the test would fail to catch. Switch the `recv_json` mock from a side_effect lambda to a `MagicMock` so we can introspect its args, then assert the corr-ID extracted from the published command matches the corr-ID passed to `recv_json`. * `stubs/redis-stubs/client.pyi` — the comment said "don't pretend to support `count`" but I'd added a `@overload` for the count form anyway in the previous round. Drop the count overload to match the comment's stated intent: Anthias only uses the no-count form, and a future caller adding `count=N` will get a clear "no overload matches" instead of a stub silently agreeing with the wrong shape. Verified: ruff, ruff format, strict mypy (97 files), 9-test v1 suite, 103-test full unit suite — all pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-28 12:58:08 +01:00
Viktor Petersson	7476a43b27	chore: drop wifi-connect service end-to-end (#2763 ) The anthias-wifi-connect captive-portal helper has been pinned to balena-os/wifi-connect v4.11.1 (Feb 2023) for ~3 years; upstream dropped the ARMv6 binary back in v4.4.6 so Pi 1 was silently shipping a wifi-connect container with no binary inside, and the host script `bin/start_wifi_connect.sh` had a `set -e`-vs-`$?` bug that made the captive-portal branch unreachable. nmcli/nmtui covers the supported install path. Removing the whole service rather than bumping it: there are no production users left and bumping would require rewriting both the architecture-to-asset matcher (Rust target triples now) and the unzip step (tar.gz now). Removed - Container build: docker/Dockerfile.wifi-connect[.j2], `wifi-connect` group in pyproject.toml + uv.lock, `wifi-connect` entry in image_builder SERVICES, `get_wifi_connect_context()`, `wifi-connect` cell in CI matrix + docker-build.yaml retag SERVICES list. - Compose: `anthias-wifi-connect` service from prod / balena / balena-dev templates, plus the now-unused `host.docker.internal:host-gateway` extra_hosts on `anthias-viewer`. - Helper scripts: bin/start_wifi_connect.sh, start_wifi_connect_service.sh, send_zmq_message.py. - Viewer plumbing: the second ZmqSubscriber bound to host.docker.internal:10001, the `viewer-subscriber-ready` Redis flag, the `setup_wifi` / `show_splash` / `show_hotspot_page` handlers and their entries in the `commands` dict, the `mq_data` / `load_screen_displayed` globals, and the now-unused `redis_connection` parameter on `ZmqSubscriber`. - Server: `/hotspot` URL route, `views_files.hotspot`, `HOTSPOT_FILE` / `INITIALIZED_FLAG` constants, `HotspotViewTest`, templates/hotspot.html, static/img/wifi-off.svg, /data/hotspot dir creation in bin/start_viewer.sh. - Host: sudoers entry for /usr/local/sbin/wifi-connect, ansible/roles/network template + vars. - Docs: docs/wifi-setup.md, the Wi-Fi Setup section and container row in docs/README.md, the wifi-connect.service line and stale `initialized` flag bullet in docs/developer-documentation.md, the "Reset Wi-Fi → hotspot page" step in docs/qa-checklist.md. Migration paths kept (intentional) - bin/upgrade_containers.sh now runs `docker rm -f` on anthias-wifi-connect and srly-ose-wifi-connect alongside the existing nginx/websocket cleanup, so on next pull devices drop the stale container. - ansible/roles/network/tasks/main.yml stops, disables, and removes /etc/systemd/system/wifi-connect.service, then notifies a new `Reload systemd daemon` handler. Idempotent on fresh installs. Verified - `ruff check` + `ruff format --check`: clean. - Strict `mypy .` (django-stubs + drf-stubs plugins): 97 files, 0 issues. - `ansible-lint ansible/`: passes at the `production` profile. - All three compose templates render and parse via `docker compose config`. - `python -m tools.image_builder --dockerfiles-only` generates the remaining 5 services with no Dockerfile.wifi-connect produced. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-28 09:40:41 +01:00
Viktor Petersson	bf6e9a1741	fix(viewer): unbreak django.setup() in viewer container (#2762 ) * fix(viewer): unbreak django.setup() in viewer container The mypy commit (`93e55018`) added `import django_stubs_ext` and `django_stubs_ext.monkeypatch()` to anthias_django/settings.py, but `django-stubs-ext` is only in the `server`/`test` dependency groups, not `viewer`. The viewer also tries to load every entry in `INSTALLED_APPS` at django.setup() time, which pulls in `channels`, `rest_framework`, `drf_spectacular`, `dbbackup` — none of which the viewer ships or uses (it never serves HTTP). Both failure modes were hidden by a bare `try: django.setup() ... except Exception: pass` in viewer/__init__.py, leaving `connect_to_redis` undefined for the next module-level statement. End result on real hardware (Pi and x86): File "/usr/src/app/viewer/__init__.py", line 63, in <module> r = connect_to_redis() NameError: name 'connect_to_redis' is not defined — a misleading symptom three layers downstream of the actual ModuleNotFoundError. Changes: * `anthias_django/settings.py`: - Make `import django_stubs_ext` + `monkeypatch()` optional. The codebase has zero runtime usages of `QuerySet[Asset]`-style subscriptable Django generics (and no `from __future__ import annotations`), so the patch is currently a no-op anyway. mypy + django-stubs still pick it up at type-check time because the dev group ships it. - Gate `INSTALLED_APPS` on `ANTHIAS_SERVICE=viewer`. The viewer only needs `anthias_app` + `contenttypes` + `auth` for ORM access to the Asset model. Server/celery/test don't set the env var and keep the full 12-app list. * `docker/Dockerfile.viewer.j2`: set `ENV ANTHIAS_SERVICE="viewer"`. * `viewer/__init__.py`: drop the bare `try: ... except Exception: pass`. Any future import or django.setup() failure now surfaces as a real traceback instead of a confusing NameError downstream. * `celery_tasks.py`: same defensive cleanup. Celery uses the server dep group so it doesn't fail today, but the antipattern would mask the same class of regression — fix it before it bites. Verified inside docker on x86: rebuilt all three images (server/celery/viewer); each module imports cleanly. Server still loads the full INSTALLED_APPS (12 apps incl. channels, DRF, dbbackup) and django_stubs_ext.monkeypatch() still runs. Viewer reaches the loop entry point (Qt browser launch then fails on the headless build host, expected and unrelated). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor(settings): split INSTALLED_APPS into base + http-only Reverses the if/else from the previous commit so the structure matches the intent: every Django consumer (server, celery, viewer, test) gets the same minimal base — ORM, contenttypes, auth — and HTTP-serving services additively opt into the web stack on top of that. Why this is better than the if/else: * Single source of truth for "what does any Django consumer need" — no risk of the two branches drifting. * Adding a future lightweight service (e.g. a one-shot migration runner) is a no-op: it gets the right base by default. * The web-only apps are listed exactly once and clearly tagged as HTTP-only, instead of being interleaved with base apps in the full-mode branch. Verified inside docker (viewer image, ANTHIAS_SERVICE=viewer): 3-app list as before. With ANTHIAS_SERVICE unset (server-equivalent path): 12-app list, identical contents to pre-refactor master, just with base apps now leading. `manage.py check` reports no issues — the ordering change (channels was first; now anthias_app/contenttypes/ auth lead) is benign because Anthias drives ASGI via uvicorn, not the runserver shadow that channels' first-app position used to matter for. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * review: address Copilot feedback on PR #2762 Three findings, all valid: * Comment claimed `django_stubs_ext.monkeypatch()` was a no-op because no runtime code subscripts Django generics. That's wrong: `anthias_app/admin.py` defines `class AssetAdmin(admin.ModelAdmin [Asset])` at module level, which raises TypeError on the server without the patch. Rewrite the comment to be honest about the runtime dependency so a future contributor doesn't delete the patch thinking it's dead. * `except ImportError: pass` was too broad — it would also swallow a partially-installed django_stubs_ext (e.g. a missing internal submodule). Narrow to `ModuleNotFoundError` and only swallow when `exc.name == 'django_stubs_ext'`; re-raise otherwise so unrelated import failures surface. * The same comment claimed the viewer image doesn't ship drf-spectacular or django-dbbackup, but the viewer dep group still listed both. The gated INSTALLED_APPS no longer references their apps and viewer code never imports them, so drop them from the viewer group instead of fixing the comment to admit they were there. Re-locked uv.lock. Verified inside docker after rebuilding the viewer image: viewer loads cleanly, INSTALLED_APPS = 3, and `importlib.util.find_spec` confirms drf_spectacular / dbbackup / django_stubs_ext are all absent from the viewer venv. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-28 08:08:19 +01:00
Viktor Petersson	8041fc30e4	ci: switch primary registry to ghcr, drop legacy srly-ose namespace (#2761 ) Make `ghcr.io/screenly/anthias-` the canonical source for Anthias container images and demote Docker Hub's `screenly/anthias-` to a parallel mirror during the migration window. The legacy `screenly/srly-ose-` namespace is dropped entirely (matrix push + latest- mirror). The compose templates are flipped to ghcr in the same change so `bin/upgrade_containers.sh` regenerates with ghcr on the next run. Why --- Two motivations stack: 1. Docker Hub's anonymous-pull rate limit (100 pulls / 6h per IP) bites end-users when a fleet of devices behind one NAT all run `bin/upgrade_containers.sh` at once, not just CI. GHCR has no such limit for public packages, and storage is free unlimited. Authed pushes from CI also get a much higher quota under the GitHub Actions token than under our shared Docker Hub bot. 2. d568602's publish-latest hit Docker Hub's 429 rate limit on retag #52 (`srly-ose-redis:latest-pi3`) — the legacy namespace doubled the manifest GETs in the loop and bought no real back-compat in exchange. `docker-compose.yml.tmpl` has pointed installs at `screenly/anthias-` since 2023-02 (`b9998438`), and `bin/upgrade_containers.sh` regenerates compose from the template on every upgrade, so any device that has run an upgrade in the past three years is on `screenly/anthias-` already. What ships ---------- * `tools/image_builder/__main__.py` — `namespaces` becomes `['ghcr.io/screenly/anthias', 'screenly/anthias']`. GHCR is listed first so it's the primary push target; Docker Hub is the parallel mirror. The buildx matrix now pushes both `<short-hash>-<board>` tags to both registries on every build. * `.github/workflows/docker-build.yaml` — adds job-scoped `permissions: { contents: read, packages: write }` on `buildx` and `publish-latest` (not at workflow level, so `run-tests` doesn't inherit), plus a `Login to GitHub Container Registry` step using `${{ github.actor }}` + `${{ secrets.GITHUB_TOKEN }}` in both jobs. The publish-latest mirror loop iterates over both namespaces (GHCR first) inside the same retry-wrapped retag block, so `latest-<board>` advances atomically across both registries or not at all. * `docker/labels.j2` — new shared partial that emits the OCI image labels (`source`, `url`, `licenses`, `title`, `description`). `image.source` is the load-bearing one for GHCR: it links the package to its source repo, which makes the package inherit the repo's visibility and grants repo collaborators push/delete access. * `docker/Dockerfile.{base,redis,viewer}.j2` — include the new partial. `Dockerfile.base.j2` covers server / celery / wifi-connect / test (which all `{% include 'Dockerfile.base.j2' %}`); `redis.j2` and `viewer.j2` have their own production-stage `FROM` so include `labels.j2` directly. * `docker-compose.yml.tmpl`, `docker-compose.balena.yml.tmpl`, `docker-compose.balena.dev.yml.tmpl` — flip 15 `image:` lines from `screenly/anthias-` to `ghcr.io/screenly/anthias-`. Devices pick this up on next `bin/upgrade_containers.sh` (the script regenerates `docker-compose.yml` from the template). Retry-with-backoff seatbelt around `imagetools` calls (originally added in 8099a14a) is preserved. Deployment notes ---------------- After this lands, the docker-build workflow will run on master and publish to GHCR for the first time. Before merging, set `Screenly`'s default-new-package visibility to "Public" at https://github.com/organizations/Screenly/settings/packages so the five new packages don't land private. (`org.opencontainers.image.source` auto-links each package to this repo but does not set visibility.) Migration-window risk: between merge and `publish-latest` completion (~80 min), `ghcr.io/screenly/anthias-:latest-<board>` tags don't exist yet. Devices that run `bin/upgrade_containers.sh` in that window will fail to pull and stay on their existing containers (no auto-fallback to Docker Hub). They'll pull successfully on the next upgrade attempt. To minimise impact, merge during a low-fleet-upgrade window. Phase 3 (months later, separate PR): stop publishing `latest-` to Docker Hub once enough fleet has rotated through an upgrade. `<short-hash>-<board>` tags on Docker Hub stay around indefinitely for explicit pins. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 19:50:33 +01:00
Viktor Petersson	f421130b24	refactor(server): collapse nginx + websocket containers into uvicorn (#2757 ) * refactor(server): collapse nginx + websocket containers into uvicorn Replace the nginx + gunicorn + gevent-websocket trio with a single uvicorn ASGI server inside `anthias-server`: * HTTP, /static/, /anthias_assets/, /static_with_mime/, and /hotspot are now served from Django (WhiteNoise + small file-serving views in `anthias_app/views_files.py` that re-implement nginx's IP allowlists). * WebSockets move from a separate gevent process talking ZMQ to Django Channels with a Redis-backed channel layer, fanned out by celery via `channel_layer.group_send`. * TLS termination is handled by uvicorn directly when SSL_CERTFILE / SSL_KEYFILE are set; `bin/enable_ssl.sh` now writes a compose override (no longer ansible) and a companion `bin/disable_ssl.sh` removes it. Cert + key live under `~/.anthias/ssl/`. * `bin/upgrade_containers.sh` removes the legacy `anthias-nginx` and `anthias-websocket` containers on upgrade so they don't linger. * Drop `gunicorn`, `gevent`, `gevent-websocket`, and the `websocket` uv group from `pyproject.toml`; add `channels`, `channels-redis`, `daphne`, `uvicorn[standard]`, and `whitenoise`. Notes on hardening: `--forwarded-allow-ips` defaults to off so the IP allowlist can't be bypassed via a spoofed `X-Forwarded-For`; operators behind a reverse proxy can opt in via the `FORWARDED_ALLOW_IPS` env var. Backup uploads previously sized by nginx's `client_max_body_size 4G` are preserved by setting `DATA_UPLOAD_MAX_MEMORY_SIZE = None`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix: address review feedback on uvicorn migration * Drop USE_X_FORWARDED_HOST (inconsistent with the deliberate --forwarded-allow-ips hardening; without a proxy, X-Forwarded-Host is client-controlled). * Remove daphne — uvicorn runs production and the test environment now uses it too (bin/prepare_test_environment.sh). * Replace _safe_join's parents-membership check with Path.is_relative_to. * Drop AllowedHostsOriginValidator wrapper (no-op under ALLOWED_HOSTS=['']) and document where to put it back if hosts are ever locked down. Rename DOCKER_CIDR → DOCKER_BRIDGE_CIDR with a comment that this is defense-in-depth, not a real perimeter (LAN clients via the published port also appear in 172.16/12). * Add anthias_app/tests.py covering the IP allowlists, mime override, hotspot gating, and traversal/symlink rejection in _safe_join (17 tests). * Note the single-worker ZmqPublisher bind constraint in start_server.sh so a future scale-up doesn't EADDRINUSE on tcp://0.0.0.0:10001. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(security): clear SonarCloud hotspots on uvicorn migration * Restrict views_files.anthias_assets / static_with_mime / hotspot to GET via @require_GET (Sonar S3752, x3): they are read-only file servers and should reject other methods at the view boundary. * Mark RFC1918 / Docker-bridge CIDR literals as NOSONAR S1313 (x4): they are intentional, well-known private network ranges. * Mark `http://` in CSRF_TRUSTED_ORIGINS as NOSONAR S5332 with a comment explaining devices ship over HTTP and operators opt into TLS via bin/enable_ssl.sh. Existing 17 view tests continue to pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> fix: clear remaining static-analysis findings * ruff format -- the previous tests.py reformatted itself; CI's `ruff format --check` now passes. * CodeQL py/path-injection on _safe_join: rewrite using os.path.realpath + os.path.commonpath, which CodeQL recognises as a sanitiser for path-injection sinks. Behaviour is identical to the Path.is_relative_to version (both reject `..` and symlink escapes; the 17 tests in anthias_app/tests.py still pass). * SonarCloud NOSONAR markers: switch to the codebase's bare `# NOSONAR` form (matches host_agent.py and tests/test_backup_helper.py); the earlier `# NOSONAR <rule>` form was not being honoured. * Centralise the test-fixture IPs in module-level constants so S1313 is suppressed in one place rather than at every callsite. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(security): inline path-injection check in views CodeQL only treats os.path.commonpath as a sanitiser when the check sits in the same function as the file-system sink — calling _safe_join() from a separate function still leaves the open()/isfile() sinks tainted (4 alerts on PR #2757). Repeat the realpath + commonpath check inline in anthias_assets and static_with_mime so CodeQL can prove the post-check path stays under the configured root. _safe_join is kept for the SafeJoinTest unit tests and as a documented helper. Existing 17 tests in anthias_app/tests.py continue to pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(security): use realpath+startswith path sanitiser for CodeQL CodeQL's path-injection model recognises the canonical `realpath(...).startswith(base + sep)` pattern but apparently not `os.path.commonpath(...) == root` in this codepath. Switch the inline check in anthias_assets and static_with_mime to startswith so the analyser can prove the post-check path stays under the configured root. Behaviour is identical: traversal and symlink-escape still 404 (verified by SafeJoinTest + view tests). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix: address Copilot review feedback * lib/utils.py imported channels/asgiref at module level. The viewer container imports lib.utils via viewer/__init__.py but its uv dependency group does not ship channels, so the viewer would ImportError on startup. Move the channels imports into YoutubeDownloadThread.run() (server/celery-only path) so lib.utils remains importable from the viewer. * Drop the unused _safe_join() helper and its three SafeJoinTest cases — the views inline a realpath+startswith sanitiser (CodeQL needs the check in the same function as the sink), and the helper was only being exercised in isolation. Add an equivalent symlink-escape test against anthias_assets so the actual code path used by the views is covered. * Refresh the anthias_django/settings.py docstring + Django doc URLs from /3.2/ → /4.2/ to match the pinned Django version. 15 view tests pass (was 17 — lost 3 SafeJoinTest + gained 1 symlink test against the real view). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: refresh architecture diagram for uvicorn migration Drop the anthias-nginx and anthias-websocket nodes (and their edges) from docs/d2/anthias-diagram-overview.d2 — the user now talks directly to anthias-server (uvicorn handling HTTP + /ws), Celery fans out asset-update events through the Redis-backed Channels layer, and the viewer fetches media from anthias-server over HTTP. Regenerate the SVG with d2 v0.7.1. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix: address Copilot SSL + CSRF / WS-origin feedback * Dual uvicorn listeners when SSL is enabled (Copilot #1, #2). HTTP on $HTTP_PORT (default 8080) for inter-container traffic — viewer + webview hit anthias-server over plain HTTP on the Docker network and cannot validate uvicorn's self-signed cert. HTTPS on $HTTPS_PORT (default 8443) for external clients. bin/enable_ssl.sh now appends 443:8443 to the compose ports list (instead of using `!override` to swap 80:8080 for 443:8080), so port 80 stays available for backward compatibility and the Docker-network HTTP port keeps working. * Drop CSRF_TRUSTED_ORIGINS = ['http://', 'https://'] (Copilot #3). Verified via Django shell: those leading wildcards are ignored by Django 4.2 (only subdomain wildcards like https://.example.com are honoured), so the setting was a no-op. Same-origin POSTs still pass through Django's built-in Origin/Host check. Re-add channels.security.websocket.AllowedHostsOriginValidator to the WebSocket router (Copilot #5). Currently a no-op under ALLOWED_HOSTS=[''], but tightening ALLOWED_HOSTS later will now also tighten /ws. Smoke test (dev + SSL override): - HTTP http://localhost:8000/ -> 200 - HTTPS https://localhost:8443/ -> 200 - HTTP http://localhost:8443/ -> 000 (TLS-only, expected) - internal http://localhost:8080/ -> 200 - 15 view tests still pass. Note: Copilot #4 (Docker-bridge CIDR is bypassable via the published port) is documented in views_files.py as defense-in-depth and matches the original nginx posture; switching to app-layer auth is out of scope for this PR. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> refactor(ssl): switch from in-uvicorn TLS to a Caddy sidecar The previous SSL implementation gave anthias-server two uvicorn listeners (HTTP + HTTPS) so the viewer/webview could keep talking plain HTTP over the Docker network while external clients got TLS. That dual-listener dance is non-zero overhead and complicates signal handling. Switch to the standard reverse-proxy pattern instead. When SSL is enabled by bin/enable_ssl.sh: * anthias-server stays a single uvicorn listener on plain HTTP 8080 (no SSL_CERTFILE/SSL_KEYFILE knobs, no dual-port logic). * A Caddy sidecar (caddy:2-alpine, only present when the override is installed) terminates TLS on host port 443, redirects 80→443, and reverse-proxies to anthias-server:8080 — so X-Forwarded-Proto / X-Forwarded-For are forwarded as-is by Caddy. * The override removes anthias-server's external port mapping (`ports: !override []`), so all external traffic must enter through Caddy and the IP allowlists in views_files.py see the original LAN client IP rather than the docker-bridge gateway. Inter-container traffic is unchanged. * `FORWARDED_ALLOW_IPS=` is set on anthias-server in the override — safe because anthias-server is no longer reachable from outside the Docker network — and `SECURE_PROXY_SSL_HEADER` is added in Django settings so request.is_secure() returns True for HTTPS callers. When SSL is not enabled there is zero new container, zero new config — the base compose file is untouched and Caddy isn't pulled or run. bin/disable_ssl.sh now also removes the anthias-caddy container before deleting the override, so HTTPS-only state is fully reversed. Smoke-tested with a temporary Caddy override: - HTTPS via Caddy: 200 - HTTP via Caddy: 301 → https://... - Direct anthias-server: refused (port mapping dropped by override) - WebSocket upgrade: 101 Switching Protocols - request.is_secure() with X-Forwarded-Proto=https: True - 15 anthias_app view tests still pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(views_files): document IP-allowlist threat model Spell out exactly when the docker-bridge CIDR check is and isn't a real perimeter: * No-SSL default: anthias-server is published as 80:8080, so requests arrive with REMOTE_ADDR set to the docker bridge gateway (172.x) and LAN clients aren't actually excluded. Trying to plug the gap with auth would be security theatre — credentials would travel in plaintext over the LAN anyway. * SSL via the Caddy sidecar: Caddy terminates TLS, rewrites X-Forwarded-For, uvicorn honours it (FORWARDED_ALLOW_IPS=), and the check sees the real client IP — so the bypass is closed for any deployment that actually cares about confidentiality. This is documentation only; no behavioural change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> feat(ssl): add --domain (auto Let's Encrypt) + drop openssl shim bin/enable_ssl.sh now has three modes instead of two: * Default (no args) — Caddy issues per-SNI certs lazily from its built-in local CA via `tls internal { on_demand }`. Drops the openssl self-signed-cert generation step entirely; Caddy persists the CA in the anthias-caddy-data volume and rotates leaf certs itself. Browsers still warn (CA is local) but no openssl/cert hygiene is needed on the host. * `--domain example.com [--email you@example.com] [--staging]` — Caddy auto-issues + renews from Let's Encrypt. Caddy auto-creates the HTTP→HTTPS redirect for hostname sites. Use `--staging` to point at the ACME staging endpoint while testing, so the production rate limits aren't burned. * `--cert /path/to/cert.pem --key /path/to/key.pem [--domain ...]` — unchanged: bring your own cert, Caddy serves it as-is with `auto_https off`. Verified: - All three Caddyfiles pass `caddy validate`. - Default mode end-to-end: HTTPS=200 with cert from "Caddy Local Authority - ECC Intermediate", per-SNI SANs (DNS:localhost, IP Address:192.168.99.99 etc.), HTTP→HTTPS=301, /ws upgrade=101, anthias-server's external port mapping is dropped so direct access is refused. Docs (CLAUDE.md, docs/README.md, docs/developer-documentation.md) updated to describe the Caddy sidecar instead of in-uvicorn TLS. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix: address self-review findings on PR #2757 * Gate SECURE_PROXY_SSL_HEADER on FORWARDED_ALLOW_IPS (anthias_django/settings.py): without the gate, a client on a plain-HTTP deploy could send `X-Forwarded-Proto: https` and flip `request.is_secure()`. Django reads the header from META directly, independent of uvicorn's --proxy-headers flag, so the previous unconditional setting was actually exploitable in non-SSL mode (secure-cookied sessions would drop on the next plain-HTTP request, redirects would point at https:// URLs that don't exist). Verified live: non-SSL → SECURE_PROXY_SSL_HEADER is None and is_secure() with spoofed XFP=https returns False; SSL via Caddy override → header is set and is_secure() returns True. * Replace the isfile() pre-check + open() in anthias_assets and static_with_mime with a try/except FileNotFoundError around open() (anthias_app/views_files.py). Eliminates a (tiny but real) TOCTOU window between the stat and the open. IsADirectoryError handled too, since `realpath('/dir/')` resolves to the directory and open() would otherwise 500. * Comment FORWARDED_ALLOW_IPS=* assumption in bin/enable_ssl.sh: the wildcard is only safe because the override drops anthias-server's external port mapping, so any future edit that re-adds a host:port publication has to either tighten the wildcard to Caddy's IP/CIDR or unset it. * Replace ANSI-C escape sequences in the Caddyfile generator with plain multi-line strings. `read -r -d ''` was the first attempt but it strips trailing newlines, which collapsed `auto_https off` onto the same line as `}` in cert mode. Multi-line literals with echo "$VAR" are unambiguous and Caddy validates all three modes cleanly again. * Add a docker-volume cleanup hint to bin/disable_ssl.sh: Caddy's local CA persists in anthias_anthias-caddy-data so an enable → disable → enable cycle reuses the same CA (intentional — browsers that trusted it stay trusted), and operators who want a fresh CA now have the exact `docker volume rm` command in the script's output. 15 view tests still pass; default + SSL Caddyfiles still validate; default + SSL endpoints still return 200 / 301 / 101 in smoke tests. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix: address Copilot's host/MIME hardening feedback Two security tightenings on top of the prior SECURE_PROXY_SSL_HEADER gate (which Copilot flagged on a stale snapshot — that one's already fixed in `07b784b9`): * `ALLOWED_HOSTS` is now driven by the `ALLOWED_HOSTS` env var, with `` kept as the default so flexible LAN-by-IP / mDNS access still works out of the box. Operators on hardened LANs can opt into a strict allowlist (`ALLOWED_HOSTS=192.168.1.50,anthias.local,...`) to defend against DNS-rebinding without us guessing the right set of hostnames at install time. Verified the env override parses to `['192.168.1.50', 'anthias.local', 'localhost']`. `static_with_mime` now allowlists the `?mime=` query param against a small set of download-only types (`application/{gzip,octet-stream,x-gzip,x-tar,x-tgz,zip}`) instead of accepting whatever the caller sends. Closes the XSS footgun where `?mime=text/html` would have served a stored file as HTML. The frontend's only legitimate caller (the backup download) sends `application/x-tgz`, which is in the allowlist; anything else falls back to mimetypes.guess_type. Added `test_mime_override_rejects_html` to lock that behaviour in. 16 view tests pass; ruff clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-27 12:51:40 +01:00
Viktor Petersson	07a8f656e7	fix(ci): pin bun-builder stage to BUILDPLATFORM for 32-bit ARM builds (#2756 ) Follow-up to #2755. With the uv image manifest fix in place, master's buildx matrix surfaced a second 32-bit ARM blocker: ERROR: failed to resolve source metadata for docker.io/oven/bun:1.3.13-slim: no match for platform in manifest: not found oven/bun publishes only linux/amd64 and linux/arm64 manifests, so a target-platform build (linux/arm/v7 for pi3, linux/arm/v8 32-bit for pi4, linux/arm/v6 for pi1/pi2) can't pull the image at all. The bun-builder stage in Dockerfile.server.j2 only exists to compile JS/CSS into /app/static/dist/. Its output is platform-independent — the next stage COPYs the dist tree into the target image. So pin the stage to $BUILDPLATFORM and let it always run natively on the build host, regardless of the target. This also avoids a slow QEMU-emulated `bun run build` on the arm64 builder. Out of scope: the development branch's `COPY --from=oven/bun:1.3.13-slim /usr/local/bin/bun` is genuinely platform-dependent (it copies the bun binary into the runtime image) and not exercised by the production CI matrix. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 21:49:56 +01:00
Viktor Petersson	a871b6e0f8	fix(ci): unbreak 32-bit ARM builds and make latest-* tag updates atomic (#2755 ) * fix(ci): unbreak 32-bit ARM builds and make latest-* tag updates atomic Fixes #2754, in which a fresh x86 install pulled screenly/anthias-nginx:latest-x86 with the post-rename nginx config (`alias /data/anthias/staticfiles/`) but screenly/anthias-server:latest-x86 from two days earlier, still pre-rename (`STATIC_ROOT = '/data/screenly/staticfiles'`). collectstatic wrote to one path while nginx served from another, so every /static/* request 404'd. Two underlying problems produced that mismatch: 1. Every Docker Image Build run on master since #2744 has failed at `COPY --from=ghcr.io/astral-sh/uv:0.9.17 /uv /uvx /usr/local/bin/` for pi3 (linux/arm/v7) and pi4-32 (linux/arm/v8). The prebuilt uv image only publishes linux/amd64 and linux/arm64/v8 manifests, so any 32-bit ARM target fails resolving its manifest. uv-builder.j2 already special-cased pi1/pi2 to install uv via `pip3 install uv`, but that gate was on `board` and so missed pi3 / pi4-32. Switch the gate to `target_platform in ['linux/arm/v6', 'linux/arm/v7', 'linux/arm/v8']` (board alone can't disambiguate pi4 from pi4-64 since both report board='pi4') and thread target_platform through the Jinja context from tools.image_builder. 2. The buildx matrix pushed both the immutable <short-hash>-<board> tag and the floating latest-<board> tag in the same step, with fail-fast=true. When pi4 wifi-connect failed first, fast siblings that had already pushed (x86 nginx) kept their advance while slow ones (x86 server) got cancelled before push. Latest-x86 ended up half new, half old — exactly the symptom in the bug. Decouple the two: - tools.image_builder gains --skip-latest-tag, omitting the floating tag from the per-job push. - The buildx matrix now passes --skip-latest-tag and runs with fail-fast: false (so a single platform failure no longer cancels siblings; immutable short-hash pushes are harmless on their own). - A new publish-latest job, needs: buildx, mirrors each <short-hash>-<board> onto latest-<board> via `docker buildx imagetools create`. Because it is gated on the entire matrix succeeding, latest-* now advances as a coherent set or stays put. imagetools create re-points the registry tag without re-uploading layers, so it costs seconds per image. balena already used the immutable short-hash tag, so its `needs: buildx` is unchanged. Verified locally: rebuilt screenly/anthias-server:latest-x86 from this branch, ran collectstatic against the same host bind mount the production compose template uses, then started the unchanged screenly/anthias-nginx:latest-x86 (sha256:f6ef9c4c… — the exact image hash from the issue). HEAD /static/admin/css/autocomplete.css and HEAD /static/dist/css/anthias.css both returned 200 with full bodies (9 KB and 235 KB respectively). Generated Dockerfiles for every board confirm the platform gate: pi1, pi2, pi3, pi4 → `pip3 install uv`; pi4-64, pi5, x86 → `COPY --from=ghcr.io/astral-sh/uv:0.9.17`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix: address Copilot review feedback - docker/uv-builder.j2: pin both uv install paths to a single source-of-truth `uv_version` (0.9.17). The 32-bit ARM fallback previously did `pip3 install uv` (unpinned), which would have drifted the moment a new uv release lands on PyPI; now both the COPY-from-prebuilt path and the PyPI fallback use the exact same pinned version, so cross-arch builds stay reproducible. - .github/workflows/docker-build.yaml: rebuild publish-latest as a single sequential job instead of a matrix. With the previous fail-fast: false matrix, a transient registry error on one (board, service) retag wouldn't stop other parallel runners from blindly advancing latest-* on their slice — exactly the partial-coherence problem Copilot flagged. The new shape: - single job, no matrix - `set -euo pipefail` so the first failure stops the rest - preflight that resolves every <short-hash>-<board> tag before any retag fires, so a missing source tag fails the job before it mutates the registry - retags grouped under `::group::` headers in the log - ~98 retags (7 boards × 7 services × 2 namespaces) run sequentially in well under two minutes since `imagetools create` only re-points a manifest, no layer uploads - tools/image_builder/__main__.py: soften the --skip-latest-tag help text. The previous wording claimed the latest-* update is "atomic across the build matrix"; in reality the gating is on the build matrix, not on a single transactional retag, and a registry hiccup mid-retag could still leave a small subset of latest-* tags transiently out of sync until the workflow is re-run. New wording is precise about both guarantees. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 21:36:56 +01:00
Viktor Petersson	3c96b541a1	refactor: rename legacy 'screenly' dirs to 'anthias' with auto-migration (#2753 ) * refactor: rename legacy 'screenly' dirs to 'anthias' with auto-migration For legacy reasons the host directories storing the cloned repo, user assets, and config + DB still carried the old 'screenly' name. Rename all three to their 'anthias' equivalents, plus the in-container paths, the screenly.db / screenly.conf filenames, /tmp/screenly.watchdog, /etc/sudoers.d/screenly_overrides, the ansible role, and the nginx URL location. Existing installations are migrated automatically: ~/screenly/ -> ~/anthias/ ~/screenly_assets/ -> ~/anthias_assets/ ~/.screenly/ -> ~/.anthias/ screenly.db -> anthias.db screenly.conf -> anthias.conf (paths rewritten in the body) /etc/sudoers.d/screenly_overrides -> /etc/sudoers.d/anthias_overrides Migration is driven by two new helpers: - bin/migrate_legacy_paths.sh: idempotent host-side rename. Self-relocates if invoked from inside the dir being renamed. Rewrites both relative and absolute path values inside screenly.conf. Leaves dir-level back-compat symlinks at the old paths and file-level symlinks (screenly.db, screenly.conf) inside the migrated config dir so user automation / one-version downgrade still find familiar names. - bin/migrate_in_container_paths.sh: defensive /data/.screenly and /data/screenly_assets symlinks invoked from the container start scripts, in case an older docker-compose.yml is still mounting the legacy paths during a partial upgrade. Wired into bin/install.sh (renames ~/screenly before clone_repo, then runs the in-repo helper after) and bin/upgrade_containers.sh (runs the helper near the top before regenerating docker-compose.yml). Out of scope (intentional): the screenly/anthias-* Docker Hub namespace, the Screenly/Anthias GitHub repo URLs, the screenly_ose Balena fleet, api.screenlyapp.com / apt.screenlyapp.com legacy URLs, and brand URLs in docs. Tests: added tests/test_migrate_legacy_paths.py (4 cases: full migration, absolute-path conf rewrite, idempotent rerun, fresh-install no-op) and tests/test_backup_helper.py::RecoverLegacyTarballTest (recover() still accepts pre-rename .tar.gz backups). Ruff clean. All 6 new tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * style: apply ruff format to new test files CI's `ruff format --check` flagged tests/test_backup_helper.py and tests/test_migrate_legacy_paths.py. Reformatted; behaviour unchanged, 6/6 migration-related tests still pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test: suppress SonarCloud S5042 on write-mode tarfile.open in fixtures The two new fixture-building calls in tests/test_backup_helper.py use `tarfile.open(..., 'w:gz')` (write mode), which Sonar's python:S5042 rule flags as "expanding this archive file" without distinguishing read from write. arcnames are hardcoded test inputs with no path-traversal surface, so the warning is a false positive here. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix: address Copilot review feedback - lib/backup_helper.py: harden recover() against tar path traversal (Zip Slip / CVE-2007-4559). New _safe_tar_member() rejects absolute paths, '..' components, non-regular-non-directory members (symlinks/hardlinks/devices), members outside the allowed top-level dirs, and any post-normalisation path that escapes $HOME. Iterates members manually instead of bulk extractall(), and passes filter='data' on Python with PEP-706 extraction filters (3.11.4+/3.12+) for belt-and-suspenders defence. - tests/test_backup_helper.py: BackupHelperTest now patches HOME to a per-test tmpdir so `tearDown` no longer rmtree's a real ~/anthias checkout when run on a developer workstation. Also added test_recover_skips_path_traversal_member, which proves a hostile tarball entry like `../evil.txt` is logged-and-skipped, not written outside $HOME. - docs/raspberry-pi5-ssd-install-instructions.md: capitalise "This" after the period. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: add missing leading slash to repo dir heading The heading for the cloned repo dir was rendered as `home/${USER}/anthias/`, while every other heading in the section uses absolute paths like `/home/${USER}/.anthias/`. Same fix applied to the legacy-path mention in the note below it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 13:34:53 +01:00
Viktor Petersson	c7ec6ea771	chore(build): replace webpack, npm, and jest with bun (#2746 ) * chore(deps): manage Python deps via uv dependency-groups Replaces the six service-scoped requirements.txt files with PEP 735 dependency-groups in pyproject.toml and rebuilds every Docker image as a two-stage build: a uv-builder stage (using the official ghcr.io/astral-sh/uv image, with a pip fallback for armv6) produces /venv via `uv sync --group <svc>`, which the runtime stage copies in. uv.lock becomes authoritative for all services. requirements/requirements.host.txt is kept as a committed, auto-generated artifact (`uv export --group host`) so bin/install.sh and the Ansible role keep working; a python-lint CI step enforces it stays in sync. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> chore(deps): bump Django, cryptography, pyOpenSSL, and 5 others - Django 4.2.29 → 4.2.30 (latest 4.2 LTS) - cryptography 3.3.2 → 46.0.7 (capped by pyOpenSSL 26's `cryptography<47`; cryptography 47 is incompatible with the latest pyOpenSSL) - pyOpenSSL 19.1.0 → 26.0.0 (required by newer cryptography ABI — pyOpenSSL 19 crashed at import against cryptography ≥ ~3.4) - requests 2.32.5 → 2.33.1 (aligned across every group, including docker-image-builder and local) - pyasn1 0.6.2 → 0.6.3 - redis 7.1.0 → 7.4.0 - Cython 3.2.3 → 3.2.4 - sh 1.8 → 2.2.2 (major bump; usages in celery_tasks.py, bin/wait.py, lib/utils.py stick to the stable `sh.<cmd>` + `sh.ErrorReturnCode_N` API — verified still works) - python-vlc 3.0.20123 → 3.0.21203 `mako` and `flatted` were requested but skipped: `mako` was already removed from the project (`9535745e`), and `flatted` is an npm dep in `package-lock.json`, not a Python dep. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(deps): bump wheel from 0.38.1 to 0.46.2 Closes Dependabot PR #2651. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(deps): manage Python deps via uv dependency-groups Replaces the six service-scoped requirements.txt files with PEP 735 dependency-groups in pyproject.toml and rebuilds every Docker image as a two-stage build: a uv-builder stage (using the official ghcr.io/astral-sh/uv image, with a pip fallback for armv6) produces /venv via `uv sync --group <svc>`, which the runtime stage copies in. uv.lock becomes authoritative for all services. requirements/requirements.host.txt is kept as a committed, auto-generated artifact (`uv export --group host`) so bin/install.sh and the Ansible role keep working; a python-lint CI step enforces it stays in sync. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> chore(deps): bump Django, cryptography, pyOpenSSL, and 5 others - Django 4.2.29 → 4.2.30 (latest 4.2 LTS) - cryptography 3.3.2 → 46.0.7 (capped by pyOpenSSL 26's `cryptography<47`; cryptography 47 is incompatible with the latest pyOpenSSL) - pyOpenSSL 19.1.0 → 26.0.0 (required by newer cryptography ABI — pyOpenSSL 19 crashed at import against cryptography ≥ ~3.4) - requests 2.32.5 → 2.33.1 (aligned across every group, including docker-image-builder and local) - pyasn1 0.6.2 → 0.6.3 - redis 7.1.0 → 7.4.0 - Cython 3.2.3 → 3.2.4 - sh 1.8 → 2.2.2 (major bump; usages in celery_tasks.py, bin/wait.py, lib/utils.py stick to the stable `sh.<cmd>` + `sh.ErrorReturnCode_N` API — verified still works) - python-vlc 3.0.20123 → 3.0.21203 `mako` and `flatted` were requested but skipped: `mako` was already removed from the project (`9535745e`), and `flatted` is an npm dep in `package-lock.json`, not a Python dep. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(deps): bump wheel from 0.38.1 to 0.46.2 Closes Dependabot PR #2651. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix: adapt sh 2.x API changes in wait.py and viewer Two real breakages uncovered by auditing every `sh.` call site against the sh 1.x → 2.x API: - bin/wait.py: `sh.grep(sh.route(), 'default')` no longer pipes in sh 2.x — the inner command stringifies to its stdout and becomes a literal argument to grep, producing `grep '<route_output>' default` and an ErrorReturnCode_2. Use the idiomatic `sh.grep('default', _in=sh.route())` instead. - viewer/__init__.py: `browser.process.alive` is gone in sh 2.x (`OProc` no longer exposes it). Use `browser.process.is_alive()[0]`, which returns the `(alive_bool, exit_code)` tuple. Plus two review nits: - Add trailing newline to docs/migrating-assets-to-screenly.md - Use `diff -u` in the requirements.host.txt CI drift check so failures print a readable unified diff. Verified against sh==2.2.2 inside the rebuilt server image: - `sh.grep('default', _in=sh.echo('…'))` pipes correctly - `cmd.process.is_alive()` → `(True, None)` while running, `(False, 0)` after wait() - `cmd.process.stdout.decode('utf-8')` still works on `_bg=True` processes 83/83 unit tests + 12/12 integration tests still pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> fix(docker): serialize apt cache access with sharing=locked The multi-stage uv-builder + runtime layout means two RUN steps can race on BuildKit's shared `/var/cache/apt` cache mount. apt requires an exclusive lock on /var/cache/apt/archives, so a concurrent apt-get in the sibling stage causes the build to fail with `E: Could not get lock /var/cache/apt/archives/lock`. BuildKit's default cache mount sharing mode is `shared` (unrestricted concurrent access). Switching to `sharing=locked` makes BuildKit serialize access across stages, matching apt's locking model. Discovered while cross-compiling `pi4-64` under QEMU, where the slower emulated apt-get in stage 1 overlapped with the host-speed apt-get in stage 2. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * ci: fix ansible-lint and sbom workflows ansible-lint (broken since 2026-04-08, #2732): - `ansible-community/ansible-lint-action@main` repo is gone (404), so every run failed with "Unable to resolve action". - Rewrite the workflow to use setup-uv + `uv run ansible-lint` from a new `ansible-lint==26.4.0` entry in the `dev-host` dependency group — matches the uv-based pattern already used by `python-lint.yaml`. - Add `.ansible-lint` config with a skip list covering 19 pre-existing violations in `ansible/` roles (`var-naming[no-role-prefix]`, `risky-shell-pipe`, `no-free-form`) so the workflow can go green today; follow-up PRs should drive the skip list down. - Extend the path triggers to fire on config, workflow, and lock changes — not just `ansible/`. sbom** (broken since 2026-04-02): - The `sbomify/github-action` renamed `SBOM_FILE` to `LOCK_FILE` for lockfile inputs. Every run has been failing with "`uv.lock` is a lock file, not an SBOM. Please use LOCK_FILE instead of SBOM_FILE." - Rename both `SBOM_FILE` envs (`package-lock.json` and `uv.lock`) to `LOCK_FILE`. Verified locally: `uv run ansible-lint ansible/` passes (0 failures, 0 warnings). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(build): replace webpack, npm, and jest with bun Collapses the JS toolchain to a single tool. Bun handles installs (replacing npm), bundling via `bun build` + `sass` CLI (replacing webpack + ts-loader + babel + mini-css-extract-plugin), and testing via `bun test` (replacing jest + ts-jest + jest-fixed-jsdom). Dev/test Dockerfiles pull the bun binary from the official `oven/bun` image via `COPY --from=`; production uses `oven/bun` as a builder stage. Removes 18 devDependencies and 5 config files; adds only `bunfig.toml` and `@happy-dom/global-registrator`. Drive-by fix: `FormData` was imported as a value from `@/types` in two files but is a type-only interface shadowing the browser global. Webpack+ts-loader silently erased it; Bun's bundler surfaced the bug. Converted to `import type`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(docker): symlink bunx to bun in dev and test images `bunx` is a symlink to `bun` in the official `oven/bun` image, so the single-file `COPY --from=oven/bun:...-slim /usr/local/bin/bun` missed it. Result: `bun run dev:css` / `bun run build:css` failed with `bunx: command not found` inside dev and test containers. Recreate the symlink after the copy. Production is unaffected because its builder stage uses `FROM oven/bun` (bunx already present). Caught by full end-to-end build verification. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * ci: SHA-pin all external GitHub Actions Addresses SonarCloud rule githubactions:S7637 ("Use full commit SHA hash for this dependency") and brings the repo in line with the hardened CI guidance from OpenSSF, CISA, and GitHub itself: tag refs like @v7 or @master are mutable and can be retargeted by the action owner or via compromise. Pinning to a full commit SHA removes that supply-chain risk. Every `uses:` reference to an external action across all 13 workflow files is now pinned by SHA, with the original tag preserved as an inline comment so the intent remains readable: uses: actions/checkout@de0fac2e45 # v6 Dependabot's github-actions ecosystem (already configured in .github/dependabot.yml) recognises this `<SHA> # <tag>` format and will update both the SHA and the comment together on future version bumps, so we don't lose automated update coverage. Scope: 21 distinct external actions × 73 total use sites across ansible-lint, build-balena-disk-image, build-webview, codeql-analysis, deploy-website, docker-build, generate-openapi-schema, javascript-lint, lint-workflows, python-lint, sbom, and test-runner. Local workflow references (./.github/workflows/...) left untouched. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs,chore: address review feedback on bun migration - Update CLAUDE.md and docs/developer-documentation.md to replace npm/webpack/jest references with bun equivalents. The old webpack ProvidePlugin bullet was superseded by tsconfig's react-jsx runtime; restate that. - Add comments in setupTests.ts explaining (1) why Bun's native fetch is stashed and restored around happy-dom's GlobalRegistrator (so MSW can intercept) and (2) why testing-library is imported dynamically after registration (so `screen` binds to a live document.body). - Narrow the production builder SCSS COPY back to `.scss` and drop the unused `bunfig.toml` copy (it's only consumed by `bun test`). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> fix(dev): fail-fast when a watcher crashes in `bun run dev` `wait` without arguments returns the last-exiting job's status, so a crashing JS or CSS watcher could leave the script reporting success. Track each watcher's PID, use `wait -n` to exit on the first failure, and kill the survivor via a trap. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-25 06:53:56 +01:00
Viktor Petersson	ee12387b06	chore(deps): manage Python deps via uv dependency-groups (#2744 ) * chore(deps): manage Python deps via uv dependency-groups Replaces the six service-scoped requirements.txt files with PEP 735 dependency-groups in pyproject.toml and rebuilds every Docker image as a two-stage build: a uv-builder stage (using the official ghcr.io/astral-sh/uv image, with a pip fallback for armv6) produces /venv via `uv sync --group <svc>`, which the runtime stage copies in. uv.lock becomes authoritative for all services. requirements/requirements.host.txt is kept as a committed, auto-generated artifact (`uv export --group host`) so bin/install.sh and the Ansible role keep working; a python-lint CI step enforces it stays in sync. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> chore(deps): bump Django, cryptography, pyOpenSSL, and 5 others - Django 4.2.29 → 4.2.30 (latest 4.2 LTS) - cryptography 3.3.2 → 46.0.7 (capped by pyOpenSSL 26's `cryptography<47`; cryptography 47 is incompatible with the latest pyOpenSSL) - pyOpenSSL 19.1.0 → 26.0.0 (required by newer cryptography ABI — pyOpenSSL 19 crashed at import against cryptography ≥ ~3.4) - requests 2.32.5 → 2.33.1 (aligned across every group, including docker-image-builder and local) - pyasn1 0.6.2 → 0.6.3 - redis 7.1.0 → 7.4.0 - Cython 3.2.3 → 3.2.4 - sh 1.8 → 2.2.2 (major bump; usages in celery_tasks.py, bin/wait.py, lib/utils.py stick to the stable `sh.<cmd>` + `sh.ErrorReturnCode_N` API — verified still works) - python-vlc 3.0.20123 → 3.0.21203 `mako` and `flatted` were requested but skipped: `mako` was already removed from the project (`9535745e`), and `flatted` is an npm dep in `package-lock.json`, not a Python dep. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(deps): bump wheel from 0.38.1 to 0.46.2 Closes Dependabot PR #2651. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix: adapt sh 2.x API changes in wait.py and viewer Two real breakages uncovered by auditing every `sh.` call site against the sh 1.x → 2.x API: - bin/wait.py: `sh.grep(sh.route(), 'default')` no longer pipes in sh 2.x — the inner command stringifies to its stdout and becomes a literal argument to grep, producing `grep '<route_output>' default` and an ErrorReturnCode_2. Use the idiomatic `sh.grep('default', _in=sh.route())` instead. - viewer/__init__.py: `browser.process.alive` is gone in sh 2.x (`OProc` no longer exposes it). Use `browser.process.is_alive()[0]`, which returns the `(alive_bool, exit_code)` tuple. Plus two review nits: - Add trailing newline to docs/migrating-assets-to-screenly.md - Use `diff -u` in the requirements.host.txt CI drift check so failures print a readable unified diff. Verified against sh==2.2.2 inside the rebuilt server image: - `sh.grep('default', _in=sh.echo('…'))` pipes correctly - `cmd.process.is_alive()` → `(True, None)` while running, `(False, 0)` after wait() - `cmd.process.stdout.decode('utf-8')` still works on `_bg=True` processes 83/83 unit tests + 12/12 integration tests still pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> fix(docker): serialize apt cache access with sharing=locked The multi-stage uv-builder + runtime layout means two RUN steps can race on BuildKit's shared `/var/cache/apt` cache mount. apt requires an exclusive lock on /var/cache/apt/archives, so a concurrent apt-get in the sibling stage causes the build to fail with `E: Could not get lock /var/cache/apt/archives/lock`. BuildKit's default cache mount sharing mode is `shared` (unrestricted concurrent access). Switching to `sharing=locked` makes BuildKit serialize access across stages, matching apt's locking model. Discovered while cross-compiling `pi4-64` under QEMU, where the slower emulated apt-get in stage 1 overlapped with the host-speed apt-get in stage 2. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * ci: fix ansible-lint and sbom workflows ansible-lint (broken since 2026-04-08, #2732): - `ansible-community/ansible-lint-action@main` repo is gone (404), so every run failed with "Unable to resolve action". - Rewrite the workflow to use setup-uv + `uv run ansible-lint` from a new `ansible-lint==26.4.0` entry in the `dev-host` dependency group — matches the uv-based pattern already used by `python-lint.yaml`. - Add `.ansible-lint` config with a skip list covering 19 pre-existing violations in `ansible/` roles (`var-naming[no-role-prefix]`, `risky-shell-pipe`, `no-free-form`) so the workflow can go green today; follow-up PRs should drive the skip list down. - Extend the path triggers to fire on config, workflow, and lock changes — not just `ansible/`. sbom** (broken since 2026-04-02): - The `sbomify/github-action` renamed `SBOM_FILE` to `LOCK_FILE` for lockfile inputs. Every run has been failing with "`uv.lock` is a lock file, not an SBOM. Please use LOCK_FILE instead of SBOM_FILE." - Rename both `SBOM_FILE` envs (`package-lock.json` and `uv.lock`) to `LOCK_FILE`. Verified locally: `uv run ansible-lint ansible/` passes (0 failures, 0 warnings). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * ci: SHA-pin all external GitHub Actions Addresses SonarCloud rule githubactions:S7637 ("Use full commit SHA hash for this dependency") and brings the repo in line with the hardened CI guidance from OpenSSF, CISA, and GitHub itself: tag refs like @v7 or @master are mutable and can be retargeted by the action owner or via compromise. Pinning to a full commit SHA removes that supply-chain risk. Every `uses:` reference to an external action across all 13 workflow files is now pinned by SHA, with the original tag preserved as an inline comment so the intent remains readable: uses: actions/checkout@de0fac2e45 # v6 Dependabot's github-actions ecosystem (already configured in .github/dependabot.yml) recognises this `<SHA> # <tag>` format and will update both the SHA and the comment together on future version bumps, so we don't lose automated update coverage. Scope: 21 distinct external actions × 73 total use sites across ansible-lint, build-balena-disk-image, build-webview, codeql-analysis, deploy-website, docker-build, generate-openapi-schema, javascript-lint, lint-workflows, python-lint, sbom, and test-runner. Local workflow references (./.github/workflows/...) left untouched. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(viewer): use RunningCommand.is_alive() instead of OProc tuple OProc.is_alive() returns (bool, exit_code); RunningCommand.is_alive() wraps that and returns just the bool. The wrapper is clearer than indexing into the tuple. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-25 06:48:36 +01:00
Nico Miguelino	f0f6497efc	chore(docker): use APT nodejs for pi3 and pi4 (#2678 ) NodeSource doesn't support armhf architecture (used by pi3/pi4), so fall back to APT-provided nodejs/npm for those boards. Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-02-23 23:10:11 -08:00
Nico Miguelino	29ae072514	chore: replace Poetry with `uv` for managing host dependencies (#2611 )	2025-12-16 05:03:27 -08:00
Nico Miguelino	6a822b8fe6	chore: migrate Node.js from v18.x to v22.x (#2491 )	2025-09-05 19:03:39 -07:00
Nico Miguelino	ea90fb80a7	fix: copy `tsconfig.json` when building Dockerfile.server in prod (#2361 )	2025-06-24 13:03:41 -07:00
Nico Miguelino	51e4511bba	feat: migrate to React (#2265 )	2025-05-26 21:04:19 -07:00
Nico Miguelino	ff1e023c0e	fix: use `WebView-v0.3.6` (#2230 )	2025-03-16 00:37:13 -07:00
Nico Miguelino	ca07fcbbec	fix: attempt to fix the CI pipeline for building Docker images (#2211 )	2025-02-06 12:11:18 -08:00
Nico Miguelino	c6550eaad7	fix: install `nodejs` and `npm` dependencies (#2210 )	2025-02-06 08:23:14 -08:00
Nico Miguelino	4f0f8e5a20	Adds support for Raspberry Pi 5 (#1868 )	2024-12-19 23:30:58 -08:00
Nico Miguelino	9983ba631b	fix: enforce HTTPS when using `curl` to install Poetry (#2152 ) * fix: enforce HTTPS when using `curl` to install Poetry * chore(ci): exclude development-related files from build pipeline	2024-12-05 13:19:00 -08:00
Nico Miguelino	7dd6d49881	chore: update development mode scripts to containerize Poetry and other relevant dependencies (#2144 )	2024-12-04 10:14:07 -08:00
dependabot[bot]	08a79e6e04	chore(deps): bump django from 3.2.18 to 4.2.16 in /requirements (#2096 ) * chore(deps): bump django from 3.2.18 to 4.2.16 in /requirements Bumps [django](https://github.com/django/django) from 3.2.18 to 4.2.16. - [Commits](https://github.com/django/django/compare/3.2.18...4.2.16) --- updated-dependencies: - dependency-name: django dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> * fix: fix CSRF issues caused by upgrade from Django 3 to 4 --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: nicomiguelino <nicomiguelino2014@gmail.com>	2024-12-04 08:51:57 -08:00
Nico Miguelino	47947f4210	chore(workflow): place Webpack-generated static files in `./static/dist` (#2130 )	2024-11-18 12:58:01 -08:00
Nico Miguelino	01d28d55ec	chore: make use of Webpack for building CSS and JS files (#2127 )	2024-11-15 11:17:08 -08:00
Nico Miguelino	1f8a866065	fix: failing x86 build (#2120 )	2024-11-08 23:37:03 -08:00
Nico Miguelino	c766045f3e	chore: use multi-stage builds for server images in both development and production environments (#2117 )	2024-11-08 21:59:42 -08:00
Nico Miguelino	f8749b123e	chore(workflow): port the Docker image builder script to Python (#2060 )	2024-11-07 06:04:32 -08:00

1 2 3 4 5 ...

271 Commits