Anthias

mirror of https://github.com/Screenly/Anthias.git synced 2026-06-10 17:18:43 -04:00

Author	SHA1	Message	Date
Viktor Petersson	8c6cfaf26a	feat(server): offer HandBrake GUI steps for rejected video uploads (#3040) * feat(server): offer HandBrake GUI steps for rejected video uploads - Add _handbrake_steps mirroring the ffmpeg recipe's codec/1080p choices - Persist the steps to metadata.error_handbrake on rejection - Render them as a numbered list with a handbrake.fr link in the modal Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(server): make HandBrake steps a real preset-based walkthrough - Lead with the stock "Fast 1080p30" preset (H.264 MP4, 1080p cap) - HEVC boards just flip Video Encoder to "H.265 (x265)" - Spell out source pick, Save As/Browse, and Start Encode - Drop the cap arg: the 1080p preset is the low-RAM fix too Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * test: assert on HANDBRAKE_URL constant, not the literal Reference processing.HANDBRAKE_URL instead of a duplicated URL literal so CodeQL stops flagging the assertion as incomplete-URL-substring sanitization (false positive in a test containment check). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * docs: address Copilot review on HandBrake steps - Reword final step: upload as a new asset (the Edit modal has no upload control) - Rename stale-clear test to mention error_handbrake too Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-09 08:55:57 +02:00
Viktor Petersson	5d21924080	fix(celery): don't report the by-design codec rejection to Sentry (#3041) The upload codec/resolution gate raises ``UnsupportedVideoCodecError`` as a deliberate, operator-facing rejection — it's surfaced in the UI as a "Failed" pill plus a copy-pasteable ffmpeg re-encode recipe via ``_NormalizeAssetTask.on_failure``. It's an expected outcome (e.g. Pi 5 has no H.264 HW decode block, an unknown arm64 board can't certify any codec), not a fault, so it shouldn't reach Sentry — but every rejection was landing there as an unhandled task error (Sentry ANTHIAS-1J, ANTHIAS-20). List it in the video task's ``throws``: Celery then logs it at INFO without a traceback, and sentry-sdk's CeleryIntegration skips ``task.throws`` exceptions (``_capture_exception`` returns early on ``isinstance(exc, task.throws)``), so the gate stops flooding Sentry. ``on_failure`` still runs, so the operator-facing error pill and recipe are unchanged. Regression test asserts the video task declares it in ``throws`` and the image task (which never raises it) does not. Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-09 08:45:01 +02:00
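The `throws` behaviour described in the commit above can be illustrated with a small stand-alone sketch. It deliberately imports neither Celery nor sentry-sdk: `FakeTask`, `NormalizeVideoTask`, `NormalizeImageTask`, and `should_report` are hypothetical stand-ins that only mimic the `isinstance(exc, task.throws)` check quoted in the message, not the real implementation.

```python
class UnsupportedVideoCodecError(Exception):
    """Stand-in for the deliberate, operator-facing upload rejection."""


class FakeTask:
    # Hypothetical base class; mirrors only the idea of a task-level
    # `throws` tuple listing expected exception types.
    throws: tuple = ()


class NormalizeVideoTask(FakeTask):
    # The video task declares the by-design rejection as expected.
    throws = (UnsupportedVideoCodecError,)


class NormalizeImageTask(FakeTask):
    # The image task never raises the codec error, so it lists nothing.
    throws = ()


def should_report(task: FakeTask, exc: BaseException) -> bool:
    """Return False for exceptions the task declared as expected.

    Simplified imitation of the early return described in the commit
    message; an assumption for illustration, not sentry-sdk code.
    """
    return not isinstance(exc, task.throws)
```

With this shape, a codec rejection raised by the video task is skipped, while any other exception (or the same exception on a task that did not declare it) would still be reported.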
Viktor Petersson	10c68b26cc	feat(viewer,build,balena): add arm64/Qt6 pi3-64 board and the Rock Pi 4 fleet; keep 32-bit pi3 as legacy (#2985 ) * feat(viewer,build): add arm64/Qt6 pi3-64 board; keep 32-bit pi3 as legacy Revises issue #2906 Phase 2. The original plan (delete the Qt 5 toolchain, force Pi 2/Pi 3 onto Qt 6) is abandoned: Qt 5 was fixed up on master and stays. Instead, add a NEW board target `pi3-64` — a 64-bit (arm64) Qt 6 viewer image for Raspberry Pi 3 hardware on a 64-bit OS — as its own image stream, disk image, and balena fleet. The legacy 32-bit armhf/Qt5 `pi3` board is left untouched and flagged as legacy/maintenance. pi3-64 mirrors the existing `pi4-64` path (Qt 6, eglfs_kms; video played in-process by AnthiasViewer's QtMultimedia pipeline — QMediaPlayer + the ffmpeg/libavcodec backend with V4L2 HW decode, no external player). VideoCore IV is H.264-only HW decode. Board selection is by `uname -m`: a Pi 3 on a 64-bit OS gets `pi3-64`, a 32-bit OS keeps `pi3` (the model string is identical on both arches). - image_builder: pi3-64 build params (arm64) + is_qt6; constants. - Dockerfile.viewer.j2 + start_viewer.sh: pi3-64 shares the pi4-64 eglfs KMS path; renamed board-agnostic eglfs-kms-pi4.json -> eglfs-kms.json. - Detection: install.sh / upgrade_containers.sh (aarch64 Pi 3 -> pi3-64). - Runtime: media_player force_mpv set (selects MPVMediaPlayer, the QtMultimedia D-Bus shim); processing codec grid {'h264'}. - CI: docker-build matrix + mirror-latest-tags. - Balena (fleet screenly_ose/anthias-pi3-64, device type raspberrypi3-64): disk-image + manual-deploy workflows, balena_ota_deploy.sh, balena_fleet_maintenance.py, balena_unpin_devices.py, deploy_to_balena.sh, balena-host-config.json. - Pi Imager: SUPPORTED_BOARDS += pi3-64 (non-maintenance); pi3 stays legacy. - Docs + tests. 
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * docs(website): link the Pi 3 (64-bit) bullet like its siblings Copilot review: the list is introduced as 'links to the images', so the new pi3-64 entry should be navigable like the surrounding bullets. Link the label to the release-images section. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * feat(balena): add the Rock Pi 4 fleet (screenly_ose/anthias-rockpi4) Wires the anthias-rockpi4 balena fleet (device type rockpi-4b-rk3399) into the OTA deploy + disk-image pipeline. The fleet has no board-specific image build: it runs the generic arm64 containers, so bin/balena_ota_deploy.sh / bin/deploy_to_balena.sh map the rockpi4 board to the <short-hash>-arm64 image tags (and strip the /dev/vchiq mount — no VideoCore on RK3399), and the disk-image preflight verifies the arm64 images exist. Root-cause fix for the fleet's codec gate: balena ships no anthias_host_agent service, so host:board_subtype was never published and resolve_device_key() stayed 'arm64' — whose HW-decode set is empty, rejecting every video upload. The model-string → subtype table moves to the dependency-free anthias_common.device_helper.detect_board_subtype (single source, imported by host_agent), and anthias_common.board.get_board_subtype now falls back to reading /proc/device-tree/model in-container when Redis has no value. The device tree is kernel-global — the same mechanism get_device_type has always used for Pi detection — so the rockpi4 fleet resolves its {h264, hevc} envelope without a host-side daemon, and compose installs whose host_agent died self-heal too. - build-balena-disk-image.yaml: rockpi4 in both matrices, fleet + rockpi-4b-rk3399 image cases, arm64 images in the preflight check. - deploy-balena-manual.yaml: rockpi4 board option. - balena-host-config.json: rockpi4 declared {} (config.txt is RPi-only; the reconcile hard-fails on a missing key). 
- balena_fleet_maintenance.py / balena_unpin_devices.py: fleet added. - tests: get_board_subtype Redis-first + device-tree-fallback order; detect_board_subtype patch targets follow the move. - docs: board-enablement, balena-fleet-host-config, installation-options. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-07 07:49:12 +02:00
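The Redis-first, device-tree-fallback lookup order described in this commit could be sketched roughly as follows. The function names follow the commit message, but the signatures, the injected `redis_get` / `read_model` callables, and the single-entry model table are assumptions made for illustration; the real table lives in `anthias_common.device_helper` and is not reproduced here.

```python
from typing import Callable, Optional, Tuple

# Hypothetical model-substring -> subtype table (illustrative only).
_MODEL_SUBTYPES: Tuple[Tuple[str, str], ...] = (
    ("rock pi 4", "rockpi4"),
)


def detect_board_subtype(model: str) -> str:
    """Map a device-tree model string to a board subtype ('' if unknown)."""
    # /proc/device-tree/model is NUL-terminated, so strip that first.
    lowered = model.rstrip("\x00").strip().lower()
    for needle, subtype in _MODEL_SUBTYPES:
        if needle in lowered:
            return subtype
    return ""


def get_board_subtype(
    redis_get: Callable[[str], Optional[str]],
    read_model: Callable[[], str],
) -> str:
    """Prefer the value published to Redis; fall back to the device tree."""
    published = redis_get("host:board_subtype")
    if published:
        return published
    try:
        return detect_board_subtype(read_model())
    except OSError:
        # No readable device tree (e.g. x86 or a CI host).
        return ""
```

Passing the two lookups in as callables keeps the sketch testable without a Redis server or a real `/proc/device-tree/model` file.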
Viktor Petersson	b57c1a16a7	feat(viewer,server): 1 GB SBC enablement — low-RAM degradation gates (#2915 ) Boards with < 1.5 GiB MemTotal (Pi 2/Pi 3 1GB, Pi 4 1GB, Rock Pi 4 1GB, generic-arm64 1GB SKUs) OOM-cycled when QtMultimedia loaded a 4K HEVC asset alongside the two QtWebEngine renderers introduced by #2905. On-device repro on a 1 GB Rock Pi 4 confirmed `global_oom` on the container's bash process followed by a restart loop, plus the kernel keeping sshd in banner-exchange until power-cycle. This patch puts the device into a graceful degraded mode before the OOM cascade fires. * bin/upgrade_containers.sh — exports ANTHIAS_LOW_RAM=1 when TOTAL_MEMORY_KB < 1572864 (1.5 GiB), 0 otherwise. Threshold cleanly splits the 1 GB SKUs from the 2 GB+ SKUs in the supported fleet. * docker-compose.yml.tmpl — forwards ANTHIAS_LOW_RAM to the viewer container. * anthias_host_agent — publishes host:total_mem_kb to Redis alongside the existing host:board_subtype so server-side gates can read MemTotal without re-opening /proc/meminfo themselves. * anthias_common.board — adds LOW_RAM_THRESHOLD_KB, get_total_mem_kb(), is_low_ram_device() helpers. * anthias_webview (view.cpp) — when ANTHIAS_LOW_RAM=1, aliases webView2 onto webView1 so the rest of the dual-buffer logic still runs but never spawns a second Chromium renderer (~100 MB physical RAM saved per device). UX: page swap is in-place with a brief blank during load, no preloaded crossfade. * anthias_server.processing — extends the codec gate with a low-RAM 1080p resolution cap. Above-1080p uploads on low-RAM boards are rejected at upload with the existing recipe machinery, extended to inject `-vf scale=1920:1080:force_original_aspect_ratio=decrease` so the operator's re-encode lands within the envelope. If an upload fails BOTH codec and resolution gates, the codec message wins but the recipe folds in the downscale so a single re-encode satisfies both. 
* Diagnostics page — Memory card surfaces a "Low-RAM mode" badge with the threshold MiB so operators can see why the device degraded. /api/v2/info's `memory` field gains a `low_ram: bool` for API clients. * docs/board-enablement.md — rewrote the stale `--hwdec=drm-copy` / per-codec dispatch text (removed in #2905); documented the known rkvdec mainline limitation on Armbian 6.18 (HEVC stateless engages via `-hwaccel drm` but produces decode errors; H.264 has no v4l2_request binding in +rpt1 7.1.3) and the new low-RAM mode. Tests cover the four matrix cells (low/high-RAM × in/over-cap), the recipe shape with and without `cap_to_1080p`, the cap defence against unknown / zero dimensions, host_agent's MemTotal parser, and the API endpoint's new low_ram field. Out of scope (separate work): HEVC HW-decode on arm64 — depends on an upstream rkvdec driver fix landing in Debian-shipped Armbian kernels; Anthias does not maintain its own kernel/distro. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-21 07:48:46 +02:00
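A minimal sketch of the low-RAM threshold check from this commit, under stated assumptions: `LOW_RAM_THRESHOLD_KB` and `is_low_ram_device` are named in the message, but their signatures here are guesses, `parse_mem_total_kb` is a hypothetical helper, and treating an unknown MemTotal as "not low-RAM" is an assumption rather than documented behaviour.

```python
from typing import Optional

# 1.5 GiB expressed in KiB: 1.5 * 1024 * 1024 = 1572864 (value from the commit).
LOW_RAM_THRESHOLD_KB = 1572864


def parse_mem_total_kb(meminfo: str) -> Optional[int]:
    """Extract MemTotal (in kB) from /proc/meminfo-style text."""
    for line in meminfo.splitlines():
        if line.startswith("MemTotal:"):
            fields = line.split()
            if len(fields) >= 2 and fields[1].isdigit():
                return int(fields[1])
    return None


def is_low_ram_device(total_mem_kb: Optional[int]) -> bool:
    """True when MemTotal is strictly below the 1.5 GiB threshold."""
    if total_mem_kb is None:
        # Assumption: an unknown total does not trigger degraded mode.
        return False
    return total_mem_kb < LOW_RAM_THRESHOLD_KB
```

The comparison is strict (`<`), matching the `TOTAL_MEMORY_KB < 1572864` condition quoted for `bin/upgrade_containers.sh`, so a board reporting exactly 1572864 kB is not classed as low-RAM.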
Viktor Petersson	57b4f25c77	feat(viewer,server): per-board HW decode dispatch + codec gate on upload (#2885 ) * perf(viewer): pi4-64/pi5 use mpv --vo=gpu --gpu-context=drm On Pi the connector's preferred mode is usually 4K (most modern TVs report 3840x2160 in their EDID), and the previous --vo=drm path ran a CPU zimg upscale from 1080p source to that 4K output. On a 4-core A72 that's the bottleneck — mpv VO drops 59-75 frames per 30s on a stock 1080p H.264 signage clip. Pi5's A76 is faster but the same upscale path is still the limit. Switching the VO to GL with the DRM context (mpv --vo=gpu --gpu-context=drm) hands the upscale to the V3D and leaves everything else identical — mpv still owns DRM master, still reads --drm-mode=1920x1080@60 (kept), still runs in --vd-lavc-threads=4 software decode (mpv 0.40 in Debian Trixie has v4l2m2m-copy but not v4l2request, so --hwdec=auto-safe falls back to software on this asset; that hasn't changed). Measured on a 4K-connected Pi4-64 Rev 1.5, same clip, same 30 s window: --vo=drm : 59-75 vo drops / 30 s --vo=gpu --gpu-context=drm (this patch) : 3-6 vo drops / 30 s `decoder-frame-drop-count` is 0 in both — the regression was purely on the VO side, and shifting scaling off the CPU is what buys the headroom. x86 (cage + --gpu-context=wayland) is unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * perf(viewer): drop --drm-mode pin on Pi4-64/Pi5 under --gpu-context=drm The previous commit moved Pi4-64/Pi5 to `mpv --vo=gpu --gpu-context=drm` but kept the `--drm-mode=1920x1080@60` pin from the old --vo=drm path. On-device testing showed the pin hurts throughput under GBM: 294 vo drops/30s with the pin, 3-6 without, on the same 4K-connected Pi4 and the same H.264 clip. The pin existed in the first place to dodge CPU zimg upscale to 4K, which the A72 couldn't keep up with on the legacy --vo=drm path. 
Under --gpu-context=drm the V3D does the scaling for free at the connector's preferred mode, so the workaround is no longer needed and is in fact harmful. `--vd-lavc-threads=4` stays — software decode under --hwdec=auto-safe (mpv 0.40 has v4l2m2m-copy but not v4l2request) still benefits from explicit threading. Verified on a 4K-connected Pi4-64 across H.264 (30/24 fps) and HEVC clips: 2-6 vo drops/30s in every case. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor(viewer): consolidate Qt6 boards onto cage + Wayland, pin Pi 4 to 1080p Folds in PR #2883: Pi 4-64 / Pi 5 now run under cage with mpv on --vo=gpu --gpu-context=wayland, joining x86 and arm64 on a single Wayland-based display stack. Drops the --vo=drm legacy path entirely from MPVMediaPlayer. Qt 5 boards (pi2 / pi3) stay on linuxfb via VLCMediaPlayer — out of scope here. Replaces the perf branch's `--vo=gpu --gpu-context=drm` standalone fix with the consolidated cage path. The previous standalone finding (3-6 vo drops / 30 s on Pi 4 at 4K) was a Pi-without-cage optimization; once Pi runs under cage like every other Qt6 board, the same trick applies via wayland but cage's composite step adds its own pass and the V3D on Pi 4 can't keep up at 4K (738 vo drops / 30 s measured at native 4K under cage). Fix: move the 1080p mode pin one layer up from app code to host config — the new ansible/.../cmdline.txt.j2 conditional appends `video=HDMI-A-1:1920x1080@60 video=HDMI-A-2:1920x1080@60` when `device_type == 'pi4-64'`. With output pinned to 1080p there's no upscale anywhere in the pipeline, matching the bandwidth profile of today's --vo=drm production setup. Pi 5 / x86 / arm64 keep the connector's preferred mode (typically 4K). Pi 5's V3D 7.1 has roughly 2× Pi 4's throughput; x86 iGPUs handle 4K via VAAPI; arm64 SBC perf varies by SoC. 
Other notable changes folded in from #2883: * tools/image_builder/utils.py — `cage` + `qt6-wayland` move out of the per-board branch into the shared is_qt6 block. `wlr-randr` (was x86-only) goes in the shared block too since rotation now happens via wlr-randr on every Qt6 board. `va-driver-all` stays x86-only (no VAAPI on Pi / ARM SoCs). * docker/Dockerfile.viewer.j2 — QT_QPA_PLATFORM=wayland gated on is_qt6 instead of board in ('x86', 'arm64'). * bin/start_viewer.sh — case on DEVICE_TYPE: every Qt6 board takes the cage + sudo path. Pi2 / Pi3 stay on the legacy direct-sudo path. * src/anthias_viewer/media_player.py — single --vo=gpu --gpu-context=wayland for all reachable device types. The per-board rotate_args block is gone: every Qt6 device inherits the transform from cage via wlr-randr, so mpv would double-rotate if it set --video-rotate. * tests/test_media_player.py — parametrised tests for all four Qt6 boards (x86, arm64, pi4-64, pi5) hitting the same VO path; rotation tests assert mpv never sets --video-rotate under cage. * website/data/faq.yaml — rotation entry points at Settings page / wlr-randr; resolution entry calls out the Pi 4 1080p pin. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(ansible): propagate tags into boot.yml include_tasks The `Configure boot partition` task in system/tasks/main.yml was tagged `touches-boot-partition` / `raspberry-pi` but those tags weren't propagated to the tasks inside boot.yml — Ansible's default include_tasks behaviour matches the include against --tags but leaves the included tasks tag-less, so they get filtered back out. Running `ansible-playbook ... --tags touches-boot-partition` therefore did nothing. Use the explicit `apply: tags:` form so the include's tags are copied onto each task in boot.yml. 
With this, the standalone "re-render boot config" workflow actually works, which matters on Pi 4 now that the 1080p HDMI mode pin in cmdline.txt.j2 needs to land without re-running the whole playbook. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(viewer): keep Pi 4 on linuxfb; only Pi 5 / x86 / arm64 go cage On-device testing on a Pi 4 Model B Rev 1.5 with a 4K HDMI display showed cage+wayland is fundamentally too heavy for the V3D 6.0: --vo=drm (existing, no cage) : 59-75 drops/30s --vo=gpu --gpu-context=drm (no cage, GPU scale): 3-6 drops/30s --vo=gpu --gpu-context=wayland (cage, even at : 730+ drops/30s, 1080p HDMI cmdline pin to avoid 4K scale) mpv at 99% CPU running ~1/4× real time The 1080p HDMI pin doesn't recover Pi 4 — cage's composite pass costs more than the V3D 6.0 has spare bandwidth for, regardless of output resolution, with the webview running in the background or not. Pi 5's V3D 7.1 has roughly 2× the throughput and is expected to keep up; x86 / arm64 already shipped on cage and remain unchanged. Net result: * Pi 4-64 stays on Qt linuxfb (no compositor) with mpv on --vo=gpu --gpu-context=drm. mpv writes straight to KMS via libgbm and lets the V3D do video scaling — keeping the standalone perf-branch finding that drops from 59-75 → 3-6 on the same clip. * Pi 5 / x86 / arm64 stay (or move) onto cage + qt6-wayland + wlr-randr with mpv on --vo=gpu --gpu-context=wayland. * Pi 2 / Pi 3 stay on the Qt5 + VLC + linuxfb track they were already on. * The Pi 4 1080p HDMI cmdline pin added in the previous commit is reverted (no longer needed without cage). * Rotation handling: mpv emits --video-rotate=N on Pi 4 (no compositor to apply the transform) and skips it on the cage boards (wlr-randr handles it there). Goal-wise this is the partial-consolidation we agreed to as last resort: three of four Qt6 boards share one Wayland stack, Pi 4 keeps the framebuffer path for as long as the V3D 6.0 + mpv 0.40 combo lacks the headroom. 
Pi 4 remains in scope for revisiting once mpv ships the v4l2request hwdec. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(viewer): mirror host render-GID for all Qt 6 boards, not just cage mpv uses /dev/dri/renderD128 for --vo=gpu on every Qt 6 board now — wayland (cage path on x86 / arm64 / pi5) and drm (linuxfb path on Pi 4) both go through Mesa GL. The render-GID mirror was inside the cage branch of start_viewer.sh, so Pi 4's mpv ran as viewer user, hit the render node owned by GID 992, got "Permission denied", and bailed with "Failed initializing any suitable GPU context!". Hoist the render-GID setup above the per-board case so it runs for every Qt 6 board. cage / linuxfb branching stays as-is. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(viewer): Pi 4 stays on --vo=drm (Qt linuxfb DRM master contention) Earlier commits switched Pi 4 to mpv --vo=gpu --gpu-context=drm based on a 3-6 vo-drop/30 s measurement. That test was run as root in a fresh container — no Qt linuxfb in the picture. In the production viewer where AnthiasWebview holds the framebuffer via Qt linuxfb, --vo=gpu fails: failed to open /dev/dri/renderD128: Permission denied [vo/gpu/drm] Failed to acquire DRM master: Permission denied [vo/gpu] Failed initializing any suitable GPU context! Error opening/initializing the selected video_out (--vo) device. Video: no video Mesa GBM holds DRM master persistently and contends with Qt linuxfb's framebuffer use. mpv's classic --vo=drm has its own master juggling (briefly grab → render → drop) that coexists fine with linuxfb — that's why master's existing Pi 4 config works. Revert Pi 4 mpv flags to the production master config: --vo=drm --drm-mode=1920x1080@60 --vd-lavc-threads=4 The standalone perf-finding from this branch's earlier history turns out not to apply in production; retracted from the roll-up. 
Pi 5 / x86 / arm64 unchanged (they're on cage + --vo=gpu --gpu-context=wayland, which has its own DRM master flow via cage). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(viewer): cage opens on the first connected connector, not HDMI-A-1 Without `-o`, cage uses whatever output the DRM backend enumerates first — typically HDMI-A-1 on Pi 5 (closer to USB-C) and the on-board panel / first HDMI on x86 / arm64. If the operator plugs into the other port (Pi 5 HDMI-A-2, or any DP connector on x86), cage renders to a disconnected connector and the screen stays black. start_viewer.sh now iterates /sys/class/drm/card-, picks the first connector whose status reads "connected", strips the cardN- prefix to get the bare name cage expects (HDMI-A-1, HDMI-A-2, DP-1, eDP-1, …), and passes it via `-o`. Falls back to letting cage pick if nothing is connected yet — the display may come up via HPD after cage starts, or this is a build/CI host with no display at all. Caught while end-to-end testing on the rig: Pi 5 cable on HDMI-A-2 went to a black screen even though `cat /sys/class/drm/card1-HDMI-A-2/status` reported "connected" and cage / the viewer were running. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(viewer): mpv from apt.raspberrypi.com on Pi 4 / Pi 5, hwdec auto-copy Stock Debian Trixie's mpv 0.40 is compiled without `v4l2request` hwdec, so Pi 5's Hantro stateless decoder is invisible to it and mpv falls back to software decode for every H.264 / H.265 source. Pi 4's V4L2 M2M decoder is reachable via `v4l2m2m-copy` but mpv's `--hwdec=auto-safe` whitelist explicitly excludes that method, so auto-detect picked software there too. Two changes, applied together because they only make sense together: * Pi 4 / Pi 5 viewer images now pull mpv (and the FFmpeg library family it depends on) from `archive.raspberrypi.com/debian trixie main`. 
The Pi-tuned build ships `v4l2request` hwdec (Pi 5) and a maintained `v4l2m2m-copy` (Pi 4). An apt-pin restricts the Pi repo to the mpv + libav* packages only, so curl / ca-certificates / etc. continue to come from stock Debian and the rest of the image stays on the same baseline. * `MPVMediaPlayer.play()` switches `--hwdec=auto-safe` → `--hwdec=auto-copy`. auto-copy is the same family but with a broader whitelist that includes the v4l2-family copy hwdecs. Net effect: x86 still picks vaapi-copy (unchanged), Pi 4 picks v4l2m2m-copy, Pi 5 picks v4l2request, arm64 falls through to software (no v4l2request in stock Debian mpv, no vendor-tuned Rockchip plugin in stock either — Tier-2 follow-up). Plus an `ANTHIAS_DEBUG_DROPS=1` env knob: when set on the viewer container, mpv's stdout/stderr go to `/data/.anthias/mpv.log` (host-bound) instead of `/dev/null`, and `--no-terminal` is dropped so the status line ("AV: ... Dropped: N") is emitted. Lets us read per-asset frame-drop counts straight from the production viewer pipeline (no custom harness, no rebuild) during the test-grid runs. Default (unset) preserves the silent behaviour. Also: drops the `cage -o <connector>` autodetect attempt — cage 0.1.x in Trixie doesn't accept `-o`, just `-m last`. Use that instead so cage opens on the most-recently-connected output regardless of HDMI-A-N enumeration order. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(viewer): use deb-packaged Pi keyring for archive.raspberrypi.com apt update against http://archive.raspberrypi.com/debian trixie was failing in the Pi 4 / Pi 5 viewer image builds: Sub-process /usr/bin/sqv returned an error code (1): Signing key on CF8A1AF502A2AA2D763BAE7E82B129927FA3303E is not bound: No binding signature at time … Policy rejected non-revocation signature (PositiveCertification) requiring second pre-image resistance SHA1 is not considered secure since 2026-02-01 Pi's bare `raspberrypi.gpg.key` URL still serves the original 2012-vintage RSA 2048 key with SHA1 binding signatures that Trixie's sqv refuses to certify under the post-2026-02-01 crypto policy. The deb-packaged keyring inside `raspberrypi-archive-keyring_2025.1+rpt1_all.deb` ships the same key fingerprint but with rebuilt binding signatures that sqv accepts — that's the keyring Pi OS Trixie itself installs, which is why `apt update` against this exact repo works on a real Pi 5 device today. Fetch the deb directly with curl, extract its bundled `.pgp` keyring, and point `signed-by=` at the installed copy. The pin block restricts what packages the Pi repo can supply (mpv + libav* + ffmpeg + libpostproc — the FFmpeg family), so the rest of the image keeps its stock-Debian baseline. Also extend the pin to cover libpostproc* and ffmpeg, since mpv's apt deps drag those into the Pi-tagged version on install; without the pin extension, apt rejected the resolve with "broken packages". Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(viewer): per-codec hwdec on Pi via Lua hook mpv 0.40's `--hwdec` accepts a single value at startup, so we can't ask it to try v4l2m2m-copy for H.264 and drm-copy for HEVC out of the box. The Pi-tuned mpv from archive.raspberrypi.com supports both hwdec methods but each covers a different codec subset: * v4l2m2m-copy — Pi 4's V3D V4L2 M2M decoder. 
H.264 works; Pi 5's Hantro G2 is V4L2-stateless-only so this no-ops there. * drm-copy — FFmpeg's `v4l2_request_hevc` hwaccel. HEVC only, works on both Pi 4 and Pi 5. Add a small `on_load` Lua hook (inlined as `_PI_HWDEC_LUA`, written to /tmp on first play(), loaded with `--script=`) that checks `video-codec-name` and picks the right hwdec at file open. Net effect: Pi 4 H.264 → v4l2m2m-copy (HW) Pi 4 HEVC → drm-copy (HW) Pi 5 H.264 → v4l2m2m-copy (no device, falls back to SW — only path until mpv re-adds v4l2_request_h264 hwdec) Pi 5 HEVC → drm-copy (HW) The base `--hwdec=auto-copy` startup value still applies on x86 / arm64 (vaapi-copy on Intel/AMD; software fall-back on Rockchip), where the hook isn't loaded. Verified on real hardware: $ mpv ... --script=/tmp/anthias-pi-hwdec.lua test_hevc.mp4 [pi-hwdec] codec=hevc -> hwdec=drm-copy Using hardware decoding (drm-copy). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(viewer,server): HW-decode everywhere on Pi 4 / Pi 5 / x86 The previous per-codec Lua hook in media_player.py was a silent no-op: mpv's video-codec-name property is empty at every script event before hwdec init (on_load, on_preloaded), so --hwdec=auto-copy leaked through. auto-copy's upstream whitelist excludes v4l2m2m-copy, so H.264 on Pi 4 fell back to software despite the V3D V4L2 M2M decoder being available. Viewer (src/anthias_viewer/media_player.py) - Replace the Lua hook with ffprobe-driven dispatch from Python at launch time. ffprobe is in the viewer image; the call is ~50 ms. - Per-board mapping: Pi 4 → {h264: v4l2m2m-copy, hevc: drm-copy}; Pi 5 → {hevc: drm-copy}. Pi 5 H.264 falls back to auto-copy because mpv has no v4l2-request H.264 hwdec for the Hantro G1, and passing v4l2m2m-copy there just logs "Could not find a valid device" before SW-falling-back. - Live-verified on Pi 4: "Using hardware decoding (v4l2m2m-copy)" for 1080p H.264 and "Using hardware decoding (drm-copy)" for HEVC at 1080p and 4K. 
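The per-board mapping listed in the Viewer notes above can be written as a small lookup table. This is only a sketch of the dispatch idea: the `pi4-64` / `pi5` keys and the hwdec values come from the commit message, while `_BOARD_HWDEC`, `_FALLBACK_HWDEC`, and `select_hwdec` are hypothetical names, and the ffprobe call that supplies the codec is left out.

```python
from typing import Dict

# Per-board codec -> mpv --hwdec value, as listed in the commit message.
_BOARD_HWDEC: Dict[str, Dict[str, str]] = {
    "pi4-64": {"h264": "v4l2m2m-copy", "hevc": "drm-copy"},
    "pi5": {"hevc": "drm-copy"},
}

# Used when the board has no entry, the codec is not mapped, or the
# probe returned nothing (empty codec string).
_FALLBACK_HWDEC = "auto-copy"


def select_hwdec(board: str, codec: str) -> str:
    """Pick the --hwdec value for a board and probed codec name."""
    return _BOARD_HWDEC.get(board, {}).get(codec.lower(), _FALLBACK_HWDEC)
```

For example, `select_hwdec("pi5", "h264")` falls through to `auto-copy`, which matches the note that Pi 5 H.264 has no dedicated hwdec entry in this commit.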
Asset processor (src/anthias_server/processing.py) - Pi 5 profile drops H.264 from passthrough_video_codecs — Pi 5 has no mpv H.264 HW path, so H.264 uploads must transcode to HEVC at upload time to keep the HW-decode-everywhere contract. - Pi 4 profile adds passthrough_video_max_pixels for H.264, capped at 1080p (1920×1080). 4K H.264 clears the codec gate but the V3D H.264 envelope tops at 1080p60, so the cap forces it through a libx265 re-encode at upload time. HEVC keeps no cap (the dedicated HEVC block handles 4Kp60). - _ffprobe_summary now returns video_pixels alongside codec / container / audio_codec; _video_can_passthrough enforces the per-codec pixel cap when the profile declares one. Tests - test_media_player.py: new per-board hwdec tests (Pi 4 H.264 → v4l2m2m-copy; Pi 5 H.264 → auto-copy; both → drm-copy for HEVC; auto-copy fallback when ffprobe fails; no probe on x86 / arm64). - test_processing.py: matrix tests updated to include video_pixels; parametrised rows now exercise Pi 5 H.264-no-passthrough and the Pi 4 4K H.264 cap. New end-to-end tests prove _run_video_normalisation transcodes Pi 5 H.264 → HEVC and Pi 4 4K H.264 → HEVC. Docs (docs/board-enablement.md, new) - Goal + per-board HW-decode capability table. - Asset processor codec policy spelled out as a contract. - BBB test bed recipe (source clips, libx265 transcode commands, ANTHIAS_DEBUG_DROPS=1, mpv.log slicing). Follow-up: Pi 5 4K HEVC HW The Hantro G2 decoder can't allocate 4K dst buffers from Pi 5's default 64 MB CMA ("v4l2_request_hevc_start_frame: Failed to get dst buffer") and SW-falls-back. Adding cma=512M to the kernel cmdline does NOT work — the kernel takes the cmdline value over the device-tree linux,cma node, orphaning rpi-hevc-dec ("Failed to probe hardware -517") and unpopulating /dev/video, which kills HEVC HW at every resolution.
The right fix is a dtparam/dtoverlay in /boot/firmware/config.txt that resizes the existing DT-declared region without orphaning the codec's reserved-mem reference. Until that lands, the pi5 profile should downscale 4K → 1080p HEVC. Documented in cmdline.txt.j2 and docs/board-enablement.md. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(viewer,server): mock _probe_video_codec; fix mypy on Popen IO types CI failures on the previous commit (`bb27b186`) came from: * ``subprocess.run`` inside ``_probe_video_codec`` blowing up under the existing ``mpv`` fixture, which patches ``subprocess.Popen`` to a MagicMock. ``subprocess.run`` internally instantiates Popen for the ffprobe shellout, gets a MagicMock back, then trips on unpacking communicate()'s result. Fixed by default-mocking ``_probe_video_codec`` in the fixture (returns '' so dispatch falls back to 'auto-copy', preserving legacy assertions) and layering the same mock onto the standalone rotation tests that build MPVMediaPlayer outside the fixture. * ``ruff format``: the multi-line ffprobe arg list in ``_probe_video_codec`` needed splitting one-arg-per-line. * ``mypy``: typing the popen_stdout / popen_stderr locals as ``object`` couldn't satisfy any Popen overload. Switched to ``int | IO[bytes]`` which covers both the DEVNULL / STDOUT sentinels and the bind-mounted mpv.log file handle. * ``test_passthrough_containers_match_real_ffprobe_format_names`` was pinned to the pi5 profile to exercise the H.264 + HEVC passthrough path; pi5 no longer passthroughs H.264, and the fake summary it constructs has no width/height (so pi4-64's cap fails it too). Switched the pin to x86, which has no per-codec caps — the test is about container recognition, not codec/resolution gating.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(server): downscale 4K HEVC → 1080p on Pi 5 (CMA workaround) Pi 5's Hantro G2 HEVC decoder is rated for 4Kp60 but the stock 64 MB CMA on Pi OS can't fit a 4K HEVC dst-buffer pool — at 4K mpv hits ``v4l2_request_hevc_start_frame: Failed to get dst buffer`` and silently SW-falls-back. Bumping cma= on the kernel cmdline orphans ``rpi-hevc-dec`` entirely (the kernel takes the cmdline value over the device-tree linux,cma node, leaving the driver returning ``Failed to probe hardware -517``), so the kernel-side knob isn't available without a dtoverlay change. Until that follow-up lands, the asset processor caps Pi 5 HEVC at 1080p both ways: * ``passthrough_video_max_pixels`` gates 4K HEVC uploads out of passthrough — anything wider than 1920×1080 falls through to a re-encode. * New ``transcode_video_max_pixels`` per-codec field tells ``_transcode_to_target`` to emit a ``-vf scale='if(gt(ih,1080),-2,iw)':'min(ih,1080)'`` filter that caps height at the 16:9 budget (cap_h = floor(sqrt(cap × 9/16))). Portrait 4K → 1080p height; landscape 4K → 1920×1080. Sub-1080p sources are untouched (the ``min()`` guard prevents upscale; ``-2`` on width keeps libx265 happy with even dimensions). Pi 4 / x86 don't carry the cap (their HW decoders handle 4Kp60 cleanly), so the filter stays absent from those profiles. Tests cover (a) the new pi5+hevc+4K row in the parametrised passthrough matrix (False at 4K, True at 1080p), (b) ffmpeg argv shape: -vf scale=... emitted for pi5 HEVC, absent for pi4-64 HEVC. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(viewer,system): Pi 5 4K HEVC HW + display-resampled VO sync Two tied changes that move every supported board to clean HW decode at the source's actual framerate. Pi 5 4K HEVC via cma-512 ------------------------ Pi OS for Pi 5 reserves 64 MB of CMA by default. 
The Hantro G2 HEVC decoder needs a buffer pool large enough to hold several 4K dst frames (each ~12 MB) plus reference frames, so the stock allocation can fit 1080p HEVC but not 4K — at 4K mpv hits ``v4l2_request_hevc_start_frame: Failed to get dst buffer`` and silently SW-falls-back. Adding ``cma=512M`` to /boot/firmware/cmdline.txt does NOT work: the kernel takes the cmdline value over the device-tree ``linux,cma`` node, which orphans ``rpi-hevc-dec`` entirely (returns ``Failed to probe hardware -517`` and ``/dev/video`` disappears, killing HEVC HW at every resolution). The Pi-OS-blessed merge is ``dtoverlay=vc4-kms-v3d,cma-512`` in /boot/firmware/config.txt — the v3d overlay carries its own ``cma-N`` parameter that resizes the DT linux,cma node in place without orphaning the codec driver. A standalone ``dtoverlay=cma,cma-512`` silently no-ops on Pi 5 because the v3d overlay initialises the CMA region first; reusing the v3d overlay's parameter is the documented way to merge them. ansible/roles/system/templates/config.txt.j2 now emits the ``,cma-512`` parameter on Pi 5 only — Pi 4 already gets 512 MB CMA by default so the override is a no-op there. The earlier attempt at a kernel-cmdline cma= override (in cmdline.txt.j2) is removed; the file's comment now points readers at the correct config.txt path. Live-verified on Pi 5: CmaTotal=512MB after the overlay change, /dev/video present, rpi-hevc-dec probes cleanly. Asset processor pi5 profile no longer carries a HEVC pixel cap — Pi 5 can decode HEVC at its silicon's real capability. mpv --video-sync=display-resample --------------------------------- mpv 0.40 defaults to ``--video-sync=audio`` which syncs the video clock to the audio clock and drops VO frames when the two drift. 
On every board tested (Pi 4 --vo=drm, Pi 5 + x86 --vo=gpu --gpu-context=wayland) this produced 60–90% VO drops at 60 fps content even when the decoder reported healthy HW decode (``Using hardware decoding (...)`` banner present, no decoder errors). The drops were at the VO, not the decoder. ``--video-sync=display-resample`` flips the relationship: sync video to the display refresh and resample audio to match. Audio resampling is a <1% CPU 2-channel job and most signage clips have no audible content anyway, so it's effectively free; the benefit is clean playback at the source's frame rate. Test bed touched ---------------- * test_play_invokes_popen_with_expected_args_on_pi4_64: argv now includes ``--video-sync=display-resample``. * test_video_can_passthrough_respects_board_codec_set: pi5 + hevc + 4K is now ``True`` (passthrough) because the CMA fix lets the silicon do its rated job. Comment updated to point at config.txt.j2. * Removed the transient downscale-on-Pi 5 codepath (``transcode_video_max_pixels`` field, the ``-vf scale='if(gt(ih,...))':...`` filter, and the two tests asserting it) — that was a workaround for the CMA issue and is no longer needed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(server): introduce PlaybackEnvelope dataclass + matrix + cache Foundation for the per-board playback envelope rollout (see /home/ubuntu/.claude/plans/serene-munching-gem.md). No behaviour change yet — wires up the canonical source of truth that processing.py, celery_tasks.py's future re-render walker, and the viewer's hwdec dispatch will all read from in the next commit. src/anthias_server/playback_envelope.py (new) --------------------------------------------- Frozen dataclass ``PlaybackEnvelope`` carrying codec / max_width / max_height / max_fps plus a fixed ``container_ext = 'mp4'``. 
``ENVELOPE_BY_DEVICE_TYPE`` maps every supported board: * pi2 / pi3 / arm64 → H.264 1920x1080 30 (no HEVC silicon / no upstream mpv HW path) * pi4-64 / pi5 / x86 → HEVC 3840x2160 60 (dedicated HEVC block or VAAPI; fleet uniformity so the same upload produces bit-identical variants on every board) ``compute_envelope()`` resolves the current process's envelope from DEVICE_TYPE; unset / unknown / mixed-case / whitespace all fall back to the conservative default (H.264 1080p30). ``load_cached()`` / ``save_cached()`` round-trip the envelope to ``~/.anthias/playback-envelope.json``. Cache corruption (missing file, bad JSON, unsupported codec) returns ``None`` so the caller recomputes and overwrites — a hand-edit that breaks the file self-heals on next start. ``save_cached`` writes atomically via temp-file + rename. src/anthias_server/processing.py -------------------------------- ``_ffprobe_summary`` now returns ``video_fps`` alongside the existing keys. The next commit (Phase 2) uses this to decide whether to emit ``-r envelope.max_fps`` — the cap is one-way, so sub-cap source rates pass through unchanged. r_frame_rate is parsed as a rational ``num/den``; unparseable / zero-denominator collapses to ``None`` so the caller treats source fps as "unknown" and skips the gate. tests ----- * tests/test_playback_envelope.py (new): matrix coverage; unset / unknown / cased / whitespace inputs; cache round-trip; missing / corrupt JSON / invalid-payload recovery; atomic write (no leaked .tmp); container_ext invariant. * tests/test_processing.py: positive video_fps cases (integer rates, NTSC drop-frame 30000/1001 + 60000/1001, bogus / no-slash / zero-denominator inputs); the two ``assert summary == { ... }`` ffprobe-recovery tests now include the new ``video_fps: None`` key. 
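The ``r_frame_rate`` handling described in this commit can be sketched roughly as follows. This is a minimal illustration and not the code in ``processing.py``: the helper name ``parse_frame_rate`` is hypothetical, and it assumes ffprobe hands the rate back as a ``num/den`` string.

```python
from __future__ import annotations


def parse_frame_rate(raw: str | None) -> float | None:
    """Parse an ffprobe-style ``num/den`` rate.

    Hypothetical helper for illustration only. Returns None for anything
    unparseable so the caller can treat the source fps as "unknown".
    """
    if not raw or '/' not in raw:
        # Missing value or no-slash input: nothing to parse.
        return None
    num_text, _, den_text = raw.partition('/')
    try:
        num = int(num_text)
        den = int(den_text)
    except ValueError:
        # Bogus, non-numeric input.
        return None
    if den == 0:
        # Zero denominator collapses to "unknown" instead of raising.
        return None
    return num / den
```

Returning ``None`` rather than raising keeps the decision with the caller, which, per the commit text, skips the fps gate when the source rate is unknown.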
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(server): envelope-driven asset processor with sibling-original Refactor ``processing.py`` so every video upload produces a variant matching the board's playback envelope while preserving the source as a sibling ``.original.<ext>`` file. Rotation is now gapless by construction — every variant on disk shares one codec / max resolution / max fps per board, so the viewer's output mode never has to switch mid-clip. src/anthias_server/processing.py -------------------------------- * Replace ``_BOARD_PROFILES`` + ``_resolve_board_profile`` + ``_PI4_H264_MAX_PIXELS`` + ``_BoardProfile`` typedef with ``compute_envelope()`` from the new ``playback_envelope`` module (landed in `0b6bea0c`). One canonical source of truth for "what every variant on disk looks like". * ``_ffprobe_summary`` now returns per-axis dimensions (``video_width``, ``video_height``) alongside the existing ``video_pixels`` total. The envelope check is per-axis so an ultrawide source (e.g. 5760×1080) gets caught by the width cap even though its total pixel count is below 4K's. * ``_video_can_passthrough(summary, envelope)`` is the new contract: passthrough iff (a) container is mp4, (b) codec matches envelope.codec exactly, (c) both axes are within the envelope cap, (d) source fps is at-or-under envelope.max_fps, (e) audio is demuxer-compatible. Any None in source dims / fps bails to transcode (we don't gamble on unsized clips). * ``_transcode_to_target(input, output, envelope=None, source_summary=None)`` emits the smallest set of flags that lands the output inside the envelope. ``-vf scale=...`` only when source > envelope on either axis; ``-r envelope.max_fps`` only when source fps > cap. The fps cap is one-way — we never up-convert a sub-cap source. New helper ``_video_args_for_codec`` picks libx264 / libx265 from the envelope's codec. 
* ``_run_video_normalisation`` reorganised around the sibling- original pattern: - Fresh upload / legacy asset: rename ``Asset.uri`` to ``<base>.original.<ext>`` (the source-preservation step). - Re-render: read from the existing ``.original.`` sibling instead. - Re-probe from the (possibly new) source location. - Passthrough branch: copy source → variant slot bitwise (cross-device fleet sha256 stays equal). - Transcode branch: staging-file render with the existing atomic-replace contract. - Stamp ``metadata['original_uri']`` (path to sibling), ``metadata['envelope']`` (envelope dict the variant matches). ``metadata['transcode_target']`` kept as the ``envelope.codec`` duplicate for one release of back-compat with the serializer surface. Tests ----- ``test_video_can_passthrough_decision_table`` recast against the H.264 1920×1080 30 default envelope. Each row tests one gate (codec / per-axis dim / fps / audio / unknowns / probe gaps) without overlap. * ``test_video_can_passthrough_respects_envelope`` end-to-end: pin ``DEVICE_TYPE``, build a summary at the given (codec, w, h, fps), assert the verdict. Replaces the legacy ``..._respects_board_codec_set``. * ``test_transcode_to_target_emits_scale_when_source_oversize``, ``..._emits_fps_clamp_when_source_fast``, ``..._omits_clamps_when_source_at_envelope``: pin the smallest ffmpeg flag set per source / envelope combination. * ``_envelope_summary`` helper at the top of the file short-circuits the per-test summary construction. * Mock signatures for ``_transcode_to_target`` updated to accept the new ``envelope`` / ``source_summary`` kwargs. * ``test_resolve_board_profile_picks_target_codec_per_board`` deleted — equivalent coverage is in tests/test_playback_envelope.py against ``compute_envelope`` directly. Stale doc / comment references to ``_BOARD_PROFILES`` / ``_resolve_board_profile`` updated to point at ``playback_envelope.ENVELOPE_BY_DEVICE_TYPE`` / ``compute_envelope``. 
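The passthrough contract listed in this commit (gates (a) through (e)) can be sketched as below. This is an illustration under stated assumptions, not the project's implementation: the ``Envelope`` and ``Summary`` classes, their field names, and the ``audio_ok`` flag are hypothetical stand-ins for the real envelope dataclass and ffprobe summary, and the container check is the plain equality this commit describes (a later commit in this log widens it to the ffprobe MP4 family).

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Envelope:
    """Hypothetical stand-in for the per-board playback envelope."""
    codec: str
    max_width: int
    max_height: int
    max_fps: int
    container_ext: str = 'mp4'


@dataclass(frozen=True)
class Summary:
    """Hypothetical stand-in for the ffprobe summary of a source file."""
    container: str | None
    codec: str | None
    width: int | None
    height: int | None
    fps: float | None
    audio_ok: bool  # assumption: audio compatibility already decided


def can_passthrough(summary: Summary, envelope: Envelope) -> bool:
    # (a) container must match the fixed variant container.
    if summary.container != envelope.container_ext:
        return False
    # (b) codec must match the envelope codec exactly.
    if summary.codec != envelope.codec:
        return False
    # Unknown dims or fps bail out to transcode.
    if summary.width is None or summary.height is None or summary.fps is None:
        return False
    # (c) per-axis check, so an ultrawide source fails on width alone.
    if summary.width > envelope.max_width:
        return False
    if summary.height > envelope.max_height:
        return False
    # (d) source fps must be at or under the cap.
    if summary.fps > envelope.max_fps:
        return False
    # (e) audio must be compatible.
    return summary.audio_ok
```

Each gate returns early, so a decision-table test can exercise one gate per row without overlap.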
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(server): re-render walker + startup envelope reconciler * New celery task `regenerate_for_envelope_change`: walks `Asset.objects.filter(mimetype='video')` and queues `normalize_video_asset` for any row whose `metadata['envelope']` no longer matches the current envelope. Malformed payloads, missing keys, and per-row exceptions are logged but don't stop the walker. * New `AnthiasAppConfig.ready` hook -> `app/startup.py: run_envelope_check`: compares cached vs computed envelope, persists fresh, dispatches the walker on mismatch. Short-circuits under `ENVIRONMENT=test` / `PYTEST_CURRENT_TEST` so pytest runs don't enqueue stray walkers. Celery dispatch failure is logged but non-fatal -- the cache is already saved, so the next start sees the new envelope on disk and recovers. * Tests cover: skip-in-envelope, queue-stale, legacy migration (no envelope key), image-asset skip, force-requeue, malformed payload recovery, continue-after-per-row-failure, every hook code path (test short-circuit, no-cache, match, mismatch, dispatch failure, corrupt cache). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(server): preserve `.original.<ext>` siblings during orphan sweep The Celery ``cleanup`` task built its "referenced" set only from ``Asset.uri``. With sibling-original storage, the source bytes live at ``metadata['original_uri']`` (e.g. ``<id>.original.mov``) while ``Asset.uri`` points at the playback variant (``<id>.mp4``). Without this fix every video upload's ``.original.<ext>`` falls outside the 1h mtime guard once the variant lands and gets silently deleted on the next hourly sweep — breaking the re-render walker as soon as the envelope changes. * ``cleanup``: union ``Asset.uri`` ∪ ``metadata['original_uri']`` into the referenced set, tolerant of legacy rows with non-dict metadata. 
* Tests cover the new claim path + the malformed-metadata fallback so a stray ``metadata=None`` row can't crash the sweep. The upload-path serializer itself stays untouched: the existing ``rename(tmp, <id><ext>)`` lands the upload at a single path, and ``processing._run_video_normalisation`` handles the rename-to-``.original.<ext>`` atomically on first run. No double- write, no extra disk traffic. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(server): cover sibling-original storage across normalisation paths Adds five tests pinning the ``.original.<ext>`` + variant contract that the envelope walker depends on: * fresh upload → ``<id>.original.<src_ext>`` created next to ``<id>.mp4``; ``metadata['original_uri']`` + ``metadata['envelope']`` populated. * re-render → ``.original.<ext>`` is byte-identical across passes (sha256 compared before/after); the walker reads from it and never rewrites it. * passthrough → both files exist even when the source already matches the envelope (``shutil.copyfile`` semantics, not rename). * legacy migration → pre-rollout assets with no ``original_uri`` key get renamed to ``.original.<ext>`` on first walker pass. * dangling ``original_uri`` → falls back to treating ``asset.uri`` as the source-to-preserve; no silent error, no lost variant. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(board-enablement): replace codec policy table with playback envelope * board-enablement.md now documents the envelope matrix as the single source of truth shared by the asset processor, the re-render walker, and the viewer's hwdec dispatch. The legacy ``_BOARD_PROFILES`` / ``passthrough_video_codecs`` vocabulary has been removed -- it never matched what ``processing.py`` does post-envelope. * Calls out the ``<id>.original.<src_ext>`` + ``<id>.mp4`` sibling layout, the metadata keys the walker reads, and the cross-board fleet sha256 expectation. 
* Pi 5 CMA quote rewritten: the real fix is ``dtoverlay=vc4-kms-v3d,cma-512`` in config.txt, not a downscale workaround. Kernel cmdline ``cma=`` is documented as the broken path it actually is. * Failure-mode list updated for envelope-driven dispatch (off- envelope variant, display refresh ceiling, walker storm on unwritable cache, sha256 fleet divergence). * ``media_player.py`` comment block: updates the Pi 5 H.264 → auto-copy and HEVC → drm-copy comments to reference the playback envelope by name and point at the correct CMA fix (config.txt dtoverlay, not cmdline.txt). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(tests): mypy on `_make_video_asset` + boolean is_enabled * `dict` annotations get explicit `dict[str, Any]` parameters (Anthias's mypy config sets `disallow_any_generics`). * `is_enabled=1` → `is_enabled=True` so the Asset field's bool type matches mypy's view of django-stubs models. * Adds the missing ``typing.Any`` import. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(server,tests): envelope-aware container gate + startup hook safety Run 1 of CI surfaced several issues in the envelope refactor: * MP4 family container detection. ffprobe reports an MP4 file's ``format_name`` as ``mov,mp4,m4a,3gp,3g2,mj2`` (``mov`` first because the QuickTime/MP4 demuxer is one codepath). The envelope gate compared the source container to ``envelope.container_ext`` by exact equality, so every MP4 upload was rejected at the container gate even though the bytes are exactly what we'd write. Adds ``_MP4_FAMILY_CONTAINERS`` and special-cases ``mp4`` envelope to accept any synonym. * Celery workers were running ``run_envelope_check``. ``celery_tasks.py`` top-level-calls ``django.setup()``, which fires ``AppConfig.ready`` in every process that imports it, including the celery worker -- the previous comment in ``apps.py`` was wrong. 
Two writers race on the cache file and could double-queue the walker for a single envelope change. New ``_is_celery_worker()`` short-circuit detects the ``celery -A ... worker`` invocation via ``sys.argv[0]``.
* Settings singleton captures HOME at init. ``AnthiasSettings.home`` is set once at module import time, so ``monkeypatch.setenv('HOME', tmpdir)`` in tests doesn't reach the envelope cache helpers. Updates ``cache_dir`` and ``fake_home`` fixtures to also patch ``settings.home`` via ``monkeypatch.setattr``.
* Stale tests.
  - Drop ``test_cleanup_tolerates_non_dict_metadata`` -- the schema enforces ``metadata`` as a non-null JSON dict, so the failure mode it claimed to test can't occur. ``cleanup()`` keeps the defensive ``isinstance(metadata, dict)`` check as a no-cost belt-and-braces.
  - ``test_video_passthrough_for_h264_or_hevc_in_known_containers`` rewritten as ``test_video_passthrough_when_source_matches_board_envelope`` -- the old matrix included libx264 on pi4-64 (no longer passthrough because pi4-64 is HEVC) and non-mp4 containers (always re-encoded now because the variant slot is fixed at ``.mp4``).
  - ``test_video_passthrough_records_target_codec`` switches the source codec to libx265 so it actually hits the passthrough branch on pi4-64.
  - ``test_video_passthrough_uses_summary_duration_no_second_probe`` rebuilt via ``_envelope_summary`` so the synthesised summary carries the new ``video_width / video_height / video_fps`` fields.
  - The two ``test_ffprobe_summary_handles_`` early-return shape assertions add ``video_width`` / ``video_height`` to match the real return shape.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

fix(server,tests): drop PYTEST_CURRENT_TEST gate; align stale summaries

Run 2 of CI surfaced three more issues:

* ``PYTEST_CURRENT_TEST`` is not fixture-controllable. pytest re-sets the env var at the start of every test's ``call`` phase, so ``monkeypatch.delenv`` in a ``setup`` fixture is overridden before the body runs.
This made it impossible for any test to exercise the real startup hook path. The ``ENVIRONMENT=test`` gate (set in ``conftest.py`` + the test compose file) is the durable, fixture-controllable signal — keep that, drop the pytest one. Test for the new ``_is_celery_worker`` short-circuit replaces the deleted ``test_short_circuits_when_pytest_current_test``. * Decision table parametrise had a wrong expectation. Summary row "HEVC at envelope (codec, dims, fps all match)" was paired with ``expected=True``, but the test envelope is H.264 — codec mismatch must transcode, ``False``. * ``test_video_passthrough_skips_duration_when_probe_unavailable`` summary missed the new dim/fps fields. Same root cause as before: ``_video_can_passthrough`` rejected the synthesised summary at the dims gate, the test fell through to a real ffmpeg call on a 64-byte stub, and ffmpeg "Invalid data found". Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(envelope): add generic-arm64 key for Rock Pi / Armbian SBCs The Anthias install path for Rock Pi 4 / Armbian boards writes ``DEVICE_TYPE=generic-arm64`` (see ``feat(install): generic-arm64 best-effort support``). The matrix only listed ``arm64``, so a real install fell through to ``_DEFAULT`` — same envelope by coincidence, but the walker would have logged "no matrix entry" warnings on every server start and the docs/board-enablement matrix would be subtly wrong about which key applies. Lists the key explicitly with the same conservative H.264 1080p30 envelope and extends the parametrise coverage. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(server): make celery_tasks.py top-level django.setup() reentrant-safe ``django.setup()`` calls ``apps.populate()``, which raises ``RuntimeError: populate() isn't reentrant`` if invoked while already populating. 
The new ``AnthiasAppConfig.ready`` hook imports ``celery_tasks`` to dispatch the walker, which until this change top-level-called ``django.setup()`` again -- so on every real server start the import died, the dispatch failed, and the walker never ran. Live-confirmed on the Pi 4 test bed. Check ``django.apps.apps.apps_ready`` before calling ``setup()``: the flag flips to True after the import phase but before per-app ``ready`` hooks run, so the standalone celery worker (where Django isn't initialised yet) still calls setup() as before, while the server process (mid-populate) correctly skips the reentrant call. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(server): commit `original_uri` to DB before transcode (crash safety) Live-confirmed on the Pi 4 test bed during the envelope rollout: walker fired on a near-full SD card, ffmpeg ran out of space mid- render, the on_failure hook cleared ``is_processing`` -- and the hourly ``cleanup()`` sweep then silently deleted every ``.original.<ext>`` source it had just renamed, because ``Asset.uri`` still pointed at the (now-missing) variant path and the orphan walker only knew about ``Asset.uri`` + a committed ``metadata['original_uri']``. The metadata accumulator in ``_run_video_normalisation`` only wrote to the DB at the end of the function, so any failure between "rename source → .original.<ext>" and "render variant → atomic replace" left the row's metadata stale. Fix: persist ``metadata`` to the DB right after the rename, before attempting any render. The contract becomes: if the file is on disk under ``.original.<ext>``, the DB row knows it. ``cleanup()`` already reads ``metadata['original_uri']`` into the referenced set (from ``fix(server): preserve `.original.<ext>` siblings during orphan sweep``), so this commit closes the only window where that guard could be bypassed. 
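The ordering this fix relies on (rename, persist, then render) can be sketched with in-memory stand-ins. Everything here is hypothetical and for illustration only: ``FakeRow`` replaces the Django ``Asset`` row, ``normalise`` is not the real ``_run_video_normalisation``, and the render step is simulated by a callable that raises.

```python
from __future__ import annotations

import os
import tempfile
from typing import Callable


class FakeRow:
    """Stand-in for the Asset row; ``saved`` mimics what the DB holds."""

    def __init__(self, uri: str) -> None:
        self.uri = uri
        self.metadata: dict[str, str] = {}
        self.saved: dict[str, str] = {}

    def save(self) -> None:
        # Snapshot the in-memory metadata, like committing the row.
        self.saved = dict(self.metadata)


def normalise(row: FakeRow, render: Callable[[str, str], None]) -> None:
    base, ext = os.path.splitext(row.uri)
    original = f'{base}.original{ext}'
    os.rename(row.uri, original)  # source-preservation step
    row.metadata['original_uri'] = original
    row.save()  # persist BEFORE attempting any render
    render(original, row.uri)  # may raise, e.g. on a full disk


def failing_render(src: str, dst: str) -> None:
    raise OSError('simulated: no space left on device')


def demo() -> str:
    """Run a failing render and report what the "DB" recorded."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, 'asset.mov')
        with open(path, 'wb') as handle:
            handle.write(b'stub')
        row = FakeRow(path)
        try:
            normalise(row, failing_render)
        except OSError:
            pass
        # The committed metadata names the renamed source even though
        # the render never produced a variant.
        return os.path.basename(row.saved['original_uri'])
```

Because ``save()`` runs before ``render()``, a sweep that consults the committed ``original_uri`` still sees the renamed source as referenced after the failure.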
Adds ``test_original_uri_persisted_before_render_for_crash_safety`` which mocks ``_transcode_to_target`` to raise and verifies the row has ``metadata['original_uri']`` committed by the time the exception propagates. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(board-enablement): script-driven 1-minute sample pack Previously the test pack was full-length BBB clips (~10 min) plus an inline ffmpeg recipe in the docs that produced 4K HEVC re-encodes taking ~30 min on a workstation. The on-device walker then had to chew through the full-length variants, which on a Pi 4 / Rock Pi turned a single rotation cycle into hours of wallclock for what was really a hwdec-banner sanity check. * New ``bin/generate_board_enablement_testbed.sh``: downloads the four BBB H.264 sources, trims each to 60 s with ``-c copy`` (instant), then libx265-encodes each cut. Idempotent (skips files that already pass an ffprobe sanity check) and atomic (tmp-then-rename) so a power cycle mid-encode leaves a clean state. * Pack drops from ~3.3 GB / 10 min per clip to ~350 MB / 60 s per clip. 60 s is enough to capture mpv's ``hwdec-current`` banner and read a stable ``Dropped:`` count, while keeping a full walker pass under a few minutes on every supported board. * ``CUT_SECONDS`` / ``HEVC_CRF`` env knobs override defaults for iteration; the table in the doc lists what each clip exercises. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(envelope,viewer): runtime Rock Pi 4 detection unlocks v4l2m2m HW decode ``bin/install.sh`` writes ``DEVICE_TYPE=arm64`` for every aarch64 SBC it doesn't recognise as a Pi — Rock Pi 4, Orange Pi, Allwinner H6 boards, Amlogic S905 boards all share that one catch-all DEVICE_TYPE. The matrix can't promote ``arm64`` to HEVC + HW because most of those boards have no upstream-mpv HW decode path and would log "Could not find a valid device" on every play. 
But the Rock Pi 4 (RK3399 / Radxa) DOES have a working v4l2m2m driver exposed by the kernel:

    $ docker exec anthias-anthias-viewer-1 mpv --hwdec=help | grep v4l2m2m
    v4l2m2m-copy (h264_v4l2m2m-v4l2m2m-copy)
    v4l2m2m-copy (hevc_v4l2m2m-v4l2m2m-copy)
    v4l2m2m-copy (vp9_v4l2m2m-v4l2m2m-copy)

... and ``/dev/video-dec2`` / ``/dev/video-dec4`` are present (the v4l2_request decoder symlinks). Leaving Rock Pi on SW decode for 1080p HEVC measurably wastes the silicon.

Resolved at runtime via ``/proc/device-tree/model``:

* New matrix key ``rockpi4`` → HEVC 1920×1080 30. 1080p ceiling keeps disk use of the variant + ``.original.<ext>`` sibling comfortable on the typical SD card; HEVC codec exercises the Hantro path on the way through the viewer.
* ``compute_envelope`` and ``_pi_hwdec_for_uri`` both probe the device tree when DEVICE_TYPE is ``arm64`` (or legacy ``generic-arm64``). A Rock Pi 4B reports ``Radxa ROCK Pi 4B`` and gets upgraded; an Orange Pi or an Allwinner H6 board stays on the conservative SW envelope.
* Failure modes (no device tree, decode error, unknown SBC) all collapse to ``None`` so dev containers and the existing arm64 catch-all keep working unchanged.

Four new tests pin:

- Rock Pi model → ``rockpi4`` envelope;
- legacy ``generic-arm64`` label also gets the upgrade;
- unknown SBC keeps the conservative envelope;
- missing ``/proc/device-tree/model`` doesn't raise.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(envelope,viewer): publish board subtype via host_agent + Redis

Previous commit (``dde1b20e``) added a runtime ``/proc/device-tree`` read inside the server + viewer containers. Containers don't see that path by default, and mounting it into every container is heavier than it's worth for one edge case (worse, balena's restricted /proc would still trip).

``anthias_host_agent`` already runs on the host and publishes host-side state to Redis (IP addresses, etc.).
It's the right layer for board identification: * New ``detect_board_subtype()`` reads ``/proc/device-tree/model`` directly (host_agent IS on the host) and maps known SBC strings to matrix keys (Rock Pi 4A/4B/4C → ``rockpi4``). * New ``set_board_subtype()`` publishes the resolved key (or the empty string for unknown boards) to ``host:board_subtype`` before ``subscriber_loop`` flips ``host_agent_ready`` — so consumers can rely on the key being there once the readiness flag is set. * Server's ``playback_envelope.compute_envelope`` and viewer's ``_pi_hwdec_for_uri`` read the same Redis key when DEVICE_TYPE is ``arm64`` / legacy ``generic-arm64``. Failure modes (Redis down, key missing, decode error) all collapse to ``None`` so the caller falls back to the conservative arm64 envelope. No compose template changes. The viewer + server containers already have Redis reachable (they use it for the Channels layer + walker dispatch already), so the data path is free. Unit tests pin: * device-tree → subtype mapping for canonical + variant + edge Rock Pi strings, plus unknown boards; * Redis publish writes the resolved key OR empty string; * server's compute_envelope reads back through Redis correctly for known / unknown / empty / unreachable cases; * subscriber_loop calls set_board_subtype before flipping ``host_agent_ready`` — race-free ordering. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * perf(celery): cap walker to --concurrency=1 so transcodes can't choke playback Default celery worker concurrency = num_cores. On the boards Anthias actually ships to (Pi 4 / Pi 5 / Rock Pi 4 / arm64 SBCs), that means up to 4 parallel ``libx265`` encodes sharing the same SoC as the viewer's mpv process. ``nice -n 19`` + ``ionice -c 3`` are already in place, but nice(1) only helps when there's CONTENTION -- four ffmpegs at nice 19 still saturate every core, and each 1080p libx265 encode needs ~500 MB RAM. 
A 4 GB SBC pushes into swap well before the walker finishes, which stalls everything on the host -- live- confirmed on the Rock Pi 4 during this PR: sshd starved through banner exchange whenever the walker hit a fresh burst. Asset processing is upload-time, not throughput-bound. The operator-facing latency that matters is "upload click → asset visible in rotation", which is bound by ONE encode regardless of queue parallelism. Serial encodes finish a few minutes later in wallclock but the viewer never drops a frame. Applied to every prod / dev compose template. ``docker-compose.test.yml`` is left at default because the test suite never runs live normalize tasks (the celery service in tests just exercises the task dispatch plumbing). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(viewer): force MPV on legacy ``generic-arm64`` DEVICE_TYPE Rock Pi 4 running an older arm64 image reports ``DEVICE_TYPE=generic-arm64`` (pre-``refactor: rename device_type generic-arm64 → arm64`` rebuilds). The MediaPlayerProxy override only force-routed MPV for ``arm64`` / ``pi4-64``, so the legacy label fell through to VLC -- which then crashed with ``NameError: no function 'libvlc_new'`` because the libvlc lib isn't installed on the arm64 image. Live-confirmed in the viewer crash loop on the Rock Pi 4 during this PR. Adds ``'generic-arm64'`` to the force_mpv set + a test pinning the dispatch. Covers the in-the-wild rolling-upgrade window where a Rock Pi 4 deployment is sitting on an old image. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(viewer): route ``generic-arm64`` through cage + ALSA-default like ``arm64`` Two more places in ``media_player.py`` only checked the post-rename ``arm64`` DEVICE_TYPE and missed the legacy ``generic-arm64`` label the Rock Pi 4 test bed still reports: * VO dispatch (line ~419) — without this, a generic-arm64 host falls through to the ``--vo=drm`` else branch, which mpv aborts with "No primary DRM device could be picked" because cage already holds DRM master in the cage + Wayland viewer stack (live-confirmed on the Rock Pi 4 in this PR). * ALSA card selection (``get_alsa_audio_device``) — the Pi-name dispatch below the env-var check picks ``vc4hdmi`` / "Headphones" cards that don't exist on Rockchip / Allwinner / Amlogic. Without the legacy label here, mpv tries to open the Pi-specific HDMI card and dies with ``Unknown PCM sysdefault:CARD=vc4hdmi``. Both branches now use the shared ``_ARM64_DEVICE_TYPES`` frozenset that already governs the hwdec subtype probe, so the three paths (envelope, hwdec dispatch, VO + ALSA) agree on what DEVICE_TYPE labels are aarch64-catch-all. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(envelope): Rock Pi 4 stays on H.264 1080p30 -- stock ffmpeg has no v4l2_request Live testing on the Rock Pi 4 surfaced that the arm64 viewer image's stock ffmpeg (Debian 7.1.3-0+deb13u1) is built without ``--enable-v4l2-request``, and the underlying kernel exposes the RK3399's decoders only via the stateless v4l2_request API (``rkvdec`` for HEVC, the Hantro block as ``rockchip,rk3399-vpu-dec`` for H.264). ffmpeg's stateful ``hevc_v4l2m2m`` / ``h264_v4l2m2m`` decoders can't reach them -- mpv logs ``Could not find a valid device`` even after ``/dev/video-dec`` symlinks are present. mpv ``--hwdec=help`` also doesn't list rkmpp or drm-copy, so there's no other path through the stock build. 
So: ``rockpi4`` envelope drops from HEVC 1920x1080 30 to H.264 1920x1080 30 -- the same conservative tier as the generic ``arm64`` catch-all. The viewer SW-decodes 1080p30 in real time on the Cortex-A72; no frames dropped, just no HW gain over plain ``arm64``. * Rock Pi entry drops from ``_PI_HWDEC_BY_CODEC`` -- mpv falls through to ``auto-copy`` which mpv's whitelist resolves to SW decode on this build. * host_agent's subtype publish, the start_viewer.sh ``/dev/video-dec`` symlink creation, and the dedicated ``rockpi4`` matrix key all stay in place -- they're forward-compatible scaffolding so a follow-up enabling v4l2_request (or linking rkmpp) in the viewer build only has to bump the matrix entry's codec to ``hevc`` and add the hwdec dispatch row. No further plumbing churn. Tests + docs reflect the routing-without-HW reality. The legacy-label fixes from this PR (force_mpv + ``--vo=gpu --gpu-context=wayland`` + ALSA default for the ``generic-arm64`` DEVICE_TYPE) are unaffected -- those are real bug fixes the Rock Pi 4 needs to play anything under cage. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(viewer,envelope): extend +rpt1 ffmpeg to arm64; Rock Pi 4 = HEVC 4Kp60 The Raspberry Pi APT repo's ffmpeg build (``+rpt1``) ships with ``--enable-v4l2-request --enable-libudev --enable-vout-drm``, which the stock Debian Trixie ffmpeg drops. Without those flags the v4l2_request hardware decoder family is unreachable from mpv — which is exactly what bit the Rock Pi 4 in this PR: RK3399's ``rkvdec`` (HEVC) and Hantro VPU (H.264) are both stateless v4l2_request decoders. Pi 4 / Pi 5 already pull from the +rpt1 repo for the same reason; extending the conditional in ``Dockerfile.viewer.j2`` to also include ``arm64`` lights up hardware decode on every arm64 SBC whose kernel exposes v4l2_request decoders (Rock Pi, Orange Pi RK356x, Pine64, Allwinner H6 with Cedrus, ...). 
* ``Dockerfile.viewer.j2`` — board conditional ``('pi4-64', 'pi5')`` → ``('pi4-64', 'pi5', 'arm64')``. The apt pin already restricts the +rpt1 repo to ``ffmpeg + libav* + mpv``, so other arm64 packages stay on stock Debian. Comment block updated to list which decoders each board reaches via this path. * ``playback_envelope.py`` — ``rockpi4`` envelope flips from H.264 1080p30 to HEVC 3840×2160 60. RK3399's Hantro G2 is the same decoder family as Pi 5's and supports 4Kp60 per the Rockchip datasheet — matching Pi 5's envelope keeps the fleet uniform. * ``media_player.py`` — ``_PI_HWDEC_BY_CODEC['rockpi4']`` maps both h264 and hevc to ``drm-copy`` (the v4l2_request hwdec path, same as Pi 5 for HEVC). * Tests + docs updated accordingly. The legacy-arm64 fixes (force_mpv + cage VO + ALSA default for ``generic-arm64``) and the host_agent subtype publish are unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * perf(celery): cgroup CPU hard cap (`cpus: 1.0`) so encodes never starve the viewer ``nice -n 19 ionice -c 3`` + ``--concurrency=1`` lower priority and limit parallelism, but they're soft hints — when libx265 is the only heavy workload on the box the scheduler still hands it everything available. Live-confirmed on the Rock Pi 4 in this PR: sshd starved through banner exchange and mpv dropped mid-frame during walker bursts, even with all three soft caps in place. ``cpus: 1.0`` is a cgroup CFS quota — one CPU's worth of compute per period, kernel-enforced. On every supported SBC (Pi 4 / Pi 5 / Rock Pi 4, all 4-core) it leaves 3+ cores for the viewer, the host_agent, sshd, and everything else. x86 hosts have 8+ cores so the cap is conservative there but harmless — asset processing is upload-time, not throughput-bound. Applied to every prod / dev compose template. test compose stays uncapped because the test suite runs in CI environments with deterministic resources where the cap would just slow CI down without protecting anything. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * perf(celery): scale CFS quota with host cores (half of $(nproc), min 1.0) A flat ``cpus: 1.0`` is too aggressive: it forces a single-thread ceiling even when the host has many idle cores. On an 8-core x86 deployment the asset processor would take 4x longer than it needs to without protecting anything we don't already protect. Compute the limit dynamically in ``bin/upgrade_containers.sh``: ``$(nproc) * 0.5`` (with a floor of 1.0 so single-core hosts still make progress). On the supported boards this lands at: * 4-core Pi 4 / Pi 5 / Rock Pi 4 → cpus: 2.0 (2 cores headroom for the viewer + system) * 8-core x86 → cpus: 4.0 (4 cores headroom) * 16-core x86 → cpus: 8.0 (still 50/50 with the system) Soft priorities (``nice -n 19 ionice -c 3``) and the ``--concurrency=1`` walker still apply on top; the cgroup quota is the hard backstop that guarantees "encoding never impacts playback or UI access". Live test on the Rock Pi 4 (in this PR) proved the soft caps alone aren't enough -- libx265 saturated every core and starved sshd through banner exchange. The balena compose templates use a literal ``cpus: 2.0`` (balena only targets 4-core Pi 2/3/4/5 today); the non-balena prod compose substitutes the env var. Dev compose also uses a literal ``2.0`` since dev hosts vary too widely to autodetect cheaply. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * perf(walker): hardware-decode the source in the transcode pipeline The walker's encode pass stays libx265-software-bound on every SBC (none of Pi 4 / Pi 5 / Rock Pi 4 have HEVC HW encode), but the decode half of the pipeline can be offloaded to the same silicon mpv uses for playback. That's typically 30-50% of the ffmpeg wall-clock on H.264 sources and dominant on 4K -- well worth the small dispatch table. * ``_decode_hwaccel_args(source_codec)`` returns the per-board ``-hwaccel`` flags to prepend to the ffmpeg invocation. 
Uses the same host_agent subtype probe (``host:board_subtype`` in Redis) that envelope resolution already uses, so the walker and viewer agree on what board they're targeting. * Dispatch matrix: - Pi 4 (V3D V4L2 M2M + rpi-hevc-dec) → ``-hwaccel drm`` for both H.264 and HEVC (the +rpt1 ffmpeg's v4l2_request path). - Pi 5 (Hantro G2) → ``-hwaccel drm`` for HEVC only. - Rock Pi 4 (rkvdec + Hantro VPU) → ``-hwaccel drm`` for both, same v4l2_request path as Pi 5. - x86 (VAAPI) → ``-hwaccel vaapi -hwaccel_device /dev/dri/renderD128`` for both. - Pi 2 / Pi 3 / unknown arm64 → no HW path mpv can address; SW decode is the only choice. * ``_transcode_to_target`` wraps the ffmpeg call: first attempt with hwaccel args, fall back to SW decode on ``sh.ErrorReturnCode`` (kernel driver weird, device busy, bitstream the v4l2_request decoder rejects). Logs the underlying ffmpeg stderr at WARNING so an operator chasing a slow walker sees the HW path failed. Tests pin every cell of the dispatch matrix + assert ``-hwaccel`` lands BEFORE ``-i`` in the argv (placing it after silently no-ops in ffmpeg) + the two-call SW-fallback path on simulated HW init failure. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(server-image): extend +rpt1 ffmpeg pin to anthias-server too The walker's HW-decode optimization (``processing._decode_hwaccel_args`` emits ``-hwaccel drm``) only works against the Raspberry Pi repo's ``+rpt1`` ffmpeg build, which has ``--enable-v4l2-request``. The pin was previously only on the viewer image (Dockerfile.viewer.j2 in ``ba8d4709``), so the celery container — which runs the walker — kept the stock Debian ffmpeg and the hwaccel call silently fell back to SW on every board. * New ``docker/_rpt1-ffmpeg-pin.j2`` extracts the pin block. * Both ``Dockerfile.viewer.j2`` and ``Dockerfile.server.j2`` now include it via ``{% include '_rpt1-ffmpeg-pin.j2' %}``. 
Server also re-runs ``apt install --reinstall ffmpeg libav`` so the pinned version replaces whatever the base layer installed. No effect on Pi 2 / Pi 3 / x86 boards — the include's ``{% if board in ('pi4-64', 'pi5', 'arm64') %}`` keeps it inert there. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * perf(celery,viewer): four hardening fixes so the player survives an upgrade Live testing on Pi 4 / Pi 5 / Rock Pi 4 surfaced four scenarios where a single ``docker compose pull && up -d`` (or any upgrade that invalidates the playback envelope) wedges the device. These aren't test-harness flakes; production operators on the same hardware would hit them. All four belong in this PR alongside the features that exposed them. 1. Walker drip-feed — ``regenerate_for_envelope_change`` previously queued every stale ``normalize_video_asset`` in one beat tick. ``--concurrency=1`` serialises execution but the celery worker fetches the next task the instant the previous finishes, so a 100-asset catalog turns into hours of back-to- back libx265 with zero recovery windows between encodes. Switch to ``apply_async(args=..., countdown=N * 60)`` so each subsequent normalize starts at least 60 s after the previous was queued. Operator can flip ``is_processing=False`` on a row mid-window to cancel its turn. 2. ``mem_limit`` on celery container — cgroup CPU isolation alone doesn't stop libx265-4K from allocating ~1.5 GB resident memory, which on a 4 GB SBC pushes the system into swap and starves sshd + the viewer. Match the cpus cap with a memory cap (60% of host RAM, computed in ``bin/upgrade_containers.sh``). 3. ``stop_grace_period: 3s`` + ``stop_signal: SIGKILL`` on viewer — cage doesn't reliably release DRM master on SIGTERM (its libinput shutdown path hangs on certain kernels) and the kernel's GPU driver leaves dangling references that prevent the next ``up`` from acquiring DRM master. 
Skipping the SIGTERM-then-wait dance on intentional restarts gets the device past cage's bug deterministically. 4. libx265 / libx264 ``-preset superfast`` — was ``medium``. Asset processing is upload-time and only runs once per asset, so the 5-10× wallclock speedup is operator-facing throughput. The ~10-20% bitrate increase is invisible on typical signage content. Viewer decode is HW regardless of preset. Tests: * Walker test mocks switched from ``.delay`` to ``.apply_async``; signatures updated for ``args=(...,)`` + ``countdown=`` kwarg. * New ``test_regenerate_walker_spaces_dispatches_via_countdown`` asserts the countdowns are ``[0, 60, 120, ...]`` across a 5-asset catalog so the drip-feed contract is pinned. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(tests): use sh.ErrorReturnCode_1 in hwaccel fallback test sh.ErrorReturnCode is the abstract base; its __init__ does `self.exit_code = self.exit_code` which AttributeErrors unless the concrete numeric subclass (ErrorReturnCode_1, _2, ...) is used. Every other call site in this file already uses ErrorReturnCode_1 — this was the lone outlier introduced with the SW-fallback test in `0340b4f4`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor(asset-processor): drop on-device video transcoding On-device libx265 transcode wedged a Pi 4's celery worker for 99 min on a single 4K60 H.264→HEVC pass during PR validation. Every supported board already HW-decodes both H.264 and HEVC via the viewer's per-board mpv hwdec dispatch (drm-copy / vaapi-copy / v4l2m2m-copy), so the re-encode provided no playback benefit for the codecs operators actually upload. - ``normalize_video_asset`` now runs ffprobe and writes codec / dims / fps / duration into ``metadata``; the asset file is never rewritten. - Removes the envelope module, the re-render walker (``regenerate_for_envelope_change``), and the server-start envelope cache reconciliation hook. 
- Drops 33 transcode / envelope / sibling-original tests. Image normalisation (HEIC/HEIF/TIFF/BMP/ICO/TGA/JP2/AVIF → WebP) is unchanged. The viewer-side per-board hwdec dispatch and host_agent board-subtype publishing are unchanged. For codecs the target board can't HW-decode (MPEG-2, MPEG-4 ASP, ...) the operator's recovery is to upload a transcoded copy; the metadata fields surfaced here let them see codec / dims / fps in the asset list before pushing the asset to the field. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(asset-processor): gate uploads to hardware-decoded codecs only After ffprobe, ``normalize_video_asset`` now compares the source codec against the board's HW-decode set (mirroring the viewer's ``_PI_HWDEC_BY_CODEC``). Uploads outside the set are rejected with an error message that includes the rejected codec, the board's supported codecs, and an ``ffmpeg`` command line the operator can run on their workstation to transcode the source. Per-board HW decode set: - pi2 / pi3 → {h264} - pi4-64 / rockpi4 / x86 → {h264, hevc} - pi5 → {hevc} (no H.264 v4l2-request decoder mpv can reach) - arm64 catch-all → ∅ (operator must install a board-specific image) Also extracts ``DEVICE_TYPE`` → board-key resolution into a new ``anthias_common.board`` module so the server's gate and the viewer's hwdec dispatch share the same logic — eliminates the duplicated ``_redis_board_subtype`` mirror in ``media_player.py``. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(dashboard): surface unsupported-codec failures with copyable recipe UI/UX review of the gate's failure path surfaced two P0s and a few smaller nits: - The error message was only reachable via a native browser ``title`` tooltip on the Failed pill — invisible on touchscreens, can't be copied, leaks the ``UnsupportedVideoCodecError:`` class prefix into the aria-label. 
- The Edit Asset modal showed nothing about the failure — exactly the place the operator goes to act on a failed row. Changes: - ``UnsupportedVideoCodecError`` now carries the ffmpeg recipe as a ``recipe`` attribute. ``_NormalizeAssetTask.on_failure`` writes the bare message into ``metadata.error_message`` (no class-name prefix) and persists the recipe to ``metadata.error_recipe``. - ``_asset_row.html`` Failed pill becomes a button — click opens the Edit Asset modal. - ``_asset_modal.html`` renders a warning banner at the top of the Edit form when ``metadata.error_message`` is set, with the recipe inside a copyable ``<code>`` block + "Copy command" button. - ``_ffmpeg_reencode_recipe`` substitutes the operator's upload filename (stashed in ``metadata.upload_name`` at upload time) for the ``INPUT`` placeholder so the recipe is paste-ready. - Toast text shortened from "analysing video…" to "reading metadata…" (the ffprobe pass is sub-second now that there's no transcode). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(processing): give recipe output a codec suffix so it doesn't overwrite input E2E validation on a Pi 5 surfaced a recipe like: ffmpeg -i 'sample-h264.mp4' -c:v libx265 ... 'sample-h264.mp4' — input and output point at the same file because both got the upload's stem + ``.mp4`` suffix. Operator pasting the recipe would overwrite their source. The fix gives the output filename a target- codec marker (``sample-h264.hevc.mp4`` / ``sample-h264.h264.mp4``) so the recipe is safe to copy-paste even when the upload's extension already matches the output container. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore: drop transcode-era defensive hardening on celery + server image These guards were load-bearing while the asset processor ran libx264 / libx265 transcodes; with the on-device transcode pipeline gone they're dead code defending against a workload that no longer exists. 
Removed: - ``cpus: ${CELERY_CPU_LIMIT}`` / ``cpus: 2.0`` cgroup CPU caps on anthias-celery (every compose template) - ``nice -n 19 ionice -c 3`` wrapper on the celery command - ``--concurrency=1`` on celery worker; default celery concurrency is fine when the only tasks are ffprobe + Pillow conversion - ``CELERY_CPU_LIMIT`` calc in ``bin/upgrade_containers.sh`` - ``_rpt1-ffmpeg-pin.j2`` include + reinstall layer in ``Dockerfile.server.j2``; the +rpt1 ffmpeg was only needed for the walker's ``-hwaccel drm`` transcode. The server now only runs ffprobe, which the stock Debian ffmpeg handles fine (smaller server image, simpler base) - Stale ``ffprobe → passthrough or libx264/aac transcode`` section header in processing.py Kept: - ``mem_limit: ${CELERY_MEMORY_LIMIT_KB}k`` on celery — still a useful safety net against a decompression-bomb fixture or runaway ffprobe - ``+rpt1`` ffmpeg pin on the viewer image — still load-bearing for mpv's ``v4l2_request`` HW decode on Pi 4 / Pi 5 / Rock Pi 4 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore: keep nice -n 19 ionice -c 3 on celery Cheap insurance against pathological inputs (decompression-bomb HEIC, runaway ffprobe). Brought back across all four compose templates after stripping the CPU cap + --concurrency=1 in the prior cleanup. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(dashboard): address review feedback on codec gate UX * Plain-HTTP clipboard fallback. navigator.clipboard.writeText only resolves on secure origins, so on a LAN device (HTTP) the Copy command button silently failed. Add a window.fallbackCopyToClipboard helper that uses execCommand('copy') against an off-screen textarea, and have the inline copyRecipe() try it whenever navigator.clipboard isn't available or rejects. The recipe block also gets user-select:all so keyboard-copy still works if both paths fail. * Friendlier message for the arm64 catch-all branch. "Supported: none." 
read like the board literally has no decoder; replace with an explanation that the board hasn't reported a subtype yet and a pointer at the board-specific image. * Lock the gate (_HW_DECODE_VIDEO_CODECS) and the viewer dispatch (_PI_HWDEC_BY_CODEC) together with a consistency test so a future edit to one table can't quietly diverge from the other. * Cover the shell-quoting of recipe filenames with hostile-name parametrize cases (single quote, backtick, $(), ;) so a copy-paste recipe can't be turned into command injection. * Drop the stale "cgroup CPU cap" line from processing.py's module docstring — the cap was removed in `f85f8035`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix: address post-review feedback on codec gate / hwdec dispatch - processing: prefer the upload's extension token when ffprobe's format_name is a synonym list, so an .mp4 surfaces as container=mp4 (not mov, the first synonym). - bin/start_viewer.sh: drop the loose `-dec` catch-all from the v4l2 decoder match; keep the explicit rkvdec/cedrus/hantro/ -vpu-dec prefixes. - media_player: cap the ANTHIAS_DEBUG_DROPS mpv.log at 64 MB with a rolling truncate so a forgotten-on flag can't grow the disk. - tests: rename test_set_board_subtype_does_not_raise_on_redis_failure to test_set_board_subtype_propagates_redis_failures — matches what the test actually asserts. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-18 11:46:02 +02:00
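The codec gate and the paste-ready recipe described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: the per-board sets are copied from the commit message, while ``HW_DECODE_VIDEO_CODECS``, ``ENCODER_BY_CODEC``, ``is_supported`` and ``reencode_recipe`` are hypothetical names, and the real recipe carries further ffmpeg options that the commit message elides and that are not reproduced here.

```python
import shlex
from pathlib import PurePosixPath

# Per-board hardware-decode sets, as listed in the commit message.
HW_DECODE_VIDEO_CODECS = {
    'pi2': frozenset({'h264'}),
    'pi3': frozenset({'h264'}),
    'pi4-64': frozenset({'h264', 'hevc'}),
    'rockpi4': frozenset({'h264', 'hevc'}),
    'x86': frozenset({'h264', 'hevc'}),
    'pi5': frozenset({'hevc'}),
    'arm64': frozenset(),  # catch-all: no certified codec
}

# Assumption: software encoders used in the operator-side recipe.
ENCODER_BY_CODEC = {'h264': 'libx264', 'hevc': 'libx265'}


def is_supported(board, codec):
    """Return True when the board's HW-decode set contains the codec."""
    return codec in HW_DECODE_VIDEO_CODECS.get(board, frozenset())


def reencode_recipe(upload_name, target_codec):
    """Build a shell-safe ffmpeg command line for the operator.

    The output name gets a target-codec marker so it can never be the
    same path as the input, even when the extension already matches.
    """
    stem = PurePosixPath(upload_name).stem
    output = f'{stem}.{target_codec}.mp4'
    return ' '.join([
        'ffmpeg', '-i', shlex.quote(upload_name),
        '-c:v', ENCODER_BY_CODEC[target_codec],
        shlex.quote(output),
    ])
```

Quoting with ``shlex.quote`` means a hostile filename (single quote, backtick, ``$()``, ``;``) survives a copy-paste as one literal argument; splitting the recipe back with ``shlex.split`` returns the original name unchanged.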
Viktor Petersson	d2ebdd5450	fix(api): v1/v1.1 normalize dispatch + stuck-row reconciler (#2870 ) (#2873 ) * fix(api): dispatch normalize for v1/v1.1 video uploads + reconcile stuck rows (#2870) * Wire normalize-task dispatch into v1 and v1.1 ``AssetListView.post`` via a shared ``dispatch_pending_normalize`` helper; before this, v1/v1.1 video uploads landed in whatever codec the operator sent and the per-board passthrough set silently dropped non-h264/hevc files from rotation forever. * Set ``_pending_normalize='video'`` / ``is_processing=1`` on local video uploads in ``CreateAssetSerializerV1_1``. Image normalisation stays opt-in via v1.2 / v2 — promoting it onto the legacy endpoints would shift synchronous availability semantics for older clients. * Add ``reconcile_stuck_processing`` celery-beat task (10-min cadence, 60-min threshold) to recover rows whose normalize task was lost (worker SIGKILL, OOM, backup restore that bypassed dispatch). Ages rows via ``metadata.processing_started_at`` — stamped by each dispatch helper, granted on first sight for legacy / restored rows. * Drop ``hurry.filesize`` for ``django.template.defaultfilters.filesizeformat``. ``free_space`` on ``/api/v1/info``, ``/api/v1.1/info``, ``/api/v2/info`` changes from ``"15G"`` → ``"15.0 GB"`` (NBSP separator, decimal, full unit). Same change on home-page disk fields. * Bump version to 2026.05.1 in pyproject.toml, package.json, uv.lock. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test: patch v1.1 validate_uri + pass ext on local-upload fixtures CI failures on the new dispatch tests: * v1 / v1.1 video and image tests hit ``validate_uri('/data/…')`` → 500 ``Invalid file path``. The mixin patch only covered ``serializers.mixins.validate_uri``; v1 / v1.1 import the same helper as ``serializers.v1_1.validate_uri``. Patch both binding sites. * v1.2 / v2 image test expected ``dispatch_normalize_image`` once but got zero. 
The mixin's rename writes ``<asset_id><ext>``; without ``ext`` in the payload the file lands extensionless and ``needs_image_normalisation`` returns False. Pass ``'ext': '.heic'``. * Also pass ``'ext': '.mp4'`` on the video payload — mixin uses it for the rename destination too; v1 / v1.1 ignore it. * Re-run ``ruff format`` on tests/test_processing.py — pre-push run used ``ruff check`` only. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(reconciler): handle naive timestamps, single-flight via Redis lock Addresses Copilot review on #2873: * ``_parse_processing_started_at`` now returns ``None`` for naive (no-tzinfo) ISO-8601 strings. The dispatch helpers stamp via ``timezone.now()`` (tz-aware under ``USE_TZ=True``), so a naive value is by definition a hand-edit of metadata — comparing it to the tz-aware ``cutoff`` would raise ``TypeError`` and abort the sweep for every other stuck row. Route it through the stamp-on-first-sight branch instead. * Wrap the reconciler body in a SETNX + Lua-compare-and-delete lock on ``RECONCILE_STUCK_LOCK_KEY`` — same pattern as ``revalidate_asset_urls``. Default Anthias runs one worker with embedded beat (so single-flighted today), but the lock bounds blast radius if a future deploy ever runs two workers with ``-B``. * Rewrite the ``dispatch_download`` docstring: stamping ``processing_started_at`` does not let the reconciler re-download — it routes via mimetype='video' through ``normalize_video_asset``, whose ``path.isfile`` guard fails fast and ``on_failure`` writes ``metadata.error_message`` so the operator sees a "Failed" pill instead of a row stuck on "Processing". Tests: naive-timestamp recovery + lock-prevents-overlap + lock-released-after-clean-run. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor(processing): make stamp helper public, tighten reconciler types Addresses second Copilot pass on #2873: * Rename ``_stamp_processing_start`` → ``stamp_processing_start``. 
The single underscore marked it as module-private but it's already called cross-package (anthias_common.youtube.dispatch_download, anthias_server.celery_tasks.reconcile_stuck_processing). Promoting to a public symbol matches its actual API surface; docstring now states the cross-module contract explicitly. * Wrap the read-modify-write in ``transaction.atomic()``. SQLite is single-writer at the file level so the wrap is mostly documenting intent today, but on Postgres / MySQL it produces an actual transaction boundary against a near-simultaneous operator PATCH or worker landing ``error_message``. * Fix the docstring claim of "no-op via the filter().update() pattern" -- the code is a SELECT-then-UPDATE; the no-op semantics come from the ``filter().first() is None`` guard, not from the update query shape. * Tighten ``_parse_processing_started_at`` return type from ``Any`` to ``datetime | None``. Pull ``datetime`` into the module-level ``from datetime import …`` so the annotation resolves; drop the redundant local imports inside the parser and reconciler bodies. * Fix the reconciler docstring's "10-min grace window" -- the actual grace is ``RECONCILE_STUCK_THRESHOLD_S`` (60 min) from the first-sweep stamp, with the 10-min cadence just dictating how often the row is re-checked. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: honest race acknowledgement on stamp helper + fix OpenAPI example Round-3 Copilot review on #2873: * ``stamp_processing_start`` docstring: replace the implied "transaction.atomic() makes this safe" framing with a clear acknowledgement that this is a read-modify-write race against arbitrary other metadata writers (operator PATCH, normalize task landing error_message). Document why we accept it: stamps fire at dispatch time, before the worker picks the task up, so the practical window is microscopic -- matches the same race the wider codebase already accepts in ``_set_processing_error`` et al. 
``select_for_update()`` would close it on Postgres / MySQL but is a no-op on SQLite (Anthias's only deployed backend), so we'd pay complexity for zero practical safety today. The ``transaction.atomic()`` wrap stays — cheap savepoint that survives a future multi-writer-backend switch. * ``InfoViewMixin`` OpenAPI ``example`` field: update ``free_space`` from ``'10G'`` to ``'10.0 GB'`` so the generated docs match the new ``filesizeformat`` shape (NBSP separator, decimal, full unit). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs: tighten reconciler constants' cost + threshold rationale Round-4 Copilot review on #2873: * ``RECONCILE_STUCK_INTERVAL_S`` comment: the previous "one SELECT, one UPDATE per stuck row" undercounted — each per-row branch itself does SELECT + UPDATE inside ``stamp_processing_start``, and the re-dispatch branch may do more on top. Rewrite to reflect the actual cost shape: the initial SELECT plus per-row stamp / dispatch work, with the observation that a healthy fleet has an empty filtered set so the tick costs one SELECT total. * ``RECONCILE_STUCK_THRESHOLD_S`` comment: removed the claim that a row past the threshold "has had its worker time-limit expire" — rows can be stuck for reasons that never involved a worker picking the task up (Redis flake during ``.delay()``, container restart between enqueue and accept). Rewrite to call out both classes of failure so the threshold's role is honest: "if a row has carried is_processing=True for over an hour, something went wrong — recover it." Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 15:46:55 +01:00
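The naive-timestamp handling and the 60-minute threshold from the reconciler commits above can be illustrated with a small standard-library sketch. The function names and the three-way ``classify`` result are hypothetical; only the behaviour (naive or unparseable values go down the stamp-on-first-sight path, rows older than the threshold are recovered) is taken from the commit messages.

```python
from datetime import datetime, timedelta, timezone

# 60-minute threshold, as stated in the commit message.
RECONCILE_STUCK_THRESHOLD_S = 60 * 60


def parse_processing_started_at(value):
    """Return a tz-aware datetime, or None for missing/naive/invalid input."""
    if not isinstance(value, str):
        return None
    try:
        parsed = datetime.fromisoformat(value)
    except ValueError:
        return None
    # A naive value cannot be compared with a tz-aware cutoff
    # (that comparison raises TypeError), so treat it as unstamped.
    if parsed.tzinfo is None or parsed.utcoffset() is None:
        return None
    return parsed


def classify(metadata, now):
    """Decide what a sweep would do with one is_processing row.

    'stamp'   -> no usable timestamp yet; stamp it and check again later
    'wait'    -> stamped, but still inside the threshold
    'recover' -> stamped longer ago than the threshold
    """
    started_at = parse_processing_started_at(
        metadata.get('processing_started_at')
    )
    if started_at is None:
        return 'stamp'
    cutoff = now - timedelta(seconds=RECONCILE_STUCK_THRESHOLD_S)
    return 'recover' if started_at <= cutoff else 'wait'
```

Returning ``None`` for a naive value keeps one hand-edited row from aborting the sweep for every other stuck row, which is the failure mode the commit describes.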
Viktor Petersson	2d0132b6a4	feat(processing): normalise HEIC/HEIF/TIFF images and exotic-codec videos at upload time (#2832 ) * feat(processing): add upload-time normalisation for images and exotic-codec videos Two new Celery tasks run on every fresh upload, mirroring the ``download_youtube_asset`` async pattern: * ``normalize_image_asset`` converts HEIC / HEIF / TIFF to lossless WebP via Pillow + pillow-heif, preserving alpha. Other image formats short-circuit out as a no-op. * ``normalize_video_asset`` ffprobes the upload, passes through if it's already H.264/HEVC in an accepted container with a viewer- friendly audio codec, otherwise transcodes to H.264 + AAC MP4 with ``-threads 2 -preset medium -crf 23`` so two cores stay free for the on-device viewer. Both tasks land their output via a staging-file rename, write ``Asset.metadata`` flags (``original_ext``, ``transcoded`` / ``converted``, ``error_message``), and clear ``is_processing`` on success — or via a custom ``Task.on_failure`` on permanent failure so a row never stays stuck on the "Processing" pill. Schema: * New ``Asset.metadata`` JSONField (default dict) plus migration. Exposed read+write on ``AssetSerializerV2`` (read-only on v1.x). Wiring: * ``CreateAssetSerializerMixin.prepare_asset`` flags ``is_processing`` and stashes ``_pending_normalize`` (``image``/``video``/``None``); ``AssetListViewV2`` and ``AssetListViewV1_2`` dispatch the matching task after persistence. * The HTMX ``assets_upload`` view now persists the source extension on disk so the task can identify the format, replaces the ``probe_video_duration`` hop with ``normalize_video_asset`` (whose passthrough branch is the same probe + duration path), and dispatches ``normalize_image_asset`` for HEIC/HEIF/TIFF. * Add-asset modal accepts the wider extension list. Resource control: * ``anthias-celery`` worker command wrapped with ``nice -n 19 ionice -c 3`` in compose templates so transcodes never starve the on-device viewer. 
* ``ffmpeg`` invocation pins ``-threads 2`` for the same reason. Dependencies: * New: Pillow, pillow-heif (Python); libheif1 in ``base_apt_dependencies`` (~1 MB extracted). * No changes to ffmpeg/ffprobe — already runtime deps. Tests: * ``tests/test_processing.py`` covers both tasks: HEIC/HEIF/TIFF conversion (incl. uppercase ext, RGBA handling), JPEG no-op, corrupt-input failure path, six-row passthrough decision table, exotic-codec → H.264 transcode (mpeg2, mjpeg), MP4-with-non-H264 in-place transcode, ffmpeg timeout/failure/zero-byte cleanup, ffprobe missing-stream parsing, on_failure metadata write, prepare_asset routing for HEIC / video / remote URL / JPEG. * PDF support is deferred to a follow-up — out of scope here per the issue's "image/video first" framing. * ci(mypy): include Pillow in mypy group so processing.py type-checks * fix(processing): address Copilot review comments * assets_upload now falls back to UploadedFile.content_type and finally an extension-based classification so HEIC/HEIF/TIFF uploads still classify on hosts whose mimetypes DB doesn't ship ``image/heic`` mappings. * _ffprobe_summary derives the container from ffprobe's ``format.format_name`` (a comma-joined synonym list — pick the first token in the passthrough set) instead of trusting the filename extension. A ``.bin`` file containing MP4 bytes now classifies correctly; a ``.mp4`` file containing avi-only bytes no longer slips into the passthrough branch. * Zero-byte ffmpeg output now removes the staging file before raising, mirroring the timeout/error branches above. All three failure paths share a small _drop_staging() helper so cleanup stays consistent. New tests: * ffprobe summary prefers format_name over filename, with a deterministic fallback to the extension when format_name is absent. * zero-byte transcode output cleans up its staging file (asserts no leftover ``staging`` files in assetdir). * assets_upload classifies HEIC via Content-Type when guess_type returns None. 
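The container-detection rule from the Copilot-review fix above (prefer ffprobe's ``format.format_name``, fall back to the extension only when it is absent) might look roughly like this. ``PASSTHROUGH_CONTAINERS`` and ``pick_container`` are hypothetical; the accepted-container set is an assumption, since the commit message does not list it, and reporting the first synonym when nothing matches is also an assumption.

```python
import os

# Assumption: illustrative set of accepted containers.
PASSTHROUGH_CONTAINERS = frozenset({'mov', 'mp4', 'matroska'})


def pick_container(format_name, filename):
    """Derive the container from ffprobe's comma-joined synonym list."""
    if format_name:
        tokens = [t.strip().lower() for t in format_name.split(',')]
        for token in tokens:
            if token in PASSTHROUGH_CONTAINERS:
                return token
        # format_name is present but not accepted: report what ffprobe
        # saw rather than trusting the filename extension.
        return tokens[0]
    # Deterministic fallback when ffprobe gave no format_name.
    return os.path.splitext(filename)[1].lstrip('.').lower()
```

With this shape a ``.bin`` file holding MP4 bytes classifies by its content, and a ``.mp4`` file holding AVI bytes is reported as ``avi`` and so stays out of the passthrough branch.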
* test(processing): drop /tmp/ paths from ffprobe summary tests SonarCloud's python:S5443 flags hardcoded ``/tmp/`` paths as "publicly writable directory" usage. The flagged lines pass these strings as labels to ``_ffprobe_summary`` (whose internals are mocked away in those tests) -- only the extension is consumed by the real code path. Switching to ``fixture.<ext>`` keeps the test intent clear and silences the security hotspot. * feat(processing): pick transcode target per board (H.264 vs HEVC) The previous pipeline always emitted H.264, which is wasteful on boards whose player can hardware- or software-decode HEVC: a typical clip re-encodes ~30-50% smaller at perceptual parity. Introduce a board-profile grid keyed on ``DEVICE_TYPE`` (set at image-build time in the Dockerfile) so each device gets the codec its on-device player actually decodes well:

┌─────────┬─────────────────┬─────────────────┬──────────────┐
│ Board   │ Player          │ HEVC OK?        │ Target codec │
├─────────┼─────────────────┼─────────────────┼──────────────┤
│ pi2/pi3 │ VLC + mmal-vc4  │ no HW, slow CPU │ H.264        │
│ pi4-64  │ mpv + V4L2 HEVC │ HW-decoded      │ HEVC         │
│ pi5     │ mpv + SW decode │ A76 SW @ 1080p  │ HEVC         │
│ x86     │ mpv + va/nv/qsv │ HW-decoded      │ HEVC         │
│ unset   │ (dev / unknown) │ assume no       │ H.264        │
└─────────┴─────────────────┴─────────────────┴──────────────┘

Per-board passthrough also tightens: an HEVC upload to a pi3 device no longer slips through unplayable -- it gets transcoded to H.264. Conversely, an H.264 upload on pi5 still passes through unchanged (no point re-encoding to HEVC on a row that already plays). The ``Asset.metadata['transcode_target']`` field now records the codec the device wanted, written on both passthrough and transcode paths so the operator can see "this device wanted hevc, the upload already was hevc, no work needed" without inferring. * ``_BOARD_PROFILES`` maps each ``DEVICE_TYPE`` value the image builder emits to ``{transcode_target, passthrough_video_codecs, video_args}``. 
``_resolve_board_profile`` reads the env var. * ``_video_can_passthrough`` and ``_transcode_to_target`` accept an optional profile arg; tests pin the profile per case rather than mutating env (and one parametrised test still uses env so the resolve path is exercised end-to-end). * HEVC encode args include ``-tag:v hvc1`` for broader player compat (mpv/VLC don't care, but iOS / browsers prefer hvc1 over hev1). * libx265 CRF 28 chosen as the rough perceptual equivalent of libx264 CRF 23 — matches the heuristic in libx265's own docs. Tests: * New parametrised tests for the codec grid: per-board target codec resolution, per-board passthrough decision, per-board ffmpeg argv (including ``-tag:v hvc1`` only on HEVC boards), pi3 + HEVC source → libx264 transcode, pi5 passthrough records target codec. * Updated existing passthrough test to pin DEVICE_TYPE=pi5 since the default profile is now H.264-only. * feat(youtube,ui): chain YouTube into the same processing pipeline + error pill Two unifications driven by the same goal — every "row processing" state and every "row failed" state should look identical to the operator regardless of which celery task handled the row. YouTube → normalize_video_asset chain ------------------------------------- ``download_youtube_asset`` no longer terminates the row's in-flight state on its own. After yt-dlp lands the .mp4 it: * writes ``metadata['source']='youtube'`` and ``metadata['source_url']`` so an operator can recover the original URL after ``name`` is overwritten with the resolved title, * leaves ``is_processing=True``, * dispatches ``normalize_video_asset`` to take over. The chained pass runs ffprobe and decides per-board passthrough vs. transcode using the codec grid landed in this PR. That matters because yt-dlp's ``format_sort: vcodec:h264`` is a preference, not a guarantee — when no H.264 rendition is available yt-dlp falls back to whatever it can get (vp9 webm, av1, ...). 
Without the chain, those downloads would land on a pi3 device unplayable. With the chain, the same codec grid that protects file uploads protects YouTube downloads too, and the row carries the same metadata shape (``original_ext``, ``transcoded``, ``transcode_target``). Failure-path unification ------------------------ ``_DownloadYoutubeTask.on_failure`` now reuses ``processing._set_processing_error`` + ``processing._notify`` — single source of truth for the error_message contract instead of two near-duplicate blocks. A failed YouTube download now writes ``metadata.error_message`` (``DownloadError: 404 Not Found`` etc.) exactly like a failed normalisation does. UI: error pill -------------- The asset table row template renders a warn-coloured "Failed" pill (in the column previously occupied by the active toggle) when ``metadata.error_message`` is populated and ``is_processing`` is clear. The full message rides along on the title/aria-label so the operator can hover for context — no extra modal needed. Same shape as the existing ``processing-pill`` so the column layout stays stable across in-progress / failed / done states. Tests ----- * ``test_download_youtube_asset_success_chains_into_normalize_video`` — happy path now asserts ``is_processing=True`` post-task and ``dispatch_normalize_video`` was called with the asset_id. * ``test_download_youtube_asset_on_failure_writes_error_metadata`` — replaces the old "clears processing" test; asserts both ``is_processing=False`` and the ExceptionType+message in ``metadata.error_message``. * Three other YouTube tests updated to mock ``dispatch_normalize_video`` so they don't hit a real broker. * ``test_asset_row_renders_error_pill_when_processing_failed`` and ``test_asset_row_no_error_pill_when_metadata_clean`` lock in the template's pill rendering. 
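The per-board codec grid that both file uploads and chained YouTube downloads pass through, as it stood at this point in the history, can be sketched as a small lookup table. The key names mirror the ``{transcode_target, passthrough_video_codecs, video_args}`` shape from the commit message; the passthrough sets are inferred from the examples given there (pi3 rejects HEVC, pi5 passes H.264) and should be read as assumptions. The real decision also looks at container and audio codec, which this sketch leaves out.

```python
import os

# Encoder arguments named in the commit messages.
H264_VIDEO_ARGS = ('-c:v', 'libx264', '-preset', 'medium', '-crf', '23')
HEVC_VIDEO_ARGS = ('-c:v', 'libx265', '-crf', '28', '-tag:v', 'hvc1')


def _profile(target, passthrough, video_args):
    """Build one profile dict in the shape the commit message describes."""
    return {
        'transcode_target': target,
        'passthrough_video_codecs': frozenset(passthrough),
        'video_args': video_args,
    }


H264_PROFILE = _profile('h264', {'h264'}, H264_VIDEO_ARGS)
HEVC_PROFILE = _profile('hevc', {'h264', 'hevc'}, HEVC_VIDEO_ARGS)

BOARD_PROFILES = {
    'pi2': H264_PROFILE,
    'pi3': H264_PROFILE,
    'pi4-64': HEVC_PROFILE,
    'pi5': HEVC_PROFILE,
    'x86': HEVC_PROFILE,
}


def resolve_board_profile(environ=None):
    """Look up the profile for DEVICE_TYPE; unset or unknown means H.264."""
    environ = os.environ if environ is None else environ
    return BOARD_PROFILES.get(environ.get('DEVICE_TYPE', ''), H264_PROFILE)


def plan(source_codec, profile):
    """Return 'passthrough' or 'transcode' for a probed video codec."""
    if source_codec in profile['passthrough_video_codecs']:
        return 'passthrough'
    return 'transcode'
```

Passing the profile in explicitly, rather than reading the environment inside ``plan``, matches the commit's note that tests pin the profile per case instead of mutating env.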
* feat(processing): normalise BMP, ICO, TGA, JPEG 2000, and AVIF to WebP

Extends the image-normalisation pipeline to cover the realistic set of "operator drags an unusual image format into the upload modal" cases, all handled by Pillow's built-in decoders without a new apt or wheel dependency:

┌──────────┬────────────────────────────────────────────────────┐
│ Format   │ Why we want it converted                           │
├──────────┼────────────────────────────────────────────────────┤
│ BMP      │ Uncompressed; a 4K BMP is ~30 MB vs ~1 MB as WebP. │
│ ICO      │ Multi-frame Windows icon; pick the largest, flatten│
│ TGA      │ Screenshot tools / game asset exports; no browser  │
│          │ support.                                           │
│ JPEG2000 │ .jp2/.j2k/.jpx/.jpc/.jpf: scanner output; no       │
│          │ browser support.                                   │
│ AVIF     │ Modern phone exports. Chromium 85+ renders AVIF,   │
│          │ but the legacy Pi 2/3 Qt5 WebEngine predates it,   │
│          │ so converting on upload means one playback path    │
│          │ across the fleet.                                  │
└──────────┴────────────────────────────────────────────────────┘

JPEG / PNG / WebP / GIF / SVG remain untouched: already viewer-friendly and well-compressed.

Implementation:

* Extend ``NORMALIZE_IMAGE_EXTS``; the rest of the pipeline already accepts any extension in this set (RGBA conversion happens inside ``_convert_image_to_webp`` regardless of source format).
* Replace the duplicate extension set in ``assets_upload`` with a call to ``processing.needs_image_normalisation`` so the source of truth lives in one place.
* Widen the upload modal's <input accept> attribute.

Tests:

* ``test_image_normalises_to_lossless_webp_across_formats`` is a parametrised matrix that round-trips each new format end-to-end: source synthesised via Pillow, runs through ``_run_image_normalisation``, asserts the WebP output decodes cleanly back to a 16x16 image. Catches both decoder-side regressions (Pillow drops a format) and writer-side regressions (RGBA convert mode breaks one source).
* ``test_needs_image_normalisation`` extended to cover every entry in the new set plus negative cases (.jpg/.png/.webp/.gif/.svg stay False). Total: 109 image-format assertions. * fix(processing): address Copilot review on commit `8602faff` Six items from Copilot's fresh review pass: * ``assets_upload`` last-resort image-extension allowlist now derives from ``processing.NORMALIZE_IMAGE_EXTS`` rather than duplicating the set. Adding a new normalisable format (or removing one) only touches one place. * ``_run_image_normalisation`` cleans up the ``.webp.tmp`` staging file on every failure path — Pillow's ``UnidentifiedImageError`` and a generic OSError mid-encode (disk pressure, libheif crash). Mirrors the video pipeline's _drop_staging contract. * ffmpeg failure messages decode the bytes ``stderr`` to UTF-8 text (with replacement on malformed bytes) and tail-trim long output, so ``metadata.error_message`` reads as a real diagnostic instead of ``b'Invalid data found'``. * Removed the dead ``path.normpath(staging) == path.normpath( src_uri)`` branch in the video transcode path. With the staging suffix in place the two paths can never collide; expanded the surrounding comment to explain why. * Updated ``normalize_video_asset``'s docstring to describe the per-board codec grid (libx264 on pi2/pi3, libx265 on pi4-64 / pi5 / x86) rather than the now-stale "transcode to H.264 MP4". * Fixed "truecate" → "truncate" typo in ``_asset_row.html`` comment. New tests: * ``test_image_partial_write_cleans_staging`` — half-writes the ``.webp.tmp`` then raises OSError; asserts the runner removes the partial file before propagating. * ``test_format_subprocess_stderr_decodes_and_trims`` — covers the bytes-decode, malformed-byte-replacement, tail-trim, and empty-stderr cases for the new helper. * ``test_video_ffmpeg_error_cleans_staging`` strengthened to assert the error message contains no ``b'...'`` Python repr prefix — it's now operator-readable text. 
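The stderr handling described above (decode bytes with replacement on malformed input, keep only the tail of long output) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's ``_format_subprocess_stderr``: the function name, the 2048-byte budget, and the choice to trim the raw bytes before decoding are assumptions made for this sketch.

```python
from typing import Optional, Union

# Assumed budget for this sketch; the real constant's value is not
# stated in the commit message.
STDERR_TAIL_BYTES = 2048


def format_subprocess_stderr(stderr: Optional[Union[bytes, str]],
                             limit: int = STDERR_TAIL_BYTES) -> str:
    """Turn a subprocess's stderr into short, readable text."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    if not stderr:
        return ""
    # Normalise to bytes once so str and bytes inputs share one path.
    if isinstance(stderr, str):
        raw = stderr.encode("utf-8", errors="replace")
    else:
        raw = bytes(stderr)
    # Keep only the tail: the last lines usually carry the actual error.
    tail = raw[-limit:]
    # errors="replace" maps malformed bytes (including a multibyte
    # sequence cut in half by the trim) to U+FFFD instead of raising.
    return tail.decode("utf-8", errors="replace").strip()
```

With this shape, a failure message reads ``Invalid data found`` rather than the Python repr ``b'Invalid data found'``.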
* fix(processing): address Copilot review on commit `778d5c9f` Three code fixes plus a PR-description sync: * ``_run_image_normalisation`` no-op path (when src_ext isn't in NORMALIZE_IMAGE_EXTS) now also clears ``metadata.error_message``. Without this, a row re-uploaded as a JPEG/PNG after a previously-failed HEIC conversion would drop is_processing but keep showing the "Failed" pill — the operator's table would lie about the row's current state. * ``_ffprobe_summary`` now catches ``sh.CommandNotFound`` in addition to ``TimeoutException`` / ``ErrorReturnCode``. A stripped-down image / dev box without ffprobe in PATH used to crash the task with an unhandled CommandNotFound; now it collapses to the same all-'unknown' summary so the runner falls through to the transcode branch (which itself fails clean if ffmpeg is also missing — same on_failure contract). * Rewrote the ``_ffprobe_summary`` docstring: the actual behaviour is "unknown" for missing video stream, "none" only for genuinely missing audio stream. The previous "''" claim was wrong and would have misled callers / future maintainers. Tests: * ``test_image_no_op_path_clears_stale_error_message`` — JPEG re-uploaded over a row whose previous attempt failed; the no-op branch must wipe the stale error_message. * ``test_ffprobe_summary_handles_missing_ffprobe_binary`` — CommandNotFound side-effect; asserts all-'unknown' summary rather than a propagating exception. * fix(processing): address Copilot review on commit `42697452` Four contract gaps Copilot flagged: * ``normalize_image_asset`` / ``normalize_video_asset`` use ``autoretry_for=(OSError,)`` to recover from transient disk pressure. ``FileNotFoundError`` is-a ``OSError`` so the filter was catching it too — but a missing source file is permanent, and retrying just delays the on_failure that writes ``metadata.error_message``. 
Adding ``dont_autoretry_for=(FileNotFoundError,)`` to both decorators makes the missing-source raise propagate immediately, so the operator sees the "Failed" pill and the error message at the next browser refresh instead of waiting through up-to-3 exponential-backoff retry cycles. * ``_run_image_normalisation`` and ``_run_video_normalisation`` both call ``os.replace(staging, final_uri)`` after a successful conversion / transcode. A rename failure (cross-device link, filesystem-full at the very last step, permissions) was outside the existing try/except, so the staging file would linger until cleanup()'s 1h sweep. Wrap both in a try/except that calls ``_drop_image_staging`` / ``_drop_staging`` on any OSError before propagating — the "no leftover staging artifacts on failure" contract now holds across every failure path. Tests: * ``test_image_rename_failure_cleans_staging`` and ``test_video_rename_failure_cleans_staging`` — patch ``os.replace`` to raise OSError; assert the staging file is gone before the exception reaches the runner's caller. * ``test_normalize_tasks_exclude_filenotfounderror_from_autoretry`` — celery-config-time check that both tasks expose ``FileNotFoundError`` in their dont_autoretry_for tuple, so a future change to the decorator can't silently regress the immediate-fail contract. * docs(processing): align docstrings with current normalise scope Three stale docstring callouts from Copilot's review of `7099b25e`: * ``processing.py`` module docstring — listed only HEIC/HEIF/TIFF for the image task. Updated to enumerate the full set (HEIC/HEIF/TIFF/BMP/ICO/TGA/JPEG 2000 family/AVIF) and to call out the JPEG/PNG/WebP/GIF/SVG no-op short-circuit. * ``needs_image_normalisation`` docstring — same drift; rewrote the leading sentence to match what the predicate actually checks (``_ext(...) in NORMALIZE_IMAGE_EXTS``). * ``tests/test_processing.py`` module docstring — said the image task covers only HEIC/HEIF/TIFF and that video transcodes are libx264-only. 
Both stale: extended to enumerate every image format the suite exercises, and to describe the per-board ``DEVICE_TYPE`` codec grid (libx264 on pi2/pi3, libx265 on pi4-64/pi5/x86) that the parametrised video tests pin down. No code changes; documentation only. * fix(processing): exclude UnidentifiedImageError from autoretry Pillow's UnidentifiedImageError inherits from OSError, so it was getting caught by normalize_image_asset's autoretry_for=(OSError,) filter — a corrupt-image upload would retry up to 3 times with exponential backoff before metadata.error_message landed. Add it to dont_autoretry_for alongside FileNotFoundError so the permanent-failure contract surfaces immediately. Test extended to assert both exclusions are in place at celery-config time. * docs(website): describe new upload-time normalisation in user-facing copy Reflect the per-board codec grid + the wider image format set on the marketing site, but in plain language — operators don't care about codec names, mpv vs VLC, or hardware-decode paths. * features.html — replaced the single "Images, videos, and web pages" card with three smaller ones: "Drop in almost anything" (the wider format support), "Plays smoothly on any device" (the per-board normalisation, framed as "Anthias prepares it in the background"), and "YouTube on your screen" (the YouTube download path, with the no-ads angle). * faq.yaml — added "What image and video files can I upload?" entry, written conversationally; reframed the existing YouTube answer to mention the local-playback / no-ads benefit. Avoids: codec names (H.264/HEVC), library names (ffmpeg/yt-dlp), implementation terms (transcode/normalisation/passthrough), and file-format extensions in the running prose. The Processing / Failed dashboard badges get a one-line mention so operators know what they'll see while a file is being prepared. 
* fix(views): recover image extension from MIME subtype as last-resort fallback Two items from Copilot's review of `eb785f76`: * ``assets_upload`` had a two-step extension fallback (mimetypes guess → operator-supplied filename). On a host with a sparse mimetypes DB and an upload whose name has no extension — e.g. an Android share that renames the file to ``image`` — both fell through and the file landed extensionless. With no ``.heic`` / ``.avif`` / etc. on disk, ``needs_image_normalisation`` returned False and the upload silently slipped past the normalise pipeline. Added a third step: when both prior fallbacks come back empty, derive ``.<subtype>`` from the ``image/<subtype>`` MIME and accept it only if that ext is in ``NORMALIZE_IMAGE_EXTS`` — same source of truth the pipeline already uses, so adding a new normalisable format only touches one place. * Fixed a typo in the ``Asset.metadata`` model field comment (``image_normalize_asset`` / ``video_normalize_asset`` → ``normalize_image_asset`` / ``normalize_video_asset``) so the comment matches the actual task names. New test ``test_assets_upload_extensionless_heic_falls_back_to_mime_subtype`` mocks both ``guess_type`` and ``guess_extension`` to simulate the worst-case sparse-DB scenario, uploads a file with no name extension, and asserts the row lands at ``<id>.heic`` with the normalise task dispatched. * fix(processing): include canonical ffprobe format names in passthrough set Copilot caught a real bug: ``_PASSTHROUGH_CONTAINERS`` was a set of short extension labels (``ts``, ``mkv``, ...), but the same set is matched against ffprobe's reported ``format.format_name`` — which uses different canonical names. Concretely: * ``.ts`` → ffprobe reports ``mpegts``, not ``ts`` → forced transcode despite being passthrough-eligible. * ``.mkv`` → ffprobe reports ``matroska,webm`` — only worked accidentally before because ``webm`` happened to be in the set, mislabelling the container in metadata. 
Fix: add ``mpegts`` and ``matroska`` to the set with a comment explaining the dual purpose (extension labels + canonical names). Containers whose canonical name already matches the extension label (``mp4``/``mov``/``mpeg``/``flv``/``avi``/``webm``) stay listed once. New parametrised test ``test_passthrough_containers_match_real_ffprobe_format_names`` locks the contract by mocking ``_ffprobe_streams`` with the actual ``format_name`` strings ffprobe emits for each container. Asserts both the resolved label is in ``_PASSTHROUGH_CONTAINERS`` and the ``_video_can_passthrough`` decision returns True on a pi5 profile — so a future change to either the set or the resolution logic that re-introduces the regression fails this test. * fix(processing): drop redundant .copy() in image conversion + stale comments Three items from Copilot's latest review pass: * ``_convert_image_to_webp`` was holding TWO full pixel buffers in memory at the WebP encode step: ``image.convert('RGBA')`` already returns a new image with its own buffer, then ``.copy()`` cloned it again. Meaningful on a Pi 5 decoding a 50 MP HEIC where each buffer is ~200 MB. Move the ``.save()`` inside the ``with Image.open(...)`` block instead — the converted image is safe to use across the close, but encoding inside the context means we never hold both source decoder state and the converted buffer at once. * ``docker-compose.yml.tmpl`` worker comment said "exotic-codec → H.264 transcode"; rephrased to "board-appropriate H.264/HEVC transcode" to match the per-board grid. * ``_asset_modal.html`` accept-attribute comment said "exotic video codecs → H.264 MP4"; same rewording. * fix(processing): single ffprobe per upload + safer duration probe + byte-true stderr trim Three Copilot items, all real bugs in the failure semantics or performance of the video pipeline: * ``_ffprobe_summary`` now also extracts ``format.duration`` from the same probe payload and returns it as ``duration_seconds`` alongside container/codec info. 
The runner's passthrough path reuses that value instead of re-shelling ffprobe via ``get_video_duration`` — saves one probe per passthrough row, which is the common case on a per-board-codec-matched fleet. Floor to 1s mirrors the YouTube-task rule. Probe-failure path collapses to ``duration_seconds=None`` like the other fields. * ``_resolve_duration_seconds`` now catches the exceptions ``get_video_duration`` raises (sh.ErrorReturnCode_1 on bad format, generic Exception on unexpected failures) and returns None instead of propagating. After a successful transcode the file is on disk and the row is otherwise ready; failing the whole task because the post-transcode duration probe stumbled was an own-goal — the operator can edit duration manually. * ``_format_subprocess_stderr`` now trims raw bytes BEFORE decoding so ``_STDERR_TAIL_BYTES`` is truly a byte limit, not a character limit. Multibyte UTF-8 in the keep window can no longer push the decoded length past the budget. Mid-multibyte cuts produce the Unicode replacement character via ``errors='replace'`` rather than crashing. New tests: * ``test_ffprobe_summary_extracts_duration_from_probe_payload`` covers good/sub-second/missing/unparseable values. * ``test_video_passthrough_uses_summary_duration_no_second_probe`` asserts ``get_video_duration`` is not called in the passthrough branch — locks the no-double-probe contract. * ``test_resolve_duration_seconds_swallows_probe_exceptions`` proves the helper returns None instead of propagating. * ``test_format_subprocess_stderr_byte_trim_handles_multibyte_utf8`` exercises the mid-multibyte cut + decoded-len bound. * fix(processing): unify stderr trim across str/bytes; skip viewer reload on intermediate hops Two more Copilot items: * ``_format_subprocess_stderr`` had two trim branches: bytes (via byte-precise tail) and str (via character-count tail). The str branch could exceed _STDERR_TAIL_BYTES under multibyte text. 
Normalise to bytes once at the top (encoding str via UTF-8 with replacement) and run a single byte-precise trim — both paths now respect the byte budget identically. * ``_notify`` gains a ``reload_viewer`` keyword. The YouTube task's intermediate notification (after writing title/duration but before chaining into normalize_video_asset, while is_processing is still True) now passes ``reload_viewer=False``. The browser-side dashboard nudge still fires so the operator sees the resolved title immediately; the on-device viewer doesn't reload its playlist for a row that's still mid-flight. The chained normalize step's _notify (which runs once is_processing clears and the file is final) handles the actual viewer reload — saves the viewer one redundant playlist refresh per YouTube upload. Tests: * ``test_notify_browser_only_skips_viewer_reload`` exercises the new flag. * The YouTube-success test now mocks Redis and asserts ``publish.assert_not_called()`` to lock in the no-intermediate- reload contract. * docs: drop stale PDF reference, fix grammar nit * processing.py: _set_processing_error docstring listed "encrypted PDF" as a permanent-failure case from the issue's three-workstream framing. PDF is explicitly out of scope for this PR — replaced with concrete failure modes the current image/video tasks actually surface. * docker-compose.yml.tmpl: "a single configure here" reads as a verb. Changed to "a single configuration here". * fix(processing): reject decompression-bomb image uploads before decode Real security gap Copilot caught: Pillow happily allocates pixel buffers proportional to ``width × height`` regardless of how small the source file is on disk. A few KB of crafted bytes advertising a 1,000,000×1,000,000 image would force the celery worker to attempt a multi-TB allocation — at best a hard OOM that kills the worker and stalls the upload pipeline; at worst a swap-storm that drags the on-device viewer with it. 
Pillow ships ``MAX_IMAGE_PIXELS`` (default ~89 MP) which raises ``DecompressionBombError`` past 2× that threshold and warns softly at the first level. That default is too lax for signage content (where 4K @ 8 MP is already large) and pillow-heif's own decoder can bypass the check on certain HEIF/AVIF inputs. Two layers of protection: 1. ``_MAX_IMAGE_PIXELS = 50_000_000`` constant — bigger than any legitimate phone-camera output (modern flagships top out around 50 MP at the standard 4:3 aspect after JPEG/HEIC compression) but tiny compared to typical bomb fixtures. 2. ``_convert_image_to_webp`` reads ``image.size`` from the format header before any decode and raises ValueError if the dimensions exceed the cap. The on_failure path writes the message to ``metadata.error_message`` like any other permanent failure. Lowering Pillow's global ``Image.MAX_IMAGE_PIXELS`` to the same value protects any future call site that goes through ``Image.open`` outside this helper. New test ``test_image_decompression_bomb_is_rejected`` mocks ``Image.open`` to return a stub whose ``.size`` exceeds the cap (synthesising a real billion-pixel fixture would itself need GBs of memory) and asserts the runner raises before any ``convert()`` / ``save()`` is reached. * docs(views): sync assets_upload comment with NORMALIZE_IMAGE_EXTS Inline comment listed only HEIC/HEIF/TIFF/BMP, but the constant it points at also covers ICO/TGA/JP2 family/AVIF. Rewrote to reference the constant as source of truth and enumerate the current set so the comment stops drifting on the next addition. * fix(processing): disable failed assets + reject misnamed bypass uploads Two real bugs Copilot caught: * ``_set_processing_error`` cleared ``is_processing`` but left ``is_enabled=True``, so a failed normalisation would still get queued for playback by the viewer's scheduler (which filters on is_enabled + date window only — it doesn't check ``metadata.error_message``). 
The on-screen result was a black rectangle for the row's duration. Flipping ``is_enabled=False`` alongside ``is_processing=False`` keeps the bad row out of rotation; the operator can re-enable from the dashboard once the underlying issue is fixed. The ``error-pill`` template already replaces the active toggle so the operator sees the failure state before they re-enable. * ``assets_upload`` deferred to ``mimetypes.guess_type`` first and only consulted ``UploadedFile.content_type`` when guess_type produced no image/video classification. If an operator renamed a HEIC to ``photo.jpg`` and uploaded it, guess_type returned ``image/jpeg`` (a passthrough type), the Content-Type fallback was skipped, the file landed as ``.jpg``, and the normalise pipeline never ran, a silent failure-to-render. Modern browsers sniff the actual bytes and tag the upload with ``image/heic`` regardless of filename, so the view now cross-checks: when guess_type and Content-Type share a top-level (image/* or video/*) but disagree on subtype, AND Content-Type's subtype maps to a NORMALIZE_IMAGE_EXTS extension, prefer Content-Type. Only upgrades, never downgrades, to avoid the inverse case (a JPEG mis-tagged as image/heic by the browser somehow) accidentally routing into the pipeline. Tests: * test_set_processing_error_writes_metadata extended to assert is_enabled flips to False alongside the error message write. * New test_assets_upload_misnamed_heic_uses_browser_content_type uploads HEIC bytes named ``photo.jpg`` and asserts the file lands as .heic with the normalise task dispatched. * build(pi2/pi3): add Pillow + pillow-heif build deps for armv7 source builds Real concern Copilot caught about the Pillow / pillow-heif introduction: neither ships armv7l manylinux wheels (Pillow 11 explicitly dropped them in its release notes; pillow-heif only publishes x86_64 / aarch64). 
uv's resolution on a pi2 / pi3 image build therefore falls back to sdist, and the existing ``builder_extra_apt`` only covers libcec / libdbus headers — the ``uv sync`` step would gcc-fail at the first JPEG / HEIF binding. Extend ``get_uv_builder_context`` to take a ``board`` argument and append the Pillow / pillow-heif build-time deps when ``service='server'`` and ``board in {'pi2', 'pi3'}``. 64-bit boards (pi4-64 / pi5 / x86) and the test image still get binary wheels and the apt list stays unchanged for them — adding the deps unconditionally would waste ~70 MB of layer space on every non-armv7 build. Pillow's documented build deps: libjpeg62-turbo-dev / libfreetype-dev / liblcms2-dev / libopenjp2-7-dev / libtiff-dev / libwebp-dev / zlib1g-dev pillow-heif: libheif-dev (the libheif1 runtime is already in ``base_apt_dependencies`` for both architectures). Verified: ``--build-target pi3`` now generates a Dockerfile that installs the new build deps; ``--build-target pi5`` does not.	2026-05-07 12:22:36 +01:00
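The conditional build-dependency rule in the build(pi2/pi3) entry above can be sketched as below. This is a hedged illustration only: ``extra_builder_apt`` and the constant names are hypothetical and do not reflect the real signature of ``get_uv_builder_context``; the package names are the ones listed in the commit message.

```python
from typing import List

# Build-time packages named in the commit message.
PILLOW_BUILD_DEPS = [
    "libjpeg62-turbo-dev",
    "libfreetype-dev",
    "liblcms2-dev",
    "libopenjp2-7-dev",
    "libtiff-dev",
    "libwebp-dev",
    "zlib1g-dev",
]
PILLOW_HEIF_BUILD_DEPS = ["libheif-dev"]

# Boards that fall back to sdist builds because no armv7 wheels exist.
ARMV7_BOARDS = frozenset({"pi2", "pi3"})


def extra_builder_apt(service: str, board: str,
                      base: List[str]) -> List[str]:
    """Return the builder apt list, extended only where needed.

    Hypothetical helper for illustration; not the project's API.
    """
    deps = list(base)  # copy so the caller's list is not mutated
    if service == "server" and board in ARMV7_BOARDS:
        deps += PILLOW_BUILD_DEPS + PILLOW_HEIF_BUILD_DEPS
    return deps
```

Gating on both the service and the board keeps the apt list unchanged for 64-bit targets, which still install binary wheels.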

7 Commits