Anthias

mirror of https://github.com/Screenly/Anthias.git synced 2026-06-13 10:44:18 -04:00

Author	SHA1	Message	Date
Viktor Petersson	eb8909a788	fix(server): stream the backup as an async iterator so it doesn't time out (#3074 ) * fix(server): stream the backup as an async iterator so it doesn't time out StreamingHttpResponse only streams an asynchronous iterator under ASGI. Handed the sync stream_backup() generator, Django's __aiter__ falls back to `await sync_to_async(list)(...)`, which drains the whole generator — building the entire archive (buffered in RAM) before the first response byte. That silently reintroduced the 0-bytes-then-timeout failure the streaming path was meant to fix and risks OOM on a 1 GB Pi (issue #3073). - add astream_backup(): async wrapper pulling each chunk via sync_to_async(thread_sensitive=False) so bytes flow as the tar builds - close the underlying sync generator on disconnect to stop the producer - point the download view at astream_backup() (is_async path) - regression test drives aiter(StreamingHttpResponse(...)) — the real ASGI consumption path the unit-level test never exercised Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(server): satisfy mypy strict in backup stream wrapper + test - wrap next() in a typed helper with a None sentinel; sync_to_async(next) resolved next's single-arg overload, tripping mypy on the 2-arg call - use a distinct file handle in the regression test (text- then binary-mode reuse of one variable is a mypy type conflict) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(server): serialize backup generator access to avoid disconnect race Copilot review: with next() and close() on sync_to_async's shared pool, a client disconnect mid-next() could run gen.close() on another thread concurrently — raising "generator already executing" and leaking the producer thread. 
- drive next() and the cleanup close() through one dedicated single-worker executor so they can never overlap; the queued close() runs only after any in-flight next() returns - add a regression test that aclose()s the async generator mid-stream and asserts the producer thread exits cleanly Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-12 12:03:14 +01:00
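The wrapper described in #3074 can be sketched roughly as below. This is a minimal illustration, not the project's `astream_backup()`: the name `astream` is hypothetical, and it uses `loop.run_in_executor` with a dedicated single-worker pool rather than whatever the real code does. The point it shows is that `next()` and `close()` are queued on the same worker thread and therefore cannot overlap.

```python
import asyncio
from collections.abc import AsyncIterator, Iterator
from concurrent.futures import ThreadPoolExecutor
from typing import Optional


def _next_or_none(gen: Iterator[bytes]) -> Optional[bytes]:
    """Typed helper: return the next chunk, or None once the generator ends."""
    return next(gen, None)


async def astream(gen: Iterator[bytes]) -> AsyncIterator[bytes]:
    """Yield chunks from a sync generator one at a time (hypothetical name)."""
    loop = asyncio.get_running_loop()
    # One worker thread: every next() and the final close() run in order,
    # so close() can only start after an in-flight next() has returned.
    executor = ThreadPoolExecutor(max_workers=1)
    try:
        while True:
            chunk = await loop.run_in_executor(executor, _next_or_none, gen)
            if chunk is None:
                break
            yield chunk
    finally:
        # Runs on normal exhaustion and on aclose() after a disconnect.
        close = getattr(gen, "close", None)
        if close is not None:
            executor.submit(close)
        # Already-queued work (the close above) still runs after this call.
        executor.shutdown(wait=False)
```

Consuming it with `async for chunk in astream(producer())` pulls one chunk per iteration, so the first bytes are available before the producer has finished.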
Viktor Petersson	10f273b7c3	fix(server): classify RTSP/streaming URLs as streaming, not webpage (#3071 ) * fix(server): classify RTSP/streaming URLs as streaming, not webpage The React→Django migration (#2818) reimplemented the URL→mimetype classifier in the assets_create form handler but dropped the scheme check, so rtsp://, rtmp:// and HLS/DASH manifest URLs fell through to mimetype='webpage'. The viewer then routed them to QtWebEngine instead of the video player and the stream never played. - add `is_streaming_uri()` to remote_video, reusing the existing stream-scheme / manifest-extension sets so it can't drift from `is_downloadable_remote_video` - assets_create now stamps stream URLs as mimetype='streaming' and applies the default_streaming_duration window - regression tests for the classifier and the form handler Validated end-to-end on the x86 testbed: an ffmpeg-fed RTSP feed played live in the viewer (decoded and progressing on screen). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(server): reject RTMP — Qt6's QMediaPlayer can't play it On-device testing showed rtmp:// streams render black on every board. Root cause is in Qt6's FFmpeg backend: it sets a `timeout` AVFormatContext option the rtmp protocol misreads as TCP listen mode, so the open fails (libavformat itself plays rtmp fine). Rather than let operators add a stream that never shows: - drop rtmp from validate_url's allowed schemes (gates the UI form and the API) - drop rtmp from remote_video._STREAM_SCHEMES so is_streaming_uri never classifies a legacy rtmp row as a playable stream - assets_create returns a clear "use RTSP/HLS/DASH" message instead of the generic "Invalid URL" - tests updated url_fails keeps probing rtmp defensively for any pre-existing DB rows. 
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(review): restrict is_streaming_uri manifest match to http(s) Addresses Copilot review on #3071: a manifest extension (.m3u8/.mpd/…) was treated as streaming for any scheme, so file:///x/index.m3u8 would classify as mimetype='streaming'. Manifests are HTTP-delivered — only match them over http(s). Adds file:// manifest regression cases. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * test(sonar): silence S5332 on http manifest test fixture The DASH-over-http URL in the is_streaming_uri parametrize is a deliberate test input — it exercises the http (not https) manifest branch. It's a fixture, not real traffic, so annotate the SonarCloud clear-text-protocol hotspot with # NOSONAR rather than changing the test. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-11 21:45:29 +01:00
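The classification rule that #3071 ends up with (stream schemes match directly, manifest extensions only over http(s)) can be sketched as follows. The two sets below are assumptions for illustration; the real sets live in `remote_video` and may contain more entries, and this function is not the project's implementation.

```python
from urllib.parse import urlparse

# Assumed values for illustration only; rtmp is deliberately absent.
STREAM_SCHEMES = {"rtsp"}
MANIFEST_EXTENSIONS = (".m3u8", ".mpd")


def is_streaming_uri(uri: str) -> bool:
    """Return True for stream schemes and for HTTP(S)-delivered manifests."""
    parsed = urlparse(uri)
    scheme = parsed.scheme.lower()
    if scheme in STREAM_SCHEMES:
        return True
    # Manifests are HTTP-delivered, so file:// or other schemes never match.
    if scheme in {"http", "https"}:
        return parsed.path.lower().endswith(MANIFEST_EXTENSIONS)
    return False
```

Matching on `parsed.path` rather than the raw string keeps a query string such as `?token=...` from hiding the manifest extension.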
Viktor Petersson	9107539a08	fix(build): replace retired Linaro armhf toolchain with Debian's (#3070 ) * fix(build): replace retired Linaro armhf toolchain with Debian's releases.linaro.org has been retired, so the pi2/pi3 webview-builder's download of gcc-linaro-7.4.1 fails to connect (curl exit 7). Every master Docker build has gone red since, and because publish-latest only advances the floating latest-<board> tags on a fully green matrix, the latest-* images froze — users tracking latest stopped getting any merged fixes. - Install Debian's supported crossbuild-essential-armhf (gcc 14, same arm-linux-gnueabihf- prefix) instead of fetching the dead tarball - Symlink it under the legacy gcc-linaro path the frozen Qt 5 qmake.conf bakes into CROSS_COMPILE, so the pinned WebView-v2026.04.1 artifact needs no rebuild; the app still links against the Raspbian /sysroot, so the target glibc is unchanged - Apply the same swap to the offline toolchain-rebuild path (build_qt5.sh + its Dockerfile) so the dead URL is gone everywhere Validated by building the pi2 viewer image: qmake/make link the AnthiasViewer binary against the pinned Qt 5 libs with the Debian cross-gcc, producing a valid ARM EABI5 hardfloat executable. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(build): harden toolchain shim and correct amd64-pin comment Addresses Copilot review on #3070: - build_qt5.sh: fail fast with a clear message if the armhf cross toolchain is absent (else the glob expands to a literal and errors confusingly), and use `ln -sf` so reruns are idempotent. - Dockerfile.qt5-webview-builder.j2: the amd64-pin comment referenced the removed Linaro download; the real reason is the pinned Qt 5 bundle's x86_64 host qmake (and the x86_64-hosted Debian cross-gcc). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-11 17:16:36 +01:00
Viktor Petersson	4402ea4fb0	fix(sentry): don't report settings-save input validation as errors (#3068) ANTHIAS-3D ("AuthSettingsError: New passwords do not match!") is operator input validation, not a bug — a mismatched/incorrect password, a taken username, or a too-weak password typed into the settings form. Both settings-save surfaces caught it under a broad `except Exception` + `logger.exception(...)`, and Sentry's logging integration turns that ERROR record into an event. - catch AuthSettingsError ahead of the generic handler in both the HTML view and the DRF v2 view; log it at warning (no traceback) so it never reaches the logging integration, and surface the operator-friendly message (the v2 view previously buried it under a generic "An error occurred") - add AuthSettingsError to the before_send drop filter as a backstop for any other path that logs it as an error - spell auth.py's AnyRequest as an explicit TypeAlias: the implicit form flipped to mypy "not valid as a type" once settings.py began importing the module - regression tests for the before_send drop and the warning-level, no-traceback v2 rejection Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-11 15:05:06 +01:00
Viktor Petersson	93f47428df	fix(install): accept pi3-64 in Ansible playbook validation (#3067) A Raspberry Pi 3 running a 64-bit OS reports DEVICE_TYPE=pi3-64 (both bin/install.sh::set_device_type and bin/upgrade_containers.sh emit it, and latest-pi3-64 Docker images are built), but the Ansible playbook never accepted that value: - ansible/site.yml's pre_task assertion only allowed pi2, pi3, pi4-64, pi5, x86, arm64 — so a 64-bit Pi 3 install aborted at "Gathering Facts" with "Required environment variables missing or invalid" (forums.screenly.io/t/6716). - ansible/roles/system/tasks/docker.yml's docker_arch_by_device_type map is exhaustive-by-design and would have raised on the unmapped pi3-64 key; add the arm64 mapping. - the device_is_pi membership test omitted pi3-64, which would have dropped the gpio group on a board that is a Pi. Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-11 11:56:12 +01:00
Viktor Petersson	a3aa63fa1a	feat(viewer): blank/unblank commands to turn the display off (#3065 ) * feat(viewer): add blank/unblank commands to turn the display off Adds a way to blank the screen on demand, parallel to the existing next/previous/stop/play viewer commands on the anthias.viewer Redis channel: * Wayland boards (x86/pi5/arm64): wlr-randr powers the connector off (true DPMS power-off) and back on. _wlr_output_names() gained an include_disabled flag so unblank can re-enable a connector that's currently Enabled: no. * eglfs/linuxfb boards (pi2/pi3/pi4): the Qt app owns the DRM master and can't be powered off externally, so the asset loop paints a new all-black BLACK_SCREEN image instead (same proven loadImage path as the standby screen). Backlight stays on; the screen goes black. blank_display() flips display_blanked + loop_is_stopped from the subscriber thread and runs the out-of-process wlr-randr call there; the webview repaint is deferred to the main loop thread (start_loop), which owns current_browser_url — mirroring the existing rotation-bounce threading discipline. Validated live on x86 (wayland): `blank` -> DP-1 Enabled: no, `unblank` -> Enabled: yes. eglfs black-paint reuses the standby loadImage path. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(viewer): address review on display blanking (mypy, log spam, state) - Import get_skip_event from anthias_viewer.utils in the test instead of via the viewer module, which doesn't explicitly re-export it (mypy attr-defined under --no-implicit-reexport). - start_loop: guard the black repaint on current_browser_url so it doesn't re-call view_image() — and re-log "Current url ..." at INFO — on every 0.1s tick while blanked (Copilot). - Route stop/play through module-level helpers that set the loop_is_stopped global start_loop actually reads; the prior setattr(__main__, ...) wrote a dead namespace under `python -m anthias_viewer` and never paused the loop. 
play now implies unblank when the display is blanked (Copilot). - Add tests for stop flag + play-implies-unblank. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * chore(viewer): mark BLACK_SCREEN http URL safe for Sonar (S5332) http is intentional — the viewer talks to the local anthias-server over plain HTTP (TLS is the opt-in Caddy sidecar's job), identical to the existing STANDBY_SCREEN / SPLASH_PAGE_URL. Annotate the line # NOSONAR, the repo's documented convention for Sonar false positives under Automatic Analysis (see sonar-project.properties; cf. test_csrf.py). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-11 11:50:54 +01:00
Viktor Petersson	f67aa1e7f5	fix(celery): soft-limit the display-power and telemetry pokes (#3063 ) * fix(celery): soft-limit the display-power and telemetry pokes - ANTHIAS-A/9/B group by the worker-SIGKILL signature, not the task, so they survived #3017 — which only soft-limited the asset probe - get_display_power and send_telemetry_task still ran under a bare time_limit=30; a wedged CEC query or a getaddrinfo stall (requests' timeout doesn't cover DNS) tripped the hard limit and SIGKILLed the pool child - give both the #3017 treatment: soft_time_limit raises inside the task so it logs and skips the tick; the hard limit stays as the C-code backstop - add regression tests for the limits and the soft-timeout skip Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(celery): write display_power value and TTL atomically - a soft-limit signal between SET and EXPIRE could leave display_power without a TTL (stale value that never expires) - use a single SET with ex= so the write and TTL are atomic Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(telemetry): write cooldown key and TTL atomically - send_telemetry_task now runs under a soft time limit, so a SoftTimeLimitExceeded landing between the cooldown SET and EXPIRE would leave telemetry-cooldown without a TTL — silencing telemetry permanently (same class as the display_power fix in `f5e94664`) - collapse the SET + EXPIRE into one SET … ex= so the value and TTL are written atomically - update test_telemetry to assert the atomic SET and accept ex= in the fake redis client Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-11 11:25:44 +01:00
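The SET + EXPIRE race fixed in #3063 comes down to issuing one command instead of two. A minimal sketch, assuming a redis-py style client whose `set()` accepts `ex=`; the helper name, the TTL value, and the `FakeRedis` stand-in are hypothetical and exist only so the example runs without a server.

```python
from typing import Optional, Protocol


class RedisLike(Protocol):
    """The one client method this sketch relies on."""

    def set(self, name: str, value: str, ex: Optional[int] = None) -> object: ...


def store_display_power(client: RedisLike, value: str, ttl_seconds: int) -> None:
    """Write the value and its TTL in a single SET (hypothetical helper)."""
    # A separate SET followed by EXPIRE can be interrupted in between by a
    # soft time limit, leaving the key without a TTL. One SET with ex= cannot.
    client.set("display_power", value, ex=ttl_seconds)


class FakeRedis:
    """In-memory stand-in that records calls; not a real Redis client."""

    def __init__(self) -> None:
        self.calls: list = []

    def set(self, name: str, value: str, ex: Optional[int] = None) -> object:
        self.calls.append((name, value, ex))
        return True
```

A test double like `FakeRedis` only needs to accept `ex=` and record it, which matches the commit's note about updating the fake client.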
Viktor Petersson	bc27c555df	fix(server): polish home bulk-action & upload UI/UX (#3066 ) * fix(server): polish home bulk-action & upload UI/UX Follow-up UI/UX pass over the recently shipped bulk asset management (#3048), multi-file upload (#3049), and ffmpeg/HandBrake rejection hints (#3040). - Reserve bottom space (.has-bulk-selection) while a selection is active so the fixed bulk-action bar never floats over the last rows or their action buttons — exactly the assets a bulk selection targets. - Cap .modal-card to the viewport and scroll inside it, with sticky header/footer, so a tall bulk-edit form (or the Edit modal with the failure alert + Advanced open) keeps its title and Apply/Save buttons reachable instead of pushing them below the fold. - Wire real drag-and-drop on the upload dropzone (dropFiles() feeds the same sequential uploadFiles() batch path); the dashed zone already read as a drop target but silently ignored drops. - Lift the selection checkbox contrast on the dark Enabled surface so the select-all (checked/indeterminate) and per-row boxes are legible against the purple gradient; scoped to .asset-select so the activity switch keeps its track styling. - Anchor the bulk bar's clear (x) to the top-right on phones so it no longer wraps beside the destructive Delete (mis-tap risk). - Stack the ffmpeg recipe's copy button under the command at narrow widths; make the empty-state CTA a <button> (action, not nav); drop the bulk duration field's placeholder that collided with its floating label. Presentational only — no API/model changes. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(server): stop dropzone highlight flicker on child drag-over Copilot review on #3066: dragleave bubbles from the dropzone's child icon/paragraphs, toggling dragActive off while the cursor is still over the label and flickering the highlight. Add the .self modifier so the handler only runs when leaving the label itself. 
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(server): clear dropzone drag highlight when the modal closes Second Copilot pass on #3066: dragActive could stay true if the user drags into the dropzone then closes the Add modal (Esc/backdrop/Cancel) before dragleave fires, so the dropzone re-opened still highlighted. Reset dragActive in closeModal() alongside the other modal state. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-11 11:19:18 +01:00
Viktor Petersson	8a4db73b84	fix(viewer): detect DB changes under WAL so new assets load (#3062 ) * fix(viewer): detect DB changes under WAL so new assets load (#3061) The viewer polls the database's mtime to decide when to reload its playlist, stat'ing only the main `anthias.db` file. Since #3015 the DB is opened with `journal_mode=WAL`, where commits land in the `anthias.db-wal`/`-shm` sidecars and leave the main file's mtime frozen until a (rare) checkpoint. So `get_db_mtime()` never advanced and `refresh_playlist()` never reloaded — most visibly, the first asset on a fresh install never displayed (its empty playlist also has a `None` deadline, so the deadline fallback can't recover it either). Take the newest mtime across `anthias.db`, `anthias.db-wal`, and `anthias.db-shm`: WAL commits move the sidecars and a checkpoint moves the main file, so the value advances on every write regardless of journal mode. Add a regression test that a write touching only the `-wal` sidecar is detected. Fixes #3061 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * test(viewer): isolate WAL mtime test in tmp_path to avoid xdist race The new test created /tmp/fakedb-wal in the shared /tmp; with the now WAL-aware get_db_mtime(), that sidecar could leak into the concurrent test_check_get_db_mtime (asserting == 0 on /tmp/fakedb) under `pytest -n auto`. Use a unique tmp_path dir so the sidecar can't escape. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-11 11:04:26 +01:00
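The WAL-aware check from #3062 amounts to taking the newest mtime across the main database file and its two sidecars. A minimal sketch under that reading; `newest_db_mtime` is a hypothetical name, and returning 0.0 when no file exists is an assumption based on the existing test that expects 0 for a missing database.

```python
import os


def newest_db_mtime(db_path: str) -> float:
    """Return the newest mtime across the DB file and its WAL sidecars."""
    mtimes = []
    for suffix in ("", "-wal", "-shm"):
        try:
            mtimes.append(os.path.getmtime(db_path + suffix))
        except OSError:
            # A sidecar is absent outside WAL mode; a missing file is not an error.
            continue
    # Assumption: report 0.0 when none of the files exist.
    return max(mtimes, default=0.0)
```

Because a WAL commit moves the sidecars and a checkpoint moves the main file, the maximum advances on every write in either journal mode.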
Viktor Petersson	65b122cbaf	chore(deps): bump grouped Dependabot updates and refresh uv.lock (#3064) Consolidates 6 Dependabot PRs into one grouped update and regenerates uv.lock (which the per-PR Dependabot runs left untouched, editing only pyproject.toml). Python (dev/build tooling + type stubs + minor runtime libs): - pytest-cov 6.0.0 -> 7.1.0 (dev) - click 8.1.7 -> 8.4.1 (docker-image-builder, local) - sh 2.2.2 -> 2.3.0 (server, viewer) - django-stubs-ext 6.0.3 -> 6.0.5 (server, mypy) - types-pytz 2026.2.0.20260506 -> 2026.2.0.20260518 (dev-host) GitHub Actions (pinned-SHA bumps): - github/codeql-action init/autobuild/analyze (v4) - codecov/codecov-action (v6) Supersedes #3058, #3057, #3056, #3055, #3054, #3053. Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-11 10:53:56 +01:00
Viktor Petersson	3f05ead272	feat(server): add bulk asset management (enable/disable/edit/delete) (#3048 ) * feat(server): add bulk asset management (enable/disable/edit/delete) Operators with large libraries previously had to act on assets one row at a time — a recurring forum request for years (#3046). The home page now has per-row selection checkboxes, a per-section select-all in each surface header, and a floating bulk-action bar (Enable, Disable, Edit, Delete) that appears whenever a selection is active. Backend: two new server-rendered endpoints reuse the shared _asset_table_response so the whole batch swaps the table partial and nudges the viewer once, instead of one round-trip per row. - assets_bulk_action — enable/disable/delete a selected set; delete goes through delete_asset_with_file so on-disk cleanup matches the per-asset path (#2908). - assets_bulk_update — applies common schedule fields (start/end dates, duration, play-from/until times, weekdays). Each group is opt-in via an apply_* flag so an operator only overwrites the fields they ticked, and everything is parsed before any row is mutated so a bad date/time toasts without a half-applied batch. Video duration stays owned by the probe task, mirroring assets_update. Frontend: selection lives in the homeApp Alpine state (selectedIds), so row :checked bindings re-evaluate after the table's 5s HTMX swap and the selection survives. setVisible() is re-published from the table partial on every swap to drive the select-all state and prune a selection of rows that have since disappeared. The bulk-edit modal reuses the existing flatpickr date/time inputs. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(server): address Copilot review on bulk asset management - asset_ids filter: stop mark_safe'ing the JSON. It lands in a double-quoted x-init="…" attribute, so its own double quotes were closing the attribute early and breaking the Alpine expression. 
Return a plain str and let Django autoescaping entity-encode the quotes; the browser decodes them back to valid JSON. Adds a regression test asserting the rendered attribute is escaped. - bulk delete: pass delete_asset_with_file(nudge_viewer=False) per row and fire a single viewer reload after the batch, instead of one reload per asset. New keyword-only flag defaults True so the API / single-delete paths are unchanged. - bulk enable/disable: collapse the per-row save() loop into one queryset update() (no model signals here); its row count drives the toast and the empty-selection guard. - bulk duration: a blank field with apply_duration on no longer clobbers every asset's duration to 0 — it toasts and changes nothing (mirrors the per-asset edit form's preserve-on-blank intent). Negative values are rejected too, and the modal input is now required as client-side defense-in-depth. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * perf(server): collapse bulk_update writes into a single bulk_update() Second Copilot pass on #3048: assets_bulk_update still did one asset.save() per selected row, so a large selection (the point of bulk editing) fired N UPDATEs. Mutate the in-memory objects, track exactly which columns were touched (honouring the skip-videos-for-duration rule), and write them all with one Asset.objects.bulk_update(). An edit that ends up touching nothing (apply_dates ticked with both fields blank, or a duration-only edit on an all-video selection) now short-circuits with the "nothing to change" toast, since bulk_update() rejects an empty field list. Adds a test asserting exactly one UPDATE statement for a 5-asset batch. 
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(server): keep bulk duration write off video rows; clear modal default Third Copilot pass on #3048: - The single bulk_update() folded `duration` into the all-rows field list, which writes every video row's stale in-memory duration back and can clobber a concurrent probe_video_duration UPDATE. Write duration on its own bulk_update() over only the non-video subset, so video rows are never touched by the duration column at all. Added a test spying on bulk_update to assert no video object is in a duration write. - The bulk-edit duration input was prefilled with 10, so toggling "Duration" and submitting would silently set every non-video asset to 10s. Drop the default to an empty placeholder and rely on the existing `required` to force an intentional value. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(server): prune selection once across both sections; accurate bulk count Fourth Copilot pass on #3048: - syncVisibleIds(active, inactive): the table partial now publishes both sections' ids in one call and selectedIds is pruned once against their union. The previous two sequential setVisible() calls pruned on the first call while the other section's list was still stale, so a row that moved between sections (e.g. an enabled asset just disabled) lost its selection across the swap. - assets_bulk_update toast now reports only the rows actually written: a duration-only edit skips videos, so a mixed selection reports the non-video count instead of claiming every row was updated. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(server): bulk forms clear only on real success; O(n) section selection Fifth Copilot pass on #3048: - Bulk endpoints return 200 even when they refuse the input (bad date, partial time window, blank duration, "nothing to change") and just ride an error/info toast on HX-Trigger. 
The forms cleared the selection / closed the modal on any 2xx, wiping the operator's work when nothing was applied. New isSuccessResponse(event) helper gates the clear/close on a 2xx whose toast is absent or kind 'success'; wired into all four bulk forms (enable/disable/delete/edit). - sectionAllSelected/sectionSomeSelected did linear includes() against selectedIds, i.e. O(visible × selected) on every reactive re-evaluation. Build a Set once per call so they stay O(n) for the large selections bulk editing targets. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(server): trim bulk ids; guard enable/disable on matched-rows count Sixth Copilot pass on #3048: - _bulk_ids() now strips each CSV segment, so a hand-built "a, b" (with spaces) still matches its rows instead of silently no-op'ing. - enable/disable take the count from a separate matched-rows count() rather than update()'s return value, which reports rows changed on some backends — re-enabling already-enabled assets would otherwise count 0 and (returning no toast) make the client treat it as success and clear the selection. A genuinely empty match now returns an info toast so the client's success gate keeps the selection. The empty bulk-delete case gets the same info toast. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(server): info toast on bulk_update with no matching ids Seventh Copilot pass on #3048: assets_bulk_update returned a silent no-toast 2xx when the posted ids matched no rows (stale selection), so the client's success gate cleared the selection / closed the modal even though nothing applied. Return the same info toast the enable/disable path uses so the selection is kept. Adds a test. 
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * perf(server): bulk_update via uniform queryset update()s; toast on no-op action Eighth Copilot pass on #3048: - assets_bulk_action no longer returns a silent no-toast 2xx for an unknown action or empty id set — an unknown action toasts an error, an empty selection an info toast, so the client's success gate keeps the selection instead of treating it as success. - assets_bulk_update applies the (uniform) new values with plain queryset update()s instead of loading every selected row and building a bulk_update() CASE. Shared fields go in one update(*shared); duration goes through a separate exclude(mimetype='video').update(), which keeps the never-touch-video-duration guarantee at the SQL level (no in-memory staleness). Row counts come from matched-rows count()s, not update()'s changed-rows return. Replaced the bulk_update-spy test with a SQL-level video-duration-untouched test. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> fix(server): bridge bulk hx-on handlers into Alpine via window event Ninth Copilot pass on #3048 — and a real runtime bug: the bulk forms' hx-on::after-request handlers called Alpine component methods (isSuccessResponse / clearSelection / closeBulkEdit / bulkDeleteOpen) directly, but hx-on runs in GLOBAL scope, so those are undefined there and the handler threw ReferenceError — the selection never cleared and the modals never closed on success. Fixed with the same dispatch-to-window bridge the Add modal already uses for 'asset-saved': hx-on now calls the global window.bulkSucceeded() gate and, on success, dispatches a 'bulk-done' window CustomEvent (with detail flags for which modal to close). The root x-data handles it via @bulk-done.window="onBulkDone($event)" in Alpine scope. Added a render test asserting the forms use the bridge and never call Alpine methods from hx-on. 
Also: assets_bulk_update now returns an info toast on an empty id set (was a silent no-toast 2xx the success gate would mis-read as success). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-11 10:31:15 +01:00
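The id trimming from the sixth review pass on #3048 can be illustrated with a small parser. This is a sketch only: `parse_bulk_ids` is a hypothetical stand-in for `_bulk_ids()`, and dropping empty segments is an assumption the commit message does not state.

```python
def parse_bulk_ids(raw: str) -> list[str]:
    """Split a comma-separated id string, trimming whitespace per segment."""
    # Without strip(), a hand-built "a, b" yields " b", which matches no row.
    segments = (part.strip() for part in raw.split(","))
    # Assumption: empty segments (e.g. from a trailing comma) are ignored.
    return [segment for segment in segments if segment]
```

With this shape, `parse_bulk_ids("a, b")` and `parse_bulk_ids("a,b")` produce the same id list, so both select the same rows.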
Viktor Petersson	9094a2031d	fix(install): install polkitd on Trixie+ manage-network path; minimize apt updates (#3060 ) * fix(ansible): install polkitd on Trixie+ manage-network path - NetworkManager only Recommends polkitd, so minimal Armbian Trixie images ship NM without it and have no /etc/polkit-1/rules.d - the rules copy task then failed: "Destination directory /etc/polkit-1/rules.d does not exist", aborting the install - install polkitd explicitly so the rule has a home and a daemon to enforce it Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(ansible): drop redundant apt update_cache on polkitd install - the network role runs after the system role's apt update + dist upgrade, so the cache is already fresh - matches the splashscreen role's mid-playbook package installs, which carry no update_cache Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * refactor(install): refresh apt lists once per source change - guard install.sh's two apt-get updates behind apt_update_once so a minimal image no longer double-updates; reset only when the legacy Raspbian mirror is actually rewritten - drop redundant update_cache on the libc6-dev reinstall; the cache is already fresh from install.sh - net result: one base update + one per added repo (Docker) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-11 10:30:13 +01:00
Viktor Petersson	8cea9cb736	fix(balena): add missing device types to unpin fleet map (#3051) - Add anthias-pi3-64 and anthias-rockpi4 to FLEET_DEVICE_TYPE - Both fleets were in FLEETS but absent from the map, so they hit the "unknown device type" branch and the job exited 1 (errors=2) - Re-syncs the map with bin/balena_fleet_maintenance.py per the comment Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-10 19:37:25 +01:00
Viktor Petersson	d323bf147d	feat(viewer): add "Prefer dark mode" setting for web page assets (#3050 ) * feat(viewer): add "Prefer dark mode" setting for web page assets Adds a "Prefer dark mode" toggle in Settings that instructs the Qt webview to render web page assets dark. The C++ webview owns the realization: applyDarkModePreference() reads ANTHIAS_PREFER_DARK_MODE and injects --blink-settings=forceDarkModeEnabled=true into Chromium before QtWebEngine inits, working uniformly across Qt5 (Pi 1-4) and Qt6 (Pi 5/x86) without a version macro. Python owns the setting: the viewer exports the env var per board in _build_webview_env() and respawns the webview when the operator toggles the setting (reusing the rotation-bounce handoff). Wired through the settings page, page_context, the form POST, and the v2 device-settings API. Validated on a Pi 5 testbed: with the setting on, AnthiasViewer's env carries ANTHIAS_PREFER_DARK_MODE=1 and the QtWebEngine renderer processes run with --blink-settings=forceDarkModeEnabled=true; with it off, neither is present. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(test): annotate pixel list so mypy doesn't flag no-any-return PIL's getdata() is untyped, so sum(list(...)) was Any and the -> float return tripped mypy's no-any-return. Annotate the list as list[int]. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * refactor(viewer): address review on dark-mode flag and its test - main.cpp: merge forceDarkModeEnabled into an existing --blink-settings switch instead of appending a duplicate (Chromium keeps only the last occurrence, so a second switch would silently drop any Blink settings already set); no-op if already requested. - test_webview_dark_mode: launch via the suite's browser_type / browser_type_launch_args fixtures so the conftest --no-sandbox override (and any future launch config) is inherited rather than hardcoded. 
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-10 18:57:10 +01:00
Viktor Petersson	bb9da44d02	fix(server): restore multi-file (bulk) upload in the Add Asset modal (#3049 ) * fix(server): restore multi-file (bulk) upload in the Add Asset modal Multi-file upload (added in #2778 for the React UI) was lost in the #2818 React→Alpine/HTMX rewrite: the file picker accepted a single file only. The assets_upload endpoint already takes one file per POST, so this is a client-only fix. - Add `multiple` to the #add-file input. - Drive the upload from uploadFiles() in home.ts: iterate the selected files and POST them sequentially (one XHR per file) against the existing single-file endpoint, with "X of N" progress. htmx's single-form submit would only ever send the first file, so the file tab is no longer htmx-managed; toasts are replayed from the server's HX-Trigger header by hand. - Single-file uploads still flow through the same path unchanged. Adds an integration test (test_add_multiple_uploads_at_once) that selects two files in one go and asserts both persist. Fixes #3045 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(server): don't clear upload state on closeModal mid-batch Hiding the modal during an in-flight 'sending' upload cleared uploadState, which disarmed uploadFiles()'s re-entry guard and let a reopened modal start a second batch racing the first over the shared progress/index fields. Only reset upload state when nothing is in flight. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(server): surface upload rejections instead of swallowing them assets_upload refused invalid/empty uploads with messages.error + HTTP 200, which the HTMX/XHR path drops silently (the partial carries no toast header) — so the operator saw nothing and the batch uploader counted the rejection as a success. Pass the rejection through _asset_table_response(toast=('error', …)) so it rides the HX-Trigger header on every transport. 
Client side, uploadOne() now distinguishes 'ok' / 'rejected' (2xx + error toast) / 'error' (transport): a rejected file surfaces its server toast and the batch skips it and keeps going, while a true transport failure aborts. Addresses Copilot review feedback. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * docs(server): update uploadOne comment for the UploadResult return The doc comment still described the old boolean return; it now documents the ok / rejected / error tri-state. Comment-only. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * docs(server): attribute single-file limit to the endpoint, not htmx The comments said htmx "would only ever send the first file", but htmx includes every selected file in the multipart body — the real single-file constraint is assets_upload reading request.FILES.get. Reword both comments. Comment-only. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-10 06:45:38 +02:00
Viktor Petersson	3b8cf7afa1	fix(viewer): recognize all cage/Wayland boards for screen rotation (#3047 ) Screen rotation was a no-op on Raspberry Pi 5 (and likely generic arm64 cage boards): `_is_wayland_board()` only treated x86 as a Wayland board, so neither the linuxfb `:rotation=` option nor the wlr-randr path ran. - Key `_is_wayland_board()` off `QT_QPA_PLATFORM` starting with `wayland`, mirroring the docker/Dockerfile.viewer.j2 split (x86/arm64/pi5) instead of an x86-only DEVICE_TYPE check. - Update tests to pair DEVICE_TYPE with QT_QPA_PLATFORM as in production; switch linuxfb-intent tests off the (now wayland) pi5. - Add regression tests covering the board matrix and the Pi 5 wlr-randr rotation path. Validated end-to-end on a real Pi 5: both the boot-time apply and the live Settings reload now push the wlroots transform (90/270) to the connected display. Fixes #3044 Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-09 17:35:22 +02:00
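
The detection change in the commit above can be sketched as follows. This is a minimal illustration, not the project's code: the function name `is_wayland_platform` and the injectable `env` mapping are assumptions made here so the check can be exercised without touching the real environment. Only the idea (key the decision off `QT_QPA_PLATFORM` starting with `wayland` rather than off `DEVICE_TYPE`) comes from the commit message.

```python
import os
from typing import Mapping, Optional


def is_wayland_platform(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True when the Qt platform plugin selected for this board is Wayland.

    Hypothetical helper: it looks only at QT_QPA_PLATFORM, so any board whose
    image exports a wayland platform is recognised, whatever its DEVICE_TYPE.
    """
    env = os.environ if env is None else env
    # A missing variable is treated as "not Wayland".
    return env.get("QT_QPA_PLATFORM", "").startswith("wayland")
```

With this shape, a `DEVICE_TYPE=pi5` environment that also sets `QT_QPA_PLATFORM=wayland` takes the Wayland rotation path, while a linuxfb board does not.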
Viktor Petersson	fe8f0541e9	fix(viewer): default x86 audio fallback to ALSA 'default' (#3038 ) On x86, get_alsa_audio_device() previously returned 'sysdefault:CARD=HID' as the fallback, which matches no real ALSA card on standard Intel/AMD/Nvidia HDA chipsets, leaving operators with no working audio. Defer to the `default` device instead. On x86 (a Qt 6 board) VideoView::resolveAlsaDevice resolves `default` to the PulseAudio default sink, since Debian's Qt 6 Multimedia only has a PulseAudio backend. This mirrors the ARM64 branch above, which already returns 'default' for the same reason. Update the corresponding test to assert the new fallback. Co-authored-by: Xzzz <xzz@carnetderoot.net> Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com> v2026.06.3	2026-06-09 09:52:41 +02:00
Viktor Petersson	8c6cfaf26a	feat(server): offer HandBrake GUI steps for rejected video uploads (#3040 ) * feat(server): offer HandBrake GUI steps for rejected video uploads - Add _handbrake_steps mirroring the ffmpeg recipe's codec/1080p choices - Persist the steps to metadata.error_handbrake on rejection - Render them as a numbered list with a handbrake.fr link in the modal Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(server): make HandBrake steps a real preset-based walkthrough - Lead with the stock "Fast 1080p30" preset (H.264 MP4, 1080p cap) - HEVC boards just flip Video Encoder to "H.265 (x265)" - Spell out source pick, Save As/Browse, and Start Encode - Drop the cap arg: the 1080p preset is the low-RAM fix too Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * test: assert on HANDBRAKE_URL constant, not the literal Reference processing.HANDBRAKE_URL instead of a duplicated URL literal so CodeQL stops flagging the assertion as incomplete-URL-substring sanitization (false positive in a test containment check). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * docs: address Copilot review on HandBrake steps - Reword final step: upload as a new asset (the Edit modal has no upload control) - Rename stale-clear test to mention error_handbrake too Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-09 08:55:57 +02:00
Viktor Petersson	5d21924080	fix(celery): don't report the by-design codec rejection to Sentry (#3041 ) The upload codec/resolution gate raises ``UnsupportedVideoCodecError`` as a deliberate, operator-facing rejection — it's surfaced in the UI as a "Failed" pill plus a copy-pasteable ffmpeg re-encode recipe via ``_NormalizeAssetTask.on_failure``. It's an expected outcome (e.g. Pi 5 has no H.264 HW decode block, an unknown arm64 board can't certify any codec), not a fault, so it shouldn't reach Sentry — but every rejection was landing there as an unhandled task error (Sentry ANTHIAS-1J, ANTHIAS-20). List it in the video task's ``throws``: Celery then logs it at INFO without a traceback, and sentry-sdk's CeleryIntegration skips ``task.throws`` exceptions (``_capture_exception`` returns early on ``isinstance(exc, task.throws)``), so the gate stops flooding Sentry. ``on_failure`` still runs, so the operator-facing error pill and recipe are unchanged. Regression test asserts the video task declares it in ``throws`` and the image task (which never raises it) does not. Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-09 08:45:01 +02:00
Viktor Petersson	4e47d70d87	fix(build): ship libwebpdemux2 so pi3 Pillow can save WebP (#3042 ) The upload image-normalisation pipeline converts HEIC/HEIF/TIFF/etc. to lossless WebP via Pillow. On 32-bit Pi (armv7) there's no Pillow wheel, so it's compiled from source against libwebp-dev and dynamically links the whole webp family at runtime: libwebp7, libwebpmux3 and libwebpdemux2 (Pillow's _webp uses WebPAnimDecoder). The runtime image only pulled libwebp7 + libwebpmux3 — both transitive deps of ffmpeg's libavcodec — but NOT libwebpdemux2, which nothing else depends on. Without it, ``import PIL._webp`` fails to resolve libwebpdemux.so.2, the WebP plugin registers no save handler, and the conversion dies with ``KeyError: 'WEBP'`` (Sentry ANTHIAS-1Y, 11 events, all pi3). Add libwebp7 / libwebpdemux2 / libwebpmux3 to the runtime base_apt_dependencies explicitly so the pipeline never depends on ffmpeg's transitive graph — same rationale as the libheif1 entry beside them. Inert on 64-bit boards (their Pillow wheel bundles its own webp libs), so listed unconditionally for simplicity, matching libheif1. Verified: a from-source Pillow 12.2.0 _webp.so on debian:trixie links libwebp.so.7, libwebpmux.so.3, libwebpdemux.so.2 and libsharpyuv.so.0; installing ffmpeg+libheif1 leaves libwebpdemux2 absent. Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-09 08:44:45 +02:00
Viktor Petersson	2f1016008f	fix(server): surface the exception detail in GitHub update-check warnings (#3023) - A bare 'no data' hid the host/timeout info str(exc) carries; Sentry ANTHIAS-8 read 'ConnectionError fetching latest release from GitHub: no data', which said nothing actionable - Ported from #3014 (the rest of that PR shipped via #3019) Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-09 08:04:39 +02:00
Viktor Petersson	03ae4eb187	chore(release): bump to 2026.06.3 (#3037 ) - CalVer (YYYY.0M.MICRO); still June 2026, micro 2 -> 3 - Gives Sentry a real release boundary: every build since 2026.6.2 reported the same base version (only the +git-hash differed), so resolved-in-next-release never stuck and fixed issues kept reopening on the next event. A version bump lets the deployed fixes actually clear from the board. - Ships the crash/noise fixes merged since 2026.6.2: SQLite WAL + busy timeout (#3015), celery migration-gate (#3016) and asset-probe soft limits (#3017), transient-redis/CancelledError Sentry filtering + redis healthcheck (#3018/#3028), GitHub update-check log level (#3019), webview respawn on D-Bus death at setup and mid-play (#3020/#3031), resilient static-file scan (#3026), Wayland-socket wait (#3030), and Sentry release/board triage tags (#3021/#3025) Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-09 08:02:55 +02:00
Viktor Petersson	6bcffc5b63	chore(deps): sync uv.lock with pyproject.toml pins (#3039) master's committed uv.lock had drifted from its own pyproject.toml: uvicorn, pyyaml, django-stubs and types-python-dateutil were bumped in earlier merged PRs but the lockfile was never regenerated, so `uv lock --check` failed on master. Regenerate the lock so it matches the already-declared `==` pins (httptools moves transitively with uvicorn[standard]). No pyproject changes, so no new dependency decisions. Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-09 08:02:32 +02:00
Viktor Petersson	3b5ad782bc	fix(celery): store display_power in redis as a string, not a bool (#3032) - diagnostics.get_display_power() returns str | bool; redis-py rejects a bool with DataError: Invalid input of type: 'bool', so every clean CEC True/False crashed the get_display_power task and left the key unset (Sentry ANTHIAS-2C, 32 events) - Coerce to str before r.set: the v2 System Info API exposes display_power as string | null and passes it through, so 'True'/'False'/'CEC error' all fit, and the on/off state now actually populates instead of only the error fallbacks landing - The existing test asserted the buggy bool write (a Mock redis accepted it); rewrite it to assert str coercion across True/False/error, with an isinstance guard against the DataError Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-09 07:46:50 +02:00
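
The coercion described in the commit above is small enough to sketch in full. The snippet is an illustration under stated assumptions: `encode_display_power` and `StrictStore` are hypothetical names, and `StrictStore` is only a stand-in that imitates a client refusing `bool` values (the commit reports redis-py raising `DataError` for that input; the stand-in raises `TypeError` so the example needs no third-party package).

```python
from typing import Dict, Union


def encode_display_power(value: Union[str, bool]) -> str:
    """Coerce the diagnostics result to the string form the API exposes."""
    # True -> 'True', False -> 'False', 'CEC error' stays 'CEC error'.
    return str(value)


class StrictStore:
    """Hypothetical stand-in for a key-value client that refuses bool values."""

    def __init__(self) -> None:
        self.data: Dict[str, object] = {}

    def set(self, key: str, value: object) -> None:
        # bool is checked first because bool is a subclass of int in Python.
        if isinstance(value, bool) or not isinstance(value, (bytes, str, int, float)):
            raise TypeError(f"Invalid input of type: {type(value).__name__!r}")
        self.data[key] = value
```

Writing `store.set("display_power", encode_display_power(True))` stores the string `'True'`, whereas passing the raw `True` is rejected, which is the failure the commit describes.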
Viktor Petersson	0145e6ed81	fix(sentry): silence celery redis-backend reconnect-retry log (#3036) - celery's redis result backend logs its own connection-loss retries at ERROR ("Connection to Redis lost: Retry (4/20) in 1.00 second.") while it retries on its own: the same expected-transient noise as the consumer/beat loggers already ignored, but from the celery.backends.redis logger (Sentry ANTHIAS-2E) - ignore_logger('celery.backends.redis'); extend the regression test Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-09 07:45:15 +02:00
Viktor Petersson	871401c614	fix(viewer): wait for cage's Wayland socket before spawning the webview (#3030 ) * fix(viewer): wait for cage's Wayland socket before spawning the webview - On Wayland boards a webview respawn that races a moment when cage's socket is briefly absent had every attempt die with Qt's "Failed to create wl_display (Connection refused)", burning the whole inline retry budget and crash-looping the container (Sentry ANTHIAS-19) - _wait_for_wayland_socket() blocks (bounded) for $XDG_RUNTIME_DIR/$WAYLAND_DISPLAY before each spawn — the Wayland analogue of start_viewer.sh's /dev/fb0 and eglfs-display waits - No-op on linuxfb/eglfs boards, when the env names no socket, and (the common case) when the socket is already up; a dead compositor falls through after the timeout so the normal spawn-failure path still runs Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(viewer): address review — gate Wayland wait on env, share the deadline - _is_wayland_board() is x86-only, so gating the socket wait on it skipped exactly the Pi 5 where ANTHIAS-19 fired. Gate on the WAYLAND_DISPLAY env cage exports instead — set on x86/arm64/pi5 alike, unset on linuxfb/eglfs — so the wait engages on every cage board (Copilot) - Fold the wait into _spawn_webview_once's existing monotonic deadline so the socket wait and the handshake share one startup_timeout budget rather than stacking up to +10s on the inline respawn path (Copilot) - Drop the now-unused BROWSER_WAYLAND_SOCKET_WAIT_SECONDS constant - Tests now drive the real env signals (incl. a DEVICE_TYPE=pi5 case that fails if the wait is board-gated) and pass a monotonic deadline Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-08 21:28:17 +02:00
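
A bounded wait of the kind the commit above describes might look like the sketch below. It is not the project's `_wait_for_wayland_socket()`: the name `wait_for_wayland_socket`, the `env` parameter, and the poll interval are assumptions for illustration. What it takes from the commit message is the behaviour: a no-op when the environment names no socket, an immediate return when the socket already exists, and a shared monotonic deadline after which the caller falls through to its normal spawn-failure path.

```python
import os
import time
from typing import Mapping


def wait_for_wayland_socket(
    env: Mapping[str, str], deadline: float, poll_seconds: float = 0.1
) -> bool:
    """Block until $XDG_RUNTIME_DIR/$WAYLAND_DISPLAY exists or the deadline passes.

    `deadline` is a time.monotonic() value shared with the rest of the spawn,
    so this wait spends from the same startup budget instead of adding to it.
    Returns True when the spawn can proceed, False on timeout.
    """
    display = env.get("WAYLAND_DISPLAY")
    runtime_dir = env.get("XDG_RUNTIME_DIR")
    if not display or not runtime_dir:
        return True  # linuxfb/eglfs style environment: nothing to wait for
    socket_path = os.path.join(runtime_dir, display)
    while True:
        if os.path.exists(socket_path):
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False  # compositor looks dead; let the spawn fail normally
        time.sleep(min(poll_seconds, remaining))
```

Gating on the environment rather than on a board name is what lets the same wait engage on every cage board.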
Viktor Petersson	056f539d4d	fix(viewer): respawn the webview on a video-play D-Bus death, not log+stop (#3031 ) - MPVMediaPlayer.play/stop called playVideo/stopVideo directly and, on a webview-gone D-Bus error (NoReply — the webview crashed mid-call), logged ERROR and gave up: a Sentry event for a self-healing condition, and the video stayed dead until the next rotation respawned the webview (Sentry ANTHIAS-1A) - Route both through the same _send_to_webview wrapper the image/page paths use (#3012): reap + respawn + retry once on webview death, injected via set_send_to_webview() in setup() alongside the bus - Genuinely unexpected errors still propagate to play/stop's existing log+clear-state handling; with no wrapper injected (tests/standalone) calls run directly, preserving prior behaviour - Add tests for the routing, respawn-recovery, re-raise, and the no-injection fallback Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-08 21:27:33 +02:00
Viktor Petersson	fe942a50f6	fix(sentry): also drop transient redis TimeoutError events (#3028 ) - The before_send filter caught redis.exceptions.ConnectionError but not redis.exceptions.TimeoutError — in redis-py the two are siblings under RedisError, not parent/child, so a redis outage that hangs the socket (rather than refusing) slipped through to Sentry - Surfaced post-deploy as ANTHIAS-1B (Timeout connecting to server, viewer resolution reporter) once the build-hash release tags made it identifiable - Match on both types; add the redis-stubs TimeoutError entry and a regression test that also pins the sibling (not subclass) relation Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-07 21:07:30 +02:00
Viktor Petersson	757feaef2a	fix(server): survive unreadable static files instead of crash-looping (#3026 ) * fix(server): survive unreadable static files instead of crash-looping - A balena OTA rewrote the staticfiles layer onto a device with corrupted ext4 metadata; WhiteNoise's one-shot startup scan raised OSError 117 (Structure needs cleaning) at ASGI import and uvicorn crash-looped, bricking the device over one unreadable Django-admin vendor file (Sentry ANTHIAS-Y, 400+ events from one device) - Subclass the middleware with a per-entry fault-tolerant scan: skip what the filesystem refuses, serve the rest, and emit one ERROR per startup so the storage fault still reaches Sentry once per boot - Add a minimal whitenoise stub (channels-stubs pattern) so the subclass type-checks under strict mypy Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(server): address review — robust static URL join + module logger - Normalise root with a trailing separator before slicing so a root without one can't yield a '/static//css/app.css' double-slash URL that fails to match requests (this was also the CI test failure) - Use a module-level logger instead of the root logger, per the codebase convention - Tests now assert the full canonical /static/ URLs so a URL-join regression can't slip through Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(server): address review — close scandir FDs, partial stub, real responses - Iterate os.scandir under a with-block so directory FDs close promptly on large/deep trees - Drive the middleware tests through update_files_dictionary directly (the test settings enable WHITENOISE_AUTOREFRESH, so __init__ never scans) and assert the full canonical /static/ URLs — this is also what the prior CI failure was telling us - get_response returns a real HttpResponse per Django's contract; tighten the stub's get_response to HttpResponseBase (no None) and mark py.typed partial, matching channels-stubs 
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-07 21:07:00 +02:00
Viktor Petersson	eb1baa6fbf	feat(sentry): stamp the release with the build hash and tag balena deploys (#3025 ) * feat(sentry): stamp the release with the build hash and tag balena deploys - The CalVer alone proved ambiguous: a balena OTA deploy from master ships new code while pyproject still carries the last tagged version, so pre- and post-deploy events were indistinguishable in the 2026.6.2 audit - Postfix the release with GIT_SHORT_HASH (already baked into every image by tools/image_builder) as semver build metadata: 2026.6.2+abc1234 - Tag events balena=true/false via the same is_balena_app() helper the reboot/shutdown tasks use, so triage knows which operational playbook applies - Add tests for the release stamping and the tag wiring Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(sentry): keep the balena check import-light, test it behaviorally - Importing anthias_common.utils into Django settings dragged sh/requests/redis into every settings load and crashed django-stubs' mypy plugin in CI's slim environment - Inline the BALENA env check as is_balena_deploy() and pin it against the canonical is_balena_app() helper in a test so the two can't drift (also replaces the brittle source-text assertion — Copilot) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-07 19:42:02 +02:00
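
The release stamping in the commit above amounts to appending the build hash as semver build metadata. The sketch below is an assumption-laden illustration: `build_release` is a hypothetical name, and the truthiness test on a `BALENA` environment variable is a guess at what "the BALENA env check" means, not a confirmed reading of the project's `is_balena_deploy()`.

```python
import os
from typing import Mapping, Optional


def build_release(version: str, git_short_hash: Optional[str]) -> str:
    """Return 'VERSION+HASH' when a build hash is known, else the bare version."""
    return f"{version}+{git_short_hash}" if git_short_hash else version


def is_balena_deploy(env: Optional[Mapping[str, str]] = None) -> bool:
    """Assumed check: treat a non-empty BALENA variable as a balena deploy."""
    env = os.environ if env is None else env
    return bool(env.get("BALENA"))
```

For example, `build_release("2026.6.2", "abc1234")` yields `2026.6.2+abc1234`, the form quoted in the commit message, so two builds that share a tagged version still report distinct releases.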
Viktor Petersson	9863d8c9d3	fix(viewer): aspect-fit, gapless looping, and 30fps cap for pi1/2/3 video (#3004 ) * fix(viewer): aspect-fit, gapless looping, and 30fps cap for pi1/2/3 video Fixes the issue #2987 regressions on the Qt5 linuxfb boards by moving playback from a bash gst-launch relaunch loop into a small in-process GStreamer helper (anthias_viewer/gst_fbdev_player.py): - Portrait/4:3 videos no longer stretch to the framebuffer: a CAPS-event pad probe reads the decoder's native dims + PAR and pins aspect-fit caps (pixel-aspect-ratio=1/1) on the capsfilter, so the bcm2835 ISP scales aspect-correct and fbdevsink centers the frame. The previous fb-sized forced caps parked the distortion in a PAR that fbdevsink ignores (reproduced on-device: 1080x1920 -> 3840x2160 par 81/256). - Clips no longer freeze/cut at loop boundaries: playbin about-to-finish re-queues the same URI for a gapless loop instead of rebuilding the whole pipeline per iteration (0.4-1.7 s per loop measured on a Pi 4, several seconds on a Pi 3, all eaten out of the fixed slot duration). Flush-seek on EOS and NULL->PLAYING restart remain as fallbacks. - 50/60 fps sources drop to an even 30 fps cadence up front (videorate drop-only) instead of juddering on irregular late-frame drops; the decode->ISP->memcpy chain sustains ~40 fps at 1080p on a Pi 3. - The framebuffer is zeroed at startup so letterbox borders are black rather than remnants of the previous asset. - The helper runs by path (not -m) so the package __init__ (Django settings, redis, D-Bus) never imports in the child; validated e2e on the armhf image: negotiation, rotation, looping, SIGTERM exit 0. 
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(viewer): address review feedback on the fbdev player helper - Correct the module docstring: the helper is executed by file path, not -m (the package __init__ must not import in the child) - Fail fast with a clear log line when the GStreamer python bindings are missing instead of crashing with a traceback - Clear the framebuffer in scanline-sized chunks so a 4K console doesn't peak a ~33 MB allocation on a 512 MB board Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(viewer): degrade to silent video when the audio branch fails Integrated testbed run surfaced a wholesale-failure mode the relaunch loop also had: a broken audio branch killed the video with it. Two real-world triggers: the ALSA card is absent (HDMI audio disabled in config.txt -> no vc4hdmi), and an undecodable audio codec (AC3 - a52dec lives in plugins-ugly, not shipped). Retry once with GST_PLAY_FLAG_AUDIO cleared on both the synchronous start failure (alsasink can't reach READY) and the first async pipeline error; a genuine video error recurs on the retry and still exits non-zero. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(viewer): read the visible fb resolution via FBIOGET_VSCREENINFO Integrated testbed run surfaced a divergence the sysfs read hides: sysfs virtual_size reports xres_virtual/yres_virtual, which can be larger than the scanned-out mode (panning / double-buffer configs — observed live: visible 1920x1080, virtual 3840x2160). fbdevsink centers/crops against varinfo.xres/yres, so scaling to the virtual size paints mostly off-screen. Query the same ioctl fbdevsink uses; keep the sysfs read (and the 1080p default) as fallbacks for hosts without fb access. 
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(viewer): swap the audio sink for fakesink on the video-only retry Clearing GST_PLAY_FLAG_AUDIO is not sufficient: an element set on the audio-sink property remains a playsink child and is still state-synced with the pipeline, so a failing alsasink failed the retry too (observed live on the testbed). Replace it with a fakesink when degrading to silent video. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(viewer): pre-flight the ALSA device and rebuild on audio failure The integrated testbed run showed two things the previous in-place retry missed: - a playbin whose sink activation failed does not reliably restart after NULL: the video-only retry failed instantly on the reused element with no further GStreamer error; - alsasink opens the PCM on NULL->READY, so a missing card is detectable synchronously before it can poison playbin's whole sink activation. So: pre-flight the device with a standalone alsasink and only wire the audio branch when it opens; on any pipeline error with audio enabled (e.g. an undecodable AC3 track mid-preroll), tear down and rebuild a fresh video-only playbin instead of restarting the errored one. Genuine video errors recur on the rebuilt pipeline and still exit non-zero. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(viewer): satisfy mypy on the gi-typed returns Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-07 16:00:33 +02:00
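
The aspect-fit step in the commit above (read the decoder's native dimensions and pixel aspect ratio, then pin square-pixel caps that fit the framebuffer) reduces to a small piece of arithmetic, sketched here. The function name `aspect_fit` and the round-down-to-even rule are assumptions made for this illustration; the commit message does not say how the helper rounds.

```python
from fractions import Fraction
from typing import Tuple


def aspect_fit(
    src_w: int, src_h: int, par_n: int, par_d: int, fb_w: int, fb_h: int
) -> Tuple[int, int]:
    """Largest square-pixel size with the source's display aspect that fits the fb.

    par_n/par_d is the source pixel aspect ratio; the result is meant to be
    used with pixel-aspect-ratio=1/1, so the PAR is folded into the width.
    """
    display_aspect = Fraction(src_w * par_n, src_h * par_d)
    if display_aspect >= Fraction(fb_w, fb_h):
        # Source is at least as wide as the screen: pin width, letterbox.
        out_w, out_h = fb_w, int(fb_w / display_aspect)
    else:
        # Source is narrower than the screen: pin height, pillarbox.
        out_w, out_h = int(fb_h * display_aspect), fb_h
    # Assumed rounding: clear the low bit so both dimensions are even.
    return out_w - out_w % 2, out_h - out_h % 2
```

A 1080x1920 portrait clip with square pixels on a 1920x1080 framebuffer comes out as 606x1080 under these assumptions, leaving the sink to center it between black borders.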
Viktor Petersson	e882417c0c	perf(viewer): pace Qt6 video frame delivery to scene-render capacity (#3006 ) * perf(viewer): pace video frame delivery to scene-render capacity Part of issue #2987: 1080p60 content on Pi 4 presented at 22.6 fps with the playback position falling to ~0.6x realtime (clips ended early), because every sink delivery scheduled a QML scene render on a GUI thread that sustains ~45 renders/s at 1080p — overload made throughput collapse below even the 30 fps a 1080p30 clip achieves. QMediaPlayer now renders into an intermediate QVideoSink; frames forward to the VideoOutput's sink only once the scene graph has composited the previous one (QQuickWindow::afterRendering). 30 fps sources pass untouched; 60 fps sources settle into an even ~half cadence instead of irregular drops. If the render signal is not wired the gate falls back to unpaced forwarding. Stats lines gain a frames-forwarded field between frames-delivered and frames-rendered so the gate is observable in playback-stats.log. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * perf(viewer): forward parked frames on render completion (mailbox) The 1-deep gate was stop-and-wait: render -> re-arm -> idle until the next sink delivery -> render, measuring only ~23 presented fps on a GUI thread that renders faster back-to-back. Park the newest frame that arrives mid-render in a single-slot mailbox and forward it the moment afterRendering fires, so renders chain at capacity with at most one frame of latency. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(viewer): reset the pacing gate and clear the sink on stop() A frame parked in the mailbox at stop() time could be forwarded by a later afterRendering (stale-frame flash on the next reveal) and kept its decoder buffer alive between assets. Clear the mailbox, re-arm the gate, and push an empty frame to the VideoOutput so the last displayed buffer is released too (review feedback). 
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-07 15:54:06 +02:00
Viktor Petersson	edcd6f381a	fix(celery): gate worker startup on applied database migrations (#3016 ) * fix(celery): gate worker startup on applied database migrations - The celery container boots in parallel with anthias-server, whose start script is still running dbbackup -> migrate (or the dbrestore fallback that drops and re-creates every table) - A task replayed off the Redis broker in that window died with OperationalError: no such table: assets — one burst per device on every upgrade/first boot (Sentry ANTHIAS-1) - Block in worker_init until the unapplied-migration plan is empty; tasks stay queued in the broker while waiting, so nothing is lost - Treat probe errors (database locked, missing django_migrations table) as not-ready instead of raising - Works for every deployment topology (compose, balena, dev, test) without touching the five compose files that spell the worker CMD Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(celery): address review — log probe failures, rate-limit wait warning - Log the migration-readiness probe's underlying error at DEBUG so a persistent non-transient cause is identifiable from device logs - Repeat the waiting warning every 30s instead of every 5s poll - Patch time.sleep by dotted path for strict mypy Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(celery): address review — only DatabaseError counts as not-ready - Catch django.db.utils.DatabaseError in the readiness probe; a programming bug now fails fast instead of parking the worker in an infinite wait - Reword the docstring to be accurate about DEBUG-level visibility - Add a fail-fast regression test Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(celery): address review — explicit next-log threshold for the wait warning Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(celery): address review — point the wait warning at the DEBUG probe logs Co-Authored-By: Claude Opus 
4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-07 15:52:42 +02:00
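
The startup gate described in the commit above can be sketched as a polling loop. Everything named here is hypothetical: `wait_for_migrations`, the injected `has_pending` probe, and `DatabaseNotReady`, which stands in for `django.db.utils.DatabaseError` so the example runs without Django. The 5 s poll and 30 s warning interval mirror the numbers in the commit message.

```python
import logging
import time
from typing import Callable

logger = logging.getLogger(__name__)


class DatabaseNotReady(Exception):
    """Stand-in for django.db.utils.DatabaseError in this sketch."""


def wait_for_migrations(
    has_pending: Callable[[], bool],
    poll_seconds: float = 5.0,
    warn_every: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Block until has_pending() reports no unapplied migrations.

    Database errors count as "not ready"; any other exception propagates so a
    programming bug fails fast. Returns the number of probes that were made.
    """
    probes = 0
    next_warning = time.monotonic()
    while True:
        probes += 1
        try:
            pending = has_pending()
        except DatabaseNotReady as exc:
            logger.debug("migration probe failed: %s", exc)
            pending = True
        if not pending:
            return probes
        now = time.monotonic()
        if now >= next_warning:
            logger.warning("waiting for migrations (see DEBUG logs for probe errors)")
            next_warning = now + warn_every
        sleep(poll_seconds)
```

Catching only the database error class is the design point: a locked database or missing table is expected during an upgrade, while any other exception should stop the worker rather than park it in an endless wait.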
Viktor Petersson	43b937563b	fix(sentry): stop reporting transient redis blips and client disconnects (#3018 ) * fix(sentry): stop reporting transient redis blips and client disconnects - Redis restarting (container recycle, compose startup before DNS resolves) produced an error event per process per blip even though every consumer self-heals: celery reconnects with backoff, the viewer's resolution reporter retries next tick, Channels re-establishes on the next frame (Sentry ANTHIAS-M, ANTHIAS-K, ANTHIAS-H, ANTHIAS-J) - Add a before_send hook that drops events whose exception chain contains redis.exceptions.ConnectionError or asyncio.CancelledError (an HTTP client hanging up mid-request under ASGI — ANTHIAS-N) - Silence celery's per-reconnect-attempt ERROR log at the logger (it arrives as a log message, not an exception) - Downgrade the viewer reporter's redis-down log to a warning and extract the tick body into a testable helper - Add regression tests for the filter and the reporter tick Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(sentry): address review — typed before_send, cleaner test fixtures - Annotate the hook with sentry_sdk.types Event/Hint for strict mypy - Build exc_info triples directly in tests instead of catching BaseException (Sonar S5754) and compare events by equality (Sonar S5796) - Use record.getMessage() in the caplog assertion (Copilot) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(tests): address review — make the ignored-logger test order-independent Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(tests): address review — lift the module-wide logging disable for caplog tests Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(sentry): silence celery beat's reconnect-retry log too - The embedded beat scheduler logs every broker reconnect attempt at ERROR ("beat: Connection error ... 
Trying again"), the same expected-transient noise as the consumer logger (Sentry ANTHIAS-P) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(sentry): address review — respect __suppress_context__ in the chain walk Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(compose): healthcheck redis and gate services on it answering PING - depends_on with bare service_started only orders container creation; uvicorn/celery/viewer could still race a redis that hadn't finished loading its RDB, producing the startup connection-refused noise (review feedback on this PR) - Add a redis-cli ping healthcheck to the prod template, dev, and test composes, and gate anthias-server / anthias-viewer / anthias-celery on service_healthy - compose-only: the balena supervisor doesn't support depends_on conditions, and a redis container recycling mid-life is gated by nothing — so the Sentry-side handling of transient redis errors stays Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-07 15:51:54 +02:00
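
The event filter from the commit above can be sketched as a `before_send(event, hint)` hook that walks the exception chain. This is an illustration, not the project's hook: `TransientConnectionError` is a stand-in for `redis.exceptions.ConnectionError` so the snippet runs without redis or sentry-sdk installed, and `iter_chain` is a hypothetical helper. The chain walk follows `__cause__` first and only follows `__context__` when `__suppress_context__` is false, as the review fix in the commit describes.

```python
import asyncio
from typing import Any, Dict, Iterator, Optional


class TransientConnectionError(Exception):
    """Stand-in for redis.exceptions.ConnectionError in this sketch."""


DROPPED_TYPES = (TransientConnectionError, asyncio.CancelledError)


def iter_chain(exc: Optional[BaseException]) -> Iterator[BaseException]:
    """Yield exc and the exceptions it was raised from, as a traceback would show."""
    seen = set()
    while exc is not None and id(exc) not in seen:
        seen.add(id(exc))  # guard against a cyclic chain
        yield exc
        if exc.__cause__ is not None:
            exc = exc.__cause__
        elif not exc.__suppress_context__:
            exc = exc.__context__
        else:
            exc = None  # "raise ... from None": the context is hidden


def before_send(event: Dict[str, Any], hint: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return None to drop events caused by expected transient failures."""
    exc_info = hint.get("exc_info")
    if exc_info is None:
        return event
    if any(isinstance(exc, DROPPED_TYPES) for exc in iter_chain(exc_info[1])):
        return None
    return event
```

Returning `None` from the hook is how the SDK is told to discard an event; everything else passes through unchanged.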
Viktor Petersson	50cc80455a	feat(sentry): tag events with device type, host kernel, and board model (#3021 ) * feat(sentry): tag events with device type, host kernel, and board model - Events are sent from inside containers, so Sentry's stock OS detection never sees the host — the armv7 webview-crash cohort (ANTHIAS-D / ANTHIAS-F) was impossible to segment because nothing on the event said which kernel the device boots - Tag device_type (baked into the image), kernel_release / kernel_machine (containers share the host kernel), and the device-tree board model - Add tests for the board-model reader Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(sentry): address review — decode the device-tree model as UTF-8 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(sentry): address review — trim only the trailing NUL/whitespace Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(sentry): address review — match device_helper's device-tree trim idiom Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-07 14:50:29 +02:00
Viktor Petersson	daa1d4bbb6	fix(server): open SQLite with WAL journaling and a busy timeout (#3015 ) * fix(server): open SQLite with WAL journaling and a busy timeout - uvicorn, the celery worker, and the viewer share one SQLite file across containers; the stock rollback journal plus a 0s busy timeout raised OperationalError: database is locked fleet-wide (Sentry ANTHIAS-C, ANTHIAS-E, ANTHIAS-G) - timeout=20 waits for the lock instead of failing on the spot - journal_mode=WAL lets readers and the writer coexist; synchronous=NORMAL is the recommended WAL pairing - transaction_mode=IMMEDIATE queues concurrent writers on the busy handler instead of deadlocking on the read-to-write lock upgrade - Add regression tests covering the connection options Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(tests): annotate tmp_path for strict mypy Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-07 14:43:19 +02:00
Viktor Petersson	5b6e0a4adc	fix(test): give the test-stack celery worker the test environment (#3013 ) * fix(test): give the test-stack celery worker the test environment The anthias-celery service in docker-compose.test.yml lacked ENVIRONMENT=test and ANTHIAS_TEST_DB_PATH, so the worker's settings took the production branch. - Worker read the empty /data/.anthias/anthias.db instead of the test DB, failing every dispatched task with "no such table: assets" - Worker picked up the default production Sentry DSN, leaking test-run errors into the production project tagged as production - Share the environment block between anthias-test and anthias-celery via a YAML anchor so the two cannot drift again Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * test(integration): assert video normalisation's terminal state With the celery worker now on the correct test DB, the upload test raced the worker: it asserted the transient placeholder duration and is_processing=True, which the (newly working) normalisation pipeline overwrites moments later. - Wait for is_processing to clear instead of racing the worker - Assert the probed duration (5 s) and the codec-gate rejection (no DEVICE_TYPE in the test container means an empty HW-decode set) - The wait doubles as a regression guard for the worker reading the wrong DB: that failure mode leaves the row stuck on processing Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * test: fix comment typo (ffprobe'd) from Copilot review Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-07 14:39:30 +02:00
Viktor Petersson	633b41ed87	fix(celery): catch the asset-probe timeout instead of hard-killing the worker (#3017 ) * fix(celery): catch the asset-probe timeout instead of hard-killing the worker - revalidate_asset_url's 30s hard limit was reachable by a legitimate probe (DNS stall + HEAD 10s + GET 10s), and tripping it SIGKILLs the pool child — three Sentry issues per occurrence (ANTHIAS-A, ANTHIAS-9, ANTHIAS-B) - Add soft_time_limit=60 / time_limit=90: the soft limit raises inside the task, which records the verdict an HTTP timeout gets (unreachable) instead of dying - Give the periodic sweep the same treatment: abort cleanly a minute before its hard limit, releasing the singleton lock - Add regression tests for limits and soft-timeout behaviour Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(celery): address review — cover the DB update with the soft-limit catch - The soft signal is delivered asynchronously, so it can land during the row UPDATE as well as the probe; cover the whole task body in both the on-demand recheck and the sweep - Re-raise SoftTimeLimitExceeded past the sweep's blanket per-asset handler so the outer abort path sees it - Satisfy strict mypy on the Optional time-limit comparisons; reword a misleading test comment Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-07 14:38:34 +02:00
Viktor Petersson	9ff20a82b6	fix(server): log GitHub release-check failures at warning, not error (#3019 ) * fix(server): log GitHub release-check failures at warning, not error - A device that can't reach api.github.com (offline installs, locked-down networks, GitHub outages, rate limits) is a routine condition the update check already degrades through gracefully: 5-minute backoff plus the cached last verdict - ERROR-level logs land in Sentry, so every offline device produced events on each splash-page render (Sentry ANTHIAS-8) - Downgrade every failure path in lib/github.py to warning and pin the module as ERROR-free with a regression test Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(tests): address review — AST-walk the github module for ERROR logging - Replace the brittle substring check with an AST walk over actual logging.* calls, so comments can't false-positive and ERROR-level calls can't slip through renamed (Copilot) - Patch logging by dotted path for strict mypy Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(tests): address review — drop unused fixture, accurate AST-check docstring Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(tests): address review — include the logging.fatal alias in the AST guard Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-07 14:37:47 +02:00
Viktor Petersson	d2960d56f4	fix(viewer): respawn the webview when it dies before setup's bus.get (#3020 ) - The armv7 Qt5 init crash can strike in the gap between the D-Bus handshake (which made load_browser() return) and setup()'s bus.get — the anthias.viewer name is released again, pydbus raises ServiceUnknown, and the GError escaped main() into a container restart loop (Sentry ANTHIAS-3) - Apply the same webview-gone detection and respawn-then-retry-once contract as _send_to_webview (#3012), with the generous startup budget since nothing is on screen yet - Unrelated D-Bus errors (e.g. Disconnected) still propagate so the container restart handles what a respawn can't fix - Add regression tests for both paths Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-07 13:39:49 +02:00
Viktor Petersson	50aa201e53	fix(viewer): respawn the webview when it dies mid-D-Bus call instead of crashing (#3012 ) * fix(viewer): respawn the webview when it dies mid-D-Bus call instead of crashing - The armv7 Qt5 heap-corruption crash that load_browser() already retries past can also strike after the D-Bus handshake; the death then surfaces as a GError (NoReply) out of the in-flight loadImage/loadPage call, escapes main(), and turns one process crash into a container restart loop (Sentry 58040ab3) - Wrap loadImage/loadPage in _send_to_webview(): on a webview-gone D-Bus error, reap the dead process, respawn with the inline budget, and retry the call once; anything else still raises - Reset current_browser_url in load_browser() so a respawned webview always gets its asset re-sent, even when the URL is unchanged (a crashed single-asset playlist previously respawned to a blank screen forever) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(tests): use https URLs in new viewer tests - Resolves SonarCloud S5332 security hotspots on the PR Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-07 11:09:42 +02:00
Viktor Petersson	8ff0331c32	fix(viewer): silence sh's monitor-thread crash report on webview exit (#3011 ) - Pass _bg_exc=False when spawning AnthiasViewer: sh's default re-raises the exit error (e.g. SignalException_SIGABRT on a Qt init crash, or SIGTERM from our own teardown) inside its daemon monitor thread, where nothing can catch it - Sentry reported these as unhandled errors even though the handshake watch already detects the death and load_browser() already retries - The handle is only used via is_alive()/process.stdout/terminate(), never .wait(), so no exception is silently deferred - Lock the kwargs in with a regression assertion in test_load_browser Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-07 10:36:36 +02:00
Viktor Petersson	f26f3e7e0c	fix(server): keep legacy host timezones from crash-looping Django (#3010 ) - Hosts with a legacy alias in /etc/timezone (e.g. US/Central) passed the old pytz validation but failed Django's own check against /usr/share/zoneinfo, raising ValueError at startup in every Django process (server, celery, migrate) - Validate the way Django does: a zoneinfo lookup plus the on-disk file check, falling back to UTC when either fails - Ship tzdata-legacy in the base image so legacy aliases resolve and devices keep their actual local time instead of dropping to UTC - Add regression tests for the validation fallback Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-07 10:31:41 +02:00
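The validation fallback in #3010 can be approximated as below. This is a sketch under assumptions: `valid_timezone` is a hypothetical name, and the file check here is stricter than Django's own, which only applies when the zoneinfo root directory exists.

```python
import os
import zoneinfo


def valid_timezone(name: str, zoneinfo_root: str = "/usr/share/zoneinfo") -> str:
    """Return name if it resolves and exists on disk, otherwise "UTC"."""
    try:
        # First check: the standard-library lookup must succeed.
        zoneinfo.ZoneInfo(name)
    except (zoneinfo.ZoneInfoNotFoundError, ValueError, OSError):
        return "UTC"
    # Second check: the zone file must exist under the system zoneinfo root.
    # A legacy alias such as US/Central is only present there when the
    # legacy tzdata files are installed.
    if not os.path.isfile(os.path.join(zoneinfo_root, *name.split("/"))):
        return "UTC"
    return name
```

Falling back to UTC keeps every Django process bootable; shipping the legacy zone files, as the commit does, is what lets such hosts keep their real local time.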
Viktor Petersson	040094a35e	chore(release): bump to 2026.06.2 (#3009 ) - CalVer (YYYY.0M.MICRO); still June 2026, micro 1 -> 2 - Ships the Qt 6 video audio fix (#3001) — PulseAudio in the viewer container; videos were silent on pi4-64/pi5/x86/arm64 since the QtMultimedia migration - Adds the arm64/Qt6 pi3-64 board and the Rock Pi 4 fleet (#2985) - Page-load watchdog so a stalled fetch can't freeze the display (#3003), Sentry error tracking for the Django services (#3007) - Redis data persisted to the mounted volume so device identity survives recreation (#2983); unpinner also rolls OS + supervisor updates (#2984) - Streamed backup downloads (#3005), 12-hour AM/PM asset times (#3002), BuildKit frontend via mirror.gcr.io (#3008) Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-07 08:36:23 +02:00
Viktor Petersson	da53e045a6	fix(server): stream the backup download so large libraries don't time out (#3005 ) * fix(server): stream the backup download so large libraries don't time out Part of issue #2987 ("Get backup" reportedly never completes on a Pi 3B+): the settings download built the whole tar.gz on disk before sending the first byte. tar+gzip at the default level 9 measures 98 s for 355 MB on a Pi 4 (~3.6 MB/s) — a multi-GB library on a Pi 3 sends nothing for longer than a browser keeps a byte-less request alive, so the download always aborted. - settings_backup now streams the archive while it is built (StreamingHttpResponse over a pipe fed by a tarfile producer thread): first bytes hit the wire immediately, nothing is staged on the SD card, and a client disconnect stops the producer. - gzip level drops to 1 for both the streamed and the API (create_backup) paths — backups are mostly already-compressed media, so level 9 burned minutes of CPU for ~no size win. - The Content-Disposition filename is RFC-8187-escaped via Django's content_disposition_header (player_name is operator-controlled). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(server): surface backup-stream producer failures, fix mypy type - stream_backup() re-raises a producer failure once the pipe drains, so the response aborts mid-transfer instead of completing 200 with a silently truncated archive (review feedback). Client disconnects still log-and-stop without morphing into spurious errors. - Return type is Generator (not Iterator) — the disconnect test calls .close(), which mypy rejects on Iterator. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-07 08:00:28 +02:00
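The "pipe fed by a tarfile producer thread" design from #3005 can be illustrated with the standard library alone. The sketch below is not the project's stream_backup(): `stream_tar_gz` is a hypothetical name, it archives in-memory byte strings instead of the asset directory, and it is not wired into StreamingHttpResponse.

```python
import gzip
import io
import os
import tarfile
import threading
from typing import Generator


def stream_tar_gz(
    files: dict[str, bytes], chunk_size: int = 64 * 1024
) -> Generator[bytes, None, None]:
    """Yield a tar.gz archive chunk by chunk while a thread builds it."""
    read_fd, write_fd = os.pipe()
    errors: list[BaseException] = []

    def produce() -> None:
        try:
            with os.fdopen(write_fd, "wb") as sink:
                # gzip level 1: media is mostly already compressed.
                with gzip.GzipFile(fileobj=sink, mode="wb", compresslevel=1) as gz:
                    # "w|" writes a non-seekable tar stream into the gzip layer.
                    with tarfile.open(fileobj=gz, mode="w|") as tar:
                        for name, data in files.items():
                            info = tarfile.TarInfo(name)
                            info.size = len(data)
                            tar.addfile(info, io.BytesIO(data))
        except BaseException as exc:
            # Includes BrokenPipeError when the consumer goes away early.
            errors.append(exc)

    thread = threading.Thread(target=produce, daemon=True)
    thread.start()
    try:
        while True:
            chunk = os.read(read_fd, chunk_size)
            if not chunk:  # producer closed its end: archive complete
                break
            yield chunk
    finally:
        # Closing the read end makes the producer's next write fail,
        # which stops it if the consumer disconnected mid-stream.
        os.close(read_fd)
    thread.join()
    if errors:
        # Surface producer failures instead of ending with a truncated archive.
        raise errors[0]
```

Because the pipe has a bounded kernel buffer, the producer blocks whenever the consumer falls behind, so memory use stays flat regardless of archive size.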
Viktor Petersson	10c68b26cc	feat(viewer,build,balena): add arm64/Qt6 pi3-64 board and the Rock Pi 4 fleet; keep 32-bit pi3 as legacy (#2985 ) * feat(viewer,build): add arm64/Qt6 pi3-64 board; keep 32-bit pi3 as legacy Revises issue #2906 Phase 2. The original plan (delete the Qt 5 toolchain, force Pi 2/Pi 3 onto Qt 6) is abandoned: Qt 5 was fixed up on master and stays. Instead, add a NEW board target `pi3-64` — a 64-bit (arm64) Qt 6 viewer image for Raspberry Pi 3 hardware on a 64-bit OS — as its own image stream, disk image, and balena fleet. The legacy 32-bit armhf/Qt5 `pi3` board is left untouched and flagged as legacy/maintenance. pi3-64 mirrors the existing `pi4-64` path (Qt 6, eglfs_kms; video played in-process by AnthiasViewer's QtMultimedia pipeline — QMediaPlayer + the ffmpeg/libavcodec backend with V4L2 HW decode, no external player). VideoCore IV is H.264-only HW decode. Board selection is by `uname -m`: a Pi 3 on a 64-bit OS gets `pi3-64`, a 32-bit OS keeps `pi3` (the model string is identical on both arches). - image_builder: pi3-64 build params (arm64) + is_qt6; constants. - Dockerfile.viewer.j2 + start_viewer.sh: pi3-64 shares the pi4-64 eglfs KMS path; renamed board-agnostic eglfs-kms-pi4.json -> eglfs-kms.json. - Detection: install.sh / upgrade_containers.sh (aarch64 Pi 3 -> pi3-64). - Runtime: media_player force_mpv set (selects MPVMediaPlayer, the QtMultimedia D-Bus shim); processing codec grid {'h264'}. - CI: docker-build matrix + mirror-latest-tags. - Balena (fleet screenly_ose/anthias-pi3-64, device type raspberrypi3-64): disk-image + manual-deploy workflows, balena_ota_deploy.sh, balena_fleet_maintenance.py, balena_unpin_devices.py, deploy_to_balena.sh, balena-host-config.json. - Pi Imager: SUPPORTED_BOARDS += pi3-64 (non-maintenance); pi3 stays legacy. - Docs + tests. 
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * docs(website): link the Pi 3 (64-bit) bullet like its siblings Copilot review: the list is introduced as 'links to the images', so the new pi3-64 entry should be navigable like the surrounding bullets. Link the label to the release-images section. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * feat(balena): add the Rock Pi 4 fleet (screenly_ose/anthias-rockpi4) Wires the anthias-rockpi4 balena fleet (device type rockpi-4b-rk3399) into the OTA deploy + disk-image pipeline. The fleet has no board-specific image build: it runs the generic arm64 containers, so bin/balena_ota_deploy.sh / bin/deploy_to_balena.sh map the rockpi4 board to the <short-hash>-arm64 image tags (and strip the /dev/vchiq mount — no VideoCore on RK3399), and the disk-image preflight verifies the arm64 images exist. Root-cause fix for the fleet's codec gate: balena ships no anthias_host_agent service, so host:board_subtype was never published and resolve_device_key() stayed 'arm64' — whose HW-decode set is empty, rejecting every video upload. The model-string → subtype table moves to the dependency-free anthias_common.device_helper.detect_board_subtype (single source, imported by host_agent), and anthias_common.board.get_board_subtype now falls back to reading /proc/device-tree/model in-container when Redis has no value. The device tree is kernel-global — the same mechanism get_device_type has always used for Pi detection — so the rockpi4 fleet resolves its {h264, hevc} envelope without a host-side daemon, and compose installs whose host_agent died self-heal too. - build-balena-disk-image.yaml: rockpi4 in both matrices, fleet + rockpi-4b-rk3399 image cases, arm64 images in the preflight check. - deploy-balena-manual.yaml: rockpi4 board option. - balena-host-config.json: rockpi4 declared {} (config.txt is RPi-only; the reconcile hard-fails on a missing key). 
- balena_fleet_maintenance.py / balena_unpin_devices.py: fleet added. - tests: get_board_subtype Redis-first + device-tree-fallback order; detect_board_subtype patch targets follow the move. - docs: board-enablement, balena-fleet-host-config, installation-options. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-07 07:49:12 +02:00
Viktor Petersson	6f01e0aa33	fix(webview): add page-load watchdog so a stalled fetch can't freeze the display (#3003 ) - Chromium has no overall navigation timeout: a fetch whose packets stop arriving mid-load (WiFi dropout, no FIN/RST) keeps loadFinished from ever firing, so the dual-view swap never happens and the screen freezes on the previous asset until the container is restarted - Arm a single-shot QTimer per load attempt (default 30s, tunable via ANTHIAS_WEBPAGE_TIMEOUT_S, clamped to 5-3600s) - On timeout, stop() the wedged navigation (cancelling its pending network I/O so dead sockets don't pile up in the connection pool) and retry the same URI on a fresh request - Failed-fast loads (DNS / connection refused) reuse the watchdog as a paced retry tick - the viewer only re-sends loadPage when the URL changes, so a single-webpage playlist gets no other retry - Disarm the watchdog on every path that supersedes the pending load (success, loadImage, playVideo) Fixes #2999 Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-07 07:48:23 +02:00
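The timeout handling in #3003 (default 30 s, tunable via ANTHIAS_WEBPAGE_TIMEOUT_S, clamped to 5-3600 s) amounts to a small parse-and-clamp step. The watchdog itself lives in the Qt webview; the Python sketch below only illustrates the clamping rule, `webpage_timeout_s` is a hypothetical name, and falling back to the default on an unparsable value is an assumption.

```python
import os
from typing import Mapping

DEFAULT_TIMEOUT_S = 30
MIN_TIMEOUT_S = 5
MAX_TIMEOUT_S = 3600


def webpage_timeout_s(environ: Mapping[str, str] = os.environ) -> int:
    """Read the page-load timeout in seconds, clamped to the allowed range."""
    raw = environ.get("ANTHIAS_WEBPAGE_TIMEOUT_S", "")
    try:
        value = int(raw)
    except ValueError:
        # Assumption: unset or malformed values fall back to the default.
        return DEFAULT_TIMEOUT_S
    return max(MIN_TIMEOUT_S, min(MAX_TIMEOUT_S, value))
```

For example, a value of `1` yields 5 and a value of `99999` yields 3600, so a misconfigured device can neither retry in a tight loop nor effectively disable the watchdog.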
Viktor Petersson	e211ec848e	feat(sentry): add Sentry error tracking to the Django services (#3007 ) * feat(sentry): add Sentry error tracking to the Django services Adds sentry-sdk to the server and viewer dependency groups (plus the mypy group, since that CI job imports settings) and initialises it in the shared Django settings module, so anthias-server, anthias-celery, and the viewer all report crashes. - The DSN can be overridden via the SENTRY_DSN env var (pointing at an operator's own Sentry project), or set to an empty string to disable crash reporting entirely. - environment= mirrors the ENVIRONMENT env var (production default) so dev/CI events are filterable in Sentry. - release= comes from pyproject.toml's [project].version via the existing get_anthias_release() helper. - send_default_pii is gated on the existing analytics_opt_out knob in anthias.conf — opted-out devices still report crashes, but without request headers / IPs / user data. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(sentry): don't send events from test runs by default The unit suite is built to run with no external network dependencies (conftest.py force-mocks Redis for the same reason), and exceptions raised on purpose by failing tests must not land in the production Sentry project. Default the DSN to empty under ENVIRONMENT=test or pytest (reusing the existing argv detector, moved up from the DATABASES section); an explicit SENTRY_DSN still wins so the integration stack can opt in deliberately. Addresses the Copilot review comment on the hardcoded default DSN. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * test(sentry): pin the no-events-under-pytest guarantee Asserts the client has no DSN, builds no transport, and drops capture calls when running under pytest, so a future settings.py refactor can't silently re-enable sending from test runs. 
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-07 07:41:46 +02:00
Viktor Petersson	7fc57fecf0	fix(docker): pull the BuildKit frontend via mirror.gcr.io (#3008 ) * fix(docker): pull the BuildKit frontend via mirror.gcr.io The `# syntax=docker/dockerfile:1.4` directive made every image build fetch the frontend from registry-1.docker.io — the last remaining Docker Hub dependency (base images already come from mirror.gcr.io, bun/uv from ghcr.io). Docker Hub pulls from shared GitHub runner IPs intermittently time out, failing CI before the build even starts. Re-point the directive at Google's pull-through cache, which serves the same multi-arch manifest list. The version pin stays for frontend reproducibility. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * chore(docker): bump the BuildKit frontend pin from 1.4 to 1.24 1.4 dates to May 2022; 1.24 is the current release. Nothing in the templates needs newer syntax (--mount=type=cache predates 1.4), so this is purely picking up four years of frontend bugfixes. Keeps the minor-pin convention — the tag floats only over patch releases. Validated by building the rendered redis image against the mirrored 1.24 frontend. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(docker): use ENV key=value form flagged by 1.24 build checks `docker build --check` with the 1.24 frontend flags the legacy `ENV DEBIAN_FRONTEND noninteractive` form (LegacyKeyValueFormat) in the test template — the only hit across all four templates. All rendered Dockerfiles now lint clean against the new frontend. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-07 07:41:21 +02:00
Viktor Petersson	fb4770bfe3	feat(balena): unpinner also rolls OS + supervisor updates (#2984 ) * fix(viewer): fix two startup warnings surfaced by the log cleanup With debug logging removed (#2977), two pre-existing startup warnings became visible in the viewer logs. Both are fixed here: - migrate_legacy_paths.sh: `set -euo pipefail` + the bare `${USER}` in USER_HOME aborted the script when $USER is unset — which is the case in-container (DATA mode, via migrate_in_container_paths.sh). On a legacy (screenly→anthias) device that would skip the in-container migration entirely. Fall back to `id -un` so it resolves without tripping `set -u`. - AnthiasViewer: the MainWindow ctor called showFullScreen() AND main() called window->show(), showing the window twice. Under cage/wayland the double surface-commit tripped wlroots' "A configure is scheduled for an uninitialized xdg_surface" warning at startup. Show once, in main(), after construction. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * feat(balena): unpinner also rolls OS + supervisor updates bin/balena_unpin_devices.py cleared the app-release pin but left unpinned devices on whatever old balenaOS / supervisor they booted. Add two optional, cloud-API-only phases so the one hourly job brings the fleet fully current: - --os-update: start a resinhup toward the latest balenaOS for the device type (POST actions/<base>/v2/<uuid>/resinhup, base resolved from /config). Online-only, skips OS < 2.14.0 (single-hop HUP floor per balena-hup-action-utils), bounded to --os-percent (default 5%) of the eligible population per run so the backlog ramps instead of stampeding. - --supervisor: point devices at the newest supervisor release for their CPU architecture (PATCH should_be_managed_by__release). Restricted to devices already on the target OS so the supervisor/OS pairing stays compatible, and tranche-bounded like the OS phase. 
Both phases honour the anthias_keep_pinned opt-out, stay dry-run by default, and keep the aggregate-only public-log output. The hourly workflow now passes --os-update --supervisor. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(balena): skip supervisor bump on legacy-OS device types getSupervisorReleasesForCpuArchitecture returns the newest supervisor for an arch regardless of OS. On a device type frozen on a legacy balenaOS line that's unsafe: raspberry-pi2 tops out at balenaOS 5.1.x (devices run supervisor 15.x), and pointing them at 17.x would likely break them. Only run the supervisor phase when the fleet's target OS is on the calendar-versioned line (major >= 2025); legacy-OS fleets keep their OS-matched supervisor. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(balena): set a User-Agent so Cloudflare allows the resinhup call The device-actions host (actions.balena-devices.com) is behind Cloudflare, which 403s the default Python-urllib User-Agent as a banned client signature (error 1010) — so every OS-update POST failed. Send a descriptive User-Agent on all requests; the resinhup action then triggers normally (HTTP 202). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(balena): report actual per-fleet OS-update started/failed counts The per-fleet line printed the tranche size regardless of how many resinhup calls actually succeeded. Track started/failed per fleet and print both, so a few transient busy/offline failures are visible instead of hidden behind the planned count. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-07 07:24:43 +02:00

4086 Commits