Mirror of https://github.com/Screenly/Anthias.git, synced 2026-06-10 17:18:43 -04:00
* fix(viewer): skip the Writeback connector in the eglfs headless guard

#2962 added wait_for_eglfs_display so that a screenless eglfs board (Pi 4) waits for a display instead of crash-looping on Qt's "no screens available". eglfs_has_display() treated any connector status other than "disconnected" as a present display, to tolerate bridges that report "unknown".

balenaOS 2026.x exposes a KMS `card0-Writeback-1` virtual connector that ALWAYS reports "unknown". On a headless Pi 4 (both HDMI ports "disconnected"), the writeback connector's "unknown" satisfied the guard, so startup skipped the wait, launched eglfs, and the viewer crash-looped on "no screens available" / "AnthiasViewer exited before emitting D-Bus handshake". That is exactly the failure #2962 was meant to prevent. Confirmed on multiple live Pi 4 boards running balenaOS 2026.1.0 (card0-HDMI-A-1/-2 = disconnected, card0-Writeback-1 = unknown).

Skip `*Writeback*` connectors so that only real display outputs (HDMI/DSI/DP/…) count. A genuinely headless board now waits gracefully, and the bridge-"unknown" hedge is preserved for real connectors. Verified locally for headless, connected, and bridge-unknown layouts.
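The connector filter described above can be sketched as follows. This is a minimal illustration, not the actual Anthias code: it assumes connectors are enumerated from the kernel's DRM sysfs entries (`/sys/class/drm/card*-*/status`), and the `drm_root` parameter is a hypothetical addition so the scan can be pointed at a test directory.

```python
from pathlib import Path

# Assumption: the guard scans DRM connector status files in sysfs.
DRM_SYSFS = Path("/sys/class/drm")


def eglfs_has_display(drm_root: Path = DRM_SYSFS) -> bool:
    """Sketch: True if any real display connector is not "disconnected".

    `drm_root` is a hypothetical parameter used only to make this testable.
    """
    for status_file in sorted(drm_root.glob("card*-*/status")):
        connector = status_file.parent.name  # e.g. "card0-HDMI-A-1"
        if "Writeback" in connector:
            # Virtual KMS writeback connector: always "unknown", never a screen.
            continue
        try:
            status = status_file.read_text().strip()
        except OSError:
            continue  # unreadable entry: do not count it as a display
        # Keep the bridge hedge: "unknown" on a real connector counts as present.
        if status != "disconnected":
            return True
    return False
```

On the headless layout from the report (both HDMI connectors "disconnected", Writeback "unknown") this returns False, so the caller keeps waiting instead of launching eglfs.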
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(viewer): retry AnthiasViewer spawn so armv7 WebEngine-init crash self-heals

- Wrap the AnthiasViewer launch in a capped exponential-backoff retry loop (BROWSER_SPAWN_MAX_ATTEMPTS) instead of raising on the first failed handshake
- Convert the tight container restart loop on Pi 2/Pi 3 into an in-process retry that self-heals on a later launch
- Publish viewer:webview_status to Redis (retrying/failed) so a stuck board is distinguishable from an empty playlist
- Add WebviewLaunchError + _spawn_webview_once helper; throttle repeat warnings to avoid flooding journald
- Cover retry-then-succeed and exhaust-then-raise paths in tests
- Document the armv7 WebEngine-init crash + retry stop-gap in docs/board-enablement.md

The 32-bit Qt5 viewer intermittently aborts during Chromium/WebEngine init (malloc(): unaligned tcache chunk detected) on ~75-90% of launches; reproduced on a 64-bit Pi 3B+. No userspace mitigation fixes the corruption, but a fresh launch clears it ~10-25% of the time, so retrying catches a good launch within a few attempts (validated on-device: handshake on attempt 6). The clean fix is arm64/Qt6 on a 64-bit OS.
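The capped exponential-backoff retry above might look roughly like this. It is a hedged sketch, not the shipped `load_browser`: the `spawn_once` callable (standing in for `_spawn_webview_once`), the injectable `sleep`, and the default budget values are all hypothetical. The two clamps and the chained final error mirror guards described later in this squash.

```python
import logging
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")
log = logging.getLogger("viewer")


class WebviewLaunchError(RuntimeError):
    """One launch attempt failed (handshake timeout, early exit, ...)."""


def load_browser(
    spawn_once: Callable[[], T],          # hypothetical stand-in for _spawn_webview_once
    max_attempts: int = 8,                # hypothetical default
    backoff_cap: float = 30.0,            # hypothetical default, in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry spawn_once(), sleeping 1s, 2s, 4s, ... capped at backoff_cap."""
    max_attempts = max(1, int(max_attempts))    # never skip the loop entirely
    backoff_cap = max(1.0, float(backoff_cap))  # no tight loop, no negative sleep()
    last_error: Optional[WebviewLaunchError] = None
    for attempt in range(1, max_attempts + 1):
        try:
            return spawn_once()
        except WebviewLaunchError as exc:
            last_error = exc
            if attempt == max_attempts:
                break  # budget exhausted: give up without a final sleep
            delay = min(backoff_cap, 2.0 ** (attempt - 1))
            log.warning("launch attempt %d/%d failed (%s); retrying in %.0fs",
                        attempt, max_attempts, exc, delay)
            sleep(delay)
    # Chain the last failure so the traceback keeps the underlying cause.
    raise WebviewLaunchError(
        f"AnthiasViewer failed to start after {max_attempts} attempts"
    ) from last_error
```

Injecting `sleep` keeps tests fast: passing a list's `append` records the backoff schedule instead of actually waiting.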
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(viewer): address review — bound status-beacon Redis, funnel CommandNotFound

- Give the webview health beacon a dedicated Redis client with short socket timeouts (connect_to_redis gains opt-in timeout params, defaulting to the historical blocking behaviour) so a Redis stall can't hang viewer startup inside the spawn-retry loop
- Wrap sh.CommandNotFound into WebviewLaunchError in _spawn_webview_once so a missing binary is reported and handled on the same path as every other launch failure instead of escaping the retry loop
- Reword the board-enablement note so it describes the WebEngine-init observation without referencing a --no-sandbox flag the viewer doesn't receive
- conftest: accept the new connect_to_redis kwargs in the fake factory

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(viewer): chain final launch error; harden second redis test patch

- Chain the exhausted-retries WebviewLaunchError from last_error so the traceback preserves the underlying failure (timeout / early-exit / wrapped CommandNotFound)
- conftest: the autouse _mock_redis fixture's connect_to_redis patch now accepts *args, **kwargs too (matches the import-time patch)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(viewer): scope spawn-retry by call site; drop write-only status beacon

Addresses self-review findings on the retry mechanism:

- Mid-playback respawn (view_image/view_webpage, on the asset_loop thread) now uses a small, short budget (BROWSER_SPAWN_INLINE_*) so a persistent crash can't freeze the loop (no rotations/skips/standby, watchdog starved) for minutes; startup keeps the generous budget. A persistent mid-run failure raises, and the container restart re-rolls the launch.
- Permanent failures (missing binary) raise WebviewBinaryMissingError and short-circuit the retry instead of burning the full backoff budget.
- _spawn_webview_once now reaps the terminated process (SIGTERM, wait, SIGKILL) on the handshake-timeout path so a retry can't overlap two AnthiasViewers contending for the framebuffer / D-Bus name.
- Reset the stale `browser` global before re-spawning.
- Poll the spawned process every 0.25s (was 1s) so a fast init crash is noticed promptly in the retry loop.
- Drop the write-only viewer:webview_status Redis beacon (no reader existed) and revert the connect_to_redis timeout-param widening + conftest churn; operator-visible status is the throttled log output.
- Tests: cover early-exit, terminate-on-timeout, missing-binary short-circuit, backoff growth, and the inline budget cap.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(viewer): clamp max_attempts to >=1; correct retry-logging comment

- Guard load_browser against a non-positive max_attempts, which would skip the loop and raise a confusing "0 attempts; last error: None"
- Reword the comment: the first failure logs its reason AND a retry line, so it's not literally "one log line per attempt"

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(viewer): clamp backoff_cap and startup_timeout in load_browser

- Clamp backoff_cap: a cap below 1s would devolve into a tight retry loop, and a negative one would make sleep() raise ValueError mid-retry and mask the real launch error
- Clamp a negative startup_timeout to 0 (immediate-timeout attempt)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
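The SIGTERM, wait, SIGKILL reaping on the handshake-timeout path can be illustrated with the standard-library `subprocess` module. This is only a sketch under that assumption: the viewer itself appears to launch processes through the `sh` library, and `reap_viewer` and its `grace` period are hypothetical names, not part of Anthias.

```python
import subprocess


def reap_viewer(proc: subprocess.Popen, grace: float = 5.0) -> int:
    """Hypothetical helper: ensure a viewer that missed its handshake is gone.

    Sends SIGTERM, waits up to `grace` seconds, then sends SIGKILL and waits
    again, so a retry never overlaps two viewers contending for the
    framebuffer or the D-Bus name. Returns the child's exit status.
    """
    if proc.poll() is not None:
        return proc.returncode  # already exited, e.g. an early init crash
    proc.terminate()  # SIGTERM on POSIX
    try:
        return proc.wait(timeout=grace)
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX; cannot be caught or ignored
        return proc.wait()
```

Waiting after `kill()` matters: it reaps the child so no zombie lingers and the next spawn starts from a clean slate.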
Documentation
This page has moved: the Anthias documentation now lives at https://anthias.screenly.io/docs/.