mirror of
https://github.com/Screenly/Anthias.git
synced 2026-06-10 09:08:09 -04:00
* fix(install): persistent host-agent venv (anthias-host-agent.service 203/EXEC)

  PR #2843 switched the installer venv to a mktemp tmpdir cleaned up on EXIT, but anthias-host-agent.service's ExecStart still hardcodes /home/${USER}/installer_venv/bin/python. Every fresh install since that refactor leaves the unit in a status=203/EXEC restart loop with no Python at the configured path, and /api/v2/info then blocks ~80s on get_node_ip() waiting for the host_agent_ready key that will never appear.

  Split the two venvs:

  * INSTALLER_VENV: still ephemeral mktemp, used by ansible-core during install/upgrade and torn down by the EXIT trap.
  * HOST_AGENT_VENV: new persistent venv at /home/${USER}/installer_venv (path kept stable so devices installed before the refactor don't need a unit rewrite), recreated from the host dep group on every install + upgrade so deps track pyproject.toml.

  provision_host_agent_venv runs after install_ansible() and before run_ansible_playbook() so the venv exists before ansible's state: started fires the unit. On upgrade the unit is already loaded with the previous venv's in-memory interpreter, so the state: started no-op never picks up the new deps; restart explicitly when the unit is already active.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(celery): switch beat to in-memory scheduler (Python 3.13 dbm.sqlite3 locking)

  celery -B with the default PersistentScheduler stores its schedule via shelve. On Python 3.13, shelve defaults to dbm.sqlite3, which raises dbm.sqlite3.error: locking protocol intermittently under contention (observed on x86 but not pi4-64 in this build matrix, which is consistent with a benign-looking race specific to the amd64 docker layer's filesystem ordering). When Beat stalls, reconcile_stuck_processing and the other periodic tasks set up by setup_periodic_tasks stop firing, so stuck-in-is_processing assets never get re-dispatched.

  setup_periodic_tasks defines every periodic task statically (no django-celery-beat / no dynamic schedule edits), so a non-persistent scheduler is sufficient. Switch to celery.beat.Scheduler in all three compose files (prod template + dev + test) and drop the --schedule /tmp/celerybeat-schedule flag that's now unused. The telemetry cooldown comment is updated to reference the new flag; the actual 24h cooldown is still gated by the Redis TTL, which is the persisted source of truth.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(api): return 404 (not 500) for unknown asset_id across v1/v1.1/v1.2/v2

  AssetViewV{1,1_1,1_2,2}.get / put / patch / update and the shared DeleteAssetViewMixin / AssetContentViewMixin / ViewerCurrentAssetViewV1 all called Asset.objects.get(asset_id=...) bare. The Asset.DoesNotExist that fires for a deleted-or-typo'd id has no DRF exception handler registered, so it bubbled up as a 500 with the database traceback; caller sees a server error for what is structurally a missing resource.

  AssetRecheckViewV2 already gets this right via filter(...).exists() + explicit 404; standardise the rest by routing the lookup through django.shortcuts.get_object_or_404 (DRF's exception handler converts the resulting Http404 to a clean 404 Response). The new test_unknown_asset_id_returns_404 parametrises across every API version so a future view that reverts to Asset.objects.get bare trips immediately.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(api): rename queryset → asset in ViewerCurrentAssetViewV1

  get_object_or_404 returns a single Asset, not a queryset; the variable name was already misleading under the previous bare Asset.objects.get(...) call. Address Copilot review.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(install): silence uv cross-filesystem hardlink warning

  INSTALLER_VENV lands in /tmp (the mktemp -t default), while uv's cache lives at ~/.cache/uv on $HOME. On the typical Pi/Debian install /tmp is tmpfs and $HOME is the SD card, so uv's default hardlink mode fails for every wheel and falls back to a noisy "Failed to hardlink files; falling back to full copy" line. Set UV_LINK_MODE=copy on the install_ansible invocation so the fallback becomes the documented choice. provision_host_agent_venv is unaffected: both its venv and the uv cache live on $HOME, so hardlinks work there.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(compose): pass --remove-orphans on every up

  Surfaced during e2e testing: after a compose recreate, anthias-server's up -d emitted "Found orphan containers ([anthias-anthias-viewer-run-…]) … you can run this command with the --remove-orphans flag to clean it up." These linger from earlier `docker compose run` invocations that created run-NNN sidecar containers; without --remove-orphans they just keep running and clutter `docker ps`. Apply to both the prod upgrade path (upgrade_containers.sh) and the dev bring-up (start_development_server.sh).

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
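The venv split described in the first commit above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the installer's actual code: `DEMO_HOME`, `run_install`, and the `mkdir`/`touch` stand-ins for real venv creation are hypothetical, chosen so the lifecycle can be shown without Python or uv installed. Only the names `INSTALLER_VENV` and `HOST_AGENT_VENV` and the `installer_venv` directory name come from the commit message.

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical stand-in for /home/${USER}; keeps the demo out of the real home.
DEMO_HOME="$(mktemp -d)"
HOST_AGENT_VENV="${DEMO_HOME}/installer_venv"

# The function body is a subshell (parentheses, not braces), so the EXIT trap
# fires as soon as the simulated install finishes.
run_install() (
    INSTALLER_VENV="$(mktemp -d)"
    trap 'rm -rf "${INSTALLER_VENV}"' EXIT

    # Stand-in for creating the ephemeral venv that ansible-core would use.
    mkdir -p "${INSTALLER_VENV}/bin"

    # Stand-in for provisioning the persistent venv: recreate it from scratch
    # on every run so its contents track the current dependency list.
    rm -rf "${HOST_AGENT_VENV}"
    mkdir -p "${HOST_AGENT_VENV}/bin"
    touch "${HOST_AGENT_VENV}/bin/python"

    # Report the ephemeral path so the caller can check that it was removed.
    printf '%s\n' "${INSTALLER_VENV}"
)

EPHEMERAL_PATH="$(run_install)"

if [ -d "${EPHEMERAL_PATH}" ]; then
    echo "ephemeral venv still present"
else
    echo "ephemeral venv removed by the EXIT trap"
fi

if [ -e "${HOST_AGENT_VENV}/bin/python" ]; then
    echo "host-agent venv persists at ${HOST_AGENT_VENV}"
fi
```

The point of the sketch is the ordering: the persistent directory is populated before the function returns, so anything started afterwards (the systemd unit, in the real installer) finds an interpreter at the stable path even though the ephemeral directory is already gone.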
13 lines
247 B
Bash
Executable File
#!/bin/bash

set -euo pipefail

COMPOSE_ARGS=(
    '-f' 'docker-compose.dev.yml'
)

bin/generate_dev_mode_dockerfiles.sh

docker compose "${COMPOSE_ARGS[@]}" down --remove-orphans
docker compose "${COMPOSE_ARGS[@]}" up -d --build --remove-orphans
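
The script collects its compose flags in a Bash array and expands it with `"${COMPOSE_ARGS[@]}"`, so each element reaches `docker compose` as its own argument. A minimal sketch of that expansion follows, assuming a hypothetical second file `docker-compose.override.yml` (not part of the script as shown) and a hypothetical `count_args` helper that stands in for `docker compose` so the example runs without Docker:

```shell
#!/bin/bash
set -euo pipefail

COMPOSE_ARGS=(
    '-f' 'docker-compose.dev.yml'
)

# Hypothetical extra compose file, appended only to show how the array grows.
COMPOSE_ARGS+=('-f' 'docker-compose.override.yml')

# Stand-in for `docker compose`: reports how many arguments it received.
count_args() {
    echo "$#"
}

# Quoted [@] expansion yields one word per array element; the literal
# subcommand and flags that follow are appended as further words.
ARG_COUNT="$(count_args "${COMPOSE_ARGS[@]}" up -d --build --remove-orphans)"
echo "docker compose would receive ${ARG_COUNT} arguments"
```

Keeping the flags in an array rather than a plain string means a file name containing spaces would still be passed as a single argument, and extra `-f` entries can be added in one place for both the `down` and `up` calls.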