mirror of
https://github.com/Screenly/Anthias.git
synced 2026-06-10 17:18:43 -04:00
* fix(install): persistent host-agent venv (anthias-host-agent.service 203/EXEC)

  PR #2843 switched the installer venv to a mktemp tmpdir cleaned up on EXIT, but anthias-host-agent.service's ExecStart still hardcodes /home/${USER}/installer_venv/bin/python. Every fresh install since that refactor leaves the unit in a status=203/EXEC restart loop with no Python at the configured path, and /api/v2/info then blocks ~80s on get_node_ip() waiting for the host_agent_ready key that will never appear.

  Split the two venvs:

  * INSTALLER_VENV: still ephemeral mktemp, used by ansible-core during install/upgrade and torn down by the EXIT trap.
  * HOST_AGENT_VENV: new persistent venv at /home/${USER}/installer_venv (path kept stable so devices installed before the refactor don't need a unit rewrite), recreated from the host dep group on every install + upgrade so deps track pyproject.toml.

  provision_host_agent_venv runs after install_ansible() and before run_ansible_playbook() so the venv exists before ansible's state: started fires the unit. On upgrade the unit is already loaded with the previous venv's in-memory interpreter, so the state: started no-op never picks up the new deps; restart explicitly when the unit is already active.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(celery): switch beat to in-memory scheduler (Python 3.13 dbm.sqlite3 locking)

  celery -B with the default PersistentScheduler stores its schedule via shelve. On Python 3.13, shelve defaults to dbm.sqlite3, which raises dbm.sqlite3.error: locking protocol intermittently under contention. This was observed on x86 but not pi4-64 in this build matrix, which is consistent with a benign-looking race specific to the amd64 docker layer's filesystem ordering. When Beat stalls, reconcile_stuck_processing and the other periodic tasks set up by setup_periodic_tasks stop firing, so stuck-in-is_processing assets never get re-dispatched.

  setup_periodic_tasks defines every periodic task statically (no django-celery-beat / no dynamic schedule edits), so a non-persistent scheduler is sufficient. Switch to celery.beat.Scheduler in all three compose files (prod template + dev + test) and drop the --schedule /tmp/celerybeat-schedule flag that's now unused. The telemetry cooldown comment is updated to reference the new flag; the actual 24h cooldown is still gated by the Redis TTL, which is the persisted source of truth.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(api): return 404 (not 500) for unknown asset_id across v1/v1.1/v1.2/v2

  AssetViewV{1,1_1,1_2,2}.get / put / patch / update and the shared DeleteAssetViewMixin / AssetContentViewMixin / ViewerCurrentAssetViewV1 all called Asset.objects.get(asset_id=...) bare. The Asset.DoesNotExist that fires for a deleted-or-typo'd id has no DRF exception handler registered, so it bubbled up as a 500 with the database traceback; the caller sees a server error for what is structurally a missing resource.

  AssetRecheckViewV2 already gets this right via filter(...).exists() + explicit 404; standardise the rest by routing the lookup through django.shortcuts.get_object_or_404 (DRF's exception handler converts the resulting Http404 to a clean 404 Response). The new test_unknown_asset_id_returns_404 parametrises across every API version so a future view that reverts to Asset.objects.get bare trips immediately.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(api): rename queryset → asset in ViewerCurrentAssetViewV1

  get_object_or_404 returns a single Asset, not a queryset; the variable name was already misleading under the previous bare Asset.objects.get(...) call. Address Copilot review.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(install): silence uv cross-filesystem hardlink warning

  INSTALLER_VENV lands in /tmp (the mktemp -t default), while uv's cache lives at ~/.cache/uv on $HOME. On the typical Pi/Debian install /tmp is tmpfs and $HOME is the SD card, so uv's default hardlink mode fails for every wheel and falls back to a noisy "Failed to hardlink files; falling back to full copy" line. Set UV_LINK_MODE=copy on the install_ansible invocation so the fallback becomes the documented choice. provision_host_agent_venv is unaffected: both its venv and the uv cache live on $HOME, so hardlinks work there.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(compose): pass --remove-orphans on every up

  Surfaced during e2e testing: after a compose recreate, anthias-server's up -d emitted "Found orphan containers ([anthias-anthias-viewer-run-…]) … you can run this command with the --remove-orphans flag to clean it up." These linger from earlier `docker compose run` invocations that created run-NNN sidecar containers; without --remove-orphans they just keep running and clutter `docker ps`. Apply to both the prod upgrade path (upgrade_containers.sh) and the dev bring-up (start_development_server.sh).

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
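The 404 fix described in the commit message above replaces a bare lookup with one that turns "not found" into a dedicated error the framework can map to a 404. The following is a minimal, framework-free sketch of that pattern only. ASSETS, NotFound, get_asset_or_404 and handle_get are hypothetical names invented for illustration; the actual change routes lookups through django.shortcuts.get_object_or_404 and relies on DRF's exception handler.

```python
# Framework-free illustration only. ASSETS, NotFound, get_asset_or_404 and
# handle_get are hypothetical names, not Anthias, Django or DRF APIs.
ASSETS = {"abc123": {"asset_id": "abc123", "name": "Lobby slide"}}


class NotFound(Exception):
    """Stands in for django.http.Http404 in this sketch."""


def get_asset_or_404(asset_id):
    # Convert a missing key into a dedicated "not found" error instead of
    # letting the raw lookup exception escape to the caller.
    try:
        return ASSETS[asset_id]
    except KeyError:
        raise NotFound(f"No asset with id {asset_id!r}") from None


def handle_get(asset_id):
    # Plays the role of the framework's exception handler: NotFound maps to
    # a 404 status; any other exception would still propagate as an error.
    try:
        return 200, get_asset_or_404(asset_id)
    except NotFound as exc:
        return 404, {"detail": str(exc)}
```

Calling handle_get with an id that is not in ASSETS yields a 404 status and a short detail message rather than an unhandled exception, which is the behaviour the parametrised test in the commit is described as guarding.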
57 lines
1.6 KiB
YAML
# vim: ft=yaml.docker-compose

services:
  anthias-server:
    # Explicit image tag so anthias-celery below can reference the same
    # built image without a duplicate `build:` block (which would
    # produce a separate, byte-identical-but-distinct image tag).
    image: anthias-server:dev
    build:
      context: .
      dockerfile: docker/Dockerfile.server
    ports:
      - 8000:8080
    environment:
      - HOME=/data
      - LISTEN=0.0.0.0
      - CELERY_BROKER_URL=redis://redis:6379/0
      - CELERY_RESULT_BACKEND=redis://redis:6379/0
      - ENVIRONMENT=development
    depends_on:
      - redis
    restart: always
    volumes:
      - anthias-data:/data
      - ./:/usr/src/app/

  anthias-celery:
    # Reuses anthias-server:dev via the explicit image tag above.
    # Compose builds anthias-server first (it owns the build:) and
    # this service inherits the same image, only overriding CMD.
    image: anthias-server:dev
    depends_on:
      anthias-server:
        condition: service_started
      redis:
        condition: service_started
    command: >
      celery -A anthias_server.celery_tasks.celery worker -B -n worker@anthias
      --loglevel=info --scheduler celery.beat.Scheduler
    environment:
      - HOME=/data
      - CELERY_BROKER_URL=redis://redis:6379/0
      - CELERY_RESULT_BACKEND=redis://redis:6379/0
      - ENVIRONMENT=development
    restart: always
    volumes:
      - anthias-data:/data
      - ./:/usr/src/app/

  redis:
    platform: "linux/amd64"
    image: mirror.gcr.io/library/redis:alpine

volumes:
  anthias-data:
  redis-data: