Anthias/docker-compose.dev.yml
Viktor Petersson f547642fc4 fix: e2e-test findings (host-agent venv, celery beat, asset GET 404) (#2881)
* fix(install): persistent host-agent venv (anthias-host-agent.service 203/EXEC)

PR #2843 switched the installer venv to a mktemp tmpdir cleaned up
on EXIT, but anthias-host-agent.service's ExecStart still hardcodes
/home/${USER}/installer_venv/bin/python. Every fresh install since
that refactor leaves the unit in a status=203/EXEC restart loop with
no Python at the configured path, and /api/v2/info then blocks ~80s
on get_node_ip() waiting for the host_agent_ready key that will
never appear.

Split the two venvs:

* INSTALLER_VENV: still ephemeral mktemp, used by ansible-core during
  install/upgrade and torn down by the EXIT trap.
* HOST_AGENT_VENV: new persistent venv at /home/${USER}/installer_venv
  (path kept stable so devices installed before the refactor don't
  need a unit rewrite), recreated from the host dep group on every
  install + upgrade so deps track pyproject.toml.

provision_host_agent_venv runs after install_ansible() and before
run_ansible_playbook() so the venv exists before ansible's
state: started fires the unit. On upgrade the unit is already
loaded with the previous venv's in-memory interpreter, so the
state: started no-op never picks up the new deps — restart
explicitly when the unit is already active.
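
The failure mode and the fix can be sketched with the standard-library `venv` module. This is only an illustration: the installer itself is a shell script, and the helper names below (`interpreter_path`, `ephemeral_interpreter`, `persistent_interpreter`) are hypothetical. It shows why an interpreter path taken from a temp-dir venv dangles after cleanup, while a venv at a stable path keeps the path valid.

```python
import sys
import tempfile
import venv
from pathlib import Path


def interpreter_path(venv_dir: Path) -> Path:
    """Return where a venv keeps its Python executable (hypothetical helper)."""
    sub = "Scripts/python.exe" if sys.platform == "win32" else "bin/python"
    return venv_dir / sub


def ephemeral_interpreter() -> Path:
    """Build a venv in a temp dir that is deleted on exit.

    This mirrors the mktemp + EXIT-trap pattern: the returned path no longer
    exists once the temp dir is gone, which is the situation a unit hits when
    its ExecStart points at such a path.
    """
    with tempfile.TemporaryDirectory() as tmp:
        venv_dir = Path(tmp) / "installer_venv"
        venv.create(venv_dir, with_pip=False, symlinks=sys.platform != "win32")
        python = interpreter_path(venv_dir)
    return python  # dangling: the directory was removed on exit


def persistent_interpreter(venv_dir: Path) -> Path:
    """Recreate a venv at a stable path; clear=True wipes any previous one."""
    venv.create(
        venv_dir, with_pip=False, clear=True, symlinks=sys.platform != "win32"
    )
    return interpreter_path(venv_dir)


if __name__ == "__main__":
    print("ephemeral exists:", ephemeral_interpreter().exists())
    stable = Path(tempfile.mkdtemp()) / "installer_venv"
    print("persistent exists:", persistent_interpreter(stable).exists())
```

The sketch only covers the path-lifetime issue. Restarting an already-active unit so it picks up new dependencies is a separate step.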

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(celery): switch beat to in-memory scheduler (Python 3.13 dbm.sqlite3 locking)

celery -B with the default PersistentScheduler stores its schedule
via shelve. On Python 3.13, shelve defaults to dbm.sqlite3, which
raises dbm.sqlite3.error: locking protocol intermittently under
contention — observed on x86 but not pi4-64 in this build matrix,
which is consistent with a benign-looking race specific to the
amd64 docker layer's filesystem ordering. When Beat stalls,
reconcile_stuck_processing and the other periodic tasks set up by
setup_periodic_tasks stop firing, so stuck-in-is_processing assets
never get re-dispatched.

setup_periodic_tasks defines every periodic task statically (no
django-celery-beat / no dynamic schedule edits), so a non-persistent
scheduler is sufficient. Switch to celery.beat.Scheduler in all
three compose files (prod template + dev + test) and drop the
--schedule /tmp/celerybeat-schedule flag that's now unused. The
telemetry cooldown comment is updated to reference the new flag —
the actual 24h cooldown is still gated by the Redis TTL, which is
the persisted source of truth.
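
To check which dbm backend `shelve` picks on a given interpreter, a small standard-library probe like the one below may help. It is a diagnostic sketch, not part of Anthias or Celery: the function name `shelve_backend` is hypothetical, and the file name only imitates the old `--schedule` path.

```python
import dbm
import shelve
import sys
import tempfile
from pathlib import Path
from typing import Optional


def shelve_backend() -> Optional[str]:
    """Create a throwaway shelve file and report the dbm module behind it."""
    with tempfile.TemporaryDirectory() as tmp:
        path = str(Path(tmp) / "celerybeat-schedule")
        with shelve.open(path) as db:
            db["entries"] = {}  # write something so the file is initialised
        # whichdb inspects the file(s) on disk and returns the module name.
        return dbm.whichdb(path)


if __name__ == "__main__":
    print(sys.version_info[:2], shelve_backend())
```

On an interpreter where the sqlite3-backed dbm is available and preferred, the reported backend would be `dbm.sqlite3`. With the in-memory `celery.beat.Scheduler` no shelve file is created, so this code path is not exercised at all.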

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(api): return 404 (not 500) for unknown asset_id across v1/v1.1/v1.2/v2

AssetViewV{1,1_1,1_2,2}.get / put / patch / update and the shared
DeleteAssetViewMixin / AssetContentViewMixin / ViewerCurrentAssetViewV1
all called Asset.objects.get(asset_id=...) bare. The Asset.DoesNotExist
that fires for a deleted-or-typo'd id has no DRF exception handler
registered, so it bubbled up as a 500 with the database traceback:
the caller sees a server error for what is structurally a missing
resource. AssetRecheckViewV2 already gets this right via
filter(...).exists() + explicit 404; standardise the rest by routing
the lookup through django.shortcuts.get_object_or_404 (DRF's exception
handler converts the resulting Http404 to a clean 404 Response).

The new test_unknown_asset_id_returns_404 parametrises across every
API version so a future view that reverts to Asset.objects.get bare
trips immediately.
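
The difference between the two lookup styles can be shown with a framework-free sketch. Everything here is a stand-in: `DoesNotExist`, `Http404`, `ASSETS`, `get_asset`, `get_asset_or_404` and `dispatch` are hypothetical names that only imitate the shape of the Django/DRF behaviour described above, under the assumption that the framework maps `Http404` to a 404 response and any other exception to a 500.

```python
class DoesNotExist(LookupError):
    """Stand-in for Asset.DoesNotExist."""


class Http404(Exception):
    """Stand-in for django.http.Http404."""


# Toy in-memory "table" keyed by asset_id (illustrative data only).
ASSETS = {"abc123": {"asset_id": "abc123", "name": "demo"}}


def get_asset(asset_id):
    """Bare lookup: raises DoesNotExist for an unknown id."""
    try:
        return ASSETS[asset_id]
    except KeyError:
        raise DoesNotExist(asset_id) from None


def get_asset_or_404(asset_id):
    """Lookup in the style of get_object_or_404: unknown id becomes Http404."""
    try:
        return get_asset(asset_id)
    except DoesNotExist:
        raise Http404(f"No asset with id {asset_id!r}") from None


def dispatch(view, asset_id):
    """Stand-in exception handler: only Http404 is mapped, the rest is a 500."""
    try:
        return 200, view(asset_id)
    except Http404:
        return 404, {"detail": "Not found."}
    except Exception:
        return 500, {"detail": "Server error"}


if __name__ == "__main__":
    print(dispatch(get_asset, "missing")[0])         # bare lookup
    print(dispatch(get_asset_or_404, "missing")[0])  # 404-aware lookup
```

A parametrised test over every view, as the commit describes, would assert the second status for each API version.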

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(api): rename queryset → asset in ViewerCurrentAssetViewV1

get_object_or_404 returns a single Asset, not a queryset; the
variable name was already misleading under the previous bare
Asset.objects.get(...) call. Address Copilot review.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(install): silence uv cross-filesystem hardlink warning

INSTALLER_VENV lands in /tmp (the mktemp -t default), while uv's
cache lives at ~/.cache/uv on $HOME. On the typical Pi/Debian
install /tmp is tmpfs and $HOME is the SD card, so uv's default
hardlink mode fails for every wheel and falls back to a noisy
"Failed to hardlink files; falling back to full copy" line. Set
UV_LINK_MODE=copy on the install_ansible invocation so the
fallback becomes the documented choice. provision_host_agent_venv
is unaffected — both its venv and the uv cache live on $HOME, so
hardlinks work there.
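
The hardlink-then-copy fallback can be sketched in a few lines of standard-library Python. This is not uv's implementation, only an illustration of the mechanism: a hard link cannot span filesystems, so `os.link` raises `OSError` (EXDEV on Linux) when source and destination live on different mounts, and the caller falls back to a full copy. The helper name `link_or_copy` is hypothetical.

```python
import os
import shutil
import tempfile
from pathlib import Path


def link_or_copy(src: Path, dst: Path) -> str:
    """Try a hard link first; fall back to a full copy if linking fails.

    Returns which strategy was used, so the caller can log it once instead
    of warning for every file.
    """
    try:
        os.link(src, dst)
        return "hardlink"
    except OSError:
        # Typical cause: src and dst are on different filesystems,
        # e.g. a cache on $HOME and a target directory on tmpfs.
        shutil.copy2(src, dst)
        return "copy"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "wheel.bin"
        src.write_bytes(b"payload")
        dst = Path(tmp) / "site.bin"
        print(link_or_copy(src, dst), dst.read_bytes())
```

Setting UV_LINK_MODE=copy corresponds to skipping the first attempt entirely, which is why the warning disappears.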

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(compose): pass --remove-orphans on every up

Surfaced during e2e testing: after a compose recreate, anthias-server's
up -d emitted "Found orphan containers ([anthias-anthias-viewer-run-…])
… you can run this command with the --remove-orphans flag to clean it
up." These linger from earlier `docker compose run` invocations that
created run-NNN sidecar containers — without --remove-orphans they
just keep running and clutter `docker ps`. Apply to both the prod
upgrade path (upgrade_containers.sh) and the dev bring-up
(start_development_server.sh).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 21:17:26 +01:00

57 lines
1.6 KiB
YAML

# vim: ft=yaml.docker-compose
services:
  anthias-server:
    # Explicit image tag so anthias-celery below can reference the same
    # built image without a duplicate `build:` block (which would
    # produce a separate, byte-identical-but-distinct image tag).
    image: anthias-server:dev
    build:
      context: .
      dockerfile: docker/Dockerfile.server
    ports:
      - 8000:8080
    environment:
      - HOME=/data
      - LISTEN=0.0.0.0
      - CELERY_BROKER_URL=redis://redis:6379/0
      - CELERY_RESULT_BACKEND=redis://redis:6379/0
      - ENVIRONMENT=development
    depends_on:
      - redis
    restart: always
    volumes:
      - anthias-data:/data
      - ./:/usr/src/app/

  anthias-celery:
    # Reuses anthias-server:dev via the explicit image tag above.
    # Compose builds anthias-server first (it owns the build:) and
    # this service inherits the same image, only overriding CMD.
    image: anthias-server:dev
    depends_on:
      anthias-server:
        condition: service_started
      redis:
        condition: service_started
    command: >
      celery -A anthias_server.celery_tasks.celery worker -B -n worker@anthias
      --loglevel=info --scheduler celery.beat.Scheduler
    environment:
      - HOME=/data
      - CELERY_BROKER_URL=redis://redis:6379/0
      - CELERY_RESULT_BACKEND=redis://redis:6379/0
      - ENVIRONMENT=development
    restart: always
    volumes:
      - anthias-data:/data
      - ./:/usr/src/app/

  redis:
    platform: "linux/amd64"
    image: mirror.gcr.io/library/redis:alpine

volumes:
  anthias-data:
  redis-data: