Anthias/bin/start_development_server.sh
Viktor Petersson f547642fc4 fix: e2e-test findings (host-agent venv, celery beat, asset GET 404) (#2881)
* fix(install): persistent host-agent venv (anthias-host-agent.service 203/EXEC)

PR #2843 switched the installer venv to a mktemp tmpdir cleaned up
on EXIT, but anthias-host-agent.service's ExecStart still hardcodes
/home/${USER}/installer_venv/bin/python. Every fresh install since
that refactor leaves the unit in a status=203/EXEC restart loop with
no Python at the configured path, and /api/v2/info then blocks ~80s
on get_node_ip() waiting for the host_agent_ready key that will
never appear.

Split the two venvs:

* INSTALLER_VENV: still ephemeral mktemp, used by ansible-core during
  install/upgrade and torn down by the EXIT trap.
* HOST_AGENT_VENV: new persistent venv at /home/${USER}/installer_venv
  (path kept stable so devices installed before the refactor don't
  need a unit rewrite), recreated from the host dep group on every
  install + upgrade so deps track pyproject.toml.

provision_host_agent_venv runs after install_ansible() and before
run_ansible_playbook() so the venv exists before ansible's
state: started fires the unit. On upgrade the unit is already
loaded with the previous venv's in-memory interpreter, so the
state: started no-op never picks up the new deps — restart
explicitly when the unit is already active.
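
The restart rule above can be sketched roughly as follows. This is a minimal
illustration, not the installer's actual code: the function name
restart_if_active and the SYSTEMCTL override are hypothetical, introduced
here only so the logic can be exercised without systemd. Only the unit name
comes from the text.

```shell
# Hypothetical override so the logic can be tried without a real systemd.
SYSTEMCTL="${SYSTEMCTL:-systemctl}"

# Restart the unit only when it is already active. On a fresh install the
# unit is not running yet, so ansible's "state: started" launches it against
# the freshly provisioned venv and no explicit restart is needed.
restart_if_active() {
    unit="$1"
    if "$SYSTEMCTL" is-active --quiet "$unit"; then
        "$SYSTEMCTL" restart "$unit"
        echo "restarted"
    else
        echo "skipped"
    fi
}

# Assumed call site, once the persistent venv has been recreated:
#   restart_if_active anthias-host-agent.service
```

The point of the check is that a no-op "state: started" leaves the old
interpreter in memory on upgrade, so only an already-active unit needs the
explicit restart.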

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(celery): switch beat to in-memory scheduler (Python 3.13 dbm.sqlite3 locking)

celery -B with the default PersistentScheduler stores its schedule
via shelve. On Python 3.13, shelve defaults to dbm.sqlite3, which
raises dbm.sqlite3.error: locking protocol intermittently under
contention — observed on x86 but not pi4-64 in this build matrix,
which is consistent with a benign-looking race specific to the
amd64 docker layer's filesystem ordering. When Beat stalls,
reconcile_stuck_processing and the other periodic tasks set up by
setup_periodic_tasks stop firing, so stuck-in-is_processing assets
never get re-dispatched.

setup_periodic_tasks defines every periodic task statically (no
django-celery-beat / no dynamic schedule edits), so a non-persistent
scheduler is sufficient. Switch to celery.beat.Scheduler in all
three compose files (prod template + dev + test) and drop the
--schedule /tmp/celerybeat-schedule flag that's now unused. The
telemetry cooldown comment is updated to reference the new flag —
the actual 24h cooldown is still gated by the Redis TTL, which is
the persisted source of truth.
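
As a rough illustration of the flag change, the embedded-beat options before
and after might look like the sketch below. The helper beat_flags and the app
name anthias_app are placeholders introduced here, not taken from the compose
files; --schedule and --scheduler are Celery's embedded-beat options.

```shell
# Build the embedded-beat portion of a `celery worker -B` command line.
# "persistent" mirrors the old setup; anything else the in-memory one.
beat_flags() {
    if [ "$1" = "persistent" ]; then
        # Old: default PersistentScheduler, shelve file on disk.
        echo "-B --schedule /tmp/celerybeat-schedule"
    else
        # New: base Scheduler, schedule held in memory only.
        echo "-B --scheduler celery.beat.Scheduler"
    fi
}

# Placeholder app name; substitute the project's real Celery app.
echo "celery -A anthias_app worker $(beat_flags in-memory)"
```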

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(api): return 404 (not 500) for unknown asset_id across v1/v1.1/v1.2/v2

AssetViewV{1,1_1,1_2,2}.get / put / patch / update and the shared
DeleteAssetViewMixin / AssetContentViewMixin / ViewerCurrentAssetViewV1
all called Asset.objects.get(asset_id=...) bare. The Asset.DoesNotExist
that fires for a deleted-or-typo'd id has no DRF exception handler
registered, so it bubbled up as a 500 with the database traceback:
the caller sees a server error for what is structurally a missing
resource. AssetRecheckViewV2 already gets this right via
filter(...).exists() + explicit 404; standardise the rest by routing
the lookup through django.shortcuts.get_object_or_404 (DRF's exception
handler converts the resulting Http404 to a clean 404 Response).

The new test_unknown_asset_id_returns_404 parametrises across every
API version so a future view that reverts to Asset.objects.get bare
trips immediately.
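
A manual spot check of the same behaviour could look like the sketch below.
It is an assumption-laden illustration, not the project's test: BASE_URL, the
/api/<version>/assets/<id> URL layout, and both function names are
hypothetical and should be adjusted to the real deployment.

```shell
# Assumed base URL of a running instance; override via the environment.
BASE_URL="${BASE_URL:-http://localhost:8000}"

# Print only the HTTP status code for a GET on an asset id.
asset_status() {
    curl -s -o /dev/null -w '%{http_code}' "$BASE_URL/api/$1/assets/$2"
}

# Succeeds only if every listed API version answers 404 for the id.
check_unknown_asset() {
    for version in v1 v1.1 v1.2 v2; do
        status="$(asset_status "$version" "$1")"
        if [ "$status" != "404" ]; then
            echo "FAIL $version returned $status"
            return 1
        fi
    done
    echo "ok"
}

# Usage:
#   check_unknown_asset this-id-does-not-exist
```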

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(api): rename queryset → asset in ViewerCurrentAssetViewV1

get_object_or_404 returns a single Asset, not a queryset; the
variable name was already misleading under the previous bare
Asset.objects.get(...) call. Address Copilot review.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(install): silence uv cross-filesystem hardlink warning

INSTALLER_VENV lands in /tmp (the mktemp -t default), while uv's
cache lives at ~/.cache/uv on $HOME. On the typical Pi/Debian
install /tmp is tmpfs and $HOME is the SD card, so uv's default
hardlink mode fails for every wheel and logs a noisy
"Failed to hardlink files; falling back to full copy" line. Set
UV_LINK_MODE=copy on the install_ansible invocation so the
fallback becomes the documented choice. provision_host_agent_venv
is unaffected — both its venv and the uv cache live on $HOME, so
hardlinks work there.
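
Why the two venvs behave differently can be illustrated with a small check.
The commit sets UV_LINK_MODE=copy unconditionally for install_ansible; the
detection below is only a sketch of the underlying rule (hardlinks cannot
cross filesystems). The function names are hypothetical, and stat -c %d
assumes GNU coreutils as found on Debian.

```shell
# True when both paths live on the same filesystem (same device id).
same_filesystem() {
    [ "$(stat -c %d "$1")" = "$(stat -c %d "$2")" ]
}

# Pick a uv link mode: hardlinks only work within one filesystem.
pick_link_mode() {
    if same_filesystem "$1" "$2"; then
        echo "hardlink"
    else
        echo "copy"
    fi
}

# Assumed usage; the uv arguments themselves are elided here:
#   UV_LINK_MODE="$(pick_link_mode "$venv_dir" "$HOME/.cache/uv")" uv ...
```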

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(compose): pass --remove-orphans on every up

Surfaced during e2e testing: after a compose recreate, anthias-server's
up -d emitted "Found orphan containers ([anthias-anthias-viewer-run-…])
… you can run this command with the --remove-orphans flag to clean it
up." These linger from earlier `docker compose run` invocations that
created run-NNN sidecar containers — without --remove-orphans they
just keep running and clutter `docker ps`. Apply to both the prod
upgrade path (upgrade_containers.sh) and the dev bring-up
(start_development_server.sh).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 21:17:26 +01:00

13 lines
247 B
Bash
Executable File

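#!/bin/bash

set -euo pipefail

COMPOSE_ARGS=(
    '-f' 'docker-compose.dev.yml'
)

# Regenerate the dev-mode Dockerfiles before building.
bin/generate_dev_mode_dockerfiles.sh

# Tear down the previous stack, then rebuild and start it detached.
# --remove-orphans also cleans up leftover `docker compose run` containers.
docker compose "${COMPOSE_ARGS[@]}" down --remove-orphans
docker compose "${COMPOSE_ARGS[@]}" up -d --build --remove-orphans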