Anthias/docker-compose.test.yml
Viktor Petersson 43b937563b fix(sentry): stop reporting transient redis blips and client disconnects (#3018)
* fix(sentry): stop reporting transient redis blips and client disconnects

- Redis restarting (container recycle, compose startup before DNS
  resolves) produced an error event per process per blip even though
  every consumer self-heals: celery reconnects with backoff, the
  viewer's resolution reporter retries next tick, Channels
  re-establishes on the next frame (Sentry ANTHIAS-M, ANTHIAS-K,
  ANTHIAS-H, ANTHIAS-J)
- Add a before_send hook that drops events whose exception chain
  contains redis.exceptions.ConnectionError or asyncio.CancelledError
  (an HTTP client hanging up mid-request under ASGI — ANTHIAS-N)
- Silence celery's per-reconnect-attempt ERROR log at the logger
  (it arrives as a log message, not an exception)
- Downgrade the viewer reporter's redis-down log to a warning and
  extract the tick body into a testable helper
- Add regression tests for the filter and the reporter tick
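
A minimal sketch of the kind of before_send filter the bullets above describe, not the project's actual code. It also honors __suppress_context__, which a later commit in this squash adds to the chain walk. The names FakeRedisConnectionError, TRANSIENT, and chain_contains are hypothetical; the stand-in class takes the place of redis.exceptions.ConnectionError so the sketch runs without redis installed.

```python
import asyncio
from typing import Any, Dict, Optional, Tuple, Type


class FakeRedisConnectionError(Exception):
    """Hypothetical stand-in for redis.exceptions.ConnectionError."""


# Assumption: exception types treated as expected, self-healing noise.
TRANSIENT: Tuple[Type[BaseException], ...] = (
    FakeRedisConnectionError,
    asyncio.CancelledError,
)


def chain_contains(
    exc: Optional[BaseException],
    types: Tuple[Type[BaseException], ...],
) -> bool:
    """Walk __cause__ / __context__ and report whether any link matches."""
    seen = set()
    while exc is not None and id(exc) not in seen:
        if isinstance(exc, types):
            return True
        seen.add(id(exc))  # guard against cyclic chains
        if exc.__cause__ is not None:
            exc = exc.__cause__  # explicit "raise ... from err"
        elif not exc.__suppress_context__:
            exc = exc.__context__  # implicit chaining
        else:
            exc = None  # "raise ... from None" hides the context
    return False


def before_send(
    event: Dict[str, Any], hint: Dict[str, Any]
) -> Optional[Dict[str, Any]]:
    """Return None to drop the event, or the event unchanged to send it."""
    exc_info = hint.get("exc_info")
    if exc_info and chain_contains(exc_info[1], TRANSIENT):
        return None
    return event
```

In a real setup the function would be passed as sentry_sdk.init(before_send=before_send). As the commit notes, celery's reconnect messages arrive as log records rather than exceptions, so a filter of this shape would not see them and they need separate handling at the logger.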

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(sentry): address review — typed before_send, cleaner test fixtures

- Annotate the hook with sentry_sdk.types Event/Hint for strict mypy
- Build exc_info triples directly in tests instead of catching
  BaseException (Sonar S5754) and compare events by equality
  (Sonar S5796)
- Use record.getMessage() in the caplog assertion (Copilot)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(tests): address review — make the ignored-logger test order-independent

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(tests): address review — lift the module-wide logging disable for caplog tests

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(sentry): silence celery beat's reconnect-retry log too

- The embedded beat scheduler logs every broker reconnect attempt at
  ERROR ("beat: Connection error ... Trying again"), the same
  expected-transient noise as the consumer logger (Sentry ANTHIAS-P)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(sentry): address review — respect __suppress_context__ in the chain walk

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(compose): healthcheck redis and gate services on it answering PING

- depends_on with bare service_started only orders container
  creation; uvicorn/celery/viewer could still race a redis that
  hadn't finished loading its RDB, producing the startup
  connection-refused noise (review feedback on this PR)
- Add a redis-cli ping healthcheck to the prod template, dev, and
  test composes, and gate anthias-server / anthias-viewer /
  anthias-celery on service_healthy
- compose-only: the balena supervisor doesn't support depends_on
  conditions, and a redis container recycling mid-life is gated by
  nothing — so the Sentry-side handling of transient redis errors
  stays
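
For illustration only, the difference between "container started" and "redis answers PING" can be sketched as a small probe. This is an assumption-laden sketch, not what redis-cli does internally: redis_answers_ping is a hypothetical helper that sends an inline PING over a raw socket and checks for the +PONG reply, using the host and port from the compose file as defaults.

```python
import socket


def redis_answers_ping(
    host: str = "redis", port: int = 6379, timeout: float = 3.0
) -> bool:
    """Return True only if the server replies +PONG to an inline PING."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"PING\r\n")
            # Any other reply (an error line, or nothing at all) counts
            # as "not ready" in this sketch.
            return sock.recv(64).startswith(b"+PONG")
    except OSError:
        # Covers connection refused, DNS failure, and timeouts.
        return False
```

A container can be running while this probe still returns False, which is the gap service_healthy closes at compose startup. It does nothing for a redis that recycles mid-life, which is why the Sentry-side filtering stays.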

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-07 15:51:54 +02:00

74 lines
2.7 KiB
YAML

# vim: ft=yaml.docker-compose
services:
  anthias-test:
    # Both anthias-test and anthias-celery declare the same image tag
    # AND the same build block. Compose deduplicates the build by
    # image tag, so the image is produced once; ghcr layer cache
    # (configured by ``tools.image_builder`` via the Dockerfile's
    # ``# syntax`` line + ``--mount=type=cache`` and the buildx
    # invocation in ci) is hit by both service builds. Sharing the
    # ``build:`` block also keeps either service from triggering a
    # registry pull for the local-only ``anthias-test:dev`` tag.
    image: anthias-test:dev
    build: &test_build
      context: .
      dockerfile: docker/Dockerfile.test
    # Shared with anthias-celery via the anchor: the worker MUST see
    # the same ENVIRONMENT/ANTHIAS_TEST_DB_PATH as the server. Without
    # ENVIRONMENT=test, settings.py takes the production branch on the
    # worker — it reads the (empty) /data/.anthias/anthias.db instead
    # of the test DB ("no such table: assets" on every task) and, with
    # no DSN override, ships those errors to the production Sentry
    # project tagged environment=production.
    environment: &test_env
      - HOME=/data
      - CELERY_BROKER_URL=redis://redis:6379/0
      - CELERY_RESULT_BACKEND=redis://redis:6379/0
      - ENVIRONMENT=test
      # Pin the test DB to the writable volume mount on /data so CI
      # keeps its historic location. Local runs (no override) fall back
      # to a repo-local path inside BASE_DIR — see
      # src/anthias_server/django_project/settings.py.
      - ANTHIAS_TEST_DB_PATH=/data/.anthias/test.db
    stdin_open: true
    tty: true
    volumes:
      - .:/usr/src/app
      - anthias-data:/data
  anthias-celery:
    # Same image as anthias-test (test image is a superset of server:
    # same base apt + venv + bun + chromium for playwright); only the CMD
    # differs. Sharing the ``build:`` anchor lets compose route this
    # service through the ghcr layer cache too instead of attempting
    # a Docker Hub pull on the local-only ``anthias-test:dev`` tag.
    image: anthias-test:dev
    build: *test_build
    command: >
      celery -A anthias_server.celery_tasks.celery worker -B -n worker@anthias
      --loglevel=info --scheduler celery.beat.Scheduler
    depends_on:
      anthias-test:
        condition: service_started
      redis:
        condition: service_healthy
    environment: *test_env
    restart: always
    volumes:
      - .:/usr/src/app
      - anthias-data:/data
  redis:
    image: mirror.gcr.io/library/redis:alpine
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 5s
      timeout: 3s
      retries: 12
      start_period: 10s
volumes:
  anthias-data:
  redis-data: