refactor(docker): drop celery image, restore base apt layer dedup (#2776)

* refactor(docker): drop celery image, restore base apt layer dedup

- Delete Dockerfile.celery.j2; compose now runs celery on the
  anthias-server image with a `command:` override.
- Make viewer extend Dockerfile.base.j2 (mirroring test); drop 17
  packages duplicated between viewer and base_apt_dependencies, plus
  4 within-list duplicates.
- Move `# syntax=docker/dockerfile:1.4` to line 1 of every rendered
  Dockerfile. It previously lived in uv-builder.j2 line 1 and got
  bumped mid-file for server by the bun-builder prelude, silently
  disabling the 1.4 frontend and breaking cache-key parity with
  viewer — the actual blocker for layer dedup.
- Collapse CI matrix from (board × service) to (board) so all
  services for a board build on the same runner with the same
  buildkit cache, producing byte-identical apt layer digests at the
  registry.
- Add ENV DJANGO_SETTINGS_MODULE to the server image so the merged
  image runs both server and celery CMDs.
- Update all five compose templates (prod, balena prod, balena dev,
  dev, test) to point anthias-celery at the server image with a
  command: override. dev compose pins an explicit `image:` tag so
  both services share the locally-built SHA.
- Remove old anthias-celery / srly-ose-celery containers in
  upgrade_containers.sh so the recreated container can take the name.
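
The viewer/base dedup in the list above amounts to a set difference
plus a duplicate check. A minimal sketch, assuming both inputs are
plain lists of package names; `split_viewer_deps` is a hypothetical
helper for illustration and is not part of tools.image_builder:

```python
from collections import Counter


def split_viewer_deps(
    base: list[str], viewer: list[str]
) -> tuple[list[str], list[str], list[str]]:
    """Return (viewer_only, shared_with_base, repeated_within_viewer).

    Hypothetical helper: the names and return shape are assumptions.
    """
    # Strip stray whitespace so an entry like 'libcec-dev ' still
    # matches 'libcec-dev' in the base list.
    base_set = {name.strip() for name in base}
    cleaned = [name.strip() for name in viewer]

    # Packages listed more than once inside the viewer list itself.
    counts = Counter(cleaned)
    repeated = sorted(name for name, n in counts.items() if n > 1)

    seen: set[str] = set()
    viewer_only: list[str] = []
    shared: list[str] = []
    for name in cleaned:
        if name in seen:
            continue  # keep the first occurrence only
        seen.add(name)
        (shared if name in base_set else viewer_only).append(name)
    return viewer_only, shared, repeated
```

Only the first element would feed the viewer's own apt layer; the
other two are what a review would want to see shrink to empty.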

Verified end-to-end on x86: server and viewer apt layers share a
single digest; SHARED SIZE jumps from 132 MB to 1.216 GB; merged
image runs both workloads in compose (celery task round-trips
through Redis to SUCCESS).
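
The `# syntax=` regression described in the list above is easy to
guard against mechanically. A minimal sketch, assuming the rendered
Dockerfile is available as a string; `syntax_directive_is_active` is
a hypothetical name, and the check is deliberately limited to line 1,
matching the rule this commit adopts:

```python
def syntax_directive_is_active(dockerfile_text: str) -> bool:
    """True if line 1 of a rendered Dockerfile is a `# syntax=` directive.

    Hypothetical check, not part of the repo. Parser directives are
    only honoured before any other comment, blank line, or
    instruction, so anything that pushes the directive down the file
    turns it into an ordinary comment.
    """
    first_line = dockerfile_text.split('\n', 1)[0]
    # Directives are case-insensitive and tolerate inner whitespace,
    # so normalise both before comparing.
    normalised = first_line.strip().lower().replace(' ', '')
    return normalised.startswith('#syntax=')
```

Running this over every file emitted by `--dockerfiles-only` would
flag a template whose include order pushes the directive mid-file.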

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* perf(docker): cache buildkit layers in GHCR registry across CI runs

Add a --cache-backend / $BUILDX_CACHE_BACKEND option to
tools.image_builder with two modes:

- `local` (default): writes to /tmp/.buildx-cache/<board>/.
  Unchanged from before; right for local dev.
- `registry`: pushes BuildKit cache to
  ghcr.io/screenly/anthias-<service>:buildcache-<board>. Reuses the
  GHCR login already done by docker-build.yaml, no extra tokens or
  third-party actions needed.

Wire CI to use registry mode on push events (master) so subsequent
runs of the same board pull cached layers — the ~825 MB extracted
apt install per service goes from ~3 min cold to a few seconds
warm. workflow_dispatch on a non-master branch falls back to local
mode (effectively no-cache) so manual runs can't pollute the master
cache.
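
The push / workflow_dispatch split above lives in a GitHub Actions
expression in docker-build.yaml. Restated as Python purely for
illustration (`ci_build_settings` is a hypothetical name; the real
logic stays in the workflow file):

```python
def ci_build_settings(event_name: str) -> tuple[str, list[str]]:
    """Map a workflow event to (cache backend, extra builder flags).

    Illustrative restatement of the workflow expressions, not code
    that exists in the repo.
    """
    if event_name == 'push':
        # Authenticated to GHCR: read and write the registry cache,
        # push the immutable tags, leave latest-* to publish-latest.
        return 'registry', ['--push', '--skip-latest-tag']
    # workflow_dispatch: no GHCR login, so use the per-runner local
    # cache, which is effectively cold on a fresh runner.
    return 'local', []
```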

Drop the old actions/cache@v5 step that mirrored
/tmp/.buildx-cache/<board> through actions/cache. Registry cache is
stored per step rather than as one big tarball, and it lives in GHCR,
so it is not subject to the GitHub Actions cache's 10 GB-per-repo
eviction.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(image-builder): move local cache out of /tmp to user XDG cache dir

SonarCloud python:S5443 flagged the previous /tmp/.buildx-cache/
default as a security hotspot — `/tmp` is world-writable, so on a
multi-user host another account could in principle tamper with the
buildkit cache. Switch to $XDG_CACHE_HOME/anthias-buildx/<board>/
(default ~/.cache/anthias-buildx/), which is per-user by default
and follows the XDG Base Directory convention.
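
The path resolution described above can be sketched as a small pure
function. This mirrors the logic in tools/image_builder/__main__.py,
but the function name and the injectable `env` parameter are
assumptions added here to make it testable:

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def local_cache_dir(
    board: str,
    target_platform: str,
    env: Optional[Mapping[str, str]] = None,
) -> Path:
    """Resolve the per-user, per-board BuildKit cache directory."""
    env = os.environ if env is None else env
    # 64-bit Pi 4 builds get their own scope so they don't share
    # cache state with the 32-bit pi4 build.
    scope = (
        f'{board}-64'
        if board == 'pi4' and target_platform == 'linux/arm64/v8'
        else board
    )
    # An unset or empty XDG_CACHE_HOME falls back to ~/.cache.
    xdg = env.get('XDG_CACHE_HOME')
    root = Path(xdg) if xdg else Path.home() / '.cache'
    return root / 'anthias-buildx' / scope
```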

CI is unaffected: docker-build.yaml uses --cache-backend=registry
on push events, which pushes cache to GHCR and never touches the
local path. Local dev users with stale state in
/tmp/.buildx-cache/<board>/ can rm it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(docker): correct cache-backend comments to match real behavior

Two doc fixes per Copilot review on #2776:

- tools/image_builder/__main__.py: the cache-backend rationale
  block still referenced /tmp/.buildx-cache/<board>; update to
  $XDG_CACHE_HOME/anthias-buildx/<board> so it matches the
  implementation moved in 529a50e0.
- .github/workflows/docker-build.yaml: the env comment claimed
  pull-request builds read from the registry cache, but this
  workflow has no pull_request trigger — non-push runs are
  workflow_dispatch, which both falls through to local cache and
  skips `docker login ghcr.io`, so it has no GHCR auth at all.
  Rewrite the comment around the push / workflow_dispatch split
  the code actually implements.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(docker): address Copilot review on registry cache + test compose

- tools/image_builder/__main__.py: comment in the registry-cache
  branch said the cache namespace was "picked from the build's tag
  list", but the implementation hardcodes
  ghcr.io/screenly/anthias-{service}. Rewrite the comment to
  describe what the code actually does and call out the hardcode
  so a future namespaces refactor doesn't silently break cache.
- docker-compose.test.yml: anthias-celery had its own `build:`
  block pointing at Dockerfile.test, claiming "reuses the test
  image", but compose builds a separate image for every service
  that has a `build:` block, even when the context is identical,
  which defeats the dedup intent. Mirror
  the docker-compose.dev.yml pattern: pin anthias-test to an
  explicit `image: anthias-test:dev` tag and have anthias-celery
  reference the same tag with no `build:`. Also bind-mount the
  source into celery so it picks up code changes (matches
  anthias-test's existing volume).
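
The duplicate-`build:` problem described above could be caught by a
lint over the parsed compose file. A rough sketch, assuming the
`services:` mapping has already been loaded into a plain dict (for
example by a YAML parser); `redundant_builds` is a hypothetical
helper and not something the repo ships:

```python
def redundant_builds(services: dict[str, dict]) -> list[str]:
    """Names of services whose build: repeats one declared earlier.

    Two services count as repeating each other when they use the
    same (context, dockerfile) pair. The first one seen is treated
    as the owner; later ones should reference its `image:` tag.
    """
    owners: dict[tuple[str, str], str] = {}
    redundant: list[str] = []
    for name, spec in services.items():
        build = spec.get('build')
        if build is None:
            continue  # image-only service, nothing to check
        if isinstance(build, str):
            # Short form: the string is the build context.
            key = (build, 'Dockerfile')
        else:
            key = (
                build.get('context', '.'),
                build.get('dockerfile', 'Dockerfile'),
            )
        if key in owners:
            redundant.append(name)
        else:
            owners[key] = name
    return redundant
```

With the layout adopted here (anthias-test owns the `build:`,
anthias-celery only names `image: anthias-test:dev`), the returned
list would be empty.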

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(image-builder): read-only registry cache without --push

Per Copilot review: --cache-backend=registry previously tried to
push cache to ghcr.io/... regardless of --push, so a local invocation
without GHCR auth would fail mid-build with a confusing registry
error. Split the behavior:

- Reads (cache_from) are always set when registry mode is active —
  the anthias-* GHCR packages are public, so warm-starting off CI's
  cache without auth works and helps local dev.
- Writes (cache_to) only happen when --push is also set, since
  that's when the workflow has authenticated to GHCR. Without
  --push, log a yellow warning and skip cache_to.
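
The read/write split above reduces to a small decision table. A
condensed sketch of that table as a pure function, following the
branch structure in tools/image_builder/__main__.py; the function
name, signature, and the omission of the warning log are
simplifications made here:

```python
from typing import Optional

CacheSpec = Optional[dict[str, str]]


def resolve_cache(
    backend: str,
    push: bool,
    clean_build: bool,
    service: str,
    scope: str,
    cache_dir: str,
) -> tuple[CacheSpec, CacheSpec]:
    """Return (cache_from, cache_to) for a buildx invocation."""
    if clean_build:
        # True cold rebuild: neither read nor write any cache.
        return None, None
    if backend == 'registry':
        ref = f'ghcr.io/screenly/anthias-{service}:buildcache-{scope}'
        # Reads need no auth because the packages are public.
        cache_from = {'type': 'registry', 'ref': ref}
        # Writes only when --push implies a prior GHCR login.
        cache_to = (
            {
                'type': 'registry',
                'ref': ref,
                'mode': 'max',
                'image-manifest': 'true',
            }
            if push
            else None
        )
        return cache_from, cache_to
    # Local backend: read and write the same on-disk directory.
    return (
        {'type': 'local', 'src': cache_dir},
        {'type': 'local', 'dest': cache_dir, 'mode': 'max'},
    )
```

Keeping the decision separate from the `docker.buildx.build` call
makes each of the four cases (clean, registry with push, registry
read-only, local) checkable without a Docker daemon.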

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(docker): set DJANGO_SETTINGS_MODULE in test image for celery worker

Per Copilot review on #2776 (suppressed-due-to-low-confidence note,
but the bug is real): docker-compose.test.yml runs the celery
worker from anthias-test:dev. celery_tasks.py calls django.setup()
at module import time, which needs DJANGO_SETTINGS_MODULE in the
environment. The pre-refactor Dockerfile.celery.j2 set it
explicitly; this PR moved that ENV to Dockerfile.server.j2 only,
so the production celery (running on the server image) is fine but
the test celery would have crashed with ImproperlyConfigured.

Set the same ENV in Dockerfile.test.j2. Server and test images
now both ship a usable Django environment for any process that imports
anthias_django.
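
A regression like this one (an ENV line dropped from one rendered
Dockerfile but not another) could be caught with a text-level check
on the generated files. A minimal sketch, assuming the rendered
Dockerfile is available as a string; `sets_env` is a hypothetical
helper, and it only inspects the first variable of each ENV line:

```python
def sets_env(dockerfile_text: str, name: str) -> bool:
    """True if some ENV instruction sets `name` as its first variable.

    Hypothetical check: handles `ENV KEY=value` and the legacy
    `ENV KEY value` form, but not later pairs in a multi-variable
    `ENV A=1 B=2` line.
    """
    for line in dockerfile_text.splitlines():
        parts = line.strip().split(None, 1)
        if len(parts) != 2 or parts[0].upper() != 'ENV':
            continue
        rest = parts[1]
        if rest.startswith(name + '=') or rest.split(None, 1)[0] == name:
            return True
    return False
```

Asserting `sets_env(text, 'DJANGO_SETTINGS_MODULE')` for both
docker/Dockerfile.server and docker/Dockerfile.test would cover the
two images that run the Celery worker after this change.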

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Viktor Petersson
2026-04-29 15:21:43 +01:00
committed by GitHub
parent c76f5afa20
commit 5e00c8ba25
20 changed files with 241 additions and 145 deletions

View File

@@ -40,24 +40,35 @@ jobs:
# Scoped per-job (not at workflow level) so `run-tests` and any
# future read-only job don't inherit `packages: write`. `buildx`
# needs it so `docker login ghcr.io` with GITHUB_TOKEN can push
# ghcr.io/screenly/anthias-*. `contents: read` is the implicit
# default but pinned explicitly so a future workflow edit can't
# silently lose checkout access.
# both ghcr.io/screenly/anthias-* image tags and the
# `buildcache-*` registry cache tags written by --cache-backend=
# registry. `contents: read` is the implicit default but pinned
# explicitly so a future workflow edit can't silently lose
# checkout access.
permissions:
contents: read
packages: write
strategy:
# Don't cancel sibling jobs on the first failure: any platform that
# has already finished building its image will have pushed the
# immutable <short-hash>-<board> tag, which is harmless on its own.
# Only the publish-latest job below — gated on the entire matrix
# succeeding — moves the floating latest-* tag, so a partial failure
# leaves users on the previous coherent latest-* set instead of a
# half-pushed mix of old + new images.
# has already finished building its images will have pushed the
# immutable <short-hash>-<board> tags, which are harmless on their
# own. Only the publish-latest job below — gated on the entire
# matrix succeeding — moves the floating latest-* tag, so a partial
# failure leaves users on the previous coherent latest-* set
# instead of a half-pushed mix of old + new images.
#
# The matrix is intentionally only on `board`, not `(board, service)`:
# buildkit's per-runner cache hashes apt-get-update output (timestamps
# in /var/lib/apt/lists/*, mirror selection) into the layer digest,
# so the same package list installed on two different runners
# produces two different layer hashes. Building all services for
# one board on a single runner means the base apt layer is hashed
# once and shared across server / viewer / test / etc. — which is
# what makes Dockerfile.base.j2's include-shared layer actually
# dedup at the registry level. See refactor: drop celery image.
fail-fast: false
matrix:
board: ['pi1', 'pi2', 'pi3', 'pi4', 'pi4-64', 'pi5', 'x86']
service: ['server', 'celery', 'redis', 'viewer']
python-version: ["3.11"]
runs-on: ubuntu-24.04
@@ -95,19 +106,6 @@ jobs:
docker buildx create --use --name multiarch-builder
docker buildx inspect --bootstrap
- name: Cache Docker layers
uses: actions/cache@27d5ce7f107fe9357f9df03efb73ab90386fccae # v5
id: cache
with:
path: /tmp/.buildx-cache/${{ matrix.board }}-${{ matrix.service }}
key: buildx-${{ matrix.board }}-${{ matrix.service }}-${{ hashFiles('docker/**/*') }}
restore-keys: |
buildx-${{ matrix.board }}-${{ matrix.service }}-
- name: Inspect cache before build
run: |
ls -la /tmp/.buildx-cache/${{ matrix.board }}-${{ matrix.service }} || true
- name: Login to Docker Hub
if: success() && github.event_name == 'push'
uses: docker/login-action@4907a6ddec9925e35a0a9e82d7399ccc52663121 # v4
@@ -123,20 +121,32 @@ jobs:
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
- name: Build Container
- name: Build Containers
env:
DOCKER_BUILDKIT: 1
BUILDKIT_PROGRESS: plain
# On push events (master): use BuildKit's registry cache
# backend — pushes cache to ghcr.io/screenly/anthias-
# <service>:buildcache-<board> so subsequent push runs of
# the same board can pull cached layers without
# re-installing ~825 MB of apt packages from scratch.
#
# On workflow_dispatch (manual runs on any branch): fall
# through to `local` mode — a per-runner ephemeral
# directory, so effectively no-cache. workflow_dispatch
# also skips the `docker login ghcr.io` step above (it's
# gated on `event_name == 'push'`), so registry cache
# would have no auth to write with anyway. This workflow
# has no `pull_request` trigger; PRs never run this job.
BUILDX_CACHE_BACKEND: ${{ github.event_name == 'push' && 'registry' || 'local' }}
run: |
uv run python -m tools.image_builder \
--build-target=${{ matrix.board }} \
--service=${{ matrix.service }} \
--service=server \
--service=viewer \
--service=redis \
${{ github.event_name == 'push' && '--push --skip-latest-tag' || '' }}
- name: Inspect cache after build
run: |
ls -la /tmp/.buildx-cache/${{ matrix.board }}-${{ matrix.service }} || true
# Mirror the immutable <short-hash>-<board> tags pushed by the buildx
# matrix onto the floating latest-<board> tag. Runs only after every
# buildx job has succeeded, so latest-* is never advanced from a
@@ -199,7 +209,7 @@ jobs:
set -euo pipefail
GIT_SHORT_HASH=$(git rev-parse --short=7 HEAD)
BOARDS=(pi1 pi2 pi3 pi4 pi4-64 pi5 x86)
SERVICES=(server celery redis viewer)
SERVICES=(server redis viewer)
# GHCR first so the canonical primary is current even if the
# Docker Hub mirror later in the loop flakes.
NAMESPACES=(ghcr.io/screenly/anthias screenly/anthias)

View File

@@ -43,7 +43,6 @@ jobs:
uv run python -m tools.image_builder \
--dockerfiles-only \
--disable-cache-mounts \
--service celery \
--service redis \
--service test

1
.gitignore vendored
View File

@@ -44,7 +44,6 @@ docker/Dockerfile.base
docker/Dockerfile.nginx
docker/Dockerfile.server
docker/Dockerfile.websocket
docker/Dockerfile.celery
docker/Dockerfile.redis
docker/Dockerfile.viewer
docker/Dockerfile.test

View File

@@ -11,7 +11,7 @@ Anthias is an open-source digital signage platform for Raspberry Pi and x86 PCs
Anthias runs as a set of Docker containers:
- **anthias-server** (port 80 in prod, 8000 in dev) — uvicorn (ASGI) serving the Django web app, REST API, the React frontend's static assets (via WhiteNoise), uploaded media at `/anthias_assets/`, and the WebSocket endpoint at `/ws` (Django Channels with a Redis-backed channel layer). Always plain HTTP — TLS is opt-in and handled by the **anthias-caddy** sidecar that `bin/enable_ssl.sh` installs as a compose override (Caddy local CA by default, or auto Let's Encrypt with `--domain`, or BYO cert with `--cert`/`--key`).
- **anthias-celery** — Async task queue (asset downloads, cleanup). Publishes asset-update events back to the WebSocket consumers via the Channels Redis layer.
- **anthias-celery** — Async task queue (asset downloads, cleanup). Runs the same image as `anthias-server` with a CMD override that starts the Celery worker; the two services share the entire root filesystem to avoid duplicating ~825 MB of identical apt content per device. Publishes asset-update events back to the WebSocket consumers via the Channels Redis layer.
- **anthias-viewer** — Drives the display, receives instructions over the Redis pub/sub `anthias.viewer` channel, talks to anthias-server over HTTP.
- **redis** (port 6379) — Celery broker + result backend, Channels channel layer, and the viewer signalling bus (pub/sub channel + per-correlation-ID reply lists).
- **webview** — Qt-based browser for rendering content on the display; fetches `/anthias_assets/` from anthias-server.
@@ -72,7 +72,7 @@ uv run ruff check /path/to/file.py # Lint specific file
```bash
# Build and start test containers
uv run python -m tools.image_builder --dockerfiles-only --disable-cache-mounts --service celery --service redis --service test
uv run python -m tools.image_builder --dockerfiles-only --disable-cache-mounts --service redis --service test
docker compose -f docker-compose.test.yml up -d --build
# Prepare and run tests (integration and non-integration must be run separately)

View File

@@ -69,7 +69,6 @@ if [[ -n $(docker ps | grep srly-ose) ]]; then
set +e
docker container rename srly-ose-server anthias-server
docker container rename srly-ose-viewer anthias-viewer
docker container rename srly-ose-celery anthias-celery
set -e
fi
@@ -77,11 +76,18 @@ fi
# * nginx / websocket — folded into anthias-server (uvicorn).
# * wifi-connect — service removed; nmcli/nmtui is the supported
# path now.
# * anthias-celery / srly-ose-celery containers from the era when
# celery had its own image. The new compose file recreates the
# anthias-celery container against ghcr.io/screenly/anthias-server,
# so the old container (still pointing at the deleted celery image)
# must be removed first or the server-image-backed replacement
# can't take its name.
# Volumes are shared across services, so removing the containers is safe.
set +e
docker rm -f \
anthias-nginx anthias-websocket anthias-wifi-connect \
srly-ose-nginx srly-ose-websocket srly-ose-wifi-connect \
anthias-celery srly-ose-celery \
>/dev/null 2>&1
set -e

View File

@@ -45,10 +45,12 @@ services:
io.balena.features.supervisor-api: '1'
anthias-celery:
image: ghcr.io/screenly/anthias-celery:${GIT_SHORT_HASH}-${BOARD}
build:
context: .
dockerfile: ./docker/Dockerfile.celery
# Runs on the same image as anthias-server with a CMD override.
# See docker-compose.yml.tmpl for context on the merge.
image: ghcr.io/screenly/anthias-server:${GIT_SHORT_HASH}-${BOARD}
command: >
celery -A celery_tasks.celery worker -B -n worker@anthias
--loglevel=info --schedule /tmp/celerybeat-schedule
depends_on:
- anthias-server
- redis

View File

@@ -39,7 +39,12 @@ services:
io.balena.features.supervisor-api: '1'
anthias-celery:
image: ghcr.io/screenly/anthias-celery:${GIT_SHORT_HASH}-${BOARD}
# Runs on the same image as anthias-server with a CMD override.
# See docker-compose.yml.tmpl for context on the merge.
image: ghcr.io/screenly/anthias-server:${GIT_SHORT_HASH}-${BOARD}
command: >
celery -A celery_tasks.celery worker -B -n worker@anthias
--loglevel=info --schedule /tmp/celerybeat-schedule
depends_on:
- anthias-server
- redis

View File

@@ -2,6 +2,10 @@
services:
anthias-server:
# Explicit image tag so anthias-celery below can reference the same
# built image without a duplicate `build:` block (which would
# produce a separate, byte-identical-but-distinct image tag).
image: anthias-server:dev
build:
context: .
dockerfile: docker/Dockerfile.server
@@ -21,19 +25,27 @@ services:
- ./:/usr/src/app/
anthias-celery:
build:
context: .
dockerfile: docker/Dockerfile.celery
# Reuses anthias-server:dev via the explicit image tag above.
# Compose builds anthias-server first (it owns the build:) and
# this service inherits the same image, only overriding CMD.
image: anthias-server:dev
depends_on:
- anthias-server
- redis
anthias-server:
condition: service_started
redis:
condition: service_started
command: >
celery -A celery_tasks.celery worker -B -n worker@anthias
--loglevel=info --schedule /tmp/celerybeat-schedule
environment:
- HOME=/data
- CELERY_BROKER_URL=redis://redis:6379/0
- CELERY_RESULT_BACKEND=redis://redis:6379/0
- ENVIRONMENT=development
restart: always
volumes:
- anthias-data:/data
- ./:/usr/src/app/
redis:
platform: "linux/amd64"

View File

@@ -2,6 +2,10 @@
services:
anthias-test:
# Explicit image tag so anthias-celery below can reference the same
# built image without a duplicate `build:` block (which would
# produce a separate, byte-identical-but-distinct image tag).
image: anthias-test:dev
build:
context: .
dockerfile: docker/Dockerfile.test
@@ -17,18 +21,27 @@ services:
- anthias-data:/data
anthias-celery:
build:
context: .
dockerfile: docker/Dockerfile.celery
# Reuses anthias-test:dev via the explicit image tag above — the
# test image is a superset of server (same base apt + venv +
# bun + chrome for selenium). Compose builds anthias-test first
# (it owns the build:) and this service inherits the same image,
# only overriding CMD.
image: anthias-test:dev
command: >
celery -A celery_tasks.celery worker -B -n worker@anthias
--loglevel=info --schedule /tmp/celerybeat-schedule
depends_on:
- anthias-test
- redis
anthias-test:
condition: service_started
redis:
condition: service_started
environment:
- HOME=/data
- CELERY_BROKER_URL=redis://redis:6379/0
- CELERY_RESULT_BACKEND=redis://redis:6379/0
restart: always
volumes:
- .:/usr/src/app
- anthias-data:/data
redis:

View File

@@ -58,10 +58,15 @@ services:
io.balena.features.supervisor-api: '1'
anthias-celery:
image: ghcr.io/screenly/anthias-celery:${DOCKER_TAG}-${DEVICE_TYPE}
build:
context: .
dockerfile: docker/Dockerfile.celery
# Runs on the same image as anthias-server with a CMD override.
# Shipping one image instead of two is the point — server and celery
# share their entire root filesystem (base apt + venv + app source),
# and a separate celery image was duplicating ~825 MB extracted of
# identical content per device. See refactor: drop celery image.
image: ghcr.io/screenly/anthias-server:${DOCKER_TAG}-${DEVICE_TYPE}
command: >
celery -A celery_tasks.celery worker -B -n worker@anthias
--loglevel=info --schedule /tmp/celerybeat-schedule
depends_on:
- anthias-server
- redis

View File

@@ -1,22 +0,0 @@
{% include 'uv-builder.j2' %}
{% include 'Dockerfile.base.j2' %}
COPY --from=uv-builder /venv /venv
ENV PATH="/venv/bin:$PATH"
ENV VIRTUAL_ENV="/venv"
RUN mkdir -p /usr/src/app
WORKDIR /usr/src/app
COPY . /usr/src/app/
ENV GIT_HASH={{ git_hash }}
ENV GIT_SHORT_HASH={{ git_short_hash }}
ENV GIT_BRANCH={{ git_branch }}
ENV DJANGO_SETTINGS_MODULE="anthias_django.settings"
CMD celery -A celery_tasks.celery worker \
-B -n worker@anthias \
--loglevel=info \
--schedule \
/tmp/celerybeat-schedule

View File

@@ -1,3 +1,5 @@
# syntax=docker/dockerfile:1.4
# vim: ft=dockerfile
{% if environment == 'production' %}
{# bun ships no 32-bit binaries at all — its release artifacts cover
only {linux,darwin,windows}-{x64,aarch64}, so a target-platform
@@ -58,5 +60,6 @@ ENV GIT_HASH={{ git_hash }}
ENV GIT_SHORT_HASH={{ git_short_hash }}
ENV GIT_BRANCH={{ git_branch }}
ENV DEVICE_TYPE={{ device_type }}
ENV DJANGO_SETTINGS_MODULE="anthias_django.settings"
CMD ["bash", "bin/start_server.sh"]

View File

@@ -1,9 +1,9 @@
# syntax=docker/dockerfile:1.4
# vim: ft=dockerfile
{% include 'uv-builder.j2' %}
{% include 'Dockerfile.base.j2' %}
# vim: ft=dockerfile
# @TODO: Uncomment this build stage when test_add_asset_streaming is fixed.
# FROM debian:buster as builder
@@ -63,4 +63,5 @@ RUN cp ansible/roles/anthias/files/anthias.conf \
ENV GIT_HASH={{ git_hash }}
ENV GIT_SHORT_HASH={{ git_short_hash }}
ENV GIT_BRANCH={{ git_branch }}
ENV DJANGO_SETTINGS_MODULE="anthias_django.settings"
ENV PATH="/opt/chrome-linux64:/opt/chromedriver-linux64:$PATH"

View File

@@ -1,8 +1,9 @@
# syntax=docker/dockerfile:1.4
# vim: ft=dockerfile
{% include 'uv-builder.j2' %}
FROM {{ base_image }}:{{ base_image_tag }}
{% include 'Dockerfile.base.j2' %}
# This list needs to be trimmed back later
{% if disable_cache_mounts %}
RUN \
{% else %}
@@ -10,7 +11,7 @@ RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
{% endif %}
apt-get update && \
apt-get -y install --no-install-recommends \
{% for dependency in apt_dependencies -%}
{% for dependency in viewer_extra_apt_dependencies -%}
{% if not loop.last %}
{{ dependency }} \
{% else %}
@@ -22,10 +23,6 @@ COPY --from=uv-builder /venv /venv
ENV PATH="/venv/bin:$PATH"
ENV VIRTUAL_ENV="/venv"
# Works around issue with `curl`
# https://github.com/balena-io-library/base-images/issues/562
RUN c_rehash
# QT Base from packages does not support eglfs.
# Qt 5 boards (Pi 1-4 32-bit) get a custom cross-built Qt runtime via the
# qt5-* archive below; Qt 6 boards use Debian's apt qt6-* packages and so
@@ -74,6 +71,4 @@ WORKDIR /usr/src/app
RUN mkdir -p /usr/src/app
COPY . /usr/src/app/
{% include 'labels.j2' %}
CMD ["bash", "./bin/start_viewer.sh"]

View File

@@ -5,8 +5,7 @@
push/delete access. Without it, packages stay private even when the
repo is public, and the package page on GitHub has no link back. #}
{% set service_descriptions = {
'server': 'Anthias web server (uvicorn + Django + Channels)',
'celery': 'Anthias background task worker (asset downloads, cleanup, display power)',
'server': 'Anthias web server (uvicorn + Django + Channels); also runs the Celery worker via a CMD override',
'redis': 'Redis broker for Anthias Celery and Channels',
'viewer': 'Anthias display/viewer service',
'test': 'Anthias test runner',

View File

@@ -1,6 +1,3 @@
# syntax=docker/dockerfile:1.4
# vim: ft=dockerfile
{# Single source of truth for the uv version — used both by the
prebuilt-image COPY (amd64/arm64) and the PyPI fallback
(32-bit ARM), so both paths stay byte-pinned and reproducible. #}

View File

@@ -90,7 +90,6 @@ Build and start the containers.
$ uv run python -m tools.image_builder \
--dockerfiles-only \
--disable-cache-mounts \
--service celery \
--service redis \
--service test
$ docker compose \

View File

@@ -34,24 +34,37 @@ def build_image(
clean_build: bool,
push: bool,
dockerfiles_only: bool,
cache_backend: str,
) -> None:
# Enable BuildKit
os.environ['DOCKER_BUILDKIT'] = '1'
context = {}
# Create board-specific cache directory
cache_dir = Path('/tmp/.buildx-cache') / (
# Local cache: per-board on-disk directory under the user's
# XDG-style cache home (override via $XDG_CACHE_HOME). Per-user
# rather than under /tmp so a multi-user host doesn't share
# buildkit cache state across accounts. Unused by the registry
# backend, which pushes to GHCR instead.
cache_scope = (
f'{board}-64'
if board == 'pi4' and target_platform == 'linux/arm64/v8'
else board
)
try:
cache_dir.mkdir(parents=True, exist_ok=True)
except Exception as e:
click.secho(
f'Warning: Failed to create cache directory: {e}', fg='yellow'
)
xdg_cache_home = (
Path(os.environ['XDG_CACHE_HOME'])
if os.environ.get('XDG_CACHE_HOME')
else Path.home() / '.cache'
)
cache_dir = xdg_cache_home / 'anthias-buildx' / cache_scope
if cache_backend == 'local':
try:
cache_dir.mkdir(parents=True, exist_ok=True)
except Exception as e:
click.secho(
f'Warning: Failed to create cache directory: {e}',
fg='yellow',
)
base_apt_dependencies = [
'build-essential',
@@ -141,22 +154,79 @@ def build_image(
except: # noqa: E722
docker.buildx.create(name='multiarch-builder', use=True)
docker.buildx.build(
context_path='.',
cache=(not clean_build),
cache_from={
'type': 'local',
'src': str(cache_dir),
}
if not clean_build
else None,
cache_to={
# Resolve cache_from / cache_to. `--clean-build` short-circuits both
# to None for a true cold rebuild. Otherwise we pick a backend:
#
# * local — board-scoped on-disk directory at
# $XDG_CACHE_HOME/anthias-buildx/<board> (typically
# ~/.cache/anthias-buildx/<board>). Used for local dev so
# cache state survives across `tools.image_builder`
# invocations on the same machine.
# * registry — BuildKit's registry cache backend
# (https://docs.docker.com/build/cache/backends/registry/).
# Pushes cache to a tagged image at
# <namespace>-<service>:buildcache-<board>. Reuses the GHCR
# login already done by CI — no extra tokens or third-party
# actions needed — and inherits GHCR's free unlimited
# storage for public packages. Cache lives next to the real
# image tags but with a `buildcache-*` prefix so it can't
# collide with the immutable <short-hash>-<board> or
# floating latest-<board> tags.
if clean_build:
cache_from = None
cache_to = None
elif cache_backend == 'registry':
# Hardcode the GHCR-primary namespace so the cache lives next to
# the published images for this service. Doesn't read from
# `namespaces` below: cache only needs one canonical home, and
# GHCR's free unlimited storage for public packages makes it the
# right one. If the namespaces list changes in the future, this
# ref needs to move with it.
cache_ref = (
f'ghcr.io/screenly/anthias-{service}:buildcache-{cache_scope}'
)
# Reads are always safe — anthias-* GHCR packages are public,
# so cache_from works without auth (matters for someone
# invoking this locally with --cache-backend=registry to
# warm-start off CI's cache).
cache_from = {'type': 'registry', 'ref': cache_ref}
if push:
cache_to = {
'type': 'registry',
'ref': cache_ref,
'mode': 'max',
# `image-manifest=true` writes the cache as an OCI
# image manifest rather than the legacy index-only
# form, which is the only thing GHCR will accept
# under the ghcr.io/screenly/anthias-* repos (it
# rejects standalone cache manifests). Cheap, just
# affects how the cache blob is wrapped.
'image-manifest': 'true',
}
else:
# Without --push the build hasn't authenticated to GHCR,
# so trying to write cache there would fail mid-build.
# Read-only: pull layers from the published cache, don't
# update it.
cache_to = None
click.secho(
f'cache-backend=registry without --push: reading from '
f'{cache_ref} but not writing back.',
fg='yellow',
)
else:
cache_from = {'type': 'local', 'src': str(cache_dir)}
cache_to = {
'type': 'local',
'dest': str(cache_dir),
'mode': 'max',
}
if not clean_build
else None,
docker.buildx.build(
context_path='.',
cache=(not clean_build),
cache_from=cache_from,
cache_to=cache_to,
builder='multiarch-builder',
file=f'docker/Dockerfile.{service}',
load=True,
@@ -225,6 +295,21 @@ def build_image(
'--dockerfiles-only',
is_flag=True,
)
@click.option(
'--cache-backend',
type=click.Choice(['local', 'registry']),
default='local',
envvar='BUILDX_CACHE_BACKEND',
help=(
'BuildKit cache backend. `local` (default) writes to '
'$XDG_CACHE_HOME/anthias-buildx/<board>/ (typically '
'~/.cache/anthias-buildx/) and is right for local dev. '
'`registry` pushes the cache to '
'ghcr.io/screenly/anthias-<service>:buildcache-<board> for '
'CI — reuses the GHCR login already done by the workflow, '
'no extra tokens needed. Override via $BUILDX_CACHE_BACKEND.'
),
)
def main(
clean_build: bool,
build_target: str,
@@ -235,6 +320,7 @@ def main(
push: bool,
skip_latest_tag: bool,
dockerfiles_only: bool,
cache_backend: str,
) -> None:
git_branch = pygit2.Repository('.').head.shorthand
git_hash = str(pygit2.Repository('.').head.target)
@@ -300,6 +386,7 @@ def main(
clean_build,
push,
dockerfiles_only,
cache_backend,
)

View File

@@ -2,7 +2,6 @@ SHORT_HASH_LENGTH = 7
BUILD_TARGET_OPTIONS = ['pi1', 'pi2', 'pi3', 'pi4', 'pi4-64', 'pi5', 'x86']
SERVICES = (
'server',
'celery',
'redis',
'viewer',
'test',

View File

@@ -80,7 +80,6 @@ def generate_dockerfile(service: str, context: dict[str, Any]) -> None:
def get_uv_builder_context(service: str) -> dict[str, Any]:
service_to_group = {
'server': 'server',
'celery': 'server',
'viewer': 'viewer',
'test': 'test',
}
@@ -158,25 +157,29 @@ def get_viewer_context(board: str, target_platform: str) -> dict[str, Any]:
qt_major_version = qt_version.split('.')[0]
apt_dependencies = [
'build-essential',
# Viewer-only apt deps. The shared set (build-essential, curl, ffmpeg,
# git-core, libcec-dev, libffi-dev, libssl-dev, net-tools, procps,
# psmisc, python-is-python3, python3-dev, python3-gi, python3-pip,
# python3-setuptools, sqlite3, sudo, plus libraspberrypi0 on 32-bit
# Pi boards) is installed by Dockerfile.base.j2 in a layer that
# server (and test) also use, so it dedups across images. Anything
# listed here is unique to the viewer image.
viewer_extra_apt_dependencies = [
'ca-certificates',
'curl',
'dbus-daemon',
'fonts-arphic-uming',
'git-core',
'libasound2-dev',
'libavcodec-dev',
'libavdevice-dev',
'libavfilter-dev',
'libavformat-dev',
'libavutil-dev',
'libbz2-dev',
'libcec-dev ',
'libdbus-1-dev',
'libdbus-glib-1-dev',
'libdrm-dev',
'libegl1-mesa-dev',
'libevent-dev',
'libffi-dev',
'libfontconfig1-dev',
'libfreetype6-dev',
'libgbm-dev',
@@ -204,7 +207,7 @@ def get_viewer_context(board: str, target_platform: str) -> dict[str, Any]:
'libsnappy-dev',
'libsqlite3-dev',
'libsrtp2-dev',
'libssl-dev',
'libswresample-dev',
'libswscale-dev',
'libsystemd-dev',
'libts-dev',
@@ -241,31 +244,13 @@ def get_viewer_context(board: str, target_platform: str) -> dict[str, Any]:
'libxslt1-dev',
'libxss-dev',
'libxtst-dev',
'net-tools',
'procps',
'psmisc',
'python3-dev',
'python3-gi',
'python3-netifaces',
'python3-pip',
'python3-setuptools',
'python-is-python3',
'ttf-wqy-zenhei',
'vlc',
'sudo',
'sqlite3',
'ffmpeg',
'libavcodec-dev',
'libavdevice-dev',
'libavfilter-dev',
'libavformat-dev',
'libavutil-dev',
'libswresample-dev',
'libswscale-dev',
]
if is_qt6:
apt_dependencies.extend(
viewer_extra_apt_dependencies.extend(
[
'mpv',
'qt6-base-dev',
@@ -274,9 +259,11 @@ def get_viewer_context(board: str, target_platform: str) -> dict[str, Any]:
]
)
else:
apt_dependencies.extend(
# libraspberrypi0 already comes in via base_apt_dependencies on
# 32-bit Pi boards (see __main__.py), so it's deliberately not
# repeated here.
viewer_extra_apt_dependencies.extend(
[
'libraspberrypi0',
'libgst-dev',
'libsqlite0-dev',
'libsrtp0-dev',
@@ -285,10 +272,10 @@ def get_viewer_context(board: str, target_platform: str) -> dict[str, Any]:
)
if board != 'pi1':
apt_dependencies.extend(['libssl1.1'])
viewer_extra_apt_dependencies.extend(['libssl1.1'])
return {
'apt_dependencies': apt_dependencies,
'viewer_extra_apt_dependencies': viewer_extra_apt_dependencies,
'qt_version': qt_version,
'qt_major_version': qt_major_version,
'webview_version': webview_version,