mirror of
https://github.com/Screenly/Anthias.git
synced 2026-06-10 09:08:09 -04:00
chore: drop transcode-era defensive hardening on celery + server image

These guards were load-bearing while the asset processor ran libx264 /
libx265 transcodes; with the on-device transcode pipeline gone they're
dead code defending against a workload that no longer exists.

Removed:
- ``cpus: ${CELERY_CPU_LIMIT}`` / ``cpus: 2.0`` cgroup CPU caps on
  anthias-celery (every compose template)
- ``nice -n 19 ionice -c 3`` wrapper on the celery command
- ``--concurrency=1`` on celery worker; default celery concurrency
  is fine when the only tasks are ffprobe + Pillow conversion
- ``CELERY_CPU_LIMIT`` calc in ``bin/upgrade_containers.sh``
- ``_rpt1-ffmpeg-pin.j2`` include + reinstall layer in
  ``Dockerfile.server.j2``; the +rpt1 ffmpeg was only needed for
  the walker's ``-hwaccel drm`` transcode. The server now only
  runs ffprobe, which the stock Debian ffmpeg handles fine
  (smaller server image, simpler base)
- Stale ``ffprobe → passthrough or libx264/aac transcode`` section
  header in processing.py

Kept:
- ``mem_limit: ${CELERY_MEMORY_LIMIT_KB}k`` on celery — still a
  useful safety net against a decompression-bomb fixture or
  runaway ffprobe
- ``+rpt1`` ffmpeg pin on the *viewer* image — still load-bearing
  for mpv's ``v4l2_request`` HW decode on Pi 4 / Pi 5 / Rock Pi 4

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
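
The retained memory cap can be sketched in shell. The script in this commit computes 60% of ``MemTotal`` with ``bc`` and truncates the fraction with ``cut``. The helper name ``celery_memory_limit_kb`` below is hypothetical and not part of the repository; it uses integer arithmetic instead, which should give the same truncated value for whole-KB inputs. The 3884000 KB input is an illustrative figure, not a measurement from the commit.

```shell
#!/bin/sh
# Hypothetical helper (not in the repository): 60% of a MemTotal value
# given in KB, truncated to a whole number of KB.
celery_memory_limit_kb() {
    total_kb="$1"
    # Integer arithmetic truncates, like `bc | cut -d'.' -f1` in the script.
    echo $(( total_kb * 6 / 10 ))
}

# Illustrative MemTotal for a nominal 4 GB board (assumed value).
celery_memory_limit_kb 3884000   # prints 2330400
```

The printed value is what a compose template would consume as ``mem_limit: ${CELERY_MEMORY_LIMIT_KB}k``.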
@@ -15,27 +15,11 @@ SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
 TOTAL_MEMORY_KB=$(grep MemTotal /proc/meminfo | awk {'print $2'})
 export VIEWER_MEMORY_LIMIT_KB=$(echo "$TOTAL_MEMORY_KB" \* 0.8 | bc)
 export SHM_SIZE_KB="$(echo "$TOTAL_MEMORY_KB" \* 0.3 | bc | cut -d'.' -f1)"
-# Hard cgroup CPU cap for anthias-celery. Half the host's cores
-# (floored to 1.0 so single-core boxes still make progress) keeps
-# the upload-time normalisation pipeline from starving the viewer
-# or sshd even when libx265 wants every cycle it can get. On a
-# Pi 4 / Pi 5 / Rock Pi 4 (4 cores) that's 2 CPUs' worth of
-# compute, leaving 2 for the viewer + system. On an 8-core x86
-# box that's 4 CPUs, leaving 4 for everything else. Live-
-# confirmed on the Rock Pi 4 that ``nice -n 19`` + ``ionice -c 3``
-# alone are insufficient — the kernel still hands libx265 every
-# available cycle if nothing else is asking for them, which
-# starves sshd through banner exchange and drops mpv frames.
-CELERY_CPU_LIMIT_RAW=$(echo "$(nproc) * 0.5" | bc -l)
-export CELERY_CPU_LIMIT=$(awk -v v="$CELERY_CPU_LIMIT_RAW" 'BEGIN { printf "%.1f", (v < 1.0 ? 1.0 : v) }')
-# Hard cgroup memory limit for anthias-celery. 60% of host RAM
-# keeps libx265 (~1.5 GB resident on 4K HEVC encodes) from pushing
-# the system into swap, which is what actually starves sshd + the
-# viewer on 4 GB SBCs. Without this cap a single 4K transcode on
-# the Rock Pi 4 made the box unresponsive even with the CPU quota
-# in place — cgroup CPU isolation doesn't help if libx265 can
-# allocate all available RAM. 60% leaves 40% for the viewer +
-# server + redis + system, matching the CPU 50/50 split.
+# Memory cap for anthias-celery. 60% of host RAM is conservative
+# headroom for the remaining celery workloads (ffprobe metadata,
+# HEIC → WebP image conversion); the cap is here as a safety net
+# against a decompression-bomb fixture or runaway ffprobe, not
+# because routine workloads come anywhere near it.
 export CELERY_MEMORY_LIMIT_KB=$(echo "$TOTAL_MEMORY_KB * 0.6" | bc | cut -d'.' -f1)
 GIT_BRANCH="${GIT_BRANCH:-master}"

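
For reference, the CPU calculation removed in this hunk amounts to "half the cores, never below 1.0". A minimal sketch, assuming a hypothetical helper name ``celery_cpu_limit`` (the original inlined the same logic with ``bc`` and ``awk``) and taking the core count as an argument rather than calling ``nproc``:

```shell
#!/bin/sh
# Hypothetical helper mirroring the removed calc: half of the given core
# count, floored at 1.0, formatted with one decimal place.
celery_cpu_limit() {
    awk -v cores="$1" 'BEGIN {
        half = cores * 0.5
        printf "%.1f\n", (half < 1.0 ? 1.0 : half)
    }'
}

# Core counts mentioned in the removed comment, plus a single-core box.
for cores in 1 4 8; do
    echo "$cores cores -> cpus: $(celery_cpu_limit "$cores")"
done
```

With these inputs the helper yields 1.0, 2.0 and 4.0, matching the figures quoted in the removed comment for 4-core and 8-core hosts.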
@@ -52,16 +52,8 @@ services:
 # Runs on the same image as anthias-server with a CMD override.
 # See docker-compose.yml.tmpl for context on the merge.
 image: ghcr.io/screenly/anthias-server:${GIT_SHORT_HASH}-${BOARD}
-# nice + ionice + cgroup cpu-quota keep the upload-time
-# transcode pipeline from starving the on-device viewer; see
-# docker-compose.yml.tmpl. Literal ``2.0`` here (half of
-# 4-core Pi 2 / 3 / 4 / 5, the boards balena targets); the
-# non-balena compose computes the limit dynamically.
-cpus: 2.0
 command: >
-nice -n 19 ionice -c 3
 celery -A anthias_server.celery_tasks.celery worker -B -n worker@anthias
---concurrency=1
 --loglevel=info --schedule /tmp/celerybeat-schedule
 depends_on:
 - anthias-server
@@ -49,18 +49,8 @@ services:
 # Runs on the same image as anthias-server with a CMD override.
 # See docker-compose.yml.tmpl for context on the merge.
 image: ghcr.io/screenly/anthias-server:${GIT_SHORT_HASH}-${BOARD}
-# nice + ionice + cgroup cpu-quota keep the upload-time
-# transcode pipeline from starving the on-device viewer; see
-# docker-compose.yml.tmpl for the full rationale. Literal
-# ``2.0`` here (half of 4-core Pi 2 / 3 / 4 / 5 targets, the
-# only boards balena builds for); the non-balena compose
-# computes the limit dynamically via $(nproc) in
-# ``bin/upgrade_containers.sh``.
-cpus: 2.0
 command: >
-nice -n 19 ionice -c 3
 celery -A anthias_server.celery_tasks.celery worker -B -n worker@anthias
---concurrency=1
 --loglevel=info --schedule /tmp/celerybeat-schedule
 depends_on:
 - anthias-server
@@ -29,12 +29,6 @@ services:
 # Compose builds anthias-server first (it owns the build:) and
 # this service inherits the same image, only overriding CMD.
 image: anthias-server:dev
-# Hard cgroup CFS quota — see docker-compose.yml.tmpl. nice/ionice
-# are insufficient when the worker is the only heavy workload. Dev
-# uses a literal ``2.0`` (works on any laptop / desktop / SBC with
-# at least 2 cores) instead of recomputing per host; the production
-# path scales with $(nproc) via bin/upgrade_containers.sh.
-cpus: 2.0
 depends_on:
 anthias-server:
 condition: service_started
@@ -42,7 +36,6 @@ services:
 condition: service_started
 command: >
 celery -A anthias_server.celery_tasks.celery worker -B -n worker@anthias
---concurrency=1
 --loglevel=info --scheduler celery.beat.Scheduler
 environment:
 - HOME=/data
@@ -98,58 +98,14 @@ services:
 # and a separate celery image was duplicating ~825 MB extracted of
 # identical content per device. See refactor: drop celery image.
 image: ghcr.io/screenly/anthias-server:${DOCKER_TAG}-${DEVICE_TYPE}
-# ``nice -n 19 ionice -c 3`` lowers CPU and IO priority to the
-# idle class so the upload-time normalisation pipeline (HEIC→WebP,
-# exotic-codec → board-appropriate H.264/HEVC transcode) never starves the on-device
-# viewer mid-playback. Both wrappers are no-ops when the system is
-# idle — a background sweep on a quiet device still runs at
-# full speed; only contention with the viewer process slows it
-# down. Subprocesses (ffmpeg, ffprobe, Pillow's libheif binding)
-# inherit the wrapper's settings, so a single configuration here
-# covers every workload the worker spawns.
-#
-# ``--concurrency=1`` caps the worker at one normalize task at
-# a time. Default celery concurrency = num_cores, which on a
-# Pi 4 / Pi 5 / Rock Pi 4 means 4 parallel libx265 encodes
-# sharing the same SoC as the viewer's mpv process. Even at
-# nice 19 four ffmpegs still saturate the cores and push the
-# board into swap (each libx265 1080p encode needs ~500 MB
-# RAM; a 4 GB SBC OOM-spirals well before the walker
-# finishes). Asset processing is upload-time, not throughput-
-# bound — serial encodes finish a few minutes later, but the
-# viewer never drops a frame.
-#
-# ``cpus: <half the host>`` is a hard cgroup CFS quota.
-# nice/ionice are soft priority hints and only matter when
-# something else is asking for cycles — when libx265 is the
-# only heavy workload the scheduler still gives it everything
-# available, and on a 4 GB SBC that's enough to starve sshd
-# through banner exchange + mpv mid-frame. The hard cap is the
-# only way to guarantee "encoding never impacts playback or
-# UI access". Half the host's cores (computed in
-# bin/upgrade_containers.sh, floored to 1.0) leaves the other
-# half for the viewer + system on every supported SBC: 2 CPUs
-# of headroom on a 4-core Pi 4 / Pi 5 / Rock Pi 4, 4 CPUs on
-# an 8-core x86, etc. Multi-core encoders (libx265 with
-# ``-threads N``) still run in parallel up to the cap, so
-# bigger machines finish encodes faster without ever
-# compromising playback responsiveness.
-cpus: ${CELERY_CPU_LIMIT}
-# ``mem_limit`` partners with the cpus cap above. cgroup CPU
-# isolation alone doesn't stop libx265 from allocating every
-# available byte at 4K (~1.5 GB resident), which on a 4 GB
-# SBC pushes the system into swap and starves the viewer +
-# sshd. 60% of host RAM (computed in
-# bin/upgrade_containers.sh) keeps libx265's working set
-# comfortably inside the cap and leaves enough headroom for
-# everything else. Live-confirmed on the Rock Pi 4 that
-# without this cap a single 4K transcode wedged the device
-# despite the cpus cap.
+# ``mem_limit`` keeps a runaway celery task (e.g. a HEIC →
+# WebP convert on a decompression-bomb fixture) from eating
+# the box's RAM and starving the viewer. 60% of host (computed
+# in bin/upgrade_containers.sh) leaves comfortable headroom on
+# every supported SBC.
 mem_limit: ${CELERY_MEMORY_LIMIT_KB}k
 command: >
-nice -n 19 ionice -c 3
 celery -A anthias_server.celery_tasks.celery worker -B -n worker@anthias
---concurrency=1
 --loglevel=info --scheduler celery.beat.Scheduler
 depends_on:
 - anthias-server
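
The removed comment in this hunk relies on child processes inheriting the niceness set by the ``nice`` wrapper. A small sketch of how that inheritance could be observed, assuming a ``nice`` that prints the current niceness when run without arguments (GNU coreutils behaves this way); the helper name ``child_niceness`` is hypothetical, and ``ionice`` is left out because it may not be installed:

```shell
#!/bin/sh
# Hypothetical helper: start a child shell under `nice -n <adjust>` and
# let a grandchild `nice` (no arguments) report the niceness it inherited.
# Assumes GNU coreutils `nice`, which prints the niceness when given no command.
child_niceness() {
    adjust="$1"
    nice -n "$adjust" sh -c 'nice'
}

base=$(nice)
child=$(child_niceness 5)
echo "parent niceness: $base, inherited niceness in child: $child"
```

On a shell that starts at niceness 0 the child would be expected to report 5; the kernel clamps the value at 19, so a parent that is already niced may show a smaller increase.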
@@ -49,31 +49,6 @@ RUN bun run build

 {% include 'Dockerfile.base.j2' %}

-{# Hardware-decode codecs the asset processor's walker uses to
-transcode source video. The pin only fires on boards whose
-v4l2_request HEVC / H.264 decoders aren't reachable from stock
-Debian's ffmpeg — same set as the viewer image. See the include
-file's header for the full per-board rationale. #}
-{% include '_rpt1-ffmpeg-pin.j2' %}
-
-{# Pull in the same ffmpeg + libav* family the pin made eligible.
-Without re-running ``apt install ffmpeg`` here the base image
-stage's earlier install wins and we silently stay on the
-vanilla Debian build. ``--reinstall`` forces the +rpt1 package
-to land even when the same version is already on disk. #}
-{% if board in ('pi4-64', 'pi5', 'arm64') %}
-{% if disable_cache_mounts %}
-RUN \
-{% else %}
-RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
-{% endif %}
-apt-get update && \
-apt-get install -y --reinstall --no-install-recommends \
-ffmpeg libavcodec61 libavdevice61 libavfilter10 \
-libavformat61 libavutil59 libpostproc58 libswresample5 \
-libswscale8
-{% endif %}
-
 COPY --from=uv-builder /venv /venv
 ENV PATH="/venv/bin:$PATH"
 ENV VIRTUAL_ENV="/venv"
@@ -656,7 +656,7 @@ def _run_image_normalisation(asset: Asset) -> None:


 # ---------------------------------------------------------------------------
-# Video normalisation: ffprobe → passthrough or libx264/aac transcode
+# Video normalisation: ffprobe → metadata write + HW-decode codec gate
 # ---------------------------------------------------------------------------

