Commit Graph

4 Commits

Author SHA1 Message Date
dependabot[bot]
68f1bc0981 chore(deps): bump the github-actions group with 3 updates (#2994)
Bumps the github-actions group with 3 updates: [actions/checkout](https://github.com/actions/checkout), [astral-sh/setup-uv](https://github.com/astral-sh/setup-uv) and [github/codeql-action](https://github.com/github/codeql-action).


Updates `actions/checkout` from 6.0.2 to 6.0.3
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](de0fac2e45...df4cb1c069)

Updates `astral-sh/setup-uv` from 8.1.0 to 8.2.0
- [Release notes](https://github.com/astral-sh/setup-uv/releases)
- [Commits](08807647e7...fac544c07d)

Updates `github/codeql-action` from 4.36.0 to 4.36.1
- [Release notes](https://github.com/github/codeql-action/releases)
- [Changelog](https://github.com/github/codeql-action/blob/main/CHANGELOG.md)
- [Commits](7211b7c807...87557b9c84)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-version: 6.0.3
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: github-actions
- dependency-name: astral-sh/setup-uv
  dependency-version: 8.2.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: github-actions
- dependency-name: github/codeql-action
  dependency-version: 4.36.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: github-actions
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-06-07 07:22:21 +02:00
Viktor Petersson
133ec78ff0 refactor(packaging): adopt src/ layout with split server/viewer packages (#2817)
* refactor(packaging): adopt src/ layout with split server/viewer packages

Move all Python source under src/ following modern packaging conventions.
Server, viewer, host-agent, and shared common code now live as four
top-level packages with clear excision boundaries — anthias_viewer can
be removed wholesale when the rewrite-out-of-Python lands without
touching the server.

  src/anthias_common/         shared: errors, utils, internal_auth, device_helper
  src/anthias_server/         Django app, REST API, Celery tasks, manage.py
    lib/                      server-only: auth, backup_helper, diagnostics, github, telemetry
  src/anthias_viewer/         player runtime (was viewer/)
  src/anthias_host_agent/     systemd-driven host shim (was host_agent.py)
  tools/raspberry_pi_imager/  moved from repo root
  tests/conftest.py           moved from repo root

pyproject.toml gets [build-system], setuptools src/ discovery, and an
anthias-manage console script. Django AppConfigs keep label='anthias_app'
and label='api' so existing migration dependency tuples don't move.
BASE_DIR computed from parents[3] to keep templates/static at repo root.
mypy_path set to ["src", "stubs"] with explicit_package_bases.
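
A minimal sketch of the `parents[3]` arithmetic, assuming BASE_DIR is derived from the settings module at src/anthias_server/django_project/settings.py (the path this log gives for it). The `/repo` checkout root is hypothetical.

```python
from pathlib import PurePosixPath

# Hypothetical checkout root; only the relative layout comes from this log.
settings_file = PurePosixPath(
    "/repo/src/anthias_server/django_project/settings.py"
)

# parents[0] = /repo/src/anthias_server/django_project
# parents[1] = /repo/src/anthias_server
# parents[2] = /repo/src
# parents[3] = /repo  (where templates/ and static/ stay)
BASE_DIR = settings_file.parents[3]

print(BASE_DIR)  # /repo
```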

Dockerfile templates set PYTHONPATH=/usr/src/app/src; bin/start_*.sh
and CI workflows use python -m anthias_server.manage / python -m
anthias_viewer instead of bare ./manage.py and python -m viewer.
Ansible host-agent unit invokes python -m anthias_host_agent.

Verified end-to-end in the docker test container:
  - 430 unit tests pass (matches baseline)
  - 7 integration tests pass, 5 skipped (matches baseline)
  - ruff, mypy clean

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* style: ruff format the new src/ tree

The longer post-rename module paths (anthias_common.internal_auth vs
lib.internal_auth, etc.) pushed several import lines past 79 chars, so
ruff format had to wrap them. Apply that formatting and split the one
multi-import in anthias_viewer/__init__.py into per-symbol lines so the
existing # noqa: E402 sits on the `from` line where ruff expects it,
without needing a re-anchor when format wraps the parens.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: realign sonar + gitignore comment to src/ layout

sonar-project.properties still pointed at the pre-refactor top-level
packages (anthias_app, anthias_django, api, lib, viewer, ...) and
their old per-file coverage.exclusions paths, which would have
produced empty Sonar runs and stale exclusions. Collapse sources to
`src` and rewrite the exclusions to the new src/anthias_*/ paths.

Also fix the stale path reference in .gitignore's comment for the
test DB (now src/anthias_server/django_project/settings.py).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: gitignore .claude/ and untrack the lock file I just leaked

Previous commit accidentally pulled in .claude/scheduled_tasks.lock
because .claude was in .dockerignore but not .gitignore. Add the
pattern to .gitignore and drop the file from the index.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(dockerignore): exclude pytest cache, __pycache__ dirs, and the local test DB

Three entries that were missing relative to the new src/ layout:

- .anthias-test.db (and -journal/-wal/-shm siblings) — created at the
  repo root by src/anthias_server/django_project/settings.py when a
  developer runs the host pytest suite. Without this exclude, the
  next docker build COPY . bakes the file into /usr/src/app/.
- **/__pycache__ — *.py[co] only matched the .pyc/.pyo files, leaving
  the empty cache directories to ship.
- .pytest_cache — host-side, regenerable.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(urls): preserve 'anthias_app' URL namespace, not just the app label

Copilot caught that the import-rewrite swept up the URL namespace too:
app_name in src/anthias_server/app/urls.py changed from 'anthias_app'
to 'anthias_server.app', which leaves templates/login.html's
{% url 'anthias_app:login' %} pointing at a namespace that no longer
exists — NoReverseMatch at render time when an unauthenticated request
hits the login page.

The namespace is the same kind of stable user-facing identifier as the
AppConfig label (which we already kept as 'anthias_app'). Restore it,
and revert the two reverse() callers in lib/auth.py and app/views.py
that the rewrite changed in lockstep.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(ci): update --confcutdir to the new tools/raspberry_pi_imager path

Copilot caught that the earlier sweep missed --confcutdir=raspberry_pi_imager
(no trailing slash) — replace_all of "raspberry_pi_imager/" only matched
path-with-slash forms. Without confcutdir, pytest walks back up looking
for conftests and discovers the repo-root tests/conftest.py, which
applies the Anthias-specific Django/Redis stubs to the rpi-imager test
run on the website-deploy workflow.
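
The missed match can be reproduced with a plain string replace. This is only an illustration: the two command lines are hypothetical, and only the path spellings come from the message above.

```python
# Hypothetical workflow fragments; only the two path spellings are from
# the commit message.
lines = [
    "pytest raspberry_pi_imager/tests",
    "pytest --confcutdir=raspberry_pi_imager",
]

old, new = "raspberry_pi_imager/", "tools/raspberry_pi_imager/"
rewritten = [line.replace(old, new) for line in lines]

# Only the spelling with a trailing slash is rewritten.
print(rewritten[0])  # pytest tools/raspberry_pi_imager/tests
print(rewritten[1])  # pytest --confcutdir=raspberry_pi_imager
```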

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 08:08:32 +01:00
Viktor Petersson
d9ebc8051c chore(build): upgrade to Debian Trixie + Python 3.13, drop Balena base images (#2779)
* chore(build): upgrade to Debian Trixie + Python 3.13, drop Balena base images

Move every container off `balenalib/raspberrypi*-debian:bookworm` (Balena
hasn't published a `trixie` tag on any of those repos and last refreshed
in May 2025) onto vanilla `debian:trixie`. Pi 1 and 32-bit Pi 4 are
retired at the same time — Pi 1 has no `linux/arm/v6` variant in upstream
Debian, and Pi 4 always has a 64-bit path that avoids the messy
`libssl1.1` / `libgst-dev` / `libsqlite0-dev` Qt 5 deps. Surviving build
matrix: pi2, pi3, pi4-64, pi5, x86.

For the surviving 32-bit boards (pi2, pi3) the legacy Broadcom userland
(libraspberrypi0 → /opt/vc/lib/{libbcm_host,libmmal,libvchiq_arm}) is
still required at runtime by the Qt 5 webview. Trixie's
archive.raspberrypi.org/debian/main no longer ships those packages
(replaced by raspi-utils + libdtovl0, which actively break
libraspberrypi0), so Dockerfile.base.j2 conditionally writes Deb822
.sources entries pointing at archive.raspberrypi.org/debian trixie main
and archive.raspbian.org/raspbian trixie firmware (where the legacy
Raspbian builds of libraspberrypi0 still live, armhf only). The
.deb-form raspberrypi-archive-keyring + raspbian-archive-keyring packages
are extracted with `dpkg-deb -x` (their bundled keys carry trixie-policy-
compliant binding signatures, unlike the standalone .public.key files
which fail Sequoia/sqv's post-2026-02-01 SHA-1 ban). Architectures: armhf
on each .sources file keeps apt from querying the Pi mirrors for the
arm64 / x86 builds.

Trixie package renames also fixed: libgles2-mesa → libgles2,
ttf-wqy-zenhei → fonts-wqy-zenhei, libpng16-16 → libpng16-16t64 (time64
transition; armhf has no `Provides:` fallback like amd64 does), and the
Qt 5-only libgst-dev / libsqlite0-dev / libsrtp0-dev / libssl1.1 are
dropped (libgstreamer1.0-dev, libsqlite3-dev, libsrtp2-dev, libssl3 take
their place — first added explicitly, the rest already in the main
list). The transitional `git-core` is gone in trixie; `git` covers it.

Python 3.13 (Trixie's default) replaces the 3.11 pin everywhere:
pyproject.toml requires-python and mypy python_version, ruff.toml
target-version, .python-version, uv.lock (regenerated; only diff is
async-timeout dropped — its marker was python<3.11), uv-builder.j2's
UV_PYTHON, Dockerfile.dev's FROM, bin/install.sh's host check, and every
CI workflow's setup-python pin.

Cleanup that falls out: drop the cache_scope / device_type / version_suffix
`pi4 + arm64 → pi4-64` re-mapping (board is now self-identifying), drop
the `c_rehash` workaround in Dockerfile.base.j2 (specific to a Balena
curl bug, not vanilla Debian), drop the dead arm/v6 + arm/v8 branches in
uv-builder.j2 (only arm/v7 remains as the 32-bit ARM target), retire the
old build_qt5.sh `pi1`/`pi4` branches, and delete docker/Dockerfile.celery
(left behind from the celery-image removal in 5e00c8ba).

Out-of-band prereq before merging anything that depends on a viewer
build: cut a new `WebView-v*` release with
webview-{ver}-trixie-{board}.tar.gz (and qt5-5.15.14-trixie-{pi2,pi3}.tar.gz)
for the surviving boards, then bump WEBVIEW_VERSION in
tools/image_builder/utils.py:143. The webview Dockerfiles already point
at debian:trixie, so triggering build-webview.yaml on the new tag should
produce the artifacts.

Verification (proven via real `docker buildx --platform=...` runs):
- x86 server image: full build, runs Debian 13.4 + Python 3.13.5; Django
  5.2.13, channels 4.3.1, uvicorn 0.32.1 all import.
- x86 redis image: Redis 8.0.2 on trixie.
- pi3 (linux/arm/v7 under qemu) server image: full build green — Pi
  apt sources bootstrap works, libraspberrypi0 installs from
  raspbian/firmware/armhf with /opt/vc/lib/* present.
- pi3 (linux/arm/v7 under qemu) viewer image: 147s apt layer green
  end-to-end through libpulse-dev, libgstreamer1.0-dev, libsdl2-dev,
  libpng16-16t64, etc.; build proceeds through uv-builder + main stages
  and stops only at the WebView qt5 tarball fetch (the trixie artifacts
  haven't been cut yet — that's the prereq above).
- ruff check + ruff format --check on tools/image_builder/: clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(ci): replace distutils.strtobool (3.12+ removal); satisfy SC2129

Two CI failures from the Trixie/3.13 bump fall out of stdlib & lint:

- `lib/utils.py:8` imported `from distutils.util import strtobool`,
  which is gone in Python 3.12+. mypy on 3.13 flagged it as
  import-not-found. Inline the original truthy/falsy table directly in
  `string_to_bool` so every caller keeps accepting the same
  y/yes/t/true/on/1 / n/no/f/false/off/0 set.
- actionlint/shellcheck SC2129 on `.github/workflows/docker-build.yaml`
  in the `Set Docker tag` step I added — three sequential
  `>> "$GITHUB_ENV"` redirects collapse into one `{ ...; } >> $GITHUB_ENV`
  block.
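
A sketch of what an inlined truthy/falsy table could look like. The name `string_to_bool` comes from the message above; the signature, the `bool` return type, and the error text are assumptions rather than the actual contents of `lib/utils.py`.

```python
# Accepted spellings, as listed in the commit message.
_TRUTHY = {"y", "yes", "t", "true", "on", "1"}
_FALSY = {"n", "no", "f", "false", "off", "0"}


def string_to_bool(value: str) -> bool:
    """Map the strings that strtobool accepted to a bool (case-insensitive)."""
    normalized = value.lower()
    if normalized in _TRUTHY:
        return True
    if normalized in _FALSY:
        return False
    # Assumed behaviour: reject anything else, as strtobool did.
    raise ValueError(f"invalid truth value {value!r}")
```

For instance, `string_to_bool("Yes")` returns `True` and `string_to_bool("off")` returns `False`, while an unlisted value such as `"maybe"` raises `ValueError`.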

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(security): HTTPS + SHA256-pin Pi keyring fetch; nuke libcec-dev typo

Address Copilot's review on PR 2779.

- docker/Dockerfile.base.j2 + webview/Dockerfile: switch the Pi/Raspbian
  keyring downloads (and the resulting Deb822 `URIs:` for both apt
  archives) from `http://` to `https://`. Both archives serve TLS
  cleanly today (verified with curl --proto '=https' --tlsv1.2). The
  keyring .deb is the trust anchor for everything fetched after it, so
  the .deb hash is now also pinned via `sha256sum -c -` before
  `dpkg-deb -x` extracts it — TLS alone wouldn't catch an upstream
  archive-side swap. Hashes match the
  raspberrypi-archive-keyring_2025.1+rpt1_all.deb and
  raspbian-archive-keyring_20120528.4_all.deb files served at the time
  this commit lands; bumping either filename is the signal to refresh
  the pin too.
- tools/image_builder/__main__.py: trim the trailing space from
  `'libcec-dev '` in `base_apt_dependencies`. apt is forgiving about it
  but it produces extra whitespace in the rendered Dockerfile and is
  easy to miss in diffs.
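
The hash pin in the first item above is done with `sha256sum -c -` in the Dockerfile. The same check could be sketched in Python as follows; `verify_sha256` is a hypothetical helper, and no real pin value is reproduced here.

```python
import hashlib
from pathlib import Path


def verify_sha256(path: Path, expected_hex: str) -> None:
    """Raise ValueError unless the file's SHA-256 digest matches the pin."""
    digest = hashlib.sha256(path.read_bytes()).hexdigest()
    if digest != expected_hex.lower():
        raise ValueError(f"{path}: expected {expected_hex}, got {digest}")
```

The point of checking before extraction is that a mismatch aborts the build before any key from the downloaded package is trusted.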

Verified by re-running the keyring bootstrap end-to-end on a fresh
debian:trixie linux/arm/v7 container: both .debs pass sha256sum -c, apt
update fetches over HTTPS, and libraspberrypi0 installs from
archive.raspbian.org/raspbian trixie/firmware as before.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(sonar): declare USER root explicitly in webview/Dockerfile builder

SonarCloud's docker:S6471 hotspot was already flagging this file on
master (the implicit-root warning lives on every `FROM debian:*` line
without a `USER` directive); my Trixie change shifted the original line
107 to 131 and Sonar re-emitted it as a "new in PR" finding. Resolve
with the rule's recommended escape hatch — declare the user explicitly,
which converts the implicit-default into an acknowledged choice and
silences the rule.

Both stages stay on `USER root`: the builder stage's `dpkg-deb -x` /
`dpkg --purge libraspberrypi-dev` and the runtime stage's writes to
/sysroot, /opt/vc, /root/.pyenv, /usr/local/bin all require root. This
image is a CI-local Qt 5 cross-compile builder that produces the
WebView tarball as a release artifact — it is never deployed, so the
"don't run as root" guidance behind S6471 doesn't apply in the way it
would for a published runtime image.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: fix two Copilot-flagged comment inaccuracies

- Dockerfile.base.j2: comment said libraspberrypi0 comes from
  archive.raspbian.org's `rpi` component, but the Deb822 source
  below correctly declares `Components: firmware`. Verified via
  Packages.gz on archive.raspbian.org/dists/trixie/firmware/
  binary-armhf — that's the only component shipping
  libraspberrypi0 on trixie/armhf. Comment now matches reality.

- image_builder/utils.py: Qt 5 branch comment claimed the modern
  equivalents (libgstreamer1.0-dev, libsqlite3-dev, libsrtp2-dev)
  for the dropped trixie packages were "pulled by the main viewer
  apt list above". libsqlite3-dev / libsrtp2-dev are indeed in
  that list, but libgstreamer1.0-dev is Qt 5-only and is added by
  the extend() call right below — corrected the comment to point
  there instead.

Both are pure comment changes; behavior unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* ci(webview): adopt registry-cache backend, mirror docker-build.yaml

Both Docker-build steps in build-webview.yaml had ad-hoc caching that
left the bulk of layer state on the floor:

* `build-docker-image` (Pi 1-4 / Qt 5 builder) used
  `--cache-from screenly/ose-qt-builder:latest`, which is the
  image-tag-as-cache trick — only reuses the final manifest, never the
  apt-install + Qt cross-build intermediate layers, and silently no-ops
  the first time after a Dockerfile reorder invalidates the tag.
* `compile-webview-part-2` (Qt 6 / pi5+pi4-64+x86) shipped with
  `docker compose build` and zero cache config, so every PR rebuilt the
  per-board Qt 6 builder image cold.

Switch both to BuildKit's registry cache backend, identical pattern to
docker-build.yaml's `buildx` job: cache pushed to
`ghcr.io/screenly/anthias-webview-qt5-builder:buildcache` (Qt 5) and
`ghcr.io/screenly/anthias-webview-qt6-builder:buildcache-<board>`
(Qt 6, scoped per-board because the three Dockerfiles share almost
nothing). `mode=max,image-manifest=true` because GHCR rejects the
legacy standalone-cache manifest format on `ghcr.io/screenly/*`, same
constraint that bit the main workflow.

Auth-side details:

* Both jobs gain `permissions: { contents: read, packages: write }`,
  scoped per-job so other jobs don't inherit GHCR push.
* New "Login to GitHub Container Registry" step on each, gated on
  `event_name != 'pull_request'`. Fork PRs hand out a read-only
  GITHUB_TOKEN — cache-to would 401 mid-build — so `cache-to` is
  pushed-only-on-push, while `cache-from` runs unconditionally and
  warm-starts PRs off the latest master cache once the buildcache
  package is flipped public (same convention as anthias-server etc.).

Qt 6 build step had to switch from `docker compose build` to
`docker buildx bake -f docker-compose.yml --load --set <target>.cache-*`
because compose's YAML can't carry env-var-conditional cache_to without
emitting an empty list entry that buildx rejects. To keep the
subsequent `docker compose run` happy, the three Qt 6 services in
webview/docker-compose.yml gain explicit `image:` tags
(`webview-builder-{x86,pi5,pi4-64}`) so bake's `--load` puts the image
under a name compose looks up by tag rather than rebuilding it.

The Qt 5 job's old `Set buildx arguments` step (which assembled a
quoted string in $GITHUB_OUTPUT) is gone; build args are now passed
inline in the final `docker buildx build` invocation, with no
GITHUB_OUTPUT round-trip.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(webview): trixie apt rename + adopt GHCR for Qt 5 builder image

Two intertwined fixes in webview/Dockerfile + the workflow that
publishes/consumes its image. CI never caught either because the
Docker-build step in build-webview.yaml is gated to push events, so
this Trixie-targeted Dockerfile has not yet built on master.

apt: drop the renamed-on-Trixie packages
  Stage 1 (armhf sysroot, archive.raspbian.org + deb.debian.org):
  * libgst-dev          → gone, libgstreamer1.0-dev (already listed)
                          replaces it
  * libsqlite0-dev      → gone, libsqlite3-dev (already listed) replaces it
  * libsrtp0-dev        → gone in deb.debian.org/main; libsrtp2-dev
                          (already listed) is the trixie default
  * libpng16-16         → renamed libpng16-16t64 under the time_t
                          transition; old name is fully gone
  Stage 2 (amd64 runtime/builder, deb.debian.org):
  * libpng16-16         → libpng16-16t64
  Verified by GET on
  {deb.debian.org,archive.raspbian.org,archive.raspberrypi.org}/dists/
  trixie/main/binary-{armhf,amd64}/Packages.gz: every removed name is
  MISSING, every replacement is FOUND. Without this fix the first
  master push would die in stage 1's apt-get install.

GHCR migration: screenly/ose-qt-builder → ghcr.io/screenly/anthias-...
  Move the published Qt 5 builder image off Docker Hub and into the
  same GHCR namespace as the rest of the anthias-* artifacts. New ref
  is ghcr.io/screenly/anthias-webview-qt5-builder:latest (image) +
  :buildcache (cache, set up in eadd83d1) — one repo, two tags, same
  auth flow.
  * build-docker-image: drop the Docker Hub login step, retag the
    push target to the GHCR ref via an IMAGE_REF env var.
  * compile-webview-part-1: declare permissions: { contents: read,
    packages: read }, add the GHCR login (gated on non-PR), point the
    `docker run` at the GHCR ref.
  Migration window: the GHCR package is created private on first push
  and needs to be flipped public so fork-PR runners (no GHCR auth) can
  pull. Same one-shot operational step as the existing anthias-*
  packages.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: fix second `rpi` vs `firmware` comment in image_builder

5e289198 fixed the same stale wording in docker/Dockerfile.base.j2
but missed the analogous comment block in
tools/image_builder/__main__.py — flagged by Copilot's second-pass
review.

The comment was a self-referential pointer to the apt-source bootstrap
in Dockerfile.base.j2, claiming libraspberrypi0 lives in
archive.raspbian.org's `rpi` component when in fact it ships under
`firmware` on trixie/armhf (the Deb822 entry written by the same code
correctly says `Components: firmware`). Reword to match reality and
add a note that this was verified against Packages.gz so a future
maintainer doesn't redo the lookup.

Pure comment change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* ci(webview): build Qt 5 builder inline, drop the publish job

a9b9522d migrated the Qt 5 builder image from
screenly/ose-qt-builder:latest (Docker Hub) to
ghcr.io/screenly/anthias-webview-qt5-builder:latest (GHCR), but the
publish step (`build-docker-image`) is gated to push events. On PR
runs the GHCR image therefore never exists, and the consumer
(compile-webview-part-1) blew up trying to `docker pull` it:

    Error response from daemon: Head ...manifests/latest: denied

The image is a CI-internal build artifact — only consumed by the next
step in the same workflow, never deployed, never pulled by any
external user. Publishing it as a registry artifact is just inventory
the workflow has to manage. So instead:

* Delete the `build-docker-image` job entirely.
* Move the build into compile-webview-part-1 as a step that runs on
  every event (PR + push), produces the image with `--load`, and tags
  it locally as `webview-qt5-builder:latest` for the subsequent
  `docker run` to consume.
* Keep the registry-cache backend on
  ghcr.io/screenly/anthias-webview-qt5-builder:buildcache so cold
  builds remain fast: `cache-from` always, `cache-to` only on
  push events (fork PRs have a read-only GITHUB_TOKEN and would 401
  on cache write — same gating as docker-build.yaml).

Side benefits:
* Removes the chicken-and-egg of "PR can't run because GHCR image
  doesn't exist; GHCR image only gets pushed on master".
* Drops the cross-job artifact handoff (and the auth dance to read
  the published image), so fork PRs work without any GHCR public-flip
  step.
* Two matrix runners (pi2, pi3) build in parallel from the same
  registry cache — second-onward runs hit cache for everything once
  the first push to master warms it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* ci(webview): drop registry cache plumbing, simpler is fine

eadd83d1 added BuildKit registry-cache backends to both webview build
steps; 3dc0a04a kept them when moving the Qt 5 build inline. The
caching is purely a speed optimization — none of it is load-bearing
for correctness, fork PRs can't write cache anyway, and the per-job
GHCR login + permissions block is real surface area in exchange for
saving a few minutes on warm runs.

Strip it all back out:

* compile-webview-part-1: drop the GHCR login + `permissions:
  packages: write`. The "Build Qt 5 builder image" step is a plain
  `docker buildx build --load` now — same inline-build architecture
  from 3dc0a04a, just no `--cache-from` / `--cache-to`.
* compile-webview-part-2: drop the GHCR login + `permissions:`,
  revert "Build Docker Image" from `docker buildx bake -f
  docker-compose.yml --load --set <target>.cache-*` back to plain
  `docker compose build`. COMPOSE_BAKE=true stays so compose still
  uses the bake builder under the hood — no behavior change beyond
  removing the cache flags.

webview/docker-compose.yml's explicit `image:` tags from eadd83d1
stay in place: they happen to match the compose default
(`<project>-<service>`) so plain `docker compose build` produces
the same image names the previous bake invocation did, and `compose
run` finds them either way.

Cold pi2/pi3 builds will be ~9 min on every run instead of getting
fast on warm runs. That's fine for now.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Revert "ci(webview): drop registry cache plumbing, simpler is fine"

This reverts commit 1284a5ebd9.

* chore(webview): add bin/rebuild_qt5_toolchain.sh helper

build-webview.yaml's pi2/pi3 jobs fetch a pre-built Qt 5
cross-compile toolchain from a `WebView-v*` GitHub release
(webview/build_webview_with_qt5.sh:21 pins QT5_TOOLCHAIN_TAG to
WebView-v0.3.5). The trixie-targeted tarballs
qt5-5.15.14-trixie-{pi2,pi3}.tar.gz don't exist on any release yet —
the original Trixie commit (65311092) called out cutting them as an
out-of-band prereq. Until they exist, pi2/pi3 CI fails with
`sha256sum: no properly formatted checksum lines found` because curl
falls back to a 404 HTML page on the missing .sha256 URL.

This helper produces those tarballs locally:

* Builds webview/Dockerfile (the same image CI's
  compile-webview-part-1 builds inline) once, --load only.
* Runs build_qt5.sh inside that image once per requested board (pi2
  and pi3 by default, or whichever boards are passed on the command
  line). Sequential because Qt 5 + QtWebEngine peaks at ~16
  GB RAM per build and the Linaro cross-compile toolchain extracted
  into .qt5-toolchain-build/src/ is shared between boards.
* Drops outputs at .qt5-toolchain-build/release/qt5-5.15.14-trixie-
  {pi2,pi3}.tar.gz (+ .sha256), ready to upload via
  `gh release upload`.

Idempotent: existing release/<tarball>.tar.gz short-circuits the run
for that board. ccache state is preserved across runs at
.qt5-toolchain-build/ccache/. BUILD_WEBVIEW=0 in the env skips the
bonus webview-* tarball that build_qt5.sh otherwise produces (the
Dockerfile defaults BUILD_WEBVIEW=1 so the helper inherits that
default for parity with the previous CI flow).

The .qt5-toolchain-build/ directory is intentionally hidden + at
the repo root rather than ~/tmp so it's discoverable to whoever
runs this next without grep'ing scrollback for a path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(webview): make Qt 5 cross-build Dockerfile produce working tarballs on trixie

The webview/Dockerfile in this repo wasn't actually exercised end-to-end
before — master CI uses screenly/ose-qt-builder from Docker Hub, and the
inline-build path introduced for trixie only ran build_webview_with_qt5.sh
(which downloads prebuilt qt5 toolchains). Rebuilding those toolchains for
trixie surfaced four real bugs:

* python interpreter never on PATH for non-interactive shells. The pyenv
  block only wired itself up via ~/.bashrc, which doesn't load when the
  rebuild script does `docker run /webview/build_qt5.sh`. Replace pyenv
  with apt-pinned python2.7 from archive.debian.org bullseye (trixie main
  dropped py2 entirely; bullseye archive still ships 2.7.18). Pin only
  python2.7 + its libpython runtime libs, leave everything else on trixie.
  Symlink /usr/local/bin/python -> python2.7 so QtWebEngine's
  `/usr/bin/env python` resolves.

* QtWebEngine configure silently rejected fontconfig because the sysroot
  was missing /usr/share/pkgconfig/bzip2.pc. The Dockerfile only copies
  /lib, /usr/include, /usr/lib from the builder stage; on trixie's
  libbz2-dev the .pc file lives in /usr/share/pkgconfig (arch-indep),
  so freetype2.pc's `Requires.private: bzip2` failed to resolve, which
  cascaded into fontconfig: no, which silently dropped QtWebEngine from
  the build. Add the missing COPY.

* Several QtWebEngine-required dev libs missing from the sysroot
  (libharfbuzz-dev, liblcms2-dev, libre2-dev, libxml2-dev). Same libs
  also need to be installed on the *host* runtime stage because chromium
  pdfium evaluates `harfbuzz_from_pkgconfig` in the host toolchain
  context, where Qt's host_pkg_config="/usr/bin/pkg-config" drops the
  sysroot args from chromium's pkg_config template.

* `make -j$(nproc)+2` OOMs on >8-core hosts. cc1plus under qemu-arm
  peaks at ~3-4 GB during chromium compile, so the default formula
  needs ~50 GB on a 16-core box. Make MAKE_CORES env-overridable in
  build_qt5.sh and have rebuild_qt5_toolchain.sh cap at min(nproc, 8).
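
The job-count arithmetic in the last item can be sketched as below. The real logic lives in shell (build_qt5.sh and rebuild_qt5_toolchain.sh); this version folds the env override and the cap into one hypothetical function, and the per-job memory figure is the rough 3-4 GB estimate from the message, not a measurement.

```python
import os


def make_jobs(nproc: int, cap: int = 8) -> int:
    """Pick a `make -j` value: MAKE_CORES if set, else min(nproc, cap)."""
    override = os.environ.get("MAKE_CORES")
    if override:
        return int(override)
    return min(nproc, cap)


def peak_memory_gb(jobs: int, gb_per_job: float) -> float:
    """Rough worst case: every job is a compiler process at its peak."""
    return jobs * gb_per_job


# Old formula on a 16-core host: nproc + 2 = 18 jobs at about 3 GB each.
print(peak_memory_gb(16 + 2, 3))  # 54
```

With the cap, the same host runs 8 jobs, for roughly 24 GB at 3 GB per job.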

Also: -webengine-proprietary-codecs in the configure args so the
resulting QtWebEngine supports H.264/AAC/MP3 (matches what Debian
qt6-webengine ships).

Verified on a 16-core/22GB+32GB-swap host: produces
qt5-5.15.14-trixie-{pi2,pi3}.tar.gz (88M, 98M) with 251 webengine entries
each, plus the matching webview-*.tar.gz apps.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(webview): bump QT5_TOOLCHAIN_TAG to WebView-v2026.04.1

Trixie qt5-5.15.14-trixie-{pi2,pi3} toolchain tarballs are published on
the new WebView-v2026.04.1 release; the previous WebView-v0.3.5 only
ships the bookworm tarballs and is now unreachable for trixie pi2/pi3 CI.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(webview): refresh stale tag reference in rebuild_qt5_toolchain.sh hint

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(ci): pass full SHA for GIT_HASH; keep short SHA only in GIT_SHORT_HASH

Both `.github/workflows/build-webview.yaml` and `bin/rebuild_qt5_toolchain.sh`
were populating the GIT_HASH build arg with the *short* hash, making
GIT_HASH and GIT_SHORT_HASH identical and stripping the unambiguous
SHA needed by `lib/diagnostics.py:os.getenv('GIT_HASH')` for downstream
traceability. Pass `git rev-parse HEAD` for GIT_HASH and reserve
`--short HEAD` for GIT_SHORT_HASH (which is already what
`tools/image_builder/__main__.py` does for the main service images).

Caught in Copilot review of #2779.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(docker): exclude Qt 5 toolchain build dir + caches from COPY

The viewer image's `COPY . /usr/src/app/` was slurping in 1.6 GB of
local Qt 5 cross-build state (`.qt5-toolchain-build/`) plus 69 MB of
`.mypy_cache/`, inflating every viewer/server image by ~1.7 GB even
though the build needs none of it. Add those plus `.ruff_cache`,
`.idea`, `.cursor`, `.claude`, `.cache`, and tighten the existing
`*.git` / `*.github` globs (which match files ending in `.git` /
`.github` but not the directories themselves on most matchers) to
the literal directory names.

Caught while validating the trixie 5-board matrix: x86 viewer was
6.28 GB and pi5 viewer 2.23 GB; both had the same 1.76 GB COPY layer
that's mostly `.qt5-toolchain-build/`. Fixed image should be ~5 MB
for COPY and ~1.5 GB for the viewer overall.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 18:30:59 +01:00
Viktor Petersson
93e5501847 ci: enforce strict mypy across the codebase (#2752)
* ci: enforce strict mypy across the codebase

Add a Python mypy job that runs on every PR (`.github/workflows/python-mypy.yaml`) and
the supporting type annotations to make `strict = true` pass cleanly across all 88
source files.

Configuration choices:
* `strict = true` with no global relaxations (no `disallow_subclassing_any = false`,
  no `disallow_untyped_decorators = false`, no `disable_error_code`,
  no `ignore_missing_imports`).
* `follow_imports = "normal"`.
* django-stubs + djangorestframework-stubs plugins; django_stubs_ext.monkeypatch()
  in settings so generic Django classes are subscriptable at runtime.
* Local stubs in `stubs/` for libraries that ship incomplete type info
  (drf_spectacular's view methods, redis-py's sync API).
* A scoped `[[tool.mypy.overrides]]` block lists 7 third-party libs without any
  type info (cec, geventwebsocket, hurry.filesize, pydbus, sh, splinter, vlc) so
  future stub releases will be noticed instead of silently ignored.

The two `# type: ignore` escape hatches that previously existed are gone:
`lib/utils.py` now imports `mplayer` from `sh` properly, and `tests/test_settings.py`
patches `os.getenv` via `mock.patch.object` instead of direct assignment.
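
For illustration, the `mock.patch.object` approach might look like the
sketch below. `read_home` and `fake_getenv` are hypothetical names, not
code from this repo.

```python
import os
from typing import Optional
from unittest import mock


def read_home() -> Optional[str]:
    """Hypothetical helper standing in for code that reads HOME."""
    return os.getenv('HOME')


def fake_getenv(key: str, default: Optional[str] = None) -> Optional[str]:
    # Replacement that is only active while the patch is applied.
    return '/tmp/fake-home' if key == 'HOME' else default


original_getenv = os.getenv

# patch.object swaps the attribute and restores it on exit, so no
# direct `os.getenv = ...` assignment (and no type-ignore) is needed.
with mock.patch.object(os, 'getenv', side_effect=fake_getenv):
    patched_home = read_home()

restored = os.getenv is original_getenv
```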

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* ci: fix mypy CI runtime deps and unused HttpResponse import

* `lib/auth.py`: drop the local `HttpResponse` import; only the string-form
  annotation needs it and the runtime `isinstance` check only uses `HttpRequest`
  (caught by ruff's F401).
* Add a `mypy` dependency group (extends `dev-host` with the runtime deps that
  `anthias_django.settings` touches) so the django-stubs plugin can introspect
  the app registry on a fresh CI runner. Skips the heavy native-extension deps
  (cec, netifaces, etc.) that aren't needed for type checking.
* python-mypy workflow: install via `uv pip install --group mypy` (consistent
  with other workflows) and create a dummy `~/.screenly/screenly.conf` so the
  settings module's first-import side effect succeeds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(security): migrate password hashing to PBKDF2 (CodeQL py/weak-sensitive-data-hashing)

CodeQL flagged the SHA256-based password hashing in lib/auth.py and api/views/v2.py
as a weak password hash (SHA256 is a fast hash, unsuitable for password storage).

Switch new passwords to Django's `make_password` (PBKDF2-SHA256, 600k iterations
by default in Django 4.2). Add `hash_password` / `verify_password` helpers in
lib/auth.py that:

* hash with PBKDF2 for any new password write,
* verify both Django-format hashes and legacy bare-SHA256 hex digests so existing
  config files still authenticate,
* opportunistically re-hash legacy SHA256 entries to PBKDF2 on a successful
  login, phasing out the weak format over time.

`settings.py` auto-migration now uses `hash_password` for first-load plaintext
passwords and recognises both legacy SHA256 and Django algorithm-prefixed
formats so it doesn't double-hash an already-stored hash.

Also fixes the stale ruff format check (5 files reformatted by `ruff format`)
that was breaking the python-lint job.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: drop legacy SHA256 password verify and add docker-image-builder to mypy group

CodeQL kept flagging the SHA256 fallback path in `lib/auth.py:verify_password`
as `py/weak-sensitive-data-hashing`, even though it was scoped to a one-time
migration. Rather than suppress the alert, drop the legacy verify entirely so
no password material ever flows into hashlib.sha256.

Migration of existing installs now happens at settings load time:
* `settings.py` detects a 64-char hex (legacy SHA256) password hash on read
  and clears both the `password` field and `auth_backend`. The device stays
  reachable (no auth required) and the operator must re-set credentials via
  the web UI. A clear warning is logged so the change is visible.
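
A rough sketch of the detection step, assuming the config is available
as a plain dict. `is_legacy_sha256` and `migrate_auth` are hypothetical
names, and the real `settings.py` logic may differ.

```python
import re

# A bare SHA256 hex digest is exactly 64 hex characters.
LEGACY_SHA256_RE = re.compile(r'[0-9a-fA-F]{64}')


def is_legacy_sha256(stored: str) -> bool:
    """Return True if `stored` looks like a bare SHA256 hex digest."""
    return LEGACY_SHA256_RE.fullmatch(stored) is not None


def migrate_auth(config: dict[str, str]) -> bool:
    """Clear legacy credentials in place; return True if changed."""
    if is_legacy_sha256(config.get('password', '')):
        config['password'] = ''
        config['auth_backend'] = ''
        return True
    return False
```

Django-format hashes carry an algorithm prefix and `$` separators, so
they never match the 64-hex pattern and are left untouched.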

Also fix the python-mypy CI job: include the `docker-image-builder` group in
the `mypy` group so `pygit2` and `python_on_whales` (imported by
`tools/image_builder/__main__.py`) resolve.

BREAKING: any Anthias install with a SHA256-format password in
`screenly.conf` will have basic auth disabled on first start of this version.
The operator must log in (no password required) and set a new password via
the UI to re-enable basic auth.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: address review feedback (Copilot + manual review)

Critical:
* settings.py — `_get` previously called `hash_password()` at module-import
  time when it found a plaintext password. That imports Django's password
  hashers, which raises `ImproperlyConfigured` if reached before
  `django.setup()` (e.g. when `viewer/__init__.py` imports `settings`).
  Drop the auto-hash entirely and treat plaintext the same as legacy
  SHA256: clear it, disable basic auth, log an `error`-level message, and
  persist the cleaned state via `_needs_save_after_load` so the message
  doesn't repeat on every load.
* lib/backup_helper.py — `recover()` called `tar.extractall()` on an
  uploaded archive. A crafted backup with `../` paths or symlinks could
  overwrite arbitrary files under HOME. Add `_safe_extract()` that
  validates each member's resolved path stays inside the destination and
  rejects unsafe symlinks/hardlinks.
* api/helpers.py — `custom_exception_handler()` discarded DRF's default
  response and always returned 500, breaking 4xx propagation for
  `ValidationError`/`NotFound`/etc. Return DRF's response when present;
  fall back to 500 only when it's None.
* tests/test_settings.py — `broken_settings_should_raise_value_error`
  was missing the `test_` prefix and never actually ran. Renamed.
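
The path check described for `lib/backup_helper.py` can be sketched
roughly as follows. This is an assumption-laden illustration, not the
repo's `_safe_extract`: `is_within_directory` and `unsafe_members` are
hypothetical names, and the sketch only reports offending members
instead of extracting anything.

```python
import os
import tarfile


def is_within_directory(base: str, target: str) -> bool:
    """Return True if `target` resolves to a path inside `base`."""
    base_real = os.path.realpath(base)
    target_real = os.path.realpath(target)
    return os.path.commonpath([base_real, target_real]) == base_real


def unsafe_members(tar: tarfile.TarFile, dest: str) -> list[str]:
    """List members that are links or would land outside `dest`."""
    rejected = []
    for member in tar.getmembers():
        target = os.path.join(dest, member.name)
        if member.issym() or member.islnk():
            rejected.append(member.name)
        elif not is_within_directory(dest, target):
            rejected.append(member.name)
    return rejected
```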

API correctness:
* api/views/mixins.py — replace bare `raise Exception(...)` for missing
  uploads / wrong file extensions / missing asset URI with DRF's
  `ValidationError` / `NotFound`, so callers get proper 4xx with a
  structured error body instead of a 500.
* anthias_app/helpers.py, api/serializers/{mixins,v1_1}.py — replace
  `assert video_duration is not None` with explicit raises. `assert` is
  stripped under `python -O`, which would silently turn the next call
  into an `AttributeError` on None.
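
As a small illustration of the `assert` replacement, assuming a
hypothetical `DurationError` and `require_duration` (the commit does not
name the exception used at this point):

```python
from typing import Optional


class DurationError(ValueError):
    """Hypothetical error for an undeterminable video duration."""


def require_duration(video_duration: Optional[float]) -> float:
    # Unlike `assert video_duration is not None`, this check still
    # runs when Python is started with -O.
    if video_duration is None:
        raise DurationError('could not determine video duration')
    return video_duration
```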

Typing/stubs:
* host_agent.py — pass `decode_responses=True` to both `redis.Redis()`
  constructions and switch the channel name + command map keys from
  bytes to str. The local redis stub assumes decoded responses, and
  host_agent was the only caller violating that invariant.
* lib/auth.py — document the `R | HttpResponse` return-type collapse
  for DRF Response (mirrors Django's @login_required).
* api/serializers/__init__.py — comment why `UpdateAssetSerializer`'s
  fields are widened to `Field[Any, Any, Any, Any]` (so v2's overrides
  with different field types don't trip [assignment]).

Tests/CI:
* api/tests/test_v2_endpoints.py — also assert the stored hash starts
  with `pbkdf2_sha256$` so a regression to a weaker hasher is caught
  even if `verify_password()` itself were broken.
* .github/workflows/python-mypy.yaml — write a minimal valid
  `screenly.conf` (with section headers) instead of an empty file. The
  empty-file path only worked by accident of `_get`'s defensive
  try/except.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(test): force re-import of `settings` in fake_settings()

The `fake_settings()` test helper only deleted `sys.modules['settings']`
*after* the yield (i.e. on clean exit). Once a prior test imported
`settings` cleanly, subsequent `import settings` calls returned the cache
without re-instantiating `AnthiasSettings`, so the fixture's config file
was silently ignored.

This was hidden because `test_broken_settings_should_raise_value_error`
was missing the `test_` prefix and never ran. After renaming it in the
previous commit it now runs and exposes the bug.

Pop the module before import (and again in `finally`) so each test gets
a fresh `AnthiasSettings()` instance bound to the fixture file.
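
The pop-before-import pattern, reduced to the module-cache part, might
look like this sketch. `fresh_module` is a hypothetical helper; the real
`fake_settings()` also writes the fixture config file, which is omitted
here.

```python
import contextlib
import importlib
import sys
from collections.abc import Iterator
from types import ModuleType


@contextlib.contextmanager
def fresh_module(name: str) -> Iterator[ModuleType]:
    """Import `name` from scratch and drop it from the cache on exit."""
    # Pop before importing so a copy cached by an earlier test is
    # not silently reused.
    sys.modules.pop(name, None)
    try:
        yield importlib.import_module(name)
    finally:
        # Pop again so a failing test cannot leak its copy either.
        sys.modules.pop(name, None)
```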

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: address Copilot review feedback

- backup_helper: use dedicated BackupRecoverError instead of bare
  Exception so callers can map archive failures to 4xx responses.
- mixins: catch BackupRecoverError / TarError in RecoverViewMixin and
  re-raise as DRF ValidationError so bad uploads return 400, not 500.
- utils: drop top-level `from sh import ffprobe, mplayer`; resolve
  binaries lazily with sh.Command at call sites so a missing tool
  doesn't break module import.
- host_agent: import redis ConnectionError explicitly (mypy strict).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(ci): unblock mypy and ruff format; sanitize recover error

- stubs/redis-stubs: export ConnectionError so host_agent's
  retry_if_exception_type(redis.ConnectionError) typechecks under the
  isolated mypy CI env (no real redis package installed).
- host_agent.py: switch to redis.ConnectionError (matches new stub).
- lib/utils.py: ruff format fix for the lazy sh.Command(...) call.
- api/views/mixins.py: don't echo backup-recover exception text into
  the API response (CodeQL py/stack-trace-exposure). Log server-side
  and return a generic 'Invalid backup archive.' message instead.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(security): generate server-side name for backup recovery upload

Don't pass the client-supplied `file_upload.name` into `path.join('static', ...)`.
A crafted name (path separators, leading `../`, or an absolute path) could
write outside the static directory before the tarball is parsed. Use a
UUID-named `.tar.gz` instead — the original filename is never needed
again after the content-type check.
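
A minimal sketch of the server-side naming, assuming a hypothetical
`backup_upload_path` helper (the exact UUID form used in the view is not
stated in this message):

```python
import uuid
from os import path


def backup_upload_path(static_dir: str = 'static') -> str:
    """Build a destination path that ignores the client's filename."""
    # uuid4().hex contains only hex digits, so no path separators or
    # `..` segments can reach path.join().
    return path.join(static_dir, f'{uuid.uuid4().hex}.tar.gz')
```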

Addresses suppressed Copilot review comment on api/views/mixins.py.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: address Copilot review round 2

- lib/utils.py: catch sh.CommandNotFound in get_video_duration (return
  None) and url_fails (skip streaming probe) so missing ffprobe/mplayer
  don't surface as 500s.
- lib/auth.py: decode basic-auth credentials with partition(':') so
  passwords containing ':' are accepted (RFC 7617).
- lib/backup_helper.py: tighten _safe_extract — reject non-regular tar
  members (symlinks, hardlinks, device nodes, FIFOs) and extract members
  individually after validation instead of calling extractall().
- tests/test_backup_helper.py: add coverage for path traversal,
  absolute-path, symlink, and FIFO rejection in _safe_extract.
- settings.py: fail fast in AnthiasSettings.__init__ when HOME is unset
  instead of silently rooting all config/state paths at the cwd.
- api/serializers/v1_1.py: raise ValidationError when 'duration' is
  missing/invalid for non-video assets instead of an unhandled KeyError.
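
To illustrate the `lib/auth.py` item above: splitting on the first `:`
only keeps any further colons in the password. A sketch with a
hypothetical `decode_basic_auth` helper, not the repo's actual function:

```python
import base64


def decode_basic_auth(header_value: str) -> tuple[str, str]:
    """Split a `Basic <base64>` header value into (user, password)."""
    scheme, _, encoded = header_value.partition(' ')
    if scheme.lower() != 'basic':
        raise ValueError('not a Basic authorization header')
    decoded = base64.b64decode(encoded).decode('utf-8')
    # partition() splits on the first ':' only, so later colons
    # remain part of the password.
    username, _, password = decoded.partition(':')
    return username, password
```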

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(test): mark tar fixture/extract calls as NOSONAR

The new SafeExtractTest builds and opens single-member tar archives
specifically to test the safe-extract guard. SonarCloud flags any
tarfile.open(...) call (rule python:S5042). Annotate the two test-
fixture call sites with NOSONAR — same pattern used elsewhere in this
repo (e.g. host_agent.py).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* style(ci): use 6-space step indentation in python-mypy.yaml

Match the majority style used by the other workflows in this repo.
The previous 4-space indent under `steps:` was valid YAML and ran
fine in CI, but consistency with build-webview.yaml,
build-balena-disk-image.yaml, deploy-website.yaml, etc. makes the
file easier to read alongside the rest.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: address Copilot review round 3

- api/serializers/v1_1.py: raise DRF ValidationError instead of
  ValueError when video duration can't be determined, so clients get
  a 4xx with a field-level error rather than a 500.
- api/views/mixins.py: ensure the uploaded backup tarball is removed
  in all paths (success and failure). recover() only deletes on
  success; without explicit cleanup, rejected archives — including
  attacker-controlled ones — accumulate under static/.
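
The cleanup guarantee in the second item can be sketched with
`try`/`finally`. `recover_and_cleanup` is a hypothetical wrapper, and
`recover` is passed in as a callable so the sketch stays self-contained:

```python
import os
from collections.abc import Callable


def recover_and_cleanup(
    archive_path: str, recover: Callable[[str], None]
) -> None:
    """Run `recover`, then delete the archive even if it raised."""
    try:
        recover(archive_path)
    finally:
        # recover() may already have removed the file on success.
        if os.path.exists(archive_path):
            os.remove(archive_path)
```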

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* style: ruff format tests/test_backup_helper.py

Rejoin a single line that fits within 79 chars but that I had broken
across three lines during the merge resolution.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: address Copilot review round 4

* Add partial PEP 561 stubs in `stubs/channels-stubs/` covering the
  `channels` API surface we actually use: `AsyncWebsocketConsumer`
  (with `as_asgi`), `ProtocolTypeRouter`, `URLRouter`,
  `AllowedHostsOriginValidator`, and `get_channel_layer`. Drop the
  `# type: ignore[misc]` from `AssetConsumer` and remove `channels.*`
  from the mypy `ignore_missing_imports` overrides — the override
  block is back to listing only libs that genuinely ship no type info.

* Delete `_safe_extract` and `_is_within_directory` from
  `lib/backup_helper.py`, plus the corresponding `SafeExtractTest` and
  `_build_archive_with` from `tests/test_backup_helper.py`. After the
  master merge, `recover()` validates each member through
  `_safe_tar_member` (skip-on-fail) and extracts inside the loop, so
  the older raise-on-fail helper became dead code with no caller.
  `RecoverLegacyTarballTest::test_recover_skips_path_traversal_member`
  already exercises the path-traversal guard end-to-end.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: address Copilot review round 5

* `celery_tasks.py:cleanup()` — bail out with a logged error when HOME
  is unset instead of `path.join('', 'anthias_assets')` resolving to a
  relative path. The `find ... -delete` was never going to run on a
  Pi without HOME, but the silent fallback would have chewed through
  the celery worker's cwd if it ever did.

* `api/serializers/v1_1.py:CreateAssetSerializerV1_1.prepare_asset()` —
  fix the video duration handling so non-zero and omitted durations
  are no longer dropped on the floor. Previously only the magic
  `duration == 0` case set `asset['duration']`; a client posting
  `duration=30` for a video produced a serialized dict with no
  duration key, then the view did `Asset.objects.create(**data)` and
  the row got the field's default. Now: missing or 0 means "infer
  from the file via get_video_duration()"; any other integer is
  persisted as-is. Non-int values raise ValidationError up front.
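
The duration rule above, sketched under assumptions:
`resolve_video_duration` is a hypothetical function, `probe` stands in
for `get_video_duration()`, and `ValueError` stands in for DRF's
`ValidationError`. Only `int` instances are accepted here; how the real
serializer treats numeric strings is not stated.

```python
from typing import Callable, Optional


def resolve_video_duration(
    raw: object, probe: Callable[[], Optional[int]]
) -> int:
    """Pick the duration to persist for a video asset."""
    if raw is not None:
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(raw, bool) or not isinstance(raw, int):
            raise ValueError('duration must be an integer')
        if raw != 0:
            # Any non-zero integer is persisted as-is.
            return raw
    # Missing or 0: infer the duration from the file.
    inferred = probe()
    if inferred is None:
        raise ValueError('could not determine video duration')
    return inferred
```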

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 14:53:37 +01:00