Anthias/.github/workflows/python-lint.yaml
Viktor Petersson 470a37caf4 chore(host): unify host Python install on uv, clean up ansible roles (#2750)
* chore(ansible): fix all ansible-lint violations and remove skip list

Drives the 19 deferred violations from the previous skip list to zero
and deletes .ansible-lint. The roles now pass the ansible-lint
'production' profile (previously 'min').

- var-naming[no-role-prefix] (17): rename register/set_fact vars in
  network/screenly/splashscreen/system roles to use the role's prefix
  (e.g. config_path -> system_config_path, x_service -> screenly_x_service).
- risky-shell-pipe (1): add 'set -o pipefail' + bash executable to the
  /etc/timezone shell task in system role.
- no-free-form (1): switch swapoff to keyword form (cmd:/removes:).

Also resyncs uv.lock with pyproject.toml's ansible-core==2.19.9 pin
(drift left over from #2749).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(ansible): fix bugs, simplify, drop Debian <= 7 support

Bug fixes
---------
- Fix /etc/sudoers.d/screenly_overrides: was copied verbatim, so the
  literal "${USER}" was written into sudoers (sudo doesn't expand it)
  and the rule pointed at the renamed upgrade_screenly.sh. Convert to
  a Jinja template using {{ anthias_user }}, point at upgrade_anthias.sh,
  and add `validate: visudo -cf %s` so a syntax error blocks install.
- Fix double-prefix var name screenly_screenly_x_service_exists left over
  from the lint-cleanup replace_all cascade.

Simplification
--------------
- site.yml: lift env('USER') into anthias_user play var; assert
  USER + DEVICE_TYPE are set/valid in pre_tasks. Replace ~30 inline
  lookups across roles with {{ anthias_user }}.
- system role: drop `lsb_release -cs` and `getconf LONG_BIT` shellouts;
  use ansible_distribution_release and ansible_userspace_bits facts.
  Collapse two near-duplicate "add user to docker group" tasks into one
  task using `append: true` plus a conditional gpio entry for ARM.
- system role: replace the /etc/timezone shell pipe with a non-shell
  `command: readlink` + `copy: content:` pair (no pipefail dance needed).
- system role: drop the unconditional `rpi-update` install on ARM; that
  package ships experimental kernels and shouldn't be run unattended.
- screenly role: move anthias-host-agent.service template from the
  non-standard tasks/templates/ to the conventional roles/<role>/templates/.

Debian 11 path
--------------
- Bump pinned versions in the Debian 11 pip branch to current release
  where Python 3.9 still supports them (ansible-core 2.15.13, redis
  7.4.0, requests 2.33.1, tenacity 9.1.2, getmac 0.9.5). Drop the
  unused docker==6.0.0 pin. Comment why ansible-core stays on 2.15.x.

Drop Debian <= 7
----------------
- splashscreen role: remove four Jessie/Wheezy tasks and the
  `ansible_distribution_major_version|int > 7` guards on every
  remaining task. Delete unused files/asplashscreen.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(ansible): normalize *_exist to *_exists for consistency

system_cdefs_exist was the lone singular outlier among
network_manager_exists / screenly_x_service_exists.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(systemd): harden anthias-host-agent and wifi-connect units

Both units now follow modern systemd best practice:
- explicit Type= (simple / oneshot)
- explicit dependency on docker.service via Requires= (was just After=)
- Documentation= URL + SyslogIdentifier= for journalctl filtering
- structured sandboxing: PrivateTmp, PrivateDevices, ProtectSystem=full,
  ProtectHome=read-only, ProtectKernel{Tunables,Modules,Logs},
  ProtectControlGroups, ProtectClock, ProtectHostname, ProtectProc=invisible,
  RestrictRealtime, RestrictSUIDSGID, RestrictNamespaces,
  RestrictAddressFamilies (per-service: AF_UNIX for wifi-connect,
  AF_UNIX/AF_INET/AF_INET6 for host-agent), LockPersonality, UMask=0027.

wifi-connect (no privilege escalation, just talks to docker.sock)
goes further with NoNewPrivileges, CapabilityBoundingSet=,
SystemCallFilter=@system-service minus @privileged/@resources, and
Type=oneshot + RemainAfterExit so systemd records "active (exited)"
after `docker compose up -d` returns.

host-agent is left more permissive because host_agent.py shells out
to `sudo systemctl reboot|poweroff` which needs setuid + CAP_SYS_BOOT
+ the reboot() syscall — those would all be blocked under the tighter
profile. The unit calls this out in a comment.

Also makes start_wifi_connect_service.sh executable so the unit can
invoke it directly instead of via a `bash` wrapper.

systemd-analyze security score:
  anthias-host-agent: 7.5 EXPOSED -> 4.9 OK
  wifi-connect:       7.3 MEDIUM  -> 1.1 OK

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(host): bootstrap installer_venv with uv, drop requirements.host.txt

uv manages its own Python independent of the system Python, which lets
both Debian 11 and Debian 12+ use the same dependency set from
pyproject.toml's `host` group. The install flow no longer needs:

- the requirements/requirements.host.txt frozen snapshot
- the Debian 11 ansible-core==2.15.x special case in install.sh
- the parallel Debian 11 / Debian 12+ pip tasks in the screenly role
- the python3-pip / python3-venv / python3-full apt packages
- the cryptography==38.0.1 wheel-build workaround
- the python-lint.yaml drift check on requirements.host.txt
- the long-stale `supervisor` pip-removal migration tasks

bin/install.sh changes:
- new `clone_repo` step (runs before install_ansible) so we have
  pyproject.toml in place
- install_ansible now: curl|sh the official uv installer, then
  `uv sync --no-default-groups --group host --no-install-project`
  with UV_PROJECT_ENVIRONMENT=/home/$USER/installer_venv
- run_ansible_playbook uses the venv's ansible-playbook directly
  instead of relying on PATH activation

ansible/roles/screenly/tasks/main.yml: replace both pip tasks with a
single `uv sync` task (run as the anthias user, with
UV_PROJECT_ENVIRONMENT pointed at installer_venv) so the venv stays
in sync if ansible-playbook is rerun standalone.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(install): replace pre-uv installer_venv on upgrade

Existing Anthias hosts have an installer_venv created by
`python3 -m venv`, which uv won't recognize on upgrade. Detect by the
absence of the `uv = <version>` line in pyvenv.cfg and rm -rf it so
`uv sync` rebuilds cleanly.
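
A minimal sketch of the marker check, written in Python for
illustration only (the real detection lives in bin/install.sh and is
not reproduced here). The name has_uv_marker is hypothetical; the
sketch assumes uv records itself in pyvenv.cfg as a `uv = <version>`
key/value line, as described above.

```python
def has_uv_marker(pyvenv_cfg_text: str) -> bool:
    """Return True if the pyvenv.cfg text contains a `uv = <version>` line.

    Hypothetical helper: mirrors the described check, not install.sh itself.
    """
    for line in pyvenv_cfg_text.splitlines():
        key, sep, value = line.partition("=")
        # Require an actual key/value pair whose key is exactly "uv".
        if sep and key.strip() == "uv" and value.strip():
            return True
    return False


# A venv created by `python3 -m venv` has no uv key.
legacy_cfg = "home = /usr/bin\ninclude-system-site-packages = false\nversion = 3.11.2\n"
# A venv created by uv carries the marker line.
uv_cfg = "home = /opt/python/bin\nuv = 0.9.17\nversion_info = 3.11.9\n"

print(has_uv_marker(legacy_cfg), has_uv_marker(uv_cfg))
```

A venv for which the check returns False is the one that would be
removed so that `uv sync` can rebuild it.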

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(systemd): allow AF_NETLINK in anthias-host-agent

host_agent.py uses the netifaces C extension which opens AF_NETLINK
(NETLINK_ROUTE) sockets to enumerate interfaces during the
set_ip_addresses pubsub command. The previous RestrictAddressFamilies
list (AF_UNIX/AF_INET/AF_INET6) blocked this with EAFNOSUPPORT.

Verified by tracing netifaces under strace and by running a
test service unit with the same restrictions.

Audit: only three pubsub commands ever target the hostcmd channel —
reboot, shutdown, set_ip_addresses. CEC lookups happen inside the
anthias-celery Docker container, not on the host.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(install): apply audit fixes from review

- Bug: upgrade_containers.sh was always pulled from master regardless
  of the user's selected ref, so a tagged install got master's upgrade
  script. Now uses ${BRANCH}.
- Pin uv to UV_PIN_VERSION (=0.9.17, matching docker/uv-builder.j2)
  and reinstall if the local uv is a different version. Avoids drift
  between the host bootstrap and the docker image build.
- Apply --ask-become-pass on every arch when the NOPASSWD sudoers
  file is missing (was x86_64 only — could hang on Pi if the sudo
  timestamp expired mid-playbook). Also tell the user a prompt is
  coming.
- Drop the stale "Please reboot and run upgrade" branch in
  post_installation; both branches told the user to reboot anyway.
  Add an SSH-detection notice so the user knows their session will
  drop on reboot.
- Use git -C in write_anthias_version so it doesn't rely on cwd.
- Drop redundant `-e` from shebang (set -euo pipefail two lines down
  is stricter).
- Drop `apt update -y` (-y is meaningless for update); standardize on
  apt-get.
- Quote ${USER} interpolations everywhere; replace `A && B || C`
  pseudo-if-else with proper if-then-else (shellcheck SC2015);
  fix array expansion `${VERSION_PROMPT}` -> `${VERSION_PROMPT[*]}`
  (SC2128); split `local FOO=$(...)` into separate decl/assign
  (SC2155). shellcheck now clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(ansible): drop dead migration helpers and split system role

Migration helpers
-----------------
Dropped tasks that only existed to clean up cruft from pre-Anthias
(Screenly OSE, mid-2024-) installs. They have been idempotent no-ops
on every modern install for many releases.

screenly role:
- Remove screenly_utils.sh (renamed long ago)
- Remove cron entry "Cleanup screenly_assets" (state: absent of a
  cron job no Anthias release has created)
- Remove old upgrade_screenly.sh (renamed to upgrade_anthias.sh)
- Remove screenly_usb_assets.sh and the autoplay udev rule
- Remove plymouth-quit-wait.service / plymouth-quit.service deletion
  (Plymouth handling is now done by the splashscreen role)
- Drop the entire screenly_deprecated_systemd_units list and the
  three tasks that used it (X.service, screenly-celery,
  screenly-web, screenly-websocket_server_layer, screenly-viewer,
  matchbox, screenly-host-agent, udev-restart, wifi-connect — all
  deprecated unit names from the Screenly era)
- Remove the ngrok binary cleanup

network role:
- Drop the screenly_net_mgr.py / screenly_net_watchdog.py removal
  pipeline (the whole stat -> set_fact -> stop -> rm chain). Modern
  installs use NetworkManager directly.

upgrade-script integrity
------------------------
Documented why /usr/local/sbin/upgrade_anthias.sh is fetched from
GitHub without a checksum: the URL is meant to track upstream master
so users can pull in fixes without reinstalling, and integrity is
bounded by HTTPS to githubusercontent. If we ever ship signed
release assets, we should switch to fetching the signed asset.

system role split
-----------------
~370-line system/tasks/main.yml split into focused includes:
  - boot.yml         — /boot/{config,cmdline}.txt edits
                       (raspberry-pi + touches-boot-partition tags
                        applied at the include level)
  - packages.yml     — libc6-dev cdefs.h fix, Anthias apt deps,
                       and removal of pre-Anthias / distro-Docker
                       packages
  - docker.yml       — Docker repo, install, and user-to-docker-group
  - timezone.yml     — /etc/timezone backfill from /etc/localtime
  - dist_upgrade.yml — apt upgrade dist + autoremove
                       (system-upgrade tag at the include level)
  - misc.yml         — rc.local, dpkg 01_nodoc, swap removal
main.yml is now the orchestrator. Tags moved from per-task to per-include.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(install): pin upgrade_anthias.sh URL to the installed ref

The Ansible task that wrote /usr/local/sbin/upgrade_anthias.sh
hardcoded raw.githubusercontent.com/.../master/bin/install.sh. So a
tag-pinned install (e.g. v0.20.4) silently received master's
install.sh as its upgrade entry point, defeating the version pin.

The mirror bug in install.sh::upgrade_docker_containers (which fetched
upgrade_containers.sh from master regardless of selected ref) was
fixed earlier in this PR; this commit closes the same gap on the
upgrade-script side.

- site.yml: add `anthias_branch` play var (env ANTHIAS_BRANCH,
  defaulting to 'master' if unset for safety on standalone
  ansible-playbook runs); include it in the pre_tasks assertion.
- screenly role: render the URL with `{{ anthias_branch }}`. Reword
  the comment + task name so they actually match the behaviour.
- install.sh::run_ansible_playbook: export ANTHIAS_BRANCH=${BRANCH}
  so the playbook can pick up the user's selected ref.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(ansible): suppress sonarcloud S2612 on intentional system modes

SonarCloud's ansible:S2612 ("granting access to others") fires on
seven file/directory modes in the new system role task files. Each
matches a Debian/Raspbian convention and the file *needs* to be
world-readable to function:

- /etc/timezone (0644) — libc/locale paths read it
- /etc/dpkg/dpkg.cfg.d/01_nodoc (0644) — dpkg honours it for any caller
- /etc/apt/keyrings (0755) and /etc/apt/keyrings/docker.asc (0644) —
  per Debian's own apt-secure docs; the _apt user reads them
- /etc/rc.local (0755) — systemd-rc-local-generator runs it
- cmdline.txt and its .orig backup (0755) — Pi imager / NOOBS / pi-config
  ship 0755; deviating breaks raspi-config tooling

Suppressing per-line with `# NOSONAR` and a one-liner reason rather
than weakening the modes (which would actually break the system).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(install): point ansible at the venv python; pre-create netdev group

Two issues found by running bin/install.sh end-to-end against a
privileged Debian 12 container with systemd + DinD.

1. Ansible needs Python on the target. Since this PR drops the system
   python3 install in favour of uv-managed Python under installer_venv,
   gather_facts (and every subsequent module) failed with:
       "/usr/bin/python3 not found"
   Fix: export ANSIBLE_PYTHON_INTERPRETER=$installer_venv/bin/python
   in install.sh::run_ansible_playbook so Ansible uses the venv we
   just provisioned with uv sync.

2. The `netdev` group is created by network-manager, but Anthias can
   be installed with MANAGE_NETWORK=No, in which case the package
   isn't pulled in and the group doesn't exist. The user-membership
   task then failed with "Group netdev does not exist". Pre-create
   it with `ansible.builtin.group` (system:true) so the membership
   task is safe regardless of MANAGE_NETWORK.

Both bugs were latent on Raspberry Pi OS (which always pre-installs
NetworkManager) but bite on minimal Debian 12 / x86 images.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(ansible): pre-create all required groups, not just netdev

Previous fix only handled netdev. Minimal Debian/x86 images can also
be missing input, plugdev, video, dialout (anything that udev or
desktop-package postinsts would have created). Pre-create the whole
list with ansible.builtin.group + state: present.

Also dedupe the base/pi group lists between the create task and the
membership task using YAML anchors so they can't drift.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(install): use curl instead of wget; add gettext-base

End-to-end test in a privileged Debian 12 container surfaced two
post-Ansible-playbook bash issues:

- upgrade_docker_containers used wget, but the install_packages step
  only installs curl. Switch to `curl -fsSL ... -o`.
- bin/upgrade_containers.sh uses `envsubst` (gettext-base). Add
  gettext-base to APT_INSTALL_ARGS so it's present.

Both surfaced because we slimmed the apt list when migrating the
host bootstrap to uv. They were latent on full RPi OS images.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(systemd): keep host-agent sudo path working under systemd <= 252

Copilot's review caught a real regression: nearly all the Protect*/
Restrict* directives we added implicitly enable NoNewPrivileges on
systemd <= 252 (Debian 11/12), which blocks sudo's setuid escalation.
host_agent.py shells out to `sudo systemctl reboot|poweroff`, so the
hardened unit silently broke reboot/shutdown via the API on those
distros.

Reproduced inside a Debian 12 + systemd 252 container: every directive
flagged "implies NNP=yes" in systemd.exec(5) actually does so on
that version. Setting `NoNewPrivileges=false` does *not* override the
implication — the only fix is to drop the offending directives.

Strip anthias-host-agent.service down to the directives empirically
verified to keep NNP=0 on systemd 252:

  PrivateTmp, ProtectSystem=full, ProtectHome=read-only,
  ProtectControlGroups, ProtectProc=invisible, UMask=0027,
  CapabilityBoundingSet (narrowed), AmbientCapabilities=

This still gives meaningful sandboxing (capability bounding set
limits even sudo's child processes) without breaking reboot.

wifi-connect.service is unaffected — it doesn't escalate.

systemd-analyze security on a real Debian 12 systemd 252:
  anthias-host-agent: 6.5 MEDIUM (was claimed 4.9 OK on Ubuntu 24.04
                                   host, where the implication
                                   doesn't fire)

Also addressing other Copilot review comments
---------------------------------------------
- screenly_overrides: tighten the systemctl rule to the specific
  verbs host_agent.py uses (`reboot`, `poweroff`). Granting blanket
  `/bin/systemctl` was overly broad — could start/stop/mask any
  unit on the system.
- packages.yml: gate the cdefs.h / libc6-dev workaround on
  ansible_architecture == 'armv7l'. The header path is armhf-
  specific; on aarch64/x86 the task always hit "missing" and
  reinstalled libc6-dev unnecessarily on every run.
- screenly/tasks/main.yml: assert that uv exists before `uv sync`
  with a clear error message pointing at install.sh, instead of
  letting the task fail deep inside command-not-found.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(ansible): skip dhcpcd disable when service is not installed

On stock Debian/Ubuntu x86 (and any host without dhcpcd installed),
the unconditional systemd stop/disable failed the playbook with
"Could not find the requested service dhcpcd". Gather service facts
first and only run the disable when dhcpcd.service is actually
present. Pi OS Buster/Bullseye still ship dhcpcd by default, so the
existing behavior there is unchanged; Bookworm (12+) is already
excluded by the version gate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(install): mention x86 alongside Raspberry Pi in intro banner

The intro banner only warned about losing the Pi's desktop
environment; reword so it reflects that Anthias also runs on x86
and that the host is repurposed regardless of platform.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(host-agent): retry redis connect quietly on first-boot startup

On a fresh install the host_agent.service unit starts before the redis
docker container is accepting connections, so it crashed on every
boot with a 50-line ConnectionRefusedError traceback, was restarted
by systemd 10s later, and repeated until redis came up — typically
~5 minutes of journal noise per cold boot.

Wrap the redis connect+subscribe in a tenacity Retrying matching the
pattern already used by set_ip_addresses: retry only on
redis.exceptions.ConnectionError, 5s fixed wait, up to 60 attempts,
and use before_sleep_log so each retry logs a single WARN line
instead of a traceback. After the bounded retry budget, the
exception is re-raised and systemd's normal restart policy applies.
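
The control flow described above can be sketched with the standard
library alone. This is a stand-in for tenacity's Retrying, not the
code in host_agent.py: retry_on_connection_error and its parameters
are hypothetical names, and Python's built-in ConnectionError is used
in place of redis.exceptions.ConnectionError so the sketch runs
without redis installed.

```python
import logging
import time
from typing import Callable, TypeVar

T = TypeVar("T")
logger = logging.getLogger("host_agent_sketch")  # hypothetical logger name


def retry_on_connection_error(
    connect: Callable[[], T],
    attempts: int = 60,
    wait_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call connect() until it succeeds or the attempt budget is spent."""
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except ConnectionError as exc:
            if attempt == attempts:
                # Budget exhausted: re-raise so the service manager's
                # restart policy can take over.
                raise
            # One WARNING line per retry instead of a full traceback.
            logger.warning(
                "connect attempt %d/%d failed: %s; retrying in %.0fs",
                attempt, attempts, exc, wait_seconds,
            )
            sleep(wait_seconds)
    raise ValueError("attempts must be at least 1")
```

Only ConnectionError is retried; any other exception propagates on the
first attempt. The sleep parameter is injectable so the loop can be
exercised without real waiting.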

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(ansible): title-case Anthias in splashscreen task name

The "Set plymouth default theme to anthias" task name was the only
user-facing message in the playbook that didn't title-case Anthias.
Audit covered task names, msg/fail_msg/debug fields, and systemd
unit Description= templates; everything else was already correct.
The command argument and changed_when comparison stay lowercase
because they reference the literal on-disk theme identifier.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* style(host-agent): collapse before_sleep_log call to satisfy ruff format

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(sudoers): match host_agent.py's actual systemctl path

host_agent.py invokes /usr/bin/systemctl, but the rule listed
/bin/systemctl. On usrmerged Debian sudo matches by inode so this
worked in practice, but the rule should match what the agent
actually calls so it doesn't depend on /bin -> /usr/bin staying a
symlink. Also drops /sbin/shutdown — the agent never invokes it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(host-agent): surface retry exceptions and support ens interfaces

Two related fixes turned up while validating reboot/shutdown end-to-end
on a multipass-launched Debian 13 VM:

1. set_ip_addresses() swallowed every retry's exception under the bare
   `except RetryError`, leaving only a generic "Unable to connect"
   warning. Add before_sleep_log(..., exc_info=True) so the actual
   requests.* exception is logged on each attempt, and put a 5s timeout
   on requests.get() so a hung connect can't stretch one attempt out.
2. SUPPORTED_INTERFACES missed `ens` (systemd "slot" naming used by
   QEMU/multipass and many cloud images), so get_ip_addresses() returned
   an empty list on those hosts even when the NIC was up.
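
To illustrate the second fix: a prefix filter like the one below drops
an `ens3` NIC entirely if `ens` is missing from the list. The prefix
tuple and the function name are hypothetical (the real contents of
SUPPORTED_INTERFACES are not shown in this commit); only the addition
of `ens` comes from the text above.

```python
# Hypothetical prefix list; only the "ens" entry is taken from the commit.
SUPPORTED_INTERFACE_PREFIXES = ("wlan", "eth", "enp", "ens")


def filter_supported(names: list[str]) -> list[str]:
    """Keep interface names that start with a supported prefix."""
    # str.startswith accepts a tuple and matches any of its members.
    return [n for n in names if n.startswith(SUPPORTED_INTERFACE_PREFIXES)]


print(filter_supported(["lo", "ens3", "docker0", "eth0"]))
```

With `ens` absent from the tuple, the same call would return only
`eth0`, and a host whose sole NIC is `ens3` would report no addresses.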

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(host-agent): hoist internet probe URL to a named constant

Pulls the 1.1.1.1 anycast literal out of set_ip_addresses() into
INTERNET_PROBE_URL and silences the SonarCloud python:S1313 hotspot
on the constant with a comment explaining the IP is Cloudflare's
public anycast probe (not a private address). The previous commit's
edit to the same line tripped the quality gate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 07:48:12 +01:00


name: Run Python Linter

on:
  push:
    branches:
      - 'master'
    paths:
      - pyproject.toml
      - uv.lock
      - '**/*.py'
      - '.github/workflows/python-lint.yaml'
  pull_request:
    branches:
      - master
    paths:
      - pyproject.toml
      - uv.lock
      - '**/*.py'
      - '.github/workflows/python-lint.yaml'

jobs:
  run-python-linter:
    runs-on: ubuntu-24.04
    strategy:
      matrix:
        python-version: ["3.11"]
    steps:
      - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6

      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@a309ff8b426b58ec0e2a45f0f869d46889d02405 # v6
        with:
          python-version: ${{ matrix.python-version }}

      - name: Install uv
        uses: astral-sh/setup-uv@08807647e7069bb48b6ef5acd8ec9567f424441b # v8.1.0
        with:
          version: '0.9.17'

      - name: Install dependencies
        run: |
          uv venv
          uv pip install --group dev-host

      - name: Run Ruff linting checks
        run: |
          uv run ruff check .

      - name: Run Ruff formatting checks
        run: |
          uv run ruff format --check .