mirror of https://github.com/Screenly/Anthias.git
synced 2026-06-10 17:18:43 -04:00
* chore(ansible): fix all ansible-lint violations and remove skip list Drives the 19 deferred violations from the previous skip list to zero and deletes .ansible-lint. The roles now pass the ansible-lint 'production' profile (previously 'min'). - var-naming[no-role-prefix] (17): rename register/set_fact vars in network/screenly/splashscreen/system roles to use the role's prefix (e.g. config_path -> system_config_path, x_service -> screenly_x_service). - risky-shell-pipe (1): add 'set -o pipefail' + bash executable to the /etc/timezone shell task in system role. - no-free-form (1): switch swapoff to keyword form (cmd:/removes:). Also resyncs uv.lock with pyproject.toml's ansible-core==2.19.9 pin (drift left over from #2749). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(ansible): fix bugs, simplify, drop Debian <= 7 support Bug fixes --------- - Fix /etc/sudoers.d/screenly_overrides: was copied verbatim, so the literal "${USER}" was written into sudoers (sudo doesn't expand it) and the rule pointed at the renamed upgrade_screenly.sh. Convert to a Jinja template using {{ anthias_user }}, point at upgrade_anthias.sh, and add `validate: visudo -cf %s` so a syntax error blocks install. - Fix double-prefix var name screenly_screenly_x_service_exists left over from the lint-cleanup replace_all cascade. Simplification -------------- - site.yml: lift env('USER') into anthias_user play var; assert USER + DEVICE_TYPE are set/valid in pre_tasks. Replace ~30 inline lookups across roles with {{ anthias_user }}. - system role: drop `lsb_release -cs` and `getconf LONG_BIT` shellouts; use ansible_distribution_release and ansible_userspace_bits facts. Collapse two near-duplicate "add user to docker group" tasks into one task using `append: true` plus a conditional gpio entry for ARM. - system role: replace the /etc/timezone shell pipe with a non-shell `command: readlink` + `copy: content:` pair (no pipefail dance needed). 
- system role: drop the unconditional `rpi-update` install on ARM; that package ships experimental kernels and shouldn't be run unattended. - screenly role: move anthias-host-agent.service template from the non-standard tasks/templates/ to the conventional roles/<role>/templates/. Debian 11 path -------------- - Bump pinned versions in the Debian 11 pip branch to current release where Python 3.9 still supports them (ansible-core 2.15.13, redis 7.4.0, requests 2.33.1, tenacity 9.1.2, getmac 0.9.5). Drop the unused docker==6.0.0 pin. Comment why ansible-core stays on 2.15.x. Drop Debian <= 7 ---------------- - splashscreen role: remove four Jessie/Wheezy tasks and the `ansible_distribution_major_version|int > 7` guards on every remaining task. Delete unused files/asplashscreen. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(ansible): normalize *_exist to *_exists for consistency system_cdefs_exist was the lone singular outlier among network_manager_exists / screenly_x_service_exists. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(systemd): harden anthias-host-agent and wifi-connect units Both units now follow modern systemd best practice: - explicit Type= (simple / oneshot) - explicit dependency on docker.service via Requires= (was just After=) - Documentation= URL + SyslogIdentifier= for journalctl filtering - structured sandboxing: PrivateTmp, PrivateDevices, ProtectSystem=full, ProtectHome=read-only, ProtectKernel{Tunables,Modules,Logs}, ProtectControlGroups, ProtectClock, ProtectHostname, ProtectProc=invisible, RestrictRealtime, RestrictSUIDSGID, RestrictNamespaces, RestrictAddressFamilies (per-service: AF_UNIX for wifi-connect, AF_UNIX/AF_INET/AF_INET6 for host-agent), LockPersonality, UMask=0027. 
wifi-connect (no privilege escalation, just talks to docker.sock) goes further with NoNewPrivileges, CapabilityBoundingSet=, SystemCallFilter=@system-service minus @privileged/@resources, and Type=oneshot + RemainAfterExit so systemd records "active (exited)" after `docker compose up -d` returns. host-agent is left more permissive because host_agent.py shells out to `sudo systemctl reboot|poweroff` which needs setuid + CAP_SYS_BOOT + the reboot() syscall — those would all be blocked under the tighter profile. The unit calls this out in a comment. Also makes start_wifi_connect_service.sh executable so the unit can invoke it directly instead of via a `bash` wrapper. systemd-analyze security score: anthias-host-agent: 7.5 EXPOSED -> 4.9 OK wifi-connect: 7.3 MEDIUM -> 1.1 OK Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(host): bootstrap installer_venv with uv, drop requirements.host.txt uv manages its own Python independent of the system Python, which lets both Debian 11 and Debian 12+ use the same dependency set from pyproject.toml's `host` group. 
The install flow no longer needs: - the requirements/requirements.host.txt frozen snapshot - the Debian 11 ansible-core==2.15.x special case in install.sh - the parallel Debian 11 / Debian 12+ pip tasks in the screenly role - the python3-pip / python3-venv / python3-full apt packages - the cryptography==38.0.1 wheel-build workaround - the python-lint.yaml drift check on requirements.host.txt - the long-stale `supervisor` pip-removal migration tasks bin/install.sh changes: - new `clone_repo` step (runs before install_ansible) so we have pyproject.toml in place - install_ansible now: curl|sh the official uv installer, then `uv sync --no-default-groups --group host --no-install-project` with UV_PROJECT_ENVIRONMENT=/home/$USER/installer_venv - run_ansible_playbook uses the venv's ansible-playbook directly instead of relying on PATH activation ansible/roles/screenly/tasks/main.yml: replace both pip tasks with a single `uv sync` task (run as the anthias user, with UV_PROJECT_ENVIRONMENT pointed at installer_venv) so the venv stays in sync if ansible-playbook is rerun standalone. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(install): replace pre-uv installer_venv on upgrade Existing Anthias hosts have an installer_venv created by `python3 -m venv`, which uv won't recognize on upgrade. Detect by the absence of the `uv = <version>` line in pyvenv.cfg and rm -rf it so `uv sync` rebuilds cleanly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(systemd): allow AF_NETLINK in anthias-host-agent host_agent.py uses the netifaces C extension which opens AF_NETLINK (NETLINK_ROUTE) sockets to enumerate interfaces during the set_ip_addresses pubsub command. The previous RestrictAddressFamilies list (AF_UNIX/AF_INET/AF_INET6) blocked this with EAFNOSUPPORT. Verified by tracing netifaces under strace and by running a test service unit with the same restrictions. 
Audit: only three pubsub commands ever target the hostcmd channel — reboot, shutdown, set_ip_addresses. CEC lookups happen inside the anthias-celery Docker container, not on the host. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(install): apply audit fixes from review - Bug: upgrade_containers.sh was always pulled from master regardless of the user's selected ref, so a tagged install got master's upgrade script. Now uses ${BRANCH}. - Pin uv to UV_PIN_VERSION (=0.9.17, matching docker/uv-builder.j2) and reinstall if the local uv is a different version. Avoids drift between the host bootstrap and the docker image build. - Apply --ask-become-pass on every arch when the NOPASSWD sudoers file is missing (was x86_64 only — could hang on Pi if the sudo timestamp expired mid-playbook). Also tell the user a prompt is coming. - Drop the stale "Please reboot and run upgrade" branch in post_installation; both branches told the user to reboot anyway. Add an SSH-detection notice so the user knows their session will drop on reboot. - Use git -C in write_anthias_version so it doesn't rely on cwd. - Drop redundant `-e` from shebang (set -euo pipefail two lines down is stricter). - Drop `apt update -y` (-y is meaningless for update); standardize on apt-get. - Quote ${USER} interpolations everywhere; replace `A && B || C` pseudo-if-else with proper if-then-else (shellcheck SC2015); fix array expansion `${VERSION_PROMPT}` -> `${VERSION_PROMPT[*]}` (SC2128); split `local FOO=$(...)` into separate decl/assign (SC2155). shellcheck now clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(ansible): drop dead migration helpers and split system role Migration helpers ----------------- Dropped tasks that only existed to clean up cruft from pre-Anthias (Screenly OSE, mid-2024-) installs. They have been idempotent no-ops on every modern install for many releases. 
screenly role: - Remove screenly_utils.sh (renamed long ago) - Remove cron entry "Cleanup screenly_assets" (state: absent of a cron job no Anthias release has created) - Remove old upgrade_screenly.sh (renamed to upgrade_anthias.sh) - Remove screenly_usb_assets.sh and the autoplay udev rule - Remove plymouth-quit-wait.service / plymouth-quit.service deletion (Plymouth handling is now done by the splashscreen role) - Drop the entire screenly_deprecated_systemd_units list and the three tasks that used it (X.service, screenly-celery, screenly-web, screenly-websocket_server_layer, screenly-viewer, matchbox, screenly-host-agent, udev-restart, wifi-connect — all deprecated unit names from the Screenly era) - Remove the ngrok binary cleanup network role: - Drop the screenly_net_mgr.py / screenly_net_watchdog.py removal pipeline (the whole stat -> set_fact -> stop -> rm chain). Modern installs use NetworkManager directly. upgrade-script integrity ------------------------ Documented why /usr/local/sbin/upgrade_anthias.sh is fetched from GitHub without a checksum: the URL is meant to track upstream master so users can pull in fixes without reinstalling, and integrity is bounded by HTTPS to githubusercontent. If we ever ship signed release assets, we should switch to fetching the signed asset. system role split ----------------- ~370-line system/tasks/main.yml split into focused includes: - boot.yml — /boot/{config,cmdline}.txt edits (raspberry-pi + touches-boot-partition tags applied at the include level) - packages.yml — libc6-dev cdefs.h fix, Anthias apt deps, and removal of pre-Anthias / distro-Docker packages - docker.yml — Docker repo, install, and user-to-docker-group - timezone.yml — /etc/timezone backfill from /etc/localtime - dist_upgrade.yml — apt upgrade dist + autoremove (system-upgrade tag at the include level) - misc.yml — rc.local, dpkg 01_nodoc, swap removal main.yml is now the orchestrator. Tags moved from per-task to per-include. 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(install): pin upgrade_anthias.sh URL to the installed ref The Ansible task that wrote /usr/local/sbin/upgrade_anthias.sh hardcoded raw.githubusercontent.com/.../master/bin/install.sh. So a tag-pinned install (e.g. v0.20.4) silently received master's install.sh as its upgrade entry point, defeating the version pin. The mirror bug in install.sh::upgrade_docker_containers (which fetched upgrade_containers.sh from master regardless of selected ref) was fixed earlier in this PR; this commit closes the same gap on the upgrade-script side. - site.yml: add `anthias_branch` play var (env ANTHIAS_BRANCH, defaulting to 'master' if unset for safety on standalone ansible-playbook runs); include it in the pre_tasks assertion. - screenly role: render the URL with `{{ anthias_branch }}`. Reword the comment + task name so they actually match the behaviour. - install.sh::run_ansible_playbook: export ANTHIAS_BRANCH=${BRANCH} so the playbook can pick up the user's selected ref. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(ansible): suppress sonarcloud S2612 on intentional system modes SonarCloud's ansible:S2612 ("granting access to others") fires on seven file/directory modes in the new system role task files. Each matches a Debian/Raspbian convention and the file *needs* to be world-readable to function: - /etc/timezone (0644) — libc/locale paths read it - /etc/dpkg/dpkg.cfg.d/01_nodoc (0644) — dpkg honours per any caller - /etc/apt/keyrings (0755) and /etc/apt/keyrings/docker.asc (0644) — per Debian's own apt-secure docs; the _apt user reads them - /etc/rc.local (0755) — systemd-rc-local-generator runs it - cmdline.txt and its .orig backup (0755) — Pi imager / NOOBS / pi-config ship 0755; deviating breaks raspi-config tooling Suppressing per-line with `# NOSONAR` and a one-liner reason rather than weakening the modes (which would actually break the system). 
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(install): point ansible at the venv python; pre-create netdev group Two issues found by running bin/install.sh end-to-end against a privileged Debian 12 container with systemd + DinD. 1. Ansible needs Python on the target. Since this PR drops the system python3 install in favour of uv-managed Python under installer_venv, gather_facts (and every subsequent module) failed with: "/usr/bin/python3 not found" Fix: export ANSIBLE_PYTHON_INTERPRETER=$installer_venv/bin/python in install.sh::run_ansible_playbook so Ansible uses the venv we just provisioned with uv sync. 2. The `netdev` group is created by network-manager, but Anthias can be installed with MANAGE_NETWORK=No, in which case the package isn't pulled in and the group doesn't exist. The user-membership task then failed with "Group netdev does not exist". Pre-create it with `ansible.builtin.group` (system:true) so the membership task is safe regardless of MANAGE_NETWORK. Both bugs were latent on Raspberry Pi OS (which always pre-installs NetworkManager) but bite on minimal Debian 12 / x86 images. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(ansible): pre-create all required groups, not just netdev Previous fix only handled netdev. Minimal Debian/x86 images can also be missing input, plugdev, video, dialout (anything that udev or desktop-package postinsts would have created). Pre-create the whole list with ansible.builtin.group + state: present. Also dedupe the base/pi group lists between the create task and the membership task using YAML anchors so they can't drift. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(install): use curl instead of wget; add gettext-base End-to-end test in a privileged Debian 12 container surfaced two post-Ansible-playbook bash issues: - upgrade_docker_containers used wget, but the install_packages step only installs curl. Switch to `curl -fsSL ... -o`. 
- bin/upgrade_containers.sh uses `envsubst` (gettext-base). Add gettext-base to APT_INSTALL_ARGS so it's present. Both surfaced because we slimmed the apt list when migrating the host bootstrap to uv. They were latent on full RPi OS images. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(systemd): keep host-agent sudo path working under systemd <= 252 Copilot's review caught a real regression: nearly all the Protect*/Restrict* directives we added implicitly enable NoNewPrivileges on systemd <= 252 (Debian 11/12), which blocks sudo's setuid escalation. host_agent.py shells out to `sudo systemctl reboot|poweroff`, so the hardened unit silently broke reboot/shutdown via the API on those distros. Reproduced inside a Debian 12 + systemd 252 container: every directive flagged "implies NNP=yes" in systemd.exec(5) actually does so on that version. Setting `NoNewPrivileges=false` does *not* override the implication; the only fix is to drop the offending directives. Strip anthias-host-agent.service down to the directives empirically verified to keep NNP=0 on systemd 252: PrivateTmp, ProtectSystem=full, ProtectHome=read-only, ProtectControlGroups, ProtectProc=invisible, UMask=0027, CapabilityBoundingSet (narrowed), AmbientCapabilities= This still gives meaningful sandboxing (capability bounding set limits even sudo's child processes) without breaking reboot. wifi-connect.service is unaffected: it doesn't escalate. systemd-analyze security on a real Debian 12 systemd 252: anthias-host-agent: 6.5 MEDIUM (was claimed 4.9 OK on Ubuntu 24.04 host, where the implication doesn't fire) Also addressing other Copilot review comments --------------------------------------------- - screenly_overrides: tighten the systemctl rule to the specific verbs host_agent.py uses (`reboot`, `poweroff`). Granting blanket `/bin/systemctl` was overly broad, since it could start/stop/mask any unit on the system. 
- packages.yml: gate the cdefs.h / libc6-dev workaround on ansible_architecture == 'armv7l'. The header path is armhf-specific; on aarch64/x86 the task always hit "missing" and reinstalled libc6-dev unnecessarily on every run. - screenly/tasks/main.yml: assert that uv exists before `uv sync` with a clear error message pointing at install.sh, instead of letting the task fail deep inside command-not-found. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(ansible): skip dhcpcd disable when service is not installed On stock Debian/Ubuntu x86 (and any host without dhcpcd installed), the unconditional systemd stop/disable failed the playbook with "Could not find the requested service dhcpcd". Gather service facts first and only run the disable when dhcpcd.service is actually present. Pi OS Buster/Bullseye still ship dhcpcd by default, so the existing behavior there is unchanged; Bookworm (12+) is already excluded by the version gate. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(install): mention x86 alongside Raspberry Pi in intro banner The intro banner only warned about losing the Pi's desktop environment; reword so it reflects that Anthias also runs on x86 and that the host is repurposed regardless of platform. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(host-agent): retry redis connect quietly on first-boot startup On a fresh install the host_agent.service unit starts before the redis docker container is accepting connections, so it crashed on every boot with a 50-line ConnectionRefusedError traceback, was restarted by systemd 10s later, and repeated until redis came up, typically ~5 minutes of journal noise per cold boot. 
Wrap the redis connect+subscribe in a tenacity Retrying matching the pattern already used by set_ip_addresses: retry only on redis.exceptions.ConnectionError, 5s fixed wait, up to 60 attempts, and use before_sleep_log so each retry logs a single WARN line instead of a traceback. After the bounded retry budget, the exception is re-raised and systemd's normal restart policy applies. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(ansible): title-case Anthias in splashscreen task name The "Set plymouth default theme to anthias" task name was the only user-facing message in the playbook that didn't title-case Anthias. Audit covered task names, msg/fail_msg/debug fields, and systemd unit Description= templates; everything else was already correct. The command argument and changed_when comparison stay lowercase because they reference the literal on-disk theme identifier. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * style(host-agent): collapse before_sleep_log call to satisfy ruff format Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(sudoers): match host_agent.py's actual systemctl path host_agent.py invokes /usr/bin/systemctl, but the rule listed /bin/systemctl. On usrmerged Debian sudo matches by inode so this worked in practice, but the rule should match what the agent actually calls so it doesn't depend on /bin -> /usr/bin staying a symlink. Also drops /sbin/shutdown — the agent never invokes it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(host-agent): surface retry exceptions and support ens interfaces Two related fixes turned up while validating reboot/shutdown end-to-end on a multipass-launched Debian 13 VM: 1. set_ip_addresses() swallowed every retry's exception under the bare `except RetryError`, leaving only a generic "Unable to connect" warning. 
Add before_sleep_log(..., exc_info=True) so the actual requests.* exception is logged on each attempt, and put a 5s timeout on requests.get() so a hung connect can't stretch one attempt out. 2. SUPPORTED_INTERFACES missed `ens` (systemd "slot" naming used by QEMU/multipass and many cloud images), so get_ip_addresses() returned an empty list on those hosts even when the NIC was up. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(host-agent): hoist internet probe URL to a named constant Pulls the 1.1.1.1 anycast literal out of set_ip_addresses() into INTERNET_PROBE_URL and silences the SonarCloud python:S1313 hotspot on the constant with a comment explaining the IP is Cloudflare's public anycast probe (not a private address). The previous commit's edit to the same line tripped the quality gate. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
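The pre-uv detection described in the `fix(install): replace pre-uv installer_venv on upgrade` entry can be sketched in Python as follows. This is a hypothetical port for illustration only: install.sh does the check in bash, and the function names `created_by_uv` and `remove_pre_uv_venv` are invented here. The sketch keys on the `uv = <version>` line in `pyvenv.cfg`, as the commit describes, and assumes a missing `pyvenv.cfg` should be treated the same as a missing `uv` key.

```python
import shutil
from pathlib import Path


def created_by_uv(venv_dir: Path) -> bool:
    """Return True if venv_dir/pyvenv.cfg has a non-empty `uv = <version>` entry.

    Environments created with `python3 -m venv` do not write this key, so its
    absence marks a pre-uv installer_venv.
    """
    cfg = venv_dir / "pyvenv.cfg"
    if not cfg.is_file():
        return False
    for line in cfg.read_text().splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "uv" and value.strip():
            return True
    return False


def remove_pre_uv_venv(venv_dir: Path) -> bool:
    """Delete venv_dir if it exists but was not created by uv.

    Returns True when a directory was removed, meaning the next `uv sync`
    will rebuild the environment from scratch.
    """
    if venv_dir.is_dir() and not created_by_uv(venv_dir):
        shutil.rmtree(venv_dir)
        return True
    return False
```

A caller would run something like `remove_pre_uv_venv(Path.home() / "installer_venv")` immediately before `uv sync`, matching the order the commit describes; an environment that uv already owns is left untouched.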
52 lines
1.1 KiB
YAML
name: Run Python Linter

on:
  push:
    branches:
      - 'master'
    paths:
      - pyproject.toml
      - uv.lock
      - '**/*.py'
      - '.github/workflows/python-lint.yaml'
  pull_request:
    branches:
      - master
    paths:
      - pyproject.toml
      - uv.lock
      - '**/*.py'
      - '.github/workflows/python-lint.yaml'

jobs:
  run-python-linter:
    runs-on: ubuntu-24.04
    strategy:
      matrix:
        python-version: ["3.11"]
    steps:
      - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6

      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@a309ff8b426b58ec0e2a45f0f869d46889d02405 # v6
        with:
          python-version: ${{ matrix.python-version }}

      - name: Install uv
        uses: astral-sh/setup-uv@08807647e7069bb48b6ef5acd8ec9567f424441b # v8.1.0
        with:
          version: '0.9.17'

      - name: Install dependencies
        run: |
          uv venv
          uv pip install --group dev-host

      - name: Run Ruff linting checks
        run: |
          uv run ruff check .

      - name: Run Ruff formatting checks
        run: |
          uv run ruff format --check .
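To reproduce this lint job locally before pushing, the commands from the three `run:` steps above can be chained from a small script. This is a hypothetical helper, not part of the repository: the names `CI_STEPS` and `run_steps` are invented for illustration. It assumes `uv` is already on `PATH` and that it is run from a fresh checkout with no existing `.venv`. It also does not reproduce the `setup-python` pin to Python 3.11.

```python
import subprocess
import sys
from typing import List, Sequence

# Commands copied from the "Install dependencies" and Ruff steps of the workflow.
CI_STEPS: List[List[str]] = [
    ["uv", "venv"],
    ["uv", "pip", "install", "--group", "dev-host"],
    ["uv", "run", "ruff", "check", "."],
    ["uv", "run", "ruff", "format", "--check", "."],
]


def run_steps(steps: Sequence[Sequence[str]], dry_run: bool = False) -> int:
    """Run each command in order and stop at the first non-zero exit.

    With dry_run=True, the commands are only printed, not executed.
    Returns the failing command's exit code, or 0 if every step passed.
    """
    for cmd in steps:
        print("+", " ".join(cmd))
        if dry_run:
            continue
        result = subprocess.run(list(cmd))
        if result.returncode != 0:
            return result.returncode
    return 0


if __name__ == "__main__":
    sys.exit(run_steps(CI_STEPS, dry_run="--dry-run" in sys.argv))
```

With `--dry-run`, the script only prints the command list. Without it, the script exits with the first failing command's return code, giving the same fail-fast behavior the job gets from running its steps in sequence.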