57 Commits

Author SHA1 Message Date
Viktor Petersson
2d7b92c006 Hugo website: docs migration, API reference, FAQ, and SEO (#2807)
* Move website to Hugo

* Rewrite in progress

* Add Hugo-native API reference page and fix CSS build path

Two related changes for the Hugo site:

1. CSS build target: package.json's css:build/css:watch wrote to
   assets/styles/style.css, but baseof.html uses a plain <link href>
   that Hugo serves from static/. The merge left a stale 14K static
   copy alongside the freshly-built 23K asset copy, so pages rendered
   with most utility classes undefined. Build target is now
   static/assets/styles/style.css, matching the convention used by
   every other website asset.

2. Hugo-native API docs at /api/. The OpenAPI spec is loaded from
   data/openapi.yaml (generated via `manage.py spectacular`) and
   rendered in layouts/_default/api.html and a recursive schema
   partial. Endpoints are grouped by tag with anchor jumps, color-
   coded method badges, params/request/response tables, and inline
   $ref resolution. Renders all 18 v2 endpoints across 9 tags with
   the existing Tailwind theme. No third-party JS bundle.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Move documentation under Hugo and redirect old GitHub paths

Migrates docs/ markdown into website/content/docs/ rendered with a new
docs/ layout (list + single) and Tailwind prose styling. Images and
the d2 diagram move to website/static/docs/. Internal links rewritten
from /docs/foo.md to /docs/foo/, and GitHub-style alerts pre-converted
to bold-labeled blockquotes since the goldmark alert extension is not
enabled on this Hugo version.

The original docs/*.md files are kept as redirect stubs that point at
https://anthias.screenly.io/docs/... so external links into the GitHub
docs tree still resolve to a useful page. Root README.md links updated
to point at the website URLs.

Hugo nav now exposes Docs alongside Features / Get Started / API / FAQ.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Fix factual inaccuracies in migrated docs against the codebase

Reviewed all docs against the current source. Concrete fixes:

* _index.md: container names use the post-rebrand `anthias-` compose
  project prefix (e.g. `anthias-anthias-server-1`, `anthias-redis-1`)
  rather than the legacy `screenly-` form. Replaced `docker-compose
  logs` with `docker compose logs` and added the optional
  `anthias-caddy` sidecar to the container table.
* developer-documentation.md: fixed leading-letter typo ("unning"),
  and replaced the old Django test-runner invocation with the pytest
  commands used by the suite today (`pytest -n auto -m "not
  integration"` and `pytest -m integration`).
* balena-fleet-deployment.md: corrected the supported board list
  ($BOARD_TYPE) to match `bin/deploy_to_balena.sh --help`
  (`pi2`, `pi3`, `pi4-64`, `pi5` — no `pi1` or plain `pi4`). Updated
  registry reference from Docker Hub to GHCR.
* migrating-assets-to-screenly.md: `cd ~/screenly` → `cd ~/anthias`
  (post-rebrand install path).
* raspberry-pi5-ssd-install-instructions.md: fixed "Opitions" and
  "uinsg" typos in the boot-order steps.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Polish docs styling: callouts, syntax highlighting, hierarchy

Reworks docs prose styling so the migrated pages don't read like
default-Hugo-render-output:

* Headings: in-body H1/H2 collapse to a section divider style with a
  top border so they don't compete with the dark page hero. H4-H6
  become small uppercase eyebrows. Markdown sources mix #/##/####
  inconsistently — the visual scale now compresses gracefully.
* Alerts: a render-blockquote hook detects the bold-label preamble
  produced by our preprocessor (`> **Note**` etc.) and emits a typed
  `<blockquote class="docs-alert docs-alert-note">` so each kind gets
  its own colored border + label (note/tip/important/warning/caution).
* Syntax highlighting: enable Hugo Chroma with the github style,
  noClasses=false. Generated chroma.css ships as a static asset and is
  loaded alongside style.css. `pre`/`<code>` get a light surface that
  the chroma token colors sit on top of.
* Inline code, lists, links, tables, and images all get a small
  rebalance — bullet color, link underline weight, image shadow,
  table border-radius — to match the brand-purple theme.
* Footer: the Resources / Docs link pointed at the legacy
  github.com/.../docs/README.md path; now points at /docs/. Added an
  API Reference link alongside.
* Stripped a stray `<br>` in the Pi5 SSD doc that was creating a
  random gap between a blockquote and its illustrative screenshot.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Make x86/PC docs consistent and more user-friendly

The migrated docs used four different forms — "x86", "x86 device",
"PC (x86) devices", and "PC (x86 Devices)" — depending on the page.
Standardize on **PC (x86)** as the user-facing label (PC is what
people search for; x86 stays as the architecture qualifier).

Also rewrites x86-installation.md from a flat bullet dump into a
clearer walkthrough: a what-you-need list followed by five steps
(download, flash, install Debian, prep the system, run the
installer). It also crosslinks the right anchor in
installation-options.md so PC users can hand off to the scripted
install without scrolling.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Expand FAQ with forum-driven questions and refactor to data file

The FAQ had six entries that didn't reflect what people actually ask
on forums.screenly.io. Reviewed the all-time top topics and added the
ones that show up over and over: portrait rotation, YouTube playback,
Wi-Fi setup, static IP assignment, audio output, resolution / 4K,
black-screen troubleshooting, transitions, asset storage / backup,
SSH, HTTPS pointer, commercial-use clarity, getting logs, and a link
to the API reference.

Refactored the layout so it reads from data/faq.yaml grouped by
section (About, Installation & updates, Display & playback,
Operations) and renders each answer through markdownify. This makes
adding new entries a one-paragraph YAML edit instead of duplicating
~15 lines of accordion markup. Answers reuse the .docs-prose styling
so code, links, lists, and inline pre snippets all match the docs
pages.

Also tightened the "Accessing the REST API" section in /docs/ to
point at the new /api/ page first, with the live ReDoc URL on the
device as a secondary callout.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Correct rotation FAQ — Anthias renders via linuxfb + DRM, no Wayland

Verified in code: docker/Dockerfile.viewer.j2 sets
QT_QPA_PLATFORM=linuxfb, webview/build_qt{5,6}.sh both pass
-skip wayland to the Qt build, and viewer/media_player.py invokes
mpv with --vo=drm. There is no Wayland compositor in the runtime
stack on any board.

Replaced the previous "Pi 5 with Wayland uses a different stack"
hand-wave with the actual fallback: if /boot/firmware/config.txt's
display_rotate=N doesn't stick on a Pi 5 / KMS pipeline, append
video=HDMI-A-1:...,rotate=N to /boot/firmware/cmdline.txt.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Tighten three FAQ answers after a code-driven validation pass

* SSH: previous answer claimed SSH was on by default for both the
  Anthias disk image and the scripted install. Anthias's installer
  doesn't touch sshd at all, so the answer now distinguishes between
  the prebuilt images (SSH on) and a self-flashed Raspberry Pi OS Lite
  (SSH must be pre-enabled).
* Audio output: static/src/components/settings/audio-output.tsx hides
  the 3.5mm option on Pi 5 because the hardware lacks the jack. Call
  that out so Pi 5 users don't go looking for a missing dropdown item.
* Black screen: replaced the `xset dpms force on` suggestion. Anthias
  has no X server on any board (Qt runs on linuxfb, mpv on --vo=drm),
  so xset can't toggle DPMS. Pointed users at re-seating HDMI or
  checking the TV's input as a more grounded recovery.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Correct features-page claim — Anthias detects display state, can't toggle it

The "Display power control" card promised programmatic on/off
toggling of the connected screen for energy savings. That isn't a
real feature. lib/diagnostics.py only calls libcec's tv.is_on() to
*query* the TV's power state — there's no power-on / standby command
path anywhere in the codebase. The result surfaces read-only as
display_power on the System Info page
(static/src/components/system-info.tsx).

Replaced the card with what's actually shipping: HDMI-CEC display
state *detection*, visible on the System Info page.

Verified the rest of the page against code while I was in there.
Accurate as written: image/video/webpage assets (Qt webview + mpv),
scheduling (start_date/end_date/duration on the asset model), drag-
drop playlists (@dnd-kit/sortable), shuffle (settings.shufflePlaylist),
1080p output (mpv pinned to 1920x1080@60 on pi4-64/pi5 in
viewer/media_player.py), real-time WebSocket sync (Django Channels +
Redis pub/sub), REST API (drf-spectacular), four-container compose
topology, backup/restore, optional basic auth (lib/auth.py BasicAuth),
System Info page fields (loadavg/free_space/uptime/anthias_version),
and the supported hardware list (matches ansible/site.yml's
device_type assertion).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Punchier homepage tagline: "Free digital signage for everyone."

Replaces "Open source digital signage for any screen" with a shorter,
benefit-led headline. The new line breaks naturally across two lines
on desktop (free digital signage / for everyone.) and stays single-
line on mobile to avoid an awkward orphan.

Subtitle is unchanged — it still does the explanatory work (Pi or
PC, schedule images/videos/webpages, no subscriptions).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* SEO sweep: per-page meta, FAQPage / TechArticle JSON-LD, robots.txt

Two functional gaps in the existing setup:

* og:description and twitter:description were hardcoded to a single
  marketing line on every page, while the per-page <meta name=
  description> already pulled from front matter. So the Slack/Twitter/
  Discord card preview always read the same blurb regardless of which
  page you shared. Now both the OG and Twitter description reflect the
  page's own .Params.description.

* Page titles drifted: most pages embedded "Anthias" in the title
  string, but the docs pages were just "Documentation" /
  "Installation Options" / etc. — fine for the H1, weak for SERPs.
  Title now appends " | Anthias" only when the page title doesn't
  already contain the brand, so existing branded titles stay clean
  and docs pages get a brand suffix automatically.
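
A minimal Python sketch of the title rule described above. The real logic lives in a Hugo template that is not shown here, so the function name and the case-sensitive substring check are assumptions:

```python
BRAND = "Anthias"  # brand name taken from the commit message


def page_title(title: str, brand: str = BRAND) -> str:
    """Append " | <brand>" unless the title already mentions the brand."""
    # Assumption: a plain case-sensitive substring check; the real
    # template may compare differently.
    if brand in title:
        return title
    return f"{title} | {brand}"


print(page_title("Installation Options"))
print(page_title("Anthias FAQ"))  # hypothetical branded title
```

Titles that already contain the brand pass through unchanged, which is why existing branded pages need no edits.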

Other tightening:

* Added FAQPage structured data on /faq/ generated from data/faq.yaml
  so Google can surface FAQ rich results.
* Added TechArticle structured data on individual /docs/ pages.
* og:type now flips to "article" on docs pages.
* og:image:alt + twitter:image:alt populated.
* theme-color set to the brand purple for mobile browser chrome.
* JSON-LD home schema URL now uses site.BaseURL instead of a
  hardcoded production URL — important for staging / dev parity.
* <html lang> reads site.LanguageCode instead of a fixed "en".
* Added a real robots.txt that points crawlers at /sitemap.xml
  (Hugo already generates the sitemap, but a robots.txt makes the
  pointer explicit and unblocks tooling that looks for it).
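
For the FAQPage item above, a rough Python sketch of the shape of the structured data. The site generates it in a Hugo template; the `section` / `items` / `question` / `answer` field names below are hypothetical stand-ins for whatever data/faq.yaml actually uses:

```python
import json


def faq_jsonld(sections: list[dict]) -> dict:
    """Flatten grouped FAQ entries into a schema.org FAQPage object."""
    questions = [
        {
            "@type": "Question",
            "name": item["question"],
            "acceptedAnswer": {"@type": "Answer", "text": item["answer"]},
        }
        for section in sections
        for item in section["items"]
    ]
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": questions,
    }


# Hypothetical stand-in for the parsed contents of data/faq.yaml.
sections = [
    {
        "section": "About",
        "items": [{"question": "Is Anthias free?", "answer": "Yes."}],
    },
]
print(json.dumps(faq_jsonld(sections), indent=2))
```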

Replaced placeholder image alt text in the docs ("balena-ss-01",
"imager-01", "rpi-eeprom-update", etc.) with descriptive captions —
better for screen readers and image-search SEO.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Move site assets into Hugo's expected layout and rename /docs URLs

Two related cleanups.

ASSETS — site assets were split across website/static/assets/ (the
shadowed copy hugo.toml's [[module.mounts]] directed traffic to) and
website/assets/ (an unused duplicate). Hugo's own build report showed
"Processed images: 0" because nothing actually flowed through Pipes.

  * Removed the [[module.mounts]] override so Hugo uses default
    layout: assets/ for Pipes-processable resources, static/ for
    served-as-is files.
  * Used `git mv` to record the docs/ image and stylesheet renames as
    history-preserving moves rather than delete+add diffs.
  * Removed the duplicate website/static/assets/images/ directory —
    files already lived in website/assets/images/.
  * Bun's css:build/css:watch now write to assets/styles/style.css so
    Tailwind output flows through Hugo Pipes.
  * baseof.html loads style.css and chroma.css via resources.Get +
    fingerprint, with SRI integrity attributes. Each deploy produces
    a fresh content-hashed URL (/styles/style.<hash>.css), so the
    browser cache invalidates correctly without manual cache-busting.
  * Logos, social icons, hero raster (overview*.png), favicon, and
    plus/minus accordion icons all flow through resources.Get for
    consistent asset handling.
  * Added layouts/_default/_markup/render-image.html so markdown
    image references in /docs are looked up via resources.Get and
    emitted with loading="lazy" decoding="async".
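
The fingerprint + SRI item in the list above can be illustrated outside Hugo. This is only a sketch of the idea, assuming SHA-256 as the hash; the `fingerprint` helper is hypothetical and is not Hugo's implementation:

```python
import base64
import hashlib


def fingerprint(name: str, content: bytes) -> tuple[str, str]:
    """Return (content-hashed filename, SRI integrity value) for an asset."""
    digest = hashlib.sha256(content).digest()
    stem, _, ext = name.rpartition(".")
    hashed_name = f"{stem}.{digest.hex()}.{ext}"
    # SRI values are "<algorithm>-<base64 of the raw digest>".
    integrity = "sha256-" + base64.b64encode(digest).decode("ascii")
    return hashed_name, integrity


name_v1, sri_v1 = fingerprint("style.css", b"body { color: purple }")
name_v2, sri_v2 = fingerprint("style.css", b"body { color: black }")
print(name_v1, sri_v1)
assert name_v1 != name_v2  # any content change yields a new URL
```

Because the hash is part of the filename, a changed stylesheet is always fetched under a new URL, which is why no manual cache-busting is needed.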

URL RENAMES — the docs URLs were verbatim copies of the original
GitHub filenames, which made for noisy URLs like
/docs/raspberry-pi5-ssd-install-instructions/. Slugged each page and
left aliases for the old paths so Hugo emits a meta-refresh redirect:

  /docs/installation-options/                       → /docs/install/
  /docs/balena-fleet-deployment/                    → /docs/balena/
  /docs/x86-installation/                           → /docs/pc/
  /docs/raspberry-pi5-ssd-install-instructions/     → /docs/pi5-ssd/
  /docs/migrating-assets-to-screenly/               → /docs/migrate-to-screenly/
  /docs/qa-checklist/                               → /docs/qa/
  /docs/developer-documentation/                    → /docs/development/

Cross-doc links inside /docs and the README + repo-root docs/ stub
files all point at the new URLs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Fix lint + mypy on raspberry_pi_imager test (carried over from rebase)

The test_build_pi_imager_json.py file landed in `88d3881b Move website
to Hugo` with two pre-existing CI failures:

* ruff format --check: a few helper definitions had a stale line
  break the formatter wanted to collapse.
* mypy: `make_image_metadata(board: str) -> dict` is missing the
  generic type parameters that the project's mypy config flags as
  type-arg. Annotated as `dict[str, Any]`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Self-host Plus Jakarta Sans via @fontsource (drop Google Fonts CDN)

Removes the third-party Google Fonts <link>. SonarCloud's Web:S5725
hotspot was flagging the link as a subresource-integrity (SRI) risk.
SRI is impossible against Google Fonts because the served stylesheet
rotates per User-Agent and the woff2 URLs change with the font CSS.

Self-hosting the same font from npm via @fontsource removes the
cross-origin resource entirely.

How it's wired:

* `bun add -D @fontsource/plus-jakarta-sans` for the font binaries.
* `scripts/install-fonts.ts` is a small bun script that, given the
  installed package, copies woff2 files for latin + latin-ext at
  weights 400/500/600/700/800 to `static/fonts/` (so Hugo serves
  them at `/fonts/...`) and emits a combined
  `assets/fonts/plus-jakarta-sans.css` with the urls rewritten to
  absolute /fonts/... paths and the woff fallback stripped.
* `package.json` adds `fonts:install`, and chains it through
  `css:build` / `css:watch` so Tailwind always sees the generated
  CSS up to date.
* `main.css` @imports the generated CSS — Tailwind/Lightning CSS
  inlines the @font-face rules into the final fingerprinted
  style.<hash>.css.
* `.gitignore` excludes `assets/fonts/` and `static/fonts/` since
  both are deterministically regenerated from node_modules.
* `baseof.html` no longer pulls from fonts.googleapis.com.

Total payload: 10 woff2 files (~136KB), but each is loaded
on-demand by unicode-range — typical English-only visitors fetch
~50KB of fonts, served from same-origin.

The second Web:S5725 hotspot (gtag.js from googletagmanager.com)
is unchanged in this commit — Google's tag manager script is
updated server-side without a stable hash, so SRI cannot apply.
That one needs a product call (keep with dismissal, drop GA, or
move to a privacy-first SRI-friendly alternative).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Address SonarCloud code-smell findings on the website

Cleared the unrelated SonarCloud findings raised on this PR:

* `install-fonts.ts`: `fs` and `path` imports use the `node:` prefix
  (typescript:S7772). The new prefixed form is the bun-recommended
  one, no behavior change.
* `_markup/render-image.html`: rewrote the comment that referenced
  `<img>` literally — Web:ImgWithoutAltCheck was treating the word
  inside the Hugo comment block as an actual element with no alt.
* `_default/faq.html`: replaced the accordion's `<div role="region">`
  with a real `<section>` element (Web:S6819). The aria-labelledby
  binding stays, so the accessible name resolution is identical and
  the semantics are now native rather than ARIA-emulated.
* `assets/styles/chroma.css`: stripped the two stray-semicolon lines
  left over from the sed pass that emptied the github-style backdrop
  (css:S1116). The remaining `.chroma { -webkit-text-size-adjust:
  none }` rule is what's actually load-bearing.
* `_default/baseof.html`:
  - accordion JS now reads `this.dataset.accordion` instead of
    `this.getAttribute('data-accordion')` (javascript:S7761).
  - GA bootstrap uses `globalThis.dataLayer` instead of
    `window.dataLayer` (javascript:S7764). Same semantics in any
    browser context, no globalThis polyfill needed for our targets.
* `layouts/index.html`: dropped the deprecated `scrolling="0"`
  attribute from the GitHub stars iframe (Web:S1827); replaced
  with the equivalent `overflow-hidden` Tailwind class.

The Web:S5725 SRI hotspot on the gtag.js script (line 162 of
baseof.html) is the only remaining finding. Google Tag Manager is
versioned server-side without a stable hash, so SRI fundamentally
can't apply — that one is being kept and dismissed in the
SonarCloud UI.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Address Copilot review + trigger marketing deploy on release publish

Copilot review:

* api.html: request-body renderer only looked at application/json,
  so endpoints whose only content type is multipart/form-data (file
  uploads) or application/x-www-form-urlencoded would render an
  empty Request body section. Pick application/json first if present,
  otherwise fall back to the first listed content type, and label
  the rendered schema with its actual content type.
* build_pi_imager_json.py: every requests.get() now sets a 30s
  timeout and calls raise_for_status() so a slow/rate-limited GitHub
  API doesn't hang the deploy job and a 4xx/5xx fails fast with a
  clear message rather than a confusing KeyError on response.json().
* docs/raspberry-pi5-ssd-install-instructions.md: "Other HAT's" →
  "Other HATs".
* docs/qa-checklist.md: dropped the spurious "a" in "Change a the
  start and end dates".
* deploy-website.yaml: jq's has() takes one key, so the validation
  step `has("name", "description", ...)` was actually a syntax error
  on every run — rewrote as `all($k; $entry | has($k))` over the
  required-key list.
* layouts/_default/get-started.html: the "Documentation" CTA pointed
  at the old GitHub markdown file; now links to /docs/ to match the
  navbar / footer.
* website/README.md: rewrote the project-structure tree to match
  what's actually in the repo (data/, scripts/, layouts/docs/,
  Goldmark _markup/ hooks etc.) and documented the bun pipeline —
  `hugo server` alone leaves /fonts/* as 404s because the woff2
  files are gitignored and materialized by `bun run fonts:install`.
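
The api.html content-type fallback from the first Copilot item can be sketched in Python. The real renderer is a Hugo template; `pick_request_content` is a hypothetical name, and the input mirrors an OpenAPI `requestBody.content` map:

```python
def pick_request_content(content: dict) -> tuple:
    """Pick (content_type, media_object) to render for a request body."""
    if not content:
        return None, None
    if "application/json" in content:
        return "application/json", content["application/json"]
    # Dicts preserve insertion order, so this is the first listed type.
    first = next(iter(content))
    return first, content[first]


# Illustrative endpoint whose only content type is a file upload.
upload = {"multipart/form-data": {"schema": {"type": "object"}}}
print(pick_request_content(upload)[0])
```

Returning the chosen content type alongside the schema lets the caller label the rendered section with the type that was actually used.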

Marketing deploy on release publish:

`build-balena-disk-image.yaml` cuts the GitHub release with the
*.img.zst artefacts as its final step; until now the marketing site
only re-deployed on master push or manual dispatch, so
rpi-imager.json on the live site lagged the freshest disk images by however
long it took someone to push an unrelated website change. Hooking
deploy-website.yaml to `on: release: types: [published]` makes the
site rebuild as soon as the release exists, which is exactly when
the GitHub API starts surfacing the new assets the JSON generator
queries. `prerelease=true` releases are included because that's what
build-balena-disk-image.yaml currently flags every release as.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Address second round of Copilot review

* installation-options.md: balenaEthcher → balenaEtcher.
* balena-fleet-deployment.md: includa → include.
* developer-documentation.md: spash screen → splash screen.
* qa-checklist.md: enabling **Show splash screen** is supposed to
  *display* the splash, not hide it — flipped "is not being
  displayed" → "is being displayed". Also clickin → clicking.
* raspberry-pi5-ssd-install-instructions.md: `sudo apt update -y`
  doesn't need the flag (`apt update` never prompts for confirmation;
  `-y` only matters for install / upgrade), so it was noise in the
  copy-paste step. Dropped the `-y` from update; the full-upgrade
  line keeps it because that's where it actually does something.
* deploy-website.yaml: the jq required-keys check was missing
  `icon` and `website`, which build_pi_imager_json.py's
  REQUIRED_FIELDS already enforces in the Python tests. Added them
  so the runtime validation matches the generator's contract.
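
The required-keys check mentioned in the last item can be expressed as a small Python sketch. Only the four keys named in these commit messages are listed; the real REQUIRED_FIELDS in build_pi_imager_json.py may contain more, and `missing_keys` is a hypothetical helper:

```python
# Only these four keys are named in the commit messages; the real
# REQUIRED_FIELDS list may contain more.
REQUIRED_KEYS = ("name", "description", "icon", "website")


def missing_keys(entry: dict, required=REQUIRED_KEYS) -> list:
    """Return the required keys that are absent from one entry."""
    return [key for key in required if key not in entry]


# Illustrative entry, not real rpi-imager.json data.
entry = {"name": "Anthias", "description": "Digital signage", "icon": "icon.png"}
print(missing_keys(entry))
```

Checking each key individually is the same idea as the per-key `has($k)` test in the workflow: the validation passes only when every required key is present.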

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Address third round of Copilot review

* website/.gitignore: `_.log` was a typo from the original Hugo
  bootstrap — it doesn't match anything. Replaced with the intended
  `*.log` so log files are actually ignored.
* website/package.json: rewrote the `dev` script to capture both
  child PIDs and trap EXIT/INT/TERM so Ctrl-C (or hugo crashing)
  takes the Tailwind watcher down with it. Mirrors the pattern in
  the repo-root package.json's `dev`.
* docs/raspberry-pi5-ssd-install-instructions.md: "Early Pi 5's"
  → "Early Pi 5s" (no apostrophe on plurals).
* docs/qa-checklist.md: "make sure that the screen in standby mode"
  → "make sure that the screen is in standby mode" (missing verb).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Rewrite raspberry_pi_imager tests in pytest style

The file was unittest.TestCase classes — pytest discovers and runs
those, but the boilerplate doesn't earn its keep. Each test method
re-declared `@patch('...requests.get')` and rebuilt the same
MagicMock setup, and the per-board cases lived as 5+3+2+2 separate
methods that should have been one parametrize each.

Reworked as flat module-level functions backed by three fixtures:

* `mock_requests_get` — patches the module's `requests.get` and yields
  the mock so each test sets `return_value` / `side_effect` directly.
* `mock_release_assets` — preconfigured to return the canned release
  asset list, used by the `get_asset_list` cases.
* `mock_full_build` — wires up the three call shapes
  `build_imager_json()` makes (latest, asset list, per-asset json).

Per-board cases collapse into `@pytest.mark.parametrize`:
get_board_from_url's positive cases, the non-image-returns-None
cases, the maintenance-mode boards, and the modern boards.

Coverage is the same — 21 collected cases (pytest fans the parametrize
out from 12 test methods to 21 ids), all passing in 0.12s.
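
To illustrate the parametrize half of this rewrite, here is a hedged sketch. The `get_board_from_url` below is a hypothetical stand-in (the real function and the release asset naming scheme are not shown in the commit message), and the URLs are made up:

```python
import re

import pytest

# Hypothetical stand-in: the real get_board_from_url and the release
# asset naming scheme are not shown in the commit message.
BOARD_RE = re.compile(r"-(pi2|pi3|pi4-64|pi5)\.img\.zst$")


def get_board_from_url(url: str):
    """Return the board id embedded in an image URL, or None."""
    match = BOARD_RE.search(url)
    return match.group(1) if match else None


@pytest.mark.parametrize(
    ("url", "expected"),
    [
        ("https://example.com/anthias-pi3.img.zst", "pi3"),
        ("https://example.com/anthias-pi4-64.img.zst", "pi4-64"),
        ("https://example.com/anthias-pi5.img.zst", "pi5"),
        ("https://example.com/checksums.txt", None),
    ],
)
def test_get_board_from_url(url, expected):
    assert get_board_from_url(url) == expected
```

One test function with four parameter sets is collected as four test ids, which is how a small number of functions fans out into a larger number of collected cases.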

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Fix PR checks and Copilot review items

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 09:16:25 +01:00
Viktor Petersson
5e00c8ba25 refactor(docker): drop celery image, restore base apt layer dedup (#2776)
* refactor(docker): drop celery image, restore base apt layer dedup

- Delete Dockerfile.celery.j2; compose now runs celery on the
  anthias-server image with a `command:` override.
- Make viewer extend Dockerfile.base.j2 (mirroring test); drop 17
  packages duplicated between viewer and base_apt_dependencies, plus
  4 within-list duplicates.
- Move `# syntax=docker/dockerfile:1.4` to line 1 of every rendered
  Dockerfile. It previously lived in uv-builder.j2 line 1 and got
  bumped mid-file for server by the bun-builder prelude, silently
  disabling the 1.4 frontend and breaking cache-key parity with
  viewer — the actual blocker for layer dedup.
- Collapse CI matrix from (board × service) to (board) so all
  services for a board build on the same runner with the same
  buildkit cache, producing byte-identical apt layer digests at the
  registry.
- Add ENV DJANGO_SETTINGS_MODULE to the server image so the merged
  image runs both server and celery CMDs.
- Update all five compose templates (prod, balena prod, balena dev,
  dev, test) to redirect anthias-celery at the server image with a
  command: override. dev compose pins an explicit `image:` tag so
  both services share the locally-built SHA.
- Remove old anthias-celery / srly-ose-celery containers in
  upgrade_containers.sh so the recreated container can take the name.

Verified end-to-end on x86: server and viewer apt layers share a
single digest; SHARED SIZE jumps from 132 MB to 1.216 GB; merged
image runs both workloads in compose (celery task round-trips
through Redis to SUCCESS).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* perf(docker): cache buildkit layers in GHCR registry across CI runs

Add a --cache-backend / $BUILDX_CACHE_BACKEND option to
tools.image_builder with two modes:

- `local` (default): writes to /tmp/.buildx-cache/<board>/.
  Unchanged from before; right for local dev.
- `registry`: pushes BuildKit cache to
  ghcr.io/screenly/anthias-<service>:buildcache-<board>. Reuses the
  GHCR login already done by docker-build.yaml, no extra tokens or
  third-party actions needed.

Wire CI to use registry mode on push events (master) so subsequent
runs of the same board pull cached layers — the ~825 MB extracted
apt install per service goes from ~3 min cold to a few seconds
warm. workflow_dispatch on a non-master branch falls back to local
mode (effectively no-cache) so manual runs can't pollute the master
cache.

Drop the old actions/cache@v5 step that mirrored
/tmp/.buildx-cache/<board> through actions/cache — registry cache
is per-step rather than one big tarball, so it survives the GitHub
Actions cache 10 GB-per-repo eviction better.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(image-builder): move local cache out of /tmp to user XDG cache dir

SonarCloud python:S5443 flagged the previous /tmp/.buildx-cache/
default as a security hotspot — `/tmp` is world-writable, so on a
multi-user host another account could in principle tamper with the
buildkit cache. Switch to $XDG_CACHE_HOME/anthias-buildx/<board>/
(default ~/.cache/anthias-buildx/), which is per-user by default
and follows XDG Base Directory convention.
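
A minimal sketch of the path resolution described above. `local_cache_dir` is a hypothetical name; the actual code in tools/image_builder may be structured differently:

```python
import os
from pathlib import Path


def local_cache_dir(board: str, env=os.environ) -> Path:
    """Resolve the per-user BuildKit cache directory for a board."""
    # XDG Base Directory spec: fall back to ~/.cache when the variable
    # is unset or empty.
    base = env.get("XDG_CACHE_HOME") or str(Path.home() / ".cache")
    return Path(base) / "anthias-buildx" / board


print(local_cache_dir("pi5", env={"XDG_CACHE_HOME": "/home/dev/.cache"}))
```

Passing `env` explicitly keeps the function easy to test without mutating the process environment.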

CI is unaffected: docker-build.yaml uses --cache-backend=registry
on push events, which pushes cache to GHCR and never touches the
local path. Local dev users with stale state in
/tmp/.buildx-cache/<board>/ can rm it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(docker): correct cache-backend comments to match real behavior

Two doc fixes per Copilot review on #2776:

- tools/image_builder/__main__.py: the cache-backend rationale
  block still referenced /tmp/.buildx-cache/<board>; update to
  $XDG_CACHE_HOME/anthias-buildx/<board> so it matches the
  implementation moved in 529a50e0.
- .github/workflows/docker-build.yaml: the env comment claimed
  pull-request builds read from the registry cache, but this
  workflow has no pull_request trigger — non-push runs are
  workflow_dispatch, which both falls through to local cache and
  skips `docker login ghcr.io`, so it has no GHCR auth at all.
  Rewrite the comment around the push / workflow_dispatch split
  the code actually implements.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(docker): address Copilot review on registry cache + test compose

- tools/image_builder/__main__.py: comment in the registry-cache
  branch said the cache namespace was "picked from the build's tag
  list", but the implementation hardcodes
  ghcr.io/screenly/anthias-{service}. Rewrite the comment to
  describe what the code actually does and call out the hardcode
  so a future namespace refactor doesn't silently break the cache.
- docker-compose.test.yml: anthias-celery had its own `build:`
  block pointing at Dockerfile.test, claiming "reuses the test
  image", but compose builds a separate image for each service
  even with identical context, defeating the dedup intent. Mirror
  the docker-compose.dev.yml pattern: pin anthias-test to an
  explicit `image: anthias-test:dev` tag and have anthias-celery
  reference the same tag with no `build:`. Also bind-mount the
  source into celery so it picks up code changes (matches
  anthias-test's existing volume).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(image-builder): read-only registry cache without --push

Per Copilot review: --cache-backend=registry previously tried to
push cache to ghcr.io/... regardless of --push, so a local invocation
without GHCR auth would fail mid-build with a confusing registry
error. Split the behavior:

- Reads (cache_from) are always set when registry mode is active —
  the anthias-* GHCR packages are public, so warm-starting off CI's
  cache without auth works and helps local dev.
- Writes (cache_to) only happen when --push is also set, since
  that's when the workflow has authenticated to GHCR. Without
  --push, log a yellow warning and skip cache_to.
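
The read/write split above could look roughly like this in Python. This is a sketch under assumptions: `cache_options` is a hypothetical helper, the returned dict keys simply mirror the cache_from / cache_to names used above, and `mode=max` is illustrative rather than taken from the real builder:

```python
def cache_options(service: str, board: str, push: bool) -> dict:
    """Build cache_from / cache_to settings for registry cache mode."""
    ref = f"ghcr.io/screenly/anthias-{service}:buildcache-{board}"
    # Reads are always enabled: the cache packages are public.
    options = {"cache_from": f"type=registry,ref={ref}"}
    if push:
        # Assumption: mode=max is illustrative; the real exporter
        # options are not shown in the commit message.
        options["cache_to"] = f"type=registry,ref={ref},mode=max"
    else:
        print("warning: --push not set, skipping cache_to")
    return options


print(cache_options("server", "pi5", push=False))
```

Gating only the write side on `push` means an unauthenticated local build can still warm-start from the CI cache without ever attempting a registry write.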

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(docker): set DJANGO_SETTINGS_MODULE in test image for celery worker

Per Copilot review on #2776 (suppressed-due-to-low-confidence note,
but the bug is real): docker-compose.test.yml runs the celery
worker from anthias-test:dev. celery_tasks.py calls django.setup()
at module import time, which needs DJANGO_SETTINGS_MODULE in the
environment. The pre-refactor Dockerfile.celery.j2 set it
explicitly; this PR moved that ENV to Dockerfile.server.j2 only,
so the production celery (running on the server image) is fine but
the test celery would have crashed with ImproperlyConfigured.

Set the same ENV in Dockerfile.test.j2. Server and test images
both ship a usable Django environment for any process that imports
anthias_django.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 15:21:43 +01:00
Viktor Petersson
7476a43b27 chore: drop wifi-connect service end-to-end (#2763)
The anthias-wifi-connect captive-portal helper has been pinned to
balena-os/wifi-connect v4.11.1 (Feb 2023) for ~3 years; upstream
dropped the ARMv6 binary back in v4.4.6 so Pi 1 was silently
shipping a wifi-connect container with no binary inside, and the
host script `bin/start_wifi_connect.sh` had a `set -e`-vs-`$?` bug
that made the captive-portal branch unreachable. nmcli/nmtui covers
the supported install path.

Removing the whole service rather than bumping it: there are no
production users left and bumping would require rewriting both the
architecture-to-asset matcher (Rust target triples now) and the
unzip step (tar.gz now).

Removed
- Container build:  docker/Dockerfile.wifi-connect[.j2],
                    `wifi-connect` group in pyproject.toml + uv.lock,
                    `wifi-connect` entry in image_builder SERVICES,
                    `get_wifi_connect_context()`,
                    `wifi-connect` cell in CI matrix +
                    docker-build.yaml retag SERVICES list.
- Compose:          `anthias-wifi-connect` service from prod / balena
                    / balena-dev templates, plus the now-unused
                    `host.docker.internal:host-gateway` extra_hosts
                    on `anthias-viewer`.
- Helper scripts:   bin/start_wifi_connect.sh,
                    start_wifi_connect_service.sh,
                    send_zmq_message.py.
- Viewer plumbing:  the second ZmqSubscriber bound to
                    host.docker.internal:10001, the
                    `viewer-subscriber-ready` Redis flag, the
                    `setup_wifi` / `show_splash` / `show_hotspot_page`
                    handlers and their entries in the `commands`
                    dict, the `mq_data` / `load_screen_displayed`
                    globals, and the now-unused `redis_connection`
                    parameter on `ZmqSubscriber`.
- Server:           `/hotspot` URL route, `views_files.hotspot`,
                    `HOTSPOT_FILE` / `INITIALIZED_FLAG` constants,
                    `HotspotViewTest`, templates/hotspot.html,
                    static/img/wifi-off.svg, /data/hotspot dir
                    creation in bin/start_viewer.sh.
- Host:             sudoers entry for /usr/local/sbin/wifi-connect,
                    ansible/roles/network template + vars.
- Docs:             docs/wifi-setup.md, the Wi-Fi Setup section and
                    container row in docs/README.md, the
                    wifi-connect.service line and stale
                    `initialized` flag bullet in
                    docs/developer-documentation.md, the
                    "Reset Wi-Fi → hotspot page" step in
                    docs/qa-checklist.md.

Migration paths kept (intentional)
- bin/upgrade_containers.sh now runs `docker rm -f` on
  anthias-wifi-connect and srly-ose-wifi-connect alongside the
  existing nginx/websocket cleanup, so on next pull devices drop
  the stale container.
- ansible/roles/network/tasks/main.yml stops, disables, and
  removes /etc/systemd/system/wifi-connect.service, then notifies
  a new `Reload systemd daemon` handler. Idempotent on fresh
  installs.

Verified
- `ruff check` + `ruff format --check`: clean.
- Strict `mypy .` (django-stubs + drf-stubs plugins): 97 files,
  0 issues.
- `ansible-lint ansible/`: passes at the `production` profile.
- All three compose templates render and parse via
  `docker compose config`.
- `python -m tools.image_builder --dockerfiles-only` generates
  the remaining 5 services with no Dockerfile.wifi-connect
  produced.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 09:40:41 +01:00
Viktor Petersson
f421130b24 refactor(server): collapse nginx + websocket containers into uvicorn (#2757)
* refactor(server): collapse nginx + websocket containers into uvicorn

Replace the nginx + gunicorn + gevent-websocket trio with a single
uvicorn ASGI server inside `anthias-server`:

* HTTP, /static/, /anthias_assets/, /static_with_mime/, and /hotspot
  are now served from Django (WhiteNoise + small file-serving views in
  `anthias_app/views_files.py` that re-implement nginx's IP allowlists).
* WebSockets move from a separate gevent process talking ZMQ to Django
  Channels with a Redis-backed channel layer, fanned out by celery via
  `channel_layer.group_send`.
* TLS termination is handled by uvicorn directly when SSL_CERTFILE /
  SSL_KEYFILE are set; `bin/enable_ssl.sh` now writes a compose
  override (no longer ansible) and a companion `bin/disable_ssl.sh`
  removes it. Cert + key live under `~/.anthias/ssl/`.
* `bin/upgrade_containers.sh` removes the legacy `anthias-nginx` and
  `anthias-websocket` containers on upgrade so they don't linger.
* Drop `gunicorn`, `gevent`, `gevent-websocket`, and the `websocket`
  uv group from `pyproject.toml`; add `channels`, `channels-redis`,
  `daphne`, `uvicorn[standard]`, and `whitenoise`.

Notes on hardening: `--forwarded-allow-ips` defaults to off so the IP
allowlist can't be bypassed via a spoofed `X-Forwarded-For`; operators
behind a reverse proxy can opt in via the `FORWARDED_ALLOW_IPS` env
var. Backup uploads previously sized by nginx's `client_max_body_size
4G` are preserved by setting `DATA_UPLOAD_MAX_MEMORY_SIZE = None`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: address review feedback on uvicorn migration

* Drop USE_X_FORWARDED_HOST (inconsistent with the deliberate
  --forwarded-allow-ips hardening; without a proxy, X-Forwarded-Host is
  client-controlled).
* Remove daphne — uvicorn runs production and the test environment now
  uses it too (bin/prepare_test_environment.sh).
* Replace _safe_join's parents-membership check with Path.is_relative_to.
* Drop AllowedHostsOriginValidator wrapper (no-op under ALLOWED_HOSTS=['*'])
  and document where to put it back if hosts are ever locked down.
* Rename DOCKER_CIDR → DOCKER_BRIDGE_CIDR with a comment that this is
  defense-in-depth, not a real perimeter (LAN clients via the published
  port also appear in 172.16/12).
* Add anthias_app/tests.py covering the IP allowlists, mime override,
  hotspot gating, and traversal/symlink rejection in _safe_join (17 tests).
* Note the single-worker ZmqPublisher bind constraint in start_server.sh
  so a future scale-up doesn't EADDRINUSE on tcp://0.0.0.0:10001.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(security): clear SonarCloud hotspots on uvicorn migration

* Restrict views_files.anthias_assets / static_with_mime / hotspot to
  GET via @require_GET (Sonar S3752, x3): they are read-only file
  servers and should reject other methods at the view boundary.
* Mark RFC1918 / Docker-bridge CIDR literals as NOSONAR S1313 (x4):
  they are intentional, well-known private network ranges.
* Mark `http://*` in CSRF_TRUSTED_ORIGINS as NOSONAR S5332 with a
  comment explaining devices ship over HTTP and operators opt into TLS
  via bin/enable_ssl.sh.

Existing 17 view tests continue to pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: clear remaining static-analysis findings

* ruff format: tests.py from the previous commit has been reformatted,
  so CI's `ruff format --check` now passes.
* CodeQL py/path-injection on _safe_join: rewrite using
  os.path.realpath + os.path.commonpath, which CodeQL recognises as a
  sanitiser for path-injection sinks. Behaviour is identical to the
  Path.is_relative_to version (both reject `..` and symlink escapes;
  the 17 tests in anthias_app/tests.py still pass).
* SonarCloud NOSONAR markers: switch to the codebase's bare `# NOSONAR`
  form (matches host_agent.py and tests/test_backup_helper.py); the
  earlier `# NOSONAR <rule>` form was not being honoured.
* Centralise the test-fixture IPs in module-level constants so S1313
  is suppressed in one place rather than at every callsite.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(security): inline path-injection check in views

CodeQL only treats os.path.commonpath as a sanitiser when the check
sits in the same function as the file-system sink — calling
_safe_join() from a separate function still leaves the open()/isfile()
sinks tainted (4 alerts on PR #2757).

Repeat the realpath + commonpath check inline in anthias_assets and
static_with_mime so CodeQL can prove the post-check path stays under
the configured root. _safe_join is kept for the SafeJoinTest unit
tests and as a documented helper.

Existing 17 tests in anthias_app/tests.py continue to pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(security): use realpath+startswith path sanitiser for CodeQL

CodeQL's path-injection model recognises the canonical
`realpath(...).startswith(base + sep)` pattern but apparently not
`os.path.commonpath(...) == root` in this codepath. Switch the inline
check in anthias_assets and static_with_mime to startswith so the
analyser can prove the post-check path stays under the configured
root.

Behaviour is identical: traversal and symlink-escape still 404
(verified by SafeJoinTest + view tests).
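
For reference, the `realpath(...).startswith(base + sep)` pattern named
above can be sketched as below. This is a minimal illustration, not the
code in views_files.py: the function name `resolve_under_root` and its
None-on-reject convention are assumptions made for the example.

```python
import os
from typing import Optional


def resolve_under_root(root: str, relative: str) -> Optional[str]:
    """Return the real path of root/relative, or None if it escapes root.

    Hypothetical helper for illustration only.
    """
    base = os.path.realpath(root)
    # realpath collapses '..' segments and follows symlinks, so the
    # prefix check below runs against the final on-disk location.
    candidate = os.path.realpath(os.path.join(base, relative))
    # Appending os.sep stops '/srv/assets-evil' from matching '/srv/assets'.
    if not candidate.startswith(base + os.sep):
        return None
    return candidate
```

A view would map a None result to a 404, which matches the behaviour
described above for traversal and symlink-escape attempts.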

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: address Copilot review feedback

* lib/utils.py imported channels/asgiref at module level. The viewer
  container imports lib.utils via viewer/__init__.py but its uv
  dependency group does not ship channels, so the viewer would
  ImportError on startup. Move the channels imports into
  YoutubeDownloadThread.run() (server/celery-only path) so lib.utils
  remains importable from the viewer.
* Drop the unused _safe_join() helper and its three SafeJoinTest
  cases — the views inline a realpath+startswith sanitiser (CodeQL
  needs the check in the same function as the sink), and the helper
  was only being exercised in isolation. Add an equivalent
  symlink-escape test against anthias_assets so the actual code path
  used by the views is covered.
* Refresh the anthias_django/settings.py docstring + Django doc URLs
  from /3.2/ → /4.2/ to match the pinned Django version.

15 view tests pass (was 17 — lost 3 SafeJoinTest + gained 1 symlink
test against the real view).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: refresh architecture diagram for uvicorn migration

Drop the anthias-nginx and anthias-websocket nodes (and their edges)
from docs/d2/anthias-diagram-overview.d2 — the user now talks
directly to anthias-server (uvicorn handling HTTP + /ws), Celery
fans out asset-update events through the Redis-backed Channels
layer, and the viewer fetches media from anthias-server over HTTP.

Regenerate the SVG with d2 v0.7.1.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: address Copilot SSL + CSRF / WS-origin feedback

* Dual uvicorn listeners when SSL is enabled (Copilot #1, #2). HTTP on
  $HTTP_PORT (default 8080) for inter-container traffic — viewer +
  webview hit anthias-server over plain HTTP on the Docker network and
  cannot validate uvicorn's self-signed cert. HTTPS on $HTTPS_PORT
  (default 8443) for external clients. bin/enable_ssl.sh now appends
  443:8443 to the compose ports list (instead of using `!override` to
  swap 80:8080 for 443:8080), so port 80 stays available for backward
  compatibility and the Docker-network HTTP port keeps working.
* Drop CSRF_TRUSTED_ORIGINS = ['http://*', 'https://*'] (Copilot #3).
  Verified via Django shell: those leading wildcards are ignored by
  Django 4.2 (only subdomain wildcards like https://*.example.com are
  honoured), so the setting was a no-op. Same-origin POSTs still pass
  through Django's built-in Origin/Host check.
* Re-add channels.security.websocket.AllowedHostsOriginValidator to
  the WebSocket router (Copilot #5). Currently a no-op under
  ALLOWED_HOSTS=['*'], but tightening ALLOWED_HOSTS later will now
  also tighten /ws.

Smoke test (dev + SSL override):
- HTTP  http://localhost:8000/      -> 200
- HTTPS https://localhost:8443/     -> 200
- HTTP  http://localhost:8443/      -> 000 (TLS-only, expected)
- internal http://localhost:8080/   -> 200
- 15 view tests still pass.

Note: Copilot #4 (Docker-bridge CIDR is bypassable via the published
port) is documented in views_files.py as defense-in-depth and matches
the original nginx posture; switching to app-layer auth is out of
scope for this PR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* refactor(ssl): switch from in-uvicorn TLS to a Caddy sidecar

The previous SSL implementation gave anthias-server two uvicorn
listeners (HTTP + HTTPS) so the viewer/webview could keep talking
plain HTTP over the Docker network while external clients got TLS.
That dual-listener dance is non-zero overhead and complicates signal
handling. Switch to the standard reverse-proxy pattern instead.

When SSL is enabled by bin/enable_ssl.sh:

* anthias-server stays a single uvicorn listener on plain HTTP 8080
  (no SSL_CERTFILE/SSL_KEYFILE knobs, no dual-port logic).
* A Caddy sidecar (caddy:2-alpine, only present when the override is
  installed) terminates TLS on host port 443, redirects 80→443, and
  reverse-proxies to anthias-server:8080 — so X-Forwarded-Proto /
  X-Forwarded-For are forwarded as-is by Caddy.
* The override removes anthias-server's external port mapping
  (`ports: !override []`), so all external traffic must enter through
  Caddy and the IP allowlists in views_files.py see the original LAN
  client IP rather than the docker-bridge gateway. Inter-container
  traffic is unchanged.
* `FORWARDED_ALLOW_IPS=*` is set on anthias-server in the override —
  safe because anthias-server is no longer reachable from outside the
  Docker network — and `SECURE_PROXY_SSL_HEADER` is added in Django
  settings so request.is_secure() returns True for HTTPS callers.
* When SSL is *not* enabled there is zero new container, zero new
  config — the base compose file is untouched and Caddy isn't pulled
  or run.

bin/disable_ssl.sh now also removes the anthias-caddy container
before deleting the override, so HTTPS-only state is fully reversed.

Smoke-tested with a temporary Caddy override:
- HTTPS via Caddy:        200
- HTTP via Caddy:         301 → https://...
- Direct anthias-server:  refused (port mapping dropped by override)
- WebSocket upgrade:      101 Switching Protocols
- request.is_secure() with X-Forwarded-Proto=https: True
- 15 anthias_app view tests still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs(views_files): document IP-allowlist threat model

Spell out exactly when the docker-bridge CIDR check is and isn't a
real perimeter:

* No-SSL default: anthias-server is published as 80:8080, so requests
  arrive with REMOTE_ADDR set to the docker bridge gateway (172.x) and
  LAN clients aren't actually excluded. Trying to plug the gap with
  auth would be security theatre — credentials would travel in
  plaintext over the LAN anyway.
* SSL via the Caddy sidecar: Caddy terminates TLS, rewrites
  X-Forwarded-For, uvicorn honours it (FORWARDED_ALLOW_IPS=*), and the
  check sees the real client IP — so the bypass is closed for any
  deployment that actually cares about confidentiality.

This is documentation only; no behavioural change.
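
The allowlist check discussed above can be sketched with the standard
`ipaddress` module. The network list and the name `client_allowed` are
assumptions for illustration; only the 172.16/12 range is taken from
this PR, and the real views may allow a different set.

```python
import ipaddress

# Assumed allowlist: the Docker bridge range mentioned in this PR plus
# loopback. Treat both entries as placeholders.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("127.0.0.0/8"),
]


def client_allowed(remote_addr: str) -> bool:
    """Return True when remote_addr falls inside an allowed network."""
    try:
        ip = ipaddress.ip_address(remote_addr)
    except ValueError:
        # Anything that does not parse as an IP address is rejected.
        return False
    return any(ip in network for network in ALLOWED_NETWORKS)
```

In the no-SSL default, every LAN request reaches the view with the
bridge gateway as REMOTE_ADDR, so a check like this passes for all of
them. It only filters real client addresses once the proxy's
X-Forwarded-For is honoured.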

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(ssl): add --domain (auto Let's Encrypt) + drop openssl shim

bin/enable_ssl.sh now has three modes instead of two:

* Default (no args) — Caddy issues per-SNI certs lazily from its
  built-in local CA via `tls internal { on_demand }`. Drops the
  openssl self-signed-cert generation step entirely; Caddy persists
  the CA in the anthias-caddy-data volume and rotates leaf certs
  itself. Browsers still warn (CA is local) but no openssl/cert
  hygiene is needed on the host.

* `--domain example.com [--email you@example.com] [--staging]` —
  Caddy auto-issues + renews from Let's Encrypt. Caddy auto-creates
  the HTTP→HTTPS redirect for hostname sites. Use `--staging` to point
  at the ACME staging endpoint while testing, so the production rate
  limits aren't burned.

* `--cert /path/to/cert.pem --key /path/to/key.pem [--domain ...]` —
  unchanged: bring your own cert, Caddy serves it as-is with
  `auto_https off`.

Verified:
- All three Caddyfiles pass `caddy validate`.
- Default mode end-to-end: HTTPS=200 with cert from "Caddy Local
  Authority - ECC Intermediate", per-SNI SANs (DNS:localhost,
  IP Address:192.168.99.99 etc.), HTTP→HTTPS=301, /ws upgrade=101,
  anthias-server's external port mapping is dropped so direct access
  is refused.

Docs (CLAUDE.md, docs/README.md, docs/developer-documentation.md)
updated to describe the Caddy sidecar instead of in-uvicorn TLS.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: address self-review findings on PR #2757

* Gate SECURE_PROXY_SSL_HEADER on FORWARDED_ALLOW_IPS
  (anthias_django/settings.py): without the gate, a client on a
  plain-HTTP deploy could send `X-Forwarded-Proto: https` and flip
  `request.is_secure()`. Django reads the header from META directly,
  independent of uvicorn's --proxy-headers flag, so the previous
  unconditional setting was actually exploitable in non-SSL mode
  (secure-cookied sessions would drop on the next plain-HTTP request,
  redirects would point at https:// URLs that don't exist).

  Verified live: non-SSL → SECURE_PROXY_SSL_HEADER is None and
  is_secure() with spoofed XFP=https returns False; SSL via Caddy
  override → header is set and is_secure() returns True.

* Replace the isfile() pre-check + open() in anthias_assets and
  static_with_mime with a try/except FileNotFoundError around open()
  (anthias_app/views_files.py). Eliminates a (tiny but real) TOCTOU
  window between the stat and the open. IsADirectoryError handled
  too, since `realpath('/dir/')` resolves to the directory and open()
  would otherwise 500.

* Comment FORWARDED_ALLOW_IPS=* assumption in bin/enable_ssl.sh: the
  wildcard is only safe because the override drops anthias-server's
  external port mapping, so any future edit that re-adds a host:port
  publication has to either tighten the wildcard to Caddy's IP/CIDR
  or unset it.

* Replace ANSI-C escape sequences in the Caddyfile generator with
  plain multi-line strings. `read -r -d ''` was the first attempt
  but it strips trailing newlines, which collapsed `auto_https off`
  onto the same line as `}` in cert mode. Multi-line literals with
  echo "$VAR" are unambiguous and Caddy validates all three modes
  cleanly again.

* Add a docker-volume cleanup hint to bin/disable_ssl.sh: Caddy's
  local CA persists in anthias_anthias-caddy-data so an enable →
  disable → enable cycle reuses the same CA (intentional — browsers
  that trusted it stay trusted), and operators who want a fresh CA
  now have the exact `docker volume rm` command in the script's
  output.

15 view tests still pass; default + SSL Caddyfiles still validate;
default + SSL endpoints still return 200 / 301 / 101 in smoke tests.
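
The SECURE_PROXY_SSL_HEADER gate from the first bullet could look
roughly like this in a settings module. It is a sketch under
assumptions: the helper name `proxy_ssl_header` is hypothetical, and
treating any non-empty FORWARDED_ALLOW_IPS as "a proxy is trusted" is a
guess at the condition, not a copy of anthias_django/settings.py.

```python
import os
from typing import Mapping, Optional, Tuple


def proxy_ssl_header(
    environ: Mapping[str, str] = os.environ,
) -> Optional[Tuple[str, str]]:
    """Trust X-Forwarded-Proto only when a proxy has been opted into.

    Hypothetical helper; the gating condition is an assumption.
    """
    if environ.get("FORWARDED_ALLOW_IPS", "").strip():
        # Django expects a (META key, expected value) tuple here.
        return ("HTTP_X_FORWARDED_PROTO", "https")
    # None is Django's default: the header is ignored entirely.
    return None


SECURE_PROXY_SSL_HEADER = proxy_ssl_header()
```

With the setting left at None on plain-HTTP deploys, a spoofed
`X-Forwarded-Proto: https` no longer changes `request.is_secure()`.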

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: address Copilot's host/MIME hardening feedback

Two security tightenings on top of the prior SECURE_PROXY_SSL_HEADER
gate (which Copilot flagged on a stale snapshot — that one's already
fixed in 07b784b9):

* `ALLOWED_HOSTS` is now driven by the `ALLOWED_HOSTS` env var, with
  `*` kept as the default so flexible LAN-by-IP / mDNS access still
  works out of the box. Operators on hardened LANs can opt into a
  strict allowlist (`ALLOWED_HOSTS=192.168.1.50,anthias.local,...`)
  to defend against DNS-rebinding without us guessing the right set
  of hostnames at install time. Verified the env override parses to
  `['192.168.1.50', 'anthias.local', 'localhost']`.

* `static_with_mime` now allowlists the `?mime=` query param against
  a small set of download-only types
  (`application/{gzip,octet-stream,x-gzip,x-tar,x-tgz,zip}`) instead
  of accepting whatever the caller sends. Closes the XSS footgun
  where `?mime=text/html` would have served a stored file as HTML.
  The frontend's only legitimate caller (the backup download) sends
  `application/x-tgz`, which is in the allowlist; anything else
  falls back to mimetypes.guess_type. Added
  `test_mime_override_rejects_html` to lock that behaviour in.
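
A minimal sketch of the `?mime=` allowlist described in the second
bullet. The six types come from the list above; the function name
`pick_content_type` and the final `application/octet-stream` fallback
for unknown extensions are assumptions for the example.

```python
import mimetypes
from typing import Optional

# Download-only types listed in this commit message.
DOWNLOAD_MIME_TYPES = frozenset({
    "application/gzip",
    "application/octet-stream",
    "application/x-gzip",
    "application/x-tar",
    "application/x-tgz",
    "application/zip",
})


def pick_content_type(requested: Optional[str], filename: str) -> str:
    """Honour the caller's MIME override only if it is allowlisted."""
    if requested in DOWNLOAD_MIME_TYPES:
        return requested
    # Anything else is ignored and the type is guessed from the name.
    guessed, _encoding = mimetypes.guess_type(filename)
    # Assumed last-resort default when the extension is unknown.
    return guessed or "application/octet-stream"
```

For the backup download, `pick_content_type("application/x-tgz",
"backup.tar.gz")` keeps the requested type, while a `text/html` override
on the same file is discarded.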

16 view tests pass; ruff clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 12:51:40 +01:00
Viktor Petersson
3c96b541a1 refactor: rename legacy 'screenly' dirs to 'anthias' with auto-migration (#2753)
* refactor: rename legacy 'screenly' dirs to 'anthias' with auto-migration

For legacy reasons the host directories storing the cloned repo, user
assets, and config + DB still carried the old 'screenly' name. Rename
all three to their 'anthias' equivalents, plus the in-container paths,
the screenly.db / screenly.conf filenames, /tmp/screenly.watchdog,
/etc/sudoers.d/screenly_overrides, the ansible role, and the nginx URL
location. Existing installations are migrated automatically:

  ~/screenly/         -> ~/anthias/
  ~/screenly_assets/  -> ~/anthias_assets/
  ~/.screenly/        -> ~/.anthias/
    screenly.db   -> anthias.db
    screenly.conf -> anthias.conf  (paths rewritten in the body)
  /etc/sudoers.d/screenly_overrides -> /etc/sudoers.d/anthias_overrides

Migration is driven by two new helpers:

  - bin/migrate_legacy_paths.sh: idempotent host-side rename. Self-relocates
    if invoked from inside the dir being renamed. Rewrites both relative
    and absolute path values inside screenly.conf. Leaves dir-level
    back-compat symlinks at the old paths and file-level symlinks
    (screenly.db, screenly.conf) inside the migrated config dir so
    user automation / one-version downgrade still find familiar names.
  - bin/migrate_in_container_paths.sh: defensive /data/.screenly and
    /data/screenly_assets symlinks invoked from the container start
    scripts, in case an older docker-compose.yml is still mounting the
    legacy paths during a partial upgrade.
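
The idempotent rename-plus-symlink step can be sketched as follows. The
real helper is a shell script; this Python version only illustrates the
control flow, and the name `migrate_dir` and its return value are
assumptions.

```python
import os


def migrate_dir(old: str, new: str) -> bool:
    """Rename old to new once and leave a symlink at the old path.

    Returns True if a rename happened, False if there was nothing to do.
    Hypothetical illustration of the host-side migration.
    """
    # A symlink at the old path means a previous run already migrated;
    # a missing directory means a fresh install. Both are no-ops.
    if os.path.islink(old) or not os.path.isdir(old):
        return False
    # Never clobber an existing target directory.
    if os.path.exists(new):
        return False
    os.rename(old, new)
    # Back-compat link so automation using the legacy path keeps working.
    os.symlink(new, old)
    return True
```

Running it twice against the same pair of paths renames on the first
call and returns False on the second, which is the property the
idempotent-rerun test case checks for the shell helper.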

Wired into bin/install.sh (renames ~/screenly before clone_repo, then
runs the in-repo helper after) and bin/upgrade_containers.sh (runs the
helper near the top before regenerating docker-compose.yml).

Out of scope (intentional): the screenly/anthias-* Docker Hub namespace,
the Screenly/Anthias GitHub repo URLs, the screenly_ose Balena fleet,
api.screenlyapp.com / apt.screenlyapp.com legacy URLs, and brand URLs
in docs.

Tests: added tests/test_migrate_legacy_paths.py (4 cases: full migration,
absolute-path conf rewrite, idempotent rerun, fresh-install no-op) and
tests/test_backup_helper.py::RecoverLegacyTarballTest (recover() still
accepts pre-rename .tar.gz backups). Ruff clean. All 6 new tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* style: apply ruff format to new test files

CI's `ruff format --check` flagged tests/test_backup_helper.py and
tests/test_migrate_legacy_paths.py. Reformatted; behaviour unchanged,
6/6 migration-related tests still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test: suppress SonarCloud S5042 on write-mode tarfile.open in fixtures

The two new fixture-building calls in tests/test_backup_helper.py use
`tarfile.open(..., 'w:gz')` (write mode), which Sonar's python:S5042
rule flags as "expanding this archive file" without distinguishing
read from write. arcnames are hardcoded test inputs with no
path-traversal surface, so the warning is a false positive here.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: address Copilot review feedback

- lib/backup_helper.py: harden recover() against tar path traversal
  (Zip Slip / CVE-2007-4559). New _safe_tar_member() rejects absolute
  paths, '..' components, non-regular-non-directory members
  (symlinks/hardlinks/devices), members outside the allowed top-level
  dirs, and any post-normalisation path that escapes $HOME. Iterates
  members manually instead of bulk extractall(), and passes
  filter='data' on Python with PEP-706 extraction filters
  (3.11.4+/3.12+) for belt-and-suspenders defence.
- tests/test_backup_helper.py: BackupHelperTest now patches HOME to a
  per-test tmpdir so `tearDown` no longer rmtree's a real ~/anthias
  checkout when run on a developer workstation. Also added
  test_recover_skips_path_traversal_member, which proves a hostile
  tarball entry like `../evil.txt` is logged-and-skipped, not written
  outside $HOME.
- docs/raspberry-pi5-ssd-install-instructions.md: capitalise "This"
  after the period.
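
The member checks listed for lib/backup_helper.py can be sketched like
this. It is not the project's `_safe_tar_member()`: the names
`is_safe_member` and `extract_safely` are hypothetical, and the allowed
top-level directories are passed in by the caller because the commit
does not list them.

```python
import os
import tarfile
from pathlib import PurePosixPath


def is_safe_member(member: tarfile.TarInfo, dest: str, allowed_top: set) -> bool:
    """Apply the rejection rules from the commit message to one member."""
    name = member.name
    parts = PurePosixPath(name).parts
    if not parts or os.path.isabs(name) or ".." in parts:
        return False
    # Only regular files and directories; no links or device nodes.
    if not (member.isfile() or member.isdir()):
        return False
    if parts[0] not in allowed_top:
        return False
    # Final check on the normalised path, in case anything slipped by.
    dest_real = os.path.realpath(dest)
    target = os.path.realpath(os.path.join(dest_real, name))
    return target.startswith(dest_real + os.sep)


def extract_safely(archive_path: str, dest: str, allowed_top: set) -> None:
    """Extract members one by one, skipping anything unsafe."""
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar:
            if not is_safe_member(member, dest, allowed_top):
                continue
            if hasattr(tarfile, "data_filter"):
                # Interpreter ships PEP 706 extraction filters.
                tar.extract(member, dest, filter="data")
            else:
                tar.extract(member, dest)
```

Checking for `tarfile.data_filter` is the feature test the Python
documentation suggests for extraction-filter support, so the same code
runs on interpreters with and without the backport.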

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: add missing leading slash to repo dir heading

The heading for the cloned repo dir was rendered as
`home/${USER}/anthias/`, while every other heading in the section uses
absolute paths like `/home/${USER}/.anthias/`. Same fix applied to the
legacy-path mention in the note below it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 13:34:53 +01:00
Viktor Petersson
c7ec6ea771 chore(build): replace webpack, npm, and jest with bun (#2746)
* chore(deps): manage Python deps via uv dependency-groups

Replaces the six service-scoped requirements*.txt files with
PEP 735 dependency-groups in pyproject.toml and rebuilds every
Docker image as a two-stage build: a uv-builder stage (using the
official ghcr.io/astral-sh/uv image, with a pip fallback for
armv6) produces /venv via `uv sync --group <svc>`, which the
runtime stage copies in. uv.lock becomes authoritative for all
services. requirements/requirements.host.txt is kept as a
committed, auto-generated artifact (`uv export --group host`) so
bin/install.sh and the Ansible role keep working; a python-lint
CI step enforces it stays in sync.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(deps): bump Django, cryptography, pyOpenSSL, and 6 others

- Django 4.2.29 → 4.2.30 (latest 4.2 LTS)
- cryptography 3.3.2 → 46.0.7 (capped by pyOpenSSL 26's `cryptography<47`;
  cryptography 47 is incompatible with the latest pyOpenSSL)
- pyOpenSSL 19.1.0 → 26.0.0 (required by newer cryptography ABI —
  pyOpenSSL 19 crashed at import against cryptography ≥ ~3.4)
- requests 2.32.5 → 2.33.1 (aligned across every group, including
  docker-image-builder and local)
- pyasn1 0.6.2 → 0.6.3
- redis 7.1.0 → 7.4.0
- Cython 3.2.3 → 3.2.4
- sh 1.8 → 2.2.2 (major bump; usages in celery_tasks.py, bin/wait.py,
  lib/utils.py stick to the stable `sh.<cmd>` + `sh.ErrorReturnCode_N`
  API — verified still works)
- python-vlc 3.0.20123 → 3.0.21203

`mako` and `flatted` were requested but skipped: `mako` was already
removed from the project (9535745e), and `flatted` is an npm dep in
`package-lock.json`, not a Python dep.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(deps): bump wheel from 0.38.1 to 0.46.2

Closes Dependabot PR #2651.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix: adapt sh 2.x API changes in wait.py and viewer

Two real breakages uncovered by auditing every `sh.*` call site
against the sh 1.x → 2.x API:

- bin/wait.py: `sh.grep(sh.route(), 'default')` no longer pipes
  in sh 2.x — the inner command stringifies to its stdout and
  becomes a literal argument to grep, producing
  `grep '<route_output>' default` and an ErrorReturnCode_2. Use
  the idiomatic `sh.grep('default', _in=sh.route())` instead.

- viewer/__init__.py: `browser.process.alive` is gone in sh 2.x
  (`OProc` no longer exposes it). Use `browser.process.is_alive()[0]`;
  `is_alive()` returns an `(alive_bool, exit_code)` tuple.

Plus two review nits:
- Add trailing newline to docs/migrating-assets-to-screenly.md
- Use `diff -u` in the requirements.host.txt CI drift check so
  failures print a readable unified diff.

Verified against sh==2.2.2 inside the rebuilt server image:
- `sh.grep('default', _in=sh.echo('…'))` pipes correctly
- `cmd.process.is_alive()` → `(True, None)` while running,
  `(False, 0)` after wait()
- `cmd.process.stdout.decode('utf-8')` still works on `_bg=True`
  processes

83/83 unit tests + 12/12 integration tests still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(docker): serialize apt cache access with sharing=locked

The multi-stage uv-builder + runtime layout means two RUN steps can
race on BuildKit's shared `/var/cache/apt` cache mount. apt requires
an exclusive lock on /var/cache/apt/archives, so a concurrent
apt-get in the sibling stage causes the build to fail with
`E: Could not get lock /var/cache/apt/archives/lock`.

BuildKit's default cache mount sharing mode is `shared` (unrestricted
concurrent access). Switching to `sharing=locked` makes BuildKit
serialize access across stages, matching apt's locking model.

Discovered while cross-compiling `pi4-64` under QEMU, where the
slower emulated apt-get in stage 1 overlapped with the host-speed
apt-get in stage 2.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* ci: fix ansible-lint and sbom workflows

**ansible-lint** (broken since 2026-04-08, #2732):
- `ansible-community/ansible-lint-action@main` repo is gone (404),
  so every run failed with "Unable to resolve action".
- Rewrite the workflow to use setup-uv + `uv run ansible-lint` from
  a new `ansible-lint==26.4.0` entry in the `dev-host` dependency
  group — matches the uv-based pattern already used by
  `python-lint.yaml`.
- Add `.ansible-lint` config with a skip list covering 19
  pre-existing violations in `ansible/` roles
  (`var-naming[no-role-prefix]`, `risky-shell-pipe`, `no-free-form`)
  so the workflow can go green today; follow-up PRs should drive
  the skip list down.
- Extend the path triggers to fire on config, workflow, and lock
  changes — not just `ansible/**`.

**sbom** (broken since 2026-04-02):
- The `sbomify/github-action` renamed `SBOM_FILE` to `LOCK_FILE` for
  lockfile inputs. Every run has been failing with "`uv.lock` is a
  lock file, not an SBOM. Please use LOCK_FILE instead of SBOM_FILE."
- Rename both `SBOM_FILE` envs (`package-lock.json` and `uv.lock`)
  to `LOCK_FILE`.

Verified locally: `uv run ansible-lint ansible/` passes (0
failures, 0 warnings).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore(build): replace webpack, npm, and jest with bun

Collapses the JS toolchain to a single tool. Bun handles installs
(replacing npm), bundling via `bun build` + `sass` CLI (replacing
webpack + ts-loader + babel + mini-css-extract-plugin), and testing
via `bun test` (replacing jest + ts-jest + jest-fixed-jsdom). Dev/test
Dockerfiles pull the bun binary from the official `oven/bun` image via
`COPY --from=`; production uses `oven/bun` as a builder stage.

Removes 18 devDependencies and 5 config files; adds only `bunfig.toml`
and `@happy-dom/global-registrator`.

Drive-by fix: `FormData` was imported as a value from `@/types` in
two files but is a type-only interface shadowing the browser global.
Webpack+ts-loader silently erased it; Bun's bundler surfaced the bug.
Converted to `import type`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(docker): symlink bunx to bun in dev and test images

`bunx` is a symlink to `bun` in the official `oven/bun` image, so the
single-file `COPY --from=oven/bun:...-slim /usr/local/bin/bun` missed it.
Result: `bun run dev:css` / `bun run build:css` failed with
`bunx: command not found` inside dev and test containers.

Recreate the symlink after the copy. Production is unaffected because
its builder stage uses `FROM oven/bun` (bunx already present).
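
The symlink step described above can be sketched in plain shell. `link_bunx` is a hypothetical helper, not the repository's actual Dockerfile line. It takes the target directory as an argument so it can be tried on a scratch directory first.

```shell
# Sketch only: `link_bunx` is a hypothetical helper name.
# $1 is the directory that already holds the copied `bun` binary.
link_bunx() {
  bin_dir=$1
  # Refuse to create a dangling link if the binary was never copied.
  [ -e "$bin_dir/bun" ] || { echo "no bun binary in $bin_dir" >&2; return 1; }
  # Relative target, so the link stays valid if the directory is moved.
  ln -sf bun "$bin_dir/bunx"
}
```

Inside an image this would presumably run against `/usr/local/bin` right after the `COPY --from=` step.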

Caught by full end-to-end build verification.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* ci: SHA-pin all external GitHub Actions

Addresses SonarCloud rule githubactions:S7637 ("Use full commit SHA
hash for this dependency") and brings the repo in line with the
hardened CI guidance from OpenSSF, CISA, and GitHub itself: tag refs
like @v7 or @master are mutable and can be retargeted by the action
owner or via compromise. Pinning to a full commit SHA removes that
supply-chain risk.

Every `uses:` reference to an external action across all 13 workflow
files is now pinned by SHA, with the original tag preserved as an
inline comment so the intent remains readable:

    uses: actions/checkout@de0fac2e45 # v6

Dependabot's github-actions ecosystem (already configured in
.github/dependabot.yml) recognises this `<SHA> # <tag>` format and
will update both the SHA and the comment together on future version
bumps, so we don't lose automated update coverage.

Scope: 21 distinct external actions and 73 total use sites across
ansible-lint, build-balena-disk-image, build-webview, codeql-analysis,
deploy-website, docker-build, generate-openapi-schema, javascript-lint,
lint-workflows, python-lint, sbom, and test-runner. Local workflow
references (./.github/workflows/...) left untouched.
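
One way to check for regressions is to scan workflow text for `uses:` lines that are not pinned to a full SHA. This is a minimal sketch, not part of the commit. `find_unpinned` is a hypothetical helper, and it assumes a pinned ref is exactly 40 lowercase hex characters after the `@`.

```shell
# Sketch only: `find_unpinned` is a hypothetical helper name.
# Reads workflow YAML on stdin and prints every `uses:` line whose ref is
# not a 40-character commit SHA. Local references (./...) are skipped.
# Exit status follows grep: 0 if unpinned lines were found, 1 if none.
find_unpinned() {
  grep -E '^[[:space:]-]*uses:[[:space:]]' \
    | grep -Ev 'uses:[[:space:]]+\./' \
    | grep -Ev '@[0-9a-f]{40}([[:space:]]|$)'
}
```

For example, `cat .github/workflows/*.yaml | find_unpinned` would list any remaining tag or branch refs, assuming the workflow files use the `.yaml` extension.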

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs,chore: address review feedback on bun migration

- Update CLAUDE.md and docs/developer-documentation.md to replace
  npm/webpack/jest references with bun equivalents. The old webpack
  ProvidePlugin bullet was superseded by tsconfig's react-jsx runtime;
  restate that.
- Add comments in setupTests.ts explaining (1) why Bun's native fetch
  is stashed and restored around happy-dom's GlobalRegistrator (so MSW
  can intercept) and (2) why testing-library is imported dynamically
  after registration (so `screen` binds to a live document.body).
- Narrow the production builder SCSS COPY back to `*.scss` and drop
  the unused `bunfig.toml` copy (it's only consumed by `bun test`).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(dev): fail-fast when a watcher crashes in `bun run dev`

`wait` without arguments waits for every background job and then
returns 0 regardless of how they exited, so a crashing JS or CSS
watcher could leave the script reporting success. Track each watcher's
PID, use `wait -n` to exit on the first failure, and kill the survivor
via a trap.
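
The fail-fast pattern can be sketched as follows. This is an illustration under stated assumptions, not the script from the repository. The two background commands are hypothetical stand-ins for the real watchers, and `wait -n` requires bash 4.3 or newer.

```shell
#!/usr/bin/env bash
# Sketch only: the two background commands are hypothetical stand-ins for
# the real JS and CSS watchers, not the project's actual commands.
run_dev() (
  sleep 30 &                     # stand-in: a watcher that keeps running
  js_pid=$!
  sh -c 'sleep 1; exit 3' &      # stand-in: a watcher that crashes
  css_pid=$!

  # Whatever happens next, stop any watcher that is still alive on exit.
  trap 'kill "$js_pid" "$css_pid" 2>/dev/null' EXIT

  # `wait -n` returns as soon as one background job exits and reports
  # that job's exit status.
  wait -n
  exit "$?"
)
```

Calling `run_dev` here returns after about one second with the crashing job's status instead of blocking on the healthy watcher. A watcher that exits cleanly also ends the run, with status 0.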

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-25 06:53:56 +01:00
Nico Miguelino
29ae072514 chore: replace Poetry with uv for managing host dependencies (#2611) 2025-12-16 05:03:27 -08:00
Nico Miguelino
89e6182871 chore: migrate to TypeScript (#2359) 2025-06-24 12:26:50 -07:00
nicomiguelino
9a55aa6cbc docs: add guide on how to get started with the admin site 2025-06-04 09:47:32 -07:00
Nico Miguelino
87fbddeacc chore(ci): replace NPM lint script with lint:check and lint:fix (#2301) 2025-05-28 09:52:19 -07:00
Nico Miguelino
51e4511bba feat: migrate to React (#2265) 2025-05-26 21:04:19 -07:00
Nico Miguelino
490051585f Replace flake8 with ruff (#2092) 2025-01-14 06:47:52 -08:00
Nico Miguelino
de804d4f06 chore: refactor the image builder script into multiple files (#2161) 2024-12-17 09:52:12 -08:00
nicomiguelino
2db1dc6b4d docs: update header title in the developer docs 2024-12-11 06:45:51 -08:00
nicomiguelino
4dc950b660 docs: remove unused how-tos 2024-12-05 14:39:18 -08:00
nicomiguelino
27dcf0e5fa docs: update documentation for building images in Pi or x86 2024-12-05 14:05:52 -08:00
Nico Miguelino
7dd6d49881 chore: update development mode scripts to containerize Poetry and other relevant dependencies (#2144) 2024-12-04 10:14:07 -08:00
Nico Miguelino
01d28d55ec chore: make use of Webpack for building CSS and JS files (#2127) 2024-11-15 11:17:08 -08:00
Nico Miguelino
c766045f3e chore: use multi-stage builds for server images in both development and production environments (#2117) 2024-11-08 21:59:42 -08:00
Nico Miguelino
f8749b123e chore(workflow): port the Docker image builder script to Python (#2060) 2024-11-07 06:04:32 -08:00
Nico Miguelino
f019fd6c27 docs: make use of alerts in Markdown files (#2110) 2024-11-04 14:11:03 -08:00
Nico Miguelino
d7132ab325 chore(tools): write a script for starting the development server (#2104) 2024-10-31 08:06:41 -07:00
Nico Miguelino
c8c86042f8 docs: update dev docs, specifically on running the Python linter (#2103) 2024-10-30 09:13:35 -07:00
Nico Miguelino
284c025db6 docs: update docs and scripts for setting up Anthias in dev mode (#2097) 2024-10-17 14:11:53 -07:00
Nico Miguelino
188e3993d0 Migrate web server back-end from Flask to Django (#2040) 2024-10-16 14:07:45 -07:00
Nico Miguelino
3b6ed15039 Remove docs for playing around with basic authentication via the command line (#2078) 2024-10-03 09:06:04 -07:00
Nico Miguelino
29d4c24fb2 chore(ci): replace --with with --only when installing specific deps (#2063) 2024-09-10 08:29:47 -07:00
Nico Miguelino
68d908fb8b tests: replaces nose attributes with unittest's skip (#2045) 2024-09-02 10:33:08 -07:00
Trickfilm400
157ec7b224 fix(docs): typo in developer-documentation.md (#2047) 2024-09-01 11:26:17 -07:00
Nico Miguelino
d5f5c63e1e Use Poetry for the Python linter (#2042) 2024-08-29 11:29:59 -07:00
Nico Miguelino
3ca8d4e2b0 Update documentation (#2037)
* docs: replace old info with up-to-date info
* docs: add follow-up info about the devices where the installer doesn't work
2024-08-27 07:04:03 -07:00
Nico Miguelino
99a38d69ee Upgrade containers to use Bookworm (#1980)
* upgrade containers from using Buster to Bookworm
* replace OMX with VLC
* update the Qt version, webview hash, and the webview download URL
* enable FKMS for Raspberry Pi 1, 2, 3 and 4 devices
2024-07-22 14:26:01 -07:00
Nico Miguelino
876ed0cf19 chore: renames screenly-host-agent to anthias-host-agent (#1957) 2024-07-04 18:50:54 -07:00
Nico Miguelino
2ff6cb889f Renames Plymouth files to include Anthias in the path names (#1958)
* chore: renames Plymouth files to include Anthias in the path names
* fix: change default theme from `screenly` to `anthias`
2024-07-04 17:16:44 -07:00
Nico Miguelino
d2b6c05811 Cleans up upgrade (via web UI) code (#1947)
* cleanup upgrade (via web UI) code
* rename `screenly.scss` and `screenly.css` files to `anthias.scss`
and `anthias.css`, respectively
* install Node.js dependencies for transpiling SASS files
* add NPM scripts for compiling SASS and CoffeeScript in development mode
2024-06-28 13:31:45 -07:00
Nico Miguelino
87e2d493ce Adds Python linting in CI (#1939) 2024-06-21 09:11:14 -07:00
nicomiguelino
5a82c77cf7 chore: cleanup code related to USB assets 2024-06-18 15:49:13 -07:00
Nico Miguelino
bb86fdc6be fix: Bump Chrome and ChromeDriver to latest stable version. (#1869)
* Fixes the unit test pipeline
2024-04-03 00:34:23 -07:00
nicomiguelino
1f0d6435cd Do another major docs overhaul. 2023-05-27 00:16:53 -07:00
nicomiguelino
e08531f601 Overhaul general docs and Wi-fi setup docs. 2023-05-26 13:30:43 -07:00
nicomiguelino
09b7fa2586 Add description for the NGINX component. 2023-05-25 16:31:57 -07:00
nicomiguelino
ec184fab9a Edit developer documentation. 2023-05-25 16:13:55 -07:00
nicomiguelino
3063f7e5e7 Modify diagram (via D2). 2023-05-24 23:54:14 -07:00
nicomiguelino
ab43191c96 Apply first steps in overhauling the repo's docs. 2023-05-23 22:53:10 -07:00
Viktor Petersson
c85dddd783 More release information 2022-12-06 18:08:19 +00:00
Viktor Petersson
7bd7c6485c Updates documentation 2022-12-06 17:57:25 +00:00
Emyll Almonte
c934e8a99b Update developer-documentation.md 2020-12-24 01:18:15 -05:00
Viktor Petersson
b00c73c05c Merge branch 'master' into experimental
# Conflicts:
#	balena.yml
#	bin/start_balena.sh
#	docker-compose.yml
#	docker/Dockerfile.base
#	docker/Dockerfile.server.template
#	docker/Dockerfile.viewer.template
#	docker/Dockerfile.websocket.template
#	docs/developer-documentation.md
2020-12-16 17:47:18 +00:00
Emyll Almonte
ff99090302 Update developer-documentation.md 2020-11-26 17:42:17 -05:00
Emyll Almonte
1126d99707 Update developer-documentation.md 2020-11-26 17:34:41 -05:00