LocalAI

mirror of https://github.com/mudler/LocalAI.git synced 2026-06-14 19:58:44 -04:00

Author	SHA1	Message	Date
Richard Palethorpe	73385713ca	feat(distributed): enforce registration token for worker file transfer (#10183)
The worker HTTP file-transfer server is authenticated by the registration token via checkBearerToken, which fails open on an empty token: every /v1/files, /v1/files-list and /v1/backend-logs request is then served unauthenticated, granting read/write to the worker's models/staging/data directories. The fail-open was also silent (the only auth log sat on the unreachable reject branch), and the worker process never runs DistributedConfig.Validate(), so the existing frontend warning did not cover the component that exposes the server.
Mirror the NatsRequireAuth pattern: keep anonymous as the default but make it loud and opt-in enforceable.
- Log a prominent warning when the file-transfer server starts tokenless.
- Add LOCALAI_REGISTRATION_REQUIRE_AUTH: DistributedConfig.Validate() errors on an empty token (frontend) and the worker refuses to start (fail-fast, before registration), so production can fail closed. Also satisfies the F-003 suggestion to fail Validate() on distributed + empty token.
- Add LOCALAI_DISTRIBUTED_REQUIRE_AUTH umbrella switch implying both RegistrationRequireAuth and NatsRequireAuth: one production knob locking down the registration/file-transfer layer and the NATS bus together; the granular flags remain available as single-layer overrides. Wired into the frontend, supervisor worker, and agent worker (vLLM worker has neither a NATS connection nor a file-transfer server, so it is left untouched).
- Document in distributed-mode.md (warning callout + flag tables).
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Richard Palethorpe <io@richiejp.com>	2026-06-05 14:34:28 +02:00
Richard Palethorpe	3a932a9803	feat(distributed): Add NATS JWT authentication and TLS/mTLS options (#10159)
* feat(distributed): NATS JWT auth, TLS/mTLS options, and e2e coverage
Mint per-node NATS user JWTs at registration when LOCALAI_NATS_ACCOUNT_SEED is set, and connect workers with scoped credentials from the register response. Add optional LOCALAI_NATS_TLS_CA/CERT/KEY for private CA and mTLS alongside tls:// URLs, plus test-e2e-distributed and NatsJWT container e2e specs. Document JWT setup (nats-auth-setup.sh) and TLS env vars in distributed-mode.
Assisted-by: Grok:grok grok-build
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* fix(distributed): correct NATS JWT scoping and harden client auth
The JWT-auth path added in 46467cc7 had several gaps that fail silently under LOCALAI_NATS_REQUIRE_AUTH:
- Agent-worker minted JWTs did not allow the subjects the agent worker actually subscribes to (jobs.mcp-ci.new and nodes.<id>.backend.stop), so MCP-CI jobs and backend-stop session cleanup were silently dropped. Scope the agent permission set to those subjects.
- NATS subscription permission violations were swallowed (Subscribe returned a live-but-dead subscription). Confirm subscriptions with a server round-trip so a denial surfaces synchronously, and log async permission errors.
- The backend worker connected anonymously when given a JWT without its paired seed; reject the unpaired credential instead.
- The documented service-user permissions in nats-auth-setup.sh omitted prefixcache.>, which the frontend publishes and subscribes; add it.
Also: add a credential-provider hook to the messaging client (consumed by the follow-up credential-lifecycle change), drop the always-nil error from NatsMessagingOptions, run go mod tidy (jwt/v2 and nkeys are now direct), and gofmt the feature's files.
Tests: an agent-JWT e2e spec that connects to the enforcing NATS server and exercises every subscription the agent worker makes, plus permission allow-list coverage unit tests.
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* feat(distributed): acquire and auto-refresh worker NATS credentials
Workers fetched NATS credentials once at startup, which broke two cases under JWT auth: a worker that registered while still pending admin approval never received a minted JWT (it connected unauthenticated and gave up), and a long-running worker's 24h JWT expired with no way to renew it.
Introduce workerregistry.NATSCredentialManager, built on idempotent re-registration (the frontend preserves the node row and mints a fresh JWT each call):
- Acquire re-registers through admin approval until the node is approved and credentials are minted (or returns the first success when auth is not required, preserving anonymous-NATS behavior).
- RefreshLoop re-registers before the JWT expires (~75% of its lifetime), updating the credentials served to the connection.
- Both are bounded (default 100 attempts / consecutive failures) and return an error on exhaustion, so an unapprovable or unrenewable worker exits non-zero and surfaces the problem instead of hanging or drifting toward an expired credential.
The messaging client gains WithUserJWTProvider, fetching credentials on each (re)connect so the connection transparently adopts a refreshed JWT when the server expires the old one. RegisterFull exposes the approval status and full response; Register delegates to it.
Both the backend worker and the agent worker are wired to this: explicit env credentials are used as-is, minted credentials are acquired-with-wait and refreshed, and a permanent refresh failure shuts the worker down so it restarts and re-acquires.
Tests cover Acquire (wait-through-pending, bounded give-up, context cancel), RefreshLoop (refresh-before-expiry, bounded failure, no-expiry exit) and jwtExpiry decoding. Docs updated in distributed-mode.md.
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Richard Palethorpe <io@richiejp.com>
---------
Signed-off-by: Richard Palethorpe <io@richiejp.com>	2026-06-03 19:43:56 +02:00
LocalAI [bot]	0d57957ebb	feat(worker): add LOCALAI_PREFETCH_MODELS for boot-time gallery prefetch (#10108)
In LocalAI distributed mode the master streams a model GGUF to a worker on first inference. On bandwidth-constrained cluster networks (libp2p circuit-v2 relays under NAT, double-NAT residential, slow overlays) that transfer can be slow or unreliable, while each worker's outbound internet is usually fine.
LOCALAI_PREFETCH_MODELS lets the operator name gallery model IDs to download at worker boot, BEFORE the worker subscribes to backend.install events. It reuses gallery.InstallModelFromGallery so the on-disk /models layout matches what the master would have pushed, and the master can still push files on demand if the gallery is unreachable at boot (prefetch is non-fatal on every error path).
The installer is wrapped in a function-value indirection so tests can swap in a fake without touching the real gallery; production never reassigns the binding.
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-05-31 12:22:45 +02:00
Richard Palethorpe	5d0b549049	feat(gallery): verify backend OCI images with keyless cosign (#9823)
* feat(gallery): verify backend OCI images with keyless cosign
Close a trust gap where a registry compromise or MITM could silently replace a backend image: the gallery YAML tells LocalAI which image to pull, but until now nothing verified that the bytes came from our CI.
Consumer (pkg/oci/cosignverify):
- New package using sigstore-go to verify keyless-cosign signatures.
- OCI 1.1 referrers API + new bundle format (no legacy :tag.sig).
- Policy fields: Issuer / IssuerRegex / Identity / IdentityRegex / NotBefore. NotBefore is the revocation lever: keyless Fulcio certs are ephemeral, so revocation is policy-side; advancing not_before in the gallery YAML invalidates every signature predating the cutoff.
- TUF trusted root cached process-wide, so N backends from one gallery do 1 fetch, not N.
Plumbing:
- pkg/downloader: ImageVerifier interface + WithImageVerifier option threaded through DownloadFileWithContext. Verification runs between oci.GetImage and oci.ExtractOCIImage, with digest pinning via pinnedImageRef to close the TOCTOU window. Skips the verifier's HEAD when the ref is already digest-pinned.
- core/config: Gallery.Verification YAML block.
- core/gallery: backendDownloadOptions builds the verifier from the policy; applied on initial URI, mirrors, and tag fallbacks.
- core/gallery/upgrade: the upgrade path now routes through the same options builder. A regression Ginkgo spec pins this contract; without it, UpgradeBackend silently bypassed verification.
- core/cli: --require-backend-integrity (LOCALAI_REQUIRE_BACKEND_INTEGRITY) escalates missing policy / empty SHA256 from warn to hard-fail.
Producer (.github/workflows/backend_merge.yml):
- id-token: write at job scope (PR-fork-safe via existing event gate).
- sigstore/cosign-installer@v3 pinned to v2.4.1.
- After each docker buildx imagetools create, resolve the manifest list digest and run cosign sign --recursive --new-bundle-format --registry-referrers-mode=oci-1-1 against repo@digest. --recursive signs the index and every per-arch entry, matching how the consumer resolves a tag to a platform-specific manifest before verifying.
Rollout: backend/index.yaml has no `verification:` block yet, so this PR is backward-compatible: installs proceed with a warning until the gallery is populated. Strict mode is opt-in.
Assisted-by: claude-code:claude-opus-4-7 [Bash] [Edit] [Read] [Write] [WebSearch] [WebFetch]
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* refactor(gallery): plumb RequireBackendIntegrity through config instead of env
The previous implementation re-exported the --require-backend-integrity CLI flag into LOCALAI_REQUIRE_BACKEND_INTEGRITY via os.Setenv, then re-read it in core/gallery via os.Getenv. This leaked process state into the gallery package and made the flag impossible to override per-call or to test without touching the env.
Add RequireBackendIntegrity to ApplicationConfig (with a matching WithRequireBackendIntegrity AppOption) and thread the bool through every install/upgrade path: InstallBackend, InstallBackendFromGallery, UpgradeBackend, InstallModelFromGallery, InstallExternalBackend, ApplyGalleryFromString/File, startup.InstallModels. Worker subcommands gain the same env-bound flag on WorkerFlags so distributed-worker installs honor it consistently with the worker daemon path.
Add a forbidigo lint rule against os.Getenv / os.LookupEnv / os.Environ to keep the env-leak pattern from creeping back. Existing offenders (p2p, config loaders, etc.) are baseline-grandfathered by the existing new-from-merge-base: origin/master setting; targeted path exclusions cover the legitimate cases: kong CLI entry points, backend subprocesses, system capability probes, gRPC AUTH_TOKEN inheritance, test gating env vars.
Assisted-by: claude-code:claude-opus-4-7
Signed-off-by: Richard Palethorpe <io@richiejp.com>
---------
Signed-off-by: Richard Palethorpe <io@richiejp.com>	2026-05-18 08:02:20 +02:00
LocalAI [bot]	e5d7b84216	fix(distributed): split NATS backend.upgrade off install + dedup loads (#9717)
* feat(messaging): add backend.upgrade NATS subject + payload types
Splits the slow force-reinstall path off backend.install so it can run on its own subscription goroutine, eliminating head-of-line blocking between routine model loads and full gallery upgrades. The wire-level Force flag on BackendInstallRequest is kept for one release as the rolling-update fallback target; a doc note marks it deprecated.
Assisted-by: Claude:claude-sonnet-4-6
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(distributed/worker): add per-backend mutex helper to backendSupervisor
Different backend names lock independently; the same backend serializes. This is the synchronization primitive used by the upcoming concurrent install handler; without it, wrapping the NATS callback in a goroutine would race on the gallery directory when two requests target the same backend.
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix(distributed/worker): run backend.install handler in a goroutine
NATS subscriptions deliver messages serially on a single per-subscription goroutine. With a synchronous install handler, a multi-minute gallery download would head-of-line-block every other install request to the same worker, manifesting upstream as a 5-minute "nats: timeout" on unrelated routine model loads.
The body now runs in its own goroutine, with a per-backend mutex (lockBackend) protecting the gallery directory from concurrent operations on the same backend. Different backend names install in parallel.
Backward-compat: req.Force=true is still honored here, so an older master that hasn't been updated to send on backend.upgrade keeps working.
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(distributed/worker): subscribe to backend.upgrade as a separate path
Slow force-reinstall now lives on its own NATS subscription, so a multi-minute gallery pull cannot head-of-line-block the routine backend.install handler on the same worker. The same per-backend mutex guards both: concurrent install + upgrade for the same backend serialize at the gallery directory; different backends are independent.
upgradeBackend stops every live process for the backend, force-installs from the gallery, and re-registers. It does not start a new process; the next backend.install will spawn one with the freshly pulled binary.
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(distributed): add UpgradeBackend on NodeCommandSender; drop Force from InstallBackend
The master now sends to backend.upgrade for force-reinstall, with a nats.ErrNoResponders fallback to the legacy backend.install Force=true path so that a rolling update with a new master + an old worker still converges. The Force parameter leaves the public Go API surface entirely; only the internal fallback sets it on the wire.
InstallBackend timeout drops from 5min to 3min (most replies are sub-second, since the worker short-circuits on already-running or already-installed). UpgradeBackend timeout is 15min, sized for real-world Jetson-on-WiFi gallery pulls.
Updates the admin install HTTP endpoint (core/http/endpoints/localai/nodes.go) to the new signature too. router_test.go's fakeUnloader does not yet implement the new interface shape; Task 3.2 will catch it up before the next package-level test run.
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* test(distributed): update fakeUnloader for new NodeCommandSender shape
InstallBackend lost its force bool param (Force is no longer part of the public Go API; only the internal upgrade-fallback path sets it on the wire). NodeCommandSender gained an UpgradeBackend method. The fake records both call slices and provides an installHook concurrency seam for upcoming singleflight tests.
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* test(distributed): cover UpgradeBackend's new subject + rolling-update fallback
Task 3.1 changed the master to publish UpgradeBackend on the new backend.upgrade subject; the existing UpgradeBackend tests scripted the old install subject, so all 3 began failing as expected. Updates them to script SubjectNodeBackendUpgrade with BackendUpgradeReply.
Adds two new specs for the rolling-update fallback:
- ErrNoResponders on backend.upgrade triggers a backend.install Force=true retry on the same node.
- Non-NoResponders errors propagate to the caller unchanged.
scriptedMessagingClient gains scriptNoResponders (real nats sentinel) and scriptReplyMatching (predicate-matched canned reply, used to assert that the fallback path actually sets Force=true on the install retry).
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix(distributed): coalesce concurrent identical backend.install via singleflight
Six simultaneous chat completions for the same not-yet-loaded model were observed firing six independent NATS install requests, each serializing through the worker's per-subscription goroutine and amplifying queue depth. SmartRouter now wraps the NATS round-trip in a singleflight.Group keyed by (nodeID, backend, modelID, replica): N concurrent identical loads share one round-trip and one reply. Distinct (modelID, replica) keys still fire independent calls, so multi-replica scaling and multi-model fan-out are unaffected.
fakeUnloader gains a sync.Mutex around its recording slices to keep concurrent test goroutines race-clean.
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* test(e2e/distributed): drop force arg from InstallBackend test calls
Two e2e test call sites still passed the trailing force bool that was removed from RemoteUnloaderAdapter.InstallBackend in `9bde76d7`. Caught by golangci-lint typecheck on the upgrade-split branch (master CI was already green because these tests don't run in the standard test path).
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* refactor(distributed): extract worker business logic to core/services/worker
core/cli/worker.go grew to 1212 lines after the backend.upgrade split. The CLI package was carrying backendSupervisor, NATS lifecycle handlers, gallery install/upgrade orchestration, S3 file staging, and registration helpers: all distributed-worker business logic that doesn't belong in the CLI surface.
Move it to a new core/services/worker package, mirroring the existing core/services/{nodes,messaging,galleryop} pattern. core/cli/worker.go shrinks to ~19 lines: a kong-tagged shim that embeds worker.Config and delegates Run.
No behavior change. All symbols stay unexported except Config and Run. The three worker-specific tests (addr/replica/concurrency) move with the code via git mv so history follows them.
Files split as:
- worker.go: Run entry point
- config.go: Config struct (kong tags retained, kong not imported)
- supervisor.go: backendProcess, backendSupervisor, process lifecycle
- install.go: installBackend, upgradeBackend, findBackend, lockBackend
- lifecycle.go: subscribeLifecycleEvents (verbatim; decomposition is a follow-up commit)
- file_staging.go: subscribeFileStaging, isPathAllowed
- registration.go: advertiseAddr, registrationBody, heartbeatBody, etc.
- reply.go: replyJSON
- process_helpers.go: readLastLinesFromFile
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* refactor(distributed/worker): decompose subscribeLifecycleEvents into per-event handlers
The 226-line subscribeLifecycleEvents method packed eight NATS subscriptions inline. Each grew context-shaped doc comments mixed with subscription plumbing, making it hard to read any one handler without scrolling past the others.
Extract each handler into its own method on *backendSupervisor; the subscriber becomes a thin 8-line dispatcher.
No behavior change: each method body is byte-equivalent to its corresponding inline goroutine + handler. Doc comments that were attached to the inline SubscribeReply calls migrate to the new method godocs.
Adding the next NATS subject is now a 2-line patch to the dispatcher plus one new method, instead of grafting onto a monolith.
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>	2026-05-08 16:24:54 +02:00
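The per-backend locking described above (different backend names lock independently; the same backend serializes, even when each install handler runs in its own goroutine) can be sketched with the standard library alone. backendLocks and this lockBackend are a hypothetical illustration, not LocalAI's backendSupervisor; the commit's separate request deduplication in SmartRouter uses a singleflight.Group, which is not shown here.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// backendLocks hands out one mutex per backend name: different names lock
// independently, the same name serializes. Hypothetical sketch only.
type backendLocks struct {
	mu    sync.Mutex
	locks map[string]*sync.Mutex
}

// lockBackend blocks until name's mutex is held and returns its unlock func.
func (b *backendLocks) lockBackend(name string) (unlock func()) {
	b.mu.Lock()
	if b.locks == nil {
		b.locks = make(map[string]*sync.Mutex)
	}
	m, ok := b.locks[name]
	if !ok {
		m = &sync.Mutex{}
		b.locks[name] = m
	}
	b.mu.Unlock() // release the map lock before blocking on the per-backend lock
	m.Lock()
	return m.Unlock
}

func main() {
	var locks backendLocks
	var wg sync.WaitGroup
	var active, maxActive int32

	// Simulate install handlers that a NATS callback hands off to goroutines
	// so that one slow install does not block the subscription itself.
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			unlock := locks.lockBackend("llama-cpp")
			defer unlock()
			n := atomic.AddInt32(&active, 1)
			for { // record the highest concurrency observed
				old := atomic.LoadInt32(&maxActive)
				if n <= old || atomic.CompareAndSwapInt32(&maxActive, old, n) {
					break
				}
			}
			atomic.AddInt32(&active, -1)
		}()
	}
	wg.Wait()
	fmt.Println("max concurrent installs of one backend:", maxActive) // 1
}
```

Releasing the map lock before taking the per-backend mutex is what lets a slow install of one backend proceed without stalling lock lookups for all others.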

5 Commits