The trace middleware buffered the full request and response bodies for every
JSON exchange. With a chatty agent-pool RAG workload, /embeddings responses
(large vector arrays) accumulated to tens of MB in the in-memory buffer; the
admin Traces page would then download and parse 40+ MB on every load and on
every 5s auto-refresh, locking the UI in a loading state.
Add LOCALAI_TRACING_MAX_BODY_BYTES (default 64 KiB) that caps each captured
body. The full payload still flows through to the real client; only the
trace copy is bounded. Exchanges record body_truncated and original
body_bytes so the dashboard can show that truncation happened. The cap is
configurable via env, CLI, and runtime_settings.json.
Also unblock recovery: the Traces page now keeps the Clear button enabled
while loading, since "buffer too large to render" is exactly when the user
needs to clear it.
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
* chore: ignore local .worktrees directory
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix(openai): stream usage non-zero when tools are enabled
The streaming chat-completions worker for tool-bearing requests
(processTools in core/http/endpoints/openai/chat.go) never forwarded the
cumulative TokenUsage from ComputeChoices to the chunks it placed on the
responses channel. The outer streaming loop's running usage tracker
therefore stayed at the zero value, and the include_usage trailer
reported {prompt_tokens:0, completion_tokens:0, total_tokens:0} whenever
the request carried a `tools` array. Without tools, the alternative
`process` path stamps Usage on every chunk, so that path was unaffected.
Forward the final TokenUsage via a usage-only sentinel chunk (empty
Choices, populated Usage) emitted right before close(responses). The
outer loop's per-chunk Usage capture moves above the empty-Choices skip
so the sentinel updates the tracker without ever reaching the wire,
keeping the existing OpenAI spec contract (intermediate chunks carry no
`usage` field, and the deferred-final-chunk helpers remain Usage-free
per the regression test for issue #8546).
Adds streamUsageFromTokenUsage, usageSentinelChunk, and
applyChunkToUsage helpers with focused Ginkgo coverage plus a flow-level
test that mirrors the outer-loop sequence.
Fixes#9927
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:opus-4-7 [Claude Code]
* refactor(openai): return final TokenUsage from stream workers
Replace the usage-only sentinel SSE chunk introduced in the previous
commit with a plain return value. The streaming workers process and
processTools (now extracted as package-level processStream and
processStreamWithTools) return (backend.TokenUsage, error); the outer
ChatEndpoint loop reads the cumulative counts off the existing `ended`
channel (now carrying streamWorkerResult{usage, err}) and builds the
include_usage trailer from a normal Go value after the LOOP exits.
This drops the empty-Choices "skip but capture Usage" rule from the
outer loop and removes the usageSentinelChunk / applyChunkToUsage
helpers entirely. The SSE responses channel is back to a single
purpose: wire chunks only.
processStream and processStreamWithTools move into chat_stream_workers.go
so they can be exercised directly from tests. The chat_stream_usage_test.go
suite now drives the workers with a mocked backend.ModelInferenceFunc
and asserts on the returned TokenUsage. The regression coverage for
issue #9927 is therefore behavioral: reverting the fix (discarding
ComputeChoices' usage return) makes the assertions fail with concrete
count mismatches.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:opus-4-7 [Claude Code]
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
The merged feature (#9920) let admins see per-API-key and per-source
totals but did not surface which user owned each key, and lumped
every user's Web UI traffic into a single global Web UI row. This
makes the admin Sources tab properly per-user attributable:
- KeyTotal gains UserID + UserName, populated from the snapshot the
usage middleware already records. The by_key roll-up now groups by
(api_key_id, api_key_name, user_id, user_name).
- New SourceTotals.ByUserSource roll-up groups (source, user_id,
user_name) for sources without a key identity (web, legacy). Only
populated on the admin path (includeLegacy=true); the non-admin
endpoint stays unchanged for backwards compatibility.
- SourcesTable accepts showUserColumn={isAdmin}; admin view renders
a User column, makes the search match user name/id, and expands
Web UI / legacy pseudo-rows from the global aggregate to one row
per user using by_user_source.
Refs: #9862
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
* feat(galleryop): add TargetNodeID to ManagementOp for single-node installs
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(galleryop): add NodeScopedKey helpers for per-node opcache rows
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* refactor(galleryop): use strings.Cut for NodeScopedKey parsing, reject empty nodeID
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(nodes): scope DistributedBackendManager.InstallBackend to single node via TargetNodeID
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(http): make /api/nodes/:id/backends/install async via gallery service job queue
The handler previously called unloader.InstallBackend synchronously and
blocked the browser for up to 3 minutes waiting on the NATS reply. It now
enqueues a TargetNodeID-scoped ManagementOp on BackendGalleryChannel and
returns HTTP 202 + jobID immediately, matching /api/backends/install/:id.
The opcache key is built via NodeScopedKey(nodeID, backend) so concurrent
installs of the same backend across different nodes do not stomp each
other. galleryService/opcache/appConfig are threaded through
RegisterNodeAdminRoutes for this.
Assisted-by: Claude:opus-4-7 [Edit] [Bash]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* refactor(http): log malformed backend_galleries override and stop test drain goroutine
Assisted-by: Claude:opus-4-7 [Edit] [Bash]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(api): expose nodeID for node-scoped backend ops in /api/operations
Node-scoped backend installs land in opcache under "node:<nodeID>:<backend>"
keys. Without splitting that prefix back out, the operations panel renders
the full key as the display name and has no structured way to label which
worker an install is targeting. Detect the prefix, surface nodeID as its own
response field, and reduce the display name back to the bare backend slug.
Bare (non-scoped) ops are left untouched so legacy installs do not gain a
misleading empty nodeID.
Assisted-by: Claude:opus-4-7 [Edit] [Bash]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(react-ui): poll job status for node-targeted backend installs
Assisted-by: Claude:opus-4-7 [Edit] [Bash]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix(react-ui): make NodeInstallPicker state updates pure and surface cancellations as errors
Assisted-by: Claude:opus-4-7 [Edit] [Bash]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* refactor(react-ui): clarify async semantics in handleInstallOnTarget
Assisted-by: Claude:opus-4-7 [Edit] [Bash]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* refactor(http): use statusUrl casing for node install response to match codebase precedent
Assisted-by: Claude:opus-4-7 [Edit] [Bash]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
The existing master push pipeline produces `master` (rolling) and
`sha-<short>` tags. Neither is orderable by build time, so downstream
GitOps that want to auto-bump to the newest master build (e.g. Flux
ImagePolicy) can't pick the latest from the tag list — alphabetical
sort over hex shas is effectively random, and the rolling `master`
tag can't be referenced as an immutable bump target.
Add a third tag of the form `master-<epoch>-<sha>` (Unix epoch in
seconds + short sha), gated on default-branch pushes via metadata-
action's `is_default_branch` predicate. The sha is retained for
traceability; the epoch makes the tags numerically orderable, so a
Flux ImagePolicy like
filterTags:
pattern: '^master-(?P<ts>[0-9]+)-[a-f0-9]+$'
extract: '$ts'
policy:
numerical:
order: asc
will reliably bump to the newest master build.
Applied to both image_build.yml (OCI labels stay consistent) and
image_merge.yml (the actual tag publisher via buildx imagetools).
utils: fail immediately on extraction errors
Setting ContinueOnError to false ensures that ExtractArchive does not
leave the model or backend directory in an inconsistent state if a
partial failure occurs. This improves robustness against malformed
archives or unexpected I/O issues during installation.
Signed-off-by: RinZ27 <222222878+RinZ27@users.noreply.github.com>
* feat(usage): add Source, APIKeyID, APIKeyName columns to UsageRecord
Adds three additive columns plus UsageSource* constants. The columns
are auto-migrated by InitDB. APIKeyID is a nullable foreign reference
to UserAPIKey.ID; APIKeyName is snapshotted on each row so revoked
keys keep showing their name in history.
Refs: #9862
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(usage): backfill Source on pre-feature usage rows
InitDB now classifies any pre-existing usage_record with an empty
source: 'legacy-api-key' user -> legacy, everything else -> web.
The backfill is idempotent (only touches NULL/empty rows).
Refs: #9862
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(usage): add GetUserUsageBySource aggregator
Groups by (bucket, source, api_key_id, api_key_name). Filters out
legacy by default. Returns both per-bucket detail and roll-ups
(by_source, by_key sorted desc and capped at 200, grand_total).
The MAX(created_at) projection is iterated via Rows().Scan into a
string column and parsed manually because the SQLite driver surfaces
the aggregated timestamp as a string, which database/sql refuses to
scan directly into time.Time. Postgres returns a real timestamp; the
same string path handles its RFC3339 form too.
Refs: #9862
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix(usage): log Rows() errors and assert LastUsed in tests
Adds rows.Err() and Rows() open-failure logging in
computeSourceTotals so silent data drops surface in logs. Logs on
parseLastUsedString format misses for the same reason. Strengthens
the snapshot-survival test to assert LastUsed is a recent timestamp,
locking the SQLite time-string parser behaviour.
Refs: #9862
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(usage): add admin GetAllUsageBySource with filters and truncation
Optional user_id and api_key_id filters (composed with AND). Legacy
bucket is included for admin callers. truncated=true when more than
200 distinct keys would be in the by_key roll-up.
Refs: #9862
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(auth): plumb auth_source and auth_apikey through Echo context
tryAuthenticate now sets auth_source on every successful branch
(web for session/Bearer-session, apikey for Bearer-key/x-api-key/
token-cookie, legacy for legacy env key match). For named-key
branches it also stores the resolved *UserAPIKey under auth_apikey
so downstream middlewares can snapshot id+name without re-validating.
Refs: #9862
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix(auth): expand tryAuthenticate godoc and cover Bearer-session branch
Documents all three context-keys side effects (auth_source,
auth_apikey, _auth_session) plus the split of responsibilities with
the parent Middleware. Adds a test for the Bearer-as-session-token
classification so future regressions there fail loudly.
Refs: #9862
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(usage): UsageMiddleware records source + snapshots key name
Reads auth_source and auth_apikey from the Echo context (set by
auth.Middleware in the previous task). Snapshots UserAPIKey.ID and
Name onto each row so revoked keys remain readable in history.
Falls back to source=web when no auth_source is set (auth disabled
or unrecognised path).
Refs: #9862
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(usage): add /api/auth/usage/sources and admin variant
Self endpoint filters legacy server-side; admin endpoint includes
legacy and accepts user_id + api_key_id filters. Response includes
buckets, totals.{by_source, by_key, grand_total}, and a truncated
flag set when the per-key roll-up was capped at 200.
Refs: #9862
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* docs(routes): mark test mirror handlers as keep-in-sync with production
The newTestAuthApp helper duplicates production route handlers
inline because it cannot use RegisterAuthRoutes (which requires a
*application.Application). Naming the source path on each mirror
makes the drift contract explicit for future maintainers.
Refs: #9862
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(ui): add usageApi.getMySources/getAdminSources + i18n strings
Refs: #9862
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(ui): add Sources tab skeleton with data fetch
Adds Usage page tab that fetches /api/auth/usage/sources (or the
admin variant). Renders raw totals plus a placeholder key list;
real visualisations land in subsequent commits. Restructures the
existing tab button block so Models and Sources are visible to
non-admins (Users remains admin-only).
Refs: #9862
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(ui): source mix ribbon + searchable/sortable sources table
Replaces the SourcesTab placeholder rendering with two reusable
components: SourceMixRibbon (one segmented bar per source class)
and SourcesTable (search + sort + revoked-key dim). Pulls the
current API key list to detect revoked keys.
Refs: #9862
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix(ui): skip revoked-key detection until the key list is known
existingKeyIds defaulted to an empty Set, which made every live
api_key row render as (revoked) during the brief window before
apiKeysApi.list() resolved, and permanently after a fetch failure.
Use null as the unknown state and suppress the revoked badge until
the parent provides a real Set.
Refs: #9862
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(ui): top-N stacked time chart and drill-in chip for Sources tab
Top 7 sources by total tokens get distinct colours; the rest roll up
into 'Other'. Clicking a row in the SourcesTable dims everything
except that series in the chart; the chip is the canonical clear.
Refs: #9862
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* docs(usage): document per-API-key Sources tab and endpoints
Extends features/authentication.md Usage Tracking section with:
- A 'Sources' tab description and source-class taxonomy
- Endpoint documentation for /api/auth/usage/sources and the
admin variant
- Response shape example with by_source / by_key / grand_total
- Migration note about pre-feature row backfill
Refs: #9862
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix(usage): silence errcheck on deferred rows.Close
CI errcheck flagged the bare 'defer rows.Close()' in
computeSourceTotals. Wrap in a closure that discards the close
error explicitly; an error here is non-actionable since we have
already drained the rows and logged any iteration failure.
Refs: #9862
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* refactor(usage): bound batcher intake and add Shutdown/FlushNow hooks
The pre-existing usage batcher had no cap on its add() path; the
usageMaxPending=5000 constant only guarded the re-queue path after
a failed write, leaving memory growth unbounded if the DB fell
behind. This commit:
- Adds the cap to add() so saturation drops new records (rate-limited
warn at 1/1024) instead of growing unbounded.
- Raises usageMaxPending to 50000 to absorb realistic inference bursts.
- Replaces the package-level batcher global with a mutex-guarded pair
plus a currentBatcher() accessor so Init / Shutdown cycles are
race-free.
- Adds ShutdownUsageRecorder() for graceful drain on process exit
(not yet wired into app shutdown, just published).
- Adds FlushNow() for deterministic tests; the middleware suite no
longer needs 6s sleeps per spec and now runs in ~50ms instead of 18s.
- Re-queue on failed flush is now cap-aware: prepends as much of the
failed batch as fits alongside concurrent arrivals, instead of
dropping the whole batch when full.
Refs: #9862
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(usage): drain usage batcher on graceful shutdown
Registers ShutdownUsageRecorder with the existing
signals.RegisterGracefulTerminationHandler so SIGINT/SIGTERM
synchronously flushes any in-memory usage records before the
process exits. Without this, up to one flush interval (5s) of
recorded usage was lost when LocalAI restarted.
Refs: #9862
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
Aligns LocalAI's llama-cpp gRPC backend with upstream's auto-on prompt
cache path so repeated system prompts (agents, OpenAI/Anthropic-compatible
CLIs, coding assistants) skip prefill on subsequent calls without any
YAML changes. Reported in #9921.
Upstream's server enables `kv_unified=true` (and bumps `n_parallel` to 4)
when slot count is auto, which unlocks `cache_idle_slots`. LocalAI
hardcodes `n_parallel=1` and so far also hardcoded `kv_unified=false`,
which silently force-disables idle-slot saving at server init. The host
prompt cache was allocated but never written across requests.
Changes in backend/cpp/llama-cpp/grpc-server.cpp:
- params.kv_unified: false -> true (single-slot path now benefits from
the prompt cache; users can opt out with `kv_unified:false`)
- params.n_ctx_checkpoints: 8 -> 32 (match upstream default)
- params.cache_idle_slots = true initialized explicitly (upstream default)
- params.checkpoint_every_nt = 8192 initialized explicitly (upstream default)
- New option parsers: cache_idle_slots / idle_slots_cache,
checkpoint_every_nt / checkpoint_every_n_tokens
Docs:
- features/text-generation.md: fix misleading `cache_ram` description
(it's the host-side prompt cache, not the KV cache), document the
kv_unified + cache_ram + cache_idle_slots interaction, add rows for
the two newly-exposed options, and add a worked example for the
agent/CLI workload from the issue.
- advanced/model-configuration.md: mark the legacy `prompt_cache_path`
/ `prompt_cache_all` / `prompt_cache_ro` YAML fields as unused by the
llama-cpp gRPC backend (they target upstream's CLI completion tool
and are not consumed by grpc-server.cpp) and point readers at the
new prompt-cache explainer.
Closes#9921
Assisted-by: claude:opus-4.7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
refactor(agents): bump skillserver, drop redundant Name from list_skills/search_skills
skillserver's list_skills MCP tool used to ship every entry with name=""
(field was commented out), while search_skills populated it - two tools
with inconsistent shape for the same data. skill.Name and skill.ID are
populated from the same source string anyway (the directory name), so
returning both was pure duplication.
Bumps github.com/mudler/skillserver to a7317cb, which drops the Name
field from both SkillInfo and SearchResult and leaves ID as the single
canonical identifier (already what read_skill consumes).
Adds core/services/skills/skills_mcp_test.go, a regression that drives
the LocalAI FilesystemManager through an in-process MCP session and
asserts a newly-created skill is visible by ID on the still-open session.
This is a cleanup, not the root cause of #9868 - the reporter likely
sees something deeper than a cosmetic JSON shape issue.
Assisted-by: Claude:claude-opus-4-7 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
llama.cpp's model loader asserts back().pattern == nullptr on
params.tensor_buft_overrides (and on params.kv_overrides.back().key[0]
== 0) before binding them into llama_model_params. PR #8560 attempted
to satisfy llama_params_fit's placeholder requirement by pre-filling
params.tensor_buft_overrides up to llama_max_tensor_buft_overrides()
*before* the option-parse loop. Any subsequent push_back from
override_tensor / draft_cpu_moe / draft_n_cpu_moe / draft_override_tensor
then appended real entries after the placeholders, leaving back() with
a real pattern and tripping the assert. The draft override vector
likewise had no terminator at all.
Mirror upstream common/arg.cpp:645-658 instead: real entries are
pushed during option parsing, and after parsing we pad the main vector
up to ntbo (placeholders land at the end, so back() is always nullptr)
and append a single {nullptr, nullptr} to the draft vector when it is
non-empty. The existing kv_overrides terminator block already matches
upstream and stays.
Verified against ggml-org/llama.cpp@5cbaa5e: only tensor_buft_overrides
(main + draft) and kv_overrides are sentinel-terminated common_params
fields; everything else is size-driven std::vector.
Assisted-by: claude-code:claude-opus-4-7
Signed-off-by: Richard Palethorpe <io@richiejp.com>
useOperations() was calling setOperations() with a fresh array on every
1s poll, even when the payload was identical. In React 19 the DOM diff
no longer short-circuits dangerouslySetInnerHTML on equal __html, so the
forced Chat re-render re-assigned innerHTML on every assistant message
once per second — wiping any text the user had selected.
Skip the state update when the serialised operations payload is
unchanged, and switch loading/error to functional setters so they also
short-circuit at the source.
Also fixes the chat copy button on plain HTTP: navigator.clipboard is
undefined in non-secure contexts (a common LXC+Docker deployment), but
the previous code called it unconditionally and showed a success toast
regardless. Routed Chat, AgentChat and CanvasPanel through a new
copyToClipboard() helper that uses navigator.clipboard when available
and falls back to a hidden-textarea + execCommand('copy') trick that
browsers still honour outside secure contexts. The fallback preserves
the user's existing selection.
Regression coverage in e2e/chat-polling-selection.spec.js: a
MutationObserver counts mutations on the assistant content node across
3s of polling (must be 0); the copy test stubs out navigator.clipboard
and asserts that execCommand('copy') is invoked.
Assisted-by: claude-opus-4-7-1m
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
The new ace-step.cpp revision moves backend initialization inside each
`*_load` call and drops the separate `DiTGGMLConfig` argument from
`dit_ggml_load` (config now lives in `DiTGGML::cfg`, populated from GGUF
metadata at load time). Drop the now-removed `*_init_backend` calls and
replace `g_dit_cfg` accesses with `g_dit.cfg`.
Assisted-by: Claude:claude-opus-4-7 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
Adapt the C++ wrapper to the new `generate_video()` signature: upstream now
returns `bool` and writes frames/audio via out-parameters (`sd_image_t**`,
`sd_audio_t**`). Also set `p->fps` on the params struct (new upstream field)
and free the returned audio handle on both the success and error paths.
Assisted-by: claude-code:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
Non-image/non-audio file attachments (txt, md, csv, json) were being
stored in the 'files' metadata field but never added to the message
content array sent to /v1/chat/completions. Images and audio correctly
received content blocks; files did not.
Fix: push a text content block into messageContent when textContent is
present, matching the pattern used for image_url and audio_url.
Also fixes Home.jsx addFiles which never called file.text() at all,
meaning files attached on the home screen had empty textContent even
before reaching useChat.js.
Note: PDF files use file.text() which returns raw bytes rather than
parsed text. Proper PDF support would require PDF.js or server-side
extraction and is not part of this fix.
Signed-off-by: Daniel Liljeberg <damien_@hotmail.com>
The flake set `src = ./sources;` referencing a non-existent subdirectory,
so `nix build` and `nix develop` both failed evaluation. Point `src` at
the repo root and refresh `vendorHash` accordingly.
Add `devShells.default` with the Go toolchain, protobuf generators,
Node.js/bun for the React UI (`make react-ui`), and the linters used by
`make lint` (golangci-lint, gofumpt, goimports, staticcheck).
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* feat(gallery): verify backend OCI images with keyless cosign
Close a trust gap where a registry compromise or MITM could silently
replace a backend image: the gallery YAML tells LocalAI which image to
pull, but until now nothing verified the bytes came from our CI.
Consumer (pkg/oci/cosignverify):
- New package using sigstore-go to verify keyless-cosign signatures.
- OCI 1.1 referrers API + new bundle format (no legacy :tag.sig).
- Policy fields: Issuer / IssuerRegex / Identity / IdentityRegex /
NotBefore. NotBefore is the revocation lever — keyless Fulcio certs
are ephemeral so revocation is policy-side; advancing not_before in
the gallery YAML invalidates every signature predating the cutoff.
- TUF trusted root cached process-wide so N backends from one gallery
do 1 fetch, not N.
Plumbing:
- pkg/downloader: ImageVerifier interface + WithImageVerifier option
threaded through DownloadFileWithContext. Verification runs between
oci.GetImage and oci.ExtractOCIImage, with digest pinning via
pinnedImageRef to close the TOCTOU window. Skips the verifier's HEAD
when the ref is already digest-pinned.
- core/config: Gallery.Verification YAML block.
- core/gallery: backendDownloadOptions builds the verifier from the
policy; applied on initial URI, mirrors, and tag fallbacks.
- core/gallery/upgrade: the upgrade path now routes through the same
options builder. A regression Ginkgo spec pins this contract —
without it, UpgradeBackend silently bypassed verification.
- core/cli: --require-backend-integrity (LOCALAI_REQUIRE_BACKEND_INTEGRITY)
escalates missing policy / empty SHA256 from warn to hard-fail.
Producer (.github/workflows/backend_merge.yml):
- id-token: write at job scope (PR-fork-safe via existing event gate).
- sigstore/cosign-installer@v3 pinned to v2.4.1.
- After each docker buildx imagetools create, resolve the manifest
list digest and run cosign sign --recursive --new-bundle-format
--registry-referrers-mode=oci-1-1 against repo@digest. --recursive
signs the index and every per-arch entry, matching how the consumer
resolves a tag to a platform-specific manifest before verifying.
Rollout: backend/index.yaml has no `verification:` block yet, so this
PR is backward-compatible — installs proceed with a warning until the
gallery is populated. Strict mode is opt-in.
Assisted-by: claude-code:claude-opus-4-7 [Bash] [Edit] [Read] [Write] [WebSearch] [WebFetch]
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* refactor(gallery): plumb RequireBackendIntegrity through config instead of env
The previous implementation re-exported the --require-backend-integrity
CLI flag into LOCALAI_REQUIRE_BACKEND_INTEGRITY via os.Setenv, then
re-read it in core/gallery via os.Getenv. This leaked process state
into the gallery package and made the flag impossible to override
per-call or test without touching the env.
Add RequireBackendIntegrity to ApplicationConfig (with a matching
WithRequireBackendIntegrity AppOption) and thread the bool through
every install/upgrade path: InstallBackend, InstallBackendFromGallery,
UpgradeBackend, InstallModelFromGallery, InstallExternalBackend,
ApplyGalleryFromString/File, startup.InstallModels. Worker subcommands
gain the same env-bound flag on WorkerFlags so distributed-worker
installs honor it consistently with the worker daemon path.
Add a forbidigo lint rule against os.Getenv / os.LookupEnv / os.Environ
to keep the env-leak pattern from creeping back. Existing offenders
(p2p, config loaders, etc.) are baseline-grandfathered by the existing
new-from-merge-base: origin/master setting; targeted path exclusions
cover the legitimate cases — kong CLI entry points, backend
subprocesses, system capability probes, gRPC AUTH_TOKEN inheritance,
test gating env vars.
Assisted-by: claude-code:claude-opus-4-7
Signed-off-by: Richard Palethorpe <io@richiejp.com>
---------
Signed-off-by: Richard Palethorpe <io@richiejp.com>