Compare commits

51 Commits

Author SHA1 Message Date
LocalAI [bot]
61bf34ea2f fix(traces): cap captured body size to keep admin Traces UI responsive (#9946)
The trace middleware buffered the full request and response bodies for every
JSON exchange. With a chatty agent-pool RAG workload, /embeddings responses
(large vector arrays) accumulated to tens of MB in the in-memory buffer; the
admin Traces page would then download and parse 40+ MB on every load and on
every 5s auto-refresh, locking the UI in a loading state.

Add LOCALAI_TRACING_MAX_BODY_BYTES (default 64 KiB) that caps each captured
body. The full payload still flows through to the real client; only the
trace copy is bounded. Exchanges record body_truncated and original
body_bytes so the dashboard can show that truncation happened. The cap is
configurable via env, CLI, and runtime_settings.json.

Also unblock recovery: the Traces page now keeps the Clear button enabled
while loading, since "buffer too large to render" is exactly when the user
needs to clear it.


Assisted-by: Claude:claude-opus-4-7

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-05-22 15:29:24 +02:00
LocalAI [bot]
0b2ae3c6ca fix(openai): stream usage non-zero when tools are enabled (#9941)
* chore: ignore local .worktrees directory

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(openai): stream usage non-zero when tools are enabled

The streaming chat-completions worker for tool-bearing requests
(processTools in core/http/endpoints/openai/chat.go) never forwarded the
cumulative TokenUsage from ComputeChoices to the chunks it placed on the
responses channel. The outer streaming loop's running usage tracker
therefore stayed at the zero value, and the include_usage trailer
reported {prompt_tokens:0, completion_tokens:0, total_tokens:0} whenever
the request carried a `tools` array. Without tools, the alternative
`process` path stamps Usage on every chunk, so that path was unaffected.

Forward the final TokenUsage via a usage-only sentinel chunk (empty
Choices, populated Usage) emitted right before close(responses). The
outer loop's per-chunk Usage capture moves above the empty-Choices skip
so the sentinel updates the tracker without ever reaching the wire,
keeping the existing OpenAI spec contract (intermediate chunks carry no
`usage` field, and the deferred-final-chunk helpers remain Usage-free
per the regression test for issue #8546).

Adds streamUsageFromTokenUsage, usageSentinelChunk, and
applyChunkToUsage helpers with focused Ginkgo coverage plus a flow-level
test that mirrors the outer-loop sequence.

Fixes #9927

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:opus-4-7 [Claude Code]

* refactor(openai): return final TokenUsage from stream workers

Replace the usage-only sentinel SSE chunk introduced in the previous
commit with a plain return value. The streaming workers process and
processTools (now extracted as package-level processStream and
processStreamWithTools) return (backend.TokenUsage, error); the outer
ChatEndpoint loop reads the cumulative counts off the existing `ended`
channel (now carrying streamWorkerResult{usage, err}) and builds the
include_usage trailer from a normal Go value after the LOOP exits.
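
The shape of this refactor can be sketched roughly as follows. This is an illustration under assumptions, not the code in chat_stream_workers.go: `tokenUsage` stands in for backend.TokenUsage, and `runWorker` and `collect` are hypothetical names. The responses channel carries wire chunks only, and usage travels as a plain value on the `ended` channel.

```go
package main

import (
	"errors"
	"fmt"
)

// tokenUsage is a stand-in for backend.TokenUsage; its fields are assumptions.
type tokenUsage struct{ Prompt, Completion int }

// streamWorkerResult mirrors the {usage, err} pair named in the commit message.
type streamWorkerResult struct {
	usage tokenUsage
	err   error
}

// runWorker (hypothetical) sends wire chunks on responses, closes the channel
// when done, and returns the cumulative usage as an ordinary return value.
func runWorker(tokens []string, responses chan<- string) (tokenUsage, error) {
	defer close(responses)
	if len(tokens) == 0 {
		return tokenUsage{}, errors.New("no tokens produced")
	}
	u := tokenUsage{Prompt: 3} // illustrative prompt count
	for _, t := range tokens {
		responses <- t
		u.Completion++
	}
	return u, nil
}

// collect plays the role of the outer loop: it drains the wire chunks, then
// reads the final usage off the ended channel once the loop has exited.
func collect(tokens []string) ([]string, streamWorkerResult) {
	responses := make(chan string)
	ended := make(chan streamWorkerResult, 1)
	go func() {
		u, err := runWorker(tokens, responses)
		ended <- streamWorkerResult{usage: u, err: err}
	}()
	var wire []string
	for chunk := range responses {
		wire = append(wire, chunk)
	}
	return wire, <-ended
}

func main() {
	wire, res := collect([]string{"Hel", "lo"})
	fmt.Println(wire, res.usage, res.err)
}
```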

This drops the empty-Choices "skip but capture Usage" rule from the
outer loop and removes the usageSentinelChunk / applyChunkToUsage
helpers entirely. The SSE responses channel is back to a single
purpose: wire chunks only.

processStream and processStreamWithTools move into chat_stream_workers.go
so they can be exercised directly from tests. The chat_stream_usage_test.go
suite now drives the workers with a mocked backend.ModelInferenceFunc
and asserts on the returned TokenUsage. The regression coverage for
issue #9927 is therefore behavioral: reverting the fix (discarding
ComputeChoices' usage return) makes the assertions fail with concrete
count mismatches.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:opus-4-7 [Claude Code]

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-05-22 10:13:41 +02:00
LocalAI [bot]
4735345105 chore: ⬆️ Update ggml-org/llama.cpp to bb28c1fe246b72276ee1d00ce89306be7b865766 (#9934)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-05-22 09:49:33 +02:00
LocalAI [bot]
7384fd800b chore: ⬆️ Update antirez/ds4 to 8d576642c39b9a2d782a80159ba84ef5a81c0b81 (#9932)
⬆️ Update antirez/ds4

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-05-22 08:31:49 +02:00
LocalAI [bot]
6942713d85 chore: ⬆️ Update leejet/stable-diffusion.cpp to 3a8788cb7d74f185d6b18688e9563015524ecaf5 (#9933)
⬆️ Update leejet/stable-diffusion.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-05-22 00:31:19 +02:00
LocalAI [bot]
0cf52c44d4 chore: ⬆️ Update ggml-org/whisper.cpp to 8443cf05e3fa8ce1b32348e1bcbcf8fc31f7f3ae (#9929)
⬆️ Update ggml-org/whisper.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-05-21 23:24:01 +02:00
LocalAI [bot]
0d34cf7cbd chore: ⬆️ Update ikawrakow/ik_llama.cpp to 48a55f74e4c6e2aeda363dd386c1ac9170a0af71 (#9930)
⬆️ Update ikawrakow/ik_llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-05-21 23:23:37 +02:00
LocalAI [bot]
f0cb02afb8 feat(usage): attribute Sources rows to user accounts in admin view (#9935)
The merged feature (#9920) let admins see per-API-key and per-source
totals but did not surface which user owned each key, and lumped
every user's Web UI traffic into a single global Web UI row. This
makes the admin Sources tab properly per-user attributable:

- KeyTotal gains UserID + UserName, populated from the snapshot the
  usage middleware already records. The by_key roll-up now groups by
  (api_key_id, api_key_name, user_id, user_name).
- New SourceTotals.ByUserSource roll-up groups (source, user_id,
  user_name) for sources without a key identity (web, legacy). Only
  populated on the admin path (includeLegacy=true); the non-admin
  endpoint stays unchanged for backwards compatibility.
- SourcesTable accepts showUserColumn={isAdmin}; admin view renders
  a User column, makes the search match user name/id, and expands
  Web UI / legacy pseudo-rows from the global aggregate to one row
  per user using by_user_source.
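
As a rough illustration of the by_key roll-up described in the first bullet, grouping on the four-field tuple might look like the sketch below. The types `keyGroup` and `usageRow` and the function `rollUpByKey` are hypothetical and do not reflect the real KeyTotal struct or its SQL-side grouping.

```go
package main

import "fmt"

// keyGroup is the hypothetical grouping tuple:
// (api_key_id, api_key_name, user_id, user_name).
type keyGroup struct {
	APIKeyID, APIKeyName, UserID, UserName string
}

// usageRow is an illustrative usage record carrying its grouping key.
type usageRow struct {
	Key    keyGroup
	Tokens int64
}

// rollUpByKey sums tokens per distinct tuple. Because the user fields are
// part of the map key, each total stays attributable to one user.
func rollUpByKey(rows []usageRow) map[keyGroup]int64 {
	totals := make(map[keyGroup]int64)
	for _, r := range rows {
		totals[r.Key] += r.Tokens
	}
	return totals
}

func main() {
	alice := keyGroup{"k1", "ci", "u1", "alice"}
	bob := keyGroup{"k2", "ci", "u2", "bob"}
	totals := rollUpByKey([]usageRow{{alice, 10}, {bob, 5}, {alice, 7}})
	fmt.Println(totals[alice], totals[bob])
}
```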

Refs: #9862

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-05-21 23:23:06 +02:00
LocalAI [bot]
a39e025d64 fix(nodes): make per-node backend install async via gallery job queue (#9928)
* feat(galleryop): add TargetNodeID to ManagementOp for single-node installs

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(galleryop): add NodeScopedKey helpers for per-node opcache rows

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* refactor(galleryop): use strings.Cut for NodeScopedKey parsing, reject empty nodeID

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(nodes): scope DistributedBackendManager.InstallBackend to single node via TargetNodeID

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(http): make /api/nodes/:id/backends/install async via gallery service job queue

The handler previously called unloader.InstallBackend synchronously and
blocked the browser for up to 3 minutes waiting on the NATS reply. It now
enqueues a TargetNodeID-scoped ManagementOp on BackendGalleryChannel and
returns HTTP 202 + jobID immediately, matching /api/backends/install/:id.

The opcache key is built via NodeScopedKey(nodeID, backend) so concurrent
installs of the same backend across different nodes do not stomp each
other. galleryService/opcache/appConfig are threaded through
RegisterNodeAdminRoutes for this.

Assisted-by: Claude:opus-4-7 [Edit] [Bash]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* refactor(http): log malformed backend_galleries override and stop test drain goroutine

Assisted-by: Claude:opus-4-7 [Edit] [Bash]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(api): expose nodeID for node-scoped backend ops in /api/operations

Node-scoped backend installs land in opcache under "node:<nodeID>:<backend>"
keys. Without splitting that prefix back out, the operations panel renders
the full key as the display name and has no structured way to label which
worker an install is targeting. Detect the prefix, surface nodeID as its own
response field, and reduce the display name back to the bare backend slug.
Bare (non-scoped) ops are left untouched so legacy installs do not gain a
misleading empty nodeID.
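
A small sketch of how such a key could be built and split back apart, assuming the "node:<nodeID>:<backend>" layout quoted above. The function names are illustrative rather than the real NodeScopedKey helpers, and rejecting an empty backend is an added assumption (the commits above only mention rejecting an empty nodeID).

```go
package main

import (
	"fmt"
	"strings"
)

const nodePrefix = "node:"

// nodeScopedKey (hypothetical helper) builds "node:<nodeID>:<backend>".
func nodeScopedKey(nodeID, backend string) string {
	return nodePrefix + nodeID + ":" + backend
}

// parseNodeScopedKey splits a scoped key back into its parts. Bare keys
// (no "node:" prefix) report ok=false so callers can leave them untouched.
func parseNodeScopedKey(key string) (nodeID, backend string, ok bool) {
	rest, found := strings.CutPrefix(key, nodePrefix)
	if !found {
		return "", "", false
	}
	nodeID, backend, found = strings.Cut(rest, ":")
	if !found || nodeID == "" || backend == "" {
		return "", "", false
	}
	return nodeID, backend, true
}

func main() {
	key := nodeScopedKey("worker-1", "llama-cpp")
	fmt.Println(key)
	fmt.Println(parseNodeScopedKey(key))
	fmt.Println(parseNodeScopedKey("llama-cpp"))
}
```

Because `strings.Cut` splits at the first colon, this sketch assumes node IDs never contain `:`.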

Assisted-by: Claude:opus-4-7 [Edit] [Bash]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(react-ui): poll job status for node-targeted backend installs

Assisted-by: Claude:opus-4-7 [Edit] [Bash]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(react-ui): make NodeInstallPicker state updates pure and surface cancellations as errors

Assisted-by: Claude:opus-4-7 [Edit] [Bash]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* refactor(react-ui): clarify async semantics in handleInstallOnTarget

Assisted-by: Claude:opus-4-7 [Edit] [Bash]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* refactor(http): use statusUrl casing for node install response to match codebase precedent

Assisted-by: Claude:opus-4-7 [Edit] [Bash]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-05-21 22:25:53 +02:00
Ettore Di Giacinto
05e8e1e9f4 ci(images): publish chronologically-orderable master-<epoch>-<sha> tags
The existing master push pipeline produces `master` (rolling) and
`sha-<short>` tags. Neither is orderable by build time, so downstream
GitOps tooling that wants to auto-bump to the newest master build (e.g. Flux
ImagePolicy) can't pick the latest from the tag list: an alphabetical
sort over hex shas is effectively random, and the rolling `master`
tag can't be referenced as an immutable bump target.

Add a third tag of the form `master-<epoch>-<sha>` (Unix epoch in
seconds + short sha), gated on default-branch pushes via metadata-
action's `is_default_branch` predicate. The sha is retained for
traceability; the epoch makes the tags numerically orderable, so a
Flux ImagePolicy like

  filterTags:
    pattern: '^master-(?P<ts>[0-9]+)-[a-f0-9]+$'
    extract: '$ts'
  policy:
    numerical:
      order: asc

will reliably bump to the newest master build.
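
To illustrate why the epoch component makes the tags orderable, here is a sketch that applies the same pattern to a tag list and picks the entry with the largest extracted timestamp. The tag values and the function name `newestMasterTag` are made up for the example; this is not how Flux itself is implemented.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// tagRE is the same pattern as in the ImagePolicy snippet above.
var tagRE = regexp.MustCompile(`^master-(?P<ts>[0-9]+)-[a-f0-9]+$`)

// newestMasterTag (hypothetical) returns the matching tag with the highest
// epoch. Tags that do not match the pattern are ignored.
func newestMasterTag(tags []string) (string, bool) {
	best, bestTS, found := "", int64(-1), false
	for _, t := range tags {
		m := tagRE.FindStringSubmatch(t)
		if m == nil {
			continue
		}
		ts, err := strconv.ParseInt(m[tagRE.SubexpIndex("ts")], 10, 64)
		if err != nil {
			continue
		}
		if ts > bestTS {
			best, bestTS, found = t, ts, true
		}
	}
	return best, found
}

func main() {
	tags := []string{
		"master",
		"sha-05e8e1e",
		"master-1779300000-a7f6cc8", // illustrative values
		"master-1779383910-05e8e1e",
	}
	fmt.Println(newestMasterTag(tags)) // master-1779383910-05e8e1e true
}
```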

Applied to both image_build.yml (OCI labels stay consistent) and
image_merge.yml (the actual tag publisher via buildx imagetools).
2026-05-21 17:18:30 +00:00
Rin
a7f6cc8956 [utils] Fail immediately on extraction errors (#9926)
utils: fail immediately on extraction errors

Setting ContinueOnError to false ensures that ExtractArchive does not
leave the model or backend directory in an inconsistent state if a
partial failure occurs. This improves robustness against malformed
archives or unexpected I/O issues during installation.

Signed-off-by: RinZ27 <222222878+RinZ27@users.noreply.github.com>
2026-05-21 19:00:33 +02:00
LocalAI [bot]
f15b9178ec feat(usage): track and visualise usage per API key (#9920)
* feat(usage): add Source, APIKeyID, APIKeyName columns to UsageRecord

Adds three additive columns plus UsageSource* constants. The columns
are auto-migrated by InitDB. APIKeyID is a nullable foreign reference
to UserAPIKey.ID; APIKeyName is snapshotted on each row so revoked
keys keep showing their name in history.

Refs: #9862
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(usage): backfill Source on pre-feature usage rows

InitDB now classifies any pre-existing usage_record with an empty
source: 'legacy-api-key' user -> legacy, everything else -> web.
The backfill is idempotent (only touches NULL/empty rows).

Refs: #9862
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(usage): add GetUserUsageBySource aggregator

Groups by (bucket, source, api_key_id, api_key_name). Filters out
legacy by default. Returns both per-bucket detail and roll-ups
(by_source, by_key sorted desc and capped at 200, grand_total).

The MAX(created_at) projection is iterated via Rows().Scan into a
string column and parsed manually because the SQLite driver surfaces
the aggregated timestamp as a string, which database/sql refuses to
scan directly into time.Time. Postgres returns a real timestamp; the
same string path handles its RFC3339 form too.
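
A hedged sketch of the string-parsing approach: try a short list of layouts in turn and report failure if none match. The layouts below are assumptions about what SQLite and Postgres drivers commonly emit, and `parseLastUsed` is an illustrative name, not the parseLastUsedString function from the commit.

```go
package main

import (
	"fmt"
	"time"
)

// lastUsedLayouts lists candidate formats, most specific first. These are
// assumed formats, not a list taken from the LocalAI source.
var lastUsedLayouts = []string{
	time.RFC3339Nano,
	"2006-01-02 15:04:05.999999999-07:00",
	"2006-01-02 15:04:05",
}

// parseLastUsed returns the parsed time in UTC, or ok=false when no layout
// matches so the caller can log the miss instead of silently dropping it.
func parseLastUsed(s string) (time.Time, bool) {
	for _, layout := range lastUsedLayouts {
		if t, err := time.Parse(layout, s); err == nil {
			return t.UTC(), true
		}
	}
	return time.Time{}, false
}

func main() {
	for _, s := range []string{
		"2026-05-21T14:34:02Z",
		"2026-05-21 14:34:02.123456789+00:00",
		"2026-05-21 14:34:02",
		"not a timestamp",
	} {
		t, ok := parseLastUsed(s)
		fmt.Println(ok, t)
	}
}
```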

Refs: #9862
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(usage): log Rows() errors and assert LastUsed in tests

Adds rows.Err() and Rows() open-failure logging in
computeSourceTotals so silent data drops surface in logs. Logs on
parseLastUsedString format misses for the same reason. Strengthens
the snapshot-survival test to assert LastUsed is a recent timestamp,
locking the SQLite time-string parser behaviour.

Refs: #9862
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(usage): add admin GetAllUsageBySource with filters and truncation

Optional user_id and api_key_id filters (composed with AND). Legacy
bucket is included for admin callers. truncated=true when more than
200 distinct keys would be in the by_key roll-up.

Refs: #9862
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(auth): plumb auth_source and auth_apikey through Echo context

tryAuthenticate now sets auth_source on every successful branch
(web for session/Bearer-session, apikey for Bearer-key/x-api-key/
token-cookie, legacy for legacy env key match). For named-key
branches it also stores the resolved *UserAPIKey under auth_apikey
so downstream middlewares can snapshot id+name without re-validating.

Refs: #9862
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(auth): expand tryAuthenticate godoc and cover Bearer-session branch

Documents all three context-key side effects (auth_source,
auth_apikey, _auth_session) plus the split of responsibilities with
the parent Middleware. Adds a test for the Bearer-as-session-token
classification so future regressions there fail loudly.

Refs: #9862
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(usage): UsageMiddleware records source + snapshots key name

Reads auth_source and auth_apikey from the Echo context (set by
auth.Middleware in the previous task). Snapshots UserAPIKey.ID and
Name onto each row so revoked keys remain readable in history.
Falls back to source=web when no auth_source is set (auth disabled
or unrecognised path).

Refs: #9862
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(usage): add /api/auth/usage/sources and admin variant

Self endpoint filters legacy server-side; admin endpoint includes
legacy and accepts user_id + api_key_id filters. Response includes
buckets, totals.{by_source, by_key, grand_total}, and a truncated
flag set when the per-key roll-up was capped at 200.

Refs: #9862
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* docs(routes): mark test mirror handlers as keep-in-sync with production

The newTestAuthApp helper duplicates production route handlers
inline because it cannot use RegisterAuthRoutes (which requires a
*application.Application). Naming the source path on each mirror
makes the drift contract explicit for future maintainers.

Refs: #9862
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(ui): add usageApi.getMySources/getAdminSources + i18n strings

Refs: #9862
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(ui): add Sources tab skeleton with data fetch

Adds Usage page tab that fetches /api/auth/usage/sources (or the
admin variant). Renders raw totals plus a placeholder key list;
real visualisations land in subsequent commits. Restructures the
existing tab button block so Models and Sources are visible to
non-admins (Users remains admin-only).

Refs: #9862
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(ui): source mix ribbon + searchable/sortable sources table

Replaces the SourcesTab placeholder rendering with two reusable
components: SourceMixRibbon (one segmented bar per source class)
and SourcesTable (search + sort + revoked-key dim). Pulls the
current API key list to detect revoked keys.

Refs: #9862
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(ui): skip revoked-key detection until the key list is known

existingKeyIds defaulted to an empty Set, which made every live
api_key row render as (revoked) during the brief window before
apiKeysApi.list() resolved, and permanently after a fetch failure.
Use null as the unknown state and suppress the revoked badge until
the parent provides a real Set.

Refs: #9862
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(ui): top-N stacked time chart and drill-in chip for Sources tab

Top 7 sources by total tokens get distinct colours; the rest roll up
into 'Other'. Clicking a row in the SourcesTable dims everything
except that series in the chart; the chip is the canonical way to clear
the selection.

Refs: #9862
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* docs(usage): document per-API-key Sources tab and endpoints

Extends features/authentication.md Usage Tracking section with:
- A 'Sources' tab description and source-class taxonomy
- Endpoint documentation for /api/auth/usage/sources and the
  admin variant
- Response shape example with by_source / by_key / grand_total
- Migration note about pre-feature row backfill

Refs: #9862
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(usage): silence errcheck on deferred rows.Close

CI errcheck flagged the bare 'defer rows.Close()' in
computeSourceTotals. Wrap in a closure that discards the close
error explicitly; an error here is non-actionable since we have
already drained the rows and logged any iteration failure.

Refs: #9862
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* refactor(usage): bound batcher intake and add Shutdown/FlushNow hooks

The pre-existing usage batcher had no cap on its add() path; the
usageMaxPending=5000 constant only guarded the re-queue path after
a failed write, leaving memory growth unbounded if the DB fell
behind. This commit:

- Adds the cap to add() so saturation drops new records (rate-limited
  warn at 1/1024) instead of growing unbounded.
- Raises usageMaxPending to 50000 to absorb realistic inference bursts.
- Replaces the package-level batcher global with a mutex-guarded pair
  plus a currentBatcher() accessor so Init / Shutdown cycles are
  race-free.
- Adds ShutdownUsageRecorder() for graceful drain on process exit
  (not yet wired into app shutdown, just published).
- Adds FlushNow() for deterministic tests; the middleware suite no
  longer needs 6s sleeps per spec and now runs in ~50ms instead of 18s.
- Re-queue on failed flush is now cap-aware: prepends as much of the
  failed batch as fits alongside concurrent arrivals, instead of
  dropping the whole batch when full.
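
The intake cap and the cap-aware re-queue from the list above can be sketched as follows. This is a simplified illustration with hypothetical names (`batcher`, `record`); keeping the head of the failed batch when it does not fully fit is an assumption, and the rate-limited warning is reduced to a drop counter.

```go
package main

import (
	"fmt"
	"sync"
)

type record struct{ Tokens int }

// batcher is a hypothetical bounded buffer of pending usage records.
type batcher struct {
	mu      sync.Mutex
	pending []record
	max     int
	dropped uint64
}

// add rejects new records once the buffer is full instead of growing it.
func (b *batcher) add(r record) bool {
	b.mu.Lock()
	defer b.mu.Unlock()
	if len(b.pending) >= b.max {
		b.dropped++ // a real implementation might log every Nth drop
		return false
	}
	b.pending = append(b.pending, r)
	return true
}

// requeue puts back as much of a failed batch as still fits, ahead of any
// records that arrived in the meantime, and returns how many were kept.
func (b *batcher) requeue(failed []record) int {
	b.mu.Lock()
	defer b.mu.Unlock()
	room := b.max - len(b.pending)
	if room <= 0 {
		return 0
	}
	if len(failed) > room {
		failed = failed[:room] // assumption: keep the head of the batch
	}
	b.pending = append(append([]record{}, failed...), b.pending...)
	return len(failed)
}

func main() {
	b := &batcher{max: 3}
	b.add(record{1})
	b.add(record{2})
	kept := b.requeue([]record{{10}, {11}, {12}})
	fmt.Println(kept, b.pending, b.add(record{3}), b.dropped)
}
```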

Refs: #9862
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(usage): drain usage batcher on graceful shutdown

Registers ShutdownUsageRecorder with the existing
signals.RegisterGracefulTerminationHandler so SIGINT/SIGTERM
synchronously flushes any in-memory usage records before the
process exits. Without this, up to one flush interval (5s) of
recorded usage was lost when LocalAI restarted.

Refs: #9862
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-05-21 16:34:02 +02:00
LocalAI [bot]
959de86761 feat(llama-cpp): make server-side prompt cache work by default (#9925)
Aligns LocalAI's llama-cpp gRPC backend with upstream's auto-on prompt
cache path so repeated system prompts (agents, OpenAI/Anthropic-compatible
CLIs, coding assistants) skip prefill on subsequent calls without any
YAML changes. Reported in #9921.

Upstream's server enables `kv_unified=true` (and bumps `n_parallel` to 4)
when slot count is auto, which unlocks `cache_idle_slots`. LocalAI
hardcodes `n_parallel=1` and, until this change, also hardcoded
`kv_unified=false`, which silently force-disables idle-slot saving at
server init. As a result, the host prompt cache was allocated but never
written across requests.

Changes in backend/cpp/llama-cpp/grpc-server.cpp:
- params.kv_unified: false -> true (single-slot path now benefits from
  the prompt cache; users can opt out with `kv_unified:false`)
- params.n_ctx_checkpoints: 8 -> 32 (match upstream default)
- params.cache_idle_slots = true initialized explicitly (upstream default)
- params.checkpoint_every_nt = 8192 initialized explicitly (upstream default)
- New option parsers: cache_idle_slots / idle_slots_cache,
  checkpoint_every_nt / checkpoint_every_n_tokens

Docs:
- features/text-generation.md: fix misleading `cache_ram` description
  (it's the host-side prompt cache, not the KV cache), document the
  kv_unified + cache_ram + cache_idle_slots interaction, add rows for
  the two newly-exposed options, and add a worked example for the
  agent/CLI workload from the issue.
- advanced/model-configuration.md: mark the legacy `prompt_cache_path`
  / `prompt_cache_all` / `prompt_cache_ro` YAML fields as unused by the
  llama-cpp gRPC backend (they target upstream's CLI completion tool
  and are not consumed by grpc-server.cpp) and point readers at the
  new prompt-cache explainer.

Closes #9921

Assisted-by: claude:opus-4.7

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-05-21 16:31:48 +02:00
LocalAI [bot]
4c234abc2c refactor(agents): bump skillserver, drop redundant Name from list_skills output (#9916)
refactor(agents): bump skillserver, drop redundant Name from list_skills/search_skills

skillserver's list_skills MCP tool used to ship every entry with name=""
(field was commented out), while search_skills populated it - two tools
with inconsistent shape for the same data. skill.Name and skill.ID are
populated from the same source string anyway (the directory name), so
returning both was pure duplication.

Bumps github.com/mudler/skillserver to a7317cb, which drops the Name
field from both SkillInfo and SearchResult and leaves ID as the single
canonical identifier (already what read_skill consumes).

Adds core/services/skills/skills_mcp_test.go, a regression that drives
the LocalAI FilesystemManager through an in-process MCP session and
asserts a newly-created skill is visible by ID on the still-open session.

This is a cleanup, not the root cause of #9868 - the reporter likely
sees something deeper than a cosmetic JSON shape issue.

Assisted-by: Claude:claude-opus-4-7 [Claude Code]

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-05-21 14:45:53 +02:00
Richard Palethorpe
c68818a62e fix(llama-cpp): terminate tensor_buft_overrides with sentinel (#9919)
llama.cpp's model loader asserts back().pattern == nullptr on
params.tensor_buft_overrides (and on params.kv_overrides.back().key[0]
== 0) before binding them into llama_model_params. PR #8560 attempted
to satisfy llama_params_fit's placeholder requirement by pre-filling
params.tensor_buft_overrides up to llama_max_tensor_buft_overrides()
*before* the option-parse loop. Any subsequent push_back from
override_tensor / draft_cpu_moe / draft_n_cpu_moe / draft_override_tensor
then appended real entries after the placeholders, leaving back() with
a real pattern and tripping the assert. The draft override vector
likewise had no terminator at all.

Mirror upstream common/arg.cpp:645-658 instead: real entries are
pushed during option parsing, and after parsing we pad the main vector
up to ntbo (placeholders land at the end, so back() is always nullptr)
and append a single {nullptr, nullptr} to the draft vector when it is
non-empty. The existing kv_overrides terminator block already matches
upstream and stays.

Verified against ggml-org/llama.cpp@5cbaa5e: only tensor_buft_overrides
(main + draft) and kv_overrides are sentinel-terminated common_params
fields; everything else is size-driven std::vector.

Assisted-by: claude-code:claude-opus-4-7

Signed-off-by: Richard Palethorpe <io@richiejp.com>
2026-05-21 12:55:06 +02:00
LocalAI [bot]
11d5bd0cc3 fix(react-ui/chat): stop wiping selection on every /api/operations poll (#9904) (#9917)
useOperations() was calling setOperations() with a fresh array on every
1s poll, even when the payload was identical. In React 19 the DOM diff
no longer short-circuits dangerouslySetInnerHTML on equal __html, so the
forced Chat re-render re-assigned innerHTML on every assistant message
once per second — wiping any text the user had selected.

Skip the state update when the serialised operations payload is
unchanged, and switch loading/error to functional setters so they also
short-circuit at the source.

Also fixes the chat copy button on plain HTTP: navigator.clipboard is
undefined in non-secure contexts (a common LXC+Docker deployment), but
the previous code called it unconditionally and showed a success toast
regardless. Routed Chat, AgentChat and CanvasPanel through a new
copyToClipboard() helper that uses navigator.clipboard when available
and falls back to a hidden-textarea + execCommand('copy') trick that
browsers still honour outside secure contexts. The fallback preserves
the user's existing selection.

Regression coverage in e2e/chat-polling-selection.spec.js: a
MutationObserver counts mutations on the assistant content node across
3s of polling (must be 0); the copy test stubs out navigator.clipboard
and asserts that execCommand('copy') is invoked.


Assisted-by: claude-opus-4-7-1m

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-05-21 12:17:51 +02:00
LocalAI [bot]
12e056e96d chore: ⬆️ Update ggml-org/llama.cpp to ad277572619fcfb6ddd38f4c6437283a4b2b8636 (#9915)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-05-21 09:07:31 +02:00
LocalAI [bot]
308aa8908a chore: ⬆️ Update ace-step/acestep.cpp to ed53caf164e4492a5620b2e3f2264629cf66da24 (#9913)
⬆️ Update ace-step/acestep.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-05-21 00:15:57 +02:00
LocalAI [bot]
b2d68a53a2 chore: ⬆️ Update ikawrakow/ik_llama.cpp to 11a1fea9e291f12ce2c803a9d7812c30ca806bcf (#9914)
⬆️ Update ikawrakow/ik_llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-05-20 22:04:06 +00:00
LocalAI [bot]
e3706c0512 chore(model-gallery): ⬆️ update checksum (#9910)
⬆️ Checksum updates in gallery/index.yaml

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-05-20 23:38:45 +02:00
LocalAI [bot]
1ffd82a050 chore: ⬆️ Update antirez/ds4 to 2606543be7a8c125a32cee37f5d1d85dc78f2fcf (#9909)
⬆️ Update antirez/ds4

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-05-20 21:22:26 +00:00
LocalAI [bot]
f515168dbe chore(acestep-cpp): bump pin to ed53caf and adapt wrapper to new API (#9908)
The new ace-step.cpp revision moves backend initialization inside each
`*_load` call and drops the separate `DiTGGMLConfig` argument from
`dit_ggml_load` (config now lives in `DiTGGML::cfg`, populated from GGUF
metadata at load time). Drop the now-removed `*_init_backend` calls and
replace `g_dit_cfg` accesses with `g_dit.cfg`.


Assisted-by: Claude:claude-opus-4-7 [Claude Code]

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-05-20 21:05:32 +00:00
LocalAI [bot]
ef6ca34513 chore: ⬆️ Update leejet/stable-diffusion.cpp to 5b0267e941cade15bd80089d89838795d9f4baa6 (#9907)
Adapt the C++ wrapper to the new `generate_video()` signature: upstream now
returns `bool` and writes frames/audio via out-parameters (`sd_image_t**`,
`sd_audio_t**`). Also set `p->fps` on the params struct (new upstream field)
and free the returned audio handle on both the success and error paths.


Assisted-by: claude-code:claude-opus-4-7

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-05-20 20:53:19 +00:00
dependabot[bot]
9413c3767f chore(deps): update transformers requirement from >=5.8.0 to >=5.8.1 in /backend/python/transformers (#9883)
chore(deps): update transformers requirement

Updates the requirements on [transformers](https://github.com/huggingface/transformers) to permit the latest version.
- [Release notes](https://github.com/huggingface/transformers/releases)
- [Commits](https://github.com/huggingface/transformers/compare/v5.8.0...v5.8.1)

---
updated-dependencies:
- dependency-name: transformers
  dependency-version: 5.8.1
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-05-20 22:16:02 +02:00
dependabot[bot]
3bf3cce232 chore(deps): bump sentence-transformers from 5.4.0 to 5.5.0 in /backend/python/transformers (#9888)
chore(deps): bump sentence-transformers in /backend/python/transformers

Bumps [sentence-transformers](https://github.com/huggingface/sentence-transformers) from 5.4.0 to 5.5.0.
- [Release notes](https://github.com/huggingface/sentence-transformers/releases)
- [Commits](https://github.com/huggingface/sentence-transformers/compare/v5.4.0...v5.5.0)

---
updated-dependencies:
- dependency-name: sentence-transformers
  dependency-version: 5.5.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-05-20 22:13:39 +02:00
LocalAI [bot]
06f8159035 chore: ⬆️ Update ggml-org/llama.cpp to 67ace021da905e27ecbdf1176b0eef578a5288c0 (#9897)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-05-20 22:05:58 +02:00
LocalAI [bot]
f6a73f54fa feat(swagger): update swagger (#9872)
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-05-20 22:05:35 +02:00
LocalAI [bot]
24e04d8e81 chore: ⬆️ Update ikawrakow/ik_llama.cpp to 77413bc900f9a2bfd8a5407f184427bcc0825f6c (#9899)
⬆️ Update ikawrakow/ik_llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-05-20 01:02:53 +02:00
LocalAI [bot]
b9a49449ae chore: ⬆️ Update ggml-org/whisper.cpp to afa2ea544fb4b0448916b4a31ecd33c8685bd482 (#9898)
⬆️ Update ggml-org/whisper.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-05-20 01:02:25 +02:00
LocalAI [bot]
1879e11042 chore: ⬆️ Update antirez/ds4 to 599e49d253971451f710cb8323344e789906ed6c (#9900)
⬆️ Update antirez/ds4

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-05-20 01:01:45 +02:00
LocalAI [bot]
403d391316 chore(model-gallery): ⬆️ update checksum (#9901)
⬆️ Checksum updates in gallery/index.yaml

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-05-20 01:01:20 +02:00
Daniel Liljeberg
fc3980dadd fix: inject text-file content into chat completions messages (#9896)
Non-image/non-audio file attachments (txt, md, csv, json) were being
  stored in the 'files' metadata field but never added to the message
  content array sent to /v1/chat/completions. Images and audio correctly
  received content blocks; files did not.

  Fix: push a text content block into messageContent when textContent is
  present, matching the pattern used for image_url and audio_url.

  Also fixes Home.jsx addFiles which never called file.text() at all,
  meaning files attached on the home screen had empty textContent even
  before reaching useChat.js.

  Note: PDF files use file.text() which returns raw bytes rather than
  parsed text. Proper PDF support would require PDF.js or server-side
  extraction and is not part of this fix.

Signed-off-by: Daniel Liljeberg <damien_@hotmail.com>
2026-05-20 01:00:32 +02:00
Richard Palethorpe
2009544b44 fix(nix): correct flake src path and add dev shell (#9894)
The flake set `src = ./sources;` referencing a non-existent subdirectory,
so `nix build` and `nix develop` both failed evaluation. Point `src` at
the repo root and refresh `vendorHash` accordingly.

Add `devShells.default` with the Go toolchain, protobuf generators,
Node.js/bun for the React UI (`make react-ui`), and the linters used by
`make lint` (golangci-lint, gofumpt, goimports, staticcheck).

Assisted-by: Claude:claude-opus-4-7

Signed-off-by: Richard Palethorpe <io@richiejp.com>
2026-05-19 19:28:30 +02:00
dependabot[bot]
e859345b12 chore(deps): bump github.com/alecthomas/kong from 1.14.0 to 1.15.0 (#9881)
Bumps [github.com/alecthomas/kong](https://github.com/alecthomas/kong) from 1.14.0 to 1.15.0.
- [Commits](https://github.com/alecthomas/kong/compare/v1.14.0...v1.15.0)

---
updated-dependencies:
- dependency-name: github.com/alecthomas/kong
  dependency-version: 1.15.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-05-19 08:07:07 +02:00
dependabot[bot]
f30712f8e8 chore(deps): bump github.com/aws/aws-sdk-go-v2 from 1.41.6 to 1.41.7 (#9892)
Bumps [github.com/aws/aws-sdk-go-v2](https://github.com/aws/aws-sdk-go-v2) from 1.41.6 to 1.41.7.
- [Release notes](https://github.com/aws/aws-sdk-go-v2/releases)
- [Commits](https://github.com/aws/aws-sdk-go-v2/compare/v1.41.6...v1.41.7)

---
updated-dependencies:
- dependency-name: github.com/aws/aws-sdk-go-v2
  dependency-version: 1.41.7
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-05-19 08:06:50 +02:00
dependabot[bot]
a19c77c5f8 chore(deps): bump github.com/onsi/ginkgo/v2 from 2.28.2 to 2.29.0 (#9882)
Bumps [github.com/onsi/ginkgo/v2](https://github.com/onsi/ginkgo) from 2.28.2 to 2.29.0.
- [Release notes](https://github.com/onsi/ginkgo/releases)
- [Changelog](https://github.com/onsi/ginkgo/blob/master/CHANGELOG.md)
- [Commits](https://github.com/onsi/ginkgo/compare/v2.28.2...v2.29.0)

---
updated-dependencies:
- dependency-name: github.com/onsi/ginkgo/v2
  dependency-version: 2.29.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-05-19 08:06:34 +02:00
LocalAI [bot]
4b02d23c0c chore: ⬆️ Update ggml-org/llama.cpp to 5cbaa5e69e09bde3334cd8c355570553a0dca027 (#9876)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-05-19 08:06:16 +02:00
LocalAI [bot]
21140e96b2 chore: ⬆️ Update ggml-org/whisper.cpp to 47b9eb37a33c5031a1b667ace64477330b9f36c1 (#9877)
⬆️ Update ggml-org/whisper.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-05-19 08:05:56 +02:00
dependabot[bot]
fc803e8d48 chore(deps): bump golang.org/x/crypto from 0.50.0 to 0.51.0 (#9886)
Bumps [golang.org/x/crypto](https://github.com/golang/crypto) from 0.50.0 to 0.51.0.
- [Commits](https://github.com/golang/crypto/compare/v0.50.0...v0.51.0)

---
updated-dependencies:
- dependency-name: golang.org/x/crypto
  dependency-version: 0.51.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-05-19 08:04:15 +02:00
LocalAI [bot]
ca51606bfe chore: ⬆️ Update ikawrakow/ik_llama.cpp to 40aae0b6d86d50c0ee7011b3ce59a233203e430a (#9875)
⬆️ Update ikawrakow/ik_llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-05-19 08:01:41 +02:00
Azteczek
cb502de309 feat: add flake.nix for dockerless setup (#9851)
* Add flake.nix

Signed-off-by: Azteczek <243776410+Azteczek@users.noreply.github.com>

* Add flake.lock

Signed-off-by: Azteczek <243776410+Azteczek@users.noreply.github.com>

---------

Signed-off-by: Azteczek <243776410+Azteczek@users.noreply.github.com>
2026-05-18 15:23:10 +01:00
Richard Palethorpe
5d0b549049 feat(gallery): verify backend OCI images with keyless cosign (#9823)
* feat(gallery): verify backend OCI images with keyless cosign

Close a trust gap where a registry compromise or MITM could silently
replace a backend image: the gallery YAML tells LocalAI which image to
pull, but until now nothing verified the bytes came from our CI.

Consumer (pkg/oci/cosignverify):
- New package using sigstore-go to verify keyless-cosign signatures.
- OCI 1.1 referrers API + new bundle format (no legacy :tag.sig).
- Policy fields: Issuer / IssuerRegex / Identity / IdentityRegex /
  NotBefore. NotBefore is the revocation lever — keyless Fulcio certs
  are ephemeral so revocation is policy-side; advancing not_before in
  the gallery YAML invalidates every signature predating the cutoff.
- TUF trusted root cached process-wide so N backends from one gallery
  do 1 fetch, not N.

Plumbing:
- pkg/downloader: ImageVerifier interface + WithImageVerifier option
  threaded through DownloadFileWithContext. Verification runs between
  oci.GetImage and oci.ExtractOCIImage, with digest pinning via
  pinnedImageRef to close the TOCTOU window. Skips the verifier's HEAD
  when the ref is already digest-pinned.
- core/config: Gallery.Verification YAML block.
- core/gallery: backendDownloadOptions builds the verifier from the
  policy; applied on initial URI, mirrors, and tag fallbacks.
- core/gallery/upgrade: the upgrade path now routes through the same
  options builder. A regression Ginkgo spec pins this contract —
  without it, UpgradeBackend silently bypassed verification.
- core/cli: --require-backend-integrity (LOCALAI_REQUIRE_BACKEND_INTEGRITY)
  escalates missing policy / empty SHA256 from warn to hard-fail.

Producer (.github/workflows/backend_merge.yml):
- id-token: write at job scope (PR-fork-safe via existing event gate).
- sigstore/cosign-installer@v3, with the cosign release pinned to v2.4.1.
- After each docker buildx imagetools create, resolve the manifest
  list digest and run cosign sign --recursive --new-bundle-format
  --registry-referrers-mode=oci-1-1 against repo@digest. --recursive
  signs the index and every per-arch entry, matching how the consumer
  resolves a tag to a platform-specific manifest before verifying.

Rollout: backend/index.yaml has no `verification:` block yet, so this
PR is backward-compatible — installs proceed with a warning until the
gallery is populated. Strict mode is opt-in.

Assisted-by: claude-code:claude-opus-4-7 [Bash] [Edit] [Read] [Write] [WebSearch] [WebFetch]
Signed-off-by: Richard Palethorpe <io@richiejp.com>

* refactor(gallery): plumb RequireBackendIntegrity through config instead of env

The previous implementation re-exported the --require-backend-integrity
CLI flag into LOCALAI_REQUIRE_BACKEND_INTEGRITY via os.Setenv, then
re-read it in core/gallery via os.Getenv. This leaked process state
into the gallery package and made the flag impossible to override
per-call or test without touching the env.

Add RequireBackendIntegrity to ApplicationConfig (with a matching
WithRequireBackendIntegrity AppOption) and thread the bool through
every install/upgrade path: InstallBackend, InstallBackendFromGallery,
UpgradeBackend, InstallModelFromGallery, InstallExternalBackend,
ApplyGalleryFromString/File, startup.InstallModels. Worker subcommands
gain the same env-bound flag on WorkerFlags so distributed-worker
installs honor it consistently with the worker daemon path.

Add a forbidigo lint rule against os.Getenv / os.LookupEnv / os.Environ
to keep the env-leak pattern from creeping back. Existing offenders
(p2p, config loaders, etc.) are baseline-grandfathered by the existing
new-from-merge-base: origin/master setting; targeted path exclusions
cover the legitimate cases — kong CLI entry points, backend
subprocesses, system capability probes, gRPC AUTH_TOKEN inheritance,
test gating env vars.
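
The plumbing described in this commit can be pictured as a functional option. The names `ApplicationConfig`, `RequireBackendIntegrity`, and `WithRequireBackendIntegrity` come from the message above; the field layout, the `AppOption` signature, and the `NewApplicationConfig` constructor are assumptions made for illustration and may not match the repository.

```go
package main

import "fmt"

// ApplicationConfig is a pared-down stand-in for the real struct; only the
// field discussed in the commit message is shown.
type ApplicationConfig struct {
	RequireBackendIntegrity bool
}

// AppOption mutates the config at construction time (assumed signature).
type AppOption func(*ApplicationConfig)

// WithRequireBackendIntegrity sets the flag explicitly, so callers and tests
// can choose a value without touching the process environment.
func WithRequireBackendIntegrity(v bool) AppOption {
	return func(c *ApplicationConfig) { c.RequireBackendIntegrity = v }
}

// NewApplicationConfig is a hypothetical constructor that applies the
// options in order on top of the zero-value defaults.
func NewApplicationConfig(opts ...AppOption) *ApplicationConfig {
	c := &ApplicationConfig{}
	for _, o := range opts {
		o(c)
	}
	return c
}

func main() {
	cfg := NewApplicationConfig(WithRequireBackendIntegrity(true))
	fmt.Println("strict mode:", cfg.RequireBackendIntegrity)
}
```

Passing the bool as an argument keeps the gallery code free of `os.Getenv`, which is the property the forbidigo rule is meant to protect.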

Assisted-by: claude-code:claude-opus-4-7
Signed-off-by: Richard Palethorpe <io@richiejp.com>

---------

Signed-off-by: Richard Palethorpe <io@richiejp.com>
2026-05-18 08:02:20 +02:00
LocalAI [bot]
11cff1b309 chore: ⬆️ Update ggml-org/llama.cpp to 87589042cac2c390cec8d68fb2fad64e0a2a252a (#9855)
⬆️ Update ggml-org/llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-05-18 08:01:30 +02:00
LocalAI [bot]
4ca3d2cdc0 docs: ⬆️ update docs version mudler/LocalAI (#9863)
⬆️ Update docs version mudler/LocalAI

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-05-17 23:20:16 +02:00
LocalAI [bot]
3cba35ed32 chore: ⬆️ Update antirez/ds4 to c9dd9499bfa57c1bbfbb4446eff963330ab5329b (#9864)
⬆️ Update antirez/ds4

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-05-17 23:19:58 +02:00
LocalAI [bot]
265ae35231 chore: ⬆️ Update ikawrakow/ik_llama.cpp to c35189d83c91aad780aba62b89f2830cb2916223 (#9866)
⬆️ Update ikawrakow/ik_llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-05-17 23:19:43 +02:00
LocalAI [bot]
6a48157a80 chore: ⬆️ Update leejet/stable-diffusion.cpp to bd17f53b7386fb5f60e8587b75e73c4b2fed3426 (#9854)
⬆️ Update leejet/stable-diffusion.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-05-16 23:12:05 +02:00
LocalAI [bot]
41c838b2df chore: ⬆️ Update ikawrakow/ik_llama.cpp to 3e573cfea6e0a332eff822ffbdb1dd3b112e9051 (#9856)
⬆️ Update ikawrakow/ik_llama.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-05-16 22:44:08 +02:00
LocalAI [bot]
21e793ad2a chore: ⬆️ Update antirez/ds4 to ef0a4905d05263df8e63689f2dd1efac618a752c (#9857)
⬆️ Update antirez/ds4

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-05-16 22:43:46 +02:00
LocalAI [bot]
7c190bb4b9 docs: ⬆️ update docs version mudler/LocalAI (#9853)
⬆️ Update docs version mudler/LocalAI

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-05-16 22:43:06 +02:00
LocalAI [bot]
d77a9137d8 feat(llama-cpp): bump to MTP-merge SHA and automatically set MTP defaults (#9852)
* feat(llama-cpp): bump to MTP-merge SHA and document draft-mtp spec type

Update LLAMA_VERSION to 0253fb21 (post ggml-org/llama.cpp#22673 merge,
2026-05-16) to pick up Multi-Token Prediction support.

No grpc-server.cpp changes are required: the existing `spec_type` option
delegates to upstream's `common_speculative_types_from_names()`, which
already accepts the new `draft-mtp` name. The `n_rs_seq` cparam needed
by MTP is auto-derived inside `common_context_params_to_llama` from
`params.speculative.need_n_rs_seq()`, and when no `draft_model` is set
the upstream server builds the MTP context off the target model itself.

Docs: extend the speculative-decoding section of the model-configuration
guide with the new type, both load paths (MTP head embedded in the main
GGUF vs. separate `mtp-*.gguf` sibling), the PR's recommended
`spec_n_max:2-3`, and the chained `draft-mtp,ngram-mod` recipe. Also
notes that the upstream `-hf` auto-discovery of `mtp-*.gguf` siblings is
not wired through LocalAI's gRPC layer.

Agent guide: short note explaining that new upstream spec types are
picked up automatically and that MTP needs no gRPC plumbing.

Assisted-by: Claude:claude-opus-4-7 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* feat(llama-cpp): auto-detect MTP heads and enable draft-mtp on import + load

Detect upstream's `<arch>.nextn_predict_layers` GGUF metadata key (set by
`convert_hf_to_gguf.py` for Qwen3.5/3.6 family models and similar) and,
when present and the user has not configured a `spec_type` explicitly,
auto-append the upstream-recommended speculative-decoding tuple:

  - spec_type:draft-mtp
  - spec_n_max:6
  - spec_p_min:0.75

The 0.75 p_min is pinned defensively because upstream marks the current
default with a "change to 0.0f" TODO; locking it here keeps acceptance
thresholds stable across future llama.cpp bumps.

Detection runs in two places:

  - The model importer (`POST /models/import-uri`, the `/import-model`
    UI) range-fetches the GGUF header for HuggingFace / direct-URL
    imports via `gguf.ParseGGUFFileRemote`, with a 30s timeout and
    non-fatal error handling. OCI/Ollama URIs are skipped because the
    artifact is not directly streamable; the load-time hook covers them
    once the file is on disk.
  - The llama-cpp load-time hook (`guessGGUFFromFile`) reads the local
    header on every model start and appends the same options if
    `spec_type` is not already set.

Both paths share `ApplyMTPDefaults` and respect an explicit user-set
`spec_type:` / `speculative_type:` so YAML overrides win. Ginkgo
specs cover the append, preserve-user-choice, legacy alias, and nil
safety paths.
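
A minimal sketch of the append-unless-overridden rule, assuming options are plain `key:value` strings. The function name `applyMTPDefaults`, its signature, and the `nextnPredictLayers` parameter are hypothetical; only the three default values and the two option keys are taken from the message above.

```go
package main

import (
	"fmt"
	"strings"
)

// mtpDefaults mirrors the tuple quoted in the commit message.
var mtpDefaults = []string{"spec_type:draft-mtp", "spec_n_max:6", "spec_p_min:0.75"}

// hasSpecType reports whether the user already chose a speculative type,
// under either the current key or the legacy alias.
func hasSpecType(options []string) bool {
	for _, o := range options {
		key, _, _ := strings.Cut(o, ":")
		switch strings.TrimSpace(key) {
		case "spec_type", "speculative_type":
			return true
		}
	}
	return false
}

// applyMTPDefaults appends the MTP tuple only when the GGUF header reported
// MTP layers and no explicit speculative type is present. A nil slice is
// accepted and treated as "no options".
func applyMTPDefaults(options []string, nextnPredictLayers uint32) []string {
	if nextnPredictLayers == 0 || hasSpecType(options) {
		return options
	}
	out := append([]string{}, options...) // copy so the caller's slice is untouched
	return append(out, mtpDefaults...)
}

func main() {
	fmt.Println(applyMTPDefaults([]string{"threads:4"}, 1))
	fmt.Println(applyMTPDefaults([]string{"spec_type:ngram-mod"}, 1))
}
```

Returning the input unchanged when a type is already set is what lets a YAML override win over the detected defaults.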

Assisted-by: Claude:claude-opus-4-7 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

* fix(importer): resolve huggingface:// URIs before MTP header probe

`gguf.ParseGGUFFileRemote` only speaks HTTP(S), but the importer was
handing it the raw `huggingface://...` URI directly (and similarly for
any other custom downloader scheme). Live-test against
`huggingface://ggml-org/Qwen3.6-27B-MTP-GGUF/Qwen3.6-27B-MTP-Q8_0.gguf`
exposed this: the probe failed with `unsupported protocol scheme
"huggingface"`, was caught by the non-fatal error path, and the MTP
options were silently never applied to the generated YAML.

Route every candidate URI through `downloader.URI.ResolveURL()` and
require the resolved form to be HTTP(S). After the fix the probe
successfully reads `<arch>.nextn_predict_layers=1` from the real HF
GGUF and the emitted ConfigFile carries spec_type:draft-mtp,
spec_n_max:6, spec_p_min:0.75 as intended.

Assisted-by: Claude:claude-opus-4-7 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-05-16 22:42:48 +02:00
128 changed files with 6054 additions and 765 deletions

@@ -112,6 +112,8 @@ Add a YAML anchor definition in the `## metas` section (around line 2-300). Look
Add image entries at the end of the file, following the pattern of similar backends such as `diffusers` or `chatterbox`. Include both `latest` (production) and `master` (development) tags.
**Note on integrity:** OCI backends installed from a gallery whose `verification:` block is set are verified against a keyless-cosign policy before extraction; tarball/HTTP backends use the optional `sha256:` field. New backends do not need any extra YAML — the gallery-level `verification:` block covers every entry. See [.agents/backend-signing.md](backend-signing.md) for the producer-side CI step.
## 4. Update the Makefile
The Makefile needs to be updated in several places to support building and testing the new backend:

.agents/backend-signing.md (new file, 120 lines added)

@@ -0,0 +1,120 @@
# Backend image signing & verification
LocalAI verifies backend OCI images against a per-gallery keyless-cosign
policy. This page documents the trust model, the producer side
(`.github/workflows/backend_merge.yml` in this repo), and the consumer
side (`pkg/oci/cosignverify` plus the gallery YAML).
## Trust model
- **Producer:** `.github/workflows/backend_merge.yml` signs each pushed
manifest list with `cosign sign --recursive` in keyless mode after
`docker buildx imagetools create`. The signing cert is issued by
Fulcio bound to the workflow's OIDC identity. There is no long-lived
signing key. `--recursive` signs both the manifest list and every
per-arch entry — needed because our consumer resolves a tag to a
per-arch manifest before checking signatures.
- **Storage:** Signatures are written as OCI 1.1 referrers
(`--registry-referrers-mode=oci-1-1`) in the new Sigstore bundle format
(`--new-bundle-format`). No `:sha256-<hex>.sig` tag clutter.
- **Consumer:** `pkg/oci/cosignverify` discovers the bundle via the
referrers API, hands it to `sigstore-go`, and verifies it against the
policy declared in the gallery YAML (`Gallery.Verification`).
- **Revocation:** Keyless cosign certs are ephemeral (10-minute Fulcio
validity), so revocation is policy-side, not CA-side. The gallery's
`verification.not_before` (RFC3339) is the kill-switch — advance it to
invalidate every signature produced before a known compromise window.
## Producer setup
`backend_merge.yml` is the workflow that joins per-arch digests into the
multi-arch manifest list users actually pull, so it's also the right place
to sign. The job needs:
- `permissions: { id-token: write, contents: read }` at the job level so
the runner can exchange its GitHub OIDC token for a Fulcio cert.
- `sigstore/cosign-installer@v3` step (cosign ≥ 2.2 for
`--new-bundle-format`).
- After each `docker buildx imagetools create`, resolve the resulting
list digest with `docker buildx imagetools inspect <tag> --format
'{{.Manifest.Digest}}'` and sign:
```sh
cosign sign --yes --recursive \
--new-bundle-format \
--registry-referrers-mode=oci-1-1 \
"${REGISTRY_REPO}@${DIGEST}"
```
Sign by digest, never by tag — signing by tag binds the signature to
whatever the tag points at *now*, and a subsequent tag push orphans it.
`backend_build_darwin.yml` builds and pushes single-arch darwin images
that bypass the manifest-list merge. If/when those entries get a gallery
`verification:` policy, the equivalent cosign step has to land there
too.
## Consumer setup (in `mudler/LocalAI` gallery YAML)
Once CI is signing, add a `verification:` block to the backend gallery
entry (`backend/index.yaml`):
```yaml
- name: localai
url: github:mudler/LocalAI/backend/index.yaml@master
verification:
issuer: "https://token.actions.githubusercontent.com"
identity_regex: "^https://github\\.com/mudler/LocalAI/\\.github/workflows/backend_merge\\.yml@refs/heads/master$"
# Optional revocation cutoff; advance during incident response.
# not_before: "2026-06-01T00:00:00Z"
```
Identity matching pins the OIDC subject Fulcio issued the signing cert
to. Without this, any image signed by *anyone* with a Fulcio cert would
pass — the regex is what makes a signature mean "produced by our CI".
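
To see what the identity pin accepts and rejects, here is a small sketch using Go's standard `regexp` package. The issuer and pattern are the ones from the YAML above, with the YAML double-backslash escaping removed; `matchesPolicy` and its parameters are hypothetical names, and real verification also checks the certificate chain and transparency log, which this sketch omits.

```go
package main

import (
	"fmt"
	"regexp"
)

const issuer = "https://token.actions.githubusercontent.com"

// identityRE is the identity_regex from the gallery entry.
var identityRE = regexp.MustCompile(
	`^https://github\.com/mudler/LocalAI/\.github/workflows/backend_merge\.yml@refs/heads/master$`)

// matchesPolicy reports whether a certificate's issuer and identity
// (the workflow's OIDC subject) satisfy the pinned policy.
func matchesPolicy(certIssuer, certIdentity string) bool {
	return certIssuer == issuer && identityRE.MatchString(certIdentity)
}

func main() {
	ours := "https://github.com/mudler/LocalAI/.github/workflows/backend_merge.yml@refs/heads/master"
	fork := "https://github.com/someone/LocalAI/.github/workflows/backend_merge.yml@refs/heads/master"
	fmt.Println(matchesPolicy(issuer, ours))
	fmt.Println(matchesPolicy(issuer, fork))
}
```

The anchors (`^`, `$`) and escaped dots matter: without them, a look-alike repository or branch name could satisfy the pattern.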
## Strict mode
Default behaviour: OCI backends without a `verification:` block install
with a warning (logs include `installing OCI backend without signature
verification`). Tarball/HTTP backends without a `sha256` field log a
similar warning.
For production, set `LOCALAI_REQUIRE_BACKEND_INTEGRITY=1` (or pass
`--require-backend-integrity` to `local-ai run` / `local-ai backends
install` / `local-ai models install`). The warning becomes a hard error
and unverifiable backends refuse to install.
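
The warn-versus-fail behaviour can be sketched as one decision function. `integrityDecision` and its parameters are illustrative only; the first warning string is the one quoted above, while the tarball warning text is an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// integrityDecision returns a warning to log when a backend cannot be
// verified, or an error instead when strict mode is enabled.
func integrityDecision(isOCI, hasPolicy bool, sha256 string, strict bool) (string, error) {
	var msg string
	switch {
	case isOCI && !hasPolicy:
		msg = "installing OCI backend without signature verification"
	case !isOCI && sha256 == "":
		msg = "installing backend without sha256 checksum" // assumed wording
	default:
		return "", nil // verifiable: nothing to warn about
	}
	if strict {
		return "", errors.New(msg + " (refused: backend integrity is required)")
	}
	return msg, nil
}

func main() {
	warn, err := integrityDecision(true, false, "", false)
	fmt.Println("default:", warn, err)
	_, err = integrityDecision(true, false, "", true)
	fmt.Println("strict:", err)
}
```
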
## Revocation playbook
If `backend_merge.yml` (or any workflow with `id-token: write`) is
compromised and we've shipped malicious signed images:
1. **Identify the compromise window.** Find the earliest IntegratedTime
from the bad signatures (Rekor search by `subject` filter).
2. **Set `verification.not_before`** in `backend/index.yaml` to a
timestamp just *after* that window's start.
3. **Push the YAML.** Deployed LocalAI instances pick it up on next
gallery refresh (1-hour cache in `core/gallery/gallery.go`).
4. **Fix the underlying compromise** in the workflow and re-sign images
with the new build, which will have IntegratedTime > `not_before`.
5. **Optional:** for a clean break, also move signing to a new
workflow path (`backend_merge_v2.yml`) and update `identity_regex` to match it.
## Where the code lives
- `pkg/oci/cosignverify/`: verifier, policy, OCI referrer fetch, NotBefore enforcement.
- `pkg/downloader/uri.go`: `WithImageVerifier` option threaded through `DownloadFileWithContext`.
- `core/gallery/backends.go`: `backendDownloadOptions` builds the verifier from the gallery's policy.
- `core/config/gallery.go`: `Gallery.Verification` YAML schema.
- `core/cli/run.go`, `core/cli/backends.go`, `core/cli/models.go`: `--require-backend-integrity` flag propagation.
- `.github/workflows/backend_merge.yml`: producer-side `cosign sign --recursive` after each multi-arch manifest list push.
## Out of scope (follow-ups)
- **Signing the gallery YAML itself.** The index is fetched over HTTPS
from GitHub; we trust the host. A cosign blob signature on the YAML
would close that gap but adds key-management overhead. Revisit this
page if/when added.
- **Tarball/HTTP backend signing.** Cosign can sign arbitrary blobs, but
for now non-OCI backends keep using the `sha256:` field in YAML.

@@ -61,6 +61,12 @@ Always check `llama.cpp` for new model configuration options that should be supp
- `reasoning_format` - Reasoning format options
- Any new flags or parameters
### Speculative Decoding Types
The `spec_type` option in `grpc-server.cpp` delegates to upstream's `common_speculative_types_from_names()`, so new speculative types added to the `common_speculative_type_from_name` map in `common/speculative.cpp` are picked up automatically with no code changes - only docs need an entry in `docs/content/advanced/model-configuration.md`. Current values: `none`, `draft-simple`, `draft-eagle3`, `draft-mtp`, `ngram-simple`, `ngram-map-k`, `ngram-map-k4v`, `ngram-mod`, `ngram-cache`.
`draft-mtp` (Multi-Token Prediction, [ggml-org/llama.cpp#22673](https://github.com/ggml-org/llama.cpp/pull/22673)) does not need a separate draft GGUF: when `spec_type` includes `draft-mtp` and `draftmodel` is empty, the upstream server creates an MTP context off the target model itself. LocalAI's gRPC layer needs no changes for this — it works through the existing `params.speculative.types` plumbing and the derived `cparams.n_rs_seq = params.speculative.need_n_rs_seq()` in `common_context_params_to_llama`.
### Implementation Guidelines
1. **Feature Parity**: Always aim for feature parity with llama.cpp's implementation

@@ -31,6 +31,13 @@ on:
jobs:
merge:
runs-on: ubuntu-latest
# id-token: write is required for keyless cosign — the workflow
# exchanges the GitHub OIDC token for a short-lived Fulcio cert that
# signs each pushed manifest. Without this permission the runner
# cannot mint the token, and `cosign sign` fails with "no token".
permissions:
contents: read
id-token: write
env:
quay_username: ${{ secrets.quayUsername }}
steps:
@@ -57,6 +64,15 @@ jobs:
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@master
# cosign signs each pushed manifest list with --recursive so the
# index and every per-arch entry get an attached Sigstore bundle.
# 2.2+ is required for --new-bundle-format.
- name: Install cosign
if: github.event_name != 'pull_request'
uses: sigstore/cosign-installer@v3
with:
cosign-release: 'v2.4.1'
- name: Login to DockerHub
if: github.event_name != 'pull_request'
uses: docker/login-action@v4
@@ -120,11 +136,26 @@ jobs:
' <<< "$DOCKER_METADATA_OUTPUT_JSON")
if [ -z "$tags" ]; then
echo "No quay.io tags from docker/metadata-action; skipping quay merge"
-else
-# shellcheck disable=SC2086
-docker buildx imagetools create $tags \
-$(printf 'quay.io/go-skynet/ci-cache@sha256:%s ' *)
+exit 0
fi
+# shellcheck disable=SC2086
+docker buildx imagetools create $tags \
+$(printf 'quay.io/go-skynet/ci-cache@sha256:%s ' *)
+# Resolve the manifest-list digest (any tag points at it) so
+# cosign can sign by digest. Signing by tag would leave the
+# signature orphaned the next time the tag moves.
+first_tag=$(jq -cr '
+.tags | map(select(startswith("quay.io/"))) | .[0]
+' <<< "$DOCKER_METADATA_OUTPUT_JSON")
+digest=$(docker buildx imagetools inspect "$first_tag" --format '{{.Manifest.Digest}}')
+# --recursive walks the list and signs every per-arch entry
+# too: clients that resolve a tag to a platform-specific
+# manifest before checking signatures need the per-arch
+# signatures, not just the list-level one.
+cosign sign --yes --recursive \
+--new-bundle-format \
+--registry-referrers-mode=oci-1-1 \
+"quay.io/go-skynet/local-ai-backends@${digest}"
- name: Create manifest list and push (dockerhub)
if: github.event_name != 'pull_request'
@@ -139,11 +170,19 @@ jobs:
' <<< "$DOCKER_METADATA_OUTPUT_JSON")
if [ -z "$tags" ]; then
echo "No dockerhub tags from docker/metadata-action; skipping dockerhub merge"
-else
-# shellcheck disable=SC2086
-docker buildx imagetools create $tags \
-$(printf 'localai/localai-backends@sha256:%s ' *)
+exit 0
fi
+# shellcheck disable=SC2086
+docker buildx imagetools create $tags \
+$(printf 'localai/localai-backends@sha256:%s ' *)
+first_tag=$(jq -cr '
+.tags | map(select(startswith("localai/"))) | .[0]
+' <<< "$DOCKER_METADATA_OUTPUT_JSON")
+digest=$(docker buildx imagetools inspect "$first_tag" --format '{{.Manifest.Digest}}')
+cosign sign --yes --recursive \
+--new-bundle-format \
+--registry-referrers-mode=oci-1-1 \
+"localai/localai-backends@${digest}"
- name: Inspect manifest
if: github.event_name != 'pull_request'

@@ -106,6 +106,7 @@ jobs:
type=ref,event=branch
type=semver,pattern={{raw}}
type=sha
type=raw,value={{branch}}-{{date 'X'}}-{{sha}},enable={{is_default_branch}}
flavor: |
latest=${{ inputs.tag-latest }}
suffix=${{ inputs.tag-suffix }},onlatest=true

@@ -80,6 +80,7 @@ jobs:
type=ref,event=branch
type=semver,pattern={{raw}}
type=sha
type=raw,value={{branch}}-{{date 'X'}}-{{sha}},enable={{is_default_branch}}
flavor: |
latest=${{ inputs.tag-latest }}
suffix=${{ inputs.tag-suffix }},onlatest=true

.gitignore (3 lines added)

@@ -77,3 +77,6 @@ local-backends/
tests/e2e-ui/ui-test-server
core/http/react-ui/playwright-report/
core/http/react-ui/test-results/
# Local worktrees
.worktrees/

@@ -46,8 +46,52 @@ linters:
msg: 'LocalAI tests must use Ginkgo/Gomega; use Fail(...) instead of t.Fail. See .agents/coding-style.md.'
- pattern: '^t\.FailNow$'
msg: 'LocalAI tests must use Ginkgo/Gomega; use Fail(...) instead of t.FailNow. See .agents/coding-style.md.'
# In-process config should flow through ApplicationConfig / kong-bound
# CLI flags, not via os.Getenv. The CLI layer is the legitimate
# env→struct boundary (kong's `env:"..."` tag); anything deeper that
# reads env directly leaks process state into business logic and
# makes flags impossible to test or override per-request. Backend
# subprocesses, the system/capabilities probe, and a few places that
# read non-LocalAI env vars (HOME, PATH, AUTH_TOKEN passed by parent)
# are exempt — see linters.exclusions.rules below.
- pattern: '^os\.(Getenv|LookupEnv|Environ)$'
msg: 'Plumb config through ApplicationConfig (or the relevant CLI struct) instead of reading env directly. CLI entry points (core/cli/) bind env vars via kong''s `env:` tag — that is the only sanctioned env→struct boundary. See .agents/coding-style.md.'
exclusions:
paths:
# Upstream whisper.cpp source tree fetched by the whisper backend Makefile.
- 'backend/go/whisper/sources'
- 'docs/'
rules:
# CLI entry points: kong's `env:"..."` tag is the legitimate env→struct
# boundary, and a handful of subcommands legitimately propagate values
# to spawned subprocesses (LLAMACPP_GRPC_SERVERS, MLX hostfile, ...).
- path: ^core/cli/
text: 'os\.(Getenv|LookupEnv|Environ)'
linters: [forbidigo]
# Backend subprocesses are independent binaries with their own env
# surface; they're not "in-process config" of the LocalAI server.
- path: ^backend/
text: 'os\.(Getenv|LookupEnv|Environ)'
linters: [forbidigo]
# System capability probe reads HOME, PATH-style vars to discover
# GPUs, default paths, etc. — not LocalAI config.
- path: ^pkg/system/
text: 'os\.(Getenv|LookupEnv|Environ)'
linters: [forbidigo]
# gRPC server reads AUTH_TOKEN passed in by the parent process at spawn
# time; model.Loader sets/inherits env to communicate with subprocesses.
- path: ^pkg/grpc/
text: 'os\.(Getenv|LookupEnv|Environ)'
linters: [forbidigo]
- path: ^pkg/model/
text: 'os\.(Getenv|LookupEnv|Environ)'
linters: [forbidigo]
# Top-level main binaries (local-ai, launcher) are entry points.
- path: ^cmd/
text: 'os\.(Getenv|LookupEnv|Environ)'
linters: [forbidigo]
# Tests legitimately read $HOME, $TMPDIR, and gating env vars
# (LOCALAI_COSIGN_LIVE, etc.) to skip live-network specs.
- path: _test\.go$
text: 'os\.(Getenv|LookupEnv|Environ)'
linters: [forbidigo]

@@ -31,6 +31,7 @@ LocalAI follows the Linux kernel project's [guidelines for AI coding assistants]
| [.agents/debugging-backends.md](.agents/debugging-backends.md) | Debugging runtime backend failures, dependency conflicts, rebuilding backends |
| [.agents/adding-gallery-models.md](.agents/adding-gallery-models.md) | Adding GGUF models from HuggingFace to the model gallery |
| [.agents/localai-assistant-mcp.md](.agents/localai-assistant-mcp.md) | LocalAI Assistant chat modality — adding admin tools to the in-process MCP server, editing skill prompts, keeping REST + MCP + skills in sync |
| [.agents/backend-signing.md](.agents/backend-signing.md) | Backend OCI image signing (keyless cosign + sigstore-go) — producer-side CI setup, consumer-side gallery `verification:` block, strict mode (`LOCALAI_REQUIRE_BACKEND_INTEGRITY`), revocation via `not_before` |
## Quick Reference

@@ -1,10 +1,10 @@
# ds4 backend Makefile.
#
-# Upstream pin lives below as DS4_VERSION?=950e8e6474a1c9fabe04e669d607606a7ef8824f
+# Upstream pin lives below as DS4_VERSION?=8d576642c39b9a2d782a80159ba84ef5a81c0b81
# (.github/bump_deps.sh) can find and update it - matches the
# llama-cpp / ik-llama-cpp / turboquant convention.
-DS4_VERSION?=950e8e6474a1c9fabe04e669d607606a7ef8824f
+DS4_VERSION?=8d576642c39b9a2d782a80159ba84ef5a81c0b81
DS4_REPO?=https://github.com/antirez/ds4
CURRENT_MAKEFILE_DIR := $(dir $(abspath $(lastword $(MAKEFILE_LIST))))

@@ -1,5 +1,5 @@
IK_LLAMA_VERSION?=5cc0d86c760e9858e4bed4418400bb39dbe025f2
IK_LLAMA_VERSION?=48a55f74e4c6e2aeda363dd386c1ac9170a0af71
LLAMA_REPO?=https://github.com/ikawrakow/ik_llama.cpp
CMAKE_ARGS?=

@@ -1,5 +1,5 @@
LLAMA_VERSION?=1348f67c58f561808136e8a152a9eddec168f221
LLAMA_VERSION?=bb28c1fe246b72276ee1d00ce89306be7b865766
LLAMA_REPO?=https://github.com/ggerganov/llama.cpp
CMAKE_ARGS?=

@@ -517,16 +517,27 @@ static void params_parse(server_context& /*ctx_server*/, const backend::ModelOpt
params.warmup = true;
// no_op_offload: disable host tensor op offload (default: false)
params.no_op_offload = false;
// kv_unified: enable unified KV cache (default: false)
params.kv_unified = false;
// n_ctx_checkpoints: max context checkpoints per slot (default: 8)
params.n_ctx_checkpoints = 8;
// llama memory fit fails if we don't provide a buffer for tensor overrides
const size_t ntbo = llama_max_tensor_buft_overrides();
while (params.tensor_buft_overrides.size() < ntbo) {
params.tensor_buft_overrides.push_back({nullptr, nullptr});
}
// kv_unified: enable unified KV cache. Upstream's server auto-enables this
// when the slot count is auto (-np <0), bumping n_parallel to 4 alongside.
// LocalAI keeps n_parallel=1 by default, which would skip that auto path
// and leave kv_unified=false. We flip the default to true here so the
// server-side prompt cache (cache_idle_slots) is actually usable on the
// single-slot path that LocalAI ships with: without it, idle slots are
// never persisted across requests and the prompt cache is dead weight.
// Users can opt out with `options: [ "kv_unified:false" ]`.
params.kv_unified = true;
// n_ctx_checkpoints: max context checkpoints per slot. Match upstream's
// default (32); the previous LocalAI-specific 8 was unnecessarily tight
// and limits partial-prefix recovery without a clear memory rationale.
params.n_ctx_checkpoints = 32;
// cache_idle_slots: save and clear idle slot KV to the prompt cache on
// task switch. Upstream default is true; the server auto-disables it if
// kv_unified=false or cache_ram_mib=0, so flipping kv_unified above is
// what actually unlocks it.
params.cache_idle_slots = true;
// checkpoint_every_nt: create a context checkpoint every N tokens during
// prefill (-1 disables). Match upstream's default (8192).
params.checkpoint_every_nt = 8192;
// decode options. Options are in form optname:optval, or if booleans only optname.
for (int i = 0; i < request->options_size(); i++) {
@@ -685,7 +696,29 @@ static void params_parse(server_context& /*ctx_server*/, const backend::ModelOpt
try {
params.n_ctx_checkpoints = std::stoi(optval_str);
} catch (const std::exception& e) {
// If conversion fails, keep default value (8)
// If conversion fails, keep default value (32)
}
}
// --- server-side idle-slot prompt cache toggle (upstream --cache-idle-slots) ---
// Saves the slot's KV state into the host-side prompt cache on task
// switch so a later request with the same prefix can warm-load it.
// Auto-disabled by the server if kv_unified=false or cache_ram=0.
} else if (!strcmp(optname, "cache_idle_slots") || !strcmp(optname, "idle_slots_cache")) {
if (optval_str == "true" || optval_str == "1" || optval_str == "yes" || optval_str == "on" || optval_str == "enabled") {
params.cache_idle_slots = true;
} else if (optval_str == "false" || optval_str == "0" || optval_str == "no" || optval_str == "off" || optval_str == "disabled") {
params.cache_idle_slots = false;
}
// --- prefill checkpoint cadence (upstream -cpent / --checkpoint-every-n-tokens) ---
// -1 disables checkpointing during prefill.
} else if (!strcmp(optname, "checkpoint_every_nt") || !strcmp(optname, "checkpoint_every_n_tokens")) {
if (optval != NULL) {
try {
params.checkpoint_every_nt = std::stoi(optval_str);
} catch (const std::exception& e) {
// If conversion fails, keep default value (8192)
}
}
@@ -1081,6 +1114,20 @@ static void params_parse(server_context& /*ctx_server*/, const backend::ModelOpt
params.kv_overrides.back().key[0] = 0;
}
// tensor_buft_overrides sentinel termination (mirrors upstream common/arg.cpp).
// Real entries are pushed during option parsing; here we pad/terminate so the
// model loader sees back().pattern == nullptr (GGML_ASSERT at common.cpp:1543)
// and so llama_params_fit has the placeholder slots it requires.
{
const size_t ntbo = llama_max_tensor_buft_overrides();
while (params.tensor_buft_overrides.size() < ntbo) {
params.tensor_buft_overrides.push_back({nullptr, nullptr});
}
}
if (!params.speculative.draft.tensor_buft_overrides.empty()) {
params.speculative.draft.tensor_buft_overrides.push_back({nullptr, nullptr});
}
// TODO: Add yarn
if (!request->tensorsplit().empty()) {
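
The boolean option handling in this hunk (`kv_unified`, `cache_idle_slots`) can be illustrated outside the C++ backend. The following is a minimal Go sketch, not the backend's implementation: `parseBoolOption` is a hypothetical helper, and the choice to leave the current value untouched for a missing or unrecognised value is an assumption modelled on the `cache_idle_slots` branch shown above.

```go
package main

import (
	"fmt"
	"strings"
)

// parseBoolOption is a hypothetical helper (not part of LocalAI). It reads a
// single "name:value" option string and returns the new setting for `name`.
// Accepted spellings mirror the cache_idle_slots branch in the diff; anything
// else, including a bare option name, keeps the current value (assumption).
func parseBoolOption(opt, name string, current bool) bool {
	key, val, _ := strings.Cut(opt, ":")
	if key != name {
		return current
	}
	switch val {
	case "true", "1", "yes", "on", "enabled":
		return true
	case "false", "0", "no", "off", "disabled":
		return false
	}
	return current
}

func main() {
	kvUnified := true // new default introduced by this change
	for _, opt := range []string{"kv_unified:false"} {
		kvUnified = parseBoolOption(opt, "kv_unified", kvUnified)
	}
	fmt.Println(kvUnified) // prints "false"
}
```

With this shape, a model config carrying `options: [ "kv_unified:false" ]` would opt out of the new default, while a malformed value such as `kv_unified:maybe` would leave it at `true`.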

@@ -8,7 +8,7 @@ JOBS?=$(shell nproc --ignore=1)
# acestep.cpp version
ACESTEP_REPO?=https://github.com/ace-step/acestep.cpp
ACESTEP_CPP_VERSION?=e0c8d75a672fca5684c88c68dbf6d12f58754258
ACESTEP_CPP_VERSION?=ed53caf164e4492a5620b2e3f2264629cf66da24
SO_TARGET?=libgoacestepcpp.so
CMAKE_ARGS+=-DBUILD_SHARED_LIBS=OFF

@@ -22,12 +22,11 @@
#include <vector>
// Global model contexts (loaded once, reused across requests)
static DiTGGML g_dit = {};
static DiTGGMLConfig g_dit_cfg;
static VAEGGML g_vae = {};
static bool g_dit_loaded = false;
static bool g_vae_loaded = false;
static bool g_is_turbo = false;
static DiTGGML g_dit = {};
static VAEGGML g_vae = {};
static bool g_dit_loaded = false;
static bool g_vae_loaded = false;
static bool g_is_turbo = false;
// Silence latent [15000, 64] — read once from DiT GGUF
static std::vector<float> g_silence_full;
@@ -72,10 +71,9 @@ int load_model(const char * lm_model_path, const char * text_encoder_path,
g_text_enc_path = text_encoder_path;
g_dit_path = dit_model_path;
// Load DiT model
// Load DiT model (backend init + config are handled inside dit_ggml_load)
fprintf(stderr, "[acestep-cpp] Loading DiT from %s\n", dit_model_path);
dit_ggml_init_backend(&g_dit);
if (!dit_ggml_load(&g_dit, dit_model_path, g_dit_cfg, nullptr, 0.0f)) {
if (!dit_ggml_load(&g_dit, dit_model_path)) {
fprintf(stderr, "[acestep-cpp] FATAL: failed to load DiT from %s\n", dit_model_path);
return 1;
}
@@ -149,16 +147,16 @@ int generate_music(const char * caption, const char * lyrics, int bpm,
// Compute T (latent frames at 25Hz)
int T = (int)(duration * FRAMES_PER_SECOND);
T = ((T + g_dit_cfg.patch_size - 1) / g_dit_cfg.patch_size) * g_dit_cfg.patch_size;
int S = T / g_dit_cfg.patch_size;
T = ((T + g_dit.cfg.patch_size - 1) / g_dit.cfg.patch_size) * g_dit.cfg.patch_size;
int S = T / g_dit.cfg.patch_size;
if (T > 15000) {
fprintf(stderr, "[acestep-cpp] ERROR: T=%d exceeds max 15000\n", T);
return 2;
}
int Oc = g_dit_cfg.out_channels; // 64
int ctx_ch = g_dit_cfg.in_channels - Oc; // 128
int Oc = g_dit.cfg.out_channels; // 64
int ctx_ch = g_dit.cfg.in_channels - Oc; // 128
fprintf(stderr, "[acestep-cpp] T=%d, S=%d, duration=%.1fs, seed=%d\n", T, S, duration, seed);
@@ -191,9 +189,8 @@ int generate_music(const char * caption, const char * lyrics, int bpm,
fprintf(stderr, "[acestep-cpp] caption: %d tokens, lyrics: %d tokens\n", S_text, S_lyric);
// 4. Text encoder forward
// 4. Text encoder forward (backend init handled inside qwen3_load_text_encoder)
Qwen3GGML text_enc = {};
qwen3_init_backend(&text_enc);
if (!qwen3_load_text_encoder(&text_enc, g_text_enc_path.c_str())) {
fprintf(stderr, "[acestep-cpp] FATAL: failed to load text encoder\n");
return 4;
@@ -209,9 +206,8 @@ int generate_music(const char * caption, const char * lyrics, int bpm,
std::vector<float> lyric_embed(H_text * S_lyric);
qwen3_embed_lookup(&text_enc, lyric_ids.data(), S_lyric, lyric_embed.data());
// 6. Condition encoder
// 6. Condition encoder (backend init handled inside cond_ggml_load)
CondGGML cond = {};
cond_ggml_init_backend(&cond);
if (!cond_ggml_load(&cond, g_dit_path.c_str())) {
fprintf(stderr, "[acestep-cpp] FATAL: failed to load condition encoder\n");
qwen3_free(&text_enc);

@@ -8,7 +8,7 @@ JOBS?=$(shell nproc --ignore=1)
# stablediffusion.cpp (ggml)
STABLEDIFFUSION_GGML_REPO?=https://github.com/leejet/stable-diffusion.cpp
STABLEDIFFUSION_GGML_VERSION?=0b8296915c4094090cff6bd2e09a5e98288c3c7d
STABLEDIFFUSION_GGML_VERSION?=3a8788cb7d74f185d6b18688e9563015524ecaf5
CMAKE_ARGS+=-DGGML_MAX_NAME=128

@@ -1188,6 +1188,9 @@ int gen_video(sd_vid_gen_params_t *p, int steps, char *dst, float cfg_scale, int
p->high_noise_sample_params.scheduler = scheduler;
p->high_noise_sample_params.flow_shift = flow_shift;
// Pin output fps in params; upstream uses it for audio sync (and we also mux at this rate).
p->fps = fps;
// Load init/end reference images if provided (resized to output dims).
uint8_t* init_buf = nullptr;
uint8_t* end_buf = nullptr;
@@ -1206,11 +1209,14 @@ int gen_video(sd_vid_gen_params_t *p, int steps, char *dst, float cfg_scale, int
// Generate
int num_frames_out = 0;
sd_image_t* frames = generate_video(sd_c, p, &num_frames_out);
sd_image_t* frames = nullptr;
sd_audio_t* audio = nullptr;
bool ok = generate_video(sd_c, p, &frames, &num_frames_out, &audio);
std::free(p);
if (!frames || num_frames_out == 0) {
if (!ok || !frames || num_frames_out == 0) {
fprintf(stderr, "generate_video produced no frames\n");
if (audio) free_sd_audio(audio);
if (init_buf) free(init_buf);
if (end_buf) free(end_buf);
return 1;
@@ -1224,6 +1230,7 @@ int gen_video(sd_vid_gen_params_t *p, int steps, char *dst, float cfg_scale, int
if (frames[i].data) free(frames[i].data);
}
free(frames);
if (audio) free_sd_audio(audio);
if (init_buf) free(init_buf);
if (end_buf) free(end_buf);

@@ -8,7 +8,7 @@ JOBS?=$(shell nproc --ignore=1)
# whisper.cpp version
WHISPER_REPO?=https://github.com/ggml-org/whisper.cpp
WHISPER_CPP_VERSION?=968eebe77225d25e57a3f981da7c696310f0e881
WHISPER_CPP_VERSION?=8443cf05e3fa8ce1b32348e1bcbcf8fc31f7f3ae
SO_TARGET?=libgowhisper.so
CMAKE_ARGS+=-DBUILD_SHARED_LIBS=OFF

@@ -2,9 +2,9 @@ torch==2.7.1
llvmlite==0.43.0
numba==0.60.0
accelerate
transformers>=5.8.0
transformers>=5.8.1
bitsandbytes
sentence-transformers==5.4.0
sentence-transformers==5.5.0
diffusers
soundfile
protobuf==6.33.5

@@ -2,9 +2,9 @@ torch==2.7.1
accelerate
llvmlite==0.43.0
numba==0.60.0
transformers>=5.8.0
transformers>=5.8.1
bitsandbytes
sentence-transformers==5.4.0
sentence-transformers==5.5.0
diffusers
soundfile
protobuf==6.33.5

@@ -2,9 +2,9 @@
torch==2.9.0
llvmlite==0.43.0
numba==0.60.0
transformers>=5.8.0
transformers>=5.8.1
bitsandbytes
sentence-transformers==5.4.0
sentence-transformers==5.5.0
diffusers
soundfile
protobuf==6.33.5

@@ -1,11 +1,11 @@
--extra-index-url https://download.pytorch.org/whl/rocm7.0
torch==2.10.0+rocm7.0
accelerate
transformers>=5.8.0
transformers>=5.8.1
llvmlite==0.43.0
numba==0.60.0
bitsandbytes
sentence-transformers==5.4.0
sentence-transformers==5.5.0
diffusers
soundfile
protobuf==6.33.5

@@ -3,9 +3,9 @@ torch
optimum[openvino]
llvmlite==0.43.0
numba==0.60.0
transformers>=5.8.0
transformers>=5.8.1
bitsandbytes
sentence-transformers==5.4.0
sentence-transformers==5.5.0
diffusers
soundfile
protobuf==6.33.5

@@ -2,9 +2,9 @@ torch==2.7.1
llvmlite==0.43.0
numba==0.60.0
accelerate
transformers>=5.8.0
transformers>=5.8.1
bitsandbytes
sentence-transformers==5.4.0
sentence-transformers==5.5.0
diffusers
soundfile
protobuf==6.33.5

@@ -212,12 +212,12 @@ func New(opts ...config.AppOption) (*Application, error) {
}
}
if err := coreStartup.InstallModels(options.Context, application.GalleryService(), options.Galleries, options.BackendGalleries, options.SystemState, application.ModelLoader(), options.EnforcePredownloadScans, options.AutoloadBackendGalleries, nil, options.ModelsURL...); err != nil {
if err := coreStartup.InstallModels(options.Context, application.GalleryService(), options.Galleries, options.BackendGalleries, options.SystemState, application.ModelLoader(), options.EnforcePredownloadScans, options.AutoloadBackendGalleries, options.RequireBackendIntegrity, nil, options.ModelsURL...); err != nil {
xlog.Error("error installing models", "error", err)
}
for _, backend := range options.ExternalBackends {
if err := galleryop.InstallExternalBackend(options.Context, options.BackendGalleries, options.SystemState, application.ModelLoader(), nil, backend, "", ""); err != nil {
if err := galleryop.InstallExternalBackend(options.Context, options.BackendGalleries, options.SystemState, application.ModelLoader(), nil, backend, "", "", options.RequireBackendIntegrity); err != nil {
xlog.Error("error installing external backend", "error", err)
}
}
@@ -267,13 +267,13 @@ func New(opts ...config.AppOption) (*Application, error) {
}
if options.PreloadJSONModels != "" {
if err := galleryop.ApplyGalleryFromString(options.SystemState, application.ModelLoader(), options.EnforcePredownloadScans, options.AutoloadBackendGalleries, options.Galleries, options.BackendGalleries, options.PreloadJSONModels); err != nil {
if err := galleryop.ApplyGalleryFromString(options.SystemState, application.ModelLoader(), options.EnforcePredownloadScans, options.AutoloadBackendGalleries, options.Galleries, options.BackendGalleries, options.PreloadJSONModels, options.RequireBackendIntegrity); err != nil {
return nil, err
}
}
if options.PreloadModelsFromPath != "" {
if err := galleryop.ApplyGalleryFromFile(options.SystemState, application.ModelLoader(), options.EnforcePredownloadScans, options.AutoloadBackendGalleries, options.Galleries, options.BackendGalleries, options.PreloadModelsFromPath); err != nil {
if err := galleryop.ApplyGalleryFromFile(options.SystemState, application.ModelLoader(), options.EnforcePredownloadScans, options.AutoloadBackendGalleries, options.Galleries, options.BackendGalleries, options.PreloadModelsFromPath, options.RequireBackendIntegrity); err != nil {
return nil, err
}
}
@@ -552,6 +552,13 @@ func loadRuntimeSettingsFromFile(options *config.ApplicationConfig) {
options.TracingMaxItems = *settings.TracingMaxItems
}
}
if settings.TracingMaxBodyBytes != nil {
// Allow the on-disk setting to override the CLI/env default. The
// startup default is non-zero (see NewApplicationConfig), so a plain
// `== 0` guard like the others would never trigger; we instead respect
// any value the file specifies. 0 in the file means "uncapped".
options.TracingMaxBodyBytes = *settings.TracingMaxBodyBytes
}
// Branding / whitelabeling. There are no env vars for these — the file is
// the only source — so apply unconditionally. Without this block a server

@@ -217,7 +217,7 @@ func (uc *UpgradeChecker) runCheck(ctx context.Context) {
err = bm.UpgradeBackend(ctx, name, nil)
} else {
err = gallery.UpgradeBackend(ctx, uc.systemState, uc.modelLoader,
uc.galleries, name, nil)
uc.galleries, name, nil, uc.appConfig.RequireBackendIntegrity)
}
if err != nil {
xlog.Error("Failed to auto-upgrade backend",

@@ -86,7 +86,7 @@ func ModelInference(ctx context.Context, s string, messages schema.Messages, ima
if !slices.Contains(modelNames, modelName) {
utils.ResetDownloadTimers()
// if we failed to load the model, we try to download it
err := gallery.InstallModelFromGallery(ctx, o.Galleries, o.BackendGalleries, o.SystemState, loader, modelName, gallery.GalleryModel{}, utils.DisplayDownloadFunction, o.EnforcePredownloadScans, o.AutoloadBackendGalleries)
err := gallery.InstallModelFromGallery(ctx, o.Galleries, o.BackendGalleries, o.SystemState, loader, modelName, gallery.GalleryModel{}, utils.DisplayDownloadFunction, o.EnforcePredownloadScans, o.AutoloadBackendGalleries, o.RequireBackendIntegrity)
if err != nil {
xlog.Error("failed to install model from gallery", "error", err, "model", modelFile)
//return nil, err

@@ -17,9 +17,10 @@ import (
)
type BackendsCMDFlags struct {
BackendGalleries string `env:"LOCALAI_BACKEND_GALLERIES,BACKEND_GALLERIES" help:"JSON list of backend galleries" group:"backends" default:"${backends}"`
BackendsPath string `env:"LOCALAI_BACKENDS_PATH,BACKENDS_PATH" type:"path" default:"${basepath}/backends" help:"Path containing backends used for inferencing" group:"storage"`
BackendsSystemPath string `env:"LOCALAI_BACKENDS_SYSTEM_PATH,BACKEND_SYSTEM_PATH" type:"path" default:"/var/lib/local-ai/backends" help:"Path containing system backends used for inferencing" group:"backends"`
BackendGalleries string `env:"LOCALAI_BACKEND_GALLERIES,BACKEND_GALLERIES" help:"JSON list of backend galleries" group:"backends" default:"${backends}"`
BackendsPath string `env:"LOCALAI_BACKENDS_PATH,BACKENDS_PATH" type:"path" default:"${basepath}/backends" help:"Path containing backends used for inferencing" group:"storage"`
BackendsSystemPath string `env:"LOCALAI_BACKENDS_SYSTEM_PATH,BACKEND_SYSTEM_PATH" type:"path" default:"/var/lib/local-ai/backends" help:"Path containing system backends used for inferencing" group:"backends"`
RequireBackendIntegrity bool `env:"LOCALAI_REQUIRE_BACKEND_INTEGRITY,REQUIRE_BACKEND_INTEGRITY" help:"If true, reject backend installs without a configured signature verification policy (OCI URIs) or SHA256 (tarball/HTTP URIs)." group:"hardening" default:"false"`
}
type BackendsList struct {
@@ -126,7 +127,7 @@ func (bi *BackendsInstall) Run(ctx *cliContext.Context) error {
}
modelLoader := model.NewModelLoader(systemState)
err = galleryop.InstallExternalBackend(context.Background(), galleries, systemState, modelLoader, progressCallback, bi.BackendArgs, bi.Name, bi.Alias)
err = galleryop.InstallExternalBackend(context.Background(), galleries, systemState, modelLoader, progressCallback, bi.BackendArgs, bi.Name, bi.Alias, bi.RequireBackendIntegrity)
if err != nil {
return err
}
@@ -197,7 +198,7 @@ func (bu *BackendsUpgrade) Run(ctx *cliContext.Context) error {
}
}
if err := gallery.UpgradeBackend(context.Background(), systemState, modelLoader, galleries, name, progressCallback); err != nil {
if err := gallery.UpgradeBackend(context.Background(), systemState, modelLoader, galleries, name, progressCallback, bu.RequireBackendIntegrity); err != nil {
fmt.Printf("Failed to upgrade %s: %v\n", name, err)
} else {
fmt.Printf("Backend %s upgraded successfully\n", name)

@@ -32,6 +32,7 @@ type ModelsList struct {
type ModelsInstall struct {
DisablePredownloadScan bool `env:"LOCALAI_DISABLE_PREDOWNLOAD_SCAN" help:"If true, disables the best-effort security scanner before downloading any files." group:"hardening" default:"false"`
RequireBackendIntegrity bool `env:"LOCALAI_REQUIRE_BACKEND_INTEGRITY,REQUIRE_BACKEND_INTEGRITY" help:"If true, reject backend installs without a configured signature verification policy (OCI URIs) or SHA256 (tarball/HTTP URIs)." group:"hardening" default:"false"`
AutoloadBackendGalleries bool `env:"LOCALAI_AUTOLOAD_BACKEND_GALLERIES" help:"If true, automatically loads backend galleries" group:"backends" default:"true"`
ModelArgs []string `arg:"" optional:"" name:"models" help:"Model configuration URLs to load"`
@@ -71,7 +72,6 @@ func (ml *ModelsList) Run(ctx *cliContext.Context) error {
}
func (mi *ModelsInstall) Run(ctx *cliContext.Context) error {
systemState, err := system.GetSystemState(
system.WithModelPath(mi.ModelsPath),
system.WithBackendPath(mi.BackendsPath),
@@ -135,7 +135,7 @@ func (mi *ModelsInstall) Run(ctx *cliContext.Context) error {
}
modelLoader := model.NewModelLoader(systemState)
err = startup.InstallModels(context.Background(), galleryService, galleries, backendGalleries, systemState, modelLoader, !mi.DisablePredownloadScan, mi.AutoloadBackendGalleries, progressCallback, modelName)
err = startup.InstallModels(context.Background(), galleryService, galleries, backendGalleries, systemState, modelLoader, !mi.DisablePredownloadScan, mi.AutoloadBackendGalleries, mi.RequireBackendIntegrity, progressCallback, modelName)
if err != nil {
return err
}

@@ -67,6 +67,7 @@ type RunCMD struct {
OllamaAPIRootEndpoint bool `env:"LOCALAI_OLLAMA_API_ROOT_ENDPOINT" default:"false" help:"Register Ollama-compatible health check on / (replaces web UI on root path). The /api/* Ollama endpoints are always available regardless of this flag" group:"api"`
DisableRuntimeSettings bool `env:"LOCALAI_DISABLE_RUNTIME_SETTINGS,DISABLE_RUNTIME_SETTINGS" default:"false" help:"Disables the runtime settings. When set to true, the server will not load the runtime settings from the runtime_settings.json file" group:"api"`
DisablePredownloadScan bool `env:"LOCALAI_DISABLE_PREDOWNLOAD_SCAN" help:"If true, disables the best-effort security scanner before downloading any files." group:"hardening" default:"false"`
RequireBackendIntegrity bool `env:"LOCALAI_REQUIRE_BACKEND_INTEGRITY,REQUIRE_BACKEND_INTEGRITY" help:"If true, backend installs without a configured signature verification policy (for OCI URIs) or SHA256 (for tarball/HTTP URIs) are rejected. Default is to warn and install. Set this in production once your gallery's verification: block is populated." group:"hardening" default:"false"`
OpaqueErrors bool `env:"LOCALAI_OPAQUE_ERRORS" default:"false" help:"If true, all error responses are replaced with blank 500 errors. This is intended only for hardening against information leaks and is normally not recommended." group:"hardening"`
UseSubtleKeyComparison bool `env:"LOCALAI_SUBTLE_KEY_COMPARISON" default:"false" help:"If true, API Key validation comparisons will be performed using constant-time comparisons rather than simple equality. This trades off performance on each request for resiliency against timing attacks." group:"hardening"`
DisableApiKeyRequirementForHttpGet bool `env:"LOCALAI_DISABLE_API_KEY_REQUIREMENT_FOR_HTTP_GET" default:"false" help:"If true, a valid API key is not required to issue GET requests to portions of the web ui. This should only be enabled in secure testing environments" group:"hardening"`
@@ -99,6 +100,7 @@ type RunCMD struct {
LoadToMemory []string `env:"LOCALAI_LOAD_TO_MEMORY,LOAD_TO_MEMORY" help:"A list of models to load into memory at startup" group:"models"`
EnableTracing bool `env:"LOCALAI_ENABLE_TRACING,ENABLE_TRACING" help:"Enable API tracing" group:"api"`
TracingMaxItems int `env:"LOCALAI_TRACING_MAX_ITEMS" default:"1024" help:"Maximum number of traces to keep" group:"api"`
TracingMaxBodyBytes int `env:"LOCALAI_TRACING_MAX_BODY_BYTES" default:"65536" help:"Maximum bytes captured per request/response body in the trace buffer (0 = uncapped). Caps memory growth from chatty endpoints like /embeddings." group:"api"`
AgentJobRetentionDays int `env:"LOCALAI_AGENT_JOB_RETENTION_DAYS,AGENT_JOB_RETENTION_DAYS" default:"30" help:"Number of days to keep agent job history (default: 30)" group:"api"`
OpenResponsesStoreTTL string `env:"LOCALAI_OPEN_RESPONSES_STORE_TTL,OPEN_RESPONSES_STORE_TTL" default:"0" help:"TTL for Open Responses store (e.g., 1h, 30m, 0 = no expiration)" group:"api"`
@@ -272,6 +274,7 @@ func (r *RunCMD) Run(ctx *cliContext.Context) error {
opts = append(opts, config.EnableTracing)
}
opts = append(opts, config.WithTracingMaxItems(r.TracingMaxItems))
opts = append(opts, config.WithTracingMaxBodyBytes(r.TracingMaxBodyBytes))
token := ""
if r.Peer2Peer || r.Peer2PeerToken != "" {
@@ -503,6 +506,10 @@ func (r *RunCMD) Run(ctx *cliContext.Context) error {
opts = append(opts, config.WithAutoUpgradeBackends(r.AutoUpgradeBackends))
}
if r.RequireBackendIntegrity {
opts = append(opts, config.WithRequireBackendIntegrity(r.RequireBackendIntegrity))
}
if r.PreferDevelopmentBackends {
opts = append(opts, config.WithPreferDevelopmentBackends(r.PreferDevelopmentBackends))
}
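
The `TracingMaxBodyBytes` flag bounds only the trace copy of each body, with `0` meaning uncapped. As a rough illustration of that behaviour, here is a minimal Go sketch. The `capturedBody` type and `captureBody` function are hypothetical and do not reflect the trace middleware's real types; the field names follow the `body_truncated` and `body_bytes` terms from the commit message.

```go
package main

import "fmt"

// capturedBody is a hypothetical trace record for one request/response body.
type capturedBody struct {
	Body          []byte // at most maxBytes long when a cap is set
	BodyTruncated bool   // true when the stored copy was cut short
	BodyBytes     int    // size of the original, untruncated payload
}

// captureBody copies at most maxBytes of full into the trace record.
// maxBytes <= 0 is treated as "uncapped" in this sketch.
func captureBody(full []byte, maxBytes int) capturedBody {
	c := capturedBody{BodyBytes: len(full)}
	if maxBytes > 0 && len(full) > maxBytes {
		c.Body = append([]byte(nil), full[:maxBytes]...)
		c.BodyTruncated = true
		return c
	}
	c.Body = append([]byte(nil), full...)
	return c
}

func main() {
	c := captureBody([]byte("0123456789"), 4)
	fmt.Println(string(c.Body), c.BodyTruncated, c.BodyBytes) // prints "0123 true 10"
}
```

The copy is taken from the payload rather than aliasing it, so the full body can still be forwarded to the client unchanged.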

@@ -1,10 +1,11 @@
package worker
type WorkerFlags struct {
BackendsPath string `env:"LOCALAI_BACKENDS_PATH,BACKENDS_PATH" type:"path" default:"${basepath}/backends" help:"Path containing backends used for inferencing" group:"backends"`
BackendGalleries string `env:"LOCALAI_BACKEND_GALLERIES,BACKEND_GALLERIES" help:"JSON list of backend galleries" group:"backends" default:"${backends}"`
BackendsSystemPath string `env:"LOCALAI_BACKENDS_SYSTEM_PATH,BACKEND_SYSTEM_PATH" type:"path" default:"/var/lib/local-ai/backends" help:"Path containing system backends used for inferencing" group:"backends"`
ExtraLLamaCPPArgs string `name:"llama-cpp-args" env:"LOCALAI_EXTRA_LLAMA_CPP_ARGS,EXTRA_LLAMA_CPP_ARGS" help:"Extra arguments to pass to llama-cpp-rpc-server"`
BackendsPath string `env:"LOCALAI_BACKENDS_PATH,BACKENDS_PATH" type:"path" default:"${basepath}/backends" help:"Path containing backends used for inferencing" group:"backends"`
BackendGalleries string `env:"LOCALAI_BACKEND_GALLERIES,BACKEND_GALLERIES" help:"JSON list of backend galleries" group:"backends" default:"${backends}"`
BackendsSystemPath string `env:"LOCALAI_BACKENDS_SYSTEM_PATH,BACKEND_SYSTEM_PATH" type:"path" default:"/var/lib/local-ai/backends" help:"Path containing system backends used for inferencing" group:"backends"`
RequireBackendIntegrity bool `env:"LOCALAI_REQUIRE_BACKEND_INTEGRITY,REQUIRE_BACKEND_INTEGRITY" help:"If true, reject backend installs without a configured signature verification policy (OCI URIs) or SHA256 (tarball/HTTP URIs)." group:"hardening" default:"false"`
ExtraLLamaCPPArgs string `name:"llama-cpp-args" env:"LOCALAI_EXTRA_LLAMA_CPP_ARGS,EXTRA_LLAMA_CPP_ARGS" help:"Extra arguments to pass to llama-cpp-rpc-server"`
}
type Worker struct {

@@ -18,7 +18,7 @@ import (
// installing the backend from the gallery if it isn't present.
// `name` is the gallery entry name (for vLLM the meta entry "vllm"
// resolves to a platform-specific package via capability lookup).
func findBackendPath(name, galleries string, systemState *system.SystemState) (string, error) {
func findBackendPath(name, galleries string, systemState *system.SystemState, requireIntegrity bool) (string, error) {
backends, err := gallery.ListSystemBackends(systemState)
if err != nil {
return "", err
@@ -33,7 +33,7 @@ func findBackendPath(name, galleries string, systemState *system.SystemState) (s
xlog.Error("failed loading galleries", "error", err)
return "", err
}
if err := gallery.InstallBackendFromGallery(context.Background(), gals, systemState, ml, name, nil, true); err != nil {
if err := gallery.InstallBackendFromGallery(context.Background(), gals, systemState, ml, name, nil, true, requireIntegrity); err != nil {
xlog.Error("backend not found, failed to install it", "name", name, "error", err)
return "", err
}

@@ -27,7 +27,7 @@ const (
llamaCPPGalleryName = "llama-cpp"
)
func findLLamaCPPBackend(galleries string, systemState *system.SystemState) (string, error) {
func findLLamaCPPBackend(galleries string, systemState *system.SystemState, requireIntegrity bool) (string, error) {
backends, err := gallery.ListSystemBackends(systemState)
if err != nil {
xlog.Warn("Failed listing system backends", "error", err)
@@ -43,7 +43,7 @@ func findLLamaCPPBackend(galleries string, systemState *system.SystemState) (str
xlog.Error("failed loading galleries", "error", err)
return "", err
}
err := gallery.InstallBackendFromGallery(context.Background(), gals, systemState, ml, llamaCPPGalleryName, nil, true)
err := gallery.InstallBackendFromGallery(context.Background(), gals, systemState, ml, llamaCPPGalleryName, nil, true, requireIntegrity)
if err != nil {
xlog.Error("llama-cpp backend not found, failed to install it", "error", err)
return "", err
@@ -76,7 +76,7 @@ func (r *LLamaCPP) Run(ctx *cliContext.Context) error {
if err != nil {
return err
}
grpcProcess, err := findLLamaCPPBackend(r.BackendGalleries, systemState)
grpcProcess, err := findLLamaCPPBackend(r.BackendGalleries, systemState, r.RequireBackendIntegrity)
if err != nil {
return err
}

@@ -9,8 +9,8 @@ import (
const mlxDistributedGalleryName = "mlx-distributed"
func findMLXDistributedBackendPath(galleries string, systemState *system.SystemState) (string, error) {
return findBackendPath(mlxDistributedGalleryName, galleries, systemState)
func findMLXDistributedBackendPath(galleries string, systemState *system.SystemState, requireIntegrity bool) (string, error) {
return findBackendPath(mlxDistributedGalleryName, galleries, systemState, requireIntegrity)
}
// buildMLXCommand builds the exec.Cmd to launch the mlx-distributed backend.

@@ -28,7 +28,7 @@ func (r *MLXDistributed) Run(ctx *cliContext.Context) error {
return err
}
backendPath, err := findMLXDistributedBackendPath(r.BackendGalleries, systemState)
backendPath, err := findMLXDistributedBackendPath(r.BackendGalleries, systemState, r.RequireBackendIntegrity)
if err != nil {
return fmt.Errorf("cannot find mlx-distributed backend: %w", err)
}

@@ -73,7 +73,7 @@ func (r *P2P) Run(ctx *cliContext.Context) error {
for {
xlog.Info("Starting llama-cpp-rpc-server", "address", address, "port", port)
grpcProcess, err := findLLamaCPPBackend(r.BackendGalleries, systemState)
grpcProcess, err := findLLamaCPPBackend(r.BackendGalleries, systemState, r.RequireBackendIntegrity)
if err != nil {
xlog.Error("Failed to find llama-cpp-rpc-server", "error", err)
return

@@ -48,7 +48,7 @@ func (r *P2PMLX) Run(ctx *cliContext.Context) error {
c, cancel := context.WithCancel(context.Background())
defer cancel()
backendPath, err := findMLXDistributedBackendPath(r.BackendGalleries, systemState)
backendPath, err := findMLXDistributedBackendPath(r.BackendGalleries, systemState, r.RequireBackendIntegrity)
if err != nil {
xlog.Warn("Could not find mlx-distributed backend from gallery, will try backend.py directly", "error", err)
}

@@ -77,7 +77,7 @@ func (r *VLLMDistributed) Run(ctx *cliContext.Context) error {
return fmt.Errorf("getting system state: %w", err)
}
backendPath, err := findBackendPath("vllm", r.BackendGalleries, systemState)
backendPath, err := findBackendPath("vllm", r.BackendGalleries, systemState, r.RequireBackendIntegrity)
if err != nil {
return fmt.Errorf("cannot find vllm backend: %w", err)
}

@@ -21,6 +21,7 @@ type ApplicationConfig struct {
Debug bool
EnableTracing bool
TracingMaxItems int
TracingMaxBodyBytes int // Per-body cap for captured request/response bodies; 0 disables the cap
EnableBackendLogging bool
GeneratedContentDir string
@@ -60,6 +61,13 @@ type ApplicationConfig struct {
AutoUpgradeBackends bool
PreferDevelopmentBackends bool
// RequireBackendIntegrity promotes a missing SHA256 (tarball/HTTP URIs)
// or missing verification policy (OCI URIs) from a warning to a hard
// failure during backend install/upgrade. Off by default to keep
// upgrades non-breaking; operators opt in explicitly via
// --require-backend-integrity / LOCALAI_REQUIRE_BACKEND_INTEGRITY.
RequireBackendIntegrity bool
SingleBackend bool // Deprecated: use MaxActiveBackends = 1 instead
MaxActiveBackends int // Maximum number of active backends (0 = unlimited, 1 = single backend mode)
WatchDogIdle bool
@@ -180,6 +188,7 @@ func NewApplicationConfig(o ...AppOption) *ApplicationConfig {
LRUEvictionRetryInterval: 1 * time.Second, // Default: 1 second
WatchDogInterval: 500 * time.Millisecond, // Default: 500ms
TracingMaxItems: 1024,
TracingMaxBodyBytes: 64 * 1024, // 64 KiB - caps each request/response body in the trace buffer
AgentPool: AgentPoolConfig{
Enabled: true,
Timeout: "5m",
@@ -436,6 +445,10 @@ func WithAutoUpgradeBackends(v bool) AppOption {
return func(o *ApplicationConfig) { o.AutoUpgradeBackends = v }
}
func WithRequireBackendIntegrity(v bool) AppOption {
return func(o *ApplicationConfig) { o.RequireBackendIntegrity = v }
}
func WithPreferDevelopmentBackends(v bool) AppOption {
return func(o *ApplicationConfig) { o.PreferDevelopmentBackends = v }
}
@@ -567,6 +580,12 @@ func WithTracingMaxItems(items int) AppOption {
}
}
func WithTracingMaxBodyBytes(bytes int) AppOption {
return func(o *ApplicationConfig) {
o.TracingMaxBodyBytes = bytes
}
}
func WithGeneratedContentDir(generatedContentDir string) AppOption {
return func(o *ApplicationConfig) {
o.GeneratedContentDir = generatedContentDir
@@ -909,6 +928,7 @@ func (o *ApplicationConfig) ToRuntimeSettings() RuntimeSettings {
f16 := o.F16
debug := o.Debug
tracingMaxItems := o.TracingMaxItems
tracingMaxBodyBytes := o.TracingMaxBodyBytes
enableTracing := o.EnableTracing
enableBackendLogging := o.EnableBackendLogging
cors := o.CORS
@@ -997,6 +1017,7 @@ func (o *ApplicationConfig) ToRuntimeSettings() RuntimeSettings {
F16: &f16,
Debug: &debug,
TracingMaxItems: &tracingMaxItems,
TracingMaxBodyBytes: &tracingMaxBodyBytes,
EnableTracing: &enableTracing,
EnableBackendLogging: &enableBackendLogging,
CORS: &cors,
@@ -1135,6 +1156,9 @@ func (o *ApplicationConfig) ApplyRuntimeSettings(settings *RuntimeSettings) (req
if settings.TracingMaxItems != nil {
o.TracingMaxItems = *settings.TracingMaxItems
}
if settings.TracingMaxBodyBytes != nil {
o.TracingMaxBodyBytes = *settings.TracingMaxBodyBytes
}
if settings.EnableBackendLogging != nil {
o.EnableBackendLogging = *settings.EnableBackendLogging
}
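
The per-body cap described by TracingMaxBodyBytes can be illustrated with a small standalone sketch. It is not the trace middleware itself: capBody and capturedBody are hypothetical names, and treating a negative value like 0 (cap disabled) is an assumption, since the diff only documents that 0 disables the cap.

```go
package main

import "fmt"

// capturedBody is a hypothetical record of one traced body.
type capturedBody struct {
	Data      []byte // the (possibly truncated) copy kept in the trace buffer
	Truncated bool   // true when Data is shorter than the original body
	OrigBytes int    // size of the original body before capping
}

// capBody copies at most maxBytes of body into the trace record.
// maxBytes <= 0 disables the cap (the negative case is an assumption).
// The caller's slice is never modified, so the real client still
// receives the full payload.
func capBody(body []byte, maxBytes int) capturedBody {
	out := capturedBody{OrigBytes: len(body)}
	if maxBytes > 0 && len(body) > maxBytes {
		out.Data = append([]byte(nil), body[:maxBytes]...)
		out.Truncated = true
		return out
	}
	out.Data = append([]byte(nil), body...)
	return out
}

func main() {
	c := capBody([]byte("hello world"), 5)
	fmt.Println(string(c.Data), c.Truncated, c.OrigBytes) // hello true 11
}
```

With the default of 64 * 1024, a multi-megabyte /embeddings response would be stored as a 64 KiB prefix, with the truncation flag and the original size recorded alongside it.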


@@ -1,6 +1,37 @@
package config
type Gallery struct {
URL string `json:"url" yaml:"url"`
Name string `json:"name" yaml:"name"`
// GalleryVerification declares the keyless-cosign signature policy that
// every OCI backend image fetched from this gallery must satisfy.
//
// Verification is opt-in: galleries without a Verification block install
// backends with no signature check (the downloader logs a warning when
// LOCALAI_REQUIRE_BACKEND_INTEGRITY is unset; that flag turns the warning
// into a hard error).
//
// Identity matching: set Issuer (exact) or IssuerRegex, AND Identity
// (exact) or IdentityRegex. For GitHub Actions keyless signing the
// typical shape is:
//
// verification:
// issuer: "https://token.actions.githubusercontent.com"
// identity_regex: "^https://github\\.com/mudler/local-ai-backends/\\.github/workflows/build\\.yaml@refs/heads/master$"
// not_before: "2026-05-01T00:00:00Z"
//
// NotBefore is the revocation lever: advance it to invalidate every
// signature produced before a known compromise window. Keyless cosign
// certs are ephemeral so there is no CA-side revocation.
type GalleryVerification struct {
Issuer string `json:"issuer,omitempty" yaml:"issuer,omitempty"`
IssuerRegex string `json:"issuer_regex,omitempty" yaml:"issuer_regex,omitempty"`
Identity string `json:"identity,omitempty" yaml:"identity,omitempty"`
IdentityRegex string `json:"identity_regex,omitempty" yaml:"identity_regex,omitempty"`
// NotBefore is an RFC3339 timestamp. Empty disables the time check.
NotBefore string `json:"not_before,omitempty" yaml:"not_before,omitempty"`
}
type Gallery struct {
URL string `json:"url" yaml:"url"`
Name string `json:"name" yaml:"name"`
Verification *GalleryVerification `json:"verification,omitempty" yaml:"verification,omitempty"`
}


@@ -54,6 +54,13 @@ func guessGGUFFromFile(cfg *ModelConfig, f *gguf.GGUFFile, defaultCtx int) {
cfg.modelTemplate = chatTemplate.ValueString()
}
// Auto-enable Multi-Token Prediction (ggml-org/llama.cpp#22673) when the
// GGUF carries an embedded MTP head. Skipped silently for non-MTP models
// and when the user already configured a spec_type.
if n, ok := HasEmbeddedMTPHead(f); ok {
ApplyMTPDefaults(cfg, n)
}
// Thinking support detection is done after model load via DetectThinkingSupportFromBackend
// template estimations

84
core/config/mtp.go Normal file

@@ -0,0 +1,84 @@
package config
import (
"strings"
gguf "github.com/gpustack/gguf-parser-go"
"github.com/mudler/xlog"
)
// mtpSpecOptions lists the speculative-decoding option keys auto-applied when
// an MTP head is detected on a llama-cpp GGUF. Defaults track the upstream
// MTP PR (ggml-org/llama.cpp#22673):
//
// - spec_type:draft-mtp activates Multi-Token Prediction
// - spec_n_max:6 draft window
// - spec_p_min:0.75 pinned because upstream marked the 0.75 default
// with a "change to 0.0f" TODO; locking it here keeps acceptance
// thresholds stable across future bumps
var mtpSpecOptions = []string{
"spec_type:draft-mtp",
"spec_n_max:6",
"spec_p_min:0.75",
}
// MTPSpecOptions returns a copy of the option keys auto-applied when an MTP
// head is detected. Exported for testing and for the importer.
func MTPSpecOptions() []string {
out := make([]string, len(mtpSpecOptions))
copy(out, mtpSpecOptions)
return out
}
// HasEmbeddedMTPHead reports whether the parsed GGUF declares a Multi-Token
// Prediction head. Detection reads `<arch>.nextn_predict_layers`, which is
// what `gguf_writer.add_nextn_predict_layers(n)` emits in upstream's
// `conversion/qwen.py` MTP mixin. A positive layer count means the head is
// present in the same GGUF as the trunk.
func HasEmbeddedMTPHead(f *gguf.GGUFFile) (uint32, bool) {
if f == nil {
return 0, false
}
arch := f.Architecture().Architecture
if arch == "" {
return 0, false
}
v, ok := f.Header.MetadataKV.Get(arch + ".nextn_predict_layers")
if !ok {
return 0, false
}
n := gguf.ValueNumeric[uint32](v)
return n, n > 0
}
// hasSpecTypeOption returns true when the slice already contains a
// user-configured `spec_type:` / `speculative_type:` entry. Used to avoid
// clobbering an explicit choice with the MTP auto-defaults.
func hasSpecTypeOption(opts []string) bool {
for _, o := range opts {
if strings.HasPrefix(o, "spec_type:") || strings.HasPrefix(o, "speculative_type:") {
return true
}
}
return false
}
// ApplyMTPDefaults appends the auto-MTP option keys to cfg.Options when none
// is already configured. It is a no-op when the user already picked a
// `spec_type` (either via YAML or via the importer's preferences flow).
//
// `layers` is the value read from `<arch>.nextn_predict_layers` and is only
// used for the diagnostic log line.
func ApplyMTPDefaults(cfg *ModelConfig, layers uint32) {
if cfg == nil {
return
}
if hasSpecTypeOption(cfg.Options) {
xlog.Debug("[mtp] embedded MTP head detected but spec_type already configured; leaving user choice intact",
"name", cfg.Name, "nextn_layers", layers)
return
}
cfg.Options = append(cfg.Options, mtpSpecOptions...)
xlog.Info("[mtp] embedded MTP head detected; enabling draft-mtp speculative decoding",
"name", cfg.Name, "nextn_layers", layers, "spec_n_max", 6, "spec_p_min", 0.75)
}

86
core/config/mtp_test.go Normal file

@@ -0,0 +1,86 @@
package config_test
import (
. "github.com/mudler/LocalAI/core/config"
. "github.com/onsi/ginkgo/v2"
. "github.com/onsi/gomega"
)
var _ = Describe("MTP auto-defaults", func() {
Context("MTPSpecOptions", func() {
It("returns the upstream-recommended speculative tuple", func() {
Expect(MTPSpecOptions()).To(Equal([]string{
"spec_type:draft-mtp",
"spec_n_max:6",
"spec_p_min:0.75",
}))
})
It("returns a defensive copy so callers cannot mutate the package default", func() {
opts := MTPSpecOptions()
opts[0] = "spec_type:none"
Expect(MTPSpecOptions()[0]).To(Equal("spec_type:draft-mtp"))
})
})
Context("ApplyMTPDefaults", func() {
It("appends MTP options when nothing is configured", func() {
cfg := &ModelConfig{Name: "qwen-mtp"}
ApplyMTPDefaults(cfg, 1)
Expect(cfg.Options).To(Equal([]string{
"spec_type:draft-mtp",
"spec_n_max:6",
"spec_p_min:0.75",
}))
})
It("preserves unrelated options already on the config", func() {
cfg := &ModelConfig{
Name: "qwen-mtp",
Options: []string{"use_jinja:true", "cache_reuse:256"},
}
ApplyMTPDefaults(cfg, 1)
Expect(cfg.Options).To(Equal([]string{
"use_jinja:true",
"cache_reuse:256",
"spec_type:draft-mtp",
"spec_n_max:6",
"spec_p_min:0.75",
}))
})
It("is a no-op when the user already configured spec_type", func() {
cfg := &ModelConfig{
Name: "qwen-mtp",
Options: []string{"spec_type:ngram-simple", "use_jinja:true"},
}
ApplyMTPDefaults(cfg, 1)
Expect(cfg.Options).To(Equal([]string{
"spec_type:ngram-simple",
"use_jinja:true",
}))
})
It("also respects the legacy speculative_type alias", func() {
cfg := &ModelConfig{
Name: "qwen-mtp",
Options: []string{"speculative_type:ngram-mod"},
}
ApplyMTPDefaults(cfg, 1)
Expect(cfg.Options).To(Equal([]string{"speculative_type:ngram-mod"}))
})
It("tolerates a nil config", func() {
Expect(func() { ApplyMTPDefaults(nil, 1) }).ToNot(Panic())
})
})
Context("HasEmbeddedMTPHead", func() {
It("returns false on a nil GGUF file", func() {
n, ok := HasEmbeddedMTPHead(nil)
Expect(ok).To(BeFalse())
Expect(n).To(BeZero())
})
})
})


@@ -38,6 +38,7 @@ type RuntimeSettings struct {
Debug *bool `json:"debug,omitempty"`
EnableTracing *bool `json:"enable_tracing,omitempty"`
TracingMaxItems *int `json:"tracing_max_items,omitempty"`
TracingMaxBodyBytes *int `json:"tracing_max_body_bytes,omitempty"` // Per-body cap in bytes; 0 disables the cap
EnableBackendLogging *bool `json:"enable_backend_logging,omitempty"`
// Security/CORS settings


@@ -16,6 +16,7 @@ import (
"github.com/mudler/LocalAI/pkg/downloader"
"github.com/mudler/LocalAI/pkg/model"
"github.com/mudler/LocalAI/pkg/oci"
"github.com/mudler/LocalAI/pkg/oci/cosignverify"
"github.com/mudler/LocalAI/pkg/system"
"github.com/mudler/xlog"
cp "github.com/otiai10/copy"
@@ -102,8 +103,81 @@ func writeBackendMetadata(backendPath string, metadata *BackendMetadata) error {
return nil
}
// backendDownloadOptions translates the gallery's verification policy into
// downloader options, and gates the call on strict-integrity mode. Both
// InstallBackend and UpgradeBackend MUST route their download through these
// options — without them, the corresponding code path silently downloads
// and activates unverified backend bytes even when the gallery has a
// verification: policy configured.
//
// For OCI URIs with a verification policy, returns a slice containing
// downloader.WithImageVerifier(v) — the downloader will then run cosign
// signature verification between fetching the manifest and extracting
// layers (see pkg/downloader/uri.go OCI branch).
//
// For OCI URIs without a verification policy, or non-OCI URIs without a
// SHA256, the function either returns a non-fatal warning (requireIntegrity
// false) or fails the install (requireIntegrity true).
func backendDownloadOptions(config *GalleryBackend, requireIntegrity bool) ([]downloader.DownloadOption, error) {
uri := downloader.URI(config.URI)
hasVerification := config.Gallery.Verification != nil
hasSHA := config.SHA256 != ""
switch {
case uri.LooksLikeOCI():
if !hasVerification {
if requireIntegrity {
return nil, fmt.Errorf("strict integrity: gallery %q has no verification policy for OCI backend %q (set verification: in the gallery YAML or disable --require-backend-integrity)",
config.Gallery.Name, config.Name)
}
xlog.Warn("installing OCI backend without signature verification",
"backend", config.Name, "gallery", config.Gallery.Name, "uri", config.URI)
return nil, nil
}
v, err := newGalleryVerifier(config.Gallery.Verification)
if err != nil {
return nil, fmt.Errorf("gallery %q verification policy: %w", config.Gallery.Name, err)
}
return []downloader.DownloadOption{downloader.WithImageVerifier(v)}, nil
case uri.LooksLikeDir():
// Local directory — out of scope for integrity checks.
return nil, nil
default:
if !hasSHA && requireIntegrity {
return nil, fmt.Errorf("strict integrity: backend %q has no SHA256 (gallery %q)",
config.Name, config.Gallery.Name)
}
// Non-strict: pkg/downloader already emits a warning when sha is empty.
return nil, nil
}
}
// newGalleryVerifier constructs a cosignverify.Verifier from the gallery
// policy. Parses NotBefore (RFC3339) here so YAML errors surface at install
// time rather than during signature verification.
func newGalleryVerifier(p *config.GalleryVerification) (*cosignverify.Verifier, error) {
pol := cosignverify.Policy{
Issuer: p.Issuer,
IssuerRegex: p.IssuerRegex,
Identity: p.Identity,
IdentityRegex: p.IdentityRegex,
}
if p.NotBefore != "" {
t, err := time.Parse(time.RFC3339, p.NotBefore)
if err != nil {
return nil, fmt.Errorf("not_before %q: %w", p.NotBefore, err)
}
pol.NotBefore = t
}
return cosignverify.NewVerifier(pol, nil, nil)
}
// InstallBackendFromGallery installs a backend from the gallery.
func InstallBackendFromGallery(ctx context.Context, galleries []config.Gallery, systemState *system.SystemState, modelLoader *model.ModelLoader, name string, downloadStatus func(string, string, string, float64), force bool) error {
// requireIntegrity escalates a missing SHA256 / verification policy from a
// warning to a hard failure (see backendDownloadOptions).
func InstallBackendFromGallery(ctx context.Context, galleries []config.Gallery, systemState *system.SystemState, modelLoader *model.ModelLoader, name string, downloadStatus func(string, string, string, float64), force, requireIntegrity bool) error {
if !force {
// check if we already have the backend installed
backends, err := ListSystemBackends(systemState)
@@ -149,7 +223,7 @@ func InstallBackendFromGallery(ctx context.Context, galleries []config.Gallery,
xlog.Debug("Installing backend from meta backend", "name", name, "bestBackend", bestBackend.Name)
// Then, let's install the best backend
if err := InstallBackend(ctx, systemState, modelLoader, bestBackend, downloadStatus); err != nil {
if err := InstallBackend(ctx, systemState, modelLoader, bestBackend, downloadStatus, requireIntegrity); err != nil {
return err
}
@@ -175,10 +249,10 @@ func InstallBackendFromGallery(ctx context.Context, galleries []config.Gallery,
return nil
}
return InstallBackend(ctx, systemState, modelLoader, backend, downloadStatus)
return InstallBackend(ctx, systemState, modelLoader, backend, downloadStatus, requireIntegrity)
}
func InstallBackend(ctx context.Context, systemState *system.SystemState, modelLoader *model.ModelLoader, config *GalleryBackend, downloadStatus func(string, string, string, float64)) error {
func InstallBackend(ctx context.Context, systemState *system.SystemState, modelLoader *model.ModelLoader, config *GalleryBackend, downloadStatus func(string, string, string, float64), requireIntegrity bool) error {
// Get configurable fallback tag values from SystemState
latestTag, masterTag, devSuffix := getFallbackTagValues(systemState)
@@ -213,6 +287,14 @@ func InstallBackend(ctx context.Context, systemState *system.SystemState, modelL
return fmt.Errorf("failed to create base path: %v", err)
}
// Build the download options once and reuse for every retry path —
// mirrors and tag fallbacks must verify against the same gallery
// policy or we open a hole where a non-default URI bypasses the check.
downloadOpts, optsErr := backendDownloadOptions(config, requireIntegrity)
if optsErr != nil {
return fmt.Errorf("backend %q: %w", config.Name, optsErr)
}
uri := downloader.URI(config.URI)
// Check if it is a directory
if uri.LooksLikeDir() {
@@ -222,7 +304,7 @@ func InstallBackend(ctx context.Context, systemState *system.SystemState, modelL
}
} else {
xlog.Debug("Downloading backend", "uri", config.URI, "backendPath", backendPath)
if err := uri.DownloadFileWithContext(ctx, backendPath, config.SHA256, 1, 1, downloadStatus); err != nil {
if err := uri.DownloadFileWithContext(ctx, backendPath, config.SHA256, 1, 1, downloadStatus, downloadOpts...); err != nil {
xlog.Debug("Backend download failed, trying fallback", "backendPath", backendPath, "error", err)
// resetBackendPath cleans up partial state from a failed OCI extraction
@@ -243,7 +325,7 @@ func InstallBackend(ctx context.Context, systemState *system.SystemState, modelL
default:
}
resetBackendPath()
if err := downloader.URI(mirror).DownloadFileWithContext(ctx, backendPath, config.SHA256, 1, 1, downloadStatus); err == nil {
if err := downloader.URI(mirror).DownloadFileWithContext(ctx, backendPath, config.SHA256, 1, 1, downloadStatus, downloadOpts...); err == nil {
success = true
xlog.Debug("Downloaded backend from mirror", "uri", config.URI, "backendPath", backendPath)
break
@@ -256,7 +338,7 @@ func InstallBackend(ctx context.Context, systemState *system.SystemState, modelL
if fallbackURI != string(config.URI) {
resetBackendPath()
xlog.Info("Trying fallback URI", "original", config.URI, "fallback", fallbackURI)
if err := downloader.URI(fallbackURI).DownloadFileWithContext(ctx, backendPath, config.SHA256, 1, 1, downloadStatus); err == nil {
if err := downloader.URI(fallbackURI).DownloadFileWithContext(ctx, backendPath, config.SHA256, 1, 1, downloadStatus, downloadOpts...); err == nil {
xlog.Info("Downloaded backend using fallback URI", "uri", fallbackURI, "backendPath", backendPath)
success = true
} else {
@@ -265,7 +347,7 @@ func InstallBackend(ctx context.Context, systemState *system.SystemState, modelL
resetBackendPath()
devFallbackURI := fallbackURI + "-" + devSuffix
xlog.Info("Trying development fallback URI", "fallback", devFallbackURI)
if err := downloader.URI(devFallbackURI).DownloadFileWithContext(ctx, backendPath, config.SHA256, 1, 1, downloadStatus); err == nil {
if err := downloader.URI(devFallbackURI).DownloadFileWithContext(ctx, backendPath, config.SHA256, 1, 1, downloadStatus, downloadOpts...); err == nil {
xlog.Info("Downloaded backend using development fallback URI", "uri", devFallbackURI, "backendPath", backendPath)
success = true
} else {
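
The branches of backendDownloadOptions in this file amount to a small decision table over the URI kind, the presence of a verification policy or SHA256, and strict-integrity mode. The standalone sketch below restates that table; uriKind, outcome, and decide are hypothetical names introduced for illustration, and the outcome labels are descriptive strings rather than anything the real code returns.

```go
package main

import "fmt"

// uriKind is a hypothetical stand-in for the LooksLikeOCI / LooksLikeDir checks.
type uriKind int

const (
	kindOCI uriKind = iota
	kindDir
	kindOther // tarball / HTTP URIs
)

// outcome names what happens to the download in each case.
type outcome string

const (
	verifySignature outcome = "verify-signature" // cosign verifier attached
	checkSHA256     outcome = "check-sha256"     // downloader compares the digest
	warnOnly        outcome = "warn-only"        // proceeds unverified with a warning
	skipLocalDir    outcome = "skip-local-dir"   // local directory, out of scope
	failInstall     outcome = "fail"             // strict mode refuses the install
)

// decide restates the switch in backendDownloadOptions as a pure function.
func decide(kind uriKind, hasVerification, hasSHA, strict bool) outcome {
	switch kind {
	case kindOCI:
		if hasVerification {
			return verifySignature
		}
		if strict {
			return failInstall
		}
		return warnOnly
	case kindDir:
		return skipLocalDir
	default:
		if hasSHA {
			return checkSHA256
		}
		if strict {
			return failInstall
		}
		return warnOnly
	}
}

func main() {
	fmt.Println(decide(kindOCI, false, false, true))   // fail
	fmt.Println(decide(kindOCI, true, false, true))    // verify-signature
	fmt.Println(decide(kindOther, false, false, false)) // warn-only
}
```

Because the options are computed once per install, the same outcome applies to the primary URI, every mirror, and both tag fallbacks.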


@@ -117,13 +117,13 @@ var _ = Describe("Gallery Backends", func() {
Describe("InstallBackendFromGallery", func() {
It("should return error when backend is not found", func() {
err := InstallBackendFromGallery(context.TODO(), galleries, systemState, ml, "non-existent", nil, true)
err := InstallBackendFromGallery(context.TODO(), galleries, systemState, ml, "non-existent", nil, true, false)
Expect(err).To(HaveOccurred())
Expect(err.Error()).To(ContainSubstring("no backend found with name \"non-existent\""))
})
It("should install backend from gallery", func() {
err := InstallBackendFromGallery(context.TODO(), galleries, systemState, ml, "test-backend", nil, true)
err := InstallBackendFromGallery(context.TODO(), galleries, systemState, ml, "test-backend", nil, true, false)
Expect(err).ToNot(HaveOccurred())
Expect(filepath.Join(tempDir, "test-backend", "run.sh")).To(BeARegularFile())
})
@@ -545,7 +545,7 @@ var _ = Describe("Gallery Backends", func() {
VRAM: 1000000000000,
Backend: system.Backend{BackendsPath: tempDir},
}
err = InstallBackendFromGallery(context.TODO(), []config.Gallery{gallery}, nvidiaSystemState, ml, "meta-backend", nil, true)
err = InstallBackendFromGallery(context.TODO(), []config.Gallery{gallery}, nvidiaSystemState, ml, "meta-backend", nil, true, false)
Expect(err).NotTo(HaveOccurred())
metaBackendPath := filepath.Join(tempDir, "meta-backend")
@@ -625,7 +625,7 @@ var _ = Describe("Gallery Backends", func() {
VRAM: 1000000000000,
Backend: system.Backend{BackendsPath: tempDir},
}
err = InstallBackendFromGallery(context.TODO(), []config.Gallery{gallery}, nvidiaSystemState, ml, "meta-backend", nil, true)
err = InstallBackendFromGallery(context.TODO(), []config.Gallery{gallery}, nvidiaSystemState, ml, "meta-backend", nil, true, false)
Expect(err).NotTo(HaveOccurred())
metaBackendPath := filepath.Join(tempDir, "meta-backend")
@@ -709,7 +709,7 @@ var _ = Describe("Gallery Backends", func() {
VRAM: 1000000000000,
Backend: system.Backend{BackendsPath: tempDir},
}
err = InstallBackendFromGallery(context.TODO(), []config.Gallery{gallery}, nvidiaSystemState, ml, "meta-backend", nil, true)
err = InstallBackendFromGallery(context.TODO(), []config.Gallery{gallery}, nvidiaSystemState, ml, "meta-backend", nil, true, false)
Expect(err).NotTo(HaveOccurred())
metaBackendPath := filepath.Join(tempDir, "meta-backend")
@@ -808,7 +808,7 @@ var _ = Describe("Gallery Backends", func() {
system.WithBackendPath(newPath),
)
Expect(err).NotTo(HaveOccurred())
err = InstallBackend(context.TODO(), systemState, ml, &backend, nil)
err = InstallBackend(context.TODO(), systemState, ml, &backend, nil, false)
Expect(newPath).To(BeADirectory())
Expect(err).To(HaveOccurred()) // Will fail due to invalid URI, but path should be created
})
@@ -840,7 +840,7 @@ var _ = Describe("Gallery Backends", func() {
system.WithBackendPath(tempDir),
)
Expect(err).NotTo(HaveOccurred())
err = InstallBackend(context.TODO(), systemState, ml, &backend, nil)
err = InstallBackend(context.TODO(), systemState, ml, &backend, nil, false)
Expect(err).ToNot(HaveOccurred())
Expect(filepath.Join(tempDir, "test-backend", "metadata.json")).To(BeARegularFile())
dat, err := os.ReadFile(filepath.Join(tempDir, "test-backend", "metadata.json"))
@@ -873,7 +873,7 @@ var _ = Describe("Gallery Backends", func() {
Expect(filepath.Join(tempDir, "test-backend", "metadata.json")).ToNot(BeARegularFile())
err = InstallBackend(context.TODO(), systemState, ml, &backend, nil)
err = InstallBackend(context.TODO(), systemState, ml, &backend, nil, false)
Expect(err).ToNot(HaveOccurred())
Expect(filepath.Join(tempDir, "test-backend", "metadata.json")).To(BeARegularFile())
})
@@ -894,7 +894,7 @@ var _ = Describe("Gallery Backends", func() {
system.WithBackendPath(tempDir),
)
Expect(err).NotTo(HaveOccurred())
err = InstallBackend(context.TODO(), systemState, ml, &backend, nil)
err = InstallBackend(context.TODO(), systemState, ml, &backend, nil, false)
Expect(err).ToNot(HaveOccurred())
Expect(filepath.Join(tempDir, "test-backend", "metadata.json")).To(BeARegularFile())


@@ -47,7 +47,7 @@ var _ = Describe("Backend versioning", func() {
backend.URI = srcDir
backend.Version = "1.2.3"
err = gallery.InstallBackend(context.Background(), systemState, modelLoader, backend, nil)
err = gallery.InstallBackend(context.Background(), systemState, modelLoader, backend, nil, false)
Expect(err).NotTo(HaveOccurred())
// Read the metadata file and check version
@@ -74,7 +74,7 @@ var _ = Describe("Backend versioning", func() {
backend.URI = srcDir
backend.Version = "2.0.0"
err = gallery.InstallBackend(context.Background(), systemState, modelLoader, backend, nil)
err = gallery.InstallBackend(context.Background(), systemState, modelLoader, backend, nil, false)
Expect(err).NotTo(HaveOccurred())
metadataPath := filepath.Join(tempDir, "test-backend-uri", "metadata.json")
@@ -100,7 +100,7 @@ var _ = Describe("Backend versioning", func() {
backend.URI = srcDir
// Version intentionally left empty
err = gallery.InstallBackend(context.Background(), systemState, modelLoader, backend, nil)
err = gallery.InstallBackend(context.Background(), systemState, modelLoader, backend, nil, false)
Expect(err).NotTo(HaveOccurred())
metadataPath := filepath.Join(tempDir, "test-backend-noversion", "metadata.json")


@@ -1,10 +1,13 @@
package importers
import (
"context"
"encoding/json"
"path/filepath"
"strings"
"time"
gguf "github.com/gpustack/gguf-parser-go"
"github.com/mudler/LocalAI/core/config"
"github.com/mudler/LocalAI/core/gallery"
"github.com/mudler/LocalAI/core/schema"
@@ -261,6 +264,13 @@ func (i *LlamaCPPImporter) Import(details Details) (gallery.ModelConfig, error)
// Apply per-model-family inference parameter defaults
config.ApplyInferenceDefaults(&modelConfig, details.URI)
// Auto-detect Multi-Token Prediction heads (ggml-org/llama.cpp#22673) and
// enable speculative decoding. Mirrors the load-time hook so freshly
// imported configs already carry spec_type:draft-mtp before the model is
// ever loaded - users see it in the YAML preview rather than discovering
// it after the first start.
maybeApplyMTPDefaults(&modelConfig, details, &cfg)
data, err := yaml.Marshal(modelConfig)
if err != nil {
return gallery.ModelConfig{}, err
@@ -291,6 +301,85 @@ func pickPreferredGroup(groups []hfapi.ShardGroup, prefs []string) *hfapi.ShardG
return &groups[len(groups)-1]
}
// maybeApplyMTPDefaults parses the picked GGUF header (range-fetched over
// HTTP for HF/URL imports) and, if the file declares a Multi-Token Prediction
// head, appends the auto-MTP option keys to modelConfig.Options. Failures
// during the probe are non-fatal: the importer keeps the config without MTP
// so an unrelated network blip or weird header doesn't break the import.
//
// OCI/Ollama URIs are skipped because the artifact isn't directly fetchable
// as a GGUF byte stream - the load-time hook (core/config/gguf.go) covers
// those once the model is materialised on disk.
func maybeApplyMTPDefaults(modelConfig *config.ModelConfig, details Details, cfg *gallery.ModelConfig) {
probeURL := pickMTPProbeURL(details, cfg)
if probeURL == "" {
return
}
ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()
defer func() {
if r := recover(); r != nil {
xlog.Debug("[mtp-importer] panic while probing GGUF header", "uri", probeURL, "recover", r)
}
}()
f, err := gguf.ParseGGUFFileRemote(ctx, probeURL)
if err != nil {
xlog.Debug("[mtp-importer] failed to read remote GGUF header for MTP detection", "uri", probeURL, "error", err)
return
}
n, ok := config.HasEmbeddedMTPHead(f)
if !ok {
return
}
config.ApplyMTPDefaults(modelConfig, n)
}
// pickMTPProbeURL returns an HTTP(S) URL pointing at the main (non-mmproj)
// GGUF shard that should be inspected for an MTP head, or "" when no
// suitable URL is available. Custom URI schemes (`huggingface://`,
// `ollama://`, etc.) are run through `downloader.URI.ResolveURL` so the
// resulting URL is something `gguf.ParseGGUFFileRemote` can actually open.
// OCI/Ollama URIs are skipped because the artifact is not directly
// streamable as a GGUF byte range.
func pickMTPProbeURL(details Details, cfg *gallery.ModelConfig) string {
uri := downloader.URI(details.URI)
if uri.LooksLikeOCI() {
return ""
}
if strings.HasSuffix(strings.ToLower(details.URI), ".gguf") {
return resolveHTTPProbe(details.URI)
}
for _, f := range cfg.Files {
lower := strings.ToLower(f.Filename)
if strings.Contains(lower, "mmproj") {
continue
}
if !strings.HasSuffix(lower, ".gguf") {
continue
}
return resolveHTTPProbe(f.URI)
}
return ""
}
// resolveHTTPProbe resolves an importer-side URI to the HTTP(S) URL that
// `gguf.ParseGGUFFileRemote` can range-fetch. Returns "" if the URI can't
// be reduced to an HTTP(S) endpoint (e.g. local path, unsupported scheme).
func resolveHTTPProbe(uri string) string {
resolved := downloader.URI(uri).ResolveURL()
if downloader.URI(resolved).LooksLikeHTTPURL() {
return resolved
}
return ""
}
// appendShardGroup copies every shard of group into cfg.Files under dest,
// skipping any entry whose target filename is already present so repeated
// calls (e.g. the rare case of mmproj + model picking the same group)
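
The file-selection loop in pickMTPProbeURL (skip mmproj shards, skip anything that is not a .gguf, take the first remaining file) can be tried in isolation with the sketch below. The file type and pickMainGGUF are hypothetical stand-ins for the gallery types, and URI resolution through the downloader is deliberately left out.

```go
package main

import (
	"fmt"
	"strings"
)

// file is a hypothetical stand-in for an entry of gallery.ModelConfig.Files.
type file struct {
	Filename string
	URI      string
}

// pickMainGGUF returns the first file that is a GGUF and not a multimodal
// projector (mmproj) shard. Matching is case-insensitive, as in the importer.
func pickMainGGUF(files []file) (file, bool) {
	for _, f := range files {
		lower := strings.ToLower(f.Filename)
		if strings.Contains(lower, "mmproj") {
			continue // projector weights carry no MTP head
		}
		if !strings.HasSuffix(lower, ".gguf") {
			continue // not a GGUF at all
		}
		return f, true
	}
	return file{}, false
}

func main() {
	files := []file{
		{Filename: "mmproj-F16.gguf", URI: "https://example.invalid/mmproj-F16.gguf"},
		{Filename: "README.md", URI: "https://example.invalid/README.md"},
		{Filename: "Model-Q4_K_M.GGUF", URI: "https://example.invalid/Model-Q4_K_M.GGUF"},
	}
	f, ok := pickMainGGUF(files)
	fmt.Println(f.Filename, ok) // Model-Q4_K_M.GGUF true
}
```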


@@ -77,7 +77,7 @@ func InstallModelFromGallery(
modelGalleries, backendGalleries []lconfig.Gallery,
systemState *system.SystemState,
modelLoader *model.ModelLoader,
name string, req GalleryModel, downloadStatus func(string, string, string, float64), enforceScan, automaticallyInstallBackend bool) error {
name string, req GalleryModel, downloadStatus func(string, string, string, float64), enforceScan, automaticallyInstallBackend, requireBackendIntegrity bool) error {
applyModel := func(model *GalleryModel) error {
name = strings.ReplaceAll(name, string(os.PathSeparator), "__")
@@ -137,7 +137,7 @@ func InstallModelFromGallery(
if automaticallyInstallBackend && installedModel.Backend != "" {
xlog.Debug("Installing backend", "backend", installedModel.Backend)
if err := InstallBackendFromGallery(ctx, backendGalleries, systemState, modelLoader, installedModel.Backend, downloadStatus, false); err != nil {
if err := InstallBackendFromGallery(ctx, backendGalleries, systemState, modelLoader, installedModel.Backend, downloadStatus, false, requireBackendIntegrity); err != nil {
return err
}
}


@@ -89,7 +89,7 @@ var _ = Describe("Model test", func() {
Expect(models[0].URL).To(Equal(bertEmbeddingsURL))
Expect(models[0].Installed).To(BeFalse())
err = InstallModelFromGallery(context.TODO(), galleries, []config.Gallery{}, systemState, nil, "test@bert", GalleryModel{}, func(s1, s2, s3 string, f float64) {}, true, true)
err = InstallModelFromGallery(context.TODO(), galleries, []config.Gallery{}, systemState, nil, "test@bert", GalleryModel{}, func(s1, s2, s3 string, f float64) {}, true, true, false)
Expect(err).ToNot(HaveOccurred())
dat, err := os.ReadFile(filepath.Join(tempdir, "bert.yaml"))


@@ -232,7 +232,7 @@ func summarizeNodeDrift(nodes []NodeBackendRef) (majority struct{ version, diges
// UpgradeBackend upgrades a single backend to the latest gallery version using
// an atomic swap with backup-based rollback on failure.
func UpgradeBackend(ctx context.Context, systemState *system.SystemState, modelLoader *model.ModelLoader, galleries []config.Gallery, backendName string, downloadStatus func(string, string, string, float64)) error {
func UpgradeBackend(ctx context.Context, systemState *system.SystemState, modelLoader *model.ModelLoader, galleries []config.Gallery, backendName string, downloadStatus func(string, string, string, float64), requireIntegrity bool) error {
// Look up the installed backend
installedBackends, err := ListSystemBackends(systemState)
if err != nil {
@@ -251,7 +251,7 @@ func UpgradeBackend(ctx context.Context, systemState *system.SystemState, modelL
// If this is a meta backend, recursively upgrade the concrete backend it points to
if installed.Metadata != nil && installed.Metadata.MetaBackendFor != "" {
xlog.Info("Meta backend detected, upgrading concrete backend", "meta", backendName, "concrete", installed.Metadata.MetaBackendFor)
return UpgradeBackend(ctx, systemState, modelLoader, galleries, installed.Metadata.MetaBackendFor, downloadStatus)
return UpgradeBackend(ctx, systemState, modelLoader, galleries, installed.Metadata.MetaBackendFor, downloadStatus, requireIntegrity)
}
// Find the gallery entry
@@ -265,6 +265,16 @@ func UpgradeBackend(ctx context.Context, systemState *system.SystemState, modelL
return fmt.Errorf("no gallery entry found for backend %q", backendName)
}
// Resolve integrity options (cosign verifier for OCI URIs, strict-mode
// gate for missing SHA256/policy) BEFORE writing anything to disk.
// Without this, the upgrade path would atomically swap in an
// unverified backend even when the gallery has a verification policy
// — see backendDownloadOptions in backends.go.
downloadOpts, err := backendDownloadOptions(galleryEntry, requireIntegrity)
if err != nil {
return fmt.Errorf("upgrade %q: %w", backendName, err)
}
backendPath := filepath.Join(systemState.Backend.BackendsPath, backendName)
tmpPath := backendPath + ".upgrade-tmp"
backupPath := backendPath + ".backup"
@@ -285,7 +295,7 @@ func UpgradeBackend(ctx context.Context, systemState *system.SystemState, modelL
return fmt.Errorf("failed to copy backend from directory: %w", err)
}
} else {
if err := uri.DownloadFileWithContext(ctx, tmpPath, "", 1, 1, downloadStatus); err != nil {
if err := uri.DownloadFileWithContext(ctx, tmpPath, galleryEntry.SHA256, 1, 1, downloadStatus, downloadOpts...); err != nil {
os.RemoveAll(tmpPath)
return fmt.Errorf("failed to download backend: %w", err)
}


@@ -383,7 +383,7 @@ var _ = Describe("Upgrade Detection and Execution", func() {
})
ml := model.NewModelLoader(systemState)
err := UpgradeBackend(context.Background(), systemState, ml, galleries, "my-backend", nil)
err := UpgradeBackend(context.Background(), systemState, ml, galleries, "my-backend", nil, false)
Expect(err).NotTo(HaveOccurred())
// Verify run.sh was updated
@@ -417,7 +417,7 @@ var _ = Describe("Upgrade Detection and Execution", func() {
})
ml := model.NewModelLoader(systemState)
err := UpgradeBackend(context.Background(), systemState, ml, galleries, "my-backend", nil)
err := UpgradeBackend(context.Background(), systemState, ml, galleries, "my-backend", nil, false)
Expect(err).To(HaveOccurred())
// Verify v1 is still intact
@@ -432,5 +432,41 @@ var _ = Describe("Upgrade Detection and Execution", func() {
Expect(json.Unmarshal(metaData, &meta)).To(Succeed())
Expect(meta.Version).To(Equal("1.0.0"))
})
// Regression: an earlier version of UpgradeBackend wrote the
// downloaded bytes to disk without going through
// backendDownloadOptions, so the gallery's verification policy
// (and strict-integrity gate) didn't apply on upgrade. This test
// pins the upgrade path to the same integrity gate as installs:
// strict mode + an OCI URI without a verification: block must
// hard-fail *before* anything is downloaded or swapped in.
It("should refuse to upgrade an OCI backend that bypasses integrity in strict mode", func() {
installBackendWithVersion("my-backend", "1.0.0", "#!/bin/sh\necho v1")
// OCI URI, no Gallery.Verification → backendDownloadOptions
// returns a strict-integrity error before any network call.
writeGalleryYAML([]GalleryBackend{
{
Metadata: Metadata{
Name: "my-backend",
},
URI: "oci://example.invalid/missing:never-fetched",
Version: "2.0.0",
},
})
ml := model.NewModelLoader(systemState)
err := UpgradeBackend(context.Background(), systemState, ml, galleries, "my-backend", nil, true)
Expect(err).To(HaveOccurred())
Expect(err.Error()).To(ContainSubstring("strict integrity"))
// The installed v1 must be untouched — the upgrade should
// have aborted before writing anything.
content, err := os.ReadFile(filepath.Join(backendsPath, "my-backend", "run.sh"))
Expect(err).NotTo(HaveOccurred())
Expect(string(content)).To(Equal("#!/bin/sh\necho v1"))
Expect(filepath.Join(backendsPath, "my-backend.upgrade-tmp")).NotTo(BeAnExistingFile())
Expect(filepath.Join(backendsPath, "my-backend.backup")).NotTo(BeAnExistingFile())
})
})
})


@@ -28,6 +28,7 @@ import (
"github.com/mudler/LocalAI/core/services/monitoring"
"github.com/mudler/LocalAI/core/services/nodes"
"github.com/mudler/LocalAI/core/services/quantization"
"github.com/mudler/LocalAI/pkg/signals"
"github.com/mudler/xlog"
)
@@ -267,9 +268,12 @@ func API(application *application.Application) (*echo.Echo, error) {
e.Static("/generated-videos", videoPath)
}
// Initialize usage recording when auth DB is available
// Initialize usage recording when auth DB is available, and ensure the
// batcher drains its in-memory queue on graceful shutdown so the last
// few seconds of usage don't disappear when the process exits.
if application.AuthDB() != nil {
httpMiddleware.InitUsageRecorder(application.AuthDB())
signals.RegisterGracefulTerminationHandler(httpMiddleware.ShutdownUsageRecorder)
}
// Auth is applied to _all_ endpoints. Filtering out endpoints to bypass is
@@ -403,7 +407,7 @@ func API(application *application.Application) (*echo.Echo, error) {
}
}
routes.RegisterNodeSelfServiceRoutes(e, registry, distCfg.RegistrationToken, distCfg.AutoApproveNodes, application.AuthDB(), application.ApplicationConfig().Auth.APIKeyHMACSecret)
routes.RegisterNodeAdminRoutes(e, registry, remoteUnloader, adminMiddleware, application.AuthDB(), application.ApplicationConfig().Auth.APIKeyHMACSecret, application.ApplicationConfig().Distributed.RegistrationToken)
routes.RegisterNodeAdminRoutes(e, registry, remoteUnloader, application.GalleryService(), opcache, application.ApplicationConfig(), adminMiddleware, application.AuthDB(), application.ApplicationConfig().Auth.APIKeyHMACSecret, application.ApplicationConfig().Distributed.RegistrationToken)
// Distributed SSE routes (job progress + agent events via NATS)
if d := application.Distributed(); d != nil {


@@ -38,9 +38,15 @@ func InitDB(databaseURL string) (*gorm.DB, error) {
}
// Backfill: users created before the provider column existed have an empty
// provider treat them as local accounts so the UI can identify them.
// provider - treat them as local accounts so the UI can identify them.
db.Exec("UPDATE users SET provider = ? WHERE provider = '' OR provider IS NULL", ProviderLocal)
// Backfill: pre-feature usage_records have no source column. Classify them so the
// new per-source aggregators include them.
if err := BackfillUsageSource(db); err != nil {
return nil, fmt.Errorf("failed to backfill usage source: %w", err)
}
// Create composite index on users(provider, subject) for fast OAuth lookups
if err := db.Exec("CREATE INDEX IF NOT EXISTS idx_users_provider_subject ON users(provider, subject)").Error; err != nil {
// Ignore error on postgres if index already exists


@@ -16,8 +16,10 @@ import (
)
const (
contextKeyUser = "auth_user"
contextKeyRole = "auth_role"
contextKeyUser = "auth_user"
contextKeyRole = "auth_role"
contextKeyAPIKey = "auth_apikey"
contextKeySource = "auth_source"
)
// Middleware returns an Echo middleware that handles authentication.
@@ -75,6 +77,7 @@ func Middleware(db *gorm.DB, appConfig *config.ApplicationConfig) echo.Middlewar
}
c.Set(contextKeyUser, syntheticUser)
c.Set(contextKeyRole, RoleAdmin)
c.Set(contextKeySource, UsageSourceLegacy)
authenticated = true
}
}
@@ -213,6 +216,20 @@ func GetUserRole(c echo.Context) string {
return role
}
// GetAPIKey returns the resolved API key from the echo context, or nil.
// Nil for session-cookie and legacy-env-key authentication.
func GetAPIKey(c echo.Context) *UserAPIKey {
k, _ := c.Get(contextKeyAPIKey).(*UserAPIKey)
return k
}
// GetSource returns the request's authentication source: UsageSourceAPIKey,
// UsageSourceWeb, UsageSourceLegacy, or empty if no authentication was performed.
func GetSource(c echo.Context) string {
s, _ := c.Get(contextKeySource).(string)
return s
}
// RequireRouteFeature returns a global middleware that checks the user has access
// to the feature required by the matched route. It uses the RouteFeatureRegistry
// to look up the required feature for each route pattern + HTTP method.
@@ -421,47 +438,67 @@ func RequireQuota(db *gorm.DB) echo.MiddlewareFunc {
}
// tryAuthenticate attempts to authenticate the request using the database.
//
// On success it returns the user and, as a side effect, sets the following
// values on the Echo context:
// - contextKeySource ("auth_source"): always set, one of UsageSourceWeb /
// UsageSourceAPIKey. UsageSourceLegacy is set elsewhere by the parent
// Middleware when a legacy env key matches.
// - contextKeyAPIKey ("auth_apikey"): set to the resolved *UserAPIKey for
// named-key branches (Bearer, x-api-key, xi-api-key, token cookie).
// - "_auth_session": session record, used by Middleware to drive cookie
// rotation. Only set on the session-cookie branch.
//
// contextKeyUser and contextKeyRole are populated by the parent Middleware
// after this function returns.
func tryAuthenticate(c echo.Context, db *gorm.DB, appConfig *config.ApplicationConfig) *User {
hmacSecret := appConfig.Auth.APIKeyHMACSecret
// a. Session cookie
// a. Session cookie -> web UI
if cookie, err := c.Cookie(sessionCookie); err == nil && cookie.Value != "" {
if user, session := ValidateSession(db, cookie.Value, hmacSecret); user != nil {
// Store session for rotation check in middleware
c.Set("_auth_session", session)
c.Set(contextKeySource, UsageSourceWeb)
return user
}
}
// b. Authorization: Bearer token
// b. Authorization: Bearer
authHeader := c.Request().Header.Get("Authorization")
if strings.HasPrefix(authHeader, "Bearer ") {
token := strings.TrimPrefix(authHeader, "Bearer ")
// Try as session ID first
// b1. Session token via Bearer -> still web UI
if user, _ := ValidateSession(db, token, hmacSecret); user != nil {
c.Set(contextKeySource, UsageSourceWeb)
return user
}
// Try as user API key
// b2. Named API key
if key, err := ValidateAPIKey(db, token, hmacSecret); err == nil {
c.Set(contextKeySource, UsageSourceAPIKey)
c.Set(contextKeyAPIKey, key)
return &key.User
}
}
// c. x-api-key / xi-api-key headers
// c. x-api-key / xi-api-key -> named API key
for _, header := range []string{"x-api-key", "xi-api-key"} {
if key := c.Request().Header.Get(header); key != "" {
if apiKey, err := ValidateAPIKey(db, key, hmacSecret); err == nil {
if k := c.Request().Header.Get(header); k != "" {
if apiKey, err := ValidateAPIKey(db, k, hmacSecret); err == nil {
c.Set(contextKeySource, UsageSourceAPIKey)
c.Set(contextKeyAPIKey, apiKey)
return &apiKey.User
}
}
}
// d. token cookie (legacy)
// d. token cookie -> named API key
if cookie, err := c.Cookie("token"); err == nil && cookie.Value != "" {
// Try as user API key
if key, err := ValidateAPIKey(db, cookie.Value, hmacSecret); err == nil {
c.Set(contextKeySource, UsageSourceAPIKey)
c.Set(contextKeyAPIKey, key)
return &key.User
}
}


@@ -303,4 +303,122 @@ var _ = Describe("Auth Middleware", func() {
}
})
})
Describe("auth context plumbing for usage source", func() {
// probeApp builds a minimal echo app with the auth middleware and a single
// "/probe" route that captures the user, source, and apikey from context.
type probe struct {
user *auth.User
source string
key *auth.UserAPIKey
}
probeApp := func(db *gorm.DB, appConfig *config.ApplicationConfig, p *probe) *echo.Echo {
e := echo.New()
e.Use(auth.Middleware(db, appConfig))
e.GET("/probe", func(c echo.Context) error {
p.user = auth.GetUser(c)
p.source = auth.GetSource(c)
p.key = auth.GetAPIKey(c)
return c.NoContent(http.StatusOK)
})
return e
}
It("session cookie sets source=web, apikey=nil", func() {
db := testDB()
appConfig := config.NewApplicationConfig()
user := createTestUser(db, "alice@example.com", auth.RoleUser, auth.ProviderLocal)
token := createTestSession(db, user.ID)
var p probe
app := probeApp(db, appConfig, &p)
rec := doRequest(app, http.MethodGet, "/probe", withSessionCookie(token))
Expect(rec.Code).To(Equal(http.StatusOK))
Expect(p.user).ToNot(BeNil())
Expect(p.user.ID).To(Equal(user.ID))
Expect(p.source).To(Equal(auth.UsageSourceWeb))
Expect(p.key).To(BeNil())
})
It("Bearer session token sets source=web, apikey=nil", func() {
db := testDB()
appConfig := config.NewApplicationConfig()
user := createTestUser(db, "alice@example.com", auth.RoleUser, auth.ProviderLocal)
token := createTestSession(db, user.ID)
var p probe
app := probeApp(db, appConfig, &p)
rec := doRequest(app, http.MethodGet, "/probe", withBearerToken(token))
Expect(rec.Code).To(Equal(http.StatusOK))
Expect(p.user).ToNot(BeNil())
Expect(p.user.ID).To(Equal(user.ID))
Expect(p.source).To(Equal(auth.UsageSourceWeb))
Expect(p.key).To(BeNil())
})
It("Bearer API key sets source=apikey and exposes the resolved *UserAPIKey", func() {
db := testDB()
appConfig := config.NewApplicationConfig()
user := createTestUser(db, "alice@example.com", auth.RoleUser, auth.ProviderLocal)
plaintext, key, err := auth.CreateAPIKey(db, user.ID, "ci", auth.RoleUser, appConfig.Auth.APIKeyHMACSecret, nil)
Expect(err).ToNot(HaveOccurred())
var p probe
app := probeApp(db, appConfig, &p)
rec := doRequest(app, http.MethodGet, "/probe", withBearerToken(plaintext))
Expect(rec.Code).To(Equal(http.StatusOK))
Expect(p.source).To(Equal(auth.UsageSourceAPIKey))
Expect(p.key).ToNot(BeNil())
Expect(p.key.ID).To(Equal(key.ID))
})
It("x-api-key header sets source=apikey", func() {
db := testDB()
appConfig := config.NewApplicationConfig()
user := createTestUser(db, "alice@example.com", auth.RoleUser, auth.ProviderLocal)
plaintext, _, err := auth.CreateAPIKey(db, user.ID, "ci", auth.RoleUser, appConfig.Auth.APIKeyHMACSecret, nil)
Expect(err).ToNot(HaveOccurred())
var p probe
app := probeApp(db, appConfig, &p)
rec := doRequest(app, http.MethodGet, "/probe", withXApiKey(plaintext))
Expect(rec.Code).To(Equal(http.StatusOK))
Expect(p.source).To(Equal(auth.UsageSourceAPIKey))
Expect(p.key).ToNot(BeNil())
})
It("token cookie sets source=apikey", func() {
db := testDB()
appConfig := config.NewApplicationConfig()
user := createTestUser(db, "alice@example.com", auth.RoleUser, auth.ProviderLocal)
plaintext, _, err := auth.CreateAPIKey(db, user.ID, "ci", auth.RoleUser, appConfig.Auth.APIKeyHMACSecret, nil)
Expect(err).ToNot(HaveOccurred())
var p probe
app := probeApp(db, appConfig, &p)
rec := doRequest(app, http.MethodGet, "/probe", withTokenCookie(plaintext))
Expect(rec.Code).To(Equal(http.StatusOK))
Expect(p.source).To(Equal(auth.UsageSourceAPIKey))
Expect(p.key).ToNot(BeNil())
})
It("legacy env key sets source=legacy, apikey=nil", func() {
db := testDB()
appConfig := config.NewApplicationConfig()
appConfig.ApiKeys = []string{"legacy-secret"}
var p probe
app := probeApp(db, appConfig, &p)
rec := doRequest(app, http.MethodGet, "/probe", withBearerToken("legacy-secret"))
Expect(rec.Code).To(Equal(http.StatusOK))
Expect(p.source).To(Equal(auth.UsageSourceLegacy))
Expect(p.key).To(BeNil())
})
})
})


@@ -5,14 +5,31 @@ import (
"strings"
"time"
"github.com/mudler/xlog"
"gorm.io/gorm"
)
// Source classification for a UsageRecord.
const (
UsageSourceAPIKey = "apikey" // request authenticated with a named UserAPIKey
UsageSourceWeb = "web" // request authenticated with a session cookie (web UI)
UsageSourceLegacy = "legacy" // request authenticated with an env-configured legacy key
)
// UsageRecord represents a single API request's token usage.
type UsageRecord struct {
ID uint `gorm:"primaryKey;autoIncrement"`
UserID string `gorm:"size:36;index:idx_usage_user_time"`
UserName string `gorm:"size:255"`
ID uint `gorm:"primaryKey;autoIncrement"`
UserID string `gorm:"size:36;index:idx_usage_user_time"`
UserName string `gorm:"size:255"`
// Source classifies how the request authenticated. One of UsageSource* constants.
// Empty for pre-feature rows until the InitDB backfill runs.
Source string `gorm:"size:16;index:idx_usage_source"`
// APIKeyID is the UserAPIKey.ID when Source == UsageSourceAPIKey. Nil otherwise.
APIKeyID *string `gorm:"size:36;index:idx_usage_apikey"`
// APIKeyName is a snapshot of UserAPIKey.Name at write time. Survives key deletion.
APIKeyName string `gorm:"size:255"`
Model string `gorm:"size:255;index"`
Endpoint string `gorm:"size:255"`
PromptTokens int64
@@ -30,9 +47,12 @@ func RecordUsage(db *gorm.DB, record *UsageRecord) error {
// UsageBucket is an aggregated time bucket for the dashboard.
type UsageBucket struct {
Bucket string `json:"bucket"`
Model string `json:"model"`
Model string `json:"model,omitempty"`
UserID string `json:"user_id,omitempty"`
UserName string `json:"user_name,omitempty"`
Source string `json:"source,omitempty"`
APIKeyID string `json:"api_key_id,omitempty"`
APIKeyName string `json:"api_key_name,omitempty"`
PromptTokens int64 `json:"prompt_tokens"`
CompletionTokens int64 `json:"completion_tokens"`
TotalTokens int64 `json:"total_tokens"`
@@ -119,6 +139,28 @@ func GetUserUsage(db *gorm.DB, userID, period string) ([]UsageBucket, error) {
return buckets, nil
}
// BackfillUsageSource sets the Source column on pre-feature usage rows.
// Idempotent: only touches rows where source is NULL or empty.
// - rows whose user_id == "legacy-api-key" -> UsageSourceLegacy
// - everything else -> UsageSourceWeb
func BackfillUsageSource(db *gorm.DB) error {
// Legacy first (more specific predicate)
if err := db.Exec(
`UPDATE usage_records SET source = ? WHERE (source IS NULL OR source = '') AND user_id = ?`,
UsageSourceLegacy, "legacy-api-key",
).Error; err != nil {
return fmt.Errorf("backfill legacy usage source: %w", err)
}
// Everything else -> web
if err := db.Exec(
`UPDATE usage_records SET source = ? WHERE (source IS NULL OR source = '')`,
UsageSourceWeb,
).Error; err != nil {
return fmt.Errorf("backfill web usage source: %w", err)
}
return nil
}
// GetAllUsage returns aggregated usage for all users (admin). Optional userID filter.
func GetAllUsage(db *gorm.DB, period, userID string) ([]UsageBucket, error) {
sqlite := isSQLiteDB(db)
@@ -149,3 +191,257 @@ func GetAllUsage(db *gorm.DB, period, userID string) ([]UsageBucket, error) {
}
return buckets, nil
}
// TotalsEntry is a token+request roll-up.
type TotalsEntry struct {
Tokens int64 `json:"tokens"`
Requests int64 `json:"requests"`
}
// KeyTotal is the per-key roll-up returned by sources endpoints. UserID and
// UserName are snapshotted from the UsageRecord so revoked-and-deleted keys
// still carry their owner attribution in admin views.
type KeyTotal struct {
APIKeyID string `json:"api_key_id"`
APIKeyName string `json:"api_key_name"`
UserID string `json:"user_id"`
UserName string `json:"user_name"`
Tokens int64 `json:"tokens"`
Requests int64 `json:"requests"`
LastUsed time.Time `json:"last_used"`
}
// UserSourceTotal is a per-(user, source) roll-up for sources that don't carry
// a named API key identity (web, legacy). It exists so admin views can show
// which user generated each block of Web UI / legacy traffic; the per-apikey
// breakdown for source=apikey already lives in KeyTotal.
type UserSourceTotal struct {
Source string `json:"source"`
UserID string `json:"user_id"`
UserName string `json:"user_name"`
Tokens int64 `json:"tokens"`
Requests int64 `json:"requests"`
}
// SourceTotals summarises a per-source breakdown.
type SourceTotals struct {
BySource map[string]TotalsEntry `json:"by_source"`
ByKey []KeyTotal `json:"by_key"` // server-sorted desc by tokens, capped
ByUserSource []UserSourceTotal `json:"by_user_source,omitempty"` // populated only when includeLegacy=true
GrandTotal TotalsEntry `json:"grand_total"`
}
const maxKeyTotals = 200
// GetUserUsageBySource returns per-source aggregated usage for one user. Legacy
// is excluded by design (visible to admins only via the admin variant).
func GetUserUsageBySource(db *gorm.DB, userID, period string) ([]UsageBucket, SourceTotals, error) {
sqlite := isSQLiteDB(db)
since, dateFmt := periodToWindow(period, sqlite)
bucketExpr := fmt.Sprintf("%s as bucket", dateFmt)
query := db.Model(&UsageRecord{}).
Select(bucketExpr+", source, COALESCE(api_key_id, '') as api_key_id, api_key_name, "+
"SUM(prompt_tokens) as prompt_tokens, "+
"SUM(completion_tokens) as completion_tokens, "+
"SUM(total_tokens) as total_tokens, "+
"COUNT(*) as request_count").
Where("user_id = ?", userID).
Where("source <> ?", UsageSourceLegacy).
Group("bucket, source, api_key_id, api_key_name").
Order("bucket ASC")
if !since.IsZero() {
query = query.Where("created_at >= ?", since)
}
var buckets []UsageBucket
if err := query.Find(&buckets).Error; err != nil {
return nil, SourceTotals{}, err
}
totals := computeSourceTotals(db, userID, "", since, false)
return buckets, totals, nil
}
// computeSourceTotals rolls up by_source / by_key / grand_total.
// userID/apiKeyID are optional filters. includeLegacy controls whether the
// legacy bucket is exposed (admin-only).
func computeSourceTotals(db *gorm.DB, userID, apiKeyID string, since time.Time, includeLegacy bool) SourceTotals {
totals := SourceTotals{BySource: map[string]TotalsEntry{}}
bySourceQ := db.Model(&UsageRecord{}).
Select("source, SUM(total_tokens) as tokens, COUNT(*) as requests").
Group("source")
bySourceQ = applyFilters(bySourceQ, userID, apiKeyID, since, includeLegacy)
var bySourceRows []struct {
Source string
Tokens int64
Requests int64
}
if err := bySourceQ.Scan(&bySourceRows).Error; err != nil {
xlog.Warn("computeSourceTotals: by-source Scan failed", "error", err)
return totals
}
for _, r := range bySourceRows {
totals.BySource[r.Source] = TotalsEntry{Tokens: r.Tokens, Requests: r.Requests}
totals.GrandTotal.Tokens += r.Tokens
totals.GrandTotal.Requests += r.Requests
}
byKeyQ := db.Model(&UsageRecord{}).
Select("COALESCE(api_key_id, '') as api_key_id, api_key_name, "+
"user_id, user_name, "+
"SUM(total_tokens) as tokens, COUNT(*) as requests, MAX(created_at) as last_used").
Where("api_key_id IS NOT NULL AND api_key_id <> ''").
Group("api_key_id, api_key_name, user_id, user_name").
Order("tokens DESC").
Limit(maxKeyTotals)
byKeyQ = applyFilters(byKeyQ, userID, apiKeyID, since, includeLegacy)
// Iterate Rows() manually because MAX(created_at) is returned as a string by
// the SQLite driver, and Go's database/sql refuses to scan that into
// *time.Time. Postgres returns a proper timestamp. We accept both shapes
// via a Rows.Scan into a string column, then parse uniformly.
rows, err := byKeyQ.Rows()
if err != nil {
xlog.Warn("computeSourceTotals: by-key Rows() failed", "error", err)
} else {
defer func() { _ = rows.Close() }()
out := make([]KeyTotal, 0)
for rows.Next() {
var (
apiKeyID, apiKeyName, userIDCol, userName, lastUsedRaw string
tokens, requests int64
)
if scanErr := rows.Scan(&apiKeyID, &apiKeyName, &userIDCol, &userName, &tokens, &requests, &lastUsedRaw); scanErr != nil {
continue
}
out = append(out, KeyTotal{
APIKeyID: apiKeyID,
APIKeyName: apiKeyName,
UserID: userIDCol,
UserName: userName,
Tokens: tokens,
Requests: requests,
LastUsed: parseLastUsedString(lastUsedRaw),
})
}
if rerr := rows.Err(); rerr != nil {
xlog.Warn("computeSourceTotals: by-key rows iteration failed", "error", rerr)
}
totals.ByKey = out
}
// by_user_source: only populated for admin callers (includeLegacy=true) so
// they can attribute Web UI / legacy traffic to specific users. Per-apikey
// rows already carry user info via KeyTotal above, so this query only
// covers source != apikey.
if includeLegacy {
byUserSourceQ := db.Model(&UsageRecord{}).
Select("source, user_id, user_name, "+
"SUM(total_tokens) as tokens, COUNT(*) as requests").
Where("source <> ?", UsageSourceAPIKey).
Group("source, user_id, user_name").
Order("tokens DESC")
byUserSourceQ = applyFilters(byUserSourceQ, userID, apiKeyID, since, includeLegacy)
var byUserSourceRows []UserSourceTotal
if scanErr := byUserSourceQ.Scan(&byUserSourceRows).Error; scanErr != nil {
xlog.Warn("computeSourceTotals: by-user-source Scan failed", "error", scanErr)
} else {
totals.ByUserSource = byUserSourceRows
}
}
return totals
}
// parseLastUsedString converts the textual MAX(created_at) value returned by
// SQLite (or any driver that surfaces the timestamp as a string) into a
// time.Time. Returns the zero time on parse failure.
func parseLastUsedString(s string) time.Time {
if s == "" {
return time.Time{}
}
// GORM's SQLite driver emits Go's default time formatting. Try the formats
// it commonly produces, falling back to RFC3339Nano.
layouts := []string{
"2006-01-02 15:04:05.999999999 -0700 MST",
"2006-01-02 15:04:05.999999999-07:00",
"2006-01-02 15:04:05.999999999",
"2006-01-02 15:04:05",
time.RFC3339Nano,
time.RFC3339,
}
for _, layout := range layouts {
if t, err := time.Parse(layout, s); err == nil {
return t
}
}
xlog.Warn("parseLastUsedString: unrecognised format", "value", s)
return time.Time{}
}
// GetAllUsageBySource is the admin variant of GetUserUsageBySource.
// Optional filters: userID and apiKeyID. Legacy is included.
// truncated == true iff the per-key roll-up was capped at maxKeyTotals.
func GetAllUsageBySource(db *gorm.DB, period, userID, apiKeyID string) ([]UsageBucket, SourceTotals, bool, error) {
sqlite := isSQLiteDB(db)
since, dateFmt := periodToWindow(period, sqlite)
bucketExpr := fmt.Sprintf("%s as bucket", dateFmt)
query := db.Model(&UsageRecord{}).
Select(bucketExpr+", source, COALESCE(api_key_id, '') as api_key_id, api_key_name, "+
"user_id, user_name, "+
"SUM(prompt_tokens) as prompt_tokens, "+
"SUM(completion_tokens) as completion_tokens, "+
"SUM(total_tokens) as total_tokens, "+
"COUNT(*) as request_count").
Group("bucket, source, api_key_id, api_key_name, user_id, user_name").
Order("bucket ASC")
query = applyFilters(query, userID, apiKeyID, since, true)
var buckets []UsageBucket
if err := query.Find(&buckets).Error; err != nil {
return nil, SourceTotals{}, false, err
}
totals := computeSourceTotals(db, userID, apiKeyID, since, true)
// Count distinct api_key_ids matching the filters. If > maxKeyTotals,
// the by_key slice was capped and we signal truncation to the caller.
truncated := false
var distinct int64
countQ := applyFilters(
db.Model(&UsageRecord{}).
Distinct("api_key_id").
Where("api_key_id IS NOT NULL AND api_key_id <> ''"),
userID, apiKeyID, since, true,
)
if err := countQ.Count(&distinct).Error; err != nil {
xlog.Warn("GetAllUsageBySource: distinct api_key_id count failed", "error", err)
} else {
truncated = distinct > maxKeyTotals
}
return buckets, totals, truncated, nil
}
func applyFilters(q *gorm.DB, userID, apiKeyID string, since time.Time, includeLegacy bool) *gorm.DB {
if userID != "" {
q = q.Where("user_id = ?", userID)
}
if apiKeyID != "" {
q = q.Where("api_key_id = ?", apiKeyID)
}
if !since.IsZero() {
q = q.Where("created_at >= ?", since)
}
if !includeLegacy {
q = q.Where("source <> ?", UsageSourceLegacy)
}
return q
}
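
`parseLastUsedString` above copes with drivers that hand back `MAX(created_at)` as text by trying several layouts in turn. The sketch below shows the same fallback idea with a shortened layout list. `parseFlexible` is a hypothetical name, and the three layouts are a subset picked for illustration, not the full list from the diff.

```go
package main

import (
	"fmt"
	"time"
)

// parseFlexible tries each layout in order and returns the zero time when
// none of them matches. It is an illustrative reduction, not the real helper.
func parseFlexible(s string) time.Time {
	layouts := []string{
		"2006-01-02 15:04:05.999999999-07:00", // space-separated with a zone offset
		"2006-01-02 15:04:05",                 // space-separated, no zone (parsed as UTC)
		time.RFC3339Nano,                      // "T"-separated ISO form
	}
	for _, layout := range layouts {
		if t, err := time.Parse(layout, s); err == nil {
			return t
		}
	}
	return time.Time{}
}

func main() {
	for _, raw := range []string{
		"2026-05-22 15:29:24+02:00",
		"2026-05-22 15:29:24",
		"2026-05-22T13:29:24Z",
		"not a timestamp",
	} {
		t := parseFlexible(raw)
		fmt.Printf("%q -> zero=%v\n", raw, t.IsZero())
	}
}
```

Returning the zero time instead of an error keeps the roll-up usable when one row has an unexpected format; callers can detect that case with `IsZero()`.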


@@ -3,11 +3,13 @@
package auth_test
import (
"fmt"
"time"
"github.com/mudler/LocalAI/core/http/auth"
. "github.com/onsi/ginkgo/v2"
. "github.com/onsi/gomega"
"gorm.io/gorm"
)
var _ = Describe("Usage", func() {
@@ -158,4 +160,275 @@ var _ = Describe("Usage", func() {
}
})
})
Describe("Usage source backfill", func() {
It("backfills 'web' for pre-feature rows", func() {
db := testDB()
rawDB, err := db.DB()
Expect(err).ToNot(HaveOccurred())
_, err = rawDB.Exec(
`INSERT INTO usage_records (user_id, source, model, created_at, total_tokens, prompt_tokens, completion_tokens, duration) VALUES (?, '', ?, ?, 0, 0, 0, 0)`,
"user-x", "gpt-4", time.Now())
Expect(err).ToNot(HaveOccurred())
Expect(auth.BackfillUsageSource(db)).To(Succeed())
var loaded auth.UsageRecord
Expect(db.Where("user_id = ?", "user-x").First(&loaded).Error).To(Succeed())
Expect(loaded.Source).To(Equal(auth.UsageSourceWeb))
})
It("backfills 'legacy' for pre-feature rows with legacy-api-key user_id", func() {
db := testDB()
rawDB, err := db.DB()
Expect(err).ToNot(HaveOccurred())
_, err = rawDB.Exec(
`INSERT INTO usage_records (user_id, source, model, created_at, total_tokens, prompt_tokens, completion_tokens, duration) VALUES (?, '', ?, ?, 0, 0, 0, 0)`,
"legacy-api-key", "gpt-4", time.Now())
Expect(err).ToNot(HaveOccurred())
Expect(auth.BackfillUsageSource(db)).To(Succeed())
var loaded auth.UsageRecord
Expect(db.Where("user_id = ?", "legacy-api-key").First(&loaded).Error).To(Succeed())
Expect(loaded.Source).To(Equal(auth.UsageSourceLegacy))
})
It("is idempotent on re-run", func() {
db := testDB()
Expect(auth.BackfillUsageSource(db)).To(Succeed())
Expect(auth.BackfillUsageSource(db)).To(Succeed())
})
})
Describe("UsageRecord with source fields", func() {
It("persists Source, APIKeyID, APIKeyName", func() {
db := testDB()
keyID := "key-uuid-1"
record := &auth.UsageRecord{
UserID: "user-1",
UserName: "Test User",
Source: auth.UsageSourceAPIKey,
APIKeyID: &keyID,
APIKeyName: "ci-runner",
Model: "gpt-4",
Endpoint: "/v1/chat/completions",
TotalTokens: 150,
CreatedAt: time.Now(),
}
Expect(auth.RecordUsage(db, record)).To(Succeed())
var loaded auth.UsageRecord
Expect(db.First(&loaded, record.ID).Error).To(Succeed())
Expect(loaded.Source).To(Equal(auth.UsageSourceAPIKey))
Expect(loaded.APIKeyID).ToNot(BeNil())
Expect(*loaded.APIKeyID).To(Equal("key-uuid-1"))
Expect(loaded.APIKeyName).To(Equal("ci-runner"))
})
It("allows nil APIKeyID for web/legacy sources", func() {
db := testDB()
record := &auth.UsageRecord{
UserID: "user-1",
Source: auth.UsageSourceWeb,
Model: "gpt-4",
CreatedAt: time.Now(),
}
Expect(auth.RecordUsage(db, record)).To(Succeed())
var loaded auth.UsageRecord
Expect(db.First(&loaded, record.ID).Error).To(Succeed())
Expect(loaded.Source).To(Equal(auth.UsageSourceWeb))
Expect(loaded.APIKeyID).To(BeNil())
Expect(loaded.APIKeyName).To(BeEmpty())
})
})
Describe("GetUserUsageBySource", func() {
insert := func(db *gorm.DB, userID, source, keyID, keyName string, tokens int64, when time.Time) {
rec := &auth.UsageRecord{
UserID: userID,
Source: source,
Model: "gpt-4",
TotalTokens: tokens,
CreatedAt: when,
}
if keyID != "" {
rec.APIKeyID = &keyID
rec.APIKeyName = keyName
}
Expect(auth.RecordUsage(db, rec)).To(Succeed())
}
It("returns only the caller's rows, never legacy", func() {
db := testDB()
now := time.Now()
insert(db, "alice", auth.UsageSourceAPIKey, "k1", "ci", 100, now)
insert(db, "alice", auth.UsageSourceWeb, "", "", 50, now)
insert(db, "alice", auth.UsageSourceLegacy, "", "", 30, now)
insert(db, "bob", auth.UsageSourceAPIKey, "k2", "bobk", 90, now)
buckets, totals, err := auth.GetUserUsageBySource(db, "alice", "month")
Expect(err).ToNot(HaveOccurred())
for _, b := range buckets {
Expect(b.UserID).To(Or(BeEmpty(), Equal("alice")))
Expect(b.Source).ToNot(Equal(auth.UsageSourceLegacy))
}
Expect(totals.GrandTotal.Tokens).To(Equal(int64(150)))
Expect(totals.BySource[auth.UsageSourceAPIKey].Tokens).To(Equal(int64(100)))
Expect(totals.BySource[auth.UsageSourceWeb].Tokens).To(Equal(int64(50)))
_, hasLegacy := totals.BySource[auth.UsageSourceLegacy]
Expect(hasLegacy).To(BeFalse())
})
It("snapshots survive key deletion", func() {
db := testDB()
now := time.Now()
insert(db, "alice", auth.UsageSourceAPIKey, "deleted-key", "old-name", 42, now)
_, totals, err := auth.GetUserUsageBySource(db, "alice", "month")
Expect(err).ToNot(HaveOccurred())
Expect(totals.ByKey).To(HaveLen(1))
Expect(totals.ByKey[0].APIKeyName).To(Equal("old-name"))
Expect(totals.ByKey[0].APIKeyID).To(Equal("deleted-key"))
Expect(totals.ByKey[0].LastUsed).ToNot(BeZero())
Expect(totals.ByKey[0].LastUsed).To(BeTemporally("~", now, 2*time.Second))
})
})
Describe("GetAllUsageBySource", func() {
insert := func(db *gorm.DB, userID, source, keyID string, tokens int64) {
rec := &auth.UsageRecord{
UserID: userID,
Source: source,
Model: "gpt-4",
TotalTokens: tokens,
CreatedAt: time.Now(),
}
if keyID != "" {
rec.APIKeyID = &keyID
rec.APIKeyName = "name-" + keyID
}
Expect(auth.RecordUsage(db, rec)).To(Succeed())
}
It("includes legacy for admins", func() {
db := testDB()
insert(db, "alice", auth.UsageSourceAPIKey, "k1", 10)
insert(db, "legacy-api-key", auth.UsageSourceLegacy, "", 5)
_, totals, _, err := auth.GetAllUsageBySource(db, "month", "", "")
Expect(err).ToNot(HaveOccurred())
Expect(totals.BySource).To(HaveKey(auth.UsageSourceLegacy))
Expect(totals.BySource[auth.UsageSourceLegacy].Tokens).To(Equal(int64(5)))
})
It("filters by user_id AND api_key_id", func() {
db := testDB()
insert(db, "alice", auth.UsageSourceAPIKey, "k1", 10)
insert(db, "alice", auth.UsageSourceAPIKey, "k2", 20)
insert(db, "bob", auth.UsageSourceAPIKey, "k3", 30)
_, totals, _, err := auth.GetAllUsageBySource(db, "month", "alice", "k2")
Expect(err).ToNot(HaveOccurred())
Expect(totals.GrandTotal.Tokens).To(Equal(int64(20)))
})
It("sets truncated=true when by_key exceeds the cap", func() {
db := testDB()
for i := 0; i < 210; i++ {
insert(db, "alice", auth.UsageSourceAPIKey, fmt.Sprintf("key-%03d", i), int64(210-i))
}
_, totals, truncated, err := auth.GetAllUsageBySource(db, "month", "", "")
Expect(err).ToNot(HaveOccurred())
Expect(truncated).To(BeTrue())
Expect(totals.ByKey).To(HaveLen(200))
Expect(totals.ByKey[0].Tokens > totals.ByKey[199].Tokens).To(BeTrue())
})
// insertNamed records a row with explicit user_id, user_name, source,
// and optional api key snapshot. Used by the user-attribution tests
// below which the older insert helper can't express.
insertNamed := func(db *gorm.DB, userID, userName, source, keyID, keyName string, tokens int64) {
rec := &auth.UsageRecord{
UserID: userID,
UserName: userName,
Source: source,
Model: "gpt-4",
TotalTokens: tokens,
CreatedAt: time.Now(),
}
if keyID != "" {
rec.APIKeyID = &keyID
rec.APIKeyName = keyName
}
Expect(auth.RecordUsage(db, rec)).To(Succeed())
}
It("attributes each KeyTotal to its owner user", func() {
db := testDB()
insertNamed(db, "alice", "Alice", auth.UsageSourceAPIKey, "k1", "ci-runner", 100)
insertNamed(db, "bob", "Bob", auth.UsageSourceAPIKey, "k2", "lap", 50)
_, totals, _, err := auth.GetAllUsageBySource(db, "month", "", "")
Expect(err).ToNot(HaveOccurred())
Expect(totals.ByKey).To(HaveLen(2))
byID := map[string]auth.KeyTotal{}
for _, k := range totals.ByKey {
byID[k.APIKeyID] = k
}
Expect(byID["k1"].UserID).To(Equal("alice"))
Expect(byID["k1"].UserName).To(Equal("Alice"))
Expect(byID["k2"].UserID).To(Equal("bob"))
Expect(byID["k2"].UserName).To(Equal("Bob"))
})
It("breaks Web UI and legacy traffic out per user in by_user_source for admin", func() {
db := testDB()
// Alice and Bob both have Web UI traffic; a synthetic legacy user
// also contributes. ByUserSource should expose one row per
// (source, user) pair, never for source=apikey.
insertNamed(db, "alice", "Alice", auth.UsageSourceWeb, "", "", 30)
insertNamed(db, "bob", "Bob", auth.UsageSourceWeb, "", "", 70)
insertNamed(db, "legacy-api-key", "API Key User", auth.UsageSourceLegacy, "", "", 10)
insertNamed(db, "alice", "Alice", auth.UsageSourceAPIKey, "k1", "ci-runner", 5)
_, totals, _, err := auth.GetAllUsageBySource(db, "month", "", "")
Expect(err).ToNot(HaveOccurred())
Expect(totals.ByUserSource).ToNot(BeEmpty())
for _, r := range totals.ByUserSource {
Expect(r.Source).ToNot(Equal(auth.UsageSourceAPIKey))
}
webByUser := map[string]int64{}
legacyByUser := map[string]int64{}
for _, r := range totals.ByUserSource {
switch r.Source {
case auth.UsageSourceWeb:
webByUser[r.UserID] = r.Tokens
case auth.UsageSourceLegacy:
legacyByUser[r.UserID] = r.Tokens
}
}
Expect(webByUser["alice"]).To(Equal(int64(30)))
Expect(webByUser["bob"]).To(Equal(int64(70)))
Expect(legacyByUser["legacy-api-key"]).To(Equal(int64(10)))
})
It("does NOT populate by_user_source in the non-admin path", func() {
db := testDB()
insertNamed(db, "alice", "Alice", auth.UsageSourceWeb, "", "", 30)
_, totals, err := auth.GetUserUsageBySource(db, "alice", "month")
Expect(err).ToNot(HaveOccurred())
// Non-admin path uses includeLegacy=false, so by_user_source stays nil.
Expect(totals.ByUserSource).To(BeNil())
})
})
})
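
The specs above pin down a grouping rule: one by_user_source row per (source, user) pair, with API-key traffic left out because it is already reported per key. A minimal sketch of that rule over in-memory rows follows. The `usageRow` type, the `sumByUserSource` helper, and the "web" source string are hypothetical stand-ins for illustration; the real query runs through gorm and is not shown in this diff.

```go
package main

import "fmt"

// usageRow is a hypothetical, simplified stand-in for auth.UsageRecord.
type usageRow struct {
	UserID string
	Source string
	Tokens int64
}

// userSourceKey identifies one (source, user) bucket.
type userSourceKey struct {
	Source string
	UserID string
}

// sourceAPIKey mirrors the "source=apikey" wording in the spec comment;
// the exact constant value in the auth package is an assumption.
const sourceAPIKey = "apikey"

// sumByUserSource adds up tokens per (source, user) pair and skips
// API-key rows, which the specs expect to be attributed per key instead.
func sumByUserSource(rows []usageRow) map[userSourceKey]int64 {
	out := map[userSourceKey]int64{}
	for _, r := range rows {
		if r.Source == sourceAPIKey {
			continue
		}
		out[userSourceKey{Source: r.Source, UserID: r.UserID}] += r.Tokens
	}
	return out
}

func main() {
	rows := []usageRow{
		{UserID: "alice", Source: "web", Tokens: 30},
		{UserID: "bob", Source: "web", Tokens: 70},
		{UserID: "alice", Source: sourceAPIKey, Tokens: 5},
	}
	for k, v := range sumByUserSource(rows) {
		fmt.Println(k.Source, k.UserID, v)
	}
}
```

Keying the map on a small struct rather than a concatenated string avoids collisions when a user ID happens to contain the separator.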


@@ -16,8 +16,11 @@ import (
"github.com/google/uuid"
"github.com/gorilla/websocket"
"github.com/labstack/echo/v4"
"github.com/mudler/LocalAI/core/config"
"github.com/mudler/LocalAI/core/gallery"
"github.com/mudler/LocalAI/core/http/auth"
"github.com/mudler/LocalAI/core/schema"
"github.com/mudler/LocalAI/core/services/galleryop"
"github.com/mudler/LocalAI/core/services/nodes"
"github.com/mudler/xlog"
"gorm.io/gorm"
@@ -381,14 +384,24 @@ func ResumeNodeEndpoint(registry *nodes.NodeRegistry) echo.HandlerFunc {
}
}
// InstallBackendOnNodeEndpoint triggers backend installation on a worker node via NATS.
// InstallBackendOnNodeEndpoint triggers backend installation on a worker node.
// Async: enqueues a ManagementOp on the gallery service channel and returns a
// jobID immediately. The gallery service worker goroutine drives the actual
// install via DistributedBackendManager.InstallBackend, which honors the op's
// TargetNodeID to scope the fan-out to one node. The UI polls /api/backends/job/:uid
// for progress, mirroring /api/backends/install/:id.
//
// Backend can be either a gallery ID (resolved against BackendGalleries) or a
// direct URI install (URI + Name + optional Alias) same shape as the
// direct URI install (URI + Name + optional Alias) - same shape as the
// standalone /api/backends/install-external path, just scoped to one node.
func InstallBackendOnNodeEndpoint(unloader nodes.NodeCommandSender) echo.HandlerFunc {
//
// The legacy unloader argument is retained for signature symmetry with
// DeleteBackendOnNodeEndpoint / ListBackendsOnNodeEndpoint but is no longer
// used here - the async path goes through galleryService.
func InstallBackendOnNodeEndpoint(_ nodes.NodeCommandSender, galleryService *galleryop.GalleryService, opcache *galleryop.OpCache, appConfig *config.ApplicationConfig) echo.HandlerFunc {
return func(c echo.Context) error {
if unloader == nil {
return c.JSON(http.StatusServiceUnavailable, nodeError(http.StatusServiceUnavailable, "NATS not configured"))
if galleryService == nil {
return c.JSON(http.StatusServiceUnavailable, nodeError(http.StatusServiceUnavailable, "gallery service not configured"))
}
nodeID := c.Param("id")
var req struct {
@@ -401,25 +414,65 @@ func InstallBackendOnNodeEndpoint(unloader nodes.NodeCommandSender) echo.Handler
if err := c.Bind(&req); err != nil {
return c.JSON(http.StatusBadRequest, nodeError(http.StatusBadRequest, "invalid request body"))
}
// Either a gallery backend name or a direct URI must be supplied.
if req.Backend == "" && req.URI == "" {
return c.JSON(http.StatusBadRequest, nodeError(http.StatusBadRequest, "backend name or uri required"))
}
// Admin-driven backend install: not tied to a specific replica slot
// (no model is being loaded). Pass replica 0 to match the worker's
// admin process-key convention (`backend#0`). The worker's fast path
// takes over if the backend is already running — upgrades go through
// the dedicated /api/backends/upgrade path on backend.upgrade.
reply, err := unloader.InstallBackend(nodeID, req.Backend, "", req.BackendGalleries, req.URI, req.Name, req.Alias, 0)
jobUUID, err := uuid.NewUUID()
if err != nil {
xlog.Error("Failed to install backend on node", "node", nodeID, "backend", req.Backend, "uri", req.URI, "error", err)
return c.JSON(http.StatusInternalServerError, nodeError(http.StatusInternalServerError, "failed to install backend on node"))
return c.JSON(http.StatusInternalServerError, nodeError(http.StatusInternalServerError, "failed to generate job id"))
}
if !reply.Success {
xlog.Error("Backend install failed on node", "node", nodeID, "backend", req.Backend, "uri", req.URI, "error", reply.Error)
return c.JSON(http.StatusInternalServerError, nodeError(http.StatusInternalServerError, "backend installation failed"))
jobID := jobUUID.String()
// Cache key: for gallery installs, use the backend slug; for URI
// installs prefer the provided Name (falling back to URI). All keys
// are node-scoped so concurrent installs of the same backend on
// different nodes do not stomp each other in opcache.
backendKey := req.Backend
if backendKey == "" {
backendKey = req.Name
if backendKey == "" {
backendKey = req.URI
}
}
return c.JSON(http.StatusOK, map[string]string{"message": "backend installed"})
cacheKey := galleryop.NodeScopedKey(nodeID, backendKey)
opcache.SetBackend(cacheKey, jobID)
// Optional caller-supplied galleries override. Mirrors the standalone
// install path so an admin can point at a private gallery.
galleries := appConfig.BackendGalleries
if req.BackendGalleries != "" {
var custom []config.Gallery
if err := json.Unmarshal([]byte(req.BackendGalleries), &custom); err != nil {
xlog.Warn("Ignoring malformed backend_galleries override; falling back to configured galleries", "error", err, "nodeID", nodeID)
} else if len(custom) > 0 {
galleries = custom
}
}
ctx, cancelFunc := context.WithCancel(context.Background())
op := galleryop.ManagementOp[gallery.GalleryBackend, any]{
ID: jobID,
GalleryElementName: req.Backend,
Galleries: galleries,
TargetNodeID: nodeID,
ExternalURI: req.URI,
ExternalName: req.Name,
ExternalAlias: req.Alias,
Context: ctx,
CancelFunc: cancelFunc,
}
galleryService.StoreCancellation(jobID, cancelFunc)
go func() {
galleryService.BackendGalleryChannel <- op
}()
xlog.Info("Node-scoped backend install dispatched", "node", nodeID, "backend", req.Backend, "uri", req.URI, "jobID", jobID)
return c.JSON(http.StatusAccepted, map[string]string{
"jobID": jobID,
"statusUrl": "/api/backends/job/" + jobID,
"message": "backend installation started",
})
}
}
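
The cache-key selection in the handler above can be read as a three-step fallback (gallery backend slug, then Name, then URI) followed by node scoping. The sketch below isolates that logic. `pickBackendKey` and `nodeScopedKey` are hypothetical names, and the "node:<id>:<key>" layout is an assumption: the real format produced by `galleryop.NodeScopedKey` is not shown in this diff.

```go
package main

import "fmt"

// pickBackendKey mirrors the fallback order used by the handler:
// gallery backend slug first, then the caller-supplied name, then the URI.
func pickBackendKey(backend, name, uri string) string {
	if backend != "" {
		return backend
	}
	if name != "" {
		return name
	}
	return uri
}

// nodeScopedKey is a hypothetical stand-in for galleryop.NodeScopedKey.
// The key layout here is assumed purely for illustration.
func nodeScopedKey(nodeID, key string) string {
	return "node:" + nodeID + ":" + key
}

func main() {
	// Gallery install: the backend slug is the key.
	fmt.Println(nodeScopedKey("node-xyz", pickBackendKey("llama-cpp", "", "")))
	// URI install with a name: the name is preferred over the URI.
	fmt.Println(nodeScopedKey("node-xyz", pickBackendKey("", "custom", "oci://example.com/custom-backend:v1")))
}
```

Because the node ID is part of the key, installing the same backend on two nodes yields two distinct opcache entries, which is the property the handler comment relies on.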


@@ -0,0 +1,123 @@
package localai_test
import (
"bytes"
"encoding/json"
"net/http"
"net/http/httptest"
"github.com/labstack/echo/v4"
. "github.com/onsi/ginkgo/v2"
. "github.com/onsi/gomega"
"github.com/mudler/LocalAI/core/config"
"github.com/mudler/LocalAI/core/gallery"
"github.com/mudler/LocalAI/core/http/endpoints/localai"
"github.com/mudler/LocalAI/core/services/galleryop"
)
// InstallBackendOnNodeEndpoint became async to stop blocking the browser on
// the 3-minute NATS reply timeout. These specs lock in the new contract:
// HTTP 202 with a jobID, a ManagementOp enqueued on the gallery channel, and
// an opcache entry keyed by NodeScopedKey so concurrent installs of the same
// backend on different nodes do not stomp each other.
var _ = Describe("InstallBackendOnNodeEndpoint async behavior", func() {
var (
e *echo.Echo
galleryService *galleryop.GalleryService
opcache *galleryop.OpCache
appCfg *config.ApplicationConfig
dispatched chan galleryop.ManagementOp[gallery.GalleryBackend, any]
done chan struct{}
drainExited chan struct{}
)
BeforeEach(func() {
e = echo.New()
appCfg = &config.ApplicationConfig{
BackendGalleries: []config.Gallery{{Name: "test-gallery", URL: "http://example.com"}},
}
galleryService = galleryop.NewGalleryService(appCfg, nil)
opcache = galleryop.NewOpCache(galleryService)
// Drain the gallery channel into a buffered side channel so the
// handler's `go func() { ch <- op }()` send does not block waiting
// for the real worker (which is not running in this unit test).
dispatched = make(chan galleryop.ManagementOp[gallery.GalleryBackend, any], 4)
done = make(chan struct{})
drainExited = make(chan struct{})
go func() {
defer close(drainExited)
for {
select {
case op := <-galleryService.BackendGalleryChannel:
dispatched <- op
case <-done:
return
}
}
}()
})
AfterEach(func() {
// Signal the drain goroutine to exit. We do NOT close
// BackendGalleryChannel: the handler's dispatch goroutine may still
// be pending (specs that don't Eventually-Receive), and a send on a
// closed channel panics. Signalling via `done` lets the drain
// goroutine return without touching the gallery channel.
close(done)
Eventually(drainExited, "2s").Should(BeClosed())
})
It("returns 202 with a jobID and dispatches a TargetNodeID-scoped op", func() {
body := `{"backend": "llama-cpp"}`
req := httptest.NewRequest(http.MethodPost, "/api/nodes/node-xyz/backends/install", bytes.NewBufferString(body))
req.Header.Set("Content-Type", "application/json")
rec := httptest.NewRecorder()
c := e.NewContext(req, rec)
c.SetParamNames("id")
c.SetParamValues("node-xyz")
handler := localai.InstallBackendOnNodeEndpoint(nil, galleryService, opcache, appCfg)
Expect(handler(c)).To(Succeed())
Expect(rec.Code).To(Equal(http.StatusAccepted))
var resp map[string]any
Expect(json.Unmarshal(rec.Body.Bytes(), &resp)).To(Succeed())
Expect(resp["jobID"]).To(BeAssignableToTypeOf(""))
Expect(resp["jobID"].(string)).ToNot(BeEmpty())
Expect(resp["message"]).To(Equal("backend installation started"))
Eventually(dispatched, "2s").Should(Receive())
Expect(opcache.Exists(galleryop.NodeScopedKey("node-xyz", "llama-cpp"))).To(BeTrue())
Expect(opcache.IsBackendOp(galleryop.NodeScopedKey("node-xyz", "llama-cpp"))).To(BeTrue())
})
It("returns 400 when neither backend nor uri is supplied", func() {
req := httptest.NewRequest(http.MethodPost, "/api/nodes/node-xyz/backends/install", bytes.NewBufferString(`{}`))
req.Header.Set("Content-Type", "application/json")
rec := httptest.NewRecorder()
c := e.NewContext(req, rec)
c.SetParamNames("id")
c.SetParamValues("node-xyz")
handler := localai.InstallBackendOnNodeEndpoint(nil, galleryService, opcache, appCfg)
Expect(handler(c)).To(Succeed())
Expect(rec.Code).To(Equal(http.StatusBadRequest))
})
It("accepts a direct URI install and uses the name as the cache key", func() {
body := `{"uri": "oci://example.com/custom-backend:v1", "name": "custom"}`
req := httptest.NewRequest(http.MethodPost, "/api/nodes/node-xyz/backends/install", bytes.NewBufferString(body))
req.Header.Set("Content-Type", "application/json")
rec := httptest.NewRecorder()
c := e.NewContext(req, rec)
c.SetParamNames("id")
c.SetParamValues("node-xyz")
handler := localai.InstallBackendOnNodeEndpoint(nil, galleryService, opcache, appCfg)
Expect(handler(c)).To(Succeed())
Expect(rec.Code).To(Equal(http.StatusAccepted))
Expect(opcache.Exists(galleryop.NodeScopedKey("node-xyz", "custom"))).To(BeTrue())
})
})
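
The BeforeEach/AfterEach pair above relies on a drain goroutine that is stopped through a separate `done` channel rather than by closing the channel it reads from. A stripped-down sketch of that pattern, with plain `int` payloads and a hypothetical `startDrain` helper in place of the gallery types:

```go
package main

import "fmt"

// startDrain forwards values from src to sink until done is closed.
// It returns a channel that is closed once the goroutine has exited,
// so callers can wait for a clean shutdown. src is never closed here:
// a late sender would panic on a closed channel, which is why shutdown
// is signalled out of band.
func startDrain(src <-chan int, sink chan<- int, done <-chan struct{}) <-chan struct{} {
	exited := make(chan struct{})
	go func() {
		defer close(exited)
		for {
			select {
			case v := <-src:
				sink <- v
			case <-done:
				return
			}
		}
	}()
	return exited
}

func main() {
	src := make(chan int)     // unbuffered, like the gallery channel in the test
	sink := make(chan int, 4) // buffered side channel the spec can inspect
	done := make(chan struct{})

	exited := startDrain(src, sink, done)
	src <- 7
	fmt.Println("forwarded:", <-sink)

	close(done) // signal shutdown without touching src
	<-exited
}
```

Note that the sketch shares one limitation with the test helper: if `sink` fills up, the drain goroutine blocks on the forward and will not observe `done` until a slot frees up.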


@@ -73,363 +73,6 @@ func mergeToolCallDeltas(existing []schema.ToolCall, deltas []schema.ToolCall) [
// @Success 200 {object} schema.OpenAIResponse "Response"
// @Router /v1/chat/completions [post]
func ChatEndpoint(cl *config.ModelConfigLoader, ml *model.ModelLoader, evaluator *templates.Evaluator, startupOptions *config.ApplicationConfig, natsClient mcpTools.MCPNATSClient, assistantHolder *mcpTools.LocalAIAssistantHolder) echo.HandlerFunc {
process := func(s string, req *schema.OpenAIRequest, config *config.ModelConfig, loader *model.ModelLoader, responses chan schema.OpenAIResponse, extraUsage bool, id string, created int) error {
initialMessage := schema.OpenAIResponse{
ID: id,
Created: created,
Model: req.Model, // we have to return what the user sent here, due to OpenAI spec.
Choices: []schema.Choice{{Delta: &schema.Message{Role: "assistant"}, Index: 0, FinishReason: nil}},
Object: "chat.completion.chunk",
}
responses <- initialMessage
// Detect if thinking token is already in prompt or template
// When UseTokenizerTemplate is enabled, predInput is empty, so we check the template
var template string
if config.TemplateConfig.UseTokenizerTemplate {
template = config.GetModelTemplate()
} else {
template = s
}
thinkingStartToken := reason.DetectThinkingStartToken(template, &config.ReasoningConfig)
extractor := reason.NewReasoningExtractor(thinkingStartToken, config.ReasoningConfig)
_, _, _, err := ComputeChoices(req, s, config, cl, startupOptions, loader, func(s string, c *[]schema.Choice) {}, func(s string, tokenUsage backend.TokenUsage) bool {
var reasoningDelta, contentDelta string
// Always keep the Go-side extractor in sync with raw tokens so it
// can serve as fallback for backends without an autoparser (e.g. vLLM).
goReasoning, goContent := extractor.ProcessToken(s)
// When C++ autoparser chat deltas are available, prefer them — they
// handle model-specific formats (Gemma 4, etc.) without Go-side tags.
// Otherwise fall back to Go-side extraction.
if tokenUsage.HasChatDeltaContent() {
rawReasoning, cd := tokenUsage.ChatDeltaReasoningAndContent()
contentDelta = cd
reasoningDelta = extractor.ProcessChatDeltaReasoning(rawReasoning)
} else {
reasoningDelta = goReasoning
contentDelta = goContent
}
usage := schema.OpenAIUsage{
PromptTokens: tokenUsage.Prompt,
CompletionTokens: tokenUsage.Completion,
TotalTokens: tokenUsage.Prompt + tokenUsage.Completion,
}
if extraUsage {
usage.TimingTokenGeneration = tokenUsage.TimingTokenGeneration
usage.TimingPromptProcessing = tokenUsage.TimingPromptProcessing
}
delta := &schema.Message{}
if contentDelta != "" {
delta.Content = &contentDelta
}
if reasoningDelta != "" {
delta.Reasoning = &reasoningDelta
}
// Usage rides as a struct field for the consumer to track the
// running cumulative — it is stripped before JSON marshal so the
// wire chunk stays spec-compliant (no `usage` on intermediate
// chunks). The dedicated trailer chunk (when include_usage=true)
// carries the final totals.
usageForChunk := usage
resp := schema.OpenAIResponse{
ID: id,
Created: created,
Model: req.Model, // we have to return what the user sent here, due to OpenAI spec.
Choices: []schema.Choice{{Delta: delta, Index: 0, FinishReason: nil}},
Object: "chat.completion.chunk",
Usage: &usageForChunk,
}
responses <- resp
return true
})
close(responses)
return err
}
processTools := func(noAction string, prompt string, req *schema.OpenAIRequest, config *config.ModelConfig, loader *model.ModelLoader, responses chan schema.OpenAIResponse, extraUsage bool, id string, created int, textContentToReturn *string) error {
// Detect if thinking token is already in prompt or template
var template string
if config.TemplateConfig.UseTokenizerTemplate {
template = config.GetModelTemplate()
} else {
template = prompt
}
thinkingStartToken := reason.DetectThinkingStartToken(template, &config.ReasoningConfig)
extractor := reason.NewReasoningExtractor(thinkingStartToken, config.ReasoningConfig)
result := ""
lastEmittedCount := 0
sentInitialRole := false
sentReasoning := false
hasChatDeltaToolCalls := false
hasChatDeltaContent := false
_, _, chatDeltas, err := ComputeChoices(req, prompt, config, cl, startupOptions, loader, func(s string, c *[]schema.Choice) {}, func(s string, usage backend.TokenUsage) bool {
result += s
// Track whether ChatDeltas from the C++ autoparser contain
// tool calls or content, so the retry decision can account for them.
for _, d := range usage.ChatDeltas {
if len(d.ToolCalls) > 0 {
hasChatDeltaToolCalls = true
}
if d.Content != "" {
hasChatDeltaContent = true
}
}
var reasoningDelta, contentDelta string
goReasoning, goContent := extractor.ProcessToken(s)
if usage.HasChatDeltaContent() {
rawReasoning, cd := usage.ChatDeltaReasoningAndContent()
contentDelta = cd
reasoningDelta = extractor.ProcessChatDeltaReasoning(rawReasoning)
} else {
reasoningDelta = goReasoning
contentDelta = goContent
}
// Emit reasoning deltas in their own SSE chunks before any tool-call chunks
// (OpenAI spec: reasoning and tool_calls never share a delta)
if reasoningDelta != "" {
responses <- schema.OpenAIResponse{
ID: id,
Created: created,
Model: req.Model,
Choices: []schema.Choice{{
Delta: &schema.Message{Reasoning: &reasoningDelta},
Index: 0,
}},
Object: "chat.completion.chunk",
}
sentReasoning = true
}
// Stream content deltas (cleaned of reasoning tags) while no tool calls
// have been detected. Once the incremental parser finds tool calls,
// content stops — per OpenAI spec, content and tool_calls don't mix.
if lastEmittedCount == 0 && contentDelta != "" {
if !sentInitialRole {
responses <- schema.OpenAIResponse{
ID: id, Created: created, Model: req.Model,
Choices: []schema.Choice{{Delta: &schema.Message{Role: "assistant"}, Index: 0}},
Object: "chat.completion.chunk",
}
sentInitialRole = true
}
responses <- schema.OpenAIResponse{
ID: id, Created: created, Model: req.Model,
Choices: []schema.Choice{{
Delta: &schema.Message{Content: &contentDelta},
Index: 0,
}},
Object: "chat.completion.chunk",
}
}
// Try incremental XML parsing for streaming support using iterative parser
// This allows emitting partial tool calls as they're being generated
cleanedResult := functions.CleanupLLMResult(result, config.FunctionsConfig)
// Determine XML format from config
var xmlFormat *functions.XMLToolCallFormat
if config.FunctionsConfig.XMLFormat != nil {
xmlFormat = config.FunctionsConfig.XMLFormat
} else if config.FunctionsConfig.XMLFormatPreset != "" {
xmlFormat = functions.GetXMLFormatPreset(config.FunctionsConfig.XMLFormatPreset)
}
// Use iterative parser for streaming (partial parsing enabled)
// Try XML parsing first
partialResults, parseErr := functions.ParseXMLIterative(cleanedResult, xmlFormat, true)
if parseErr == nil && len(partialResults) > 0 {
// Emit new XML tool calls that weren't emitted before
if len(partialResults) > lastEmittedCount {
for i := lastEmittedCount; i < len(partialResults); i++ {
toolCall := partialResults[i]
initialMessage := schema.OpenAIResponse{
ID: id,
Created: created,
Model: req.Model,
Choices: []schema.Choice{{
Delta: &schema.Message{
Role: "assistant",
ToolCalls: []schema.ToolCall{
{
Index: i,
ID: id,
Type: "function",
FunctionCall: schema.FunctionCall{
Name: toolCall.Name,
},
},
},
},
Index: 0,
FinishReason: nil,
}},
Object: "chat.completion.chunk",
}
select {
case responses <- initialMessage:
default:
}
}
lastEmittedCount = len(partialResults)
}
} else {
// Try JSON tool call parsing for streaming.
// Only emit NEW tool calls (same guard as XML parser above).
jsonResults, jsonErr := functions.ParseJSONIterative(cleanedResult, true)
if jsonErr == nil && len(jsonResults) > lastEmittedCount {
for i := lastEmittedCount; i < len(jsonResults); i++ {
jsonObj := jsonResults[i]
name, ok := jsonObj["name"].(string)
if !ok || name == "" {
continue
}
args := "{}"
if argsVal, ok := jsonObj["arguments"]; ok {
if argsStr, ok := argsVal.(string); ok {
args = argsStr
} else {
argsBytes, _ := json.Marshal(argsVal)
args = string(argsBytes)
}
}
initialMessage := schema.OpenAIResponse{
ID: id,
Created: created,
Model: req.Model,
Choices: []schema.Choice{{
Delta: &schema.Message{
Role: "assistant",
ToolCalls: []schema.ToolCall{
{
Index: i,
ID: id,
Type: "function",
FunctionCall: schema.FunctionCall{
Name: name,
Arguments: args,
},
},
},
},
Index: 0,
FinishReason: nil,
}},
Object: "chat.completion.chunk",
}
responses <- initialMessage
}
lastEmittedCount = len(jsonResults)
}
}
return true
},
func(attempt int) bool {
// After streaming completes: check if we got actionable content
cleaned := extractor.CleanedContent()
// Check for tool calls from chat deltas (will be re-checked after ComputeChoices,
// but we need to know here whether to retry).
// Also check ChatDelta flags — when the C++ autoparser is active,
// tool calls and content are delivered via ChatDeltas while the
// raw message is cleared. Without this check, we'd retry
// unnecessarily, losing valid results and concatenating output.
hasToolCalls := lastEmittedCount > 0 || hasChatDeltaToolCalls
hasContent := cleaned != "" || hasChatDeltaContent
if !hasContent && !hasToolCalls {
xlog.Warn("Streaming: backend produced only reasoning, retrying",
"reasoning_len", len(extractor.Reasoning()), "attempt", attempt+1)
extractor.ResetAndSuppressReasoning()
result = ""
lastEmittedCount = 0
sentInitialRole = false
hasChatDeltaToolCalls = false
hasChatDeltaContent = false
return true
}
return false
},
)
if err != nil {
return err
}
// Try using pre-parsed tool calls from C++ autoparser (chat deltas)
var functionResults []functions.FuncCallResults
var reasoning string
if deltaToolCalls := functions.ToolCallsFromChatDeltas(chatDeltas); len(deltaToolCalls) > 0 {
xlog.Debug("[ChatDeltas] Using pre-parsed tool calls from C++ autoparser", "count", len(deltaToolCalls))
functionResults = deltaToolCalls
// Use content/reasoning from deltas too
*textContentToReturn = functions.ContentFromChatDeltas(chatDeltas)
reasoning = functions.ReasoningFromChatDeltas(chatDeltas)
} else {
// Fallback: parse tool calls from raw text (no chat deltas from backend)
xlog.Debug("[ChatDeltas] no pre-parsed tool calls, falling back to Go-side text parsing")
reasoning = extractor.Reasoning()
cleanedResult := extractor.CleanedContent()
*textContentToReturn = functions.ParseTextContent(cleanedResult, config.FunctionsConfig)
cleanedResult = functions.CleanupLLMResult(cleanedResult, config.FunctionsConfig)
functionResults = functions.ParseFunctionCall(cleanedResult, config.FunctionsConfig)
}
xlog.Debug("[ChatDeltas] final tool call decision", "tool_calls", len(functionResults), "text_content", *textContentToReturn)
// noAction is a sentinel "just answer" pseudo-function — not a real
// tool call. Scan the whole slice rather than only index 0 so we
// don't drop a real tool call that happens to follow a noAction
// entry, and so the default branch isn't entered with only noAction
// entries to emit as tool_calls.
noActionToRun := !hasRealCall(functionResults, noAction)
switch {
case noActionToRun:
// Token-cumulative usage is communicated to the streaming
// consumer via the per-token callback's chunk struct (stripped
// before wire marshal). The final usage trailer — when the
// caller opted in with stream_options.include_usage — is built
// by the outer streaming loop, not here.
var result string
if !sentInitialRole {
var hqErr error
result, hqErr = handleQuestion(config, functionResults, extractor.CleanedContent(), prompt)
if hqErr != nil {
xlog.Error("error handling question", "error", hqErr)
return hqErr
}
}
for _, chunk := range buildNoActionFinalChunks(
id, req.Model, created,
sentInitialRole, sentReasoning,
result, reasoning,
) {
responses <- chunk
}
default:
for _, chunk := range buildDeferredToolCallChunks(
id, req.Model, created,
functionResults, lastEmittedCount,
sentInitialRole, *textContentToReturn,
sentReasoning, reasoning,
) {
responses <- chunk
}
}
close(responses)
return err
}
return func(c echo.Context) error {
var textContentToReturn string
id := uuid.New().String()
@@ -697,17 +340,19 @@ func ChatEndpoint(cl *config.ModelConfigLoader, ml *model.ModelLoader, evaluator
}
responses := make(chan schema.OpenAIResponse)
ended := make(chan error, 1)
ended := make(chan streamWorkerResult, 1)
go func() {
if !shouldUseFn {
ended <- process(predInput, input, config, ml, responses, extraUsage, id, created)
u, err := processStream(predInput, input, config, cl, startupOptions, ml, responses, id, created)
ended <- streamWorkerResult{usage: u, err: err}
} else {
ended <- processTools(noActionName, predInput, input, config, ml, responses, extraUsage, id, created, &textContentToReturn)
u, err := processStreamWithTools(noActionName, predInput, input, config, cl, startupOptions, ml, responses, id, created, &textContentToReturn)
ended <- streamWorkerResult{usage: u, err: err}
}
}()
usage := &schema.OpenAIUsage{}
var finalUsage backend.TokenUsage
toolsCalled := false
var collectedToolCalls []schema.ToolCall
var collectedContent string
@@ -725,13 +370,6 @@ func ChatEndpoint(cl *config.ModelConfigLoader, ml *model.ModelLoader, evaluator
xlog.Debug("No choices in the response, skipping")
continue
}
// Capture the running cumulative usage from this chunk
// (when present) so the include_usage trailer can carry
// the final totals. Usage is stripped before marshal
// below so the wire chunk stays spec-compliant.
if ev.Usage != nil {
usage = ev.Usage
}
if len(ev.Choices[0].Delta.ToolCalls) > 0 {
toolsCalled = true
// Collect and merge tool call deltas for MCP execution
@@ -747,11 +385,6 @@ func ChatEndpoint(cl *config.ModelConfigLoader, ml *model.ModelLoader, evaluator
collectedContent += *sp
}
}
// OpenAI streaming spec: intermediate chunks must NOT
// carry a `usage` field. Strip the tracking copy
// before marshalling — usage is delivered via the
// dedicated trailer chunk when include_usage=true.
ev.Usage = nil
respData, err := json.Marshal(ev)
if err != nil {
xlog.Debug("Failed to marshal response", "error", err)
@@ -766,15 +399,16 @@ func ChatEndpoint(cl *config.ModelConfigLoader, ml *model.ModelLoader, evaluator
return err
}
c.Response().Flush()
case err := <-ended:
if err == nil {
case res := <-ended:
if res.err == nil {
finalUsage = res.usage
break LOOP
}
xlog.Error("Stream ended with error", "error", err)
xlog.Error("Stream ended with error", "error", res.err)
errorResp := schema.ErrorResponse{
Error: &schema.APIError{
Message: err.Error(),
Message: res.err.Error(),
Type: "server_error",
Code: "server_error",
},
@@ -797,7 +431,10 @@ func ChatEndpoint(cl *config.ModelConfigLoader, ml *model.ModelLoader, evaluator
// still trying to send (e.g., after client disconnect). The goroutine
// calls close(responses) when done, which terminates the drain.
if input.Context.Err() != nil {
go func() { for range responses {} }()
go func() {
for range responses {
}
}()
<-ended
}
@@ -921,8 +558,16 @@ func ChatEndpoint(cl *config.ModelConfigLoader, ml *model.ModelLoader, evaluator
// Trailing usage chunk per OpenAI spec: emit only when the
// caller opted in via stream_options.include_usage. Shape:
// {"choices":[],"usage":{...},"object":"chat.completion.chunk",...}
if input.StreamOptions != nil && input.StreamOptions.IncludeUsage && usage != nil {
trailer := streamUsageTrailerJSON(id, input.Model, created, *usage)
//
// finalUsage is the authoritative TokenUsage returned by the
// worker function (processStream / processStreamWithTools) via
// the `ended` channel. The worker reads it from ComputeChoices'
// return value, which is the cumulative count produced by the
// backend over the whole prediction. Issue #9927 was caused by
// the tools-path worker not surfacing this value at all.
if input.StreamOptions != nil && input.StreamOptions.IncludeUsage {
trailerUsage := streamUsageFromTokenUsage(finalUsage, extraUsage)
trailer := streamUsageTrailerJSON(id, input.Model, created, trailerUsage)
_, _ = fmt.Fprintf(c.Response().Writer, "data: %s\n\n", trailer)
}
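
The change above moves the final usage off the `responses` channel and onto the `ended` channel. A minimal sketch of that hand-off, assuming simplified stand-in types (`tokenUsage`, `workerResult`, `runWorker`, and `stream` are all hypothetical names, and the real loop uses a `select` with error handling rather than a plain `range`):

```go
package main

import "fmt"

// tokenUsage is a simplified stand-in for backend.TokenUsage.
type tokenUsage struct {
	Prompt     int
	Completion int
}

// workerResult plays the role of streamWorkerResult: the worker's
// final usage and error travel together on the `ended` channel.
type workerResult struct {
	usage tokenUsage
	err   error
}

// runWorker streams chunks on responses, closes the channel when done,
// and returns the cumulative usage as a plain value.
// The counts here are made up for the example.
func runWorker(chunks []string, responses chan<- string) (tokenUsage, error) {
	defer close(responses)
	for _, c := range chunks {
		responses <- c
	}
	return tokenUsage{Prompt: 18, Completion: len(chunks)}, nil
}

// stream consumes wire chunks first, then reads the final usage once
// the worker has finished.
func stream(chunks []string) ([]string, tokenUsage, error) {
	responses := make(chan string)
	ended := make(chan workerResult, 1) // buffered so the worker never blocks on it
	go func() {
		u, err := runWorker(chunks, responses)
		ended <- workerResult{usage: u, err: err}
	}()

	var sent []string
	for c := range responses {
		sent = append(sent, c)
	}
	res := <-ended
	return sent, res.usage, res.err
}

func main() {
	sent, usage, err := stream([]string{"Hel", "lo", "!"})
	fmt.Println(sent, usage.Prompt, usage.Completion, err)
}
```

Keeping usage out of the chunk stream means no per-chunk field has to be stripped before marshalling, and the trailer emitter reads one value after the loop exits.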


@@ -4,10 +4,39 @@ import (
"encoding/json"
"fmt"
"github.com/mudler/LocalAI/core/backend"
"github.com/mudler/LocalAI/core/schema"
"github.com/mudler/LocalAI/pkg/functions"
)
// streamWorkerResult is what the streaming workers (processStream /
// processStreamWithTools) hand back to the outer ChatEndpoint loop
// through the `ended` channel.
// Threading the final TokenUsage here, instead of piggy-backing it on the
// `responses` SSE channel, keeps the SSE channel single-purpose (wire chunks)
// and gives the trailer emitter a plain Go value to read after LOOP exits.
// Fix for issue #9927: the previous tools-path worker never surfaced the
// cumulative token counts at all, so the include_usage trailer reported zeros.
type streamWorkerResult struct {
usage backend.TokenUsage
err error
}
// streamUsageFromTokenUsage converts the backend's cumulative TokenUsage into
// the OpenAI-spec OpenAIUsage shape used on the wire. `extraUsage` controls
// whether the non-standard timing fields are forwarded.
func streamUsageFromTokenUsage(usage backend.TokenUsage, extraUsage bool) schema.OpenAIUsage {
out := schema.OpenAIUsage{
PromptTokens: usage.Prompt,
CompletionTokens: usage.Completion,
TotalTokens: usage.Prompt + usage.Completion,
}
if extraUsage {
out.TimingTokenGeneration = usage.TimingTokenGeneration
out.TimingPromptProcessing = usage.TimingPromptProcessing
}
return out
}
// streamUsageTrailerJSON returns the bytes of the OpenAI-spec trailing usage
// chunk emitted in streaming completions when the request opts in via
// `stream_options.include_usage: true`. The shape is:


@@ -1,10 +1,14 @@
package openai
import (
"context"
"encoding/json"
"github.com/mudler/LocalAI/core/backend"
"github.com/mudler/LocalAI/core/config"
"github.com/mudler/LocalAI/core/schema"
"github.com/mudler/LocalAI/pkg/functions"
"github.com/mudler/LocalAI/pkg/model"
. "github.com/onsi/ginkgo/v2"
. "github.com/onsi/gomega"
)
@@ -152,6 +156,28 @@ var _ = Describe("streaming usage spec compliance", func() {
})
})
Describe("streamUsageFromTokenUsage", func() {
It("converts backend TokenUsage to schema OpenAIUsage", func() {
tu := backend.TokenUsage{Prompt: 18, Completion: 213}
u := streamUsageFromTokenUsage(tu, false)
Expect(u.PromptTokens).To(Equal(18))
Expect(u.CompletionTokens).To(Equal(213))
Expect(u.TotalTokens).To(Equal(231))
Expect(u.TimingTokenGeneration).To(BeZero())
Expect(u.TimingPromptProcessing).To(BeZero())
})
It("includes timings when extraUsage is true", func() {
tu := backend.TokenUsage{
Prompt: 10, Completion: 20,
TimingPromptProcessing: 0.5,
TimingTokenGeneration: 1.5,
}
u := streamUsageFromTokenUsage(tu, true)
Expect(u.TimingPromptProcessing).To(Equal(0.5))
Expect(u.TimingTokenGeneration).To(Equal(1.5))
})
})
Describe("OpenAIRequest.StreamOptions", func() {
It("parses stream_options.include_usage=true", func() {
body := []byte(`{
@@ -177,3 +203,160 @@ var _ = Describe("streaming usage spec compliance", func() {
})
})
})
// Functional regression coverage for issue #9927: the streaming workers
// must surface the cumulative TokenUsage returned by ComputeChoices to
// their caller. The earlier broken implementation discarded that value
// (`_, _, chatDeltas, err := ComputeChoices(...)`), dropping the counts
// on the floor, so the include_usage trailer always reported zeros when
// tools were enabled.
//
// These tests stub backend.ModelInferenceFunc so the worker exercises the
// real ComputeChoices → predFunc → LLMResponse pipeline. If a future change
// drops the TokenUsage somewhere along that path, the assertions on the
// returned value fail with a concrete count mismatch (e.g. 0 vs 213),
// not with a "function undefined" compile error.
var _ = Describe("streaming workers surface final TokenUsage (issue #9927)", func() {
var (
origInference modelInferenceFunc
appCfg *config.ApplicationConfig
)
BeforeEach(func() {
origInference = backend.ModelInferenceFunc
appCfg = config.NewApplicationConfig()
})
AfterEach(func() {
backend.ModelInferenceFunc = origInference
})
// mockBackendUsage installs a stub backend that yields one LLMResponse
// carrying the supplied TokenUsage. ComputeChoices' single-attempt path
// copies these counts into the value it returns to the worker.
mockBackendUsage := func(usage backend.TokenUsage, response string) {
backend.ModelInferenceFunc = func(
ctx context.Context, s string, messages schema.Messages,
images, videos, audios []string,
loader *model.ModelLoader, c *config.ModelConfig, cl *config.ModelConfigLoader,
o *config.ApplicationConfig,
tokenCallback func(string, backend.TokenUsage) bool,
tools, toolChoice string,
logprobs, topLogprobs *int,
logitBias map[string]float64,
metadata map[string]string,
) (func() (backend.LLMResponse, error), error) {
return func() (backend.LLMResponse, error) {
return backend.LLMResponse{
Response: response,
Usage: usage,
}, nil
}, nil
}
}
makeReq := func() *schema.OpenAIRequest {
ctx, cancel := context.WithCancel(context.Background())
req := &schema.OpenAIRequest{
Context: ctx,
Cancel: cancel,
}
req.Model = "test-model" // promoted from BasicModelRequest
return req
}
// drainResponses consumes everything the worker pushes onto the channel
// so the worker is never blocked on its send. The channel is unbuffered
// (matching production), so the drain goroutine must be running before
// the worker is called.
drainResponses := func(ch <-chan schema.OpenAIResponse) <-chan struct{} {
done := make(chan struct{})
go func() {
for range ch {
}
close(done)
}()
return done
}
Describe("processStream (no-tools path)", func() {
It("returns the cumulative TokenUsage produced by the backend", func() {
mockBackendUsage(backend.TokenUsage{Prompt: 18, Completion: 213}, "Hello there")
req := makeReq()
cfg := &config.ModelConfig{}
responses := make(chan schema.OpenAIResponse)
done := drainResponses(responses)
actual, err := processStream("prompt", req, cfg, nil, appCfg, nil, responses, "req-1", 0)
<-done
Expect(err).ToNot(HaveOccurred())
Expect(actual.Prompt).To(Equal(18),
"prompt tokens must round-trip from backend through processStream")
Expect(actual.Completion).To(Equal(213),
"completion tokens must round-trip from backend through processStream")
})
It("returns zero TokenUsage when the backend reports zero (negative control)", func() {
mockBackendUsage(backend.TokenUsage{}, "x")
req := makeReq()
cfg := &config.ModelConfig{}
responses := make(chan schema.OpenAIResponse)
done := drainResponses(responses)
actual, err := processStream("prompt", req, cfg, nil, appCfg, nil, responses, "req-1", 0)
<-done
Expect(err).ToNot(HaveOccurred())
Expect(actual.Prompt).To(BeZero())
Expect(actual.Completion).To(BeZero())
})
})
Describe("processStreamWithTools (tools path)", func() {
It("returns the cumulative TokenUsage produced by the backend", func() {
// This is the direct regression check for issue #9927: with tools
// enabled, the trailer was reporting {0,0,0} because the worker
// discarded ComputeChoices' second return value.
mockBackendUsage(backend.TokenUsage{Prompt: 18, Completion: 213}, "answer")
req := makeReq()
cfg := &config.ModelConfig{}
responses := make(chan schema.OpenAIResponse)
done := drainResponses(responses)
var textContent string
actual, err := processStreamWithTools("none", "prompt", req, cfg, nil, appCfg, nil, responses, "req-1", 0, &textContent)
<-done
Expect(err).ToNot(HaveOccurred())
Expect(actual.Prompt).To(Equal(18),
"prompt tokens must round-trip from backend through processStreamWithTools (issue #9927)")
Expect(actual.Completion).To(Equal(213),
"completion tokens must round-trip from backend through processStreamWithTools (issue #9927)")
})
It("forwards timing fields when the backend supplies them", func() {
mockBackendUsage(backend.TokenUsage{
Prompt: 10, Completion: 20,
TimingPromptProcessing: 0.5,
TimingTokenGeneration: 1.5,
}, "answer")
req := makeReq()
cfg := &config.ModelConfig{}
responses := make(chan schema.OpenAIResponse)
done := drainResponses(responses)
var textContent string
actual, err := processStreamWithTools("none", "prompt", req, cfg, nil, appCfg, nil, responses, "req-1", 0, &textContent)
<-done
Expect(err).ToNot(HaveOccurred())
Expect(actual.TimingPromptProcessing).To(Equal(0.5))
Expect(actual.TimingTokenGeneration).To(Equal(1.5))
})
})
})
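
The specs above rely on one ordering rule: the drain goroutine must already be reading before the worker runs, because the channel is unbuffered and every send blocks until it is received. A minimal, self-contained sketch of that pattern follows. The names `usage`, `worker`, `drain` and `runWorker` are hypothetical stand-ins for illustration only, not LocalAI identifiers.

```go
package main

import "fmt"

// usage is a hypothetical stand-in for backend.TokenUsage.
type usage struct {
	Prompt, Completion int
}

// worker mimics the contract of the streaming workers: it sends chunks on
// an unbuffered channel, closes the channel, and returns the final usage.
func worker(chunks []string, out chan<- string, final usage) usage {
	for _, c := range chunks {
		out <- c // blocks until a reader receives the chunk
	}
	close(out)
	return final
}

// drain starts consuming immediately and reports how many chunks it saw
// once the channel has been closed.
func drain(in <-chan string) <-chan int {
	done := make(chan int)
	go func() {
		n := 0
		for range in {
			n++
		}
		done <- n
	}()
	return done
}

// runWorker wires the two together in the order the specs use:
// start draining first, then call the worker, then wait for the drain.
func runWorker(chunks []string, final usage) (usage, int) {
	out := make(chan string) // unbuffered, as in the specs
	done := drain(out)
	got := worker(chunks, out, final)
	return got, <-done
}

func main() {
	got, n := runWorker([]string{"Hel", "lo"}, usage{Prompt: 18, Completion: 213})
	fmt.Println(got.Prompt, got.Completion, n) // prints "18 213 2"
}
```

Calling the worker before starting the drain would deadlock on the first send, which is why the specs create `done` before invoking the worker.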


@@ -0,0 +1,390 @@
package openai
import (
"encoding/json"
"github.com/mudler/LocalAI/core/backend"
"github.com/mudler/LocalAI/core/config"
"github.com/mudler/LocalAI/core/schema"
"github.com/mudler/LocalAI/pkg/functions"
"github.com/mudler/LocalAI/pkg/model"
reason "github.com/mudler/LocalAI/pkg/reasoning"
"github.com/mudler/xlog"
)
// processStream is the streaming worker for chat completions with no
// tool/function calling involved. It pushes SSE-shaped chunks onto
// `responses` and returns the authoritative cumulative TokenUsage from
// the prediction so the caller can populate the include_usage trailer
// without having to peek inside the chunks.
//
// The caller owns the `responses` channel and is expected to read from
// it while this function runs; processStream closes the channel before
// returning.
func processStream(
s string,
req *schema.OpenAIRequest,
cfg *config.ModelConfig,
cl *config.ModelConfigLoader,
startupOptions *config.ApplicationConfig,
loader *model.ModelLoader,
responses chan schema.OpenAIResponse,
id string,
created int,
) (backend.TokenUsage, error) {
responses <- schema.OpenAIResponse{
ID: id,
Created: created,
Model: req.Model, // we have to return what the user sent here, due to OpenAI spec.
Choices: []schema.Choice{{Delta: &schema.Message{Role: "assistant"}, Index: 0, FinishReason: nil}},
Object: "chat.completion.chunk",
}
// Detect whether the thinking start token is already present in the prompt or template.
// When UseTokenizerTemplate is enabled, the prompt (s) is empty, so we check the model template instead.
var template string
if cfg.TemplateConfig.UseTokenizerTemplate {
template = cfg.GetModelTemplate()
} else {
template = s
}
thinkingStartToken := reason.DetectThinkingStartToken(template, &cfg.ReasoningConfig)
extractor := reason.NewReasoningExtractor(thinkingStartToken, cfg.ReasoningConfig)
_, finalUsage, _, err := ComputeChoices(req, s, cfg, cl, startupOptions, loader, func(s string, c *[]schema.Choice) {}, func(s string, tokenUsage backend.TokenUsage) bool {
var reasoningDelta, contentDelta string
// Always keep the Go-side extractor in sync with raw tokens so it
// can serve as fallback for backends without an autoparser (e.g. vLLM).
goReasoning, goContent := extractor.ProcessToken(s)
// When C++ autoparser chat deltas are available, prefer them: they
// handle model-specific formats (Gemma 4, etc.) without Go-side tags.
// Otherwise fall back to Go-side extraction.
if tokenUsage.HasChatDeltaContent() {
rawReasoning, cd := tokenUsage.ChatDeltaReasoningAndContent()
contentDelta = cd
reasoningDelta = extractor.ProcessChatDeltaReasoning(rawReasoning)
} else {
reasoningDelta = goReasoning
contentDelta = goContent
}
delta := &schema.Message{}
if contentDelta != "" {
delta.Content = &contentDelta
}
if reasoningDelta != "" {
delta.Reasoning = &reasoningDelta
}
responses <- schema.OpenAIResponse{
ID: id,
Created: created,
Model: req.Model, // we have to return what the user sent here, due to OpenAI spec.
Choices: []schema.Choice{{Delta: delta, Index: 0, FinishReason: nil}},
Object: "chat.completion.chunk",
}
return true
})
close(responses)
return finalUsage, err
}
// processStreamWithTools is the streaming worker for chat completions
// with tools / function calling. Same contract as processStream: pushes
// chunks onto `responses`, closes the channel, returns the cumulative
// TokenUsage.
//
// Returning the TokenUsage as a normal Go value (rather than smuggling
// it on a sentinel chunk) is the fix for issue #9927 — the previous
// implementation discarded the value from ComputeChoices, so the
// include_usage trailer reported zeros whenever `tools` was in play.
func processStreamWithTools(
noAction string,
prompt string,
req *schema.OpenAIRequest,
cfg *config.ModelConfig,
cl *config.ModelConfigLoader,
startupOptions *config.ApplicationConfig,
loader *model.ModelLoader,
responses chan schema.OpenAIResponse,
id string,
created int,
textContentToReturn *string,
) (backend.TokenUsage, error) {
// Detect if thinking token is already in prompt or template
var template string
if cfg.TemplateConfig.UseTokenizerTemplate {
template = cfg.GetModelTemplate()
} else {
template = prompt
}
thinkingStartToken := reason.DetectThinkingStartToken(template, &cfg.ReasoningConfig)
extractor := reason.NewReasoningExtractor(thinkingStartToken, cfg.ReasoningConfig)
result := ""
lastEmittedCount := 0
sentInitialRole := false
sentReasoning := false
hasChatDeltaToolCalls := false
hasChatDeltaContent := false
_, finalUsage, chatDeltas, err := ComputeChoices(req, prompt, cfg, cl, startupOptions, loader, func(s string, c *[]schema.Choice) {}, func(s string, usage backend.TokenUsage) bool {
result += s
// Track whether ChatDeltas from the C++ autoparser contain
// tool calls or content, so the retry decision can account for them.
for _, d := range usage.ChatDeltas {
if len(d.ToolCalls) > 0 {
hasChatDeltaToolCalls = true
}
if d.Content != "" {
hasChatDeltaContent = true
}
}
var reasoningDelta, contentDelta string
goReasoning, goContent := extractor.ProcessToken(s)
if usage.HasChatDeltaContent() {
rawReasoning, cd := usage.ChatDeltaReasoningAndContent()
contentDelta = cd
reasoningDelta = extractor.ProcessChatDeltaReasoning(rawReasoning)
} else {
reasoningDelta = goReasoning
contentDelta = goContent
}
// Emit reasoning deltas in their own SSE chunks before any tool-call chunks
// (OpenAI spec: reasoning and tool_calls never share a delta)
if reasoningDelta != "" {
responses <- schema.OpenAIResponse{
ID: id,
Created: created,
Model: req.Model,
Choices: []schema.Choice{{
Delta: &schema.Message{Reasoning: &reasoningDelta},
Index: 0,
}},
Object: "chat.completion.chunk",
}
sentReasoning = true
}
// Stream content deltas (cleaned of reasoning tags) while no tool calls
// have been detected. Once the incremental parser finds tool calls,
// content stops: per OpenAI spec, content and tool_calls don't mix.
if lastEmittedCount == 0 && contentDelta != "" {
if !sentInitialRole {
responses <- schema.OpenAIResponse{
ID: id, Created: created, Model: req.Model,
Choices: []schema.Choice{{Delta: &schema.Message{Role: "assistant"}, Index: 0}},
Object: "chat.completion.chunk",
}
sentInitialRole = true
}
responses <- schema.OpenAIResponse{
ID: id, Created: created, Model: req.Model,
Choices: []schema.Choice{{
Delta: &schema.Message{Content: &contentDelta},
Index: 0,
}},
Object: "chat.completion.chunk",
}
}
// Try incremental XML parsing with the iterative parser so that partial
// tool calls can be emitted while they are still being generated.
cleanedResult := functions.CleanupLLMResult(result, cfg.FunctionsConfig)
// Determine XML format from config
var xmlFormat *functions.XMLToolCallFormat
if cfg.FunctionsConfig.XMLFormat != nil {
xmlFormat = cfg.FunctionsConfig.XMLFormat
} else if cfg.FunctionsConfig.XMLFormatPreset != "" {
xmlFormat = functions.GetXMLFormatPreset(cfg.FunctionsConfig.XMLFormatPreset)
}
// Use iterative parser for streaming (partial parsing enabled)
// Try XML parsing first
partialResults, parseErr := functions.ParseXMLIterative(cleanedResult, xmlFormat, true)
if parseErr == nil && len(partialResults) > 0 {
// Emit new XML tool calls that weren't emitted before
if len(partialResults) > lastEmittedCount {
for i := lastEmittedCount; i < len(partialResults); i++ {
toolCall := partialResults[i]
initialMessage := schema.OpenAIResponse{
ID: id,
Created: created,
Model: req.Model,
Choices: []schema.Choice{{
Delta: &schema.Message{
Role: "assistant",
ToolCalls: []schema.ToolCall{
{
Index: i,
ID: id,
Type: "function",
FunctionCall: schema.FunctionCall{
Name: toolCall.Name,
},
},
},
},
Index: 0,
FinishReason: nil,
}},
Object: "chat.completion.chunk",
}
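// Non-blocking send: if no reader is ready, this chunk is dropped
// rather than stalling the token callback.
select {
case responses <- initialMessage:
default:
}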
}
lastEmittedCount = len(partialResults)
}
} else {
// Try JSON tool call parsing for streaming.
// Only emit NEW tool calls (same guard as XML parser above).
jsonResults, jsonErr := functions.ParseJSONIterative(cleanedResult, true)
if jsonErr == nil && len(jsonResults) > lastEmittedCount {
for i := lastEmittedCount; i < len(jsonResults); i++ {
jsonObj := jsonResults[i]
name, ok := jsonObj["name"].(string)
if !ok || name == "" {
continue
}
args := "{}"
if argsVal, ok := jsonObj["arguments"]; ok {
if argsStr, ok := argsVal.(string); ok {
args = argsStr
} else {
argsBytes, _ := json.Marshal(argsVal)
args = string(argsBytes)
}
}
initialMessage := schema.OpenAIResponse{
ID: id,
Created: created,
Model: req.Model,
Choices: []schema.Choice{{
Delta: &schema.Message{
Role: "assistant",
ToolCalls: []schema.ToolCall{
{
Index: i,
ID: id,
Type: "function",
FunctionCall: schema.FunctionCall{
Name: name,
Arguments: args,
},
},
},
},
Index: 0,
FinishReason: nil,
}},
Object: "chat.completion.chunk",
}
responses <- initialMessage
}
lastEmittedCount = len(jsonResults)
}
}
return true
},
func(attempt int) bool {
// After streaming completes: check if we got actionable content
cleaned := extractor.CleanedContent()
// Check for tool calls from chat deltas (will be re-checked after ComputeChoices,
// but we need to know here whether to retry).
// Also check ChatDelta flags: when the C++ autoparser is active,
// tool calls and content are delivered via ChatDeltas while the
// raw message is cleared. Without this check, we'd retry
// unnecessarily, losing valid results and concatenating output.
hasToolCalls := lastEmittedCount > 0 || hasChatDeltaToolCalls
hasContent := cleaned != "" || hasChatDeltaContent
if !hasContent && !hasToolCalls {
xlog.Warn("Streaming: backend produced only reasoning, retrying",
"reasoning_len", len(extractor.Reasoning()), "attempt", attempt+1)
extractor.ResetAndSuppressReasoning()
result = ""
lastEmittedCount = 0
sentInitialRole = false
hasChatDeltaToolCalls = false
hasChatDeltaContent = false
return true
}
return false
},
)
if err != nil {
return finalUsage, err
}
// Try using pre-parsed tool calls from C++ autoparser (chat deltas)
var functionResults []functions.FuncCallResults
var reasoning string
if deltaToolCalls := functions.ToolCallsFromChatDeltas(chatDeltas); len(deltaToolCalls) > 0 {
xlog.Debug("[ChatDeltas] Using pre-parsed tool calls from C++ autoparser", "count", len(deltaToolCalls))
functionResults = deltaToolCalls
// Use content/reasoning from deltas too
*textContentToReturn = functions.ContentFromChatDeltas(chatDeltas)
reasoning = functions.ReasoningFromChatDeltas(chatDeltas)
} else {
// Fallback: parse tool calls from raw text (no chat deltas from backend)
xlog.Debug("[ChatDeltas] no pre-parsed tool calls, falling back to Go-side text parsing")
reasoning = extractor.Reasoning()
cleanedResult := extractor.CleanedContent()
*textContentToReturn = functions.ParseTextContent(cleanedResult, cfg.FunctionsConfig)
cleanedResult = functions.CleanupLLMResult(cleanedResult, cfg.FunctionsConfig)
functionResults = functions.ParseFunctionCall(cleanedResult, cfg.FunctionsConfig)
}
xlog.Debug("[ChatDeltas] final tool call decision", "tool_calls", len(functionResults), "text_content", *textContentToReturn)
// noAction is a sentinel "just answer" pseudo-function: not a real
// tool call. Scan the whole slice rather than only index 0 so we
// don't drop a real tool call that happens to follow a noAction
// entry, and so the default branch isn't entered with only noAction
// entries to emit as tool_calls.
noActionToRun := !hasRealCall(functionResults, noAction)
switch {
case noActionToRun:
// The final usage trailer (when the caller opted in with
// stream_options.include_usage) is built by the outer streaming
// loop from the TokenUsage this function returns, not from any
// chunk on the responses channel.
var result string
if !sentInitialRole {
var hqErr error
result, hqErr = handleQuestion(cfg, functionResults, extractor.CleanedContent(), prompt)
if hqErr != nil {
xlog.Error("error handling question", "error", hqErr)
return finalUsage, hqErr
}
}
for _, chunk := range buildNoActionFinalChunks(
id, req.Model, created,
sentInitialRole, sentReasoning,
result, reasoning,
) {
responses <- chunk
}
default:
for _, chunk := range buildDeferredToolCallChunks(
id, req.Model, created,
functionResults, lastEmittedCount,
sentInitialRole, *textContentToReturn,
sentReasoning, reasoning,
) {
responses <- chunk
}
}
close(responses)
return finalUsage, err
}
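
The token callback above re-parses the whole accumulated output on every token and uses `lastEmittedCount` so that only tool calls not seen before are emitted. The guard can be sketched in isolation as shown below. The `emitter` type and its string-based "chunks" are hypothetical simplifications, not the schema types used in the diff.

```go
package main

import "fmt"

// emitter remembers how many parsed tool calls were already sent, so that
// re-parsing the growing output never re-emits an earlier call.
type emitter struct {
	lastEmitted int
	sent        []string
}

// observe receives every call parsed so far and emits only the new tail.
func (e *emitter) observe(parsed []string) {
	if len(parsed) <= e.lastEmitted {
		return // nothing new since the previous token
	}
	for i := e.lastEmitted; i < len(parsed); i++ {
		e.sent = append(e.sent, fmt.Sprintf("%d:%s", i, parsed[i]))
	}
	e.lastEmitted = len(parsed)
}

func main() {
	e := &emitter{}
	e.observe([]string{"search"})          // first call appears
	e.observe([]string{"search"})          // same parse result, nothing emitted
	e.observe([]string{"search", "fetch"}) // only the second call is new
	fmt.Println(e.sent)                    // prints "[0:search 1:fetch]"
}
```

The retry callback in the diff resets `lastEmittedCount` to 0, which in this sketch would correspond to starting over with a fresh `emitter`.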


@@ -17,16 +17,20 @@ import (
)
type APIExchangeRequest struct {
Method string `json:"method"`
Path string `json:"path"`
Headers *http.Header `json:"headers"`
Body *[]byte `json:"body"`
Method string `json:"method"`
Path string `json:"path"`
Headers *http.Header `json:"headers"`
Body *[]byte `json:"body"`
BodyTruncated bool `json:"body_truncated,omitempty"`
BodyBytes int `json:"body_bytes,omitempty"` // original size before truncation
}
type APIExchangeResponse struct {
Status int `json:"status"`
Headers *http.Header `json:"headers"`
Body *[]byte `json:"body"`
Status int `json:"status"`
Headers *http.Header `json:"headers"`
Body *[]byte `json:"body"`
BodyTruncated bool `json:"body_truncated,omitempty"`
BodyBytes int `json:"body_bytes,omitempty"` // original size before truncation
}
type APIExchange struct {
@@ -66,11 +70,29 @@ var doInitializeTracing = sync.OnceFunc(func() {
type bodyWriter struct {
http.ResponseWriter
body *bytes.Buffer
body *bytes.Buffer
maxBytes int // 0 = unlimited capture
truncated bool
totalBytes int // bytes the upstream handler wrote, even past the cap
}
func (w *bodyWriter) Write(b []byte) (int, error) {
w.body.Write(b)
// Capture into the trace buffer up to maxBytes, then drop the overflow
// so a chatty endpoint can't grow the buffer without bound. The full
// payload still flows through to the real client below.
w.totalBytes += len(b)
if w.maxBytes <= 0 {
w.body.Write(b)
} else if remain := w.maxBytes - w.body.Len(); remain > 0 {
if remain >= len(b) {
w.body.Write(b)
} else {
w.body.Write(b[:remain])
w.truncated = true
}
} else {
w.truncated = true
}
return w.ResponseWriter.Write(b)
}
@@ -80,6 +102,20 @@ func (w *bodyWriter) Flush() {
}
}
// truncateForTrace returns a defensive copy of body capped at maxBytes,
// and a flag indicating whether the cap forced truncation. maxBytes <= 0
// disables the cap.
func truncateForTrace(body []byte, maxBytes int) ([]byte, bool) {
if maxBytes <= 0 || len(body) <= maxBytes {
out := make([]byte, len(body))
copy(out, body)
return out, false
}
out := make([]byte, maxBytes)
copy(out, body[:maxBytes])
return out, true
}
func initializeTracing(maxItems int) {
tracingMaxItems = maxItems
doInitializeTracing()
@@ -134,11 +170,18 @@ func TraceMiddleware(app *application.Application) echo.MiddlewareFunc {
startTime := time.Now()
// Cap captured payload size. Without this, /embeddings and
// streaming /chat/completions grow the in-memory buffer into the
// tens of MB, and the admin Traces UI then locks up because fetching
// the JSON dump takes longer than the 5s auto-refresh interval.
maxBodyBytes := app.ApplicationConfig().TracingMaxBodyBytes
// Wrap response writer to capture body
resBody := new(bytes.Buffer)
mw := &bodyWriter{
ResponseWriter: c.Response().Writer,
body: resBody,
maxBytes: maxBodyBytes,
}
c.Response().Writer = mw
@@ -159,8 +202,7 @@ func TraceMiddleware(app *application.Application) echo.MiddlewareFunc {
// via any heap-dump-style introspection, and tokens shouldn't
// outlive the request that carried them.
requestHeaders := redactSensitiveHeaders(c.Request().Header)
requestBody := make([]byte, len(body))
copy(requestBody, body)
requestBody, requestTruncated := truncateForTrace(body, maxBodyBytes)
responseHeaders := redactSensitiveHeaders(c.Response().Header())
responseBody := make([]byte, resBody.Len())
copy(responseBody, resBody.Bytes())
@@ -168,15 +210,19 @@ func TraceMiddleware(app *application.Application) echo.MiddlewareFunc {
Timestamp: startTime,
Duration: time.Since(startTime),
Request: APIExchangeRequest{
Method: c.Request().Method,
Path: c.Path(),
Headers: &requestHeaders,
Body: &requestBody,
Method: c.Request().Method,
Path: c.Path(),
Headers: &requestHeaders,
Body: &requestBody,
BodyTruncated: requestTruncated,
BodyBytes: len(body),
},
Response: APIExchangeResponse{
Status: status,
Headers: &responseHeaders,
Body: &responseBody,
Status: status,
Headers: &responseHeaders,
Body: &responseBody,
BodyTruncated: mw.truncated,
BodyBytes: mw.totalBytes,
},
}
if handlerErr != nil {


@@ -0,0 +1,116 @@
package middleware
import (
"bytes"
"net/http/httptest"
"strings"
. "github.com/onsi/ginkgo/v2"
. "github.com/onsi/gomega"
)
// The trace middleware copies request and response bodies into an in-memory
// buffer that backs the admin /api/traces endpoint. With no upper bound, a
// chatty workload (embeddings, large completions) easily produces a
// multi-MB response that locks the Traces UI in a loading state, because
// fetching and parsing the payload takes longer than the 5-second
// auto-refresh interval. These specs pin the capping contract so future
// refactors keep both the cap and the passthrough to the real client intact.
var _ = Describe("bodyWriter capping", func() {
It("captures the full body when maxBytes is 0 (unlimited)", func() {
downstream := httptest.NewRecorder()
buf := &bytes.Buffer{}
bw := &bodyWriter{ResponseWriter: downstream, body: buf, maxBytes: 0}
payload := []byte(strings.Repeat("x", 4096))
n, err := bw.Write(payload)
Expect(err).ToNot(HaveOccurred())
Expect(n).To(Equal(len(payload)))
Expect(buf.Len()).To(Equal(len(payload)))
Expect(downstream.Body.Len()).To(Equal(len(payload)))
Expect(bw.truncated).To(BeFalse())
})
It("stops appending to the trace buffer once maxBytes is reached but still forwards to the client", func() {
downstream := httptest.NewRecorder()
buf := &bytes.Buffer{}
bw := &bodyWriter{ResponseWriter: downstream, body: buf, maxBytes: 100}
payload := []byte(strings.Repeat("a", 250))
n, err := bw.Write(payload)
Expect(err).ToNot(HaveOccurred())
Expect(n).To(Equal(len(payload)), "Write must return the full byte count so callers see no short write")
Expect(buf.Len()).To(Equal(100), "trace buffer should hold exactly maxBytes")
Expect(downstream.Body.Len()).To(Equal(len(payload)), "client must still receive every byte")
Expect(bw.truncated).To(BeTrue())
})
It("handles a write that straddles the cap by keeping only the leading slice", func() {
downstream := httptest.NewRecorder()
buf := &bytes.Buffer{}
bw := &bodyWriter{ResponseWriter: downstream, body: buf, maxBytes: 10}
_, err := bw.Write([]byte("12345"))
Expect(err).ToNot(HaveOccurred())
Expect(bw.truncated).To(BeFalse())
_, err = bw.Write([]byte("67890ABCDE"))
Expect(err).ToNot(HaveOccurred())
Expect(buf.String()).To(Equal("1234567890"))
Expect(downstream.Body.String()).To(Equal("1234567890ABCDE"))
Expect(bw.truncated).To(BeTrue())
})
It("ignores further writes after the cap was already hit", func() {
downstream := httptest.NewRecorder()
buf := &bytes.Buffer{}
bw := &bodyWriter{ResponseWriter: downstream, body: buf, maxBytes: 4}
_, _ = bw.Write([]byte("AAAA"))
_, _ = bw.Write([]byte("BBBB"))
_, _ = bw.Write([]byte("CCCC"))
Expect(buf.String()).To(Equal("AAAA"))
Expect(downstream.Body.String()).To(Equal("AAAABBBBCCCC"))
Expect(bw.truncated).To(BeTrue())
})
})
var _ = Describe("truncateForTrace", func() {
It("returns the input unchanged when below the cap", func() {
in := []byte("hello")
out, truncated := truncateForTrace(in, 1024)
Expect(truncated).To(BeFalse())
Expect(out).To(Equal(in))
})
It("truncates when the input exceeds the cap and signals truncation", func() {
in := []byte(strings.Repeat("z", 200))
out, truncated := truncateForTrace(in, 64)
Expect(truncated).To(BeTrue())
Expect(out).To(HaveLen(64))
Expect(string(out)).To(Equal(strings.Repeat("z", 64)))
})
It("treats maxBytes <= 0 as unlimited (back-compat with current default)", func() {
in := []byte(strings.Repeat("q", 10_000))
out, truncated := truncateForTrace(in, 0)
Expect(truncated).To(BeFalse())
Expect(out).To(HaveLen(len(in)))
})
It("does not retain the caller's backing array (defensive copy)", func() {
in := []byte("abcdefghij")
out, truncated := truncateForTrace(in, 4)
Expect(truncated).To(BeTrue())
Expect(string(out)).To(Equal("abcd"))
// Mutating the source must not corrupt the trace copy.
in[0] = 'Z'
Expect(string(out)).To(Equal("abcd"))
})
})


@@ -4,6 +4,7 @@ import (
"bytes"
"encoding/json"
"sync"
"sync/atomic"
"time"
"github.com/labstack/echo/v4"
@@ -14,18 +15,37 @@ import (
const (
usageFlushInterval = 5 * time.Second
usageMaxPending = 5000
// usageMaxPending bounds the in-memory queue. Sized for bursty inference
// traffic on a self-hosted instance with a slow or unavailable DB.
usageMaxPending = 50000
)
// usageBatcher accumulates usage records and flushes them to the DB periodically.
type usageBatcher struct {
mu sync.Mutex
pending []*auth.UsageRecord
db *gorm.DB
mu sync.Mutex
pending []*auth.UsageRecord
db *gorm.DB
stop chan struct{}
done chan struct{}
stopOnce sync.Once
}
// droppedRecords counts records discarded because the in-memory queue was full.
// Used to rate-limit the warn log so a sustained outage doesn't flood it.
var droppedRecords atomic.Uint64
func (b *usageBatcher) add(r *auth.UsageRecord) {
b.mu.Lock()
if len(b.pending) >= usageMaxPending {
b.mu.Unlock()
// Rate-limit: one warn per 1024 drops keeps the log readable.
n := droppedRecords.Add(1)
if n&1023 == 1 {
xlog.Warn("usage batcher full, dropping record",
"cap", usageMaxPending, "total_dropped", n)
}
return
}
b.pending = append(b.pending, r)
b.mu.Unlock()
}
@@ -42,31 +62,102 @@ func (b *usageBatcher) flush() {
if err := b.db.Create(&batch).Error; err != nil {
xlog.Error("Failed to flush usage batch", "count", len(batch), "error", err)
// Re-queue failed records with a cap to avoid unbounded growth
// Cap-aware re-queue: prepend as much of the failed batch as fits
// alongside any records added concurrently with the failed write.
b.mu.Lock()
if len(b.pending) < usageMaxPending {
b.pending = append(batch, b.pending...)
room := usageMaxPending - len(b.pending)
if room > 0 {
if room > len(batch) {
room = len(batch)
}
b.pending = append(batch[:room], b.pending...)
}
b.mu.Unlock()
}
}
var batcher *usageBatcher
func (b *usageBatcher) run() {
defer close(b.done)
ticker := time.NewTicker(usageFlushInterval)
defer ticker.Stop()
for {
select {
case <-ticker.C:
b.flush()
case <-b.stop:
b.flush() // final drain
return
}
}
}
func (b *usageBatcher) shutdown() {
b.stopOnce.Do(func() {
close(b.stop)
<-b.done
})
}
// The package-level batcher is guarded by batcherMu so Init / Shutdown cycles
// (the test pattern) don't race against UsageMiddleware reads.
var (
batcherMu sync.RWMutex
batcher *usageBatcher
)
func currentBatcher() *usageBatcher {
batcherMu.RLock()
defer batcherMu.RUnlock()
return batcher
}
// InitUsageRecorder starts a background goroutine that periodically flushes
// accumulated usage records to the database.
// accumulated usage records to the database. Calling it more than once
// shuts down the previous batcher first so its goroutine doesn't leak.
func InitUsageRecorder(db *gorm.DB) {
if db == nil {
return
}
batcher = &usageBatcher{db: db}
go func() {
ticker := time.NewTicker(usageFlushInterval)
defer ticker.Stop()
for range ticker.C {
batcher.flush()
}
}()
batcherMu.Lock()
old := batcher
batcher = nil
batcherMu.Unlock()
if old != nil {
old.shutdown()
}
b := &usageBatcher{
db: db,
stop: make(chan struct{}),
done: make(chan struct{}),
}
batcherMu.Lock()
batcher = b
batcherMu.Unlock()
go b.run()
}
// ShutdownUsageRecorder stops the background flusher and synchronously drains
// pending records once. Safe to call multiple times. Not yet wired into the
// application lifecycle; intended for graceful process exit and tests.
func ShutdownUsageRecorder() {
batcherMu.Lock()
b := batcher
batcher = nil
batcherMu.Unlock()
if b != nil {
b.shutdown()
}
}
// FlushNow synchronously flushes any pending usage records. Intended for tests
// that need deterministic behaviour without waiting for the ticker.
func FlushNow() {
if b := currentBatcher(); b != nil {
b.flush()
}
}
// usageResponseBody is the minimal structure we need from the response JSON.
@@ -84,7 +175,8 @@ type usageResponseBody struct {
func UsageMiddleware(db *gorm.DB) echo.MiddlewareFunc {
return func(next echo.HandlerFunc) echo.HandlerFunc {
return func(c echo.Context) error {
if db == nil || batcher == nil {
b := currentBatcher()
if db == nil || b == nil {
return next(c)
}
@@ -149,9 +241,17 @@ func UsageMiddleware(db *gorm.DB) echo.MiddlewareFunc {
return handlerErr
}
source := auth.GetSource(c)
if source == "" {
// Auth disabled or unrecognised path: classify as web so the row is still
// bucketable rather than silently dropped from per-source aggregates.
source = auth.UsageSourceWeb
}
record := &auth.UsageRecord{
UserID: user.ID,
UserName: user.Name,
Source: source,
Model: resp.Model,
Endpoint: c.Request().URL.Path,
PromptTokens: resp.Usage.PromptTokens,
@@ -161,7 +261,13 @@ func UsageMiddleware(db *gorm.DB) echo.MiddlewareFunc {
CreatedAt: startTime,
}
batcher.add(record)
if key := auth.GetAPIKey(c); key != nil {
id := key.ID
record.APIKeyID = &id
record.APIKeyName = key.Name
}
b.add(record)
return handlerErr
}
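
The cap-aware re-queue in `flush` can be read as a small pure function: put back as much of the failed batch as fits under the cap, ahead of whatever was queued in the meantime. The sketch below uses `int` records and a hypothetical `requeue` helper. Unlike the diff, it builds a fresh slice instead of appending onto `batch[:room]`, which is a deliberate simplification so the result never shares a backing array with the failed batch.

```go
package main

import "fmt"

// requeue prepends as much of the failed batch as fits under maxPending,
// keeping every record that is already pending.
func requeue(failed, pending []int, maxPending int) []int {
	room := maxPending - len(pending)
	if room <= 0 {
		return pending // queue already full: the failed batch is dropped
	}
	if room > len(failed) {
		room = len(failed)
	}
	// Fresh slice so the result does not alias the failed batch.
	out := make([]int, 0, room+len(pending))
	out = append(out, failed[:room]...)
	return append(out, pending...)
}

func main() {
	// Cap of 3 with one record already pending leaves room for two.
	fmt.Println(requeue([]int{1, 2, 3, 4}, []int{9}, 3)) // prints "[1 2 9]"
}
```

Prepending keeps the oldest records first, so a later successful flush writes them in roughly arrival order.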


@@ -0,0 +1,140 @@
//go:build auth
package middleware_test
import (
"bytes"
"encoding/json"
"net/http"
"net/http/httptest"
"github.com/labstack/echo/v4"
"github.com/mudler/LocalAI/core/http/auth"
"github.com/mudler/LocalAI/core/http/middleware"
. "github.com/onsi/ginkgo/v2"
. "github.com/onsi/gomega"
"gorm.io/gorm"
)
// testAuthDB returns a fresh in-memory SQLite auth DB.
func testAuthDB() *gorm.DB {
db, err := auth.InitDB(":memory:")
if err != nil {
panic(err)
}
return db
}
var _ = Describe("UsageMiddleware", func() {
var (
e *echo.Echo
db *gorm.DB
)
BeforeEach(func() {
db = testAuthDB()
e = echo.New()
middleware.InitUsageRecorder(db)
})
AfterEach(func() {
middleware.ShutdownUsageRecorder()
})
okHandler := func(c echo.Context) error {
body, _ := json.Marshal(map[string]any{
"model": "gpt-4",
"usage": map[string]int{
"prompt_tokens": 10, "completion_tokens": 5, "total_tokens": 15,
},
})
c.Response().Header().Set("Content-Type", "application/json")
c.Response().WriteHeader(http.StatusOK)
_, _ = c.Response().Write(body)
return nil
}
// FlushNow drains pending records synchronously, replacing the 6s sleep
// that was previously needed to wait for the batcher's ticker.
flush := middleware.FlushNow
It("records source=web when auth_source is web", func() {
e.POST("/v1/chat/completions", okHandler, func(next echo.HandlerFunc) echo.HandlerFunc {
return func(c echo.Context) error {
c.Set("auth_user", &auth.User{ID: "alice", Name: "Alice"})
c.Set("auth_source", auth.UsageSourceWeb)
return next(c)
}
}, middleware.UsageMiddleware(db))
req := httptest.NewRequest("POST", "/v1/chat/completions", bytes.NewReader([]byte(`{}`)))
e.ServeHTTP(httptest.NewRecorder(), req)
flush()
var rec auth.UsageRecord
Expect(db.Where("user_id = ?", "alice").First(&rec).Error).To(Succeed())
Expect(rec.Source).To(Equal(auth.UsageSourceWeb))
Expect(rec.APIKeyID).To(BeNil())
Expect(rec.APIKeyName).To(BeEmpty())
})
It("records source=apikey with snapshotted name when auth_apikey is set", func() {
e.POST("/v1/chat/completions", okHandler, func(next echo.HandlerFunc) echo.HandlerFunc {
return func(c echo.Context) error {
c.Set("auth_user", &auth.User{ID: "alice", Name: "Alice"})
c.Set("auth_source", auth.UsageSourceAPIKey)
c.Set("auth_apikey", &auth.UserAPIKey{ID: "key-1", Name: "ci-runner"})
return next(c)
}
}, middleware.UsageMiddleware(db))
req := httptest.NewRequest("POST", "/v1/chat/completions", bytes.NewReader([]byte(`{}`)))
e.ServeHTTP(httptest.NewRecorder(), req)
flush()
var rec auth.UsageRecord
Expect(db.Where("user_id = ?", "alice").First(&rec).Error).To(Succeed())
Expect(rec.Source).To(Equal(auth.UsageSourceAPIKey))
Expect(rec.APIKeyID).ToNot(BeNil())
Expect(*rec.APIKeyID).To(Equal("key-1"))
Expect(rec.APIKeyName).To(Equal("ci-runner"))
})
It("FlushNow drains pending records synchronously", func() {
e.POST("/v1/chat/completions", okHandler, func(next echo.HandlerFunc) echo.HandlerFunc {
return func(c echo.Context) error {
c.Set("auth_user", &auth.User{ID: "carol", Name: "Carol"})
c.Set("auth_source", auth.UsageSourceWeb)
return next(c)
}
}, middleware.UsageMiddleware(db))
req := httptest.NewRequest("POST", "/v1/chat/completions", bytes.NewReader([]byte(`{}`)))
e.ServeHTTP(httptest.NewRecorder(), req)
// No sleep: FlushNow should drain immediately.
middleware.FlushNow()
var rec auth.UsageRecord
Expect(db.Where("user_id = ?", "carol").First(&rec).Error).To(Succeed())
Expect(rec.Source).To(Equal(auth.UsageSourceWeb))
})
It("falls back to source=web when auth_source is empty", func() {
e.POST("/v1/chat/completions", okHandler, func(next echo.HandlerFunc) echo.HandlerFunc {
return func(c echo.Context) error {
c.Set("auth_user", &auth.User{ID: "alice", Name: "Alice"})
// no auth_source set
return next(c)
}
}, middleware.UsageMiddleware(db))
req := httptest.NewRequest("POST", "/v1/chat/completions", bytes.NewReader([]byte(`{}`)))
e.ServeHTTP(httptest.NewRecorder(), req)
flush()
var rec auth.UsageRecord
Expect(db.Where("user_id = ?", "alice").First(&rec).Error).To(Succeed())
Expect(rec.Source).To(Equal(auth.UsageSourceWeb))
})
})

@@ -0,0 +1,143 @@
import { test, expect } from '@playwright/test'
// Regression coverage for issue #9904:
// - /api/operations was polled every 1s and *always* re-rendered the Chat
// page, even when the response was unchanged. The reconciliation would
// collapse any text selection inside an assistant message.
// - The copy button next to each assistant message used navigator.clipboard
// without any fallback, which is undefined when the page is served over
// plain http (non-secure context) from a remote host.
async function setupChatPage(page) {
await page.route('**/api/models/capabilities', (route) => {
route.fulfill({
contentType: 'application/json',
body: JSON.stringify({
data: [{ id: 'test-model', capabilities: ['FLAG_CHAT'] }],
}),
})
})
// Poll-tracking mock: count hits so the test can assert the hook is polling
// /api/operations roughly every 1s, and always return an empty list so its contents never change.
let operationsHits = 0
await page.route('**/api/operations', (route) => {
operationsHits++
route.fulfill({
contentType: 'application/json',
body: JSON.stringify({ operations: [] }),
})
})
await page.route('**/v1/chat/completions', (route) => {
// One short SSE stream so the chat finishes streaming quickly and we
// can interact with a stable assistant message.
const body = [
'data: {"choices":[{"delta":{"content":"Hello world this is a long assistant reply that we can try to select."},"index":0}]}\n\n',
'data: {"choices":[{"delta":{},"index":0,"finish_reason":"stop"}],"usage":{"prompt_tokens":1,"completion_tokens":1,"total_tokens":2}}\n\n',
'data: [DONE]\n\n',
].join('')
route.fulfill({
status: 200,
headers: { 'Content-Type': 'text/event-stream' },
body,
})
})
return { getOperationsHits: () => operationsHits }
}
test.describe('Chat - /api/operations polling (#9904)', () => {
test('text selection inside an assistant message survives polling', async ({ page }) => {
const { getOperationsHits } = await setupChatPage(page)
await page.goto('/app/chat')
await expect(page.getByRole('button', { name: 'test-model' })).toBeVisible({ timeout: 10_000 })
await page.locator('.chat-input').fill('Hi')
await page.locator('.chat-send-btn').click()
const assistantContent = page.locator('.chat-message-assistant .chat-message-content').first()
await expect(assistantContent).toContainText('Hello world', { timeout: 10_000 })
// Sanity check: the polling we're regressing against is actually firing.
await page.waitForTimeout(2_500)
expect(getOperationsHits()).toBeGreaterThan(1)
// Sanity check that the bug we're guarding against is structurally
// possible: count how many times the assistant content node gets
// *touched* by React (childList / characterData mutations) over a
// 3-second window. Before the fix, every poll re-rendered Chat and
// re-set dangerouslySetInnerHTML, triggering a mutation cascade that
// collapsed the user's text selection. After the fix, polling with
// identical contents must not mutate the DOM at all.
const mutationCount = await assistantContent.evaluate((el) => new Promise((resolve) => {
let count = 0
const obs = new MutationObserver((records) => { count += records.length })
obs.observe(el, { childList: true, subtree: true, characterData: true })
setTimeout(() => { obs.disconnect(); resolve(count) }, 3_000)
}))
expect(mutationCount).toBe(0)
// Same sanity check translated to a user-observable property: a
// programmatically created selection survives the polling window.
await assistantContent.evaluate((el) => {
const range = document.createRange()
range.selectNodeContents(el)
const sel = window.getSelection()
sel.removeAllRanges()
sel.addRange(range)
})
const initialSelection = await page.evaluate(() => window.getSelection().toString())
expect(initialSelection).toContain('Hello world')
await page.waitForTimeout(2_500)
const selectionAfterPolling = await page.evaluate(() => window.getSelection().toString())
expect(selectionAfterPolling).toBe(initialSelection)
})
})
test.describe('Chat - copy button (#9904)', () => {
test('copy button works when navigator.clipboard is unavailable (plain http)', async ({ page }) => {
await setupChatPage(page)
// Simulate a non-secure context: hide navigator.clipboard before any of
// our app code touches it. This mirrors what browsers do over plain
// http from a remote host.
await page.addInitScript(() => {
Object.defineProperty(window, 'isSecureContext', { value: false, configurable: true })
try {
Object.defineProperty(navigator, 'clipboard', { value: undefined, configurable: true })
} catch { /* some browsers refuse — the secure-context flag is enough */ }
})
await page.goto('/app/chat')
await expect(page.getByRole('button', { name: 'test-model' })).toBeVisible({ timeout: 10_000 })
await page.locator('.chat-input').fill('Hi')
await page.locator('.chat-send-btn').click()
const assistantBubble = page.locator('.chat-message-assistant .chat-message-bubble').first()
await expect(assistantBubble).toContainText('Hello world', { timeout: 10_000 })
// Spy on document.execCommand so we can confirm the fallback path ran.
await page.evaluate(() => {
window.__execCommandCalls = []
const original = document.execCommand?.bind(document)
document.execCommand = (cmd, ...rest) => {
window.__execCommandCalls.push(cmd)
// execCommand('copy') in a headless browser may return false because
// there is no real clipboard, but the fact that we tried is what we
// care about for this regression.
return original ? original(cmd, ...rest) : false
}
})
await assistantBubble.locator('.chat-message-actions button').first().click()
const execCommandCalls = await page.evaluate(() => window.__execCommandCalls)
expect(execCommandCalls).toContain('copy')
})
})
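
The copy-button test above only requires that `document.execCommand('copy')` is attempted when `navigator.clipboard` is unavailable. The `copyToClipboard` helper itself is not part of this excerpt, so the following is a minimal sketch of one way such a fallback could look, not the project's implementation. The name `copyWithFallback` and the injectable `env` parameter are assumptions made so the logic can run outside a browser.

```javascript
// Hypothetical sketch, NOT the project's utils/clipboard implementation.
// `env` stands in for the browser globals (navigator, document, isSecureContext).
async function copyWithFallback(text, env = globalThis) {
  const clipboard = env.navigator?.clipboard
  // Preferred path: async Clipboard API, only exposed in secure contexts.
  if (env.isSecureContext && clipboard?.writeText) {
    try {
      await clipboard.writeText(text)
      return true
    } catch {
      // Fall through to the legacy path below.
    }
  }
  // Legacy path: select a temporary textarea and ask the document to copy it.
  const doc = env.document
  if (!doc?.execCommand) return false
  const textarea = doc.createElement('textarea')
  textarea.value = text
  doc.body.appendChild(textarea)
  textarea.select()
  try {
    return doc.execCommand('copy') === true
  } catch {
    return false
  } finally {
    doc.body.removeChild(textarea)
  }
}
```

Returning a boolean instead of throwing lets callers choose between a success toast and a failure toast, which matches how the call sites in this diff use the `ok` result.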

@@ -97,7 +97,8 @@
},
"toasts": {
"selectModel": "Bitte wählen Sie ein Modell",
"copied": "In die Zwischenablage kopiert"
"copied": "In die Zwischenablage kopiert",
"copyFailed": "Kopieren in die Zwischenablage fehlgeschlagen"
},
"menu": {
"trigger": "Chats",

@@ -53,7 +53,30 @@
},
"usage": {
"title": "Usage",
"subtitle": "API token usage statistics"
"subtitle": "API token usage statistics",
"sources": {
"tab": "Sources",
"mixTitle": "Source mix",
"ribbonAria": "{{apikey}}% API keys, {{web}}% Web UI, {{legacy}}% Legacy",
"topSources": "Top sources over time",
"searchPlaceholder": "Search by name or prefix",
"sortBy": "Sort",
"sortTokens": "Tokens",
"sortRequests": "Requests",
"sortLastUsed": "Last used",
"sortName": "Name",
"sortUser": "User",
"webUI": "Web UI",
"legacy": "Legacy",
"revoked": "revoked",
"filteredTo": "Filtered to: {{name}}",
"clearFilter": "Clear filter",
"other": "Other ({{count}})",
"noTrafficShort": "No requests in this period.",
"noKeysYet": "Once requests come in, you'll see them broken down here.",
"createKey": "Create your first API key",
"truncatedWarning": "Showing top 200 keys. Apply a filter to narrow further."
}
},
"explorer": {
"title": "Explorer",

@@ -97,7 +97,8 @@
},
"toasts": {
"selectModel": "Please select a model",
"copied": "Copied to clipboard"
"copied": "Copied to clipboard",
"copyFailed": "Could not copy to clipboard"
},
"menu": {
"trigger": "Chats",

@@ -97,7 +97,8 @@
},
"toasts": {
"selectModel": "Por favor selecciona un modelo",
"copied": "Copiado al portapapeles"
"copied": "Copiado al portapapeles",
"copyFailed": "No se pudo copiar al portapapeles"
},
"menu": {
"trigger": "Chats",

@@ -97,7 +97,8 @@
},
"toasts": {
"selectModel": "Seleziona un modello",
"copied": "Copiato negli appunti"
"copied": "Copiato negli appunti",
"copyFailed": "Impossibile copiare negli appunti"
},
"menu": {
"trigger": "Chat",

@@ -97,7 +97,8 @@
},
"toasts": {
"selectModel": "请选择一个模型",
"copied": "已复制到剪贴板"
"copied": "已复制到剪贴板",
"copyFailed": "无法复制到剪贴板"
},
"menu": {
"trigger": "聊天",

@@ -2,6 +2,7 @@ import { useState, useEffect, useRef } from 'react'
import { renderMarkdown } from '../utils/markdown'
import { getArtifactIcon } from '../utils/artifacts'
import { safeHref } from '../utils/url'
import { copyToClipboard } from '../utils/clipboard'
import DOMPurify from 'dompurify'
import hljs from 'highlight.js'
@@ -23,11 +24,13 @@ export default function CanvasPanel({ artifacts, selectedId, onSelect, onClose }
}
}, [current, showPreview])
const handleCopy = () => {
const handleCopy = async () => {
const text = current.code || current.url || ''
navigator.clipboard.writeText(text)
setCopySuccess(true)
setTimeout(() => setCopySuccess(false), 2000)
const ok = await copyToClipboard(text)
if (ok) {
setCopySuccess(true)
setTimeout(() => setCopySuccess(false), 2000)
}
}
const handleDownload = () => {

@@ -1,7 +1,7 @@
import { useState, useMemo, useEffect, useRef } from 'react'
import Modal from './Modal'
import SearchableSelect from './SearchableSelect'
import { nodesApi } from '../utils/api'
import { nodesApi, backendsApi } from '../utils/api'
// NodeInstallPicker is the single multi-node install surface used both from
// the Backends gallery split-button and from the "Install on more nodes" `+`
@@ -240,6 +240,37 @@ export default function NodeInstallPicker({
}
const clearSelection = () => setSelected(new Set())
// pollJob resolves with { done: true, error?: string } once a single job
// completes, fails, or is cancelled. Bounded by a hard wall-clock cap so a
// stuck worker eventually surfaces in the UI as "Failed" instead of
// spinning forever.
const pollJob = (jobID) => new Promise((resolve) => {
const POLL_INTERVAL_MS = 1500
const HARD_CAP_MS = 6 * 60 * 1000 // 6 min - generous for a fresh worker download
const startedAt = Date.now()
const tick = async () => {
try {
const status = await backendsApi.getJob(jobID)
if (status?.completed) { resolve({ done: true }); return }
if (status?.error) { resolve({ done: true, error: status.error }); return }
if (status?.processed && !status?.completed) {
resolve({ done: true, error: status.error || 'install did not complete' })
return
}
} catch (err) {
resolve({ done: true, error: err?.message || 'polling failed' })
return
}
if (Date.now() - startedAt > HARD_CAP_MS) {
resolve({ done: true, error: 'timed out waiting for install to finish' })
return
}
setTimeout(tick, POLL_INTERVAL_MS)
}
tick()
})
const submit = async () => {
if (selected.size === 0 || submitting) return
if (counts.overrides > 0 && !showMismatchConfirm) {
@@ -255,38 +286,68 @@ export default function NodeInstallPicker({
return next
})
const results = await Promise.allSettled(ids.map(id =>
// Phase 1: dispatch all installs in parallel. Each POST returns immediately
// with { jobID } now that the handler is async.
const dispatchResults = await Promise.allSettled(ids.map(id =>
nodesApi.installBackend(id, effectiveBackendName)
.then(r => ({ id, ok: true, message: r?.message }))
.catch(err => ({ id, ok: false, error: err?.message || 'install failed' }))
.then(r => ({ id, ok: true, jobID: r?.jobID }))
.catch(err => ({ id, ok: false, error: err?.message || 'dispatch failed' }))
))
let successCount = 0, failCount = 0
setPerNode(prev => {
const next = { ...prev }
for (const r of results) {
if (r.status !== 'fulfilled') continue
const v = r.value
if (v.ok) {
next[v.id] = { status: 'done' }
successCount++
} else {
next[v.id] = { status: 'error', error: v.error }
failCount++
}
// Classify dispatch results synchronously OUTSIDE the setter. React may
// invoke a functional state updater more than once (StrictMode dev double
// invoke, concurrent rendering replay): building the jobs array inside
// the closure would duplicate entries and re-poll the same job.
const jobs = []
const dispatchPatch = {}
for (const r of dispatchResults) {
if (r.status !== 'fulfilled') continue
const v = r.value
if (v.ok && v.jobID) {
dispatchPatch[v.id] = { status: 'installing', jobID: v.jobID }
jobs.push({ nodeID: v.id, jobID: v.jobID })
} else {
dispatchPatch[v.id] = { status: 'error', error: v.error || 'dispatch failed' }
}
return next
}
setPerNode(prev => ({ ...prev, ...dispatchPatch }))
// Phase 2: poll each job. Promise.all resolves when the last job settles;
// each row's state flips as soon as its own pollJob resolves, via the setPerNode call below.
await Promise.all(jobs.map(async ({ nodeID, jobID }) => {
const result = await pollJob(jobID)
setPerNode(prev => {
const next = { ...prev }
if (result.error) {
next[nodeID] = { status: 'error', error: result.error, jobID }
} else {
next[nodeID] = { status: 'done', jobID }
}
return next
})
}))
// Phase 3: summary toast + onComplete. Read latest state via functional setter.
let successCount = 0
let failCount = 0
setPerNode(prev => {
for (const v of Object.values(prev)) {
if (v.status === 'done') successCount++
else if (v.status === 'error') failCount++
}
return prev
})
setSubmitting(false)
if (successCount > 0 && onComplete) onComplete()
if (failCount === 0) {
if (failCount === 0 && successCount > 0) {
addToast?.(`Installed on ${successCount} node${successCount === 1 ? '' : 's'}`, 'success')
setTimeout(() => onClose?.(), 800)
} else if (successCount === 0) {
} else if (successCount === 0 && failCount > 0) {
addToast?.(`Install failed on all ${failCount} node${failCount === 1 ? '' : 's'}`, 'error')
} else {
} else if (successCount > 0 && failCount > 0) {
addToast?.(`Installed on ${successCount}, failed on ${failCount}`, 'warning')
}
}
@@ -297,32 +358,58 @@ export default function NodeInstallPicker({
.map(([id]) => id)
if (failedIds.length === 0) return
setSelected(new Set(failedIds))
// Replace state for failed rows so they show "installing" again, not stale errors.
setPerNode(prev => {
const next = { ...prev }
failedIds.forEach(id => { next[id] = { status: 'installing' } })
return next
})
setSubmitting(true)
const results = await Promise.allSettled(failedIds.map(id =>
const dispatchResults = await Promise.allSettled(failedIds.map(id =>
nodesApi.installBackend(id, effectiveBackendName)
.then(r => ({ id, ok: true, message: r?.message }))
.catch(err => ({ id, ok: false, error: err?.message || 'install failed' }))
.then(r => ({ id, ok: true, jobID: r?.jobID }))
.catch(err => ({ id, ok: false, error: err?.message || 'dispatch failed' }))
))
// Same precaution as in submit(): classify outside the functional setter
// so a replayed updater can't push duplicate jobs into the polling list.
const jobs = []
const dispatchPatch = {}
for (const r of dispatchResults) {
if (r.status !== 'fulfilled') continue
const v = r.value
if (v.ok && v.jobID) {
dispatchPatch[v.id] = { status: 'installing', jobID: v.jobID }
jobs.push({ nodeID: v.id, jobID: v.jobID })
} else {
dispatchPatch[v.id] = { status: 'error', error: v.error || 'dispatch failed' }
}
}
setPerNode(prev => ({ ...prev, ...dispatchPatch }))
await Promise.all(jobs.map(async ({ nodeID, jobID }) => {
const result = await pollJob(jobID)
setPerNode(prev => {
const next = { ...prev }
if (result.error) next[nodeID] = { status: 'error', error: result.error, jobID }
else next[nodeID] = { status: 'done', jobID }
return next
})
}))
setSubmitting(false)
let successCount = 0, failCount = 0
setPerNode(prev => {
const next = { ...prev }
for (const r of results) {
if (r.status !== 'fulfilled') continue
const v = r.value
if (v.ok) { next[v.id] = { status: 'done' }; successCount++ }
else { next[v.id] = { status: 'error', error: v.error }; failCount++ }
for (const id of failedIds) {
const v = prev[id]
if (v?.status === 'done') successCount++
else if (v?.status === 'error') failCount++
}
return next
return prev
})
setSubmitting(false)
if (successCount > 0 && onComplete) onComplete()
if (failCount === 0) {
if (failCount === 0 && successCount > 0) {
addToast?.(`Installed on ${successCount} node${successCount === 1 ? '' : 's'}`, 'success')
setTimeout(() => onClose?.(), 800)
}
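
The classification and summary steps in this diff are pure data transformations, so they can be sketched and exercised without React. This is a minimal sketch under that assumption; `classifyDispatch` and `summarize` are hypothetical names that do not appear in the component, and the node and job IDs are made-up sample values.

```javascript
// Hypothetical pure helper mirroring the classification step in the diff:
// takes Promise.allSettled results and returns a state patch plus the jobs to poll.
function classifyDispatch(dispatchResults) {
  const jobs = []
  const patch = {}
  for (const r of dispatchResults) {
    if (r.status !== 'fulfilled') continue
    const v = r.value
    if (v.ok && v.jobID) {
      patch[v.id] = { status: 'installing', jobID: v.jobID }
      jobs.push({ nodeID: v.id, jobID: v.jobID })
    } else {
      patch[v.id] = { status: 'error', error: v.error || 'dispatch failed' }
    }
  }
  return { jobs, patch }
}

// Hypothetical summary helper: counts final per-node outcomes.
function summarize(perNode) {
  let successCount = 0
  let failCount = 0
  for (const v of Object.values(perNode)) {
    if (v.status === 'done') successCount++
    else if (v.status === 'error') failCount++
  }
  return { successCount, failCount }
}

const { jobs, patch } = classifyDispatch([
  { status: 'fulfilled', value: { id: 'node-1', ok: true, jobID: 'job-1' } },
  { status: 'fulfilled', value: { id: 'node-2', ok: false, error: 'unreachable' } },
])
const counts = summarize({ 'node-1': { status: 'done' }, 'node-2': patch['node-2'] })
```

Because both helpers return fresh values and touch no outer state, calling them twice with the same input gives the same result. That is the property the diff relies on when it keeps the `jobs` array out of the functional state updater.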

@@ -218,9 +218,15 @@ export function useChat(initialModel = '') {
})
userFiles.push({ name: file.name, type: 'audio' })
} else {
// Text/PDF files - append to content
userFiles.push({ name: file.name, type: 'file', content: file.textContent || '' })
}
// Text/PDF files - append to content
if (file.textContent) {
messageContent.push({
type: 'text',
text: `\n\n--- File: ${file.name} ---\n${file.textContent}\n--- End of ${file.name} ---`,
})
}
userFiles.push({ name: file.name, type: 'file', content: file.textContent || '' })
}
}
} else {
messageContent = content

@@ -2,6 +2,14 @@ import { useState, useEffect, useCallback, useRef } from 'react'
import { operationsApi } from '../utils/api'
import { useAuth } from '../context/AuthContext'
// Serialize ops into a stable comparison key. Each op is a flat map of
// primitives, so JSON.stringify is good enough and stable as long as the
// server emits keys in the same order (Go's encoding/json sorts map keys when
// marshalling, so the explicit map[string]any we build serializes deterministically).
function serializeOps(ops) {
return JSON.stringify(ops)
}
export function useOperations(pollInterval = 1000) {
const [operations, setOperations] = useState([])
const [loading, setLoading] = useState(true)
@@ -11,16 +19,26 @@ export function useOperations(pollInterval = 1000) {
const previousCountRef = useRef(0)
const onAllCompleteRef = useRef(null)
// Track the last payload we wrote into state. Each poll otherwise produces
// a fresh array reference even when nothing changed, and that re-render
// ripples into the Chat page — wiping the user's text selection mid-read
// (#9904).
const lastSerializedRef = useRef('[]')
const fetchOperations = useCallback(async () => {
if (!isAdmin) {
setLoading(false)
setLoading((prev) => (prev ? false : prev))
return
}
try {
const data = await operationsApi.list()
const ops = data?.operations || (Array.isArray(data) ? data : [])
setOperations(ops)
const serialized = serializeOps(ops)
if (serialized !== lastSerializedRef.current) {
lastSerializedRef.current = serialized
setOperations(ops)
}
// Separate active (non-failed) operations from failed ones
const activeOps = ops.filter(op => !op.error)
@@ -32,11 +50,11 @@ export function useOperations(pollInterval = 1000) {
}
previousCountRef.current = activeOps.length
setError(null)
setError((prev) => (prev === null ? prev : null))
} catch (err) {
setError(err.message)
setError((prev) => (prev === err.message ? prev : err.message))
} finally {
setLoading(false)
setLoading((prev) => (prev ? false : prev))
}
}, [isAdmin])
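
The change-detection idea in this hunk can be sketched outside React. This is a minimal sketch, assuming each poll result is JSON-serializable and arrives with a stable key order; `makeChangeGate` is a hypothetical name, not something in the codebase.

```javascript
// Hypothetical helper (not in the codebase): returns a function that reports
// whether a polled payload differs from the last payload that was accepted.
function makeChangeGate(initial = []) {
  let lastSerialized = JSON.stringify(initial)
  return function hasChanged(payload) {
    const serialized = JSON.stringify(payload)
    if (serialized === lastSerialized) return false
    lastSerialized = serialized
    return true
  }
}

const hasChanged = makeChangeGate()
const firstPoll = hasChanged([])                           // same as the initial value
const secondPoll = hasChanged([{ id: 'a', progress: 10 }]) // new contents
const thirdPoll = hasChanged([{ id: 'a', progress: 10 }])  // fresh array, same contents
```

Only a `true` result would lead to a state write, so an unchanged poll produces no new array reference and no re-render of consumers.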

@@ -9,6 +9,7 @@ import ResourceCards from '../components/ResourceCards'
import ConfirmDialog from '../components/ConfirmDialog'
import { useAgentChat } from '../hooks/useAgentChat'
import { relativeTime } from '../utils/format'
import { copyToClipboard } from '../utils/clipboard'
function getLastMessagePreview(conv) {
if (!conv.messages || conv.messages.length === 0) return ''
@@ -390,9 +391,13 @@ export default function AgentChat() {
}
}
const copyMessage = (content) => {
navigator.clipboard.writeText(content)
addToast('Copied to clipboard', 'success', 2000)
const copyMessage = async (content) => {
const ok = await copyToClipboard(content)
addToast(
ok ? 'Copied to clipboard' : 'Could not copy to clipboard',
ok ? 'success' : 'error',
ok ? 2000 : 3000,
)
}
const senderToRole = (sender) => {

@@ -179,16 +179,19 @@ export default function Backends() {
// Install a single gallery backend on a specific node, used in target-node
// mode (the URL has ?target=<node-id> set from the Nodes page entry point).
// The handler is async - we dispatch and let the global Operations panel
// surface progress; no need to await completion here.
const handleInstallOnTarget = async (id) => {
if (!targetNode) return
try {
await nodesApi.installBackend(targetNode.id, id)
addToast(`Installing ${id} on ${targetNode.name}`, 'info')
// Per-node install is request-reply, not part of the global jobs feed —
// refetch to reflect the new Nodes column state.
setTimeout(() => { fetchBackends(); refetchNodes() }, 600)
addToast(`Installing ${id} on ${targetNode.name}...`, 'info')
// The install runs async via the gallery job queue. Refetch shortly so
// the Nodes column reflects "installing" state; the Operations panel
// tracks the actual progress until completion.
setTimeout(() => { fetchBackends(); refetchNodes() }, 1200)
} catch (err) {
addToast(`Install failed on ${targetNode.name}: ${err.message}`, 'error')
addToast(`Install dispatch failed on ${targetNode.name}: ${err.message}`, 'error')
}
}

@@ -17,6 +17,7 @@ import ChatsMenu from '../components/ChatsMenu'
import { useAuth } from '../context/AuthContext'
import { useOperations } from '../hooks/useOperations'
import { relativeTime } from '../utils/format'
import { copyToClipboard } from '../utils/clipboard'
function getLastMessagePreview(chat) {
if (!chat.history || chat.history.length === 0) return ''
@@ -798,10 +799,14 @@ export default function Chat() {
}
}
const copyMessage = (content) => {
const copyMessage = async (content) => {
const text = typeof content === 'string' ? content : content?.[0]?.text || ''
navigator.clipboard.writeText(text)
addToast(t('toasts.copied'), 'success', 2000)
const ok = await copyToClipboard(text)
if (ok) {
addToast(t('toasts.copied'), 'success', 2000)
} else {
addToast(t('toasts.copyFailed'), 'error', 3000)
}
}
const contextPercent = getContextUsagePercent()

@@ -161,7 +161,11 @@ export default function Home() {
const newFiles = []
for (const file of fileList) {
const base64 = await fileToBase64(file)
newFiles.push({ name: file.name, type: file.type, base64 })
const entry = { name: file.name, type: file.type, base64 }
if (!file.type.startsWith('image/') && !file.type.startsWith('audio/')) {
entry.textContent = await file.text().catch(() => '')
}
newFiles.push(entry)
}
setter(prev => [...prev, ...newFiles])
}, [])

@@ -406,7 +406,15 @@ export default function Traces() {
<button className="btn btn-secondary btn-sm" onClick={fetchTraces}><i className="fas fa-rotate" /> Refresh</button>
<button className="btn btn-secondary btn-sm" onClick={handleExport} disabled={traces.length === 0}><i className="fas fa-download" /> Export</button>
<div style={{ flex: 1 }} />
<button className="btn btn-danger btn-sm" onClick={handleClear} disabled={traces.length === 0}><i className="fas fa-trash" /> Clear</button>
<button
className="btn btn-danger btn-sm"
onClick={handleClear}
/* Stay enabled while loading: a massive in-memory trace buffer is
precisely the case where the user can't see the table yet and
needs Clear to recover. Clearing an already-empty server-side
buffer is a harmless no-op. */
disabled={!loading && traces.length === 0}
><i className="fas fa-trash" /> Clear</button>
</div>
{settings && (() => {

@@ -4,6 +4,7 @@ import { useTranslation } from 'react-i18next'
import { useAuth } from '../context/AuthContext'
import { apiUrl } from '../utils/basePath'
import LoadingSpinner from '../components/LoadingSpinner'
import SourcesTab from './Usage/SourcesTab'
const PERIODS = [
{ key: 'day', label: 'Day' },
@@ -724,23 +725,27 @@ export default function Usage() {
{p.label}
</button>
))}
<div style={{ width: 1, height: 20, background: 'var(--color-border-subtle)', margin: '0 var(--spacing-xs)' }} />
<button
className={`btn btn-sm ${activeTab === 'models' ? 'btn-primary' : 'btn-secondary'}`}
onClick={() => setActiveTab('models')}
>
<i className="fas fa-cube" style={{ fontSize: '0.7rem' }} /> Models
</button>
{isAdmin && (
<>
<div style={{ width: 1, height: 20, background: 'var(--color-border-subtle)', margin: '0 var(--spacing-xs)' }} />
<button
className={`btn btn-sm ${activeTab === 'models' ? 'btn-primary' : 'btn-secondary'}`}
onClick={() => setActiveTab('models')}
>
<i className="fas fa-cube" style={{ fontSize: '0.7rem' }} /> Models
</button>
<button
className={`btn btn-sm ${activeTab === 'users' ? 'btn-primary' : 'btn-secondary'}`}
onClick={() => setActiveTab('users')}
>
<i className="fas fa-users" style={{ fontSize: '0.7rem' }} /> Users
</button>
</>
<button
className={`btn btn-sm ${activeTab === 'users' ? 'btn-primary' : 'btn-secondary'}`}
onClick={() => setActiveTab('users')}
>
<i className="fas fa-users" style={{ fontSize: '0.7rem' }} /> Users
</button>
)}
<button
className={`btn btn-sm ${activeTab === 'sources' ? 'btn-primary' : 'btn-secondary'}`}
onClick={() => setActiveTab('sources')}
>
<i className="fas fa-key" style={{ fontSize: '0.7rem' }} /> {t('usage.sources.tab')}
</button>
<div style={{ flex: 1 }} />
<button className="btn btn-secondary btn-sm" onClick={fetchUsage} disabled={loading} style={{ gap: 4 }}>
<i className={`fas fa-rotate${loading ? ' fa-spin' : ''}`} /> Refresh
@@ -884,6 +889,10 @@ export default function Usage() {
</div>
)
)}
{activeTab === 'sources' && (
<SourcesTab period={period} adminUserId={selectedUserId} />
)}
</>
)}
</div>

@@ -0,0 +1,83 @@
import { useTranslation } from 'react-i18next'
const SEGMENT_COLORS = {
apikey: 'var(--color-primary)',
web: 'var(--color-info, #3b82f6)',
legacy: 'var(--color-warning, #f59e0b)',
}
// SourceMixRibbon renders one segmented horizontal bar showing the share of
// tokens by source class (apikey / web / legacy). Clicking a segment invokes
// onSelectSourceClass with the segment key so the parent can filter the view.
//
// Props:
// bySource: { apikey?: {tokens, requests}, web?: {...}, legacy?: {...} }
// keyCount: number of distinct API keys in the dataset (for the legend)
// onSelectSourceClass: (cls: 'apikey'|'web'|'legacy') => void (optional)
export default function SourceMixRibbon({ bySource = {}, keyCount = 0, onSelectSourceClass }) {
const { t } = useTranslation('admin')
const apikey = (bySource.apikey?.tokens) || 0
const web = (bySource.web?.tokens) || 0
const legacy = (bySource.legacy?.tokens) || 0
const total = apikey + web + legacy || 1
const pct = (n) => Math.round((n / total) * 100)
const apiPct = pct(apikey)
const webPct = pct(web)
const legacyPct = pct(legacy)
const segments = [
{ key: 'apikey', label: `${apiPct}% API keys (${keyCount})`, pct: apiPct, color: SEGMENT_COLORS.apikey },
{ key: 'web', label: `${webPct}% ${t('usage.sources.webUI')}`, pct: webPct, color: SEGMENT_COLORS.web },
{ key: 'legacy', label: `${legacyPct}% ${t('usage.sources.legacy')}`, pct: legacyPct, color: SEGMENT_COLORS.legacy },
].filter((s) => s.pct > 0)
return (
<div
role="group"
aria-label={t('usage.sources.ribbonAria', { apikey: apiPct, web: webPct, legacy: legacyPct })}
style={{ display: 'flex', flexDirection: 'column', gap: 'var(--spacing-xs)' }}
>
<div style={{ fontSize: '0.875rem', fontWeight: 600, color: 'var(--color-text-primary)' }}>
{t('usage.sources.mixTitle')}
</div>
<div
style={{
display: 'flex',
height: 12,
borderRadius: 'var(--radius-sm)',
overflow: 'hidden',
border: '1px solid var(--color-border-subtle)',
}}
>
{segments.map((s) => (
<button
key={s.key}
type="button"
onClick={() => onSelectSourceClass?.(s.key)}
aria-label={s.label}
style={{
width: `${s.pct}%`,
background: s.color,
border: 'none',
padding: 0,
cursor: onSelectSourceClass ? 'pointer' : 'default',
}}
/>
))}
</div>
<div style={{ display: 'flex', flexWrap: 'wrap', gap: 'var(--spacing-sm)', fontSize: '0.75rem' }}>
{segments.map((s) => (
<span key={s.key} style={{ display: 'inline-flex', alignItems: 'center', gap: 6 }}>
<span
style={{ width: 10, height: 10, borderRadius: 2, background: s.color, display: 'inline-block' }}
aria-hidden
/>
{s.label}
</span>
))}
</div>
</div>
)
}
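
The share arithmetic used by the ribbon can be checked in isolation. This is a minimal sketch of the same calculation as a pure function; `sourceMix` is a hypothetical name and the token counts are made-up sample values.

```javascript
// Hypothetical pure helper: same arithmetic as the component, without React.
function sourceMix(bySource = {}) {
  const apikey = bySource.apikey?.tokens || 0
  const web = bySource.web?.tokens || 0
  const legacy = bySource.legacy?.tokens || 0
  const total = apikey + web + legacy || 1 // avoid dividing by zero when there is no traffic
  const pct = (n) => Math.round((n / total) * 100)
  return { apikey: pct(apikey), web: pct(web), legacy: pct(legacy) }
}

const even = sourceMix({ apikey: { tokens: 50 }, web: { tokens: 30 }, legacy: { tokens: 20 } })
const thirds = sourceMix({ apikey: { tokens: 1 }, web: { tokens: 1 }, legacy: { tokens: 1 } })
const empty = sourceMix({})
```

Each share is rounded independently, so the three values do not always sum to 100: the `thirds` case yields 33 for each source. A source that rounds to 0% is dropped by the component's `filter((s) => s.pct > 0)` step.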

@@ -0,0 +1,147 @@
import { useMemo } from 'react'
import { useTranslation } from 'react-i18next'
const TOP_N = 7
// Distinct, accessible-ish series colors that read on both light and dark themes.
const SERIES_COLORS = [
'var(--color-primary)',
'var(--color-success, #10b981)',
'var(--color-warning, #f59e0b)',
'var(--color-info, #3b82f6)',
'var(--color-danger, #ef4444)',
'#a855f7',
'#ec4899',
]
const OTHER_COLOR = 'var(--color-text-muted, #94a3b8)'
function identityFor(bucket) {
return bucket.api_key_id || bucket.source || 'unknown'
}
// buckets: UsageBucket[] from /api/auth/usage/sources (server-sorted ASC by bucket)
// selectedKey: 'web' | 'legacy' | api_key_id | null
// totals: SourceTotals (for the "Other (count)" legend label)
export default function SourceTimeChart({ buckets = [], selectedKey, totals }) {
const { t } = useTranslation('admin')
// Find the top-N identities by total tokens across the period.
const topIds = useMemo(() => {
const sums = new Map()
for (const b of buckets) {
const id = identityFor(b)
sums.set(id, (sums.get(id) || 0) + (b.total_tokens || 0))
}
return [...sums.entries()]
.sort((a, b) => b[1] - a[1])
.slice(0, TOP_N)
.map(([id]) => id)
}, [buckets])
const topSet = useMemo(() => new Set(topIds), [topIds])
// Resolve a display label for an identity (api_key_id -> snapshotted name, or source name).
const labelByIdentity = useMemo(() => {
const m = new Map()
for (const b of buckets) {
const id = identityFor(b)
if (m.has(id)) continue
if (b.source === 'web') { m.set(id, t('usage.sources.webUI')); continue }
if (b.source === 'legacy') { m.set(id, t('usage.sources.legacy')); continue }
m.set(id, b.api_key_name || b.api_key_id || id)
}
return m
}, [buckets, t])
// Build a dense per-bucket row, splitting top-N vs Other.
const series = useMemo(() => {
const byBucket = new Map()
for (const b of buckets) {
const id = identityFor(b)
const seriesId = topSet.has(id) ? id : '__other__'
const row = byBucket.get(b.bucket) || { bucket: b.bucket, total: 0 }
row[seriesId] = (row[seriesId] || 0) + (b.total_tokens || 0)
row.total += b.total_tokens || 0
byBucket.set(b.bucket, row)
}
return [...byBucket.values()]
}, [buckets, topSet])
const max = useMemo(
() => series.reduce((m, r) => Math.max(m, r.total), 0) || 1,
[series]
)
const seriesIds = [...topIds, '__other__']
const colorOf = (id) =>
id === '__other__'
? OTHER_COLOR
: SERIES_COLORS[topIds.indexOf(id) % SERIES_COLORS.length]
const labelOfId = (id) => {
if (id === '__other__') return null // computed inline (need count)
return labelByIdentity.get(id) || id
}
const otherCount = Math.max(0, (totals?.by_key?.length || 0) - TOP_N)
// SVG geometry: each bar occupies a 24px slot (20px bar + 4px gap), 100px tall; the viewBox stretches with bar count.
const barWidth = 20
const barGap = 4
const slotWidth = barWidth + barGap
const height = 100
const width = Math.max(series.length * slotWidth, 200)
return (
<div style={{ display: 'flex', flexDirection: 'column', gap: 'var(--spacing-xs)' }}>
<div style={{ fontSize: '0.875rem', fontWeight: 600, color: 'var(--color-text-primary)' }}>
{t('usage.sources.topSources')}
</div>
<svg
viewBox={`0 0 ${width} ${height}`}
preserveAspectRatio="none"
style={{ width: '100%', height: 160, display: 'block' }}
aria-hidden
>
{series.map((row, i) => {
let y = height
return (
<g key={row.bucket} transform={`translate(${i * slotWidth}, 0)`}>
{seriesIds.map(id => {
const v = row[id] || 0
if (!v) return null
const h = (v / max) * height
y -= h
const dim = selectedKey && selectedKey !== id ? 0.25 : 1
const title = id === '__other__'
? t('usage.sources.other', { count: otherCount })
: labelOfId(id)
return (
<rect
key={id}
x={barGap / 2} y={y}
width={barWidth} height={h}
fill={colorOf(id)} opacity={dim}
>
<title>{`${row.bucket} - ${title}: ${v.toLocaleString()}`}</title>
</rect>
)
})}
</g>
)
})}
</svg>
<div style={{ display: 'flex', flexWrap: 'wrap', gap: 'var(--spacing-sm)', fontSize: '0.75rem' }}>
{seriesIds.map(id => (
<span key={id} style={{ display: 'inline-flex', alignItems: 'center', gap: 6 }}>
<span style={{ width: 10, height: 10, borderRadius: 2, background: colorOf(id), display: 'inline-block' }} aria-hidden />
{id === '__other__'
? t('usage.sources.other', { count: otherCount })
: labelOfId(id)}
</span>
))}
</div>
</div>
)
}
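
The top-N selection that drives the chart series can be sketched as a pure function. This is a minimal sketch under the bucket shape used in the component (`api_key_id`, `source`, `total_tokens`); `topIdentities` is a hypothetical name and the sample buckets are made up for illustration.

```javascript
// Hypothetical pure helper mirroring the top-N selection in the component.
function topIdentities(buckets, topN) {
  const sums = new Map()
  for (const b of buckets) {
    // Same identity rule as identityFor(): prefer the key id, then the source class.
    const id = b.api_key_id || b.source || 'unknown'
    sums.set(id, (sums.get(id) || 0) + (b.total_tokens || 0))
  }
  return [...sums.entries()]
    .sort((a, b) => b[1] - a[1]) // descending by total tokens
    .slice(0, topN)
    .map(([id]) => id)
}

const sample = [
  { bucket: '2026-05-01', api_key_id: 'key-1', source: 'apikey', total_tokens: 100 },
  { bucket: '2026-05-01', source: 'web', total_tokens: 40 },
  { bucket: '2026-05-02', api_key_id: 'key-2', source: 'apikey', total_tokens: 70 },
  { bucket: '2026-05-02', source: 'web', total_tokens: 50 },
]
const top2 = topIdentities(sample, 2)
```

In this sample the web source totals 90 tokens across two buckets, which ranks it above `key-2` at 70. Any identity outside the returned list would be folded into the `__other__` series by the component.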

@@ -0,0 +1,176 @@
import { useEffect, useState } from 'react'
import { useTranslation } from 'react-i18next'
import { usageApi, apiKeysApi } from '../../utils/api'
import { useAuth } from '../../context/AuthContext'
import LoadingSpinner from '../../components/LoadingSpinner'
import SourceMixRibbon from './SourceMixRibbon'
import SourcesTable from './SourcesTable'
import SourceTimeChart from './SourceTimeChart'
const EMPTY_DATA = {
buckets: [],
totals: { by_source: {}, by_key: [], grand_total: { tokens: 0, requests: 0 } },
truncated: false,
}
// Resolve a human label for the currently selected key (web/legacy class or api_key_id).
function labelForSelected(totals, selectedKey, t) {
if (!selectedKey) return ''
if (selectedKey === 'web') return t('usage.sources.webUI')
if (selectedKey === 'legacy') return t('usage.sources.legacy')
const row = (totals?.by_key || []).find(k => k.api_key_id === selectedKey)
return row ? (row.api_key_name || selectedKey) : selectedKey
}
// SourcesTab fetches and renders per-source / per-API-key usage breakdown:
// the SourceMixRibbon, a drill-in filter chip, the SourceTimeChart, and the
// SourcesTable.
export default function SourcesTab({ period, adminUserId }) {
const { t } = useTranslation('admin')
const { isAdmin } = useAuth()
const [data, setData] = useState(EMPTY_DATA)
const [loading, setLoading] = useState(false)
const [error, setError] = useState(null)
const [selectedKey, setSelectedKey] = useState(null)
const [search, setSearch] = useState('')
const [sortKey, setSortKey] = useState('tokens')
// Pull the current set of API key ids so the table can mark unknown keys as
// revoked. null = "don't know yet" so the table won't dim live keys during
// the fetch or after a failure.
const [existingKeyIds, setExistingKeyIds] = useState(null)
useEffect(() => {
apiKeysApi
.list()
.then((resp) => {
const list = Array.isArray(resp) ? resp : (resp?.keys || [])
setExistingKeyIds(new Set(list.map((k) => k.id)))
})
.catch(() => { /* leave existingKeyIds null so revoked detection is skipped */ })
}, [])
useEffect(() => {
let cancelled = false
setLoading(true)
setError(null)
const p = isAdmin
? usageApi.getAdminSources(period, adminUserId)
: usageApi.getMySources(period)
p
.then((d) => { if (!cancelled) setData(d || EMPTY_DATA) })
.catch((e) => { if (!cancelled) setError(e) })
.finally(() => { if (!cancelled) setLoading(false) })
return () => { cancelled = true }
}, [isAdmin, period, adminUserId])
const totals = data.totals || EMPTY_DATA.totals
const buckets = data.buckets || EMPTY_DATA.buckets
const grandT = totals.grand_total || { tokens: 0, requests: 0 }
const truncated = data.truncated || false
const isEmpty = !loading && (grandT.tokens || 0) === 0 && (grandT.requests || 0) === 0
if (loading) {
return (
<div style={{ display: 'flex', justifyContent: 'center', padding: 'var(--spacing-xl)' }}>
<LoadingSpinner size="lg" />
</div>
)
}
if (error) {
return (
<div className="empty-state">
<div className="empty-state-icon"><i className="fas fa-triangle-exclamation" /></div>
<h2 className="empty-state-title">Failed to load</h2>
<p className="empty-state-text">{String(error.message || error)}</p>
</div>
)
}
if (isEmpty) {
return (
<div className="empty-state">
<div className="empty-state-icon"><i className="fas fa-key" /></div>
<h2 className="empty-state-title">{t('usage.sources.noTrafficShort')}</h2>
<p className="empty-state-text">{t('usage.sources.noKeysYet')}</p>
</div>
)
}
return (
<div style={{ display: 'flex', flexDirection: 'column', gap: 'var(--spacing-md)' }}>
<div className="card" style={{ padding: 'var(--spacing-md)' }}>
<SourceMixRibbon
bySource={totals.by_source}
keyCount={(totals.by_key || []).length}
onSelectSourceClass={(cls) => setSelectedKey(cls)}
/>
</div>
{selectedKey && (
<div style={{ display: 'flex', alignItems: 'center', gap: 'var(--spacing-xs)' }}>
<span
style={{
display: 'inline-flex',
alignItems: 'center',
gap: 'var(--spacing-xs)',
padding: 'calc(var(--spacing-xs) / 2) var(--spacing-sm)',
background: 'var(--color-bg-secondary)',
color: 'var(--color-text-primary)',
fontSize: '0.75rem',
borderRadius: 'var(--radius-sm)',
border: '1px solid var(--color-border-subtle)',
}}
>
<i className="fas fa-filter" style={{ fontSize: '0.6875rem', color: 'var(--color-text-muted)' }} aria-hidden />
{t('usage.sources.filteredTo', { name: labelForSelected(totals, selectedKey, t) })}
<button
type="button"
onClick={() => setSelectedKey(null)}
aria-label={t('usage.sources.clearFilter')}
style={{
appearance: 'none',
background: 'transparent',
border: 'none',
color: 'var(--color-text-muted)',
cursor: 'pointer',
padding: 0,
fontSize: '0.875rem',
lineHeight: 1,
}}
>
<i className="fas fa-xmark" />
</button>
</span>
</div>
)}
<div className="card" style={{ padding: 'var(--spacing-md)' }}>
<SourceTimeChart buckets={buckets} selectedKey={selectedKey} totals={totals} />
</div>
<div className="card" style={{ padding: 'var(--spacing-md)' }}>
<SourcesTable
totals={totals}
selectedKey={selectedKey}
onSelectKey={setSelectedKey}
search={search}
setSearch={setSearch}
sortKey={sortKey}
setSortKey={setSortKey}
existingKeyIds={existingKeyIds}
showUserColumn={isAdmin}
/>
</div>
{truncated && (
<div style={{ fontSize: '0.75rem', color: 'var(--color-warning)' }}>
{t('usage.sources.truncatedWarning')}
</div>
)}
</div>
)
}
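
Review note: SourcesTab defends against partial payloads by falling back to `EMPTY_DATA` piece by piece and treats "zero tokens and zero requests" as the empty state. The sketch below restates that defaulting as a pure function so the rules are easy to check in isolation. `normalizeSources` is a hypothetical name, and unlike the component it ignores the `loading` flag.

```javascript
const EMPTY_DATA = {
  buckets: [],
  totals: { by_source: {}, by_key: [], grand_total: { tokens: 0, requests: 0 } },
  truncated: false,
}

// Hypothetical helper (not in the diff) mirroring the defaults SourcesTab applies.
function normalizeSources(data) {
  const d = data || EMPTY_DATA
  const totals = d.totals || EMPTY_DATA.totals
  const grand = totals.grand_total || { tokens: 0, requests: 0 }
  return {
    totals,
    buckets: d.buckets || EMPTY_DATA.buckets,
    truncated: Boolean(d.truncated),
    // Empty only when both counters are zero or missing.
    isEmpty: (grand.tokens || 0) === 0 && (grand.requests || 0) === 0,
  }
}

console.log(normalizeSources(null).isEmpty)
console.log(normalizeSources({ totals: { grand_total: { tokens: 0, requests: 3 } } }).isEmpty)
```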

View File

@@ -0,0 +1,245 @@
import { useMemo } from 'react'
import { useTranslation } from 'react-i18next'
const SORT_FNS = {
tokens: (a, b) => (b.tokens || 0) - (a.tokens || 0),
requests: (a, b) => (b.requests || 0) - (a.requests || 0),
last_used: (a, b) => new Date(b.last_used || 0).getTime() - new Date(a.last_used || 0).getTime(),
name: (a, b) => (a.name || '').localeCompare(b.name || ''),
user: (a, b) => (a.userName || '').localeCompare(b.userName || ''),
}
function formatTokens(n) {
if (!n) return '0'
if (n >= 1_000_000) return (n / 1_000_000).toFixed(1) + 'M'
if (n >= 1_000) return (n / 1_000).toFixed(1) + 'k'
return String(n)
}
function formatRelative(iso) {
if (!iso) return '-'
const t = new Date(iso).getTime()
if (Number.isNaN(t) || t <= 0) return '-'
const diff = Date.now() - t
if (diff < 60_000) return 'just now'
if (diff < 3_600_000) return Math.floor(diff / 60_000) + 'm ago'
if (diff < 86_400_000) return Math.floor(diff / 3_600_000) + 'h ago'
return Math.floor(diff / 86_400_000) + 'd ago'
}
// SourcesTable is the searchable, sortable list of key totals plus pseudo-rows
// for the web UI and legacy (unkeyed) source classes. Clicking a row toggles
// its selection; the parent decides what to do with the selection (SourcesTab
// uses it to drive the filter chip and the time chart).
//
// Props:
// totals: SourceTotals payload (from /api/auth/usage/sources)
// selectedKey: currently-selected row id (api_key_id | 'web' | 'legacy' | null)
// onSelectKey: (id|null) => void
// search / setSearch: free-text filter state lifted to the parent
// sortKey / setSortKey: sort column state lifted to the parent
// existingKeyIds: Set<string> of current (non-revoked) api key ids, or null
// when the parent hasn't yet learned which keys exist. Null suppresses the
// revoked badge entirely so live keys aren't dimmed during the fetch or
// after a failure.
// showUserColumn: render the User column. Admin views set this true so the
// reader can attribute each key (and each Web UI row) to its owner.
export default function SourcesTable({
totals,
selectedKey,
onSelectKey,
search,
setSearch,
sortKey,
setSortKey,
existingKeyIds = null,
showUserColumn = false,
}) {
const { t } = useTranslation('admin')
const rows = useMemo(() => {
const named = (totals?.by_key || []).map((k) => ({
kind: 'apikey',
id: k.api_key_id,
name: k.api_key_name || k.api_key_id,
userID: k.user_id || '',
userName: k.user_name || '',
prefix: '',
tokens: k.tokens,
requests: k.requests,
last_used: k.last_used,
revoked: existingKeyIds != null && !existingKeyIds.has(k.api_key_id),
}))
// Pseudo-rows for sources that don't have a named key identity.
// In admin view (showUserColumn=true), prefer the per-user breakdown
// from totals.by_user_source so each user's Web UI / legacy traffic
// gets its own row. Otherwise fall back to the global by_source aggregate.
let unkeyed = []
if (showUserColumn && Array.isArray(totals?.by_user_source) && totals.by_user_source.length > 0) {
unkeyed = totals.by_user_source.map((r) => ({
kind: r.source,
id: r.source + ':' + (r.user_id || ''),
name: r.source === 'legacy' ? t('usage.sources.legacy') : t('usage.sources.webUI'),
userID: r.user_id || '',
userName: r.user_name || '',
prefix: '-',
tokens: r.tokens,
requests: r.requests,
}))
} else {
if (totals?.by_source?.web) {
unkeyed.push({
kind: 'web',
id: 'web',
name: t('usage.sources.webUI'),
userID: '',
userName: '',
prefix: '-',
tokens: totals.by_source.web.tokens,
requests: totals.by_source.web.requests,
})
}
if (totals?.by_source?.legacy) {
unkeyed.push({
kind: 'legacy',
id: 'legacy',
name: t('usage.sources.legacy'),
userID: '',
userName: '',
prefix: '-',
tokens: totals.by_source.legacy.tokens,
requests: totals.by_source.legacy.requests,
})
}
}
return [...named, ...unkeyed]
}, [totals, existingKeyIds, showUserColumn, t])
const filtered = useMemo(() => {
const q = (search || '').trim().toLowerCase()
const list = q
? rows.filter((r) =>
(r.name || '').toLowerCase().includes(q) ||
(r.prefix || '').toLowerCase().includes(q) ||
(r.userName || '').toLowerCase().includes(q) ||
(r.userID || '').toLowerCase().includes(q)
)
: rows
return [...list].sort(SORT_FNS[sortKey] || SORT_FNS.tokens)
}, [rows, search, sortKey])
const iconFor = (kind) =>
kind === 'apikey' ? 'fas fa-key' : kind === 'web' ? 'fas fa-globe' : 'fas fa-gear'
return (
<div style={{ display: 'flex', flexDirection: 'column', gap: 'var(--spacing-sm)' }}>
<div style={{ display: 'flex', alignItems: 'center', gap: 'var(--spacing-sm)', flexWrap: 'wrap' }}>
<input
type="search"
value={search}
onChange={(e) => setSearch(e.target.value)}
placeholder={t('usage.sources.searchPlaceholder')}
aria-label={t('usage.sources.searchPlaceholder')}
style={{
flex: '1 1 12rem',
minWidth: 160,
padding: 'var(--spacing-xs) var(--spacing-sm)',
border: '1px solid var(--color-border-subtle)',
borderRadius: 'var(--radius-sm)',
background: 'var(--color-bg-primary)',
color: 'var(--color-text-primary)',
}}
/>
<label style={{ display: 'inline-flex', alignItems: 'center', gap: 6, fontSize: '0.75rem' }}>
{t('usage.sources.sortBy')}:
<select
value={sortKey}
onChange={(e) => setSortKey(e.target.value)}
style={{
padding: 'calc(var(--spacing-xs) / 2) var(--spacing-xs)',
border: '1px solid var(--color-border-subtle)',
borderRadius: 'var(--radius-sm)',
background: 'var(--color-bg-primary)',
color: 'var(--color-text-primary)',
}}
>
<option value="tokens">{t('usage.sources.sortTokens')}</option>
<option value="requests">{t('usage.sources.sortRequests')}</option>
<option value="last_used">{t('usage.sources.sortLastUsed')}</option>
<option value="name">{t('usage.sources.sortName')}</option>
{showUserColumn && <option value="user">{t('usage.sources.sortUser')}</option>}
</select>
</label>
</div>
<div className="table-container">
<table className="table">
<thead>
<tr>
<th>{t('usage.sources.sortName')}</th>
{showUserColumn && <th style={{ width: 180 }}>{t('usage.sources.sortUser')}</th>}
<th style={{ width: 110 }}>Prefix</th>
<th style={{ width: 100, textAlign: 'right' }}>{t('usage.sources.sortRequests')}</th>
<th style={{ width: 100, textAlign: 'right' }}>{t('usage.sources.sortTokens')}</th>
<th style={{ width: 120, textAlign: 'right' }}>{t('usage.sources.sortLastUsed')}</th>
</tr>
</thead>
<tbody>
{filtered.map((r) => {
const isSel = selectedKey === r.id
return (
<tr
key={r.id}
onClick={() => onSelectKey?.(isSel ? null : r.id)}
style={{
cursor: 'pointer',
background: isSel ? 'var(--color-bg-secondary)' : undefined,
opacity: r.revoked ? 0.5 : 1,
}}
>
<td>
<span style={{ display: 'inline-flex', alignItems: 'center', gap: 8 }}>
<i
className={iconFor(r.kind)}
style={{ color: 'var(--color-text-muted)', fontSize: '0.8125rem' }}
/>
<span>{r.name}</span>
{r.revoked && (
<span
style={{
fontSize: '0.6875rem',
textTransform: 'uppercase',
color: 'var(--color-text-muted)',
}}
>
({t('usage.sources.revoked')})
</span>
)}
</span>
</td>
{showUserColumn && (
<td style={{ color: 'var(--color-text-secondary)', fontSize: '0.8125rem' }}>
{r.userName || r.userID || '-'}
</td>
)}
<td style={{ color: 'var(--color-text-muted)', fontSize: '0.75rem' }}>{r.prefix || '-'}</td>
<td style={{ textAlign: 'right', fontFamily: 'var(--font-mono)' }}>
{Number(r.requests || 0).toLocaleString()}
</td>
<td style={{ textAlign: 'right', fontFamily: 'var(--font-mono)' }}>
{formatTokens(r.tokens || 0)}
</td>
<td style={{ textAlign: 'right', fontSize: '0.75rem', color: 'var(--color-text-muted)' }}>
{formatRelative(r.last_used)}
</td>
</tr>
)
})}
</tbody>
</table>
</div>
</div>
)
}
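
Review note: the revoked badge depends on a three-state `existingKeyIds` prop: `null` means "not known yet", while a `Set` means "these keys currently exist". The sketch below isolates that rule. `markRevoked` is a hypothetical helper name; the expression inside it is the one used in the `rows` memo above.

```javascript
// Hypothetical helper (not in the diff) isolating the revoked rule.
function markRevoked(byKey, existingKeyIds) {
  return (byKey || []).map((k) => ({
    id: k.api_key_id,
    // null/undefined means the key list has not loaded (or failed to load),
    // so nothing is flagged; only a real Set can mark a key as revoked.
    revoked: existingKeyIds != null && !existingKeyIds.has(k.api_key_id),
  }))
}

const usage = [{ api_key_id: 'k1' }, { api_key_id: 'k2' }]
const unknown = markRevoked(usage, null)
const known = markRevoked(usage, new Set(['k1']))
console.log(unknown, known)
```

Treating "unknown" as "not revoked" errs on the side of not dimming live keys, which matches the intent stated in the prop documentation.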

View File

@@ -422,6 +422,14 @@ export const usageApi = {
if (userId) url += `&user_id=${encodeURIComponent(userId)}`
return fetchJSON(url)
},
getMySources: (period) =>
fetchJSON(`/api/auth/usage/sources?period=${period || 'month'}`),
getAdminSources: (period, userId, apiKeyId) => {
let url = `/api/auth/admin/usage/sources?period=${period || 'month'}`
if (userId) url += `&user_id=${encodeURIComponent(userId)}`
if (apiKeyId) url += `&api_key_id=${encodeURIComponent(apiKeyId)}`
return fetchJSON(url)
},
getMyQuotas: () => fetchJSON('/api/auth/quota'),
}
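
Review note: the two helpers above build the query string by hand and leave `period` unencoded. A possible alternative, shown here only as a sketch, is to let `URLSearchParams` do the encoding for every parameter. `buildAdminSourcesUrl` is a hypothetical name; the path and parameter names are taken from the diff.

```javascript
// Hypothetical alternative (not in the diff): same URL shape, but every
// value is encoded by URLSearchParams, including `period`.
function buildAdminSourcesUrl(period, userId, apiKeyId) {
  const params = new URLSearchParams({ period: period || 'month' })
  if (userId) params.set('user_id', userId)
  if (apiKeyId) params.set('api_key_id', apiKeyId)
  return `/api/auth/admin/usage/sources?${params.toString()}`
}

console.log(buildAdminSourcesUrl())
console.log(buildAdminSourcesUrl('week', 'u1', 'k/2'))
```

One difference to keep in mind: `URLSearchParams` serializes a space as `+`, whereas `encodeURIComponent` produces `%20`. Both are commonly decoded the same way by server-side query parsers, but that is worth confirming for the server in use.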

View File

@@ -0,0 +1,81 @@
// Clipboard helper that works in non-secure contexts.
//
// navigator.clipboard is only defined on https:// origins and on
// http://localhost. When LocalAI is served over plain http from a remote
// host (LXC + Docker is a common deployment), every page that called
// `navigator.clipboard.writeText` silently failed (#9904). This helper
// transparently falls back to a hidden-textarea + execCommand('copy')
// trick that browsers still honour when the page is not a secure context.
//
// Returns true on success, false on failure. Callers should use the return
// value to drive the success/failure toast — the old code always claimed
// success regardless of what actually happened.
export async function copyToClipboard(text) {
if (text == null) return false
const value = typeof text === 'string' ? text : String(text)
if (typeof navigator !== 'undefined' && navigator.clipboard?.writeText && window.isSecureContext) {
try {
await navigator.clipboard.writeText(value)
return true
} catch {
// Permissions denied, browser refused, etc. — try the fallback.
}
}
return legacyCopy(value)
}
function legacyCopy(value) {
if (typeof document === 'undefined') return false
const ta = document.createElement('textarea')
ta.value = value
// Keep the textarea out of the viewport and out of layout reads. Using
// `position: fixed` + a negative offset avoids scrolling the page when
// we call .select() below.
ta.setAttribute('readonly', '')
ta.style.position = 'fixed'
ta.style.top = '0'
ta.style.left = '-9999px'
ta.style.opacity = '0'
document.body.appendChild(ta)
// Preserve the current selection so triggering execCommand doesn't blow
// away whatever the user had highlighted on the page.
const previousSelection = saveSelection()
let ok = false
try {
ta.select()
ta.setSelectionRange(0, value.length)
ok = document.execCommand('copy')
} catch {
ok = false
} finally {
document.body.removeChild(ta)
restoreSelection(previousSelection)
}
return ok
}
function saveSelection() {
try {
const sel = window.getSelection()
if (!sel || sel.rangeCount === 0) return null
const ranges = []
for (let i = 0; i < sel.rangeCount; i++) ranges.push(sel.getRangeAt(i).cloneRange())
return ranges
} catch {
return null
}
}
function restoreSelection(ranges) {
if (!ranges) return
try {
const sel = window.getSelection()
if (!sel) return
sel.removeAllRanges()
for (const r of ranges) sel.addRange(r)
} catch {
// best-effort
}
}
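
Review note: the first decision `copyToClipboard` makes can be summarized as a small table: use the async Clipboard API only when it exists and the page is a secure context, otherwise use the `execCommand('copy')` path if a document is available. The sketch below models that choice as a pure function so it can be read without a browser. `pickCopyStrategy` and its `env` fields are hypothetical, and the sketch leaves out the runtime fallback that happens when `writeText` rejects.

```javascript
// Hypothetical model (not in the diff) of the initial branch taken by
// copyToClipboard. `env` is a plain object standing in for browser state.
function pickCopyStrategy(env) {
  if (env.hasClipboardApi && env.isSecureContext) return 'clipboard-api'
  if (env.hasDocument) return 'exec-command'
  return 'unavailable' // e.g. server-side rendering: no document at all
}

// Plain http from a remote host: the API is not usable, so fall back.
console.log(pickCopyStrategy({ hasClipboardApi: false, isSecureContext: false, hasDocument: true }))
// https or localhost: the async API is preferred.
console.log(pickCopyStrategy({ hasClipboardApi: true, isSecureContext: true, hasDocument: true }))
```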

View File

@@ -789,6 +789,30 @@ func RegisterAuthRoutes(e *echo.Echo, app *application.Application) {
})
})
// GET /api/auth/usage/sources - caller's per-source breakdown (no legacy)
e.GET("/api/auth/usage/sources", func(c echo.Context) error {
user := auth.GetUser(c)
if user == nil {
return c.JSON(http.StatusUnauthorized, map[string]string{"error": "not authenticated"})
}
period := c.QueryParam("period")
if period == "" {
period = "month"
}
buckets, totals, err := auth.GetUserUsageBySource(db, user.ID, period)
if err != nil {
return c.JSON(http.StatusInternalServerError, map[string]string{"error": "failed to get usage"})
}
return c.JSON(http.StatusOK, map[string]any{
"buckets": buckets,
"totals": totals,
"truncated": false,
})
})
// Admin endpoints
adminMw := auth.RequireAdmin()
@@ -1104,6 +1128,27 @@ func RegisterAuthRoutes(e *echo.Echo, app *application.Application) {
})
}, adminMw)
// GET /api/auth/admin/usage/sources - all users' per-source breakdown (admin only)
e.GET("/api/auth/admin/usage/sources", func(c echo.Context) error {
period := c.QueryParam("period")
if period == "" {
period = "month"
}
userID := c.QueryParam("user_id")
apiKeyID := c.QueryParam("api_key_id")
buckets, totals, truncated, err := auth.GetAllUsageBySource(db, period, userID, apiKeyID)
if err != nil {
return c.JSON(http.StatusInternalServerError, map[string]string{"error": "failed to get usage"})
}
return c.JSON(http.StatusOK, map[string]any{
"buckets": buckets,
"totals": totals,
"truncated": truncated,
})
}, adminMw)
// --- Invite management endpoints ---
// POST /api/auth/admin/invites - create invite (admin only)

View File

@@ -286,6 +286,45 @@ func newTestAuthApp(db *gorm.DB, appConfig *config.ApplicationConfig) *echo.Echo
return c.JSON(http.StatusOK, map[string]string{"message": "user deleted"})
}, adminMw)
// Mirror of production handler in routes/auth.go GET /api/auth/usage/sources.
// Keep this body in sync with the real handler; this test app cannot call
// RegisterAuthRoutes because it needs a *application.Application.
e.GET("/api/auth/usage/sources", func(c echo.Context) error {
user := auth.GetUser(c)
if user == nil {
return c.JSON(http.StatusUnauthorized, map[string]string{"error": "not authenticated"})
}
period := c.QueryParam("period")
if period == "" {
period = "month"
}
buckets, totals, err := auth.GetUserUsageBySource(db, user.ID, period)
if err != nil {
return c.JSON(http.StatusInternalServerError, map[string]string{"error": "failed to get usage"})
}
return c.JSON(http.StatusOK, map[string]any{
"buckets": buckets, "totals": totals, "truncated": false,
})
})
// Mirror of production handler in routes/auth.go GET /api/auth/admin/usage/sources.
// Keep this body in sync with the real handler.
e.GET("/api/auth/admin/usage/sources", func(c echo.Context) error {
period := c.QueryParam("period")
if period == "" {
period = "month"
}
userID := c.QueryParam("user_id")
apiKeyID := c.QueryParam("api_key_id")
buckets, totals, truncated, err := auth.GetAllUsageBySource(db, period, userID, apiKeyID)
if err != nil {
return c.JSON(http.StatusInternalServerError, map[string]string{"error": "failed to get usage"})
}
return c.JSON(http.StatusOK, map[string]any{
"buckets": buckets, "totals": totals, "truncated": truncated,
})
}, adminMw)
// Regular API endpoint for testing
e.POST("/v1/chat/completions", func(c echo.Context) error {
return c.String(http.StatusOK, "ok")
@@ -931,4 +970,110 @@ var _ = Describe("Auth Routes", Label("auth"), func() {
Expect(providers).To(ContainElement(auth.ProviderGitHub))
})
})
Describe("GET /api/auth/usage/sources", func() {
It("returns only the caller's data, never legacy", func() {
app := newTestAuthApp(db, appConfig)
alice := createRouteTestUser(db, "alice@example.com", auth.RoleUser)
aliceToken, err := auth.CreateSession(db, alice.ID, "")
Expect(err).ToNot(HaveOccurred())
keyID := "k-alice"
now := time.Now()
Expect(auth.RecordUsage(db, &auth.UsageRecord{
UserID: alice.ID, Source: auth.UsageSourceAPIKey,
APIKeyID: &keyID, APIKeyName: "alice-key",
Model: "gpt-4", TotalTokens: 100, CreatedAt: now,
})).To(Succeed())
Expect(auth.RecordUsage(db, &auth.UsageRecord{
UserID: alice.ID, Source: auth.UsageSourceWeb,
Model: "gpt-4", TotalTokens: 50, CreatedAt: now,
})).To(Succeed())
Expect(auth.RecordUsage(db, &auth.UsageRecord{
UserID: "legacy-api-key", Source: auth.UsageSourceLegacy,
Model: "gpt-4", TotalTokens: 30, CreatedAt: now,
})).To(Succeed())
rec := doAuthRequest(app, http.MethodGet, "/api/auth/usage/sources?period=month", nil, withSession(aliceToken))
Expect(rec.Code).To(Equal(http.StatusOK))
var resp struct {
Buckets []auth.UsageBucket `json:"buckets"`
Totals auth.SourceTotals `json:"totals"`
Truncated bool `json:"truncated"`
}
Expect(json.Unmarshal(rec.Body.Bytes(), &resp)).To(Succeed())
_, hasLegacy := resp.Totals.BySource[auth.UsageSourceLegacy]
Expect(hasLegacy).To(BeFalse())
Expect(resp.Totals.GrandTotal.Tokens).To(Equal(int64(150)))
Expect(resp.Truncated).To(BeFalse())
})
It("returns 401 when unauthenticated", func() {
app := newTestAuthApp(db, appConfig)
// Without a session cookie or bearer token, the global auth middleware
// should refuse the request before our handler runs.
rec := doAuthRequest(app, http.MethodGet, "/api/auth/usage/sources?period=month", nil)
Expect(rec.Code).To(Equal(http.StatusUnauthorized))
})
})
Describe("GET /api/auth/admin/usage/sources", func() {
It("returns 403 for non-admin", func() {
app := newTestAuthApp(db, appConfig)
alice := createRouteTestUser(db, "alice@example.com", auth.RoleUser)
aliceToken, _ := auth.CreateSession(db, alice.ID, "")
rec := doAuthRequest(app, http.MethodGet, "/api/auth/admin/usage/sources?period=month", nil, withSession(aliceToken))
Expect(rec.Code).To(Equal(http.StatusForbidden))
})
It("applies api_key_id filter for admin, excluding legacy and other keys", func() {
app := newTestAuthApp(db, appConfig)
admin := createRouteTestUser(db, "admin@example.com", auth.RoleAdmin)
adminToken, _ := auth.CreateSession(db, admin.ID, "")
k1 := "k1"
k2 := "k2"
now := time.Now()
Expect(auth.RecordUsage(db, &auth.UsageRecord{UserID: "alice", Source: auth.UsageSourceAPIKey, APIKeyID: &k1, APIKeyName: "ci", Model: "gpt-4", TotalTokens: 10, CreatedAt: now})).To(Succeed())
Expect(auth.RecordUsage(db, &auth.UsageRecord{UserID: "alice", Source: auth.UsageSourceAPIKey, APIKeyID: &k2, APIKeyName: "lap", Model: "gpt-4", TotalTokens: 20, CreatedAt: now})).To(Succeed())
Expect(auth.RecordUsage(db, &auth.UsageRecord{UserID: "legacy-api-key", Source: auth.UsageSourceLegacy, Model: "gpt-4", TotalTokens: 5, CreatedAt: now})).To(Succeed())
rec := doAuthRequest(app, http.MethodGet,
"/api/auth/admin/usage/sources?period=month&api_key_id=k2", nil, withSession(adminToken))
Expect(rec.Code).To(Equal(http.StatusOK))
var resp struct {
Totals auth.SourceTotals `json:"totals"`
Truncated bool `json:"truncated"`
}
Expect(json.Unmarshal(rec.Body.Bytes(), &resp)).To(Succeed())
Expect(resp.Totals.GrandTotal.Tokens).To(Equal(int64(20)))
})
It("includes legacy in by_source for admin with no filter", func() {
app := newTestAuthApp(db, appConfig)
admin := createRouteTestUser(db, "admin@example.com", auth.RoleAdmin)
adminToken, _ := auth.CreateSession(db, admin.ID, "")
now := time.Now()
Expect(auth.RecordUsage(db, &auth.UsageRecord{UserID: "legacy-api-key", Source: auth.UsageSourceLegacy, Model: "gpt-4", TotalTokens: 7, CreatedAt: now})).To(Succeed())
rec := doAuthRequest(app, http.MethodGet, "/api/auth/admin/usage/sources?period=month", nil, withSession(adminToken))
Expect(rec.Code).To(Equal(http.StatusOK))
var resp struct {
Totals auth.SourceTotals `json:"totals"`
}
Expect(json.Unmarshal(rec.Body.Bytes(), &resp)).To(Succeed())
Expect(resp.Totals.BySource).To(HaveKey(auth.UsageSourceLegacy))
Expect(resp.Totals.BySource[auth.UsageSourceLegacy].Tokens).To(Equal(int64(7)))
})
})
})

View File

@@ -6,7 +6,9 @@ import (
"strings"
"github.com/labstack/echo/v4"
"github.com/mudler/LocalAI/core/config"
"github.com/mudler/LocalAI/core/http/endpoints/localai"
"github.com/mudler/LocalAI/core/services/galleryop"
"github.com/mudler/LocalAI/core/services/nodes"
"gorm.io/gorm"
)
@@ -53,7 +55,12 @@ func RegisterNodeSelfServiceRoutes(e *echo.Echo, registry *nodes.NodeRegistry, r
// RegisterNodeAdminRoutes registers /api/nodes/ endpoints used by admins
// (list, get, get models, drain, delete, approve, backend management). Protected by admin middleware.
func RegisterNodeAdminRoutes(e *echo.Echo, registry *nodes.NodeRegistry, unloader nodes.NodeCommandSender, adminMw echo.MiddlewareFunc, authDB *gorm.DB, hmacSecret string, registrationToken string) {
//
// galleryService/opcache/appConfig are threaded in for the async node-scoped
// backend install path (POST /:id/backends/install). That handler enqueues a
// ManagementOp on the gallery channel rather than blocking on a NATS reply, so
// the browser gets HTTP 202 + jobID immediately instead of waiting up to 3 minutes.
func RegisterNodeAdminRoutes(e *echo.Echo, registry *nodes.NodeRegistry, unloader nodes.NodeCommandSender, galleryService *galleryop.GalleryService, opcache *galleryop.OpCache, appConfig *config.ApplicationConfig, adminMw echo.MiddlewareFunc, authDB *gorm.DB, hmacSecret string, registrationToken string) {
if registry == nil {
return
}
@@ -78,7 +85,7 @@ func RegisterNodeAdminRoutes(e *echo.Echo, registry *nodes.NodeRegistry, unloade
// Backend management on workers
admin.GET("/:id/backends", localai.ListBackendsOnNodeEndpoint(unloader))
admin.POST("/:id/backends/install", localai.InstallBackendOnNodeEndpoint(unloader))
admin.POST("/:id/backends/install", localai.InstallBackendOnNodeEndpoint(unloader, galleryService, opcache, appConfig))
admin.POST("/:id/backends/delete", localai.DeleteBackendOnNodeEndpoint(unloader))
// Model management on workers

View File

@@ -214,6 +214,17 @@ func RegisterUIAPIRoutes(app *echo.Echo, cl *config.ModelConfigLoader, ml *model
}
}
// Node-scoped backend ops (from /api/nodes/:id/backends/install)
// carry the nodeID inside the opcache key as "node:<nodeID>:<backend>".
// Pull it back out so the operations panel can label which node the
// install is targeting, and so the display name is just the backend
// slug instead of the full prefixed key.
scopedNodeID := ""
if nodeID, backend, ok := galleryop.ParseNodeScopedKey(galleryID); ok {
scopedNodeID = nodeID
galleryID = backend
}
// Extract display name (remove repo prefix if exists)
displayName := galleryID
if strings.Contains(galleryID, "@") {
@@ -237,6 +248,12 @@ func RegisterUIAPIRoutes(app *echo.Echo, cl *config.ModelConfigLoader, ml *model
"cancellable": isCancellable,
"message": message,
}
// Only attach nodeID when this op was node-scoped: an empty string
// would mislead the UI into rendering a node attribution that never
// existed in the first place.
if scopedNodeID != "" {
opData["nodeID"] = scopedNodeID
}
if status != nil && status.Error != nil {
opData["error"] = status.Error.Error()
}
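
Review note: the key format described above, `node:<nodeID>:<backend>`, can be illustrated with a small round-trip. This sketch is written in JavaScript purely for illustration and is not the Go implementation of `galleryop.NodeScopedKey` / `galleryop.ParseNodeScopedKey`, whose bodies are not part of this diff. It assumes node IDs never contain a colon and splits on the first colon after the prefix; the real helpers may treat edge cases differently.

```javascript
// Illustration only: a JavaScript model of the "node:<nodeID>:<backend>" key.
// Assumption: node IDs contain no ':' so the first ':' after the prefix
// separates the node ID from the backend name.
const NODE_PREFIX = 'node:'

function nodeScopedKey(nodeId, backend) {
  return `${NODE_PREFIX}${nodeId}:${backend}`
}

function parseNodeScopedKey(key) {
  if (!key.startsWith(NODE_PREFIX)) return null // bare (global) key
  const rest = key.slice(NODE_PREFIX.length)
  const sep = rest.indexOf(':')
  // Reject keys with an empty node ID or an empty backend part.
  if (sep <= 0 || sep === rest.length - 1) return null
  return { nodeId: rest.slice(0, sep), backend: rest.slice(sep + 1) }
}

console.log(parseNodeScopedKey(nodeScopedKey('worker-7', 'llama-cpp')))
console.log(parseNodeScopedKey('llama-cpp'))
```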

View File

@@ -0,0 +1,98 @@
package routes_test
import (
"encoding/json"
"net/http"
"net/http/httptest"
"github.com/labstack/echo/v4"
. "github.com/onsi/ginkgo/v2"
. "github.com/onsi/gomega"
"github.com/mudler/LocalAI/core/application"
"github.com/mudler/LocalAI/core/config"
"github.com/mudler/LocalAI/core/http/routes"
"github.com/mudler/LocalAI/core/services/galleryop"
)
// These specs guard the contract between the opcache (which stores
// node-scoped backend installs under a "node:<nodeID>:<backend>" key) and the
// /api/operations response surface the React UI polls. Without nodeID
// extraction the panel would show the raw prefixed key and have no way to
// label which worker an install is targeting.
var _ = Describe("/api/operations with node-scoped backend ops", func() {
// We pass a zero-value *application.Application because the handler's
// distributed-services branch is guarded by a nil check on the returned
// *DistributedServices, which is nil for a fresh Application{}.
noopMw := func(next echo.HandlerFunc) echo.HandlerFunc { return next }
It("emits nodeID and the un-prefixed backend name for keys built by NodeScopedKey", func() {
appCfg := &config.ApplicationConfig{}
galleryService := galleryop.NewGalleryService(appCfg, nil)
opcache := galleryop.NewOpCache(galleryService)
key := galleryop.NodeScopedKey("worker-7", "llama-cpp")
opcache.SetBackend(key, "job-uuid-123")
e := echo.New()
routes.RegisterUIAPIRoutes(e, nil, nil, appCfg, galleryService, opcache, &application.Application{}, noopMw)
req := httptest.NewRequest(http.MethodGet, "/api/operations", nil)
rec := httptest.NewRecorder()
e.ServeHTTP(rec, req)
Expect(rec.Code).To(Equal(http.StatusOK))
// The handler wraps operations in {"operations": [...]}.
var envelope struct {
Operations []map[string]any `json:"operations"`
}
Expect(json.Unmarshal(rec.Body.Bytes(), &envelope)).To(Succeed())
var found map[string]any
for _, op := range envelope.Operations {
if op["jobID"] == "job-uuid-123" {
found = op
break
}
}
Expect(found).ToNot(BeNil(), "node-scoped op should appear in /api/operations")
Expect(found["nodeID"]).To(Equal("worker-7"))
Expect(found["name"]).To(Equal("llama-cpp"))
Expect(found["isBackend"]).To(Equal(true))
})
It("does not emit nodeID for non-node-scoped backend ops", func() {
appCfg := &config.ApplicationConfig{}
galleryService := galleryop.NewGalleryService(appCfg, nil)
opcache := galleryop.NewOpCache(galleryService)
// Legacy/global install path: bare backend name as the opcache key.
opcache.SetBackend("llama-cpp", "job-uuid-456")
e := echo.New()
routes.RegisterUIAPIRoutes(e, nil, nil, appCfg, galleryService, opcache, &application.Application{}, noopMw)
req := httptest.NewRequest(http.MethodGet, "/api/operations", nil)
rec := httptest.NewRecorder()
e.ServeHTTP(rec, req)
Expect(rec.Code).To(Equal(http.StatusOK))
var envelope struct {
Operations []map[string]any `json:"operations"`
}
Expect(json.Unmarshal(rec.Body.Bytes(), &envelope)).To(Succeed())
var found map[string]any
for _, op := range envelope.Operations {
if op["jobID"] == "job-uuid-456" {
found = op
break
}
}
Expect(found).ToNot(BeNil())
// Critical: bare ops must NOT gain a misleading empty nodeID field.
Expect(found).ToNot(HaveKey("nodeID"), "non-node-scoped ops must NOT carry a nodeID field")
Expect(found["name"]).To(Equal("llama-cpp"))
})
})
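
Review note: on the consuming side, the contract these specs pin down means a client can tell node-scoped installs apart by checking whether `nodeID` is present at all. The sketch below shows one way a poller could label operations from the `{"operations": [...]}` envelope. `findOperation` and `describeOperation` are hypothetical helpers and the label wording is invented for the example; only the field names `jobID`, `name`, and `nodeID` come from the specs.

```javascript
// Hypothetical client-side helpers (not in the diff).
function findOperation(envelope, jobID) {
  return (envelope.operations || []).find((op) => op.jobID === jobID) || null
}

function describeOperation(op) {
  // nodeID is omitted for global installs, so a truthiness check is enough.
  return op.nodeID ? `${op.name} on ${op.nodeID}` : op.name
}

const envelope = {
  operations: [
    { jobID: 'job-uuid-123', name: 'llama-cpp', nodeID: 'worker-7', isBackend: true },
    { jobID: 'job-uuid-456', name: 'llama-cpp', isBackend: true },
  ],
}

console.log(describeOperation(findOperation(envelope, 'job-uuid-123')))
console.log(describeOperation(findOperation(envelope, 'job-uuid-456')))
```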

View File

@@ -113,7 +113,7 @@ func (g *GalleryService) backendHandler(op *ManagementOp[gallery.GalleryBackend,
// InstallExternalBackend installs a backend from an external source (OCI image, URL, or path).
// This method contains the logic to detect the input type and call the appropriate installation function.
// It can be used by both CLI and Web UI for installing backends from external sources.
func InstallExternalBackend(ctx context.Context, galleries []config.Gallery, systemState *system.SystemState, modelLoader *model.ModelLoader, downloadStatus func(string, string, string, float64), backend, name, alias string) error {
func InstallExternalBackend(ctx context.Context, galleries []config.Gallery, systemState *system.SystemState, modelLoader *model.ModelLoader, downloadStatus func(string, string, string, float64), backend, name, alias string, requireIntegrity bool) error {
uri := downloader.URI(backend)
switch {
case uri.LooksLikeDir():
@@ -127,7 +127,7 @@ func InstallExternalBackend(ctx context.Context, galleries []config.Gallery, sys
},
Alias: alias,
URI: backend,
}, downloadStatus); err != nil {
}, downloadStatus, requireIntegrity); err != nil {
return fmt.Errorf("error installing backend %s: %w", backend, err)
}
case uri.LooksLikeOCI() && !uri.LooksLikeOCIFile():
@@ -141,7 +141,7 @@ func InstallExternalBackend(ctx context.Context, galleries []config.Gallery, sys
},
Alias: alias,
URI: backend,
}, downloadStatus); err != nil {
}, downloadStatus, requireIntegrity); err != nil {
return fmt.Errorf("error installing backend %s: %w", backend, err)
}
case uri.LooksLikeOCIFile():
@@ -163,7 +163,7 @@ func InstallExternalBackend(ctx context.Context, galleries []config.Gallery, sys
},
Alias: alias,
URI: backend,
}, downloadStatus); err != nil {
}, downloadStatus, requireIntegrity); err != nil {
return fmt.Errorf("error installing backend %s: %w", backend, err)
}
default:
@@ -171,7 +171,7 @@ func InstallExternalBackend(ctx context.Context, galleries []config.Gallery, sys
if name != "" || alias != "" {
return fmt.Errorf("specifying a name or alias is not supported for gallery backends")
}
err := gallery.InstallBackendFromGallery(ctx, galleries, systemState, modelLoader, backend, downloadStatus, true)
err := gallery.InstallBackendFromGallery(ctx, galleries, systemState, modelLoader, backend, downloadStatus, true, requireIntegrity)
if err != nil {
return fmt.Errorf("error installing backend %s: %w", backend, err)
}

@@ -70,6 +70,7 @@ var _ = Describe("InstallExternalBackend", func() {
"test-backend", // gallery name
"custom-name", // name should not be allowed
"",
+false,
)
Expect(err).To(HaveOccurred())
Expect(err.Error()).To(ContainSubstring("specifying a name or alias is not supported for gallery backends"))
@@ -85,6 +86,7 @@ var _ = Describe("InstallExternalBackend", func() {
"non-existent-backend",
"",
"",
+false,
)
Expect(err).To(HaveOccurred())
})
@@ -101,6 +103,7 @@ var _ = Describe("InstallExternalBackend", func() {
"oci://quay.io/mudler/tests:localai-backend-test",
"", // name is required for OCI images
"",
+false,
)
Expect(err).To(HaveOccurred())
Expect(err.Error()).To(ContainSubstring("specifying a name is required for OCI images"))
@@ -133,6 +136,7 @@ var _ = Describe("InstallExternalBackend", func() {
testBackendPath,
"", // name should be inferred as "source-backend"
"",
+false,
)
// The function should at least attempt to install with the inferred name
// Even if it fails for other reasons, it shouldn't fail due to missing name
@@ -151,6 +155,7 @@ var _ = Describe("InstallExternalBackend", func() {
testBackendPath,
"custom-backend-name",
"",
+false,
)
// The function should use the provided name
if err != nil {
@@ -168,6 +173,7 @@ var _ = Describe("InstallExternalBackend", func() {
testBackendPath,
"custom-backend-name",
"custom-alias",
+false,
)
// The function should accept alias for directory paths
if err != nil {
@@ -190,4 +196,60 @@ var _ = Describe("ManagementOp with External Backend", func() {
Expect(op.ExternalName).To(Equal("test-backend"))
Expect(op.ExternalAlias).To(Equal("test-alias"))
})
+Context("TargetNodeID field", func() {
+It("defaults to empty string", func() {
+op := galleryop.ManagementOp[string, string]{
+ExternalURI: "oci://example.com/backend:latest",
+}
+Expect(op.TargetNodeID).To(BeEmpty())
+})
+It("preserves TargetNodeID across a channel send", func() {
+ch := make(chan galleryop.ManagementOp[string, string], 1)
+ch <- galleryop.ManagementOp[string, string]{
+GalleryElementName: "llama-cpp",
+TargetNodeID: "node-abc-123",
+}
+received := <-ch
+Expect(received.TargetNodeID).To(Equal("node-abc-123"))
+Expect(received.GalleryElementName).To(Equal("llama-cpp"))
+})
+})
+Describe("NodeScopedKey", func() {
+It("builds a unique key per (nodeID, backend) pair", func() {
+Expect(galleryop.NodeScopedKey("node-a", "llama-cpp")).To(Equal("node:node-a:llama-cpp"))
+Expect(galleryop.NodeScopedKey("node-b", "llama-cpp")).To(Equal("node:node-b:llama-cpp"))
+Expect(galleryop.NodeScopedKey("node-a", "vllm")).To(Equal("node:node-a:vllm"))
+})
+It("handles backend names containing colons", func() {
+// Gallery IDs sometimes look like "official@llama-cpp"; nodeIDs are UUIDs
+// without colons, but the backend slug may contain anything. Splitting on
+// the first colon after the prefix MUST yield the full backend back.
+key := galleryop.NodeScopedKey("node-1", "official@llama-cpp:v2")
+node, backend, ok := galleryop.ParseNodeScopedKey(key)
+Expect(ok).To(BeTrue())
+Expect(node).To(Equal("node-1"))
+Expect(backend).To(Equal("official@llama-cpp:v2"))
+})
+It("rejects keys without the node prefix", func() {
+_, _, ok := galleryop.ParseNodeScopedKey("llama-cpp")
+Expect(ok).To(BeFalse())
+_, _, ok = galleryop.ParseNodeScopedKey("official@llama-cpp")
+Expect(ok).To(BeFalse())
+})
+It("rejects malformed node-prefixed keys", func() {
+_, _, ok := galleryop.ParseNodeScopedKey("node:only-one-segment")
+Expect(ok).To(BeFalse())
+})
+It("rejects keys with an empty nodeID segment", func() {
+_, _, ok := galleryop.ParseNodeScopedKey("node::llama-cpp")
+Expect(ok).To(BeFalse())
+})
+})
})
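
The NodeScopedKey specs above pin down a key format of `node:<nodeID>:<backend>` and a parser that splits on the first colon after the prefix. The real `galleryop` implementation is not shown in this diff, so the following is only a minimal sketch of a pair of functions that would satisfy those specs. The constant name `nodeKeyPrefix` and the rejection of an empty backend segment are assumptions, not taken from the source.

```go
package main

import (
	"fmt"
	"strings"
)

// nodeKeyPrefix is a hypothetical name; the "node:" value is inferred from
// the expected strings in the specs.
const nodeKeyPrefix = "node:"

// NodeScopedKey joins a node ID and a backend name into one key.
func NodeScopedKey(nodeID, backend string) string {
	return nodeKeyPrefix + nodeID + ":" + backend
}

// ParseNodeScopedKey reverses NodeScopedKey. It splits on the FIRST colon
// after the prefix, so colons inside the backend name survive the round trip.
func ParseNodeScopedKey(key string) (nodeID, backend string, ok bool) {
	if !strings.HasPrefix(key, nodeKeyPrefix) {
		return "", "", false
	}
	rest := key[len(nodeKeyPrefix):]
	nodeID, backend, found := strings.Cut(rest, ":")
	// Rejecting an empty backend is an assumption; the specs only cover
	// a missing separator and an empty nodeID.
	if !found || nodeID == "" || backend == "" {
		return "", "", false
	}
	return nodeID, backend, true
}

func main() {
	key := NodeScopedKey("node-1", "official@llama-cpp:v2")
	node, backend, ok := ParseNodeScopedKey(key)
	fmt.Println(node, backend, ok) // prints "node-1 official@llama-cpp:v2 true"
}
```

Splitting on the first colon relies on the property stated in the spec comment: node IDs never contain colons, while backend slugs may.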

@@ -16,6 +16,7 @@ type LocalModelManager struct {
modelLoader *model.ModelLoader
enforcePredownloadScans bool
automaticallyInstallBackend bool
+requireBackendIntegrity bool
}
// NewLocalModelManager creates a LocalModelManager from the application config.
@@ -25,6 +26,7 @@ func NewLocalModelManager(appConfig *config.ApplicationConfig, ml *model.ModelLo
modelLoader: ml,
enforcePredownloadScans: appConfig.EnforcePredownloadScans,
automaticallyInstallBackend: appConfig.AutoloadBackendGalleries,
+requireBackendIntegrity: appConfig.RequireBackendIntegrity,
}
}
@@ -53,32 +55,34 @@ func (m *LocalModelManager) InstallModel(ctx context.Context, op *ManagementOp[g
if m.automaticallyInstallBackend && installedModel.Backend != "" {
xlog.Debug("Installing backend", "backend", installedModel.Backend)
return gallery.InstallBackendFromGallery(ctx, op.BackendGalleries, m.systemState,
-m.modelLoader, installedModel.Backend, progressCb, false)
+m.modelLoader, installedModel.Backend, progressCb, false, m.requireBackendIntegrity)
}
return nil
case op.GalleryElementName != "":
return gallery.InstallModelFromGallery(ctx, op.Galleries, op.BackendGalleries,
m.systemState, m.modelLoader, op.GalleryElementName, op.Req, progressCb,
-m.enforcePredownloadScans, m.automaticallyInstallBackend)
+m.enforcePredownloadScans, m.automaticallyInstallBackend, m.requireBackendIntegrity)
default:
return installModelFromRemoteConfig(ctx, m.systemState, m.modelLoader, op.Req,
-progressCb, m.enforcePredownloadScans, m.automaticallyInstallBackend, op.BackendGalleries)
+progressCb, m.enforcePredownloadScans, m.automaticallyInstallBackend, op.BackendGalleries, m.requireBackendIntegrity)
}
}
// LocalBackendManager handles backend install/delete on the local instance.
type LocalBackendManager struct {
-systemState      *system.SystemState
-modelLoader      *model.ModelLoader
-backendGalleries []config.Gallery
+systemState             *system.SystemState
+modelLoader             *model.ModelLoader
+backendGalleries        []config.Gallery
+requireBackendIntegrity bool
}
// NewLocalBackendManager creates a LocalBackendManager from the application config.
func NewLocalBackendManager(appConfig *config.ApplicationConfig, ml *model.ModelLoader) *LocalBackendManager {
return &LocalBackendManager{
-systemState:      appConfig.SystemState,
-modelLoader:      ml,
-backendGalleries: appConfig.BackendGalleries,
+systemState:             appConfig.SystemState,
+modelLoader:             ml,
+backendGalleries:        appConfig.BackendGalleries,
+requireBackendIntegrity: appConfig.RequireBackendIntegrity,
}
}
@@ -93,7 +97,7 @@ func (b *LocalBackendManager) ListBackends() (gallery.SystemBackends, error) {
}
func (b *LocalBackendManager) UpgradeBackend(ctx context.Context, name string, progressCb ProgressCallback) error {
-return gallery.UpgradeBackend(ctx, b.systemState, b.modelLoader, b.backendGalleries, name, progressCb)
+return gallery.UpgradeBackend(ctx, b.systemState, b.modelLoader, b.backendGalleries, name, progressCb, b.requireBackendIntegrity)
}
func (b *LocalBackendManager) CheckUpgrades(ctx context.Context) (map[string]gallery.UpgradeInfo, error) {
@@ -103,10 +107,10 @@ func (b *LocalBackendManager) CheckUpgrades(ctx context.Context) (map[string]gal
func (b *LocalBackendManager) InstallBackend(ctx context.Context, op *ManagementOp[gallery.GalleryBackend, any], progressCb ProgressCallback) error {
if op.ExternalURI != "" {
return InstallExternalBackend(ctx, b.backendGalleries, b.systemState, b.modelLoader,
-progressCb, op.ExternalURI, op.ExternalName, op.ExternalAlias)
+progressCb, op.ExternalURI, op.ExternalName, op.ExternalAlias, b.requireBackendIntegrity)
}
return gallery.InstallBackendFromGallery(ctx, b.backendGalleries, b.systemState,
-b.modelLoader, op.GalleryElementName, progressCb, true)
+b.modelLoader, op.GalleryElementName, progressCb, true, b.requireBackendIntegrity)
}
func (b *LocalBackendManager) IsDistributed() bool { return false }

@@ -123,7 +123,7 @@ func (g *GalleryService) modelHandler(op *ManagementOp[gallery.GalleryModel, gal
return nil
}
-func installModelFromRemoteConfig(ctx context.Context, systemState *system.SystemState, modelLoader *model.ModelLoader, req gallery.GalleryModel, downloadStatus func(string, string, string, float64), enforceScan, automaticallyInstallBackend bool, backendGalleries []config.Gallery) error {
+func installModelFromRemoteConfig(ctx context.Context, systemState *system.SystemState, modelLoader *model.ModelLoader, req gallery.GalleryModel, downloadStatus func(string, string, string, float64), enforceScan, automaticallyInstallBackend bool, backendGalleries []config.Gallery, requireBackendIntegrity bool) error {
config, err := gallery.GetGalleryConfigFromURLWithContext[gallery.ModelConfig](ctx, req.URL, systemState.Model.ModelsPath)
if err != nil {
return err
@@ -137,7 +137,7 @@ func installModelFromRemoteConfig(ctx context.Context, systemState *system.Syste
}
if automaticallyInstallBackend && installedModel.Backend != "" {
-if err := gallery.InstallBackendFromGallery(ctx, backendGalleries, systemState, modelLoader, installedModel.Backend, downloadStatus, false); err != nil {
+if err := gallery.InstallBackendFromGallery(ctx, backendGalleries, systemState, modelLoader, installedModel.Backend, downloadStatus, false, requireBackendIntegrity); err != nil {
return err
}
}
@@ -150,23 +150,23 @@ type galleryModel struct {
ID string `json:"id"`
}
-func processRequests(systemState *system.SystemState, modelLoader *model.ModelLoader, enforceScan, automaticallyInstallBackend bool, galleries []config.Gallery, backendGalleries []config.Gallery, requests []galleryModel) error {
+func processRequests(systemState *system.SystemState, modelLoader *model.ModelLoader, enforceScan, automaticallyInstallBackend bool, galleries []config.Gallery, backendGalleries []config.Gallery, requests []galleryModel, requireBackendIntegrity bool) error {
ctx := context.Background()
var err error
for _, r := range requests {
utils.ResetDownloadTimers()
if r.ID == "" {
-err = installModelFromRemoteConfig(ctx, systemState, modelLoader, r.GalleryModel, utils.DisplayDownloadFunction, enforceScan, automaticallyInstallBackend, backendGalleries)
+err = installModelFromRemoteConfig(ctx, systemState, modelLoader, r.GalleryModel, utils.DisplayDownloadFunction, enforceScan, automaticallyInstallBackend, backendGalleries, requireBackendIntegrity)
} else {
err = gallery.InstallModelFromGallery(
-ctx, galleries, backendGalleries, systemState, modelLoader, r.ID, r.GalleryModel, utils.DisplayDownloadFunction, enforceScan, automaticallyInstallBackend)
+ctx, galleries, backendGalleries, systemState, modelLoader, r.ID, r.GalleryModel, utils.DisplayDownloadFunction, enforceScan, automaticallyInstallBackend, requireBackendIntegrity)
}
}
return err
}
-func ApplyGalleryFromFile(systemState *system.SystemState, modelLoader *model.ModelLoader, enforceScan, automaticallyInstallBackend bool, galleries []config.Gallery, backendGalleries []config.Gallery, s string) error {
+func ApplyGalleryFromFile(systemState *system.SystemState, modelLoader *model.ModelLoader, enforceScan, automaticallyInstallBackend bool, galleries []config.Gallery, backendGalleries []config.Gallery, s string, requireBackendIntegrity bool) error {
dat, err := os.ReadFile(s)
if err != nil {
return err
@@ -177,15 +177,15 @@ func ApplyGalleryFromFile(systemState *system.SystemState, modelLoader *model.Mo
return err
}
-return processRequests(systemState, modelLoader, enforceScan, automaticallyInstallBackend, galleries, backendGalleries, requests)
+return processRequests(systemState, modelLoader, enforceScan, automaticallyInstallBackend, galleries, backendGalleries, requests, requireBackendIntegrity)
}
-func ApplyGalleryFromString(systemState *system.SystemState, modelLoader *model.ModelLoader, enforceScan, automaticallyInstallBackend bool, galleries []config.Gallery, backendGalleries []config.Gallery, s string) error {
+func ApplyGalleryFromString(systemState *system.SystemState, modelLoader *model.ModelLoader, enforceScan, automaticallyInstallBackend bool, galleries []config.Gallery, backendGalleries []config.Gallery, s string, requireBackendIntegrity bool) error {
var requests []galleryModel
err := json.Unmarshal([]byte(s), &requests)
if err != nil {
return err
}
-return processRequests(systemState, modelLoader, enforceScan, automaticallyInstallBackend, galleries, backendGalleries, requests)
+return processRequests(systemState, modelLoader, enforceScan, automaticallyInstallBackend, galleries, backendGalleries, requests, requireBackendIntegrity)
}

Some files were not shown because too many files have changed in this diff.