InitDB now classifies any pre-existing usage_record with an empty
source: 'legacy-api-key' user -> legacy, everything else -> web.
The backfill is idempotent (only touches NULL/empty rows).
Refs: #9862
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Adds three additive columns plus UsageSource* constants. The columns
are auto-migrated by InitDB. APIKeyID is a nullable foreign reference
to UserAPIKey.ID; APIKeyName is snapshotted on each row so revoked
keys keep showing their name in history.
Refs: #9862
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Non-image/non-audio file attachments (txt, md, csv, json) were being
stored in the 'files' metadata field but never added to the message
content array sent to /v1/chat/completions. Images and audio correctly
received content blocks; files did not.
Fix: push a text content block into messageContent when textContent is
present, matching the pattern used for image_url and audio_url.
Also fixes Home.jsx addFiles which never called file.text() at all,
meaning files attached on the home screen had empty textContent even
before reaching useChat.js.
Note: PDF files use file.text() which returns raw bytes rather than
parsed text. Proper PDF support would require PDF.js or server-side
extraction and is not part of this fix.
Signed-off-by: Daniel Liljeberg <damien_@hotmail.com>
The flake set `src = ./sources;` referencing a non-existent subdirectory,
so `nix build` and `nix develop` both failed evaluation. Point `src` at
the repo root and refresh `vendorHash` accordingly.
Add `devShells.default` with the Go toolchain, protobuf generators,
Node.js/bun for the React UI (`make react-ui`), and the linters used by
`make lint` (golangci-lint, gofumpt, goimports, staticcheck).
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* feat(gallery): verify backend OCI images with keyless cosign
Close a trust gap where a registry compromise or MITM could silently
replace a backend image: the gallery YAML tells LocalAI which image to
pull, but until now nothing verified the bytes came from our CI.
Consumer (pkg/oci/cosignverify):
- New package using sigstore-go to verify keyless-cosign signatures.
- OCI 1.1 referrers API + new bundle format (no legacy :tag.sig).
- Policy fields: Issuer / IssuerRegex / Identity / IdentityRegex /
NotBefore. NotBefore is the revocation lever — keyless Fulcio certs
are ephemeral so revocation is policy-side; advancing not_before in
the gallery YAML invalidates every signature predating the cutoff.
- TUF trusted root cached process-wide so N backends from one gallery
do 1 fetch, not N.
Plumbing:
- pkg/downloader: ImageVerifier interface + WithImageVerifier option
threaded through DownloadFileWithContext. Verification runs between
oci.GetImage and oci.ExtractOCIImage, with digest pinning via
pinnedImageRef to close the TOCTOU window. Skips the verifier's HEAD
when the ref is already digest-pinned.
- core/config: Gallery.Verification YAML block.
- core/gallery: backendDownloadOptions builds the verifier from the
policy; applied on initial URI, mirrors, and tag fallbacks.
- core/gallery/upgrade: the upgrade path now routes through the same
options builder. A regression Ginkgo spec pins this contract —
without it, UpgradeBackend silently bypassed verification.
- core/cli: --require-backend-integrity (LOCALAI_REQUIRE_BACKEND_INTEGRITY)
escalates missing policy / empty SHA256 from warn to hard-fail.
Producer (.github/workflows/backend_merge.yml):
- id-token: write at job scope (PR-fork-safe via existing event gate).
- sigstore/cosign-installer@v3 pinned to v2.4.1.
- After each docker buildx imagetools create, resolve the manifest
list digest and run cosign sign --recursive --new-bundle-format
--registry-referrers-mode=oci-1-1 against repo@digest. --recursive
signs the index and every per-arch entry, matching how the consumer
resolves a tag to a platform-specific manifest before verifying.
Rollout: backend/index.yaml has no `verification:` block yet, so this
PR is backward-compatible — installs proceed with a warning until the
gallery is populated. Strict mode is opt-in.
Assisted-by: claude-code:claude-opus-4-7 [Bash] [Edit] [Read] [Write] [WebSearch] [WebFetch]
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* refactor(gallery): plumb RequireBackendIntegrity through config instead of env
The previous implementation re-exported the --require-backend-integrity
CLI flag into LOCALAI_REQUIRE_BACKEND_INTEGRITY via os.Setenv, then
re-read it in core/gallery via os.Getenv. This leaked process state
into the gallery package and made the flag impossible to override
per-call or test without touching the env.
Add RequireBackendIntegrity to ApplicationConfig (with a matching
WithRequireBackendIntegrity AppOption) and thread the bool through
every install/upgrade path: InstallBackend, InstallBackendFromGallery,
UpgradeBackend, InstallModelFromGallery, InstallExternalBackend,
ApplyGalleryFromString/File, startup.InstallModels. Worker subcommands
gain the same env-bound flag on WorkerFlags so distributed-worker
installs honor it consistently with the worker daemon path.
Add a forbidigo lint rule against os.Getenv / os.LookupEnv / os.Environ
to keep the env-leak pattern from creeping back. Existing offenders
(p2p, config loaders, etc.) are baseline-grandfathered by the existing
new-from-merge-base: origin/master setting; targeted path exclusions
cover the legitimate cases — kong CLI entry points, backend
subprocesses, system capability probes, gRPC AUTH_TOKEN inheritance,
test gating env vars.
Assisted-by: claude-code:claude-opus-4-7
Signed-off-by: Richard Palethorpe <io@richiejp.com>
---------
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* feat(llama-cpp): bump to MTP-merge SHA and document draft-mtp spec type
Update LLAMA_VERSION to 0253fb21 (post ggml-org/llama.cpp#22673 merge,
2026-05-16) to pick up Multi-Token Prediction support.
No grpc-server.cpp changes are required: the existing `spec_type` option
delegates to upstream's `common_speculative_types_from_names()`, which
already accepts the new `draft-mtp` name. The `n_rs_seq` cparam needed
by MTP is auto-derived inside `common_context_params_to_llama` from
`params.speculative.need_n_rs_seq()`, and when no `draft_model` is set
the upstream server builds the MTP context off the target model itself.
Docs: extend the speculative-decoding section of the model-configuration
guide with the new type, both load paths (MTP head embedded in the main
GGUF vs. separate `mtp-*.gguf` sibling), the PR's recommended
`spec_n_max:2-3`, and the chained `draft-mtp,ngram-mod` recipe. Also
notes that the upstream `-hf` auto-discovery of `mtp-*.gguf` siblings is
not wired through LocalAI's gRPC layer.
Agent guide: short note explaining that new upstream spec types are
picked up automatically and that MTP needs no gRPC plumbing.
Assisted-by: Claude:claude-opus-4-7 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(llama-cpp): auto-detect MTP heads and enable draft-mtp on import + load
Detect upstream's `<arch>.nextn_predict_layers` GGUF metadata key (set by
`convert_hf_to_gguf.py` for Qwen3.5/3.6 family models and similar) and,
when present and the user has not configured a `spec_type` explicitly,
auto-append the upstream-recommended speculative-decoding tuple:
- spec_type:draft-mtp
- spec_n_max:6
- spec_p_min:0.75
The 0.75 p_min is pinned defensively because upstream marks the current
default with a "change to 0.0f" TODO; locking it here keeps acceptance
thresholds stable across future llama.cpp bumps.
Detection runs in two places:
- The model importer (`POST /models/import-uri`, the `/import-model`
UI) range-fetches the GGUF header for HuggingFace / direct-URL
imports via `gguf.ParseGGUFFileRemote`, with a 30s timeout and
non-fatal error handling. OCI/Ollama URIs are skipped because the
artifact is not directly streamable; the load-time hook covers them
once the file is on disk.
- The llama-cpp load-time hook (`guessGGUFFromFile`) reads the local
header on every model start and appends the same options if
`spec_type` is not already set.
Both paths share `ApplyMTPDefaults` and respect an explicit user-set
`spec_type:` / `speculative_type:` so YAML overrides win. Ginkgo
specs cover the append, preserve-user-choice, legacy alias, and nil
safety paths.
Assisted-by: Claude:claude-opus-4-7 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix(importer): resolve huggingface:// URIs before MTP header probe
`gguf.ParseGGUFFileRemote` only speaks HTTP(S), but the importer was
handing it the raw `huggingface://...` URI directly (and similarly for
any other custom downloader scheme). Live-test against
`huggingface://ggml-org/Qwen3.6-27B-MTP-GGUF/Qwen3.6-27B-MTP-Q8_0.gguf`
exposed this: the probe failed with `unsupported protocol scheme
"huggingface"`, was caught by the non-fatal error path, and the MTP
options were silently never applied to the generated YAML.
Route every candidate URI through `downloader.URI.ResolveURL()` and
require the resolved form to be HTTP(S). After the fix the probe
successfully reads `<arch>.nextn_predict_layers=1` from the real HF
GGUF and the emitted ConfigFile carries spec_type:draft-mtp,
spec_n_max:6, spec_p_min:0.75 as intended.
Assisted-by: Claude:claude-opus-4-7 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
fix(ollama): accept float-encoded integer options (num_ctx, top_k, ...)
Home Assistant's Ollama integration encodes integer options as JSON
floats (e.g. `"num_ctx": 8192.0`). Stdlib `json.Unmarshal` refuses to
decode a number with fractional notation into an `int` field, so the
entire request was rejected with HTTP 400 before reaching the backend:
Unmarshal type error: expected=int, got=number 8192.0,
field=options.num_ctx
Add a custom `UnmarshalJSON` on `OllamaOptions` that routes the int
fields (`top_k`, `num_predict`, `seed`, `repeat_last_n`, `num_ctx`)
through `*json.Number`, then converts via `Int64()` with a `Float64()`
fallback. Public field types are unchanged, so endpoint code is
untouched. Float fields and `stop` continue to parse via the default
path.
Fixes#9837
Assisted-by: Claude Code:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
Out-of-bounds read in SmartypantsRenderer.smartLeftAngle (CWE-125,
CVSS 7.5). Reachable transitively via LocalAGI's Email connector,
which renders inbound HTML email replies using html.CommonFlags
(includes Smartypants). An unmatched `<` in the inbound body could
panic the agent service.
Bump to v0.0.0-20260411013819-759bbc3e3207 (contains the fix). The
klauspost/compress entry loses its `// indirect` tag because
go mod tidy noticed pkg/utils/untar.go imports it directly.
Assisted-by: Claude:claude-opus-4-7 [Claude-Code]
Signed-off-by: Richard Palethorpe <io@richiejp.com>
* realtime: honor output_modalities to skip TTS in text-only mode
The emulated realtime pipeline previously ignored the OpenAI Realtime spec
field output_modalities and always synthesized TTS. Add resolveOutputModalities
+ modalitiesContainAudio helpers and gate the TTS / ResponseOutputAudio*
emission so a client requesting ["text"] gets only ResponseOutputText* events.
This lets thin clients (e.g. thing5-poc) cache TTS on the client side while
still using the realtime WS for VAD + STT + LLM + tool-call parsing.
Assisted-by: Claude:claude-opus-4-7
* realtime: plumb response-level output_modalities and echo on session
Follow-up to the previous commit:
- Resolve response.create's output_modalities at the gate so a per-response
override of an audio session is honored (the test asserted this contract
but the production call site was passing nil).
- Mirror OutputModalities in the RealtimeSession echo so session.update
round-trips the client-supplied value, matching MaxOutputTokens's pattern.
Assisted-by: Claude:claude-opus-4-7
* realtime: silence errcheck on deferred os.Remove of TTS file
CI's errcheck flagged the pre-existing `defer os.Remove(audioFilePath)`
inside the audio-emission block (now wrapped by the modality gate). Wrap
the call in a closure that explicitly discards the error — the canonical
Go pattern for "I want to defer a cleanup whose error I genuinely don't
care about."
Assisted-by: Claude:claude-opus-4-7 golangci-lint
---------
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
The Ollama /api/tags handler passes a nil filter to galleryop.ListModels.
When ModelsPath contains any non-skipped loose file the function then
calls filter(name, nil) and panics, which Echo surfaces to clients as
"Server disconnected without sending a response" - the exact failure
Home Assistant's Ollama integration reports against LocalAI.
Mirror the nil guard already present in
ModelConfigLoader.GetModelConfigsByFilter so every caller is safe, and
add a regression test that exercises the loose-file path with a nil
filter.
Assisted-by: claude:claude-opus-4-7 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
* fix(streaming): comply with OpenAI usage / stream_options spec (#8546)
LocalAI emitted `"usage":{"prompt_tokens":0,...}` on every streamed
chunk because `OpenAIResponse.Usage` was a value type without
`omitempty`. The official OpenAI Node SDK and its consumers
(continuedev/continue, Kilo Code, Roo Code, Zed, IntelliJ Continue)
filter on a truthy `result.usage` to detect the trailing usage chunk;
LocalAI's zero-but-non-null usage on every intermediate chunk made
that filter swallow every content chunk and surface an empty chat
response while the server log looked successful.
Changes:
- `core/schema/openai.go`: `Usage *OpenAIUsage \`json:"usage,omitempty"\``
so intermediate chunks no longer carry a `usage` key. Add
`OpenAIRequest.StreamOptions` with `include_usage` to mirror OpenAI's
request field.
- `core/http/endpoints/openai/chat.go` and `completion.go`: keep using
the `Usage` struct field as an in-process channel for the running
cumulative, but strip it before JSON marshalling. When the request
set `stream_options.include_usage: true`, emit a dedicated trailing
chunk with `"choices": []` and the populated usage (matching the
OpenAI spec and llama.cpp's server behavior).
- `chat_emit.go`: new `streamUsageTrailerJSON` helper; drop the
`usage` parameter from `buildNoActionFinalChunks` since chunks no
longer carry usage.
- Update `image.go`, `inpainting.go`, `edit.go` to wrap their Usage
values with `&` for the new pointer field.
- UI: send `stream_options:{include_usage:true}` from the React
(`useChat.js`) and legacy (`static/chat.js`) chat clients so the
token-count badge keeps populating now that the server is
spec-compliant.
Tests:
- New `chat_stream_usage_test.go` pins the spec invariants:
intermediate chunks have no `usage` key, the trailer JSON has
`"choices":[]` and a populated `usage`, and `OpenAIRequest` parses
`stream_options.include_usage`.
- Update `chat_emit_test.go` to reflect that finals no longer embed
usage.
Verified against the live LocalAI instance: before the fix Continue's
filter logic swallowed 16/16 token chunks; with the new shape it
yields 4/5 and routes usage through the dedicated trailer chunk.
Fixes#8546
Assisted-by: Claude:opus-4.7 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* fix(streaming): silence errcheck on usage trailer Fprintf
The new spec-compliant `stream_options.include_usage` trailer writes
were flagged by errcheck since they're new code (golangci-lint runs
new-from-merge-base on master); the surrounding `fmt.Fprintf` data:
writes are grandfathered. Drop the return values explicitly to match
the linter's contract without adding a nolint shim.
Assisted-by: Claude:opus-4.7 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
The llama.cpp backend already accepts a free-form options: array in the
model config that maps to common_params fields, but a coverage audit
against upstream pin 7f3f843c flagged 12 user-visible knobs that were
neither set via the typed proto fields nor reachable via options:.
Wire them up under the existing if/else chain in params_parse, before
the speculative section. Each new option follows the file's prevailing
patterns (try/catch around numeric parses, the same true/1/yes/on bool
form used elsewhere, hardware_concurrency() fallback for thread counts,
mirror of draft_override_tensor for override_tensor).
Top-level / batching / IO:
- n_ubatch (alias ubatch) -- physical batch size; was previously
force-aliased to n_batch at line 482, blocking embedding/rerank
workloads that need independent control
- threads_batch (alias n_threads_batch) -- main-model batch threads;
mirrors the existing draft_threads_batch
- direct_io (alias use_direct_io) -- O_DIRECT model loads
- verbosity -- llama.cpp log threshold (line 479 had this commented
out)
- override_tensor (alias tensor_buft_overrides) -- per-tensor buffer
overrides for the main model; mirrors draft_override_tensor
Embedding / multimodal:
- pooling_type (alias pooling) -- mean/cls/last/rank/none; previously
only auto-flipped to RANK for rerankers
- embd_normalize (alias embedding_normalize) -- and the embedding
handler now reads params_base.embd_normalize instead of a hardcoded
2 at the previous embd_normalize literal in Embedding()
- mmproj_use_gpu (alias mmproj_offload) -- mmproj on CPU vs GPU
- image_min_tokens / image_max_tokens -- per-image vision token budget
Reasoning surface (the audit-focus three; LocalAI's existing
ReasoningConfig.DisableReasoning only feeds the per-request
chat_template_kwargs.enable_thinking and does not touch any of these):
- reasoning_format -- none/auto/deepseek/deepseek-legacy parser
- enable_reasoning (alias reasoning_budget) -- -1/0/>0 thinking budget
- prefill_assistant -- trailing-assistant-message prefill toggle
All 14 referenced fields exist on both the upstream pin and the
turboquant fork's common.h, so no LOCALAI_LEGACY_LLAMA_CPP_SPEC guard
is needed.
Docs: extend model-configuration.md with new "Reasoning Models",
"Multimodal Backend Options", "Embedding & Reranking Backend Options",
and "Other Backend Tuning Options" subsections; also refresh the
Speculative Type Values table to show the new dash-separated canonical
names alongside the underscore aliases LocalAI still accepts.
Assisted-by: claude-code:claude-opus-4-7
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
* ⬆️ Update ggml-org/llama.cpp
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* fix(llama-cpp): adapt to upstream COMMON_SPECULATIVE_TYPE_DRAFT rename
ggml-org/llama.cpp#22964 ("spec: update CLI arguments for better
consistency") renamed the speculative type enum values:
COMMON_SPECULATIVE_TYPE_DRAFT -> COMMON_SPECULATIVE_TYPE_DRAFT_SIMPLE
COMMON_SPECULATIVE_TYPE_EAGLE3 -> COMMON_SPECULATIVE_TYPE_DRAFT_EAGLE3
and the registered name strings flipped from underscore- to dash-
separated form (e.g. ngram_simple -> ngram-simple), with the bare
draft/eagle3 aliases replaced by draft-simple/draft-eagle3.
This broke the build with the new LLAMA_VERSION on every variant
(vulkan/arm64, darwin and likely all the rest) at grpc-server.cpp:461.
Update the upstream branch of the speculative-type fallback to use the
new identifier (the LOCALAI_LEGACY_LLAMA_CPP_SPEC fork branch keeps the
old name), and normalize spec_type option tokens before passing them to
common_speculative_types_from_names so existing model configs that say
spec_type:draft / spec_type:ngram_simple keep working.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: claude-code:claude-opus-4-7
---------
Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>