Compare commits

21 Commits

Author SHA1 Message Date
Ettore Di Giacinto
bd4d166c6f fix(ik-llama): align json alias to ordered_json to resolve mtmd.h conflict (#10534)
mtmd.h declares `using json = nlohmann::ordered_json` at global scope (and its
mtmd.cpp depends on it), while ik_llama's whole server/common stack also uses
ordered_json. Our grpc-server.cpp/utils.hpp kept a plain `nlohmann::json` alias,
which now collides with mtmd.h once it is included for the multimodal port:
"conflicting declaration 'using json = ...'". Switch our two aliases to
ordered_json to match; it is API-compatible (utils.hpp already used ordered_json
for its log helper) and our json never crosses into an unordered-json API.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
2026-06-27 23:41:50 +00:00
Ettore Di Giacinto
c67b511ac0 fix(ik-llama): port multimodal path to mtmd API and bump to f96eaddb (#10534)
The IK_LLAMA_VERSION bump to f96eaddba8bed6a9a5e628bbf6a566775c70b49c pulls in
upstream commit "Prune examples/llava", which deletes examples/llava (clip.* /
llava.*). The ik-llama backend's grpc-server.cpp built a local `myclip` library
from those files and called the removed clip/llava C API, so the bump no longer
builds.

ik_llama keeps its multimodal stack in the surviving `mtmd` library
(examples/mtmd/, public headers mtmd.h + mtmd-helper.h). This ports the backend's
multimodal path onto the high-level mtmd_* / mtmd_helper_* API in place, leaving
the text path (which still uses ik_llama's retained old common API) untouched:

- Makefile: bump IK_LLAMA_VERSION to f96eaddb.
- prepare.sh: drop the clip/llava source copy + sed block; mtmd is a library
  target, no source copy needed.
- CMakeLists.txt: remove the `myclip` target; link `mtmd` and add its include
  dir; build grpc-server as C++17 (mtmd headers require it).
- patches: drop 0002 (targeted the deleted examples/llava/clip.cpp; the mtmd
  clip.cpp never calls ggml_quantize_chunk, so the fix is unneeded). Keep 0001
  (verified still applies).
- grpc-server.cpp / utils.hpp: replace clip_model_load + clip_image_load_from_bytes
  + llava_image_embed_make_with_clip_img + the manual [img-N] prefix splitting and
  per-image llava_embd_batch decode loop with mtmd_init_from_file (moved after the
  model load, which it requires), mtmd_helper_bitmap_init_from_buf, mtmd_tokenize
  and mtmd_helper_eval_chunks. Legacy [img-N] tags are translated, in order, into
  mtmd media markers (mtmd_default_marker()); the post-image suffix text stays on
  the normal token path so the sampling loop is unchanged.

Supersedes #10534.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
2026-06-27 22:57:30 +00:00
LocalAI [bot]
1154be5eea fix(config): fall back to DefaultContextSize for unparseable GGUFs; pin NVFP4 gallery context_size (#10563)
The GGUF metadata parser (gpustack/gguf-parser-go) cannot read NVFP4-quantized
GGUFs at all: it errors with "read tensor info 0: This quantized type is
currently unsupported" because NVFP4 is a ggml tensor type it does not know.
When ParseGGUFFile errors, the llama-cpp defaults hook skips guessGGUFFromFile
entirely and the deferred fallback sets the context window to the conservative
GGUFFallbackContextSize (1024). The result: a model trained with a 262144-token
context runs with n_ctx=1024, and every prompt over ~1k tokens fails with
"request (N tokens) exceeds the available context size (1024 tokens)".

Two changes:

- Drop GGUFFallbackContextSize (1024) and fall back to DefaultContextSize
  (4096) in both the GGUF run-estimate path (gguf.go) and the deferred hook
  fallback (hooks_llamacpp.go). 1024 is a sensible floor for a tiny CPU GGUF
  but a footgun for a large, long-context model whose header simply cannot be
  parsed. Strengthen the existing "GGUF unreadable" test to assert the value.

- Set context_size explicitly on the four NVFP4 gallery entries
  (qwen3.6-35b-a3b-nvfp4-mtp, qwopus3.6-27b-v2-mtp-nvfp4,
  qwopus3.6-27b-coder-mtp-nvfp4, qwen3.6-27b-nvfp4-mtp) so the parser failure
  is irrelevant for them. 32768 matches sibling Qwen entries and is safe on
  memory; operators can raise it toward the 262144 train length.
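
The fallback order described above can be sketched roughly as follows. This is a minimal illustration, not LocalAI's actual code: `resolveContextSize`, its parameters and `errUnsupportedQuant` are hypothetical names, and the assumption that a successfully parsed value is used directly is mine. Only the 4096 default and the quoted parser error come from the commit message.

```go
package main

import (
	"errors"
	"fmt"
)

// defaultContextSize mirrors the DefaultContextSize (4096) named in the commit.
const defaultContextSize = 4096

// errUnsupportedQuant stands in for the parser error quoted in the commit.
var errUnsupportedQuant = errors.New("read tensor info 0: This quantized type is currently unsupported")

// resolveContextSize is a hypothetical helper. An explicit context_size wins;
// otherwise a successfully parsed value is used (assumption); if the GGUF
// could not be parsed, the result is defaultContextSize rather than 1024.
func resolveContextSize(configured, parsed int, parseErr error) int {
	if configured > 0 {
		return configured
	}
	if parseErr == nil && parsed > 0 {
		return parsed
	}
	return defaultContextSize
}

func main() {
	// Unparseable NVFP4 GGUF with nothing pinned: falls back to 4096.
	fmt.Println(resolveContextSize(0, 0, errUnsupportedQuant))
	// Gallery entry pins context_size, so the parser failure is irrelevant.
	fmt.Println(resolveContextSize(32768, 0, errUnsupportedQuant))
}
```

Pinning `context_size` in the gallery entry takes the first branch, which is why the parser failure no longer matters for those four models.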


Assisted-by: Claude:claude-opus-4-8 [Claude Code]

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-27 23:34:52 +02:00
LocalAI [bot]
8aba4fdba3 chore(fish-speech): drop the darwin/metal build target (#10561)
The fish-speech metal-darwin-arm64 backend build has been failing on every
release (v4.5.3, v4.5.4, v4.5.5) and is a standing red on the darwin backend
matrix. fish-speech pulls `tokenizers` transitively from its upstream source
(`pip install -e fish-speech-src`), and on darwin/arm64 there is no prebuilt
wheel for the pinned old `tokenizers` version, so pip builds it from source.
Modern rustc rejects that old crate as a hard error:

    error: casting `&T` to `&mut T` is undefined behavior ...
       --> tokenizers-lib/src/models/bpe/trainer.rs:517:47
       = note: `#[deny(invalid_reference_casting)]` on by default
    error: could not compile `tokenizers` (lib) due to 1 previous error

This is deterministic, not a flake, and there is no clean fix that does not
either pin a stale Rust toolchain or downgrade a soundness lint guarding real
UB. Until upstream fish-speech moves to a tokenizers version that compiles on
current toolchains, drop darwin support so the release backend build stays
green. The Linux/CUDA/ROCm/Intel/L4T variants are unaffected.

Removes:
- the `-metal-darwin-arm64-fish-speech` entry from `includeDarwin` in
  backend-matrix.yml
- the `metal:` capability mappings and the concrete `metal-fish-speech` /
  `metal-fish-speech-development` gallery entries in backend/index.yaml
- the now-unused darwin-only requirements-mps.txt

Assisted-by: Claude:claude-opus-4-8 [Claude Code]

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-27 23:24:21 +02:00
LocalAI [bot]
d7d7721eae feat(distributed): SyncedMap component + migrate finetune/quant/agent-tasks to cross-replica state (#10542)
* feat(distributed): add SyncedMap cross-replica in-memory state component

Introduce core/services/syncstate.SyncedMap[K,V]: a thread-safe in-memory map
that keeps itself consistent across frontend replicas via NATS, with an optional
pluggable durable Store and hydrate-from-source convergence.

Several features keep process-local state surfaced to the API (finetune/quant
jobs, agent tasks, model configs) and each hand-wired the same in-memory + NATS
broadcast + read-through-store legs - or forgot to, reintroducing cross-replica
staleness. SyncedMap makes that consistency a configuration choice:

- local writes mutate the map, write through the Store, then broadcast a delta;
- the apply path is memory-only and never re-publishes or re-writes the Store
  (structural echo-loop guard, mirroring galleryop.mergeStatus);
- on Start and on NATS reconnect the map re-hydrates from the source (Store, else
  Loader); an optional periodic Reconcile repairs silent drift;
- standalone mode (nil NATS client) is a strict in-memory no-op.

Reconnect re-hydrate is wired via a new *messaging.Client.OnReconnect callback,
consumed through an optional type-assertion so MessagingClient stays minimal.
Adds messaging.SubjectSyncStateDelta and a reusable testutil.FakeBus (synchronous
in-process MessagingClient with wildcard matching) for adopter tests.

Component only; service migrations follow in subsequent commits.
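
As a rough illustration of the write path versus the apply path described above, here is a heavily simplified sketch. It is not the real `syncstate.SyncedMap`: the `Store` interface shape, the `delta` type and the `publish` hook are assumptions standing in for the durable Store and the NATS broadcast, and hydrate, reconcile and reconnect handling are omitted.

```go
package main

import (
	"fmt"
	"sync"
)

// Store is a hypothetical write-through backend (the real Store is pluggable).
type Store[K comparable, V any] interface {
	Put(key K, value V) error
}

// delta is a hypothetical message broadcast to peer replicas.
type delta[K comparable, V any] struct {
	Key   K
	Value V
}

type syncedMap[K comparable, V any] struct {
	mu      sync.RWMutex
	data    map[K]V
	store   Store[K, V]       // nil means no durable store
	publish func(delta[K, V]) // nil means standalone mode
}

func newSyncedMap[K comparable, V any](store Store[K, V]) *syncedMap[K, V] {
	return &syncedMap[K, V]{data: make(map[K]V), store: store}
}

// Set is the local write path: mutate the map, write through, then broadcast.
func (m *syncedMap[K, V]) Set(key K, value V) error {
	m.mu.Lock()
	m.data[key] = value
	m.mu.Unlock()
	if m.store != nil {
		if err := m.store.Put(key, value); err != nil {
			return err
		}
	}
	if m.publish != nil {
		m.publish(delta[K, V]{Key: key, Value: value})
	}
	return nil
}

// apply is the remote path: memory only. It never re-publishes and never
// re-writes the store, which is the structural echo-loop guard.
func (m *syncedMap[K, V]) apply(d delta[K, V]) {
	m.mu.Lock()
	m.data[d.Key] = d.Value
	m.mu.Unlock()
}

// Get reads the in-memory view.
func (m *syncedMap[K, V]) Get(key K) (V, bool) {
	m.mu.RLock()
	defer m.mu.RUnlock()
	v, ok := m.data[key]
	return v, ok
}

func main() {
	a := newSyncedMap[string, string](nil)
	b := newSyncedMap[string, string](nil)
	a.publish = b.apply // stand-in for a NATS subject between two replicas
	b.publish = a.apply
	_ = a.Set("job-1", "running")
	v, ok := b.Get("job-1")
	fmt.Println(v, ok)
}
```

Because `apply` has no publish step, wiring the two replicas to each other cannot bounce a delta back and forth.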

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* refactor(finetune): back jobs with SyncedMap for cross-replica consistency

FineTuneService kept jobs in a process-local map and, although it wrote them to
Postgres, ListJobs/GetJob never read the store back and the wired natsClient was
never used - so in distributed mode a job created on one replica was invisible to
the others. Replace the map and the dead client with a syncstate.SyncedMap keyed
by job ID, value *schema.FineTuneJob (the exact REST shape, so responses are
unchanged).

- Add a Store adapter (core/services/finetune/syncstore.go) over FineTuneStore,
  plus FineTuneStore.ListAll (global hydrate; per-user List kept) and an
  idempotent Upsert (create-or-update; Create alone fails on dup key).
- Writes go through SyncedMap.Set/Delete (write-through + broadcast); reads use
  List/Get. The on-disk state.json path becomes the standalone Loader, keeping
  single-node restart recovery (stale->stopped / exporting->failed fixups).
- Fold SetNATSClient/SetFineTuneStore into NewFineTuneService; app.go passes the
  distributed NATS client + store when distributed, nil otherwise.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* refactor(agentpool): back agent tasks with SyncedMap for cross-replica consistency

AgentJobService.ListTasks read the process-local tasks map only, while ListJobs
already read through the DB persister + dispatcher NATS - so in distributed mode
a task created on one replica was invisible to the others. Back tasks with a
syncstate.SyncedMap keyed by task ID (value schema.Task, the exact REST shape);
jobs are left untouched.

- Store adapter (task_syncstore.go) over the existing JobPersister
  (LoadTasks/SaveTask/DeleteTask); reads svc.persister/userID live so a persister
  swap needs no rebuild. No new persister methods required.
- Task reads -> SyncedMap.List/Get; create/update -> Set (write-through +
  broadcast); delete -> Delete. The file persister now owns its own task set so
  the write-through path does not re-enter the SyncedMap lock (deadlock guard).
- The distributed NATS client is not available at construction (start() precedes
  initDistributed), so it is injected via SetTaskSyncNATS, which rebuilds the
  still-empty map before Start/hydrate. Wired at the main, restart, and per-user
  (UserServicesManager) distributed sites.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* refactor(quantization): back jobs with SyncedMap + durable QuantStore

QuantizationService kept jobs in a process-local map persisted only to a local
state.json, so in distributed mode jobs were neither visible across replicas nor
durable cluster-wide. Back jobs with a syncstate.SyncedMap keyed by job ID
(value *schema.QuantizationJob, the exact REST shape).

- New distributed.QuantStore (GORM, table quantization_jobs) mirroring
  FineTuneStore: Create/Get/ListAll/Upsert(idempotent)/Delete, registered for
  AutoMigrate via distributed.InitStores (Stores.Quant).
- New adapter (quantization/syncstore.go) over QuantStore implementing
  syncstate.Store, with record<->schema conversion.
- Reads go through List/Get, writes through Set/Delete (write-through +
  broadcast); state.json is kept as the standalone Loader for single-node restart
  recovery (stale-job fixups preserved).
- app.go passes the distributed NATS client + QuantStore when distributed, nil
  otherwise; Start/Close lifecycle mirrors finetune.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* fix(syncstate): annotate gosec G118 false positive on lifeCtx

gosec flagged the WithCancel in Start as "cancellation function not called"
because the returned cancel is stored on the struct rather than called/deferred
in scope. It is invoked in Close (covered by tests), and lifeCtx must outlive
Start to drive the reconnect/reconcile goroutines. Suppress the verified false
positive with a justified #nosec G118.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* test(distributed): e2e two-replica SyncedMap sync over real NATS + Postgres

Adds the real-infrastructure counterpart to the fake-bus unit tests, in the
existing distributed e2e suite (testcontainers NATS + PostgreSQL). Two SyncedMap
instances stand in for two frontend replicas - each with its OWN NATS connection
to a shared server and a SHARED Postgres store (the distributed-mode invariant) -
and assert, over the wire:

- a create on replica A is observed by replica B;
- an update and a delete propagate A -> B (delete prunes, which a reload cannot);
- a late-joining replica recovers a job it never received a delta for, via store
  hydrate on Start (the at-most-once gap a fake bus cannot exercise);
- a local Set is written through to the shared Postgres store.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-27 23:23:51 +02:00
Nicholas Ciechanowski
c548150f99 fix(distributed): missing agent NATS permission (#10549)
Signed-off-by: Nicholas Ciechanowski <nicholas@ciech.anow.ski>
2026-06-27 21:10:12 +00:00
LocalAI [bot]
ec26b86dd4 docs: ⬆️ update docs version mudler/LocalAI (#10560)
⬆️ Update docs version mudler/LocalAI

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-27 22:36:02 +02:00
LocalAI [bot]
d11b202dd2 fix(backends): whisper darwin run.sh loads whichever fallback lib exists (.so/.dylib) (#10553)
fix(backends): whisper darwin run.sh loads whichever fallback lib exists

The macOS branch hardcoded WHISPER_LIBRARY=$CURDIR/libgowhisper-fallback.dylib,
but the cmake build emits a Mach-O named libgowhisper-fallback.so on darwin, so
the Go loader panicked at runtime ("dlopen ...dylib: no such file") and the
backend exited ("grpc service not ready") — breaking e.g. the silero-vad-ggml
VAD on darwin. Pick whichever of .dylib/.so is present so it is robust to the
build's naming either way.

Assisted-by: Claude:claude-opus-4-8

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-27 14:07:56 +02:00
LocalAI [bot]
e95018ef70 chore(model gallery): 🤖 add 1 new models via gallery agent (#10544)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-27 09:42:46 +02:00
LocalAI [bot]
0258f8af55 fix(backends): repair release CI build/test breaks (kokoros, fish-speech, llama-cpp-quantization, sglang) (#10547)
* fix(kokoros): implement new Backend RPCs to fix the build

The backend.proto grew five RPCs (SoundDetection, Depth, TokenClassify,
Score and the bidi-streaming Forward) that the kokoros gRPC service never
implemented, so the trait impl no longer satisfies `Backend`:

    error[E0046]: not all trait items implemented, missing:
      `sound_detection`, `depth`, `token_classify`, `score`,
      `ForwardStream`, `forward`

kokoros is a TTS backend with no use for these, so add `unimplemented`
stubs (plus the `ForwardStream` associated type) matching the existing
pattern for every other unsupported RPC in this file.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* fix(fish-speech): add setuptools-rust for the editable source install

install.sh installs the fish-speech source tree editable with
`--no-build-isolation`, which means the build backends of its transitive
dependencies must already be present in the venv. One of them builds a
Rust extension and its metadata step fails with:

    ModuleNotFoundError: No module named 'setuptools_rust'

Add setuptools-rust to requirements.txt so installRequirements provisions
it before the editable install runs.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* fix(llama-cpp-quantization): vendor convert_hf_to_gguf.py with conversion/

Upstream llama.cpp split the model-specific logic out of the single
convert_hf_to_gguf.py file into a sibling `conversion/` package, so the
script now starts with `from conversion import ...`. Downloading just the
one file therefore fails at runtime with:

    ModuleNotFoundError: No module named 'conversion'

Clone the repo (reusing the clone already needed to build llama-quantize)
and copy both the script and the `conversion/` package into the backend
dir. Python puts the script's own directory on sys.path[0], so the package
resolves when it sits beside the script.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* fix(sglang): pin the CPU source build to sglang v0.5.11

The CPU profile builds sgl-kernel from a `git clone` of sglang with no
ref, so it always tracks master. Recent master added CPU kernels (e.g.
mamba/fla.cpp) that fail to compile in our builder:

    constexpr variable 'scale' must be initialized by a constant
    static library kineto_LIBRARY-NOTFOUND not found

Pin the clone to v0.5.11, the same release the GPU path already floors on
(requirements-cublas12-after.txt). Overridable via SGLANG_VERSION so the
pin can be bumped deliberately.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-27 09:42:22 +02:00
LocalAI [bot]
14b29ebf4e fix(backends): derive darwin RUN_BINARY from the exec line only (#10541)
golang-darwin.sh's packaging check derived the launch binary by grepping every
$CURDIR/... reference in run.sh and taking the last one. Backends that pick a
runtime CPU variant assign it via unquoted `LIBRARY=$CURDIR/libgo<x>-avx512.so`
lines, so the heuristic returned `libgo<x>-avx512.so` — a variant Darwin never
builds (arm64 builds only fallback) — and the check then failed with
"package/libgo<x>-avx512.so not found ... refusing to package (#10267)",
breaking the darwin builds for whisper, sam3-cpp, vibevoice-cpp and friends.

Scan only the `exec` line(s) (the actual launch contract) and tolerate a
quoted `exec "$CURDIR"/<binary>`. parakeet-cpp's parakeet-cpp-grpc and the
quoted-only backends (sherpa/piper/opus) resolve correctly; no Linux change.
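
The actual check lives in a shell script (golang-darwin.sh); the idea of scanning only the `exec` line can be sketched in Go as below. The `runBinary` helper, the regular expression and the sample file names are illustrative assumptions, not the script's real implementation, and taking the last `exec` match is also an assumption.

```go
package main

import (
	"fmt"
	"regexp"
)

// execLine matches `exec $CURDIR/<binary>`, `exec "$CURDIR"/<binary>` and
// `exec "$CURDIR/<binary>"`, capturing the binary name. Plain assignments
// such as LIBRARY=$CURDIR/... are deliberately not matched.
var execLine = regexp.MustCompile(`(?m)^\s*exec\s+"?\$CURDIR"?/([^\s"]+)`)

// runBinary is a hypothetical rendering of the packaging check: it returns
// the binary named on the last exec line of a run.sh script, if any.
func runBinary(script string) (string, bool) {
	matches := execLine.FindAllStringSubmatch(script, -1)
	if len(matches) == 0 {
		return "", false
	}
	return matches[len(matches)-1][1], true
}

func main() {
	// Sample run.sh content; the file names are made up for illustration.
	script := "LIBRARY=$CURDIR/libgowhisper-avx512.so\n" +
		"exec $CURDIR/whisper \"$@\"\n"
	fmt.Println(runBinary(script))
}
```

With the old heuristic the variant library on the first line would have been picked up; restricting the scan to `exec` lines ignores it.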

Assisted-by: Claude:claude-opus-4-8

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-27 02:05:40 +02:00
LocalAI [bot]
f0d0bff232 fix(llama-cpp): stop reinterpreting plain-string message content as JSON (#10524) (#10538)
The llama-cpp gRPC backend reconstructs OpenAI messages from proto for the
tokenizer-template path and blindly json::parse'd each message's content
string. LocalAI's Go layer always flattens content to a plain string, so a
user prompt that merely looks like JSON (e.g. mealie's ingredient array
["1/4 cup brown sugar", ...]) was reinterpreted as structured content parts and
rejected by oaicompat_chat_params_parse with "unsupported content[].type".

Normalize content per role instead: user/system/developer content is opaque
text and is never JSON-sniffed; assistant/tool content still collapses a literal
JSON null/object (tool-call bookkeeping) to a string, but a plain string is
never turned into an array/scalar. The array defense is role-independent, so the
role gate only governs the benign null/object case.
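
The real helpers are C++ (message_content.h). As a rough Go rendering of the role gate only, under the assumption that "collapsing" a literal null means mapping it to an empty string (as in #7324) and that everything else stays verbatim, one might write the following. `normalizeContent` is a hypothetical name, and the object and array handling of the real helpers is not modelled here.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeContent is a hypothetical sketch of the per-role rule.
// user/system/developer content is opaque text and is returned untouched.
// For assistant/tool content, a literal JSON null becomes an empty string
// (assumption based on the #7324 behaviour); any other string is kept as is.
func normalizeContent(role, content string) string {
	switch role {
	case "assistant", "tool":
		if strings.TrimSpace(content) == "null" {
			return ""
		}
	}
	// Text that merely looks like JSON is never reinterpreted.
	return content
}

func main() {
	// A user prompt that looks like a JSON array stays a plain string.
	fmt.Println(normalizeContent("user", `["1/4 cup brown sugar"]`))
	// Assistant tool-call bookkeeping: null content collapses to "".
	fmt.Printf("%q\n", normalizeContent("assistant", "null"))
}
```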

While here, extract the duplicated per-message reconstruction and the
pre-template content sanitization into shared, unit-tested helpers
(message_content.h) so the streaming (PredictStream) and non-streaming (Predict)
paths cannot drift. This removes ~490 lines of copy-pasted defensive code, the
dead tool-role parse branches, and the redundant Predict-only tool_calls branch,
while preserving the prior #7324 (null content -> "") and #7528 (tool array
content -> string) fixes.

Tests:
- backend/cpp/llama-cpp/message_content_test.cpp: standalone C++ unit tests for
  all three helpers (#10524, #7324, #7528, multimodal), discovered and run by
  `make test-backend-cpp` and a new generic tests-backend-cpp CI job. Also wired
  as an opt-in CMake/ctest target (-DLLAMA_GRPC_BUILD_TESTS=ON).
- core/schema/message_test.go: Go regression pinning that ToProto flattens a
  JSON-array-looking text part to the verbatim string.
- prepare.sh now copies message_content.h into the build tree.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-27 01:42:05 +02:00
LocalAI [bot]
64150ca7ab fix(distributed): broadcast admin model-config changes across replicas (#10540)
In distributed mode the admin model endpoints (/models/edit, /models/import,
/models/toggle-state and the PATCH config-json endpoint) wrote the YAML to the
shared models dir but reloaded only the local replica's in-memory
ModelConfigLoader. With multiple frontend replicas behind one service, a save
landed on whichever replica handled the request; peers kept serving their stale
in-memory view, so a load-balanced request was a coin-flip between old and new
config (a created alias visible on one replica and missing on the other, an
edited alias target diverging, etc.).

The NATS cache-invalidation channel (SubjectCacheInvalidateModels +
OnModelsChanged) already existed for the gallery install/delete path; these
admin endpoints simply never published on it. Wire them up via a new
GalleryService.BroadcastModelsChanged helper (no-op in standalone mode).

Also fix delete propagation: LoadModelConfigsFromPath is additive and never
drops an entry whose file is gone, so the subscriber hook (which only reloaded
from disk) could not propagate a removal. ApplyRemoteChange now honors the
event op - pruning the element on "delete" and reloading otherwise - and shuts
down any running instance of the affected model so the new config takes effect.
This closes the same latent gap on the gallery delete path.
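
The difference between an additive reload and an op-aware apply can be sketched as follows. The `changeEvent` shape, the `configLoader` type and the `readFromDisk` hook are hypothetical; the real ApplyRemoteChange also shuts down running instances of the affected model, which this sketch leaves out, and it is not safe for concurrent use.

```go
package main

import "fmt"

// changeEvent is a hypothetical stand-in for the NATS invalidation payload.
type changeEvent struct {
	Op   string // "delete" or anything else (treated as a reload)
	Name string
}

// configLoader is a hypothetical in-memory view of the model configs.
type configLoader struct {
	configs      map[string]string
	readFromDisk func(name string) (string, bool)
}

// applyRemoteChange honors the event op: a delete prunes the entry, which an
// additive reload from disk could never do; any other op reloads the entry.
func (l *configLoader) applyRemoteChange(ev changeEvent) {
	if ev.Op == "delete" {
		delete(l.configs, ev.Name)
		return
	}
	if cfg, ok := l.readFromDisk(ev.Name); ok {
		l.configs[ev.Name] = cfg
	}
}

func main() {
	disk := map[string]string{"alias-a": "target: v2"}
	l := &configLoader{
		configs: map[string]string{"alias-a": "target: v1", "alias-b": "target: v1"},
		readFromDisk: func(name string) (string, bool) {
			cfg, ok := disk[name]
			return cfg, ok
		},
	}
	l.applyRemoteChange(changeEvent{Op: "update", Name: "alias-a"})
	l.applyRemoteChange(changeEvent{Op: "delete", Name: "alias-b"})
	fmt.Println(l.configs)
}
```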


Assisted-by: Claude:claude-opus-4-8 [Claude Code]

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-27 01:36:57 +02:00
LocalAI [bot]
f98b0f1c1e fix(gpu-libs): bundle transitive deps of GPU runtime libs (#10537) (#10539)
fix(gpu-libs): bundle transitive deps of GPU runtime libs

The per-vendor packagers in package-gpu-libs.sh copy an explicit allowlist
of top-level GPU runtime libraries (libamdhip64, libhipblas, librocblas, the
CUDA/Intel equivalents, ...) but never resolved their transitive
dependencies. Backends run through the bundled lib/ld.so with
LD_LIBRARY_PATH=lib, so any transitive dep not in the allowlist is a fatal
"cannot open shared object file" at load time.

On recent ROCm (base image rocm 7.2.1) the runtime libs link against
librocprofiler-register.so.0, which is not in the allowlist, so the rocm
llama-cpp backend (and every other GPU backend sharing this script) failed
to load with:

  librocprofiler-register.so.0: cannot open shared object file

The Vulkan path already solved this class of problem with copy_elf_deps
(ldd-based transitive resolution), but that sweep was only wired into the
Vulkan ICD path. This adds a generic sweep_transitive_deps that runs the
same ldd resolution over everything the allowlist already bundled, and wires
it into the ROCm, CUDA and Intel packagers. ldd returns the full recursive
closure, so one pass suffices; core libc-family deps are skipped via
is_core_lib so we never shadow the loader's own libc/libstdc++.
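
The sweep itself is shell (package-gpu-libs.sh). To illustrate the filtering step, here is a Go sketch that parses `ldd` output and keeps only resolved, non-core dependencies. The `coreLibPrefixes` list is an assumed stand-in for is_core_lib (the real list may differ), and the sample paths are made up.

```go
package main

import (
	"fmt"
	"strings"
)

// coreLibPrefixes is an assumption standing in for is_core_lib.
var coreLibPrefixes = []string{
	"libc.so", "libm.so", "libdl.so", "libpthread.so",
	"libstdc++.so", "libgcc_s.so", "ld-linux",
}

func isCoreLib(name string) bool {
	for _, p := range coreLibPrefixes {
		if strings.HasPrefix(name, p) {
			return true
		}
	}
	return false
}

// transitiveDeps parses ldd output and returns the resolved paths worth
// bundling. Resolved entries look like: name => /path/to/name (0xaddr).
// Unresolved entries ("=> not found"), the vDSO and the loader are skipped.
func transitiveDeps(lddOutput string) []string {
	var deps []string
	for _, line := range strings.Split(lddOutput, "\n") {
		fields := strings.Fields(line)
		if len(fields) < 3 || fields[1] != "=>" || !strings.HasPrefix(fields[2], "/") {
			continue
		}
		if isCoreLib(fields[0]) {
			continue
		}
		deps = append(deps, fields[2])
	}
	return deps
}

func main() {
	// Hypothetical ldd output for a ROCm runtime library.
	out := "\tlinux-vdso.so.1 (0x00007ffc)\n" +
		"\tlibrocprofiler-register.so.0 => /opt/rocm/lib/librocprofiler-register.so.0 (0x00007f00)\n" +
		"\tlibc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f01)\n"
	fmt.Println(transitiveDeps(out))
}
```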

Adds a self-contained regression test (gcc + ldd) that fabricates a primary
lib linking a transitive lib and asserts the sweep bundles the dependency.

Fixes #10537

Assisted-by: Claude:opus-4.8 [Claude Code]

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-27 01:36:33 +02:00
LocalAI [bot]
2c96c2d08e chore: ⬆️ Update mudler/parakeet.cpp to f469a57270a1cc4554acb15febf60e56619673b9 (#10530)
⬆️ Update mudler/parakeet.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-27 00:50:51 +02:00
LocalAI [bot]
f01a969f7b docs: ⬆️ update docs version mudler/LocalAI (#10531)
⬆️ Update docs version mudler/LocalAI

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-27 00:29:29 +02:00
LocalAI [bot]
56600eec3e fix(nodes): show a node's existing labels on the detail view (#10529)
fix(nodes): return labels in single-node GET so the detail view shows them

The node detail view (/app/nodes/:id) reads `node.labels` to render a
node's existing labels, but the single-node GET endpoint returned a bare
BackendNode whose Labels live in a separate table - so the list was always
empty and operators could only add labels, never see what was already set
(#10527). The same response also lacked in_flight_count and model_count.

Add NodeRegistry.GetWithExtras, mirroring the existing List vs ListWithExtras
split: bare Get stays cheap for the routing hot paths and existence checks,
while the detail endpoint uses the enriched variant to attach the labels map
and live counts. No frontend change is needed - the UI already renders
existing labels once the data is present.

Closes #10527


Assisted-by: Claude:claude-opus-4-8 [Claude Code]

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-26 23:06:42 +02:00
LocalAI [bot]
c4fa256cdf chore(model gallery): 🤖 add 1 new models via gallery agent (#10526)
chore(model gallery): 🤖 add new models via gallery agent

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-26 22:31:22 +02:00
LocalAI [bot]
17c1fc74b2 fix(backends): darwin packaging for silero-vad (last Linux-only Go backend) (#10528)
fix(backends): darwin packaging for silero-vad

silero-vad was the last Go backend whose packaging scripts only handled Linux,
which broke its darwin build:
- package.sh fell through to "Could not detect architecture" -> exit 1 on
  macOS (no Darwin branch), so its darwin image never packaged.
- run.sh exported LD_LIBRARY_PATH, which macOS dyld ignores, so the bundled
  libonnxruntime.dylib couldn't be found at runtime.

Add a Darwin branch to package.sh (skip the glibc/ld.so bundling; add an
@loader_path/lib rpath so @rpath resolves to package/lib/) and a
DYLD_LIBRARY_PATH branch to run.sh — mirroring the piper darwin fix (#10525).

Assisted-by: Claude:claude-opus-4-8

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-26 22:31:06 +02:00
LocalAI [bot]
068d397acf fix(backends): set rpath on the piper darwin binary so it can load its bundled libs (#10525)
The metal-darwin-arm64-piper backend crashed at launch on macOS:

    DYLD "Library missing"
      Library not loaded: @rpath/libucd.dylib
      Referenced from: .../piper
      Reason: no LC_RPATH's found

The piper binary links libucd, libespeak-ng, libpiper_phonemize and
libonnxruntime via @rpath, but ships with no LC_RPATH, so dyld cannot
expand @rpath and aborts before piper runs. The libraries themselves are
already bundled in package/lib/ by package.sh.

Additionally, package.sh's architecture detection only handled the Linux
glibc loaders (/lib64/ld-linux-x86-64.so.2, /lib/ld-linux-aarch64.so.1)
and otherwise hit `echo "Error: Could not detect architecture"; exit 1`,
so on macOS packaging failed outright.

Add a Darwin branch (before the Linux checks) that skips the glibc/ld.so
bundling macOS has no use for and instead runs
`install_name_tool -add_rpath @loader_path/lib` on the piper binary, so
@rpath resolves to the bundled package/lib/ directory.

Also mirror sherpa-onnx/opus in run.sh: export DYLD_LIBRARY_PATH on
Darwin (LD_LIBRARY_PATH is Linux-only) as a defensive fallback.

Validated by hand on Apple Silicon: with the rpath added, piper
synthesized a real WAV. The darwin build is validated in CI.

Assisted-by: Claude:claude-opus-4-8

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-26 15:10:15 +02:00
LocalAI [bot]
5b3572f8b8 feat(macos): sign and notarize the DMG, app, and server binary (#10510)
Produce a Gatekeeper-clean macOS distribution with no user workaround:

- Launcher DMG + the LocalAI.app inside it are built via fyne, codesigned
  with the Developer ID under the hardened runtime, then the DMG is signed,
  notarized (notarytool) and stapled. Replaces macos-dmg-creator (which had
  no signing hook) with fyne package + hdiutil so we control the .app before
  packaging.
- The bare local-ai darwin server binary is signed + notarized via
  GoReleaser's native notarize block (quill backend, runs on Linux).
- All signing is gated on secrets being present, so forks/PRs/local builds
  stay unsigned and green (contrib/macos/sign-and-notarize.sh no-ops).
- Add hardened-runtime entitlements and FyneApp.toml for deterministic
  packaging; update macOS install docs to drop the quarantine workaround.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-26 12:45:51 +02:00
91 changed files with 4070 additions and 1039 deletions

@@ -4991,9 +4991,6 @@ includeDarwin:
 - backend: "qwen-tts"
 tag-suffix: "-metal-darwin-arm64-qwen-tts"
 build-type: "mps"
-- backend: "fish-speech"
-tag-suffix: "-metal-darwin-arm64-fish-speech"
-build-type: "mps"
 - backend: "voxcpm"
 tag-suffix: "-metal-darwin-arm64-voxcpm"
 build-type: "mps"

@@ -24,6 +24,11 @@ jobs:
 args: release --clean
 env:
 GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+MACOS_SIGN_P12: ${{ secrets.MACOS_CERTIFICATE }}
+MACOS_SIGN_PASSWORD: ${{ secrets.MACOS_CERTIFICATE_PWD }}
+MACOS_NOTARY_KEY: ${{ secrets.MACOS_NOTARY_KEY }}
+MACOS_NOTARY_KEY_ID: ${{ secrets.MACOS_NOTARY_KEY_ID }}
+MACOS_NOTARY_ISSUER_ID: ${{ secrets.MACOS_NOTARY_ISSUER_ID }}
 launcher-build-darwin:
 runs-on: macos-latest
 steps:
@@ -35,9 +40,19 @@ jobs:
 uses: actions/setup-go@v5
 with:
 go-version: 1.23
-- name: Build launcher for macOS ARM64
-run: |
-make build-launcher-darwin
+- name: Import signing certificate
+env:
+MACOS_CERTIFICATE: ${{ secrets.MACOS_CERTIFICATE }}
+MACOS_CERTIFICATE_PWD: ${{ secrets.MACOS_CERTIFICATE_PWD }}
+MACOS_CI_KEYCHAIN_PWD: ${{ secrets.MACOS_CI_KEYCHAIN_PWD }}
+run: bash contrib/macos/sign-and-notarize.sh import-cert
+- name: Build, sign and notarize the DMG
+env:
+MACOS_SIGN_IDENTITY: ${{ secrets.MACOS_SIGN_IDENTITY }}
+MACOS_NOTARY_KEY: ${{ secrets.MACOS_NOTARY_KEY }}
+MACOS_NOTARY_KEY_ID: ${{ secrets.MACOS_NOTARY_KEY_ID }}
+MACOS_NOTARY_ISSUER_ID: ${{ secrets.MACOS_NOTARY_ISSUER_ID }}
+run: make release-launcher-darwin
 - name: Upload DMG to Release
 uses: softprops/action-gh-release@v3
 with:

@@ -121,3 +121,19 @@ jobs:
 detached: true
 connect-timeout-seconds: 180
 limit-access-to-actor: true
+# Fast standalone unit tests for the backends' pure C++ helpers - currently the
+# llama-cpp message reconstruction (backend/cpp/llama-cpp/message_content.h),
+# which guards the OpenAI chat content normalization (mudler/LocalAI#10524,
+# #7324, #7528). The runner discovers every *_test.cpp under backend/cpp/, so
+# new pure-C++ unit tests are picked up with no CI changes. These need only the
+# C++ stdlib + nlohmann/json, so they run on every PR without the full
+# llama.cpp + gRPC backend build. (The same suite is also wired as an opt-in
+# CMake/ctest target, -DLLAMA_GRPC_BUILD_TESTS=ON, for in-backend-build runs.)
+tests-backend-cpp:
+runs-on: ubuntu-latest
+steps:
+- name: Clone
+uses: actions/checkout@v7
+- name: Run backend C++ unit tests
+run: make test-backend-cpp

.gitignore

@@ -94,3 +94,6 @@ core/http/react-ui/test-results/
# SDD / brainstorm scratch (agent-driven development)
.superpowers/
# Local Apple signing material (never commit)
.certs/

@@ -9,7 +9,8 @@ source:
enabled: true
name_template: '{{ .ProjectName }}-{{ .Tag }}-source'
builds:
- main: ./cmd/local-ai
- id: local-ai
main: ./cmd/local-ai
env:
- CGO_ENABLED=0
ldflags:
@@ -35,3 +36,19 @@ snapshot:
version_template: "{{ .Tag }}-next"
changelog:
use: github-native
# Sign + notarize the macOS server binary via the quill backend (runs on Linux,
# no macOS runner needed). Disabled automatically when MACOS_SIGN_P12 is unset
# (forks / PRs), so those builds stay unsigned and green.
notarize:
macos:
- enabled: '{{ isEnvSet "MACOS_SIGN_P12" }}'
ids:
- local-ai
sign:
certificate: "{{.Env.MACOS_SIGN_P12}}"
password: "{{.Env.MACOS_SIGN_PASSWORD}}"
notarize:
issuer_id: "{{.Env.MACOS_NOTARY_ISSUER_ID}}"
key_id: "{{.Env.MACOS_NOTARY_KEY_ID}}"
key: "{{.Env.MACOS_NOTARY_KEY}}"
wait: true

@@ -103,7 +103,7 @@ COVERAGE_E2E_LABELS?=!real-models
COVERAGE_EXCLUDE_RE?=grpc/proto/.*[.]pb[.]go
.PHONY: all test test-coverage test-coverage-baseline test-coverage-check test-ui test-ui-coverage-baseline test-ui-coverage-check install-hooks build vendor lint lint-all
.PHONY: all test test-coverage test-coverage-baseline test-coverage-check test-backend-cpp test-ui test-ui-coverage-baseline test-ui-coverage-check install-hooks build vendor lint lint-all
all: help
@@ -201,6 +201,13 @@ test: prepare-test
OPUS_SHIM_LIBRARY=$(abspath ./pkg/opus/shim/libopusshim.so) \
$(GOCMD) run github.com/onsi/ginkgo/v2/ginkgo --flake-attempts $(TEST_FLAKES) --fail-fast -v -r $(TEST_PATHS)
## Compiles and runs the standalone C++ unit tests for the backends (pure
## helpers that depend only on the stdlib + nlohmann/json, no full backend
## build). Discovers every *_test.cpp under backend/cpp/ - see
## backend/cpp/run-unit-tests.sh. Set NLOHMANN_INCLUDE to skip the header fetch.
test-backend-cpp:
bash backend/cpp/run-unit-tests.sh
## Runs the core suite ($(TEST_PATHS)) with statement-coverage instrumentation
## and writes a merged profile to $(COVERAGE_PROFILE). Deliberately omits
## --fail-fast so a single failure doesn't truncate the coverage number, and
@@ -1453,13 +1460,32 @@ docs: docs/static/gallery.html
########################################################
## fyne cross-platform build
build-launcher-darwin: build-launcher
go run github.com/tiagomelo/macos-dmg-creator/cmd/createdmg@latest \
--appName "LocalAI" \
--appBinaryPath "$(LAUNCHER_BINARY_NAME)" \
--bundleIdentifier "com.localai.launcher" \
--iconPath "core/http/static/logo.png" \
--outputDir "dist/"
# Build LocalAI.app from the launcher via fyne (metadata read from cmd/launcher/FyneApp.toml).
# Signing happens via contrib/macos/sign-and-notarize.sh, which is a no-op when the signing
# secrets are unset, so unsigned local/fork builds keep working.
build-launcher-darwin:
rm -rf dist/LocalAI.app cmd/launcher/LocalAI.app
mkdir -p dist
cd cmd/launcher && go run fyne.io/tools/cmd/fyne@latest package -os darwin -icon ../../core/http/static/logo.png --executable $(LAUNCHER_BINARY_NAME)
mv cmd/launcher/LocalAI.app dist/LocalAI.app
bash contrib/macos/sign-and-notarize.sh sign dist/LocalAI.app
# Wrap the (signed) app into a drag-to-Applications DMG via hdiutil, then sign the DMG.
dmg-launcher-darwin: build-launcher-darwin
rm -rf dist/dmg dist/LocalAI.dmg
mkdir -p dist/dmg
cp -R dist/LocalAI.app dist/dmg/LocalAI.app
ln -s /Applications dist/dmg/Applications
hdiutil create -volname "LocalAI" -srcfolder dist/dmg -ov -format UDZO dist/LocalAI.dmg
bash contrib/macos/sign-and-notarize.sh sign dist/LocalAI.dmg
# Submit the DMG to Apple notarization and staple the ticket (no-op without notary secrets).
notarize-launcher-darwin: dmg-launcher-darwin
bash contrib/macos/sign-and-notarize.sh notarize dist/LocalAI.dmg
# Single entrypoint for CI: build -> sign app -> dmg -> sign dmg -> notarize -> staple.
release-launcher-darwin: notarize-launcher-darwin
@echo "dist/LocalAI.dmg is ready"
build-launcher-linux:
cd cmd/launcher && go run fyne.io/tools/cmd/fyne@latest package -os linux -icon ../../core/http/static/logo.png --executable $(LAUNCHER_BINARY_NAME)-linux && mv launcher.tar.xz ../../$(LAUNCHER_BINARY_NAME)-linux.tar.xz
cd cmd/launcher && go run fyne.io/tools/cmd/fyne@latest package -os linux -icon ../../core/http/static/logo.png --executable $(LAUNCHER_BINARY_NAME)-linux && mv LocalAI.tar.xz ../../$(LAUNCHER_BINARY_NAME)-linux.tar.xz

@@ -1,15 +1,6 @@
## Clip/LLaVA library for multimodal support — built locally from copied sources
set(TARGET myclip)
add_library(${TARGET} clip.cpp clip.h llava.cpp llava.h)
install(TARGETS ${TARGET} LIBRARY)
target_include_directories(myclip PUBLIC .)
target_include_directories(myclip PUBLIC ../..)
target_include_directories(myclip PUBLIC ../../common)
target_link_libraries(${TARGET} PRIVATE common ggml llama ${CMAKE_THREAD_LIBS_INIT})
target_compile_features(${TARGET} PRIVATE cxx_std_11)
if (NOT MSVC)
target_compile_options(${TARGET} PRIVATE -Wno-cast-qual)
endif()
## Multimodal support is provided by the in-tree `mtmd` library target
## (examples/mtmd/), which the grpc-server links and includes below. clip/llava
## were pruned upstream; the high-level mtmd_* / mtmd_helper_* API is used instead.
set(TARGET grpc-server)
set(CMAKE_CXX_STANDARD 17)
@@ -67,12 +58,16 @@ add_library(hw_grpc_proto
${hw_proto_hdrs} )
add_executable(${TARGET} grpc-server.cpp json.hpp)
target_link_libraries(${TARGET} PRIVATE common llama myclip ${CMAKE_THREAD_LIBS_INIT} absl::flags hw_grpc_proto
# mtmd public headers (mtmd.h / mtmd-helper.h) live in examples/mtmd/.
# Linking the mtmd target also propagates this include dir, but we add it
# explicitly for clarity.
target_include_directories(${TARGET} PRIVATE ../mtmd)
target_link_libraries(${TARGET} PRIVATE common llama mtmd ${CMAKE_THREAD_LIBS_INIT} absl::flags hw_grpc_proto
absl::flags_parse
gRPC::${_REFLECTION}
gRPC::${_GRPC_GRPCPP}
protobuf::${_PROTOBUF_LIBPROTOBUF})
target_compile_features(${TARGET} PRIVATE cxx_std_11)
target_compile_features(${TARGET} PRIVATE cxx_std_17)
if(TARGET BUILD_INFO)
add_dependencies(${TARGET} BUILD_INFO)
endif()

@@ -1,5 +1,5 @@
IK_LLAMA_VERSION?=b84902d2ad27c34f989f23947200c4b91b1568fd
IK_LLAMA_VERSION?=f96eaddba8bed6a9a5e628bbf6a566775c70b49c
LLAMA_REPO?=https://github.com/ikawrakow/ik_llama.cpp
CMAKE_ARGS?=

@@ -11,8 +11,8 @@
#include <memory>
#include <string>
#include <getopt.h>
#include "clip.h"
#include "llava.h"
#include "mtmd.h"
#include "mtmd-helper.h"
#include "log.h"
#include "common.h"
#include "json.hpp"
@@ -45,7 +45,9 @@ using backend::HealthMessage;
///// LLAMA.CPP server code below
using json = nlohmann::json;
// Match mtmd.h and ik_llama's server/common headers, which all use
// nlohmann::ordered_json; a plain nlohmann::json alias collides at global scope.
using json = nlohmann::ordered_json;
struct server_params
{
@@ -219,6 +221,11 @@ struct llama_client_slot
// multimodal
std::vector<slot_image> images;
// Full prompt with mtmd media markers (mtmd_default_marker()) substituted in
// place of the legacy [img-N] tags, covering the text up to and including the
// last image. The text after the last image is kept in params.input_suffix and
// decoded through the normal token path so the sampling loop is unchanged.
std::string mtmd_prompt;
// stats
size_t sent_count = 0;
@@ -252,14 +259,14 @@ struct llama_client_slot
for (slot_image & img : images)
{
free(img.image_embedding);
if (img.img_data) {
clip_image_u8_free(img.img_data);
if (img.bitmap) {
mtmd_bitmap_free(img.bitmap);
img.bitmap = nullptr;
}
img.prefix_prompt = "";
}
images.clear();
mtmd_prompt = "";
}
bool has_budget(gpt_params &global_params) {
@@ -396,46 +403,13 @@ struct llama_metrics {
}
};
struct llava_embd_batch {
std::vector<llama_pos> pos;
std::vector<int32_t> n_seq_id;
std::vector<llama_seq_id> seq_id_0;
std::vector<llama_seq_id *> seq_ids;
std::vector<int8_t> logits;
llama_batch batch;
llava_embd_batch(float * embd, int32_t n_tokens, llama_pos pos_0, llama_seq_id seq_id) {
pos .resize(n_tokens);
n_seq_id.resize(n_tokens);
seq_ids .resize(n_tokens + 1);
logits .resize(n_tokens);
seq_id_0.resize(1);
seq_id_0[0] = seq_id;
seq_ids [n_tokens] = nullptr;
batch = {
/*n_tokens =*/ n_tokens,
/*tokens =*/ nullptr,
/*embd =*/ embd,
/*pos =*/ pos.data(),
/*n_seq_id =*/ n_seq_id.data(),
/*seq_id =*/ seq_ids.data(),
/*logits =*/ logits.data(),
};
for (int i = 0; i < n_tokens; i++) {
batch.pos [i] = pos_0 + i;
batch.n_seq_id[i] = 1;
batch.seq_id [i] = seq_id_0.data();
batch.logits [i] = false;
}
}
};
struct llama_server_context
{
llama_model *model = nullptr;
llama_context *ctx = nullptr;
const llama_vocab * vocab = nullptr;
clip_ctx *clp_ctx = nullptr;
mtmd_context *mctx = nullptr;
gpt_params params;
@@ -491,11 +465,6 @@ struct llama_server_context
if (!params.mmproj.path.empty()) {
multimodal = true;
LOG_INFO("Multi Modal Mode Enabled", {});
clp_ctx = clip_model_load(params.mmproj.path.c_str(), /*verbosity=*/ 1);
if(clp_ctx == nullptr) {
LOG_ERR("unable to load clip model: %s", params.mmproj.path.c_str());
return false;
}
if (params.n_ctx < 2048) { // request larger context for the image embedding
params.n_ctx = 2048;
@@ -512,10 +481,24 @@ struct llama_server_context
}
if (multimodal) {
const int n_embd_clip = clip_n_mmproj_embd(clp_ctx);
const int n_embd_llm = llama_model_n_embd(model);
if (n_embd_clip != n_embd_llm) {
LOG("%s: embedding dim of the multimodal projector (%d) is not equal to that of LLaMA (%d). Make sure that you use the correct mmproj file.\n", __func__, n_embd_clip, n_embd_llm);
// mtmd_init_from_file requires the already-loaded text model, so it must
// run AFTER llama_init_from_gpt_params. It validates the projector
// against the model internally and returns nullptr on dim mismatch, so
// the explicit clip_n_mmproj_embd check is no longer needed.
mtmd_context_params mparams = mtmd_context_params_default();
mparams.use_gpu = params.mmproj_use_gpu;
mparams.print_timings = false;
mparams.n_threads = params.n_threads_mtmd != -1 ? params.n_threads_mtmd
: params.n_threads_batch != -1 ? params.n_threads_batch
: params.n_threads;
mparams.verbosity = GGML_LOG_LEVEL_INFO;
mparams.flash_attn_type = params.flash_attn ? LLAMA_FLASH_ATTN_TYPE_ENABLED
: LLAMA_FLASH_ATTN_TYPE_DISABLED;
mparams.image_min_tokens = params.image_min_tokens;
mparams.image_max_tokens = params.image_max_tokens;
mctx = mtmd_init_from_file(params.mmproj.path.c_str(), model, mparams);
if (mctx == nullptr) {
LOG_ERR("unable to load multimodal projector: %s", params.mmproj.path.c_str());
llama_free(ctx);
llama_free_model(model);
return false;
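
The thread-count selection in the hunk above is a nested ternary in which `-1` means "not set". A minimal standalone sketch of that fallback order follows; `thread_params` and `resolve_mtmd_threads` are hypothetical names used only for illustration, and only the three field names are taken from the diff.

```cpp
// Hypothetical stand-in for the three thread fields read from gpt_params.
struct thread_params {
    int n_threads       = 4;   // general thread count, always set
    int n_threads_batch = -1;  // -1 means "not set"
    int n_threads_mtmd  = -1;  // -1 means "not set"
};

// Same order as the diff: the mtmd-specific value wins, then the batch
// value, then the general value.
int resolve_mtmd_threads(const thread_params & p) {
    return p.n_threads_mtmd  != -1 ? p.n_threads_mtmd
         : p.n_threads_batch != -1 ? p.n_threads_batch
         : p.n_threads;
}
```

With only `n_threads` set, the projector would run with the general thread count; setting `n_threads_batch` or `n_threads_mtmd` overrides it in that order of precedence.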
@@ -865,8 +848,8 @@ struct llama_server_context
slot_image img_sl;
img_sl.id = img.count("id") != 0 ? img["id"].get<int>() : slot->images.size();
img_sl.img_data = clip_image_u8_init();
if (!clip_image_load_from_bytes(image_buffer.data(), image_buffer.size(), img_sl.img_data))
img_sl.bitmap = mtmd_helper_bitmap_init_from_buf(mctx, image_buffer.data(), image_buffer.size());
if (img_sl.bitmap == nullptr)
{
LOG_ERR("%s: failed to load image, slot_id: %d, img_sl_id: %d",
__func__,
@@ -879,50 +862,74 @@ struct llama_server_context
{"slot_id", slot->id},
{"img_sl_id", img_sl.id}
});
img_sl.request_encode_image = true;
slot->images.push_back(img_sl);
}
// process prompt
// example: system prompt [img-102] user [img-103] describe [img-134] -> [{id: 102, prefix: 'system prompt '}, {id: 103, prefix: ' user '}, {id: 134, prefix: ' describe '}]}
// Translate the legacy [img-N] tags into mtmd media markers, in
// order, and collect the matching bitmaps in marker order so they
// line up with the markers passed to mtmd_tokenize(). The text after
// the last image stays in input_suffix and is decoded through the
// normal token path, so the sampling loop is unchanged.
// example: system prompt [img-102] user [img-103] describe [img-134]
if (slot->images.size() > 0 && !slot->prompt.is_array())
{
const std::string marker = mtmd_default_marker();
std::string prompt = slot->prompt.get<std::string>();
size_t pos = 0, begin_prefix = 0;
std::string built_prompt;
std::vector<slot_image> ordered;
size_t pos = 0, copy_from = 0;
std::string pattern = "[img-";
while ((pos = prompt.find(pattern, pos)) != std::string::npos) {
size_t end_prefix = pos;
pos += pattern.length();
size_t end_pos = prompt.find(']', pos);
if (end_pos != std::string::npos)
{
std::string image_id = prompt.substr(pos, end_pos - pos);
try
{
int img_id = std::stoi(image_id);
bool found = false;
for (slot_image &img : slot->images)
{
if (img.id == img_id) {
found = true;
img.prefix_prompt = prompt.substr(begin_prefix, end_prefix - begin_prefix);
begin_prefix = end_pos + 1;
break;
}
}
if (!found) {
LOG("ERROR: Image with id: %i, not found.\n", img_id);
slot->images.clear();
return false;
}
} catch (const std::invalid_argument& e) {
LOG("Invalid image number id in prompt\n");
slot->images.clear();
return false;
auto free_images = [&]() {
for (slot_image &img : slot->images) {
if (img.bitmap) {
mtmd_bitmap_free(img.bitmap);
img.bitmap = nullptr;
}
}
slot->images.clear();
};
while ((pos = prompt.find(pattern, pos)) != std::string::npos) {
size_t tag_begin = pos;
pos += pattern.length();
size_t end_pos = prompt.find(']', pos);
if (end_pos == std::string::npos) {
break;
}
std::string image_id = prompt.substr(pos, end_pos - pos);
try
{
int img_id = std::stoi(image_id);
bool found = false;
for (slot_image &img : slot->images)
{
if (img.id == img_id) {
found = true;
// text before this tag, then the media marker
built_prompt += prompt.substr(copy_from, tag_begin - copy_from);
built_prompt += marker;
copy_from = end_pos + 1;
ordered.push_back(img);
break;
}
}
if (!found) {
LOG("ERROR: Image with id: %i, not found.\n", img_id);
free_images();
return false;
}
} catch (const std::invalid_argument& e) {
LOG("Invalid image number id in prompt\n");
free_images();
return false;
}
pos = end_pos + 1;
}
// bitmaps are consumed in marker order by mtmd_tokenize()
slot->images = ordered;
slot->mtmd_prompt = built_prompt;
slot->prompt = "";
slot->params.input_suffix = prompt.substr(begin_prefix);
slot->params.input_suffix = prompt.substr(copy_from);
slot->params.cache_prompt = false; // multimodal doesn't support cache prompt
}
}
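
The tag translation in the hunk above can be read in isolation from the server types. The sketch below is a standalone approximation, not the backend's code: `tag_translation` and `translate_img_tags` are hypothetical names, image ids stand in for the `slot_image` objects, and the marker is passed in rather than obtained from `mtmd_default_marker()`. Unlike the diff, it catches `std::exception` so that an out-of-range id is rejected as well as a non-numeric one.

```cpp
#include <algorithm>
#include <exception>
#include <optional>
#include <string>
#include <vector>

// Result of rewriting legacy [img-N] tags into media markers.
struct tag_translation {
    std::string      marked_prompt; // text up to and including the last marker
    std::string      suffix;        // text after the last image tag
    std::vector<int> ids_in_order;  // image ids in the order their markers appear
};

// Returns std::nullopt when a tag names an unknown or unparsable image id.
std::optional<tag_translation> translate_img_tags(const std::string & prompt,
                                                  const std::vector<int> & known_ids,
                                                  const std::string & marker) {
    const std::string pattern = "[img-";
    tag_translation out;
    size_t pos = 0, copy_from = 0;
    while ((pos = prompt.find(pattern, pos)) != std::string::npos) {
        const size_t tag_begin = pos;
        pos += pattern.length();
        const size_t end_pos = prompt.find(']', pos);
        if (end_pos == std::string::npos) {
            break; // unterminated tag: stop scanning, the rest stays in the suffix
        }
        int id = 0;
        try {
            id = std::stoi(prompt.substr(pos, end_pos - pos));
        } catch (const std::exception &) {
            return std::nullopt;
        }
        if (std::find(known_ids.begin(), known_ids.end(), id) == known_ids.end()) {
            return std::nullopt;
        }
        // text before this tag, then the media marker
        out.marked_prompt += prompt.substr(copy_from, tag_begin - copy_from);
        out.marked_prompt += marker;
        out.ids_in_order.push_back(id);
        copy_from = end_pos + 1;
        pos = end_pos + 1;
    }
    out.suffix = prompt.substr(copy_from);
    return out;
}
```

Keeping `ids_in_order` separate from the request order matters because the tokenizer pairs the Nth marker with the Nth bitmap, so a prompt that mentions images out of id order still lines up.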
@@ -1176,21 +1183,10 @@ struct llama_server_context
bool process_images(llama_client_slot &slot) const
{
for (slot_image &img : slot.images)
{
if (!img.request_encode_image)
{
continue;
}
if (!llava_image_embed_make_with_clip_img(clp_ctx, params.n_threads, img.img_data, &img.image_embedding, &img.image_tokens)) {
LOG("Error processing the given image");
return false;
}
img.request_encode_image = false;
}
// With the mtmd pipeline, image encoding is no longer eager: the bitmaps
// are tokenized and encoded together with the surrounding text inside
// ingest_images() via mtmd_tokenize() + mtmd_helper_eval_chunks(). This
// just reports whether the slot carries any images to process.
return slot.images.size() > 0;
}
@@ -1435,69 +1431,70 @@ struct llama_server_context
}
}
// for multiple images processing
// Tokenize the multimodal prompt (text interleaved with media markers) together
// with the slot's bitmaps, then decode the resulting chunks into the llama
// context via the high-level mtmd helper. The helper runs llama_decode() on the
// text chunks and mtmd_encode() + llama_decode() on the image chunks, handling
// batching and any pre/post decode setup (e.g. non-causal attention for gemma3).
// Advances slot.n_past by the number of positions consumed, then leaves the
// post-image suffix tokens in `batch` so the normal decode + sampling loop
// produces the first generated token.
bool ingest_images(llama_client_slot &slot, int n_batch)
{
int image_idx = 0;
while (image_idx < (int) slot.images.size())
if (mctx == nullptr)
{
slot_image &img = slot.images[image_idx];
LOG("%s : multimodal context is not initialized\n", __func__);
return false;
}
// process prefix prompt
for (int32_t i = 0; i < (int32_t) batch.n_tokens; i += n_batch)
{
const int32_t n_tokens = std::min(n_batch, (int32_t) (batch.n_tokens - i));
llama_batch batch_view = {
n_tokens,
batch.token + i,
nullptr,
batch.pos + i,
batch.n_seq_id + i,
batch.seq_id + i,
batch.logits + i,
};
if (llama_decode(ctx, batch_view))
{
LOG("%s : failed to eval\n", __func__);
return false;
}
}
// bitmaps stay owned by slot.images (freed on reset()); pass non-owning ptrs
std::vector<const mtmd_bitmap *> bitmaps;
bitmaps.reserve(slot.images.size());
for (const slot_image &img : slot.images)
{
bitmaps.push_back(img.bitmap);
}
// process image with llm
for (int i = 0; i < img.image_tokens; i += n_batch)
{
int n_eval = img.image_tokens - i;
if (n_eval > n_batch)
{
n_eval = n_batch;
}
mtmd_input_text inp_txt;
inp_txt.text = slot.mtmd_prompt.c_str();
inp_txt.add_special = add_bos_token;
inp_txt.parse_special = true;
const int n_embd = llama_model_n_embd(model);
float * embd = img.image_embedding + i * n_embd;
llava_embd_batch llava_batch = llava_embd_batch(embd, n_eval, slot.n_past, 0);
if (llama_decode(ctx, llava_batch.batch))
{
LOG("%s : failed to eval image\n", __func__);
return false;
}
slot.n_past += n_eval;
}
image_idx++;
mtmd::input_chunks chunks(mtmd_input_chunks_init());
int32_t res = mtmd_tokenize(mctx,
chunks.ptr.get(),
&inp_txt,
bitmaps.data(),
bitmaps.size());
if (res != 0)
{
LOG("%s : failed to tokenize multimodal prompt, res = %d\n", __func__, res);
return false;
}
common_batch_clear(batch);
const llama_pos start_pos = (llama_pos) system_tokens.size() + slot.n_past;
llama_pos new_n_past = start_pos;
if (mtmd_helper_eval_chunks(mctx,
ctx,
chunks.ptr.get(),
start_pos,
slot.id,
n_batch,
/*logits_last=*/ false,
&new_n_past) != 0)
{
LOG("%s : failed to eval multimodal chunks\n", __func__);
return false;
}
slot.n_past += (int32_t) (new_n_past - start_pos);
// append prefix of next image
const auto json_prompt = (image_idx >= (int) slot.images.size()) ?
slot.params.input_suffix : // no more images, then process suffix prompt
(json)(slot.images[image_idx].prefix_prompt);
std::vector<llama_token> append_tokens = tokenize(json_prompt, false); // has next image
for (int i = 0; i < (int) append_tokens.size(); ++i)
{
common_batch_add(batch, append_tokens[i], system_tokens.size() + slot.n_past, { slot.id }, true);
slot.n_past += 1;
}
// queue the post-image suffix text for the normal decode + sampling path
common_batch_clear(batch);
std::vector<llama_token> suffix_tokens = tokenize(slot.params.input_suffix, false);
for (llama_token tok : suffix_tokens)
{
common_batch_add(batch, tok, system_tokens.size() + slot.n_past, { slot.id }, false);
slot.n_past += 1;
}
return true;
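
The position bookkeeping in `ingest_images()` is easy to misread, because the helper works with absolute positions while `slot.n_past` excludes the system prompt. The sketch below models only that arithmetic under stated assumptions: `fake_eval_chunks` is a hypothetical stand-in that merely reports how many positions were consumed, and `slot_state` and `ingest_positions` are illustrative names, not backend types.

```cpp
#include <cstdint>
#include <vector>

struct slot_state {
    int32_t n_past = 0; // positions consumed by this slot, excluding the system prompt
};

// Stand-in for the mtmd helper: pretends the chunks used `n_consumed`
// positions and returns the new absolute position.
int32_t fake_eval_chunks(int32_t start_pos, int32_t n_consumed) {
    return start_pos + n_consumed;
}

// Returns the absolute positions assigned to the suffix tokens.
std::vector<int32_t> ingest_positions(slot_state & slot, int32_t n_system,
                                      int32_t n_consumed, int32_t n_suffix) {
    const int32_t start_pos  = n_system + slot.n_past;
    const int32_t new_n_past = fake_eval_chunks(start_pos, n_consumed);
    slot.n_past += new_n_past - start_pos; // advance by the delta only

    std::vector<int32_t> suffix_pos;
    for (int32_t i = 0; i < n_suffix; ++i) {
        suffix_pos.push_back(n_system + slot.n_past);
        slot.n_past += 1;
    }
    return suffix_pos;
}
```

Adding the delta rather than assigning `new_n_past` directly keeps `n_past` relative to the slot, which is why the suffix tokens are queued at `system_tokens.size() + slot.n_past`.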
@@ -1884,8 +1881,11 @@ struct llama_server_context
const bool has_images = process_images(slot);
// process the prefix of first image
std::vector<llama_token> prefix_tokens = has_images ? tokenize(slot.images[0].prefix_prompt, add_bos_token) : prompt_tokens;
// For the multimodal path the whole pre-image / inter-image text is
// tokenized and decoded inside ingest_images() via mtmd, so no prefix
// tokens are queued here; the post-image suffix is appended by
// ingest_images() for the normal decode + sampling loop.
std::vector<llama_token> prefix_tokens = has_images ? std::vector<llama_token>() : prompt_tokens;
int32_t slot_npast = slot.n_past_se > 0 ? slot.n_past_se : slot.n_past;

@@ -1,11 +0,0 @@
--- a/examples/llava/clip.cpp
+++ b/examples/llava/clip.cpp
@@ -2494,7 +2494,7 @@
}
new_data = work.data();
- new_size = ggml_quantize_chunk(new_type, f32_data, new_data, 0, n_elms/cur->ne[0], cur->ne[0], nullptr);
+ new_size = ggml_quantize_chunk(new_type, f32_data, new_data, 0, n_elms/cur->ne[0], cur->ne[0], nullptr, nullptr);
} else {
new_type = cur->type;
new_data = cur->data;

@@ -17,28 +17,9 @@ cp -r grpc-server.cpp llama.cpp/examples/grpc-server/
cp -r utils.hpp llama.cpp/examples/grpc-server/
cp -rfv llama.cpp/vendor/nlohmann/json.hpp llama.cpp/examples/grpc-server/
## Copy clip/llava files for multimodal support (built as myclip library)
cp -rfv llama.cpp/examples/llava/clip.h llama.cpp/examples/grpc-server/clip.h
cp -rfv llama.cpp/examples/llava/clip.cpp llama.cpp/examples/grpc-server/clip.cpp
cp -rfv llama.cpp/examples/llava/llava.cpp llama.cpp/examples/grpc-server/llava.cpp
# Prepend llama.h include to llava.h
echo '#include "llama.h"' > llama.cpp/examples/grpc-server/llava.h
cat llama.cpp/examples/llava/llava.h >> llama.cpp/examples/grpc-server/llava.h
# Copy clip-impl.h if it exists
if [ -f llama.cpp/examples/llava/clip-impl.h ]; then
cp -rfv llama.cpp/examples/llava/clip-impl.h llama.cpp/examples/grpc-server/clip-impl.h
fi
# Copy stb_image.h
if [ -f llama.cpp/vendor/stb/stb_image.h ]; then
cp -rfv llama.cpp/vendor/stb/stb_image.h llama.cpp/examples/grpc-server/stb_image.h
elif [ -f llama.cpp/common/stb_image.h ]; then
cp -rfv llama.cpp/common/stb_image.h llama.cpp/examples/grpc-server/stb_image.h
fi
## Fix API compatibility in llava.cpp (llama_n_embd -> llama_model_n_embd)
if [ -f llama.cpp/examples/grpc-server/llava.cpp ]; then
sed -i 's/llama_n_embd(/llama_model_n_embd(/g' llama.cpp/examples/grpc-server/llava.cpp
fi
## Multimodal support is provided by the `mtmd` library target (examples/mtmd/),
## which the grpc-server links and includes directly. No source copy is needed:
## clip/llava were pruned upstream and the high-level mtmd_* API is used instead.
set +e
if grep -q "grpc-server" llama.cpp/examples/CMakeLists.txt; then

@@ -11,9 +11,12 @@
#include "json.hpp"
#include "clip.h"
#include "mtmd.h"
using json = nlohmann::json;
// mtmd.h and ik_llama's entire server/common stack (chat.h, server-common.h,
// server-task.h, ...) declare `using json = nlohmann::ordered_json`, so match it
// here: a plain `nlohmann::json` alias collides with mtmd.h's at global scope.
using json = nlohmann::ordered_json;
extern bool server_verbose;
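
The alias collision described in the comment above is a general C++ rule rather than anything specific to nlohmann/json: an alias may be re-declared at namespace scope only if it names the same type. The sketch below illustrates this with standard-library stand-ins so it compiles without the JSON header; `sorted_like` and `ordered_like` are hypothetical names that only mimic the difference between a key-sorted and an insertion-ordered container.

```cpp
#include <map>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the two JSON flavours.
using sorted_like  = std::map<std::string, int>;                  // keys kept sorted
using ordered_like = std::vector<std::pair<std::string, int>>;    // insertion order kept

using json = ordered_like;    // first declaration, as a library header would make it
using json = ordered_like;    // fine: re-declares the alias with the same type
// using json = sorted_like;  // would not compile: conflicting declaration

static_assert(std::is_same<json, ordered_like>::value,
              "both declarations name the same type");
```

Uncommenting the third alias reproduces the kind of "conflicting declaration" error quoted in the commit message, which is why both aliases in the backend were switched rather than only one.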
@@ -111,13 +114,12 @@ struct slot_image
{
int32_t id;
bool request_encode_image = false;
float * image_embedding = nullptr;
int32_t image_tokens = 0;
clip_image_u8 * img_data;
std::string prefix_prompt; // before of this image
// mtmd bitmap (image/audio) decoded from the request buffer. Owned by the
// slot; freed via mtmd_bitmap_free() on reset. The high-level mtmd pipeline
// (mtmd_tokenize + mtmd_helper_eval_chunks) consumes these directly, so the
// legacy eager-encode fields (embedding/tokens) and per-image prefix prompt
// are no longer needed.
mtmd_bitmap * bitmap = nullptr;
};
// completion token output with probabilities

@@ -87,3 +87,18 @@ target_compile_features(${TARGET} PRIVATE cxx_std_11)
if(TARGET BUILD_INFO)
add_dependencies(${TARGET} BUILD_INFO)
endif()
# Unit test for the message-content normalization helper (message_content.h).
# Off by default so the normal backend build is untouched; enable with
# -DLLAMA_GRPC_BUILD_TESTS=ON and run via ctest. It reuses llama.cpp's vendored
# <nlohmann/json.hpp> (propagated by the common helpers library) so it has no
# extra dependency beyond what the backend already builds against.
option(LLAMA_GRPC_BUILD_TESTS "Build grpc-server unit tests" OFF)
if(LLAMA_GRPC_BUILD_TESTS)
enable_testing()
add_executable(message_content_test message_content_test.cpp message_content.h)
target_include_directories(message_content_test PRIVATE ${CMAKE_CURRENT_SOURCE_DIR})
target_link_libraries(message_content_test PRIVATE ${_LLAMA_COMMON_TARGET})
target_compile_features(message_content_test PRIVATE cxx_std_17)
add_test(NAME message_content_test COMMAND message_content_test)
endif()

@@ -39,6 +39,7 @@
#include "common.h"
#include "arg.h"
#include "chat-auto-parser.h"
#include "message_content.h"
#include <getopt.h>
#include <grpcpp/ext/proto_server_reflection_plugin.h>
#include <grpcpp/grpcpp.h>
@@ -1616,242 +1617,20 @@ public:
for (int i = 0; i < request->messages_size(); i++) {
const auto& msg = request->messages(i);
json msg_json;
msg_json["role"] = msg.role();
bool is_last_user_msg = (i == last_user_msg_idx);
bool has_images_or_audio = (request->images_size() > 0 || request->audios_size() > 0 || request->videos_size() > 0);
// Handle content - can be string, null, or array
// For multimodal content, we'll embed images/audio from separate fields
if (!msg.content().empty()) {
// Try to parse content as JSON to see if it's already an array
json content_val;
try {
content_val = json::parse(msg.content());
// Handle null values - convert to empty string to avoid template errors
if (content_val.is_null()) {
content_val = "";
}
} catch (const json::parse_error&) {
// Not JSON, treat as plain string
content_val = msg.content();
}
// If content is an object (e.g., from tool call failures), convert to string
if (content_val.is_object()) {
content_val = content_val.dump();
}
// If content is a string and this is the last user message with images/audio, combine them
if (content_val.is_string() && is_last_user_msg && has_images_or_audio) {
json content_array = json::array();
// Add text first
content_array.push_back({{"type", "text"}, {"text", content_val.get<std::string>()}});
// Add images
if (request->images_size() > 0) {
for (int j = 0; j < request->images_size(); j++) {
json image_chunk;
image_chunk["type"] = "image_url";
json image_url;
image_url["url"] = "data:image/jpeg;base64," + request->images(j);
image_chunk["image_url"] = image_url;
content_array.push_back(image_chunk);
}
}
// Add audios
if (request->audios_size() > 0) {
for (int j = 0; j < request->audios_size(); j++) {
json audio_chunk;
audio_chunk["type"] = "input_audio";
json input_audio;
input_audio["data"] = request->audios(j);
input_audio["format"] = "wav"; // default, could be made configurable
audio_chunk["input_audio"] = input_audio;
content_array.push_back(audio_chunk);
}
}
if (request->videos_size() > 0) {
for (int j = 0; j < request->videos_size(); j++) {
json video_chunk;
video_chunk["type"] = "input_video";
json input_video;
input_video["data"] = request->videos(j);
video_chunk["input_video"] = input_video;
content_array.push_back(video_chunk);
}
}
msg_json["content"] = content_array;
} else {
// Use content as-is (already array or not last user message)
// Ensure null values are converted to empty string
if (content_val.is_null()) {
msg_json["content"] = "";
} else {
msg_json["content"] = content_val;
}
}
} else if (is_last_user_msg && has_images_or_audio) {
// If no content but this is the last user message with images/audio, create content array
json content_array = json::array();
if (request->images_size() > 0) {
for (int j = 0; j < request->images_size(); j++) {
json image_chunk;
image_chunk["type"] = "image_url";
json image_url;
image_url["url"] = "data:image/jpeg;base64," + request->images(j);
image_chunk["image_url"] = image_url;
content_array.push_back(image_chunk);
}
}
if (request->audios_size() > 0) {
for (int j = 0; j < request->audios_size(); j++) {
json audio_chunk;
audio_chunk["type"] = "input_audio";
json input_audio;
input_audio["data"] = request->audios(j);
input_audio["format"] = "wav"; // default, could be made configurable
audio_chunk["input_audio"] = input_audio;
content_array.push_back(audio_chunk);
}
}
if (request->videos_size() > 0) {
for (int j = 0; j < request->videos_size(); j++) {
json video_chunk;
video_chunk["type"] = "input_video";
json input_video;
input_video["data"] = request->videos(j);
video_chunk["input_video"] = input_video;
content_array.push_back(video_chunk);
}
}
msg_json["content"] = content_array;
} else if (msg.role() == "tool") {
// Tool role messages must have content field set, even if empty
// Jinja templates expect content to be a string, not null or object
SRV_INF("[CONTENT DEBUG] PredictStream: Message %d is tool role, content_empty=%d\n", i, msg.content().empty() ? 1 : 0);
if (msg.content().empty()) {
msg_json["content"] = "";
SRV_INF("[CONTENT DEBUG] PredictStream: Message %d (tool): empty content, set to empty string\n", i);
} else {
SRV_INF("[CONTENT DEBUG] PredictStream: Message %d (tool): content exists: %s\n",
i, msg.content().substr(0, std::min<size_t>(200, msg.content().size())).c_str());
// Content exists, parse and ensure it's a string
json content_val;
try {
content_val = json::parse(msg.content());
SRV_INF("[CONTENT DEBUG] PredictStream: Message %d (tool): parsed JSON, type=%s\n",
i, content_val.is_null() ? "null" :
content_val.is_object() ? "object" :
content_val.is_string() ? "string" :
content_val.is_array() ? "array" : "other");
// Handle null values - Jinja templates expect content to be a string, not null
if (content_val.is_null()) {
msg_json["content"] = "";
SRV_INF("[CONTENT DEBUG] PredictStream: Message %d (tool): null content, converted to empty string\n", i);
} else if (content_val.is_object()) {
// If content is an object (e.g., from tool call failures/errors), convert to string
msg_json["content"] = content_val.dump();
SRV_INF("[CONTENT DEBUG] PredictStream: Message %d (tool): object content, converted to string: %s\n",
i, content_val.dump().substr(0, std::min<size_t>(200, content_val.dump().size())).c_str());
} else if (content_val.is_string()) {
msg_json["content"] = content_val.get<std::string>();
SRV_INF("[CONTENT DEBUG] PredictStream: Message %d (tool): string content, using as-is\n", i);
} else {
// For arrays or other types, convert to string
msg_json["content"] = content_val.dump();
SRV_INF("[CONTENT DEBUG] PredictStream: Message %d (tool): %s content, converted to string\n",
i, content_val.is_array() ? "array" : "other type");
}
} catch (const json::parse_error&) {
// Not JSON, treat as plain string
msg_json["content"] = msg.content();
SRV_INF("[CONTENT DEBUG] PredictStream: Message %d (tool): not JSON, using as string\n", i);
}
}
} else {
// Ensure all messages have content set (fallback for any unhandled cases)
// Jinja templates expect content to be present, default to empty string if not set
if (!msg_json.contains("content")) {
SRV_INF("[CONTENT DEBUG] PredictStream: Message %d (role=%s): no content field, adding empty string\n",
i, msg.role().c_str());
msg_json["content"] = "";
}
llama_grpc::ReconstructedMessageInput rin;
rin.role = msg.role();
rin.content = msg.content();
rin.name = msg.name();
rin.tool_call_id = msg.tool_call_id();
rin.reasoning_content = msg.reasoning_content();
rin.tool_calls = msg.tool_calls();
rin.is_last_user_msg = (i == last_user_msg_idx);
if (rin.is_last_user_msg) {
for (int j = 0; j < request->images_size(); j++) rin.images.push_back(request->images(j));
for (int j = 0; j < request->audios_size(); j++) rin.audios.push_back(request->audios(j));
for (int j = 0; j < request->videos_size(); j++) rin.videos.push_back(request->videos(j));
}
// Add optional fields for OpenAI-compatible message format
if (!msg.name().empty()) {
msg_json["name"] = msg.name();
}
if (!msg.tool_call_id().empty()) {
msg_json["tool_call_id"] = msg.tool_call_id();
}
if (!msg.reasoning_content().empty()) {
msg_json["reasoning_content"] = msg.reasoning_content();
}
if (!msg.tool_calls().empty()) {
// Parse tool_calls JSON string and add to message
try {
json tool_calls = json::parse(msg.tool_calls());
msg_json["tool_calls"] = tool_calls;
SRV_INF("[TOOL CALLS DEBUG] PredictStream: Message %d has tool_calls: %s\n", i, tool_calls.dump().c_str());
// IMPORTANT: If message has tool_calls but content is empty or not set,
// set content to space " " instead of empty string "", because llama.cpp's
// common_chat_msgs_to_json_oaicompat converts empty strings to null (line 312),
// which causes template errors when accessing message.content[:tool_start_length]
if (!msg_json.contains("content") || (msg_json.contains("content") && msg_json["content"].is_string() && msg_json["content"].get<std::string>().empty())) {
SRV_INF("[CONTENT DEBUG] PredictStream: Message %d has tool_calls but empty content, setting to space\n", i);
msg_json["content"] = " ";
}
// Log each tool call with name and arguments
if (tool_calls.is_array()) {
for (size_t tc_idx = 0; tc_idx < tool_calls.size(); tc_idx++) {
const auto& tc = tool_calls[tc_idx];
std::string tool_name = "unknown";
std::string tool_args = "{}";
if (tc.contains("function")) {
const auto& func = tc["function"];
if (func.contains("name")) {
tool_name = func["name"].get<std::string>();
}
if (func.contains("arguments")) {
tool_args = func["arguments"].is_string() ?
func["arguments"].get<std::string>() :
func["arguments"].dump();
}
} else if (tc.contains("name")) {
tool_name = tc["name"].get<std::string>();
if (tc.contains("arguments")) {
tool_args = tc["arguments"].is_string() ?
tc["arguments"].get<std::string>() :
tc["arguments"].dump();
}
}
SRV_INF("[TOOL CALLS DEBUG] PredictStream: Message %d, tool_call %zu: name=%s, arguments=%s\n",
i, tc_idx, tool_name.c_str(), tool_args.c_str());
}
}
} catch (const json::parse_error& e) {
SRV_WRN("Failed to parse tool_calls JSON: %s\n", e.what());
}
}
// Debug: Log final content state before adding to array
if (msg_json.contains("content")) {
if (msg_json["content"].is_null()) {
SRV_INF("[CONTENT DEBUG] PredictStream: Message %d FINAL STATE: content is NULL - THIS WILL CAUSE ERROR!\n", i);
} else {
SRV_INF("[CONTENT DEBUG] PredictStream: Message %d FINAL STATE: content type=%s, has_value=%d\n",
i, msg_json["content"].is_string() ? "string" :
msg_json["content"].is_array() ? "array" :
msg_json["content"].is_object() ? "object" : "other",
msg_json["content"].is_null() ? 0 : 1);
}
} else {
SRV_INF("[CONTENT DEBUG] PredictStream: Message %d FINAL STATE: NO CONTENT FIELD - THIS WILL CAUSE ERROR!\n", i);
}
messages_json.push_back(msg_json);
messages_json.push_back(llama_grpc::build_reconstructed_message(rin));
}
// Final safety check: Ensure no message has null content (Jinja templates require strings)
@@ -2072,36 +1851,7 @@ public:
if (body_json.contains("messages") && body_json["messages"].is_array()) {
SRV_INF("[CONTENT DEBUG] PredictStream: Before oaicompat_chat_params_parse - checking %zu messages\n", body_json["messages"].size());
for (size_t idx = 0; idx < body_json["messages"].size(); idx++) {
auto& msg = body_json["messages"][idx];
std::string role_str = msg.contains("role") ? msg["role"].get<std::string>() : "unknown";
if (msg.contains("content")) {
if (msg["content"].is_null()) {
SRV_INF("[CONTENT DEBUG] PredictStream: BEFORE TEMPLATE - Message %zu (role=%s) has NULL content - FIXING!\n", idx, role_str.c_str());
msg["content"] = ""; // Fix null content
} else if (role_str == "tool" && msg["content"].is_array()) {
// Tool messages must have string content, not array
// oaicompat_chat_params_parse expects tool messages to have string content
SRV_INF("[CONTENT DEBUG] PredictStream: BEFORE TEMPLATE - Message %zu (role=tool) has array content, converting to string\n", idx);
msg["content"] = msg["content"].dump();
} else if (!msg["content"].is_string() && !msg["content"].is_array()) {
// If content is object or other non-string type, convert to string for templates
SRV_INF("[CONTENT DEBUG] PredictStream: BEFORE TEMPLATE - Message %zu (role=%s) content is not string/array, converting\n", idx, role_str.c_str());
if (msg["content"].is_object()) {
msg["content"] = msg["content"].dump();
} else {
msg["content"] = "";
}
} else {
SRV_INF("[CONTENT DEBUG] PredictStream: BEFORE TEMPLATE - Message %zu (role=%s): content type=%s\n",
idx, role_str.c_str(),
msg["content"].is_string() ? "string" :
msg["content"].is_array() ? "array" :
msg["content"].is_object() ? "object" : "other");
}
} else {
SRV_INF("[CONTENT DEBUG] PredictStream: BEFORE TEMPLATE - Message %zu (role=%s) MISSING content field - ADDING!\n", idx, role_str.c_str());
msg["content"] = ""; // Add missing content
}
llama_grpc::normalize_template_message(body_json["messages"][idx]);
}
}
@@ -2433,264 +2183,20 @@ public:
SRV_INF("[CONTENT DEBUG] Predict: Processing %d messages\n", request->messages_size());
for (int i = 0; i < request->messages_size(); i++) {
const auto& msg = request->messages(i);
json msg_json;
msg_json["role"] = msg.role();
SRV_INF("[CONTENT DEBUG] Predict: Message %d: role=%s, content_empty=%d, content_length=%zu\n",
i, msg.role().c_str(), msg.content().empty() ? 1 : 0, msg.content().size());
if (!msg.content().empty()) {
SRV_INF("[CONTENT DEBUG] Predict: Message %d content (first 200 chars): %s\n",
i, msg.content().substr(0, std::min<size_t>(200, msg.content().size())).c_str());
llama_grpc::ReconstructedMessageInput rin;
rin.role = msg.role();
rin.content = msg.content();
rin.name = msg.name();
rin.tool_call_id = msg.tool_call_id();
rin.reasoning_content = msg.reasoning_content();
rin.tool_calls = msg.tool_calls();
rin.is_last_user_msg = (i == last_user_msg_idx);
if (rin.is_last_user_msg) {
for (int j = 0; j < request->images_size(); j++) rin.images.push_back(request->images(j));
for (int j = 0; j < request->audios_size(); j++) rin.audios.push_back(request->audios(j));
for (int j = 0; j < request->videos_size(); j++) rin.videos.push_back(request->videos(j));
}
bool is_last_user_msg = (i == last_user_msg_idx);
bool has_images_or_audio = (request->images_size() > 0 || request->audios_size() > 0 || request->videos_size() > 0);
// Handle content - can be string, null, or array
// For multimodal content, we'll embed images/audio from separate fields
if (!msg.content().empty()) {
// Try to parse content as JSON to see if it's already an array
json content_val;
try {
content_val = json::parse(msg.content());
// Handle null values - convert to empty string to avoid template errors
if (content_val.is_null()) {
SRV_INF("[CONTENT DEBUG] Predict: Message %d parsed JSON is null, converting to empty string\n", i);
content_val = "";
}
} catch (const json::parse_error&) {
// Not JSON, treat as plain string
content_val = msg.content();
}
// If content is an object (e.g., from tool call failures), convert to string
if (content_val.is_object()) {
SRV_INF("[CONTENT DEBUG] Predict: Message %d content is object, converting to string\n", i);
content_val = content_val.dump();
}
// If content is a string and this is the last user message with images/audio, combine them
if (content_val.is_string() && is_last_user_msg && has_images_or_audio) {
json content_array = json::array();
// Add text first
content_array.push_back({{"type", "text"}, {"text", content_val.get<std::string>()}});
// Add images
if (request->images_size() > 0) {
for (int j = 0; j < request->images_size(); j++) {
json image_chunk;
image_chunk["type"] = "image_url";
json image_url;
image_url["url"] = "data:image/jpeg;base64," + request->images(j);
image_chunk["image_url"] = image_url;
content_array.push_back(image_chunk);
}
}
// Add audios
if (request->audios_size() > 0) {
for (int j = 0; j < request->audios_size(); j++) {
json audio_chunk;
audio_chunk["type"] = "input_audio";
json input_audio;
input_audio["data"] = request->audios(j);
input_audio["format"] = "wav"; // default, could be made configurable
audio_chunk["input_audio"] = input_audio;
content_array.push_back(audio_chunk);
}
}
if (request->videos_size() > 0) {
for (int j = 0; j < request->videos_size(); j++) {
json video_chunk;
video_chunk["type"] = "input_video";
json input_video;
input_video["data"] = request->videos(j);
video_chunk["input_video"] = input_video;
content_array.push_back(video_chunk);
}
}
msg_json["content"] = content_array;
} else {
// Use content as-is (already array or not last user message)
// Ensure null values are converted to empty string
if (content_val.is_null()) {
SRV_INF("[CONTENT DEBUG] Predict: Message %d content_val was null, setting to empty string\n", i);
msg_json["content"] = "";
} else {
msg_json["content"] = content_val;
SRV_INF("[CONTENT DEBUG] Predict: Message %d content set, type=%s\n",
i, content_val.is_string() ? "string" :
content_val.is_array() ? "array" :
content_val.is_object() ? "object" : "other");
}
}
} else if (is_last_user_msg && has_images_or_audio) {
// If no content but this is the last user message with images/audio, create content array
json content_array = json::array();
if (request->images_size() > 0) {
for (int j = 0; j < request->images_size(); j++) {
json image_chunk;
image_chunk["type"] = "image_url";
json image_url;
image_url["url"] = "data:image/jpeg;base64," + request->images(j);
image_chunk["image_url"] = image_url;
content_array.push_back(image_chunk);
}
}
if (request->audios_size() > 0) {
for (int j = 0; j < request->audios_size(); j++) {
json audio_chunk;
audio_chunk["type"] = "input_audio";
json input_audio;
input_audio["data"] = request->audios(j);
input_audio["format"] = "wav"; // default, could be made configurable
audio_chunk["input_audio"] = input_audio;
content_array.push_back(audio_chunk);
}
}
if (request->videos_size() > 0) {
for (int j = 0; j < request->videos_size(); j++) {
json video_chunk;
video_chunk["type"] = "input_video";
json input_video;
input_video["data"] = request->videos(j);
video_chunk["input_video"] = input_video;
content_array.push_back(video_chunk);
}
}
msg_json["content"] = content_array;
SRV_INF("[CONTENT DEBUG] Predict: Message %d created content array with media\n", i);
} else if (!msg.tool_calls().empty()) {
// Tool call messages may have null content, but templates expect string
// IMPORTANT: Set to space " " instead of empty string "", because llama.cpp's
// common_chat_msgs_to_json_oaicompat converts empty strings to null (line 312),
// which causes template errors when accessing message.content[:tool_start_length]
SRV_INF("[CONTENT DEBUG] Predict: Message %d has tool_calls, setting content to space (not empty string)\n", i);
msg_json["content"] = " ";
} else if (msg.role() == "tool") {
// Tool role messages must have content field set, even if empty
// Jinja templates expect content to be a string, not null or object
SRV_INF("[CONTENT DEBUG] Predict: Message %d is tool role, content_empty=%d\n", i, msg.content().empty() ? 1 : 0);
if (msg.content().empty()) {
msg_json["content"] = "";
SRV_INF("[CONTENT DEBUG] Predict: Message %d (tool): empty content, set to empty string\n", i);
} else {
SRV_INF("[CONTENT DEBUG] Predict: Message %d (tool): content exists: %s\n",
i, msg.content().substr(0, std::min<size_t>(200, msg.content().size())).c_str());
// Content exists, parse and ensure it's a string
json content_val;
try {
content_val = json::parse(msg.content());
SRV_INF("[CONTENT DEBUG] Predict: Message %d (tool): parsed JSON, type=%s\n",
i, content_val.is_null() ? "null" :
content_val.is_object() ? "object" :
content_val.is_string() ? "string" :
content_val.is_array() ? "array" : "other");
// Handle null values - Jinja templates expect content to be a string, not null
if (content_val.is_null()) {
msg_json["content"] = "";
SRV_INF("[CONTENT DEBUG] Predict: Message %d (tool): null content, converted to empty string\n", i);
} else if (content_val.is_object()) {
// If content is an object (e.g., from tool call failures/errors), convert to string
msg_json["content"] = content_val.dump();
SRV_INF("[CONTENT DEBUG] Predict: Message %d (tool): object content, converted to string: %s\n",
i, content_val.dump().substr(0, std::min<size_t>(200, content_val.dump().size())).c_str());
} else if (content_val.is_string()) {
msg_json["content"] = content_val.get<std::string>();
SRV_INF("[CONTENT DEBUG] Predict: Message %d (tool): string content, using as-is\n", i);
} else {
// For arrays or other types, convert to string
msg_json["content"] = content_val.dump();
SRV_INF("[CONTENT DEBUG] Predict: Message %d (tool): %s content, converted to string\n",
i, content_val.is_array() ? "array" : "other type");
}
} catch (const json::parse_error&) {
// Not JSON, treat as plain string
msg_json["content"] = msg.content();
SRV_INF("[CONTENT DEBUG] Predict: Message %d (tool): not JSON, using as string\n", i);
}
}
} else {
// Ensure all messages have content set (fallback for any unhandled cases)
// Jinja templates expect content to be present, default to empty string if not set
if (!msg_json.contains("content")) {
SRV_INF("[CONTENT DEBUG] Predict: Message %d (role=%s): no content field, adding empty string\n",
i, msg.role().c_str());
msg_json["content"] = "";
}
}
// Add optional fields for OpenAI-compatible message format
if (!msg.name().empty()) {
msg_json["name"] = msg.name();
}
if (!msg.tool_call_id().empty()) {
msg_json["tool_call_id"] = msg.tool_call_id();
}
if (!msg.reasoning_content().empty()) {
msg_json["reasoning_content"] = msg.reasoning_content();
}
if (!msg.tool_calls().empty()) {
// Parse tool_calls JSON string and add to message
try {
json tool_calls = json::parse(msg.tool_calls());
msg_json["tool_calls"] = tool_calls;
SRV_INF("[TOOL CALLS DEBUG] Predict: Message %d has tool_calls: %s\n", i, tool_calls.dump().c_str());
// IMPORTANT: If message has tool_calls but content is empty or not set,
// set content to space " " instead of empty string "", because llama.cpp's
// common_chat_msgs_to_json_oaicompat converts empty strings to null (line 312),
// which causes template errors when accessing message.content[:tool_start_length]
if (!msg_json.contains("content") || (msg_json.contains("content") && msg_json["content"].is_string() && msg_json["content"].get<std::string>().empty())) {
SRV_INF("[CONTENT DEBUG] Predict: Message %d has tool_calls but empty content, setting to space\n", i);
msg_json["content"] = " ";
}
// Log each tool call with name and arguments
if (tool_calls.is_array()) {
for (size_t tc_idx = 0; tc_idx < tool_calls.size(); tc_idx++) {
const auto& tc = tool_calls[tc_idx];
std::string tool_name = "unknown";
std::string tool_args = "{}";
if (tc.contains("function")) {
const auto& func = tc["function"];
if (func.contains("name")) {
tool_name = func["name"].get<std::string>();
}
if (func.contains("arguments")) {
tool_args = func["arguments"].is_string() ?
func["arguments"].get<std::string>() :
func["arguments"].dump();
}
} else if (tc.contains("name")) {
tool_name = tc["name"].get<std::string>();
if (tc.contains("arguments")) {
tool_args = tc["arguments"].is_string() ?
tc["arguments"].get<std::string>() :
tc["arguments"].dump();
}
}
SRV_INF("[TOOL CALLS DEBUG] Predict: Message %d, tool_call %zu: name=%s, arguments=%s\n",
i, tc_idx, tool_name.c_str(), tool_args.c_str());
}
}
} catch (const json::parse_error& e) {
SRV_WRN("Failed to parse tool_calls JSON: %s\n", e.what());
}
}
// Debug: Log final content state before adding to array
if (msg_json.contains("content")) {
if (msg_json["content"].is_null()) {
SRV_INF("[CONTENT DEBUG] Predict: Message %d FINAL STATE: content is NULL - THIS WILL CAUSE ERROR!\n", i);
} else {
SRV_INF("[CONTENT DEBUG] Predict: Message %d FINAL STATE: content type=%s, has_value=%d\n",
i, msg_json["content"].is_string() ? "string" :
msg_json["content"].is_array() ? "array" :
msg_json["content"].is_object() ? "object" : "other",
msg_json["content"].is_null() ? 0 : 1);
}
} else {
SRV_INF("[CONTENT DEBUG] Predict: Message %d FINAL STATE: NO CONTENT FIELD - THIS WILL CAUSE ERROR!\n", i);
}
messages_json.push_back(msg_json);
messages_json.push_back(llama_grpc::build_reconstructed_message(rin));
}
// Final safety check: Ensure no message has null content (Jinja templates require strings)
@@ -2911,36 +2417,7 @@ public:
if (body_json.contains("messages") && body_json["messages"].is_array()) {
SRV_INF("[CONTENT DEBUG] Predict: Before oaicompat_chat_params_parse - checking %zu messages\n", body_json["messages"].size());
for (size_t idx = 0; idx < body_json["messages"].size(); idx++) {
auto& msg = body_json["messages"][idx];
std::string role_str = msg.contains("role") ? msg["role"].get<std::string>() : "unknown";
if (msg.contains("content")) {
if (msg["content"].is_null()) {
SRV_INF("[CONTENT DEBUG] Predict: BEFORE TEMPLATE - Message %zu (role=%s) has NULL content - FIXING!\n", idx, role_str.c_str());
msg["content"] = ""; // Fix null content
} else if (role_str == "tool" && msg["content"].is_array()) {
// Tool messages must have string content, not array
// oaicompat_chat_params_parse expects tool messages to have string content
SRV_INF("[CONTENT DEBUG] Predict: BEFORE TEMPLATE - Message %zu (role=tool) has array content, converting to string\n", idx);
msg["content"] = msg["content"].dump();
} else if (!msg["content"].is_string() && !msg["content"].is_array()) {
// If content is object or other non-string type, convert to string for templates
SRV_INF("[CONTENT DEBUG] Predict: BEFORE TEMPLATE - Message %zu (role=%s) content is not string/array, converting\n", idx, role_str.c_str());
if (msg["content"].is_object()) {
msg["content"] = msg["content"].dump();
} else {
msg["content"] = "";
}
} else {
SRV_INF("[CONTENT DEBUG] Predict: BEFORE TEMPLATE - Message %zu (role=%s): content type=%s\n",
idx, role_str.c_str(),
msg["content"].is_string() ? "string" :
msg["content"].is_array() ? "array" :
msg["content"].is_object() ? "object" : "other");
}
} else {
SRV_INF("[CONTENT DEBUG] Predict: BEFORE TEMPLATE - Message %zu (role=%s) MISSING content field - ADDING!\n", idx, role_str.c_str());
msg["content"] = ""; // Add missing content
}
llama_grpc::normalize_template_message(body_json["messages"][idx]);
}
}
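
Review note: the two hand-written "BEFORE TEMPLATE" loops are replaced by a single call to `llama_grpc::normalize_template_message`. The decision table that helper applies can be sketched without any JSON library. This is only a model under stated assumptions: `ContentKind`, `Action`, and `sanitize_action` are hypothetical names that do not exist in the project; they mirror the branch order visible in this diff, nothing more.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the JSON type of a message's "content" field.
enum class ContentKind { Missing, Null, String, Array, Object, Scalar };

// Hypothetical description of what the sanitizer does to the field.
enum class Action { SetEmptyString, DumpToString, Keep };

// Models the branch order of normalize_template_message as shown in the diff.
inline Action sanitize_action(const std::string& role, ContentKind kind) {
    // A missing field is created as "" because templates expect it to exist.
    if (kind == ContentKind::Missing) return Action::SetEmptyString;
    // A literal null becomes "" so content[:N] slicing does not fail.
    if (kind == ContentKind::Null) return Action::SetEmptyString;
    // Tool messages must carry string content, so an array is serialized.
    if (role == "tool" && kind == ContentKind::Array) return Action::DumpToString;
    // An object is serialized to its JSON text.
    if (kind == ContentKind::Object) return Action::DumpToString;
    // Any other non-string, non-array value (number/bool) is replaced by "".
    if (kind == ContentKind::Scalar) return Action::SetEmptyString;
    // A string, or a typed-part array on a non-tool message, is left as-is.
    return Action::Keep;
}
```

For example, `sanitize_action("user", ContentKind::Array)` yields `Action::Keep`, which is the multimodal case the helper must not disturb, while the same array on a `tool` message yields `Action::DumpToString`.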

@@ -0,0 +1,192 @@
#pragma once
#include <string>
#include <vector>
#include <nlohmann/json.hpp>
namespace llama_grpc {
// Normalizes a proto message's content string into the JSON value used when
// reconstructing OpenAI-format messages for the tokenizer (jinja) template.
//
// Shared by the streaming (PredictStream) and non-streaming (Predict) message
// reconstruction paths so the two cannot drift.
//
// LocalAI's Go layer (schema.Messages.ToProto) always sends content as a plain
// text string; multimodal media travels in separate proto fields, never inside
// content. So user/system/developer content is *only ever* opaque text and must
// NOT be JSON-sniffed: a prompt that merely looks like JSON (e.g. an ingredient
// list ["1/4 cup sugar", ...]) would otherwise be reinterpreted as structured
// content parts and rejected by oaicompat_chat_params_parse with
// "unsupported content[].type" (https://github.com/mudler/LocalAI/issues/10524).
// (developer is OpenAI's modern system alias - same "human-authored text" nature.)
//
// For assistant/tool messages we still collapse a literal JSON null/object
// (tool-call bookkeeping) to a string, but we never turn a plain string into an
// array/scalar. The array defense is therefore role-independent (arrays/scalars
// fall through for every role); the role gate only governs the null/object case.
inline nlohmann::ordered_json normalize_message_content(const std::string& role,
const std::string& content) {
nlohmann::ordered_json content_val = content;
if (role != "user" && role != "system" && role != "developer") {
try {
nlohmann::ordered_json parsed = nlohmann::ordered_json::parse(content);
if (parsed.is_null()) {
content_val = "";
} else if (parsed.is_object()) {
content_val = parsed.dump();
}
// arrays / scalars: keep the original plain-text string as-is
} catch (const nlohmann::ordered_json::parse_error&) {
// Not JSON, already the plain string
}
}
return content_val;
}
// Final safety pass applied to each reconstructed OpenAI message right before it
// is handed to oaicompat_chat_params_parse (jinja templating). Jinja templates
// assume content is a string: a literal null breaks slicing such as
// message.content[:N] (#7324), and a tool message with array content is rejected
// (#7528). A multimodal user message legitimately carries a typed-part array
// ({type:text}, {type:image_url}, ...), which must be left intact. Shared by the
// streaming and non-streaming paths so this invariant cannot drift between them.
inline void normalize_template_message(nlohmann::ordered_json& msg) {
if (!msg.contains("content")) {
msg["content"] = ""; // templates expect the field to exist
return;
}
nlohmann::ordered_json& content = msg["content"];
const std::string role = (msg.contains("role") && msg["role"].is_string())
? msg["role"].get<std::string>()
: std::string();
if (content.is_null()) {
content = ""; // #7324: null would crash content[:N] slicing
} else if (role == "tool" && content.is_array()) {
content = content.dump(); // #7528: tool messages must have string content
} else if (!content.is_string() && !content.is_array()) {
if (content.is_object()) {
content = content.dump(); // tool-call bookkeeping object -> string
} else {
content = ""; // other scalar (number/bool) -> empty
}
}
// string, or a non-tool (multimodal) typed-part array: leave untouched
}
// One proto message's data, flattened to plain types so the reconstruction logic
// can be shared and unit-tested without protobuf. The streaming and non-streaming
// predict paths both populate this from proto::Message + the request's media.
struct ReconstructedMessageInput {
std::string role;
std::string content; // proto.Message.content (always a plain string)
std::string name;
std::string tool_call_id;
std::string reasoning_content;
std::string tool_calls; // tool_calls as a JSON string, or empty
bool is_last_user_msg = false; // attach request media to this message
std::vector<std::string> images; // base64 (jpeg)
std::vector<std::string> audios; // base64 (wav)
std::vector<std::string> videos; // base64
};
// Appends the request's media as OpenAI typed content parts. Imperative (not
// brace-init) to avoid nlohmann's object-vs-array initializer-list ambiguity.
inline void append_media_parts(nlohmann::ordered_json& content_array,
const std::vector<std::string>& images,
const std::vector<std::string>& audios,
const std::vector<std::string>& videos) {
for (const auto& img : images) {
nlohmann::ordered_json image_chunk;
image_chunk["type"] = "image_url";
nlohmann::ordered_json image_url;
image_url["url"] = "data:image/jpeg;base64," + img;
image_chunk["image_url"] = image_url;
content_array.push_back(image_chunk);
}
for (const auto& aud : audios) {
nlohmann::ordered_json audio_chunk;
audio_chunk["type"] = "input_audio";
nlohmann::ordered_json input_audio;
input_audio["data"] = aud;
input_audio["format"] = "wav"; // default; could be made configurable
audio_chunk["input_audio"] = input_audio;
content_array.push_back(audio_chunk);
}
for (const auto& vid : videos) {
nlohmann::ordered_json video_chunk;
video_chunk["type"] = "input_video";
nlohmann::ordered_json input_video;
input_video["data"] = vid;
video_chunk["input_video"] = input_video;
content_array.push_back(video_chunk);
}
}
// Reconstructs a single OpenAI-format message (the object fed to
// oaicompat_chat_params_parse) from a proto message. Shared by PredictStream and
// Predict so the content/multimodal/tool_calls handling cannot drift between the
// two stream modes (it previously lived as two ~150-line copies with a redundant
// Predict-only tool_calls->" " branch). Guarantees content is always a string or
// a typed-part array, never null/missing.
inline nlohmann::ordered_json build_reconstructed_message(const ReconstructedMessageInput& in) {
nlohmann::ordered_json msg_json;
msg_json["role"] = in.role;
const bool has_media = !in.images.empty() || !in.audios.empty() || !in.videos.empty();
if (!in.content.empty()) {
nlohmann::ordered_json content_val = normalize_message_content(in.role, in.content);
if (content_val.is_string() && in.is_last_user_msg && has_media) {
// Last user message + media: build a typed-part array (text first).
nlohmann::ordered_json content_array = nlohmann::ordered_json::array();
nlohmann::ordered_json text_part;
text_part["type"] = "text";
text_part["text"] = content_val.get<std::string>();
content_array.push_back(text_part);
append_media_parts(content_array, in.images, in.audios, in.videos);
msg_json["content"] = content_array;
} else if (content_val.is_null()) {
msg_json["content"] = "";
} else {
msg_json["content"] = content_val;
}
} else if (in.is_last_user_msg && has_media) {
// No text but media on the last user message: media-only typed array.
nlohmann::ordered_json content_array = nlohmann::ordered_json::array();
append_media_parts(content_array, in.images, in.audios, in.videos);
msg_json["content"] = content_array;
} else {
// Empty content (any role, incl. tool/assistant): templates need a string.
msg_json["content"] = "";
}
if (!in.name.empty()) {
msg_json["name"] = in.name;
}
if (!in.tool_call_id.empty()) {
msg_json["tool_call_id"] = in.tool_call_id;
}
if (!in.reasoning_content.empty()) {
msg_json["reasoning_content"] = in.reasoning_content;
}
if (!in.tool_calls.empty()) {
try {
nlohmann::ordered_json tool_calls = nlohmann::ordered_json::parse(in.tool_calls);
msg_json["tool_calls"] = tool_calls;
// tool_calls + empty/blank content: use " " not "", because llama.cpp's
// common_chat_msgs_to_json_oaicompat turns "" into null, which breaks
// templates that slice message.content[:tool_start_length] (#7324).
if (!msg_json.contains("content") ||
(msg_json["content"].is_string() && msg_json["content"].get<std::string>().empty())) {
msg_json["content"] = " ";
}
} catch (const nlohmann::ordered_json::parse_error&) {
// Malformed tool_calls JSON: leave content as-is (prior behavior).
}
}
return msg_json;
}
} // namespace llama_grpc
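
Review note: two rules in this header are easy to misread, so here is a minimal std-only sketch of them. It assumes content has already been reduced to a plain string; `may_sniff_json` and `content_after_tool_calls` are hypothetical helper names introduced for illustration and are not part of `message_content.h`.

```cpp
#include <cassert>
#include <string>

// Role gate modeled on normalize_message_content: content from user, system,
// and developer messages is opaque text and is never parsed as JSON. Every
// other role (assistant, tool, and anything unrecognized) may be parsed so a
// literal null or object can be collapsed to a string.
inline bool may_sniff_json(const std::string& role) {
    return role != "user" && role != "system" && role != "developer";
}

// tool_calls rule modeled on build_reconstructed_message: when tool_calls
// parsed successfully and the string content is empty, a single space is used,
// because the diff notes that an empty string is later turned into null.
// If tool_calls failed to parse, the content is left unchanged.
inline std::string content_after_tool_calls(const std::string& content,
                                            bool tool_calls_parsed) {
    if (tool_calls_parsed && content.empty()) {
        return " ";
    }
    return content;
}
```

Under this model a user prompt that merely looks like a JSON array is never inspected, and an assistant message with valid tool calls and no text ends up with `" "` rather than `""`. The typed-part array case (media on the last user message) is outside this sketch because it is not string content.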

@@ -0,0 +1,234 @@
// Unit tests for the shared message-reconstruction helpers (message_content.h).
//
// Build & run standalone (nlohmann/json single header on the include path):
// g++ -std=c++17 -I<dir-with-nlohmann> message_content_test.cpp -o t && ./t
// or via CMake: -DLLAMA_GRPC_BUILD_TESTS=ON then ctest.
//
// Regression coverage for:
// #10524 - a user/system prompt that is itself a JSON-array string must stay
// plain text, never be reinterpreted as OpenAI structured parts.
// #7324 - assistant/tool null content -> "" (templates slice content[:N]);
// assistant+tool_calls+empty content -> " " (not "", which becomes null).
// #7528 - tool message array content must reach the template as a string.
// multimodal - last user message text + media -> typed-part array, media kept.
#include <cassert>
#include <iostream>
#include <string>
#include "message_content.h"
using nlohmann::ordered_json;
using llama_grpc::normalize_message_content;
using llama_grpc::normalize_template_message;
using llama_grpc::build_reconstructed_message;
using llama_grpc::ReconstructedMessageInput;
static int failures = 0;
static void check(bool ok, const std::string& name, const std::string& detail = "") {
if (!ok) {
std::cerr << "FAIL " << name << (detail.empty() ? "" : ": " + detail) << "\n";
failures++;
}
}
// ---- normalize_message_content -------------------------------------------
static void expect_norm_string(const char* name, const std::string& role,
const std::string& content, const std::string& want) {
auto got = normalize_message_content(role, content);
if (!got.is_string()) {
check(false, name, "expected a JSON string, got " +
std::string(got.is_array() ? "array" : got.is_object() ? "object" : "other") +
" (" + got.dump() + ")");
return;
}
check(got.get<std::string>() == want, name, "expected \"" + want + "\", got \"" + got.get<std::string>() + "\"");
}
static void test_normalize() {
const std::string ingredients = R"(["1/4 cup brown sugar, packed","1 pound ground beef"])";
// #10524 - JSON-array text must stay a string. Role-INDEPENDENT array defense.
for (const char* role : {"user", "system", "developer", "function", "assistant", "tool"}) {
expect_norm_string((std::string("json_array_stays_text:") + role).c_str(), role, ingredients, ingredients);
}
// #10524 - user/system/developer JSON-object text stays verbatim (NOT re-dumped).
expect_norm_string("user_json_object_verbatim", "user", R"({"a":1})", R"({"a":1})");
expect_norm_string("system_json_object_verbatim", "system", R"({"a":1})", R"({"a":1})");
expect_norm_string("developer_json_object_verbatim", "developer", R"({"a":1})", R"({"a":1})");
// Plain text unchanged for all roles.
expect_norm_string("user_plain_text", "user", "hello world", "hello world");
expect_norm_string("assistant_non_json_text_kept", "assistant", "hi [unclosed", "hi [unclosed");
// #7324 boundary - user/system/developer literal "null" preserved (never parsed).
expect_norm_string("user_literal_null_stays", "user", "null", "null");
expect_norm_string("system_literal_null_stays", "system", "null", "null");
expect_norm_string("developer_literal_null_stays", "developer", "null", "null");
// #7324 - assistant/tool literal null collapses to empty string.
expect_norm_string("assistant_null_to_empty", "assistant", "null", "");
expect_norm_string("tool_null_to_empty", "tool", "null", "");
// #7324/#7528 - assistant/tool object bookkeeping stringified (stays a string).
check(normalize_message_content("assistant", R"({"tool":"x"})").is_string(), "assistant_object_stringified");
check(normalize_message_content("tool", R"({"error":"boom"})").is_string(), "tool_object_stringified");
// #10524-family - a bare scalar that parses as a JSON number stays the string.
expect_norm_string("assistant_scalar_number_stays_string", "assistant", "42", "42");
// baseline - empty content stays empty.
expect_norm_string("user_empty_stays_empty", "user", "", "");
}
// ---- normalize_template_message (BEFORE TEMPLATE sanitizer) ---------------
static void test_template_sanitizer() {
// #7528 - a tool message with an ACTUAL array becomes a string.
{
ordered_json msg = {{"role", "tool"}, {"content", ordered_json::array({{{"type", "text"}, {"text", "r"}}})}};
normalize_template_message(msg);
check(msg["content"].is_string(), "before_template_tool_array_to_string", "got " + msg["content"].dump());
}
// #7324 - null content -> "" for any role.
{
ordered_json msg = {{"role", "assistant"}, {"content", nullptr}};
normalize_template_message(msg);
check(msg["content"].is_string() && msg["content"] == "", "before_template_null_to_empty");
}
// object content -> dumped string (would otherwise throw at the template).
{
ordered_json msg = {{"role", "assistant"}, {"content", {{"x", 1}}}};
normalize_template_message(msg);
check(msg["content"].is_string(), "before_template_object_to_string", "got " + msg["content"].dump());
}
// missing content field -> "".
{
ordered_json msg = {{"role", "user"}};
normalize_template_message(msg);
check(msg.contains("content") && msg["content"] == "", "before_template_missing_to_empty");
}
// multimodal: a well-typed user array must be left UNTOUCHED (role!=tool).
{
ordered_json parts = ordered_json::array();
parts.push_back({{"type", "text"}, {"text", "x"}});
ordered_json img; img["type"] = "image_url"; img["image_url"] = {{"url", "data:..."}};
parts.push_back(img);
ordered_json msg = {{"role", "user"}, {"content", parts}};
normalize_template_message(msg);
check(msg["content"].is_array() && msg["content"].size() == 2, "before_template_user_typed_array_preserved",
"got " + msg["content"].dump());
}
// a plain string is left untouched.
{
ordered_json msg = {{"role", "user"}, {"content", "hello"}};
normalize_template_message(msg);
check(msg["content"] == "hello", "before_template_string_untouched");
}
}
// ---- build_reconstructed_message ----------------------------------------
static void test_reconstruction() {
const std::string ingredients = R"(["1/4 cup brown sugar","1 pound ground beef"])";
// #10524 end-state - user JSON-array text, no media -> string content.
{
ReconstructedMessageInput in;
in.role = "user"; in.content = ingredients;
auto m = build_reconstructed_message(in);
check(m["content"].is_string() && m["content"] == ingredients, "recon_user_json_array_string",
"got " + m["content"].dump());
}
// multimodal - user text + one image on last user msg -> typed array, image kept.
{
ReconstructedMessageInput in;
in.role = "user"; in.content = ingredients; in.is_last_user_msg = true;
in.images.push_back("BASE64IMG");
auto m = build_reconstructed_message(in);
check(m["content"].is_array() && m["content"].size() == 2, "recon_multimodal_text_plus_image",
"got " + m["content"].dump());
check(m["content"][0]["type"] == "text" && m["content"][0]["text"] == ingredients, "recon_multimodal_text_first");
check(m["content"][1]["type"] == "image_url", "recon_multimodal_image_kept");
}
// multimodal media-only - empty text + image on last user msg.
{
ReconstructedMessageInput in;
in.role = "user"; in.content = ""; in.is_last_user_msg = true;
in.images.push_back("BASE64IMG");
auto m = build_reconstructed_message(in);
check(m["content"].is_array() && m["content"].size() == 1 && m["content"][0]["type"] == "image_url",
"recon_media_only", "got " + m["content"].dump());
}
// #7528 - tool array-string content stays a string.
{
ReconstructedMessageInput in;
in.role = "tool"; in.content = R"(["a","b"])"; in.tool_call_id = "call_1";
auto m = build_reconstructed_message(in);
check(m["content"].is_string() && m["content"] == R"(["a","b"])", "recon_tool_array_string",
"got " + m["content"].dump());
check(m["tool_call_id"] == "call_1", "recon_tool_call_id_set");
}
// tool empty content -> "".
{
ReconstructedMessageInput in;
in.role = "tool"; in.content = "";
auto m = build_reconstructed_message(in);
check(m["content"].is_string() && m["content"] == "", "recon_tool_empty_to_string");
}
// #7324 - assistant + tool_calls + empty content -> " " (single space, not "").
{
ReconstructedMessageInput in;
in.role = "assistant"; in.content = "";
in.tool_calls = R"([{"id":"c1","type":"function","function":{"name":"f","arguments":"{}"}}])";
auto m = build_reconstructed_message(in);
check(m["content"].is_string() && m["content"] == " ", "recon_toolcalls_empty_content_space",
"got " + m["content"].dump());
check(m["tool_calls"].is_array() && m["tool_calls"].size() == 1, "recon_toolcalls_parsed");
}
// assistant + tool_calls + real content keeps the content.
{
ReconstructedMessageInput in;
in.role = "assistant"; in.content = "I'll call f";
in.tool_calls = R"([{"id":"c1","type":"function","function":{"name":"f","arguments":"{}"}}])";
auto m = build_reconstructed_message(in);
check(m["content"] == "I'll call f", "recon_toolcalls_with_content_kept");
}
// assistant null content -> "".
{
ReconstructedMessageInput in;
in.role = "assistant"; in.content = "null";
auto m = build_reconstructed_message(in);
check(m["content"] == "", "recon_assistant_null_to_empty");
}
// malformed tool_calls JSON must not throw; content preserved.
{
ReconstructedMessageInput in;
in.role = "assistant"; in.content = "hi"; in.tool_calls = "{not json";
auto m = build_reconstructed_message(in);
check(m["content"] == "hi" && !m.contains("tool_calls"), "recon_malformed_toolcalls_safe");
}
// optional fields: name + reasoning carried through.
{
ReconstructedMessageInput in;
in.role = "tool"; in.content = "result"; in.name = "get_weather"; in.reasoning_content = "thinking";
auto m = build_reconstructed_message(in);
check(m["name"] == "get_weather" && m["reasoning_content"] == "thinking", "recon_optional_fields");
}
}
int main() {
test_normalize();
test_template_sanitizer();
test_reconstruction();
if (failures == 0) {
std::cout << "OK: all message_content tests passed\n";
return 0;
}
std::cerr << failures << " test(s) failed\n";
return 1;
}

@@ -18,6 +18,10 @@ done
cp -r CMakeLists.txt llama.cpp/tools/grpc-server/
cp -r grpc-server.cpp llama.cpp/tools/grpc-server/
# Shared message-reconstruction helpers (included by grpc-server.cpp) and their
# unit test (compiled only when -DLLAMA_GRPC_BUILD_TESTS=ON).
cp -r message_content.h llama.cpp/tools/grpc-server/
cp -r message_content_test.cpp llama.cpp/tools/grpc-server/
cp -rfv llama.cpp/vendor/nlohmann/json.hpp llama.cpp/tools/grpc-server/
cp -rfv llama.cpp/vendor/cpp-httplib/httplib.h llama.cpp/tools/grpc-server/

backend/cpp/run-unit-tests.sh Executable file
@@ -0,0 +1,71 @@
#!/bin/bash
#
# Discovers and runs every standalone C++ unit test under backend/cpp/.
#
# A "standalone" unit test is a *_test.cpp that depends only on the C++ standard
# library and nlohmann/json (single header) - i.e. it exercises pure helpers and
# does not need the full llama.cpp + gRPC backend build. Tests that DO need the
# backend build use the CMake/ctest path (e.g. -DLLAMA_GRPC_BUILD_TESTS=ON)
# instead and are skipped here.
#
# This keeps CI generic: adding a new pure-C++ unit test file named *_test.cpp in
# an active backend source dir is picked up automatically, with no CI edits.
#
# Env:
# NLOHMANN_INCLUDE include dir that contains nlohmann/json.hpp. If unset, the
# nlohmann/json single header is fetched to a temp dir.
# CXX compiler (default: g++).
# JSON_VERSION nlohmann/json tag to fetch when NLOHMANN_INCLUDE is unset
# (default: v3.11.3).
set -uo pipefail
ROOT="$(cd "$(dirname "$0")" && pwd)"
CXX="${CXX:-g++}"
JSON_VERSION="${JSON_VERSION:-v3.11.3}"
JSON_INC="${NLOHMANN_INCLUDE:-}"
if [ -z "$JSON_INC" ]; then
JSON_INC="$(mktemp -d)"
mkdir -p "$JSON_INC/nlohmann"
echo "Fetching nlohmann/json ${JSON_VERSION} single header..."
if ! curl -L -sf \
"https://raw.githubusercontent.com/nlohmann/json/${JSON_VERSION}/single_include/nlohmann/json.hpp" \
-o "$JSON_INC/nlohmann/json.hpp"; then
echo "ERROR: failed to fetch nlohmann/json header" >&2
exit 1
fi
fi
# Active source dirs only - exclude per-variant build copies, dev snapshots and
# the vendored upstream llama.cpp tree.
mapfile -t tests < <(find "$ROOT" -name '*_test.cpp' \
-not -path '*/llama.cpp/*' \
-not -path '*-build/*' \
-not -path '*-dev/*' \
-not -path '*fallback*' | sort)
if [ "${#tests[@]}" -eq 0 ]; then
echo "No standalone C++ unit tests found under $ROOT"
exit 0
fi
fail=0
for test_src in "${tests[@]}"; do
name="$(basename "$test_src" .cpp)"
bin="$(mktemp -d)/$name"
echo "==> $test_src"
if ! "$CXX" -std=c++17 -Wall -Wextra \
-I"$JSON_INC" -I"$(dirname "$test_src")" \
"$test_src" -o "$bin"; then
echo "COMPILE FAILED: $test_src" >&2
fail=1
continue
fi
if ! "$bin"; then
echo "TEST FAILED: $test_src" >&2
fail=1
fi
done
echo "Ran ${#tests[@]} standalone C++ unit test file(s)"
exit "$fail"

@@ -1,6 +1,6 @@
# parakeet-cpp backend Makefile.
#
# Upstream pin lives below as PARAKEET_VERSION?=89f5e2977b4d8bccd45e7bcc6f2ef7c4ed49e89a
# Upstream pin lives below as PARAKEET_VERSION?=f469a57270a1cc4554acb15febf60e56619673b9
# (.github/bump_deps.sh) can find and update it - matches the
# whisper.cpp / ds4 / vibevoice-cpp convention.
#
@@ -15,7 +15,7 @@
# That's what the L0 smoke test uses. The default target below does the
# proper clone-at-pin + cmake build so CI doesn't need a side-checkout.
PARAKEET_VERSION?=89f5e2977b4d8bccd45e7bcc6f2ef7c4ed49e89a
PARAKEET_VERSION?=f469a57270a1cc4554acb15febf60e56619673b9
PARAKEET_REPO?=https://github.com/mudler/parakeet.cpp
GOCMD?=go

@@ -16,7 +16,15 @@ cp -rfv $CURDIR/run.sh $CURDIR/package/
cp -rfLv $CURDIR/sources/go-piper/piper-phonemize/pi/lib/* $CURDIR/package/lib/
# Detect architecture and copy appropriate libraries
if [ -f "/lib64/ld-linux-x86-64.so.2" ]; then
if [ "$(uname)" = "Darwin" ]; then
# macOS has no glibc loader to bundle. The piper binary links its bundled
# libs (libucd, libespeak-ng, libpiper_phonemize, libonnxruntime) via
# @rpath but ships with no LC_RPATH, so dyld aborts at launch with
# "Library not loaded: @rpath/libucd.dylib ... no LC_RPATH's found".
# Add an @loader_path/lib rpath so @rpath resolves to package/lib/.
echo "Detected macOS; adding @loader_path/lib rpath so bundled libs resolve via @rpath..."
install_name_tool -add_rpath @loader_path/lib "$CURDIR/package/piper"
elif [ -f "/lib64/ld-linux-x86-64.so.2" ]; then
# x86_64 architecture
echo "Detected x86_64 architecture, copying x86_64 libraries..."
cp -arfLv /lib64/ld-linux-x86-64.so.2 $CURDIR/package/lib/ld.so

@@ -4,7 +4,12 @@ set -ex
CURDIR=$(dirname "$(realpath "$0")")
export ESPEAK_NG_DATA="$CURDIR"/espeak-ng-data
export LD_LIBRARY_PATH="$CURDIR"/lib:$LD_LIBRARY_PATH
if [ "$(uname)" = "Darwin" ]; then
export DYLD_LIBRARY_PATH="$CURDIR"/lib:$DYLD_LIBRARY_PATH
else
export LD_LIBRARY_PATH="$CURDIR"/lib:$LD_LIBRARY_PATH
fi
# If there is a lib/ld.so, use it
if [ -f "$CURDIR"/lib/ld.so ]; then

@@ -15,7 +15,14 @@ cp -avf $CURDIR/run.sh $CURDIR/package/
cp -rfLv $CURDIR/backend-assets/lib/* $CURDIR/package/lib/
# Detect architecture and copy appropriate libraries
if [ -f "/lib64/ld-linux-x86-64.so.2" ]; then
if [ "$(uname)" = "Darwin" ]; then
# macOS has no glibc loader to bundle. silero-vad links its bundled
# libonnxruntime via @rpath but ships with no LC_RPATH, so dyld can't find
# it at runtime. Add an @loader_path/lib rpath so @rpath resolves to
# package/lib/ (matching the piper darwin fix, #10525).
echo "Detected macOS; adding @loader_path/lib rpath so bundled libs resolve via @rpath..."
install_name_tool -add_rpath @loader_path/lib "$CURDIR/package/silero-vad"
elif [ -f "/lib64/ld-linux-x86-64.so.2" ]; then
# x86_64 architecture
echo "Detected x86_64 architecture, copying x86_64 libraries..."
cp -arfLv /lib64/ld-linux-x86-64.so.2 $CURDIR/package/lib/ld.so

@@ -3,7 +3,11 @@ set -ex
CURDIR=$(dirname "$(realpath "$0")")
export LD_LIBRARY_PATH="$CURDIR"/lib:$LD_LIBRARY_PATH
if [ "$(uname)" = "Darwin" ]; then
export DYLD_LIBRARY_PATH="$CURDIR"/lib:$DYLD_LIBRARY_PATH
else
export LD_LIBRARY_PATH="$CURDIR"/lib:$LD_LIBRARY_PATH
fi
# If there is a lib/ld.so, use it
if [ -f "$CURDIR"/lib/ld.so ]; then

@@ -13,8 +13,14 @@ if [ "$(uname)" != "Darwin" ]; then
fi
if [ "$(uname)" = "Darwin" ]; then
# macOS: single dylib variant (Metal or Accelerate)
LIBRARY="$CURDIR/libgowhisper-fallback.dylib"
# macOS: single fallback variant (Metal/Accelerate). The cmake build emits a
# Mach-O named .so, but tolerate .dylib too — pick whichever exists so the Go
# loader doesn't panic on a hardcoded name that isn't on disk.
if [ -e "$CURDIR/libgowhisper-fallback.dylib" ]; then
LIBRARY="$CURDIR/libgowhisper-fallback.dylib"
else
LIBRARY="$CURDIR/libgowhisper-fallback.so"
fi
export DYLD_LIBRARY_PATH="$CURDIR"/lib:$DYLD_LIBRARY_PATH
else
LIBRARY="$CURDIR/libgowhisper-fallback.so"

@@ -1356,7 +1356,6 @@
intel: "intel-fish-speech"
amd: "rocm-fish-speech"
nvidia-l4t: "nvidia-l4t-fish-speech"
metal: "metal-fish-speech"
default: "cpu-fish-speech"
nvidia-cuda-13: "cuda13-fish-speech"
nvidia-cuda-12: "cuda12-fish-speech"
@@ -4870,7 +4869,6 @@
intel: "intel-fish-speech-development"
amd: "rocm-fish-speech-development"
nvidia-l4t: "nvidia-l4t-fish-speech-development"
metal: "metal-fish-speech-development"
default: "cpu-fish-speech-development"
nvidia-cuda-13: "cuda13-fish-speech-development"
nvidia-cuda-12: "cuda12-fish-speech-development"
@@ -4946,16 +4944,6 @@
uri: "quay.io/go-skynet/local-ai-backends:master-nvidia-l4t-cuda-13-arm64-fish-speech"
mirrors:
- localai/localai-backends:master-nvidia-l4t-cuda-13-arm64-fish-speech
- !!merge <<: *fish-speech
name: "metal-fish-speech"
uri: "quay.io/go-skynet/local-ai-backends:latest-metal-darwin-arm64-fish-speech"
mirrors:
- localai/localai-backends:latest-metal-darwin-arm64-fish-speech
- !!merge <<: *fish-speech
name: "metal-fish-speech-development"
uri: "quay.io/go-skynet/local-ai-backends:master-metal-darwin-arm64-fish-speech"
mirrors:
- localai/localai-backends:master-metal-darwin-arm64-fish-speech
## faster-qwen3-tts
- !!merge <<: *faster-qwen3-tts
name: "faster-qwen3-tts-development"

@@ -1,2 +0,0 @@
torch
torchaudio

@@ -7,3 +7,7 @@ setuptools
six
scipy
numpy
# fish-speech is installed editable with --no-build-isolation, so the build
# backends of its transitive deps must already be in the venv. One of them
# builds a Rust extension and needs setuptools-rust present at metadata time.
setuptools-rust

@@ -11,14 +11,31 @@ fi
EXTRA_PIP_INSTALL_FLAGS+=" --upgrade "
installRequirements
# Fetch convert_hf_to_gguf.py from llama.cpp
# Fetch convert_hf_to_gguf.py from llama.cpp.
# Upstream split the model-specific logic out of the single file into a
# sibling `conversion/` package (convert_hf_to_gguf.py now does
# `from conversion import ...`), so a single-file download no longer runs —
# it fails with `ModuleNotFoundError: No module named 'conversion'`. We clone
# the repo and copy both the script and the package; Python puts the script's
# own directory on sys.path[0], so the package resolves when placed beside it.
LLAMA_CPP_CONVERT_VERSION="${LLAMA_CPP_CONVERT_VERSION:-master}"
LLAMA_CPP_SRC="${EDIR}/llama.cpp"
CONVERT_SCRIPT="${EDIR}/convert_hf_to_gguf.py"
if [ ! -f "${CONVERT_SCRIPT}" ]; then
echo "Downloading convert_hf_to_gguf.py from llama.cpp (${LLAMA_CPP_CONVERT_VERSION})..."
curl -L --fail --retry 3 \
"https://raw.githubusercontent.com/ggml-org/llama.cpp/${LLAMA_CPP_CONVERT_VERSION}/convert_hf_to_gguf.py" \
-o "${CONVERT_SCRIPT}" || echo "Warning: Failed to download convert_hf_to_gguf.py."
cloneLlamaCpp() {
if [ ! -d "${LLAMA_CPP_SRC}/.git" ]; then
git clone --depth 1 --branch "${LLAMA_CPP_CONVERT_VERSION}" \
https://github.com/ggml-org/llama.cpp.git "${LLAMA_CPP_SRC}" 2>/dev/null || \
git clone --depth 1 https://github.com/ggml-org/llama.cpp.git "${LLAMA_CPP_SRC}"
fi
}
if [ ! -f "${CONVERT_SCRIPT}" ] || [ ! -d "${EDIR}/conversion" ]; then
echo "Fetching convert_hf_to_gguf.py + conversion/ from llama.cpp (${LLAMA_CPP_CONVERT_VERSION})..."
cloneLlamaCpp
cp "${LLAMA_CPP_SRC}/convert_hf_to_gguf.py" "${CONVERT_SCRIPT}"
rm -rf "${EDIR}/conversion"
cp -r "${LLAMA_CPP_SRC}/conversion" "${EDIR}/conversion"
fi
# Install gguf package from the same llama.cpp commit to keep them in sync
@@ -41,12 +58,7 @@ QUANTIZE_BIN="${EDIR}/llama-quantize"
if [ ! -x "${QUANTIZE_BIN}" ] && ! command -v llama-quantize &>/dev/null; then
if command -v cmake &>/dev/null; then
echo "Building llama-quantize from llama.cpp (${LLAMA_CPP_CONVERT_VERSION})..."
LLAMA_CPP_SRC="${EDIR}/llama.cpp"
if [ ! -d "${LLAMA_CPP_SRC}" ]; then
git clone --depth 1 --branch "${LLAMA_CPP_CONVERT_VERSION}" \
https://github.com/ggml-org/llama.cpp.git "${LLAMA_CPP_SRC}" 2>/dev/null || \
git clone --depth 1 https://github.com/ggml-org/llama.cpp.git "${LLAMA_CPP_SRC}"
fi
cloneLlamaCpp # reuses the clone fetched for convert_hf_to_gguf.py
cmake -B "${LLAMA_CPP_SRC}/build" -S "${LLAMA_CPP_SRC}" -DGGML_NATIVE=OFF -DBUILD_SHARED_LIBS=OFF
cmake --build "${LLAMA_CPP_SRC}/build" --target llama-quantize -j"$(nproc 2>/dev/null || echo 2)"
cp "${LLAMA_CPP_SRC}/build/bin/llama-quantize" "${QUANTIZE_BIN}"

@@ -85,9 +85,15 @@ if [ "x${BUILD_TYPE}" == "x" ] || [ "x${FROM_SOURCE:-}" == "xtrue" ]; then
# The resulting binary still requires an AVX-512 capable CPU at runtime,
# same constraint sglang upstream documents in docker/xeon.Dockerfile.
# Pin the source build to the same release the GPU path floors on
# (0.5.11, see requirements-cublas12-after.txt). An unpinned master clone
# pulls in newer CPU kernels (e.g. mamba/fla.cpp) that fail to compile
# (constexpr non-constant + kineto_LIBRARY-NOTFOUND). Bump deliberately.
SGLANG_VERSION="${SGLANG_VERSION:-v0.5.11}"
_sgl_src=$(mktemp -d)
trap 'rm -rf "${_sgl_src}"' EXIT
git clone --depth 1 https://github.com/sgl-project/sglang "${_sgl_src}/sglang"
git clone --depth 1 --branch "${SGLANG_VERSION}" \
https://github.com/sgl-project/sglang "${_sgl_src}/sglang"
# Patch -march=native → -march=sapphirerapids in the CPU kernel CMakeLists
sed -i 's/-march=native/-march=sapphirerapids/g' \

@@ -570,6 +570,43 @@ impl Backend for KokorosService {
) -> Result<Response<backend::Result>, Status> {
Err(Status::unimplemented("Not supported"))
}
async fn sound_detection(
&self,
_: Request<backend::SoundDetectionRequest>,
) -> Result<Response<backend::SoundDetectionResponse>, Status> {
Err(Status::unimplemented("Not supported"))
}
async fn depth(
&self,
_: Request<backend::DepthRequest>,
) -> Result<Response<backend::DepthResponse>, Status> {
Err(Status::unimplemented("Not supported"))
}
async fn token_classify(
&self,
_: Request<backend::TokenClassifyRequest>,
) -> Result<Response<backend::TokenClassifyResponse>, Status> {
Err(Status::unimplemented("Not supported"))
}
async fn score(
&self,
_: Request<backend::ScoreRequest>,
) -> Result<Response<backend::ScoreResponse>, Status> {
Err(Status::unimplemented("Not supported"))
}
type ForwardStream = ReceiverStream<Result<backend::ForwardReply, Status>>;
async fn forward(
&self,
_: Request<tonic::Streaming<backend::ForwardRequest>>,
) -> Result<Response<Self::ForwardStream>, Status> {
Err(Status::unimplemented("Not supported"))
}
}
#[cfg(test)]

@@ -0,0 +1,8 @@
Website = "https://localai.io"
[Details]
Icon = "../../core/http/static/logo.png"
Name = "LocalAI"
ID = "com.localai.launcher"
Version = "0.0.0"
Build = 1

@@ -0,0 +1,14 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>com.apple.security.network.client</key>
<true/>
<key>com.apple.security.network.server</key>
<true/>
<key>com.apple.security.cs.allow-jit</key>
<true/>
<key>com.apple.security.cs.allow-unsigned-executable-memory</key>
<true/>
</dict>
</plist>

@@ -0,0 +1,84 @@
#!/usr/bin/env bash
# Code-sign and notarize macOS artifacts for LocalAI.
# Every sub-command is a no-op (exit 0) when its required secret is unset,
# so unsigned builds (forks, local dev, PRs) keep working.
set -euo pipefail
ENTITLEMENTS="contrib/macos/Launcher.entitlements"
KEYCHAIN="localai-ci.keychain-db"
cmd_import_cert() {
if [ -z "${MACOS_CERTIFICATE:-}" ]; then
echo "[sign] MACOS_CERTIFICATE unset: skipping cert import (unsigned build)"
return 0
fi
local certfile keychain_pwd default_keychain
certfile="$(mktemp).p12"
keychain_pwd="${MACOS_CI_KEYCHAIN_PWD:?MACOS_CI_KEYCHAIN_PWD required when signing}"
echo "$MACOS_CERTIFICATE" | base64 --decode > "$certfile"
security create-keychain -p "$keychain_pwd" "$KEYCHAIN"
security set-keychain-settings -lut 21600 "$KEYCHAIN"
security unlock-keychain -p "$keychain_pwd" "$KEYCHAIN"
security import "$certfile" -k "$KEYCHAIN" -P "${MACOS_CERTIFICATE_PWD:?}" \
-T /usr/bin/codesign -T /usr/bin/security
security set-key-partition-list -S apple-tool:,apple:,codesign: \
-s -k "$keychain_pwd" "$KEYCHAIN" >/dev/null
default_keychain="$(security default-keychain | tr -d ' "')"
security list-keychains -d user -s "$KEYCHAIN" "$default_keychain"
rm -f "$certfile"
echo "[sign] certificate imported into $KEYCHAIN"
}
cmd_sign() {
local target="$1"
if [ -z "${MACOS_SIGN_IDENTITY:-}" ]; then
echo "[sign] MACOS_SIGN_IDENTITY unset: skipping codesign of $target"
return 0
fi
case "$target" in
*.app)
# Hardened runtime + entitlements are required for notarizing the app bundle.
codesign --deep --force --options runtime --timestamp \
--entitlements "$ENTITLEMENTS" \
--sign "$MACOS_SIGN_IDENTITY" "$target"
;;
*)
# A disk image carries no entitlements/runtime; just sign the container.
codesign --force --timestamp --sign "$MACOS_SIGN_IDENTITY" "$target"
;;
esac
codesign --verify --strict --verbose=2 "$target"
echo "[sign] signed $target"
}
cmd_notarize() {
local dmg="$1"
if [ -z "${MACOS_NOTARY_KEY:-}" ]; then
echo "[notarize] MACOS_NOTARY_KEY unset: skipping notarization of $dmg"
return 0
fi
local keyfile
keyfile="$(mktemp).p8"
echo "$MACOS_NOTARY_KEY" | base64 --decode > "$keyfile"
xcrun notarytool submit "$dmg" \
--key "$keyfile" \
--key-id "${MACOS_NOTARY_KEY_ID:?}" \
--issuer "${MACOS_NOTARY_ISSUER_ID:?}" \
--wait
rm -f "$keyfile"
xcrun stapler staple "$dmg"
xcrun stapler validate "$dmg"
echo "[notarize] notarized and stapled $dmg"
}
main() {
local sub="${1:-}"; shift || true
case "$sub" in
import-cert) cmd_import_cert ;;
sign) cmd_sign "$@" ;;
notarize) cmd_notarize "$@" ;;
*) echo "usage: $0 {import-cert|sign <path>|notarize <dmg>}" >&2; exit 2 ;;
esac
}
main "$@"

@@ -37,6 +37,8 @@ func (a *Application) RestartAgentJobService() error {
if d.JobStore != nil {
agentJobService.SetDistributedJobStore(d.JobStore)
}
// Keep agent tasks consistent across replicas (same client the dispatcher uses).
agentJobService.SetTaskSyncNATS(d.Nats)
}
// Start the service

@@ -604,6 +604,10 @@ func (a *Application) StartAgentPool() {
usm.SetJobDBStore(s)
}
}
// Keep per-user agent tasks consistent across replicas (nil in standalone).
if d := a.Distributed(); d != nil {
usm.SetJobSyncNATS(d.Nats)
}
aps.SetUserServicesManager(usm)
a.agentPoolService.Store(aps)

@@ -16,6 +16,7 @@ import (
"github.com/mudler/LocalAI/core/services/galleryop"
"github.com/mudler/LocalAI/core/services/jobs"
"github.com/mudler/LocalAI/core/services/messaging"
"github.com/mudler/LocalAI/core/services/modeladmin"
"github.com/mudler/LocalAI/core/services/monitoring"
"github.com/mudler/LocalAI/core/services/nodes"
"github.com/mudler/LocalAI/core/services/routing/admission"
@@ -279,6 +280,9 @@ func New(opts ...config.AppOption) (*Application, error) {
if application.agentJobService != nil {
application.agentJobService.SetDistributedBackends(distSvc.Dispatcher)
application.agentJobService.SetDistributedJobStore(distSvc.JobStore)
// Keep agent tasks consistent across replicas (jobs already sync via the
// dispatcher + DB read-through). Same NATS client the dispatcher uses.
application.agentJobService.SetTaskSyncNATS(distSvc.Nats)
}
// Wire skill store into AgentPoolService (wired at pool start time via closure)
// The actual wiring happens in StartAgentPool since the pool doesn't exist yet.
@@ -330,9 +334,14 @@ func New(opts ...config.AppOption) (*Application, error) {
gs := application.galleryService
sys := options.SystemState
cfgLoaderOpts := options.ToConfigLoaderOptions()
gs.OnModelsChanged = func(_ messaging.CacheInvalidateEvent) {
if err := application.ModelConfigLoader().LoadModelConfigsFromPath(sys.Model.ModelsPath, cfgLoaderOpts...); err != nil {
xlog.Warn("Failed to reload model configs after peer invalidation", "error", err)
gs.OnModelsChanged = func(evt messaging.CacheInvalidateEvent) {
// ApplyRemoteChange honors the op: a "delete" prunes the element
// (a reload-from-path is additive and cannot drop it), anything
// else reloads from disk; a named element's running instance is
// shut down so the new config takes effect. The originating
// replica reloads inline and never depends on this path.
if err := modeladmin.ApplyRemoteChange(application.ModelConfigLoader(), application.modelLoader, sys.Model.ModelsPath, evt, cfgLoaderOpts...); err != nil {
xlog.Warn("Failed to apply peer model config change", "error", err)
}
}
if err := application.galleryService.SubscribeBroadcasts(); err != nil {
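
The comment in the hunk above says a reload-from-path is additive and cannot drop a deleted element. A minimal sketch of that reasoning follows. The `registry` type and both methods are hypothetical stand-ins written for illustration, not the real `ModelConfigLoader` or `modeladmin.ApplyRemoteChange`, and the sketch leaves out shutting down a running instance.

```go
package main

import "fmt"

// registry is a hypothetical in-memory config set keyed by model name.
type registry map[string]string

// reloadFromDisk merges the on-disk configs into the registry. It only adds
// or overwrites entries, so a name that vanished from disk stays in memory.
func (r registry) reloadFromDisk(disk map[string]string) {
	for name, cfg := range disk {
		r[name] = cfg
	}
}

// applyChange is a simplified, hypothetical op-aware handler: "delete"
// prunes the named entry, any other op falls back to a reload from disk.
func (r registry) applyChange(disk map[string]string, name, op string) {
	if op == "delete" {
		delete(r, name)
		return
	}
	r.reloadFromDisk(disk)
}

func main() {
	mem := registry{"a": "v1", "b": "v1"}
	disk := map[string]string{"a": "v2"} // "b" was deleted on another replica

	mem.reloadFromDisk(disk)
	_, stale := mem["b"]
	fmt.Println("b survives a plain reload:", stale)

	mem.applyChange(disk, "b", "delete")
	_, stale = mem["b"]
	fmt.Println("b survives a delete op:", stale)
}
```

Under these assumptions the plain reload refreshes `a` but leaves the stale `b` in memory, which is the gap the op-aware callback is meant to close.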

@@ -12,14 +12,12 @@ package config
// these; config never imports backend.
const (
// DefaultContextSize is the fallback context window when none is configured
// or estimable from the model.
// or estimable from the model. It is also the fallback for a GGUF whose
// metadata yields no usable estimate or that the parser cannot read at all
// (e.g. a quant type it does not know, such as NVFP4): a model-agnostic
// safe default beats a tiny, surprising window that truncates real prompts.
DefaultContextSize = 4096
// GGUFFallbackContextSize is the context window for a GGUF model whose
// metadata yields no usable estimate (see guessGGUFFromFile). Deliberately
// smaller than DefaultContextSize to stay conservative on memory there.
GGUFFallbackContextSize = 1024
// DefaultNGPULayers means "offload all layers"; the backend (fit_params)
// clamps to what actually fits in device memory.
DefaultNGPULayers = 99999999

@@ -33,7 +33,7 @@ func guessGGUFFromFile(cfg *ModelConfig, f *gguf.GGUFFile, defaultCtx int) {
cSize := int(ctxSize)
cfg.ContextSize = &cSize
} else {
defaultCtx = GGUFFallbackContextSize
defaultCtx = DefaultContextSize
cfg.ContextSize = &defaultCtx
}
}

@@ -34,7 +34,7 @@ func llamaCppDefaults(cfg *ModelConfig, modelPath string) {
// Default context size if not set, regardless of whether GGUF parsing succeeds
defer func() {
if cfg.ContextSize == nil {
ctx := GGUFFallbackContextSize
ctx := DefaultContextSize
cfg.ContextSize = &ctx
}
}()
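
The deferred fallback above can be illustrated in isolation. This is a minimal sketch, assuming a hypothetical `modelConfig` struct and `applyDefaults` helper; only the `DefaultContextSize` value of 4096 and the nil check come from the diff. The `estimate` callback stands in for GGUF parsing, which may leave the context size unset.

```go
package main

import "fmt"

// DefaultContextSize mirrors the constant shown in the diff above.
const DefaultContextSize = 4096

// modelConfig is a hypothetical stand-in for ModelConfig; only the field
// relevant to this sketch is modelled.
type modelConfig struct {
	ContextSize *int
}

// applyDefaults is a hypothetical helper. estimate stands in for GGUF
// parsing and may leave ContextSize nil (unknown quant type, unreadable
// file). The deferred closure fills in the fallback on every return path.
func applyDefaults(cfg *modelConfig, estimate func(*modelConfig)) {
	defer func() {
		if cfg.ContextSize == nil {
			ctx := DefaultContextSize
			cfg.ContextSize = &ctx
		}
	}()
	if estimate == nil {
		return // nothing to parse; the deferred fallback still runs
	}
	estimate(cfg)
}

func main() {
	unreadable := &modelConfig{}
	applyDefaults(unreadable, nil)
	fmt.Println(*unreadable.ContextSize) // fallback value

	known := &modelConfig{}
	applyDefaults(known, func(c *modelConfig) { n := 8192; c.ContextSize = &n })
	fmt.Println(*known.ContextSize) // the estimate is kept
}
```

Putting the check in a `defer` means an early return from the parsing path cannot skip the default, which appears to be why the diff only needs to swap the constant.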

@@ -248,7 +248,11 @@ var _ = Describe("Backend hooks and parser defaults", func() {
}
cfg.SetDefaults(ModelPath(dir))
// An unreadable/unparseable GGUF (e.g. a quant type the parser does
// not know, such as NVFP4) yields no estimate, so the hook must fall
// back to DefaultContextSize rather than a tiny, surprising value.
Expect(cfg.ContextSize).NotTo(BeNil())
Expect(*cfg.ContextSize).To(Equal(DefaultContextSize))
})
})

@@ -23,8 +23,10 @@ import (
"github.com/mudler/LocalAI/core/application"
"github.com/mudler/LocalAI/core/schema"
"github.com/mudler/LocalAI/core/services/distributed"
"github.com/mudler/LocalAI/core/services/finetune"
"github.com/mudler/LocalAI/core/services/galleryop"
"github.com/mudler/LocalAI/core/services/messaging"
"github.com/mudler/LocalAI/core/services/nodes"
"github.com/mudler/LocalAI/core/services/quantization"
@@ -400,25 +402,45 @@ func API(application *application.Application) (*echo.Echo, error) {
routes.RegisterAgentPoolRoutes(e, application, agentsMw, skillsMw, collectionsMw)
// Fine-tuning routes
fineTuningMw := auth.RequireFeature(application.AuthDB(), auth.FeatureFineTuning)
// In distributed mode pass the shared NATS client + PostgreSQL store so
// fine-tune jobs stay consistent across replicas (the SyncedMap broadcasts
// mutations and hydrates from the DB); standalone passes nil for both.
var ftNats messaging.MessagingClient
var ftStore *distributed.FineTuneStore
if d := application.Distributed(); d != nil {
ftNats = d.Nats
if d.DistStores != nil && d.DistStores.FineTune != nil {
ftStore = d.DistStores.FineTune
}
}
ftService := finetune.NewFineTuneService(
application.ApplicationConfig(),
application.ModelLoader(),
application.ModelConfigLoader(),
ftNats,
ftStore,
)
if d := application.Distributed(); d != nil {
ftService.SetNATSClient(d.Nats)
if d.DistStores != nil && d.DistStores.FineTune != nil {
ftService.SetFineTuneStore(d.DistStores.FineTune)
}
}
routes.RegisterFineTuningRoutes(e, ftService, application.ApplicationConfig(), fineTuningMw)
// Quantization routes
quantizationMw := auth.RequireFeature(application.AuthDB(), auth.FeatureQuantization)
// In distributed mode pass the shared NATS client + PostgreSQL store so
// quantization jobs stay consistent across replicas (the SyncedMap broadcasts
// mutations and hydrates from the DB); standalone passes nil for both.
var quantNats messaging.MessagingClient
var quantStore *distributed.QuantStore
if d := application.Distributed(); d != nil {
quantNats = d.Nats
if d.DistStores != nil && d.DistStores.Quant != nil {
quantStore = d.DistStores.Quant
}
}
qService := quantization.NewQuantizationService(
application.ApplicationConfig(),
application.ModelLoader(),
application.ModelConfigLoader(),
quantNats,
quantStore,
)
routes.RegisterQuantizationRoutes(e, qService, application.ApplicationConfig(), quantizationMw)
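
The "standalone passes nil for both" convention above can be sketched as a service that treats its messaging client as optional. Everything here is hypothetical (`publisher`, `jobService`, `recorder`, the subject format) and is not the real `messaging.MessagingClient` or `SyncedMap` API. The sketch only shows the nil-guard pattern.

```go
package main

import "fmt"

// publisher is a hypothetical, minimal view of a messaging client.
type publisher interface {
	Publish(subject string, payload []byte) error
}

// jobService keeps job state locally and, when a publisher is wired in,
// announces each mutation so peers can update their own copy.
type jobService struct {
	pub  publisher
	jobs map[string]string
}

func newJobService(pub publisher) *jobService {
	return &jobService{pub: pub, jobs: map[string]string{}}
}

// SetState records the state and broadcasts it only in distributed mode.
func (s *jobService) SetState(id, state string) error {
	s.jobs[id] = state
	if s.pub == nil {
		return nil // standalone: nothing to broadcast
	}
	return s.pub.Publish("jobs."+id, []byte(state))
}

// recorder is a test double that remembers the subjects it was given.
type recorder struct{ subjects []string }

func (r *recorder) Publish(subject string, _ []byte) error {
	r.subjects = append(r.subjects, subject)
	return nil
}

func main() {
	standalone := newJobService(nil)
	fmt.Println(standalone.SetState("j1", "running"))

	rec := &recorder{}
	distributed := newJobService(rec)
	_ = distributed.SetState("j1", "running")
	fmt.Println(rec.subjects)
}
```

One Go detail matters for this pattern: the nil check only works when the value handed to the constructor is a nil interface. Declaring the variable with the interface type, as the diff does with `ftNats` and `quantNats`, keeps it that way. A nil concrete pointer stored in the interface would compare as non-nil.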

@@ -155,7 +155,7 @@ func AutocompleteEndpoint(cl *config.ModelConfigLoader, ml *model.ModelLoader, a
// @Param name path string true "Model name"
// @Success 200 {object} map[string]any "success message"
// @Router /api/models/config-json/{name} [patch]
func PatchConfigEndpoint(cl *config.ModelConfigLoader, _ *model.ModelLoader, appConfig *config.ApplicationConfig) echo.HandlerFunc {
func PatchConfigEndpoint(cl *config.ModelConfigLoader, _ *model.ModelLoader, gs *galleryop.GalleryService, appConfig *config.ApplicationConfig) echo.HandlerFunc {
svc := modeladmin.NewConfigService(cl, appConfig)
return func(c echo.Context) error {
modelName := c.Param("name")
@@ -173,6 +173,14 @@ func PatchConfigEndpoint(cl *config.ModelConfigLoader, _ *model.ModelLoader, app
if _, err := svc.PatchConfig(c.Request().Context(), modelName, patchMap); err != nil {
return c.JSON(httpStatusForModelAdminError(err), map[string]any{"error": err.Error()})
}
// Patch rewrites the config on disk and reloads only the local loader;
// tell peers to refresh so the change is consistent across replicas.
// No-op in standalone mode.
if gs != nil {
gs.BroadcastModelsChanged(modelName, "install")
}
return c.JSON(http.StatusOK, map[string]any{
"success": true,
"message": fmt.Sprintf("Model '%s' updated successfully", modelName),

@@ -45,7 +45,7 @@ var _ = Describe("Config Metadata Endpoints", func() {
app = echo.New()
app.GET("/api/models/config-metadata", ConfigMetadataEndpoint())
app.GET("/api/models/config-metadata/autocomplete/:provider", AutocompleteEndpoint(configLoader, modelLoader, appConfig))
app.PATCH("/api/models/config-json/:name", PatchConfigEndpoint(configLoader, modelLoader, appConfig))
app.PATCH("/api/models/config-json/:name", PatchConfigEndpoint(configLoader, modelLoader, nil, appConfig))
})
AfterEach(func() {

@@ -10,6 +10,7 @@ import (
"github.com/labstack/echo/v4"
"github.com/mudler/LocalAI/core/config"
httpUtils "github.com/mudler/LocalAI/core/http/middleware"
"github.com/mudler/LocalAI/core/services/galleryop"
"github.com/mudler/LocalAI/core/services/modeladmin"
"github.com/mudler/LocalAI/internal"
"github.com/mudler/LocalAI/pkg/model"
@@ -55,7 +56,7 @@ func GetEditModelPage(cl *config.ModelConfigLoader, appConfig *config.Applicatio
}
// EditModelEndpoint handles updating existing model configurations
func EditModelEndpoint(cl *config.ModelConfigLoader, ml *model.ModelLoader, appConfig *config.ApplicationConfig) echo.HandlerFunc {
func EditModelEndpoint(cl *config.ModelConfigLoader, ml *model.ModelLoader, gs *galleryop.GalleryService, appConfig *config.ApplicationConfig) echo.HandlerFunc {
svc := modeladmin.NewConfigService(cl, appConfig)
return func(c echo.Context) error {
modelName := c.Param("name")
@@ -70,6 +71,17 @@ func EditModelEndpoint(cl *config.ModelConfigLoader, ml *model.ModelLoader, appC
if err != nil {
return c.JSON(httpStatusForModelAdminError(err), ModelResponse{Success: false, Error: err.Error()})
}
// Tell peer replicas to refresh their in-memory config: this endpoint
// only reloaded the local loader. A rename is a delete of the old name
// plus an install of the new one. No-op in standalone mode.
if gs != nil {
if result.Renamed {
gs.BroadcastModelsChanged(result.OldName, "delete")
}
gs.BroadcastModelsChanged(result.NewName, "install")
}
msg := fmt.Sprintf("Model '%s' updated successfully. Model has been reloaded with new configuration.", result.NewName)
if result.Renamed {
msg = fmt.Sprintf("Model '%s' renamed to '%s' and updated successfully.", result.OldName, result.NewName)

@@ -56,7 +56,7 @@ var _ = Describe("Edit Model test", func() {
app := echo.New()
// Set up a simple renderer for the test
app.Renderer = &testRenderer{}
app.POST("/import-model", ImportModelEndpoint(modelConfigLoader, applicationConfig))
app.POST("/import-model", ImportModelEndpoint(modelConfigLoader, nil, applicationConfig))
app.GET("/edit-model/:name", GetEditModelPage(modelConfigLoader, applicationConfig))
requestBody := bytes.NewBufferString(`{"name": "foo", "backend": "foo", "model": "foo"}`)
@@ -106,7 +106,7 @@ var _ = Describe("Edit Model test", func() {
Expect(exists).To(BeTrue())
app := echo.New()
app.POST("/models/edit/:name", EditModelEndpoint(modelConfigLoader, modelLoader, applicationConfig))
app.POST("/models/edit/:name", EditModelEndpoint(modelConfigLoader, modelLoader, nil, applicationConfig))
newYAML := "name: newname\nbackend: llama\nmodel: foo\n"
req := httptest.NewRequest("POST", "/models/edit/oldname", bytes.NewBufferString(newYAML))
@@ -163,7 +163,7 @@ var _ = Describe("Edit Model test", func() {
Expect(modelConfigLoader.LoadModelConfigsFromPath(tempDir)).To(Succeed())
app := echo.New()
app.POST("/models/edit/:name", EditModelEndpoint(modelConfigLoader, modelLoader, applicationConfig))
app.POST("/models/edit/:name", EditModelEndpoint(modelConfigLoader, modelLoader, nil, applicationConfig))
req := httptest.NewRequest(
"POST",
@@ -204,7 +204,7 @@ var _ = Describe("Edit Model test", func() {
Expect(modelConfigLoader.LoadModelConfigsFromPath(tempDir)).To(Succeed())
app := echo.New()
app.POST("/models/edit/:name", EditModelEndpoint(modelConfigLoader, modelLoader, applicationConfig))
app.POST("/models/edit/:name", EditModelEndpoint(modelConfigLoader, modelLoader, nil, applicationConfig))
req := httptest.NewRequest(
"POST",

@@ -125,7 +125,7 @@ func ImportModelURIEndpoint(cl *config.ModelConfigLoader, appConfig *config.Appl
}
// ImportModelEndpoint handles creating new model configurations
func ImportModelEndpoint(cl *config.ModelConfigLoader, appConfig *config.ApplicationConfig) echo.HandlerFunc {
func ImportModelEndpoint(cl *config.ModelConfigLoader, gs *galleryop.GalleryService, appConfig *config.ApplicationConfig) echo.HandlerFunc {
return func(c echo.Context) error {
// Get the raw body
body, err := io.ReadAll(c.Request().Body)
@@ -245,6 +245,13 @@ func ImportModelEndpoint(cl *config.ModelConfigLoader, appConfig *config.Applica
}
return c.JSON(http.StatusInternalServerError, response)
}
// Tell peer replicas to load the newly created config from the shared
// models dir, because this endpoint only reloads the local loader. No-op in
// standalone mode.
if gs != nil {
gs.BroadcastModelsChanged(modelConfig.Name, "install")
}
// Return success response
response := ModelResponse{
Success: true,
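
The nil guard in the hunk above can be sketched in isolation. This is a minimal illustration, not the LocalAI implementation: `broadcaster` and `importModel` are hypothetical stand-ins, and only the method name `BroadcastModelsChanged` and the `"install"` argument are taken from the diff.

```go
package main

import "fmt"

// broadcaster is a hypothetical stand-in for *galleryop.GalleryService; only
// the method name BroadcastModelsChanged comes from the diff.
type broadcaster struct {
	sent []string
}

// BroadcastModelsChanged records the notification instead of publishing it.
func (b *broadcaster) BroadcastModelsChanged(name, op string) {
	b.sent = append(b.sent, op+":"+name)
}

// importModel mimics the tail of the handler: after the local reload it
// notifies peers only when a broadcaster was wired in (nil means standalone).
func importModel(gs *broadcaster, name string) string {
	// ... write the config and reload the local loader here ...
	if gs != nil {
		gs.BroadcastModelsChanged(name, "install")
	}
	return "imported " + name
}

func main() {
	fmt.Println(importModel(nil, "foo")) // standalone: no broadcast, no panic
	gs := &broadcaster{}
	fmt.Println(importModel(gs, "bar"))
	fmt.Println(gs.sent)
}
```

Because the parameter is a concrete pointer type rather than an interface, passing a literal `nil` (as the updated tests do) makes the `gs != nil` check behave as expected.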

@@ -60,7 +60,10 @@ func GetNodeEndpoint(registry *nodes.NodeRegistry) echo.HandlerFunc {
return func(c echo.Context) error {
ctx := c.Request().Context()
id := c.Param("id")
node, err := registry.Get(ctx, id)
// GetWithExtras (not Get) so the response carries the node's labels,
// loaded-model count, and in-flight total — the bare BackendNode keeps
// labels in a separate table, leaving the detail view's label list empty.
node, err := registry.GetWithExtras(ctx, id)
if err != nil {
return c.JSON(http.StatusNotFound, nodeError(http.StatusNotFound, "node not found"))
}

@@ -7,6 +7,7 @@ import (
"github.com/labstack/echo/v4"
"github.com/mudler/LocalAI/core/config"
"github.com/mudler/LocalAI/core/services/galleryop"
"github.com/mudler/LocalAI/core/services/modeladmin"
"github.com/mudler/LocalAI/pkg/model"
)
@@ -24,7 +25,7 @@ import (
// @Failure 404 {object} ModelResponse
// @Failure 500 {object} ModelResponse
// @Router /api/models/{name}/{action} [put]
func ToggleStateModelEndpoint(cl *config.ModelConfigLoader, ml *model.ModelLoader, appConfig *config.ApplicationConfig) echo.HandlerFunc {
func ToggleStateModelEndpoint(cl *config.ModelConfigLoader, ml *model.ModelLoader, gs *galleryop.GalleryService, appConfig *config.ApplicationConfig) echo.HandlerFunc {
svc := modeladmin.NewConfigService(cl, appConfig)
return func(c echo.Context) error {
modelName := c.Param("name")
@@ -36,6 +37,14 @@ func ToggleStateModelEndpoint(cl *config.ModelConfigLoader, ml *model.ModelLoade
if err != nil {
return c.JSON(httpStatusForModelAdminError(err), ModelResponse{Success: false, Error: err.Error()})
}
// Enabling/disabling rewrites the config on disk and reloads only the
// local loader; tell peers to refresh so the model's availability is
// consistent across replicas. No-op in standalone mode.
if gs != nil {
gs.BroadcastModelsChanged(modelName, "install")
}
msg := fmt.Sprintf("Model '%s' has been %sd successfully.", modelName, action)
if action == modeladmin.ActionDisable {
msg += " The model will not be loaded on demand until re-enabled."

@@ -72,19 +72,19 @@ func RegisterLocalAIRoutes(router *echo.Echo,
router.POST("/backends/upgrades/check", backendGalleryEndpointService.CheckUpgradesEndpoint(), adminMiddleware)
router.POST("/backends/upgrade/:name", backendGalleryEndpointService.UpgradeBackendEndpoint(), adminMiddleware)
// Custom model import endpoint
router.POST("/models/import", localai.ImportModelEndpoint(cl, appConfig), adminMiddleware)
router.POST("/models/import", localai.ImportModelEndpoint(cl, galleryService, appConfig), adminMiddleware)
// URI model import endpoint
router.POST("/models/import-uri", localai.ImportModelURIEndpoint(cl, appConfig, galleryService, opcache), adminMiddleware)
// Custom model edit endpoint
router.POST("/models/edit/:name", localai.EditModelEndpoint(cl, ml, appConfig), adminMiddleware)
router.POST("/models/edit/:name", localai.EditModelEndpoint(cl, ml, galleryService, appConfig), adminMiddleware)
// List model aliases endpoint
router.GET("/api/aliases", localai.ListAliasesEndpoint(cl), adminMiddleware)
// Toggle model enable/disable endpoint
router.PUT("/models/toggle-state/:name/:action", localai.ToggleStateModelEndpoint(cl, ml, appConfig), adminMiddleware)
router.PUT("/models/toggle-state/:name/:action", localai.ToggleStateModelEndpoint(cl, ml, galleryService, appConfig), adminMiddleware)
// Toggle model pinned status endpoint
router.PUT("/models/toggle-pinned/:name/:action", localai.TogglePinnedModelEndpoint(cl, appConfig, func() {

@@ -922,7 +922,7 @@ func RegisterUIAPIRoutes(app *echo.Echo, cl *config.ModelConfigLoader, ml *model
app.GET("/api/models/config-metadata/autocomplete/:provider", localai.AutocompleteEndpoint(cl, ml, appConfig), adminMiddleware)
// PATCH config endpoint - partial update using nested JSON merge
app.PATCH("/api/models/config-json/:name", localai.PatchConfigEndpoint(cl, ml, appConfig), adminMiddleware)
app.PATCH("/api/models/config-json/:name", localai.PatchConfigEndpoint(cl, ml, galleryService, appConfig), adminMiddleware)
// VRAM estimation endpoint
app.POST("/api/models/vram-estimate", localai.VRAMEstimateEndpoint(cl, appConfig), adminMiddleware)

@@ -68,6 +68,32 @@ var _ = Describe("LLM tests", func() {
Expect(protoMessages[0].Content).To(Equal("Hello World"))
})
// Regression for mudler/LocalAI#10524: a text part whose inner text is
// itself a JSON-array string (mealie sends an ingredient list) must
// flatten to that exact string verbatim. ToProto must NOT escape or
// restructure it - the C++ backend then treats it as opaque text. This
// pins the precise Go-side input that produced the "unsupported
// content[].type" gRPC error before the backend stopped re-parsing it.
It("flattens a JSON-array-looking text part to the verbatim string (#10524)", func() {
ingredients := `["1/4 cup brown sugar, packed","1 pound ground beef"]`
messages := Messages{
{
Role: "user",
Content: []any{
map[string]any{
"type": "text",
"text": ingredients,
},
},
},
}
protoMessages := messages.ToProto()
Expect(protoMessages).To(HaveLen(1))
Expect(protoMessages[0].Content).To(Equal(ingredients))
})
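
The property this regression spec pins down, that a text part is passed through as an opaque string even when it looks like JSON, can be sketched as follows. `flattenText` is a hypothetical helper written for illustration; it is not LocalAI's `ToProto` and makes no claim about how that method is implemented.

```go
package main

import (
	"fmt"
	"strings"
)

// flattenText is a hypothetical helper: it concatenates the "text" field of
// every {"type": "text"} part and never parses or escapes the text itself.
func flattenText(parts []any) string {
	var sb strings.Builder
	for _, p := range parts {
		m, ok := p.(map[string]any)
		if !ok || m["type"] != "text" {
			continue // skip non-text parts in this sketch
		}
		if s, ok := m["text"].(string); ok {
			sb.WriteString(s)
		}
	}
	return sb.String()
}

func main() {
	ingredients := `["1/4 cup brown sugar, packed","1 pound ground beef"]`
	parts := []any{
		map[string]any{"type": "text", "text": ingredients},
	}
	// The JSON-array-looking string comes back byte-for-byte.
	fmt.Println(flattenText(parts) == ingredients)
}
```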
It("should convert message with tool_calls", func() {
messages := Messages{
{

@@ -30,6 +30,8 @@ import (
mcpTools "github.com/mudler/LocalAI/core/http/endpoints/mcp"
"github.com/mudler/LocalAI/core/schema"
"github.com/mudler/LocalAI/core/services/jobs"
"github.com/mudler/LocalAI/core/services/messaging"
"github.com/mudler/LocalAI/core/services/syncstate"
"github.com/mudler/LocalAI/core/templates"
"github.com/mudler/LocalAI/pkg/httpclient"
"github.com/mudler/LocalAI/pkg/model"
@@ -43,8 +45,18 @@ type AgentJobService struct {
configLoader *config.ModelConfigLoader
evaluator *templates.Evaluator
// tasks is the cross-replica task store: an in-memory map kept consistent
// across replicas via NATS, with read-through to the configured persister
// (file in standalone, PostgreSQL in distributed). Unlike jobs - which already
// converge via the dispatcher + DB read-through - tasks previously read
// in-memory only, so ListTasks went stale on non-originating replicas.
tasks *syncstate.SyncedMap[string, schema.Task]
// taskNats is the distributed NATS client backing the tasks SyncedMap. It is
// not available at construction time, so it is injected via SetTaskSyncNATS
// during distributed wiring; nil keeps tasks in-memory-only (standalone).
taskNats messaging.MessagingClient
// Storage (in-memory primary, persister for secondary persistence)
tasks *xsync.SyncedMap[string, schema.Task]
jobs *xsync.SyncedMap[string, schema.Job]
persister JobPersister
userID string // Scoping: empty for global (main service), set for per-user instances
@@ -96,6 +108,31 @@ func (s *AgentJobService) SetDistributedJobStore(store *jobs.JobStore) {
s.persister = &dbJobPersister{store: store}
}
// SetTaskSyncNATS wires the distributed NATS client used to keep agent *tasks*
// consistent across replicas (jobs already converge via the dispatcher + DB
// read-through, so they are left untouched). The client is not available when the
// service is constructed, so it is injected here during distributed wiring and the
// tasks SyncedMap is rebuilt to pick it up. It is always called before Start /
// hydrate, while the map is still empty, so rebuilding loses no state. Passing nil
// (standalone) keeps the map in-memory-only with no broadcast.
func (s *AgentJobService) SetTaskSyncNATS(nats messaging.MessagingClient) {
s.taskNats = nats
s.buildTasksMap()
}
// buildTasksMap (re)constructs the cross-replica tasks SyncedMap from the current
// taskNats. The Store adapter reads s.persister/s.userID live, so a persister swap
// (SetDistributedJobStore) needs no rebuild; only the NATS client, fixed at
// New-time, forces one - hence SetTaskSyncNATS calls this.
func (s *AgentJobService) buildTasksMap() {
s.tasks = syncstate.New(syncstate.Config[string, schema.Task]{
Name: "agent.tasks",
Key: func(t schema.Task) string { return t.ID },
Nats: s.taskNats,
Store: &taskStoreAdapter{svc: s},
})
}
// Dispatcher returns the distributed dispatcher (nil if not in distributed mode).
func (s *AgentJobService) Dispatcher() DistributedDispatcher {
return s.dispatcher
@@ -106,13 +143,6 @@ func (s *AgentJobService) DBStore() *jobs.JobStore {
return s.rawDBStore
}
// saveTasks persists tasks via the configured persister (file or DB).
func (s *AgentJobService) saveTasks(task schema.Task) {
if err := s.persister.SaveTask(s.userID, task); err != nil {
xlog.Warn("Failed to persist task", "error", err, "task_id", task.ID)
}
}
// saveJobs persists jobs via the configured persister (file or DB).
func (s *AgentJobService) saveJobs(job schema.Job) {
if err := s.persister.SaveJob(s.userID, job); err != nil {
@@ -129,18 +159,8 @@ func (s *AgentJobService) LoadFromDB() {
// loadFromPersister loads tasks and jobs from the configured persister into memory.
func (s *AgentJobService) loadFromPersister() {
if tasks, err := s.persister.LoadTasks(s.userID); err != nil {
if err := s.hydrateTasks(s.appConfig.Context); err != nil {
xlog.Warn("Failed to load tasks from persister", "error", err)
} else {
for _, task := range tasks {
s.tasks.Set(task.ID, task)
if task.Enabled && task.Cron != "" {
if err := s.ScheduleCronTask(task); err != nil {
xlog.Warn("Failed to schedule cron task on load", "error", err, "task_id", task.ID)
}
}
}
xlog.Info("Loaded tasks from persister", "count", len(tasks))
}
if loadedJobs, err := s.persister.LoadJobs(s.userID); err != nil {
@@ -153,6 +173,27 @@ func (s *AgentJobService) loadFromPersister() {
}
}
// hydrateTasks loads tasks into the cross-replica SyncedMap and (re)schedules
// cron entries for enabled tasks. Hydration goes through the SyncedMap's Store
// read-through (Start), not Set, so it neither re-persists nor re-broadcasts the
// loaded tasks. Each service instance hydrates exactly once: the main service via
// Start -> loadFromPersister, per-user services via LoadFromDB or LoadTasksFromFile.
func (s *AgentJobService) hydrateTasks(ctx context.Context) error {
if err := s.tasks.Start(ctx); err != nil {
return err
}
tasks := s.tasks.List()
for _, task := range tasks {
if task.Enabled && task.Cron != "" {
if err := s.ScheduleCronTask(task); err != nil {
xlog.Warn("Failed to schedule cron task on load", "error", err, "task_id", task.ID)
}
}
}
xlog.Info("Loaded tasks from persister", "count", len(tasks))
return nil
}
// JobExecution represents a job to be executed
type JobExecution struct {
Job schema.Job
@@ -200,21 +241,19 @@ func NewAgentJobServiceWithPaths(
) *AgentJobService {
retentionDays := cmp.Or(appConfig.AgentJobRetentionDays, 30)
tasks := xsync.NewSyncedMap[string, schema.Task]()
jobsMap := xsync.NewSyncedMap[string, schema.Job]()
return &AgentJobService{
s := &AgentJobService{
appConfig: appConfig,
modelLoader: modelLoader,
configLoader: configLoader,
evaluator: evaluator,
tasks: tasks,
jobs: jobsMap,
persister: &fileJobPersister{
tasks: tasks,
jobs: jobsMap,
tasksFile: tasksFile,
jobsFile: jobsFile,
taskSet: make(map[string]schema.Task),
},
jobQueue: make(chan JobExecution, 100), // Buffer for 100 jobs
cancellations: xsync.NewSyncedMap[string, context.CancelFunc](),
@@ -222,25 +261,17 @@ func NewAgentJobServiceWithPaths(
cronEntries: xsync.NewSyncedMap[string, cron.EntryID](),
retentionDays: retentionDays,
}
// Build the cross-replica tasks map standalone (nil NATS); SetTaskSyncNATS
// rebuilds it with the distributed client once that is available, before Start.
s.buildTasksMap()
return s
}
// LoadTasksFromFile loads tasks from the persister into the in-memory map
// and schedules cron entries. Named "FromFile" for backward compat; in DB
// mode it loads from the database.
func (s *AgentJobService) LoadTasksFromFile() error {
tasks, err := s.persister.LoadTasks(s.userID)
if err != nil {
return err
}
for _, task := range tasks {
s.tasks.Set(task.ID, task)
if task.Enabled && task.Cron != "" {
if err := s.ScheduleCronTask(task); err != nil {
xlog.Warn("Failed to schedule cron task on load", "error", err, "task_id", task.ID)
}
}
}
return nil
return s.hydrateTasks(s.appConfig.Context)
}
// SaveTasksToFile flushes the current tasks map via the persister. File
@@ -293,8 +324,12 @@ func (s *AgentJobService) CreateTask(task schema.Task) (string, error) {
task.Enabled = true // Default to enabled
}
// Store task
s.tasks.Set(id, task)
// Store task: Set updates the in-memory map, writes through to the persister
// (file or DB), and broadcasts the create to peer replicas. It uses a background
// ctx because CreateTask carries no request ctx (mirrors the finetune service).
if err := s.tasks.Set(context.Background(), task); err != nil {
return "", fmt.Errorf("failed to persist task: %w", err)
}
// Schedule cron if enabled and has cron expression
if task.Enabled && task.Cron != "" {
@@ -303,16 +338,15 @@ func (s *AgentJobService) CreateTask(task schema.Task) (string, error) {
}
}
s.saveTasks(task)
return id, nil
}
// UpdateTask updates an existing task
func (s *AgentJobService) UpdateTask(id string, task schema.Task) error {
if !s.tasks.Exists(id) {
existing, ok := s.tasks.Get(id)
if !ok {
return fmt.Errorf("%w: %s", ErrTaskNotFound, id)
}
existing := s.tasks.Get(id)
// Preserve ID and CreatedAt
task.ID = id
@@ -324,8 +358,10 @@ func (s *AgentJobService) UpdateTask(id string, task schema.Task) error {
s.UnscheduleCronTask(id)
}
// Store updated task
s.tasks.Set(id, task)
// Store updated task: write-through + broadcast (see CreateTask).
if err := s.tasks.Set(context.Background(), task); err != nil {
return fmt.Errorf("failed to persist task: %w", err)
}
// Schedule new cron if enabled and has cron expression
if task.Enabled && task.Cron != "" {
@@ -334,24 +370,22 @@ func (s *AgentJobService) UpdateTask(id string, task schema.Task) error {
}
}
s.saveTasks(task)
return nil
}
// DeleteTask deletes a task
func (s *AgentJobService) DeleteTask(id string) error {
if !s.tasks.Exists(id) {
if _, ok := s.tasks.Get(id); !ok {
return fmt.Errorf("%w: %s", ErrTaskNotFound, id)
}
// Unschedule cron
s.UnscheduleCronTask(id)
// Remove from memory
s.tasks.Delete(id)
if err := s.persister.DeleteTask(id); err != nil {
xlog.Warn("Failed to delete task from persister", "error", err, "task_id", id)
// Delete removes from the in-memory map, deletes from the persister, and
// broadcasts the removal to peer replicas.
if err := s.tasks.Delete(context.Background(), id); err != nil {
xlog.Warn("Failed to delete task from store", "error", err, "task_id", id)
}
return nil
@@ -359,8 +393,8 @@ func (s *AgentJobService) DeleteTask(id string) error {
// GetTask retrieves a task by ID
func (s *AgentJobService) GetTask(id string) (*schema.Task, error) {
task := s.tasks.Get(id)
if task.ID == "" {
task, ok := s.tasks.Get(id)
if !ok {
return nil, fmt.Errorf("%w: %s", ErrTaskNotFound, id)
}
return &task, nil
@@ -368,7 +402,7 @@ func (s *AgentJobService) GetTask(id string) (*schema.Task, error) {
// ListTasks returns all tasks, sorted by creation date (newest first)
func (s *AgentJobService) ListTasks() []schema.Task {
tasks := s.tasks.Values()
tasks := s.tasks.List()
// Sort by CreatedAt descending (newest first), then by Name for stability
slices.SortFunc(tasks, func(a, b schema.Task) int {
if a.CreatedAt.Equal(b.CreatedAt) {
@@ -397,8 +431,8 @@ func (s *AgentJobService) buildPrompt(templateStr string, params map[string]stri
// ExecuteJob creates and queues a job for execution
// multimedia can be nil for backward compatibility
func (s *AgentJobService) ExecuteJob(taskID string, params map[string]string, triggeredBy string, multimedia *schema.MultimediaAttachment) (string, error) {
task := s.tasks.Get(taskID)
if task.ID == "" {
task, ok := s.tasks.Get(taskID)
if !ok {
return "", fmt.Errorf("%w: %s", ErrTaskNotFound, taskID)
}
@@ -1451,6 +1485,12 @@ func (s *AgentJobService) Stop() error {
if s.cronScheduler != nil {
s.cronScheduler.Stop()
}
// Release the tasks SyncedMap subscription / background workers.
if s.tasks != nil {
if err := s.tasks.Close(); err != nil {
xlog.Warn("Error closing tasks sync map", "error", err)
}
}
xlog.Info("AgentJobService stopped")
return nil
}

@@ -14,24 +14,38 @@ import (
)
// fileJobPersister persists tasks and jobs to JSON files.
// It holds references to the service's syncmaps and serializes the entire
// map contents on each save (bulk write). Reads at runtime return nil
// (the in-memory map is the authoritative source); LoadTasks/LoadJobs
// are used only at startup to bootstrap the syncmaps.
//
// Jobs serialize the service's in-memory jobs syncmap on each save (bulk write).
// Tasks are kept in this persister's own taskSet map instead: the tasks SyncedMap
// calls SaveTask/DeleteTask while holding its internal lock (write-through), so
// reading back the SyncedMap here would re-enter that lock and deadlock. The
// self-contained taskSet, seeded by LoadTasks, lets a per-task write rewrite the
// whole bulk file without touching the SyncedMap.
//
// Runtime reads (GetJob/ListJobs) return nil (the in-memory state is the
// authoritative source); LoadTasks/LoadJobs bootstrap state at startup.
type fileJobPersister struct {
tasks *xsync.SyncedMap[string, schema.Task]
jobs *xsync.SyncedMap[string, schema.Job]
tasksFile string
jobsFile string
mu sync.Mutex
// taskSet is the persister's own view of all tasks, seeded by LoadTasks and
// updated by SaveTask/DeleteTask. The bulk JSON file is rewritten from it.
taskSet map[string]schema.Task
}
func (p *fileJobPersister) SaveTask(_ string, _ schema.Task) error {
return p.saveTasksToFile()
func (p *fileJobPersister) SaveTask(_ string, task schema.Task) error {
p.mu.Lock()
defer p.mu.Unlock()
p.taskSet[task.ID] = task
return p.writeTasksLocked()
}
func (p *fileJobPersister) DeleteTask(_ string) error {
return p.saveTasksToFile()
func (p *fileJobPersister) DeleteTask(taskID string) error {
p.mu.Lock()
defer p.mu.Unlock()
delete(p.taskSet, taskID)
return p.writeTasksLocked()
}
func (p *fileJobPersister) SaveJob(_ string, _ schema.Job) error {
@@ -43,7 +57,9 @@ func (p *fileJobPersister) DeleteJob(_ string) error {
}
func (p *fileJobPersister) FlushTasks() error {
return p.saveTasksToFile()
p.mu.Lock()
defer p.mu.Unlock()
return p.writeTasksLocked()
}
func (p *fileJobPersister) FlushJobs() error {
@@ -83,6 +99,12 @@ func (p *fileJobPersister) LoadTasks(_ string) ([]schema.Task, error) {
return nil, fmt.Errorf("failed to parse tasks file: %w", err)
}
// Seed the in-memory set so subsequent per-task SaveTask/DeleteTask merge into
// (rather than overwrite) the persisted tasks when the bulk file is rewritten.
for _, t := range tf.Tasks {
p.taskSet[t.ID] = t
}
xlog.Info("Loaded tasks from file", "count", len(tf.Tasks))
return tf.Tasks, nil
}
@@ -118,19 +140,20 @@ func (p *fileJobPersister) CleanupOldJobs(_ time.Duration) (int64, error) {
return 0, nil // cleanup handled via in-memory filtering
}
// saveTasksToFile serializes the entire tasks map to the JSON file.
func (p *fileJobPersister) saveTasksToFile() error {
// writeTasksLocked serializes the persister's task set to the JSON file. Callers
// must hold p.mu.
func (p *fileJobPersister) writeTasksLocked() error {
if p.tasksFile == "" {
return nil
}
p.mu.Lock()
defer p.mu.Unlock()
tf := schema.TasksFile{
Tasks: p.tasks.Values(),
tasks := make([]schema.Task, 0, len(p.taskSet))
for _, t := range p.taskSet {
tasks = append(tasks, t)
}
tf := schema.TasksFile{Tasks: tasks}
data, err := json.MarshalIndent(tf, "", " ")
if err != nil {
return fmt.Errorf("failed to marshal tasks: %w", err)
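
The locking scheme described in the `fileJobPersister` comment (a private task set, public methods that take the mutex, and a `...Locked` helper that assumes it is already held) can be sketched on its own. The types below are simplified and hypothetical; the file layout and field names are assumptions, not the real `schema.TasksFile` format.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
	"sync"
)

type task struct {
	ID   string `json:"id"`
	Name string `json:"name"`
}

// filePersister is a simplified, hypothetical analogue of fileJobPersister.
type filePersister struct {
	mu   sync.Mutex
	path string
	set  map[string]task // the persister's own view, independent of any map that calls it
}

func (p *filePersister) Save(t task) error {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.set[t.ID] = t
	return p.writeLocked()
}

func (p *filePersister) Delete(id string) error {
	p.mu.Lock()
	defer p.mu.Unlock()
	delete(p.set, id)
	return p.writeLocked()
}

// writeLocked rewrites the whole file from p.set. Callers must hold p.mu.
// It reads only p.set, so it never re-enters the caller's lock.
func (p *filePersister) writeLocked() error {
	tasks := make([]task, 0, len(p.set))
	for _, t := range p.set {
		tasks = append(tasks, t)
	}
	data, err := json.Marshal(tasks)
	if err != nil {
		return fmt.Errorf("marshal tasks: %w", err)
	}
	return os.WriteFile(p.path, data, 0o600)
}

// demo saves two tasks, deletes one, and returns how many remain on disk.
func demo() (int, error) {
	dir, err := os.MkdirTemp("", "tasks")
	if err != nil {
		return 0, err
	}
	defer os.RemoveAll(dir)
	p := &filePersister{path: filepath.Join(dir, "tasks.json"), set: map[string]task{}}
	if err := p.Save(task{ID: "t1", Name: "Keep"}); err != nil {
		return 0, err
	}
	if err := p.Save(task{ID: "t2", Name: "Delete"}); err != nil {
		return 0, err
	}
	if err := p.Delete("t2"); err != nil {
		return 0, err
	}
	data, err := os.ReadFile(p.path)
	if err != nil {
		return 0, err
	}
	var onDisk []task
	if err := json.Unmarshal(data, &onDisk); err != nil {
		return 0, err
	}
	return len(onDisk), nil
}

func main() {
	n, err := demo()
	fmt.Println(n, err)
}
```

Splitting the locking (public methods) from the work (`writeLocked`) is what lets a flush path reuse the writer without taking the mutex twice, since `sync.Mutex` is not re-entrant.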

@@ -20,28 +20,26 @@ var _ = Describe("JobPersister", func() {
Context("fileJobPersister", func() {
var (
p *fileJobPersister
tasks *xsync.SyncedMap[string, schema.Task]
jobsMap *xsync.SyncedMap[string, schema.Job]
tmpDir string
)
BeforeEach(func() {
tmpDir = GinkgoT().TempDir()
tasks = xsync.NewSyncedMap[string, schema.Task]()
jobsMap = xsync.NewSyncedMap[string, schema.Job]()
p = &fileJobPersister{
tasks: tasks,
jobs: jobsMap,
tasksFile: filepath.Join(tmpDir, "tasks.json"),
jobsFile: filepath.Join(tmpDir, "jobs.json"),
// taskSet is the persister's own task view (decoupled from the tasks
// SyncedMap to avoid re-entering its lock during write-through).
taskSet: make(map[string]schema.Task),
}
})
It("SaveTask writes all tasks to file", func() {
tasks.Set("t1", schema.Task{ID: "t1", Name: "Task One", Model: "m", Prompt: "p"})
tasks.Set("t2", schema.Task{ID: "t2", Name: "Task Two", Model: "m", Prompt: "p"})
Expect(p.SaveTask("", schema.Task{})).To(Succeed())
Expect(p.SaveTask("", schema.Task{ID: "t1", Name: "Task One", Model: "m", Prompt: "p"})).To(Succeed())
Expect(p.SaveTask("", schema.Task{ID: "t2", Name: "Task Two", Model: "m", Prompt: "p"})).To(Succeed())
// Verify file contents
data, err := os.ReadFile(p.tasksFile)
@@ -52,11 +50,9 @@ var _ = Describe("JobPersister", func() {
})
It("DeleteTask writes updated tasks to file", func() {
tasks.Set("t1", schema.Task{ID: "t1", Name: "Keep"})
tasks.Set("t2", schema.Task{ID: "t2", Name: "Delete"})
Expect(p.SaveTask("", schema.Task{ID: "t1", Name: "Keep"})).To(Succeed())
Expect(p.SaveTask("", schema.Task{ID: "t2", Name: "Delete"})).To(Succeed())
// Simulate deletion from memory (caller does this before calling persister)
tasks.Delete("t2")
Expect(p.DeleteTask("t2")).To(Succeed())
data, err := os.ReadFile(p.tasksFile)

@@ -0,0 +1,152 @@
package agentpool
// White-box tests (package agentpool) so a spec can build two AgentJobService
// instances sharing one in-memory bus and assert that agent *tasks* converge
// across replicas - the bug this migration fixes (ListTasks used to read
// in-memory only, so a task created on replica A was invisible on replica B).
// Jobs are deliberately untouched here: they already converge via the dispatcher
// + DB read-through.
import (
"context"
"time"
. "github.com/onsi/ginkgo/v2"
. "github.com/onsi/gomega"
"github.com/mudler/LocalAI/core/config"
"github.com/mudler/LocalAI/core/schema"
"github.com/mudler/LocalAI/core/services/messaging"
"github.com/mudler/LocalAI/core/services/syncstate"
"github.com/mudler/LocalAI/core/services/testutil"
"github.com/mudler/LocalAI/pkg/system"
)
// newTaskSyncService builds an AgentJobService wired to the given bus and a
// throwaway data dir (so the file persister has somewhere to write). Model/config
// loaders are nil because the task sync paths under test never touch them.
func newTaskSyncService(bus messaging.MessagingClient) *AgentJobService {
tmpDir := GinkgoT().TempDir()
sysState := &system.SystemState{}
sysState.Model.ModelsPath = tmpDir
appConfig := config.NewApplicationConfig(
config.WithDynamicConfigDir(tmpDir),
config.WithContext(context.Background()),
)
appConfig.SystemState = sysState
svc := NewAgentJobServiceWithPaths(appConfig, nil, nil, nil,
// Distinct per-replica files so the file persister write-through never
// crosses replicas: convergence here must be proven via the bus alone.
tmpDir+"/tasks.json", tmpDir+"/jobs.json")
svc.SetTaskSyncNATS(bus)
return svc
}
var _ = Describe("AgentJobService task cross-replica sync", func() {
Describe("two replicas sharing one bus", func() {
var (
bus *testutil.FakeBus
a, b *AgentJobService
)
BeforeEach(func() {
// One shared bus, two replicas: exactly the distributed topology where a
// round-robin request may land on a replica that did not originate the
// change.
bus = testutil.NewFakeBus()
a = newTaskSyncService(bus)
b = newTaskSyncService(bus)
// Start hydrates (empty here) and subscribes both replicas to deltas.
Expect(a.Start(context.Background())).To(Succeed())
Expect(b.Start(context.Background())).To(Succeed())
})
AfterEach(func() {
Expect(a.Stop()).To(Succeed())
Expect(b.Stop()).To(Succeed())
})
It("makes a task created on A visible via B's GetTask and ListTasks", func() {
id, err := a.CreateTask(schema.Task{Name: "Shared", Model: "m", Prompt: "p"})
Expect(err).NotTo(HaveOccurred())
got, err := b.GetTask(id)
Expect(err).NotTo(HaveOccurred(), "B must see a task A just created")
Expect(got.Name).To(Equal("Shared"))
listed := b.ListTasks()
Expect(listed).To(HaveLen(1))
Expect(listed[0].ID).To(Equal(id))
})
It("propagates a task update from A to B", func() {
id, err := a.CreateTask(schema.Task{Name: "Before", Model: "m", Prompt: "p"})
Expect(err).NotTo(HaveOccurred())
Expect(a.UpdateTask(id, schema.Task{Name: "After", Model: "m", Prompt: "p"})).To(Succeed())
got, err := b.GetTask(id)
Expect(err).NotTo(HaveOccurred())
Expect(got.Name).To(Equal("After"), "an update on A must be visible on B")
})
It("removes a task from B when it is deleted on A", func() {
id, err := a.CreateTask(schema.Task{Name: "Doomed", Model: "m", Prompt: "p"})
Expect(err).NotTo(HaveOccurred())
_, err = b.GetTask(id)
Expect(err).NotTo(HaveOccurred(), "precondition: B must have the task before the delete")
Expect(a.DeleteTask(id)).To(Succeed())
_, err = b.GetTask(id)
Expect(err).To(HaveOccurred(), "a delete on A must remove the task from B")
Expect(b.ListTasks()).To(BeEmpty())
})
It("does not re-broadcast a delta it received (echo-loop guard)", func() {
subject := messaging.SubjectSyncStateDelta("agent.tasks")
_, err := a.CreateTask(schema.Task{Name: "Once", Model: "m", Prompt: "p"})
Expect(err).NotTo(HaveOccurred())
// Exactly one publish: A's create. B applies it without re-publishing,
// otherwise this would be 2+ and a real bus would storm.
Expect(bus.PublishCount(subject)).To(Equal(1))
})
})
Describe("ListTasks ordering and scoping", func() {
var svc *AgentJobService
BeforeEach(func() {
svc = newTaskSyncService(testutil.NewFakeBus())
Expect(svc.Start(context.Background())).To(Succeed())
})
AfterEach(func() { Expect(svc.Stop()).To(Succeed()) })
It("sorts newest-first, breaking ties by name", func() {
// CreateTask stamps CreatedAt with time.Now(); space them out so ordering
// is deterministic rather than relying on the sub-millisecond gap.
oldID, err := svc.CreateTask(schema.Task{Name: "Old", Model: "m", Prompt: "p"})
Expect(err).NotTo(HaveOccurred())
time.Sleep(5 * time.Millisecond)
newID, err := svc.CreateTask(schema.Task{Name: "New", Model: "m", Prompt: "p"})
Expect(err).NotTo(HaveOccurred())
listed := svc.ListTasks()
Expect(listed).To(HaveLen(2))
Expect(listed[0].ID).To(Equal(newID), "newest first")
Expect(listed[1].ID).To(Equal(oldID))
})
})
Describe("compile-time adapter contract", func() {
It("satisfies syncstate.Store for tasks", func() {
// Mirrors the var assertion in task_syncstore.go; keeps the type
// referenced from a spec so drift surfaces here too.
var _ syncstate.Store[string, schema.Task] = (*taskStoreAdapter)(nil)
Expect(&taskStoreAdapter{}).ToNot(BeNil())
})
})
})
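
The convergence and echo-loop behaviour these specs assert can be illustrated with a toy model. Everything below is hypothetical: `bus` and `replica` are in-process stand-ins, and the real `syncstate.SyncedMap` and NATS client may suppress echoes by a different mechanism. The sketch only shows the invariant, which is that a local mutation publishes once and a received delta is applied without being re-published.

```go
package main

import "fmt"

// bus is a hypothetical in-process fan-out standing in for the message bus.
type bus struct {
	subs      []*replica
	published int
}

// publish counts the message and delivers it to every replica except the sender.
func (b *bus) publish(from *replica, id, name string) {
	b.published++
	for _, r := range b.subs {
		if r != from {
			r.apply(id, name)
		}
	}
}

// replica holds one node's local view of the tasks (id -> name).
type replica struct {
	bus   *bus
	tasks map[string]string
}

func newReplica(b *bus) *replica {
	r := &replica{bus: b, tasks: map[string]string{}}
	b.subs = append(b.subs, r)
	return r
}

// set is a local mutation: update the map, then broadcast the delta.
func (r *replica) set(id, name string) {
	r.tasks[id] = name
	r.bus.publish(r, id, name)
}

// apply handles a remote delta: update the map only, never re-publish.
func (r *replica) apply(id, name string) {
	r.tasks[id] = name
}

func main() {
	b := &bus{}
	a, peer := newReplica(b), newReplica(b)
	a.set("t1", "Shared")
	fmt.Println(peer.tasks["t1"], b.published)
}
```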

@@ -0,0 +1,47 @@
package agentpool
import (
"context"
"github.com/mudler/LocalAI/core/schema"
"github.com/mudler/LocalAI/core/services/syncstate"
)
// taskStoreAdapter bridges the existing JobPersister (file- or DB-backed) to the
// generic syncstate.Store the tasks SyncedMap consumes. Only tasks are migrated:
// jobs already converge across replicas via the dispatcher (NATS) plus the DB
// read-through in ListJobs/GetJob, whereas ListTasks read in-memory only and so
// went stale on replicas that did not originate the change.
//
// The adapter reads svc.persister and svc.userID live (rather than capturing
// them) because both are configured by setters - SetDistributedJobStore swaps the
// file persister for the DB one, SetUserID scopes per-user queries - AFTER the
// service, and thus this adapter, is constructed. Reading them at call time means
// the SyncedMap never has to be rebuilt when the persister is swapped.
//
// The SyncedMap value type is schema.Task: the exact shape ListTasks returns, so
// reads need no conversion and REST responses are provably unchanged.
type taskStoreAdapter struct {
svc *AgentJobService
}
// compile-time assertion that the adapter satisfies the component's Store.
var _ syncstate.Store[string, schema.Task] = (*taskStoreAdapter)(nil)
// List hydrates the map from durable storage on Start/reconnect: the file's task
// list (standalone) or every task row (DB / distributed).
func (a *taskStoreAdapter) List(_ context.Context) ([]schema.Task, error) {
return a.svc.persister.LoadTasks(a.svc.userID)
}
// Upsert is the write-through path for creates and updates: it persists a single
// task that changed locally; the SyncedMap then broadcasts the delta to peers.
func (a *taskStoreAdapter) Upsert(_ context.Context, task schema.Task) error {
return a.svc.persister.SaveTask(a.svc.userID, task)
}
// Delete is the write-through path for removals: it deletes the task from the
// persister; the SyncedMap then broadcasts the removal to peers.
func (a *taskStoreAdapter) Delete(_ context.Context, id string) error {
return a.svc.persister.DeleteTask(id)
}
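
The reason the adapter reads `svc.persister` at call time rather than capturing it can be shown with a small comparison. All names here are hypothetical and simplified; the sketch only contrasts an adapter that holds a pointer to its service with one that copies the persister at construction.

```go
package main

import "fmt"

type persister interface{ Name() string }

type filePersister struct{}

func (filePersister) Name() string { return "file" }

type dbPersister struct{}

func (dbPersister) Name() string { return "db" }

type service struct{ persister persister }

// liveAdapter keeps a pointer to the service and reads the field on each call.
type liveAdapter struct{ svc *service }

func (a liveAdapter) Backend() string { return a.svc.persister.Name() }

// capturedAdapter copies the persister once, at construction time.
type capturedAdapter struct{ p persister }

func (a capturedAdapter) Backend() string { return a.p.Name() }

func main() {
	svc := &service{persister: filePersister{}}
	live := liveAdapter{svc: svc}
	captured := capturedAdapter{p: svc.persister}

	// Swap the persister after construction, analogous to a setter such as
	// SetDistributedJobStore running after the adapter already exists.
	svc.persister = dbPersister{}

	fmt.Println(live.Backend(), captured.Backend())
}
```

The live adapter follows the swap while the captured one keeps using the original persister, which is why a capturing design would force the map to be rebuilt on every persister change.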

@@ -7,6 +7,7 @@ import (
"github.com/mudler/LocalAGI/webui/collections"
"github.com/mudler/LocalAI/core/config"
"github.com/mudler/LocalAI/core/services/jobs"
"github.com/mudler/LocalAI/core/services/messaging"
"github.com/mudler/LocalAI/core/templates"
"github.com/mudler/LocalAI/pkg/model"
"github.com/mudler/xlog"
@@ -28,6 +29,9 @@ type UserServicesManager struct {
// Shared distributed backends (set once, inherited by per-user job services)
jobDispatcher DistributedDispatcher
jobDBStore *jobs.JobStore
// jobNats keeps per-user agent tasks consistent across replicas (nil in
// standalone). Inherited by each per-user AgentJobService.
jobNats messaging.MessagingClient
}
// NewUserServicesManager creates a new UserServicesManager.
@@ -162,6 +166,10 @@ func (m *UserServicesManager) GetJobs(userID string) (*AgentJobService, error) {
if m.jobDispatcher != nil {
svc.SetDistributedBackends(m.jobDispatcher)
}
// Inherit the NATS client so per-user tasks broadcast across replicas. Must be
// set before the hydrate below (LoadFromDB / LoadTasksFromFile) so the tasks
// SyncedMap is rebuilt with the client while it is still empty.
svc.SetTaskSyncNATS(m.jobNats)
if m.jobDBStore != nil {
svc.SetDistributedJobStore(m.jobDBStore)
// Load tasks/jobs from DB immediately (per-user services skip Start())
@@ -189,6 +197,12 @@ func (m *UserServicesManager) SetJobDBStore(s *jobs.JobStore) {
m.jobDBStore = s
}
// SetJobSyncNATS sets the NATS client used to keep per-user agent tasks consistent
// across replicas.
func (m *UserServicesManager) SetJobSyncNATS(nats messaging.MessagingClient) {
m.jobNats = nats
}
// ListAllUserIDs returns all user IDs that have scoped data directories.
func (m *UserServicesManager) ListAllUserIDs() ([]string, error) {
return m.storage.ListUserDirs()

@@ -8,6 +8,7 @@ import (
"github.com/google/uuid"
"github.com/mudler/LocalAI/core/services/advisorylock"
"gorm.io/gorm"
"gorm.io/gorm/clause"
)
// FineTuneJobRecord tracks fine-tune jobs in PostgreSQL.
@@ -80,6 +81,34 @@ func (s *FineTuneStore) List(userID string) ([]FineTuneJobRecord, error) {
return jobs, q.Find(&jobs).Error
}
// ListAll returns every fine-tune job across all users. The SyncedMap that backs
// FineTuneService is a single global map (the REST API filters by user at read
// time), so hydrate needs the full set rather than the per-user List above.
func (s *FineTuneStore) ListAll() ([]FineTuneJobRecord, error) {
var jobs []FineTuneJobRecord
return jobs, s.db.Order("created_at DESC").Find(&jobs).Error
}
// Upsert idempotently inserts or fully replaces a job row by primary key. The
// SyncedMap write-through path issues a single Set per mutation regardless of
// whether the job already exists, so it needs one create-or-update primitive
// (Create alone fails on a duplicate key, UpdateStatus alone misses new rows and
// only touches a few columns).
func (s *FineTuneStore) Upsert(job *FineTuneJobRecord) error {
if job.ID == "" {
job.ID = uuid.New().String()
}
now := time.Now()
if job.CreatedAt.IsZero() {
job.CreatedAt = now
}
job.UpdatedAt = now
return s.db.Clauses(clause.OnConflict{
Columns: []clause.Column{{Name: "id"}},
UpdateAll: true,
}).Create(job).Error
}
// UpdateStatus updates the status and message of a fine-tune job.
func (s *FineTuneStore) UpdateStatus(id, status, message string) error {
return s.db.Model(&FineTuneJobRecord{}).Where("id = ?", id).Updates(map[string]any{

@@ -0,0 +1,13 @@
package distributed_test
import (
"testing"
. "github.com/onsi/ginkgo/v2"
. "github.com/onsi/gomega"
)
func TestDistributed(t *testing.T) {
RegisterFailHandler(Fail)
RunSpecs(t, "Distributed Suite")
}

@@ -0,0 +1,61 @@
package distributed_test
import (
. "github.com/onsi/ginkgo/v2"
. "github.com/onsi/gomega"
"github.com/mudler/LocalAI/core/services/distributed"
"github.com/mudler/LocalAI/core/services/testutil"
)
var _ = Describe("FineTuneStore", func() {
var store *distributed.FineTuneStore
BeforeEach(func() {
db := testutil.SetupTestDB()
var err error
store, err = distributed.NewFineTuneStore(db)
Expect(err).ToNot(HaveOccurred())
})
Describe("ListAll", func() {
It("returns jobs across all users (unlike per-user List)", func() {
Expect(store.Create(&distributed.FineTuneJobRecord{ID: "j1", UserID: "u1", Status: "queued"})).To(Succeed())
Expect(store.Create(&distributed.FineTuneJobRecord{ID: "j2", UserID: "u2", Status: "queued"})).To(Succeed())
all, err := store.ListAll()
Expect(err).ToNot(HaveOccurred())
Expect(all).To(HaveLen(2))
perUser, err := store.List("u1")
Expect(err).ToNot(HaveOccurred())
Expect(perUser).To(HaveLen(1), "List stays per-user")
})
})
Describe("Upsert", func() {
It("inserts a new row", func() {
Expect(store.Upsert(&distributed.FineTuneJobRecord{ID: "up-1", UserID: "u1", Status: "queued"})).To(Succeed())
got, err := store.Get("up-1")
Expect(err).ToNot(HaveOccurred())
Expect(got.Status).To(Equal("queued"))
})
It("idempotently updates an existing row on a repeated key", func() {
Expect(store.Upsert(&distributed.FineTuneJobRecord{ID: "up-2", UserID: "u1", Status: "queued"})).To(Succeed())
// Second Upsert with the same primary key must update, not error on a
// duplicate-key violation (this is the SyncedMap write-through contract).
Expect(store.Upsert(&distributed.FineTuneJobRecord{ID: "up-2", UserID: "u1", Status: "completed", Message: "done"})).To(Succeed())
got, err := store.Get("up-2")
Expect(err).ToNot(HaveOccurred())
Expect(got.Status).To(Equal("completed"))
Expect(got.Message).To(Equal("done"))
all, err := store.ListAll()
Expect(err).ToNot(HaveOccurred())
Expect(all).To(HaveLen(1), "upsert must not create a duplicate")
})
})
})

@@ -11,6 +11,7 @@ import (
type Stores struct {
Gallery *GalleryStore
FineTune *FineTuneStore
Quant *QuantStore
Skills *SkillStore
}
@@ -26,15 +27,21 @@ func InitStores(db *gorm.DB) (*Stores, error) {
return nil, fmt.Errorf("fine-tune store: %w", err)
}
quant, err := NewQuantStore(db)
if err != nil {
return nil, fmt.Errorf("quantization store: %w", err)
}
skills, err := NewSkillStore(db)
if err != nil {
return nil, fmt.Errorf("skills store: %w", err)
}
xlog.Info("Distributed stores initialized (Gallery, FineTune, Skills)")
xlog.Info("Distributed stores initialized (Gallery, FineTune, Quant, Skills)")
return &Stores{
Gallery: gallery,
FineTune: ft,
Quant: quant,
Skills: skills,
}, nil
}

@@ -0,0 +1,105 @@
package distributed
import (
"context"
"fmt"
"time"
"github.com/google/uuid"
"github.com/mudler/LocalAI/core/services/advisorylock"
"gorm.io/gorm"
"gorm.io/gorm/clause"
)
// QuantJobRecord tracks quantization jobs in PostgreSQL. The columns mirror the
// API shape (schema.QuantizationJob); the structured Config and ExtraOptions are
// serialized into JSON text columns so a record fully reconstructs the job.
type QuantJobRecord struct {
ID string `gorm:"primaryKey;size:36" json:"id"`
UserID string `gorm:"index;size:36" json:"user_id,omitempty"`
Model string `gorm:"size:255" json:"model"`
Backend string `gorm:"size:64" json:"backend"`
ModelID string `gorm:"size:255" json:"model_id,omitempty"`
QuantizationType string `gorm:"size:32" json:"quantization_type"`
Status string `gorm:"index;size:32;default:queued" json:"status"` // queued, downloading, converting, quantizing, completed, failed, stopped
Message string `gorm:"type:text" json:"message,omitempty"`
OutputDir string `gorm:"size:512" json:"output_dir,omitempty"`
OutputFile string `gorm:"size:512" json:"output_file,omitempty"`
ConfigJSON string `gorm:"column:config;type:text" json:"-"`
ExtraOptsJSON string `gorm:"column:extra_options;type:text" json:"-"`
ImportStatus string `gorm:"size:32" json:"import_status,omitempty"`
ImportMessage string `gorm:"type:text" json:"import_message,omitempty"`
ImportModelName string `gorm:"size:255" json:"import_model_name,omitempty"`
CreatedAt time.Time `json:"created_at"`
UpdatedAt time.Time `json:"updated_at"`
}
func (QuantJobRecord) TableName() string { return "quantization_jobs" }
// QuantStore manages quantization job state in PostgreSQL.
type QuantStore struct {
db *gorm.DB
}
// NewQuantStore creates a new QuantStore and auto-migrates.
// Uses a PostgreSQL advisory lock to prevent concurrent migration races
// when multiple instances (frontend + workers) start at the same time.
func NewQuantStore(db *gorm.DB) (*QuantStore, error) {
if err := advisorylock.WithLockCtx(context.Background(), db, advisorylock.KeySchemaMigrate, func() error {
return db.AutoMigrate(&QuantJobRecord{})
}); err != nil {
return nil, fmt.Errorf("migrating quantization_jobs: %w", err)
}
return &QuantStore{db: db}, nil
}
// Create stores a new quantization job.
func (s *QuantStore) Create(job *QuantJobRecord) error {
if job.ID == "" {
job.ID = uuid.New().String()
}
job.CreatedAt = time.Now()
job.UpdatedAt = job.CreatedAt
return s.db.Create(job).Error
}
// Get retrieves a quantization job by ID.
func (s *QuantStore) Get(id string) (*QuantJobRecord, error) {
var job QuantJobRecord
if err := s.db.First(&job, "id = ?", id).Error; err != nil {
return nil, err
}
return &job, nil
}
// ListAll returns every quantization job across all users. The SyncedMap that
// backs QuantizationService is a single global map (the REST API filters by user
// at read time), so hydrate needs the full set.
func (s *QuantStore) ListAll() ([]QuantJobRecord, error) {
var jobs []QuantJobRecord
return jobs, s.db.Order("created_at DESC").Find(&jobs).Error
}
// Upsert idempotently inserts or fully replaces a job row by primary key. The
// SyncedMap write-through path issues a single Set per mutation regardless of
// whether the job already exists, so it needs one create-or-update primitive
// (Create alone fails on a duplicate key).
func (s *QuantStore) Upsert(job *QuantJobRecord) error {
if job.ID == "" {
job.ID = uuid.New().String()
}
now := time.Now()
if job.CreatedAt.IsZero() {
job.CreatedAt = now
}
job.UpdatedAt = now
return s.db.Clauses(clause.OnConflict{
Columns: []clause.Column{{Name: "id"}},
UpdateAll: true,
}).Create(job).Error
}
// Delete removes a quantization job.
func (s *QuantStore) Delete(id string) error {
return s.db.Where("id = ?", id).Delete(&QuantJobRecord{}).Error
}
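
Note on the Upsert contract: the rules stated in the Upsert comment (insert when the key is new, replace the whole row when it exists, keep a caller-supplied CreatedAt, always refresh UpdatedAt) can be illustrated without a database. What follows is a minimal in-memory sketch, not the GORM implementation from this diff. The names `record`, `memStore`, `newMemStore`, and `runDemo`, and the injected clock, are hypothetical and exist only for illustration; UUID generation for an empty ID is left out.

```go
package main

import (
	"fmt"
	"time"
)

// record is a hypothetical, trimmed-down stand-in for QuantJobRecord.
type record struct {
	ID        string
	Status    string
	CreatedAt time.Time
	UpdatedAt time.Time
}

// memStore is a hypothetical in-memory stand-in for the quantization_jobs table.
type memStore struct {
	rows map[string]record
	now  func() time.Time // injected clock so the behaviour is deterministic
}

func newMemStore(now func() time.Time) *memStore {
	return &memStore{rows: map[string]record{}, now: now}
}

// Upsert follows the same stamping rules as the store's Upsert: CreatedAt is
// filled in only when the caller left it zero, UpdatedAt is always refreshed,
// and an existing row with the same ID is replaced as a whole.
func (s *memStore) Upsert(r record) {
	now := s.now()
	if r.CreatedAt.IsZero() {
		r.CreatedAt = now
	}
	r.UpdatedAt = now
	s.rows[r.ID] = r
}

// runDemo upserts the same key twice and reports the row count, the final
// status, and whether CreatedAt survived while UpdatedAt moved forward.
func runDemo() (rows int, status string, createdKept bool) {
	t0 := time.Date(2026, 6, 27, 10, 0, 0, 0, time.UTC)
	calls := 0
	clock := func() time.Time {
		calls++
		return t0.Add(time.Duration(calls) * time.Second)
	}
	s := newMemStore(clock)
	s.Upsert(record{ID: "up-2", Status: "queued", CreatedAt: t0})
	s.Upsert(record{ID: "up-2", Status: "completed", CreatedAt: t0})
	got := s.rows["up-2"]
	return len(s.rows), got.Status, got.CreatedAt.Equal(t0) && got.UpdatedAt.After(t0)
}

func main() {
	rows, status, kept := runDemo()
	fmt.Println(rows, status, kept)
}
```

The point of the sketch is that a caller can issue the same call for both "create" and "update" without first checking whether the row exists, which is what the write-through path relies on.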

@@ -0,0 +1,57 @@
package distributed_test
import (
. "github.com/onsi/ginkgo/v2"
. "github.com/onsi/gomega"
"github.com/mudler/LocalAI/core/services/distributed"
"github.com/mudler/LocalAI/core/services/testutil"
)
var _ = Describe("QuantStore", func() {
var store *distributed.QuantStore
BeforeEach(func() {
db := testutil.SetupTestDB()
var err error
store, err = distributed.NewQuantStore(db)
Expect(err).ToNot(HaveOccurred())
})
Describe("ListAll", func() {
It("returns jobs across all users", func() {
Expect(store.Create(&distributed.QuantJobRecord{ID: "j1", UserID: "u1", Status: "queued"})).To(Succeed())
Expect(store.Create(&distributed.QuantJobRecord{ID: "j2", UserID: "u2", Status: "queued"})).To(Succeed())
all, err := store.ListAll()
Expect(err).ToNot(HaveOccurred())
Expect(all).To(HaveLen(2))
})
})
Describe("Upsert", func() {
It("inserts a new row", func() {
Expect(store.Upsert(&distributed.QuantJobRecord{ID: "up-1", UserID: "u1", Status: "queued"})).To(Succeed())
got, err := store.Get("up-1")
Expect(err).ToNot(HaveOccurred())
Expect(got.Status).To(Equal("queued"))
})
It("idempotently updates an existing row on a repeated key", func() {
Expect(store.Upsert(&distributed.QuantJobRecord{ID: "up-2", UserID: "u1", Status: "queued"})).To(Succeed())
// Second Upsert with the same primary key must update, not error on a
// duplicate-key violation (this is the SyncedMap write-through contract).
Expect(store.Upsert(&distributed.QuantJobRecord{ID: "up-2", UserID: "u1", Status: "completed", Message: "done"})).To(Succeed())
got, err := store.Get("up-2")
Expect(err).ToNot(HaveOccurred())
Expect(got.Status).To(Equal("completed"))
Expect(got.Message).To(Equal("done"))
all, err := store.ListAll()
Expect(err).ToNot(HaveOccurred())
Expect(all).To(HaveLen(1), "upsert must not create a duplicate")
})
})
})

@@ -0,0 +1,13 @@
package finetune
import (
"testing"
. "github.com/onsi/ginkgo/v2"
. "github.com/onsi/gomega"
)
func TestFinetune(t *testing.T) {
RegisterFailHandler(Fail)
RunSpecs(t, "Finetune Suite")
}

@@ -19,6 +19,7 @@ import (
"github.com/mudler/LocalAI/core/schema"
"github.com/mudler/LocalAI/core/services/distributed"
"github.com/mudler/LocalAI/core/services/messaging"
"github.com/mudler/LocalAI/core/services/syncstate"
pb "github.com/mudler/LocalAI/pkg/grpc/proto"
"github.com/mudler/LocalAI/pkg/model"
"github.com/mudler/LocalAI/pkg/utils"
@@ -32,44 +33,63 @@ type FineTuneService struct {
modelLoader *model.ModelLoader
configLoader *config.ModelConfigLoader
mu sync.Mutex
jobs map[string]*schema.FineTuneJob
// mu serializes the read-modify-write of job values. The SyncedMap guards its
// own map structure, but a job is a pointer mutated in place (e.g. the export
// goroutine), so the service still needs a lock to keep those field updates
// and the subsequent Set atomic with respect to readers.
mu sync.Mutex
// Distributed mode (nil when not in distributed mode)
natsClient messaging.Publisher
fineTuneStore *distributed.FineTuneStore
// jobs is the cross-replica job store: an in-memory map kept consistent across
// replicas via NATS, optionally hydrated from and written through to PostgreSQL
// in distributed mode.
jobs *syncstate.SyncedMap[string, *schema.FineTuneJob]
}
// SetNATSClient sets the NATS client for distributed progress publishing.
func (s *FineTuneService) SetNATSClient(nc messaging.Publisher) {
s.mu.Lock()
defer s.mu.Unlock()
s.natsClient = nc
}
// SetFineTuneStore sets the PostgreSQL fine-tune store for distributed persistence.
func (s *FineTuneService) SetFineTuneStore(store *distributed.FineTuneStore) {
s.mu.Lock()
defer s.mu.Unlock()
s.fineTuneStore = store
}
// NewFineTuneService creates a new FineTuneService.
// NewFineTuneService creates a new FineTuneService. In distributed mode pass the
// shared NATS client and PostgreSQL store so jobs stay consistent across
// replicas; pass nil for both in standalone mode, where the disk Loader hydrates
// the map and there is nothing to broadcast.
func NewFineTuneService(
appConfig *config.ApplicationConfig,
modelLoader *model.ModelLoader,
configLoader *config.ModelConfigLoader,
nats messaging.MessagingClient,
store *distributed.FineTuneStore,
) *FineTuneService {
s := &FineTuneService{
appConfig: appConfig,
modelLoader: modelLoader,
configLoader: configLoader,
jobs: make(map[string]*schema.FineTuneJob),
}
s.loadAllJobs()
// Only attach a Store interface when a concrete store exists, otherwise the
// SyncedMap would see a non-nil interface wrapping a nil pointer and try to
// hydrate/write through a nil DB.
var syncStore syncstate.Store[string, *schema.FineTuneJob]
if store != nil {
syncStore = &fineTuneStoreAdapter{store: store}
}
s.jobs = syncstate.New(syncstate.Config[string, *schema.FineTuneJob]{
Name: "finetune.jobs",
Key: func(j *schema.FineTuneJob) string { return j.ID },
Nats: nats,
Store: syncStore,
Loader: s.loadJobsFromDisk, // ignored when Store is set (distributed mode)
})
// Hydrate + subscribe. A hydrate failure must not take the server down: log
// and continue degraded (standalone), mirroring the OpCache wiring.
if err := s.jobs.Start(appConfig.Context); err != nil {
xlog.Warn("FineTune SyncedMap start failed; running degraded", "error", err)
}
return s
}
// Close releases the SyncedMap subscription and background workers.
func (s *FineTuneService) Close() error {
return s.jobs.Close()
}
// fineTuneBaseDir returns the base directory for fine-tune job data.
func (s *FineTuneService) fineTuneBaseDir() string {
return filepath.Join(s.appConfig.DataPath, "fine-tune")
@@ -100,15 +120,18 @@ func (s *FineTuneService) saveJobState(job *schema.FineTuneJob) {
}
}
// loadAllJobs scans the fine-tune directory for persisted jobs and loads them.
func (s *FineTuneService) loadAllJobs() {
// loadJobsFromDisk scans the fine-tune directory for persisted jobs and returns
// them. It is the SyncedMap Loader used in standalone mode (no DB); the returned
// slice hydrates the map on Start.
func (s *FineTuneService) loadJobsFromDisk(_ context.Context) ([]*schema.FineTuneJob, error) {
baseDir := s.fineTuneBaseDir()
entries, err := os.ReadDir(baseDir)
if err != nil {
// Directory doesn't exist yet — that's fine
return
// Directory doesn't exist yet — that's fine, start empty.
return nil, nil
}
var jobs []*schema.FineTuneJob
for _, entry := range entries {
if !entry.IsDir() {
continue
@@ -137,12 +160,13 @@ func (s *FineTuneService) loadAllJobs() {
job.ExportMessage = "Server restarted while export was running"
}
s.jobs[job.ID] = &job
jobs = append(jobs, &job)
}
if len(s.jobs) > 0 {
xlog.Info("Loaded persisted fine-tune jobs", "count", len(s.jobs))
if len(jobs) > 0 {
xlog.Info("Loaded persisted fine-tune jobs", "count", len(jobs))
}
return jobs, nil
}
// StartJob starts a new fine-tuning job.
@@ -236,27 +260,13 @@ func (s *FineTuneService) StartJob(ctx context.Context, userID string, req schem
CreatedAt: time.Now().UTC().Format(time.RFC3339),
Config: &req,
}
s.jobs[jobID] = job
s.saveJobState(job)
// Persist to PostgreSQL in distributed mode
if s.fineTuneStore != nil {
configJSON, _ := json.Marshal(req)
extraJSON, _ := json.Marshal(req.ExtraOptions)
s.fineTuneStore.Create(&distributed.FineTuneJobRecord{
ID: jobID,
UserID: userID,
Model: req.Model,
Backend: backendName,
ModelID: modelID,
TrainingType: req.TrainingType,
TrainingMethod: req.TrainingMethod,
Status: "queued",
OutputDir: outputDir,
ConfigJSON: string(configJSON),
ExtraOptsJSON: string(extraJSON),
})
// Set write-through persists to PostgreSQL (distributed) and broadcasts to
// peer replicas; the disk state.json is written separately for restart
// recovery / standalone hydrate.
if err := s.jobs.Set(ctx, job); err != nil {
return nil, fmt.Errorf("failed to persist job: %w", err)
}
s.saveJobState(job)
return &schema.FineTuneJobResponse{
ID: jobID,
@@ -270,7 +280,7 @@ func (s *FineTuneService) GetJob(userID, jobID string) (*schema.FineTuneJob, err
s.mu.Lock()
defer s.mu.Unlock()
job, ok := s.jobs[jobID]
job, ok := s.jobs.Get(jobID)
if !ok {
return nil, fmt.Errorf("job not found: %s", jobID)
}
@@ -286,7 +296,7 @@ func (s *FineTuneService) ListJobs(userID string) []*schema.FineTuneJob {
defer s.mu.Unlock()
var result []*schema.FineTuneJob
for _, job := range s.jobs {
for _, job := range s.jobs.List() {
if userID == "" || job.UserID == userID {
result = append(result, job)
}
@@ -302,7 +312,7 @@ func (s *FineTuneService) ListJobs(userID string) []*schema.FineTuneJob {
// StopJob stops a running fine-tuning job.
func (s *FineTuneService) StopJob(ctx context.Context, userID, jobID string, saveCheckpoint bool) error {
s.mu.Lock()
job, ok := s.jobs[jobID]
job, ok := s.jobs.Get(jobID)
if !ok {
s.mu.Unlock()
return fmt.Errorf("job not found: %s", jobID)
@@ -323,10 +333,10 @@ func (s *FineTuneService) StopJob(ctx context.Context, userID, jobID string, sav
s.mu.Lock()
job.Status = "stopped"
job.Message = "Training stopped by user"
s.saveJobState(job)
if s.fineTuneStore != nil {
s.fineTuneStore.UpdateStatus(jobID, "stopped", "Training stopped by user")
if err := s.jobs.Set(ctx, job); err != nil {
xlog.Warn("Failed to persist stopped job", "job_id", jobID, "error", err)
}
s.saveJobState(job)
s.mu.Unlock()
return nil
@@ -335,7 +345,7 @@ func (s *FineTuneService) StopJob(ctx context.Context, userID, jobID string, sav
// DeleteJob removes a fine-tuning job and its associated data from disk.
func (s *FineTuneService) DeleteJob(userID, jobID string) error {
s.mu.Lock()
job, ok := s.jobs[jobID]
job, ok := s.jobs.Get(jobID)
if !ok {
s.mu.Unlock()
return fmt.Errorf("job not found: %s", jobID)
@@ -360,9 +370,10 @@ func (s *FineTuneService) DeleteJob(userID, jobID string) error {
}
exportModelName := job.ExportModelName
delete(s.jobs, jobID)
if s.fineTuneStore != nil {
s.fineTuneStore.Delete(jobID)
// Delete write-through removes the DB row (distributed) and broadcasts the
// removal to peer replicas. DeleteJob has no ctx, so use Background.
if err := s.jobs.Delete(context.Background(), jobID); err != nil {
xlog.Warn("Failed to delete job from store", "job_id", jobID, "error", err)
}
s.mu.Unlock()
@@ -398,7 +409,7 @@ func (s *FineTuneService) DeleteJob(userID, jobID string) error {
// StreamProgress opens a gRPC progress stream and calls the callback for each update.
func (s *FineTuneService) StreamProgress(ctx context.Context, userID, jobID string, callback func(event *schema.FineTuneProgressEvent)) error {
s.mu.Lock()
job, ok := s.jobs[jobID]
job, ok := s.jobs.Get(jobID)
if !ok {
s.mu.Unlock()
return fmt.Errorf("job not found: %s", jobID)
@@ -427,7 +438,7 @@ func (s *FineTuneService) StreamProgress(ctx context.Context, userID, jobID stri
}, func(update *pb.FineTuneProgressUpdate) {
// Update job status and persist
s.mu.Lock()
if j, ok := s.jobs[jobID]; ok {
if j, ok := s.jobs.Get(jobID); ok {
// Don't let progress updates overwrite terminal states
isTerminal := j.Status == "stopped" || j.Status == "completed" || j.Status == "failed"
if !isTerminal {
@@ -436,10 +447,10 @@ func (s *FineTuneService) StreamProgress(ctx context.Context, userID, jobID stri
if update.Message != "" {
j.Message = update.Message
}
s.saveJobState(j)
if s.fineTuneStore != nil {
s.fineTuneStore.UpdateStatus(jobID, j.Status, j.Message)
if err := s.jobs.Set(ctx, j); err != nil {
xlog.Warn("Failed to persist progress update", "job_id", jobID, "error", err)
}
s.saveJobState(j)
}
s.mu.Unlock()
@@ -474,7 +485,7 @@ func (s *FineTuneService) StreamProgress(ctx context.Context, userID, jobID stri
// ListCheckpoints lists checkpoints for a job.
func (s *FineTuneService) ListCheckpoints(ctx context.Context, userID, jobID string) ([]*pb.CheckpointInfo, error) {
s.mu.Lock()
job, ok := s.jobs[jobID]
job, ok := s.jobs.Get(jobID)
if !ok {
s.mu.Unlock()
return nil, fmt.Errorf("job not found: %s", jobID)
@@ -520,7 +531,7 @@ func sanitizeModelName(s string) string {
// ExportModel starts an async model export from a checkpoint and returns the intended model name immediately.
func (s *FineTuneService) ExportModel(ctx context.Context, userID, jobID string, req schema.ExportRequest) (string, error) {
s.mu.Lock()
job, ok := s.jobs[jobID]
job, ok := s.jobs.Get(jobID)
if !ok {
s.mu.Unlock()
return "", fmt.Errorf("job not found: %s", jobID)
@@ -572,6 +583,9 @@ func (s *FineTuneService) ExportModel(ctx context.Context, userID, jobID string,
job.ExportStatus = "exporting"
job.ExportMessage = ""
job.ExportModelName = ""
if err := s.jobs.Set(ctx, job); err != nil {
xlog.Warn("Failed to persist export start", "job_id", jobID, "error", err)
}
s.saveJobState(job)
s.mu.Unlock()
@@ -662,24 +676,30 @@ func (s *FineTuneService) ExportModel(ctx context.Context, userID, jobID string,
xlog.Info("Model exported and registered", "job_id", jobID, "model_name", modelName, "format", req.ExportFormat)
// Runs after the HTTP request returns, so use Background rather than the
// (now likely cancelled) request ctx for the write-through.
s.mu.Lock()
job.ExportStatus = "completed"
job.ExportModelName = modelName
job.ExportMessage = ""
s.saveJobState(job)
if s.fineTuneStore != nil {
s.fineTuneStore.UpdateExportStatus(jobID, "completed", "", modelName)
if err := s.jobs.Set(context.Background(), job); err != nil {
xlog.Warn("Failed to persist export completion", "job_id", jobID, "error", err)
}
s.saveJobState(job)
s.mu.Unlock()
}()
return modelName, nil
}
// setExportMessage updates the export message and persists the job state.
// setExportMessage updates the export message and persists the job state. Called
// from the background export goroutine, so it uses Background for write-through.
func (s *FineTuneService) setExportMessage(job *schema.FineTuneJob, msg string) {
s.mu.Lock()
job.ExportMessage = msg
if err := s.jobs.Set(context.Background(), job); err != nil {
xlog.Warn("Failed to persist export message", "job_id", job.ID, "error", err)
}
s.saveJobState(job)
s.mu.Unlock()
}
@@ -687,7 +707,7 @@ func (s *FineTuneService) setExportMessage(job *schema.FineTuneJob, msg string)
// GetExportedModelPath returns the path to the exported model directory and its name.
func (s *FineTuneService) GetExportedModelPath(userID, jobID string) (string, string, error) {
s.mu.Lock()
job, ok := s.jobs[jobID]
job, ok := s.jobs.Get(jobID)
if !ok {
s.mu.Unlock()
return "", "", fmt.Errorf("job not found: %s", jobID)
@@ -723,10 +743,10 @@ func (s *FineTuneService) setExportFailed(job *schema.FineTuneJob, message strin
s.mu.Lock()
job.ExportStatus = "failed"
job.ExportMessage = message
s.saveJobState(job)
if s.fineTuneStore != nil {
s.fineTuneStore.UpdateExportStatus(job.ID, "failed", message, "")
if err := s.jobs.Set(context.Background(), job); err != nil {
xlog.Warn("Failed to persist export failure", "job_id", job.ID, "error", err)
}
s.saveJobState(job)
s.mu.Unlock()
}
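
Note on the nil-store guard in NewFineTuneService: the constructor only assigns the Store interface when a concrete store exists. The general Go behaviour behind that guard is that an interface holding a nil pointer is itself not nil. The sketch below shows this in isolation; `store`, `pgStore`, `wrapDirect`, and `wrapGuarded` are hypothetical names and are not part of syncstate or this diff.

```go
package main

import "fmt"

// store is a hypothetical stand-in for an interface such as syncstate.Store.
type store interface {
	Name() string
}

// pgStore is a hypothetical concrete implementation used through a pointer.
type pgStore struct{}

func (p *pgStore) Name() string { return "postgres" }

// wrapDirect assigns the pointer to the interface unconditionally. With a nil
// pointer the interface still carries the *pgStore type, so it is not == nil.
func wrapDirect(p *pgStore) store {
	return p
}

// wrapGuarded assigns only when the pointer is non-nil, so a missing store
// stays a true nil interface that a consumer can test with == nil.
func wrapGuarded(p *pgStore) store {
	var s store
	if p != nil {
		s = p
	}
	return s
}

func main() {
	var p *pgStore
	fmt.Println("direct  == nil:", wrapDirect(p) == nil)  // false
	fmt.Println("guarded == nil:", wrapGuarded(p) == nil) // true
}
```

A consumer that checks `cfg.Store != nil` before hydrating would take the database path in the first case and skip it in the second, which is why the guard sits in the constructor rather than in the consumer.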

@@ -0,0 +1,185 @@
package finetune
// White-box tests (package finetune) so a spec can drive the service's internal
// SyncedMap the same way StartJob does (via jobs.Set) without standing up a
// training backend, then assert the cross-replica reads (GetJob/ListJobs) and
// the adapter conversions that keep REST responses byte-for-byte unchanged.
import (
"context"
. "github.com/onsi/ginkgo/v2"
. "github.com/onsi/gomega"
"github.com/mudler/LocalAI/core/config"
"github.com/mudler/LocalAI/core/schema"
"github.com/mudler/LocalAI/core/services/distributed"
"github.com/mudler/LocalAI/core/services/testutil"
)
// newTestService builds a standalone FineTuneService wired to the given bus. The
// model/config loaders are nil because the read/sync paths under test never touch
// them; the data dir is a throwaway temp dir so the disk Loader finds nothing.
func newTestService(bus *testutil.FakeBus) *FineTuneService {
appConfig := &config.ApplicationConfig{
Context: context.Background(),
DataPath: GinkgoT().TempDir(),
}
return NewFineTuneService(appConfig, nil, nil, bus, nil)
}
var _ = Describe("FineTuneService", func() {
ctx := context.Background()
Describe("cross-replica job visibility", func() {
var (
bus *testutil.FakeBus
a, b *FineTuneService
)
BeforeEach(func() {
// One shared bus, two replicas: exactly the distributed topology where
// a round-robin request may land on a replica that did not originate
// the change.
bus = testutil.NewFakeBus()
a = newTestService(bus)
b = newTestService(bus)
})
AfterEach(func() {
Expect(a.Close()).To(Succeed())
Expect(b.Close()).To(Succeed())
})
It("makes a job created on A visible via B's GetJob and ListJobs", func() {
job := &schema.FineTuneJob{ID: "job-1", UserID: "user-1", Status: "queued", CreatedAt: "2026-06-27T10:00:00Z"}
// StartJob persists via jobs.Set; drive that directly to avoid a backend.
Expect(a.jobs.Set(ctx, job)).To(Succeed())
got, err := b.GetJob("user-1", "job-1")
Expect(err).ToNot(HaveOccurred(), "B must see a job A just created")
Expect(got.Status).To(Equal("queued"))
listed := b.ListJobs("user-1")
Expect(listed).To(HaveLen(1))
Expect(listed[0].ID).To(Equal("job-1"))
})
It("removes a job from B when it is deleted on A", func() {
job := &schema.FineTuneJob{ID: "job-2", UserID: "user-1", Status: "completed", CreatedAt: "2026-06-27T10:00:00Z"}
Expect(a.jobs.Set(ctx, job)).To(Succeed())
_, err := b.GetJob("user-1", "job-2")
Expect(err).ToNot(HaveOccurred(), "precondition: B must have the job before the delete")
Expect(a.jobs.Delete(ctx, "job-2")).To(Succeed())
_, err = b.GetJob("user-1", "job-2")
Expect(err).To(HaveOccurred(), "a delete on A must remove the job from B")
})
It("propagates a status update from A to B", func() {
job := &schema.FineTuneJob{ID: "job-3", UserID: "user-1", Status: "training", CreatedAt: "2026-06-27T10:00:00Z"}
Expect(a.jobs.Set(ctx, job)).To(Succeed())
updated := &schema.FineTuneJob{ID: "job-3", UserID: "user-1", Status: "completed", CreatedAt: "2026-06-27T10:00:00Z"}
Expect(a.jobs.Set(ctx, updated)).To(Succeed())
got, err := b.GetJob("user-1", "job-3")
Expect(err).ToNot(HaveOccurred())
Expect(got.Status).To(Equal("completed"))
})
})
Describe("ListJobs", func() {
var svc *FineTuneService
BeforeEach(func() {
svc = newTestService(testutil.NewFakeBus())
})
AfterEach(func() { Expect(svc.Close()).To(Succeed()) })
It("filters by user and sorts newest-first", func() {
Expect(svc.jobs.Set(ctx, &schema.FineTuneJob{ID: "old", UserID: "u1", CreatedAt: "2026-06-25T10:00:00Z"})).To(Succeed())
Expect(svc.jobs.Set(ctx, &schema.FineTuneJob{ID: "new", UserID: "u1", CreatedAt: "2026-06-27T10:00:00Z"})).To(Succeed())
Expect(svc.jobs.Set(ctx, &schema.FineTuneJob{ID: "other", UserID: "u2", CreatedAt: "2026-06-26T10:00:00Z"})).To(Succeed())
jobs := svc.ListJobs("u1")
Expect(jobs).To(HaveLen(2), "only u1's jobs")
Expect(jobs[0].ID).To(Equal("new"), "newest first")
Expect(jobs[1].ID).To(Equal("old"))
})
It("returns every user's jobs when the userID filter is empty", func() {
Expect(svc.jobs.Set(ctx, &schema.FineTuneJob{ID: "a", UserID: "u1", CreatedAt: "2026-06-25T10:00:00Z"})).To(Succeed())
Expect(svc.jobs.Set(ctx, &schema.FineTuneJob{ID: "b", UserID: "u2", CreatedAt: "2026-06-26T10:00:00Z"})).To(Succeed())
Expect(svc.ListJobs("")).To(HaveLen(2))
})
It("rejects GetJob for a job owned by another user", func() {
Expect(svc.jobs.Set(ctx, &schema.FineTuneJob{ID: "x", UserID: "owner", CreatedAt: "2026-06-25T10:00:00Z"})).To(Succeed())
_, err := svc.GetJob("intruder", "x")
Expect(err).To(HaveOccurred(), "a different user must not read someone else's job")
})
})
Describe("store adapter conversion", func() {
// The SyncedMap value type is *schema.FineTuneJob (the exact REST shape).
// These specs prove the DB adapter round-trips it losslessly, so hydrate
// and write-through in distributed mode keep responses unchanged.
It("round-trips a job through jobToRecord/recordToJob preserving the API shape", func() {
original := &schema.FineTuneJob{
ID: "rt-1",
UserID: "user-1",
Model: "base-model",
Backend: "trl",
ModelID: "trl-finetune-rt-1",
TrainingType: "lora",
TrainingMethod: "sft",
Status: "completed",
Message: "done",
OutputDir: "/data/fine-tune/rt-1",
ExtraOptions: map[string]string{"hf_token": "secret"},
CreatedAt: "2026-06-27T10:00:00Z",
ExportStatus: "completed",
ExportMessage: "",
ExportModelName: "base-model-ft-rt-1",
Config: &schema.FineTuneJobRequest{Model: "base-model", Backend: "trl", DatasetSource: "data.jsonl"},
}
rec := jobToRecord(original)
Expect(rec.ID).To(Equal("rt-1"))
Expect(rec.ConfigJSON).ToNot(BeEmpty(), "structured config must serialize into the JSON column")
Expect(rec.ExtraOptsJSON).ToNot(BeEmpty())
back := recordToJob(rec)
Expect(back.ID).To(Equal(original.ID))
Expect(back.UserID).To(Equal(original.UserID))
Expect(back.Model).To(Equal(original.Model))
Expect(back.Backend).To(Equal(original.Backend))
Expect(back.ModelID).To(Equal(original.ModelID))
Expect(back.TrainingType).To(Equal(original.TrainingType))
Expect(back.TrainingMethod).To(Equal(original.TrainingMethod))
Expect(back.Status).To(Equal(original.Status))
Expect(back.Message).To(Equal(original.Message))
Expect(back.OutputDir).To(Equal(original.OutputDir))
Expect(back.ExportStatus).To(Equal(original.ExportStatus))
Expect(back.ExportModelName).To(Equal(original.ExportModelName))
Expect(back.CreatedAt).To(Equal(original.CreatedAt))
Expect(back.ExtraOptions).To(Equal(original.ExtraOptions))
Expect(back.Config).ToNot(BeNil())
Expect(back.Config.DatasetSource).To(Equal("data.jsonl"))
})
})
Describe("compile-time adapter contract", func() {
It("satisfies syncstate.Store for *distributed.FineTuneStore", func() {
// Guards against drift between the adapter and the component interface.
// The var assertion in syncstore.go covers it at build time; this keeps
// the type referenced from a spec too.
var _ *distributed.FineTuneStore
Expect(&fineTuneStoreAdapter{}).ToNot(BeNil())
})
})
})
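
Note on the cross-replica specs: the topology they exercise (two services sharing one bus, a write on A becoming readable on B) can be sketched in a few lines. This is an illustrative model only, assuming synchronous in-process delivery; it is not the syncstate.SyncedMap or testutil.FakeBus implementation, and `job`, `event`, `bus`, and `replica` are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
)

// job is a hypothetical, trimmed-down stand-in for schema.FineTuneJob.
type job struct {
	ID     string
	Status string
}

// event is what a replica broadcasts: a set (Job != nil) or a delete (Job == nil).
type event struct {
	Key string
	Job *job
}

// bus is a hypothetical in-process fan-out playing the role of the shared bus.
type bus struct {
	mu   sync.Mutex
	subs []func(event)
}

func (b *bus) subscribe(fn func(event)) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.subs = append(b.subs, fn)
}

// publish delivers the event to every subscriber, including the sender.
func (b *bus) publish(e event) {
	b.mu.Lock()
	subs := append([]func(event){}, b.subs...)
	b.mu.Unlock()
	for _, fn := range subs {
		fn(e)
	}
}

// replica keeps a local map and applies every event seen on the bus.
type replica struct {
	mu   sync.RWMutex
	jobs map[string]job
	bus  *bus
}

func newReplica(b *bus) *replica {
	r := &replica{jobs: map[string]job{}, bus: b}
	b.subscribe(r.apply)
	return r
}

func (r *replica) apply(e event) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if e.Job == nil {
		delete(r.jobs, e.Key)
		return
	}
	r.jobs[e.Key] = *e.Job // store a copy so replicas never share a pointer
}

// Set publishes the new value; the sender updates its own map via apply.
func (r *replica) Set(j job) {
	r.bus.publish(event{Key: j.ID, Job: &j})
}

// Delete publishes a removal for the given key.
func (r *replica) Delete(id string) {
	r.bus.publish(event{Key: id})
}

func (r *replica) Get(id string) (job, bool) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	j, ok := r.jobs[id]
	return j, ok
}

func main() {
	b := &bus{}
	a, c := newReplica(b), newReplica(b)
	a.Set(job{ID: "job-1", Status: "queued"})
	got, ok := c.Get("job-1")
	fmt.Println(got.Status, ok)
	a.Delete("job-1")
	_, ok = c.Get("job-1")
	fmt.Println(ok)
}
```

With a real message broker, delivery is typically asynchronous, so a read on the peer immediately after a write is not guaranteed to observe it the way this synchronous sketch does.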

@@ -0,0 +1,114 @@
package finetune
import (
"context"
"encoding/json"
"time"
"github.com/mudler/LocalAI/core/schema"
"github.com/mudler/LocalAI/core/services/distributed"
"github.com/mudler/LocalAI/core/services/syncstate"
)
// fineTuneStoreAdapter bridges the distributed PostgreSQL FineTuneStore to the
// generic syncstate.Store the SyncedMap consumes. It is only wired in distributed
// mode; standalone leaves Store nil and hydrates from disk via a Loader instead.
//
// The SyncedMap value type is *schema.FineTuneJob (the exact shape the REST API
// returns) so reads need no conversion and the response JSON is provably
// unchanged. The adapter is the single place that translates between that API
// shape and the DB FineTuneJobRecord.
type fineTuneStoreAdapter struct {
store *distributed.FineTuneStore
}
// compile-time assertion that the adapter satisfies the component's Store.
var _ syncstate.Store[string, *schema.FineTuneJob] = (*fineTuneStoreAdapter)(nil)
func (a *fineTuneStoreAdapter) List(_ context.Context) ([]*schema.FineTuneJob, error) {
records, err := a.store.ListAll()
if err != nil {
return nil, err
}
jobs := make([]*schema.FineTuneJob, 0, len(records))
for i := range records {
jobs = append(jobs, recordToJob(&records[i]))
}
return jobs, nil
}
func (a *fineTuneStoreAdapter) Upsert(_ context.Context, job *schema.FineTuneJob) error {
return a.store.Upsert(jobToRecord(job))
}
func (a *fineTuneStoreAdapter) Delete(_ context.Context, id string) error {
return a.store.Delete(id)
}
// recordToJob maps a persisted DB record back to the API shape, reconstructing
// the structured Config / ExtraOptions from their JSON columns.
func recordToJob(r *distributed.FineTuneJobRecord) *schema.FineTuneJob {
job := &schema.FineTuneJob{
ID: r.ID,
UserID: r.UserID,
Model: r.Model,
Backend: r.Backend,
ModelID: r.ModelID,
TrainingType: r.TrainingType,
TrainingMethod: r.TrainingMethod,
Status: r.Status,
Message: r.Message,
OutputDir: r.OutputDir,
ExportStatus: r.ExportStatus,
ExportMessage: r.ExportMessage,
ExportModelName: r.ExportModelName,
CreatedAt: r.CreatedAt.UTC().Format(time.RFC3339),
}
if r.ExtraOptsJSON != "" {
// Best-effort: a malformed column must not drop the whole job from the API.
_ = json.Unmarshal([]byte(r.ExtraOptsJSON), &job.ExtraOptions)
}
if r.ConfigJSON != "" {
var cfg schema.FineTuneJobRequest
if err := json.Unmarshal([]byte(r.ConfigJSON), &cfg); err == nil {
job.Config = &cfg
}
}
return job
}
// jobToRecord maps the API shape to a DB record for write-through, serializing
// the structured Config / ExtraOptions into their JSON columns. CreatedAt is
// parsed back from the RFC3339 string the service stamps; an unparseable value
// is left zero so FineTuneStore.Upsert stamps "now".
func jobToRecord(job *schema.FineTuneJob) *distributed.FineTuneJobRecord {
rec := &distributed.FineTuneJobRecord{
ID: job.ID,
UserID: job.UserID,
Model: job.Model,
Backend: job.Backend,
ModelID: job.ModelID,
TrainingType: job.TrainingType,
TrainingMethod: job.TrainingMethod,
Status: job.Status,
Message: job.Message,
OutputDir: job.OutputDir,
ExportStatus: job.ExportStatus,
ExportMessage: job.ExportMessage,
ExportModelName: job.ExportModelName,
}
if job.Config != nil {
if data, err := json.Marshal(job.Config); err == nil {
rec.ConfigJSON = string(data)
}
}
if job.ExtraOptions != nil {
if data, err := json.Marshal(job.ExtraOptions); err == nil {
rec.ExtraOptsJSON = string(data)
}
}
if t, err := time.Parse(time.RFC3339, job.CreatedAt); err == nil {
rec.CreatedAt = t
}
return rec
}
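The round trip that recordToJob and jobToRecord perform can be sketched in isolation. This is a minimal illustration, not the adapter itself: `apiJob`, `dbRecord`, `toRecord` and `toJob` are hypothetical stand-ins for `schema.FineTuneJob`, `distributed.FineTuneJobRecord` and the two mapping functions, cut down to one JSON column and the timestamp.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// apiJob and dbRecord are hypothetical, reduced stand-ins for the API shape
// and the DB record; only the fields needed for the sketch are present.
type apiJob struct {
	ID           string
	ExtraOptions map[string]string
	CreatedAt    string // RFC3339 string, as the service stamps it
}

type dbRecord struct {
	ID            string
	ExtraOptsJSON string
	CreatedAt     time.Time
}

// toRecord serializes the structured field into its JSON column. An
// unparseable CreatedAt is left as the zero time.
func toRecord(j *apiJob) *dbRecord {
	rec := &dbRecord{ID: j.ID}
	if j.ExtraOptions != nil {
		if data, err := json.Marshal(j.ExtraOptions); err == nil {
			rec.ExtraOptsJSON = string(data)
		}
	}
	if t, err := time.Parse(time.RFC3339, j.CreatedAt); err == nil {
		rec.CreatedAt = t
	}
	return rec
}

// toJob rebuilds the API shape. Decoding is best-effort: a malformed column
// leaves ExtraOptions nil instead of dropping the whole job.
func toJob(r *dbRecord) *apiJob {
	j := &apiJob{ID: r.ID, CreatedAt: r.CreatedAt.UTC().Format(time.RFC3339)}
	if r.ExtraOptsJSON != "" {
		_ = json.Unmarshal([]byte(r.ExtraOptsJSON), &j.ExtraOptions)
	}
	return j
}

func main() {
	in := &apiJob{ID: "job-1", ExtraOptions: map[string]string{"k": "v"}, CreatedAt: "2026-06-27T10:00:00Z"}
	out := toJob(toRecord(in))
	fmt.Println(out.ID, out.ExtraOptions["k"], out.CreatedAt)

	// A corrupt column still yields a job, just without the decoded options.
	bad := toJob(&dbRecord{ID: "job-2", ExtraOptsJSON: "{bad"})
	fmt.Println(bad.ID, bad.ExtraOptions == nil)
}
```

Keeping the conversion in one pair of functions means the value stored in the map can stay in the API shape, with the DB layout visible only at the adapter boundary.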


@@ -404,6 +404,36 @@ var _ = Describe("GalleryService cache invalidation broadcasts", func() {
Element: "x", Op: "install",
})).To(Succeed())
})
It("BroadcastModelsChanged delivers the element and op to a peer's OnModelsChanged", func() {
var (
mu sync.Mutex
seen []messaging.CacheInvalidateEvent
)
svcB.OnModelsChanged = func(evt messaging.CacheInvalidateEvent) {
mu.Lock()
seen = append(seen, evt)
mu.Unlock()
}
Expect(svcA.SubscribeBroadcasts()).To(Succeed())
Expect(svcB.SubscribeBroadcasts()).To(Succeed())
// An admin edit on replica A must reach replica B over the same subject
// the gallery path uses, so B refreshes its in-memory config loader.
svcA.BroadcastModelsChanged("my-alias", "install")
mu.Lock()
defer mu.Unlock()
Expect(seen).To(ContainElement(messaging.CacheInvalidateEvent{
Element: "my-alias", Op: "install",
}))
})
It("BroadcastModelsChanged is a no-op when NATS is not wired (standalone)", func() {
standalone := galleryop.NewGalleryService(&config.ApplicationConfig{}, nil)
// No SetNATSClient: must not panic and must simply do nothing.
Expect(func() { standalone.BroadcastModelsChanged("x", "delete") }).ToNot(Panic())
})
})
var _ = Describe("GalleryService PostgreSQL hydration", func() {


@@ -201,6 +201,24 @@ func (g *GalleryService) publishCacheInvalidate(subject string, evt messaging.Ca
}
}
// BroadcastModelsChanged notifies peer replicas that a model config was
// created, edited, or removed out-of-band of the gallery install/delete
// channel (e.g. the admin /models/edit, /models/import and
// /models/toggle-state endpoints, which write the YAML and reload only the
// local in-memory loader). Peers receive it via OnModelsChanged and refresh
// their own ModelConfigLoader so a request load-balanced to any replica sees
// the same config. No-op in standalone mode (no NATS client).
//
// op is "install" for a create/edit (the element must be (re)loaded from
// disk) or "delete" for a removal (the element must be pruned from memory,
// which a reload-from-path cannot do because the loader is additive).
func (g *GalleryService) BroadcastModelsChanged(element, op string) {
g.publishCacheInvalidate(messaging.SubjectCacheInvalidateModels, messaging.CacheInvalidateEvent{
Element: element,
Op: op,
})
}
// mergeStatus is the broadcast-side merge: it updates the in-memory map from
// a peer's GalleryProgressEvent without re-publishing to NATS or re-writing
// to PostgreSQL. UpdateStatus is the local-write entry point and does both;


@@ -22,6 +22,14 @@ const subscribeConfirmTimeout = 5 * time.Second
type Client struct {
conn *nats.Conn
mu sync.RWMutex
// reconnectCbs are invoked after the underlying connection is
// re-established. nats.go transparently resubscribes existing
// subscriptions on reconnect, but it cannot know that a consumer kept
// derived in-memory state (e.g. syncstate.SyncedMap) that may have drifted
// while the link was down — these callbacks let such consumers re-hydrate.
cbMu sync.Mutex
reconnectCbs []func()
}
// New creates a new NATS client with auto-reconnect.
@@ -31,6 +39,10 @@ func New(url string, opts ...Option) (*Client, error) {
o(&cfg)
}
// Allocate the client up front so the reconnect handler closure can reach
// it; conn is populated after nats.Connect succeeds below.
c := &Client{}
natsOpts := []nats.Option{
nats.RetryOnFailedConnect(true),
nats.MaxReconnects(-1),
@@ -41,6 +53,7 @@ func New(url string, opts ...Option) (*Client, error) {
}),
nats.ReconnectHandler(func(_ *nats.Conn) {
xlog.Info("NATS reconnected")
c.runReconnectCallbacks()
}),
nats.ClosedHandler(func(_ *nats.Conn) {
xlog.Info("NATS connection closed")
@@ -103,7 +116,33 @@ func New(url string, opts ...Option) (*Client, error) {
return nil, fmt.Errorf("connecting to NATS at %s: %w", sanitize.URL(url), err)
}
return &Client{conn: nc}, nil
c.conn = nc
return c, nil
}
// OnReconnect registers a callback invoked after the NATS connection is
// re-established. It is consumed via an optional interface type-assertion
// (interface{ OnReconnect(func()) }) rather than being added to MessagingClient,
// so the messaging abstraction stays minimal and standalone/test clients are not
// forced to implement reconnect semantics. A nil callback is ignored.
func (c *Client) OnReconnect(cb func()) {
if cb == nil {
return
}
c.cbMu.Lock()
c.reconnectCbs = append(c.reconnectCbs, cb)
c.cbMu.Unlock()
}
// runReconnectCallbacks invokes registered reconnect callbacks. It copies the
// slice under the lock so a callback that (re)registers cannot deadlock.
func (c *Client) runReconnectCallbacks() {
c.cbMu.Lock()
cbs := append([]func(){}, c.reconnectCbs...)
c.cbMu.Unlock()
for _, cb := range cbs {
cb()
}
}
// Publish marshals data as JSON and publishes it to the given subject.
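The two ideas in the reconnect hunk, snapshotting the callback slice under the lock and discovering `OnReconnect` through an optional interface assertion, can be tried without a NATS connection. The sketch below is an assumption-laden miniature: `hooks`, `run` and `registerIfSupported` are hypothetical names, and `run` plays the role of `runReconnectCallbacks`.

```go
package main

import (
	"fmt"
	"sync"
)

// hooks is a hypothetical stand-in for the callback part of the client.
type hooks struct {
	mu  sync.Mutex
	cbs []func()
}

// OnReconnect registers a callback; nil is ignored.
func (h *hooks) OnReconnect(cb func()) {
	if cb == nil {
		return
	}
	h.mu.Lock()
	h.cbs = append(h.cbs, cb)
	h.mu.Unlock()
}

// run copies the slice under the lock and calls the copies after unlocking,
// so a callback that registers another callback does not deadlock.
func (h *hooks) run() {
	h.mu.Lock()
	cbs := append([]func(){}, h.cbs...)
	h.mu.Unlock()
	for _, cb := range cbs {
		cb()
	}
}

// registerIfSupported uses an optional interface assertion, so clients
// without reconnect semantics are simply skipped.
func registerIfSupported(client any, cb func()) bool {
	r, ok := client.(interface{ OnReconnect(func()) })
	if !ok {
		return false
	}
	r.OnReconnect(cb)
	return true
}

func main() {
	h := &hooks{}
	calls := 0
	registerIfSupported(h, func() {
		calls++
		// Registering from inside a callback is safe because run() no
		// longer holds the mutex at this point.
		h.OnReconnect(func() { calls += 10 })
	})
	h.run()
	h.run()
	fmt.Println(calls)                                      // prints 12
	fmt.Println(registerIfSupported(struct{}{}, func() {})) // prints false
}
```

Callbacks registered during a run are not part of that run's snapshot; they fire from the next reconnect onward.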


@@ -380,6 +380,20 @@ func SubjectCacheInvalidateCollection(name string) string {
return "cache.invalidate.collections." + sanitizeSubjectToken(name)
}
// SyncedMap State Sync (Pub/Sub — broadcast to all frontends)
//
// The reusable syncstate.SyncedMap component publishes a {op,key,value} delta on
// this subject whenever a replica mutates a piece of cross-replica in-memory
// state. Peers subscribe and apply the delta to their own map, so a round-robin
// API request that lands on a replica which did not originate the change still
// sees it. Convergence on (re)connect is done by re-hydrating from the durable
// source, so no request/reply snapshot subject is needed here.
func SubjectSyncStateDelta(name string) string {
return subjectSyncStatePrefix + sanitizeSubjectToken(name) + ".delta"
}
const subjectSyncStatePrefix = "state."
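As a rough illustration of the {op,key,value} delta described in the comment above, the following sketch applies such deltas to a peer's local map. The wire format shown here is an assumption: the op names "set" and "delete", the JSON field names, and the `job` type are all hypothetical, and the real SyncedMap encoding may differ.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// delta is an assumed wire shape for a state-sync message.
type delta struct {
	Op    string          `json:"op"`
	Key   string          `json:"key"`
	Value json.RawMessage `json:"value,omitempty"`
}

// job is a hypothetical value type kept in the replicated map.
type job struct {
	ID     string `json:"id"`
	Status string `json:"status"`
}

// applyDelta mutates the peer's map from a received payload without
// re-publishing anything, so a delta is never echoed back to the bus.
func applyDelta(m map[string]*job, payload []byte) error {
	var d delta
	if err := json.Unmarshal(payload, &d); err != nil {
		return err
	}
	switch d.Op {
	case "set":
		var j job
		if err := json.Unmarshal(d.Value, &j); err != nil {
			return err
		}
		m[d.Key] = &j
	case "delete":
		delete(m, d.Key)
	default:
		return fmt.Errorf("unknown op %q", d.Op)
	}
	return nil
}

func main() {
	peer := map[string]*job{}
	_ = applyDelta(peer, []byte(`{"op":"set","key":"job-1","value":{"id":"job-1","status":"queued"}}`))
	fmt.Println(peer["job-1"].Status) // prints queued
	_ = applyDelta(peer, []byte(`{"op":"delete","key":"job-1"}`))
	fmt.Println(len(peer)) // prints 0
}
```

Deltas alone cannot repair a map that missed messages while disconnected, which is why the comment pairs them with re-hydration from the durable source on (re)connect.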
// Prefix-Cache Routing Sync (Pub/Sub - broadcast to all frontends)
//
// Frontends share prefix-cache observations so a request routed to any replica


@@ -0,0 +1,53 @@
package modeladmin
import (
"github.com/mudler/LocalAI/core/config"
"github.com/mudler/LocalAI/core/services/messaging"
"github.com/mudler/LocalAI/pkg/model"
"github.com/mudler/xlog"
)
// opDelete is the CacheInvalidateEvent.Op value the gallery delete path and the
// admin delete endpoint use; a delete must prune (a reload-from-path cannot).
const opDelete = "delete"
// ApplyRemoteChange refreshes this replica's in-memory model state from a peer
// replica's model-config change broadcast (messaging.CacheInvalidateEvent on
// SubjectCacheInvalidateModels). It is the subscriber-side counterpart to
// GalleryService.BroadcastModelsChanged.
//
// The op matters because LoadModelConfigsFromPath is additive: it loads every
// YAML on disk into the loader but never removes an entry whose file is gone.
// So a delete cannot be propagated by a plain reload - the deleted element must
// be explicitly pruned. Specifically:
//
// - op == "delete" with a named element: prune that element from the loader.
// - otherwise: reload all configs from disk (picks up creates and edits).
//
// In both cases, when an element is named, any running instance on this replica
// is shut down (best-effort) so the next request rebuilds it from the new
// config instead of serving the stale one - mirroring what the originating
// replica does on a local edit/delete.
//
// ml may be nil (no running instances to shut down). modelsPath and opts are
// forwarded to LoadModelConfigsFromPath.
func ApplyRemoteChange(cl *config.ModelConfigLoader, ml *model.ModelLoader, modelsPath string, evt messaging.CacheInvalidateEvent, opts ...config.ConfigLoaderOption) error {
if evt.Op == opDelete && evt.Element != "" {
cl.RemoveModelConfig(evt.Element)
} else if err := cl.LoadModelConfigsFromPath(modelsPath, opts...); err != nil {
return err
}
// Drop any running instance of the affected model so the next request
// rebuilds it from the refreshed config instead of serving the stale one.
// Best-effort: the model may not be loaded on this replica, which surfaces
// as a benign error here.
if ml != nil && evt.Element != "" {
if err := ml.ShutdownModel(evt.Element); err != nil {
xlog.Debug("ApplyRemoteChange: could not shut down model instance (likely not loaded)",
"model", evt.Element, "error", err)
}
}
return nil
}
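Why the op matters can be shown with a toy loader. This is a sketch under stated assumptions: `loader`, `loadFromDisk`, `remove` and `apply` are hypothetical stand-ins for `config.ModelConfigLoader`, `LoadModelConfigsFromPath`, `RemoveModelConfig` and `ApplyRemoteChange`, and the "disk" is just a slice of names.

```go
package main

import "fmt"

// event mirrors the two fields of the invalidation event used here.
type event struct{ Element, Op string }

// loader is a hypothetical in-memory stand-in for the config loader.
type loader struct{ configs map[string]bool }

// loadFromDisk is additive: it adds every name found on disk but never
// removes an entry whose file has disappeared.
func (l *loader) loadFromDisk(disk []string) {
	for _, name := range disk {
		l.configs[name] = true
	}
}

// remove prunes one entry explicitly.
func (l *loader) remove(name string) { delete(l.configs, name) }

// apply prunes on a named delete and reloads from disk otherwise.
func apply(l *loader, disk []string, evt event) {
	if evt.Op == "delete" && evt.Element != "" {
		l.remove(evt.Element)
		return
	}
	l.loadFromDisk(disk)
}

func main() {
	// "doomed" is in memory, but a peer already deleted its file.
	l := &loader{configs: map[string]bool{"doomed": true}}
	disk := []string{"kept"}

	apply(l, disk, event{Element: "kept", Op: "install"})
	fmt.Println(l.configs["kept"], l.configs["doomed"]) // prints true true

	apply(l, disk, event{Element: "doomed", Op: "delete"})
	fmt.Println(l.configs["doomed"]) // prints false
}
```

The first call shows the stale entry surviving a reload; only the explicit delete event removes it.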


@@ -0,0 +1,80 @@
package modeladmin
import (
"os"
"path/filepath"
. "github.com/onsi/ginkgo/v2"
. "github.com/onsi/gomega"
"gopkg.in/yaml.v3"
"github.com/mudler/LocalAI/core/config"
"github.com/mudler/LocalAI/core/services/messaging"
)
var _ = Describe("ApplyRemoteChange", func() {
var (
dir string
loader *config.ModelConfigLoader
)
BeforeEach(func() {
dir = GinkgoT().TempDir()
loader = config.NewModelConfigLoader(dir)
})
writeYAML := func(name string, body map[string]any) {
body["name"] = name
data, err := yaml.Marshal(body)
Expect(err).ToNot(HaveOccurred())
Expect(os.WriteFile(filepath.Join(dir, name+".yaml"), data, 0644)).To(Succeed())
}
It("loads a peer-created config from disk on an install event", func() {
// Peer wrote the YAML to the shared models dir; this replica has not
// loaded it yet (empty in-memory loader).
writeYAML("peer-alias", map[string]any{"alias": "qwen"})
_, ok := loader.GetModelConfig("peer-alias")
Expect(ok).To(BeFalse(), "precondition: not yet in memory")
err := ApplyRemoteChange(loader, nil, dir, messaging.CacheInvalidateEvent{
Element: "peer-alias", Op: "install",
})
Expect(err).ToNot(HaveOccurred())
_, ok = loader.GetModelConfig("peer-alias")
Expect(ok).To(BeTrue(), "install event must reload the new config from disk")
})
It("prunes a peer-deleted config that a reload-from-path cannot drop", func() {
// Model is present in memory (loaded earlier) but its file is now gone
// from the shared dir. LoadModelConfigsFromPath is additive, so only an
// explicit prune can remove it - this is the cross-replica delete bug.
writeYAML("doomed", map[string]any{"alias": "qwen"})
Expect(loader.LoadModelConfigsFromPath(dir)).To(Succeed())
_, ok := loader.GetModelConfig("doomed")
Expect(ok).To(BeTrue(), "precondition: in memory")
Expect(os.Remove(filepath.Join(dir, "doomed.yaml"))).To(Succeed())
err := ApplyRemoteChange(loader, nil, dir, messaging.CacheInvalidateEvent{
Element: "doomed", Op: "delete",
})
Expect(err).ToNot(HaveOccurred())
_, ok = loader.GetModelConfig("doomed")
Expect(ok).To(BeFalse(), "delete event must prune the element from memory")
})
It("does a full reload when no element is named", func() {
writeYAML("m1", map[string]any{"alias": "qwen"})
writeYAML("m2", map[string]any{"alias": "qwen"})
err := ApplyRemoteChange(loader, nil, dir, messaging.CacheInvalidateEvent{})
Expect(err).ToNot(HaveOccurred())
_, ok1 := loader.GetModelConfig("m1")
_, ok2 := loader.GetModelConfig("m2")
Expect(ok1).To(BeTrue())
Expect(ok2).To(BeTrue())
})
})


@@ -673,6 +673,49 @@ func (r *NodeRegistry) Get(ctx context.Context, nodeID string) (*BackendNode, er
return &node, nil
}
// GetWithExtras returns a single node enriched with the same computed fields as
// ListWithExtras (labels, loaded-model count, in-flight total). The plain Get
// returns a bare BackendNode whose Labels live in a separate table, so the node
// detail view needs this to show a node's existing labels and live counts.
func (r *NodeRegistry) GetWithExtras(ctx context.Context, nodeID string) (*NodeWithExtras, error) {
node, err := r.Get(ctx, nodeID)
if err != nil {
return nil, err
}
labels := make(map[string]string)
nodeLabels, err := r.GetNodeLabels(ctx, nodeID)
if err != nil {
xlog.Warn("GetWithExtras: failed to get labels", "node", nodeID, "error", err)
} else {
for _, l := range nodeLabels {
labels[l.Key] = l.Value
}
}
var modelCount int64
if err := r.db.WithContext(ctx).Model(&NodeModel{}).
Where("node_id = ? AND state = ?", nodeID, "loaded").
Count(&modelCount).Error; err != nil {
xlog.Warn("GetWithExtras: failed to get model count", "node", nodeID, "error", err)
}
var inFlight struct{ Total int }
if err := r.db.WithContext(ctx).Model(&NodeModel{}).
Select("COALESCE(SUM(in_flight), 0) as total").
Where("node_id = ? AND state IN ?", nodeID, []string{"loaded", "unloading"}).
Scan(&inFlight).Error; err != nil {
xlog.Warn("GetWithExtras: failed to get in-flight count", "node", nodeID, "error", err)
}
return &NodeWithExtras{
BackendNode: *node,
ModelCount: int(modelCount),
InFlightCount: inFlight.Total,
Labels: labels,
}, nil
}
// GetByName returns a single node by name.
func (r *NodeRegistry) GetByName(ctx context.Context, name string) (*BackendNode, error) {
var node BackendNode


@@ -646,6 +646,38 @@ var _ = Describe("NodeRegistry", func() {
})
})
Describe("GetWithExtras", func() {
It("returns the node enriched with its labels map", func() {
node := makeNode("extras-node", "10.0.0.80:50051", 8_000_000_000)
Expect(registry.Register(context.Background(), node, true)).To(Succeed())
Expect(registry.SetNodeLabel(context.Background(), node.ID, "env", "prod")).To(Succeed())
Expect(registry.SetNodeLabel(context.Background(), node.ID, "region", "us-east")).To(Succeed())
got, err := registry.GetWithExtras(context.Background(), node.ID)
Expect(err).ToNot(HaveOccurred())
Expect(got).ToNot(BeNil())
Expect(got.ID).To(Equal(node.ID))
Expect(got.Name).To(Equal("extras-node"))
Expect(got.Labels).To(Equal(map[string]string{"env": "prod", "region": "us-east"}))
})
It("returns an empty (non-nil) labels map when the node has none", func() {
node := makeNode("extras-no-labels", "10.0.0.81:50051", 8_000_000_000)
Expect(registry.Register(context.Background(), node, true)).To(Succeed())
got, err := registry.GetWithExtras(context.Background(), node.ID)
Expect(err).ToNot(HaveOccurred())
Expect(got).ToNot(BeNil())
Expect(got.Labels).ToNot(BeNil())
Expect(got.Labels).To(BeEmpty())
})
It("returns an error for an unknown node", func() {
_, err := registry.GetWithExtras(context.Background(), "does-not-exist")
Expect(err).To(HaveOccurred())
})
})
Describe("FindNodesBySelector", func() {
It("returns nodes matching all labels in selector", func() {
n1 := makeNode("sel-match", "10.0.0.80:50051", 8_000_000_000)


@@ -0,0 +1,13 @@
package quantization
import (
"testing"
. "github.com/onsi/ginkgo/v2"
. "github.com/onsi/gomega"
)
func TestQuantization(t *testing.T) {
RegisterFailHandler(Fail)
RunSpecs(t, "Quantization Suite")
}


@@ -17,6 +17,9 @@ import (
"github.com/mudler/LocalAI/core/config"
"github.com/mudler/LocalAI/core/gallery/importers"
"github.com/mudler/LocalAI/core/schema"
"github.com/mudler/LocalAI/core/services/distributed"
"github.com/mudler/LocalAI/core/services/messaging"
"github.com/mudler/LocalAI/core/services/syncstate"
pb "github.com/mudler/LocalAI/pkg/grpc/proto"
"github.com/mudler/LocalAI/pkg/model"
"github.com/mudler/LocalAI/pkg/utils"
@@ -30,26 +33,63 @@ type QuantizationService struct {
modelLoader *model.ModelLoader
configLoader *config.ModelConfigLoader
mu sync.Mutex
jobs map[string]*schema.QuantizationJob
// mu serializes the read-modify-write of job values. The SyncedMap guards its
// own map structure, but a job is a pointer mutated in place (e.g. the import
// goroutine), so the service still needs a lock to keep those field updates and
// the subsequent Set atomic with respect to readers.
mu sync.Mutex
// jobs is the cross-replica job store: an in-memory map kept consistent across
// replicas via NATS, optionally read-through to PostgreSQL in distributed mode.
jobs *syncstate.SyncedMap[string, *schema.QuantizationJob]
}
// NewQuantizationService creates a new QuantizationService.
// NewQuantizationService creates a new QuantizationService. In distributed mode
// pass the shared NATS client and PostgreSQL store so jobs stay consistent across
// replicas; pass nil for both in standalone mode, where the disk Loader hydrates
// the map and there is nothing to broadcast.
func NewQuantizationService(
appConfig *config.ApplicationConfig,
modelLoader *model.ModelLoader,
configLoader *config.ModelConfigLoader,
nats messaging.MessagingClient,
store *distributed.QuantStore,
) *QuantizationService {
s := &QuantizationService{
appConfig: appConfig,
modelLoader: modelLoader,
configLoader: configLoader,
jobs: make(map[string]*schema.QuantizationJob),
}
s.loadAllJobs()
// Only attach a Store interface when a concrete store exists, otherwise the
// SyncedMap would see a non-nil interface wrapping a nil pointer and try to
// hydrate/write through a nil DB.
var syncStore syncstate.Store[string, *schema.QuantizationJob]
if store != nil {
syncStore = &quantStoreAdapter{store: store}
}
s.jobs = syncstate.New(syncstate.Config[string, *schema.QuantizationJob]{
Name: "quant.jobs",
Key: func(j *schema.QuantizationJob) string { return j.ID },
Nats: nats,
Store: syncStore,
Loader: s.loadJobsFromDisk, // ignored when Store is set (distributed mode)
})
// Hydrate + subscribe. A hydrate failure must not take the server down: log and
// continue degraded (standalone), mirroring the FineTune/OpCache wiring.
if err := s.jobs.Start(appConfig.Context); err != nil {
xlog.Warn("Quantization SyncedMap start failed; running degraded", "error", err)
}
return s
}
// Close releases the SyncedMap subscription and background workers.
func (s *QuantizationService) Close() error {
return s.jobs.Close()
}
// quantizationBaseDir returns the base directory for quantization job data.
func (s *QuantizationService) quantizationBaseDir() string {
return filepath.Join(s.appConfig.DataPath, "quantization")
@@ -80,15 +120,18 @@ func (s *QuantizationService) saveJobState(job *schema.QuantizationJob) {
}
}
// loadAllJobs scans the quantization directory for persisted jobs and loads them.
func (s *QuantizationService) loadAllJobs() {
// loadJobsFromDisk scans the quantization directory for persisted jobs and
// returns them. It is the SyncedMap Loader used in standalone mode (no DB); the
// returned slice hydrates the map on Start.
func (s *QuantizationService) loadJobsFromDisk(_ context.Context) ([]*schema.QuantizationJob, error) {
baseDir := s.quantizationBaseDir()
entries, err := os.ReadDir(baseDir)
if err != nil {
// Directory doesn't exist yet — that's fine
return
// Directory doesn't exist yet — that's fine, start empty.
return nil, nil
}
var jobs []*schema.QuantizationJob
for _, entry := range entries {
if !entry.IsDir() {
continue
@@ -117,12 +160,13 @@ func (s *QuantizationService) loadAllJobs() {
job.ImportMessage = "Server restarted while import was running"
}
s.jobs[job.ID] = &job
jobs = append(jobs, &job)
}
if len(s.jobs) > 0 {
xlog.Info("Loaded persisted quantization jobs", "count", len(s.jobs))
if len(jobs) > 0 {
xlog.Info("Loaded persisted quantization jobs", "count", len(jobs))
}
return jobs, nil
}
// StartJob starts a new quantization job.
@@ -188,7 +232,12 @@ func (s *QuantizationService) StartJob(ctx context.Context, userID string, req s
CreatedAt: time.Now().UTC().Format(time.RFC3339),
Config: &req,
}
s.jobs[jobID] = job
// Set write-through persists to PostgreSQL (distributed) and broadcasts to
// peer replicas; the disk state.json is written separately for restart
// recovery / standalone hydrate.
if err := s.jobs.Set(ctx, job); err != nil {
return nil, fmt.Errorf("failed to persist job: %w", err)
}
s.saveJobState(job)
return &schema.QuantizationJobResponse{
@@ -203,7 +252,7 @@ func (s *QuantizationService) GetJob(userID, jobID string) (*schema.Quantization
s.mu.Lock()
defer s.mu.Unlock()
job, ok := s.jobs[jobID]
job, ok := s.jobs.Get(jobID)
if !ok {
return nil, fmt.Errorf("job not found: %s", jobID)
}
@@ -219,7 +268,7 @@ func (s *QuantizationService) ListJobs(userID string) []*schema.QuantizationJob
defer s.mu.Unlock()
var result []*schema.QuantizationJob
for _, job := range s.jobs {
for _, job := range s.jobs.List() {
if userID == "" || job.UserID == userID {
result = append(result, job)
}
@@ -235,7 +284,7 @@ func (s *QuantizationService) ListJobs(userID string) []*schema.QuantizationJob
// StopJob stops a running quantization job.
func (s *QuantizationService) StopJob(ctx context.Context, userID, jobID string) error {
s.mu.Lock()
job, ok := s.jobs[jobID]
job, ok := s.jobs.Get(jobID)
if !ok {
s.mu.Unlock()
return fmt.Errorf("job not found: %s", jobID)
@@ -256,6 +305,9 @@ func (s *QuantizationService) StopJob(ctx context.Context, userID, jobID string)
s.mu.Lock()
job.Status = "stopped"
job.Message = "Quantization stopped by user"
if err := s.jobs.Set(ctx, job); err != nil {
xlog.Warn("Failed to persist stopped job", "job_id", jobID, "error", err)
}
s.saveJobState(job)
s.mu.Unlock()
@@ -265,7 +317,7 @@ func (s *QuantizationService) StopJob(ctx context.Context, userID, jobID string)
// DeleteJob removes a quantization job and its associated data from disk.
func (s *QuantizationService) DeleteJob(userID, jobID string) error {
s.mu.Lock()
job, ok := s.jobs[jobID]
job, ok := s.jobs.Get(jobID)
if !ok {
s.mu.Unlock()
return fmt.Errorf("job not found: %s", jobID)
@@ -289,7 +341,11 @@ func (s *QuantizationService) DeleteJob(userID, jobID string) error {
}
importModelName := job.ImportModelName
delete(s.jobs, jobID)
// Delete write-through removes the DB row (distributed) and broadcasts the
// removal to peer replicas. DeleteJob has no ctx, so use Background.
if err := s.jobs.Delete(context.Background(), jobID); err != nil {
xlog.Warn("Failed to delete job from store", "job_id", jobID, "error", err)
}
s.mu.Unlock()
// Remove job directory (state.json, output files)
@@ -324,7 +380,7 @@ func (s *QuantizationService) DeleteJob(userID, jobID string) error {
// StreamProgress opens a gRPC progress stream and calls the callback for each update.
func (s *QuantizationService) StreamProgress(ctx context.Context, userID, jobID string, callback func(event *schema.QuantizationProgressEvent)) error {
s.mu.Lock()
job, ok := s.jobs[jobID]
job, ok := s.jobs.Get(jobID)
if !ok {
s.mu.Unlock()
return fmt.Errorf("job not found: %s", jobID)
@@ -353,7 +409,7 @@ func (s *QuantizationService) StreamProgress(ctx context.Context, userID, jobID
}, func(update *pb.QuantizationProgressUpdate) {
// Update job status and persist
s.mu.Lock()
if j, ok := s.jobs[jobID]; ok {
if j, ok := s.jobs.Get(jobID); ok {
// Don't let progress updates overwrite terminal states
isTerminal := j.Status == "stopped" || j.Status == "completed" || j.Status == "failed"
if !isTerminal {
@@ -365,6 +421,9 @@ func (s *QuantizationService) StreamProgress(ctx context.Context, userID, jobID
if update.OutputFile != "" {
j.OutputFile = update.OutputFile
}
if err := s.jobs.Set(ctx, j); err != nil {
xlog.Warn("Failed to persist progress update", "job_id", jobID, "error", err)
}
s.saveJobState(j)
}
s.mu.Unlock()
@@ -399,7 +458,7 @@ func sanitizeQuantModelName(s string) string {
// ImportModel imports a quantized model into LocalAI asynchronously.
func (s *QuantizationService) ImportModel(ctx context.Context, userID, jobID string, req schema.QuantizationImportRequest) (string, error) {
s.mu.Lock()
job, ok := s.jobs[jobID]
job, ok := s.jobs.Get(jobID)
if !ok {
s.mu.Unlock()
return "", fmt.Errorf("job not found: %s", jobID)
@@ -459,6 +518,9 @@ func (s *QuantizationService) ImportModel(ctx context.Context, userID, jobID str
job.ImportStatus = "importing"
job.ImportMessage = ""
job.ImportModelName = ""
if err := s.jobs.Set(ctx, job); err != nil {
xlog.Warn("Failed to persist import start", "job_id", jobID, "error", err)
}
s.saveJobState(job)
s.mu.Unlock()
@@ -514,10 +576,15 @@ func (s *QuantizationService) ImportModel(ctx context.Context, userID, jobID str
xlog.Info("Quantized model imported and registered", "job_id", jobID, "model_name", modelName)
// Runs after the HTTP request returns, so use Background rather than the
// (now likely cancelled) request ctx for the write-through.
s.mu.Lock()
job.ImportStatus = "completed"
job.ImportModelName = modelName
job.ImportMessage = ""
if err := s.jobs.Set(context.Background(), job); err != nil {
xlog.Warn("Failed to persist import completion", "job_id", jobID, "error", err)
}
s.saveJobState(job)
s.mu.Unlock()
}()
@@ -525,10 +592,14 @@ func (s *QuantizationService) ImportModel(ctx context.Context, userID, jobID str
return modelName, nil
}
// setImportMessage updates the import message and persists the job state.
// setImportMessage updates the import message and persists the job state. Called
// from the background import goroutine, so it uses Background for write-through.
func (s *QuantizationService) setImportMessage(job *schema.QuantizationJob, msg string) {
s.mu.Lock()
job.ImportMessage = msg
if err := s.jobs.Set(context.Background(), job); err != nil {
xlog.Warn("Failed to persist import message", "job_id", job.ID, "error", err)
}
s.saveJobState(job)
s.mu.Unlock()
}
@@ -539,6 +610,9 @@ func (s *QuantizationService) setImportFailed(job *schema.QuantizationJob, messa
s.mu.Lock()
job.ImportStatus = "failed"
job.ImportMessage = message
if err := s.jobs.Set(context.Background(), job); err != nil {
xlog.Warn("Failed to persist import failure", "job_id", job.ID, "error", err)
}
s.saveJobState(job)
s.mu.Unlock()
}
@@ -546,7 +620,7 @@ func (s *QuantizationService) setImportFailed(job *schema.QuantizationJob, messa
// GetOutputPath returns the path to the quantized model file and a download name.
func (s *QuantizationService) GetOutputPath(userID, jobID string) (string, string, error) {
s.mu.Lock()
job, ok := s.jobs[jobID]
job, ok := s.jobs.Get(jobID)
if !ok {
s.mu.Unlock()
return "", "", fmt.Errorf("job not found: %s", jobID)
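The constructor in this file only assigns the Store interface when the concrete store is non-nil. The reason is a general Go behaviour: an interface holding a nil pointer is itself not nil. The sketch below demonstrates this with hypothetical names (`Store`, `dbStore`, `naive`, `guarded`); it does not use the real syncstate types.

```go
package main

import "fmt"

// Store is a hypothetical one-method interface for the sketch.
type Store interface{ Name() string }

// dbStore is a hypothetical concrete store.
type dbStore struct{ name string }

func (d *dbStore) Name() string { return d.name }

// naive converts the pointer straight to the interface. A nil *dbStore
// becomes a non-nil interface value that carries a nil pointer.
func naive(d *dbStore) Store { return d }

// guarded returns an untyped nil interface when there is no store, which is
// what a "Store == nil" check on the consumer side expects.
func guarded(d *dbStore) Store {
	if d == nil {
		return nil
	}
	return d
}

func main() {
	var d *dbStore // standalone mode: no concrete store
	fmt.Println(naive(d) == nil)   // prints false
	fmt.Println(guarded(d) == nil) // prints true
}
```

With the naive form, a consumer that checks the interface against nil would believe a store is configured and would then call methods on a nil pointer.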


@@ -0,0 +1,187 @@
package quantization
// White-box tests (package quantization) so a spec can drive the service's
// internal SyncedMap the same way StartJob does (via jobs.Set) without standing
// up a quantization backend, then assert the cross-replica reads
// (GetJob/ListJobs) and the adapter conversions that keep REST responses
// byte-for-byte unchanged.
import (
"context"
. "github.com/onsi/ginkgo/v2"
. "github.com/onsi/gomega"
"github.com/mudler/LocalAI/core/config"
"github.com/mudler/LocalAI/core/schema"
"github.com/mudler/LocalAI/core/services/distributed"
"github.com/mudler/LocalAI/core/services/testutil"
)
// newTestService builds a standalone QuantizationService wired to the given bus.
// The model/config loaders are nil because the read/sync paths under test never
// touch them; the data dir is a throwaway temp dir so the disk Loader finds
// nothing.
func newTestService(bus *testutil.FakeBus) *QuantizationService {
appConfig := &config.ApplicationConfig{
Context: context.Background(),
DataPath: GinkgoT().TempDir(),
}
return NewQuantizationService(appConfig, nil, nil, bus, nil)
}
var _ = Describe("QuantizationService", func() {
ctx := context.Background()
Describe("cross-replica job visibility", func() {
var (
bus *testutil.FakeBus
a, b *QuantizationService
)
BeforeEach(func() {
// One shared bus, two replicas: exactly the distributed topology where a
// round-robin request may land on a replica that did not originate the
// change.
bus = testutil.NewFakeBus()
a = newTestService(bus)
b = newTestService(bus)
})
AfterEach(func() {
Expect(a.Close()).To(Succeed())
Expect(b.Close()).To(Succeed())
})
It("makes a job created on A visible via B's GetJob and ListJobs", func() {
job := &schema.QuantizationJob{ID: "job-1", UserID: "user-1", Status: "queued", CreatedAt: "2026-06-27T10:00:00Z"}
// StartJob persists via jobs.Set; drive that directly to avoid a backend.
Expect(a.jobs.Set(ctx, job)).To(Succeed())
got, err := b.GetJob("user-1", "job-1")
Expect(err).ToNot(HaveOccurred(), "B must see a job A just created")
Expect(got.Status).To(Equal("queued"))
listed := b.ListJobs("user-1")
Expect(listed).To(HaveLen(1))
Expect(listed[0].ID).To(Equal("job-1"))
})
It("removes a job from B when it is deleted on A", func() {
job := &schema.QuantizationJob{ID: "job-2", UserID: "user-1", Status: "completed", CreatedAt: "2026-06-27T10:00:00Z"}
Expect(a.jobs.Set(ctx, job)).To(Succeed())
_, err := b.GetJob("user-1", "job-2")
Expect(err).ToNot(HaveOccurred(), "precondition: B must have the job before the delete")
Expect(a.jobs.Delete(ctx, "job-2")).To(Succeed())
_, err = b.GetJob("user-1", "job-2")
Expect(err).To(HaveOccurred(), "a delete on A must remove the job from B")
})
It("propagates a status update from A to B", func() {
job := &schema.QuantizationJob{ID: "job-3", UserID: "user-1", Status: "quantizing", CreatedAt: "2026-06-27T10:00:00Z"}
Expect(a.jobs.Set(ctx, job)).To(Succeed())
updated := &schema.QuantizationJob{ID: "job-3", UserID: "user-1", Status: "completed", CreatedAt: "2026-06-27T10:00:00Z"}
Expect(a.jobs.Set(ctx, updated)).To(Succeed())
got, err := b.GetJob("user-1", "job-3")
Expect(err).ToNot(HaveOccurred())
Expect(got.Status).To(Equal("completed"))
})
})
Describe("ListJobs", func() {
var svc *QuantizationService
BeforeEach(func() {
svc = newTestService(testutil.NewFakeBus())
})
AfterEach(func() { Expect(svc.Close()).To(Succeed()) })
It("filters by user and sorts newest-first", func() {
Expect(svc.jobs.Set(ctx, &schema.QuantizationJob{ID: "old", UserID: "u1", CreatedAt: "2026-06-25T10:00:00Z"})).To(Succeed())
Expect(svc.jobs.Set(ctx, &schema.QuantizationJob{ID: "new", UserID: "u1", CreatedAt: "2026-06-27T10:00:00Z"})).To(Succeed())
Expect(svc.jobs.Set(ctx, &schema.QuantizationJob{ID: "other", UserID: "u2", CreatedAt: "2026-06-26T10:00:00Z"})).To(Succeed())
jobs := svc.ListJobs("u1")
Expect(jobs).To(HaveLen(2), "only u1's jobs")
Expect(jobs[0].ID).To(Equal("new"), "newest first")
Expect(jobs[1].ID).To(Equal("old"))
})
It("returns every user's jobs when the userID filter is empty", func() {
Expect(svc.jobs.Set(ctx, &schema.QuantizationJob{ID: "a", UserID: "u1", CreatedAt: "2026-06-25T10:00:00Z"})).To(Succeed())
Expect(svc.jobs.Set(ctx, &schema.QuantizationJob{ID: "b", UserID: "u2", CreatedAt: "2026-06-26T10:00:00Z"})).To(Succeed())
Expect(svc.ListJobs("")).To(HaveLen(2))
})
It("rejects GetJob for a job owned by another user", func() {
Expect(svc.jobs.Set(ctx, &schema.QuantizationJob{ID: "x", UserID: "owner", CreatedAt: "2026-06-25T10:00:00Z"})).To(Succeed())
_, err := svc.GetJob("intruder", "x")
Expect(err).To(HaveOccurred(), "a different user must not read someone else's job")
})
})
Describe("store adapter conversion", func() {
// The SyncedMap value type is *schema.QuantizationJob (the exact REST shape).
// These specs prove the DB adapter round-trips it losslessly, so hydrate and
// write-through in distributed mode keep responses unchanged.
It("round-trips a job through jobToRecord/recordToJob preserving the API shape", func() {
original := &schema.QuantizationJob{
ID: "rt-1",
UserID: "user-1",
Model: "base-model",
Backend: "llama-cpp-quantization",
ModelID: "llama-cpp-quantization-quantize-rt-1",
QuantizationType: "q4_k_m",
Status: "completed",
Message: "done",
OutputDir: "/data/quantization/rt-1",
OutputFile: "/data/quantization/rt-1/model.gguf",
ExtraOptions: map[string]string{"hf_token": "secret"},
CreatedAt: "2026-06-27T10:00:00Z",
ImportStatus: "completed",
ImportMessage: "",
ImportModelName: "base-model-q4_k_m-rt-1",
Config: &schema.QuantizationJobRequest{Model: "base-model", Backend: "llama-cpp-quantization", QuantizationType: "q4_k_m"},
}
rec := jobToRecord(original)
Expect(rec.ID).To(Equal("rt-1"))
Expect(rec.ConfigJSON).ToNot(BeEmpty(), "structured config must serialize into the JSON column")
Expect(rec.ExtraOptsJSON).ToNot(BeEmpty())
back := recordToJob(rec)
Expect(back.ID).To(Equal(original.ID))
Expect(back.UserID).To(Equal(original.UserID))
Expect(back.Model).To(Equal(original.Model))
Expect(back.Backend).To(Equal(original.Backend))
Expect(back.ModelID).To(Equal(original.ModelID))
Expect(back.QuantizationType).To(Equal(original.QuantizationType))
Expect(back.Status).To(Equal(original.Status))
Expect(back.Message).To(Equal(original.Message))
Expect(back.OutputDir).To(Equal(original.OutputDir))
Expect(back.OutputFile).To(Equal(original.OutputFile))
Expect(back.ImportStatus).To(Equal(original.ImportStatus))
Expect(back.ImportModelName).To(Equal(original.ImportModelName))
Expect(back.CreatedAt).To(Equal(original.CreatedAt))
Expect(back.ExtraOptions).To(Equal(original.ExtraOptions))
Expect(back.Config).ToNot(BeNil())
Expect(back.Config.QuantizationType).To(Equal("q4_k_m"))
})
})
Describe("compile-time adapter contract", func() {
It("satisfies syncstate.Store for *distributed.QuantStore", func() {
// Guards against drift between the adapter and the component interface;
// the var assertion in syncstore.go covers it at build time, this keeps
// the type referenced from a spec too.
var _ *distributed.QuantStore
Expect(&quantStoreAdapter{}).ToNot(BeNil())
})
})
})

@@ -0,0 +1,114 @@
package quantization
import (
"context"
"encoding/json"
"time"
"github.com/mudler/LocalAI/core/schema"
"github.com/mudler/LocalAI/core/services/distributed"
"github.com/mudler/LocalAI/core/services/syncstate"
)
// quantStoreAdapter bridges the distributed PostgreSQL QuantStore to the generic
// syncstate.Store the SyncedMap consumes. It is only wired in distributed mode;
// standalone leaves Store nil and hydrates from disk via a Loader instead.
//
// The SyncedMap value type is *schema.QuantizationJob (the exact shape the REST
// API returns) so reads need no conversion and the response JSON is provably
// unchanged. The adapter is the single place that translates between that API
// shape and the DB QuantJobRecord.
type quantStoreAdapter struct {
store *distributed.QuantStore
}
// compile-time assertion that the adapter satisfies the component's Store.
var _ syncstate.Store[string, *schema.QuantizationJob] = (*quantStoreAdapter)(nil)
func (a *quantStoreAdapter) List(_ context.Context) ([]*schema.QuantizationJob, error) {
records, err := a.store.ListAll()
if err != nil {
return nil, err
}
jobs := make([]*schema.QuantizationJob, 0, len(records))
for i := range records {
jobs = append(jobs, recordToJob(&records[i]))
}
return jobs, nil
}
func (a *quantStoreAdapter) Upsert(_ context.Context, job *schema.QuantizationJob) error {
return a.store.Upsert(jobToRecord(job))
}
func (a *quantStoreAdapter) Delete(_ context.Context, id string) error {
return a.store.Delete(id)
}
// recordToJob maps a persisted DB record back to the API shape, reconstructing
// the structured Config / ExtraOptions from their JSON columns.
func recordToJob(r *distributed.QuantJobRecord) *schema.QuantizationJob {
job := &schema.QuantizationJob{
ID: r.ID,
UserID: r.UserID,
Model: r.Model,
Backend: r.Backend,
ModelID: r.ModelID,
QuantizationType: r.QuantizationType,
Status: r.Status,
Message: r.Message,
OutputDir: r.OutputDir,
OutputFile: r.OutputFile,
ImportStatus: r.ImportStatus,
ImportMessage: r.ImportMessage,
ImportModelName: r.ImportModelName,
CreatedAt: r.CreatedAt.UTC().Format(time.RFC3339),
}
if r.ExtraOptsJSON != "" {
// Best-effort: a malformed column must not drop the whole job from the API.
_ = json.Unmarshal([]byte(r.ExtraOptsJSON), &job.ExtraOptions)
}
if r.ConfigJSON != "" {
var cfg schema.QuantizationJobRequest
if err := json.Unmarshal([]byte(r.ConfigJSON), &cfg); err == nil {
job.Config = &cfg
}
}
return job
}
// jobToRecord maps the API shape to a DB record for write-through, serializing
// the structured Config / ExtraOptions into their JSON columns. CreatedAt is
// parsed back from the RFC3339 string the service stamps; an unparseable value is
// left zero so QuantStore.Upsert stamps "now".
func jobToRecord(job *schema.QuantizationJob) *distributed.QuantJobRecord {
rec := &distributed.QuantJobRecord{
ID: job.ID,
UserID: job.UserID,
Model: job.Model,
Backend: job.Backend,
ModelID: job.ModelID,
QuantizationType: job.QuantizationType,
Status: job.Status,
Message: job.Message,
OutputDir: job.OutputDir,
OutputFile: job.OutputFile,
ImportStatus: job.ImportStatus,
ImportMessage: job.ImportMessage,
ImportModelName: job.ImportModelName,
}
if job.Config != nil {
if data, err := json.Marshal(job.Config); err == nil {
rec.ConfigJSON = string(data)
}
}
if job.ExtraOptions != nil {
if data, err := json.Marshal(job.ExtraOptions); err == nil {
rec.ExtraOptsJSON = string(data)
}
}
if t, err := time.Parse(time.RFC3339, job.CreatedAt); err == nil {
rec.CreatedAt = t
}
return rec
}
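
The CreatedAt handling in the two mappers above can be looked at in isolation. The following is a minimal sketch, not code from this change: `formatCreatedAt` and `parseCreatedAt` are hypothetical helper names that mirror the `r.CreatedAt.UTC().Format(time.RFC3339)` and `time.Parse(time.RFC3339, job.CreatedAt)` calls. It shows that a UTC RFC3339 string round-trips unchanged, that a non-UTC offset is normalized to `Z`, and that an unparseable value yields the zero time (which, per the comment on jobToRecord, is left for the store to stamp).

```go
package main

import (
	"fmt"
	"time"
)

// formatCreatedAt mirrors the record-to-job direction: the DB timestamp is
// normalized to UTC and rendered as RFC3339 (second precision).
// Hypothetical helper name, used only for this illustration.
func formatCreatedAt(t time.Time) string {
	return t.UTC().Format(time.RFC3339)
}

// parseCreatedAt mirrors the job-to-record direction: an unparseable string
// yields the zero time, which the caller can treat as "not set".
// Hypothetical helper name, used only for this illustration.
func parseCreatedAt(s string) time.Time {
	t, err := time.Parse(time.RFC3339, s)
	if err != nil {
		return time.Time{}
	}
	return t
}

func main() {
	in := "2026-06-27T10:00:00Z"
	// A UTC RFC3339 string survives the round trip byte for byte.
	fmt.Println(formatCreatedAt(parseCreatedAt(in)) == in)
	// An offset timestamp comes back normalized to UTC.
	fmt.Println(formatCreatedAt(parseCreatedAt("2026-06-27T12:00:00+02:00")))
	// A malformed value is reported as the zero time rather than an error.
	fmt.Println(parseCreatedAt("not-a-timestamp").IsZero())
}
```

Because `time.RFC3339` carries no fractional seconds, any sub-second precision in the stored timestamp would not appear in the API string; the round-trip spec above uses whole-second values, so it is unaffected.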

@@ -0,0 +1,289 @@
// Package syncstate provides SyncedMap, a reusable cross-replica in-memory map.
//
// LocalAI in distributed mode runs multiple frontend replicas behind a
// round-robin load balancer. Several features keep process-local in-memory state
// that is surfaced to the HTTP/UI API; without cross-replica sync a poll that
// lands on a replica which did not originate a change sees stale or missing data.
// SyncedMap collapses the three legs each feature otherwise hand-wires - an
// in-memory map, a NATS broadcast/apply path, and optional durable read-through -
// into one well-tested component so cross-replica consistency is a configuration
// choice rather than a bespoke re-implementation.
package syncstate
import (
"context"
"sync"
"time"
"github.com/mudler/LocalAI/core/services/messaging"
"github.com/mudler/xlog"
)
// Op values carried on the wire and passed to OnApply.
const (
opSet = "set"
opDelete = "delete"
)
// Store is optional durable backing for a SyncedMap. In distributed mode it is a
// single shared DB, so the apply path (a delta received from a peer) updates
// memory only and never re-writes the Store.
type Store[K comparable, V any] interface {
List(ctx context.Context) ([]V, error)
Upsert(ctx context.Context, v V) error
Delete(ctx context.Context, k K) error
}
// Config configures a SyncedMap.
type Config[K comparable, V any] struct {
Name string // subject namespace, e.g. "finetune.jobs"
Key func(V) K // extract the key from a value
Nats messaging.MessagingClient // nil => standalone: in-memory only, no broadcast/subscribe
Store Store[K, V] // optional read-through persistence
Loader func(ctx context.Context) ([]V, error) // source when there is no Store (e.g. disk reload)
OnApply func(op string, k K, v V) // optional hook after an applied change (e.g. ShutdownModel)
Reconcile time.Duration // optional periodic re-hydrate; 0 = off
}
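// delta is the JSON wire envelope broadcast on every local mutation. Value is
// omitempty, so with a nil-able V (such as the pointer values used here) a
// delete carries only op+key; encoding/json never omits a struct-typed value.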
type delta[K comparable, V any] struct {
Op string `json:"op"`
Key K `json:"key"`
Value V `json:"value,omitempty"`
}
// SyncedMap is a cross-replica in-memory map. A local write (Set/Delete) updates
// memory, the optional durable Store, then broadcasts a delta to peers. A peer's
// delta updates memory only and fires OnApply - it never re-broadcasts and never
// writes the Store. That structural split is the echo-loop guard (same pattern as
// galleryop.mergeStatus / OpCache.applyStart): receiving your own broadcast just
// re-applies an idempotent value to memory, so there is no storm and no
// double-write.
type SyncedMap[K comparable, V any] struct {
cfg Config[K, V]
mu sync.RWMutex
data map[K]V
sub Subscription
// lifeCtx outlives Start's argument: a reconnect callback or reconcile tick
// can fire long after Start returns, so they must not be tied to a ctx the
// caller may cancel. Close cancels it.
lifeCtx context.Context
cancel context.CancelFunc
wg sync.WaitGroup
}
// Subscription is the subset of messaging.Subscription the component holds onto.
type Subscription = messaging.Subscription
// New constructs a SyncedMap. Call Start to hydrate and begin syncing.
func New[K comparable, V any](cfg Config[K, V]) *SyncedMap[K, V] {
return &SyncedMap[K, V]{cfg: cfg, data: make(map[K]V)}
}
func (m *SyncedMap[K, V]) subject() string {
return messaging.SubjectSyncStateDelta(m.cfg.Name)
}
// Start hydrates from the source, subscribes for peer deltas, registers a
// reconnect re-hydrate (when the client supports it), and starts the optional
// reconcile ticker.
func (m *SyncedMap[K, V]) Start(ctx context.Context) error {
if err := m.hydrate(ctx); err != nil {
return err
}
// The cancel func is stored on the struct and invoked in Close (covered by
// tests); lifeCtx must outlive Start to drive the reconnect/reconcile
// goroutines, so it cannot be cancelled or deferred within this scope.
m.lifeCtx, m.cancel = context.WithCancel(context.Background()) // #nosec G118 -- cancel is invoked in Close()
if m.cfg.Nats != nil {
sub, err := messaging.SubscribeJSON(m.cfg.Nats, m.subject(), m.apply)
if err != nil {
return err
}
m.sub = sub
// nats.go transparently resubscribes on reconnect, but it cannot know we
// kept derived in-memory state that may have drifted while the link was
// down, so re-hydrate from the durable source. Detected via an optional
// interface so MessagingClient itself stays minimal; standalone/test
// clients without the method simply fall back to the reconcile ticker.
if r, ok := m.cfg.Nats.(interface{ OnReconnect(func()) }); ok {
r.OnReconnect(func() {
if err := m.hydrate(m.lifeCtx); err != nil {
xlog.Warn("syncstate: reconnect re-hydrate failed", "name", m.cfg.Name, "error", err)
}
})
}
}
if m.cfg.Reconcile > 0 {
m.wg.Add(1)
go m.reconcileLoop()
}
return nil
}
// Close unsubscribes and stops the reconcile ticker.
func (m *SyncedMap[K, V]) Close() error {
if m.cancel != nil {
m.cancel()
}
m.wg.Wait()
if m.sub != nil {
return m.sub.Unsubscribe()
}
return nil
}
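// Set updates the value locally, writes through the Store, then broadcasts.
// Per the data-flow contract the Store write happens under the lock so memory and
// durable state move together; the broadcast is best-effort after unlocking.
// If the Store write fails, Set returns the error without broadcasting, but the
// in-memory value has already been replaced; a later hydrate (reconnect or
// reconcile) realigns memory with the Store.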
func (m *SyncedMap[K, V]) Set(ctx context.Context, v V) error {
k := m.cfg.Key(v)
m.mu.Lock()
m.data[k] = v
if m.cfg.Store != nil {
if err := m.cfg.Store.Upsert(ctx, v); err != nil {
m.mu.Unlock()
return err
}
}
m.mu.Unlock()
m.publish(opSet, k, v)
return nil
}
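// Delete removes the key locally, deletes it from the Store, then broadcasts.
// As with Set, a Store failure returns the error without broadcasting, after the
// key has already been removed from memory.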
func (m *SyncedMap[K, V]) Delete(ctx context.Context, k K) error {
m.mu.Lock()
delete(m.data, k)
if m.cfg.Store != nil {
if err := m.cfg.Store.Delete(ctx, k); err != nil {
m.mu.Unlock()
return err
}
}
m.mu.Unlock()
var zero V
m.publish(opDelete, k, zero)
return nil
}
// Get returns the value for k and whether it was present.
func (m *SyncedMap[K, V]) Get(k K) (V, bool) {
m.mu.RLock()
defer m.mu.RUnlock()
v, ok := m.data[k]
return v, ok
}
// List returns a snapshot slice of all values, in no particular order (map
// iteration order).
func (m *SyncedMap[K, V]) List() []V {
m.mu.RLock()
defer m.mu.RUnlock()
out := make([]V, 0, len(m.data))
for _, v := range m.data {
out = append(out, v)
}
return out
}
// Snapshot returns a shallow copy of the underlying map; pointer values are
// shared with the live map.
func (m *SyncedMap[K, V]) Snapshot() map[K]V {
m.mu.RLock()
defer m.mu.RUnlock()
out := make(map[K]V, len(m.data))
for k, v := range m.data {
out[k] = v
}
return out
}
// publish broadcasts a delta. Standalone (nil Nats) is a strict no-op.
func (m *SyncedMap[K, V]) publish(op string, k K, v V) {
if m.cfg.Nats == nil {
return
}
if err := m.cfg.Nats.Publish(m.subject(), delta[K, V]{Op: op, Key: k, Value: v}); err != nil {
xlog.Warn("syncstate: failed to broadcast delta", "name", m.cfg.Name, "op", op, "error", err)
}
}
// apply handles a peer's delta: memory-only update plus OnApply. It deliberately
// never writes the Store nor re-publishes - that is the echo-loop guard.
func (m *SyncedMap[K, V]) apply(d delta[K, V]) {
switch d.Op {
case opSet:
m.mu.Lock()
m.data[d.Key] = d.Value
m.mu.Unlock()
case opDelete:
m.mu.Lock()
delete(m.data, d.Key)
m.mu.Unlock()
default:
xlog.Warn("syncstate: ignoring delta with unknown op", "name", m.cfg.Name, "op", d.Op)
return
}
if m.cfg.OnApply != nil {
m.cfg.OnApply(d.Op, d.Key, d.Value)
}
}
// hydrate replaces the whole map from the durable source: Store if present, else
// Loader. With neither, a late joiner starts empty and catches up via deltas
// (acceptable only for ephemeral state).
func (m *SyncedMap[K, V]) hydrate(ctx context.Context) error {
var (
vals []V
err error
)
switch {
case m.cfg.Store != nil:
vals, err = m.cfg.Store.List(ctx)
case m.cfg.Loader != nil:
vals, err = m.cfg.Loader(ctx)
default:
return nil
}
if err != nil {
return err
}
m.replaceAll(vals)
return nil
}
// replaceAll atomically swaps the map contents for the given values, keyed via
// cfg.Key.
func (m *SyncedMap[K, V]) replaceAll(vals []V) {
next := make(map[K]V, len(vals))
for _, v := range vals {
next[m.cfg.Key(v)] = v
}
m.mu.Lock()
m.data = next
m.mu.Unlock()
}
// reconcileLoop periodically re-hydrates to repair silent drift (missed deltas).
func (m *SyncedMap[K, V]) reconcileLoop() {
defer m.wg.Done()
t := time.NewTicker(m.cfg.Reconcile)
defer t.Stop()
for {
select {
case <-m.lifeCtx.Done():
return
case <-t.C:
if err := m.hydrate(m.lifeCtx); err != nil {
xlog.Warn("syncstate: reconcile re-hydrate failed", "name", m.cfg.Name, "error", err)
}
}
}
}
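
The split between the local-write path and the apply path is what keeps replicas from echoing deltas back and forth. The following is a stripped-down sketch of that idea only, under stated assumptions: `bus` and `replica` are hypothetical stand-ins for the messaging client and SyncedMap, the store is reduced to a write counter, and locking is omitted because everything runs on one goroutine.

```go
package main

import "fmt"

// bus is a toy synchronous broadcaster standing in for the messaging client
// (hypothetical type, for illustration only).
type bus struct {
	published int
	handlers  []func(op, key, value string)
}

// publish counts the message and delivers it to every subscriber, including
// the replica that sent it.
func (b *bus) publish(op, key, value string) {
	b.published++
	for _, h := range b.handlers {
		h(op, key, value)
	}
}

// replica is a stripped-down stand-in for SyncedMap (hypothetical type).
type replica struct {
	data        map[string]string
	storeWrites int // counts durable writes instead of using a real store
	bus         *bus
}

func newReplica(b *bus) *replica {
	r := &replica{data: map[string]string{}, bus: b}
	b.handlers = append(b.handlers, r.apply)
	return r
}

// set is the local-write path: memory, durable write, then broadcast.
func (r *replica) set(key, value string) {
	r.data[key] = value
	r.storeWrites++
	r.bus.publish("set", key, value)
}

// apply is the peer-delta path: memory only. It performs no store write and
// no publish, which is the whole echo-loop guard.
func (r *replica) apply(op, key, value string) {
	switch op {
	case "set":
		r.data[key] = value
	case "delete":
		delete(r.data, key)
	}
}

func main() {
	b := &bus{}
	a, peer := newReplica(b), newReplica(b)
	a.set("job-1", "running")
	// One local write: the peer sees the value, exactly one message was
	// published, and only the originating replica wrote durably.
	fmt.Println(peer.data["job-1"], b.published, a.storeWrites, peer.storeWrites)
	// prints: running 1 1 0
}
```

Replica `a` also receives its own broadcast here; because apply only re-assigns the same value in memory, the echo is harmless, which is the property the SyncedMap comment describes.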

@@ -0,0 +1,13 @@
package syncstate_test
import (
"testing"
. "github.com/onsi/ginkgo/v2"
. "github.com/onsi/gomega"
)
func TestSyncstate(t *testing.T) {
RegisterFailHandler(Fail)
RunSpecs(t, "Syncstate Suite")
}

@@ -0,0 +1,291 @@
package syncstate_test
import (
"context"
"sync"
. "github.com/onsi/ginkgo/v2"
. "github.com/onsi/gomega"
"github.com/mudler/LocalAI/core/services/messaging"
"github.com/mudler/LocalAI/core/services/syncstate"
"github.com/mudler/LocalAI/core/services/testutil"
)
// job is a minimal JSON-serializable value stand-in for the real cross-replica
// records (finetune/quant/agent jobs) the component is built for.
type job struct {
ID string `json:"id"`
Status string `json:"status"`
}
func jobKey(j *job) string { return j.ID }
const stateName = "test.jobs"
func deltaSubject() string { return messaging.SubjectSyncStateDelta(stateName) }
// fakeStore is an in-memory Store that records call counts so specs can assert
// the write-through-vs-apply split (local writes hit the Store; applied deltas
// must not).
type fakeStore struct {
mu sync.Mutex
data map[string]*job
upsertCalls int
deleteCalls int
listCalls int
}
func newFakeStore(seed ...*job) *fakeStore {
s := &fakeStore{data: map[string]*job{}}
for _, j := range seed {
s.data[j.ID] = j
}
return s
}
func (s *fakeStore) List(_ context.Context) ([]*job, error) {
s.mu.Lock()
defer s.mu.Unlock()
s.listCalls++
out := make([]*job, 0, len(s.data))
for _, j := range s.data {
out = append(out, j)
}
return out, nil
}
func (s *fakeStore) Upsert(_ context.Context, j *job) error {
s.mu.Lock()
defer s.mu.Unlock()
s.upsertCalls++
s.data[j.ID] = j
return nil
}
func (s *fakeStore) Delete(_ context.Context, k string) error {
s.mu.Lock()
defer s.mu.Unlock()
s.deleteCalls++
delete(s.data, k)
return nil
}
// add simulates a peer replica writing to the shared DB out-of-band (e.g. while
// this replica was partitioned), so a re-hydrate can be observed to pick it up.
func (s *fakeStore) add(j *job) {
s.mu.Lock()
defer s.mu.Unlock()
s.data[j.ID] = j
}
func (s *fakeStore) counts() (upsert, del, list int) {
s.mu.Lock()
defer s.mu.Unlock()
return s.upsertCalls, s.deleteCalls, s.listCalls
}
var _ = Describe("SyncedMap", func() {
ctx := context.Background()
Describe("cross-replica delta propagation", func() {
var (
bus *testutil.FakeBus
a, b *syncstate.SyncedMap[string, *job]
)
BeforeEach(func() {
bus = testutil.NewFakeBus()
a = syncstate.New(syncstate.Config[string, *job]{Name: stateName, Key: jobKey, Nats: bus})
b = syncstate.New(syncstate.Config[string, *job]{Name: stateName, Key: jobKey, Nats: bus})
Expect(a.Start(ctx)).To(Succeed())
Expect(b.Start(ctx)).To(Succeed())
})
AfterEach(func() {
Expect(a.Close()).To(Succeed())
Expect(b.Close()).To(Succeed())
})
It("propagates a Set on A to B", func() {
Expect(a.Set(ctx, &job{ID: "1", Status: "running"})).To(Succeed())
got, ok := b.Get("1")
Expect(ok).To(BeTrue(), "replica B should see the value A just set")
Expect(got.Status).To(Equal("running"))
})
It("prunes a Delete on A from B", func() {
Expect(a.Set(ctx, &job{ID: "1", Status: "running"})).To(Succeed())
_, present := b.Get("1")
Expect(present).To(BeTrue(), "precondition: B must have the value before the delete")
Expect(a.Delete(ctx, "1")).To(Succeed())
_, ok := b.Get("1")
Expect(ok).To(BeFalse(), "a delete on A must remove the key from B")
})
})
Describe("hydration", func() {
It("hydrates on Start from a preloaded Store", func() {
store := newFakeStore(&job{ID: "x", Status: "done"})
m := syncstate.New(syncstate.Config[string, *job]{Name: stateName, Key: jobKey, Store: store})
Expect(m.Start(ctx)).To(Succeed())
got, ok := m.Get("x")
Expect(ok).To(BeTrue(), "Start must populate the map from the Store")
Expect(got.Status).To(Equal("done"))
})
It("uses the Loader when Store is nil", func() {
m := syncstate.New(syncstate.Config[string, *job]{
Name: stateName,
Key: jobKey,
Loader: func(_ context.Context) ([]*job, error) {
return []*job{{ID: "l", Status: "loaded"}}, nil
},
})
Expect(m.Start(ctx)).To(Succeed())
got, ok := m.Get("l")
Expect(ok).To(BeTrue(), "Loader output must hydrate the map when there is no Store")
Expect(got.Status).To(Equal("loaded"))
})
})
Describe("echo-loop guard", func() {
It("applies its own broadcast once and does not re-publish", func() {
bus := testutil.NewFakeBus()
a := syncstate.New(syncstate.Config[string, *job]{Name: stateName, Key: jobKey, Nats: bus})
b := syncstate.New(syncstate.Config[string, *job]{Name: stateName, Key: jobKey, Nats: bus})
Expect(a.Start(ctx)).To(Succeed())
Expect(b.Start(ctx)).To(Succeed())
defer func() {
Expect(a.Close()).To(Succeed())
Expect(b.Close()).To(Succeed())
}()
Expect(a.Set(ctx, &job{ID: "e", Status: "running"})).To(Succeed())
// One local write must produce exactly one broadcast: A and B both
// receive it and apply to memory, but the apply path never re-publishes.
Expect(bus.PublishCount(deltaSubject())).To(Equal(1),
"the apply path must not re-broadcast, otherwise replicas storm")
Expect(a.List()).To(HaveLen(1), "A must not double-store its own echo")
_, ok := b.Get("e")
Expect(ok).To(BeTrue())
})
})
Describe("Store write-through vs apply", func() {
It("writes the Store on local Set/Delete but not on an applied delta", func() {
bus := testutil.NewFakeBus()
storeA := newFakeStore()
storeB := newFakeStore()
a := syncstate.New(syncstate.Config[string, *job]{Name: stateName, Key: jobKey, Nats: bus, Store: storeA})
b := syncstate.New(syncstate.Config[string, *job]{Name: stateName, Key: jobKey, Nats: bus, Store: storeB})
Expect(a.Start(ctx)).To(Succeed())
Expect(b.Start(ctx)).To(Succeed())
defer func() {
Expect(a.Close()).To(Succeed())
Expect(b.Close()).To(Succeed())
}()
Expect(a.Set(ctx, &job{ID: "w", Status: "running"})).To(Succeed())
upA, _, _ := storeA.counts()
upB, _, _ := storeB.counts()
Expect(upA).To(Equal(1), "local Set must write through to its own Store")
Expect(upB).To(Equal(0), "the apply path must never write the peer's Store")
Expect(a.Delete(ctx, "w")).To(Succeed())
_, delA, _ := storeA.counts()
_, delB, _ := storeB.counts()
Expect(delA).To(Equal(1), "local Delete must delete from its own Store")
Expect(delB).To(Equal(0), "the apply path must never delete from the peer's Store")
})
})
Describe("OnApply hook", func() {
It("fires with the correct op and key on an applied delta", func() {
bus := testutil.NewFakeBus()
var (
mu sync.Mutex
ops []string
keys []string
)
a := syncstate.New(syncstate.Config[string, *job]{Name: stateName, Key: jobKey, Nats: bus})
b := syncstate.New(syncstate.Config[string, *job]{
Name: stateName, Key: jobKey, Nats: bus,
OnApply: func(op string, k string, _ *job) {
mu.Lock()
ops = append(ops, op)
keys = append(keys, k)
mu.Unlock()
},
})
Expect(a.Start(ctx)).To(Succeed())
Expect(b.Start(ctx)).To(Succeed())
defer func() {
Expect(a.Close()).To(Succeed())
Expect(b.Close()).To(Succeed())
}()
Expect(a.Set(ctx, &job{ID: "o", Status: "running"})).To(Succeed())
Expect(a.Delete(ctx, "o")).To(Succeed())
mu.Lock()
defer mu.Unlock()
Expect(ops).To(Equal([]string{"set", "delete"}))
Expect(keys).To(Equal([]string{"o", "o"}))
})
})
Describe("standalone (nil Nats)", func() {
It("works in-memory with no panic and nothing to broadcast", func() {
m := syncstate.New(syncstate.Config[string, *job]{Name: stateName, Key: jobKey})
Expect(m.Start(ctx)).To(Succeed())
defer func() { Expect(m.Close()).To(Succeed()) }()
Expect(func() {
Expect(m.Set(ctx, &job{ID: "s", Status: "running"})).To(Succeed())
}).ToNot(Panic())
got, ok := m.Get("s")
Expect(ok).To(BeTrue())
Expect(got.Status).To(Equal("running"))
Expect(m.List()).To(HaveLen(1))
Expect(m.Snapshot()).To(HaveKey("s"))
Expect(m.Delete(ctx, "s")).To(Succeed())
_, ok = m.Get("s")
Expect(ok).To(BeFalse())
})
})
Describe("reconnect re-hydrate", func() {
It("re-reads the source when the messaging client reconnects", func() {
bus := testutil.NewFakeBus()
store := newFakeStore(&job{ID: "init", Status: "running"})
m := syncstate.New(syncstate.Config[string, *job]{Name: stateName, Key: jobKey, Nats: bus, Store: store})
Expect(m.Start(ctx)).To(Succeed())
defer func() { Expect(m.Close()).To(Succeed()) }()
_, ok := m.Get("init")
Expect(ok).To(BeTrue())
// A peer writes to the shared DB while we are unaware (no delta seen).
store.add(&job{ID: "late", Status: "running"})
_, ok = m.Get("late")
Expect(ok).To(BeFalse(), "the new row should not appear before a re-hydrate")
bus.TriggerReconnect()
_, ok = m.Get("late")
Expect(ok).To(BeTrue(), "reconnect must re-hydrate from the source and pick up drift")
_, _, list := store.counts()
Expect(list).To(Equal(2), "exactly one Start hydrate plus one reconnect re-hydrate")
})
})
})

@@ -0,0 +1,160 @@
package testutil
import (
"encoding/json"
"strings"
"sync"
"time"
"github.com/mudler/LocalAI/core/services/messaging"
)
// FakeBus is an in-memory messaging.MessagingClient that delivers each published
// message synchronously to every registered subscriber whose subject filter
// matches, including NATS-style wildcard subjects (`*` matches exactly one
// token).
//
// Synchronous delivery keeps specs deterministic: the moment Publish returns,
// every matching subscriber's handler has already run, so the spec body can read
// the resulting state without polling. It is the shared test double for every
// cross-replica-sync adopter (gallery, syncstate, ...) so they exercise the same
// delivery semantics. It deliberately depends only on the standard library and
// the messaging package — no test framework — so it is importable anywhere.
type FakeBus struct {
mu sync.Mutex
subs []fakeBusSub
// publishCounts records how many messages were published per subject, so a
// spec can assert the echo-loop guard (an applied delta must not re-publish).
publishCounts map[string]int
// reconnectCbs back the optional OnReconnect/TriggerReconnect pair, letting a
// spec exercise the component's reconnect re-hydrate path without a real
// NATS server.
reconnectCbs []func()
}
type fakeBusSub struct {
subject string
handler func([]byte)
}
// NewFakeBus returns a ready-to-use in-memory bus.
func NewFakeBus() *FakeBus {
return &FakeBus{publishCounts: map[string]int{}}
}
// subjectMatches reports whether a subscription filter matches a concrete
// subject, honoring the single-token `*` wildcard used by NATS.
func subjectMatches(filter, subject string) bool {
if filter == subject {
return true
}
fp := strings.Split(filter, ".")
sp := strings.Split(subject, ".")
if len(fp) != len(sp) {
return false
}
for i := range fp {
if fp[i] == "*" {
continue
}
if fp[i] != sp[i] {
return false
}
}
return true
}
// Publish marshals data as JSON and delivers it synchronously to every matching
// subscriber.
func (b *FakeBus) Publish(subject string, data any) error {
payload, err := json.Marshal(data)
if err != nil {
return err
}
b.mu.Lock()
b.publishCounts[subject]++
subs := append([]fakeBusSub(nil), b.subs...)
b.mu.Unlock()
for _, s := range subs {
if subjectMatches(s.subject, subject) {
s.handler(payload)
}
}
return nil
}
// PublishCount returns how many messages were published on the exact subject.
func (b *FakeBus) PublishCount(subject string) int {
b.mu.Lock()
defer b.mu.Unlock()
return b.publishCounts[subject]
}
type fakeBusSubscription struct {
bus *FakeBus
subRef fakeBusSub
}
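// Unsubscribe removes the first registered subscriber whose subject equals this
// subscription's subject. Matching is by subject only, so when several
// subscribers share a subject the entry removed may belong to another
// subscriber; that is sufficient for specs that close every subscriber.
func (s *fakeBusSubscription) Unsubscribe() error {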
s.bus.mu.Lock()
defer s.bus.mu.Unlock()
for i, candidate := range s.bus.subs {
if candidate.subject == s.subRef.subject {
s.bus.subs = append(s.bus.subs[:i], s.bus.subs[i+1:]...)
return nil
}
}
return nil
}
func (b *FakeBus) Subscribe(subject string, handler func([]byte)) (messaging.Subscription, error) {
sub := fakeBusSub{subject: subject, handler: handler}
b.mu.Lock()
b.subs = append(b.subs, sub)
b.mu.Unlock()
return &fakeBusSubscription{bus: b, subRef: sub}, nil
}
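// QueueSubscribe ignores the queue group and behaves like Subscribe, so every
// subscriber receives every message rather than one member per group.
func (b *FakeBus) QueueSubscribe(subject, _ string, handler func([]byte)) (messaging.Subscription, error) {
return b.Subscribe(subject, handler)
}
// The request/reply methods below are inert stubs that only satisfy
// messaging.MessagingClient; they never register a handler or deliver a message.
func (b *FakeBus) QueueSubscribeReply(string, string, func([]byte, func([]byte))) (messaging.Subscription, error) {
return &fakeBusSubscription{bus: b}, nil
}
func (b *FakeBus) SubscribeReply(string, func([]byte, func([]byte))) (messaging.Subscription, error) {
return &fakeBusSubscription{bus: b}, nil
}
func (b *FakeBus) Request(string, []byte, time.Duration) ([]byte, error) {
return nil, nil
}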
func (b *FakeBus) IsConnected() bool { return true }
func (b *FakeBus) Close() {}
// OnReconnect mirrors *messaging.Client.OnReconnect so a spec can drive the
// component's reconnect re-hydrate path. The component detects this method via an
// optional interface assertion; implementing it here keeps the fake a faithful
// stand-in for the concrete client.
func (b *FakeBus) OnReconnect(cb func()) {
if cb == nil {
return
}
b.mu.Lock()
b.reconnectCbs = append(b.reconnectCbs, cb)
b.mu.Unlock()
}
// TriggerReconnect runs every registered reconnect callback, simulating a NATS
// reconnect event.
func (b *FakeBus) TriggerReconnect() {
b.mu.Lock()
cbs := append([]func(){}, b.reconnectCbs...)
b.mu.Unlock()
for _, cb := range cbs {
cb()
}
}
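
The wildcard rule FakeBus implements is easiest to see with a few concrete subjects. The sketch below copies `subjectMatches` from the file above unchanged and adds a small `main`; the sample subjects are illustrative. Note that the fake only honors the single-token `*`; the multi-token `>` wildcard that NATS also defines is treated as a literal token here.

```go
package main

import (
	"fmt"
	"strings"
)

// subjectMatches is copied from FakeBus: a filter matches a subject when both
// have the same number of dot-separated tokens and every filter token is
// either `*` or equal to the subject token at that position.
func subjectMatches(filter, subject string) bool {
	if filter == subject {
		return true
	}
	fp := strings.Split(filter, ".")
	sp := strings.Split(subject, ".")
	if len(fp) != len(sp) {
		return false
	}
	for i := range fp {
		if fp[i] == "*" {
			continue
		}
		if fp[i] != sp[i] {
			return false
		}
	}
	return true
}

func main() {
	// `*` stands for exactly one token.
	fmt.Println(subjectMatches("jobs.*.cancel", "jobs.42.cancel")) // true
	// Literal tokens must still match.
	fmt.Println(subjectMatches("jobs.*.cancel", "jobs.42.progress")) // false
	// `*` does not span several tokens: token counts differ.
	fmt.Println(subjectMatches("jobs.*", "jobs.42.cancel")) // false
	// `>` is not implemented by this fake, so it is compared literally.
	fmt.Println(subjectMatches("jobs.>", "jobs.42.cancel")) // false
}
```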

@@ -22,13 +22,16 @@ Download the latest DMG from GitHub releases:
3. Drag the LocalAI application to your Applications folder
4. Launch LocalAI from your Applications folder
-## Known Issues
+## Verification
-> **Note**: The DMGs are not signed by Apple and may show as quarantined.
->
-> **Workaround**: See [this issue](https://github.com/mudler/LocalAI/issues/6268) for details on how to bypass the quarantine.
->
-> **Fix tracking**: The signing issue is being tracked in [this issue](https://github.com/mudler/LocalAI/issues/6244).
+The `LocalAI.dmg` (and the app inside it) and the `local-ai` server binary are
+signed with an Apple Developer ID and notarized by Apple, so they launch with no
+quarantine prompt or workaround. To inspect the signature yourself:
+```bash
+spctl --assess --type open --context context:primary-signature -v /Applications/LocalAI.app
+codesign --verify --deep --strict --verbose=2 /Applications/LocalAI.app
+```
## Next Steps

@@ -1,3 +1,3 @@
{
-"version": "v4.5.0"
+"version": "v4.5.5"
}

@@ -1,4 +1,106 @@
---
- name: "qwen-agentworld-35b-a3b"
url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
urls:
- https://huggingface.co/unsloth/Qwen-AgentWorld-35B-A3B-GGUF
description: |
# Qwen-AgentWorld-35B-A3B
📑 Technical Report |
📖 Blog |
🤗 Hugging Face |
🤖 ModelScope |
💻 GitHub |
🖥️ Demo
> [!Note]
> This repository contains the model weights and configuration files for **Qwen-AgentWorld-35B-A3B**, a native language world model trained for agentic environment simulation.
>
> These artifacts are compatible with Hugging Face Transformers, vLLM, SGLang, etc.
**Qwen-AgentWorld** is the first language world model to cover seven agent interaction domains within a single model. It simulates agentic environments via long chain-of-thought reasoning, predicting the next environment state given an agent's action and interaction history. Trained through a three-stage pipeline — CPT injects environment knowledge, SFT activates next-state-prediction reasoning, RL sharpens simulation fidelity — Qwen-AgentWorld is a **native world model**: environment modeling is the training objective from the CPT stage onward, not a post-hoc add-on.
## Highlights
...
license: "apache-2.0"
tags:
- llm
- gguf
- qwen
icon: https://qianwen-res.oss-accelerate-overseas.aliyuncs.com/Qwen-AgentWorld/logo.png
overrides:
backend: llama-cpp
function:
automatic_tool_parsing_fallback: true
grammar:
disable: true
known_usecases:
- chat
options:
- use_jinja:true
parameters:
model: llama-cpp/models/Qwen-AgentWorld-35B-A3B-GGUF/Qwen-AgentWorld-35B-A3B-UD-Q4_K_M.gguf
template:
use_tokenizer_template: true
files:
- filename: llama-cpp/models/Qwen-AgentWorld-35B-A3B-GGUF/Qwen-AgentWorld-35B-A3B-UD-Q4_K_M.gguf
sha256: e7a8eafdd8013443b6bcc4b6fb47b2d2025f772d359650b9ceb7d75971e22cad
uri: https://huggingface.co/unsloth/Qwen-AgentWorld-35B-A3B-GGUF/resolve/main/Qwen-AgentWorld-35B-A3B-UD-Q4_K_M.gguf
- name: "ornith-1.0-9b"
url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
urls:
- https://huggingface.co/deepreinforce-ai/Ornith-1.0-9B-GGUF
description: |
[](https://deep-reinforce.com/ornith.html)
# Ornith-1.0-9B-GGUF
Aloha! 🌺 Today, we are releasing Ornith-1.0, a self-improving family of open-source models for agentic coding.
Highlights:
- **State-of-the-Art Coding Agents**: Available in 9B-Dense, 31B-Dense, 35B-MoE, and 397B-MoE (post-trained on top of Gemma 4 and Qwen 3.5), achieving state-of-the-art performance among open-source models of comparable size on coding benchmarks such as Terminal-Bench 2.1, SWE-Bench, NL2Repo and OpenClaw.
- **Self-Improving Training Framework**: Ornith-1.0 employs RL to learn to generate not only solution rollouts, but also the scaffold that drives those rollouts. By jointly optimizing the scaffold and the resulting solution, the model discovers better search trajectories and generates higher-quality solutions.
- **Licence**: MIT licensed, globally accessible, and free from regional limitations.
## Ornith 1.0 9B
This model card documents **Ornith-1.0-9B**, the most lightweight member of the Ornith family, designed for efficient single-GPU deployment.
### Benchmarks
Ornith-1.0-9B
Qwen3.5-9B
Qwen3.5-35B
Gemma4-12B
Gemma4-31B
Agentic Coding
...
license: "mit"
tags:
- llm
- gguf
overrides:
backend: llama-cpp
function:
automatic_tool_parsing_fallback: true
grammar:
disable: true
known_usecases:
- chat
options:
- use_jinja:true
parameters:
model: llama-cpp/models/Ornith-1.0-9B-GGUF/ornith-1.0-9b-Q4_K_M.gguf
template:
use_tokenizer_template: true
files:
- filename: llama-cpp/models/Ornith-1.0-9B-GGUF/ornith-1.0-9b-Q4_K_M.gguf
sha256: 5720d1f671b4996481274fffe01868c3c36e87c135cc8538471cc7bd6087b106
uri: https://huggingface.co/deepreinforce-ai/Ornith-1.0-9B-GGUF/resolve/main/ornith-1.0-9b-Q4_K_M.gguf
- name: "ornith-1.0-35b"
url: "github:mudler/LocalAI/gallery/virtual.yaml@master"
urls:
@@ -477,6 +579,10 @@
icon: https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3.6/Figures/qwen3.6_35b_a3b_score.png
overrides:
backend: llama-cpp
# NVFP4 GGUFs use a quant type the GGUF metadata parser cannot read, so
# context size cannot be auto-derived; set it explicitly. The model is trained
# for up to 262144 tokens; 32768 is a safe default that operators can raise.
context_size: 32768
function:
automatic_tool_parsing_fallback: true
grammar:
@@ -509,6 +615,9 @@
- gguf
overrides:
backend: llama-cpp
# NVFP4 GGUFs use a quant type the GGUF metadata parser cannot read, so
# context size cannot be auto-derived; set it explicitly.
context_size: 32768
function:
automatic_tool_parsing_fallback: true
grammar:
@@ -536,6 +645,9 @@
icon: https://cdn-uploads.huggingface.co/production/uploads/66309bd090589b7c65950665/sGQKmrMc6L6guMoaB5_Y2.png
overrides:
backend: llama-cpp
# NVFP4 GGUFs use a quant type the GGUF metadata parser cannot read, so
# context size cannot be auto-derived; set it explicitly.
context_size: 32768
function:
automatic_tool_parsing_fallback: true
grammar:
@@ -586,6 +698,10 @@
icon: https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3.6/Figures/qwen3.6_27b_score.png
overrides:
backend: llama-cpp
# NVFP4 GGUFs use a quant type the GGUF metadata parser cannot read, so
# context size cannot be auto-derived; set it explicitly. The model is
# trained with a 262144-token context; 32768 is a safe default that
# operators can raise.
context_size: 32768
function:
automatic_tool_parsing_fallback: true
grammar:


@@ -19,6 +19,7 @@ func WorkerPermissions(nodeID, nodeType string) (pubAllow, subAllow []string) {
// Keep this list in sync with the subscriptions in core/cli/agent_worker.go.
subAllow = []string{
"agent.execute",
"agent.*.cancel",
"jobs.*.cancel",
"jobs.*.progress",
"jobs.*.result",
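Reviewer note: the allow-list entries in this hunk rely on subject wildcards. The sketch below illustrates how such patterns select subjects, assuming standard NATS token semantics (`*` matches exactly one dot-separated token, `>` matches one or more trailing tokens). `subject_matches` is a hypothetical helper written for illustration; it is not part of LocalAI or of any NATS client, and the job IDs are made up.

```bash
# Hypothetical helper (not LocalAI or NATS code): returns 0 when SUBJECT
# matches PATTERN under NATS-style token rules.
subject_matches() {
    local pattern="$1" subject="$2" p_tok s_tok
    while [ -n "$pattern" ]; do
        p_tok="${pattern%%.*}"            # first token of the pattern
        if [ "$p_tok" = ">" ]; then
            # ">" needs at least one remaining subject token.
            [ -n "$subject" ]
            return
        fi
        [ -n "$subject" ] || return 1     # subject ran out of tokens
        s_tok="${subject%%.*}"            # first token of the subject
        if [ "$p_tok" != "*" ] && [ "$p_tok" != "$s_tok" ]; then
            return 1
        fi
        # Drop the token just compared from both strings.
        case "$pattern" in *.*) pattern="${pattern#*.}" ;; *) pattern="" ;; esac
        case "$subject" in *.*) subject="${subject#*.}" ;; *) subject="" ;; esac
    done
    [ -z "$subject" ]                     # no leftover subject tokens allowed
}

# Illustrative subjects only.
for s in jobs.42.cancel jobs.42.progress jobs.42.extra.cancel agent.execute; do
    if subject_matches 'jobs.*.cancel' "$s"; then
        echo "match:    $s"
    else
        echo "no match: $s"
    fi
done
```

Because `*` covers a single token, an entry such as `jobs.*.cancel` admits a per-job cancel subject but not a deeper one, which is one reason the list has to name each subscription shape explicitly.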


@@ -17,9 +17,15 @@ rm -rf "${BACKEND_DIR}"/build-*
# run.sh's final `exec $CURDIR/<binary>` is the contract for what gets launched;
# the binary is not always named after the backend (e.g. parakeet-cpp launches
# parakeet-cpp-grpc), so derive it from run.sh and fall back to ${BACKEND}.
#
# Only scan the `exec` line(s): many run.sh select a runtime CPU variant via
# unquoted `LIBRARY=$CURDIR/libgo<x>-avx512.so` lines, and a whole-file grep
# would pick the last of those (avx512, which Darwin never builds) instead of
# the binary — failing the check below for whisper/sam3-cpp/vibevoice-cpp/...
# Also tolerate the exec being quoted (`exec "$CURDIR"/<binary>`).
RUN_BINARY=""
if [ -f "${BACKEND_DIR}/run.sh" ]; then
RUN_BINARY=$(grep -oE '\$CURDIR/[A-Za-z0-9._-]+' "${BACKEND_DIR}/run.sh" | grep -v 'ld\.so' | tail -1 | sed 's|\$CURDIR/||')
RUN_BINARY=$(grep -E '^[[:space:]]*exec[[:space:]]' "${BACKEND_DIR}/run.sh" | grep -oE '"?\$CURDIR"?/[A-Za-z0-9._-]+' | grep -v 'ld\.so' | tail -1 | sed -E 's|"?\$CURDIR"?/||')
fi
RUN_BINARY="${RUN_BINARY:-${BACKEND}}"
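Reviewer note: the difference between the whole-file grep and the exec-only grep can be reproduced in isolation. The sketch below wraps both pipelines from this hunk in hypothetical helper functions and runs them against a made-up `run.sh`; the library names and the `whisper` binary name are illustrative assumptions, not copied from a real backend.

```bash
# Previous behaviour: scan the whole file for $CURDIR/<name>.
derive_run_binary_whole_file() {
    grep -oE '\$CURDIR/[A-Za-z0-9._-]+' "$1" \
        | grep -v 'ld\.so' | tail -1 | sed 's|\$CURDIR/||'
}

# New behaviour: only look at exec lines, tolerating a quoted "$CURDIR".
derive_run_binary() {
    grep -E '^[[:space:]]*exec[[:space:]]' "$1" \
        | grep -oE '"?\$CURDIR"?/[A-Za-z0-9._-]+' \
        | grep -v 'ld\.so' | tail -1 | sed -E 's|"?\$CURDIR"?/||'
}

# Writes a made-up run.sh with unquoted LIBRARY lines and a quoted exec.
write_sample_run_sh() {
    cat > "$1" <<'EOF'
#!/bin/bash
CURDIR=$(dirname "$(realpath "$0")")
LIBRARY=$CURDIR/libgowhisper-fallback.so
LIBRARY=$CURDIR/libgowhisper-avx512.so
exec "$CURDIR"/whisper "$@"
EOF
}

sample=$(mktemp)
write_sample_run_sh "$sample"
derive_run_binary_whole_file "$sample"   # prints: libgowhisper-avx512.so
derive_run_binary "$sample"              # prints: whisper
rm -f "$sample"
```

On this sample the whole-file scan never sees the quoted exec target and falls through to the last `LIBRARY=` line, which is the failure mode the new comment describes.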


@@ -141,6 +141,38 @@ copy_elf_deps() {
done < <(ldd "$elf" 2>/dev/null | awk '/=>/ && $3 ~ /^\// {print $3}')
}
# Sweep the transitive shared-library dependencies of everything already
# bundled in a lib dir. The per-vendor packagers below copy an explicit
# allowlist of top-level runtime libs, but those libs pull in transitive deps
# that aren't in the list (e.g. ROCm's librocprofiler-register.so.0, libnuma,
# libdrm_amdgpu). Because backends run through the bundled lib/ld.so with
# LD_LIBRARY_PATH=lib (see run.sh), an unbundled transitive dep is a hard load
# failure (issue #10537: "librocprofiler-register.so.0: cannot open shared
# object file"). ldd resolves the full recursive closure, so a single pass over
# the already-bundled libs is enough; core libc-family deps are skipped via
# copy_elf_deps/is_core_lib so we never shadow the loader's own libc/libstdc++.
sweep_transitive_deps() {
local dir="${1:-$TARGET_LIB_DIR}"
command -v ldd >/dev/null 2>&1 || return 0
# Snapshot the current set first: copy_elf_deps adds files as it runs, and
# ldd already returns the full recursive closure, so we only need to sweep
# the libs that were present before the sweep started.
# `local x=$(...)` keeps set -e from tripping on shopt -p's nonzero exit.
local old_nullglob=$(shopt -p nullglob)
shopt -s nullglob
local libs=("$dir"/*.so*)
eval "$old_nullglob"
local lib
for lib in "${libs[@]}"; do
[ -e "$lib" ] || continue
# Skip symlinks: their real target is in the snapshot and gets swept.
[ -L "$lib" ] && continue
copy_elf_deps "$lib"
done
}
# Package NVIDIA CUDA libraries
package_cuda_libs() {
echo "Packaging CUDA libraries for BUILD_TYPE=${BUILD_TYPE}..."
@@ -185,6 +217,10 @@ package_cuda_libs() {
# cp -arfL /usr/local/cuda/targets "$TARGET_LIB_DIR/../cuda/" 2>/dev/null || true
# fi
# Pull in transitive deps the allowlist misses so the backend is
# self-contained (same class of failure as #10537).
sweep_transitive_deps "$TARGET_LIB_DIR"
echo "CUDA libraries packaged successfully"
}
@@ -261,6 +297,10 @@ package_rocm_libs() {
fi
done
# Pull in transitive deps the allowlist misses (librocprofiler-register.so.0,
# libnuma, libdrm_amdgpu, ...) so the backend is self-contained. See #10537.
sweep_transitive_deps "$TARGET_LIB_DIR"
echo "ROCm libraries packaged successfully"
}
@@ -303,6 +343,10 @@ package_intel_libs() {
fi
done
# Pull in transitive deps the allowlist misses so the backend is
# self-contained (same class of failure as #10537).
sweep_transitive_deps "$TARGET_LIB_DIR"
echo "Intel oneAPI libraries packaged successfully"
}
@@ -432,6 +476,7 @@ export -f copy_lib
export -f copy_libs_glob
export -f is_core_lib
export -f copy_elf_deps
export -f sweep_transitive_deps
export -f package_cuda_libs
export -f package_rocm_libs
export -f package_intel_libs
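Reviewer note: `sweep_transitive_deps` delegates to `copy_elf_deps`, which reads `ldd` output through the awk filter shown at the top of this file's diff. The sketch below applies that same filter to canned `ldd`-style text so its effect is visible without building anything. The function names, paths and load addresses are made up for illustration.

```bash
# Same awk filter that copy_elf_deps applies to `ldd` output: keep only
# entries that resolved to an absolute path.
resolved_deps() {
    awk '/=>/ && $3 ~ /^\// {print $3}'
}

# Canned, made-up ldd output; paths and addresses are illustrative only.
sample_ldd_output() {
    cat <<'EOF'
    linux-vdso.so.1 (0x00007ffd4a5f2000)
    libfaketransitive.so.0 => /tmp/work/libfaketransitive.so.0 (0x00007f1c2a400000)
    libmissing.so.1 => not found
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f1c2a000000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f1c2a800000)
EOF
}

sample_ldd_output | resolved_deps
# prints:
#   /tmp/work/libfaketransitive.so.0
#   /lib/x86_64-linux-gnu/libc.so.6
```

The filter drops the vdso and loader lines (no `=>`) and the unresolved entry (`not found`). It still emits libc, which is why the sweep comment notes that core libc-family libraries are skipped later via `is_core_lib` rather than by this filter.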


@@ -0,0 +1,54 @@
#!/bin/bash
# Regression test for scripts/build/package-gpu-libs.sh.
#
# Guards issue #10537: the per-vendor packagers copy an explicit allowlist of
# top-level GPU runtime libs but used to miss their transitive dependencies
# (e.g. ROCm's librocprofiler-register.so.0). Since backends run through the
# bundled lib/ld.so with LD_LIBRARY_PATH=lib, an unbundled transitive dep is a
# fatal "cannot open shared object file" at load time.
#
# This test fabricates a primary lib that links a transitive lib, simulates the
# allowlist step (primary copied, transitive not), and asserts the transitive
# sweep pulls the dependency in. Requires gcc + ldd (present in build images).
set -euo pipefail
CURDIR=$(dirname "$(realpath "$0")")
SCRIPT="$CURDIR/package-gpu-libs.sh"
if ! command -v gcc >/dev/null 2>&1 || ! command -v ldd >/dev/null 2>&1; then
echo "SKIP: gcc/ldd not available"
exit 0
fi
WORK=$(mktemp -d)
trap 'rm -rf "$WORK"' EXIT
# Transitive dependency (stand-in for librocprofiler-register.so.0).
echo 'int transitive_fn(void){return 42;}' > "$WORK/transitive.c"
gcc -shared -fPIC -o "$WORK/libfaketransitive.so.0" "$WORK/transitive.c"
# Primary allowlisted lib (stand-in for libhipblas.so) that links it.
echo 'int transitive_fn(void); int primary_fn(void){return transitive_fn();}' > "$WORK/primary.c"
gcc -shared -fPIC -o "$WORK/libfakeprimary.so.0" "$WORK/primary.c" \
-L"$WORK" -l:libfaketransitive.so.0 -Wl,-rpath,"$WORK"
# Simulate the allowlist step: primary already bundled, transitive not.
TARGET="$WORK/target"
mkdir -p "$TARGET"
cp "$WORK/libfakeprimary.so.0" "$TARGET/"
# Make the transitive dep resolvable like /opt/rocm libs are in the build image.
export LD_LIBRARY_PATH="$WORK:${LD_LIBRARY_PATH:-}"
# shellcheck source=/dev/null
source "$SCRIPT" "$TARGET"
sweep_transitive_deps "$TARGET"
if [ -e "$TARGET/libfaketransitive.so.0" ]; then
echo "PASS: transitive dependency was bundled by sweep_transitive_deps"
exit 0
fi
echo "FAIL: transitive dependency was NOT bundled (regression of #10537)"
ls -la "$TARGET"
exit 1


@@ -0,0 +1,161 @@
package distributed_test
import (
"context"
"github.com/mudler/LocalAI/core/services/distributed"
"github.com/mudler/LocalAI/core/services/messaging"
"github.com/mudler/LocalAI/core/services/syncstate"
. "github.com/onsi/ginkgo/v2"
. "github.com/onsi/gomega"
pgdriver "gorm.io/driver/postgres"
"gorm.io/gorm"
"gorm.io/gorm/logger"
)
// ftSyncStore adapts the real FineTuneStore to syncstate.Store, exactly as the
// finetune service does in production. Defined here (rather than reusing the
// service's unexported adapter) so the e2e exercises the store + component over
// real infrastructure without pulling in backend execution.
type ftSyncStore struct{ s *distributed.FineTuneStore }
func (a ftSyncStore) List(_ context.Context) ([]*distributed.FineTuneJobRecord, error) {
recs, err := a.s.ListAll()
if err != nil {
return nil, err
}
out := make([]*distributed.FineTuneJobRecord, len(recs))
for i := range recs {
r := recs[i]
out[i] = &r
}
return out, nil
}
func (a ftSyncStore) Upsert(_ context.Context, r *distributed.FineTuneJobRecord) error {
return a.s.Upsert(r)
}
func (a ftSyncStore) Delete(_ context.Context, k string) error { return a.s.Delete(k) }
// This suite is the real-infrastructure counterpart to the fake-bus unit tests:
// two SyncedMap instances stand in for two LocalAI frontend replicas, each with
// its OWN NATS connection to a shared NATS server and a SHARED PostgreSQL store -
// the exact distributed-mode invariant (single shared DB, per-replica process
// state). It proves the delta path works over the wire and that a late-joining
// replica recovers via store hydrate (the at-most-once gap a fake bus cannot
// exercise).
var _ = Describe("SyncedMap two-replica sync over real NATS", Label("Distributed"), func() {
var (
infra *TestInfra
ftStore *distributed.FineTuneStore
)
BeforeEach(func() {
infra = SetupInfra("localai_syncstate_dist_test")
db, err := gorm.Open(pgdriver.Open(infra.PGURL), &gorm.Config{
Logger: logger.Default.LogMode(logger.Silent),
})
Expect(err).ToNot(HaveOccurred())
ftStore, err = distributed.NewFineTuneStore(db)
Expect(err).ToNot(HaveOccurred())
})
// newReplica builds an independent "replica": its own NATS client to the
// shared server plus a SyncedMap over the shared store, started (hydrate +
// subscribe) and cleaned up automatically.
newReplica := func() *syncstate.SyncedMap[string, *distributed.FineTuneJobRecord] {
GinkgoHelper()
nc, err := messaging.New(infra.NatsURL)
Expect(err).ToNot(HaveOccurred())
sm := syncstate.New(syncstate.Config[string, *distributed.FineTuneJobRecord]{
Name: "finetune.jobs",
Key: func(r *distributed.FineTuneJobRecord) string { return r.ID },
Nats: nc,
Store: ftSyncStore{s: ftStore},
})
Expect(sm.Start(infra.Ctx)).To(Succeed())
FlushNATS(nc) // ensure the subscription is registered server-side before any publish
DeferCleanup(func() {
_ = sm.Close()
nc.Close()
})
return sm
}
rec := func(id, status string) *distributed.FineTuneJobRecord {
return &distributed.FineTuneJobRecord{
ID: id, UserID: "u1", Model: "m", Backend: "b",
TrainingType: "lora", TrainingMethod: "sft", Status: status,
}
}
It("propagates a create from replica A to replica B over the wire", func() {
a := newReplica()
b := newReplica()
Expect(a.Set(infra.Ctx, rec("job-1", "queued"))).To(Succeed())
Eventually(func() bool { _, ok := b.Get("job-1"); return ok }, "10s", "50ms").
Should(BeTrue(), "replica B must observe the job created on A via NATS")
got, ok := b.Get("job-1")
Expect(ok).To(BeTrue())
Expect(got.Status).To(Equal("queued"))
})
It("propagates an update and a delete across replicas", func() {
a := newReplica()
b := newReplica()
Expect(a.Set(infra.Ctx, rec("job-2", "queued"))).To(Succeed())
Eventually(func() bool { _, ok := b.Get("job-2"); return ok }, "10s", "50ms").Should(BeTrue())
// Update on A -> B reflects the new status.
Expect(a.Set(infra.Ctx, rec("job-2", "training"))).To(Succeed())
Eventually(func() string {
if r, ok := b.Get("job-2"); ok {
return r.Status
}
return ""
}, "10s", "50ms").Should(Equal("training"))
// Delete on A -> B prunes (a reload-from-path could not do this).
Expect(a.Delete(infra.Ctx, "job-2")).To(Succeed())
Eventually(func() bool { _, ok := b.Get("job-2"); return ok }, "10s", "50ms").
Should(BeFalse(), "replica B must drop the job deleted on A")
})
It("hydrates a late-joining replica from the shared store (missed-delta recovery)", func() {
a := newReplica()
// Written (and broadcast) BEFORE replica C exists, so C can never receive
// the delta - it can only learn the job by hydrating from shared Postgres
// on Start. This is the at-most-once gap a fake bus cannot exercise.
Expect(a.Set(infra.Ctx, rec("job-3", "completed"))).To(Succeed())
Eventually(func() (*distributed.FineTuneJobRecord, error) { return ftStore.Get("job-3") }, "10s", "50ms").
ShouldNot(BeNil(), "write-through must reach the shared store first")
c := newReplica() // joins late; Start() hydrates from the store synchronously
got, ok := c.Get("job-3")
Expect(ok).To(BeTrue(), "late replica must recover the job via store hydrate, not a delta")
Expect(got.Status).To(Equal("completed"))
})
It("write-through persists a local Set to the shared PostgreSQL store", func() {
a := newReplica()
Expect(a.Set(infra.Ctx, rec("job-4", "queued"))).To(Succeed())
persisted, err := ftStore.Get("job-4")
Expect(err).ToNot(HaveOccurred())
Expect(persisted.ID).To(Equal("job-4"))
Expect(persisted.Status).To(Equal("queued"))
})
})