Compare commits


11 Commits

Author SHA1 Message Date
LocalAI [bot]
ade9cc9e37 fix(openresponses): bound resume-stream buffer and enforce response ownership (#10569)
The background=true resumable-stream path had two latent issues.

1. Unbounded resume buffer. AppendEvent grew StreamEvents without limit, so
   a long-running or abandoned background generation could consume process
   memory without bound. The store now caps the buffer (event count and total
   bytes, mirroring llama.cpp's byte-capped slot ring), evicting oldest events
   from the front and advancing a droppedThrough watermark. GetEventsAfter
   returns ErrOffsetLost when the requested starting_after is below the
   watermark, and handleStreamResume surfaces that as HTTP 409 before
   committing to the SSE response, so a resuming client gets a clear error
   instead of a silently truncated stream.

2. Missing ownership check (IDOR). GET /responses/:id, its stream resume, and
   /cancel looked up responses purely by ID, letting any caller who knows or
   guesses an ID read or cancel another caller's response. Responses now carry
   the creating caller's identity (auth.GetUser), stamped at creation and
   compared on read/cancel/resume; a mismatch returns 404 (not 403) so
   existence is not leaked. Backward compatible: responses with no owner
   (single-key / no-auth deployments) remain accessible.
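The capped buffer described in item 1 can be sketched as follows. This is a minimal illustration, not the LocalAI store: only `AppendEvent`, `GetEventsAfter`, `ErrOffsetLost` and `droppedThrough` are names taken from the commit message, while `resumeBuffer`, `event`, the 1-based sequence numbering and the cap values are assumptions made for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrOffsetLost is named in the commit message; everything else in this
// file is a hypothetical stand-in for the real store.
var ErrOffsetLost = errors.New("requested offset was evicted from the resume buffer")

type event struct {
	seq  int // 1-based sequence number (assumption)
	data []byte
}

type resumeBuffer struct {
	maxEvents, maxBytes int
	events              []event
	bytes               int
	nextSeq             int
	droppedThrough      int // highest sequence number evicted so far
}

func newResumeBuffer(maxEvents, maxBytes int) *resumeBuffer {
	return &resumeBuffer{maxEvents: maxEvents, maxBytes: maxBytes, nextSeq: 1}
}

// AppendEvent stores data, then evicts from the front until both caps hold,
// advancing the droppedThrough watermark for every evicted event.
func (b *resumeBuffer) AppendEvent(data []byte) {
	b.events = append(b.events, event{seq: b.nextSeq, data: data})
	b.nextSeq++
	b.bytes += len(data)
	for len(b.events) > 0 && (len(b.events) > b.maxEvents || b.bytes > b.maxBytes) {
		b.bytes -= len(b.events[0].data)
		b.droppedThrough = b.events[0].seq
		b.events = b.events[1:]
	}
}

// GetEventsAfter returns the events with seq > startingAfter, or
// ErrOffsetLost when startingAfter is below the watermark, i.e. when some
// of the events the caller still needs have been evicted.
func (b *resumeBuffer) GetEventsAfter(startingAfter int) ([]event, error) {
	if startingAfter < b.droppedThrough {
		return nil, ErrOffsetLost
	}
	var out []event
	for _, e := range b.events {
		if e.seq > startingAfter {
			out = append(out, e)
		}
	}
	return out, nil
}

func main() {
	b := newResumeBuffer(3, 1024)
	for i := 1; i <= 5; i++ {
		b.AppendEvent([]byte(fmt.Sprintf("event-%d", i)))
	}
	_, err := b.GetEventsAfter(0)
	fmt.Println("resume from 0 lost:", errors.Is(err, ErrOffsetLost))
	evs, _ := b.GetEventsAfter(2)
	fmt.Println("events after 2:", len(evs))
}
```

A handler built on this would check the error before writing any SSE headers, which is what lets it answer with HTTP 409 instead of a truncated stream.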


Assisted-by: Claude:claude-opus-4-8 [Claude Code]

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-28 02:02:15 +02:00
LocalAI [bot]
471e38e4e7 chore: ⬆️ Update leejet/stable-diffusion.cpp to 9956436c925a367daeab097598b1ea1f32d3503f (#10533)
⬆️ Update leejet/stable-diffusion.cpp

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-28 01:55:44 +02:00
LocalAI [bot]
f3d829e2ef feat(distributed): add LOCALAI_DISTRIBUTED_SHARED_MODELS to skip staging on shared volumes (#10556) (#10566)
In distributed mode, even when the frontend and workers share the same
models directory via a shared volume mount, starting a model on a worker
re-staged (re-downloaded) it: stageModelFiles always uploads model files
into a tracking-key-namespaced subdir on the worker, and the staging probe
only checks that staged location, so a file already present on the shared
volume at the canonical path was never reused.

Add a config switch LOCALAI_DISTRIBUTED_SHARED_MODELS (default false). When
enabled, the operator asserts that all nodes mount the SAME models directory
at the SAME path, so staging is unnecessary: the frontend's absolute model
paths are already valid on the worker. In that mode stageModelFiles returns
the cloned opts unchanged without uploading, leaving the path fields pointing
at their canonical absolute paths so the worker loads them directly from the
shared volume.

The value is plumbed from DistributedConfig through SmartRouterOptions into
the SmartRouter. Docs and docker-compose.distributed.yaml updated.
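The early return described above might look roughly like this. It is a sketch under assumptions: `stageModelFiles` and the shared-models switch come from the commit message, but `modelOptions`, `router` and the `upload` hook are hypothetical stand-ins for the real router types.

```go
package main

import "fmt"

// modelOptions is a hypothetical stand-in for the load options whose path
// fields the router rewrites when staging.
type modelOptions struct {
	ModelFile string
	MMProj    string
}

type router struct {
	sharedModels bool
	// upload is a hypothetical staging hook: it copies a file to the worker
	// and returns the worker-side path.
	upload func(path string) (string, error)
}

// stageModelFiles returns a copy of opts. With sharedModels set it skips
// uploading and leaves the canonical absolute paths untouched.
func (r *router) stageModelFiles(opts modelOptions) (modelOptions, error) {
	staged := opts // struct copy acts as the clone
	if r.sharedModels {
		return staged, nil
	}
	for _, p := range []*string{&staged.ModelFile, &staged.MMProj} {
		if *p == "" {
			continue
		}
		remote, err := r.upload(*p)
		if err != nil {
			return modelOptions{}, err
		}
		*p = remote
	}
	return staged, nil
}

func main() {
	r := &router{
		sharedModels: true,
		upload:       func(p string) (string, error) { return "/staged" + p, nil },
	}
	out, _ := r.stageModelFiles(modelOptions{ModelFile: "/models/a.gguf"})
	fmt.Println(out.ModelFile)
}
```

The switch is an operator assertion rather than a detected property: nothing in this sketch verifies that the worker really sees the same path.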


Assisted-by: Claude:claude-opus-4-8 [Claude Code]

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-28 01:23:07 +02:00
LocalAI [bot]
91885c2c7e fix(distributed): return empty backend list for agent nodes instead of failing backend.list (#10545) (#10565)
Opening an AGENT-type worker node's detail page errored with
"failed to list backends on node" / NATS "nodes.<id>.backend.list:
no responders available". Agent workers only subscribe to agent.*,
jobs.*, mcp.* and <prefix>.backend.stop; they never subscribe to
backend.list, so the per-node ListBackendsOnNodeEndpoint request had
no responder and timed out.

The aggregate cluster-wide list already guards this in
managers_distributed.go (skip nodes whose NodeType is set and not
"backend"). The single-node endpoint lacked the same guard. Thread the
NodeRegistry into ListBackendsOnNodeEndpoint and short-circuit to an
empty (non-nil) list for non-backend node types before issuing the
doomed NATS request, mirroring the aggregate-list gate so both views
stay consistent.
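A sketch of the guard, assuming a simplified registry: `NodeType` and the "set and not backend" rule come from the commit message, while `node`, `registry`, `listBackendsOnNode` and the `request` callback (standing in for the NATS round trip) are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

type node struct {
	ID       string
	NodeType string // empty is treated like "backend" (assumption based on the aggregate guard)
}

// registry is a hypothetical stand-in for the NodeRegistry.
type registry map[string]node

// listBackendsOnNode short-circuits to an empty, non-nil list for nodes that
// never subscribe to backend.list, and only otherwise issues the request.
func listBackendsOnNode(reg registry, id string, request func(id string) ([]string, error)) ([]string, error) {
	n, ok := reg[id]
	if !ok {
		return nil, errors.New("node not found")
	}
	if n.NodeType != "" && n.NodeType != "backend" {
		return []string{}, nil
	}
	return request(id)
}

func main() {
	reg := registry{"a1": {ID: "a1", NodeType: "agent"}}
	got, err := listBackendsOnNode(reg, "a1", func(string) ([]string, error) {
		return nil, errors.New("no responders available")
	})
	fmt.Println(len(got), got != nil, err)
}
```

Returning a non-nil empty slice matters for JSON encoding: it serializes as `[]` rather than `null`.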


Assisted-by: Claude:claude-opus-4-8 [Claude Code]

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-28 01:22:48 +02:00
LocalAI [bot]
f1fcafb888 fix(gallery): match mmproj/model quant as a whole token so F16 no longer selects BF16 (#10559) (#10564)
pickPreferredGroup matched a quant preference against the shard base
filename with strings.Contains. Because `f16` is a substring of `bf16`,
asking for the `F16` mmproj quant would wrongly satisfy a `BF16` file and
select it when its group came first.

Match the preference as a whole token instead: it must be delimited by a
non-alphanumeric character (or the string start/end) on both outer edges.
Separators inside the preference itself (e.g. `ud-q4_k_xl`) are left
untouched, and all occurrences are scanned before rejecting.
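The boundary rule can be exercised in isolation with a small program. `wholeTokenMatch` below is an illustrative re-implementation of the rule as described, not the function from the patch, and the filenames are made up for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// isAlphaNum assumes lowercased ASCII input.
func isAlphaNum(b byte) bool {
	return (b >= 'a' && b <= 'z') || (b >= '0' && b <= '9')
}

// wholeTokenMatch reports whether pref occurs in base with a non-alphanumeric
// character (or the string start/end) on both outer edges. Every occurrence
// is checked before giving up. Both arguments must already be lowercased.
func wholeTokenMatch(base, pref string) bool {
	if pref == "" {
		return false
	}
	for off := 0; ; {
		i := strings.Index(base[off:], pref)
		if i == -1 {
			return false
		}
		start, end := off+i, off+i+len(pref)
		leftOK := start == 0 || !isAlphaNum(base[start-1])
		rightOK := end == len(base) || !isAlphaNum(base[end])
		if leftOK && rightOK {
			return true
		}
		off = start + 1
	}
}

func main() {
	for _, name := range []string{"mmproj-bf16.gguf", "mmproj-f16.gguf", "mmproj-bf16-f16.gguf"} {
		fmt.Println(name, wholeTokenMatch(name, "f16"))
	}
}
```

The third filename shows why all occurrences are scanned: the first `f16` sits inside `bf16` and is rejected, but the second one is a whole token.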


Assisted-by: Claude:claude-opus-4-8 [Claude Code]

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-28 01:21:33 +02:00
LocalAI [bot]
fdff114701 ci(vibevoice): skip the ASR transcription e2e on release tag builds (#10567)
The `tests-vibevoice-cpp-grpc-transcription` job downloads the vibevoice ASR
model (`vibevoice-asr-q4_k.gguf`, ~10 GB) and decodes it through the
e2e-backends harness. On release tag pushes the detect step forces the full
matrix (run-all=true), so this job runs and consistently times out: the inner
`go test -timeout 30m` cannot pull a 10 GB file from HuggingFace's throttled
Xet CDN within budget (curl --max-time 600 x5 retries overruns the deadline),
leaving an orphaned curl and a 30m panic. It has been red on every release
(v4.5.3, v4.5.4, v4.5.5).

Guard the job's `if` with `!startsWith(github.ref, 'refs/tags/')` so it no
longer runs on tag/release builds. It still runs on PRs and branch pushes that
touch vibevoice-cpp, so real regressions are caught off the release path. A
proper fix (a small ASR test GGUF) can re-enable it on tags later.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-28 00:40:21 +02:00
LocalAI [bot]
1154be5eea fix(config): fall back to DefaultContextSize for unparseable GGUFs; pin NVFP4 gallery context_size (#10563)
The GGUF metadata parser (gpustack/gguf-parser-go) cannot read NVFP4-quantized
GGUFs at all: it errors with "read tensor info 0: This quantized type is
currently unsupported" because NVFP4 is a ggml tensor type it does not know.
When ParseGGUFFile errors, the llama-cpp defaults hook skips guessGGUFFromFile
entirely and the deferred fallback sets the context window to the conservative
GGUFFallbackContextSize (1024). The result: a model trained with a 262144-token
context runs with n_ctx=1024, and every prompt over ~1k tokens fails with
"request (N tokens) exceeds the available context size (1024 tokens)".

Two changes:

- Drop GGUFFallbackContextSize (1024) and fall back to DefaultContextSize
  (4096) in both the GGUF run-estimate path (gguf.go) and the deferred hook
  fallback (hooks_llamacpp.go). 1024 is a sensible floor for a tiny CPU GGUF
  but a footgun for a large, long-context model whose header simply cannot be
  parsed. Strengthen the existing "GGUF unreadable" test to assert the value.

- Set context_size explicitly on the four NVFP4 gallery entries
  (qwen3.6-35b-a3b-nvfp4-mtp, qwopus3.6-27b-v2-mtp-nvfp4,
  qwopus3.6-27b-coder-mtp-nvfp4, qwen3.6-27b-nvfp4-mtp) so the parser failure
  is irrelevant for them. 32768 matches sibling Qwen entries and is safe on
  memory; operators can raise it toward the 262144 train length.


Assisted-by: Claude:claude-opus-4-8 [Claude Code]

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-27 23:34:52 +02:00
LocalAI [bot]
8aba4fdba3 chore(fish-speech): drop the darwin/metal build target (#10561)
The fish-speech metal-darwin-arm64 backend build has been failing on every
release (v4.5.3, v4.5.4, v4.5.5) and is a standing red on the darwin backend
matrix. fish-speech pulls `tokenizers` transitively from its upstream source
(`pip install -e fish-speech-src`), and on darwin/arm64 there is no prebuilt
wheel for the pinned old `tokenizers` version, so pip builds it from source.
Modern rustc rejects that old crate as a hard error:

    error: casting `&T` to `&mut T` is undefined behavior ...
       --> tokenizers-lib/src/models/bpe/trainer.rs:517:47
       = note: `#[deny(invalid_reference_casting)]` on by default
    error: could not compile `tokenizers` (lib) due to 1 previous error

This is deterministic, not a flake, and there is no clean fix that does not
either pin a stale Rust toolchain or downgrade a soundness lint guarding real
UB. Until upstream fish-speech moves to a tokenizers version that compiles on
current toolchains, drop darwin support so the release backend build stays
green. The Linux/CUDA/ROCm/Intel/L4T variants are unaffected.

Removes:
- the `-metal-darwin-arm64-fish-speech` entry from `includeDarwin` in
  backend-matrix.yml
- the `metal:` capability mappings and the concrete `metal-fish-speech` /
  `metal-fish-speech-development` gallery entries in backend/index.yaml
- the now-unused darwin-only requirements-mps.txt

Assisted-by: Claude:claude-opus-4-8 [Claude Code]

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-27 23:24:21 +02:00
LocalAI [bot]
d7d7721eae feat(distributed): SyncedMap component + migrate finetune/quant/agent-tasks to cross-replica state (#10542)
* feat(distributed): add SyncedMap cross-replica in-memory state component

Introduce core/services/syncstate.SyncedMap[K,V]: a thread-safe in-memory map
that keeps itself consistent across frontend replicas via NATS, with an optional
pluggable durable Store and hydrate-from-source convergence.

Several features keep process-local state surfaced to the API (finetune/quant
jobs, agent tasks, model configs) and each hand-wired the same in-memory + NATS
broadcast + read-through-store legs - or forgot to, reintroducing cross-replica
staleness. SyncedMap makes that consistency a configuration choice:

- local writes mutate the map, write through the Store, then broadcast a delta;
- the apply path is memory-only and never re-publishes or re-writes the Store
  (structural echo-loop guard, mirroring galleryop.mergeStatus);
- on Start and on NATS reconnect the map re-hydrates from the source (Store, else
  Loader); an optional periodic Reconcile repairs silent drift;
- standalone mode (nil NATS client) is a strict in-memory no-op.
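The write path and the echo-loop guard listed above can be illustrated with a stripped-down generic map. This is not the `syncstate.SyncedMap` API: `syncedMap`, `delta`, `persist` and `publish` are hypothetical names, the bus is a direct function call instead of NATS, and hydration and reconcile are left out.

```go
package main

import (
	"fmt"
	"sync"
)

type delta[K comparable, V any] struct {
	Key     K
	Value   V
	Deleted bool
}

type syncedMap[K comparable, V any] struct {
	mu      sync.RWMutex
	data    map[K]V
	persist func(K, V)        // durable write-through; nil when no store is configured
	publish func(delta[K, V]) // broadcast to peers; nil in standalone mode
}

func newSyncedMap[K comparable, V any]() *syncedMap[K, V] {
	return &syncedMap[K, V]{data: make(map[K]V)}
}

// Set is the local write path: mutate memory, write through, then broadcast.
func (m *syncedMap[K, V]) Set(k K, v V) {
	m.mu.Lock()
	m.data[k] = v
	m.mu.Unlock()
	if m.persist != nil {
		m.persist(k, v)
	}
	if m.publish != nil {
		m.publish(delta[K, V]{Key: k, Value: v})
	}
}

// apply is the remote path: memory only. It never persists or re-publishes,
// which is what prevents an echo loop between replicas.
func (m *syncedMap[K, V]) apply(d delta[K, V]) {
	m.mu.Lock()
	defer m.mu.Unlock()
	if d.Deleted {
		delete(m.data, d.Key)
		return
	}
	m.data[d.Key] = d.Value
}

func (m *syncedMap[K, V]) Get(k K) (V, bool) {
	m.mu.RLock()
	defer m.mu.RUnlock()
	v, ok := m.data[k]
	return v, ok
}

func main() {
	a, b := newSyncedMap[string, string](), newSyncedMap[string, string]()
	a.publish, b.publish = b.apply, a.apply // two replicas wired back to back
	a.Set("job-1", "running")
	v, ok := b.Get("job-1")
	fmt.Println(v, ok)
}
```

Because `apply` has no side effects beyond the in-memory map, wiring the two replicas to each other terminates after a single hop.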

Reconnect re-hydrate is wired via a new *messaging.Client.OnReconnect callback,
consumed through an optional type-assertion so MessagingClient stays minimal.
Adds messaging.SubjectSyncStateDelta and a reusable testutil.FakeBus (synchronous
in-process MessagingClient with wildcard matching) for adopter tests.

Component only; service migrations follow in subsequent commits.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* refactor(finetune): back jobs with SyncedMap for cross-replica consistency

FineTuneService kept jobs in a process-local map and, although it wrote them to
Postgres, ListJobs/GetJob never read the store back and the wired natsClient was
never used - so in distributed mode a job created on one replica was invisible to
the others. Replace the map and the dead client with a syncstate.SyncedMap keyed
by job ID, value *schema.FineTuneJob (the exact REST shape, so responses are
unchanged).

- Add a Store adapter (core/services/finetune/syncstore.go) over FineTuneStore,
  plus FineTuneStore.ListAll (global hydrate; per-user List kept) and an
  idempotent Upsert (create-or-update; Create alone fails on dup key).
- Writes go through SyncedMap.Set/Delete (write-through + broadcast); reads use
  List/Get. The on-disk state.json path becomes the standalone Loader, keeping
  single-node restart recovery (stale->stopped / exporting->failed fixups).
- Fold SetNATSClient/SetFineTuneStore into NewFineTuneService; app.go passes the
  distributed NATS client + store when distributed, nil otherwise.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* refactor(agentpool): back agent tasks with SyncedMap for cross-replica consistency

AgentJobService.ListTasks read the process-local tasks map only, while ListJobs
already read through the DB persister + dispatcher NATS - so in distributed mode
a task created on one replica was invisible to the others. Back tasks with a
syncstate.SyncedMap keyed by task ID (value schema.Task, the exact REST shape);
jobs are left untouched.

- Store adapter (task_syncstore.go) over the existing JobPersister
  (LoadTasks/SaveTask/DeleteTask); reads svc.persister/userID live so a persister
  swap needs no rebuild. No new persister methods required.
- Task reads -> SyncedMap.List/Get; create/update -> Set (write-through +
  broadcast); delete -> Delete. The file persister now owns its own task set so
  the write-through path does not re-enter the SyncedMap lock (deadlock guard).
- The distributed NATS client is not available at construction (start() precedes
  initDistributed), so it is injected via SetTaskSyncNATS, which rebuilds the
  still-empty map before Start/hydrate. Wired at the main, restart, and per-user
  (UserServicesManager) distributed sites.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* refactor(quantization): back jobs with SyncedMap + durable QuantStore

QuantizationService kept jobs in a process-local map persisted only to a local
state.json, so in distributed mode jobs were neither visible across replicas nor
durable cluster-wide. Back jobs with a syncstate.SyncedMap keyed by job ID
(value *schema.QuantizationJob, the exact REST shape).

- New distributed.QuantStore (GORM, table quantization_jobs) mirroring
  FineTuneStore: Create/Get/ListAll/Upsert(idempotent)/Delete, registered for
  AutoMigrate via distributed.InitStores (Stores.Quant).
- New adapter (quantization/syncstore.go) over QuantStore implementing
  syncstate.Store, with record<->schema conversion.
- Reads go through List/Get, writes through Set/Delete (write-through +
  broadcast); state.json is kept as the standalone Loader for single-node restart
  recovery (stale-job fixups preserved).
- app.go passes the distributed NATS client + QuantStore when distributed, nil
  otherwise; Start/Close lifecycle mirrors finetune.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* fix(syncstate): annotate gosec G118 false positive on lifeCtx

gosec flagged the WithCancel in Start as "cancellation function not called"
because the returned cancel is stored on the struct rather than called/deferred
in scope. It is invoked in Close (covered by tests), and lifeCtx must outlive
Start to drive the reconnect/reconcile goroutines. Suppress the verified false
positive with a justified #nosec G118.
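The pattern that trips the linter looks roughly like this. The `#nosec G118` annotation and the `lifeCtx` name follow the commit message; `worker`, `newStartedWorker` and the `done` channel are hypothetical and exist only to make the example self-contained.

```go
package main

import (
	"context"
	"fmt"
)

type worker struct {
	cancel context.CancelFunc
	done   chan struct{}
}

// Start derives a context that must outlive Start itself, so cancel is
// stored on the struct instead of being deferred here.
func (w *worker) Start(parent context.Context) {
	lifeCtx, cancel := context.WithCancel(parent) // #nosec G118 -- cancel is called in Close
	w.cancel = cancel
	w.done = make(chan struct{})
	go func() {
		defer close(w.done)
		<-lifeCtx.Done() // stands in for the reconnect/reconcile loops
	}()
}

// Close cancels the lifetime context and waits for the goroutine to exit.
func (w *worker) Close() {
	if w.cancel != nil {
		w.cancel()
		<-w.done
	}
}

func newStartedWorker() *worker {
	w := &worker{}
	w.Start(context.Background())
	return w
}

func main() {
	w := newStartedWorker()
	w.Close()
	fmt.Println("closed cleanly")
}
```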

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* test(distributed): e2e two-replica SyncedMap sync over real NATS + Postgres

Adds the real-infrastructure counterpart to the fake-bus unit tests, in the
existing distributed e2e suite (testcontainers NATS + PostgreSQL). Two SyncedMap
instances stand in for two frontend replicas - each with its OWN NATS connection
to a shared server and a SHARED Postgres store (the distributed-mode invariant) -
and assert, over the wire:

- a create on replica A is observed by replica B;
- an update and a delete propagate A -> B (delete prunes, which a reload cannot);
- a late-joining replica recovers a job it never received a delta for, via store
  hydrate on Start (the at-most-once gap a fake bus cannot exercise);
- a local Set is written through to the shared Postgres store.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-27 23:23:51 +02:00
Nicholas Ciechanowski
c548150f99 fix(distributed): missing agent NATS permission (#10549)
Signed-off-by: Nicholas Ciechanowski <nicholas@ciech.anow.ski>
2026-06-27 21:10:12 +00:00
LocalAI [bot]
ec26b86dd4 docs: ⬆️ update docs version mudler/LocalAI (#10560)
⬆️ Update docs version mudler/LocalAI

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: mudler <2420543+mudler@users.noreply.github.com>
2026-06-27 22:36:02 +02:00
58 changed files with 3128 additions and 248 deletions

View File

@@ -4991,9 +4991,6 @@ includeDarwin:
   - backend: "qwen-tts"
     tag-suffix: "-metal-darwin-arm64-qwen-tts"
     build-type: "mps"
-  - backend: "fish-speech"
-    tag-suffix: "-metal-darwin-arm64-fish-speech"
-    build-type: "mps"
   - backend: "voxcpm"
     tag-suffix: "-metal-darwin-arm64-voxcpm"
     build-type: "mps"

View File

@@ -1008,7 +1008,11 @@ jobs:
   # image + working dir.
   tests-vibevoice-cpp-grpc-transcription:
     needs: detect-changes
-    if: needs.detect-changes.outputs.vibevoice-cpp == 'true' || needs.detect-changes.outputs.run-all == 'true'
+    # Skip on release tag pushes: the ASR Q4_K model is ~10 GB and cannot be
+    # pulled from HF within the inner `go test -timeout 30m` budget on a CI
+    # runner, so every tag build hung and timed out. Still runs on PRs/branch
+    # pushes that touch vibevoice-cpp so regressions are caught off the release path.
+    if: (needs.detect-changes.outputs.vibevoice-cpp == 'true' || needs.detect-changes.outputs.run-all == 'true') && !startsWith(github.ref, 'refs/tags/')
     runs-on: bigger-runner
     timeout-minutes: 150
     steps:

View File

@@ -8,7 +8,7 @@ JOBS?=$(shell nproc --ignore=1)
 # stablediffusion.cpp (ggml)
 STABLEDIFFUSION_GGML_REPO?=https://github.com/leejet/stable-diffusion.cpp
-STABLEDIFFUSION_GGML_VERSION?=8caa3f908ae6d4a4bef531e73b9a969f266a3d1f
+STABLEDIFFUSION_GGML_VERSION?=9956436c925a367daeab097598b1ea1f32d3503f
 CMAKE_ARGS+=-DGGML_MAX_NAME=128

View File

@@ -1356,7 +1356,6 @@
     intel: "intel-fish-speech"
     amd: "rocm-fish-speech"
     nvidia-l4t: "nvidia-l4t-fish-speech"
-    metal: "metal-fish-speech"
     default: "cpu-fish-speech"
     nvidia-cuda-13: "cuda13-fish-speech"
     nvidia-cuda-12: "cuda12-fish-speech"
@@ -4870,7 +4869,6 @@
     intel: "intel-fish-speech-development"
     amd: "rocm-fish-speech-development"
     nvidia-l4t: "nvidia-l4t-fish-speech-development"
-    metal: "metal-fish-speech-development"
     default: "cpu-fish-speech-development"
     nvidia-cuda-13: "cuda13-fish-speech-development"
     nvidia-cuda-12: "cuda12-fish-speech-development"
@@ -4946,16 +4944,6 @@
   uri: "quay.io/go-skynet/local-ai-backends:master-nvidia-l4t-cuda-13-arm64-fish-speech"
   mirrors:
     - localai/localai-backends:master-nvidia-l4t-cuda-13-arm64-fish-speech
-- !!merge <<: *fish-speech
-  name: "metal-fish-speech"
-  uri: "quay.io/go-skynet/local-ai-backends:latest-metal-darwin-arm64-fish-speech"
-  mirrors:
-    - localai/localai-backends:latest-metal-darwin-arm64-fish-speech
-- !!merge <<: *fish-speech
-  name: "metal-fish-speech-development"
-  uri: "quay.io/go-skynet/local-ai-backends:master-metal-darwin-arm64-fish-speech"
-  mirrors:
-    - localai/localai-backends:master-metal-darwin-arm64-fish-speech
 ## faster-qwen3-tts
 - !!merge <<: *faster-qwen3-tts
   name: "faster-qwen3-tts-development"

View File

@@ -1,2 +0,0 @@
-torch
-torchaudio

View File

@@ -37,6 +37,8 @@ func (a *Application) RestartAgentJobService() error {
 		if d.JobStore != nil {
 			agentJobService.SetDistributedJobStore(d.JobStore)
 		}
+		// Keep agent tasks consistent across replicas (same client the dispatcher uses).
+		agentJobService.SetTaskSyncNATS(d.Nats)
 	}
 	// Start the service

View File

@@ -604,6 +604,10 @@ func (a *Application) StartAgentPool() {
 			usm.SetJobDBStore(s)
 		}
 	}
+	// Keep per-user agent tasks consistent across replicas (nil in standalone).
+	if d := a.Distributed(); d != nil {
+		usm.SetJobSyncNATS(d.Nats)
+	}
 	aps.SetUserServicesManager(usm)
 	a.agentPoolService.Store(aps)

View File

@@ -355,6 +355,7 @@ func initDistributed(cfg *config.ApplicationConfig, authDB *gorm.DB, configLoade
 		PrefixProvider: prefixProvider,
 		PrefixConfig: prefixCfg,
 		Pressure: pressure,
+		SharedModels: cfg.Distributed.SharedModels,
 	})
 	// Wire staging-progress broadcasting so file-staging shows up on every

View File

@@ -280,6 +280,9 @@ func New(opts ...config.AppOption) (*Application, error) {
 	if application.agentJobService != nil {
 		application.agentJobService.SetDistributedBackends(distSvc.Dispatcher)
 		application.agentJobService.SetDistributedJobStore(distSvc.JobStore)
+		// Keep agent tasks consistent across replicas (jobs already sync via the
+		// dispatcher + DB read-through). Same NATS client the dispatcher uses.
+		application.agentJobService.SetTaskSyncNATS(distSvc.Nats)
 	}
 	// Wire skill store into AgentPoolService (wired at pool start time via closure)
 	// The actual wiring happens in StartAgentPool since the pool doesn't exist yet.

View File

@@ -160,6 +160,7 @@ type RunCMD struct {
 	RegistrationRequireAuth bool `env:"LOCALAI_REGISTRATION_REQUIRE_AUTH" default:"false" help:"Fail startup when distributed mode is enabled but LOCALAI_REGISTRATION_TOKEN is empty (node endpoints and worker file-transfer server would otherwise be unauthenticated)" group:"distributed"`
 	DistributedRequireAuth bool `env:"LOCALAI_DISTRIBUTED_REQUIRE_AUTH" default:"false" help:"Umbrella switch: require BOTH NATS JWT credentials and a registration token when distributed mode is enabled (implies --nats-require-auth and --registration-require-auth)" group:"distributed"`
 	AutoApproveNodes bool `env:"LOCALAI_AUTO_APPROVE_NODES" default:"false" help:"Auto-approve new worker nodes (skip admin approval)" group:"distributed"`
+	DistributedSharedModels bool `env:"LOCALAI_DISTRIBUTED_SHARED_MODELS" default:"false" help:"Assert that every node mounts the SAME models directory at the SAME path (shared volume). When true, the router skips staging model files to workers and loads them directly from the shared path, avoiding re-downloads." group:"distributed"`
 	DistributedPrefixCache bool `env:"LOCALAI_DISTRIBUTED_PREFIX_CACHE" default:"true" help:"Enable prefix-cache-aware routing in distributed mode (default true). When false, routing falls back to round-robin." group:"distributed"`
 	DistributedPrefixCacheTTL string `env:"LOCALAI_DISTRIBUTED_PREFIX_CACHE_TTL" help:"Idle-timeout for prefix-cache index entries; also drives the background eviction cadence (every TTL/2). Default 5m." group:"distributed"`
 	BackendInstallTimeout string `env:"LOCALAI_NATS_BACKEND_INSTALL_TIMEOUT" help:"NATS round-trip timeout for backend.install requests sent to worker nodes (default 15m). Increase for slow links pulling multi-GB images." group:"distributed"`
@@ -310,6 +311,9 @@ func (r *RunCMD) Run(ctx *cliContext.Context) error {
 	if r.DistributedRequireAuth {
 		opts = append(opts, config.EnableDistributedRequireAuth)
 	}
+	if r.DistributedSharedModels {
+		opts = append(opts, config.EnableDistributedSharedModels)
+	}
 	if r.NatsAccountSeed != "" {
 		opts = append(opts, config.WithNatsAccountSeed(r.NatsAccountSeed))
 	}

View File

@@ -12,14 +12,12 @@ package config
 // these; config never imports backend.
 const (
 	// DefaultContextSize is the fallback context window when none is configured
-	// or estimable from the model.
+	// or estimable from the model. It is also the fallback for a GGUF whose
+	// metadata yields no usable estimate or that the parser cannot read at all
+	// (e.g. a quant type it does not know, such as NVFP4): a model-agnostic
+	// safe default beats a tiny, surprising window that truncates real prompts.
 	DefaultContextSize = 4096
-	// GGUFFallbackContextSize is the context window for a GGUF model whose
-	// metadata yields no usable estimate (see guessGGUFFromFile). Deliberately
-	// smaller than DefaultContextSize to stay conservative on memory there.
-	GGUFFallbackContextSize = 1024
 	// DefaultNGPULayers means "offload all layers"; the backend (fit_params)
 	// clamps to what actually fits in device memory.
 	DefaultNGPULayers = 99999999

View File

@@ -31,6 +31,14 @@ type DistributedConfig struct {
 	// available to enforce just one layer.
 	RequireAuth bool // LOCALAI_DISTRIBUTED_REQUIRE_AUTH
 	AutoApproveNodes bool // --auto-approve-nodes / LOCALAI_AUTO_APPROVE_NODES (skip admin approval for new workers)
+	// SharedModels asserts that every node (frontend and workers) mounts the
+	// SAME models directory at the SAME path (e.g. a shared volume, as in
+	// docker-compose.distributed.yaml). When true, the router skips staging
+	// model files to workers entirely: the frontend's absolute model paths are
+	// already valid on the worker, so re-uploading them into a per-model
+	// subdirectory only re-downloads what is already present (#10556). Default
+	// false preserves the historical per-node staging behavior.
+	SharedModels bool // --distributed-shared-models / LOCALAI_DISTRIBUTED_SHARED_MODELS
 	// NATS JWT auth (optional; see pkg/natsauth and docs/features/distributed-mode.md)
 	NatsAccountSeed string // LOCALAI_NATS_ACCOUNT_SEED — account signing seed to mint per-node worker JWTs
@@ -282,6 +290,13 @@ var EnableAutoApproveNodes = func(o *ApplicationConfig) {
 	o.Distributed.AutoApproveNodes = true
 }
+// EnableDistributedSharedModels marks the cluster as sharing one models
+// directory across all nodes, so the router skips staging model files to
+// workers (see DistributedConfig.SharedModels).
+var EnableDistributedSharedModels = func(o *ApplicationConfig) {
+	o.Distributed.SharedModels = true
+}
 // DisablePrefixCache turns off prefix-cache-aware routing (falls back to
 // round-robin). Prefix-cache routing is enabled by default in distributed mode.
 var DisablePrefixCache = func(o *ApplicationConfig) {

View File

@@ -33,7 +33,7 @@ func guessGGUFFromFile(cfg *ModelConfig, f *gguf.GGUFFile, defaultCtx int) {
 		cSize := int(ctxSize)
 		cfg.ContextSize = &cSize
 	} else {
-		defaultCtx = GGUFFallbackContextSize
+		defaultCtx = DefaultContextSize
 		cfg.ContextSize = &defaultCtx
 	}
 }

View File

@@ -34,7 +34,7 @@ func llamaCppDefaults(cfg *ModelConfig, modelPath string) {
 	// Default context size if not set, regardless of whether GGUF parsing succeeds
 	defer func() {
 		if cfg.ContextSize == nil {
-			ctx := GGUFFallbackContextSize
+			ctx := DefaultContextSize
 			cfg.ContextSize = &ctx
 		}
 	}()

View File

@@ -248,7 +248,11 @@ var _ = Describe("Backend hooks and parser defaults", func() {
 			}
 			cfg.SetDefaults(ModelPath(dir))
+			// An unreadable/unparseable GGUF (e.g. a quant type the parser does
+			// not know, such as NVFP4) yields no estimate, so the hook must fall
+			// back to DefaultContextSize rather than a tiny, surprising value.
 			Expect(cfg.ContextSize).NotTo(BeNil())
+			Expect(*cfg.ContextSize).To(Equal(DefaultContextSize))
 		})
 	})

View File

@@ -25,8 +25,8 @@ var (
 type LlamaCPPImporter struct{}
 func (i *LlamaCPPImporter) Name() string { return "llama-cpp" }
 func (i *LlamaCPPImporter) Modality() string { return "text" }
 func (i *LlamaCPPImporter) AutoDetects() bool { return true }
 // AdditionalBackends advertises drop-in replacements that share the
@@ -293,7 +293,7 @@ func pickPreferredGroup(groups []hfapi.ShardGroup, prefs []string) *hfapi.ShardG
 	for _, pref := range prefs {
 		lower := strings.ToLower(pref)
 		for i := range groups {
-			if strings.Contains(strings.ToLower(groups[i].Base), lower) {
+			if quantTokenMatches(strings.ToLower(groups[i].Base), lower) {
 				return &groups[i]
 			}
 		}
@@ -301,6 +301,39 @@ func pickPreferredGroup(groups []hfapi.ShardGroup, prefs []string) *hfapi.ShardG
 	return &groups[len(groups)-1]
 }
+// quantTokenMatches reports whether pref appears in base as a whole token
+// rather than as a substring of a larger alphanumeric run. Both arguments
+// must already be lowercased.
+//
+// A plain strings.Contains is wrong here: `f16` is a substring of `bf16`, so
+// asking for the `F16` quant used to wrongly select a `BF16` file (#10559).
+// Only the OUTER edges of the matched preference must hit a boundary — a
+// non-alphanumeric char (or the start/end of base). Separators inside the
+// preference itself (e.g. `ud-q4_k_xl`) are intentionally left untouched.
+func quantTokenMatches(base, pref string) bool {
+	if pref == "" {
+		return false
+	}
+	for start := strings.Index(base, pref); start != -1; {
+		end := start + len(pref)
+		leftOK := start == 0 || !isAlphaNum(base[start-1])
+		rightOK := end == len(base) || !isAlphaNum(base[end])
+		if leftOK && rightOK {
+			return true
+		}
+		next := strings.Index(base[start+1:], pref)
+		if next == -1 {
+			break
+		}
+		start += next + 1
+	}
+	return false
+}
+func isAlphaNum(b byte) bool {
+	return (b >= 'a' && b <= 'z') || (b >= '0' && b <= '9')
+}
 // maybeApplyMTPDefaults parses the picked GGUF header (range-fetched over
 // HTTP for HF/URL imports) and, if the file declares a Multi-Token Prediction
 // head, appends the auto-MTP option keys to modelConfig.Options. Failures
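The boundary rule in `quantTokenMatches` is easiest to see next to the plain substring check it replaces. The following is a minimal standalone sketch: the two helpers are copied from the hunk above, while the lowercased file bases in `main` are hypothetical values chosen for illustration, not names taken from a real repository.

```go
package main

import (
	"fmt"
	"strings"
)

// isAlphaNum reports whether b is a lowercase ASCII letter or a digit.
func isAlphaNum(b byte) bool {
	return (b >= 'a' && b <= 'z') || (b >= '0' && b <= '9')
}

// quantTokenMatches mirrors the helper from the diff: pref must occur in base
// with a non-alphanumeric byte (or the string edge) on both outer sides.
// Both arguments are expected to be lowercased already.
func quantTokenMatches(base, pref string) bool {
	if pref == "" {
		return false
	}
	for start := strings.Index(base, pref); start != -1; {
		end := start + len(pref)
		leftOK := start == 0 || !isAlphaNum(base[start-1])
		rightOK := end == len(base) || !isAlphaNum(base[end])
		if leftOK && rightOK {
			return true
		}
		// Keep scanning: a later occurrence may sit on a token boundary.
		next := strings.Index(base[start+1:], pref)
		if next == -1 {
			break
		}
		start += next + 1
	}
	return false
}

func main() {
	// Hypothetical file bases, used only to compare the two checks.
	bases := []string{"mmproj-x-bf16", "mmproj-x-f16", "bf16-then-f16", "model-ud-q4_k_xl"}
	for _, base := range bases {
		fmt.Printf("%-18s contains=%-5v token=%v\n",
			base, strings.Contains(base, "f16"), quantTokenMatches(base, "f16"))
	}
}
```

The loop does not stop at the first occurrence, so a base that contains `bf16` before a standalone `f16` still matches on the later hit.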


@@ -374,6 +374,104 @@ var _ = Describe("LlamaCPPImporter", func() {
 		})
 	})
Context("quant token boundary matching", func() {
// Regression for #10559: the quant preference must match as a whole
// token, not as a substring. Asking for `F16` used to select a
// `BF16` mmproj because strings.Contains("...bf16.gguf", "f16") is
// true — the leading `b` was ignored.
const repoBase = "https://huggingface.co/acme/example-GGUF/resolve/main/"
hfFile := func(path, sha string) hfapi.ModelFile {
return hfapi.ModelFile{
Path: path,
SHA256: sha,
URL: repoBase + path,
}
}
withHF := func(preferences string, files ...hfapi.ModelFile) Details {
d := Details{
URI: "https://huggingface.co/acme/example-GGUF",
HuggingFace: &hfapi.ModelDetails{
ModelID: "acme/example-GGUF",
Files: files,
},
}
if preferences != "" {
d.Preferences = json.RawMessage(preferences)
}
return d
}
It("selects the F16 mmproj over BF16 (BF16 listed first)", func() {
details := withHF(`{"name":"VL","mmproj_quantizations":"F16"}`,
hfFile("model-Q4_K_M.gguf", "model"),
hfFile("mmproj-x-BF16.gguf", "bf16"),
hfFile("mmproj-x-F16.gguf", "f16"),
)
modelConfig, err := importer.Import(details)
Expect(err).ToNot(HaveOccurred())
Expect(modelConfig.ConfigFile).To(ContainSubstring("mmproj: llama-cpp/mmproj/VL/mmproj-x-F16.gguf"), fmt.Sprintf("%+v", modelConfig))
Expect(modelConfig.ConfigFile).ToNot(ContainSubstring("BF16"), fmt.Sprintf("%+v", modelConfig))
})
It("selects the F16 mmproj over BF16 (F16 listed first)", func() {
details := withHF(`{"name":"VL","mmproj_quantizations":"F16"}`,
hfFile("model-Q4_K_M.gguf", "model"),
hfFile("mmproj-x-F16.gguf", "f16"),
hfFile("mmproj-x-BF16.gguf", "bf16"),
)
modelConfig, err := importer.Import(details)
Expect(err).ToNot(HaveOccurred())
Expect(modelConfig.ConfigFile).To(ContainSubstring("mmproj: llama-cpp/mmproj/VL/mmproj-x-F16.gguf"), fmt.Sprintf("%+v", modelConfig))
Expect(modelConfig.ConfigFile).ToNot(ContainSubstring("BF16"), fmt.Sprintf("%+v", modelConfig))
})
It("selects BF16 when BF16 is the requested mmproj quant", func() {
details := withHF(`{"name":"VL","mmproj_quantizations":"BF16"}`,
hfFile("model-Q4_K_M.gguf", "model"),
hfFile("mmproj-x-F16.gguf", "f16"),
hfFile("mmproj-x-BF16.gguf", "bf16"),
)
modelConfig, err := importer.Import(details)
Expect(err).ToNot(HaveOccurred())
Expect(modelConfig.ConfigFile).To(ContainSubstring("mmproj: llama-cpp/mmproj/VL/mmproj-x-BF16.gguf"), fmt.Sprintf("%+v", modelConfig))
})
It("still matches a normal model quant with internal separators", func() {
// ud-q4_k_xl contains `-`/`_` internally; only the outer edges
// must hit a token boundary.
details := withHF(`{"name":"M","quantizations":"ud-q4_k_xl"}`,
hfFile("model-UD-Q4_K_XL.gguf", "xl"),
hfFile("model-Q3_K_M.gguf", "q3"),
)
modelConfig, err := importer.Import(details)
Expect(err).ToNot(HaveOccurred())
Expect(modelConfig.ConfigFile).To(ContainSubstring("model: llama-cpp/models/M/model-UD-Q4_K_XL.gguf"), fmt.Sprintf("%+v", modelConfig))
})
It("falls back to the last group when no preference matches", func() {
details := withHF(`{"name":"M","quantizations":"Q2_K"}`,
hfFile("model-Q8_0.gguf", "q8"),
hfFile("model-Q3_K_M.gguf", "q3"),
)
modelConfig, err := importer.Import(details)
Expect(err).ToNot(HaveOccurred())
Expect(modelConfig.ConfigFile).To(ContainSubstring("model: llama-cpp/models/M/model-Q3_K_M.gguf"), fmt.Sprintf("%+v", modelConfig))
})
})
 	Context("AdditionalBackends", func() {
 		It("advertises ik-llama-cpp and turboquant as drop-in replacements", func() {
 			entries := importer.AdditionalBackends()


@@ -23,8 +23,10 @@ import (
 	"github.com/mudler/LocalAI/core/application"
 	"github.com/mudler/LocalAI/core/schema"
+	"github.com/mudler/LocalAI/core/services/distributed"
 	"github.com/mudler/LocalAI/core/services/finetune"
 	"github.com/mudler/LocalAI/core/services/galleryop"
+	"github.com/mudler/LocalAI/core/services/messaging"
 	"github.com/mudler/LocalAI/core/services/nodes"
 	"github.com/mudler/LocalAI/core/services/quantization"
@@ -400,25 +402,45 @@ func API(application *application.Application) (*echo.Echo, error) {
 	routes.RegisterAgentPoolRoutes(e, application, agentsMw, skillsMw, collectionsMw)
 	// Fine-tuning routes
 	fineTuningMw := auth.RequireFeature(application.AuthDB(), auth.FeatureFineTuning)
+	// In distributed mode pass the shared NATS client + PostgreSQL store so
+	// fine-tune jobs stay consistent across replicas (the SyncedMap broadcasts
+	// mutations and hydrates from the DB); standalone passes nil for both.
+	var ftNats messaging.MessagingClient
+	var ftStore *distributed.FineTuneStore
+	if d := application.Distributed(); d != nil {
+		ftNats = d.Nats
+		if d.DistStores != nil && d.DistStores.FineTune != nil {
+			ftStore = d.DistStores.FineTune
+		}
+	}
 	ftService := finetune.NewFineTuneService(
 		application.ApplicationConfig(),
 		application.ModelLoader(),
 		application.ModelConfigLoader(),
+		ftNats,
+		ftStore,
 	)
-	if d := application.Distributed(); d != nil {
-		ftService.SetNATSClient(d.Nats)
-		if d.DistStores != nil && d.DistStores.FineTune != nil {
-			ftService.SetFineTuneStore(d.DistStores.FineTune)
-		}
-	}
 	routes.RegisterFineTuningRoutes(e, ftService, application.ApplicationConfig(), fineTuningMw)
 	// Quantization routes
 	quantizationMw := auth.RequireFeature(application.AuthDB(), auth.FeatureQuantization)
+	// In distributed mode pass the shared NATS client + PostgreSQL store so
+	// quantization jobs stay consistent across replicas (the SyncedMap broadcasts
+	// mutations and hydrates from the DB); standalone passes nil for both.
+	var quantNats messaging.MessagingClient
+	var quantStore *distributed.QuantStore
+	if d := application.Distributed(); d != nil {
+		quantNats = d.Nats
+		if d.DistStores != nil && d.DistStores.Quant != nil {
+			quantStore = d.DistStores.Quant
+		}
+	}
 	qService := quantization.NewQuantizationService(
 		application.ApplicationConfig(),
 		application.ModelLoader(),
 		application.ModelConfigLoader(),
+		quantNats,
+		quantStore,
 	)
 	routes.RegisterQuantizationRoutes(e, qService, application.ApplicationConfig(), quantizationMw)


@@ -25,6 +25,7 @@ import (
 	"github.com/mudler/LocalAI/core/http/auth"
 	"github.com/mudler/LocalAI/core/schema"
 	"github.com/mudler/LocalAI/core/services/galleryop"
+	"github.com/mudler/LocalAI/core/services/messaging"
 	"github.com/mudler/LocalAI/core/services/nodes"
 	"github.com/mudler/LocalAI/core/services/nodes/prefixcache"
 	"github.com/mudler/LocalAI/pkg/httpclient"
@@ -550,12 +551,23 @@ func DeleteBackendOnNodeEndpoint(unloader nodes.NodeCommandSender) echo.HandlerF
 }
 // ListBackendsOnNodeEndpoint lists installed backends on a worker node via NATS.
-func ListBackendsOnNodeEndpoint(unloader nodes.NodeCommandSender) echo.HandlerFunc {
+func ListBackendsOnNodeEndpoint(unloader nodes.NodeCommandSender, registry *nodes.NodeRegistry) echo.HandlerFunc {
 	return func(c echo.Context) error {
+		nodeID := c.Param("id")
+		// Agent-type workers don't run backends and never subscribe to the
+		// nodes.<id>.backend.list NATS subject, so the request would hang
+		// until timeout with "no responders". Their backend list is simply
+		// empty. Mirror the aggregate-list guard in managers_distributed.go
+		// (skip nodes whose NodeType is set and not "backend") so the
+		// single-node and cluster-wide views stay consistent.
+		if node, err := registry.Get(c.Request().Context(), nodeID); err == nil {
+			if node.NodeType != "" && node.NodeType != nodes.NodeTypeBackend {
+				return c.JSON(http.StatusOK, []messaging.NodeBackendInfo{})
+			}
+		}
 		if unloader == nil {
 			return c.JSON(http.StatusServiceUnavailable, nodeError(http.StatusServiceUnavailable, "NATS not configured"))
 		}
-		nodeID := c.Param("id")
 		reply, err := unloader.ListBackends(nodeID)
 		if err != nil {
 			xlog.Error("Failed to list backends on node", "node", nodeID, "error", err)
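The guard added to `ListBackendsOnNodeEndpoint` can be reduced to a small sketch that needs no NATS or database. Everything below is hypothetical scaffolding: the `node` struct, the `lister` interface, and the `"agent"` constant value are stand-ins, and only the condition (`NodeType` set and not `"backend"`) follows the comment in the hunk above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Hypothetical stand-ins for the node type constants referenced in the diff.
const (
	nodeTypeBackend = "backend"
	nodeTypeAgent   = "agent"
)

// node is a simplified, hypothetical registry record.
type node struct {
	ID       string
	NodeType string
}

// lister is a hypothetical stand-in for the NATS-backed command sender.
type lister interface {
	ListBackends(nodeID string) ([]string, error)
}

// countingLister records how often it was asked, like the stub in the test file.
type countingLister struct{ calls int }

func (l *countingLister) ListBackends(string) ([]string, error) {
	l.calls++
	return []string{"llama-cpp"}, nil
}

// listBackendsFor returns an empty, non-nil slice for nodes whose type is set
// and is not "backend", without consulting the lister. All other nodes
// (including legacy records with an empty type) are queried as before.
func listBackendsFor(n node, l lister) ([]string, error) {
	if n.NodeType != "" && n.NodeType != nodeTypeBackend {
		return []string{}, nil
	}
	return l.ListBackends(n.ID)
}

func main() {
	l := &countingLister{}
	got, _ := listBackendsFor(node{ID: "a1", NodeType: nodeTypeAgent}, l)
	body, _ := json.Marshal(got)
	// A non-nil empty slice is encoded as a JSON array, not as null.
	fmt.Println(string(body), "lister calls:", l.calls)
}
```

Returning `[]string{}` rather than a nil slice matters for the JSON body: `encoding/json` encodes a nil slice as `null`, which is what the test's `[]` assertion guards against.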


@@ -0,0 +1,103 @@
package localai
import (
"context"
"encoding/json"
"net/http"
"net/http/httptest"
"github.com/labstack/echo/v4"
"github.com/mudler/LocalAI/core/services/messaging"
"github.com/mudler/LocalAI/core/services/nodes"
"github.com/mudler/LocalAI/core/services/testutil"
. "github.com/onsi/ginkgo/v2"
. "github.com/onsi/gomega"
)
// stubNodeCommandSender records whether ListBackends was invoked so the test can
// assert the endpoint short-circuits (no NATS request) for agent-type nodes.
type stubNodeCommandSender struct {
listBackendsCalled bool
}
func (s *stubNodeCommandSender) InstallBackend(_, _, _, _, _, _, _ string, _ int, _ string, _ func(messaging.BackendInstallProgressEvent)) (*messaging.BackendInstallReply, error) {
return &messaging.BackendInstallReply{}, nil
}
func (s *stubNodeCommandSender) UpgradeBackend(_, _, _, _, _, _ string, _ int, _ string, _ func(messaging.BackendInstallProgressEvent)) (*messaging.BackendUpgradeReply, error) {
return &messaging.BackendUpgradeReply{}, nil
}
func (s *stubNodeCommandSender) DeleteBackend(_, _ string) (*messaging.BackendDeleteReply, error) {
return &messaging.BackendDeleteReply{Success: true}, nil
}
func (s *stubNodeCommandSender) ListBackends(_ string) (*messaging.BackendListReply, error) {
s.listBackendsCalled = true
return &messaging.BackendListReply{Backends: []messaging.NodeBackendInfo{{Name: "llama-cpp"}}}, nil
}
func (s *stubNodeCommandSender) StopBackend(_, _ string) error { return nil }
func (s *stubNodeCommandSender) UnloadModelOnNode(_, _ string) error { return nil }
var _ = Describe("ListBackendsOnNodeEndpoint", func() {
var registry *nodes.NodeRegistry
BeforeEach(func() {
db := testutil.SetupTestDB()
var err error
registry, err = nodes.NewNodeRegistry(db)
Expect(err).ToNot(HaveOccurred())
})
callEndpoint := func(unloader nodes.NodeCommandSender, nodeID string) *httptest.ResponseRecorder {
e := echo.New()
req := httptest.NewRequest(http.MethodGet, "/", nil)
rec := httptest.NewRecorder()
c := e.NewContext(req, rec)
c.SetParamNames("id")
c.SetParamValues(nodeID)
handler := ListBackendsOnNodeEndpoint(unloader, registry)
Expect(handler(c)).To(Succeed())
return rec
}
It("returns an empty list for an agent node without issuing a NATS request", func() {
ctx := context.Background()
node := &nodes.BackendNode{Name: "agent-1", NodeType: nodes.NodeTypeAgent}
Expect(registry.Register(ctx, node, true)).To(Succeed())
stub := &stubNodeCommandSender{}
rec := callEndpoint(stub, node.ID)
Expect(rec.Code).To(Equal(http.StatusOK))
Expect(stub.listBackendsCalled).To(BeFalse(),
"agent workers don't subscribe to backend.list; the endpoint must not issue the doomed NATS request")
var list []messaging.NodeBackendInfo
Expect(json.Unmarshal(rec.Body.Bytes(), &list)).To(Succeed())
Expect(list).To(BeEmpty())
// Must be `[]`, not `null`, so the UI can render it.
Expect(rec.Body.String()).To(ContainSubstring("[]"))
})
It("consults the unloader (NATS) for a backend node", func() {
ctx := context.Background()
node := &nodes.BackendNode{Name: "backend-1", NodeType: nodes.NodeTypeBackend, Address: "10.0.0.1:50051"}
Expect(registry.Register(ctx, node, true)).To(Succeed())
stub := &stubNodeCommandSender{}
rec := callEndpoint(stub, node.ID)
Expect(rec.Code).To(Equal(http.StatusOK))
Expect(stub.listBackendsCalled).To(BeTrue(),
"backend nodes must still be queried over NATS")
var list []messaging.NodeBackendInfo
Expect(json.Unmarshal(rec.Body.Bytes(), &list)).To(Succeed())
Expect(list).To(HaveLen(1))
Expect(list[0].Name).To(Equal("llama-cpp"))
})
})


@@ -3,6 +3,7 @@ package openresponses
 import (
 	"context"
 	"encoding/json"
+	"errors"
 	"fmt"
 	"time"
@@ -10,6 +11,7 @@ import (
 	"github.com/labstack/echo/v4"
 	"github.com/mudler/LocalAI/core/backend"
 	"github.com/mudler/LocalAI/core/config"
+	"github.com/mudler/LocalAI/core/http/auth"
 	mcpTools "github.com/mudler/LocalAI/core/http/endpoints/mcp"
 	openaiEndpoint "github.com/mudler/LocalAI/core/http/endpoints/openai"
 	"github.com/mudler/LocalAI/core/http/middleware"
@@ -246,8 +248,11 @@ func ResponsesEndpoint(cl *config.ModelConfigLoader, ml *model.ModelLoader, eval
 		// Create cancellable context for background execution
 		bgCtx, bgCancel := context.WithCancel(context.Background())
-		// Store the background response
+		// Store the background response and stamp its owner before the ID
+		// is returned to the client, so later GET/cancel/resume can verify
+		// the caller owns it.
 		store.StoreBackground(responseID, input, queuedResponse, bgCancel, input.Stream)
+		store.SetOwner(responseID, ownerFromContext(c))
 		// Start background processing goroutine
 		go func() {
@@ -1587,6 +1592,7 @@ func handleOpenResponsesNonStream(c echo.Context, responseID string, createdAt i
 	if shouldStore {
 		store := GetGlobalStore()
 		store.Store(responseID, input, response)
+		store.SetOwner(responseID, ownerFromContext(c))
 	}
 	return c.JSON(200, response)
@@ -2322,6 +2328,7 @@ func handleOpenResponsesStream(c echo.Context, responseID string, createdAt int6
 	if shouldStore {
 		store := GetGlobalStore()
 		store.Store(responseID, input, responseCompleted)
+		store.SetOwner(responseID, ownerFromContext(c))
 	}
 	// Send [DONE]
@@ -2966,6 +2973,18 @@ func convertORToolsToOpenAIFormat(orTools []schema.ORFunctionTool) []functions.T
 	return result
 }
+// ownerFromContext returns the identity (user ID) of the authenticated
+// caller, or empty string when no authentication was performed (single-key /
+// no-auth deployments). It is the value stamped on a response at creation and
+// compared on read/cancel/resume to prevent one caller from accessing
+// another's response by guessing its ID.
+func ownerFromContext(c echo.Context) string {
+	if u := auth.GetUser(c); u != nil {
+		return u.ID
+	}
+	return ""
+}
 // GetResponseEndpoint returns a handler for GET /responses/:id
 // This endpoint is used for polling background responses or resuming streaming
 // @Summary Get a response by ID
@@ -2991,6 +3010,12 @@ func GetResponseEndpoint() func(c echo.Context) error {
 			return sendOpenResponsesError(c, 404, "not_found", fmt.Sprintf("response not found: %s", responseID), "id")
 		}
+		// Enforce response ownership. Return 404 (not 403) on mismatch so the
+		// existence of another caller's response is not leaked.
+		if !accessAllowed(stored, ownerFromContext(c)) {
+			return sendOpenResponsesError(c, 404, "not_found", fmt.Sprintf("response not found: %s", responseID), "id")
+		}
 		// Check if streaming resume is requested
 		streamParam := c.QueryParam("stream")
 		if streamParam == "true" {
@@ -3022,16 +3047,21 @@ func GetResponseEndpoint() func(c echo.Context) error {
 // handleStreamResume handles resuming a streaming response from a specific sequence number
 func handleStreamResume(c echo.Context, store *ResponseStore, responseID string, stored *StoredResponse, startingAfter int) error {
+	// Fetch buffered events before committing to an SSE response so an
+	// offset-lost gap can be reported as a clean HTTP status rather than a
+	// silently truncated event stream.
+	events, err := store.GetEventsAfter(responseID, startingAfter)
+	if err != nil {
+		if errors.Is(err, ErrOffsetLost) {
+			return sendOpenResponsesError(c, 409, "invalid_request_error", fmt.Sprintf("starting_after=%d is older than the oldest retained event; the resume buffer evicted those events and the stream cannot be resumed from that point", startingAfter), "starting_after")
+		}
+		return sendOpenResponsesError(c, 500, "server_error", fmt.Sprintf("failed to get events: %v", err), "")
+	}
 	c.Response().Header().Set("Content-Type", "text/event-stream")
 	c.Response().Header().Set("Cache-Control", "no-cache")
 	c.Response().Header().Set("Connection", "keep-alive")
-	// Get buffered events after the starting point
-	events, err := store.GetEventsAfter(responseID, startingAfter)
-	if err != nil {
-		return sendOpenResponsesError(c, 500, "server_error", fmt.Sprintf("failed to get events: %v", err), "")
-	}
 	// Send all buffered events
 	for _, event := range events {
 		fmt.Fprintf(c.Response().Writer, "event: %s\ndata: %s\n\n", event.EventType, string(event.Data))
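The reordering in `handleStreamResume` (fetch first, set SSE headers second) can be illustrated with the standard library alone. This is a sketch under stated assumptions: `fetchEvents`, its fixed watermark of 14, and the `resumeStatus` helper are hypothetical, and plain `net/http` is used in place of echo.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// errOffsetLost is a stand-in for the ErrOffsetLost sentinel in the diff.
var errOffsetLost = errors.New("resume offset lost")

// fetchEvents is a hypothetical event source that pretends events 0..14 were
// already evicted and only events 15 and 16 are still retained.
func fetchEvents(startingAfter int) ([]string, error) {
	const droppedThrough = 14
	if startingAfter < droppedThrough {
		return nil, errOffsetLost
	}
	var out []string
	for seq := 15; seq <= 16; seq++ {
		if seq > startingAfter {
			out = append(out, fmt.Sprintf("delta-%d", seq))
		}
	}
	return out, nil
}

// resume resolves every error before any SSE header is written, so a lost
// offset becomes a plain HTTP 409 instead of a truncated event stream.
func resume(w http.ResponseWriter, startingAfter int) {
	events, err := fetchEvents(startingAfter)
	if err != nil {
		status := http.StatusInternalServerError
		if errors.Is(err, errOffsetLost) {
			status = http.StatusConflict
		}
		http.Error(w, err.Error(), status)
		return
	}
	w.Header().Set("Content-Type", "text/event-stream")
	w.Header().Set("Cache-Control", "no-cache")
	for _, e := range events {
		fmt.Fprintf(w, "event: delta\ndata: %s\n\n", e)
	}
}

// resumeStatus runs resume against an in-memory recorder and reports the
// status code and Content-Type the client would observe.
func resumeStatus(startingAfter int) (int, string) {
	rec := httptest.NewRecorder()
	resume(rec, startingAfter)
	return rec.Code, rec.Header().Get("Content-Type")
}

func main() {
	for _, after := range []int{3, 14} {
		code, contentType := resumeStatus(after)
		fmt.Println(after, code, contentType)
	}
}
```

Once the `text/event-stream` headers and the first bytes have been sent, the status code can no longer be changed, which is why the lookup has to happen first.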
@@ -3126,6 +3156,17 @@ func CancelResponseEndpoint() func(c echo.Context) error {
 		}
 		store := GetGlobalStore()
+		// Look up first so ownership can be checked before any mutation.
+		stored, err := store.Get(responseID)
+		if err != nil {
+			return sendOpenResponsesError(c, 404, "not_found", fmt.Sprintf("response not found: %s", responseID), "id")
+		}
+		// Return 404 (not 403) on owner mismatch so existence is not leaked.
+		if !accessAllowed(stored, ownerFromContext(c)) {
+			return sendOpenResponsesError(c, 404, "not_found", fmt.Sprintf("response not found: %s", responseID), "id")
+		}
 		response, err := store.Cancel(responseID)
 		if err != nil {
 			return sendOpenResponsesError(c, 404, "not_found", fmt.Sprintf("response not found: %s", responseID), "id")
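The ownership rule applied in the GET and cancel handlers comes down to one predicate plus the decision to answer 404 in both failure cases. A minimal sketch, assuming a plain map as the store: `storedResponse` and `lookupStatus` are hypothetical simplifications, while `accessAllowed` follows the predicate named in the diff.

```go
package main

import "fmt"

// storedResponse is a simplified, hypothetical stand-in for StoredResponse.
type storedResponse struct {
	ID    string
	Owner string // empty means "no owner" (single-key / no-auth deployments)
}

// accessAllowed follows the predicate from the diff: an unowned response is
// open to anyone, otherwise the caller must match the recorded owner.
func accessAllowed(stored *storedResponse, callerID string) bool {
	return stored.Owner == "" || stored.Owner == callerID
}

// lookupStatus returns the HTTP status a handler would send. Unknown IDs and
// IDs owned by someone else both map to 404, so a caller cannot tell them apart.
func lookupStatus(store map[string]*storedResponse, id, callerID string) int {
	stored, ok := store[id]
	if !ok || !accessAllowed(stored, callerID) {
		return 404
	}
	return 200
}

func main() {
	store := map[string]*storedResponse{
		"resp_owned":  {ID: "resp_owned", Owner: "userA"},
		"resp_legacy": {ID: "resp_legacy"},
	}
	fmt.Println(lookupStatus(store, "resp_owned", "userA"))
	fmt.Println(lookupStatus(store, "resp_owned", "userB"))
	fmt.Println(lookupStatus(store, "resp_missing", "userB"))
	fmt.Println(lookupStatus(store, "resp_legacy", ""))
}
```

A caller with an empty identity is denied access to an owned response, because the empty string only passes when the response itself has no owner.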


@@ -3,6 +3,7 @@ package openresponses
 import (
 	"context"
 	"encoding/json"
+	"errors"
 	"fmt"
 	"sync"
 	"time"
@@ -11,6 +12,30 @@ import (
 	"github.com/mudler/xlog"
 )
+const (
+	// defaultMaxStreamEvents bounds how many resume-buffer events a single
+	// background response retains. Without a cap, a long-running or abandoned
+	// background generation grows StreamEvents without limit and can exhaust
+	// process memory. When the cap is exceeded the oldest events are evicted
+	// from the front (see AppendEvent). Mirrors llama.cpp's byte-capped slot
+	// ring used for resumable /slots state.
+	defaultMaxStreamEvents = 8192
+	// defaultMaxStreamBytes caps the total serialized size of retained
+	// resume-buffer events, evicting oldest-first when exceeded. This guards
+	// against a handful of very large events defeating the count cap. 0
+	// disables the byte cap (count cap still applies).
+	defaultMaxStreamBytes = 64 << 20 // 64 MiB
+)
+// ErrOffsetLost is returned by GetEventsAfter when the requested
+// starting_after sequence number is older than the oldest event still
+// retained in the resume buffer (i.e. the events between the requested
+// offset and the current watermark were evicted by the cap). Callers should
+// surface this to clients as a distinct error instead of silently returning
+// a truncated stream that omits the dropped events.
+var ErrOffsetLost = errors.New("resume offset lost: requested events were evicted from the buffer")
 // ResponseStore provides thread-safe storage for Open Responses API responses
 type ResponseStore struct {
 	mu sync.RWMutex
@@ -18,6 +43,12 @@ type ResponseStore struct {
 	ttl time.Duration // Time-to-live for stored responses (0 = no expiration)
 	cleanupCtx context.Context
 	cleanupCancel context.CancelFunc
+	// maxStreamEvents / maxStreamBytes bound the per-response resume buffer.
+	// Set once at construction from the default constants; tests may lower
+	// them. A value <= 0 disables that particular cap.
+	maxStreamEvents int
+	maxStreamBytes int
 }
 // StreamedEvent represents a buffered SSE event for streaming resume
@@ -35,6 +66,12 @@ type StoredResponse struct {
 	StoredAt time.Time
 	ExpiresAt *time.Time // nil if no expiration
+	// Owner is the identity (user ID) that created this response. It is set
+	// once at creation and never mutated, so it can be read without holding
+	// mu. Empty means "no owner" (single-key / no-auth deployments), in which
+	// case ownership checks are skipped for backward compatibility.
+	Owner string
 	// Background execution support
 	CancelFunc context.CancelFunc // For cancellation of background tasks
 	StreamEvents []StreamedEvent // Buffered events for streaming resume
@@ -42,6 +79,14 @@ type StoredResponse struct {
 	IsBackground bool // Was created with background=true
 	EventsChan chan struct{} // Signals new events for live subscribers
 	mu sync.RWMutex // Protect concurrent access to this response
+	// streamBytes tracks the total serialized size of the events currently
+	// retained in StreamEvents, used to enforce the byte cap. droppedThrough
+	// is the highest sequence number evicted from the front of the buffer
+	// (-1 = nothing evicted); it is the watermark GetEventsAfter compares
+	// against to detect a lost resume offset. Both are guarded by mu.
+	streamBytes int
+	droppedThrough int
 }
 var getGlobalStore = sync.OnceValue(func() *ResponseStore {
@@ -81,8 +126,10 @@ func (s *ResponseStore) SetTTL(ttl time.Duration) {
 // If ttl is 0, responses are stored indefinitely
 func NewResponseStore(ttl time.Duration) *ResponseStore {
 	store := &ResponseStore{
-		responses: make(map[string]*StoredResponse),
-		ttl:       ttl,
+		responses:       make(map[string]*StoredResponse),
+		ttl:             ttl,
+		maxStreamEvents: defaultMaxStreamEvents,
+		maxStreamBytes:  defaultMaxStreamBytes,
 	}
 	// Start cleanup goroutine if TTL is set
@@ -109,11 +156,12 @@ func (s *ResponseStore) Store(responseID string, request *schema.OpenResponsesRe
 	}
 	stored := &StoredResponse{
-		Request:   request,
-		Response:  response,
-		Items:     items,
-		StoredAt:  time.Now(),
-		ExpiresAt: nil,
+		Request:        request,
+		Response:       response,
+		Items:          items,
+		StoredAt:       time.Now(),
+		ExpiresAt:      nil,
+		droppedThrough: -1,
 	}
 	// Set expiration if TTL is configured
@@ -256,16 +304,17 @@ func (s *ResponseStore) StoreBackground(responseID string, request *schema.OpenR
 	}
 	stored := &StoredResponse{
-		Request:       request,
-		Response:      response,
-		Items:         items,
-		StoredAt:      time.Now(),
-		ExpiresAt:     nil,
-		CancelFunc:    cancelFunc,
-		StreamEvents:  []StreamedEvent{},
-		StreamEnabled: streamEnabled,
-		IsBackground:  true,
-		EventsChan:    make(chan struct{}, 100), // Buffered channel for event notifications
+		Request:        request,
+		Response:       response,
+		Items:          items,
+		StoredAt:       time.Now(),
+		ExpiresAt:      nil,
+		CancelFunc:     cancelFunc,
+		StreamEvents:   []StreamedEvent{},
+		StreamEnabled:  streamEnabled,
+		IsBackground:   true,
+		EventsChan:     make(chan struct{}, 100), // Buffered channel for event notifications
+		droppedThrough: -1,
 	}
 	// Set expiration if TTL is configured
@@ -349,6 +398,25 @@ func (s *ResponseStore) AppendEvent(responseID string, event *schema.ORStreamEve
 		EventType: event.Type,
 		Data: data,
 	})
+	stored.streamBytes += len(data)
+	// Evict oldest events from the front once either cap is exceeded. The
+	// byte cap never evicts the only remaining event (a single oversized
+	// event is still served once). Each eviction advances droppedThrough so
+	// a later resume below the watermark is reported as ErrOffsetLost rather
+	// than silently skipping the dropped events.
+	for (s.maxStreamEvents > 0 && len(stored.StreamEvents) > s.maxStreamEvents) ||
+		(s.maxStreamBytes > 0 && stored.streamBytes > s.maxStreamBytes && len(stored.StreamEvents) > 1) {
+		evicted := stored.StreamEvents[0]
+		stored.streamBytes -= len(evicted.Data)
+		if evicted.SequenceNumber > stored.droppedThrough {
+			stored.droppedThrough = evicted.SequenceNumber
+		}
+		// Release the evicted payload so it can be GC'd even though the
+		// backing array element is still owned by the slice until reuse.
+		stored.StreamEvents[0].Data = nil
+		stored.StreamEvents = stored.StreamEvents[1:]
+	}
 	stored.mu.Unlock()
 	// Notify any subscribers of new event
@@ -374,6 +442,14 @@ func (s *ResponseStore) GetEventsAfter(responseID string, startingAfter int) ([]
 	stored.mu.RLock()
 	defer stored.mu.RUnlock()
+	// If the requested offset is older than the watermark, the events the
+	// client expects next (those in (startingAfter, droppedThrough]) were
+	// evicted by the cap. Signal the gap rather than returning a stream that
+	// silently skips them.
+	if startingAfter < stored.droppedThrough {
+		return nil, ErrOffsetLost
+	}
 	var result []StreamedEvent
 	for _, event := range stored.StreamEvents {
 		if event.SequenceNumber > startingAfter {
@@ -447,3 +523,30 @@ func (s *ResponseStore) IsStreamEnabled(responseID string) (bool, error) {
 	return stored.StreamEnabled, nil
 }
+// SetOwner records the identity that owns a stored response. It is called
+// once, right after the response is stored and before its ID is handed back
+// to any client, so no lock on the stored response is required. A no-op for
+// an empty owner or unknown response ID.
+func (s *ResponseStore) SetOwner(responseID, owner string) {
+	if owner == "" {
+		return
+	}
+	s.mu.RLock()
+	stored, exists := s.responses[responseID]
+	s.mu.RUnlock()
+	if !exists {
+		return
+	}
+	stored.Owner = owner
+}
+// accessAllowed reports whether a caller identified by callerID may read or
+// mutate the given stored response. An empty owner (single-key / no-auth
+// deployments) is accessible by anyone, preserving backward compatibility;
+// otherwise the caller identity must match the recorded owner.
+func accessAllowed(stored *StoredResponse, callerID string) bool {
+	return stored.Owner == "" || stored.Owner == callerID
+}
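The eviction loop in `AppendEvent` and the watermark check in `GetEventsAfter` work as a pair, which a standalone sketch makes easier to follow. The `resumeBuffer` type and its `push`/`after` methods are hypothetical names and omit the locking of the real store; the cap conditions and the `droppedThrough` comparison follow the hunks above.

```go
package main

import (
	"errors"
	"fmt"
)

// errOffsetLost is a stand-in for the ErrOffsetLost sentinel in the diff.
var errOffsetLost = errors.New("resume offset lost")

type event struct {
	Seq  int
	Data []byte
}

// resumeBuffer is a hypothetical, single-goroutine reduction of the
// per-response resume buffer (no mutex, no notification channel).
type resumeBuffer struct {
	events         []event
	bytes          int // total len(Data) of retained events
	maxEvents      int // <= 0 disables the count cap
	maxBytes       int // <= 0 disables the byte cap
	droppedThrough int // highest evicted sequence number, -1 if none
}

func newResumeBuffer(maxEvents, maxBytes int) *resumeBuffer {
	return &resumeBuffer{maxEvents: maxEvents, maxBytes: maxBytes, droppedThrough: -1}
}

// push appends e, then evicts from the front while either cap is exceeded.
// The byte cap never evicts the last remaining event.
func (b *resumeBuffer) push(e event) {
	b.events = append(b.events, e)
	b.bytes += len(e.Data)
	for (b.maxEvents > 0 && len(b.events) > b.maxEvents) ||
		(b.maxBytes > 0 && b.bytes > b.maxBytes && len(b.events) > 1) {
		old := b.events[0]
		b.bytes -= len(old.Data)
		if old.Seq > b.droppedThrough {
			b.droppedThrough = old.Seq
		}
		b.events[0].Data = nil // release the payload for the garbage collector
		b.events = b.events[1:]
	}
}

// after returns events with Seq > startingAfter, or errOffsetLost when some
// of the events the caller expects next were already evicted.
func (b *resumeBuffer) after(startingAfter int) ([]event, error) {
	if startingAfter < b.droppedThrough {
		return nil, errOffsetLost
	}
	var out []event
	for _, e := range b.events {
		if e.Seq > startingAfter {
			out = append(out, e)
		}
	}
	return out, nil
}

func main() {
	buf := newResumeBuffer(5, 0) // count cap only, as in the test hunk
	for i := 0; i < 20; i++ {
		buf.push(event{Seq: i, Data: []byte("x")})
	}
	got, err := buf.after(14)
	fmt.Println(len(got), err, buf.droppedThrough)
	_, err = buf.after(0)
	fmt.Println(errors.Is(err, errOffsetLost))
}
```

The comparison is strict (`<`), so resuming from exactly the watermark is still valid: the next expected event is the first one retained.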


@@ -585,6 +585,86 @@ var _ = Describe("ResponseStore", func() {
 			Expect(enabled2).To(BeFalse())
 		})
It("should bound the resume buffer and evict oldest events past the cap", func() {
// Lower the caps so the test stays fast; production defaults are
// large. Same-package access to the unexported fields is fine.
store.maxStreamEvents = 5
store.maxStreamBytes = 0 // count cap only for this test
responseID := "resp_buffer_cap"
request := &schema.OpenResponsesRequest{Model: "test"}
response := &schema.ORResponseResource{
ID: responseID,
Object: "response",
Status: schema.ORStatusInProgress,
}
_, cancel := context.WithCancel(context.Background())
defer cancel()
store.StoreBackground(responseID, request, response, cancel, true)
// Append well past the cap.
const total = 20
for i := range total {
err := store.AppendEvent(responseID, &schema.ORStreamEvent{
Type: "response.output_text.delta",
SequenceNumber: i,
})
Expect(err).ToNot(HaveOccurred())
}
stored, err := store.Get(responseID)
Expect(err).ToNot(HaveOccurred())
// (a) Buffer length stays bounded by the cap.
Expect(len(stored.StreamEvents)).To(Equal(5))
// (b) Oldest events were evicted: only the last 5 sequence numbers
// remain (15..19).
Expect(stored.StreamEvents[0].SequenceNumber).To(Equal(15))
Expect(stored.StreamEvents[len(stored.StreamEvents)-1].SequenceNumber).To(Equal(19))
// Asking for events after the watermark (seq 14, the newest evicted event) still works.
retained, err := store.GetEventsAfter(responseID, 14)
Expect(err).ToNot(HaveOccurred())
Expect(retained).To(HaveLen(5))
// (c) Asking below the dropped watermark returns ErrOffsetLost.
_, err = store.GetEventsAfter(responseID, 0)
Expect(err).To(MatchError(ErrOffsetLost))
_, err = store.GetEventsAfter(responseID, -1)
Expect(err).To(MatchError(ErrOffsetLost))
})
It("should record and enforce response ownership", func() {
responseID := "resp_owner_test"
request := &schema.OpenResponsesRequest{Model: "test"}
response := &schema.ORResponseResource{ID: responseID, Object: "response", Status: schema.ORStatusCompleted}
store.Store(responseID, request, response)
store.SetOwner(responseID, "userA")
stored, err := store.Get(responseID)
Expect(err).ToNot(HaveOccurred())
Expect(stored.Owner).To(Equal("userA"))
// Owner matches -> allowed; different identity -> denied.
Expect(accessAllowed(stored, "userA")).To(BeTrue())
Expect(accessAllowed(stored, "userB")).To(BeFalse())
// Backward compatibility: a response with no owner is accessible
// by any caller (single-key / no-auth deployments).
noOwnerID := "resp_no_owner"
store.Store(noOwnerID, request, &schema.ORResponseResource{ID: noOwnerID, Object: "response"})
noOwner, err := store.Get(noOwnerID)
Expect(err).ToNot(HaveOccurred())
Expect(noOwner.Owner).To(BeEmpty())
Expect(accessAllowed(noOwner, "anyone")).To(BeTrue())
Expect(accessAllowed(noOwner, "")).To(BeTrue())
})
It("should notify subscribers of new events", func() {
responseID := "resp_events_chan"
request := &schema.OpenResponsesRequest{Model: "test"}
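
The eviction behaviour exercised by the buffer-cap test can be sketched on its own. This is a simplified model under stated assumptions, not the store's implementation: it caps by event count only (no byte cap), and `eventBuffer`, `add`, `eventsAfter`, and the initial watermark of -1 are hypothetical choices made for the illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// errOffsetLost plays the role of ErrOffsetLost in the diff.
var errOffsetLost = errors.New("requested offset was evicted from the resume buffer")

type event struct {
	Seq int
}

// eventBuffer keeps at most maxEvents events and remembers the highest
// sequence number it has evicted.
type eventBuffer struct {
	maxEvents      int
	events         []event
	droppedThrough int // -1 means nothing evicted yet (an assumption of this sketch)
}

func newEventBuffer(maxEvents int) *eventBuffer {
	return &eventBuffer{maxEvents: maxEvents, droppedThrough: -1}
}

// add appends an event, then evicts from the front until the cap holds,
// advancing the watermark past every evicted event.
func (b *eventBuffer) add(e event) {
	b.events = append(b.events, e)
	for b.maxEvents > 0 && len(b.events) > b.maxEvents {
		b.droppedThrough = b.events[0].Seq
		b.events = b.events[1:]
	}
}

// eventsAfter returns the events with Seq > startingAfter, or errOffsetLost
// when some of the events the caller expects next were already evicted.
func (b *eventBuffer) eventsAfter(startingAfter int) ([]event, error) {
	if startingAfter < b.droppedThrough {
		return nil, errOffsetLost
	}
	var out []event
	for _, e := range b.events {
		if e.Seq > startingAfter {
			out = append(out, e)
		}
	}
	return out, nil
}

func main() {
	buf := newEventBuffer(5)
	for i := 0; i < 20; i++ {
		buf.add(event{Seq: i})
	}
	retained, err := buf.eventsAfter(14)
	fmt.Println(len(retained), err) // 5 <nil>
	_, err = buf.eventsAfter(0)
	fmt.Println(errors.Is(err, errOffsetLost)) // true
}
```

With a cap of 5 and 20 appended events, the watermark ends at 14, so resuming from 14 is still exact while resuming from anything lower would silently skip events and is rejected instead.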


@@ -88,7 +88,7 @@ func RegisterNodeAdminRoutes(e *echo.Echo, registry *nodes.NodeRegistry, unloade
admin.POST("/:id/approve", localai.ApproveNodeEndpoint(registry, authDB, hmacSecret, natsCfg))
// Backend management on workers
-admin.GET("/:id/backends", localai.ListBackendsOnNodeEndpoint(unloader))
+admin.GET("/:id/backends", localai.ListBackendsOnNodeEndpoint(unloader, registry))
admin.POST("/:id/backends/install", localai.InstallBackendOnNodeEndpoint(unloader, galleryService, opcache, appConfig))
admin.POST("/:id/backends/delete", localai.DeleteBackendOnNodeEndpoint(unloader))


@@ -30,6 +30,8 @@ import (
mcpTools "github.com/mudler/LocalAI/core/http/endpoints/mcp"
"github.com/mudler/LocalAI/core/schema"
"github.com/mudler/LocalAI/core/services/jobs"
"github.com/mudler/LocalAI/core/services/messaging"
"github.com/mudler/LocalAI/core/services/syncstate"
"github.com/mudler/LocalAI/core/templates"
"github.com/mudler/LocalAI/pkg/httpclient"
"github.com/mudler/LocalAI/pkg/model"
@@ -43,8 +45,18 @@ type AgentJobService struct {
configLoader *config.ModelConfigLoader
evaluator *templates.Evaluator
// tasks is the cross-replica task store: an in-memory map kept consistent
// across replicas via NATS, with read-through to the configured persister
// (file in standalone, PostgreSQL in distributed). Unlike jobs - which already
// converge via the dispatcher + DB read-through - tasks previously read
// in-memory only, so ListTasks went stale on non-originating replicas.
tasks *syncstate.SyncedMap[string, schema.Task]
// taskNats is the distributed NATS client backing the tasks SyncedMap. It is
// not available at construction time, so it is injected via SetTaskSyncNATS
// during distributed wiring; nil keeps tasks in-memory-only (standalone).
taskNats messaging.MessagingClient
// Storage (in-memory primary, persister for secondary persistence)
-tasks *xsync.SyncedMap[string, schema.Task]
jobs *xsync.SyncedMap[string, schema.Job]
persister JobPersister
userID string // Scoping: empty for global (main service), set for per-user instances
@@ -96,6 +108,31 @@ func (s *AgentJobService) SetDistributedJobStore(store *jobs.JobStore) {
s.persister = &dbJobPersister{store: store}
}
// SetTaskSyncNATS wires the distributed NATS client used to keep agent *tasks*
// consistent across replicas (jobs already converge via the dispatcher + DB
// read-through, so they are left untouched). The client is not available when the
// service is constructed, so it is injected here during distributed wiring and the
// tasks SyncedMap is rebuilt to pick it up. It is always called before Start /
// hydrate, while the map is still empty, so rebuilding loses no state. Passing nil
// (standalone) keeps the map in-memory-only with no broadcast.
func (s *AgentJobService) SetTaskSyncNATS(nats messaging.MessagingClient) {
s.taskNats = nats
s.buildTasksMap()
}
// buildTasksMap (re)constructs the cross-replica tasks SyncedMap from the current
// taskNats. The Store adapter reads s.persister/s.userID live, so a persister swap
// (SetDistributedJobStore) needs no rebuild; only the NATS client, fixed at
// New-time, forces one - hence SetTaskSyncNATS calls this.
func (s *AgentJobService) buildTasksMap() {
s.tasks = syncstate.New(syncstate.Config[string, schema.Task]{
Name: "agent.tasks",
Key: func(t schema.Task) string { return t.ID },
Nats: s.taskNats,
Store: &taskStoreAdapter{svc: s},
})
}
// Dispatcher returns the distributed dispatcher (nil if not in distributed mode).
func (s *AgentJobService) Dispatcher() DistributedDispatcher {
return s.dispatcher
@@ -106,13 +143,6 @@ func (s *AgentJobService) DBStore() *jobs.JobStore {
return s.rawDBStore
}
-// saveTasks persists tasks via the configured persister (file or DB).
-func (s *AgentJobService) saveTasks(task schema.Task) {
-if err := s.persister.SaveTask(s.userID, task); err != nil {
-xlog.Warn("Failed to persist task", "error", err, "task_id", task.ID)
-}
-}
// saveJobs persists jobs via the configured persister (file or DB).
func (s *AgentJobService) saveJobs(job schema.Job) {
if err := s.persister.SaveJob(s.userID, job); err != nil {
@@ -129,18 +159,8 @@ func (s *AgentJobService) LoadFromDB() {
// loadFromPersister loads tasks and jobs from the configured persister into memory.
func (s *AgentJobService) loadFromPersister() {
-if tasks, err := s.persister.LoadTasks(s.userID); err != nil {
+if err := s.hydrateTasks(s.appConfig.Context); err != nil {
xlog.Warn("Failed to load tasks from persister", "error", err)
-} else {
-for _, task := range tasks {
-s.tasks.Set(task.ID, task)
-if task.Enabled && task.Cron != "" {
-if err := s.ScheduleCronTask(task); err != nil {
-xlog.Warn("Failed to schedule cron task on load", "error", err, "task_id", task.ID)
-}
-}
-}
-xlog.Info("Loaded tasks from persister", "count", len(tasks))
}
if loadedJobs, err := s.persister.LoadJobs(s.userID); err != nil {
@@ -153,6 +173,27 @@ func (s *AgentJobService) loadFromPersister() {
}
}
// hydrateTasks loads tasks into the cross-replica SyncedMap and (re)schedules
// cron entries for enabled tasks. Hydration goes through the SyncedMap's Store
// read-through (Start), not Set, so it neither re-persists nor re-broadcasts the
// loaded tasks. Each service instance hydrates exactly once: the main service via
// Start -> loadFromPersister, per-user services via LoadFromDB or LoadTasksFromFile.
func (s *AgentJobService) hydrateTasks(ctx context.Context) error {
if err := s.tasks.Start(ctx); err != nil {
return err
}
tasks := s.tasks.List()
for _, task := range tasks {
if task.Enabled && task.Cron != "" {
if err := s.ScheduleCronTask(task); err != nil {
xlog.Warn("Failed to schedule cron task on load", "error", err, "task_id", task.ID)
}
}
}
xlog.Info("Loaded tasks from persister", "count", len(tasks))
return nil
}
// JobExecution represents a job to be executed
type JobExecution struct {
Job schema.Job
@@ -200,21 +241,19 @@ func NewAgentJobServiceWithPaths(
) *AgentJobService {
retentionDays := cmp.Or(appConfig.AgentJobRetentionDays, 30)
-tasks := xsync.NewSyncedMap[string, schema.Task]()
jobsMap := xsync.NewSyncedMap[string, schema.Job]()
-return &AgentJobService{
+s := &AgentJobService{
appConfig: appConfig,
modelLoader: modelLoader,
configLoader: configLoader,
evaluator: evaluator,
-tasks: tasks,
jobs: jobsMap,
persister: &fileJobPersister{
-tasks: tasks,
jobs: jobsMap,
tasksFile: tasksFile,
jobsFile: jobsFile,
+taskSet: make(map[string]schema.Task),
},
jobQueue: make(chan JobExecution, 100), // Buffer for 100 jobs
cancellations: xsync.NewSyncedMap[string, context.CancelFunc](),
@@ -222,25 +261,17 @@ func NewAgentJobServiceWithPaths(
cronEntries: xsync.NewSyncedMap[string, cron.EntryID](),
retentionDays: retentionDays,
}
+// Build the cross-replica tasks map standalone (nil NATS); SetTaskSyncNATS
+// rebuilds it with the distributed client once that is available, before Start.
+s.buildTasksMap()
+return s
}
// LoadTasksFromFile loads tasks from the persister into the in-memory map
// and schedules cron entries. Named "FromFile" for backward compat; in DB
// mode it loads from the database.
func (s *AgentJobService) LoadTasksFromFile() error {
-tasks, err := s.persister.LoadTasks(s.userID)
-if err != nil {
-return err
-}
-for _, task := range tasks {
-s.tasks.Set(task.ID, task)
-if task.Enabled && task.Cron != "" {
-if err := s.ScheduleCronTask(task); err != nil {
-xlog.Warn("Failed to schedule cron task on load", "error", err, "task_id", task.ID)
-}
-}
-}
-return nil
+return s.hydrateTasks(s.appConfig.Context)
}
// SaveTasksToFile flushes the current tasks map via the persister. File
@@ -293,8 +324,12 @@ func (s *AgentJobService) CreateTask(task schema.Task) (string, error) {
task.Enabled = true // Default to enabled
}
-// Store task
-s.tasks.Set(id, task)
+// Store task: Set updates the in-memory map, write-throughs to the persister
+// (file or DB), and broadcasts the create to peer replicas. Background ctx
+// because CreateTask carries no request ctx (mirrors the finetune service).
+if err := s.tasks.Set(context.Background(), task); err != nil {
+return "", fmt.Errorf("failed to persist task: %w", err)
+}
// Schedule cron if enabled and has cron expression
if task.Enabled && task.Cron != "" {
@@ -303,16 +338,15 @@ func (s *AgentJobService) CreateTask(task schema.Task) (string, error) {
}
}
-s.saveTasks(task)
return id, nil
}
// UpdateTask updates an existing task
func (s *AgentJobService) UpdateTask(id string, task schema.Task) error {
-if !s.tasks.Exists(id) {
+existing, ok := s.tasks.Get(id)
+if !ok {
return fmt.Errorf("%w: %s", ErrTaskNotFound, id)
}
-existing := s.tasks.Get(id)
// Preserve ID and CreatedAt
task.ID = id
@@ -324,8 +358,10 @@ func (s *AgentJobService) UpdateTask(id string, task schema.Task) error {
s.UnscheduleCronTask(id)
}
-// Store updated task
-s.tasks.Set(id, task)
+// Store updated task: write-through + broadcast (see CreateTask).
+if err := s.tasks.Set(context.Background(), task); err != nil {
+return fmt.Errorf("failed to persist task: %w", err)
+}
// Schedule new cron if enabled and has cron expression
if task.Enabled && task.Cron != "" {
@@ -334,24 +370,22 @@ func (s *AgentJobService) UpdateTask(id string, task schema.Task) error {
}
}
-s.saveTasks(task)
return nil
}
// DeleteTask deletes a task
func (s *AgentJobService) DeleteTask(id string) error {
-if !s.tasks.Exists(id) {
+if _, ok := s.tasks.Get(id); !ok {
return fmt.Errorf("%w: %s", ErrTaskNotFound, id)
}
// Unschedule cron
s.UnscheduleCronTask(id)
-// Remove from memory
-s.tasks.Delete(id)
-if err := s.persister.DeleteTask(id); err != nil {
-xlog.Warn("Failed to delete task from persister", "error", err, "task_id", id)
+// Delete removes from the in-memory map, deletes from the persister, and
+// broadcasts the removal to peer replicas.
+if err := s.tasks.Delete(context.Background(), id); err != nil {
+xlog.Warn("Failed to delete task from store", "error", err, "task_id", id)
}
return nil
@@ -359,8 +393,8 @@ func (s *AgentJobService) DeleteTask(id string) error {
// GetTask retrieves a task by ID
func (s *AgentJobService) GetTask(id string) (*schema.Task, error) {
-task := s.tasks.Get(id)
-if task.ID == "" {
+task, ok := s.tasks.Get(id)
+if !ok {
return nil, fmt.Errorf("%w: %s", ErrTaskNotFound, id)
}
return &task, nil
@@ -368,7 +402,7 @@ func (s *AgentJobService) GetTask(id string) (*schema.Task, error) {
// ListTasks returns all tasks, sorted by creation date (newest first)
func (s *AgentJobService) ListTasks() []schema.Task {
-tasks := s.tasks.Values()
+tasks := s.tasks.List()
// Sort by CreatedAt descending (newest first), then by Name for stability
slices.SortFunc(tasks, func(a, b schema.Task) int {
if a.CreatedAt.Equal(b.CreatedAt) {
@@ -397,8 +431,8 @@ func (s *AgentJobService) buildPrompt(templateStr string, params map[string]stri
// ExecuteJob creates and queues a job for execution
// multimedia can be nil for backward compatibility
func (s *AgentJobService) ExecuteJob(taskID string, params map[string]string, triggeredBy string, multimedia *schema.MultimediaAttachment) (string, error) {
-task := s.tasks.Get(taskID)
-if task.ID == "" {
+task, ok := s.tasks.Get(taskID)
+if !ok {
return "", fmt.Errorf("%w: %s", ErrTaskNotFound, taskID)
}
@@ -1451,6 +1485,12 @@ func (s *AgentJobService) Stop() error {
if s.cronScheduler != nil {
s.cronScheduler.Stop()
}
// Release the tasks SyncedMap subscription / background workers.
if s.tasks != nil {
if err := s.tasks.Close(); err != nil {
xlog.Warn("Error closing tasks sync map", "error", err)
}
}
xlog.Info("AgentJobService stopped")
return nil
}
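
Several hunks in this file swap a zero-value check (`task.ID == ""`) for a comma-ok lookup (`task, ok := s.tasks.Get(id)`). A small sketch of why the second form is the safer presence test follows; `taskMap` is a hypothetical stand-in written for this illustration and says nothing about how the real synced map is built.

```go
package main

import (
	"fmt"
	"sync"
)

type task struct {
	ID   string
	Name string
}

// taskMap is a hypothetical stand-in for a typed concurrent map.
type taskMap struct {
	mu sync.RWMutex
	m  map[string]task
}

func newTaskMap() *taskMap { return &taskMap{m: make(map[string]task)} }

func (t *taskMap) Set(key string, v task) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.m[key] = v
}

// Get reports presence explicitly instead of relying on a zero value.
func (t *taskMap) Get(key string) (task, bool) {
	t.mu.RLock()
	defer t.mu.RUnlock()
	v, ok := t.m[key]
	return v, ok
}

func main() {
	tm := newTaskMap()
	tm.Set("k1", task{Name: "stored without an ID field"})

	v, ok := tm.Get("k1")
	// The entry exists, yet a sentinel check on v.ID would report it missing.
	fmt.Println(ok, v.ID == "") // true true

	_, ok = tm.Get("absent")
	fmt.Println(ok) // false
}
```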


@@ -14,24 +14,38 @@ import (
)
// fileJobPersister persists tasks and jobs to JSON files.
-// It holds references to the service's syncmaps and serializes the entire
-// map contents on each save (bulk write). Reads at runtime return nil
-// (the in-memory map is the authoritative source); LoadTasks/LoadJobs
-// are used only at startup to bootstrap the syncmaps.
+//
+// Jobs serialize the service's in-memory jobs syncmap on each save (bulk write).
+// Tasks are kept in this persister's own taskSet map instead: the tasks SyncedMap
+// calls SaveTask/DeleteTask while holding its internal lock (write-through), so
+// reading back the SyncedMap here would re-enter that lock and deadlock. The
+// self-contained taskSet, seeded by LoadTasks, lets a per-task write rewrite the
+// whole bulk file without touching the SyncedMap.
+//
+// Runtime reads (GetJob/ListJobs) return nil (the in-memory state is the
+// authoritative source); LoadTasks/LoadJobs bootstrap state at startup.
type fileJobPersister struct {
-tasks *xsync.SyncedMap[string, schema.Task]
jobs *xsync.SyncedMap[string, schema.Job]
tasksFile string
jobsFile string
mu sync.Mutex
+// taskSet is the persister's own view of all tasks, seeded by LoadTasks and
+// updated by SaveTask/DeleteTask. The bulk JSON file is rewritten from it.
+taskSet map[string]schema.Task
}
-func (p *fileJobPersister) SaveTask(_ string, _ schema.Task) error {
-return p.saveTasksToFile()
+func (p *fileJobPersister) SaveTask(_ string, task schema.Task) error {
+p.mu.Lock()
+defer p.mu.Unlock()
+p.taskSet[task.ID] = task
+return p.writeTasksLocked()
}
-func (p *fileJobPersister) DeleteTask(_ string) error {
-return p.saveTasksToFile()
+func (p *fileJobPersister) DeleteTask(taskID string) error {
+p.mu.Lock()
+defer p.mu.Unlock()
+delete(p.taskSet, taskID)
+return p.writeTasksLocked()
}
func (p *fileJobPersister) SaveJob(_ string, _ schema.Job) error {
@@ -43,7 +57,9 @@ func (p *fileJobPersister) DeleteJob(_ string) error {
}
func (p *fileJobPersister) FlushTasks() error {
-return p.saveTasksToFile()
+p.mu.Lock()
+defer p.mu.Unlock()
+return p.writeTasksLocked()
}
func (p *fileJobPersister) FlushJobs() error {
@@ -83,6 +99,12 @@ func (p *fileJobPersister) LoadTasks(_ string) ([]schema.Task, error) {
return nil, fmt.Errorf("failed to parse tasks file: %w", err)
}
// Seed the in-memory set so subsequent per-task SaveTask/DeleteTask merge into
// (rather than overwrite) the persisted tasks when the bulk file is rewritten.
for _, t := range tf.Tasks {
p.taskSet[t.ID] = t
}
xlog.Info("Loaded tasks from file", "count", len(tf.Tasks))
return tf.Tasks, nil
}
@@ -118,19 +140,20 @@ func (p *fileJobPersister) CleanupOldJobs(_ time.Duration) (int64, error) {
return 0, nil // cleanup handled via in-memory filtering
}
-// saveTasksToFile serializes the entire tasks map to the JSON file.
-func (p *fileJobPersister) saveTasksToFile() error {
+// writeTasksLocked serializes the persister's task set to the JSON file. Callers
+// must hold p.mu.
+func (p *fileJobPersister) writeTasksLocked() error {
if p.tasksFile == "" {
return nil
}
-p.mu.Lock()
-defer p.mu.Unlock()
-tf := schema.TasksFile{
-Tasks: p.tasks.Values(),
+tasks := make([]schema.Task, 0, len(p.taskSet))
+for _, t := range p.taskSet {
+tasks = append(tasks, t)
}
+tf := schema.TasksFile{Tasks: tasks}
data, err := json.MarshalIndent(tf, "", " ")
if err != nil {
return fmt.Errorf("failed to marshal tasks: %w", err)
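
The persister design described in this file (a private task set guarded by its own mutex, with the whole file rewritten on every change) can be tried standalone. The sketch below is an approximation under assumptions: `filePersister`, `tasksFile`, and `persistedIDs` are hypothetical names, the JSON field names are invented, and the sort before writing is an addition for stable output that the diff does not perform.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
	"sort"
	"sync"
)

type task struct {
	ID   string `json:"id"`
	Name string `json:"name"`
}

type tasksFile struct {
	Tasks []task `json:"tasks"`
}

// filePersister owns its task set, so a write never has to read back from
// whichever map called it (which may still be holding its own lock).
type filePersister struct {
	mu      sync.Mutex
	path    string
	taskSet map[string]task
}

func (p *filePersister) SaveTask(t task) error {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.taskSet[t.ID] = t
	return p.writeLocked()
}

func (p *filePersister) DeleteTask(id string) error {
	p.mu.Lock()
	defer p.mu.Unlock()
	delete(p.taskSet, id)
	return p.writeLocked()
}

// writeLocked rewrites the whole file from taskSet. Callers must hold p.mu.
func (p *filePersister) writeLocked() error {
	tasks := make([]task, 0, len(p.taskSet))
	for _, t := range p.taskSet {
		tasks = append(tasks, t)
	}
	// Sorting is an addition of this sketch, for a deterministic file.
	sort.Slice(tasks, func(i, j int) bool { return tasks[i].ID < tasks[j].ID })
	data, err := json.MarshalIndent(tasksFile{Tasks: tasks}, "", "  ")
	if err != nil {
		return fmt.Errorf("marshal tasks: %w", err)
	}
	return os.WriteFile(p.path, data, 0o600)
}

// persistedIDs saves two tasks, deletes one, and returns the IDs left on disk.
func persistedIDs() ([]string, error) {
	dir, err := os.MkdirTemp("", "tasks-sketch")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(dir)

	p := &filePersister{path: filepath.Join(dir, "tasks.json"), taskSet: map[string]task{}}
	for _, t := range []task{{ID: "t1", Name: "Keep"}, {ID: "t2", Name: "Delete"}} {
		if err := p.SaveTask(t); err != nil {
			return nil, err
		}
	}
	if err := p.DeleteTask("t2"); err != nil {
		return nil, err
	}
	data, err := os.ReadFile(p.path)
	if err != nil {
		return nil, err
	}
	var tf tasksFile
	if err := json.Unmarshal(data, &tf); err != nil {
		return nil, err
	}
	ids := make([]string, 0, len(tf.Tasks))
	for _, t := range tf.Tasks {
		ids = append(ids, t.ID)
	}
	return ids, nil
}

func main() {
	ids, err := persistedIDs()
	fmt.Println(ids, err) // [t1] <nil>
}
```

Because the lock is taken in the public methods and the write helper only documents the requirement, the helper can be shared by save, delete, and flush without double-locking.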


@@ -20,28 +20,26 @@ var _ = Describe("JobPersister", func() {
Context("fileJobPersister", func() {
var (
p *fileJobPersister
-tasks *xsync.SyncedMap[string, schema.Task]
jobsMap *xsync.SyncedMap[string, schema.Job]
tmpDir string
)
BeforeEach(func() {
tmpDir = GinkgoT().TempDir()
-tasks = xsync.NewSyncedMap[string, schema.Task]()
jobsMap = xsync.NewSyncedMap[string, schema.Job]()
p = &fileJobPersister{
-tasks: tasks,
jobs: jobsMap,
tasksFile: filepath.Join(tmpDir, "tasks.json"),
jobsFile: filepath.Join(tmpDir, "jobs.json"),
+// taskSet is the persister's own task view (decoupled from the tasks
+// SyncedMap to avoid re-entering its lock during write-through).
+taskSet: make(map[string]schema.Task),
}
})
It("SaveTask writes all tasks to file", func() {
-tasks.Set("t1", schema.Task{ID: "t1", Name: "Task One", Model: "m", Prompt: "p"})
-tasks.Set("t2", schema.Task{ID: "t2", Name: "Task Two", Model: "m", Prompt: "p"})
-Expect(p.SaveTask("", schema.Task{})).To(Succeed())
+Expect(p.SaveTask("", schema.Task{ID: "t1", Name: "Task One", Model: "m", Prompt: "p"})).To(Succeed())
+Expect(p.SaveTask("", schema.Task{ID: "t2", Name: "Task Two", Model: "m", Prompt: "p"})).To(Succeed())
// Verify file contents
data, err := os.ReadFile(p.tasksFile)
@@ -52,11 +50,9 @@ var _ = Describe("JobPersister", func() {
})
It("DeleteTask writes updated tasks to file", func() {
-tasks.Set("t1", schema.Task{ID: "t1", Name: "Keep"})
-tasks.Set("t2", schema.Task{ID: "t2", Name: "Delete"})
-// Simulate deletion from memory (caller does this before calling persister)
-tasks.Delete("t2")
+Expect(p.SaveTask("", schema.Task{ID: "t1", Name: "Keep"})).To(Succeed())
+Expect(p.SaveTask("", schema.Task{ID: "t2", Name: "Delete"})).To(Succeed())
Expect(p.DeleteTask("t2")).To(Succeed())
data, err := os.ReadFile(p.tasksFile)


@@ -0,0 +1,152 @@
package agentpool
// White-box tests (package agentpool) so a spec can build two AgentJobService
// instances sharing one in-memory bus and assert that agent *tasks* converge
// across replicas - the bug this migration fixes (ListTasks used to read
// in-memory only, so a task created on replica A was invisible on replica B).
// Jobs are deliberately untouched here: they already converge via the dispatcher
// + DB read-through.
import (
"context"
"time"
. "github.com/onsi/ginkgo/v2"
. "github.com/onsi/gomega"
"github.com/mudler/LocalAI/core/config"
"github.com/mudler/LocalAI/core/schema"
"github.com/mudler/LocalAI/core/services/messaging"
"github.com/mudler/LocalAI/core/services/syncstate"
"github.com/mudler/LocalAI/core/services/testutil"
"github.com/mudler/LocalAI/pkg/system"
)
// newTaskSyncService builds an AgentJobService wired to the given bus and a
// throwaway data dir (so the file persister has somewhere to write). Model/config
// loaders are nil because the task sync paths under test never touch them.
func newTaskSyncService(bus messaging.MessagingClient) *AgentJobService {
tmpDir := GinkgoT().TempDir()
sysState := &system.SystemState{}
sysState.Model.ModelsPath = tmpDir
appConfig := config.NewApplicationConfig(
config.WithDynamicConfigDir(tmpDir),
config.WithContext(context.Background()),
)
appConfig.SystemState = sysState
svc := NewAgentJobServiceWithPaths(appConfig, nil, nil, nil,
// Distinct per-replica files so the file persister write-through never
// crosses replicas: convergence here must be proven via the bus alone.
tmpDir+"/tasks.json", tmpDir+"/jobs.json")
svc.SetTaskSyncNATS(bus)
return svc
}
var _ = Describe("AgentJobService task cross-replica sync", func() {
Describe("two replicas sharing one bus", func() {
var (
bus *testutil.FakeBus
a, b *AgentJobService
)
BeforeEach(func() {
// One shared bus, two replicas: exactly the distributed topology where a
// round-robin request may land on a replica that did not originate the
// change.
bus = testutil.NewFakeBus()
a = newTaskSyncService(bus)
b = newTaskSyncService(bus)
// Start hydrates (empty here) and subscribes both replicas to deltas.
Expect(a.Start(context.Background())).To(Succeed())
Expect(b.Start(context.Background())).To(Succeed())
})
AfterEach(func() {
Expect(a.Stop()).To(Succeed())
Expect(b.Stop()).To(Succeed())
})
It("makes a task created on A visible via B's GetTask and ListTasks", func() {
id, err := a.CreateTask(schema.Task{Name: "Shared", Model: "m", Prompt: "p"})
Expect(err).NotTo(HaveOccurred())
got, err := b.GetTask(id)
Expect(err).NotTo(HaveOccurred(), "B must see a task A just created")
Expect(got.Name).To(Equal("Shared"))
listed := b.ListTasks()
Expect(listed).To(HaveLen(1))
Expect(listed[0].ID).To(Equal(id))
})
It("propagates a task update from A to B", func() {
id, err := a.CreateTask(schema.Task{Name: "Before", Model: "m", Prompt: "p"})
Expect(err).NotTo(HaveOccurred())
Expect(a.UpdateTask(id, schema.Task{Name: "After", Model: "m", Prompt: "p"})).To(Succeed())
got, err := b.GetTask(id)
Expect(err).NotTo(HaveOccurred())
Expect(got.Name).To(Equal("After"), "an update on A must be visible on B")
})
It("removes a task from B when it is deleted on A", func() {
id, err := a.CreateTask(schema.Task{Name: "Doomed", Model: "m", Prompt: "p"})
Expect(err).NotTo(HaveOccurred())
_, err = b.GetTask(id)
Expect(err).NotTo(HaveOccurred(), "precondition: B must have the task before the delete")
Expect(a.DeleteTask(id)).To(Succeed())
_, err = b.GetTask(id)
Expect(err).To(HaveOccurred(), "a delete on A must remove the task from B")
Expect(b.ListTasks()).To(BeEmpty())
})
It("does not re-broadcast a delta it received (echo-loop guard)", func() {
subject := messaging.SubjectSyncStateDelta("agent.tasks")
_, err := a.CreateTask(schema.Task{Name: "Once", Model: "m", Prompt: "p"})
Expect(err).NotTo(HaveOccurred())
// Exactly one publish: A's create. B applies it without re-publishing,
// otherwise this would be 2+ and a real bus would storm.
Expect(bus.PublishCount(subject)).To(Equal(1))
})
})
Describe("ListTasks ordering and scoping", func() {
var svc *AgentJobService
BeforeEach(func() {
svc = newTaskSyncService(testutil.NewFakeBus())
Expect(svc.Start(context.Background())).To(Succeed())
})
AfterEach(func() { Expect(svc.Stop()).To(Succeed()) })
It("sorts newest-first, breaking ties by name", func() {
// CreateTask stamps CreatedAt with time.Now(); space them out so ordering
// is deterministic rather than relying on the sub-millisecond gap.
oldID, err := svc.CreateTask(schema.Task{Name: "Old", Model: "m", Prompt: "p"})
Expect(err).NotTo(HaveOccurred())
time.Sleep(5 * time.Millisecond)
newID, err := svc.CreateTask(schema.Task{Name: "New", Model: "m", Prompt: "p"})
Expect(err).NotTo(HaveOccurred())
listed := svc.ListTasks()
Expect(listed).To(HaveLen(2))
Expect(listed[0].ID).To(Equal(newID), "newest first")
Expect(listed[1].ID).To(Equal(oldID))
})
})
Describe("compile-time adapter contract", func() {
It("satisfies syncstate.Store for tasks", func() {
// Mirrors the var assertion in task_syncstore.go; keeps the type
// referenced from a spec so drift surfaces here too.
var _ syncstate.Store[string, schema.Task] = (*taskStoreAdapter)(nil)
Expect(&taskStoreAdapter{}).ToNot(BeNil())
})
})
})
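
The convergence and echo-loop specs above rest on one rule: a local write publishes exactly once, and applying a received delta never publishes. A toy model of that rule follows. Everything in it is hypothetical (the `bus`, `replica`, and `delta` types, the synchronous delivery, and the origin tag used to skip a replica's own message); how the real SyncedMap guards against echoes is not shown in this diff.

```go
package main

import "fmt"

// delta is a hypothetical change message carried on the bus.
type delta struct {
	Origin string
	Key    string
	Value  string
	Delete bool
}

// bus is a synchronous in-memory pub/sub that counts publishes.
type bus struct {
	subs      []func(delta)
	published int
}

func (b *bus) subscribe(fn func(delta)) { b.subs = append(b.subs, fn) }

func (b *bus) publish(d delta) {
	b.published++
	for _, fn := range b.subs {
		fn(d)
	}
}

type replica struct {
	id   string
	bus  *bus
	data map[string]string
}

func newReplica(id string, b *bus) *replica {
	r := &replica{id: id, bus: b, data: map[string]string{}}
	b.subscribe(r.apply)
	return r
}

// set is a local write: update memory, then broadcast once.
func (r *replica) set(k, v string) {
	r.data[k] = v
	r.bus.publish(delta{Origin: r.id, Key: k, Value: v})
}

// remove is a local delete: update memory, then broadcast once.
func (r *replica) remove(k string) {
	delete(r.data, k)
	r.bus.publish(delta{Origin: r.id, Key: k, Delete: true})
}

// apply handles a delta from the bus. It only mutates local state and never
// publishes, which keeps a single write from echoing between replicas.
func (r *replica) apply(d delta) {
	if d.Origin == r.id {
		return // our own message, already applied locally
	}
	if d.Delete {
		delete(r.data, d.Key)
		return
	}
	r.data[d.Key] = d.Value
}

func main() {
	b := &bus{}
	a := newReplica("A", b)
	peer := newReplica("B", b)

	a.set("task-1", "Shared")
	fmt.Println(peer.data["task-1"], b.published) // Shared 1

	a.remove("task-1")
	_, stillThere := peer.data["task-1"]
	fmt.Println(stillThere, b.published) // false 2
}
```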


@@ -0,0 +1,47 @@
package agentpool
import (
"context"
"github.com/mudler/LocalAI/core/schema"
"github.com/mudler/LocalAI/core/services/syncstate"
)
// taskStoreAdapter bridges the existing JobPersister (file- or DB-backed) to the
// generic syncstate.Store the tasks SyncedMap consumes. Only tasks are migrated:
// jobs already converge across replicas via the dispatcher (NATS) plus the DB
// read-through in ListJobs/GetJob, whereas ListTasks read in-memory only and so
// went stale on replicas that did not originate the change.
//
// The adapter reads svc.persister and svc.userID live (rather than capturing
// them) because both are configured by setters - SetDistributedJobStore swaps the
// file persister for the DB one, SetUserID scopes per-user queries - AFTER the
// service, and thus this adapter, is constructed. Reading them at call time means
// the SyncedMap never has to be rebuilt when the persister is swapped.
//
// The SyncedMap value type is schema.Task: the exact shape ListTasks returns, so
// reads need no conversion and REST responses are provably unchanged.
type taskStoreAdapter struct {
svc *AgentJobService
}
// compile-time assertion that the adapter satisfies the component's Store.
var _ syncstate.Store[string, schema.Task] = (*taskStoreAdapter)(nil)
// List hydrates the map from durable storage on Start/reconnect: the file's task
// list (standalone) or every task row (DB / distributed).
func (a *taskStoreAdapter) List(_ context.Context) ([]schema.Task, error) {
return a.svc.persister.LoadTasks(a.svc.userID)
}
// Upsert write-through persists a single task created/updated locally; the
// SyncedMap then broadcasts the delta to peers.
func (a *taskStoreAdapter) Upsert(_ context.Context, task schema.Task) error {
return a.svc.persister.SaveTask(a.svc.userID, task)
}
// Delete write-through removes a task locally; the SyncedMap then broadcasts the
// removal to peers.
func (a *taskStoreAdapter) Delete(_ context.Context, id string) error {
return a.svc.persister.DeleteTask(id)
}
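The adapter comment above argues for reading `svc.persister` and `svc.userID` at call time instead of capturing them. A minimal sketch of that difference follows; `Persister`, `namedPersister`, `Service`, `lateAdapter`, and `eagerAdapter` are hypothetical stand-ins invented for illustration, not the types from this change.

```go
package main

import "fmt"

// Persister is a hypothetical stand-in for the JobPersister used in the diff.
type Persister interface {
	LoadTasks(userID string) ([]string, error)
}

// namedPersister returns a single marker task so the caller can tell which
// persister answered.
type namedPersister struct{ name string }

func (p namedPersister) LoadTasks(userID string) ([]string, error) {
	return []string{p.name + ":" + userID}, nil
}

// Service mimics a service whose fields are configured by setters after
// construction.
type Service struct {
	persister Persister
	userID    string
}

// lateAdapter keeps a pointer to the service and dereferences its fields on
// every call.
type lateAdapter struct{ svc *Service }

func (a *lateAdapter) List() ([]string, error) {
	return a.svc.persister.LoadTasks(a.svc.userID)
}

// eagerAdapter captures the persister and user ID at construction time.
type eagerAdapter struct {
	persister Persister
	userID    string
}

func (a *eagerAdapter) List() ([]string, error) {
	return a.persister.LoadTasks(a.userID)
}

func main() {
	svc := &Service{persister: namedPersister{"file"}}
	late := &lateAdapter{svc: svc}
	eager := &eagerAdapter{persister: svc.persister, userID: svc.userID}

	// The setters run after both adapters already exist.
	svc.persister = namedPersister{"db"}
	svc.userID = "u1"

	l, _ := late.List()
	e, _ := eager.List()
	fmt.Println(l[0]) // db:u1
	fmt.Println(e[0]) // file:
}
```

Under these assumptions only the late-binding adapter observes the swapped persister, which is why the map built on top of it would not need rebuilding.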


@@ -7,6 +7,7 @@ import (
 "github.com/mudler/LocalAGI/webui/collections"
 "github.com/mudler/LocalAI/core/config"
 "github.com/mudler/LocalAI/core/services/jobs"
+"github.com/mudler/LocalAI/core/services/messaging"
 "github.com/mudler/LocalAI/core/templates"
 "github.com/mudler/LocalAI/pkg/model"
 "github.com/mudler/xlog"
@@ -28,6 +29,9 @@ type UserServicesManager struct {
 // Shared distributed backends (set once, inherited by per-user job services)
 jobDispatcher DistributedDispatcher
 jobDBStore *jobs.JobStore
+// jobNats keeps per-user agent tasks consistent across replicas (nil in
+// standalone). Inherited by each per-user AgentJobService.
+jobNats messaging.MessagingClient
 }
 // NewUserServicesManager creates a new UserServicesManager.
@@ -162,6 +166,10 @@ func (m *UserServicesManager) GetJobs(userID string) (*AgentJobService, error) {
 if m.jobDispatcher != nil {
 svc.SetDistributedBackends(m.jobDispatcher)
 }
+// Inherit the NATS client so per-user tasks broadcast across replicas. Must be
+// set before the hydrate below (LoadFromDB / LoadTasksFromFile) so the tasks
+// SyncedMap is rebuilt with the client while it is still empty.
+svc.SetTaskSyncNATS(m.jobNats)
 if m.jobDBStore != nil {
 svc.SetDistributedJobStore(m.jobDBStore)
 // Load tasks/jobs from DB immediately (per-user services skip Start())
@@ -189,6 +197,12 @@ func (m *UserServicesManager) SetJobDBStore(s *jobs.JobStore) {
 m.jobDBStore = s
 }
+// SetJobSyncNATS sets the NATS client used to keep per-user agent tasks consistent
+// across replicas.
+func (m *UserServicesManager) SetJobSyncNATS(nats messaging.MessagingClient) {
+m.jobNats = nats
+}
 // ListAllUserIDs returns all user IDs that have scoped data directories.
 func (m *UserServicesManager) ListAllUserIDs() ([]string, error) {
 return m.storage.ListUserDirs()


@@ -8,6 +8,7 @@ import (
 "github.com/google/uuid"
 "github.com/mudler/LocalAI/core/services/advisorylock"
 "gorm.io/gorm"
+"gorm.io/gorm/clause"
 )
 // FineTuneJobRecord tracks fine-tune jobs in PostgreSQL.
@@ -80,6 +81,34 @@ func (s *FineTuneStore) List(userID string) ([]FineTuneJobRecord, error) {
 return jobs, q.Find(&jobs).Error
 }
+// ListAll returns every fine-tune job across all users. The SyncedMap that backs
+// FineTuneService is a single global map (the REST API filters by user at read
+// time), so hydrate needs the full set rather than the per-user List above.
+func (s *FineTuneStore) ListAll() ([]FineTuneJobRecord, error) {
+var jobs []FineTuneJobRecord
+return jobs, s.db.Order("created_at DESC").Find(&jobs).Error
+}
+// Upsert idempotently inserts or fully replaces a job row by primary key. The
+// SyncedMap write-through path issues a single Set per mutation regardless of
+// whether the job already exists, so it needs one create-or-update primitive
+// (Create alone fails on a duplicate key, UpdateStatus alone misses new rows and
+// only touches a few columns).
+func (s *FineTuneStore) Upsert(job *FineTuneJobRecord) error {
+if job.ID == "" {
+job.ID = uuid.New().String()
+}
+now := time.Now()
+if job.CreatedAt.IsZero() {
+job.CreatedAt = now
+}
+job.UpdatedAt = now
+return s.db.Clauses(clause.OnConflict{
+Columns: []clause.Column{{Name: "id"}},
+UpdateAll: true,
+}).Create(job).Error
+}
 // UpdateStatus updates the status and message of a fine-tune job.
 func (s *FineTuneStore) UpdateStatus(id, status, message string) error {
 return s.db.Model(&FineTuneJobRecord{}).Where("id = ?", id).Updates(map[string]any{


@@ -0,0 +1,13 @@
package distributed_test
import (
"testing"
. "github.com/onsi/ginkgo/v2"
. "github.com/onsi/gomega"
)
func TestDistributed(t *testing.T) {
RegisterFailHandler(Fail)
RunSpecs(t, "Distributed Suite")
}


@@ -0,0 +1,61 @@
package distributed_test
import (
. "github.com/onsi/ginkgo/v2"
. "github.com/onsi/gomega"
"github.com/mudler/LocalAI/core/services/distributed"
"github.com/mudler/LocalAI/core/services/testutil"
)
var _ = Describe("FineTuneStore", func() {
var store *distributed.FineTuneStore
BeforeEach(func() {
db := testutil.SetupTestDB()
var err error
store, err = distributed.NewFineTuneStore(db)
Expect(err).ToNot(HaveOccurred())
})
Describe("ListAll", func() {
It("returns jobs across all users (unlike per-user List)", func() {
Expect(store.Create(&distributed.FineTuneJobRecord{ID: "j1", UserID: "u1", Status: "queued"})).To(Succeed())
Expect(store.Create(&distributed.FineTuneJobRecord{ID: "j2", UserID: "u2", Status: "queued"})).To(Succeed())
all, err := store.ListAll()
Expect(err).ToNot(HaveOccurred())
Expect(all).To(HaveLen(2))
perUser, err := store.List("u1")
Expect(err).ToNot(HaveOccurred())
Expect(perUser).To(HaveLen(1), "List stays per-user")
})
})
Describe("Upsert", func() {
It("inserts a new row", func() {
Expect(store.Upsert(&distributed.FineTuneJobRecord{ID: "up-1", UserID: "u1", Status: "queued"})).To(Succeed())
got, err := store.Get("up-1")
Expect(err).ToNot(HaveOccurred())
Expect(got.Status).To(Equal("queued"))
})
It("idempotently updates an existing row on a repeated key", func() {
Expect(store.Upsert(&distributed.FineTuneJobRecord{ID: "up-2", UserID: "u1", Status: "queued"})).To(Succeed())
// Second Upsert with the same primary key must update, not error on a
// duplicate-key violation (this is the SyncedMap write-through contract).
Expect(store.Upsert(&distributed.FineTuneJobRecord{ID: "up-2", UserID: "u1", Status: "completed", Message: "done"})).To(Succeed())
got, err := store.Get("up-2")
Expect(err).ToNot(HaveOccurred())
Expect(got.Status).To(Equal("completed"))
Expect(got.Message).To(Equal("done"))
all, err := store.ListAll()
Expect(err).ToNot(HaveOccurred())
Expect(all).To(HaveLen(1), "upsert must not create a duplicate")
})
})
})


@@ -11,6 +11,7 @@ import (
 type Stores struct {
 Gallery *GalleryStore
 FineTune *FineTuneStore
+Quant *QuantStore
 Skills *SkillStore
 }
@@ -26,15 +27,21 @@ func InitStores(db *gorm.DB) (*Stores, error) {
 return nil, fmt.Errorf("fine-tune store: %w", err)
 }
+quant, err := NewQuantStore(db)
+if err != nil {
+return nil, fmt.Errorf("quantization store: %w", err)
+}
 skills, err := NewSkillStore(db)
 if err != nil {
 return nil, fmt.Errorf("skills store: %w", err)
 }
-xlog.Info("Distributed stores initialized (Gallery, FineTune, Skills)")
+xlog.Info("Distributed stores initialized (Gallery, FineTune, Quant, Skills)")
 return &Stores{
 Gallery: gallery,
 FineTune: ft,
+Quant: quant,
 Skills: skills,
 }, nil
 }


@@ -0,0 +1,105 @@
package distributed
import (
"context"
"fmt"
"time"
"github.com/google/uuid"
"github.com/mudler/LocalAI/core/services/advisorylock"
"gorm.io/gorm"
"gorm.io/gorm/clause"
)
// QuantJobRecord tracks quantization jobs in PostgreSQL. The columns mirror the
// API shape (schema.QuantizationJob); the structured Config and ExtraOptions are
// serialized into JSON text columns so a record fully reconstructs the job.
type QuantJobRecord struct {
ID string `gorm:"primaryKey;size:36" json:"id"`
UserID string `gorm:"index;size:36" json:"user_id,omitempty"`
Model string `gorm:"size:255" json:"model"`
Backend string `gorm:"size:64" json:"backend"`
ModelID string `gorm:"size:255" json:"model_id,omitempty"`
QuantizationType string `gorm:"size:32" json:"quantization_type"`
Status string `gorm:"index;size:32;default:queued" json:"status"` // queued, downloading, converting, quantizing, completed, failed, stopped
Message string `gorm:"type:text" json:"message,omitempty"`
OutputDir string `gorm:"size:512" json:"output_dir,omitempty"`
OutputFile string `gorm:"size:512" json:"output_file,omitempty"`
ConfigJSON string `gorm:"column:config;type:text" json:"-"`
ExtraOptsJSON string `gorm:"column:extra_options;type:text" json:"-"`
ImportStatus string `gorm:"size:32" json:"import_status,omitempty"`
ImportMessage string `gorm:"type:text" json:"import_message,omitempty"`
ImportModelName string `gorm:"size:255" json:"import_model_name,omitempty"`
CreatedAt time.Time `json:"created_at"`
UpdatedAt time.Time `json:"updated_at"`
}
func (QuantJobRecord) TableName() string { return "quantization_jobs" }
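The record comment says the structured Config and ExtraOptions are serialized into JSON text columns so that a row can rebuild the job. A minimal round-trip sketch of that idea is shown here; `jobConfig`, `row`, `toRow`, and `fromRow` are hypothetical names, and the field set is an assumption because the real schema type is not part of this diff.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// jobConfig is a hypothetical stand-in for the structured request stored with
// the record.
type jobConfig struct {
	Model            string            `json:"model"`
	QuantizationType string            `json:"quantization_type"`
	ExtraOptions     map[string]string `json:"extra_options,omitempty"`
}

// row holds only the two text columns relevant to this sketch.
type row struct {
	ConfigJSON    string
	ExtraOptsJSON string
}

// toRow serializes the structured values into text columns.
func toRow(c jobConfig) (row, error) {
	cfg, err := json.Marshal(c)
	if err != nil {
		return row{}, err
	}
	extra, err := json.Marshal(c.ExtraOptions)
	if err != nil {
		return row{}, err
	}
	return row{ConfigJSON: string(cfg), ExtraOptsJSON: string(extra)}, nil
}

// fromRow rebuilds the structured config from the text column.
func fromRow(r row) (jobConfig, error) {
	var c jobConfig
	err := json.Unmarshal([]byte(r.ConfigJSON), &c)
	return c, err
}

func main() {
	in := jobConfig{
		Model:            "example-model",
		QuantizationType: "q4_k_m",
		ExtraOptions:     map[string]string{"threads": "4"},
	}
	r, err := toRow(in)
	if err != nil {
		panic(err)
	}
	out, err := fromRow(r)
	if err != nil {
		panic(err)
	}
	fmt.Println(out.Model, out.QuantizationType, out.ExtraOptions["threads"])
}
```

Unlike the removed StartJob code elsewhere in this diff, which discarded the `json.Marshal` error, the sketch returns it to the caller.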
// QuantStore manages quantization job state in PostgreSQL.
type QuantStore struct {
db *gorm.DB
}
// NewQuantStore creates a new QuantStore and auto-migrates.
// Uses a PostgreSQL advisory lock to prevent concurrent migration races
// when multiple instances (frontend + workers) start at the same time.
func NewQuantStore(db *gorm.DB) (*QuantStore, error) {
if err := advisorylock.WithLockCtx(context.Background(), db, advisorylock.KeySchemaMigrate, func() error {
return db.AutoMigrate(&QuantJobRecord{})
}); err != nil {
return nil, fmt.Errorf("migrating quantization_jobs: %w", err)
}
return &QuantStore{db: db}, nil
}
// Create stores a new quantization job.
func (s *QuantStore) Create(job *QuantJobRecord) error {
if job.ID == "" {
job.ID = uuid.New().String()
}
job.CreatedAt = time.Now()
job.UpdatedAt = job.CreatedAt
return s.db.Create(job).Error
}
// Get retrieves a quantization job by ID.
func (s *QuantStore) Get(id string) (*QuantJobRecord, error) {
var job QuantJobRecord
if err := s.db.First(&job, "id = ?", id).Error; err != nil {
return nil, err
}
return &job, nil
}
// ListAll returns every quantization job across all users. The SyncedMap that
// backs QuantizationService is a single global map (the REST API filters by user
// at read time), so hydrate needs the full set.
func (s *QuantStore) ListAll() ([]QuantJobRecord, error) {
var jobs []QuantJobRecord
return jobs, s.db.Order("created_at DESC").Find(&jobs).Error
}
// Upsert idempotently inserts or fully replaces a job row by primary key. The
// SyncedMap write-through path issues a single Set per mutation regardless of
// whether the job already exists, so it needs one create-or-update primitive
// (Create alone fails on a duplicate key).
func (s *QuantStore) Upsert(job *QuantJobRecord) error {
if job.ID == "" {
job.ID = uuid.New().String()
}
now := time.Now()
if job.CreatedAt.IsZero() {
job.CreatedAt = now
}
job.UpdatedAt = now
return s.db.Clauses(clause.OnConflict{
Columns: []clause.Column{{Name: "id"}},
UpdateAll: true,
}).Create(job).Error
}
// Delete removes a quantization job.
func (s *QuantStore) Delete(id string) error {
return s.db.Where("id = ?", id).Delete(&QuantJobRecord{}).Error
}


@@ -0,0 +1,57 @@
package distributed_test
import (
. "github.com/onsi/ginkgo/v2"
. "github.com/onsi/gomega"
"github.com/mudler/LocalAI/core/services/distributed"
"github.com/mudler/LocalAI/core/services/testutil"
)
var _ = Describe("QuantStore", func() {
var store *distributed.QuantStore
BeforeEach(func() {
db := testutil.SetupTestDB()
var err error
store, err = distributed.NewQuantStore(db)
Expect(err).ToNot(HaveOccurred())
})
Describe("ListAll", func() {
It("returns jobs across all users", func() {
Expect(store.Create(&distributed.QuantJobRecord{ID: "j1", UserID: "u1", Status: "queued"})).To(Succeed())
Expect(store.Create(&distributed.QuantJobRecord{ID: "j2", UserID: "u2", Status: "queued"})).To(Succeed())
all, err := store.ListAll()
Expect(err).ToNot(HaveOccurred())
Expect(all).To(HaveLen(2))
})
})
Describe("Upsert", func() {
It("inserts a new row", func() {
Expect(store.Upsert(&distributed.QuantJobRecord{ID: "up-1", UserID: "u1", Status: "queued"})).To(Succeed())
got, err := store.Get("up-1")
Expect(err).ToNot(HaveOccurred())
Expect(got.Status).To(Equal("queued"))
})
It("idempotently updates an existing row on a repeated key", func() {
Expect(store.Upsert(&distributed.QuantJobRecord{ID: "up-2", UserID: "u1", Status: "queued"})).To(Succeed())
// Second Upsert with the same primary key must update, not error on a
// duplicate-key violation (this is the SyncedMap write-through contract).
Expect(store.Upsert(&distributed.QuantJobRecord{ID: "up-2", UserID: "u1", Status: "completed", Message: "done"})).To(Succeed())
got, err := store.Get("up-2")
Expect(err).ToNot(HaveOccurred())
Expect(got.Status).To(Equal("completed"))
Expect(got.Message).To(Equal("done"))
all, err := store.ListAll()
Expect(err).ToNot(HaveOccurred())
Expect(all).To(HaveLen(1), "upsert must not create a duplicate")
})
})
})


@@ -0,0 +1,13 @@
package finetune
import (
"testing"
. "github.com/onsi/ginkgo/v2"
. "github.com/onsi/gomega"
)
func TestFinetune(t *testing.T) {
RegisterFailHandler(Fail)
RunSpecs(t, "Finetune Suite")
}


@@ -19,6 +19,7 @@ import (
 "github.com/mudler/LocalAI/core/schema"
 "github.com/mudler/LocalAI/core/services/distributed"
 "github.com/mudler/LocalAI/core/services/messaging"
+"github.com/mudler/LocalAI/core/services/syncstate"
 pb "github.com/mudler/LocalAI/pkg/grpc/proto"
 "github.com/mudler/LocalAI/pkg/model"
 "github.com/mudler/LocalAI/pkg/utils"
@@ -32,44 +33,63 @@ type FineTuneService struct {
 modelLoader *model.ModelLoader
 configLoader *config.ModelConfigLoader
-mu sync.Mutex
-jobs map[string]*schema.FineTuneJob
-// Distributed mode (nil when not in distributed mode)
-natsClient messaging.Publisher
-fineTuneStore *distributed.FineTuneStore
+// mu serializes the read-modify-write of job values. The SyncedMap guards its
+// own map structure, but a job is a pointer mutated in place (e.g. the export
+// goroutine), so the service still needs a lock to keep those field updates
+// and the subsequent Set atomic with respect to readers.
+mu sync.Mutex
+// jobs is the cross-replica job store: an in-memory map kept consistent across
+// replicas via NATS, optionally read-through to PostgreSQL in distributed mode.
+jobs *syncstate.SyncedMap[string, *schema.FineTuneJob]
 }
-// SetNATSClient sets the NATS client for distributed progress publishing.
-func (s *FineTuneService) SetNATSClient(nc messaging.Publisher) {
-s.mu.Lock()
-defer s.mu.Unlock()
-s.natsClient = nc
-}
-// SetFineTuneStore sets the PostgreSQL fine-tune store for distributed persistence.
-func (s *FineTuneService) SetFineTuneStore(store *distributed.FineTuneStore) {
-s.mu.Lock()
-defer s.mu.Unlock()
-s.fineTuneStore = store
-}
-// NewFineTuneService creates a new FineTuneService.
+// NewFineTuneService creates a new FineTuneService. In distributed mode pass the
+// shared NATS client and PostgreSQL store so jobs stay consistent across
+// replicas; pass nil for both in standalone mode, where the disk Loader hydrates
+// the map and there is nothing to broadcast.
 func NewFineTuneService(
 appConfig *config.ApplicationConfig,
 modelLoader *model.ModelLoader,
 configLoader *config.ModelConfigLoader,
+nats messaging.MessagingClient,
+store *distributed.FineTuneStore,
 ) *FineTuneService {
 s := &FineTuneService{
 appConfig: appConfig,
 modelLoader: modelLoader,
 configLoader: configLoader,
-jobs: make(map[string]*schema.FineTuneJob),
 }
-s.loadAllJobs()
+// Only attach a Store interface when a concrete store exists, otherwise the
+// SyncedMap would see a non-nil interface wrapping a nil pointer and try to
+// hydrate/write through a nil DB.
+var syncStore syncstate.Store[string, *schema.FineTuneJob]
+if store != nil {
+syncStore = &fineTuneStoreAdapter{store: store}
+}
+s.jobs = syncstate.New(syncstate.Config[string, *schema.FineTuneJob]{
+Name: "finetune.jobs",
+Key: func(j *schema.FineTuneJob) string { return j.ID },
+Nats: nats,
+Store: syncStore,
+Loader: s.loadJobsFromDisk, // ignored when Store is set (distributed mode)
+})
+// Hydrate + subscribe. A hydrate failure must not take the server down: log
+// and continue degraded (standalone), mirroring the OpCache wiring.
+if err := s.jobs.Start(appConfig.Context); err != nil {
+xlog.Warn("FineTune SyncedMap start failed; running degraded", "error", err)
+}
 return s
 }
+// Close releases the SyncedMap subscription and background workers.
+func (s *FineTuneService) Close() error {
+return s.jobs.Close()
+}
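The constructor comment about "a non-nil interface wrapping a nil pointer" refers to a general Go behaviour: an interface value is nil only when it has no dynamic type. A small sketch of the pitfall and the guard follows; `Store`, `dbStore`, `wrapNaive`, and `wrapGuarded` are hypothetical names, and the sketch uses a bare pointer rather than the adapter struct from the diff.

```go
package main

import "fmt"

// Store is a hypothetical one-method interface standing in for syncstate.Store.
type Store interface{ Name() string }

type dbStore struct{ name string }

func (d *dbStore) Name() string { return d.name }

// wrapNaive assigns the pointer straight to the interface, so a nil pointer
// still yields an interface value that carries the *dbStore type.
func wrapNaive(d *dbStore) Store { return d }

// wrapGuarded builds the interface value only when the pointer is non-nil,
// which is the shape of the guard used in the constructor.
func wrapGuarded(d *dbStore) Store {
	if d == nil {
		return nil
	}
	return d
}

func main() {
	var d *dbStore
	fmt.Println(wrapNaive(d) == nil)   // false
	fmt.Println(wrapGuarded(d) == nil) // true
}
```

A consumer that checks `store != nil` before calling it would therefore take the database path with the naive wrapper and dereference a nil pointer later.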
 // fineTuneBaseDir returns the base directory for fine-tune job data.
 func (s *FineTuneService) fineTuneBaseDir() string {
 return filepath.Join(s.appConfig.DataPath, "fine-tune")
@@ -100,15 +120,18 @@ func (s *FineTuneService) saveJobState(job *schema.FineTuneJob) {
 }
 }
-// loadAllJobs scans the fine-tune directory for persisted jobs and loads them.
-func (s *FineTuneService) loadAllJobs() {
+// loadJobsFromDisk scans the fine-tune directory for persisted jobs and returns
+// them. It is the SyncedMap Loader used in standalone mode (no DB); the returned
+// slice hydrates the map on Start.
+func (s *FineTuneService) loadJobsFromDisk(_ context.Context) ([]*schema.FineTuneJob, error) {
 baseDir := s.fineTuneBaseDir()
 entries, err := os.ReadDir(baseDir)
 if err != nil {
-// Directory doesn't exist yet — that's fine
-return
+// Directory doesn't exist yet — that's fine, start empty.
+return nil, nil
 }
+var jobs []*schema.FineTuneJob
 for _, entry := range entries {
 if !entry.IsDir() {
 continue
@@ -137,12 +160,13 @@ func (s *FineTuneService) loadAllJobs() {
 job.ExportMessage = "Server restarted while export was running"
 }
-s.jobs[job.ID] = &job
+jobs = append(jobs, &job)
 }
-if len(s.jobs) > 0 {
-xlog.Info("Loaded persisted fine-tune jobs", "count", len(s.jobs))
+if len(jobs) > 0 {
+xlog.Info("Loaded persisted fine-tune jobs", "count", len(jobs))
 }
+return jobs, nil
 }
 // StartJob starts a new fine-tuning job.
@@ -236,27 +260,13 @@ func (s *FineTuneService) StartJob(ctx context.Context, userID string, req schem
 CreatedAt: time.Now().UTC().Format(time.RFC3339),
 Config: &req,
 }
-s.jobs[jobID] = job
-s.saveJobState(job)
-// Persist to PostgreSQL in distributed mode
-if s.fineTuneStore != nil {
-configJSON, _ := json.Marshal(req)
-extraJSON, _ := json.Marshal(req.ExtraOptions)
-s.fineTuneStore.Create(&distributed.FineTuneJobRecord{
-ID: jobID,
-UserID: userID,
-Model: req.Model,
-Backend: backendName,
-ModelID: modelID,
-TrainingType: req.TrainingType,
-TrainingMethod: req.TrainingMethod,
-Status: "queued",
-OutputDir: outputDir,
-ConfigJSON: string(configJSON),
-ExtraOptsJSON: string(extraJSON),
-})
+// Set write-through persists to PostgreSQL (distributed) and broadcasts to
+// peer replicas; the disk state.json is written separately for restart
+// recovery / standalone hydrate.
+if err := s.jobs.Set(ctx, job); err != nil {
+return nil, fmt.Errorf("failed to persist job: %w", err)
 }
+s.saveJobState(job)
 return &schema.FineTuneJobResponse{
 ID: jobID,
@@ -270,7 +280,7 @@ func (s *FineTuneService) GetJob(userID, jobID string) (*schema.FineTuneJob, err
 s.mu.Lock()
 defer s.mu.Unlock()
-job, ok := s.jobs[jobID]
+job, ok := s.jobs.Get(jobID)
 if !ok {
 return nil, fmt.Errorf("job not found: %s", jobID)
 }
@@ -286,7 +296,7 @@ func (s *FineTuneService) ListJobs(userID string) []*schema.FineTuneJob {
 defer s.mu.Unlock()
 var result []*schema.FineTuneJob
-for _, job := range s.jobs {
+for _, job := range s.jobs.List() {
 if userID == "" || job.UserID == userID {
 result = append(result, job)
 }
@@ -302,7 +312,7 @@ func (s *FineTuneService) ListJobs(userID string) []*schema.FineTuneJob {
 // StopJob stops a running fine-tuning job.
 func (s *FineTuneService) StopJob(ctx context.Context, userID, jobID string, saveCheckpoint bool) error {
 s.mu.Lock()
-job, ok := s.jobs[jobID]
+job, ok := s.jobs.Get(jobID)
 if !ok {
 s.mu.Unlock()
 return fmt.Errorf("job not found: %s", jobID)
@@ -323,10 +333,10 @@ func (s *FineTuneService) StopJob(ctx context.Context, userID, jobID string, sav
 s.mu.Lock()
 job.Status = "stopped"
 job.Message = "Training stopped by user"
-s.saveJobState(job)
-if s.fineTuneStore != nil {
-s.fineTuneStore.UpdateStatus(jobID, "stopped", "Training stopped by user")
+if err := s.jobs.Set(ctx, job); err != nil {
+xlog.Warn("Failed to persist stopped job", "job_id", jobID, "error", err)
 }
+s.saveJobState(job)
 s.mu.Unlock()
 return nil
@@ -335,7 +345,7 @@ func (s *FineTuneService) StopJob(ctx context.Context, userID, jobID string, sav
 // DeleteJob removes a fine-tuning job and its associated data from disk.
 func (s *FineTuneService) DeleteJob(userID, jobID string) error {
 s.mu.Lock()
-job, ok := s.jobs[jobID]
+job, ok := s.jobs.Get(jobID)
 if !ok {
 s.mu.Unlock()
 return fmt.Errorf("job not found: %s", jobID)
@@ -360,9 +370,10 @@ func (s *FineTuneService) DeleteJob(userID, jobID string) error {
 }
 exportModelName := job.ExportModelName
-delete(s.jobs, jobID)
-if s.fineTuneStore != nil {
-s.fineTuneStore.Delete(jobID)
+// Delete write-through removes the DB row (distributed) and broadcasts the
+// removal to peer replicas. DeleteJob has no ctx, so use Background.
+if err := s.jobs.Delete(context.Background(), jobID); err != nil {
+xlog.Warn("Failed to delete job from store", "job_id", jobID, "error", err)
 }
 s.mu.Unlock()
@@ -398,7 +409,7 @@ func (s *FineTuneService) DeleteJob(userID, jobID string) error {
 // StreamProgress opens a gRPC progress stream and calls the callback for each update.
 func (s *FineTuneService) StreamProgress(ctx context.Context, userID, jobID string, callback func(event *schema.FineTuneProgressEvent)) error {
 s.mu.Lock()
-job, ok := s.jobs[jobID]
+job, ok := s.jobs.Get(jobID)
 if !ok {
 s.mu.Unlock()
 return fmt.Errorf("job not found: %s", jobID)
@@ -427,7 +438,7 @@ func (s *FineTuneService) StreamProgress(ctx context.Context, userID, jobID stri
 }, func(update *pb.FineTuneProgressUpdate) {
 // Update job status and persist
 s.mu.Lock()
-if j, ok := s.jobs[jobID]; ok {
+if j, ok := s.jobs.Get(jobID); ok {
 // Don't let progress updates overwrite terminal states
 isTerminal := j.Status == "stopped" || j.Status == "completed" || j.Status == "failed"
 if !isTerminal {
@@ -436,10 +447,10 @@ func (s *FineTuneService) StreamProgress(ctx context.Context, userID, jobID stri
 if update.Message != "" {
 j.Message = update.Message
 }
-s.saveJobState(j)
-if s.fineTuneStore != nil {
-s.fineTuneStore.UpdateStatus(jobID, j.Status, j.Message)
+if err := s.jobs.Set(ctx, j); err != nil {
+xlog.Warn("Failed to persist progress update", "job_id", jobID, "error", err)
 }
+s.saveJobState(j)
 }
 s.mu.Unlock()
@@ -474,7 +485,7 @@ func (s *FineTuneService) StreamProgress(ctx context.Context, userID, jobID stri
 // ListCheckpoints lists checkpoints for a job.
 func (s *FineTuneService) ListCheckpoints(ctx context.Context, userID, jobID string) ([]*pb.CheckpointInfo, error) {
 s.mu.Lock()
-job, ok := s.jobs[jobID]
+job, ok := s.jobs.Get(jobID)
 if !ok {
 s.mu.Unlock()
 return nil, fmt.Errorf("job not found: %s", jobID)
@@ -520,7 +531,7 @@ func sanitizeModelName(s string) string {
 // ExportModel starts an async model export from a checkpoint and returns the intended model name immediately.
 func (s *FineTuneService) ExportModel(ctx context.Context, userID, jobID string, req schema.ExportRequest) (string, error) {
 s.mu.Lock()
-job, ok := s.jobs[jobID]
+job, ok := s.jobs.Get(jobID)
 if !ok {
 s.mu.Unlock()
 return "", fmt.Errorf("job not found: %s", jobID)
@@ -572,6 +583,9 @@ func (s *FineTuneService) ExportModel(ctx context.Context, userID, jobID string,
 job.ExportStatus = "exporting"
 job.ExportMessage = ""
 job.ExportModelName = ""
+if err := s.jobs.Set(ctx, job); err != nil {
+xlog.Warn("Failed to persist export start", "job_id", jobID, "error", err)
+}
 s.saveJobState(job)
 s.mu.Unlock()
@@ -662,24 +676,30 @@ func (s *FineTuneService) ExportModel(ctx context.Context, userID, jobID string,
 xlog.Info("Model exported and registered", "job_id", jobID, "model_name", modelName, "format", req.ExportFormat)
+// Runs after the HTTP request returns, so use Background rather than the
+// (now likely cancelled) request ctx for the write-through.
 s.mu.Lock()
 job.ExportStatus = "completed"
 job.ExportModelName = modelName
 job.ExportMessage = ""
-s.saveJobState(job)
-if s.fineTuneStore != nil {
-s.fineTuneStore.UpdateExportStatus(jobID, "completed", "", modelName)
+if err := s.jobs.Set(context.Background(), job); err != nil {
+xlog.Warn("Failed to persist export completion", "job_id", jobID, "error", err)
 }
+s.saveJobState(job)
 s.mu.Unlock()
 }()
 return modelName, nil
 }
-// setExportMessage updates the export message and persists the job state.
+// setExportMessage updates the export message and persists the job state. Called
+// from the background export goroutine, so it uses Background for write-through.
 func (s *FineTuneService) setExportMessage(job *schema.FineTuneJob, msg string) {
 s.mu.Lock()
 job.ExportMessage = msg
+if err := s.jobs.Set(context.Background(), job); err != nil {
+xlog.Warn("Failed to persist export message", "job_id", job.ID, "error", err)
+}
 s.saveJobState(job)
 s.mu.Unlock()
 }
@@ -687,7 +707,7 @@ func (s *FineTuneService) setExportMessage(job *schema.FineTuneJob, msg string)
 // GetExportedModelPath returns the path to the exported model directory and its name.
 func (s *FineTuneService) GetExportedModelPath(userID, jobID string) (string, string, error) {
 s.mu.Lock()
-job, ok := s.jobs[jobID]
+job, ok := s.jobs.Get(jobID)
 if !ok {
 s.mu.Unlock()
 return "", "", fmt.Errorf("job not found: %s", jobID)
@@ -723,10 +743,10 @@ func (s *FineTuneService) setExportFailed(job *schema.FineTuneJob, message strin
 s.mu.Lock()
 job.ExportStatus = "failed"
 job.ExportMessage = message
-s.saveJobState(job)
-if s.fineTuneStore != nil {
-s.fineTuneStore.UpdateExportStatus(job.ID, "failed", message, "")
+if err := s.jobs.Set(context.Background(), job); err != nil {
+xlog.Warn("Failed to persist export failure", "job_id", job.ID, "error", err)
 }
+s.saveJobState(job)
 s.mu.Unlock()
 }


@@ -0,0 +1,185 @@
package finetune
// White-box tests (package finetune) so a spec can drive the service's internal
// SyncedMap the same way StartJob does (via jobs.Set) without standing up a
// training backend, then assert the cross-replica reads (GetJob/ListJobs) and
// the adapter conversions that keep REST responses byte-for-byte unchanged.
import (
"context"
. "github.com/onsi/ginkgo/v2"
. "github.com/onsi/gomega"
"github.com/mudler/LocalAI/core/config"
"github.com/mudler/LocalAI/core/schema"
"github.com/mudler/LocalAI/core/services/distributed"
"github.com/mudler/LocalAI/core/services/testutil"
)
// newTestService builds a standalone FineTuneService wired to the given bus. The
// model/config loaders are nil because the read/sync paths under test never touch
// them; the data dir is a throwaway temp dir so the disk Loader finds nothing.
func newTestService(bus *testutil.FakeBus) *FineTuneService {
appConfig := &config.ApplicationConfig{
Context: context.Background(),
DataPath: GinkgoT().TempDir(),
}
return NewFineTuneService(appConfig, nil, nil, bus, nil)
}
var _ = Describe("FineTuneService", func() {
ctx := context.Background()
Describe("cross-replica job visibility", func() {
var (
bus *testutil.FakeBus
a, b *FineTuneService
)
BeforeEach(func() {
// One shared bus, two replicas: exactly the distributed topology where
// a round-robin request may land on a replica that did not originate
// the change.
bus = testutil.NewFakeBus()
a = newTestService(bus)
b = newTestService(bus)
})
AfterEach(func() {
Expect(a.Close()).To(Succeed())
Expect(b.Close()).To(Succeed())
})
It("makes a job created on A visible via B's GetJob and ListJobs", func() {
job := &schema.FineTuneJob{ID: "job-1", UserID: "user-1", Status: "queued", CreatedAt: "2026-06-27T10:00:00Z"}
// StartJob persists via jobs.Set; drive that directly to avoid a backend.
Expect(a.jobs.Set(ctx, job)).To(Succeed())
got, err := b.GetJob("user-1", "job-1")
Expect(err).ToNot(HaveOccurred(), "B must see a job A just created")
Expect(got.Status).To(Equal("queued"))
listed := b.ListJobs("user-1")
Expect(listed).To(HaveLen(1))
Expect(listed[0].ID).To(Equal("job-1"))
})
It("removes a job from B when it is deleted on A", func() {
job := &schema.FineTuneJob{ID: "job-2", UserID: "user-1", Status: "completed", CreatedAt: "2026-06-27T10:00:00Z"}
Expect(a.jobs.Set(ctx, job)).To(Succeed())
_, err := b.GetJob("user-1", "job-2")
Expect(err).ToNot(HaveOccurred(), "precondition: B must have the job before the delete")
Expect(a.jobs.Delete(ctx, "job-2")).To(Succeed())
_, err = b.GetJob("user-1", "job-2")
Expect(err).To(HaveOccurred(), "a delete on A must remove the job from B")
})
It("propagates a status update from A to B", func() {
job := &schema.FineTuneJob{ID: "job-3", UserID: "user-1", Status: "training", CreatedAt: "2026-06-27T10:00:00Z"}
Expect(a.jobs.Set(ctx, job)).To(Succeed())
updated := &schema.FineTuneJob{ID: "job-3", UserID: "user-1", Status: "completed", CreatedAt: "2026-06-27T10:00:00Z"}
Expect(a.jobs.Set(ctx, updated)).To(Succeed())
got, err := b.GetJob("user-1", "job-3")
Expect(err).ToNot(HaveOccurred())
Expect(got.Status).To(Equal("completed"))
})
})
Describe("ListJobs", func() {
var svc *FineTuneService
BeforeEach(func() {
svc = newTestService(testutil.NewFakeBus())
})
AfterEach(func() { Expect(svc.Close()).To(Succeed()) })
It("filters by user and sorts newest-first", func() {
Expect(svc.jobs.Set(ctx, &schema.FineTuneJob{ID: "old", UserID: "u1", CreatedAt: "2026-06-25T10:00:00Z"})).To(Succeed())
Expect(svc.jobs.Set(ctx, &schema.FineTuneJob{ID: "new", UserID: "u1", CreatedAt: "2026-06-27T10:00:00Z"})).To(Succeed())
Expect(svc.jobs.Set(ctx, &schema.FineTuneJob{ID: "other", UserID: "u2", CreatedAt: "2026-06-26T10:00:00Z"})).To(Succeed())
jobs := svc.ListJobs("u1")
Expect(jobs).To(HaveLen(2), "only u1's jobs")
Expect(jobs[0].ID).To(Equal("new"), "newest first")
Expect(jobs[1].ID).To(Equal("old"))
})
It("returns every user's jobs when the userID filter is empty", func() {
Expect(svc.jobs.Set(ctx, &schema.FineTuneJob{ID: "a", UserID: "u1", CreatedAt: "2026-06-25T10:00:00Z"})).To(Succeed())
Expect(svc.jobs.Set(ctx, &schema.FineTuneJob{ID: "b", UserID: "u2", CreatedAt: "2026-06-26T10:00:00Z"})).To(Succeed())
Expect(svc.ListJobs("")).To(HaveLen(2))
})
It("rejects GetJob for a job owned by another user", func() {
Expect(svc.jobs.Set(ctx, &schema.FineTuneJob{ID: "x", UserID: "owner", CreatedAt: "2026-06-25T10:00:00Z"})).To(Succeed())
_, err := svc.GetJob("intruder", "x")
Expect(err).To(HaveOccurred(), "a different user must not read someone else's job")
})
})
Describe("store adapter conversion", func() {
// The SyncedMap value type is *schema.FineTuneJob (the exact REST shape).
// These specs prove the DB adapter round-trips it losslessly, so hydrate
// and write-through in distributed mode keep responses unchanged.
It("round-trips a job through jobToRecord/recordToJob preserving the API shape", func() {
original := &schema.FineTuneJob{
ID: "rt-1",
UserID: "user-1",
Model: "base-model",
Backend: "trl",
ModelID: "trl-finetune-rt-1",
TrainingType: "lora",
TrainingMethod: "sft",
Status: "completed",
Message: "done",
OutputDir: "/data/fine-tune/rt-1",
ExtraOptions: map[string]string{"hf_token": "secret"},
CreatedAt: "2026-06-27T10:00:00Z",
ExportStatus: "completed",
ExportMessage: "",
ExportModelName: "base-model-ft-rt-1",
Config: &schema.FineTuneJobRequest{Model: "base-model", Backend: "trl", DatasetSource: "data.jsonl"},
}
rec := jobToRecord(original)
Expect(rec.ID).To(Equal("rt-1"))
Expect(rec.ConfigJSON).ToNot(BeEmpty(), "structured config must serialize into the JSON column")
Expect(rec.ExtraOptsJSON).ToNot(BeEmpty())
back := recordToJob(rec)
Expect(back.ID).To(Equal(original.ID))
Expect(back.UserID).To(Equal(original.UserID))
Expect(back.Model).To(Equal(original.Model))
Expect(back.Backend).To(Equal(original.Backend))
Expect(back.ModelID).To(Equal(original.ModelID))
Expect(back.TrainingType).To(Equal(original.TrainingType))
Expect(back.TrainingMethod).To(Equal(original.TrainingMethod))
Expect(back.Status).To(Equal(original.Status))
Expect(back.Message).To(Equal(original.Message))
Expect(back.OutputDir).To(Equal(original.OutputDir))
Expect(back.ExportStatus).To(Equal(original.ExportStatus))
Expect(back.ExportModelName).To(Equal(original.ExportModelName))
Expect(back.CreatedAt).To(Equal(original.CreatedAt))
Expect(back.ExtraOptions).To(Equal(original.ExtraOptions))
Expect(back.Config).ToNot(BeNil())
Expect(back.Config.DatasetSource).To(Equal("data.jsonl"))
})
})
Describe("compile-time adapter contract", func() {
It("satisfies syncstate.Store for *distributed.FineTuneStore", func() {
// Guards against drift between the adapter and the component interface;
// the var assertion in syncstore.go covers it at build time, this keeps
// the type referenced from a spec too.
var _ *distributed.FineTuneStore
Expect(&fineTuneStoreAdapter{}).ToNot(BeNil())
})
})
})
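
The ListJobs specs above pin down two behaviours: an empty userID returns every job, and results come back newest-first. The real ListJobs body is not part of this hunk, so the following is only a minimal sketch of one way to satisfy those specs. `Job` and `listJobs` are hypothetical stand-ins for `schema.FineTuneJob` and the service method, and comparing `CreatedAt` as a string is an assumption that holds only while every stamp uses the same UTC RFC3339 layout.

```go
package main

import (
	"fmt"
	"sort"
)

// Job is a hypothetical stand-in for schema.FineTuneJob, reduced to the
// fields the ListJobs specs exercise.
type Job struct {
	ID        string
	UserID    string
	CreatedAt string // RFC3339 in UTC, e.g. "2026-06-27T10:00:00Z"
}

// listJobs keeps the jobs owned by userID (or all jobs when userID is empty)
// and orders them newest-first. String comparison is an assumption: it is
// only chronological when all stamps share one fixed-width UTC layout.
func listJobs(all []*Job, userID string) []*Job {
	out := make([]*Job, 0, len(all))
	for _, j := range all {
		if userID == "" || j.UserID == userID {
			out = append(out, j)
		}
	}
	sort.SliceStable(out, func(a, b int) bool {
		return out[a].CreatedAt > out[b].CreatedAt
	})
	return out
}

func main() {
	all := []*Job{
		{ID: "old", UserID: "u1", CreatedAt: "2026-06-25T10:00:00Z"},
		{ID: "new", UserID: "u1", CreatedAt: "2026-06-27T10:00:00Z"},
		{ID: "other", UserID: "u2", CreatedAt: "2026-06-26T10:00:00Z"},
	}
	for _, j := range listJobs(all, "u1") {
		fmt.Println(j.ID, j.CreatedAt)
	}
}
```

If stamps could ever carry non-UTC offsets, parsing them with `time.Parse` before comparing would be the safer choice.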


@@ -0,0 +1,114 @@
package finetune
import (
"context"
"encoding/json"
"time"
"github.com/mudler/LocalAI/core/schema"
"github.com/mudler/LocalAI/core/services/distributed"
"github.com/mudler/LocalAI/core/services/syncstate"
)
// fineTuneStoreAdapter bridges the distributed PostgreSQL FineTuneStore to the
// generic syncstate.Store the SyncedMap consumes. It is only wired in distributed
// mode; standalone leaves Store nil and hydrates from disk via a Loader instead.
//
// The SyncedMap value type is *schema.FineTuneJob (the exact shape the REST API
// returns) so reads need no conversion and the response JSON is provably
// unchanged. The adapter is the single place that translates between that API
// shape and the DB FineTuneJobRecord.
type fineTuneStoreAdapter struct {
store *distributed.FineTuneStore
}
// compile-time assertion that the adapter satisfies the component's Store.
var _ syncstate.Store[string, *schema.FineTuneJob] = (*fineTuneStoreAdapter)(nil)
func (a *fineTuneStoreAdapter) List(_ context.Context) ([]*schema.FineTuneJob, error) {
records, err := a.store.ListAll()
if err != nil {
return nil, err
}
jobs := make([]*schema.FineTuneJob, 0, len(records))
for i := range records {
jobs = append(jobs, recordToJob(&records[i]))
}
return jobs, nil
}
func (a *fineTuneStoreAdapter) Upsert(_ context.Context, job *schema.FineTuneJob) error {
return a.store.Upsert(jobToRecord(job))
}
func (a *fineTuneStoreAdapter) Delete(_ context.Context, id string) error {
return a.store.Delete(id)
}
// recordToJob maps a persisted DB record back to the API shape, reconstructing
// the structured Config / ExtraOptions from their JSON columns.
func recordToJob(r *distributed.FineTuneJobRecord) *schema.FineTuneJob {
job := &schema.FineTuneJob{
ID: r.ID,
UserID: r.UserID,
Model: r.Model,
Backend: r.Backend,
ModelID: r.ModelID,
TrainingType: r.TrainingType,
TrainingMethod: r.TrainingMethod,
Status: r.Status,
Message: r.Message,
OutputDir: r.OutputDir,
ExportStatus: r.ExportStatus,
ExportMessage: r.ExportMessage,
ExportModelName: r.ExportModelName,
CreatedAt: r.CreatedAt.UTC().Format(time.RFC3339),
}
if r.ExtraOptsJSON != "" {
// Best-effort: a malformed column must not drop the whole job from the API.
_ = json.Unmarshal([]byte(r.ExtraOptsJSON), &job.ExtraOptions)
}
if r.ConfigJSON != "" {
var cfg schema.FineTuneJobRequest
if err := json.Unmarshal([]byte(r.ConfigJSON), &cfg); err == nil {
job.Config = &cfg
}
}
return job
}
// jobToRecord maps the API shape to a DB record for write-through, serializing
// the structured Config / ExtraOptions into their JSON columns. CreatedAt is
// parsed back from the RFC3339 string the service stamps; an unparseable value
// is left zero so FineTuneStore.Upsert stamps "now".
func jobToRecord(job *schema.FineTuneJob) *distributed.FineTuneJobRecord {
rec := &distributed.FineTuneJobRecord{
ID: job.ID,
UserID: job.UserID,
Model: job.Model,
Backend: job.Backend,
ModelID: job.ModelID,
TrainingType: job.TrainingType,
TrainingMethod: job.TrainingMethod,
Status: job.Status,
Message: job.Message,
OutputDir: job.OutputDir,
ExportStatus: job.ExportStatus,
ExportMessage: job.ExportMessage,
ExportModelName: job.ExportModelName,
}
if job.Config != nil {
if data, err := json.Marshal(job.Config); err == nil {
rec.ConfigJSON = string(data)
}
}
if job.ExtraOptions != nil {
if data, err := json.Marshal(job.ExtraOptions); err == nil {
rec.ExtraOptsJSON = string(data)
}
}
if t, err := time.Parse(time.RFC3339, job.CreatedAt); err == nil {
rec.CreatedAt = t
}
return rec
}
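
The CreatedAt handling in recordToJob/jobToRecord is the one field where the round trip goes through a type change (string to time.Time and back). The sketch below isolates just that conversion using only the standard library; `parseCreatedAt` and `formatCreatedAt` are hypothetical helper names, not functions from this change. It illustrates two consequences of the comments above: an unparseable stamp yields the zero time, and a stamp written with a non-UTC offset comes back normalized to UTC rather than byte-identical.

```go
package main

import (
	"fmt"
	"time"
)

// parseCreatedAt mirrors the jobToRecord fallback: an unparseable value is
// left as the zero time so a later layer can stamp "now".
func parseCreatedAt(s string) time.Time {
	t, err := time.Parse(time.RFC3339, s)
	if err != nil {
		return time.Time{}
	}
	return t
}

// formatCreatedAt mirrors recordToJob: always render in UTC as RFC3339.
func formatCreatedAt(t time.Time) string {
	return t.UTC().Format(time.RFC3339)
}

func main() {
	// A UTC stamp survives the round trip unchanged.
	fmt.Println(formatCreatedAt(parseCreatedAt("2026-06-27T10:00:00Z")))
	// A stamp with an offset is the same instant, but rendered in UTC.
	fmt.Println(formatCreatedAt(parseCreatedAt("2026-06-27T12:00:00+02:00")))
	// Garbage parses to the zero time.
	fmt.Println(parseCreatedAt("not-a-time").IsZero())
}
```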


@@ -22,6 +22,14 @@ const subscribeConfirmTimeout = 5 * time.Second
type Client struct { type Client struct {
conn *nats.Conn conn *nats.Conn
mu sync.RWMutex mu sync.RWMutex
// reconnectCbs are invoked after the underlying connection is
// re-established. nats.go transparently resubscribes existing
// subscriptions on reconnect, but it cannot know that a consumer kept
// derived in-memory state (e.g. syncstate.SyncedMap) that may have drifted
// while the link was down — these callbacks let such consumers re-hydrate.
cbMu sync.Mutex
reconnectCbs []func()
} }
// New creates a new NATS client with auto-reconnect. // New creates a new NATS client with auto-reconnect.
@@ -31,6 +39,10 @@ func New(url string, opts ...Option) (*Client, error) {
o(&cfg) o(&cfg)
} }
// Allocate the client up front so the reconnect handler closure can reach
// it; conn is populated after nats.Connect succeeds below.
c := &Client{}
natsOpts := []nats.Option{ natsOpts := []nats.Option{
nats.RetryOnFailedConnect(true), nats.RetryOnFailedConnect(true),
nats.MaxReconnects(-1), nats.MaxReconnects(-1),
@@ -41,6 +53,7 @@ func New(url string, opts ...Option) (*Client, error) {
}), }),
nats.ReconnectHandler(func(_ *nats.Conn) { nats.ReconnectHandler(func(_ *nats.Conn) {
xlog.Info("NATS reconnected") xlog.Info("NATS reconnected")
c.runReconnectCallbacks()
}), }),
nats.ClosedHandler(func(_ *nats.Conn) { nats.ClosedHandler(func(_ *nats.Conn) {
xlog.Info("NATS connection closed") xlog.Info("NATS connection closed")
@@ -103,7 +116,33 @@ func New(url string, opts ...Option) (*Client, error) {
return nil, fmt.Errorf("connecting to NATS at %s: %w", sanitize.URL(url), err) return nil, fmt.Errorf("connecting to NATS at %s: %w", sanitize.URL(url), err)
} }
- return &Client{conn: nc}, nil
+ c.conn = nc
+ return c, nil
}
// OnReconnect registers a callback invoked after the NATS connection is
// re-established. It is consumed via an optional interface type-assertion
// (interface{ OnReconnect(func()) }) rather than being added to MessagingClient,
// so the messaging abstraction stays minimal and standalone/test clients are not
// forced to implement reconnect semantics. A nil callback is ignored.
func (c *Client) OnReconnect(cb func()) {
if cb == nil {
return
}
c.cbMu.Lock()
c.reconnectCbs = append(c.reconnectCbs, cb)
c.cbMu.Unlock()
}
// runReconnectCallbacks invokes registered reconnect callbacks. It copies the
// slice under the lock so a callback that (re)registers cannot deadlock.
func (c *Client) runReconnectCallbacks() {
c.cbMu.Lock()
cbs := append([]func(){}, c.reconnectCbs...)
c.cbMu.Unlock()
for _, cb := range cbs {
cb()
}
} }
// Publish marshals data as JSON and publishes it to the given subject. // Publish marshals data as JSON and publishes it to the given subject.
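
The OnReconnect comment describes consuming the hook through an optional interface type-assertion instead of widening MessagingClient. A minimal sketch of that pattern follows, under stated assumptions: `fakeClient`, `plainClient`, `watchReconnect`, and `fireReconnect` are hypothetical names for illustration, and no NATS connection is involved. It also shows why copying the callback slice under the lock lets a callback register another callback without deadlocking.

```go
package main

import (
	"fmt"
	"sync"
)

// reconnectNotifier is the optional interface a consumer probes for.
type reconnectNotifier interface{ OnReconnect(func()) }

// fakeClient is a hypothetical client that supports reconnect callbacks.
type fakeClient struct {
	mu  sync.Mutex
	cbs []func()
}

func (c *fakeClient) OnReconnect(cb func()) {
	if cb == nil {
		return
	}
	c.mu.Lock()
	c.cbs = append(c.cbs, cb)
	c.mu.Unlock()
}

// fireReconnect simulates a reconnect: it copies the slice under the lock and
// runs the callbacks after releasing it, so a callback may register again.
func (c *fakeClient) fireReconnect() {
	c.mu.Lock()
	cbs := append([]func(){}, c.cbs...)
	c.mu.Unlock()
	for _, cb := range cbs {
		cb()
	}
}

// plainClient is a hypothetical client with no reconnect semantics.
type plainClient struct{}

// watchReconnect registers rehydrate only if the client opts in, and reports
// whether it did.
func watchReconnect(client any, rehydrate func()) bool {
	n, ok := client.(reconnectNotifier)
	if !ok {
		return false
	}
	n.OnReconnect(rehydrate)
	return true
}

func main() {
	c := &fakeClient{}
	hydrations := 0
	fmt.Println(watchReconnect(c, func() { hydrations++ }))
	c.fireReconnect()
	fmt.Println(hydrations)
	fmt.Println(watchReconnect(plainClient{}, func() {}))
}
```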


@@ -380,6 +380,20 @@ func SubjectCacheInvalidateCollection(name string) string {
return "cache.invalidate.collections." + sanitizeSubjectToken(name) return "cache.invalidate.collections." + sanitizeSubjectToken(name)
} }
// SyncedMap State Sync (Pub/Sub — broadcast to all frontends)
//
// The reusable syncstate.SyncedMap component publishes a {op,key,value} delta on
// this subject whenever a replica mutates a piece of cross-replica in-memory
// state. Peers subscribe and apply the delta to their own map, so a round-robin
// API request that lands on a replica which did not originate the change still
// sees it. Convergence on (re)connect is done by re-hydrating from the durable
// source, so no request/reply snapshot subject is needed here.
func SubjectSyncStateDelta(name string) string {
return subjectSyncStatePrefix + sanitizeSubjectToken(name) + ".delta"
}
const subjectSyncStatePrefix = "state."
// Prefix-Cache Routing Sync (Pub/Sub - broadcast to all frontends) // Prefix-Cache Routing Sync (Pub/Sub - broadcast to all frontends)
// //
// Frontends share prefix-cache observations so a request routed to any replica // Frontends share prefix-cache observations so a request routed to any replica


@@ -63,6 +63,11 @@ type SmartRouterOptions struct {
// The reconciler reads the same instance to autoscale a saturated cache-warm // The reconciler reads the same instance to autoscale a saturated cache-warm
// replica. nil disables recording (the disabled path stays a no-op). // replica. nil disables recording (the disabled path stays a no-op).
Pressure *prefixcache.Pressure Pressure *prefixcache.Pressure
// SharedModels asserts that every node mounts the same models directory at
// the same path. When true, stageModelFiles skips all uploading and leaves
// the absolute model paths untouched so the worker loads them directly from
// the shared volume (#10556). See config.DistributedConfig.SharedModels.
SharedModels bool
} }
// SmartRouter routes inference requests to the best available backend node. // SmartRouter routes inference requests to the best available backend node.
@@ -93,6 +98,9 @@ type SmartRouter struct {
// per-request routing doesn't stall behind a busy backend's serialized // per-request routing doesn't stall behind a busy backend's serialized
// HealthCheck/Predict. See probe_cache.go for the rationale. // HealthCheck/Predict. See probe_cache.go for the rationale.
probeCache *probeCache probeCache *probeCache
// sharedModels skips file staging when all nodes mount the same models
// directory at the same path (see SmartRouterOptions.SharedModels).
sharedModels bool
} }
// probeCacheTTL is how long a successful gRPC HealthCheck on a backend is // probeCacheTTL is how long a successful gRPC HealthCheck on a backend is
@@ -122,6 +130,7 @@ func NewSmartRouter(registry ModelRouter, opts SmartRouterOptions) *SmartRouter
prefixProvider: opts.PrefixProvider, prefixProvider: opts.PrefixProvider,
prefixConfig: opts.PrefixConfig, prefixConfig: opts.PrefixConfig,
pressure: opts.Pressure, pressure: opts.Pressure,
sharedModels: opts.SharedModels,
} }
} }
@@ -947,6 +956,19 @@ func (r *SmartRouter) buildClientForAddr(node *BackendNode, addr string, paralle
// simply remove the {ModelsPath}/{trackingKey}/ directory. // simply remove the {ModelsPath}/{trackingKey}/ directory.
func (r *SmartRouter) stageModelFiles(ctx context.Context, node *BackendNode, opts *pb.ModelOptions, trackingKey string) (*pb.ModelOptions, error) { func (r *SmartRouter) stageModelFiles(ctx context.Context, node *BackendNode, opts *pb.ModelOptions, trackingKey string) (*pb.ModelOptions, error) {
opts = proto.Clone(opts).(*pb.ModelOptions) opts = proto.Clone(opts).(*pb.ModelOptions)
// Shared-models mode: every node mounts the same models directory at the
// same path, so the frontend's absolute model paths are already valid on the
// worker. Staging would only re-upload files that already exist on the shared
// volume (under a tracking-key subdir the probe never reuses), re-downloading
// the model on every load (#10556). Return the clone untouched: no upload, no
// path rewrite, no staging tracker.
if r.sharedModels {
xlog.Info("Skipping model file staging: shared-models mode is on (LOCALAI_DISTRIBUTED_SHARED_MODELS); worker loads directly from the shared volume",
"node", node.Name, "modelFile", opts.ModelFile, "trackingKey", trackingKey)
return opts, nil
}
xlog.Info("Staging model files for remote node", "node", node.Name, "modelFile", opts.ModelFile, "trackingKey", trackingKey) xlog.Info("Staging model files for remote node", "node", node.Name, "modelFile", opts.ModelFile, "trackingKey", trackingKey)
// Derive the frontend models directory from ModelFile and Model. // Derive the frontend models directory from ModelFile and Model.


@@ -0,0 +1,85 @@
package nodes
import (
"context"
"os"
"path/filepath"
. "github.com/onsi/ginkgo/v2"
. "github.com/onsi/gomega"
pb "github.com/mudler/LocalAI/pkg/grpc/proto"
)
// These tests cover shared-models mode (LOCALAI_DISTRIBUTED_SHARED_MODELS): when
// every node mounts the same models directory at the same path, the router must
// NOT stage model files to workers. The canonical absolute path is already valid
// on the worker, so staging would only re-download what is already present
// (#10556).
var _ = Describe("stageModelFiles shared-models mode", func() {
var (
stager *fakeFileStager
node *BackendNode
tmp string
gguf string
modelID = "ornith-1.0-35b"
)
BeforeEach(func() {
stager = &fakeFileStager{}
node = &BackendNode{ID: "node-1", Name: "node-1", Address: "10.0.0.1:50051"}
tmp = GinkgoT().TempDir()
modelDir := filepath.Join(tmp, "models", "llama-cpp", "models")
Expect(os.MkdirAll(modelDir, 0o755)).To(Succeed())
gguf = filepath.Join(modelDir, "ornith.gguf")
Expect(os.WriteFile(gguf, []byte("weights"), 0o644)).To(Succeed())
})
It("does not stage and keeps the canonical absolute ModelFile when shared-models is enabled", func() {
router := &SmartRouter{
fileStager: stager,
stagingTracker: NewStagingTracker(),
sharedModels: true,
}
opts := &pb.ModelOptions{
Model: "llama-cpp/models/ornith.gguf",
ModelFile: gguf,
}
staged, err := router.stageModelFiles(context.Background(), node, opts, modelID)
Expect(err).ToNot(HaveOccurred())
// The file stager must never be touched: no upload, no re-download.
Expect(stager.ensureCalls).To(BeEmpty())
// The worker loads directly from the shared volume, so the path is unchanged.
Expect(staged.ModelFile).To(Equal(gguf))
})
It("stages files (existing behavior) when shared-models is disabled", func() {
router := &SmartRouter{
fileStager: stager,
stagingTracker: NewStagingTracker(),
sharedModels: false,
}
opts := &pb.ModelOptions{
Model: "llama-cpp/models/ornith.gguf",
ModelFile: gguf,
}
staged, err := router.stageModelFiles(context.Background(), node, opts, modelID)
Expect(err).ToNot(HaveOccurred())
// Default mode uploads the model file to the worker.
Expect(stager.ensureCalls).ToNot(BeEmpty())
stagedLocals := make([]string, 0, len(stager.ensureCalls))
for _, c := range stager.ensureCalls {
stagedLocals = append(stagedLocals, c.localPath)
}
Expect(stagedLocals).To(ContainElement(gguf))
// ModelFile is rewritten to the remote (tracking-key namespaced) path.
Expect(staged.ModelFile).ToNot(Equal(gguf))
})
})
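
The two specs above describe a single branch: with shared-models on, nothing is uploaded and the path is returned as-is; with it off, the file is uploaded and the path is rewritten under the tracking key. The sketch below reduces that to a self-contained function so the contract is easy to see. It is not the real stageModelFiles: `recordingStager`, `stageModelFile`, and the `models/<trackingKey>/<file>` remote layout are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"path"
)

// recordingStager is a hypothetical stager that only records what it was
// asked to upload.
type recordingStager struct{ uploads []string }

func (s *recordingStager) upload(local, remote string) {
	s.uploads = append(s.uploads, local)
}

// stageModelFile returns the path the worker should load. In shared-models
// mode the local path is already valid on the worker, so it is returned
// untouched. Otherwise the file is uploaded and the path rewritten under the
// tracking key (the remote layout here is an assumption).
func stageModelFile(s *recordingStager, sharedModels bool, modelFile, trackingKey string) string {
	if sharedModels {
		return modelFile
	}
	remote := path.Join("models", trackingKey, path.Base(modelFile))
	s.upload(modelFile, remote)
	return remote
}

func main() {
	s := &recordingStager{}
	fmt.Println(stageModelFile(s, true, "/mnt/models/ornith.gguf", "ornith-1.0-35b"), len(s.uploads))
	fmt.Println(stageModelFile(s, false, "/mnt/models/ornith.gguf", "ornith-1.0-35b"), len(s.uploads))
}
```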


@@ -0,0 +1,13 @@
package quantization
import (
"testing"
. "github.com/onsi/ginkgo/v2"
. "github.com/onsi/gomega"
)
func TestQuantization(t *testing.T) {
RegisterFailHandler(Fail)
RunSpecs(t, "Quantization Suite")
}


@@ -17,6 +17,9 @@ import (
"github.com/mudler/LocalAI/core/config" "github.com/mudler/LocalAI/core/config"
"github.com/mudler/LocalAI/core/gallery/importers" "github.com/mudler/LocalAI/core/gallery/importers"
"github.com/mudler/LocalAI/core/schema" "github.com/mudler/LocalAI/core/schema"
"github.com/mudler/LocalAI/core/services/distributed"
"github.com/mudler/LocalAI/core/services/messaging"
"github.com/mudler/LocalAI/core/services/syncstate"
pb "github.com/mudler/LocalAI/pkg/grpc/proto" pb "github.com/mudler/LocalAI/pkg/grpc/proto"
"github.com/mudler/LocalAI/pkg/model" "github.com/mudler/LocalAI/pkg/model"
"github.com/mudler/LocalAI/pkg/utils" "github.com/mudler/LocalAI/pkg/utils"
@@ -30,26 +33,63 @@ type QuantizationService struct {
modelLoader *model.ModelLoader modelLoader *model.ModelLoader
configLoader *config.ModelConfigLoader configLoader *config.ModelConfigLoader
- mu sync.Mutex
- jobs map[string]*schema.QuantizationJob
+ // mu serializes the read-modify-write of job values. The SyncedMap guards its
+ // own map structure, but a job is a pointer mutated in place (e.g. the import
+ // goroutine), so the service still needs a lock to keep those field updates and
+ // the subsequent Set atomic with respect to readers.
+ mu sync.Mutex
+ // jobs is the cross-replica job store: an in-memory map kept consistent across
+ // replicas via NATS, optionally hydrated from and written through to PostgreSQL
+ // in distributed mode.
+ jobs *syncstate.SyncedMap[string, *schema.QuantizationJob]
} }
- // NewQuantizationService creates a new QuantizationService.
+ // NewQuantizationService creates a new QuantizationService. In distributed mode
+ // pass the shared NATS client and PostgreSQL store so jobs stay consistent across
+ // replicas; pass nil for both in standalone mode, where the disk Loader hydrates
+ // the map and there is nothing to broadcast.
func NewQuantizationService( func NewQuantizationService(
appConfig *config.ApplicationConfig, appConfig *config.ApplicationConfig,
modelLoader *model.ModelLoader, modelLoader *model.ModelLoader,
configLoader *config.ModelConfigLoader, configLoader *config.ModelConfigLoader,
nats messaging.MessagingClient,
store *distributed.QuantStore,
) *QuantizationService { ) *QuantizationService {
s := &QuantizationService{ s := &QuantizationService{
appConfig: appConfig, appConfig: appConfig,
modelLoader: modelLoader, modelLoader: modelLoader,
configLoader: configLoader, configLoader: configLoader,
jobs: make(map[string]*schema.QuantizationJob),
} }
s.loadAllJobs()
// Only attach a Store interface when a concrete store exists, otherwise the
// SyncedMap would see a non-nil interface wrapping a nil pointer and try to
// hydrate/write through a nil DB.
var syncStore syncstate.Store[string, *schema.QuantizationJob]
if store != nil {
syncStore = &quantStoreAdapter{store: store}
}
s.jobs = syncstate.New(syncstate.Config[string, *schema.QuantizationJob]{
Name: "quant.jobs",
Key: func(j *schema.QuantizationJob) string { return j.ID },
Nats: nats,
Store: syncStore,
Loader: s.loadJobsFromDisk, // ignored when Store is set (distributed mode)
})
// Hydrate + subscribe. A hydrate failure must not take the server down: log and
// continue degraded (standalone), mirroring the FineTune/OpCache wiring.
if err := s.jobs.Start(appConfig.Context); err != nil {
xlog.Warn("Quantization SyncedMap start failed; running degraded", "error", err)
}
return s return s
} }
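
The guard around syncStore in the constructor above exists because of Go's typed-nil behaviour: an interface holding a nil pointer is itself non-nil. The sketch below demonstrates the general pitfall with hypothetical types (`Store`, `dbStore`, `wrapNaive`, `wrapGuarded` are illustrative names, not part of this change); it is not a claim about how SyncedMap checks its Store field.

```go
package main

import "fmt"

// Store is a hypothetical one-method interface standing in for a store
// contract.
type Store interface{ List() []string }

// dbStore is a hypothetical concrete store; List would panic on a nil receiver.
type dbStore struct{ rows []string }

func (d *dbStore) List() []string { return d.rows }

// wrapNaive assigns the pointer straight into the interface. A nil *dbStore
// becomes a non-nil Store, so a later "if store != nil" check passes.
func wrapNaive(d *dbStore) Store { return d }

// wrapGuarded only attaches the interface when a concrete store exists.
func wrapGuarded(d *dbStore) Store {
	if d == nil {
		return nil
	}
	return d
}

func main() {
	fmt.Println(wrapNaive(nil) == nil)   // prints "false"
	fmt.Println(wrapGuarded(nil) == nil) // prints "true"
}
```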
// Close releases the SyncedMap subscription and background workers.
func (s *QuantizationService) Close() error {
return s.jobs.Close()
}
// quantizationBaseDir returns the base directory for quantization job data. // quantizationBaseDir returns the base directory for quantization job data.
func (s *QuantizationService) quantizationBaseDir() string { func (s *QuantizationService) quantizationBaseDir() string {
return filepath.Join(s.appConfig.DataPath, "quantization") return filepath.Join(s.appConfig.DataPath, "quantization")
@@ -80,15 +120,18 @@ func (s *QuantizationService) saveJobState(job *schema.QuantizationJob) {
} }
} }
- // loadAllJobs scans the quantization directory for persisted jobs and loads them.
- func (s *QuantizationService) loadAllJobs() {
+ // loadJobsFromDisk scans the quantization directory for persisted jobs and
+ // returns them. It is the SyncedMap Loader used in standalone mode (no DB); the
+ // returned slice hydrates the map on Start.
+ func (s *QuantizationService) loadJobsFromDisk(_ context.Context) ([]*schema.QuantizationJob, error) {
baseDir := s.quantizationBaseDir() baseDir := s.quantizationBaseDir()
entries, err := os.ReadDir(baseDir) entries, err := os.ReadDir(baseDir)
if err != nil { if err != nil {
// Directory doesn't exist yet — that's fine // Directory doesn't exist yet — that's fine, start empty.
return return nil, nil
} }
var jobs []*schema.QuantizationJob
for _, entry := range entries { for _, entry := range entries {
if !entry.IsDir() { if !entry.IsDir() {
continue continue
@@ -117,12 +160,13 @@ func (s *QuantizationService) loadAllJobs() {
job.ImportMessage = "Server restarted while import was running" job.ImportMessage = "Server restarted while import was running"
} }
s.jobs[job.ID] = &job jobs = append(jobs, &job)
} }
if len(s.jobs) > 0 { if len(jobs) > 0 {
xlog.Info("Loaded persisted quantization jobs", "count", len(s.jobs)) xlog.Info("Loaded persisted quantization jobs", "count", len(jobs))
} }
return jobs, nil
} }
// StartJob starts a new quantization job. // StartJob starts a new quantization job.
@@ -188,7 +232,12 @@ func (s *QuantizationService) StartJob(ctx context.Context, userID string, req s
CreatedAt: time.Now().UTC().Format(time.RFC3339), CreatedAt: time.Now().UTC().Format(time.RFC3339),
Config: &req, Config: &req,
} }
- s.jobs[jobID] = job
+ // Set write-through persists to PostgreSQL (distributed) and broadcasts to
+ // peer replicas; the disk state.json is written separately for restart
+ // recovery / standalone hydrate.
+ if err := s.jobs.Set(ctx, job); err != nil {
+ return nil, fmt.Errorf("failed to persist job: %w", err)
+ }
s.saveJobState(job) s.saveJobState(job)
return &schema.QuantizationJobResponse{ return &schema.QuantizationJobResponse{
@@ -203,7 +252,7 @@ func (s *QuantizationService) GetJob(userID, jobID string) (*schema.Quantization
s.mu.Lock() s.mu.Lock()
defer s.mu.Unlock() defer s.mu.Unlock()
job, ok := s.jobs[jobID] job, ok := s.jobs.Get(jobID)
if !ok { if !ok {
return nil, fmt.Errorf("job not found: %s", jobID) return nil, fmt.Errorf("job not found: %s", jobID)
} }
@@ -219,7 +268,7 @@ func (s *QuantizationService) ListJobs(userID string) []*schema.QuantizationJob
defer s.mu.Unlock() defer s.mu.Unlock()
var result []*schema.QuantizationJob var result []*schema.QuantizationJob
for _, job := range s.jobs { for _, job := range s.jobs.List() {
if userID == "" || job.UserID == userID { if userID == "" || job.UserID == userID {
result = append(result, job) result = append(result, job)
} }
@@ -235,7 +284,7 @@ func (s *QuantizationService) ListJobs(userID string) []*schema.QuantizationJob
// StopJob stops a running quantization job. // StopJob stops a running quantization job.
func (s *QuantizationService) StopJob(ctx context.Context, userID, jobID string) error { func (s *QuantizationService) StopJob(ctx context.Context, userID, jobID string) error {
s.mu.Lock() s.mu.Lock()
job, ok := s.jobs[jobID] job, ok := s.jobs.Get(jobID)
if !ok { if !ok {
s.mu.Unlock() s.mu.Unlock()
return fmt.Errorf("job not found: %s", jobID) return fmt.Errorf("job not found: %s", jobID)
@@ -256,6 +305,9 @@ func (s *QuantizationService) StopJob(ctx context.Context, userID, jobID string)
s.mu.Lock() s.mu.Lock()
job.Status = "stopped" job.Status = "stopped"
job.Message = "Quantization stopped by user" job.Message = "Quantization stopped by user"
if err := s.jobs.Set(ctx, job); err != nil {
xlog.Warn("Failed to persist stopped job", "job_id", jobID, "error", err)
}
s.saveJobState(job) s.saveJobState(job)
s.mu.Unlock() s.mu.Unlock()
@@ -265,7 +317,7 @@ func (s *QuantizationService) StopJob(ctx context.Context, userID, jobID string)
// DeleteJob removes a quantization job and its associated data from disk. // DeleteJob removes a quantization job and its associated data from disk.
func (s *QuantizationService) DeleteJob(userID, jobID string) error { func (s *QuantizationService) DeleteJob(userID, jobID string) error {
s.mu.Lock() s.mu.Lock()
job, ok := s.jobs[jobID] job, ok := s.jobs.Get(jobID)
if !ok { if !ok {
s.mu.Unlock() s.mu.Unlock()
return fmt.Errorf("job not found: %s", jobID) return fmt.Errorf("job not found: %s", jobID)
@@ -289,7 +341,11 @@ func (s *QuantizationService) DeleteJob(userID, jobID string) error {
} }
importModelName := job.ImportModelName importModelName := job.ImportModelName
- delete(s.jobs, jobID)
+ // Delete write-through removes the DB row (distributed) and broadcasts the
+ // removal to peer replicas. DeleteJob has no ctx, so use Background.
+ if err := s.jobs.Delete(context.Background(), jobID); err != nil {
+ xlog.Warn("Failed to delete job from store", "job_id", jobID, "error", err)
+ }
s.mu.Unlock() s.mu.Unlock()
// Remove job directory (state.json, output files) // Remove job directory (state.json, output files)
@@ -324,7 +380,7 @@ func (s *QuantizationService) DeleteJob(userID, jobID string) error {
// StreamProgress opens a gRPC progress stream and calls the callback for each update. // StreamProgress opens a gRPC progress stream and calls the callback for each update.
func (s *QuantizationService) StreamProgress(ctx context.Context, userID, jobID string, callback func(event *schema.QuantizationProgressEvent)) error { func (s *QuantizationService) StreamProgress(ctx context.Context, userID, jobID string, callback func(event *schema.QuantizationProgressEvent)) error {
s.mu.Lock() s.mu.Lock()
job, ok := s.jobs[jobID] job, ok := s.jobs.Get(jobID)
if !ok { if !ok {
s.mu.Unlock() s.mu.Unlock()
return fmt.Errorf("job not found: %s", jobID) return fmt.Errorf("job not found: %s", jobID)
@@ -353,7 +409,7 @@ func (s *QuantizationService) StreamProgress(ctx context.Context, userID, jobID
}, func(update *pb.QuantizationProgressUpdate) { }, func(update *pb.QuantizationProgressUpdate) {
// Update job status and persist // Update job status and persist
s.mu.Lock() s.mu.Lock()
if j, ok := s.jobs[jobID]; ok { if j, ok := s.jobs.Get(jobID); ok {
// Don't let progress updates overwrite terminal states // Don't let progress updates overwrite terminal states
isTerminal := j.Status == "stopped" || j.Status == "completed" || j.Status == "failed" isTerminal := j.Status == "stopped" || j.Status == "completed" || j.Status == "failed"
if !isTerminal { if !isTerminal {
@@ -365,6 +421,9 @@ func (s *QuantizationService) StreamProgress(ctx context.Context, userID, jobID
if update.OutputFile != "" { if update.OutputFile != "" {
j.OutputFile = update.OutputFile j.OutputFile = update.OutputFile
} }
if err := s.jobs.Set(ctx, j); err != nil {
xlog.Warn("Failed to persist progress update", "job_id", jobID, "error", err)
}
s.saveJobState(j) s.saveJobState(j)
} }
s.mu.Unlock() s.mu.Unlock()
@@ -399,7 +458,7 @@ func sanitizeQuantModelName(s string) string {
// ImportModel imports a quantized model into LocalAI asynchronously. // ImportModel imports a quantized model into LocalAI asynchronously.
func (s *QuantizationService) ImportModel(ctx context.Context, userID, jobID string, req schema.QuantizationImportRequest) (string, error) { func (s *QuantizationService) ImportModel(ctx context.Context, userID, jobID string, req schema.QuantizationImportRequest) (string, error) {
s.mu.Lock() s.mu.Lock()
job, ok := s.jobs[jobID] job, ok := s.jobs.Get(jobID)
if !ok { if !ok {
s.mu.Unlock() s.mu.Unlock()
return "", fmt.Errorf("job not found: %s", jobID) return "", fmt.Errorf("job not found: %s", jobID)
@@ -459,6 +518,9 @@ func (s *QuantizationService) ImportModel(ctx context.Context, userID, jobID str
job.ImportStatus = "importing" job.ImportStatus = "importing"
job.ImportMessage = "" job.ImportMessage = ""
job.ImportModelName = "" job.ImportModelName = ""
if err := s.jobs.Set(ctx, job); err != nil {
xlog.Warn("Failed to persist import start", "job_id", jobID, "error", err)
}
s.saveJobState(job) s.saveJobState(job)
s.mu.Unlock() s.mu.Unlock()
@@ -514,10 +576,15 @@ func (s *QuantizationService) ImportModel(ctx context.Context, userID, jobID str
 xlog.Info("Quantized model imported and registered", "job_id", jobID, "model_name", modelName)
+// Runs after the HTTP request returns, so use Background rather than the
+// (now likely cancelled) request ctx for the write-through.
 s.mu.Lock()
 job.ImportStatus = "completed"
 job.ImportModelName = modelName
 job.ImportMessage = ""
+if err := s.jobs.Set(context.Background(), job); err != nil {
+xlog.Warn("Failed to persist import completion", "job_id", jobID, "error", err)
+}
 s.saveJobState(job)
 s.mu.Unlock()
 }()
@@ -525,10 +592,14 @@ func (s *QuantizationService) ImportModel(ctx context.Context, userID, jobID str
 return modelName, nil
 }
-// setImportMessage updates the import message and persists the job state.
+// setImportMessage updates the import message and persists the job state. Called
+// from the background import goroutine, so it uses Background for write-through.
 func (s *QuantizationService) setImportMessage(job *schema.QuantizationJob, msg string) {
 s.mu.Lock()
 job.ImportMessage = msg
+if err := s.jobs.Set(context.Background(), job); err != nil {
+xlog.Warn("Failed to persist import message", "job_id", job.ID, "error", err)
+}
 s.saveJobState(job)
 s.mu.Unlock()
 }
@@ -539,6 +610,9 @@ func (s *QuantizationService) setImportFailed(job *schema.QuantizationJob, messa
 s.mu.Lock()
 job.ImportStatus = "failed"
 job.ImportMessage = message
+if err := s.jobs.Set(context.Background(), job); err != nil {
+xlog.Warn("Failed to persist import failure", "job_id", job.ID, "error", err)
+}
 s.saveJobState(job)
 s.mu.Unlock()
 }
@@ -546,7 +620,7 @@ func (s *QuantizationService) setImportFailed(job *schema.QuantizationJob, messa
 // GetOutputPath returns the path to the quantized model file and a download name.
 func (s *QuantizationService) GetOutputPath(userID, jobID string) (string, string, error) {
 s.mu.Lock()
-job, ok := s.jobs[jobID]
+job, ok := s.jobs.Get(jobID)
 if !ok {
 s.mu.Unlock()
 return "", "", fmt.Errorf("job not found: %s", jobID)


@@ -0,0 +1,187 @@
package quantization
// White-box tests (package quantization) so a spec can drive the service's
// internal SyncedMap the same way StartJob does (via jobs.Set) without standing
// up a quantization backend, then assert the cross-replica reads
// (GetJob/ListJobs) and the adapter conversions that keep REST responses
// byte-for-byte unchanged.
import (
"context"
. "github.com/onsi/ginkgo/v2"
. "github.com/onsi/gomega"
"github.com/mudler/LocalAI/core/config"
"github.com/mudler/LocalAI/core/schema"
"github.com/mudler/LocalAI/core/services/distributed"
"github.com/mudler/LocalAI/core/services/testutil"
)
// newTestService builds a standalone QuantizationService wired to the given bus.
// The model/config loaders are nil because the read/sync paths under test never
// touch them; the data dir is a throwaway temp dir so the disk Loader finds
// nothing.
func newTestService(bus *testutil.FakeBus) *QuantizationService {
appConfig := &config.ApplicationConfig{
Context: context.Background(),
DataPath: GinkgoT().TempDir(),
}
return NewQuantizationService(appConfig, nil, nil, bus, nil)
}
var _ = Describe("QuantizationService", func() {
ctx := context.Background()
Describe("cross-replica job visibility", func() {
var (
bus *testutil.FakeBus
a, b *QuantizationService
)
BeforeEach(func() {
// One shared bus, two replicas: exactly the distributed topology where a
// round-robin request may land on a replica that did not originate the
// change.
bus = testutil.NewFakeBus()
a = newTestService(bus)
b = newTestService(bus)
})
AfterEach(func() {
Expect(a.Close()).To(Succeed())
Expect(b.Close()).To(Succeed())
})
It("makes a job created on A visible via B's GetJob and ListJobs", func() {
job := &schema.QuantizationJob{ID: "job-1", UserID: "user-1", Status: "queued", CreatedAt: "2026-06-27T10:00:00Z"}
// StartJob persists via jobs.Set; drive that directly to avoid a backend.
Expect(a.jobs.Set(ctx, job)).To(Succeed())
got, err := b.GetJob("user-1", "job-1")
Expect(err).ToNot(HaveOccurred(), "B must see a job A just created")
Expect(got.Status).To(Equal("queued"))
listed := b.ListJobs("user-1")
Expect(listed).To(HaveLen(1))
Expect(listed[0].ID).To(Equal("job-1"))
})
It("removes a job from B when it is deleted on A", func() {
job := &schema.QuantizationJob{ID: "job-2", UserID: "user-1", Status: "completed", CreatedAt: "2026-06-27T10:00:00Z"}
Expect(a.jobs.Set(ctx, job)).To(Succeed())
_, err := b.GetJob("user-1", "job-2")
Expect(err).ToNot(HaveOccurred(), "precondition: B must have the job before the delete")
Expect(a.jobs.Delete(ctx, "job-2")).To(Succeed())
_, err = b.GetJob("user-1", "job-2")
Expect(err).To(HaveOccurred(), "a delete on A must remove the job from B")
})
It("propagates a status update from A to B", func() {
job := &schema.QuantizationJob{ID: "job-3", UserID: "user-1", Status: "quantizing", CreatedAt: "2026-06-27T10:00:00Z"}
Expect(a.jobs.Set(ctx, job)).To(Succeed())
updated := &schema.QuantizationJob{ID: "job-3", UserID: "user-1", Status: "completed", CreatedAt: "2026-06-27T10:00:00Z"}
Expect(a.jobs.Set(ctx, updated)).To(Succeed())
got, err := b.GetJob("user-1", "job-3")
Expect(err).ToNot(HaveOccurred())
Expect(got.Status).To(Equal("completed"))
})
})
Describe("ListJobs", func() {
var svc *QuantizationService
BeforeEach(func() {
svc = newTestService(testutil.NewFakeBus())
})
AfterEach(func() { Expect(svc.Close()).To(Succeed()) })
It("filters by user and sorts newest-first", func() {
Expect(svc.jobs.Set(ctx, &schema.QuantizationJob{ID: "old", UserID: "u1", CreatedAt: "2026-06-25T10:00:00Z"})).To(Succeed())
Expect(svc.jobs.Set(ctx, &schema.QuantizationJob{ID: "new", UserID: "u1", CreatedAt: "2026-06-27T10:00:00Z"})).To(Succeed())
Expect(svc.jobs.Set(ctx, &schema.QuantizationJob{ID: "other", UserID: "u2", CreatedAt: "2026-06-26T10:00:00Z"})).To(Succeed())
jobs := svc.ListJobs("u1")
Expect(jobs).To(HaveLen(2), "only u1's jobs")
Expect(jobs[0].ID).To(Equal("new"), "newest first")
Expect(jobs[1].ID).To(Equal("old"))
})
It("returns every user's jobs when the userID filter is empty", func() {
Expect(svc.jobs.Set(ctx, &schema.QuantizationJob{ID: "a", UserID: "u1", CreatedAt: "2026-06-25T10:00:00Z"})).To(Succeed())
Expect(svc.jobs.Set(ctx, &schema.QuantizationJob{ID: "b", UserID: "u2", CreatedAt: "2026-06-26T10:00:00Z"})).To(Succeed())
Expect(svc.ListJobs("")).To(HaveLen(2))
})
It("rejects GetJob for a job owned by another user", func() {
Expect(svc.jobs.Set(ctx, &schema.QuantizationJob{ID: "x", UserID: "owner", CreatedAt: "2026-06-25T10:00:00Z"})).To(Succeed())
_, err := svc.GetJob("intruder", "x")
Expect(err).To(HaveOccurred(), "a different user must not read someone else's job")
})
})
Describe("store adapter conversion", func() {
// The SyncedMap value type is *schema.QuantizationJob (the exact REST shape).
// These specs prove the DB adapter round-trips it losslessly, so hydrate and
// write-through in distributed mode keep responses unchanged.
It("round-trips a job through jobToRecord/recordToJob preserving the API shape", func() {
original := &schema.QuantizationJob{
ID: "rt-1",
UserID: "user-1",
Model: "base-model",
Backend: "llama-cpp-quantization",
ModelID: "llama-cpp-quantization-quantize-rt-1",
QuantizationType: "q4_k_m",
Status: "completed",
Message: "done",
OutputDir: "/data/quantization/rt-1",
OutputFile: "/data/quantization/rt-1/model.gguf",
ExtraOptions: map[string]string{"hf_token": "secret"},
CreatedAt: "2026-06-27T10:00:00Z",
ImportStatus: "completed",
ImportMessage: "",
ImportModelName: "base-model-q4_k_m-rt-1",
Config: &schema.QuantizationJobRequest{Model: "base-model", Backend: "llama-cpp-quantization", QuantizationType: "q4_k_m"},
}
rec := jobToRecord(original)
Expect(rec.ID).To(Equal("rt-1"))
Expect(rec.ConfigJSON).ToNot(BeEmpty(), "structured config must serialize into the JSON column")
Expect(rec.ExtraOptsJSON).ToNot(BeEmpty())
back := recordToJob(rec)
Expect(back.ID).To(Equal(original.ID))
Expect(back.UserID).To(Equal(original.UserID))
Expect(back.Model).To(Equal(original.Model))
Expect(back.Backend).To(Equal(original.Backend))
Expect(back.ModelID).To(Equal(original.ModelID))
Expect(back.QuantizationType).To(Equal(original.QuantizationType))
Expect(back.Status).To(Equal(original.Status))
Expect(back.Message).To(Equal(original.Message))
Expect(back.OutputDir).To(Equal(original.OutputDir))
Expect(back.OutputFile).To(Equal(original.OutputFile))
Expect(back.ImportStatus).To(Equal(original.ImportStatus))
Expect(back.ImportModelName).To(Equal(original.ImportModelName))
Expect(back.CreatedAt).To(Equal(original.CreatedAt))
Expect(back.ExtraOptions).To(Equal(original.ExtraOptions))
Expect(back.Config).ToNot(BeNil())
Expect(back.Config.QuantizationType).To(Equal("q4_k_m"))
})
})
Describe("compile-time adapter contract", func() {
It("satisfies syncstate.Store for *distributed.QuantStore", func() {
// Guards against drift between the adapter and the component interface;
// the var assertion in syncstore.go covers it at build time, this keeps
// the type referenced from a spec too.
var _ *distributed.QuantStore
Expect(&quantStoreAdapter{}).ToNot(BeNil())
})
})
})


@@ -0,0 +1,114 @@
package quantization
import (
"context"
"encoding/json"
"time"
"github.com/mudler/LocalAI/core/schema"
"github.com/mudler/LocalAI/core/services/distributed"
"github.com/mudler/LocalAI/core/services/syncstate"
)
// quantStoreAdapter bridges the distributed PostgreSQL QuantStore to the generic
// syncstate.Store the SyncedMap consumes. It is only wired in distributed mode;
// standalone leaves Store nil and hydrates from disk via a Loader instead.
//
// The SyncedMap value type is *schema.QuantizationJob (the exact shape the REST
// API returns) so reads need no conversion and the response JSON is provably
// unchanged. The adapter is the single place that translates between that API
// shape and the DB QuantJobRecord.
type quantStoreAdapter struct {
store *distributed.QuantStore
}
// compile-time assertion that the adapter satisfies the component's Store.
var _ syncstate.Store[string, *schema.QuantizationJob] = (*quantStoreAdapter)(nil)
func (a *quantStoreAdapter) List(_ context.Context) ([]*schema.QuantizationJob, error) {
records, err := a.store.ListAll()
if err != nil {
return nil, err
}
jobs := make([]*schema.QuantizationJob, 0, len(records))
for i := range records {
jobs = append(jobs, recordToJob(&records[i]))
}
return jobs, nil
}
func (a *quantStoreAdapter) Upsert(_ context.Context, job *schema.QuantizationJob) error {
return a.store.Upsert(jobToRecord(job))
}
func (a *quantStoreAdapter) Delete(_ context.Context, id string) error {
return a.store.Delete(id)
}
// recordToJob maps a persisted DB record back to the API shape, reconstructing
// the structured Config / ExtraOptions from their JSON columns.
func recordToJob(r *distributed.QuantJobRecord) *schema.QuantizationJob {
job := &schema.QuantizationJob{
ID: r.ID,
UserID: r.UserID,
Model: r.Model,
Backend: r.Backend,
ModelID: r.ModelID,
QuantizationType: r.QuantizationType,
Status: r.Status,
Message: r.Message,
OutputDir: r.OutputDir,
OutputFile: r.OutputFile,
ImportStatus: r.ImportStatus,
ImportMessage: r.ImportMessage,
ImportModelName: r.ImportModelName,
CreatedAt: r.CreatedAt.UTC().Format(time.RFC3339),
}
if r.ExtraOptsJSON != "" {
// Best-effort: a malformed column must not drop the whole job from the API.
_ = json.Unmarshal([]byte(r.ExtraOptsJSON), &job.ExtraOptions)
}
if r.ConfigJSON != "" {
var cfg schema.QuantizationJobRequest
if err := json.Unmarshal([]byte(r.ConfigJSON), &cfg); err == nil {
job.Config = &cfg
}
}
return job
}
// jobToRecord maps the API shape to a DB record for write-through, serializing
// the structured Config / ExtraOptions into their JSON columns. CreatedAt is
// parsed back from the RFC3339 string the service stamps; an unparseable value is
// left zero so QuantStore.Upsert stamps "now".
func jobToRecord(job *schema.QuantizationJob) *distributed.QuantJobRecord {
rec := &distributed.QuantJobRecord{
ID: job.ID,
UserID: job.UserID,
Model: job.Model,
Backend: job.Backend,
ModelID: job.ModelID,
QuantizationType: job.QuantizationType,
Status: job.Status,
Message: job.Message,
OutputDir: job.OutputDir,
OutputFile: job.OutputFile,
ImportStatus: job.ImportStatus,
ImportMessage: job.ImportMessage,
ImportModelName: job.ImportModelName,
}
if job.Config != nil {
if data, err := json.Marshal(job.Config); err == nil {
rec.ConfigJSON = string(data)
}
}
if job.ExtraOptions != nil {
if data, err := json.Marshal(job.ExtraOptions); err == nil {
rec.ExtraOptsJSON = string(data)
}
}
if t, err := time.Parse(time.RFC3339, job.CreatedAt); err == nil {
rec.CreatedAt = t
}
return rec
}
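The adapter's round trip (structured fields into JSON columns, an RFC3339 string into a `time.Time` and back) can be sketched in isolation. The `apiJob` and `dbRecord` types below are hypothetical, trimmed-down stand-ins, not the real `schema.QuantizationJob` or `distributed.QuantJobRecord`:

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// apiJob is a hypothetical API shape: the timestamp is an RFC3339 string.
type apiJob struct {
	ID        string
	Options   map[string]string
	CreatedAt string
}

// dbRecord is a hypothetical DB row: options live in a JSON text column.
type dbRecord struct {
	ID          string
	OptionsJSON string
	CreatedAt   time.Time
}

func toRecord(j apiJob) dbRecord {
	rec := dbRecord{ID: j.ID}
	if j.Options != nil {
		if data, err := json.Marshal(j.Options); err == nil {
			rec.OptionsJSON = string(data)
		}
	}
	// An unparseable timestamp leaves CreatedAt at its zero value.
	if t, err := time.Parse(time.RFC3339, j.CreatedAt); err == nil {
		rec.CreatedAt = t
	}
	return rec
}

func toJob(r dbRecord) apiJob {
	j := apiJob{ID: r.ID, CreatedAt: r.CreatedAt.UTC().Format(time.RFC3339)}
	if r.OptionsJSON != "" {
		// Best-effort: a malformed column leaves Options nil.
		_ = json.Unmarshal([]byte(r.OptionsJSON), &j.Options)
	}
	return j
}

func main() {
	in := apiJob{ID: "rt-1", Options: map[string]string{"k": "v"}, CreatedAt: "2026-06-27T10:00:00Z"}
	out := toJob(toRecord(in))
	fmt.Println(out.ID, out.Options["k"], out.CreatedAt)
}
```

In this sketch the timestamp survives unchanged only when it is already UTC with whole seconds; an input carrying an offset such as `+02:00` comes back normalized to `Z`, and fractional seconds are dropped by the `time.RFC3339` layout on formatting.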


@@ -0,0 +1,289 @@
// Package syncstate provides SyncedMap, a reusable cross-replica in-memory map.
//
// LocalAI in distributed mode runs multiple frontend replicas behind a
// round-robin load balancer. Several features keep process-local in-memory state
// that is surfaced to the HTTP/UI API; without cross-replica sync a poll that
// lands on a replica which did not originate a change sees stale or missing data.
// SyncedMap collapses the three legs each feature otherwise hand-wires - an
// in-memory map, a NATS broadcast/apply path, and optional durable read-through -
// into one well-tested component so cross-replica consistency is a configuration
// choice rather than a bespoke re-implementation.
package syncstate
import (
"context"
"sync"
"time"
"github.com/mudler/LocalAI/core/services/messaging"
"github.com/mudler/xlog"
)
// Op values carried on the wire and passed to OnApply.
const (
opSet = "set"
opDelete = "delete"
)
// Store is optional durable backing for a SyncedMap. In distributed mode it is a
// single shared DB, so the apply path (a delta received from a peer) updates
// memory only and never re-writes the Store.
type Store[K comparable, V any] interface {
List(ctx context.Context) ([]V, error)
Upsert(ctx context.Context, v V) error
Delete(ctx context.Context, k K) error
}
// Config configures a SyncedMap.
type Config[K comparable, V any] struct {
Name string // subject namespace, e.g. "finetune.jobs"
Key func(V) K // extract the key from a value
Nats messaging.MessagingClient // nil => standalone: in-memory only, no broadcast/subscribe
Store Store[K, V] // optional durable backing: written through on Set/Delete, read on hydrate
Loader func(ctx context.Context) ([]V, error) // source when there is no Store (e.g. disk reload)
OnApply func(op string, k K, v V) // optional hook after an applied change (e.g. ShutdownModel)
Reconcile time.Duration // optional periodic re-hydrate; 0 = off
}
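// delta is the JSON wire envelope broadcast on every local mutation. Value is
// omitempty, so a delete carries only op+key when V is a pointer, map, or slice
// type (nil zero value); encoding/json never omits a struct-typed zero value.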
type delta[K comparable, V any] struct {
Op string `json:"op"`
Key K `json:"key"`
Value V `json:"value,omitempty"`
}
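How `omitempty` behaves on a generic field depends on the instantiated type. The sketch below uses a hypothetical `envelope` type with the same tags as the delta envelope to compare a pointer-typed and a struct-typed value on a delete; the `job` type is an assumption for illustration:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// envelope mirrors the tag layout of the wire envelope (hypothetical copy).
type envelope[K comparable, V any] struct {
	Op    string `json:"op"`
	Key   K      `json:"key"`
	Value V      `json:"value,omitempty"`
}

// job is a hypothetical payload type.
type job struct {
	ID string `json:"id"`
}

// encode marshals an envelope and returns the JSON text.
func encode[K comparable, V any](e envelope[K, V]) string {
	b, err := json.Marshal(e)
	if err != nil {
		return fmt.Sprintf("error: %v", err)
	}
	return string(b)
}

func main() {
	// Pointer-typed V: the nil zero value is omitted.
	fmt.Println(encode(envelope[string, *job]{Op: "delete", Key: "1"}))
	// Struct-typed V: the zero struct is still encoded.
	fmt.Println(encode(envelope[string, job]{Op: "delete", Key: "1"}))
}
```

With a pointer value type, as the tests in this change use, the delete payload stays minimal; a struct value type would still work but would carry an empty object on every delete.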
// SyncedMap is a cross-replica in-memory map. A local write (Set/Delete) updates
// memory, the optional durable Store, then broadcasts a delta to peers. A peer's
// delta updates memory only and fires OnApply - it never re-broadcasts and never
// writes the Store. That structural split is the echo-loop guard (same pattern as
// galleryop.mergeStatus / OpCache.applyStart): receiving your own broadcast just
// re-applies an idempotent value to memory, so there is no storm and no
// double-write. OnApply also fires for that echo, so hooks should be idempotent.
type SyncedMap[K comparable, V any] struct {
cfg Config[K, V]
mu sync.RWMutex
data map[K]V
sub Subscription
// lifeCtx outlives Start's argument: a reconnect callback or reconcile tick
// can fire long after Start returns, so they must not be tied to a ctx the
// caller may cancel. Close cancels it.
lifeCtx context.Context
cancel context.CancelFunc
wg sync.WaitGroup
}
// Subscription aliases messaging.Subscription, the handle the component keeps so
// Close can unsubscribe.
type Subscription = messaging.Subscription
// New constructs a SyncedMap. Call Start to hydrate and begin syncing.
func New[K comparable, V any](cfg Config[K, V]) *SyncedMap[K, V] {
return &SyncedMap[K, V]{cfg: cfg, data: make(map[K]V)}
}
func (m *SyncedMap[K, V]) subject() string {
return messaging.SubjectSyncStateDelta(m.cfg.Name)
}
// Start hydrates from the source, subscribes for peer deltas, registers a
// reconnect re-hydrate (when the client supports it), and starts the optional
// reconcile ticker.
func (m *SyncedMap[K, V]) Start(ctx context.Context) error {
if err := m.hydrate(ctx); err != nil {
return err
}
// The cancel func is stored on the struct and invoked in Close (covered by
// tests); lifeCtx must outlive Start to drive the reconnect/reconcile
// goroutines, so it cannot be cancelled or deferred within this scope.
m.lifeCtx, m.cancel = context.WithCancel(context.Background()) // #nosec G118 -- cancel is invoked in Close()
if m.cfg.Nats != nil {
sub, err := messaging.SubscribeJSON(m.cfg.Nats, m.subject(), m.apply)
if err != nil {
return err
}
m.sub = sub
// nats.go transparently resubscribes on reconnect, but it cannot know we
// kept derived in-memory state that may have drifted while the link was
// down, so re-hydrate from the durable source. Detected via an optional
// interface so MessagingClient itself stays minimal; standalone/test
// clients without the method simply fall back to the reconcile ticker.
if r, ok := m.cfg.Nats.(interface{ OnReconnect(func()) }); ok {
r.OnReconnect(func() {
if err := m.hydrate(m.lifeCtx); err != nil {
xlog.Warn("syncstate: reconnect re-hydrate failed", "name", m.cfg.Name, "error", err)
}
})
}
}
if m.cfg.Reconcile > 0 {
m.wg.Add(1)
go m.reconcileLoop()
}
return nil
}
// Close unsubscribes and stops the reconcile ticker.
func (m *SyncedMap[K, V]) Close() error {
if m.cancel != nil {
m.cancel()
}
m.wg.Wait()
if m.sub != nil {
return m.sub.Unsubscribe()
}
return nil
}
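// Set updates the value locally, writes through the Store, then broadcasts.
// The Store write happens under the lock so no other local writer can interleave
// between the memory update and the durable write. Memory is updated first, so
// if Upsert fails Set returns the error with the in-memory value already changed
// and no delta broadcast. The broadcast is best-effort after unlocking.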
func (m *SyncedMap[K, V]) Set(ctx context.Context, v V) error {
k := m.cfg.Key(v)
m.mu.Lock()
m.data[k] = v
if m.cfg.Store != nil {
if err := m.cfg.Store.Upsert(ctx, v); err != nil {
m.mu.Unlock()
return err
}
}
m.mu.Unlock()
m.publish(opSet, k, v)
return nil
}
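// Delete removes the key locally, deletes it from the Store, then broadcasts.
// As with Set, a Store error is returned after the key has already been removed
// from memory, and no delta is broadcast.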
func (m *SyncedMap[K, V]) Delete(ctx context.Context, k K) error {
m.mu.Lock()
delete(m.data, k)
if m.cfg.Store != nil {
if err := m.cfg.Store.Delete(ctx, k); err != nil {
m.mu.Unlock()
return err
}
}
m.mu.Unlock()
var zero V
m.publish(opDelete, k, zero)
return nil
}
// Get returns the value for k and whether it was present.
func (m *SyncedMap[K, V]) Get(k K) (V, bool) {
m.mu.RLock()
defer m.mu.RUnlock()
v, ok := m.data[k]
return v, ok
}
// List returns a snapshot slice of all values.
func (m *SyncedMap[K, V]) List() []V {
m.mu.RLock()
defer m.mu.RUnlock()
out := make([]V, 0, len(m.data))
for _, v := range m.data {
out = append(out, v)
}
return out
}
// Snapshot returns a shallow copy of the underlying map; pointer values are
// shared with the live map.
func (m *SyncedMap[K, V]) Snapshot() map[K]V {
m.mu.RLock()
defer m.mu.RUnlock()
out := make(map[K]V, len(m.data))
for k, v := range m.data {
out[k] = v
}
return out
}
// publish broadcasts a delta. Standalone (nil Nats) is a strict no-op.
func (m *SyncedMap[K, V]) publish(op string, k K, v V) {
if m.cfg.Nats == nil {
return
}
if err := m.cfg.Nats.Publish(m.subject(), delta[K, V]{Op: op, Key: k, Value: v}); err != nil {
xlog.Warn("syncstate: failed to broadcast delta", "name", m.cfg.Name, "op", op, "error", err)
}
}
// apply handles a peer's delta: memory-only update plus OnApply. It deliberately
// never writes the Store nor re-publishes - that is the echo-loop guard.
func (m *SyncedMap[K, V]) apply(d delta[K, V]) {
switch d.Op {
case opSet:
m.mu.Lock()
m.data[d.Key] = d.Value
m.mu.Unlock()
case opDelete:
m.mu.Lock()
delete(m.data, d.Key)
m.mu.Unlock()
default:
xlog.Warn("syncstate: ignoring delta with unknown op", "name", m.cfg.Name, "op", d.Op)
return
}
if m.cfg.OnApply != nil {
m.cfg.OnApply(d.Op, d.Key, d.Value)
}
}
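The split between the local-write path and the apply path can be reduced to a few lines. This is a simplified sketch with a hypothetical synchronous `bus` and string values, not the SyncedMap implementation; it only illustrates that one local write yields one broadcast and one durable write:

```go
package main

import "fmt"

// bus is a hypothetical synchronous in-process broadcast channel.
type bus struct {
	published int
	handlers  []func(key, value string)
}

func (b *bus) publish(key, value string) {
	b.published++
	for _, h := range b.handlers {
		h(key, value)
	}
}

// replica keeps an in-memory map; storeWrites counts simulated durable writes.
type replica struct {
	data        map[string]string
	storeWrites int
	bus         *bus
}

func newReplica(b *bus) *replica {
	r := &replica{data: map[string]string{}, bus: b}
	b.handlers = append(b.handlers, r.apply)
	return r
}

// set is the local-write path: memory, durable store, then broadcast.
func (r *replica) set(key, value string) {
	r.data[key] = value
	r.storeWrites++
	r.bus.publish(key, value)
}

// apply is the receive path: memory only, no store write, no re-publish.
func (r *replica) apply(key, value string) {
	r.data[key] = value
}

func main() {
	b := &bus{}
	a, peer := newReplica(b), newReplica(b)
	a.set("job-1", "running")
	// One broadcast, one durable write on the origin, none on the peer.
	fmt.Println(b.published, a.storeWrites, peer.storeWrites, peer.data["job-1"])
}
```

Because `apply` never calls `publish`, the origin receiving its own message cannot trigger a second broadcast, which is the property the echo-loop spec asserts through the publish count.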
// hydrate replaces the whole map from the durable source: Store if present, else
// Loader. With neither, a late joiner starts empty and catches up via deltas
// (acceptable only for ephemeral state).
func (m *SyncedMap[K, V]) hydrate(ctx context.Context) error {
var (
vals []V
err error
)
switch {
case m.cfg.Store != nil:
vals, err = m.cfg.Store.List(ctx)
case m.cfg.Loader != nil:
vals, err = m.cfg.Loader(ctx)
default:
return nil
}
if err != nil {
return err
}
m.replaceAll(vals)
return nil
}
// replaceAll atomically swaps the map contents for the given values, keyed via
// cfg.Key.
func (m *SyncedMap[K, V]) replaceAll(vals []V) {
next := make(map[K]V, len(vals))
for _, v := range vals {
next[m.cfg.Key(v)] = v
}
m.mu.Lock()
m.data = next
m.mu.Unlock()
}
// reconcileLoop periodically re-hydrates to repair silent drift (missed deltas).
func (m *SyncedMap[K, V]) reconcileLoop() {
defer m.wg.Done()
t := time.NewTicker(m.cfg.Reconcile)
defer t.Stop()
for {
select {
case <-m.lifeCtx.Done():
return
case <-t.C:
if err := m.hydrate(m.lifeCtx); err != nil {
xlog.Warn("syncstate: reconcile re-hydrate failed", "name", m.cfg.Name, "error", err)
}
}
}
}


@@ -0,0 +1,13 @@
package syncstate_test
import (
"testing"
. "github.com/onsi/ginkgo/v2"
. "github.com/onsi/gomega"
)
func TestSyncstate(t *testing.T) {
RegisterFailHandler(Fail)
RunSpecs(t, "Syncstate Suite")
}


@@ -0,0 +1,291 @@
package syncstate_test
import (
"context"
"sync"
. "github.com/onsi/ginkgo/v2"
. "github.com/onsi/gomega"
"github.com/mudler/LocalAI/core/services/messaging"
"github.com/mudler/LocalAI/core/services/syncstate"
"github.com/mudler/LocalAI/core/services/testutil"
)
// job is a minimal JSON-serializable value stand-in for the real cross-replica
// records (finetune/quant/agent jobs) the component is built for.
type job struct {
ID string `json:"id"`
Status string `json:"status"`
}
func jobKey(j *job) string { return j.ID }
const stateName = "test.jobs"
func deltaSubject() string { return messaging.SubjectSyncStateDelta(stateName) }
// fakeStore is an in-memory Store that records call counts so specs can assert
// the write-through-vs-apply split (local writes hit the Store; applied deltas
// must not).
type fakeStore struct {
mu sync.Mutex
data map[string]*job
upsertCalls int
deleteCalls int
listCalls int
}
func newFakeStore(seed ...*job) *fakeStore {
s := &fakeStore{data: map[string]*job{}}
for _, j := range seed {
s.data[j.ID] = j
}
return s
}
func (s *fakeStore) List(_ context.Context) ([]*job, error) {
s.mu.Lock()
defer s.mu.Unlock()
s.listCalls++
out := make([]*job, 0, len(s.data))
for _, j := range s.data {
out = append(out, j)
}
return out, nil
}
func (s *fakeStore) Upsert(_ context.Context, j *job) error {
s.mu.Lock()
defer s.mu.Unlock()
s.upsertCalls++
s.data[j.ID] = j
return nil
}
func (s *fakeStore) Delete(_ context.Context, k string) error {
s.mu.Lock()
defer s.mu.Unlock()
s.deleteCalls++
delete(s.data, k)
return nil
}
// add simulates a peer replica writing to the shared DB out-of-band (e.g. while
// this replica was partitioned), so a re-hydrate can be observed to pick it up.
func (s *fakeStore) add(j *job) {
s.mu.Lock()
defer s.mu.Unlock()
s.data[j.ID] = j
}
func (s *fakeStore) counts() (upsert, del, list int) {
s.mu.Lock()
defer s.mu.Unlock()
return s.upsertCalls, s.deleteCalls, s.listCalls
}
var _ = Describe("SyncedMap", func() {
ctx := context.Background()
Describe("cross-replica delta propagation", func() {
var (
bus *testutil.FakeBus
a, b *syncstate.SyncedMap[string, *job]
)
BeforeEach(func() {
bus = testutil.NewFakeBus()
a = syncstate.New(syncstate.Config[string, *job]{Name: stateName, Key: jobKey, Nats: bus})
b = syncstate.New(syncstate.Config[string, *job]{Name: stateName, Key: jobKey, Nats: bus})
Expect(a.Start(ctx)).To(Succeed())
Expect(b.Start(ctx)).To(Succeed())
})
AfterEach(func() {
Expect(a.Close()).To(Succeed())
Expect(b.Close()).To(Succeed())
})
It("propagates a Set on A to B", func() {
Expect(a.Set(ctx, &job{ID: "1", Status: "running"})).To(Succeed())
got, ok := b.Get("1")
Expect(ok).To(BeTrue(), "replica B should see the value A just set")
Expect(got.Status).To(Equal("running"))
})
It("prunes a Delete on A from B", func() {
Expect(a.Set(ctx, &job{ID: "1", Status: "running"})).To(Succeed())
_, present := b.Get("1")
Expect(present).To(BeTrue(), "precondition: B must have the value before the delete")
Expect(a.Delete(ctx, "1")).To(Succeed())
_, ok := b.Get("1")
Expect(ok).To(BeFalse(), "a delete on A must remove the key from B")
})
})
Describe("hydration", func() {
It("hydrates on Start from a preloaded Store", func() {
store := newFakeStore(&job{ID: "x", Status: "done"})
m := syncstate.New(syncstate.Config[string, *job]{Name: stateName, Key: jobKey, Store: store})
Expect(m.Start(ctx)).To(Succeed())
got, ok := m.Get("x")
Expect(ok).To(BeTrue(), "Start must populate the map from the Store")
Expect(got.Status).To(Equal("done"))
})
It("uses the Loader when Store is nil", func() {
m := syncstate.New(syncstate.Config[string, *job]{
Name: stateName,
Key: jobKey,
Loader: func(_ context.Context) ([]*job, error) {
return []*job{{ID: "l", Status: "loaded"}}, nil
},
})
Expect(m.Start(ctx)).To(Succeed())
got, ok := m.Get("l")
Expect(ok).To(BeTrue(), "Loader output must hydrate the map when there is no Store")
Expect(got.Status).To(Equal("loaded"))
})
})
Describe("echo-loop guard", func() {
It("applies its own broadcast once and does not re-publish", func() {
bus := testutil.NewFakeBus()
a := syncstate.New(syncstate.Config[string, *job]{Name: stateName, Key: jobKey, Nats: bus})
b := syncstate.New(syncstate.Config[string, *job]{Name: stateName, Key: jobKey, Nats: bus})
Expect(a.Start(ctx)).To(Succeed())
Expect(b.Start(ctx)).To(Succeed())
defer func() {
Expect(a.Close()).To(Succeed())
Expect(b.Close()).To(Succeed())
}()
Expect(a.Set(ctx, &job{ID: "e", Status: "running"})).To(Succeed())
// One local write must produce exactly one broadcast: A and B both
// receive it and apply to memory, but the apply path never re-publishes.
Expect(bus.PublishCount(deltaSubject())).To(Equal(1),
"the apply path must not re-broadcast, otherwise replicas storm")
Expect(a.List()).To(HaveLen(1), "A must not double-store its own echo")
_, ok := b.Get("e")
Expect(ok).To(BeTrue())
})
})
Describe("Store write-through vs apply", func() {
It("writes the Store on local Set/Delete but not on an applied delta", func() {
bus := testutil.NewFakeBus()
storeA := newFakeStore()
storeB := newFakeStore()
a := syncstate.New(syncstate.Config[string, *job]{Name: stateName, Key: jobKey, Nats: bus, Store: storeA})
b := syncstate.New(syncstate.Config[string, *job]{Name: stateName, Key: jobKey, Nats: bus, Store: storeB})
Expect(a.Start(ctx)).To(Succeed())
Expect(b.Start(ctx)).To(Succeed())
defer func() {
Expect(a.Close()).To(Succeed())
Expect(b.Close()).To(Succeed())
}()
Expect(a.Set(ctx, &job{ID: "w", Status: "running"})).To(Succeed())
upA, _, _ := storeA.counts()
upB, _, _ := storeB.counts()
Expect(upA).To(Equal(1), "local Set must write through to its own Store")
Expect(upB).To(Equal(0), "the apply path must never write the peer's Store")
Expect(a.Delete(ctx, "w")).To(Succeed())
_, delA, _ := storeA.counts()
_, delB, _ := storeB.counts()
Expect(delA).To(Equal(1), "local Delete must delete from its own Store")
Expect(delB).To(Equal(0), "the apply path must never delete from the peer's Store")
})
})
Describe("OnApply hook", func() {
It("fires with the correct op and key on an applied delta", func() {
bus := testutil.NewFakeBus()
var (
mu sync.Mutex
ops []string
keys []string
)
a := syncstate.New(syncstate.Config[string, *job]{Name: stateName, Key: jobKey, Nats: bus})
b := syncstate.New(syncstate.Config[string, *job]{
Name: stateName, Key: jobKey, Nats: bus,
OnApply: func(op string, k string, _ *job) {
mu.Lock()
ops = append(ops, op)
keys = append(keys, k)
mu.Unlock()
},
})
Expect(a.Start(ctx)).To(Succeed())
Expect(b.Start(ctx)).To(Succeed())
defer func() {
Expect(a.Close()).To(Succeed())
Expect(b.Close()).To(Succeed())
}()
Expect(a.Set(ctx, &job{ID: "o", Status: "running"})).To(Succeed())
Expect(a.Delete(ctx, "o")).To(Succeed())
mu.Lock()
defer mu.Unlock()
Expect(ops).To(Equal([]string{"set", "delete"}))
Expect(keys).To(Equal([]string{"o", "o"}))
})
})
Describe("standalone (nil Nats)", func() {
It("works in-memory with no panic and nothing to broadcast", func() {
m := syncstate.New(syncstate.Config[string, *job]{Name: stateName, Key: jobKey})
Expect(m.Start(ctx)).To(Succeed())
defer func() { Expect(m.Close()).To(Succeed()) }()
Expect(func() {
Expect(m.Set(ctx, &job{ID: "s", Status: "running"})).To(Succeed())
}).ToNot(Panic())
got, ok := m.Get("s")
Expect(ok).To(BeTrue())
Expect(got.Status).To(Equal("running"))
Expect(m.List()).To(HaveLen(1))
Expect(m.Snapshot()).To(HaveKey("s"))
Expect(m.Delete(ctx, "s")).To(Succeed())
_, ok = m.Get("s")
Expect(ok).To(BeFalse())
})
})
Describe("reconnect re-hydrate", func() {
It("re-reads the source when the messaging client reconnects", func() {
bus := testutil.NewFakeBus()
store := newFakeStore(&job{ID: "init", Status: "running"})
m := syncstate.New(syncstate.Config[string, *job]{Name: stateName, Key: jobKey, Nats: bus, Store: store})
Expect(m.Start(ctx)).To(Succeed())
defer func() { Expect(m.Close()).To(Succeed()) }()
_, ok := m.Get("init")
Expect(ok).To(BeTrue())
// A peer writes to the shared DB while we are unaware (no delta seen).
store.add(&job{ID: "late", Status: "running"})
_, ok = m.Get("late")
Expect(ok).To(BeFalse(), "the new row should not appear before a re-hydrate")
bus.TriggerReconnect()
_, ok = m.Get("late")
Expect(ok).To(BeTrue(), "reconnect must re-hydrate from the source and pick up drift")
_, _, list := store.counts()
Expect(list).To(Equal(2), "exactly one Start hydrate plus one reconnect re-hydrate")
})
})
})


@@ -0,0 +1,160 @@
package testutil
import (
"encoding/json"
"strings"
"sync"
"time"
"github.com/mudler/LocalAI/core/services/messaging"
)
// FakeBus is an in-memory messaging.MessagingClient that delivers each published
// message synchronously to every registered subscriber whose subject filter
// matches, including NATS-style wildcard subjects (`*` matches exactly one
// token).
//
// Synchronous delivery keeps specs deterministic: the moment Publish returns,
// every matching subscriber's handler has already run, so the spec body can read
// the resulting state without polling. It is the shared test double for every
// cross-replica-sync adopter (gallery, syncstate, ...) so they exercise the same
// delivery semantics. It deliberately depends only on the standard library and
// the messaging package — no test framework — so it is importable anywhere.
type FakeBus struct {
mu sync.Mutex
subs []fakeBusSub
// publishCounts records how many messages were published per subject, so a
// spec can assert the echo-loop guard (an applied delta must not re-publish).
publishCounts map[string]int
// reconnectCbs back the optional OnReconnect/TriggerReconnect pair, letting a
// spec exercise the component's reconnect re-hydrate path without a real
// NATS server.
reconnectCbs []func()
}
type fakeBusSub struct {
subject string
handler func([]byte)
}
// NewFakeBus returns a ready-to-use in-memory bus.
func NewFakeBus() *FakeBus {
return &FakeBus{publishCounts: map[string]int{}}
}
// subjectMatches reports whether a subscription filter matches a concrete
// subject, honoring the single-token `*` wildcard used by NATS.
func subjectMatches(filter, subject string) bool {
if filter == subject {
return true
}
fp := strings.Split(filter, ".")
sp := strings.Split(subject, ".")
if len(fp) != len(sp) {
return false
}
for i := range fp {
if fp[i] == "*" {
continue
}
if fp[i] != sp[i] {
return false
}
}
return true
}
// Publish marshals data as JSON and delivers it synchronously to every matching
// subscriber.
func (b *FakeBus) Publish(subject string, data any) error {
payload, err := json.Marshal(data)
if err != nil {
return err
}
b.mu.Lock()
b.publishCounts[subject]++
subs := append([]fakeBusSub(nil), b.subs...)
b.mu.Unlock()
for _, s := range subs {
if subjectMatches(s.subject, subject) {
s.handler(payload)
}
}
return nil
}
// PublishCount returns how many messages were published on the exact subject.
func (b *FakeBus) PublishCount(subject string) int {
b.mu.Lock()
defer b.mu.Unlock()
return b.publishCounts[subject]
}
type fakeBusSubscription struct {
bus *FakeBus
subRef fakeBusSub
}
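// Unsubscribe removes the first registered subscription whose subject equals
// this one's. Matching is by subject only, so when several handlers share a
// subject the one removed may not be the one this handle was created for.
func (s *fakeBusSubscription) Unsubscribe() error {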
s.bus.mu.Lock()
defer s.bus.mu.Unlock()
for i, candidate := range s.bus.subs {
if candidate.subject == s.subRef.subject {
s.bus.subs = append(s.bus.subs[:i], s.bus.subs[i+1:]...)
return nil
}
}
return nil
}
// Subscribe registers handler for every published subject that matches the
// given filter.
func (b *FakeBus) Subscribe(subject string, handler func([]byte)) (messaging.Subscription, error) {
sub := fakeBusSub{subject: subject, handler: handler}
b.mu.Lock()
b.subs = append(b.subs, sub)
b.mu.Unlock()
return &fakeBusSubscription{bus: b, subRef: sub}, nil
}
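// QueueSubscribe ignores the queue group and behaves like Subscribe: every
// queue subscriber receives every matching message, whereas NATS delivers each
// message to only one member of a group.
func (b *FakeBus) QueueSubscribe(subject, _ string, handler func([]byte)) (messaging.Subscription, error) {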
return b.Subscribe(subject, handler)
}
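// The request/reply methods below are inert stubs: their handlers are never
// invoked and Request returns (nil, nil).
func (b *FakeBus) QueueSubscribeReply(string, string, func([]byte, func([]byte))) (messaging.Subscription, error) {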
return &fakeBusSubscription{bus: b}, nil
}
func (b *FakeBus) SubscribeReply(string, func([]byte, func([]byte))) (messaging.Subscription, error) {
return &fakeBusSubscription{bus: b}, nil
}
func (b *FakeBus) Request(string, []byte, time.Duration) ([]byte, error) {
return nil, nil
}
func (b *FakeBus) IsConnected() bool { return true }
func (b *FakeBus) Close() {}
// OnReconnect mirrors *messaging.Client.OnReconnect so a spec can drive the
// component's reconnect re-hydrate path. The component detects this method via an
// optional interface assertion; implementing it here keeps the fake a faithful
// stand-in for the concrete client.
func (b *FakeBus) OnReconnect(cb func()) {
if cb == nil {
return
}
b.mu.Lock()
b.reconnectCbs = append(b.reconnectCbs, cb)
b.mu.Unlock()
}
// TriggerReconnect runs every registered reconnect callback, simulating a NATS
// reconnect event.
func (b *FakeBus) TriggerReconnect() {
b.mu.Lock()
cbs := append([]func(){}, b.reconnectCbs...)
b.mu.Unlock()
for _, cb := range cbs {
cb()
}
}
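
The wildcard rule that FakeBus applies can be checked in isolation. The sketch below is a standalone program, not part of the testutil package: `matches` is a local copy of the `subjectMatches` logic, and the subjects are illustrative. It also shows that the multi-token `>` wildcard is not handled by this matcher.

```go
package main

import (
	"fmt"
	"strings"
)

// matches is a local copy of the subjectMatches logic: filter and subject
// must have the same number of dot-separated tokens, and each filter token
// must be either "*" or equal to the corresponding subject token.
func matches(filter, subject string) bool {
	if filter == subject {
		return true
	}
	fp := strings.Split(filter, ".")
	sp := strings.Split(subject, ".")
	if len(fp) != len(sp) {
		return false
	}
	for i := range fp {
		if fp[i] != "*" && fp[i] != sp[i] {
			return false
		}
	}
	return true
}

func main() {
	cases := []struct{ filter, subject string }{
		{"agent.*.cancel", "agent.42.cancel"}, // true: "*" stands in for one token
		{"agent.*", "agent.42.cancel"},        // false: token counts differ
		{"jobs.>", "jobs.1.progress"},         // false: ">" is not treated as a wildcard
	}
	for _, c := range cases {
		fmt.Printf("%-16s %-18s %v\n", c.filter, c.subject, matches(c.filter, c.subject))
	}
}
```

A spec that relies on `>` subscriptions would therefore need a real NATS server or an extended matcher.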


@@ -57,6 +57,11 @@ services:
LOCALAI_AGENT_POOL_VECTOR_ENGINE: "postgres"
LOCALAI_AGENT_POOL_DATABASE_URL: "postgresql://localai:localai@postgres:5432/localai?sslmode=disable"
LOCALAI_REGISTRATION_TOKEN: "changeme" # Change this in production!
# Shared-models mode (optional): set when every node mounts the SAME
# models directory at the SAME path (see "Shared Volume Mode" below).
# The router then skips gRPC file staging and workers load models
# directly from the shared volume instead of re-downloading them.
# LOCALAI_DISTRIBUTED_SHARED_MODELS: "true"
# Auth (required for distributed mode — must use PostgreSQL)
LOCALAI_AUTH: "true"
LOCALAI_AUTH_DATABASE_URL: "postgresql://localai:localai@postgres:5432/localai?sslmode=disable"
@@ -157,8 +162,11 @@ services:
# Then add to the volumes section:
# shared_models:
#
-# With shared volumes, model files are already available on the backend —
-# gRPC file staging becomes a no-op (paths match).
+# With shared volumes the model files are already present on every worker at
+# the same path. Set LOCALAI_DISTRIBUTED_SHARED_MODELS=true on the frontend
+# (see its environment above) so the router skips gRPC file staging and the
+# worker loads the model directly from the shared path instead of
+# re-downloading it into a per-model subdirectory.
# --- Adding More Workers ---
# Copy the worker-1 service above and change:


@@ -67,6 +67,7 @@ The frontend is a standard LocalAI instance with distributed mode enabled. These
| `--registration-require-auth` | `LOCALAI_REGISTRATION_REQUIRE_AUTH` | `false` | Fail startup when distributed mode is enabled but the registration token is empty (node endpoints and worker file-transfer would otherwise be unauthenticated) |
| `--distributed-require-auth` | `LOCALAI_DISTRIBUTED_REQUIRE_AUTH` | `false` | **Umbrella switch.** Implies both `--nats-require-auth` and `--registration-require-auth` — one knob to lock down the NATS bus *and* the registration/file-transfer layer. Set this in production instead of the two granular flags. |
| `--auto-approve-nodes` | `LOCALAI_AUTO_APPROVE_NODES` | `false` | Auto-approve new worker nodes (skip admin approval) |
| `--distributed-shared-models` | `LOCALAI_DISTRIBUTED_SHARED_MODELS` | `false` | Assert that every node mounts the **same** models directory at the **same** path (a shared volume). When `true`, the router skips file staging entirely and workers load models directly from the shared path instead of re-downloading them. See [Shared models directory](#shared-models-directory). |
| `--auth` | `LOCALAI_AUTH` | `false` | **Must be `true`** for distributed mode |
| `--auth-database-url` | `LOCALAI_AUTH_DATABASE_URL` | *(required)* | PostgreSQL connection URL |
| `--backend-install-timeout` | `LOCALAI_NATS_BACKEND_INSTALL_TIMEOUT` | `15m` | How long the frontend waits for a worker to acknowledge a backend install before considering the request stalled. Raise it when workers pull large backend images over slow links. If a worker takes longer than this, the operation shows as "still installing in background" in the admin UI and clears once the worker finishes. |
@@ -133,6 +134,14 @@ When S3 is not configured, model files are transferred directly from the fronten
For high-throughput or very large model files, S3 can be more efficient since it avoids streaming through the frontend.
### Shared models directory
If every node (frontend and workers) mounts the **same** models directory at the **same** path - for example a shared volume or network filesystem, as shown in the "Shared Volume Mode" section of `docker-compose.distributed.yaml` - the model files are already present on each worker at their canonical path. In that case staging is wasted work: it copies files that already exist into a per-model subdirectory the worker then loads from, which shows up as a re-download of a model you already have.
Set `LOCALAI_DISTRIBUTED_SHARED_MODELS=true` (or `--distributed-shared-models`) on the frontend to skip staging entirely. The router then leaves the model's absolute paths untouched and the worker loads them directly from the shared volume.
This flag is a contract you assert: all nodes must mount identical paths. Leave it off (the default) when workers have independent models directories - the frontend stages files to them over HTTP (or S3) as described above.
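
The contract can be pictured as a single branch in path resolution. The following is a minimal sketch under stated assumptions, not LocalAI's router code: `resolveModelPath`, its parameters, and the `/staging/<model>/<file>` layout are hypothetical and only illustrate the difference between the two modes.

```go
package main

import (
	"fmt"
	"path"
)

// resolveModelPath is a hypothetical helper (not a LocalAI function).
// With sharedModels set, the model's absolute path is returned untouched
// because every node is asserted to see the same file at the same path.
// Otherwise the path is rewritten to an assumed per-model staging location
// that the file would have to be copied into first.
func resolveModelPath(sharedModels bool, modelPath, stagingRoot, modelID string) string {
	if sharedModels {
		return modelPath
	}
	return path.Join(stagingRoot, modelID, path.Base(modelPath))
}

func main() {
	fmt.Println(resolveModelPath(true, "/models/qwen.gguf", "/staging", "qwen"))
	// prints "/models/qwen.gguf"
	fmt.Println(resolveModelPath(false, "/models/qwen.gguf", "/staging", "qwen"))
	// prints "/staging/qwen/qwen.gguf"
}
```

If the flag is set while a worker does not actually mount the shared directory, the untouched path points at a file that does not exist on that worker, which is why the flag should stay off for independent models directories.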
{{% notice warning %}}
The worker HTTP file transfer server is authenticated by `LOCALAI_REGISTRATION_TOKEN`. If the token is **empty**, the server **fails open** — anyone who can reach the port gets read/write access to the worker's models/staging/data directories (a remote model-poisoning / exfiltration vector). The worker logs a loud warning at startup in this case. Always set `LOCALAI_REGISTRATION_TOKEN` in distributed mode, and set `LOCALAI_DISTRIBUTED_REQUIRE_AUTH=true` (frontend **and** workers) to make a missing token *or* missing NATS credentials a hard startup error rather than a silent fail-open. Firewall the file-transfer port (gRPC base 1) so only the frontend can reach it.
{{% /notice %}}


@@ -1,3 +1,3 @@
{
-"version": "v4.5.2"
+"version": "v4.5.5"
}


@@ -579,6 +579,10 @@
icon: https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3.6/Figures/qwen3.6_35b_a3b_score.png
overrides:
backend: llama-cpp
# NVFP4 GGUFs use a quant type the GGUF metadata parser cannot read, so
# context size cannot be auto-derived; set it explicitly (the model is trained
# for a 262144-token context; 32768 is a safe default operators can raise).
context_size: 32768
function:
automatic_tool_parsing_fallback: true
grammar:
@@ -611,6 +615,9 @@
- gguf
overrides:
backend: llama-cpp
# NVFP4 GGUFs use a quant type the GGUF metadata parser cannot read, so
# context size cannot be auto-derived; set it explicitly.
context_size: 32768
function:
automatic_tool_parsing_fallback: true
grammar:
@@ -638,6 +645,9 @@
icon: https://cdn-uploads.huggingface.co/production/uploads/66309bd090589b7c65950665/sGQKmrMc6L6guMoaB5_Y2.png
overrides:
backend: llama-cpp
# NVFP4 GGUFs use a quant type the GGUF metadata parser cannot read, so
# context size cannot be auto-derived; set it explicitly.
context_size: 32768
function:
automatic_tool_parsing_fallback: true
grammar:
@@ -688,6 +698,10 @@
icon: https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3.6/Figures/qwen3.6_27b_score.png
overrides:
backend: llama-cpp
# NVFP4 GGUFs use a quant type the GGUF metadata parser cannot read, so
# context size cannot be auto-derived; set it explicitly (the model is trained
# for a 262144-token context; 32768 is a safe default operators can raise).
context_size: 32768
function:
automatic_tool_parsing_fallback: true
grammar:


@@ -19,6 +19,7 @@ func WorkerPermissions(nodeID, nodeType string) (pubAllow, subAllow []string) {
// Keep this list in sync with the subscriptions in core/cli/agent_worker.go.
subAllow = []string{
"agent.execute",
"agent.*.cancel",
"jobs.*.cancel",
"jobs.*.progress",
"jobs.*.result",


@@ -0,0 +1,161 @@
package distributed_test
import (
"context"
"github.com/mudler/LocalAI/core/services/distributed"
"github.com/mudler/LocalAI/core/services/messaging"
"github.com/mudler/LocalAI/core/services/syncstate"
. "github.com/onsi/ginkgo/v2"
. "github.com/onsi/gomega"
pgdriver "gorm.io/driver/postgres"
"gorm.io/gorm"
"gorm.io/gorm/logger"
)
// ftSyncStore adapts the real FineTuneStore to syncstate.Store, exactly as the
// finetune service does in production. Defined here (rather than reusing the
// service's unexported adapter) so the e2e exercises the store + component over
// real infrastructure without pulling in backend execution.
type ftSyncStore struct{ s *distributed.FineTuneStore }
func (a ftSyncStore) List(_ context.Context) ([]*distributed.FineTuneJobRecord, error) {
recs, err := a.s.ListAll()
if err != nil {
return nil, err
}
out := make([]*distributed.FineTuneJobRecord, len(recs))
for i := range recs {
r := recs[i]
out[i] = &r
}
return out, nil
}
func (a ftSyncStore) Upsert(_ context.Context, r *distributed.FineTuneJobRecord) error {
return a.s.Upsert(r)
}
func (a ftSyncStore) Delete(_ context.Context, k string) error { return a.s.Delete(k) }
// This suite is the real-infrastructure counterpart to the fake-bus unit tests:
// two SyncedMap instances stand in for two LocalAI frontend replicas, each with
// its OWN NATS connection to a shared NATS server and a SHARED PostgreSQL store -
// the exact distributed-mode invariant (single shared DB, per-replica process
// state). It proves the delta path works over the wire and that a late-joining
// replica recovers via store hydrate (the at-most-once gap a fake bus cannot
// exercise).
var _ = Describe("SyncedMap two-replica sync over real NATS", Label("Distributed"), func() {
var (
infra *TestInfra
ftStore *distributed.FineTuneStore
)
BeforeEach(func() {
infra = SetupInfra("localai_syncstate_dist_test")
db, err := gorm.Open(pgdriver.Open(infra.PGURL), &gorm.Config{
Logger: logger.Default.LogMode(logger.Silent),
})
Expect(err).ToNot(HaveOccurred())
ftStore, err = distributed.NewFineTuneStore(db)
Expect(err).ToNot(HaveOccurred())
})
// newReplica builds an independent "replica": its own NATS client to the
// shared server plus a SyncedMap over the shared store, started (hydrate +
// subscribe) and cleaned up automatically.
newReplica := func() *syncstate.SyncedMap[string, *distributed.FineTuneJobRecord] {
GinkgoHelper()
nc, err := messaging.New(infra.NatsURL)
Expect(err).ToNot(HaveOccurred())
sm := syncstate.New(syncstate.Config[string, *distributed.FineTuneJobRecord]{
Name: "finetune.jobs",
Key: func(r *distributed.FineTuneJobRecord) string { return r.ID },
Nats: nc,
Store: ftSyncStore{s: ftStore},
})
Expect(sm.Start(infra.Ctx)).To(Succeed())
FlushNATS(nc) // ensure the subscription is registered server-side before any publish
DeferCleanup(func() {
_ = sm.Close()
nc.Close()
})
return sm
}
rec := func(id, status string) *distributed.FineTuneJobRecord {
return &distributed.FineTuneJobRecord{
ID: id, UserID: "u1", Model: "m", Backend: "b",
TrainingType: "lora", TrainingMethod: "sft", Status: status,
}
}
It("propagates a create from replica A to replica B over the wire", func() {
a := newReplica()
b := newReplica()
Expect(a.Set(infra.Ctx, rec("job-1", "queued"))).To(Succeed())
Eventually(func() bool { _, ok := b.Get("job-1"); return ok }, "10s", "50ms").
Should(BeTrue(), "replica B must observe the job created on A via NATS")
got, ok := b.Get("job-1")
Expect(ok).To(BeTrue())
Expect(got.Status).To(Equal("queued"))
})
It("propagates an update and a delete across replicas", func() {
a := newReplica()
b := newReplica()
Expect(a.Set(infra.Ctx, rec("job-2", "queued"))).To(Succeed())
Eventually(func() bool { _, ok := b.Get("job-2"); return ok }, "10s", "50ms").Should(BeTrue())
// Update on A -> B reflects the new status.
Expect(a.Set(infra.Ctx, rec("job-2", "training"))).To(Succeed())
Eventually(func() string {
if r, ok := b.Get("job-2"); ok {
return r.Status
}
return ""
}, "10s", "50ms").Should(Equal("training"))
// Delete on A -> B prunes (a reload-from-path could not do this).
Expect(a.Delete(infra.Ctx, "job-2")).To(Succeed())
Eventually(func() bool { _, ok := b.Get("job-2"); return ok }, "10s", "50ms").
Should(BeFalse(), "replica B must drop the job deleted on A")
})
It("hydrates a late-joining replica from the shared store (missed-delta recovery)", func() {
a := newReplica()
// Written (and broadcast) BEFORE replica C exists, so C can never receive
// the delta - it can only learn the job by hydrating from shared Postgres
// on Start. This is the at-most-once gap a fake bus cannot exercise.
Expect(a.Set(infra.Ctx, rec("job-3", "completed"))).To(Succeed())
Eventually(func() (*distributed.FineTuneJobRecord, error) { return ftStore.Get("job-3") }, "10s", "50ms").
ShouldNot(BeNil(), "write-through must reach the shared store first")
c := newReplica() // joins late; Start() hydrates from the store synchronously
got, ok := c.Get("job-3")
Expect(ok).To(BeTrue(), "late replica must recover the job via store hydrate, not a delta")
Expect(got.Status).To(Equal("completed"))
})
It("write-through persists a local Set to the shared PostgreSQL store", func() {
a := newReplica()
Expect(a.Set(infra.Ctx, rec("job-4", "queued"))).To(Succeed())
persisted, err := ftStore.Get("job-4")
Expect(err).ToNot(HaveOccurred())
Expect(persisted.ID).To(Equal("job-4"))
Expect(persisted.Status).To(Equal("queued"))
})
})
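
The late-join case in the suite above can be reduced to a toy model. This is a minimal sketch, assuming a single goroutine and in-memory stand-ins; `store`, `bus`, `replica`, `start`, and `set` are hypothetical names and not the syncstate API. It shows why a replica that starts after a broadcast can only learn the row by hydrating from the shared store.

```go
package main

import "fmt"

// store stands in for the shared database that every replica reads and writes.
type store struct{ rows map[string]string }

// bus is an at-most-once broadcast: only handlers registered at publish time
// receive a delta; nothing is replayed to later subscribers.
type bus struct{ subs []func(id, status string) }

func (b *bus) publish(id, status string) {
	for _, h := range b.subs {
		h(id, status)
	}
}

// replica holds a per-process cache kept in sync by deltas from the bus.
type replica struct {
	cache map[string]string
	st    *store
	b     *bus
}

// start hydrates the cache from the shared store, then subscribes to deltas.
func start(st *store, b *bus) *replica {
	r := &replica{cache: map[string]string{}, st: st, b: b}
	for id, status := range st.rows {
		r.cache[id] = status
	}
	b.subs = append(b.subs, func(id, status string) { r.cache[id] = status })
	return r
}

// set writes through to the store first, then updates the cache and broadcasts.
func (r *replica) set(id, status string) {
	r.st.rows[id] = status
	r.cache[id] = status
	r.b.publish(id, status)
}

func main() {
	st := &store{rows: map[string]string{}}
	b := &bus{}

	a := start(st, b)
	a.set("job-3", "completed") // broadcast happens before c exists

	c := start(st, b) // late joiner: never saw the delta, hydrates instead
	status, ok := c.cache["job-3"]
	fmt.Println(status, ok) // prints "completed true"
}
```

Removing the hydrate loop from `start` makes the lookup on `c` miss, which is the gap the real suite guards against.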