fix(distributed): round-robin replicas of the same model (#9695)
FindAndLockNodeWithModel previously ordered candidate replicas by
in_flight ASC, available_vram DESC. The primary key is correct, but the
tiebreaker meant that whenever in_flight tied — the common case at low
to moderate concurrency where requests don't overlap — the node with
the largest available VRAM won every pick. With autoscaling placing
replicas of the same model on multiple nodes, the fattest GPU node
ended up taking nearly all the load while the others sat idle.

Insert last_used ASC between the two existing tiers. last_used is
already refreshed inside the same transaction that increments in_flight
(and by TouchNodeModel on cache hits in the router), so the
"oldest-used" replica naturally rotates through the candidate set —
strict round-robin without a schema change. available_vram DESC is
demoted to a final tiebreaker for cold starts where last_used is
identical across replicas.

Placement queries (FindNodeWithVRAM, FindLeastLoadedNode, and the
*FromSet variants) carry the same fattest-GPU bias in their tiebreakers
but are costlier to fix consistently. They are deferred to a follow-up
so the routing fix can land first; routing was the dominant cause of
the user-observed symptom anyway.

Test: registry_test.go adds a focused spec that loads three replicas
on three nodes with 24/16/8 GB VRAM and asserts each is picked at
least twice across 9 in_flight-tied calls.


Assisted-by: claude-code:claude-opus-4-7 [Read] [Edit] [Bash] [Grep]

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-05-06 19:40:54 +02:00