LocalAI/core/services at 6715d75f227fe1790418337e4b6ff0dcc249e87f

mirror/LocalAI

mirror of https://github.com/mudler/LocalAI.git synced 2026-06-20 14:49:09 -04:00

Ettore Di Giacinto 6715d75f22 feat(config): default concurrent serving (n_parallel) by GPU VRAM

The llama.cpp backend defaults n_parallel=1, which serializes multi-user requests
and leaves continuous batching off (it auto-enables only at n_parallel>1). Fold a
VRAM-scaled parallel-slot default into the hardware-config path so multi-user
serving works out of the box: >=32GiB->8, >=8GiB->4, >=4GiB->2, else unchanged.
With the backend's unified KV the slots SHARE the context budget, so this adds
concurrency without multiplying KV memory. Explicit parallel/n_parallel always
wins. EnsureParallelOption is shared by the single-host path (ApplyHardwareDefaults
with the local GPU) and the distributed router (per selected node's reported VRAM,
since the frontend may have no GPU). LocalGPU now also reports VRAM.

Assisted-by: Claude:opus-4.8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

2026-06-20 09:35:04 +00:00
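
The VRAM tiers and the "explicit setting wins" rule from the commit message above can be sketched in Go as follows. This is a minimal illustration, not the repository's code: the function names (`parallelSlotsForVRAM`, `hasExplicitParallel`, `withDefaultParallel`) and the `key:value` option-string format are assumptions made for the example, and the real `EnsureParallelOption` may have a different signature and behavior.

```go
package main

import (
	"fmt"
	"strings"
)

// gib is one GiB in bytes.
const gib = uint64(1) << 30

// parallelSlotsForVRAM maps a GPU's VRAM to a default slot count using the
// tiers from the commit message: >=32GiB->8, >=8GiB->4, >=4GiB->2.
// It returns 0 to mean "leave the backend default unchanged".
// Hypothetical helper, named for this sketch only.
func parallelSlotsForVRAM(vramBytes uint64) int {
	switch {
	case vramBytes >= 32*gib:
		return 8
	case vramBytes >= 8*gib:
		return 4
	case vramBytes >= 4*gib:
		return 2
	default:
		return 0
	}
}

// hasExplicitParallel reports whether the user already set parallel or
// n_parallel. The "key:value" option format is an assumption of this sketch.
func hasExplicitParallel(options []string) bool {
	for _, opt := range options {
		key, _, _ := strings.Cut(opt, ":")
		key = strings.TrimSpace(key)
		if key == "parallel" || key == "n_parallel" {
			return true
		}
	}
	return false
}

// withDefaultParallel appends a VRAM-scaled parallel option unless the user
// set one explicitly or the VRAM is below the smallest tier.
// The input slice is never modified.
func withDefaultParallel(options []string, vramBytes uint64) []string {
	if hasExplicitParallel(options) {
		return options
	}
	slots := parallelSlotsForVRAM(vramBytes)
	if slots == 0 {
		return options
	}
	out := append([]string{}, options...)
	return append(out, fmt.Sprintf("parallel:%d", slots))
}

func main() {
	// Single-host case: VRAM comes from the local GPU.
	fmt.Println(withDefaultParallel(nil, 48*gib))
	// An explicit setting is left untouched, whatever the VRAM.
	fmt.Println(withDefaultParallel([]string{"n_parallel:1"}, 48*gib))
}
```

In the distributed case described in the commit message, the same helper would be called with the VRAM reported by the selected node instead of the local GPU, since the frontend may have no GPU of its own.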

feat(distributed): sync state with frontends, better backend management reporting (#9426)

2026-04-19 17:55:53 +02:00

feat(agents): surface KB source citations in RAG responses (#10228)

2026-06-09 16:32:56 +02:00

fix(agents): emit chat event timestamps in milliseconds (#9867) (#10243)

2026-06-12 23:18:44 +02:00

feat(pii): NER tier engine — privacy-filter.cpp backend + NER-centric PII filter (#10360)

2026-06-18 11:45:22 +01:00

feat: add distributed mode (#9124)

2026-03-30 00:47:27 +02:00

fix: distributed backend reinstall/upgrade UI stuck on 'reinstalling' (#10214)

2026-06-08 10:03:02 +02:00

facerecognition

feat(face-recognition): add insightface/onnx backend for 1:1 verify, 1:N identify, embedding, detection, analysis (#9480)

2026-04-22 21:55:41 +02:00

chore: Security hardening (#9719)

2026-05-08 16:25:45 +02:00

fix(downloader): stall timeout, resume-safe cancel, and stale-partial reaping (#10406)

2026-06-19 21:35:21 +02:00

feat: add distributed mode (#9124)

2026-03-30 00:47:27 +02:00

feat: add distributed mode (#9124)

2026-03-30 00:47:27 +02:00

fix: distributed backend reinstall/upgrade UI stuck on 'reinstalling' (#10214)

2026-06-08 10:03:02 +02:00

feat(pii): NER tier engine — privacy-filter.cpp backend + NER-centric PII filter (#10360)

2026-06-18 11:45:22 +01:00

feat(middleware): Model routing, PII filtering, Cloud model proxies (#9802)

2026-05-25 09:28:27 +02:00

feat(config): default concurrent serving (n_parallel) by GPU VRAM

2026-06-20 09:35:04 +00:00

feat: add distributed mode (#9124)

2026-03-30 00:47:27 +02:00

feat(pii): NER tier engine — privacy-filter.cpp backend + NER-centric PII filter (#10360)

2026-06-18 11:45:22 +01:00

refactor(agents): bump skillserver, drop redundant Name from list_skills output (#9916)

2026-05-21 14:45:53 +02:00

feat: track files being staged (#9275)

2026-04-08 14:33:58 +02:00

feat: add distributed mode (#9124)

2026-03-30 00:47:27 +02:00

voicerecognition

feat: voice recognition (#9500)

2026-04-23 12:07:14 +02:00

feat(config): node-aware hardware defaults — larger physical batch on Blackwell

2026-06-19 22:02:14 +00:00