LocalAI/core at 4d14fe5bef3afdbd8a3bccb9fbc2c683cf9f5ffa

mirror/LocalAI

mirror of https://github.com/mudler/LocalAI.git synced 2026-06-22 07:39:02 -04:00


Ettore Di Giacinto 4d14fe5bef fix(distributed): detach cold-load staging from the request context

A model not yet loaded on a worker is staged lazily on the inference
request path. Staging a multi-GB model takes minutes - far longer than
any client keeps its HTTP request open - so a browser refresh, an
ingress/LB idle-timeout, or a round-robined retry landing on another
frontend replica cancels the request context and aborts the upload with
"context canceled" mid-transfer. Large models then never finish staging,
so they never load (observed in a 2-replica deployment: both frontends
repeatedly failed to stage a 15.7 GB GGUF, each attempt dying at a
different offset).

Bind the cold load (staging + LoadModel + the per-model advisory lock) to
context.WithoutCancel(ctx): it keeps the request's values (prefix chain)
but drops cancellation/deadline. Each long step keeps its own bound (the
file stager's resume budget, LoadModel's 5m timeout), and the advisory
lock still de-dupes concurrent loaders across replicas.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

2026-06-21 23:11:26 +00:00

feat(watchdog): add size-aware LRU eviction mode (#9527)

2026-06-21 17:17:04 +02:00

fix(backend): call vram.EstimateModelMultiContext (master build broken: undefined vram.EstimateModel) (#10426)

2026-06-21 17:51:46 +02:00

feat(watchdog): add size-aware LRU eviction mode (#9527)

2026-06-21 17:17:04 +02:00

security(http): refuse redirects on outbound clients via hardened pkg/httpclient (#10087)

2026-05-30 12:04:10 +02:00

feat(realtime): speaker-aware conversations - surface identity to client and LLM (#10424)

2026-06-21 21:07:10 +02:00

dependencies_manager

feat(ui): move to React for frontend (#8772)

2026-03-05 21:47:12 +01:00

feat(middleware): Model routing, PII filtering, Cloud model proxies (#9802)

2026-05-25 09:28:27 +02:00

feat(pii): NER tier engine — privacy-filter.cpp backend + NER-centric PII filter (#10360)

2026-06-18 11:45:22 +01:00

feat(realtime): speaker-aware conversations - surface identity to client and LLM (#10424)

2026-06-21 21:07:10 +02:00

feat: add distributed mode (#9124)

2026-03-30 00:47:27 +02:00

feat(pii): NER tier engine — privacy-filter.cpp backend + NER-centric PII filter (#10360)

2026-06-18 11:45:22 +01:00

fix(distributed): detach cold-load staging from the request context

2026-06-21 23:11:26 +00:00

feat(gallery): verify backend OCI images with keyless cosign (#9823)

2026-05-18 08:02:20 +02:00

fix(openresponses): populate Content and accept bare {role,content} items (#10039) (#10040)

2026-05-28 07:21:48 +00:00

feat(pii): NER tier engine — privacy-filter.cpp backend + NER-centric PII filter (#10360)

2026-06-18 11:45:22 +01:00