LocalAI/core at 94b6cd635501e031e7ebc40030ca75e116e308a3

mirror/LocalAI

mirror of https://github.com/mudler/LocalAI.git synced 2026-06-20 22:59:09 -04:00

Ettore Di Giacinto 94b6cd6355 feat(config): enable cross-request prefix caching for serving (Phase 2)

The llama.cpp backend ships with n_cache_reuse=0, which disables cross-request KV
prefix reuse via shifting. Enable it by default (256) so repeated prefixes (system
prompts, RAG context, agent scaffolds, multi-turn chat) aren't recomputed. This
is the universally useful part of 'paged attention': shared-prefix reuse, which
the upstream maintainers themselves identify as where paged attention actually
helps. It needs none of the block-KV machinery.

The default lives in serving_defaults.go, a sibling to hardware_defaults.go
(device-driven vs serving-policy defaults); both run from SetDefaults and only
fill unset values. An explicit cache_reuse/n_cache_reuse setting always wins.
The default is device-independent, so it propagates to distributed nodes via the
model options with no router change. It shares the backendOptionSet helper with
the Phase-1 parallel default.

Assisted-by: Claude:opus-4.8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

2026-06-20 13:01:23 +00:00
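
The "only fill unset values" rule described in the commit message above can be sketched as follows. This is a minimal illustration, not the code in serving_defaults.go: the ModelConfig type, the "key:value" option format, and the optionSet and applyServingDefaults functions are all hypothetical stand-ins (optionSet plays the role the message assigns to backendOptionSet, whose real signature is not shown here). Only the option names cache_reuse/n_cache_reuse and the value 256 come from the commit message.

```go
package main

import (
	"fmt"
	"strings"
)

// ModelConfig is a hypothetical stand-in for the real model configuration.
// Only the free-form backend option list matters for this sketch.
type ModelConfig struct {
	Options []string // assumed to hold "key:value" entries
}

// defaultCacheReuse is the default value named in the commit message.
const defaultCacheReuse = 256

// optionSet reports whether any of the given keys already appears in opts.
// Hypothetical helper standing in for backendOptionSet.
func optionSet(opts []string, keys ...string) bool {
	for _, o := range opts {
		name, _, _ := strings.Cut(o, ":")
		name = strings.TrimSpace(name)
		for _, k := range keys {
			if strings.EqualFold(name, k) {
				return true
			}
		}
	}
	return false
}

// applyServingDefaults fills in the prefix-cache default only when the user
// has not set either spelling of the option, so an explicit value always wins.
func applyServingDefaults(cfg *ModelConfig) {
	if optionSet(cfg.Options, "cache_reuse", "n_cache_reuse") {
		return
	}
	cfg.Options = append(cfg.Options, fmt.Sprintf("cache_reuse:%d", defaultCacheReuse))
}

func main() {
	unset := &ModelConfig{}
	applyServingDefaults(unset)
	fmt.Println(unset.Options) // [cache_reuse:256]

	explicit := &ModelConfig{Options: []string{"n_cache_reuse:0"}}
	applyServingDefaults(explicit)
	fmt.Println(explicit.Options) // [n_cache_reuse:0]
}
```

Because the default is written into the option list rather than derived from the local device, a config prepared this way would carry the value to any node that receives the model options, which matches the "no router change" point in the message.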

fix(downloader): stall timeout, resume-safe cancel, and stale-partial reaping (#10406)

2026-06-19 21:35:21 +02:00

feat(pii): NER tier engine — privacy-filter.cpp backend + NER-centric PII filter (#10360)

2026-06-18 11:45:22 +01:00

feat(distributed): declarative per-model scheduling via env/args (#10308)

2026-06-13 18:31:06 +02:00

security(http): refuse redirects on outbound clients via hardened pkg/httpclient (#10087)

2026-05-30 12:04:10 +02:00

feat(config): enable cross-request prefix caching for serving (Phase 2)

2026-06-20 13:01:23 +00:00

dependencies_manager

feat(ui): move to React for frontend (#8772)

2026-03-05 21:47:12 +01:00

feat(middleware): Model routing, PII filtering, Cloud model proxies (#9802)

2026-05-25 09:28:27 +02:00

feat(pii): NER tier engine — privacy-filter.cpp backend + NER-centric PII filter (#10360)

2026-06-18 11:45:22 +01:00

feat(config): hardware-tuned defaults — Blackwell batch + VRAM-scaled concurrency (#10411)

2026-06-20 14:45:59 +02:00

feat: add distributed mode (#9124)

2026-03-30 00:47:27 +02:00

feat(pii): NER tier engine — privacy-filter.cpp backend + NER-centric PII filter (#10360)

2026-06-18 11:45:22 +01:00

feat(config): hardware-tuned defaults — Blackwell batch + VRAM-scaled concurrency (#10411)

2026-06-20 14:45:59 +02:00

feat(gallery): verify backend OCI images with keyless cosign (#9823)

2026-05-18 08:02:20 +02:00

fix(openresponses): populate Content and accept bare {role,content} items (#10039) (#10040)

2026-05-28 07:21:48 +00:00

feat(pii): NER tier engine — privacy-filter.cpp backend + NER-centric PII filter (#10360)

2026-06-18 11:45:22 +01:00