LocalAI/core/config at 94b6cd635501e031e7ebc40030ca75e116e308a3

mirror/LocalAI

mirror of https://github.com/mudler/LocalAI.git synced 2026-06-21 07:08:50 -04:00



Ettore Di Giacinto 94b6cd6355 feat(config): enable cross-request prefix caching for serving (Phase 2)

The llama.cpp backend ships n_cache_reuse=0 (cross-request KV prefix reuse via
shifting disabled). Enable it by default (256) so repeated prefixes - system
prompts, RAG context, agent scaffolds, multi-turn chat - aren't recomputed. This
is the universally useful part of 'paged attention' (shared-prefix reuse, which
the upstream maintainers themselves identify as where paged attention actually
helps) and needs none of the block-KV machinery.
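To make the role of n_cache_reuse concrete, here is a simplified Go model of chunk reuse, loosely patterned on the reuse loop in llama.cpp's server. The sketch always counts the common prefix with a slot's cached tokens as reused; when the threshold is above zero, it also scans past the first mismatch for runs of at least that many matching tokens, which could be shifted into place in the KV cache instead of being recomputed. The function name reusedTokens and the integer token slices are illustrative assumptions. This is neither LocalAI nor llama.cpp code, and it only counts tokens without touching a real KV cache. The example uses a threshold of 3 for readability, whereas the default described above is 256.

```go
package main

import "fmt"

// reusedTokens estimates how many prompt tokens could be served from a slot's
// cached tokens. Simplified, hypothetical model of the idea behind
// llama.cpp's n_cache_reuse; not LocalAI or llama.cpp code.
func reusedTokens(cache, prompt []int, minChunk int) int {
	// Plain prefix reuse: tokens identical from the start.
	nPast := 0
	for nPast < len(cache) && nPast < len(prompt) && cache[nPast] == prompt[nPast] {
		nPast++
	}
	if minChunk <= 0 {
		return nPast // threshold 0: no chunk reuse via shifting
	}
	// Past the first mismatch, look for long enough matching runs.
	headC, headP := nPast, nPast
	for headC < len(cache) && headP < len(prompt) {
		n := 0
		for headC+n < len(cache) && headP+n < len(prompt) && cache[headC+n] == prompt[headP+n] {
			n++
		}
		if n >= minChunk {
			// Run qualifies: its KV entries would be shifted, not recomputed.
			nPast += n
			headC += n
			headP += n
		} else {
			// Too short: skip one cached token and keep scanning.
			headC++
		}
	}
	return nPast
}

func main() {
	cache := []int{1, 2, 3, 4, 5, 6, 7, 8}
	prompt := []int{1, 2, 5, 6, 7, 8} // tokens 3 and 4 were dropped
	fmt.Println(reusedTokens(cache, prompt, 0)) // 2
	fmt.Println(reusedTokens(cache, prompt, 3)) // 6
}
```

With the threshold at 0 only the shared prefix (2 tokens) counts; at 3 the later run of 4 matching tokens also qualifies, so all 6 prompt tokens are covered. A threshold of 256 applies the same rule but only to much longer repeated spans.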

Lives in serving_defaults.go, alongside hardware_defaults.go (serving-policy vs
device-driven defaults); both run from SetDefaults and only fill unset values.
Explicit cache_reuse/n_cache_reuse always wins. Device-independent, so it
propagates to distributed nodes via the model options with no router change.
Shares the backendOptionSet helper with the Phase-1 parallel default.

Assisted-by: Claude:opus-4.8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

2026-06-20 13:01:23 +00:00
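The fill-only-unset policy described in the commit message can be sketched as follows. Everything here is hypothetical: modelConfig, optionSet, and applyServingDefaults are illustrative names, and the "key:value" string format for backend options is an assumption, not a confirmed detail of LocalAI's config. optionSet is only named after the backendOptionSet helper the commit mentions; that helper's real signature does not appear in this listing.

```go
package main

import (
	"fmt"
	"strings"
)

// defaultCacheReuse matches the 256 default described in the commit message.
const defaultCacheReuse = 256

// modelConfig is a hypothetical, minimal stand-in for a model config.
// Options holds backend options as "key:value" strings (assumed format).
type modelConfig struct {
	Options []string
}

// optionSet reports whether any of keys already appears in opts.
// Hypothetical helper, named after (not copied from) backendOptionSet.
func optionSet(opts []string, keys ...string) bool {
	for _, o := range opts {
		name, _, _ := strings.Cut(o, ":")
		for _, k := range keys {
			if strings.TrimSpace(name) == k {
				return true
			}
		}
	}
	return false
}

// applyServingDefaults fills cache reuse only when the user left it unset,
// so an explicit cache_reuse or n_cache_reuse always wins.
func applyServingDefaults(cfg *modelConfig) {
	if optionSet(cfg.Options, "cache_reuse", "n_cache_reuse") {
		return
	}
	cfg.Options = append(cfg.Options, fmt.Sprintf("cache_reuse:%d", defaultCacheReuse))
}

func main() {
	unset := &modelConfig{}
	applyServingDefaults(unset)
	fmt.Println(unset.Options) // [cache_reuse:256]

	explicit := &modelConfig{Options: []string{"n_cache_reuse:0"}}
	applyServingDefaults(explicit)
	fmt.Println(explicit.Options) // [n_cache_reuse:0]
}
```

Because the check runs before anything is appended, calling the function repeatedly is harmless, and a user who sets either key, even to 0, keeps that value.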


gen_inference_defaults

security(http): refuse redirects on outbound clients via hardened pkg/httpclient (#10087)

2026-05-30 12:04:10 +02:00

feat(pii): NER tier engine — privacy-filter.cpp backend + NER-centric PII filter (#10360)

2026-06-18 11:45:22 +01:00

application_config_test.go

fix(settings): start watchdog on cold-enable from the React UI (#9125) (#10287)

2026-06-14 16:46:14 +02:00

application_config.go

feat(pii): NER tier engine — privacy-filter.cpp backend + NER-centric PII filter (#10360)

2026-06-18 11:45:22 +01:00

backend_capabilities_test.go

feat(gallery): Speed up load times and clean gallery entries (#9211)

2026-05-06 14:51:38 +02:00

backend_capabilities.go

feat(pii): NER tier engine — privacy-filter.cpp backend + NER-centric PII filter (#10360)

2026-06-18 11:45:22 +01:00

backend_hooks.go

feat(vllm): parity with llama.cpp backend (#9328)

2026-04-13 11:00:29 +02:00

chat_template_kwargs_test.go

feat: generic chat_template_kwargs (model config + per-request metadata) (#10359)

2026-06-16 12:16:34 +02:00

config_suite_test.go

dependencies(grpcio): bump to fix CI issues (#2362)

2024-05-21 14:33:47 +02:00

distributed_config_test.go

feat(distributed): enforce registration token for worker file transfer (#10183)

2026-06-05 14:34:28 +02:00

distributed_config.go

feat(distributed): declarative per-model scheduling via env/args (#10308)

2026-06-13 18:31:06 +02:00

gallery.go

feat(gallery): verify backend OCI images with keyless cosign (#9823)

2026-05-18 08:02:20 +02:00

gguf_reasoning_test.go

Respect explicit reasoning config during GGUF thinking probe (#9463)

2026-04-21 21:53:10 +02:00

gguf.go

feat(pii): NER tier engine — privacy-filter.cpp backend + NER-centric PII filter (#10360)

2026-06-18 11:45:22 +01:00

hardware_defaults_internal_test.go

feat(config): hardware-tuned defaults — Blackwell batch + VRAM-scaled concurrency (#10411)

2026-06-20 14:45:59 +02:00

hardware_defaults_test.go

feat(config): hardware-tuned defaults — Blackwell batch + VRAM-scaled concurrency (#10411)

2026-06-20 14:45:59 +02:00

hardware_defaults.go

feat(config): enable cross-request prefix caching for serving (Phase 2)

2026-06-20 13:01:23 +00:00

hooks_llamacpp.go

fix(config): skip vocab arrays and mmap GGUF headers to speed up startup (#10213)

2026-06-07 23:33:52 +02:00

hooks_test.go

fix(config): skip vocab arrays and mmap GGUF headers to speed up startup (#10213)

2026-06-07 23:33:52 +02:00

hooks_vllm.go

feat(vllm): expose AsyncEngineArgs via generic engine_args YAML map (#9563)

2026-04-29 00:49:28 +02:00

inference_defaults_test.go

feat: inferencing default, automatic tool parsing fallback and wire min_p (#9092)

2026-03-22 00:57:15 +01:00

inference_defaults.go

feat: inferencing default, automatic tool parsing fallback and wire min_p (#9092)

2026-03-22 00:57:15 +01:00

inference_defaults.json

chore: bump inference defaults from unsloth (#10358)

2026-06-16 09:59:36 +02:00

mitm_host_owners_test.go

feat(middleware): Model routing, PII filtering, Cloud model proxies (#9802)

2026-05-25 09:28:27 +02:00

model_config_filter.go

feat: add distributed mode (#9124)

2026-03-30 00:47:27 +02:00

model_config_loader_test.go

feat(concurrency-groups): per-model exclusive groups for backend loading (#9662)

2026-05-05 08:42:50 +02:00

model_config_loader.go

feat(middleware): Model routing, PII filtering, Cloud model proxies (#9802)

2026-05-25 09:28:27 +02:00

model_config_test.go

feat(pii): NER tier engine — privacy-filter.cpp backend + NER-centric PII filter (#10360)

2026-06-18 11:45:22 +01:00

model_config.go

feat(config): enable cross-request prefix caching for serving (Phase 2)

2026-06-20 13:01:23 +00:00

model_test.go

fix(tests): inline model_test fixtures after tests/models_fixtures removal

2026-04-28 12:58:49 +00:00

mtp_test.go

fix(mtp): don't auto-enable self-spec MTP for draft-only assistant GGUFs (#10208)

2026-06-07 22:09:02 +02:00

mtp.go

fix(mtp): don't auto-enable self-spec MTP for draft-only assistant GGUFs (#10208)

2026-06-07 22:09:02 +02:00

parser_defaults.json

feat(vllm): parity with llama.cpp backend (#9328)

2026-04-13 11:00:29 +02:00

pipeline_streaming_test.go

feat(realtime): stream the LLM / TTS / transcription pipeline stages (#10176)

2026-06-11 08:43:12 +01:00

reasoning_effort_test.go

feat: forward reasoning_effort to the backend so jinja models honor it (#10184)

2026-06-05 13:45:43 +00:00

runtime_settings_persist_test.go

feat(middleware): Model routing, PII filtering, Cloud model proxies (#9802)

2026-05-25 09:28:27 +02:00

runtime_settings_persist.go

feat(branding): admin-configurable instance name, tagline, and assets (#9635)

2026-05-02 15:51:36 +02:00

runtime_settings.go

feat(pii): NER tier engine — privacy-filter.cpp backend + NER-centric PII filter (#10360)

2026-06-18 11:45:22 +01:00

serving_defaults_test.go

feat(config): enable cross-request prefix caching for serving (Phase 2)

2026-06-20 13:01:23 +00:00

serving_defaults.go

feat(config): enable cross-request prefix caching for serving (Phase 2)

2026-06-20 13:01:23 +00:00

voice_gate_test.go

feat(realtime): gate realtime pipeline voice models behind voice recognition (#10319)

2026-06-13 23:38:08 +02:00