LocalAI/core/config at 6715d75f227fe1790418337e4b6ff0dcc249e87f

mirror/LocalAI

mirror of https://github.com/mudler/LocalAI.git synced 2026-06-20 14:49:09 -04:00

Ettore Di Giacinto 6715d75f22 feat(config): default concurrent serving (n_parallel) by GPU VRAM

The llama.cpp backend defaults n_parallel=1, which serializes multi-user requests
and leaves continuous batching off (it auto-enables only at n_parallel>1). Fold a
VRAM-scaled parallel-slot default into the hardware-config path so multi-user
serving works out of the box: >=32GiB->8, >=8GiB->4, >=4GiB->2, else unchanged.
With the backend's unified KV the slots SHARE the context budget, so this adds
concurrency without multiplying KV memory. Explicit parallel/n_parallel always
wins. EnsureParallelOption is shared by the single-host path (ApplyHardwareDefaults
with the local GPU) and the distributed router (per selected node's reported VRAM,
since the frontend may have no GPU). LocalGPU now also reports VRAM.

Assisted-by: Claude:opus-4.8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

2026-06-20 09:35:04 +00:00
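
The threshold rule in the commit message above can be sketched roughly as follows. This is a minimal illustration, not the code from hardware_defaults.go: the function name `resolveParallel`, its signature, and the convention that 0 means "not set, leave the backend default alone" are all assumptions made for this example. Only the VRAM thresholds and the "explicit value wins" rule come from the commit message.

```go
package main

import "fmt"

// gib is one gibibyte in bytes.
const gib = uint64(1) << 30

// resolveParallel is a hypothetical helper (not the real EnsureParallelOption).
// explicit is the user-configured parallel/n_parallel value, with 0 meaning
// "not set". The return value is the slot count to pass on, or 0 to leave the
// backend default untouched.
func resolveParallel(explicit int, vramBytes uint64) int {
	// An explicit setting always wins over the hardware-derived default.
	if explicit > 0 {
		return explicit
	}
	// Thresholds as stated in the commit message.
	switch {
	case vramBytes >= 32*gib:
		return 8
	case vramBytes >= 8*gib:
		return 4
	case vramBytes >= 4*gib:
		return 2
	default:
		return 0 // below 4 GiB (or no GPU): unchanged
	}
}

func main() {
	for _, vram := range []uint64{2 * gib, 4 * gib, 8 * gib, 32 * gib} {
		fmt.Printf("vram=%dGiB explicit=unset -> %d\n", vram/gib, resolveParallel(0, vram))
	}
	fmt.Printf("vram=48GiB explicit=3 -> %d\n", resolveParallel(3, 48*gib))
}
```

In the distributed case described above, the same function would presumably be called with the selected node's reported VRAM rather than the frontend's local GPU.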

gen_inference_defaults

security(http): refuse redirects on outbound clients via hardened pkg/httpclient (#10087)

2026-05-30 12:04:10 +02:00

feat(pii): NER tier engine — privacy-filter.cpp backend + NER-centric PII filter (#10360)

2026-06-18 11:45:22 +01:00

application_config_test.go

fix(settings): start watchdog on cold-enable from the React UI (#9125) (#10287)

2026-06-14 16:46:14 +02:00

application_config.go

feat(pii): NER tier engine — privacy-filter.cpp backend + NER-centric PII filter (#10360)

2026-06-18 11:45:22 +01:00

backend_capabilities_test.go

feat(gallery): Speed up load times and clean gallery entries (#9211)

2026-05-06 14:51:38 +02:00

backend_capabilities.go

feat(pii): NER tier engine — privacy-filter.cpp backend + NER-centric PII filter (#10360)

2026-06-18 11:45:22 +01:00

backend_hooks.go

feat(vllm): parity with llama.cpp backend (#9328)

2026-04-13 11:00:29 +02:00

chat_template_kwargs_test.go

feat: generic chat_template_kwargs (model config + per-request metadata) (#10359)

2026-06-16 12:16:34 +02:00

config_suite_test.go

dependencies(grpcio): bump to fix CI issues (#2362)

2024-05-21 14:33:47 +02:00

distributed_config_test.go

feat(distributed): enforce registration token for worker file transfer (#10183)

2026-06-05 14:34:28 +02:00

distributed_config.go

feat(distributed): declarative per-model scheduling via env/args (#10308)

2026-06-13 18:31:06 +02:00

gallery.go

feat(gallery): verify backend OCI images with keyless cosign (#9823)

2026-05-18 08:02:20 +02:00

gguf_reasoning_test.go

Respect explicit reasoning config during GGUF thinking probe (#9463)

2026-04-21 21:53:10 +02:00

gguf.go

feat(pii): NER tier engine — privacy-filter.cpp backend + NER-centric PII filter (#10360)

2026-06-18 11:45:22 +01:00

hardware_defaults_internal_test.go

test(config): injectable local-GPU seam + single-instance coverage

2026-06-19 22:18:27 +00:00

hardware_defaults_test.go

feat(config): default concurrent serving (n_parallel) by GPU VRAM

2026-06-20 09:35:04 +00:00

hardware_defaults.go

feat(config): default concurrent serving (n_parallel) by GPU VRAM

2026-06-20 09:35:04 +00:00

hooks_llamacpp.go

fix(config): skip vocab arrays and mmap GGUF headers to speed up startup (#10213)

2026-06-07 23:33:52 +02:00

hooks_test.go

fix(config): skip vocab arrays and mmap GGUF headers to speed up startup (#10213)

2026-06-07 23:33:52 +02:00

hooks_vllm.go

feat(vllm): expose AsyncEngineArgs via generic engine_args YAML map (#9563)

2026-04-29 00:49:28 +02:00

inference_defaults_test.go

feat: inferencing default, automatic tool parsing fallback and wire min_p (#9092)

2026-03-22 00:57:15 +01:00

inference_defaults.go

feat: inferencing default, automatic tool parsing fallback and wire min_p (#9092)

2026-03-22 00:57:15 +01:00

inference_defaults.json

chore: bump inference defaults from unsloth (#10358)

2026-06-16 09:59:36 +02:00

mitm_host_owners_test.go

feat(middleware): Model routing, PII filtering, Cloud model proxies (#9802)

2026-05-25 09:28:27 +02:00

model_config_filter.go

feat: add distributed mode (#9124)

2026-03-30 00:47:27 +02:00

model_config_loader_test.go

feat(concurrency-groups): per-model exclusive groups for backend loading (#9662)

2026-05-05 08:42:50 +02:00

model_config_loader.go

feat(middleware): Model routing, PII filtering, Cloud model proxies (#9802)

2026-05-25 09:28:27 +02:00

model_config_test.go

feat(pii): NER tier engine — privacy-filter.cpp backend + NER-centric PII filter (#10360)

2026-06-18 11:45:22 +01:00

model_config.go

test(config): injectable local-GPU seam + single-instance coverage

2026-06-19 22:18:27 +00:00

model_test.go

fix(tests): inline model_test fixtures after tests/models_fixtures removal

2026-04-28 12:58:49 +00:00

mtp_test.go

fix(mtp): don't auto-enable self-spec MTP for draft-only assistant GGUFs (#10208)

2026-06-07 22:09:02 +02:00

mtp.go

fix(mtp): don't auto-enable self-spec MTP for draft-only assistant GGUFs (#10208)

2026-06-07 22:09:02 +02:00

parser_defaults.json

feat(vllm): parity with llama.cpp backend (#9328)

2026-04-13 11:00:29 +02:00

pipeline_streaming_test.go

feat(realtime): stream the LLM / TTS / transcription pipeline stages (#10176)

2026-06-11 08:43:12 +01:00

reasoning_effort_test.go

feat: forward reasoning_effort to the backend so jinja models honor it (#10184)

2026-06-05 13:45:43 +00:00

runtime_settings_persist_test.go

feat(middleware): Model routing, PII filtering, Cloud model proxies (#9802)

2026-05-25 09:28:27 +02:00

runtime_settings_persist.go

feat(branding): admin-configurable instance name, tagline, and assets (#9635)

2026-05-02 15:51:36 +02:00

runtime_settings.go

feat(pii): NER tier engine — privacy-filter.cpp backend + NER-centric PII filter (#10360)

2026-06-18 11:45:22 +01:00

voice_gate_test.go

feat(realtime): gate realtime pipeline voice models behind voice recognition (#10319)

2026-06-13 23:38:08 +02:00