LocalAI/core/config at 67f80a152b072a5e2b37928a421b65a6bcf85305

mirror/LocalAI

mirror of https://github.com/mudler/LocalAI.git synced 2026-06-08 00:36:37 -04:00

LocalAI [bot] 67f80a152b fix(mtp): don't auto-enable self-spec MTP for draft-only assistant GGUFs (#10208)

Gemma4 MTP (ggml-org/llama.cpp#23398) registers the prediction head as a
separate `gemma4-assistant` architecture. That assistant GGUF still carries
`<arch>.nextn_predict_layers`, so the architecture-agnostic detection in
HasEmbeddedMTPHead matched it and appended the `spec_type:draft-mtp` defaults.

Unlike the DeepSeek/Qwen embedded-head models, an assistant checkpoint cannot
self-speculate: it is a draft model that requires a paired target context
(`ctx_other`) and throws if loaded alone. Auto-applying the self-spec defaults
to a standalone assistant import therefore produces a broken config.

Guard the detection against draft-only assistant architectures (the `-assistant`
suffix is upstream's naming convention) so importing one no longer yields a
self-speculation config. Two-model target+draft pairing remains expressible
manually via `draft_model:` and is left to a follow-up.
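
The guard described above can be sketched roughly as follows. This is a minimal illustration, not the code in `mtp.go`: `canSelfSpeculate`, `isDraftOnlyArch`, the flat `map[string]any` view of GGUF metadata, and the `examplearch` architecture name are all assumptions made for the example. Only the `-assistant` suffix, the `gemma4-assistant` architecture, and the `<arch>.nextn_predict_layers` key come from the commit message; `general.architecture` is the standard GGUF key for the architecture name.

```go
package main

import (
	"fmt"
	"strings"
)

// draftOnlySuffix is upstream's naming convention for draft-only
// assistant architectures, as stated in the commit message.
const draftOnlySuffix = "-assistant"

// isDraftOnlyArch reports whether arch names a draft-only assistant
// architecture such as "gemma4-assistant". (Hypothetical helper.)
func isDraftOnlyArch(arch string) bool {
	return strings.HasSuffix(arch, draftOnlySuffix)
}

// canSelfSpeculate is a hypothetical stand-in for HasEmbeddedMTPHead.
// meta is a simplified, flat view of GGUF key/value metadata.
func canSelfSpeculate(meta map[string]any) bool {
	arch, ok := meta["general.architecture"].(string)
	if !ok || arch == "" {
		return false
	}
	// Assistant checkpoints carry nextn_predict_layers too, but they need
	// a paired target context, so they must not get self-spec defaults.
	if isDraftOnlyArch(arch) {
		return false
	}
	n, ok := meta[arch+".nextn_predict_layers"].(uint32)
	return ok && n > 0
}

func main() {
	embedded := map[string]any{
		"general.architecture":             "examplearch", // hypothetical name
		"examplearch.nextn_predict_layers": uint32(1),
	}
	assistant := map[string]any{
		"general.architecture":                  "gemma4-assistant",
		"gemma4-assistant.nextn_predict_layers": uint32(1),
	}
	fmt.Println(canSelfSpeculate(embedded))  // true
	fmt.Println(canSelfSpeculate(assistant)) // false
}
```

Checking the suffix before reading the layer count keeps the detection architecture-agnostic for embedded-head models while excluding every draft-only assistant in one place.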


Assisted-by: Claude:claude-opus-4-8 [Claude Code]

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>

2026-06-07 22:09:02 +02:00

| File | Last commit | Date |
| --- | --- | --- |
| gen_inference_defaults | security(http): refuse redirects on outbound clients via hardened pkg/httpclient (#10087) | 2026-05-30 12:04:10 +02:00 |
| (name missing in source) | feat: forward reasoning_effort to the backend so jinja models honor it (#10184) | 2026-06-05 13:45:43 +00:00 |
| application_config_test.go | feat: backend versioning, upgrade detection and auto-upgrade (#9315) | 2026-04-11 22:31:15 +02:00 |
| application_config.go | feat(distributed): gated X-LocalAI-Node response header (middleware + wrapper) (#9976) | 2026-05-25 10:51:48 +02:00 |
| backend_capabilities_test.go | feat(gallery): Speed up load times and clean gallery entries (#9211) | 2026-05-06 14:51:38 +02:00 |
| backend_capabilities.go | fix(config): add face/speaker recognition constants and register insightface + speaker-recognition (#10110) | 2026-06-04 21:48:01 +02:00 |
| backend_hooks.go | feat(vllm): parity with llama.cpp backend (#9328) | 2026-04-13 11:00:29 +02:00 |
| config_suite_test.go | dependencies(grpcio): bump to fix CI issues (#2362) | 2024-05-21 14:33:47 +02:00 |
| distributed_config_test.go | feat(distributed): enforce registration token for worker file transfer (#10183) | 2026-06-05 14:34:28 +02:00 |
| distributed_config.go | feat(distributed): enforce registration token for worker file transfer (#10183) | 2026-06-05 14:34:28 +02:00 |
| gallery.go | feat(gallery): verify backend OCI images with keyless cosign (#9823) | 2026-05-18 08:02:20 +02:00 |
| gguf_reasoning_test.go | Respect explicit reasoning config during GGUF thinking probe (#9463) | 2026-04-21 21:53:10 +02:00 |
| gguf.go | feat(llama-cpp): bump to MTP-merge SHA and automatically set MTP defaults (#9852) | 2026-05-16 22:42:48 +02:00 |
| hooks_llamacpp.go | feat(vllm): parity with llama.cpp backend (#9328) | 2026-04-13 11:00:29 +02:00 |
| hooks_test.go | feat(config): default prompt_cache_all to true (#9951) | 2026-05-22 22:06:22 +02:00 |
| hooks_vllm.go | feat(vllm): expose AsyncEngineArgs via generic engine_args YAML map (#9563) | 2026-04-29 00:49:28 +02:00 |
| inference_defaults_test.go | feat: inferencing default, automatic tool parsing fallback and wire min_p (#9092) | 2026-03-22 00:57:15 +01:00 |
| inference_defaults.go | feat: inferencing default, automatic tool parsing fallback and wire min_p (#9092) | 2026-03-22 00:57:15 +01:00 |
| inference_defaults.json | chore: bump inference defaults from unsloth (#9396) | 2026-04-17 09:05:55 +02:00 |
| mitm_host_owners_test.go | feat(middleware): Model routing, PII filtering, Cloud model proxies (#9802) | 2026-05-25 09:28:27 +02:00 |
| model_config_filter.go | feat: add distributed mode (#9124) | 2026-03-30 00:47:27 +02:00 |
| model_config_loader_test.go | feat(concurrency-groups): per-model exclusive groups for backend loading (#9662) | 2026-05-05 08:42:50 +02:00 |
| model_config_loader.go | feat(middleware): Model routing, PII filtering, Cloud model proxies (#9802) | 2026-05-25 09:28:27 +02:00 |
| model_config_test.go | feat: prefix-cache-aware routing for distributed mode (#10071) | 2026-05-30 23:24:22 +02:00 |
| model_config.go | feat: forward reasoning_effort to the backend so jinja models honor it (#10184) | 2026-06-05 13:45:43 +00:00 |
| model_test.go | fix(tests): inline model_test fixtures after tests/models_fixtures removal | 2026-04-28 12:58:49 +00:00 |
| mtp_test.go | fix(mtp): don't auto-enable self-spec MTP for draft-only assistant GGUFs (#10208) | 2026-06-07 22:09:02 +02:00 |
| mtp.go | fix(mtp): don't auto-enable self-spec MTP for draft-only assistant GGUFs (#10208) | 2026-06-07 22:09:02 +02:00 |
| parser_defaults.json | feat(vllm): parity with llama.cpp backend (#9328) | 2026-04-13 11:00:29 +02:00 |
| reasoning_effort_test.go | feat: forward reasoning_effort to the backend so jinja models honor it (#10184) | 2026-06-05 13:45:43 +00:00 |
| runtime_settings_persist_test.go | feat(middleware): Model routing, PII filtering, Cloud model proxies (#9802) | 2026-05-25 09:28:27 +02:00 |
| runtime_settings_persist.go | feat(branding): admin-configurable instance name, tagline, and assets (#9635) | 2026-05-02 15:51:36 +02:00 |
| runtime_settings.go | feat(middleware): Model routing, PII filtering, Cloud model proxies (#9802) | 2026-05-25 09:28:27 +02:00 |