LocalAI/core/config at fdff11470178e301a57c0e3ba1d7de862d6b07bb

mirror/LocalAI

mirror of https://github.com/mudler/LocalAI.git synced 2026-06-28 10:27:30 -04:00

LocalAI [bot] 1154be5eea fix(config): fall back to DefaultContextSize for unparseable GGUFs; pin NVFP4 gallery context_size (#10563)

The GGUF metadata parser (gpustack/gguf-parser-go) cannot read NVFP4-quantized
GGUFs at all: it errors with "read tensor info 0: This quantized type is
currently unsupported" because NVFP4 is a ggml tensor type it does not know.
When ParseGGUFFile errors, the llama-cpp defaults hook skips guessGGUFFromFile
entirely and the deferred fallback sets the context window to the conservative
GGUFFallbackContextSize (1024). The result: a model trained with a 262144-token
context runs with n_ctx=1024, and every prompt over ~1k tokens fails with
"request (N tokens) exceeds the available context size (1024 tokens)".

Two changes:

- Drop GGUFFallbackContextSize (1024) and fall back to DefaultContextSize
  (4096) in both the GGUF run-estimate path (gguf.go) and the deferred hook
  fallback (hooks_llamacpp.go). 1024 is a sensible floor for a tiny CPU GGUF
  but a footgun for a large, long-context model whose header simply cannot be
  parsed. Strengthen the existing "GGUF unreadable" test to assert the value.

- Set context_size explicitly on the four NVFP4 gallery entries
  (qwen3.6-35b-a3b-nvfp4-mtp, qwopus3.6-27b-v2-mtp-nvfp4,
  qwopus3.6-27b-coder-mtp-nvfp4, qwen3.6-27b-nvfp4-mtp) so the parser failure
  is irrelevant for them. 32768 matches sibling Qwen entries and is conservative
  on memory; operators can raise it toward the 262144-token training context.
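
The fallback order described in the two changes above can be sketched roughly as follows. This is a minimal illustration, not LocalAI's actual code: `resolveContextSize`, `metadataReader`, and `failingReader` are hypothetical names, and the precedence shown (explicit `context_size`, then parsed GGUF metadata, then `DefaultContextSize`) is an assumption based on the commit message.

```go
package main

import (
	"errors"
	"fmt"
)

// DefaultContextSize mirrors the 4096 fallback named in the commit message.
const DefaultContextSize = 4096

// metadataReader is a hypothetical stand-in for the GGUF parser: it returns
// the context length stored in the file header, or an error if the header
// cannot be read.
type metadataReader func(path string) (int, error)

// resolveContextSize is a hypothetical helper (assumed precedence):
// explicit config value first, then parsed metadata, then the default.
func resolveContextSize(explicit int, path string, read metadataReader) int {
	if explicit > 0 {
		return explicit
	}
	if n, err := read(path); err == nil && n > 0 {
		return n
	}
	return DefaultContextSize
}

// errUnsupported imitates the parser failure quoted in the commit message.
var errUnsupported = errors.New("read tensor info 0: This quantized type is currently unsupported")

// failingReader simulates a parser that cannot read an NVFP4 GGUF.
func failingReader(string) (int, error) { return 0, errUnsupported }

func main() {
	// Unparseable header, no explicit value: falls back to the default.
	fmt.Println(resolveContextSize(0, "model-nvfp4.gguf", failingReader)) // 4096
	// Explicit context_size pinned in the gallery entry: parser failure is irrelevant.
	fmt.Println(resolveContextSize(32768, "model-nvfp4.gguf", failingReader)) // 32768
}
```

Pinning the value in the gallery entry short-circuits the parser entirely, which is why the second change makes the first one a safety net rather than the primary fix for those four models.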


Assisted-by: Claude:claude-opus-4-8 [Claude Code]

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>

2026-06-27 23:34:52 +02:00

gen_inference_defaults

security(http): refuse redirects on outbound clients via hardened pkg/httpclient (#10087)

2026-05-30 12:04:10 +02:00

feat(realtime): conversation compaction (summarize-then-drop) + OpenAI item.delete/truncate/clear (#10446)

2026-06-22 21:28:49 +02:00

application_config_test.go

fix(settings): start watchdog on cold-enable from the React UI (#9125) (#10287)

2026-06-14 16:46:14 +02:00

application_config.go

fix: correct scheme/host on self-referential URLs behind an HTTPS reverse proxy (#10482) (#10504)

2026-06-25 08:10:59 +02:00

backend_capabilities_test.go

feat(gallery): Speed up load times and clean gallery entries (#9211)

2026-05-06 14:51:38 +02:00

backend_capabilities.go

feat(ced): sound-event classification backend (CED audio tagger) (#10425)

2026-06-22 01:00:28 +02:00

backend_hooks.go

feat(vllm): parity with llama.cpp backend (#9328)

2026-04-13 11:00:29 +02:00

chat_template_kwargs_test.go

feat: generic chat_template_kwargs (model config + per-request metadata) (#10359)

2026-06-16 12:16:34 +02:00

config_suite_test.go

dependencies(grpcio): bump to fix CI issues (#2362)

2024-05-21 14:33:47 +02:00

defaults.go

fix(config): fall back to DefaultContextSize for unparseable GGUFs; pin NVFP4 gallery context_size (#10563)

2026-06-27 23:34:52 +02:00

distributed_config_test.go

feat(distributed): enforce registration token for worker file transfer (#10183)

2026-06-05 14:34:28 +02:00

distributed_config.go

feat(distributed): declarative per-model scheduling via env/args (#10308)

2026-06-13 18:31:06 +02:00

gallery.go

feat(gallery): verify backend OCI images with keyless cosign (#9823)

2026-05-18 08:02:20 +02:00

generic_defaults_test.go

feat(config): prefix caching default + consolidate scattered defaults (#10415)

2026-06-20 22:44:44 +02:00

generic_defaults.go

feat(config): prefix caching default + consolidate scattered defaults (#10415)

2026-06-20 22:44:44 +02:00

gguf_reasoning_test.go

Respect explicit reasoning config during GGUF thinking probe (#9463)

2026-04-21 21:53:10 +02:00

gguf.go

fix(config): fall back to DefaultContextSize for unparseable GGUFs; pin NVFP4 gallery context_size (#10563)

2026-06-27 23:34:52 +02:00

hardware_defaults_internal_test.go

fix(config): per-device VRAM headroom for Blackwell defaults (#10485) (#10494)

2026-06-25 00:07:48 +02:00

hardware_defaults_test.go

fix(config): gate parallel-slot default on per-device VRAM too (#10485) (#10507)

2026-06-25 15:48:23 +02:00

hardware_defaults.go

fix(config): gate parallel-slot default on per-device VRAM too (#10485) (#10507)

2026-06-25 15:48:23 +02:00

hooks_llamacpp.go

fix(config): fall back to DefaultContextSize for unparseable GGUFs; pin NVFP4 gallery context_size (#10563)

2026-06-27 23:34:52 +02:00

hooks_test.go

fix(config): fall back to DefaultContextSize for unparseable GGUFs; pin NVFP4 gallery context_size (#10563)

2026-06-27 23:34:52 +02:00

hooks_vllm.go

feat(vllm): expose AsyncEngineArgs via generic engine_args YAML map (#9563)

2026-04-29 00:49:28 +02:00

inference_defaults_test.go

feat: inferencing default, automatic tool parsing fallback and wire min_p (#9092)

2026-03-22 00:57:15 +01:00

inference_defaults.go

feat: inferencing default, automatic tool parsing fallback and wire min_p (#9092)

2026-03-22 00:57:15 +01:00

inference_defaults.json

chore: bump inference defaults from unsloth (#10358)

2026-06-16 09:59:36 +02:00

mitm_host_owners_test.go

feat(middleware): Model routing, PII filtering, Cloud model proxies (#9802)

2026-05-25 09:28:27 +02:00

model_config_filter.go

feat: add distributed mode (#9124)

2026-03-30 00:47:27 +02:00

model_config_loader_test.go

feat(models): model aliases - redirect a model name to another configured model (#10414)

2026-06-20 22:38:42 +02:00

model_config_loader.go

feat(models): model aliases - redirect a model name to another configured model (#10414)

2026-06-20 22:38:42 +02:00

model_config_test.go

feat(models): model aliases - redirect a model name to another configured model (#10414)

2026-06-20 22:38:42 +02:00

model_config.go

fix(config): per-device VRAM headroom for Blackwell defaults (#10485) (#10494)

2026-06-25 00:07:48 +02:00

model_test.go

fix(tests): inline model_test fixtures after tests/models_fixtures removal

2026-04-28 12:58:49 +00:00

mtp_test.go

fix(mtp): don't auto-enable self-spec MTP for draft-only assistant GGUFs (#10208)

2026-06-07 22:09:02 +02:00

mtp.go

fix(mtp): don't auto-enable self-spec MTP for draft-only assistant GGUFs (#10208)

2026-06-07 22:09:02 +02:00

parser_defaults.json

feat(vllm): parity with llama.cpp backend (#9328)

2026-04-13 11:00:29 +02:00

pipeline_streaming_test.go

feat(realtime): stream the LLM / TTS / transcription pipeline stages (#10176)

2026-06-11 08:43:12 +01:00

reasoning_effort_test.go

feat: forward reasoning_effort to the backend so jinja models honor it (#10184)

2026-06-05 13:45:43 +00:00

runtime_settings_persist_test.go

fix(settings): merge partial /api/settings updates instead of overwriting (#10463)

2026-06-23 13:27:34 +02:00

runtime_settings_persist.go

fix(settings): merge partial /api/settings updates instead of overwriting (#10463)

2026-06-23 13:27:34 +02:00

runtime_settings.go

feat(watchdog): add size-aware LRU eviction mode (#9527)

2026-06-21 17:17:04 +02:00

serving_defaults_test.go

feat(config): prefix caching default + consolidate scattered defaults (#10415)

2026-06-20 22:44:44 +02:00

serving_defaults.go

feat(config): prefix caching default + consolidate scattered defaults (#10415)

2026-06-20 22:44:44 +02:00

voice_gate_test.go

feat(realtime): speaker-aware conversations - surface identity to client and LLM (#10424)

2026-06-21 21:07:10 +02:00