LocalAI

mirror of https://github.com/mudler/LocalAI.git synced 2026-06-11 18:27:32 -04:00

Files

Ettore Di Giacinto 8134d6db37 docs(dllm): record Q4_K_M validation and quantization guidance

Q4_K_M validated on GB10: quality holds (cosine 0.9862, coherent
generation, 19/48 stopper exit) but a forward step is ~5x slower than
BF16 (27.5s vs 5.6s: native BF16 tensor cores vs K-quant MoE dequant).
Guidance: prefer BF16 when it fits; Q4_K_M is the memory-bound option.

Assisted-by: Claude Code (Fable 5)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

2026-06-11 19:22:02 +00:00

advanced

feat: forward reasoning_effort to the backend so jinja models honor it (#10184)

2026-06-05 13:45:43 +00:00

features

docs(dllm): record Q4_K_M validation and quantization guidance

2026-06-11 19:22:02 +00:00

getting-started

feat(cli): add interactive chat mode (#10226)

2026-06-09 14:58:44 +00:00

installation

feat(ui): Interactive model config editor with autocomplete (#9149)