Ettore Di Giacinto 8134d6db37 docs(dllm): record Q4_K_M validation and quantization guidance
Q4_K_M validated on GB10: quality holds (cosine 0.9862, coherent
generation, 19/48 stopper exit), but a forward step is ~5x slower than
BF16 (27.5s vs 5.6s): BF16 runs on native tensor cores, while Q4_K_M
must dequantize the K-quant MoE weights.
Guidance: prefer BF16 when it fits in memory; use Q4_K_M when memory-bound.

Assisted-by: Claude Code (Fable 5)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-11 19:22:02 +00:00