Ettore Di Giacinto
8134d6db37
docs(dllm): record Q4_K_M validation and quantization guidance
Q4_K_M validated on GB10: quality holds (cosine 0.9862, coherent
generation, 19/48 stopper exit) but a forward step is ~5x slower than
BF16 (27.5s vs 5.6s: native BF16 tensor cores vs K-quant MoE dequant).
Guidance: prefer BF16 when it fits; Q4_K_M is the memory-bound option.
Assisted-by: Claude Code (Fable 5)
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-11 19:22:02 +00:00
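
The two figures in the commit message (a cosine score and a per-step slowdown) can be reproduced with very little code. The sketch below is illustrative only: the commit does not say how the cosine of 0.9862 was computed or which tensors were compared, so `cosine_similarity` and `slowdown` are hypothetical helpers, and only the 27.5s and 5.6s timings come from the message itself.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def slowdown(quantized_seconds, reference_seconds):
    """How many times slower the quantized forward step is than the reference."""
    return quantized_seconds / reference_seconds


# Timings quoted in the commit message: Q4_K_M 27.5s vs BF16 5.6s per forward step.
ratio = slowdown(27.5, 5.6)
print(f"Q4_K_M is about {ratio:.1f}x slower per forward step than BF16")
```

In a real comparison the two vectors would be the outputs of the BF16 and Q4_K_M models on the same input; which outputs to compare is a choice the commit message leaves open.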