LocalAI

mirror of https://github.com/mudler/LocalAI.git synced 2026-06-23 16:19:07 -04:00

Files

Ettore Di Giacinto cb28deda6b bench(paged): decode profile overturns 'engine-addressable' - decode is 54.6% MoE GEMM too

Decode-dominated B=64 nsys: mul_mat_q<MXFP4> 54.6%, attention only 19.8%. Both
phases are FP4-MoE-kernel-bound (Lever 3). The paged series cannot close the vLLM
gap in either phase; its real value is capacity + prefix-sharing, not tok/s parity.

Assisted-by: Claude:opus-4.8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

2026-06-19 23:27:35 +00:00

paged

docs(paged): upstream survey - no FP4 MoE GEMM to patch in; phase 3 is from-scratch

2026-06-19 22:44:11 +00:00

patches

bench(paged): decode profile overturns 'engine-addressable' - decode is 54.6% MoE GEMM too

2026-06-19 23:27:35 +00:00

CMakeLists.txt

fix(turboquant): resolve common.h by detecting llama-common vs common target (#9413)

2026-04-18 20:30:28 +02:00

grpc-server.cpp

feat: generic chat_template_kwargs (model config + per-request metadata) (#10359)