LocalAI

mirror of https://github.com/mudler/LocalAI.git synced 2026-06-29 19:06:43 -04:00

Files

Ettore Di Giacinto be65438eac docs(paged): record MoE-prefill engine-gap decomposition + GEMM-port negatives (default-off)

nsys cross-engine decomposition: the MoE prefill 64% gap vs vLLM is engine plumbing, not the kernel (GPU 97% busy, 443 vs 197 us/tok). Three buckets: per-expert W4A4 M-fragmentation (58%), GDN scan (24%), f32<->bf16 casts (15%). Offline-repack (0045) and verbatim vLLM-marlin port both trail FP4-MMQ via wrapper overhead, kept default-off as recorded negatives.

Assisted-by: Claude:opus-4.8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

2026-06-29 17:20:07 +00:00

ds4

chore: ⬆️ Update antirez/ds4 to 80ebbc396aee40eedc1d829222f3362d10fa4c6c (#10378)

2026-06-18 00:32:13 +02:00

grpc

fix: speedup git submodule update with --single-branch (#2847)

2024-07-13 22:32:25 +02:00

ik-llama-cpp

fix(ik-llama): port multimodal path to mtmd API and bump to f96eaddb (#10534) (#10568)

2026-06-28 08:57:11 +02:00

llama-cpp

paged: drop bf16-tau (patch 0026), subsumed by decode fusions (tau=100000 flat, zero speed benefit)