LocalAI

mirror of https://github.com/mudler/LocalAI.git synced 2026-06-27 01:47:18 -04:00

Files

Ettore Di Giacinto 4d3fecd524 docs(paged): MoE decode re-graph lever (patch 0025) + speedup-hunt B findings

Mirror of llama.cpp dev-tree patch 0025 (qwen35moe NVFP4 MoE-decode re-graph) and the GPU-agent B
findings in SPEEDUP_HUNT.md: re-confirmed MoE decode decomposition @npl128, the measured re-graph
lever (+4.4%/+2.9%/+1.9% decode_agg at npl 32/64/128; bit-exact: test-backend-ops MUL_MAT_ID 806/806
+ parallel-greedy np16 byte-identical ON==OFF), grouped-GEMM occupancy headroom (exhausted on this
bandwidth-bound model), and the W4A16 assessment (rejected: non-bit-exact, slower BF16 MMA).

Assisted-by: Claude:opus-4.8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

2026-06-26 14:53:14 +00:00

ds4

chore: ⬆️ Update antirez/ds4 to 80ebbc396aee40eedc1d829222f3362d10fa4c6c (#10378)

2026-06-18 00:32:13 +02:00

grpc

fix: speedup git submodule update with --single-branch (#2847)

2024-07-13 22:32:25 +02:00

ik-llama-cpp

fix(backends): quote $CURDIR in run.sh (fixes backends in paths with spaces) (#10519)

2026-06-26 01:02:48 +02:00

llama-cpp

docs(paged): MoE decode re-graph lever (patch 0025) + speedup-hunt B findings