LocalAI

mirror of https://github.com/mudler/LocalAI.git synced 2026-06-23 16:19:07 -04:00

Files

Ettore Di Giacinto d6c91b7d62 analysis: finalize PR #22569 paged-KV eval (full detail + compute-bound note)

Agent-finalized eval: builds (1-line Qwen3 reshape fix), but on GB10+32B paged is
~12% slower than contiguous and both cap at LLAMA_MAX_SEQ=256 (not OOM; 16GiB/119).
Agent argues 32B is compute-bound + plateaus by npl=128 so raising the cap won't
help - but 540 t/s << ~1900 bandwidth ceiling, so the plateau cause is unconfirmed
(attention-over-KV or CPU sampling, not matmul saturation). Next: raise the cap +
remeasure to settle it. Verdict: do not adopt #22569; paged KV not a GB10 lever.

Assisted-by: Claude:opus-4.8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

2026-06-21 14:35:02 +00:00

ds4

chore: ⬆️ Update antirez/ds4 to 80ebbc396aee40eedc1d829222f3362d10fa4c6c (#10378 )

2026-06-18 00:32:13 +02:00

grpc

fix: speedup git submodule update with --single-branch (#2847 )

2024-07-13 22:32:25 +02:00

ik-llama-cpp

chore: ⬆️ Update ikawrakow/ik_llama.cpp to b3dfb7858cfcb9166e92f366e5af87f19ebc94be (#10395 )

2026-06-19 00:03:37 +02:00

llama-cpp

analysis: finalize PR #22569 paged-KV eval (full detail + compute-bound note)