LocalAI

mirror of https://github.com/mudler/LocalAI.git synced 2026-06-23 16:19:07 -04:00

Files

Ettore Di Giacinto ba6bd94976 feat(paged): assert mask-pad invariant for the paged tile route (patch 0012)

Patch 0012 of the paged-attention series. Adds a defensive GGML_ASSERT in
src/paged-attn.cpp so the now-default paged decode route (GQA-grouped
fattn-tile kernel) cannot silently start leaking past-end KV rows.

The route stays correct only because the compacted mask/block-table length
n_view = GGML_PAD(n_gather, 256) is a whole number of flash-attn KV tiles
(nbatch_fa = 64 for head_dim 128 divides 256), so the last tile sits entirely
inside the -inf pad window. The assert (n_view % 64 == 0) pins that implicit
invariant: a future pad < 256 or tile > 256 that broke it now aborts instead
of leaking. Additive only, no behaviour change.

Verified on the DGX dev tree: build-cpu compiles and the paged CPU byte gate
(LLAMA_KV_PAGED off vs on, Qwen3-0.6B-Q8_0, greedy) stays byte-identical with
the assert silent.

Assisted-by: Claude:opus-4.8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

2026-06-23 09:13:08 +00:00

ds4

chore: ⬆️ Update antirez/ds4 to 80ebbc396aee40eedc1d829222f3362d10fa4c6c (#10378 )

2026-06-18 00:32:13 +02:00

grpc

fix: speedup git submodule update with --single-branch (#2847 )

2024-07-13 22:32:25 +02:00

ik-llama-cpp

chore: ⬆️ Update ikawrakow/ik_llama.cpp to b3dfb7858cfcb9166e92f366e5af87f19ebc94be (#10395 )

2026-06-19 00:03:37 +02:00

llama-cpp

feat(paged): assert mask-pad invariant for the paged tile route (patch 0012)