mirror of
https://github.com/mudler/LocalAI.git
synced 2026-06-30 03:17:01 -04:00
FIX A (patch 0031 compose break): the chunked GDN prefill patch carried
'#include <cuda_bf16.h>' and '#include <type_traits>' as CONTEXT lines, but
those were introduced by the dropped bf16-tau patch 0026, so on the
bf16-tau-free 0001-0030 base only '#include <cstdlib>' is present and 'git
apply' failed. The same 0026 drop also shifted 0031's later hunks off their
context (the ', hyb' kernel-launch arg, the 'STATE_BF16, HYBRID' template
params, and the GDN_LAUNCH_ARGS list). Regenerated 0031 against a fresh
pin(0ed235ea) + 0001-0030 tree: the chunked kernel now SELF-PROVIDES the
cuda_bf16.h / type_traits includes (adds them, plus the climits it needs for
INT_MAX) and the dispatch guard is the 2-param 'if constexpr (!KDA &&
!keep_rs_t)' form. Behaviour is unchanged: 0031 stays opt-in, default OFF
(GDN_CHUNK_MIN), a recorded negative. The full 0001-0042 series now applies
clean on 0ed235ea ('git apply --check' green for every patch).
FIX B (patch 0041 S3 default): the decode-shape-stable scheduler defaulted OFF.
Make it default ON whenever paged KV is active (LLAMA_KV_PAGED set), still
overridable to off via LLAMA_PAGED_DECODE_STABLE=0. Minimal host-side change in
update_slots(); re-exported from the dev tree, README 0041 row updated to match.
Assisted-by: Claude:opus-4.8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>