LocalAI

mirror of https://github.com/mudler/LocalAI.git synced 2026-06-23 16:19:07 -04:00

Files

Ettore Di Giacinto 84d59e659b docs(paged): additive "hook, don't edit" layout for the patch series

Maintainers rejected PR #22569 (the upstream paged draft) as "slop" - it rewrites
core attention and is unvendorable. Our own series must be additive so it survives
llama.cpp pin bumps. This documents the rule and the per-patch core-touch budget:
every change is either new code in a new vendored src/ file, or a single env-gated
hook at one call site that delegates to it - no logic in core files, no core struct
edits.
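
As a minimal sketch of the "single env-gated hook" rule above (not the actual patch): the core function gains one line that delegates to a new file when an environment variable is set, and is otherwise unchanged. The variable name LLAMA_PAGED_ATTN, the paged namespace, and both function names are hypothetical and chosen only for illustration.

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical stand-in for code that would live in a new vendored src/ file.
namespace paged {
    // True only when the (assumed) LLAMA_PAGED_ATTN variable is set to "1".
    inline bool enabled() {
        const char * v = std::getenv("LLAMA_PAGED_ATTN");
        return v != nullptr && std::strcmp(v, "1") == 0;
    }

    // Placeholder for the new behavior; all added logic stays on this side.
    inline int build_attn(int x) { return x * 2; }
}

// Stand-in for a core function. The only core-side change is the hook line.
int core_build_attn(int x) {
    if (paged::enabled()) return paged::build_attn(x); // the single gated hook
    return x + 1;                                      // original path, untouched
}
```

With the variable unset the core path runs exactly as before, which is what keeps the diff against a new llama.cpp pin down to one line per call site.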

Grounds it in the pinned source: llm_graph_input_i is pure-virtual and
res->add_input() lets a new file register a graph input, so paged behavior plugs in
without editing core graph types. Redesigns 0003 (gather-read) from the old 4-file
surgery to one build_attn hook + a new paged-attn.{h,cpp} (a gather-input subclass)
+ two thin cache accessors (~8 core lines vs a core-struct rewrite). 0005 lands
entirely in LocalAI's grpc-server.cpp (no core patch).
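
The registration pattern described above can be sketched with simplified mock types. This is an illustration under stated assumptions, not the pinned llama.cpp API: graph_input_i, graph_result, and gather_input below are hypothetical stand-ins, and the block-table index arithmetic is an assumed form of a gather-read.

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Mock of a pure-virtual graph-input interface (not the real llama.cpp type).
struct graph_input_i {
    virtual ~graph_input_i() = default;
    virtual void set_input() = 0;
};

// Mock of a graph result that owns whatever inputs are registered with it.
struct graph_result {
    std::vector<std::unique_ptr<graph_input_i>> inputs;

    graph_input_i * add_input(std::unique_ptr<graph_input_i> in) {
        inputs.push_back(std::move(in));
        return inputs.back().get();
    }

    void set_inputs() {
        for (auto & in : inputs) in->set_input();
    }
};

// Defined in the new file: maps logical token positions to physical cache
// slots through a block table. Assumes n_tokens <= block_table.size() * block_size.
struct gather_input : graph_input_i {
    std::vector<int> block_table;
    int block_size;
    int n_tokens;
    std::vector<int> idx; // filled by set_input()

    gather_input(std::vector<int> bt, int bs, int n)
        : block_table(std::move(bt)), block_size(bs), n_tokens(n) {}

    void set_input() override {
        idx.resize(static_cast<std::size_t>(n_tokens));
        for (int pos = 0; pos < n_tokens; ++pos) {
            idx[pos] = block_table[pos / block_size] * block_size + pos % block_size;
        }
    }
};
```

Because the subclass is registered through the existing interface, the mock graph_result never needs to know that gather_input exists, which mirrors the "no core struct edits" constraint.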

A dev tree at the pin with 0001+0002 applied is set up; implementing 0003 is the
next focused block of work, gated on token-identical output (Gate-0).

Assisted-by: Claude:opus-4.8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

2026-06-22 07:28:44 +00:00

paged

feat(paged): target-readiness for 2xH200 - correctness PASS, load-gen harness, projection

2026-06-21 23:16:28 +00:00

patches

docs(paged): additive "hook, don't edit" layout for the patch series

2026-06-22 07:28:44 +00:00

CMakeLists.txt

fix(turboquant): resolve common.h by detecting llama-common vs common target (#9413)

2026-04-18 20:30:28 +02:00

grpc-server.cpp

feat: generic chat_template_kwargs (model config + per-request metadata) (#10359)