LocalAI

mirror of https://github.com/mudler/LocalAI.git synced 2026-07-22 14:14:55 -04:00

Files

Ettore Di Giacinto 595e448714 docs(llama.cpp): note tensor split now works with quantized KV cache (#10135)

The split_mode: tensor description claimed tensor parallelism requires
KV-cache quantization to be disabled. ggml-org/llama.cpp#23792 lifts that
restriction by extending the meta backend to preserve shape information
through KV-cache flatten/reshape, so cache_type_k/cache_type_v
quantization can be combined with -sm tensor on builds that include it.

Documentation only: no backend code, grpc-server.cpp comment, or
llama.cpp pin changes.


Assisted-by: Claude Code:claude-opus-4-8

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

2026-06-02 15:52:23 +02:00

advanced

docs: fix documentation typos (#10125)

2026-06-01 14:31:08 +02:00

features

docs(llama.cpp): note tensor split now works with quantized KV cache (#10135)

2026-06-02 15:52:23 +02:00

getting-started

feat(rocm): bump to 7.x (#9323)

2026-04-12 08:51:30 +02:00

installation

feat(ui): Interactive model config editor with autocomplete (#9149)