LocalAI/docs/content
Ettore Di Giacinto 595e448714 docs(llama.cpp): note tensor split now works with quantized KV cache (#10135)
The split_mode: tensor description claimed that tensor parallelism
requires KV-cache quantization to be disabled. ggml-org/llama.cpp#23792
lifts that restriction by extending the meta backend to preserve shape
information through KV-cache flatten/reshape, so cache_type_k/cache_type_v
quantization can be combined with -sm tensor on llama.cpp builds that
include that change.

Documentation only: no changes to backend code, the grpc-server.cpp
comment, or the llama.cpp pin.


Assisted-by: Claude Code:claude-opus-4-8

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-02 15:52:23 +02:00