The split_mode: tensor description claimed that tensor parallelism
requires KV-cache quantization to be disabled. ggml-org/llama.cpp#23792
lifts that restriction by extending the meta backend to preserve shape
information through KV-cache flatten/reshape, so cache_type_k and
cache_type_v quantization can be combined with -sm tensor on builds
that include that change.
Documentation only: no changes to backend code, the grpc-server.cpp
comment, or the llama.cpp pin.
Assisted-by: Claude Code:claude-opus-4-8
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>