LocalAI/backend/cpp/llama-cpp/grpc-server.cpp at f4036fa83f748c8aa48ca055134ac11f5b5eeb37

mirror of https://github.com/mudler/LocalAI.git synced 2026-04-30 12:08:13 -04:00


Ettore Di Giacinto 21eace40ec feat(llama-cpp): expose split_mode option for multi-GPU placement (#9560)

Adds split_mode (alias sm) to the llama.cpp backend options allowlist,
accepting none|layer|row|tensor. The tensor value targets the experimental
backend-agnostic tensor parallelism from ggml-org/llama.cpp#19378 and
requires a llama.cpp build that includes that PR, FlashAttention enabled,
KV-cache quantization disabled, and a manually set context size.


Assisted-by: Claude:claude-opus-4-7

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

2026-04-25 14:02:57 +02:00
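
The allowlist behaviour described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the code from grpc-server.cpp: the `SplitMode` enum, the `TensorPrereqs` struct, every function name, and the `key:value` option format are assumptions made for the example. The real enum lives in llama.h and may use different names.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical stand-in for the llama.cpp split-mode enum (assumption).
enum class SplitMode { None, Layer, Row, Tensor };

// The commit message names two accepted keys: split_mode and its alias sm.
inline bool is_split_mode_key(const std::string &key) {
    return key == "split_mode" || key == "sm";
}

// Maps an allowlisted value to a SplitMode; any other value yields nullopt.
inline std::optional<SplitMode> parse_split_mode(const std::string &value) {
    if (value == "none")   return SplitMode::None;
    if (value == "layer")  return SplitMode::Layer;
    if (value == "row")    return SplitMode::Row;
    if (value == "tensor") return SplitMode::Tensor;
    return std::nullopt;
}

// Inverse mapping, useful for logging the chosen mode.
inline const char *split_mode_name(SplitMode mode) {
    switch (mode) {
        case SplitMode::None:   return "none";
        case SplitMode::Layer:  return "layer";
        case SplitMode::Row:    return "row";
        case SplitMode::Tensor: return "tensor";
    }
    assert(false && "unhandled SplitMode");
    return "";
}

// Parses one option string, assumed here to have the form "key:value".
// Returns nullopt if the key is not a split-mode key or the value is
// outside the allowlist.
inline std::optional<SplitMode> parse_option(const std::string &option) {
    const auto sep = option.find(':');
    if (sep == std::string::npos) return std::nullopt;
    if (!is_split_mode_key(option.substr(0, sep))) return std::nullopt;
    return parse_split_mode(option.substr(sep + 1));
}

// Hypothetical view of the settings the commit message lists as
// prerequisites for the experimental tensor mode.
struct TensorPrereqs {
    bool flash_attention;     // FlashAttention must be enabled
    bool kv_cache_quantized;  // KV-cache quantization must be disabled
    int  context_size;        // assumed: 0 means "not set manually"
};

// Checks only the runtime settings; whether the llama.cpp build includes
// ggml-org/llama.cpp#19378 cannot be verified here.
inline bool tensor_prereqs_met(const TensorPrereqs &p) {
    return p.flash_attention && !p.kv_cache_quantized && p.context_size > 0;
}
```

A caller might run `parse_option` over each backend option and, when the result is `SplitMode::Tensor`, fall back or report an error if `tensor_prereqs_met` returns false.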

163 KiB
