LocalAI

mirror of https://github.com/mudler/LocalAI.git synced 2026-06-25 00:59:28 -04:00

Files

Ettore Di Giacinto 811f0db2e3 feat(llama-cpp): add main-model cpu_moe/n_cpu_moe options

Mirror the existing draft_cpu_moe/draft_n_cpu_moe siblings for the main
model, matching upstream --cpu-moe / --n-cpu-moe (common/arg.cpp). Lets
users keep MoE expert weights on CPU to manage VRAM on large MoE models.

Closes part of #10483

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

2026-06-24 17:07:14 +00:00
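To make the intent of the commit above concrete, here is a minimal sketch of how the two options could behave. It is not the code in grpc-server.cpp. It assumes the options arrive as `key:value` strings, and that the semantics follow the upstream flags: `--cpu-moe` keeps the expert weights of every layer on CPU, and `--n-cpu-moe N` does so for the first N layers. The names `MoeOffload`, `parse_moe_options`, and `cpu_expert_layers` are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class MoeOffload:
    """Hypothetical container for the two main-model MoE options."""
    cpu_moe: bool = False   # keep expert weights of all layers on CPU
    n_cpu_moe: int = 0      # keep expert weights of the first N layers on CPU


def parse_moe_options(options: Iterable[str]) -> MoeOffload:
    """Parse 'key' or 'key:value' strings (assumed format) into MoeOffload."""
    cfg = MoeOffload()
    for opt in options:
        key, _, value = opt.partition(":")
        key, value = key.strip(), value.strip()
        if key == "cpu_moe":
            # A bare 'cpu_moe' is treated as enabled in this sketch.
            cfg.cpu_moe = value.lower() in ("", "true", "1", "yes", "on")
        elif key == "n_cpu_moe":
            # Negative counts are clamped to zero.
            cfg.n_cpu_moe = max(0, int(value))
        # Unrelated options are ignored here.
    return cfg


def cpu_expert_layers(cfg: MoeOffload, n_layers: int) -> List[int]:
    """Return the layer indices whose expert weights would stay on CPU."""
    if cfg.cpu_moe:
        return list(range(n_layers))
    return list(range(min(cfg.n_cpu_moe, n_layers)))


if __name__ == "__main__":
    cfg = parse_moe_options(["n_cpu_moe:2", "context_size:4096"])
    print(cpu_expert_layers(cfg, n_layers=6))
```

In this sketch `cpu_moe` takes precedence over `n_cpu_moe` when both are set, since keeping every layer's experts on CPU already covers the first N layers. Raising `n_cpu_moe` trades VRAM for slower expert evaluation on the layers it covers.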

CMakeLists.txt

fix(turboquant): resolve common.h by detecting llama-common vs common target (#9413)

2026-04-18 20:30:28 +02:00

grpc-server.cpp

feat(llama-cpp): add main-model cpu_moe/n_cpu_moe options

2026-06-24 17:07:14 +00:00

Makefile

chore: ⬆️ Update ggml-org/llama.cpp to be4a6a63eb2b848e19c277bdcf2bd399e8af76d9 (#10467)

2026-06-24 09:40:54 +02:00

package.sh

fix(llama.cpp): bundle libdl, librt, libpthread in llama-cpp backend (#9099)