LocalAI/backend/cpp/llama-cpp
Ettore Di Giacinto 811f0db2e3 feat(llama-cpp): add main-model cpu_moe/n_cpu_moe options
Mirror the existing draft_cpu_moe/draft_n_cpu_moe siblings for the main
model, matching upstream --cpu-moe / --n-cpu-moe (common/arg.cpp). Lets
users keep MoE expert weights on CPU to manage VRAM on large MoE models.

Closes part of #10483

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-24 17:07:14 +00:00
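The commit message does not show how the two new options are consumed. The sketch below is a minimal illustration built on two assumptions. First, the backend receives the options as plain `name` or `name:value` strings. Second, it translates them the way upstream llama.cpp handles `--cpu-moe` and `--n-cpu-moe`: tensors whose names match the expert-weight pattern `\.ffn_(up|down|gate)_exps` are pinned to the CPU buffer type, either in every layer or only in layers `blk.0` through `blk.(N-1)`. `MoeOffload`, `parse_moe_options` and `cpu_override_patterns` are hypothetical names chosen for illustration. They are not LocalAI's actual code.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical settings struct; field names mirror the option names in the commit.
struct MoeOffload {
    bool cpu_moe = false;  // keep MoE expert weights of all layers on CPU
    int n_cpu_moe = 0;     // keep expert weights of the first N layers on CPU
};

// Parse hypothetical "name" or "name:value" option strings.
// A bare "cpu_moe" is treated as enabling the flag.
MoeOffload parse_moe_options(const std::vector<std::string>& opts) {
    MoeOffload cfg;
    for (const auto& opt : opts) {
        const auto sep = opt.find(':');
        const std::string name = opt.substr(0, sep);
        const std::string value =
            sep == std::string::npos ? "" : opt.substr(sep + 1);
        if (name == "cpu_moe") {
            cfg.cpu_moe = value.empty() || value == "true" || value == "1";
        } else if (name == "n_cpu_moe" && !value.empty()) {
            cfg.n_cpu_moe = std::stoi(value);  // throws on non-numeric input
        }
    }
    return cfg;
}

// Tensor-name regexes whose weights would be placed in CPU memory,
// modelled on upstream llama.cpp's expert-tensor pattern.
std::vector<std::string> cpu_override_patterns(const MoeOffload& cfg) {
    const std::string exps = "\\.ffn_(up|down|gate)_exps";
    std::vector<std::string> patterns;
    if (cfg.cpu_moe) {
        // One layer-agnostic pattern already covers every layer,
        // so per-layer patterns would be redundant.
        patterns.push_back(exps);
        return patterns;
    }
    for (int il = 0; il < cfg.n_cpu_moe; ++il) {
        patterns.push_back("blk\\." + std::to_string(il) + exps);
    }
    return patterns;
}
```

With `n_cpu_moe:3`, only the expert tensors of layers 0 to 2 stay on CPU, and the attention and dense weights can still be offloaded to the GPU. This partial split is what makes the per-layer variant useful for fitting a large MoE model into a fixed VRAM budget.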