LocalAI

mirror of https://github.com/mudler/LocalAI.git synced 2026-06-26 17:37:07 -04:00

Files

Ettore Di Giacinto 167768cac3 feat(backend): llama-cpp-localai-paged variant + NVFP4 Qwen3.6 gallery

New backend = stock llama-cpp grpc-server + the paged patchset (forces LLAMA_PAGED=on),
shipped as its own meta-backend (mirrors turboquant, simpler: no fork pin, no
grpc-server patching - the paged runtime hooks already exist in grpc-server.cpp).
Stock llama-cpp untouched (LLAMA_PAGED?=on retained; the de-risk flip deferred for
sign-off). Gallery: qwen3.6-27b-nvfp4 (dense) + qwen3.6-35b-a3b-nvfp4 (MoE) with the
benchmark run config (paged_kv, max_batch_tokens, parallel, flash_attention, f16),
mudler/ GGUF uris (sha256 TODO until publish). Importer dropdown entry + tests.

Assisted-by: Claude:opus-4.8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

2026-06-26 12:58:56 +00:00

build

feat: single-build ggml CPU_ALL_VARIANTS for llama-cpp + turboquant (x86/arm64/apple) (#10497)

2026-06-25 15:47:03 +02:00

changed-backends.js

feat(backend): llama-cpp-localai-paged variant + NVFP4 Qwen3.6 gallery

2026-06-26 12:58:56 +00:00

coverage-check.sh

test: add Go + React UI coverage gates and fill test gaps (#9989)

2026-05-26 22:06:10 +02:00

ensure-playwright-browser.sh

test: add Go + React UI coverage gates and fill test gaps (#9989)