From 1be959ce30e68ed686a630932dba2754a6d5fed9 Mon Sep 17 00:00:00 2001
From: "LocalAI [bot]" <139863280+localai-bot@users.noreply.github.com>
Date: Sat, 20 Jun 2026 11:04:56 +0200
Subject: [PATCH] docs: mention apex-quant in the README (#10412)

Add apex-quant (MoE per-tensor/per-layer quantization recipe) to the
"Backends built by us" section as a note after the engines table, since
it is a quantization recipe rather than a native inference engine.

Signed-off-by: Ettore Di Giacinto
Co-authored-by: Ettore Di Giacinto
---
 README.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/README.md b/README.md
index b05af2dfb..5fff7db69 100644
--- a/README.md
+++ b/README.md
@@ -240,6 +240,8 @@ Most backends wrap a best-in-class upstream engine. A handful of them are native
 | [LocalVQE](https://github.com/localai-org/LocalVQE) | Joint acoustic echo cancellation, noise suppression, and dereverberation |
 | [local-store](https://github.com/mudler/LocalAI) | Local-first vector database for embeddings (shipped in-tree) |
 
+We also maintain [apex-quant](https://github.com/localai-org/apex-quant), a per-tensor, per-layer quantization recipe for Mixture-of-Experts models that exploits their structural sparsity to produce GGUFs matching or beating Q8_0 quality - and they run out of the box on stock llama.cpp.
+
 ## Resources
 
 - [Documentation](https://localai.io/)