docs: mention apex-quant in the README (#10412)

Add apex-quant (MoE per-tensor/per-layer quantization recipe) to the "Backends built by us" section as a note after the engines table, since it is a quantization recipe rather than a native inference engine.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-06-20 22:59:09 -04:00 · 2026-06-20 11:04:56 +02:00
parent 518381278e
commit 1be959ce30
1 changed files with 2 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -240,6 +240,8 @@ Most backends wrap a best-in-class upstream engine. A handful of them are native
 | [LocalVQE](https://github.com/localai-org/LocalVQE) | Joint acoustic echo cancellation, noise suppression, and dereverberation |
 | [local-store](https://github.com/mudler/LocalAI) | Local-first vector database for embeddings (shipped in-tree) |

+We also maintain [apex-quant](https://github.com/localai-org/apex-quant), a per-tensor, per-layer quantization recipe for Mixture-of-Experts models. It exploits their structural sparsity to produce GGUFs that match or beat Q8_0 quality, and those GGUFs run out of the box on stock llama.cpp.
+
 ## Resources

 - [Documentation](https://localai.io/)