mirror of
https://github.com/mudler/LocalAI.git
synced 2026-06-20 22:59:09 -04:00
docs: mention apex-quant in the README (#10412)
Add apex-quant (MoE per-tensor/per-layer quantization recipe) to the "Backends built by us" section as a note after the engines table, since it is a quantization recipe rather than a native inference engine.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
@@ -240,6 +240,8 @@ Most backends wrap a best-in-class upstream engine. A handful of them are native
| [LocalVQE](https://github.com/localai-org/LocalVQE) | Joint acoustic echo cancellation, noise suppression, and dereverberation |
| [local-store](https://github.com/mudler/LocalAI) | Local-first vector database for embeddings (shipped in-tree) |
We also maintain [apex-quant](https://github.com/localai-org/apex-quant), a per-tensor, per-layer quantization recipe for Mixture-of-Experts models that exploits their structural sparsity to produce GGUFs matching or beating Q8_0 quality - and they run out of the box on stock llama.cpp.
## Resources
- [Documentation](https://localai.io/)