LocalAI/docs/content/features at 595e4487148bc637e45ef3b19289441ea352c156 - LocalAI

mirror/LocalAI

mirror of https://github.com/mudler/LocalAI.git synced 2026-07-22 14:14:55 -04:00

Ettore Di Giacinto 595e448714 docs(llama.cpp): note tensor split now works with quantized KV cache (#10135)

The split_mode: tensor description claimed tensor parallelism requires
KV-cache quantization to be disabled. ggml-org/llama.cpp#23792 lifts that
restriction by extending the meta backend to preserve shape information
through KV-cache flatten/reshape, so cache_type_k/cache_type_v
quantization can be combined with -sm tensor on builds that include it.

Documentation only: no backend code, grpc-server.cpp comment, or
llama.cpp pin changes.


Assisted-by: Claude Code:claude-opus-4-8

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

2026-06-02 15:52:23 +02:00

_index.en.md

fix(docs): fix broken references to distributed mode

2026-04-03 09:46:06 +02:00

agents.md

fix(docs): fix broken references to distributed mode

2026-04-03 09:46:06 +02:00

api-discovery.md

feat(api): Allow coding agents to interactively discover how to control and configure LocalAI (#9084)

2026-04-04 15:14:35 +02:00

audio-diarization.md

feat(api): add /v1/audio/diarization endpoint with sherpa-onnx + vibevoice.cpp (#9654)

2026-05-05 15:10:13 +02:00

audio-to-text.md

feat(parakeet-cpp): dynamic batching for concurrent transcription requests (#10112)

2026-06-02 14:49:02 +02:00

audio-transform.md

feat(localvqe/audio): v1.3 release and add spectrograms to audio transform UI (#10113)

2026-05-31 23:56:46 +02:00

authentication.md

feat(usage): track and visualise usage per API key (#9920)

2026-05-21 16:34:02 +02:00

backend-monitor.md

fix(backend-monitor): accept model as a query parameter (#9411)

2026-04-21 22:06:35 +02:00

backends.md

fix(docs): fix broken references to distributed mode

2026-04-03 09:46:06 +02:00

cloud-proxy.md

feat(middleware): Model routing, PII filtering, Cloud model proxies (#9802)

2026-05-25 09:28:27 +02:00

constrained_grammars.md

fix(docs): fix broken references to distributed mode

2026-04-03 09:46:06 +02:00

distributed_inferencing.md

fix(docs): fix broken references to distributed mode

2026-04-03 09:46:06 +02:00

distributed-mode.md

feat(ds4): layer-split distributed inference (#10098)

2026-05-31 00:09:55 +02:00

distribution.md

fix(docs): commit distribution.md

2026-04-03 10:14:13 +02:00

embeddings.md

feat(face-recognition): add insightface/onnx backend for 1:1 verify, 1:N identify, embedding, detection, analysis (#9480)

2026-04-22 21:55:41 +02:00

face-recognition.md

fix(docs): replace Docsy alert shortcode with Relearn notice

2026-04-25 21:04:31 +00:00

fine-tuning.md

fix(docs): Use notice instead of alert (#9134)

2026-03-25 13:55:48 +01:00

gpt-vision.md

fix(docs): fix broken references to distributed mode

2026-04-03 09:46:06 +02:00

GPU-acceleration.md

feat(rocm): bump to 7.x (#9323)

2026-04-12 08:51:30 +02:00

image-generation.md

docs: fix documentation typos (#10125)

2026-06-01 14:31:08 +02:00

localai-assistant.md

feat: localai assistant chat modality (#9602)

2026-04-28 19:29:27 +02:00

mcp.md

fix(docs): fix broken references to distributed mode

2026-04-03 09:46:06 +02:00

middleware.md

feat(middleware): Model routing, PII filtering, Cloud model proxies (#9802)

2026-05-25 09:28:27 +02:00

mitm-proxy.md

feat(middleware): Model routing, PII filtering, Cloud model proxies (#9802)

2026-05-25 09:28:27 +02:00

mlx-distributed.md

feat(mlx-distributed): add new MLX-distributed backend (#8801)

2026-03-09 17:29:32 +01:00

model-gallery.md

fix(docs): fix broken references to distributed mode

2026-04-03 09:46:06 +02:00

object-detection.md

feat(backend): rfdetr-cpp native object detection + segmentation backend (#10028)

2026-05-27 18:43:57 +02:00

openai-functions.md

docs: document tool calling on vLLM and MLX backends

2026-04-13 16:58:55 +00:00

openai-realtime.md

Remove header from OpenAI Realtime API documentation

2026-04-09 09:00:28 +02:00

p2p.md

feat: Add documentation for undocumented API endpoints (#8852)

2026-03-08 17:59:33 +01:00

quantization.md

fix(docs): Use notice instead of alert (#9134)

2026-03-25 13:55:48 +01:00

reranker.md

fix(docs): fix broken references to distributed mode

2026-04-03 09:46:06 +02:00

runtime-settings.md

fix(docs): fix broken references to distributed mode

2026-04-03 09:46:06 +02:00

sound-generation.md

feat: Add documentation for undocumented API endpoints (#8852)

2026-03-08 17:59:33 +01:00

stores.md

fix(docs): replace Docsy alert shortcode with Relearn notice

2026-04-25 21:04:31 +00:00

text-generation.md

docs(llama.cpp): note tensor split now works with quantized KV cache (#10135)

2026-06-02 15:52:23 +02:00

text-to-audio.md

fix(docs): fix broken references to distributed mode

2026-04-03 09:46:06 +02:00

video-generation.md

feat: Add documentation for undocumented API endpoints (#8852)

2026-03-08 17:59:33 +01:00

voice-activity-detection.md

feat: Add documentation for undocumented API endpoints (#8852)

2026-03-08 17:59:33 +01:00

voice-recognition.md

fix(docs): replace Docsy alert shortcode with Relearn notice

2026-04-25 21:04:31 +00:00