LocalAI [bot] 6e1dbae256 feat(llama-cpp): expose 12 missing common_params via options[] (#9814)
The llama.cpp backend already accepts a free-form options: array in the
model config that maps to common_params fields, but a coverage audit
against upstream pin 7f3f843c flagged 12 user-visible knobs that were
neither set via the typed proto fields nor reachable via options:.

Wire them up under the existing if/else chain in params_parse, before
the speculative section. Each new option follows the file's prevailing
patterns (try/catch around numeric parses, the same true/1/yes/on bool
form used elsewhere, hardware_concurrency() fallback for thread counts,
mirror of draft_override_tensor for override_tensor).

Top-level / batching / IO:
  - n_ubatch (alias ubatch) -- physical batch size; was previously
    force-aliased to n_batch at line 482, blocking embedding/rerank
    workloads that need independent control
  - threads_batch (alias n_threads_batch) -- main-model batch threads;
    mirrors the existing draft_threads_batch
  - direct_io (alias use_direct_io) -- O_DIRECT model loads
  - verbosity -- llama.cpp log threshold (line 479 had this commented
    out)
  - override_tensor (alias tensor_buft_overrides) -- per-tensor buffer
    overrides for the main model; mirrors draft_override_tensor

Embedding / multimodal:
  - pooling_type (alias pooling) -- mean/cls/last/rank/none; previously
    only auto-flipped to RANK for rerankers
  - embd_normalize (alias embedding_normalize) -- and the embedding
    handler now reads params_base.embd_normalize instead of a hardcoded
    2 at the previous embd_normalize literal in Embedding()
  - mmproj_use_gpu (alias mmproj_offload) -- mmproj on CPU vs GPU
  - image_min_tokens / image_max_tokens -- per-image vision token budget

Reasoning surface (the audit-focus three; LocalAI's existing
ReasoningConfig.DisableReasoning only feeds the per-request
chat_template_kwargs.enable_thinking and does not touch any of these):
  - reasoning_format -- none/auto/deepseek/deepseek-legacy parser
  - enable_reasoning (alias reasoning_budget) -- -1/0/>0 thinking budget
  - prefill_assistant -- trailing-assistant-message prefill toggle

All 14 referenced fields exist on both the upstream pin and the
turboquant fork's common.h, so no LOCALAI_LEGACY_LLAMA_CPP_SPEC guard
is needed.

Docs: extend model-configuration.md with new "Reasoning Models",
"Multimodal Backend Options", "Embedding & Reranking Backend Options",
and "Other Backend Tuning Options" subsections; also refresh the
Speculative Type Values table to show the new dash-separated canonical
names alongside the underscore aliases LocalAI still accepts.


Assisted-by: claude-code:claude-opus-4-7

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
2026-05-14 08:53:34 +02:00
2026-04-08 19:23:16 +02:00
2025-02-15 18:17:15 +01:00
2023-05-04 15:01:29 +02:00




LocalAI stars LocalAI License

Follow LocalAI_API Join LocalAI Discord Community

mudler%2FLocalAI | Trendshift

LocalAI is the open-source AI engine. Run any model - LLMs, vision, voice, image, video - on any hardware. No GPU required.

  • Drop-in API compatibility — OpenAI, Anthropic, ElevenLabs APIs
  • 36+ backends — llama.cpp, vLLM, transformers, whisper, diffusers, MLX...
  • Any hardware — NVIDIA, AMD, Intel, Apple Silicon, Vulkan, or CPU-only
  • Multi-user ready — API key auth, user quotas, role-based access
  • Built-in AI agents — autonomous agents with tool use, RAG, MCP, and skills
  • Privacy-first — your data never leaves your infrastructure

Created by Ettore Di Giacinto and maintained by the LocalAI team.

📖 Documentation | 💬 Discord | 💻 Quickstart | 🖼️ Models | FAQ

Guided tour

https://github.com/user-attachments/assets/08cbb692-57da-48f7-963d-2e7b43883c18

Click to see more!

User and auth

https://github.com/user-attachments/assets/228fa9ad-81a3-4d43-bfb9-31557e14a36c

Agents

https://github.com/user-attachments/assets/6270b331-e21d-4087-a540-6290006b381a

Usage metrics per user

https://github.com/user-attachments/assets/cbb03379-23b4-4e3d-bd26-d152f057007f

Fine-tuning and Quantization

https://github.com/user-attachments/assets/5ba4ace9-d3df-4795-b7d4-b0b404ea71ee

WebRTC

https://github.com/user-attachments/assets/ed88e34c-fed3-4b83-8a67-4716a9feeb7b

Quickstart

macOS

Download LocalAI for macOS

Note: The DMG is not signed by Apple. After installing, run: sudo xattr -d com.apple.quarantine /Applications/LocalAI.app. See #6268 for details.

Containers (Docker, podman, ...)

Already ran LocalAI before? Use docker start -i local-ai to restart an existing container.

CPU only:

docker run -ti --name local-ai -p 8080:8080 localai/localai:latest

NVIDIA GPU:

# CUDA 13
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-gpu-nvidia-cuda-13

# CUDA 12
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-gpu-nvidia-cuda-12

# NVIDIA Jetson ARM64 (CUDA 12, for AGX Orin and similar)
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-nvidia-l4t-arm64

# NVIDIA Jetson ARM64 (CUDA 13, for DGX Spark)
docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-nvidia-l4t-arm64-cuda-13

AMD GPU (ROCm):

docker run -ti --name local-ai -p 8080:8080 --device=/dev/kfd --device=/dev/dri --group-add=video localai/localai:latest-gpu-hipblas

Intel GPU (oneAPI):

docker run -ti --name local-ai -p 8080:8080 --device=/dev/dri/card1 --device=/dev/dri/renderD128 localai/localai:latest-gpu-intel

Vulkan GPU:

docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-gpu-vulkan

Loading models

# From the model gallery (see available models with `local-ai models list` or at https://models.localai.io)
local-ai run llama-3.2-1b-instruct:q4_k_m
# From Huggingface
local-ai run huggingface://TheBloke/phi-2-GGUF/phi-2.Q8_0.gguf
# From the Ollama OCI registry
local-ai run ollama://gemma:2b
# From a YAML config
local-ai run https://gist.githubusercontent.com/.../phi-2.yaml
# From a standard OCI registry (e.g., Docker Hub)
local-ai run oci://localai/phi-2:latest

Automatic Backend Detection: LocalAI automatically detects your GPU capabilities and downloads the appropriate backend. For advanced options, see GPU Acceleration.

For more details, see the Getting Started guide.

Latest News

For older news and full release notes, see GitHub Releases and the News page.

Features

Supported Backends & Acceleration

LocalAI supports 36+ backends including llama.cpp, vLLM, transformers, whisper.cpp, diffusers, MLX, MLX-VLM, and many more. Hardware acceleration is available for NVIDIA (CUDA 12/13), AMD (ROCm), Intel (oneAPI/SYCL), Apple Silicon (Metal), Vulkan, and NVIDIA Jetson (L4T). All backends can be installed on-the-fly from the Backend Gallery.

See the full Backend & Model Compatibility Table and GPU Acceleration guide.

Resources

Team

LocalAI is maintained by a small team of humans, together with the wider community of contributors.

A huge thank you to everyone who contributes code, reviews PRs, files issues, and helps users in Discord — LocalAI is a community-driven project and wouldn't exist without you. See the full contributors list.

Citation

If you utilize this repository, data in a downstream project, please consider citing it with:

@misc{localai,
  author = {Ettore Di Giacinto},
  title = {LocalAI: The free, Open source OpenAI alternative},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/go-skynet/LocalAI}},

Sponsors

Do you find LocalAI useful?

Support the project by becoming a backer or sponsor. Your logo will show up here with a link to your website.

A huge thank you to our generous sponsors who support this project covering CI expenses, and our Sponsor list:


Individual sponsors

A special thanks to individual sponsors, a full list is on GitHub and buymeacoffee. Special shout out to drikster80 for being generous. Thank you everyone!

Star history

LocalAI Star history Chart

License

LocalAI is a community-driven project created by Ettore Di Giacinto and maintained by the LocalAI team.

MIT - Author Ettore Di Giacinto mudler@localai.io

Acknowledgements

LocalAI couldn't have been built without the help of great software already available from the community. Thank you!

Contributors

This is a community project, a special thanks to our contributors!

Description
No description provided
Readme MIT 109 MiB
Languages
Go 66.6%
JavaScript 12.6%
Python 6.8%
HTML 5.7%
C++ 3.2%
Other 5.1%