mirror of
https://github.com/mudler/LocalAI.git
synced 2026-04-16 12:59:33 -04:00
New .agents/vllm-backend.md with everything that's easy to get wrong
on the vllm/vllm-omni backends:
- Use vLLM's native ToolParserManager / ReasoningParserManager — do
not write regex-based parsers. Selection is explicit via Options[],
defaults live in core/config/parser_defaults.json.
- Concrete parsers don't always accept the tools= kwarg the abstract
base declares; try/except TypeError is mandatory.
- ChatDelta.tool_calls is the contract — Reply.message text alone
won't surface tool calls in /v1/chat/completions.
- vllm version pin trap: 0.14.1+cpu pairs with torch 2.9.1+cpu.
Newer wheels declare torch==2.10.0+cpu which only exists on the
PyTorch test channel and pulls an incompatible torchvision.
- SIMD baseline: prebuilt wheel needs AVX-512 VNNI/BF16. SIGILL
symptom + FROM_SOURCE=true escape hatch are documented.
- libnuma.so.1 + libgomp.so.1 must be bundled because vllm._C
silently fails to register torch ops if they're missing.
- backend_hooks system: hooks_llamacpp / hooks_vllm split + the
'*' / '' / named-backend keys.
- ToProto() must serialize ToolCallID and Reasoning — easy to miss
when adding fields to schema.Message.
Also extended .agents/adding-backends.md with a generic 'Bundling
runtime shared libraries' section: Dockerfile.python is FROM scratch,
package.sh is the mechanism, libbackend.sh adds ${EDIR}/lib to
LD_LIBRARY_PATH, and how to verify packaging without trusting the
host (extract image, boot in fresh ubuntu container).
Index in AGENTS.md updated.
2.0 KiB
2.0 KiB
LocalAI Agent Instructions
This file is an index to detailed topic guides in the .agents/ directory. Read the relevant file(s) for the task at hand — you don't need to load all of them.
Topics
| File | When to read |
|---|---|
| .agents/building-and-testing.md | Building the project, running tests, Docker builds for specific platforms |
| .agents/adding-backends.md | Adding a new backend (Python, Go, or C++) — full step-by-step checklist |
| .agents/coding-style.md | Code style, editorconfig, logging, documentation conventions |
| .agents/llama-cpp-backend.md | Working on the llama.cpp backend — architecture, updating, tool call parsing |
| .agents/vllm-backend.md | Working on the vLLM / vLLM-omni backends — native parsers, ChatDelta, CPU build, libnuma packaging, backend hooks |
| .agents/testing-mcp-apps.md | Testing MCP Apps (interactive tool UIs) in the React UI |
| .agents/api-endpoints-and-auth.md | Adding API endpoints, auth middleware, feature permissions, user access control |
| .agents/debugging-backends.md | Debugging runtime backend failures, dependency conflicts, rebuilding backends |
| .agents/adding-gallery-models.md | Adding GGUF models from HuggingFace to the model gallery |
Quick Reference
- Logging: Use
github.com/mudler/xlog(same API as slog) - Go style: Prefer
anyoverinterface{} - Comments: Explain why, not what
- Docs: Update
docs/content/when adding features or changing config - Build: Inspect
Makefileand.github/workflows/— ask the user before running long builds - UI: The active UI is the React app in
core/http/react-ui/. The older Alpine.js/HTML UI incore/http/static/is pending deprecation — all new UI work goes in the React UI