mirror of https://github.com/mudler/LocalAI.git synced 2026-04-16 12:59:33 -04:00

Ettore Di Giacinto daa0272f2e docs(agents): capture vllm backend lessons + runtime lib packaging (#9333)

New .agents/vllm-backend.md with everything that's easy to get wrong
on the vllm/vllm-omni backends:

- Use vLLM's native ToolParserManager / ReasoningParserManager — do
  not write regex-based parsers. Selection is explicit via Options[],
  defaults live in core/config/parser_defaults.json.
- Concrete parsers don't always accept the tools= kwarg the abstract
  base declares; try/except TypeError is mandatory.
- ChatDelta.tool_calls is the contract — Reply.message text alone
  won't surface tool calls in /v1/chat/completions.
- vllm version pin trap: 0.14.1+cpu pairs with torch 2.9.1+cpu.
  Newer wheels declare torch==2.10.0+cpu which only exists on the
  PyTorch test channel and pulls an incompatible torchvision.
- SIMD baseline: prebuilt wheel needs AVX-512 VNNI/BF16. SIGILL
  symptom + FROM_SOURCE=true escape hatch are documented.
- libnuma.so.1 + libgomp.so.1 must be bundled because vllm._C
  silently fails to register torch ops if they're missing.
- backend_hooks system: hooks_llamacpp / hooks_vllm split + the
  '*' / '' / named-backend keys.
- ToProto() must serialize ToolCallID and Reasoning — easy to miss
  when adding fields to schema.Message.

Also extended .agents/adding-backends.md with a generic 'Bundling
runtime shared libraries' section: Dockerfile.python is FROM scratch,
package.sh is the mechanism, libbackend.sh adds ${EDIR}/lib to
LD_LIBRARY_PATH, and how to verify packaging without trusting the
host (extract image, boot in fresh ubuntu container).

Index in AGENTS.md updated.

2026-04-13 11:09:57 +02:00

LocalAI Agent Instructions

This file is an index to detailed topic guides in the .agents/ directory. Read the relevant file(s) for the task at hand — you don't need to load all of them.

Topics

File	When to read
.agents/building-and-testing.md	Building the project, running tests, Docker builds for specific platforms
.agents/adding-backends.md	Adding a new backend (Python, Go, or C++) — full step-by-step checklist
.agents/coding-style.md	Code style, editorconfig, logging, documentation conventions
.agents/llama-cpp-backend.md	Working on the llama.cpp backend — architecture, updating, tool call parsing
.agents/vllm-backend.md	Working on the vLLM / vLLM-omni backends — native parsers, ChatDelta, CPU build, libnuma packaging, backend hooks
.agents/testing-mcp-apps.md	Testing MCP Apps (interactive tool UIs) in the React UI
.agents/api-endpoints-and-auth.md	Adding API endpoints, auth middleware, feature permissions, user access control
.agents/debugging-backends.md	Debugging runtime backend failures, dependency conflicts, rebuilding backends
.agents/adding-gallery-models.md	Adding GGUF models from HuggingFace to the model gallery

Quick Reference

Logging: Use github.com/mudler/xlog (same API as slog)
Go style: Prefer any over interface{}
Comments: Explain why, not what
Docs: Update docs/content/ when adding features or changing config
Build: Inspect Makefile and .github/workflows/ — ask the user before running long builds
UI: The active UI is the React app in core/http/react-ui/. The older Alpine.js/HTML UI in core/http/static/ is pending deprecation — all new UI work goes in the React UI