Bring the sglang Python backend up to feature parity with vLLM by
adding the same engine_args: map plumbing the vLLM backend already
has. Any
ServerArgs field (~380 in sglang 0.5.11) becomes settable from a model
YAML, including the speculative-decoding flags needed for Multi-Token
Prediction. Validation matches the vLLM backend's: keys are checked
against dataclasses.fields(ServerArgs), unknown keys raise ValueError
with a difflib close-match suggestion at LoadModel time, and the typed
ModelOptions fields keep their existing meaning with engine_args
overriding them.
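
For illustration, a minimal sketch of the validation pattern described
above. The real helper lives in backend/python/sglang/backend.py; the
signature, error text, and the ServerArgs import path (assumed here to
be the sglang 0.5.x location) are illustrative, not verbatim:

```python
import dataclasses
import difflib

# Assumption: ServerArgs import path as shipped in sglang 0.5.x.
from sglang.srt.server_args import ServerArgs


def _apply_engine_args(engine_args: dict, kwargs: dict) -> dict:
    """Validate YAML engine_args against ServerArgs and merge them into
    the kwargs used to build the engine; engine_args win over values
    derived from the typed ModelOptions fields."""
    valid = sorted(f.name for f in dataclasses.fields(ServerArgs))
    for key, value in engine_args.items():
        if key not in valid:
            close = difflib.get_close_matches(key, valid, n=1)
            hint = f"; did you mean '{close[0]}'?" if close else ""
            raise ValueError(f"unknown engine_args key '{key}'{hint}")
        kwargs[key] = value  # override any typed-field-derived value
    return kwargs
```

Because this runs at LoadModel time, a typo in a model YAML fails fast
with a close-match suggestion instead of surfacing later as an opaque
engine-start error.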
Backend code:
* backend/python/sglang/backend.py: add _apply_engine_args, import
dataclasses/difflib/ServerArgs, call from LoadModel; rename Seed ->
sampling_seed (sglang 0.5.11 renamed the SamplingParams field).
* backend/python/sglang/test.py + test.sh + Makefile: six unit tests
  exercising the helper directly (no engine load required); a
  representative case is sketched after this list.
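
A representative test in the same spirit as the six in
backend/python/sglang/test.py (names and assertions here are
illustrative, assuming the module-level helper sketched above;
mem_fraction_static is a real ServerArgs field):

```python
import unittest


class TestApplyEngineArgs(unittest.TestCase):
    def test_unknown_key_suggests_close_match(self):
        # A typo'd key must fail at LoadModel time with a usable hint.
        with self.assertRaises(ValueError) as ctx:
            _apply_engine_args({"mem_fraction_staticc": 0.7}, {})
        self.assertIn("mem_fraction_static", str(ctx.exception))

    def test_engine_args_override_typed_fields(self):
        # engine_args take precedence over typed-field-derived kwargs.
        kwargs = {"mem_fraction_static": 0.85}
        _apply_engine_args({"mem_fraction_static": 0.7}, kwargs)
        self.assertEqual(kwargs["mem_fraction_static"], 0.7)


if __name__ == "__main__":
    unittest.main()
```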
Build / CI / backend gallery (cuda13 + l4t13 paths are now first-class):
* backend/python/sglang/install.sh: add --prerelease=allow because
sglang 0.5.11 hard-pins flash-attn-4 which only ships beta wheels;
add --index-strategy=unsafe-best-match for cublas12 so the cu128
torch index wins over default-PyPI's cu130; new pyproject.toml-driven
l4t13 install path so [tool.uv.sources] can pin torch/torchvision/
torchaudio/sglang to the jetson-ai-lab index without forcing every
transitive PyPI dep through the L4T mirror's flaky proxy (mirrors the
equivalent fix in backend/python/vllm/install.sh).
* backend/python/sglang/pyproject.toml (new): L4T project spec with
explicit-source jetson-ai-lab index. Replaces requirements-l4t13.txt
for the l4t13 BUILD_PROFILE; other profiles still go through the
requirements-*.txt pipeline via libbackend.sh's installRequirements.
* backend/python/sglang/requirements-l4t13.txt: removed; superseded
by pyproject.toml.
* backend/python/sglang/requirements-cublas{12,13}{,-after}.txt: pin
sglang>=0.5.11 (Gemma 4 floor); add cu130 torch index for cublas13
  (new files) and cu128 torch index for cublas12 (PyPI now ships cu130
  torch wheels by default, which breaks cu12 hosts).
* backend/index.yaml: add cuda13-sglang and cuda13-sglang-development
capability mappings + image entries pointing at
quay.io/.../-gpu-nvidia-cuda-13-sglang.
* .github/workflows/backend.yml: new cublas13 sglang matrix entry,
mirroring vllm's cuda13 build.
Model gallery + docs:
* gallery/sglang.yaml: base sglang config template, mirrors vllm.yaml.
* gallery/sglang-gemma-4-{e2b,e4b}-mtp.yaml: Gemma 4 MTP demos
transcribed verbatim from the SGLang Gemma 4 cookbook MTP commands.
* gallery/sglang-mimo-7b-mtp.yaml: MiMo-7B-RL with built-in MTP heads
+ online fp8 weight quantization, verified end-to-end on a 16 GB
RTX 5070 Ti at ~88 tok/s. Uses mem_fraction_static: 0.7 because the
  MTP draft worker's vocab embedding is loaded unquantized and OOMs
the static reservation at sglang's 0.85 default.
* gallery/index.yaml: three new entries (gemma-4-e2b-it:sglang-mtp,
gemma-4-e4b-it:sglang-mtp, mimo-7b-mtp:sglang).
* docs/content/features/text-generation.md: new SGLang section with
setup, engine_args reference, MTP demos, version requirements.
* .agents/sglang-backend.md (new): agent one-pager covering the flat
ServerArgs structure, the typed-vs-engine_args precedence, the
speculative-decoding cheatsheet, and the mem_fraction_static gotcha
documented above.
* AGENTS.md: index entry for the new agent doc.
Known limitation: the two Gemma 4 MTP gallery entries ship a recipe
that doesn't yet run on stock libraries. The drafter checkpoints
(google/gemma-4-{E2B,E4B}-it-assistant) declare
model_type: gemma4_assistant / Gemma4AssistantForCausalLM, which
neither transformers (<=5.6.0, including the SGLang cookbook's pinned
commit 91b1ab1f... and main HEAD) nor sglang's own model registry
(<=0.5.11) registers as of 2026-05-06. They will start working when
HF or sglang upstream registers the architecture -- no LocalAI
changes needed. The MiMo MTP demo and the non-MTP Gemma 4 paths work
today on this build (verified on RTX 5070 Ti, 16 GB).
Assisted-by: Claude:claude-opus-4-7 [Read] [Edit] [Bash] [WebFetch] [WebSearch]
Signed-off-by: Richard Palethorpe <io@richiejp.com>
# LocalAI Agent Instructions
This file is the entry point for AI coding assistants (Claude Code, Cursor, Copilot, Codex, Aider, etc.) working on LocalAI. It is an index to detailed topic guides in the .agents/ directory. Read the relevant file(s) for the task at hand — you don't need to load all of them.
Human contributors: see CONTRIBUTING.md for the development workflow.
## Policy for AI-Assisted Contributions

LocalAI follows the Linux kernel project's guidelines for AI coding assistants. Before submitting AI-assisted code, read .agents/ai-coding-assistants.md. Key rules:

- No `Signed-off-by` from AI. Only the human submitter may sign off on the Developer Certificate of Origin.
- No `Co-Authored-By: <AI>` trailers. The human contributor owns the change.
- Use an `Assisted-by:` trailer to attribute AI involvement. Format: `Assisted-by: AGENT_NAME:MODEL_VERSION [TOOL1] [TOOL2]`.
- The human submitter is responsible for reviewing, testing, and understanding every line of generated code.
## Topics
| File | When to read |
|---|---|
| .agents/ai-coding-assistants.md | Policy for AI-assisted contributions — licensing, DCO, attribution |
| .agents/building-and-testing.md | Building the project, running tests, Docker builds for specific platforms |
| .agents/ci-caching.md | CI build cache layout (registry-backed BuildKit cache on quay.io/go-skynet/ci-cache), DEPS_REFRESH weekly cache-buster for unpinned Python deps, manual eviction |
| .agents/adding-backends.md | Adding a new backend (Python, Go, or C++) — full step-by-step checklist, including importer integration (the /import-model dropdown is server-driven from GET /backends/known) |
| .agents/coding-style.md | Code style, editorconfig, logging, documentation conventions |
| .agents/llama-cpp-backend.md | Working on the llama.cpp backend — architecture, updating, tool call parsing |
| .agents/vllm-backend.md | Working on the vLLM / vLLM-omni backends — native parsers, ChatDelta, CPU build, libnuma packaging, backend hooks |
| .agents/sglang-backend.md | Working on the SGLang backend — engine_args validation against ServerArgs, speculative-decoding (EAGLE/EAGLE3/DFLASH/MTP) recipes, parser handling |
| .agents/testing-mcp-apps.md | Testing MCP Apps (interactive tool UIs) in the React UI |
| .agents/api-endpoints-and-auth.md | Adding API endpoints, auth middleware, feature permissions, user access control |
| .agents/debugging-backends.md | Debugging runtime backend failures, dependency conflicts, rebuilding backends |
| .agents/adding-gallery-models.md | Adding GGUF models from HuggingFace to the model gallery |
| .agents/localai-assistant-mcp.md | LocalAI Assistant chat modality — adding admin tools to the in-process MCP server, editing skill prompts, keeping REST + MCP + skills in sync |
## Quick Reference

- Logging: Use `github.com/mudler/xlog` (same API as slog)
- Go style: Prefer `any` over `interface{}`
- Comments: Explain why, not what
- Docs: Update `docs/content/` when adding features or changing config
- New API endpoints: LocalAI advertises its capability surface in several independent places — swagger `@Tags`, the `/api/instructions` registry, the auth `RouteFeatureRegistry`, the React UI `capabilities.js`, docs. Read .agents/api-endpoints-and-auth.md and follow its checklist — missing any surface means clients, admins, and the UI won't know the endpoint exists.
- Admin endpoints → MCP tool: every admin endpoint that an admin would manage conversationally (install/list/edit/toggle/upgrade) MUST also be exposed as an MCP tool in `pkg/mcp/localaitools/`. The LocalAI Assistant chat modality and the standalone `local-ai mcp-server` consume that package; drift between REST and MCP is a real risk. Read .agents/localai-assistant-mcp.md — the `TestToolHTTPRouteMappingComplete` test fails until you wire the new tool and update the route map.
- Build: Inspect `Makefile` and `.github/workflows/` — ask the user before running long builds
- UI: The active UI is the React app in `core/http/react-ui/`. The older Alpine.js/HTML UI in `core/http/static/` is pending deprecation — all new UI work goes in the React UI