Mirror of https://github.com/mudler/LocalAI.git (synced 2026-05-17 04:56:52 -04:00)
feat(sglang): wire engine_args, add cuda13 build, ship MTP gallery demos (#9686)
Bring the sglang Python backend up to feature parity with vLLM by adding
the same `engine_args:` map plumbing the vLLM backend already has. Any
ServerArgs field (~380 in sglang 0.5.11) becomes settable from a model
YAML, including the speculative-decoding flags needed for Multi-Token
Prediction. Validation matches the vLLM backend's: keys are checked
against dataclasses.fields(ServerArgs), unknown keys raise ValueError
with a difflib close-match suggestion at LoadModel time, and the typed
ModelOptions fields keep their existing meaning, with engine_args
overriding them.
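As a sketch of what this enables in a model YAML (the model name and values below are purely illustrative, and the exact top-level layout should be checked against gallery/sglang.yaml; the engine_args keys themselves are real ServerArgs fields referenced elsewhere in this change):

```yaml
name: my-sglang-model            # illustrative name
backend: sglang
parameters:
  model: some-org/some-model     # illustrative HF repo
engine_args:
  mem_fraction_static: 0.7       # overrides the typed field / sglang default
  speculative_algorithm: NEXTN   # MTP via speculative decoding
  speculative_num_steps: 1
```

Any typo in an `engine_args:` key fails at LoadModel with a close-match suggestion rather than deep inside engine startup.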
Backend code:
* backend/python/sglang/backend.py: add _apply_engine_args, import
dataclasses/difflib/ServerArgs, call from LoadModel; rename Seed ->
sampling_seed (sglang 0.5.11 renamed the SamplingParams field).
* backend/python/sglang/test.py + test.sh + Makefile: six unit tests
exercising the helper directly (no engine load required).
Build / CI / backend gallery (cuda13 + l4t13 paths are now first-class):
* backend/python/sglang/install.sh: add --prerelease=allow because
sglang 0.5.11 hard-pins flash-attn-4 which only ships beta wheels;
add --index-strategy=unsafe-best-match for cublas12 so the cu128
torch index wins over default-PyPI's cu130; new pyproject.toml-driven
l4t13 install path so [tool.uv.sources] can pin torch/torchvision/
torchaudio/sglang to the jetson-ai-lab index without forcing every
transitive PyPI dep through the L4T mirror's flaky proxy (mirrors the
equivalent fix in backend/python/vllm/install.sh).
* backend/python/sglang/pyproject.toml (new): L4T project spec with
explicit-source jetson-ai-lab index. Replaces requirements-l4t13.txt
for the l4t13 BUILD_PROFILE; other profiles still go through the
requirements-*.txt pipeline via libbackend.sh's installRequirements.
* backend/python/sglang/requirements-l4t13.txt: removed; superseded
by pyproject.toml.
* backend/python/sglang/requirements-cublas{12,13}{,-after}.txt: pin
sglang>=0.5.11 (the Gemma 4 floor); add a cu130 torch index for cublas13
(new files) and a cu128 torch index for cublas12 (PyPI's default torch
wheels are now cu130 builds, which break cu12 hosts).
* backend/index.yaml: add cuda13-sglang and cuda13-sglang-development
capability mappings + image entries pointing at
quay.io/.../-gpu-nvidia-cuda-13-sglang.
* .github/workflows/backend.yml: new cublas13 sglang matrix entry,
mirroring vllm's cuda13 build.
Model gallery + docs:
* gallery/sglang.yaml: base sglang config template, mirrors vllm.yaml.
* gallery/sglang-gemma-4-{e2b,e4b}-mtp.yaml: Gemma 4 MTP demos
transcribed verbatim from the SGLang Gemma 4 cookbook MTP commands.
* gallery/sglang-mimo-7b-mtp.yaml: MiMo-7B-RL with built-in MTP heads
+ online fp8 weight quantization, verified end-to-end on a 16 GB
RTX 5070 Ti at ~88 tok/s. Uses mem_fraction_static: 0.7 because the
MTP draft worker's vocab embedding is loaded unquantised and OOMs
the static reservation at sglang's 0.85 default.
* gallery/index.yaml: three new entries (gemma-4-e2b-it:sglang-mtp,
gemma-4-e4b-it:sglang-mtp, mimo-7b-mtp:sglang).
* docs/content/features/text-generation.md: new SGLang section with
setup, engine_args reference, MTP demos, version requirements.
* .agents/sglang-backend.md (new): agent one-pager covering the flat
ServerArgs structure, the typed-vs-engine_args precedence, the
speculative-decoding cheatsheet, and the mem_fraction_static gotcha
documented above.
* AGENTS.md: index entry for the new agent doc.
Known limitation: the two Gemma 4 MTP gallery entries ship a recipe
that doesn't yet run on stock libraries. The drafter checkpoints
(google/gemma-4-{E2B,E4B}-it-assistant) declare
model_type: gemma4_assistant / Gemma4AssistantForCausalLM, which
neither transformers (<=5.6.0, including the SGLang cookbook's pinned
commit 91b1ab1f... and main HEAD) nor sglang's own model registry
(<=0.5.11) registers as of 2026-05-06. They will start working when
HF or sglang upstream registers the architecture -- no LocalAI
changes needed. The MiMo MTP demo and the non-MTP Gemma 4 paths work
today on this build (verified on RTX 5070 Ti, 16 GB).
Assisted-by: Claude:claude-opus-4-7 [Read] [Edit] [Bash] [WebFetch] [WebSearch]
Signed-off-by: Richard Palethorpe <io@richiejp.com>
This commit is contained in: (committed via GitHub)
parent 048daa0cdc
commit c894d9c826
@@ -8,6 +8,12 @@ run: sglang
	bash run.sh
	@echo "sglang run."

.PHONY: test
test: sglang
	@echo "Testing sglang..."
	bash test.sh
	@echo "sglang tested."

.PHONY: protogen-clean
protogen-clean:
	$(RM) backend_pb2_grpc.py backend_pb2.py

@@ -9,10 +9,18 @@ The streaming path applies sglang's per-request FunctionCallParser and
ReasoningParser so tool_calls and reasoning_content are emitted
incrementally inside ChatDelta, which is a capability sglang exposes
natively and vLLM does not.

Like the vLLM backend, this one accepts an arbitrary ``engine_args:``
map in the model YAML; keys are validated against ``ServerArgs`` fields
and forwarded to ``Engine(**kwargs)``. That covers speculative decoding
(EAGLE/EAGLE3/DFLASH/NGRAM/STANDALONE plus MTP via NEXTN), attention
backend selection, MoE knobs, hierarchical cache, and so on.
"""
import asyncio
from concurrent import futures
import argparse
import dataclasses
import difflib
import signal
import sys
import os
@@ -38,6 +46,7 @@ from grpc_auth import get_auth_interceptors
# are wrapped in try/except so older / leaner installs that omit them
# still load the backend for plain text generation.
from sglang.srt.entrypoints.engine import Engine
from sglang.srt.server_args import ServerArgs

try:
    from sglang.srt.function_call.function_call_parser import FunctionCallParser
@@ -66,6 +75,19 @@ except Exception:
    HAS_TRANSFORMERS = False


# sglang 0.5.11 renamed SamplingParams.seed -> sampling_seed (PR #21952).
# Earlier 0.5.x releases (e.g. 0.5.1.post2 — the wheel still pinned by the
# pypi.jetson-ai-lab.io sbsa/cu130 mirror used by the l4t13 build profile)
# accept only `seed`. Detect the supported keyword once at import time so
# both versions work without a hard pin floor.
try:
    import inspect as _inspect
    from sglang.srt.sampling.sampling_params import SamplingParams as _SamplingParams
    _SEED_KEY = "sampling_seed" if "sampling_seed" in _inspect.signature(_SamplingParams).parameters else "seed"
except Exception:
    _SEED_KEY = "sampling_seed"


_ONE_DAY_IN_SECONDS = 60 * 60 * 24
MAX_WORKERS = int(os.environ.get('PYTHON_GRPC_MAX_WORKERS', '1'))
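The version shim above boils down to a single `inspect.signature` probe. A self-contained sketch of the same pattern, using two invented stand-in classes in place of the two sglang versions' `SamplingParams`:

```python
import inspect


class OldSamplingParams:
    """Stand-in for pre-0.5.11 sglang: accepts `seed`."""
    def __init__(self, temperature=1.0, seed=None):
        self.temperature = temperature
        self.seed = seed


class NewSamplingParams:
    """Stand-in for sglang >= 0.5.11: accepts `sampling_seed`."""
    def __init__(self, temperature=1.0, sampling_seed=None):
        self.temperature = temperature
        self.sampling_seed = sampling_seed


def detect_seed_key(cls) -> str:
    """Return whichever seed kwarg this class's __init__ accepts.

    inspect.signature(cls) on a class reflects __init__ (minus self),
    so membership in .parameters tells us which spelling is supported.
    """
    params = inspect.signature(cls).parameters
    return "sampling_seed" if "sampling_seed" in params else "seed"


print(detect_seed_key(OldSamplingParams))  # seed
print(detect_seed_key(NewSamplingParams))  # sampling_seed
```

Probing once at import time (as the backend does) avoids re-inspecting the signature on every request.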
@@ -82,6 +104,37 @@ class BackendServicer(backend_pb2_grpc.BackendServicer):
                opts[key.strip()] = value.strip()
        return opts

    def _apply_engine_args(self, engine_kwargs: dict, engine_args_json: str) -> dict:
        """Merge user-supplied engine_args (JSON object) into the kwargs dict
        that will be forwarded to ``sglang.Engine`` (which constructs a
        ``ServerArgs`` from them).

        Mirrors ``backend/python/vllm/backend.py::_apply_engine_args`` but
        operates on the kwargs dict because sglang's ``Engine.__init__``
        accepts ``**kwargs`` directly rather than a pre-built dataclass.
        Validation happens against ``ServerArgs`` fields so a typo fails
        early with a close-match suggestion instead of producing a confusing
        ``TypeError`` deep inside engine startup.
        """
        if not engine_args_json:
            return engine_kwargs
        try:
            extra = json.loads(engine_args_json)
        except json.JSONDecodeError as e:
            raise ValueError(f"engine_args is not valid JSON: {e}") from e
        if not isinstance(extra, dict):
            raise ValueError(
                f"engine_args must be a JSON object, got {type(extra).__name__}"
            )
        valid = {f.name for f in dataclasses.fields(ServerArgs)}
        for key in extra:
            if key not in valid:
                suggestion = difflib.get_close_matches(key, valid, n=1)
                hint = f" did you mean {suggestion[0]!r}?" if suggestion else ""
                raise ValueError(f"unknown engine_args key {key!r}.{hint}")
        engine_kwargs.update(extra)
        return engine_kwargs

    def _messages_to_dicts(self, messages) -> List[dict]:
        result: List[dict] = []
        for msg in messages:
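The validation step in `_apply_engine_args` can be exercised standalone. A minimal sketch against an invented `ToyServerArgs` dataclass (the real code validates against sglang's `ServerArgs`):

```python
import dataclasses
import difflib
import json


@dataclasses.dataclass
class ToyServerArgs:
    """Invented stand-in with three fields; the real ServerArgs has ~380."""
    model_path: str = ""
    mem_fraction_static: float = 0.85
    trust_remote_code: bool = False


def apply_engine_args(kwargs: dict, engine_args_json: str) -> dict:
    """Merge JSON engine_args into kwargs, rejecting unknown keys."""
    if not engine_args_json:
        return kwargs
    extra = json.loads(engine_args_json)
    valid = {f.name for f in dataclasses.fields(ToyServerArgs)}
    for key in extra:
        if key not in valid:
            suggestion = difflib.get_close_matches(key, valid, n=1)
            hint = f" did you mean {suggestion[0]!r}?" if suggestion else ""
            raise ValueError(f"unknown engine_args key {key!r}.{hint}")
    kwargs.update(extra)  # engine_args wins over pre-set typed fields
    return kwargs


out = apply_engine_args({"model_path": "m"}, '{"mem_fraction_static": 0.7}')
print(out["mem_fraction_static"])  # 0.7

try:
    apply_engine_args({}, '{"trust_remotecode": true}')
except ValueError as e:
    print(e)  # message names the bad key and suggests 'trust_remote_code'
```

The in-place `dict.update` is what gives engine_args precedence over the typed ModelOptions fields set earlier in LoadModel.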
@@ -137,6 +190,16 @@ class BackendServicer(backend_pb2_grpc.BackendServicer):
        if self.reasoning_parser_name:
            engine_kwargs["reasoning_parser"] = self.reasoning_parser_name

        # engine_args from YAML overrides typed fields above so operators can
        # tune anything ServerArgs exposes (speculative decoding, attention
        # backend, MoE, hierarchical cache, …) without waiting on protobuf
        # changes.
        try:
            engine_kwargs = self._apply_engine_args(engine_kwargs, request.EngineArgs)
        except ValueError as err:
            print(f"engine_args error: {err}", file=sys.stderr)
            return backend_pb2.Result(success=False, message=str(err))

        try:
            self.llm = Engine(**engine_kwargs)
        except Exception as err:
@@ -221,7 +284,7 @@ class BackendServicer(backend_pb2_grpc.BackendServicer):
            "TopP": "top_p",
            "TopK": "top_k",
            "MinP": "min_p",
-           "Seed": "seed",
+           "Seed": _SEED_KEY,
            "StopPrompts": "stop",
            "StopTokenIds": "stop_token_ids",
            "IgnoreEOS": "ignore_eos",
||||
@@ -23,17 +23,32 @@ if [ "x${BUILD_PROFILE}" == "xcpu" ]; then
    EXTRA_PIP_INSTALL_FLAGS+=" --index-strategy=unsafe-best-match"
fi

# cublas12 needs a cu128 torch index (see requirements-cublas12.txt) — without
# unsafe-best-match uv falls through to default PyPI's cu130 torch wheel and
# the resulting sgl-kernel can't load on our cu12 host libs.
if [ "x${BUILD_PROFILE}" == "xcublas12" ]; then
    EXTRA_PIP_INSTALL_FLAGS+=" --index-strategy=unsafe-best-match"
fi

# sglang 0.5.11 (Gemma 4 support) declares flash-attn-4 as a hard dep, but
# upstream only publishes pre-release wheels (4.0.0b*). uv rejects
# pre-releases by default — opt in for sglang specifically. Drop this once
# flash-attn-4 4.0 stable lands.
EXTRA_PIP_INSTALL_FLAGS+=" --prerelease=allow"

# JetPack 7 / L4T arm64 wheels are built for cp312 and shipped via
# pypi.jetson-ai-lab.io. Bump the venv Python so the prebuilt sglang
-# wheel resolves cleanly. unsafe-best-match is required because the
-# jetson-ai-lab index lists transitive deps (e.g. decord) at older
-# versions only — without it uv refuses to fall through to PyPI for a
-# compatible wheel and resolution fails.
+# wheel resolves cleanly. The actual install on l4t13 goes through
+# pyproject.toml (see the elif branch below) so [tool.uv.sources] can
+# pin only torch/torchvision/torchaudio/sglang to the jetson-ai-lab
+# index — leaving PyPI as the path for transitive deps like
+# markdown-it-py / anthropic / propcache that the L4T mirror's proxy
+# 503s on. No --index-strategy flag here: the explicit index keeps the
+# scoping clean.
if [ "x${BUILD_PROFILE}" == "xl4t13" ]; then
    PYTHON_VERSION="3.12"
    PYTHON_PATCH="12"
    PY_STANDALONE_TAG="20251120"
-    EXTRA_PIP_INSTALL_FLAGS+=" --index-strategy=unsafe-best-match"
fi

# sglang's CPU path has no prebuilt wheel on PyPI — upstream publishes
@@ -95,6 +110,27 @@ if [ "x${BUILD_TYPE}" == "x" ] || [ "x${FROM_SOURCE:-}" == "xtrue" ]; then
    fi
    uv pip install ${EXTRA_PIP_INSTALL_FLAGS:-} .
    popd
# L4T arm64 (JetPack 7): drive the install through pyproject.toml so that
# [tool.uv.sources] can pin torch/torchvision/torchaudio/sglang to the
# jetson-ai-lab index, while everything else (transitive deps and
# PyPI-resolvable packages like transformers / accelerate) comes from
# PyPI. Bypasses installRequirements because uv pip install -r
# requirements.txt does not honor sources — see
# backend/python/sglang/pyproject.toml for the rationale. Mirrors the
# equivalent path in backend/python/vllm/install.sh.
elif [ "x${BUILD_PROFILE}" == "xl4t13" ]; then
    ensureVenv
    if [ "x${PORTABLE_PYTHON}" == "xtrue" ]; then
        export C_INCLUDE_PATH="${C_INCLUDE_PATH:-}:$(_portable_dir)/include/python${PYTHON_VERSION}"
    fi
    pushd "${backend_dir}"
    # Build deps first (matches installRequirements' requirements-install.txt
    # pass — sglang/sgl-kernel sdists need packaging/setuptools-scm in the
    # venv before they can build under --no-build-isolation).
    uv pip install ${EXTRA_PIP_INSTALL_FLAGS:-} -r requirements-install.txt
    uv pip install ${EXTRA_PIP_INSTALL_FLAGS:-} --requirement pyproject.toml
    popd
    runProtogen
else
    installRequirements
fi

backend/python/sglang/pyproject.toml (new file, 68 lines)
@@ -0,0 +1,68 @@
# L4T arm64 (JetPack 7 / sbsa cu130) install spec for the sglang backend.
#
# Why this file exists, and why only the l4t13 BUILD_PROFILE consumes it:
#
# pypi.jetson-ai-lab.io hosts the L4T-specific torch / sglang / sgl-kernel
# wheels we need on aarch64 + cuda13, but it ALSO transparently proxies the
# rest of PyPI through `/+f/<sha>/<filename>` URLs that 503 frequently.
# With `--extra-index-url` + `--index-strategy=unsafe-best-match` (the
# historical fix in install.sh) uv would pick those proxy URLs for ordinary
# PyPI packages — markdown-it-py, anthropic, propcache, etc. — and trip on
# the 503s. See e.g. CI run 25439791228 (markdown-it-py-4.0.0).
#
# `explicit = true` on the index makes uv consult the L4T mirror ONLY for
# packages mapped under [tool.uv.sources]. Everything else goes to PyPI.
# This breaks the historical 503 path without losing access to the L4T
# wheels we actually need from there. Mirrors the equivalent fix already
# in backend/python/vllm/pyproject.toml.
#
# `uv pip install -r requirements.txt` does NOT honor [tool.uv.sources]
# (sources are project-mode only, not pip-compat mode), so install.sh's
# l4t13 branch invokes `uv pip install --requirement pyproject.toml`
# directly. Other BUILD_PROFILEs continue to use the requirements-*.txt
# pipeline through libbackend.sh's installRequirements and never read
# this file.
[project]
name = "localai-sglang-l4t13"
version = "0.0.0"
requires-python = ">=3.12,<3.13"
dependencies = [
    # Mirror of requirements.txt — kept in sync manually for now since the
    # l4t13 path bypasses installRequirements (see install.sh).
    "grpcio==1.80.0",
    "protobuf",
    "certifi",
    "setuptools",
    "pillow",
    # L4T-specific accelerator stack (sourced from jetson-ai-lab below).
    "torch",
    "torchvision",
    "torchaudio",
    # sglang on jetson — the [all] extra is deliberately omitted because it
    # pulls outlines/decord, and decord has no aarch64 cp312 wheel anywhere
    # (PyPI has none; the jetson-ai-lab index ships only legacy cp35-cp37).
    # With [all] uv backtracks through versions trying to satisfy decord
    # and lands on sglang==0.1.16. The 0.5.0 floor matches the only major
    # series the jetson-ai-lab sbsa/cu130 mirror currently publishes
    # (sglang==0.5.1.post2 as of 2026-05-06). Bumping to >=0.5.11 here
    # would make the build unsatisfiable until the mirror catches up.
    # Gemma 4 / MTP recipes are therefore not supported on l4t13 — those
    # features land on cublas12/cublas13 hosts that pull the newer wheel
    # from PyPI. backend.py keeps backward compat with the 0.5.x
    # SamplingParams field rename via runtime detection.
    "sglang>=0.5.0",
    # PyPI-resolvable packages that complete the runtime.
    "accelerate",
    "transformers",
]

[[tool.uv.index]]
name = "jetson-ai-lab"
url = "https://pypi.jetson-ai-lab.io/sbsa/cu130"
explicit = true

[tool.uv.sources]
torch = { index = "jetson-ai-lab" }
torchvision = { index = "jetson-ai-lab" }
torchaudio = { index = "jetson-ai-lab" }
sglang = { index = "jetson-ai-lab" }
@@ -1,3 +1,4 @@
# Bump this pin deliberately — sglang releases weekly and API surfaces
# (FunctionCallParser, ReasoningParser) move between releases.
-sglang[all]>=0.4.0
+# 0.5.11 is the floor for Gemma 4 support (PR sgl-project/sglang#21952).
+sglang[all]>=0.5.11

@@ -1,5 +1,12 @@
+# sglang 0.5.11 hard-pins torch==2.9.1. PyPI's default torch 2.9.1 wheel is
+# now the cu130 build, which drags in cu130-flavoured sgl-kernel/sglang-kernel
+# binaries that need libnvrtc.so.13 — incompatible with our cu12 host libs.
+# Pin the cu128 PyTorch index so uv pulls cu12-flavoured torch (and the
+# matching sgl-kernel cu12 wheels). install.sh adds --index-strategy=unsafe-best-match
+# for cublas12 so uv consults this index alongside PyPI.
+--extra-index-url https://download.pytorch.org/whl/cu128
accelerate
-torch==2.7.1
+torch==2.9.1
torchvision
-torchaudio==2.7.1
+torchaudio
transformers

backend/python/sglang/requirements-cublas13-after.txt (new file, 4 lines)
@@ -0,0 +1,4 @@
# Bump this pin deliberately — sglang releases weekly and API surfaces
# (FunctionCallParser, ReasoningParser) move between releases.
# 0.5.11 is the floor for Gemma 4 support (PR sgl-project/sglang#21952).
sglang[all]>=0.5.11
backend/python/sglang/requirements-cublas13.txt (new file, 6 lines)
@@ -0,0 +1,6 @@
--extra-index-url https://download.pytorch.org/whl/cu130
accelerate
torch
torchvision
torchaudio
transformers
@@ -1,12 +0,0 @@
|
||||
--extra-index-url https://pypi.jetson-ai-lab.io/sbsa/cu130
|
||||
accelerate
|
||||
torch
|
||||
torchvision
|
||||
torchaudio
|
||||
transformers
|
||||
# Drop the [all] extra: it pulls outlines/decord, and decord has no
|
||||
# aarch64 cp312 wheel anywhere (PyPI nor the jetson-ai-lab index ships
|
||||
# only legacy cp35-cp37). With [all] uv backtracks through versions
|
||||
# trying to satisfy decord and lands on sglang==0.1.16. Floor at 0.5.0
|
||||
# so uv can't silently downgrade if a future resolution misfires.
|
||||
sglang>=0.5.0
|
||||
backend/python/sglang/test.py (new file, 101 lines)
@@ -0,0 +1,101 @@
"""Unit tests for the sglang backend.

Helper-level tests run without launching the gRPC server or loading model
weights — they only exercise the pure-Python helpers on
``BackendServicer``. They do still require ``sglang`` to be importable
because ``_apply_engine_args`` validates keys against
``ServerArgs``'s dataclass fields.
"""
import unittest


class TestSglangHelpers(unittest.TestCase):
    """Tests for the pure helpers on BackendServicer (no gRPC, no engine)."""

    def _servicer(self):
        import sys
        import os
        sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))
        from backend import BackendServicer  # noqa: E402
        return BackendServicer()

    def test_parse_options(self):
        servicer = self._servicer()
        opts = servicer._parse_options([
            "tool_parser:hermes",
            "reasoning_parser:deepseek_r1",
            "invalid_no_colon",
            "key_with_colons:a:b:c",
        ])
        self.assertEqual(opts["tool_parser"], "hermes")
        self.assertEqual(opts["reasoning_parser"], "deepseek_r1")
        self.assertEqual(opts["key_with_colons"], "a:b:c")
        self.assertNotIn("invalid_no_colon", opts)

    def test_apply_engine_args_known_keys(self):
        """User-supplied JSON merges into the kwargs dict; pre-set typed
        fields stay put when not overridden."""
        import json as _json
        servicer = self._servicer()
        base = {
            "model_path": "facebook/opt-125m",
            "mem_fraction_static": 0.7,
        }
        extras = _json.dumps({
            "trust_remote_code": True,
            "speculative_algorithm": "EAGLE",
            "speculative_num_steps": 1,
        })
        out = servicer._apply_engine_args(base, extras)
        self.assertIs(out, base)  # in-place merge — same dict back
        self.assertTrue(out["trust_remote_code"])
        self.assertEqual(out["speculative_algorithm"], "EAGLE")
        self.assertEqual(out["speculative_num_steps"], 1)
        self.assertEqual(out["model_path"], "facebook/opt-125m")
        self.assertEqual(out["mem_fraction_static"], 0.7)

    def test_apply_engine_args_engine_args_overrides_typed_fields(self):
        """engine_args wins over previously-set typed kwargs (vLLM precedence)."""
        import json as _json
        servicer = self._servicer()
        base = {"model_path": "facebook/opt-125m", "mem_fraction_static": 0.7}
        out = servicer._apply_engine_args(
            base, _json.dumps({"mem_fraction_static": 0.5}),
        )
        self.assertEqual(out["mem_fraction_static"], 0.5)

    def test_apply_engine_args_unknown_key_raises(self):
        """Typo'd key raises ValueError with a close-match suggestion."""
        import json as _json
        servicer = self._servicer()
        base = {"model_path": "facebook/opt-125m"}
        with self.assertRaises(ValueError) as ctx:
            servicer._apply_engine_args(
                base, _json.dumps({"trust_remotecode": True}),
            )
        msg = str(ctx.exception)
        self.assertIn("trust_remotecode", msg)
        self.assertIn("trust_remote_code", msg)

    def test_apply_engine_args_empty_passthrough(self):
        """Empty / None engine_args returns the kwargs dict untouched."""
        servicer = self._servicer()
        base = {"model_path": "facebook/opt-125m"}
        self.assertIs(servicer._apply_engine_args(base, ""), base)
        self.assertIs(servicer._apply_engine_args(base, None), base)

    def test_apply_engine_args_invalid_json_raises(self):
        servicer = self._servicer()
        with self.assertRaises(ValueError) as ctx:
            servicer._apply_engine_args({}, "not-json")
        self.assertIn("not valid JSON", str(ctx.exception))

    def test_apply_engine_args_non_object_raises(self):
        servicer = self._servicer()
        with self.assertRaises(ValueError) as ctx:
            servicer._apply_engine_args({}, "[1,2,3]")
        self.assertIn("must be a JSON object", str(ctx.exception))


if __name__ == "__main__":
    unittest.main()
backend/python/sglang/test.sh (new executable file, 12 lines)
@@ -0,0 +1,12 @@
#!/bin/bash
set -e

backend_dir=$(dirname $0)

if [ -d $backend_dir/common ]; then
    source $backend_dir/common/libbackend.sh
else
    source $backend_dir/../common/libbackend.sh
fi

runUnittests