feat(sglang): wire engine_args, add cuda13 build, ship MTP gallery demos (#9686)

Bring the sglang Python backend up to feature parity with vllm by adding
the same engine_args: map plumbing the vLLM backend already has. Any
ServerArgs field (~380 in sglang 0.5.11) becomes settable from a model
YAML, including the speculative-decoding flags needed for Multi-Token
Prediction. Validation matches the vllm backend's: keys are checked
against dataclasses.fields(ServerArgs), unknown keys raise ValueError
with a difflib close-match suggestion at LoadModel time, and the typed
ModelOptions fields keep their existing meaning with engine_args
overriding them.
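The validation pattern described above can be sketched standalone. This is an illustration only, using a toy stand-in dataclass so it runs without sglang installed; the real backend validates against `dataclasses.fields(ServerArgs)` from `sglang.srt.server_args`:

```python
import dataclasses
import difflib

# Toy stand-in for sglang's ServerArgs — illustration only. The real
# backend imports sglang.srt.server_args.ServerArgs (~380 fields).
@dataclasses.dataclass
class ToyServerArgs:
    model_path: str = ""
    trust_remote_code: bool = False
    mem_fraction_static: float = 0.85

def validate_engine_args(extra: dict) -> None:
    # Collect the set of legal keys from the dataclass definition.
    valid = {f.name for f in dataclasses.fields(ToyServerArgs)}
    for key in extra:
        if key not in valid:
            # Offer the closest legal key as a hint, if one is close enough.
            suggestion = difflib.get_close_matches(key, valid, n=1)
            hint = f" did you mean {suggestion[0]!r}?" if suggestion else ""
            raise ValueError(f"unknown engine_args key {key!r}.{hint}")

validate_engine_args({"trust_remote_code": True})  # passes silently
try:
    validate_engine_args({"trust_remotecode": True})
except ValueError as e:
    print(e)
    # → unknown engine_args key 'trust_remotecode'. did you mean 'trust_remote_code'?
```

The point of failing at validation time rather than forwarding unknown keys is that a typo surfaces as a one-line ValueError at LoadModel instead of a TypeError deep inside engine startup.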

Backend code:
* backend/python/sglang/backend.py: add _apply_engine_args, import
  dataclasses/difflib/ServerArgs, call from LoadModel; rename Seed ->
  sampling_seed (sglang 0.5.11 renamed the SamplingParams field).
* backend/python/sglang/test.py + test.sh + Makefile: six unit tests
  exercising the helper directly (no engine load required).

Build / CI / backend gallery (cuda13 + l4t13 paths are now first-class):
* backend/python/sglang/install.sh: add --prerelease=allow because
  sglang 0.5.11 hard-pins flash-attn-4 which only ships beta wheels;
  add --index-strategy=unsafe-best-match for cublas12 so the cu128
  torch index wins over default-PyPI's cu130; new pyproject.toml-driven
  l4t13 install path so [tool.uv.sources] can pin torch/torchvision/
  torchaudio/sglang to the jetson-ai-lab index without forcing every
  transitive PyPI dep through the L4T mirror's flaky proxy (mirrors the
  equivalent fix in backend/python/vllm/install.sh).
* backend/python/sglang/pyproject.toml (new): L4T project spec with
  explicit-source jetson-ai-lab index. Replaces requirements-l4t13.txt
  for the l4t13 BUILD_PROFILE; other profiles still go through the
  requirements-*.txt pipeline via libbackend.sh's installRequirements.
* backend/python/sglang/requirements-l4t13.txt: removed; superseded
  by pyproject.toml.
* backend/python/sglang/requirements-cublas{12,13}{,-after}.txt: pin
  sglang>=0.5.11 (Gemma 4 floor); add cu130 torch index for cublas13
  (new files) and cu128 torch index for cublas12 (default PyPI now
  ships cu130 torch wheels by default and breaks cu12 hosts).
* backend/index.yaml: add cuda13-sglang and cuda13-sglang-development
  capability mappings + image entries pointing at
  quay.io/.../-gpu-nvidia-cuda-13-sglang.
* .github/workflows/backend.yml: new cublas13 sglang matrix entry,
  mirroring vllm's cuda13 build.

Model gallery + docs:
* gallery/sglang.yaml: base sglang config template, mirrors vllm.yaml.
* gallery/sglang-gemma-4-{e2b,e4b}-mtp.yaml: Gemma 4 MTP demos
  transcribed verbatim from the SGLang Gemma 4 cookbook MTP commands.
* gallery/sglang-mimo-7b-mtp.yaml: MiMo-7B-RL with built-in MTP heads
  + online fp8 weight quantization, verified end-to-end on a 16 GB
  RTX 5070 Ti at ~88 tok/s. Uses mem_fraction_static: 0.7 because the
  MTP draft worker's vocab embedding is loaded unquantised and OOMs
  the static reservation at sglang's 0.85 default.
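To make the mem_fraction_static gotcha concrete, here is a back-of-envelope sketch. The vocab/hidden/dtype numbers are purely illustrative assumptions (the real sizes depend on the model and sglang version); only the 16 GB card and the 0.85 → 0.7 change come from the source:

```python
# Illustrative arithmetic: lowering mem_fraction_static leaves headroom
# outside sglang's static reservation for allocations the quantised
# weight budget does not cover (e.g. an unquantised draft-worker
# embedding). All model dimensions below are hypothetical.
GIB = 1024**3
total_vram = 16 * GIB                     # 16 GB-class card

static_default = 0.85 * total_vram        # sglang's default reservation
static_tuned = 0.70 * total_vram          # value used in the gallery entry

# Hypothetical unquantised fp16 vocab embedding for an MTP draft worker:
vocab, hidden, bytes_fp16 = 150_000, 4096, 2
draft_embedding = vocab * hidden * bytes_fp16     # ≈ 1.1 GiB extra

headroom_default = total_vram - static_default    # ≈ 2.4 GiB for everything else
headroom_tuned = total_vram - static_tuned        # ≈ 4.8 GiB

print(round(headroom_default / GIB, 1), round(headroom_tuned / GIB, 1))
# → 2.4 4.8
```

With the default 0.85 reservation, the extra embedding plus CUDA context, activations, and allocator overhead must squeeze into roughly 2.4 GiB; dropping to 0.7 doubles that headroom.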
* gallery/index.yaml: three new entries (gemma-4-e2b-it:sglang-mtp,
  gemma-4-e4b-it:sglang-mtp, mimo-7b-mtp:sglang).
* docs/content/features/text-generation.md: new SGLang section with
  setup, engine_args reference, MTP demos, version requirements.
* .agents/sglang-backend.md (new): agent one-pager covering the flat
  ServerArgs structure, the typed-vs-engine_args precedence, the
  speculative-decoding cheatsheet, and the mem_fraction_static gotcha
  documented above.
* AGENTS.md: index entry for the new agent doc.

Known limitation: the two Gemma 4 MTP gallery entries ship a recipe
that doesn't yet run on stock libraries. The drafter checkpoints
(google/gemma-4-{E2B,E4B}-it-assistant) declare
model_type: gemma4_assistant / Gemma4AssistantForCausalLM, which
neither transformers (<=5.6.0, including the SGLang cookbook's pinned
commit 91b1ab1f... and main HEAD) nor sglang's own model registry
(<=0.5.11) registers as of 2026-05-06. They will start working when
HF or sglang upstream registers the architecture -- no LocalAI
changes needed. The MiMo MTP demo and the non-MTP Gemma 4 paths work
today on this build (verified on RTX 5070 Ti, 16 GB).

Assisted-by: Claude:claude-opus-4-7 [Read] [Edit] [Bash] [WebFetch] [WebSearch]

Signed-off-by: Richard Palethorpe <io@richiejp.com>
Author: Richard Palethorpe
Committed: 2026-05-07 16:27:29 +01:00 (via GitHub)
Parent: 048daa0cdc
Commit: c894d9c826
21 changed files with 732 additions and 21 deletions


@@ -8,6 +8,12 @@ run: sglang
	bash run.sh
	@echo "sglang run."

.PHONY: test
test: sglang
	@echo "Testing sglang..."
	bash test.sh
	@echo "sglang tested."

.PHONY: protogen-clean
protogen-clean:
	$(RM) backend_pb2_grpc.py backend_pb2.py


@@ -9,10 +9,18 @@ The streaming path applies sglang's per-request FunctionCallParser and
ReasoningParser so tool_calls and reasoning_content are emitted
incrementally inside ChatDelta, which is a capability sglang exposes
natively and vLLM does not.
Like the vLLM backend, this one accepts an arbitrary ``engine_args:``
map in the model YAML; keys are validated against ``ServerArgs`` fields
and forwarded to ``Engine(**kwargs)``. That covers speculative decoding
(EAGLE/EAGLE3/DFLASH/NGRAM/STANDALONE plus MTP via NEXTN), attention
backend selection, MoE knobs, hierarchical cache, and so on.
"""
import asyncio
from concurrent import futures
import argparse
import dataclasses
import difflib
import signal
import sys
import os
@@ -38,6 +46,7 @@ from grpc_auth import get_auth_interceptors
# are wrapped in try/except so older / leaner installs that omit them
# still load the backend for plain text generation.
from sglang.srt.entrypoints.engine import Engine
from sglang.srt.server_args import ServerArgs
try:
    from sglang.srt.function_call.function_call_parser import FunctionCallParser
@@ -66,6 +75,19 @@ except Exception:
    HAS_TRANSFORMERS = False
# sglang 0.5.11 renamed SamplingParams.seed -> sampling_seed (PR #21952).
# Earlier 0.5.x releases (e.g. 0.5.1.post2 — the wheel still pinned by the
# pypi.jetson-ai-lab.io sbsa/cu130 mirror used by the l4t13 build profile)
# accept only `seed`. Detect the supported keyword once at import time so
# both versions work without a hard pin floor.
try:
    import inspect as _inspect
    from sglang.srt.sampling.sampling_params import SamplingParams as _SamplingParams
    _SEED_KEY = "sampling_seed" if "sampling_seed" in _inspect.signature(_SamplingParams).parameters else "seed"
except Exception:
    _SEED_KEY = "sampling_seed"
_ONE_DAY_IN_SECONDS = 60 * 60 * 24
MAX_WORKERS = int(os.environ.get('PYTHON_GRPC_MAX_WORKERS', '1'))
@@ -82,6 +104,37 @@ class BackendServicer(backend_pb2_grpc.BackendServicer):
                opts[key.strip()] = value.strip()
        return opts

    def _apply_engine_args(self, engine_kwargs: dict, engine_args_json: str) -> dict:
        """Merge user-supplied engine_args (JSON object) into the kwargs dict
        that will be forwarded to ``sglang.Engine`` (which constructs a
        ``ServerArgs`` from them).

        Mirrors ``backend/python/vllm/backend.py::_apply_engine_args`` but
        operates on the kwargs dict because sglang's ``Engine.__init__``
        accepts ``**kwargs`` directly rather than a pre-built dataclass.
        Validation happens against ``ServerArgs`` fields so a typo fails
        early with a close-match suggestion instead of producing a confusing
        ``TypeError`` deep inside engine startup.
        """
        if not engine_args_json:
            return engine_kwargs
        try:
            extra = json.loads(engine_args_json)
        except json.JSONDecodeError as e:
            raise ValueError(f"engine_args is not valid JSON: {e}") from e
        if not isinstance(extra, dict):
            raise ValueError(
                f"engine_args must be a JSON object, got {type(extra).__name__}"
            )
        valid = {f.name for f in dataclasses.fields(ServerArgs)}
        for key in extra:
            if key not in valid:
                suggestion = difflib.get_close_matches(key, valid, n=1)
                hint = f" did you mean {suggestion[0]!r}?" if suggestion else ""
                raise ValueError(f"unknown engine_args key {key!r}.{hint}")
        engine_kwargs.update(extra)
        return engine_kwargs

    def _messages_to_dicts(self, messages) -> List[dict]:
        result: List[dict] = []
        for msg in messages:
@@ -137,6 +190,16 @@ class BackendServicer(backend_pb2_grpc.BackendServicer):
        if self.reasoning_parser_name:
            engine_kwargs["reasoning_parser"] = self.reasoning_parser_name

        # engine_args from YAML overrides typed fields above so operators can
        # tune anything ServerArgs exposes (speculative decoding, attention
        # backend, MoE, hierarchical cache, …) without waiting on protobuf
        # changes.
        try:
            engine_kwargs = self._apply_engine_args(engine_kwargs, request.EngineArgs)
        except ValueError as err:
            print(f"engine_args error: {err}", file=sys.stderr)
            return backend_pb2.Result(success=False, message=str(err))

        try:
            self.llm = Engine(**engine_kwargs)
        except Exception as err:
@@ -221,7 +284,7 @@ class BackendServicer(backend_pb2_grpc.BackendServicer):
            "TopP": "top_p",
            "TopK": "top_k",
            "MinP": "min_p",
-           "Seed": "seed",
+           "Seed": _SEED_KEY,
            "StopPrompts": "stop",
            "StopTokenIds": "stop_token_ids",
            "IgnoreEOS": "ignore_eos",


@@ -23,17 +23,32 @@ if [ "x${BUILD_PROFILE}" == "xcpu" ]; then
    EXTRA_PIP_INSTALL_FLAGS+=" --index-strategy=unsafe-best-match"
fi

# cublas12 needs a cu128 torch index (see requirements-cublas12.txt) — without
# unsafe-best-match uv falls through to default PyPI's cu130 torch wheel and
# the resulting sgl-kernel can't load on our cu12 host libs.
if [ "x${BUILD_PROFILE}" == "xcublas12" ]; then
    EXTRA_PIP_INSTALL_FLAGS+=" --index-strategy=unsafe-best-match"
fi

# sglang 0.5.11 (Gemma 4 support) declares flash-attn-4 as a hard dep, but
# upstream only publishes pre-release wheels (4.0.0b*). uv rejects
# pre-releases by default — opt in for sglang specifically. Drop this once
# flash-attn-4 4.0 stable lands.
EXTRA_PIP_INSTALL_FLAGS+=" --prerelease=allow"

# JetPack 7 / L4T arm64 wheels are built for cp312 and shipped via
# pypi.jetson-ai-lab.io. Bump the venv Python so the prebuilt sglang
-# wheel resolves cleanly. unsafe-best-match is required because the
-# jetson-ai-lab index lists transitive deps (e.g. decord) at older
-# versions only — without it uv refuses to fall through to PyPI for a
-# compatible wheel and resolution fails.
+# wheel resolves cleanly. The actual install on l4t13 goes through
+# pyproject.toml (see the elif branch below) so [tool.uv.sources] can
+# pin only torch/torchvision/torchaudio/sglang to the jetson-ai-lab
+# index — leaving PyPI as the path for transitive deps like
+# markdown-it-py / anthropic / propcache that the L4T mirror's proxy
+# 503s on. No --index-strategy flag here: the explicit index keeps the
+# scoping clean.
if [ "x${BUILD_PROFILE}" == "xl4t13" ]; then
    PYTHON_VERSION="3.12"
    PYTHON_PATCH="12"
    PY_STANDALONE_TAG="20251120"
-    EXTRA_PIP_INSTALL_FLAGS+=" --index-strategy=unsafe-best-match"
fi
# sglang's CPU path has no prebuilt wheel on PyPI — upstream publishes
@@ -95,6 +110,27 @@ if [ "x${BUILD_TYPE}" == "x" ] || [ "x${FROM_SOURCE:-}" == "xtrue" ]; then
    fi
    uv pip install ${EXTRA_PIP_INSTALL_FLAGS:-} .
    popd
# L4T arm64 (JetPack 7): drive the install through pyproject.toml so that
# [tool.uv.sources] can pin torch/torchvision/torchaudio/sglang to the
# jetson-ai-lab index, while everything else (transitive deps and
# PyPI-resolvable packages like transformers / accelerate) comes from
# PyPI. Bypasses installRequirements because uv pip install -r
# requirements.txt does not honor sources — see
# backend/python/sglang/pyproject.toml for the rationale. Mirrors the
# equivalent path in backend/python/vllm/install.sh.
elif [ "x${BUILD_PROFILE}" == "xl4t13" ]; then
    ensureVenv
    if [ "x${PORTABLE_PYTHON}" == "xtrue" ]; then
        export C_INCLUDE_PATH="${C_INCLUDE_PATH:-}:$(_portable_dir)/include/python${PYTHON_VERSION}"
    fi
    pushd "${backend_dir}"
    # Build deps first (matches installRequirements' requirements-install.txt
    # pass — sglang/sgl-kernel sdists need packaging/setuptools-scm in the
    # venv before they can build under --no-build-isolation).
    uv pip install ${EXTRA_PIP_INSTALL_FLAGS:-} -r requirements-install.txt
    uv pip install ${EXTRA_PIP_INSTALL_FLAGS:-} --requirement pyproject.toml
    popd
    runProtogen
else
    installRequirements
fi


@@ -0,0 +1,68 @@
# L4T arm64 (JetPack 7 / sbsa cu130) install spec for the sglang backend.
#
# Why this file exists, and why only the l4t13 BUILD_PROFILE consumes it:
#
# pypi.jetson-ai-lab.io hosts the L4T-specific torch / sglang / sgl-kernel
# wheels we need on aarch64 + cuda13, but it ALSO transparently proxies the
# rest of PyPI through `/+f/<sha>/<filename>` URLs that 503 frequently.
# With `--extra-index-url` + `--index-strategy=unsafe-best-match` (the
# historical fix in install.sh) uv would pick those proxy URLs for ordinary
# PyPI packages — markdown-it-py, anthropic, propcache, etc. — and trip on
# the 503s. See e.g. CI run 25439791228 (markdown-it-py-4.0.0).
#
# `explicit = true` on the index makes uv consult the L4T mirror ONLY for
# packages mapped under [tool.uv.sources]. Everything else goes to PyPI.
# This breaks the historical 503 path without losing access to the L4T
# wheels we actually need from there. Mirrors the equivalent fix already
# in backend/python/vllm/pyproject.toml.
#
# `uv pip install -r requirements.txt` does NOT honor [tool.uv.sources]
# (sources are project-mode only, not pip-compat mode), so install.sh's
# l4t13 branch invokes `uv pip install --requirement pyproject.toml`
# directly. Other BUILD_PROFILEs continue to use the requirements-*.txt
# pipeline through libbackend.sh's installRequirements and never read
# this file.
[project]
name = "localai-sglang-l4t13"
version = "0.0.0"
requires-python = ">=3.12,<3.13"
dependencies = [
    # Mirror of requirements.txt — kept in sync manually for now since the
    # l4t13 path bypasses installRequirements (see install.sh).
    "grpcio==1.80.0",
    "protobuf",
    "certifi",
    "setuptools",
    "pillow",
    # L4T-specific accelerator stack (sourced from jetson-ai-lab below).
    "torch",
    "torchvision",
    "torchaudio",
    # sglang on jetson — the [all] extra is deliberately omitted because it
    # pulls outlines/decord, and decord has no aarch64 cp312 wheel anywhere
    # (not on PyPI, and the jetson-ai-lab index ships only legacy cp35-cp37
    # wheels). With [all] uv backtracks through versions trying to satisfy
    # decord and lands on sglang==0.1.16. The 0.5.0 floor matches the only
    # major series the jetson-ai-lab sbsa/cu130 mirror currently publishes
    # (sglang==0.5.1.post2 as of 2026-05-06). Bumping to >=0.5.11 here
    # would make the build unsatisfiable until the mirror catches up.
    # Gemma 4 / MTP recipes are therefore not supported on l4t13 — those
    # features land on cublas12/cublas13 hosts that pull the newer wheel
    # from PyPI. backend.py keeps backward compat with the 0.5.x
    # SamplingParams field rename via runtime detection.
    "sglang>=0.5.0",
    # PyPI-resolvable packages that complete the runtime.
    "accelerate",
    "transformers",
]

[[tool.uv.index]]
name = "jetson-ai-lab"
url = "https://pypi.jetson-ai-lab.io/sbsa/cu130"
explicit = true

[tool.uv.sources]
torch = { index = "jetson-ai-lab" }
torchvision = { index = "jetson-ai-lab" }
torchaudio = { index = "jetson-ai-lab" }
sglang = { index = "jetson-ai-lab" }


@@ -1,3 +1,4 @@
# Bump this pin deliberately — sglang releases weekly and API surfaces
# (FunctionCallParser, ReasoningParser) move between releases.
-sglang[all]>=0.4.0
+# 0.5.11 is the floor for Gemma 4 support (PR sgl-project/sglang#21952).
+sglang[all]>=0.5.11


@@ -1,5 +1,12 @@
# sglang 0.5.11 hard-pins torch==2.9.1. PyPI's default torch 2.9.1 wheel is
# now the cu130 build, which drags in cu130-flavoured sgl-kernel/sglang-kernel
# binaries that need libnvrtc.so.13 — incompatible with our cu12 host libs.
# Pin the cu128 PyTorch index so uv pulls cu12-flavoured torch (and the
# matching sgl-kernel cu12 wheels). install.sh adds --index-strategy=unsafe-best-match
# for cublas12 so uv consults this index alongside PyPI.
--extra-index-url https://download.pytorch.org/whl/cu128
accelerate
-torch==2.7.1
+torch==2.9.1
torchvision
-torchaudio==2.7.1
+torchaudio
transformers
transformers


@@ -0,0 +1,4 @@
# Bump this pin deliberately — sglang releases weekly and API surfaces
# (FunctionCallParser, ReasoningParser) move between releases.
# 0.5.11 is the floor for Gemma 4 support (PR sgl-project/sglang#21952).
sglang[all]>=0.5.11


@@ -0,0 +1,6 @@
--extra-index-url https://download.pytorch.org/whl/cu130
accelerate
torch
torchvision
torchaudio
transformers


@@ -1,12 +0,0 @@
--extra-index-url https://pypi.jetson-ai-lab.io/sbsa/cu130
accelerate
torch
torchvision
torchaudio
transformers
# Drop the [all] extra: it pulls outlines/decord, and decord has no
# aarch64 cp312 wheel anywhere (not on PyPI, and the jetson-ai-lab index
# ships only legacy cp35-cp37 wheels). With [all] uv backtracks through
# versions trying to satisfy decord and lands on sglang==0.1.16. Floor
# at 0.5.0 so uv can't silently downgrade if a future resolution misfires.
sglang>=0.5.0


@@ -0,0 +1,101 @@
"""Unit tests for the sglang backend.

Helper-level tests run without launching the gRPC server or loading model
weights — they only exercise the pure-Python helpers on
``BackendServicer``. They do still require ``sglang`` to be importable
because ``_apply_engine_args`` validates keys against
``ServerArgs``'s dataclass fields.
"""
import unittest


class TestSglangHelpers(unittest.TestCase):
    """Tests for the pure helpers on BackendServicer (no gRPC, no engine)."""

    def _servicer(self):
        import sys
        import os
        sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))
        from backend import BackendServicer  # noqa: E402
        return BackendServicer()

    def test_parse_options(self):
        servicer = self._servicer()
        opts = servicer._parse_options([
            "tool_parser:hermes",
            "reasoning_parser:deepseek_r1",
            "invalid_no_colon",
            "key_with_colons:a:b:c",
        ])
        self.assertEqual(opts["tool_parser"], "hermes")
        self.assertEqual(opts["reasoning_parser"], "deepseek_r1")
        self.assertEqual(opts["key_with_colons"], "a:b:c")
        self.assertNotIn("invalid_no_colon", opts)

    def test_apply_engine_args_known_keys(self):
        """User-supplied JSON merges into the kwargs dict; pre-set typed
        fields stay put when not overridden."""
        import json as _json
        servicer = self._servicer()
        base = {
            "model_path": "facebook/opt-125m",
            "mem_fraction_static": 0.7,
        }
        extras = _json.dumps({
            "trust_remote_code": True,
            "speculative_algorithm": "EAGLE",
            "speculative_num_steps": 1,
        })
        out = servicer._apply_engine_args(base, extras)
        self.assertIs(out, base)  # in-place merge — same dict back
        self.assertTrue(out["trust_remote_code"])
        self.assertEqual(out["speculative_algorithm"], "EAGLE")
        self.assertEqual(out["speculative_num_steps"], 1)
        self.assertEqual(out["model_path"], "facebook/opt-125m")
        self.assertEqual(out["mem_fraction_static"], 0.7)

    def test_apply_engine_args_engine_args_overrides_typed_fields(self):
        """engine_args wins over previously-set typed kwargs (vLLM precedence)."""
        import json as _json
        servicer = self._servicer()
        base = {"model_path": "facebook/opt-125m", "mem_fraction_static": 0.7}
        out = servicer._apply_engine_args(
            base, _json.dumps({"mem_fraction_static": 0.5}),
        )
        self.assertEqual(out["mem_fraction_static"], 0.5)

    def test_apply_engine_args_unknown_key_raises(self):
        """Typo'd key raises ValueError with a close-match suggestion."""
        import json as _json
        servicer = self._servicer()
        base = {"model_path": "facebook/opt-125m"}
        with self.assertRaises(ValueError) as ctx:
            servicer._apply_engine_args(
                base, _json.dumps({"trust_remotecode": True}),
            )
        msg = str(ctx.exception)
        self.assertIn("trust_remotecode", msg)
        self.assertIn("trust_remote_code", msg)

    def test_apply_engine_args_empty_passthrough(self):
        """Empty / None engine_args returns the kwargs dict untouched."""
        servicer = self._servicer()
        base = {"model_path": "facebook/opt-125m"}
        self.assertIs(servicer._apply_engine_args(base, ""), base)
        self.assertIs(servicer._apply_engine_args(base, None), base)

    def test_apply_engine_args_invalid_json_raises(self):
        servicer = self._servicer()
        with self.assertRaises(ValueError) as ctx:
            servicer._apply_engine_args({}, "not-json")
        self.assertIn("not valid JSON", str(ctx.exception))

    def test_apply_engine_args_non_object_raises(self):
        servicer = self._servicer()
        with self.assertRaises(ValueError) as ctx:
            servicer._apply_engine_args({}, "[1,2,3]")
        self.assertIn("must be a JSON object", str(ctx.exception))


if __name__ == "__main__":
    unittest.main()

backend/python/sglang/test.sh (new executable file, 12 lines)

@@ -0,0 +1,12 @@
#!/bin/bash
set -e

backend_dir=$(dirname $0)

if [ -d $backend_dir/common ]; then
    source $backend_dir/common/libbackend.sh
else
    source $backend_dir/../common/libbackend.sh
fi

runUnittests