From 5cda4f1ccfe11c049ee811fa07249fde28d84c63 Mon Sep 17 00:00:00 2001
From: "LocalAI [bot]" <139863280+localai-bot@users.noreply.github.com>
Date: Fri, 22 May 2026 23:01:22 +0200
Subject: [PATCH] fix(L4T13 backends): switch vllm/sglang/vllm-omni to PyPI aarch64+cu130 wheels (#9950)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* fix(vllm): switch L4T13 backend to PyPI aarch64+cu130 wheels

The L4T13 vllm backend pulled torch / torchvision / torchaudio / vllm
from pypi.jetson-ai-lab.io's sbsa/cu130 mirror via [tool.uv.sources]
with no version pins. That mirror started shipping torch 2.11.0 next to
a vllm-0.20.0+cu130 wheel that was still compiled against torch 2.10's
c10 ABI, so uv landed on the mismatched pair and vllm crashed at import:

  ImportError: vllm/_C.abi3.so: undefined symbol: _ZN3c1013MessageLoggerC1EPKciib

(c10::MessageLogger's constructor signature changed between torch 2.10
and 2.11; the vllm wheel referenced the 2.10 form, the installed
libc10.so exported only the 2.11 form.)

Since torch 2.11 (April 2026) PyPI publishes its own aarch64 + cu130
manylinux wheels, and vllm 0.20.0 ships an aarch64 wheel whose
Requires-Dist locks torch==2.11.0 / torchvision==0.26.0 /
torchaudio==2.11.0. That makes uv's resolver produce an ABI-consistent
set automatically, so the mirror and the [tool.uv.sources] pinning are
no longer needed.

flash-attn is dropped from the dep list: PyPI has no aarch64 wheel, but
vLLM 0.20+ already bundles its own vllm_flash_attn (fa2 + fa3) inside
the main wheel, so the Dao-AILab package isn't required at runtime.
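The Requires-Dist pinning described above can be sanity-checked after an
install. The following is a minimal sketch and not part of this change:
the helper names (pinned_requirements, abi_mismatches, check_installed)
are hypothetical, and it assumes that comparing the public part of the
version (ignoring a local suffix such as +cu130) is a good enough proxy
for ABI agreement.

```python
import re
from importlib import metadata

# Matches an exact pin at the start of a Requires-Dist entry, e.g.
# "torch==2.11.0" or "torch (==2.11.0); platform_machine == 'aarch64'".
_PIN = re.compile(r"^\s*([A-Za-z0-9_.-]+)\s*\(?\s*==\s*([^\s;,)]+)")


def pinned_requirements(requires_dist):
    """Return {name: version} for every exact `name==version` entry."""
    pins = {}
    for entry in requires_dist:
        match = _PIN.match(entry)
        if match is None:
            continue  # unpinned, ranged, or extra-only requirement
        name, version = match.group(1).lower(), match.group(2)
        if "*" in version:
            continue  # wildcard pins are out of scope for this sketch
        pins[name] = version
    return pins


def abi_mismatches(pins, installed):
    """List (name, wanted, found) where the installed version differs.

    Local version suffixes such as "+cu130" are ignored (assumption).
    """
    problems = []
    for name, wanted in pins.items():
        found = installed.get(name)
        if found is None:
            continue  # not installed, nothing to compare
        if found.split("+", 1)[0] != wanted.split("+", 1)[0]:
            problems.append((name, wanted, found))
    return problems


def check_installed(dist_name):
    """Compare an installed distribution's exact pins with the venv.

    Raises importlib.metadata.PackageNotFoundError if dist_name itself
    is not installed.
    """
    pins = pinned_requirements(metadata.requires(dist_name) or [])
    installed = {}
    for name in pins:
        try:
            installed[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            pass
    return abi_mismatches(pins, installed)
```

Running check_installed("vllm") inside the backend venv would be
expected to return an empty list when the resolver picked a consistent
set; a non-empty result only points at a version disagreement and does
not by itself prove an ABI break.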
Reference: https://pytorch.org/blog/vllm-and-pytorch-work-together-to-improve-the-developer-experience-on-aarch64/

Assisted-by: Claude:claude-opus-4-7 [Read] [Edit] [Write] [Bash] [WebFetch]
Signed-off-by: Ettore Di Giacinto

* refactor(vllm): retire l4t13 pyproject.toml in favor of requirements-*.txt

pyproject.toml only existed because uv pip install -r requirements.txt
doesn't honor [tool.uv.sources]. The previous commit dropped
[tool.uv.sources] (PyPI now serves the aarch64 + cu130 wheels directly),
so the file no longer carries any logic the requirements-*.txt path
can't.

Replace with the same two-file pattern every other build profile uses:

- requirements-l4t13.txt (accelerate / torch / transformers /
  bitsandbytes - matches cublas13's split)
- requirements-l4t13-after.txt (vllm; runs after the base resolve so the
  cu130 torch wheel lands first)

install.sh's whole l4t13 elif branch goes away; libbackend.sh's
installRequirements already handles the requirements-install.txt
build-deps pass, the C_INCLUDE_PATH export for PORTABLE_PYTHON, and the
runProtogen call, so falling through to the standard else: branch
produces identical install behavior with less surface area.

No functional change at install time - same wheels, same order.

Assisted-by: Claude:claude-opus-4-7 [Read] [Edit] [Write] [Bash]
Signed-off-by: Ettore Di Giacinto

* fix(sglang,vllm-omni): switch L4T13 backends to PyPI aarch64+cu130 wheels

Same root cause and same fix as the vllm backend in the previous
commits: the L4T13 sglang and vllm-omni backends both pulled their
accelerator stack from pypi.jetson-ai-lab.io's sbsa/cu130 mirror with no
version pins, so they would silently land on the same torch 2.11 vs
cu130-built wheel ABI mismatch the moment the mirror published an
out-of-sync pair.

sglang
------
- Drop pyproject.toml + [tool.uv.sources].
  The historical comment said the [all] extra was unsafe on aarch64
  because of decord, but sglang 0.5.x now uses `decord2` on
  aarch64/arm/armv7l (which ships cp312 aarch64 wheels), so we can match
  cublas13's sglang[all]>=0.5.11 pin and stop being capped at the
  0.5.1.post2 the L4T mirror shipped. That unblocks Gemma 4 / MTP
  recipes on Jetson Thor.
- New requirements-l4t13.txt mirrors the cublas13 split (accelerate /
  torch / torchvision / torchaudio / transformers),
  requirements-l4t13-after.txt carries sglang[all]>=0.5.11.
- install.sh's l4t13 elif branch goes away; falls through to the
  standard installRequirements path.

vllm-omni
---------
- requirements-l4t13.txt drops --extra-index-url to jetson-ai-lab and
  drops flash-attn (PyPI has no aarch64 wheel, vLLM 0.20+ bundles its
  own vllm_flash_attn fa2 + fa3 internally).
- install.sh's l4t13 vllm-install branch collapses into the cublas13
  branch since both now just run `pip install vllm --torch-backend=auto`
  against PyPI.
- --index-strategy=unsafe-best-match is dropped from the top-level l4t13
  guard; without the L4T mirror in the picture it had no purpose.

The from-source vllm-omni install on top still keeps its existing
`sed -i '/^fa3-fwd[[:space:]]*==/d' requirements/cuda.txt` workaround -
fa3-fwd has no aarch64 wheel and no sdist, unrelated to flash-attn.

Reference: https://pytorch.org/blog/vllm-and-pytorch-work-together-to-improve-the-developer-experience-on-aarch64/

Assisted-by: Claude:claude-opus-4-7 [Read] [Edit] [Write] [Bash] [WebFetch]
Signed-off-by: Ettore Di Giacinto

* fix(sglang): drop [all] extra on l4t13 - xatlas has no aarch64 wheel

CI revealed that sglang[all]==0.5.12 transitively pulls xatlas via the
[diffusion] sub-extra, and xatlas ships no aarch64 wheel. Its sdist
depends on scikit_build_core without declaring it in
build-system.requires, so under --no-build-isolation uv can't build it
from source:

  × Failed to build `xatlas==0.0.11`
  ├─▶ The build backend returned an error
  ╰─▶ Call to `scikit_build_core.build.build_wheel` failed (exit status: 1)
      ModuleNotFoundError: No module named 'scikit_build_core'
      help: `xatlas` (v0.0.11) was included because `sglang[all]` (v0.5.12) depends on `xatlas`

Upstream sglang explicitly gates st_attn and vsa on
`platform_machine != aarch64` inside the same [diffusion] extra but
forgot xatlas - same class of bug that bit the old decord pin.

Use plain `sglang>=0.5.11` on l4t13. backend.py imports only base
sglang.srt symbols (Engine, ServerArgs, FunctionCallParser,
ReasoningParser); the [all] extras are optional accelerators not
required at import time. cublas13 (x86_64) keeps [all] because xatlas
has x86_64 wheels there.

Assisted-by: Claude:claude-opus-4-7 [Read] [Edit] [Write] [Bash]
Signed-off-by: Ettore Di Giacinto

---------

Signed-off-by: Ettore Di Giacinto
Co-authored-by: Ettore Di Giacinto
---
 backend/python/sglang/install.sh | 35 ++--------
 backend/python/sglang/pyproject.toml | 68 -------------------
 .../sglang/requirements-l4t13-after.txt | 15 ++++
 backend/python/sglang/requirements-l4t13.txt | 9 +++
 backend/python/vllm-omni/install.sh | 25 +++----
 .../python/vllm-omni/requirements-l4t13.txt | 8 ++-
 backend/python/vllm/install.sh | 32 ++-------
 backend/python/vllm/pyproject.toml | 61 -----------------
 .../python/vllm/requirements-l4t13-after.txt | 4 ++
 backend/python/vllm/requirements-l4t13.txt | 8 +++
 10 files changed, 61 insertions(+), 204 deletions(-)
 delete mode 100644 backend/python/sglang/pyproject.toml
 create mode 100644 backend/python/sglang/requirements-l4t13-after.txt
 create mode 100644 backend/python/sglang/requirements-l4t13.txt
 delete mode 100644 backend/python/vllm/pyproject.toml
 create mode 100644 backend/python/vllm/requirements-l4t13-after.txt
 create mode 100644 backend/python/vllm/requirements-l4t13.txt

diff --git
a/backend/python/sglang/install.sh b/backend/python/sglang/install.sh index d7108d85f..928f7bd11 100755 --- a/backend/python/sglang/install.sh +++ b/backend/python/sglang/install.sh @@ -36,15 +36,11 @@ fi # flash-attn-4 4.0 stable lands. EXTRA_PIP_INSTALL_FLAGS+=" --prerelease=allow" -# JetPack 7 / L4T arm64 wheels are built for cp312 and shipped via -# pypi.jetson-ai-lab.io. Bump the venv Python so the prebuilt sglang -# wheel resolves cleanly. The actual install on l4t13 goes through -# pyproject.toml (see the elif branch below) so [tool.uv.sources] can -# pin only torch/torchvision/torchaudio/sglang to the jetson-ai-lab -# index — leaving PyPI as the path for transitive deps like -# markdown-it-py / anthropic / propcache that the L4T mirror's proxy -# 503s on. No --index-strategy flag here: the explicit index keeps the -# scoping clean. +# JetPack 7 / L4T arm64 sglang + torch wheels come straight from PyPI now +# (torch 2.11+ ships aarch64 + cu130 manylinux wheels and sglang 0.5.11+ +# ships a cp312 aarch64 wheel pinned to that torch). They're cp312-only, +# so bump the venv Python accordingly. +# https://pytorch.org/blog/vllm-and-pytorch-work-together-to-improve-the-developer-experience-on-aarch64/ if [ "x${BUILD_PROFILE}" == "xl4t13" ]; then PYTHON_VERSION="3.12" PYTHON_PATCH="12" @@ -110,27 +106,6 @@ if [ "x${BUILD_TYPE}" == "x" ] || [ "x${FROM_SOURCE:-}" == "xtrue" ]; then fi uv pip install ${EXTRA_PIP_INSTALL_FLAGS:-} . popd -# L4T arm64 (JetPack 7): drive the install through pyproject.toml so that -# [tool.uv.sources] can pin torch/torchvision/torchaudio/sglang to the -# jetson-ai-lab index, while everything else (transitive deps and -# PyPI-resolvable packages like transformers / accelerate) comes from -# PyPI. Bypasses installRequirements because uv pip install -r -# requirements.txt does not honor sources — see -# backend/python/sglang/pyproject.toml for the rationale. Mirrors the -# equivalent path in backend/python/vllm/install.sh. 
-elif [ "x${BUILD_PROFILE}" == "xl4t13" ]; then - ensureVenv - if [ "x${PORTABLE_PYTHON}" == "xtrue" ]; then - export C_INCLUDE_PATH="${C_INCLUDE_PATH:-}:$(_portable_dir)/include/python${PYTHON_VERSION}" - fi - pushd "${backend_dir}" - # Build deps first (matches installRequirements' requirements-install.txt - # pass — sglang/sgl-kernel sdists need packaging/setuptools-scm in the - # venv before they can build under --no-build-isolation). - uv pip install ${EXTRA_PIP_INSTALL_FLAGS:-} -r requirements-install.txt - uv pip install ${EXTRA_PIP_INSTALL_FLAGS:-} --requirement pyproject.toml - popd - runProtogen else installRequirements fi diff --git a/backend/python/sglang/pyproject.toml b/backend/python/sglang/pyproject.toml deleted file mode 100644 index 9f061f2b8..000000000 --- a/backend/python/sglang/pyproject.toml +++ /dev/null @@ -1,68 +0,0 @@ -# L4T arm64 (JetPack 7 / sbsa cu130) install spec for the sglang backend. -# -# Why this file exists, and why only the l4t13 BUILD_PROFILE consumes it: -# -# pypi.jetson-ai-lab.io hosts the L4T-specific torch / sglang / sgl-kernel -# wheels we need on aarch64 + cuda13, but it ALSO transparently proxies the -# rest of PyPI through `/+f//` URLs that 503 frequently. -# With `--extra-index-url` + `--index-strategy=unsafe-best-match` (the -# historical fix in install.sh) uv would pick those proxy URLs for ordinary -# PyPI packages — markdown-it-py, anthropic, propcache, etc. — and trip on -# the 503s. See e.g. CI run 25439791228 (markdown-it-py-4.0.0). -# -# `explicit = true` on the index makes uv consult the L4T mirror ONLY for -# packages mapped under [tool.uv.sources]. Everything else goes to PyPI. -# This breaks the historical 503 path without losing access to the L4T -# wheels we actually need from there. Mirrors the equivalent fix already -# in backend/python/vllm/pyproject.toml. 
-# -# `uv pip install -r requirements.txt` does NOT honor [tool.uv.sources] -# (sources are project-mode only, not pip-compat mode), so install.sh's -# l4t13 branch invokes `uv pip install --requirement pyproject.toml` -# directly. Other BUILD_PROFILEs continue to use the requirements-*.txt -# pipeline through libbackend.sh's installRequirements and never read -# this file. -[project] -name = "localai-sglang-l4t13" -version = "0.0.0" -requires-python = ">=3.12,<3.13" -dependencies = [ - # Mirror of requirements.txt — kept in sync manually for now since the - # l4t13 path bypasses installRequirements (see install.sh). - "grpcio==1.80.0", - "protobuf", - "certifi", - "setuptools", - "pillow", - # L4T-specific accelerator stack (sourced from jetson-ai-lab below). - "torch", - "torchvision", - "torchaudio", - # sglang on jetson — the [all] extra is deliberately omitted because it - # pulls outlines/decord, and decord has no aarch64 cp312 wheel anywhere - # (PyPI nor the jetson-ai-lab index ships only legacy cp35-cp37). With - # [all] uv backtracks through versions trying to satisfy decord and - # lands on sglang==0.1.16. The 0.5.0 floor matches the only major - # series the jetson-ai-lab sbsa/cu130 mirror currently publishes - # (sglang==0.5.1.post2 as of 2026-05-06). Bumping to >=0.5.11 here - # would make the build unsatisfiable until the mirror catches up. - # Gemma 4 / MTP recipes are therefore not supported on l4t13 — those - # features land on cublas12/cublas13 hosts that pull the newer wheel - # from PyPI. backend.py keeps backward compat with the 0.5.x SamplingParams - # field rename via runtime detection. - "sglang>=0.5.0", - # PyPI-resolvable packages that complete the runtime. 
- "accelerate", - "transformers", -] - -[[tool.uv.index]] -name = "jetson-ai-lab" -url = "https://pypi.jetson-ai-lab.io/sbsa/cu130" -explicit = true - -[tool.uv.sources] -torch = { index = "jetson-ai-lab" } -torchvision = { index = "jetson-ai-lab" } -torchaudio = { index = "jetson-ai-lab" } -sglang = { index = "jetson-ai-lab" } diff --git a/backend/python/sglang/requirements-l4t13-after.txt b/backend/python/sglang/requirements-l4t13-after.txt new file mode 100644 index 000000000..fc2ca2030 --- /dev/null +++ b/backend/python/sglang/requirements-l4t13-after.txt @@ -0,0 +1,15 @@ +# sglang 0.5.11+ ships an aarch64 manylinux wheel on PyPI whose Requires-Dist +# pins torch==2.11.0 / torchaudio==2.11.0, locking an ABI-consistent set with +# the cu130 torch wheel installed above. 0.5.11 is the floor for Gemma 4 +# support (sgl-project/sglang#21952). +# +# The [all] extra is deliberately NOT used on aarch64: it pulls the +# [diffusion] sub-extra which requires `xatlas`, and xatlas ships no +# aarch64 wheel and its sdist depends on scikit_build_core without +# declaring it in build-system.requires — so under --no-build-isolation +# uv can't build it. Upstream sglang gates st_attn and vsa on +# platform_machine != aarch64 in the diffusion extra but forgot xatlas. +# Plain `sglang` carries everything backend.py uses (Engine, ServerArgs, +# FunctionCallParser, ReasoningParser); the [all] extras are optional +# accelerators not required at import time. +sglang>=0.5.11 diff --git a/backend/python/sglang/requirements-l4t13.txt b/backend/python/sglang/requirements-l4t13.txt new file mode 100644 index 000000000..73df815a1 --- /dev/null +++ b/backend/python/sglang/requirements-l4t13.txt @@ -0,0 +1,9 @@ +# JetPack 7 / L4T arm64 + CUDA 13. Since PyTorch 2.11 (April 2026), PyPI ships +# aarch64 + cu130 manylinux wheels for torch/torchvision/torchaudio directly, +# so we no longer need a custom --extra-index-url for the L4T mirror. 
+# https://pytorch.org/blog/vllm-and-pytorch-work-together-to-improve-the-developer-experience-on-aarch64/ +accelerate +torch +torchvision +torchaudio +transformers diff --git a/backend/python/vllm-omni/install.sh b/backend/python/vllm-omni/install.sh index 8823948ec..b6a0c0dcd 100755 --- a/backend/python/vllm-omni/install.sh +++ b/backend/python/vllm-omni/install.sh @@ -13,14 +13,14 @@ else fi # Handle l4t build profiles (Python 3.12, pip fallback) if needed. -# unsafe-best-match is required on l4t13 because the jetson-ai-lab index -# lists transitive deps at limited versions — without it uv pins to the -# first matching index and fails to resolve a compatible wheel from PyPI. +# Since PyTorch 2.11 (April 2026) PyPI ships aarch64 + cu130 manylinux wheels +# directly for torch/torchvision/torchaudio and an aarch64 vllm wheel pinned +# to that torch, so the jetson-ai-lab mirror is no longer needed. +# https://pytorch.org/blog/vllm-and-pytorch-work-together-to-improve-the-developer-experience-on-aarch64/ if [ "x${BUILD_PROFILE}" == "xl4t13" ]; then PYTHON_VERSION="3.12" PYTHON_PATCH="12" PY_STANDALONE_TAG="20251120" - EXTRA_PIP_INSTALL_FLAGS="${EXTRA_PIP_INSTALL_FLAGS:-} --index-strategy=unsafe-best-match" fi if [ "x${BUILD_PROFILE}" == "xl4t12" ]; then @@ -42,18 +42,11 @@ if [ "x${BUILD_TYPE}" == "xhipblas" ]; then else uv pip install vllm==0.14.0 --extra-index-url https://wheels.vllm.ai/rocm/0.14.0/rocm700 fi -elif [ "x${BUILD_PROFILE}" == "xl4t13" ]; then - # JetPack 7 / L4T arm64 cu130 — vllm comes from the prebuilt SBSA wheel - # at jetson-ai-lab. Version is unpinned: the index ships whatever build - # matches the cu130/cp312 ABI. unsafe-best-match lets uv fall through - # to PyPI for transitive deps not present on the jetson-ai-lab index. 
- if [ "x${USE_PIP}" == "xtrue" ]; then - pip install vllm --extra-index-url https://pypi.jetson-ai-lab.io/sbsa/cu130 - else - uv pip install --index-strategy=unsafe-best-match vllm --extra-index-url https://pypi.jetson-ai-lab.io/sbsa/cu130 - fi -elif [ "x${BUILD_PROFILE}" == "xcublas13" ]; then - # vllm 0.19+ defaults to cu130 wheels on PyPI, no extra index needed. +elif [ "x${BUILD_PROFILE}" == "xcublas13" ] || [ "x${BUILD_PROFILE}" == "xl4t13" ]; then + # cublas13 (x86_64) and l4t13 (aarch64) both pull vllm from PyPI now: + # vllm 0.19+ defaults to cu130 wheels on x86_64 and vllm 0.20+ ships an + # aarch64 manylinux wheel pinned to torch==2.11.0. No extra index needed + # in either case. if [ "x${USE_PIP}" == "xtrue" ]; then pip install vllm --torch-backend=auto else diff --git a/backend/python/vllm-omni/requirements-l4t13.txt b/backend/python/vllm-omni/requirements-l4t13.txt index ff6f8e5b7..da422726b 100644 --- a/backend/python/vllm-omni/requirements-l4t13.txt +++ b/backend/python/vllm-omni/requirements-l4t13.txt @@ -1,11 +1,15 @@ ---extra-index-url https://pypi.jetson-ai-lab.io/sbsa/cu130 +# JetPack 7 / L4T arm64 + CUDA 13. PyPI ships aarch64 + cu130 manylinux wheels +# for torch/torchvision/torchaudio directly since PyTorch 2.11 (April 2026), +# so no custom index is needed. flash-attn is dropped here: PyPI has no +# aarch64 wheel for it, but vLLM 0.20+ bundles its own vllm_flash_attn +# (fa2 + fa3) inside the main wheel, so it is not required at runtime. 
+# https://pytorch.org/blog/vllm-and-pytorch-work-together-to-improve-the-developer-experience-on-aarch64/ accelerate torch torchvision torchaudio transformers bitsandbytes -flash-attn diffusers librosa soundfile diff --git a/backend/python/vllm/install.sh b/backend/python/vllm/install.sh index cb8729ac1..320ef6772 100755 --- a/backend/python/vllm/install.sh +++ b/backend/python/vllm/install.sh @@ -43,14 +43,11 @@ if [ "x${BUILD_PROFILE}" == "xcublas13" ]; then EXTRA_PIP_INSTALL_FLAGS+=" --index-strategy=unsafe-best-match" fi -# JetPack 7 / L4T arm64 wheels (torch, vllm, flash-attn) live on -# pypi.jetson-ai-lab.io and are built for cp312, so bump the venv Python -# accordingly. JetPack 6 keeps cp310 + USE_PIP=true. -# -# l4t13 uses pyproject.toml (see the elif branch below) to pin only the -# L4T-specific wheels to the jetson-ai-lab index via [tool.uv.sources]. -# That keeps PyPI as the resolution path for transitive deps like -# anthropic/openai/propcache, which the L4T mirror's proxy 503s on. +# JetPack 7 / L4T arm64 vllm + torch wheels come straight from PyPI now +# (torch 2.11+ ships aarch64 + cu130 manylinux wheels and vllm 0.20+ ships +# an aarch64 wheel pinned to that torch). They're cp312-only, so bump the +# venv Python accordingly. JetPack 6 keeps cp310 + USE_PIP=true. +# https://pytorch.org/blog/vllm-and-pytorch-work-together-to-improve-the-developer-experience-on-aarch64/ if [ "x${BUILD_PROFILE}" == "xl4t12" ]; then USE_PIP=true fi @@ -103,25 +100,6 @@ if [ "x${BUILD_TYPE}" == "xintel" ]; then export CMAKE_PREFIX_PATH="$(python -c 'import site; print(site.getsitepackages()[0])'):${CMAKE_PREFIX_PATH:-}" VLLM_TARGET_DEVICE=xpu uv pip install ${EXTRA_PIP_INSTALL_FLAGS:-} --no-deps . 
popd -# L4T arm64 (JetPack 7): drive the install through pyproject.toml so that -# [tool.uv.sources] can pin torch/vllm/flash-attn/torchvision/torchaudio -# to the jetson-ai-lab index, while everything else (transitive deps and -# PyPI-resolvable packages like transformers) comes from PyPI. Bypasses -# installRequirements because uv pip install -r requirements.txt does not -# honor sources — see backend/python/vllm/pyproject.toml for the rationale. -elif [ "x${BUILD_PROFILE}" == "xl4t13" ]; then - ensureVenv - if [ "x${PORTABLE_PYTHON}" == "xtrue" ]; then - export C_INCLUDE_PATH="${C_INCLUDE_PATH:-}:$(_portable_dir)/include/python${PYTHON_VERSION}" - fi - pushd "${backend_dir}" - # Build deps first (matches installRequirements' requirements-install.txt - # pass — fastsafetensors and friends need pybind11 in the venv before - # their sdists can build under --no-build-isolation). - uv pip install ${EXTRA_PIP_INSTALL_FLAGS:-} -r requirements-install.txt - uv pip install ${EXTRA_PIP_INSTALL_FLAGS:-} --requirement pyproject.toml - popd - runProtogen # FROM_SOURCE=true on a CPU build skips the prebuilt vllm wheel in # requirements-cpu-after.txt and compiles vllm locally against the host's # actual CPU. Not used by default because it takes ~30-40 minutes, but diff --git a/backend/python/vllm/pyproject.toml b/backend/python/vllm/pyproject.toml deleted file mode 100644 index b06b9c425..000000000 --- a/backend/python/vllm/pyproject.toml +++ /dev/null @@ -1,61 +0,0 @@ -# L4T arm64 (JetPack 7 / sbsa cu130) install spec for the vllm backend. -# -# Why this file exists, and why only the l4t13 BUILD_PROFILE consumes it: -# -# pypi.jetson-ai-lab.io hosts the L4T-specific torch / vllm / flash-attn -# wheels we need on aarch64 + cuda13, but it ALSO transparently proxies the -# rest of PyPI through `/+f//` URLs that 503 frequently. 
With -# `--extra-index-url` + `--index-strategy=unsafe-best-match` (the historical -# fix in install.sh) uv would pick those proxy URLs for ordinary PyPI -# packages — `anthropic`, `openai`, `propcache`, `annotated-types` — and -# trip on the 503s. See e.g. CI run 25212201349 (anthropic-0.97.0). -# -# `explicit = true` on the index makes uv consult the L4T mirror ONLY for -# packages mapped under [tool.uv.sources]. Everything else goes to PyPI. -# This breaks the historical 503 path without losing access to the L4T -# wheels we actually need from there. -# -# `uv pip install -r requirements.txt` does NOT honor [tool.uv.sources] -# (sources are project-mode only, not pip-compat mode), so install.sh's -# l4t13 branch invokes `uv pip install --requirement pyproject.toml` -# directly. Other BUILD_PROFILEs continue to use the requirements-*.txt -# pipeline through libbackend.sh's installRequirements and never read -# this file. -[project] -name = "localai-vllm-l4t13" -version = "0.0.0" -requires-python = ">=3.12,<3.13" -dependencies = [ - # Mirror of requirements.txt — kept in sync manually for now since the - # l4t13 path bypasses installRequirements (see install.sh). - "grpcio==1.80.0", - "protobuf", - "certifi", - "setuptools", - "pillow", - "charset-normalizer>=3.4.7", - "chardet", - # L4T-specific accelerator stack (sourced from jetson-ai-lab below). - "torch", - "torchvision", - "torchaudio", - "flash-attn", - "vllm", - # PyPI-resolvable packages that complete the runtime — accelerate, - # transformers, bitsandbytes carry their own wheels for aarch64. 
- "accelerate", - "transformers", - "bitsandbytes", -] - -[[tool.uv.index]] -name = "jetson-ai-lab" -url = "https://pypi.jetson-ai-lab.io/sbsa/cu130" -explicit = true - -[tool.uv.sources] -torch = { index = "jetson-ai-lab" } -torchvision = { index = "jetson-ai-lab" } -torchaudio = { index = "jetson-ai-lab" } -flash-attn = { index = "jetson-ai-lab" } -vllm = { index = "jetson-ai-lab" } diff --git a/backend/python/vllm/requirements-l4t13-after.txt b/backend/python/vllm/requirements-l4t13-after.txt new file mode 100644 index 000000000..c959c6ae0 --- /dev/null +++ b/backend/python/vllm/requirements-l4t13-after.txt @@ -0,0 +1,4 @@ +# vLLM 0.20+ ships an aarch64 manylinux wheel on PyPI whose Requires-Dist pins +# torch==2.11.0 / torchvision==0.26.0 / torchaudio==2.11.0, locking an ABI- +# consistent set with the cu130 torch wheel installed above. +vllm diff --git a/backend/python/vllm/requirements-l4t13.txt b/backend/python/vllm/requirements-l4t13.txt new file mode 100644 index 000000000..e566fa855 --- /dev/null +++ b/backend/python/vllm/requirements-l4t13.txt @@ -0,0 +1,8 @@ +# JetPack 7 / L4T arm64 + CUDA 13. Since PyTorch 2.11 (April 2026), PyPI ships +# aarch64 + cu130 manylinux wheels for torch/torchvision/torchaudio directly, +# so we no longer need a custom --extra-index-url for the L4T mirror. +# https://pytorch.org/blog/vllm-and-pytorch-work-together-to-improve-the-developer-experience-on-aarch64/ +accelerate +torch +transformers +bitsandbytes