mirror of
https://github.com/mudler/LocalAI.git
synced 2026-05-29 11:07:18 -04:00
* fix(vllm): switch L4T13 backend to PyPI aarch64+cu130 wheels The L4T13 vllm backend pulled torch / torchvision / torchaudio / vllm from pypi.jetson-ai-lab.io's sbsa/cu130 mirror via [tool.uv.sources] with no version pins. That mirror started shipping torch 2.11.0 next to a vllm-0.20.0+cu130 wheel that was still compiled against torch 2.10's c10 ABI, so uv landed on the mismatched pair and vllm crashed at import: ImportError: vllm/_C.abi3.so: undefined symbol: _ZN3c1013MessageLoggerC1EPKciib (c10::MessageLogger's constructor signature changed between torch 2.10 and 2.11; the vllm wheel referenced the 2.10 form, the installed libc10.so exported only the 2.11 form.) Since torch 2.11 (April 2026) PyPI publishes its own aarch64 + cu130 manylinux wheels, and vllm 0.20.0 ships an aarch64 wheel whose Requires- Dist locks torch==2.11.0 / torchvision==0.26.0 / torchaudio==2.11.0. That makes uv's resolver produce an ABI-consistent set automatically, so the mirror and the [tool.uv.sources] pinning are no longer needed. flash-attn is dropped from the dep list: PyPI has no aarch64 wheel, but vLLM 0.20+ already bundles its own vllm_flash_attn (fa2 + fa3) inside the main wheel, so the Dao-AILab package isn't required at runtime. Reference: https://pytorch.org/blog/vllm-and-pytorch-work-together-to-improve-the-developer-experience-on-aarch64/ Assisted-by: Claude:claude-opus-4-7 [Read] [Edit] [Write] [Bash] [WebFetch] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactor(vllm): retire l4t13 pyproject.toml in favor of requirements-*.txt pyproject.toml only existed because uv pip install -r requirements.txt doesn't honor [tool.uv.sources]. The previous commit dropped [tool.uv. sources] (PyPI now serves the aarch64 + cu130 wheels directly), so the file no longer carries any logic the requirements-*.txt path can't. Replace with the same two-file pattern every other build profile uses: - requirements-l4t13.txt (accelerate / torch / transformers / bitsandbytes - matches cublas13's split) - requirements-l4t13-after.txt (vllm; runs after the base resolve so the cu130 torch wheel lands first) install.sh's whole l4t13 elif branch goes away; libbackend.sh's installRequirements already handles the requirements-install.txt build- deps pass, the C_INCLUDE_PATH export for PORTABLE_PYTHON, and the runProtogen call, so falling through to the standard else: branch produces identical install behavior with less surface area. No functional change at install time - same wheels, same order. Assisted-by: Claude:claude-opus-4-7 [Read] [Edit] [Write] [Bash] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(sglang,vllm-omni): switch L4T13 backends to PyPI aarch64+cu130 wheels Same root cause and same fix as the vllm backend in the previous commits: the L4T13 sglang and vllm-omni backends both pulled their accelerator stack from pypi.jetson-ai-lab.io's sbsa/cu130 mirror with no version pins, so they would silently land on the same torch 2.11 vs cu130-built wheel ABI mismatch the moment the mirror published an out-of-sync pair. sglang ------ - Drop pyproject.toml + [tool.uv.sources]. The historical comment said the [all] extra was unsafe on aarch64 because of decord, but sglang 0.5.x now uses `decord2` on aarch64/arm/armv7l (which ships cp312 aarch64 wheels), so we can match cublas13's sglang[all]>=0.5.11 pin and stop being capped at the 0.5.1.post2 the L4T mirror shipped. That unblocks Gemma 4 / MTP recipes on Jetson Thor. - New requirements-l4t13.txt mirrors the cublas13 split (accelerate / torch / torchvision / torchaudio / transformers), requirements-l4t13- after.txt carries sglang[all]>=0.5.11. - install.sh's l4t13 elif branch goes away; falls through to the standard installRequirements path. vllm-omni --------- - requirements-l4t13.txt drops --extra-index-url to jetson-ai-lab and drops flash-attn (PyPI has no aarch64 wheel, vLLM 0.20+ bundles its own vllm_flash_attn fa2 + fa3 internally). - install.sh's l4t13 vllm-install branch collapses into the cublas13 branch since both now just run `pip install vllm --torch-backend=auto` against PyPI. - --index-strategy=unsafe-best-match is dropped from the top-level l4t13 guard; without the L4T mirror in the picture it had no purpose. The from-source vllm-omni install on top still keeps its existing `sed -i '/^fa3-fwd[[:space:]]*==/d' requirements/cuda.txt` workaround - fa3-fwd has no aarch64 wheel and no sdist, unrelated to flash-attn. Reference: https://pytorch.org/blog/vllm-and-pytorch-work-together-to-improve-the-developer-experience-on-aarch64/ Assisted-by: Claude:claude-opus-4-7 [Read] [Edit] [Write] [Bash] [WebFetch] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(sglang): drop [all] extra on l4t13 - xatlas has no aarch64 wheel CI revealed that sglang[all]==0.5.12 transitively pulls xatlas via the [diffusion] sub-extra, and xatlas ships no aarch64 wheel. Its sdist depends on scikit_build_core without declaring it in build-system. requires, so under --no-build-isolation uv can't build it from source: × Failed to build `xatlas==0.0.11` ├─▶ The build backend returned an error ╰─▶ Call to `scikit_build_core.build.build_wheel` failed (exit status: 1) ModuleNotFoundError: No module named 'scikit_build_core' help: `xatlas` (v0.0.11) was included because `sglang[all]` (v0.5.12) depends on `xatlas` Upstream sglang explicitly gates st_attn and vsa on `platform_machine != aarch64` inside the same [diffusion] extra but forgot xatlas - same class of bug that bit the old decord pin. Use plain `sglang>=0.5.11` on l4t13. backend.py imports only base sglang.srt symbols (Engine, ServerArgs, FunctionCallParser, ReasoningParser); the [all] extras are optional accelerators not required at import time. cublas13 (x86_64) keeps [all] because xatlas has x86_64 wheels there. Assisted-by: Claude:claude-opus-4-7 [Read] [Edit] [Write] [Bash] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
90 lines
3.1 KiB
Bash
Executable File
90 lines
3.1 KiB
Bash
Executable File
#!/bin/bash
|
|
set -e
|
|
|
|
PYTHON_VERSION="3.12"
|
|
PYTHON_PATCH="12"
|
|
PY_STANDALONE_TAG="20251120"
|
|
|
|
backend_dir=$(dirname $0)
|
|
if [ -d $backend_dir/common ]; then
|
|
source $backend_dir/common/libbackend.sh
|
|
else
|
|
source $backend_dir/../common/libbackend.sh
|
|
fi
|
|
|
|
# Handle l4t build profiles (Python 3.12, pip fallback) if needed.
|
|
# Since PyTorch 2.11 (April 2026) PyPI ships aarch64 + cu130 manylinux wheels
|
|
# directly for torch/torchvision/torchaudio and an aarch64 vllm wheel pinned
|
|
# to that torch, so the jetson-ai-lab mirror is no longer needed.
|
|
# https://pytorch.org/blog/vllm-and-pytorch-work-together-to-improve-the-developer-experience-on-aarch64/
|
|
if [ "x${BUILD_PROFILE}" == "xl4t13" ]; then
|
|
PYTHON_VERSION="3.12"
|
|
PYTHON_PATCH="12"
|
|
PY_STANDALONE_TAG="20251120"
|
|
fi
|
|
|
|
if [ "x${BUILD_PROFILE}" == "xl4t12" ]; then
|
|
USE_PIP=true
|
|
fi
|
|
|
|
# Install base requirements first
|
|
installRequirements
|
|
|
|
# Install vllm based on build type. vllm-omni tracks vllm master from
|
|
# source (cloned below) so we leave the upstream vllm dependency unpinned
|
|
# — vllm 0.19+ ships cu130 wheels by default, which is what we want for
|
|
# cublas13. Older cuda12/rocm/cpu paths still resolve a compatible wheel
|
|
# from the relevant channel.
|
|
if [ "x${BUILD_TYPE}" == "xhipblas" ]; then
|
|
# ROCm
|
|
if [ "x${USE_PIP}" == "xtrue" ]; then
|
|
pip install vllm==0.14.0 --extra-index-url https://wheels.vllm.ai/rocm/0.14.0/rocm700
|
|
else
|
|
uv pip install vllm==0.14.0 --extra-index-url https://wheels.vllm.ai/rocm/0.14.0/rocm700
|
|
fi
|
|
elif [ "x${BUILD_PROFILE}" == "xcublas13" ] || [ "x${BUILD_PROFILE}" == "xl4t13" ]; then
|
|
# cublas13 (x86_64) and l4t13 (aarch64) both pull vllm from PyPI now:
|
|
# vllm 0.19+ defaults to cu130 wheels on x86_64 and vllm 0.20+ ships an
|
|
# aarch64 manylinux wheel pinned to torch==2.11.0. No extra index needed
|
|
# in either case.
|
|
if [ "x${USE_PIP}" == "xtrue" ]; then
|
|
pip install vllm --torch-backend=auto
|
|
else
|
|
uv pip install vllm --torch-backend=auto
|
|
fi
|
|
elif [ "x${BUILD_TYPE}" == "xcublas" ] || [ "x${BUILD_TYPE}" == "x" ]; then
|
|
# cuda12 / CPU — keep the 0.14.0 pin for compatibility with the existing
|
|
# cuda12 vllm-omni image; bumping should be its own change.
|
|
if [ "x${USE_PIP}" == "xtrue" ]; then
|
|
pip install vllm==0.14.0 --torch-backend=auto
|
|
else
|
|
uv pip install vllm==0.14.0 --torch-backend=auto
|
|
fi
|
|
else
|
|
echo "Unsupported build type: ${BUILD_TYPE}" >&2
|
|
exit 1
|
|
fi
|
|
|
|
# Clone and install vllm-omni from source
|
|
if [ ! -d vllm-omni ]; then
|
|
git clone https://github.com/vllm-project/vllm-omni.git
|
|
fi
|
|
|
|
cd vllm-omni/
|
|
|
|
# fa3-fwd ships no aarch64 wheels and there is no source distribution, so on
|
|
# aarch64 (e.g. l4t13 / SBSA cu130) the upstream requirements/cuda.txt is
|
|
# unsatisfiable. Drop it before resolving — vllm-omni does not hard-require
|
|
# the fused FA3 kernel at import time on Jetson/SBSA targets.
|
|
if [ "$(uname -m)" = "aarch64" ] && [ -f requirements/cuda.txt ]; then
|
|
sed -i '/^fa3-fwd[[:space:]]*==/d' requirements/cuda.txt
|
|
fi
|
|
|
|
if [ "x${USE_PIP}" == "xtrue" ]; then
|
|
pip install ${EXTRA_PIP_INSTALL_FLAGS:-} -e .
|
|
else
|
|
uv pip install ${EXTRA_PIP_INSTALL_FLAGS:-} -e .
|
|
fi
|
|
|
|
cd ..
|