mirror of
https://github.com/mudler/LocalAI.git
synced 2026-05-29 19:19:19 -04:00
* fix(vllm): switch L4T13 backend to PyPI aarch64+cu130 wheels The L4T13 vllm backend pulled torch / torchvision / torchaudio / vllm from pypi.jetson-ai-lab.io's sbsa/cu130 mirror via [tool.uv.sources] with no version pins. That mirror started shipping torch 2.11.0 next to a vllm-0.20.0+cu130 wheel that was still compiled against torch 2.10's c10 ABI, so uv landed on the mismatched pair and vllm crashed at import: ImportError: vllm/_C.abi3.so: undefined symbol: _ZN3c1013MessageLoggerC1EPKciib (c10::MessageLogger's constructor signature changed between torch 2.10 and 2.11; the vllm wheel referenced the 2.10 form, the installed libc10.so exported only the 2.11 form.) Since torch 2.11 (April 2026) PyPI publishes its own aarch64 + cu130 manylinux wheels, and vllm 0.20.0 ships an aarch64 wheel whose Requires- Dist locks torch==2.11.0 / torchvision==0.26.0 / torchaudio==2.11.0. That makes uv's resolver produce an ABI-consistent set automatically, so the mirror and the [tool.uv.sources] pinning are no longer needed. flash-attn is dropped from the dep list: PyPI has no aarch64 wheel, but vLLM 0.20+ already bundles its own vllm_flash_attn (fa2 + fa3) inside the main wheel, so the Dao-AILab package isn't required at runtime. Reference: https://pytorch.org/blog/vllm-and-pytorch-work-together-to-improve-the-developer-experience-on-aarch64/ Assisted-by: Claude:claude-opus-4-7 [Read] [Edit] [Write] [Bash] [WebFetch] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * refactor(vllm): retire l4t13 pyproject.toml in favor of requirements-*.txt pyproject.toml only existed because uv pip install -r requirements.txt doesn't honor [tool.uv.sources]. The previous commit dropped [tool.uv. sources] (PyPI now serves the aarch64 + cu130 wheels directly), so the file no longer carries any logic the requirements-*.txt path can't. Replace with the same two-file pattern every other build profile uses: - requirements-l4t13.txt (accelerate / torch / transformers / bitsandbytes - matches cublas13's split) - requirements-l4t13-after.txt (vllm; runs after the base resolve so the cu130 torch wheel lands first) install.sh's whole l4t13 elif branch goes away; libbackend.sh's installRequirements already handles the requirements-install.txt build- deps pass, the C_INCLUDE_PATH export for PORTABLE_PYTHON, and the runProtogen call, so falling through to the standard else: branch produces identical install behavior with less surface area. No functional change at install time - same wheels, same order. Assisted-by: Claude:claude-opus-4-7 [Read] [Edit] [Write] [Bash] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(sglang,vllm-omni): switch L4T13 backends to PyPI aarch64+cu130 wheels Same root cause and same fix as the vllm backend in the previous commits: the L4T13 sglang and vllm-omni backends both pulled their accelerator stack from pypi.jetson-ai-lab.io's sbsa/cu130 mirror with no version pins, so they would silently land on the same torch 2.11 vs cu130-built wheel ABI mismatch the moment the mirror published an out-of-sync pair. sglang ------ - Drop pyproject.toml + [tool.uv.sources]. The historical comment said the [all] extra was unsafe on aarch64 because of decord, but sglang 0.5.x now uses `decord2` on aarch64/arm/armv7l (which ships cp312 aarch64 wheels), so we can match cublas13's sglang[all]>=0.5.11 pin and stop being capped at the 0.5.1.post2 the L4T mirror shipped. That unblocks Gemma 4 / MTP recipes on Jetson Thor. - New requirements-l4t13.txt mirrors the cublas13 split (accelerate / torch / torchvision / torchaudio / transformers), requirements-l4t13- after.txt carries sglang[all]>=0.5.11. - install.sh's l4t13 elif branch goes away; falls through to the standard installRequirements path. vllm-omni --------- - requirements-l4t13.txt drops --extra-index-url to jetson-ai-lab and drops flash-attn (PyPI has no aarch64 wheel, vLLM 0.20+ bundles its own vllm_flash_attn fa2 + fa3 internally). - install.sh's l4t13 vllm-install branch collapses into the cublas13 branch since both now just run `pip install vllm --torch-backend=auto` against PyPI. - --index-strategy=unsafe-best-match is dropped from the top-level l4t13 guard; without the L4T mirror in the picture it had no purpose. The from-source vllm-omni install on top still keeps its existing `sed -i '/^fa3-fwd[[:space:]]*==/d' requirements/cuda.txt` workaround - fa3-fwd has no aarch64 wheel and no sdist, unrelated to flash-attn. Reference: https://pytorch.org/blog/vllm-and-pytorch-work-together-to-improve-the-developer-experience-on-aarch64/ Assisted-by: Claude:claude-opus-4-7 [Read] [Edit] [Write] [Bash] [WebFetch] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> * fix(sglang): drop [all] extra on l4t13 - xatlas has no aarch64 wheel CI revealed that sglang[all]==0.5.12 transitively pulls xatlas via the [diffusion] sub-extra, and xatlas ships no aarch64 wheel. Its sdist depends on scikit_build_core without declaring it in build-system. requires, so under --no-build-isolation uv can't build it from source: × Failed to build `xatlas==0.0.11` ├─▶ The build backend returned an error ╰─▶ Call to `scikit_build_core.build.build_wheel` failed (exit status: 1) ModuleNotFoundError: No module named 'scikit_build_core' help: `xatlas` (v0.0.11) was included because `sglang[all]` (v0.5.12) depends on `xatlas` Upstream sglang explicitly gates st_attn and vsa on `platform_machine != aarch64` inside the same [diffusion] extra but forgot xatlas - same class of bug that bit the old decord pin. Use plain `sglang>=0.5.11` on l4t13. backend.py imports only base sglang.srt symbols (Engine, ServerArgs, FunctionCallParser, ReasoningParser); the [all] extras are optional accelerators not required at import time. cublas13 (x86_64) keeps [all] because xatlas has x86_64 wheels there. Assisted-by: Claude:claude-opus-4-7 [Read] [Edit] [Write] [Bash] Signed-off-by: Ettore Di Giacinto <mudler@localai.io> --------- Signed-off-by: Ettore Di Giacinto <mudler@localai.io> Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
112 lines
4.6 KiB
Bash
Executable File
112 lines
4.6 KiB
Bash
Executable File
#!/bin/bash
|
|
set -e
|
|
|
|
EXTRA_PIP_INSTALL_FLAGS="--no-build-isolation"
|
|
|
|
# Avoid overcommitting the CPU during builds that compile native code.
|
|
export NVCC_THREADS=2
|
|
export MAX_JOBS=1
|
|
|
|
backend_dir=$(dirname $0)
|
|
|
|
if [ -d $backend_dir/common ]; then
|
|
source $backend_dir/common/libbackend.sh
|
|
else
|
|
source $backend_dir/../common/libbackend.sh
|
|
fi
|
|
|
|
if [ "x${BUILD_PROFILE}" == "xintel" ]; then
|
|
EXTRA_PIP_INSTALL_FLAGS+=" --upgrade --index-strategy=unsafe-first-match"
|
|
fi
|
|
|
|
if [ "x${BUILD_PROFILE}" == "xcpu" ]; then
|
|
EXTRA_PIP_INSTALL_FLAGS+=" --index-strategy=unsafe-best-match"
|
|
fi
|
|
|
|
# cublas12 needs a cu128 torch index (see requirements-cublas12.txt) — without
|
|
# unsafe-best-match uv falls through to default PyPI's cu130 torch wheel and
|
|
# the resulting sgl-kernel can't load on our cu12 host libs.
|
|
if [ "x${BUILD_PROFILE}" == "xcublas12" ]; then
|
|
EXTRA_PIP_INSTALL_FLAGS+=" --index-strategy=unsafe-best-match"
|
|
fi
|
|
|
|
# sglang 0.5.11 (Gemma 4 support) declares flash-attn-4 as a hard dep, but
|
|
# upstream only publishes pre-release wheels (4.0.0b*). uv rejects
|
|
# pre-releases by default — opt in for sglang specifically. Drop this once
|
|
# flash-attn-4 4.0 stable lands.
|
|
EXTRA_PIP_INSTALL_FLAGS+=" --prerelease=allow"
|
|
|
|
# JetPack 7 / L4T arm64 sglang + torch wheels come straight from PyPI now
|
|
# (torch 2.11+ ships aarch64 + cu130 manylinux wheels and sglang 0.5.11+
|
|
# ships a cp312 aarch64 wheel pinned to that torch). They're cp312-only,
|
|
# so bump the venv Python accordingly.
|
|
# https://pytorch.org/blog/vllm-and-pytorch-work-together-to-improve-the-developer-experience-on-aarch64/
|
|
if [ "x${BUILD_PROFILE}" == "xl4t13" ]; then
|
|
PYTHON_VERSION="3.12"
|
|
PYTHON_PATCH="12"
|
|
PY_STANDALONE_TAG="20251120"
|
|
fi
|
|
|
|
# sglang's CPU path has no prebuilt wheel on PyPI — upstream publishes
|
|
# a separate pyproject_cpu.toml that must be swapped in before `pip install`.
|
|
# Reference: docker/xeon.Dockerfile in the sglang upstream repo.
|
|
#
|
|
# When BUILD_TYPE is empty (CPU profile) or FROM_SOURCE=true is forced,
|
|
# install torch/transformers/etc from requirements-cpu.txt, then clone
|
|
# sglang and install its python/ and sgl-kernel/ packages from source
|
|
# using the CPU pyproject.
|
|
if [ "x${BUILD_TYPE}" == "x" ] || [ "x${FROM_SOURCE:-}" == "xtrue" ]; then
|
|
# sgl-kernel's CPU build links against libnuma and libtbb. Install
|
|
# them here (Docker builder stage) before running the source build.
|
|
# Harmless no-op on runs outside the docker build since installRequirements
|
|
# below still needs them only if we reach the source build branch.
|
|
if command -v apt-get >/dev/null 2>&1 && [ "$(id -u)" = "0" ]; then
|
|
apt-get update
|
|
DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
|
|
libnuma-dev numactl libtbb-dev libgomp1 libomp-dev google-perftools \
|
|
build-essential cmake ninja-build
|
|
fi
|
|
|
|
installRequirements
|
|
|
|
# sgl-kernel's pyproject_cpu.toml uses scikit-build-core as its build
|
|
# backend. With --no-build-isolation, that (and ninja/cmake) must be
|
|
# present in the venv before we build from source.
|
|
uv pip install --no-build-isolation "scikit-build-core>=0.10" ninja cmake
|
|
|
|
# sgl-kernel's CPU shm.cpp uses __m512 AVX-512 intrinsics unconditionally.
|
|
# csrc/cpu/CMakeLists.txt hard-codes add_compile_options(-march=native),
|
|
# which on runners without AVX-512 in /proc/cpuinfo fails with
|
|
# "__m512 return without 'avx512f' enabled changes the ABI".
|
|
# CXXFLAGS alone is insufficient because CMake's add_compile_options()
|
|
# appends -march=native *after* CXXFLAGS, overriding it.
|
|
# We therefore patch the CMakeLists.txt to replace -march=native with
|
|
# -march=sapphirerapids so the flag is consistent throughout the build.
|
|
# The resulting binary still requires an AVX-512 capable CPU at runtime,
|
|
# same constraint sglang upstream documents in docker/xeon.Dockerfile.
|
|
|
|
_sgl_src=$(mktemp -d)
|
|
trap 'rm -rf "${_sgl_src}"' EXIT
|
|
git clone --depth 1 https://github.com/sgl-project/sglang "${_sgl_src}/sglang"
|
|
|
|
# Patch -march=native → -march=sapphirerapids in the CPU kernel CMakeLists
|
|
sed -i 's/-march=native/-march=sapphirerapids/g' \
|
|
"${_sgl_src}/sglang/sgl-kernel/csrc/cpu/CMakeLists.txt"
|
|
|
|
pushd "${_sgl_src}/sglang/sgl-kernel"
|
|
if [ -f pyproject_cpu.toml ]; then
|
|
cp pyproject_cpu.toml pyproject.toml
|
|
fi
|
|
uv pip install ${EXTRA_PIP_INSTALL_FLAGS:-} .
|
|
popd
|
|
|
|
pushd "${_sgl_src}/sglang/python"
|
|
if [ -f pyproject_cpu.toml ]; then
|
|
cp pyproject_cpu.toml pyproject.toml
|
|
fi
|
|
uv pip install ${EXTRA_PIP_INSTALL_FLAGS:-} .
|
|
popd
|
|
else
|
|
installRequirements
|
|
fi
|