* feat(backends): add CUDA 13 + L4T arm64 CUDA 13 variants for vllm/vllm-omni/sglang
Adds new build profiles mirroring the diffusers/ace-step pattern so vLLM
serving (and SGLang on arm64) can be deployed on CUDA 13 hosts and
JetPack 7 boards:
- vllm: cublas13 (PyPI cu130 channel) + l4t13 (jetson-ai-lab SBSA cu130
prebuilt vllm + flash-attn).
- vllm-omni: cublas13 + l4t13. Floats vllm version on cu13 since vllm
0.19+ ships cu130 wheels by default and vllm-omni tracks vllm master;
cu12 path keeps the 0.14.0 pin to avoid disturbing existing images.
- sglang: l4t13 arm64 only — uses the prebuilt sglang wheel from the
jetson-ai-lab SBSA cu130 index, so no source build is needed.
Cublas13 sglang on x86_64 is intentionally deferred.
CI matrix gains five new images (-gpu-nvidia-cuda-13-vllm{,-omni},
-nvidia-l4t-cuda-13-arm64-{vllm,vllm-omni,sglang}); backend/index.yaml
gains the matching capability keys (nvidia-cuda-13, nvidia-l4t-cuda-13)
and latest/development merge entries.
Assisted-by: Claude:claude-opus-4-7 [Read] [Edit] [Write] [Bash]
* fix(backends): use unsafe-best-match index strategy on l4t13 builds
The jetson-ai-lab SBSA cu130 index lists transitive deps (decord, etc.)
at limited versions / older Python ABIs. uv defaults to the first index
that contains a package and refuses to fall through to PyPI, so sglang
l4t13 build fails resolving decord. Mirror the existing cpu sglang
profile by setting --index-strategy=unsafe-best-match on l4t13 across
the three backends, and apply it to the explicit vllm install line in
vllm-omni's install.sh (which doesn't honor EXTRA_PIP_INSTALL_FLAGS).
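Illustrative sketch (assumed shape, not the literal install line): the flag
ends up on the uv invocations roughly as
    uv pip install --index-strategy=unsafe-best-match \
        --extra-index-url <jetson-ai-lab SBSA cu130 index URL> sglang
so uv may pick a dependency's best version from PyPI even when the
jetson-ai-lab index also lists that package.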
Assisted-by: Claude:claude-opus-4-7 [Read] [Edit] [Bash]
* fix(sglang): drop [all] extras on l4t13, floor version at 0.5.0
The [all] extra brings in outlines→decord, and decord has no aarch64
cp312 wheel on PyPI nor the jetson-ai-lab index (only legacy cp35-cp37
tags). With unsafe-best-match enabled, uv backtracked through sglang
versions trying to satisfy decord and silently landed on
sglang==0.1.16, an ancient version with an entirely different dep
tree (cloudpickle/outlines 0.0.44, etc.).
Drop [all] so decord is no longer required, and floor sglang at 0.5.0
to prevent any future resolver misfire from degrading the version
again.
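Sketch of the change (illustrative, not the literal requirements line):
    before: sglang[all]      # resolver backtracks chasing decord, lands on 0.1.16
    after:  sglang>=0.5.0    # no [all] extras; the floor blocks the downgrade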
Assisted-by: Claude:claude-opus-4-7 [Read] [Edit] [Bash]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
#!/bin/bash
set -e

EXTRA_PIP_INSTALL_FLAGS="--no-build-isolation"

# Avoid overcommitting the CPU during the build:
# https://github.com/vllm-project/vllm/issues/20079
# https://docs.vllm.ai/en/v0.8.3/serving/env_vars.html
# https://docs.redhat.com/it/documentation/red_hat_ai_inference_server/3.0/html/vllm_server_arguments/environment_variables-server-arguments
export NVCC_THREADS=2
export MAX_JOBS=1
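# (Per the vllm docs linked above: MAX_JOBS bounds the number of parallel
# compile jobs the vllm extension build spawns, and NVCC_THREADS bounds
# nvcc's internal --threads parallelism, so a from-source build doesn't
# oversubscribe small runners.)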
backend_dir=$(dirname "$0")

if [ -d "$backend_dir/common" ]; then
    source "$backend_dir/common/libbackend.sh"
else
    source "$backend_dir/../common/libbackend.sh"
fi
# This is here because the Intel pip index is broken: it returns a 200 status
# code for every package name but never returns any package links. That makes
# uv think the package exists in the Intel index, and by default uv stops
# looking at other indexes once it finds a match. We need uv to keep falling
# through to the default PyPI index to find optimum[openvino] there.
# The --upgrade actually allows us to *downgrade* torch to the version
# provided in the Intel pip index.
if [ "x${BUILD_PROFILE}" == "xintel" ]; then
    EXTRA_PIP_INSTALL_FLAGS+=" --upgrade --index-strategy=unsafe-first-match"
fi
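# For illustration: on the intel profile the flags accumulate to
#   --no-build-isolation --upgrade --index-strategy=unsafe-first-match
# and are presumably consumed by the installRequirements helper that
# libbackend.sh provides.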
# CPU builds need unsafe-best-match to pull torch==2.10.0+cpu from the
# pytorch test channel while still resolving transformers/vllm from pypi.
if [ "x${BUILD_PROFILE}" == "xcpu" ]; then
    EXTRA_PIP_INSTALL_FLAGS+=" --index-strategy=unsafe-best-match"
fi
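# (With unsafe-first-match, uv exhausts versions from the first index that
# carries a package before falling back to later indexes; unsafe-best-match
# considers candidates from all configured indexes and picks the best
# matching version.)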
# JetPack 7 / L4T arm64 wheels (torch, vllm, flash-attn) live on
# pypi.jetson-ai-lab.io and are built for cp312, so bump the venv Python
# accordingly. JetPack 6 keeps cp310 + USE_PIP=true. unsafe-best-match
# is required because the jetson-ai-lab index lists transitive deps at
# limited versions; without it uv pins to the first matching index and
# fails to resolve a compatible wheel from PyPI.
if [ "x${BUILD_PROFILE}" == "xl4t12" ]; then
    USE_PIP=true
fi
if [ "x${BUILD_PROFILE}" == "xl4t13" ]; then
    PYTHON_VERSION="3.12"
    PYTHON_PATCH="12"
    PY_STANDALONE_TAG="20251120"
    EXTRA_PIP_INSTALL_FLAGS+=" --index-strategy=unsafe-best-match"
fi
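# Illustrative invocation (BUILD_PROFILE is normally set by the image build,
# e.g. a Dockerfile ARG; the exact mechanism lives outside this script):
#   BUILD_PROFILE=l4t13 ./install.sh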
# FROM_SOURCE=true on a CPU build skips the prebuilt vllm wheel in
# requirements-cpu-after.txt and compiles vllm locally against the host's
# actual CPU. Not used by default because it takes ~30-40 minutes, but
# kept here for hosts where the prebuilt wheel SIGILLs (CPU without the
# required SIMD baseline, e.g. AVX-512 VNNI/BF16). Default CI uses a
# bigger-runner with compatible hardware instead.
if [ "x${BUILD_TYPE}" == "x" ] && [ "x${FROM_SOURCE:-}" == "xtrue" ]; then
    # Temporarily hide the prebuilt wheel so installRequirements doesn't
    # pull it; the rest of the requirements files (base deps, torch,
    # transformers) are still installed normally.
    _cpu_after="${backend_dir}/requirements-cpu-after.txt"
    _cpu_after_bak=""
    if [ -f "${_cpu_after}" ]; then
        _cpu_after_bak="${_cpu_after}.from-source.bak"
        mv "${_cpu_after}" "${_cpu_after_bak}"
    fi
    installRequirements
    if [ -n "${_cpu_after_bak}" ]; then
        mv "${_cpu_after_bak}" "${_cpu_after}"
    fi

    # Build vllm from source against the installed torch.
    # https://docs.vllm.ai/en/latest/getting_started/installation/cpu/
    _vllm_src=$(mktemp -d)
    trap 'rm -rf "${_vllm_src}"' EXIT
    git clone --depth 1 https://github.com/vllm-project/vllm "${_vllm_src}/vllm"
    pushd "${_vllm_src}/vllm"
    uv pip install ${EXTRA_PIP_INSTALL_FLAGS:-} wheel packaging ninja "setuptools>=49.4.0" numpy typing-extensions pillow setuptools-scm
    # Respect the pre-installed torch version: installing with --no-deps
    # skips vllm's own requirements-build.txt torch pin.
    VLLM_TARGET_DEVICE=cpu uv pip install ${EXTRA_PIP_INSTALL_FLAGS:-} --no-deps .
    popd
else
    installRequirements
fi
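# Illustrative usage (not part of the default image build): force a local
# source build on a host whose CPU trips the prebuilt wheel's SIMD baseline:
#   FROM_SOURCE=true ./install.sh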