fix(vllm): install ROCm vLLM from the AMD wheel index on Python 3.12 (#10651)

* fix(vllm): install ROCm vLLM from the AMD wheel index on Python 3.12

The rocm-vllm backend crashed at load with "No module named 'vllm'".
requirements-hipblas-after.txt requested a bare `vllm`, which resolves to
the CUDA-only PyPI wheel; that wheel is unusable on an AMD GPU. vLLM's
prebuilt ROCm wheels live on a dedicated index (https://wheels.vllm.ai/rocm/)
and are published only for CPython 3.12, so on the backend's default 3.10
the installer silently falls back to the CUDA wheel.

Add a hipblas branch to backend/python/vllm/install.sh that pins Python to
3.12 and installs vllm from the ROCm wheel index. The branch hides the
bare-`vllm` after-file while installRequirements runs, so that step installs
only the base ROCm torch/transformers and does not pull the CUDA wheel.
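
Because the failure described above is silent, a small guard can make the Python requirement explicit. The following is a minimal sketch only: `require_python_312` is a hypothetical helper that is not part of install.sh, and it assumes the interpreter version is passed in as a plain string such as `3.12.4`.

```shell
#!/usr/bin/env bash
# Hypothetical guard (assumption, not in install.sh): fail loudly when the
# interpreter is not CPython 3.12, instead of letting the resolver silently
# settle on a wheel built for a different target.
require_python_312() {
    local version="$1"   # version string such as "3.12.4"
    case "${version}" in
        3.12|3.12.*)
            return 0
            ;;
        *)
            echo "ROCm vLLM wheels need CPython 3.12, got ${version}" >&2
            return 1
            ;;
    esac
}
```

One possible call site would be `require_python_312 "$(python -c 'import platform; print(platform.python_version())')" || exit 1`, placed before the `uv pip install` step.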

Fixes #10642

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

* chore(vllm): drop the dead hipblas-after requirement and its hide dance

requirements-hipblas-after.txt (a bare `vllm`) is never installed for
hipblas: installRequirements only adds requirements-${BUILD_PROFILE}-after.txt
when BUILD_TYPE != BUILD_PROFILE, and for hipblas they are equal. So the file
was dead and the install.sh hide/restore of it was a no-op. Remove both. The
hipblas branch already installs vllm explicitly from the ROCm wheel index, so
deleting the bare-`vllm` file also removes a latent CUDA-wheel trap should the
installRequirements gap ever be closed.
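
The gating rule described above can be modelled in a few lines. This is a sketch under stated assumptions: `after_file_for` is a hypothetical stand-in, not the real installRequirements, and the `cublas`/`cublas12` pair is only an illustrative case where the two variables differ.

```shell
#!/usr/bin/env bash
# Hypothetical model (assumption) of the rule stated in the commit message:
# requirements-${BUILD_PROFILE}-after.txt is considered only when
# BUILD_TYPE differs from BUILD_PROFILE.
after_file_for() {
    local build_type="$1" build_profile="$2"
    if [ "x${build_type}" != "x${build_profile}" ]; then
        echo "requirements-${build_profile}-after.txt"
    fi
}

after_file_for hipblas hipblas    # prints nothing: the after-file is skipped
after_file_for cublas cublas12    # prints requirements-cublas12-after.txt
```

Under this model a hipblas build never reaches its after-file, which is why deleting it changes nothing at install time.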

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

---------

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
LocalAI [bot]
2026-07-03 00:44:55 +02:00
committed by GitHub
parent 237bce48e8
commit ef15b4bfda
2 changed files with 31 additions and 1 deletions

@@ -35,6 +35,21 @@ if [ "x${BUILD_PROFILE}" == "xcpu" ]; then
EXTRA_PIP_INSTALL_FLAGS+=" --index-strategy=unsafe-best-match"
fi
# AMD ROCm: vLLM ships prebuilt ROCm wheels, but on a DEDICATED index
# (https://wheels.vllm.ai/rocm/), NOT PyPI, and ONLY for CPython 3.12. On any
# other Python the installer silently falls back to the CUDA-only PyPI wheel,
# which is unusable on an AMD GPU (import fails, so the backend never finds the
# vllm module). Force Python 3.12 before the venv is created (matches the
# intel/l4t13 cp312 bump); the hipblas branch below pulls vllm from the ROCm
# wheel index. unsafe-best-match lets uv consult that index and PyPI together.
# https://docs.vllm.ai/en/latest/getting_started/installation/gpu.html?device=rocm
if [ "x${BUILD_TYPE}" == "xhipblas" ]; then
PYTHON_VERSION="3.12"
PYTHON_PATCH="12"
PY_STANDALONE_TAG="20251120"
EXTRA_PIP_INSTALL_FLAGS+=" --index-strategy=unsafe-best-match"
fi
# cublas13 pulls the vLLM wheel from a per-tag cu130 index (PyPI's vllm wheel
# is built against CUDA 12 and won't load on cu130). uv's default per-package
# first-match strategy would still pick the PyPI wheel, so allow it to consult
@@ -194,6 +209,22 @@ elif [ "x${BUILD_TYPE}" == "xintel" ]; then
export CMAKE_PREFIX_PATH="$(python -c 'import site; print(site.getsitepackages()[0])'):${CMAKE_PREFIX_PATH:-}"
VLLM_TARGET_DEVICE=xpu uv pip install ${EXTRA_PIP_INSTALL_FLAGS:-} --no-deps .
popd
# AMD ROCm: install vllm from its dedicated ROCm wheel index instead of the
# CUDA-only PyPI wheel. installRequirements brings the base ROCm
# torch/transformers (requirements-hipblas.txt), then we pull vllm (plus the
# matching ROCm torch, via --upgrade) from wheels.vllm.ai/rocm. This is the
# method upstream prescribes for AMD; the Python-3.12 pin is set above.
# There is intentionally no requirements-hipblas-after.txt: a bare `vllm`
# there would resolve to the CUDA wheel, and installRequirements never loads
# a ${BUILD_TYPE}-after file for hipblas anyway (BUILD_TYPE == BUILD_PROFILE).
# https://docs.vllm.ai/en/latest/getting_started/installation/gpu.html?device=rocm
elif [ "x${BUILD_TYPE}" == "xhipblas" ]; then
installRequirements
# --upgrade reconciles the base ROCm torch to whatever the vllm ROCm wheel
# pins; --extra-index-url adds the ROCm wheel repository on top of PyPI.
uv pip install ${EXTRA_PIP_INSTALL_FLAGS:-} \
--extra-index-url https://wheels.vllm.ai/rocm/ --upgrade vllm
# FROM_SOURCE=true on a CPU build skips the prebuilt vllm wheel in
# requirements-cpu-after.txt and compiles vllm locally against the host's
# actual CPU. Not used by default because it takes ~30-40 minutes, but

@@ -1 +0,0 @@
vllm